Support Multi-Blank Transducer in Espnet2 by jctian98 · Pull Request #4876 · espnet/espnet · GitHub

Support Multi-Blank Transducer in Espnet2 #4876


Merged · 26 commits · Mar 25, 2023

Conversation

jctian98 (Contributor):

This PR adds support for the multi-blank transducer in ESPnet2.

Related paper: https://arxiv.org/abs/2211.03541

Acknowledgment: the original implementation is from NeMo (https://github.com/NVIDIA/NeMo), by @hainan-xv. The original NeMo PR is NVIDIA/NeMo#5527.

@mergify mergify bot added the ESPnet2 label Jan 18, 2023
@jctian98 (Contributor, Author):

Still in progress. I'll report some results on Librispeech when this PR is fully validated.

@ftshijt ftshijt added New Features ASR Automatic speech recogntion labels Jan 19, 2023
@sw005320 sw005320 added this to the v.202303 milestone Feb 7, 2023
@jctian98 (Contributor, Author) commented Feb 9, 2023:

The current version is validated on the Librispeech dataset; results are below.
The experiments suggest the Multi-Blank Transducer criterion outperforms the vanilla transducer in both speed and accuracy, which aligns well with the paper.

| Criterion | Big blanks | Multi_blank_sigma | Test-clean WER | Test-other WER | RTF |
| --- | --- | --- | --- | --- | --- |
| Vanilla Transducer | - | - | 2.9 | 6.7 | 0.461 |
| Multi-Blank Transducer | [2,4,8] | 0.05 | 2.8 | 6.6 | 0.324 |
| Multi-Blank Transducer | [2,3,4,5,6,7,8] | 0.05 | 2.8 | 6.4 | 0.325 |

Several hints for this method:

  1. Currently, the multi-blank transducer method is only compatible with greedy search.
  2. The multi-blank transducer requires more training epochs. In the standard ESPnet recipe, we train for 25 epochs, but in these experiments the number of epochs is 50.
  3. In the internal implementation of the multi-blank transducer, the order of the big blanks is reversed (which is not expected). E.g., if the multi-blank transducer accepts the argument big_blank_durations=[2,4,8], it actually treats the sequence as [8,4,2]. Although this is handled in this PR, it is worth putting on record.

I am happy to collaborate so that this PR can pass the CI. @ftshijt
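To make hint 1 concrete, here is a minimal, framework-free sketch of greedy search with big blanks: a standard blank advances one frame, a big blank of duration d skips d frames at once, and a label emission stays on the same frame. The function name, the `logprob_fn` callback, and the output layout (vocab entries, then big blanks, then the standard blank) are illustrative assumptions, not the exact API of this PR.

```python
def multi_blank_greedy(logprob_fn, num_frames, vocab_size, big_blank_durations):
    """Greedy search with big blanks.

    logprob_fn(t, prev_token) returns scores over
    [vocab..., big blanks..., standard blank] for frame t.
    """
    blank_id = vocab_size + len(big_blank_durations)
    hyp, t, prev = [], 0, blank_id
    while t < num_frames:
        scores = logprob_fn(t, prev)
        k = max(range(len(scores)), key=scores.__getitem__)
        if k == blank_id:
            t += 1                                    # standard blank: one frame
        elif k >= vocab_size:
            t += big_blank_durations[k - vocab_size]  # big blank: skip d frames at once
        else:
            hyp.append(k)                             # label: stay on this frame
            prev = k
    return hyp
```

This is where the decoding speedup comes from: whenever a big blank wins the argmax, several encoder frames are consumed in a single decoding step.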

b-flo (Member) left a comment:


Hi @jctian98 ,

It would be better to move the rnnt_multi_blank folder under asr/transducer instead of asr_transducer.

The latter is reserved for the standalone version of ESPnet2-Transducer, while the former is for the version shared with CTC/Att, which you're working with.

P.S: Unrelated, but is the RTF you reported for the vanilla Transducer also with greedy search? It seems really slow.

@hainan-xv:

> The current version is validated on Librispeech dataset. Below are the results. [...]

Congrats on making this work with ESPnet and thanks!

@sw005320 (Contributor) commented Feb 9, 2023:

> Hi @jctian98,
>
> It would be better to move the rnnt_multi_blank folder under asr/transducer instead of asr_transducer. [...]

I agree


@sw005320 sw005320 added the RNNT (RNN) transducer related issue label Feb 9, 2023
@jctian98 (Contributor, Author) commented Feb 9, 2023:

@b-flo

> P.S: Unrelated, but is the RTF you reported for the vanilla Transducer also with greedy search? It seems really slow.

All decoding is with greedy search. It is slow because I used $inference_nj=88. After setting $inference_nj=20, the three RTF numbers become 0.190, 0.152, and 0.157 respectively. (I am not sure how many physical CPU cores my machine has.)

> It would be better to move the rnnt_multi_blank folder under asr/transducer instead of asr_transducer.

Moved

@sw005320

> Can you apply black?

I didn't know about it; it's applied now.

@hainan-xv

> Congrats on making this work with ESPnet and thanks!

Thanks for your kind support!

@jctian98 jctian98 requested a review from b-flo February 9, 2023 18:11
@b-flo (Member) commented Feb 9, 2023:

I'll review the PR in a few days. Thanks again for this contribution, @jctian98!

> All decoding is with greedy search. It is slow because I use $inference_nj=88. [...]

OK, great, thank you! You can look at /proc/cpuinfo or use top (press 1 to show per-CPU usage). I assume the inference_nj value is too high given the RTF difference.
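As a quick way to check this, here is a small Python sketch that distinguishes logical CPUs (which include hyper-threads) from an estimate of physical cores; the /proc/cpuinfo parsing assumes the usual Linux layout and falls back to None elsewhere, so treat it as a rough alternative to lscpu, not an exact tool.

```python
import os

def logical_cpus():
    """Logical CPUs visible to the OS (includes hyper-threads)."""
    return os.cpu_count()

def physical_cores():
    """Estimate physical cores from /proc/cpuinfo (Linux only); None if unknown."""
    try:
        with open("/proc/cpuinfo") as f:
            text = f.read()
    except OSError:
        return None
    cores = set()
    phys = core = None
    # /proc/cpuinfo lists one block per logical CPU, separated by blank lines;
    # a (physical id, core id) pair identifies one physical core.
    for line in text.splitlines() + [""]:
        if line.startswith("physical id"):
            phys = line.split(":", 1)[1].strip()
        elif line.startswith("core id"):
            core = line.split(":", 1)[1].strip()
        elif not line.strip():
            if phys is not None and core is not None:
                cores.add((phys, core))
            phys = core = None
    return len(cores) or None
```

If `physical_cores()` is roughly half of `logical_cpus()`, the machine is hyper-threaded, and running one single-threaded decoding job per logical CPU will oversubscribe the physical cores.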

Btw, do you have WER and RTF for the vanilla Transducer with default beam search, please?

Edit: You duplicated the rnnt_multi_blank folder; it's in both asr/transducer and asr_transducer now!

@jctian98 (Contributor, Author):

Here are the updated experimental results:

| Criterion | Big blanks | Multi_blank_sigma | Test-clean WER | Test-other WER | RTF (nj=20) |
| --- | --- | --- | --- | --- | --- |
| Vanilla Transducer - Greedy Search | - | - | 2.9 | 6.7 | 0.190 |
| Vanilla Transducer - Default Beam Search (beam=10) | - | - | 2.8 | 6.5 | 0.750 |
| Multi-Blank Transducer - Greedy Search | [2,4,8] | 0.05 | 2.8 | 6.6 | 0.152 |
| Multi-Blank Transducer - Greedy Search | [2,3,4,5,6,7,8] | 0.05 | 2.8 | 6.4 | 0.157 |

I have checked /proc/cpuinfo, and it suggests I have 88 CPU cores. Maybe competition for system resources slows down the RTF when $inference_nj=88. After changing from $inference_nj=20 to $inference_nj=10, the RTF improves only slightly, so I suppose $inference_nj=20 is valid on my machine.

I have also reported the WER and RTF for the vanilla transducer with beam search. No LM is used.

I have removed the asr_transducer/multi_blank_rnnt folder. Sorry for this mistake.

CI suggests black should be applied to two files, but applying black to them on my local machine doesn't change them. I don't know how to get past this, so please help. Thanks! :)

b-flo (Member) left a comment:


> CI suggests the black should be applied to two files. But it seems applying black to these files on my local machine doesn't change them. [...]

Below are the issues CI flagged in your PR. There are also two more in asr_inference.py, l.419 and l.686:

        else:                                                                                        
                                                                                                     
            # Normal ASR

You can skip them in this PR, I think. The files are correctly formatted in master; you should update your branch.

@b-flo (Member) commented Feb 10, 2023:

> I have checked /proc/cpuinfo and it suggests I have 88 CPU cores. [...]

Maybe some bottleneck, but I'll take a look. Did you use the calculate_rtf utils?

Edit: Wait, /proc/cpuinfo shows all cores and threads, and decoding uses a single thread by default. What's your CPU model?

> I have also reported the WER and RTF for the vanilla transducer with beam search. No LM used.

Thank you for the report! Well, that's a rather high RTF!
It may be worth further investigation. I'm currently using a somewhat similar architecture and the same decoding parameters for Libri-100 and end up with an RTF of ~0.246 (~0.1 for ALSD and mAES at the same performance). Even without accounting for differences between the shared/standalone versions, it may help us improve some parts.
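For anyone reproducing these numbers: RTF here is decoding wall time divided by total audio duration. A minimal single-process sketch follows; the real recipe uses the calculate_rtf utility and per-job logs, so the function and argument names below are illustrative.

```python
import time

def measure_rtf(decode_fn, utterances):
    """Measure the real-time factor of a decoder.

    utterances: iterable of (waveform, duration_seconds) pairs.
    Returns wall-clock decoding time divided by total audio duration;
    an RTF below 1.0 means decoding is faster than real time.
    """
    total_audio = 0.0
    start = time.perf_counter()
    for wav, dur in utterances:
        decode_fn(wav)
        total_audio += dur
    elapsed = time.perf_counter() - start
    return elapsed / total_audio
```

With inference_nj parallel jobs, per-job wall times include contention for shared resources, which may be why running nj=88 on an 88-thread machine inflates the reported RTF.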

jctian98 and others added 3 commits February 10, 2023 23:16
Co-authored-by: Florian Boyer <41155456+b-flo@users.noreply.github.com>
Co-authored-by: Florian Boyer <41155456+b-flo@users.noreply.github.com>
b-flo (Member) left a comment:


LGTM. It can be merged after CI fixes (most of them come from loss computation files).

@sw005320 (Contributor):

> Good work @jctian98. I didn't review all the code in rnnt_multi_blank, but the rest is OK apart from some minor things.
>
> That being said, I'm wondering if rnnt_multi_blank should be an external dependency alongside warp-transducer. Or at least, it should be put somewhere that makes more "sense". It's for another PR, but I think we should discuss what to do with such implementations. @sw005320 Do you have an opinion on that?

I agree; we can think of it in another PR.
The transducer is also used in speech translation, so it is better to put it in a better place, where we can also add some custom losses (or make it an external tool).
I'm thinking of putting this in espnet2/loss/rnnt_multi_blank or something like that.

> (Or we could also extend warp-transducer for multi-blank and remove the numba dependency with that.)

This is a nice idea.
We can add this to our TODO list.

@sw005320 (Contributor):

@jctian98, I think the current CI issue comes from flake8.
Can you check it?

@jctian98 (Contributor, Author):

Hi, sorry for my late reply.

(1) On my local machine, the command flake8 espnet2/asr/transducer/rnnt_multi_blank/ espnet2/asr/espnet_model.py espnet2/tasks/asr.py espnet2/bin/asr_inference.py gives no output, so I suppose the current commit passes flake8 locally.

(2) In addition, when running ci/test_flake8.sh, the output is:

bash ci/test_flake8.sh 
flake8-docstrings ready files coverage: 178 / 234 = 76.0683%
espnet/nets/beam_search_timesync.py:195:44: E741 ambiguous variable name 'l'
        hyps = sorted(new_hyps, key=lambda l: scores[l], reverse=True)[: self.beam_size]

But the file mentioned above is not modified in this PR.

(3) In the CI check log, there are comments suggesting that multi-line docstring summaries are not separated by an empty line. But the empty lines do exist after the multi-line docstrings (e.g., espnet2/asr/transducer/rnnt_multi_blank/utils/cpu_utils/cpu_rnnt.py:338).

I'm a bit confused about the current status. Please let me know what I should do next.

@sw005320 (Contributor) commented Feb 20, 2023:

Can you check the versions of the test tools (used locally and in CI)?

@jctian98 (Contributor, Author):

On my local machine, I have flake8=6.0.0, isort=5.12.0, black=23.1.0, and pycodestyle=2.10.0. I suppose they are very new.

Sorry, I don't understand what the versions used in C are.

@sw005320 (Contributor):

> Sorry, I don't understand what the versions used in C are.

used in CI

@sw005320 (Contributor):

(screenshot of the CI tool versions)

@jctian98 (Contributor, Author):

Hi,

With flake8=5.0.4, ci/test_flake8.sh also gives no error message on my machine.

Following this, I also tried flake8=3.9.2; ci/test_flake8.sh still gives no error message :(

@sw005320 (Contributor):

@ftshijt, do you know what’s happening?

@codecov bot commented Mar 16, 2023:

Codecov Report

Merging #4876 (1cdb526) into master (b579d70) will decrease coverage by 1.17%.
The diff coverage is 4.82%.

@@            Coverage Diff             @@
##           master    #4876      +/-   ##
==========================================
- Coverage   77.03%   75.86%   -1.17%     
==========================================
  Files         606      615       +9     
  Lines       53755    54684     +929     
==========================================
+ Hits        41409    41487      +78     
- Misses      12346    13197     +851     
| Flag | Coverage Δ |
| --- | --- |
| test_integration_espnet1 | 66.29% <ø> (ø) |
| test_integration_espnet2 | 48.51% <32.00%> (+0.54%) ⬆️ |
| test_python | 65.83% <4.39%> (-1.03%) ⬇️ |
| test_utils | 23.28% <ø> (ø) |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ |
| --- | --- |
| ...spnet2/asr/transducer/rnnt_multi_blank/__init__.py | 0.00% <0.00%> (ø) |
| espnet2/asr/transducer/rnnt_multi_blank/rnnt.py | 0.00% <0.00%> (ø) |
| ...sr/transducer/rnnt_multi_blank/rnnt_multi_blank.py | 0.00% <0.00%> (ø) |
| ...ducer/rnnt_multi_blank/utils/cpu_utils/cpu_rnnt.py | 0.00% <0.00%> (ø) |
| ...ucer/rnnt_multi_blank/utils/cuda_utils/gpu_rnnt.py | 0.00% <0.00%> (ø) |
| ...nt_multi_blank/utils/cuda_utils/gpu_rnnt_kernel.py | 0.00% <0.00%> (ø) |
| ...sducer/rnnt_multi_blank/utils/cuda_utils/reduce.py | 0.00% <0.00%> (ø) |
| ...nsducer/rnnt_multi_blank/utils/global_constants.py | 0.00% <0.00%> (ø) |
| ...r/transducer/rnnt_multi_blank/utils/rnnt_helper.py | 0.00% <0.00%> (ø) |
| espnet2/asr/espnet_model.py | 84.79% <50.00%> (+7.95%) ⬆️ |
| ... and 3 more | |


@sw005320 (Contributor):

@jctian98, we need to add

@jctian98 (Contributor, Author):

> @jctian98, we need to add

Will work on it soon!

@mergify mergify bot added the CI Travis, Circle CI, etc label Mar 23, 2023
@mergify mergify bot added the README label Mar 23, 2023
@jctian98 (Contributor, Author) commented Mar 25, 2023:

This is a follow-up.

  1. The previous test code did not cover the transducer model defined in espnet2/asr/espnet_model.py (it only tested the standalone transducer in espnet2/asr_transducer). This PR adds new code to test both the vanilla and the Multi-Blank transducer in espnet2/asr/espnet_model.py.
  2. The Multi-Blank Transducer criterion only runs on GPU, so it cannot be tested if the test environment has no GPU (as is likely the case now).
  3. codecov/patch reports many warnings for the GPU version of the Multi-Blank transducer implementation. This part of the code cannot be covered unless a GPU is accessible, and it is the only failing CI check.

Unit tests: see test/espnet2/asr/test_espnet_model.py and test/espnet2/asr/transducer/test_transducer_beam_search.py
Integration test: see ci/test_integration_espnet2.sh
Again, for both unit and integration tests, some code will not run unless a GPU is accessible; see the if branches in the code.

I suppose the test code may need an additional review.

The model has been uploaded to HF.
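The GPU-only gating described above can be sketched with unittest's skip decorators. The cuda_available helper below is a hypothetical stand-in for torch.cuda.is_available() so the sketch runs even without torch installed, and the test body is a placeholder, not this PR's actual test.

```python
import unittest

def cuda_available():
    """Stand-in for torch.cuda.is_available(); False when torch is absent."""
    try:
        import torch
        return bool(torch.cuda.is_available())
    except ImportError:
        return False

class TestMultiBlankTransducer(unittest.TestCase):
    @unittest.skipUnless(cuda_available(), "multi-blank RNNT loss is CUDA-only")
    def test_loss_forward_on_gpu(self):
        # Placeholder: a real test would build the model with
        # big_blank_durations=[2, 4, 8] and run a forward pass on GPU.
        self.assertTrue(cuda_available())
```

On a CPU-only CI runner the test is reported as skipped rather than failed, which matches the behavior described in the comment above.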

@sw005320 (Contributor):

Thanks!
They look good to me, considering that the current multi-blank transducer loss is only available for CUDA.

@hainan-xv, in the future, it would be great if you also prepare the CPU version for debugging and CI purposes.

@sw005320 sw005320 merged commit b937311 into espnet:master Mar 25, 2023
@jctian98 jctian98 changed the title [WIP] Support Multi-Blank Transducer in Espnet2 Support Multi-Blank Transducer in Espnet2 Mar 25, 2023
Labels
ASR Automatic speech recogntion CI Travis, Circle CI, etc ESPnet2 New Features README RNNT (RNN) transducer related issue
5 participants