Support Multi-Blank Transducer in Espnet2 by jctian98 · Pull Request #4876 · espnet/espnet · GitHub

Support Multi-Blank Transducer in Espnet2 #4876


Merged · 26 commits · Mar 25, 2023

Conversation

jctian98 (Contributor):

This PR adds support for the multi-blank transducer in ESPnet2.

Related paper: https://arxiv.org/abs/2211.03541

Acknowledgment: the original implementation is from NeMo (https://github.com/NVIDIA/NeMo), by @hainan-xv. The original NeMo PR is NVIDIA/NeMo#5527.

@mergify mergify bot added the ESPnet2 label Jan 18, 2023
@jctian98 (Contributor, Author):

Still in progress. I'll report some results on Librispeech when this PR is fully validated.

@ftshijt ftshijt added New Features ASR Automatic speech recogntion labels Jan 19, 2023
@sw005320 sw005320 added this to the v.202303 milestone Feb 7, 2023
@jctian98 (Contributor, Author) commented Feb 9, 2023:

The current version is validated on the Librispeech dataset; results are below.
The experiments suggest the Multi-Blank Transducer criterion outperforms the vanilla transducer in both speed and accuracy, which aligns well with the paper.

| Criterion | Big blanks | Multi_blank_sigma | Test-clean WER | Test-other WER | RTF |
| --- | --- | --- | --- | --- | --- |
| Vanilla Transducer | - | - | 2.9 | 6.7 | 0.461 |
| Multi-Blank Transducer | [2,4,8] | 0.05 | 2.8 | 6.6 | 0.324 |
| Multi-Blank Transducer | [2,3,4,5,6,7,8] | 0.05 | 2.8 | 6.4 | 0.325 |

Several hints for this method:

  1. Currently, the multi-blank transducer method is only compatible with greedy search.
  2. The multi-blank transducer requires more training epochs. In the standard ESPnet recipe, we train for 25 epochs, but in these experiments the number of epochs is 50.
  3. In the internal implementation of the multi-blank transducer, the order of the big blanks is reversed (which is not expected). E.g., if the multi-blank transducer accepts the argument big_blank_durations=[2,4,8], it actually treats the sequence as [8,4,2]. Although this is handled in this PR, it is worth putting on record.

I am happy to collaborate so that this PR can pass the CI. @ftshijt
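To make hint 1 concrete, here is a minimal, framework-free sketch of greedy search with big blanks: a standard blank advances one frame, a big blank of duration d skips d frames at once, and a label emission stays on the same frame. The function name, the `logprob_fn` callback, and the output layout (vocab entries, then big blanks, then the standard blank) are illustrative assumptions, not the exact API of this PR.

```python
def multi_blank_greedy(logprob_fn, num_frames, vocab_size, big_blank_durations):
    """Greedy search with big blanks.

    logprob_fn(t, prev_token) returns scores over
    [vocab..., big blanks..., standard blank] for frame t.
    """
    blank_id = vocab_size + len(big_blank_durations)
    hyp, t, prev = [], 0, blank_id
    while t < num_frames:
        scores = logprob_fn(t, prev)
        k = max(range(len(scores)), key=scores.__getitem__)
        if k == blank_id:
            t += 1                                    # standard blank: one frame
        elif k >= vocab_size:
            t += big_blank_durations[k - vocab_size]  # big blank: skip d frames at once
        else:
            hyp.append(k)                             # label: stay on this frame
            prev = k
    return hyp
```

This is where the decoding speedup comes from: whenever a big blank wins the argmax, several encoder frames are consumed in a single decoding step.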

b-flo (Member) left a comment:


Hi @jctian98 ,

It would be better to move the rnnt_multi_blank folder under asr/transducer instead of asr_transducer.

The latter is reserved for the standalone version of ESPnet2-Transducer, while the former is for the version shared with CTC/Att, which you're working with.

P.S: Unrelated, but is the RTF you reported for the vanilla Transducer also with greedy search? It seems really slow.

@hainan-xv:

> The current version is validated on Librispeech dataset. Below are the results. [...]

Congrats on making this work with ESPnet and thanks!

@sw005320 (Contributor) commented Feb 9, 2023:

> Hi @jctian98,
>
> It would be better to move the rnnt_multi_blank folder under asr/transducer instead of asr_transducer. [...]

I agree


@sw005320 sw005320 added the RNNT (RNN) transducer related issue label Feb 9, 2023
@jctian98 (Contributor, Author) commented Feb 9, 2023:

@b-flo

> P.S: Unrelated, but is the RTF you reported for the vanilla Transducer also with greedy search? It seems really slow.

All decoding is with greedy search. It is slow because I used $inference_nj=88. After setting $inference_nj=20, the three RTF numbers become 0.190, 0.152, and 0.157 respectively. (I am not sure how many physical CPU cores my machine has.)

> It would be better to move the rnnt_multi_blank folder under asr/transducer instead of asr_transducer.

Moved

@sw005320

> Can you apply black?

I didn't know about it; it's applied now.

@hainan-xv

> Congrats on making this work with ESPnet and thanks!

Thanks for your kind support!

@jctian98 jctian98 requested a review from b-flo February 9, 2023 18:11
@b-flo (Member) commented Feb 9, 2023:

I'll review the PR in a few days. Thanks again for this contribution, @jctian98!

> All decoding is with greedy search. It is slow because I use $inference_nj=88. [...]

OK, great, thank you! You can look at /proc/cpuinfo or use top (press 1 to show per-CPU usage). I assume the inference_nj value is too high given the RTF difference.
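As a quick way to check this, here is a small Python sketch that distinguishes logical CPUs (which include hyper-threads) from an estimate of physical cores; the /proc/cpuinfo parsing assumes the usual Linux layout and falls back to None elsewhere, so treat it as a rough alternative to lscpu, not an exact tool.

```python
import os

def logical_cpus():
    """Logical CPUs visible to the OS (includes hyper-threads)."""
    return os.cpu_count()

def physical_cores():
    """Estimate physical cores from /proc/cpuinfo (Linux only); None if unknown."""
    try:
        with open("/proc/cpuinfo") as f:
            text = f.read()
    except OSError:
        return None
    cores = set()
    phys = core = None
    # /proc/cpuinfo lists one block per logical CPU, separated by blank lines;
    # a (physical id, core id) pair identifies one physical core.
    for line in text.splitlines() + [""]:
        if line.startswith("physical id"):
            phys = line.split(":", 1)[1].strip()
        elif line.startswith("core id"):
            core = line.split(":", 1)[1].strip()
        elif not line.strip():
            if phys is not None and core is not None:
                cores.add((phys, core))
            phys = core = None
    return len(cores) or None
```

If `physical_cores()` is roughly half of `logical_cpus()`, the machine is hyper-threaded, and running one single-threaded decoding job per logical CPU will oversubscribe the physical cores.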

Btw, do you have WER and RTF for the vanilla Transducer with default beam search, please?

Edit: You duplicated the rnnt_multi_blank folder; it's in both asr/transducer and asr_transducer now!

@jctian98 (Contributor, Author):

Here are the updated experimental results:

| Criterion | Big blanks | Multi_blank_sigma | Test-clean WER | Test-other WER | RTF (nj=20) |
| --- | --- | --- | --- | --- | --- |
| Vanilla Transducer - Greedy Search | - | - | 2.9 | 6.7 | 0.190 |
| Vanilla Transducer - Default Beam Search (beam=10) | - | - | 2.8 | 6.5 | 0.750 |
| Multi-Blank Transducer - Greedy Search | [2,4,8] | 0.05 | 2.8 | 6.6 | 0.152 |
| Multi-Blank Transducer - Greedy Search | [2,3,4,5,6,7,8] | 0.05 | 2.8 | 6.4 | 0.157 |

I have checked /proc/cpuinfo, and it suggests I have 88 CPU cores. Maybe competition for system resources slows down the RTF when $inference_nj=88. After changing from $inference_nj=20 to $inference_nj=10, the RTF improves only slightly, so I suppose $inference_nj=20 is valid on my machine.

I have also reported the WER and RTF for the vanilla transducer with beam search. No LM is used.

I have removed the asr_transducer/multi_blank_rnnt folder. Sorry for this mistake.

CI suggests black should be applied to two files, but applying black to them on my local machine doesn't change them. I don't know how to get past this, so please help. Thanks! :)

b-flo (Member) left a comment:


> CI suggests the black should be applied to two files. But it seems applying black to these files on my local machine doesn't change them. [...]

Below are the issues CI flagged in your PR. There are also two more in asr_inference.py, l.419 and l.686:

        else:                                                                                        
                                                                                                     
            # Normal ASR

You can skip them in this PR, I think. The files are correctly formatted in master; you should update your branch.

@b-flo (Member) commented Feb 10, 2023:

> I have checked /proc/cpuinfo and it suggests I have 88 CPU cores. [...]

Maybe some bottleneck, but I'll take a look. Did you use the calculate_rtf utils?

Edit: Wait, /proc/cpuinfo shows all cores and threads, and decoding uses a single thread by default. What's your CPU model?

> I have also reported the WER and RTF for the vanilla transducer with beam search. No LM used.

Thank you for the report! Well, that's a rather high RTF!
It may be worth further investigation. I'm currently using a somewhat similar architecture and the same decoding parameters for Libri-100 and end up with an RTF of ~0.246 (~0.1 for ALSD and mAES at the same performance). Even without accounting for differences between the shared/standalone versions, it may help us improve some parts.
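For anyone reproducing these numbers: RTF here is decoding wall time divided by total audio duration. A minimal single-process sketch follows; the real recipe uses the calculate_rtf utility and per-job logs, so the function and argument names below are illustrative.

```python
import time

def measure_rtf(decode_fn, utterances):
    """Measure the real-time factor of a decoder.

    utterances: iterable of (waveform, duration_seconds) pairs.
    Returns wall-clock decoding time divided by total audio duration;
    an RTF below 1.0 means decoding is faster than real time.
    """
    total_audio = 0.0
    start = time.perf_counter()
    for wav, dur in utterances:
        decode_fn(wav)
        total_audio += dur
    elapsed = time.perf_counter() - start
    return elapsed / total_audio
```

With inference_nj parallel jobs, per-job wall times include contention for shared resources, which may be why running nj=88 on an 88-thread machine inflates the reported RTF.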

jctian98 and others added 3 commits February 10, 2023 23:16
Co-authored-by: Florian Boyer <41155456+b-flo@users.noreply.github.com>
Co-authored-by: Florian Boyer <41155456+b-flo@users.noreply.github.com>
b-flo (Member) left a comment:


LGTM. It can be merged after CI fixes (most of them come from loss computation files).

@sw005320 (Contributor):

> Good work @jctian98. I didn't review all the code in rnnt_multi_blank, but the rest is OK apart from some minor things.
>
> That being said, I'm wondering if rnnt_multi_blank should be an external dependency alongside warp-transducer. Or at least, it should be put somewhere that makes more "sense". It's for another PR, but I think we should discuss what to do with such implementations. @sw005320 Do you have an opinion on that?

I agree; we can think of it in another PR.
The transducer is also used in speech translation, so it is better to put it in a better place, where we can also add some custom losses (or make it an external tool).
I'm thinking of putting this in espnet2/loss/rnnt_multi_blank or something like that.

> (Or we could also extend warp-transducer for multi-blank and remove the numba dependency with that.)

This is a nice idea.
We can add this to our TODO list.

@sw005320 (Contributor):

@jctian98, I think the current CI issue comes from flake8.
Can you check it?

@jctian98 (Contributor, Author):

Hi, sorry for my late reply.

(1) On my local machine, the command flake8 espnet2/asr/transducer/rnnt_multi_blank/ espnet2/asr/espnet_model.py espnet2/tasks/asr.py espnet2/bin/asr_inference.py gives no output, so I suppose the current commit passes flake8 locally.

(2) In addition, when running ci/test_flake8.sh, the output is:

bash ci/test_flake8.sh 
flake8-docstrings ready files coverage: 178 / 234 = 76.0683%
espnet/nets/beam_search_timesync.py:195:44: E741 ambiguous variable name 'l'
        hyps = sorted(new_hyps, key=lambda l: scores[l], reverse=True)[: self.beam_size]

But the file mentioned above is not modified in this PR.

(3) In the CI check log, there are comments suggesting that multi-line docstring summaries are not separated by an empty line. But the empty lines do exist after the multi-line docstrings (e.g., espnet2/asr/transducer/rnnt_multi_blank/utils/cpu_utils/cpu_rnnt.py:338).

I'm a bit confused about the current status. Please let me know what I should do next.

@sw005320 (Contributor) commented Feb 20, 2023:

Can you check the versions of the test tools (used locally and in CI)?

@jctian98 (Contributor, Author):

On my local machine, I have flake8=6.0.0, isort=5.12.0, black=23.1.0, and pycodestyle=2.10.0. I suppose they are very new.

Sorry, I don't understand what the versions used in C are.

@sw005320 (Contributor):

> Sorry, I don't understand what the versions used in C are.

used in CI

@sw005320 (Contributor):

(screenshot of the CI tool versions)

@jctian98 (Contributor, Author):

Hi,

With flake8=5.0.4, ci/test_flake8.sh also gives no error message on my machine.

Following this, I also tried flake8=3.9.2; ci/test_flake8.sh still gives no error message :(

@sw005320 (Contributor):

@ftshijt, do you know what’s happening?

@codecov bot commented Mar 16, 2023:

Codecov Report

Merging #4876 (1cdb526) into master (b579d70) will decrease coverage by 1.17%.
The diff coverage is 4.82%.

@@            Coverage Diff             @@
##           master    #4876      +/-   ##
==========================================
- Coverage   77.03%   75.86%   -1.17%     
==========================================
  Files         606      615       +9     
  Lines       53755    54684     +929     
==========================================
+ Hits        41409    41487      +78     
- Misses      12346    13197     +851     
| Flag | Coverage Δ |
| --- | --- |
| test_integration_espnet1 | 66.29% <ø> (ø) |
| test_integration_espnet2 | 48.51% <32.00%> (+0.54%) ⬆️ |
| test_python | 65.83% <4.39%> (-1.03%) ⬇️ |
| test_utils | 23.28% <ø> (ø) |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ |
| --- | --- |
| ...spnet2/asr/transducer/rnnt_multi_blank/__init__.py | 0.00% <0.00%> (ø) |
| espnet2/asr/transducer/rnnt_multi_blank/rnnt.py | 0.00% <0.00%> (ø) |
| ...sr/transducer/rnnt_multi_blank/rnnt_multi_blank.py | 0.00% <0.00%> (ø) |
| ...ducer/rnnt_multi_blank/utils/cpu_utils/cpu_rnnt.py | 0.00% <0.00%> (ø) |
| ...ucer/rnnt_multi_blank/utils/cuda_utils/gpu_rnnt.py | 0.00% <0.00%> (ø) |
| ...nt_multi_blank/utils/cuda_utils/gpu_rnnt_kernel.py | 0.00% <0.00%> (ø) |
| ...sducer/rnnt_multi_blank/utils/cuda_utils/reduce.py | 0.00% <0.00%> (ø) |
| ...nsducer/rnnt_multi_blank/utils/global_constants.py | 0.00% <0.00%> (ø) |
| ...r/transducer/rnnt_multi_blank/utils/rnnt_helper.py | 0.00% <0.00%> (ø) |
| espnet2/asr/espnet_model.py | 84.79% <50.00%> (+7.95%) ⬆️ |
| ... and 3 more | |


@sw005320 (Contributor):

@jctian98, we need to add

@jctian98 (Contributor, Author):

> @jctian98, we need to add

Will work on it soon!

@mergify mergify bot added the CI Travis, Circle CI, etc label Mar 23, 2023
@mergify mergify bot added the README label Mar 23, 2023
@jctian98 (Contributor, Author) commented Mar 25, 2023:

This is a follow-up.

  1. The previous test code did not cover the transducer model defined in espnet2/asr/espnet_model.py (it only tested the standalone transducer in espnet2/asr_transducer). This PR adds new code to test both the vanilla and the Multi-Blank transducer in espnet2/asr/espnet_model.py.
  2. The Multi-Blank Transducer criterion only runs on GPU, so it cannot be tested if the test environment has no GPU (as is likely the case now).
  3. codecov/patch reports many warnings for the GPU version of the Multi-Blank transducer implementation. This part of the code cannot be covered unless a GPU is accessible, and it is the only failing CI check.

Unit tests: see test/espnet2/asr/test_espnet_model.py and test/espnet2/asr/transducer/test_transducer_beam_search.py
Integration test: see ci/test_integration_espnet2.sh
Again, for both unit and integration tests, some code will not run unless a GPU is accessible; see the if branches in the code.

I suppose the test code may need an additional review.

The model has been uploaded to HF.
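The GPU-only gating described above can be sketched with unittest's skip decorators. The cuda_available helper below is a hypothetical stand-in for torch.cuda.is_available() so the sketch runs even without torch installed, and the test body is a placeholder, not this PR's actual test.

```python
import unittest

def cuda_available():
    """Stand-in for torch.cuda.is_available(); False when torch is absent."""
    try:
        import torch
        return bool(torch.cuda.is_available())
    except ImportError:
        return False

class TestMultiBlankTransducer(unittest.TestCase):
    @unittest.skipUnless(cuda_available(), "multi-blank RNNT loss is CUDA-only")
    def test_loss_forward_on_gpu(self):
        # Placeholder: a real test would build the model with
        # big_blank_durations=[2, 4, 8] and run a forward pass on GPU.
        self.assertTrue(cuda_available())
```

On a CPU-only CI runner the test is reported as skipped rather than failed, which matches the behavior described in the comment above.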

@sw005320 (Contributor):

Thanks!
They look good to me, considering that the current multi-blank transducer loss is only available for CUDA.

@hainan-xv, in the future, it would be great if you also prepare the CPU version for debugging and CI purposes.

@sw005320 sw005320 merged commit b937311 into espnet:master Mar 25, 2023
@jctian98 jctian98 changed the title [WIP] Support Multi-Blank Transducer in Espnet2 Support Multi-Blank Transducer in Espnet2 Mar 25, 2023
Labels
ASR Automatic speech recogntion CI Travis, Circle CI, etc ESPnet2 New Features README RNNT (RNN) transducer related issue
5 participants