Add multi-channel enh_asr for CHiME-4 #4706

YoshikiMas · 2022-10-11T10:13:47Z

This PR supports enh_asr on the 6-channel recordings of the CHiME-4 dataset.
It is based on a paper entitled "End-to-End Integration of Speech Recognition, Dereverberation, Beamforming, and Self-Supervised Learning Representation" accepted by SLT 2022.

TODO:

modify README.md with results and pretrained models

YoshikiMas · 2022-10-11T10:15:37Z

This PR also addresses this issue.

codecov · 2022-10-11T10:37:39Z

Codecov Report

Merging #4706 (bd1b363) into master (e9d583b) will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master    #4706   +/-   ##
=======================================
  Coverage   80.45%   80.45%           
=======================================
  Files         527      527           
  Lines       46215    46215           
=======================================
  Hits        37181    37181           
  Misses       9034     9034

| Flag | Coverage Δ |
|---|---|
| test_integration_espnet1 | `66.37% <ø> (ø)` |
| test_integration_espnet2 | `49.06% <ø> (ø)` |
| test_python | `68.66% <ø> (ø)` |
| test_utils | `23.30% <ø> (ø)` |

Flags with carried forward coverage won't be shown.

Emrys365 · 2022-10-13T16:29:11Z

egs2/chime4/enh_asr1/conf/tuning/train_enh_asr_wpd_init_noenhloss_wavlm_conformer.yaml

+    "../enh1/exp/enh_train_enh_beamformer_wpd_ci_sdr_shorttap_raw/valid.loss.best.pth:separator:enh_model.separator",
+    "../asr1/exp/asr_train_asr_conformer_wavlm2_raw_en_char/valid.acc.best.pth:frontend:s2t_model.frontend",
+    "../asr1/exp/asr_train_asr_conformer_wavlm2_raw_en_char/valid.acc.best.pth:preencoder:s2t_model.preencoder",
+    "../asr1/exp/asr_train_asr_conformer_wavlm2_raw_en_char/valid.acc.best.pth:encoder:s2t_model.encoder",
+    "../asr1/exp/asr_train_asr_conformer_wavlm2_raw_en_char/valid.acc.best.pth:ctc:s2t_model.ctc",
+    "../asr1/exp/asr_train_asr_conformer_wavlm2_raw_en_char/valid.acc.best.pth:decoder:s2t_model.decoder",


Could you also add configurations to obtain the models used here for initialization?

It would be nice if you also add some notes in README.md to show how to reproduce the experiments.

My comment is similar to @Emrys365's.
Can you provide a specific command (e.g., local/run_miris.sh) to reproduce the result, including how to obtain the pre-trained models (either by specifying an existing model or by training one)?

The pre-training configs specified in this config file (train_asr_conformer_wavlm2.yaml and train_enh_beamformer_wpd_ci_sdr_shorttap.yaml) are included in this pull request.
You mean an additional script, which performs the pre-training of SE and ASR models, would be helpful, right?

> You mean an additional script, which performs the pre-training of SE and ASR models, would be helpful, right?

Yes!
Let's make local/run_miris.sh and also add a comment to README.md on how to train this model from scratch.
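
For readers following this thread, the initialization entries quoted from the config each appear to follow a `path:src_key:dst_key` pattern: a checkpoint file, the sub-module to read from it, and the sub-module of the joint model to load it into. The sketch below only splits such an entry into its parts so the dependencies on pre-trained SE and ASR models are easy to list. `parse_init_param` and `InitParam` are hypothetical helper names used for illustration, not part of ESPnet, and the two sample entries are copied from the config above.

```python
from typing import NamedTuple, Optional


class InitParam(NamedTuple):
    """One initialization entry, split into its colon-separated fields."""

    path: str
    src_key: Optional[str]
    dst_key: Optional[str]


def parse_init_param(spec: str) -> InitParam:
    """Split a 'path:src_key:dst_key' entry; missing or empty keys become None."""
    parts = spec.split(":")
    path = parts[0]
    src_key = parts[1] if len(parts) > 1 and parts[1] else None
    dst_key = parts[2] if len(parts) > 2 and parts[2] else None
    return InitParam(path, src_key, dst_key)


# Two of the entries quoted in the review comment above.
specs = [
    "../enh1/exp/enh_train_enh_beamformer_wpd_ci_sdr_shorttap_raw/valid.loss.best.pth:separator:enh_model.separator",
    "../asr1/exp/asr_train_asr_conformer_wavlm2_raw_en_char/valid.acc.best.pth:encoder:s2t_model.encoder",
]

for spec in specs:
    p = parse_init_param(spec)
    print(f"{p.src_key} -> {p.dst_key}  (from {p.path})")
```

Listing the distinct `path` values this way shows which experiments (here one under `../enh1` and one under `../asr1`) have to be trained or downloaded before the joint enh_asr training can start.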

egs2/chime4/enh_asr1/run.sh

Emrys365 · 2022-10-13T16:33:51Z

egs2/chime4/asr1/local/data.sh

+    utils/combine_data.sh data/tr05_multi_isolated_6ch_track data/tr05_simu_isolated_6ch_track data/tr05_real_isolated_6ch_track
+    utils/combine_data.sh data/${train_dev} data/dt05_simu_isolated_6ch_track data/dt05_real_isolated_6ch_track


Is it better to add these subsets in egs2/chime4/enh_asr1/local/data.sh?

Thank you for pointing that out. These lines perform similar processing, so we can simply remove the ones you mentioned.
I'll read the details and check the generated data after this meeting.
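
As a rough illustration of what the `utils/combine_data.sh` calls discussed in this thread do, the sketch below merges Kaldi-style data directories by concatenating and sorting their index files. It is a simplified stand-in, not the actual script: `combine_data_dirs` is a hypothetical name, only three common files are handled, and the real script performs additional checks and handles more file types. The directory names in the demo mirror the ones in the diff, with made-up utterance IDs.

```python
import tempfile
from pathlib import Path
from typing import Sequence


def combine_data_dirs(
    dest: Path,
    sources: Sequence[Path],
    names: Sequence[str] = ("wav.scp", "text", "utt2spk"),
) -> None:
    """Merge the listed index files from each source directory into dest."""
    dest.mkdir(parents=True, exist_ok=True)
    for name in names:
        lines = []
        for src in sources:
            f = src / name
            if f.exists():
                lines.extend(l for l in f.read_text().splitlines() if l.strip())
        if lines:
            # Kaldi-style tools expect entries sorted by utterance ID.
            (dest / name).write_text("\n".join(sorted(set(lines))) + "\n")


# Demo on throwaway directories with made-up utterance IDs.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    simu = root / "tr05_simu_isolated_6ch_track"
    real = root / "tr05_real_isolated_6ch_track"
    for d, utt in ((simu, "utt_simu_001"), (real, "utt_real_001")):
        d.mkdir()
        (d / "text").write_text(f"{utt} HELLO WORLD\n")
    multi = root / "tr05_multi_isolated_6ch_track"
    combine_data_dirs(multi, [simu, real])
    print((multi / "text").read_text())
```

Because the merge is a plain concatenation, running it in two recipes on the same subsets produces the same combined directory twice, which is why keeping it in only one `local/data.sh` is sufficient.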

Emrys365

LGTM. I just left some minor comments.

sw005320 · 2022-11-11T15:25:03Z

I just merged this PR, but please make a follow-up PR, @YoshikiMas.
This will improve the reproducibility of your work.

YoshikiMas added 2 commits October 11, 2022 18:57

add scripts for multi-iris

93d11a7

fix commentouted line related to matlab

8ed83f4

mergify bot added the ESPnet2 label Oct 11, 2022

sw005320 added Recipe ASR Automatic speech recogntion SE Speech enhancement labels Oct 11, 2022

sw005320 added this to the v.202211 milestone Oct 11, 2022

mergify bot added the README label Oct 12, 2022

YoshikiMas added 2 commits October 12, 2022 19:12

update README.md

b99fcf0

remove enh results for real data

84c402f

sw005320 requested a review from Emrys365 October 12, 2022 11:26

Emrys365 reviewed Oct 13, 2022


egs2/chime4/enh_asr1/run.sh (outdated; resolved)

Emrys365 reviewed Oct 13, 2022


Emrys365 approved these changes Oct 13, 2022


YoshikiMas and others added 6 commits October 14, 2022 16:27

add configurations for pre-training

a339d8a

remove comments

45c34ac

remove redundant combine_data.sh

c479445

remove redundant asr combine_data.sh

e45c034

Merge branch 'master' into multi-iris

d7c5c20

Merge branch 'master' into multi-iris

bd1b363

sw005320 merged commit 209ffa0 into espnet:master Nov 11, 2022

YoshikiMas mentioned this pull request Nov 11, 2022

MultiIRIS follow up #4765

Merged

YoshikiMas mentioned this pull request Nov 20, 2022

Issue in enh_asr.sh #4650

Closed
