Add multi-channel enh_asr for CHiME-4 #4706
This PR also addresses this issue.
Codecov Report
@@           Coverage Diff           @@
##           master    #4706   +/-   ##
=======================================
  Coverage   80.45%   80.45%
=======================================
  Files         527      527
  Lines       46215    46215
=======================================
  Hits        37181    37181
  Misses       9034     9034
"../enh1/exp/enh_train_enh_beamformer_wpd_ci_sdr_shorttap_raw/valid.loss.best.pth:separator:enh_model.separator",
"../asr1/exp/asr_train_asr_conformer_wavlm2_raw_en_char/valid.acc.best.pth:frontend:s2t_model.frontend",
"../asr1/exp/asr_train_asr_conformer_wavlm2_raw_en_char/valid.acc.best.pth:preencoder:s2t_model.preencoder",
"../asr1/exp/asr_train_asr_conformer_wavlm2_raw_en_char/valid.acc.best.pth:encoder:s2t_model.encoder",
"../asr1/exp/asr_train_asr_conformer_wavlm2_raw_en_char/valid.acc.best.pth:ctc:s2t_model.ctc",
"../asr1/exp/asr_train_asr_conformer_wavlm2_raw_en_char/valid.acc.best.pth:decoder:s2t_model.decoder",
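Each entry above follows ESPnet's `--init_param` format, `<file_path>:<src_key>:<dst_key>:<exclude_keys>`: the parameters under `<src_key>` in the checkpoint at `<file_path>` initialize the `<dst_key>` submodule of the joint model (e.g., the pre-trained SE `separator` becomes `enh_model.separator`, and the ASR `encoder` becomes `s2t_model.encoder`). The POSIX shell sketch below only splits such an entry into its fields to make that mapping explicit; `split_init_param` is a hypothetical helper, not part of ESPnet, and it does not load any weights.

```shell
#!/bin/sh
# Hypothetical helper: split an ESPnet-style init_param entry of the form
# <file_path>:<src_key>:<dst_key>:<exclude_keys> into its four fields.
# It only parses the string; no checkpoint is read.
# The body runs in a subshell, so the IFS and noglob changes do not leak.
split_init_param() (
    IFS=:
    set -f            # no globbing while the entry is word-split on ':'
    set -- $1
    printf 'file_path=%s\n' "${1-}"
    printf 'src_key=%s\n' "${2-}"
    printf 'dst_key=%s\n' "${3-}"
    printf 'exclude_keys=%s\n' "${4-}"
)

# Usage:
# split_init_param "../asr1/exp/asr_train_asr_conformer_wavlm2_raw_en_char/valid.acc.best.pth:encoder:s2t_model.encoder"
# prints:
#   file_path=../asr1/exp/asr_train_asr_conformer_wavlm2_raw_en_char/valid.acc.best.pth
#   src_key=encoder
#   dst_key=s2t_model.encoder
#   exclude_keys=
```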
Could you also add configurations to obtain the models used here for initialization?
It would be nice if you also added some notes to README.md showing how to reproduce the experiments.
Similar to @Emrys365's comment: can you provide a specific command (e.g., local/run_miris.sh) to reproduce the result, including the pre-trained models (either specifying a model or training one)?
The pre-training configs specified in this config file (train_asr_conformer_wavlm2.yaml and train_enh_beamformer_wpd_ci_sdr_shorttap.yaml) are included in this pull request.
Do you mean that an additional script performing the pre-training of the SE and ASR models would be helpful?
Yes! Let's make local/run_miris.sh and also add a note to README.md on how to train this model from scratch.
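A reproduction script such as the proposed local/run_miris.sh would first have to produce the SE checkpoint under `../enh1/exp/` and the ASR checkpoint under `../asr1/exp/` that the init_param entries point to, and only then start joint training. As a minimal sketch of one guard such a script could contain (the function `check_init_ckpts` and its placement are hypothetical, not part of this PR), the following checks that every referenced checkpoint exists, assuming it runs from the directory the relative `../` paths are resolved against (presumably egs2/chime4/enh_asr1).

```shell
#!/bin/sh
# Hypothetical guard: verify that the <file_path> field of every init_param
# entry (<file_path>:<src_key>:<dst_key>:<exclude_keys>) points to an
# existing file. Prints the missing paths and returns 1 if any are absent.
check_init_ckpts() {
    missing=0
    for spec in "$@"; do
        ckpt=${spec%%:*}          # keep only the text before the first ':'
        if [ ! -f "$ckpt" ]; then
            echo "missing checkpoint: $ckpt" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example, with entries copied from the config above:
# check_init_ckpts \
#     "../enh1/exp/enh_train_enh_beamformer_wpd_ci_sdr_shorttap_raw/valid.loss.best.pth:separator:enh_model.separator" \
#     "../asr1/exp/asr_train_asr_conformer_wavlm2_raw_en_char/valid.acc.best.pth:encoder:s2t_model.encoder" \
#     || { echo "pre-train the SE and ASR models first" >&2; exit 1; }
```

Failing early here avoids starting a long joint-training run that would otherwise stop when a checkpoint cannot be loaded.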
egs2/chime4/asr1/local/data.sh
utils/combine_data.sh data/tr05_multi_isolated_6ch_track data/tr05_simu_isolated_6ch_track data/tr05_real_isolated_6ch_track
utils/combine_data.sh data/${train_dev} data/dt05_simu_isolated_6ch_track data/dt05_real_isolated_6ch_track
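For context, `utils/combine_data.sh <dest> <src1> <src2> ...` is the Kaldi utility used in ESPnet recipes to merge Kaldi-style data directories into `<dest>`; here the simulated and real 6-channel subsets are merged into one training set and one development set. The sketch below is a simplified, hypothetical illustration of the core idea for three files only (`wav.scp`, `utt2spk`, `text`): concatenate each file across the sources, sort in C-locale order, and rebuild `spk2utt`. The real script handles more files and options.

```shell
#!/bin/sh
# Simplified, hypothetical illustration of utils/combine_data.sh:
# merge the Kaldi-style data directories <src...> into <dest>.
# Only wav.scp, utt2spk and text are handled here.
combine_data_sketch() {
    dest=$1; shift
    mkdir -p "$dest"
    for f in wav.scp utt2spk text; do
        # Concatenate the file from every source directory, then sort it
        # in C-locale order, which Kaldi tools expect.
        for src in "$@"; do
            cat "$src/$f"
        done | LC_ALL=C sort > "$dest/$f"
    done
    # Rebuild spk2utt from the sorted utt2spk: one line per speaker,
    # followed by that speaker's utterance IDs in sorted order.
    awk '{ if ($2 in u) u[$2] = u[$2] " " $1; else u[$2] = $1 }
         END { for (s in u) print s, u[s] }' "$dest/utt2spk" \
        | LC_ALL=C sort > "$dest/spk2utt"
}
```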
Would it be better to add these subsets in egs2/chime4/enh_asr1/local/data.sh?
Thank you for pointing this out. These lines perform similar processing, so we can simply remove them.
I'll read the details and check the generated data after this modification.
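Kaldi's `utils/validate_data_dir.sh` is the usual tool for checking a generated data directory. As a hedged, minimal illustration of two properties it enforces (each file sorted in C-locale order, with no duplicate utterance IDs), a hypothetical helper `check_sorted_uniq` could look like the sketch below; it is not a replacement for the full validation script.

```shell
#!/bin/sh
# Hypothetical helper: check that a Kaldi-style file is sorted in C-locale
# order and that its first column (the utterance ID) has no duplicates.
check_sorted_uniq() {
    file=$1
    if ! LC_ALL=C sort -c "$file" 2>/dev/null; then
        echo "$file is not sorted" >&2
        return 1
    fi
    # The file is sorted, so duplicate IDs are adjacent and uniq -d finds them.
    dups=$(cut -d' ' -f1 "$file" | LC_ALL=C uniq -d)
    if [ -n "$dups" ]; then
        echo "$file has duplicate IDs: $dups" >&2
        return 1
    fi
    return 0
}

# Example (data/<dir> is a placeholder for a generated data directory):
# for f in wav.scp utt2spk text; do check_sorted_uniq "data/<dir>/$f" || exit 1; done
```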
LGTM. I just left some minor comments.
I'll merge this PR now, but please make a follow-up PR, @YoshikiMas.
This PR supports enh_asr on the 6-channel recordings of the CHiME-4 dataset.
It is based on a paper entitled "End-to-End Integration of Speech Recognition, Dereverberation, Beamforming, and Self-Supervised Learning Representation" accepted by SLT 2022.
TODO: