Add multi-channel enh_asr for CHiME-4 by YoshikiMas · Pull Request #4706 · espnet/espnet · GitHub

Add multi-channel enh_asr for CHiME-4 #4706

Merged: 10 commits, Nov 11, 2022

@YoshikiMas (Contributor) commented Oct 11, 2022

This PR adds enh_asr support for the 6-channel recordings of the CHiME-4 dataset.
It is based on the paper "End-to-End Integration of Speech Recognition, Dereverberation, Beamforming, and Self-Supervised Learning Representation", accepted at SLT 2022.

TODO:

  • modify README.md with results and pretrained models

@mergify mergify bot added the ESPnet2 label Oct 11, 2022
@YoshikiMas (Contributor, Author) commented:

This PR also addresses this issue.

@codecov bot commented Oct 11, 2022

Codecov Report

Merging #4706 (bd1b363) into master (e9d583b) will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master    #4706   +/-   ##
=======================================
  Coverage   80.45%   80.45%           
=======================================
  Files         527      527           
  Lines       46215    46215           
=======================================
  Hits        37181    37181           
  Misses       9034     9034           
Flag Coverage Δ
test_integration_espnet1 66.37% <ø> (ø)
test_integration_espnet2 49.06% <ø> (ø)
test_python 68.66% <ø> (ø)
test_utils 23.30% <ø> (ø)
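
As a quick consistency check on the report above: coverage is hits divided by total lines, and misses are the remainder. A small sketch using the figures from the report (the variable names are illustrative only):

```python
# Figures copied from the Codecov report above.
lines = 46215
hits = 37181

# Lines that no test executed.
misses = lines - hits

# Coverage as a percentage of all tracked lines.
coverage = 100 * hits / lines

print(misses)               # 9034
print(f"{coverage:.2f}%")   # 80.45%
```

Both values match the report, which is consistent with the PR touching only recipe files and leaving the measured coverage unchanged.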

@sw005320 sw005320 added Recipe ASR Automatic speech recognition SE Speech enhancement labels Oct 11, 2022
@sw005320 sw005320 added this to the v.202211 milestone Oct 11, 2022
@mergify mergify bot added the README label Oct 12, 2022
@sw005320 sw005320 requested a review from Emrys365 October 12, 2022 11:26
Comment on lines +27 to +32
"../enh1/exp/enh_train_enh_beamformer_wpd_ci_sdr_shorttap_raw/valid.loss.best.pth:separator:enh_model.separator",
"../asr1/exp/asr_train_asr_conformer_wavlm2_raw_en_char/valid.acc.best.pth:frontend:s2t_model.frontend",
"../asr1/exp/asr_train_asr_conformer_wavlm2_raw_en_char/valid.acc.best.pth:preencoder:s2t_model.preencoder",
"../asr1/exp/asr_train_asr_conformer_wavlm2_raw_en_char/valid.acc.best.pth:encoder:s2t_model.encoder",
"../asr1/exp/asr_train_asr_conformer_wavlm2_raw_en_char/valid.acc.best.pth:ctc:s2t_model.ctc",
"../asr1/exp/asr_train_asr_conformer_wavlm2_raw_en_char/valid.acc.best.pth:decoder:s2t_model.decoder",
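
The entries above appear to follow ESPnet2's `init_param` convention, `<file_path>:<src_key>:<dst_key>[:<exclude_keys>]`, where parameters under `src_key` in the checkpoint are loaded into the submodule named `dst_key`. Below is a minimal sketch of how one entry splits into its fields. The helper `parse_init_param` and the `InitParam` tuple are hypothetical names used for illustration, not ESPnet functions.

```python
from typing import NamedTuple, Optional


class InitParam(NamedTuple):
    """Fields of one init_param entry (hypothetical container)."""
    path: str
    src_key: Optional[str]
    dst_key: Optional[str]
    exclude_keys: Optional[str]


def parse_init_param(entry: str) -> InitParam:
    """Split 'path:src_key:dst_key[:exclude_keys]' into named fields.

    Assumes the checkpoint path itself contains no ':' character.
    """
    parts = entry.split(":")
    if len(parts) > 4:
        raise ValueError(f"too many ':'-separated fields in {entry!r}")
    # Pad missing trailing fields so unpacking always sees four values.
    padded = parts + [""] * (4 - len(parts))
    path, src, dst, excl = padded
    # Treat empty fields as "not given".
    return InitParam(path, src or None, dst or None, excl or None)


entry = (
    "../enh1/exp/enh_train_enh_beamformer_wpd_ci_sdr_shorttap_raw/"
    "valid.loss.best.pth:separator:enh_model.separator"
)
parsed = parse_init_param(entry)
print(parsed.src_key, "->", parsed.dst_key)
```

Read this way, the six entries initialize the enhancement separator from the enh1 checkpoint and the ASR frontend, preencoder, encoder, CTC, and decoder from the asr1 checkpoint.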
Collaborator commented:

Could you also add configurations to obtain the models used here for initialization?

Collaborator commented:

It would be nice if you also added some notes to README.md showing how to reproduce the experiments.

Contributor commented:

My request is similar to @Emrys365's.
Can you provide a specific command (e.g., local/run_miris.sh) to reproduce the result, including how the pre-trained models are obtained (specifying an existing model or training one)?

Contributor (Author) commented:

The pre-training configs specified in this config file (train_asr_conformer_wavlm2.yaml and train_enh_beamformer_wpd_ci_sdr_shorttap.yaml) are included in this pull request.
You mean an additional script, which performs the pre-training of SE and ASR models, would be helpful, right?

Contributor commented:

> You mean an additional script, which performs the pre-training of SE and ASR models, would be helpful, right?

Yes!
Let's make local/run_miris.sh and also add a comment to README.md on how to train this model from scratch.

Comment on lines 106 to 107
utils/combine_data.sh data/tr05_multi_isolated_6ch_track data/tr05_simu_isolated_6ch_track data/tr05_real_isolated_6ch_track
utils/combine_data.sh data/${train_dev} data/dt05_simu_isolated_6ch_track data/dt05_real_isolated_6ch_track
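
For readers unfamiliar with `utils/combine_data.sh`: it merges several Kaldi-style data directories (the destination comes first, then the sources). The core idea can be sketched roughly in Python, assuming each directory holds line-oriented files such as `wav.scp`, `utt2spk`, and `text` keyed by utterance ID. The function `combine_data_dirs` is a hypothetical simplification; the real script handles more file types and validation.

```python
from pathlib import Path
from typing import Iterable, Sequence


def combine_data_dirs(
    dst: Path,
    srcs: Sequence[Path],
    names: Iterable[str] = ("wav.scp", "utt2spk", "text"),
) -> None:
    """Concatenate same-named files from each source dir into dst, sorted.

    Simplified illustration only: files missing from a source are skipped,
    and exact duplicate lines are dropped.
    """
    dst.mkdir(parents=True, exist_ok=True)
    for name in names:
        lines = []
        for src in srcs:
            path = src / name
            if path.exists():
                # Keep non-empty lines; each one starts with an utterance ID.
                lines.extend(
                    ln for ln in path.read_text().splitlines() if ln.strip()
                )
        if lines:
            # Kaldi tools expect files sorted by key.
            (dst / name).write_text("\n".join(sorted(set(lines))) + "\n")
```

With this sketch, the first command above would correspond to `combine_data_dirs(Path("data/tr05_multi_isolated_6ch_track"), [Path("data/tr05_simu_isolated_6ch_track"), Path("data/tr05_real_isolated_6ch_track")])`.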
@Emrys365 (Collaborator) commented Oct 13, 2022

Would it be better to add these subsets in egs2/chime4/enh_asr1/local/data.sh?

Contributor (Author) commented:

Thank you for pointing this out. These lines perform similar processing, so we can simply remove them.
I'll read through the details and check the generated data after this modification.

@Emrys365 (Collaborator) left a comment:

LGTM. I just left some minor comments.

@sw005320 sw005320 merged commit 209ffa0 into espnet:master Nov 11, 2022
@sw005320 (Contributor) commented:

I just merged this PR, but please make a follow-up PR, @YoshikiMas.
This will improve the reproducibility of your work.

@YoshikiMas YoshikiMas mentioned this pull request Nov 11, 2022
@YoshikiMas YoshikiMas mentioned this pull request Nov 20, 2022