Add ASR recipe of MAGICDATA mandarin read speech #4635
Merged: ftshijt merged 20 commits into espnet:master from tjysdsg:magicdata_mandarin_read_speech on Sep 24, 2022

Commits
6b39896 Add MAGICDATA entry to db.sh
77c945b Add boilderplate scripts for magicdata (tjysdsg)
cf710bc Add MAGICDATA data preparation scripts (tjysdsg)
9b7724e Remove unicode whitespaces in transcription (tjysdsg)
fb96781 Add training scripts and configs (tjysdsg)
e443eec Reduce batch_bins because of cuda OOM error (tjysdsg)
dfef899 Reduce grad accum and clip to avoid overfitting (tjysdsg)
de23a67 Revert chagnes to batch_bins and grad_accum/grad_clip, use default as… (tjysdsg)
f638685 Store test results in README.md (tjysdsg)
2713f7c Simplifiy training configurations (tjysdsg)
aaad835 Lower learning rate and re-train (tjysdsg)
da14886 Lower learning rate to avoid training loss hill climbing after 7-epoch (tjysdsg)
2a188ec Download and untar magicdata automatically (tjysdsg)
30d08e8 Update db.sh (tjysdsg)
2f93570 Update results (tjysdsg)
9d0f3b3 Update the recipie list in egs2/README.md (tjysdsg)
0c9b4e3 Merge https://github.com/espnet/espnet into magicdata_mandarin_read_s… (tjysdsg)
d0d3f76 Update results in README.md (tjysdsg)
1d1d6bd Add Huggingface link to README.md (tjysdsg)
17f568c Merge branch 'master' into magicdata_mandarin_read_speech (ftshijt)
@@ -0,0 +1,19 @@
# RESULTS

## Environments

- date: `Wed Sep 21 01:11:58 EDT 2022`
- python version: `3.9.12 (main, Apr 5 2022, 06:56:58) [GCC 7.5.0]`
- espnet version: `espnet 202207`
- pytorch version: `pytorch 1.8.1+cu102`
- Git hash: `9d0f3b3e1be6650d38cc5008518f445308fe06d9`
- Commit date: `Mon Sep 19 20:27:41 2022 -0400`
- Pretrained model: https://huggingface.co/espnet/jiyangtang_magicdata_asr_conformer_lm_transformer

## [Conformer](conf/tuning/train_asr_conformer.yaml) with [Transformer-LM](conf/tuning/train_lm_transformer.yaml)

### CER

| dataset | Snt | Wrd | Corr | Sub | Del | Ins | Err | S.Err |
|-----------------------------------------------------------------------------------------------|-------|--------|------|-----|-----|-----|-----|-------|
| decode_asr_rnn_lm_lm_train_lm_transformer_zh_char_valid.loss.ave_asr_model_valid.acc.ave/test | 24279 | 243325 | 96.4 | 1.7 | 2.0 | 0.1 | 3.7 | 15.6 |
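
For readers unfamiliar with the columns in the CER table, the sketch below shows how such percentages are conventionally derived from edit counts, assuming the usual sclite-style definitions (`Corr + Sub + Del = 100`, `Err = Sub + Del + Ins`, all relative to the number of reference tokens). The function name and the counts are hypothetical and are not taken from the recipe's scoring output.

```python
def error_rates(n_ref, n_sub, n_del, n_ins):
    """Return sclite-style percentages from raw edit counts.

    n_ref is the number of reference tokens (characters for CER).
    """
    n_corr = n_ref - n_sub - n_del

    def pct(count):
        return 100.0 * count / n_ref

    return {
        "Corr": pct(n_corr),
        "Sub": pct(n_sub),
        "Del": pct(n_del),
        "Ins": pct(n_ins),
        "Err": pct(n_sub + n_del + n_ins),
    }


# Hypothetical counts, chosen only to illustrate the arithmetic.
rates = error_rates(n_ref=1000, n_sub=17, n_del=20, n_ins=1)
print({name: round(value, 1) for name, value in rates.items()})
```

Because each column is rounded independently, the rounded `Sub`, `Del`, and `Ins` values in a report need not add up exactly to the rounded `Err`, which is presumably why 1.7 + 2.0 + 0.1 appears next to 3.7 in the table.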
@@ -0,0 +1 @@
../../TEMPLATE/asr1/asr.sh
@@ -0,0 +1,110 @@
# ====== About run.pl, queue.pl, slurm.pl, and ssh.pl ======
# Usage: <cmd>.pl [options] JOB=1:<nj> <log> <command...>
# e.g.
#   run.pl --mem 4G JOB=1:10 echo.JOB.log echo JOB
#
# Options:
#   --time <time>: Limit the maximum time to execute.
#   --mem <mem>: Limit the maximum memory usage.
#   --max-jobs-run <njob>: Limit the number of parallel jobs. This is ignored for non-array jobs.
#   --num-threads <nthreads>: Specify the number of CPU cores.
#   --gpu <ngpu>: Specify the number of GPU devices.
#   --config: Change the configuration file from default.
#
# "JOB=1:10" is used for "array jobs" and it can control the number of parallel jobs.
# The left string of "=", i.e. "JOB", is replaced by <N> (Nth job) in the command and the log file name,
# e.g. "echo JOB" is changed to "echo 3" for the 3rd job and "echo 8" for the 8th job.
# Note that the range must start with a positive number, so you can't use "JOB=0:10", for example.
#
# run.pl, queue.pl, slurm.pl, and ssh.pl share a unified interface that does not depend on the backend.
# These options are mapped to backend-specific options, which are
# configured by "conf/queue.conf" and "conf/slurm.conf" by default.
# If jobs fail, your configuration might be wrong for your environment.
#
#
# The official documentation for run.pl, queue.pl, slurm.pl, and ssh.pl:
#   "Parallelization in Kaldi": http://kaldi-asr.org/doc/queue.html
# ==========================================================


# Select the backend used by run.sh from "local", "stdout", "sge", "pbs", "slurm", "ssh", or "jhu"
cmd_backend='local'

# Local machine, without any job scheduling system
if [ "${cmd_backend}" = local ]; then

    # The other usage
    export train_cmd="run.pl"
    # Used for "*_train.py": "--gpu" is appended optionally by run.sh
    export cuda_cmd="run.pl"
    # Used for "*_recog.py"
    export decode_cmd="run.pl"

# Local machine logging to stdout and log file, without any job scheduling system
elif [ "${cmd_backend}" = stdout ]; then

    # The other usage
    export train_cmd="stdout.pl"
    # Used for "*_train.py": "--gpu" is appended optionally by run.sh
    export cuda_cmd="stdout.pl"
    # Used for "*_recog.py"
    export decode_cmd="stdout.pl"


# "qsub" (Sun Grid Engine, or derivation of it)
elif [ "${cmd_backend}" = sge ]; then
    # The default setting is written in conf/queue.conf.
    # You must change "-q g.q" for the "queue" for your environment.
    # To list the "queue" names, type "qhost -q".
    # Note that to use "--gpu *", you have to set up "complex_value" for the system scheduler.

    export train_cmd="queue.pl"
    export cuda_cmd="queue.pl"
    export decode_cmd="queue.pl"


# "qsub" (Torque/PBS.)
elif [ "${cmd_backend}" = pbs ]; then
    # The default setting is written in conf/pbs.conf.

    export train_cmd="pbs.pl"
    export cuda_cmd="pbs.pl"
    export decode_cmd="pbs.pl"


# "sbatch" (Slurm)
elif [ "${cmd_backend}" = slurm ]; then
    # The default setting is written in conf/slurm.conf.
    # You must change "-p cpu" and "-p gpu" for the "partition" for your environment.
    # To list the "partition" names, type "sinfo".
    # You can use "--gpu *" by default for slurm, and it is interpreted as "--gres gpu:*".
    # The devices are allocated exclusively using "${CUDA_VISIBLE_DEVICES}".

    export train_cmd="slurm.pl"
    export cuda_cmd="slurm.pl"
    export decode_cmd="slurm.pl"

elif [ "${cmd_backend}" = ssh ]; then
    # You have to create ".queue/machines" to specify the host to execute jobs.
    # e.g. .queue/machines
    #   host1
    #   host2
    #   host3
    # This assumes you can log in to them without a password, i.e. you have to set up ssh keys.

    export train_cmd="ssh.pl"
    export cuda_cmd="ssh.pl"
    export decode_cmd="ssh.pl"

# This is an example of specifying several unique options in the JHU CLSP cluster setup.
# Users can modify/add their own command options according to their cluster environments.
elif [ "${cmd_backend}" = jhu ]; then

    export train_cmd="queue.pl --mem 2G"
    export cuda_cmd="queue-freegpu.pl --mem 2G --gpu 1 --config conf/queue.conf"
    export decode_cmd="queue.pl --mem 4G"

else
    echo "$0: Error: Unknown cmd_backend=${cmd_backend}" 1>&2
    return 1
fi
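
The header comment of this file describes how `JOB=1:10` is expanded into array jobs. The Python sketch below only mimics that substitution rule as the comment states it (the name on the left of `=` is replaced by the job index in both the log file name and the command). `expand_array_job` is a hypothetical helper written for illustration; the real wrappers are Perl scripts and also handle scheduling, memory limits, and logging.

```python
from typing import List, Tuple


def expand_array_job(spec: str, log: str, command: List[str]) -> List[Tuple[str, List[str]]]:
    """Expand a spec such as "JOB=1:10" into one (log, command) pair per job."""
    name, job_range = spec.split("=")
    start, end = (int(part) for part in job_range.split(":"))
    if start < 1:
        # The comment notes that the range must start with a positive number.
        raise ValueError(f"job range must start at 1 or higher, got {start}")
    jobs = []
    for index in range(start, end + 1):
        job_log = log.replace(name, str(index))
        job_command = [arg.replace(name, str(index)) for arg in command]
        jobs.append((job_log, job_command))
    return jobs


# Mirrors the example in the comment: run.pl JOB=1:10 echo.JOB.log echo JOB
jobs = expand_array_job("JOB=1:10", "echo.JOB.log", ["echo", "JOB"])
print(jobs[2])  # ('echo.3.log', ['echo', '3'])
```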
@@ -0,0 +1 @@
tuning/decode_asr_rnn.yaml
@@ -0,0 +1,2 @@
--sample-frequency=16000
--num-mel-bins=80
@@ -0,0 +1,11 @@
# Default configuration
command qsub -V -v PATH -S /bin/bash
option name=* -N $0
option mem=* -l mem=$0
option mem=0 # Do not add anything to qsub_opts
option num_threads=* -l ncpus=$0
option num_threads=1 # Do not add anything to qsub_opts
option num_nodes=* -l nodes=$0:ppn=1
default gpu=0
option gpu=0
option gpu=* -l ngpus=$0
@@ -0,0 +1 @@
--sample-frequency=16000
@@ -0,0 +1,12 @@
# Default configuration
command qsub -v PATH -cwd -S /bin/bash -j y -l arch=*64*
option name=* -N $0
option mem=* -l mem_free=$0,ram_free=$0
option mem=0 # Do not add anything to qsub_opts
option num_threads=* -pe smp $0
option num_threads=1 # Do not add anything to qsub_opts
option max_jobs_run=* -tc $0
option num_nodes=* -pe mpi $0 # You must set this PE as allocation_rule=1
default gpu=0
option gpu=0
option gpu=* -l gpu=$0 -q g.q
@@ -0,0 +1,14 @@
# Default configuration
command sbatch --export=PATH
option name=* --job-name $0
option time=* --time $0
option mem=* --mem-per-cpu $0
option mem=0
option num_threads=* --cpus-per-task $0
option num_threads=1 --cpus-per-task 1
option num_nodes=* --nodes $0
default gpu=0
option gpu=0 -p cpu
option gpu=* -p gpu --gres=gpu:$0 -c $0 # Recommend allocating at least as many CPUs as GPUs
# note: the --max-jobs-run option is supported as a special case
# by slurm.pl and you don't have to handle it in the config file.
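
The `pbs.conf`, `queue.conf`, and `slurm.conf` files above share one line format: `command` gives the base scheduler invocation, `option name=value` maps a wrapper option to scheduler flags (with `*` as a wildcard and `$0` standing for the value), and `default` supplies a value when the option is not given. The Python sketch below illustrates that mapping under simplified assumptions: an exact `name=value` entry wins over the wildcard, and `#` starts a comment. `parse_conf` and `build_command` are hypothetical names; the real logic lives in the Perl wrappers and covers more cases.

```python
SLURM_CONF = """\
# Default configuration
command sbatch --export=PATH
option name=* --job-name $0
option time=* --time $0
option mem=* --mem-per-cpu $0
option mem=0
option num_threads=* --cpus-per-task $0
option num_threads=1 --cpus-per-task 1
option num_nodes=* --nodes $0
default gpu=0
option gpu=0 -p cpu
option gpu=* -p gpu --gres=gpu:$0 -c $0 # at least as many CPUs as GPUs
"""


def parse_conf(text):
    """Split a config into the base command, option mappings, and defaults."""
    command, options, defaults = [], {}, {}
    for raw_line in text.splitlines():
        tokens = raw_line.split("#", 1)[0].split()  # drop comments
        if not tokens:
            continue
        if tokens[0] == "command":
            command = tokens[1:]
        elif tokens[0] == "option":
            options[tokens[1]] = tokens[2:]  # e.g. "gpu=*" -> ["-p", "gpu", ...]
        elif tokens[0] == "default":
            name, value = tokens[1].split("=", 1)
            defaults[name] = value
    return command, options, defaults


def build_command(text, **user_options):
    """Translate wrapper options such as gpu=1 into scheduler arguments."""
    command, options, defaults = parse_conf(text)
    merged = dict(defaults)
    merged.update({name: str(value) for name, value in user_options.items()})
    argv = list(command)
    for name, value in merged.items():
        exact, wildcard = f"{name}={value}", f"{name}=*"
        if exact in options:  # assumption: exact match beats the wildcard
            argv += options[exact]
        elif wildcard in options:
            argv += [flag.replace("$0", value) for flag in options[wildcard]]
        else:
            raise ValueError(f"no mapping for option {name!r}")
    return argv


print(build_command(SLURM_CONF))
print(build_command(SLURM_CONF, gpu=1, mem="4G"))
```

With no options given, the `default gpu=0` entry selects the `-p cpu` partition; requesting one GPU switches to `-p gpu --gres=gpu:1 -c 1`.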
@@ -0,0 +1 @@
tuning/train_asr_conformer.yaml
@@ -0,0 +1 @@
tuning/train_lm_transformer.yaml
@@ -0,0 +1,6 @@
beam_size: 20
penalty: 0.0
maxlenratio: 0.0
minlenratio: 0.0
ctc_weight: 0.6
lm_weight: 0.3
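
To make the role of `ctc_weight`, `lm_weight`, and `penalty` concrete, here is a minimal sketch of how a hypothesis score is typically combined in hybrid CTC/attention beam search with shallow LM fusion. It assumes the common formulation in which the attention decoder is weighted by `1 - ctc_weight` and `penalty` acts as a per-token length bonus. `joint_score` and the log-probabilities are hypothetical, not values from this recipe.

```python
def joint_score(att_logp, ctc_logp, lm_logp, length,
                ctc_weight=0.6, lm_weight=0.3, penalty=0.0):
    """Combine per-hypothesis log-probabilities into one beam-search score."""
    return (
        (1.0 - ctc_weight) * att_logp  # attention decoder
        + ctc_weight * ctc_logp        # CTC prefix score
        + lm_weight * lm_logp          # language model
        + penalty * length             # length bonus (0.0 in this config)
    )


# Hypothetical log-probabilities for a 5-token hypothesis.
score = joint_score(att_logp=-1.0, ctc_logp=-2.0, lm_logp=-3.0, length=5)
print(round(score, 3))  # -2.5
```

With these weights the CTC score contributes more than the attention score, and the LM acts as an additive correction.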
@@ -0,0 +1,77 @@
# network architecture
# encoder related
encoder: conformer
encoder_conf:
    output_size: 512    # dimension of attention
    attention_heads: 8
    linear_units: 2048  # the number of units of position-wise feed forward
    num_blocks: 12      # the number of encoder blocks
    dropout_rate: 0.1
    positional_dropout_rate: 0.1
    attention_dropout_rate: 0.0
    input_layer: conv2d # encoder architecture type
    normalize_before: true
    pos_enc_layer_type: rel_pos
    selfattention_layer_type: rel_selfattn
    activation_type: swish
    macaron_style: true
    use_cnn_module: true
    cnn_module_kernel: 31

# decoder related
decoder: transformer
decoder_conf:
    attention_heads: 4
    linear_units: 2048
    num_blocks: 6
    dropout_rate: 0.1
    positional_dropout_rate: 0.1
    self_attention_dropout_rate: 0.0
    src_attention_dropout_rate: 0.0

# hybrid CTC/attention
model_conf:
    ctc_weight: 0.3
    lsm_weight: 0.1     # label smoothing option
    length_normalized_loss: false

# minibatch related
batch_type: numel
batch_bins: 20000000
num_workers: 4

# optimization related
accum_grad: 4
grad_clip: 5
max_epoch: 50
val_scheduler_criterion:
    - valid
    - acc
best_model_criterion:
-   - valid
    - acc
    - max
keep_nbest_models: 10

optim: adam
optim_conf:
    lr: 0.0005
scheduler: warmuplr
scheduler_conf:
    warmup_steps: 30000

specaug: specaug
specaug_conf:
    apply_time_warp: true
    time_warp_window: 5
    time_warp_mode: bicubic
    apply_freq_mask: true
    freq_mask_width_range:
    - 0
    - 30
    num_freq_mask: 2
    apply_time_mask: true
    time_mask_width_range:
    - 0
    - 40
    num_time_mask: 2
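
Several commits in this PR adjust the learning rate, so it may help to see what `lr: 0.0005` with `scheduler: warmuplr` and `warmup_steps: 30000` implies. The sketch below assumes the Noam-style warmup rule (linear increase up to `warmup_steps`, then inverse-square-root decay, peaking at the configured `lr`). `warmup_lr` is an illustrative function, not code from the recipe.

```python
def warmup_lr(step, base_lr=0.0005, warmup_steps=30000):
    """Learning rate at a given optimizer step (step >= 1)."""
    return base_lr * warmup_steps ** 0.5 * min(
        step ** -0.5,                 # decay branch, active after warmup
        step * warmup_steps ** -1.5,  # linear branch, active during warmup
    )


for step in (7500, 30000, 120000):
    print(step, f"{warmup_lr(step):.6f}")
```

Under this assumption the rate reaches a quarter of its peak at step 7500, the full `0.0005` at step 30000, and falls back to half the peak at step 120000. With `accum_grad: 4`, one optimizer step corresponds to four mini-batches.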
@@ -0,0 +1,29 @@
lm: transformer
lm_conf:
    pos_enc: null
    embed_unit: 128
    att_unit: 512
    head: 8
    unit: 2048
    layer: 16
    dropout_rate: 0.1

# optimization related
grad_clip: 5.0
batch_type: numel
batch_bins: 2000000
accum_grad: 1
max_epoch: 15  # 15 epochs is enough

optim: adam
optim_conf:
    lr: 0.001
scheduler: warmuplr
scheduler_conf:
    warmup_steps: 25000

best_model_criterion:
-   - valid
    - loss
    - min
keep_nbest_models: 10  # 10 is good.
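
Both training configs pair a `best_model_criterion` triple (phase, metric, `min`/`max`) with `keep_nbest_models: 10`, and the decoding directory name in the results (`valid.loss.ave`, `valid.acc.ave`) suggests that the retained checkpoints are averaged. The sketch below shows only the selection step under that reading: rank epochs by the metric and keep the best `n`. `select_nbest` and the per-epoch values are hypothetical.

```python
def select_nbest(history, mode, n):
    """Return the n best epochs by metric value; mode is "min" or "max"."""
    if mode not in ("min", "max"):
        raise ValueError(f"unknown mode: {mode}")
    ranked = sorted(history.items(), key=lambda item: item[1], reverse=(mode == "max"))
    return [epoch for epoch, _ in ranked[:n]]


# Hypothetical per-epoch validation values.
valid_loss = {1: 3.2, 2: 2.9, 3: 3.0, 4: 2.7}
valid_acc = {1: 0.90, 2: 0.95, 3: 0.93}

print(select_nbest(valid_loss, "min", 2))  # [4, 2]
print(select_nbest(valid_acc, "max", 2))   # [2, 3]
```

The LM config ranks by lowest validation loss, while the ASR config ranks by highest validation accuracy.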
@@ -0,0 +1 @@
../../TEMPLATE/asr1/db.sh