[magpietts][eval][bugfix] fixed infer and eval scripts and supported loading wandb hparam config. #13907
base: magpietts_2503
Conversation
Force-pushed from 5de52c6 to 9a3ecfc
Pull Request Overview
This PR fixes issues related to loading configuration files for inference and evaluation, while also simplifying ASR model selection and error reporting. Key changes include:
- Adjusting model configuration loading to support different YAML structures from TensorBoard and WandB.
- Updating sample rate usage to use model.sample_rate consistently.
- Simplifying ASR model initialization by using a unified API.
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| scripts/magpietts/infer_and_evaluate.py | Updates configuration loading logic and error messages, fixes sample rate usage, and adds support for the WandB hparam YAML structure and multiple inference modes |
| scripts/magpietts/evaluate_generated_audio.py | Simplifies ASR model initialization using a unified API based on the model name |
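For context, a minimal sketch of what loading the two YAML layouts could look like, selected by the new `hparams_file_from_wandb` flag. The key layout assumed here (WandB's `config.yaml` wrapping each hyperparameter in a `value` field) is an assumption and may differ from the script's actual implementation:

```python
from omegaconf import OmegaConf

def load_hparams(yaml_path, hparams_file_from_wandb=False):
    """Hedged sketch, not the script's actual code: normalize a TensorBoard-style
    hparams.yaml or a WandB wandb/latest-run/files/config.yaml into one flat config."""
    raw = OmegaConf.to_container(OmegaConf.load(yaml_path), resolve=True)
    if hparams_file_from_wandb:
        # Assumption: WandB's config.yaml stores each hyperparameter as
        # {key: {"value": ..., "desc": ...}} and adds bookkeeping entries
        # such as "wandb_version" and "_wandb", which are dropped here.
        raw = {
            k: v["value"]
            for k, v in raw.items()
            if isinstance(v, dict) and "value" in v
        }
    return OmegaConf.create(raw)
```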
Comments suppressed due to low confidence (2)
scripts/magpietts/infer_and_evaluate.py:157
- The variable 'checkpoint_name' is reassigned using 'checkpoint_file' after 'model.eval()'. This could cause a runtime error when running in nemo file mode, where 'checkpoint_file' is None. Consider assigning 'checkpoint_name' conditionally based on the branch used to load the model.
```python
checkpoint_name = checkpoint_file.split("/")[-1].split(".ckpt")[0]
```
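One way to address this, sketched with a hypothetical `nemo_file` variable for the .nemo branch (the script's actual variable names may differ):

```python
# Sketch of the suggested conditional assignment; `nemo_file` is hypothetical.
if checkpoint_file is not None:
    # .ckpt mode: "exp/checkpoints/epoch=3-step=1000.ckpt" -> "epoch=3-step=1000"
    checkpoint_name = checkpoint_file.split("/")[-1].split(".ckpt")[0]
else:
    # .nemo mode: checkpoint_file is None, so derive the name from the .nemo path
    checkpoint_name = nemo_file.split("/")[-1].split(".nemo")[0]
```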
scripts/magpietts/infer_and_evaluate.py:122
- [nitpick] Consider adding an inline comment or updating the function docstring to explain the purpose of the new 'hparams_file_from_wandb' parameter for better maintainability and clarity.
```python
hparams_file_from_wandb=False,
```
```python
# asr_model = nemo_asr.models.EncDecCTCModelBPE.from_pretrained(model_name="nvidia/parakeet-tdt-1.1b")
if asr_model_name in ["nvidia/parakeet-tdt-1.1b", "nvidia/parakeet-ctc-0.6b", "stt_en_conformer_transducer_large"]:
    asr_model = nemo_asr.models.ASRModel.from_pretrained(model_name=asr_model_name)
```
IIRC, `stt_en_conformer_transducer_large` and `parakeet` return slightly different things when transcribing. For one of them we need to add `[0]` to the output because it returns a list. Please check if the code works with all ASRs. Maybe this is no longer a problem, but I am quite sure at one point they were different.
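If the return types do still differ, a small defensive helper could absorb the variation. This is a sketch, not the script's code, and it assumes the shapes mentioned in this thread: a plain list of strings, a list of hypothesis objects with a `.text` attribute, or the `(best, all)` tuple some transducer models used to return:

```python
def first_transcript(result):
    """Return the transcript for the first (only) input file, whatever shape
    ASRModel.transcribe() returned. Hedged sketch; newer NeMo releases may
    have unified the output already."""
    if isinstance(result, tuple):      # older transducer API: (best_hyps, all_hyps)
        result = result[0]
    hyp = result[0]                    # single audio file -> first entry of the list
    return hyp.text if hasattr(hyp, "text") else hyp
```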
+1 and should we also remove "nvidia/parakeet-ctc-0.6b" for now since it is not giving normalized predictions?
> `stt_en_conformer_transducer_large` and `parakeet` return slightly different things when transcribing. For one of them we need to add `[0]` to the output because it returns a list
@paarthneekhara, standard ASR model inference takes a list of audio files as input, so the output is a list of transcripts; for any model you have to pick the `[0]`th item when the input list contains only a single audio file. Please check the README files of the models below. Maybe the ASR folks have fixed the issue you mentioned 😆
- `stt_en_conformer_transducer_large`: https://huggingface.co/nvidia/stt_en_conformer_transducer_large
- `parakeet-tdt-1.1b`: https://huggingface.co/nvidia/parakeet-tdt-1.1b
But in any case, this PR's change is to load pre-trained ASR models through a single API instead of separate classes. I tried to keep the same ASR models that were in the original script.
@shehzeen `parakeet-ctc-0.6b` transcribed normalized text, but `parakeet-tdt-0.6b-v2` does not. See my tries below:
```
In [1]: import nemo.collections.asr as nemo_asr

In [2]: parakeet_tdt = nemo_asr.models.ASRModel.from_pretrained(model_name="nvidia/parakeet-tdt-1.1b")

In [5]: print(parakeet_tdt.transcribe(["predicted_audio_87.wav"])[0].text)
sd desdi pass zero zero fail zero fail zero cancelled four hundred and sixteen seventy six

In [6]: stt_en_conformer_transducer_large = nemo_asr.models.ASRModel.from_pretrained(model_name="nvidia/stt_en_conformer_transducer_large")

In [7]: print(stt_en_conformer_transducer_large.transcribe(["predicted_audio_87.wav"])[0].text)
s d desti pass zero zero fail zero fail zero cancelled four hundred sixteen seventy six

In [9]: parakeet_tdt_06b_v2 = nemo_asr.models.ASRModel.from_pretrained(model_name="nvidia/parakeet-tdt-0.6b-v2")

In [10]: print(parakeet_tdt_06b_v2.transcribe(["predicted_audio_87.wav"])[0].text)
SD Des D pass 0 0 fail 0 fail 0 cancelled 416 76

In [11]: parakeet_ctc_06b = nemo_asr.models.ASRModel.from_pretrained(model_name="nvidia/parakeet-ctc-0.6b")

In [12]: print(parakeet_ctc_06b.transcribe(["predicted_audio_87.wav"])[0].text)
sd di pass zero zero fail zero fail zero canceled four hundred and sixteen seventy six
```
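Since the models disagree on casing, punctuation, and digits versus spelled-out numbers, a light text normalization pass before scoring can reduce spurious WER differences. The snippet below is a generic sketch, not what evaluate_generated_audio.py does, and it deliberately does not convert digits to words (that would need inverse text normalization):

```python
import re
import string

def normalize_transcript(text: str) -> str:
    """Lowercase, strip punctuation, and collapse whitespace before computing WER.
    Note: this does not reconcile "416 76" with "four hundred sixteen seventy six"."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    return re.sub(r"\s+", " ", text).strip()

print(normalize_transcript("SD Des D pass 0 0 fail 0 fail 0 cancelled 416 76"))
# -> "sd des d pass 0 0 fail 0 fail 0 cancelled 416 76"
```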
@paarthneekhara the examples above should also confirm that your earlier concern no longer applies.
@XuesongYang got it! Thanks for clarifying. Looks good to me.
…he for..loop of num_repeats; Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Force-pushed from 820d65c to a083bb1
- Supports loading both `*.ckpt` and `*/wandb/latest-run/files/config.yaml`.
- Loads pre-trained ASR models via `nemo_asr.models.ASRModel.from_pretrained(model_name=asr_model_name)`.
- Uses `sample_rate` by directly calling `model.sample_rate` (see the sketch below). This needs a fix after the PR "Mini Code Refactor: Typos, Yaml updates, Code changes" #13677.
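For illustration, reading the rate from the loaded model rather than a hard-coded constant might look like the following. Only `model.sample_rate` is taken from this PR; the file handling with librosa/soundfile is an assumption, not the script's implementation:

```python
import librosa
import soundfile as sf

def resample_to_model_rate(model, wav_path: str, out_path: str) -> None:
    """Hedged sketch: resample a generated wav to the model's own sample rate
    (model.sample_rate, the attribute this PR switches to) instead of a
    hard-coded value such as 22050."""
    target_sr = model.sample_rate
    audio, sr = librosa.load(wav_path, sr=None)
    if sr != target_sr:
        audio = librosa.resample(audio, orig_sr=sr, target_sr=target_sr)
    sf.write(out_path, audio, target_sr)
```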