[magpietts][eval][bugfix] fixed infer and eval scripts and supported loading wandb hparam config. #13907

XuesongYang · 2025-06-12T21:29:46Z

tensorboard and wandb save hparam config files with slightly different YAML structure. This fix will address blockers when feeding in *.ckpt and */wandb/latest-run/files/config.yaml.
simplified loading ASR models by calling the general nemo_asr.models.ASRModel.from_pretrained(model_name=asr_model_name).
fixed getting sample_rate by directly calling model.sample_rate. This needs a fix after the PR Mini Code Refactor: Typos, Yaml updates, Code changes #13677

Copilot

Pull Request Overview

This PR fixes issues related to loading configuration files for inference and evaluation, while also simplifying ASR model selection and error reporting. Key changes include:

Adjusting model configuration loading to support different YAML structures from TensorBoard and WandB.
Updating sample rate usage to use model.sample_rate consistently.
Simplifying ASR model initialization by using a unified API.

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File	Description
scripts/magpietts/infer_and_evaluate.py	Updates configuration loading logic, error messages, and fixes sample rate usage; adds support for WandB hparam YAML structure and multiple inference modes
scripts/magpietts/evaluate_generated_audio.py	Simplifies ASR model initialization using a unified API based on the model name

Comments suppressed due to low confidence (2)

scripts/magpietts/infer_and_evaluate.py:157

The variable 'checkpoint_name' is reassigned using 'checkpoint_file' after 'model.eval()'. This could cause a runtime error when running in nemo file mode, where 'checkpoint_file' is None. Consider assigning 'checkpoint_name' conditionally based on the branch used to load the model.

checkpoint_name = checkpoint_file.split("/")[-1].split(".ckpt")[0]

scripts/magpietts/infer_and_evaluate.py:122

[nitpick] Consider adding an inline comment or updating the function docstring to explain the purpose of the new 'hparams_file_from_wandb' parameter for better maintainability and clarity.

hparams_file_from_wandb=False,

paarthneekhara · 2025-06-12T22:08:34Z

scripts/magpietts/evaluate_generated_audio.py

-
-        # asr_model = nemo_asr.models.EncDecCTCModelBPE.from_pretrained(model_name="nvidia/parakeet-tdt-1.1b")
+        if asr_model_name in ["nvidia/parakeet-tdt-1.1b", "nvidia/parakeet-ctc-0.6b", "stt_en_conformer_transducer_large"]:
+            asr_model = nemo_asr.models.ASRModel.from_pretrained(model_name=asr_model_name)


IIRC, stt_en_conformer_transducer_large and parakeet return slightly different things when transcribing. For one of them we need to add [0] to the output because it returns a list. Please check if the code works with all ASRs. Maybe this is no longer a problem, but I am quite sure at one point they were different.

+1 and should we also remove "nvidia/parakeet-ctc-0.6b" for now since it is not giving normalized predictions?

stt_en_conformer_transducer_large and parakeet return slightly different things when transcribing. For one of them we need to add [0] to the output because it returns a list

@paarthneekhara, standard ASR model inference takes as input a list of audio files so that the output is a list of transcripts. The output of any model should have to choose [0]th item if the input list only contains a single audio. Please check the README files of below models. Maybe ASR folks have fixed the issue as you mentioned 😆

stt_en_conformer_transducer_large: https://huggingface.co/nvidia/stt_en_conformer_transducer_large

parakeet-tdt-1.1b: https://huggingface.co/nvidia/parakeet-tdt-1.1b

But in any case, this PR change is to load pre-trained ASR models using a single API instead of separate classes. I tried to keep the ASR models in the original script.

@shehzeen parakeet-ctc-0.6b transcibed normalized text, but parakeet-tdt-0.6b-v2 does not. See my tries below,

In [1]: import nemo.collections.asr as nemo_asr In [2]: parakeet_tdt = nemo_asr.models.ASRModel.from_pretrained(model_name="nvidia/parakeet-tdt-1.1b") In [5]: print(parakeet_tdt.transcribe(["predicted_audio_87.wav"])[0].text) sd desdi pass zero zero fail zero fail zero cancelled four hundred and sixteen seventy six In [6]: stt_en_conformer_transducer_large = nemo_asr.models.ASRModel.from_pretrained(model_name="nvidia/stt_en_conformer_transducer_large") In [7]: print(stt_en_conformer_transducer_large.transcribe(["predicted_audio_87.wav"])[0].text) s d desti pass zero zero fail zero fail zero cancelled four hundred sixteen seventy six In [9]: parakeet_tdt_06b_v2 = nemo_asr.models.ASRModel.from_pretrained(model_name="nvidia/parakeet-tdt-0.6b-v2") In [10]: print(parakeet_tdt_06b_v2.transcribe(["predicted_audio_87.wav"])[0].text) SD Des D pass 0 0 fail 0 fail 0 cancelled 416 76 In [11]: parakeet_ctc_06b = nemo_asr.models.ASRModel.from_pretrained(model_name="nvidia/parakeet-ctc-0.6b") In [12]: print(parakeet_ctc_06b.transcribe(["predicted_audio_87.wav"])[0].text) sd di pass zero zero fail zero fail zero canceled four hundred and sixteen seventy six

@paarthneekhara the above examples are also good to verify your previous concern disappears.

@XuesongYang got it! Thanks for clarifying. Looks good to me.

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>

…he for..loop of num_repeats; Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>

XuesongYang requested a review from Copilot June 12, 2025 21:29

XuesongYang force-pushed the xueyang/pr-infer-eval-fix branch from 5de52c6 to 9a3ecfc Compare June 12, 2025 21:30

This comment was marked as outdated.

Sign in to view

XuesongYang requested a review from Copilot June 12, 2025 21:52

Copilot AI reviewed Jun 12, 2025

View reviewed changes

XuesongYang added TTS Run CICD labels Jun 12, 2025

XuesongYang requested review from blisc, shehzeen and paarthneekhara June 12, 2025 21:54

paarthneekhara reviewed Jun 12, 2025

View reviewed changes

XuesongYang requested a review from paarthneekhara June 13, 2025 07:33

github-actions bot removed the TTS label Jun 13, 2025

ko3n1g added Run CICD and removed Run CICD labels Jun 13, 2025

XuesongYang added the TTS label Jun 13, 2025

shehzeen approved these changes Jun 15, 2025

View reviewed changes

Xuesong Yang and others added 5 commits June 16, 2025 13:53

[eval] bugfix infer and eval scripts and supports loading wandb config.

0c61d55

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>

minor updates.

be34ced

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>

bugfix: checkpint_name can not be parsed when checkpoint_file is None

f2c6f54

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>

refactor: moved many variables irrelevant to num repeats outside of t…

35e98eb

…he for..loop of num_repeats; Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>

fix default cer and ssim target. otherwise, float(None) errors out.

a083bb1

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>

XuesongYang force-pushed the xueyang/pr-infer-eval-fix branch from 820d65c to a083bb1 Compare June 16, 2025 20:53

github-actions bot removed the TTS label Jun 16, 2025

ko3n1g added Run CICD and removed Run CICD labels Jun 16, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[magpietts][eval][bugfix] fixed infer and eval scripts and supported loading wandb hparam config. #13907

[magpietts][eval][bugfix] fixed infer and eval scripts and supported loading wandb hparam config. #13907

Uh oh!

Uh oh!

This comment was marked as outdated.

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

[magpietts][eval][bugfix] fixed infer and eval scripts and supported loading wandb hparam config. #13907

Are you sure you want to change the base?

[magpietts][eval][bugfix] fixed infer and eval scripts and supported loading wandb hparam config. #13907

Uh oh!

Conversation

Uh oh!

Uh oh!

This comment was marked as outdated.

Uh oh!

Choose a reason for hiding this comment

Pull Request Overview

Reviewed Changes

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!