8000 [magpietts][eval][bugfix] fixed infer and eval scripts and supported loading wandb hparam config. by XuesongYang · Pull Request #13907 · NVIDIA/NeMo · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

[magpietts][eval][bugfix] fixed infer and eval scripts and supported loading wandb hparam config. #13907

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 5 commits into
base: magpietts_2503
Choose a base branch
from

Conversation

XuesongYang
Copy link
Collaborator
@XuesongYang XuesongYang commented Jun 12, 2025
  • tensorboard and wandb save hparam config files with slightly different YAML structure. This fix will address blockers when feeding in *.ckpt and */wandb/latest-run/files/config.yaml.
  • simplified loading ASR models by calling the general nemo_asr.models.ASRModel.from_pretrained(model_name=asr_model_name).
  • fixed getting sample_rate by directly calling model.sample_rate. This needs a fix after the PR Mini Code Refactor: Typos, Yaml updates, Code changes #13677

@XuesongYang XuesongYang requested a review from Copilot June 12, 2025 21:29
@XuesongYang XuesongYang force-pushed the xueyang/pr-infer-eval-fix branch from 5de52c6 to 9a3ecfc Compare June 12, 2025 21:30
Copilot

This comment was marked as outdated.

@XuesongYang XuesongYang requested a review from Copilot June 12, 2025 21:52
Copy link
@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR fixes issues related to loading configuration files for inference and evaluation, while also simplifying ASR model selection and error reporting. Key changes include:

  • Adjusting model configuration loading to support different YAML structures from TensorBoard and WandB.
  • Updating sample rate usage to use model.sample_rate consistently.
  • Simplifying ASR model initialization by using a unified API.

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File Description
scripts/magpietts/infer_and_evaluate.py Updates configuration loading logic, error messages, and fixes sample rate usage; adds support for WandB hparam YAML structure and multiple inference modes
scripts/magpietts/evaluate_generated_audio.py Simplifies ASR model initialization using a unified API based on the model name
Comments suppressed due to low confidence (2)

scripts/magpietts/infer_and_evaluate.py:157

  • The variable 'checkpoint_name' is reassigned using 'checkpoint_file' after 'model.eval()'. This could cause a runtime error when running in nemo file mode, where 'checkpoint_file' is None. Consider assigning 'checkpoint_name' conditionally based on the branch used to load the model.
checkpoint_name = checkpoint_file.split("/")[-1].split(".ckpt")[0]

scripts/magpietts/infer_and_evaluate.py:122

  • [nitpick] Consider adding an inline comment or updating the function docstring to explain the purpose of the new 'hparams_file_from_wandb' parameter for better maintainability and clarity.
hparams_file_from_wandb=False,


# asr_model = nemo_asr.models.EncDecCTCModelBPE.from_pretrained(model_name="nvidia/parakeet-tdt-1.1b")
if asr_model_name in ["nvidia/parakeet-tdt-1.1b", "nvidia/parakeet-ctc-0.6b", "stt_en_conformer_transducer_large"]:
asr_model = nemo_asr.models.ASRModel.from_pretrained(model_name=asr_model_name)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC, stt_en_conformer_transducer_large and parakeet return slightly different things when transcribing. For one of them we need to add [0] to the output because it returns a list. Please check if the code works with all ASRs. Maybe this is no longer a problem, but I am quite sure at one point they were different.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 and should we also remove "nvidia/parakeet-ctc-0.6b" for now since it is not giving normalized predictions?

Copy link
Collaborator Author
@XuesongYang XuesongYang Jun 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stt_en_conformer_transducer_large and parakeet return slightly different things when transcribing. For one of them we need to add [0] to the output because it returns a list

@paarthneekhara, standard ASR model inference takes as input a list of audio files so that the output is a list of transcripts. The output of any model should have to choose [0]th item if the input list only contains a single audio. Please check the README files of below models. Maybe ASR folks have fixed the issue as you mentioned 😆

But in any case, this PR change is to load pre-trained ASR models using a single API instead of separate classes. I tried to keep the ASR models in the original script.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shehzeen parakeet-ctc-0.6b transcibed normalized text, but parakeet-tdt-0.6b-v2 does not. See my tries below,

In [1]: import nemo.collections.asr as nemo_asr

In [2]: parakeet_tdt = nemo_asr.models.ASRModel.from_pretrained(model_name="nvidia/parakeet-tdt-1.1b")
In [5]: print(parakeet_tdt.transcribe(["predicted_audio_87.wav"])[0].text)
sd desdi pass zero zero fail zero fail zero cancelled four hundred and sixteen seventy six

In [6]: stt_en_conformer_transducer_large = nemo_asr.models.ASRModel.from_pretrained(model_name="nvidia/stt_en_conformer_transducer_large")
In [7]: print(stt_en_conformer_transducer_large.transcribe(["predicted_audio_87.wav"])[0].text)
s d desti pass zero zero fail zero fail zero cancelled four hundred sixteen seventy six

In [9]: parakeet_tdt_06b_v2 = nemo_asr.models.ASRModel.from_pretrained(model_name="nvidia/parakeet-tdt-0.6b-v2")
In [10]: print(parakeet_tdt_06b_v2.transcribe(["predicted_audio_87.wav"])[0].text)
SD Des D pass 0 0 fail 0 fail 0 cancelled 416 76

In [11]: parakeet_ctc_06b = nemo_asr.models.ASRModel.from_pretrained(model_name="nvidia/parakeet-ctc-0.6b")
In [12]: print(parakeet_ctc_06b.transcribe(["predicted_audio_87.wav"])[0].text)
sd di pass zero zero fail zero fail zero canceled four hundred and sixteen seventy six

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@paarthneekhara the above examples are also good to verify your previous concern disappears.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@XuesongYang got it! Thanks for clarifying. Looks good to me.

Xuesong Yang and others added 5 commits June 16, 2025 13:53
Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
…he for..loop of num_repeats;

Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
Signed-off-by: Xuesong Yang <1646669+XuesongYang@users.noreply.github.com>
@XuesongYang XuesongYang force-pushed the xueyang/pr-infer-eval-fix branch from 820d65c to a083bb1 Compare June 16, 2025 20:53
@github-actions github-actions bot removed the TTS label Jun 16, 2025
@ko3n1g ko3n1g added Run CICD and removed Run CICD labels Jun 16, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants
0