
teasgen/neural_vocoder


About · Installation · How To Train · How To Evaluate · Credits · License

About

This repository contains scripts for training and evaluating HiFiGAN.

Installation

Follow these steps to install the project:

  1. (Optional) Create and activate a new environment using conda.

    # create env
    conda create -n project_env python=3.10
    
    # activate env
    conda activate project_env
  2. Install all required packages:

    pip install -r requirements.txt
  3. Install pre-commit:

    pre-commit install

How To Train

You need a single A100-80GB GPU to reproduce the training exactly; otherwise, implement and use gradient accumulation (a minimal sketch follows below).
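
If you have less GPU memory, accumulating gradients over several micro-batches approximates the larger effective batch size. Below is a minimal sketch of the pattern in plain PyTorch, with hypothetical stand-ins for this repo's model, optimizer, and dataloader; it is not the trainer code from this repo.

    import torch

    # Hypothetical stand-ins for the repo's model, optimizer, and data;
    # only the accumulation pattern itself is the point here.
    model = torch.nn.Linear(80, 1)
    optimizer = torch.optim.AdamW(model.parameters(), lr=2e-4)
    batches = [(torch.randn(16, 80), torch.randn(16, 1)) for _ in range(8)]

    accum_steps = 4  # 4 micro-batches of 16 behave like one batch of 64
    optimizer.zero_grad()
    for step, (x, y) in enumerate(batches):
        loss = torch.nn.functional.mse_loss(model(x), y)
        (loss / accum_steps).backward()  # scale so gradients average, not sum
        if (step + 1) % accum_steps == 0:
            optimizer.step()
            optimizer.zero_grad()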

To train a model, register in WandB (it is used for logging) and run the following commands.

Three-step sequential training:

  • 8k context
python3 train.py -cn hifigan.yaml \
  writer.run_name=hifigan_v1_my_v2_sr_22_05_len_8k_zero_to_hero \
  dataloader.batch_size=64 \
  model.hu=512 \
  trainer.n_epochs=200 \
  +datasets.train.rand_split=True \
  trainer.epoch_len=175
  • 22k context
python3 train.py -cn hifigan_cos_sheduler.yaml \
  writer.run_name=hifigan_v1_my_v2_sr_22_05_len_8k_zero_to_hero_resume_len22k \
  dataloader.batch_size=32 \
  model.hu=512 \
  trainer.n_epochs=500 \
  +datasets.train.rand_split=True \
  trainer.epoch_len=350 \
  datasets.train.audio_length_limit=22050 \
  datasets.test.audio_length_limit=22050 \
  trainer.resume_from=<PATH_TO_SAVING_DIR>/hifigan_v1_my_v2_sr_22_05_len_8k_zero_to_hero/checkpoint-epoch60.pth
  • 44k context
python3 train.py -cn hifigan_cos_sheduler_resume_low_lr.yaml \
  writer.run_name=hifigan_v1_my_v2_sr_22_05_len_8k_zero_to_hero_resume_2_len44k \
  dataloader.batch_size=16 \
  model.hu=512 \
  trainer.n_epochs=550 \
  +datasets.train.rand_split=True \
  trainer.epoch_len=700 \
  datasets.train.audio_length_limit=44100 \
  datasets.test.audio_length_limit=44100 \
  trainer.resume_from=<PATH_TO_SAVING_DIR>/hifigan_v1_my_v2_sr_22_05_len_8k_zero_to_hero_resume_len22k/checkpoint-epoch460.pth

Training logs are also available in WandB.

How To Evaluate

The best model can be downloaded from https://drive.google.com/file/d/1xe3kqva4BiXi0hAGMBaqEiT795AE16hn/view?usp=sharing or via the CLI:

gdown 1xe3kqva4BiXi0hAGMBaqEiT795AE16hn
tar xvf hifigan_v1_my_v2_sr_22_05_len_8k_zero_to_hero_resume_2_len44k.tar

The checkpoint will be extracted to <CURRENT_DIR>/hifigan_v1_my_v2_sr_22_05_len_8k_zero_to_hero_resume_2_len44k/checkpoint-epoch550.pth

There are three types of evaluation:

Reproduce the Wav using a Mel-spectrogram

The input is a directory with ground-truth Wavs; the output is a directory with generated Wavs. Each GT Wav is converted to a Mel-spectrogram, which HiFiGAN-repack-by-teasgen then converts back to a Wav (a sketch of the Mel step appears at the end of this subsection). Example directories with GT Wavs are located in this repo:

  • gt_wavs_lj - 5 random samples from the LJSpeech test split
  • gt_wavs - 5 random long Wavs
python3 synthesize.py -cn inference.yaml \
  inferencer.from_pretrained=hifigan_v1_my_v2_sr_22_05_len_8k_zero_to_hero_resume_2_len44k/checkpoint-epoch550.pth \
  inferencer.save_path=wav2wav_lj \
  datasets.test.wav_dir=gt_wavs_lj

Generated Wavs will be saved to <CURRENT_DIR>/data/test/<inferencer.save_path>. Instead of datasets.test.wav_dir=gt_wavs_lj you may pass a custom directory: datasets.test.wav_dir=<GT_WAVS_DIRNAME>
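
For intuition, here is a hedged sketch of the wav-to-Mel step that precedes the generator call. The Mel parameters (22.05 kHz sample rate, 80 Mel bins, n_fft 1024, hop 256) are typical HiFiGAN V1 / LJSpeech values and are an assumption, not values read from this repo's configs:

    import torch
    import torchaudio

    # Stand-in for a loaded ground-truth Wav (1 second at 22.05 kHz).
    wav = torch.randn(1, 22050)

    # Typical HiFiGAN V1 / LJSpeech Mel parameters -- an assumption, not
    # values read from this repo's configs.
    mel_fn = torchaudio.transforms.MelSpectrogram(
        sample_rate=22050, n_fft=1024, hop_length=256, n_mels=80
    )
    mel = mel_fn(wav)  # (1, 80, frames) -- this is what the generator consumes
    # wav_hat = generator(mel)  # the HiFiGAN generator maps it back to audio
    print(mel.shape)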

Generate the Wav from text (directory-of-texts version)

The input is a directory with texts; the output is a directory with generated Wavs. Each text is converted to a Mel-spectrogram using Tacotron2, and HiFiGAN-repack-by-teasgen then converts it to a Wav (a sketch of the text-to-Mel step appears at the end of this subsection). An example directory with texts is located in this repo:

  • test_data_text - transcriptions of gt_wavs
python3 synthesize.py -cn synthesize.yaml \
  inferencer.from_pretrained=hifigan_v1_my_v2_sr_22_05_len_8k_zero_to_hero_resume_2_len44k/checkpoint-epoch550.pth \
  inferencer.save_path=text_dir2wav \
  datasets.test.transcription_dir=test_data_text

Generated Wavs will be saved to <CURRENT_DIR>/data/test/<inferencer.save_path>. Instead of datasets.test.transcription_dir=test_data_text you may pass a custom directory: datasets.test.transcription_dir=<TEXTS_DIRNAME>
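
For intuition, a hedged sketch of the text-to-Mel step. One publicly available Tacotron2 checkpoint is NVIDIA's torch.hub release, used here purely as an illustration; whether synthesize.py loads this exact checkpoint is an assumption:

    import torch

    # NVIDIA's torch.hub Tacotron2 release -- an illustration of the
    # text -> Mel step, not necessarily the checkpoint this repo uses.
    tacotron2 = torch.hub.load(
        "NVIDIA/DeepLearningExamples:torchhub", "nvidia_tacotron2"
    )
    tacotron2 = tacotron2.to("cuda").eval()
    utils = torch.hub.load("NVIDIA/DeepLearningExamples:torchhub", "nvidia_tts_utils")

    sequences, lengths = utils.prepare_input_sequence(["I am Vlad"])
    with torch.no_grad():
        mel, _, _ = tacotron2.infer(sequences, lengths)  # (1, 80, frames)
    # wav_hat = generator(mel)  # HiFiGAN then turns the Mel into a waveform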

Generate the Wav from text (CLI text version)

The input text is set via the CLI; the output is a directory with generated Wavs. Note that the comma in the example below is escaped, since Hydra treats unescaped commas in override values specially.

python3 synthesize.py -cn synthesize_text_cli.yaml \
  inferencer.from_pretrained=hifigan_v1_my_v2_sr_22_05_len_8k_zero_to_hero_resume_2_len44k/checkpoint-epoch550.pth \
  inferencer.save_path=text_cli2wav \
  datasets.test.transcription="I am Vlad\, this is my pet project"

Generated wavs will be saved into <CURRENT_DIR>/data/test/<inferencer.save_path>

Neuro-MOS calculation

I am using https://github.com/AndreevP/wvmos

wvmos installation

pip install git+https://github.com/AndreevP/wvmos

Evaluation

python3 src/utils/mos_calculation.py --predicts-dir <PATH_TO_DIR_WITH_PREDICTIONS>

<PATH_TO_DIR_WITH_PREDICTIONS> is a directory with Wavs, e.g. one of the inference output directories produced above.
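
If you prefer to call wvmos directly from Python rather than through the script above, the package's documented API looks like this (the paths are placeholders):

    from wvmos import get_wvmos

    model = get_wvmos(cuda=True)  # downloads the WV-MOS checkpoint on first use

    mos_one = model.calculate_one("path/to/file.wav")        # MOS for a single Wav
    mos_dir = model.calculate_dir("path/to/dir", mean=True)  # mean MOS over a dir
    print(mos_one, mos_dir)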

Credits

This repository is based on a PyTorch Project Template.

License

See the LICENSE file in this repository.
