- [x] Release EmoDubber's training and inference code (Basic Function). (Fixed on 5/27/2025)
- [x] Upload pre-processed dataset features to Baidu Cloud and Google Cloud. (Done 5/27/2025)
- [x] Release model checkpoint (Basic Function) for waveform inference. (Before 6/1/2025)
- [ ] Release EmoDubber's emotion control code (Emotion Function).
- [ ] Provide metrics testing scripts (LSE-C, LSE-D, SECS, WER, MCD).
- Clone this repository:
git clone https://github.com/GalaxyCong/EmoDubber.git
cd EmoDubber
- Create an environment
conda create -n emodub python=3.10 -y
conda activate emodub
- Install python requirements:
pip install -r requirements.txt
- Install monotonic_align
pip install git+https://github.com/resemble-ai/monotonic_align.git
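- (Optional) Quick sanity check. The sketch below assumes requirements.txt installs PyTorch Lightning and PyTorch, and that the monotonic_align package installs under the import name monotonic_align; adjust if your setup differs:
```bash
# Optional check that the key dependencies import cleanly inside the emodub env.
python -c "import lightning, torch, monotonic_align; print(lightning.__version__, torch.__version__)"
```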
- (Optional) Last step. Download trainer.py to replace your
anaconda3/envs/emodub/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py
, and download checkpoint_connector.py to replace your
anaconda3/envs/emodub/lib/python3.10/site-packages/lightning/pytorch/trainer/connectors/checkpoint_connector.py
(a sketch of the replacement commands is given below).
(Note: Step 5 is required only if you want to train the model from scratch; if you only want to run inference, please ignore it. Step 5 prevents the "Missing key(s) in state_dict" error (TTS_model.ckpt >> EmoDubber_all); I avoid this problem by setting "strict=False" in PyTorch Lightning.)
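As a rough sketch of step 5 (assuming the default Anaconda install location, the emodub environment created above, and that the downloaded trainer.py and checkpoint_connector.py sit in your current directory), the replacement could look like:
```bash
# Sketch only: back up and replace the two PyTorch Lightning files.
# Adjust SITE_PKGS if your conda installation lives somewhere else.
SITE_PKGS="$HOME/anaconda3/envs/emodub/lib/python3.10/site-packages"

cp "$SITE_PKGS/lightning/pytorch/trainer/trainer.py" \
   "$SITE_PKGS/lightning/pytorch/trainer/trainer.py.bak"
cp trainer.py "$SITE_PKGS/lightning/pytorch/trainer/trainer.py"

cp "$SITE_PKGS/lightning/pytorch/trainer/connectors/checkpoint_connector.py" \
   "$SITE_PKGS/lightning/pytorch/trainer/connectors/checkpoint_connector.py.bak"
cp checkpoint_connector.py "$SITE_PKGS/lightning/pytorch/trainer/connectors/checkpoint_connector.py"
```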
When performing training, both the Raw Audio and the Processed Features need to be downloaded. For inference, only the Processed Features are needed.
- Chem 16KHz Raw Audio: Google Drive || Baidu Drive (erap)
- Chem Processed Feature: Google Drive || Baidu Drive (nriv)
- GRID 16KHz Raw Audio: Google Drive || Baidu Drive (xikd)
- GRID Processed Feature: Google Drive || Baidu Drive (cbdy)
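As an illustration only (the directory names below are hypothetical and not prescribed by the repository; the authoritative locations are whatever paths you set in configs/data/*.yaml), one possible way to organize the downloads is:
```bash
# Hypothetical local layout for the downloaded archives; names are illustrative.
# Point the path entries in configs/data/Chem_dataset.yaml / GRID_dataset.yaml
# at wherever you actually unpack the features.
mkdir -p data/Chem/raw_audio data/Chem/features data/GRID/raw_audio data/GRID/features
# e.g. unzip the "Chem Processed Feature" archive into data/Chem/features/
```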
- Ensure the input paths are correct (see configs/data/Chem_dataset.yaml or configs/data/GRID_dataset.yaml); a quick path check is sketched after the training commands below.
- Download TTS_model.ckpt (pretrained on the LibriTTS-clean-100 dataset) and save it in the Pretrained_TTSmodel folder.
- Finally, please stay in the root directory and run directly:
python EmoDubber_Networks/Train_EmoDubber_Chem16K.py
or
python EmoDubber_Networks/Train_EmoDubber_GRID16K.py
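As a quick sanity check before training (a sketch only; the key names below are the ones referenced in the inference section, and your config may contain additional path entries), you can print the path fields of the dataset config and confirm they point at your downloaded features:
```bash
# Print the path entries in the Chem dataset config so you can verify them
# against the locations of your downloaded features (adjust for GRID).
grep -nE "filelist_path|lip_embedding_path|VA_path|Speaker_GE2E_ID_path" \
  configs/data/Chem_dataset.yaml
```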
We provide EmoDubber's checkpoints (Basic Function). We also provide the audio generated by these checkpoints; it was used for comparison with other SOTA dubbing baselines in the main settings (Setting 1 & Setting 2), i.e., without emotion control. We hope this facilitates future comparisons.
The links are given below:
- Checkpoint: Google Drive or Baidu Drive (sxus)
- Generated Result: Google Drive or Baidu Drive (heu2)
- Checkpoint: Google Drive or Baidu Drive (hv9t)
- Generated Result: Google Drive or Baidu Drive (2ibw)
- Download EmoDubber's 16 kHz Vocoder and save it to the ./Vocoder_16KHz folder.
- Run the inference script (stay in the root directory):
- For main Setting 1:
python EmoDubber_Networks/Inference_Chem_Unbatch_New_S1.py \
--checkpoint_path [model_dir] \
--vocoder_checkpoint_path [vocoder_dir] \
--Val_list [script_dir] \
--Silent_Lip [lip_dir] \
--Silent_Face [face_dir] \
--Refence_audio [reference_dir]
- For main Setting 2:
python EmoDubber_Networks/Inference_Chem_Unbatch_New_S2.py \
--checkpoint_path [model_dir] \
--vocoder_checkpoint_path [vocoder_dir] \
--Val_list [script_dir] \
--Silent_Lip [lip_dir] \
--Silent_Face [face_dir] \
--Refence_audio [reference_dir] \
--Set2_list [script2_dir]
- checkpoint_path: Path to the directory containing checkpoint files. We have provided our checkpoints.
- vocoder_checkpoint_path: Path to the vocoder that matches EmoDubber. Default is the Vocoder_16KHz folder.
- Val_list: Path to the txt script. Equal to valid_filelist_path in ./configs/data/*.yaml.
- Silent_Lip: Path to the lip-motion features. Equal to lip_embedding_path in ./configs/data/*.yaml.
- Silent_Face: Path to the face features. Equal to VA_path in ./configs/data/*.yaml.
- Refence_audio: Path to the reference audio features. Equal to Speaker_GE2E_ID_path in ./configs/data/*.yaml.
- Set2_list: Path to the txt script for Setting 2, required when running Inference_Chem_Unbatch_New_S2.py or Inference_GRID_Unbatch_New_S2.py. It avoids using the target audio as the reference; the reference audio should come from another clip. Set2_list can be downloaded here.
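For illustration, a filled-in Setting 1 invocation might look like the following (every path below is a hypothetical placeholder, not a location shipped with the repo; substitute the directories where you saved the checkpoint, vocoder, and processed features):
```bash
# Hypothetical example paths; replace each with your own download locations.
python EmoDubber_Networks/Inference_Chem_Unbatch_New_S1.py \
  --checkpoint_path ./EmoDubber_Checkpoint \
  --vocoder_checkpoint_path ./Vocoder_16KHz \
  --Val_list ./data/Chem/features/valid_filelist.txt \
  --Silent_Lip ./data/Chem/features/lip_embedding \
  --Silent_Face ./data/Chem/features/VA_feature \
  --Refence_audio ./data/Chem/features/GE2E_speaker_embedding
```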
Under construction
We provide all checkpoints. Below are the checkpoints of our emotional expert classifier.
- Five types of emotions (Recommended): https://drive.google.com/drive/folders/1vSVTAkZsoinSlYgeVCvBXs5V2k-gIurV?usp=sharing
- Seven types of emotions (Recommended): https://drive.google.com/drive/folders/1h0Y1TChA9vgX3_6u5GUrJK69n0rgKTRU?usp=sharing
- Seven types of emotions with emotionless data augmentation: https://drive.google.com/drive/folders/1DuhQYe5FuowHBRMOFRhthrvPJBlfK_5E?usp=sharing
Under construction
Code: MIT License
If you find this helpful for your research, please consider citing:
@article{cong2024emodubber,
title={EmoDubber: Towards High Quality and Emotion Controllable Movie Dubbing},
author={Cong, Gaoxiang and Pan, Jiadong and Li, Liang and Qi, Yuankai and Peng, Yuxin and Hengel, Anton van den and Yang, Jian and Huang, Qingming},
journal={arXiv preprint arXiv:2412.08988},
year={2024}
}
My email is gaoxiang.cong@vipl.ict.ac.cn.
Any discussions and suggestions are welcome!