
EmoDubber: Towards High Quality and Emotion Controllable Movie Dubbing


🗒 TODO List

  • [✓] Release EmoDubber's training and inference code (Basic Function). (Fixed on 5/27/2025)
  • [✓] Upload pre-processed dataset features to Baidu Cloud and Google Cloud. (Done 5/27/2025)
  • [✓] Release model checkpoint (Basic Function) for waveform inference. (Before 6/1/2025)
  • [-] Release EmoDubber's emotion controlling code (Emotion Function).
  • [-] Provide metrics testing scripts (LSE-C, LSE-D, SECS, WER, MCD).

Illustration

Environment

  1. Clone this repository:
git clone https://github.com/GalaxyCong/EmoDubber.git
cd EmoDubber
  2. Create an environment:
conda create -n emodub python=3.10 -y
conda activate emodub
  3. Install Python requirements:
pip install -r requirements.txt
  4. Install monotonic_align:
pip install git+https://github.com/resemble-ai/monotonic_align.git
  5. (Optional) Last step: download trainer.py to replace your anaconda3/envs/emodub/lib/python3.10/site-packages/lightning/pytorch/trainer/trainer.py;
    then download checkpoint_connector.py to replace your anaconda3/envs/emodub/lib/python3.10/site-packages/lightning/pytorch/trainer/connectors/checkpoint_connector.py.

(Note: If you want to train the model from scratch, step 5 is required; if you only want to run inference, you can skip it. Step 5 prevents the "Missing key(s) in state_dict" error when loading TTS_model.ckpt into EmoDubber_all; the patched files avoid the problem by setting "strict=False" in PyTorch Lightning. A minimal sketch of that behavior follows below.)
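To see what this amounts to without touching the Lightning sources, here is a minimal, self-contained sketch (not the repository's patched files) of how strict=False behaves: keys present in the checkpoint are loaded, and keys that exist only in the larger model are left at their initialization instead of triggering the Missing key(s) error. The toy modules below are stand-ins, not EmoDubber's actual architecture.

import torch.nn as nn

# "Pretrained" weights cover only part of the full model (like TTS_model.ckpt).
pretrained = nn.Linear(4, 4)

# The full model has extra parameters the checkpoint does not contain
# (like EmoDubber_all built on top of the pretrained TTS weights).
full_model = nn.Sequential(nn.Linear(4, 4), nn.Linear(4, 2))

# Remap the pretrained keys onto the first sub-module of the full model.
ckpt = {"0." + k: v for k, v in pretrained.state_dict().items()}

# strict=True would raise "Missing key(s) in state_dict" for the second layer;
# strict=False loads the overlapping keys and simply reports the rest.
missing, unexpected = full_model.load_state_dict(ckpt, strict=False)
print("missing:", missing)        # parameters left at their random init
print("unexpected:", unexpected)  # checkpoint keys absent from the model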

Prepare Data Feature

For training, both the Raw Audio and the Processed Features need to be downloaded; for inference, only the Processed Features are needed.

Chem

GRID

Train Your Own Model

  1. Make sure the input paths are correct (see configs/data/Chem_dataset.yaml or configs/data/GRID_dataset.yaml); a small path-checking sketch follows after the training commands below.
  2. Download TTS_model.ckpt (pretrained on the LibriTTS-clean-100 dataset) and save it in the Pretrained_TTSmodel folder.
  3. Finally, from the repository root, run:
python EmoDubber_Networks/Train_EmoDubber_Chem16K.py

or

python EmoDubber_Networks/Train_EmoDubber_GRID16K.py
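Before launching a long training run, it can help to confirm that every path referenced by the data config actually resolves. A minimal hedged sketch, assuming the config stores the feature locations as plain string values (how they are nested inside configs/data/Chem_dataset.yaml is an assumption):

from pathlib import Path
import yaml

# Load the data config used by the training script.
cfg = yaml.safe_load(Path("configs/data/Chem_dataset.yaml").read_text())

def iter_path_values(node):
    # Walk the (possibly nested) config and yield strings that look like paths.
    if isinstance(node, dict):
        node = list(node.values())
    if isinstance(node, list):
        for value in node:
            yield from iter_path_values(value)
    elif isinstance(node, str) and ("/" in node or node.endswith((".txt", ".ckpt"))):
        yield node

for candidate in iter_path_values(cfg):
    status = "ok" if Path(candidate).exists() else "MISSING"
    print(f"[{status}] {candidate}")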

Our Checkpoints

We provide EmoDubber's checkpoints (Basic Function). We also provide the audio generated by these checkpoints, which was used to compare against other SOTA dubbing baselines in the main settings (Setting 1 & Setting 2), i.e., without emotion control. We hope this facilitates future comparisons.

The links are given below:

Checkpoints on Chem dataset

Checkpoints on GRID dataset

Inference

  1. Download EmoDubber's 16 kHz Vocoder and save it to the ./Vocoder_16KHz folder.

  2. Run the inference script (from the repository root):

  • For main Setting 1:
python EmoDubber_Networks/Inference_Chem_Unbatch_New_S1.py \
    --checkpoint_path [model_dir] \
    --vocoder_checkpoint_path [vocoder_dir] \
    --Val_list [script_dir] \
    --Silent_Lip [lip_dir]  \
    --Silent_Face [face_dir] \
    --Refence_audio [reference_dir]
  • For main Setting 2:
python EmoDubber_Networks/Inference_Chem_Unbatch_New_S2.py \
    --checkpoint_path [model_dir] \
    --vocoder_checkpoint_path [vocoder_dir] \
    --Val_list [script_dir] \
    --Silent_Lip [lip_dir]  \
    --Silent_Face [face_dir] \
    --Refence_audio [reference_dir] \
    --Set2_list [script2_dir]

Arguments

  • checkpoint_path: Path to the directory containing checkpoint files. We have provided our checkpoints above.
  • vocoder_checkpoint_path: Path to the vocoder that matches EmoDubber. Defaults to the Vocoder_16KHz folder.
  • Val_list: Path to the txt script. Equal to valid_filelist_path in ./configs/data/*.yaml.
  • Silent_Lip: Path to the lip-motion features. Equal to lip_embedding_path in ./configs/data/*.yaml.
  • Silent_Face: Path to the face features. Equal to VA_path in ./configs/data/*.yaml.
  • Refence_audio: Path to the reference audio features. Equal to Speaker_GE2E_ID_path in ./configs/data/*.yaml.
  • Set2_list: Path to the txt script of Setting 2, required when running Inference_Chem_Unbatch_New_S2.py or Inference_GRID_Unbatch_New_S2.py. It prevents the target audio from being used as the reference, so the reference audio comes from another clip. Set2_list can be downloaded here.
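Because each flag above mirrors a key in ./configs/data/*.yaml, one option is to assemble the command from the config instead of copying paths by hand. A minimal hedged sketch, assuming a flat yaml layout and a hypothetical checkpoint location:

import shlex
import yaml

# Read the same data config used for training (flat key layout assumed).
with open("configs/data/Chem_dataset.yaml") as f:
    cfg = yaml.safe_load(f)

cmd = [
    "python", "EmoDubber_Networks/Inference_Chem_Unbatch_New_S1.py",
    "--checkpoint_path", "path/to/EmoDubber_Chem_checkpoints",  # hypothetical location
    "--vocoder_checkpoint_path", "Vocoder_16KHz",
    "--Val_list", cfg["valid_filelist_path"],
    "--Silent_Lip", cfg["lip_embedding_path"],
    "--Silent_Face", cfg["VA_path"],
    "--Refence_audio", cfg["Speaker_GE2E_ID_path"],
]
print(shlex.join(cmd))  # paste into a shell, or pass cmd to subprocess.run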

Emotion Controlling

Under construction

Training the emotional expert classifier

We provide all checkpoints. Below are the checkpoints of our emotional expert classifier.

👉 Five types of emotions (Recommended): https://drive.google.com/drive/folders/1vSVTAkZsoinSlYgeVCvBXs5V2k-gIurV?usp=sharing

👉 Seven types of emotions (Recommended): https://drive.google.com/drive/folders/1h0Y1TChA9vgX3_6u5GUrJK69n0rgKTRU?usp=sharing

Seven types of emotions with emotionless data augmentation: https://drive.google.com/drive/folders/1DuhQYe5FuowHBRMOFRhthrvPJBlfK_5E?usp=sharing

Inference

Under construction

License

Code: MIT License

Citing

If you find this work helpful for your research, please consider citing:

@article{cong2024emodubber,
  title={EmoDubber: Towards High Quality and Emotion Controllable Movie Dubbing},
  author={Cong, Gaoxiang and Pan, Jiadong and Li, Liang and Qi, Yuankai and Peng, Yuxin and Hengel, Anton van den and Yang, Jian and Huang, Qingming},
  journal={arXiv preprint arXiv:2412.08988},
  year={2024}
}

Contact

My email is gaoxiang.cong@vipl.ict.ac.cn

Any discussions and suggestions are welcome!
