DICE-Talk is a diffusion-based emotional talking head generation method that can generate vivid and diverse emotions for speaking portraits.

DICE-Talk

Disentangle Identity, Cooperate Emotion: Correlation-Aware Emotional Talking Portrait Generation.


πŸ”₯πŸ”₯πŸ”₯ NEWS

2025/04/29: We released the initial version of the inference code and models. Stay tuned for continuous updates!

πŸŽ₯ Demo

| Input | Neutral | Happy | Angry | Surprised |
|-------|---------|-------|-------|-----------|
| Portrait 1 | 1_ne.mp4 | 1_ha.mp4 | 1_an.mp4 | 1_su.mp4 |
| Portrait 2 | 2_ne.mp4 | 2_ha.mp4 | 2_an.mp4 | 2_su.mp4 |

For more visual demos, please visit our project page.

πŸ“œ Requirements

  • A GPU with at least 20 GB of VRAM and a dedicated Python 3.10 environment are recommended.
  • Tested operating system: Linux

πŸ”‘ Inference

Installation

  • ffmpeg must be installed.
  • PyTorch: make sure to select the appropriate CUDA version based on your hardware, for example,
pip install torch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2 --index-url https://download.pytorch.org/whl/cu118
  • Dependencies:
pip install -r requirements.txt
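The requirements file pins exact versions. As a quick sanity check, a stdlib-only sketch (the `check_pins` helper is ours, not part of the repo) can compare each pin against what is actually installed:

```python
from importlib.metadata import version, PackageNotFoundError

def check_pins(requirements_text):
    """Map each 'pkg==ver' line to (pinned_version, installed_version_or_None)."""
    report = {}
    for line in requirements_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue  # skip blanks, comments, and unpinned requirements
        name, pinned = line.split("==", 1)
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package not installed at all
        report[name.strip()] = (pinned.strip(), installed)
    return report

# Example: pins taken from the PyTorch command above
print(check_pins("torch==2.2.2\ntorchvision==0.17.2"))
```

A `None` or mismatched installed version here is a common cause of import errors later on.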
  • All models are stored in checkpoints by default, and the file structure is as follows:
DICE-Talk
  ├──checkpoints
  │  ├──DICE-Talk
  │  │  ├──audio_linear.pth
  │  │  ├──emo_model.pth
  │  │  ├──pose_guider.pth
  │  │  ├──unet.pth
  │  ├──stable-video-diffusion-img2vid-xt
  │  │  ├──...
  │  ├──whisper-tiny
  │  │  ├──...
  │  ├──RIFE
  │  │  ├──flownet.pkl
  │  ├──yoloface_v5m.pt
  ├──...

Download them with huggingface-cli as follows:

python3 -m pip install "huggingface_hub[cli]"

huggingface-cli download EEEELY/DICE-Talk --local-dir checkpoints
huggingface-cli download stabilityai/stable-video-diffusion-img2vid-xt --local-dir checkpoints/stable-video-diffusion-img2vid-xt
huggingface-cli download openai/whisper-tiny --local-dir checkpoints/whisper-tiny

Or manually download the pretrained model, svd-xt, and whisper-tiny to checkpoints/.

Run demo

python3 demo.py --image_path '/path/to/input_image' --audio_path '/path/to/input_audio' \
  --emotion_path '/path/to/input_emotion' --output_path '/path/to/output_video'
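When batch-processing many portraits, the command above can be assembled programmatically and handed to `subprocess`. A sketch; the `demo_command` helper and the sample file names are hypothetical, only the demo.py flags come from the command above:

```python
import shlex
import subprocess

def demo_command(image_path, audio_path, emotion_path, output_path):
    """Assemble the demo.py invocation as an argv list (no shell quoting needed)."""
    return [
        "python3", "demo.py",
        "--image_path", image_path,
        "--audio_path", audio_path,
        "--emotion_path", emotion_path,
        "--output_path", output_path,
    ]

# Hypothetical input files, for illustration only
cmd = demo_command("face.png", "speech.wav", "happy.npy", "out.mp4")
print(shlex.join(cmd))  # inspect the exact command line
# subprocess.run(cmd, check=True)  # uncomment to actually run it
```

Passing an argv list rather than a shell string avoids quoting problems with paths that contain spaces.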

Run GUI

python3 gradio_app.py

On the left you need to:

  • Upload an image or take a photo
  • Upload or record an audio clip
  • Select the type of emotion to generate
  • Set the strength for identity preservation and emotion generation
  • Choose whether to crop the input image

On the right are the generated videos.

πŸ”— Citation

If you find our work helpful for your research, please consider citing our work.

@article{tan2025dicetalk,
  title={Disentangle Identity, Cooperate Emotion: Correlation-Aware Emotional Talking Portrait Generation}, 
  author={Tan, Weipeng and Lin, Chuming and Xu, Chengming and Xu, FeiFan and Hu, Xiaobin and Ji, Xiaozhong and Zhu, Junwei and Wang, Chengjie and Fu, Yanwei},
  journal={arXiv preprint arXiv:2504.18087},
  year={2025}
}

@article{ji2024sonic,
  title={Sonic: Shifting Focus to Global Audio Perception in Portrait Animation},
  author={Ji, Xiaozhong and Hu, Xiaobin and Xu, Zhihong and Zhu, Junwei and Lin, Chuming and He, Qingdong and Zhang, Jiangning and Luo, Donghao and Chen, Yi and Lin, Qin and others},
  journal={arXiv preprint arXiv:2411.16331},
  year={2024}
}

@article{ji2024realtalk,
  title={Realtalk: Real-time and realistic audio-driven face generation with 3d facial prior-guided identity alignment network},
  author={Ji, Xiaozhong and Lin, Chuming and Ding, Zhonggan and Tai, Ying and Zhu, Junwei and Hu, Xiaobin and Luo, Donghao and Ge, Yanhao and Wang, Chengjie},
  journal={arXiv preprint arXiv:2406.18284},
  year={2024}
}
