Dual-Expert Consistency Model for Efficient and High-Quality Video Generation

1 Nanjing University    2 The University of Hong Kong    3 Shanghai Artificial Intelligence Laboratory
4 University of Chinese Academy of Sciences    5 S-Lab, Nanyang Technological University
(*: Equal contribution; †: Corresponding authors)

Paper | Project Page

🗓️ Release

[2025/06/04] 🔥 We released the training and inference code of DCM.

💡 Introduction

Diffusion Models have achieved remarkable results in video synthesis but require iterative denoising steps, which incur substantial computational overhead. Consistency Models have made significant progress in accelerating diffusion models; however, applying them directly to video diffusion models often causes severe degradation of temporal consistency and appearance details. In this paper, by analyzing the training dynamics of Consistency Models, we identify a key conflict in the learning dynamics of the distillation process: the optimization gradients and loss contributions differ significantly across timesteps. This discrepancy prevents the distilled student model from reaching an optimal state, resulting in compromised temporal consistency and degraded appearance details.

To address this issue, we propose a parameter-efficient Dual-Expert Consistency Model (DCM), in which a semantic expert focuses on learning semantic layout and motion, while a detail expert specializes in fine-detail refinement. Furthermore, we introduce a Temporal Coherence Loss to improve motion consistency for the semantic expert, and apply GAN and Feature Matching Loss to enhance the synthesis quality of the detail expert. Our approach achieves state-of-the-art visual quality with significantly fewer sampling steps, demonstrating the effectiveness of expert specialization in video diffusion model distillation. For more details and visual results, check out our Project Page.

Figure: overview of the DCM method.
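In a few-step consistency sampler, it is natural to route the early, high-noise steps to the semantic expert (layout and motion) and the late, low-noise steps to the detail expert (appearance refinement). The sketch below illustrates such timestep-based routing under that assumption; semantic_expert, detail_expert, consistency_step, and the boundary t_boundary are hypothetical names for illustration, not the repository's actual API.

import torch

@torch.no_grad()
def dual_expert_sample(semantic_expert, detail_expert, latents, timesteps,
                       t_boundary, consistency_step):
    # Hypothetical sketch: route each denoising step to one of the two experts
    # based on the current (noise) timestep.
    for t in timesteps:  # timesteps ordered from high noise to low noise
        expert = semantic_expert if t >= t_boundary else detail_expert
        latents = consistency_step(expert, latents, t)  # one consistency-model update
    return latents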

🔧 Usage

🚀 Installation

Run the following commands to create a conda environment.

conda create -n dcm python=3.10.0
conda activate dcm
git clone https://github.com/Vchitect/DCM
cd DCM
pip install torch==2.5.0 torchvision==0.20.0 torchaudio==2.5.0 --index-url https://download.pytorch.org/whl/cu121
pip install -e .
pip install flash-attn --no-build-isolation
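As a quick sanity check of the environment, the following snippet (standard PyTorch and flash-attn imports only; not a script from this repository) should run without errors inside the dcm environment:

# Environment sanity check -- standard library calls only, not part of the DCM codebase.
import torch
import flash_attn  # verifies that the flash-attn build imports correctly

print(torch.__version__, torch.version.cuda)  # expect 2.5.0 and 12.1
print(torch.cuda.is_available())              # should print True on a CUDA machine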

🚀 Inference

First, download the checkpoints from DCM and place them in the ckpt directory. Then run the following commands to perform inference.

Note: before running a script, modify the relevant parameters and path configurations accordingly.

# For HunyuanVideo
scripts/inference/inference_hy.sh
# For WAN2.1
scripts/inference/inference_wan.sh

🚀 Distillation

👉 HunyuanVideo

[Preparation]

Data Preparation For the distillation of HunyuanVideo, we use the FastVideo-preprocessed dataset HD-Mixkit-Finetune-Hunyuan. Download it and place it under the data directory. You may also follow the instructions provided here to process and use your own data.

Model Initialization HunyuanVideo should be used as the initialization for the first-stage semantic expert distillation. Download it and place it under the pretrained directory. For the second-stage detail expert distillation, the semantic expert from the first stage should be used as the initialization.

[Training]

# For semantic expert
./scripts/distill/distill_hy_semantic_expert.sh
# For detail expert
./scripts/distill/distill_hy_detail_expert.sh

👉 WAN2.1

[Preparation]

Data Preparation For the distillation of WAN2.1, we use a self-collected dataset that is processed online, without pre-encoding videos into latent representations. The fastvideo.dataset.t2v_datasets.WANVideoDataset class can be modified accordingly to accommodate your own dataset; a minimal illustrative sketch follows the preparation notes below.

Model Initialization Wan2.1-T2V-1.3B-Diffusers should be used as the initialization for the first-stage semantic expert distillation. Download it and place it under the pretrained directory. For the second-stage detail expert distillation, the semantic expert from the first stage should be used as the initialization.
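The exact interface of WANVideoDataset is defined in the repository; purely as an illustration of what online processing implies, the hypothetical map-style dataset below returns raw frames and a caption per item and leaves latent encoding to the training loop. All names, fields, and default shapes are assumptions, not the actual class signature.

import json
from pathlib import Path

import torch
from torch.utils.data import Dataset


class MyVideoTextDataset(Dataset):
    """Hypothetical stand-in for a WANVideoDataset-style dataset: yields raw
    video frames plus a caption, with latent encoding left to the training loop."""

    def __init__(self, meta_path, num_frames=81, size=(480, 832)):
        # Metadata assumed to be a JSON list of {"video": <path>, "caption": <text>}.
        self.items = json.loads(Path(meta_path).read_text())
        self.num_frames = num_frames  # illustrative defaults, not WAN2.1 requirements
        self.size = size

    def __len__(self):
        return len(self.items)

    def __getitem__(self, idx):
        item = self.items[idx]
        frames = self._load_frames(item["video"])  # (T, C, H, W), values in [-1, 1]
        return {"pixel_values": frames, "caption": item["caption"]}

    def _load_frames(self, path):
        # Placeholder decoder; a real dataset would read `path` with e.g.
        # torchvision.io or decord and resample to `num_frames` frames.
        h, w = self.size
        return torch.zeros(self.num_frames, 3, h, w)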

[Training]

# For semantic expert
./scripts/distill/distill_wan_semantic_expert.sh
# For detail expert
./scripts/distill/distill_wan_detail_expert.sh

BibTeX

@article{lv2025dualexpert,
  author    = {Lv, Zhengyao and Si, Chenyang and Pan, Tianlin and Chen, Zhaoxi and Wong, Kwan-Yee K. and Qiao, Yu and Liu, Ziwei},
  title     = {Dual-Expert Consistency Model for Efficient and High-Quality Video Generation},
  journal   = {arXiv preprint},
  year      = {2025},
}

Acknowledgement

This repository borrows code from FastVideo. Thanks for their contributions!
