ConvUNets have been overlooked... but they outperform Diffusion Transformers!
6/11/2025: We have released the code of DiC! 🔥🔥🔥 Weights and the SiT and REPA versions are coming very soon.
3/3/2025: Code & weights are in the final stage of inspection. We will release them ASAP.
2/27/2025: DiC is accepted by CVPR 2025! 🎉🎉
🤔 In this work, we build a diffusion model out of simple but efficient Conv3x3 blocks.
🔧 We re-design the model's architecture & blocks to tap the full potential of Conv3x3.
🚀 The proposed DiC ConvUNets are more powerful than DiTs, and much faster!
This repo is largely based on the official DiT repo. Weights and the SiT and REPA versions will be open-sourced very soon.
Torch model script: dic_models.py
Run `pip install -r requirements.txt` to install the supporting packages.
(Optional) Download the VAE from this link; it can also be downloaded automatically.
We provide two ways to train a DiC model: 1. train on the original ImageNet dataset; 2. train on preprocessed VAE features (recommended).
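Option 2 amounts to running every training image through the VAE encoder once, caching the latents, and training the diffusion model on those cached features. A minimal sketch of that pre-extraction step is below; the encoder here is a stand-in module (the real pipeline would use the Stable Diffusion VAE), and the 0.18215 scaling factor follows the DiT convention for SD-VAE latents.

```python
import torch
import torch.nn as nn

class DummyVAEEncoder(nn.Module):
    """Stand-in for the real VAE encoder: maps 256x256 RGB images
    to 4-channel 32x32 latents (8x spatial downsampling)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Conv2d(3, 4, kernel_size=8, stride=8)

    def forward(self, x):
        return self.net(x)

@torch.no_grad()
def extract_latents(encoder, images, scale=0.18215):
    # DiT-style convention: scale SD-VAE latents by 0.18215
    # before using them as diffusion targets.
    return encoder(images) * scale

encoder = DummyVAEEncoder().eval()
batch = torch.randn(2, 3, 256, 256)        # a batch of normalized images
latents = extract_latents(encoder, batch)
print(latents.shape)                        # torch.Size([2, 4, 32, 32])
```

In practice these latents would be saved to disk (e.g. one file per image) and the training loop would read them directly, skipping JPEG decoding and VAE encoding at every step.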
Training Data Preparation: use the original ImageNet dataset + the VAE encoder. First, download ImageNet so it is laid out as follows:
imagenet/
├──train/
│ ├── n01440764
│ │ ├── n01440764_10026.JPEG
│ │ ├── n01440764_10027.JPEG
│ │ ├── ......
│ ├── ......
├──val/
│ ├── n01440764
│ │ ├── ILSVRC2012_val_00000293.JPEG
│ │ ├── ILSVRC2012_val_00002138.JPEG
│ │ ├── ......
│ ├── ......
Then run the following command:
torchrun --nnodes=1 --nproc_per_node=8 train.py --data-path={path to imagenet/train} --image-size=256 --model={model name} --epochs={iteration//5000} # fp32 training
accelerate launch --mixed_precision fp16 train_accelerate.py --data-path={path to imagenet/train} --image-size=256 --model={model name} --epochs={iteration//5000} # fp16 training
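The `--epochs={iteration//5000}` conversion assumes roughly 5,000 optimizer steps per ImageNet epoch: with 1,281,167 training images and a global batch size of 256 (the DiT default; an assumption here, not stated above), one epoch is ≈5,004 steps. A small sketch of the arithmetic:

```python
# Convert a target iteration count to the --epochs value used above,
# assuming a global batch size of 256 (DiT's default).
IMAGENET_TRAIN_SIZE = 1_281_167
GLOBAL_BATCH_SIZE = 256

steps_per_epoch = IMAGENET_TRAIN_SIZE // GLOBAL_BATCH_SIZE  # ~5,000

def iters_to_epochs(iterations, steps=5000):
    """Map a desired number of training iterations to --epochs."""
    return iterations // steps

print(steps_per_epoch)           # 5004
print(iters_to_epochs(400_000))  # 80, e.g. for a 400K-iteration run
```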