CAR: Controllable AutoRegressive Modeling for Visual Generation

Ziyu Yao¹˒², Jialin Li², Yifeng Zhou², Yong Liu², Xi Jiang²˒³, Chengjie Wang², Feng Zheng³, Yuexian Zou¹, Lei Li⁴

¹ Peking University, ² Tencent Youtu Lab, ³ Southern University of Science and Technology, ⁴ University of Washington

CAR Models

We have released the CAR-d16 weights for demonstration purposes. Larger models will be released as CAR is further upgraded and extended.

The CAR-d16 weights are hosted on Hugging Face 🤗 and can be downloaded from the links below:

Model	Resolution	Condition	HF weights 🤗
CAR-d16	256	Canny Edge	car_canny_d16.pth
CAR-d16	256	HED Map	car_hed_d16.pth
CAR-d16	256	Depth Map	car_depth_d16.pth
CAR-d16	256	Normal Map	car_normal_d16.pth
CAR-d16	256	Sketch	car_sketch_d16.pth

Because CAR builds on the pre-trained VAR model, you also need to download the pre-trained VAR weights: vae_ch160v4096z32.pth (the VQVAE tokenizer) and var_d16.pth (the VAR-d16 transformer).
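Running CAR-d16 therefore requires one CAR checkpoint for the chosen condition plus the two VAR checkpoints. The Python sketch below encodes that dependency using the file names listed above. The helpers `required_checkpoints` and `missing_checkpoints`, and the assumption that all checkpoints sit in a single folder, are hypothetical and not part of the CAR codebase.

```python
from pathlib import Path
from typing import List

# CAR-d16 checkpoint per condition type (file names from the table above).
CAR_CKPTS = {
    "canny": "car_canny_d16.pth",
    "hed": "car_hed_d16.pth",
    "depth": "car_depth_d16.pth",
    "normal": "car_normal_d16.pth",
    "sketch": "car_sketch_d16.pth",
}
# Pre-trained VAR components that CAR builds on.
VAR_CKPTS = ("vae_ch160v4096z32.pth", "var_d16.pth")


def required_checkpoints(condition: str) -> List[str]:
    """Return every file needed to run CAR-d16 with one condition type."""
    if condition not in CAR_CKPTS:
        raise ValueError(f"unknown condition {condition!r}; expected one of {sorted(CAR_CKPTS)}")
    return [*VAR_CKPTS, CAR_CKPTS[condition]]


def missing_checkpoints(ckpt_dir: str, condition: str) -> List[str]:
    """List required files not yet present in ckpt_dir (hypothetical single-folder layout)."""
    root = Path(ckpt_dir)
    return [name for name in required_checkpoints(condition) if not (root / name).is_file()]


if __name__ == "__main__":
    print(missing_checkpoints("checkpoints", "hed"))
```

An empty list means everything needed for that condition is in place.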

Training

1. Prepare Dataset

The --data_path argument should point to the root directory of the ImageNet dataset.
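A minimal sanity check for --data_path might look like the sketch below. It assumes CAR reuses VAR's ImageFolder-style layout (`train/` and `val/` directories, each with one sub-folder per class); `check_imagenet_layout` is a hypothetical helper, not part of the repository.

```python
from pathlib import Path
from typing import Dict, Sequence


def check_imagenet_layout(data_path: str, splits: Sequence[str] = ("train", "val")) -> Dict[str, int]:
    """Count class sub-folders per split, assuming the layout
    data_path/<split>/<class_id>/<image files>."""
    root = Path(data_path)
    counts = {}
    for split in splits:
        split_dir = root / split
        if not split_dir.is_dir():
            raise FileNotFoundError(f"missing split directory: {split_dir}")
        # Each sub-directory is treated as one ImageNet class.
        counts[split] = sum(1 for p in split_dir.iterdir() if p.is_dir())
    return counts
```

For the full ImageNet-1k dataset you would expect 1000 class folders in each split.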

2. Extract conditions from ImageNet dataset

You can extract conditions for all 1000 ImageNet categories or for only a subset of them. Run the following commands:

# canny
python extract_canny.py
# hed
python extract_hed.py
# depth
python extract_depth.py
# normal
python extract_normal.py
# sketch
python extract_sketch.py
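The choice between all categories and a subset can be sketched as follows. The helper names, the seeded random sampling, and the mirrored output layout `<cond_root>/<type>/<split>/<class>/<name>.png` are assumptions for illustration only; the extract_*.py scripts may select classes and organize their outputs differently, so check them before pointing --condition_path at the result.

```python
import random
from pathlib import Path
from typing import List, Optional


def select_classes(train_dir: str, num_classes: Optional[int] = None, seed: int = 0) -> List[str]:
    """Return the class folders whose images will get conditions extracted.

    num_classes=None keeps every category; otherwise a reproducible random
    subset of size num_classes is drawn (hypothetical selection strategy).
    """
    classes = sorted(p.name for p in Path(train_dir).iterdir() if p.is_dir())
    if num_classes is None or num_classes >= len(classes):
        return classes
    rng = random.Random(seed)
    return sorted(rng.sample(classes, num_classes))


def condition_output_path(image_path: str, data_root: str, cond_root: str, cond_type: str) -> Path:
    """Map data_root/<split>/<class>/<name>.JPEG to
    cond_root/<cond_type>/<split>/<class>/<name>.png (assumed layout)."""
    rel = Path(image_path).relative_to(data_root)
    return Path(cond_root) / cond_type / rel.with_suffix(".png")
```

Seeding the sampler keeps the selected subset identical across the different condition types, so every extracted condition covers the same classes.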

3. Train CAR model

# d16, 256x256
torchrun --nproc_per_node=8 --nnodes=... --node_rank=... --master_addr=... --master_port=... train.py \
  --data_path=/path/to/imagenet --condition_path=/path/to/extracted/conditions \
  --vae_ckpt=/path/to/pretrained/vae/ckpt --pretrained_var_ckpt=/path/to/pretrained/var/ckpt \
  --tblr=0.0001 --depth=16 --bs=768 --ep=200 --fp16=1 --alng=1e-3 --wpe=0.1
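The launch above runs 8 processes per node. Assuming CAR inherits VAR's argument conventions, --bs is the global batch size split evenly across all processes, and the peak learning rate is --tblr scaled linearly by global_batch / 256. The sketch below (hypothetical helper `per_gpu_batch_and_lr`) shows that arithmetic; verify it against the repository's argument parsing before relying on it.

```python
from typing import Tuple


def per_gpu_batch_and_lr(bs: int, tblr: float, nproc_per_node: int, nnodes: int = 1) -> Tuple[int, int, float]:
    """Return (per-GPU batch, effective global batch, peak lr).

    Assumes VAR's convention: bs is the global batch size and
    peak lr = tblr * global_batch / 256 (linear scaling rule).
    """
    world_size = nproc_per_node * nnodes
    per_gpu = round(bs / world_size)
    # The effective global batch can differ slightly from bs when bs is not divisible by world_size.
    global_batch = per_gpu * world_size
    peak_lr = tblr * global_batch / 256
    return per_gpu, global_batch, peak_lr


if __name__ == "__main__":
    print(per_gpu_batch_and_lr(bs=768, tblr=1e-4, nproc_per_node=8))
```

With a single 8-GPU node, --bs=768 gives 96 samples per GPU and a peak learning rate of 1e-4 × 768 / 256 = 3e-4. Adding nodes keeps the global batch at 768 but lowers the per-GPU batch.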

Inference

# cls is an index ranging from 0 to 999 in the ImageNet label set
# type indicates which condition is extracted from the original image (canny, hed, depth, normal, sketch)
python inference.py --vae_ckpt=/path/to/pretrained/vae/ckpt --var_ckpt=/path/to/pretrained/var/ckpt \
  --car_ckpt=/path/to/car/ckpt --img_path=/path/to/original/image/to/extract/condition \
  --save_path=/path/to/save/image --cls=3 --type=hed
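If you drive inference from Python, a small wrapper can validate --cls and --type before launching the script. The sketch below builds the same command line as above and uses only the flags shown there; `build_inference_cmd` and the example file names are hypothetical.

```python
import shlex
from typing import List

CONDITION_TYPES = ("canny", "hed", "depth", "normal", "sketch")


def build_inference_cmd(vae_ckpt: str, var_ckpt: str, car_ckpt: str, img_path: str,
                        save_path: str, cls: int, cond_type: str) -> List[str]:
    """Assemble the inference.py command line after checking cls and type."""
    if not 0 <= cls <= 999:
        raise ValueError(f"cls must be an ImageNet class index in [0, 999], got {cls}")
    if cond_type not in CONDITION_TYPES:
        raise ValueError(f"type must be one of {CONDITION_TYPES}, got {cond_type!r}")
    return [
        "python", "inference.py",
        f"--vae_ckpt={vae_ckpt}", f"--var_ckpt={var_ckpt}", f"--car_ckpt={car_ckpt}",
        f"--img_path={img_path}", f"--save_path={save_path}",
        f"--cls={cls}", f"--type={cond_type}",
    ]


if __name__ == "__main__":
    cmd = build_inference_cmd("vae_ch160v4096z32.pth", "var_d16.pth", "car_hed_d16.pth",
                              "input.jpg", "output.png", cls=3, cond_type="hed")
    print(shlex.join(cmd))  # shell-quoted form, handy for logging
```

The returned list can be passed directly to subprocess.run(cmd, check=True). Remember that --car_ckpt should be the checkpoint trained for the same condition as --type.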

Acknowledgments

The development of CAR is based on VAR. We deeply appreciate this significant contribution to the community.

Citation

If you find our work helpful in your research, we would be grateful if you could consider giving us a star ⭐ or citing it using:

@article{yao2024car,
  title={{CAR}: Controllable AutoRegressive Modeling for Visual Generation},
  author={Yao, Ziyu and Li, Jialin and Zhou, Yifeng and Liu, Yong and Jiang, Xi and Wang, Chengjie and Zheng, Feng and Zou, Yuexian and Li, Lei},
  journal={arXiv preprint arXiv:2410.04671},
  year={2024}
}

GitHub repository: MiracleDance/CAR

Folders and files

Latest commit

History

Repository files navigation

CAR: Controllable AutoRegressive Modeling for Visual Generation

CAR Models

Training

1. Prepare Dataset

2. Extract conditions from ImageNet dataset

3. Train CAR model

Inference

Acknowledgments

Citation

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages