
MiracleDance/CAR


CAR: Controllable AutoRegressive Modeling for Visual Generation

Ziyu Yao1,2, Jialin Li2, Yifeng Zhou2, Yong Liu2, Xi Jiang2,3, Chengjie Wang2, Feng Zheng3, Yuexian Zou1, Lei Li4

1 Peking University, 2 Tencent Youtu Lab, 3 Southern University of Science and Technology, 4 University of Washington


CAR Models

We have released the CAR-d16 weights for demo purposes. Larger models will be made available following future upgrades and extensions of CAR.

The CAR models are available on Hugging Face and can also be downloaded from the following links:

| Model | Resolution | Condition | HF weights 🤗 |
|---|---|---|---|
| CAR-d16 | 256 | Canny Edge | car_canny_d16.pth |
| CAR-d16 | 256 | HED Map | car_hed_d16.pth |
| CAR-d16 | 256 | Depth Map | car_depth_d16.pth |
| CAR-d16 | 256 | Normal Map | car_normal_d16.pth |
| CAR-d16 | 256 | Sketch | car_sketch_d16.pth |

As CAR is based on the pre-trained VAR model, the following pre-trained weights also need to be downloaded: vae_ch160v4096z32.pth, var_d16.pth.
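
Before running training or inference, it can help to confirm that all required checkpoints are in place. The following is a minimal sketch, not part of the repository: the file names come from the table and the sentence above, while the helper `missing_checkpoints` and the assumption that all files sit in one flat directory are hypothetical.

```python
from pathlib import Path

# File names as listed in this README; the flat directory layout is an assumption.
VAR_CHECKPOINTS = ("vae_ch160v4096z32.pth", "var_d16.pth")
CAR_CHECKPOINTS = {
    "canny": "car_canny_d16.pth",
    "hed": "car_hed_d16.pth",
    "depth": "car_depth_d16.pth",
    "normal": "car_normal_d16.pth",
    "sketch": "car_sketch_d16.pth",
}


def missing_checkpoints(ckpt_dir, condition):
    """Return the required checkpoint file names that are not found in ckpt_dir."""
    if condition not in CAR_CHECKPOINTS:
        raise ValueError(f"unknown condition: {condition!r}")
    required = [*VAR_CHECKPOINTS, CAR_CHECKPOINTS[condition]]
    root = Path(ckpt_dir)
    return [name for name in required if not (root / name).is_file()]


if __name__ == "__main__":
    # "checkpoints" is a placeholder directory name.
    missing = missing_checkpoints("checkpoints", "hed")
    print("missing:", missing if missing else "none")
```

An empty result means the two VAR files and the CAR file for the chosen condition are all present.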

Training

1. Prepare Dataset

The --data_path argument should point to the ImageNet dataset.

2. Extract conditions from ImageNet dataset

You can extract conditions from all 1000 ImageNet categories or from a selected subset of them. Run the following commands:

# canny
python extract_canny.py
# hed
python extract_hed.py
# depth
python extract_depth.py
# normal
python extract_normal.py
# sketch
python extract_sketch.py
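
If only some condition types are needed, the commands above could be driven from a small wrapper. This is a sketch under stated assumptions: the script names follow the `extract_<condition>.py` pattern shown above and are called without arguments, as in this README. The helpers `extraction_commands` and `run_extraction` are hypothetical and not part of the repository.

```python
import subprocess
import sys

# Condition names taken from the script names listed in this README.
CONDITIONS = ("canny", "hed", "depth", "normal", "sketch")


def extraction_commands(conditions=CONDITIONS):
    """Build one command per condition, e.g. [<python>, 'extract_hed.py']."""
    unknown = [c for c in conditions if c not in CONDITIONS]
    if unknown:
        raise ValueError(f"unknown condition(s): {unknown}")
    return [[sys.executable, f"extract_{c}.py"] for c in conditions]


def run_extraction(conditions=CONDITIONS, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in extraction_commands(conditions):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing script


if __name__ == "__main__":
    run_extraction(["canny", "depth"])  # dry run: only prints the two commands
```

Passing `dry_run=False` from the repository root would run the scripts one after another.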

3. Train CAR model

# d16, 256x256
torchrun --nproc_per_node=8 --nnodes=... --node_rank=... --master_addr=... --master_port=... train.py \
  --data_path=/path/to/imagenet --condition_path=/path/to/condition/extract/above \
  --vae_ckpt=/path/to/pretrained/vae/ckpt --pretrained_var_ckpt=/path/to/pretrained/var/ckpt \
  --tblr=0.0001 --depth=16 --bs=768 --ep=200 --fp16=1 --alng=1e-3 --wpe=0.1 
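
The `...` values in the launch command depend on your cluster and are left open here. As an illustration only, the sketch below assembles the same argument list in Python so the cluster-specific values become explicit parameters. The function `build_train_command` is hypothetical, and the single-node values in the usage part (`127.0.0.1`, port `29500`) are placeholders, not settings recommended by the authors.

```python
import shlex


def build_train_command(nnodes, node_rank, master_addr, master_port,
                        data_path, condition_path, vae_ckpt, var_ckpt,
                        nproc_per_node=8):
    """Assemble the torchrun argv for the d16, 256x256 recipe shown above."""
    launcher = [
        "torchrun",
        f"--nproc_per_node={nproc_per_node}",
        f"--nnodes={nnodes}",
        f"--node_rank={node_rank}",
        f"--master_addr={master_addr}",
        f"--master_port={master_port}",
    ]
    script = [
        "train.py",
        f"--data_path={data_path}",
        f"--condition_path={condition_path}",
        f"--vae_ckpt={vae_ckpt}",
        f"--pretrained_var_ckpt={var_ckpt}",
        # Hyperparameters copied verbatim from the README command.
        "--tblr=0.0001", "--depth=16", "--bs=768", "--ep=200",
        "--fp16=1", "--alng=1e-3", "--wpe=0.1",
    ]
    return launcher + script


if __name__ == "__main__":
    # Placeholder single-node values; adjust to your own setup.
    cmd = build_train_command(
        nnodes=1, node_rank=0, master_addr="127.0.0.1", master_port=29500,
        data_path="/path/to/imagenet",
        condition_path="/path/to/condition/extract/above",
        vae_ckpt="/path/to/pretrained/vae/ckpt",
        var_ckpt="/path/to/pretrained/var/ckpt",
    )
    print(shlex.join(cmd))
```

The printed line can be pasted into a shell, or the list can be passed to `subprocess.run` directly.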

Inference

# cls is an index ranging from 0 to 999 in the ImageNet label set
# type indicates which condition is extracted from the original image (canny, hed, depth, normal, sketch)
python inference.py --vae_ckpt=/path/to/pretrained/vae/ckpt --var_ckpt=/path/to/pretrained/var/ckpt \
  --car_ckpt=/path/to/car/ckpt --img_path=/path/to/original/image/to/extract/condition \
  --save_path=/path/to/save/image --cls=3 --type=hed
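
Since `--cls` must lie in 0 to 999 and `--type` must be one of the five condition names, a thin wrapper can catch invalid values before the models are loaded. This is a sketch, not the repository's own validation: the function `build_inference_command` is hypothetical, and only the flags shown in the command above are used.

```python
# Condition names as listed in the comment of the inference command.
CONDITION_TYPES = ("canny", "hed", "depth", "normal", "sketch")


def build_inference_command(vae_ckpt, var_ckpt, car_ckpt,
                            img_path, save_path, cls, cond_type):
    """Validate cls and cond_type, then return the inference.py argv."""
    if not isinstance(cls, int) or not 0 <= cls <= 999:
        raise ValueError(f"cls must be an integer in [0, 999], got {cls!r}")
    if cond_type not in CONDITION_TYPES:
        raise ValueError(f"type must be one of {CONDITION_TYPES}, got {cond_type!r}")
    return [
        "python", "inference.py",
        f"--vae_ckpt={vae_ckpt}",
        f"--var_ckpt={var_ckpt}",
        f"--car_ckpt={car_ckpt}",
        f"--img_path={img_path}",
        f"--save_path={save_path}",
        f"--cls={cls}",
        f"--type={cond_type}",
    ]


if __name__ == "__main__":
    cmd = build_inference_command(
        "/path/to/pretrained/vae/ckpt", "/path/to/pretrained/var/ckpt",
        "/path/to/car/ckpt", "/path/to/original/image/to/extract/condition",
        "/path/to/save/image", cls=3, cond_type="hed",
    )
    print(" ".join(cmd))
```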

Acknowledgments

The development of CAR is based on VAR. We deeply appreciate this significant contribution to the community.

Citation

If you find our work helpful in your research, we would be grateful if you could consider giving us a star ⭐ or citing it using:

@article{yao2024car,
  title={{CAR}: Controllable AutoRegressive Modeling for Visual Generation},
  author={Yao, Ziyu and Li, Jialin and Zhou, Yifeng and Liu, Yong and Jiang, Xi and Wang, Chengjie and Zheng, Feng and Zou, Yuexian and Li, Lei},
  journal={arXiv preprint arXiv:2410.04671},
  year={2024}
}
