Important: We have made the weights and code for STAR available in a new repository. Click here to access it!
- [2025-02] We have released the official codebase and weights on Hugging Face!
- [2024-06] STAR Technical Report is released.
STAR, the first scale-wise text-to-image model based on VAR, supports resolutions from 256×256 to 1024×1024.
By incorporating text conditioning, normalized 2D RoPE, and causal-driven stable sampling, STAR outperforms existing models in fidelity, consistency, and quality, while generating a 1024×1024 image in 2.21 s on an A100.
CLICK for Detailed Introduction & Architecture
Unlike VAR, which focuses on toy category-conditioned auto-regressive generation at 256×256, STAR explores the potential of the scale-wise auto-regressive paradigm in real-world scenarios, aiming to make AR as effective as diffusion models. To achieve this, we:
+ replace the single category token with a text encoder and cross-attention for detailed text guidance;
+ introduce cross-scale normalized RoPE to stabilize structural learning and reduce training costs, unleashing the potential for high-resolution training;
+ propose a new sampling method to overcome the intrinsic simultaneous sampling issue in AR models.

While these approaches have been (partially) explored for diffusion models, we are the first to validate and apply them in auto-regressive image generation, enabling high-resolution, text-conditioned synthesis with performance comparable to Stable Diffusion 2.
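To illustrate the cross-scale normalization idea, here is a minimal NumPy sketch of a 2D RoPE whose grid coordinates are normalized by each scale's resolution, so tokens at different scales share a consistent coordinate frame. This is an illustrative sketch only (the function name, layout, and frequency schedule are our assumptions, not the official STAR implementation):

```python
import numpy as np

def normalized_2d_rope(x, h, w, base=10000.0):
    """Apply 2D rotary position embedding with positions normalized to [0, 1).

    Normalizing by (h, w) keeps the coordinate range identical across scales,
    which is the intuition behind cross-scale normalized RoPE.
    x: array of shape (batch, h*w, dim), dim divisible by 4
       (half the channels encode the row position, half the column position).
    NOTE: hypothetical sketch, not the official STAR code.
    """
    b, n, d = x.shape
    assert n == h * w and d % 4 == 0
    quarter = d // 4
    # Normalized grid coordinates: same range regardless of resolution.
    ys = np.arange(h) / h
    xs = np.arange(w) / w
    grid_y, grid_x = np.meshgrid(ys, xs, indexing="ij")          # (h, w)
    pos_y, pos_x = grid_y.reshape(-1), grid_x.reshape(-1)        # (n,)
    # Standard RoPE inverse-frequency schedule.
    inv_freq = 1.0 / base ** (np.arange(quarter) / quarter)      # (quarter,)
    ang = np.concatenate(
        [pos_y[:, None] * inv_freq, pos_x[:, None] * inv_freq], axis=-1
    )                                                            # (n, d//2)
    cos, sin = np.cos(ang), np.sin(ang)
    # Rotate-half formulation of RoPE.
    x1, x2 = x[..., : d // 2], x[..., d // 2:]
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)
```

Because the rotation is norm-preserving and the normalized coordinates are resolution-independent, a token at the center of an 8×8 grid receives the same positional phase as one at the center of a 32×32 grid, which is what stabilizes structure across scales.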
Per-category FID on MJHQ-30K
Efficiency & CLIP-Score of 1024×1024 generation
See the repo for details.
Thanks to the developers of Visual Autoregressive Modeling for their excellent work. Our code is adapted from VAR. If our work assists your research, feel free to give us a star ⭐ or cite us using:
```bibtex
@article{ma2024star,
  title={STAR: Scale-wise Text-conditioned AutoRegressive image generation},
  author={Xiaoxiao Ma and Mohan Zhou and Tao Liang and Yalong Bai and Tiejun Zhao and Biye Li and Huaian Chen and Yi Jin},
  journal={arXiv preprint arXiv:2406.10797},
  year={2024}
}
```