GitHub - krennic999/STAR: STAR: Scale-wise Text-to-image generation via Auto-Regressive representations

krennic999/STAR

Scale-wise Text-conditioned AutoRegressive image generation

ArXiv   Huggingface Weights   Project Page

Important: We have made the weights and code for STAR available in a new repository. Click here to access it!

News

Introduction

STAR, the first scale-wise text-to-image model based on VAR, supports resolutions from 256×256 to 1024×1024.

By incorporating text conditioning, normalized 2D RoPE, and causal-driven stable sampling, STAR outperforms existing models in fidelity, consistency, and quality, and generates a 1024×1024 image in 2.21s on an A100.

Detailed Introduction & Architecture

Unlike VAR, which focuses on toy category-based auto-regressive generation of 256×256 images, STAR explores the potential of the scale-wise auto-regressive paradigm in real-world scenarios, aiming to make AR as effective as diffusion models. To achieve this, we:

+ replace the single category token with a text encoder and cross-attention for detailed text guidance;
+ introduce cross-scale normalized RoPE to stabilize structural learning and reduce training costs, unleashing the power of high-resolution training;
+ propose a new sampling method to overcome the intrinsic simultaneous sampling issue in AR models.

While these approaches have been (partially) explored in diffusion models, we are the first to validate and apply them in auto-regressive image generation, resulting in high-resolution, text-conditioned synthesis with performance comparable to Stable Diffusion 2.

framework of STAR
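
This README does not spell out the formula for the cross-scale normalized RoPE mentioned above. The NumPy sketch below shows one plausible reading, not the authors' implementation: token positions of every scale's `h × w` map are rescaled onto a shared coordinate range before a standard 2D rotary embedding is applied, so tokens covering the same image region at different scales receive similar rotations. All function names, the `ref` range, and the frequency `base` are assumptions made for illustration.

```python
import numpy as np

def normalized_grid(h, w, ref=32.0):
    """Token-center coordinates of an h x w map, rescaled onto a shared [0, ref) range.

    `ref` is a hypothetical reference extent shared by all scales.
    """
    ys = (np.arange(h) + 0.5) / h * ref
    xs = (np.arange(w) + 0.5) / w * ref
    yy, xx = np.meshgrid(ys, xs, indexing="ij")
    return yy.reshape(-1), xx.reshape(-1)

def rope_angles(pos, dim, base=10000.0):
    """Rotation angles for 1D positions; `dim` must be even."""
    freqs = base ** (-np.arange(0, dim, 2) / dim)  # shape (dim/2,)
    return pos[:, None] * freqs[None, :]           # shape (n, dim/2)

def rotate(x, angles):
    """Rotate consecutive channel pairs of x by the given angles."""
    x1, x2 = x[..., 0::2], x[..., 1::2]
    cos, sin = np.cos(angles), np.sin(angles)
    out = np.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

def apply_normalized_2d_rope(x, h, w, ref=32.0):
    """Apply 2D RoPE to x of shape (h*w, d), with d divisible by 4.

    The first half of the channels encodes the row coordinate,
    the second half encodes the column coordinate.
    """
    n, d = x.shape
    assert n == h * w and d % 4 == 0
    yy, xx = normalized_grid(h, w, ref)
    half = d // 2
    out = np.empty_like(x)
    out[:, :half] = rotate(x[:, :half], rope_angles(yy, half))
    out[:, half:] = rotate(x[:, half:], rope_angles(xx, half))
    return out

# Usage: queries of a 4x4 token map with 8 channels (floating-point input).
q = np.random.default_rng(0).standard_normal((16, 8))
q_rot = apply_normalized_2d_rope(q, 4, 4)
```

Because the coordinates are normalized per scale, the single token of a 1×1 map sits at the center of the shared range, which is also the mean position of the tokens in any finer map. In this sketch that is the sense in which positions stay aligned across scales.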

Quantitative Performance

Per-category FID on MJHQ-30K

Efficiency & CLIP-Score of 1024×1024 generation

Qualitative Performance

Image 1

Reproduction

See the repo for details.

Citation

Thanks to the developers of Visual Autoregressive Modeling for their excellent work. Our code is adapted from VAR. If our work assists your research, feel free to give us a star ⭐ or cite us using:

@article{ma2024star,
  title={STAR: Scale-wise Text-conditioned AutoRegressive image generation}, 
  author={Xiaoxiao Ma and Mohan Zhou and Tao Liang and Yalong Bai and Tiejun Zhao and Biye Li and Huaian Chen and Yi Jin},
  journal={arXiv preprint arXiv:2406.10797},
  year={2024}
}
