

Chain-of-Zoom: Extreme Super-Resolution via Scale Autoregression and Preference Alignment

This repository is the official implementation of Chain-of-Zoom: Extreme Super-Resolution via Scale Autoregression and Preference Alignment, led by

Bryan Sangwoo Kim, Jeongsol Kim, Jong Chul Ye

main figure

Project Website arXiv


🔥 Summary

Modern single-image super-resolution (SISR) models deliver photo-realistic results at the scale factors on which they are trained, but show notable drawbacks:

  1. Blur and artifacts when pushed to magnify beyond their training regime
  2. High computational cost and the inefficiency of retraining models whenever further magnification is needed

This brings us to the fundamental question:
How can we effectively utilize super-resolution models to explore much higher resolutions than they were originally trained for?

We address this via Chain-of-Zoom 🔎, a model-agnostic framework that factorizes SISR into an autoregressive chain of intermediate scale-states with multi-scale-aware prompts. CoZ repeatedly re-uses a backbone SR model, decomposing the conditional probability into tractable sub-problems to achieve extreme resolutions without additional training. Because visual cues diminish at high magnifications, we augment each zoom step with multi-scale-aware text prompts generated by a prompt extractor VLM. This prompt extractor can be fine-tuned through GRPO with a critic VLM to further align text guidance towards human preference.
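The recursion described above can be sketched in a few lines. This is a minimal, illustrative Python sketch, not the actual implementation: `sr_step`, `extract_prompt`, and the image representation (here just a height/width pair) are all hypothetical stand-ins for the backbone SR model and the VLM prompt extractor.

```python
def extract_prompt(image, history):
    """Placeholder for the multi-scale-aware VLM prompt extractor."""
    return f"zoomed view, step {len(history)}"

def sr_step(image, prompt, factor=4):
    """Placeholder for one pass of the backbone SR model (e.g. 4x)."""
    h, w = image
    return (h * factor, w * factor)

def chain_of_zoom(image, steps=3):
    """Autoregressive chain of scale-states: the same SR backbone is
    re-used at every step, conditioned on a freshly extracted prompt."""
    history = []
    for _ in range(steps):
        prompt = extract_prompt(image, history)
        image = sr_step(image, prompt)
        history.append((image, prompt))
    return image

# Three 4x steps take a 64x64 input to an effective 4096x4096 scale-state.
print(chain_of_zoom((64, 64), steps=3))
```

The key design point is that each intermediate scale-state conditions the next step, so extreme magnification is decomposed into tractable sub-problems without retraining the backbone.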

🗓️ News

  • [May 2025] Code and paper are uploaded.

🛠️ Setup

First, create your environment. We recommend using the following commands.

git clone https://github.com/bryanswkim/Chain-of-Zoom.git
cd Chain-of-Zoom

conda create -n coz python=3.10
conda activate coz
pip install -r requirements.txt

⏳ Models

| Model | Checkpoint |
| --- | --- |
| Stable Diffusion v3 | Hugging Face |
| Qwen2.5-VL-3B-Instruct | Hugging Face |
| RAM | Hugging Face |

🌄 Example

You can quickly check the results of using CoZ with the following example:

python inference_coz.py \
  -i samples \
  -o inference_results/coz_vlmprompt \
  --rec_type recursive_multiscale \
  --prompt_type vlm \
  --lora_path ckpt/SR_LoRA/model_20001.pkl \
  --vae_path ckpt/SR_VAE/vae_encoder_20001.pt \
  --pretrained_model_name_or_path 'stabilityai/stable-diffusion-3-medium-diffusers' \
  --ram_ft_path ckpt/DAPE/DAPE.pth \
  --ram_path ckpt/RAM/ram_swin_large_14m.pth;

This will produce a result like the one below:

main figure

🔬 Efficient Memory

Using --efficient_memory allows CoZ to run on a single GPU with 24GB of VRAM, but significantly increases inference time due to model offloading.
We recommend using two GPUs.
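For single-GPU setups, the flag can be added to the example command above; the paths below simply mirror that example:

```shell
python inference_coz.py \
  -i samples \
  -o inference_results/coz_vlmprompt \
  --rec_type recursive_multiscale \
  --prompt_type vlm \
  --efficient_memory \
  --lora_path ckpt/SR_LoRA/model_20001.pkl \
  --vae_path ckpt/SR_VAE/vae_encoder_20001.pt \
  --pretrained_model_name_or_path 'stabilityai/stable-diffusion-3-medium-diffusers' \
  --ram_ft_path ckpt/DAPE/DAPE.pth \
  --ram_path ckpt/RAM/ram_swin_large_14m.pth
```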

📝 Citation

If you find our method useful, please cite it as below or leave a star on this repository.

@article{kim2025chain,
  title={Chain-of-Zoom: Extreme Super-Resolution via Scale Autoregression and Preference Alignment},
  author={Kim, Bryan Sangwoo and Kim, Jeongsol and Ye, Jong Chul},
  journal={arXiv preprint arXiv:2505.18600},
  year={2025}
}

🤗 Acknowledgements

We thank the authors of OSEDiff for sharing their awesome work!

Note

This work is currently in the preprint stage, and there may be some changes to the code.
