VISION XL: High Definition Video Inverse Problem Solver using Latent Diffusion Models

This repository is the official implementation of "VISION XL: High Definition Video Inverse Problem Solver using Latent Diffusion Models".

✨ Summary

We introduce a high-definition video inverse problem solver built on SDXL, a state-of-the-art latent diffusion model. Recent approaches enable image diffusion models to solve video inverse problems with significantly reduced computational requirements. Although effective, these methods have notable drawbacks:

Limited resolution support
Dependency on optical flow estimation

We propose a novel video inverse problem solver that manages the high computational demands of advanced latent diffusion models while improving temporal consistency. It offers the following advantages:

Supports multiple aspect ratios and delivers HD-resolution output
Better temporal consistency
No requirement for additional modules

Experimental results confirm that our method significantly enhances performance in solving high-definition video inverse problems, suggesting a new standard for efficiency and flexibility.

🗓 News

[16 Jan 2025] Code released.
[29 Nov 2024] Paper uploaded.

🔥 Setup

First, create your environment. We recommend using the following commands.

git clone https://github.com/vision-xl/codes.git
cd codes
conda env create -f environment.yaml

For reproducibility, use the same package versions as in environment.yaml, since some dependencies (for instance, diffusers) can produce significantly different results across versions. Diffusers will automatically download the SDXL checkpoints.
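
Since version mismatches can change results, it may help to check the installed packages before running the examples. The following is a minimal sketch and not part of this repository: `check_versions` is a hypothetical helper, and the expected versions have to be copied from environment.yaml (no specific version is assumed here).

```python
from importlib import metadata


def check_versions(expected):
    """Return {package: (expected, installed)} for every mismatch.

    `installed` is None when the package is not installed at all.
    """
    mismatches = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    # Placeholder pin: replace "<pinned version>" with the value
    # listed in environment.yaml before relying on this check.
    expected = {"diffusers": "<pinned version>"}
    for name, (wanted, installed) in check_versions(expected).items():
        print(f"{name}: expected {wanted}, found {installed}")
```

An empty result means every listed package matches its expected version.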

🚀 Examples

High-definition video inverse problem solver

Landscape (764x1280) video reconstruction sample

SAMPLE_FLAGS="--method vision-xl --degradation +deblur --ratio 0.6 --folder_path examples/assets/pexels_sample/landscape"
python -m examples.video_recon $SAMPLE_FLAGS

Vertical (1280x764) video reconstruction sample

SAMPLE_FLAGS="--method vision-xl --degradation +deblur --ratio 1.67 --folder_path examples/assets/pexels_sample/vertical"
python -m examples.video_recon $SAMPLE_FLAGS

Square (1024x1024) video reconstruction sample

SAMPLE_FLAGS="--method vision-xl --degradation +deblur --ratio 1.0 --folder_path examples/assets/pexels_sample/square"
python -m examples.video_recon $SAMPLE_FLAGS

For other degradations, run the commands above with --degradation set to the desired type. We support a wide range of degradation types: +deblur, +sr, +inpaint, deblur, sr, and inpaint.
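
To try every degradation type on one sample, the command can be generated in a loop. This is a hedged sketch rather than a script shipped with the repository: `build_command` and `build_all` are hypothetical helpers, and they use only the flags shown in the examples above. The script prints the commands as a dry run instead of executing them.

```python
import shlex

# Degradation types listed in this README.
DEGRADATIONS = ["+deblur", "+sr", "+inpaint", "deblur", "sr", "inpaint"]


def build_command(degradation, ratio, folder_path):
    """Build the argument list for one reconstruction run."""
    return [
        "python", "-m", "examples.video_recon",
        "--method", "vision-xl",
        "--degradation", degradation,
        "--ratio", str(ratio),
        "--folder_path", folder_path,
    ]


def build_all(ratio, folder_path):
    """Build one command per supported degradation type."""
    return [build_command(d, ratio, folder_path) for d in DEGRADATIONS]


if __name__ == "__main__":
    for cmd in build_all(0.6, "examples/assets/pexels_sample/landscape"):
        # Dry run: print only. To execute a command instead, pass the
        # list to subprocess.run(cmd, check=True).
        print(shlex.join(cmd))
```

Building argument lists rather than a single string avoids shell quoting issues with values such as `+deblur`.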

Tip

To use a different aspect ratio, set --ratio to one of the supported ratios defined in "SD_XL_BASE_RATIOS" in 'examples/video_recon.py'.
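
When the input video does not match a supported ratio exactly, one option is to pick the nearest one. The sketch below is illustrative only: `closest_ratio` is a hypothetical helper, and `EXAMPLE_RATIOS` contains just the three ratios used in this README, not the full "SD_XL_BASE_RATIOS" table. It assumes that the ratio is height divided by width and that resolutions are written as height x width, which is inferred from the examples above (764 / 1280 ≈ 0.6); string keys are also an assumption.

```python
# Illustrative subset only: the three ratios used in this README,
# mapped to (height, width). Consult SD_XL_BASE_RATIOS for the real table.
EXAMPLE_RATIOS = {
    "0.6": (764, 1280),
    "1.0": (1024, 1024),
    "1.67": (1280, 764),
}


def closest_ratio(height, width, ratios=EXAMPLE_RATIOS):
    """Return the supported ratio key closest to height / width."""
    if height <= 0 or width <= 0:
        raise ValueError("height and width must be positive")
    target = height / width
    return min(ratios, key=lambda key: abs(float(key) - target))


if __name__ == "__main__":
    # A 1080x1920 (height x width) landscape clip has ratio 0.5625.
    print(closest_ratio(1080, 1920))
```

The returned key can then be passed as the value of --ratio.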

📝 Citation

If you find our method useful, please leave a star on this repository.

Note

This repository is built on the official repository of CFG++. This work is currently in the preprint stage, and the code may change.
