
[SIGGRAPH'25] Official repository of LayerFlow: A Unified Model for Layer-aware Video Generation


LayerFlow: A Unified Model for Layer-aware Video Generation

Sihui Ji · Hao Luo · Xi Chen · Yuanpeng Tu · Yiyang Wang · Hengshuang Zhao

Paper PDF · Project Page
The University of Hong Kong | DAMO Academy, Alibaba Group | Hupan Lab

🔥 News

📖 Introduction

TL;DR: We present LayerFlow, a unified solution for layer-aware video generation. Given per-layer prompts, LayerFlow generates videos for the transparent foreground, the clean background, and the blended scene. It also supports versatile variants, such as decomposing a blended video into layers, or generating the background for a given foreground and vice versa.

📑 Open-source Plan

  • Inference code
  • Model checkpoints
  • Training code

🛠️ Installation

Begin by cloning the repository:

git clone https://github.com/SihuiJi/LayerFlow.git
cd LayerFlow

Our project is built on the SAT-version code of CogVideoX. You can follow the CogVideoX instructions to install dependencies, or:

conda create -n layer python=3.10
conda activate layer
pip install torch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0
pip install -r requirements.txt
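
Before downloading the large checkpoints, you can optionally sanity-check the environment; on a CUDA-capable machine this should print 2.4.0 and True:

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"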

🧱 Download Pretrained Models

| Models | Download Link (RGB version) | Download Link (RGBA version) |
|--------|-----------------------------|------------------------------|
| Multi-layer generation | 🤗 Huggingface · 🤖 ModelScope | 🤗 Huggingface · 🤖 ModelScope |
| Multi-layer decomposition | 🤗 Huggingface · 🤖 ModelScope | 🤗 Huggingface · 🤖 ModelScope |
| Foreground-conditioned generation | 🤗 Huggingface · 🤖 ModelScope | 🤗 Huggingface · 🤖 ModelScope |
| Background-conditioned generation | 🤗 Huggingface · 🤖 ModelScope | 🤗 Huggingface · 🤖 ModelScope |

💡 Note:

  • All models are finetuned from CogVideoX-2B.
  • The RGB version denotes models that generate the foreground layer without an alpha matte, while the RGBA version simultaneously generates foreground videos and their alpha mattes, which can be combined into RGBA videos (see the sketch below). However, due to the difficulty of cross-domain generation and channel alignment, its results are generally less stable than those of the RGB version.
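
As a reference for the RGBA version, one generic way to merge a foreground video with its alpha matte into a single RGBA video is ffmpeg's alphamerge filter. This is a sketch with hypothetical file names, not a script shipped with this repository:

# attach the grayscale matte as the alpha channel of the foreground (file names are placeholders)
ffmpeg -i foreground.mp4 -i alpha_matte.mp4 -filter_complex "[0:v][1:v]alphamerge" -c:v qtrle foreground_rgba.mov

qtrle (QuickTime Animation) is used here because, unlike H.264, it preserves the alpha channel.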

Download models using huggingface-cli:

pip install "huggingface_hub[cli]"
huggingface-cli download zjuJish/LayerFlow --local-dir ./sat/ckpts_2b_lora

or using git:

git lfs install
git clone https://huggingface.co/zjuJish/LayerFlow
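
All four task checkpoints live in the same repository. If you only need one of them, recent versions of huggingface-cli can filter by path pattern; a sketch, assuming the folders in the Hugging Face repo match the checkpoint layout shown below:

# "multi-layer-generation" is assumed to match the repo's folder name; adjust per task
huggingface-cli download zjuJish/LayerFlow --include "multi-layer-generation/*" --local-dir ./sat/ckpts_2b_lora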

Download the pretrained VAE of the CogVideoX-2B model as follows:

mkdir CogVideoX-2b-sat
cd CogVideoX-2b-sat
wget https://cloud.tsinghua.edu.cn/f/fdba7608a49c463ba754/?dl=1
mv 'index.html?dl=1' vae.zip
unzip vae.zip
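
After unzipping, the current directory should contain a vae folder with the checkpoint in it:

ls vae
# expected: 3d-vae.pt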

Since model weight files are large, it’s recommended to use git lfs; see the git lfs documentation for installation instructions.

git lfs install

Next, clone the T5 model, which is used as an encoder and doesn’t require training or fine-tuning.

git clone https://huggingface.co/THUDM/CogVideoX-2b.git # Download model from Huggingface
# git clone https://www.modelscope.cn/ZhipuAI/CogVideoX-2b.git # Download from Modelscope
mkdir CogVideoX-2b-sat/t5-v1_1-xxl
mv CogVideoX-2b/text_encoder/* CogVideoX-2b/tokenizer/* CogVideoX-2b-sat/t5-v1_1-xxl

Alternatively, you can download the same model files from ModelScope.

Arrange the above model files in the following structure:

CogVideoX-2b-sat
│
├── t5-v1_1-xxl
│   ├── added_tokens.json
│   ├── config.json
│   ├── model-00001-of-00002.safetensors
│   ├── model-00002-of-00002.safetensors
│   ├── model.safetensors.index.json
│   ├── special_tokens_map.json
│   ├── spiece.model
│   └── tokenizer_config.json
└── vae
    └── 3d-vae.pt

sat
│
└── ckpts_2b_lora
    ├── multi-layer-generation
    │   ├── 1000
    │   │   └── mp_rank_00_model_states.pt
    │   └── latest
    ├── multi-layer-decomposition
    │   ├── 1000
    │   │   └── mp_rank_00_model_states.pt
    │   └── latest
    ├── foreground-conditioned-generation
    │   ├── 1000
    │   │   └── mp_rank_00_model_states.pt
    │   └── latest
    └── background-conditioned-generation
        ├── 1000
        │   └── mp_rank_00_model_states.pt
        └── latest
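
To confirm the checkpoints are in place, you can list them from the repository root; this should print four paths, one per task folder:

find sat/ckpts_2b_lora -name mp_rank_00_model_states.pt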

🔑 Inference

cd sat

Run multi-layer generation (RGB version)

bash 'inference_stage2_gen_rgb.sh'

Run multi-layer generation (RGBA version)

bash 'inference_stage2_gen_rgba.sh'

Run multi-layer decomposition (RGB version)

bash 'inference_stage2_seg_rgb.sh'

Run multi-layer decomposition (RGBA version)

bash 'inference_stage2_seg_rgba.sh'

Run foreground-conditioned generation (RGB version)

bash 'inference_stage2_fg2bg_rgb.sh'

Run foreground-conditioned generation (RGBA version)

bash 'inference_stage2_fg2bg_rgba.sh'

Run background-conditioned generation (RGB version)

bash 'inference_stage2_bg2fg_rgb.sh'

Run background-conditioned generation (RGBA version)

bash 'inference_stage2_bg2fg_rgba.sh'
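
Since the project builds on the SAT code of CogVideoX, each script is expected to wrap a SAT sampling command roughly like the hypothetical skeleton below; the config names are assumptions, so check the corresponding .sh file for the actual arguments and for where the per-layer prompts are specified:

# hypothetical skeleton of an inference script; actual config files differ per task and variant
python sample_video.py --base configs/cogvideox_2b.yaml configs/inference.yaml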

🤗 Acknowledgements

This project is built on the codebase of CogVideoX. We appreciate this great work!

🌟 Citation

Please leave us a star 🌟 and cite our paper if you find our work helpful.

@article{ji2025layerflow,
  title={LayerFlow: A Unified Model for Layer-aware Video Generation},
  author={Ji, Sihui and Luo, Hao and Chen, Xi and Tu, Yuanpeng and Wang, Yiyang and Zhao, Hengshuang},
  year={2025},
  journal={arXiv preprint arXiv:2506.04228}, 
}
