Sihui Ji
·
Hao Luo
·
Xi Chen
·
Yuanpeng Tu
·
Yiyang Wang
·
Hengshuang Zhao
The University of Hong Kong | DAMO Academy, Alibaba Group | Hupan Lab
- [2025.06.17]: Release the inference code.
- [2025.06.04]: Release the project page and the arxiv paper.
- [2025.03.29]: LayerFlow is accepted by SIGGRAPH 2025 🎉🎉🎉.
TL;DR: We present LayerFlow, a unified solution for layer-aware video generation. Given per-layer prompts, LayerFlow generates videos for the transparent foreground, clean background, and blended scene. It also supports versatile variants like decomposing a blended video or generating the background for the given foreground and vice versa.
- Inference code
- Model checkpoints
- Training code
Begin by cloning the repository:
git clone https://github.com/SihuiJi/LayerFlow.git
cd LayerFlow
Our project is developed on the SAT-version code of CogVideoX. You can follow the CogVideoX instructions to install dependencies, or:
conda create -n layer python==3.10
conda activate layer
pip install torch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0
pip install -r requirements.txt
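As an optional sanity check (our addition, not part of the original setup), verify that PyTorch sees your GPU:
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"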
| Models | Download Link (RGB version) | Download Link (RGBA version) |
|---|---|---|
| Multi-layer generation | 🤗 Huggingface 🤖 ModelScope | 🤗 Huggingface 🤖 ModelScope |
| Multi-layer decomposition | 🤗 Huggingface 🤖 ModelScope | 🤗 Huggingface 🤖 ModelScope |
| Foreground-conditioned generation | 🤗 Huggingface 🤖 ModelScope | 🤗 Huggingface 🤖 ModelScope |
| Background-conditioned generation | 🤗 Huggingface 🤖 ModelScope | 🤗 Huggingface 🤖 ModelScope |
💡Note:
- All models are finetuned from CogVideoX-2B.
- The RGB version generates the foreground layer without an alpha-matte, while the RGBA version simultaneously generates foreground videos and their alpha-mattes, which can be combined into RGBA videos (see the sketch after this list). However, due to the difficulties of cross-domain generation and channel alignment, RGBA results are generally less stable than those of the RGB version.
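For illustration, here is a minimal sketch of merging a foreground video and its alpha-matte into one RGBA video with ffmpeg; the input file names are placeholders, not the actual outputs of the inference scripts:
# Placeholder file names; substitute the real foreground/matte outputs.
# alphamerge uses the second input's luma as the alpha channel, and the
# qtrle (QuickTime Animation) codec preserves that alpha channel on disk.
ffmpeg -i foreground_rgb.mp4 -i alpha_matte.mp4 \
  -filter_complex "[0:v][1:v]alphamerge[out]" \
  -map "[out]" -c:v qtrle foreground_rgba.mov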
Download models using huggingface-cli:
pip install "huggingface_hub[cli]"
huggingface-cli download zjuJish/LayerFlow --local-dir ./sat/ckpts_2b_lora
or using git:
git lfs install
git clone https://huggingface.co/zjuJish/LayerFlow
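If you use the git route, the checkpoints are cloned into a LayerFlow/ folder; to match the path used by the huggingface-cli command above, presumably they should be moved to ./sat/ckpts_2b_lora (our assumption; adjust to your layout):
# Assumed target path, mirroring the --local-dir of the huggingface-cli command.
mv LayerFlow sat/ckpts_2b_lora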
Download the pretrained VAE of the CogVideoX-2B model as follows:
mkdir CogVideoX-2b-sat
cd CogVideoX-2b-sat
wget https://cloud.tsinghua.edu.cn/f/fdba7608a49c463ba754/?dl=1
mv 'index.html?dl=1' vae.zip
unzip vae.zip
cd ..
Since model weight files are large, it is recommended to use git lfs; see the git lfs documentation for installation.
git lfs install
Next, clone the T5 model, which is used as an encoder and doesn’t require training or fine-tuning.
git clone https://huggingface.co/THUDM/CogVideoX-2b.git # Download model from Huggingface
# git clone https://www.modelscope.cn/ZhipuAI/CogVideoX-2b.git # Download from Modelscope
mkdir CogVideoX-2b-sat/t5-v1_1-xxl
mv CogVideoX-2b/text_encoder/* CogVideoX-2b/tokenizer/* CogVideoX-2b-sat/t5-v1_1-xxl
You may also use the corresponding model files from ModelScope.
Arrange the above model files in the following structure:
CogVideoX-2b-sat
│
├── t5-v1_1-xxl
│ ├── added_tokens.json
│ ├── config.json
│ ├── model-00001-of-00002.safetensors
│ ├── model-00002-of-00002.safetensors
│ ├── model.safetensors.index.json
│ ├── special_tokens_map.json
│ ├── spiece.model
│ └── tokenizer_config.json
└── vae
└── 3d-vae.pt
sat
│
├── ckpts_2b_lora
│   ├── multi-layer-generation
│   │   ├── 1000
│   │   │   └── mp_rank_00_model_states.pt
│   │   └── latest
│   ├── multi-layer-decomposition
│   │   ├── 1000
│   │   │   └── mp_rank_00_model_states.pt
│   │   └── latest
│   ├── foreground-conditioned-generation
│   │   ├── 1000
│   │   │   └── mp_rank_00_model_states.pt
│   │   └── latest
│   └── background-conditioned-generation
│       ├── 1000
│       │   └── mp_rank_00_model_states.pt
│       └── latest
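A quick way to confirm the layout matches the trees above (a sketch; adjust paths if your setup differs):
# Should show the t5-v1_1-xxl files and vae/3d-vae.pt listed above.
ls CogVideoX-2b-sat/t5-v1_1-xxl CogVideoX-2b-sat/vae
# Should show the four task checkpoint folders.
ls sat/ckpts_2b_lora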
cd sat
Run multi-layer generation (RGB version)
bash 'inference_stage2_gen_rgb.sh'
Run multi-layer generation (RGBA version)
bash 'inference_stage2_gen_rgba.sh'
Run multi-layer decomposition (RGB version)
bash 'inference_stage2_seg_rgb.sh'
Run multi-layer decomposition (RGBA version)
bash 'inference_stage2_seg_rgba.sh'
Run foreground-conditioned generation (RGB version)
bash 'inference_stage2_fg2bg_rgb.sh'
Run foreground-conditioned generation (RGBA version)
bash 'inference_stage2_fg2bg_rgba.sh'
Run background-conditioned generation (RGB version)
bash 'inference_stage2_bg2fg_rgb.sh'
Run background-conditioned generation (RGBA version)
bash 'inference_stage2_bg2fg_rgba.sh'
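Since the eight scripts share a common naming scheme, a batch run over, e.g., all RGB-version tasks can be sketched as follows (assuming the scripts can run back-to-back on one machine):
# Task keys taken from the script names above: gen, seg, fg2bg, bg2fg.
for task in gen seg fg2bg bg2fg; do
  bash "inference_stage2_${task}_rgb.sh"
done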
This project is developed on the codebase of CogVideoX. We appreciate this great work!
Please leave us a star 🌟 and cite our paper if you find our work helpful.
@article{ji2025layerflow,
title={LayerFlow: A Unified Model for Layer-aware Video Generation},
author={Ji, Sihui and Luo, Hao and Chen, Xi and Tu, Yuanpeng and Wang, Yiyang and Zhao, Hengshuang},
year={2025},
journal={arXiv preprint arXiv:2506.04228},
}