🚀 LinGen: Towards High-Resolution Minute-Length Text-to-Video Generation with Linear Computational Complexity
This repository contains an unofficial implementation of the MATE block from the paper:
LinGen: Towards High-Resolution Minute-Length Text-to-Video Generation with Linear Computational Complexity
Hongjie Wang, Chih-Yao Ma, Yen-Cheng Liu, Ji Hou, Tao Xu, Jialiang Wang, Felix Juefei-Xu, Yaqiao Luo, Peizhao Zhang, Tingbo Hou, Peter Vajda, Niraj K. Jha, Xiaoliang Dai
CVPR 2025
This is an unofficial implementation of the MATE (MA-branch + TE-branch) block described in the LinGen paper, built on top of the PixArt codebase. The implementation enables linear computational complexity for text-to-video generation by replacing the quadratic-complexity self-attention with the proposed MATE block.
- 🔧 MATE Block Implementation: Custom implementation of the MA-branch and the TE-branch
- 📹 Video Support: Extended PixArt architecture to handle video data
- ⚡ Linear Complexity: Replaces the quadratic-complexity self-attention with the linear-complexity MATE block
- 🎨 Based on PixArt: Built upon the Diffusion Transformer architecture of PixArt
The MATE block consists of two main components (a minimal PyTorch sketch of how they fit together follows this list):
- MA-branch:
  - Bidirectional Mamba2 block for short-to-long-range token correlations
  - Rotary Major Scan (RMS) for token rearrangement at almost no extra cost
  - Review tokens for enhanced long video generation
  - Implementation located in the `mamba_blocks/` directory
- TE-branch:
  - Temporal Swin Attention block for spatially adjacent and temporally medium-range correlations
  - Completely addresses the adjacency-preservation issues of Mamba
  - Implementation located in `temporal_swin_attn.py`
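For orientation, here is a minimal PyTorch sketch of how the two branches could fit together. It assumes the `mamba_ssm` package (for `Mamba2`) is available, that the temporal length is divisible by the window size, and that tokens are flattened in T-then-H-then-W order; the sum-based branch fusion, the review-token handling, and the use of `torch.roll` as a stand-in for the Rotary Major Scan are simplifications for illustration, not the behavior of `mamba_blocks/` or of the official LinGen code.

```python
import torch
import torch.nn as nn
from mamba_ssm import Mamba2  # assumed dependency of the MA-branch


class MABranch(nn.Module):
    """MA-branch sketch: bidirectional Mamba2 over the flattened video tokens,
    with learnable review tokens prepended to carry long-range context."""

    def __init__(self, dim: int, num_review_tokens: int = 4):
        super().__init__()
        self.fwd = Mamba2(d_model=dim)
        self.bwd = Mamba2(d_model=dim)
        self.review = nn.Parameter(torch.zeros(1, num_review_tokens, dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, N, C)
        r = self.review.expand(x.shape[0], -1, -1)
        x = torch.cat([r, x], dim=1)
        out = self.fwd(x) + self.bwd(x.flip(1)).flip(1)   # forward + reversed scan
        return out[:, r.shape[1]:]                        # drop the review tokens


class TEBranch(nn.Module):
    """TE-branch sketch: each spatial location attends within non-overlapping
    temporal windows (a simplified placeholder for temporal_swin_attn.py)."""

    def __init__(self, dim: int, num_heads: int = 8, window: int = 4):
        super().__init__()
        self.window = window
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor, T: int, H: int, W: int) -> torch.Tensor:
        B, N, C = x.shape                                  # N == T * H * W
        w = self.window
        # Group tokens into temporal windows per spatial location.
        x = x.view(B, T // w, w, H * W, C).permute(0, 3, 1, 2, 4).reshape(-1, w, C)
        x, _ = self.attn(x, x, x)
        # Restore the original (T, H*W) token ordering.
        x = x.view(B, H * W, T // w, w, C).permute(0, 2, 3, 1, 4)
        return x.reshape(B, N, C)


class MATEBlock(nn.Module):
    """Drop-in replacement for self-attention; summing the two branch outputs
    is an assumption made for this sketch."""

    def __init__(self, dim: int):
        super().__init__()
        self.ma = MABranch(dim)
        self.te = TEBranch(dim)

    def forward(self, x, T, H, W, layer_idx: int = 0):
        # Stand-in for the Rotary Major Scan: rotate the scan order per layer
        # so successive layers traverse the tokens in different major orders.
        shift = layer_idx % x.shape[1]
        x_ma = torch.roll(self.ma(torch.roll(x, shift, dims=1)), -shift, dims=1)
        return x_ma + self.te(x, T, H, W)
```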
```
├── PixArt/
│   └── PixArtMS.py        # Modified PixArt with MATE block integration & video support
├── mamba_blocks/          # MA-branch implementations
├── temporal_swin_attn.py  # TE-branch implementation
└── README.md              # This file
```
- `PixArt/PixArtMS.py`
  - Added an option to replace the standard self-attention with the MATE block (a simplified sketch of this swap follows this list)
  - Extended to support video data
  - Maintains compatibility with the original PixArt architecture
- `mamba_blocks/`
  - Contains implementations of the MA-branch components
  - Includes bidirectional Mamba2, Rotary Major Scan, and review tokens
- `temporal_swin_attn.py`
  - Implements the TE-branch Temporal Swin Attention mechanism
  - Handles temporal correlations and spatial adjacency
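As a rough picture of the swap in `PixArt/PixArtMS.py`, the sketch below drops a MATE-style mixer into a DiT-style block in place of multi-head self-attention. The `use_mate` flag and the block interface are illustrative assumptions (text cross-attention and timestep conditioning are omitted), and it reuses the `MATEBlock` from the sketch above.

```python
import torch.nn as nn


class PixArtStyleBlock(nn.Module):
    """DiT-style block whose token mixer is either standard self-attention or
    the (simplified) MATE block; `use_mate` is a hypothetical flag."""

    def __init__(self, dim: int, num_heads: int, use_mate: bool = True):
        super().__init__()
        self.use_mate = use_mate
        self.norm1 = nn.LayerNorm(dim)
        if use_mate:
            self.mixer = MATEBlock(dim)  # linear-complexity token mixer
        else:
            self.mixer = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                                 nn.Linear(4 * dim, dim))

    def forward(self, x, T, H, W, layer_idx: int = 0):
        h = self.norm1(x)
        if self.use_mate:
            h = self.mixer(h, T, H, W, layer_idx)
        else:
            h, _ = self.mixer(h, h, h)
        x = x + h                                # residual around the token mixer
        return x + self.mlp(self.norm2(x))       # residual around the MLP
```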
- Python >= 3.9 (Anaconda or Miniconda recommended)
- PyTorch >= 1.13.0 (built with CUDA 11.7)
The MATE block can be enabled in the PixArt architecture by modifying the configuration in `PixArt/PixArtMS.py`. The implementation supports both image and video generation tasks with linear computational complexity.
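A hypothetical usage sketch, reusing the simplified block above: the constructor arguments and the video token shape are assumptions rather than this repository's actual configuration interface, and a CUDA device is assumed because `Mamba2` requires one.

```python
import torch

# Hypothetical settings: 1152-dim tokens, 8 latent frames of 16x16 patches.
block = PixArtStyleBlock(dim=1152, num_heads=16, use_mate=True).cuda()
tokens = torch.randn(1, 8 * 16 * 16, 1152, device="cuda")  # (B, T*H*W, C)
out = block(tokens, T=8, H=16, W=16)
print(out.shape)  # torch.Size([1, 2048, 1152])
```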
- Up to 15× FLOPs Reduction and 11.5× Latency Reduction: Significant speedup compared to standard Diffusion Transformers
- Linear Scaling: Computational cost scales linearly with the number of pixels in the generated videos (see the back-of-envelope comparison after this list)
- Minute-Length Videos: Enables generation of long videos without compromising quality
- Single GPU Inference: High-resolution minute-length video generation on a single GPU
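To make the linear-scaling claim concrete, here is a back-of-envelope comparison (my own illustrative arithmetic under an assumed latent token grid, not figures from the paper): the pairwise-interaction count of full self-attention grows quadratically with clip length, while a scan-based mixer like MATE grows linearly.

```python
# Assumed latent layout: 8 latent frames per second, 32x32 latent tokens per frame.
for seconds in (4, 16, 60):
    tokens = seconds * 8 * 32 * 32
    print(f"{seconds:>3}s -> {tokens:>9,d} tokens | "
          f"attention pairs ~{tokens**2:.2e} | linear scan ~{tokens:.2e}")
```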
- This is an unofficial implementation based on the paper description
- Built on the PixArt codebase, which uses the standard Diffusion Transformer architecture
If you use this implementation in your research, please cite the original LinGen paper:
```bibtex
@inproceedings{wang2025lingen,
  title={LinGen: Towards High-Resolution Minute-Length Text-to-Video Generation with Linear Computational Complexity},
  author={Wang, Hongjie and Ma, Chih-Yao and Liu, Yen-Cheng and Hou, Ji and Xu, Tao and Wang, Jialiang and Juefei-Xu, Felix and Luo, Yaqiao and Zhang, Peizhao and Hou, Tingbo and Vajda, Peter and Jha, Niraj K. and Dai, Xiaoliang},
  booktitle={Proceedings of the Computer Vision and Pattern Recognition Conference},
  pages={2578--2588},
  year={2025}
}
```
- LinGen Team: For the innovative MATE block and LinGen architecture design
- PixArt Team: For the excellent Diffusion Transformer codebase
- Original Paper: LinGen: Towards High-Resolution Minute-Length Text-to-Video Generation with Linear Computational Complexity
For questions about this implementation, please open an issue in this repository.
For questions about the original LinGen research, please refer to the official project page.