
🚀 LinGen: Towards High-Resolution Minute-Length Text-to-Video Generation with Linear Computational Complexity

CVPR 2025


This repository contains an unofficial implementation of the MATE block from the paper:

LinGen: Towards High-Resolution Minute-Length Text-to-Video Generation with Linear Computational Complexity
Hongjie Wang, Chih-Yao Ma, Yen-Cheng Liu, Ji Hou, Tao Xu, Jialiang Wang, Felix Juefei-Xu, Yaqiao Luo, Peizhao Zhang, Tingbo Hou, Peter Vajda, Niraj K. Jha, Xiaoliang Dai
CVPR 2025


🎯 About This Repository

This is an unofficial implementation of the MATE (MA-branch + TE-branch) block described in the LinGen paper, built on top of the PixArt codebase. The implementation enables linear computational complexity for text-to-video generation by replacing the quadratic-complexity self-attention with the proposed MATE block.

Key Features

  • 🔧 MATE Block Implementation: Custom implementation of the MA-branch and the TE-branch
  • 📹 Video Support: Extended PixArt architecture to handle video data
  • ⚡ Linear Complexity: Replaces the quadratic-complexity self-attention with the linear-complexity MATE block
  • 🎨 Based on PixArt: Built upon the Diffusion Transformer architecture of PixArt

🏗️ Architecture Overview

The MATE block consists of two main components:

MA-Branch

  • Bidirectional Mamba2 block for short-to-long-range token correlations
  • Rotary Major Scan (RMS) for token rearrangement at almost no extra cost (see the sketch after this list)
  • Review tokens for enhanced long video generation
  • Implementation located in: mamba_blocks/ directory
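
A minimal PyTorch sketch of the RMS idea follows: rotating which axis is "major" in the flattened token order can be realized as a pure index permutation, which is why the rearrangement costs almost nothing. The function name and the particular axis orders are illustrative assumptions, not the code in mamba_blocks/.

```python
import torch

def rotary_major_scan_indices(T: int, H: int, W: int, layer_idx: int) -> torch.Tensor:
    """Permutation that reorders a flattened (T*H*W) token sequence.

    Hypothetical sketch: each layer rotates which axis is "major" in the
    flattening order, so successive Mamba scans traverse the video along
    different axes at the cost of a single gather.
    """
    grid = torch.arange(T * H * W).reshape(T, H, W)
    # Rotate the axis order per layer: (T,H,W) -> (H,W,T) -> (W,T,H) -> ...
    order = [(0, 1, 2), (1, 2, 0), (2, 0, 1)][layer_idx % 3]
    return grid.permute(*order).reshape(-1)

# Reorder tokens before the Mamba scan, restore the order afterwards.
x = torch.randn(2, 4 * 8 * 8, 64)        # (batch, T*H*W tokens, dim)
idx = rotary_major_scan_indices(4, 8, 8, layer_idx=1)
x_scanned = x[:, idx]                    # layer-specific scan order
x_restored = x_scanned[:, torch.argsort(idx)]
assert torch.equal(x, x_restored)
```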

TE-Branch

  • Temporal Swin Attention block for spatially adjacent and temporally medium-range correlations (a simplified sketch follows this list)
  • Completely addresses the adjacency-preservation issue of Mamba
  • Implementation located in: temporal_swin_attn.py
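
To make the TE-branch idea concrete, here is a simplified single-head sketch of temporal windowed attention in PyTorch: a token attends only to tokens at the same spatial location within a short temporal window, so the cost stays linear in the number of frames. The window shift, multi-head projections, and position bias of a real Swin-style block are omitted; all names here are assumptions, not the code in temporal_swin_attn.py.

```python
import torch
import torch.nn.functional as F

def temporal_window_attention(x: torch.Tensor, window: int) -> torch.Tensor:
    """Single-head attention inside temporal windows at each spatial site.

    x: (B, T, H, W, C) video tokens; `window` must divide T. Hypothetical
    sketch: each (h, w) position attends only within a short temporal
    window, so the cost is linear in T for a fixed window size.
    """
    B, T, H, W, C = x.shape
    assert T % window == 0, "window must divide the number of frames"
    # Group frames into windows: (B, T//window, window, H, W, C)
    xw = x.reshape(B, T // window, window, H, W, C)
    # Collapse everything except (window, C) into a batch of windows
    xw = xw.permute(0, 1, 3, 4, 2, 5).reshape(-1, window, C)
    out = F.scaled_dot_product_attention(xw, xw, xw)  # tokens serve as Q, K, V
    # Undo the grouping back to (B, T, H, W, C)
    out = out.reshape(B, T // window, H, W, window, C)
    return out.permute(0, 1, 4, 2, 3, 5).reshape(B, T, H, W, C)

y = temporal_window_attention(torch.randn(1, 8, 4, 4, 32), window=4)
print(y.shape)  # torch.Size([1, 8, 4, 4, 32])
```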

📂 Repository Structure

├── PixArt/
│   └── PixArtMS.py          # Modified PixArt with MATE block integration & video support
├── mamba_blocks/            # MA-branch implementations  
├── temporal_swin_attn.py    # TE-branch implementation
└── README.md               # This file

Core Modifications

  1. PixArt/PixArtMS.py:
    • Added an option to replace the standard self-attention with the MATE block
    • Extended to support video data
    • Maintains compatibility with the original PixArt architecture
  2. mamba_blocks/:
    • Contains implementations of the MA-branch components
    • Includes bidirectional Mamba2, Rotary Major Scan, and review tokens
  3. temporal_swin_attn.py:
    • Implements the TE-branch Temporal Swin Attention mechanism
    • Handles temporal correlations and spatial adjacency

🚀 Getting Started

Dependencies

Usage

The MATE block can be enabled in the PixArt architecture by modifying the configuration in PixArt/PixArtMS.py. The implementation supports both image and video generation tasks with linear computational complexity.
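
A hedged sketch of what enabling the block might look like is shown below. The module path follows the repository tree above, but the constructor arguments are hypothetical placeholders; check the actual signature in PixArt/PixArtMS.py.

```python
# Hypothetical usage sketch: the argument names below are placeholders,
# not the repository's actual API; consult PixArt/PixArtMS.py.
from PixArt.PixArtMS import PixArtMS  # module path taken from the repo tree

model = PixArtMS(
    input_size=32,    # latent spatial size (assumed argument)
    use_mate=True,    # assumed flag that swaps self-attention for MATE
    num_frames=16,    # assumed video-length argument
)
```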


📊 Key Benefits

  • 15x FLOPs Reduction and 11.5x Latency Reduction: Significant speedup compared to standard Diffusion Transformers
  • Linear Scaling: Computational cost scales linearly with the number of pixels in the generated videos (see the back-of-the-envelope sketch after this list)
  • Minute-Length Videos: Enables generation of long videos without compromising quality
  • Single GPU Inference: High-resolution minute-length video generation on a single GPU
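
As a back-of-the-envelope illustration of the linear-scaling claim, the snippet below compares rough FLOP counts for quadratic self-attention against a linear-complexity scan. The cost models and constants are assumptions for illustration, not figures from the paper.

```python
# Illustrative cost models (assumptions): self-attention ~ 2*N^2*d for the
# QK^T and AV matmuls; a linear-complexity scan ~ 2*N*d*state per pass.
def attention_flops(n_tokens: int, dim: int) -> float:
    return 2.0 * n_tokens ** 2 * dim

def linear_scan_flops(n_tokens: int, dim: int, state: int = 128) -> float:
    return 2.0 * n_tokens * dim * state

for n in (10_000, 100_000, 1_000_000):  # token counts for longer videos
    ratio = attention_flops(n, 1152) / linear_scan_flops(n, 1152)  # 1152: assumed hidden dim
    print(f"{n:>9} tokens: attention / linear ~ {ratio:,.0f}x")
```

The ratio grows linearly with the token count, which is why the gap widens for longer, higher-resolution videos.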

⚠️ Important Notes

  • This is an unofficial implementation based on the paper description
  • Built on the PixArt codebase which uses the standard Diffusion Transformer architecture

📝 Citation

If you use this implementation in your research, please cite the original LinGen paper:

@inproceedings{wang2025lingen,
  title={{LinGen}: Towards High-Resolution Minute-Length Text-to-Video Generation with Linear Computational Complexity},
  author={Wang, Hongjie and Ma, Chih-Yao and Liu, Yen-Cheng and Hou, Ji and Xu, Tao and Wang, Jialiang and Juefei-Xu, Felix and Luo, Yaqiao and Zhang, Peizhao and Hou, Tingbo and Vajda, Peter and Jha, Niraj K. and Dai, Xiaoliang},
  booktitle={Proceedings of the Computer Vision and Pattern Recognition Conference},
  pages={2578--2588},
  year={2025}
}

🙏 Acknowledgments

This implementation builds on the PixArt codebase.

📧 Contact

For questions about this implementation, please open an issue in this repository.
For questions about the original LinGen research, please refer to the official project page.
