Stars
Repo for SeedVR2 & SeedVR (CVPR2025 Highlight)
SeedVR2: One-Step Video Restoration via Diffusion Adversarial Post-Training
Reverse Engineering Gemma 3n: Google's New Edge-Optimized Language Model
MMaDA - Open-Sourced Multimodal Large Diffusion Language Models
[CVPR 2025 Best Paper Award Candidate] VGGT: Visual Geometry Grounded Transformer
A bidirectional pipeline parallelism algorithm for computation-communication overlap in V3/R1 training.
Wan: Open and Advanced Large-Scale Video Generative Models
FlashMLA: Efficient MLA decoding kernels
Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation
MoBA: Mixture of Block Attention for Long-Context LLMs
[CVPR 2025 Highlight] Video Depth Anything: Consistent Depth Estimation for Super-Long Videos
Vchitect-2.0: Parallel Transformer for Scaling Up Video Diffusion Models
🔥 Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos
New repo collection for NVIDIA Cosmos: https://github.com/nvidia-cosmos
[arXiv 2024] Edicho: Consistent Image Editing in the Wild
HunyuanVideo: A Systematic Framework For Large Video Generation Model
[CVPR'25] Official implementation of the paper "MagicQuill: An Intelligent Interactive Image Editing System"
A suite of image and video neural tokenizers
Code for "Diffusion Model Alignment Using Direct Preference Optimization"
Video Generation, Physical Commonsense, Semantic Adherence, VideoCon-Physics
Scaling Diffusion Transformers with Mixture of Experts
Meta Lingua: a lean, efficient, and easy-to-hack codebase to research LLMs.
Official Implementation of paper "MonST3R: A Simple Approach for Estimating Geometry in the Presence of Motion"
A PyTorch native platform for training generative AI models
Long context evaluation for large language models
Qwen2.5-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.