Stars
(CVPR 2025) From Slow Bidirectional to Fast Autoregressive Video Diffusion Models
HunyuanVideo Keyframe Control Lora is an adapter for the HunyuanVideo T2V model for keyframe-based video generation
The most powerful and modular diffusion model GUI, API, and backend with a graph/nodes interface.
Official inference repo for FLUX.1 models
Fine-tuning & Reinforcement Learning for LLMs. 🦥 Train Qwen3, Llama 4, DeepSeek-R1, Gemma 3, TTS 2x faster with 70% less VRAM.
OmniGen2: Exploration to Advanced Multimodal Generation.
PyTorch implementation of MAR+DiffLoss https://arxiv.org/abs/2406.11838
Janus-Series: Unified Multimodal Understanding and Generation Models
[ICML2025] Sparse VideoGen: Accelerating Video Diffusion Transformers with Spatial-Temporal Sparsity
Mirage: Automatically Generating Fast GPU Kernels without Programming in Triton/CUDA
[CVPR 2025] MMAudio: Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis
Official Implementation of EAGLE-1 (ICML'24), EAGLE-2 (EMNLP'24), and EAGLE-3.
PyTorch code and models for VJEPA2 self-supervised learning from video.
PyTorch code and models for V-JEPA self-supervised learning from video.
Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 500+ LLMs (Qwen3, Qwen3-MoE, Llama4, InternLM3, DeepSeek-R1, ...) and 200+ MLLMs (Qwen2.5-VL, Qwen2.5-Omni, Qwen2-Audio, Ovis2, InternVL3, Llava, GLM4…
The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.
NoakLiu / FastCache-xDiT
Forked from xdit-project/xDiT
FastCache: Fast Caching for Diffusion Transformer Through Learnable Linear Approximation [Efficient ML Model]
How to optimize some algorithms in CUDA.
Distributed Compiler based on Triton for Parallel Systems
Tile primitives for speedy kernels
Domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels
SkyReels-V2: Infinite-length Film Generative model
MAGI-1: Autoregressive Video Generation at Scale
An image-generation MCP server tool that integrates with Cursor and calls a reverse-engineered Jimeng (即梦) API