Stars
[TMLR 2025🔥] A survey of autoregressive models in vision.
DDPO for finetuning diffusion models, implemented in PyTorch with LoRA support
An official implementation of Flow-GRPO: Training Flow Matching Models via Online RL
[CVPR2024 Highlight][VideoChatGPT] ChatGPT with video understanding! And many more supported LMs such as miniGPT4, StableLM, and MOSS.
Official repository for VisionZip (CVPR 2025)
🔥🔥MLVU: Multi-task Long Video Understanding Benchmark
Frontier Multimodal Foundation Models for Image and Video Understanding
A high-throughput and memory-efficient inference and serving engine for LLMs
Community maintained hardware plugin for vLLM on Ascend
GenEval: An object-focused framework for evaluating text-to-image alignment
HunyuanVideo-I2V: A Customizable Image-to-Video Model based on HunyuanVideo
[CVPR 2025 Oral] Infinity ∞: Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o performance.
VideoChat-Flash: Hierarchical Compression for Long-Context Video Modeling
Official repository of "GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing"
[ECCV2024] Official inference code for the papers "Glyph-ByT5: A Customized Text Encoder for Accurate Visual Text Rendering" and "Glyph-ByT5-v2: A Strong Aesthetic Baseline for Accurate Mu…
Domain Generalization with MixStyle (ICLR'21)
SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer
mixup: Beyond Empirical Risk Minimization
Qwen3 is the large language model series developed by Qwen team, Alibaba Cloud.
Qwen2.5-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models
A method to increase the speed and lower the memory footprint of existing vision transformers.
HunyuanVideo: A Systematic Framework For Large Video Generation Model