Starred repositories
verl: Volcano Engine Reinforcement Learning for LLMs
The development and future prospects of multimodal reasoning models.
A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models
The repository for IEEE CVPR 2023 (A Light Weight Model for Active Speaker Detection)
Kimi-Audio, an open-source audio foundation model excelling in audio understanding, generation, and conversation
Let's make video diffusion practical!
A programmer's guide to cooking at home (Simplified Chinese only).
Official repo and evaluation implementation of VSI-Bench
[CVPR 2025 Oral] Infinity ∞: Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis
📰 Must-read papers and blogs on LLM based Long Context Modeling 🔥
SpatialLM: Training Large Language Models for Structured Indoor Modeling
[ICCV 2025] Official repo for "GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for Autoregressive Image Generation"
Video-R1: Reinforcing Video Reasoning in MLLMs [🔥the first paper to explore R1 for video]
[CVPR 2025] OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding?
VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement Fine-Tuning
One for All Modalities Evaluation Toolkit - including text, image, video, audio tasks.
A feature-rich command-line audio/video downloader
Open-Qwen2VL: Compute-Efficient Pre-Training of Fully-Open Multimodal LLMs on Academic Resources
Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey
Awesome Reasoning LLM Tutorial/Survey/Guide
RWKV-SpeechChat is a real-time dialogue script based on a frozen 3B RWKV model with trained adapters and initial states. Various trained weights can be applied to perform a range of audio tasks, in…
Fully open reproduction of DeepSeek-R1
Movie Gen Bench - two media generation evaluation benchmarks released with Meta Movie Gen
VOCANO: A note transcription framework for singing voice in polyphonic music
A curated list of audio-visual learning methods and datasets.