Stars
Pixel-Level Reasoning Model trained with RL
An Efficient and User-Friendly Scaling Library for Reinforcement Learning with Large Language Models
HoPE: Hybrid of Position Embedding for Length Generalization in Vision-Language Models
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
MiMo: Unlocking the Reasoning Potential of Language Model – From Pretraining to Posttraining
A streamlined and customizable framework for efficient large model evaluation and performance benchmarking
A powerful tool for creating fine-tuning datasets for LLMs
Hackable and optimized Transformers building blocks, supporting a composable construction.
[ArXiv] V2PE: Improving Multimodal Long-Context Capability of Vision-Language Models with Variable Visual Position Encoding
[ICML'24] Data and code for our paper "Training-Free Long-Context Scaling of Large Language Models"
MetaSpatial leverages reinforcement learning to enhance 3D spatial reasoning in vision-language models (VLMs), enabling more structured, realistic, and adaptive scene generation for applications in…
verl: Volcano Engine Reinforcement Learning for LLMs
Reasoning in LLMs: Papers and Resources, including Chain-of-Thought, OpenAI o1, and DeepSeek-R1 🍓
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.
Witness the aha moment of VLM with less than $3.
A Chinese-language reinforcement learning tutorial (the "Mushroom Book" 🍄), available to read online at: https://datawhalechina.github.io/easy-rl/
🚀🚀 Train a 26M-parameter GPT completely from scratch in just 2 hours! 🌏
🍒 Cherry Studio is a desktop client that supports multiple LLM providers.
[ICLR'25] Official code for the paper 'MLLMs Know Where to Look: Training-free Perception of Small Visual Details with Multimodal LLMs'
[ICML 2025 Oral] An official implementation of VideoRoPE: What Makes for Good Video Rotary Position Embedding?
Fully open data curation for reasoning models
Reading notes about Multimodal Large Language Models, Large Language Models, and Diffusion Models
Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks
Qwen2.5-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o performance
[EMNLP 2024 🔥] Video-LLaVA: Learning United Visual Representation by Alignment Before Projection