Stars
Solve Visual Understanding with Reinforced VLMs
Fully open reproduction of DeepSeek-R1
Official repository of 'Visual-RFT: Visual Reinforcement Fine-Tuning'
An open-source implementation for fine-tuning Molmo-7B-D and Molmo-7B-O by allenai.
LLaVA-Mini is a unified large multimodal model (LMM) that efficiently supports understanding of images, high-resolution images, and videos.
Train transformer language models with reinforcement learning.
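A minimal sketch of what that looks like with TRL's GRPO trainer; the dataset, model name, and reward function below are placeholders, and exact class signatures vary by TRL version:

```python
# Hedged sketch: RL fine-tuning with TRL's GRPOTrainer.
# The model id, dataset, and reward function are example choices, not
# anything prescribed by the TRL repository itself.
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

dataset = load_dataset("trl-lib/tldr", split="train")

def reward_short(completions, **kwargs):
    # Toy reward: prefer completions close to 50 characters.
    return [-abs(50 - len(c)) for c in completions]

trainer = GRPOTrainer(
    model="Qwen/Qwen2-0.5B-Instruct",
    reward_funcs=reward_short,
    args=GRPOConfig(output_dir="qwen2-0.5b-grpo"),
    train_dataset=dataset,
)
trainer.train()
```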
TimesFM (Time Series Foundation Model) is a pretrained time-series foundation model developed by Google Research for time-series forecasting.
Official implementation for "AutoTimes: Autoregressive Time Series Forecasters via Large Language Models"
Code release for "Learning to Generate Explainable Stock Predictions using Self-Reflective Large Language Models" https://arxiv.org/abs/2402.03659
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
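For context, LMDeploy's offline inference entry point is its `pipeline` API; a minimal sketch (the model id is just an example) might look like:

```python
# Hedged sketch of LMDeploy's pipeline API for offline batched inference.
# The model id is an assumption; any supported HF model id or local path works.
from lmdeploy import pipeline

pipe = pipeline("internlm/internlm2_5-7b-chat")
responses = pipe(["Hi, please introduce yourself.", "Shanghai is"])
for r in responses:
    print(r.text)
```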
Replace OpenAI GPT with another LLM in your app by changing a single line of code. Xinference gives you the freedom to use any LLM you need. With Xinference, you're empowered to run inference with …
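The "single line of code" refers to Xinference's OpenAI-compatible endpoint: you keep the official `openai` client and only change its `base_url`. A minimal sketch, where the port (9997 by default) and the launched model name are assumptions:

```python
# Hedged sketch: pointing the official OpenAI client at a local Xinference
# server instead of api.openai.com. The model name must match whatever you
# launched via `xinference launch`; the API key is unused locally.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:9997/v1",  # the one line that changes
    api_key="not-needed-for-local-server",
)

response = client.chat.completions.create(
    model="qwen2.5-instruct",  # example model name, an assumption
    messages=[{"role": "user", "content": "Summarize what Xinference does."}],
)
print(response.choices[0].message.content)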
Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 500+ LLMs (Qwen3, Qwen3-MoE, Llama4, InternLM3, GLM4, Mistral, Yi1.5, DeepSeek-R1, ...) and 200+ MLLMs (Qwen2.5-VL, Qwen2.5-Omni, Qwen2-Audio, Ovis2, …
DeepSeek-VL: Towards Real-World Vision-Language Understanding
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions
Qwen2.5-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
Universal LLM Deployment Engine with ML Compilation
An automated pipeline for evaluating LLMs for role-playing.
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…
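A minimal sketch of SAM 2's image-predictor interface; the checkpoint and config paths are assumptions (they depend on which checkpoint you downloaded), and the image and click prompt are placeholders:

```python
# Hedged sketch of SAM 2 image inference with a single point prompt.
# Paths below are assumptions; download checkpoints per the repository's
# instructions. build_sam2 expects a CUDA device by default.
import numpy as np
import torch
from sam2.build_sam import build_sam2
from sam2.sam2_image_predictor import SAM2ImagePredictor

predictor = SAM2ImagePredictor(
    build_sam2("configs/sam2.1/sam2.1_hiera_l.yaml",
               "checkpoints/sam2.1_hiera_large.pt")
)

image = np.zeros((480, 640, 3), dtype=np.uint8)  # placeholder HxWx3 image
with torch.inference_mode():
    predictor.set_image(image)
    masks, scores, _ = predictor.predict(
        point_coords=np.array([[320, 240]]),  # one foreground click
        point_labels=np.array([1]),
    )
print(masks.shape, scores)
```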
[NeurIPS 2023] Official implementation of the paper "Segment Everything Everywhere All at Once"
GLM-4 series: Open Multilingual Multimodal Chat LMs (open-source multilingual multimodal chat models)
Code & Models for Temporal Segment Networks (TSN) in ECCV 2016
Nightly release of ControlNet 1.1
AoyuQC / VILA
Forked from NVlabs/VILA. VILA - a multi-image visual language model with training, inference and evaluation recipe, deployable from cloud to edge (Jetson Orin and laptops)
VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and cloud.