Stars
Simulated experiments for "Real-Time Execution of Action Chunking Flow Policies".
Online RL with Simple Reward Enables Training VLA Models with Only One Trajectory
Official implementation of Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence
Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks.
MMICL, a state-of-the-art VLM with in-context learning (ICL) ability, from PKU
The official repository for "2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining"
🚀 One-stop solution for creating your digital avatar from chat history 💡 Fine-tune LLMs with your chat logs to capture your unique style, then bind to a chatbot to bring your digital self to life. …
Code for the paper "GraspSAM: When Segment Anything Model meets Grasp Detection", ICRA 2025
Kimi-VL: Mixture-of-Experts Vision-Language Model for Multimodal Reasoning, Long-Context Understanding, and Strong Agent Capabilities
The simplest, fastest repository for training/finetuning small-sized VLMs.
[CVPR 2024 Highlight] FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects
Real-time face swap and one-click video deepfake with only a single image
HybridVLA: Collaborative Diffusion and Autoregression in a Unified Vision-Language-Action Model
[CoRL 2024] HumanPlus: Humanoid Shadowing and Imitation from Humans
[CVPR 25 Highlight & ECCV 24 Workshop Best Paper] RoboTwin Dual-arm Robot Manipulation Simulation Platform
moojink/openvla-oft (forked from openvla/openvla): Fine-Tuning Vision-Language-Action Models: Optimizing Speed and Success
Official code for "Behavior Generation with Latent Actions" (ICML 2024 Spotlight)
Programmer's guide to cooking at home (Simplified Chinese only).
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
[TRO 2025] NeuPAN: Direct Point Robot Navigation with End-to-End Model-based Learning.
NVIDIA Isaac GR00T N1.5 is the world's first open foundation model for generalized humanoid robot reasoning and skills.
Code for the paper: "Active Vision Might Be All You Need: Exploring Active Vision in Bimanual Robotic Manipulation"
[CVPR 2025 Best Paper Award Candidate] VGGT: Visual Geometry Grounded Transformer