Stars
A collection of vision-language-action model post-training methods.
A curated list of awesome papers on Embodied AI and related research/industry-driven resources.
RoboVerse: Towards a Unified Platform, Dataset and Benchmark for Scalable and Generalizable Robot Learning
[IROS 2025] The Large-scale Manipulation Platform for Scalable and Intelligent Embodied Systems
[CVPR 2024 Highlight] PhysGaussian: Physics-Integrated 3D Gaussians for Generative Dynamics
[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
[Lumina Embodied AI Community] Embodied AI Technical Guide (Embodied-AI-Guide)
Reference workflow for generating large amounts of synthetic motion trajectories for robot manipulation from a few human demonstrations.
robosuite: A Modular Simulation Framework and Benchmark for Robot Learning
This code corresponds to simulation environments used as part of the DexMimicGen project.
A collection of high-quality models for the MuJoCo physics engine, curated by Google DeepMind.
Matrix-Game: Interactive World Foundation Model
A curated list of state-of-the-art research in embodied AI, focusing on vision-language-action (VLA) models, vision-language navigation (VLN), and related multimodal learning approaches.
GRUtopia: Dream General Robots in a City at Scale
RGB-Only Gaussian Splatting SLAM for Unbounded Outdoor Scenes (ICRA 2025)
[IEEE T-RO 2023] Source code of RING and RING++ for loop closure detection in LiDAR SLAM.
HunyuanVideo: A Systematic Framework For Large Video Generation Model
[ICLR 2025 Spotlight] SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models
[ICML 2025] Sparse VideoGen: Accelerating Video Diffusion Transformers with Spatial-Temporal Sparsity
A generative world for general-purpose robotics & embodied AI learning.
Wan: Open and Advanced Large-Scale Video Generative Models
New repo collection for NVIDIA Cosmos: https://github.com/nvidia-cosmos
deepbeepmeep / Wan2GP (forked from Wan-Video/Wan2.1): Wan 2.1 for the GPU Poor
[CVPR 2025 Best Paper Award] VGGT: Visual Geometry Grounded Transformer
[ICCV 2025] Aether: Geometric-Aware Unified World Modeling
[CVPR 2025] Fast3R: Towards 3D Reconstruction of 1000+ Images in One Forward Pass
USP: Unified (a.k.a. Hybrid, 2D) Sequence Parallel Attention for Long Context Transformers Model Training and Inference