Stars
[ECCV 2024] TOD3Cap: Towards 3D Dense Captioning in Outdoor Scenes
🔥 RSS 2025, CVPR 2025 & ICLR 2025 Embodied AI paper list and resources. Star ⭐ the repo and follow me if you like what you see 🤩.
[ECCV 2020] ScanRefer: 3D Object Localization in RGB-D Scans using Natural Language
😎 Up-to-date, curated list of awesome 3D Visual Grounding papers, methods & resources.
Repository for running the VGGT model in PyTorch
Fast and memory-efficient exact attention
VARGPT-v1.1: Improve Visual Autoregressive Large Unified Model via Iterative Instruction Tuning and Reinforcement Learning
[CVPR 2025 Highlight] Official code for paper "Mamba as a Bridge: Where Vision Foundation Models Meet Vision Language Models for Domain-Generalized Semantic Segmentation"
[CVPR 2025 Oral] VGGT: Visual Geometry Grounded Transformer
Depth Any Video with Scalable Synthetic Data (ICLR 2025)
ACTalker: an end-to-end video diffusion framework for talking head synthesis that supports both single and multi-signal control (e.g., audio, expression).
SpatialLM: Large Language Model for Spatial Understanding
Official code for NeurIPS 2023 paper: CoDA: Collaborative Novel Box Discovery and Cross-modal Alignment for Open-vocabulary 3D Object Detection
Pointcept: a codebase for point cloud perception research. Latest works: Sonata (CVPR'25 Highlight), PTv3 (CVPR'24 Oral), PPT (CVPR'24), MSC (CVPR'23)
[CVPR 2025] GaussHDR: High Dynamic Range Gaussian Splatting via Learning Unified 3D and 2D Local Tone Mapping
Code for the paper: "ODIN: A Single Model for 2D and 3D Segmentation" (CVPR 2024)
Mask3D predicts accurate 3D semantic instances, achieving state-of-the-art results on ScanNet, ScanNet200, S3DIS and STPLS3D.
Code for NeurIPS 2024 work "MVSDet: Multi-View Indoor 3D Object Detection via Efficient Plane Sweeps"
Taming Video Diffusion Prior with Scene-Grounding Guidance for 3D Gaussian Splatting from Sparse Inputs (CVPR 2025 Highlight)
[ICLR 2025, Oral] EmbodiedSAM: Online Segment Any 3D Thing in Real Time
I Think, Therefore I Diffuse: Enabling Multimodal In-Context Reasoning in Diffusion Models
[ICLR 2025] Flow Distillation Sampling: Regularizing 3D Gaussians with Pre-trained Matching Priors
[Lumina Embodied AI Community] Embodied AI Technical Guide (Embodied-AI-Guide)
An open-source code repository of driving world models, with training, inference, and evaluation tools, plus pretrained checkpoints.
A curated list of state-of-the-art research in embodied AI, focusing on vision-language-action (VLA) models, vision-language navigation (VLN), and related multimodal learning approaches.