Stars
Official implementation of the paper "InSpire: Vision-Language-Action Models with Intrinsic Spatial Reasoning"
Stanford-ILIAD / openvla-mini
Forked from openvla/openvla
OpenVLA: An open-source vision-language-action model for robotic manipulation.
Official code for EWMBench: Evaluating Scene, Motion, and Semantic Quality in Embodied World Models
Official implementation of "OneTwoVLA: A Unified Vision-Language-Action Model with Adaptive Reasoning"
UniSkill: Imitating Human Videos via Cross-Embodiment Skill Representations
[CoRL 24] GenDP: 3D Semantic Fields for Category-Level Generalizable Diffusion Policy
🚀 One-stop solution for creating your digital avatar from chat logs 💡 Fine-tune LLMs with your chat logs to capture your unique style, then bind to a chatbot to bring your digital self to life. From chat…
[RSS 2025] Learning to Act Anywhere with Task-centric Latent Actions
OpenHelix: An Open-source Dual-System VLA Model for Robotic Manipulation
A collection of World Models for Embodied AI
Qwen3 is the large language model series developed by Qwen team, Alibaba Cloud.
DelinQu / SimplerEnv-OpenVLA
Forked from simpler-env/SimplerEnv
Evaluating and reproducing real-world robot manipulation policies (e.g., RT-1, RT-1-X, Octo, and OpenVLA) in simulation under common setups (e.g., Google Robot, WidowX+Bridge)
A very simple GRPO implementation for reproducing R1-like LLM thinking.
A programmer's guide to cooking at home (Simplified Chinese only).
RoboVerse: Towards a Unified Platform, Dataset and Benchmark for Scalable and Generalizable Robot Learning
[CVPR 2025] Mr. DETR: Instructive Multi-Route Training for Detection Transformers
Qwen2.5-Omni is an end-to-end multimodal model by Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, video, and performing real-time speech generation.
[NeurIPS 24] Spiking Neural Network as Adaptive Event Stream Slicer
SpatialLM: Large Language Model for Spatial Understanding
Cosmos-Reason1 models understand physical common sense and generate appropriate embodied decisions in natural language through long chain-of-thought reasoning processes.
A generative world for general-purpose robotics & embodied AI learning.
Dense Policy: Bidirectional Autoregressive Learning of Actions
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
Official PyTorch implementation for "Large Language Diffusion Models"