Stars
A minimal codebase for finetuning large multimodal models, supporting llava-1.5/1.6, llava-interleave, llava-next-video, llava-onevision, llama-3.2-vision, qwen-vl, qwen2-vl, phi3-v etc.
[CoRL2024] Official repo of `A3VLM: Actionable Articulation-Aware Vision Language Model`
[IROS24 Oral] ManipVQA: Injecting Robotic Affordance and Physically Grounded Information into Multi-Modal Large Language Models
A collection of tabletop tasks in Mujoco
Environments, assets, and workflow for open-source mobile robotics, integrated with IsaacLab.
Democratizing Augmented Reality research and development.
[CVPR 2025] Official code repository for Beacon3D benchmark
A powerful automation agent for macOS that enables natural language control of various system applications and services. This agent allows you to interact with your Mac using simple text commands, …
MineWorld: A Real-time interactive world model on Minecraft
Official implementation of "Re3Sim: Generating High-Fidelity Simulation Data via 3D-Photorealistic Real-to-Sim for Robotic Manipulation"
RoboVerse: Towards a Unified Platform, Dataset and Benchmark for Scalable and Generalizable Robot Learning
[ICCV 2025] A simple training-free approach adapting DUSt3R for dynamic scenes.
[CVPR 2025] OmniMMI: A Comprehensive Multi-modal Interaction Benchmark in Streaming Video Contexts
CoTracker is a model for tracking any point (pixel) on a video.
RF-DETR is a real-time object detection model architecture developed by Roboflow, state-of-the-art (SOTA) on COCO and designed for fine-tuning.
Official Implementation of "JARVIS-VLA: Post-Training Large-Scale Vision Language Models to Play Visual Games with Keyboards and Mouse"
Official repository of "GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing"
[CVPR 2025 Best Paper Award] VGGT: Visual Geometry Grounded Transformer
EMMOE: A Comprehensive Benchmark for Embodied Mobile Manipulation in Open Environments
Embodied Reasoning Question Answer (ERQA) Benchmark
Learn how to use CUA (our Computer Using Agent) via the API on multiple computer environments.
No fortress, purely open ground. OpenManus is Coming.
Official Implementation of Paper "ROCKET-2: Steering Visuomotor Policy via Cross-View Goal Alignment"
Decentralized Simulation Framework designed to integrate multiple advanced physics engines along with various photo-realistic graphics engines to simulate everything