Stars
A curated list of state-of-the-art research in embodied AI, focusing on vision-language-action (VLA) models, vision-language navigation (VLN), and related multimodal learning approaches.
AL-Ref-SAM 2: Unleashing the Temporal-Spatial Reasoning Capacity of GPT for Training-Free Audio and Language Referenced Video Object Segmentation
[ICRA 2024]: Train your parkour robot in less than 20 hours.
✨✨Latest Advances on Multimodal Large Language Models
Project Page for "LISA: Reasoning Segmentation via Large Language Model"
Streamlit — A faster way to build and share data apps.
[CVPR 2023] Code release for "Aligning Bag of Regions for Open-Vocabulary Object Detection"
Object-Aware Distillation Pyramid for Open-Vocabulary Object Detection