Stars
[ICLR 2025] "Temporal Reasoning Transfer from Text to Video", Lei Li, Yuanxin Liu, Linli Yao, Peiyuan Zhang, Chenxin An, Lean Wang, Xu Sun, Lingpeng Kong, Qi Liu
Code and data for "Does Spatial Cognition Emerge in Frontier Models?"
A Python module to repair invalid JSON from LLMs
A Python library for creating and solving mazes.
A high-throughput and memory-efficient inference and serving engine for LLMs
Tile primitives for speedy kernels
VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and cloud.
Habitat-Web is a web application to collect human demonstrations for embodied tasks on Amazon Mechanical Turk (AMT) using the Habitat simulator.
Pyroomacoustics is a package for audio signal processing for indoor applications. It was developed as a fast prototyping platform for beamforming algorithms in indoor scenarios.
Python library for loading and using triangular meshes.
Official implementation of the paper "LangSplat: 3D Language Gaussian Splatting" [CVPR2024 Highlight]
[CVPR 2023] Code and datasets for 'Chat2Map: Efficient Scene Mapping from Multi-Ego Conversations'
[ICCV 2023] PEANUT: Predicting and Navigating to Unseen Targets
Masked Diffusion Transformer is the SOTA for image synthesis (ICCV 2023).
A comprehensive collection of papers using large language/multi-modal models for Robotics/RL, including papers, code, and related websites
Hackable and optimized Transformer building blocks, supporting composable construction.
Code and models for the paper "One Transformer Fits All Distributions in Multi-Modal Diffusion"
Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything
Code for CVPR 2023 paper "Procedure-Aware Pretraining for Instructional Video Understanding"
PyTorch code and models for the DINOv2 self-supervised learning method.
JARVIS, a system to connect LLMs with the ML community. Paper: https://arxiv.org/pdf/2303.17580.pdf
The repository provides code for running inference with the SegmentAnything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
The repository for the largest and most comprehensive empirical study of visual foundation models for Embodied AI (EAI).
NaQ: Leveraging Narrations as Queries to Supervise Episodic Memory. CVPR 2023.
[CVPR 2023] vMAP: Vectorised Object Mapping for Neural Field SLAM