yangcaoai (Yang Cao) / Starred · GitHub

[ECCV 2024] TOD3Cap: Towards 3D Dense Captioning in Outdoor Scenes

Python · 123 stars · 8 forks · Updated Mar 1, 2025

Python · 2 stars · Updated May 6, 2025

🔥RSS2025 & CVPR2025 & ICLR2025 Embodied AI Paper List Resources. Star ⭐ the repo and follow me if you like what you see 🤩.

273 stars · 6 forks · Updated May 12, 2025

Direct IsaacLab Workflow for Legged Robots

Python · 201 stars · 15 forks · Updated May 3, 2025

[ECCV 2020] ScanRefer: 3D Object Localization in RGB-D Scans using Natural Language

Python · 260 stars · 28 forks · Updated Feb 10, 2023

😎 up-to-date & curated list of awesome 3D Visual Grounding papers, methods & resources.

172 stars · 5 forks · Updated May 9, 2025

Repository for running the VGGT model in PyTorch

Python · 129 stars · 3 forks · Updated Apr 20, 2025

Fast and memory-efficient exact attention

Python · 17,321 stars · 1,677 forks · Updated May 8, 2025

VARGPT-v1.1: Improve Visual Autoregressive Large Unified Model via Iterative Instruction Tuning and Reinforcement Learning

Python · 240 stars · 15 forks · Updated Apr 15, 2025

[CVPR 2025 Highlight] Official code for paper "Mamba as a Bridge: Where Vision Foundation Models Meet Vision Language Models for Domain-Generalized Semantic Segmentation"

24 stars · 1 fork · Updated Apr 7, 2025

[CVPR 2025 Oral] VGGT: Visual Geometry Grounded Transformer

Python · 6,525 stars · 649 forks · Updated May 12, 2025

Depth Any Video with Scalable Synthetic Data (ICLR 2025)

Python · 477 stars · 28 forks · Updated Dec 4, 2024

ACTalker: an end-to-end video diffusion framework for talking head synthesis that supports both single and multi-signal control (e.g., audio, expression).

254 stars · 16 forks · Updated Apr 19, 2025

SpatialLM: Large Language Model for Spatial Understanding

Python · 3,157 stars · 243 forks · Updated Mar 28, 2025

Official code for NeurIPS2023 paper: CoDA: Collaborative Novel Box Discovery and Cross-modal Alignment for Open-vocabulary 3D Object Detection

Jupyter Notebook · 198 stars · 16 forks · Updated May 2, 2025

Pointcept: a codebase for point cloud perception research. Latest works: Sonata (CVPR'25 Highlight), PTv3 (CVPR'24 Oral), PPT (CVPR'24), MSC (CVPR'23)

Python · 2,118 stars · 244 forks · Updated May 10, 2025

[CVPR 2025] GaussHDR: High Dynamic Range Gaussian Splatting via Learning Unified 3D and 2D Local Tone Mapping

Python · 20 stars · 1 fork · Updated May 8, 2025

Code for the paper: "ODIN: A Single Model for 2D and 3D Segmentation" (CVPR 2024)

Python · 149 stars · 14 forks · Updated Apr 14, 2025

Mask3D predicts accurate 3D semantic instances achieving state-of-the-art on ScanNet, ScanNet200, S3DIS and STPLS3D.

Python · 617 stars · 118 forks · Updated Oct 29, 2023

Code for NeurIPS 2024 work "MVSDet: Multi-View Indoor 3D Object Detection via Efficient Plane Sweeps"

Python · 14 stars · 1 fork · Updated Dec 11, 2024

Python · 9 stars · Updated Feb 26, 2025

Taming Video Diffusion Prior with Scene-Grounding Guidance for 3D Gaussian Splatting from Sparse Inputs (CVPR2025 Highlight)

60 stars · 3 forks · Updated Apr 6, 2025

[ICLR 2025, Oral] EmbodiedSAM: Online Segment Any 3D Thing in Real Time

Python · 499 stars · 25 forks · Updated May 7, 2025

HumanOmni

Python · 161 stars · 8 forks · Updated Mar 10, 2025

I Think, Therefore I Diffuse: Enabling Multimodal In-Context Reasoning in Diffusion Models

148 stars · 4 forks · Updated Feb 19, 2025

[ICLR 2025] Flow Distillation Sampling: Regularizing 3D Gaussians with Pre-trained Matching Priors

Python · 57 stars · 1 fork · Updated Mar 4, 2025

[Lumina Embodied AI Community] Embodied AI Technical Guide (Embodied-AI-Guide)

5,026 stars · 325 forks · Updated May 9, 2025

An open source code repository of driving world models, with training, inferencing, evaluation tools, and pretrained checkpoints.

Python · 233 stars · 35 forks · Updated May 6, 2025

A curated list of state-of-the-art research in embodied AI, focusing on vision-language-action (VLA) models, vision-language navigation (VLN), and related multimodal learning approaches.

603 stars · 29 forks · Updated May 12, 2025