yangcaoai

Yang Cao yangcaoai

Machine learning and computer vision

75 followers · 166 following

Stars

jxbbb / TOD3Cap

[ECCV 2024] TOD3Cap: Towards 3D Dense Captioning in Outdoor Scenes

Python 123 8 Updated Mar 1, 2025

jinpeng0528 / SEFE

Python 2 Updated May 6, 2025

Songwxuan / RSS2025-CVPR2025-ICLR2025-Embodied-AI-Paper-List

🔥RSS2025 & CVPR2025 & ICLR2025 Embodied AI Paper List Resources. Star ⭐ the repo and follow me if you like what you see 🤩.

273 6 Updated May 12, 2025

Hellod035 / LeggedLab

Direct IsaacLab Workflow for Legged Robots

Python 201 15 Updated May 3, 2025

daveredrum / ScanRefer

[ECCV 2020] ScanRefer: 3D Object Localization in RGB-D Scans using Natural Language

Python 260 28 Updated Feb 10, 2023

liudaizong / Awesome-3D-Visual-Grounding

😎 up-to-date & curated list of awesome 3D Visual Grounding papers, methods & resources.

172 5 Updated May 9, 2025

ibaiGorordo / vggt-pytorch-inference

Repository for running the VGGT model in PyTorch

Python 129 3 Updated Apr 20, 2025

Dao-AILab / flash-attention

Fast and memory-efficient exact attention

Python 17,321 1,677 Updated May 8, 2025

VARGPT-family / VARGPT-v1.1

VARGPT-v1.1: Improve Visual Autoregressive Large Unified Model via Iterative Instruction Tuning and Reinforcement Learning

Python 240 15 Updated Apr 15, 2025

devinxzhang / MFuser

[CVPR 2025 Highlight] Official code for paper "Mamba as a Bridge: Where Vision Foundation Models Meet Vision Language Models for Domain-Generalized Semantic Segmentation"

24 1 Updated Apr 7, 2025

facebookresearch / vggt

[CVPR 2025 Oral] VGGT: Visual Geometry Grounded Transformer

Python 6,525 649 Updated May 12, 2025

Nightmare-n / DepthAnyVideo

Depth Any Video with Scalable Synthetic Data (ICLR 2025)

Python 477 28 Updated Dec 4, 2024

harlanhong / ACTalker

ACTalker: an end-to-end video diffusion framework for talking head synthesis that supports both single and multi-signal control (e.g., audio, expression).

254 16 Updated Apr 19, 2025

manycore-research / SpatialLM

SpatialLM: Large Language Model for Spatial Understanding

Python 3,157 243 Updated Mar 28, 2025

yangcaoai / CoDA_NeurIPS2023

Official code for NeurIPS2023 paper: CoDA: Collaborative Novel Box Discovery and Cross-modal Alignment for Open-vocabulary 3D Object Detection

Jupyter Notebook 198 16 Updated May 2, 2025

Pointcept / Pointcept

Pointcept: a codebase for point cloud perception research. Latest works: Sonata (CVPR'25 Highlight), PTv3 (CVPR'24 Oral), PPT (CVPR'24), MSC (CVPR'23)

Python 2,118 244 Updated May 10, 2025

LiuJF1226 / GaussHDR

[CVPR 2025] GaussHDR: High Dynamic Range Gaussian Splatting via Learning Unified 3D and 2D Local Tone Mapping

Python 20 1 Updated May 8, 2025

ayushjain1144 / odin

Code for the paper: "ODIN: A Single Model for 2D and 3D Segmentation" (CVPR 2024)

Python 149 14 Updated Apr 14, 2025

JonasSchult / Mask3D

Mask3D predicts accurate 3D semantic instances achieving state-of-the-art on ScanNet, ScanNet200, S3DIS and STPLS3D.

Python 617 118 Updated Oct 29, 2023

Pixie8888 / MVSDet

Code for NeurIPS 2024 work "MVSDet: Multi-View Indoor 3D Object Detection via Efficient Plane Sweeps"

Python 14 1 Updated Dec 11, 2024

staymylove / Dyve

Python 9 Updated Feb 26, 2025

zhongyingji / guidedvd-3dgs

Taming Video Diffusion Prior with Scene-Grounding Guidance for 3D Gaussian Splatting from Sparse Inputs (CVPR2025 Highlight)

60 3 Updated Apr 6, 2025

xuxw98 / ESAM

[ICLR 2025, Oral] EmbodiedSAM: Online Segment Any 3D Thing in Real Time

Python 499 25 Updated May 7, 2025

HumanMLLM / HumanOmni

HumanOmni

Python 161 8 Updated Mar 10, 2025

MiZhenxing / ThinkDiff

I Think, Therefore I Diffuse: Enabling Multimodal In-Context Reasoning in Diffusion Models

148 4 Updated Feb 19, 2025

NJU-3DV / FDS

[ICLR 2025] Flow Distillation Sampling: Regularizing 3D Gaussians with Pre-trained Matching Priors

Python 57 1 Updated Mar 4, 2025

deepseek-ai / DeepSeek-V3

Python 96,660 15,720 Updated Apr 9, 2025

TianxingChen / Embodied-AI-Guide

[Lumina Embodied AI Community] Embodied AI Technical Guide (Embodied-AI-Guide)

5,026 325 Updated May 9, 2025

SenseTime-FVG / OpenDWM

An open source code repository of driving world models, with training, inferencing, evaluation tools, and pretrained checkpoints.

Python 233 35 Updated May 6, 2025

jonyzhang2023 / awesome-embodied-vla-va-vln

A curated list of state-of-the-art research in embodied AI, focusing on vision-language-action (VLA) models, vision-language navigation (VLN), and related multimodal learning approaches.

603 29 Updated May 12, 2025
