Stars
[CVPR 2025 Best Paper Award] VGGT: Visual Geometry Grounded Transformer
OmniSpatial: Towards Comprehensive Spatial Reasoning Benchmark for Vision Language Models
[RSS 2025] Learning to Act Anywhere with Task-centric Latent Actions
Open-source Multi-agent Poster Generation from Papers
[3DV'25] WaterSplatting: Fast Underwater 3D Scene Reconstruction Using Gaussian Splatting
[Embodied-AI-Survey-2025] Paper List and Resource Repository for Embodied AI
Pointers to large-scale underwater datasets and relevant resources.
The Project of ECCV 2024 Oral Paper "Oriented Object Detection via Point-Axis Representation"
[IEEE GRSM 2025 🔥] "Text2Earth: Unlocking Text-driven Remote Sensing Image Generation with a Global-Scale Dataset and a Foundation Model"
AeroGen: Enhancing Remote Sensing Object Detection with Diffusion-Driven Data Generation
[NeurIPS 2024] Understanding Multi-Granularity for Open-Vocabulary Part Segmentation
This is the official repository for the paper "Visually-Prompted Language Model for Fine-Grained Scene Graph Generation in an Open World" (Accepted by ICCV 2023)
[NeurIPS2023] Code release for "Hierarchical Open-vocabulary Universal Image Segmentation"
EntitySeg Toolbox: Towards Open-World and High-Quality Image Segmentation
A generative world for general-purpose robotics & embodied AI learning.
OMG-LLaVA and OMG-Seg codebase [CVPR-24 and NeurIPS-24]
[CVPR 2024] Task-aligned Part-aware Panoptic Segmentation through Joint Object-Part Representations
[ECCV-2022] The First Unified End-to-End System for Panoptic Part Segmentation
This is a repository for listing papers on scene graph generation and application.
Official implementation of CVPR 2024 paper: "FreeControl: Training-Free Spatial Control of Any Text-to-Image Diffusion Model with Any Condition"
[CVPR2024] StableVITON: Learning Semantic Correspondence with Latent Diffusion Model for Virtual Try-On
[ECCV 2024] Official implementation of the paper "X-Pose: Detecting Any Keypoints"
Official code of SmartEdit [CVPR-2024 Highlight]