- The University of Hong Kong
- https://scholar.google.com/citations?user=JW4F5HoAAAAJ&hl
Stars
LimSim & LimSim++: Integrated traffic and autonomous driving simulators with (M)LLM support
[CVPR 2025 Oral] Reconstruction vs. Generation: Taming Optimization Dilemma in Latent Diffusion Models
[CVPR 2025 Oral] VGGT: Visual Geometry Grounded Transformer
TesserAct: Learning 4D Embodied World Models
Kimi-Audio, an open-source audio foundation model excelling in audio understanding, generation, and conversation
🌍 WorldGen - Generate Any 3D Scene in Seconds
RoboFactory: Exploring Embodied Agent Collaboration with Compositional Constraints
3D Gaussian Splatting (3DGS) on fisheye cameras
[NeurIPS 2024] MotionGS: Exploring Explicit Motion Guidance for Deformable 3D Gaussian Splatting
MAGI-1: Autoregressive Video Generation at Scale
HunyuanVideo-I2V: A Customizable Image-to-Video Model based on HunyuanVideo
State-of-the-art Image & Video CLIP, Multimodal Large Language Models, and More!
[ECCV 2024] Pixel-GS: Density Control with Pixel-aware Gradient for 3D Gaussian Splatting
[CVPR 2025] Lift3D Foundation Policy: Lifting 2D Large-Scale Pretrained Models for Robust 3D Robotic Manipulation
[CVPR 2025 Oral] PyTorch re-implementation for Autoregressive Distillation of Diffusion Transformers (ARD).
[CVPR 2025 Highlight] VideoScene: Distilling Video Diffusion Model to Generate 3D Scenes in One Step
Official implementation of the paper: REPA-E: Unlocking VAE for End-to-End Tuning of Latent Diffusion Transformers
Official repo for "GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for Autoregressive Image Generation"
A unified framework for 3D content generation.
[ICLR'25 Oral] Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think
Unifying 3D Mesh Generation with Language Models
[ArXiv 2025] Geo4D: Leveraging Video Generators for Geometric 4D Scene Reconstruction
Code for "BoxDreamer: Dreaming Box Corners for Generalizable Object Pose Estimation", arXiv 2025.
Train your AI self, amplify you, bridge the world
[AAAI 2025] Official implementation of "DrivingForward: Feed-forward 3D Gaussian Splatting for Driving Scene Reconstruction from Flexible Surround-view Input"
Official implementation of NeurIPS 2024 paper: "FreeSplat: Generalizable 3D Gaussian Splatting Towards Free-View Synthesis of Indoor Scenes"
RoboVerse: Towards a Unified Platform, Dataset and Benchmark for Scalable and Generalizable Robot Learning