Tsinghua University
- Beijing
Starred repositories
[CVPR 2025] Mani-GS: Gaussian Splatting Manipulation with Triangular Mesh
[RSS 2025] Learning to Act Anywhere with Task-centric Latent Actions
PlaceIt3D: Language-Guided Object Placement in Real 3D Scenes
A curated list of awesome 3D scene generation papers. (arXiv 2505.05474)
[NeurIPS 2024] Repo for the "Visualization-of-Thought" dataset, construction code, and evaluation.
Official implementation of the CVPR 2025 paper "Decompositional Neural Scene Reconstruction with Generative Diffusion Prior"
Qwen3 is the large language model series developed by Qwen team, Alibaba Cloud.
PE3R: Perception-Efficient 3D Reconstruction. Take 2-3 photos with your phone, upload them, wait a few minutes, and then start exploring your 3D world via text!
Implementing DeepSeek R1's GRPO algorithm from scratch
🚀 [Large models] Train a 26M-parameter vision-language model (VLM) from scratch in just 1 hour!
[Lumina Embodied AI Community] Embodied AI Technical Guide (Embodied-AI-Guide)
Code for "Multi-view Reconstruction via SfM-guided Monocular Depth Estimation". CVPR 2025 (Oral Presentation)
[CVPR 2025 Best Paper Nomination] FoundationStereo: Zero-Shot Stereo Matching
Awesome RL Reasoning Recipes ("Triple R")
Calculating the actual value of your job beyond just salary
[ICLR 2025] EditRoom: LLM-parameterized Graph Diffusion for Composable 3D Room Layout Editing
BIP3D: Bridging 2D Images and 3D Perception for Embodied Intelligence
Code release for the CVPR 2025 paper "GaussianUDF: Inferring Unsigned Distance Functions through 3D Gaussian Splatting"
Implementation of the project: SceneSplat - Gaussian Splatting-based Scene Understanding with Vision-Language Pretraining
[CVPR 2025] MonoInstance: Enhancing Monocular Priors via Multi-view Instance Alignment for Neural Rendering and Reconstruction
Towards a Training-Free Approach for 3D Scene Editing
SpatialLM: Large Language Model for Spatial Understanding
MetaSpatial leverages reinforcement learning to enhance 3D spatial reasoning in vision-language models (VLMs), enabling more structured, realistic, and adaptive scene generation for applications in…
[CVPR 2025 Best Paper Award Candidate] VGGT: Visual Geometry Grounded Transformer
[CVPR 2025] Fast3R: Towards 3D Reconstruction of 1000+ Images in One Forward Pass
Extend OpenRLHF to support LMM RL training for reproduction of DeepSeek-R1 on multimodal tasks.
[CVPR 2025] MIDI: Multi-Instance Diffusion for Single Image to 3D Scene Generation