Stars
G1: Bootstrapping Perception and Reasoning Abilities of Vision-Language Model via Reinforcement Learning
High-quality streaming speech-to-speech interactive agent in a single file. A streaming, full-duplex voice-interaction prototype agent implemented in just one file!
The development and future prospects of multimodal reasoning models.
Cosmos-Transfer1 is a world-to-world transfer model designed to bridge the perceptual divide between simulated and real-world environments.
[CVPR 2025 Best Paper Award] VGGT: Visual Geometry Grounded Transformer
Witness the aha moment of VLM with less than $3.
Fully open reproduction of DeepSeek-R1
Infinite Photorealistic Worlds using Procedural Generation
3D-Adapter: Geometry-Consistent Multi-View Diffusion for High-Quality 3D Generation
[NeurIPS 2024] Animate3D: Animating Any 3D Model with Multi-view Video Diffusion
[ECCV 2024] Code for VFusion3D: Learning Scalable 3D Generative Models from Video Diffusion Models
A collection of papers on diffusion models for 3D generation.
Code for "GVHMR: World-Grounded Human Motion Recovery via Gravity-View Coordinates", SIGGRAPH Asia 2024
A growing curation of Text-to-3D and Diffusion-to-3D works.
[CVPR 2025 Highlight] 3DTopia-XL: High-Quality 3D PBR Asset Generation via Primitive Diffusion
[ICML 2024] Official code repository for 3D embodied generalist agent LEO
g1: Using Llama-3.1 70b on Groq to create o1-like reasoning chains
SF3D: Stable Fast 3D Mesh Reconstruction with UV-unwrapping and Illumination Disentanglement
Unified framework for robot learning built on NVIDIA Isaac Sim