Starred repositories
Emu Series: Generative Multimodal Models from BAAI
[ICCV 2023] ProPainter: Improving Propagation and Transformer for Video Inpainting
GRUtopia: Dream General Robots in a City at Scale
A Python framework for accelerated simulation, data generation and spatial computing.
Accelerating the development of large multimodal models (LMMs) with one-click evaluation module - lmms-eval.
[CVPR2025 Highlight] SeedVR: Seeding Infinity in Diffusion Transformer Towards Generic Video Restoration
Liquid: Language Models are Scalable and Unified Multi-modal Generators
Lumina-Image 2.0: A Unified and Efficient Image Generative Framework
Programmer's guide to cooking at home (Simplified Chinese only).
[ArXiv 2025] WORLDMEM: Long-term Consistent World Simulation with Memory
Roblox Foundation Model for 3D Intelligence
[ECCV 2024] The official repo for "Texture-GS: Disentangling the Geometry and Texture for 3D Gaussian Splatting Editing"
OctGPT: Octree-based Multiscale Autoregressive Models for 3D Shape Generation
Stereo4D dataset and processing code
[ECCV 2022] SimpleRecon: 3D Reconstruction Without 3D Convolutions
VARGPT-v1.1: Improve Visual Autoregressive Large Unified Model via Iterative Instruction Tuning and Reinforcement Learning
[CVPR2025] Progressive Rendering Distillation: Adapting Stable Diffusion for Instant Text-to-Mesh Generation without 3D Data
Python package for the evaluation of odometry and SLAM
This repo accompanies the research paper, ARKitScenes - A Diverse Real-World Dataset for 3D Indoor Scene Understanding Using Mobile RGB-D Data and contains the data, scripts to visualize and proces…
[CVPR 2024 - Oral, Best Paper Award Candidate] Marigold: Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation
[NeurIPS 2024] Depth Anything V2. A More Capable Foundation Model for Monocular Depth Estimation
Blender Python PLY importer for point clouds and nonstandard models.
A simple and elegant Jekyll theme for an academic personal homepage
deepbeepmeep / Wan2GP
Forked from Wan-Video/Wan2.1
Wan 2.1 for the GPU Poor
Infinite Photorealistic Worlds using Procedural Generation
Qwen2.5-Omni is an end-to-end multimodal model by Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, video, and performing real-time speech generation.