Stars
🚀 [Large Models] Train a 26M-parameter vision-language model (VLM) from scratch in just 1 hour!
🚀🚀 [Large Models] Train a small 26M-parameter GPT entirely from scratch in just 2 hours!
📖 This is a repository for organizing papers, code, and other resources related to unified multimodal models.
[CVPR 2025 Best Paper Nomination] FoundationStereo: Zero-Shot Stereo Matching
"VicaSplat: A Single Run is All You Need for 3D Gaussian Splatting and Camera Estimation from Unposed Video Frames"
🐍 Geometric Computer Vision Library for Spatial AI
ROMAN is a view-invariant global localization method that maps open-set objects and uses the geometry, shape, and semantics of objects to find the transformation between a current pose and previous…
PyTorch Lightning + Hydra. A very user-friendly template for ML experimentation. ⚡🔥⚡
🔥 Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos
[CVPR2025] Omni-Scene: Omni-Gaussian Representation for Ego-Centric Sparse-View Scene Reconstruction
The most powerful and modular diffusion model GUI, API, and backend with a graph/nodes interface.
A generative world for general-purpose robotics & embodied AI learning.
🌟 A curated list of DUSt3R-related papers and resources, tracking recent advancements using this geometric foundation model.
A 3D Gaussian Splatting framework with various derived algorithms and an interactive web viewer
CUDA accelerated rasterization of gaussian splatting
Unofficial implementation of 3D Gaussian Splatting in PyTorch + CUDA with MIT license
arXiv LaTeX Cleaner: Easily clean the LaTeX code of your paper to submit to arXiv
[TPAMI 2023] SceneDreamer: Unbounded 3D Scene Generation from 2D Image Collections
Curated list of papers and resources focused on 3D Gaussian Splatting, intended to keep pace with the anticipated surge of research in the coming months.
A growing curation of Text-to-3D, Diffusion-to-3D works.
EmotiVoice 😊: a Multi-Voice and Prompt-Controlled TTS Engine
Draw a mockup and generate HTML for it