MAGI-1: Autoregressive Video Generation at Scale
State-of-the-art Image & Video CLIP, Multimodal Large Language Models, and More!
[CVPR 2025 Oral] Infinity ∞ : Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis
A generative world for general-purpose robotics & embodied AI learning.
Implement a ChatGPT-like LLM in PyTorch from scratch, step by step
[NeurIPS 2024] Code release for "Segment Anything without Supervision"
Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation
Official Implementation for CVPR 2024 paper: CLIP as RNN: Segment Countless Visual Concepts without Training Endeavor
[CVPR2024, Highlight] Official code for DragDiffusion
[NeurIPS 2024 Best Paper][GPT beats diffusion🔥] [scaling laws in visual generation📈] Official impl. of "Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction". An *ult…
A Multi-Modal Large Language Model with Retrieval-augmented In-context Learning capacity designed for generalisable and explainable end-to-end driving
OMG-LLaVA and OMG-Seg codebase [CVPR-24 and NeurIPS-24]
[ICLR 2024 Spotlight] Official implementation of ScaleCrafter for higher-resolution visual generation at inference time.
Official Code for MotionCtrl [SIGGRAPH 2024]
IMProv: Inpainting-based Multimodal Prompting for Computer Vision Tasks
[ICCV 2023 Oral] "FateZero: Fusing Attentions for Zero-shot Text-based Video Editing"
[ICLR 2024] Real-Fake: Effective Training Data Synthesis Through Distribution Matching
[arXiv:2309.16669] Code release for "Training a Large Video Model on a Single Machine in a Day"
Official release of InternLM series (InternLM, InternLM2, InternLM2.5, InternLM3).
😂😂😂Official Implementation for ICCV 2023 paper: OxfordTVG-HIC: Can Machine Make Humorous Captions from Images?
[ICLR 2024 Spotlight] Unified Human-Scene Interaction via Prompted Chain-of-Contacts
[ECCV 2024 Best Paper Candidate] PointLLM: Empowering Large Language Models to Understand Point Clouds
[NeurIPS 2023] This repo contains the code for our paper Convolutions Die Hard: Open-Vocabulary Segmentation with Single Frozen Convolutional CLIP
A PyTorch re-implementation of ECCV 2022 paper based on Detectron2: k-means Mask Transformer.
Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything