Stars
Official repository for LegoGPT, the first approach for generating physically stable LEGO brick models from text prompts.
[CVPR2025] RORem: Training a Robust Object Remover with Human-in-the-Loop
VisualCloze: A universal image generation framework that can support a wide range of in-domain tasks and generalize to unseen ones. (🔥 🔥 🔥 Merged into official pipelines of diffusers.)
🎬 3.7× faster video generation E2E 🖼️ 1.6× faster image generation E2E ⚡ ColumnSparseAttn 9.3× vs FlashAttn‑3 💨 ColumnSparseGEMM 2.5× vs cuBLAS
This repository provides a valuable reference for researchers in the field of multimodality. Start your exploration of RL-based Reasoning MLLMs here!
Official implementation code of the paper <AnyText2: Visual Text Generation and Editing With Customizable Attributes>
🎨 IMAGGarment-1: Fine-Grained Garment Generation with Controllable Structure, Color, and Logo. It supports precise and customizable garment synthesis guided by multi-conditions (e.g., sketch, colo…
MAGI-1: Autoregressive Video Generation at Scale
Implementation code of the paper MIGE: A Unified Framework for Multimodal Instruction-Based Image Generation and Editing
Project Page For "Seg-Zero: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement"
Official implementations for paper: PS-Diffusion: Photorealistic Subject-Driven Image Editing with Disentangled Control and Attention
UniAnimate-DiT: Human Image Animation with Large-Scale Video Diffusion Transformer
Unleashing the Power of Reinforcement Learning for Math and Code Reasoners
CogView4, CogView3-Plus and CogView3 (ECCV 2024)
🔥🔥 UNO: A Universal Customization Method for Both Single and Multi-Subject Conditioning
Code for Deep Single-image Portrait Image Relighting
Instruct-CLIP: Improving Instruction-Guided Image Editing with Automated Data Refinement Using Contrastive Learning (CVPR 2025)
[CVPR2025] Official implementation of High Fidelity Scene Text Synthesis.
GPT-ImgEval: Evaluating GPT-4o’s state-of-the-art image generation capabilities
MM-EUREKA: Exploring the Frontiers of Multimodal Reasoning with Rule-based Reinforcement Learning
⭐️ Reason-RFT: Reinforcement Fine-Tuning for Visual Reasoning.
Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey
SkyReels-A2: Compose anything in video diffusion transformers
Lumina-mGPT 2.0: Stand-alone Autoregressive Image Modeling