CVC, UAB
Barcelona
UTC +02:00
wangkai930418.github.io
https://orcid.org/0000-0002-9605-8279
Starred repositories
[TMLR 2025] Latte: Latent Diffusion Transformer for Video Generation.
[ICLR'24 Spotlight] Uni3D: 3D Visual Representation from BAAI
[CVPR2024] Diffusion-based Blind Text Image Super-Resolution (Official)
StyleAR: Customizing Multimodal Autoregressive Model for Style-Aligned Text-to-Image Generation
Large World Model -- Modeling Text and Video with Million-Length Context
Awesome Unified Multimodal Models
The code of the paper "Free-Lunch Color-Texture Disentanglement for Stylized Image Generation"
Open-source Multi-agent Poster Generation from Papers
GenEval: An object-focused framework for evaluating text-to-image alignment
Code for "Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models".
Lumina-mGPT 2.0: Stand-alone Autoregressive Image Modeling
Official Implementation of Diffusion Step Annealing (DiSA) in Autoregressive Image Generation
Codebase for "Jodi: Unification of Visual Generation and Understanding via Joint Modeling"
LHM: Large Animatable Human Reconstruction Model from a Single Image in Seconds
[ICLR 2025] Official code implementation of DreamBench++: A Human-Aligned Benchmark for Personalized Image Generation
MMaDA - Open-Sourced Multimodal Large Diffusion Language Models
DreamO: A Unified Framework for Image Customization
[ICLR 2025] CatVTON is a simple and efficient virtual try-on diffusion model with 1) Lightweight Network (899.06M parameters in total), 2) Parameter-Efficient Training (49.57M parameters trainable) …
[ICLR 2025] Official implementation of "Diffusion-NPO: Negative Preference Optimization for Better Preference Aligned Generation of Diffusion Models"
Step1X-3D: Towards High-Fidelity and Controllable Generation of Textured 3D Assets
[Preprint] UCGM: Unified Continuous Generative Models
Image editing is worth a single LoRA! 0.1% training data for fantastic image editing! Training released! Surpasses GPT-4o in ID persistence! Official ComfyUI workflow released! Only 4GB VRAM is enou…
Expressive Gaussian Human Avatars from Monocular RGB Video (NeurIPS 2024)
[CVPR 2025 Oral] PyTorch re-implementation for Autoregressive Distillation of Diffusion Transformers (ARD).
This repo contains the official implementation of the ICLR 2024 paper "Is ImageNet worth 1 video? Learning strong image encoders from 1 long unlabelled video"