Nankai University
Lists (29)
3D
clip
CoT
datasets
DETR
Diffusion
🔮 Future ideas
GAN
GPT
latex
Linear attention
Lora
MAE
mamba
Mixup/Cutmix
MLP
Moe
Network
NLP
one
Optimizers
OVSS+OVD
RNN
SAM
Semantic Segmentation
Uncertainty
Wait
work
Writing
Starred repositories
[NeurIPS 2024 spotlight] Official implementation of MSFA and release of SARDet_100K dataset for Large-Scale Synthetic Aperture Radar (SAR) Object Detection
Official implementation of "SM3Det: A Unified Model for Multi-Modal Remote Sensing Object Detection"
(IJCV2024 & ICCV2023) LSKNet: A Foundation Lightweight Backbone for Remote Sensing
📖 This is a repository for organizing papers, codes and other resources related to unified multimodal models.
Resources and paper list for "Image Generation with Thinking", with a particular focus on the use of reinforcement learning.
A unified model that seamlessly integrates multimodal understanding, text-to-image generation, and image editing within a single powerful framework.
OmniGen2: Exploration to Advanced Multimodal Generation.
[ICCV 2025] Region-Aware Text-to-Image Generation via Hard Binding and Soft Refinement 🔥
🚀🚀🚀 A curated list of papers on controllable video generation.
Liquid: Language Models are Scalable and Unified Multi-modal Generators
UniWorld: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
A Collection of Papers on Diffusion Language Models
MMaDA - Open-Sourced Multimodal Large Diffusion Language Models
Awesome Unified Multimodal Models
Janus-Series: Unified Multimodal Understanding and Generation Models
This is a repo to track the latest autoregressive visual generation papers.
[CVPR 2025 Highlight] TinyFusion: Diffusion Transformers Learned Shallow
USP: Unified Self-Supervised Pretraining for Image Generation and Understanding
Code for loralib, an implementation of "LoRA: Low-Rank Adaptation of Large Language Models"
Collections of Papers and Projects for Multimodal Reasoning.
Official PyTorch Implementation of "SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers"
PyTorch Implementation of "LEDiT: Your Length-Extrapolatable Diffusion Transformer without Positional Encoding"
[ICLR 2025] Official PyTorch implementation of "Forgetting Transformer: Softmax Attention with a Forget Gate"
The official implementation of the paper "MMFuser: Multimodal Multi-Layer Feature Fuser for Fine-Grained Vision-Language Understanding". MMFuser addresses the limitations of current MLLMs in captur…