Martinser

🎯

Focusing

Ge Wu Martinser

49 followers · 345 following

Nankai University

Starred repositories

zcablii / SARDet_100K

[NeurIPS 2024 spotlight] Official implementation of MSFA and release of SARDet_100K dataset for Large-Scale Synthetic Aperture Radar (SAR) Object Detection

Python 565 32 Updated May 6, 2025

zcablii / SM3Det

Official implementation of "SM3Det: A Unified Model for Multi-Modal Remote Sensing Object Detection"

Python 217 6 Updated Feb 2, 2025

zcablii / LSKNet

(IJCV2024 & ICCV2023) LSKNet: A Foundation Lightweight Backbone for Remote Sensing

Python 589 47 Updated Feb 10, 2025

showlab / Awesome-Unified-Multimodal-Models

📖 This is a repository for organizing papers, codes and other resources related to unified multimodal models.

617 32 Updated Jun 27, 2025

exped1230 / Awesome_Image_Generation_with_Thinking

Resources and paper list for "Image Generation with Thinking", with a particular focus on the use of reinforcement learning.

13 1 Updated Jul 14, 2025

AIDC-AI / Ovis-U1

A unified model that seamlessly integrates multimodal understanding, text-to-image generation, and image editing within a single powerful framework.

Python 347 9 Updated Jul 8, 2025

VectorSpaceLab / OmniGen2

OmniGen2: Exploration to Advanced Multimodal Generation.

Jupyter Notebook 3,385 264 Updated Jul 5, 2025

NJU-PCALab / RAG-Diffusion

[ICCV 2025] Region-Aware Text-to-Image Generation via Hard Binding and Soft Refinement 🔥

Python 585 26 Updated Jun 26, 2025

MCG-NJU / DDT

DDT: Decoupled Diffusion Transformer

Python 264 15 Updated Jul 3, 2025

Martinser / REG

Python 55 6 Updated Jul 10, 2025

mayuelala / Awesome-Controllable-Video-Generation

🚀🚀🚀A curated list of papers on controllable video generation.

299 22 Updated Jul 8, 2025

FoundationVision / Liquid

Liquid: Language Models are Scalable and Unified Multi-modal Generators

Python 601 34 Updated Apr 8, 2025

PKU-YuanGroup / UniWorld-V1

UniWorld: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation

Python 636 20 Updated Jul 1, 2025

cambrian-mllm / cambrian

Cambrian-1 is a family of multimodal LLMs with a vision-centric design.

Python 1,924 130 Updated Oct 30, 2024

ML-GSAI / Diffusion-LLM-Papers

A Collection of Papers on Diffusion Language Models

90 Updated Jul 4, 2025

Gen-Verse / MMaDA

MMaDA - Open-Sourced Multimodal Large Diffusion Language Models

Python 1,191 55 Updated Jun 13, 2025

JiuhaiChen / BLIP3o

Python 1,274 49 Updated Jul 11, 2025

AIDC-AI / Awesome-Unified-Multimodal-Models

Awesome Unified Multimodal Models

454 11 Updated Jul 2, 2025

deepseek-ai / Janus

Janus-Series: Unified Multimodal Understanding and Generation Models

Python 17,450 2,243 Updated Feb 1, 2025

ByteDance-Seed / Bagel

Open-source unified multimodal model

Python 4,550 384 Updated Jul 2, 2025

lxa9867 / Awesome-Autoregressive-Visual-Generation

This is a repo to track the latest autoregressive visual generation papers.

369 5 Updated Jun 25, 2025

VainF / TinyFusion

[CVPR 2025 Highlight] TinyFusion: Diffusion Transformers Learned Shallow

Python 130 1 Updated Apr 5, 2025

AMAP-ML / USP

USP: Unified Self-Supervised Pretraining for Image Generation and Understanding

Python 72 Updated Jun 30, 2025

microsoft / LoRA

Code for loralib, an implementation of "LoRA: Low-Rank Adaptation of Large Language Models"

Python 12,239 781 Updated Dec 17, 2024

Video-R1 / Awesome-Multimodal-Reasoning

Collections of Papers and Projects for Multimodal Reasoning.

105 9 Updated Apr 25, 2025

willisma / SiT

Official PyTorch Implementation of "SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers"

Python 908 55 Updated Mar 12, 2024

ShenZhang-Shin / LEDiT

PyTorch Implementation of "LEDiT: Your Length-Extrapolatable Diffusion Transformer without Positional Encoding"

19 1 Updated Mar 7, 2025

zhixuan-lin / forgetting-transformer

[ICLR 2025] Official PyTorch implementation of "Forgetting Transformer: Softmax Attention with a Forget Gate"

Python 116 7 Updated Jul 5, 2025

yuecao0119 / MMFuser

The official implementation of the paper "MMFuser: Multimodal Multi-Layer Feature Fuser for Fine-Grained Vision-Language Understanding". MMFuser addresses the limitations of current MLLMs in captur…

Lists (29)

3D

clip

CoT

datasets

DETR

Diffusion

🔮 Future ideas

GAN

GPT

latex

Linear attention

Lora

MAE

mamba

Mixup/Cutmix

MLP

Moe

Network

NLP

one

Optimizers

OVSS+OVD

RNN

SAM

Semantic Segmentation

Uncertainty

Wait

work

Writing
