Stars
PyTorch code for binary segmentation on the CelebAMask-HQ dataset, via both a UNet written from scratch and a pretrained DeepLabv3 model.
A large-scale face dataset for face parsing, recognition, generation and editing.
Collection of awesome medical dataset resources.
Qwen3 is the large language model series developed by the Qwen team at Alibaba Cloud.
The MCP Code Executor is an MCP server that allows LLMs to execute Python code within a specified Conda environment.
Scaling Inference-Time Optimization for Text-to-Image Diffusion Models via Reflection Tuning
[NeurIPS 2023] DDCoT: Duty-Distinct Chain-of-Thought Prompting for Multimodal Reasoning in Language Models
A Self-Training Framework for Vision-Language Reasoning
Official implementation for "Multimodal Chain-of-Thought Reasoning in Language Models" (stay tuned; more updates to come)
Contrastive Chain-of-Thought Prompting
Code for the AAAI'24 paper "Graph Neural Prompting with Large Language Models".
Accelerating the development of large multimodal models (LMMs) with lmms-eval, a one-click evaluation module.
Qwen2.5-VL is the multimodal large language model series developed by the Qwen team at Alibaba Cloud.
[CVPR 2025 Highlight] Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models
[NeurIPS 2024] HairDiffusion: Vivid Multi-Colored Hair Editing via Latent Diffusion
😎 A list of awesome scene understanding papers.
Official implementation for "Enhancing Semantics in Multimodal Chain of Thought via Soft Negative Sampling"
Fully open reproduction of DeepSeek-R1
[CVPR 2025 Oral] PyTorch re-implementation for Autoregressive Distillation of Diffusion Transformers (ARD).
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o performance.
[CVPR 2025 Highlight] PAR: Parallelized Autoregressive Visual Generation. https://yuqingwang1029.github.io/PAR-project
This repository provides a valuable reference for researchers in the field of multimodality; start your exploration of RL-based reasoning MLLMs here!
Code for "CoMT: A Novel Benchmark for Chain of Multi-modal Thought on Large Vision-Language Models"
A curated list of awesome resources on Personalized Large Multimodal Models