Xinyu Huang xinyu1205

Ph.D. student at Fudan University. Homepage: xinyu1205.github.io

133 followers · 54 following

Fudan University
Shanghai, China
https://xinyu1205.github.io

Stars

facebookresearch / metamorph

Code for MetaMorph: Multimodal Understanding and Generation via Instruction Tuning

Python 178 6 Updated Apr 19, 2025

EvolvingLMMs-Lab / MGPO

High-Resolution Visual Reasoning via Multi-Turn Grounding-Based Reinforcement Learning

17 Updated May 28, 2025

Luodian / main-page-preview

Astro 2 2 Updated Jun 2, 2025

xinyu1205 / MGPO

High-Resolution Visual Reasoning via Multi-Turn Grounding-Based Reinforcement Learning

Python 4 Updated May 26, 2025

ByteDance-Seed / Bagel

Open-source unified multimodal model

Python 3,470 233 Updated May 30, 2025

stepfun-ai / Step1X-Edit

A SOTA open-source image editing model that aims to match the performance of closed-source models such as GPT-4o and Gemini 2 Flash.

Python 1,343 59 Updated May 28, 2025

zhijian-liu / torchprofile

A general and accurate MACs / FLOPs profiler for PyTorch models

Python 613 42 Updated May 5, 2024

NVlabs / Sana

SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer

Python 4,212 270 Updated May 20, 2025

JiuhaiChen / BLIP3o

Python 1,080 35 Updated May 30, 2025

bytedance / UI-TARS-desktop

A GUI agent application based on UI-TARS (a vision-language model) that lets you control your computer using natural language.

TypeScript 14,364 1,191 Updated Jun 2, 2025

ByteDance-Seed / Seed1.5-VL

Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning, achieving state-of-the-art performance on 38 out of 60 public benchmarks.

Jupyter Notebook 1,127 41 Updated May 21, 2025

QwenLM / Qwen3

Qwen3 is the large language model series developed by the Qwen team at Alibaba Cloud.

Shell 21,778 1,450 Updated May 29, 2025

IDEA-Research / DINO-X-API

DINO-X: The World's Top-Performing Vision Model for Open-World Object Detection and Understanding

Python 1,064 40 Updated May 26, 2025

black-forest-labs / flux

Official inference repo for FLUX.1 models

Python 21,933 1,560 Updated Feb 6, 2025

TAU-VAILab / Spice-E

This repo contains the Python code and the webpage HTML files for the Spice-E project from VAILab at TAU.

Jupyter Notebook 21 1 Updated Dec 9, 2024

River-Zhang / ICEdit

Image editing is worth a single LoRA! 0.1% training data for fantastic image editing! Training released! Surpasses GPT-4o in ID persistence! Official ComfyUI workflow release! Only 4GB VRAM is enou…

Python 1,612 92 Updated May 16, 2025

jamez-bondos / awesome-gpt4o-images

Awesome curated collection of images and prompts generated by GPT-4o and gpt-image-1. Explore AI generated visuals created with ChatGPT and Sora, showcasing OpenAI’s advanced image generation capab…

JavaScript 6,213 556 Updated May 26, 2025