xinyu1205 (Xinyu Huang) / Starred · GitHub

Code for MetaMorph Multimodal Understanding and Generation via Instruction Tuning

Python 178 6 Updated Apr 19, 2025

High-Resolution Visual Reasoning via Multi-Turn Grounding-Based Reinforcement Learning

17 Updated May 28, 2025
Astro 2 2 Updated Jun 2, 2025

High-Resolution Visual Reasoning via Multi-Turn Grounding-Based Reinforcement Learning

Python 4 Updated May 26, 2025

Open-source unified multimodal model

Python 3,470 233 Updated May 30, 2025

A SOTA open-source image editing model, which aims to provide comparable performance against the closed-source models like GPT-4o and Gemini 2 Flash.

Python 1,343 59 Updated May 28, 2025

A general and accurate MACs / FLOPs profiler for PyTorch models

Python 613 42 Updated May 5, 2024

SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer

Python 4,212 270 Updated May 20, 2025
Python 1,080 35 Updated May 30, 2025

A GUI Agent application based on UI-TARS (Vision-Language Model) that allows you to control your computer using natural language.

TypeScript 14,364 1,191 Updated Jun 2, 2025

Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning, achieving state-of-the-art performance on 38 out of 60 public benchmarks.

Jupyter Notebook 1,127 41 Updated May 21, 2025

Qwen3 is the large language model series developed by Qwen team, Alibaba Cloud.

Shell 21,778 1,450 Updated May 29, 2025

DINO-X: The World's Top-Performing Vision Model for Open-World Object Detection and Understanding

Python 1,064 40 Updated May 26, 2025

Official inference repo for FLUX.1 models

Python 21,933 1,560 Updated Feb 6, 2025

This repo contains the Python code as well as the webpage HTML files for the Spice-E project from VAILab at TAU.

Jupyter Notebook 21 1 Updated Dec 9, 2024

Image editing is worth a single LoRA! 0.1% training data for fantastic image editing! Training released! Surpasses GPT-4o in ID persistence! Official ComfyUI workflow release! Only 4GB VRAM is enou…

Python 1,612 92 Updated May 16, 2025

Awesome curated collection of images and prompts generated by GPT-4o and gpt-image-1. Explore AI generated visuals created with ChatGPT and Sora, showcasing OpenAI’s advanced image generation capab…

JavaScript 6,213 556 Updated May 26, 2025

High-Resolution Image Synthesis with Latent Diffusion Models

Jupyter Notebook 12,914 1,620 Updated Feb 29, 2024

[ICLR 2025] Autoregressive Video Generation without Vector Quantization

Python 509 14 Updated May 22, 2025

MM-EUREKA: Exploring the Frontiers of Multimodal Reasoning with Rule-based Reinforcement Learning

Python 631 23 Updated May 27, 2025

PyTorch implementation of MAR+DiffLoss https://arxiv.org/abs/2406.11838

Python 1,582 92 Updated Sep 27, 2024

Includes the code for training and testing the CountGD model from the paper CountGD: Multi-Modal Open-World Counting.

Python 234 23 Updated May 29, 2025

Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)

Python 51,301 6,202 Updated May 31, 2025

Understanding R1-Zero-Like Training: A Critical Perspective

Python 957 44 Updated May 24, 2025

An Open-source RL System from ByteDance Seed and Tsinghua AIR

Python 1,285 52 Updated May 11, 2025

No fortress, purely open ground. OpenManus is Coming.

Python 46,352 8,101 Updated May 27, 2025

verl: Volcano Engine Reinforcement Learning for LLMs

Python 8,838 1,107 Updated Jun 2, 2025

Explore the Multimodal “Aha Moment” on 2B Model

Python 591 20 Updated Mar 18, 2025

Official repository of 'Visual-RFT: Visual Reinforcement Fine-Tuning' & 'Visual-ARFT: Visual Agentic Reinforcement Fine-Tuning'

Jupyter Notebook 1,932 82 Updated May 21, 2025

A bidirectional pipeline parallelism algorithm for computation-communication overlap in V3/R1 training.

Python 2,793 296 Updated Mar 10, 2025