SeuTao (Tao Shen) / Starred · GitHub

🎯

Focusing

Tao Shen SeuTao

AI Developer / Kaggle Grandmaster / Engineer / Data Scientist / Researcher

973 followers · 23 following

Shanghai/Shenzhen
https://www.kaggle.com/shentao
@SeuTao1
https://scholar.google.com/citations?user=8cprenoAAAAJ&hl=zh-CN

Stars

selftok-team / SelftokTokenizer

Selftok: Discrete Visual Tokens of Autoregression, by Diffusion, and for Reasoning

Python 40 2 Updated May 18, 2025

hp-l33 / ARPG

Autoregressive Image Generation with Randomized Parallel Decoding

Python 61 Updated Apr 1, 2025

ziqipang / RandAR

[CVPR 2025 (Oral)] Open implementation of "RandAR"

Python 143 3 Updated Mar 20, 2025

JiuhaiChen / BLIP3o

Python 445 12 Updated May 18, 2025

Victorwz / Open-Qwen2VL

Open-Qwen2VL: Compute-Efficient Pre-Training of Fully-Open Multimodal LLMs on Academic Resources

Python 200 7 Updated May 17, 2025

GaParmar / clean-fid

PyTorch - FID calculation with proper image resizing and quantization steps [CVPR 2022]

Python 1,067 72 Updated Jul 23, 2024

mit-han-lab / hart

HART: Efficient Visual Generation with Hybrid Autoregressive Transformer

Python 586 35 Updated Oct 16, 2024

modelscope / Nexus-Gen

Python 178 10 Updated May 14, 2025

bytedance / SuperEdit

Official repo for: SuperEdit - Rectifying and Facilitating Supervision for Instruction-Based Image Editing

Python 121 8 Updated May 15, 2025

ByteDance-Seed / Seed1.5-VL

Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning, achieving state-of-the-art performance on 38 out of 60 public benchmarks.

Jupyter Notebook 894 22 Updated May 15, 2025

bluorion-com / ZClip

Official implementation of the paper: "ZClip: Adaptive Spike Mitigation for LLM Pre-Training".

Python 119 7 Updated May 12, 2025

bytedance / deer-flow

DeerFlow is a community-driven Deep Research framework, combining language models with tools like web search, crawling, and Python execution, while contributing back to the open-source community.

TypeScript 10,636 987 Updated May 18, 2025

yifan123 / flow_grpo

An official implementation of Flow-GRPO: Training Flow Matching Models via Online RL

Python 546 19 Updated May 18, 2025

PKU-YuanGroup / WISE

WISE: A World Knowledge-Informed Semantic Evaluation for Text-to-Image Generation

Python 86 1 Updated Apr 8, 2025

rongyaofang / GoT

Official repository of "GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing"

Jupyter Notebook 240 9 Updated Apr 30, 2025

huggingface / nanoVLM

The simplest, fastest repository for training/finetuning small-sized VLMs.

Python 2,588 193 Updated May 16, 2025

clovaai / CRAFT-pytorch

Official implementation of Character Region Awareness for Text Detection (CRAFT)

Python 3,245 923 Updated Jul 16, 2024

inclusionAI / Ming

Ming - facilitating advanced multimodal understanding and generation capabilities built upon the Ling LLM.

Python 94 4 Updated May 12, 2025

Karine-Huang / T2I-CompBench

[NeurIPS 2023 & TPAMI] T2I-CompBench (++) for Compositional Text-to-image Generation Evaluation

Python 255 10 Updated Apr 10, 2025

CaraJ7 / T2I-R1

Official repository of T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT

Python 285 16 Updated May 12, 2025

JiaqiLiao77 / ImageGen-CoT

ImageGen-CoT: Enhancing Text-to-Image In-context Learning with Chain-of-Thought Reasoning

32 Updated Apr 6, 2025

Shakker-Labs / RepText

RepText: Rendering Visual Text via Replicating 🔥

74 5 Updated May 1, 2025

huggingface / picotron

Minimalistic 4D-parallelism distributed training framework for education purpose

Python 1,487 101 Updated Mar 7, 2025

hutaiHang / ATM

21 1 Updated Apr 15, 2025

kylesargent / FlowMo

Official PyTorch implementation of FlowMo.

Jupyter Notebook 58 4 Updated Apr 7, 2025

stepfun-ai / Step1X-Edit

A SOTA open-source image editing model, which aims to provide comparable performance against the closed-source models like GPT-4o and Gemini 2 Flash.

Python 1,245 56 Updated May 13, 2025

SandAI-org / MAGI-1

MAGI-1: Autoregressive Video Generation at Scale

Python 3,036 164 Updated May 14, 2025

NVlabs / TokenBench

A Video Tokenizer Evaluation Dataset

Python 115 8 Updated Jan 13, 2025

NVIDIA / Cosmos

New repo collection for NVIDIA Cosmos: https://github.com/nvidia-cosmos

Jupyter Notebook 7,966 512 Updated Apr 29, 2025

HiDream-ai / HiDream-E1

Python 483 28 Updated Apr 29, 2025
