Stars
Seed1.5-VL is a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning, achieving state-of-the-art performance on 38 of 60 public benchmarks.
OpenVision: A Fully-Open, Cost-Effective Family of Advanced Vision Encoders for Multimodal Learning
Fully open data curation for reasoning models
Qwen3 is the large language model series developed by the Qwen team at Alibaba Cloud.
MAGI-1: Autoregressive Video Generation at Scale
Complex-Edit: CoT-Like Instruction Generation for Complexity-Controllable Image Editing Benchmark
MedSegFactory: Text-Guided Generation of Medical Image-Mask Pairs
MedReason: Eliciting Factual Medical Reasoning Steps in LLMs via Knowledge Graphs
m1: Unleash the Potential of Test-Time Scaling for Medical Reasoning in Large Language Models
FlashMLA: Efficient MLA decoding kernels
This repository includes the official implementation of our paper "Beyond Next-Token: Next-X Prediction for Autoregressive Visual Generation"
SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models
Official implementation of Scaling Laws in Patchification: An Image Is Worth 50,176 Tokens And More
A Training-free Iterative Framework for Long Story Visualization
"FlowAR: Scale-wise Autoregressive Image Generation Meets Flow Matching". FlowAR employs the simplest scale design and is compatible with any VAE.
Official inference framework for 1-bit LLMs
Official Repository of VideoLLaMB: Long Video Understanding with Recurrent Memory Bridges
[ICLR 2025] This is the official repository of our paper "MedTrinity-25M: A Large-scale Multimodal Dataset with Multigranular Annotations for Medicine"
Official inference repo for FLUX.1 models
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.