Starred repositories
Fine-tuning & Reinforcement Learning for LLMs. 🦥 Train Qwen3, Llama 4, DeepSeek-R1, Gemma 3, TTS 2x faster with 70% less VRAM.
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorR…
Mirage: Automatically Generating Fast GPU Kernels without Programming in Triton/CUDA
DuckLake is an integrated data lake and catalog format
Analyze computation-communication overlap in DeepSeek-V3/R1.
Collection of extracted System Prompts from popular chatbots like ChatGPT, Claude & Gemini
Production-grade client-side tracing, profiling, and analysis for complex software systems.
A high-throughput and memory-efficient inference and serving engine for LLMs (a minimal usage sketch follows this list)
My learning notes and code for ML systems (ML SYS).
Official Implementation of EAGLE-1 (ICML'24), EAGLE-2 (EMNLP'24), and EAGLE-3.
DeepSeek-V3/R1 inference performance simulator
A datacenter-scale distributed inference serving framework
RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.
Transformer-related optimization, including BERT and GPT
A high-performance distributed file system designed to address the challenges of AI training and inference workloads.
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
DeepEP: an efficient expert-parallel communication library
FlashMLA: Efficient MLA decoding kernels
Cost-efficient and pluggable infrastructure components for GenAI inference
A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations
An AI-powered research assistant that performs iterative, deep research on any topic by combining search engines, web scraping, and large language models. The goal of this repo is to provide the si…
OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) over 100+ datasets.
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper, Ada and Blackwell GPUs, to provide better performance with lower memory…
Fast and memory-efficient exact attention (see the second sketch below)
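The "high-throughput and memory-efficient inference and serving engine" entry matches vLLM's tagline, so here is a minimal offline-inference sketch against vLLM's public Python API. The model id and sampling values are illustrative placeholders, not recommendations.

```python
# Minimal vLLM offline-inference sketch (assumes the starred entry is
# vllm-project/vllm; model id and sampling settings are illustrative).
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen2.5-0.5B-Instruct")  # any Hugging Face model id
params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

# generate() takes a list of prompts and returns one RequestOutput per prompt
outputs = llm.generate(["The key idea behind paged attention is"], params)
for out in outputs:
    print(out.outputs[0].text)
```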
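"Fast and memory-efficient exact attention" is FlashAttention's tagline; the sketch below shows a direct call to its fused kernel, assuming the flash-attn package. It needs an NVIDIA GPU and fp16/bf16 inputs; tensor shapes and sizes here are arbitrary examples.

```python
# Hedged sketch of calling the FlashAttention kernel directly
# (assumes the flash-attn package; requires a CUDA GPU).
# q, k, v have shape (batch, seqlen, num_heads, head_dim) in fp16/bf16.
import torch
from flash_attn import flash_attn_func

q = torch.randn(2, 1024, 8, 64, dtype=torch.float16, device="cuda")
k = torch.randn(2, 1024, 8, 64, dtype=torch.float16, device="cuda")
v = torch.randn(2, 1024, 8, 64, dtype=torch.float16, device="cuda")

out = flash_attn_func(q, k, v, causal=True)  # output has the same shape as q
print(out.shape)  # torch.Size([2, 1024, 8, 64])
```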