GaoYusong (Yusong Gao) / Starred · GitHub


Starred repositories


Fine-tuning & Reinforcement Learning for LLMs. 🦥 Train Qwen3, Llama 4, DeepSeek-R1, Gemma 3, TTS 2x faster with 70% less VRAM.

Python 41,984 3,358 Updated Jul 12, 2025

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and support state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorR…

C++ 10,998 1,580 Updated Jul 13, 2025

Mirage: Automatically Generating Fast GPU Kernels without Programming in Triton/CUDA

C++ 1,546 95 Updated Jul 11, 2025

DuckLake is an integrated data lake and catalog format

C++ 1,786 68 Updated Jul 10, 2025

Nano vLLM

Python 5,171 606 Updated Jun 27, 2025

kernels, of the mega variety

Python 441 22 Updated Jun 2, 2025

Python 79 8 Updated Apr 2, 2025

Analyze computation-communication overlap in V3/R1.

1,076 144 Updated Mar 21, 2025

Collection of extracted System Prompts from popular chatbots like ChatGPT, Claude & Gemini

JavaScript 7,561 1,711 Updated Jul 11, 2025

Production-grade client-side tracing, profiling, and analysis for complex software systems.

C++ 4,344 516 Updated Jul 13, 2025

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 52,138 8,673 Updated Jul 13, 2025

My learning notes/codes for ML SYS.

Python 2,872 178 Updated Jul 12, 2025

NVIDIA Inference Xfer Library (NIXL)

C++ 465 110 Updated Jul 13, 2025

Official Implementation of EAGLE-1 (ICML'24), EAGLE-2 (EMNLP'24), and EAGLE-3.

Python 1,385 176 Updated Jul 12, 2025

DeepSeek-V3/R1 inference performance simulator

Jupyter Notebook 155 20 Updated Mar 27, 2025

A Datacenter Scale Distributed Inference Serving Framework

Rust 4,468 484 Updated Jul 13, 2025

RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.

C++ 811 68 Updated Jun 3, 2025

Transformer related optimization, including BERT, GPT

C++ 6,238 909 Updated Mar 27, 2024

A high-performance distributed file system designed to address the challenges of AI training and inference workloads.

C++ 9,137 920 Updated Jun 17, 2025

Distributed RL System for LLM Reasoning

Python 1,993 116 Updated Jul 12, 2025

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Python 5,517 640 Updated Jul 2, 2025

DeepEP: an efficient expert-parallel communication library

Cuda 8,271 849 Updated Jul 11, 2025

FlashMLA: Efficient MLA decoding kernels

Cuda 11,647 875 Updated Apr 29, 2025

Cost-efficient and pluggable Infrastructure components for GenAI inference

Go 3,917 394 Updated Jul 13, 2025

A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations

Python 14,586 1,038 Updated Jul 12, 2025

An AI-powered research assistant that performs iterative, deep research on any topic by combining search engines, web scraping, and large language models. The goal of this repo is to provide the si…

TypeScript 16,999 1,763 Updated Jun 7, 2025

OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) over 100+ datasets.

Python 5,665 623 Updated Jul 8, 2025

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper, Ada and Blackwell GPUs, to provide better performance with lower memory…

Python 2,551 453 Updated Jul 12, 2025

Fast and memory-efficient exact attention

Python 18,324 1,805 Updated Jul 13, 2025

CUDA checkpoint and restore utility

C 346 19 Updated Jan 27, 2025