Stars
Unofficial description of the CUDA assembly (SASS) instruction sets.
Technically-oriented PDF Collection (Papers, Specs, Decks, Manuals, etc.)
Examples demonstrating available options to program multiple GPUs in a single node or a cluster
Domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels
Distributed Compiler based on Triton for Parallel Systems
ademeure/DeeperGEMM: crazy optimized version (forked from deepseek-ai/DeepGEMM)
Summary of the specs of commonly used GPUs for training and inference of LLMs
How to optimize various algorithms in CUDA.
Machine Learning Engineering Open Book
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
DeepEP: an efficient expert-parallel communication library
FlashMLA: Efficient MLA decoding kernels
Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation
A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations
My learning notes and code for ML systems.
📚 LeetCUDA: modern CUDA learning notes with PyTorch for beginners, 200+ CUDA/Tensor Core kernels, HGEMM, FA-2 MMA. 🎉
Efficient Triton Kernels for LLM Training
FlashInfer: Kernel Library for LLM Serving