Stars
Distributed Triton for Parallel Systems
Domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
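"Fine-grained scaling" here means the FP8 operands carry one FP32 scale factor per small block (e.g. per 128 elements along K) rather than one scale per tensor, and partial products are rescaled block by block as the reduction advances. A toy sketch of that idea (illustrative only, not DeepGEMM's implementation; assumes e4m3 FP8 from cuda_fp8.h and K divisible by 128):

```cuda
#include <cuda_fp8.h>

// Dot product of one K-row of A with one K-column of B, where both
// operands are FP8 (e4m3) and carry one FP32 scale per 128-wide K block.
// A toy scalar loop to show the scaling scheme, not a real GEMM.
__device__ float fp8_dot_scaled(const __nv_fp8_e4m3* a, const __nv_fp8_e4m3* b,
                                const float* sa, const float* sb, int K) {
    float acc = 0.0f;
    for (int blk = 0; blk < K / 128; ++blk) {
        float partial = 0.0f;
        for (int k = 0; k < 128; ++k) {
            int i = blk * 128 + k;
            partial += static_cast<float>(a[i]) * static_cast<float>(b[i]);
        }
        acc += partial * sa[blk] * sb[blk];  // apply both block scales once per block
    }
    return acc;
}
```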
Mihomo CLI client on Linux. Formerly `clashrup`.
Efficient Distributed GPU Programming for Exascale, an SC/ISC Tutorial
A list of tutorials, papers, talks, and open-source projects for emerging compilers and architectures
IREE's PyTorch Frontend, based on Torch Dynamo.
A fast communication-overlapping library for tensor/expert parallelism on GPUs.
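The underlying pattern in communication-overlapping libraries is to split work into chunks so the communication for chunk c runs concurrently with the compute for chunk c+1. A generic two-stream sketch of that pattern (plain CUDA streams and a device-to-host copy standing in for the library's collectives; all names are illustrative):

```cuda
#include <cuda_runtime.h>

// Stand-in compute kernel: scales one chunk in place.
__global__ void scale_chunk(float* x, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= 2.0f;
}

// Copy-out of chunk c overlaps with compute of chunk c+1: compute and
// copy run on separate streams, with an event enforcing the dependency.
// h_buf should be pinned (cudaMallocHost) for truly asynchronous copies.
void pipelined(float* d_buf, float* h_buf, int chunks, int chunk_elems) {
    cudaStream_t compute, copy;
    cudaEvent_t chunk_done;
    cudaStreamCreate(&compute);
    cudaStreamCreate(&copy);
    cudaEventCreate(&chunk_done);

    for (int c = 0; c < chunks; ++c) {
        float* d_chunk = d_buf + c * chunk_elems;
        scale_chunk<<<(chunk_elems + 255) / 256, 256, 0, compute>>>(d_chunk, chunk_elems);
        cudaEventRecord(chunk_done, compute);      // chunk c is finished
        cudaStreamWaitEvent(copy, chunk_done, 0);  // copy waits only on chunk c
        cudaMemcpyAsync(h_buf + c * chunk_elems, d_chunk,
                        chunk_elems * sizeof(float),
                        cudaMemcpyDeviceToHost, copy);
    }
    cudaStreamSynchronize(copy);
    cudaStreamDestroy(compute);
    cudaStreamDestroy(copy);
    cudaEventDestroy(chunk_done);
}
```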
FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.
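The core trick in FP16xINT4 kernels is dequantizing packed INT4 weights to FP16 on the fly inside the GEMM main loop, so the memory traffic stays 4-bit while the math runs in half precision. A minimal sketch of the unpack-and-scale step (not this repo's actual code; the symmetric zero-point of 8 and all names are illustrative assumptions):

```cuda
#include <cuda_fp16.h>

// Unpack two 4-bit weights from one byte and rescale them to FP16.
// Assumes symmetric quantization with zero-point 8: w = (q - 8) * scale.
__device__ __half2 dequant_int4x2(unsigned char packed, float scale) {
    int lo = (packed & 0x0F) - 8;         // low nibble
    int hi = ((packed >> 4) & 0x0F) - 8;  // high nibble
    return __floats2half2_rn(lo * scale, hi * scale);
}
```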
Efficient Triton Kernels for LLM Training
SGLang is a fast serving framework for large language models and vision language models.
A PyTorch native platform for training generative AI models
A collection of out-of-tree LLVM passes for teaching and learning
PyTorch native quantization and sparsity for training and inference
BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment.
This project shares the technical principles behind large language models along with hands-on experience (LLM engineering and production deployment of LLM applications).
The Torch-MLIR project aims to provide first class support from the PyTorch ecosystem to the MLIR ecosystem.
📚LeetCUDA: modern CUDA learning notes with PyTorch for beginners🐑; 200+ CUDA/Tensor Core kernels, HGEMM, FA-2 MMA, etc.🔥
A model compilation solution for various hardware
ncnn is a high-performance neural network inference framework optimized for the mobile platform
A simple high performance CUDA GEMM implementation.
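For reference, the core of such a repo is usually a shared-memory-tiled kernel along these lines (a minimal sketch computing C = A * B for row-major FP32 matrices, assuming M, N, K are multiples of the tile size; all names are illustrative, not taken from the repo):

```cuda
#include <cuda_runtime.h>

#define TILE 16  // tile edge; assumes M, N, K are multiples of TILE

// Each block computes one TILE x TILE tile of C, staging tiles of A and B
// through shared memory to reuse each loaded element TILE times.
__global__ void sgemm_tiled(const float* A, const float* B, float* C,
                            int M, int N, int K) {
    __shared__ float As[TILE][TILE];
    __shared__ float Bs[TILE][TILE];

    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    float acc = 0.0f;

    // March a pair of tiles along the K dimension.
    for (int t = 0; t < K / TILE; ++t) {
        As[threadIdx.y][threadIdx.x] = A[row * K + t * TILE + threadIdx.x];
        Bs[threadIdx.y][threadIdx.x] = B[(t * TILE + threadIdx.y) * N + col];
        __syncthreads();
        for (int k = 0; k < TILE; ++k)
            acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        __syncthreads();
    }
    C[row * N + col] = acc;
}
```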
Course project for the undergraduate Compiler Principles lab at the School of Computer Science, Beihang University. The source language is a Pascal-like language, the target is x86 assembly, and the compiler is implemented in C++.