Xinyu302 (Xinyu Yang) / Starred · GitHub
  • BUAA
  • Beijing
  • Time zone: UTC +08:00

Highlights

  • Pro

Distributed Triton for Parallel Systems

Python · 775 stars · 50 forks · Updated May 30, 2025

A domain-specific language designed to streamline the development of high-performance GPU, CPU, and accelerator kernels

C++ · 1,228 stars · 94 forks · Updated Jun 2, 2025

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Python · 5,401 stars · 607 forks · Updated May 27, 2025

Mihomo CLI client on Linux. Formerly `clashrup`.

Rust · 61 stars · 9 forks · Updated Mar 12, 2025

Efficient Distributed GPU Programming for Exascale, an SC/ISC Tutorial

Cuda · 266 stars · 56 forks · Updated May 28, 2025

Tile primitives for speedy kernels

Cuda · 2,417 stars · 145 forks · Updated Jun 2, 2025

A list of tutorials, papers, talks, and open-source projects for emerging compilers and architectures

461 stars · 36 forks · Updated Jan 15, 2025

IREE's PyTorch Frontend, based on Torch Dynamo.

Python · 85 stars · 56 forks · Updated Jun 2, 2025

A fast communication-overlapping library for tensor/expert parallelism on GPUs.

C++ · 958 stars · 61 forks · Updated May 28, 2025

FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.

Python · 831 stars · 67 forks · Updated Sep 4, 2024

Efficient Triton Kernels for LLM Training

Python · 5,131 stars · 343 forks · Updated Jun 2, 2025

A beginner's guide to LLVM IR

LLVM · 1,410 stars · 156 forks · Updated Jan 31, 2024

SGLang is a fast serving framework for large language models and vision language models.

Python · 14,825 stars · 1,891 forks · Updated Jun 2, 2025

A PyTorch native platform for training generative AI models

Python · 3,877 stars · 381 forks · Updated Jun 2, 2025

A collection of out-of-tree LLVM passes for teaching and learning

C++ · 3,192 stars · 408 forks · Updated Apr 27, 2025

PyTorch native quantization and sparsity for training and inference

Python · 2,074 stars · 273 forks · Updated Jun 2, 2025

BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment.

Python · 619 stars · 49 forks · Updated May 5, 2025

Apple AMX Instruction Set

C · 1,088 stars · 52 forks · Updated Dec 26, 2024

A PyTorch Native LLM Training Framework

Python · 813 stars · 48 forks · Updated Dec 27, 2024

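This project aims to share the technical principles behind large models along with hands-on experience (large-model engineering and deploying large-model applications in production).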

HTML · 18,215 stars · 2,138 forks · Updated May 27, 2025

The Torch-MLIR project aims to provide first class support from the PyTorch ecosystem to the MLIR ecosystem.

C++ · 1,548 stars · 557 forks · Updated Jun 2, 2025

📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA/Tensor Cores Kernels, HGEMM, FA-2 MMA etc.🔥

Cuda · 4,577 stars · 481 forks · Updated Jun 2, 2025

A model compilation solution for various hardware

MLIR · 437 stars · 48 forks · Updated May 8, 2025

ncnn is a high-performance neural network inference framework optimized for the mobile platform

C++ · 21,564 stars · 4,260 forks · Updated Jun 2, 2025

A simple high performance CUDA GEMM implementation.

Cuda · 374 stars · 41 forks · Updated Jan 4, 2024

C++ · 2 stars · Updated Dec 21, 2020

A collection of compiler learning resources.

Python · 2,412 stars · 347 forks · Updated Mar 19, 2025

Individual assignments for the lab portion of the 2020 BUAA (Beihang University) Compiler Technology course

C++ · 8 stars · Updated Jan 7, 2021

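Course project for the undergraduate "Compiler Principles" lab course at the BUAA School of Computer Science. The source language is a Pascal-like language, the target language is x86 assembly, and the compiler is implemented in C++.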

C++ · 134 stars · 24 forks · Updated Oct 20, 2018