rightchose / Starred · GitHub
👋
ppp
  • Zhejiang University
  • Shanghai, China

Highlights

  • Pro

Showing results
Sass 4 Updated Aug 7, 2024
Cuda 13 7 Updated Mar 12, 2025

Examples of CUDA implementations by Cutlass CuTe

Makefile 178 23 Updated Feb 2, 2025

[NeurIPS'24 Spotlight, ICLR'25, ICML'25] To speed up long-context LLM inference, attention is computed approximately with dynamic sparsity, which reduces inference latency by up to 10x for pre-filli…

Python 1,024 53 Updated May 12, 2025

A Chinese-language encyclopedia of modern C++, written under the lead of 小彭老师

Typst 871 65 Updated Mar 29, 2025

Microbenchmark framework

C++ 4 Updated Sep 5, 2024

Guidelines Support Library

C++ 6,406 747 Updated May 12, 2025

Tile primitives for speedy kernels

Cuda 2,352 142 Updated May 16, 2025

This project shares the technical principles behind large language models, along with hands-on experience (LLM engineering and real-world LLM application deployment)

HTML 17,480 2,046 Updated May 1, 2025

flash attention tutorial written in python, triton, cuda, cutlass

Cuda 353 35 Updated May 14, 2025

A minimal GPU design in Verilog to learn how GPUs work from the ground up

SystemVerilog 8,347 638 Updated Aug 18, 2024

This is a Chinese translation of the CUDA programming guide

1,538 237 Updated Nov 13, 2024
C++ 119 33 Updated Dec 6, 2024

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 47,505 7,451 Updated May 17, 2025

Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers".

Python 2,104 168 Updated Mar 27, 2024

Yinghan's Code Sample

Cuda 327 58 Updated Jul 25, 2022

FlashInfer: Kernel Library for LLM Serving

Cuda 2,971 308 Updated May 17, 2025

A study of Ampere's sparse matmul

Cuda 18 5 Updated Jan 10, 2021

Homework showcases and answer explanations from the 卢瑟们 ("losers"), plus some C++ knowledge

C++ 720 138 Updated Apr 17, 2025

CUDA Library Samples

Cuda 1,933 389 Updated May 12, 2025

Build your own STL in one weekend

C++ 280 21 Updated Dec 18, 2024

C++ implementation of Qwen-LM

C++ 587 52 Updated Dec 6, 2024

Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads

Jupyter Notebook 2,525 175 Updated Jun 25, 2024

Several optimization methods of half-precision general matrix multiplication (HGEMM) using tensor core with WMMA API and MMA PTX instruction.

Cuda 411 79 Updated Sep 8, 2024

CUDA and Triton implementations of Flash Attention with SoftmaxN.

Python 70 5 Updated May 26, 2024

MegCC is a deep learning model compiler with an ultra-lightweight runtime, high efficiency, and easy portability

C++ 484 58 Updated Oct 23, 2024

小彭老师's SYCL 2020 course (under construction; to be released later via livestream)

C++ 15 Updated Sep 3, 2023

Evaluating popular parallel programming frameworks - performance benchmarks (with 小彭老师's commentary). Tested so far: Taichi, SYCL, C++, OpenMP, TBB, Mojo

C++ 35 Updated Aug 28, 2023

Mini Logging Library with C++20 education purpose

C++ 31 1 Updated Aug 31, 2024