Stars
SGLang is a fast serving framework for large language models and vision language models.
https://wavespeed.ai/: Best inference performance optimization framework for HuggingFace Diffusers on NVIDIA GPUs.
FP8 flash attention implemented on the Ada architecture using the CUTLASS library.
A latent text-to-image diffusion model
xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism
Optimized primitives for collective multi-GPU communication
How to optimize algorithms in CUDA.
A simplified flash-attention implementation using CUTLASS, intended for teaching purposes.
Fast and memory-efficient exact attention
Flash attention tutorial written in Python, Triton, CUDA, and CUTLASS.
Flash Attention in ~100 lines of CUDA (forward pass only)
FlashInfer: Kernel Library for LLM Serving
📚LeetCUDA: Modern CUDA learning notes with PyTorch for beginners🐑; 200+ CUDA/Tensor Core kernels, HGEMM, FA-2 MMA, etc.🔥
An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.
AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference.
FP16xINT4 LLM inference kernel that achieves near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.
[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
[ICLR 2024 Spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.
This repository contains integer operators on GPUs for PyTorch.
Fast inference from large language models via speculative decoding.
TensorRT Examples (TensorRT, Jetson Nano, Python, C++)