kvcache-ai / ktransformers

A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations

Python · 14,502 stars · 1,031 forks · Updated Jul 1, 2025

spcl / muliticast-based-allgather

C · 16 stars · 4 forks · Updated Feb 12, 2025

mdy666 / mdy_triton

Jupyter Notebook · 139 stars · 14 forks · Updated Jul 4, 2025

zhaochenyang20 / Awesome-ML-SYS-Tutorial

My learning notes/codes for ML SYS.

Python · 2,757 stars · 170 forks · Updated Jul 5, 2025

efeslab / Nanoflow

A throughput-oriented high-performance serving framework for LLMs

Jupyter Notebook · 835 stars · 38 forks · Updated Jun 5, 2025

Jiayi-Pan / TinyZero

Minimal reproduction of DeepSeek R1-Zero

Python · 11,978 stars · 1,492 forks · Updated Apr 24, 2025

gty111 / GEMM_MMA

Optimize GEMM with tensorcore step by step

27 stars · 6 forks · Updated Dec 17, 2023

NVIDIA / cuda-samples

Samples for CUDA developers that demonstrate features in the CUDA Toolkit

C · 7,705 stars · 2,068 forks · Updated May 22, 2025

yifuwang / symm-mem-recipes

Python · 93 stars · 7 forks · Updated Dec 27, 2024

hkproj / triton-flash-attention

Python · 178 stars · 18 forks · Updated Jan 2, 2025

zhihu / ZhiLight

A highly optimized LLM inference acceleration engine for Llama and its variants.

C++ · 898 stars · 104 forks · Updated Jun 26, 2025

liguodongiot / llm-action

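This project aims to share the technical principles and hands-on experience behind large models (large-model engineering and real-world application deployment).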

HTML · 19,100 stars · 2,277 forks · Updated Jul 3, 2025

microsoft / BitBLAS

BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment.

Python · 642 stars · 49 forks · Updated May 5, 2025

bytedance / HLLM

HLLM: Enhancing Sequential Recommendations via Hierarchical Large Language Models for Item and User Modeling

Python · 413 stars · 51 forks · Updated Oct 4, 2024

Mao Yunfei retonym

Stars

pytorch-labs / tritonbench

zhuzilin / ring-flash-attention

modelscope / easydistill

ppl-ai / pplx-kernels

allenai / OLMoE

Infrasys-AI / AISystem

huggingface / trl

leimao / CUTLASS-Examples

cloudcores / CuAssembler

EleutherAI / lm-evaluation-harness

alibaba / rtp-llm

deepseek-ai / DeepEP

deepseek-ai / DeepGEMM

kubernetes-sigs / lws

deepseek-ai / open-infra-index

AccumulateMore / CV
