retonym (Mao Yunfei) / Starred · GitHub

Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance.

Python 175 24 Updated Jul 4, 2025

Ring attention implementation with flash attention

Python 796 72 Updated Jul 3, 2025

A toolkit for knowledge distillation of large language models

Python 101 5 Updated Jul 2, 2025

Perplexity GPU Kernels

C++ 387 48 Updated Jun 10, 2025

OLMoE: Open Mixture-of-Experts Language Models

Jupyter Notebook 797 76 Updated Mar 14, 2025

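AISystem mainly refers to AI systems, covering full-stack low-level AI technologies such as AI chips, AI compilers, and AI inference and training frameworks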

Jupyter Notebook 14,221 2,053 Updated Jul 3, 2025

Train transformer language models with reinforcement learning.

Python 14,459 2,015 Updated Jul 4, 2025

CUTLASS and CuTe Examples

Cuda 59 9 Updated Jan 4, 2025

An unofficial CUDA assembler, for all generations of SASS, hopefully :)

Python 509 86 Updated Apr 20, 2023

A framework for few-shot evaluation of language models.

Python 9,454 2,505 Updated Jul 5, 2025

RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.

C++ 807 68 Updated Jun 3, 2025

DeepEP: an efficient expert-parallel communication library

Cuda 8,245 833 Updated Jul 4, 2025

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Python 5,502 636 Updated Jul 2, 2025

LeaderWorkerSet: An API for deploying a group of pods as a unit of replication

Go 508 84 Updated Jul 3, 2025

Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation

7,852 279 Updated May 15, 2025

✔ (Completed) The most comprehensive deep learning notes [Tudui, PyTorch] [Li Mu, Dive into Deep Learning] [Andrew Ng, Deep Learning]

Jupyter Notebook 11,513 1,386 Updated Jun 23, 2025

A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations

Python 14,502 1,031 Updated Jul 1, 2025
Jupyter Notebook 139 14 Updated Jul 4, 2025

My learning notes/codes for ML SYS.

Python 2,757 170 Updated Jul 5, 2025

A throughput-oriented high-performance serving framework for LLMs

Jupyter Notebook 835 38 Updated Jun 5, 2025

Minimal reproduction of DeepSeek R1-Zero

Python 11,978 1,492 Updated Apr 24, 2025

Optimize GEMM with tensorcore step by step

27 6 Updated Dec 17, 2023

Samples for CUDA developers that demonstrate features in the CUDA Toolkit

C 7,705 2,068 Updated May 22, 2025
Python 93 7 Updated Dec 27, 2024

A highly optimized LLM inference acceleration engine for Llama and its variants.

C++ 898 104 Updated Jun 26, 2025

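This project aims to share the technical principles behind large models along with hands-on experience (large-model engineering and deploying large-model applications)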

HTML 19,100 2,277 Updated Jul 3, 2025

BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment.

Python 642 49 Updated May 5, 2025

HLLM: Enhancing Sequential Recommendations via Hierarchical Large Language Models for Item and User Modeling

Python 413 51 Updated Oct 4, 2024