yyccli (Yangcheng Li) / Starred · GitHub

An Emacs framework for the stubborn martian hacker

Emacs Lisp 20,650 3,117 Updated Jul 5, 2025

DeepEP: an efficient expert-parallel communication library

Cuda 8,263 841 Updated Jul 8, 2025

NVIDIA Inference Xfer Library (NIXL)

C++ 450 107 Updated Jul 8, 2025

A Datacenter Scale Distributed Inference Serving Framework

Rust 4,433 473 Updated Jul 8, 2025

AI Tensor Engine for ROCm

Python 221 65 Updated Jul 8, 2025

Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators

C++ 434 208 Updated Jul 8, 2025

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Python 5,510 636 Updated Jul 2, 2025

Domain-specific language designed to streamline the development of high-performance GPU/CPU/Accelerators kernels

C++ 1,379 111 Updated Jul 8, 2025

A high-performance distributed file system designed to address the challenges of AI training and inference workloads.

C++ 9,127 914 Updated Jun 17, 2025

FlashMLA: Efficient MLA decoding kernels

Cuda 11,641 875 Updated Apr 29, 2025

A retargetable MLIR-based machine learning compiler and runtime toolkit.

C++ 3,207 725 Updated Jul 8, 2025

depyf is a tool to help you understand and adapt to PyTorch compiler torch.compile.

Python 697 27 Updated Apr 20, 2025

Large Language Model (LLM) Systems Paper List

1,356 74 Updated Jul 4, 2025

Code repository for 'From Batch to Stream: Automatic Generation of Online Algorithms' https://arxiv.org/abs/2404.04743

Python 6 Updated Apr 12, 2024

Universal LLM Deployment Engine with ML Compilation

Python 20,934 1,763 Updated Jul 7, 2025

A list of awesome compiler projects and papers for tensor computation and deep learning.

2,598 312 Updated Oct 19, 2024

This is a repository for all workshop related materials.

Jupyter Notebook 224 87 Updated May 4, 2024

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

C++ 3,521 297 Updated Jul 8, 2025

ASCII generator (image to text, image to image, video to video)

Python 7,930 609 Updated Nov 22, 2024

KvikIO - High Performance File IO

C++ 213 78 Updated Jul 7, 2025

Cuda 137 17 Updated Mar 18, 2024

🌟 Wiki of OI / ICPC for everyone. (A walkthrough for a certain large-scale online game, featuring dazzling arithmetic magic)

TypeScript 23,528 4,298 Updated Jul 8, 2025

SGLang is a fast serving framework for large language models and vision language models.

Python 15,830 2,288 Updated Jul 8, 2025

Tensors and Dynamic neural networks in Python with strong GPU acceleration

Python 91,364 24,620 Updated Jul 8, 2025

Code for loralib, an implementation of "LoRA: Low-Rank Adaptation of Large Language Models"

Python 12,212 778 Updated Dec 17, 2024

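A collection of industry-practice articles on search, recommendation, advertising, user growth, and more (sources: Zhihu, Datafuntalk, technical WeChat official accounts)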

Python 3,525 404 Updated Jul 8, 2025

Puzzles for learning Triton

Jupyter Notebook 1,747 138 Updated Nov 18, 2024

Machine learning compiler based on MLIR for Sophgo TPU.

C++ 752 185 Updated Jul 7, 2025

📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA/Tensor Cores Kernels, HGEMM, FA-2 MMA.🎉

Cuda 5,396 568 Updated Jun 29, 2025

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.

Python 6,671 567 Updated Jul 8, 2025