guopeng-gpli (guopeng li) / Starred · GitHub

[ICML 2025🔥] ParallelComp: Parallel Long-Context Compressor for Length Extrapolation

Python 12 Updated Jun 16, 2025

Framework for running AI locally on mobile devices and wearables. Hardware-aware C/C++ backend with wrappers for Flutter & React Native. Kotlin & Swift coming soon.

C++ 933 53 Updated Jun 17, 2025

A high-performance distributed file system designed to address the challenges of AI training and inference workloads.

C++ 9,024 897 Updated Jun 17, 2025

A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations

Python 14,420 1,027 Updated Jun 15, 2025

FlashMLA: Efficient MLA decoding kernels

Cuda 11,612 867 Updated Apr 29, 2025

Huawei Cloud datasets

Jupyter Notebook 72 11 Updated Apr 15, 2025

TensorZero is an open-source stack for industrial-grade LLM applications. It unifies an LLM gateway, observability, optimization, evaluation, and experimentation.

Rust 7,636 456 Updated Jun 19, 2025

Medusa: Accelerating Serverless LLM Inference with Materialization [ASPLOS'25]

HTML 23 3 Updated May 13, 2025

A GPU/CUDA implementation of the Hungarian algorithm

Cuda 111 19 Updated Apr 12, 2019

Redis for LLMs

Python 1,447 231 Updated Jun 19, 2025

Beginner-friendly serverless LLM deployment with Replicate & fly.io

Python 13 2 Updated Sep 3, 2023

Caribou is a framework for geo-distributed deployment of serverless workflows to save carbon emissions.

Python 8 3 Updated Jun 3, 2025

USTC (University of Science and Technology of China) thesis proposal LaTeX template

TeX 23 3 Updated Dec 26, 2019

Code for reproducing results for SOSP paper Bagpipe

Python 9 3 Updated Oct 20, 2023

Efficient and easy multi-instance LLM serving

Python 435 34 Updated Jun 19, 2025

📚A curated list of Awesome LLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.

Python 4,128 286 Updated Jun 18, 2025

Gorilla: Training and Evaluating LLMs for Function Calls (Tool Calls)

Python 12,165 1,151 Updated Jun 19, 2025

PyTorch-based Chinese intent recognition and slot filling

Python 178 27 Updated Jul 3, 2024

Awesome Mobile LLMs

204 12 Updated Jun 2, 2025

Production-ready platform for agentic workflow development.

TypeScript 103,884 15,622 Updated Jun 19, 2025

BERT-based intent and slots detector for chatbots.

Python 191 29 Updated Feb 21, 2025

A ChatGPT(GPT-3.5) & GPT-4 Workload Trace to Optimize LLM Serving Systems

Python 177 9 Updated Oct 15, 2024

A curated list of high-quality papers on resource-efficient LLMs 🌱

125 7 Updated Mar 15, 2025

Serverless LLM Serving for Everyone.

Python 490 48 Updated Jun 18, 2025

system paper reading notes

246 12 Updated Mar 3, 2022

Large Language Model (LLM) Systems Paper List

1,311 72 Updated Jun 19, 2025

A curated list for Efficient Large Language Models

Python 1,736 136 Updated Jun 17, 2025

Semantic Kernel (SK) is a lightweight SDK enabling integration of AI Large Language Models (LLMs) with conventional programming languages.

Mermaid 221 128 Updated Jun 11, 2025

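🚀 Docker image proxy: uses GitHub Actions to convert overseas images from docker.io, gcr.io, registry.k8s.io, k8s.gcr.io, quay.io, ghcr.io, and other registries into mirrors hosted in China for faster downloads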

Go 1,100 689 Updated Feb 25, 2025

Secure Transformer Inference is a protocol for serving Transformer-based models securely.

Python 93 22 Updated May 8, 2024