Focused on model inference optimization, including inference engines and model compression.
- Shanghai
Pinned
- sglang (forked from sgl-project/sglang)
  SGLang is yet another fast serving framework for large language models and vision language models.
- vllm (Python, forked from vllm-project/vllm)
  A high-throughput and memory-efficient inference and serving engine for LLMs (see the usage sketch after this list).
- flashinfer (CUDA, forked from flashinfer-ai/flashinfer)
  FlashInfer: Kernel Library for LLM Serving
- flash-attention (Python, forked from Dao-AILab/flash-attention)
  Fast and memory-efficient exact attention
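
As a brief illustration of the inference-engine side of this focus, here is a minimal sketch of offline batch inference with vLLM (pinned above). The model id, prompt, and sampling settings are illustrative assumptions, not taken from this profile; LLM and SamplingParams are vLLM's standard offline API.

    # Minimal offline batch inference with vLLM.
    # Model id, prompt, and sampling values below are illustrative assumptions.
    from vllm import LLM, SamplingParams

    prompts = ["Explain why paged KV caches improve serving throughput."]
    sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

    llm = LLM(model="facebook/opt-125m")  # any Hugging Face-compatible model id
    outputs = llm.generate(prompts, sampling_params)

    for output in outputs:
        print(output.outputs[0].text)

vLLM batches requests internally and manages the KV cache with PagedAttention, which is the main source of its memory efficiency at high request rates.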