ziang663 · GitHub


ziang663

0 followers · 1 following

Popular repositories

vllm-v1 Public

Forked from vllm-project/vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 1
ScaleLLM Public

Forked from vectorch-ai/ScaleLLM

A high-performance inference system for large language models, designed for production environments.

C++
flash-attention Public

Forked from Dao-AILab/flash-attention

Fast and memory-efficient exact attention

Python
custom_op Public

This is my CUDA kernel room.

Cuda
Megatron-LM Public

Forked from NVIDIA/Megatron-LM

Ongoing research training transformer models at scale

Python
