RWKV (pronounced RwaKuv) is an RNN with great LLM performance, which can also be directly trained like a GPT transformer (parallelizable). We are at RWKV-7 "Goose". So it's combining the best of RN…

Python 13,579 912 Updated May 7, 2025

FMInference / FlexLLMGen

Running large language models on a single GPU for throughput-oriented scenarios.

Python 9,314 567 Updated Oct 28, 2024

NVIDIA / FasterTransformer

Transformer related optimization, including BERT, GPT

C++ 6,149 903 Updated Mar 27, 2024

feifeibear / LLMSpeculativeSampling

Fast inference from large language models via speculative decoding

Python 723 68 Updated Aug 22, 2024

autoliuweijie / FastBERT

The source code of FastBERT (ACL2020)

Python 605 90 Updated Oct 29, 2021

epfml / landmark-attention

Landmark Attention: Random-Access Infinite Context Length for Transformers

Python 423 36 Updated Dec 20, 2023

liyucheng09 / Selective_Context

Compress your input to ChatGPT or other LLMs, to let them process 2x more content and save 40% memory and GPU time.

Python 375 19 Updated Feb 12, 2024

kssteven418 / LTP

[KDD'22] Learned Token Pruning for Transformers

Python 97 18 Updated Feb 27, 2023

google / trax

Trax — Deep Learning with Clear Code and Speed

Python 8,204 825 Updated Apr 10, 2025

Hsword / SpotServe

SpotServe: Serving Generative Large Language Models on Preemptible Instances

118 10 Updated Feb 22, 2024

NVIDIA / Megatron-LM

Ongoing research training transformer models at scale

Python 12,312 2,755 Updated May 10, 2025

LLMServe / DistServe

Disaggregated serving system for Large Language Models (LLMs).

Jupyter Notebook 581 61 Updated Apr 6, 2025

deepspeedai / DeepSpeed

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

Python 38,274 4,360 Updated May 10, 2025

microsoft / unilm

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities

Python 21,210 2,620 Updated Mar 4, 2025

deepseek-ai / FlashMLA

FlashMLA: Efficient MLA decoding kernels

Cuda 11,527 831 Updated Apr 29, 2025

andreinechaev / nvcc4jupyter

A plugin for Jupyter Notebook to run CUDA C/C++ code

Jupyter Notebook 227 93 Updated Sep 13, 2024

LightChen233 / Awesome-Multilingual-LLM

95 4 Updated Dec 19, 2024

Le Tien Dat t1end4t

Lists (20)

agent

awesome

cpp

cuda

graph

hallucination

haskell

inference

multimodal

nix

non-english nlp

other

papers

practice

prompt

python

rag

reasoning

refs

rust

Stars