Stars
Accelerate inference without tears
📰 Must-read papers and blogs on Speculative Decoding ⚡️
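A toy sketch of the draft-and-verify loop these papers study, with hypothetical `draft_model`, `target_model`, and `accept` stand-ins (real schemes accept a drafted token by comparing target and draft probabilities):

```python
import random

random.seed(0)
VOCAB = list(range(8))

def draft_model(ctx):
    # Cheap proposal model (hypothetical stand-in for a small LM).
    return random.choice(VOCAB)

def target_model(ctx):
    # Expensive target model (hypothetical stand-in for the large LM).
    return random.choice(VOCAB)

def accept(ctx, token):
    # Stand-in for the probabilistic accept test; real implementations
    # compare p_target(token) / p_draft(token) given the context.
    return random.random() < 0.7

def speculative_step(prefix, k=4):
    """Draft k tokens cheaply, verify left to right, keep the accepted
    run, then let the target model emit one token itself."""
    drafted = []
    for _ in range(k):
        drafted.append(draft_model(list(prefix) + drafted))
    accepted = []
    for t in drafted:
        if not accept(list(prefix) + accepted, t):
            break  # first rejection discards the rest of the draft
        accepted.append(t)
    accepted.append(target_model(list(prefix) + accepted))
    return accepted

print(speculative_step([1, 2, 3]))  # several tokens per target-model step
```

The payoff is that one expensive target-model step can commit multiple tokens whenever the cheap drafts are accepted.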
Simple and efficient PyTorch-native transformer text generation in <1000 LOC of Python.
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
[ICLR 2024] Efficient Streaming Language Models with Attention Sinks
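A minimal sketch of the cache policy the paper is named for, assuming a flat list stands in for per-token KV entries (not the paper's implementation): keep the first few "attention sink" tokens plus a recent window, evicting everything in between.

```python
def evict(cache, n_sink=4, window=1020):
    """Keep the first n_sink entries (the attention sinks) plus a recent
    window of entries, dropping the middle of the sequence."""
    if len(cache) <= n_sink + window:
        return cache
    return cache[:n_sink] + cache[-window:]

cache = list(range(2000))      # stand-in for per-token KV cache entries
cache = evict(cache)
print(len(cache), cache[:6])   # 1024 [0, 1, 2, 3, 980, 981]
```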
Official code implementation of Vary-toy (Small Language Model Meets with Reinforced Vision Vocabulary)
TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones
Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads
Official Implementation of EAGLE-1 (ICML'24), EAGLE-2 (EMNLP'24), and EAGLE-3.
📚A curated list of Awesome LLM/VLM Inference Papers with codes: WINT8/4, FlashAttention, PagedAttention, Parallelism, MLA, etc.
Official inference library for Mistral models
High-speed Large Language Model Serving for Local Deployment
Optimized BERT transformer inference on NVIDIA GPUs. https://arxiv.org/abs/2210.03052
[ICML 2024] Break the Sequential Dependency of LLM Inference Using Lookahead Decoding
A series of large language models developed by Baichuan Intelligent Technology
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs.
Code and documents of LongLoRA and LongAlpaca (ICLR 2024 Oral)
The official Python library for the OpenAI API
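Basic usage of the library; the model name here is an assumption, substitute any model you have access to:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

resp = client.chat.completions.create(
    model="gpt-4o-mini",  # assumed model name; use any available model
    messages=[{"role": "user", "content": "Say hello in five words."}],
)
print(resp.choices[0].message.content)
```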
Simple, safe way to store and distribute tensors
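A small round-trip with the PyTorch bindings, which is the common way safetensors is used:

```python
import torch
from safetensors.torch import save_file, load_file

tensors = {"weight": torch.randn(4, 4), "bias": torch.zeros(4)}
save_file(tensors, "toy.safetensors")    # one flat, mmap-friendly file
restored = load_file("toy.safetensors")  # loads to CPU by default
assert torch.equal(tensors["bias"], restored["bias"])
```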
[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
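For orientation, a plain group-wise symmetric quantizer; this is the generic step only, not AWQ's method, which additionally searches per-channel scales from activation statistics before quantizing:

```python
import numpy as np

def quantize_groupwise(w, bits=4, group=8):
    # Generic group-wise symmetric quantization sketch (not AWQ itself).
    qmax = 2 ** (bits - 1) - 1
    g = w.reshape(-1, group)
    scale = np.abs(g).max(axis=1, keepdims=True) / qmax
    q = np.clip(np.round(g / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale, shape):
    return (q.astype(np.float32) * scale).reshape(shape)

w = np.random.randn(4, 16).astype(np.float32)
q, s = quantize_groupwise(w)
print(np.abs(w - dequantize(q, s, w.shape)).max())  # small reconstruction error
```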
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.