ranjiewwen (jiewen) / Starred · GitHub
🎯 Focusing
  • Algorithmic engineer
  • Chengdu

Organizations

@DIP-ML-AI

Starred repositories

Accelerate inference without tears

Python 315 21 Updated Mar 14, 2025

Efficient AI Inference & Serving

Python 469 29 Updated Jan 8, 2024

📰 Must-read papers and blogs on Speculative Decoding ⚡️

753 44 Updated May 27, 2025
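
As a companion to the reading list, a minimal sketch of the accept/reject rule at the heart of speculative decoding (a draft model proposes a token, the target model verifies it); the distributions below are toy arrays, not real model outputs.

```python
# Toy sketch of the speculative-sampling accept/reject rule used by
# speculative decoding: the draft proposes from q, the target verifies
# against p. All distributions here are made-up toy arrays.
import numpy as np

rng = np.random.default_rng(0)

def speculative_step(p_target, q_draft):
    """Sample one token with draft proposal and target verification."""
    x = rng.choice(len(q_draft), p=q_draft)            # draft proposal
    if rng.random() < min(1.0, p_target[x] / q_draft[x]):
        return x                                       # accepted
    # Rejected: resample from the residual distribution max(p - q, 0).
    residual = np.maximum(p_target - q_draft, 0.0)
    residual /= residual.sum()
    return rng.choice(len(residual), p=residual)

p = np.array([0.5, 0.3, 0.1, 0.1])      # toy "target" next-token distribution
q = np.array([0.25, 0.25, 0.25, 0.25])  # toy "draft" next-token distribution
print(speculative_step(p, q))
```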

Simple and efficient PyTorch-native transformer text generation in <1000 LOC of Python.

Python 5,962 552 Updated Apr 11, 2025

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

Python 22,642 2,498 Updated Aug 12, 2024

[ICLR 2024] Efficient Streaming Language Models with Attention Sinks

Python 6,899 383 Updated Jul 11, 2024
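
The core idea is easy to sketch: keep a few initial "sink" tokens plus a sliding window of recent tokens in the KV cache and evict everything in between. A minimal toy of that eviction policy, not the repository's actual implementation:

```python
# StreamingLLM-style cache policy sketch: always keep the first
# `num_sinks` positions ("attention sinks") plus the most recent
# `window` positions, evicting everything in between. Pure list logic,
# not the repository's KV-cache code.
def evict(cache, num_sinks=4, window=1020):
    if len(cache) <= num_sinks + window:
        return cache
    return cache[:num_sinks] + cache[-window:]

# Usage: token position ids stand in for cached key/value entries.
cache = list(range(2000))
cache = evict(cache)
print(len(cache), cache[:6], cache[-3:])  # 1024 [0, 1, 2, 3, 980, 981] [1997, 1998, 1999]
```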

Official code implementation of Vary-toy (Small Language Model Meets with Reinforced Vision Vocabulary)

Python 620 44 Updated Dec 30, 2024

TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones

Python 1,283 81 Updated Apr 18, 2024

llm-export can export LLM models to ONNX.

Python 292 33 Updated Jan 17, 2025
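
The repository layers LLM-specific handling on top of the standard ONNX export path; below is only a generic torch.onnx.export sketch on a tiny stand-in module, not the project's own CLI. The module, file name, and axis names are illustrative.

```python
# Generic ONNX export sketch with torch.onnx.export on a tiny stand-in
# language-model-like module. Illustrative only; llm-export adds
# LLM-specific handling on top of this basic mechanism.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Embedding(1000, 64), nn.Linear(64, 1000)).eval()
dummy_ids = torch.randint(0, 1000, (1, 16))           # (batch, seq_len)

torch.onnx.export(
    model, (dummy_ids,), "tiny_lm.onnx",
    input_names=["input_ids"], output_names=["logits"],
    dynamic_axes={"input_ids": {0: "batch", 1: "seq"},
                  "logits": {0: "batch", 1: "seq"}},
)
```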

Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads

Jupyter Notebook 2,530 176 Updated Jun 25, 2024

Official Implementation of EAGLE-1 (ICML'24), EAGLE-2 (EMNLP'24), and EAGLE-3.

Python 1,271 149 Updated May 18, 2025

📚 A curated list of Awesome LLM/VLM Inference Papers with codes: WINT8/4, FlashAttention, PagedAttention, Parallelism, MLA, etc.

Python 4,057 279 Updated May 27, 2025

Official inference library for Mistral models

Jupyter Notebook 10,266 916 Updated Mar 20, 2025

Inference Llama 2 in one file of pure C

C 18,429 2,263 Updated Aug 6, 2024

High-speed Large Language Model Serving for Local Deployment

C++ 8,212 431 Updated Feb 19, 2025

Optimized BERT transformer inference on NVIDIA GPUs. https://arxiv.org/abs/2210.03052

C++ 473 37 Updated Mar 15, 2024

[ICML 2024] Break the Sequential Dependency of LLM Inference Using Lookahead Decoding

Python 1,248 74 Updated Mar 6, 2025

A lightweight LLM inference framework.

C++ 728 93 Updated Apr 7, 2024

A series of large language models developed by Baichuan Intelligent Technology

Python 4,124 295 Updated Nov 8, 2024

Chinese version of llm-numbers.

123 5 Updated Dec 25, 2023

Numbers every LLM developer should know

4,230 140 Updated Jan 16, 2024
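
A worked example of the kind of back-of-the-envelope arithmetic the list collects: KV-cache size per token. The model dimensions below are assumed Llama-2-7B-like values, used purely for illustration.

```python
# Back-of-the-envelope KV-cache sizing. Dimensions are assumed
# Llama-2-7B-like values (32 layers, 32 heads, head_dim 128, fp16).
n_layers, n_heads, head_dim = 32, 32, 128
bytes_per_elem = 2                          # fp16/bf16

# Factor of 2 accounts for storing both keys and values.
kv_bytes_per_token = 2 * n_layers * n_heads * head_dim * bytes_per_elem
print(kv_bytes_per_token / 1024, "KiB per token")                  # 512.0 KiB
print(kv_bytes_per_token * 4096 / 2**30, "GiB for a 4k context")   # 2.0 GiB
```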

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorR…

C++ 10,586 1,451 Updated May 29, 2025

Code and documents of LongLoRA and LongAlpaca (ICLR 2024 Oral)

Python 2,660 289 Updated Aug 14, 2024

The official Python library for the OpenAI API

Python 26,857 3,928 Updated May 29, 2025
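
A minimal usage sketch of the client's chat completions call; it assumes OPENAI_API_KEY is set in the environment, and the model name is illustrative.

```python
# Minimal usage sketch of the official OpenAI Python client.
# Assumes OPENAI_API_KEY is set in the environment; model name is illustrative.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(resp.choices[0].message.content)
```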

Simple, safe way to store and distribute tensors

Python 3,280 252 Updated May 23, 2025
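
A minimal save/load round trip with the PyTorch bindings; the tensor names and file name are illustrative.

```python
# Minimal safetensors round trip using the PyTorch bindings.
# Tensor names and file name are illustrative.
import torch
from safetensors.torch import save_file, load_file

tensors = {"weight": torch.randn(4, 4), "bias": torch.zeros(4)}
save_file(tensors, "example.safetensors")

loaded = load_file("example.safetensors")
print(loaded["weight"].shape, loaded["bias"].dtype)
```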

[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration

Python 3,032 251 Updated May 9, 2025
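
A toy per-channel weight quantization round trip in NumPy, showing the kind of transform weight-only quantization applies; it omits AWQ's activation-aware scaling, and all arrays are made up.

```python
# Toy symmetric per-channel INT4-style weight quantization round trip.
# Omits AWQ's activation-aware per-channel scaling; arrays are made up.
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 16)).astype(np.float32)    # (out_features, in_features)

n_bits = 4
qmax = 2 ** (n_bits - 1) - 1                            # symmetric int4 range: [-7, 7]
scale = np.abs(W).max(axis=1, keepdims=True) / qmax     # one scale per output channel

W_q = np.clip(np.round(W / scale), -qmax, qmax).astype(np.int8)
W_dq = W_q.astype(np.float32) * scale                   # dequantized weights

print("max abs error:", np.abs(W - W_dq).max())
```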

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.

Python 6,427 549 Updated May 29, 2025
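
A minimal sketch following the project's documented pipeline quickstart; the model name is illustrative, downloading it requires network access, and the exact API may differ across versions.

```python
# Minimal sketch based on LMDeploy's documented pipeline quickstart.
# Model name is illustrative; the call downloads weights on first use.
from lmdeploy import pipeline

pipe = pipeline("internlm/internlm2-chat-1_8b")
responses = pipe(["Introduce yourself in one sentence."])
print(responses)
```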

LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.

Python 3,253 256 Updated May 29, 2025