[NeurIPS'24 Spotlight, ICLR'25, ICML'25] To speed up long-context LLM inference, attention is computed approximately with dynamic sparsity, which reduces inference latency by up to 10x for pre-filli…

Python 1,066 54 Updated Jun 25, 2025

ByteDance-Seed / FlexPrefill

Code for paper: [ICLR2025 Oral] FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference

Python 118 7 Updated May 19, 2025

antgroup / cakekv

Python 19 3 Updated Mar 17, 2025

safelix / linrec

Linear Recurrence Operations for PyTorch

Cuda 5 Updated Jul 4, 2025

Dao-AILab / flash-attention

Fast and memory-efficient exact attention

Python 18,264 1,794 Updated Jul 9, 2025

ericwtodd / function_vectors

Function Vectors in Large Language Models (ICLR 2024)

Python 170 35 Updated Apr 17, 2025

TransformerLensOrg / TransformerLens

A library for mechanistic interpretability of GPT-style language models

Python 2,328 412 Updated Jul 9, 2025

NVlabs / GatedDeltaNet

[ICLR 2025] Official PyTorch Implementation of Gated Delta Networks: Improving Mamba2 with Delta Rule

Python 183 11 Updated Mar 18, 2025

ByteDance-Seed / VeOmni

VeOmni: Scaling any Modality Model Training to any Accelerators with PyTorch native Training Framework

Python 367 21 Updated Jul 8, 2025

linkedin / Liger-Kernel

Efficient Triton Kernels for LLM Training

Python 5,334 368 Updated Jul 9, 2025

deepseek-ai / FlashMLA

FlashMLA: Efficient MLA decoding kernels

Cuda 11,641 876 Updated Apr 29, 2025

triton-lang / triton

Development repository for the Triton language and compiler

MLIR 16,089 2,102 Updated Jul 9, 2025

MoonshotAI / MoBA

MoBA: Mixture of Block Attention for Long-Context LLMs

Python 1,817 107 Updated Apr 3, 2025

mit-han-lab / Block-Sparse-Attention

A sparse attention kernel supporting mixed sparse patterns

C++ 249 12 Updated Feb 13, 2025

kvcache-ai / ktransformers

A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations

Python 14,531 1,035 Updated Jul 1, 2025

an-yongqi / systematic-outliers

[ICLR 2025] Systematic Outliers in Large Language Models.

Python 5 1 Updated Feb 11, 2025

Zefan-Cai / KVCache-Factory

Unified KV Cache Compression Methods for Auto-Regressive Models

Python 1,188 150 Updated Jan 4, 2025

wenhao728 / awesome-diffusion-v2v

Awesome diffusion Video-to-Video (V2V). A collection of papers on diffusion model-based video editing, a.k.a. video-to-video (V2V) translation, plus video editing benchmark code.

Python 234 10 Updated May 25, 2025

mit-han-lab / Quest

[ICML 2024] Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference

Cuda 302 34 Updated Nov 22, 2024

modelscope / DiffSynth-Studio

Enjoy the magic of Diffusion models!

Python 8,987 819 Updated Jul 8, 2025

CASIA-IVA-Lab / FastSAM

Fast Segment Anything

Python 7,966 730 Updated Jul 30, 2024

ruizheliUOA / Awesome-Interpretability-in-Large-Language-Models

This repository collects all relevant resources about interpretability in LLMs

362 25 Updated Nov 1, 2024

Yongqi An an-yongqi

Stars

arcee-ai / mergekit

ChenxinAn-fdu / POLARIS

MiniMax-AI / MiniMax-M1

GeeeekExplorer / nano-vllm

fla-org / flash-linear-attention

HazyResearch / lolcats

jxiw / M1

volcengine / verl

microsoft / MInference