an-yongqi (Yongqi An) / Starred · GitHub

Tools for merging pretrained large language models.

Python 5,982 575 Updated Jun 19, 2025

Scaling RL on advanced reasoning models

Python 388 19 Updated Jul 9, 2025

MiniMax-M1, the world's first open-weight, large-scale hybrid-attention reasoning model.

Python 2,607 204 Updated Jul 7, 2025

Nano vLLM

Python 5,063 594 Updated Jun 27, 2025

🚀 Efficient implementations of state-of-the-art linear attention models

Python 2,878 213 Updated Jul 9, 2025

Repo for "LoLCATs: On Low-Rank Linearizing of Large Language Models"

Python 243 25 Updated Jan 31, 2025

M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models

Python 24 1 Updated Jun 22, 2025

verl: Volcano Engine Reinforcement Learning for LLMs

Python 10,610 1,753 Updated Jul 9, 2025

[NeurIPS'24 Spotlight, ICLR'25, ICML'25] To speed up long-context LLM inference, compute attention with approximate, dynamic sparsity, which reduces inference latency by up to 10x for pre-filli…

Python 1,066 54 Updated Jun 25, 2025

Code for paper: [ICLR2025 Oral] FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference

Python 118 7 Updated May 19, 2025

Python 19 3 Updated Mar 17, 2025

Linear Recurrence Operations for PyTorch

Cuda 5 Updated Jul 4, 2025

Fast and memory-efficient exact attention

Python 18,264 1,794 Updated Jul 9, 2025

Function Vectors in Large Language Models (ICLR 2024)

Python 170 35 Updated Apr 17, 2025

A library for mechanistic interpretability of GPT-style language models

Python 2,328 412 Updated Jul 9, 2025

[ICLR 2025] Official PyTorch Implementation of Gated Delta Networks: Improving Mamba2 with Delta Rule

Python 183 11 Updated Mar 18, 2025

VeOmni: Scaling any Modality Model Training to any Accelerators with PyTorch native Training Framework

Python 367 21 Updated Jul 8, 2025

Efficient Triton Kernels for LLM Training

Python 5,334 368 Updated Jul 9, 2025

FlashMLA: Efficient MLA decoding kernels

Cuda 11,641 876 Updated Apr 29, 2025

Development repository for the Triton language and compiler

MLIR 16,089 2,102 Updated Jul 9, 2025

MoBA: Mixture of Block Attention for Long-Context LLMs

Python 1,817 107 Updated Apr 3, 2025

A sparse attention kernel supporting mix sparse patterns

C++ 249 12 Updated Feb 13, 2025

A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations

Python 14,531 1,035 Updated Jul 1, 2025

[ICLR 2025] Systematic Outliers in Large Language Models.

Python 5 1 Updated Feb 11, 2025

Unified KV Cache Compression Methods for Auto-Regressive Models

Python 1,188 150 Updated Jan 4, 2025

Awesome diffusion Video-to-Video (V2V). A collection of papers on diffusion model-based video editing, a.k.a. video-to-video (V2V) translation, plus benchmark code for video editing.

Python 234 10 Updated May 25, 2025

[ICML 2024] Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference

Cuda 302 34 Updated Nov 22, 2024

Enjoy the magic of Diffusion models!

Python 8,987 819 Updated Jul 8, 2025

Fast Segment Anything

Python 7,966 730 Updated Jul 30, 2024

This repository collects all relevant resources about interpretability in LLMs

362 25 Updated Nov 1, 2024