Starred Repositories
Sort: Recently starred
- Nano vLLM
- Maple Mono: open-source monospace font with rounded corners, ligatures, and Nerd-Font icons for IDEs and terminals; perfect 2:1 width ratio between CJK and Latin characters; fine-grained customization options.
- Retrying library for Python (a usage sketch follows at the end of this list)
- A comprehensive survey of LLM development, featuring numerous research papers along with their corresponding code links.
- Universal battlefield-adaptive Operator Evaluation Protocol for Arknights
- [DAC'25] Official implementation of "HybriMoE: Hybrid CPU-GPU Scheduling and Cache Management for Efficient MoE Inference"
- Processed / cleaned data for Paper Copilot
- PDF2zh for Zotero | A Zotero plugin for translating PDFs into Chinese
- PDF scientific paper translation with preserved formats: AI-powered bilingual full-text translation of PDF documents that fully preserves the original layout; supports Google/DeepL/Ollama/OpenAI and other services; provides CLI/GUI/MCP/Docker/Zotero interfaces.
- A Python-embedded DSL that makes it easy to write fast, scalable ML kernels with minimal boilerplate.
- [ICML 2025 Spotlight] ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference
- H2-LLM: Hardware-Dataflow Co-Exploration for Heterogeneous Hybrid-Bonding-based Low-Batch LLM Inference
- Open deep learning compiler stack for Kendryte AI accelerators ✨
- MAGI-1: Autoregressive Video Generation at Scale
- A Distributed Attention Towards Linear Scalability for Ultra-Long Context, Heterogeneous Data Training
- An experimental library that has evolved into P2688, the C++ pattern-matching proposal
- match(it): A lightweight single-header pattern-matching library for C++17 with macro-free APIs.
- Python interface for MLIR, the Multi-Level Intermediate Representation
- Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance.
- Distributed Compiler Based on Triton for Parallel Systems
- Python Fire is a library for automatically generating command line interfaces (CLIs) from absolutely any Python object. (A usage sketch follows at the end of this list.)
- SpInfer: Leveraging Low-Level Sparsity for Efficient Large Language Model Inference on GPUs
- Simple, Elegant, Typed Argument Parsing with argparse (a usage sketch follows at the end of this list)
- XAttention: Block Sparse Attention with Antidiagonal Scoring
- DeeperGEMM: a heavily optimized version of DeepGEMM
- High-performance inference framework for large language models, focusing on efficiency, flexibility, and availability.
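
For the "Retrying library for Python" entry, here is a minimal sketch of typical usage, assuming the repo is tenacity (the entry only names a generic retrying library); the fetch function and URL are illustrative, while the decorator names are tenacity's standard API.

```python
import urllib.request

from tenacity import retry, stop_after_attempt, wait_exponential

@retry(stop=stop_after_attempt(3), wait=wait_exponential(multiplier=0.5, max=10))
def fetch(url: str) -> bytes:
    """Fetch a URL, retried up to 3 times with exponential backoff on any exception."""
    with urllib.request.urlopen(url) as resp:  # a network error here triggers a retry
        return resp.read()

if __name__ == "__main__":
    print(len(fetch("https://example.com")))
```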
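For the Python Fire entry, a short usage sketch: fire.Fire exposes a Python object (here a plain function, chosen only for illustration) as a CLI, mapping its parameters to command-line flags.

```python
# greet.py -- illustrative script exposing a function as a CLI via Python Fire
import fire

def greet(name: str = "World", excited: bool = False) -> str:
    """Return a greeting; Fire maps --name and --excited to these parameters."""
    message = f"Hello, {name}"
    return message + "!" if excited else message

if __name__ == "__main__":
    fire.Fire(greet)  # e.g. `python greet.py --name=Ada --excited`
```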
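For the typed argparse entry (presumably the simple-parsing package), a hedged sketch of its dataclass-based style; the TrainConfig fields are made up for illustration.

```python
from dataclasses import dataclass

from simple_parsing import ArgumentParser

@dataclass
class TrainConfig:
    """Hypothetical options; each field becomes a typed command-line flag."""
    lr: float = 1e-3            # --lr
    epochs: int = 10            # --epochs
    run_name: str = "baseline"  # --run_name

parser = ArgumentParser()
parser.add_arguments(TrainConfig, dest="config")  # register the dataclass as an argument group
args = parser.parse_args()
print(args.config)  # e.g. TrainConfig(lr=0.001, epochs=10, run_name='baseline')
```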