zhenxl (zhengxianli) / Starred · GitHub

zhengxianli zhenxl

engineer

8 followers · 75 following

chitu.ai
beijing

Stars

sandyresearch / chipmunk

🎬 3.7× faster video generation E2E 🖼️ 1.6× faster image generation E2E ⚡ ColumnSparseAttn 9.3× vs FlashAttn‑3 💨 ColumnSparseGEMM 2.5× vs cuBLAS

Cuda 60 2 Updated May 8, 2025

deepseek-ai / FlashMLA

FlashMLA: Efficient MLA decoding kernels

Cuda 11,560 834 Updated Apr 29, 2025

pytorch-labs / tritonbench

Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance.

Python 125 17 Updated May 21, 2025

LouChao98 / vqtree

Python 1 Updated Apr 16, 2024

punica-ai / punica

Serving multiple LoRA finetuned LLM as one

Python 1,059 48 Updated May 8, 2024

ppl-ai / pplx-kernels

Perplexity GPU Kernels

C++ 305 33 Updated May 21, 2025

rhmaaa / comet-25

C++ 5 1 Updated Feb 11, 2025

deveshks / non-BlockingDistributedBST

C++ implementation of a non-blocking binary search tree with insert and search

C++ 1 Updated Aug 30, 2016

thu-pacman / chitu

High-performance inference framework for large language models, focusing on efficiency, flexibility, and availability.

Python 1,118 74 Updated May 15, 2025

ademeure / DeeperGEMM

Forked from deepseek-ai/DeepGEMM

DeeperGEMM: crazy optimized version

Cuda 69 Updated May 5, 2025

ScalingIntelligence / codemonkeys

Python 38 1 Updated Jan 28, 2025

CalebDu / Awesome-Cute

C++ 68 12 Updated May 16, 2025

pytorch-labs / applied-ai

Applied AI experiments and examples for PyTorch

Python 269 28 Updated May 16, 2025

continuedev / continue

⏩ Create, share, and use custom AI code assistants with our open-source IDE extensions and hub of models, rules, prompts, docs, and other building blocks

TypeScript 26,344 2,815 Updated May 21, 2025

alex-petrenko / faster-fifo

Faster alternative to Python's multiprocessing.Queue (IPC FIFO queue)

C++ 193 31 Updated Apr 28, 2025

argilla-io / distilabel

Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verified research papers.

Python 2,705 201 Updated May 19, 2025

vwxyzjn / cleanrl

High-quality single file implementation of Deep Reinforcement Learning algorithms with research-friendly features (PPO, DQN, C51, DDPG, TD3, SAC, PPG)

Python 7,083 756 Updated Apr 8, 2025

huggingface / search-and-learn

Recipes to scale inference-time compute of open models

Python 1,069 115 Updated May 8, 2025

huggingface / alignment-handbook

Robust recipes to align language models with human and AI preferences

Python 5,185 443 Updated Apr 30, 2025

huggingface / open-r1

Fully open reproduction of DeepSeek-R1

Python 24,499 2,255 Updated May 21, 2025

huggingface / trl

Train transformer language models with reinforcement learning.

Python 13,839 1,896 Updated May 21, 2025

srush / LLM-Training-Puzzles

What would you do with 1000 H100s...

Jupyter Notebook 1,045 66 Updated Jan 10, 2024

HazyResearch / zoology

Understand and test language model architectures on synthetic tasks.

Python 194 32 Updated Mar 6, 2025

cchan / tccl

extensible collectives library in triton

Python 86 5 Updated Mar 31, 2025

mobiusml / gemlite

Fast low-bit matmul kernels in Triton

Python 301 23 Updated May 21, 2025

databricks / megablocks

Python 1,356 195 Updated Apr 29, 2025

cmuparlay / psac

Parallel Self-Adjusting Computation

C++ 13 2 Updated Jul 5, 2021

stanford-futuredata / stk

Python 105 20 Updated Aug 26, 2024

LoongServe / LoongServe

Jupyter Notebook 99 8 Updated Nov 11, 2024

bytedance / lightseq

LightSeq: A High Performance Library for Sequence Processing and Generation

C++ 3,274 331 Updated May 16, 2023
