zhenxl (zhengxianli) / Starred · GitHub
  • chitu.ai
  • beijing


🎬 3.7× faster video generation E2E 🖼️ 1.6× faster image generation E2E ⚡ ColumnSparseAttn 9.3× vs FlashAttn‑3 💨 ColumnSparseGEMM 2.5× vs cuBLAS

Cuda 60 2 Updated May 8, 2025

FlashMLA: Efficient MLA decoding kernels

Cuda 11,560 834 Updated Apr 29, 2025

Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance.

Python 125 17 Updated May 21, 2025
Python 1 Updated Apr 16, 2024

Serving multiple LoRA finetuned LLM as one

Python 1,059 48 Updated May 8, 2024

Perplexity GPU Kernels

C++ 305 33 Updated May 21, 2025
C++ 5 1 Updated Feb 11, 2025

C++ implementation of a non-blocking binary search tree with insert and search

C++ 1 Updated Aug 30, 2016

High-performance inference framework for large language models, focusing on efficiency, flexibility, and availability.

Python 1,118 74 Updated May 15, 2025

DeeperGEMM: crazy optimized version

Cuda 69 Updated May 5, 2025
C++ 68 12 Updated May 16, 2025

Applied AI experiments and examples for PyTorch

Python 269 28 Updated May 16, 2025

⏩ Create, share, and use custom AI code assistants with our open-source IDE extensions and hub of models, rules, prompts, docs, and other building blocks

TypeScript 26,344 2,815 Updated May 21, 2025

Faster alternative to Python's multiprocessing.Queue (IPC FIFO queue)

C++ 193 31 Updated Apr 28, 2025

Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verified research papers.

Python 2,705 201 Updated May 19, 2025

High-quality single file implementation of Deep Reinforcement Learning algorithms with research-friendly features (PPO, DQN, C51, DDPG, TD3, SAC, PPG)

Python 7,083 756 Updated Apr 8, 2025

Recipes to scale inference-time compute of open models

Python 1,069 115 Updated May 8, 2025

Robust recipes to align language models with human and AI preferences

Python 5,185 443 Updated Apr 30, 2025

Fully open reproduction of DeepSeek-R1

Python 24,499 2,255 Updated May 21, 2025

Train transformer language models with reinforcement learning.

Python 13,839 1,896 Updated May 21, 2025

What would you do with 1000 H100s...

Jupyter Notebook 1,045 66 Updated Jan 10, 2024

Understand and test language model architectures on synthetic tasks.

Python 194 32 Updated Mar 6, 2025

Extensible collectives library in Triton

Python 86 5 Updated Mar 31, 2025

Fast low-bit matmul kernels in Triton

Python 301 23 Updated May 21, 2025
Python 1,356 195 Updated Apr 29, 2025

Parallel Self-Adjusting Computation

C++ 13 2 Updated Jul 5, 2021
Python 105 20 Updated Aug 26, 2024
Jupyter Notebook 99 8 Updated Nov 11, 2024

LightSeq: A High Performance Library for Sequence Processing and Generation

C++ 3,274 331 Updated May 16, 2023