Stars
Analyze the inference of Large Language Models (LLMs): computation, storage, transmission, and the hardware roofline model, in a user-friendly interface.
Compare different hardware platforms via the Roofline Model for LLM inference tasks (a minimal roofline sketch appears after this list).
Virtual whiteboard for sketching hand-drawn like diagrams
KernelBench: Can LLMs Write GPU Kernels? - Benchmark with Torch -> CUDA problems
FlashInfer: Kernel Library for LLM Serving
Triton-based implementation of Sparse Mixture of Experts (a minimal routing sketch appears after this list).
Radial Attention Official Implementation
GitHub mirror of the triton-lang/triton repo.
Introduction to Parallel Programming class code
MAGI-1: Autoregressive Video Generation at Scale
A collection of memory-efficient attention operators implemented in the Triton language (a chunked-attention sketch appears after this list).
⚡️FFPA: Extends FlashAttention-2 with Split-D, achieving ~O(1) SRAM complexity for large headdim; 1.8x~3x↑ vs SDPA.🎉
FP8 flash attention implemented on the Ada architecture using the CUTLASS library.
📚LeetCUDA: Modern CUDA learning notes with PyTorch for beginners🐑; 200+ CUDA kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉
Research project on scaling GPU-accelerated data management to large data volumes. Code base of two SIGMOD papers.
Distributed Compiler based on Triton for Parallel Systems
[CVPR 2025 Highlight] TinyFusion: Diffusion Transformers Learned Shallow
A fast communication-overlapping library for tensor/expert parallelism on GPUs.
[ICML2025] Sparse VideoGen: Accelerating Video Diffusion Transformers with Spatial-Temporal Sparsity
[NeurIPS 2023] Structural Pruning for Diffusion Models
Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation
A high-performance distributed file system designed to address the challenges of AI training and inference workloads.
SpargeAttention: Training-free sparse attention that can accelerate inference for any model.
Analyze computation-communication overlap in DeepSeek-V3/R1.
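
The roofline entries above reduce to one formula: attainable throughput = min(peak compute, memory bandwidth × arithmetic intensity). A minimal sketch of that comparison, using placeholder hardware numbers rather than anything from those repositories:

```python
def attainable_flops(peak_flops, mem_bandwidth, arithmetic_intensity):
    """Roofline: performance is capped by min(compute roof, bandwidth * intensity)."""
    return min(peak_flops, mem_bandwidth * arithmetic_intensity)

# Placeholder hardware numbers (FLOP/s and bytes/s), purely for illustration.
platforms = {
    "accelerator_a": (312e12, 2.0e12),
    "accelerator_b": (120e12, 0.9e12),
}

# Decode-phase GEMVs in LLM inference sit near ~1 FLOP/byte (memory-bound);
# prefill GEMMs have much higher intensity and can reach the compute roof.
for name, (flops, bw) in platforms.items():
    for intensity in (1.0, 64.0, 512.0):  # FLOPs per byte moved
        perf = attainable_flops(flops, bw, intensity)
        print(f"{name}: AI={intensity:>6.1f} FLOP/B -> {perf:.2e} FLOP/s")
```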
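For the sparse Mixture-of-Experts entry, a minimal top-k routing sketch in plain PyTorch; the shapes, gate weights, and per-expert loop here are illustrative assumptions, not the repository's fused Triton kernels.

```python
import torch

def sparse_moe(x, gate_w, experts, top_k=2):
    """x: [tokens, d]; gate_w: [d, n_experts]; experts: modules mapping [n, d] -> [n, d]."""
    probs = (x @ gate_w).softmax(dim=-1)                   # routing probabilities [tokens, n_experts]
    weights, idx = torch.topk(probs, top_k, dim=-1)        # keep only the top_k experts per token
    weights = weights / weights.sum(dim=-1, keepdim=True)  # renormalize over the chosen experts
    out = torch.zeros_like(x)
    for e, expert in enumerate(experts):
        rows, slots = (idx == e).nonzero(as_tuple=True)    # tokens that routed to expert e
        if rows.numel():
            out[rows] += weights[rows, slots, None] * expert(x[rows])
    return out

# usage sketch (hypothetical sizes):
# experts = [torch.nn.Linear(512, 512) for _ in range(8)]
# y = sparse_moe(torch.randn(16, 512), torch.randn(512, 8), experts)
```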
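For the memory-efficient attention entry, a minimal chunked online-softmax sketch in plain PyTorch showing why the full [seq, seq] score matrix is never materialized; this is a reference-level illustration of the general FlashAttention-style idea, not the repository's Triton operators.

```python
import torch

def chunked_attention(q, k, v, chunk=128):
    """q, k, v: [seq, d]; returns softmax(q @ k.T / sqrt(d)) @ v without a [seq, seq] matrix."""
    scale = q.shape[-1] ** -0.5
    m = torch.full((q.shape[0], 1), float("-inf"), dtype=q.dtype, device=q.device)  # running row max
    l = torch.zeros(q.shape[0], 1, dtype=q.dtype, device=q.device)                  # running denominator
    acc = torch.zeros_like(q)                                                       # running numerator
    for start in range(0, k.shape[0], chunk):
        s = (q @ k[start:start + chunk].T) * scale             # scores against one key/value block
        m_new = torch.maximum(m, s.max(dim=-1, keepdim=True).values)
        p = (s - m_new).exp()
        rescale = (m - m_new).exp()                            # correct previously accumulated sums
        l = l * rescale + p.sum(dim=-1, keepdim=True)
        acc = acc * rescale + p @ v[start:start + chunk]
        m = m_new
    return acc / l

# sanity check against the dense reference:
# q, k, v = (torch.randn(512, 64) for _ in range(3))
# ref = torch.softmax(q @ k.T / 64 ** 0.5, dim=-1) @ v
# assert torch.allclose(chunked_attention(q, k, v), ref, atol=1e-5)
```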