Analyze the inference of Large Language Models (LLMs). Analyze aspects like computation, storage, transmission, and hardware roofline model in a user-friendly interface.

Python 455 56 Updated Sep 11, 2024

feifeibear / LLMRoofline

Compare different hardware platforms via the Roofline Model for LLM inference tasks.

Jupyter Notebook 100 4 Updated Mar 13, 2024

mitmath / matrixcalc

MIT IAP short course: Matrix Calculus for Machine Learning and Beyond

Jupyter Notebook 482 65 Updated Feb 3, 2025

usstq / mm_amx

matmul using AMX instructions

C++ 13 5 Updated May 7, 2024

NyxWh1sper / nyxwh1sper.github.io

HTML 1 Updated Apr 14, 2025

1a1a11a / libCacheSim

a high performance library for building cache simulators

C++ 219 63 Updated May 10, 2025

0xAX / asm

Learning assembly for Linux x86_64

Assembly 2,855 336 Updated May 2, 2025

tinygrad / open-gpu-kernel-modules

Forked from NVIDIA/open-gpu-kernel-modules

NVIDIA Linux open GPU with P2P support

C 1,136 109 Updated May 5, 2025

facebook / infer

A static analyzer for Java, C, C++, and Objective-C

OCaml 15,204 2,033 Updated May 12, 2025

CodingMizore / SharedFDU

course notes for everyone

15 Updated May 6, 2025

ByteDance-Seed / Triton-distributed

Distributed Triton for Parallel Systems

Python 690 43 Updated May 12, 2025

ppl-ai / pplx-kernels

Perplexity GPU Kernels

C++ 285 31 Updated May 13, 2025

trailofbits / vast

VAST is an experimental compiler pipeline designed for program analysis of C and C++. It provides a tower of IRs as MLIR dialects to choose the best fit representations for a program analysis or fu…

C++ 418 29 Updated Apr 24, 2025

google / perfetto

Performance instrumentation and tracing for Android, Linux and Chrome

C++ 3,867 460 Updated May 14, 2025

Thesys-lab / Helix-ASPLOS25

Open-source implementation for "Helix: Serving Large Language Models over Heterogeneous GPUs and Network via Max-Flow"

Python 40 6 Updated Nov 24, 2024

async-profiler / async-profiler

Sampling CPU and HEAP profiler for Java featuring AsyncGetCallTrace + perf_events

C++ 8,177 901 Updated May 13, 2025

uclasystem / Semeru

A Memory-Disaggregated Managed Runtime.

66 6 Updated Aug 28, 2021

jeraymond / refcount

Reference counting in C

C 33 4 Updated Apr 14, 2023

ece-fast-lab / ASPLOS-2025-M5

This is the repository that holds the artifacts of ASPLOS'25 -- M5: Mastering Page Migration and Memory Management for CXL-based Tiered Memory Systems

C 12 Updated Apr 1, 2025

jeremy-rifkin / cpptrace

Simple, portable, and self-contained stacktrace library for C++11 and newer

C++ 949 106 Updated May 14, 2025

MrKai77 / Loop

Window management made elegant.

Swift 8,274 173 Updated Apr 14, 2025


Yi Sun Boreas618


Lists (1)

🔮 Future ideas

Stars

Blosc / python-blosc2

pytorch / extension-cpp

czg1225 / AsyncDiff

thuml / depyf

infinigence / FlashOverlap

microsoft / TileFusion

yibo-huang / tigon

uccl-project / uccl

microsoft / taccl

hahnyuan / LLM-Viewer