Stars
A high-performance library for compressed ndarrays, with a flexible computational engine
[NeurIPS 2024] AsyncDiff: Parallelizing Diffusion Models by Asynchronous Denoising
depyf is a tool to help you understand and adapt to the PyTorch compiler, torch.compile.
A lightweight design for computation-communication overlap.
TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing.
Tigon: A Distributed Database for a CXL Pod [OSDI '25]
TACCL: Guiding Collective Algorithm Synthesis using Communication Sketches
Analyze the inference of Large Language Models (LLMs). Analyze aspects like computation, storage, transmission, and hardware roofline model in a user-friendly interface.
Compare different hardware platforms via the Roofline Model for LLM inference tasks.
MIT IAP short course: Matrix Calculus for Machine Learning and Beyond
A high-performance library for building cache simulators
NVIDIA Linux open GPU with P2P support
A static analyzer for Java, C, C++, and Objective-C
Distributed Triton for Parallel Systems
VAST is an experimental compiler pipeline designed for program analysis of C and C++. It provides a tower of IRs as MLIR dialects to choose the best fit representations for a program analysis or fu…
Performance instrumentation and tracing for Android, Linux and Chrome
Open-source implementation for "Helix: Serving Large Language Models over Heterogeneous GPUs and Network via Max-Flow"
Sampling CPU and HEAP profiler for Java featuring AsyncGetCallTrace + perf_events
This is the repository that holds the artifacts of ASPLOS'25 -- M5: Mastering Page Migration and Memory Management for CXL-based Tiered Memory Systems
Simple, portable, and self-contained stacktrace library for C++11 and newer