Stars
JetStream is a throughput and memory optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in future -- PRs welcome).
pprof is a tool for visualization and analysis of profiling data
A simple, performant and scalable Jax LLM!
Development repository for the Triton language and compiler
Pax is a Jax-based machine learning framework for training large scale models. Pax allows for advanced and fully configurable experimentation and parallelization, and has demonstrated industry lead…
Universal LLM Deployment Engine with ML Compilation
A machine learning framework project motivated by CMU-10414
TH3CHARLie / Halide
Forked from halide/Halide
a language for fast, portable data-parallel computation
A simple yet powerful tool to turn traditional container/OS images into unprivileged sandboxes.
Collection of Summer 2025 tech internships!
Various translations of OSTEP can be found here. Help the cause and contribute!
MIT 6.824 (Distributed Systems) labs in Go
A library for replicating your Python class between multiple servers, based on the Raft protocol
A GPU benchmark suite for assessing on-chip GPU memory bandwidth
AI education materials for Chinese students, teachers and IT professionals.
portion, a Python library providing data structure and operations for intervals.
Must-read research papers and links to tools and datasets that are related to using machine learning for compilers and systems optimisation
The reference implementation of the Linux FUSE (Filesystem in Userspace) interface
TensorFlow code and pre-trained models for BERT
This is the top-level repository for the Accel-Sim framework.
A polyhedral compiler for expressing fast and portable data parallel algorithms
Practice on cifar100 (ResNet, DenseNet, VGG, GoogleNet, InceptionV3, InceptionV4, Inception-ResNetv2, Xception, Resnet In Resnet, ResNext, ShuffleNet, ShuffleNetv2, MobileNet, MobileNetv2, SqueezeNet…
JavaScript asynchronous Continuation-Passing Style transformation (deprecated).