Stars
xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism
GNNear: Accelerating Full-Batch Training of Graph Neural Networks with Near-Memory Processing
The Artifact of NeoMem: Hardware/Software Co-Design for CXL-Native Memory Tiering
LLM inference analyzer for different hardware platforms
Compare different hardware platforms via the Roofline Model for LLM inference tasks (see the Roofline sketch after this list).
Efficient operator implementations based on the Cambricon Machine Learning Unit (MLU).
ASTRA-sim2.0: Modeling Hierarchical Networks and Disaggregated Systems for Large-model Training at Scale
Latency and Memory Analysis of Transformer Models for Training and Inference
FreePDK45 V1.4: a Process Development Kit for the 45 nm technology node
An integrated power, area, and timing modeling framework for multicore and manycore architectures
Python-based research interface for blackbox and hyperparameter optimization, based on the internal Google Vizier Service.
Provides Python access to the NVML library for GPU diagnostics (see the pynvml sketch after this list)
Cavs: An Efficient Runtime System for Dynamic Neural Networks
Bridging polyhedral analysis tools to the MLIR framework
Neural network graphs and training metrics for PyTorch, TensorFlow, and Keras.
Research and development for optimizing transformers
[FPGA 2021, Best Paper Award] An automated floorplanning and pipelining tool for Vivado HLS.
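
For the Roofline Model entry above, here is a minimal sketch of how such a platform comparison can be set up. All platform names, hardware specs, and the arithmetic intensity below are illustrative placeholders, not measured numbers or values taken from any of the listed tools.

```python
# Minimal Roofline comparison sketch. All specs and the arithmetic
# intensity are hypothetical, illustrative values.

def attainable_tflops(peak_tflops: float, mem_bw_tbs: float, intensity: float) -> float:
    """Roofline: attainable perf = min(compute roof, bandwidth * intensity)."""
    return min(peak_tflops, mem_bw_tbs * intensity)

# Batch-1 LLM decoding reads roughly every weight once per token,
# so arithmetic intensity is low and performance is bandwidth-bound.
intensity = 2.0  # FLOP/byte, illustrative

platforms = {
    "platform_a": {"peak_tflops": 300.0, "mem_bw_tbs": 2.0},   # hypothetical specs
    "platform_b": {"peak_tflops": 1000.0, "mem_bw_tbs": 3.3},  # hypothetical specs
}

for name, s in platforms.items():
    perf = attainable_tflops(s["peak_tflops"], s["mem_bw_tbs"], intensity)
    print(f"{name}: ~{perf:.1f} TFLOP/s attainable at {intensity} FLOP/byte")
```

With a low intensity both platforms sit on the bandwidth-limited part of the roofline, which is why decode-heavy LLM inference comparisons tend to reduce to a memory-bandwidth comparison.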
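For the NVML entry above, a minimal usage sketch assuming the pynvml bindings (the standard Python package for NVML, distributed as nvidia-ml-py); it requires an NVIDIA driver and at least one visible GPU.

```python
# Minimal pynvml sketch: enumerate GPUs and print memory/utilization.
import pynvml

pynvml.nvmlInit()
try:
    for i in range(pynvml.nvmlDeviceGetCount()):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        name = pynvml.nvmlDeviceGetName(handle)
        if isinstance(name, bytes):  # older bindings return bytes
            name = name.decode()
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
        util = pynvml.nvmlDeviceGetUtilizationRates(handle)
        print(f"GPU {i} ({name}): "
              f"{mem.used / 2**30:.1f}/{mem.total / 2**30:.1f} GiB used, "
              f"{util.gpu}% utilization")
finally:
    pynvml.nvmlShutdown()
```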