- University of Virginia
- yg9bq@virginia.edu
- @YiminGao
TMMA: A Tiled Matrix Multiplication Accelerator for Self-Attention Projections in Transformer Models, optimized for edge deployment on Xilinx KV260.
Quantize GPT LLM models with HuggingFace transformers
Source code to simulate WTF-PAD on a set of web traffic traces.
Project repository for creating padding machines for Tor to defend against website fingerprinting
Scalable systolic array-based matrix-matrix multiplication implemented in Vivado HLS for Xilinx FPGAs.
A systolic array simulator for multi-cycle MACs and varying-byte words, with the paper accepted to HPCA 2022.
IC implementation of Systolic Array for TPU
LLMServingSim: A HW/SW Co-Simulation Infrastructure for LLM Inference Serving at Scale
Repository to host and maintain scale-sim-v2 code
Systolic Array implementation for ASIC Course
ARIES: An Agile MLIR-Based Compilation Flow for Reconfigurable Devices with AI Engines (FPGA 2025 Best Paper Nominee)
Allo: A Programming Model for Composable Accelerator Design
Machine-Learning Accelerator System Exploration Tools
Systolic matrix multiplication kernel implemented on Xilinx PYNQ FPGA board
SAURIA (Systolic-Array tensor Unit for aRtificial Intelligence Acceleration) is an open-source Convolutional Neural Network accelerator based on a GeMM systolic array engine.
Digital timing diagram editor
INT8 & FP16 multiplier accumulator (MAC) design with UVM verification completed.
FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.
Chris Titus Tech's Windows Utility - Install Programs, Tweaks, Fixes, and Updates