Starred repositories
[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
FPGA-based Vision Transformer accelerator (Harvard CS205)
It runs on the PYNQ-Z1 board. The repository contains the Verilog source, the Vivado configuration, and C code for SDK testing. The systolic array size is configurable and is currently 16×16.
mflowgen -- A Modular ASIC/FPGA Flow Generator
ASIC Design Kit for SkyWater 130 for use with mflowgen
Open source process design kit for usage with SkyWater Technology Foundry's 130nm node.
[TCAD'23] AccelTran: A Sparsity-Aware Accelerator for Transformers
[ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
Brevitas: neural network quantization in PyTorch
A curated list for Efficient Large Language Models
Verilog implementation of various types of CPUs
An open-source static random access memory (SRAM) compiler.
ASIC Design Kit for FreePDK45 + Nangate for use with mflowgen
Textbook and full source code for learning the basics of RISC-V pipelined CPU design using the Bluespec Hardware Design Language(s)
EDA toolchain for processing-in-memory architectures, including an architecture synthesizer, a compiler, and a simulator
An Automatic Synthesis Tool for PIM-based CNN Accelerators.
Verilog implementation of an IEEE 754 32-bit floating-point multiplier
The CORE-V CVA6 is an application-class, 6-stage RISC-V CPU capable of booting Linux
🎲 A Tiny and Platform-Independent True Random Number Generator for any FPGA (and ASIC).
Open-source GPU in Verilog, loosely based on the RISC-V ISA
IC implementation of Systolic Array for TPU
Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in PyTorch
Chisel: A Modern Hardware Design Language
This was originally a collection of papers on neural network accelerators. Now it is more of a personal selection of research on deep learning and computer architecture.
Memory-Centric Systems for AI (CSI6207-01), lecture at Yonsei (20-1)
DRAMsim3: a Cycle-accurate, Thermal-Capable DRAM Simulator
ONNXim is a fast cycle-level simulator that can model multi-core NPUs for DNN inference