Welcome to the repository for "What, When, Where to Compute-in-Memory for Efficient Matrix Multiplication during Machine Learning Inference" by Tanvi Sharma, Mustafa Ali, Indranil Chakraborty, and Kaushik Roy. WWW uses the existing Timeloop/Accelergy infrastructure to analytically evaluate different compute-in-memory designs when they are integrated into the memory hierarchy of a tensor-core-like architecture.
WWW compares SRAM-based Compute-in-Memory designs (referred to as primitives in this work) by abstracting them in terms of the following template:
The dataflow for each CiM primitive is decided by our priority-based algorithm when the primitive is integrated at the shared memory or register file level.
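To give a feel for what a priority-based dataflow choice means, here is a minimal, self-contained sketch. This is an illustration only, not the repository's algorithm (which is expressed as Timeloop constraints in www-cim/constraints/); the reuse scores, dimension names, and the tie-breaking rule are all hypothetical.

```python
# Hypothetical sketch of a priority-based loop-order choice (illustration
# only; the actual algorithm is encoded as Timeloop constraints in
# www-cim/constraints/).

def choose_loop_order(reuse_scores):
    """Order GEMM loop dimensions so that dimensions with higher data
    reuse at a given memory level are placed first (kept stationary
    longest). Ties are broken alphabetically for determinism."""
    return sorted(reuse_scores, key=lambda d: (-reuse_scores[d], d))

# Toy reuse scores for a (M x K) * (K x N) matrix multiplication at the
# shared-memory level; the numbers are made up for demonstration.
scores = {"M": 3, "N": 1, "K": 2}
print(choose_loop_order(scores))  # -> ['M', 'K', 'N']
```

In the real flow, a choice like this is expressed as per-level mapspace constraints that Timeloop's mapper then explores, rather than computed directly in Python.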
This repository contains the setup for:
- The timeloopfe infrastructure (accelergy-timeloop-infrastructure/ and timeloop-accelergy-exercises/)
- The priority-based algorithm (www-cim/constraints/)
- Scripts used to calculate final performance metrics and plot graphs (www-cim/post-process/)
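As a rough sketch of the kind of post-processing the last item refers to, the snippet below derives throughput and energy efficiency from Timeloop-style raw statistics (MAC count, cycle count, clock frequency, total energy). The function name, inputs, and numbers are hypothetical; the repository's actual scripts in www-cim/post-process may compute different metrics.

```python
# Hypothetical post-processing sketch (illustration only; the repository's
# actual scripts live in www-cim/post-process). Converts Timeloop-style raw
# statistics into throughput (TOPS) and energy efficiency (TOPS/W).

def performance_metrics(macs, cycles, freq_hz, energy_pj):
    """Derive throughput and efficiency from raw per-run statistics."""
    seconds = cycles / freq_hz
    tops = (2 * macs) / seconds / 1e12      # 2 ops (multiply + add) per MAC
    watts = (energy_pj * 1e-12) / seconds   # average power
    return tops, tops / watts               # (TOPS, TOPS/W)

# Toy numbers: 1M MACs in 10k cycles at 1 GHz, consuming 5 uJ total.
tops, tops_per_w = performance_metrics(
    macs=1_000_000, cycles=10_000, freq_hz=1e9, energy_pj=5_000_000)
print(f"{tops:.2f} TOPS, {tops_per_w:.2f} TOPS/W")  # -> 0.20 TOPS, 0.40 TOPS/W
```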
For the paper, please visit WWW.