The goal of this project is to measure and analyse how the silicon area needed for a large ternary dot product unit scales with vector size. It is part of a larger effort to explore architectures for neural networks on chip.
For more information read the documentation.
The design computes a 128-element dot product every cycle. At a nominal ASIC frequency of 50 MHz this corresponds to 6.4 GOP/s (128 operations per cycle × 50 MHz).
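As a rough software reference for what the hardware computes, the sketch below models a ternary dot product and reproduces the throughput figure. It assumes that both operand vectors hold values in {-1, 0, +1} and that one "operation" means one element-wise multiply-accumulate; the function names `ternary_dot` and `throughput_ops_per_second` are illustrative and not taken from the project.

```python
def ternary_dot(weights, inputs):
    """Dot product of two equal-length vectors with elements in {-1, 0, +1}.

    Assumption: both operands are ternary (not confirmed by the README).
    """
    if len(weights) != len(inputs):
        raise ValueError("vectors must have the same length")
    if any(x not in (-1, 0, 1) for x in list(weights) + list(inputs)):
        raise ValueError("elements must be -1, 0 or +1")
    return sum(w * x for w, x in zip(weights, inputs))


def throughput_ops_per_second(vector_size, clock_hz):
    """Operations per second, counting one multiply-accumulate per element per cycle."""
    return vector_size * clock_hz


print(ternary_dot([1, -1, 0, 1], [1, 1, -1, -1]))       # 1 - 1 + 0 - 1 = -1
print(throughput_ops_per_second(128, 50_000_000) / 1e9)  # 6.4 (GOP/s)
```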
Vector size | Adder tree depth | Output type | # of logic cells | Total # of cells | Wire length (um) | Dimensions (um) | Area (um2) | Tiles |
---|---|---|---|---|---|---|---|---|
1 | - | 2-bit signed | 2 | 32 | 215 | 18 x 11 | 198 | 0.9% |
2 | 1 | 3-bit signed | 10 | 44 | 395 | 45 x 8 | 360 | 1.6% |
4 | 2 | 4-bit signed | 31 | 77 | 757 | 60 x 11 | 660 | 3.8% |
32 | 5 | 7-bit signed | 336 | 508 | 9982 | 112 x 70 | 7840 | 36% |
64 | 6 | 8-bit signed | 737 | 1073 | 23329 | 160 x 86 | 13760 | 75% |
128* | 7 | 9-bit signed | 1472 | 2121 | 59822 | 112 x 200 | 22400 | 143% |
256 | 8 | 10-bit signed | 2941 | 4207 | 151707 | 320 x 112 | 35840 | 269% |
*) Version taped out with TinyTapeout 10
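The "Adder tree depth" and "Output type" columns follow a simple pattern: the depth is log2 of the vector size, and the output needs enough two's-complement bits to hold every value from -N to +N. The sketch below derives both, assuming a balanced binary adder tree and element products in {-1, 0, +1}; the helper names are hypothetical.

```python
def adder_tree_depth(n):
    """Levels in a balanced binary adder tree over n inputs: ceil(log2(n))."""
    return (n - 1).bit_length()


def output_bits(n):
    """Smallest two's-complement width that holds every value in [-n, +n].

    A b-bit signed value tops out at 2**(b-1) - 1, so we need 2**(b-1) > n.
    """
    return n.bit_length() + 1


for n in (1, 2, 4, 32, 64, 128, 256):
    print(f"N={n:3d}  depth={adder_tree_depth(n)}  output={output_bits(n)}-bit signed")
```

For N = 128 this gives a depth of 7 and a 9-bit signed output, matching the table. An 8-bit signed value stops at +127, so +128 would not fit.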
Type | Tiles | Wire length (um) | Setup Worst Slack | Setup Slack (Typical) | fMax |
---|---|---|---|---|---|
Naive v[0]+v[1]+v[2]+ ... | 38.395 % | 12040 | 7.3ns | 13.48ns | 153 MHz |
Adder tree | 38.466 % | 11425 | 7.3ns | 13.45ns | 152 MHz |
Logic, carry save adder | 38.136 % | 10931 | 6.9ns | 13.41ns | 151 MHz |
HA/FA cells, carry save adder | 28.712 % | 9164 | 3.5ns | 12.24ns | 129 MHz |
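Note that there is no significant area difference between the approaches unless the sky130 half-adder/full-adder (HA/FA) cells are used. That variant is smaller (28.7 % vs. about 38 % of a tile) but has less setup slack and a lower fMax (129 MHz vs. about 152 MHz).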
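To make the three summation structures concrete, here is a minimal Python sketch of each. It only illustrates the arithmetic and is not the project's RTL: Python integers stand in for fixed-width signals, and the function names are hypothetical. The carry-save version uses the full-adder identity a + b + c = (a XOR b XOR c) + 2 * majority(a, b, c), which also holds for negative values in two's complement.

```python
def naive_sum(values):
    """Sequential chain: v[0] + v[1] + v[2] + ..."""
    total = 0
    for v in values:
        total += v
    return total


def tree_sum(values):
    """Balanced binary adder tree: add neighbouring pairs level by level."""
    level = list(values)
    if not level:
        return 0
    while len(level) > 1:
        nxt = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])  # odd element passes through to the next level
        level = nxt
    return level[0]


def full_adder_3to2(a, b, c):
    """Bitwise 3:2 compressor: returns (sum word, carry word shifted left by one)."""
    s = a ^ b ^ c
    carry = ((a & b) | (b & c) | (a & c)) << 1
    return s, carry


def carry_save_sum(values):
    """Compress three terms into two per step; one ordinary addition at the end."""
    terms = list(values)
    while len(terms) > 2:
        nxt = []
        for i in range(0, len(terms) - 2, 3):
            nxt.extend(full_adder_3to2(*terms[i:i + 3]))
        nxt.extend(terms[len(terms) - len(terms) % 3:])  # leftover 0..2 terms
        terms = nxt
    return sum(terms)


products = [1, -1, 0, 1, 1, -1, 1, 0]  # element-wise ternary products
print(naive_sum(products), tree_sum(products), carry_save_sum(products))  # 2 2 2
```

All three produce the same result; in hardware they differ only in how the additions are arranged, which is what the area and timing figures above compare.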
Left: blue cells - compute, white cells - ternary vector storage
Right: wires connecting cells
blue cells - compute, white cells - ternary vector storage
Read the project's documentation.
Tiny Tapeout is an educational project that aims to make it easier and cheaper than ever to get your digital and analog designs manufactured on a real chip.
To learn more and get started, visit https://tinytapeout.com.