Virgo is a GPU microarchitecture that integrates dedicated matrix units at the cluster (SM) level, achieving better FLOPS scalability and energy efficiency.
This repository contains the essential RTL for Virgo's implementation: the Gemmini matrix unit integration, shared memory, baseline Tensor Core models, memory coalescer, and Vortex SIMT core integration.
The entire Virgo GPU design is implemented within the Chipyard SoC environment. To evaluate the full design, please follow the instructions in Chipyard.
The GPU kernels written and evaluated for Virgo can be found in virgo-kernels.
A Virgo cluster is constructed by integrating a collection of Tiles that
house Virgo's compute units, such as Vortex SIMT cores and the Gemmini matrix
unit, as well as memory units such as the shared memory and interconnect. We
use rocket-chip's Cluster API to construct the cluster hierarchy.
The Chisel RTL code for the main Virgo hardware modules can be found in
src/main/scala/radiance:

- tile
  - RadianceCluster.scala: Top-level definition of a Virgo Cluster
  - RadianceSharedMem.scala: Shared memory and interconnect implementation
  - RadianceTile.scala: Vortex SIMT core tile
  - GemminiTile.scala: Gemmini-based Virgo matrix unit and MMIO instantiation
  - VortexCore.scala: Chisel wrapper module for Vortex SIMT core
  - Barrier.scala: Cluster-wide barrier synchronizer module
- memory
  - Coalescing.scala: Memory coalescer implementation
  - SyncMem.scala: SRAM implementation for the shared memory
  - *Node.scala: Arbiter and multiplexer nodes used in the shared memory interconnect
- core
  - TensorCoreDecoupled.scala: Hopper-style Tensor Core implementation
  - TensorDPU.scala: Four-element dot-product units used in Tensor Core implementations
- subsystem: Chipyard Config definitions for parameterizing clusters
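
For readers unfamiliar with what a memory coalescer does, the following is a minimal behavioral sketch in plain Scala. It is not the logic in Coalescing.scala: the names `CoalescerModel`, `WideRequest`, and `coalesce`, and the 32-byte default line size, are hypothetical and chosen only for illustration. The general idea is that per-lane addresses from a SIMT core that fall in the same aligned line can be served by one wide request.

```scala
// Behavioral model only: plain Scala, no Chisel or rocket-chip dependencies.
// All names and the default line size are illustrative assumptions,
// not taken from the Virgo RTL.
object CoalescerModel {

  /** One wide memory request covering a single aligned line,
    * plus the indices of the SIMT lanes it serves. */
  final case class WideRequest(lineAddr: Long, lanes: Seq[Int])

  /** Groups per-lane byte addresses by the aligned line they fall in.
    *
    * @param laneAddrs one byte address per lane, indexed by lane number
    * @param lineBytes width of a wide request in bytes (must be a power of two)
    * @return one request per distinct line, sorted by line address
    */
  def coalesce(laneAddrs: Seq[Long], lineBytes: Int = 32): Seq[WideRequest] = {
    require(lineBytes > 0 && (lineBytes & (lineBytes - 1)) == 0,
      "lineBytes must be a power of two")
    // Clearing the low bits aligns an address down to its line boundary.
    val mask = ~(lineBytes.toLong - 1)
    laneAddrs.zipWithIndex
      .groupBy { case (addr, _) => addr & mask }
      .toSeq
      .sortBy { case (line, _) => line }
      .map { case (line, hits) => WideRequest(line, hits.map(_._2).sorted) }
  }
}
```

Calling `CoalescerModel.coalesce(Seq(0x1000L, 0x1004L, 0x2000L))` yields two requests: lanes 0 and 1 share the line at 0x1000, and lane 2 gets its own request at 0x2000. A hardware coalescer additionally has to handle timing, request queues, and response routing, which this model ignores.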
More details to follow.