Instruction Cache Architectures for Tightly-Coupled Shared-Memory Clusters

Daniele Bortolotti, Francesco Paterna, Christian Pinto, Andrea Marongiu, Martino Ruggiero and Luca Benini.
Presented at the 1st Platform 2012 Developer Conference, Grenoble, France.
To keep pace with Moore’s law, several chip-multiprocessor (CMP) platforms are embracing the many-core paradigm, in which a large number of simple cores are integrated onto the same die. Current examples of many-core architectures include GPGPUs such as NVIDIA Fermi, the HyperCore Architecture Line (HAL) processors from Plurality, and STMicroelectronics’ Platform 2012 (P2012). All of these architectures share a few common traits: their fundamental computing tile is a tightly coupled multicore cluster with a fairly large number of simple cores and a shared, multibanked L1 memory for fast data access. An effective instruction cache architecture is key to providing sufficient instruction-fetch bandwidth to such a cluster. The main contribution of this work is the analysis and comparison of the two principal instruction caching architectures for tightly coupled CMP clusters: (1) a private instruction cache per core and (2) a shared instruction cache per cluster. We developed a cycle-accurate model of a P2012-like cluster that supports both cache organizations and exposes several configurable architectural parameters for design-space exploration.
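To make the trade-off between the two organizations concrete, here is a toy, trace-driven sketch. It is not the authors' cycle-accurate model. It assumes direct-mapped caches, 16-byte lines, 4-byte instructions, cores executing the same loop in lockstep (SPMD style), and equal total capacity for both designs. The names `DirectMappedCache`, `loop_trace`, `simulate_private` and `simulate_shared` are hypothetical. The sketch counts only misses and deliberately ignores bank conflicts, interconnect latency and refill bandwidth, which are the costs a shared cache pays.

```python
class DirectMappedCache:
    """Hypothetical direct-mapped instruction cache (toy model, not P2012)."""

    def __init__(self, num_lines: int, line_bytes: int = 16):
        self.num_lines = num_lines
        self.line_bytes = line_bytes
        self.tags = [None] * num_lines  # one tag per cache line
        self.accesses = 0
        self.misses = 0

    def fetch(self, addr: int) -> bool:
        """Return True on a hit; on a miss, refill the line and return False."""
        self.accesses += 1
        line = addr // self.line_bytes
        index = line % self.num_lines
        if self.tags[index] == line:
            return True
        self.misses += 1
        self.tags[index] = line
        return False


def loop_trace(footprint_bytes: int, iterations: int, instr_bytes: int = 4):
    """Yield fetch addresses of a straight-line loop body run `iterations` times."""
    for _ in range(iterations):
        for addr in range(0, footprint_bytes, instr_bytes):
            yield addr


def simulate_private(num_cores: int, lines_per_core: int, trace) -> int:
    """Each core owns a private cache; all cores fetch the same address in lockstep."""
    caches = [DirectMappedCache(lines_per_core) for _ in range(num_cores)]
    for addr in trace:
        for cache in caches:
            cache.fetch(addr)
    return sum(cache.misses for cache in caches)


def simulate_shared(num_cores: int, lines_per_core: int, trace) -> int:
    """All cores share one cache with the same total capacity (no contention modeled)."""
    cache = DirectMappedCache(num_cores * lines_per_core)
    for addr in trace:
        for _ in range(num_cores):
            cache.fetch(addr)
    return cache.misses


if __name__ == "__main__":
    # Assumed setup: 16 cores, 1 KiB per core, a 2 KiB loop body run 10 times.
    print("private misses:", simulate_private(16, 64, loop_trace(2048, 10)))  # 20480
    print("shared misses:", simulate_shared(16, 64, loop_trace(2048, 10)))    # 128
```

With a 2 KiB loop body, each 1 KiB private cache thrashes on every iteration, and every core pays the same misses because the code is replicated in all 16 caches. The pooled 16 KiB shared cache takes only the 128 cold misses. When the loop fits in a private cache, both designs see only cold misses. The open question is whether the shared design's contention and latency costs, which this toy omits, outweigh its capacity advantage. That is what a cycle-accurate model is needed to answer.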