Vertical Stealing: Robust, Locality-Aware Do-All Workload Distribution for 3D MPSoCs

Marongiu A., Burgio P., Benini L.
Proceedings of the 2010 International Conference on Compilers, Architecture, and Synthesis for Embedded Systems (CASES 2010).
In this paper we address the issue of efficient do-all workload distribution on an embedded 3D MPSoC. 3D stacking technology enables low-latency, high-bandwidth access to multiple large memory banks in close spatial proximity. In our implementation, one silicon layer contains multiple processors, whereas one or more DRAM layers stacked on top host a NUMA memory subsystem. To obtain high locality and a balanced workload, we consider a two-step approach. First, a compiler pass analyzes the memory references in a loop and schedules each iteration on the processor that owns the most frequently accessed data. Second, if locality-aware loop parallelization has generated an unbalanced workload, we allow idle processors to execute part of the remaining work from their neighbours by implementing runtime support for work stealing.
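
The two-step approach can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the paper's compiler pass or runtime: the block-wise address-to-bank mapping (`owner_of`), the linear arrangement of processors, the tie-breaking rule, the one-iteration-per-round execution model, and all function names are hypothetical choices made for illustration only.

```python
from collections import Counter, deque


def owner_of(address, bank_size):
    """Assumed NUMA mapping: bank i holds addresses [i*bank_size, (i+1)*bank_size)."""
    return address // bank_size


def locality_schedule(iter_refs, num_procs, bank_size):
    """Step 1: queue each iteration on the processor owning most of its data.

    iter_refs[i] is the list of addresses touched by iteration i; all
    addresses are assumed to fall inside the num_procs banks.
    """
    queues = [deque() for _ in range(num_procs)]
    for it, refs in enumerate(iter_refs):
        counts = Counter(owner_of(a, bank_size) for a in refs)
        # Highest reference count wins; ties go to the lowest processor id.
        best = max(counts, key=lambda p: (counts[p], -p))
        queues[best].append(it)
    return queues


def run_with_stealing(queues):
    """Step 2: round-based simulation of work stealing from neighbours.

    In each round every processor runs one iteration: its own next one
    if available, otherwise one stolen from the tail of the nearest
    non-empty queue (processors are assumed to sit on a line).
    """
    n = len(queues)
    executed = [[] for _ in range(n)]
    while any(queues):  # an empty deque is falsy
        for p in range(n):
            if queues[p]:
                executed[p].append(queues[p].popleft())
                continue
            for dist in range(1, n):
                victims = [v for v in (p - dist, p + dist)
                           if 0 <= v < n and queues[v]]
                if victims:
                    executed[p].append(queues[victims[0]].pop())
                    break
    return executed


# Four iterations that all reference mostly bank 0 of a two-bank system.
refs = [[0, 1, 150], [2, 3], [4, 160, 5], [10]]
queues = locality_schedule(refs, num_procs=2, bank_size=100)
executed = run_with_stealing(queues)
```

In this example the locality-aware step places all four iterations on processor 0, leaving processor 1 idle; the stealing step then lets processor 1 take iterations from the tail of processor 0's queue, so each processor ends up executing two iterations.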