Description
PRO3D, or “Programming for Future 3D Architecture with Many Cores”, is an FP7 project funded by the EU under grant agreement n° 248 776. PRO3D belongs to the theme “3.6 Computing Systems”.
The project started on January 1st and was successfully completed on December 31st, 2012. This page presents the summary of the project achievements.
PRO3D Executive Summary
During the last three decades, the performance of microprocessors & microcontrollers has steadily increased at the impressive rate of 100 times per decade. This was fuelled by: (1) The exponential growth in clock speeds; (2) The exponential growth in the number of transistors per die; and (3) The acceleration of instruction flows obtained by various techniques for reducing latency and maximising the amount of computation per clock cycle.
However, this picture changed rapidly and radically. The shift to parallel architectures was not the consequence of a scientific breakthrough. It was primarily a consequence of hitting technology walls that prevented further progress in the efficient implementation of traditional uniprocessor designs in silicon. The technologies that hit these walls are: (a) Voltage scaling and power reduction techniques, or Power Wall; (b) Instruction-level parallelism, or Complexity Wall; (c) Memory latency hiding techniques, or Memory Wall; (d) Reliable and low-variability silicon technology, or Yield Wall.
As an illustration of the rapidly evolving context of PRO3D, the expected industrial solutions to the Memory Wall changed dramatically during the course of the project. At the start of PRO3D the most promising solution was WideIO, where a dedicated memory layer is connected to the computing fabric by vertical interconnects. This solution was expected to outperform the flat, 2D, LPDDR solutions in both speed and energy efficiency. By the time the project completed, the LPDDR roadmap had accelerated to the point of significantly overlapping with WideIO through an incremental evolution of the standard, 2D-based solution: LPDDR3 & LPDDR4.
The PRO3D project proposed a holistic approach in which development activities ranged from programming to architecture exploration and fabrication technologies, and yielded the following outcomes:
– Thermal Modelling & Simulation
3D technology allows a smaller footprint in each layer and shorter vertical wires, but thermal problems caused by higher power densities and greater thermal resistances impede the establishment of this technology. Thus, thermal modelling has become increasingly important in 3D chip design.
The 3D-ICE simulator developed in PRO3D is ideal for situations where a quick estimate of chip temperatures is required. 3D-ICE can be used to assist the early stages of architecture design, such as the floorplanning of 3D multicore architectures or the enhancement of liquid cooling efficiency through channel modulation. Simulations run with 3D-ICE also contributed to the development of new strategies for the optimal placement of thermal sensors and to the validation of run-time thermal-aware management policies.
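To make the link between power density, thermal resistance and temperature concrete, here is a minimal steady-state sketch in Python. It is not 3D-ICE's model or interface: it assumes a purely one-dimensional stack in which all heat leaves through a heat sink on one side, and the function name, parameters and numeric values are hypothetical.

```python
def stack_temperatures(powers_w, resistances_k_per_w, ambient_c=25.0):
    """Estimate steady-state layer temperatures of a 1D die stack.

    Assumptions (illustrative only, not taken from 3D-ICE):
    - layers are listed from the heat sink outwards;
    - resistances_k_per_w[i] is the thermal resistance between layer i
      and the previous layer (or the ambient, for layer 0);
    - all dissipated power flows towards the heat sink.
    """
    if len(powers_w) != len(resistances_k_per_w):
        raise ValueError("one resistance is required per layer")

    temperatures = []
    temperature = ambient_c
    # Heat crossing a resistance is the power of every layer beyond it.
    heat_flow = sum(powers_w)
    for power, resistance in zip(powers_w, resistances_k_per_w):
        temperature += resistance * heat_flow
        temperatures.append(temperature)
        heat_flow -= power
    return temperatures


if __name__ == "__main__":
    # Two layers: 2 W next to the sink, 1 W stacked on top of it.
    print(stack_temperatures([2.0, 1.0], [0.5, 1.0]))
```

Even in this simplified model, the layer farthest from the heat sink is the hottest, because its heat must cross every resistance in the stack.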
– Rigorous Design Flow & Statistical Model Checking
PRO3D developed a rigorous design flow based on the BIP and DOL component framework. The flow is (i) model-based, that is, both application software and mixed hardware/software system descriptions are modelled using a single semantic framework; (ii) component-based, that is, it provides primitives for building composite components as the composition of simpler components; (iii) tool-supported, that is, all steps in the design flow are realized automatically by tools; (iv) it supports
(SMC)
The core idea of SMC is to
– Exploiting 3D Opportunities into Multicore Architectures
The architecture developed in the PRO3D project includes several computation tiles connected via a mesh NoC. The computation tiles are homogeneous and consist of a network interface for accessing the external NoC, a fast crossbar for quick access to local memory, a set of devices for efficient intra-tile synchronization and a variable number of Processing Elements (PEs). The memory hierarchy comprises several levels and memory types: L1 is the closest to the PE and can also be accessed by non-local processing elements with an increased access time. The L2 level is Tightly Coupled Data Memory (TCDM), directly connected via a logarithmic interconnect inside each cluster. L3 is a large memory that can be accessed by all the clusters of the fabric, implemented as a central 3D stacked memory.
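The tile organisation described above can be sketched as a small data model. The Python below is illustrative only: the class and function names are hypothetical, and the hop count assumes dimension-ordered (XY) routing on the mesh, which the project summary does not specify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tile:
    """One homogeneous computation tile at mesh position (x, y)."""
    x: int
    y: int
    num_pes: int  # variable number of Processing Elements per tile


def build_mesh(width, height, pes_per_tile):
    """Create a width x height mesh of identical tiles, row by row."""
    return [Tile(x, y, pes_per_tile)
            for y in range(height)
            for x in range(width)]


def hop_count(src, dst):
    """NoC hops between two tiles, assuming XY routing (an assumption)."""
    return abs(src.x - dst.x) + abs(src.y - dst.y)


if __name__ == "__main__":
    mesh = build_mesh(width=4, height=4, pes_per_tile=8)
    print(sum(tile.num_pes for tile in mesh), "PEs in total")
    print(hop_count(mesh[0], mesh[-1]), "hops corner to corner")
```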
– Performing thermal-aware system-level exploration & analysis of 3D designs
The move towards architectures with tens of cores complicates the process of designing applications. The source of the complexity is two-fold: (1) the specification of the application must expose concurrency while remaining free of common errors such as race conditions; (2) a highly concurrent application must be mapped onto the architecture with performance and/or temperature considerations.
The DOL formalism was selected as the programming model for the project and has been improved along several axes: (1) Specification and verification, as we specify data flow applications as a set of processes communicating via first-in-first-out (FIFO) buffers; (2) Code generation and calibration of the application for design space exploration of mappings; (3) An analytic approach to compute the worst-case die temperature; (4) Another approach to directly calibrate the thermal performance of an application without the need for intermediate parameters. And finally, (5) Real-time performance with thermal-aware run-time adaptation has been studied. The main options analyzed were: throttling by shaping the input of tasks, and DVFS control of a processor.
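The idea of processes communicating only through FIFO buffers can be illustrated with a short Python sketch. This is not the DOL API: the `Fifo` class, the three-stage pipeline and the round-robin scheduling loop are hypothetical, chosen only to show that processes which share no state other than bounded FIFOs cannot race on shared data.

```python
from collections import deque


class Fifo:
    """Bounded first-in-first-out channel between two processes."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = deque()

    def can_write(self):
        return len(self.items) < self.capacity

    def can_read(self):
        return len(self.items) > 0

    def write(self, value):
        self.items.append(value)

    def read(self):
        return self.items.popleft()


def run_pipeline(samples, capacity=2):
    """Run producer -> square -> consumer with round-robin scheduling."""
    samples = list(samples)
    pending = deque(samples)
    producer_out = Fifo(capacity)
    square_out = Fifo(capacity)
    results = []

    while len(results) < len(samples):
        # Producer: emits one sample if its output FIFO has room.
        if pending and producer_out.can_write():
            producer_out.write(pending.popleft())
        # Square: fires only when input is available and output has room.
        if producer_out.can_read() and square_out.can_write():
            value = producer_out.read()
            square_out.write(value * value)
        # Consumer: drains one token per scheduling round.
        if square_out.can_read():
            results.append(square_out.read())
    return results


if __name__ == "__main__":
    print(run_pipeline([1, 2, 3]))
```

Because each process touches only its own FIFOs, the output order is the same for any fair schedule, which is what makes such specifications amenable to mapping exploration.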
– Virtual Prototyping
The lack of Electronic Design Automation (EDA) tools that can provide IC designers with efficient simulation of the functional and thermal behavior of ICs limits the development of thermal-aware design and run-time approaches for 3D ICs. In PRO3D, we have successfully developed a new virtual platform prototyping framework targeting the full-system simulation of massively parallel heterogeneous 3D systems-on-chip, composed of a general-purpose processor and a many-core hardware accelerator.