Description

PRO3D, or “Programming for Future 3D Architecture with Many Cores”, is an FP7 project funded by the EU under grant agreement n° 248 776. PRO3D belongs to the theme “3.6 Computing Systems”.

The project started on January 1st and was successfully completed on December 31st, 2012. This page presents the summary of the project achievements.

PRO3D Executive Summary

During the last three decades, the performance of microprocessors & microcontrollers has steadily increased at the impressive rate of 100 times per decade. This was fuelled by: (1) The exponential growth in clock speeds; (2) The exponential growth in the number of transistors per die; and (3) The acceleration of instruction flows obtained by various techniques for reducing latency and maximising the amount of computation per clock cycle.

However, this picture changed rapidly and radically. The shift to parallel architectures was not the consequence of a scientific breakthrough. It was primarily a consequence of hitting technology walls that prevented further efficient implementation of traditional uniprocessor designs in silicon. The technologies that hit these walls are: (a) Voltage scaling and power reduction techniques, or Power Wall; (b) Instruction-level parallelism, or Complexity Wall; (c) Memory latency hiding techniques, or Memory Wall; (d) Reliable and low-variability silicon technology, or Yield Wall.

As an illustration of the rapidly evolving context of PRO3D, we can mention that the expected industrial solutions to the Memory Wall changed dramatically during the course of the project. At the start of PRO3D, the most promising solution was WideIO, where a dedicated memory layer is connected to the computing fabric by vertical interconnects. This solution was expected to outperform the flat, 2D, LPDDR solutions in both speed and energy efficiency. By the time the project completed, the LPDDR roadmap had accelerated to the point of significantly overlapping WideIO with an incremental evolution of the standard, 2D-based solution: LPDDR3 & LPDDR4.

The PRO3D project proposed a holistic approach in which development activities ranged from programming to architecture exploration and fabrication technologies, and yielded the following outcomes: (1) Thermal Modelling & Simulation. (2) Programming, compilation, verification & deployment for 3D manycore architectures, including Statistical Model Checking (SMC) of System Models. (3) Exploiting 3D opportunities into multicore architectures. (4) System-level thermal-aware exploration & analysis of 3D designs. (5) Virtual Prototyping.

– Thermal Modelling & Simulation

3D technology allows a smaller footprint in each layer and shorter vertical wires, but thermal problems caused by higher power densities and greater thermal resistances impede the establishment of this technology. Thus, thermal modeling has become increasingly important in 3D chip design.

3D-ICE stands for 3D Interlayer Cooling Emulator. It is a Linux-based Thermal Emulator Library written in C, which can perform transient thermal analyses of vertically stacked 3D integrated circuits with inter-tier Microchannel Liquid Cooling. It is based on the conventional compact modeling of heat transfer by conduction in solids, and advances a novel compact modeling methodology, called Compact Transient Thermal Modeling (CTTM), for heat transfer by convection in microchannels.
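The idea of compact modeling of heat conduction can be illustrated with a lumped thermal RC network. The following is only a minimal sketch under stated assumptions: it models a hypothetical two-layer stack with invented resistance, capacitance and power values, integrates it with a simple forward-Euler scheme, and does not use the 3D-ICE library or reproduce CTTM (there is no microchannel convection here).

```python
# Hypothetical two-layer stack: layer 0 touches the heat sink, layer 1 sits on top.
# All parameter values are illustrative assumptions, not 3D-ICE data.
T_AMBIENT = 300.0  # K, ambient temperature
R_SINK = 1.0       # K/W, thermal resistance from layer 0 to ambient
R_INTER = 0.5      # K/W, thermal resistance between the two layers
C_LAYER = 0.1      # J/K, heat capacity of each layer


def step(temps, powers, dt):
    """Advance both layer temperatures by one forward-Euler step of dt seconds."""
    t0, t1 = temps
    p0, p1 = powers
    flow_to_sink = (t0 - T_AMBIENT) / R_SINK  # heat leaving layer 0 to ambient (W)
    flow_down = (t1 - t0) / R_INTER           # heat flowing from layer 1 to layer 0 (W)
    dt0 = (p0 + flow_down - flow_to_sink) / C_LAYER
    dt1 = (p1 - flow_down) / C_LAYER
    return (t0 + dt * dt0, t1 + dt * dt1)


def simulate(powers, duration, dt=1e-3):
    """Return the temperature trace of both layers, starting from ambient."""
    temps = (T_AMBIENT, T_AMBIENT)
    trace = [temps]
    for _ in range(int(round(duration / dt))):
        temps = step(temps, powers, dt)
        trace.append(temps)
    return trace


# Layer 0 dissipates 2 W and layer 1 dissipates 1 W for 5 seconds.
trace = simulate(powers=(2.0, 1.0), duration=5.0)
final_t0, final_t1 = trace[-1]
print(f"layer 0: {final_t0:.2f} K, layer 1: {final_t1:.2f} K")
```

In this toy model the layer farther from the heat sink settles at the higher temperature even though it dissipates less power, which is the kind of effect that makes thermal analysis important for stacked designs.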

This simulator, developed in PRO3D, is ideal for situations where a quick estimate of chip temperatures is required. 3D-ICE can be used to assist the early stages of architecture design, such as floorplanning of 3D multicore architectures, or to enhance liquid cooling efficiency through channel modulation. Simulations run with 3D-ICE also contributed to the development of new strategies for optimal placement of thermal sensors and to the validation of run-time thermal-aware management policies.

– Rigorous Design Flow & Statistical Model Checking

PRO3D developed a rigorous design flow based on the BIP and DOL component frameworks. The flow is (i) model-based, that is, both application software and mixed hardware/software system descriptions are modeled using a single semantic framework; (ii) component-based, that is, it provides primitives for building composite components as the composition of simpler components; (iii) tool-supported, that is, all steps in the design flow are realized automatically by tools; and (iv) it supports Statistical Model Checking (SMC) of system models.

The core idea of SMC is to conduct simulations of the system and then use statistical results in order to decide whether the system satisfies the property or not. For instance, SMC can be used to estimate the probability that a system satisfies a given property. In contrast with exhaustive approaches, a simulation-based solution does not guarantee a correct result. However, it is possible to bound the probability of making an error. In PRO3D, SMC has been successfully applied through the BIP framework for evaluation of performance properties of mixed hardware/software system models.
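The estimation idea can be sketched in a few lines. The example below is an illustration only, not the BIP tooling: the "system" is a hypothetical pipeline of 10 stages with random delays, the property is an invented deadline on total latency, and the number of runs is chosen with the standard Hoeffding bound so that the estimate is within epsilon of the true probability with confidence at least 1 - delta.

```python
import math
import random


def required_samples(epsilon, delta):
    """Runs needed so that P(|estimate - p| >= epsilon) <= delta (Hoeffding bound)."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))


def simulate_run(rng):
    """One run of a hypothetical system: total latency of 10 pipeline stages."""
    return sum(rng.uniform(0.5, 1.5) for _ in range(10))


def satisfies_property(latency, deadline=11.0):
    """Hypothetical performance property: the run finishes within the deadline."""
    return latency <= deadline


def estimate_probability(epsilon, delta, seed=0):
    """Estimate the probability that a run satisfies the property."""
    rng = random.Random(seed)
    n = required_samples(epsilon, delta)
    hits = sum(satisfies_property(simulate_run(rng)) for _ in range(n))
    return hits / n, n


estimate, runs = estimate_probability(epsilon=0.05, delta=0.01)
print(f"estimated probability {estimate:.3f} from {runs} runs")
```

Tightening epsilon or delta increases the number of runs, which is the trade-off between simulation cost and the bound on the probability of error.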

– Exploiting 3D Opportunities into Multicore Architectures

The architecture developed in the PRO3D project includes several computation tiles connected via a mesh NoC. The computation tiles are homogeneous and consist of a network interface for accessing the external NoC, a fast crossbar for quick access to local memory, a set of devices for efficient intra-tile synchronization and a variable number of Processing Elements (PEs). The memory hierarchy comprises several levels and memory types: L1 is the closest to the PE and can also be accessed by non-local processing elements with an increased access time. L2 is Tightly Coupled Data Memory (TCDM) directly connected via a logarithmic interconnect inside each cluster. L3 is a large memory that can be accessed by all the clusters of the fabric, implemented as a central 3D-stacked memory.
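To make the tile-and-mesh organisation concrete, here is a small structural sketch. The names `Tile`, `build_mesh` and `xy_route` are hypothetical, the mesh size and PE count are invented, and dimension-ordered (XY) routing is an assumption made for illustration; the text above does not state which routing scheme the PRO3D NoC uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tile:
    """A homogeneous computation tile at mesh position (x, y)."""
    x: int
    y: int
    num_pes: int  # number of Processing Elements in this tile


def build_mesh(cols, rows, pes_per_tile):
    """Create a cols x rows mesh of identical tiles, keyed by (x, y)."""
    return {(x, y): Tile(x, y, pes_per_tile)
            for x in range(cols) for y in range(rows)}


def xy_route(src, dst):
    """List the tiles visited from src to dst, moving along x first, then y.

    XY routing is an assumption of this sketch, not a documented PRO3D choice.
    """
    x, y = src
    path = [(x, y)]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


mesh = build_mesh(cols=4, rows=4, pes_per_tile=8)
route = xy_route((0, 0), (2, 1))
print(f"{len(mesh)} tiles, route {route}, {len(route) - 1} hops")
```

Under these assumptions the hop count between two tiles is their Manhattan distance, which gives a rough feel for why accesses to non-local memory cost more than local ones.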

– Performing Thermal-Aware System-Level Exploration & Analysis of 3D Designs

The move towards architectures with tens of cores complicates the process of designing applications. The source of the complexity is two-fold: (1) the specification of the application must expose concurrency while avoiding common errors such as race conditions; (2) a highly concurrent application must be mapped onto the architecture with performance and/or temperature considerations.

The DOL formalism was selected as the programming model for the project and has been improved along several axes: (1) Specification and verification, as we specify data flow applications as a set of processes communicating via first-in-first-out (FIFO) buffers; (2) Code generation and calibration of the application for design space exploration of mappings; (3) An analytic approach to compute the worst-case die temperature; (4) Another approach to directly calibrate the thermal performance of an application without the need for intermediate parameters. Finally, (5) real-time performance with thermal-aware run-time adaptation has been studied. The main options analyzed were throttling by shaping the input of tasks and DVFS control of a processor.
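The specification style in point (1), processes that communicate only through FIFO buffers, can be sketched as follows. This is not DOL syntax or its API: the process names, the three-stage pipeline and the buffer capacity are invented for illustration, and Python threads with bounded `queue.Queue` objects stand in for the processes and FIFOs.

```python
import queue
import threading

END = None  # sentinel token marking the end of the stream


def producer(out_fifo, count):
    """Emit the integers 0..count-1, then the end-of-stream token."""
    for i in range(count):
        out_fifo.put(i)  # blocks while the FIFO is full
    out_fifo.put(END)


def square(in_fifo, out_fifo):
    """Read tokens, square them, and forward the results."""
    while True:
        token = in_fifo.get()  # blocks while the FIFO is empty
        if token is END:
            out_fifo.put(END)
            return
        out_fifo.put(token * token)


def consumer(in_fifo, results):
    """Collect tokens until the end-of-stream token arrives."""
    while True:
        token = in_fifo.get()
        if token is END:
            return
        results.append(token)


def run_network(count, capacity=2):
    """Wire producer -> square -> consumer with bounded FIFOs and run them."""
    fifo_a = queue.Queue(maxsize=capacity)
    fifo_b = queue.Queue(maxsize=capacity)
    results = []
    processes = [
        threading.Thread(target=producer, args=(fifo_a, count)),
        threading.Thread(target=square, args=(fifo_a, fifo_b)),
        threading.Thread(target=consumer, args=(fifo_b, results)),
    ]
    for p in processes:
        p.start()
    for p in processes:
        p.join()
    return results


print(run_network(5))
```

Because each process touches only its own FIFOs and blocks on reads and writes, the collected output does not depend on how the threads are scheduled, which is the property that makes this style attractive for avoiding race conditions.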

– Virtual Prototyping

The lack of Electronic Design Automation (EDA) tools that can provide IC designers with efficient simulation of the functional and thermal behavior of ICs limits the development of thermal-aware design and run-time approaches for 3D ICs. In PRO3D, we have successfully developed a new virtual platform prototyping framework targeting the full-system simulation of massively parallel heterogeneous 3D systems-on-chip, composed of a general-purpose processor and a many-core hardware accelerator.