Publications

The results and findings of PRO3D are published in:

  • National and international journals;
  • International and local conferences;
  • Special issues of journals and chapters of edited books;
  • International and national workshops and conferences.

There are currently 117 publications.




Timing Analysis on a Processor with Temperature-Controlled Speed Scaling

Pratyush Kumar and Lothar Thiele.
In Proc. of the 18th IEEE Real-Time and Embedded Technology and Applications Symposium, RTAS 2012, Beijing, China, April 2012.
Several recent works consider the problem of temperature-constrained scheduling of jobs. In such attempts, speed of the processor and the execution of jobs is software controlled such that temperature and performance constraints are met. An alternative approach is to use measurements from temperature sensors to actuate the speed of the processor as a feedback control loop. Though such a solution explicitly and independently meets the thermal constraints, the analysis of the real-time properties of tasks served by such a processor is not straightforward. In this paper, we study this problem for a variable stream of jobs characterized by an input arrival rate. We show that an intuitive notion of monotonicity extends to such a processor. Using this property, we present an analytical technique to determine the worst-case delay suffered by jobs. The presented technique efficiently and tightly determines the delay as a function of the initial temperature. The simplicity of this analysis motivates further analysis and mainstream use of such systems.

Platform 2012 Many-Core Programmable Accelerator: Status and Perspectives

Eric Flamand and Diego Melpignano.
DATE 2012, Dresden, Germany, March 12-16, 2012.
Invited Speaker during the session “Many-Core Architectures and Compilers”. Organizer: Maurizio Palesi.

P2012: Building an ecosystem for a scalable, modular and high-efficiency embedded computing accelerator

Luca Benini, Eric Flamand, Didier Fuin, and Diego Melpignano.
In DATE 2012, Dresden. Pages 983-987.
P2012 is an area- and power-efficient many-core computing fabric based on multiple globally asynchronous, locally synchronous (GALS) clusters supporting aggressive fine-grained power, reliability and variability management. Clusters feature up to 16 processors and one control processor with independent instruction streams sharing a multi-banked L1 data memory, a multi-channel DMA engine, and specialized hardware for synchronization and scheduling. P2012 achieves extreme area and energy efficiency by supporting domain-specific acceleration at the processor and cluster level through the addition of dedicated HW IPs. P2012 can run standard OpenCL and OpenMP parallel codes as well as proprietary Native Programming Model (NPM) SW components that provide the highest level of control on application-to-resource mapping. In Q3 2011 the P2012 SW Development Kit (SDK) was made available to a community of R&D users; it includes full OpenCL and NPM development environments. The first P2012 SoC prototype in 28 nm CMOS will sample in Q4 2012, featuring four clusters and delivering 80 GOPS (with single precision floating point support) in 15.2 mm² with 2 W power consumption.

An energy efficient DRAM subsystem for 3D integrated SoCs

C. Weis, I. Loi, L. Benini, and N. Wehn.
In Design, Automation & Test in Europe Conference & Exhibition (DATE), 2012, pages 1138-1141, March 2012.
Energy efficiency is the key driver for the design optimization of System-on-Chips for mobile terminals (smartphones and tablets). 3D integration of heterogeneous dies based on TSV (through silicon via) technology enables stacking of multiple memory or logic layers and has the advantage of higher bandwidth at lower energy consumption for the memory interface. In this work we propose a highly energy efficient DRAM subsystem for next-generation 3D integrated SoCs, which will consist of a SDR/DDR 3D-DRAM controller and an attached 3D-DRAM cube with a fine-grained access and a very flexible (WIDE-IO) interface. We implemented a synthesizable model of the SDR/DDR 3D-DRAM channel controller and a functional model of the 3D-stacked DRAM which embeds an accurate power estimation engine. We investigated different DRAM families (WIDE IO DDR/SDR, LPDDR and LPDDR2) and densities that range from 256Mb to 4Gb per channel. The implementation results of the proposed 3D-DRAM subsystem show that energy optimized accesses to the 3D-DRAM enable an overall average of 37% power savings as compared to standard accesses. To the best of our knowledge this is the first design of a 3D-DRAM channel controller and 3D-DRAM model featuring co-optimization of memory and controller architecture.

Rigorous Component-Based System Design

Saddek Bensalem, Ananda Basu, Marius Bozga, Paraskevas Bourgos, and Joseph Sifakis.
In Francisco Duran, editor, 9th International Workshop on Rewriting Logic and its Applications WRLA 2012, Tallinn, Estonia. March 24-25, 2012. Pre-proceedings, pages 1-6. Institute of Cybernetics at Tallinn University of Technology, March 2012.
Rigorous system design requires the use of a single powerful component framework allowing the representation of the designed system at different levels of detail, from application software to its implementation. This is essential for ensuring the overall coherency and correctness. The paper introduces a rigorous design flow based on the BIP (Behavior, Interaction, Priority) component framework. This design flow relies on several tool-supported source-to-source transformations that progressively and correctly transform high-level application software towards efficient implementations for specific platforms.

Quantifying the Impact of Frequency Scaling on the Energy Efficiency of the Single-Chip Cloud Computer

A. Bartolini, M. Sadri, J. Furst, A.K. Coskun, and L. Benini.
Design, Automation & Test in Europe Conference & Exhibition (DATE), 2012, pages 181-186, March 2012.
Dynamic frequency and voltage scaling (DVFS) techniques have been widely used for meeting energy constraints. Single-chip many-core systems bring new challenges owing to the large number of operating points and the shift to message passing interface (MPI) from shared memory communication. DVFS, however, has been mostly studied on single-chip systems with one or few cores, without considering the impact of the communication among cores. This paper evaluates the impact of frequency scaling on the performance and power of many-core systems with MPI. We conduct experiments on the Single-Chip Cloud Computer (SCC), an experimental many-core processor developed by Intel. The paper first introduces the run-time monitoring infrastructure and the application suite we have designed for an in-depth evaluation of the SCC. We provide an extensive analysis quantifying the effects of frequency perturbations on performance and energy efficiency. Experimental results show that run-time communication patterns lead to significant differences in power/performance tradeoffs in many-core systems with MPI.

Thermal Balancing of Liquid-Cooled 3D-MPSoCs Using Channel Modulation

Sabry, Mohamed M. and Sridhar, Arvind and Atienza Alonso, David.
IEEE/ACM 2012 Design Automation and Test in Europe conference (DATE), Dresden, Germany, March 12-16, 2012.
While possessing the potential to replace conventional air-cooled heat sinks, inter-tier microchannel liquid cooling of 3D ICs also creates the problem of increased thermal gradients from the fluid inlet to outlet ports [1, 2]. These cooling-induced thermal gradients can be high enough to create undesirable stress in the ICs, undermining the structural reliability and lifetimes. In this paper, we present a novel design-time solution for the thermal gradient problem in liquid-cooled 3D Multi-Processor System-on-Chip (MPSoC) architectures. The proposed method is based on channel width modulation and provides the designers with an additional dimension in the design-space exploration. We formulate the channel width modulation as an optimal control design problem to minimize the temperature gradients in the 3D IC while meeting the design constraints. The proposed thermal balancing technique uses an analytical model for forced convective heat transfer in microchannels, and has been applied to a two tier 3D-MPSoC. The results show that the proposed approach can reduce thermal gradients by up to 31% when applied to realistic 3D-MPSoC architectures, while maintaining pressure drops in the microchannels well below their safe limits of operation.
http://infoscience.epfl.ch/record/173478/files/06.3_2_0505.pdf

An OpenMP Compiler for Efficient Use of Distributed Scratchpad Memory in MPSoCs

A. Marongiu and L. Benini.
Computers, IEEE Transactions on, 61(2):222-236, February 2012.
Most of today’s state-of-the-art processors for mobile and embedded systems feature on-chip scratchpad memories. To efficiently exploit the advantages of low-latency high-bandwidth memory modules in the hierarchy, there is the need for programming models and/or language features that expose such architectural details. On the other hand, effectively exploiting the limited on-chip memory space requires the programmer to devise an efficient partitioning and distributed placement of shared data at the application level. In this paper, we propose a programming framework that combines the ease of use of OpenMP with simple, yet powerful, language extensions to trigger array data partitioning. Our compiler exploits profiled information on array access count to automatically generate data allocation schemes optimized for locality of references.

PRO3D: Programming for Future 3D Manycore Architectures

Christian Fabre.
Invited talk at the 6th Interconnection Network Architecture: On-Chip, Multi-Chip (INA-OCMC) workshop, held in conjunction with the 7th HiPEAC Conference, January 25, 2012, Paris, France.

Neural Network-Based Thermal Simulation of Integrated Circuits on GPUs

Sridhar, Arvind and Vincenzi, Alessandro and Ruggiero, Martino and Atienza Alonso, David.
IEEE Transactions on Computer Aided Design of Integrated Circuits and Systems.
Institute of Electrical and Electronics Engineers.
With the rising challenges in heat removal in integrated circuits (ICs), the development of thermal-aware computing architectures and run-time management systems has become indispensable to the continuation of IC design scaling. These thermal-aware design technologies of the future strongly depend on the availability of efficient and accurate means for thermal modeling and analysis. These thermal models must have not only sufficient accuracy to capture the complex mechanisms that regulate thermal diffusion in ICs, but also a level of abstraction that allows for their fast execution for design space exploration. In this paper, we propose an innovative thermal modeling approach for full-chips that can handle the scalability problem of transient heat flow simulation in large 2D/3D multi-processor ICs. This is achieved by parallelizing the computation-intensive task of transient temperature tracking using neural networks and exploiting the computational power of massively parallel graphics processing units (GPUs). Our results show up to 35x run-time speed-up compared to state-of-the-art IC thermal simulation tools while keeping the error lower than 1°C. Speed-ups scale with the size of the 3D multi-processor ICs and our proposed method serves as a valuable design space exploration tool.
http://infoscience.epfl.ch/record/169825/files/TCAD2012-06106739.pdf
DOI: 10.1109/TCAD.2011.2174236