Publications

The results and findings of PRO3D are published in:

  • National and international journals;
  • International and local conferences;
  • Special issues of journals and chapters of edited books;
  • International and national workshops and conferences.

There are currently 117 publications.




PRO3D: Programming for Future 3D Manycore Architectures

Christian Fabre, Iuliana Bacivarov, Lothar Thiele, Hoeseok Yang, Pratyush Kumar, Devesh Chokshi, Ahmed Jerraya, Julien Mottin, Jean-Pierre Krimm, Ananda Basu, Saddek Bensalem, Marius Bozga, Paraskevas Bourgos, Martino Ruggiero, Luca Benini, Andrea Marongiu, Eric Flamand and Diego Melpignano.
Proceedings of the 6th Workshop on Interconnection Network Architecture: On-Chip, Multi-Chip (INA-OCMC 2012).
PRO3D tackles two 3D technologies, through-silicon vias (TSV) and liquid cooling, and their consequences for stacked architectures and the software stack. It mainly explores 3D memory hierarchies and the thermal impact of software on the 3D stack. The PRO3D software development flow is based on a rigorous assembly of software components and monitors the thermal integrity of the 3D stack. PRO3D experiments mainly target P2012, an industrial embedded manycore platform developed by STMicroelectronics.
DOI: 10.1145/2107763.2107776

From BIP System Model to Platform P2012: A Code Generation Flow

Paraskevas Bourgos, Ananda Basu, Marius Bozga and Saddek Bensalem.
1st Platform P2012 Developer Conference, Grenoble, France.
We present an infrastructure for generating code for the P2012 platform from BIP system models. These models describe the behavior of mixed hardware/software systems. They can be simulated and formally verified using the BIP toolset. System models are obtained in a compositional and incremental manner, by source-to-source transformations, from descriptions of the application software, the hardware architecture, and the mapping. These descriptions are given using the DOL framework. The generated code targets the NPL runtime implemented for the P2012 platform, available within the 2011.1 SDK. This runtime provides an API for thread management, memory allocation, communication and synchronization.

Rigorous Component-Based System Design using the BIP Framework

Saddek Bensalem, Ananda Basu, Marius Bozga, Paraskevas Bourgos and Joseph Sifakis.
Invited presentation at the 5th Annual Layered Assurance Workshop, Orlando, Florida.

OpenMP-based HW Acceleration for On-Chip Multi-Core Shared-Memory Clusters

Andrea Marongiu, Paolo Burgio and Luca Benini.
Presented at the 1st Platform 2012 Developer Conference, Grenoble, France.
Key to designing accelerator-based MPSoCs in a cost-effective manner is the availability of methodologies to quickly define and instantiate accelerators within a suitable architectural template, from both a hardware and a software perspective. By clearly defining such templates, and providing streamlined communication and synchronization mechanisms between processors and accelerators, programming models can be enriched with abstract constructs to allow designers to focus on accelerator specification at a high level. We present a vertically integrated HW/SW architecture, with a programming model and runtime support to design tightly coupled clusters including one or more dedicated accelerators named HW Processing Units (HWPU). The proposed approach includes an extended OpenMP programming API and compiler that allows the designer to mix code parallelization and acceleration mechanisms while hiding implementation details. Specifically, we extend OpenMP with a key custom directive to outline code regions which are to be hardware-accelerated, rather than executed in software.

Instruction Cache Architectures for Tightly-Coupled Shared-Memory Clusters

Daniele Bortolotti, Francesco Paterna, Christian Pinto, Andrea Marongiu, Martino Ruggiero and Luca Benini.
Presented at the 1st Platform 2012 Developer Conference, Grenoble, France.
To keep pace with Moore’s law, several Chip-Multiprocessor (CMP) platforms are embracing the many-core paradigm, where a large number of simple cores are integrated onto the same die. Current examples of many-cores include GP-GPUs such as NVIDIA Fermi, the HyperCore Architecture Line (HAL) processors from Plurality, or STMicroelectronics’ Platform 2012. All of the cited architectures share a few common traits: their fundamental computing tile is a tightly coupled multicore cluster with a shared multibanked L1 memory for fast data access and a fairly large number of simple cores. Key to providing instruction-fetch bandwidth for a cluster is an effective instruction cache architecture design. The main contribution of this work is the analysis and comparison of the two main architectures for instruction caching targeting tightly coupled CMP clusters: (1) private instruction caches per core and (2) a shared instruction cache per cluster. We developed a cycle-accurate model of a P2012-like cluster with the two cache organizations, and with several configurable architectural parameters for exploration.

Fast and Lightweight Support for Nested OpenMP Parallelism on a Multi-Cluster Platform

Andrea Marongiu, Paolo Burgio and Luca Benini.
Presented at the 1st Platform 2012 Developer Conference, Grenoble, France.
OpenMP is a well-known programming model for shared memory parallelism. It consists of a set of compiler directives, library routines and environment variables that provide a simple means to specify parallel execution within a sequential code. OpenMP was originally designed for Symmetric Multi-Processors (SMP), but recently many implementations for embedded MPSoCs have been proposed. In this paper we describe an implementation of the OpenMP runtime library for a multi-cluster MPSoC, modeled after the P2012 architectural template.

Compiling Applications for P2012 with the BIP Tool Chain

Ananda Basu, Marius Bozga, Saddek Bensalem, Jean-Pierre Krimm, Julien Mottin, Christian Fabre and François Pacull.
1st Platform 2012 Developer Conference, STMicroelectronics & CEA LETI, Minatec, Grenoble, France.
This presentation describes the results of applying the rigorous system design flow based on the BIP framework to two embedded target applications. The first application is used within the SMECY project and is an image processing application for pattern and form recognition in the aerospace and defence industries. The second application is also an image processing application, dedicated to producing actual images from raw camera sensor output, and is used within PRO3D. We detail step by step the engineering activities by which an embedded data-flow application written in plain C can be parallelized, then ported to and compiled by the BIP tool chain, and finally run on P2012 manycore simulators. For each application, we first provide a description in DOL. The corresponding system model in BIP is generated from this description along with the description of the architecture and a mapping. Finally, we generate the deployable code from the system model, which is executed on the P2012 simulators.

Energy-Efficient Multi-Objective Thermal Control for Liquid-Cooled 3D Stacked Architectures

Mohamed Mostafa Sabri Aly, Ayse Kivilcim Coskun, David Atienza Alonso, Tajana Simunic Rosing and Thomas Brunschwiler.
Institute of Electrical and Electronics Engineers.
3D stacked systems reduce communication delay in multiprocessor system-on-chips (MPSoCs) and enable heterogeneous integration of cores, memories, sensors, and RF devices. However, vertical integration of layers exacerbates temperature-induced problems such as reliability degradation. Liquid cooling is a highly efficient solution to overcome the accelerated thermal problems in 3D architectures; however, it brings new challenges in modeling and run-time management for such 3D MPSoCs with multi-tier liquid cooling. This paper proposes a novel design-time/run-time thermal management strategy. The design-time phase involves a rigorous thermal impact analysis of various thermal control variables. We then utilize this analysis to design a run-time fuzzy controller for improving energy efficiency in 3D MPSoCs through liquid cooling management and dynamic voltage and frequency scaling (DVFS). The fuzzy controller adjusts the liquid flow rate dynamically to match the cooling demand of the chip, preventing over-cooling and maintaining a stable thermal profile. The DVFS decisions increase chip-level energy savings and help balance the temperature across the system. Our controller is used in conjunction with temperature-aware load balancing and dynamic power management strategies. Experimental results on 2- and 4-tier 3D MPSoCs show that our strategy prevents the system from exceeding the given threshold temperature. At the same time, we reduce cooling energy by up to 63% and system-level energy by up to 21% in comparison to a static flow rate set to handle worst-case temperatures.
http://infoscience.epfl.ch/record/167825/files/TCAD2011-Sabry_et_al_1.pdf
DOI: 10.1109/TCAD.2011.2164540

Verification of the P2012 Manycore Architecture: Underlying Issues

Richard Hersemeule, STMicroelectronics.
Presentation at Software Technologies Concertation on Formal Methods for Components and Objects, FMCO 2011, Torino, Italy.

Component Assemblies in the context of Manycore

Ananda Basu, Saddek Bensalem, Marius Bozga, Paraskevas Bourgos and Joseph Sifakis.
Presentation at Software Technologies Concertation on Formal Methods for Components and Objects, FMCO 2011, Torino, Italy.