The results and findings of PRO3D are published in :

  • National and international journals;
  • International and local conferences;
  • Special issues of jrounals and chapters of edited books;
  • International and national workshops and conferences.

There are currently 117 publications.

GreenCool: An Energy-Ecient Liquid Cooling Design Technique for 3D MPSoCs Via Channel Width Modulation

Mohamed Mostafa Sabry Aly, Arvind Sridhar, Jie Meng, Ayse Kivilcim Coskun, and David Atienza Alonso.
IEEE Transactions on Computer- Aided Design of Integrated Circuits and Systems, 31(12):1-14, 2012.
Liquid cooling using interlayer microchannels has appeared as a viable and scalable packaging technology for 3D multiprocessor system-on-chips (MPSoCs). Microchannelbased liquid cooling, however, can substantially increase the onchip thermal gradients, which are undesirable for reliability, performance, and cooling eciency. In this work we present GreenCool, an optimal design methodology for liquid-cooled 3D MPSoCs. GreenCool simultaneously minimizes the cooling energy for a given system while maintaining thermal gradients and peak temperatures under safe limits. This is accomplished by tuning the heat transfer characteristics of the microchannels using channel width modulation. Channel width modulation is compatible with the current process technologies and incurs minimal additional fabrication costs. Through an extensive set of experiments, we show that channel width modulation is capable of complementing and enhancing the benefits of temperature-aware floor-planning. We also experiment with a 6-core 3D system with stacked DRAM, for which GreenCool improves energy eciency by up to 53% compared to cooling optimization without channel modulation.

Knowledge Based Transactional Behavior

Saddek Bensalem, Marius Bozga, Jean Quilbeuf, and Doron Peled.
In Eight Haifa Veri cation Conference, HVC2012, November 2012. Post proceedings to appear as LNCS volume
Component-based systems (including distributed programs and multiagent systems) involve a lot of coordination. This coordination is done in the background, and is transparent to the operation of the system. The reason for this overhead is the interplay between concurrency and nondeterministic choice: processes alternate between progressing independently and coordinating with other processes, where coordination can involve multiple choices of the participating components. This kind of interactions appeared as early as some of the main communication-based programming languages, where overhead effort often causes a restriction on the possible coordination. With the goal of enhancing the eefficiency of coordination for component-based systems, we propose here a method for coordination-based on the precalculation of the knowledge of processes and coordination agents. This knowledge can be used to lift part of the communication or synchronization that appears in the background of the execution to support the interaction. Our knowledge-based method is orthogonal to the actual algorithms or primitives that are used to guarantee the synchronization: it only removes messages conveying information that knowledge can infer.

Power Agnostic Technique for Efficient Temperature Estimation of Multicore Embedded System

Devendra Rai, Hoeseok Yang, Iuliana Bacivarov, and Lothar Thiele.
In CASES, pages 61-70, 2012.
Temperature plays an increasingly important role in the overall performance of a computing system and in its reliability. Increased availability of multi- and many-core systems provides an opportunity to manage the overall temperature pro le of the system by cleverly designing the application-to core mapping and the associated scheduling policies. There are clear penalties associated with an uncontrolled temperature pro le: a core reaching a critical temperature usually activates built in shut down or voltage and/or frequency scaling mechanisms to cool it down, thereby leading to unplanned performance loss of the system. Similarly, deep thermal cycles with high frequency lead to severe deterioration in the overall reliability of the system. Design space exploration tools are often used to optimize binding and scheduling choices based on a given set of constraints and objectives. These exploration tools rely on fast and accurate temperature estimation techniques. We argue that the currently available techniques are not an ideal t to design space exploration tools, and suggest a system level technique which is based on application fingerprinting. It does not need any information about the processor floorplan, the physical and thermal structure, or about power consumption. Instead, its temperature estimation is based on a set of application-specific calibration runs and associated temperature measurements using available built-in sensors. Using extensive experimental studies, we show that our technique can estimate temperature on all cores of a system to within 5 °C, and is three orders of magnitude faster than state of the art numerical simulators like Hotspot.

SIMinG-1k: A Thousand-Core Simulator Running on General-Purpose Graphical Processing Units

Shivani Raghav, Andrea Marongiu, Christian Pinto, Martino Ruggiero, David Atienza, and Luca Benini.
Concurrency and Computation: Practice and Experience, October 2012.
This paper introduces SIMinG-1k – a manycore simulator infrastructure. SIMinG-1k is a graphics processing unit accelerated, parallel simulator for design-space exploration of large-scale manycore systems. It features an optimal trade-off between modeling accuracy and simulation speed. Its main objectives are high performance, flexibility, and ability to simulate thousands of cores. SIMinG-1k can model different architectures (currently, we support ARM and Intel x86) using two-step approach where architecture specificc front end is decoupled from a fast and parallel manycore virtual machine running on graphical processing unit platform. We evaluate the simulator for target architecture with up to 4096 cores. Our results demonstrate very high scalability and almost linear speedup with simulation of increasing number of cores.

A Distributed Interleaving Scheme for Ecient Access to WideIO DRAM Memory

Ciprian Seiculescu, Luca Benini, and Giovanni De Micheli.
In Proceedings of the Eighth IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis, CODES+ISSS ’12, pages 103-112, New York, NY, USA, October 2012. ACM.
Achieving the main memory (DRAM) required bandwidth at acceptable power levels for current and future applications is a major challenge for System-on-Chip designers for mobile platforms. Three dimensional (3D) integration and 3D stacked DRAM memories promise to provide a significant boost in bandwidth at low power levels by exploiting multiple channels and wide data interfaces. In this paper, we address the problem of efficiently exploiting the multiple channels provided by standard (JEDEC’s WIDE-IO) 3D-stacked memories, to extract maximal effective bandwidth and minimize latency for main memory access. We propose a new distributed interleaved access method that leverages the on-chip interconnect to simplify the design and implementation of the DRAM controller, without impacting performance compared to traditional centralized implementations. We perform experiments on realistic workload for a mobile communication and multimedia platform and show that our proposed distributed interleaving memory access method improves the overall throughput while minimally impacting the performance of latency sensitive communication flows.

Optimized Distributed Implementation of Multiparty Interactions with Observation

Saddek Bensalem, Marius Bozga, Jean Quibeuf, and Joseph Sifakis.
In 2nd International Workshop on Programming based on Actors, Agents, and Decentralized Control Workshop held at ACM SPLASH 2012, Pre-Proceedings, pages 90-99, October 2012.
Using high level coordination primitives allows enhanced expressiveness of component-based frameworks to cope with the inherent complexity of present-day systems designs. Nonetheless, their
distributed implementation raises multiple issues, regarding both the correctness and the runtime performance of the nal implementation. We propose a novel approach for distributed implementation
of multiparty interactions subject to scheduling constraints expressed by priorities.We rely on new composition operators and semantics that combine multiparty interactions with observation. We show that this model provides a natural encoding for priorities and moreover, can be used as an intermediate step towards provably correct and optimized distributed implementations.

Statistical Model Checking QoS Properties of Systems with SBIP

Saddek Bensalem, Marius Bozga, Benoît Delahaye, Cyrille Jegourel, Axel Legay, and Ayoub Nouri.
In Tiziana Margaria and Bernhard Ste en, editors, Leveraging Applications of Formal Methods, Veri cation and Validation. Technologies for Mastering Change – 5th International Symposium, ISoLA 2012, Proceedings, Part I, volume 7609 of Lecture Notes in Computer Science, pages 327-341. Springer, 2012.
BIP is a component-based framework supporting rigorous design of embedded systems. This paper presents SBIP, an extension of BIP that relies on a new stochastic semantics that enables verification
of large-size systems by using Statistical Model Checking. The approach is illustrated on several industrial case studies.

A framework for automated distributed implementation of component-based models

Borzoo Bonakdarpour, Marius Bozga, Mohamad Jaber, Jean Quilbeuf, and Joseph Sifakis.
Distributed Computing, 25(5):383-409, October 2012.
Although distributed systems are widely used nowadays, their implementation and deployment are still time-consuming, error-prone, and hardly predictable tasks. In this paper, we propose a method
for producing automatically efficient and correct-by-construction distributed implementations from a model of the application software in Behavior, Interaction, Priority (BIP). BIP is a well-founded component-based framework encompassing high-level multi-party interactions for synchronizing components (e.g., rendezvous and broadcast) and dynamic priorities for scheduling between interactions. Our method transforms an arbitrary BIP model into a Send/Receive BIP model that is directly implementable on distributed execution platforms. The transformation consists in (1) breaking the atomicity of actions in components by replacing synchronous multiparty interactions with asynchronous Send/Receive interactions; (2) inserting distributed controllers that coordinate the execution of interactions according to a user-de ned partition of interactions, and (3) adding a distributed algorithm for handling conflicts between controllers. The obtained Send/Receive BIP model is proven observationally equivalent to its corresponding initial model. Hence, all functional properties of the initial BIP model are preserved by construction in the implementation. Moreover, the obtained Send/Receive BIP model can be used to automatically derive distributed executable code. The proposed method is fully implemented. Currently, it is possible to generate C++ implementations for (1) TCP sockets for conventional distributed communication, (2) MPI for multiprocessor platforms, and (3) POSIX threads for deployment on multi-core platforms. We present four case studies and report experimental results for diff erent design choices including partition of interactions and choice of algorithm for distributed conflict resolution.

Advanced coupled voltage-frequency control for power ecient DVFS management

Carolina Albea-Sanchez, Diego Puschini, Suzanne Lesecq, and Yeter Akgul.
In 38th Annual Conference of the IEEE Industrial Electronics Society, page 99, Montreal, Canada, October 2012.
Power management is a hot-topic in complex System-on-Chip (SoC) designs. In the context of advanced technologies, Dynamic Voltage-Frequency Scaling (DVFS) techniques are widely proposed to improve efficiency. Nowadays, these mechanisms are composed of independent actuators controlling the applied voltage and clock frequency. A pre-defined ned sequence has to be used to switch from one state to another in order to avoid timing faults but increasing the energy cost. The timing of the sequence depends on the dynamic response of actuators. In this work, an external controller is designed in order to couple both actuators to manage the voltage and frequency transient periods, increasing power efficiency. The proposed controller has been implemented to couple a Vdd-hopping mechanism with a Frequency-Lock Loop circuit.

Local (V,T) Estimation in Integrated Circuits Using a Set of Statistical Tests

S. Lesecq, L. Vincent, E. Beigne, and Ph Maurines.
Keynote at the IEEE International Conference on Advanced Technologies for Communications (ATC/REV), Hanoi, Vietnam, October 10-12, 2012.
Mobile platforms need ever-increasing computational performances under stringent energy consumption limitation mainly due to the battery lifespan. An optimal operating point is obtained thanks to a compromise between performance and power consumption. For distributed architectures (e.g. MultiProcessor System on Chip), the supply voltage and the operating frequency of each processing element are dynamically tuned to reach ecient performance/power consumption trade-off s. To increase the performance of each \actuator”, the physical state (e.g. the current supply voltage and temperature) of the integrated circuit must be monitored to locally adapt the control parameters. During this keynote, we will present a new estimation method based on statistical tests to estimate the supply voltage and the temperature of a local area in an integrated circuit. Standard ring oscillators buried in the chip provide the raw measurements that are fused to estimate the IC physical state.