Platform 2012, a many-core computing accelerator for embedded SoCs: performance evaluation of visual analytics applications

Diego Melpignano, Luca Benini, Eric Flamand, Bruno Jego, Thierry Lepley, Germain Haugou, Fabien Clermidy, and Denis Dutoit.
In DAC 2012, pages 1137-1142.
P2012 is an area- and power-efficient many-core computing accelerator based on multiple globally asynchronous, locally synchronous processor clusters. Each cluster features up to 16 processors with independent instruction streams sharing a multi-banked one-cycle access L1 data memory, a multi-channel DMA engine and specialized hardware for synchronization and aggressive power management. P2012 is 3D stacking ready and can be customized to achieve extreme area and energy efficiency by adding domain-speci c HW IPs to the cluster. The rst P2012 SoC prototype in 28 nm CMOS will sample in Q3, featuring four 16-processor clusters, a 1MB L2 memory and delivering 80GOPS (with 32 bit single precision floating point support) in 18 mm2 with 2 W power consumption (worst-case). P2012 can run standard OpenCL™ and proprietary Native Programming Model SW components to achieve the highest level of control on application-to-resource mapping. A dedicated version of the OpenCV vision library is provided in the P2012 SW Development Kit to enable visual analytics acceleration. This paper will discuss preliminary performance measurements of common feature extraction and tracking algorithms, parallelized on P2012, versus sequential execution on ARM CPUs.