Static Thermal Model Learning for High-Performance Multicore Servers

Beneventi, F. and Bartolini, A. and Benini, L.
Proceedings of the 20th International Conference on Computer Communications and Networks (ICCCN).
Aggressive thermal management is a critical feature for high-end computing platforms, as worst-case thermal budgeting is becoming unaffordable. Reactive thermal management, which sets temperature thresholds to trigger thermal capping actions, is too “near-sighted” and may lead to severe performance degradation and thermal overshoots. More aggressive proactive thermal management minimizes the performance penalty through smooth optimal control, but it requires precise knowledge of the system's thermal model. Unfortunately, in practice such models are not provided by equipment manufacturers, and they depend strongly on the deployment environment. Hence, we need procedures that derive thermal models automatically in the field. In this paper, we focus on static thermal model learning. We tackle the problem in a real-life context: we developed a complete infrastructure for thermal data collection and model building in the Linux environment, and we tested it on an Intel Nehalem-based server CPU. Model building is based on a least-squares procedure that extracts the model linking power dissipation to temperature in steady-state conditions. Our results show high accuracy and robustness even in the presence of a complex thermal environment and the limited-precision power and temperature measurements typical of today’s commercial servers.
DOI: 10.1109/ICCCN.2011.6006065
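The abstract states only that the static model is extracted by least squares from steady-state power and temperature samples. A minimal sketch of that idea follows. It assumes a linear model in which each core's steady-state temperature is a weighted sum of all cores' power plus a constant offset (absorbing ambient temperature). The model form, the data layout, and the names `fit_static_model`, `solve_linear`, and `predict` are illustrative assumptions, not the paper's exact formulation or code.

```python
from typing import List, Sequence, Tuple

# Hypothetical helper: solve a square system a x = b by Gaussian
# elimination with partial pivoting (inputs are not modified).
def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            # Happens when the samples do not vary each core's power enough.
            raise ValueError("singular system: inputs not sufficiently excited")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

# Assumed model: T_i = sum_j A[i][j] * P_j + b[i] for every core i.
# powers: N steady-state samples x n_cores (W); temps: N x n_cores (deg C).
def fit_static_model(powers: Sequence[Sequence[float]],
                     temps: Sequence[Sequence[float]]
                     ) -> Tuple[List[List[float]], List[float]]:
    n_in = len(powers[0])
    k = n_in + 1
    # Design-matrix rows [P_1, ..., P_n, 1]; the trailing 1 fits the offset b_i.
    X = [list(p) + [1.0] for p in powers]
    # Normal equations: (X^T X) theta_i = X^T y_i, one right-hand side per core.
    xtx = [[sum(row[r] * row[c] for row in X) for c in range(k)] for r in range(k)]
    A: List[List[float]] = []
    b: List[float] = []
    for i in range(len(temps[0])):
        xty = [sum(X[s][r] * temps[s][i] for s in range(len(X))) for r in range(k)]
        theta = solve_linear(xtx, xty)
        A.append(theta[:n_in])
        b.append(theta[n_in])
    return A, b

# Predict steady-state temperatures for a given per-core power vector.
def predict(A: List[List[float]], b: List[float], p: Sequence[float]) -> List[float]:
    return [sum(a_ij * p_j for a_ij, p_j in zip(row, p)) + b_i
            for row, b_i in zip(A, b)]
```

Solving the normal equations directly keeps the sketch dependency-free, but it can be ill-conditioned when the power samples are nearly collinear; an SVD-based solver such as `numpy.linalg.lstsq` would be more robust in practice. With noisy, limited-precision sensor readings, using many more samples than unknowns lets the least-squares fit average out the quantization error.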