The HPC market is developing quickly, pushed by new application domains (such as computationally intensive data analytics), new applications of massively parallel computation, and the increased ability of new customers to enter the market. In this scenario, it is necessary to develop middleware that keeps the system reliable despite the increasing number of resources and the decreasing mean time between failures.
To manage this complexity, the EU Horizon 2020 project RECIPE will provide:
- A hierarchical runtime resource management (RTRM) infrastructure able to optimise energy efficiency and to minimise the occurrence of thermal hotspots. This infrastructure will also enforce the time constraints imposed by the application, ensuring reliability for both time-critical and throughput-oriented computations;
- A predictive reliability methodology to support QoS in the face of both transient and long-term hardware failures;
- A set of integration layers to allow the resource manager to interact with both the application and the underlying deeply heterogeneous architecture.
The project pursues the following quantitative targets:
- To increase the energy efficiency of HPC systems by 25%, with a 15% improvement in mean time to failure (MTTF);
- To improve the energy-delay product by up to 25%;
- To reduce the occurrence of faulty executions by 20%, with recovery times compatible with real-time performance and full exploitation of available resources under non-saturated conditions.
The RECIPE project will assess its results against real-world use cases, addressing key application domains:
- Geophysical exploration: thanks to the efficient implementation of the RTRM, the resulting Full Waveform Inversion tool reduces the uncertainty of current seismic exploration surveys;
- Environmental monitoring and meteorology: the developed RTRM will improve the ability to monitor the status of water basins and the behaviour of power plants that exploit renewable energy sources (RES), such as wind turbines;
- Bio-medical machine learning and big data analytics: the developed software infrastructure will enable the deployment of epileptic seizure detection algorithms on a prototype platform able to manage a large-scale population while meeting the real-time requirements of the application.
To enact this ambitious research and innovation programme, the RECIPE project relies on a consortium of leading academic partners, supercomputing centres, and use-case providers. The academic partners are POLIMI, the largest technical university in Italy, providing expertise on resource management and programming models as well as scientific coordination; EPFL, the leading provider of thermal models for HPC; UPV, one of the key innovators in optimized interconnection networks; and CeRICT, providing expertise on accelerators. The two supercomputing centres are BSC, one of the leading HPC providers in Europe, whose MareNostrum was ranked 13th in the Top 500 in June 2017, and PSNC, another Top 500 HPC centre, located in Poland. Finally, CHUV, a research hospital in Switzerland, and IBTS, an SME active in product design and development, provide effective exploitation avenues through industry-based use cases.