SPH-EXA2: Smoothed Particle Hydrodynamics at Exascale
Research Project | 4 Project Members
The goal of the SPH-EXA2 project is to scale the Smoothed Particle Hydrodynamics (SPH) method implemented in SPH-EXA1 to enable Tier-0 and Exascale simulations. To reach this goal we define four concrete and interrelated objectives: physics, performance, correctness, and portability & reproducibility.

We aim to couple relevant physics modules with our SPH framework, making it possible to address both long-standing and cutting-edge problems through beyond-state-of-the-art simulations at extreme scales in the fields of Cosmology and Astrophysics. Such simulations include the formation, growth, and mergers of supermassive black holes in the early universe, which would greatly impact the scientific community (the 2020 Nobel Prize in Physics, for instance, was awarded for pioneering research on supermassive black holes). Moreover, the ability to simulate planet formation with high-resolution models will play an important role in consolidating Switzerland's position as a leader in experimental physics and observational astronomy. Additional targets relate to explosive scenarios such as core-collapse and Type Ia supernovae, in which Switzerland has also maintained a long record of international renown. These simulations would become possible with a Tier-0-ready SPH code and would have a large impact on projects such as the current NCCR PlanetS funded by the SNF. The long-term and ambitious vision of the SPH-EXA consortium is to study fluid and solid mechanics in a wide range of research fields that are currently infeasible with existing models, codes, and architectures.

To this end, in SPH-EXA2 we build on SPH-EXA1 to develop a scalable bare-bones SPH simulation framework, referred to as SPH-EXA. In Switzerland, within the framework of the PASC SPH-EXA project (2017-2021), we developed the SPH-EXA mini-app as a scalable SPH code that employs state-of-the-art parallel programming models and software engineering techniques to exploit current HPC architectures, including accelerators. The current SPH-EXA mini-app performs pure hydrodynamical simulations with up to 1 trillion SPH particles using only CPUs on 4,096 nodes of Piz Daint at CSCS. With the relatively limited memory per GPU, the mini-app can still scale up to 250 billion SPH particles.

In terms of performance, the use of accelerators is necessary to meet the above SPH-EXA2 goal and objectives. Offloading computationally intensive steps, such as self-gravity evaluation and ancillary physics, to hardware accelerators will enable SPH-EXA to simulate increasingly complex cosmological & astrophysical scenarios. We envision that various types of hardware accelerators will be deployed on the supercomputers we will use in this project, such as NVIDIA GPUs (in Piz Daint) and AMD GPUs (in LUMI). Portability across GPUs will be ensured by using OpenACC and OpenMP target offloading, which are supported across GPU vendors.

Scheduling & load balancing and fault tolerance are major challenges on the way to Exascale. We will address these challenges in SPH-EXA2 by employing locality-aware data decomposition, dynamic & adaptive scheduling and load balancing, and advanced fault tolerance techniques. Specifically, we will schedule & load-balance the computational work across heterogeneous CPUs, various NUMA domains (e.g., multiple sockets or memory controllers, multi-channel DRAM, and NV-RAM), and between CPUs and GPUs.
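At the core of the hydrodynamical simulations described above, SPH approximates fluid quantities by kernel-weighted sums over neighboring particles. The following is a minimal sketch of the standard SPH density summation with a cubic-spline kernel; it is illustrative only and not taken from the SPH-EXA code base, and all names are hypothetical.

```cpp
// Minimal sketch of the SPH density estimate (not SPH-EXA's actual code):
//   rho_i = sum_j m_j * W(|r_i - r_j|, h)
// using the standard 3D cubic-spline kernel with support radius 2h.
#include <cmath>
#include <vector>

double cubicSplineW(double r, double h) {
    const double PI = 3.14159265358979323846;
    double q = r / h;
    double sigma = 1.0 / (PI * h * h * h); // 3D normalization constant
    if (q < 1.0) return sigma * (1.0 - 1.5 * q * q + 0.75 * q * q * q);
    if (q < 2.0) { double t = 2.0 - q; return sigma * 0.25 * t * t * t; }
    return 0.0;
}

// Density of particle i, assuming `neighbors` holds the indices of all
// particles within 2h of i (including i itself for the self-contribution).
double density(std::size_t i, const std::vector<std::size_t>& neighbors,
               const std::vector<double>& x, const std::vector<double>& y,
               const std::vector<double>& z, const std::vector<double>& m,
               double h) {
    double rho = 0.0;
    for (std::size_t j : neighbors) {
        double dx = x[i] - x[j], dy = y[i] - y[j], dz = z[i] - z[j];
        rho += m[j] * cubicSplineW(std::sqrt(dx*dx + dy*dy + dz*dz), h);
    }
    return rho;
}
```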
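As a minimal sketch of the directive-based GPU offloading mentioned above, the loop below uses OpenMP target offloading to run a simple per-particle update on an accelerator. The function and array names are hypothetical, and the kernel is deliberately trivial; it only illustrates the portability mechanism.

```cpp
// Sketch of OpenMP target offloading for a per-particle update.
// The same directives compile for NVIDIA and AMD GPUs with a
// vendor- or LLVM-based OpenMP offload toolchain.
#include <cstddef>

void advancePositions(double* x, const double* vx, std::size_t n, double dt) {
    // Map the particle arrays to the device, run the loop there,
    // and copy the updated positions back to the host.
    #pragma omp target teams distribute parallel for \
        map(tofrom: x[0:n]) map(to: vx[0:n])
    for (std::size_t i = 0; i < n; ++i) {
        x[i] += vx[i] * dt; // simple drift step
    }
}
```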
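One common locality-aware decomposition technique in large-scale particle codes, and the assumption behind the sketch below, is to sort particles along a space-filling curve (here a Morton/Z-order curve) and assign contiguous key ranges to MPI ranks, so that particles close in space tend to reside on the same node. This illustrates the general idea rather than SPH-EXA's actual decomposition code.

```cpp
// Sketch of locality-aware decomposition via a 63-bit Morton key:
// interleave the bits of quantized x/y/z coordinates, sort particles
// by key, then split the sorted order evenly across ranks.
#include <cstdint>

// Spread the lower 21 bits of v so they occupy every third bit.
std::uint64_t spreadBits(std::uint64_t v) {
    v &= 0x1FFFFF; // keep 21 bits -> 63-bit key for 3 dimensions
    v = (v | (v << 32)) & 0x001F00000000FFFFULL;
    v = (v | (v << 16)) & 0x001F0000FF0000FFULL;
    v = (v | (v << 8))  & 0x100F00F00F00F00FULL;
    v = (v | (v << 4))  & 0x10C30C30C30C30C3ULL;
    v = (v | (v << 2))  & 0x1249249249249249ULL;
    return v;
}

// Morton key from coordinates normalized to [0, 1).
std::uint64_t mortonKey(double x, double y, double z) {
    auto q = [](double c) {
        return static_cast<std::uint64_t>(c * 2097152.0); // 2^21 buckets
    };
    return (spreadBits(q(x)) << 2) | (spreadBits(q(y)) << 1) | spreadBits(q(z));
}
```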
To achieve correctness, we will examine and verify the effectiveness of the fault-tolerance support in the MPI 4.0 standard and beyond, in addition to selective particle replication (SPR) and optimal checkpointing (to NV-RAM or SSD). To ensure performance portability & reproducibility, we will benchmark SPH-EXA1's performance on a wide variety of platforms and build off-the-shelf SPH-EXA containers that can be deployed with no additional setup required. This will also enlarge the SPH-EXA user base.

The primary advantage of the SPH-EXA2 project is its scientific interdisciplinarity: the project involves computer scientists, computer engineers, astrophysicists, and cosmologists. This is complemented by a holistic co-design that involves applications (cosmology, astrophysics, CFD), algorithms (SPH, domain decomposition, load balancing, scheduling, fault tolerance, etc.), and architectures (CPUs, GPUs, etc.), as opposed to the traditional binary software-hardware co-design.
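As one hedged illustration of the checkpointing building block in the correctness objective above, the sketch below writes each rank's particle data to a shared file with collective MPI-IO. The function name and file layout are illustrative assumptions, not SPH-EXA's checkpoint format.

```cpp
// Sketch of a collective checkpoint of one particle array via MPI-IO.
// Each rank writes its contiguous slice at an offset derived from an
// exclusive prefix sum of per-rank particle counts.
#include <mpi.h>
#include <vector>

void writeCheckpoint(const std::vector<double>& x, MPI_Comm comm,
                     const char* path) {
    int rank;
    MPI_Comm_rank(comm, &rank);

    long long n = static_cast<long long>(x.size());
    long long offset = 0;
    MPI_Exscan(&n, &offset, 1, MPI_LONG_LONG, MPI_SUM, comm);
    if (rank == 0) offset = 0; // MPI_Exscan leaves rank 0's result undefined

    MPI_File fh;
    MPI_File_open(comm, path, MPI_MODE_CREATE | MPI_MODE_WRONLY,
                  MPI_INFO_NULL, &fh);
    MPI_File_write_at_all(fh, offset * sizeof(double), x.data(),
                          static_cast<int>(n), MPI_DOUBLE,
                          MPI_STATUS_IGNORE);
    MPI_File_close(&fh);
}
```

In a real setting the same pattern would be repeated for each particle field, with the target path pointing at fast local storage such as NV-RAM or SSD, as discussed above.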