Faculty of Science
UNIverse - Public Research Portal

High Performance Computing

Projects & Collaborations

Swiss Participation in the Square Kilometre Array Observatory

Research Project  | 3 Project Members

The Square Kilometre Array Observatory (SKAO) is a next-generation radio astronomy facility, involving partners around the globe, that will lead to groundbreaking new insights in astrophysics and cosmology. Established on March 12, 2019, the SKAO is the second intergovernmental organisation in the world dedicated to astronomy. It will be operated over three sites: the Global Headquarters in the UK, the mid-frequency array in South Africa (SKA-Mid), and the low-frequency array in Australia (SKA-Low). The two telescopes under construction, SKA-Mid and SKA-Low, will combine the signals received from thousands of small antennae spread over a distance of several thousand kilometres to simulate a single giant radio telescope capable of extremely high sensitivity and angular resolution, using a technique called aperture synthesis. Some of the sub-arrays of the SKA will also have a very large field of view (FOV), making it possible to survey very large areas of the sky at once. Switzerland has become the eighth country to join the group of nations collaborating to build the Square Kilometre Array Observatory (SKAO) in Australia and South Africa. Swiss involvement is organized through a strong consortium of research institutions called SKACH, which includes Fachhochschule Nordwestschweiz (FHNW), Universität Zürich (UZH), Eidgenössische Technische Hochschule Zürich (ETHZ), École Polytechnique Fédérale de Lausanne (EPFL), Zürcher Hochschule für Angewandte Wissenschaften (ZHAW), Universität Basel (UniBas), Université de Genève (UniGE), Haute École spécialisée de Suisse Occidentale (HES-SO), and Centro Svizzero di Calcolo Scientifico (CSCS). The SKA telescopes will look at the history of the Universe as far back as the Cosmic Dawn, when the very first stars and galaxies formed.
These key facilities will help Swiss scientists answer burning questions across several key topics in astrophysics, including dark energy, cosmic reionization, dark matter, galaxy evolution, cosmic magnetic fields, tests of gravity, solar physics, and others. During its operation, the SKAO will collect unprecedented amounts of data, requiring the world's fastest supercomputers to process this data in near real time. Swiss data scientists are working on complex Big Data algorithms enhanced by High-Performance Computing and machine learning techniques to handle these large data streams. As part of SKACH, the aim of our group is to extend the SPH-EXA simulation framework to include proper cosmological physics and to reach trillion-particle simulations on hybrid Tier-0 computing architectures. To this end we aim at coupling relevant physics modules with our SPH framework, making it possible to address both long-standing and cutting-edge problems in cosmology and astrophysics via beyond-state-of-the-art simulations at extreme scales. Such simulations include the formation, growth, and mergers of supermassive black holes in the early universe, which would greatly impact the scientific community (for instance, the 2020 Nobel Prize in Physics was awarded for pioneering research on supermassive black holes). Moreover, the ability to simulate planet formation with high-resolution models will play an important role in consolidating Switzerland's position as a leader in experimental physics and observational astronomy. Additional targets will be related to explosive scenarios such as core-collapse and Type Ia supernovae, a field in which Switzerland has also maintained a long record of international renown. These simulations would be possible with a Tier-0-ready SPH code and would have a large impact on projects such as the current NCCR PlanetS funded by the SNF.

SPH-EXA2: Smoothed Particle Hydrodynamics at Exascale

Research Project  | 4 Project Members

The goal of the SPH-EXA2 project is to scale the Smoothed Particle Hydrodynamics (SPH) method implemented in SPH-EXA1 to enable Tier-0 and Exascale simulations. To reach this goal we define four concrete and interrelated objectives: physics, performance, correctness, and portability & reproducibility. We aim at coupling relevant physics modules with our SPH framework, making it possible to address both long-standing and cutting-edge problems in cosmology and astrophysics via beyond-state-of-the-art simulations at extreme scales. Such simulations include the formation, growth, and mergers of supermassive black holes in the early universe, which would greatly impact the scientific community (for instance, the 2020 Nobel Prize in Physics was awarded for pioneering research on supermassive black holes). Moreover, the ability to simulate planet formation with high-resolution models will play an important role in consolidating Switzerland's position as a leader in experimental physics and observational astronomy. Additional targets will be related to explosive scenarios such as core-collapse and Type Ia supernovae, a field in which Switzerland has also maintained a long record of international renown. These simulations would be possible with a Tier-0-ready SPH code and would have a large impact on projects such as the current NCCR PlanetS funded by the SNF. The long-term and ambitious vision of the SPH-EXA consortium is to study fluid and solid mechanics in a wide range of research fields in which such studies are currently unfeasible with existing models, codes, and architectures. To this end, in SPH-EXA2 we build on SPH-EXA1 and develop a scalable bare-bones SPH simulation framework, which we refer to as SPH-EXA.
In Switzerland, within the framework of the PASC SPH-EXA (2017-2021) project, we developed the SPH-EXA mini-app as a scalable SPH code that employs state-of-the-art parallel programming models and software engineering techniques to exploit current HPC architectures, including accelerators. The current SPH-EXA mini-app performs pure hydrodynamical simulations with up to 1 trillion SPH particles using only CPUs on 4,096 nodes on Piz Daint at CSCS. Even with the relatively limited memory per GPU, the mini-app can still scale up to 250 billion SPH particles. In terms of performance, the use of accelerators is necessary to meet the above SPH-EXA2 goal and objectives. Offloading computationally intensive steps, such as self-gravity evaluation and ancillary physics, to hardware accelerators will enable SPH-EXA to simulate increasingly complex cosmological and astrophysical scenarios. We envision that various types of hardware accelerators will be deployed on the supercomputers that we will use in this project, such as NVIDIA GPUs (in Piz Daint) or AMD GPUs (in LUMI). Portability across GPUs will be ensured by using OpenACC and OpenMP target offloading, which is supported by different GPU vendors. Scheduling & load balancing and fault tolerance are major challenges on the way to Exascale. We will address these challenges in SPH-EXA2 by employing locality-aware data decomposition, dynamic & adaptive scheduling and load balancing, and advanced fault tolerance techniques. Specifically, we will schedule and balance the computational load across heterogeneous CPUs, across various NUMA domains (e.g., multiple sockets or memory controllers, multi-channel DRAM, and NV-RAM), and between CPUs and GPUs. To achieve correctness, we will examine and verify the effectiveness of the fault tolerance support in the MPI 4.0 standard and beyond, in addition to selective particle replication (SPR) and optimal checkpointing (to NV-RAM or SSD).
To ensure performance portability & reproducibility, we will benchmark SPH-EXA1's performance on a wide variety of platforms and build off-the-shelf SPH-EXA containers that can be deployed with no additional setup. This will also enlarge the SPH-EXA user base. The primary advantage of the SPH-EXA2 project is its scientific interdisciplinarity. The project involves computer scientists, computer engineers, astrophysicists, and cosmologists. This is complemented by a holistic co-design that involves applications (cosmology, astrophysics, CFD), algorithms (SPH, domain decomposition, load balancing, scheduling, fault tolerance, etc.), and architectures (CPUs, GPUs, etc.), as opposed to the traditional binary software-hardware co-design.
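
The benefit of dynamic scheduling across heterogeneous processing units, as described above, can be illustrated with a small simulation. The following is only a sketch and not the SPH-EXA implementation: the function names, the task costs, and the worker speeds (two CPU-like workers and one hypothetical accelerator-like worker that is four times faster) are assumptions chosen for illustration. It compares a static equal split with self-scheduling, where whichever worker becomes idle first takes the next chunk of work.

```python
import heapq


def static_schedule(costs, speeds):
    """Give each worker one equal-sized contiguous block; return per-worker finish times."""
    n, p = len(costs), len(speeds)
    block = (n + p - 1) // p  # ceiling division
    return [sum(costs[w * block:(w + 1) * block]) / speeds[w] for w in range(p)]


def dynamic_schedule(costs, speeds, chunk=1):
    """Self-scheduling: the worker that becomes idle first takes the next chunk."""
    finish = [0.0] * len(speeds)
    idle_at = [(0.0, w) for w in range(len(speeds))]  # (time worker is free, worker id)
    heapq.heapify(idle_at)
    for start in range(0, len(costs), chunk):
        t, w = heapq.heappop(idle_at)
        t += sum(costs[start:start + chunk]) / speeds[w]
        finish[w] = t
        heapq.heappush(idle_at, (t, w))
    return finish


# Hypothetical workload: 120 equal tasks, three workers of unequal speed.
costs = [1.0] * 120
speeds = [1.0, 1.0, 4.0]

static_makespan = max(static_schedule(costs, speeds))
dynamic_makespan = max(dynamic_schedule(costs, speeds))
print(f"static: {static_makespan:.2f}  dynamic: {dynamic_makespan:.2f}")
```

With a static split, the slow workers determine the total run time while the fast worker sits idle; with self-scheduling, the fast worker naturally absorbs more tasks. Real schedulers must also account for scheduling overhead and data locality, which this sketch ignores.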

DAPHNE: Integrated Data Analysis Pipelines for Large-Scale Data Management, HPC, and Machine Learning

Research Project  | 2 Project Members

The DAPHNE project aims to define and build an open and extensible system infrastructure for integrated data analysis pipelines, including data management and processing, high-performance computing (HPC), and machine learning (ML) training and scoring. Key observations are that (1) systems in these areas share many compilation and runtime techniques, (2) there is a trend towards complex data analysis pipelines that combine these systems, and (3) the increasingly heterogeneous hardware infrastructure they use is converging as well. Yet the programming paradigms, cluster resource management, and data formats and representations differ substantially. Therefore, this project aims, with a joint consortium of experts from the data management, ML systems, and HPC communities, to systematically investigate the system infrastructure, language abstractions, compilation and runtime techniques, and systems and tools needed to increase productivity when building such data analysis pipelines and to eliminate unnecessary performance bottlenecks.

MLS: Multilevel Scheduling in Large Scale High Performance Computers (extension)

Research Project  | 2 Project Members

This project proposes to investigate and develop multilevel scheduling (MLS), a multilevel approach for achieving scalable scheduling in large-scale high performance computing systems across the multiple levels of parallelism, with a focus on software parallelism. By integrating multiple levels of parallelism, MLS differs from hierarchical scheduling, which is traditionally employed to achieve scalability within a single level of parallelism. MLS is based on extending and bridging the most successful (batch, application, and thread) scheduling models beyond one or two levels of parallelism (scaling across) and beyond their current scale (scaling out). The proposed MLS approach aims to leverage all available parallelism and address hardware heterogeneity in large-scale high performance computers such that execution times are reduced, performance targets are achieved, and acceptable efficiency is maintained. The methodology for reaching the multilevel scheduling aims involves theoretical research studies, simulation, and experiments. The expected outcome is an answer to the following research question: Given massive parallelism, at multiple levels, and of diverse forms and granularities, how can it be exposed, expressed, and exploited such that execution times are reduced, performance targets (e.g., robustness against perturbations) are achieved, and acceptable efficiency (e.g., tradeoff between maximizing parallelism and minimizing cost) is maintained? This proposal leverages the most efficient existing scheduling solutions, extending them beyond one or two levels of parallelism and scaling them out within single levels of parallelism. The proposal addresses four tightly coupled problems: scalable scheduling, adaptive and dynamic scheduling, heterogeneous scheduling, and bridging schedulers designed for competitive execution (e.g., batch and operating system schedulers) with those for cooperative execution (e.g., application-level schedulers).
Overall, the project aims to make a fundamental advance toward simpler-to-use large-scale high performance computing systems, with impacts not only in the computer science community but also in all computational science domains.

MODA at sciCORE: Monitoring and Operational Data Analytics at sciCORE

Research Project  | 3 Project Members

The goal of this project is to improve HPC operations and research regarding system performance, resilience, and efficiency. The performance optimization aspect targets optimal resource allocation and job scheduling. The resilience aspect strives to ensure orderly operations when facing anomalies or misuse; this includes security mechanisms against malicious applications. The efficiency aspect is about resource management and energy efficiency of HPC systems. To this end, appropriate techniques are employed to (a) monitor the system and collect data, such as sensor data, system logs, and job resource usage, (b) analyze system data through statistical and machine learning methods, and (c) make control and tuning decisions to optimize the system and avoid waste and misuse of computing power. The operational ideals that this project follows are (a) to gain a data-driven understanding of the system instead of operating it like a black box, (b) to continuously monitor all system states and application behavior, (c) to holistically consider the interaction between system states and application behavior, and (d) to develop solutions that can detect and resolve performance issues autonomously.
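
As a rough illustration of the monitor, analyze, and decide loop described above, the sketch below flags anomalous sensor readings with a rolling z-score. It is not the project's method: the function name `flag_anomalies`, the window size, the threshold, and the power readings are all hypothetical, and a production system would draw on far richer data and models.

```python
from statistics import mean, stdev


def flag_anomalies(samples, window=5, threshold=3.0):
    """Return indices of samples that deviate from the mean of the preceding
    `window` samples by more than `threshold` standard deviations."""
    flagged = []
    for i in range(window, len(samples)):
        history = samples[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if sigma > 0 and abs(samples[i] - mu) > threshold * sigma:
            flagged.append(i)
    return flagged


# Hypothetical per-node power readings in watts, with one spike.
node_power_watts = [200, 202, 199, 201, 200, 203, 198, 350, 201, 200]
anomalies = flag_anomalies(node_power_watts)
print("anomalous sample indices:", anomalies)
```

A control step could then act on the flagged indices, for example by notifying an operator or throttling the offending job. Note that a spike inflates the variance of the following windows, so simple rolling statistics can mask anomalies that occur shortly after one another.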

MODA: Monitoring and Operational Data Analytics for HPC Systems

Research Project  | 2 Project Members

The goal of this project is to improve HPC operations and research regarding system performance, resilience, and efficiency. The performance optimization aspect targets optimal resource allocation and job scheduling. The resilience aspect strives to ensure orderly operations when facing anomalies or misuse; this includes security mechanisms against malicious applications. The efficiency aspect is about resource management and energy efficiency of HPC systems. To this end, appropriate techniques are employed to (a) monitor the system and collect data, such as sensor data, system logs, and job resource usage, (b) analyze system data through statistical and machine learning methods, and (c) make control and tuning decisions to optimize the system and avoid waste and misuse of computing power. The operational ideals that this project follows are (a) to gain a data-driven understanding of the system instead of operating it like a black box, (b) to continuously monitor all system states and application behavior, (c) to holistically consider the interaction between system states and application behavior, and (d) to develop solutions that can detect and resolve performance issues autonomously.

SPH-EXA: Optimizing Smooth Particle Hydrodynamics for Exascale Computing (extension)

Research Project  | 6 Project Members

Understanding how fluids and plasmas behave under complex physical conditions underlies some of the most important questions that researchers try to answer. These range from practical solutions to engineering problems to cosmic structure formation and evolution. In that respect, numerical simulations of fluids in astrophysics and computational fluid dynamics (CFD) are among the most computationally demanding calculations in terms of sustained floating-point operations per second (FLOP/s). It is expected that they will benefit greatly from future Exascale computing infrastructures, which will perform 10^18 FLOP/s. These scenarios push the computational astrophysics and CFD fields well into sustained Exascale computing. Nowadays, they can only be tackled by reducing the scale, the resolution, and/or the dimensionality of the problem, or by using approximate versions of the physics involved. How this affects the outcome of the simulations, and therefore our knowledge of the problem, is still not well understood. The simulation codes used in numerical astrophysics and CFD (hydrocodes, hereafter) are numerous and varied. Most of them rely on a hydrodynamics solver that calculates the evolution of the system under study along with all the coupled physics. Among these hydrodynamics solvers, the Smooth Particle Hydrodynamics (SPH) technique is a purely Lagrangian method with no underlying mesh, in which the fluid can move freely in a practically boundless domain, which is very convenient for astrophysics and CFD simulations. SPH codes are very important in astrophysics because they couple naturally with the fastest and most efficient gravity solvers, such as tree codes and fast multipole methods.
Nevertheless, the parallelization of SPH codes is not straightforward due to their boundless nature and the lack of a structured grid, which cause continuously changing interactions between fluid elements, or between fluid elements and mechanical structures, from one time step to the next. This adds a layer of complexity to parallelizing SPH codes, yet it also makes them a very attractive and challenging application for the computer science community in view of the parallelization and scalability challenges for the upcoming Exascale computing systems. In this project we aim to develop a scalable and fault-tolerant SPH kernel in the form of a mini/proxy co-design application. The SPH mini-app will be incorporated into current production codes in the fields of astrophysics (SPHYNX, ChaNGa) and CFD (SPH-flow), producing what we call the SPH-EXA version of those codes.
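
To make the mesh-free character of SPH concrete, the following minimal sketch evaluates the standard SPH density summation, rho_i = sum_j m_j W(|x_i - x_j|, h), in one dimension with the widely used cubic spline kernel. It is an illustration only and not code from SPH-EXA, SPHYNX, ChaNGa, or SPH-flow: the function names, the particle setup, and the choice of smoothing length (1.2 times the particle spacing) are assumptions, and the brute-force all-pairs loop stands in for the tree-based neighbor searches that production codes use.

```python
def cubic_spline_1d(r, h):
    """Cubic spline SPH kernel in 1D with compact support of radius 2h."""
    q = abs(r) / h
    sigma = 2.0 / (3.0 * h)  # 1D normalization so the kernel integrates to 1
    if q <= 1.0:
        return sigma * (1.0 - 1.5 * q * q + 0.75 * q ** 3)
    if q <= 2.0:
        return sigma * 0.25 * (2.0 - q) ** 3
    return 0.0


def density(positions, masses, h):
    """Density at each particle as a kernel-weighted sum over all particles."""
    return [
        sum(m_j * cubic_spline_1d(x_i - x_j, h) for x_j, m_j in zip(positions, masses))
        for x_i in positions
    ]


# Hypothetical setup: equally spaced particles representing a uniform fluid.
n, dx, rho0 = 50, 0.02, 1.0
positions = [i * dx for i in range(n)]
masses = [rho0 * dx] * n

rho = density(positions, masses, h=1.2 * dx)
rho_mid = rho[n // 2]
print(f"interior density: {rho_mid:.4f}, edge density: {rho[0]:.4f}")
```

The interior estimate recovers the reference density closely, while the edge particle reads low because half of its kernel support contains no neighbors. Because each particle's neighbor set changes as particles move, the interaction lists must be rebuilt every time step, which is the source of the parallelization difficulty noted above.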

The DIALOGUE Study: Using digital health to improve care for families with predisposition to hereditary cancer

Research Project  | 15 Project Members

In Hereditary Breast and Ovarian Cancer (HBOC) syndrome, communication of genetic test results with relatives is essential to cascade genetic screening. Cascade genetic screening is a sequential process of identifying and testing blood relatives of a known mutation carrier to determine if they also carry the pathogenic variant, in order to propose preventive and other clinical management options that reduce morbidity and mortality. However, according to Swiss and Korean privacy laws, individuals identified with the pathogenic variant bear sole responsibility for sharing information about test results and health implications with relatives. Empirical evidence suggests that up to 50% of biological relatives are unaware of relevant genetic information, suggesting that potential benefits of genetic testing are not communicated effectively. Thus, interventions designed to help probands effectively communicate with relatives are critical for better management of hereditary cancer risk.

Technology could play a significant role in facilitating communication and genetic education within HBOC families. Given the lack of well-developed digital health tools to help individuals with a genetic predisposition to cancer communicate genetic information effectively to their relatives, the study aims to develop a modern, scalable, mobile-friendly digital health solution for Swiss and Korean HBOC families. The digital health solution will be based on the Family Gene Toolkit (FGT), a web-based intervention designed to enhance communication of genetic test results within HBOC families that has been successfully tested for acceptability, usability, and participant satisfaction.

The study will also expand an existing research infrastructure developed in Switzerland, to enable future collaborative projects between Switzerland and Korea in this field. The Specific Aims of the project are:
1) Develop a digital health solution to support the communication of cancer predisposition among HBOC families, based on linguistic and cultural adaptation methods of the Family Gene Toolkit for the Swiss and Korean population.
2) Develop the K-CASCADE research infrastructure in Korea by expanding an existing research infrastructure developed by the CASCADE Consortium in Switzerland.
3) Evaluate the efficacy of the aforementioned digital solution on psychological distress and communication of genetic test results, as well as knowledge of cancer genetics, coping, decision making and quality of life.
4) Explore the reach, effectiveness, adoption, implementation, and maintenance of the aforementioned digital solution.

The content for the digital health solution will be based on the FGT with linguistic adaptation to Korean, German, French and Italian, and will be made available for web and mobile access. Aim 1 will be achieved through focus groups in each country, with 20-24 HBOC mutation carriers and relatives and 6-10 healthcare providers involved in genetic services (counseling and testing), to better identify the cultural context.

For Aim 2, K-CASCADE, a Korean database of HBOC families (mutation carriers and relatives), will be created based on the Swiss CASCADE Consortium database, creating a lasting research infrastructure that will facilitate future collaboration, including the possibility to apply machine learning algorithms for prediction of breast and ovarian cancer risk.

For Aim 3, feasibility and efficacy of the digital health solution against the comparison intervention will be assessed in a randomized trial, with a sample of 104 HBOC mutation carriers (52 in each study arm).

Aim 4 will be achieved with survey and interview data collected from participating HBOC families and healthcare providers during all phases of the study. Dissemination strategies will also be generated to ensure sustainable use of the digital health solution. Adapting existing interventions, rather than developing new ones, takes advantage of previous valid experiences without duplicating efforts.

Adaptation and implementation of culturally sensitive, digital health interventions that can facilitate communication processes within the family and enhance understanding of genetic cancer risk are extremely timely and relevant, given the expansion of genetic testing technology, the falling costs of genetic testing, and the increased pressure for integration of genetic knowledge in routine clinical care. The study would be one of the first resource-effective international research platforms to develop digital health solutions that can be scaled to large patient numbers and can be used in routine practice.

3BEARS: Broad Bundle of BEnchmarks for Allocation of Resources and Scheduling in Parallel and Distributed Computing

Research Project  | 5 Project Members

The goal of the project is to develop ways to co-design parallel applications and scheduling algorithms in order to achieve high performance and optimize resource utilization. Parallel applications nowadays are a mix of High-Performance Computing (HPC), Big Data, and Machine Learning (ML) software. They show varied computational profiles, being compute-, data-, I/O-intensive, or a combination thereof. Because of the varied nature of their parallelism, their performance can degrade due to factors such as synchronization, management of parallelism, communication, and load imbalance. In this situation, scheduling has to be done with care to avoid causing new performance problems (e.g., fixing load imbalance may degrade communication performance). In this work, we concentrate explicitly on scheduling algorithms that minimize load imbalance and/or minimize communication costs. Our focus is the characterization of workloads represented by the mix of HPC, Big Data, and ML applications, in order to use them to test existing scheduling techniques and to enable the development of novel and more suitable scheduling techniques.
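
One way to picture a scheduling algorithm that minimizes load imbalance is the classic longest-processing-time-first (LPT) heuristic, sketched below together with one commonly used imbalance metric, (max load / mean load - 1) x 100. This is an illustrative example and not a technique the project is stated to use: the function names and the task costs are hypothetical, and the sketch ignores communication costs, which the project also considers.

```python
import heapq


def lpt_schedule(task_costs, num_workers):
    """Greedy LPT: assign tasks, largest first, to the currently least-loaded worker."""
    loads = [(0.0, w) for w in range(num_workers)]  # (current load, worker id)
    heapq.heapify(loads)
    assignment = {w: [] for w in range(num_workers)}
    for cost in sorted(task_costs, reverse=True):
        load, w = heapq.heappop(loads)
        assignment[w].append(cost)
        heapq.heappush(loads, (load + cost, w))
    return assignment


def percent_imbalance(loads):
    """Percent by which the most loaded worker exceeds the mean load."""
    mean_load = sum(loads) / len(loads)
    return (max(loads) / mean_load - 1.0) * 100.0


# Hypothetical task costs, scheduled on three workers.
tasks = [8, 7, 6, 5, 4, 3, 2, 1]

# Baseline: contiguous blocks of tasks in their original order.
naive_loads = [sum(tasks[0:3]), sum(tasks[3:6]), sum(tasks[6:8])]
lpt_loads = [sum(costs) for costs in lpt_schedule(tasks, 3).values()]

print(f"naive imbalance: {percent_imbalance(naive_loads):.1f}%")
print(f"LPT imbalance:   {percent_imbalance(lpt_loads):.1f}%")
```

LPT assumes that task costs are known in advance, which is rarely true for real applications; this is one reason why workload characterization and dynamic scheduling techniques matter.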