UNIverse - Public Research Portal
DAPHNE: Integrated Data Analysis Pipelines for Large-Scale Data Management, HPC, and Machine Learning

Research Project | 01.10.2020 - 30.09.2024

The DAPHNE project aims to define and build an open and extensible system infrastructure for integrated data analysis pipelines, including data management and processing, high-performance computing (HPC), and machine learning (ML) training and scoring. Key observations are that (1) systems in these areas share many compilation and runtime techniques, (2) there is a trend towards complex data analysis pipelines that combine these systems, and (3) the underlying, increasingly heterogeneous hardware infrastructure is converging as well. Yet the programming paradigms, cluster resource management, and data formats and representations still differ substantially. Therefore, a joint consortium of experts from the data management, ML systems, and HPC communities systematically investigates the system infrastructure, language abstractions, compilation and runtime techniques, and systems and tools needed to increase productivity when building such data analysis pipelines and to eliminate unnecessary performance bottlenecks.

Funding

DAPHNE: Integrated Data Analysis Pipelines for Large-Scale Data Management, HPC, and Machine Learning

Horizon 2020 Collaborative Projects (GrantsTool), 10.2020-09.2024 (48 months)
PI: Ciorba, Florina M.

Publications

Damme, P. et al. (2022) ‘DAPHNE: An Open and Extensible System Infrastructure for Integrated Data Analysis Pipelines’. Available at: https://www.cidrdb.org/cidr2022/papers/p4-damme.pdf.

Ihde, N. et al. (2021) ‘A Survey of Big Data, HPC and Machine Learning Benchmarks’. Springer. Available at: https://hpi.de/fileadmin/user_upload/fachgebiete/rabl/publications/2021/A_Survey_of_Big_Data_High_Performance_Computing_and_Machine_Learning_Benchmarks.pdf.

Müller Korndörfer, J.H. et al. (2021) ‘LB4OMP: A Dynamic Load Balancing Library for Multithreaded Applications’, IEEE Transactions on Parallel and Distributed Systems, p. 12. Available at: https://doi.org/10.1109/tpds.2021.3107775.

Ihde, N. et al. (2021) ‘A Survey of Big Data, High Performance Computing, and Machine Learning Benchmarks’. Springer. Available at: https://link.springer.com/chapter/10.1007/978-3-030-94437-7_7.

Members (2)

Florina M. Ciorba

Principal Investigator

Ahmed Hamdy Mohamed Eleliemy

Project Member