18th Workshop on high performance computing in meteorology

Optimisation of Data Movement in Complex Workflows

Speaker

Harvey Richardson (Cray UK Ltd)

Description

Since its formation in 2015, the Cray EMEA Research Lab has taken a keen interest in new approaches to optimising data location, distributed task computation and HPC I/O. To fully address HPC I/O optimisation we must move beyond configuration, application and library optimisation, consider the whole workflow, and optimise the data movement required by dependencies within it. Our Octopus project will deliver a high-level framework for workflow description, scheduling and execution, built around awareness of data and of the memory hierarchy. We believe that scheduling applications while reasoning about their data production, data locality and data consumption, and accounting for the cost of moving data between tiers of the memory hierarchy, will enable efficient execution of coupled applications in complex workflows. Our use cases cover: in-situ analysis and visualisation; simulation with multiple consumers; coupled simulations with dependencies on multiple concurrent analyses, external data sources and archiving; and even sample distribution and shuffling in high-throughput machine learning tasks. Octopus is an ambitious project that will provide optimisations that are impossible if we confine ourselves to the individual application. We have an outline design and will develop Octopus collaboratively with partners in recently funded EU projects; ECMWF is one of these partners.
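To make the cost argument concrete, here is a minimal Python sketch of data-movement-aware placement. It assumes a deliberately simple model: each input object sits in one tier, and moving it costs its size divided by the bandwidth between the two tiers. The tier names, bandwidth figures and the `place_consumer` helper are hypothetical illustrations, not part of Octopus; a real scheduler would also have to weigh compute availability, contention and capacity.

```python
from dataclasses import dataclass

# Hypothetical link bandwidths in GB/s between tiers; illustrative numbers only.
BANDWIDTH_GBPS = {
    ("dram", "nvram"): 10.0,
    ("dram", "disk"): 1.0,
    ("nvram", "disk"): 2.0,
}


def bandwidth(src: str, dst: str) -> float:
    """Look up a link bandwidth, treating links as symmetric."""
    if (src, dst) in BANDWIDTH_GBPS:
        return BANDWIDTH_GBPS[(src, dst)]
    return BANDWIDTH_GBPS[(dst, src)]


@dataclass
class DataObject:
    name: str
    size_gb: float
    tier: str  # tier where the producer left this object


def transfer_seconds(obj: DataObject, target: str) -> float:
    """Estimated time to move obj to the target tier; zero if it is already there."""
    if obj.tier == target:
        return 0.0
    return obj.size_gb / bandwidth(obj.tier, target)


def place_consumer(inputs, candidates):
    """Hypothetical helper: choose the tier that minimises total estimated input movement."""
    costs = {t: sum(transfer_seconds(o, t) for o in inputs) for t in candidates}
    return min(costs, key=costs.get), costs


# Example: a large field left in NVRAM by its producer, and a small mask on disk.
inputs = [DataObject("field", 8.0, "nvram"), DataObject("mask", 0.5, "disk")]
best, costs = place_consumer(inputs, ["dram", "nvram", "disk"])
print(best, costs)
```

Under these assumed numbers the consumer is best placed next to the large field, since moving the small mask is far cheaper than moving the field; an application-by-application view would not see this trade-off.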
We have developed an enabling library, the ‘Data Junction’, to transport (and redistribute) data encapsulated in Octopus objects between applications; it already supports several transports, including MPI, DataSpaces, Ceph RADOS, libfabric and POSIX files.
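A library that hides the choice of transport behind a single interface might be sketched as follows. This is a hypothetical Python illustration, not the Data Junction API: `Transport`, `InMemoryTransport`, `PosixFileTransport` and `Junction` are invented names, the in-memory backend merely stands in for memory-to-memory transports such as MPI or libfabric, and redistribution between differently decomposed producers and consumers is not shown.

```python
import os
import tempfile
from abc import ABC, abstractmethod


class Transport(ABC):
    """Hypothetical backend interface: move named byte payloads between applications."""

    @abstractmethod
    def put(self, key: str, payload: bytes) -> None: ...

    @abstractmethod
    def get(self, key: str) -> bytes: ...


class InMemoryTransport(Transport):
    """Stand-in for a memory-to-memory transport; here just a dictionary."""

    def __init__(self):
        self._store = {}

    def put(self, key, payload):
        self._store[key] = bytes(payload)

    def get(self, key):
        return self._store[key]


class PosixFileTransport(Transport):
    """Exchanges payloads through files in a shared directory."""

    def __init__(self, directory):
        self._dir = directory

    def _path(self, key):
        return os.path.join(self._dir, key + ".bin")

    def put(self, key, payload):
        # Write to a temporary name, then rename, so readers never see a partial file.
        tmp = self._path(key) + ".tmp"
        with open(tmp, "wb") as f:
            f.write(payload)
        os.replace(tmp, self._path(key))

    def get(self, key):
        with open(self._path(key), "rb") as f:
            return f.read()


class Junction:
    """Hypothetical router: sends objects through whichever backend is named."""

    def __init__(self, transports):
        self._transports = transports

    def send(self, backend, key, payload):
        self._transports[backend].put(key, payload)

    def receive(self, backend, key):
        return self._transports[backend].get(key)


received = {}
with tempfile.TemporaryDirectory() as shared_dir:
    junction = Junction({
        "memory": InMemoryTransport(),
        "posix": PosixFileTransport(shared_dir),
    })
    for backend in ("memory", "posix"):
        # Producer and consumer code is identical whichever backend is chosen.
        junction.send(backend, "temperature", b"\x01\x02\x03")
        received[backend] = junction.receive(backend, "temperature")
```

Keeping the backend choice out of application code is what would let a workflow scheduler swap, for example, a file exchange for a memory-to-memory transfer without modifying the producer or consumer.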

The talk will describe Octopus, the work completed so far on data transport, and how this work will continue through the collaborative EU H2020 projects MAESTRO and EPIGRAM-HS. We are interested in how the Octopus framework can be applied to a variety of NWP and climate workflows.

Affiliation

Cray EMEA Research Lab

Primary authors

Tim Dykes (Cray UK Ltd), Clement Foyer (Cray UK Ltd), Utz-Uwe Haus (Cray EMEA Research Lab), Harvey Richardson (Cray UK Ltd), Karthee Sivalingam (Cray EMEA Research Lab), Adrian Tate (Cray EMEA Research Lab)
