20th ECMWF workshop on high performance computing in meteorology

Abstracts

Updates of HPC in Japan Meteorological Agency

TOYODA Eizi 1

1Japan Meteorological Agency

Updates on HPC at the Japan Meteorological Agency (JMA) will be reported. JMA plans to replace its current Cray supercomputer in March 2024. In March 2023, JMA also deployed an additional Fujitsu FX1000 supercomputer. JMA's recent efforts are focused on the next generation of forecast systems: the supercomputer Fugaku is used for research and development, and new mesoscale forecast systems are being built on JMA's own FX1000.

Porting and Benchmarking CloudSC on The A64FX Processor

Andrew Beggs 1

1ECMWF

Old is new again, and vectorisation has become a key pathway to increasing the performance of code. To feed this increase in compute performance, High Bandwidth Memory (HBM) has also become increasingly popular. This poster looks at the effects of, and strategies for, utilising both Arm's Scalable Vector Extension and HBM (available on Fujitsu's A64FX processor) on CloudSC, a physics component of the Integrated Forecasting System known for being computationally demanding.

Atlas Parallel Conservative Remapping on Arbitrary Spherical Meshes

Pedro Maciel 1, Michail Diamantakis 1, Slavko Brdar , Willem Deconinck 1

1ECMWF

[[POSTER]] Conservative interpolation has many applications in Earth system models: most notably, the mass-conservative parallel exchange of variables between model components such as atmosphere/ocean and atmosphere/earth-surface, and the preparation of initial data from climate fields. To account for the various grids used by these model components, we support any arbitrary mesh on the sphere, such as the octahedral or reduced Gaussian grids of the IFS, quasi-structured grids such as the ORCA grids of NEMO and FESOM2, or fully unstructured grids. This general interpolation method is implemented in ECMWF's Atlas library. We present an implementation and analysis for very high-resolution grids.
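
As a minimal illustration of the first-order conservative principle underlying such remapping, the sketch below remaps a field between two hypothetical 1D cell layouts using overlap weights; it is illustrative only and does not reflect the Atlas implementation or its API:

    import numpy as np

    # First-order conservative remapping in 1D: each target cell receives the
    # overlap-weighted contribution of the source cells it intersects, so the
    # integral of the field is preserved. Cell edges are hypothetical.
    src_edges = np.linspace(0.0, 1.0, 6)    # 5 source cells
    tgt_edges = np.linspace(0.0, 1.0, 4)    # 3 target cells
    src_vals = np.array([1.0, 2.0, 4.0, 3.0, 5.0])

    tgt_vals = np.zeros(len(tgt_edges) - 1)
    for j in range(len(tgt_edges) - 1):
        for i in range(len(src_edges) - 1):
            # Length of the intersection between source cell i and target cell j
            overlap = max(0.0, min(src_edges[i + 1], tgt_edges[j + 1])
                               - max(src_edges[i], tgt_edges[j]))
            tgt_vals[j] += src_vals[i] * overlap
        tgt_vals[j] /= tgt_edges[j + 1] - tgt_edges[j]

    # Conservation check: the integrals over source and target cells agree.
    assert np.isclose(np.sum(src_vals * np.diff(src_edges)),
                      np.sum(tgt_vals * np.diff(tgt_edges)))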

Bringing the Destination Earth Continuous Extremes Digital Twin to operations

Emma Kuwertz 1, Bentorey Hernandez Cruz 1, Tryggvi Hjorvar 1, Johannes Bulin 1, Paul Burton 1, Andrew Bennett 1, Michael Sleigh

1ECMWF

The Destination Earth programme is delivering several Digital Twins (DT) that leverage cutting-edge technology to provide relevant Earth System predictions for climate and extreme weather. One of these is the Continuous Extremes DT which is underpinned by ECMWF’s world-leading Integrated Forecasting System (IFS). It produces high-resolution global real-time medium-range weather forecasts on the LUMI EuroHPC platform using GPU acceleration, delivering products to the Data Bridge using the Digital Twin Engine infrastructure. We describe the approach and challenges of delivering this DT in an operational-like environment and look towards the future.

Porting ECTrans to AMD GPUs

Paul Mullowney 1, Andreas Mueller 2

1AMD, Inc, 2ECMWF

High-fidelity, high-accuracy weather forecasting requires significant investment in software development to leverage leadership computing resources. Many of the codes employed for this task have been developed over many years under large multi-national collaborations. There is a well-justified desire to minimize changes to the underlying source code, with all its embedded knowledge, while simultaneously taking advantage of modern HPC resources. With this in mind, many of the development teams have chosen the path of directive-based offloading to leverage GPU compute architectures. In this talk, I will present our efforts to port the ECTrans component of the IFS forecasting model to AMD GPUs. I will discuss results for both OpenACC and OpenMP offloading approaches on current-generation MI250X devices, as well as opportunities for potential performance gains on the next-generation MI300 devices.

Diversifying Your HPC Technology with Azure: A New Era of Scientific Computing

Alexandre Jean

Are you looking to diversify your High Performance Computing technology and take advantage of the latest innovations in scientific computing thanks to the Azure Cloud?


We'll showcase a real-world scenario and highlight the unique advantages of using Azure for your HPC workloads, including access to cutting-edge solutions (infrastructure and software), seamless integration with your existing tools and workflows, and the ability to scale up or down as needed.

Don't miss this opportunity to discover how Azure can help you access HPC technologies and unlock new possibilities.

A Deep Dive into DPU Computing – Addressing HPC Performance Bottlenecks

Gilad Shainer , Richard Graham 1

1NVIDIA

AI and scientific workloads demand ultra-fast processing of high-resolution simulations, extreme-size datasets, and highly parallelized algorithms. As these computing requirements continue to grow, the traditional GPU-CPU architecture increasingly suffers from compute imbalance, data latency, and a lack of parallel or pre-data processing. The introduction of the Data Processing Unit (DPU) brings a new tier of computing to address these bottlenecks and to enable, for the first time, compute overlapping and nearly zero communication latency. The session will deliver a deep dive into DPU computing and how it can help address long-standing performance bottlenecks. Performance results for a variety of HPC applications will be presented as well.

Integrated Forecasting System Performance Optimization

Ioan Hadade 1, Olivier Marsden 1, Richard Graham 2

1ECMWF, 2NVIDIA

The Integrated Forecasting System (IFS) is used to create all the weather forecasts produced by ECMWF for its member states. ECMWF and NVIDIA are working together to improve IFS’s time to solution, with one of the specific areas of focus being how the application utilizes the network for data exchange. Initial work has identified opportunities for performance improvement by changing the management of network resources and data layout, adding support for MPI data-types to expose the native data layout to the InfiniBand network hardware’s gather/scatter capabilities, changing the semantics of some of the data exchange phases and exploring the advantages of using the BlueField DPU for further acceleration. This presentation will describe approaches taken and provide a snapshot of the performance improvements.
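
As a hedged sketch of the MPI datatype idea mentioned above (a minimal mpi4py example, assuming mpi4py and an MPI launcher are available; it is not the IFS implementation): describing a strided column of an array as a derived datatype lets the MPI library, and potentially the network hardware, gather the non-contiguous elements directly instead of requiring a user-side packing copy.

    import numpy as np
    from mpi4py import MPI

    # Run with: mpiexec -n 2 python this_script.py
    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()

    nrows, ncols = 4, 8
    field = np.arange(nrows * ncols, dtype=np.float64).reshape(nrows, ncols)

    # Describe one column of the row-major array as a strided MPI datatype:
    # nrows blocks of one double, separated by a stride of ncols doubles.
    column_t = MPI.DOUBLE.Create_vector(nrows, 1, ncols)
    column_t.Commit()

    if comm.Get_size() >= 2:
        if rank == 0:
            # Send the first column without packing it into a contiguous buffer.
            comm.Send([field, 1, column_t], dest=1, tag=0)
        elif rank == 1:
            column = np.empty(nrows, dtype=np.float64)
            comm.Recv(column, source=0, tag=0)
            print("received column:", column)

    column_t.Free()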

Cloud microphysics dwarf extensions: interfacing to Python, and a Loki GPU port of the nonlinear, tangent-linear and adjoint variants.

Zbigniew Piotrowski 1

1ECMWF

The organisation of weather suite computations on heterogeneous machines depends, among other things, on the feasibility of executing a given component on accelerators. Here, two proof-of-concept developments within the CLOUDSC dwarf family are discussed. Aiming at the prospective non-hydrostatic FVM dynamical core implemented in the hardware-agnostic GT4Py framework, the technical feasibility of integrating legacy Fortran physics with Python dynamics is discussed. In turn, paving the way towards data assimilation computations on GPU, the Loki port of the nonlinear, tangent-linear and adjoint triad of CLOUDSC2 is successfully executed on GPU and assessed in terms of accuracy and performance.
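
One standard correctness check when assessing such tangent-linear/adjoint pairs is the dot-product (adjoint) test, sketched below for a generic linear operator; the operator here is a random stand-in, not CLOUDSC2 itself:

    import numpy as np

    rng = np.random.default_rng(0)

    # Stand-in for a tangent-linear operator (a random matrix); the adjoint of a
    # real-valued linear operator is its transpose.
    n = 50
    L = rng.standard_normal((n, n))

    def tangent_linear(dx):
        return L @ dx

    def adjoint(dy):
        return L.T @ dy

    # Dot-product test: <L dx, dy> must equal <dx, L^T dy> to machine precision.
    dx = rng.standard_normal(n)
    dy = rng.standard_normal(n)
    lhs = np.dot(tangent_linear(dx), dy)
    rhs = np.dot(dx, adjoint(dy))
    print(f"relative difference: {abs(lhs - rhs) / abs(lhs):.2e}")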

EarthWorks: The computational and engineering challenges faced when building a global storm-resolving modeling system

Sheri Mickelson 1

1NCAR

EarthWorks is an NSF-funded project led by Colorado State University in partnership with NCAR, which aims to run a fully coupled Earth system model at a global storm-resolving resolution of 3.75 km. The EarthWorks model configuration leverages the infrastructure of the Community Earth System Model (CESM) to couple the CAM-MPAS model with the MPAS Ocean, MPAS Sea Ice, and Community Land Model (CLM) components.

Our fully coupled model configuration has been successfully tested in multi-year simulations at 120, 60, and 30 km resolutions on CPUs, with shorter functional tests completed at our target resolution of 3.75 km. In order to achieve multi-year simulations at our target resolution, we need to create a fully GPU-resident version of our model. This presentation will focus on the refactoring work that is well underway and the acceleration that has been obtained for the various components thus far. We will also discuss the software engineering challenges that this project has faced and how they have been overcome.

Diverse Aspects of Computing at DWD

Harald Anlauf 1, Ulrich Schättler 1, Marek Jacob 1, Florian Prill 1

1Deutscher Wetterdienst (DWD)

In this presentation we report on the latest upgrades to DWD's operational workload. One focus will be the increasing diversity of our HPC system and its implications for the NWP process chain. We also give an overview of ICON, the community model used for our numerical predictions on all scales.

To implement improvements in the forecasting system, our NEC SX-Aurora TSUBASA HPC system is being upgraded to a system of more than 8 PFlops by extending the current Aurora 10 installation with Aurora 30 vector engines (VE). This will allow the operational implementation of the planned seamless integration of nowcasting and shortest-range numerical forecasts.

A small GPU partition with 16 NVIDIA A100-80GB GPUs was added in 2022; it is mainly used by the forecasting department for AI applications in the Met4airports project.

ICON on its Way to Exascale – Status and Next Steps

Claudia Frauen 1

1German Climate Computing Center

The development of the weather and climate model ICON started more than 20 years ago. The model has a large monolithic code base mostly written in Fortran. By using OpenACC directives the atmospheric component of the model has been enabled to run successfully on (NVIDIA) GPU-based HPC systems like JUWELS Booster. However, getting to run the model on the AMD GPU-based system LUMI has already highlighted the limitations of this strategy. Here, we will compare the performance and scaling of the current code-base on the European pre-exascale systems LUMI and JUWELS. Looking to the future, projects like the German WarmWorld or the ICON-C initiative of the ICON consortium are taking steps to develop ICON into a scalable, modularized, flexible open-source code, which will enable scalable development and accelerate refactoring for performance and performance portability.

Bureau of Meteorology HPC Status

Adam Smith 1

1Bureau of Meteorology

An overview is presented of the Australian Bureau of Meteorology's current status, developments, and future directions for HPC and modelling.

Exascale Computing for Meteorology

Ilene Carpenter 1

1HPE

Exascale computing capabilities will enable meteorological agencies to substantially increase the fidelity of their NWP modeling systems and enable scientists to answer challenging questions that require more computing capability than they've had to date. But as we enter the Exascale Era for meteorology centers, we're at a time of rapid external change driven by new capabilities of generative AI and large language models. Suddenly, new enterprises need high performance computing and chip manufacturers are tailoring solutions to meet this new demand.

What will the impact of this change be on the meteorology community? Demand for GPUs from the enterprise may affect the price and supply of these components. In a time of rapid change and volatile energy and component costs, it is critical that applications be architected to maximize agility, so that centers can choose the solutions that provide the best value for their budgets; systems, or systems of systems, will need to be architected for a mixed workload that may change over the typical lifespan of an HPC system.

A Python implementation of the ICON dynamical core for operational NWP

Daniel Hupp 1, Christoph Müller 1, Nina Burgdorfer 1, Abishek Gopal 2, Nicoletta Farabullini 3, Till Ehrengruber 2, Samuel Kellerhals 3, Peter Kardos 3, Magdalena Luz 3, Matthias Röthlin 1, Enrique G. Paredes 2, Benjamin Weber 1, Rico Häuselmann 2, Felix Thaler 2, Jonas Jucker 3, Linus Groner 2, Hannes Vogt 2, Mauro Bianco 2, Anurag Dipankar 3, Carlos Osuna 1, Xavier Lapillonne 1

1MeteoSwiss, 2ETH Zurich / CSCS, 3ETH Zurich

Operational numerical weather prediction centres are confronted with the emergence of new technologies such as Graphics Processing Units (GPUs). Many weather models are implemented in large community-driven codes written in the Fortran programming language. There are solutions for porting such models to GPUs, for example using compiler directives, but these approaches have limitations in terms of performance as well as portability to other hardware architectures. We present here a rewrite of the dynamical core using a Python-based domain-specific-language approach, GT4Py, developed in the framework of the Swiss EXCLAIM project. We present the validation and first performance results of the new dynamical core implementation for the future MeteoSwiss operational configurations ICON-CH1-EPS, ICON-CH2-EPS and KENDA-I1. We finally discuss the specific challenges associated with using such an approach in an operational context, as well as the opportunities offered by the Python approach.

Seeking portability and productivity for future NWP code with GT4Py

Christian Kühnlein 1

1ECMWF

Achieving hardware-specific implementation and optimization while maintaining productivity in an increasingly diverse environment of supercomputing architectures is challenging and requires rethinking traditional numerical weather prediction model programming designs. We provide insights into the ongoing porting and development of ECMWF’s non-hydrostatic FVM atmospheric dynamical core in Python with the domain-specific library GT4Py. The presentation highlights the GT4Py approach for FVM and other ECMWF codes, shows recent high-performance computing results on CPUs and GPUs, and outlines the roadmap for the overall model porting project with partners at CSCS and ETH Zurich.

Parallel Software Framework of MCV Model

Qingu Jiang 1, Xingliang Li , Xinzhu Yu 2, Xueshun Shen 3, Li Liu 2

1Numerical Weather Prediction Center of CMA, 2Tsinghua University, 3CMA Earth System Modeling and Prediction Centre

The China Meteorological Administration (CMA) has recently defined a technology roadmap for Earth system model development. The dynamical core based on the multi-moment constrained finite volume method (MCV) was selected for building global and regional unified atmospheric models. The MCV discretization has high accuracy and good parallel scalability, making it suitable for heterogeneous computing architectures. In line with the trend of Earth system model development, and to reduce the costs of software development and maintenance, C-Coupler, a coupler developed by Tsinghua University originally for coupling Earth system model components, was selected as the parallel software framework for MCV. The latest version, C-Coupler 3, implements many features supporting the parallel requirements of the MCV model. This presentation will describe the software features of MCV, including the implementation of data structures, halo exchange, physics-dynamics coupling, and parallel I/O, and will show preliminary computational performance results at global high resolution. Finally, some computational optimization practices for the MCV model and future optimization ideas are presented.
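
As a hedged illustration of the halo-exchange pattern listed above (a minimal mpi4py sketch on a 1D row decomposition, assuming mpi4py is available; it does not reflect the C-Coupler implementation):

    import numpy as np
    from mpi4py import MPI

    # Each rank owns nloc interior rows plus one halo row above and below;
    # neighbouring ranks exchange boundary rows so stencils can read valid halos.
    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()

    nloc, ncols = 4, 8
    field = np.full((nloc + 2, ncols), float(rank))  # rows 0 and nloc+1 are halos

    up = rank - 1 if rank > 0 else MPI.PROC_NULL
    down = rank + 1 if rank < size - 1 else MPI.PROC_NULL

    # Send the top interior row upward, receive the bottom halo from below.
    comm.Sendrecv(sendbuf=field[1, :], dest=up,
                  recvbuf=field[nloc + 1, :], source=down)
    # Send the bottom interior row downward, receive the top halo from above.
    comm.Sendrecv(sendbuf=field[nloc, :], dest=down,
                  recvbuf=field[0, :], source=up)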

Exploration of public cloud computing by an operational site running the Unified Model

Jeff Zais 1

1NIWA New Zealand

NIWA (the National Institute of Water & Atmospheric Research) in New Zealand uses the UK Met Office Unified Model for both weather and climate studies. Through the years these workloads have been run on in-house systems (Cray T3E in 1999, IBM p575 in 2010, Cray XC50 in 2017). A system refresh is due in 2024/2025, and traditional hosted systems are under consideration, as well as compute cycles in the public cloud. In this talk, a comparison will be presented between the hosted and cloud options. The focus will be on performance aspects such as elapsed time and scaling with core counts. Also included will be considerations of the effect of file system performance on run time, for the numerous options available from various cloud vendors.

Heterogeneous HPC to the rescue? Ways to improve the energy efficiency of climate simulations today and tomorrow.

Jan Frederik Engels 1

1DKRZ

Rising energy costs have recently brought the topic of energy efficient HPC back to the forefront - and especially in climate research, the responsible use of resources should be a matter of course.
This talk will give an overview of the work at DKRZ on energy efficiency of current systems and the design of future systems. Starting with obvious measures for the data centre infrastructure, first results for further optimisation of existing systems will be presented. In addition, concepts for the analysis of current and future architectures will be considered in order to investigate the most suitable hardware for modular climate simulations using ICON as an example.

Loki: A Source-to-Source Translation Tool for Numerical Weather Prediction codes and more

Michael Staneker 1, Balthasar Reuter 1, Ahmad Nawab , Michael Lange 1

1ECMWF

All known or presumed candidates for exascale supercomputers will feature novel computing hardware or heterogeneous architectures, with GPUs currently being a cornerstone of this development. Using these machines efficiently with today's numerical weather prediction (NWP) codes requires adapting large code bases to new programming paradigms and applying architecture specific optimizations. Encoding all these optimisations within a single code base is infeasible.

Source-to-source translation offers the possibility to use existing code as-is and apply the necessary transformations and hardware-specific optimizations. To that end, we present Loki, a Python tool purpose-built for ECMWF's Integrated Forecasting System (IFS) that offers automatic source-to-source translation capabilities based on compiler technology to target a broad range of programming paradigms.

Following the recent open-source release of version 0.1.3, Loki is available on GitHub and ready for testing by the weather and climate community and beyond. It offers an API to encode custom transformations, allowing for expert-guided code translation. It supports multiple Fortran front ends, and can output Fortran, C, Python and now also CUDA Fortran.

In this poster, we highlight Loki's key features and present a performance comparison between auto-translated code and manually optimized variants.
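
To make the idea of directive insertion by source-to-source transformation concrete, the sketch below prepends an OpenACC directive to the outer loops of a toy Fortran kernel using plain text processing; it is purely illustrative and does not reflect Loki's API, which operates on a proper intermediate representation rather than regular expressions:

    import re

    fortran_src = """\
    SUBROUTINE saturate(n, q)
      INTEGER, INTENT(IN) :: n
      REAL, INTENT(INOUT) :: q(n)
      INTEGER :: i
      DO i = 1, n
        q(i) = MAX(q(i), 0.0)
      END DO
    END SUBROUTINE saturate
    """

    def add_acc_parallel(source: str) -> str:
        # Toy transformation: place "!$ACC PARALLEL LOOP" above each DO loop.
        lines = []
        for line in source.splitlines():
            if re.match(r"\s*DO\s+\w+\s*=", line, flags=re.IGNORECASE):
                indent = line[: len(line) - len(line.lstrip())]
                lines.append(f"{indent}!$ACC PARALLEL LOOP")
            lines.append(line)
        return "\n".join(lines) + "\n"

    print(add_acc_parallel(fortran_src))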

Performance Portability and Programmability in the Evolution of Weather Codes Towards Heterogeneous Platforms

Camilo Moreno 1

1Intel Corporation

Efforts are underway across the weather & climate community to adapt codebases that had traditionally targeted a single CPU architecture to take advantage of a wider variety of compute platforms based on different CPUs, GPUs, and other components. This evolution is made urgent by ambitious collaboration initiatives as well as aggressive roadmaps for model complexity and detail. A broad landscape of solutions is emerging to face these code evolution challenges, including DSLs, parallel programming models (such as SYCL), source-translation tools, and others. Performance Portability and Programmability offers a useful framework for judging related trade-offs. We will discuss the value of thinking in this way, and examine some examples of emerging code strategies under this lens.

Production Workflow for the Destination Earth Continuous Extremes Digital Twin

Michael Sleigh , Tryggvi Hjorvar 1, Paul Burton 1, Emma Kuwertz 1, Johannes Bulin 1, Bentorey Hernandez Cruz 1, Andrew Bennett 1

1ECMWF

The Continuous Extremes Digital Twin (DT) is being delivered by ECMWF as a part of the Destination Earth initiative to provide real-time medium-range weather forecasts at high resolution. Making use of EuroHPC computing resources, ECMWF's Integrated Forecasting System (IFS) is deployed on the LUMI supercomputer to generate forecasts initialised from operational data produced on-site at ECMWF's HPC facility.
High resolution products are disseminated to the Destination Earth Data Bridge for downstream users. This poster presents an overview of the DT's interaction with the Destination Earth DT Engine infrastructure, and the technical workflows implemented to support continuous production across platforms in a real-time environment.

Representing the 3D cloud radiative effects in ecRad using Machine Learning

Pelagie Alves 1, Christophe Bovalo 2, Matthew Chantry 3, Rémi Druilhe , David Guibert 1, Luca Marradi 1, Lionel Vincent 1

1Eviden, 2Atos, 3ECMWF

Machine Learning (ML) techniques are now widely used in atmospheric sciences. While some parameterizations have been successfully emulated, others could reach higher precision using ML.
In ecRad, ECMWF's radiative scheme, Tripleclouds is a solver that accounts for cloud heterogeneity at each model level but not for the 3D cloud radiative effects. SPARTACUS, on the other hand, does represent these effects, but it is too expensive to be run within the operational configuration of the IFS.
We trained neural networks to emulate the 3D cloud radiative effects, acting as a corrective term to the Tripleclouds formulation. Several architectures have been tested and will be evaluated.
The real target of these neural networks is RAPS, the benchmarking version of the IFS. Two strategies have been implemented for online inference: tight (Fortran-C-Python) and loose (MPI-RMA) coupling. Both approaches can be run either on CPUs or GPUs.
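
A minimal sketch of the corrective-term idea on synthetic data, using a generic scikit-learn regressor rather than the actual network architectures, inputs or fluxes evaluated in this work:

    import numpy as np
    from sklearn.neural_network import MLPRegressor

    rng = np.random.default_rng(0)

    # Synthetic stand-ins: x = atmospheric/cloud features, flux_fast = cheap
    # solver output (Tripleclouds-like), flux_ref = expensive reference
    # (SPARTACUS-like).
    n, nfeat = 2000, 8
    x = rng.standard_normal((n, nfeat))
    flux_fast = x @ rng.standard_normal(nfeat)
    flux_ref = flux_fast + 0.3 * np.sin(x[:, 0]) * x[:, 1]  # hypothetical 3D effect

    # Train the network on the residual, so it acts as a correction to the fast solver.
    model = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000, random_state=0)
    model.fit(x, flux_ref - flux_fast)

    flux_corrected = flux_fast + model.predict(x)
    print("RMSE before:", np.sqrt(np.mean((flux_ref - flux_fast) ** 2)))
    print("RMSE after: ", np.sqrt(np.mean((flux_ref - flux_corrected) ** 2)))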

Digital Twins of the Earth system

Thomas Geenen , Tiago Quintino 1, Nils Wedi 1

1ECMWF

The talk describes ongoing efforts to create digital twins of the Earth as part of the European Commission's Destination Earth programme.
The focus of the talk will be on the services infrastructure required to operate and interact with such a digital information system, and on how the growing digital twin landscape and impact-sector and decision-support tools can be integrated with DestinE, facilitated by the Digital Twin Engine.

Harnessing HPC in ECMWF's Meteorological Product Generation

Damien Decremer 1

1ECMWF

ECMWF has revolutionised meteorological data generation through its strategic use of high-performance computing (HPC). This poster offers insights into how HPC accelerates our vast-scale simulations, overcoming unique challenges and enhancing reliability. With state-of-the-art computational infrastructure, we optimise performance indicators and ensure 24/7 operations. Dive into ECMWF's pioneering intersection of meteorology and HPC, setting new standards in global data delivery.

Ubiquity: An Open-Source Platform For Ubiquitous Large-Scale Computing

Chris Coates 1

1Logicalis UK

Objective
The objective of this session is to bring together Ubiquity developers, potential administrators, users, and solution providers to engage in insightful discussions regarding recent developments and challenges faced in the field of HPC, in relation to the Ubiquity framework.

What is Ubiquity?
Ubiquity represents a rapidly expanding open-source and open-development platform designed for HPC, and embraces the principles of cloud computing, converged computing and ubiquitous computing.

By leveraging container technologies and container orchestration to enable efficient and scalable solutions, Ubiquity has garnered an increasing level of adoption, and caters to a wide range of industries, encompassing financial services, energy, manufacturing and sciences.

What will be discussed?
This session will centre around the exploration of several key topics. Firstly, participants will delve into the recent advancements and innovations within the Ubiquity ecosystem, discussing how the framework has evolved to address emerging challenges in the HPC landscape. The session will also address the role of Ubiquity in the field of artificial intelligence (AI), examining integration with AI workflows and the impact on HPC performance. It will also cover the utilisation of Ubiquity within hybrid and multi-cloud environments, highlighting the benefits and considerations associated with deploying Ubiquity-based solutions both on-premises and in cloud infrastructures.

Why is this useful?
This session is relevant to the HPC community due to increasing demand for high-performance and scalable computing solutions across various industries, with a focus being on how those solutions need to be flexible, and easier to maintain. Ubiquity enables this functionality.

Attendees will benefit from knowledge-sharing and networking opportunities this session provides. The insights shared during the session will assist attendees in navigating challenges associated with HPC adoption, containerisation, and orchestrating Ubiquity-based solutions within their domains. Participants will also gain a comprehensive understanding of the role Ubiquity plays in facilitating HPC advancements and driving innovation.

Met Office HPC Update

Paul Selwood 1

1Met Office

In 2021, the Met Office announced a new, ten-year contract with Microsoft for its supercomputing requirements. This contract is part of a wider programme that covers not just supercomputing but also scientific data storage, pre-/post-processing systems, networking, hosting, and virtual desktops, as well as improvements to observation and product delivery capabilities. The Microsoft contract is being delivered as a service and utilises a combination of Microsoft's Azure cloud services and HPE's Cray EX and ClusterStor technologies.

This presentation will give an overview of the programme, covering some of the drivers behind the move from the on-premise supercomputing we've used for many years and an outline of what the Microsoft solution is, and will share progress towards implementation.

Introduction to CAMS Mini-Application

Olivier Marsden 1, Iain Miller 1

1ECMWF

The Copernicus Atmosphere Monitoring Service (CAMS) is an integrated part of the IFS code, used to provide consistent and quality-controlled information related to air pollution and health, solar energy, greenhouse gases and climate forcing everywhere in the world. Typical features of CAMS-enabled runs of the IFS include an increased number of fields and the use of a generated iterative solver, which leads to a corresponding increase in computational cost. This work details the creation and analysis of a mini-application to highlight areas that can reduce the overhead of running the CAMS code, and some of the resultant changes.

Machine learning at ECMWF

Matthew Chantry 1

1ECMWF

Machine learning (ML) is becoming an important component of weather forecasting, through a spectrum of approaches ranging from the augmentation of numerical weather prediction (NWP) models to their entire replacement. Across this spectrum, applications of machine learning in the domain have been rapidly increasing in complexity and computational cost. We will examine the HPC challenges and opportunities in the use of machine learning for weather forecasting, covering both the unique challenges of combining ML with NWP and examining how data-driven weather forecasting models could change the equation.

Technical Challenges of deploying and running future Destination Earth Twins

Craig Prunty 1, Hans-Christian Hoppe 2, Utz-Uwe Haus 3

1SiPearl, 2ParTec AG, 3HPE HPC EMEA Research Lab

The Destination Earth flagship project of the European Commission has set out to build an open, reusable, and scalable infrastructure to run digital twins of the Earth at unprecedented scale within this decade. The opportunity of using large fractions of the largest EuroHPC installations poses new challenges for the Earth sciences community, ranging from running operational weather codes and climate simulations at the highest resolution on shared systems, across different architectures, and in coupled setups of multiple codes that previously have not been coupled tightly, to data and access-control federation challenges.

We will report on the first findings and suggestions of the Strategic Technology Agenda team, namely in the area of federation of compute resources, as well as data streaming inside and across sites, and give a lookout on the overall structure and expected outcomes of this technology agenda project (DE-380) in the coming months.

Best Practices for NWP in the cloud

Timothy Brown , Karthik Raman

The use of cloud computing technologies within HPC has grown considerably over the last few years, coupled with the demand for higher-resolution weather and climate modeling. This has created an exponentially higher demand for performant, elastic, flexible, and reliable distributed computing resources that AWS HPC is well positioned to serve. In this talk, we will present best practices for running weather and climate modeling in a cloud-native environment and also discuss porting NWP codes to emerging architectures, for example the Arm-based Graviton3E on AWS. We will conclude with best practices and performance results on scaling NWP models across multiple architectures.

DASI: An Interface for Semantic Data Management

Jenny Wong 1, Metin Cakircali 1, Olivier Iffrig 1, Simon Smart 1, James Hawkes 1, Tiago Quintino 1

1ECMWF

With supercomputers reaching exascale, new approaches to data management and storage aimed at eliminating unnecessary data movement are essential for I/O intensive workflows. The IO-SEA project aims to implement solutions for scaling such applications to exascale HPC systems by designing storage and data access architectures in collaboration with a wide range of use cases with high I/O demands. These use cases, which include ECMWF’s weather forecasting and product generation, help to define the requirements and are used to test and evaluate the effectiveness of the implementations.

The subject of this talk is one of the project’s outputs, DASI (Data Access and Storage Interface), developed by ECMWF. DASI is designed to provide simple, “scientist-friendly” data access on top of complex storage systems, such as the hierarchical storage management built in IO-SEA. It achieves this through semantic data management, where all access and control of data is performed using scientifically meaningful indexing relevant to the particular application domain. DASI is inspired by the FDB software in operational use at ECMWF, but is domain-agnostic, allowing it to be configured for any scientific domain. The interface allows users to write, retrieve and query data through metadata, as well as to set data policies relating to data lifetime or access frequency. DASI abstracts the data storage mechanisms from the user, providing flexibility in implementation and ease of use.
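
The sketch below conveys the flavour of this metadata-keyed access with a toy in-memory store; the class and method names are hypothetical and do not reproduce the actual DASI API:

    # Toy illustration of semantic data access: data are archived and retrieved
    # purely by scientifically meaningful metadata keys, never by file paths.
    class SemanticStore:
        def __init__(self):
            self._objects = []  # list of (metadata dict, payload) pairs

        def archive(self, metadata: dict, payload: bytes) -> None:
            self._objects.append((dict(metadata), payload))

        def retrieve(self, **query):
            return [payload for metadata, payload in self._objects
                    if all(metadata.get(k) == v for k, v in query.items())]

    store = SemanticStore()
    store.archive({"param": "2t", "date": "20230801", "step": 6}, b"...")
    store.archive({"param": "msl", "date": "20230801", "step": 6}, b"...")

    # Retrieval is expressed in domain terms, independent of where the data live.
    fields = store.retrieve(date="20230801", step=6)
    print(len(fields), "fields matched")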

The use cases involved in the co-design of the IO-SEA technologies include a benchmark of ECMWF’s weather forecasting and product generation workflow. As part of the project the benchmark has been adapted to use technologies developed in IO-SEA, such as DASI, and results on the impact of adopting these new technologies on the benchmark will be presented.

MME REP: Climate Data Records and EO data processing in a server-less computing paradigm

Fernando Ibáñez Casado , Mike Grant 1, Salvatore Pinto 1

1EUMETSAT

The Multi-Mission Element for Reprocessing (MME REP) is EUMETSAT's latest large-scale processing system, for Climate Data Record generation and bulk EO data processing on all EUMETSAT mission data.

Designed for flexibility, MME REP uses server-less computing and container technologies to integrate and run weather and climate product generation and EO data processing code developed for legacy architectures and different processing frameworks. A key point is the integration of heterogeneity into the system: common batch-processing (HTCondor) based workflows are supported alongside custom-built processing orchestrators, all within a single overall resource management framework (Kubernetes). This enables automatic scaling up and down of resources, but retains the accessibility and feel of a more traditional computing cluster for those workflows using it, while at the same time supporting more modern approaches.

MME REP is composed of two main layers (Infrastructure and Middleware) and is based on a hybrid computing ecosystem, with an infrastructure hosted roughly half on premises and half on external cloud resources, totalling more than 3,500 CPU cores and 70 TB of RAM. It is directly connected to the EUMETSAT Data Lake storage system, to gather input and publish output data, and features multiple tiers of I/O, from local to centralised, for intermediate and shared data in order to minimise the distance data must travel from storage to usage.

The presentation will focus on illustrating the MME REP server-less computing paradigm and related cloud technologies, and their application to EO data processing in representative use cases, with a focus on advantages and disadvantages of the solution in terms of flexibility, elasticity, scalability, security and fault tolerance.

Automating GPU adaptation of NWP single column physics using Loki

Michael Staneker 1, Ahmad Nawab , Balthasar Reuter 1, Michael Lange 1

1ECMWF

This talk is intended as the second part of a double-header detailing the GPU adaptation strategy for the IFS physics, i.e. source-term computation.

Adapting production numerical weather prediction (NWP) codes, which have typically been developed and optimised for multi-core CPUs, for GPU execution can be a very challenging task. One source of complexity is simply the vast size of these codebases, which are continually being updated by domain scientists. The lack of a universally agreed GPU programming model and language also adds greatly to the difficulty, as vendor-specific modifications are frequently needed to achieve optimal performance.

ECMWF’s solution for managing this complexity in a sustainable and maintainable manner is to use source-to-source translation and GPU-enabled data structures to separate technical and scientific concerns. To this end, Loki has been developed, a source-to-source translation tool purpose-built for the IFS.

Using examples from ecWam, ECMWF’s open-sourced operational wave model, and the IFS model physics, this talk will illustrate how Loki can be used to automate the GPU adaptation of single column physics algorithms. Two Loki transformations in particular, the single column coalesced (SCC) and pool allocator transforms, will be explained in detail.

Finally, the talk will also give a brief overview of FIELD_API, a Fortran library jointly developed by ECMWF and Météo France that facilitates the creation and management, including GPU/accelerator offload, of field objects in scientific code.

IFShub - An integrated web interface to IFS developer workflow

Sylvie Lamy-Thepaut 1, Eduard Rosert 2, Krzysztof Sciubisz 1, Paolo Battino 1, Paul Burton 1

1ECMWF, 2Deutscher Wetterdienst

Over many years, a wide range of tools have been developed to aid the workflow of ECMWF's IFS developers, ranging from prepIFS for managing the configuration of researchers' IFS experiments, through ecflow_UI for monitoring and controlling the complex workflow of a running experiment on our HPC systems, to a multitude of tools for visualising, investigating and managing the large quantities of data produced. These various tools typically run on different platforms, with a variety of command-line or graphical user interfaces.

ECMWF is working with the software developers Old Reliable Tech to develop IFShub as an integrated web-based interface, allowing researchers to manage their IFS experiment workflow from a unified interface, accessible via a web browser from any internet-connected computer. Some components of the workflows, such as prepIFS, are being completely re-imagined and re-implemented as web-based tools, whilst others are simply having a web front-end applied to a relatively unchanged existing functional back-end.

The hub is being built on a modular architecture. It will comprise a number of modules, each one with a back-end web service exposing REST APIs (written in Python using the Django framework) and a corresponding web front-end app (written in TypeScript using React components). One central module of the hub will unify access control and navigation and handle cross-module communication. This will ensure a coherent user experience, a single sign-on connected to the existing ECMWF authentication service, and a gateway for accessing the APIs through command-line tools or other HTTP-based clients.

The modules are being designed to be both developed and deployed as containerised applications, for better portability, resilience and scalability, and once in operation they will be hosted on Kubernetes clusters running in the ECMWF data centre.

This unified interface to developer workflows is essential as the workflow itself becomes more diverse, incorporating ML aspects and an increasing number of applications and configurations across an expanding number of HPC systems. Historically, ECMWF has run the IFS only on its own HPC systems; however, our commitment to deliver Digital Twins within the Destination Earth project, running on multiple HPC platforms within the EuroHPC consortium, will make an easily accessible and well-integrated workflow management system for users both inside and outside of ECMWF a necessity.

In this presentation, we will describe our progress and future plans to deliver IFShub to IFS users, and the opportunities this project presents to improve the usability and productivity of the IFS developer workflow.

Mapping a Coupled Earth-System Simulator onto the Modular Supercomputer Architecture

Sam Hatfield 1, Lukas Mosimann , Kristian Mogensen , Ioan Hadade 2, Olivier Marsden 2

1University of Oxford, 2ECMWF

The Modular Supercomputer Architecture concept, developed for the DEEP project series, describes a novel kind of heterogeneous computing platform comprising several different “modules”, each of which is a separate compute cluster in its own right. The modules are connected with a federated network to allow heterogeneous jobs to execute across them. One module may be GPU-based to benefit compute kernels with dense linear algebra or machine learning tasks, for example, whereas another module may have a particularly well-optimised file system. The truly heterogeneous Modular Supercomputer Architecture therefore works particularly well for complex applications comprising a range of different compute patterns.

One such application is the Earth system simulation, in which the Earth system is broken down into individual components for representing the atmosphere, the ocean, the land surface, and others. Here we present results from adapting ECMWF's Earth system model, the Integrated Forecasting System (IFS), to take advantage of the Modular Supercomputing Architecture.

We focus on the relationship between two particularly compute-intensive model components: the atmosphere and the ocean. We will present results from performing concurrent heterogeneous coupled atmosphere-ocean integrations on a prototypical Modular Supercomputer Architecture system, the DEEP machine at the Jülich Supercomputing Centre. To do this we developed a new prototype coupler which permits the atmosphere and ocean to execute on separate MPI tasks. We pin the atmosphere tasks to the GPU-enabled Extreme Scale Booster module, so that spectral transforms can be offloaded to GPUs. We pin the ocean tasks to the CPU-only Cluster module. In this configuration, as a side effect, we are also able to run the ocean component in pure MPI mode and the atmosphere component in hybrid MPI+OpenMP mode, which provides more flexibility in fully exploiting multicore CPUs.
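
A minimal mpi4py sketch of the component placement described above: the global communicator is split into atmosphere and ocean groups, each of which can then be pinned to a different module through the batch system. The three-quarters colouring rule and the component entry points are hypothetical:

    from mpi4py import MPI

    world = MPI.COMM_WORLD
    rank, size = world.Get_rank(), world.Get_size()

    # Hypothetical split: the first three quarters of the ranks run the atmosphere,
    # the remainder run the ocean. On a modular system each colour would be mapped
    # to a different compute module via the job script / resource manager.
    ATMOSPHERE, OCEAN = 0, 1
    color = ATMOSPHERE if rank < (3 * size) // 4 else OCEAN
    component_comm = world.Split(color, key=rank)

    if color == ATMOSPHERE:
        pass  # run_atmosphere(component_comm), e.g. hybrid MPI+OpenMP with GPU offload
    else:
        pass  # run_ocean(component_comm), e.g. pure-MPI ocean on the CPU module

    # Coupling exchanges then go through an inter-communicator or the world communicator.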

Embracing Diversity and Democratizing High-Performance Computing: Cultivating Inclusion and Driving Innovation

Samuel Mathekga 1, Binjamin Barsch 1, Mthetho Sovara 1, Mary-Jane Bopape 2

1Centre for High Performance Computing, 2South African Earth Observation Network (SAEON)

This presentation highlights the critical importance of diversifying High-Performance Computing (HPC) and presents effective strategies for cultivating a more inclusive community. HPC serves as a fundamental tool in various industries, including meteorology and oceanography; however, the field has long been recognized for its lack of diversity, particularly among women and minorities. In recent years, there has been a growing realization of the numerous benefits that diversification brings, such as enhanced innovation and the rectification of bias and discrimination. Diverse teams contribute a multitude of perspectives and ideas, leading to improved outcomes and decision-making in the realm of HPC. Additionally, the applications of HPC in healthcare, finance, and engineering underscore the pivotal role of diversity, as diverse backgrounds and experiences foster comprehensive solutions for pressing global challenges like climate change and sustainability.

Diversifying HPC encounters several barriers, including limited representation, restricted access to resources, and inadequate opportunities for underrepresented groups. To address these challenges, the Centre for High-Performance Computing (CHPC) has adopted a comprehensive approach. This involves the implementation of HPC content translation in local languages, making the field more accessible and inclusive to a wider range of individuals. Furthermore, sustainable mentorship programs have proven invaluable in providing guidance and support to aspiring individuals from diverse backgrounds who seek to enter the HPC community. The CHPC also advocates for inclusive hiring practices that actively seek out candidates from underrepresented groups, ensuring equitable opportunities for all.

In conclusion, the diversification of HPC is not only crucial for the future of the field but also enriches the HPC outcomes by embracing a broader range of perspectives and ideas. By adhering to best practices and actively embracing underrepresented groups, we can establish an HPC community that is truly inclusive and diverse, unlocking the full potential of this powerful technology.

Porting IFS dwarfs to SX-AURORA TSUBASA

Olivier Marsden 1, Ioan Hadade 1, Fatemeh Pouyan

1ECMWF

IFS dwarfs are standalone mini-applications, each consisting of a set of algorithms that represent key functional blocks of the IFS, such as the cloud microphysics scheme or the radiation scheme. These mini-applications offer a cheaper, more flexible and faster way to discover parametric dependencies between the application and the architecture components, in order to make sound architectural decisions. A common approach to analysing performance is to benchmark the mini-applications on diverse architectures while varying the problem size. We selected the SX-Aurora TSUBASA architecture for porting the IFS dwarfs. SX-Aurora TSUBASA is a new hybrid architecture that consists of vector engines (VE) for the computing functions and a vector host (VH) for the operating system functions of an application executing on this architecture.

Measuring the performance of these mini-applications on the SX-Aurora system can provide insight into how a particular computing component, such as a vector core, carries out particular types of operations. This kind of benchmarking allows us to explore the potential benefits and drawbacks of computational kernels and hardware components. Thus, our objective is to explore how particular application or architecture parameters impact performance.
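
A minimal sketch of that benchmarking approach, timing a stand-in kernel over a range of problem sizes; the kernel and the sizes are illustrative and not an actual IFS dwarf:

    import time
    import numpy as np

    def kernel(q: np.ndarray) -> np.ndarray:
        # Stand-in for a dwarf kernel: a cheap, vectorisable pointwise update.
        return np.maximum(q, 0.0) * 1.0001 + 1.0e-6

    # Time the kernel while varying the problem size, as one would do when
    # characterising a mini-application on a new architecture.
    for n in (10_000, 100_000, 1_000_000, 10_000_000):
        q = np.random.default_rng(0).standard_normal(n)
        t0 = time.perf_counter()
        for _ in range(10):
            q = kernel(q)
        elapsed = (time.perf_counter() - t0) / 10
        print(f"n={n:>9d}  {elapsed * 1e3:8.3f} ms per call")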

Enhancing Global Offshore Oil Pollution Monitoring with ECMWF Data

Harry McCormack , Becky Warrilow , Hamish Robertson 1

1CGG

Monitoring the temporal and spatial variability of offshore oil pollution, primarily caused by oil and gas infrastructure and shipping activities, is crucial for environmental protection and regulatory compliance. Satellite imagery offers a valuable source of near real-time information on oil pollution, providing global coverage. However, incorporating additional datasets such as wind speed and ocean currents can significantly improve the accuracy of pollution detection. To address the challenge of monitoring at a global scale with an increasing volume and frequency of data, the integration of high-performance computing (HPC) and machine learning technologies is essential.

SeaScope, a system for monitoring offshore oil pollution, utilizes HPC and machine learning for scalable global monitoring. It also offers human-in-the-loop options for additional verification. In this workshop presentation, we will showcase some examples of SeaScope at work. We will demonstrate forecast/hindcasting models incorporating ECMWF/OSCAR data and tangible examples of its usage in crisis response scenarios. Additionally, we will discuss the benefits of migrating to Kafka for applications running on HPC, highlighting how it enhances data processing efficiency.

SeaScope has been deployed in a new HPC datacenter utilizing immersion cooling technology which enables efficient heat dissipation. This enhances the overall energy efficiency, capabilities and reliability of the system. Leveraging HPC, SeaScope can process and analyze large volumes of data in parallel, enabling faster and more accurate detection of offshore oil pollution worldwide.

Utilizing ECMWF data alongside satellite imagery, SeaScope offers a feasible integrated approach to accurately monitor global offshore oil pollution. This supports timely decision-making and contributes to the preservation of the environment through efficient enforcement of standards and regulations. During the workshop, we will discuss the practical aspects of SeaScope's implementation, including its utilization of HPC with immersion cooling technology, and showcase its capabilities through real-world examples.

Destination Earth, Data Spaces and a European Cloud Federation: opportunities and challenges

Charalampos (Babis) TSITLAKIDIS 1

1European Commission - DG CONNECT

Destination Earth (DestinE) is a major initiative of the European Commission. It aims to develop a highly accurate digital model of the Earth to monitor and predict environmental change and human impact, supporting sustainable development.
DestinE will unlock the potential of digital modelling of the Earth system at a level that represents a real breakthrough in terms of accuracy, local detail, access-to-information speed and interactivity. DestinE will use unprecedented observation and simulation capabilities, powered by Europe’s HPC computers, a vast amount of data sources and the latest developments in Cloud Computing and AI. Thanks to this we will be better prepared to respond to major natural disasters, adapt to climate change and predict the socioeconomic impact.

At the same time, to harness the value of data for the benefit of the European economy and society, the Commission supports the development of common European data spaces in strategic economic sectors and domains of public interest (e.g. Green Deal, health, agriculture, manufacturing, energy, mobility) together with the deployment of a smart middleware (“SIMPL”) that will enable cloud-to-edge federations and support all major data initiatives funded by the Commission, such as DestinE, EOSC and the common European data spaces.

In this context, a number of opportunities have been already identified while in parallel, a number of challenges will have to be tackled.

Testing weather code on multiple HPC systems

Johannes Bulin 1

1ECMWF

ECMWF develops and maintains the Integrated Forecasting System (IFS) to deliver world-leading research in earth system science and produce operational forecasts critical for our member states and customers. As the IFS is complex and consists of several million lines of code, intensive testing is required. Among other things, we use automated testing and continuous integration for the IFS. While these are long-established techniques, applying them to scientific software that runs on multiple HPC systems with different architectures presents additional challenges that will be addressed in this talk.

As tests must be run on different HPC systems (not all of them under ECMWF control, as part of the DestinationEarth initiative), authentication to these systems must be handled in a safe, robust and easily maintainable manner. We present our currently used approaches and software tools to handle authentication to these different HPC systems.

Another issue is that the output of our weather forecasts must be checked for correctness on different platforms (CPU/GPU, different compilers, different hardware). As these tests should be fully automated, no human interaction - such as score cards - should be necessary. We present different validation mechanisms and discuss their advantages and shortcomings.
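
One fully automated validation mechanism of this kind is a tolerance-based comparison of output fields against a trusted reference, sketched below; the tolerances are illustrative and not those used for the IFS:

    import numpy as np

    def validate(field: np.ndarray, reference: np.ndarray,
                 rtol: float = 1e-6, atol: float = 1e-9) -> bool:
        # Pass if the field matches the reference to within round-off-level
        # tolerances, so results from different compilers or hardware can differ
        # slightly without being flagged as errors.
        return bool(np.allclose(field, reference, rtol=rtol, atol=atol))

    reference = np.random.default_rng(1).standard_normal((181, 360))
    candidate = reference * (1.0 + 1e-12)   # e.g. output from another platform

    assert validate(candidate, reference)
    assert not validate(candidate + 0.5, reference)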

Pace: A GPU-enabled implementation of FV3GFS using GT4Py

Christopher Kung 1, Eddie Davis 2, Jeremy McGibbon 2, Tobias Wicky , Rhea George 2, Johann Dahm 2, Linus Groner 3, Lucas Harris , Florian Deconinck 1, Tal Ben-Nun 4, Oliver Fuhrer 5, Elynn Wu 2, Oliver Elbert 6

1NASA, 2Allen Institute for Artificial Intelligence, 3ETH Zurich, CSCS, 4ETH Zurich, 5Allen Institute of Artificial Intelligence, 6NOAA

As heterogeneous compute architectures become more prevalent in HPC platforms, multiple approaches are being explored to transition weather and climate models onto these systems. One promising method is to use a domain-specific language for weather and climate modeling, allowing the same frontend code to run at speed on multiple hardware backends. This avoids issues of code duplication and eases model development and maintenance significantly. GridTools for Python (GT4Py) is one such DSL that uses Python as a frontend language but translates code into highly-optimized C++ or CUDA during compilation. The Python frontend also gives modelers access to the ecosystem of Python packages and tools, such as pytest.

We present Pace, a GT4Py implementation of the nonhydrostatic FV3 dynamical core and the GFDL cloud microphysics that achieves a 3.5-4x speedup over Fortran on GPU-accelerated systems. Pace also improves developer productivity by enabling novel workflows, test cases, and tools: we can subtract the Pace dynamical core from itself at different model timesteps to ensure it is not stateful, easily incorporate machine-learning emulators, and directly integrate any Python pre- or post-processing routines. Work is ongoing at NOAA and NASA to increase the model's capabilities, but already Pace demonstrates the power of DSL programming and shows great promise for the future of numerical modeling.
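
The statefulness check mentioned above can be written as a short test: two calls with identical inputs must produce identical outputs, otherwise hidden state is leaking between calls. The step function below is a stand-in, not Pace itself:

    import numpy as np

    def dycore_step(state: np.ndarray) -> np.ndarray:
        # Stand-in for one dynamical-core timestep; a pure function of its input.
        return state + 0.1 * np.roll(state, 1) - 0.1 * np.roll(state, -1)

    def test_not_stateful():
        state = np.random.default_rng(0).standard_normal(128)
        first = dycore_step(state.copy())
        second = dycore_step(state.copy())
        # "Subtracting the core from itself" must give exactly zero everywhere.
        assert np.array_equal(first, second)

    test_not_stateful()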

GT4Py: A Python framework for weather and climate applications

Till Ehrengruber 1

1CSCS / ETH Zürich

GT4Py is a Python framework for weather and climate applications that simplifies the development and maintenance of high-performance codes in prototyping and production environments. GT4Py separates model development from hardware-architecture-dependent optimizations, instead of intermixing the two in the source code as is regularly done in lower-level languages like Fortran, C, or C++. Domain scientists focus solely on numerical modeling, using a declarative embedded domain-specific language that supports the common computational patterns of dynamical cores and physical parametrizations. An optimizing toolchain then transforms this high-level representation into a finely tuned implementation for the target hardware architecture. This separation of concerns allows performance engineers to implement new optimizations or support new hardware architectures without requiring changes to the application, increasing productivity for domain scientists and performance engineers alike. From early prototypes in 2015, GT4Py has by now been successfully applied in various projects (EXCLAIM, KILOS, Pace) to port atmospheric models (ICON, FVM, FV3GFS) to heterogeneous architectures. We will present our journey towards making GT4Py a valuable tool for the weather and climate modeling community facing an increasingly diverse landscape of computing platforms. Our talk will focus on the user interface of GT4Py and an overview of the optimizing toolchain.
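
To convey the declarative flavour without pinning down the exact, version-dependent GT4Py API, the sketch below writes a simple stencil in plain NumPy; in GT4Py the same update would be expressed once in the embedded DSL and compiled by the toolchain into optimized CPU or GPU code:

    import numpy as np

    def laplacian(in_field: np.ndarray, out_field: np.ndarray) -> None:
        # Plain-NumPy version of a 2D Laplacian stencil on the interior points.
        # In a DSL such as GT4Py this update is written once, declaratively, and
        # the toolchain generates optimized loops for each target backend.
        out_field[1:-1, 1:-1] = (
            -4.0 * in_field[1:-1, 1:-1]
            + in_field[2:, 1:-1] + in_field[:-2, 1:-1]
            + in_field[1:-1, 2:] + in_field[1:-1, :-2]
        )

    field = np.random.default_rng(0).standard_normal((64, 64))
    result = np.zeros_like(field)
    laplacian(field, result)
    print("interior mean of the Laplacian:", result[1:-1, 1:-1].mean())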

Implementation of the Production European Weather Cloud

Xavier Abellan , Stig Telfer 1

1StackHPC Ltd

In partnership with EUMETSAT, ECMWF operates the European Weather Cloud (EWC) as a community cloud, dedicated to serving users of the European Meteorological Infrastructure. The European Weather Cloud's purpose is to facilitate data analysis on weather and climate datasets using cloud-native infrastructure and support the innovation of new and diverse methods for weather and climate data analysis, such as:

  1. Hosted web services, providing "virtual lab" environments for weather and climate data analysis.
  2. Data-intensive processing applications which will encompass traditional HPC applications.
  3. Machine Learning and AI.
  4. Virtual Desktop Infrastructure (VDI), creating workstation environments close to the data.
  5. Data science environments using JupyterLab/JupyterHub.
  6. Kubernetes-hosted scientific workloads.

During 2023, the EWC has now progressed to a production phase which includes two separate cloud deployments of OpenStack and Ceph, named the Common Cloud Infrastructure. Each production cloud is intended to be able to operate independently. A separate test and validation environment is also included in the production phase, to support cloud operations.

This talk, presented jointly by ECMWF and StackHPC, will provide an overview of the EWC private cloud architecture, highlighting performance-driven solutions for the internal network, storage and GPU acceleration.

Data streaming and job coordination in the Destination Earth climate twin

Sebastien Cabaniols 1, Ali Mohammed 1, Christopher Haine 1, Utz-Uwe Haus 2

1HPE EMEA Research Lab, 2HPE HPC EMEA Research Lab

Digital twins of the Earth are one of the key workflows that require data streams between measurements and multiple computational models, as well as a dynamically changing set of analysis applications. This exacerbates the well-known High Performance Computing (HPC) problem of dealing with data-intensive workloads, challenging users at multiple levels of the stack, be it as high as the workflow management level in orchestrating and coupling applications, or as low as the memory hierarchy level, where memory and storage technologies converge but carry with them incompatible software legacies for managing data.

In this presentation we discuss how the Climate Twin project (DE-340) is venturing to use ECMWF’s MultIO layer together with the Maestro data- and memory-aware middleware to enable high-performance dynamic coupling of applications in dynamic workflows, including data-availability notification and automatic staging and archiving, potentially from and to the Destination Earth data lake. We also discuss how remote event signaling using a Maestro proxy could be used for cross-site workflow coordination, a feature that we believe will become necessary in federated HPC system usage in the coming years, also beyond the Destination Earth project.

This abstract is not assigned to a timetable spot.

Modular Supercomputing: enabling application diversity in HPC

Estela Suarez 1

1Forschungszentrum Juelich GmbH, Juelich Supercomputing Centre

The ever-increasing demand for high-performance computing (HPC) resources across various scientific domains leads to a growing diversity of HPC applications. This diversification is driven by the integration of new algorithmic approaches, such as machine and deep learning, into existing workflows, as well as the emergence of digital twins that model complex phenomena in precise detail through multi-physics and multi-scale workflows. However, this diversity in algorithmic approaches and physical models poses a significant challenge for HPC technology providers and hosting sites. They must design HPC systems capable of accommodating these varied requirements while managing energy consumption and operational costs.
In response to this challenge, the Modular Supercomputing Architecture (MSA) offers a solution by orchestrating heterogeneous computer resources, including CPUs, GPUs, many-core accelerators, and disruptive technologies, at the system level. These resources are organized into compute modules, which are clusters optimized for specific application classes and user demands. Interconnected via a high-speed network and a common software stack, the MSA creates a unified machine, enabling users to tailor their applications' hardware resources by choosing how many nodes to utilize in each module. An advanced scheduler and dynamic resource manager maximize system utilization by efficiently allocating resources to jobs.
The MSA facilitates the execution of multi-physics or multi-scale simulations across compute modules through a global system-software and programming environment. Application workflows that involve different actions can be distributed to run on the most suitable hardware, enabling users to adapt their codes for optimal performance.
This talk will delve into the core features of the Modular Supercomputing Architecture, providing insights into its hardware and software elements through real-world examples of running MSA-systems. Attendees will gain high-level guidance on adapting their codes to utilize the MSA effectively. Finally, the talk will conclude by offering an outlook on the MSA's potential in the Exascale computing era, promising new possibilities for high-performance computing across diverse scientific domains.

This abstract is not assigned to a timetable spot.

Transforming Weather and Climate Code with PSyclone

Sergi Siso 1, Andrew Porter 2, Rupert Ford 1

1STFC Hartree Centre, 2Science and Technology Facilities Council, UK

PSyclone is a code-generation and transformation system designed to enable weather and climate models written in Fortran to achieve performance portability and to help with code maintenance. PSyclone supports two modes. The first mode takes existing unmodified, potentially MPI-parallel, code which PSyclone transforms to make efficient use of multi-core CPUs or GPUs. This approach is being applied successfully to directly-addressed codes found in ocean modelling such as NEMO and to parametrisation schemes found in atmosphere models such as UKCA. The second mode follows the PSyKAl separation of concerns to implement a Fortran-embedded DSL. Two DSLs have been implemented, one for mixed finite element codes which is being used in the U.K. Met Office's LFRic atmosphere model and one for directly addressed 2D finite-difference code. In its DSL mode PSyclone can additionally generate the required MPI-parallel code.

This presentation will concentrate on the code transformation aspects of PSyclone and its use in Weather and Climate codes. Firstly, the requirements and status of PSyclone's code transformation support for LFRic will be discussed. The current status of PSyclone's use with NEMO will then be provided, including results from running with OpenMP GPU-offload directives as well as OpenACC directives. PSyclone's use with, and development for, other codes such as NEMOVAR, CROCO and SOCRATES will also be outlined. Finally, a tool to generate adjoint code from tangent-linear code, which is based on PSyclone's code transformation capabilities, will be introduced.
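
As a minimal illustration of the transformation idea, the toy script below (self-contained Python string manipulation, not PSyclone's own API, which operates on a full parse tree of the Fortran source) inserts an OpenMP directive above an existing loop nest while leaving the science code untouched.

    import re

    FORTRAN = """\
    do j = 1, nj
      do i = 1, ni
        t(i, j) = t(i, j) + dt * rhs(i, j)
      end do
    end do
    """

    def add_omp_parallel_do(source):
        """Insert an OpenMP directive before the outermost 'do' statement."""
        return re.sub(r"^(\s*do\b)", r"!$omp parallel do\n\1", source,
                      count=1, flags=re.M)

    print(add_omp_parallel_do(FORTRAN))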

This abstract is not assigned to a timetable spot.

Diversifying HPC: Services offered by ESiWACE3

Erwan Raffin 1, Rosa Rodriguez Gasen 2, David Guibert 3, Joachim Biercamp 4

1Atos, 2BSC, 3Eviden, 4DKRZ

ESiWACE3, the third phase of the Centre of Excellence (CoE) in Simulations of Weather and Climate in Europe, is funded by the EuroHPC Joint Undertaking to provide support for the wider community of weather and climate modelling in the use of state-of-the-art supercomputers and new architectures.

The objectives of the CoE are i) to transfer and to establish knowledge and technology for efficient and scalable simulations of weather and climate across the Earth system modelling community in Europe, ii) to close common technology gaps in the knowledge and toolbox for high-resolution Earth system modelling via joint developments across the European community, and iii) to serve as a sustainable community hub for training, communication, and dissemination for high-performance computing for weather and climate modelling in Europe.

We will present the ESiWACE3 service establishing collaborative projects in which the ESiWACE experts provide advice and engineering effort to applicants. More than offering engineering time and implementing code modifications, the projects aim to provide guidance and establish knowledge transfer between HPC experts and modelling groups.

Further, we will present the High Performance Climate and Weather Benchmark (HPCW) that is meant to isolate key elements in the workflow of weather and climate prediction systems to improve performance and to allow a detailed performance comparison for different hardware, thus fostering interactions with HPC vendors and technology providers.

Finally, we will give an overview of other ESiWACE3 services, like recommendations for domain-specific data compression derived from community input and our portfolio of training activities.

For all services, we have the ambition to reach the whole European modelling community, from MSc and PhD students in Earth system sciences to senior engineers and scientists. Specific attention is paid to attracting female participants and participants from inclusiveness target countries.

This abstract is not assigned to a timetable spot.

Numerical weather prediction at MeteoSwiss using ICON on GPUs

Daniel Hupp 1, Dmitry Alexeev 2, William Sawyer 3, Guillaume Van Parys 1, Fabian Gessler 4, Rocco Meli 3, Mikael Stellio 4, Carlos Osuna 1, Jonas Jucker 5, Xavier Lapillonne 1, Victoria Cherkas 4, André Walser 1, Remo Dietlicher 4, Ulrich Schaetler 6, Marek Jacob 6, Gabriel Vollenweider 1

1MeteoSwiss, 2Nvidia, 3CSCS, 4MeteoSwiss, C2SM, 5C2SM, 6DWD

Numerical weather prediction can greatly benefit from increased computing capacity through improved resolution, model complexity or number of ensemble members. Many improvements in high-performance computing in recent years have come from new hardware architectures that are more efficient than traditional CPUs. Among them, Graphics Processing Units (GPUs) have emerged as a leading technology, and most of the world's fastest supercomputers are now equipped with such architectures. To take advantage of this progress, the ICON numerical weather prediction and climate model has been adapted to run on GPUs. We present the specific challenges associated with porting such a large model for numerical weather prediction applications. To use the hybrid GPU system efficiently in an operational context, a holistic approach is needed that considers all steps from input to output, including data assimilation, with a strong constraint on time to solution. We first present the porting strategy for the different model components, considering a baseline GPU implementation using OpenACC compiler directives. We then show performance results and optimizations for the future operational ICON-CH1 and ICON-CH2 forecast configurations and the KENDA-1 assimilation cycle on Nvidia A100 GPUs. Finally, we discuss the opportunities and challenges of running on the ALPS supercomputing infrastructure at the Swiss supercomputing centre CSCS in an operational context.

This abstract is not assigned to a timetable spot.

Foundation Model in Earth Science: Towards Atmospheric Prediction and Analysis

Ankur Kumar 1, Rahul Ramachandran 2, Tsengdar Lee 3, Udaysankar Nair 1, Christopher E. Phillips 1, Sujit Roy 1

1University of Alabama in Huntsville, 2NASA Marshall Space Flight Center, 3NASA

Transformers, in general, have been explored in the domain of language and vision models to understand long sequences and can be utilized in different Earth Science applications. Foundation models in the domain of vision and language utilizing transformers have shown great promise for generalization over different downstream applications [1]. Fourier neural operator-based token-mixer transformers by Guibas et al., with a ViT backbone, have been used for predicting wind and precipitation on the ERA5 dataset [2,3]. Our goal is to build a foundation model for atmospheric prediction and analysis. For this task, we intend to use model-level data from MERRA2 and HRRR. An initial study was performed using the FourCastNet architecture, trained on the MERRA2 dataset with 38 variables [4]. We trained on data from 2000 to 2015 and initialized predictions with conditions from 2017. The tracks and intensities of multiple category 4-5 hurricanes were forecast, producing track errors in the range of 50-70 km and 70-100 km for the 6-hour and 18-hour predictions, respectively. Errors in estimating hurricane intensity were relatively larger, at 5-7 hPa and 10-15 hPa for the 6-hour and 18-hour predictions. Figure 1 shows the forecast of Hurricane Florence using MERRA2 data, compared with ERA5. FourCastNet showed promise in forecasting tasks with lead times of up to 5 days. However, FourCastNet is not a foundation model and was designed primarily with the goal of forecasting wind and precipitation.

Future work includes designing a foundation model pretrained with a Swin Transformer [5] or MAE [6] based backbone, using a Swin encoder and decoder in a U-Net-based approach. Additionally, we intend to use different attention mechanisms to leverage the model's potential to understand physics and to solve partial differential equations. Currently, we are aiming at downstream applications such as weather forecast generation (in English), identifying changes at the mesoscale and microscale, and detecting global-scale weather patterns such as the Inter-Tropical Convergence Zone or the Indian Ocean Dipole.

  1. Bommasani, R., Hudson, D.A., Adeli, E., Altman, R., Arora, S., von Arx, S., Bernstein, M.S., Bohg, J., Bosselut, A., Brunskill, E. and Brynjolfsson, E., 2021. On the opportunities and risks of foundation models. arXiv preprint arXiv:2108.07258.
  2. Guibas, J., Mardani, M., Li, Z., Tao, A., Anandkumar, A. and Catanzaro, B., 2021, September. Efficient Token Mixing for Transformers via Adaptive Fourier Neural Operators. In International Conference on Learning Representations
  3. Pathak, J., Subramanian, S., Harrington, P., Raja, S., Chattopadhyay, A., Mardani, M., Kurth, T., Hall, D., Li, Z., Azizzadenesheli, K. and Hassanzadeh, P., 2022. Fourcastnet: A global data-driven high-resolution weather model using adaptive fourier neural operators. arXiv preprint arXiv:2202.11214.
  4. Lee, T., Roy, S., Kumar, A., Ramachandran, R. and Nair, U., 2023, February. Long-Term Forecasting of Environment variables of MERRA2 based on Transformers. In European Geosciences Union (EGU) General Assembly 2023.
  5. Liu, Z., Ning, J., Cao, Y., Wei, Y., Zhang, Z., Lin, S. and Hu, H., 2022. Video swin transformer. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 3202-3211).
  6. He, K., Chen, X., Xie, S., Li, Y., Dollár, P. and Girshick, R., 2022. Masked autoencoders are scalable vision learners. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 16000-16009).
This abstract is not assigned to a timetable spot.

Computational Storage as seen from the ExCalidata project

Jean-Thomas Acquaviva 1, Konstantinos Chasapis 2

1DDN Storage, 2DDN

Scientific applications routinely produce and consume huge amounts of data, which requires back-and-forth data movement between compute nodes and the storage system.
However, some operations have limited arithmetic intensity, meaning that costly data movement is triggered from storage to compute only to apply a simple operation. Computational (or active) storage is an answer to this problem: the storage system is extended with limited computational capabilities. For specific numerical kernels, the compute nodes express the operation and the data extent to which it is to be applied; the storage performs the computation and sends back the result.
This dramatically reduces the volume of data transiting the network, consequently improving performance and reducing energy costs.
The practical implementation of this concept remains non-trivial, as many conceptual and engineering aspects have to be considered.
We propose to present our ongoing work conducted in the framework of ExCalidata, a sub-project of the UK Excalibur national initiative. We will detail the challenges, the API, and the prototype that has been realized.
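
A minimal sketch of the idea follows, with purely hypothetical names rather than the ExCalidata API: the compute side names an object, a data extent and a simple operation; the storage side applies the operation and returns only the result.

    import numpy as np

    class StorageServer:
        """Stands in for a storage system extended with limited compute."""
        def __init__(self, objects):
            self.objects = objects                  # object name -> array

        def offload(self, name, extent, operation):
            data = self.objects[name][extent]       # the read happens storage-side
            if operation == "mean":
                return float(np.mean(data))         # only a scalar crosses the network
            raise ValueError(f"unsupported operation {operation!r}")

    store = StorageServer({"temperature": np.random.rand(1_000_000)})
    # Instead of shipping 500,000 values to the compute node, request just the mean:
    result = store.offload("temperature", slice(0, 500_000), "mean")
    print(result)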

This abstract is not assigned to a timetable spot.

MultIO: A framework for message-driven data routing in high-resolution weather and climate modelling

Domokos Sarmany 1, Mirco Valentini 1, Razvan Aguridan 1, James Hawkes 1, Tiago Quintino 1, Simon Smart 1, Philipp Geier 1

1ECMWF

In numerical weather prediction and high-performance computing, the computational bottleneck has gradually shifted from floating-point arithmetic to data throughput to and from storage – a phenomenon often referred to as the I/O performance gap. MultIO, a set of software libraries developed at ECMWF, offers a mechanism to alleviate this by moving computation on data closer to where the data are produced. It offers two related, but distinct, functionalities.

  • An asynchronous IO-server to decouple data output from model computations.
  • A user-programmable processing pipeline that operates on model output directly, accessing data while they are still in memory and thus reducing the storage I/O operations required for some of the most frequent post-processing tasks.

One of the fundamental design decisions behind MultIO is that routing decisions are based on the metadata attached to each individual message. The user may control the type and amount of post-processing by setting the message metadata via the Fortran/C/Python APIs, and by configuring a processing pipeline of actions. There is also support for users to implement their own actions.
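
The sketch below illustrates this metadata-driven routing with purely hypothetical names (it is not the MultIO Fortran/C/Python API): each message carries metadata, and a configurable pipeline of actions filters, processes and writes the data while they are still in memory.

    from dataclasses import dataclass, field

    @dataclass
    class Message:
        metadata: dict                      # e.g. {"param": "2t", "level": 0, "step": 6}
        payload: list = field(default_factory=list)

    def select_surface(msg):
        """Pass through only surface-level fields."""
        return msg if msg.metadata.get("level") == 0 else None

    def compute_mean(msg):
        """Example in-memory post-processing action."""
        mean = sum(msg.payload) / len(msg.payload)
        return Message({**msg.metadata, "statistic": "mean"}, [mean])

    def sink(msg):
        print(f"writing {msg.metadata} -> {msg.payload}")
        return msg

    PIPELINE = [select_surface, compute_mean, sink]

    def route(message):
        for action in PIPELINE:
            message = action(message)
            if message is None:             # filtered out by an action
                return

    route(Message({"param": "2t", "level": 0, "step": 6}, [280.0, 281.5, 279.2]))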

MultIO has been successfully used in the digital twins of the European Union's Destination Earth initiative, as well as in other EU projects such as MAESTRO, ACROSS and NextGEMS. For high-resolution climate runs in particular, it has drastically reduced the ‘off-line’, I/O-intensive post-processing required for meteorological and climate research. It will form part of the operational cycles of the IFS for ocean data assimilation, as well as for the generation of the next ECMWF reanalysis dataset (ERA6).

This abstract is not assigned to a timetable spot.

LFRic and NGMS: Meeting challenges of exascale through diversifying HPC

Iva Kavčič 1

1Met Office

LFRic is the new weather and climate modelling system being developed by the Met Office to replace the existing Unified Model (UM) in preparation for the challenges of exascale computing. The GungHo dynamical core at the heart of LFRic relies on fundamentally different data structures and layouts from those of the UM. These, as well as the requirement for increased resolution, have led to unprecedented technical challenges, such as compiler support, high data volumes and exploiting the opportunities presented by heterogeneous architectures.

The guiding design principle in LFRic, imposed to promote performance portability, is “separation of concerns” between the science and parallel code. The design of the supporting infrastructure facilitates modularity and the use of emerging high-performance computing (HPC) technologies, such as Domain-Specific Languages (DSLs) and parallel I/O libraries.

All the fundamental changes above mean changes in several components of the operational forecast workflow. The Next Generation Modelling Systems Programme, organised under a series of projects of which LFRic is one, coordinates the diverse modelling approaches and technologies to cover all aspects of this workflow. We will give an update on the status of the NGMS programme and its implementation at the Met Office.

Last, but certainly not least, key elements for the success of any complex programme are the expertise and skills of the people working on it. People working in NGMS need a wide range of skills at the intersection of several STEM-related areas, which poses challenges in identifying and developing technical expertise. It is therefore imperative to continue to widen and diversify our talent pool and to develop it by providing training and support. In NGMS we are making use of resources across multiple teams, as well as of collaboration with external partners. We are also continuously improving our recruitment process to support this.

This abstract is not assigned to a timetable spot.

AI for Simulation: Accelerating HPC with AI and IPUs

Alexander Titterton 1

1Graphcore

For many years, researchers have been solving the world’s most complex scientific problems by undertaking traditional HPC techniques across a wide range of applications. However, due to the growing complexity of calculations, as well as operational costs, a new, more efficient approach is required.

Advances in Artificial Intelligence have already begun to revolutionise applications such as Computer Vision and Natural Language Processing, with services such as ChatGPT becoming a household name. Moreover, classical HPC processes such as simulation and numerical analysis can be sped up by orders of magnitude using novel AI techniques.

In this talk, we will provide a technical introduction to Graphcore’s Intelligence Processing Unit (IPU) and explore how traditional HPC workloads can be enhanced and accelerated using AI techniques running on the IPU. We’ll also explore how innovators are adopting this new approach with applications relating to weather forecasting $^{1}$, climate modelling and computational fluid dynamics.

$^{1}$ M. Chantry et al. "Machine learning emulation of gravity wave drag in numerical weather forecasting" Journal of Advances in Modeling Earth Systems 13.7 (2021)

This abstract is not assigned to a timetable spot.

Code refactoring patterns targeting bandwidth optimized architectures and heterogeneous architectures

Ruchira Sasanka 1, Till Rasmussen 2, Mads Ribergaard 2, Jacob Poulsen 1

1Intel, 2The Danish Meteorological Institute

This presentation focuses on parallel refactoring patterns at different levels (SIMD refactoring, OpenMP refactoring and MPI refactoring) to target heterogeneous architectures. Recent versions of the OpenMP standard allow several different approaches to parallelism, and we will explore each of these options and their effect on performance and energy when targeting both CPUs and GPUs.
The sample code used to illustrate the refactoring patterns and the corresponding performance comes from sea ice dynamics, in particular the solver for the Elastic-Viscous-Plastic (EVP) equations. The EVP approach is probably the most commonly used solver for sea ice dynamics (e.g. CICE, SI3/NEMO and FESOM). It is based on finite differences operating on a masked grid, and it introduces artificial elastic waves that are iteratively damped. Hundreds of iterations are necessary to reach a solution, and an additional computational challenge is that the number of active compute grid points evolves dynamically during the simulation. These features pose technical challenges to the implementation, and the case therefore allows us to illustrate several refactoring patterns, as sketched below.
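
The following NumPy sketch captures the computational pattern described above, not the actual CICE/SI3 solver: an iterative, finite-difference-style update applied only on active grid points selected by a mask, repeated for many sub-iterations per time step.

    import numpy as np

    def evp_like_iterations(u, mask, n_iter=100, damping=0.95):
        """Repeatedly damp and smooth u on masked (active) points; hundreds of
        such sub-iterations per time step are typical for EVP-type solvers."""
        for _ in range(n_iter):
            smoothed = 0.25 * (
                np.roll(u, 1, axis=0) + np.roll(u, -1, axis=0)
                + np.roll(u, 1, axis=1) + np.roll(u, -1, axis=1)
            )
            u = np.where(mask, damping * smoothed, u)   # update active points only
        return u

    u0 = np.random.rand(256, 256)
    ice_mask = np.random.rand(256, 256) > 0.4           # active points change in a real run
    u1 = evp_like_iterations(u0, ice_mask)
    print(u1.shape)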

This abstract is not assigned to a timetable spot.

Designing sustainable buildings globally with Hybrid-cloud (digital twins)

Ryan Procter, Simon Ponsford

YellowDog worked closely with two researchers from UCL, in partnership with FCBStudios, to run 354 million discrete simulations of the effect of weather conditions from 500 international locations on a huge library of building characteristics.

The building industry accounts for 39% of global energy-related carbon emissions. Any sustainable future must find a way to reduce this to zero. The ultimate aim is to turn this data into a platform which is available across the world, enabling sustainable buildings design in any climate.

As with many research institutions, capacity of on-premise resource is limited and often fully allocated across a range of world-changing projects. In this instance, that meant the scale and timeline of the project were restricted.

YellowDog’s platform enabled the team to access all available compute across the world, and to do so rapidly. After initial meetings, both parties were excited about the potential impact of working together, and the project team decided to expand their research (by 140 million simulations), beyond what was initially intended to be a resource for UK firms building internationally.

The team used EnergyPlus, a whole building energy simulation application, modelling detailed building characteristics (708,588 files) and weather conditions (500 international locations) to simulate building design for every set of building and weather condition. 354 million discrete simulations were performed in the cloud, consuming 45,000 instance hours of compute concurrently across five cloud regions in North America and Europe.

Over the course of 11 calendar days, the full simulation was completed, creating 37.5 TB of data which will now underpin the low-carbon design solutions shared through the open platform. This has drastically accelerated the research (by months) and will potentially have a lasting impact on the timeline of international sustainability efforts.

By using a hybrid-cloud setup and accessing available (and often idle) compute across the world, this approach can be scaled and accelerated further.

This abstract is not assigned to a timetable spot.

Earth system modeling on Modular Supercomputing Architectures: coupled atmosphere-ocean simulations with ICON

Abhiraj Bishnoi 1, Catrin Meyer, René Redler 2, Norbert Eicker 3, Helmuth Haak 2, Lars Hoffmann 4, Daniel Klocke 5, Luis Kornblueh 6, Estela Suarez 7, Olaf Stein 8

1Kalkulo AS Oslo, 2MPI-Met Hamburg, 3Research Centre Jülich, 4Forschungszentrum Jülich, 5Max Planck Institute for Meteorology, 6Max Planck Institute for Meteorology, 7Forschungszentrum Juelich GmbH, Juelich Supercomputing Centre, 8Research Centre Juelich

Running complex Earth System model codes on modern supercomputing architectures poses challenges to efficient modelling and job submission strategies. The modular setup of these models naturally fits a modular supercomputing architecture (MSA), which tightly integrates heterogeneous hardware resources into a larger and more flexible HPC system. While parts of the ESM codes can easily take advantage of the increased parallelism and communication capabilities of modern GPUs, others lag behind due to long development cycles or are better suited to classical CPUs due to their communication and memory usage patterns.
To better cope with these imbalances between the development of the model components, we performed benchmark campaigns on the JUWELS modular HPC system. We enabled the weather and climate model ICON to run in a coupled atmosphere-ocean setup, in which the ocean and the model I/O run on the CPU Cluster while the atmosphere is simulated simultaneously on the GPUs of the JUWELS Booster (ICON-MSA). Both atmosphere and ocean run globally at a resolution of 5 km. In our test case, an optimal configuration in terms of model performance (core hours per simulation day) was found for the combination of 84 GPU nodes on the JUWELS Booster module and 80 CPU nodes on the JUWELS Cluster module, of which 63 nodes were used for the ocean simulation and the remaining 17 nodes were reserved for I/O. With this configuration the waiting times of the coupler were minimized. Compared to a simulation performed on CPUs only, the MSA approach reduces energy consumption by 59% with comparable runtimes. ICON-MSA is able to scale up to a significant portion of JUWELS, making best use of the available computing resources. A maximum throughput of 170 simulated days per day (SDPD) was achieved when running ICON on 335 JUWELS Booster nodes and 268 Cluster nodes.

This abstract is not assigned to a timetable spot.

Leveraging diversity in all dimensions to predict multiscale meteorology and its impacts

Katherine Evans 1

1Oak Ridge National Laboratory

Our stakeholders are requesting accurate, actionable weather forecasts that include information about impacts upon the water cycle, energy infrastructure, disease propagation and other related and interconnected targets. The community is motivated to realize a vision of delivering custom and complex predictions of meteorologically-based events by developing and implementing a multipronged strategy. The nature of the complexities can manifest in many dimensions, e.g. spatial regions, time scales, and data products. Our strategy includes continued computational weather model development to leverage emerging high performance computing resources through improved algorithms, portable software strategies, pervasive evaluation metrics, data management, and related research. Applying these more powerful and flexible models to diagnose and predict a diverse suite of weather impact events, e.g. flooding, air pollution, power interruptions, disease outbreaks, etc., requires coordinated and specialized interactions of weather model output with multiple impact model types, data structures and volumes, software, security protocols, and scientific cultures. This talk presents considerations in planning, executing, and optimizing multidomain coupling, workflows and communication strategies to create an organized ecosystem of models, data, and protocols for these complex yet compelling science challenges. Several examples of projects are presented to highlight progress in this direction and how much interesting research still lies before us. The creation of best practices, the development of community benchmarks, the identification of community needs, and the inclusion of all voices in the process are also needed to enable multifaceted, actionable, and flexible forecasts of weather impacts that will save lives, mitigate damage, and spur adaptation to future risks.

This abstract is not assigned to a timetable spot.

Weather forecasting on the cloud: an evaluation of IFS performance on Azure HPC instances

Cathal O Brien 1

1ECMWF

An increasingly large fraction of HPC workloads is being run on the cloud. The flexible access to modern hardware provided by the cloud is appealing. However, achieving optimal performance for HPC workloads on the cloud requires additional work, due to cloud-specific factors such as hypervisor-reserved CPU cores. This work evaluates the performance of running ECMWF workloads on Microsoft Azure's HPC instances.

This work begins by describing the workflow involved in running HPC jobs on Azure, including how the Virtual Machine Images were developed and how HPC clusters can be deployed on the cloud.

Azure's HBv2, HBv3 and HBv4 instances were benchmarked, and the performance compared to ECMWF's on-premises Atos machine. The standalone IFS components CloudSC and ecRad were benchmarked to establish single-node performance characteristics. ECTrans and representative IFS runs were used to test scalability across many nodes. The performance of Azure's parallel file system was evaluated using representative IFS runs with I/O enabled.

This abstract is not assigned to a timetable spot.

The concept of NCAR's community software facility

Thomas Hauser 1, Tricia O'Keefe 1, Glen Romine 2

1National Center for Atmospheric Research, 2NCAR

NCAR’s community models have been instrumental in advancing our understanding of weather, climate, and the Earth system. These models are the scientific instruments that are essential for our understanding of the Earth system. The model complexity escalates as the community moves towards ultra-resolution Earth system modeling, ensemble simulations, and the integration of massive volumes of remotely sensed observations. Current community modeling codes are challenging to run at ultra-resolution, limiting scientific exploration, and ensemble simulations, while crucial for reducing uncertainty, are computationally expensive. The rapid changes and innovation in processor architectures, especially the transition of major HPC systems to GPU acceleration, present a challenge for our community software. Additionally, the rise of artificial intelligence and machine learning (AI/ML) offers opportunities to improve code performance and add value to Earth system modeling.

This presentation will give an overview of the community software facility that the National Center for Atmospheric Research (NCAR) is developing to modernize and enhance its community models. The facility approach will allow us to rethink how we do community model support and development. The facility's goals are reducing technical debt, refactoring our model software, and updating the diagnostics and analysis software infrastructure to create robust, portable, and sustainable community software, making these models more performant and user-friendly. It will also foster a more inclusive Earth system science community by improving the ease of use of scientific workflows and user support services, thereby increasing accessibility to community models. Additionally, improving how we accept community contributions is part of our design process. We intend to develop this emerging approach with various universities, US agencies, and international partners. The outcome of this initiative will enable critical, actionable science that is currently inaccessible with existing capabilities.

We are also proposing the establishment of a community of practice for software modernization to foster a culture of continuous learning and innovation within the weather and climate community. This community of practice will allow sharing of best practices, exchanging ideas, and collaborating on developing new tools and methodologies. It will also provide a supportive environment for capacity building, enabling members to enhance their skills and knowledge in software modernization. Furthermore, this initiative will facilitate the integration of AI/ML techniques into our software development practices, workflows, and models. By promoting open dialogue and cooperation, we aim to accelerate the pace of scientific discovery and technological advancement in Earth system modeling.

This abstract is not assigned to a timetable spot.

New Generation HPC for CMA

Shuai Deng 1

1National Meteorological Information Centre

The Meteorological High-Performance Computer (HPC) System is a critical operational platform supporting numerical weather prediction model operations and daily research tasks. Its processing capability, efficiency, reliability, and scalability are key factors influencing the achievement of overall system business objectives. The China Meteorological Administration (CMA) has been continuously updating its supercomputing systems and is currently constructing an HPC system that ensures balanced computation and I/O, scalability, stability, reliability, and ease of management. This system uses the latest Intel 6458Q 32-core 3.1 GHz processors and DDR5 memory, with a peak performance of no less than 46 PetaFLOPS and a total available storage of no less than 138 Petabytes. Currently, the system is in the trial operation phase, gradually carrying out the migration of CMA numerical weather forecasting models. Notably, the CMA-GFS and CMA-MESO models have achieved approximately 30% performance improvements, with further enhancements ongoing. Additionally, this HPC system supports training and inference of large-scale, artificial-intelligence-based weather models. The deployment of the Pangu-Weather and FourCastNet models has been completed, and training based on CMA reanalysis data is ongoing.

This abstract is not assigned to a timetable spot.

Save Up to 220 Tons of CO2e per PB with the WEKA Data Platform

Derek Burke 1

1WekaIO UK Ltd.

The increased pressure on organisations to deliver data-driven business outcomes has created an exponential demand for power in modern data centres (on-premises and in the cloud), making them some of the world’s biggest consumers of power. In fact, evidence suggests that data centres today account for more than 2% of global energy consumption, a share that, if left unchecked, is projected to rise to 8% by 2030.

The WEKA Data Platform makes all data as fast as local and available via multiple protocols. This accelerates the data pipeline and increases utilisation of on-premises and cloud compute. WEKA's 'Zero Copy' architecture also reduces the complexities of having to maintain and copy data between disparate storage systems which delivers a unique strategic advantage to WEKA customers and one that has significant carbon savings and other environmental benefits.

ESG Capital Group conducted a comparative lifecycle assessment of the WEKA Data Platform and legacy data architectures to determine the potential carbon savings and other environmental benefits of moving to WEKA.
Taking into account US and global research studies to estimate the average energy consumption and CO2 emissions generated by equipment and operations, they concluded that the WEKA Data Platform saves over 220 tons of CO2e per petabyte over a typical 3-5 year lifecycle compared to a traditional data architecture. And these savings will grow with your infrastructure.

This talk will focus on the key technical capabilities of the WEKA Data Platform and how they can be leveraged to achieve huge carbon savings, accelerate your data pipeline and reduce TCO.

This abstract is not assigned to a timetable spot.

I/O management at NCEP: Problems and Progress

George Vandenberghe 1

1NOAA/NCEP

I/O performance has been a significant component of HPC capability since the genesis of HPC in the 1940s. The enormous increases in compute capability and capacity per unit cost have not been matched by commensurate increases in I/O metrics per unit cost. Space has approximately scaled with compute, but transfer rates and transactions per second have not. Both compute and I/O metrics are increasing exponentially with time, but the doubling time for I/O metrics is approximately twice that for compute metrics, leading to modern HPC systems becoming more and more I/O bound over time. I/O performance at NCEP was a trivial constraint on NWP applications in the 1980s but is a prominent constraint now. NCEP has addressed this with improvements in how the massive amounts of I/O needed for modern NWP are handled and with fundamental design changes in how I/O will be handled moving forward. Design changes made in the forecast workflow in calendar 2019 and 2020 have reduced global model I/O requirements by a factor of five and have shifted the time when I/O is expected to become cripplingly constraining again about seven to ten years further out. Other, more fundamental workflow changes that are planned will shift this even further out and, after a very pessimistic outlook in 2018, the I/O problem at NCEP is now much improved.

This abstract is not assigned to a timetable spot.

GPU adaptation of NWP single column algorithms

Michael Staneker 1

1ECMWF

This talk is intended as first part of a double-header detailing the GPU adaptation strategy for the IFS physics.

As for most applications, GPUs are and will continue to be an important cornerstone of high-performance computing in meteorology. To utilise the computational potential of GPUs fully and efficiently, it is necessary to rewrite and port legacy code to take the highly parallel architecture and memory hierarchy into account. Despite the existence of different programming models with different degrees of abstraction, there are some fundamental concepts, ideas, and optimisation strategies for GPGPU. It is crucial to understand how control flow elements are mapped to GPU hardware and memory architectures.

Therefore, this presentation aims to introduce GPU programming and the porting of Fortran NWP codes, more specifically of single-column algorithms exhibiting data dependencies only in the vertical direction, which makes them embarrassingly parallel in the horizontal. This class of algorithm is commonly used in NWP codes, and it is therefore important that it is offloaded optimally to GPUs. Starting from a vectorised Fortran code optimised for CPUs, multiple possible GPU implementations will be discussed using different programming models, from pragma-based approaches such as OpenMP and OpenACC to low-level approaches like CUDA Fortran, CUDA C, HIP and SYCL. This also includes a discussion of performance portability and the trade-off between necessary code changes and performance improvements. Further, control flow changes to allow for SIMT-style kernels and possible further optimisations such as temporary array demotion will be discussed.
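
A minimal NumPy sketch of the single-column pattern described above (illustrative only, not IFS code): each horizontal point carries a sequential recurrence in the vertical, while the column dimension is embarrassingly parallel and is what a SIMT-style GPU kernel would distribute across threads.

    import numpy as np

    def column_scheme(q, dt=60.0, decay=1.0e-4):
        """q has shape (ncolumns, nlevels): the k loop is sequential per column
        (vertical dependency), while the column dimension is embarrassingly
        parallel and can be mapped onto GPU threads."""
        out = np.empty_like(q)
        out[:, 0] = q[:, 0]
        for k in range(1, q.shape[1]):                   # vertical recurrence
            out[:, k] = q[:, k] + dt * decay * out[:, k - 1]
        return out

    q = np.random.rand(10_000, 137)                      # e.g. 137 model levels
    print(column_scheme(q).shape)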

This abstract is not assigned to a timetable spot.

Hybrid 2024: Adapting the IFS for hybrid CPU-GPU architectures

Michael Lange 1

1ECMWF

The use of specialised accelerator architectures has become widespread in high-performance computing (HPC), with many of the largest systems in the world now using graphics-processing units (GPU) to achieve unprecedented performance and throughput. The efficient usage of such accelerator architectures has long been envisioned at ECMWF and the wider weather and climate community as a way to provide the necessary computational power to further increase model resolution and enable operational km-scale modelling.

In this talk we present "Hybrid 2024", a project created to prepare the IFS for HPC accelerator architectures ahead of the next HPC procurement. In close collaboration with ECMWF member states, and supported by the Destination Earth initiative, we aim to restructure core model components of the IFS, and various technical infrastructure packages, to allow hybrid CPU-GPU execution. The focus is on sustainable solutions through modern software engineering methods that allow adaptation of the code and evaluation of performance without impinging on scientific development. While the primary target for Hybrid 2024 is current GPUs, the proposed solutions are general enough to allow subsequent adaptation to future GPU and non-GPU accelerators, as well as the exploration of bespoke solutions for alternative CPU architectures.

A combination of library development, data-structure refactoring and source-to-source translation will be used to create GPU-specific build modes that will be developed and maintained alongside the current CPU-optimised execution paradigm. The development will be accompanied by routine benchmark deployment and performance evaluation to continuously assess our ability to utilise GPU and accelerator architectures. In this talk we will provide an overview of the technical solutions proposed in this project and give an update on the current progress.

This abstract is not assigned to a timetable spot.

Towards Diversified Exascale Numerical Weather Prediction Workflows

Christopher Harrop, Stefan Gary 1, Alvaro Torreira 1, Christina Holt 2, Venita Hagerty 3, Isidora Jankov 4, Kyle Chard 5, Michael Wilde 1

1Parallel Works, 2CIRES, University of Colorado, and NOAA Global Systems Laboratory, 3CIRA, Colorado State University, and NOAA Global Systems Laboratory, 4CIRA affiliated with NOAA/ESRL/GSD, 5University of Chicago and Argonne National Laboratory

Robust automation of complex numerical weather prediction (NWP) workflows is essential both for conducting Earth system modeling research and for the reliable delivery of operational forecasts that serve the public and protect life and property. The workflows used in those two closely related, but distinct, pursuits are similar in their structure and share many of the same constituent components and requirements. However, the research and operational workflow regimes also have important differences arising from their different missions and goals. In the case of workflows used to research the application of new high performance computing (HPC) tools and technologies for NWP, many of these differences manifest as extreme computational scales and the use of emerging technologies unsuitable for an operational environment. Workflow efforts at NOAA have primarily focused on the needs of the NWP research to operations (R2O) pipeline and are subject to the rigid constraints of the existing operational framework. The exascale computing era is ushering in many new challenges and opportunities for diversification of computational technologies and techniques. We have initiated a collaborative project to develop a next-generation NWP workflow system responsive to the advanced research needs of exascale and beyond. This work is conducted under the auspices of the NOAA Software Engineering for Novel Architectures (SENA) project in collaboration with the Department of Energy’s Exascale Computing Project (ECP) and Parallel Works. The goal of this project is to evaluate the collection of existing workflow tools curated by the ECP’s ExaWorks project, and use them to build an NWP workflow system specifically designed for pioneering research into emerging HPC, cloud, and machine learning technologies. We present our vision, progress, and challenges towards development of a diversified workflow system that can straddle on-premises and cloud-based computing resources and accommodate both high-performance and high-throughput-oriented tasks while managing the associated data flows.

This abstract is not assigned to a timetable spot.

Enabling Elastic Cloud Integration with Kubernetes

Timothy Whitcomb 1

1US Naval Research Laboratory

Numerical weather prediction (NWP) models require large, tightly-coupled computational resources for parallel execution. However, the NWP ecosystem requires significant additional software as part of the production process that may or may not be tightly coupled in a parallel sense. These workloads (including service integrations) feature dynamic bursts of activity consistent with new forecast data availability, rather than a time-invariant load. Cloud elasticity (the ability to provision new resources when required and shut them down when no longer needed) is a promising avenue to deal with these spikes and lulls, but often requires integration with a particular cloud provider's infrastructure to achieve elastic scaling.

In this talk, I describe recent work leveraging the Kubernetes container orchestration platform to architect applications downstream of NWP models in a cloud environment that provides elastic scaling while maintaining a layer of cloud vendor independence and the ability to deploy across multiple clouds and to on-premises computing environments. These architectures include infrastructure-as-code with Terraform, GitOps continuous delivery using ArgoCD, and elastic performance via the Kubernetes Event-Driven Autoscaler. Taken together, this approach uses Kubernetes resource abstractions to identify integration points between the applications themselves and the on-prem or cloud environment in which they execute.

This abstract is not assigned to a timetable spot.

Running Weather, Climate, Hydrological and Farming low-latency workflows on the ACROSS platform

Paolo Viviani 1, Konstantinos Tsarpalis 2, Chris Bradley 3, Albrecht Weerts 4, Emanuele Danovaro 3, Joost Buitink 4, Ruben Imhoff 4, Tiago Quintino 3, Sven Willner 5, Olivier Terzo 1, Jairo Segura 6

1Links, 2Neuropublic, 3ECMWF, 4Deltares, 5Max Planck Institute for Meteorology, 6MPI-M

ACROSS is a EuroHPC project with the primary objective of creating a platform that facilitates the execution of diverse application workflows, encompassing large numerical simulations, machine learning, and big-data analytics, on HPC resources with optimal efficiency. The platform is validated through three large-scale pilots: Greener aero-engine modules optimisation, Geological storage of CO2 and Hydro-Meteorological and Climatological workflows.

ECMWF, MPI-M, Deltares, and Neuropublic have harnessed the potential of the ACROSS platform to develop large-scale low-latency workflows that surpass the current state-of-the-art capabilities.
In the hydro-meteorological workflow, the team achieved hydrological forecasts for the Rhine and Meuse catchments at 1km spatial and hourly temporal resolution, taking only 1 to 2 minutes per issue time after obtaining the IFS forecast. This significant advancement was made possible by employing in-situ NWP product generation, stream-like pre-processing of meteorological forcing data, and a complete re-implementation of the WFLOW hydrological model, resulting in a remarkable 11x speed improvement. Furthermore, the approach can be scaled up to accommodate more compute nodes, allowing for the simulation of all European catchments and/or multiple ensemble members.

MPI-M has also developed a novel data management subsystem called BORGES, which adopts the FDB object store and has been integrated into the ICON model. BORGES facilitates semantic data access and seamless integration of ICON climatological simulations with WFLOW, thereby enabling hydro-climatological simulations over extensive river catchments.

In addition to these developments, the new Farming advisory services are forced by 4 km global-scale IFS forecasts and 1 km WRF mesoscale downscaling with data assimilation using Neuropublic's weather station network. The team has dedicated special attention to ensuring low-latency utilization of the global-scale NWP by employing stream-like post-processing of the global-scale product during the WRF-DA data assimilation phase.

This abstract is not assigned to a timetable spot.

Project Rajin and UXarray: community tools for the analysis of kilometer scale climate and weather model outputs

Orhan Eroglu 1, John Clyne 2, Brian Medeiros 1, Colin Zarzycki 3

1NCAR, 2National Center for Atmospheric Research (NCAR), 3Pennsylvania State University

The transition by the global weather and climate modeling communities to kilometer-scale resolutions enabled by unstructured-grid dynamical cores will permit unprecedented new science and improved forecasting accuracy, but it comes with a two-fold cost. First, the increased model resolution results in the output of massive volumes of data, further exacerbating Big Data problems. Second, the existing general-purpose software tools commonly used for analyzing, post-processing, and visualizing geoscience model outputs primarily operate on structured grid data, providing little or no support for unstructured meshes. In this talk we discuss a collaboration between the U.S. National Center for Atmospheric Research and the Department of Energy’s SEATS program to lead the community development of open-source Python tools supporting fundamental analysis and visualization operations on unstructured-grid model outputs at global storm-resolving resolutions. We will provide an update on the current state of the software, and discuss our experiences with engaging a broad community of volunteer contributors in the development of these tools.

This abstract is not assigned to a timetable spot.

Optimization of the Digital Twins for Weather and Climate Predictions: an exciting battleground

Stella Valentina Paronuzzi Ticco 1, Xavier Yepes Arbós 1, Victor Correal 1, Daan Degrauwe 2, Marta Garcia 1, Denis Haumont 3, Vijendra Singh 1, Joan Vinyals 1, Mario Acosta Cobos 4

1Barcelona Supercomputing Center, 2RMI Belgium, 3Royal Meteorological Institute of Belgium, 4BSC

A simulated digital planet will be critical for evaluating different scenarios of a changing present and of what the future might look like, taking climate and weather changes into account. The European Commission’s Destination Earth (DestinE) program aims towards this by developing high-precision digital twins (DTs) of the Earth. DestinE is expected to provide the first two digital twins: the Digital Twin on Weather-Induced and Geophysical Extremes and the Digital Twin on Climate Change Adaptation.
Leveraging the available performance of pre-exascale systems, the Barcelona Supercomputing Center and the Royal Meteorological Institute of Belgium are involved in this effort as partners with leading technical expertise in analyzing and optimizing the performance of European weather and climate codes on diverse and heterogeneous platforms.
In this presentation, we will focus on the joint efforts made in the technical work packages of both DTs to increase the performance of the most computationally demanding parts of the models, the challenge represented by the need for performance portability on accelerators, and the lesson(s) learned so far on this exciting battleground.

This abstract is not assigned to a timetable spot.

Domain Specific Language Adoption into NASA’s Goddard Earth Observing System code

Amidu Oloso 1, Tsengdar Lee 2, Purnendu Chakraborty 1, Florian Deconinck 2, Christopher Kung 2

1SSAI / NASA, 2NASA

NASA Goddard Space Flight Center is undertaking an effort to develop a next-generation Goddard Earth Observing System (GEOS) model that enables an increase in resolution and scalability to meet future NASA Global Modeling and Assimilation Office requirements. Such requirements include higher-resolution production support for coupled data assimilation and global nature runs for climate observing system simulation experiments. Recognizing that accelerator-based heterogeneous High Performance Computing systems provide a promising platform for meeting future GEOS requirements, the Advanced Software Technology Group (ASTG) at NASA Goddard Space Flight Center is working to leverage such architectures by incorporating a domain-specific language (DSL) into GEOS. DSL adoption brings an opportunity for code portability and scalability across multiple platforms, including traditional CPU systems and accelerator-based systems. It also provides climate scientists and developers with a higher-level language, Python in this case, that improves productivity by abstracting away the details of the computing architecture, letting developers focus solely on the algorithms. To jump-start the DSL adoption, the ASTG is leveraging the GridTools for Python (GT4Py) DSL developed by ETH Zurich and CSCS, as well as a GT4Py port of the Cubed Sphere Dynamical Core (FV3) from the Allen Institute for AI. The ASTG is exploring a hybrid Fortran-Python approach to incorporate the GT4Py port of FV3 into GEOS and will discuss the implementation and performance characteristics of this approach for leveraging GPU-based computing.

This abstract is not assigned to a timetable spot.

The Earth-2 Service Architecture

Peter Messmer 1

1NVIDIA

An interactive climate information system accessible to a broad audience needs to handle a wide range of requests, from querying historic records to playing through what-if scenarios in future climates. This diversity implies a system that can handle heterogeneity not only at the level of the underlying raw data, but also at the level of on-demand processing to generate the final data products. The workflows for on-the-fly processing in particular, including simple post-processing of existing climate data, running inference using an AI-based model, or executing a massively parallel GPU-accelerated climate simulation, require access to resources optimized for the corresponding tasks. And while the system should present itself to the user as a single service, designing a monolithic architecture poses a range of technical challenges, especially regarding flexibility, extensibility and, ultimately, performance. The NVIDIA Earth-2 platform has therefore opted for an architecture comprised of multiple inter-operating services, each targeting specific functions, ranging from running an inference step all the way to visualization. This fine granularity of services offers not only the desired extensibility, but also the necessary flexibility to execute tasks on the most adequate resource for the job: a GPU-accelerated HPC system for process-based models, an AI-accelerated system for inference jobs, or a system with graphics-optimized GPUs for visualization tasks. In this presentation, we will give an overview of the Earth-2 service architecture, the interfaces used between the individual services, and the deployment mechanisms used to map the diverse workloads to the most suitable resources.

This abstract is not assigned to a timetable spot.

ML for High-Performance Climate: Massive Data Post Processing, Extreme Compression, and Earth Virtualization Engines

Torsten Hoefler 1

1ETHZ

Machine learning presents a great opportunity for climate simulation and research. We will discuss some ideas from the Earth Virtualization Engines summit in Berlin and several research results, ranging from ensemble prediction and bias correction of simulation output to extreme compression of high-resolution data, as well as a vision towards affordable km-scale ensemble simulations. We will also discuss programming framework research to improve simulation performance. Specifically, our ensemble-spread prediction and bias-correction network, applied to global data, achieves a relative improvement in ensemble forecast skill (CRPS) of over 14%. Furthermore, we demonstrate that the improvement is larger for extreme weather events in selected case studies. We also show that our post-processing can use fewer trajectories to achieve results comparable to the full ensemble. Our ML-based compression method achieves data reductions from 300x to more than 3,000x and outperforms the state-of-the-art compressor SZ3 in terms of weighted RMSE and MAE. It can faithfully preserve important large-scale atmospheric structures and does not introduce artifacts. When using the resulting neural network as a 790x compressed data loader to train the WeatherBench forecasting model, its RMSE increases by less than 2%. The three orders of magnitude of compression democratize access to high-resolution climate data and enable numerous new research directions. We will close by discussing ongoing research directions and opportunities for using machine learning for ensemble simulations and for combining several machine learning techniques. All of these methods will enable km-scale global climate simulations.