19th Workshop on High Performance Computing in Meteorology

Europe/London
Description

Towards Exascale Computing in Numerical Weather Prediction (NWP)

Workshop description

Every second year ECMWF hosts a workshop on the use of high performance computing in meteorology.

The workshop brings together experts in high-performance computing from across the national weather centres, academia and industry to discuss and present recent developments in high-performance computing for weather forecasting applications.
The theme of the 2021 workshop was "Towards Exascale Computing in Numerical Weather Prediction (NWP)", focusing on the challenges and opportunities that exascale computing presents for the weather and climate community.

Workshop key topics

•    International Exascale efforts in NWP
•    Scalability and performance optimisation of weather applications
•    Porting of weather and climate codes to heterogeneous architectures
•    Emerging high-performance computing technologies
•    Potential of cloud computing for meteorological applications
•    Machine learning for weather and climate
•    Digital twins of the Earth

The workshop consisted of keynote talks from invited speakers, 20-30 minute overview talks and expert presentations as well as a panel discussion.

Workshop aims

Our aim is to provide a forum where users from our Member States and around the world can report on recent experience and achievements in the field of high-performance computing; plans for the future and requirements for computing power are also presented.

    • 08:20–08:40
      Opening and welcome 20m
      Speaker: Isabella Weger (ECMWF)
    • 08:40–12:00
      Session 1
      Convener: Ioan Hadade (ECMWF)
      • 08:40
        A baseline for global weather and climate simulations at 1km resolution 20m

        Advancing the simulation and understanding of the Earth’s weather and climate by representing deep convection explicitly is demonstrated with an average grid spacing of 1.4 km. Our global simulations span a four-month period (November 2018 - February 2019, NDJF season) with the state-of-the-art Integrated Forecasting System (IFS) of ECMWF. So far this has only been possible in limited-area or very short time range simulations, thus lacking the feedback of fundamental energy exchanges onto the larger scales at extended time ranges. The world’s first seasonal-timescale global simulation with 1.4 km average grid spacing addresses the question of how resolved deep convection feeds back on the global dynamics of the atmosphere, and thus provides a reference and guidance for future simulations, albeit under the caveat of the affordability of only a single realisation at this point in time. Our work makes available an unprecedented O(1 km) dataset with 137 vertical levels covering the atmosphere up to a height of 80 km. The simulation results are compared with corresponding 9 km average grid spacing simulations with and without deep convection parametrisation, respectively. The simulations were conducted as part of our INCITE20 award for computer access to Summit, currently the fastest computer in the world (Top500, Nov 2019). Thanks to its unprecedented detail, the dataset will support future satellite mission planning that relies on observing system simulation experiments (OSSEs) based on 'nature runs' for simulating satellite observations that do not yet exist. This work may be seen as a prototype contributing to a future digital twin of the Earth, and we will quantify and illustrate how to handle the unprecedented data flow produced by such extreme-scale simulations.

        Speaker: Nils Wedi (ECMWF)
      • 09:00
        Preparing the IFS for HPC accelerator architectures 20m

        The need to fully exploit the advantages of accelerators in modern HPC systems plays a vital role in achieving the next step change in predictive skills for weather and climate models. However, the diversification of HPC hardware continues and the use of heterogeneous computing architectures is becoming more widespread in the scientific modelling community. This incurs a recurring need to adapt and transform existing model software to new programming models, which is becoming increasingly impractical using a single code base.

        In this talk we highlight the key elements of the ECMWF roadmap for adapting the Integrated Forecasting System (IFS) to HPC accelerators. We discuss how a combination of flexible data structures, an extended use of library interfaces and the use of source-to-source translation tools is envisaged to allow the incremental adaptation of the code base to multiple HPC architectures alongside scientific development.

        We also demonstrate Loki, an in-house developed tool that allows bespoke source-to-source transformations to be devised to adapt individual model components to novel programming paradigms or community-driven DSLs. In addition to GPU-driven efforts we will also highlight preliminary results using FPGAs and showcase the potential for utilising HPC systems based on a dataflow programming paradigm in weather and climate models.
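
        To give a flavour of what a source-to-source transformation does, here is a deliberately simplified sketch in Python. It is not Loki's actual API (Loki works on a proper intermediate representation rather than regular expressions); the function name and the directive choice are illustrative assumptions only.

        ```python
        import re

        def add_acc_loop_directives(fortran_source: str) -> str:
            """Toy source-to-source pass: prepend an OpenACC directive to DO loops.

            Illustrative only; real tools such as Loki parse the code into an
            intermediate representation instead of pattern-matching text.
            """
            out = []
            for line in fortran_source.splitlines():
                # Match simple counted DO loops such as "  do jk = 1, nlev"
                if re.match(r"^\s*do\s+\w+\s*=", line, flags=re.IGNORECASE):
                    indent = line[: len(line) - len(line.lstrip())]
                    out.append(indent + "!$acc parallel loop")
                out.append(line)
            return "\n".join(out)

        if __name__ == "__main__":
            kernel = "subroutine scale(n, a)\n  do i = 1, n\n    a(i) = 2.0 * a(i)\n  end do\nend subroutine"
            print(add_acc_loop_directives(kernel))
        ```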

        Speaker: Michael Lange (ECMWF)
      • 09:20
        Break 30m
      • 09:50
        InfiniBand In-Network Computing Technology and Roadmap 20m

        The latest revolution in HPC, the result of the co-design approach (a collaborative effort to reach Exascale performance by taking a holistic, system-level approach to fundamental performance improvements), is In-Network Computing. The CPU-centric approach has reached the limits of its scalability in several respects, and In-Network Computing, acting as a “distributed co-processor”, can handle and accelerate various data algorithms, such as reductions and more.

        The past focus of smart interconnect development was to offload network functions from the CPU to the network. With the new efforts in the co-design approach, the new generation of smart interconnects will also offload data algorithms to be managed within the network, allowing users to run these algorithms as the data is being transferred within the system interconnect, rather than waiting for the data to reach the CPU. This technology is referred to as In-Network Computing and is the leading approach to achieving performance and scalability for Exascale systems. In-Network Computing transforms the data center interconnect into a “distributed CPU” and “distributed memory”, making it possible to overcome performance walls and enabling faster, more scalable data analysis.

        The new generation of HDR 200G InfiniBand In-Network Computing technology includes several elements, among them the Scalable Hierarchical Aggregation and Reduction Protocol (SHARP), a technology developed by Oak Ridge National Laboratory and Mellanox that received the R&D100 award and enables data reduction algorithms to be executed on the network devices instead of the host processor. Other elements include smart Tag Matching, the rendezvous protocol, and more. These technologies are in use in some of the recent large-scale supercomputers around the world, including the top TOP500 platforms.

        The session will discuss In-Network Computing technology and test results from various systems running weather-based applications.
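
        For orientation, the snippet below (assuming mpi4py and an MPI installation are available) shows the kind of reduction collective that SHARP executes inside the switch fabric instead of on the hosts; the application-side call is the same either way.

        ```python
        # A minimal allreduce: the class of collective that In-Network Computing
        # technologies such as SHARP can offload to the network devices.
        # Run with, e.g.: mpirun -np 4 python allreduce_demo.py
        import numpy as np
        from mpi4py import MPI

        comm = MPI.COMM_WORLD
        local = np.full(1_000_000, comm.Get_rank(), dtype=np.float64)  # per-rank data
        total = np.empty_like(local)

        # With a SHARP-enabled MPI library the reduction tree runs on the network;
        # the API and the numerical result are identical.
        comm.Allreduce(local, total, op=MPI.SUM)

        if comm.Get_rank() == 0:
            print("sum of ranks:", total[0])
        ```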

        Speakers: Gilad Shainer (NVIDIA), Rich Graham (NVIDIA)
      • 10:10
        Operational NWP at DWD: Life before Exascale 20m

        The last years have again seen some changes in the operational forecasting chain at DWD, the most significant being the replacement of the COSMO-Model by the limited-area mode application of ICON, the ICON-D2. Other developments, also in data assimilation, are ongoing, for which a significant increase in computing power is needed. This is now being delivered in two phases by NEC, which won the procurement in 2019.

        In this presentation we will give an overview of the community model ICON and the major developments at DWD for the NWP mode. The need for exascale computing for NWP and the other applications will be discussed.

        One possibility to reach exascale is offered by accelerators, and the work to port ICON to GPGPUs using OpenACC has progressed considerably, mainly through the work of our partners at MeteoSwiss, MPI-M and DKRZ. But with new accelerator hardware on the horizon, the programming models are also changing. It is therefore unclear what "operational exascale code" will look like in the future, which makes the road to exascale a rather rocky one.

        Speaker: Mr Ulrich Schättler (Deutscher Wetterdienst)
      • 10:30
        An approach to architectural design for exascale systems 20m

        This talk will explore the parameters upon which architectural decisions for exascale systems will be based. It is over a decade since the first Petascale systems were introduced using accelerators. Since that time there have been many significant changes in the technology landscape, with the widespread adoption of GPUs and other forms of accelerators now meaning that almost all high-end supercomputers are using these technologies. While the raw performance advantages of these technologies are obvious, there are clear implications for programmability and energy efficiency. With other technologies, such as quantum accelerators and neuromorphic processing, on the immediate horizon, and the ever-increasing use of machine learning as part of the scientific process, the options available to scientists and engineers are becoming even more diverse. The talk will cover the next generation of technologies and an approach to architectural design which aims to simplify this complexity and lead to a new generation of energy-efficient supercomputers with the flexibility to address the next wave of scientific challenges.

        Speaker: Mr Jean-Pierre Panziera (Atos)
      • 10:50
        ECMWF's new data centre in Bologna 30m
        Speakers: Ioan Hadade (ECMWF), Isabella Weger (ECMWF), Martin Palkovic, Michael Hawkins (ECMWF), Oliver Treiber (ECMWF)
      • 11:20
        Activities in Gather.Town 40m
    • 14:50–18:00
      Session 2
      Convener: Nils Wedi (ECMWF)
      • 14:50
        LUMI: the EuroHPC pre-exascale system of the North 20m

        The EuroHPC initiative is a joint effort by the European Commission and 31 countries to establish a world-class ecosystem in supercomputing in Europe (read more at https://eurohpc-ju.europa.eu/). One of its first concrete efforts is to install the first three "precursor to exascale" supercomputers. Finland, together with 8 other countries from the Nordics and central Europe, will collaboratively host one of these systems in Kajaani, Finland. This system, LUMI, will be one of the most powerful and advanced computing systems on the planet at the time of its installation. The vast consortium of countries with an established tradition in scientific computing and strong national computing centers will be a key asset for the success of the infrastructure. In this talk we will discuss the LUMI infrastructure and its great value and potential for the research community.

        Speaker: Pekka Manninen (CSC - IT Center for Science Ltd)
      • 15:10
        The Leonardo pre-exascale EuroHPC system in the context of CINECA's HPC architecture for the meteo-clima services 20m

        CINECA will host one of the pre-exascale systems procured by the EuroHPC JU. That system will be hosted on the campus of the Bologna Tecnopolo, which houses the ECMWF data center and will host various research and development laboratories. In addition to presenting the characteristics and development strategy of the CINECA HPC system, we will present the research and development activities concerning the meteo-clima domain and the activities relating to operational services for national meteorological forecasts, in collaboration with European and Italian stakeholders, regional offices and the new national agency Italia Meteo.

        Speaker: Sanzio Bassini (CINECA)
      • 15:30
        MareNostrum 5 20m

        In 2019, EuroHPC selected the Barcelona Supercomputing Center as the entity to host one of the largest European supercomputers, MareNostrum 5; this new MareNostrum will be one of the pre-exascale machines, with a peak performance of 200 petaflops (200 x 10^15 floating-point operations per second). It is expected to come into operation on 31 December 2020. Its peak performance of 200 petaflops will be 17 times higher than that of the current supercomputer, MareNostrum 4, and 10,000 times higher than that of the supercomputer that began the saga in 2004, MareNostrum 1. It will also be much larger in size, so the new machine will be physically spread out between the Torre Girona Chapel (MareNostrum's current base) and the lower floors of BSC's new corporate building, just a few meters away from the chapel.
        In addition to putting the BSC on the recently created European supercomputing map, MareNostrum 5 will imply a leap in scale over the current infrastructure at the Barcelona Supercomputing Center.
        BSC was set up in 2004 by the Spanish Government, the Catalan Government and the Universitat Politècnica de Catalunya (UPC). It was based around a core group of UPC lecturers led by Mateo Valero and the first supercomputer, MareNostrum 1.

        Speaker: Sergi Girona (Barcelona Supercomputing Center)
      • 15:50
        Break 30m
      • 16:20
        Destination Earth and Digital Twins - A European opportunity for HPC 20m

        The aim of the Destination Earth (DestinE) action is to support the EU’s green transition by providing a new Earth system simulation and observation capability. DestinE's initial focus is on the prediction of weather induced extremes and climate change adaptation through so-called digital twins. These twins will greatly enhance our ability to produce models with unprecedented detail and reliability, allowing policy-makers to anticipate and mitigate the effects of climate change, saving lives and alleviating economic consequences in cases of natural disasters. High-performance computing and data handling sit at the core of the technical developments and DestinE will rely on EuroHPC resources provided by its new generation of pre-exascale systems. In turn, DestinE offers a great opportunity to demonstrate that the EuroHPC investments can produce value for dealing with one of the most important challenges our society is facing.

        Speaker: Peter Bauer
      • 16:40
        How could/should digital twin thinking change how we use HPC in weather and climate? 20m

        Digital twins are defined in many ways, but definitions usually end up as something like "simulation of the real world constrained by real-time data". This definition basically encompasses our understanding of the use of weather and climate models, whether using data assimilation or tuning to history. It would appear that we (environmental science in general and weather and climate specifically) have been doing digital twins for a long time. So what should new digital twin projects in our field aim to do that is different from our business-as-usual approach of evolving models and increasing use of data?

        There seem to be two key promises associated with the current drive towards digital twinning: scenario evaluation and democratising access to simulations. To make progress we can consider two specific exemplar questions: (1) How can I better use higher-resolution local models of phenomenon X driven by lower-resolution (yet still expensive large-scale) global models?; and (2) How can I couple my socio-economic S model into this system?

        Delivering on these promises, even for just these two exemplars, requires some additional thinking beyond what we do now, and such thinking can take advantage of our changing computing environments, and in particular our use of tiered memory and storage, so that the burden of supporting these digital twin applications does not come with unaffordable complexity or cost in the primary simulation environment. It will also require much more transparency and community engagement in the experimental definition. In this talk I outline some of the possibilities in this space, building from existing technologies (software and hardware), models, and infrastructure (existing systems), and suggest some practical ways forward for delivering on some of the digital twin promises, some of which might even be possible within existing programmes such as the EC's Destination Earth (DestinE).

        Speaker: Bryan Lawrence (NCAS, University of Reading)
      • 17:00
        Building Global Kilometer-Scale Prediction Models 20m

        We will give an overview of efforts to build earth system prediction models that are able to run at kilometer-scales on CPU, GPU and hybrid systems. The entire design and development process must be examined including algorithms, languages, tools and libraries, software approach, and other aspects in order to maximize the combination of computational performance, portability, productivity, and scientific accuracy. We will introduce GeoFLOW, a framework that enables comparisons of algorithms and approaches in terms of computational performance and scientific accuracy. Such comparisons are made at the target kilometer-scale resolution to fully characterize performance, scalability and accuracy. The approach is to build from the ground up, progressing from simpler to more complex models, selecting the algorithms and techniques that provide the best overall capabilities.

        Speaker: Mark Govett (NOAA Earth System Research Laboratory)
      • 17:20
        Activities in Gather.Town 40m
    • 08:20–12:00
      Session 3
      Convener: Iain Miller (ECMWF)
      • 08:20
        The new very-high-resolution EC-Earth 4 climate demonstrator 20m

        Recent studies have established that the typical atmospheric and oceanic resolutions used for the CMIP5 coordinated exercise, i.e., around 40km-150km globally, are limiting factors to correctly reproduce the climate mean state and variability. In the framework of the ESiWACE project, the Barcelona Supercomputing Center (BSC) developed a coupled version of the EC-Earth 3 climate model at a groundbreaking horizontal resolution of about 15km in each climate system component. In the atmosphere, the horizontal domain was based on a spectral truncation of the atmospheric model (IFS) at T1279 (15 km) together with 91 vertical levels. The ocean component (NEMO) ran on the ORCA12 tripolar (cartesian) grid at a horizontal resolution of about 1/12° (16 km), with 75 vertical levels.

        This very-high-resolution (VHR) configuration was used in the Glob15km project to run a 50-year spinup from which one historical and one control simulation of 50 years each were started, following the HighResMIP protocol from CMIP6. These experiments are currently being used to identify the improvements in process representation with respect to coarser resolution and to pin down physical and dynamical reasons behind these differences induced by resolution change.

        The VHR coupled configuration was a great benchmark for revealing the most critical scalability problems of the EC-Earth 3 model. Within the ESiWACE2 project, those issues have been tackled to allow operational climate predictions at more than 1 SYPD with production-mode configurations. The new Tco639-ORCA12 configuration is based on the EC-Earth 4 model, made up of OpenIFS cycle 43r3 and NEMO 4, and uses a cubic octahedral grid in the atmosphere. In this version of EC-Earth, both the atmospheric and the oceanic component output diagnostics through the asynchronous XIOS servers, which helps to reduce the I/O overhead and improve scalability and will be evaluated on one of the forthcoming pre-exascale EuroHPC systems.

        Speaker: Miguel Castrillo (BSC-CNS)
      • 08:40
        An update on the LFRic project 20m

        This talk will focus on the latest developments within the NGMS LFRic project, a replacement for the Met Office weather and climate model based on a separation-of-concerns approach that splits the natural science code from the computational optimisations. The talk provides details of ongoing improvements to the performance and implementation of the full global atmosphere model implemented using the LFRic framework.

        The Met Office, in collaboration with STFC Daresbury, has been developing LFRic to meet the challenges of an uncertain future regarding computing architectures on the road to exascale computing. The new model follows a separation-of-concerns approach regarding the science code and the optimised performance code. Code optimisations are generated at compile time using a Python-based code generator (PSyclone), which parses the science code and applies appropriate optimisations provided as a ‘recipe’ based on the knowledge of the computational scientists. The generated code produced by PSyclone calls back to the LFRic infrastructure library, which provides the appropriate functionality for parallelisation and optimisations. By doing this the burden of knowledge on individual developers is reduced, and performance porting and parameterisations can be performed more easily.
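
        As a rough sketch of what such a 'recipe' can look like, the transformation script below applies OpenMP parallelisation to every loop of every invoke. The module, class and method names are assumptions based on published PSyclone examples and may differ between PSyclone versions.

        ```python
        # Sketch of a PSyclone-style optimisation "recipe" (transformation script).
        # Names are assumptions based on published examples; check the PSyclone
        # documentation for the exact API of the version in use.
        from psyclone.transformations import OMPParallelLoopTrans

        def trans(psy):
            """Annotate every loop in every invoke with an OpenMP parallel-loop directive."""
            omp = OMPParallelLoopTrans()
            for invoke in psy.invokes.invoke_list:
                for loop in invoke.schedule.loops():
                    omp.apply(loop)   # transformation only; code generation happens later
            return psy
        ```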

        Speaker: Dr Andrew Coughtrie (Met Office)
      • 09:00
        Computational efforts to improve the OpenIFS efficiency towards the exascale era 20m

        The increase in the forecast capability of Numerical Weather Prediction (NWP) is strongly linked to the spatial resolution needed to solve more complex problems. However, this places a large demand on computing power and may generate a massive volume of model output. In this context, improving the computational efficiency of NWP models will be mandatory.

        In this work we present our efforts to improve the efficiency of OpenIFS towards exascale computing. Three different efforts are presented, focusing on the efficient I/O management, but also on scalability.

        A new I/O scheme was integrated into OpenIFS 43R3 to provide the asynchronous and parallel I/O capabilities of the XIOS server as an alternative to the former sequential I/O scheme. The OpenIFS-XIOS integration contains all the FullPos post-processing features and spectral transformations to have model output in grid-point only representation.

        To further improve the OpenIFS-XIOS integration, we also considered using lossy compression filters that may allow reaching high compression ratios and enough compression speed to considerably reduce the I/O time while keeping high accuracy. In particular, we explore the feasibility of using the SZ lossy compressor developed by the Argonne National Laboratory (ANL) to write highly compressed OpenIFS data through XIOS.
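
        The pay-off of error-bounded lossy compression can be illustrated with a generic quantise-then-deflate sketch (this is not the SZ algorithm nor the XIOS filter interface, just the underlying idea):

        ```python
        # Generic illustration of error-bounded lossy versus lossless compression of
        # gridded float data: quantise to a chosen absolute error bound, then deflate.
        import zlib
        import numpy as np

        rng = np.random.default_rng(0)
        field = np.cumsum(rng.normal(size=(360, 720)), axis=1).astype(np.float32)  # smooth-ish field

        lossless = zlib.compress(field.tobytes())

        error_bound = 0.01                                          # max absolute error allowed
        quantised = np.round(field / (2 * error_bound)).astype(np.int32)
        lossy = zlib.compress(quantised.tobytes())
        recovered = quantised.astype(np.float32) * (2 * error_bound)

        print("lossless ratio:", field.nbytes / len(lossless))
        print("lossy ratio   :", field.nbytes / len(lossy))
        print("max abs error :", float(np.max(np.abs(recovered - field))))  # <= error_bound
        ```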

        The last effort focuses on scalability. In order to anticipate the computational behaviour of OpenIFS on new pre-exascale machines, OpenIFS is benchmarked on a petascale machine (MareNostrum 4) to find potential computational bottlenecks. Our benchmarking consists of large strong-scaling tests (tens of thousands of cores, more than 60% of MareNostrum 4) running different output configurations.

        According to the results, the developments presented help to prepare OpenIFS for the upcoming HPC landscape: the XIOS server outperforms the sequential I/O scheme; the lossy compressor is faster than the default lossless one, achieving much higher compression ratios; and benchmarks suggest that OpenIFS scales reasonably well when using the hybrid MPI+OpenMP approach together with XIOS.

        Speaker: Mr Xavier Yepes Arbós (Barcelona Supercomputing Center)
      • 09:20
        Break 30m
      • 09:50
        Bridging gaps: The Maestro Data-Aware Middleware 20m

        We will introduce the Maestro middleware framework, a data- and memory-aware abstraction for workflow coupling, inter- and intra-application data exchange and redistribution. A central design goal was to enable modelling of memory and storage hierarchies to allow for reasoning about data movement and placement based on costs of moving data objects. At the same time, data objects can carry user-defined metadata as it is typical in meteorological production workflows, allowing workflow management software to reason about the data without inspecting it, at scale.
        We will also present an outlook on future requirements for such frameworks in the context of federated data handling and cross-site workflow coupling.
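
        A conceptual sketch of the two ingredients mentioned above, data objects that carry user-defined metadata and a simple cost-based placement decision across a memory/storage hierarchy, might look as follows (an illustration of the idea, not the Maestro API):

        ```python
        # Conceptual sketch only: data objects carrying metadata plus a toy placement
        # rule over a memory/storage hierarchy. Not the Maestro API.
        from dataclasses import dataclass, field

        @dataclass
        class DataObject:
            name: str
            size_gib: float
            metadata: dict = field(default_factory=dict)   # e.g. {"param": "t2m", "step": 6}

        # (tier name, capacity in GiB), ordered fastest to slowest
        TIERS = [("hbm", 32), ("dram", 256), ("nvme", 4096)]

        def place(obj: DataObject, free: dict) -> str:
            """Hot data goes to the fastest tier with room, cold data to the cheapest."""
            hot = obj.metadata.get("reuse") == "hot"
            order = TIERS if hot else list(reversed(TIERS))
            for tier, _capacity in order:
                if free[tier] >= obj.size_gib:
                    free[tier] -= obj.size_gib
                    return tier
            raise RuntimeError("no tier can hold the object")

        free = {tier: capacity for tier, capacity in TIERS}
        objects = [DataObject("t2m_step6", 4.0, {"param": "t2m", "reuse": "hot"}),
                   DataObject("archive_chunk", 900.0, {"reuse": "cold"})]
        for obj in objects:
            print(obj.name, "->", place(obj, free))
        ```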

        Speaker: Dr Utz-Uwe Haus (HPE HPC EMEA Research Lab)
      • 10:10
        On the Convergence of HPC, Cloud and Data Analytics for Exascale Weather Forecasting - ECMWF Present and Future 20m

        ECMWF's operational forecast generates massive amounts of I/O in short bursts, accumulating to tens of TiB in hourly windows. From this output, millions of user-defined daily products are generated and disseminated to member states and commercial clients all over the world. These products are processed from the raw output of the IFS model, within the time-critical path and under a strict delivery schedule. The upcoming rise in resolution and growing popularity will increase both the size and number of these products.

        The adoption of a new object store (FDB version 5) for the time-critical operations has opened the door for more comprehensive improvements to the post-processing chain and enabled new access paths to very high-resolution time critical datasets. These improvements will bring product generation and data analytics closer to the NWP model and the model output data, to build true data-centric processing and analytics workflows.

        These are part of ECMWF's plans to achieve Exascale NWP by 2025 and to empower our users and member states with novel and increased usage of our weather forecast data. As Exascale NWP datasets are expected to feature between 250 TiB and 1 PiB per forecast cycle, the data-centric approach is critical to enable their efficient usage, by minimising data transport and bringing post-processing and insight discovery closer to the data source.

        We present the latest ECMWF developments in model I/O, product generation and storage, and how we are reworking our operational workflows to adapt to forthcoming new architectures and memory-storage hierarchies, as we build bridges from HPC data producer to Cloud based data analytics workflows.

        Speaker: Tiago Quintino (ECMWF)
      • 10:30
        Autosubmit: An end-to-end workflow manager 20m

        The execution of Weather and Climate simulation models requires orchestrating a series of jobs (steps) that depend on each other. In the context of High-Performance Computing, resources are usually scarce: user demand rises and simulations compete in the scheduling system. An HPC platform usually implements clusters of different capacities and technologies, each with its own scheduling system. This feature presents the opportunity to minimize waiting times by executing jobs on other platforms while also keeping track of their execution status. However, as we try to maximize the usage of the available clusters, the complexity of the workflow increases. Autosubmit, a workflow management software system, allows users to configure their simulation experiments to run on multiple platforms. After the user configures the experiment and the configuration is validated, Autosubmit assumes the execution and sends the available jobs to their respective platforms. It keeps track of the jobs' status by querying the scheduling systems of the platforms and stores relevant information for later reporting.

        Some scheduling systems present features that Autosubmit can exploit to achieve lower waiting times. For example, Autosubmit can wrap an independent group of mutually dependent jobs to appear as a single large job. Then, this large ball of jobs will spend time in the queue only once and then execute its sequence of jobs. A Graphical User Interface in web technology (Javascript and ReactJS) enhances Autosubmit; we call it Autosubmit GUI. We present dynamic and current representations of the experiment on this web, along with important information from each job. We represent the data collected by Autosubmit so the user can compare past executions of the experiment at a job level of detail. Autosubmit manages the workflow from the HPC level to the front-end graphical representation level.
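
        The wrapper idea, bundling a chain of mutually dependent jobs so the chain waits in the queue only once, can be pictured with the small dependency-walking sketch below (a generic illustration, not Autosubmit's configuration syntax):

        ```python
        # Generic sketch of job "wrapping": find linear chains of mutually dependent
        # jobs that can be submitted as a single large job. Not Autosubmit syntax.
        from collections import defaultdict

        jobs = {  # job -> list of jobs it depends on
            "LOCAL_SETUP": [],
            "SIM_1": ["LOCAL_SETUP"],
            "POST_1": ["SIM_1"],
            "SIM_2": ["POST_1"],
            "POST_2": ["SIM_2"],
        }

        def chains(dependencies):
            """Group jobs into linear chains suitable for wrapping into one submission."""
            children = defaultdict(list)
            for job, deps in dependencies.items():
                for dep in deps:
                    children[dep].append(job)
            roots = [job for job, deps in dependencies.items() if not deps]
            wrapped = []
            for root in roots:
                chain, current = [root], root
                while len(children[current]) == 1:   # follow single-successor links
                    current = children[current][0]
                    chain.append(current)
                wrapped.append(chain)
            return wrapped

        print(chains(jobs))  # [['LOCAL_SETUP', 'SIM_1', 'POST_1', 'SIM_2', 'POST_2']]
        ```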

        Speaker: Wilmer Uruchi (Barcelona Supercomputing Center)
      • 10:50
        Activities in Gather.Town 1h 10m
    • 14:50–18:00
      Session 4
      Convener: Michael Lange (ECMWF)
      • 14:50
        Keynote: Preparing for Extreme Heterogeneity in High Performance Computing 1h

        While computing technologies have remained relatively stable for nearly two decades, new architectural features, such as heterogeneous cores, deep memory hierarchies, non-volatile memory (NVM), and near-memory processing, have emerged as possible solutions to address the concerns of energy-efficiency, manufacturability, and cost. However, we expect this ‘golden age’ of architectural change to lead to extreme heterogeneity and it will have a major impact on software systems and applications. Software will need to be redesigned to exploit these new capabilities and provide some level of performance portability across these diverse architectures. In this talk, I will survey these emerging technologies, discuss their architectural and software implications, and describe several new approaches (e.g., domain specific languages, intelligent runtime systems) to address these challenges.

        Speaker: Jeffrey Vetter (Oak Ridge National Laboratory)
      • 15:50
        Break 30m
      • 16:20
        Tasmania: A stencil-oriented framework for performance-portable weather and climate applications in Python 20m

        Dynamical cores and physical parameterizations have traditionally been engineered in isolation for the sake of tractability. The compartmentalization of the modeling, numerical and software development facilitates the proliferation of model components with incompatible structures. The implementation is often tailored to the specifics of the intended host model, so that transferring schemes between models often requires a significant effort and involves sophisticated interfaces. In an attempt to alleviate these problems, the Tasmania framework offers domain scientists a favorable platform in Python to write model-agnostic and plug-compatible components with a clean and common interface. Each component explicitly states the name, dimensions and units of both the fields required as input and the quantities provided in output. This enhances the overall readability of the code and accommodates sanity checks, units conversion, axes transposition, and data-dependency analyses. Components can be composed via couplers to form a hierarchy of models with different levels of complexity. Each coupler pursues a well-defined physics-dynamics or physics-physics coupling algorithm. Within each component, stencil-based computations arising from Eulerian-type dynamics and single-column physics can be encoded using a variety of tools, ranging from scientific computing packages like NumPy and CuPy, to just-in-time accelerators like Numba, to domain-specific libraries like GT4Py. Indeed, Tasmania allows users to define, organize and manage multiple backends in an organic fashion. Infrastructure code ensures that memory allocation, kernel compilation and stencil launch are properly dispatched. This largely relieves the user code of boilerplate and backend-specific instructions, favors a smooth transition from prototyping to production, and ultimately enables performance portability. By means of an idealized three-dimensional mountain-flow model, we show how Tasmania combined with GT4Py provides an effective solution for building modular applications which run optimally on both CPU and GPU, thus overcoming the well-known slowness of the Python interpreter.
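
        A minimal sketch of the backend-swapping idea (plain array code, not Tasmania's or GT4Py's interface): the same five-point Laplacian stencil runs on CPU or GPU depending on which array module is passed in.

        ```python
        # Minimal backend-agnostic stencil: the same five-point Laplacian runs with
        # NumPy (CPU) or CuPy (GPU) depending on the array module passed in.
        # Plain array-API sketch, not Tasmania's or GT4Py's interface.
        import numpy as np

        def laplacian(phi, xp):
            """Five-point Laplacian of a 2D field (interior points only)."""
            lap = xp.zeros_like(phi)
            lap[1:-1, 1:-1] = (phi[2:, 1:-1] + phi[:-2, 1:-1] +
                               phi[1:-1, 2:] + phi[1:-1, :-2] -
                               4.0 * phi[1:-1, 1:-1])
            return lap

        phi = np.random.default_rng(0).random((128, 128))
        print(laplacian(phi, np).sum())

        # With CuPy installed, the identical code runs on the GPU:
        # import cupy as cp
        # print(laplacian(cp.asarray(phi), cp).sum())
        ```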

        Speaker: Stefano Ubbiali (ETH Zurich)
      • 16:40
        Implementation of a performance-portable global atmospheric model using a domain-specific language in Python 20m

        The weather and climate community has set ambitious goals to reach global km-scale modeling capability on future exascale high-performance computing (HPC) systems. But currently, state-of-the-art models are executed using much coarser grid spacing and almost none of the productive weather and climate models are capable of exploiting modern HPC architectures with hybrid node designs.

        Alongside rapidly evolving HPC hardware, new associated programming models are emerging and no de-facto standard has been adopted by the community. As a consequence, some groups are opting for compiler directives, some groups are shifting from Fortran to other programming languages and - finally - some groups are actively developing and using domain-specific languages (DSLs) and compilers. By increasing the level of abstraction in the user code, a DSL toolchain is able to target different programming models and hardware architectures, as well as apply domain-specific optimizations which are out of reach of standard compilers.

        In this talk we summarize our experience from an effort to port the FV3GFS/xSHIELD atmospheric model to a Python-based DSL. We will discuss design decisions, testing strategy and the use of containers for development, as well as performance compared to the original Fortran code. We will conclude with an outline of the further development roadmap as well as a summary of remaining challenges.

        Speaker: Oliver Fuhrer (Allen Institute for Artificial Intelligence)
      • 17:00
        Exascale Computing for NWP and Climate Science 20m

        The Exascale era will bring a variety of challenges and opportunities to the weather and climate communities. As we reach the limits of what can be done using traditional technologies and methodologies, we see an explosion not just in processor diversity but also in the methods used to do science and to produce actionable weather and climate information. These changes will demand agility – from the applications and from the scientists who develop and use them.

        This talk will explore the changing landscape of computing in the exascale era and introduce the SmartSim library, which enables physical models to interface with the rapidly growing ecosystem of data science and machine learning.

        Speaker: Dr Ilene Carpenter (Hewlett Packard Enterprise)
      • 17:20
        Activities in Gather.Town 40m
    • 08:20–12:00
      Session 5
      Convener: Peter Dueben (ECMWF)
      • 08:20
        Single-Precision in Earth-System Models 20m

        Earth-System models traditionally use double-precision, 64 bit floating-point numbers to perform arithmetic. According to orthodoxy, we must use such a relatively high level of precision in order to minimise the potential impact of rounding errors on the physical fidelity of the model. However, given the inherently imperfect formulation of our models, and the computational benefits of lower precision arithmetic, we must question this orthodoxy. At ECMWF, a single-precision, 32 bit variant of the atmospheric model IFS has been undergoing rigorous testing in preparation for operations for around 5 years. The single-precision simulations have been found to have effectively the same forecast skill as the double-precision simulations while finishing in 40% less time, thanks to the memory and cache benefits of single-precision numbers. Following these positive results, other modelling groups are now also considering single-precision as a way to accelerate their simulations.

        In this talk I will present the rationale behind the move to lower-precision floating-point arithmetic and up-to-date results from the now-operational single-precision atmospheric model at ECMWF. I will also present new results from running ECMWF's coupled atmosphere-ocean-sea-ice-wave system entirely with single-precision. Finally I will discuss the feasibility of even lower levels of precision, like half-precision, which are now becoming available through GPU- and ARM-based systems such as Summit and Fugaku, respectively.
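
        The memory and rounding trade-off can be illustrated with a back-of-the-envelope NumPy sketch (generic, unrelated to the IFS implementation):

        ```python
        # Generic illustration of the single- vs double-precision trade-off:
        # half the memory (and memory traffic) per field, at the cost of rounding error.
        import numpy as np

        n = 1_000_000
        x64 = np.linspace(0.0, 1.0, n, dtype=np.float64)
        x32 = x64.astype(np.float32)

        print("bytes per field:", x64.nbytes, "vs", x32.nbytes)   # 8 MB vs 4 MB

        # Rounding error of a simple reduction, relative to the float64 result
        s64 = x64.sum()
        s32 = x32.sum(dtype=np.float32)
        print("relative error of float32 sum: %.2e" % (abs(float(s32) - s64) / s64))
        ```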

        Speaker: Dr Sam Hatfield (ECMWF)
      • 08:40
        A mixed precision implementation in Numerical Weather Prediction models 20m

        In the past few years the Barcelona Supercomputing Center has been involved in different European projects which aimed to optimize the computational performance of state-of-the-art oceanographic models. In this framework we developed, and successfully applied, a tool able to automatically identify the numerical precision required for each real variable present in a given Fortran code.

        We will present the work that has been done to adapt this workflow to the Panther library. Panther (P-Adaptive Numerical Tool for High order Efficient discRetizations) is a discontinuous finite elements library developed at ECMWF for Numerical Weather Prediction models. It includes both a full dynamical core prototype, based on the combination of the semi-Lagrangian (SL) semi-implicit (SI) time integration approach with the discontinuous Galerkin (DG) space discretization method, as well as different transport schemes such as SL-DG advection. We focused on the SL-DG solver. The style of the code made this work a great challenge: up to now we had always dealt with monolithic Fortran 90 style codes (if not mixed with Fortran 77), while Panther uses the most advanced features of Fortran 2018, making it a good example of object-oriented Fortran. We will show how we were able to discriminate, among all the variables, those that are more sensitive to numerical precision and cannot be demoted to single precision. Finally, we will illustrate the overall gain in performance that can be achieved with this method.

        Speakers: Stella Valentina Paronuzzi Ticco (Barcelona Supercomputing Center), Giovanni Tumolo (ECMWF)
      • 09:00
        Supercomputer Fugaku and new achievement using Fugaku and AI technologies for high-resolution, real-time tsunami inundation prediction 20m

        Supercomputer Fugaku, developed by RIKEN and Fujitsu, is the first supercomputer in history to take first place in the three major supercomputer rankings, TOP500, HPCG, and Graph500, at the same time, in June 2020. It also took first place in the HPL-AI ranking. Fugaku has now held all four prizes for three consecutive terms.
        Fujitsu’s commercial product PRIMEHPC FX1000 uses the same hardware as Fugaku and supports the same software stack except for LLIO, the Lightweight Layered IO accelerator designed for large, scalable environments like the Fugaku system. PRIMEHPC FX700 uses the A64FX CPU used in Fugaku, while supporting the de facto standard interconnect, InfiniBand HDR. Fujitsu supercomputers and their derivatives form a center of excellence of the Arm HPC ecosystem.
        This presentation also introduces a Fugaku use case as a concrete example.
        By performing tens of thousands of high-resolution tsunami simulations and training the simulation results with AI on the supercomputer "Fugaku", we have developed a new AI model that can predict high-resolution tsunami inundation on a normal PC. When an earthquake occurs, by inputting the tsunami waveform data observed offshore into the newly developed AI model, it is possible to predict the coastal flood situation before the tsunami arrives with high spatial resolution. This high-resolution model provides detailed flood prediction and insights into the impact of tsunamis on coastal infrastructure such as buildings and roads. In addition, since the AI model pre-trained on the supercomputer "Fugaku" can be executed on a normal PC in a few seconds, it is easy to build a real-time flood prediction system that previously required urgent real-time use of the supercomputer.

        Speakers: Mr Toshiyuki Shimizu (Fujitsu Limited), Dr Yusuke Oishi (Fujitsu Limited)
      • 09:20
        Break 30m
      • 09:50
        DKRZ site news: The new system 20m

        I will present the new HPC and HSM systems at DKRZ, which will go into operation in mid-2021 and at the end of 2020, respectively.
        Special focus will lie on the cooperation between Bull/Atos and DKRZ to evaluate the potential of GPGPUs for the second phase of the new system.

        Speaker: Hendryk Bockelmann (DKRZ)
      • 10:10
        AMD's Journey to Exascale 20m

        With the Frontier and El Capitan systems on the horizon, ExaFLOP-sized systems have now become a reality. Manufacturers and HPC users, with especial interest from the weather community, are all now investigating how to most easily leverage hardware, software, and toolchains in order to scale problems and codes across this new generation of systems.

        In this presentation the AMD HPC Centre of Excellence will discuss AMD's approach in this journey. With a brief synopsis of where we see the direction of travel for HPC to the middle of this decade, we will discuss some methods of characterization of WRFv3, and how this lends itself to the CPU architecture (last-level cache, memory) and AMD EPYC™ processors. We will then review AMD's approach to migrating codes from the CPU to the GPU through hardware like AMD Instinct™, an open-source software development environment, and standardized programming paradigms (the OpenMP API “target” directives and the HIP programming model). We will also briefly present how to port existing codes to these programming paradigms.

        Speaker: Mr Mathieu Gontier (AMD)
      • 10:30
        Performance Portability for Existing Weather and Climate Models using PSyclone: Application to the NEMO Ocean Model. 20m

        The high-performance computing landscape today is varied and complex with a wide range of both hardware and software technologies available. Given the size, longevity and performance requirements of weather and climate models, this landscape and its continuing evolution presents a problem: it is simply not feasible to repeatedly re-write a model every time a new supercomputer or programming model comes along. Domain Specific Languages (DSLs) are a potential solution to this problem since they permit a Separation of Concerns for the scientific and performance aspects of a code. Consequently, it becomes possible to have single-source science code that can be translated/transformed into code that performs well on a variety of architectures.

        PSyclone is a code-generation and transformation tool that has been developed to support the DSL being used by the UK Met Office’s LFRic Model. One of the drawbacks of DSLs is that they require a revolution: the whole model must be re-written in the new language. This represents a very significant investment and brings with it concerns about the sustainability of the associated tool chain. For this reason, some communities are rightly cautious about adopting DSLs. Therefore, PSyclone has also been developed to support an evolutionary approach to DSL adoption by working with existing Fortran code. In this presentation we will describe the approach taken and present results for the application of PSyclone to the NEMO ocean model, allowing it to be run at scale on both GPUs and in hybrid MPI/OpenMP mode.

        Speaker: Dr Andrew Porter (STFC Hartree Centre)
      • 10:50
        Activities in Gather.Town 1h 10m
    • 14:50–18:15
      Session 6
      Convener: Willem Deconinck (ECMWF)
      • 14:50
        Keynote: Towards Earth System Modeling Systems at Storm Resolving Resolutions 1h

        Research applications employing global atmospheric models of weather, climate and air-quality are moving to ever higher resolutions. One of the most important motivations for doing this is to remove the need to parameterize clouds, which requires horizontal meshes with cell spacing of a few kilometers and a vertical spacing of a few hundred meters. While we can perform exploratory regional or locally-refined global simulations on today’s systems, further advances are needed to bring these capabilities to the Earth System research community on a routine basis. It is now becoming clear that achieving this goal will not only require accelerated architectures and new programming models, but also a blend of new computational and data science algorithms.

        This lecture will focus on efforts at the National Center for Atmospheric Research to tackle these challenges. Next year, NCAR will deploy a new ~20 petaflops hybrid CPU/GPU system, Derecho. At the same time, an NSF-funded CSSI project called EarthWorks aims to develop a (mostly) GPU-resident Earth System model capable of running at global storm-resolving (GSR) resolutions. EarthWorks, a partnership between NCAR and Colorado State University, leverages the infrastructure of the Community Earth System Model (CESM) model as well as scalability improvements made to the Community Atmosphere Model (CAM) within the CESM, through another NCAR initiative called the System for Integrated Modeling of the Atmosphere (SIMA). A significant departure from current CESM model configurations made in EarthWorks is the use of the MPAS-Ocean model developed and maintained at Los Alamos National Laboratory as part of the Climate, Ocean and Sea Ice Modeling (COSIM) project.

        Beyond the modeling aspects, EarthWorks also focuses on identifying and testing frameworks for integrating machine learning inference capabilities into ES models, and the development of parallel data compression and analysis workflows designed to handle the enormous data volumes produced by GSR simulations.

        Speaker: Richard Loft (National Center for Atmospheric Research)
      • 15:50
        Break 30m
      • 16:20
        Hybrid multi-grid parallelization of WAVEWATCH III model on spherical multiple-cell grids 20m

        The Spherical Multiple-Cell (SMC) grid is an unstructured grid, supporting flexible domain shapes and multi-resolutions. It retains the quadrilateral cells as in the latitude-longitude grid so that simple finite difference schemes can be used. Sub-timesteps are applied on refined cells and grid cells are merged at high latitudes to relax the CFL restriction. A fixed reference direction is used to solve the vector polar problem so that the whole Arctic can be included. The SMC grid has been implemented in the WAVEWATCH III (WW3) wave model since 2012 and updated in the latest WW3 V6.07. A 4-level resolution (3, 6, 12, 25 km) global wave forecasting model has been used in the Met Office since October 2016, leading to a great reduction in model errors and the removal of our European regional wave model. WW3 model parallelization is by wave spectral component decomposition (CD) in MPI mode, which has a limit on the number of MPI ranks. Hybrid (MPI-OpenMP) parallelization may extend the node usage but the OpenMP scalability flattens out beyond a few threads. Another parallelization method, combining CD with domain decomposition (DD), is enabled in the WW3 model by a multi-grid framework for further extension of node usage. The SMC grid has recently been added to the multi-grid framework and its flexible domain shape allows optimized domain splitting and minimized boundary exchanges. The combined CD-DD method is tested with SMC sub-grids with various hybrid node-thread combinations. Results indicate that switching from pure MPI to hybrid MPI-OpenMP mode can halve the global model elapsed time. Using hybrid CD-DD on 3 SMC sub-grids may reduce the time further by 30%. Node usage is extended from 10 to 180 nodes and the elapsed time for a one-model-day run is reduced from 3.5 min on 10 nodes to 1 min on 180 nodes. In addition, the hybrid mode reduces memory demand on one computing node and allows future model updates to higher resolutions.

        Speaker: Dr Jian-Guo Li (Met Office)
      • 16:40
        Exploring the Programming Models for the LUMI Supercomputer 20m

        The EuroHPC organization is procuring many supercomputers for various consortia across Europe. One of them is the LUMI supercomputer, whose main performance is provided by AMD GPUs. A user with no experience of programming GPUs, or who is not familiar with AMD GPUs, has to decide which programming model to use depending on many factors.
        In this presentation we give a short introduction to porting applications to AMD GPUs and discuss various programming models for porting codes, such as HIP, OpenMP offloading, hipSYCL, Kokkos, and Alpaka. We present a brief introduction to and comparison of some of the programming models, along with benchmarking using BabelStream and some other benchmarks on the AMD MI100 GPU, and mention how to tune some parameters based on the GPU hardware. Finally, we compare results between the AMD MI100 and NVIDIA GPUs (V100 and A100).

        Speaker: Dr George Markomanolis (CSC - IT Center For Science Ltd)
      • 17:00
        Intel® HPC update for Earth System Modelling 20m

        We present an update on Intel® hardware and software roadmaps with an emphasis on technologies most impacting Earth Systems Modelling. In the hardware update, we describe innovations in upcoming 4th Generation Intel® Xeon® Scalable Processors, codenamed “Sapphire Rapids”, and the benefits of accompanying converged technologies such as Intel® Optane™ persistent memory and DAOS exascale-class storage to data-centric workflows. In the software update, we present how the features of Intel® oneAPI enable cross-architectural programming based on established language standards. Finally, we preview future Intel® Xe-HPC -based “Ponte Vecchio” GPUs for HPC and AI.

        Speaker: Robert Wisniewski (Intel Corporation)
      • 17:20
        Virtual dinner speech 15m
        Speaker: Isabella Weger (ECMWF)
      • 17:35
        Activities in Gather.Town 40m
    • 08:20–12:00
      Session 7
      Convener: Andreas Mueller (ECMWF)
      • 08:20
        Keynote: Fugaku: the First “Applications First” Exascale Supercomputer and Beyond 1h

        Fugaku was developed with the "application first" philosophy, aiming to achieve up to 100x speedup over its predecessor, the K-computer, while being extremely general purpose via the adoption of the Arm instruction set. In order to attain its performance requirement while being power efficient, the design emphasized low-power circuit designs, while realizing extremely high bandwidth via the use of HBM2 and the Tofu-D network, which embeds a high-performance network switch in every one of its 160,000 nodes. The resulting machine is very well balanced, accommodating extremely high bandwidth, appropriate for climate and meteorological applications. It is expected that Fugaku will be a platform for innovation in such areas, for the safety and sustainability of society.

        Speaker: Satoshi Matsuoka (Riken Center for Computational Science)
      • 09:20
        Break 30m
      • 09:50
        Elastic Large Scale Ensemble Data Assimilation with Particle Filters for Continental Weather Simulation 20m

        Particle filters are a major tool used for data assimilation (DA) in climate modeling. The ability to handle a very large number of particles is critical for high-dimensional climate models. The presented approach introduces a novel way of running such DA studies, taking as an example a particle filter using sequential importance resampling (SIR). The new approach executes efficiently on the latest high-performance computing platforms. It is resilient to numerical and hardware faults while minimizing data movement and enabling dynamic load balancing. Multiple particle propagations are performed per running simulation instance, i.e., runner, in each assimilation cycle. Particle weights are computed locally on each of these runners and transmitted to a central server that normalizes them, resamples new particles based on their weight, and redistributes the work to runners one by one to react to load imbalance. Our approach leverages the multi-level checkpointing library FTI, permitting particles to move from one runner to another in the background while particle propagation goes on. This also enables the number of runners to vary during the execution, either in reaction to failures and restarts, or to adapt to changing resource availability dictated by external decision processes. The approach is tested with the Weather Research and Forecasting (WRF) model to assess its performance for probabilistic weather forecasting. Several thousand particles on more than 20,000 compute cores are used to assimilate cloud cover observations into short-range weather forecasts over Europe.
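
        For readers unfamiliar with the method, one SIR step reduces to weighting particles by an observation likelihood and resampling proportionally to those weights; a textbook sketch (not the presented framework) is:

        ```python
        # Textbook sketch of one sequential importance resampling (SIR) step:
        # weight particles by how well they match an observation, then resample.
        import numpy as np

        rng = np.random.default_rng(42)

        n_particles = 1000
        particles = rng.normal(loc=0.0, scale=2.0, size=n_particles)  # prior ensemble
        observation, obs_error = 1.5, 0.5

        # Importance weights from a Gaussian observation likelihood
        weights = np.exp(-0.5 * ((particles - observation) / obs_error) ** 2)
        weights /= weights.sum()

        # Resample with replacement, proportionally to the weights
        resampled = particles[rng.choice(n_particles, size=n_particles, p=weights)]

        print("prior mean:     %.3f" % particles.mean())
        print("posterior mean: %.3f" % resampled.mean())  # pulled towards the observation
        ```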

        Speaker: Sebastian Friedemann (UGA)
      • 10:10
        Exascale-ready adaptive mesh refinement and applications in Earth system modelling 20m

        Increasing the resolution of the computational mesh is one of the most effective tools to boost the accuracy of numerical Earth system simulations, and one of the current core challenges is enabling global sub-km scale simulations. However, increased mesh resolution comes at the cost of increased computational effort and memory consumption. With adaptive mesh refinement (AMR) the resolution can be dynamically controlled to use a fine resolution only when numerically necessary and keep a coarser mesh outside of regions of interest, hence reducing the required resources by orders of magnitude. Especially when the mesh refinement changes dynamically in time, efficiently managing the mesh and simulation data in parallel becomes a major challenge on its own. Modern space-filling curve (SFC) techniques are well suited for this task due to their low memory footprint and fast, scalable management algorithms. Previously, these techniques were only available for hexahedral or 2D triangular element shapes. We demonstrate the t8code library for massively scalable AMR that extends SFCs to all classic element shapes (quadrilaterals, triangles, hexahedra, tetrahedra, prisms, pyramids) and hence offers efficient and scalable mesh management for a wide variety of simulations. Our algorithms scale to over 1e12 mesh elements and 1 million parallel processes. In the ongoing Helmholtz incubator project Pilot Lab Exascale Earth System Modelling (PL-ExaESM) we couple t8code with the Modular Earth Submodel System framework MESSy in order to reduce the output file size of atmospheric chemistry simulations. In this talk we present our first results and discuss further plans for AMR in ESM.
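
        To illustrate the space-filling-curve idea (a generic 2D Morton/Z-order key, not t8code's element indexing), the grid coordinates of each element are bit-interleaved into a single key; sorting by that key yields the compact, easily partitionable ordering that scalable AMR relies on.

        ```python
        # Generic 2D Morton (Z-order) key: interleave the bits of (x, y) so that
        # sorting elements by the key keeps spatially nearby elements close together.
        # Illustrates the space-filling-curve idea, not t8code's own indexing scheme.
        def morton2d(x: int, y: int, bits: int = 16) -> int:
            key = 0
            for i in range(bits):
                key |= ((x >> i) & 1) << (2 * i)        # even bit positions from x
                key |= ((y >> i) & 1) << (2 * i + 1)    # odd bit positions from y
            return key

        cells = [(x, y) for y in range(4) for x in range(4)]
        ordered = sorted(cells, key=lambda c: morton2d(*c))
        print(ordered)  # Z-shaped traversal: (0,0), (1,0), (0,1), (1,1), (2,0), ...
        ```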

        Speaker: Dr Johannes Holke (German Aerospace Center (DLR))
      • 10:30
        Towards fault tolerance in high-performance computing for numerical weather and climate prediction 20m

        Progress in numerical weather and climate prediction accuracy greatly depends on the growth of the available computing power. As the number of cores in top computing facilities pushes into the millions, increased average frequency of hardware and software failures forces users to review their algorithms and systems in order to protect simulations from breakdown.

        This talk will discuss hardware, application-level and algorithm-level resilience approaches of particular relevance to time-critical numerical weather and climate prediction systems, analysing a selection of applicable existing strategies. Numerical examples will showcase the performance of the techniques in addressing faults, with particular emphasis on iterative solvers for linear systems, including results with a new fault-tolerant version of the Generalized Conjugate Residual Krylov solver used in ECMWF's next-generation FVM dynamical core.
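
        For reference, a bare-bones (non-fault-tolerant) Generalized Conjugate Residual iteration has the shape sketched below; the fault-detection and recovery machinery discussed in the talk is layered on top of solvers of this kind. This is a textbook formulation, not ECMWF's implementation.

        ```python
        # Bare-bones Generalized Conjugate Residual (GCR) solver for A x = b.
        # Textbook formulation; a fault-tolerant variant adds detection and recovery
        # logic around iterations like these.
        import numpy as np

        def gcr(A, b, tol=1e-10, max_iter=200):
            x = np.zeros_like(b)
            r = b - A @ x
            p_list, Ap_list = [], []
            for _ in range(max_iter):
                if np.linalg.norm(r) <= tol * np.linalg.norm(b):
                    break
                p, Ap = r.copy(), A @ r
                # Orthogonalise A p against the previous A p_j directions
                for pj, Apj in zip(p_list, Ap_list):
                    beta = (Ap @ Apj) / (Apj @ Apj)
                    p, Ap = p - beta * pj, Ap - beta * Apj
                p_list.append(p)
                Ap_list.append(Ap)
                alpha = (r @ Ap) / (Ap @ Ap)
                x += alpha * p
                r -= alpha * Ap
            return x

        A = np.array([[4.0, 1.0], [2.0, 3.0]])   # small non-symmetric test matrix
        b = np.array([1.0, 2.0])
        x = gcr(A, b)
        print(x, np.allclose(A @ x, b))
        ```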

        The potential impact and resilience-performance trade-offs implied by these strategies will be considered in relation to current development of numerical weather prediction algorithms and systems towards the exascale.

        Speaker: Dr Tommaso Benacchio (Politecnico di Milano)
      • 10:50
        Activities in Gather.Town 1h 10m
    • 14:50–18:00
      Session 8
      Convener: Balthasar Reuter (ECMWF)
      • 14:50
        European Weather Cloud: A community cloud service tailored for Meteorology 20m

        ECMWF (European Centre for Medium-Range Weather Forecasts) and EUMETSAT (European Organisation for the Exploitation of Meteorological Satellites) work together on a project named “European Weather Cloud” (https://www.europeanweather.cloud/). The strategic goal of this initiative is to build and offer a community cloud infrastructure on which Member and Co‐operating States of both organizations can create and manage on-demand virtual resources enabling access to ECMWF’s Numerical Weather Prediction (NWP) products and EUMETSAT’s satellite data in a timely, efficient, and configurable fashion. Moreover, one of the main goals is to involve more entities in this initiative in a joint effort to form a federation of clouds/data offered by our Member States, for the maximum benefit of the European Meteorological Infrastructure as well as to support WMO NMHSs.

        During the current pilot phase of the project, several use cases have been defined, and both organisations have organised user and technical workshops to actively engage with the meteorological community and align the evolution of the European Weather Cloud with its goals and needs.

        In this presentation, the status of the project will be reviewed, describing the existing infrastructure, the services offered and how these are accessed by end-users, along with examples of the existing use cases. Moreover, we will present the approach we followed to deploy the pilot infrastructure at ECMWF, the decisions made, the challenges and opportunities, and the lessons learned from this exercise. The initial thoughts and approaches currently considered for cloud-HPC convergence will be discussed, together with a review of similar activities.
        Finally, the presentation will conclude with the plans and next steps for the evolution and transition to operations of the European Weather Cloud, and its relationship with other projects and initiatives such as DestinE.

        Speaker: Dr Vasileios Baousis (ECMWF)
      • 15:10
        IFS on AWS - Running RAPS in the cloud 20m

        Maxar was contracted by the European Centre for Medium-Range Weather Forecasts (ECMWF) to run a selection of the modelling systems contained in the Real Applications on Parallel Systems (RAPS) suite on Amazon Web Services (AWS) cloud computing resources. The scope of work was limited to running the ‘high-resolution forecast’ configuration of the atmospheric model, the Integrated Forecast System (IFS), coupled to the ocean model Nucleus for European Modelling of the Ocean (NEMO) and to the Wave Model (WAM).

        This presentation will report on the work to run RAPS in the cloud, starting from the transfer of data and compilation, through simple tests of the uncoupled IFS at low resolution, to IFS runs at TCo1279L137 resolution coupled with NEMO and WAM. Performance and scaling results will be presented, as well as comparisons with twin runs on the Cray XC40. Some discussion of the costs of running in the cloud will be included.

        Speaker: Brian Etherton (MAXAR)
      • 15:30
        Can cloud computing accelerate transition of research to operations for workflows with cycling data assimilation? 20m

        In numerical weather prediction, transition of research to operations often requires extensive computational experiments with cycling data assimilation, where short (6-12 hour) model forecasts are cycled in sequence with data assimilation and data handling tasks. While computationally expensive (~100M CPU hours), these cycling experiments scale poorly (~10,000 CPUs) and have to be performed sequentially. This constraint is further exacerbated at NOAA, where the need to calibrate sub-seasonal forecasts with the new coupled model (UFS) requires development of a new coupled reanalysis product spanning a 30-year period. Existing on-premises NOAA HPC resources are insufficient to accommodate the additional workload of generating a new reanalysis in a timely manner. To demonstrate the feasibility of using commercial cloud resources to surge computational capacity for reanalysis production, NOAA-PSL has ported a complete cycling system to the AWS and MS Azure cloud providers.

        The initial prototype was by design deployed at low resolution (a 1-degree configuration of the coupled ocean, atmosphere and ice UFS model) and in a deterministic-only configuration (GSI 3DVAR for the atmosphere and JEDI SOCA 3DVAR for ocean and ice). We used the Cylc workflow engine to orchestrate the cycling tasks and Slurm to manage job scheduling and resource allocation. We tried to replicate the on-premises compiler and architecture suite by using Intel's 2018v4 compiler, NOAA-EMC's hpc-stack, and Intel-based computing instances. On AWS, we configured the cluster with a single master node (m5.large or c5n.2xlarge), six compute nodes (c5n.18xlarge), and file storage based on a shared Elastic Block Storage (EBS) volume or FSx for Lustre. On Azure, we configured the cluster with a single master node (Standard_H8 or Standard_HC44rs), four compute nodes (Standard_HC44rs), and file storage based on either BeeOND or Lustre. Observations and initial conditions were staged in an AWS S3 bucket that was used by both cloud providers. Our initial assumption that Singularity containers would make it easy to port the workflow between platforms proved overly optimistic, as the container configuration and compilation had to match the specifics of the MPI drivers installed on each provider. Our hopes to efficiently utilize spot pricing were also overly optimistic, as we experienced poor availability of the preferred instances. Performance and cost were similar between the two providers and comparable to prototype runs on the on-premises HPC (when the cost of waiting in the queue was excluded), demonstrating the feasibility of using cloud computing as a surge platform for NWP computations.
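
        The cycling structure described above can be summarised with a minimal driver sketch (task names are hypothetical and purely illustrative; the actual system orchestrates these steps with Cylc and schedules them with Slurm):

```python
from datetime import datetime, timedelta

# Minimal sketch of a cycling data-assimilation workflow: each cycle runs
# assimilation against the previous cycle's short forecast (the background),
# then launches the next short forecast. The task functions are placeholders.
CYCLE_LENGTH = timedelta(hours=6)

def run_assimilation(cycle_time, background):
    """Stage observations and run 3DVAR against the background; return analysis."""
    return f"analysis@{cycle_time:%Y%m%d%H} from {background}"

def run_forecast(cycle_time, analysis):
    """Run a short (6-12 h) coupled forecast initialised from the analysis."""
    return f"forecast@{cycle_time:%Y%m%d%H}"

def cycle(start, n_cycles, first_guess):
    background = first_guess
    t = start
    for _ in range(n_cycles):
        analysis = run_assimilation(t, background)  # must wait for background
        background = run_forecast(t, analysis)      # becomes next background
        t += CYCLE_LENGTH                           # cycles run sequentially
    return background

print(cycle(datetime(1994, 1, 1, 0), n_cycles=4, first_guess="cold start"))
```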

        Speaker: Sergey Frolov (Naval Research Laboratory)
      • 15:50
        Break 10m
      • 16:00
        Developing Climate Models in Python 20m

        AI2 Climate Modeling is working to improve the FV3GFS weather and climate model using online machine learning parameterizations, and to add GPU capability and improve its usability by porting it to a Python-based domain-specific language (DSL). Both of these efforts have been greatly facilitated by wrapping the existing Fortran code as a library which can be imported in Python, called fv3gfs-wrapper (McGibbon et al. 2021). This package provides simple interfaces to progress the model main loop and to get or set variables used by the Fortran model. This enables the main components of the time integration loop to be written in Python, allowing arbitrary Python code to interact with the model state. Furthermore, the base model is as performant when wrapped in Python as it is in pure Fortran.
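
        A minimal sketch of what such a Python-driven main loop looks like is shown below; the function names follow the published description of fv3gfs-wrapper, but the exact signatures and the state-manipulation step here are illustrative assumptions, not verified library code:

```python
# Sketch of a Python-driven main loop around a wrapped Fortran model.
# Function names follow the fv3gfs-wrapper description (McGibbon et al. 2021);
# exact signatures and the state edit below are illustrative assumptions.
import fv3gfs.wrapper as wrapper

wrapper.initialize()                          # Fortran-side initialisation
for step in range(wrapper.get_step_count()):
    wrapper.step_dynamics()                   # advance the dynamical core
    wrapper.step_physics()                    # advance the physics

    # Arbitrary Python can now read or modify the model state, e.g. to apply
    # a machine-learning correction to a prognostic variable.
    state = wrapper.get_state(["air_temperature"])
    # ... apply an ML correction to state["air_temperature"] here ...
    wrapper.set_state(state)

wrapper.cleanup()
```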

        To demonstrate the wrapper we will show executable examples, including simple MPI communication, adding machine learning to the model, and running interactively in a Jupyter notebook. Everything runs in Docker, allowing you to run these examples on your laptop. If the correct environment is installed, they can also be run on MPI-enabled supercomputing systems.

        We will also discuss what it is like developing a weather and climate model based in Python. With this approach, we are able to independently and interactively develop and evaluate model components, and take advantage of a wide array of community libraries for data loading, data visualization, and machine learning. We will also showcase easy-to-use static typing, testing, and debugging tools available in Python which are particularly useful to a model developer.

        Speaker: Jeremy McGibbon (Allen Institute for Artificial Intelligence)
      • 16:20
        Accelerating IFS Physics Code on Intel GPUs Using OneAPI 20m

        I will discuss the porting of ECMWF physics code, using the CLOUDSC dwarf as an example, for acceleration on Intel's PVC GPU. The discussion will focus on a SYCL/DPC++ approach, and will include questions of architecture and workload mapping relevant to the performance of the target weather code. If time allows, I may also explore how the ported code can be enabled for other hardware targets.

        Speaker: Dr Camilo Moreno (Intel Corporation)
      • 16:40
        Optimization of ACRANEB2 radiation kernel on Intel Xeon Processors 20m

        In this talk, we describe the optimization techniques employed to accelerate the performance of a radiation kernel (ACRANEB2) on Intel Xeon processors. We briefly describe the compute capabilities of the latest Intel processors and their implications for HPC application performance. In our performance analysis of ACRANEB2, we identified two top hotspots involving mathematical functions and prefix-sum calculations. The data dependency in prefix-sum calculations makes it harder for compilers to vectorize the computations and leads to poor out-of-the-box performance. Hence, to effectively use the AVX-512 vector units, we developed an optimized version of the prefix sum using explicit vectorization techniques, which will be discussed in detail in the talk. Our performance results show that the average speed-up of the explicit SIMD prefix sum over the baseline and OpenMP SIMD implementations is 4.6x (GCC and Clang) and 1.6x (ICC), respectively. For better integration into ACRANEB2, we extended the standalone prefix sum to a packed implementation that operates on a set of input vectors. In addition, we block and fuse prefix-sum calculations with math operations to maximize the gains from AVX-512 vectorization. Overall, this led to performance gains of up to 1.3x over the baseline for ACRANEB2 on the latest Intel Xeon processors.
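
        The loop-carried dependency that defeats auto-vectorisation, and the packed formulation that recovers data parallelism across a set of input vectors, can be illustrated in a few lines (a NumPy sketch of the idea, not the AVX-512 implementation discussed in the talk):

```python
import numpy as np

# Scalar prefix sum: each output depends on the previous one, which is the
# loop-carried data dependency that makes compiler auto-vectorisation hard.
def prefix_sum_scalar(a):
    out = np.empty_like(a)
    acc = 0.0
    for i, v in enumerate(a):
        acc += v
        out[i] = acc
    return out

# "Packed" formulation: process a whole set of independent input vectors at
# once. The loop is still sequential along the prefix direction, but each
# step operates element-wise across the vectors (one SIMD lane per vector),
# analogous to the packed AVX-512 implementation described in the talk.
def prefix_sum_packed(batch):                # shape (n_vectors, n_elements)
    out = np.empty_like(batch)
    acc = np.zeros(batch.shape[0])
    for j in range(batch.shape[1]):
        acc += batch[:, j]                   # vectorised across the vectors
        out[:, j] = acc
    return out

a = np.arange(1.0, 9.0)
batch = np.tile(a, (4, 1))
assert np.allclose(prefix_sum_scalar(a), prefix_sum_packed(batch)[0])
```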

        Speaker: Vamsi Sripathi (Intel)
      • 17:00
        Activities in Gather.Town 40m
    • 08:20 12:00
      Session 9
      Convener: Olivier Marsden (ECMWF)
      • 08:20
        Studying a new GPU treatment for chemical modules inside CAMP 20m

        A novel solution to speed up chemistry modules in atmospheric models will be presented. Chemistry Across Multiple Phases (CAMP), a new flexible treatment for gas- and aerosol-phase chemical processes, is used as our testbed. The model allows multiple chemical processes (e.g., gas- and aerosol-phase chemical reactions, emissions, deposition, photolysis and mass transfer) to be solved simultaneously as a single system. In this contribution we present not only CAMP but also the innovative solutions developed to adapt CAMP to GPU accelerators. CAMP is adapted to solve multiple atmospheric grid cells in a given spatial region as a single system, as opposed to the common practice of solving them individually. This reduces the number of solver iterations and the memory required to solve the cells in a parallel environment. In this work, the new implementation is tested on the linear solving routine. The KLU sparse solver and the Biconjugate Gradient algorithm are used for the CPU and GPU measurements, respectively. Preliminary results show up to a 35x speedup for the GPU solution compared to the original CPU version. The results also indicate a possible 70x speedup by avoiding the continuous data movement between CPU and GPU during the solve. Finally, a future version of a complete GPU chemical solver will be explored. For the complete version, we aim for 1) a fully heterogeneous version using both CPU and GPU resources and 2) minimizing and hiding data movement between host and device.
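
        The multi-cell strategy can be pictured with a small sketch (illustrative only, not CAMP's actual GPU solver): instead of solving each grid cell's sparse linear system separately, the per-cell systems are assembled into one block-diagonal system and solved in a single call, which maps much better onto a GPU than many tiny independent solves:

```python
import numpy as np
from scipy.sparse import block_diag, csc_matrix
from scipy.sparse.linalg import spsolve

# Illustrative sketch of solving many grid cells as one system (not CAMP's
# actual solver): per-cell Jacobians become blocks of one sparse matrix.
rng = np.random.default_rng(0)
n_cells, n_species = 100, 5

def cell_jacobian():
    """A small, well-conditioned stand-in for a per-cell chemistry Jacobian."""
    return csc_matrix(rng.random((n_species, n_species)) * 0.1 + np.eye(n_species))

rhs = rng.random((n_cells, n_species))
jacobians = [cell_jacobian() for _ in range(n_cells)]

# One cell at a time: n_cells separate factorisations and solves.
x_per_cell = np.stack([spsolve(J, b) for J, b in zip(jacobians, rhs)])

# All cells as a single block-diagonal system: one solve over the whole region.
A = block_diag(jacobians, format="csc")
x_batched = spsolve(A, rhs.ravel()).reshape(n_cells, n_species)

assert np.allclose(x_per_cell, x_batched)
```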

        Speaker: Mr Christian Guzman Ruiz (Barcelona Supercomputing Center)
      • 08:40
        ESCAPE 2: Energy-efficient SCalable Algorithms for weather and climate Prediction at Exascale 20m

        In the simulation of complex multi-scale flow problems, such as those arising in weather and climate modelling, one of the biggest challenges is to satisfy operational requirements in terms of time-to-solution and available energy without compromising the accuracy and stability of the solution. These competing factors require extreme computational capabilities in conjunction with state-of-the-art algorithms that optimally suit the targeted underlying hardware while improving the convergence to the desired solution. The European Centre for Medium-Range Weather Forecasts (ECMWF) is leading the ESCAPE projects, funded by Horizon 2020 under the Future and Emerging Technologies in High Performance Computing initiative. The participating models are broken down into smaller building blocks called dwarfs. These are then optimised for different hardware architectures, and alternative algorithms are investigated. It was shown that GPU optimisation which takes full advantage of the NVLink interconnect can provide a massive speedup (23x for the spectral transform and 57x for MPDATA) if all computations are run on the GPU. This work enabled ECMWF to use the GPUs of the Summit supercomputer in 1 km simulations. This talk will give an overview of the optimisations required to achieve good performance on Summit and the results obtained.

        Speaker: Andreas Mueller (ECMWF)
      • 09:00
        NEC SX-Aurora TSUBASA for your better application performance 20m

        NEC SX-Aurora TSUBASA is a PCIe-card-type vector supercomputer that inherits the technology of the SX vector supercomputer line dating back to 1983. Users can obtain vector performance from high-level languages such as C/C++, Fortran or Python. The presentation will cover the features of SX-Aurora TSUBASA and the hardware/software roadmap for the future, along with performance data showing that SX-Aurora TSUBASA is well suited to numerical weather prediction and what it can offer for exascale computing.

        Speaker: Mr Yasuhisa Masaoka (NEC Corporation)
      • 09:20
        Break 30m
      • 09:50
        Driving Numerical Weather Prediction with NVIDIA technology 20m

        The performance focus of numerical weather prediction has traditionally been on the model, and the large degree of parallelism offered by NVIDIA GPUs attracted the attention of the modelling community early on. As a result, various forecast services now run their operational models on GPUs, and it is expected that this trend will continue in the future. However, with the wealth of data being generated by models or high-resolution sensors, other parts of the forecast pipeline are becoming bottlenecks, including pre-processing, assimilation and post-processing. New forms of data-driven science enabled by AI, or digital twins offering interactive exploratory data analysis, require not only increasing levels of processing power but also advanced software infrastructure. NVIDIA's recent hardware announcements, including ARM CPUs, smart network interfaces and GPUs, as well as software layers ranging from machine learning frameworks to the Omniverse platform, are well aligned with these requirements. In this presentation I will cover some of the technologies and ongoing activities at NVIDIA that relate to numerical weather forecast systems. This includes recent progress in model acceleration, especially in light of the recently announced Grace ARM CPU, as well as results from AI-based projects, including bias correction and tropical cyclone detection. In addition, we will cover recent visualization techniques and digital twin infrastructure with the Omniverse platform.

        Speaker: Jeff Adie (NVIDIA)
      • 10:10
        Overview of the Heterogeneous Computing Project in the Weather & Climate Center of Excellence 20m

        Within the Center of Excellence in HPC, AI and Quantum Computing for Weather & Climate launched by ECMWF and Atos, one of the projects looks at developing a CPU-GPU-based version of ECMWF’s Integrated Forecasting System (IFS), preparing the product-generation pipeline as well as data-centric workflows for new technologies. This presentation gives an overview of the project activity and an outlook for the next phase.

        Speaker: Dr Erwan Raffin (Atos)
      • 10:30
        Next generation ICON NWP forecasting system on NVIDIA GPUs at MeteoSwiss 20m

        In 2016 MeteoSwiss became the first national weather service to run an operational NWP forecast model on GPUs. After five years and two generations of HPC systems based on NVIDIA GPUs, MeteoSwiss is preparing the transition from the regional COSMO NWP model to a new-generation forecasting system based on the ICON model. The new high-resolution ICON-based forecasting system will be deployed on the Alps HPC infrastructure at CSCS in 2023. Over the past years MeteoSwiss has prepared this transition by porting the ICON model to GPUs and optimizing it to replace the COSMO model. In comparison to COSMO, not only does ICON have a larger code base, but it also exhibits greater complexity in its structure and computational patterns, for example those derived from the use of the icosahedral grid. In this work we will show the challenges of running NWP models operationally on GPUs and the technologies used to enable the ICON model on GPUs, combining OpenACC with a new generation of high-level Python DSL designed for the dynamical core of ICON. Finally, we will present an outlook on developments at MeteoSwiss for future HPC system architectures.
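
        One pattern that distinguishes ICON from a regular-grid model such as COSMO is indirect addressing through cell-neighbour tables on the icosahedral grid; the snippet below is a purely illustrative NumPy sketch of that access pattern (not ICON code, nor the Python DSL mentioned above):

```python
import numpy as np

# Illustrative sketch of the indirect-addressing pattern on an unstructured
# icosahedral grid: each triangular cell reaches its three neighbours through
# a lookup table rather than fixed strides. Random connectivity for shape only.
rng = np.random.default_rng(0)
n_cells = 1000
field = rng.random(n_cells)
neighbours = rng.integers(0, n_cells, size=(n_cells, 3))  # placeholder table

def nabla2_like(field, neighbours):
    """A Laplacian-like operator: mean of the three neighbours minus the cell."""
    return field[neighbours].mean(axis=1) - field

tendency = nabla2_like(field, neighbours)
print(tendency[:5])
```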

        Speaker: Dr Xavier Lapillonne (MeteoSwiss)
      • 10:50
        Activities in Gather.Town 1h 10m
    • 13:20 16:50
      Session 10
      Convener: Patrick Gillies (ECMWF)
      • 13:20
        Machine learning, high performance computing and numerical weather prediction 20m

        This talk will present an overview on the use of machine learning at ECMWF. It will outline ECMWF's Machine Learning Roadmap for the next 5-10 years as well as the MAELSTROM EuroHPC project that will realise a co-design cycle for machine learning applications in weather and climate science. The talk will present challenges for machine learning in Earth system modelling and outline opportunities. It will discuss how machine learning can be introduced into the operational numerical weather prediction workflow, and the potential impact on computational efficiency and portability.

        Speaker: Peter Dueben (ECMWF)
      • 13:40
        Machine learning models to emulate gravity wave drag by Atos Center of Excellence 20m

        The Atos Centre of Excellence R&D team will present their collaborative work with the ECMWF team on exploring machine learning models to emulate gravity wave drag, specifically the parameterisation of non-orographic gravity wave drag.
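
        As a generic illustration of what such an emulator involves (a sketch with synthetic data only, not the Atos/ECMWF models), a regressor is trained to map atmospheric column inputs to the tendencies produced by the parameterisation scheme:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split

# Generic sketch of emulating a parameterisation (synthetic placeholder data;
# this is not the Atos/ECMWF gravity wave drag emulator): learn the mapping
# from column inputs (e.g. wind and temperature profiles) to scheme tendencies.
rng = np.random.default_rng(0)
n_columns, n_levels = 5000, 60
inputs = rng.standard_normal((n_columns, 3 * n_levels))   # stand-in u, v, T
targets = np.tanh(inputs[:, :n_levels])                   # stand-in tendencies

X_train, X_test, y_train, y_test = train_test_split(
    inputs, targets, test_size=0.2, random_state=0)
emulator = MLPRegressor(hidden_layer_sizes=(128, 128), max_iter=200)
emulator.fit(X_train, y_train)
print("R^2 on held-out columns:", emulator.score(X_test, y_test))
```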

        Speakers: Mr Alexis Giorkallos (Atos), Christophe Bovalo (Atos)
      • 14:00
        AI vs. mathematical modeling for climate and weather analysis 20m

        With the rise in the availability and importance of big data, machine learning approaches in the field of meteorology, including specifically in alpine regions, have rapidly become more prevalent in the scientific literature. Artificial intelligence is a fast-changing field, with deep learning techniques (with important applications in computer vision) popularized in the last decade. Weather and climate forecasting using machine learning approaches has been shown to enhance conventional statistical techniques. Applications break down into a few major categories, including nowcasting, short-range weather prediction, medium-range prediction, subseasonal forecasting, seasonal forecasting and climate-change prediction. In this session, we raise the questions: will artificial intelligence totally replace numerical models, and how can it supplement or enhance the performance and results of traditional mathematical models? The answer to these questions can differ between the subareas of study within meteorological modeling. For instance, "hard AI" refers to applications in which predictions on the corresponding timescales can be largely or completely replaced by artificial intelligence; in this case, physical constraints such as conservation laws can be ignored as marginal errors that do not accumulate to a significant level over time. Mobile phone data is an excellent source of data for this purpose, as it provides a large database of information to work with, which is necessary for machine learning. In general, a wide range of machine learning algorithms and models have use cases in alpine meteorology, from linear regression, to random forest ensembles (RFs), to convolutional neural networks (CNNs), to generative adversarial networks (GANs).

        Speaker: Thomas Chen (Academy for Mathematics, Science, and Engineering)
      • 14:20
        Break 30m
      • 14:50
        Tackling the EXAScaler Data Challenge 20m

        Exascale storage systems require a core architecture in software, hardware and application I/O protocol that can scale. Parallel filesystems, which feature a client that manages I/O in parallel across the server infrastructure, are an important component for removing storage backend congestion; in other words, intelligence in compute, network and storage is needed to enable holistic scaling. Good economics is also critical, and successful approaches must leverage low-cost flash to be viable. Thirdly, distributed lock management must either be very efficient, often bypassing some troublesome parts of POSIX, and/or applications require a redesign of their I/O to remove bottlenecks in lock management. We discuss DDN's experience and approach in these three areas of EXAScaler storage and explain the pros and cons of each.

        Speaker: Dr James Coomer (DDN)
      • 15:10
        First Experiences with CDI-PIO on DAOS 20m

        CDI-PIO is the parallel I/O component of the Climate Data Interface (CDI), which is developed and maintained by the Max Planck Institute for Meteorology and DKRZ. It is used by ICON, MPIOM, ECHAM, and the Climate Data Operator (CDO) toolkit. The two main I/O paths for output data are writing GRIB files using MPI-IO, and writing NetCDF4 files using HDF5 (which may then also use MPI-IO or other VOL plugins).
        The Distributed Asynchronous Object Storage (DAOS) is a new open source high performance object store for storage class memory and NVMe storage, which has been integrated into the ROMIO MPI-IO implementation. The HDF5 consortium is also developing a native HDF5 VOL plugin for DAOS.
        This presentation will outline how CDI-PIO can be run on a DAOS storage system using the ROMIO DAOS backend. We will also report first performance results comparing Intel DAOS and IBM Spectrum Scale on similar NVMe storage hardware.
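
        For illustration, collective MPI-IO output of the kind used by CDI-PIO's GRIB path looks the same from the application side whichever ROMIO backend serves it; the mpi4py sketch below is illustrative only, and the "daos:" filename prefix used to select the ROMIO DAOS driver is an assumption in this example:

```python
from mpi4py import MPI
import numpy as np

# Sketch of collective MPI-IO output as used by a GRIB write path; with ROMIO,
# a DAOS container can be targeted through its ADIO driver (the "daos:" prefix
# below is an assumption made for this illustration).
comm = MPI.COMM_WORLD
rank = comm.Get_rank()

record = np.full(1024, rank, dtype=np.uint8)   # this rank's share of a record

fh = MPI.File.Open(comm, "daos:output.grb",
                   MPI.MODE_CREATE | MPI.MODE_WRONLY)
fh.Write_at_all(rank * record.nbytes, record)  # collective write, per-rank offset
fh.Close()
```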

        Speakers: Mr Michael Hennecke (Lenovo), Mr Thomas Jahns (DKRZ)
      • 15:30
        Accelerating Storage with Optane & DAOS 20m

        The Distributed Asynchronous Object Storage (DAOS, see http://daos.io) is an open source scale-out storage system designed from the ground up to deliver high bandwidth, low latency, and high I/O operations per second (IOPS) to HPC applications. It enables next-generation data-centric workflows that combine simulation, data analytics, and AI. This talk will first provide an overview of DAOS capabilities, then introduce how DAOS can accelerate meteorology workloads and finally present the roadmap and future features.

        Speakers: Johann Lombardi (Intel), Nicolau Manubens Gil (ECMWF)
      • 15:50
        Navigating the evolving path to exascale with NCAR’s Derecho 20m

        In 2018, NCAR began its efforts to design and procure the successor to its current 5.34-petaflops Cheyenne system. More challenging than past procurements, the effort faced a dynamic landscape in terms of application evolution, including GPU-ready models and machine learning; scientific demands, including convection-permitting Earth system modeling and subseasonal-to-decadal Earth system prediction; and a more diverse range of feasible technology options than had been available in nearly a decade. With scientific and computational advice from the user community and co-design efforts with vendors, NCAR navigated this complex landscape to design and procure the Derecho system and drive its next steps on the path to exascale Earth system science.

        Speaker: Dave Hart (NCAR)
      • 16:10
        Closing remarks 10m
        Speaker: Isabella Weger (ECMWF)
      • 16:20
        Activities in Gather.Town & Close 30m