Workshop on software strategies for sustainable physical modelling
Workshop themes
- Programming Models
The choice of programming language has always been central to Earth system and wider model development. Fortran has served the community well for decades, with the IFS/Arpege code base standing as a world-leading example. Yet, with the evolution of HPC architectures—especially the rise of GPUs and domain-specific accelerators—questions are growing around Fortran’s long-term viability, particularly regarding compiler support and optimisation[1].
Python has emerged as a strong contender. Widely adopted in machine learning and scientific computing, Python offers flexibility, modularity, and ease of use for domain scientists. It supports hybrid modelling approaches that combine physics-based and machine-learned components, and its interactive capabilities make it ideal for diagnostics and plotting. However, Python’s performance limitations on HPC systems require careful use of optimised compilation backends such as GT4Py (developed in Europe for weather and climate models), Numba, Nvidia’s Warp, or MLIR. These tools enable Python to meet the demands of high-performance computing, but they also introduce new layers of complexity and maintenance.
The ongoing modularisation of the IFS is a key enabler of this transition. By breaking the monolithic code base into smaller, self-contained packages with clear APIs, it becomes possible to choose the most appropriate language for each component. This flexibility reduces the pressure to select a single language for the entire system. Moreover, recent advances in machine learning—including LLM-based tools—are beginning to support code translation and transformation across languages, further reducing barriers to innovation.
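As a purely illustrative sketch of what such an optimised backend can look like in practice, the example below uses Numba (one of the backends mentioned above) to JIT-compile and parallelise a simple column-wise diffusion kernel. The field names, sizes, and coefficients are hypothetical and not taken from the IFS.

```python
import numpy as np
from numba import njit, prange

# Illustrative kernel: an explicit vertical diffusion step over independent
# columns, the kind of loop nest a compilation backend can parallelise.
@njit(parallel=True, fastmath=True)
def vertical_diffusion(t, kdiff, dt, dz):
    ncol, nlev = t.shape
    t_new = t.copy()
    for c in prange(ncol):               # columns are independent -> parallel loop
        for k in range(1, nlev - 1):
            lap = (t[c, k + 1] - 2.0 * t[c, k] + t[c, k - 1]) / dz**2
            t_new[c, k] = t[c, k] + dt * kdiff * lap
    return t_new

t = np.random.rand(10_000, 137)          # hypothetical: 10k columns, 137 levels
t_next = vertical_diffusion(t, kdiff=1.0e-2, dt=60.0, dz=100.0)
```

The same Python source can, in principle, be retargeted by swapping the backend rather than rewriting the kernel, which is precisely the flexibility (and the added maintenance surface) discussed above.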
[1] Shipman and Randles, 2023: An evaluation of risks associated with relying on Fortran for mission critical codes for the next 15 years, https://www.osti.gov/servlets/purl/1970284/
- Composition and Sharing
The current IFS comprises multiple complex components, with the core atmospheric system developed and maintained jointly with the global and local models operated by Météo-France and ACCORD, all within a shared code base. Recent technical refactoring to enable GPU capabilities has highlighted the limitations of this monolithic structure, which contrasts sharply with the agility required for future physical forecasting systems. Efforts are underway at ECMWF to modularise key components—both shared and organisation-specific—to enhance maintainability, test coverage, and adaptability to emerging computing architectures. Select standalone components are being released as open-source projects, facilitating collaboration with academic and industrial partners. This approach has proved especially effective in supporting GPU adaptation. Our goal is to evolve towards a system architecture composed of individually tested component libraries, enabling granular adaptation to diverse programming models and HPC hardware.
Given the increasing diversity of Earth system components—including spectral and finite volume dynamical cores, ocean, wave, flood, and atmospheric chemistry models—and the potential shift towards high-level Python-based paradigms, it is essential to improve how these components are coupled and integrated into complete Earth system or data assimilation models. As we move towards a modular architecture, it is crucial to design coupling strategies that accommodate both physics-based components and data-driven models. This includes ensuring consistent data interfaces, reproducibility, and scalability so that machine-learned elements can complement traditional numerical schemes without compromising scientific integrity or operational robustness.
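One way to make the idea of consistent data interfaces concrete is sketched below: a minimal Python protocol that lets a physics-based component and an ML emulator be driven interchangeably by the same coupling loop. The names (`Component`, `step`, `run`) are hypothetical illustrations, not existing IFS or coupler APIs.

```python
from typing import Dict, List, Protocol
import numpy as np

State = Dict[str, np.ndarray]            # named fields on an agreed grid and ordering

class Component(Protocol):
    """Minimal contract a coupler would rely on (illustrative only)."""
    def step(self, state: State, dt: float) -> State: ...

class PhysicsRadiation:
    def step(self, state: State, dt: float) -> State:
        # placeholder for a traditional parametrization call
        state["temperature"] = state["temperature"] + dt * 1.0e-5
        return state

class MLRadiationEmulator:
    def __init__(self, model):           # e.g. a trained network honouring the same field contract
        self.model = model
    def step(self, state: State, dt: float) -> State:
        # placeholder: the emulator consumes and produces the same named fields
        state["temperature"] = self.model(state["temperature"], dt)
        return state

def run(components: List[Component], state: State, dt: float, nsteps: int) -> State:
    for _ in range(nsteps):
        for comp in components:          # same loop regardless of implementation
            state = comp.step(state, dt)
    return state

state = run([PhysicsRadiation()], {"temperature": np.full(1000, 288.0)}, dt=60.0, nsteps=10)
```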
- Deployment
Through Destination Earth, ECMWF has gained experience deploying the IFS across multiple external HPC systems with diverse architectures. This capability marks a major step forward, but achieving the agility and portability seen in modern data-driven models—such as AIFS—remains a challenge. These models support “forecast-in-a-box” deployment, enabling rapid scaling from laptops to cloud platforms and operational HPC centres.
To move in this direction, containerisation and modular, high-level frameworks are key. Domain-specific languages (DSLs) such as GT4Py for physical model components, together with machine learning frameworks such as PyTorch, offer promising pathways to unify workflows across heterogeneous environments. Our goal is to evolve IFS deployment into a flexible, reproducible system that supports hybrid physics–ML approaches and adapts seamlessly to future HPC architectures.
This requires rethinking packaging, dependency management, and orchestration tools, as well as improving workflows for testing and validation across platforms. By leveraging container technologies, modular APIs, and automated pipelines, we aim to make deployment as dynamic as model development itself.
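As a small, hedged illustration of the portability such frameworks offer, the sketch below runs the same hypothetical ML component unchanged on a laptop CPU or a GPU node, selecting the device at runtime with standard PyTorch calls; the architecture and sizes are placeholders.

```python
import torch

# Hypothetical ML physics component; the architecture and sizes are placeholders.
class TinyEmulator(torch.nn.Module):
    def __init__(self, nlev: int = 137):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(nlev, 256), torch.nn.GELU(), torch.nn.Linear(256, nlev)
        )

    def forward(self, profiles: torch.Tensor) -> torch.Tensor:
        return self.net(profiles)

# The same code path runs from laptop to GPU node: only the device changes.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = TinyEmulator().to(device)
profiles = torch.randn(1024, 137, device=device)     # a batch of columns
with torch.no_grad():
    tendencies = model(profiles)
```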
- Hybrid modelling and the Future Data Assimilation System
A central challenge will be supporting hybrid modelling that combines physical and statistical approaches, including emulation of physical model components and nudging techniques. From a technical perspective, integrating ML poses challenges because the domain decomposition strategies intrinsic to physical models often conflict with the global or non-local architectures typical of ML emulators. Overcoming these challenges requires carefully designed interfaces that reconcile the spatial partitioning of numerical models with ML requirements, ensuring consistent data exchange, scalability, and performance.
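One hedged sketch of such an interface is given below: it uses mpi4py to gather a domain-decomposed field onto every rank before applying a placeholder non-local ML emulator, then returns to the local decomposition. The layout and the all-gather strategy are illustrative assumptions, not a description of how the IFS handles this.

```python
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# Each rank owns a contiguous slice of the global field (hypothetical layout).
nlocal = 1000
local_field = np.random.rand(nlocal)

def nonlocal_emulator(global_field: np.ndarray) -> np.ndarray:
    # placeholder for an ML component that needs the full, undecomposed field
    return global_field - global_field.mean()

# 1. Reconcile decompositions: gather the local slices into one global array.
global_field = np.concatenate(comm.allgather(local_field))

# 2. Apply the global, non-local ML component identically on every rank.
global_out = nonlocal_emulator(global_field)

# 3. Return to the physical model's decomposition: keep only the local slice.
local_out = global_out[rank * nlocal:(rank + 1) * nlocal]
```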
Hybrid modelling is also relevant for data assimilation, which will remain a fundamental component of future forecasting systems, not only because it provides high-quality training data for machine learning (ML) models, but also because of its unmatched ability to produce high-quality initial conditions. This calls for a flexible framework that enables seamless integration of ML models within the data assimilation process. ML integration will extend beyond hybrid models to important applications such as modelling background and observation error covariance structures with ML algorithms, as well as emulating observation operators, tangent-linear, and adjoint models. These advances aim to enhance assimilation performance while reducing computational costs through more efficient variational assimilation.
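For the tangent-linear and adjoint aspect, the sketch below illustrates how automatic differentiation could provide TL and adjoint actions of an emulated observation operator; the operator itself is a hypothetical stand-in, and only standard PyTorch autograd utilities (jvp, vjp) are used.

```python
import torch
from torch.autograd.functional import jvp, vjp

# Hypothetical emulated observation operator H: model state -> observation space.
obs_operator = torch.nn.Sequential(
    torch.nn.Linear(100, 64), torch.nn.Tanh(), torch.nn.Linear(64, 20)
)

x = torch.randn(100)     # model state (illustrative size)
dx = torch.randn(100)    # perturbation in state space
dy = torch.randn(20)     # perturbation in observation space

# Tangent-linear action H'(x) dx via a Jacobian-vector product.
_, tl_dx = jvp(lambda s: obs_operator(s), x, dx)

# Adjoint action H'(x)^T dy via a vector-Jacobian product.
_, adj_dy = vjp(lambda s: obs_operator(s), x, dy)
```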
By embracing this hybrid modelling approach, the system should support the simultaneous optimisation of physical states and ML-driven components, both in the forward model and within assimilation cycles, fostering innovations that combine established numerical methods with modern data-driven techniques.
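A hedged sketch of what simultaneous optimisation could mean technically is given below: a toy variational cost is minimised jointly over a state vector and the parameters of a small ML correction term. All operators, sizes, and weightings are placeholders chosen for illustration only.

```python
import torch

n, nobs = 100, 20
xb = torch.randn(n)                        # background state (placeholder)
y = torch.randn(nobs)                      # observations (placeholder)
H = torch.randn(nobs, n)                   # linear observation operator (placeholder)

x = xb.clone().requires_grad_(True)        # control variable 1: the physical state
ml_correction = torch.nn.Linear(n, n)      # control variable 2: ML component parameters

opt = torch.optim.Adam([x, *ml_correction.parameters()], lr=1e-2)

for _ in range(200):
    opt.zero_grad()
    x_hybrid = x + ml_correction(x)        # hybrid forward model: state plus ML correction
    j_b = torch.sum((x - xb) ** 2)         # background term (identity B for simplicity)
    j_o = torch.sum((H @ x_hybrid - y) ** 2)  # observation term (identity R for simplicity)
    cost = j_b + j_o
    cost.backward()
    opt.step()
```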
- Workflow orchestration
As high-performance scientific codes evolve towards modular architectures, orchestrating workflows across diverse components and platforms becomes a critical challenge. The traditionally tightly coupled pipelines of the IFS are giving way to flexible, service-oriented designs that can seamlessly integrate physics-based modules, machine-learning components, and data assimilation systems. Future workflows must support scalability, portability, and reproducibility across heterogeneous environments, from laptops and cloud platforms to operational HPC centres.
Designing a flexible and scalable orchestration framework that supports both traditional forecasting and data assimilation workflows, as well as emerging data-driven methods, is a significant challenge. We wish to explore the key architectural choices needed to enable such a framework—one that empowers a new generation of operational suites by combining modularity, interoperability, and dynamic resource management. Our goal is to identify how these design principles can strengthen agility, reproducibility, and innovation across research and operational environments.
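To ground the discussion, the sketch below shows one minimal way such modularity could look at the workflow level: tasks declared with explicit dependencies and executed in dependency order using Python's standard library. The task names and the toy scheduler are illustrative only and do not describe any existing ECMWF suite.

```python
from graphlib import TopologicalSorter

# Illustrative tasks; each callable could wrap a containerised physics component,
# an ML inference step, or a data assimilation cycle.
def assimilate():
    print("run data assimilation")

def forecast():
    print("run (hybrid) forecast model")

def products():
    print("generate products and verification")

# Dependencies expressed as data: forecast needs assimilation, products need the forecast.
graph = {"forecast": {"assimilate"}, "products": {"forecast"}}
tasks = {"assimilate": assimilate, "forecast": forecast, "products": products}

for name in TopologicalSorter(graph).static_order():
    tasks[name]()   # a real orchestrator would dispatch these to HPC or cloud resources
```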