1ECMWF
The Copernicus Atmosphere Monitoring Service (CAMS), operated by the European Centre for Medium-Range Weather Forecasts (ECMWF) on behalf of the European Commission, provides daily analyses in near-real time (NRT) and 5-day forecasts of atmospheric composition, as well as a reanalysis of atmospheric composition going back to 2003. This is accomplished by assimilating air quality and greenhouse gas retrievals from a range of satellite sensors into ECMWF's Integrated Forecasting System (IFS), which includes online chemistry routines.
While the assimilation of atmospheric composition data is beneficial in its own right, providing the best possible initial conditions for the 5-day CAMS composition forecasts, there are exciting and challenging potential feedbacks between composition and NWP assimilation that are worth exploring. For example, using ozone, aerosols or greenhouse gases interactively in the model's radiation scheme or in the radiative transfer observation operator could improve NWP forecasts and might allow us to make better use of the assimilated radiances. Ozone-wind tracing in the ECMWF 4D-Var system has a beneficial impact on ozone analyses and forecasts in the NWP version of the IFS, and NWP forecast benefits are also seen when the humidity analysis is extended into the stratosphere.
In this talk we will explore some of these feedbacks and highlight the benefits as well as the growing complexity and challenges that composition forecasts can bring to the Earth system model and data assimilation system.
1ECMWF
Data assimilation for Numerical Weather Prediction (NWP) has traditionally been concerned with estimating initial conditions for forecasting the weather and, increasingly, the future evolution of the Earth system on scales ranging from days to months. This remains DA's core business, and there are numerous directions in which near- and medium-term DA developments hold promise to significantly extend current predictive capabilities. We will briefly review these DA research directions from an ECMWF perspective and discuss some of the opportunities and challenges ahead.
At the same time, the recent advent of machine learning has changed the landscape of NWP and Earth-system prediction, offering new opportunities but also challenges for the sustainability of current R&D and operational practices in our domain. In the second part of the talk, we will give an overview of the activities ongoing at ECMWF aimed at hybridising the current NWP workflow with machine learning technologies, and make the case that these activities offer a more effective and sustainable development path than the uncritical adoption of fully data-driven emulators.
1MIDS, KU Eichstaett-Ingolstadt
Numerical weather prediction (NWP) has undergone a profound revolution in recent decades. At the same time, the volume and diversity of atmospheric observations have expanded dramatically, including enhanced measurements of atmospheric water. The advent of deep learning in geophysical modeling has further transformed the field, leading to the development of hybrid NWP models that combine physical modeling with machine learning techniques. In some experimental cases, machine learning models, trained on reanalysis data produced by assimilating observations into physical models, have even replaced traditional physical models.
Building on these exciting advancements in numerical modeling, we examine perspectives of data assimilation and its integration with machine learning. Our discussion highlights the critical importance of respecting physical constraints and accurately representing uncertainties, particularly in high-resolution models. Such models must capture physical processes across scales ranging from a single kilometer to thousands of kilometers, including the complex dynamics of water phase transitions.
1ECMWF
Starting circa 2022, global data-driven numerical weather prediction (NWP) models have become competitive with leading physics-based systems such as ECMWF's Integrated Forecasting System (IFS) across a number of headline skill scores. Currently, these AI-powered models invariably rely on conventional weather (re)analyses for initialization and training. The question then arises: can machine learning be used to build an end-to-end forecasting system, thus bypassing the need for a traditional analysis?
Recent attempts to emulate 4D-Var have shown some promise, but have proven unable to produce a weather analysis of comparable resolution and quality to that routinely obtained with traditional data assimilation methods such as 4D-Var.
Over the past year, ECMWF has been exploring a radically different data-driven approach to learning a medium-range weather forecast exclusively from Earth System observations, called Artificial Intelligence Direct Observation Prediction (AI-DOP). AI-DOP seeks to learn physical dynamics and processes in the Earth System by leveraging the relationships between observed quantities (e.g., microwave satellite brightness temperatures, GPS-RO bending angles, or altimeter wave height data) and geophysical variables such as temperature and winds, using only historical time series of satellite and conventional observations - i.e., with no climatology and/or NWP (re)analysis inputs or feedbacks.
An in-depth look into AI-DOP forecasts offers good indications that the system is able to build a coherent internal representation of the Earth System state. The relationships that the network learns between different observed variables generalize to areas where no observations exist - for example, forecasts of upper-level winds compare well with ERA5 even in areas where there is no radiosonde or aircraft coverage.
This talk will give an overview of AI-DOP and its current development status, and discuss possible ways forward towards a data-driven, end-to-end system that can compete with the IFS in the medium range.
1ECMWF
While most Earth-system assimilation research exists within the purely physical or purely empirical domains, a hybrid of the two may ultimately provide the best results. From a Bayesian perspective, it is only by combining prior knowledge (physical equations) with observations that the most accurate analysis (or posterior) is obtained. More practically, only physical equations are capable of accessing unobserved physical variables. A granular hybrid of data assimilation and machine learning would retain the better-known physical components to provide strong constraints around which more poorly known or empirical model components, such as physical parametrizations, can be learned or improved from observations. This level of hybridization ultimately requires a complete overhaul of our data assimilation approaches and new machinery to support it. To adapt to evolving model components and new observations, as well as to cope with the changing climate, it is unclear whether the hybrid should evolve continuously, or whether the model should stay fixed apart from during defined training phases. Despite all this, pioneering hybrid approaches are already becoming part of the operational forecasting system, where they allow rapid expansion to new domains, particularly the land, snow, sea-ice and ocean surfaces. One example is the assimilation of microwave observations over sea ice. There are few "ground truth" observations in polar regions with which to train any conventional machine learning method, so the sea-ice framework needs the hybrid approach to deal with the otherwise poor knowledge of both the physical state and the required physical models. In this time of rapid evolution of Earth-system assimilation and forecasting in response to machine learning and empirical methods, the granular hybrid approach is probably the least well explored, but perhaps the most promising, avenue for future progress.
1Ecole des Ponts ParisTech
Machine learning (ML), and more specifically deep learning (DL), is increasingly used in geophysical data assimilation (DA), whether serving as a tool to improve classical DA, being combined with DA, or offering a substitute for DA. I will give an overview of the recent achievements and promising routes for bringing ML into DA. For instance, ML can be leveraged in the regularisation of ensemble-based DA, in the solvers of variational DA methods, for generating or augmenting ensembles in DA, for building surrogates of the tangent-linear and adjoint models to be used within DA, to learn a model error correction within a weak-constraint 4D-Var framework, or, ultimately, as a replacement for the DA analysis. I will also present an example where DL unveils a potentially efficient analysis scheme that the DA community has so far completely overlooked.
1CIRES / PSL / NOAA, 2CIWRO/NSSL, 3NOAA National Severe Storms Lab, 4NOAA EMC, 5NOAA, 6NOAA PSL, 7CIRES / NOAA ESRL PSD, 8UCAR/JCSDA, 9NOAA/NWS/NCEP/Environmental Modeling Center, 10University of Wisconsin - Madison, Space Science and Engineering Center, Cooperative Institute for Meteorological Satellite Studies, 11NOAA/NWS/NCEP/EMC
Machine learning (ML) models trained on the reanalysis record are now the most computationally efficient way to provide highly accurate medium-range weather forecasts, given an initial state generated by traditional (and highly computationally expensive) data assimilation methods. Can we use the power of ML methods to enable faster, more accurate data assimilation? This presentation reviews several directions that NOAA is pursuing to incorporate ML into the data assimilation pipeline. These include ML model emulators to propagate the ensemble forecast for the ensemble Kalman filter; ML-based tangent-linear and adjoint models in the context of 4D-Var; ML-based balance and observation operators; ML-based observation bias correction; and early exploration of direct ML-based data assimilation.
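One piece of this pipeline can be illustrated with a short sketch: once a forecast step is emulated by a neural network, its tangent-linear and adjoint follow directly from automatic differentiation. The snippet below is a hypothetical illustration in PyTorch (the surrogate network, sizes and data are placeholders, not NOAA's operational models), using torch.autograd.functional.jvp and vjp; the final check verifies the adjoint identity <M'dx, dy> = <dx, M'^T dy>.

    # Hypothetical illustration: the surrogate below is a stand-in for an ML
    # forecast step; jvp/vjp give its tangent-linear and adjoint actions.
    import torch
    from torch.autograd.functional import jvp, vjp

    n = 64                                     # toy state dimension
    surrogate = torch.nn.Sequential(           # placeholder ML forecast step
        torch.nn.Linear(n, 128), torch.nn.Tanh(), torch.nn.Linear(128, n)
    )

    def step(x):                               # x_{k+1} = M_theta(x_k)
        return surrogate(x)

    x0 = torch.randn(n)                        # linearisation state (toy data)
    dx = torch.randn(n)                        # state perturbation
    dy = torch.randn(n)                        # sensitivity from the cost function

    _, tl = jvp(step, x0, dx)                  # tangent-linear: M'(x0) dx
    _, ad = vjp(step, x0, dy)                  # adjoint: M'(x0)^T dy

    # Adjoint identity <M' dx, dy> == <dx, M'^T dy> (up to round-off).
    print(torch.dot(tl, dy).item(), torch.dot(dx, ad).item())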
1University of Reading
In recent decades, steady improvements in numerical weather prediction (NWP) skill have been driven by enhancements to data assimilation (DA) methods and increasing volumes of assimilated observations. Nevertheless, only 5-10% of available satellite data is assimilated in operational systems, in part due to a lack of 1) knowledge of observation uncertainty structures and 2) appropriate fast numerical techniques. We will discuss recent progress in estimating and accounting for observation-error correlations, potentially allowing the use of denser observations in operations. We will give examples for different observation types, including Doppler radar, geostationary and polar-orbiting satellite data.
Given this progress, it is important to ask how prior and observation-error correlations interact and how this affects the value of the observations in analyses produced via conventional data assimilation techniques. In an optimal system, the reduction in analysis-error variance and the spread of information are shown to be greatest when the observation and prior errors have complementary statistics. This can be explained in terms of the relative uncertainty of the observations and the prior on different spatial scales.
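As a minimal per-scale sketch of this argument (scalar notation introduced here purely for illustration, not the talk's full multivariate treatment), let \sigma_b^2(k) and \sigma_o^2(k) denote the background- and observation-error variances projected onto spatial scale k. The optimal analysis-error variance and the variance reduction at that scale are

\[
\sigma_a^2(k) = \left(\frac{1}{\sigma_b^2(k)} + \frac{1}{\sigma_o^2(k)}\right)^{-1}
             = \frac{\sigma_b^2(k)\,\sigma_o^2(k)}{\sigma_b^2(k) + \sigma_o^2(k)},
\qquad
\sigma_b^2(k) - \sigma_a^2(k) = \frac{\sigma_b^4(k)}{\sigma_b^2(k) + \sigma_o^2(k)},
\]

so the largest reductions occur at scales where \sigma_o^2(k) is small relative to \sigma_b^2(k). Observations whose errors are smallest precisely at the scales where the prior is most uncertain, i.e. with complementary statistics, therefore yield the greatest overall benefit.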
In the new paradigm of machine learning weather prediction (MLWP), uncertainty and information content remain important concepts. Overfitting and underfitting are well-known issues that can arise. Overfitting occurs when an ML model learns the noise as well as the signal in the training data, resulting in poor performance when confronted with new, independent data. Underfitting occurs when the training data do not contain enough information, so the ML model cannot make satisfactory predictions.
Future observing systems may be required to provide data driving both NWP and MLWP forecasts. How the information content of observations relates to the information encoded in the weights of a neural network is therefore a key open question for future research.
1UK Met Office, 2MetOffice
Since 2020, the Met Office has worked on the reformulation of both its observation processing and its data assimilation system to ensure that the expected capabilities of the next-generation HPC are fully exploited. To achieve this, the Met Office has adopted the JEDI code framework (Joint Effort for Data assimilation Integration) and initiated a strong collaboration with the JCSDA (Joint Center for Satellite Data Assimilation). This collaboration has allowed rapid development of new data assimilation methods, tools, and observation quality control procedures.
In this presentation, we will give an overview of both our new JEDI-based observation processing application (JOPA) and our new JEDI-based Application for Data Assimilation (JADA), as well as key components such as the hybrid tangent-linear model. Scientific and technical challenges and achievements will also be presented.
1University of Hamburg
At time scales of one week and beyond, the predictability of day-to-day weather is a global problem, as the quality of midlatitude forecasts is influenced by large-scale tropical processes and the underlying strength of tropics-extratropical coupling (TEC). Advancements in extratropical practical predictability have been argued to be coupled to improvements in tropical initial states and a better representation of TEC-related processes in NWP models. Additionally, equatorially trapped waves such as Kelvin and mixed Rossby-gravity waves have been hypothesized to contribute to longer predictability in the tropics relative to the extratropics, potentially benefiting midlatitude predictability. However, the underlying mechanisms remain unclear, and no evidence from ECMWF data supports extended predictability of these waves.
A well-established approach to studying the impact of tropical forecast errors on extratropical forecast skill involves nudging tropical forecasts toward analyses. While this method suppresses tropical error growth and isolates extratropical forecast errors, it prevents the study of TEC processes in subtropical regions, where both midlatitude circulation and tropical forcing play roles.
I will present a novel framework for investigating the role of the tropics in global predictability, implemented across a hierarchy of models. This approach applies observing system experiments (OSEs) or observing system simulation experiments (OSSEs) that assimilate observations exclusively within the tropics or the extratropics. These can be seen as observation-denial experiments, but instead of excluding specific observation types, the OSSEs confine observations to, or withhold them from, the tropical belt. I will discuss latitude- and altitude-dependent analyses of forecast error growth in balanced and unbalanced wave circulation, focusing on subtropical regions and on TEC processes associated with poleward-propagating wave signals originating from tropical initial conditions.
1University of Bologna, 2CEREA, ENPC, 3NERSC, 4RMI, 5Plymouth Marine Laboratory, 6University of Reading & National Centre for Earth Observation, 7Ecole des Ponts ParisTech, 8MERCATOR, 9CEREA, École des Ponts and EDF R&D (France), 10Royal Meteorological Institute of Belgium, 11University of Reading
In recent years, data assimilation (DA), and more generally the climate-science modelling enterprise, has been influenced by the rapid advent of artificial intelligence, in particular machine learning (ML), opening the path to various forms of ML-based methodology.
In this talk we will schematically show how ML can be included in the prediction and DA workflow in different ways, with various degrees of integration. In a so-called “non-intrusive” use of ML, we will show how ML can supplement a chaotic system and help predict local instabilities and/or abrupt regime changes. DA and ML can also be placed side by side in an iterative approach, alternating a DA step that assimilates sparse and noisy data with an ML step whereby the data-driven model is further optimised against the analyses produced by the DA. At a further level of fusion, ML can be used within hybrid ML-DA methods to cope with some limitations of DA approaches. In particular, we shall show an innovative formulation of the EnKF that embodies a variational autoencoder, enabling the EnKF to (i) handle non-Gaussian observations and (ii) respect physical balances.
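As a rough, hypothetical sketch of the latent-space idea (the encode/decode functions below are toy placeholders rather than the variational autoencoder of the talk, and the update shown is a standard stochastic EnKF), an ensemble can be encoded, updated in latent space against the observations, and decoded back to physical space:

    # Hypothetical sketch: a stochastic EnKF update carried out in the latent
    # space of an (here, toy) encoder/decoder pair.
    import numpy as np

    rng = np.random.default_rng(0)
    n, m, p, Ne = 40, 8, 10, 20              # state, latent, obs, ensemble sizes
    W = rng.standard_normal((m, n)) / np.sqrt(n)

    def encode(x):                            # placeholder for a trained encoder
        return np.tanh(W @ x)

    def decode(z):                            # placeholder for the matching decoder
        return W.T @ np.arctanh(np.clip(z, -0.99, 0.99))

    def obs_op(x):                            # toy observation operator
        return x[:p]

    R = 0.1 * np.eye(p)                       # observation-error covariance
    truth = rng.standard_normal(n)
    y = obs_op(truth) + rng.multivariate_normal(np.zeros(p), R)

    Xf = truth[:, None] + rng.standard_normal((n, Ne))       # prior ensemble (toy)
    Zf = np.stack([encode(Xf[:, i]) for i in range(Ne)], 1)  # encode to latent space
    Yf = np.stack([obs_op(decode(Zf[:, i])) for i in range(Ne)], 1)

    Za, Ya = Zf - Zf.mean(1, keepdims=True), Yf - Yf.mean(1, keepdims=True)
    K = (Za @ Ya.T / (Ne - 1)) @ np.linalg.inv(Ya @ Ya.T / (Ne - 1) + R)

    Yp = y[:, None] + rng.multivariate_normal(np.zeros(p), R, Ne).T   # perturbed obs
    Zanl = Zf + K @ (Yp - Yf)                 # EnKF update in latent coordinates
    Xa = np.stack([decode(Zanl[:, i]) for i in range(Ne)], 1)         # back to state space
    print(np.abs(obs_op(Xa.mean(1)) - obs_op(truth)).mean())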
Using a set of idealised models and observational scenarios, we will show numerical results for all of the above-mentioned possibilities. We will focus on, and be motivated by, problems originating in diverse areas of climate science, namely chaotic systems such as the atmosphere and the highly nonlinear and non-Gaussian DA problems of Arctic sea ice and ocean biogeochemistry.
1RIKEN, 2IMT-Atlantique, 3IMT Atlantique
At RIKEN, Japan's national flagship research institute covering all sciences, we have been exploring several approaches to integrating data assimilation (DA) and AI/ML. DA integrates the (usually process-driven) model and data, while AI/ML is purely data-driven and has proven very powerful in many applications. One example is to integrate data-driven AI/ML-based precipitation nowcasting with process-driven numerical weather prediction (NWP). We developed a nowcasting system based on a convolutional long short-term memory (ConvLSTM) network, which takes several time steps of 2-D precipitation images to predict future images. NWP with radar DA produces future precipitation images, which can be input to the data-driven ConvLSTM to further improve the predicted images. Another example is the development of ML-based observation operators for satellite radiances; we obtained an improvement with purely ML-based observation operators that use no information from a physically based radiative transfer model. A third example uses DA with an ML-based surrogate model to produce more accurate analyses, which are then used to further train the surrogate. We found that DA with a flow-independent background error covariance could produce a more accurate surrogate model, whereas ensemble-based DA gave mixed results, probably because the ensemble forecasts from the surrogate may not produce proper error covariances. We also explored developing a limited-area ML-based surrogate NWP model in collaboration with IMT-Atlantique. In this presentation, we will share the most recent activities on integrating DA and AI/ML at RIKEN.
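As a minimal, hypothetical sketch of an ML-based observation operator (the architecture, sizes and data below are placeholders, not RIKEN's configuration), a small network can be fitted to collocated model columns and observed brightness temperatures and then used in place of a radiative transfer model when computing innovations:

    # Hypothetical sketch: a small network maps a model column (temperature and
    # humidity profile) to satellite brightness temperatures; data are random
    # placeholders standing in for collocated model/observation pairs.
    import torch

    n_levels, n_channels = 60, 10
    h_op = torch.nn.Sequential(
        torch.nn.Linear(2 * n_levels, 128), torch.nn.ReLU(),
        torch.nn.Linear(128, n_channels),
    )
    opt = torch.optim.Adam(h_op.parameters(), lr=1e-3)

    x = torch.randn(1024, 2 * n_levels)       # toy model columns
    y = torch.randn(1024, n_channels)         # toy "observed" brightness temperatures

    for epoch in range(100):                  # fit H_ML(x) ~ y
        opt.zero_grad()
        loss = torch.mean((h_op(x) - y) ** 2)
        loss.backward()
        opt.step()

    # In the DA system, H_ML then replaces the radiative transfer model when
    # computing innovations y - H(x).
    innovations = y - h_op(x).detach()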
1ECMWF
Enhancing the spatial resolution and increasing the update frequency of global analyses and forecasts enables weather prediction systems to better capture rapidly evolving atmospheric conditions. The Integrated Forecasting System (IFS) of ECMWF comprises an Earth-system model coupled with an advanced 4D-Var data assimilation (DA) system, which has recently been experimentally upgraded to higher spatial resolution and more frequent updates.
We developed and tested an updated ECMWF 4D-Var DA system, called Extending Window (Ext-Win) DA, which increases the frequency of analysis updates from the current 6-hour interval to as frequently as hourly. The Ext-Win DA system employs assimilation windows of varying lengths, ranging from 5 to 14 hours, to incorporate the most recently available observations and provide optimal estimates of the Earth-system state at any time of day. Considering computational constraints and product dissemination requirements, we propose a practical implementation of the Ext-Win framework that upgrades the ECMWF DA system to provide analysis and forecast updates every 3 hours. The impact of this new framework is evaluated in terms of forecast skill, convergence rate, and computational footprint.
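Purely as a hypothetical illustration of the extending-window idea (the actual Ext-Win window start times, lengths and observation cut-offs are not specified here; only the 5-14 hour range and 3-hourly updates are taken from the text), a schedule whose windows grow from a fixed start until a maximum length is reached and then reset reproduces window lengths in that range:

    # Hypothetical illustration only: 3-hourly analyses whose assimilation
    # windows extend from a fixed start time until a maximum length is reached,
    # after which a new window is opened.
    from datetime import datetime, timedelta

    MIN_LEN, MAX_LEN, UPDATE = 5, 14, 3       # hours (bounds from the abstract)

    def window_for(analysis_time, window_start):
        """Return (start, end) of the assimilation window ending at analysis_time."""
        length = (analysis_time - window_start).total_seconds() / 3600
        if length > MAX_LEN:                  # window too long: open a new one
            window_start = analysis_time - timedelta(hours=MIN_LEN)
        return window_start, analysis_time

    t = datetime(2024, 1, 1, 0)
    start = t - timedelta(hours=MIN_LEN)
    for cycle in range(6):                    # six consecutive 3-hourly cycles
        start, end = window_for(t, start)
        print(f"analysis {t:%d/%H UTC}: window {start:%d/%H UTC} -> {end:%d/%H UTC}")
        t += timedelta(hours=UPDATE)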
The higher-resolution initial conditions were achieved through a higher-resolution (4.4 km) 4D-Var trajectory and higher-resolution (20 km) 4D-Var minimizations using the tangent-linear model and its adjoint. We demonstrate significant improvements in large-scale forecast skill, improved forecasting of extreme events, and better use of high-resolution observations. The impact of the higher-resolution DA system is demonstrated using the specific test case of tropical cyclone Otis, which made landfall as a Category 5 tropical cyclone but was predicted to reach only tropical-storm intensity by most global (and regional) NWP models. Employing the higher-resolution 4D-Var DA, we were able to reproduce the observed rapid intensification of TC Otis.
1Deutscher Wetterdienst, 2DWD (German Weather Service), 3Deutscher Wetterdienst (DWD)
The Deutscher Wetterdienst (DWD) is committed to exploiting the potential of artificial intelligence (AI) across various components of the data assimilation (DA) process, such as quality control, forward operators, bias correction, and error covariance estimation. DWD aims to employ a multi-faceted approach that leverages AI to address these complex challenges. We envision the AI-Var approach described by Keller and Potthast (2024) as a cornerstone of DWD's future DA system, reimagining DA for Numerical Weather Prediction (NWP) as a fully data-driven learning problem. By embedding the variational DA cost function within a neural network, AI-Var enables the direct assimilation of observations without requiring a pre-existing analysis dataset.
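As a minimal, hypothetical sketch of the underlying idea (this is not the Keller and Potthast (2024) formulation; the network, operators, covariances and data below are placeholders), a neural network mapping observations and a background to an analysis can be trained with the variational cost itself as the loss, so that no pre-existing analysis dataset is required:

    # Hypothetical sketch: the 3D-Var cost J = |x-xb|^2_{B^-1} + |y-Hx|^2_{R^-1}
    # is used directly as the training loss of an analysis network.
    import torch

    n, p = 50, 20                                   # state and observation sizes
    H = torch.zeros(p, n)
    H[torch.arange(p), torch.arange(p) * 2] = 1.0   # toy observation operator
    B_inv = torch.eye(n) / 0.5                      # toy B^-1 and R^-1
    R_inv = torch.eye(p) / 0.1

    net = torch.nn.Sequential(                      # analysis = xb + f(y, xb)
        torch.nn.Linear(p + n, 128), torch.nn.ReLU(), torch.nn.Linear(128, n)
    )
    opt = torch.optim.Adam(net.parameters(), lr=1e-3)

    def jcost(xa, xb, y):                           # variational cost as loss
        dxb = xa - xb
        dy = y - xa @ H.T
        return (dxb @ B_inv * dxb).sum(-1).mean() + (dy @ R_inv * dy).sum(-1).mean()

    for it in range(200):
        xb = torch.randn(64, n)                     # toy backgrounds
        y = (xb + 0.7 * torch.randn(64, n)) @ H.T   # toy observations
        xa = xb + net(torch.cat([y, xb], dim=-1))
        loss = jcost(xa, xb, y)
        opt.zero_grad()
        loss.backward()
        opt.step()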
Building on AI-Var, DWD is developing AIDA, a generalized framework that extends AI-Var into a comprehensive DA system. While AIDA is designed to handle arbitrary grids and diverse observational datasets, it is also developed in a modular way, ensuring the compatibility and adaptability needed to integrate seamlessly with DWD's operational workflow. By leveraging scalable AI architectures, AIDA combines computational efficiency and flexibility, making it a robust foundation for a fully data-driven NWP system as well as for other DA-intensive tasks such as reanalysis. Furthermore, the system also supports the blending and merging of different data sources, enabling its use in other contexts such as nowcasting and post-processing.
DWD envisions its future DA system relying extensively on AI, integrating it with classical physics- and statistics-based approaches while leveraging its transformative potential. This hybrid approach ensures that the strengths of traditional methods, such as physical interpretability and established theoretical foundations, complement the flexibility and computational power of AI. Additionally, DWD aims to foster community collaboration by contributing AIDA to Anemoi as a data assimilation component. This enables the community to benefit from and build upon AIDA's capabilities, thus enabling further advancement of data-driven NWP.
1Norwegian Meteorological Institute
This presentation provides an overview of the current status, advancements, and future vision of data assimilation within the ACCORD-HIRLAM consortium. Focused on the HARMONIE-AROME framework, it highlights ongoing efforts in enhancing weather prediction accuracy through advanced data assimilation methods, including 3D-Var, 4D-Var, and ensemble-based techniques. Key areas of progress include enhanced data assimilation methods, the integration of more observational data types, improvements in radar and satellite data usage, optimisation of computational performance, and integration of Artificial Intelligence and machine learning in various parts of the data assimilation process.
The work also emphasizes pre-operational implementations and diagnostic and verification tools to refine assimilation strategies, demonstrate the benefits of high-resolution data assimilation and ensure robust operational readiness. We also show some experiences from km-scale re-analysis projects using HARMONIE-AROME.
Current challenges include reconciling operational needs, such as short observation cut-off times and frequent forecast launches, with state-of-the-art developments in data assimilation algorithms and observation usage. In the move towards sub-hourly observation cycling and continuous data assimilation, implications for model spin-up are being addressed. Plans also include expanding system capabilities, such as the assimilation of newly launched and planned satellite instruments, the application of all-sky and dynamical emissivities for low-peaking satellite channels, and work towards coupling the atmospheric and surface data assimilation. There is also a need to focus on developments for the prediction of extreme weather, such as tuned and adaptive observation quality control, and on measures to verify these improvements.
This collaborative effort represents a significant contribution to leveraging innovative technologies to improve high-resolution weather forecasting across the Nordic and European operational centres.
The relevance of observed information for numerical weather prediction has never been higher, now that data-driven models based on machine learning have made the headlines with their performance and accuracy. While direct prediction from observations is attractive because it dispenses with the need for a complex analysis model, there are limitations, since only observed quantities can be the target of the learning. Therefore, there is still a reliance on physically based models, which can provide analyses of present-day observations or reanalyses of past observations and thereby supply information on meteorologically relevant variables that are not directly observed. The balance between physics-based and ML techniques in forward modelling and traditional data assimilation may evolve, but the need for a comprehensive global observing system will remain.
All traditional NWP physical models, as well as most data-driven models, still rely on accurate analyses for their initialisation. These analyses are usually obtained with assimilation techniques ranging from variational approaches (3D-Var, 4D-Var, etc.) to ensemble methods (various flavours of ensemble Kalman filters, etc.). What all these systems have in common is the need for observations that are well calibrated and accurate. Satellite observations from microwave and infrared sensors, as well as GPS radio occultation (RO), have the largest impact in the analysis, particularly in areas where ground-based observations are scarce, such as over the open oceans. Ground-based observations are, however, fundamental, as they provide a very accurate anchor for the satellite observations. Emerging low-cost technologies are an interesting area to explore, but their level of accuracy is still to be understood. Continued reliance on the commercial sector for satellite observations, for example small-satellite constellations for radio occultation measurements, is nevertheless expected.
This talk will cover the present observing-system usage for NWP analysis at ECMWF and the future landscape of the next decade. This will include highlights from preparatory studies for EUMETSAT's new hyperspectral sensors, IRS on MTG-S and IASI-NG on EPS-SG, the proposed microwave constellation (STERNA) and the Doppler wind lidar (Aeolus-2), as well as studies to exploit ESA's Copernicus evolution missions (CIMR, CRISTAL, LSTM, etc.). Results from the RO meteorology experiment (ROMEX), with increased observation availability, will be presented. Advances towards all-sky IR and visible radiance assimilation for operational NWP will also be presented, along with the exploitation of actively sensed (radar and lidar) observations from EarthCARE.
1ECMWF
The weather we experience is a result of the atmosphere interacting with all the other parts of the Earth system.
In particular, the atmosphere interacts with the ocean and waves, rivers and lakes, the land surface and vegetation, and sea ice and snow.
Many of these subsystems evolve with substantially longer timescales than the atmosphere itself, so can be a source of predictability but also a brake on the speed at which we update the initial conditions for the Earth system.
In this talk we will discuss why coupled data assimilation exists as a topic and why the analysis of the different components is often not combined with the atmospheric analysis.
But more optimistically, coupled data assimilation can bring benefits both to the analysis of the Earth-system state and to the coupled forecasts themselves. We will discuss examples where this has already been demonstrated, what the community is currently trying to achieve, as well as the challenges which we foresee.
1Meteorological Institute, LMU Munich
The predictability of weather is intrinsically limited by the rapid growth of small-scale errors. But current prediction systems are not yet at that limit, and improvements are possible in the forecast model, the data assimilation system and the observations. In this presentation I will give estimates of the potential for improvement from each of these components, which suggest that the largest gains will come from improved data assimilation. Some implications will be discussed for the potential of machine learning methods to accelerate these improvements, including new results on how AI weather models are constrained by the choice of loss function versus model architecture and training data.
1Ecole des Ponts ParisTech, 2ECMWF, 3CEREA, ENPC
Systematic model errors significantly limit the predictability horizon and practical utility of current state-of-the-art forecasting systems. Even though accounting for these systematic model errors is increasingly viewed as a fundamental challenge in numerical weather prediction, estimation and correction of the predictable component of the model error has received relatively little attention. Modern implementations of weak-constraint 4D-Var are an exception here and a promising avenue within the variational data assimilation framework, showing encouraging results. Weak-constraint 4D-Var can be viewed as an online hybrid data assimilation and machine learning approach that gradually learns about model errors from partial and imperfect observations, allowing improved state estimation. We propose a natural extension of this approach, applying deep learning techniques to further develop the concept of online model error estimation and correction.
In this talk, we will present recent progress in developing a hybrid model for the ECMWF Integrated Forecasting System (IFS). This system augments the state-of-the-art physics-based model with a statistical model implemented as a neural network, providing flow-dependent model error corrections. While the statistical model can be pre-trained offline, we demonstrate that by extending the 4D-Var control vector to include the parameters of the neural network, i.e. the model of model error, we can further improve its predictive capability. We will discuss the impact of applying these flow-dependent model error corrections on medium-range forecast quality.
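Schematically, and with notation introduced here only for illustration (the operational formulation differs in detail), extending the control vector to the network parameters w amounts to minimising an augmented weak-constraint cost of the form

\[
J(x_0, w) = \tfrac{1}{2}(x_0 - x_b)^{\mathrm{T}}\mathbf{B}^{-1}(x_0 - x_b)
+ \tfrac{1}{2}\sum_{k}\bigl(y_k - \mathcal{H}_k(x_k)\bigr)^{\mathrm{T}}\mathbf{R}_k^{-1}\bigl(y_k - \mathcal{H}_k(x_k)\bigr)
+ \tfrac{1}{2}(w - w_b)^{\mathrm{T}}\mathbf{W}^{-1}(w - w_b),
\qquad
x_{k+1} = \mathcal{M}_k(x_k) + f_w(x_k),
\]

where f_w is the neural-network model-error correction, w_b its offline pre-trained parameters and \mathbf{W} a prior covariance governing how far the parameters may move within a single assimilation window.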
1SMHI
In recent years, machine learning has revolutionized weather forecasting. However, its application to data assimilation has not yet seen the same progress. This disparity is not due to the simplicity of data assimilation but likely stems from the absence of accessible benchmark datasets. Effective data assimilation for Earth-system modelling is hindered by high-dimensional state spaces, nonlinearities, and complex error structures, especially when approaching the hectometric scale. This talk explores the potential of machine learning to address these issues within consortia focusing on very high-resolution models, such as HIRLAM (ACCORD). Representation learning, such as autoencoders, can reduce dimensionality and offer latent-space representations in which the system is kept in balance. Transport methods, such as diffusion or normalizing flows, can transform non-Gaussian error distributions into Gaussian ones, facilitating the use of classical approaches. Self-supervised learning holds promise for capturing interactions within and between the land, ocean, and atmosphere. Additionally, generative models can efficiently produce ensembles for uncertainty quantification and extend data assimilation from state estimation to probabilistic estimation. Ultimately, machine learning holds the potential to revolutionize the entire weather-prediction chain by enabling direct forecasts from observations, potentially rendering traditional data assimilation obsolete.
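As a toy illustration of the transport idea (a hand-built empirical Gaussian anamorphosis rather than a learned diffusion model or normalizing flow), a skewed ensemble variable can be mapped to approximately Gaussian coordinates, where classical Gaussian assimilation machinery applies, and mapped back afterwards:

    # Toy illustration: empirical Gaussian anamorphosis of a skewed variable.
    # A learned normalizing flow would replace this hand-built transform.
    import numpy as np
    from scipy.stats import norm, rankdata

    rng = np.random.default_rng(1)
    x = rng.gamma(shape=2.0, scale=1.5, size=1000)   # skewed ensemble samples

    def to_gaussian(samples):
        """Map samples to ~N(0,1) via their empirical CDF (probit of ranks)."""
        u = (rankdata(samples) - 0.5) / len(samples)
        return norm.ppf(u)

    def from_gaussian(z, reference):
        """Map N(0,1) values back through the empirical quantiles of the data."""
        return np.quantile(reference, norm.cdf(z))

    z = to_gaussian(x)                               # Gaussian coordinates for the update
    x_back = from_gaussian(z, x)                     # round trip as a sanity check
    print(np.abs(np.sort(x_back) - np.sort(x)).max())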
1ECMWF
Computational efficiency improvements are central to the development of the Ensemble of Data Assimilations (EDA). Over the past years, we have made the EDA more accurate and reliable, to match the resolution and accuracy improvements in the assimilation system, through a combination of better model and observation uncertainty representations and an increase in the number and resolution of members enabled by computational efficiency gains. Taken together, the computational optimizations of the EDA over the last few years have reduced its cost to about a quarter for the same performance, enabling improved performance at manageable cost.
The latest development is soft re-centring, which enables EDA members to be run at the same 9 km resolution as the deterministic 4D-Var analysis. In soft re-centring, the control member of the EDA runs in a 4D-Var configuration closer to the high-resolution 4D-Var, which is used to re-centre the backgrounds and warm-start the minimization of the perturbed members. The soft re-centring warm-start shifts the distribution of the perturbed members' first guesses towards the unperturbed observations by adding the analysis increment from the first minimization of the control member. This effectively replaces one minimization, so that the same accuracy and reliability can now be achieved with a single minimization in the perturbed members, compared with the two minimizations required without soft re-centring. As each perturbed analysis is the result of a full 4D-Var analysis update, it is a valid model trajectory, which minimises the initial shock of re-centring.
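In schematic form (notation introduced here for illustration only), the warm-started first guess of perturbed member i is

\[
\tilde{x}^{\,b}_{i} = x^{b}_{i} + \delta x^{a}_{c},
\]

where x^b_i is the background of perturbed member i and \delta x^a_c is the analysis increment from the first minimization of the control member, which assimilates the unperturbed observations; the single minimization of member i then starts from this shifted first guess.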
To further improve the computational efficiency of the EDA, we have started to explore the application of probabilistic data-driven techniques in 4D-Var. We developed a Variational Encoder-Decoder (VED) machine learning model for emulating EDA statistics. Specifically, we define a VED that parameterizes the sampling distribution of the diagonal part of the flow-dependent 4D-Var B matrix, conditioned on a subset of EDA analyses. We show that the approach is theoretically justified and explore the sensitivity of the ECMWF 4D-Var setup to the use of such methods. The next steps are to show to what degree the EDA can be augmented with emulated EDA members, and in particular to explore the trade-off between the number of perturbed EDA members and emulated members, with attention to the minimum number of EDA members required to maintain accuracy and reliability.