1Utrecht University
We apply Gaussian density neural network and quantile regression neural network ensembles to predict the El Niño–Southern Oscillation. Both models are able to assess the predictive uncertainty of the forecast by predicting a Gaussian distribution and the quantiles of the forecasts, respectively. This direct estimation of the predictive uncertainty for each given forecast is a novel feature in the prediction of the El Niño–Southern Oscillation by statistical models. The predicted mean and median, respectively, show high correlation skill at long lead times (r=0.5 at 12 months) for the 1963–2017 evaluation period. For the 1982–2017 evaluation period, the probabilistic forecasts by the Gaussian density neural network estimate the predictive uncertainty better than a standard method for assessing the predictive uncertainty of statistical models.
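To make the uncertainty estimation concrete, here is a minimal sketch of a Gaussian density network head and its negative log-likelihood loss, assuming a PyTorch implementation with illustrative layer sizes; it is not the authors' code.

```python
# Hedged sketch: a Gaussian density network of the kind described above.
# The network outputs a mean and a log-variance for each forecast and is
# trained with the Gaussian negative log-likelihood. Inputs and layer sizes
# (e.g. sea-surface-temperature predictors) are illustrative assumptions.
import torch
import torch.nn as nn

class GaussianDensityNet(nn.Module):
    def __init__(self, n_inputs: int, n_hidden: int = 16):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(n_inputs, n_hidden), nn.ReLU())
        self.mean_head = nn.Linear(n_hidden, 1)    # predicted mean (e.g. an ENSO index)
        self.logvar_head = nn.Linear(n_hidden, 1)  # predicted log-variance

    def forward(self, x):
        h = self.body(x)
        return self.mean_head(h), self.logvar_head(h)

def gaussian_nll(mean, logvar, y):
    # Negative log-likelihood of y under N(mean, exp(logvar)); minimizing it
    # trains both the forecast and its predictive uncertainty at once.
    return 0.5 * (logvar + (y - mean) ** 2 / logvar.exp()).mean()
```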
Vegetation, especially in combination with hazardous weather, is a common cause of outages in electricity transmission and distribution networks. Extreme weather factors such as wind gusts, excessive precipitation, and/or heavy icing can make vegetation a direct or indirect source of outages. Climate change further heightens these concerns, as weather prediction may become more challenging and weather impacts more extreme. Hence, predictive decision making and situational awareness in the context of vegetation- and weather-induced risk near power lines can significantly improve power network resilience and reduce the cost of electricity outages.
The authors are developing a platform in which machine learning techniques are applied to integrated NWP and satellite imagery data across various temporal and spatial scales to improve power grid risk assessment. Fusion of satellite imagery, observations from ground surveys, and weather data is used to identify areas susceptible to weather- and vegetation-related outages. A main goal of the platform is to support grid operators' decision making on power grid maintenance (especially in rural and inaccessible areas) and to provide them with a real-time vegetation risk map. In this way, operators can optimally allocate resources for vegetation management and increase the safety of ground crews, while still complying with monitoring regulations.
This data integration, combined with recent advances in artificial intelligence (deep learning, image processing, reinforcement learning), opens the way to new monitoring solutions for power system assets in the face of weather- and vegetation-related hazards.
The strength of the developed platform lies in its capacity to combine a range of data sources from different platforms, with varying quality, precision, resolution, data availability, and temporal and spatial sampling. The platform was tested on a study area in Askvoll, Norway, where the data sources considered ranged from existing asset information, outage data and vegetation management reports, to satellite images, LiDAR point clouds and drone aerial data, to NWP data for the last two decades.
As a main output, the solution includes an interactive heatmap highlighting areas that are likely to face disturbances due to vegetation encroachment in proximity to power lines. Moreover, the application of our approach in the study area enabled us to confidently identify areas not exposed to such a threat (98.8% certainty), which is valuable information for resource optimization during inspection planning. In addition, our maps also explicitly indicate areas more prone to wind-related disturbances, using a likelihood-of-outage function calculated directly from wind predictions. Finally, by combining these different insights into one tool, our solution shows how remote sensing-based data can be used by vegetation management teams to prioritize the line sections that need tree trimming.
1ANITI, 2NVIDIA, 3CERFACS, 4Toulouse University, INP-IRIT, ANITI
In data assimilation, the state of a system is estimated using two models: the observational model, which relates the state to physical observations, and the dynamical model, which propagates the system along the time dimension. Both models are described using random variables that account for observation and model errors. Most basic data assimilation equations follow a Bayesian approach in which prior information is combined with the above statistical models to obtain the probability density of the system state, conditioned on past observations. The estimation is done in two steps: the analysis of an incoming observation and the propagation to the next time step.
Data assimilation algorithms use additional assumptions to obtain closed expressions for the probability densities so that they can be handled by computers. Historically, in the linear Kalman filter (KF) approach, the statistical models are assumed Gaussian and the models linear. Hence, the propagation and analysis steps consist in updating the mean and covariance matrix characterizing the Gaussian density of the state conditioned on observations. In the Ensemble Kalman Filter (EnKF) approach, these densities are approximated by a (hopefully small) set of sampling vectors, and formulas are available for the analysis and propagation steps as well.
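For reference, the two linear-Gaussian Kalman filter steps described above can be written compactly as follows (a minimal NumPy sketch in standard KF notation; the matrices M, H, Q and R are assumed given):

```python
# Hedged sketch of the two KF steps: propagate the mean/covariance with the
# dynamical model M, then update them in the analysis of an observation y.
import numpy as np

def kf_propagate(x, P, M, Q):
    """Propagation: x_b = M x_a, P_b = M P_a M^T + Q (Q = model-error covariance)."""
    return M @ x, M @ P @ M.T + Q

def kf_analysis(x, P, y, H, R):
    """Analysis: compute the Kalman gain, then update mean and covariance."""
    S = H @ P @ H.T + R                  # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)       # Kalman gain
    x_a = x + K @ (y - H @ x)            # updated mean
    P_a = (np.eye(len(x)) - K @ H) @ P   # updated covariance
    return x_a, P_a
```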
The above presentation of the KF and EnKF suggests a possible generalization of data assimilation algorithms. The bottom line of this approach is to let a learning process decide what the best internal representation for the densities is. This representation is stored in some computer memory M (say, a hopefully not too long vector), and we propose to use machine learning to estimate the analysis and propagation steps acting on it, using recurrent-network-based learning architectures, hence the name Data Assimilation Network (DAN). The memory M is connected to the state of the system through a mapping we call the "procoder", which is also learned. The data used for the learning consist of batches of state trajectories of the (stochastic) dynamical system and corresponding (noisy) observations. We show that this general algorithm, a variant of the Elman neural network, outperforms state-of-the-art ensemble variational techniques in twin experiments with the Lorenz system.
The above supervised experiments show that DA algorithms can be successfully learned from data. However, these data still contain state trajectories, which means that the learning relies on some preexisting observational and dynamical model. This brings us to the second part of our work, where we learn a DA algorithm together with an observation model and a dynamical model from observations only. As the propagation step already is a generalized dynamical model acting on densities, this is achieved by simply adding a learnable observation model to the DAN. This way, our network takes on a recurrent variational auto-encoder structure, and the so-called Evidence Lower BOund (ELBO) can be used as a cost function. Numerical experiments show that this unsupervised DAN also compares extremely well to state-of-the-art ensemble variational techniques.
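For reference, the generic variational form of the ELBO objective mentioned above, written here for a latent memory M and observations y (the exact DAN cost may differ):

$$\mathrm{ELBO} = \mathbb{E}_{q(M)}\big[\log p(y \mid M)\big] - \mathrm{KL}\big(q(M)\,\|\,p(M)\big) \;\le\; \log p(y)$$

Maximizing the ELBO simultaneously fits the reconstruction of the observations and keeps the approximate posterior q(M) close to the prior.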
1Meteopress
The standard for weather radar nowcasting in the Central Europe region is the COTREC extrapolation method. We propose a recurrent neural network, based on the PredRNN architecture, which outperforms the COTREC 60-minute predictions by a significant margin.
Nowcasting, as a complement to numerical weather prediction, is a well-known concept, and the increasing speed of information flow in our society creates an opportunity for its effective implementation. Methods currently used for these predictions are primarily based on optical flow and struggle to predict the development of echo shape and intensity.
In this work, we benefit from a data-driven approach and build on advances in the capabilities of neural networks for computer vision. We define the prediction task as an extrapolation of sequences of the latest weather radar echo measurements. To capture the spatiotemporal behaviour of rainfall and storms correctly, we propose a recurrent neural network combining long short-term memory (LSTM) techniques with convolutional neural networks (CNNs). Our approach is applicable to any geographical area, radar network resolution and refresh rate.
We conducted the experiments comparing predictions for 10 to 60 minutes into the future with the Critical Success Index, which evaluates the spatial accuracy of the predicted echo. Our neural network model has been trained with one year of rainfall data captured by weather radars over the Czech Republic. Results for our bordered testing domain show that our method achieves comparable or higher CSI than both COTREC and optical flow extrapolation methods available in the open-source pySTEPS library.
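For readers unfamiliar with the metric, the Critical Success Index reduces to a simple ratio over thresholded echo maps; the sketch below assumes an illustrative 40-dBZ threshold, not necessarily the one used in this work.

```python
# Minimal sketch of the Critical Success Index on thresholded echo maps:
# CSI = hits / (hits + misses + false alarms).
import numpy as np

def csi(pred_dbz: np.ndarray, obs_dbz: np.ndarray, thr: float = 40.0) -> float:
    pred, obs = pred_dbz >= thr, obs_dbz >= thr
    hits = np.sum(pred & obs)
    misses = np.sum(~pred & obs)
    false_alarms = np.sum(pred & ~obs)
    denom = hits + misses + false_alarms
    return float(hits) / denom if denom else 1.0  # perfect score if nothing to detect
```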
With our work, we aim to contribute to the nowcasting research in general and create another source of short-time predictions for both experts and the general public.
1Chinese Academy of Meteorological Sciences, 2China Meteorological Administration
The early detection of convection initiation (CI) is essential for mitigating the damage caused by thunderstorms and the resulting severe weather. However, NWP models tend to miss or delay the triggering of convection, and radar can only give information once storms have already started. Previous research has shown that satellite remote sensing might provide lead times of potentially more than one hour ahead of convective storms being detected by radar.
In this study, a decision-tree-based method was developed for the identification of flash heavy rain. Decision trees are popular in many weather applications: they are powerful in handling big data, human-readable, and able to learn using only the most relevant variables. The decision tree was developed automatically to predict whether convection will occur. It splits the data recursively by identifying the most relevant question at each level. At the root node, the data are split according to whether the weather forcing is strong or weak. The data are further refined down each of the yes and no branches until a prediction is made at a leaf node with the class label. The strategy and candidate predictive features are obtained using a variety of ML feature engineering techniques, including the correlation coefficient, mutual information and embedded feature selection methods (logistic regression, random forest and gradient boosting trees). The most relevant features from multi-source data at various lead times were extracted effectively. Then, more than sixty flash heavy rain events in the Beijing-Tianjin-Hebei region of China during the summers of 2018-2019 were used to train and test the decision tree approach. Some preliminary results of our effort to combine numerical forecasts, high-density ground-based observations and satellite data will be reported in this presentation.
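As an illustration of the mutual-information screening step named above, a minimal scikit-learn sketch follows; the feature matrix, labels and retained-feature count are placeholder assumptions, not the study's data.

```python
# Hedged sketch: rank candidate predictors by mutual information with the
# heavy-rain label, then fit a shallow decision tree on the retained features.
import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.tree import DecisionTreeClassifier

X = np.random.rand(500, 20)         # placeholder predictor matrix (e.g. NWP fields)
y = np.random.randint(0, 2, 500)    # placeholder heavy-rain / no-rain labels

mi = mutual_info_classif(X, y, random_state=0)
keep = np.argsort(mi)[-8:]          # keep the 8 most informative predictors
clf = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X[:, keep], y)
```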
1Nanjing University
Tropical cyclones (TCs) are among the most destructive weather phenomena on Earth, and their structure and intensity are strongly modulated by the TC boundary layer. Mesoscale models used for TC research and prediction must rely on boundary layer parameterization owing to their low spatial resolution. These boundary layer schemes were mostly developed from field experiments under moderate wind speeds, and they often underestimate the influence of shear-driven rolls and turbulence. When applied under extreme conditions such as the TC boundary layer, significant biases are unavoidable. In this study, a novel machine learning model, a one-dimensional convolutional neural network (1DCNN), is proposed to tackle the TC boundary layer parameterization dilemma. The 1DCNN saves about half of the free parameters and achieves a steady improvement over a fully-connected neural network. TC large eddy simulation outputs, whose calculated turbulent fluxes show strong skewness, are used as the 1DCNN training data; this data skewness problem is alleviated in order to reduce the 1DCNN model bias. An offline TC boundary layer test shows that our proposed scheme performs significantly better than popular schemes currently utilized in TC simulations.
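A minimal sketch of what such a 1DCNN might look like (PyTorch, with illustrative channel and level counts; not the authors' architecture). Convolving along the vertical shares weights across model levels, which is one way such a network can need far fewer free parameters than a fully-connected one.

```python
# Hedged sketch of a 1D CNN operating on vertical profiles: inputs are stacked
# profile variables (channels) over model levels, outputs are turbulent-flux
# profiles. Variable, level and filter counts are illustrative assumptions.
import torch
import torch.nn as nn

class FluxCNN1D(nn.Module):
    def __init__(self, in_vars: int = 4, out_vars: int = 2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(in_vars, 32, kernel_size=5, padding=2),  # local vertical stencils
            nn.ReLU(),
            nn.Conv1d(32, 32, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.Conv1d(32, out_vars, kernel_size=1),            # per-level flux output
        )

    def forward(self, profiles):        # profiles: (batch, in_vars, n_levels)
        return self.net(profiles)
```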
1CIRES/NOAA/GSL, 2NOAA/ESRL (CIRA), 3NVIDIA, 4NOAA Earth System Research Laboratory
Atmospheric science and numerical weather models have a data problem: keeping up with the quantities of model and satellite data in real-time applications. These include, but are not limited to, alerts, nowcasting, and data assimilation. We found that deep learning is a valuable tool for quick cyclone detection and image segmentation on large data sources. Using deep learning, we developed four different U-Net models to detect different cyclone regions of interest (ROI) from both weather model total precipitable water output fields and water vapor channels in satellite imagery. The models identify cyclone ROI in 0.03 to 0.15 seconds. Success was measured with both the Intersection over Union (IoU) metric and model accuracy: IoU scores for the U-Nets ranged from 0.51 to 0.76 and accuracy ranged from 80% to 99%. These models additionally identified more ambiguous ROI sometimes missed by classic methods such as hand-labeling and heuristic models, proving beneficial for higher-risk weather location extraction.
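For reference, the Intersection over Union score used here reduces, for binary segmentation masks, to the following (a minimal NumPy sketch):

```python
# Minimal sketch of IoU for binary masks (1 = cyclone ROI pixel, 0 = background).
import numpy as np

def iou(pred_mask: np.ndarray, true_mask: np.ndarray) -> float:
    pred, true = pred_mask.astype(bool), true_mask.astype(bool)
    intersection = np.logical_and(pred, true).sum()
    union = np.logical_or(pred, true).sum()
    return float(intersection) / union if union else 1.0  # empty masks count as perfect
```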
1ETH Zurich, 2EPFL, Switzerland
Sudden stratospheric warmings (SSWs) are extreme wintertime circulation events of the Arctic stratosphere that are accompanied by a disruption of the polar vortex, a strong westerly circulation around the cold pole. The disruption can be classified into split and displaced events based on the vortex geometry. Past studies show that SSWs can have long-lasting impacts of several weeks to months on surface weather. However, it is difficult to forecast SSWs at lead times longer than 1-2 weeks. Therefore, understanding the dynamical development of the vortex disruption will help to improve the predictability of SSWs and, ultimately, of the weather at the Earth's surface. To this end, we employ a mode-decomposition diagnosis for the principal components (PCs) of the potential vorticity (PV) equation at 850 K. The contributions to the time tendency of the PCs are separated into linear and nonlinear contributions from empirical orthogonal function modes with low- and high-frequency power. The results show a clear signal for the development of SSWs around 20 days before the onset of the events, with a dominant contribution from the linear interaction between low-frequency modes. Further analyses indicate that the dynamical processes behind the development of split and displaced SSWs differ, which could be used to classify the type of event before it occurs. The distinct contributions from linear and nonlinear terms to the PC tendencies for split or displaced SSWs versus non-SSWs can be used as input to classification algorithms (e.g., Support Vector Machines) to identify and classify SSWs. By incorporating knowledge of the dynamical processes that lead to the vortex disruption into machine learning algorithms, processes related to SSWs could be identified around 20 days ahead of the events, which is beyond the current predictability of SSWs and therefore promises to improve the prediction of surface weather.
1LERMA / Observatoire de Paris / CNRS, 2ECMWF
Currently, ASCAT level 2 soil moisture products are assimilated into the ECMWF operational soil moisture analysis, a simplified extended Kalman filter (SEKF). In the context of the EUMETSAT HSAF project, alternative approaches using neural network (NN) methods, either as a forward model emulator or as a retrieval scheme, are being investigated. Firstly, the neural network approach can be used to retrieve soil moisture from the raw ASCAT backscatter observations; the retrieved soil moisture could then be assimilated into the SEKF. Secondly, the neural network approach can be used as part of a forward model to produce simulated backscatter values using land surface model parameters as inputs; this could then form part of a system to directly assimilate ASCAT backscatter observations into the SEKF. Preliminary results of both approaches will be presented and their pros and cons commented on. Furthermore, an analysis of the links between the NN and CDF-matching techniques (often a necessary step in the assimilation process) will be presented. A method to estimate NN uncertainties will also be discussed, as this is an important element for the use of NNs in assimilation frameworks.
1ECMWF
Many recent studies using the TROPOMI instrument onboard the Copernicus Sentinel-5 Precursor satellite show that NO2 air pollution was drastically reduced during the COVID-19 lockdown. Some studies compared the weeks before and during the lockdown, while others compared equivalent weeks of 2020 and 2019. While urban pollution was undoubtedly reduced during those unprecedented times, the majority of these studies do not account for the weather-induced variability or the business-as-usual (BAU) variability that can strongly affect pollution levels from week to week and from one year to another. We show that, with the help of machine learning techniques (gradient boosting), the BAU signal can be predicted and then subtracted from the real observations, giving another, presumably more accurate estimate. We confront the satellite estimates based on the simulated BAU signal subtraction, and those based on year-to-year comparisons, with surface-based and model estimates over Europe and possibly over other densely populated, hence heavily polluted, areas.
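A minimal sketch of the BAU idea with scikit-learn's gradient boosting (feature choices and data are placeholder assumptions, not the study's setup):

```python
# Hedged sketch: train a gradient-boosting regressor on pre-lockdown data to
# predict NO2 from weather and calendar predictors, then subtract its
# prediction from the 2020 observations to estimate the lockdown effect.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# placeholder predictors: e.g. wind speed, temperature, day-of-week, month
X_pre, no2_pre = np.random.rand(2000, 6), np.random.rand(2000)
X_2020, no2_2020 = np.random.rand(300, 6), np.random.rand(300)

bau = GradientBoostingRegressor(n_estimators=300, learning_rate=0.05)
bau.fit(X_pre, no2_pre)                            # learn the BAU weather-NO2 relation
lockdown_anomaly = no2_2020 - bau.predict(X_2020)  # observed minus BAU estimate
```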
1Barcelona Supercomputing Center
The ever-growing collections of Earth system data, coming from remote sensing, in situ measurements and model simulations, and the recent developments in the fields of Machine and Deep Learning for exploiting high-dimensional data, offer exciting new opportunities for expanding our knowledge about the Earth system (Schneider et al. 2017, Reichstein et al. 2019, Bergen et al. 2019, Huntingford et al. 2019).
For instance, artificial neural networks have been successfully applied to the task of creating parameterizations of subgrid processes in climate models (Rasp et al. 2018, Dueben and Bauer 2018, Brenowitz & Bretherton 2019, Bolton and Zanna 2019, Rozas et al. 2019). In this study, we investigate data-driven models based on supervised convolutional neural networks and conditional generative adversarial networks (cGANs, Isola et al. 2016, Manepalli et al. 2019) for simulating precipitation, a meteorological variable heavily affected by parameterizations in weather and climate models. The problem of emulating precipitation is formulated as an image-to-image translation task, from several ERA5 reanalysis variables to a gridded observational precipitation dataset (E-OBS). We compare the performance of our supervised and generative models and explore the potential of cGANs for capturing the variability of the training data and learning stochastic machine learning parameterizations.
1INTP-ENM, UMR CNRS CNRM, Météo-France, CERFACS, 2IMT Atlantique
Bridging physics and deep learning is a topical challenge. While deep learning frameworks open avenues in physical science, the design of physically-consistent deep neural network architectures is an open issue. In the spirit of physics-informed NNs, we introduce a tool that provides new means to automatically translate physical equations, given as PDEs, into neural network architectures. This combines symbolic calculus and a neural network generator. The latter exploits NN-based implementations of PDE solvers. With some knowledge of a problem, we obtain a plug-and-play tool to generate physics-informed NN architectures. They provide computationally-efficient yet compact representations to address a variety of issues, including among others adjoint derivation, model calibration, forecasting, data assimilation and uncertainty quantification. This work relies on the open source package PDE-NetGen. As an illustration, the workflow is first presented for the 2D diffusion equation, then applied to the data-driven and physics-informed identification of uncertainty dynamics for the Burgers equation. We will discuss some perspectives for nowcasting.
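The core idea, a PDE time step realized as a fixed convolution, can be illustrated on the 2D diffusion equation mentioned above; the hand-written stencil below is only a minimal stand-in for what PDE-NetGen generates automatically.

```python
# Hedged illustration: the 2D diffusion equation du/dt = kappa * (u_xx + u_yy)
# becomes one conv2d with a 5-point Laplacian stencil; time stepping is explicit
# Euler. Coefficients and grid sizes are illustrative (dt*kappa/dx^2 kept stable).
import torch
import torch.nn.functional as F

kappa, dx, dt = 0.1, 1.0, 0.1
laplacian = torch.tensor([[[[0., 1., 0.],
                            [1., -4., 1.],
                            [0., 1., 0.]]]]) / dx**2

def diffusion_step(u):                  # u: (batch, 1, ny, nx)
    return u + dt * kappa * F.conv2d(u, laplacian, padding=1)

u = torch.zeros(1, 1, 64, 64)
u[0, 0, 32, 32] = 1.0                   # point initial condition
for _ in range(100):
    u = diffusion_step(u)               # heat spreads from the centre
```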
1IMSG at NOAA/NWS/NCEP/EMC, 2NOAA/NWS/NCEP/EMC
We developed a shallow neural network-based emulator of a complete suite of atmospheric physics parameterizations in NCEP Global Forecast System (GFS). The suite emulated by a single NN includes radiative transfer, cloud macro- and micro-physics, shallow and deep convection, boundary layer processes, gravity wave drag etc. NCEP GFS with the neural network replacing the original suite of atmospheric parameterizations produces stable and realistic medium range weather forecasts for 24 initial conditions spanning all months of 2018. We present preliminary results of parallel runs, evaluating the accuracy and speedup.
1National Technical University of Athens, 2University of West Attica, 3NCSR Demokritos
Synthetic Aperture Radar's (SAR) monitoring capabilities, regardless of clouds or daylight, render it a valuable source of data and a superior option to optical satellite images for the demanding task of ship detection. Traditional deep learning techniques are computationally intensive and inefficient, while two-stage detectors, which must handle separate components to cope with sparse detections, have proven not to be the ideal solution. In order to accelerate SAR imagery analysis and achieve real-time detection, three one-stage ship detectors, namely the Single Shot MultiBox Detector (SSD), YOLOv3 and YOLOv4, are studied and compared in this paper. The detectors are trained by optimizing classification loss and localization loss simultaneously. To this end, two datasets are used for training and evaluating the ship detectors: an existing public SAR ship detection dataset and the Demokritos SAR Ship Dataset (DSSD), which we created via an automatic methodology that we implemented. For the DSSD we additionally applied a pseudo-labelling technique exploiting the detection results on the existing dataset. Experimental results illustrate that the most recent YOLOv4 model achieves superior detection performance on both datasets compared to the SSD and YOLOv3 models, and the reasons behind these results are identified and examined.
1Princeton University & GFDL, 2MIT
An unsupervised learning method is presented for determining global marine ecological provinces (eco-provinces) from plankton community structure and nutrient flux data. The systematic aggregated eco-province (SAGE) method identifies eco-provinces within a highly nonlinear ecosystem model. To accommodate the non-Gaussian covariance of the data, SAGE uses t-distributed stochastic neighbor embedding (t-SNE) to reduce dimensionality. Over a hundred eco-provinces are identified with the density-based spatial clustering of applications with noise (DBSCAN) algorithm. Using a connectivity graph with ecological dissimilarity as the distance metric, robust aggregated eco-provinces (AEPs) are objectively defined by nesting the eco-provinces. Using the AEPs, the control of nutrient supply rates on community structure is explored. Eco-provinces and AEPs are unique and aid model interpretation. They could facilitate model intercomparison and potentially improve understanding and monitoring of marine ecosystems.
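The two reduction-and-clustering stages named above can be sketched with scikit-learn as follows (data shapes and hyperparameters are illustrative assumptions, not those of SAGE):

```python
# Hedged sketch: t-SNE for nonlinear dimensionality reduction, then DBSCAN to
# cluster the embedded points into provinces; input vectors are placeholders.
import numpy as np
from sklearn.manifold import TSNE
from sklearn.cluster import DBSCAN

community = np.random.rand(2000, 55)   # placeholder plankton/nutrient vectors
embedded = TSNE(n_components=3, perplexity=30).fit_transform(community)
labels = DBSCAN(eps=0.5, min_samples=10).fit_predict(embedded)  # label -1 = noise
```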
1Météo-France, 2Météo-France
Surface pressure charts are produced by national weather services at least twice a day. For years, attempts at automatic drawing of fronts and other synoptic-scale features have been made to facilitate the work of forecasters. The first attempts were based on physical rules. Recently, computer vision methods, and more specifically convolutional neural networks, have been used to detect weather fronts. Training data usually come from regional reanalyses or numerical weather prediction models with horizontal resolutions from 25 km to 100 km. Surface pressure charts drawn by human forecasters are usually used as ground truth. The novelty of our work lies in the use of higher-resolution data. Training data are analyses of the French global model Arpege with a resolution of 0.1° (approximately 10 km). A U-Net is used to detect cold and warm fronts. Ground truths are given by surface pressure charts produced four times a day by the French meteorological agency. Preliminary results will be shown, together with some details of our approach: the pre-processing of surface pressure charts, the choice of the meteorological parameters and the tuning of the U-Net hyper-parameters.
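For illustration, a heavily reduced U-Net of the kind used for such pixel-wise front detection might look as follows (PyTorch; depth, widths and channel counts are assumptions, not the tuned configuration described above):

```python
# Hedged sketch of a small U-Net: an encoder-decoder with a skip connection,
# taking stacked meteorological fields as input channels and outputting
# per-pixel class scores (background / cold front / warm front).
import torch
import torch.nn as nn

def block(c_in, c_out):
    return nn.Sequential(nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU(),
                         nn.Conv2d(c_out, c_out, 3, padding=1), nn.ReLU())

class SmallUNet(nn.Module):
    def __init__(self, in_ch=5, n_classes=3):
        super().__init__()
        self.enc1, self.enc2 = block(in_ch, 32), block(32, 64)
        self.pool = nn.MaxPool2d(2)
        self.up = nn.ConvTranspose2d(64, 32, 2, stride=2)
        self.dec = block(64, 32)           # 64 = 32 skip channels + 32 upsampled
        self.head = nn.Conv2d(32, n_classes, 1)

    def forward(self, x):                  # x: (batch, in_ch, H, W), H and W even
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        d = self.dec(torch.cat([self.up(e2), e1], dim=1))  # skip connection
        return self.head(d)                # per-pixel class logits
```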
1CMRE, 2Centre for Maritime Research and Experimentation
Variational data assimilation requires implementing the tangent-linear and adjoint (TA/AD) version of any (model and/or observation) operator used within the analysis scheme. This intrinsically hampers the use of complicated observations, for instance those measuring integrated quantities and/or derived from remote sensors. Machine learning, beyond more straightforward applications such as bias correction and quality control of observations, may be used in data assimilation to derive data-driven operators. Here, we assess a new approach to assimilate observations of acoustic underwater propagation (transmission loss, TL) into a regional ocean analysis and forecast system. TL measurements depend on the underlying sound speed fields, hence on temperature and salinity, and their inversion would require heavy coding of the TA/AD of an acoustic underwater propagation model. In this study, the non-linear version of the acoustic model is applied to an ensemble of perturbed oceanic conditions. The resulting transmission loss outputs are used to formulate either a classical statistical linear operator based on Canonical Correlation Analysis (CCA), or a neural-network-based (NN) operator. For the latter, two linearization strategies are compared, the best-performing one relying on reverse-mode automatic differentiation. The new observation operator is applied in data assimilation experiments over the Ligurian Sea (Mediterranean Sea), using the Observing System Simulation Experiments (OSSE) methodology to assess the impact of TL observations on oceanic fields. TL observations are extracted from a nature run with perturbed surface boundary conditions and ocean physics. Sensitivity analysis and forecast experiments show not only the higher accuracy of the neural network reconstruction of transmission loss with respect to the truth when compared to CCA, but also that its use in the assimilation of TL observations significantly improves the upper ocean variability at regional scale. The use of the NN observation operator is computationally affordable, and its general formulation appears promising for the adjoint-free assimilation of any remote sensing observing network.
1Max Planck Institute Jena
The Earth is a complex dynamic networked system. Machine learning, i.e. derivation of computational models from data, has already made important contributions to predict and understand components of the Earth system, specifically in climate, remote sensing and environmental sciences. For instance, classifications of land cover types, prediction of land-atmosphere and ocean-atmosphere exchange, or detection of extreme events have greatly benefited from these approaches. Such data-driven information has already changed how Earth system models are evaluated and further developed. However, many studies have not yet sufficiently addressed and exploited dynamic aspects of systems, such as memory effects for prediction and effects of spatial context, e.g. for classification and change detection. In particular new developments in deep learning offer great potential to overcome these limitations.
Yet, a key challenge and opportunity is to integrate (physical-biological) system modeling approaches with machine learning into hybrid modeling approaches, which combine physical consistency and machine learning versatility. A couple of examples are given where the combination of system-based and machine-learning-based modelling helps our understanding of aspects of the Earth system.
1LSCE-IPSL, LOCEAN-IPSL, 2LOCEAN-IPSL, CNRS, 3Princeton Univ.
The role of mesoscale eddies is crucial for the ocean circulation and its energy budget. At scales of 10 to 300 km, mesoscale eddies transfer hydrographic properties and energy across spatial and temporal scales, hence contributing to equilibrating the large-scale ocean dynamics and thermodynamics, which is paramount for long-term climate modeling. Representing their effect correctly in ocean models is of the greatest importance. However, eddy-resolving ocean models remain prohibitively expensive to run; thus, the development of low-resolution models with skill comparable to their high-resolution counterparts is of high interest to the community.
In this work we use machine-learning-based surrogates that emulate the effect of changing the parameters of closure models, then use the results for parameter selection by discarding the "bad" parameter sets according to given metrics. An application to a low-resolution (1°) NEMO model is considered in this work.
1Nanjing University
Current empirical and conventional statistical approaches to tropical cyclone (TC) intensity and size estimation are limited. In this study, a physics-augmented deep learning model, called "DeepTCNet", is developed to estimate TC intensity and wind radii from infrared (IR) imagery over the North Atlantic Ocean. While standard deep learning practices achieve reliable estimates of TCs, informing this data-driven model with some physics can enhance its performance. Three ways of augmenting DeepTCNet with physical knowledge and/or physical relationships of TCs are proposed: (1) infusing auxiliary physical information about the TC into the model; (2) introducing sequential IR images, which add the physical continuity of TC intensity change; (3) learning auxiliary physical tasks. With the augmentation of auxiliary physical information on TC fullness, DeepTCNet improves the intensity estimation skill by 19% relative to the non-augmented model. Tracking TCs in sequential IR images over 18 hours improves the intensity estimation by 12% relative to using a single IR image. By jointly learning the auxiliary task of TC intensity, the wind radii estimation is improved by 12% on average compared to merely learning multiple wind radii tasks, and by 20% on average compared to separately learning single wind radii tasks. DeepTCNet also outperforms several state-of-the-art TC estimation techniques: its intensity estimation error is 35% lower than that of the ADT, and it outperforms the MTCSWA technique with an average improvement of 28%.
1Jiangxi Province Meteorological Observatory, 2Fuzhou City Meteorological Observatory
We present a deep learning approach for forecasting short-duration heavy rain based on historical radar data. The radar dataset consists of 12 years of radar reflectivity recorded at Nanchang City. In order to capture most of the convective weather, we picked out strong convective cases based on heavy rain records (above 20 mm/hour) from Nanchang national weather stations, yielding 12,500 radar records over the 12 years. CNN and LSTM units are used to analyse the historical radar data and predict 0-2 hour short-duration heavy rain in Nanchang from the radar reflectivity of the preceding hour. The CNN part consists of two convolution layers and two pooling layers to extract radar image features; the LSTM part consists of ten LSTM units for forecasting. To establish the forecasting skill, we used learning curves to obtain the best model parameters after extensive trial and error. Results suggest that, over the whole of 2019 as a test period, the 1-hour forecast POD and FAR are 0.52 and 0.64, and the 2-hour forecast POD and FAR are 0.41 and 0.71.
1University of Copenhagen / Danish Meteorological Institute, 2Danish Meteorological Institute, 3Cooperative Institute for Research in Environmental Sciences, University of Colorado, 4Niels Bohr Institute, University of Copenhagen, 5ECMWF
Atmospheric radiative transfer computations are an important but computationally demanding component of numerical weather and climate models. RRTM for GCM applications - Parallel (RRTMGP) is a newly developed scheme for predicting the optical properties of the gaseous atmosphere (Pincus et al., 2019). Compared with its predecessor RRTMG, which is widely used in dynamical models, the scheme is more accurate but it is also computationally slower in the longwave due to a higher spectral resolution.
In this work, neural networks (NN) were developed to replace the longwave computations of RRTMGP. A large range of atmospheric conditions were sampled when generating the training data. The new code, which utilizes BLAS to accelerate the underlying matrix computations, is faster by a factor of 1-4, depending on the software and hardware platform. We also refactored the radiative solver RTE, which can be used together with RRTMGP to calculate radiative fluxes. Computing clear-sky longwave fluxes with the accelerated version of RTE+RRTMGP is 2-3 times faster than the original scheme when neural networks are used. The errors in fluxes and heating rates, evaluated using benchmark line-by-line calculations of atmospheric radiation, are very similar to the original scheme across a large range of atmospheric states which span both present-day and future climate. The results indicate that a targeted machine learning approach can offer a significant speed-up while retaining high accuracy.
References
Pincus, R., Mlawer, E. J., & Delamere, J. S. (2019). Balancing accuracy, efficiency, and flexibility in radiation calculations for dynamical models. Journal of Advances in Modeling Earth Systems, 11(10), 3074-3089
1Karlsruhe Institute of Technology, 2IMK-TRO, Karlsruhe Institute of Technology (KIT)
The physical and dynamical processes associated with warm conveyor belts (WCBs) have a major impact on large-scale midlatitude dynamics and are important sources and magnifiers of forecast uncertainty. Most often, WCBs are defined as trajectories that ascend within two days from the lower troposphere into the upper troposphere. Although this Lagrangian approach has proven to advance our understanding of the involved processes significantly, the calculation of trajectories is computationally expensive and requires data at high spatial and temporal resolution. In this study, we present a convolutional neural network (CNN) model that aims to predict the inflow, ascent, and outflow phases of WCBs from instantaneous gridded fields. To this end, a U-Net-type CNN is trained using a combination of meteorological parameters from the ERA-Interim reanalysis as predictors. Validation against a Lagrangian-based dataset confirms that the CNN model reliably replicates the climatological frequency of WCBs as well as the footprints of WCBs at instantaneous time steps. The overall goal is to apply the CNN model as a verification diagnostic to large datasets such as ensemble reforecasts or climate model projections. This will enable us to identify processes related to WCBs that dilute the forecast skill in these models.
1RIKEN Center for Computational Science, 2The University of Tokyo, 3RIKEN
Model bias correction has been studied as an important subject in data assimilation. Model biases can be effectively treated by statistical model bias correction methods combined with a variational or Kalman filter-based data assimilation method (Dee, 2005). Conventional methods assumed a bias correction term of a simple functional form such as a constant or a linear dependence on model state variables (Dee and Da Silva, 1998, Danforth et al., 2007). Recently, data-driven estimation of governing equations of the system in an arbitrary form, a.k.a. ‘model detection’ and ‘system identification’, has been rapidly evolving by virtue of machine learning (Brunton et al. 2016, Vlachas et al. 2018). The application of such machine learning methods to data assimilation is a potential solution to model bias correction with unknown complexity.
In this study, the application of Long Short-Term Memory (LSTM) networks, a kind of recurrent neural network, to model bias correction problems is explored in the context of data assimilation using the Local Ensemble Transform Kalman Filter (LETKF). The proposed method is applicable to a model bias that depends on the current and past model states in an arbitrarily nonlinear manner. Spatial localization is found to be effective and is also implemented. The new method is examined in idealized numerical experiments using a multi-scale Lorenz-96 model (Lorenz 1996; Wilks 2005).
As a first step, the feasibility of LSTM-based schemes is examined in an offline manner: the LSTM is trained with pairs of forecast and analysis time series produced by the LETKF without bias correction. Bias correction with the LSTM yielded more accurate analyses and forecasts than simple polynomial fitting and standard neural networks in estimating the bias correction term for a missing feedback from smaller-scale variables. The potential advantage of using the LSTM in various situations will be discussed.
A variety of reasons may contribute to the many kinds of false signals among the luminous events detected by the Fengyun-4A (FY-4A) Lightning Mapping Imager (LMI). Different filtering algorithms have previously been established to filter out the corresponding false signals originating from different sources. In order to find a more general method that can filter out different kinds of false signals at the same time, an algorithm using Bayesian inference was proposed and applied to LMI detections during a thunderstorm in the Beijing-Tianjin-Hebei region on August 8, 2017. First, events that occurred within 330 ms and 16.5 km of each other were considered consecutive signals caused by lightning, based on a lightning clustering parameterization method; events not satisfying this condition were regarded as isolated false signals and filtered out. Second, the intensity and background brightness of consecutive signals differ from those of isolated false signals; based on these differences, Bayesian inference was applied, for the first time, to filter out the false signals remaining among the consecutive signals. Considering the possible errors of the Bayesian inference, a signal re-judgment method based on the spatiotemporal continuity of lightning was designed to filter out as many false signals as possible while preserving real lightning signals. Finally, blackbody brightness temperature (TBB) data and World Wide Lightning Location Network (WWLLN) data were used for a preliminary verification of the algorithm. The results show that false signals account for about 50% of all signals. After filtering out the false signals, more than 90% of the remaining signals lie in regions with cloud-top temperatures lower than 240 K, and the signals' locations correspond well with those of WWLLN strokes.
1University of Reading
We present a method to reduce computational expense by making an on-the-fly decision as to which level of numerical scheme accuracy is used to progress to the next timestep during integration of a dynamical system. The decision is driven by a neural network and is performed in real time.
Traditionally, we can use numerical schemes of varying computational expense to calculate the solutions of an ODE system to different accuracy, and we use the same scheme for a whole time period. In this experiment we try to identify situations when it is worth spending more, and situations when we can afford to save computational expense.
We develop this method on a basic three-variable chaotic system: the Rössler system. Initially we choose two schemes and then train a recurrent neural network to predict the deviation of the less accurate scheme from the more accurate one. This prediction is then used to decide, at each timestep, which scheme is used to proceed to the following timestep. By doing this, our method adapts to the dynamical instability of the current system state. This reduces the computational expense whilst achieving similar levels of accuracy over a fixed time interval.
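A toy version of this decision loop on the Rössler system might look as follows; note that the hand-written error proxy below stands in for the trained recurrent network and, unlike the real method, is not cheaper than simply using RK4 everywhere.

```python
# Hedged toy sketch of the adaptive integration loop: at each step a predictor
# decides between a cheap (Euler) and an expensive (RK4) scheme. Parameters and
# thresholds are illustrative assumptions.
import numpy as np

def rossler(v, a=0.2, b=0.2, c=5.7):
    x, y, z = v
    return np.array([-y - z, x + a * y, b + z * (x - c)])

def euler(v, dt):
    return v + dt * rossler(v)

def rk4(v, dt):
    k1 = rossler(v); k2 = rossler(v + dt / 2 * k1)
    k3 = rossler(v + dt / 2 * k2); k4 = rossler(v + dt * k3)
    return v + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

v, dt = np.array([1.0, 1.0, 1.0]), 0.01
for _ in range(10000):
    # Proxy for the NN's predicted Euler-vs-RK4 deviation; in the actual method
    # a cheap recurrent network makes this call instead of computing both schemes.
    deviation = np.linalg.norm(euler(v, dt) - rk4(v, dt))
    v = rk4(v, dt) if deviation > 1e-5 else euler(v, dt)
```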
We make an analysis of the relative computational costs versus accuracy reduction. We postulate under which circumstances such an adaptive method might be beneficial. As mentioned, this experiment uses an ODE system and thus only makes adaptive changes to the time-integration. The future goal is to apply this technique to larger PDE systems, in which case changes to spatial-integration will be explored.
1RIKEN, 2IMT Atlantique
The Phased-Array Weather Radar (PAWR) has been in operation in Japan since 2012. The PAWR scans the whole sky in the 60-km range every 30 seconds at 110 elevation angles. Taking advantage of the PAWR's frequent and dense three-dimensional (3D) volume scans, we developed a high-resolution regional numerical weather prediction (NWP) system (SCALE-LETKF, Miyoshi et al., 2016a,b, Lien et al., 2017), and an optical-flow-based 3D precipitation nowcasting system (Otsuka et al., 2016).
Because convective clouds evolve rapidly within a 10-minute forecast, the assumption of Lagrangian persistence is sometimes violated, and the prediction skill of the nowcast drops quickly with forecast lead time. SCALE-LETKF provides physically based predictions; therefore, NWP is expected to outperform the nowcast at longer lead times. Merging NWP and nowcasting will provide better predictions than either alone.
Recent advances in machine-learning algorithms provide an efficient way to do this. In this study, a 3D extension of the Convolutional Long Short-Term Memory network (Conv-LSTM; Shi et al., 2015) is applied to PAWR nowcasting. In addition to a Conv-LSTM fed with past observations, we also develop a Conv-LSTM that accepts forecast data. NWP uses HPC resources with the full physics equations of the atmosphere, so that a Conv-LSTM with NWP input would be a new direction toward fusing Big Data and HPC, in which training with big data from high-resolution NWP and data assimilation, as well as PAWR observations, will be a challenge.
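For reference, a single 2D ConvLSTM cell after Shi et al. (2015) can be sketched as below (PyTorch; the study extends this idea to 3D volumes, e.g. Conv3d in place of Conv2d; sizes are illustrative assumptions):

```python
# Hedged sketch of one ConvLSTM cell: all four gates come from a single
# convolution over the concatenated input and hidden state.
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    def __init__(self, in_ch: int, hid_ch: int, k: int = 3):
        super().__init__()
        self.conv = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, k, padding=k // 2)

    def forward(self, x, h, c):                  # x: (B, in_ch, H, W)
        i, f, o, g = torch.chunk(self.conv(torch.cat([x, h], dim=1)), 4, dim=1)
        c_next = f.sigmoid() * c + i.sigmoid() * g.tanh()  # cell state keeps echo memory
        h_next = o.sigmoid() * c_next.tanh()
        return h_next, c_next

cell = ConvLSTMCell(1, 16)
x = torch.randn(2, 1, 64, 64)                    # a batch of reflectivity frames
h = c = torch.zeros(2, 16, 64, 64)
h, c = cell(x, h, c)                             # step once per radar frame
```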
The 3D Conv-LSTM successfully made predictions of convective storms. On average, the Conv-LSTM outperformed the optical flow statistically. Furthermore, Conv-LSTM with forecast data outperformed that without forecast data.
1Météo-France, 2Météo-France
Deep learning and especially Convolutional Neural Networks (CNN) have proved their ability to recognize particular shapes in meteorological fields, such as weather fronts, tropical cyclones or hailstorms. In this presentation, a U-Net architecture is used to detect bow echoes from the reflectivity forecasts of the French AROME-EPS.
A bow echo is a particular type of mesoscale convective system (MCS) which can be responsible for strong wind gusts or tornadoes, and whose main characteristic is a bow shape in the reflectivity field. Convection-permitting models allow for an accurate representation of these events; however, detecting bow echoes in model outputs essentially relies on a visual, time-consuming examination. The neural network is trained and evaluated on hand-labelled databases of several hundred pictures. A comprehensive sensitivity analysis of the U-Net to several parameters has been performed to determine the best configuration. An evaluation of the U-Net performance indicates that the detection rate is above 80% and the false alarm rate below 30%. The detection rate is better for heavy and large bow echoes, and most false alarms are weak and small bow echoes, in terms of reflectivity intensity and bow echo size. We show that this detection tool could be used efficiently in operational practice to summarize the AROME-EPS output and evaluate the risk of bow echoes. The U-Net can also be successfully applied to the deterministic AROME outputs and to radar data, without re-training. Based on this detection on radar data, an evaluation of the AROME-EPS skill in forecasting bow echoes is presented.
1The Key Laboratory of Microwave Remote Sensing, National Space Science Center, Chinese Academy of Sciences, 2The Royal Netherlands Meteorological Institute (KNMI), the Netherlands
Ku-band backscatter (NRCS) is more affected by rain than that of C-band scatterometers, with 10 times more rejections in wind quality control (QC). The normalized residual from wind retrieval (Rn) is correlated with rain rate, while the spatially informed QC indicator (Joss) demonstrates improved rain screening. Hence, both geophysical and spatial retrieval properties can inform rain QC.
Machine learning excels at representing complex relations between parameters, and provides a different basis and scaling for data interpretation than other modeling applications, extracting structural information and relations from parameters and existing models. Among ML methods, support vector machines (SVMs) are good at optimal feature extraction and value regression.
An SVM is applied to collocations of ASCAT and OSCAT-2 data, with rain references from GPM products. First, the subset for which rain correction and wind improvement is necessary and possible is determined by investigating Rn and Joss, considering the registration and observing features of the rain products. To ensure the SVM extracts validated kernel functions that embed the information in the collocated C-band observations and geophysical model functions, the properties of the collocations are analyzed. In addition, a fraction of the data is withheld from training and kept for validation. A parameter related to the rain fraction in a wind vector cell (WVC) is also proposed to describe the underlying physics of the problem. A recognition-regression procedure for rain rate and a wind regression approach have been conducted. Rain identification on Ku-band scatterometer QC rejections reinstates 70.2% of the rejected WVCs with corrected wind speeds of good accuracy. The correlation coefficient between GPM and the estimated rain rate is 0.42 for rain rates ranging from 0 to 10 mm/h. The results can provide both wind and rain information for applications, and a good reference for wind-rain uncertainty in data assimilation. Further research will investigate kernel functions for the uncertainty determination of the established SVM.
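A minimal sketch of the rain-rate regression stage with a support vector regressor (scikit-learn; features and data are placeholder assumptions, not the collocation dataset described above):

```python
# Hedged sketch: map per-WVC features (e.g. Rn, Joss, NRCS, retrieved wind
# speed) to rain rate with an RBF-kernel SVR, keeping a validation split.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

X = np.random.rand(3000, 5)          # placeholder per-WVC feature vectors
rain = np.random.rand(3000) * 10     # placeholder GPM rain rates (mm/h)

X_tr, X_va, r_tr, r_va = train_test_split(X, rain, test_size=0.2, random_state=0)
model = SVR(kernel="rbf", C=10.0, epsilon=0.2).fit(X_tr, r_tr)
corr = np.corrcoef(model.predict(X_va), r_va)[0, 1]  # cf. the 0.42 reported above
```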
1CEREA, ENPC, 2ECMWF, 3Ecole des Ponts ParisTech
Recent developments in machine learning (ML) have demonstrated impressive skill in reproducing complex spatiotemporal processes. However, contrary to data assimilation (DA), the underlying assumption behind ML methods is that the system is fully observed and noise-free, which is rarely the case in numerical weather prediction. In order to circumvent this issue, it is possible to embed the ML problem into a DA formalism characterised by a cost function similar to that of weak-constraint 4D-Var (Bocquet et al., 2019; Bocquet et al., 2020). In practice, ML and DA are combined to solve the problem: DA is used to estimate the state of the system while ML is used to estimate the full model.
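For reference, a generic weak-constraint 4D-Var-style cost of the kind alluded to, with the model $\mathcal{M}_p$ carrying learnable parameters $p$ (notation is generic, not the papers' exact formulation):

$$J(p, x_0, \ldots, x_K) = \frac{1}{2}\sum_{k=0}^{K} \big\|y_k - \mathcal{H}_k(x_k)\big\|_{R_k^{-1}}^{2} + \frac{1}{2}\sum_{k=1}^{K} \big\|x_k - \mathcal{M}_p(x_{k-1})\big\|_{Q_k^{-1}}^{2}$$

Minimizing over the states $x_k$ is the DA half of the problem; minimizing over the parameters $p$ is the ML half.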
In realistic systems, the model dynamics can be very complex and it may not be possible to reconstruct them from scratch. An alternative is to learn the model error of an already existing model using the same approach combining DA and ML. In this presentation, we test the feasibility of this method using a quasi-geostrophic (QG) model. After a brief description of the QG model, we introduce a realistic model error to be learnt. We then assess the potential of ML methods to reconstruct this model error, first with perfect (full and noiseless) observations and then with sparse and noisy observations. We show in either case to what extent the trained ML models correct the mid-term forecasts. Finally, we show how the trained ML models can be used in a DA system and to what extent they correct the analysis.
Bocquet, M., Brajard, J., Carrassi, A., and Bertino, L.: Data assimilation as a learning tool to infer ordinary differential equation representations of dynamical models, Nonlin. Processes Geophys., 26, 143–162, 2019
Bocquet, M., Brajard, J., Carrassi, A., and Bertino, L.: Bayesian inference of chaotic dynamics by merging data assimilation, machine learning and expectation-maximization, Foundations of Data Science, 2 (1), 55-80, 2020
1Royal Netherlands Meteorological Institute
A global database of aerosol absorption is important to reduce the uncertainty in aerosol radiative forcing estimates. Currently, quantitative aerosol absorption properties, e.g. the absorbing aerosol optical depth (AAOD) and the single scattering albedo (SSA), are mainly provided by the ground-based AErosol RObotic NETwork (AERONET). However, the spatial distribution of AERONET sites is sparse and uneven, so that geo-statistical interpolation/extrapolation is not feasible on a global scale. Quantitative retrievals of aerosol absorption are even more challenging for satellite remote sensing (e.g. multi-angular measurements), while the qualitative satellite ultra-violet aerosol index (UVAI) is much easier to obtain without a priori assumptions on aerosol optical properties in retrieval algorithms. The global UVAI record has been contributed continuously by various sensors for over four decades, which is beneficial for constructing a long-term global aerosol absorption climatology. Hence, we build a numerical relationship between OMAERUV UVAI and AERONET AAOD to estimate the global AAOD with a deep neural network algorithm. The training data set is constructed from independent measurements and/or model simulations with strict quality controls. The input features are selected by both filter and wrapper methods. The optimal model parameters are determined by 10-fold cross-validation. The predicted AAOD and the further derived SSA show better agreement with AERONET observations than those provided by OMAERUV and MERRA-2.
1German Climate Computing Center (DKRZ), 2NVIDIA, 3Freie Univ Berlin
Historical temperature measurements are the basis of global climate datasets like HadCRUT4. This dataset contains many missing values, particularly for periods before the mid-twentieth century, although recent years are also incomplete. Here we demonstrate that artificial intelligence can skilfully fill these observational gaps when combined with numerical climate model data. We show that recently developed image inpainting techniques perform accurate monthly reconstructions via transfer learning using either 20CR (Twentieth-Century Reanalysis) or the CMIP5 (Coupled Model Intercomparison Project Phase 5) experiments. The resulting global annual mean temperature time series exhibit high Pearson correlation coefficients (≥0.9941) and low root mean squared errors (≤0.0547 °C) as compared with the original data. These techniques also provide advantages relative to state-of-the-art kriging interpolation and principal component analysis-based infilling. When applied to HadCRUT4, our method restores a missing spatial pattern of the documented El Niño from July 1877. With respect to the global mean temperature time series, a HadCRUT4 reconstruction by our method points to a cooler nineteenth century, a less apparent hiatus in the twenty-first century, an even warmer 2016 being the warmest year on record and a stronger global trend between 1850 and 2018 relative to previous estimates. We propose image inpainting as an approach to reconstruct missing climate information and thereby reduce uncertainties and biases in climate records.
From:
Kadow, C., Hall, D.M. & Ulbrich, U. Artificial intelligence reconstructs missing climate information. Nature Geoscience 13, 408–413 (2020). https://doi.org/10.1038/s41561-020-0582-5
1Netherlands eScience Center, 2Utrecht University
Recent developments in deep learning have led to many new neural networks and methods potentially applicable to weather forecasting and climate science. Among these techniques, Bayesian deep learning (BDL) is a good candidate for improving and speeding up weather forecasts. However, before implementing it for operational weather forecasting, it is necessary to understand its characteristics in weather forecasting workflows, for instance in representing different types of uncertainty. In this study, we use Bayesian long short-term memory neural networks (BayesLSTM) to forecast output from the Lorenz 84 system with seasonal forcing. Fundamental concepts of data assimilation are reflected upon in our analysis, and we demonstrate that the BayesLSTMs are able to address uncertainties in the initial conditions and model parameters. We evaluate different possibilities for generating ensemble forecasts with the BayesLSTM and formulate a viable strategy for sampling the BayesLSTM and making forecasts. By perturbing the Lorenz 84 model parameters and initial conditions, we find that forecasts with the BayesLSTM perform similarly to the perturbed Lorenz model outputs in temporal space, spectral space, and Euclidean space, depending on lead time. We show that the forecasts with the neural network can stay close to the attractor of the Lorenz system. Our study indicates that BDL is promising for accelerating forecasts and enhancing weather forecasting capabilities.
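One common, lightweight way to draw an ensemble from a Bayesian-style LSTM is Monte Carlo dropout, sketched below; this is named plainly as a stand-in, since the study's BayesLSTM may use a different posterior approximation.

```python
# Hedged sketch: Monte Carlo dropout as an approximate Bayesian LSTM ensemble.
# Keeping dropout active at prediction time yields stochastic forward passes
# whose spread serves as an uncertainty estimate. Sizes are illustrative.
import torch
import torch.nn as nn

class MCDropoutLSTM(nn.Module):
    def __init__(self, n_in=3, n_hid=32, n_out=3, p=0.1):
        super().__init__()
        self.lstm = nn.LSTM(n_in, n_hid, batch_first=True)
        self.drop, self.head = nn.Dropout(p), nn.Linear(n_hid, n_out)

    def forward(self, seq):                     # seq: (batch, time, n_in)
        h, _ = self.lstm(seq)
        return self.head(self.drop(h[:, -1]))   # forecast the next state

model = MCDropoutLSTM()
model.train()                                   # keep dropout active at test time
seq = torch.randn(1, 20, 3)                     # placeholder Lorenz-84 history
ensemble = torch.stack([model(seq) for _ in range(50)])  # 50 stochastic samples
mean, spread = ensemble.mean(0), ensemble.std(0)
```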
1Deimos Space, 2DEIMOS Space S.L.U., 3Agencia Estatal de Meteorología
Recent years have seen a sharp growth in Satellite Earth Observation (EO) product applications, such as environment and resource monitoring, emergency management and civilian security, leading to an increase in demands on amount, type and quality of remote-sensing satellite data and efficient methods for data analysis. While modern ML and AI algorithms are revolutionizing automatization, speed and quality of data analysis, the use of satellite EO-based image products for rapid meteorological and civil security applications is still limited by the bottleneck and long latencies created by the classical EO data chain, which involves the acquisition, compression, and storage of sensor data on-board the satellite, and its transfer to ground for further processing.
The H2020 EU project EO-ALERT (http://eo-alert-h2020.eu), addresses this problem through the development of a next-generation EO data processing chain that moves optimised key elements from the ground segment to on-board the satellite. Applying optimized ML methods, EO products are generated directly on-board the spacecraft.
The capabilities of the EO-ALERT product and its remote sensing data processing chain are demonstrated in an application scenario for meteorological nowcasting and very-short-range forecasting for early warnings of convective storms. Its three-step approach consists of: extraction of candidate convective cells from satellite imagery; tracking of their positions and of features extracted from IR and visible channels over time; and discrimination of convective cells at their different stages of evolution using ML classifiers (AdaBoost). Training and validation are performed using a specifically created dataset of MSG-SEVIRI images and OPERA weather-radar network composites corresponding to 205 days between 2016 and 2018 exhibiting extreme convective weather events. The performance is further compared against NWCSAF's Rapid Developing Thunderstorms (RDT-CW) product. Through the on-board implementation, the system is able to detect and predict convective storm cells and to send the processed information to ground within 5 minutes of the observation.
1State Key Laboratory of Atmospheric Boundary Layer Physics and Atmospheric Chemistry (LAPC), Institute of Atmospheric Physics, Chinese Academy of Sciences, 2Computer Network Information Center, Chinese Academy of Sciences, 3Chengdu University of Information Technology
Current mainstay algorithms for precipitation nowcasting are based on radar echo extrapolation techniques, as radar data provide detailed and continuous snapshots of moving storms at high spatiotemporal resolution (e.g. 1 km, 6 min). Assimilating radar data into mesoscale numerical weather prediction (NWP) models yields limited improvements in prediction skill because these two sources of information are intrinsically of different spatiotemporal scales of the underlying complex atmospheric flow. Here we attempt to rely on AI techniques as a model-data fusion framework to assimilate various sources of information for precipitation nowcasting. Convolutional LSTM (ConvLSTM) deep neural networks are adopted as the cornerstone of our fusion framework for their established ability to learn spatiotemporal representations across scales. We find that ConvLSTM nowcasting trained with radar echo data predicts blurred radar echo images, especially at longer lead times. This is partly due to the configuration of the nowcasting system, which puts more weight on heavy rain, and partly due to the absence of atmospheric dynamics in the nowcasting system. ConvLSTM nowcasting trained with the ERA5-Land hourly total precipitation product (0.1°) captures the smooth evolution of the atmospheric flow but fails to detect abrupt variations. We extend the training datasets to include diagnostics from Weather Research and Forecasting (WRF) simulations, such as CAPE (Convective Available Potential Energy) and PW (Precipitable Water). By assimilating this additional information about the dynamics of the atmosphere, the ConvLSTM nowcasting succeeds in detecting the abrupt variations of the atmosphere. We conclude that AI is a promising technique to assimilate multiple sources of information across different scales in precipitation nowcasting. Further in-depth investigations are needed to develop AI-based model-fusion algorithms (e.g. transfer learning) that can better assimilate these diverse data and break the conventional limits of nowcasting.
1RIKEN
In the era of modern science, scientists have developed numerical models to predict and understand weather and ocean phenomena based on fluid dynamics. While these models have shown high accuracy at kilometer scales, they require massive computing resources because of their computational complexity. In recent years, new approaches to solving these models based on machine learning have been put forward. The results suggest that it is possible to reduce the computational complexity by using Neural Networks (NNs) instead of classical numerical simulations. In this project, we aim to shed light on different ways of accelerating physical models using NNs. We test two approaches, a Data-Driven Statistical Model (DDSM) and a Hybrid Physical-Statistical Model (HPSM), and compare their performance to the classical Process-Driven Physical Model (PDPM). The DDSM emulates the physical model with a NN. The HPSM, also known as super-resolution, uses a low-resolution version of the physical model and maps its outputs to the original high-resolution domain via a NN. To evaluate these two methods, we measured their accuracy and their computation time. Our results from idealized experiments with a quasi-geostrophic model show that the HPSM reduces the computation time by a factor of 3 and is capable of predicting the output of the physical model with high accuracy up to 9.25 days. The DDSM reduces the computation time by a factor of 4 but can predict the physical model output with acceptable accuracy only within 2 days. These first results are promising and suggest the possibility of bringing complex physical models into real-time systems with lower-cost computing resources in the future.
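For the HPSM, a minimal sketch of the low-to-high-resolution mapping is shown below; the grid sizes, upsampling factor, and network layout are illustrative assumptions.

```python
# Hedged sketch: a small CNN that maps coarse physical-model output to the
# original high-resolution grid (the super-resolution step of the HPSM).
import tensorflow as tf
from tensorflow.keras import layers, models

coarse, factor = 32, 4                    # e.g. 32x32 fields upsampled to 128x128

hpsm_net = models.Sequential([
    layers.Input(shape=(coarse, coarse, 1)),
    layers.Conv2D(64, 3, padding="same", activation="relu"),
    layers.UpSampling2D(factor),
    layers.Conv2D(64, 3, padding="same", activation="relu"),
    layers.Conv2D(1, 3, padding="same"),  # high-resolution field
])
hpsm_net.compile(optimizer="adam", loss="mse")
```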
1ECMWF
Billions of new observations are added every day to an already vast record of earth system observations from satellite and ground-based measuring devices. The future will see increasing diversity from sources like smallsats and networked devices (such as smartphones or internet-of-things devices). There are also important and often unique observing capabilities in research campaigns and field sites. These data are used for two purposes: to make analyses of the evolving geophysical state, and to validate and improve physical models of the earth system. Hence, observations are foundational both in the initial conditions for geophysical forecasting and in the models used to make forward predictions. The current state of the art, both for analysis and model parameter estimation, is data assimilation (DA). The new wave of machine learning (ML) for the earth sciences may offer possibilities including the complete replacement of the DA process and the learning of new model components from scratch. But ML will have to contend with the characteristics of real observations: they are indirect, ambiguous, sparse, diverse, only partially representative, and affected by many uncertainties. Current DA methods have the tools to handle these issues in a statistically optimal manner, whereas current ML approaches are typically applied only to regular, 'perfect' data. However, there is no conflict between ML and DA, since both are founded on Bayesian probabilistic methods and have an exact mathematical equivalence. The DA and ML methods applied in the earth sciences can learn from each other, and the future is likely to be a combination of both.
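The stated equivalence can be made concrete with the standard variational cost function, written here in conventional DA notation (a textbook form, not specific to this talk):

```latex
% Negative log-posterior minimized in (3D-)variational DA:
\[
  J(\mathbf{x}) = \tfrac{1}{2}\,(\mathbf{x}-\mathbf{x}_b)^{\mathsf{T}}\mathbf{B}^{-1}(\mathbf{x}-\mathbf{x}_b)
  + \tfrac{1}{2}\,\bigl(\mathbf{y}-\mathcal{H}(\mathbf{x})\bigr)^{\mathsf{T}}\mathbf{R}^{-1}\bigl(\mathbf{y}-\mathcal{H}(\mathbf{x})\bigr)
\]
```

The observation term is an uncertainty-weighted data misfit, exactly the role of a training loss in ML, while the background term acts as a prior, i.e. a regularizer; this is the shared Bayesian structure referred to above.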
1NCAR, 2National Center for Atmospheric Research
Earth scientists have developed complex models to represent the interactions among small particles in the atmosphere, such as liquid water droplets and volatile organic compounds. These complex models produce significantly different results from their simplified counterparts but are too computationally expensive to run directly within weather and climate models. Machine learning emulators trained on a limited set of runs from the complex models have the potential to approximate the results of these complex models at a far smaller computational cost. However, there are many open questions about the best approaches to generating emulation data, training ML emulators, evaluating them both offline and within Earth System Models, and explaining the sensitivities of the emulator.
In this talk, we will discuss the development of machine learning emulators for warm-rain bin and superdroplet microphysics as well as for the Generator of Explicit Chemistry and Kinetics of Organics in the Atmosphere (GECKO-A) model. For microphysics, we ran CAM6 for 2 years with the cloud-to-rain processes handled by either the bin or the superdroplet scheme and saved the inputs and outputs of the scheme globally at selected time steps. We used this information to train a set of neural network emulators. The neural network emulators are able to approximate the behavior of the bin and superdroplet schemes while running within CAM6, but at a computational cost close to that of the bulk MG2 scheme. Machine learning interpretation techniques also reveal the relative contributions and sensitivities of the different inputs to the emulator. We will discuss lessons learned about both the training process and the resulting model climate.
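A minimal sketch of the offline emulator-training step is given below; the input/output dimensions, network size, and synthetic arrays are illustrative stand-ins for the saved CAM6 scheme data.

```python
# Hedged sketch: train a dense NN emulator on (input, tendency) pairs dumped
# from a microphysics scheme during a model run.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

n_in, n_out = 8, 4                          # e.g. state variables -> tendencies
rng = np.random.default_rng(0)
X = rng.normal(size=(100_000, n_in))        # stand-in for saved scheme inputs
Y = rng.normal(size=(100_000, n_out))       # stand-in for saved scheme outputs

emulator = models.Sequential([
    layers.Input(shape=(n_in,)),
    layers.Dense(128, activation="relu"),
    layers.Dense(128, activation="relu"),
    layers.Dense(n_out),
])
emulator.compile(optimizer="adam", loss="mse")
emulator.fit(X, Y, epochs=5, batch_size=1024, validation_split=0.1)
```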
For GECKO-A, we ran the model forward in time with multiple sets of fixed atmospheric conditions and different precursor compounds. Then we trained fully connected and recurrent neural networks to emulate GECKO-A's outputs. We tested the different machine learning methods by running them forward in time with both fixed and varying atmospheric conditions. We plan to incorporate the GECKO-A emulator into a full 3D atmospheric model to evaluate how the transitions between precursors, gases, and aerosols evolve spatio-temporally. We will also discuss lessons learned from working with this dataset and the challenges of problem formulation and evaluation. Finally, we are releasing data from both of these domains as machine learning challenge problems to encourage further innovation in this area by the broader Earth Science and Machine Learning communities.
1ESA
The Rise of AI for EO
AI and EO is a marriage made in heaven!
Today, AI is in the midst of a true “renaissance”, driven by Moore’s Law and now super-fed by Big Data. We believe AI has a huge, but still largely untapped, potential for EO technology. In a sense, we are now at an inflection point, at a kind of crossroads of opportunities, whereby on the one hand AI is becoming one of the most transformative technologies of the century, while on the other hand European EO capability is delivering a unique, comprehensive and dynamic picture of the planet, thereby generating big open data sets to be explored by AI.
Due to the rapid increase in the volume and variety of EO data sources, AI techniques are increasingly needed to analyse the data in an automatic, flexible and scalable way. Today, EO data remotely sensed from space are particularly suited to, but at the same time challenging for, AI processing as they are:
• Big in size and volume, with Terabytes of data routinely streamed daily from space, which ultimately need to be turned into “small” actionable information.
• Diverse, including a variety of sensors from optical (multispectral and hyperspectral) to radar. Up to now, AI has been applied mainly to optical imagery, in particular at very high resolution using traditional Computer Vision techniques (mainly RGB bands). More work is needed to make full use of all available spatial, temporal and spectral information of EO data at the global scale, e.g. exploiting the full information of the “complex nature” of radar data within AI schemes, including information on the amplitude, frequency, phase or polarization of the collected radar echoes.
• Complex and physically based, capturing dynamic features of a highly non-linear coupled Earth System. This goes well beyond merely recognising cats and dogs in images, for which a wide variety of training datasets is available (such as ImageNet), and also calls for integrating physical principles into the statistical approach.
Machine algorithms powered by AI are therefore critically needed to accelerate “insight” into the data, but always in combination with domain experts, who are vital to properly interpret the statistical correlations. The intersection of AI and EO remains an emergent field, but a rapidly growing one. There has been a lot of work on ML applied to EO over the past decade, but with the rapid emergence of Deep Learning the field has been growing quickly, as illustrated by the increase in the number of publications. However, although very powerful, DL techniques suffer from their own inherent limitations, such as being data-hungry, lacking transparency, and being unable to distinguish causation from correlation.
In this talk, we will present some of the ESA activities on AI4EO, and in particular the Φ-lab, in order to better understand how to harness the full power of Artificial Intelligence for Earth Observation (AI4EO).
1NOAA, 2RTI
In this presentation we will attempt to demonstrate that adopting modern AI techniques, including ML, has the potential to optimize the Numerical Weather Prediction (NWP) process and, ultimately, the Earth System Model of the future, when all components of the environment become coupled at the assimilation and forecasting levels. We will highlight some of the results of an incubator effort at NOAA’s Center for Satellite Applications and Research, where the initial work has focused on satellite data exploitation. The study covers (1) instrument calibration error correction, (2) pre-processing of satellite data and quality control, (3) parameterization of radiative transfer modeling, (4) data fusion and data assimilation, (5) post-processing of NWP outputs to correct for systematic and geophysically varying errors, and finally (6) spatial and temporal resolution enhancement.
We will assess the quality of the analyses obtained using an entirely AI-based system by checking the inter-parameter correlation matrix, the spatial variability spectrum, and mass conservation. A first step shown in this study is to ensure that we can perform steps similar to those currently done in NWP without loss of accuracy and without introducing artefacts, but with a significant increase in efficiency. This will allow us to assimilate and exploit a higher volume of data and to begin exploiting other sources of environmental data such as IoT, smallsats, near-space platforms, etc.
We will also discuss the potential of AI/ML beyond the efficiency aspect, and the limitations that should be circumvented in order to achieve the full potential of AI/ML in NWP.
1NVIDIA
In the last few years, the Earth System Science community has rapidly come to adopt machine learning as a viable and useful approach for doing science, and it has been applied to a surprisingly diverse array of problems, with an ever-increasing degree of success. However, despite our progress, there remains much to learn. In its current form, ML may be best viewed as an alternative and complementary approach to traditional software development. However, we have a great deal to learn about how to best employ and deploy these tools, which require a new and more ‘organic’ process. Furthermore, there are many challenges specific to science that need to be further explored, including: massive data labelling, enforcing physical constraints, uncertainty quantification, explainability, reliability, AI safety, data movement problems, and the need for benchmarks. Finally, it is my opinion that ML has the potential to grow far beyond its current limits, as a wide range of possibilities open up when both the software and the software engineer are composed of code. In this presentation, we will explore these issues and survey cutting-edge research taking place on the frontiers of ML, sampling from topics such as self-supervision, continual learning, online learning, human-in-the-loop, AutoML, neural architecture search, expanded use of GANs, loss-function learning, spatio-temporal prediction, equation identification, ODE learning, and differentiable programming, amongst others.
1Colorado State University
Neural networks (NNs) have emerged as a promising tool in many meteorological applications. While they perform amazingly well at many complex tasks, neural networks are generally treated as a black box, i.e. it is typically considered too difficult to understand how they work. However, a better understanding of neural networks would have many advantages. It could provide important information for the design and improvement of NNs, increase trust in NN methods, especially for operational use, and even enable us to gain new scientific knowledge from NN models. Fortunately, progress in the computer science field of Explainable AI (XAI) is yielding many new methods that can help scientists gain a better understanding of a NN's inner workings. For example, neural network visualization methods, such as Layer-wise Relevance Propagation (LRP), can help meteorologists extract the strategies the neural network uses to make its decisions. Furthermore, viewing the problem more from a meteorologist’s perspective, another important tool is synthetic experiments, in which we design synthetic inputs that represent specific meteorological scenarios and test the response of the neural network to those inputs. We present some of these techniques and demonstrate their utility for sample applications. For example, we show how these methods can be used to identify strategies used by a neural network trained to emulate radar imagery from GOES satellite imagery. Finally, we look at the process of gaining insight into neural networks for meteorological applications as a whole, and highlight that it is an iterative, scientist-driven discovery process that incorporates old-fashioned methods of hypothesis generation, testing, and experimental design. In this context, NN visualization tools simply provide additional tools to assist the meteorologist in this endeavor, comparable to a biologist's use of a microscope as a tool for scientific analysis.
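A minimal sketch of one such visualization, a gradient-based saliency map (a simpler relative of LRP, which requires a dedicated implementation), is shown below; the trained Keras `model` and the input sample `x` are assumed to exist.

```python
# Hedged sketch: saliency map as the gradient of the network's top output
# with respect to the input pixels.
import tensorflow as tf

def saliency_map(model, x):
    """Return d(max class score)/d(input) for one sample x."""
    x = tf.convert_to_tensor(x[None, ...])       # add batch dimension
    with tf.GradientTape() as tape:
        tape.watch(x)
        scores = model(x)
        top = tf.reduce_max(scores[0])           # score of the most likely class
    return tape.gradient(top, x)[0].numpy()      # same shape as the input
```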
1TU Berlin
With the unprecedented advances in satellite technology, recent years have witnessed a significant increase in the volume of remote sensing (RS) image archives (Demir and Bruzzone 2016). Thus, the development of efficient and accurate content-based image retrieval (CBIR) systems for massive archives of RS images is a growing research interest in RS. CBIR aims to search for RS images with similar information content within a large archive with respect to a query image. To this end, CBIR systems are defined based on two main steps: i) an image description step (which characterizes the spatial and spectral information content of RS images); and ii) an image retrieval step (which evaluates the similarity among the considered descriptors and then retrieves images similar to a query image in order of similarity).
Due to the significant growth of RS image archives, image search and retrieval through a linear scan (which exhaustively compares the query image with each image in the archive) is computationally expensive and thus impractical. This is known as the large-scale CBIR problem. In large-scale CBIR, the storage of the data is also challenging, as RS image contents are often represented by high-dimensional features. Accordingly, in addition to the scalability problem, the storage of the image descriptors becomes a critical bottleneck. To address these problems, approximate nearest neighbour (ANN) search has attracted extensive research attention in RS. In particular, hashing-based ANN search schemes have become a cutting-edge research topic for large-scale RS image retrieval due to their high efficiency in both storage cost and search/retrieval speed. Hashing methods encode high-dimensional image descriptors into a low-dimensional Hamming space, where the image descriptors are represented by binary hash codes. In this way, the (approximate) nearest neighbours among the images can be efficiently identified based on the Hamming distance with simple bit-wise operations. In addition, the binary codes significantly reduce the amount of memory required for storing the content of images. Traditional hashing-based RS CBIR systems initially extract hand-crafted image descriptors and then generate hash functions that map the original high-dimensional representations into low-dimensional binary codes, such that the similarity in the original space is well preserved (Fernandez-Beltran et al. 2020). Thus, descriptor generation and hashing are applied independently, resulting in sub-optimal hash codes. The success of deep neural networks (DNNs) in image feature learning has inspired research on DL-based hashing methods, which can simultaneously learn the image representation and the hash function with proper loss functions.
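A minimal sketch of the retrieval step is given below, assuming a (hypothetical) deep model has already produced real-valued descriptors: the descriptors are binarized by sign thresholding and ranked by Hamming distance.

```python
# Hedged sketch: hash-based approximate nearest-neighbour retrieval.
import numpy as np

def to_hash(features):
    """Binary codes from real-valued descriptors (sign thresholding)."""
    return (features > 0).astype(np.uint8)

def hamming(query_code, archive_codes):
    """Hamming distance between a query code and all archive codes."""
    return np.count_nonzero(archive_codes != query_code, axis=1)

rng = np.random.default_rng(0)
archive = to_hash(rng.normal(size=(1_000_000, 64)))   # 64-bit codes, synthetic
query = to_hash(rng.normal(size=64))
top10 = np.argsort(hamming(query, archive))[:10]      # indices of best matches
```

In a production system the codes would be bit-packed so that the distance reduces to XOR and popcount; the array version above keeps the sketch short.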
This paper presents recent advances in CBIR systems in RS for fast and accurate information discovery from massive data archives. Initially, we analyse the limitations of traditional CBIR systems, which rely on hand-crafted RS image descriptors applied to exhaustive search and retrieval problems. Then, we focus our attention on advances in RS CBIR systems for which DL models are at the forefront. In particular, we present the theoretical properties of deep hashing based CBIR systems that offer highly time-efficient search within huge data archives (Roy et al. 2020). Particular attention is given to metric learning and graph-structure-driven deep hashing networks for scalable and accurate content-based indexing and retrieval of RS images. Finally, the most promising research directions in RS CBIR are discussed, together with a description of BigEarthNet, a new large-scale Sentinel-2 multispectral benchmark archive introduced to advance deep learning studies in RS (Sumbul et al. 2019).
REFERENCES
Demir, B., Bruzzone, L., 2016. Hashing Based Scalable Remote Sensing Image Search and Retrieval in Large Archives. IEEE Transactions on Geoscience and Remote Sensing 54 (2):892-904.
Fernandez-Beltran, R., Demir, B., Pla, F. and Plaza, A., 2020. Unsupervised Remote Sensing Image Retrieval using Probabilistic Latent Semantic Hashing. IEEE Geoscience and Remote Sensing Letters, doi: 10.1109/LGRS.2020.2969491.
Roy, S., Sangineto, E., Demir, B., Sebe, N., 2020. Metric-Learning based Deep Hashing Network for Content Based Retrieval of Remote Sensing Images. IEEE Geoscience and Remote Sensing Letters, in press.
Sumbul, G., Charfuelan, M., Demir, B., Markl, V., 2019. BigEarthNet: A Large-Scale Benchmark Archive for Remote Sensing Image Understanding. IEEE International Geoscience and Remote Sensing Symposium, pp. 5901-5904, Yokohama, Japan.
1China Meteorological Administration, 2Chinese Academy of Meteorological Sciences, 3Hebei Meteorological Observatory
Numerical weather prediction (NWP) models and observational systems have improved greatly in both quantity and quality in recent years. Combining expert knowledge and machine learning (ML) to extract and synthesize valuable information from these "big data" is expected to be one of the major present challenges and opportunities for improving severe weather forecasts. In this study, an ML correction method based on physical ingredients was developed to improve the hour-scale flash heavy rain forecast; it contains a key-factor extraction technique that blends ML feature engineering with expert knowledge, and several ML models.
The summer flash heavy rain events in the Beijing-Tianjin-Hebei region of China are studied. First, the hourly features were analyzed based on ten years of observations. Then a comprehensive verification was conducted on the operational forecasts, including the high-resolution ECMWF global model and multiple local forecasts (SMS 9 km and 3 km, GRAPES 9 km and 3 km, BJ-RUC 9 km and 3 km), and those with better performance were chosen as ML components. Data from the summers of 2018 and 2019 are used to train and test the ML models. Here, we report initial results.
More than 200 thermo-dynamical features were selected by expert judgment. Multiple ML feature-engineering techniques, including the correlation coefficient, mutual information, and embedded methods, are combined to further select features and remove redundancy (sketched below). The results show that the finer the scale, the more important the physical features. There is a good correlation between 6-hr (or 12-hr) accumulated precipitation and the estimated precipitation. However, the direct rainfall forecasts become less important for the 3-hr (or 1-hr) accumulations, and the contributions of thermo-dynamical features are significantly enhanced, in particular low-level moisture and the near-surface wind field. This means that the uncertainty of model rain increases as the forecast becomes finer, which is consistent with forecasters' experience. Therefore, using forecasted physical ingredients with higher reliability, together with rapidly updating observations, has the potential to improve the hour-scale flash heavy rain forecast. Multiple ML models, including ET (Extra Trees), RF (Random Forest), and CatBoost, were trained, and a secondary ensemble was then conducted. The results show that the ML correction based on physical features can significantly improve the forecast of 3-hour cumulative precipitation compared to the ECMWF and local meso-scale models. Taking the 10 mm rainfall threshold as an example, the TS (threat score) of the ET model is 0.27, about 35% higher than those of the numerical forecasts (0.2 or less); the missed-alarm rate is about 26% lower than the ECMWF forecast, and the false-alarm rate is about 28% lower than the local SMS model. A case study of flash heavy rain in August 2019 shows that both the intensity and the heavy rain area are significantly improved in the ML correction system.
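A minimal sketch of the combined feature-selection idea follows; the synthetic predictor table, the target, the equal blending weights, and the 50-feature cut-off are illustrative assumptions, not the study's actual configuration.

```python
# Hedged sketch: rank expert-chosen predictors by blending mutual information
# with an embedded tree-based importance, then keep the strongest ones.
import numpy as np
from sklearn.feature_selection import mutual_info_regression
from sklearn.ensemble import ExtraTreesRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 200))       # stand-in for ~200 physical features
y = rng.gamma(2.0, 2.0, size=5000)     # stand-in for 3-h accumulated rainfall

mi = mutual_info_regression(X, y, random_state=0)
et = ExtraTreesRegressor(n_estimators=200, random_state=0).fit(X, y)
imp = et.feature_importances_
score = 0.5 * mi / mi.max() + 0.5 * imp / imp.max()   # blended ranking
keep = np.argsort(score)[::-1][:50]                   # retain top-50 predictors
```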
1ESA Climate Office
Climate change is arguably the greatest challenge facing humankind in the twenty-first century. The United Nations Framework Convention on Climate Change (UNFCCC) provides the vehicle for multilateral action to combat climate change and its impacts on humanity and ecosystems. In order to make decisions on climate change mitigation and adaptation, the UNFCCC requires a systematic monitoring of the global climate system.
The objective of the ESA Climate Change Initiative (CCI) programme is to realise the full potential of the long-term global EO archive that ESA, together with its Member States, has established over the last 35 years, as a significant and timely contribution to the climate data record required by the UNFCCC.
Since 2010 the programme has contributed to a rapidly expanding body of scientific knowledge on 22 Essential Climate Variables (ECVs) through the production of Climate Data Records. Although varying across geophysical parameters, ESA Climate Data Records follow community-driven data standards, facilitating their blending and application.
AI has played a pivotal role in the production of these Climate Data Records. Eleven CCI projects - Aerosol CCI, Cloud CCI, Fire CCI, Greenhouse Gases CCI, Ocean Colour CCI, Sea Level CCI, Soil Moisture CCI, High Resolution Landcover CCI, Biomass CCI, Permafrost CCI, and Sea Surface Salinity CCI - have applied AI in their data record production and research, or identified specific AI usage for their research roadmap.
The use of AI in these CCI projects is varied: for example, to detect burned areas in Fire CCI, to retrieve dust Aerosol Optical Depth from thermal infrared spectra in Aerosol CCI, and to classify pixels via a custom-built AI tool in Ocean Colour CCI. Moreover, the ESA climate community has identified climate science gaps in the context of ECVs with the potential for meaningful advancement through AI.
1ECMWF, 2LERMA / Observatoire de Paris / CNRS, 3CESBIO
Since July 2019, a SMOS neural network soil moisture product has been assimilated into the ECMWF operational soil moisture simplified extended Kalman filter (SEKF) as part of the land data assimilation system. Two versions of SMOS neural network soil moisture products are produced routinely at ECMWF. The first has been trained on the SMOS level 2 soil moisture product and is delivered to ESA. The second is trained on the ECMWF operational model soil moisture values, and this is the one assimilated into the SEKF. The two neural network products will be compared; differences arise from the different training datasets used, despite the same SMOS observation inputs. Assimilation results for the product trained on the ECMWF model soil moisture will also be presented.
In the context of the EUMETSAT HSAF project, recent work to develop neural network methods linking ASCAT backscatter measurements to soil moisture from ERA5, both in a retrieval and as part of a forward model, will also be introduced.
1RIKEN
At RIKEN, we have been exploring a fusion of big data and big computation, now including AI techniques and machine learning (ML). Japan's new flagship supercomputer "Fugaku", ranked #1 in the most recent TOP500 list (https://www.top500.org/) in June 2020, is designed to be efficient for both double-precision big simulations and reduced-precision machine learning applications, aiming to play a pivotal role in creating the super-smart "Society 5.0". Our group at RIKEN has been pushing the limits of numerical weather prediction (NWP) through computations two orders of magnitude bigger, by taking advantage of Japan's previous flagship supercomputer, the "K computer". The efforts include ensemble Kalman filter experiments with 10240 ensemble members and 100-m-mesh, 30-second-update "Big Data Assimilation" fully exploiting the novel phased-array weather radar. Now with "Fugaku" in mind, we have been exploring ideas for fusing Big Data Assimilation and AI. The ideas include the fusion of data-driven precipitation nowcasting with process-driven NWP, NWP model acceleration using neural networks (NNs), applying ML to satellite and radar operators in data assimilation (DA), and identifying and correcting an NWP model's systematic errors with NNs. The data produced by NWP models are becoming bigger, and moving the data to other computers for ML may not be feasible. Having a next-generation computer like "Fugaku", good for both big NWP computation and ML, may bring a breakthrough toward a new methodology fusing data-driven (inductive) and process-driven (deductive) approaches in meteorology. This presentation will provide general perspectives on future developments and challenges in NWP, with some specific research examples of DA-AI fusion at RIKEN.
1Imperial College London
Can an Artificial Neural Network (NN) learn (and/or replace) a Data Assimilation (DA) process? What would be the effect of this approach?
DA is the Bayesian approximation of the true state of a physical system at a given time, obtained by combining time-distributed observations with a dynamic model in an optimal way. NN models can be used to learn the assimilation process in different ways. In particular, Recurrent Neural Networks can be efficiently applied for this purpose.
NNs can approximate any non-linear dynamical system. How can DA be used to improve the performance of a NN?
DA can be used, for example, to improve the accuracy of a NN by including information provided by external data. In general, DA can be used to ingest meaningful data in the training process of a NN.
We show the effectiveness of these methods through case studies and sensitivity studies.
1Ecole des Ponts ParisTech
The recent introduction of machine learning techniques in the field of numerical geophysical prediction has expanded the scope so far assigned to data assimilation, in particular through efficient automatic differentiation, optimisation and nonlinear functional representations. Data assimilation, together with machine learning techniques, can help estimate not only the state vector but also the physical system's dynamics or some of the model parametrisations. This addresses a major issue of numerical weather prediction: model error. I will discuss from a theoretical perspective how to combine data assimilation and deep learning techniques to assimilate noisy and sparse observations, with the goal of estimating both the state and the dynamics and, when possible, properly estimating residual model error. I will review several ways to accomplish this, using for instance offline variational algorithms and online sequential filters. The skill of these solutions will be illustrated on low-order chaotic dynamical systems. Finally, I will discuss how such techniques can enhance the already successful techniques of data assimilation. Examples will be taken from collaborations with J. Brajard, A. Carrassi, L. Bertino, A. Farchi, M. Bonavita, P. Laloyaux, and Q. Malartic.
1Ecole des Ponts ParisTech
A novel method based on the combination of data assimilation and machine learning is introduced. The combined approach is designed to emulate hidden, possibly chaotic dynamics and/or to devise data-driven parametrisations of unresolved processes in dynamical numerical models.
The method consists of iteratively applying a data assimilation step, here an ensemble Kalman filter or smoother, and a neural network. Data assimilation is used to handle sparse and noisy data effectively. The output analysis is spatially complete and is used as a training set by the neural network. The two steps are then repeated iteratively, as sketched below.
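A minimal sketch of the loop is given below; the helper `enkf_analysis` (any EnKF/EnKS implementation driven by the current surrogate) and the observation array `obs` are hypothetical stand-ins.

```python
# Hedged sketch: alternate (i) DA producing spatially complete analyses from
# sparse, noisy observations and (ii) retraining a surrogate of the dynamics.
from sklearn.neural_network import MLPRegressor

surrogate = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=500)

for iteration in range(5):
    # 1. DA step: analyses of shape (T, n_state); the first pass could use a
    #    simple prior (e.g. persistence) before the surrogate is trained.
    analyses = enkf_analysis(surrogate, obs)          # hypothetical helper
    # 2. ML step: learn the one-step dynamics x(t) -> x(t+1) from the analyses.
    surrogate.fit(analyses[:-1], analyses[1:])
```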
We will show the use of this combined DA-ML approach in two sets of experiments. In the first, the goal is to infer a full surrogate model. Here we carry out experiments using the chaotic 40-variable Lorenz 96 model and show that the surrogate model achieves high forecast skill up to two Lyapunov times, has the same spectrum of positive Lyapunov exponents as the original dynamics, and reproduces the power spectrum of the more energetic frequencies. In this context we will also illustrate the sensitivity of the method to critical setup parameters: the forecast skill decreases smoothly with increased observational noise but drops abruptly if less than half of the model domain is observed.
In the second set of experiments, the goal is to infer an unresolved-scale parametrisation. Data assimilation is applied to estimate the full state of the system from a truncated model. The unresolved part of the truncated model is viewed as model error in the DA system. In a second step, ML is used to emulate the unresolved part, i.e. to predict the model error given the state of the system. Finally, the ML-based parametrisation is added to the physical core of the truncated model to produce a hybrid model.
Experiments are carried out using the two-scale Lorenz model and the reduced-order coupled atmosphere-ocean model MAOOAM. The DA component of the proposed approach relies on an ensemble Kalman filter, while the ML parametrisation is represented by a neural network. We will show that in both cases the hybrid model yields better forecast skill than the truncated model, and its attractor resembles the original system's attractor much more closely than the truncated model's does.
1ECMWF
Model error is one of the main obstacles to improved accuracy and reliability in state-of-the-art analysis and forecasting applications, both in Numerical Weather Prediction and in climate prediction conducted with comprehensive high resolution general circulation models. In a data assimilation framework, recent advances in the context of weak constraint 4D-Var have shown that it is possible to estimate and correct for a large fraction of model error in parts of the atmosphere. This has been demonstrated in the stratosphere where the current global observing system is sufficiently dense and homogeneous.
The recent explosion of interest in Machine Learning / Deep Learning technologies has been driven by their remarkable success in disparate application areas. This raises the question of whether model error estimation and correction in operational NWP and climate prediction can also benefit from these techniques. Based on recent results (Bonavita and Laloyaux, 2020), we aim to start providing answers to these questions. Specifically, we show that Artificial Neural Networks (ANNs) can reproduce the main results obtained with weak constraint 4D-Var in the operational configuration of the IFS model of ECMWF. More interestingly, we show that the use of ANN models inside the weak-constraint 4D-Var framework has the potential to extend the applicability of the weak-constraint methodology for model error correction to the whole atmospheric column. Finally, we discuss the potential and limitations of Machine Learning / Deep Learning technologies in the core NWP tasks. In particular, we reconsider the fundamental constraints of a purely data-driven approach to forecasting and provide a view on how to best integrate Machine Learning technologies within current data assimilation and forecasting methods.
1IMT Atlantique, 2IMT Atlantique, Lab-STICC, 3IMT Atlantique, LATIM, 4INTP-ENM, UMR CNRS CNRM, Météo-France, CERFACS, 5Ifremer, LOPS
This paper addresses variational data assimilation from a learning point of view. Data assimilation aims to reconstruct the time evolution of some state given a series of observations, possibly noisy and irregularly sampled. Using automatic differentiation tools embedded in deep learning frameworks, we introduce end-to-end neural network (NN) architectures for variational data assimilation. They comprise two key components, a variational model and a gradient-based solver, both implemented as neural networks. The latter exploits ideas similar to meta-learning and optimizer learning. A key feature of the proposed end-to-end framework is that we may train the NN models using both supervised and unsupervised strategies. In particular, we may evaluate whether minimizing the classic variational formulations derived from ODE-based or PDE-based representations of geophysical dynamics leads to the best reconstruction performance.
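To make the setting concrete, a minimal sketch of a differentiable weak-constraint variational cost with a plain fixed-step solver is given below (PyTorch); in the paper's framework the solver itself is a trained NN, and `dynamics`, the observation mask, and the weight `alpha` are illustrative stand-ins.

```python
# Hedged sketch: gradient descent on a weak-constraint 4D-Var-like cost.
import torch

def var_cost(x, obs, mask, dynamics, alpha=1.0):
    """x: (T, n) state sequence; obs/mask: observed values and 0/1 mask."""
    obs_term = ((mask * (x - obs)) ** 2).sum()          # data misfit
    dyn_term = ((x[1:] - dynamics(x[:-1])) ** 2).sum()  # weak dynamics misfit
    return obs_term + alpha * dyn_term

def solve(x0, obs, mask, dynamics, n_iter=200, lr=0.05):
    x = x0.clone().requires_grad_(True)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(n_iter):
        opt.zero_grad()
        var_cost(x, obs, mask, dynamics).backward()
        opt.step()
    return x.detach()
```

Replacing this fixed-step descent with a learned iterative solver is what permits the convergence in only a few tens of iterations reported below.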
We report numerical experiments on the Lorenz-63 and Lorenz-96 systems for a weak-constraint 4D-Var setting with noisy and irregularly-sampled/partial observations. The key features of the proposed neural network framework are two-fold: (i) the learning of fast iterative solvers, which can reach the same minimization performance as a fixed-step gradient descent with only a few tens of iterations, and (ii) a significant gain in reconstruction performance (a relative gain greater than 50%) when considering a supervised solver, i.e. a solver trained to optimize the reconstruction error rather than to minimize the considered variational cost. In this supervised setting, we also show that jointly learning the variational prior and the solver significantly outperforms NN representations. Intriguingly, the trained representations leading to the best reconstruction performance may lead to significantly worse short-term forecast performance. We believe these results may open new research avenues for the specification of assimilation models and solvers in geoscience, including the design of observation settings to implement learning strategies.
1Universidad Nacional del Nordeste
Data assimilation combines different information sources, using a quantification of the uncertainty of each source to weight them. Therefore, proper consideration of the uncertainty of observations and model states is crucial for the performance of the data assimilation system. Expert knowledge and expensive offline fine-tuning experiments have been used in the past to determine the set of hyperparameters that define aspects of the prior, model error and observational uncertainties, and in particular their covariances. In recent years, there has been a paradigm shift, with several Bayesian and maximum likelihood methods that attempt to infer these hyperparameters within the data assimilation system. In this talk, I will give the foundational basis of these methods, which rely on the assumption that the hyperparameters vary slowly compared to the latent state variables. An overview of maximum likelihood methods, including the online and batch expectation-maximization algorithms, gradient-based optimization, and a Bayesian hierarchical method based on nested ensemble Kalman filters, will be discussed. Finally, some experiments to estimate stochastic parameterizations, comparing the methods in a proof-of-concept dynamical system, will be shown.
1Colorado State University
Deep Learning has been shown to be effective for many data-assimilation problems, and many deep learning methods have been used for this purpose. However, these applications typically focus on obtaining a best estimate of the atmospheric state, while providing a proper uncertainty estimate is at least as important. This is all the more problematic as deep learning is prone to overfitting, since the number of parameters to be estimated is always larger than the output dimension. Ad hoc techniques like weight decay and dropout have been proposed to avoid overfitting, and indeed they do regularise the problem, but these methods cannot consistently be interpreted as prior information (even though this has been claimed in the literature).
In this presentation I will discuss the problem, show why standard techniques for uncertainty quantification are not appropriate, and formulate a principled way to treat uncertainty quantification in Deep Learning. Existing ideas for Bayesian Deep Learning have been shown to scale badly with dimension, so special interest will be given to scalability, exploring existing techniques from data assimilation. Since it is unlikely that the full data-assimilation system will be abandoned in favour of deep learning, the incorporation of deep learning with uncertainty quantification into an existing data-assimilation structure will also be discussed.
1National Center for Atmospheric Research
Weather forecasting has progressed from being a very human-intensive effort to now being highly enabled by computation. The first big advance was in terms of numerical weather prediction (NWP), i.e., integrating the equations of motion forward in time with good initial conditions. But the more recent improvements have come from applying machine-learning (ML) techniques to improve forecasting and to enable large quantities of machine-based forecasts.
One of the early successes of the use of AI in weather forecasting was the Dynamical Integrated foreCast (DICast®) System. DICast builds on several concepts that mimic the human forecasting decision process. It leverages NWP model output as well as historical observations at the forecast site. It begins by correcting the output of each NWP model according to past performance. DICast then optimizes the blending of the various model outputs, again building on the past performance record; a sketch of both ideas is given after this paragraph. DICast has been applied to predict the major variables of interest (such as temperature, dew point, wind speed, irradiance, and probability of precipitation) at sites throughout the world. It typically outperforms the best individual model by 10-15%. One advantage of DICast is that it can be trained on a relatively limited dataset (as little as 30 to 90 days) and updates dynamically to include the most recent forecast information. The gridded version of this system, the Graphical Atmospheric Forecast System (GRAFS), can interpolate forecasts to data-sparse regions.
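The sketch below illustrates the two ideas in their simplest form (per-model bias correction followed by skill-weighted blending); the arrays and the inverse-RMSE weighting are illustrative assumptions, not DICast's actual algorithm.

```python
# Hedged sketch: bias-correct each model against site observations, then blend
# the corrected forecasts with weights derived from past skill.
import numpy as np

def blend(fcsts, obs, new_fcsts):
    """fcsts: (n_days, n_models); obs: (n_days,); new_fcsts: (n_models,)."""
    bias = (fcsts - obs[:, None]).mean(axis=0)             # per-model mean error
    rmse = np.sqrt(((fcsts - bias - obs[:, None]) ** 2).mean(axis=0))
    w = 1.0 / rmse
    w /= w.sum()                                           # skill-based weights
    return float(np.dot(w, new_fcsts - bias))              # blended site forecast
```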
DICast and other machine-learning methods have been applied by the National Center for Atmospheric Research (NCAR) to various needs for targeted weather forecasts. Such applications include hydrometeorological forecasting for agricultural decision support; forecasting road weather to enhance the safety of surface transportation; forecasting the movement of wildland fires; and predicting wind and solar power for utilities and grid operators to facilitate grid integration. NCAR has found AI/ML to be effective for postprocessing in these and many other applications, and it has become part of any state-of-the-science forecasting system.
An example of using multiple AI methods for targeted forecasts is predicting solar power production. AI methods are used both in nowcasting and in forecasting for day-ahead grid integration. DICast is one of the methods that blend input from multiple forecast engines. For the very short ranges, NCAR developed a regime-based solar irradiance forecasting system. This system uses k-means clustering to identify cloud regimes, then applies a neural network to each regime separately. It was shown to out-predict methods that did not utilize regime separation. NCAR is currently designing a comprehensive wind and solar forecasting system for desert regions that combines NWP with machine-learning approaches. For the first few hours, ML approaches leverage historical and real-time data. DICast improves on the NWP forecasts. The meteorological variables are converted to power using a model regression tree. The Analog Ensemble ML approach further improves the forecast and provides probabilistic information. This systems approach, which leverages the best of NWP together with ML, succeeds in providing a seamless forecast across multiple time scales for use-inspired applications.
1University of Colorado
Despite the scientific consensus on climate change, substantial uncertainties remain. Crucial questions about regional climate trends, changes in extreme events such as heat waves and mega-storms, and understanding how climate varied in the distant past must be answered in order to improve predictions, assess impacts and vulnerability, and inform mitigation and sustainable adaptation strategies. Machine learning can help answer such questions and shed light on climate change. I will give an overview of our climate informatics research, focusing on semi- and unsupervised deep learning approaches to studying rare and extreme events, and on downscaling temperature and precipitation.
1ETH Zurich, 2ECMWF
Quantifying uncertainty in weather forecasts is critical, especially for predicting extreme weather events. This is typically accomplished with ensemble prediction systems, which consist of many perturbed numerical weather simulations, or trajectories, run in parallel. These systems are associated with a high computational cost and often involve statistical post-processing steps to inexpensively improve their raw prediction qualities. We propose a mixed model that uses only a subset of the original weather trajectories combined with a post-processing step using deep neural networks. These networks enable the model to account for non-linear relationships that are not captured by current numerical models or post-processing methods.
Applied to global data, our mixed models achieve a relative improvement in ensemble forecast skill (CRPS) of over 13%. Further, we demonstrate that this improvement is even more significant for extreme weather events on selected case studies. We also show that our post-processing can use fewer trajectories to achieve comparable results to the full ensemble. This can enable reduced computational costs for ensemble prediction systems or allow higher resolution trajectories to be run within operational deadlines, resulting in more accurate raw ensemble forecasts.
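For reference, the post-processing network can be trained directly on the CRPS; for a Gaussian predictive distribution the score has the closed form sketched below (a standard result, not specific to this paper's architecture).

```python
# Hedged sketch: closed-form CRPS of a Gaussian forecast N(mu, sigma) for
# observation y, usable as a training loss for distribution parameters.
import numpy as np
from scipy.stats import norm

def crps_gaussian(mu, sigma, y):
    z = (y - mu) / sigma
    return sigma * (z * (2 * norm.cdf(z) - 1) + 2 * norm.pdf(z) - 1 / np.sqrt(np.pi))
```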
1Météo-France
Since the number of available NWP forecasts is rapidly increasing, especially with the development of ensemble prediction systems, there is a need to develop innovative forecast products that provide a synthetic view of the weather situation and potential risks. A promising option is to identify different weather patterns in NWP outputs, which can then be used to delimit areas of interest, to provide a diagnostic of the occurrence of a given event, or to issue pattern-based probability maps. The detection of weather objects has been performed for several years, mainly with algorithmic approaches based on a set of simple rules. In recent years, machine learning and deep learning algorithms have provided powerful tools to segment objects, which can overcome some limitations of standard algorithms and allow for the detection of more complex and finer-scale objects. In this presentation we show that the well-known U-Net convolutional neural network, a typical encoder-decoder architecture with skip connections, can be successfully applied to the high-resolution Arome model outputs for the detection of several weather features at different spatial scales, including continuous and intermittent rainfall areas, weather fronts, and two extreme weather events, tropical cyclones and bow echoes. The performance of these detections strongly relies on the availability and accuracy of large training and testing datasets. Since no off-the-shelf data are available, a time-consuming human labelling exercise has been performed for each pattern considered. In order to extend the application of our U-Net to a wider range of outputs without increasing the labelling and training effort, transfer learning has been successfully used. It is shown in particular that a network trained on a specific geographic region can be applied to other domains, and that networks trained on NWP outputs can properly detect similar objects in the corresponding observations.
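A minimal one-level U-Net for pixel-wise segmentation is sketched below; the input fields, sizes, and binary mask target are illustrative assumptions.

```python
# Hedged sketch: tiny U-Net (one encoder/decoder level, one skip connection).
import tensorflow as tf
from tensorflow.keras import layers, Model

inp = layers.Input(shape=(256, 256, 4))              # e.g. four Arome fields
c1 = layers.Conv2D(32, 3, padding="same", activation="relu")(inp)
p1 = layers.MaxPooling2D(2)(c1)                      # encoder (downsampling)
c2 = layers.Conv2D(64, 3, padding="same", activation="relu")(p1)
u1 = layers.UpSampling2D(2)(c2)                      # decoder (upsampling)
m1 = layers.Concatenate()([u1, c1])                  # the skip connection
c3 = layers.Conv2D(32, 3, padding="same", activation="relu")(m1)
out = layers.Conv2D(1, 1, activation="sigmoid")(c3)  # per-pixel object mask

unet = Model(inp, out)
unet.compile(optimizer="adam", loss="binary_crossentropy")
```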
1ECMWF
The capability of machine learning to learn complex, non-linear behaviour from data offers many application areas across the numerical weather prediction workflow, including observation processing, data assimilation, the forecast model and the emulation of physical parametrisation schemes, as well as post-processing. This talk provides an overview on the activities at ECMWF to explore the potential of machine learning, and in particular deep learning, to improve weather predictions in the coming years.
1MeteoSwiss
Forecasting surface winds and their corresponding uncertainty in complex terrain remains an important challenge for numerical weather prediction (NWP). Even for kilometer-scale NWP, many local topographical features remain unaccounted for, often resulting in forecasts that are biased with respect to local weather conditions.
Through statistical postprocessing of NWP output, such systematic biases can be adjusted a posteriori using wind measurements. However, for unobserved locations these approaches fail to give satisfying results. Indeed, the complex and nonlinear relationship between model error and topography calls for more advanced techniques such as neural networks (NNs).
In addition, the prevalence of aleatoric uncertainty in wind forecasts demands a probabilistic approach, in which the statistical model is trained to predict not only an error expectation (the bias) but also its scale (standard deviation). In this context, the model must be trained and evaluated using a proper scoring rule; a sketch follows.
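A minimal sketch of such a network, trained with the Gaussian log score (one common proper scoring rule; the CRPS is another), is given below; the predictor count and layer sizes are illustrative.

```python
# Hedged sketch: NN predicting both mean and scale, trained with the
# negative Gaussian log-likelihood (a proper score).
import tensorflow as tf
from tensorflow.keras import layers, models

def gaussian_nll(y_true, y_pred):
    y_true = tf.reshape(y_true, (-1, 1))
    mu, log_sigma = y_pred[:, :1], y_pred[:, 1:]
    return tf.reduce_mean(log_sigma + 0.5 * tf.square((y_true - mu) / tf.exp(log_sigma)))

model = models.Sequential([
    layers.Input(shape=(16,)),            # e.g. NWP wind + topographic descriptors
    layers.Dense(64, activation="relu"),
    layers.Dense(64, activation="relu"),
    layers.Dense(2),                      # [mu, log_sigma]
])
model.compile(optimizer="adam", loss=gaussian_nll)
```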
To this end, we developed a machine learning application that efficiently handles very large datasets, trains probabilistic NN architectures, and tests multiple combinations of predictors. This enabled us to improve the quality of the direct model output not only at the locations of reference measurements, but also at any given point in space, and for forecasts up to 5 days ahead. More importantly, the results underline that combining physical models with a data-driven approach opens new opportunities to improve weather forecasts.
1Cooperative Institute for Research in the Atmosphere (CIRA), 2University of Oklahoma, 3National Center for Atmospheric Research
We have developed a convolutional neural network (CNN) to predict tornadoes at lead times up to one hour in a storm-centered framework. We trained the CNN with data similar to those used in operations – namely, a radar image of the storm and a sounding of the near-storm environment. However, CNNs and other ML methods are often distrusted by users, who view them as opaque “black boxes” whose decisions cannot be explained. To address this problem, the field of interpretable ML has emerged, providing methods for understanding what an ML model has learned. However, interpretation methods can be misleading, often producing artifacts ("noise") rather than illuminating the true physical relationships in the data. To address both of these problems (opaque models and noisy interpretation results), we have applied several interpretation methods to the CNN for tornado prediction, each augmented with either a statistical-significance test or physical constraints.
Specifically, we use four interpretation methods: the permutation importance test, saliency maps, class-activation maps, and backward optimization. For the permutation test, we use four different versions of the test and apply significance-testing to each, which allows us to identify where the ranking of predictor importance is robust. For saliency and class-activation maps, we use the "sanity checks" proposed by Adebayo et al. (2018; http://papers.nips.cc/paper/8160-sanity-checks-for-saliency-maps), augmented with formal significance tests. These tests ensure that interpretation results cannot be reproduced by a trivial method like an untrained edge-detection filter. For backward optimization, which produces synthetic storms that minimize or maximize tornado probability (prototypical non-tornadic and tornadic storms, respectively), we use physical constraints that force the synthetic storms to be more realistic.
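As a concrete illustration of the first method, a single-pass permutation test is sketched below on generic tabular predictors; the scoring function and data are hypothetical, and the significance testing described above would repeat the shuffle many times to build a null distribution.

```python
# Hedged sketch: permutation importance, i.e. the drop in skill when one
# predictor's values are shuffled, destroying its information.
import numpy as np

def permutation_importance(model, X, y, score, seed=0):
    """X: (n_samples, n_features); score(model, X, y) returns a skill value."""
    rng = np.random.default_rng(seed)
    base = score(model, X, y)
    drops = []
    for j in range(X.shape[1]):
        Xp = X.copy()
        rng.shuffle(Xp[:, j])                 # permute predictor j only
        drops.append(base - score(model, Xp, y))
    return np.array(drops)                    # larger drop = more important
```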
To our knowledge, this work is one of the few applications of ML interpretation, and the only one with significance-tested or physically constrained ML interpretation, in the geosciences. As ML becomes more integrated into geoscience research and everyday applications, such work will be crucial in building ML systems that are trusted and understood by humans.
1TU Munich, 2Researcher
Ensemble weather predictions require statistical postprocessing of systematic errors to obtain reliable and accurate probabilistic forecasts. Traditionally, this is accomplished with distributional regression models in which the parameters of a predictive distribution are estimated from a training period. We propose a flexible alternative based on neural networks that can incorporate nonlinear relationships between arbitrary predictor variables and forecast distribution parameters that are automatically learned in a data-driven way rather than requiring prespecified link functions. In a case study of ECMWF 2-m temperature forecasts at surface stations in Germany, the neural network approach significantly outperforms benchmark postprocessing methods while being computationally more affordable. Key components to this improvement are the use of auxiliary predictor variables and station-specific information with the help of embeddings. Furthermore, the trained neural network can be used to gain insight into the importance of meteorological variables, thereby challenging the notion of neural networks as uninterpretable black boxes. Our approach can easily be extended to other statistical postprocessing and forecasting problems. We anticipate that recent advances in deep learning combined with the ever-increasing amounts of model and observation data will transform the postprocessing of numerical weather forecasts in the coming decade.
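The station-embedding component mentioned above can be sketched as follows; the station count, embedding dimension, and predictor set are illustrative assumptions.

```python
# Hedged sketch: map an integer station ID to a learned vector and concatenate
# it with auxiliary predictors before predicting distribution parameters.
import tensorflow as tf
from tensorflow.keras import layers, Model

n_stations, n_aux = 500, 10
sid = layers.Input(shape=(1,), dtype="int32")         # station identifier
aux = layers.Input(shape=(n_aux,))                    # auxiliary predictors
emb = layers.Flatten()(layers.Embedding(n_stations, 4)(sid))
h = layers.Dense(64, activation="relu")(layers.Concatenate()([emb, aux]))
out = layers.Dense(2)(h)                              # e.g. mean and std of T2m
model = Model([sid, aux], out)                        # train with CRPS or NLL
```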
1University of Oxford
Weather forecasting systems have not fundamentally changed since they were first operationalised nearly 50 years ago. They use traditional finite-element methods to solve the fluid dynamical flow of the atmosphere and include as much sub-grid physics as they can computationally afford. Given the huge amounts of data currently available from both models and observations, new opportunities exist to train data-driven models to produce these forecasts. Traditional weather forecasting models are steadily improving over time, as computational power and other improvements allow for increased spatial resolution, effectively incorporating more physics into our forecasts. However, these improvements are best seen in the prognostic variables of weather forecasting, e.g. velocity, temperature, and pressure. For other quantities of arguably greater importance, for example precipitation, improvements come at a slower pace.
The current boom in machine learning (ML) has inspired several groups to approach the problem of weather forecasting. Here we will provide an overview of the latest attempts, from local nowcasting of precipitation up to global forecasts of atmospheric dynamics. We will then present our latest efforts towards a multi-model system that leverages existing numerical models to incorporate physical understanding within a data-driven machine learning approach for skilful forecasting of global precipitation over several days.
1University of Oxford, 2ECMWF, 3University of Oxford
The rise of machine learning offers many exciting avenues for improving weather forecasting. Possibly the lowest-hanging fruit is the acceleration of parameterisation schemes through machine learning emulation. Parameterisation schemes are highly uncertain closure schemes necessitated by the finite grid-spacing of weather forecasting models. Here we assess the challenges and benefits of emulating two parameterisation schemes related to gravity wave drag in the IFS model of ECMWF. Despite the similar structure of these schemes, we find that one poses a far greater challenge to building a successful emulator. After successful offline testing, we present results from coupling our emulators to the IFS model. In coupled mode the IFS still produces accurate forecasts and climatologies. Building on this, we use our emulator in the data assimilation task, leveraging the fact that tangent-linear and adjoint models of neural networks can be easily derived.
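The last point can be made concrete with automatic differentiation: for a network, the tangent-linear model is a forward-mode Jacobian-vector product and the adjoint a reverse-mode vector-Jacobian product. The sketch below shows this with JAX for an illustrative two-layer stand-in; the real emulators and their sizes differ.

```python
# Hedged sketch: tangent-linear (JVP) and adjoint (VJP) of a toy NN emulator.
import jax
import jax.numpy as jnp

def emulator(params, x):              # stand-in for a trained drag emulator
    w1, w2 = params
    return jnp.tanh(x @ w1) @ w2

key = jax.random.PRNGKey(0)
w1 = jax.random.normal(key, (10, 32))
w2 = jax.random.normal(key, (32, 10))
x, dx = jnp.ones(10), jnp.ones(10)

f = lambda v: emulator((w1, w2), v)
_, tl = jax.jvp(f, (x,), (dx,))       # tangent-linear applied to perturbation dx
_, vjp_fn = jax.vjp(f, x)
adj = vjp_fn(jnp.ones(10))[0]         # adjoint applied to a sensitivity vector
```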
1University of Minnesota, 2University of Pittsburgh, 3USGS
Physics-based models of dynamical systems are often used to study engineering and environmental systems. Despite their extensive use, these models have several well-known limitations due to incomplete or inaccurate representations of the physical processes being modeled. Given rapid data growth due to advances in sensor technologies, there is a tremendous opportunity to systematically advance modeling in these domains by using machine learning (ML) methods. However, capturing this opportunity is contingent on a paradigm shift in data-intensive scientific discovery since the “black box” use of ML often leads to serious false discoveries in scientific applications. Because the hypothesis space of scientific applications is often complex and exponentially large, an uninformed data-driven search can easily select a highly complex model that is neither generalizable nor physically interpretable, resulting in the discovery of spurious relationships, predictors, and patterns. This problem becomes worse when there is a scarcity of labeled samples, which is quite common in science and engineering domains.
This talk makes the case that in real-world systems governed by physical processes, there is an opportunity to take advantage of fundamental physical principles to inform the search for a physically meaningful and accurate ML model. Although this will be illustrated for a few problems in the domain of aquatic sciences and hydrology, the paradigm has the potential to greatly advance the pace of discovery in a number of scientific and engineering disciplines where physics-based models are used, e.g., power engineering, climate science, weather forecasting, materials science, and biomedicine.
1University of Michigan
Atmospheric General Circulation Models (GCMs) contain computationally demanding physical parameterization schemes, which approximate unresolved subgrid-scale physical processes. This work explores whether a selection of machine learning (ML) techniques can serve as computationally efficient emulators of physical parameterizations in GCMs, and what the pros and cons of the different approaches are. We test the ML emulators in a simplified model hierarchy with NCAR's Community Atmosphere Model version 6 (CAM6), which is part of NCAR's Community Earth System Model. Dry and idealized-moist CAM6 model configurations are used, which employ simplified physical forcing mechanisms for radiation, boundary layer mixing, surface fluxes, and precipitation (in the moist setup). Several machine learning techniques are developed, trained, and tested offline using CAM6 output data. These include linear regression, random and boosted forests, and multiple deep learning architectures. We show that these methods can reproduce the physical forcing mechanisms. We also show that the growing complexity of the physical forcing in our model hierarchy puts increased demands on the ML algorithms and their training and tuning. We compare the different machine learning techniques and discuss their strengths and weaknesses.
1Environment and Climate Change Canada
In an Ensemble Kalman Filter (EnKF), many short-range forecasts are used to propagate error statistics. In the Canadian global EnKF system, different ensemble members use different configurations of the forecast model. The integrations with different versions of the model physics can be used to optimize the probability distributions for the model parameters.
Continuous parameters accept a continuous range of values, while categorical parameters can serve as switches between different parametrizations. In the genetic algorithm, the best members are duplicated, with a small perturbation added, while the worst-performing configurations are removed. The algorithm is being used in the migration of the global ensemble prediction system to an upgraded version of the model.
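A rough sketch of one generation of such an algorithm, for continuous parameters only, is given below; the replacement fraction and perturbation amplitude are illustrative guesses, and categorical switches would be resampled between parametrization options rather than Gaussian-perturbed.

```python
import numpy as np

rng = np.random.default_rng(0)

def evolve(params, scores, frac=0.25, sigma=0.05):
    """One generation: drop the worst-scoring member configurations and
    refill with perturbed copies of the best ones (lower score = better).

    params: (n_members, n_params) continuous parameter values per member.
    scores: (n_members,) verification scores from the latest forecast cycle.
    """
    n = len(scores)
    k = max(1, int(frac * n))
    order = np.argsort(scores)            # best configurations first
    survivors = params[order[:-k]]        # remove the k worst configurations
    parents = params[order[:k]]           # duplicate the k best ...
    children = parents + sigma * rng.standard_normal(parents.shape)  # ... perturbed
    return np.vstack([survivors, children])
```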
Quality is measured with both a deterministic and an ensemble score, using the observations assimilated in the EnKF system. With the ensemble score, the algorithm can converge to non-Gaussian distributions. Unfortunately, for several model parameters, there is not enough information to improve the distributions. The optimized system shows slight reductions in biases for humidity-sensitive radiance measurements. Modest improvements are also seen in medium-range ensemble forecasts.
1Rice University
Numerical weather prediction (NWP) models require ever-growing computing time and resources but still sometimes have difficulty predicting weather extremes. We introduce a data-driven framework that is based on analog forecasting (prediction using past similar patterns) and employs a novel deep-learning pattern-recognition technique (capsule neural networks, CapsNets) together with an impact-based auto-labeling strategy. Using data from a large-ensemble fully coupled Earth system model, CapsNets are trained on mid-tropospheric large-scale circulation patterns (Z500) labeled $0-4$ depending on the existence and geographical region of surface temperature extremes over North America several days ahead. The trained networks predict the occurrence/region of cold or heat waves, using only Z500, with accuracies (recalls) of $69\%-45\%$ ($77\%-48\%$) or $62\%-41\%$ ($73\%-47\%$) $1-5$ days ahead. Using both surface temperature and Z500, accuracies (recalls) with CapsNets increase to $\sim 80\%$ ($88\%$). In both cases, CapsNets outperform simpler techniques such as convolutional neural networks and logistic regression, and their accuracy is the least affected as the size of the training set is reduced. The results show the promise of multivariate data-driven frameworks for accurate and fast extreme weather prediction, which can potentially augment NWP efforts by providing early warnings.
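The impact-based auto-labeling step can be sketched roughly as below; the region masks, exceedance threshold, and area fraction are illustrative assumptions rather than the paper's exact criteria.

```python
import numpy as np

def auto_label(t2m_anom, clim_std, region_masks, k_sigma=1.5, frac=0.1):
    """Illustrative impact-based auto-labeling: return 0 if no predefined
    North American region shows a widespread surface-temperature extreme,
    else the index (1-4) of the affected region. The 1.5-sigma threshold
    and 10% area fraction are guesses, not the study's exact criteria.

    t2m_anom:     surface-temperature anomaly field on the model grid.
    clim_std:     climatological standard deviation, same grid.
    region_masks: four boolean masks, one per predefined region.
    """
    for label, mask in enumerate(region_masks, start=1):
        exceed = np.abs(t2m_anom[mask]) > k_sigma * clim_std[mask]
        if exceed.mean() > frac:
            return label
    return 0
```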
1University of Washington, Seattle, USA, 2Microsoft
We present a deep convolutional neural network (CNN) to forecast four variables on spherical shells characterizing the dry global atmosphere: 1000-hPa height, 500-hPa height, 2-m surface temperature and 700-300 hPa thickness. The variables are carried on a cubed sphere, which is a natural architecture on which to evaluate CNN stencils. In addition to the forecast fields, three external fields are specified: a land-sea mask, topographic height, and top-of-atmosphere insolation. The model is recursively stepped forward in 12-hour time steps while representing the atmospheric fields with 6-hour temporal and roughly 1.9 x 1.9 degree spatial resolution. It produces skillful forecasts at lead times up to about 7 days. The model remains stable out to arbitrarily long forecast lead times.
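The recursive stepping is simple enough to sketch; `step_12h` below is a stand-in for the trained CNN, and the handling of the time-varying insolation input is a simplified assumption.

```python
import numpy as np

def rollout(step_12h, state0, static_inputs, insolation_by_step):
    """Free-running forecast by recursion: `step_12h` stands in for the
    trained CNN, mapping the current fields (plus the fixed land-sea mask
    and topographic height, and the time-varying top-of-atmosphere
    insolation) 12 hours forward. Each output is fed back in as the next
    input, so the forecast can run to arbitrary lead times.
    """
    states = [state0]
    for insolation in insolation_by_step:      # two steps per forecast day
        states.append(step_12h(states[-1], static_inputs, insolation))
    return np.stack(states)
```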
As an example of its climatological behavior, panel (a) in the figure shows the 1000-hPa and 500-hPa height fields from a free-running forecast 195 days after a July initialization. The model correctly develops active wintertime weather systems in response to the seasonal changes in top-of-atmosphere insolation. As a qualitative comparison, panels (b) and (c) show the verification and the climatology for the same date (January 15).
While our model certainly does not provide a complete state-of-the-art weather forecast, its skill is less than two days of lead time behind that of the T63 137L IFS, which has approximately equivalent horizontal resolution. It is difficult to make a rigorous timing comparison between our model, which runs on a GPU, and the T63 IFS, which was run on a multi-core CPU, but reasonable wall-clock estimates suggest our model is three orders of magnitude faster. It remains to be seen how more advanced deep-learning weather prediction models will compare to current NWP models with respect to both speed and accuracy, but these results suggest they could be an attractive alternative for large-ensemble weather and sub-seasonal forecasting.
1University of Washington, Seattle, USA, 2Microsoft
We develop an ensemble prediction system (EPS) based on a purely data-driven global atmospheric model that uses convolutional neural networks (CNNs) on the cubed sphere. Mirroring practices in operational EPSs, we incorporate both initial-condition and model-physics perturbations: the former are sub-optimally drawn from the perturbed ECMWF ReAnalysis 5 members, while the latter are produced by randomizing the training process of the CNNs. Our grand ensemble consists of 320 perturbed members: each of 32 trained CNNs is run with 10 perturbed initial conditions. At lead times up to two weeks, our EPS lags the state-of-the-art 50-member ECMWF EPS by about 2-3 days of forecast lead time and is modestly under-dispersive, with a spread-skill ratio of about 0.75.
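For reference, the spread-skill ratio quoted above can be computed as in the sketch below (a common convention; the study's exact definition may differ in detail).

```python
import numpy as np

def spread_skill_ratio(ens, truth):
    """ens: (n_members, ...) forecasts; truth: (...) verifying analysis.
    A ratio near 1 indicates a well-dispersed ensemble; a value of ~0.75
    means the spread underestimates the error of the ensemble mean."""
    spread = np.sqrt(np.mean(np.var(ens, axis=0, ddof=1)))
    rmse = np.sqrt(np.mean((ens.mean(axis=0) - truth) ** 2))
    return spread / rmse
```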
For weekly-averaged forecasts in the sub-seasonal-to-seasonal range (2-6 weeks ahead), a particularly challenging window for weather forecasting, our data-driven EPS consistently outperforms persistence forecasts of 850-hPa temperature and 2-meter temperature, with useful skill relative to climatology as measured by the ranked probability skill score and the continuous ranked probability score (CRPS). Over twice-weekly forecasts in the period 2017-18, the CRPS of our model matches that of the ECMWF EPS within 95% statistical confidence bounds for T850 at week 4 and weeks 5-6. While our model performs comparably for T2 and T850, the ECMWF EPS benefits from a coupled ocean model, which yields much better T2 forecasts, especially over tropical oceans. Our EPS comes closest to parity with the ECMWF EPS in the extratropics, especially during the spring and summer months, where the ECMWF ensemble is weakest. Notably, our model, while predicting only six two-dimensional atmospheric variables, runs extremely efficiently, computing all 6-week forecasts in under four minutes on a single GPU. Nevertheless, further work remains to incorporate ocean and sea-ice information into our data-driven model to improve its representation of large-scale climate dynamics.
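For completeness, a standard ensemble estimator of the CRPS referred to above is sketched below (the common kernel form, not necessarily the exact scoring code used in the study).

```python
import numpy as np

def crps_ensemble(members, obs):
    """Standard ensemble (kernel) estimator of the continuous ranked
    probability score for a scalar observation:
        CRPS = E|X - y| - 0.5 * E|X - X'|.
    Lower is better; with one member it reduces to the absolute error."""
    x = np.asarray(members, dtype=float)
    return np.mean(np.abs(x - obs)) - 0.5 * np.mean(np.abs(x[:, None] - x[None, :]))
```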