Costa Christopoulos and 6 more

This work integrates machine learning into an atmospheric parameterization to target uncertain mixing processes while maintaining interpretable, predictive, and well-established physical equations. We adopt an eddy-diffusivity mass-flux (EDMF) parameterization for the unified modeling of various convective and turbulent regimes. To avoid the drift and instability that plague machine learning parameterizations trained offline and subsequently coupled with climate models, we frame learning as an inverse problem: data-driven models are embedded within the EDMF parameterization and trained online using output from large-eddy simulations (LES) forced with GCM-simulated large-scale conditions in the Pacific. Rather than optimizing subgrid-scale tendencies, our framework directly targets climate variables of interest, such as the vertical profiles of entropy and the liquid water path. Specifically, we use ensemble Kalman inversion to simultaneously calibrate both the EDMF parameters and the parameters governing data-driven lateral mixing rates. The calibrated parameterization outperforms existing EDMF schemes, particularly in tropical and subtropical locations of the present climate, and maintains high fidelity in simulating shallow cumulus and stratocumulus regimes under the increased sea surface temperatures of AMIP4K experiments. The results showcase the advantage of physically constraining data-driven models and directly targeting relevant variables through online learning to build robust and stable machine learning parameterizations.
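As a rough illustration of the online-learning step described above, the sketch below (in Python/NumPy, not the authors' implementation) applies one ensemble Kalman inversion update to a parameter ensemble; the function name `eki_update`, the generic forward-model outputs `g`, and the LES-derived target vector `y` are placeholder assumptions for the example.

```python
import numpy as np

def eki_update(theta, g, y, gamma, rng=np.random.default_rng(0)):
    """One ensemble Kalman inversion step (illustrative sketch).

    theta : (J, p) parameter ensemble (EDMF and mixing-rate parameters)
    g     : (J, d) forward-model outputs G(theta_j), e.g. stacked entropy
            and liquid-water-path statistics from the parameterized column model
    y     : (d,)   LES-derived target statistics
    gamma : (d, d) observation-noise covariance
    """
    theta_mean = theta.mean(axis=0)
    g_mean = g.mean(axis=0)
    # Cross- and output covariances estimated from the ensemble
    c_tg = (theta - theta_mean).T @ (g - g_mean) / (theta.shape[0] - 1)
    c_gg = (g - g_mean).T @ (g - g_mean) / (theta.shape[0] - 1)
    # Perturbed observations: one noisy copy of y per ensemble member
    y_pert = y + rng.multivariate_normal(np.zeros(len(y)), gamma, size=theta.shape[0])
    kalman_gain = c_tg @ np.linalg.inv(c_gg + gamma)          # (p, d)
    return theta + (y_pert - g) @ kalman_gain.T               # updated (J, p) ensemble
```

Iterating such updates nudges the whole ensemble toward parameters whose parameterized output matches the target LES statistics, without requiring gradients of the parameterization.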

Sheide Chammas and 5 more

Clouds, especially low clouds, are crucial for regulating Earth’s energy balance and mediating the response of the climate system to changes in greenhouse gas concentrations. Despite their importance for climate, they remain relatively poorly understood and are inaccurately represented in climate models. A principal reason is that the high computational expense of simulating them with large-eddy simulations (LES) has inhibited broad and systematic numerical experimentation and the generation of large datasets for training parameterization schemes for climate models. Here we demonstrate LES of low clouds on Tensor Processing Units (TPUs), application-specific integrated circuits that were originally developed for machine learning applications. We show that TPUs, in conjunction with tailored software implementations, can be used to simulate computationally challenging stratocumulus clouds under conditions observed during the Dynamics and Chemistry of Marine Stratocumulus (DYCOMS) field study. The TPU-based LES code successfully reproduces the clouds observed during DYCOMS and opens up the large computational resources available on TPUs to cloud simulations. The code enables unprecedented weak and strong scaling of LES, making it possible, for example, to simulate stratocumulus with a $10\times$ speedup over real-time evolution in domains with a $34.7\,\mathrm{km} \times 53.8\,\mathrm{km}$ horizontal cross section. These results open new avenues for computational experiments and for substantially enlarging the sample of LES available to train parameterizations of low clouds.
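The paper’s LES code is a tailored implementation for TPUs; as a loose, hypothetical illustration of why XLA-compiled stencil kernels map well onto that hardware, the following Python/JAX snippet applies one explicit horizontal-diffusion step on a periodic grid. The grid size, diffusivity, and function name `diffuse` are illustrative assumptions, not values from the DYCOMS setup.

```python
import jax
import jax.numpy as jnp

@jax.jit
def diffuse(theta, kappa, dx, dt):
    """One explicit time step of horizontal diffusion on a periodic grid.

    A stand-in for the kind of stencil update an LES dynamical core applies
    every step; jax.jit compiles it through XLA, the same compiler stack
    that targets TPUs.
    """
    laplacian = (
        jnp.roll(theta, 1, axis=0) + jnp.roll(theta, -1, axis=0)
        + jnp.roll(theta, 1, axis=1) + jnp.roll(theta, -1, axis=1)
        - 4.0 * theta
    ) / dx**2
    return theta + dt * kappa * laplacian

# Illustrative use: diffuse a point anomaly on a 512 x 512 grid
theta = jnp.zeros((512, 512)).at[256, 256].set(1.0)
theta = diffuse(theta, kappa=1.0, dx=35.0, dt=0.1)
```

On a TPU pod the analogous kernel would be sharded across cores with halo exchanges at subdomain boundaries; this sketch shows only the single-device, compiled core of such an update.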

Melanie Bieli and 5 more

The small-scale microphysical processes governing the formation of precipitation particles cannot be resolved explicitly by cloud-resolving and climate models. Instead, they are represented by microphysics schemes that are based on a combination of theoretical knowledge, statistical assumptions, and fitting to data (“tuning”). Historically, tuning was done in an ad hoc fashion, leading to parameter choices that are neither explainable nor repeatable. Recent work has treated it as an inverse problem that can be solved by Bayesian inference. The posterior distribution of the parameters given the data, which is the solution of Bayesian inference, is found through computationally expensive sampling methods that require over O(10^5) evaluations of the forward model; this is prohibitive for many models. We present a proof of concept of Bayesian learning applied to a new bulk microphysics scheme named “Cloudy”, using the recently developed Calibrate-Emulate-Sample (CES) algorithm. Cloudy models collision-coalescence and collisional breakup of cloud droplets with an adjustable number of prognostic moments and with easily modifiable assumptions about the cloud droplet mass distribution and the collision kernel. The CES algorithm uses machine learning tools to accelerate Bayesian inference by reducing the number of forward evaluations needed to O(10^2). It also exhibits a smoothing effect when forward evaluations are polluted by noise. In a suite of perfect-model experiments, we show that CES enables computationally efficient Bayesian inference of the parameters in Cloudy from noisy observations of moments of the droplet mass distribution. In an additional imperfect-model experiment, a collision kernel parameter is successfully learned from output generated by a Lagrangian particle-based microphysics model.
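A minimal sketch of the CES idea, assuming a generic forward map and a Gaussian likelihood, is given below in Python, with a scikit-learn Gaussian process standing in for the emulator; the function `calibrate_emulate_sample` and its arguments are hypothetical and greatly simplified relative to the algorithm applied to Cloudy.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def calibrate_emulate_sample(forward, theta_train, y_obs, noise_var,
                             n_mcmc=5_000, step=0.1, rng=np.random.default_rng(1)):
    """Toy CES loop: (1) run the expensive forward model at ~O(10^2) calibration
    points, (2) fit a GP emulator of the parameter-to-data map, (3) run
    random-walk Metropolis sampling on the cheap emulator."""
    # 1. Calibrate: forward-model evaluations at the training points
    g_train = np.array([forward(t) for t in theta_train])
    # 2. Emulate: GP regression of the parameter-to-data map
    gp = GaussianProcessRegressor(RBF(1.0) + WhiteKernel(noise_var)).fit(theta_train, g_train)
    # 3. Sample: random-walk Metropolis on the emulator (flat prior assumed)
    def log_post(theta):
        mean = gp.predict(theta[None, :])[0]
        return -0.5 * np.sum((mean - y_obs) ** 2) / noise_var
    theta, lp, chain = theta_train.mean(axis=0), None, []
    lp = log_post(theta)
    for _ in range(n_mcmc):
        prop = theta + step * rng.standard_normal(theta.shape)
        lp_prop = log_post(prop)
        if np.log(rng.uniform()) < lp_prop - lp:
            theta, lp = prop, lp_prop
        chain.append(theta)
    return np.array(chain)  # posterior samples of the microphysics parameters
```

The key point mirrored here is that the expensive model is evaluated only in the calibration stage; the sampling stage touches only the emulator.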

Oliver Dunbar and 3 more

Targeted high-resolution simulations driven by a general circulation model (GCM) can be used to calibrate GCM parameterizations of processes that are globally unresolvable but can be resolved in limited-area simulations. This raises the question of where to place high-resolution simulations so that they are maximally informative about the uncertain parameterizations in the global model. Here we construct an ensemble-based parallel algorithm to locate the regions whose data maximize the uncertainty reduction, or information gain, in the quantification of GCM parameter uncertainties. The algorithm is based on a Bayesian framework that exploits a quantified posterior distribution on GCM parameters as a measure of uncertainty. It is embedded in the recently developed calibrate-emulate-sample (CES) framework, which performs efficient model calibration and uncertainty quantification with only O(10^2) forward model evaluations, compared with the O(10^5) evaluations typically needed for traditional approaches to Bayesian calibration. We demonstrate the algorithm with an idealized GCM, with which we generate surrogates of high-resolution data. In this setting, we calibrate parameters and quantify uncertainties in a quasi-equilibrium convection scheme. We consider (i) localization in space for a statistically stationary problem, and (ii) localization in space and time for a seasonally varying problem. In these proof-of-concept applications, the calculated information gain reflects the reduction in parametric uncertainty obtained from Bayesian inference when harnessing a targeted sample of data. The largest information gain results from regions near the intertropical convergence zone (ITCZ), and the algorithm indeed automatically targets these regions for data collection.
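As a simplified, hypothetical rendering of the region-ranking step, the sketch below scores candidate regions by a Gaussian-approximate information gain, half the reduction in the log-determinant of the parameter covariance between prior samples and region-conditioned posterior samples; the function name and the assumption that posterior samples per region are already available (for example, from CES) are placeholders for illustration.

```python
import numpy as np

def gaussian_information_gain(prior_samples, posterior_samples_by_region):
    """Rank candidate regions by a Gaussian-approximate information gain,
    0.5 * [logdet(C_prior) - logdet(C_post)], of the GCM-parameter posterior
    conditioned on each region's data.

    prior_samples               : (n, p) samples from the prior/current posterior
    posterior_samples_by_region : {region_name: (m, p) samples given that region's data}
    """
    _, logdet_prior = np.linalg.slogdet(np.cov(prior_samples, rowvar=False))
    gains = {}
    for region, post in posterior_samples_by_region.items():
        _, logdet_post = np.linalg.slogdet(np.cov(post, rowvar=False))
        gains[region] = 0.5 * (logdet_prior - logdet_post)
    # Highest gain first: these are the regions to target with high-resolution runs
    return dict(sorted(gains.items(), key=lambda kv: kv[1], reverse=True))
```

Under this approximation, regions such as those near the ITCZ, where the data most tighten the parameter covariance, rise to the top of the ranking.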