Machine learning approaches, such as random forests, have been used in recent years to effectively emulate various aspects of climate and weather models. The limitations of these approaches are not yet known, particularly with regard to the varying complexity of the underlying physical parameterization scheme within the climate model. Using a hierarchy of model configurations, we explore the limits of random forest emulator skill with simplified model frameworks within NCAR's Community Atmosphere Model, version 6 (CAM6). These include a dry CAM6 configuration, a moist extension of the dry model, and an extension of the moist case that includes an additional convection scheme. Each model configuration is run at identical resolution and over the same time period. By optimizing a unique random forest for each tendency or precipitation rate across the hierarchy, we create a variety of "best case" emulators. The random forest emulators are then evaluated against the CAM6 output and, for completeness, against a baseline neural network emulator. All emulators show significant skill when compared to the "truth" (CAM6), often in line with or exceeding similar approaches in the literature. However, as the CAM6 complexity increases, the random forest skill noticeably decreases, despite the extensive tuning and training that each random forest undergoes. This indicates a limit on the feasibility of using random forests as physics emulators in climate models and encourages further exploration to identify ideal uses in the context of state-of-the-art climate model configurations.
Atmospheric General Circulation Models (GCMs) continue to increase in complexity, especially in their computationally demanding physical parameterizations. This work explores whether, and how, computationally efficient machine learning (ML) techniques can become an option for replacing physical parameterization schemes in GCMs. We test this idea in a model hierarchy with NCAR’s Community Atmosphere Model version 6 (CAM6), which is part of NCAR’s Community Earth System Model (CESM 2.1). In particular, we consider dry and idealized-moist CAM6 model configurations, which employ simplified physical forcing mechanisms for radiation, boundary layer mixing, surface fluxes, and precipitation (in the moist setup). Several ML models are developed, trained, and tested offline using CAM6 output data. The assessed ML techniques include linear regression, random forests, and neural networks with and without convolutional layers. Across a variety of ML hyperparameter choices, all of the ML methods are able to capture the general structure of the CAM6 physical forcing. However, capturing the details in the physical forcing patterns requires tuning the ML hyperparameters. Once the models are tuned, we compare the ML techniques against one another to assess their strengths and weaknesses. Future work will explore the online coupling of the ML-generated physical tendencies to the CAM6 atmospheric dynamical core.
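The offline workflow described above (train on model output, tune a hyperparameter, then compare emulators by skill against held-out "truth") can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data are synthetic rather than CAM6 output, the skill metric is the coefficient of determination, and the piecewise-constant binned regressor is only a simple stand-in for nonlinear learners such as random forests, not the method used in either study. The function names (`r2_score`, `fit_linear`, `fit_binned`, `make_data`, `run_demo`) are hypothetical.

```python
import math
import random

def r2_score(truth, pred):
    """Coefficient of determination: 1 is perfect, 0 equals predicting the mean."""
    mean = sum(truth) / len(truth)
    ss_res = sum((t - p) ** 2 for t, p in zip(truth, pred))
    ss_tot = sum((t - mean) ** 2 for t in truth)
    return 1.0 - ss_res / ss_tot

def fit_linear(x, y):
    """Ordinary least squares for a single predictor; returns a predict function."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    return lambda v: slope * v + intercept

def fit_binned(x, y, n_bins):
    """Piecewise-constant regressor on [-1, 1]; n_bins is the tunable hyperparameter."""
    def bin_of(v):
        return min(max(int((v + 1.0) / 2.0 * n_bins), 0), n_bins - 1)
    sums, counts = [0.0] * n_bins, [0] * n_bins
    for a, b in zip(x, y):
        sums[bin_of(a)] += b
        counts[bin_of(a)] += 1
    fallback = sum(y) / len(y)  # used only if a bin received no training samples
    means = [s / c if c else fallback for s, c in zip(sums, counts)]
    return lambda v: means[bin_of(v)]

def make_data(n, rng):
    """Synthetic stand-in for (input state, physical tendency) pairs."""
    x = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    y = [math.sin(3.0 * v) + rng.gauss(0.0, 0.1) for v in x]
    return x, y

def run_demo(seed=0):
    rng = random.Random(seed)
    train, valid, test = make_data(2000, rng), make_data(500, rng), make_data(500, rng)

    linear = fit_linear(*train)

    # Tune the hyperparameter on the validation split, never on the test split.
    def valid_skill(n_bins):
        model = fit_binned(*train, n_bins)
        return r2_score(valid[1], [model(v) for v in valid[0]])

    best_bins = max([2, 4, 8, 16, 32], key=valid_skill)
    binned = fit_binned(*train, best_bins)

    return {
        "linear_r2": r2_score(test[1], [linear(v) for v in test[0]]),
        "binned_r2": r2_score(test[1], [binned(v) for v in test[0]]),
        "best_bins": best_bins,
    }

if __name__ == "__main__":
    print(run_demo())
```

In this toy setting the linear fit captures only the broad trend of the target, while the tuned nonlinear stand-in also recovers its curvature, which loosely mirrors the distinction drawn above between capturing the general structure of the forcing and capturing its details.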