1 Introduction
Climate model emulators are simplified physical or statistical models
that are computationally efficient. Climate model emulators played a
central role in producing future global near-surface temperature
projections for the Working Group I Sixth Assessment Report (Forster et
al. 2021; Lee et al. 2021) of the Intergovernmental Panel on Climate
Change (IPCC AR6). The IPCC AR6 used climate model emulators to
supplement simulations from coupled atmosphere-ocean general circulation
models (AOGCMs), extending available simulations further into the future
and projecting future climate scenarios not available from AOGCMs. It is
important, therefore, that the simplifying assumptions used by emulators
are rigorously tested so the robustness of their performance is
understood.
Physically based climate model emulators, such as energy balance models
(EBMs), use bulk physical relationships to emulate the large-scale
behavior of Earth’s climate system. For example, EBMs were used by
Colman and Soldatenko (2020) to investigate links between climate
variability and climate sensitivity, and by Modak and Mauritsen (2021)
to investigate the probability of occurrence of the 2000-2012 global
warming hiatus.
Two-layer EBMs produce close emulations of idealized abrupt-4xCO2 and
1pctCO2 simulations from AOGCMs (e.g., “EBM-ε” in Geoffroy et al.
2013b; “held-two-layer-uom” in Nicholls et al. 2020). Differences
between emulations and AOGCM projections are generally greatest at times
of pronounced change in the rate of temperature increase. Such changes
are associated with time-varying feedbacks (Senior and Mitchell, 2000;
Winton et al., 2010; Armour et al., 2013; Dong et al., 2020; Dunne et
al., 2020; Rugenstein et al., 2020; Dong et al., 2021), which are caused
by evolving spatial pattern effects in surface temperature (Stevens
2016; Andrews et al., 2015; Rugenstein et al., 2016; Dong et al., 2021)
and non-linear state dependences in climate feedbacks (Good et al.,
2015; Rohrschneider et al., 2019; Bloch-Johnson et al., 2021). EBMs have
been enhanced to capture time-varying feedbacks: the Geoffroy et al.
(2013b) EBM includes an efficacy parameter for deep ocean heat uptake
and the “held-two-layer-uom” EBM also includes a state-dependent
feedback parameter (Rohrschneider et al., 2019; Nicholls et al., 2020).
These approaches, however, do not precisely capture the feedback changes
in AOGCMs. The remaining mismatch contributes to model structural error,
which is irreducible unless the EBM structure itself is extended, for
example from two layers to three or more (Cummins et al., 2020).
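As an illustration of the model class discussed above, the following is
a minimal sketch of a two-layer EBM with a deep-ocean heat uptake
efficacy, written in the commonly used form C dT/dt = F - λT - εγ(T - T0)
and C0 dT0/dt = γ(T - T0). The function name, the forward-Euler
integration, and all parameter values are assumptions chosen for
illustration; they are not calibrated values from this study or from the
cited papers.

```python
def run_two_layer_ebm(forcing, lam=1.2, gamma=0.7, eps=1.0,
                      c_upper=8.0, c_deep=100.0, substeps=20):
    """Integrate a two-layer EBM with forward Euler.

    forcing  : sequence of annual-mean radiative forcing values (W m-2)
    lam      : feedback parameter (W m-2 K-1), illustrative value
    gamma    : heat exchange coefficient between layers (W m-2 K-1)
    eps      : efficacy of deep-ocean heat uptake (eps=1 disables it)
    c_upper  : upper-layer heat capacity (W yr m-2 K-1), illustrative
    c_deep   : deep-layer heat capacity (W yr m-2 K-1), illustrative
    Returns the upper-layer temperature anomaly (K) at the end of each year.
    """
    dt = 1.0 / substeps          # time step in years
    t_upper = 0.0                # upper-layer temperature anomaly (K)
    t_deep = 0.0                 # deep-layer temperature anomaly (K)
    temperatures = []
    for f in forcing:
        for _ in range(substeps):
            heat_uptake = gamma * (t_upper - t_deep)
            d_upper = (f - lam * t_upper - eps * heat_uptake) / c_upper
            d_deep = heat_uptake / c_deep
            t_upper += dt * d_upper
            t_deep += dt * d_deep
        temperatures.append(t_upper)
    return temperatures


# Idealized step forcing, loosely analogous to an abrupt-4xCO2 experiment.
# The forcing magnitude is an assumed round number, not a diagnosed ERF.
step_forcing = [7.4] * 150
t_no_efficacy = run_two_layer_ebm(step_forcing, eps=1.0)
t_with_efficacy = run_two_layer_ebm(step_forcing, eps=1.3)
```

With an efficacy greater than one, deep-ocean heat uptake damps surface
warming more strongly during the transient, while the equilibrium
response F/λ is unchanged. In this sketch, that is how the efficacy
parameter gives rise to an apparent time variation in the feedback.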
Assessments of emulator performance are more trustworthy when
projections are validated using data different from those used to
calibrate the model parameters (out-of-sample validation). EBM
parameters are frequently calibrated using idealized step-forcing
experiments (e.g., abrupt-4xCO2) with the parameters estimated using
analytical methods (Geoffroy et al., 2013a) or statistical methods
(e.g., Cummins et al., 2020). The Coupled Model Intercomparison Project
Phase 6 (CMIP6) (Eyring et al. 2016) historical and future Shared
Socioeconomic Pathway (SSP) projections for AOGCMs are therefore well
suited for assessing EBM emulator performance: they allow out-of-sample
assessments under realistic climate scenarios.
Although climate model emulators have been evaluated (e.g., Nicholls et
al., 2020; Nicholls et al., 2021), it is not known how well emulators
perform for the latest CMIP6 (Eyring et al. 2016) AOGCMs using
realistic, out-of-sample climate projections and the latest assessments of
effective radiative forcing (ERF). Furthermore, the contribution of
irreducible model structural errors to total prediction error remains
poorly understood.
In this study, we evaluate the performance of a two-layer energy balance
model (EBM2) (Held et al. 2010; Geoffroy et al. 2013a, b) for emulating
CMIP6 historical and future temperature trends using different EBM
calibrations. We calibrate the EBM2 parameters for specific periods and
ERFs and evaluate the temperature projections for subsequent periods and
alternative ERF scenarios. EBM2 is benchmarked against an
impulse-response step model and a three-layer EBM.