4 Discussion and conclusions

Our results show that EBM2 prediction errors for future global temperature projections vary greatly between AOGCMs, forcings, time periods and methods of emulator calibration. The errors can be large, in many cases exceeding 20%. In this section, we discuss the implications of our results, how emulations from EBM2 might be improved, and the real-world relevance of our results.
We agree with Nicholls et al. (2021) that close emulation of the historical period is not sufficient to guarantee reliable emulation of future temperature changes. Late twentieth-century warming is suppressed by strong aerosol cooling (Smith and Forster 2021), and opposing errors in the emulation of GHG and aerosol forcings can cancel, giving a misleading impression of emulator accuracy. Further, opposing trends in GHG and aerosol forcings during the twenty-first century can cause a large divergence between AOGCM and EBM2 projections. Nicholls et al. (2021) found that many climate model emulators do not reliably emulate future projections from AOGCMs for high emissions scenarios. Our results also suggest that strong mitigation scenarios may not be reliably emulated.
EBM2 calibration using the abrupt-4xCO2 simulation does not produce reliable projections of historical warming for several AOGCMs. Although calibration of the λ and ε parameters using optimization substantially reduces emulation errors for time periods where an AOGCM simulation is available, it does not guarantee reliable out-of-sample projections. Further, without an AOGCM projection for a given AOGCM and scenario, it cannot be known whether the EBM2 future projection will be reliable. This undermines trust in EBM2 future projections.
Incorporating time-varying feedbacks and an unforced pattern effect into EBM2 could reduce emulation errors and improve the reliability of future projections. Late twentieth-century warming has been suppressed by changes in the observed sea surface temperature (SST) patterns and associated cloud feedbacks (Andrews et al., 2018; Dong et al., 2021; Fueglistaler and Silvers, 2021) and future warming could be affected by future changes in the pattern effect (Zhou et al., 2021). Climate model simulations show that climate feedbacks weaken through time in response to step-forcings and changes in feedbacks are associated with changes in SST patterns (e.g., Dong et al., 2020; Dunne et al. 2020). Including time-varying feedbacks in EBM2, however, requires further research to distinguish forced changes in feedbacks from unforced climate noise and to explicitly link global feedback changes to variations in SST patterns (e.g., using SST anomalies for regions of tropical deep convection; Fueglistaler and Silvers, 2021).
Improvements in the emulations by optimization of the λ and ε parameters could be implicitly compensating for errors arising from being unable to cleanly separate forcing and climate feedbacks in AOGCMs, as forcing estimates are dependent on the method used (Forster et al. 2013; Sherwood et al. 2015; Larson and Portmann 2016; Fredriksen et al. 2021). We used the latest estimates of ERF derived from fixed-SST simulations but substantial uncertainty in ERF remains (Forster et al. 2016; Dong et al. 2021).
We optimized the λ and ε parameters by minimizing the RMSE for temperature. Using the Hector emulator, Dorheim et al. (2020) show that minimizing errors for temperature and ocean heat flux produces more physically plausible parameter tunings than minimizing errors in temperature projections alone. Our initial investigations minimizing RMSE for temperature and N, however, showed that the emulation of historical temperatures was substantially worse than when minimizing RMSE for temperature alone. Incorporating time-varying feedbacks may mitigate this issue. Machine learning could also provide new techniques for calibrating and designing climate model emulators (Strobach and Bel, 2020; Watson-Parris, 2021).
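The calibration step described above can be illustrated with a minimal sketch. It is not the code used in this study: it assumes a commonly used two-layer form with a deep-ocean heat uptake efficacy ε and a positive feedback parameter λ, takes N to be the net top-of-atmosphere flux, uses illustrative values for the heat capacities and the exchange coefficient (`c_upper`, `c_deep`, `gamma`), and replaces the optimizer with a simple grid search. The function names `ebm2`, `rmse` and `calibrate`, and the optional `weight_n` term for a joint temperature and N cost, are hypothetical. The "AOGCM" target is synthetic, generated by the same model with known parameters.

```python
import numpy as np


def ebm2(forcing, lam, eps, gamma=0.7, c_upper=8.0, c_deep=100.0):
    """Integrate a two-layer energy balance model with annual Euler steps.

    Assumed form (lam > 0 is a stabilising feedback):
        c_upper * dT/dt  = F - lam*T - eps*gamma*(T - Td)
        c_deep  * dTd/dt = gamma*(T - Td)
        N                = F - lam*T - (eps - 1)*gamma*(T - Td)
    gamma, c_upper and c_deep are illustrative values, not calibrated ones.
    """
    forcing = np.asarray(forcing, dtype=float)
    t_surf = np.zeros_like(forcing)
    t_deep = np.zeros_like(forcing)
    ts, td = 0.0, 0.0
    for i, f in enumerate(forcing):
        exchange = gamma * (ts - td)  # heat flux into the deep layer
        ts = ts + (f - lam * ts - eps * exchange) / c_upper
        td = td + exchange / c_deep
        t_surf[i], t_deep[i] = ts, td
    n_toa = forcing - lam * t_surf - (eps - 1.0) * gamma * (t_surf - t_deep)
    return t_surf, n_toa


def rmse(a, b):
    """Root-mean-square difference between two series."""
    return float(np.sqrt(np.mean((np.asarray(a) - np.asarray(b)) ** 2)))


def calibrate(forcing, t_target, lam_grid, eps_grid, n_target=None, weight_n=0.0):
    """Grid search for (lam, eps) minimising RMSE in T, optionally plus weighted RMSE in N."""
    best = (np.inf, None, None)
    for lam in lam_grid:
        for eps in eps_grid:
            t_sim, n_sim = ebm2(forcing, lam, eps)
            cost = rmse(t_sim, t_target)
            if n_target is not None:
                # weight_n is an arbitrary choice: T and N have different units
                cost += weight_n * rmse(n_sim, n_target)
            if cost < best[0]:
                best = (cost, float(lam), float(eps))
    return best


# Synthetic target: the same model run with known parameters.
forcing = np.linspace(0.0, 4.0, 150)  # idealised 150-year forcing ramp, W m-2
t_target, n_target = ebm2(forcing, lam=1.2, eps=1.3)
cost, lam_fit, eps_fit = calibrate(
    forcing,
    t_target,
    lam_grid=np.linspace(0.5, 2.0, 16),
    eps_grid=np.linspace(0.8, 1.8, 11),
)
print(f"lam = {lam_fit:.2f}, eps = {eps_fit:.2f}, RMSE = {cost:.2e}")
```

In a real application the target would be an AOGCM temperature series, and the fitted parameters would then be tested on a forcing scenario not used in the fit. A close in-sample fit of this kind says nothing by itself about out-of-sample error.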
There are several reasons why some AOGCMs are closely emulated and others not. First, some AOGCMs have greater symmetry in their responses to GHG and aerosol forcings (Figure 2) and EBM2 assumes symmetric responses to opposing forcings. Second, optimization of the λ and ε parameters (for temperature) yields closer emulations of N for some AOGCMs (Figure 3). Third, if EBM2 has a good representation of time-varying feedbacks and the evolution of pattern effects in an AOGCM, model structural error is smaller. Finally, with small ensemble sizes, some of the variation in emulation errors arises from chance.
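One reading of the symmetry assumption in the first point is that it follows from linearity. As a sketch, assume a common two-layer form with constant parameters and zero initial anomalies. The sign convention and the symbols C, C_D and γ are assumptions here and are not defined in this section. Writing T[F] for the temperature response to a forcing time series F:

```latex
\begin{aligned}
C \frac{dT}{dt} &= F - \lambda T - \varepsilon \gamma \,(T - T_D), \\
C_D \frac{dT_D}{dt} &= \gamma \,(T - T_D), \\
T[F_1 + F_2] &= T[F_1] + T[F_2], \qquad T[-F] = -T[F].
\end{aligned}
```

Because every term is linear in T, T_D and F, the emulated response to combined GHG and aerosol forcing is the sum of the separate responses, and a forcing of opposite sign gives a response of opposite sign and equal magnitude. Under these assumptions, a single fixed pair of λ and ε cannot reproduce an AOGCM whose responses to the two forcings are not symmetric in this sense.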
One approach for managing the variability in emulation errors between AOGCMs is to use a multi-model ensemble. Multi-model ensembles can be used to estimate structural uncertainty (e.g., Tebaldi and Knutti, 2007) and typically offer improved skill over individual climate models (e.g., Hagedorn et al. 2005). Our AOGCM ensemble is small, however, and we find that the ensemble mean of AOGCM emulations does not perform as well as the best AOGCM (Figure 4).
Our findings are relevant to observationally constrained climate model emulators aiming to simulate real-world changes (e.g., Forster et al. 2021). Emulator structural errors and uncertainties in inputs (e.g., ERF) are as relevant to real-world emulations as to emulations of AOGCMs. Indeed, there are additional challenges. There is only one realization of past climate, and future climate is unknown. Observational large ensembles (McKinnon et al. 2017) could be used to characterize uncertainty in emulating past climate. For future projections, AOGCMs remain an essential tool for estimating out-of-sample prediction errors, as done in this study, and enable the use of optimization techniques for emulator calibration.
Acknowledgments
LSJ, ACM, TA and PMF were supported by the European Union’s Horizon 2020 research and innovation programme under grant agreement No 820829 (CONSTRAIN). TA was supported by the Met Office Hadley Centre Climate Programme funded by BEIS. CJS was supported by a joint NERC-IIASA Collaborative Research Fellowship (NE/T009381/1). ACM was supported by The Leverhulme Trust (PLP-2018-278). We acknowledge: the World Climate Research Programme and its Working Group on Coupled Modeling for coordinating and promoting CMIP6; the climate modeling groups for producing their model output; the Earth System Grid Federation (ESGF) for archiving the data and providing access; and the funding agencies who support CMIP6 and ESGF.
Data Availability Statement
CMIP6 data were downloaded from the ESGF and are publicly available from https://esgf-node.llnl.gov/search/cmip6/. Code will be publicly available with a DOI in a Zenodo repository.
References
Andrews, T., Gregory, J. M., & Webb, M. J. (2015). The dependence of radiative forcing and feedback on evolving patterns of surface temperature change in climate models. Journal of Climate, 28(4), 1630–1648. https://doi.org/10.1175/JCLI-D-14-00545.1.
Andrews, T., Gregory, J. M., Paynter, D., Silvers, L. G., Zhou, C., Mauritsen, T., Webb, M. J., Armour, K. C., Forster, P. M., & Titchner, H. (2018). Accounting for changing temperature patterns increases historical estimates of climate sensitivity. Geophysical Research Letters, 45, 8490–8499. https://doi.org/10.1029/2018GL078887.
Armour, K. C., Bitz, C. M., & Roe, G. H. (2013). Time-Varying Climate Sensitivity from Regional Feedbacks. Journal of Climate, 26(13), 4518–4534. https://doi.org/10.1175/JCLI-D-12-00544.1.
Bloch-Johnson, J., Rugenstein, M., Stolpe, M. B., Rohrschneider, T., Zheng, Y., & Gregory, J. M. (2021). Climate Sensitivity Increases Under Higher CO2 Levels Due to Feedback Temperature Dependence. Geophysical Research Letters, 48(4). https://doi.org/10.1029/2020GL089074.
Byrne, B., & Goldblatt, C. (2013). Radiative forcing at high concentrations of well-mixed greenhouse gases. Geophys. Res. Lett., 41, 152–160, doi:10.1002/2013GL058456.
Cummins, D. P., Stephenson, D. B., & Stott, P. A. (2020). Optimal Estimation of Stochastic Energy Balance Model Parameters. Journal of Climate, 33, 7909–7926. doi: 10.1175/JCLI-D-19-0589.1.
Colman, R., & Soldatenko, S. (2020). Understanding the links between climate feedbacks, variability and change using a two‑layer energy balance model. Climate Dynamics, 54, 3441–3459, https://doi.org/10.1007/s00382-020-05189-3.
Dong, Y., Armour, K. C., Zelinka, M. D., Proistosescu, C., Battisti, D. S., Zhou, C., & Andrews, T. (2020). Intermodel spread in the pattern effect and its contribution to climate sensitivity in CMIP5 and CMIP6 models. Journal of Climate, 33(18), 7755–7775. https://doi.org/10.1175/JCLI-D-19-1011.1.
Dong, Y., Armour, K. C., Proistosescu, C., Andrews, T., Battisti, D. S., Forster, P. M., Paynter, D., Smith, C. J., & Shiogama, H. (2021). Biased estimates of Equilibrium Climate Sensitivity and Transient Climate Response derived from historical CMIP6 simulations. Geophysical Research Letters. https://doi.org/10.1029/2021GL095778.
Dorheim, K., Link, R., Hartin, C., Kravitz, B., & Snyder, A. (2020). Calibrating Simple Climate Models to Individual Earth System Models: Lessons Learned From Calibrating Hector. Earth and Space Science, 7(11). https://doi.org/10.1029/2019EA000980.
Dunne, J. P., Winton, M., Bacmeister, J., Danabasoglu, G., Gettelman, A., Golaz, J. C., Hannay, C., Schmidt, G. A., Krasting, J. P., Leung, L. R., Nazarenko, L., Sentman, L. T., Stouffer, R. J., & Wolfe, J. D. (2020). Comparison of Equilibrium Climate Sensitivity Estimates From Slab Ocean, 150-Year, and Longer Simulations. Geophysical Research Letters, 47(16). https://doi.org/10.1029/2020GL088852.
Eyring, V., Bony, S., Meehl, G. A., Senior, C. A., Stevens, B., Stouffer, R. J., & Taylor, K. E. (2016). Overview of the Coupled Model Intercomparison Project Phase 6 (CMIP6) experimental design and organization. Geosci. Model Dev., 9, 1937–1958, doi:10.5194/gmd-9-1937-2016.
Forster, P. M., Andrews, T., Good, P., Gregory, J. M., Jackson, L. S., & Zelinka, M. (2013). Evaluating adjusted forcing and model spread for historical and future scenarios in the CMIP5 generation of climate models. J. Geophys. Res. Atmos., 118, 1139–1150. https://doi.org/10.1002/jgrd.50174.
Forster, P. M., T. Richardson, A. C. Maycock, C. J. Smith, B. H. Samset, G. Myhre, T. Andrews, R. Pincus, & M. Schulz (2016). Recommendations for diagnosing effective radiative forcing from climate models for CMIP6, J. Geophys. Res. Atmos., 121, 12,460–12,475, doi:10.1002/2016JD025320.
Forster, P., T. Storelvmo, K. Armour, W. Collins, J. L. Dufresne, D. Frame, D. J. Lunt, T. Mauritsen, M. D. Palmer, M. Watanabe, M. Wild, & H. Zhang (2021). The Earth’s Energy Budget, Climate Feedbacks, and Climate Sensitivity. In: Climate Change 2021: The Physical Science Basis. Contribution of Working Group I to the Sixth Assessment Report of the Intergovernmental Panel on Climate Change [Masson-Delmotte, V., P. Zhai, A. Pirani, S. L. Connors, C. Péan, S. Berger, N. Caud, Y. Chen, L. Goldfarb, M. I. Gomis, M. Huang, K. Leitzell, E. Lonnoy, J. B. R. Matthews, T. K. Maycock, T. Waterfield, O. Yelekçi, R. Yu and B. Zhou (eds.)]. Cambridge University Press. In Press.
Fredriksen, H., Rugenstein, M., & Graversen, R. (2021). Estimating Radiative Forcing With a Nonconstant Feedback Parameter and Linear Response. Journal of Geophysical Research: Atmospheres, 126(24). https://doi.org/10.1029/2020jd034145.
Fueglistaler, S., & Silvers, L. G. (2021). The Peculiar Trajectory of Global Warming. Journal of Geophysical Research: Atmospheres, 126(4). https://doi.org/10.1029/2020JD033629.
Geoffroy, O., Saint-Martin, D., Olivié, D. J. L., Voldoire, A., Bellon, G., & Tytéca, S. (2013a). Transient Climate Response in a Two-Layer Energy-Balance Model. Part I: Analytical Solution and Parameter Calibration Using CMIP5 AOGCM Experiments. Journal of Climate, 26, 1841–1857. doi: 10.1175/JCLI-D-12-00195.1.
Geoffroy, O., Saint-Martin, D., Bellon, G., & Voldoire, A. (2013b). Transient Climate Response in a Two-Layer Energy-Balance Model. Part II: Representation of the Efficacy of Deep-Ocean Heat Uptake and Validation for CMIP5 AOGCMs. Journal of Climate, 26, 1859–1876. doi: 10.1175/JCLI-D-12-00196.1.
Gillett, N. P., Shiogama, H., Funke, B., Hegerl, G., Knutti, R., Matthes, K., Santer, B. D., Stone, D., & Tebaldi, C. (2016). The Detection and Attribution Model Intercomparison Project (DAMIP v1.0) contribution to CMIP6. Geosci. Model Dev., 9, 3685–3697. doi:10.5194/gmd-9-3685-2016.
Good, P., Gregory, J. M., & Lowe, J. A. (2011). A step‐response simple climate model to reconstruct and interpret AOGCM projections. Geophysical Research Letters, 38, L01703. doi:10.1029/2010GL045208.
Good, P., Lowe, J. A., Andrews, T., Wiltshire, A., Chadwick, R., Ridley, J. K., Menary, M. B., Bouttes, N., Dufresne, J. L., Gregory, J. M., Schaller, N., & Shiogama, H. (2015). Nonlinear regional warming with increasing CO2 concentrations. Nature Climate Change, 5(2), 138–142. https://doi.org/10.1038/nclimate2498.
Gregory, J. M., Andrews, T., & Good, P. (2015). The inconstancy of the transient climate response parameter under increasing CO2. Phil. Trans. R. Soc. A 373: 20140417. http://dx.doi.org/10.1098/rsta.2014.0417.
Gregory, J. M., Andrews, T., Good, P., Mauritsen, T., & Forster, P. M. (2016). Small global‑mean cooling due to volcanic radiative forcing. Clim. Dyn., 47, 3979–3991. DOI 10.1007/s00382-016-3055-1.
Hagedorn, R., Doblas-Reyes, F. J. & Palmer, T. N. (2005). The rationale behind the success of multi-model ensembles in seasonal forecasting – I. Basic concept. Tellus 57A, 219–233.
Held, I. M., Winton, M., Takahashi, K., Delworth, T., Zeng, F., & Vallis, G. K. (2010). Probing the Fast and Slow Components of Global Warming by Returning Abruptly to Preindustrial Forcing. Journal of Climate, 23, 2418–2427. doi: 10.1175/2009JCLI3466.1.
Larson, E. J. L., & Portmann, R. W. (2016). A Temporal Kernel Method to Compute Effective Radiative Forcing in CMIP5 Transient Simulations. Journal of Climate, 29, 1497–1509. https://doi.org/10.1175/JCLI-D-15-0577.1.
Lee, J. Y., J. Marotzke, G. Bala, L. Cao, S. Corti, J. P. Dunne, F. Engelbrecht, E. Fischer, J. C. Fyfe, C. Jones, A. Maycock, J. Mutemi, O. Ndiaye, S. Panickal, & T. Zhou (2021). Future Global Climate: Scenario-Based Projections and Near-Term Information. In: Climate Change 2021: The Physical Science Basis. Contribution of Working Group I to the Sixth Assessment Report of the Intergovernmental Panel on Climate Change [Masson-Delmotte, V., P. Zhai, A. Pirani, S. L. Connors, C. Péan, S. Berger, N. Caud, Y. Chen, L. Goldfarb, M. I. Gomis, M. Huang, K. Leitzell, E. Lonnoy, J. B. R. Matthews, T. K. Maycock, T. Waterfield, O. Yelekçi, R. Yu and B. Zhou (eds.)]. Cambridge University Press. In Press.
McKinnon, K. A., Poppick, A., Dunn-Sigouin, E., & Deser, C. (2017). An “Observational Large Ensemble” to Compare Observed and Modeled Temperature Trend Uncertainty due to Internal Variability. Journal of Climate, 30, 7585–7598. https://doi.org/10.1175/JCLI-D-16-0905.1.
Modak, A., & Mauritsen, T. (2021). The 2000–2012 global warming hiatus more likely with a low climate sensitivity. Geophysical Research Letters, 48, e2020GL091779. https://doi.org/10.1029/2020GL091779.
Nicholls, Z. R. J., Meinshausen, M., Lewis, J., Gieseke, R., Dommenget, D., Dorheim, K., Fan, C. S., Fuglestvedt, J. S., Gasser, T., Goluke, U., Goodwin, P., Hartin, C., Hope, A. P., Kriegler, E., Leach, N. J., Marchegiani, D., McBride, L. A., Quilcaille, Y., Rogelj, J., & Xie, Z. (2020). Reduced Complexity Model Intercomparison Project Phase 1: Introduction and evaluation of global-mean temperature response. Geoscientific Model Development, 13(11), 5175–5190. https://doi.org/10.5194/gmd-13-5175-2020.
Nicholls, Z., Meinshausen, M., Lewis, J., Corradi, M. R., Dorheim, K., Gasser, T., Gieseke, R., Hope, A. P., Leach, N. J., McBride, L. A., Quilcaille, Y., Rogelj, J., Salawitch, R. J., Samset, B. H., Sandstad, M., Shiklomanov, A., Skeie, R. B., Smith, C. J., Smith, S. J., Su, X., Tsutsui, J., Vega-Westhoff, B., & Woodard, D. L. (2021). Reduced complexity Model Intercomparison Project Phase 2: Synthesizing Earth system knowledge for probabilistic climate projections. Earth’s Future, 9, e2020EF001900. https://doi.org/10.1029/2020EF001900.
Pincus, R., Forster, P. M., & Stevens, B. (2016), The Radiative Forcing Model Intercomparison Project (RFMIP): experimental protocol for CMIP6. Geosci. Model Dev., 9, 3447–3460. doi:10.5194/gmd-9-3447-2016.
Rohrschneider, T., Stevens, B., & Mauritsen, T. (2019). On simple representations of the climate response to external radiative forcing. Climate Dynamics, 53(5–6), 3131–3145. https://doi.org/10.1007/s00382-019-04686-4.
Rugenstein, M. A. A., Caldeira, K., & Knutti, R. (2016). Dependence of global radiative feedbacks on evolving patterns of surface heat fluxes. Geophysical Research Letters, 43(18), 9877–9885. https://doi.org/10.1002/2016GL070907.
Rugenstein, M., Bloch-Johnson, J., Gregory, J., Andrews, T., Mauritsen, T., Li, C., Frölicher, T. L., Paynter, D., Danabasoglu, G., Yang, S., Dufresne, J. L., Cao, L., Schmidt, G. A., Abe-Ouchi, A., Geoffroy, O., & Knutti, R. (2020). Equilibrium Climate Sensitivity Estimated by Equilibrating Climate Models. Geophysical Research Letters, 47(4). https://doi.org/10.1029/2019GL083898.
Senior, C. A., & Mitchell, J. F. B. (2000). The time-dependence of climate sensitivity. Geophysical Research Letters, 27(17), 2685–2688. https://doi.org/10.1029/2000GL011373.
Sherwood, S. C., Bony, S., Boucher, O., Bretherton, C., Forster, P. M., Gregory, J. M., & Stevens, B. (2015). Adjustments in the forcing-feedback framework for understanding climate change. Bulletin of the American Meteorological Society, 96(2), 217–228. https://doi.org/10.1175/BAMS-D-13-00167.1.
Smith, C. J., Harris, G. R., Palmer, M. D., Bellouin, N., Collins, W., Myhre, G., Schulz, M., Golaz, J.-C., Ringer, M., Storelvmo, T., & Forster, P. M. (2021). Energy Budget Constraints on the Time History of Aerosol Forcing and Climate Sensitivity. Journal of Geophysical Research: Atmospheres, 126, e2020JD033622. https://doi.org/10.1029/2020JD033622.
Smith, C. J., & Forster, P. M. (2021). Suppressed Late-20th Century Warming in CMIP6 Models Explained by Forcing and Feedbacks. Geophysical Research Letters, 48(19). https://doi.org/10.1029/2021GL094948.
Stevens, B., Sherwood, S. C., Bony, S., & Webb, M. J. (2016). Prospects for narrowing bounds on Earth’s equilibrium climate sensitivity. Earth’s Future, 4, 512–522. doi:10.1002/2016EF000376.
Strobach, E., & Bel, G. (2020). Learning algorithms allow for improved reliability and accuracy of global mean surface temperature projections. Nature Communications, 11(1). https://doi.org/10.1038/s41467-020-14342-9.
Tebaldi, C., & Knutti, R. (2007). The use of the multi-model ensemble in probabilistic climate projections. Phil. Trans. R. Soc. A (2007) 365, 2053–2075, doi:10.1098/rsta.2007.2076.
Watson-Parris, D. (2021). Machine learning for weather and climate are worlds apart. Phil. Trans. R. Soc. A, 379: 20200098. https://doi.org/10.1098/rsta.2020.0098.
Winton, M., Takahashi, K., & Held, I. M. (2010). Importance of Ocean Heat Uptake Efficacy to Transient Climate Change. Journal of Climate, 23, 2333-2344, DOI: 10.1175/2009JCLI3139.1.
Zhou, C., Zelinka, M. D., Dessler, A. E., & Wang, M. (2021). Greater committed warming after accounting for the pattern effect. Nature Climate Change, 11(2), 132–136. https://doi.org/10.1038/s41558-020-00955-x.