Figure 1. Global mean temperature anomalies relative to an 1850-1900 baseline for CMIP6 AOGCMs. Gray shading shows the range between the ensemble maximum and minimum temperature changes. Temperature changes are driven by historical forcings over 1850-2014 and are shown for 1915-2014, the period over which RMSEs are calculated.
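The baseline and RMSE conventions in the caption can be made concrete with a short sketch. It assumes annual-mean series, expresses each series as an anomaly from its own 1850-1900 mean, and compares an emulation with an AOGCM target (for example an ensemble mean) only over 1915-2014. The function names `anomalies` and `rmse` are hypothetical helpers, not code from this study.

```python
import numpy as np


def anomalies(years, temps, base_start=1850, base_end=1900):
    """Return temps minus their mean over the (inclusive) baseline period."""
    years = np.asarray(years)
    temps = np.asarray(temps, dtype=float)
    in_base = (years >= base_start) & (years <= base_end)
    return temps - temps[in_base].mean()


def rmse(years, emulated, target, start=1915, end=2014):
    """Root-mean-square error between two anomaly series over [start, end]."""
    years = np.asarray(years)
    in_window = (years >= start) & (years <= end)
    diff = np.asarray(emulated, dtype=float)[in_window] - np.asarray(target, dtype=float)[in_window]
    return float(np.sqrt(np.mean(diff ** 2)))
```

Years before 1915 are still simulated and still define the baseline, but under these assumptions they do not contribute to the error metric.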

3.2 Roles of different forcings for near-surface temperature change

In Figure 2 we focus on four AOGCMs: two with relatively large errors in their emulations of the historical period (HadGEM3-GC31-LL and IPSL-CM6A-LR), one with relatively small errors (CanESM5), and one whose responses contrast with those of the other AOGCMs (NorESM2-LM).
Although EBM2 was calibrated using abrupt-4xCO2, errors arise predominantly from emulation of the response to GHG forcing, in part because GHG forcing has the largest ERF. The EBM2 emulations overestimate the GHG-driven temperature increase for HadGEM3-GC31-LL, IPSL-CM6A-LR and CanESM5 (even though the CanESM5 historical fit is good). In contrast, EBM2 underestimates the temperature response to GHGs for NorESM2-LM.
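To illustrate how a two-layer energy balance model turns an ERF time series into a temperature emulation, the sketch below integrates the standard two-layer form (Held et al. 2010), C dT/dt = F - λT - γ(T - T_D) and C_D dT_D/dt = γ(T - T_D), with an annual forward-Euler step. It assumes no ocean heat uptake efficacy factor, and the parameter values are illustrative placeholders rather than values calibrated on abrupt-4xCO2; the exact EBM2 formulation used here may differ. The function name `ebm2` is hypothetical.

```python
import numpy as np


def ebm2(forcing, C=8.0, C_D=100.0, lam=1.0, gamma=0.7, dt=1.0):
    """Two-layer energy balance model integrated with forward Euler.

    forcing : annual ERF series (W m-2), starting from equilibrium.
    C, C_D  : upper- and deep-layer heat capacities (W yr m-2 K-1), placeholders.
    lam     : climate feedback parameter (W m-2 K-1), placeholder.
    gamma   : upper-to-deep heat exchange coefficient (W m-2 K-1), placeholder.
    Returns the upper-layer (near-surface) temperature anomaly (K) after each step.
    """
    forcing = np.asarray(forcing, dtype=float)
    temps = np.zeros(forcing.size)
    t_up, t_deep = 0.0, 0.0  # both layers start at zero anomaly
    for i, f in enumerate(forcing):
        exchange = gamma * (t_up - t_deep)        # heat flux from upper to deep layer
        dT_up = (f - lam * t_up - exchange) / C   # upper-layer tendency (K/yr)
        dT_deep = exchange / C_D                  # deep-layer tendency (K/yr)
        t_up += dt * dT_up
        t_deep += dt * dT_deep
        temps[i] = t_up
    return temps


# Example: separate GHG-like and aerosol-like forcing ramps (synthetic).
years = np.arange(1850, 2015)
ghg = np.linspace(0.0, 3.0, years.size)
aer = np.linspace(0.0, -1.0, years.size)
t_ghg, t_aer, t_all = ebm2(ghg), ebm2(aer), ebm2(ghg + aer)
```

Because the model is linear, the all-forcing emulation equals the sum of the single-forcing emulations, so an overestimate of the GHG response and an overestimate of aerosol cooling can partly cancel in the historical total.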
Emulation of the temperature response to aerosol forcing is the largest source of error for one model (NorESM2-LM). For all models, errors associated with aerosol forcing offset those associated with GHG forcing, and this cancellation gives a spurious impression of better performance in the historical simulations. As for the combined forcings (Figure 1), the step model produces closer emulations of the temperature responses to both GHG and aerosol forcing.
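A minimal sketch of a step-response emulator is given below, assuming the step model scales and superposes the AOGCM's abrupt-4xCO2 temperature response: each year-on-year forcing increment dF triggers a copy of that response scaled by dF / F_4x, where F_4x is the abrupt-4xCO2 ERF. The names `step_model`, `step_response` and `f_4x` are hypothetical, and the study's step model may differ in detail. Unlike a fitted two-timescale EBM, this approach uses the AOGCM's own step response directly.

```python
import numpy as np


def step_model(forcing, step_response, f_4x):
    """Linear step-response emulation of temperature.

    forcing       : annual ERF series (W m-2), zero before the first year.
    step_response : AOGCM temperature response (K) in successive years
                    after an abrupt-4xCO2 step (at least len(forcing) values).
    f_4x          : ERF of the abrupt-4xCO2 step (W m-2).
    """
    forcing = np.asarray(forcing, dtype=float)
    resp = np.asarray(step_response, dtype=float)
    n = forcing.size
    if resp.size < n:
        raise ValueError("step_response must cover the full forcing period")
    d_forcing = np.diff(forcing, prepend=0.0)  # year-on-year forcing increments
    # T[i] = sum_{j<=i} (dF[j] / f_4x) * R[i - j]: a truncated discrete convolution
    return np.convolve(d_forcing / f_4x, resp[:n])[:n]
```

A forcing held constant at F_4x from the first year returns the step response itself, and, as with the linear EBM, emulations of separate forcings add to the emulation of their sum.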
Emulation of the temperature response to natural forcings is a small source of error for all eight AOGCMs, and the emulations mostly lie within the spread of the AOGCM ensemble (Figures 2 and S2). Although larger ensembles and longer simulations are required to robustly assess the emulated response to volcanic forcing, the thermal inertia of the EBM2 layers and the inclusion of rapid cloud adjustments in the RFMIP ERFs will contribute to closer emulations (Held et al. 2010; Gregory et al. 2016).
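One simple way to quantify "mostly within the spread of the AOGCM ensemble" is the fraction of years in which the emulation lies between the ensemble minimum and maximum, the same envelope shown by the gray shading in Figure 1. The sketch below assumes an array of ensemble realizations with shape (members, years); the function name `fraction_within_envelope` is hypothetical and not necessarily the diagnostic used for Figures 2 and S2.

```python
import numpy as np


def fraction_within_envelope(emulated, members):
    """Fraction of years where the emulation lies inside the ensemble min-max range.

    emulated : array of shape (n_years,)
    members  : array of shape (n_members, n_years)
    """
    members = np.asarray(members, dtype=float)
    emulated = np.asarray(emulated, dtype=float)
    lower = members.min(axis=0)   # ensemble minimum in each year
    upper = members.max(axis=0)   # ensemble maximum in each year
    inside = (emulated >= lower) & (emulated <= upper)
    return float(inside.mean())
```

With few ensemble members the envelope is narrow and noisy, which is one reason larger ensembles are needed before drawing firm conclusions about the volcanic response.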