3 Results

3.1 Historical period

EBM2 captures the increasing temperature trend during the twentieth century and distinguishes between high and low climate sensitivity AOGCMs (Figure 1). In all EBM2 emulations, a proportion of the RMSE (~0.07 K) arises from interannual variations in the AOGCM ensemble means that are not captured in the emulations (there are three members in each AOGCM historical ensemble). The performance of EBM2, however, varies substantially between AOGCMs. Emulations with both large and small RMSEs occur for both high and low climate sensitivity AOGCMs. For AOGCMs where there are substantial differences between the emulations and the AOGCM projections, the differences occur over different time periods. Differences are large for 1925-1950 (HadGEM3-GC31-LL), for 1950-1975 (NorESM2-LM) and for 2000-2015 (HadGEM3-GC31-LL, IPSL-CM6A-LR, GFDL-ESM4 and NorESM2-LM). For IPSL-CM6A-LR, temperatures are overestimated by the emulators throughout 1915-2014. Intriguingly, close emulation of temperatures in abrupt-4xCO2 does not guarantee close emulation for the historical period (e.g. GFDL-ESM4), and a relatively poor emulation of abrupt-4xCO2 does not preclude close emulation for the historical period (e.g. CNRM-CM6-1) (Figure S1).
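As an illustration of the metric discussed above, the following is a minimal sketch of how an RMSE between an emulated annual temperature series and an AOGCM ensemble mean could be computed. The function names `ensemble_mean` and `rmse` and all numerical values are hypothetical and chosen only for illustration; they are not taken from the AOGCM output or from the code used in this study.

```python
import math


def ensemble_mean(members):
    """Average several equally long annual series, year by year."""
    n_years = len(members[0])
    if any(len(m) != n_years for m in members):
        raise ValueError("all ensemble members must cover the same years")
    return [sum(m[i] for m in members) / len(members) for i in range(n_years)]


def rmse(emulated, target):
    """Root-mean-square error (K) between two annual temperature series."""
    if len(emulated) != len(target):
        raise ValueError("series must have the same length")
    squared = [(e - t) ** 2 for e, t in zip(emulated, target)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical three-member ensemble of annual temperature anomalies (K).
members = [
    [0.10, 0.25, 0.18, 0.40],
    [0.05, 0.30, 0.22, 0.35],
    [0.12, 0.20, 0.26, 0.45],
]
# Hypothetical smooth emulation for the same four years (K).
emulated = [0.08, 0.20, 0.28, 0.38]

target = ensemble_mean(members)
error = rmse(emulated, target)
```

Because a three-member mean only partly averages out interannual variability, a smooth emulation cannot reach zero RMSE against it, which is consistent with the residual contribution noted above.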
The step model produces emulations with RMSEs equivalent to or less than those from EBM2 in seven of the eight AOGCMs. The exception is NorESM2-LM, which has relatively large interannual variability and is the only model to show an apparent cooling trend during years 20-50 of its abrupt-4xCO2 simulation (Figure S1).
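To indicate why features of the abrupt-4xCO2 response, such as noise or a transient cooling, can carry over into a step-model emulation, the sketch below implements a generic linear step-response formulation. It assumes that the emulated temperature is a sum of scaled, time-shifted copies of the abrupt-4xCO2 response, one per annual forcing increment. This is an assumption about the general technique, not a statement of the exact implementation used here, and the function name, argument names and values are hypothetical.

```python
def step_response_emulation(forcing, step_response, step_forcing):
    """Emulate temperature (K) by superposing scaled step responses.

    forcing       -- annual forcing series (W m-2) relative to the baseline
    step_response -- temperature response (K) in years 1, 2, ... of the
                     abrupt step experiment
    step_forcing  -- forcing magnitude (W m-2) of the abrupt step
    """
    n_years = len(forcing)
    if len(step_response) < n_years:
        raise ValueError("step response must be at least as long as forcing")

    # Year-on-year forcing increments; the first increment is relative to zero.
    increments = [forcing[0]] + [
        forcing[i] - forcing[i - 1] for i in range(1, n_years)
    ]

    temperatures = []
    for year in range(n_years):
        total = 0.0
        # Each past increment contributes the step response at its own age,
        # scaled by the ratio of the increment to the step forcing.
        for start in range(year + 1):
            total += increments[start] / step_forcing * step_response[year - start]
        temperatures.append(total)
    return temperatures


# Hypothetical values: forcing ramps to half the step forcing, then holds.
emulated = step_response_emulation(
    forcing=[2.0, 4.0, 4.0],
    step_response=[1.0, 2.0, 2.5],
    step_forcing=8.0,
)
```

Under this formulation, any year-to-year variability or non-monotonic behaviour in the step response is passed directly into the emulation, which is one possible reason for weaker performance when the abrupt-4xCO2 simulation is noisy.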
EBM3 performs better than EBM2 for abrupt-4xCO2, which is expected given the additional timescales resolved by the third layer. The additional degrees of freedom enable a much closer emulation of temperatures during years 10-40 of the abrupt-4xCO2 experiment, a period when the rate of temperature increase weakens rapidly (Figure S1). However, the improvement of EBM3 over EBM2 in the abrupt-4xCO2 experiment does not consistently translate to the historical experiment. Indeed, there are two AOGCMs for which EBM2 has smaller RMSEs than EBM3 (MIROC6 and IPSL-CM6A-LR). Both EBMs overestimate temperatures for 1990-2014 in four of the eight AOGCMs and generally produce larger RMSEs than the step model.