Figure 2. As Figure 1, except that temperature changes are forced by historical greenhouse gas (top row), anthropogenic aerosol (middle row), and natural (bottom row) forcings from RFMIP.

3.3 Alternative calibration of EBM2

To determine whether temperature emulations from EBM2 for the historical period can be improved by changes to the fitted parameters alone, we apply optimization (Section 2.3) to calibrate the λ and ε parameters (Figure 3, Tables S2 and S3).
This improves the emulations for all models. The greatest improvement occurs during 1980-2014, and the emulation of temperature during this period improves further if the optimization is amended to minimize the RMSE specifically over this period. The spread in emulated temperatures about the 1:1 line is mainly driven by the small AOGCM ensemble sizes and is therefore similar for both EBM2 calibrations. Interannual variability is particularly large for NorESM2-LM, and its emulated temperatures correlate only weakly with the AOGCM temperatures for years prior to the 1980s, when the climate response to forcing is relatively weak.
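The calibration described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure of Section 2.3: it assumes EBM2 takes the common two-layer form with an ocean heat uptake efficacy ε, integrates it with a simple forward Euler scheme, and replaces the optimizer with a coarse grid search over λ and ε. The function names (`run_ebm2`, `calibrate`), the fixed values of γ and the heat capacities, and the synthetic forcing and target series are all hypothetical and serve only to show how restricting the RMSE to a window such as 1980-2014 enters the fit.

```python
import math

def run_ebm2(forcing, lam, eps, gamma=0.7, c_upper=8.0, c_deep=100.0, substeps=10):
    """Integrate an assumed two-layer EBM driven by annual-mean forcing (W m-2).

    Heat capacities are in W yr m-2 K-1; gamma, c_upper and c_deep are
    illustrative values, not fitted ones. Returns annual T (K) and N (W m-2).
    """
    dt = 1.0 / substeps
    t_upper, t_deep = 0.0, 0.0
    temps, fluxes = [], []
    for f in forcing:
        for _ in range(substeps):  # forward Euler sub-steps within each year
            exchange = gamma * (t_upper - t_deep)
            d_upper = (f - lam * t_upper - eps * exchange) / c_upper
            d_deep = exchange / c_deep
            t_upper += dt * d_upper
            t_deep += dt * d_deep
        temps.append(t_upper)
        # TOA imbalance implied by the two layers: N = F - lam*T - (eps - 1)*gamma*(T - T_d)
        fluxes.append(f - lam * t_upper - (eps - 1.0) * gamma * (t_upper - t_deep))
    return temps, fluxes

def rmse(a, b):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

def calibrate(forcing, target_temps, years, window, lam_grid, eps_grid):
    """Grid search for the (lam, eps) pair minimizing temperature RMSE inside `window`."""
    idx = [i for i, y in enumerate(years) if window[0] <= y <= window[1]]
    best = None
    for lam in lam_grid:
        for eps in eps_grid:
            temps, _ = run_ebm2(forcing, lam, eps)
            score = rmse([temps[i] for i in idx], [target_temps[i] for i in idx])
            if best is None or score < best[0]:
                best = (score, lam, eps)
    return best  # (rmse, lam, eps)

# Synthetic demonstration: a smooth forcing ramp and a "target" generated by the
# same model with known parameters, standing in for an AOGCM ensemble mean.
years = list(range(1850, 2015))
forcing = [3.0 * ((y - 1850) / 164.0) ** 2 for y in years]
target_temps, _ = run_ebm2(forcing, lam=1.2, eps=1.3)

lam_grid = [round(0.6 + 0.1 * i, 2) for i in range(15)]  # 0.6 ... 2.0
eps_grid = [round(0.8 + 0.1 * i, 2) for i in range(11)]  # 0.8 ... 1.8

best_full = calibrate(forcing, target_temps, years, (1850, 2014), lam_grid, eps_grid)
best_late = calibrate(forcing, target_temps, years, (1980, 2014), lam_grid, eps_grid)
print("full period:", best_full)
print("1980-2014:  ", best_late)
```

Because the synthetic target is generated by the same model, both windows recover the same parameter pair. With AOGCM output as the target, the two windows would generally select different values, which is the sensitivity to the fitting period noted in the text.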
The emulations of the net radiative flux at the TOA (N) (Figure 3) show that close emulations of near-surface temperature can be produced despite poor emulations of N. There is a large spread in the emulations of N about the 1:1 line for all models. The emulation of N during the late twentieth and early twenty-first centuries is poor for HadGEM3-GC31-LL, and emulated N correlates only weakly with AOGCM N for NorESM2-LM. Optimization does not improve the emulation of N: emulated N changes only slightly for CanESM5 and NorESM2-LM, and the improved temperature emulations for HadGEM3-GC31-LL and IPSL-CM6-LR come at the expense of poorer emulations of N. This result is important because it demonstrates that climate model emulators can produce reasonable simulations of near-surface temperature change while misrepresenting the evolution of ocean heat uptake and the TOA energy imbalance, which limits their physical interpretation.
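To see why temperature and N can decouple in this way, it may help to write out the energy budget of a two-layer model with efficacy. The form below is an assumption about EBM2 (the common two-layer formulation with λ taken as positive), and the symbols C, C_d, γ and T_d (upper- and deep-layer heat capacities, heat exchange coefficient, and deep-ocean temperature anomaly) are introduced here for illustration only:

```latex
\begin{align}
C \frac{dT}{dt}       &= F - \lambda T - \varepsilon \gamma \,(T - T_d), \\
C_d \frac{dT_d}{dt}   &= \gamma \,(T - T_d), \\
N                     &= F - \lambda T - (\varepsilon - 1)\,\gamma \,(T - T_d).
\end{align}
```

Under this assumed form, the heat uptake term enters the surface temperature equation with weight ε but enters N with weight (ε − 1), so a combination of λ and ε that minimizes the temperature error is not guaranteed to reproduce N.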
We also constrain the λ and ε parameters separately for GHG and aerosol forcing using the DAMIP experiments. The constrained parameter values differ between the two types of forcing (Tables S2 and S3). They also vary with the period over which the RMSE is minimized.