Figure 2. Distribution of vertical levels in various SHiELD
configurations for a surface pressure of 1000 hPa and a standard
atmospheric temperature structure.
The simulation characteristics and prediction skill of SHiELD have been
previously discussed in several papers and will not be repeated here.
Improving predictions of tropical cyclone track, intensity, and genesis
has been a key driver of SHiELD development: Chen et al. (2019a)
describes the 2016 and 2017 versions, while the considerably improved
2018 version is described in Chen et al. (2019b). Most notably, SHiELD
greatly improves upon other global models’ ability to predict tropical
cyclone intensity. The large-scale prediction skill, and CONUS
precipitation and 2-m temperature skill, are briefly described for the
2016 and 2017 versions in Zhou et al. (2019) and Harris et al. (2019).
The anomaly correlation coefficient (ACC) of the 500-mb geopotential
height field is the standard means for evaluating the large-scale
prediction skill of medium-range prediction models. Figure 3 (top) shows
that the global ACC of SHiELD has been better at all lead times than the
contemporary GFS since the 2017 version, and significantly so on days
1–6. At all lead times except for days 7 and 8, each new version has
improved upon the previous version. The result for root-mean-square
error (RMSE; Figure 3, bottom) is even more striking: every version is
an improvement upon the previous at every lead time, and both the 2018
and 2019 versions are significantly better than the operational GFS.
Results for just the northern hemisphere (20°N–80°N, Supplemental
Figure S1) are less dramatic, but SHiELD still shows statistically
significant improvements in ACC and RMSE out to day 5. Both the GFS and
all versions of SHiELD reach an ACC of 0.6 at 8.3–8.5 days globally and
8.5–8.7 days in the northern hemisphere, with some year-to-year and
version-to-version variability.
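
For readers who want to reproduce scores of this kind, the following is a minimal sketch of the two metrics and of the lead time at which ACC falls to 0.6. It is not the verification code used for the figures here. It assumes fields on a regular latitude-longitude grid stored as nested lists, cosine-of-latitude area weighting, and the centered form of the ACC (area-mean anomaly removed); the function names and the linear interpolation between daily lead times are illustrative choices.

```python
import math


def _flatten(field):
    """Flatten a [lat][lon] nested list into a single list."""
    return [value for row in field for value in row]


def _weights(lats_deg, nlon):
    """cos(latitude) area weights, one per grid point (assumed regular grid)."""
    return [math.cos(math.radians(lat)) for lat in lats_deg for _ in range(nlon)]


def _wmean(values, weights):
    """Area-weighted mean."""
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def anomaly_correlation(forecast, analysis, climatology, lats_deg):
    """Centered, area-weighted anomaly correlation coefficient."""
    w = _weights(lats_deg, len(forecast[0]))
    clim = _flatten(climatology)
    f_anom = [f - c for f, c in zip(_flatten(forecast), clim)]
    a_anom = [a - c for a, c in zip(_flatten(analysis), clim)]
    f_mean, a_mean = _wmean(f_anom, w), _wmean(a_anom, w)
    cov = sum(wi * (f - f_mean) * (a - a_mean)
              for wi, f, a in zip(w, f_anom, a_anom))
    var_f = sum(wi * (f - f_mean) ** 2 for wi, f in zip(w, f_anom))
    var_a = sum(wi * (a - a_mean) ** 2 for wi, a in zip(w, a_anom))
    return cov / math.sqrt(var_f * var_a)


def rmse(forecast, analysis, lats_deg):
    """Area-weighted root-mean-square error."""
    w = _weights(lats_deg, len(forecast[0]))
    sq_err = [(f - a) ** 2
              for f, a in zip(_flatten(forecast), _flatten(analysis))]
    return math.sqrt(_wmean(sq_err, w))


def lead_time_at_threshold(lead_days, acc_values, threshold=0.6):
    """First lead time at which ACC drops below the threshold.

    Linear interpolation between adjacent lead times is an assumption;
    returns None if the threshold is never crossed.
    """
    pairs = list(zip(lead_days, acc_values))
    for (t0, a0), (t1, a1) in zip(pairs, pairs[1:]):
        if a0 >= threshold > a1:
            return t0 + (a0 - threshold) * (t1 - t0) / (a0 - a1)
    return None
```

Restricting the latitude rows passed to these functions to 20°N–80°N would give a northern hemisphere score under the same assumptions. Because the ACC here is centered, a spatially uniform bias lowers the RMSE score but leaves the ACC unchanged.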
The time series of day-5 global ACC and RMSE (Figure 4) shows that while
there is a general secular improvement in both SHiELD and the GFS, there
can be large seasonal and even interannual variability in forecast
skill. Usually, predictions are more skillful in northern winter, as
strong synoptic forcing dominates the large-scale weather patterns, but
some northern summers see little to no forecast degradation.
The implementation of GFSv13 on 11 May 2016, which included a major upgrade
to the data assimilation cycling system of the GFS, significantly
reduced RMSE in May and June 2016 compared to the preceding four months
of the year. These results are worthy of further investigation. We do
conclude that it may be misleading to use a short time period to
evaluate or compare global prediction models.
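
The sensitivity of a model comparison to the length of the evaluation period can be illustrated with a small sketch. It assumes two equal-length daily series of a score for which larger values are better (for example, day-5 ACC); the function names and the sliding-window approach are illustrative and are not the method used to produce the figures here.

```python
def windowed_mean_differences(score_a, score_b, window):
    """Mean of (score_a - score_b) over every contiguous window of days."""
    if len(score_a) != len(score_b):
        raise ValueError("series must have the same length")
    if not 1 <= window <= len(score_a):
        raise ValueError("window must be between 1 and the series length")
    diff = [a - b for a, b in zip(score_a, score_b)]
    total = sum(diff[:window])
    means = [total / window]
    # Slide the window one day at a time, updating the running sum.
    for i in range(window, len(diff)):
        total += diff[i] - diff[i - window]
        means.append(total / window)
    return means


def fraction_favoring_a(score_a, score_b, window):
    """Fraction of windows in which model A has the higher mean score."""
    means = windowed_mean_differences(score_a, score_b, window)
    return sum(m > 0 for m in means) / len(means)
```

If the fraction returned for short windows is far from 0 or 1 while the full-period difference is clearly of one sign, then the apparent winner depends on which short period happened to be chosen.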
The time evolution of the large-scale forecast skill is very similar for
the GFS and SHiELD on monthly and shorter time periods, which
is expected because they use identical initial conditions, and SHiELD
benefits from continual upgrades of the GFS initial conditions. As
discussed in Chen et al. (2019b), the quality of the initial conditions
is the preeminent factor in determining the forecast skill for the
large-scale circulation as well as for metrics such as hurricane track
forecasts that depend closely on the prediction skill of the large-scale
flow.
These results are for hindcasts, but the ACC and RMSE for our real-time
forecasts are nearly identical. An important caveat is that the
operational GFS supports nearly the entire NCEP modeling suite, and so
the GFS has many more demands and a much more stringent evaluation
process imposed upon its development than does SHiELD. The development
cycle of the GFS will therefore necessarily be less rapid and more
methodical than that of SHiELD. Conversely, an experimental
research model like SHiELD does have the freedom to pursue many
different avenues for model development (“failure is always an
option”) so that the most successful new ideas can later be
transitioned into operations, a major goal of the UFS.