Figure 11. Precipitation skill scores (top) and bias scores (bottom) relative to StageIV for 6-hr CONUS precipitation in three versions of C-SHiELD, shown for precipitation events exceeding each of three six-hourly accumulation thresholds (0.1, 5.0, and 25.0 mm). Skill is measured by both the Equitable Threat Score (ETS; Hogan and Mason 2012) and the Fractions Skill Score (FSS; Roberts and Lean 2008). C-SHiELD 2017 is validated from May 2017 to May 2018; C-SHiELD 2018 is validated from April 2018 to May 2019; C-SHiELD 2019 is validated from January to December 2019. Validation is performed on the 4-km StageIV grid using 3×3 neighborhoods, corresponding to a 12-km radius.
Precipitation forecast skill (Figure 11, top panels) is similar among all three versions of C-SHiELD. The 2019 version has the least overall bias (Figure 11, bottom panels), as earlier versions had too much light and too little heavy precipitation. The 2019 version reduced the diurnal cycle in the bias of light and moderate precipitation, although a diurnal cycle was still apparent in the bias score for heavy precipitation, which also retained a prominent high bias during the first 30 hours. We speculate that the re-configuration of the numerical diffusion, which improved storm placement, and the revised settings for the GFDL microphysics, which improved the structure and evolution of the storms, combined to improve the biases in the 2019 version.
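For reference, the ETS and bias score discussed above can be written in terms of a 2x2 contingency table of forecast and observed threshold exceedances. The following is a minimal sketch using the standard textbook definitions, not the verification code used for Figure 11; the function names and the sample values in the test are hypothetical, and the neighborhood treatment is omitted.

```python
def contingency_counts(forecast, observed, threshold):
    """Count hits, misses, false alarms and correct negatives.

    An event is an accumulation strictly greater than `threshold` (mm).
    `forecast` and `observed` are equal-length sequences of accumulations
    at matching grid points (hypothetical inputs for illustration).
    """
    hits = misses = false_alarms = correct_negatives = 0
    for f, o in zip(forecast, observed):
        f_event = f > threshold
        o_event = o > threshold
        if f_event and o_event:
            hits += 1
        elif o_event:
            misses += 1
        elif f_event:
            false_alarms += 1
        else:
            correct_negatives += 1
    return hits, misses, false_alarms, correct_negatives


def equitable_threat_score(hits, misses, false_alarms, correct_negatives):
    """ETS: threat score corrected for hits expected by random chance."""
    total = hits + misses + false_alarms + correct_negatives
    if total == 0:
        return float("nan")
    hits_random = (hits + misses) * (hits + false_alarms) / total
    denominator = hits + misses + false_alarms - hits_random
    if denominator == 0:
        return float("nan")
    return (hits - hits_random) / denominator


def frequency_bias(hits, misses, false_alarms):
    """Ratio of forecast event count to observed event count (1 = unbiased)."""
    observed_events = hits + misses
    if observed_events == 0:
        return float("nan")
    return (hits + false_alarms) / observed_events
```

With this definition a bias score above 1 indicates that events exceeding the threshold are forecast more often than they are observed, and a score below 1 indicates the opposite.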
We use the surrogate severe technique of Sobash et al. (2011) to validate our 2–5 km updraft helicity (UH) fields against storm reports from the Storm Prediction Center. This is a well-established method for evaluating convective-scale prediction models (cf. https://hwt.nssl.noaa.gov/sfe/2018/docs/HWT_SFE_2018_Prelim_Findings_v1.pdf). We create surrogate severe fields and validate them against observed severe fields to compute FSS and bias scores for C-SHiELD, and plot the results as a function of UH threshold and smoothing radius (Figure 12), similar to Figure 17 in Sobash et al. (2016). For all versions of C-SHiELD the highest FSS is found at the largest smoothing radius, 240 km, and for UH thresholds of 150–200 m² s⁻², with slightly higher or lower thresholds giving similar skill scores. The UH threshold giving the best score for C-SHiELD is higher than in many other convective-scale models because of the significantly higher updraft helicities in FV3-based models (Potvin et al. 2019). This in turn is likely due to the emphasis on vorticity in the horizontal discretization as described in Harris2019.
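The procedure described above can be sketched as follows: threshold the UH field to obtain a binary surrogate report field, smooth it with a Gaussian kernel to obtain a probability-like field, treat the observed reports in the same way, and compare the two with the FSS. This is an illustrative sketch only, not the code used to produce Figure 12. The function names, the expression of the smoothing length in grid cells (`sigma_cells`), the truncation of the kernel at three standard deviations, and the renormalization of the kernel at domain edges are all assumptions made for this example.

```python
import math


def gaussian_smooth(field, sigma_cells):
    """Smooth a 2-D list of floats with an isotropic Gaussian kernel.

    The kernel is truncated at 3 sigma and renormalized over the cells
    that fall inside the domain (an edge-handling assumption).
    """
    ny, nx = len(field), len(field[0])
    radius = max(1, int(math.ceil(3 * sigma_cells)))
    smoothed = [[0.0] * nx for _ in range(ny)]
    for j in range(ny):
        for i in range(nx):
            weight_sum = 0.0
            value_sum = 0.0
            for dj in range(-radius, radius + 1):
                for di in range(-radius, radius + 1):
                    jj, ii = j + dj, i + di
                    if 0 <= jj < ny and 0 <= ii < nx:
                        dist2 = dj * dj + di * di
                        w = math.exp(-dist2 / (2.0 * sigma_cells ** 2))
                        weight_sum += w
                        value_sum += w * field[jj][ii]
            smoothed[j][i] = value_sum / weight_sum
    return smoothed


def surrogate_severe(uh_max, uh_threshold, sigma_cells):
    """Binary UH exceedance field smoothed into a probability-like field."""
    binary = [[1.0 if v >= uh_threshold else 0.0 for v in row]
              for row in uh_max]
    return gaussian_smooth(binary, sigma_cells)


def fractions_skill_score(forecast_prob, observed_prob):
    """FSS = 1 - sum((Pf - Po)^2) / (sum(Pf^2) + sum(Po^2))."""
    numerator = 0.0
    reference = 0.0
    for f_row, o_row in zip(forecast_prob, observed_prob):
        for f, o in zip(f_row, o_row):
            numerator += (f - o) ** 2
            reference += f * f + o * o
    if reference == 0.0:
        return float("nan")
    return 1.0 - numerator / reference
```

Repeating the calculation over a range of `uh_threshold` and `sigma_cells` values gives a two-dimensional table of FSS of the kind plotted against UH threshold and smoothing radius.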
The maximum FSS in the 2018 and 2019 versions is about 0.8, on par with operational and research convective-scale models (cf. Sobash et al. 2019) and significantly higher than in the 2017 version. There is a uniform over-prediction bias for all but the highest UH thresholds (Figure 12, bottom row). This bias was significant in the 2017 version but has decreased with each subsequent version for most threshold-radius combinations; for the highest-FSS combination it decreases from 0.47 in 2017 to 0.22 in 2019. C-SHiELD 2019 still has a high frequency bias except at the very highest UH thresholds, as it is still too aggressive in creating strong storms.