Figure 5. Results of a bootstrapping analysis applied to the
B3LYP contact density. Regression lines corresponding to bootstrap
samples (1000 in each case) are coloured blue; their means are marked as
black lines. The red bands represent 95% confidence intervals of
prediction (assuming a normal distribution). (A) Result for all data
points (i.e. excluding 9 and 10, not shown). The
histogram in the inset shows the (non-normal) distribution of function
values for small values of the contact density. For the histogram, 10000
bootstrap samples were drawn. (B) Result for a data set that excludes
data points 5, 6, 7, and 8 (coloured
gray) from the data set shown in (A).
For small values of the contact density, the ensemble of regression
lines (blue) is skewed toward larger values of the isomer shift. This
non-normal distribution of function values is illustrated in the inset
(Figure 5A) and can be explained by the presence of the cluster of four
data points at the far left of the calibration plot. Neglecting these
four data points, which correspond to complexes 5, 6, 7, and 8, changes the distribution of regression lines
and their mean significantly (Figure 5B). Both intercept and slope
(which are strongly correlated due to the large values of the contact
density) decrease, which also decreases the ability to discriminate
between predictions. In the absence of additional data points at
intermediate to low contact densities, it is difficult to provide a
conclusive answer as to which regression line is more reliable.
Nevertheless, we consider the cluster of four data points a valuable
addition to the data set for two reasons: (i) the complexes associated
with this cluster (5, 6, 7, and 8)
have different structural motifs, providing an argument against a
systematic bias; (ii) coefficients of linear regression models
(intercept and slope) tend to be biased toward small absolute values
(“regression toward the mean”, ref 130).
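The resampling procedure discussed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used to produce Figure 5: it relies on NumPy, resamples (x, y) pairs with replacement (one common bootstrap variant; the exact scheme used in this work is not specified here), and runs on synthetic stand-in data rather than the reported contact densities and isomer shifts. The function names `bootstrap_regression` and `percentile_band` are hypothetical. The percentile band is shown as one option that does not rely on the normal-distribution assumption.

```python
import numpy as np


def bootstrap_regression(x, y, n_boot=1000, seed=0):
    """Fit a straight line to each of n_boot resamples of the (x, y) pairs."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    coeffs = np.empty((n_boot, 2))  # columns: slope, intercept
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # draw n pairs with replacement
        while np.unique(x[idx]).size < 2:  # a line needs two distinct x values
            idx = rng.integers(0, n, size=n)
        coeffs[b] = np.polyfit(x[idx], y[idx], deg=1)
    return coeffs


def percentile_band(coeffs, grid, level=0.95):
    """Mean line and percentile band of the bootstrap lines on a grid of x values."""
    grid = np.asarray(grid, dtype=float)
    lines = coeffs[:, [0]] * grid + coeffs[:, [1]]  # shape (n_boot, len(grid))
    tail = 100.0 * (1.0 - level) / 2.0
    lower, upper = np.percentile(lines, [tail, 100.0 - tail], axis=0)
    return lines.mean(axis=0), lower, upper


# Synthetic stand-in data (NOT the values from this work): x lies far from zero.
demo_rng = np.random.default_rng(1)
x_demo = 100.0 + demo_rng.uniform(0.0, 5.0, size=30)
y_demo = 0.5 - 0.3 * (x_demo - 100.0) + demo_rng.normal(0.0, 0.05, size=30)

coeffs_demo = bootstrap_regression(x_demo, y_demo, n_boot=1000)
mean_line, lower, upper = percentile_band(coeffs_demo, np.linspace(100.0, 105.0, 6))
slope_intercept_corr = np.corrcoef(coeffs_demo[:, 0], coeffs_demo[:, 1])[0, 1]
print(f"slope/intercept correlation: {slope_intercept_corr:.3f}")
```

Because the synthetic x values lie far from zero, the bootstrap slopes and intercepts in this example are strongly anticorrelated, which mirrors the correlation attributed above to the large values of the contact density. Dropping selected points from `x_demo` and `y_demo` before resampling would be the analogue of the comparison between panels (A) and (B).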
In an effort to future-proof the calibration presented here and the
statistical analysis, we constructed a tool that allows more data points
to be included, so that firmer statistical conclusions can be drawn
beyond the data reported here. To this end, we set up an online database
(tinyurl.com/mbs-notebook),
which is publicly accessible and open to submissions from other
researchers. This database can be used in at least three ways:
- Obtaining a predicted isomer shift or quadrupole splitting,
including the associated uncertainty estimates, simply by typing in the computed contact density or quadrupole splitting.
- Submitting reference data points for additional complexes to obtain
more rigorous statistics; the data points will be reviewed by the
authors.
- Obtaining complete statistical analyses by submitting new data sets
computed with a different computational setup, e.g. different basis
sets, solvation models, relativistic corrections, etc.; the data sets
will be validated by the authors.
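The first use case, a predicted value with an associated uncertainty, can be illustrated with a short sketch. It is not the implementation behind the database, which is not described here. It assumes an ordinary least-squares calibration line and the textbook prediction interval with a normal quantile, in line with the normal-distribution assumption stated in the caption of Figure 5. The function name `predict_with_interval` and the numbers in the example are hypothetical.

```python
import math
from statistics import NormalDist


def predict_with_interval(x, y, x_new, level=0.95):
    """Least-squares prediction at x_new with a normal-theory prediction interval."""
    n = len(x)
    if n < 3:
        raise ValueError("at least three calibration points are required")
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    # Residual variance with n - 2 degrees of freedom
    s2 = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    # Standard error of a new observation at x_new
    se_pred = math.sqrt(s2 * (1.0 + 1.0 / n + (x_new - x_mean) ** 2 / sxx))
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 1.96 for level=0.95
    y_hat = intercept + slope * x_new
    return y_hat, y_hat - z * se_pred, y_hat + z * se_pred


# Made-up calibration points for illustration only (not data from this work)
x_cal = [0.0, 1.0, 2.0, 3.0, 4.0]
y_cal = [0.0, 1.2, 1.9, 3.1, 3.9]
print(predict_with_interval(x_cal, y_cal, x_new=2.0))
print(predict_with_interval(x_cal, y_cal, x_new=10.0))
```

The interval widens as `x_new` moves away from the mean of the calibration points, which is why predictions in sparsely populated regions of a calibration plot carry larger uncertainties. For small data sets, a Student t quantile would give somewhat wider intervals than the normal quantile used in this sketch.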
With this database, the authors provide a tool for the prediction and
rigorous statistical analysis of computed Mössbauer parameters that will
hopefully be of value for all researchers interested in the analysis of
electronic structures with ⁵⁷Fe Mössbauer
spectroscopy.