Fig. 1: Isotope range of certified reference materials and
working standards used in this study (refs. 11, 18-20)
2.2 Statistical analyses
To test normalization accuracy, we used only the eight certified reference
materials as calibration standards, while the laboratory working
standards were used to test linearity in the two instruments. The
isotope compositions of the certified and working standards were
normalized to the working gas (Eq. 1) in the vendor IRMS software
(Isodat and lyticOS for the Thermo Delta-V and Elementar VisION,
respectively) and then exported in a tabular format; all subsequent
normalizations and analyses (Eqs. 3-5) were performed in R version
4.2.1 (ref. 35). For each normalization, two certified
reference materials were designated as quality controls. Quality
controls were excluded from the normalization calculation, and the
performance of the normalization was assessed using the average observed
isotope composition of the quality controls relative to their expected
value (ref. 36). For each combination of quality controls (28
unique combinations), all possible one-point, two-point, three-point,
and four-point combinations of the remaining certified reference
materials were determined for a total of 1568 combinations. These
remaining certified reference materials were used as calibration
standards. Using those combinations of calibration standards and quality
controls, one-point anchoring and multipoint linear normalizations were
calculated for each element (C and N) and facility (2) for a total of
6272 normalizations. Two-point normalizations composed of IAEA 600 and
USGS 91 were excluded from subsequent data analysis and visualization
because the small isotope range between those standards (<1‰)
precluded an accurate calculation of a realistic expansion coefficient.
Although these standards could be used for a two-point anchoring using
an expansion coefficient derived from a different multipoint
normalization, assessing that method is beyond the scope of this study.
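The combination counts above can be checked by direct enumeration. The following is a minimal Python sketch rather than the R code used in this study; the standard names (`CRM1` to `CRM8`) are placeholders, and the exclusion of the IAEA 600 and USGS 91 two-point normalizations is not applied here.

```python
from itertools import combinations
from math import comb

N_CRMS = 8               # certified reference materials (from the text)
N_QC = 2                 # quality controls per normalization (from the text)
POINTS = (1, 2, 3, 4)    # numbers of calibration standards considered
N_ELEMENTS = 2           # C and N
N_FACILITIES = 2         # two instruments/facilities

# Placeholder names; the actual reference materials are listed in Fig. 1.
crms = [f"CRM{i}" for i in range(1, N_CRMS + 1)]

designs = []
for qc in combinations(crms, N_QC):
    # Quality controls are excluded from the calibration pool.
    remaining = [s for s in crms if s not in qc]
    for k in POINTS:
        for cal in combinations(remaining, k):
            designs.append((qc, cal))

n_qc_pairs = comb(N_CRMS, N_QC)
n_designs = len(designs)
n_normalizations = n_designs * N_ELEMENTS * N_FACILITIES

print(n_qc_pairs, n_designs, n_normalizations)  # 28 1568 6272
```

Each of the 28 quality-control pairs leaves six candidate calibration standards, which give 6 + 15 + 20 + 15 = 56 calibration sets, so 28 × 56 = 1568 designs and 1568 × 4 = 6272 normalizations.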
To assess how instrument accuracy was impacted by the selection of
standards and quality controls, the normalizations were characterized
according to their isotope range, the matrix of the standards relative
to the quality controls, and whether the normalization was extrapolated.
The isotope range of each normalization was calculated for each element
as the difference between the maximum and minimum expected isotope
composition of the calibration standards used in that normalization. If
the expected isotope values of both quality controls fell outside the
isotope range of the calibration standards, then the normalization was
classified as an “extrapolation”. One-point normalizations, which have
an isotope range of zero, were classified as an “extrapolation” if the
single calibration standard was not bracketed by the two quality
controls. Finally, the matrix of each standard was classified as high
organic (i.e., protein, caffeine, collagen, L-glutamic acid) or plant
(i.e., plant tissue, flour). If the matrix of the calibration standards
matched the matrix of the quality controls, then the normalization was
classified as “matrix matched”, while if the matrices of the calibration
standards and quality controls differed (e.g., high organic
standards used to normalize plant quality controls), then the
normalization was classified as “matrix mixed”. If both the quality
controls and the calibration standards were composed of a combination of
plants and high organics, then the normalization was classified as
“both mixed.”
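The classification rules above can be expressed as a short Python sketch. This is an illustration under stated assumptions, not the code used in this study. The function names and the δ values in the usage lines are hypothetical. "Bracketed" is taken to include equality. Cases where only one of the two sets combines plant and high organic matrices are assigned to “matrix mixed”, which is an interpretation of the definitions rather than a rule stated in the text.

```python
def isotope_range(cal_values):
    """Difference between the max and min expected values of the calibration standards."""
    return max(cal_values) - min(cal_values)


def is_extrapolated(cal_values, qc_values):
    """True if the normalization counts as an "extrapolation"."""
    lo, hi = min(cal_values), max(cal_values)
    if len(cal_values) == 1:
        # One-point case: extrapolated unless the single standard is
        # bracketed by the two quality controls (inclusive bounds assumed).
        return not (min(qc_values) <= lo <= max(qc_values))
    # Multipoint case: both quality controls must fall outside the range.
    return all(v < lo or v > hi for v in qc_values)


def matrix_class(cal_matrices, qc_matrices):
    """Classify by matrix; each matrix label is "plant" or "high organic"."""
    cal, qc = set(cal_matrices), set(qc_matrices)
    if len(cal) > 1 and len(qc) > 1:
        return "both mixed"
    if cal == qc:
        return "matrix matched"
    # Assumption: any other mismatch is treated as "matrix mixed".
    return "matrix mixed"


# Hypothetical expected values (‰), for illustration only.
print(isotope_range([-26.0, -10.5, 2.0]))                       # 28.0
print(is_extrapolated([-26.0, -10.5, 2.0], [-30.0, 5.0]))       # True
print(is_extrapolated([0.0], [-5.0, 5.0]))                      # False
print(matrix_class(["high organic", "high organic"], ["plant", "plant"]))  # matrix mixed
```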
The significance of differences between normalization
methodologies, facilities, matrices, and extrapolation status was
assessed using Kruskal-Wallis testing with Dunn's post-hoc testing after
the assumption of normality was rejected with Shapiro-Wilk
testing (ref. 35).
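For readers unfamiliar with the Kruskal-Wallis test, the following Python sketch computes its H statistic from ranked data, with the standard tie correction. It is a teaching illustration only: the analyses in this study were run in R, the function names are hypothetical, and the error values in the usage lines are invented for demonstration. The p-value, which comes from a chi-square distribution with k - 1 degrees of freedom for k groups, and the Dunn's post-hoc comparisons are not implemented here.

```python
from collections import Counter


def average_ranks(values):
    """Return 1-based ranks, assigning tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Tie-corrected Kruskal-Wallis H for a list of groups (lists of numbers).

    Assumes at least two distinct values overall, so the tie correction is non-zero.
    """
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    return h / (1 - ties / (n ** 3 - n))


# Invented absolute errors (‰) for three hypothetical normalization methods.
errors = [[0.10, 0.12, 0.15], [0.22, 0.30, 0.25], [0.07, 0.09, 0.11]]
print(kruskal_wallis_h(errors))
```

A larger H indicates that the rank distributions of the groups differ more than expected by chance. In practice a statistics package would be used to obtain H, the p-value, and the post-hoc comparisons together.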
Results
Normalization methodology comparison
The impact of the number of calibration standards on normalization
error was considered for two subsets of the data: normalizations that
were matrix-matched and bounded (i.e., not extrapolated), and normalizations
that were matrix-mixed and extrapolated. The former set of conditions was
hypothesized to perform better than the latter. When normalizations were
matrix-matched and bounded, no significant difference for either element
was observed between one-point, two-point, three-point, and four-point
normalizations, although the variance of two-point normalizations was
higher than that of the other methods (Fig. 2A). When the analysis was constrained
to normalizations that were matrix-mixed and extrapolated, the number of
standards used significantly affected the accuracy of the
normalization (Fig. 2B). Two-point N normalizations (median error =
0.232‰, n = 274) had significantly higher error than one-point (median
error = 0.119‰, n = 180, p < 0.0001), three-point
(median error = 0.118‰, n = 278, p < 0.0001), and
four-point normalizations (median error = 0.070‰, n = 170, p < 0.0001).
Furthermore, three-point normalizations had
significantly higher error than four-point normalizations (p =
0.021). Similarly, two-point C normalizations (median error = 0.308‰)
exhibited significantly higher error than one-point (median error =
0.209‰, p = 0.0005), three-point (median error = 0.187‰, p < 0.0001),
and four-point normalizations (median error =
0.148‰, p < 0.0001).