Fig. 1: Isotope range of certified reference materials and working standards used in this study (refs. 11, 18-20)

2.2 Statistical analyses

To test normalization accuracy, we used only the eight certified reference materials as calibration standards; the laboratory working standards were used to test linearity in the two instruments. The isotope compositions of the certified and working standards were normalized to the working gas (Eq. 1) in the vendor IRMS software (Isodat and lyticOS for the Thermo Delta-V and Elementar VisION, respectively) and then exported in a tabular format; all subsequent normalizations and analyses (Eq. 3-5) were performed in R version 4.2.1 (ref. 35). For each normalization, two certified reference materials were designated as quality controls. Quality controls were excluded from the normalization calculation, and the performance of the normalization was assessed using the average observed isotope composition of the quality controls relative to their expected values (ref. 36). For each pair of quality controls (28 unique pairs), all possible one-point, two-point, three-point, and four-point combinations of the remaining six certified reference materials, which served as calibration standards, were determined (56 per pair), for a total of 1568 combinations. Using those combinations of calibration standards and quality controls, one-point anchoring and multipoint linear normalizations were calculated for each element (C and N) and each of the two facilities, for a total of 6272 normalizations. Two-point normalizations composed of IAEA 600 and USGS 91 were excluded from subsequent data analysis and visualization because the small isotope range between those standards (<1‰) precluded the calculation of a realistic expansion coefficient. Although these standards could be used for two-point anchoring with an expansion coefficient derived from a different multipoint normalization, assessing that method is beyond the scope of this study.
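The combination counts above can be checked with a short enumeration. The sketch below is illustrative only and is not the authors' R code; it is written in Python, and the labels `CRM1` to `CRM8` are hypothetical placeholders for the eight certified reference materials.

```python
from itertools import combinations

# Hypothetical labels standing in for the eight certified reference materials.
crms = [f"CRM{i}" for i in range(1, 9)]

designs = []
for qcs in combinations(crms, 2):                 # every pair of quality controls
    remaining = [s for s in crms if s not in qcs]  # six candidate calibration standards
    for k in (1, 2, 3, 4):                         # one- to four-point normalizations
        for cal in combinations(remaining, k):
            designs.append((qcs, cal))

n_qc_pairs = len(list(combinations(crms, 2)))
n_designs = len(designs)
n_total = n_designs * 2 * 2                        # two elements (C, N) x two facilities

print(n_qc_pairs, n_designs, n_total)
```

Each entry in `designs` pairs one set of quality controls with one set of calibration standards, so the same list could drive a loop that fits and scores each normalization.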
To assess how instrument accuracy was affected by the selection of standards and quality controls, the normalizations were characterized according to their isotope range, the matrix of the standards relative to the quality controls, and whether the normalization was extrapolated. The isotope range of each normalization was calculated for each element as the difference between the maximum and minimum expected isotope composition of the calibration standards used in that normalization. If the expected isotope values of both quality controls fell outside the isotope range of the calibration standards, then the normalization was classified as an “extrapolation”. One-point normalizations, which have an isotope range of zero, were classified as an “extrapolation” if the single calibration standard was not bracketed by the two quality controls. Finally, the matrix of each standard was classified as high organic (i.e., protein, caffeine, collagen, L-glutamic acid) or plant (i.e., plant tissue, flour). If the matrix of the calibration standards matched the matrix of the quality controls, then the normalization was classified as “matrix matched”, whereas if the matrices of the calibration standards and quality controls differed (e.g., high organic standards used to normalize plant quality controls), then the normalization was classified as “matrix mixed”. If both the quality controls and the calibration standards were composed of a combination of plants and high organics, then the normalization was classified as “both mixed.”
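The classification rules above can be sketched as a small function. This is a minimal Python illustration under stated assumptions, not the authors' implementation: the function name `classify`, the tuple layout, and the example values are hypothetical; a quality control lying exactly on the edge of the calibration range is treated as inside it; and any mismatch not covered by the "both mixed" rule is labelled "matrix mixed".

```python
def classify(cal, qc):
    """Classify one normalization.

    cal, qc: lists of (expected_delta, matrix) tuples for the calibration
    standards and the two quality controls; matrix is "high organic" or "plant".
    Returns (isotope_range, extrapolated, matrix_class).
    """
    cal_values = [v for v, _ in cal]
    qc_values = [v for v, _ in qc]
    lo, hi = min(cal_values), max(cal_values)
    isotope_range = hi - lo

    if len(cal) == 1:
        # One-point: extrapolated unless the single standard is bracketed by the QCs.
        extrapolated = not (min(qc_values) <= lo <= max(qc_values))
    else:
        # Multipoint: extrapolated only if both QCs fall outside the calibration range
        # (assumption: values on the boundary count as inside).
        extrapolated = all(v < lo or v > hi for v in qc_values)

    cal_matrices = {m for _, m in cal}
    qc_matrices = {m for _, m in qc}
    if len(cal_matrices) > 1 and len(qc_matrices) > 1:
        matrix_class = "both mixed"
    elif cal_matrices == qc_matrices:
        matrix_class = "matrix matched"
    else:
        matrix_class = "matrix mixed"  # assumption: covers every other mismatch
    return isotope_range, extrapolated, matrix_class


# Hypothetical expected delta values (per mil), for illustration only.
bounded = classify([(-10.0, "plant"), (5.0, "plant")],
                   [(0.0, "plant"), (2.0, "plant")])
one_point = classify([(3.0, "high organic")],
                     [(0.0, "plant"), (2.0, "plant")])
```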
The significance of differences among normalization methodologies, facilities, matrices, and extrapolation statuses was assessed using Kruskal-Wallis tests with Dunn's post hoc tests after the assumption of normality was rejected by Shapiro-Wilk tests (ref. 35).
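For readers unfamiliar with the Kruskal-Wallis test, the sketch below computes its H statistic in plain Python. It is a didactic illustration, not the R workflow used in the study: the function name `kruskal_wallis_h` and the example values are hypothetical, and the p-value (from a chi-square distribution with k - 1 degrees of freedom) and Dunn's post hoc comparisons are deliberately left out.

```python
from collections import Counter


def kruskal_wallis_h(*groups):
    """Tie-corrected Kruskal-Wallis H statistic for two or more groups of values."""
    pooled = sorted(v for g in groups for v in g)
    n = len(pooled)

    # Assign each distinct value the mean of the ranks it occupies (mid-ranks).
    rank = {}
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        rank[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j

    # H is built from the squared rank sum of each group, scaled by group size.
    rank_term = sum(sum(rank[v] for v in g) ** 2 / len(g) for g in groups)
    h = 12 / (n * (n + 1)) * rank_term - 3 * (n + 1)

    # Standard tie correction; equals 1 when there are no ties.
    # It is zero (division error) if every pooled value is identical.
    ties = Counter(pooled).values()
    correction = 1 - sum(t ** 3 - t for t in ties) / (n ** 3 - n)
    return h / correction


# Hypothetical values for three groups, for illustration only.
h_stat = kruskal_wallis_h([1, 2, 3], [4, 5, 6], [7, 8, 9])
```

A large H relative to the chi-square reference distribution indicates that at least one group tends to rank higher or lower than the others, which is what motivates the pairwise post hoc comparisons.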

Results

  1. Normalization methodology comparison

The impact of the number of calibration standards on normalization error was considered for two subsets of the data: normalizations that were matrix-matched and bounded (i.e., not extrapolated), and normalizations that were matrix-mixed and extrapolated. The former set of conditions was hypothesized to perform better than the latter. When normalizations were matrix-matched and bounded, no significant difference for either element was observed among one-point, two-point, three-point, and four-point normalizations, although the variance of two-point normalizations was higher than that of the other methods (Fig. 2A). When the analysis was constrained to normalizations that were matrix-mixed and extrapolated, the number of standards used significantly affected the accuracy of the normalization (Fig. 2B). Two-point N normalizations (median error = 0.232‰, n = 274) had significantly higher error than one-point (median error = 0.119‰, n = 180, p < 0.0001), three-point (median error = 0.118‰, n = 278, p < 0.0001), and four-point normalizations (median error = 0.070‰, n = 170, p < 0.0001). Furthermore, three-point N normalizations had significantly higher error than four-point N normalizations (p = 0.021). Similarly, two-point C normalizations (median error = 0.308‰) exhibited significantly higher error than one-point (median error = 0.209‰, p = 0.0005), three-point (median error = 0.187‰, p < 0.0001), and four-point normalizations (median error = 0.148‰, p < 0.0001).