Discussion
The aims of this study were (1) to evaluate the wide applicability of
proteomic fingerprinting for species identification in marine science
across different metazoan phyla and classes, (2) to identify critical
steps in sample preparation and data processing, and (3) to contribute
to the development of standard procedures and best practices for
MALDI-TOF MS-based metazoan classification. The general applicability to
metazoans has been demonstrated before (Mazzeo et al., 2008; Dieme et al.,
2014; Yssouf et al., 2014; Flaudrops et al., 2015; Mazzeo and Siciliano,
2016; Maász et al., 2017; Rossel and Martínez Arbizu, 2019; Rossel et
al., 2020a). However, here we show for the first time the applicability
of this method to a large taxonomic range using a comprehensive data set
with an overall species identification success rate of 93%.
Similarly high identification success rates at species level were observed
for individual metazoan groups (Hynek et al., 2018; Vega-Rúa et al.,
2018; Holst et al., 2019; Loaiza et al., 2019; Rakotonirina et al.,
2020; Rossel et al., 2020a). Additionally, our results show that
specimens of species absent from the reference library are assigned to the
correct phylum or class with high probability, implying a
phylogenetic signal at higher taxonomic levels, as was already reported
for congeneric Drosophila (Feltens et al., 2010). Testing
whether specimens would be assigned to a congeneric species in the absence of
the actual species was less promising in our study, with only 30% of
specimens being assigned to a congeneric species. This is consistent with
other studies that show only occasional similarity of congeneric species,
e.g. in cluster analyses, but without consistency across all congeneric
species (Laakmann et al., 2013; Chavy et al., 2019; Rossel and Martínez
Arbizu, 2019).
In closely related species, morphological identification can often be
complicated. Using proteomic fingerprinting, however, these problems can
be resolved, as indicated by the analysis of the A. irregularis complex.
Even though the mass spectra show high similarities, distinct
patterns of peak presence and absence as well as pronounced differences
in relative peak intensities serve as good markers for species
identification. Beyond mere species identification, the example of
E. acutifrons shows the power of the method to differentiate
specimens even by sex. This has been shown before, e.g. for the
fish species Alburnus alburnus (Linnaeus, 1758) (Maász et al.,
2017). Whereas those authors focused on presence and absence of peaks, we were
able to show that relative intensities of certain mass peaks also play
an important role in differentiation of sexes. Prior studies on larger
planktonic copepods have also shown a great potential for
differentiation of developmental stages based on a proteomic fingerprint
(Rossel et al., 2022).
Finally, we have shown the necessity of comprehensive reference
libraries. Low numbers of specimens per species in reference libraries
fail to provide sufficient information on species-specific mass spectral
features and intraspecific variability. Only with around nine to ten
reference specimens per species does the identification error stabilize at
a consistently low level. This supports findings by Rakotonirina et al.
(2020), who found that identification scores increased with increasing
numbers of available main spectrum patterns. In general, we
recommend using more than three specimens per species and preferably
including around ten specimens for every species in a reference library.
MALDI-TOF MS can be used as a universal method for the
identification of metazoan species. Due to the short preparation time,
low costs (Tran et al., 2015; Rossel et al., 2019) and high
identification success, it can be a valuable tool in biodiversity
assessments, replacing time-intensive morphological identification or
costly DNA barcoding. Especially in cases of closely related or very
similar species, it can facilitate rapid identification. The applicability
of proteomic fingerprinting for the differentiation of cryptic species
has already been shown, and even in cases of morphologically very similar
species, differences were still found (Müller et al., 2013; Paulus et
al., 2022).
Tissue samples used in this work were obtained from specimens stored
for between seven and 12 years under partly unknown storage conditions. We
assume working with fresh or recently fixed material would have resulted
in even higher identification success rates. This is supported by the
high mass spectra quality obtained from fish species, which were
extracted and put into freezer storage almost immediately after sampling
(Knebelsberger, personal communication). The adverse effect of fixation
and storage on resulting mass spectra quality in metazoans has been
investigated several times, and the findings support this assumption (Rossel and
Martínez Arbizu, 2018b; Rakotonirina et al., 2020). We obtained good
results for storage at -20°C and also for long-term storage at -80°C;
we therefore recommend cold storage of samples at -20°C until further
systematic analyses specify threshold temperatures for short-term
(months) or long-term (years) storage.
Our tests have shown that sample concentration is pivotal to obtain good
quality mass spectra. While sample/matrix ratios that are too low result in
lower intensities and a higher baseline, too much tissue increases
the noise in the data and results in unsuccessful measurements. For all
investigated taxa, the same sample preparation method was used; however,
attention must be paid to the correct ratio of matrix to the compound to be
analyzed. This allows the wide application of this method without
adaptation of the protocol to a certain species, as would be necessary
for methods such as COI barcoding, where certain groups need highly
specific sets of amplification primers (Lohman et al., 2009; Toumi et
al., 2013) and adjustment of PCR settings.
Much effort is put into optimizing mass spectra quality by adjusting
different preparation protocols (Jeverica et al., 2018; Wang et al.,
2021) or developing methods for steps such as baseline correction,
smoothing or peak picking (Ressom et al., 2007; Shin et al., 2010).
Methods are adjusted either to increase classification success or to
obtain better mass spectra reproducibility. Here, we tested the
influence of certain steps during data processing on classification
success, focusing on the important steps for peak detection. Whereas
baseline subtraction and adjustment of the SNR value both aim at reducing
noise within the data, adjusting the HWS influences the peak picking
resolution. Decreasing the HWS during peak detection increases the
number of peaks, because only the highest peak within each window is
detected. As a result, peaks of very similar size are
recognized as distinct peaks rather than being merged into a single
bin. This also explains the strong effect of the parameters SNR and
HWS compared to baseline subtraction. Baseline subtraction is
limited to reducing instrument-dependent noise. Adjusting
the SNR value, however, like altering the HWS, affects the number of
more dominant peaks and thus the general resolution of the mass spectra.
Hence, more species-specific information is retained and more
information is available for classification. Based on our results,
rather than testing all variables, adjusting SNR and HWS should be
adequate to optimize the data pipeline. However, it needs to be
emphasized that this pipeline aims at optimizing species identification
and may not be adequate for investigation of intraspecific variability
as was shown elsewhere (ref. 16).
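The interplay of the two parameters can be illustrated with a minimal sketch. It assumes that HWS denotes the half window size (in data points) and SNR a signal-to-noise threshold, and it estimates noise as the unscaled median absolute deviation. The function name `detect_peaks`, the noise estimator and the toy spectrum are illustrative assumptions, not the pipeline used in this study.

```python
from statistics import median


def detect_peaks(intensities, hws, snr):
    """Return indices of local maxima whose intensity exceeds snr * noise.

    A point counts as a peak if it is the highest value within
    +/- hws data points and lies above snr times the noise level.
    Noise is estimated as the unscaled median absolute deviation,
    which is an assumption of this sketch.
    """
    med = median(intensities)
    noise = median(abs(x - med) for x in intensities)
    peaks = []
    for i, value in enumerate(intensities):
        # Window of +/- hws points around position i, clipped at the edges.
        window = intensities[max(0, i - hws):i + hws + 1]
        if value == max(window) and value > snr * noise:
            peaks.append(i)
    return peaks


# Hypothetical toy spectrum: two neighbouring strong signals on a noisy baseline.
spectrum = [1, 2, 1, 2, 1, 10, 3, 8, 1, 2, 1, 2, 1, 2, 1]

print(detect_peaks(spectrum, hws=1, snr=3))  # [5, 7]
print(detect_peaks(spectrum, hws=3, snr=3))  # [5]
print(detect_peaks(spectrum, hws=1, snr=1))  # [1, 3, 5, 7, 9, 11, 13]
```

With a small window, the two neighbouring signals are kept as separate peaks; a larger window retains only the higher one. Lowering the SNR threshold admits the small baseline maxima as additional peaks.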
In summary, we propose a workflow applicable for any metazoan species or
tissue sample to be identified: A comprehensive reference library is
needed with species level identification by morphological or molecular
approaches (Fig. 5a). In the lab, a small tissue sample (up to 1 mm³) is
retrieved and incubated for at least 5 minutes in the HCCA-matrix
solution. Of the resulting extract, 1 to 1.5 µl are transferred to a
target plate for measurement. Data processing is carried out in R (Fig.
5b). Mass spectra quality control is done by eye and supported by R packages
such as MALDIrppa (Palarea-Albaladejo et al., 2017). Finally, based on
previously assessed species identification, data processing can be
optimized to obtain ideal settings for classification. Based on our
results, this can be narrowed down to adjusting the HWS and SNR values. Based
on the reference library, an RF model can be calculated for specimen
identification (Fig. 5c). Applying a post-hoc test will provide
further support for the identification. If classification is not well
supported, an RF model at class or phylum level can be applied to obtain
a higher-level classification.
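The final decision step of this workflow can be sketched as follows. The sketch assumes that each RF model yields a vote count per candidate label, and it uses a simple vote-fraction threshold as a stand-in for the post-hoc test. The names `top_label`, `identify` and `min_support`, the threshold of 0.5 and the example labels are all hypothetical.

```python
def top_label(votes):
    """Return the label with the most votes and its share of all votes."""
    total = sum(votes.values())
    label = max(votes, key=votes.get)
    return label, votes[label] / total


def identify(species_votes, higher_votes, min_support=0.5):
    """Accept the species-level label if well supported, else fall back.

    min_support is a hypothetical stand-in for the post-hoc test:
    if the best species receives a smaller share of votes, the result
    of the higher-level (class or phylum) model is reported instead.
    """
    label, support = top_label(species_votes)
    if support >= min_support:
        return ("species", label, support)
    label, support = top_label(higher_votes)
    return ("higher", label, support)


# Hypothetical vote counts from a species-level and a class-level model.
print(identify({"sp_a": 80, "sp_b": 20}, {"Copepoda": 95, "Polychaeta": 5}))
print(identify({"sp_a": 40, "sp_b": 35, "sp_c": 25},
               {"Copepoda": 90, "Polychaeta": 10}))
```

In the first call the species-level result is accepted. In the second call no species reaches the threshold, so the class-level label is returned instead.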