Methods
Sample material
Tissue for measurements was taken mainly from the marine organisms
tissue bank of the Senckenberg am Meer, German Centre for Marine
Biodiversity Research, which was established using samples from numerous
studies (Knebelsberger and Thiel, 2014; Knebelsberger et al., 2014;
Markert et al., 2014; Gebhardt and Knebelsberger, 2015; Raupach et al.,
2015; Barco et al., 2016; Laakmann et al., 2016; Rossel et al., 2020b)
(supplementary table S1 for accession numbers) on North Sea metazoans.
The material from this collection was taken from specimens processed for
COI-barcoding to create reference libraries for a variety of marine
animal groups. During this process, tissue samples of the respective
specimens were stored in ethanol at -80°C. Tissue samples were available
for Bivalvia (muscle, 18 species), Cephalopoda (muscle from arm, 12
species), Gastropoda (muscle from foot, 24 species), Polyplacophora
(muscle from foot, 2 species), Ascidiacea (tissue, 1 species), Teleostei
(muscle, 67 species), Elasmobranchii (muscle, 7 species), Malacostraca
(muscle from foot or chelae, 39 species), Thecostraca (muscle from foot,
1 species), Pycnogonida (leg fragment, 1 species), Asteroidea (tube
feet, 10 species), Ophiuroidea (tissue from arm, 10 species) and
Echinoidea (tissue from the base of the tubercle, 6 species)
(nspecies= 198, nspecimens=1,246).
Sample preparation
The basic protocol of sample preparation was the same for all analyzed
tissue samples. A very small tissue fragment (< 1
mm3) was incubated for 5 minutes in HCCA as a
saturated solution in 50% acetonitrile, 47.5% molecular grade water
and 2.5% trifluoroacetic acid. Tissue from crustacean Cancer
pagurus Linnaeus, 1758, the fish Clupea harengus Linnaeus, 1758,
the cephalopod Eledone cirrhosa (Lamarck, 1798) and the
echinoderm Stichastrella rosea (O.F. Müller, 1776) was used to
find an optimal tissue to HCCA matrix ratio. Tissue was weighed on a
METTLER TOLEDO XS3DU micro-balance and the amount of matrix was adjusted
to tissue weight to obtain the desired ratios ranging from 0.012 µg
µl-1 to 200 µg µl-1. After
incubation, 1.5 µl of the solution was transferred to 10 spots on a
target plate, respectively. Mass spectra were measured with a Microflex
LT/SH System (Bruker Daltonics) using method MBTAuto. Peak evaluation
was carried out in a mass peak range between 2 k – 10 k Dalton (Da)
using a centroid peak detection algorithm, a signal to noise threshold
of 2 and a minimum intensity threshold of 600. To create a sum spectrum,
160 satisfactory shots were summed up.
Resulting from observations during this initial test, a fast applicable
protocol was developed without the need to weigh each tissue sample.
Matrix volume was added to tissue samples depending on tissue volume,
i.e. tissue samples were always completely covered by HCCA matrix with a
small layer (ca. 1 mm) of supernatant. Samples were incubated for 5
minutes and 1.5 µl of the solution were transferred to a single spot on
a target plate for measurement. Each spot was measured between two to
three times.
Mass spectra processing in R
Mass spectra data was imported to R (R-Core-Team, 2022) using
MALDIquantForeign (Gibb, 2015) and further processed using MALDIquant
(Gibb and Strimmer, 2012). Mass spectra were trimmed to an identical
length from 2 to 20 kDa. Subsequently, spectra were square root
transformed, smoothed using Savitzky Golay method (Savitzky and Golay,
1964), baseline corrected using SNIP approach (Ryan et al., 1988) and
normalized using total ion current (TIC) method.
Spectra were quality controlled using the command ‘screenSpectra’ from
the R-package MALDIrppa (Palarea-Albaladejo et al., 2017). Mass spectra
with a notably high a-score were checked by eye and discarded if mass
spectra were of bad quality. If due to this, only a single specimen for
a certain species was retained, the remaining specimen was discarded
from the data set.
Evaluation of random forest model for identification
Besides initial sample preparation and subsequent data processing, we
tested how to improve a random forest (RF) model used for species
identification. Optimal number of trees and variables was tested in a
previous study (Rossel and Martínez Arbizu, 2018a). Here we assessed the
effect of minimum number of specimens per species category on the
resulting model power. We sampled the dataset using two to 11 specimens
per species including only species with at least 11 specimens per class
(n=20). For each minimum number of specimens, 100 data sets were sampled
using ‘sample_n’ from R-package dplyr (Wickham et al., 2022), a RF
model was created and the OOB errors assessed accordingly.
Standardization of data processing
Based on literature research and own observations, three data processing
steps were identified, which may have a severe impact on data and the
resulting quality of a random Forest (RF) classification model
(Breimann, 2001). I) Iterations of baseline subtraction: this is a first
manipulation step to reduce chemical noise and is carried out
iteratively (Martínez Arbizu and Rossel, 2018). Increasing iterations
will result in loss of low intensity peaks. II) Signal to noise ratio
(SNR) during peak picking: an increase in SNR will exclude signals of
low intensity. The higher the SNR value, the less peaks will be kept.
III) Half window size (HWS) during peak picking: within the HWS the peak
with the highest intensity will be chosen as the resulting peak during
peak picking. The higher the HWS is chosen, the less peaks will be
picked across an entire mass spectrum range.
Interactive effects of these data processing steps were tested using the
classification success by a random forest model as target variable:
iterations of baseline estimation and peak detection HWS were varied
both between 5 to 30 and SNR from 3 to 20. In total, 12,186 analyses
were carried out. In all cases, peak binning using ‘binPeaks’ from
R-package MaldiQuant was repeated until the number of variables in the
data did not further change. The RF model (ntree=2,000 and mtry=35) was
trained on the Hellinger transformed peak intensities as suggested by
Rossel and Martínez Arbizu (2018a). The RF out-of-box (OOB) error was
used as measure for classification success. For these analyses, based on
the results from RF-model evaluation, only species were included with at
least six specimens. To investigate the main drivers of classification
success, a generalized additive model (GAM, family: binomial; link
function: logit) was calculated.
Testing the classification success
In concordance with the results from the previous tests, only species
with at least six specimens were included in the model. Mass spectra
from these species were processed according to the results from the test
on variation of HWS (7), SNR (3) and baseline iteration (22). To test
the overall classification success on species level, single specimens
were separated from the RF training data set and subsequently identified
using this model. After classification, the post-hoc test by
Rossel and Martínez Arbizu (Martínez Arbizu and Rossel, 2018; Rossel and
Martínez Arbizu, 2018a) using the R-package RFtools
(https://github.com/pmartinezarbizu/RFtools)
was applied to verify RF classification. This post-hoc test uses
the empirical distribution of RF assignment probabilities from the RF
model and compares the assignment probabilities of newly classified
specimens to this distribution. Whereas classified specimens with
assignment probabilities falling within this empirical distribution are
considered true positive (tp), specimens with probabilities of
assignment significantly different to this distribution considered false
positive (fp).
Case studies
In order to show the applicability of MALDI-TOF MS, we present two model
cases. First, data of the North Sea starfish Astropecten
irregularis (Pennant, 1777) were investigated based on MALDI-TOF mass
spectra. This species was found to be genetically divergent (Laakmann et
al., 2016) while revealing a high morphological similarity.
Differentiation of species was tested using RF models. Furthermore, data
on the crustacean Euterpina acutifrons (Dana, 1847) from Rossel
and Martínez Arbizu, 2019 was analyzed to show the applicability for sex
level differentiation using hierarchical clustering and RF. Based on the
Gini index the 30 most important peaks for species/sex differentiation
in a RF model were extracted and investigated to show the expression
within the respective groups.
Phyla and class models for identification
To test whether specimen can be identified on an above-species level, a
RF model containing only class- and phylum-categories was applied. All
spectra from species to be classified were excluded from the model to
evaluate its use for specimen not included in a library. The respective
specimens were identified using the model and the predicted class/phylum
was tested with the RF post-hoc test. To test classification on
phylum level, 1,246 specimens from 198 species were included. On class
level, 1,227 specimens from 195 species were analyzed.