Methods
Sample material
Tissue for measurements was taken mainly from the marine organisms tissue bank of the Senckenberg am Meer, German Centre for Marine Biodiversity Research, which was established using samples from numerous studies (Knebelsberger and Thiel, 2014; Knebelsberger et al., 2014; Markert et al., 2014; Gebhardt and Knebelsberger, 2015; Raupach et al., 2015; Barco et al., 2016; Laakmann et al., 2016; Rossel et al., 2020b) (supplementary table S1 for accession numbers) on North Sea metazoans. The material from this collection was taken from specimens processed for COI-barcoding to create reference libraries for a variety of marine animal groups. During this process, tissue samples of the respective specimens were stored in ethanol at -80°C. Tissue samples were available for Bivalvia (muscle, 18 species), Cephalopoda (muscle from arm, 12 species), Gastropoda (muscle from foot, 24 species), Polyplacophora (muscle from foot, 2 species), Ascidiacea (tissue, 1 species), Teleostei (muscle, 67 species), Elasmobranchii (muscle, 7 species), Malacostraca (muscle from foot or chelae, 39 species), Thecostraca (muscle from foot, 1 species), Pycnogonida (leg fragment, 1 species), Asteroidea (tube feet, 10 species), Ophiuroidea (tissue from arm, 10 species) and Echinoidea (tissue from the base of the tubercle, 6 species) (nspecies= 198, nspecimens=1,246).
Sample preparation
The basic protocol of sample preparation was the same for all analyzed tissue samples. A very small tissue fragment (< 1 mm3) was incubated for 5 minutes in HCCA as a saturated solution in 50% acetonitrile, 47.5% molecular grade water and 2.5% trifluoroacetic acid. Tissue from crustacean Cancer pagurus Linnaeus, 1758, the fish Clupea harengus Linnaeus, 1758, the cephalopod Eledone cirrhosa (Lamarck, 1798) and the echinoderm Stichastrella rosea (O.F. Müller, 1776) was used to find an optimal tissue to HCCA matrix ratio. Tissue was weighed on a METTLER TOLEDO XS3DU micro-balance and the amount of matrix was adjusted to tissue weight to obtain the desired ratios ranging from 0.012 µg µl-1 to 200 µg µl-1. After incubation, 1.5 µl of the solution was transferred to 10 spots on a target plate, respectively. Mass spectra were measured with a Microflex LT/SH System (Bruker Daltonics) using method MBTAuto. Peak evaluation was carried out in a mass peak range between 2 k – 10 k Dalton (Da) using a centroid peak detection algorithm, a signal to noise threshold of 2 and a minimum intensity threshold of 600. To create a sum spectrum, 160 satisfactory shots were summed up.
Resulting from observations during this initial test, a fast applicable protocol was developed without the need to weigh each tissue sample. Matrix volume was added to tissue samples depending on tissue volume, i.e. tissue samples were always completely covered by HCCA matrix with a small layer (ca. 1 mm) of supernatant. Samples were incubated for 5 minutes and 1.5 µl of the solution were transferred to a single spot on a target plate for measurement. Each spot was measured between two to three times.
Mass spectra processing in R
Mass spectra data was imported to R (R-Core-Team, 2022) using MALDIquantForeign (Gibb, 2015) and further processed using MALDIquant (Gibb and Strimmer, 2012). Mass spectra were trimmed to an identical length from 2 to 20 kDa. Subsequently, spectra were square root transformed, smoothed using Savitzky Golay method (Savitzky and Golay, 1964), baseline corrected using SNIP approach (Ryan et al., 1988) and normalized using total ion current (TIC) method.
Spectra were quality controlled using the command ‘screenSpectra’ from the R-package MALDIrppa (Palarea-Albaladejo et al., 2017). Mass spectra with a notably high a-score were checked by eye and discarded if mass spectra were of bad quality. If due to this, only a single specimen for a certain species was retained, the remaining specimen was discarded from the data set.
Evaluation of random forest model for identification
Besides initial sample preparation and subsequent data processing, we tested how to improve a random forest (RF) model used for species identification. Optimal number of trees and variables was tested in a previous study (Rossel and Martínez Arbizu, 2018a). Here we assessed the effect of minimum number of specimens per species category on the resulting model power. We sampled the dataset using two to 11 specimens per species including only species with at least 11 specimens per class (n=20). For each minimum number of specimens, 100 data sets were sampled using ‘sample_n’ from R-package dplyr (Wickham et al., 2022), a RF model was created and the OOB errors assessed accordingly.
Standardization of data processing
Based on literature research and own observations, three data processing steps were identified, which may have a severe impact on data and the resulting quality of a random Forest (RF) classification model (Breimann, 2001). I) Iterations of baseline subtraction: this is a first manipulation step to reduce chemical noise and is carried out iteratively (Martínez Arbizu and Rossel, 2018). Increasing iterations will result in loss of low intensity peaks. II) Signal to noise ratio (SNR) during peak picking: an increase in SNR will exclude signals of low intensity. The higher the SNR value, the less peaks will be kept. III) Half window size (HWS) during peak picking: within the HWS the peak with the highest intensity will be chosen as the resulting peak during peak picking. The higher the HWS is chosen, the less peaks will be picked across an entire mass spectrum range.
Interactive effects of these data processing steps were tested using the classification success by a random forest model as target variable: iterations of baseline estimation and peak detection HWS were varied both between 5 to 30 and SNR from 3 to 20. In total, 12,186 analyses were carried out. In all cases, peak binning using ‘binPeaks’ from R-package MaldiQuant was repeated until the number of variables in the data did not further change. The RF model (ntree=2,000 and mtry=35) was trained on the Hellinger transformed peak intensities as suggested by Rossel and Martínez Arbizu (2018a). The RF out-of-box (OOB) error was used as measure for classification success. For these analyses, based on the results from RF-model evaluation, only species were included with at least six specimens. To investigate the main drivers of classification success, a generalized additive model (GAM, family: binomial; link function: logit) was calculated.
Testing the classification success
In concordance with the results from the previous tests, only species with at least six specimens were included in the model. Mass spectra from these species were processed according to the results from the test on variation of HWS (7), SNR (3) and baseline iteration (22). To test the overall classification success on species level, single specimens were separated from the RF training data set and subsequently identified using this model. After classification, the post-hoc test by Rossel and Martínez Arbizu (Martínez Arbizu and Rossel, 2018; Rossel and Martínez Arbizu, 2018a) using the R-package RFtools (https://github.com/pmartinezarbizu/RFtools) was applied to verify RF classification. This post-hoc test uses the empirical distribution of RF assignment probabilities from the RF model and compares the assignment probabilities of newly classified specimens to this distribution. Whereas classified specimens with assignment probabilities falling within this empirical distribution are considered true positive (tp), specimens with probabilities of assignment significantly different to this distribution considered false positive (fp).
Case studies
In order to show the applicability of MALDI-TOF MS, we present two model cases. First, data of the North Sea starfish Astropecten irregularis (Pennant, 1777) were investigated based on MALDI-TOF mass spectra. This species was found to be genetically divergent (Laakmann et al., 2016) while revealing a high morphological similarity. Differentiation of species was tested using RF models. Furthermore, data on the crustacean Euterpina acutifrons (Dana, 1847) from Rossel and Martínez Arbizu, 2019 was analyzed to show the applicability for sex level differentiation using hierarchical clustering and RF. Based on the Gini index the 30 most important peaks for species/sex differentiation in a RF model were extracted and investigated to show the expression within the respective groups.
Phyla and class models for identification
To test whether specimen can be identified on an above-species level, a RF model containing only class- and phylum-categories was applied. All spectra from species to be classified were excluded from the model to evaluate its use for specimen not included in a library. The respective specimens were identified using the model and the predicted class/phylum was tested with the RF post-hoc test. To test classification on phylum level, 1,246 specimens from 198 species were included. On class level, 1,227 specimens from 195 species were analyzed.