Fig. 4: (A) The 30 most important peaks for differentiation of
the starfish A. irregularis groups within the random forest
model. Species according to COI delimitation are given on top. Molecular
masses sorted by size are given on the left-hand side. (B) Hierarchical
clustering depicts differentiation of the copepod E.
acutifrons specimens at sex level. Nodal bootstrap support is displayed
at the nodes of the tree. The heatmap below the clustering results
depicts the 30 most important mass peaks for sex-differentiation using a
random forest model with color-coded peak intensities. Data from the
marine copepod Microarthridion littorale (Poppe, 1881) from the
same study was used here as an outgroup species. Relative intensities
are color coded.
Case study - sex determination
Previous research showed that sex determination may be possible
in some species by analyzing the proteomic fingerprint (Rossel and
Martínez Arbizu, 2019); however, the data was not analyzed any further
therein. In-depth analyses support these findings and show
sex-specific protein patterns in the copepod crustacean Euterpina
acutifrons (Fig. 4B). Mass peaks such as m/z 2523, 2929 and 7417 are
female-specific and were not found in any of the male specimens. Others,
however, occur predominantly in male specimens (m/z 3638, 3719). Further
mass peaks are observed equally in measurements from both sexes but
differ in their intensity patterns.
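The distinction between peaks found in only one sex and peaks shared by both can be illustrated with a minimal sketch. It assumes that peaks have already been binned to common m/z values and that each specimen is represented simply by its sex label and a list of detected peaks. The function names and the toy m/z values are hypothetical and are not taken from the measurements reported here.

```python
from collections import defaultdict

def peaks_by_sex(specimens):
    """Count, for every m/z value, in how many specimens of each sex it occurs.

    specimens: iterable of (sex, peaks) tuples, where peaks is a list of
    already binned m/z values (an assumption of this sketch).
    """
    counts = defaultdict(lambda: defaultdict(int))
    for sex, peaks in specimens:
        for mz in set(peaks):  # count each peak once per specimen
            counts[mz][sex] += 1
    return counts

def exclusive_peaks(specimens, sex):
    """Return the sorted m/z values detected in `sex` and in no other sex."""
    counts = peaks_by_sex(specimens)
    return sorted(
        mz for mz, per_sex in counts.items()
        if sex in per_sex and len(per_sex) == 1
    )

# Hypothetical toy data: (sex, detected m/z values)
toy_specimens = [
    ("F", [1100, 1200, 5000]),
    ("F", [1100, 1300, 5000, 2100]),
    ("M", [2100, 2200, 5000]),
    ("M", [2100, 5000]),
]

female_only = exclusive_peaks(toy_specimens, "F")
male_only = exclusive_peaks(toy_specimens, "M")
print(female_only, male_only)
```

In this toy example, m/z 2100 occurs mostly in males but also in one female, so it is not reported as exclusive. Peaks of this kind, and peaks present in both sexes, would have to be compared by intensity rather than by presence.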
Phylum and class models for identification
If a species is not part of a reference library, it may be desirable to
obtain a higher-level classification. To test whether this is possible based
on MALDI-TOF mass spectra of metazoans, species were systematically
removed from the RF training data set and classified with an RF model
that was trained at a higher taxonomic level but did not include any
information on the species to be classified. Across all
phyla, a classification success of 81% (77% true positive
rate (tpr)) was achieved, with phylum-wise success rates ranging from 73%
(64% tpr) in Echinodermata to 95% (92% tpr) in Chordata (Fig. 3B).
At class level, the combined success rate was 72% (66% tpr), ranging
from 7% (0% tpr) in Polyplacophora, for which only two species were
included in the data set, to 96% (94% tpr) in Teleostei.
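The hold-out scheme described above can be sketched as follows. This is an illustration under stated assumptions, not the implementation used in the study: a simple nearest-centroid classifier stands in for the random forest so that the example needs no external libraries, and the record layout (`species`, `higher_taxon`, `features`) as well as all toy values are hypothetical.

```python
from collections import Counter

def leave_one_species_out(records):
    """Yield (held_out_species, train, test); train never contains the held-out species."""
    for sp in sorted({r["species"] for r in records}):
        train = [r for r in records if r["species"] != sp]
        test = [r for r in records if r["species"] == sp]
        yield sp, train, test

def fit_centroids(train):
    """Stand-in model (NOT a random forest): mean feature vector per higher taxon."""
    sums, n = {}, Counter()
    for r in train:
        taxon = r["higher_taxon"]
        vec = sums.setdefault(taxon, [0.0] * len(r["features"]))
        for i, x in enumerate(r["features"]):
            vec[i] += x
        n[taxon] += 1
    return {t: [x / n[t] for x in vec] for t, vec in sums.items()}

def predict(centroids, features):
    """Assign the higher taxon whose centroid is closest (squared Euclidean distance)."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(features, c))
    return min(centroids, key=lambda t: dist(centroids[t]))

def evaluate(records):
    """Fraction of specimens assigned to the correct higher taxon."""
    correct = total = 0
    for _, train, test in leave_one_species_out(records):
        model = fit_centroids(train)
        for r in test:
            total += 1
            correct += predict(model, r["features"]) == r["higher_taxon"]
    return correct / total

# Hypothetical toy data: two higher taxa with two species each
toy_records = [
    {"species": "sp_a1", "higher_taxon": "Taxon_A", "features": [1.0, 0.0]},
    {"species": "sp_a1", "higher_taxon": "Taxon_A", "features": [0.9, 0.1]},
    {"species": "sp_a2", "higher_taxon": "Taxon_A", "features": [0.8, 0.2]},
    {"species": "sp_b1", "higher_taxon": "Taxon_B", "features": [0.0, 1.0]},
    {"species": "sp_b2", "higher_taxon": "Taxon_B", "features": [0.1, 0.9]},
]

success = evaluate(toy_records)
print(success)
```

In this scheme, a higher taxon represented by a single species has no training data left once that species is held out, so its specimens cannot be assigned correctly. Sparsely sampled taxa are therefore expected to perform poorly in any analysis of this design.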
For 31 taxa (n = 324), a congeneric species was included. Thus, it was
tested whether specimens are more likely to be classified as a
congeneric species when their own species is removed from the
training data. Across these 31 taxa, 30% of specimens were classified as a
congeneric species.
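The proportion of congeneric assignments can be computed as in the following minimal sketch. It assumes that each specimen is available as a pair of true and predicted species names and that the genus is the first word of the binomial name. The function names and the species names in the toy data are hypothetical.

```python
def genus(name):
    """Return the genus, taken as the first word of a binomial species name."""
    return name.split()[0]

def congeneric_fraction(pairs):
    """Fraction of specimens assigned to a different species of the same genus.

    pairs: list of (true_species, predicted_species) tuples.
    """
    hits = sum(
        1 for true, pred in pairs
        if true != pred and genus(true) == genus(pred)
    )
    return hits / len(pairs)

# Hypothetical toy data: (true species, predicted species)
toy_pairs = [
    ("Aus alpha", "Aus beta"),     # congeneric assignment
    ("Aus alpha", "Bus gamma"),    # different genus
    ("Bus gamma", "Bus delta"),    # congeneric assignment
    ("Cus epsilon", "Aus beta"),   # different genus
]

fraction = congeneric_fraction(toy_pairs)
print(fraction)
```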