Janna Peters and four co-authors

We analyzed (i) the robustness of species identification based on proteomic composition to data processing and intraspecific variability, (ii) the specificity and sensitivity of species markers, and (iii) the discriminatory power of proteomic fingerprinting and its sensitivity to phylogenetic distance. Our analysis is based on MALDI-TOF MS data from 32 marine copepod species collected in 13 regions (North and Central Atlantic and adjacent seas). A random forest (RF) model correctly classified all specimens to species level and was only slightly sensitive to data processing, demonstrating the robustness of the method. Compounds with high specificity showed low sensitivity, i.e., identification was based on complex differences in peak patterns rather than on the presence of single markers. Proteomic distance was not consistently related to phylogenetic distance. A species gap in proteome composition appeared at a Euclidean distance of 0.8 when only specimens from the same sample were compared. When other regions or seasons were included, intraspecific variability increased, resulting in overlap between intra- and interspecific distances. The highest intraspecific distances (> 0.8) were observed between specimens from brackish and marine habitats, i.e., salinity likely affects proteomic patterns. When the sensitivity of the RF model to the regional composition of the reference library was tested, strong misidentification occurred only between the members of two congener pairs. Nevertheless, the choice of reference library may affect the identification of closely related species and should be tested before routine application. We expect this time- and cost-efficient method to be highly relevant for future zooplankton monitoring, as it provides not only in-depth taxonomic resolution for counted specimens but also additional information, e.g., on developmental stage or environmental conditions.
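Two of the quantities discussed above, the species gap in Euclidean distance and the sensitivity/specificity of a single marker peak, can be illustrated with a minimal sketch. It assumes each specimen is already reduced to a vector of normalized peak intensities; the spectra, labels and function names (`split_distances`, `has_species_gap`, `marker_performance`) are hypothetical, and the toy values do not reproduce the authors' preprocessing, so the 0.8 threshold taken from the abstract is only meaningful on their data scale.

```python
import math
from itertools import combinations


def split_distances(spectra, labels):
    """Split all pairwise Euclidean distances into intra- and interspecific lists."""
    intra, inter = [], []
    for (a, label_a), (b, label_b) in combinations(zip(spectra, labels), 2):
        d = math.dist(a, b)  # Euclidean distance between two peak-intensity vectors
        (intra if label_a == label_b else inter).append(d)
    return intra, inter


def has_species_gap(intra, inter, threshold):
    """True if all intraspecific distances lie below the threshold and no interspecific one does."""
    return max(intra) < threshold <= min(inter)


def marker_performance(present, labels, species):
    """Sensitivity and specificity of one peak used as a marker for `species`.

    present[i] is True if the peak was detected in specimen i.
    """
    tp = sum(1 for p, lab in zip(present, labels) if p and lab == species)
    fn = sum(1 for p, lab in zip(present, labels) if not p and lab == species)
    fp = sum(1 for p, lab in zip(present, labels) if p and lab != species)
    tn = sum(1 for p, lab in zip(present, labels) if not p and lab != species)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical, already-normalized intensities of three peaks per specimen.
spectra = [
    [1.0, 0.2, 0.0],  # species A
    [0.9, 0.3, 0.0],  # species A
    [0.1, 0.9, 0.5],  # species B
    [0.2, 1.0, 0.4],  # species B
]
labels = ["A", "A", "B", "B"]

intra, inter = split_distances(spectra, labels)
print(round(max(intra), 3), round(min(inter), 3))  # 0.173 1.068
print(has_species_gap(intra, inter, threshold=0.8))  # True

# A peak found in only one of the two A specimens and in no B specimen:
sens, spec = marker_performance([True, False, False, False], labels, "A")
print(sens, spec)  # 0.5 1.0
```

The last case mirrors the pattern described in the abstract: a peak that never occurs outside its species is perfectly specific, yet it is of limited use alone if it is missing from many conspecific specimens. Repeating the distance split on subsets (same sample versus pooled regions or seasons) shows whether a gap persists; the abstract reports that it does not once regions or seasons are pooled.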