Aurélie Bonin

and 2 more

Clustering approaches are pivotal for handling the many sequence variants obtained in DNA metabarcoding datasets and have therefore become a key step of metabarcoding analysis pipelines. Clustering often relies on a sequence similarity threshold to gather sequences into Molecular Operational Taxonomic Units (MOTUs), each of which ideally represents a homogeneous taxonomic entity, e.g. a species or a genus. However, the choice of the clustering threshold is rarely justified, and its impact on MOTU over-splitting or over-merging is tested even less often. Here, we evaluated clustering threshold values for several metabarcoding markers under different criteria: limitation of MOTU over-merging, limitation of MOTU over-splitting, and trade-off between over-merging and over-splitting. We extracted sequences from a public database for eight markers, ranging from generalist markers targeting Bacteria or Eukaryota to more specific markers targeting a class or a subclass (e.g. Insecta, Oligochaeta). Based on the distributions of pairwise sequence similarities within species and within genera, and on the rates of over-splitting and over-merging across different clustering thresholds, we were able to propose threshold values that minimize the risk of over-splitting, minimize the risk of over-merging, or offer a trade-off between the two risks. For generalist markers, high similarity thresholds (0.96-0.99) are generally appropriate, while more specific markers require lower values (0.85-0.96). These results do not support the use of a fixed clustering threshold. Instead, we advocate a careful examination of the most appropriate threshold based on the research objectives, the potential costs of over-splitting and over-merging, and the features of the studied markers.
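To make the over-splitting and over-merging rates in the abstract above more concrete, here is a minimal Python sketch on toy data. It is not the authors' pipeline, and several choices are assumptions made for illustration. The similarity measure is a simple fraction of identical positions between equal-length sequences. Clustering is single-linkage. Over-splitting is taken as the fraction of species spread over more than one MOTU, and over-merging as the fraction of MOTUs containing more than one species. The abstract does not give the exact definitions used in the study. The names `similarity`, `cluster`, and `split_merge_rates` are hypothetical.

```python
from itertools import combinations


def similarity(a, b):
    """Toy similarity: fraction of identical positions (equal-length sequences only)."""
    if len(a) != len(b):
        raise ValueError("toy similarity expects equal-length sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cluster(seqs, threshold):
    """Single-linkage clustering (an assumption): link pairs with similarity >= threshold."""
    parent = list(range(len(seqs)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(seqs)), 2):
        if similarity(seqs[i], seqs[j]) >= threshold:
            parent[find(i)] = find(j)
    # One MOTU identifier per input sequence.
    return [find(i) for i in range(len(seqs))]


def split_merge_rates(seqs, species, threshold):
    """Return (over-splitting rate, over-merging rate) under the assumed definitions."""
    motus = cluster(seqs, threshold)
    motus_per_species = {}
    species_per_motu = {}
    for motu, sp in zip(motus, species):
        motus_per_species.setdefault(sp, set()).add(motu)
        species_per_motu.setdefault(motu, set()).add(sp)
    over_split = sum(len(m) > 1 for m in motus_per_species.values()) / len(motus_per_species)
    over_merge = sum(len(s) > 1 for s in species_per_motu.values()) / len(species_per_motu)
    return over_split, over_merge


# Invented toy data: two species, two 20-bp sequences each.
seqs = [
    "ACGTACGTACGTACGTACGT",  # species A
    "ACGTACGTACGTACGTACGA",  # species A, 1 mismatch to the first
    "TTGTACGTACGTACGTACGT",  # species B, 2 mismatches to the first
    "TTTTACGTACGTACGTACGT",  # species B, 1 mismatch to the third
]
species = ["A", "A", "B", "B"]

for threshold in (0.99, 0.94, 0.89):
    print(threshold, split_merge_rates(seqs, species, threshold))
```

On this toy dataset, the strictest threshold splits both species, the loosest one merges them into a single MOTU, and the intermediate value does neither. This is the kind of trade-off the abstract describes, although real markers would need alignment-based similarities and far larger reference sets.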

Alessia Guerrieri

and 17 more

Ice-free areas are increasing worldwide due to dramatic glacier shrinkage and are undergoing rapid colonization by multiple lifeforms, thus representing key environments to study ecosystem development. Soils have a complex vertical structure. However, we know little about how microbial and animal communities differ across soil depths and development stages during the colonization of deglaciated terrains, how these differences evolve through time, and whether patterns are consistent among different taxonomic groups. Here, we used environmental DNA metabarcoding to describe how community diversity and composition of six groups (Eukaryota, Bacteria, Mycota, Collembola, Insecta, Oligochaeta) differ between surface (0-5 cm) and relatively deep (7.5-20 cm) soils at different stages of development across five Alpine glaciers. Taxonomic diversity increased with time since glacier retreat and with soil evolution; the pattern was consistent across different groups and soil depths. For Eukaryota, and particularly Mycota, alpha-diversity was generally highest in soils close to the surface. Time since glacier retreat was a more important driver of community composition than soil depth; for nearly all the taxa, differences in community composition between surface and deep soils decreased with time since glacier retreat, suggesting that the development of soil and/or of vegetation tends to homogenize the first 20 cm of soil through time. Within both Bacteria and Mycota, several molecular operational taxonomic units were significant indicators of specific depths and/or soil development stages, confirming the strong functional variation of microbial communities through time and depth. The complexity of community patterns highlights the importance of integrating information from multiple taxonomic groups to unravel community variation in response to ongoing global changes.