Illumina read processing and filtering
Detailed information on the read processing and filtering pipeline is summarized in Table S2. Briefly, we demultiplexed raw reads allowing no mismatches in the dual-index pair. We then used fastqc v.0.11.7 (Andrews, 2010) to check the quality of raw reads and cutadapt v.2.10 (Martin, 2011) to trim primers and discard raw reads exhibiting any deviation from the expected primer length and composition. Next, we used pear v.0.9.11 (Zhang, Kobert, Flouri, & Stamatakis, 2014) to merge forward and reverse reads. Each metabarcoding sample was then separately quality filtered, dereplicated discarding singletons, length filtered retaining only reads of 416–420 bp, de novo chimera filtered using UCHIME3, and denoised using UNOISE3 as implemented in vsearch v.2.9.1 (Rognes, Flouri, Nichols, Quince, & Mahé, 2016). After denoising, reads from all metabarcoding samples were pooled and dereplicated again (discarding no sequences) to generate a catalogue of unique putative haplotypes (ASVs). We then ran blast to compare all ASVs against a combined database composed of the NCBI nt collection (accessed November 2020) and a curated reference catalogue including the 344 Sanger sequences of the ‘voucher’ specimens plus 561 previously available sequences corresponding to soil lineages of Acari, Collembola and Coleoptera (Arribas et al., 2016, 2021b). Based on the blast output, we assigned the ASVs to high-rank taxonomic levels by applying the weighted lowest common ancestor algorithm in megan6 (Huson et al., 2016; see also Hleap, Littlefair, Steinke, Hebert, & Cristescu, 2021). Only ASVs assigned to Acari, Collembola or Coleoptera were retained and used for downstream analyses.
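The logic of the per-sample dereplication and length filtering, and of the later pooling step, can be illustrated with a minimal Python sketch. This is a toy illustration only and not the vsearch implementation used in the study: the function names (`dereplicate`, `length_filter`, `pool_samples`) are hypothetical, reads are represented as plain strings, and the quality filtering, chimera filtering and denoising steps are not reproduced.

```python
from collections import Counter

# Length window stated in the text (bp); everything else here is illustrative.
MIN_LEN, MAX_LEN = 416, 420


def dereplicate(reads, min_count=2):
    """Collapse identical reads into unique sequences with read counts.

    min_count=2 discards singletons; min_count=1 discards nothing.
    """
    counts = Counter(reads)
    return {seq: n for seq, n in counts.items() if n >= min_count}


def length_filter(uniques, min_len=MIN_LEN, max_len=MAX_LEN):
    """Keep only unique sequences whose length falls inside the window."""
    return {seq: n for seq, n in uniques.items() if min_len <= len(seq) <= max_len}


def pool_samples(per_sample_uniques):
    """Pool the sequences of all samples, summing counts and discarding none."""
    pooled = Counter()
    for uniques in per_sample_uniques.values():
        pooled.update(uniques)  # adds counts for sequences shared across samples
    return dict(pooled)


# Hypothetical reads for one sample: three copies of a 418-bp read,
# one singleton, and two copies of a read that is too short.
sample_reads = ["A" * 418] * 3 + ["C" * 418] + ["G" * 400] * 2
sample_uniques = length_filter(dereplicate(sample_reads))
```

In this toy sample, only the 418-bp sequence observed three times survives both filters; the singleton is removed by dereplication and the 400-bp sequence by the length window.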
We further filtered the ASVs using metamate v.0.1b18 (Andújar et al., 2021), a novel approach designed to remove putative nuclear copies of mitochondrial DNA (NUMTs; Lopez, Yuhki, Masuda, Modi, & O’Brien, 1994) and other types of low-frequency erroneous sequences from denoised metabarcoding datasets. This software allows multiple read-abundance filtering strategies to be applied and their effects to be evaluated afterwards in terms of the prevalence, in the final filtered dataset, of known authentic mitochondrial haplotypes and presumed non-mitochondrial copies (e.g., those violating the reading frame or expected length, as expected for NUMTs and erroneous sequences) (Andújar et al., 2021). We selected the most stringent filtering solution to ensure the removal of most erroneous sequences (see Supplemental Information for details on the metamate filtering). We then used vsearch to generate a read-count community table of the metamate-filtered ASVs by matching them at 100% identity against the raw read dataset (i.e., the reads prior to dereplication, length filtering and denoising). We further filtered these community tables by removing ASVs represented by two or fewer reads and those contributing less than 1% of the total number of reads per taxonomic group and library. Finally, the filtered read-count community tables were converted to presence/absence tables (see Jurburg, Keil, Singh, & Chase, 2021). Negative controls were processed alongside actual samples throughout the filtering workflow.
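The final read-abundance filtering and the conversion to presence/absence can be sketched as follows. This is a minimal sketch under stated assumptions rather than the code used in the study: the function names and the dictionary layout are hypothetical, both thresholds are assumed to apply to each ASV within each library, and the per-group totals are assumed to be computed from the counts before any removal.

```python
from collections import defaultdict


def filter_counts(counts, taxon_of, min_reads=3, min_fraction=0.01):
    """Filter a read-count table given as {library: {asv: reads}}.

    taxon_of maps each ASV to its taxonomic group (e.g. "Acari").
    An ASV is kept in a library only if it has more than two reads
    (reads >= min_reads) and contributes at least min_fraction of the
    reads of its taxonomic group in that library.
    Assumption: group totals are taken from the unfiltered counts.
    """
    filtered = {}
    for library, asv_counts in counts.items():
        group_totals = defaultdict(int)
        for asv, n in asv_counts.items():
            group_totals[taxon_of[asv]] += n
        filtered[library] = {
            asv: n
            for asv, n in asv_counts.items()
            if n >= min_reads and n / group_totals[taxon_of[asv]] >= min_fraction
        }
    return filtered


def to_presence_absence(filtered, asvs):
    """Convert filtered counts to 1/0 per library for a fixed list of ASVs."""
    return {
        library: {asv: int(asv in kept) for asv in asvs}
        for library, kept in filtered.items()
    }


# Hypothetical example with one library and two taxonomic groups.
taxon_of = {"a1": "Acari", "a2": "Acari", "a3": "Acari", "c1": "Collembola"}
counts = {"lib1": {"a1": 500, "a2": 2, "a3": 4, "c1": 50}}
kept = filter_counts(counts, taxon_of)
table = to_presence_absence(kept, list(taxon_of))
```

In this example, `a2` is removed because it has only two reads, and `a3` is removed because its four reads amount to less than 1% of the 506 Acari reads in the library; `a1` and `c1` are retained and scored as present.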