DISCUSSION
Taxonomic assignment of OTUs – Recovery of EDF communities is impacted more by methodological choices during metabarcoding sampling than Dikarya communities. In particular, target fragment length and taxonomy assignment method can each have a profound impact on EDF detection. We found considerable differences in taxonomic identifications between taxonomy methods for all primer sets. This was even true at the Kingdom level where LSU OTUs were more likely to be identified as non-fungal by the RDP classifier (47-54%) compared to our RDP+SILVA database (33-38%). The ITS1F dataset had the greatest number of OTUs identified as Fungi. This result was expected due to differences in primer specificity (i.e. the ITS1F/ITS2 primer combination is more fungal-specific but has known biases for Dikarya – Bellemain et al., 2010; Tedersoo et al., 2015) and the completeness of the ITS versus LSU databases. Because the ITS databases have >1,000,000 reference sequences compared to <200,000 for LSU databases, we expected more accurate taxonomic assignment for ITS1F OTUs.
In the mock community analyses, both LSU datasets more closely recapitulated the community than ITS1F (TABLE 1). Errors in the taxonomy assignment of ITS1F OTUs from the mock community indicate misidentification of reference sequences (e.g. Dimargaris ) and also a lack of reference sequences for some taxa (e.g. Zoopagales) (TABLE 1). Correlation plots of fungal OTU length and taxonomic identity score had significant negative relationships for the LSU primer sets (SUPP FIG 9), but a positive relationship for ITS1F. The negative relationship for LSU likely reflects the greater representation of the shorter length reference sequences from Dikarya species in the databases than longer non-Dikarya. However, while there was a negative relationship between OTU length and read number for ITS1F, there was no significant correlation for the LSU datasets (SUPP FIG 5). This implies that taxonomic representation in the reference databases rather than OTU length has a greater impact on taxonomy scores of LSU OTUs. However, our RDP+SILVA LSU database substantially improved identification of Zoopagomycota isolates in the mock community compared to the RDP classifier (TABLE 1) with \(\leq\)10% of mock members identified with RDP versus 63% with RDP+SILVA. In contrast, ITS1F recovered 27% of the mock members using the hybrid taxonomy method. It is important to note, however, that populating our RDP+SILVA database with additional non-fungal sequences from GenBank improved the accuracy of determining fungal versus non-fungal sequences compared to RDP. The supported placement of Zoopagomycota OTUs in our reduced phylogeny also reinforces the accuracy of the taxonomic classifications made by our pipeline (FIG 7). Therefore, a robust reference database should include a diversity of eukaryotic sequences, especially for LSU because primers for this region are often less fungal-specific.
These results demonstrate the profound effect that reference databases have on the classification of OTUs. Other taxonomic assignment approaches, like BLAST followed by MEGAN (Huson et al., 2011) or the Statistical Assignment Package (Munch et al., 2008), have the potential to improve LSU OTU identification (Porter & Golding, 2012). However, many of our unidentified OTUs had best matches to other unidentified sequences in GenBank (e.g. SUPP TABLE 5) or best matches that did not reflect the phylogenetic placement of the OTU, indicating that BLAST and reference-based methods cannot completely alleviate identification problems (Lücking et al., 2020). The disparity between named fungal species and unnamed environmental sequences is substantial and available data suggest that many unidentified sequences represent EDF, including Zoopagomycota (Lazarus & James, 2015; Tedersoo et al., 2017, 2020). Even within a relatively small order, Zoopagales, 13 out of 22 genera lack any sequence data (Davis et al., 2019b). Although much has been done to improve fungal LSU databases (e.g. Vu et al., 2019; Hanafy et al., 2020), further additions and curation can bring LSU on par with ITS as an rDNA metabarcoding marker. Until intensive efforts are made to curate and fill the taxonomic gaps within databases, it is clear that taxonomic assignment issues will be problematic irrespective of improvements in sequencing and bioinformatics. For example, the long reads and simultaneous sequencing of multiple rDNA markers at once offered by PacBio technology was not able to entirely overcome the pitfalls of reference-based taxonomic assignment (Furneaux et al., 2021; Heeger et al., 2018).
Primer comparison – Comparisons between metabarcoding datasets from different studies are challenging due to variable methods of sample collection, PCR amplification, sequencing, and bioinformatics. However, our results using data from >127 environmental samples support the broad pattern of fungal community congruence between ITS and LSU markers as evidenced by significant Mantel tests (TABLE 3) and found by others (Amend et al., 2010; Benucci et al., 2019; Bonito et al., 2014; Brown et al., 2014; Johansen et al., 2016; Mota-Gutierrez et al., 2019; Nelson & Shaw, 2019; Skelton et al., 2019; Xue et al., 2019). Our NMDS plots (FIG 4; SUPP FIG 7) demonstrate that the ITS1F, LROR, and LR22F datasets recovered similar fungal communities and detected mostly the same OTU-rich lineages (SUPP FIG 6). However, there were discrepancies among rarer groups, like some chytrid orders for which more LSU OTUs were detected than ITS1F (FIG 3). Similarly, Nelson and Shaw (2019) and Benucci et al., (2018) reported greater numbers of Chytridiomycota OTUs with LROR than ITS. Conversely, the ITS1F marker inflated the number of OTUs assigned to the Kickxellales in the mock community; we originally added nine taxa but recovered 12 OTUs. This could be a result of biological (e.g. intraspecific rDNA copy number variation) and/or methodological (e.g. inappropriate OTU clustering identity threshold) factors. Artificial inflation of the number of ITS OTUs has been shown for various taxa in mock communities in metabarcoding studies (Castaño et al., 2020; De Filippis et al., 2017; Jusino et al., 2019; Nguyen et al., 2015; Větrovský et al., 2016). These results underscore the importance of mock communities for detecting methodological errors and refining bioinformatic parameters such as clustering thresholds (Caporaso et al., 2011; Palmer et al., 2018; Taylor et al., 2016). Furthermore, taxonomic assignments of ITS OTUs cannot be tested with phylogenetic analyses. Thus, groups with dramatic differences in representation between markers (e.g. Rozellomycota in our dataset, FIG 3) cannot easily be evaluated for accuracy.
We further aimed to test LSU primer pairs for their ability to amplify fungi broadly, and Zoopagomycota fungi specifically, as well as compare their performance in the bioinformatics pipeline. The LR22F dataset had more fungal and total OTUs than LROR and fungal diversity analyses found significant differences between groups for LR22F not found with LROR (SUPP TABLE 4; SUPP FIG 8). Similar results were found by Mueller et al. (2016) for the related LR22R primer, which recovered richness estimates closer to ITS than LROR. We found that the forward and reverse reads from the LR22F dataset were consistently paired into contigs, contrary to both the ITS1F and LROR primer sets (FIG 1). The LROR target fragments for the mock community members were commonly >700 bp (TABLE 1), well beyond the sequenceable length of Illumina MiSeq chemistry. The resulting data are therefore almost entirely restricted to the forward reads, resulting in significant data loss. However, paired reads have several advantages over unpaired reads: more data are utilized, overlapping sequences reduce sequence errors, and longer sequences reduce problems during OTU clustering and taxonomy assignment (Bartram et al., 2011; Truong et al., 2019). Furthermore, longer sequences are more accurately identified using bioinformatic methods (Liu et al., 2012; Porras-Alfaro et al., 2014; Porter & Golding, 2012). As a result, the taxonomic identification of LROR OTUs may be less reliable than the longer LR22F OTUs. We also found that sequence length influenced our phylogenetic analysis where the longer LR22F OTUs were generally placed with higher resolution than shorter LROR OTUs (FIG 6). Twenty-two of the 50 LR22F OTUs were placed in a clade that matched the fungal class of their BLAST match, compared to only 15 of the 50 LROR OTUs. Conversely, both LROR and LR22F OTUs resolved well in the smaller Zoopagomycota tree (FIG 7), and the majority were placed in clades that correspond to the taxonomy assignment output from the UTAX/RDP+SILVA pipeline. These results illustrate the utility of phylogenetic reconstruction of LSU OTUs for identifying potential divergent EDF fungal sequences as well as sequence artifacts or taxonomic errors that need further investigation (Glass et al., 2013). For example, the OTUs that were placed within the Orbiliomycetes (and had high BLAST matches to Orbiliomycetes) indicate that the corresponding reference sequences in the database could be identified beyond kingdom to increase the accuracy of the taxonomy.
Methodological biases impacting detection of EDF: the example of Zoopagomycota We found additional evidence that target region length (for both ITS1 and LSU fragments) strongly affects the metabarcoding process at different steps. For ITS1F, the strongest bias putatively occurs during PCR when shorter fragments are preferentially amplified (Castaño et al., 2020; Jusino et al., 2019; Palmer et al., 2018). In our initial experiments with the ITS1F primer set, Zoopagomycota species with the longest ITS1 (≥400 bp) were not recovered from mixed samples. This was true despite adding DNA from those species at twice the concentration of the “background” DNA (SUPP FIG 1). This pattern was reiterated in mock community results (TABLE 1) where target species with the longest ITS1 (≥400 bp) were not detected. This bias towards short fragments is also taxonomically biased. Many species of Ascomycota, Basidiomycota, and Mucoromycota have short ITS1 regions (<200-300 bp) (Bellemain et al., 2010; Bokulich & Mills, 2013), with a few notable exceptions (e.g. Cantharellales may have ITS1 >1,000 bp – Feibelman et al., 1994). On the other hand, most Zoopagomycota species we examined have an ITS1 >300 bp with significant variation between taxa. For example, Piptocephalis and Syncephalis species have ITS1 that ranges 300 – 800 bp (Lazarus et al., 2017; Reynolds et al., 2019) and we have also found variation among Coemansia species (TABLE 1). However, the Coemansia isolates had shorter ITS1 than many other mock members, which likely contributed to their overrepresentation in the ITS1F results. Species of Harpellales have extreme variation with an ITS1 range of 250–1,000 bp (Gottlieb & Lichtwardt, 2001). Similar length variation occurs in the ITS2 region and in some cases ITS2 is longer than ITS1, such as in some Harpellales that have 1,100 bp for ITS2 but only 500 bp for ITS1 (Gottlieb & Lichtwardt, 2001). Although the fragment length for LR22F is variable among Zoopagomycota, the variation is lower than for ITS1F. In our mock community, the difference in fragment length between the longest and the shortestSyncephalis species was only 45 bp for LR22F versus 508 bp for ITS1F. Likewise, the range among Piptocephalis species was 85 bp for LR22F versus 247 bp for ITS1F.
Beyond sequencing and bioinformatics, the biology of Zoopagomycota must also be considered. The symbiotic nature of Zoopagomycota fungi means that their abundance in any given sample is linked with the abundance of their host organisms and dependent on host/parasite interactions. As a result, the distribution of these fungi is likely patchy and highly variable through time, lowering the probability of their detection from any single sample. Little is known about the host/parasite dynamics among Zoopagomycota parasites, making the choice of locations and sample types for metabarcoding less straightforward. For example, although the mycoparasitic species can be isolated from soil, they also are frequently isolated from dung. Are these species mainly coprophilic, or do they actively attack hosts in the soil environment as well? How long do their spores persist in the soil? Similarly, Zoopagales species that attack microinvertebrates have been isolated from wet substrates like moss and decaying plant material (Drechsler 1938; Duddington 1955). We attempted to concentrate potential hosts of these fungi and thereby increase our chance of detecting them by using Baermann funnels. However, these invertebrate samples were generally dominated by Chytridiomycota OTUs followed by Blastocladiomycota and Mucoromycota (FIG 3). A small number of Zoopagales OTUs were detected from other sample types by each primer set, and phylogenetic analyses supported their identification (FIG 7). Nonetheless, the number of OTUs detected is still less than the number of isolates recovered from these sites using specialized culturing techniques. For example, Benny et al., (2016) found that 46% of 520 soil samples (mostly from Florida) contained at least one Syncephalis species, but we only recovered one Syncephalis OTU from two samples using the LROR primer set. Likewise, the spores of Harpellales fungi are thought to pass between hosts through transmission in the water column (Lichtwardt 1986). However, we were unable to detect Harpellales OTUs from water or mud samples, which were mostly dominated by chytrid OTUs. Species of Kickxellales grow axenically and can be isolated from soil or dung, but their trophic modes remain unclear. Although Kickxellales fungi are assumed to be saprotrophic, there are reports of some species growing on other fungi or in association with arthropods and many species exhibit fastidious growth in culture (Jackson & Deardon, 1948; Linder 1943; Kurihara et al., 2001, 2008). Furthermore, most species are assumed to be rare, but the dearth of reports could be an artifact of undersampling. Our results demonstrate that Kickxellales can be detected from soil (FIG 3) and phylogenetic analyses indicate the clade may be more diverse and more widely distributed than currently recognized (FIG 7).
The combined effects of methodological biases and environmental sample heterogeneity (with symbiotic Zoopagomycota likely having lower abundance) may have a synergistic impact on metabarcoding outcomes, leading to artificially inflated representation of some groups and absence of others. These biases are rooted in methodology and can affect any group of organisms with highly variable target fragment lengths and/or poor reference sequence representation. Castaño et al. (2020) found such a pattern in mock communities where longer fragments added in unequal proportions to shorter fragments were severely underrepresented in Illumina MiSeq results while shorter fragments could be highly overrepresented (up to 57% greater output than input). Although PacBio RS II sequencing produced less severe discrepancies, a length bias was still detected (Castaño et al., 2020), an effect that can also be influenced by sample loading method (Tedersoo et al., 2017b). Our results demonstrate that both PCR length bias and lack of reference sequences severely impact the detection of Zoopagomycota from mixed samples. Other lineages of EDF are similarly involved in symbiotic associations and lack reference sequence data (e.g. Blastocladiomycota, Rozellomycota), indicating they are likewise often missed or misidentified in metabarcoding studies. Many unanswered questions about Zoopagomycota remain, such as their roles in microbial food webs and their effect on host populations. Metabarcoding has the potential to help unravel some of these mysteries but novel approaches are needed to overcome methodological biases, such as PCR-free on-array hybrid capture (e.g. Mamanova et al., 2010). Combined with targeted culturing approaches, improved environmental sampling methods can help illuminate the diversity and ecological roles of “dark matter fungi” (Grossart et al., 2016).