DISCUSSION
Taxonomic assignment of OTUs – Recovery of EDF
communities is impacted more by methodological choices during
metabarcoding sampling than Dikarya communities. In particular, target
fragment length and taxonomy assignment method can each have a profound
impact on EDF detection. We found considerable differences in taxonomic
identifications between taxonomy methods for all primer sets. This was
even true at the Kingdom level where LSU OTUs were more likely to be
identified as non-fungal by the RDP classifier (47-54%) compared to our
RDP+SILVA database (33-38%). The ITS1F dataset had the greatest number
of OTUs identified as Fungi. This result was expected due to differences
in primer specificity (i.e. the ITS1F/ITS2 primer combination is more
fungal-specific but has known biases for Dikarya – Bellemain et al.,
2010; Tedersoo et al., 2015) and the completeness of the ITS versus LSU
databases. Because the ITS databases have >1,000,000
reference sequences compared to <200,000 for LSU databases, we
expected more accurate taxonomic assignment for ITS1F OTUs.
In the mock community analyses, both LSU datasets more closely
recapitulated the community than ITS1F (TABLE 1). Errors in the taxonomy
assignment of ITS1F OTUs from the mock community indicate
misidentification of reference sequences (e.g. Dimargaris ) and
also a lack of reference sequences for some taxa (e.g. Zoopagales)
(TABLE 1). Correlation plots of fungal OTU length and taxonomic identity
score had significant negative relationships for the LSU primer sets
(SUPP FIG 9), but a positive relationship for ITS1F. The negative
relationship for LSU likely reflects the greater representation of the
shorter length reference sequences from Dikarya species in the databases
than longer non-Dikarya. However, while there was a negative
relationship between OTU length and read number for ITS1F, there was no
significant correlation for the LSU datasets (SUPP FIG 5). This implies
that taxonomic representation in the reference databases rather than OTU
length has a greater impact on taxonomy scores of LSU OTUs. However, our
RDP+SILVA LSU database substantially improved identification of
Zoopagomycota isolates in the mock community compared to the RDP
classifier (TABLE 1) with \(\leq\)10% of mock members identified with
RDP versus 63% with RDP+SILVA. In contrast, ITS1F recovered 27% of the
mock members using the hybrid taxonomy method. It is important to note,
however, that populating our RDP+SILVA database with additional
non-fungal sequences from GenBank improved the accuracy of determining
fungal versus non-fungal sequences compared to RDP. The supported
placement of Zoopagomycota OTUs in our reduced phylogeny also reinforces
the accuracy of the taxonomic classifications made by our pipeline (FIG
7). Therefore, a robust reference database should include a diversity of
eukaryotic sequences, especially for LSU because primers for this region
are often less fungal-specific.
These results demonstrate the profound effect that reference databases
have on the classification of OTUs. Other taxonomic assignment
approaches, like BLAST followed by MEGAN (Huson et al., 2011) or the
Statistical Assignment Package (Munch et al., 2008), have the potential
to improve LSU OTU identification (Porter & Golding, 2012). However,
many of our unidentified OTUs had best matches to other unidentified
sequences in GenBank (e.g. SUPP TABLE 5) or best matches that did not
reflect the phylogenetic placement of the OTU, indicating that BLAST and
reference-based methods cannot completely alleviate identification
problems (Lücking et al., 2020). The disparity between named fungal
species and unnamed environmental sequences is substantial and available
data suggest that many unidentified sequences represent EDF, including
Zoopagomycota (Lazarus & James, 2015; Tedersoo et al., 2017, 2020).
Even within a relatively small order, Zoopagales, 13 out of 22 genera
lack any sequence data (Davis et al., 2019b). Although much has been
done to improve fungal LSU databases (e.g. Vu et al., 2019; Hanafy et
al., 2020), further additions and curation can bring LSU on par with ITS
as an rDNA metabarcoding marker. Until intensive efforts are made to
curate and fill the taxonomic gaps within databases, it is clear that
taxonomic assignment issues will be problematic irrespective of
improvements in sequencing and bioinformatics. For example, the long
reads and simultaneous sequencing of multiple rDNA markers at once
offered by PacBio technology was not able to entirely overcome the
pitfalls of reference-based taxonomic assignment (Furneaux et al., 2021;
Heeger et al., 2018).
Primer comparison – Comparisons between metabarcoding
datasets from different studies are challenging due to variable methods
of sample collection, PCR amplification, sequencing, and bioinformatics.
However, our results using data from >127 environmental
samples support the broad pattern of fungal community congruence between
ITS and LSU markers as evidenced by significant Mantel tests (TABLE 3)
and found by others (Amend et al., 2010; Benucci et al., 2019; Bonito et
al., 2014; Brown et al., 2014; Johansen et al., 2016; Mota-Gutierrez et
al., 2019; Nelson & Shaw, 2019; Skelton et al., 2019; Xue et al.,
2019). Our NMDS plots (FIG 4; SUPP FIG 7) demonstrate that the ITS1F,
LROR, and LR22F datasets recovered similar fungal communities and
detected mostly the same OTU-rich lineages (SUPP FIG 6). However, there
were discrepancies among rarer groups, like some chytrid orders for
which more LSU OTUs were detected than ITS1F (FIG 3). Similarly, Nelson
and Shaw (2019) and Benucci et al., (2018) reported greater numbers of
Chytridiomycota OTUs with LROR than ITS. Conversely, the ITS1F marker
inflated the number of OTUs assigned to the Kickxellales in the mock
community; we originally added nine taxa but recovered 12 OTUs. This
could be a result of biological (e.g. intraspecific rDNA copy number
variation) and/or methodological (e.g. inappropriate OTU clustering
identity threshold) factors. Artificial inflation of the number of ITS
OTUs has been shown for various taxa in mock communities in
metabarcoding studies (Castaño et al., 2020; De Filippis et al., 2017;
Jusino et al., 2019; Nguyen et al., 2015; Větrovský et al., 2016). These
results underscore the importance of mock communities for detecting
methodological errors and refining bioinformatic parameters such as
clustering thresholds (Caporaso et al., 2011; Palmer et al., 2018;
Taylor et al., 2016). Furthermore, taxonomic assignments of ITS OTUs
cannot be tested with phylogenetic analyses. Thus, groups with dramatic
differences in representation between markers (e.g. Rozellomycota in our
dataset, FIG 3) cannot easily be evaluated for accuracy.
We further aimed to test LSU primer pairs for their ability to amplify
fungi broadly, and Zoopagomycota fungi specifically, as well as compare
their performance in the bioinformatics pipeline. The LR22F dataset had
more fungal and total OTUs than LROR and fungal diversity analyses found
significant differences between groups for LR22F not found with LROR
(SUPP TABLE 4; SUPP FIG 8). Similar results were found by Mueller et al.
(2016) for the related LR22R primer, which recovered richness estimates
closer to ITS than LROR. We found that the forward and reverse reads
from the LR22F dataset were consistently paired into contigs, contrary
to both the ITS1F and LROR primer sets (FIG 1). The LROR target
fragments for the mock community members were commonly >700
bp (TABLE 1), well beyond the sequenceable length of Illumina MiSeq
chemistry. The resulting data are therefore almost entirely restricted
to the forward reads, resulting in significant data loss. However,
paired reads have several advantages over unpaired reads: more data are
utilized, overlapping sequences reduce sequence errors, and longer
sequences reduce problems during OTU clustering and taxonomy assignment
(Bartram et al., 2011; Truong et al., 2019). Furthermore, longer
sequences are more accurately identified using bioinformatic methods
(Liu et al., 2012; Porras-Alfaro et al., 2014; Porter & Golding, 2012).
As a result, the taxonomic identification of LROR OTUs may be less
reliable than the longer LR22F OTUs. We also found that sequence length
influenced our phylogenetic analysis where the longer LR22F OTUs were
generally placed with higher resolution than shorter LROR OTUs (FIG 6).
Twenty-two of the 50 LR22F OTUs were placed in a clade that matched the
fungal class of their BLAST match, compared to only 15 of the 50 LROR
OTUs. Conversely, both LROR and LR22F OTUs resolved well in the smaller
Zoopagomycota tree (FIG 7), and the majority were placed in clades that
correspond to the taxonomy assignment output from the UTAX/RDP+SILVA
pipeline. These results illustrate the utility of phylogenetic
reconstruction of LSU OTUs for identifying potential divergent EDF
fungal sequences as well as sequence artifacts or taxonomic errors that
need further investigation (Glass et al., 2013). For example, the OTUs
that were placed within the Orbiliomycetes (and had high BLAST matches
to Orbiliomycetes) indicate that the corresponding reference sequences
in the database could be identified beyond kingdom to increase the
accuracy of the taxonomy.
Methodological biases impacting detection of EDF: the
example of Zoopagomycota – We found additional evidence that
target region length (for both ITS1 and LSU fragments) strongly affects
the metabarcoding process at different steps. For ITS1F, the strongest
bias putatively occurs during PCR when shorter fragments are
preferentially amplified (Castaño et al., 2020; Jusino et al., 2019;
Palmer et al., 2018). In our initial experiments with the ITS1F primer
set, Zoopagomycota species with the longest ITS1 (≥400 bp) were not
recovered from mixed samples. This was true despite adding DNA from
those species at twice the concentration of the “background” DNA (SUPP
FIG 1). This pattern was reiterated in mock community results (TABLE 1)
where target species with the longest ITS1 (≥400 bp) were not detected.
This bias towards short fragments is also taxonomically biased. Many
species of Ascomycota, Basidiomycota, and Mucoromycota have short ITS1
regions (<200-300 bp) (Bellemain et al., 2010; Bokulich &
Mills, 2013), with a few notable exceptions (e.g. Cantharellales may
have ITS1 >1,000 bp – Feibelman et al., 1994). On the
other hand, most Zoopagomycota species we examined have an ITS1
>300 bp with significant variation between taxa. For
example, Piptocephalis and Syncephalis species have ITS1
that ranges 300 – 800 bp (Lazarus et al., 2017; Reynolds et al., 2019)
and we have also found variation among Coemansia species (TABLE
1). However, the Coemansia isolates had shorter ITS1 than many
other mock members, which likely contributed to their overrepresentation
in the ITS1F results. Species of Harpellales have extreme variation with
an ITS1 range of 250–1,000 bp (Gottlieb & Lichtwardt, 2001). Similar
length variation occurs in the ITS2 region and in some cases ITS2 is
longer than ITS1, such as in some Harpellales that have 1,100 bp for
ITS2 but only 500 bp for ITS1 (Gottlieb & Lichtwardt, 2001). Although
the fragment length for LR22F is variable among Zoopagomycota, the
variation is lower than for ITS1F. In our mock community, the difference
in fragment length between the longest and the shortestSyncephalis species was only 45 bp for LR22F versus 508 bp for
ITS1F. Likewise, the range among Piptocephalis species was 85 bp
for LR22F versus 247 bp for ITS1F.
Beyond sequencing and bioinformatics, the biology of Zoopagomycota must
also be considered. The symbiotic nature of Zoopagomycota fungi means
that their abundance in any given sample is linked with the abundance of
their host organisms and dependent on host/parasite interactions. As a
result, the distribution of these fungi is likely patchy and highly
variable through time, lowering the probability of their detection from
any single sample. Little is known about the host/parasite dynamics
among Zoopagomycota parasites, making the choice of locations and sample
types for metabarcoding less straightforward. For example, although the
mycoparasitic species can be isolated from soil, they also are
frequently isolated from dung. Are these species mainly coprophilic, or
do they actively attack hosts in the soil environment as well? How long
do their spores persist in the soil? Similarly, Zoopagales species that
attack microinvertebrates have been isolated from wet substrates like
moss and decaying plant material (Drechsler 1938; Duddington 1955). We
attempted to concentrate potential hosts of these fungi and thereby
increase our chance of detecting them by using Baermann funnels.
However, these invertebrate samples were generally dominated by
Chytridiomycota OTUs followed by Blastocladiomycota and Mucoromycota
(FIG 3). A small number of Zoopagales OTUs were detected from other
sample types by each primer set, and phylogenetic analyses supported
their identification (FIG 7). Nonetheless, the number of OTUs detected
is still less than the number of isolates recovered from these sites
using specialized culturing techniques. For example, Benny et al.,
(2016) found that 46% of 520 soil samples (mostly from Florida)
contained at least one Syncephalis species, but we only recovered
one Syncephalis OTU from two samples using the LROR primer set.
Likewise, the spores of Harpellales fungi are thought to pass between
hosts through transmission in the water column (Lichtwardt 1986).
However, we were unable to detect Harpellales OTUs from water or mud
samples, which were mostly dominated by chytrid OTUs. Species of
Kickxellales grow axenically and can be isolated from soil or dung, but
their trophic modes remain unclear. Although Kickxellales fungi are
assumed to be saprotrophic, there are reports of some species growing on
other fungi or in association with arthropods and many species exhibit
fastidious growth in culture (Jackson & Deardon, 1948; Linder 1943;
Kurihara et al., 2001, 2008). Furthermore, most species are assumed to
be rare, but the dearth of reports could be an artifact of
undersampling. Our results demonstrate that Kickxellales can be detected
from soil (FIG 3) and phylogenetic analyses indicate the clade may be
more diverse and more widely distributed than currently recognized (FIG
7).
The combined effects of methodological biases and environmental sample
heterogeneity (with symbiotic Zoopagomycota likely having lower
abundance) may have a synergistic impact on metabarcoding outcomes,
leading to artificially inflated representation of some groups and
absence of others. These biases are rooted in methodology and can affect
any group of organisms with highly variable target fragment lengths
and/or poor reference sequence representation. Castaño et al. (2020)
found such a pattern in mock communities where longer fragments added in
unequal proportions to shorter fragments were severely underrepresented
in Illumina MiSeq results while shorter fragments could be highly
overrepresented (up to 57% greater output than input). Although PacBio
RS II sequencing produced less severe discrepancies, a length bias was
still detected (Castaño et al., 2020), an effect that can also be
influenced by sample loading method (Tedersoo et al., 2017b). Our
results demonstrate that both PCR length bias and lack of reference
sequences severely impact the detection of Zoopagomycota from mixed
samples. Other lineages of EDF are similarly involved in symbiotic
associations and lack reference sequence data (e.g. Blastocladiomycota,
Rozellomycota), indicating they are likewise often missed or
misidentified in metabarcoding studies. Many unanswered questions about
Zoopagomycota remain, such as their roles in microbial food webs and
their effect on host populations. Metabarcoding has the potential to
help unravel some of these mysteries but novel approaches are needed to
overcome methodological biases, such as PCR-free on-array hybrid capture
(e.g. Mamanova et al., 2010). Combined with targeted culturing
approaches, improved environmental sampling methods can help illuminate
the diversity and ecological roles of “dark matter fungi” (Grossart et
al., 2016).