Results
On average, 87.2% ± 3.9% SD of shotgun metagenomic read pairs
passed quality control. Of these, an average of 0.16% ± 0.8% SD of reads were removed after filtering against the EquCab3 reference
genome. While some samples had large amounts of host contamination (max
= 27%), 95.5% of samples were found to contain < 2% horse
DNA. However, investigators should be cognisant that samples from other host
species have been shown to contain higher proportions of host DNA . Of the reads that
passed quality control and filtering, 54.3% ± 4.1% SD were
identified as bacterial, 0.5% ± 0.2% SD as archaeal, and 0.6%
± 0.1% SD as microbial eukaryotic. The unclassified fraction of
reads might derive from: (1)
microbiota not present in the reference database, (2) non-coding regions
of microbial genomes, (3) parasitic nematodes in the intestinal tract,
(4) other contaminant DNA, or (5) DNA derived from dietary sources.
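The reported means can be combined into a rough, first-order estimate of how these losses compound. The sketch below is an illustration, not the pipeline used in this study. It multiplies the mean proportions, ignoring per-sample variation and any covariation between steps. From these it derives two quantities: the share of filtered reads not assigned to bacteria, archaea, or microbial eukaryotes, and the approximate fraction of raw read pairs that end up as bacterial reads. All constant and function names are hypothetical.

```python
# Mean proportions reported in this section.
QC_PASS = 0.872        # read pairs passing quality control
HOST_REMOVED = 0.0016  # QC-passed reads removed as EquCab3 (horse) reads
BACTERIAL = 0.543      # filtered reads classified as bacterial
ARCHAEAL = 0.005       # filtered reads classified as archaeal
EUKARYOTIC = 0.006     # filtered reads classified as microbial eukaryotic


def unassigned_fraction(*domain_fractions: float) -> float:
    """Share of filtered reads not assigned to any of the listed domains."""
    return 1.0 - sum(domain_fractions)


def bacterial_yield(qc_pass: float, host_removed: float, bacterial: float) -> float:
    """Approximate fraction of raw read pairs retained as bacterial reads.

    Multiplying means ignores per-sample covariation, so this is only a
    first-order estimate.
    """
    return qc_pass * (1.0 - host_removed) * bacterial


if __name__ == "__main__":
    print(f"unassigned: {unassigned_fraction(BACTERIAL, ARCHAEAL, EUKARYOTIC):.3f}")  # unassigned: 0.446
    print(f"bacterial yield: {bacterial_yield(QC_PASS, HOST_REMOVED, BACTERIAL):.4f}")  # bacterial yield: 0.4727
```

On these averages, roughly 45% of filtered reads fall outside the three classified domains. Only about 47% of raw read pairs would be expected to survive as bacterial reads.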
Prior to the taxonomic analyses reported below, we removed reads that
were not classified as bacterial. Therefore, when determining requisite
sequencing depths, we caution investigators to account for sequence
losses due to low-quality reads and to reads derived from host or
dietary DNA or, if using a translated search method, from microbial
non-coding regions. Investigators should also be prepared for uneven sequencing
depths when multiplexing large numbers of samples. Our sequencing of a
188-sample pilot dataset on an Illumina NovaSeq resulted in a median
depth of 4,105,972 read pairs, which is remarkably close to our target
depth of 4.1 million read pairs. However, observed sequencing depths
were also highly variable (SD ± 1.9 million read pairs; Figure
S2). Therefore, like , we advise researchers to sequence more
deeply than their identified minimum sequencing depth to minimize
sample drop-out.
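To show why a buffer above the minimum depth matters, the pilot-run figures can be plugged into a simple planning calculation. This is a minimal sketch under several assumptions, none of them established by this study:

- Per-sample depths are treated as normally distributed, with mean equal to the reported median. The absolute SD of 1.9 million read pairs is held fixed when the target changes. Real depth distributions are often skewed, and a normal curve with this spread even assigns a small probability to negative depths.
- The minimum of 1,000,000 bacterial read pairs per sample is a hypothetical value.
- The yield factor simply multiplies the mean QC, host-filtering, and bacterial-classification proportions reported above.

The function names are likewise hypothetical. The sketch uses Python's standard-library `statistics.NormalDist`.

```python
from statistics import NormalDist

# Reported for the 188-sample NovaSeq pilot run (read pairs per sample).
MEDIAN_DEPTH = 4_105_972
SD_DEPTH = 1_900_000

# Mean-based bacterial yield from the proportions reported in this section:
# QC pass rate x (1 - host fraction removed) x bacterial fraction.
BACTERIAL_YIELD = 0.872 * (1 - 0.0016) * 0.543

# Hypothetical planning input, not a value derived in this study.
MIN_BACTERIAL_PAIRS = 1_000_000


def raw_minimum(min_bacterial: float, yield_fraction: float) -> float:
    """Raw read pairs needed for min_bacterial pairs to survive filtering on average."""
    return min_bacterial / yield_fraction


def dropout_rate(target_depth: float, sd: float, threshold: float) -> float:
    """Expected fraction of samples below threshold, assuming normally distributed depths."""
    return NormalDist(mu=target_depth, sigma=sd).cdf(threshold)


def target_for_dropout(threshold: float, sd: float, max_dropout: float) -> float:
    """Mean depth at which only max_dropout of samples fall below threshold.

    Assumes the absolute SD stays fixed as the target depth changes.
    """
    z = NormalDist().inv_cdf(max_dropout)  # negative when max_dropout < 0.5
    return threshold - z * sd


if __name__ == "__main__":
    floor = raw_minimum(MIN_BACTERIAL_PAIRS, BACTERIAL_YIELD)
    print(f"raw floor: {floor:,.0f} read pairs")
    print(f"expected drop-out at pilot target: {dropout_rate(MEDIAN_DEPTH, SD_DEPTH, floor):.1%}")
    print(f"target for 5% drop-out: {target_for_dropout(floor, SD_DEPTH, 0.05):,.0f} read pairs")
```

Under these assumptions, roughly 15% of samples would fall below the raw floor of about 2.1 million read pairs at the pilot target. The target would need to rise to about 5.2 million read pairs to keep expected drop-out near 5%. If the SD instead scaled with the target (a constant coefficient of variation), the required target would be higher still.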