Results
On average, 87.2% ± 3.9% SD of shotgun metagenomic read pairs passed quality control. Of these, 0.16% ± 0.8% SD of reads were removed, on average, after filtering against the EquCab3 horse reference genome. While some samples contained substantial host contamination (max = 27%), 95.5% of samples contained < 2% horse DNA. However, investigators should be cognisant that samples from other hosts have been shown to contain higher proportions of host DNA . Of the reads that passed quality control and host filtering, 54.3% ± 4.1% SD were classified as bacterial, 0.5% ± 0.2% SD as archaeal, and 0.6% ± 0.1% SD as microbial eukaryotic. The remaining unclassified reads might derive from: (1) microbiota not represented in the reference database, (2) non-coding regions of microbial genomes, (3) parasitic nematodes in the intestinal tract, (4) other contaminant DNA, or (5) dietary DNA.
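To illustrate how these losses compound, the sketch below chains the mean per-step fractions reported above (QC pass rate, host removal, bacterial classification rate) to estimate how many bacterial read pairs a given raw depth yields, and inverts the chain to estimate the raw depth needed for a desired number of bacterial read pairs. It is a minimal sketch under stated assumptions: it uses only the cohort means, ignores per-sample variability, and treats the steps as simple multiplicative losses. The function names are hypothetical and do not correspond to any tool used in this study.

```python
# Mean per-step fractions reported in this section (cohort averages; per-sample SD ignored).
QC_PASS = 0.872        # fraction of read pairs passing quality control
HOST_REMOVED = 0.0016  # fraction of QC-passed reads removed as horse DNA (EquCab3)
BACTERIAL = 0.543      # fraction of QC-passed, host-filtered reads classified as bacterial


def expected_bacterial_pairs(raw_pairs: float) -> float:
    """Hypothetical helper: expected bacterial read pairs from a raw sequencing depth."""
    after_qc = raw_pairs * QC_PASS
    after_host = after_qc * (1 - HOST_REMOVED)
    return after_host * BACTERIAL


def raw_pairs_needed(bacterial_target: float) -> float:
    """Hypothetical helper: raw read pairs needed to reach a target number of bacterial pairs."""
    retained = QC_PASS * (1 - HOST_REMOVED) * BACTERIAL
    return bacterial_target / retained
```

For example, `expected_bacterial_pairs(4_105_972)` gives roughly 1.9 million bacterial read pairs, i.e. under these mean rates only a little over half of the raw read pairs would be retained for the bacterial taxonomic analyses.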
Prior to the taxonomic analyses reported below, we removed reads that were not classified as bacterial. Therefore, when determining the required sequencing depth, we caution investigators to account for sequence losses due to low-quality reads and to reads derived from host DNA, dietary DNA, or microbial non-coding regions (if using a translated search method). Investigators should also be prepared for uneven sequencing depths when multiplexing large numbers of samples. Sequencing a 188-sample pilot dataset on an Illumina NovaSeq yielded a median depth of 4,105,972 read pairs, remarkably close to our target depth of 4.1 million read pairs. However, observed sequencing depths were also highly variable (SD ± 1.9 million read pairs; Figure S2). Therefore, like , we would advise researchers to sequence more deeply than their identified minimal sequencing depth, to minimize sample drop-out.
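One way to decide how much deeper to sequence is to estimate the fraction of samples expected to fall below a minimal depth. The sketch below does this using the pilot median (4,105,972 read pairs) and SD (1.9 million read pairs) reported above. It rests on two assumptions not tested in this study: that per-sample depths are approximately normally distributed, and that their variability scales with the target depth (a constant coefficient of variation). Real depth distributions are often skewed, so treat the output as a rough planning aid only. The function names are hypothetical.

```python
from statistics import NormalDist

# Pilot values reported in this section (188 samples, Illumina NovaSeq).
PILOT_MEDIAN = 4_105_972  # median read pairs per sample
PILOT_SD = 1_900_000      # SD of read pairs per sample (rounded)
# Assumption: SD scales with target depth, i.e. constant coefficient of variation.
CV = PILOT_SD / PILOT_MEDIAN


def dropout_fraction(target_depth: float, min_depth: float, cv: float = CV) -> float:
    """Expected fraction of samples below min_depth, assuming normally distributed depths."""
    return NormalDist(mu=target_depth, sigma=cv * target_depth).cdf(min_depth)


def target_for_dropout(min_depth: float, max_dropout: float, cv: float = CV) -> float:
    """Target depth so that at most max_dropout of samples fall below min_depth."""
    z = NormalDist().inv_cdf(max_dropout)  # negative when max_dropout < 0.5
    denom = 1 + z * cv
    if denom <= 0:
        # The normal model cannot meet this drop-out rate at any target depth.
        raise ValueError("drop-out target unattainable under this variability model")
    return min_depth / denom
```

Under these assumptions, keeping expected drop-out at 5% for a minimal depth of 1 million read pairs calls for a target of roughly 4.2 million read pairs. Much stricter drop-out rates raise a `ValueError`, a reminder that with variability this high the normal model itself breaks down and empirical depth distributions from a pilot run are a better guide.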