Bioinformatics
All shotgun metagenomic reads underwent adapter trimming and quality control, and were filtered against the EquCab3 domestic horse reference genome, using default parameters in kneaddata, a wrapper for Trimmomatic and Bowtie2. Read pairs that passed quality filtering were used to estimate microbial taxon relative abundances with Kaiju, run with default settings. Unlike nucleotide-based profilers, Kaiju assigns taxon identities to reads by first translating sequences in all six reading frames and then mapping the resulting amino acid sequences to a protein reference database. Protein-coding regions are more strongly conserved than non-coding regions, and amino acid sequences are unaffected by synonymous nucleotide substitutions. This can allow higher classification rates for shotgun metagenomic reads from microbiota that are underrepresented in reference databases. Using the microbial subset of the NCBI BLAST non-redundant protein database, we first classified reads to ‘species’, a delineation that in practice encompasses species, strains and co-abundant gene groups. Reads that could not be assigned to species were assigned to progressively coarser taxonomic levels (genus, family, order, class, phylum, kingdom). For example, reads that could be classified to family, but not to genus or species, would form a ‘familyX unclassified’ bin in our analyses.
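The fallback to coarser taxonomic levels described above can be illustrated with a minimal Python sketch. This is not the code used in the study: the function names, the dictionary representation of a read's lineage, the example reads, and the choice to keep fully unclassified reads in the denominator are all assumptions made for illustration.

```python
from collections import Counter

# Ranks ordered from finest to coarsest, as listed in the text.
RANKS = ["species", "genus", "family", "order", "class", "phylum", "kingdom"]


def assign_bin(lineage):
    """Return the bin label for one read.

    `lineage` (hypothetical structure) maps rank name -> taxon name;
    ranks that could not be resolved are absent or None.
    """
    if lineage.get("species"):
        return lineage["species"]
    # Walk up to the finest rank that was resolved and mark the bin
    # as unclassified below that rank.
    for rank in RANKS[1:]:
        name = lineage.get(rank)
        if name:
            return f"{name} unclassified"
    return "unclassified"


def relative_abundances(lineages):
    """Fraction of reads per bin (assumption: all reads count in the total)."""
    counts = Counter(assign_bin(lineage) for lineage in lineages)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: n / total for label, n in counts.items()}


# Hypothetical reads, for illustration only.
example_reads = [
    {"species": "Fibrobacter succinogenes", "genus": "Fibrobacter",
     "family": "Fibrobacteraceae"},
    {"genus": "Fibrobacter", "family": "Fibrobacteraceae"},
    {"family": "Lachnospiraceae"},
    {"family": "Lachnospiraceae"},
]
example_profile = relative_abundances(example_reads)
```

In this toy input, the two reads resolved only to family fall into a single ‘Lachnospiraceae unclassified’ bin, which is kept separate from any species-level bins within that family.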
To estimate microbial gene content and metabolic potential, we used HUMAnN3. HUMAnN3 maps quality-controlled and filtered shotgun metagenomic reads to UniProt Reference Clusters (UniRef). In running HUMAnN3, we concatenated forward and reverse read files, bypassed the taxonomic classifier option, reduced subject sequence coverage thresholds to 0, classified reads to UniRef50 gene families, and mapped gene families to MetaCyc reactions and metabolic pathways. The methods used for quality-control filtering and taxonomic classification of 16S rRNA amplicon sequence data using DADA2 are described elsewhere.
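For the HUMAnN3 input step, concatenating forward and reverse read files amounts to appending one file to the other so that the reads are supplied as a single input. A minimal sketch follows, assuming both files use the same format (both plain FASTQ or both gzip-compressed); the function name and the file names in the usage comment are hypothetical, and this is not necessarily how the concatenation was performed in the study.

```python
import shutil
from pathlib import Path


def concatenate_reads(forward, reverse, output):
    """Write the forward-read file followed by the reverse-read file to `output`.

    Files are copied byte for byte, so both inputs must share the same
    format. Concatenated gzip files remain a valid gzip stream.
    """
    with open(output, "wb") as out:
        for path in (forward, reverse):
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)
    return Path(output)


# Hypothetical usage:
# concatenate_reads("sample_R1.fastq.gz", "sample_R2.fastq.gz",
#                   "sample_concat.fastq.gz")
```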