Comparisons of 16S amplicon versus shallow shotgun metagenomic sequencing
Despite the methodological biases inherent in amplicon and shotgun metagenomic sequencing approaches, the two methods yielded similar biological patterns. In a set of 13 DNA extracts sequenced with both amplicon and shallow shotgun metagenomic methods (both rarefied to a depth of 35K bacterial read pairs), Shannon diversity estimates were positively correlated between methods. Bray-Curtis dissimilarities from the two datasets were highly correlated at coarse taxonomic ranks, but the correlation declined at finer taxonomic resolutions and was absent at the ASV/species level. This is unsurprising: ASVs are defined by sequence identity rather than by taxonomy, whereas classifications of shotgun metagenomic reads are constrained to the taxonomic demarcations present in reference databases. The modest correlation in genus-level Bray-Curtis dissimilarities could arise from an inability to classify many amplicon reads to genus. Discrepancies in Bray-Curtis dissimilarities may also derive from differences between the taxonomies of reference databases rather than from biases in sequencing methods. For example, Oscillibacter, an abundant genus in the horse microbiome, is classified within the family Ruminococcaceae in the Silva 16S database but within the family Oscillospiraceae in the NCBI non-redundant database.
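The comparisons in the preceding paragraph (rarefaction to a common depth, Shannon diversity, and correlation of Bray-Curtis dissimilarities between two datasets) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the pipeline used in this study. Profiles are assumed to be `{taxon: count}` dictionaries, and samples are assumed to appear in the same order in both datasets. The matrix correlation is a Pearson-based Mantel test, which is our choice because the text does not name the test used. All helper names (`rarefy`, `collapse`, `shannon`, `bray_curtis`, `mantel`) are hypothetical. The `collapse` helper shows why reference taxonomies matter: depending on the `rank_of` mapping, the same genus can be summed into different families.

```python
import math
import random
from itertools import combinations


def rarefy(counts, depth, rng):
    """Subsample a {taxon: count} dict to `depth` reads without replacement."""
    pool = [taxon for taxon, n in counts.items() for _ in range(n)]
    if len(pool) < depth:
        raise ValueError("sample has fewer reads than the rarefaction depth")
    out = {}
    for taxon in rng.sample(pool, depth):
        out[taxon] = out.get(taxon, 0) + 1
    return out


def collapse(counts, rank_of):
    """Sum counts of taxa that share a label at a coarser rank.

    `rank_of` maps taxon -> label (e.g. genus -> family). Two reference
    taxonomies may map the same genus to different families.
    """
    out = {}
    for taxon, n in counts.items():
        out[rank_of[taxon]] = out.get(rank_of[taxon], 0) + n
    return out


def shannon(counts):
    """Shannon diversity H' = -sum(p_i * ln p_i) over observed taxa."""
    total = sum(counts.values())
    return -sum(n / total * math.log(n / total) for n in counts.values() if n > 0)


def bray_curtis(a, b):
    """Bray-Curtis: 1 - 2 * sum(min(a_i, b_i)) / (sum(a) + sum(b))."""
    shared = sum(min(a.get(t, 0), b.get(t, 0)) for t in set(a) | set(b))
    return 1 - 2 * shared / (sum(a.values()) + sum(b.values()))


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def mantel(samples_a, samples_b, permutations=999, seed=0):
    """Pearson r between two Bray-Curtis matrices over the same samples
    (same order), with a one-sided permutation p-value."""
    n = len(samples_a)
    pairs = list(combinations(range(n), 2))
    full_a = [[bray_curtis(samples_a[i], samples_a[j]) for j in range(n)] for i in range(n)]
    full_b = [[bray_curtis(samples_b[i], samples_b[j]) for j in range(n)] for i in range(n)]
    d_a = [full_a[i][j] for i, j in pairs]
    r_obs = pearson(d_a, [full_b[i][j] for i, j in pairs])
    rng = random.Random(seed)
    order = list(range(n))
    hits = 0
    for _ in range(permutations):
        # Permute sample labels of matrix B (rows and columns together).
        rng.shuffle(order)
        permuted = [full_b[order[i]][order[j]] for i, j in pairs]
        if pearson(d_a, permuted) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (permutations + 1)
```

Applied to paired extracts, one would rarefy each sample to 35,000 read pairs, optionally `collapse` it to a given rank, and then call `mantel(amplicon_profiles, shotgun_profiles)`. Repeating this at each rank would trace how the correlation changes with taxonomic resolution. For real analyses, an established implementation such as scikit-bio's Mantel test is preferable to this sketch.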
Average family-level relative abundance estimates were positively correlated between sequencing methods. However, abundant families in the Sable Island horse microbiome (Lachnospiraceae, Ruminococcaceae, Prevotellaceae, Spirochaetaceae, Rikenellaceae) tended to be over-represented in the amplicon dataset relative to the shotgun dataset. Amplicon sequencing results are biased by 16S rRNA gene copy number and primer design, while shotgun metagenomic estimates are biased by genome size or, in the context of Kaiju, by the size of gene-coding regions. Consistent with copy number bias, the weakest correlations in relative abundance estimates were observed in Ruminococcaceae and Lachnospiraceae, families known to vary widely in 16S rRNA gene copy number. Conversely, the strongest correlation was observed for Fibrobacteraceae, a narrow clade with little 16S rRNA copy number variation.
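The opposing biases described above can be illustrated with a toy model. In this model, read yield scales with 16S copies per genome for amplicon data and with genome or coding length for shotgun data. The sketch is deliberately simplified. The function names are hypothetical, and the copy numbers and lengths in the tests and usage note are placeholders, not values for the families named here. Assigning a single copy number per family also ignores the within-family variation that weakens the correlations for Ruminococcaceae and Lachnospiraceae.

```python
def relative(counts):
    """Convert {taxon: value} to proportions that sum to 1."""
    total = sum(counts.values())
    return {t: v / total for t, v in counts.items()}


def expected_amplicon(cells, copies):
    """Amplicon profile if read yield scaled only with 16S copies per genome.

    Ignores primer mismatches and PCR efficiency (assumption of this toy model).
    """
    return relative({t: cells[t] * copies[t] for t in cells})


def expected_shotgun(cells, coding_length):
    """Shotgun profile if read yield scaled only with genome size or, for a
    protein-level classifier such as Kaiju, with gene-coding length."""
    return relative({t: cells[t] * coding_length[t] for t in cells})


def copy_number_correct(amplicon_counts, copies):
    """Divide each taxon's reads by its assumed copy number and renormalise."""
    return relative({t: n / copies[t] for t, n in amplicon_counts.items()})
```

Consider two hypothetical families with equal cell counts, one carrying six 16S copies per genome and the other one. For this pair, `expected_amplicon` gives roughly 86% versus 14%. Applying `copy_number_correct` to reads in that 6:1 ratio restores 50% each, but only because the toy model assumes the copy numbers are known exactly.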
Many taxa of moderate relative abundance in the shotgun metagenomic dataset were either absent from the amplicon dataset or present there at lower-than-expected values. We also observed a clear bimodal distribution of prevalence in the amplicon dataset, wherein families were present in either nearly all samples or very few samples (Figure S4). This suggests that amplicon-based sequencing may under-represent some abundant bacterial clades, perhaps due to primer biases or 16S copy number variation. The discrepancies we observed between shallow shotgun metagenomic and 16S amplicon sequencing data are qualitatively similar to those reported in previous evaluations of shallow shotgun sequencing. However, determining whether shotgun metagenomic or 16S amplicon data more accurately estimate microbiome features requires communities of known composition.
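The prevalence pattern and the under-detection of moderately abundant taxa can be quantified along the following lines. This is a hedged sketch. The function names are hypothetical. The detection threshold (`min_count`), the number of histogram bins, and the cut-offs in `under_detected` (`min_shotgun`, `max_ratio`) are arbitrary assumptions for illustration, not thresholds used in this study. A bimodal prevalence distribution appears as taxa concentrated in the first and last histogram bins.

```python
def prevalence(samples, min_count=1):
    """Fraction of samples in which each taxon has at least `min_count` reads."""
    taxa = set().union(*samples)
    n = len(samples)
    return {t: sum(1 for s in samples if s.get(t, 0) >= min_count) / n for t in taxa}


def prevalence_histogram(prev, bins=5):
    """Count taxa in equal-width prevalence bins on [0, 1]; 1.0 goes in the last bin."""
    counts = [0] * bins
    for p in prev.values():
        counts[min(int(p * bins), bins - 1)] += 1
    return counts


def under_detected(shotgun_rel, amplicon_rel, min_shotgun=0.001, max_ratio=0.1):
    """Taxa at >= `min_shotgun` relative abundance in the shotgun profile whose
    amplicon abundance is below `max_ratio` times the shotgun value
    (taxa absent from the amplicon profile count as 0)."""
    return sorted(
        t for t, p in shotgun_rel.items()
        if p >= min_shotgun and amplicon_rel.get(t, 0.0) < max_ratio * p
    )
```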
Kaiju-based classification of shotgun metagenomic reads has been reported to estimate taxon relative abundances more accurately than 16S amplicon sequencing. Similarly, a previous benchmarking study found that shallow shotgun sequencing recapitulated communities of known composition more accurately than 16S amplicon data. These benchmarking studies, together with the similar biological patterns described by our shotgun metagenomic and amplicon datasets, lead us to conclude that shallow shotgun sequencing provides a suitable, if not superior, substitute for 16S rRNA gene amplicon-based characterization of the microbiome.