Discussion
Comparisons of 118,617 metagenome-assembled genomes from 287 gut bacterial species enabled the identification of genes targeted by balancing selection in the human gut. Results revealed that multidrug efflux pumps (MEPs) display the strongest signatures of balancing selection of any gut bacterial core open reading frames (CORFs). MEPs from a diversity of prominent gut bacterial species, including Bacteroides and Bifidobacterium, displayed evidence of balancing selection (Table S1, Figure 2), suggesting that adaptive allelic variation within these loci has been maintained in parallel in multiple bacterial lineages. MEPs were also overrepresented among the CORFs displaying the highest Tajima’s D values, further supporting the conclusion that balancing selection shapes allelic variation within this functional category of loci.
MEPs serve myriad functions for bacteria, including the extrusion of antibiotics that are commonly used as medicines. Previous work has shown that antibiotic therapies can act as harsh selective agents in the gut, reshaping the community composition of the gut microbiota (Modi et al., 2014) as well as the adaptive trajectories of individual gut bacterial lineages (Banerjee et al., 2021; Card et al., 2021). The findings reported here are consistent with the possibility that medical antibiotic use also contributes to the maintenance of allelic variation within multiple prominent gut bacterial species. In particular, the observation that the CORF displaying the highest Tajima’s D value was a homolog of the AcrB subunit of the RND superfamily of multidrug efflux pumps (Figure 2) suggests medical antibiotic usage as an agent of balancing selection. In Escherichia coli, the periplasmic distal binding pocket of the AcrB subunit binds minocycline, a tetracycline antibiotic, and erythromycin A, a macrolide antibiotic (Du et al., 2018). Moreover, allelic variation at this locus in E. coli has been shown to contribute to antibiotic resistance (Okusu et al., 1996; Blair et al., 2015). However, MEPs are widely distributed among bacterial genomes and serve ancient functions that predate the usage of antibiotics in medical contexts (Blanco et al., 2016), including the extrusion of heavy metals, organic pollutants, plant-produced compounds, and bacterial metabolites. Therefore, it is possible that selective agents other than medical antibiotics contribute to the maintenance of allelic variation in MEP loci displaying positive Tajima’s D values. If medical antibiotics are in fact driving balancing selection in the MEP loci identified, the results presented here imply that these treatments may be among the most influential selective agents maintaining allelic variation in the human gut.
Positive Tajima’s D values are consistent with a history of balancing selection, but they can also be caused by fluctuations in population size. In particular, recent population contractions can generate positive Tajima’s D values in the absence of balancing selection. Here, Tajima’s D was estimated genome-wide for each CORF in each bacterial species analyzed, allowing identification of loci that deviated substantially from the genomic background. This approach provided tests for balancing selection that accounted for genome-wide patterns of nucleotide variation caused by demographic processes. For example, genome-wide Tajima’s D values differed significantly among bacterial clades, with species within the Bifidobacterium displaying the most negative values and species within the Bacteroides displaying the most positive values. These differences among genome-wide Tajima’s D values likely reflect differences among the demographic histories of these clades, whereas loci with Tajima’s D values that deviate from the genomic background are more likely to represent targets of balancing selection.
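The logic of comparing each locus against its genomic background can be illustrated with a minimal sketch. It is not the pipeline used in this study. It assumes gap-free, equal-length nucleotide alignments for each CORF, and it summarizes deviation from the background as a z-score against the species-wide mean and standard deviation of D. That summary is only one possible choice and is an assumption made here, as are the function names tajimas_d and background_z_scores.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, stdev


def tajimas_d(seqs):
    """Tajima's D (Tajima, 1989) for aligned, gap-free, equal-length sequences."""
    n = len(seqs)
    if n < 4 or len({len(s) for s in seqs}) != 1:
        raise ValueError("need at least 4 aligned sequences of equal length")
    # S: number of segregating (polymorphic) alignment columns
    s = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    if s == 0:
        return float("nan")  # D is undefined without polymorphism
    # pi: mean number of differences over all pairs of sequences
    n_pairs = n * (n - 1) / 2
    pi = sum(sum(a != b for a, b in zip(x, y))
             for x, y in combinations(seqs, 2)) / n_pairs
    # Constants of the variance term, which depend only on sample size
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    return (pi - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


def background_z_scores(d_by_locus):
    """Z-score of each locus's D against the species-wide distribution.

    Assumption: mean and standard deviation summarize the genomic background.
    """
    values = list(d_by_locus.values())
    mu, sd = mean(values), stdev(values)
    if sd == 0:
        raise ValueError("no variation in D across loci")
    return {locus: (d - mu) / sd for locus, d in d_by_locus.items()}
```

Under this sketch, a demographic event that shifts D at every locus in a species also shifts the background mean, so only loci that stand out relative to the rest of the same genome receive large z-scores.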
Multidrug efflux pumps were significantly enriched among the CORFs displaying the highest Tajima’s D values, but this set of CORFs also included a diversity of other functional categories of proteins. Magnesium transporters, helicases, and various synthases and hydrolases were all represented among the CORFs with the highest Tajima’s D values (Table S1). Allelic variation in these enzymes may be maintained by cyclic fluctuations in the availabilities of different substrates. The CORFs with Tajima’s D values greater than three (Table S1) represent excellent candidates for experimental study of the functional consequences of allelic variation within these loci.
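One common way to assess enrichment of a functional category among top-ranked loci is a one-sided hypergeometric test. The sketch below illustrates that calculation. The specific test used for the enrichment reported here is not stated in this section, so the choice of test, the function name hypergeom_enrichment_p, and its parameter names are all assumptions made for illustration.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, drawn, hits):
    """One-sided P(X >= hits) under the hypergeometric distribution.

    population: total number of loci considered
    annotated:  loci in the population that belong to the category
    drawn:      loci in the top-ranked set
    hits:       top-ranked loci that belong to the category
    """
    if not (0 <= hits <= min(annotated, drawn)
            and max(annotated, drawn) <= population):
        raise ValueError("inconsistent counts")
    total = comb(population, drawn)
    upper = min(annotated, drawn)
    # Sum the probability of observing `hits` or more category members
    tail = sum(comb(annotated, k) * comb(population - annotated, drawn - k)
               for k in range(hits, upper + 1))
    return tail / total
```

For example, with purely hypothetical counts, hypergeom_enrichment_p(1000, 20, 50, 8) gives the probability of finding at least 8 category members among 50 top-ranked loci when only 20 of 1,000 loci belong to the category.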
Interestingly, the bacterial species that contained CORFs with the highest Tajima’s D values also tended to be the most abundant bacterial species in the human gut based on metagenomic data (Figure 3). This positive association suggests a relationship between balancing selection and fitness in human gut bacteria. One possible explanation for this pattern is that balancing selection is more effective in more abundant bacterial species, given that selection is in general expected to be more efficient in larger populations than in smaller populations (Lanfear et al., 2014). Alternatively, the CORFs identified as targets of balancing selection may confer fitness benefits to bacterial species, increasing their competitive advantage over other species in the gut. This hypothesis is supported by the observation that the relationship between balancing selection and relative abundance in the gut remained evident after controlling for bacterial phylogenetic history (Figure 3B). Under this scenario, the allelic variation in MEPs displaying evidence of balancing selection may underlie the success in the human gut of lineages like Bacteroides spp., which are overrepresented in industrialized human populations relative to non-industrialized populations and non-human primates (Yatsunenko et al., 2011; Moeller et al., 2014; Sonnenburg and Sonnenburg, 2019).