Figure 1. Patterns of intraspecific polymorphism are associated with bacterial phylogenetic history. (A) Phylogenetic tree shows the evolutionary relationships among the Species-level Genome Bins (SGBs) in the human gut for which >100 metagenome-assembled genomes were recovered by Pasoli et al. (2019). Colors correspond to bacterial phyla as indicated by the inset. The phylogeny was constructed with IQTree2, and all internal nodes were supported by >70% of 1000 ultrafast bootstrap replicates. Barplots encircling the phylogeny show genome-wide Tajima’s D for SGBs, with larger bars indicating more negative values. Bars are colored based on whether the value is below (dark grey) or above (light grey) the median value across all SGBs. Arcs on the outside of the phylogeny indicate clades corresponding to bacterial genera that displayed significantly different genome-wide Tajima’s D estimates at the p < 0.001 significance threshold. (B) Boxplots show the median and interquartile range of per-species genome-wide Tajima’s D estimates for bacterial genera represented by >6 SGBs. Horizontal bars indicate significant differences between genera after false discovery rate correction for multiple pairwise tests; p < 0.05 *; p < 0.001 ***.
Figure 2. Balancing selection targets the active sites of multidrug efflux pumps in multiple prominent gut bacterial species. (A) Histogram shows the distribution of Tajima’s D values across all 566,958 CORFs analyzed from 287 gut bacterial species. Highlighted are CORFs from multidrug efflux pumps, including AcrB and AcrA subunits, from multiple prominent gut bacterial species. (B) Crystal structure of a multidrug efflux pump from Escherichia coli containing AcrB subunits homologous to the top hit identified in Bacteroides dorei (PDB ID: 4DX5). (C) Binding pocket of the AcrB homolog in E. coli interacting with the ligand erythromycin A (PDB ID: 4ZJO).
Figure 3. Signatures of balancing selection predict relative abundances of gut bacterial species. (A) Scatter plot shows the relationship between the maximum Tajima’s D value for a CORF in an SGB and the relative abundance of the SGB in the human gut as estimated by CoverM. The SGBs with ORFs displaying the top five Tajima’s D values (Table S1) are labelled. A polynomial regression was fitted in R, and asterisks show the significance of the linear, quadratic, and cubic coefficients; p < 0.05 *, p < 0.01 **, p < 0.001 ***. (B) Phylomorphospace plot shows the projection of the phylogenetic tree of SGBs onto the space defined by the relative abundances of the SGBs and the maximum Tajima’s D values for a CORF in the SGBs. Large circles represent tips, and smaller circles represent states at ancestral nodes inferred under a Brownian motion model of evolution. Asterisk indicates significance of the PGLS; * p < 0.05.