Figure 1. Patterns of intraspecific polymorphism are associated
with bacterial phylogenetic history. (A) Phylogenetic tree shows the
evolutionary relationships among the Species-level Genome Bins (SGBs) in
the human gut for which >100 metagenome-assembled genomes
were recovered by Pasoli et al. (2019). Colors correspond to bacterial
phyla as indicated by the inset. Phylogeny was constructed with IQTree2,
and all internal nodes were supported by >70% of 1000
ultrafast bootstrap replicates. Bar plots encircling the phylogeny show
genome-wide Tajima’s D for SGBs, with larger bars indicating more
negative values. Bars are colored based on whether the value is below
(dark grey) or above (light grey) the median value across all SGBs. Arcs
on the outside of the phylogeny indicate clades corresponding to
bacterial genera that displayed significantly different genome-wide
Tajima’s D estimates at the p < 0.001 significance
threshold. (B) Boxplots show the median and interquartile range of
per-species genome-wide Tajima’s D estimates for bacterial genera
represented by >6 SGBs. Horizontal bars indicate pairs of genera whose
Tajima’s D estimates differed significantly after false
discovery rate correction for multiple pairwise
tests; p < 0.05 *; p < 0.001 ***.
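The false discovery rate correction mentioned in this legend can be illustrated with a minimal sketch. The legend does not name the procedure used, so the Benjamini-Hochberg method below is an assumption, and the function names `benjamini_hochberg` and `significance_stars` are hypothetical helpers introduced only for illustration.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Assumption: the legend says only "false discovery rate correction";
    Benjamini-Hochberg is one common choice, not necessarily the one used.
    """
    m = len(pvalues)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significance_stars(p_adjusted):
    """Map an adjusted p-value to the star labels used in the legend."""
    if p_adjusted < 0.001:
        return "***"
    if p_adjusted < 0.05:
        return "*"
    return ""
```

Each pairwise comparison between genera would contribute one raw p-value; the adjusted values, not the raw ones, would then be compared against the thresholds shown in the legend.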
Figure 2. Balancing selection targets the active sites of
multidrug efflux pumps in multiple prominent gut bacterial species. A)
Histogram shows the distribution of Tajima’s D values across all 566,958
CORFs analyzed from 287 gut bacterial species. Highlighted are CORFs
from multidrug efflux pumps, including AcrB and AcrA subunits, from
multiple prominent gut bacterial species. B) Crystal structure of a
multidrug efflux pump from Escherichia coli containing homologous
AcrB subunits to the top hit identified in Bacteroides dorei (PDB
ID: 4DX5) C) Binding pocket of AcrB homolog in E. coli interacting with
ligand Erythromycin A (PDB ID: 4ZJO).
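For readers unfamiliar with the statistic plotted in these figures, the sketch below computes Tajima’s D from a small alignment using the standard formula of Tajima (1989). It is an illustration only: the legends do not state which software computed D, the function name `tajimas_d` is hypothetical, and the toy sequences are invented for the example. The sketch assumes gap-free sequences of equal length.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(seqs):
    """Tajima's D for a list of aligned, gap-free, equal-length sequences."""
    n = len(seqs)
    if n < 4:
        raise ValueError("need at least 4 sequences")
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to equal length")

    # S: number of segregating (polymorphic) alignment columns.
    s_sites = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    if s_sites == 0:
        return None  # D is undefined when there is no polymorphism

    # pi: mean number of differences over all pairs of sequences.
    n_pairs = n * (n - 1) / 2
    total_diffs = sum(
        sum(a != b for a, b in zip(s1, s2))
        for s1, s2 in combinations(seqs, 2)
    )
    pi = total_diffs / n_pairs

    # Constants from Tajima (1989).
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    # Difference between the two diversity estimators, scaled by its
    # estimated standard deviation.
    variance = e1 * s_sites + e2 * s_sites * (s_sites - 1)
    return (pi - s_sites / a1) / sqrt(variance)


# Toy data (hypothetical): singleton variants versus evenly split variants.
rare_variants = ["AAAA", "TAAA", "ATAA", "AATA", "AAAT", "AAAA"]
balanced_variants = ["AAAA"] * 3 + ["TTTT"] * 3
```

With these toy alignments, `tajimas_d(rare_variants)` is negative, reflecting an excess of rare variants, while `tajimas_d(balanced_variants)` is positive, the pattern expected when variants are held at intermediate frequencies.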
Figure 3. Signatures of balancing selection predict relative
abundances of gut bacterial species. A) Scatter plot shows the
relationship between the maximum Tajima’s D value for a CORF in an SGB
and the relative abundance of the SGB in the human gut as estimated by
CoverM. The SGBs with ORFs displaying the top five Tajima’s D values
(Table S1) are labelled. A polynomial regression was fitted in R, and
asterisks show the significance of the linear, quadratic, and cubic
coefficients; p < 0.05 *, p < 0.01 **,
p < 0.001 ***. B) Phylomorphospace plot
shows the projection of the phylogenetic tree of SGBs onto the space of
relative abundances of the SGBs and maximum Tajima’s D values for a CORF
in the SGBs. Large circles represent tips, and smaller circles
represent states at ancestral nodes inferred under a Brownian motion
model of evolution. Asterisk indicates significance of the phylogenetic
generalized least squares (PGLS) regression; * p < 0.05.