Identifying core genomes of gut bacterial species
To enable scans for balancing selection across the gut bacterial species of humans, all 154,723 metagenome-assembled genomes from Pasolli et al. (2019) were downloaded from http://segatalab.cibio.unitn.it/data/Pasolli_et_al.html. These genomes are grouped into species-level genome bins (SGBs). Bacterial SGBs represented by >100 genomes were retained for downstream analyses of intraspecific patterns of DNA polymorphism. To identify the core genomes of these well-represented SGBs, Open Reading Frame (ORF) Finder from the National Center for Biotechnology Information was used with default settings to identify all ORFs in each genome. Next, CoreCruncher (Harris et al., 2021) was used with default settings to identify the core set of ORFs for each SGB. In this analysis, the largest genome from each SGB was chosen as the pivot genome from which to identify the core ORFs (CORFs). All CORFs were then annotated against the updated Clusters of Orthologous Groups database (Galperin et al., 2021).
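The SGB filtering and pivot selection steps can be illustrated with a minimal sketch. It is not the pipeline used in this study: the function names, the input dictionaries, and the toy values are hypothetical, and "largest genome" is assumed here to mean the greatest total assembly length in base pairs. ORF prediction and core-genome inference (ORF Finder, CoreCruncher) are not reproduced.

```python
from collections import defaultdict


def select_sgbs(genome_to_sgb, min_genomes=100):
    """Group genomes by SGB and keep SGBs with MORE than `min_genomes` genomes.

    `genome_to_sgb` is a hypothetical mapping of genome ID -> SGB ID.
    """
    members = defaultdict(list)
    for genome_id, sgb_id in genome_to_sgb.items():
        members[sgb_id].append(genome_id)
    # Strict inequality mirrors the ">100 genomes" criterion in the text.
    return {
        sgb_id: sorted(genomes)
        for sgb_id, genomes in members.items()
        if len(genomes) > min_genomes
    }


def pick_pivot(genomes, genome_lengths):
    """Return the largest genome of an SGB to serve as the pivot.

    Assumption: size is total assembly length in bp. Ties are resolved
    by taking the first genome in the order of `genomes`.
    """
    return max(genomes, key=lambda g: genome_lengths[g])


if __name__ == "__main__":
    # Toy data only; a threshold of 2 stands in for the real threshold of 100.
    genome_to_sgb = {"g1": "SGB_A", "g2": "SGB_A", "g3": "SGB_A", "g4": "SGB_B"}
    genome_lengths = {"g1": 2_100_000, "g2": 2_450_000, "g3": 1_900_000, "g4": 3_000_000}

    retained = select_sgbs(genome_to_sgb, min_genomes=2)
    for sgb_id, genomes in retained.items():
        print(sgb_id, len(genomes), pick_pivot(genomes, genome_lengths))
```

With the toy data, only SGB_A passes the filter and g2 is selected as its pivot. In practice the resulting pivot would be passed to the core-genome tool along with the remaining genomes of the same SGB.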