Identifying core genomes of gut bacterial species
To enable scans for balancing selection across the gut bacterial species
of humans, all 154,723 metagenome-assembled genomes from Pasolli et
al. (2019) were downloaded from
http://segatalab.cibio.unitn.it/data/Pasolli_et_al.html. Of the
species-level genome bins (SGBs) into which these genomes are grouped,
bacterial SGBs represented by >100 genomes were retained for downstream
analyses of intraspecific patterns of DNA polymorphism. To identify the
core genomes of the well-represented SGBs,
Open Reading Frame (ORF) Finder from the National Center for
Biotechnology Information was used with default settings to identify
all ORFs in each of the genomes. Next, CoreCruncher (Harris et al.,
2021) was used with default settings to identify the core set of ORFs
for each SGB. In this analysis, the largest genome from each SGB was
chosen as the pivot genome from which to identify the core ORFs (CORFs).
All CORFs were then annotated against the updated Clusters of
Orthologous Groups (COG) database (Galperin et al., 2021).
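
The SGB filtering and pivot-genome selection described above can be
sketched as follows. This is a minimal illustration, not the pipeline
used in the study: the function name, the input layout (one tuple of
genome ID, SGB ID, and assembly length per genome), and the reading of
"largest genome" as the greatest total assembly length in base pairs
are all assumptions made for the example.

```python
from collections import defaultdict


def select_sgbs_and_pivots(genomes, min_genomes=100):
    """Return {sgb_id: pivot_genome_id} for well-represented SGBs.

    genomes: iterable of (genome_id, sgb_id, length_bp) tuples
             (hypothetical layout, assumed for this sketch).
    min_genomes: an SGB is kept only if it has MORE than this many genomes.
    """
    # Group genomes by the SGB they belong to.
    members_by_sgb = defaultdict(list)
    for genome_id, sgb_id, length_bp in genomes:
        members_by_sgb[sgb_id].append((genome_id, length_bp))

    pivots = {}
    for sgb_id, members in members_by_sgb.items():
        # Keep only SGBs represented by >100 genomes (strict inequality).
        if len(members) > min_genomes:
            # Assumption: the "largest" genome is the longest assembly.
            # On ties, max() keeps the first genome encountered.
            pivot_id, _ = max(members, key=lambda m: m[1])
            pivots[sgb_id] = pivot_id
    return pivots
```

The resulting pivot genome for each retained SGB would then be passed
to CoreCruncher as the reference from which the CORFs are identified.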