Analyses of protein structure and chemistry
To visualize the putative tertiary structure of proteins encoded by CORFs under balancing selection, the Protein Data Bank (www.rcsb.org; Burley et al., 2021) was searched for homologs with resolved structures. Top hits from Escherichia coli were used for downstream analysis. To identify the putative location of CORFs under balancing selection within the E. coli protein complex, CORF sequences from the representative MAGs for SGBs were aligned against the sequence of the E. coli protein and visualized with the Mol* 3D Viewer. Disorder was estimated along the protein sequence with IUPred2 (Mészáros et al., 2018), and hydrophobicity was calculated using the Kyte and Doolittle method (Kyte and Doolittle, 1982) with a window size of 21.
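
The sliding-window hydrophobicity calculation can be illustrated with a minimal sketch. It is not the implementation used for the analysis, which is not specified here. The sketch assumes that the score of each window is the unweighted mean of the Kyte and Doolittle hydropathy values of its residues, and that only complete windows are reported, so a sequence of length n yields n - 20 values at a window size of 21. The function name kyte_doolittle_profile is hypothetical.

```python
# Kyte and Doolittle (1982) hydropathy index for the 20 standard amino acids.
KD_SCALE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def kyte_doolittle_profile(sequence, window=21):
    """Return the mean hydropathy of every complete window along a protein.

    Assumption: windows are unweighted and scores are averaged, not summed.
    Element i of the result covers residues i .. i + window - 1 (0-based),
    so it is centered on residue i + window // 2 when the window is odd.
    """
    seq = sequence.upper()
    if window < 1 or window > len(seq):
        return []  # no complete window fits in the sequence

    # Raises KeyError for non-standard residue codes such as X or B.
    scores = [KD_SCALE[aa] for aa in seq]

    # Running sum: add the residue entering the window, drop the one leaving.
    running = sum(scores[:window])
    profile = [running / window]
    for i in range(window, len(scores)):
        running += scores[i] - scores[i - window]
        profile.append(running / window)
    return profile
```

Calling kyte_doolittle_profile(protein_sequence) with a one-letter amino acid string returns one value per complete window; positive values indicate hydrophobic stretches and negative values indicate hydrophilic ones.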