Genome characters of Apibacter and phylogenetic inference
We isolated 14 strains of Apibacter from worker bees of A. cerana from Sichuan, Jilin, and Qinghai provinces in China (Dataset S1). Two genomes (strains B3706 and B2966) were completed into single circular chromosomes by sequencing on the PacBio platform. The assemblies of genomes sequenced with either Illumina or BGISEQ contained 12–49 contigs and with full completeness as evaluated by CheckM (Parks, Imelfort, Skennerton, Hugenholtz, & Tyson, 2015) (Table S2).Apibacter strains isolated from A. cerana had genome sizes ranging from 2.26 to 2.35 Mb, similar to those from the bumble bees (2.33 Mbp; Brandt et al. 2016), but smaller than those from A. dorsata (2.63–2.76 Mbp; Kwong, Steele, and Moran 2018) (Table S2). Parasitic pathogenic species from genera Riemerella ,Bergeyella , Weeksella , and Ornithobacterium also have small genome sizes ranging from 2.16–2.44 Mb comparing to other type strains in the Chryseobacterium clade, suggesting that they are subjected to genome reduction as documented for other symbionts (Fig. S2 B, Table S2) (Pérez-Brocal, Latorre, & Moya, 2013). The GC contents of the honeybee symbiotic Apibacter are lower than its non-symbiotic relatives, a common feature of symbionts attributed to the mutational bias and weak selections (Fig. S2 B) (McCutcheon et al., 2019; McCutcheon & Moran, 2012).
The average nucleotide identities (ANIs) between genomes obtained in this study and those from Malaysian A. cerana (Apibactersp. wkB309, Kwong & Moran, 2016) are 95.82%–97.33%, which suggests that Apibacter carried by the mainland A. cerana have diverged from those from the Sundaland A. cerana , on the verge of speciation (Table S2). Six genomes among the 14 sequenced strains have almost identical ANIs (99.99%), the consensus of which was used in subsequent analyses. The genomic divergence in Apibacter isolates is more obvious between bee hosts, as the ANIs are only 85% and 74% to those isolated from bumble bees and A. dorsata , respectively. Despite of large sequence divergence, genome structures are mostly conserved across the Apibacter strains from A. cerana andBombus , even though few gene rearrangements and inversions are observed in Apibacter genomes isolated from A. dorsata(Fig. S1). But the overall structural variation seems not substantial when Apibacter diverged with the hosts. Such a pattern is in congruent with the observations in Bartonella apis fromA. mellifera , and in Buchnera aphidicola from aphids (Chong, Park, & Moran, 2019; Segers et al., 2017).
A maximum-likelihood phylogeny was inferred using a total of 681 orthologous single-copy genes present in all Apibacter, related taxa and one outgroup (Flavobacterium aquatile LMG 4008) chosen from the Flavobacteriaceae family (Fig. S2 A) (Li, Stoeckert, & Roos, 2003; Stamatakis, 2014). Consistent with the previous phylogenetic relationship (Kwong & Moran, 2016), Apibacter species formed a monophyletic group and the isolates from A. cerana clustered together (Fig. S2 A). The Apibacter clade was sister the lineage consisting of Elizabethkingia , Riemerella andChryseobacterium genera, which is referred to as Clade C hereafter following Kwong and Moran (Kwong & Moran, 2016). These genera contain environmental non-pathogens, opportunistic pathogens, and parasitic pathogens (McBride, 2014). Finally, the Apibacter clade and Clade C together formed a sister relationship to Clade E (Empedobacter , Weeksella , and Ornithobacterium ), which consists of free-living and parasitic strains associated with mammal and bird diseases (McBride, 2014).

Apibacter conferred a large number of genes loss but preserved specific host beneficial functions

To infer the genes loss patterns in Apibacter spp. during adaptive transition to a bee gut symbiont, we constructed the last common ancestor (LCA) for Apibacter spp. and close relatives using generalized parsimony and analyzed the gene flux. A total of 601 genes were lost at the node leading to the LCA of Apibacterbranch (Fig. 2A). Of these, 498 were preserved in both Clade C and E, and 99 were only present in Clade C. Unique genes present in Clade C but lost in Apibacter might be associated with non-symbiotic lifestyles, e.g. environmental free-living. Among genes with known functions, those lost in Apibacter are enriched in the COG categories of inorganic ion transport, transcription, amino acid transport and metabolism, carbohydrate transport and metabolism, cell wall biogenesis and lipid transport and metabolism (8.7, 8.5, 8.1, 7.7, 7.7 and 6.5% of genes with COG annotation respectively, Dataset S2). The loss of cell wall biosynthesis is in line with the general pattern of genome reduction found in symbionts, which facilitates the exchange of molecules through the host-symbionts interface and allows further degeneration of metabolites in the symbionts (McCutcheon & Moran, 2012). The losses of amino acid membrane transport, carbohydrate and lipid metabolic processes could be explained as an adaptation to a nutrient-rich environment (Schmid et al., 2018).
It showed that a few symbiotic bacteria could provide major functions in pollen degradation in the bee gut, facilitating polysaccharide utilization for the host (Kešnerová et al., 2017), similar toBacteroidates in human gut (Koropatkin, Cameron, & Martens, 2012). However, genes involved in carbohydrate metabolism are substantially lost in Apibacter . Compared with Clade C and E,Apibacter lost most of the carbohydrate active enzymes (CAZymes) genes and retained no more than two polysaccharide utilization loci (PUL). The remaining PULs in Apibacter are composed of only tandem susCD -like genes, and lack all surrounding CAZyme genes (Dataset S3). The protein sequences of these PUL residue structures are conserved across Apibacter isolates from A. cerana andBombus (>87% similarity). To the contrast, the PULs in Apibacter genomes from A. dorsata are more diverse, with one from wkB301 homologous to those from A. cerana and bumble bee (>70% similarity), one conserved among wkB301 and wkB180 from A. dorsata (>95% similarity), and one unique to wkB180 (isolate of bumble bee, Dataset S3). The variations in PUL residue structure among Apibacter strains suggests independent gene losses of CAZyme genes in the A. dorsatalineage.
In contrast to significant gene loss in Apibacter , specific genes beneficial to the host were preserved. For instance, the mannose-6-phosphate isomerase encoded by manA is responsible for degradation of the toxic mannose for the host (Zheng et al., 2016). This gene is lost in all strains of Clade C and E, but is preserved by allApibacter strains, indicating that mannose detoxification is an important mutualistic trait in Apibacter (Dataset S5). It is known that symbiotic gut bacteria are capable of synthesizing amino acids for host and other co-occuring symbionts (Kwong et al., 2014; McCutcheon & Moran, 2012). Consistently, Apibacter have preserved all genes underlying amino acid synthesis, which are inherited from the LCA. On the contrary, the parasitic pathogens from generaBergeyella , Weeksella , Riemerella andOrnithobacterium , have all conferred a genome reduction, while losing substantial amino acid synthesis genes (Dataset S4) (Rohmer, Hocquet, & Miller, 2011).

Respiratory nitrate reduction was enriched and conservative inApibacter

A total of 1,349 gene families are shared by all three clades, representing core gene functions shared by bacteria of varied lifestyles (Fig. 2C). Apibacter LCA has 226 unique gene families, which include functions that are specific to the adaptation to the bee host. Clade C and Clade E have more gene families (467 gene families) in common compared to what is shared with the Apibacter group, in congruent with the non-symbiotic lifestyles shared between them. Gene ontology (GO) enrichment analysis was performed to explore overrepresented functions specific to the Apibacter group. Interestingly, the gene families belonging to the respiratory nitrate reduction (NAR) pathway are enriched in Apibacter (Dataset S5). Furthermore, the biosynthesis of the molybdenum cofactors (Moco), which are required for NAR nitrate reductase (Moreno-Vivián, Cabello, Martínez-Luque, Blasco, & Castillo, 1999; Stewart, 1988), are also enriched. These genes locate in the NAR operon and are harbored in allApibacter isolates, with only one Moco synthesis gene (mosC) and one nitrate transporter missing in two A. dorsata isolates (Fig. 3). The NAR related genes are highly conserved in amino acid sequences among genomes isolated from the same bee hosts (>94% similarity). Isolates from A. cerana are more similar to the one from bumble bee than those from A. dorsata (Fig. 3). The function of the NAR pathway seems to be highly conserved, eventhough a key member gene is replaced by alternative gene. The narI encoding the membrane biheme b quinol-oxidizing γ subunit of the nitrate reductase is missing. Alternatively, a Rieske protein homolog was identified in the NAR operon. The Rieske protein is an iron-sulfur protein (2Fe-2S), with a function in transferring electrons from the quinone pool, which is equivalent to NarI (Schneider & Schmidt, 2005). Therefore, the Rieske protein homolog is expected to have replaced the function of NarI in transferring electrons to the NarGH complex (Arshad et al., 2015). The replacement of narI in the NAR operon structure was previously identified in halophilic archaea, which represents an ancient respiratory nitrate reductase (Cabello, Roldán, & Moreno-Vivián, 2004; Yoshimatsu, Iwasaki, & Fujiwara, 2002). Additionally, three copies of the narK nitrate transporter gene are identified inApibacter (Fig. 3B) (Cole & Richardson, 2017), implying high efficiency of nitrate respiration in Apibacter . To infer the evolutionary origin of the nitrate reductase genes, the Maximum likelihood tree based on the narG and narH genes were constructed (Fig. S4). Interestingly, the narG and narH ofApibacter formed a monophyletic clade. The phylogenetic relationship of narG and narH is consistent with the phylogeny of the bacterial strains that harbor the genes. Their closely related genes are mostly from strains of genera Flavobacterium , implying that the nitrate reductase genes of Apibacter are vertically inherited.
In contrast, bacteria from Clade C and E lack intact NAR pathways, which is congruent with their aerobic nature. It is worth noting that parasitic pathogens from Riemerella and Ornithobacteriumalso lack the NAR operon ( Mavromatis et al., 2011; Van Empel & Hafez, 1999), despite that they are microaerophilic, as with Apibacter(Fig. 3, Dataset S5). These results imply that the NAR pathway might be particularly beneficial to gut commensal bacteria, which prompted us to survey the NAR pathway in other honeybee gut bacteria. Interestingly, NAR pathway is also mostly conserved in Snodgrassella strains, but absent in the other four core bee gut bacterial phylotypes (Gilliamella , Bifidobacterium , Lactobacillus Firm4 and Firm5). This difference further suggests that the NAR pathway might be generally required by microaerobic bacteria inhabiting gut epithelium (Dataset S5). However, the nitrate reductase of Snodgrassellashowed obvious variations when compared with that of Apibacter : Four subunits of nitrate reductase are encoded by the narGHJIgenes that are similar to E. coli (Dataset S5) (Moreno-Vivián et al., 1999), and genes involved in Moco synthesis are not located next to the NAR operon (Fig. 3B).

Apibacter lost ancestral gene families related to pathogenicity and antibiotic resistance

Some gene families shared by the common ancestor of Apibacter , Clade C and E are lost in the Apibacter group but are retained in Clade C and E. As both Clade C and E encompass important mammal and bird pathogens, the absence of these genes families among Apibactergroup suggests that they might be superfluous or deleterious to the interactions with the host. The 467 gene families shared by Clade C and E are overrepresented in histidine metabolism, fatty acid degradation, phenylacetate catabolism and urease activity. A survey of the genes in these pathways showed that Apibacter spp. lost all genes related to histidine degradation, two key genes involved in long chain fatty acid beta-oxidation, and the genes responsible for the production of host toxic virulence using intermediates generated in the phenylacetate catabolism (Fig. 4) (Teufel et al., 2010). The loss of genes that are responsible for histidine degradation and long chain fatty acid beta-oxidation implies that these substrates are less accessible forApibacter to use as energy sources. To the contrary, genes of these three pathways are prevalently distributed among Clade C and E (Dataset S5). All of these pathways are reported to be involved in host recognization, successful colonization, virulence factor regulation and production in pathogenic bacteria (Law et al., 2008; Moraes et al., 2014; Zarzycki-Siek et al., 2013; Zhang, Ritchie, & Rainey, 2014).
Antibiotic resistances are promiscuous for bacteria in the Flavobacteriaceae family, causing difficulties in the treatment of their infection (McBride, 2014). Referring to the CARD database, 13 antibiotic resistance genes which conferred resistance to beta-lactam, fluoroquinolones, tetracyclines and glycopeptides were identified in genomes of Clade C and E (Fig S3). These resistance genes are absent in the Apibacter group, except that a lincosamides resistance gene is identified in A. adventories wkB301. These results may be explained by the fact thatA. cerana bee gut microbes are less exposed to antibiotics.