Materials and Methods

Sample collection and Apibacterisolation

The worker bees of Apis cerana were collected from Sichuan, Jilin and Qinghai Provinces in China (Dataset S1). A total of six guts were dissected, homogenized and frozen in glycerol (25%, vol/vol) at –80°C. Frozen stocks of homogenized guts were streaked out on the heart infusion agar (Oxoid) or Columbia agar (Oxoid) supplemented with 5% sheep blood (Solarbio). The plates were incubated at 35°C in 5 % CO2 for 2–3 days. Colonies were screened by PCR using universal primers 27F and 1492R for the16S rRNA gene and Sanger sequencing.

Fluorescence in situ hybridization (FISH) microscopy

Ten adult worker bees of A. cerana were collected from a single colony in Beijing, China in May 2018. The whole guts of bees were dissected and frozen in RNAlater (Qiagen) at -80 ºC. The FISH protocol was adapted from Yuval et al. (Gottlieb et al., 2006). In brief, the guts preserved in RNAlater were fixed in the fixative solution (Ethanol: Chloroform: Acetic acid = 6:3:1) and kept overnight at the room temperature, followed by rinsing with 1 × PBS. The guts were then treated with 1 mg/mL proteinase K solution at 56 °C for 20 minutes, followed by rinsing with 1 × PBS. The guts were incubated with 6% H2O2 ethanol solution for 2 hours, followed by rinsing twice with 1 × PBS. Finally, the guts were permeated with 0.1% Triton for 2 hours, followed by rinsing for 2 or 3 times in 1 × PBS buffer.
Binding FISH probes targeting 16S rRNA genes were designed specifically for each of the genera Apibacter , Snodgrasella andGilliamella , following Martinson et al. (Martinson et al., 2012) (Table S1). Probe hybridization was performed overnight at 37 °C, followed by rinsing in 1 × PBS for 3 times. Spectral imaging was used to visualize gut sections on a ZEISS LSM780 confocal microscope. Autofluorescence was assayed for each tissue type as the negative control. Apibacter were labeled with florescent together with either S. alvi or G. apicola .

Estimation of bacterial abundance using qPCR

Worker bees were collected from 2 colonies at the same apiary in Beijing, China in September 2018. Samples were stored at -80°C. Gut segments (i.e., midgut, ileum, and rectum) were dissected and separated. DNA was extracted following Powell et al. (Powell et al., 2014). Universal 16S primers 27F and 355R for bacteria were employed andApibacter -specific primers Apiq9-F and Apiq9-R were designed in this study (Table S1). Standard curves were created and reactions were carried out on a LightCycler 480 (Roche Applied Science, Indianapolis, IN). Significance in differences within and between samples was determined using Mann-Whitney (Wilcoxon-rank) nonparametric U tests.

In vivo colonization experiments

Microbiota-depleted bees were obtained following Zheng et al. (Zheng, Powell, et al., 2017). Late stage pupae of both A. cerana orA. mellifera were removed from brood frames and incubated for 24-36 h in sterile plastic bins at 35 °C and 75% humidity. Newly emerged bees were kept in cup cages provided with sterilized sucrose syrup (0.5 M) and bee bread. For inoculation, bacteria strain of theApibacter sp. B3706 was grown on the heart infusion agar (Oxoid), which was supplemented with 5% sheep blood from glycerol stocks, for two days. Cultivated strains were scraped and suspended in 1 × PBS to reach an OD600 of 1.0. Batches of 25 bees were placed in a 50 ml conical tube, and 50 μl sucrose syrup was added. The tube was rotated gently so that the syrup was coated on the surface of bees. The tube was rotated again after the addition of 50 μl of theApibacter sp. B3706 suspension (~2 × 106 cells per bee). The bacteria were inoculated into the bee guts as a result of auto- and allogrooming. Inoculated bees were reared in cup cages. Three replicate enclosures were set up for bothA. cerana or A. mellifera with 20 bees in each cage. The inoculated bees were fed with sucrose syrup and sterilized pollen throughout the experiment. Colonization levels were determined at day 6 using qPCR as described previously.

Genome sequencing, assembly and annotation

Genomic DNA of Apibacter isolates were extracted using a bead-beating method as previously described (Powell et al., 2014). Two strains (Apibacter sp. B2966 and B3706) were sequenced with the PacBio RS (PacBio) at Nextomics Biosciences Co. Ltd., China, and assembled using OLC algorithm of the Celera (Chaisson & Tesler, 2012). Three strains (Apibacter sp. B3239, B3546 and B2912) were sequenced with a BGISEQ-500RS (BGI) at BGI-Qingdao, China. The other nine strains (Apibacter sp. B3813, B3883, B3887, B3889, B3912, B3913, B3918, B3924 and B3935) were sequenced with a HiSeq X-Ten (Illumina) at Novogene Co., Ltd, China. The twelve strains except forApibacter sp. B2966 and B3706 were assembled withSOAPdenovo-Trans (version 2.04, -K 51 -m 91 –R for 150PE reads; -K 31 -m 63 –R for 100PE reads) (Xie et al., 2014), SOAPdenovo(only for 150PE reads) and SPAdes (version 3.13.0, -k 33,55,77,85) (Bankevich et al., 2012). The quality trimmed reads were mapped back to the assembled contigs using minimap2-2.9 (Li, 2018) to examine assembly quality. The bam file generated by samtools(version 1.8) (Li et al., 2009) and the assemblies were processed byBamDeal(https://github.com/BGI-shenzhen/BamDeal, version 0.19) to calculate and visualize sequencing coverage and GC contents of assembled contigs. Spurious contaminants, contigs with low depths, unnormal GC contents and those apparently differing from the cluster, were removed from the draft genome. Assembled genomes were then annotated using PROKKA 1.13.3 (Seemann, 2014).

Comparisons of genome structure, genome divergence, and gene contents

The contigs of each assembly were re-ordered according to the single circular genome of strain B3706 using the ‘Contig Mover’ tool of Mauve version 2.4.0 (Darling, Mau, Blattner, & Perna, 2004; Rissman et al., 2009). Pairwise average nucleotide identity (ANI) was calculated with JSpeciesWS (Richter, Rosselló-Móra, Oliver Glöckner, & Peplies, 2016) using the BLASTN algorithm (ANIb). Genome completeness was estimated with CheckM (Parks et al., 2015), which was available at KBase online (Arkin, Stevens, Cottingham, Maslov, & Perez, 2015) using recommended parameters. The genome structures were compared using the R-package genoPlotR (Guy, Kultima, Andersson, & Quackenbush, 2011).

Phylogenetic inference

Gene orthology was determined using OrthoMCL (Li et al., 2003) for all genomes used in this study. All steps of the OrthoMCL pipeline were executed as recommended in the manual and the mcl program was conducted using parameters ‘-abc -I 1.5’.
Protein sequences of the identified single-copy orthologous were aligned using Mafft-linsi (Katoh, Kuma, Miyata, & Toh, 2005). Alignment columns only containing gaps were removed and the alignments were concatenated. The phylogeny was reconstructed using RAxML v8.2.10 (Stamatakis, 2014) with the PROTGAMMAIJTT model and 100 bootstrap replicates.

Gene flux analysis

Gene gain and loss analyses and inferences for gene contents of LCAs (last common ancestors) were conducted using Count (Csurös, 2010). Standard methods used in previous works were employed in the present study (Segers et al., 2017). For each gene family, Wagner parsimony with a gene gain/loss penalty of 2 (Zaremba-Niedzwiedzka et al., 2013) was used to infer the most parsimonious ancestral states. Parameter choices followed a previous publication (Oyserman et al., 2016).

Analysis of functional gene contents

Gene contents were categorized based on COG and eggNOG (Huerta-Cepas et al., 2019) functions. Significantly differential genes and pathways were enriched by OrthoVenn (Yi, Colemanderr, Chen, & Gu, 2015). For gene families subset of interest, BLASTP(Altschul et al., 1997) was used to query against the NCBI’s nr database and TIGRFAM Hidden Markov Model (HMM) (Haft, Selengut, & White, 2003). The phylogenies of narGand narH were produced from protein sequences obtained by blastp against NCBI’s nr database with default parameters. And the hits aligned using Mafft-linsi, trimmed with trimAL (Capella-Gutiérrez, Silla-Martínez, & Gabaldón, 2009) for sites with over 50% gaps and phylogenetic trees were constructed using RAxML PROTGAMMALG with 100 bootstraps. The parameters were based on previous publication (Neuvonen et al., 2016).
Carbohydrate-active enzyme (CAZymes) gene families were identified for all analyzed genomes using the command-line version of dbCAN (Database for automated Carbohydrate-active enzyme Annotation) (Yin et al., 2012), following authors’ instruction. The PULs (Polysaccharide utilization loci) were identified using the TIGRfam (Potter et al., 2018) and Pfam (Finn et al., 2014) models, following Terrapon et al. (Terrapon, Lombard, Gilbert, & Henrissat, 2015). Antibiotic resistance genes were identified by querying all the genomes against the Comprehensive Antibiotic Resistance Database (CARD) (McArthur et al., 2013).