Phylogeny
Phylogenetic analysis of the Neisseria-related symbionts was performed on two matrices: the concatenated “multigene matrix” and the “16S matrix”. To avoid possible artefacts due to HGT, the “multigene matrix” was composed of 10 genes with a reliably supported origin within Neisseriales that were present in both genomes. The genes were selected arbitrarily from the blast category “Neisseriales”; details on the assignment to the categories are provided in SupplementaryInformation/SupplementaryText. Two betaproteobacteria of the order Burkholderiales, Burkholderia cepacia and Acidovorax sp. KKS102, one gammaproteobacterium, Legionella pneumophila subsp. pneumophila str. Philadelphia 1, and one alphaproteobacterium, Rhizobium leguminosarum, were used as outgroups (SupplementaryData3). For each gene, the sequences were aligned in MAFFT v7.450 using the E-INS-i setting (Katoh, Misawa, Kuma, & Miyata, 2002). Ambiguously aligned positions and divergent blocks were discarded using Gblocks v. 0.91b (Castresana, 2000). LG+G+I was determined as the best-fitting model for all matrices according to the Akaike information criterion (AIC), using the Smart Model Selection of PhyML (Lefort, Longueville, & Gascuel, 2017). Maximum-likelihood phylogenetic reconstructions were performed on the online PhyML server v3.0 (Guindon et al., 2010) with 100 bootstrap replicates, both for each single-gene alignment and for the concatenated “multigene matrix”. Bayesian inference on the “multigene matrix” was conducted in MrBayes v3.2.5 using the LG+G+I evolutionary model (Ronquist et al., 2012). Four chains were run for 20 000 000 generations, with sampling every 1 000 generations. Convergence was checked in Tracer v1.6.0 (Rambaut, Drummond, Xie, Baele, & Suchard, 2018).
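As an illustration of the concatenation step only, the following minimal Python sketch joins trimmed single-gene alignments into one matrix and records the column range of each gene. It is not the procedure used in this study. The function name, the dictionary-based input format, and the padding of missing taxa with gaps are assumptions made for the example.

```python
def concatenate_alignments(gene_alignments):
    """Join per-gene alignments (dicts of taxon -> aligned sequence).

    Hypothetical helper for illustration. Returns the concatenated matrix
    and the 1-based, inclusive column range occupied by each gene.
    """
    # Collect every taxon that occurs in at least one alignment.
    taxa = sorted({taxon for aln in gene_alignments for taxon in aln})
    pieces = {taxon: [] for taxon in taxa}
    partitions = []
    start = 1
    for aln in gene_alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within one alignment must have equal length")
        length = lengths.pop()
        for taxon in taxa:
            # Assumption: a taxon missing from a gene is padded with gaps.
            pieces[taxon].append(aln.get(taxon, "-" * length))
        partitions.append((start, start + length - 1))
        start += length
    matrix = {taxon: "".join(parts) for taxon, parts in pieces.items()}
    return matrix, partitions
```

The returned column ranges could be used to define per-gene partitions if a partitioned analysis were desired; the analysis described here applied a single LG+G+I model.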
The “16S matrix” was designed to provide a wider phylogenetic context by including bacteria for which the 16S rRNA gene sequence is the only available marker. The 16S rRNA gene sequences were retrieved from GenBank using blastn (SupplementaryData3). Two betaproteobacteria, Taylorella equigenitalis str. 09-09 and Advenella kashmirensis str. cv4, and one alphaproteobacterium, Rhizobium capsici str. IMCC34666, were used as outgroups. The matrix was prepared using the same procedure as for the “multigene matrix” and analyzed by maximum likelihood (ML) and Bayesian inference (BI). The best-fitting evolutionary models were selected according to the AIC using jModelTest2 v. 2.1.10 (Darriba, Taboada, Doallo, & Posada, 2012). ML analysis with 100 bootstrap replicates was performed in PhyML (Guindon & Gascuel, 2003) using the selected TN93+G+I evolutionary model. BI analysis was performed in MrBayes v. 3.2.5 using the GTR+G+I substitution model, running four chains for 10 000 000 generations; convergence was checked as described above.
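For readers unfamiliar with AIC-based model selection, the criterion is AIC = 2k − 2 ln L, where k is the number of free parameters and ln L is the maximized log-likelihood; the model with the lowest AIC is preferred. The short Python sketch below shows the calculation. The log-likelihood values are invented for illustration and are not results from this study. The parameter counts cover only the substitution-model parameters, on the assumption that branch-length parameters are shared by all candidates.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: AIC = 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood


def best_model_by_aic(candidates):
    """Return the name of the candidate with the lowest AIC.

    `candidates` maps a model name to (log-likelihood, free parameters).
    """
    return min(candidates, key=lambda name: aic(*candidates[name]))


# Hypothetical values for illustration only (not from the study).
candidates = {
    "HKY+G+I": (-5010.0, 6),
    "TN93+G+I": (-5000.0, 7),
    "GTR+G+I": (-4999.0, 10),
}
best = best_model_by_aic(candidates)
```

With these invented numbers, the extra parameters of GTR+G+I do not improve the likelihood enough to offset the penalty term, so the simpler TN93+G+I has the lowest AIC.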
Phylogenetic trees of the hosts, used as a background for the microbiome diversity, were reconstructed as described in SupplementaryInformation/SupplementaryText.