Phylogeny
Phylogenetic analysis of the Neisseria-related symbionts was
performed on two different matrices, the concatenated “multigene
matrix” and the “16S matrix”. To avoid a possible artefact due to
HGT, the “multigene matrix” was composed of 10 genes with a reliably
supported origin within Neisseriales that were present in both
genomes (the genes were selected arbitrarily from the blast category
“Neisseriales”; details on the assignment to the categories are
provided in SupplementaryInformation/SupplementaryText). Two
betaproteobacteria of the order Burkholderiales, Burkholderia cepacia and Acidovorax sp. KKS102, one
gammaproteobacterium, Legionella pneumophila subsp. pneumophila
str. Philadelphia 1, and one alphaproteobacterium, Rhizobium
leguminosarum, were used as outgroups (SupplementaryData3). For each
gene, the sequences were aligned in MAFFT v7.450 using the E-INS-i
setting (Katoh, Misawa, Kuma, & Miyata, 2002). Ambiguously aligned
positions and divergent blocks were discarded using Gblocks v. 0.91b
(Castresana, 2000). LG+G+I was determined as the best-fitting model for
all matrices by the Akaike information criterion (AIC) using the Smart
Model Selection of PhyML (Lefort, Longueville, & Gascuel, 2017).
Maximum-likelihood phylogenetic reconstructions were performed using
the online PhyML server v3.0 (Guindon et al., 2010) with 100 bootstrap
replicates for each single-gene alignment and for the concatenated
“multigene matrix”. Bayesian inference of the “multigene matrix” was
conducted in MrBayes v3.2.5 using the LG+G+I evolutionary model
(Ronquist et al., 2012). Four chains were run for 20 000 000 generations
and sampled every 1 000 generations. Convergence was checked in
Tracer v1.6.0 (Rambaut, Drummond, Xie, Baele, & Suchard, 2018).
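The concatenation step behind the “multigene matrix” can be illustrated with a minimal sketch. It assumes each trimmed single-gene alignment is already loaded as a mapping from taxon name to aligned sequence. The function name, the gap-filling of taxa missing from a gene, and the toy sequences are illustrative assumptions, not the procedure or data used in this study.

```python
from typing import Dict, List


def concatenate_alignments(gene_alignments: List[Dict[str, str]]) -> Dict[str, str]:
    """Join per-gene alignments (taxon -> aligned sequence) into one matrix.

    Assumption for illustration: a taxon absent from a gene alignment is
    padded with gap characters so that all rows keep the same length.
    """
    # Collect every taxon that occurs in at least one gene alignment.
    taxa = sorted({taxon for aln in gene_alignments for taxon in aln})
    parts: Dict[str, List[str]] = {taxon: [] for taxon in taxa}

    for aln in gene_alignments:
        # All sequences of one alignment must share the same number of columns.
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within one alignment must have equal length")
        width = lengths.pop()
        for taxon in taxa:
            parts[taxon].append(aln.get(taxon, "-" * width))

    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}


# Toy example with hypothetical taxon names and sequences.
gene_a = {"symbiont_1": "MKV-L", "outgroup": "MRVAL"}
gene_b = {"symbiont_1": "GT", "outgroup": "GS"}
multigene_matrix = concatenate_alignments([gene_a, gene_b])
```

In this sketch the column order of the concatenated matrix follows the order of the input list, so gene boundaries can be recovered from the individual alignment widths if a partitioned analysis is needed.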
The “16S matrix” was designed to provide a wider phylogenetic
context by including bacteria for which the 16S rRNA gene sequence
is the only available marker. The 16S rRNA gene sequences
were retrieved by blastn from GenBank (SupplementaryData3). Two
betaproteobacteria, Taylorella equigenitalis str. 09-09 and Advenella kashmirensis str. cv4, and one alphaproteobacterium, Rhizobium capsici str. IMCC34666, were used as outgroups. The
matrix was prepared using the same procedure as for the “multigene
matrix” and analyzed by maximum likelihood (ML) and Bayesian inference
(BI). The evolutionary models best fitting the dataset were selected
according to the Akaike information criterion (AIC) using jModelTest2
v. 2.1.10 (Darriba, Taboada, Doallo, & Posada, 2012). ML analysis with
100 bootstrap replicates was performed in PhyML (Guindon & Gascuel,
2003) using the selected TN93+G+I evolutionary model. BI analysis was
performed in MrBayes v. 3.2.5 using the GTR+G+I substitution model,
running four chains for 10 000 000 generations, and convergence was
checked as described above.
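As an illustration of AIC-based model selection, the following minimal sketch computes AIC = 2k - 2 ln L for a set of candidate models and picks the one with the lowest value. The log-likelihoods and parameter counts are hypothetical placeholders chosen only to show the calculation; they are not values obtained in this study, and the function names are not part of PhyML or jModelTest2.

```python
from typing import Dict, Tuple


def aic(log_likelihood: float, n_free_params: int) -> float:
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2.0 * n_free_params - 2.0 * log_likelihood


def best_model_by_aic(fits: Dict[str, Tuple[float, int]]) -> str:
    """Return the name of the model with the lowest AIC.

    `fits` maps a model name to (log-likelihood, number of free parameters).
    """
    return min(fits, key=lambda name: aic(*fits[name]))


# Hypothetical fits: (ln L, k). Real parameter counts also include
# branch lengths and depend on the number of taxa in the matrix.
example_fits = {
    "JC": (-5410.2, 41),
    "TN93+G+I": (-5230.4, 47),
    "GTR+G+I": (-5229.9, 50),
}
selected = best_model_by_aic(example_fits)
```

With these placeholder numbers, the slightly better likelihood of the richer model does not outweigh its three extra parameters, which is the kind of trade-off the criterion is meant to capture.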
Phylogenetic trees of the hosts, used as a background for the microbiome
diversity, were reconstructed as described in
SupplementaryInformation/SupplementaryText.