Bioinformatics analysis
Raw sequence data underwent QC, trimming, and removal of the synthetic
DNA template using FastQC (21), Trimmomatic (22) and bbmap (BBTools
v37.28; (23)), respectively. The sequence data that passed QC was
imported into the QIIME2 package (v2019.10; (24)) for analysis using the
deblur pipeline (25) with the SILVA DB (v128 at 97%, (26)). Following
filtering of chimeric sequences and taxonomic assignments, OTUs that
were not assigned as bacteria were removed. Due to the nature of the
samples (i.e. low biomass), additional filtering was conducted using
decontam (27). Finally, the samples were rarefied at a sampling depth of
4,000 using the diversity plugin (diversity core-metrics-phylogenetic).
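
As an illustration of what rarefying to a fixed depth involves, the following minimal Python sketch subsamples one sample's feature counts without replacement. It is not the QIIME2 implementation: the function name `rarefy`, the dictionary input format, and the fixed seed are assumptions made for this example. Samples with fewer reads than the chosen depth are dropped, which is the usual convention.

```python
import random


def rarefy(counts, depth, seed=0):
    """Subsample a {feature: read_count} dict to exactly `depth` reads.

    Sampling is without replacement. Returns None when the sample has
    fewer than `depth` reads, meaning the sample would be discarded.
    """
    total = sum(counts.values())
    if total < depth:
        return None

    rng = random.Random(seed)  # fixed seed only for reproducibility here
    # Expand the counts into one label per read, then draw `depth` reads.
    pool = [feature for feature, n in counts.items() for _ in range(n)]
    picked = rng.sample(pool, depth)

    rarefied = {}
    for feature in picked:
        rarefied[feature] = rarefied.get(feature, 0) + 1
    return rarefied


# Hypothetical sample with 8,010 reads, rarefied to 4,000 reads.
example = {"taxon_a": 5000, "taxon_b": 3000, "taxon_c": 10}
subsampled = rarefy(example, depth=4000)
print(sum(subsampled.values()))
```

Every retained sample ends up with the same total read count, so diversity metrics are compared at equal sequencing effort.
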
Differences in the relative abundance of microbial taxa between vaginal
and fallopian tube samples were assessed using the R package microeco
(28), with statistical significance assessed using the Kruskal-Wallis
test (Benjamini-Hochberg adjusted p-values <
0.05 considered significant). Alpha diversity (Shannon index) and beta
diversity (Bray-Curtis dissimilarity index) metrics were calculated using
the QIIME2 diversity plugin (diversity alpha and diversity beta,
respectively). Comparisons of alpha diversity (Shannon index) between
groups of interest (i.e. case vs. control, or vaginal vs. fallopian tube)
were conducted using the QIIME2 diversity plugin (diversity
alpha-group-significance), with the Kruskal-Wallis statistical test
(with Benjamini-Hochberg adjusted p-values considered significant at
<0.05). Comparisons of beta diversity (Bray-Curtis
dissimilarity) between groups of interest (as above) were conducted
using the QIIME2 diversity plugin (diversity beta-group-significance),
with the PERMANOVA statistical test (using 999 permutations, with
Benjamini-Hochberg adjusted p-values considered significant at
<0.05). The linear discriminant analysis effect size (LEfSe)
biomarker discovery tool (29), as implemented in the R package
microeco, was used to identify differentially abundant OTUs. Variation
in the abundance of taxa (with relative abundance greater than 0.01%)
was considered significant when the LDA score was greater than 3 and/or
the p-value from the LEfSe test was < 0.05.
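
The Benjamini-Hochberg adjustment applied to the Kruskal-Wallis and PERMANOVA p-values above can be sketched in a few lines of Python. This is an illustrative implementation of the standard step-up procedure, not the code used by QIIME2 or microeco; the function name `benjamini_hochberg` and the example p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])

    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, scaling each by
    # m / rank and enforcing that adjusted values never increase as the
    # raw p-value decreases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values from four pairwise comparisons.
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(raw)
significant = [q < 0.05 for q in adjusted]
print(adjusted, significant)
```

A comparison is then called significant when its adjusted p-value falls below the 0.05 threshold stated in the text.
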