Use of SCNIC influences the detection of OTUs that differ between
MSM and non-MSM
We next evaluated the effects of applying SCNIC with default SparCC and
SMD parameters and varying R-value thresholds on downstream statistical
analysis results. To investigate differential abundance based on MSM
status in the HIV dataset, we used ANCOM[28]. After removing taxa
that were not present in at least 5% of the samples, the OTU table had
317 samples and 639 OTUs. We found that 12 OTUs were significantly
different between MSM and non-MSM without using SCNIC. Using SCNIC at
R-values of 0.2, 0.4, and 0.65 and running ANCOM on the filtered output
feature table, we found that most of the significant features were
modules at R-values of 0.2 and 0.4 but not at 0.65 (e.g., 14 of the 15
significant features were modules at R=0.2) (Table 1). This was the case
even though the vast majority of OTUs were not a part of modules at the
0.4 R-value threshold (Figure 4A). Most of the 12 OTUs that were
significant without running SCNIC were grouped into modules with each
other and with OTUs that were not individually significant without
running SCNIC. These significant modules contained 74, 26, and 1 new
OTUs at R-values of 0.2, 0.4, and 0.65, respectively. Using SCNIC also
identified 1, 5, and 25 OTUs (at R-values of 0.2, 0.4, and 0.65,
respectively) that were individually significant but had not been
significant without running SCNIC. No individually significant OTUs
lost significance because they were binned into a module. Together,
these results indicate an increase in statistical power when running a
test like ANCOM that controls the FDR.
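
The prevalence filter applied before ANCOM can be illustrated with a minimal sketch. It assumes a pandas table with samples as rows and OTUs as columns, treats any count above zero as "present", and keeps OTUs at or above the threshold. The function name `filter_by_prevalence` and the toy data are hypothetical and are not taken from the analysis code used in this study.

```python
import pandas as pd


def filter_by_prevalence(table: pd.DataFrame, min_fraction: float = 0.05) -> pd.DataFrame:
    """Keep OTUs (columns) observed in at least `min_fraction` of samples (rows).

    Assumption: an OTU counts as present in a sample when its count is > 0.
    """
    prevalence = (table > 0).mean(axis=0)  # fraction of samples containing each OTU
    return table.loc[:, prevalence >= min_fraction]


# Toy table (hypothetical data): 40 samples x 3 OTUs
toy = pd.DataFrame({
    "otu_a": [5] * 40,             # present in 40/40 samples
    "otu_b": [2] * 4 + [0] * 36,   # present in 4/40 samples (10%)
    "otu_c": [1] + [0] * 39,       # present in 1/40 samples (2.5%)
})

filtered = filter_by_prevalence(toy, min_fraction=0.05)
print(list(filtered.columns))
```

In this toy example, `otu_c` falls below the 5% threshold and is removed, while the number of samples is unchanged.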
Considering the correlation structure of significant features can help in
understanding the broader community context of bacteria that differ with
MSM status. Module-0 significantly differed by MSM status at every
R-value, and in each case Prevotella was its dominant genus (Figure 5).
At an R-value of 0.65, all OTUs in module-0 were assigned to the genus
Prevotella (Figure 5C). However, at an R-value of 0.4, module-0 included
seven Prevotella OTUs, one Dialister, and an unidentified member
of the Paraprevotellaceae family. At the R-value of 0.2, Prevotella
accounted for 13 of the 25 OTUs, and 11 of the 12
pre-SCNIC significant OTUs were found in this module. This suggests
that individual OTUs that differ with MSM status may in some cases be a
part of a consortium of diverse members that collectively display
features that may contribute to differences in microbiome function.
To further explore this concept, we investigated the results generated
with an R-value of 0.4, as the significant features maintain a strong
level of correlation while being phylogenetically diverse. When running
ANCOM on this feature table, we found that the individually
significant OTUs tended to be joined into modules with other highly
co-correlated microbes and that these modules significantly differed
with MSM status (Figure 6). Notably, the modules and
taxa that are significantly related to MSM status do not all correlate
with each other. At the R-value of 0.4, module-36 contains two taxa,
Erysipelotrichaceae and Clostridium, that are negatively
correlated with the other significant taxa and modules (Figure 6).
Module-2 contains Eubacterium, Catenibacterium, and Prevotella, which
are phylogenetically heterogeneous but mutually co-occurring. A
follow-up experiment that leverages the insights SCNIC generates could
combine different strains of microbes to assemble a community type and
test for functional correlates of disease.