Use of SCNIC results in the detection of lake-associated taxa
To test whether these patterns were consistent across datasets, we also
applied SCNIC with default SparCC and SMD parameters and varying R-value
thresholds to the Great Lakes dataset. Specifically, we identified
features that significantly
differed between Lake Michigan (N=16) and Lake Superior (N=33) using
ANCOM[28].
We began with a table of 3,871 OTUs, and 725 of these remained after
removing OTUs not present in at least 5% of the samples. Running ANCOM
on this table without SCNIC, we found that 168 OTUs differed
significantly between lakes. Using SCNIC at R-value thresholds of 0.2,
0.4, and 0.65 and running
ANCOM on the filtered output OTU table, we found that most significant
features were modules at an R threshold of 0.4 but not 0.2 or 0.65
(Table 1). With SCNIC, some individual OTUs that had not been
significant without it became significant (3 and 13 at R-value
thresholds of 0.4 and 0.65, respectively). Application of SCNIC also
identified many additional OTUs that became of interest because they
were now part of significant modules (139, 64, and 12 OTUs at 0.2, 0.4,
and 0.65, respectively; Table 1). However, unlike for the HIV dataset,
several OTUs that were individually significant without SCNIC were no
longer significant with ANCOM after applying SCNIC, and this effect was
most pronounced at lower R-value thresholds (64, 14, and 6 OTUs at
R-value thresholds of 0.2, 0.4, and 0.65, respectively; Table 1). This is
likely because microbes that differed between lakes were binned with
loosely correlated microbes that did not, leading to a loss of signal.
Thus, in this case, only SCNIC with a moderate to high R-value threshold
appeared to balance the benefit of increased power against the loss of
signal from binning loosely correlated features.
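
The prevalence filter and the signal-loss effect described above can be
illustrated with a minimal Python sketch. It is not the pipeline used in
this study: the function names, the toy counts, and the OTU labels are
hypothetical, the sketch assumes a module is summarized by summing the
counts of its member features, and it uses a simple ratio of group means
in place of ANCOM. The toy filter uses a 25% threshold only because the
example has 8 samples.

```python
from statistics import mean


def filter_by_prevalence(table, min_fraction=0.05):
    """Keep features with a nonzero count in at least min_fraction of samples.

    table maps a feature ID to its list of per-sample counts.
    """
    kept = {}
    for feature_id, counts in table.items():
        n_present = sum(1 for c in counts if c > 0)
        if n_present >= min_fraction * len(counts):
            kept[feature_id] = counts
    return kept


def collapse_module(table, members):
    """Sum per-sample counts of the member features (assumed module summary)."""
    return [sum(per_sample) for per_sample in zip(*(table[m] for m in members))]


def fold_difference(counts, n_first_group):
    """Ratio of the second group's mean count to the first group's mean count."""
    return mean(counts[n_first_group:]) / mean(counts[:n_first_group])


# Hypothetical toy table: the first 4 samples are lake A, the last 4 lake B.
toy = {
    "otu_diff": [10, 12, 11, 9, 50, 55, 48, 52],          # differs between lakes
    "otu_flat": [200, 180, 220, 200, 210, 190, 200, 200],  # abundant, no difference
    "otu_rare": [0, 0, 0, 0, 0, 0, 0, 3],                  # present in one sample
}

# Step 1: drop features seen in too few samples.
filtered = filter_by_prevalence(toy, min_fraction=0.25)

# Step 2: bin the differing OTU with a feature that does not differ.
module = collapse_module(filtered, ["otu_diff", "otu_flat"])

fold_single = fold_difference(filtered["otu_diff"], 4)
fold_module = fold_difference(module, 4)
print(sorted(filtered), round(fold_single, 2), round(fold_module, 2))
```

In this toy case the between-lake ratio of the differing OTU is about 4.9
on its own but only about 1.2 once it is summed with the abundant,
non-differing feature, which mirrors the loss of signal that can follow
from binning loosely correlated features at a low R-value threshold.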