Use of SCNIC results in the detection of lake-associated taxa
To assess whether these patterns were consistent across datasets, we also applied SCNIC with default SparCC and SMD parameters and varying R-value thresholds to the Great Lakes dataset. Specifically, we used ANCOM [28] to identify features that differed significantly between Lake Michigan (N=16) and Lake Superior (N=33).
We began with a table of 3,871 OTUs, of which 725 remained after removing OTUs not present in at least 5% of the samples. Running ANCOM on this table without SCNIC, we found that 168 OTUs differed significantly between lakes. We then applied SCNIC at R-value thresholds of 0.2, 0.4, and 0.65 and ran ANCOM on each filtered output OTU table. Most of the significant features were modules at an R-value threshold of 0.4, but not at 0.2 or 0.65 (Table 1). Use of SCNIC resulted in the detection of individual OTUs that had not been significant before (3 and 13 for R-value thresholds of 0.4 and 0.65, respectively). Application of SCNIC also identified many additional OTUs that became of interest because they were now part of significant modules (139, 64, and 12 OTUs at 0.2, 0.4, and 0.65, respectively; Table 1). However, unlike for the HIV dataset, several OTUs that were individually significant were no longer significant with ANCOM after applying SCNIC, and this effect was most pronounced at lower R-value thresholds (64, 14, and 6 OTUs that were significant without SCNIC were no longer significant after applying SCNIC at R-value thresholds of 0.2, 0.4, and 0.65, respectively; Table 1). This is likely because microbes that differed between lakes were binned with loosely correlated microbes that did not, leading to a loss of signal. Thus, in this case, only SCNIC with a moderate to high R-value threshold appeared to balance the benefit of increased power against the disadvantage of signal loss from binning loosely correlated features.
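The bookkeeping behind these counts can be illustrated with a minimal Python sketch. It is not the code used in this study: the function names (`prevalence_filter`, `compare_significance`), the dictionary-based table layout, and the toy data are all hypothetical, and the rule used to count an OTU as "lost" is an assumption about how such a comparison could be made. The first function applies a prevalence filter like the 5% cutoff described above; the second sorts OTUs into newly significant, newly of interest through a significant module, and no longer significant.

```python
from typing import Dict, List, Set


def prevalence_filter(table: Dict[str, List[int]],
                      min_fraction: float = 0.05) -> Dict[str, List[int]]:
    """Keep features with a nonzero count in at least min_fraction of samples.

    `table` maps a feature ID to its per-sample counts (hypothetical layout).
    """
    kept = {}
    for feature, counts in table.items():
        n_present = sum(1 for c in counts if c > 0)
        if n_present >= min_fraction * len(counts):
            kept[feature] = counts
    return kept


def compare_significance(sig_before: Set[str],
                         sig_after: Set[str],
                         modules: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Categorize OTUs by how their status changed after module collapsing.

    sig_before: OTUs significant on the uncollapsed table.
    sig_after:  significant features on the collapsed table (OTU or module IDs).
    modules:    module ID -> member OTU IDs.
    Assumption: an OTU counts as "lost" only if it is neither individually
    significant afterwards nor a member of a significant module.
    """
    sig_modules = {f for f in sig_after if f in modules}
    sig_individual_after = sig_after - sig_modules
    in_sig_modules: Set[str] = set()
    for module_id in sig_modules:
        in_sig_modules |= modules[module_id]
    return {
        "newly_significant_individual": sig_individual_after - sig_before,
        "gained_via_module": in_sig_modules - sig_before,
        "lost": sig_before - sig_individual_after - in_sig_modules,
    }


# Toy data for illustration only; these are not values from the study.
toy_table = {"otu1": [5, 0, 3, 0], "otu2": [0, 0, 0, 0], "otu3": [0, 1, 0, 0]}
filtered = prevalence_filter(toy_table, min_fraction=0.5)

toy_modules = {"module_0": {"b", "d"}, "module_1": {"c", "e"}}
changes = compare_significance(
    sig_before={"a", "b", "c"},
    sig_after={"a", "module_0", "f"},
    modules=toy_modules,
)
```

In the toy example, OTU `c` was significant on its own but was binned into `module_1`, which is not significant, so it lands in the "lost" category. This mirrors the loss of signal described above when an informative OTU is grouped with loosely correlated ones.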