Evaluating effects of applying SCNIC to discern microbes that differ between groups in the HIV and Great Lakes datasets
OTUs/modules that differed by MSM status (HIV study) or between lakes (Great Lakes study) were identified by using ANCOM [28] to test each feature. For the HIV study, we used only samples from HIV-negative individuals, so that differences in the microbiome between MSM and non-MSM were not confounded by HIV infection status. We chose ANCOM because, like SparCC, it was designed specifically for compositional microbiome data. ANCOM was applied both to the original feature table, to which SCNIC had not been applied, and to the feature tables output by SCNIC using SparCC with the SMD algorithm at a range of R-value thresholds.
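The W statistic that ANCOM reports for each feature can be illustrated with a minimal, standard-library-only Python sketch: W is the number of pairwise log-ratio tests involving that feature in which the null hypothesis of no between-group difference is rejected. The sketch assumes exactly two groups, a pseudocount of 1 to avoid log(0), a Mann-Whitney U test with a normal approximation for each log-ratio, and a Bonferroni correction across the m - 1 tests per feature. These are simplifications chosen for illustration, not the exact settings of ANCOM [28] or of the analysis described here; the function names, the toy data, and the 0.7 × (m - 1) cutoff are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def mann_whitney_p(a, b):
    """Two-sided Mann-Whitney U p-value via the normal approximation (no tie correction)."""
    n1, n2 = len(a), len(b)
    ranks = average_ranks(list(a) + list(b))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mu) / sigma
    return math.erfc(abs(z) / math.sqrt(2))


def ancom_w(table, groups, alpha=0.05):
    """Count, for each feature, how many of its log-ratio tests reject the null.

    table: list of samples (rows) of raw counts; groups: one label per sample,
    with exactly two distinct labels. A pseudocount of 1 avoids log(0).
    """
    labels = sorted(set(groups))
    if len(labels) != 2:
        raise ValueError("this sketch handles exactly two groups")
    m = len(table[0])
    cutoff = alpha / (m - 1)  # Bonferroni across the m - 1 tests per feature
    w = [0] * m
    for i in range(m):
        for j in range(i + 1, m):
            log_ratios = [math.log((row[i] + 1) / (row[j] + 1)) for row in table]
            a = [v for v, g in zip(log_ratios, groups) if g == labels[0]]
            b = [v for v, g in zip(log_ratios, groups) if g == labels[1]]
            # The test is symmetric in i and j, so one test updates both counts.
            if mann_whitney_p(a, b) < cutoff:
                w[i] += 1
                w[j] += 1
    return w


# Toy data (illustrative only): feature 0 rises about tenfold in group B,
# while features 1 and 2 are constant.
f0 = [10, 12, 11, 9, 10, 13] + [100, 110, 95, 120, 105, 98]
toy_table = [[c, 50, 40] for c in f0]
toy_groups = ["A"] * 6 + ["B"] * 6

w = ancom_w(toy_table, toy_groups)
m = len(toy_table[0])
flagged = [i for i, wi in enumerate(w) if wi > 0.7 * (m - 1)]  # hypothetical cutoff
print(w)        # [2, 1, 1]
print(flagged)  # [0]
```

Because every log-ratio involving feature 0 shifts between groups while the ratio of features 1 and 2 does not, feature 0 accumulates the highest W; features 1 and 2 each pick up one rejection only through their ratio with feature 0. This is why ANCOM looks for features with a large W rather than any single significant ratio.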
When applying SparCC, SCNIC follows the practice recommended in the SparCC manuscript [38] of filtering features based on their average relative abundance across samples. The SparCC manuscript suggests this filter because removing features that are highly abundant, even if only in a few samples, would impair the method's ability to control for the number of reads per sample in its compositionality adjustment. Because this filter can retain OTUs that are highly abundant in only a single sample, after applying SparCC but before applying ANCOM we removed features that had zero values in more than 20% (~29/146) of samples. Features with W values above the threshold determined by ANCOM were considered significantly different between groups.
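The two filtering steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the table is a list of samples (rows) by features (columns) of raw counts, the SparCC step that sits between the two filters is omitted, and the function names and the thresholds used in the example (`min_mean`, `max_zero_frac`) are hypothetical rather than the values used by SCNIC or in this analysis. The toy data show how an OTU that dominates a single sample passes the average-relative-abundance filter but is then removed by the zero-fraction filter.

```python
# Minimal sketch of the two filtering steps (hypothetical helper names).
# The SparCC correlation step that runs between them is omitted.


def mean_relative_abundance_filter(table, min_mean):
    """Keep features whose relative abundance, averaged over samples, is >= min_mean.

    table: list of samples (rows), each a list of raw counts per feature.
    Returns the indices of retained features.
    """
    rel = []
    for row in table:
        total = sum(row)
        rel.append([count / total for count in row])
    n_samples = len(rel)
    n_features = len(table[0])
    return [
        j for j in range(n_features)
        if sum(r[j] for r in rel) / n_samples >= min_mean
    ]


def zero_fraction_filter(table, features, max_zero_frac):
    """Drop features that are zero in more than max_zero_frac of samples."""
    n_samples = len(table)
    kept = []
    for j in features:
        zero_frac = sum(1 for row in table if row[j] == 0) / n_samples
        if zero_frac <= max_zero_frac:
            kept.append(j)
    return kept


# Toy data (illustrative only): 4 samples x 3 OTUs, 1000 reads per sample.
# OTU 1 dominates sample 0 but is absent elsewhere; OTU 2 is rare everywhere.
toy_table = [
    [100, 899, 1],
    [990, 0, 10],
    [995, 0, 5],
    [992, 0, 8],
]

after_mean = mean_relative_abundance_filter(toy_table, min_mean=0.01)
after_zero = zero_fraction_filter(toy_table, after_mean, max_zero_frac=0.5)
print(after_mean)  # [0, 1]: OTU 1 survives on the strength of one sample
print(after_zero)  # [0]: OTU 1 is zero in 3 of 4 samples and is removed
```

Filtering on the mean keeps OTU 1 because its single large value (0.899 of sample 0) lifts its average relative abundance to about 0.22, whereas a prevalence-based filter looks only at how many samples contain the OTU.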