Suggestions for designing effective sequencing experiments

Data generated from amplicon sequencing is inherently compositional: it provides relative abundances, which are independent of the total microbial load of the original sample. It has previously been shown that analyzing compositional datasets with standard statistical techniques (including Pearson correlations or t tests on proportions) can lead to very high (up to 100%) false positive discovery rates (56, 57). With false positive rates this high, almost any data set will show some correlations with microbiome data, and at a rate unprecedented in soil science, because microbiome data sets contain thousands of individual variables. The ease of obtaining significant results may therefore also lead to an “abuse” of statistical significance (also referred to as “p-hacking”). While exploratory analysis is useful, researchers should always remember that an effect or association does not exist just because it was statistically significant and, even more importantly, that inference should be scientific and not merely statistical. In recent years, the discussion around the abuse of p-values and their importance has intensified (58–60), and some alternatives have been proposed (60), including the use of more stringent p-value thresholds for claims of new discoveries (61, 62). Clearly the issue is much more complicated than a simple critique of the p-value: it involves scientific research at all levels, including the publish-or-perish culture ingrained in academic fields. We therefore refer the reader to the citations above to explore this topic further.
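
As a minimal sketch of why proportions can mislead standard correlation analysis, the simulation below draws independent absolute abundances for a few hypothetical taxa, converts them to relative abundances, and compares the average pairwise Pearson correlation before and after that conversion. The number of taxa, the log-normal distribution, and its parameters are illustrative assumptions, not values taken from the cited studies.

```python
import numpy as np


def closure(counts):
    """Convert absolute abundances to relative abundances (each row sums to 1)."""
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum(axis=1, keepdims=True)


def mean_offdiag_corr(x):
    """Mean Pearson correlation over all distinct pairs of columns (taxa)."""
    r = np.corrcoef(x, rowvar=False)
    upper = np.triu_indices_from(r, k=1)
    return r[upper].mean()


# Hypothetical setup: 2000 samples, 4 taxa whose absolute abundances are
# drawn independently, so any true correlation between taxa is zero.
rng = np.random.default_rng(0)
n_samples, n_taxa = 2000, 4
absolute = rng.lognormal(mean=5.0, sigma=0.5, size=(n_samples, n_taxa))

# Sequencing only recovers proportions, which are forced to sum to 1.
relative = closure(absolute)

corr_absolute = mean_offdiag_corr(absolute)
corr_relative = mean_offdiag_corr(relative)

print(f"mean pairwise correlation, absolute abundances: {corr_absolute:+.3f}")
print(f"mean pairwise correlation, relative abundances: {corr_relative:+.3f}")
```

Because the proportions must sum to one, an increase in one taxon forces the others down, so the relative abundances appear negatively correlated even though the underlying taxa were simulated as independent.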
Nevertheless, the risk of drawing false conclusions from spurious correlations remains, including those arising from the variability inherent in amplicon sequencing data. A “let’s sequence and see” approach will generate many correlations, false positives among them. Because exploratory research often leads to follow-up research, increasing confidence in initial findings reduces the chance that later work is built on unsubstantiated results. Adopting a more stringent p-value threshold reduces the false positive rate, but at the cost of more type II errors (false negatives). Maintaining statistical power under a more stringent threshold has been shown to require a 70% increase in sample size. We understand that this is often unrealistic, but it could spare future effort spent following up unsubstantiated research. Too often, we expand the depth of analyses on the same few samples at the expense of replication.
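
The sample size figure above can be checked with a rough calculation. The sketch below uses the standard normal approximation for a two-sided test, in which the required sample size for a fixed effect size is proportional to the squared sum of the critical value and the power quantile. The specific thresholds (0.05 and 0.005) and the 80% power level are assumptions made here for illustration; the text above does not state them.

```python
from statistics import NormalDist


def sample_size_ratio(alpha_old, alpha_new, power=0.80):
    """Factor by which sample size must grow to keep the same power.

    Normal approximation for a two-sided test with a fixed effect size:
    n is proportional to (z_{1 - alpha/2} + z_{power})^2.
    """
    z = NormalDist().inv_cdf
    z_power = z(power)
    n_old = (z(1 - alpha_old / 2) + z_power) ** 2
    n_new = (z(1 - alpha_new / 2) + z_power) ** 2
    return n_new / n_old


# Assumed thresholds: the conventional 0.05 and a more stringent 0.005.
ratio = sample_size_ratio(alpha_old=0.05, alpha_new=0.005)
print(f"required increase in sample size: {(ratio - 1) * 100:.0f}%")
```

Under these assumptions the ratio comes out near 1.7, in line with the roughly 70% increase mentioned above. Other power levels or thresholds would give a different factor.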

Improving ecological insights from sequencing

Consequently, the majority of sequencing studies remain highly descriptive, owing both to their design and to the inherent limitations of the data. Microbial ecology as a field should bridge microbiological isolation approaches and the characterization of microbial communities, while accounting for the heterogeneity of the soil systems in which microorganisms live. Recent studies are beginning to combine other forms of data with amplicon sequencing data to improve investigations of ecological patterns.
The combination of amplicon sequencing and stable-isotope probing has also been used as a viable option to link microbial activity to microbial abundance (54).
Other researchers have combined sequencing approaches to improve the inferences made from amplicon sequencing data (55).