Suggestions for designing effective sequencing experiments
Data generated by amplicon sequencing are inherently compositional:
they provide relative abundances, which are independent of the total
microbial load of the original sample. It has previously been shown that
analyzing compositional datasets with standard statistical techniques
(including Pearson correlations or t-tests on proportions) can
lead to very high (up to 100%) false-positive rates (56, 57). With
false-positive rates this high, almost any data set will show some
correlation with microbiome data. For soil science, the scale of this
problem is unprecedented, because microbiome data comprise thousands
of individual variables. The
possibility of obtaining significant results may therefore also lead to
an “abuse” of statistical significance (also referred to as
“p-hacking”). While exploratory analysis is useful, researchers should
always remember that an effect or association does not exist just
because it was statistically significant and, more importantly, that
inference should be scientific and not merely statistical. In
recent years, discussion of the misuse of p-values and of their
importance has intensified (58–60), and some alternatives have been
proposed (60), including more stringent p-value thresholds for claims
of new discoveries (61, 62). Clearly, the issue goes well beyond a
simple critique of the p-value and involves scientific research at all
levels, including the publish-or-perish culture ingrained in academia;
we therefore refer the reader to the above-mentioned citations to
explore this topic further.
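As an illustration of the compositional effect described above, the
following minimal Python sketch (not taken from the cited studies)
simulates taxa whose absolute abundances are independent and then
converts them to relative abundances. The number of taxa, the
log-normal distribution, the seed, and the function names are all
assumptions chosen for illustration. Because the proportions within a
sample must sum to one, a Pearson correlation computed on them tends to
be negative even though the underlying abundances are unrelated.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def simulate(n_samples=2000, n_taxa=3, seed=1):
    """Return (r_abs, r_rel) for the first two of n_taxa independent taxa.

    All parameter values are illustrative assumptions, not values
    taken from the text.
    """
    rng = random.Random(seed)
    # Absolute abundances: independent log-normal draws for every taxon.
    absolute = [
        [rng.lognormvariate(0.0, 1.0) for _ in range(n_taxa)]
        for _ in range(n_samples)
    ]
    # Closure: divide each sample by its total to get relative abundances.
    relative = [[value / sum(row) for value in row] for row in absolute]

    r_abs = pearson([row[0] for row in absolute], [row[1] for row in absolute])
    r_rel = pearson([row[0] for row in relative], [row[1] for row in relative])
    return r_abs, r_rel


r_abs, r_rel = simulate()
print(f"correlation on absolute abundances: {r_abs:+.2f}")
print(f"correlation on relative abundances: {r_rel:+.2f}")
```

With three identically distributed taxa, the expected correlation
between any two proportions is -0.5, purely as a consequence of the
sum-to-one constraint; the correlation on the absolute abundances
should stay close to zero.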
Nevertheless, the risk of drawing false conclusions from spurious
correlations remains, including those arising from the variability
inherent in amplicon sequencing data. When adopting a “let’s sequence
and see” approach, many correlations (including false positives) will
be generated. Given that exploratory research often leads to follow-up
research, increasing our confidence in the initial findings will reduce
the chances of research built on unsubstantiated results. Adopting a
more stringent p-value threshold will reduce the false-positive rate,
at the cost of more type II errors (false negatives). To adopt a more
stringent threshold while maintaining statistical power, it has been
shown that a 70% increase in sample size is required. We understand
that this is often unrealistic, but we also recognize that it could
spare future efforts built on unsubstantiated research. As a field, we
more often focus on expanding the depth of analysis of the same few
samples, at the expense of replication.
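The trade-off between threshold and sample size can be sketched with a
standard normal approximation. The text does not state which thresholds
or power level underlie the 70% figure, so the values below (moving
from 0.05 to 0.005 at 80% power, for a two-sided test with a fixed
effect size) are assumptions, as is the function name. Under this
approximation the required sample size scales with the square of the
sum of the two critical values.

```python
from statistics import NormalDist


def relative_sample_size(alpha_old, alpha_new, power=0.80):
    """Ratio of sample sizes needed to keep the same power when the
    two-sided significance threshold changes from alpha_old to alpha_new.

    Uses the normal approximation n ~ (z_{1-alpha/2} + z_{power})^2,
    with the effect size and variance held fixed (an assumption).
    """
    z = NormalDist().inv_cdf  # standard normal quantile function
    z_power = z(power)
    n_old = (z(1 - alpha_old / 2) + z_power) ** 2
    n_new = (z(1 - alpha_new / 2) + z_power) ** 2
    return n_new / n_old


# Assumed thresholds and power; not stated in the text.
ratio = relative_sample_size(alpha_old=0.05, alpha_new=0.005, power=0.80)
print(f"required increase in sample size: {(ratio - 1) * 100:.0f}%")
```

Under these assumed values the ratio comes out at roughly 1.7, which is
in line with the 70% increase quoted above; other thresholds or power
levels would give a different figure.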
Improving ecological insights from sequencing
Consequently, the majority of sequencing studies remain highly descriptive, owing to their design and to the inherent limitations of the data. Microbial ecology as a field should bridge microbiological isolation approaches and the characterization of microbial communities, while accounting for the heterogeneity of the soil systems in which microorganisms live. Recent studies are beginning to combine other forms of data with amplicon sequencing data to improve investigations of ecological patterns.
The combination of amplicon sequencing and stable-isotope probing has also been used as a viable way to link microbial activity to microbial abundance (54).
Other researchers have combined sequencing approaches in order to improve inferences made from amplicon sequencing data (55).