Inference from co-occurrence analysis

Challenges associated with the analysis and interpretation of amplicon sequencing data also complicate co-occurrence network analysis of soil samples. Networks are constructed from significant correlations between taxa and can be used to investigate properties of microbial communities such as organismal co-existence (e.g. REF), the identification of keystone species (e.g. \cite{Banerjee2018}) and the stability of community structure (e.g. \cite{de2018,Shi2016}). The number of studies constructing association networks for microbial communities has recently increased (REF). However, many of these studies have been criticized for a largely descriptive use of networks that does not propose an ecological interpretation of the detected patterns (REFs?).
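The basic construction step can be sketched as follows. This is a minimal illustration, assuming a samples × taxa abundance matrix, plain Pearson correlation and an arbitrary cut-off of |r| ≥ 0.6; the function name `correlation_edges` and the threshold are hypothetical choices, not the procedure of any specific tool. In practice a significance test or null model would accompany the threshold.

```python
import numpy as np

def correlation_edges(abundance, names, threshold=0.6):
    """Return (taxon_i, taxon_j, r) for every taxon pair with |r| >= threshold.

    abundance: samples x taxa matrix (rows = soil samples, columns = taxa).
    The threshold is a hypothetical cut-off chosen for illustration.
    """
    r = np.corrcoef(abundance, rowvar=False)  # taxa x taxa correlation matrix
    edges = []
    n_taxa = r.shape[0]
    for i in range(n_taxa):
        for j in range(i + 1, n_taxa):  # upper triangle only: undirected edges
            if abs(r[i, j]) >= threshold:
                edges.append((names[i], names[j], float(r[i, j])))
    return edges

# Toy data: taxon B tracks taxon A, taxon C varies independently
rng = np.random.default_rng(1)
a = rng.normal(size=50)
b = 2 * a + rng.normal(scale=0.3, size=50)
c = rng.normal(size=50)
edges = correlation_edges(np.column_stack([a, b, c]), ["A", "B", "C"])
```

The resulting edge list is the network: taxa are nodes and each retained pair is an edge weighted by its correlation.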
The difficulty in interpretation stems from the need to infer causal relationships between taxa from correlations, a long-standing topic of discussion in ecology \cite{Blanchet_2020,Barner_2018}. For soil in particular, it is important to keep in mind that each environmental sample is only a snapshot of complex spatio-temporal dynamics (see section XYZ). Each sample captures a noisy signal that reflects several biological processes, including birth, death, dispersal, and intra- and inter-specific interactions, all subject to environmental filtering. Moreover, while interactions occur between individual microorganisms, abundance patterns can only be measured in relatively large and potentially highly heterogeneous soil samples, as discussed in the section on the spatial structure of soil. This introduces an additional confounding effect that can generate many spurious associations, a challenge that is particularly acute for soil ecosystems.
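How environmental filtering alone can produce an association is illustrated by the small simulation below. It is a hypothetical sketch: two taxa that never interact both respond to a shared driver (here called `ph`, with made-up coefficients), which yields a strong correlation; removing the linear effect of the driver (a partial correlation computed from regression residuals) makes it largely disappear.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100
ph = rng.normal(6.5, 1.0, n)  # hypothetical shared environmental driver

# Two taxa that do not interact, but both increase with pH (environmental filtering)
taxon_a = 2.0 * ph + rng.normal(0.0, 1.0, n)
taxon_b = 1.5 * ph + rng.normal(0.0, 1.0, n)

def residuals(y, x):
    """Residuals of y after an ordinary least-squares fit on x (with intercept)."""
    design = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return y - design @ beta

raw_r = np.corrcoef(taxon_a, taxon_b)[0, 1]  # strong, but purely indirect
partial_r = np.corrcoef(residuals(taxon_a, ph), residuals(taxon_b, ph))[0, 1]
```

Real soils involve many, partly unmeasured drivers and non-linear responses, so controlling for measured variables removes only some indirect associations.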
For microbiome data, associations are most often assigned through the detection of significant correlations between relative abundances, and spurious links can be detected if compositional data are not handled appropriately, as explained in the previous section on quantitative sequencing studies. Here as well, log ratios can be applied, as done by several popular network construction tools, e.g. SparCC (log ratios) \cite{Friedman2012} and SPIEC-EASI (centred log-ratio, clr) \cite{Kurtz2015}. Another option is to convert relative abundances into absolute abundances using total gene copy numbers obtained by qPCR (see section XYZ). The resulting networks generally have taxa (typically OTUs or ASVs) as nodes and edges representing associations between them.
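The centred log-ratio transformation mentioned above can be sketched in a few lines. This assumes a samples × taxa count matrix; the pseudocount of 0.5 is an assumption added only so that zero counts can be logged, and is not part of the clr definition. SPIEC-EASI applies further inference steps after the transformation, so this shows only the first step.

```python
import numpy as np

def clr(counts, pseudocount=0.5):
    """Centred log-ratio transform of a samples x taxa count matrix.

    Each log count is centred on the mean log count of its own sample,
    so every row sums to zero. The pseudocount is an assumption used
    only to make zero counts loggable.
    """
    log_counts = np.log(np.asarray(counts, dtype=float) + pseudocount)
    return log_counts - log_counts.mean(axis=1, keepdims=True)

counts = np.array([[120, 30, 0, 850],
                   [240, 60, 5, 1700]])
z = clr(counts)

# With strictly positive counts and no pseudocount, clr ignores sequencing depth:
deep = clr(np.array([[12, 3, 85]]), pseudocount=0)
shallow = clr(np.array([[120, 30, 850]]), pseudocount=0)
```

Because only ratios within a sample enter the result, a tenfold difference in sequencing depth leaves the transformed values unchanged.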
Despite these caveats, network analysis can be very useful for isolating possible interactions between species and for analyzing species co-existence. To strengthen such analyses, we suggest careful comparison with null models \cite{Connor_2017} and the integration of environmental information \cite{Goberna2019,Lima_Mendez_2015}, which helps interpret the results and eliminate some indirect associations between species. In summary, network inference is a rapidly evolving field, and new approaches are constantly being proposed to address outstanding issues. Nevertheless, we still lack a definitive framework for generating co-occurrence networks with a straightforward interpretation.
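One of the simplest null models can be sketched as a permutation test. Shuffling one taxon's abundances across samples keeps its abundance distribution but destroys any association, giving a null distribution for the correlation coefficient. The function `permutation_pvalue` and its defaults are hypothetical; the null models discussed in \cite{Connor_2017} are more elaborate, and this one addresses neither compositionality nor environmental confounding.

```python
import numpy as np

def permutation_pvalue(x, y, n_perm=999, seed=0):
    """Two-sided permutation p-value for the Pearson correlation of two taxa.

    Null model: y is shuffled across samples, which breaks any association
    with x while keeping both abundance distributions intact.
    """
    rng = np.random.default_rng(seed)
    r_obs = np.corrcoef(x, y)[0, 1]
    exceed = 0
    for _ in range(n_perm):
        r_null = np.corrcoef(x, rng.permutation(y))[0, 1]
        if abs(r_null) >= abs(r_obs):
            exceed += 1
    # +1 in numerator and denominator counts the observed arrangement itself
    return r_obs, (exceed + 1) / (n_perm + 1)

rng = np.random.default_rng(3)
x = rng.normal(size=30)
y = x + rng.normal(scale=0.2, size=30)
r_obs, p_value = permutation_pvalue(x, y)
```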
In this context, tools from complex systems science can be central for formulating and testing hypotheses about how the structure of a microbial community is linked to its function \cite{Faust_2012,Röttjers2018}, for identifying keystone species \cite{Banerjee2018}, and even for predicting the system's stability under environmental perturbations \cite{de2018,Shi2016}.

6. Suggestions for more robust statistical analyses in sequencing studies

Amplicon sequencing data are well suited for exploratory analysis and hypothesis generation in soil research, but can also be applied for targeted hypothesis testing if appropriate statistical methods are selected (see Fig 2?; e.g. Gloor et al., 2017). Because amplicon datasets from soil are compositional, standard statistical methods (including Pearson correlations or t tests on proportions) can lead to very high false discovery rates (up to 100%; 56, 57).
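The mechanism behind these false positives can be illustrated with a toy simulation. All numbers are hypothetical: three taxa have independent absolute abundances, one of which dominates and varies strongly. After closure to proportions, the two minor taxa share the same denominator and appear strongly correlated, although their absolute abundances are unrelated.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 200
# Hypothetical, mutually independent absolute abundances (e.g. cells per g soil)
dominant = 1000 * rng.lognormal(0.0, 1.0, n)  # large, highly variable taxon
taxon_b = 10 * rng.lognormal(0.0, 0.1, n)
taxon_c = 10 * rng.lognormal(0.0, 0.1, n)

absolute = np.column_stack([dominant, taxon_b, taxon_c])
relative = absolute / absolute.sum(axis=1, keepdims=True)  # closure to proportions

r_absolute = np.corrcoef(absolute[:, 1], absolute[:, 2])[0, 1]  # close to 0
r_relative = np.corrcoef(relative[:, 1], relative[:, 2])[0, 1]  # spuriously high
```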
Because soil microbiome data sets consist of thousands of individual variables, almost any explanatory variable will show significant correlations with some of them. This ease of obtaining significant results can encourage the abuse of statistical significance (also referred to as “p-hacking”). We therefore caution researchers against inferring effects or associations solely because they are statistically significant.
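A quick simulation makes this concrete. It is a sketch with made-up dimensions (20 samples, 1000 taxa) and a hypothetical soil variable named `moisture` that is unrelated to every taxon by construction; p-values use Fisher's z approximation rather than an exact test. At α = 0.05, about 0.05 × 1000 = 50 taxa are expected to appear "significant" purely by chance.

```python
import math
import numpy as np

def fisher_z_pvalue(r, n):
    """Approximate two-sided p-value for Pearson r via Fisher's z transform."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))

rng = np.random.default_rng(123)
n_samples, n_taxa = 20, 1000
moisture = rng.normal(size=n_samples)          # hypothetical soil variable
taxa = rng.normal(size=(n_samples, n_taxa))    # no true association with moisture

pvals = [fisher_z_pvalue(np.corrcoef(moisture, taxa[:, j])[0, 1], n_samples)
         for j in range(n_taxa)]
n_significant = sum(p < 0.05 for p in pvals)   # roughly 50 false positives expected
```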
In recent years, a discussion around the abuse of p-values has emerged (58–60) and some alternatives have been proposed, including the use of more stringent p-value thresholds for claims of new discoveries (61, 62). A more stringent threshold reduces the false positive rate, at the cost of more type II errors (i.e. lower statistical power). To adopt a more stringent threshold while maintaining statistical power, it has been shown that a sample size increase of about 70% is required for a wide range of common statistical tests (61). We acknowledge that this is often unrealistic, but it could also save future effort spent building on unsubstantiated results. Instead, current research often prioritizes the depth of analyses over replication. Clearly, the issue goes well beyond a simple critique of the p-value and involves scientific research at all levels, including the publish-or-perish culture entrenched in academia; we therefore refer the reader to the above-mentioned citations for further exploration of this topic.
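The order of magnitude of this sample-size increase can be checked with a back-of-the-envelope calculation. This assumes, for illustration, a move from α = 0.05 to α = 0.005, two-sided tests, a target power of 0.8 and a normal approximation in which the required sample size scales with (z₁₋α/₂ + z_power)²; the function name `relative_sample_size` is hypothetical.

```python
from statistics import NormalDist

def relative_sample_size(alpha_new, alpha_old=0.05, power=0.8):
    """Ratio of required sample sizes under a normal approximation.

    For a two-sided test, n is proportional to (z_{1-alpha/2} + z_{power})**2,
    so the ratio between two thresholds does not depend on the effect size.
    """
    z = NormalDist().inv_cdf
    num = z(1 - alpha_new / 2) + z(power)
    den = z(1 - alpha_old / 2) + z(power)
    return (num / den) ** 2

ratio = relative_sample_size(0.005)  # roughly 1.70, i.e. about 70% more samples
```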
To explore how sample size influences statistical power in soil microbiome analyses, we calculated how the statistical power of permutational multivariate analysis of variance (PERMANOVA) depends on effect size for data sets differing in replicate number. The data set we used covers a range of soils \cite{Zheng_2019} and can therefore be regarded as representative of the heterogeneity inherent to soils and their microbial communities (see section XYZ). We used the R package "micropower" \cite{Kelly_2015}, which simulates distance matrices from a set of parameters to estimate the achievable PERMANOVA power, or the required sample size, for a planned microbiome analysis. Data for the 16S rRNA gene and the ITS1 region were filtered to include only bacteria and archaea (16S) and fungi (ITS), respectively. We calculated the Jaccard similarity index (Supplementary Fig. 1a,b) and used its average and standard deviation across all samples to simulate OTU/ASV tables with similar parameters. Figure 5a shows how the statistical power to detect a significant difference increases with effect size for groups of different sample sizes. The graph clearly shows that increasing the sample size, even slightly, substantially increases the power to detect small differences. To better visualize these differences, we further calculated the average statistical power for ranges of effect size (ω²) defined as 'Low' (0.001–0.04), 'Medium' (0.04–0.08) and 'High' (0.08–0.12). Our analysis showed that the number of replicates hardly affects statistical power if microbial communities differ strongly (Figure 5b, "High").
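The logic of such a simulation-based power analysis can be sketched without micropower. The block below is a minimal sketch, not the micropower procedure used for Figure 5: it computes Jaccard distances on presence/absence data, the PERMANOVA pseudo-F with a label-permutation p-value, and an ω² effect size following the usual ANOVA definition, then estimates power as the fraction of simulated experiments that reject. The generative model in `simulate_presence_absence` (a quarter of taxa becoming more frequent and another quarter less frequent in the second group) and all function names are hypothetical.

```python
import numpy as np

def jaccard_distance_matrix(pa):
    """Pairwise Jaccard distances (1 - shared / union) for a samples x taxa 0/1 matrix."""
    x = np.asarray(pa, dtype=bool).astype(int)
    shared = x @ x.T
    richness = x.sum(axis=1)
    union = richness[:, None] + richness[None, :] - shared
    # Two empty samples are treated as identical (distance 0).
    sim = np.divide(shared, union, out=np.ones(union.shape), where=union > 0)
    return 1.0 - sim

def permanova_stats(d, labels):
    """One-way PERMANOVA pseudo-F (Anderson 2001) and omega-squared."""
    labels = np.asarray(labels)
    n = len(labels)
    groups = np.unique(labels)
    a = len(groups)
    d2 = np.asarray(d) ** 2
    ss_total = d2[np.triu_indices(n, k=1)].sum() / n
    ss_within = 0.0
    for g in groups:
        idx = np.flatnonzero(labels == g)
        sub = d2[np.ix_(idx, idx)]
        ss_within += sub[np.triu_indices(len(idx), k=1)].sum() / len(idx)
    ss_among = ss_total - ss_within
    ms_within = ss_within / (n - a)
    pseudo_f = (ss_among / (a - 1)) / ms_within
    omega2 = (ss_among - (a - 1) * ms_within) / (ss_total + ms_within)
    return pseudo_f, omega2

def permanova_pvalue(d, labels, n_perm=99, rng=None):
    """Permutation p-value: shuffle group labels and recompute pseudo-F."""
    rng = np.random.default_rng() if rng is None else rng
    f_obs, _ = permanova_stats(d, labels)
    exceed = sum(permanova_stats(d, rng.permutation(labels))[0] >= f_obs
                 for _ in range(n_perm))
    return (exceed + 1) / (n_perm + 1)

def simulate_presence_absence(n_per_group, shift, rng, n_taxa=200):
    """HYPOTHETICAL model: shared baseline occupancies; in group 1 a quarter of
    taxa become more frequent and another quarter less frequent by `shift`."""
    base = rng.uniform(0.1, 0.6, n_taxa)
    alt = base.copy()
    k = n_taxa // 4
    alt[:k] = np.clip(base[:k] + shift, 0.0, 1.0)
    alt[k:2 * k] = np.clip(base[k:2 * k] - shift, 0.0, 1.0)
    group0 = rng.random((n_per_group, n_taxa)) < base
    group1 = rng.random((n_per_group, n_taxa)) < alt
    labels = np.repeat([0, 1], n_per_group)
    return np.vstack([group0, group1]), labels

def estimate_power(n_per_group, shift, n_sim=50, n_perm=99, alpha=0.05, seed=0):
    """Fraction of simulated experiments in which PERMANOVA rejects at alpha."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_sim):
        pa, labels = simulate_presence_absence(n_per_group, shift, rng)
        d = jaccard_distance_matrix(pa)
        rejections += permanova_pvalue(d, labels, n_perm, rng) <= alpha
    return rejections / n_sim
```

Comparing, for example, `estimate_power(4, 0.1)` with `estimate_power(8, 0.1)` mimics the replicate comparison described here under this toy model only; the values will not reproduce Figure 5, which was derived with micropower from the Jaccard parameters of real soils.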
However, for communities with higher similarity, increasing the number of replicates from 4 to 5 was sufficient to almost double the statistical power for small effect sizes and to raise the power above the recommended 0.8 for medium effect sizes (Figure 5b, "Low" and "Medium"). As expected, these effects were even more pronounced when the number of replicates was doubled (4 to 8; Figure 5b). Similar results were obtained for the fungal data set (Supplementary Fig. 1c).
In practice, knowing a priori how strongly soil microbial communities will differ is difficult. If preliminary sequencing data are available, we encourage researchers to perform such power analyses when planning experiments. If not, one could still estimate the likely degree of (dis)similarity from biogeochemical parameters or from published microbiome data on similar soils. For example, in an experiment testing short-term treatment effects (e.g. fertilization) on soils from the same field site, one could assume that microbial community structure would be rather similar across treatments, which would argue for increasing replication. Such considerations should also account for the number of technical replicates to be pooled to buffer the spatial heterogeneity of soils (see section XYZ), and should be made with care. We refer to further literature for experimental planning and robust statistical analyses (e.g. for time series, Coenen et al., 2020 (24); other REFs).
% Hannes: Link back to spatial and temporal sections