One of the first steps in the analysis of amplicon sequencing data is the removal of potential sequencing errors. Doing so eliminates sequencing artefacts that may falsely boost diversity levels \cite{Edgar2011,Haas2011}. The use of amplicon sequencing variants (ASVs), instead of operational taxonomic units (OTUs) helps to overcome this issue by assigning a greater probability of a true biological sequence being more abundant than an error-containing sequence \cite{Callahan_2017}⁠. To that end, bioinformatic tools such as DADA2 \cite{Callahan_2016}⁠ and Deblur \cite{Amir_2017}⁠ attempt to use sequencing error profiles to resolve amplicon sequencing data into ASVs. An ASV is more likely to have an intrinsic biological meaning (i.e. true DNA sequence), as opposed to an OTU which can either be a representation of the most abundant biological sequence or a consensus sequence. Additionally, ASVs make the merging of datasets possible, even when the sequencing primer pairs are different  \cite{Callahan_2017}. We therefore advocate the deposition of processed ASV tables together with sample metadata and taxonomy files in public repositories to facilitate meta-analyses of soil amplicon sequencing datasets.
Another relevant step when analyzing sequencing data is to account for the different sequencing efforts across samples (i.e. library size, sequencing depth) that can result in a substantially different number of recovered reads even among replicates. Ways to tackle this include total library size normalization and rarefaction, with the latter remaining debated to date \cite{McMurdie_2014}⁠. Bioinformatic tools such as DeSeq2 and EdgeR provide ways to normalize count tables \cite{Love_2014,Robinson_2010}⁠. Both methods are applied on raw or low-abundance filtered count tables, and have performed well in both real as well as simulated datasets and outperform rarefaction-based approaches  \cite{McMurdie_2014}⁠ ⁠. Other alternatives that account for the compositional aspect of sequencing data include center log (CLR), isometric log (ILR) or additive log (ALR) ratios transformations on a count data matrix (Aitchison 1984, \cite{Egozcue_2003})⁠. 
Following data normalization, traditional workflows include the generation of distance matrices for ordination, clustering, and variance partitioning analyses. Commonly used distance metrics include Bray-Curtis, Jaccard and Unifrac (weighted and unweighted). These metrics are often used although they do not take into account the compositional nature of sequencing data. The Aitchison distance - defined as the Euclidian distance on top of a center-log transformed count matrix – is a viable compositional alternative ( Aitchison 1984, )⁠ that allows to perform ordinations (e.g. PCA biplots). Additionally, the Philr transform metric has been introduced as a compositional alternative to the weighted Unifrac that carries phylogenetic information (38)⁠. Most of the above mentioned compositional options are implemented in R packages (except Aitchison to the best of our knowledge) and include publicly available tutorials. In light of the challenges related to normalization and analysis of compositional data, we recommend a critical evaluation of available data analysis tools to best address the nature of each experimental setup (see section 6).  
In addition to the limitations imposed by sequencing technology and the compositional nature of sequencing data, another aspect that prevents data analyses from being fully quantitative is the potential multiple copies of marker genes per organism. For example, the 16S rRNA gene copy number per bacterial cell can vary between 1 and 18 and can additionally show variation within different strains of the same species \cite{Stoddard_2014,Coenye2003,Lavrinienko2021}. Therefore, relying solely on the number and diversity of 16S rRNA gene sequences can lead to inaccurate estimates of abundance and diversity of microbial communities. Several computational tools can  correct amplicon datasets for the number of 16S rRNA gene copies based on existing genome information (e.g. PICRUSt2 \cite{Douglas_2019} and CopyRighter \cite{Angly_2014}). However, correcting for 16S rRNA gene copy numbers in sequencing surveys remains challenging, particularly for soil, as the gene copy numbers are only known for a subset of the soil microbes \cite{Louca_2018,Nunan_2020}. This challenge becomes even more problematic for marker genes of fungi and other eukaryotes, such as protists, as the copy number here can vary drastically between taxa \cite{Gong_2013,Gong_2019}. Other housekeeping genes, which occur only once in a genome, have been proposed as univeral phylogenetic marker genes (such as the recA \cite{Eisen1995}), but their use remains limited due to lower taxonomic resolution and limited availability in databases. 

4. Addressing and interpreting compositional sequencing data

Interpreting relative abundance data

Compositionality of amplicon sequencing data presents challenges to the interpretation of changes in microbial community structure. The amount of sequence data obtained through high-throughput sequencing is a fixed value, resulting in a random sampling of sequences from a sample that cannot be directly linked to actual counts based on sequences alone \cite{Gloor_2017}. Numerous studies have revealed shifts in microbial community composition across treatments including gradients of temperature, pH, and salinity, as well as seasonal or temporal parameters. This practice is robust on a community level when broad-scale changes in taxa are of interest (e.g. phylum), and has resulted in similar ecological conclusions as data generated with more quantitative approaches  \cite{Piwosz2020}. However, at higher taxonomic resolution (e.g. genus), quantitative inferences from relative abundance sequencing data become more challenging. Due to the nature of sequencing, a change in relative abundance of one species is often reflected in a corresponding change in another species. We depict such challenges in interpretation in the following theoretical experiment. 
Amplicon sequencing data obtained from the same soil sample at two different time points (t1, t2) consists of two species (A, B). The relative abundance observed for species A and B is 0.55 and 0.45 at timepoint 1 (t1), and 0.8 and 0.2 at timepoint 2 (t2), respectively (Figure 3). From t1 to t2, species B decreases in relative abundance coupled to an increase in the relative abundance of species A. The bars below (t2a-t2e) illustrate five examples of changes in absolute abundance in t2 that could underlie the patterns observed in relative abundance data. The initial time point (t1) is also shown for comparison. 
The first case represents a biologically true situation where the absolute abundance matches the relative abundance observations. There are no changes in biomass from t1 to t2 and species A increases, whereas species B decreases (Fig. 3, t2a). The second case depicts an increase in overall biomass between t1 and t2 caused by an absolute increase in species B and no absolute changes in species A (Fig. 3, t2b). In relative abundance data, species A would appear to reduce in abundance  when in fact that was caused by a change in species B alone. The third case represents an opposite scenario where there is a total decrease in biomass from t1 to t2 caused by a decrease in species B and no changes in species A (Fig. 3, t2c). Again, in a relative abundance plot, species B would appear to have decreased in relative abundance (true), but inferences regarding species A would be false. The fourth case represents a situation where there is a general increase in biomass from t1 to t2 prompted by increases in absolute abundances of both species A and B (Fig. 3, t2d) while the fifth case represents an opposite scenario where there is a general decrease in biomass from t1 to t2 caused by decreases in absolute abundances of both species A and B (Fig. 3, t2e). In these last two cases, as long as the proportion between the two species stays the same, the relative abundance plot reflects biological changes. 
This theoretical exercise shows, that even for a community of only two member species, there are five potential changes in the absolute abundance that could underlie the observed shift in relative abundance. Given that soil communities usually harbour thousands of species, the degree of complexity increases dramatically. This is further compounded by factors such as PCR bias, which underlines the caution that must be applied when interpreting relative abundances retrieved from amplicon sequencing.