Figure captions
Figure 1. Quality profiles for selected samples in each study. Each panel shows the quality profile for the sample with the highest (top) and lowest (bottom) number of reads for each study. The gray scale indicates the frequency of each quality score at each base position (darkness indicates a higher frequency). Green and orange lines indicate the mean and quartile quality score at each position, respectively.
Figure 2. Percentage of reads preserved after standard processing with dada2 (Callahan et al., 2016) without (top) and with (bottom) chimera checking. Control samples (n=5) are shown for each study, as a percentage of the original reads recovered from INSDC databases.
Figure 3. Relationship between taxonomic classification and read length, for the control samples (n=5) of each dataset. Color indicates taxonomic level.
Figure 4. Relationship between alpha diversity and read length. Richness (q=0, top) was calculated as the number of ASVs per sample. Inverse Simpson’s index (q=2, bottom) was calculated according to Chao et al. (2014). The diversity in control samples (n=5) was assessed for each read length.
Figure 5. Variance in community composition with increasing read lengths. The mean pairwise dissimilarity between the 5 control samples in each study was assessed using Sorensen (a) and Bray-Curtis (b) dissimilarities.
Figure 6. Information loss from shorter read lengths. For each dataset, Mantel tests on the Sorensen (a) and Bray-Curtis (b) dissimilarities evaluated the correlation in microbial community composition between each shorter read length and the most information-rich version of the dataset (200-bp reads).
Figure 7. Outcome of statistical tests comparing control and disturbed communities for each dataset, across read lengths. Kruskal-Wallis tests (top) evaluated differences in richness and inverse Simpson’s index between the control and recently disturbed samples (1 day; n=5 for each time point) for each study. Similarly, PERMANOVA tests (bottom) evaluated differences in community composition between control and recently disturbed samples using Sorensen and Bray-Curtis dissimilarities.
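
As a rough illustration of the metrics named in the captions above, the following Python sketch computes richness (q=0), the inverse Simpson index (q=2), and the mean pairwise Sorensen and Bray-Curtis dissimilarities from ASV count vectors. The function names and toy counts are hypothetical and are not taken from the study. The sketch assumes raw counts with no rarefaction or transformation, so it should be read as a minimal example of the formulas rather than the authors' pipeline.

```python
from itertools import combinations


def richness(counts):
    """Hill number of order q=0: the number of ASVs with a nonzero count."""
    return sum(1 for c in counts if c > 0)


def inverse_simpson(counts):
    """Hill number of order q=2: 1 / sum(p_i^2) over relative abundances p_i."""
    total = sum(counts)
    return 1.0 / sum((c / total) ** 2 for c in counts if c > 0)


def sorensen(a, b):
    """Presence/absence dissimilarity: 1 - 2 * shared / (n_a + n_b)."""
    present_a = [x > 0 for x in a]
    present_b = [y > 0 for y in b]
    shared = sum(1 for x, y in zip(present_a, present_b) if x and y)
    return 1.0 - 2.0 * shared / (sum(present_a) + sum(present_b))


def bray_curtis(a, b):
    """Abundance-based dissimilarity: sum|a_i - b_i| / sum(a_i + b_i)."""
    return sum(abs(x - y) for x, y in zip(a, b)) / (sum(a) + sum(b))


def mean_pairwise(samples, dissimilarity):
    """Mean dissimilarity over all unordered pairs of samples."""
    pairs = list(combinations(samples, 2))
    return sum(dissimilarity(a, b) for a, b in pairs) / len(pairs)


# Hypothetical ASV count vectors for three samples (columns are ASVs).
toy_samples = [
    [10, 0, 5],
    [5, 5, 0],
    [0, 10, 10],
]

if __name__ == "__main__":
    for counts in toy_samples:
        print(richness(counts), round(inverse_simpson(counts), 3))
    print(mean_pairwise(toy_samples, sorensen))
    print(mean_pairwise(toy_samples, bray_curtis))
```

With five control samples per study, `mean_pairwise` would average over the ten unordered sample pairs at each read length.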