Mixed samples
Although NGSpeciesID was not designed specifically for metabarcoding data, the flexibility of its algorithmic steps allows the tool to handle mixed samples. We recovered seven consensus sequences, corresponding to the seven DNA barcodes pooled in the mixed sample analysis. The consensus sequences were highly accurate for all barcodes, with accuracies ranging from 99.2% to 100%. For the mixed sample test we lowered the read abundance ratio for the clusters to 5%, because seven barcodes pooled at equal abundance each account for only about 14% of the reads in the sample (300 of 2,100 reads). With the default abundance cutoff of 10%, a cluster would need to contain at least 210 reads, meaning that 210 of the 300 reads of a barcode would have to be assigned to its cluster, which might not be the case. Three out of seven barcodes showed a slightly lower consensus accuracy than in the respective single-species analysis. This is likely due to two factors: reads from other barcodes present in the clusters, which may have affected polishing accuracy, and the random selection of the 300 reads for each barcode (individual read error rates can differ). We expect some cross-contamination (reads assigned to the wrong cluster), especially for closely related species, although this should decrease as the read accuracy of third-generation sequencing continues to improve. This experiment shows that NGSpeciesID, although not developed for mixed samples, can recover highly accurate consensus sequences from metabarcoding data. Its performance on metabarcoding data will nevertheless need to be investigated separately with mock datasets of varying abundance ratios and sample relationships (taxonomic divergences).
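The reasoning behind the lowered abundance cutoff can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function name `min_reads_per_cluster` is hypothetical, and it assumes that a cluster passes the cutoff when its size is at least the stated percentage of all reads in the sample, which may differ in detail from how NGSpeciesID applies the threshold internally.

```python
def min_reads_per_cluster(total_reads: int, cutoff_percent: int) -> int:
    """Smallest cluster size that reaches cutoff_percent of total_reads.

    Assumption (not taken from the NGSpeciesID source code): a cluster is
    kept if its size is >= cutoff_percent / 100 * total_reads.
    Integer ceiling division avoids floating-point rounding issues.
    """
    return (total_reads * cutoff_percent + 99) // 100


# Values described in the text: seven barcodes, 300 reads each.
n_barcodes = 7
reads_per_barcode = 300
total_reads = n_barcodes * reads_per_barcode

# Share of the sample contributed by each barcode at equal abundance.
share_per_barcode = reads_per_barcode / total_reads
print(f"Each barcode contributes {share_per_barcode:.1%} of {total_reads} reads")

for cutoff in (10, 5):
    needed = min_reads_per_cluster(total_reads, cutoff)
    fraction = needed / reads_per_barcode
    print(
        f"Cutoff {cutoff}%: cluster needs {needed} reads, "
        f"i.e. {fraction:.0%} of one barcode's reads"
    )
```

Under these assumptions, the 10% cutoff requires 210 reads per cluster (70% of a barcode's 300 reads), whereas the 5% cutoff requires 105 reads (35%), leaving considerably more room for reads that are filtered out or assigned to other clusters.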