ObiTools filtering analyses for taxonomic assignments and
comparison of the two areas
We applied a first bioinformatic workflow that optimizes the ability to
detect identified taxonomic entities. The sequencing reads were
processed to remove errors and analyzed using programs implemented in
the ObiTools package (http://metabarcoding.org/obitools, Boyer et al.
2016) following a published protocol (Valentini et al. 2016). The
forward and reverse reads were assembled using the ILLUMINAPAIREDEND
program with a minimum score of 40 and retrieving only joined
sequences. The reads were then assigned to each sample using the
NGSFILTER software. A separate data set was created for each sample by
splitting the original data set into several files using OBISPLIT. After
this step, we analysed each sample individually before merging the taxon
list for the final ecological analysis. Strictly identical sequences
were clustered together using OBIUNIQ. Sequences shorter than 20 bp or
occurring fewer than 10 times were excluded using the OBIGREP program.
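
The dereplication and filtering steps above can be illustrated with a minimal Python sketch. It is not the ObiTools implementation: the function name and the in-memory list of reads are assumptions made for illustration, and only the two thresholds (20 bp, 10 occurrences) come from the text.

```python
from collections import Counter

MIN_LENGTH = 20  # bp; sequences shorter than this are excluded (from the text)
MIN_COUNT = 10   # sequences occurring fewer times than this are excluded (from the text)


def dereplicate_and_filter(reads, min_length=MIN_LENGTH, min_count=MIN_COUNT):
    """Hypothetical helper: collapse strictly identical reads, then filter.

    `reads` is an iterable of sequence strings for one sample.
    Returns a dict mapping each retained sequence to its read count.
    """
    counts = Counter(reads)  # strictly identical sequences are grouped together
    return {
        seq: n
        for seq, n in counts.items()
        if len(seq) >= min_length and n >= min_count
    }
```

A sequence is kept only if it passes both criteria, so an abundant but very short sequence is removed just like a long but rare one.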
The OBICLEAN program was then run within a PCR product. We discarded all
sequences labelled ‘internal’, which most likely correspond to PCR
substitution and indel errors. Taxonomic assignment of the remaining
sequences was performed using the program ECOTAG with the sequences extracted
from release 142 (standard sequences) of the European Nucleotide
Archive (ENA). Taxonomic assignments were corrected as follows to be
more conservative: for a match with > 98% identity, we validated the
assignment at species level; for a 96-98% match, at genus level
if available; and for a 90-96% match, at family level if possible.
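
The conservative correction rule can be sketched in Python as follows. Several details are assumptions rather than statements from the text: a match of exactly 96% is treated as genus level, matches below 90% are left unvalidated, and a missing genus name falls back to the family name. The function names and the lineage dictionary are hypothetical, and the taxon names in the usage note are illustrative only.

```python
RANKS = ["species", "genus", "family"]  # ordered from finest to coarsest


def max_rank_for_identity(identity_pct):
    """Return the finest rank allowed for a given percent identity."""
    if identity_pct > 98.0:
        return "species"
    if identity_pct >= 96.0:   # assumption: 96% itself counts as genus level
        return "genus"
    if identity_pct >= 90.0:
        return "family"
    return None                # assumption: below 90% nothing is validated


def conservative_assignment(lineage, identity_pct):
    """Pick the finest named rank that is not finer than the allowed rank.

    `lineage` maps rank names to taxon names; ranks may be missing.
    Returns a (rank, name) tuple, or None if nothing can be validated.
    """
    allowed = max_rank_for_identity(identity_pct)
    if allowed is None:
        return None
    for rank in RANKS[RANKS.index(allowed):]:
        name = lineage.get(rank)
        if name:
            return rank, name
    return None
```

For example, with the illustrative lineage `{"species": "Lutjanus apodus", "genus": "Lutjanus", "family": "Lutjanidae"}`, a 97% match would be reported at genus level as `("genus", "Lutjanus")`.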
Because a few sequences are assigned to the wrong sample
due to tag-jumps (Schnell et al. 2015), all sequences with a frequency
of occurrence below 0.001 per taxon and per library were discarded. We further
corrected for index-hopping between libraries (MacConaill et al. 2018)
using a threshold determined empirically for each sequencing batch from
experimental blanks (i.e. combinations of tags not present in the
libraries) (Polanco-Fernández et al. 2021).
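
The tag-jump filter described above can be sketched in Python under one stated interpretation: within a library, the frequency of occurrence of a taxon in a sample is taken as its read count in that sample divided by the total reads of that taxon across all samples of the library. This interpretation, the function name, and the input layout are assumptions; only the 0.001 threshold comes from the text. The index-hopping correction is not sketched because its batch-specific threshold is determined empirically from blanks.

```python
from collections import defaultdict

TAG_JUMP_THRESHOLD = 0.001  # from the text


def filter_tag_jumps(counts, threshold=TAG_JUMP_THRESHOLD):
    """Hypothetical helper for a single library.

    `counts` maps (taxon, sample) pairs to read counts.
    Entries whose share of the taxon's total reads in the library is
    below `threshold` are treated as tag-jumps and removed.
    """
    totals = defaultdict(int)
    for (taxon, _sample), n in counts.items():
        totals[taxon] += n

    return {
        (taxon, sample): n
        for (taxon, sample), n in counts.items()
        if totals[taxon] > 0 and n / totals[taxon] >= threshold
    }
```

Applying the function once per library keeps the filter "per taxon and per library", so a taxon that is rare in one library is not penalized for being abundant in another.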
Using the taxonomic assignments recovered from the ObiTools analyses, we
compared the species recovered in each area. We further compared the
species recorded by eDNA with other species distribution sources,
including a compiled set of species distribution maps for the Caribbean
region (Robertson and Van Tassell 2019). Differences in species
recovered between the two areas using eDNA were further compared with
those of the UVC transects.
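
As a minimal sketch of the comparison between areas, the species lists can be treated as sets. The function name and the placeholder species labels in the usage note are assumptions for illustration; the same operation could be applied to an eDNA list and a list derived from another source, such as the distribution maps or the UVC transects.

```python
def compare_species(list_a, list_b):
    """Hypothetical helper: split two species lists into shared and unique taxa."""
    a, b = set(list_a), set(list_b)
    return {
        "shared": a & b,   # species recovered in both lists
        "only_a": a - b,   # species recovered only in the first list
        "only_b": b - a,   # species recovered only in the second list
    }
```

For instance, `compare_species(["sp1", "sp2"], ["sp2", "sp3"])` separates `sp2` as shared from `sp1` and `sp3`, which are each unique to one list.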