Sequence analysis
Raw sequence reads were analyzed using a bioinformatics pipeline
designed to trim and sort the sequence reads according to sample
identification. An outline of the bioinformatic process is as follows:
(1) raw paired-end reads were merged using PEAR; (2) merged reads were
demultiplexed using 8-base-pair index sequences unique to each sample
(reads with index mismatches were discarded); and (3) OTUs from each
sample were taxonomically assigned using BLAST against 12S vertebrate
sequences available in GenBank and against COI arthropod sequences
available in MIDORI (Leray et al., 2018; Machida et al., 2017).
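The demultiplexing step can be illustrated with a minimal sketch. It
assumes that each merged read begins with a single 8-base-pair index
and that only exact index matches are kept; the function name, the
argument names, and the position of the index are assumptions made for
illustration and do not describe the pipeline actually used.

```python
def demultiplex(reads, index_to_sample, index_len=8):
    """Assign each read to a sample by exact match of its leading index.

    reads: iterable of sequence strings (assumed to start with the index).
    index_to_sample: dict mapping index sequence -> sample identifier.
    Returns (reads grouped by sample with the index trimmed, discard count).
    """
    by_sample = {sample: [] for sample in index_to_sample.values()}
    discarded = 0
    for read in reads:
        index, insert = read[:index_len], read[index_len:]
        sample = index_to_sample.get(index)
        if sample is None:
            # Any mismatch to the known indexes discards the read.
            discarded += 1
            continue
        by_sample[sample].append(insert)
    return by_sample, discarded
```

Exact matching is the strictest possible choice: a read with a single
sequencing error in its index is lost rather than risked on the wrong
sample.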
A series of filtering and quality control measures was applied to the
taxonomically assigned sequences. For the 12S vertebrate data, we first
removed OTUs identified as human DNA or as contaminants (total number
of reads per sample fewer than 100, or an average lower than the number
of reads in the negative controls). We then removed non-amplifying
samples, using a threshold of 500 reads per sample replicate. Within
the remaining sample replicates, we removed OTUs with a percentage
identity below 95%. From this filtered dataset, we further removed OTUs
that accounted for less than 1% of the remaining sequences in a sample.
Finally, we eliminated species that were not found in both sample
replicates. We then compared the taxonomic assignments with the known
regional fauna and reassigned non-regional species to closely related
regional matches. If no suitable species-level match was found, the
taxon was assigned at the genus or family level or removed from the
dataset.
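The threshold-based steps of this filtering can be sketched as follows.
The sketch covers only the 500-read replicate threshold, the 95%
identity cutoff, the 1% relative-abundance cutoff, and the
both-replicates rule; the removal of human DNA and contaminants and the
comparison with regional fauna are left out. The record layout
(dictionaries with the keys sample, replicate, otu, species, reads, and
identity), the function name, and the choice to compute the 1% cutoff
on reads pooled across a sample's replicates are all assumptions made
for illustration.

```python
from collections import defaultdict


def filter_assignments(rows, min_rep_reads=500, min_identity=95.0,
                       min_fraction=0.01, n_replicates=2):
    """Apply the read-count, identity, abundance and replicate filters."""
    # Step 1: drop sample replicates whose total read count is too low.
    rep_total = defaultdict(int)
    for r in rows:
        rep_total[(r["sample"], r["replicate"])] += r["reads"]
    rows = [r for r in rows
            if rep_total[(r["sample"], r["replicate"])] >= min_rep_reads]

    # Step 2: drop OTUs whose best match falls below the identity cutoff.
    rows = [r for r in rows if r["identity"] >= min_identity]

    # Step 3: drop OTUs below 1% of the reads remaining in the sample
    # (assumption: reads are pooled across the sample's replicates).
    sample_total = defaultdict(int)
    otu_total = defaultdict(int)
    for r in rows:
        sample_total[r["sample"]] += r["reads"]
        otu_total[(r["sample"], r["otu"])] += r["reads"]
    rows = [r for r in rows
            if otu_total[(r["sample"], r["otu"])]
            >= min_fraction * sample_total[r["sample"]]]

    # Step 4: keep only species detected in every replicate of a sample.
    reps_seen = defaultdict(set)
    for r in rows:
        reps_seen[(r["sample"], r["species"])].add(r["replicate"])
    return [r for r in rows
            if len(reps_seen[(r["sample"], r["species"])]) >= n_replicates]
```

Under these assumptions, a sample that loses one replicate at the first
step cannot satisfy the both-replicates rule, so it contributes no
species to the final dataset.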
For the COI sandfly data, we applied a similar set of quality control
measures. We again removed non-amplifying samples using a threshold of
500 reads per sample replicate. We then removed non-sandfly sequences
based on family and genus taxonomic designations, so that only sandfly
species from the family Psychodidae were retained. OTUs with a
percentage identity below 95%, and query sequences that accounted for
less than 1% of the sequences in that sample, were removed. Finally,
species that were not present in both sample replicates were removed.
With this curated dataset, we manually queried each OTU with BLAST to
check whether other local taxa had equal or nearly equal percentage
matches. If so, we reassigned the OTU to the genus level, which was the
case for all Nyssomyia species.
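The genus-level reassignment rule can be expressed as a small decision
function. The check described above was performed manually, so the
sketch below is only an illustration of the logic. The tolerance that
defines "nearly equal" is not stated in the text, so the 0.5
percentage-point default is a hypothetical value, as are the function
name and the placeholder taxon names in the usage example. The sketch
also assumes that the competing taxa belong to the same genus as the
best match.

```python
def resolve_assignment(hits, tolerance=0.5):
    """Return a species name, or a genus-level label if matches are tied.

    hits: non-empty list of (species_name, percent_match) tuples for
    local taxa, where species_name is a binomial such as "Genus species".
    tolerance: hypothetical maximum gap, in percentage points, for two
    matches to count as nearly equal.
    """
    ranked = sorted(hits, key=lambda h: h[1], reverse=True)
    best_species, best_match = ranked[0]
    rivals = [species for species, match in ranked[1:]
              if species != best_species and best_match - match <= tolerance]
    if rivals:
        # Another local taxon matches about as well: fall back to genus.
        return best_species.split()[0] + " sp."
    return best_species


# Placeholder names, not taxa from the study.
print(resolve_assignment([("Exemplum alpha", 99.1), ("Exemplum beta", 99.0)]))
print(resolve_assignment([("Exemplum alpha", 99.1), ("Exemplum beta", 96.0)]))
```

The first call falls back to the genus label because the two matches
differ by less than the tolerance; the second keeps the species-level
name.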