Sequence analysis
Raw sequence reads were processed with a bioinformatics pipeline designed to trim the reads and sort them by sample. The pipeline comprised three steps: (1) paired-end raw reads were merged using PEAR; (2) the merged reads were demultiplexed using the 8 bp index sequences unique to each sample, and reads with index mismatches were discarded; and (3) OTUs from each sample were taxonomically assigned using BLAST against 12S vertebrate sequences available in GenBank and against COI arthropod sequences available in MIDORI (Leray et al., 2018; Machida et al., 2017).
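The demultiplexing step can be illustrated with a minimal Python sketch. It assumes that the 8 bp index sits at the start of each read and that reads arrive as (identifier, sequence) pairs; the function name, the input format, and the index position are hypothetical and are not taken from the pipeline itself.

```python
from collections import defaultdict

INDEX_LENGTH = 8  # index length stated in the text


def demultiplex(reads, index_to_sample):
    """Group reads by sample using an exact match on the leading index.

    reads: iterable of (read_id, sequence) tuples (assumed input format).
    index_to_sample: dict mapping each 8 bp index to a sample name.
    Returns (reads grouped by sample, number of discarded reads).
    """
    by_sample = defaultdict(list)
    discarded = 0
    for read_id, seq in reads:
        # Assumption: the index occupies the first 8 bases of the read.
        index, insert = seq[:INDEX_LENGTH], seq[INDEX_LENGTH:]
        sample = index_to_sample.get(index)
        if sample is None:
            # Any mismatch to the known indexes discards the read.
            discarded += 1
            continue
        by_sample[sample].append((read_id, insert))
    return dict(by_sample), discarded
```

Exact matching is the strictest possible choice: a read with a single sequencing error in its index is lost rather than risked as a misassignment to another sample.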
A series of filtering and quality control steps was then applied to the taxonomically assigned sequences. For the 12S vertebrate data, we first removed OTUs identified as human DNA or as contaminants (OTUs whose total number of reads per sample was fewer than 100 or averaged fewer than the number of reads in the negative controls). We then removed non-amplifying samples, applying a threshold of 500 reads per sample replicate. Within the remaining sample replicates, we removed OTUs with a percentage identity below 95%. From this filtered data, we additionally removed OTUs that accounted for less than 1% of the remaining sequences in a sample. Finally, we eliminated species that were not found in both sample replicates. We then compared the taxonomic assignments with the known regional fauna and reassigned non-regional species to closely related regional matches. If no suitable species-level match was found, the taxon was assigned at the genus or family level or removed from the dataset.
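The threshold-based steps of this filter can be sketched in Python as follows. This is an illustration under stated assumptions, not the code used in the study: each record is taken to be one OTU in one sample replicate, the 500 read and 1% thresholds are assumed to apply per sample replicate, and each sample is assumed to have two replicates. The record keys and the function name are hypothetical. Removal of human and contaminant OTUs and the comparison with regional fauna are left out.

```python
from collections import defaultdict

MIN_REPLICATE_READS = 500  # non-amplification threshold from the text
MIN_IDENTITY = 95.0        # minimum percentage identity from the text
MIN_FRACTION = 0.01        # 1% relative abundance threshold from the text


def filter_otus(records):
    """Apply the read-count, identity, abundance and replicate filters.

    records: list of dicts with the (hypothetical) keys
    'sample', 'replicate', 'species', 'identity' and 'reads'.
    """
    def rep_key(r):
        return (r["sample"], r["replicate"])

    # 1. Drop replicates with fewer than 500 reads in total.
    totals = defaultdict(int)
    for r in records:
        totals[rep_key(r)] += r["reads"]
    kept = [r for r in records if totals[rep_key(r)] >= MIN_REPLICATE_READS]

    # 2. Drop OTUs with a percentage identity below 95%.
    kept = [r for r in kept if r["identity"] >= MIN_IDENTITY]

    # 3. Drop OTUs below 1% of the reads remaining in their replicate.
    remaining = defaultdict(int)
    for r in kept:
        remaining[rep_key(r)] += r["reads"]
    kept = [r for r in kept if r["reads"] >= MIN_FRACTION * remaining[rep_key(r)]]

    # 4. Keep only species detected in both replicates of a sample.
    replicates_seen = defaultdict(set)
    for r in kept:
        replicates_seen[(r["sample"], r["species"])].add(r["replicate"])
    return [r for r in kept
            if len(replicates_seen[(r["sample"], r["species"])]) >= 2]
```

The order of the steps matters: the 1% threshold is computed from the reads that survive the identity filter, which matches the wording "remaining sequences" above.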
For the COI sandfly data, we applied a similar set of quality control measures. We again removed non-amplifying samples using a threshold of 500 reads per sample replicate. We then removed non-sandfly sequences based on family and genus taxonomic designations, so that only sandfly species from the family Psychodidae were retained. We removed OTUs with a percentage identity below 95% and query sequences that accounted for less than 1% of the total number of sequences in that sample. Finally, species that were not present in both sample replicates were removed. Using this curated dataset, we manually queried each OTU with BLAST to determine whether other local taxa had equal or nearly equal percentage matches to the query. If so, we reassigned the OTU to the genus level, which was the case for all Nyssomyia species.
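The genus-level reassignment was done manually, but its decision rule can be expressed as a short Python sketch. The text does not define "nearly equal", so the tolerance of 0.5 percentage points is a hypothetical placeholder, as are the function name and the input format. The sketch also assumes binomial species names and that competing local taxa share the genus of the best hit.

```python
def resolve_assignment(hits, local_species, tolerance=0.5):
    """Return a species name, a genus name, or None for one OTU.

    hits: list of (species_name, percent_match) pairs from a BLAST search.
    local_species: set of species names known from the study region.
    tolerance: assumed cutoff for "nearly equal" matches (hypothetical).
    """
    local_hits = [(name, pct) for name, pct in hits if name in local_species]
    if not local_hits:
        return None  # no local match to evaluate
    best_name, best_pct = max(local_hits, key=lambda hit: hit[1])
    # Other local taxa whose match is within the tolerance of the best hit.
    rivals = {name for name, pct in local_hits
              if name != best_name and best_pct - pct <= tolerance}
    if rivals:
        # Ambiguous at species level: fall back to the genus name.
        return best_name.split()[0]
    return best_name


# Illustrative values only, not results from the study.
hits = [("Nyssomyia whitmani", 99.0), ("Nyssomyia intermedia", 98.8)]
local = {"Nyssomyia whitmani", "Nyssomyia intermedia"}
print(resolve_assignment(hits, local))  # prints "Nyssomyia"
```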