Read mapping and SNP calling
First, we trimmed adapter sequences from the sequence reads by applying Trimmomatic (Bolger et al. 2014) to the FASTQ files. We then mapped the adapter-trimmed reads to the D. pulex reference genome (PA42 version 4.1) using BWA (Li and Durbin 2009). Coverage for each site was calculated as the total number of reads mapped to that site, and coverage for each sample was calculated as the total number of mapped bases divided by the total length of the mapped genomic regions. We examined the genomic distribution of per-site coverage and set minimum and maximum coverage cutoffs for each sample to avoid analyzing problematic sites (Table 1; Fig. S1). SNPs were identified using Samtools (Li et al. 2009) with “samtools mpileup -uf ref.fasta sorted.bam | bcftools call -mv > raw.vcf” and “bcftools filter -s LowQual -e '%QUAL<20' raw.vcf > flt.vcf”.
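The coverage calculation and cutoff filter described above can be sketched as follows. This is a minimal illustration, not the pipeline used in the study. It assumes per-site depth is available as three tab-separated columns (scaffold, 1-based position, depth), the layout written by `samtools depth` for a single BAM file, and it interprets "mapped genomic regions" as the sites covered by at least one read. The function names and the cutoff values in the usage example are hypothetical; the actual cutoffs are those reported in Table 1.

```python
from typing import Dict, Iterable, List, Tuple

Site = Tuple[str, int]  # (scaffold name, 1-based position)


def parse_depth_lines(lines: Iterable[str]) -> Dict[Site, int]:
    """Parse 'scaffold<TAB>position<TAB>depth' lines into a per-site depth map."""
    depth: Dict[Site, int] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        scaffold, pos, d = line.split("\t")[:3]
        depth[(scaffold, int(pos))] = int(d)
    return depth


def sample_coverage(depth: Dict[Site, int]) -> float:
    """Total mapped bases divided by the number of sites with at least one read.

    Assumption: 'mapped genomic regions' means sites with depth > 0.
    """
    covered = [d for d in depth.values() if d > 0]
    if not covered:
        return 0.0
    return sum(covered) / len(covered)


def sites_within_cutoffs(depth: Dict[Site, int], min_cov: int, max_cov: int) -> List[Site]:
    """Return the sites whose depth lies within [min_cov, max_cov], sorted."""
    return sorted(site for site, d in depth.items() if min_cov <= d <= max_cov)


if __name__ == "__main__":
    example = ["scaffold_1\t1\t10", "scaffold_1\t2\t0", "scaffold_1\t3\t30"]
    depth = parse_depth_lines(example)
    print(sample_coverage(depth))
    # Hypothetical cutoffs, for illustration only.
    print(sites_within_cutoffs(depth, min_cov=5, max_cov=20))
```

Sites with unusually low depth tend to be poorly supported, while sites with unusually high depth often reflect collapsed repeats or paralogs, which is why a filter of this kind uses both a lower and an upper bound.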
If each NZ population represents a single clone, all of its bi-allelic sites should have equal numbers of reads mapped to the two parental alleles. To check this, we searched for sites at which the read counts for the two parental alleles deviated from a 1:1 ratio. To minimize mapping bias, we took three steps: 1) we remapped the reads using bowtie with “-q -m 1 -v 3 --best” to retain only uniquely mapped reads (Stevenson et al. 2013); 2) we removed regions with >3 SNPs within 100 bp, because additional differentiating sites interfere with read alignment (Stevenson et al. 2013); and 3) to reduce mapping bias toward the reference alleles (Degner et al. 2009), we generated a masked assembly by replacing all of the bi-allelic sites in the PA42 4.1 reference with “N”. Only bi-allelic sites whose read counts for the two alleles deviated from a 1:1 ratio when mapped to both the original and the masked assemblies were considered true deviating sites.
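The masking, SNP-cluster filter, and 1:1 ratio check described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation. The text does not say which statistical test was used to detect deviation from a 1:1 ratio; the sketch assumes an exact two-sided binomial test with a hypothetical significance threshold `alpha` and no multiple-testing correction. All function names are hypothetical.

```python
from math import comb
from typing import Iterable, Set, Tuple


def mask_sites(sequence: str, positions: Iterable[int]) -> str:
    """Return the sequence with the given 1-based positions replaced by 'N'."""
    bases = list(sequence)
    for pos in positions:
        bases[pos - 1] = "N"
    return "".join(bases)


def dense_snp_positions(positions: Iterable[int], window: int = 100, max_snps: int = 3) -> Set[int]:
    """Return SNP positions that fall in any `window`-bp span holding more than `max_snps` SNPs."""
    pos = sorted(positions)
    flagged: Set[int] = set()
    left = 0
    for right in range(len(pos)):
        # Shrink the span until it covers fewer than `window` bp.
        while pos[right] - pos[left] >= window:
            left += 1
        if right - left + 1 > max_snps:
            flagged.update(pos[left:right + 1])
    return flagged


def binomial_two_sided_p(reads_a: int, reads_b: int) -> float:
    """Exact two-sided binomial p-value for H0: both alleles are equally likely."""
    n = reads_a + reads_b
    if n == 0:
        return 1.0
    k = min(reads_a, reads_b)
    # With p = 0.5 the distribution is symmetric, so double the lower tail.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def is_true_deviating_site(
    original_counts: Tuple[int, int],
    masked_counts: Tuple[int, int],
    alpha: float = 0.05,  # hypothetical threshold, not taken from the source
) -> bool:
    """A site counts as deviating only if it deviates under both assemblies."""
    return all(
        binomial_two_sided_p(a, b) < alpha
        for a, b in (original_counts, masked_counts)
    )


if __name__ == "__main__":
    print(mask_sites("ACGTACGT", [2, 5]))
    print(dense_snp_positions([10, 20, 30, 40, 500]))
    print(is_true_deviating_site((18, 2), (17, 3)))
```

Requiring a deviation under both the original and the masked assembly guards against sites where the apparent imbalance arises only because reads carrying the reference allele align more easily.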
Characterizing D. pulicaria-specific markers
To identify D. pulicaria-specific markers, we first searched for homozygous SNPs shared by all 14 D. pulicaria clones. We then checked these SNPs in 42 D. pulex clones collected across North America, Europe, and China and eliminated those that appeared in any of the D. pulex clones. For the remaining SNPs, we examined the corresponding loci in each of the D. pulex clones and kept only those at which all D. pulex clones were homozygous for the same allele.
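The marker selection described above amounts to a per-site filter over genotype calls, which can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes that each clone's genotypes are held in a dictionary mapping a site to a pair of bases and that a missing genotype call disqualifies the site. The data layout and function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Site = Tuple[str, int]        # (scaffold name, 1-based position)
Genotype = Tuple[str, str]    # the two bases called at a diploid site
Calls = Dict[Site, Genotype]  # genotype calls for one clone


def shared_homozygous_allele(clones: List[Calls], site: Site) -> Optional[str]:
    """Return the base if every clone is homozygous for the same base, else None."""
    alleles = set()
    for calls in clones:
        genotype = calls.get(site)
        # Assumption: a missing call or a heterozygous call disqualifies the site.
        if genotype is None or genotype[0] != genotype[1]:
            return None
        alleles.add(genotype[0])
    return alleles.pop() if len(alleles) == 1 else None


def species_specific_markers(
    pulicaria: List[Calls],
    pulex: List[Calls],
    reference: Dict[Site, str],
) -> Dict[Site, Tuple[str, str]]:
    """Map each marker site to (D. pulicaria base, D. pulex base)."""
    markers: Dict[Site, Tuple[str, str]] = {}
    for site, ref_base in reference.items():
        pulicaria_base = shared_homozygous_allele(pulicaria, site)
        if pulicaria_base is None or pulicaria_base == ref_base:
            continue  # not a homozygous SNP shared by all D. pulicaria clones
        pulex_base = shared_homozygous_allele(pulex, site)
        # Requiring one shared homozygous D. pulex base that differs from the
        # D. pulicaria base also guarantees the SNP appears in no D. pulex clone.
        if pulex_base is None or pulex_base == pulicaria_base:
            continue
        markers[site] = (pulicaria_base, pulex_base)
    return markers


if __name__ == "__main__":
    reference = {("s1", 10): "A", ("s1", 20): "C"}
    pulicaria = [{("s1", 10): ("T", "T"), ("s1", 20): ("G", "G")}]
    pulex = [{("s1", 10): ("A", "A"), ("s1", 20): ("C", "G")}]
    print(species_specific_markers(pulicaria, pulex, reference))
```

In the usage example, the second site is rejected because one D. pulex clone is heterozygous and carries the D. pulicaria base, so only the first site qualifies as a marker.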