Read mapping and SNP calling
First, we trimmed adapter sequences from the sequence reads by applying
Trimmomatic (Bolger et al. 2014) to the FASTQ files. Then, we mapped the
adapter-trimmed sequence reads to the D. pulex reference genome
(PA42 version 4.1) using BWA (Li and Durbin 2009). Coverage for each
site was calculated by counting the reads mapped to it, and coverage for
each sample was calculated as the total number of mapped bases divided by
the total length of the mapped genomic regions. We examined the genomic
distribution of per-site coverage and set minimum and maximum
sample-coverage cutoffs to avoid analyzing problematic sites (Table 1;
Fig. S1). SNPs were identified with Samtools (Li et al. 2009) using
“samtools mpileup -uf ref.fasta sorted.bam | bcftools call -mv
> raw.vcf” and “bcftools filter -s LowQual -e
'%QUAL<20' raw.vcf > flt.vcf”.
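The coverage calculation and cutoff filter described above can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the function names are hypothetical, the per-site depths are assumed to have been extracted already (for example from a pileup), and "mapped genomic regions" is assumed to mean sites covered by at least one read.

```python
from typing import List


def sample_coverage(site_depths: List[int]) -> float:
    """Mean coverage of a sample: total mapped bases divided by the
    number of sites covered by at least one read (an assumed reading
    of "mapped genomic regions")."""
    covered = [d for d in site_depths if d > 0]
    if not covered:
        return 0.0
    return sum(covered) / len(covered)


def sites_within_cutoffs(site_depths: List[int],
                         min_cov: int,
                         max_cov: int) -> List[int]:
    """Indices of sites whose depth lies within [min_cov, max_cov].

    The cutoffs themselves are inputs; the study's values are in its
    Table 1 and are not reproduced here."""
    return [i for i, d in enumerate(site_depths)
            if min_cov <= d <= max_cov]


# Toy depths for six consecutive sites (illustrative values only).
depths = [0, 10, 20, 30, 0, 200]
mean_cov = sample_coverage(depths)
kept = sites_within_cutoffs(depths, min_cov=5, max_cov=100)
```

Sites with very low depth give unreliable genotype calls, while sites with very high depth often reflect collapsed repeats or paralogs, which is why both a lower and an upper bound are applied.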
If each NZ population represents a single clone, all of its bi-allelic
sites should have equal numbers of reads mapped to the two parental
alleles. To check this, we searched for sites with reads for the two
parental alleles deviating from a 1:1 ratio. To minimize mapping bias,
1) we remapped the reads using bowtie with “-q -m 1 -v 3 --best” to
obtain uniquely mapped reads (Stevenson et al. 2013); 2) we removed
regions with >3 SNPs within 100 bp because additional
differentiating sites interfere with read alignment (Stevenson et
al. 2013); and 3) we masked all of the bi-allelic sites with “N” in the
PA42 4.1 reference to generate an assembly with reduced mapping bias
toward the reference alleles (Degner et al. 2009). Only bi-allelic
sites whose read counts for the two alleles deviated from a 1:1 ratio in
mappings to both the original and masked assemblies were considered true
deviating sites.
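A minimal sketch of two of the steps above is given here: flagging SNPs in dense windows, and testing allele read counts against a 1:1 ratio. The text does not state which statistical test or significance threshold was used, so the exact two-sided binomial test and the `alpha = 0.05` default are assumptions, as are all function names and the definition of a "region" as any 100-bp window holding more than three SNPs.

```python
from math import comb
from typing import Dict, Iterable, Set, Tuple

Site = Tuple[str, int]  # (scaffold, position); hypothetical layout


def snps_in_dense_windows(positions: Iterable[int],
                          window: int = 100,
                          max_snps: int = 3) -> Set[int]:
    """Positions on one scaffold that fall in any `window`-bp stretch
    containing more than `max_snps` SNPs."""
    pos = sorted(positions)
    flagged: Set[int] = set()
    left = 0
    for right in range(len(pos)):
        # Advance the left edge until the span is shorter than `window`.
        while pos[right] - pos[left] >= window:
            left += 1
        if right - left + 1 > max_snps:
            flagged.update(pos[left:right + 1])
    return flagged


def binomial_two_sided_p(reads_a: int, reads_b: int) -> float:
    """Exact two-sided binomial p-value for a 1:1 expectation
    (assumed test; the source does not name one)."""
    n = reads_a + reads_b
    if n == 0:
        return 1.0
    k = min(reads_a, reads_b)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


def deviating_sites(counts: Dict[Site, Tuple[int, int]],
                    alpha: float = 0.05) -> Set[Site]:
    """Sites whose allele read counts reject the 1:1 ratio at `alpha`."""
    return {site for site, (a, b) in counts.items()
            if binomial_two_sided_p(a, b) < alpha}


def true_deviating_sites(original: Dict[Site, Tuple[int, int]],
                         masked: Dict[Site, Tuple[int, int]],
                         alpha: float = 0.05) -> Set[Site]:
    """Keep only sites that deviate in both the original-reference and
    the N-masked-reference mappings."""
    return deviating_sites(original, alpha) & deviating_sites(masked, alpha)


# Toy read counts (illustrative values only).
original = {("s1", 100): (30, 5), ("s1", 200): (20, 18), ("s1", 300): (25, 6)}
masked = {("s1", 100): (28, 6), ("s1", 200): (19, 19), ("s1", 300): (16, 14)}
confirmed = true_deviating_sites(original, masked)
```

Requiring agreement between the two mappings means a site that looks skewed only because reads carrying the non-reference allele aligned poorly to the original reference is not counted as a deviation.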
Characterizing D. pulicaria-specific markers
To identify D. pulicaria-specific markers, we first searched for
homozygous SNPs shared by all 14 D. pulicaria clones. Then, we
checked these SNPs in 42 D. pulex clones collected across North
America, Europe, and China and eliminated those that appeared in any of
the D. pulex clones. For the remaining SNPs, we checked the
corresponding loci in the genome in each of the D. pulex clones
and kept only those that were homozygous for the same two bases across
all D. pulex clones.
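The three filtering steps above can be sketched as a single pass over candidate sites. This is an illustrative reading rather than the authors' code: the genotype dictionaries, the function name, and the interpretation of the final step (all D. pulex clones carry the same homozygous genotype) are assumptions.

```python
from typing import Dict, List, Tuple

Site = Tuple[str, int]       # (scaffold, position); hypothetical layout
Genotype = Tuple[str, str]   # the two bases called at a diploid site


def pulicaria_specific_markers(
    ref_bases: Dict[Site, str],
    pulicaria: List[Dict[Site, Genotype]],
    pulex: List[Dict[Site, Genotype]],
) -> Dict[Site, Tuple[str, str]]:
    """Map each retained site to (D. pulicaria base, D. pulex base)."""
    markers: Dict[Site, Tuple[str, str]] = {}
    for site, ref in ref_bases.items():
        # Step 1: every D. pulicaria clone is homozygous for the same
        # non-reference base at this site.
        pc_genos = [clone.get(site) for clone in pulicaria]
        if any(g is None for g in pc_genos):
            continue
        pc_alleles = {base for g in pc_genos for base in g}
        if len(pc_alleles) != 1:
            continue
        pc_base = next(iter(pc_alleles))
        if pc_base == ref:
            continue

        px_genos = [clone.get(site) for clone in pulex]
        if any(g is None for g in px_genos):
            continue
        px_alleles = {base for g in px_genos for base in g}
        # Step 2: drop the SNP if the base appears in any D. pulex clone.
        if pc_base in px_alleles:
            continue
        # Step 3: all D. pulex clones share one homozygous genotype
        # (assumed interpretation of the final filter).
        if len(px_alleles) != 1:
            continue
        markers[site] = (pc_base, next(iter(px_alleles)))
    return markers


# Toy genotypes for two clones per species (illustrative values only).
s10, s20, s30, s40 = ("s1", 10), ("s1", 20), ("s1", 30), ("s1", 40)
ref_bases = {s10: "A", s20: "C", s30: "G", s40: "T"}
pulicaria = [
    {s10: ("G", "G"), s20: ("T", "T"), s30: ("A", "A"), s40: ("C", "T")},
    {s10: ("G", "G"), s20: ("T", "T"), s30: ("A", "A"), s40: ("C", "C")},
]
pulex = [
    {s10: ("A", "A"), s20: ("C", "T"), s30: ("G", "G"), s40: ("T", "T")},
    {s10: ("A", "A"), s20: ("C", "C"), s30: ("G", "C"), s40: ("T", "T")},
]
markers = pulicaria_specific_markers(ref_bases, pulicaria, pulex)
```

In this toy input only the first site survives: the second is lost because one D. pulex clone carries the D. pulicaria base, the third because the D. pulex clones are not uniformly homozygous, and the fourth because one D. pulicaria clone is heterozygous.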