Phylogenomic and population genomic analyses
Maximum likelihood analysis based on 3997 SNPs (contig dataset) inferred a phylogenetic split between Virginia and Ohio samples (Fig. 4). In all cases, technical replicates of the same specimen grouped together, even with levels of missing data as high as 87% (USNM 525251). Principal component analyses were strongly affected by levels of missing data (Fig. 5A) and by differences between RADseq and capture-based replicates (Fig. 5B). When all samples were analyzed (3997 SNPs, n = 32), PC 1 (24% of variance) separated samples by geography, although replicates with high levels of missing data were less clearly separated (Fig. 5A). PC 2 (17%) separated replicates with high levels of missing data and, to some degree, RADseq from capture-based replicates. After samples with high missingness were removed (3997 SNPs, n = 27), PC 1 (29%) separated replicates by geography and PC 2 (18%) separated RADseq from capture-based replicates (Fig. 5B). When the contig dataset was restricted to capture-based replicates and to loci shared by at least 90% of samples (713 SNPs, n = 25), PC 1 (20%) split samples by geography and PC 2 (12%) again separated them by level of missingness. The most divergent samples along PC 2 were all formalin-fixed samples, which had the highest amounts of missing data (Fig. S8).
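The section does not state how missing genotypes were handled in the PCA. A minimal sketch, assuming a samples × SNPs matrix of alternate-allele counts (0/1/2, NaN for missing) and per-SNP mean imputation (a common choice, not necessarily the one used in this study), illustrates the steps described above: filtering loci by completeness, measuring per-sample missingness, and computing PC scores with the percent of variance each PC explains. The function names and toy genotypes are hypothetical. Under mean imputation, a sample with many missing calls is filled with average values and pulled toward the center of PC space, which is one way high-missingness replicates can end up less clearly separated on PC 1.

```python
import numpy as np


def filter_loci(geno, min_complete=0.9):
    """Keep SNP columns called in at least `min_complete` of samples.

    geno: samples x SNPs array of alt-allele counts (0/1/2), NaN = missing.
    """
    completeness = np.mean(~np.isnan(geno), axis=0)
    return geno[:, completeness >= min_complete]


def sample_missingness(geno):
    """Fraction of missing genotypes in each sample (row)."""
    return np.mean(np.isnan(geno), axis=1)


def pca_mean_imputed(geno, n_pcs=2):
    """PCA of a genotype matrix after per-SNP mean imputation.

    Assumption: missing calls are replaced by the SNP's mean genotype.
    Requires every SNP column to have at least one called genotype.
    Returns (PC scores per sample, percent of variance explained per PC).
    """
    g = np.array(geno, dtype=float)
    col_means = np.nanmean(g, axis=0)
    rows, cols = np.where(np.isnan(g))
    g[rows, cols] = col_means[cols]      # fill gaps with the SNP mean
    g -= g.mean(axis=0)                  # center each SNP
    u, s, _ = np.linalg.svd(g, full_matrices=False)
    pct = 100 * s**2 / np.sum(s**2)      # variance share of each PC
    return u[:, :n_pcs] * s[:n_pcs], pct[:n_pcs]


# Hypothetical toy data: samples 0-2 and 3-5 stand in for two populations.
geno = np.array([
    [0, 0, 0, 0, 1],
    [0, 0, 1, 0, 0],
    [0, np.nan, 0, 0, 0],
    [2, 2, 2, 1, 2],
    [2, 2, 1, 2, 2],
    [2, 2, 2, 2, np.nan],
])
print(sample_missingness(geno))          # two samples miss 1 of 5 SNPs
print(filter_loci(geno, 0.9).shape)      # → (6, 3): only fully called SNPs remain
scores, pct = pca_mean_imputed(geno)
print(scores[:, 0], pct)                 # PC 1 scores split the two groups
```

Mean imputation is used here only because it is simple and keeps every sample in the analysis; probabilistic or model-based imputation would reduce the shrinkage of high-missingness samples but adds assumptions of its own.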
Estimates of nucleotide diversity were similar for supernatant and pellet replicates (supernatant = 0.35, SD = 0.004; pellet = 0.34, SD = 0.008) but significantly lower for formalin-fixed (0.25, SD = 0.007) and RADseq (0.26, SD = 0.004) replicates (Fig. 6). The strict SNP filtering regime (95% complete, 298 SNPs) reduced the differences between formalin-fixed and other capture-based replicates, but still yielded significant differences in estimates between RADseq and all capture-based replicates (Fig. S3A). Counts of heterozygous sites indicated similar levels of heterozygosity among the three capture-based replicates: supernatant (31.6%, SD = 5.8), pellet (30%, SD = 3.3), and formalin-fixed (29.3%, SD = 5.6). RADseq replicates had significantly lower heterozygosity (12.4%, SD = 3.4) than the capture-based replicates, indicating a homozygote bias (Fig. S4).
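The estimators behind these values are not specified in this section. A minimal sketch, assuming diploid genotypes coded as alternate-allele counts (0/1/2, NaN for missing), computes nucleotide diversity as the mean over SNPs of the standard unbiased per-site estimator, and observed heterozygosity as the fraction of each sample's called SNPs that are heterozygous. The function names and toy data are hypothetical, and the study's pipeline may differ (for example, by averaging over all sequenced bases rather than over SNPs only).

```python
import numpy as np


def nucleotide_diversity(geno):
    """Mean per-SNP nucleotide diversity (pi) from diploid 0/1/2 genotypes.

    Assumption: pi is averaged over SNP sites only. For each SNP with n
    called chromosomes and alt-allele frequency p,
    pi = n / (n - 1) * 2p(1 - p). Missing genotypes (NaN) are skipped
    per SNP, so each site uses only the samples called there.
    """
    pis = []
    for col in np.asarray(geno, dtype=float).T:
        called = col[~np.isnan(col)]
        if called.size == 0:
            continue                     # no called genotypes at this SNP
        n = 2 * called.size              # sampled chromosomes at this SNP
        p = called.sum() / n             # alt-allele frequency
        pis.append(n / (n - 1) * 2 * p * (1 - p))
    return float(np.mean(pis))


def observed_heterozygosity(geno):
    """Per-sample fraction of called SNPs that are heterozygous (genotype 1).

    Assumes every sample has at least one called SNP.
    """
    g = np.asarray(geno, dtype=float)
    called = ~np.isnan(g)
    het = (g == 1) & called
    return het.sum(axis=1) / called.sum(axis=1)


# Hypothetical toy data: 4 samples x 3 SNPs, one missing call.
geno = np.array([
    [1, 0, 1],
    [1, 1, np.nan],
    [0, 2, 1],
    [2, 1, 1],
])
print(nucleotide_diversity(geno))
print(observed_heterozygosity(geno))     # 2/3, 2/2, 1/3, 2/3 of called SNPs
```

Because heterozygosity is a per-sample fraction of called sites, a method that systematically calls heterozygous sites as homozygous lowers this value directly, which is the pattern reported for the RADseq replicates.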