Materials and Methods
We used published microsatellite datasets for H. molleri andP. cultripes (Gutiérrez-Rodríguez et al., 2017; Sánchez‐Montes et
al., 2019) and generated SNP datasets for both species. Patterns of
genetic structure between markers were compared based on model-based
clustering analyses and those of genetic diversity were assessed with
individual heterozygosity estimates.
Data collection
Samples from H. molleri and P. cultripes covered most of
their current ranges (Figure S1; Table S1). They were evenly distributed
across the main genetic clusters determined in previous works with
microsatellites (Gutiérrez-Rodríguez et al., 2017; Sánchez‐Montes et
al., 2019), securing the representation of more than 20 samples per
north/south clusters. Microsatellite genotypes from H. molleriincluded 84 individuals from 25 localities genotyped at 18 loci (10%
missing data) from Sánchez-Montes et al. (2019). Microsatellite
genotypes from P. cultripes included 83 individuals from 43
localities genotyped at 14 loci (0 % missing data) from
Gutiérrez-Rodríguez et al. (2017). To facilitate comparisons between
marker datasets, we selected the same 83 individuals of P.
cultripes for SNP genotyping. However, in H. molleri only 39
individuals from the microsatellite dataset were amenable for SNP
genotyping. In this case, we sampled additional individuals from the
same or nearby locations as represented in the original microsatellite
study to complete a dataset of 90 individuals from 25 localities (Table
S1; Figure S1).
Genomic DNA was extracted with ExtractMe Genomic DNA 96-Well kits (DNA
GDAŃSK), and concentrated with QIAamp DNA Micro (QIAGEN GmbtH) kits,
when necessary. DNA extracts from H. molleri and P.
cultripes were standardized to 500 ng of DNA (with exceptions as low as
390 ng) and sent for sequencing at Diversity Arrays Technology
(Australia), which uses a proprietary protocol to sequence reduced
representation of the genome from double-digested restriction fragments.
We chose DArTseq because it has been reported to work well with large
and complex genomes, like those of amphibians (Lambert, Skelly, & Ezaz,
2016). The restriction fragments generated were sequenced in an Illumina
HiSeq 2500 as single-end reads of 77 nucleotides (nt). The sequencing
depths for H. molleri and P. cultripes were 7.7 and 5
million reads per sample, respectively. Diversity Arrays Technology
provides genotypes from the proprietary DArTSoft14 pipeline in a text
file along with several quality parameters on each SNP. Around 30% of
the samples in the run are included as internal replicates to provide
confidence levels on the genotype calls.
Data filtering
We applied several
filtering steps to the SNP genotype matrices using R 3.6.0 (R Core Team
2019) functions from the dartR 1.1.11 package (Gruber, Unmack,
Berry, & Georges, 2018) and custom code. The filters were applied as
follows. First, we retained samples with a proportion of loci with calls
(call rate per individual) greater than 0.35 and loci with high
confidence on their genotype calls (RepAvg parameter from DArTseq
greater than 0.95). We kept loci with balanced alleles (proportion of
reads for each allele across samples between 0.15 and 0.85) and removed
loci whose coverage was 3.5 times higher than the median coverage across
loci to remove potential paralogs (O’Leary, Puritz, Willis, Hollenbeck,
& Portnoy, 2018). Then, we removed loci with a call rate (proportion of
samples with a call) lower than 0.8, retained only one SNP per contig
(the one with greatest repeatability) and removed alleles with a
frequency less than 0.02 (O’Leary et al., 2018) (Figure S2).
Genetic structure
We conducted model-based genetic structure analyses in STRUCTURE v2.3.4
(Pritchard, Stephens, & Donnelly, 2000). For each dataset, we performed
10 replicate runs with values assuming a number of clusters (K) between
1 and 8 (K = 1 to K = 8), to encompass the optimal number of clusters (K
= 2, K = 4 and K = 6) found in previous studies with microsatellites for
the study species (Gutiérrez-Rodríguez et al., 2017; Sánchez‐Montes et
al., 2019), and explore any potential finer substructure. We used an
admixture model with correlated allele frequencies (Falush, Stephens, &
Pritchard, 2003) with no prior information on sample origin. For the
microsatellite data, we used the same run lengths as in the original
publications: 500,000 burnin steps followed by 1,000,000 iterations. For
the SNP datasets, run lengths were shorter as data chains often converge
faster: 30,000 burnin steps followed by 10,000 iterations (Table S2;
Figures S3.1 and S3.2). For the SNP runs, we estimated lambda with K = 1
by averaging lambda estimates across three replicate runs. These values
of lambda (0.67 for H. molleri and 0.69 for P. cultripes )
were then used across all runs of the SNP data, whereas for
microsatellites lambda was fixed to 1. Lower values of lambda can
improve the modeling of correlated allele frequencies when using SNPs,
where often the data is skewed towards rare alleles (Falush et al.,
2003). We ran STRUCTURE in parallel in 8 cores usingStructure_threader (Pina-Martins, Silva, Fino, & Paulo, 2017),
recording steps to log files every 50 and 5000 iterations for the SNP
and microsatellite data, respectively.
Convergence between the 10 replicate runs for each K was evaluated using
Gelman and Rubin’s convergence diagnostic, GR (Gelman & Rubin, 1992),
with function coda ::gelman.diag (Plummer, Best, Cowles, &
Vines, 2006). Values below 1.05 indicate good convergence (Vats &
Knudson, 2018). We used KFinder (Wang, 2019) to compare the best number
of clusters for each dataset through three approaches: (1)
Pr[X|K], the probability of data X given K clusters
(Pritchard et al., 2000), (2) Evanno’s ΔK, which considers the rate of
change in the logarithm of the probability of data between successive K
values (Evanno, Regnaut, & Goudet, 2005), and (3) PI, parsimony index,
a newly proposed metric that favors K values yielding clusters with the
most consistent and with minimal average individual admixture. The
latter is assumed to be a more consistent metric across a wider range of
demographic scenarios (Wang, 2019). We ran CLUMPAK (Kopelman, Mayzel,
Jakobsson, Rosenberg, & Mayrose, 2015) on STRUCTURE outputs. CLUMPAK
feeds the software CLUMPP with results of replicate runs for each K
value to generate consensus solutions for the distinct modes. It also
computes the similarity between Q-matrices (ancestry matrices) from each
run and matches clusters across successive values of K.
STRUCTURE results were contrasted with a model-free hierarchical
clustering method using the Neighbor-Joining algorithm on pairwise
genetic distances (Supplementary File S1).
Congruence in ancestries between microsatellite and SNP
datasets
We assessed the congruence of the Q-matrices from STRUCTURE results
between SNP and microsatellite datasets using the Symmetric Similarity
Coefficient (SSC) (Jakobsson & Rosenberg, 2007). For P.
cultripes , since all individuals were identical among datasets, we ran
CLUMPAK over the combined STRUCTURE results from both markers (n = 20
runs per K). The CLUMPP algorithm in CLUMPAK computes a pairwise
distance matrix for all runs in each K based on the SSC. For H.
molleri , since we sampled different individuals from the same
localities for microsatellites and SNPs, we averaged individual
ancestries per locality, and used R package starmie 0.1.2
(Tonkin-Hill & Lee, 2016) to run CLUMPP and compute the similarity
coefficients. SSC ranges from negative values to a maximum of 1 when
Q-matrices are identical. Pairwise SSCs were computed between runs from
the same marker (SNPs-SNPs, microsatellites-microsatellites), in
addition to cross-comparisons between markers (SNPs-microsatellites). To
aid visualization of spatial patterns of genetic structure, we computed
mean ancestries per locality for each species and marker from major
clusters after CLUMPAK results. Then, for each species and K value, we
aligned the microsatellite and SNP matrices using the CLUMPP algorithm
from starmie 0.1.2 (Tonkin-Hill & Lee, 2016).
We evaluated admixture in individual ancestries of P. cultripesfor each K in STRUCTURE using a newly developed index: the Coefficient
of Admixture, CA. CAKi for individual iacross clusters of a Q-matrix from a given K in STRUCTURE
represent individual levels of genetic admixture, 0 indicating all
ancestry belonging to a single cluster, and 1, equal proportions across
clusters (details in Supplementary File S2).
Genetic diversity
Individual heterozygosity with each marker type was computed as the
proportion of heterozygous loci standardized by the heterozygosity of
loci across the dataset (standardized multilocus heterozygosity, sMLH)
(Coltman, Pilkington, Smith, & Pemberton, 1999), usinginbreedR::sMLH (Stoffel et al., 2016) in R. We then represented
the median sMLH per locality for each dataset in a map to describe the
spatial distribution of genetic diversity. Pearson correlations were
computed between sMLH from microsatellite and SNP data, for individuals
of P. cultripes . We also explored patterns of genetic diversity
along the axes of demographic expansions from inferred glacial refugia
in both species. For that purpose, the putative effects of latitudinal
and longitudinal gradients on patterns of genetic diversity were
assessed with linear models in R, using sMLH as a dependent variable and
latitude and longitude as fixed effects.