2 | METHODS
2.1 | Specimen
Collection
Blue crabs were sampled from the northern Gulf of Mexico, the U.S.
Atlantic coast and Caribbean (Fig. 1A,
Table 1). Sampling was concentrated at
five estuarine locations (FWC, GIL, LUM, RBC and RWR) along the
Louisiana coast (Fig. 1B) from which 1234
adults, juveniles and megalopae were sampled over a 6-year period and
genotyped (Table S1). Additional
samples from more distant locations were generously provided by
colleagues. Adults and juveniles were collected with baited crab pots,
hoop nets, and hand lines. Carapace width from point to point was
measured to the nearest millimeter and sex was determined by the shape
of the abdominal tergites. Males larger than 120 mm and females with
dome-shaped abdominal tergites were considered mature (following
Newcombe,
Sandoz, & Rogers‐Talbert, 1949;
Van
Engel, 1990). The second walking leg on the right side of each crab was
removed and stored in 95% ethanol at 4˚C. Megalopae were sampled with
modified “hogs-hair” collectors
(Metcalf,
van Montfrans, Lipcius, & Orth, 1995) deployed for approximately 24
hours, those visually identified as C. sapidus were removed,
washed with ambient seawater and preserved in pre-chilled 95% ethanol
at 4˚C.
2.2 | DNA Extraction
DNA was extracted from approximately 2 mg of muscle tissue from each
adult or juvenile, or from entire megalopae using Nucleospin® Tissue
Kits (Machery-Nagel) in an epMotion 5075 TMX liquid handling workstation
(Eppendorf) following the manufacturer’s protocols. DNA concentrations
were measured with a NanoDrop 1000 spectrophotometer (Thermo Fisher
Scientific) and adjusted to between 50 and 150 ng/µl in a volume of at
least 15 µl by either concentration in a SpeedVac vacuum concentrator
(Thermo Fisher Scientific) or dilution with deionized water.
2.3 | Selection of SNPs for
Genotyping
We selected 2958 SNPs for genotyping on a custom 24-chip Infinium®
iSelect® array (Illumina) with 5373 bead types. SNPs were selected for
assessment of data quality, analysis of population structure and testing
for natural selection. Gene annotation and SNP discovery were based on
an assembled transcriptome representing seven individuals
(Yednock,
Sullivan, & Neigel, 2015). Statistical power to detect genetic
differentiation increases with the number of alleles
(Ryman
et al., 2006), so it should be advantageous to combine alleles of
neighboring SNPs into greater numbers of haplotypes. To this end, we
selected multiple linked SNPs for some loci. We considered 272 gene loci
in which SNPs were assayed as candidates for selection by environmental
stress or pathogens. These loci had been assigned the gene ontology
terms: cold , defense , detox , ecdys ,heat , hypoxia , osmo or xenobio by Yednock
and co-workers
(2015).
2.4 | Design of the Infinium Bead
Array
The Infinium assay is based on the specificity of single-base extensions
of DNA duplexes formed by hybridization of bead-anchored DNA probes to
genomic target sequences. Based on the transcriptomes from Yednock and
co-workers
(2015),
SNPs were required to be biallelic, have a minor allele frequency of at
least 0.3, a high-quality read depth of at least 50, and be flanked by a
region of at least 60 bp without other SNPs. Introns were identified by
comparing transcriptomic sequences with genomic sequences identified by
BLAST searches
(Altschul,
Gish, Miller, Myers, & Lipman, 1990) of a draft genome of C.
sapidus (generously provided by T. Schultz). Probes that overlapped
introns were dropped from further consideration. Probe designs were
evaluated with the Illumina online Assay Design Tool; only probes with a
final score greater than 0.7 were considered further. To evaluate the
species-specificity of the SNP assays, a representative of a congener
(Callinectes similis ) was also genotyped. All genotyping was
performed by the DNA Technologies Core at the University of California
Davis Genome Center.
2.5 | Genotyping SNPs with Two
Probes
Coding regions are more polymorphic in the blue crab genome
(1
SNP per 13.3 bp; Yednock & Neigel, 2014) than in the human genome
(1
SNP per 346 bp; Cargill et al., 1999) for which the Infinium assay has
been extensively tested
(Steemers
et al., 2006). This was a concern, because polymorphisms within the
probe-binding region can cause genotyping errors. To detect genotyping
errors, improve call rates, and identify loci with null alleles we used
two probes for 1750 of the 2958 genotyped SNPs (59%). One probe was
designed to hybridize downstream of the SNP and the other upstream on
the opposite strand. For each crab, if a genotype was not called for one
of the probes the genotype called for the other probe was assigned.
Genotypes were scored as unknown if different genotypes were called for
the two probes or no call was made for either probe. A probe with null
alleles would have lower apparent heterozygosity than an unaffected
probe for the same locus. The deviation from expected heterozygosity was
calculated as (Hobserved –Hexpected ) / Hexpected for
both probes; if their values differed by more than 0.05 that locus was
not used for population structure analysis.
2.6 | Genotype Calling with
GenomeStudio
Initially, 2958 SNPs were genotyped from 1508 putative loci. We used 665
probes for A/T and C/G SNPs, and 4043 probes for A/G, A/C, T/G, and T/C
SNPs, requiring 5373 bead types. SNP genotypes were inferred by the
GenTrain 3.0 algorithm in the Genotyping Module of GenomeStudio 2.0.2
(Illumina), which also calculates a GenCall score (the frequency of
reliable calls). Data from probes with GenCall scores below 0.85 were
discarded. For population structure analysis, only crabs for which 85%
of loci were successfully genotyped were included. Scatter plots of
clusters of genotypes were visually assessed for each locus; genotypes
that did not belong to a well-defined cluster were scored as unknown.
2.7 | Illumina Infinium Genotyping versus
Sequencing
Yednock and Neigel
(2014)
sequenced portions of protein-coding genes in blue crabs. 176 of these
crabs were also genotyped in the present study, allowing a comparison
between Infinium genotyping and sequencing. Comparisons were made for
two SNPs in ATP/ADP translocase (ant ), two inATP-synthase subunit 9 (atps ), and three intrehalose 6-phosphate synthase (tps ). Deviation from
Hardy-Weinberg proportions (F ) determined both from sequences and
from Illumina genotypes were estimated with Genepop 4.2
(Rousset,
2008).
2.8 | Phasing SNPs in Multi-SNP
Loci
SNPs within the same locus were phased and combined into haplotypes with
PHASE 2.1
(Stephens
& Donnelly, 2003;
Stephens,
Smith, & Donnelly, 2001). Initially, 5 replicate runs of 1000
iterations with a thinning interval of 1 and a burn-in of 1000 were
used. If replicates differed in haplotype assignments, 5 more were run
with 10,000 iterations, and if those differed, 5 were run with 50,000
iterations and burn-ins of 5,000. If less than 90% of individuals were
each assigned haplotypes with at least 90% probability after PHASE
runs, progressively smaller subsets of SNPs were evaluated until a
subset was found that met this criterion.
2.9 | Tests for Linkage
Disequilibrium
After SNPs within the same locus were phased and combined into
haplotypes, tests for linkage disequilibrium (LD) between pairs of loci
were run in Genepop 4.7.0. Genepop evaluates the significance of
deviations from expected proportions of 2-locus diploid genotypes.
Initial runs combined all genotyped individuals of C. sapidusinto a single population sample and used default MCMC parameters of
10,000 dememorization steps, batch lengths of 5000 and 100 batches. As
recommended by Waples
(2014),
the distribution of p -values for each test was compared with the
expected uniform distribution. Loci with the lowest and highestp -values were retested in individual populations to assess if LD
was due to a Wahlund effect
(Sinnock,
1975).
2.10 | Analysis of Population
Structure
We began by calculating FST and testing for the
presence of population structure among locations, years and life stages.
When significant overall heterogeneity was found, two approaches were
used to characterize the pattern of heterogeneity: 1) tests of
heterogeneity in defined subsets of samples, with sequential Bonferroni
adjustments of the Type I error rate, α, which was set to 0.05, and 2)
pairwise estimates of FST between sampling units,
with Bonferroni adjustments of α and control of the False Discovery Rate
(FDR)
(Benjamini
& Hochberg, 1995) applied to tests of FST > 0. FST was estimated with
Genepop 4.7.0. Tests for overall heterogeneity were conducted with
CHIFISH
(Ryman,
2006), which provides two estimates of p : one fromχ 2 summed over loci and the other with Fisher’s
method of combining multiple independent tests. The significance of
pairwise FST estimates
(FST > 0) was determined with
Arlequin
(Schneider,
Roessli, & Excoffier, 2000) using 10,000 permutations of the data.
Statistical power of tests for population differentiation was assessed
with the Powsim_b version of POWSIM
(Ryman
& Palm, 2006), downloaded from
http://internt.zoologi.su.se/~ryman/. Genetic drift was simulated
for populations with effective sizes of 10,000 for 1 or 2 generations,
with expected FST reaching 0.00005 and 0.0001
respectively. 1000 replicates were used to estimate statistical power
and Type I error rates. Markov chain parameters for estimation ofp by Fisher’s method were 1000 dememorizations, 100 batches and
1000 iterations.
2.11 | Estimation of Coancestry
The coefficient of relationship (r ) between individuals was
estimated by the likelihood method of Milligan
(2003)
using Coancestry 1.0.1.8
(Wang,
2011), which requires estimates of population allele frequencies. We
examined how estimates of r were affected by partitioning of the
full sample for estimates of allele frequencies by life stage, sampling
year, or sampling location.