3 | RESULTS
3.1 | Genotyping Call Rates and Reproducibility
A total of 2958 SNPs were genotyped; 910 of 1208 (75.3%) genotyped with single probes passed
quality assessment in GenomeStudio 2.0.2. For the 1750 SNPs genotyped
with two probes, 1118 SNPs (63.9%) passed quality assessment for both
probes, 470 (26.9%) for one probe, and 162 (9.3%) for neither probe.
Overall, 2498 SNPs (84.4%) passed quality assessment for at least one
probe; among SNPs genotyped with two probes, 90.7% passed for at least one probe. After
loci with significant deviations from Hardy-Weinberg proportions were
removed, data for 2486 SNP loci were retained for population genetic
analysis.
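The pass rates reported above follow directly from the probe counts. The short sketch below only re-derives the percentages from the counts given in the text; the variable names are illustrative and are not part of the GenomeStudio workflow.

```python
# Counts taken from the text above.
single_total, single_pass = 1208, 910            # SNPs genotyped with one probe
double_total = 1750                              # SNPs genotyped with two probes
double_both, double_one, double_none = 1118, 470, 162

def pct(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)

total_snps = single_total + double_total              # all SNPs genotyped
passed_any = single_pass + double_both + double_one   # passed for >= 1 probe

single_rate = pct(single_pass, single_total)               # single-probe SNPs
double_rate = pct(double_both + double_one, double_total)  # two-probe SNPs
overall_rate = pct(passed_any, total_snps)

print(total_snps, passed_any, single_rate, double_rate, overall_rate)
```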
After probes with low call rates were eliminated, the average call rate
for probes was 99.0% and only two individuals (0.13%) had probe call
rates below 70%. A total of 1475 individuals (96.0%) had probe call rates meeting
our cutoff (85%), 1447 (94.2%) had call rates of at least 99%, and
139 (9.0%) had call rates of 100%. Reproducibility of genotypes at
each locus for individual crabs was nearly perfect. Of the 17
individuals that were genotyped twice, data for one were discarded
because the call rate was below 84%. Among the remaining 16
individuals, identical SNP genotypes (across all loci) were called for
both replicates of six individuals, and a genotype was called at one or
two loci in one replicate but not the other for 10 individuals. There
were no instances in which different genotypes were called for
replicates of the same individual. After crabs with low call rates or
anomalous heterozygosity were removed from the data, genotype data were
available for 1434 crabs.
3.2 | Genotyping and Non-target Species
Low call rates for non-target species are expected because of mismatches
with probes designed for the target species (C. sapidus).
Consistent with this expectation, the probe call rate for Callinectes similis (0513LUMCsimJ1) was only 77%. A megalopa
collected near Galveston, Texas (GAL815_M45) with a probe call rate of
87% was subsequently identified by its mitochondrial 16S sequence as C. rathbunae (Fig. 2). Seven other
megalopae from Louisiana (14RWR7516, FWC15M20, FWC15M21, FWC15M22,
FWC15M23, FWC15M25, and FWC15M27) had probe call rates and SNP
heterozygosities that clustered tightly with those of GAL815_M45 (Fig.
2). Comparisons based on which probes were unscorable and which
allele was present at each homozygous SNP confirmed that these megalopae
were genetically very similar to GAL815_M45
(Table S2). Between 93.3% and 96.5%
of the 255 probes that were unscorable in GAL815_M45 were also
unscorable in these megalopae, and between 97.8% and 98.5% of the 2220
SNP loci that were homozygous in GAL815_M45 were homozygous for the
same allele in these megalopae. In contrast, the individual of C.
similis (0513LUMCsimJ1) matched GAL815_M45 for only 46.6% of
unscorable probes and 64.7% of alleles at homozygous loci.
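The two similarity measures used in this comparison can be sketched as follows. This is a minimal illustration, not the code used for Table S2: it assumes each individual is stored as a dictionary mapping a locus name to a two-character genotype string (or `None` when unscorable), treats one probe per locus for simplicity, and counts a locus missing from the second individual as unscorable. The function name and toy genotypes are hypothetical.

```python
def similarity_to_reference(reference, other):
    """Return two fractions describing how closely `other` matches `reference`.

    1. Fraction of loci unscorable in the reference that are also
       unscorable in the other individual.
    2. Fraction of loci homozygous in the reference that are homozygous
       for the same allele in the other individual.
    """
    unscorable = [loc for loc, g in reference.items() if g is None]
    homozygous = [loc for loc, g in reference.items()
                  if g is not None and g[0] == g[1]]

    shared_unscorable = sum(1 for loc in unscorable if other.get(loc) is None)
    same_homozygote = sum(1 for loc in homozygous
                          if other.get(loc) == reference[loc])

    def frac(n, d):
        return n / d if d else float("nan")

    return (frac(shared_unscorable, len(unscorable)),
            frac(same_homozygote, len(homozygous)))

# Toy genotypes (made up for illustration only).
ref = {"s1": None, "s2": None, "s3": "AA", "s4": "BB", "s5": "AB"}
oth = {"s1": None, "s2": "AA", "s3": "AA", "s4": "AB", "s5": "AB"}
print(similarity_to_reference(ref, oth))
```

With real data, the reference would be GAL815_M45 and the denominators would be its 255 unscorable probes and 2220 homozygous loci.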
Five specimens from Venezuela had low heterozygosities (near 0.1) and
probe call rates between 88.2% and 99.5% (Fig. 2). They were similar
in which alleles were present at homozygous loci, but not in which
probes were unscorable, and they did not share high percentages of either
unscorable or homozygous loci with representatives of C. rathbunae or C. similis (Table S2). These Venezuelan
specimens thus appear to represent a population lacking polymorphism at
loci that are polymorphic in North American populations. Probe call
rates were not exceptionally low for three of these individuals, while
low probe call rates for the other two, which were museum specimens, may
have been a consequence of inadequate preservation.
A single specimen from Louisiana (0813LUMCSapM017) had a low proportion
of scorable loci but a uniquely high proportion of heterozygous loci
(Fig. 2). We devised a test to determine if this individual could be a
hybrid between C. sapidus and a crab with a genotype matching any
of the low-heterozygosity individuals described above. We compared the
genotype at each of the 977
homozygous loci in 0813LUMCSapM017 with the genotype for the same locus
in each potential parental genotype (GAL815_M45, 0513LUMCsimJ1,
0000EMVCsapA1, 501EMVCsapA7, 0901ZLVCsapU1, 0901ZLVCsapU2, and
1099ZLVCsapU3). Following Mendelian principles, we should not observe
cases in which a parental genotype is homozygous for an allele not
present in its hybrid offspring. By this criterion, none of the
genotypes we tested could be the parent of 0813LUMCSapM017
(Table S3). The number of loci that
failed to meet this criterion ranged from 104 (9.8%) to 210 (18.8%).
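The exclusion criterion described above can be expressed as a simple count. The sketch below is an assumed reconstruction of the logic, not the code behind Table S3: genotypes are two-character strings keyed by locus name, `None` marks an unscorable locus, and only loci that are homozygous in the putative hybrid and scored in the candidate parent are compared. The function name and toy data are hypothetical.

```python
def count_exclusions(offspring, parent):
    """Count loci at which `parent` cannot be a parent of `offspring`.

    A locus is an exclusion when the offspring is homozygous for one
    allele and the candidate parent is homozygous for the other allele.
    Returns (number of exclusions, number of loci compared).
    """
    compared = 0
    excluded = 0
    for locus, child in offspring.items():
        par = parent.get(locus)
        if child is None or par is None:
            continue                      # unscorable in either individual
        if child[0] != child[1]:
            continue                      # only homozygous offspring loci are tested
        compared += 1
        if par[0] == par[1] and par[0] != child[0]:
            excluded += 1                 # parent lacks the offspring's only allele
    return excluded, compared

# Toy genotypes (made up for illustration only).
hybrid = {"s1": "AA", "s2": "BB", "s3": "AB", "s4": "AA", "s5": None}
candidate = {"s1": "BB", "s2": "BB", "s3": "AA", "s4": "AB", "s5": "AA"}
print(count_exclusions(hybrid, candidate))
```

Under strict Mendelian inheritance and error-free genotyping, a true parent would produce zero exclusions, which is why counts in the hundreds rule out each candidate.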
3.4 | Illumina Infinium Genotyping versus Sequencing
Genotypes were successfully called by the Infinium assay for all 7 SNPs
among the 176 blue crabs for which sequence data were available from
Yednock and Neigel (2014).
Sequences were available for ATP/ADP translocase (ant)
from 149 crabs and for both ATP-synthase subunit 9 (atps)
and trehalose 6-phosphate synthase (tps) from 167 crabs.
One of the SNPs in tps was monomorphic in this set of individuals
and is not considered further. Overall, 94% of the 966 SNP genotypes
determined by Infinium matched the sequences. However, the percent in
agreement ranged from 76.5% for a SNP in ant to 100% for the
two SNPs in tps. Yednock and Neigel (2014)
reported a significant heterozygote deficiency for the ant locus
and suggested this was due to null alleles. This interpretation is
supported by Infinium genotyping: all 53 of the discrepancies between ant sequences and Infinium genotypes were instances of SNPs
called as heterozygotes by Infinium appearing as homozygotes in
sequences. Furthermore, departures from expected heterozygosity were
large and highly significant for both SNPs in ant when genotyped
by sequencing (F values of 0.404 and 0.389, p = 0.00 for
both SNPs) but were not when genotyped by Infinium (F values of
0.096 and -0.110). For the two SNPs in atps, there were a total
of five discrepancies between sequences and Infinium genotypes, and all
but one were instances of genotypes called as heterozygotes by the
Infinium assay appearing as homozygotes in sequences.
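The overall concordance figure can be re-derived from the counts in this section. The sketch below assumes two compared SNPs per gene, which is inferred from the text (seven assayed SNPs, one monomorphic tps SNP excluded, and references to two SNPs each in ant, atps and tps); the dictionary names are illustrative.

```python
# Crabs with sequence data for each gene, from the text above.
crabs_sequenced = {"ant": 149, "atps": 167, "tps": 167}
# Assumption: two polymorphic SNPs compared per gene (7 assayed minus 1 monomorphic).
snps_compared = {"ant": 2, "atps": 2, "tps": 2}
# Discrepancies between Infinium and sequence genotypes, from the text above.
discrepancies = {"ant": 53, "atps": 5, "tps": 0}

total_genotypes = sum(crabs_sequenced[g] * snps_compared[g]
                      for g in crabs_sequenced)
total_discrepancies = sum(discrepancies.values())
percent_matching = round(
    100 * (total_genotypes - total_discrepancies) / total_genotypes, 1)

print(total_genotypes, total_discrepancies, percent_matching)
```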
3.5 | Phasing Haplotypes at Multi-SNP Loci
Two or more SNPs were successfully genotyped for 407 loci. After PHASE
runs were completed, 211 (52%) of the loci met our 0.9/90% criterion
(haplotypes inferred with at least 0.9 probability in at least 90% of
individuals). Longer runs sometimes led to convergence among replicates
but still failed to identify haplotype configurations that met the
0.9/90% criterion. For loci that did not meet this criterion, we removed
SNPs (collapsing the set of haplotypes) until the criterion was
satisfied for the haplotypes distinguished by the remaining SNPs. From
an initial total of 1484 SNPs that were phased, data for 1095 SNPs
(74.8%) were retained and distinguished a total of 2196 haplotypes.
The conversion of 1484 SNPs into 2196 haplotypes increased the total
number of degrees of freedom from 1077 to 1789, a 66% increase.
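The 0.9/90% criterion and the degrees-of-freedom figures can be illustrated with a short sketch. It makes two assumptions that are not stated in the text: the input is a hypothetical list holding, for each individual, the probability of its best haplotype reconstruction (parsing of PHASE output is not shown), and degrees of freedom are taken as the number of alleles or haplotypes minus the number of loci, which reproduces the reported values of 1077 and 1789.

```python
def meets_criterion(best_probabilities, min_probability=0.9, min_fraction=0.9):
    """True if at least `min_fraction` of individuals have a best
    haplotype-reconstruction probability of at least `min_probability`."""
    if not best_probabilities:
        return False
    confident = sum(1 for p in best_probabilities if p >= min_probability)
    return confident / len(best_probabilities) >= min_fraction

# Toy example: 9 of 10 individuals are phased with high confidence.
example = [0.99] * 9 + [0.60]
print(meets_criterion(example))

# Degrees of freedom, assuming df = (alleles or haplotypes) - (number of loci).
loci, snps_phased, haplotypes = 407, 1484, 2196
df_snps = snps_phased - loci
df_haplotypes = haplotypes - loci
percent_increase = round(100 * (df_haplotypes - df_snps) / df_snps)
print(df_snps, df_haplotypes, percent_increase)
```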
Genotype and specimen collection data used for analyses in this paper
are available on GRIIDC (Neigel, 2017).
3.6 | Linkage Disequilibrium
With all individuals pooled and all polymorphic loci tested for linkage
disequilibrium (LD), there was an unexpectedly high proportion of p values near zero (Fig. S1):
3,269 tests out of a total of 911,925 had p values below 0.001,
3.6 times more than the 912 expected under the null hypothesis of no LD.
We retested the 10 loci with the lowest p values and the 10 loci
with the highest p values in subsamples representing each
combination of sampling location and year for the main sampling
locations on the Louisiana coast (Table S1). The two resulting
distributions were similar and both lacked an excess of low p values (Fig. S2), suggesting that LD in the pooled samples could be a
Wahlund effect (Sinnock, 1975; Waples, 2014). Peaks at p = 0 were also investigated and were found to
be associated with contingency tables that had cells with low counts.
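The expected number of low p values under the null hypothesis follows from the number of tests and the threshold. The sketch below re-derives the figures quoted above; the observation that 911,925 equals the number of unordered pairs among 1351 loci is an inference from the arithmetic and is not stated in the text.

```python
from math import comb

n_tests = 911_925      # pairwise LD tests, from the text
n_low = 3_269          # tests with p < 0.001, from the text
threshold = 0.001

# Under the null hypothesis p values are uniform, so the expected
# number below the threshold is simply n_tests * threshold.
expected_low = n_tests * threshold
excess_ratio = n_low / expected_low

# Inference (not stated in the text): the test count matches all
# unordered pairs among 1351 loci.
pairs_among_1351_loci = comb(1351, 2)

print(round(expected_low), round(excess_ratio, 1), pairs_among_1351_loci)
```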
3.7 | Statistical Power to Detect Structure
POWSIM was used to estimate the statistical power of the χ2 and Fisher’s exact tests that are used by
CHIFISH to detect genetic differentiation among populations. For samples
from the five main locations in Louisiana (with all life stages
combined), Type I error rates for the χ2 test
and Fisher’s exact test were 0.037 and 0.077, respectively, with all loci
used, 0.055 for χ2 and 0.078 for Fisher’s with
haplotypes of multi-SNP loci, and 0.058 for χ2 and 0.091 for Fisher’s with single-SNP loci. Using all markers,
statistical power to detect even weakly differentiated populations
(FST = 0.0001) was considerable. The proportion
of replicate simulation runs in which differentiation was significant
with α set to 0.05 was 0.90 for χ2 and 0.92 for
Fisher’s exact test; with multi-SNP loci the proportions were 0.70 for χ2 and 0.72 for Fisher’s, and with single-SNP
loci the proportions were 0.65 for χ2 and 0.74
for Fisher’s. With the level of differentiation reduced to FST = 0.00005, the power estimates were 0.54 for χ2 and 0.47 for Fisher’s.
3.8 | Large-Scale Genetic Population Structure
Overall geographic differentiation among 15 locations in the Atlantic
and GOM was slight but highly significant (FST =
0.0002; p = 0.00000 for χ2, 0.00017 for
Fisher’s). Estimates of pairwise-FST between
locations were small (-0.0016 to 0.0054, mean 0.00065), with the highest
between the samples from JAC and RRC
(Table S4). Seventeen of the 105 pairwise
comparisons were significantly different from zero with the False
Discovery Rate (FDR) set to 0.05, and all were comparisons with samples
from either JAC or RRC. Following a Bonferroni correction for multiple
comparisons, 7 of the 105 comparisons were significant, and all were
comparisons with samples from JAC. Without the JAC sample, overall
geographic differentiation was not significant
(FST = 0.0001; p = 0.32 for
χ2, 0.071 for Fisher’s).
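The difference between the two multiple-comparison corrections used above can be illustrated with a short sketch. The text does not say which FDR procedure was applied, so the Benjamini-Hochberg step-up procedure is assumed here purely for illustration, and the p values are made up rather than taken from Table S4.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject a test only if its p value is at most alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]

def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure (assumed FDR method).

    Find the largest rank k such that the k-th smallest p value is at
    most alpha * k / m, then reject the k smallest p values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    largest_rank = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= alpha * rank / m:
            largest_rank = rank
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= largest_rank:
            rejected[i] = True
    return rejected

# Made-up p values for five hypothetical pairwise comparisons.
p = [0.001, 0.012, 0.025, 0.045, 0.5]
print(sum(bonferroni(p)), sum(benjamini_hochberg(p)))
```

Because FDR control is less conservative than the Bonferroni correction, it typically flags more comparisons as significant, consistent with the 17 versus 7 significant comparisons reported above.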
FST estimated among sampling years, with
locations and life stages pooled, was small but statistically
significant (FST = 0.0001; p = 0.012 for
χ2, 0.031 for
Fisher’s). This raises the question of whether apparent genetic
differentiation among locations could be caused by differences among the
years in which locations were sampled. However, for samples collected in
2010, which included 13 of the 17 locations, this does not appear to be
the case. The estimated overall FST among
locations in 2010 was 0.0004 (higher than the overall estimate for FST among locations pooled across years) and
highly significant (p = 0.00001 for χ2, 0.0043
for Fisher’s).
3.9 | Patterns of Differentiation on Louisiana Coast
Blue crab megalopae, juveniles and adults were sampled from five
locations on the coast of Louisiana in six different years. The
estimated overall FST among the five locations
(life stages and years pooled) was 0.0000. FST among years (life stages and locations pooled) was estimated at 0.0001
and was statistically significant (p = 0.025 for
χ2, 0.016 for Fisher’s). In pairwise FST estimates among the six sampling years
(Table S5), two of the 15 comparisons
were significant after control of the FDR at 0.05: 2010 vs. 2016
(FST = 0.0003, p = 0.0014) and 2013 vs.
2015 (FST = 0.0002, p = 0.0059).
Sources of heterogeneity among collection years in Louisiana were
identified by testing subsets of samples. First, heterogeneity among
years was tested separately at each of the five locations; none of these
were significant. Second, heterogeneity among years was tested
separately for each of the three life-stages. Heterogeneity was
significant among years for juveniles (FST =
0.0003; p = 0.0059 for χ2, 0.013 for
Fisher’s), but not for adults (FST = 0.0002),
and was absent for megalopae (FST = -0.0001). For
juveniles (Table S6), the only pairwise between-year FST estimate that was significantly greater than zero was between
2011 and 2013 (FST = 0.0009; p = 0.0007).
Settling megalopae were collected from four locations (FWC, GIL, LUM and
RWR) in at least five different years for a total of 21 collections. We
found no evidence for heterogeneity among collections of megalopae
(FST = -0.0004).
3.10 | Coancestry among Blue Crabs
Initial estimates of r, the coefficient of relationship, between
pairs of individuals were based on allele frequencies for all
individuals pooled. The distribution of estimates between individuals
from the same location was similar to the distribution of estimates
between individuals from different locations
(Fig. S3A), but the within-location
distribution had a longer upper tail with some high values
(Fig. S3B). There were five pairs with
estimates of r above 1/8 (0.125), the degree of relatedness
expected for first cousins, although their 95% confidence limits
included zero (Table 2). In each of these pairs, both
individuals were collected from the same location in the same year and
were at the same life stage. This could suggest that closely related
individuals settled in cohorts at the same locations. Alternatively, our
estimates of relatedness could have been upwardly biased by being based
on allele frequencies for individuals pooled from different locations,
sampling times and life stages. We investigated this potential bias by
estimating r with allele frequencies from more restricted
samples: the pair’s source location, the pair’s combination of location
and sampling year, and the pair’s combination of location, year and life
stage. (Only one additional estimate was made for pairs from the JAC
sample, which consisted entirely of juveniles collected in 2010.) As
shown in Table 3, estimates of r decreased with each restriction
of the samples used for allele frequency estimates. This effect is not
confined to the upper tails of the distributions of r:
estimates of r based on allele frequencies at
specific locations tended to be lower than estimates based on allele
frequencies for all locations combined
(Fig. S4).
Relatedness was estimated for all pairs within each life stage
(megalopae, juveniles and adults) from the five intensively sampled
Louisiana locations. Allele frequencies were estimated separately for
each life stage with collection locations and years pooled. The
distributions of estimates of coefficients of relatedness (r)
were similar among life stages (Fig. S5A), and no estimates for pairs of
megalopae were above 1/8 (Fig. S5B, Table 4). Thus, we found no evidence
for reproductive sweepstakes events, such as full or half-siblings in
the same cohort of megalopae or year class of juveniles.