Materials and Methods
Data acquisition
We reanalyzed previously collected morphological, ecological and genomic
data of wild-caught stickleback from British Columbia, Canada, obtained
from several independent datasets. The genotype data for the genome-wide
association (GWA) mapping of lake size was generated using ddRAD
sequencing (Peterson, Weber, Kay, Fisher, & Hoekstra, 2012) for 33
solitary stickleback populations from Vancouver Island. For 21 of these
solitary populations, we obtained data on the proportion of benthic diet
and gill raker numbers (a key trophic trait). Both of these datasets are
from Bolnick and Ballare (2020); the genomic dataset was produced by
Stuart et al. (2017). Genotype data used for FSTanalyses from three sympatric benthic-limnetic species pairs (Paxton,
Priest and Little Quarry Lakes) were generated by Samuk et al. (2017)
using the genotyping-by-sequencing method. Diet and gill raker number
data for the sympatric species pairs from Paxton and Priest Lakes were
obtained from Schluter and McPhail (1992). QTL data on different
sympatric species pairs were obtained from several previous studies
(Arnegard et al., 2014; Conte et al., 2015; Malek, Boughman, Dworkin, &
Peichel, 2012).
Morphological analyses
Linear regression models were used to test whether lake size (log
surface area) affects trophic ecology (proportion of benthic diet) or
morphology (gill raker number), as previously shown for allopatric
populations (Bolnick & Ballare, 2020). Linear regression was done
twice, once with only the 21 solitary populations and once including the
sympatric benthic-limnetic species pairs. We further determined
residuals to identify populations that deviate strongest from the linear
regression models (Fig. S1). The proportion of benthic diet consumed by
stickleback has been shown to be affected by lake size across allopatric
populations (Bolnick & Ballare, 2020). In the lakes harboring sympatric
benthic-limnetic species pairs, that show pronounced specialization in
diet, the association between lake size and diet is assumed to be broken
up leading to extreme residuals for some of these sympatric species
(Fig. S1).
Genome-wide differentiation analyses and quantification of
parallelism
To investigate the genomic architecture of divergence along the
benthic-limnetic axis, we determined patterns of genome-wide
differentiation using Weir and Cockerham’s FST (Weir &
Cockerham, 1984) for eight solitary populations and three
benthic-limnetic species pairs. The eight solitary populations were
selected to reflect strong differences in trophic ecology and included
populations from the four smallest (Little Goose, Little Mud, Muskeg,
Ormund) and the three largest (Lower Campbell, Upper Campbell, Stella)
lakes. Additionally, the population from Amor Lake was included in the
group of large lakes (although it is only the sixth largest lake)
because it had the lowest proportion of benthic diet (8.9%). To allow
comparisons across datasets, we calculated mean FSTvalues for 50-kb windows for windows with a minimum of three data points
and all analyses were based on these windows. A 50-kb window was
classified as an outlier if the FST value was in the top
5% of the genome-wide FST distribution. Spearman’s rank
correlation coefficients were used to quantify the extent of shared
genome-wide differentiation across pairs of populations that diverged
along the benthic-limnetic axis.
We sought to detect genomic regions that are repeatedly differentiated
either across benthic-limnetic species pairs or across solitary
populations from small and large lakes. Levels of repeatability were
estimated as in Rennison et al. (2019). Briefly, repeatability was
calculated for each window as the proportion of population comparisons
with data for which this window was scored as an FSToutlier. Statistical significance of repeatability for the solitary
comparisons was determined at the 0.05 level by comparing empirical
values against a null distribution produced by 10,000 permutations with
subsampling to account for non-independence of some comparisons.
P-values were corrected for multiple testing using the Benjamini &
Hochberg method (Benjamini & Hochberg, 1995).
To examine the overlap between FST outliers for
sympatric species pairs and lake size GWA mapping candidates (see next
paragraph for more information) for the 33 solitary populations, we
normalized benthic-limnetic FST values and took the mean
across species pairs. As we were interested in FSToutlier regions consistently associated with benthic-limnetic
differentiation, only FST windows with data for at least
two species pairs were used. Windowed FST values were
normalized using the following equation:
x – mean(x)/SD(x),
where x is a vector of all windowed FST values for each
population pair. Normalized FST values for each window
were then averaged across the three lakes.
Genome-wide association
mapping
The GWA mapping for lake size (the proxy for dietary niche) was based on
175,350 single nucleotide polymorphisms (SNPs) obtained from ddRAD
sequencing of 33 solitary populations with 12 fish per population, as
detailed in Bolnick and Ballare (2020). For more information on library
preparation and bioinformatics pipeline protocols, see Stuart et al.
(Stuart et al., 2017). To test for associations, a binominal linear
model with allele frequency and log lake size was run for each SNP with
a logit link function and watershed as a covariate, see Bolnick and
Ballare (2020) for further details. The p-values resulting from this
analysis were used to identify candidate genomic regions (50-kb windows)
associated with lake size; a window was considered a GWA candidate if it
contained loci with a p-value falling below the 0.05 cutoff.
Overlap of candidate regions between
datasets
All analyses that involved comparisons of FST and GWA
mapping data were based on 4,985 50-kb windows distributed across all 21
chromosomes that were included in both datasets. We created four subsets
of the combined benthic-limnetic FST and GWA mapping
dataset. These subsets contained windows that were either
non-significant in both datasets, FST outliers in the
sympatric benthic-limnetic data (95th percentile),
significantly associated with lake size across solitary lake populations
(P < 0.05) or double outliers in which they were both
an FST outlier and significantly associated with lake
size. While identifying windows that were FST outliers
and significantly associated with lake size allowed us to obtain a set
of candidate regions that might be important during adaptation to
benthic and limnetic habitats, we want to point out the differences
between these two analyses. The GWA mapping allowed us to detect
specific alleles significantly associated with lake size, it provides
information on the directionality of shifts in allele frequencies across
many populations. In contrast, the FST analysis lacks
such directionality and rather tells us whether the same genomic loci
are repeatedly differentiated but not which alleles are more common in
which populations. Thus, these are independent lines of evidence
suggesting a given genomic region contributes to adaptation along the
benthic-limnetic axis.
We tested whether windows of these subsets (except for non-significant
windows) were significantly enriched on certain chromosomes. This was
done by running a permutation with 10,000 iterations. For each
iteration, the number of significant windows in each subset were
randomly shuffled across different chromosomes (accounting for
chromosome size). Empirical frequencies of significant windows for each
chromosome were then compared to the null distribution obtained by
permutation and significant enrichment was determined at the 0.05 level.
Chi-squared tests of independence were used to determine whether there
was a significant association between FST outliers and
GWA candidate windows. Spearman’s rank correlation coefficients were
calculated to test whether the GWA p-values are correlated with
normalized benthic-limnetic FST values. To obtain
further information on the phenotypic traits associated with candidate
regions, we identified published quantitative trait loci (QTL) whose
peaks overlapped with windows differentiated along the benthic-limnetic
axis. All statistical analyses were performed in R v3.5.1 (R Core Team,
2016).