2.7 Genotype-environment associations: Seascape Genomics
To assess how environmental variables shaped the genetic structure of
North Atlantic harbour porpoises, we developed a seascape genomics
approach. Specifically, we studied genotype-environment associations
(GEA) to identify putative SNPs underlying local adaptation by carrying
out a redundancy analysis (RDA) with the R package vegan(Oksanen, 2011). RDA is a multivariate method that finds linear
dependencies between response (genotypes) and explanatory variables
(environmental predictors). Previous authors (Forester et al., 2018,
Capblancq et al., 2018) have found that multivariate GEA analyses,
especially constrained ordination approaches such as RDA, detect more
effectively multilocus selection than univariate methods and have a
superior combination of low false-positive and high true-positive rates
(Grummer et al., 2019; Capblancq
& Forester 2021). RDA was carried out at the individual level using two
datasets, with and without BLS. To control for spatial autocorrelation
and other demographic processes, pairwise oceanic distances were
calculated from the coordinates of the samples with the R packagemarmap (Pante & Simon-Bouhet, 2013). Subsequently, pairwise
oceanic distances were transformed to distance-based Moran’s eigenvector
maps (dbMEMs ) with the R package adespatial (Dray et al.,
2012) and used as the space variable in the RDA.
Five environmental variables previously identified as significantly
associated with genomic variation of Common (Delphinus delphis )
(Barceló et al., 2022) and Bottlenose dolphins (Tursiops
truncatus ) (Pratt et al., 2022) were selected as potential predictors
of harbour porpoise genomic variation: sea surface temperature (SST),
sea surface salinity (SSS), sea current velocity (SCV), sea
chlorophyll-A concentration (SCA), and sea primary productivity (SPP).
For each of the variables, the annual maximum, mean, minimum and range
values were downloaded from the database BioOracle (v2.2)(Tyberghein et al., 2012, Assis et al., 2018). SCVmax and SPPmax were
not available in this database, yielding altogether 18 variables. As
genomic input we used a set of unlinked called genotypes, identified byPlink (v1.9) (Purcell et al., 2007) with a window size of 20kb, a
step size of 10kb and a r2 threshold of 0.1.
To maximize the genetic variance explained by our set of environmental
predictors we performed a forward selection with the functionforward.sel of the R package adespatial . The forward
selection was carried out for the environmental variables anddbMEMs separately, and only variables explaining a significant
proportion (p<0.05) of the genomic variation were retained. To
control for multicollinearity, only predictors with a highly
conservative variance inflation factor (VIF)<3 were retained.
An RDA-based variance partitioning (Capblancq & Forester, 2021) was
performed to estimate the independent contribution of the environmental
and spatial variables. First, we ran an unconstrained RDA with the
retained environmental and spatial variables as predictors to find the
variance explained by the full model. Then, we calculated the variance
explained by the environmental variables once the influence of spatial
variables had been removed by applying a constrained pRDA. Finally, we
checked the significance of the RDAs and pRDAs as well as the
significance of the environmental variables by an analysis of variance
(ANOVA) with 1,000 permutations.
Candidate loci under selection were identified by applying a 3 standard
deviation cut-off (p=0.0027) on the SNPs loading scores (Forester et
al., 2018) from the two first redundancy axis and we determined to which
environmental variable each candidate SNPs is most associated. To
further control for false positives, we carried out an additional
selection genome scan based on individual genotypes with the R packagePcadapt (Privé et al., 2020). We used four principal components
(K), identified by computing a scree plot and choosing the K that
minimized the genomic inflation factor (GIF). Then we transformed the
p-values in q-values with the R package qvalue and applied a
false discovery rate of 0.1 to control for false positives.
To identify candidate genes under local adaptation, we used only the
candidate SNPs detected by all three methods RDA-pRDA-Pcadapt on the
dataset without BLS and that were associated to salinity (271). We
extracted 300 bp of flanking sequences for each candidate SNP resulting
in 601 bp long sequences, each containing a single SNP. We then
performed a basic local alignment with BLASTN using the
nucleotide database of NCBI with an e-value of 1x10-3.
After extracting hits per query, we identified gene function and
associated biological function with the PubMed and GeneCards (Stelzer et
al., 2016) databases.