3.3 Seascape Genomics
Of the 18 environmental predictors used in the GEAs, forward selection identified five variables significantly associated (p = 0.001) with genomic variation (Table S3): mean SST, mean SSS, minimum SCV, minimum SCA, and minimum SPP. These five variables describe a heterogeneous seascape in the North Atlantic (Figures S6-S10), most notably a pronounced salinity gradient in the Baltic region (Figure S7). Forward selection also identified three dbMEMs that explained a significant proportion (p < 0.05) of the genomic variation; these were used as the spatial variables. After checking for multicollinearity (Figure S11) and assessing the variance inflation factor (VIF), all five environmental variables were retained because they showed |r| < 0.7 and VIF < 3. In the dataset including BLS, we removed one dbMEM because of its high VIF (5.8). The final RDA model therefore comprised five environmental variables and two dbMEMs for the dataset including BLS samples, and five environmental variables and three dbMEMs for the dataset excluding BLS (Table S2).
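The collinearity screen described above can be illustrated with a minimal sketch. It is not the code used in this study: the function names (pearson, vif, screen_predictors) and the input layout are hypothetical, and only the thresholds (|r| < 0.7, VIF < 3) come from the text. The sketch relies on the standard identity that the VIF of each predictor equals the corresponding diagonal element of the inverse of the predictors' correlation matrix.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif(columns):
    """VIF per predictor: diagonal of the inverse correlation matrix.

    `columns` is a hypothetical layout: {predictor name: list of site values}.
    """
    names = list(columns)
    corr = [[1.0 if a == b else pearson(columns[a], columns[b]) for b in names]
            for a in names]
    inverse = invert(corr)
    return {name: inverse[i][i] for i, name in enumerate(names)}


def screen_predictors(columns, r_max=0.7, vif_max=3.0):
    """Flag predictor pairs with |r| >= r_max and predictors with VIF >= vif_max."""
    high_pairs = {}
    for a, b in combinations(columns, 2):
        r = pearson(columns[a], columns[b])
        if abs(r) >= r_max:
            high_pairs[(a, b)] = r
    inflated = {name: v for name, v in vif(columns).items() if v >= vif_max}
    return high_pairs, inflated
```

Under these assumptions, a predictor set passes the screen when both returned dictionaries are empty; a dbMEM with VIF = 5.8, as reported for the dataset including BLS, would appear in the second dictionary.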
In the dataset without BLS, the overall RDA model was significant (p = 0.001), with the environmental variables explaining ~8% of the variation and the spatial variables ~5%. In the RDA, SSS and SST (p < 0.001) and SCV (p = 0.023) were significant, whereas in the pRDA only SSS (p < 0.001) and SCV (p = 0.018) were significant. Plotting both the RDA and the pRDA (Figure 5, Figure S12) showed that conditioning on the spatial variables (pRDA) altered the pattern of the biplots, making the first axis less dominant. In the dataset without BLS, RDA1 (Figure 5A) explained 28% of the variance, whereas pRDA1 (Figure 5B) explained 22.7%. The RDA biplots show how the genomic response to the different environmental variables varies among sampling locations in the North Atlantic. Both RDA1 and pRDA1 separated the Baltic samples from the rest, mostly based on SSS, while RDA2 and pRDA2 were moderately driven by SST and SCV (Figure 5). In the model including BLS porpoises, all five environmental variables were significant and explained 8.7% of the variance, while the spatial variables explained 4.3% (Table S4). Both pRDA1 and RDA1 (Figure S12) separated BLS from the rest based on SST; pRDA2 and RDA2 (~22% of variance explained) separated the BES and PBS populations from the rest based on SSS.
The PCA loadings of Pcadapt (Figure S13C) showed that most p-values followed a uniform distribution, with an excess of small p-values indicating the presence of outliers. Using the dataset without BLS, Pcadapt identified 18,955 candidate SNPs, while the pRDA and RDA identified 9,272 and 7,079 candidate SNPs, respectively (Table 1). A set of 952 candidate SNPs was shared among the pRDA, RDA, and Pcadapt. The numbers of candidate SNPs inferred to be under selection in the dataset with BLS are reported in Table S4. We successfully mapped and annotated 202 of the 271 candidate SNPs associated with salinity, of which 106 were annotated to known genes. While 48 candidate SNPs had hits to only one gene, the other 58 had equally good hits (very low e-value and high bit score) to multiple annotated genes (Table S5); the candidate genes for these 58 SNPs must therefore be interpreted with caution.
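The overlap among candidate sets reported above amounts to a set intersection across detection methods. The following minimal sketch shows that bookkeeping. It is not the code used in this study: the function name candidate_overlaps, the method labels, and the SNP identifiers are hypothetical placeholders, not data from this study.

```python
from itertools import combinations


def candidate_overlaps(candidates):
    """Return the SNPs shared by every combination of two or more methods.

    `candidates` is a hypothetical layout: {method name: iterable of SNP IDs}.
    Keys of the result are frozensets of method names, so lookup does not
    depend on the order in which methods are listed.
    """
    overlaps = {}
    methods = sorted(candidates)
    for size in range(2, len(methods) + 1):
        for combo in combinations(methods, size):
            shared = set.intersection(*(set(candidates[m]) for m in combo))
            overlaps[frozenset(combo)] = shared
    return overlaps


# Placeholder identifiers for illustration only.
example = {
    "RDA": {"snp_1", "snp_2", "snp_3"},
    "pRDA": {"snp_2", "snp_3", "snp_4"},
    "pcadapt": {"snp_3", "snp_5"},
}
shared_by_all = candidate_overlaps(example)[frozenset(example)]
```

Applied to the real candidate lists, the entry keyed by all three methods would correspond to the set of 952 shared SNPs.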