Genetic analyses
To determine whether genetic differences existed among individuals expressing different feeding strategies, the 79 lake trout classified by fatty acid composition into four groups were genotyped to assess genetic variation and structure within and among groups. To obtain a sample size sufficient for a genetic comparison of giants with the other dietary groups, 22 additional individuals selected non-randomly by size (≥ 900 mm; giant sub-set) from the 2002-2015 collections were added to the giants processed for fatty acids, for a total of 39 giants. Lake trout DNA was extracted from pectoral fin tissue preserved in ethanol using DNeasy extraction kits (Qiagen Inc., Valencia, CA) following manufacturer protocols. Piscivorous groups were assayed using a suite of 23 putatively neutral microsatellite markers amplified in four multiplexes previously described in Harris et al. (2015). Amplified microsatellite fragments were analyzed using an automated sequencer (ABI 3130xl Genetic Analyzer; Applied Biosystems, Foster City, CA). The LIZ 600 size standard was incorporated for allele base-size determination. All genotypes were scored using GeneMapper software ver. 4.0 (Applied Biosystems) and then manually inspected to ensure accuracy.
The program MICROCHECKER ver. 2.2.0.3 (Van Oosterhout et al., 2004) was used to identify genotyping errors, specifically null alleles and large allele dropout. Observed and expected heterozygosity (HO and HE) were calculated using GENEPOP ver. 4.2 (Rousset, 2008). The program HP-RARE ver. 1.1 (Kalinowski, 2005) was used to determine the number of alleles, allelic richness, and private allelic richness for each group, sampling 22 genes in each sample. Tests of departure from Hardy-Weinberg equilibrium and genotypic linkage disequilibrium within each sample (i.e., for each fatty acid grouping and the giant sub-set) were conducted in GENEPOP using default values for both. Results from all tests were compared with an adjusted alpha (α = 0.05) following the False Discovery Rate procedure (Narum, 2006).
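As an illustration of two quantities named above, the following minimal sketch computes observed and expected heterozygosity for a single locus and applies a False Discovery Rate correction to a set of p-values. It is not the GENEPOP implementation. The use of Nei's unbiased estimator for expected heterozygosity and of the Benjamini-Yekutieli step-up variant of the FDR procedure are assumptions made here for illustration, and the function names and toy data are hypothetical.

```python
from collections import Counter


def heterozygosity(genotypes):
    """Return (H_O, H_E) for one locus.

    genotypes: list of (allele_a, allele_b) tuples, one per diploid individual.
    H_E uses Nei's unbiased estimator (an assumption for this sketch):
        H_E = 2n / (2n - 1) * (1 - sum(p_i ** 2))
    """
    n = len(genotypes)
    # Observed heterozygosity: fraction of individuals with two different alleles.
    h_obs = sum(a != b for a, b in genotypes) / n
    # Allele frequencies from the 2n gene copies sampled.
    counts = Counter(allele for pair in genotypes for allele in pair)
    total = 2 * n
    sum_p2 = sum((c / total) ** 2 for c in counts.values())
    h_exp = total / (total - 1) * (1 - sum_p2)
    return h_obs, h_exp


def benjamini_yekutieli(p_values, alpha=0.05):
    """Return a list of booleans: True where the test is significant
    under the Benjamini-Yekutieli step-up FDR procedure."""
    m = len(p_values)
    c_m = sum(1 / i for i in range(1, m + 1))  # harmonic correction term
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank whose p-value falls under its threshold.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / (m * c_m):
            cutoff_rank = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[idx] = True
    return rejected


# Toy data (hypothetical allele sizes and p-values).
toy_locus = [(100, 102), (100, 100), (102, 102), (100, 102)]
h_obs, h_exp = heterozygosity(toy_locus)
significant = benjamini_yekutieli([0.001, 0.02, 0.04, 0.5])
```

Under a step-up procedure of this kind, every test ranked at or below the largest passing rank is declared significant, even if its own p-value exceeds its own threshold.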
We used POWSIM ver. 4.1 (Ryman et al., 2006) to assess the statistical power of our microsatellite data set, given the observed allelic frequencies within our samples, to detect significant genetic differentiation between sampling groups. For POWSIM analyses, we assumed that lake trout within our study diverged from a common baseline population with the same allelic frequencies as observed in our contemporary samples. Simulations were performed with an effective population size of 5000 to yield values of FST of 0.01, 0.005 and 0.001. The significance of tests in POWSIM was evaluated using Fisher’s exact test and the χ² test, and the statistical power was determined as the proportion of simulations for which these tests showed a significant deviation from zero. All simulations were performed with 1000 iterations.
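The relationship between effective population size, drift time, and the target FST in simulations of this kind can be sketched as follows. This assumes the standard drift expectation FST = 1 - (1 - 1/(2Ne))^t, with t the number of generations of divergence; the function names are hypothetical and the code illustrates the arithmetic only, not POWSIM itself.

```python
import math


def generations_for_fst(target_fst, ne):
    """Generations of drift t needed to reach target_fst, solving
    FST = 1 - (1 - 1/(2*Ne))**t for t (assumed drift model)."""
    return math.log(1 - target_fst) / math.log(1 - 1 / (2 * ne))


def power(p_values, alpha=0.05):
    """Statistical power as the proportion of simulated data sets
    whose test p-value falls below alpha."""
    return sum(p < alpha for p in p_values) / len(p_values)


# Drift times for the three FST values in the text, with Ne = 5000.
drift_times = {fst: generations_for_fst(fst, 5000) for fst in (0.01, 0.005, 0.001)}
```

With Ne = 5000, the three target values correspond to roughly 100, 50 and 10 generations of drift, respectively.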
Genetic structuring was tested among lake trout groups using several different methods. First, genotypic differentiation among lake trout groups was calculated using log-likelihood (G) based exact tests (Goudet et al., 1996) implemented in GENEPOP. Global FST (θ) (Weir et al., 1984) was calculated in FSTAT ver. 2.9.3 (Goudet, 1995) and pairwise comparisons of FST between groups were calculated in ARLEQUIN ver. 3.5 (Excoffier et al., 2005) using 10,000 permutations. We then employed the Bayesian clustering program STRUCTURE ver. 2.3.2 (Pritchard et al., 2000) to resolve the putative number of populations (i.e., genetic clusters (K)) within our samples. Owing to the remarkably low levels of genetic differentiation among lake trout in Great Bear Lake (Harris et al., 2015; Harris et al., 2013), we employed the LOCPRIOR algorithm (Hubisz et al., 2009). The LOCPRIOR algorithm uses the location/sampling information as a prior in the model, which may perform better than the traditional STRUCTURE model when genetic structure is weak (Hubisz et al., 2009). We also incorporated an admixture model with correlated allelic frequencies, and the model was run with a burn-in period of 500,000 iterations followed by 500,000 Markov chain Monte Carlo iterations. We varied the potential number of populations (K) from 1 to 10 and ran 20 independent runs for each value of K. The STRUCTURE output was first processed in the program STRUCTURE HARVESTER (Earl, 2012), which combined the results of the independent runs and compiled them based on lnP(D) and the post hoc ΔK statistic of Evanno et al. (2005), to infer the most likely number of clusters. The best alignment of replicate runs was assessed with CLUMPP ver. 1.1 (Jakobsson et al., 2007), and DISTRUCT ver. 1.1 (Rosenberg, 2004) was then used to visualize the results. For STRUCTURE analyses, we reported both lnP(D) and the post hoc ΔK statistic.
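The ΔK statistic mentioned above can be sketched as follows, as one common way of computing it from replicate lnP(D) values: the absolute second-order difference of the mean lnP(D) across successive K, divided by the standard deviation of lnP(D) at that K. The function name and the toy values are hypothetical, and this is not the STRUCTURE HARVESTER code.

```python
from statistics import mean, stdev


def delta_k(ln_p_by_k):
    """Compute Evanno-style delta K from replicate lnP(D) values.

    ln_p_by_k: dict mapping K -> list of lnP(D) from replicate runs.
    Returns a dict K -> delta K for every K that has both K-1 and K+1
    available (so the smallest and largest K are excluded).
    """
    means = {k: mean(v) for k, v in ln_p_by_k.items()}
    result = {}
    for k in sorted(ln_p_by_k):
        if k - 1 not in means or k + 1 not in means:
            continue  # second-order difference is undefined at the ends
        second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
        sd = stdev(ln_p_by_k[k])  # sample standard deviation across runs
        result[k] = second_diff / sd if sd > 0 else float("inf")
    return result


# Toy replicate values (hypothetical, two runs per K for brevity).
toy_runs = {
    1: [-1000.0, -1002.0],
    2: [-900.0, -904.0],
    3: [-890.0, -894.0],
    4: [-888.0, -890.0],
}
dk = delta_k(toy_runs)
best_k = max(dk, key=dk.get)
```

Because the second-order difference needs both neighbouring values of K, ΔK cannot be evaluated at K = 1, which is one reason for reporting lnP(D) alongside it.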
Finally, Discriminant Analysis of Principal Components (DAPC) (Jombart et al., 2010) was implemented in the Adegenet package (Jombart, 2008) in R (Team, 2015). The number of clusters was identified using the find.clusters function (a sequential K-means clustering algorithm) and subsequent Bayesian Information Criterion (BIC), as suggested by Jombart et al. (2010). Stratified cross-validation (carried out with the function xvalDapc) was used to determine the optimal number of principal components to retain in the analysis.