2.6 | Genetic load
We evaluated genetic load at the species and population level using variants detected in a set of conserved single copy coding loci (Benchmarking Universal Single-Copy Orthologs or BUSCOs; Simão, Waterhouse, Ioannidis, Kriventseva, & Zdobnov, 2015) that are expected to be under purifying selection due to evolutionary constraints across specific lineages. To assess genetic load at the species level we focused on derived nonsynonymous substitutions at sites that were polymorphic in either one or both species (“Species level mutations”). Using these sites will capture differences in genome-wide heterozygosity between species, which can substantially impact estimates of load (van der Valk et al., 2019). To estimate genetic load at the population level we used sites with derived nonsynonymous substitutions that were polymorphic within specific populations (“Population level mutations”). These variants capture a different aspect of genetic load that reflects the evolutionary process shaping mutation variation within populations independent of species level effects.
To assess variation in BUSCO loci, we first performed homology searches for these loci in the S. catenatus reference genome. We used BUSCO v.3.1.0 (Simão et al., 2015) and the Tetrapoda v.10 database to identify homologs for 5,310 query BUSCO loci. We then used ANGSD to characterize SNPs and genotypes for these loci in downsampled (5x) sequences for both S. catenatus and S. tergeminus samples. ANGSD runs used the ‘sites’ parameter to restrict searches to BUSCO annotations, eliminate reads with mapping qualities < 20, remove bases with quality scores < 20, retain SNPs identified with P -values < 1x10-6, and call genotypes with posterior likelihoods of P > 0.95. To identify derived nonsynonymous substitutions in BUSCO loci for bothS. catenatus and S. tergeminus , we compared detected variants with those identified in a genome assembly of a closely-related outgroup (S. miliarius ; Broe et al., in prep.) (for detailed bioinformatic methods see Supplemental Material). This analysis identified both Species and Population level mutations (see above).
Next, we used PROVEAN v.1.1 (Choi & Chan, 2015) to classify the potential deleterious impact of specific derived nonsynonymous alleles in the BUSCO loci (Perrier, Ferchaud, Sirois, Thibault, & Bernatchez, 2017; Renaut & Rieseberg, 2015). PROVEAN uses BLAST searches of protein databases and subsequent alignments of related protein sequences from other related species to generate a measure (“Delta alignment score”) of how phylogenetically conserved a given nonsynonymous variant is among homologous protein sequences. To characterize load, we used Delta scores to pool nonsynonymous mutations into “benign” (PROVEAN scores > -2.5) and “deleterious” (PROVEAN scores < - 2.5). We also identified substitutions that resulted in intermediate stop codons or “loss of function” mutations—not accounted for by PROVEAN—as an additional category of potentially highly deleterious mutations.
To estimate overall levels of deleterious mutations present, we calculated the number and mean proportion of derived alleles present in an individual that were deleterious or loss of function with respect to the benign ones. For comparisons using Species level mutations, we compared these values averaged across all S. catenatus andS. tergeminus individuals. For comparisons that used Population level mutations, we generated mean values for the proportion of all derived deleterious alleles for a specific mutation category across all individuals from single S. catenatus populations and the onlyS. tergeminus population analyzed. Finally, for Species level mutations only, we compared the mean PROVEAN scores for all deleterious mutations found in an individual and compared this value separately for heterozygous and homozygous genotypes between individuals from each species. For this analysis we coded loss of function mutations as having a PROVEAN score of -12.
We used the nonparametric Mann-Whitney U -test for statistical comparisons between groups. We further used the false discovery rate method (Benjamini & Hochberg, 1995) for P -value adjustments across multiple tests.