2.6 | Genetic load
We evaluated genetic load at the species and population level using
variants detected in a set of conserved single copy coding loci
(Benchmarking Universal Single-Copy Orthologs or BUSCOs; Simão,
Waterhouse, Ioannidis, Kriventseva, & Zdobnov, 2015) that are expected
to be under purifying selection due to evolutionary constraints across
specific lineages. To assess genetic load at the species level we
focused on derived nonsynonymous substitutions at sites that were
polymorphic in either one or both species (“Species level mutations”).
Using these sites will capture differences in genome-wide heterozygosity
between species, which can substantially impact estimates of load (van
der Valk et al., 2019). To estimate genetic load at the population level
we used sites with derived nonsynonymous substitutions that were
polymorphic within specific populations (“Population level
mutations”). These variants capture a different aspect of genetic load
that reflects the evolutionary process shaping mutation variation within
populations independent of species level effects.
To assess variation in BUSCO loci, we first performed homology searches
for these loci in the S. catenatus reference genome. We used
BUSCO v.3.1.0 (Simão et al., 2015) and the Tetrapoda v.10 database to
identify homologs for 5,310 query BUSCO loci. We then used ANGSD to
characterize SNPs and genotypes for these loci in downsampled (5x)
sequences for both S. catenatus and S. tergeminus samples.
ANGSD runs used the ‘sites’ parameter to restrict searches to BUSCO
annotations, eliminate reads with mapping qualities < 20,
remove bases with quality scores < 20, retain SNPs identified
with P -values < 1x10-6, and call
genotypes with posterior likelihoods of P > 0.95. To
identify derived nonsynonymous substitutions in BUSCO loci for bothS. catenatus and S. tergeminus , we compared detected
variants with those identified in a genome assembly of a closely-related
outgroup (S. miliarius ; Broe et al., in prep.) (for detailed
bioinformatic methods see Supplemental Material). This analysis
identified both Species and Population level mutations (see above).
Next, we used PROVEAN v.1.1 (Choi & Chan, 2015) to classify the
potential deleterious impact of specific derived nonsynonymous alleles
in the BUSCO loci (Perrier, Ferchaud, Sirois, Thibault, & Bernatchez,
2017; Renaut & Rieseberg, 2015). PROVEAN uses BLAST searches of protein
databases and subsequent alignments of related protein sequences from
other related species to generate a measure (“Delta alignment score”)
of how phylogenetically conserved a given nonsynonymous variant is among
homologous protein sequences. To characterize load, we used Delta scores
to pool nonsynonymous mutations into “benign” (PROVEAN scores
> -2.5) and “deleterious” (PROVEAN scores < -
2.5). We also identified substitutions that resulted in intermediate
stop codons or “loss of function” mutations—not accounted for by
PROVEAN—as an additional category of potentially highly deleterious
mutations.
To estimate overall levels of deleterious mutations present, we
calculated the number and mean proportion of derived alleles present in
an individual that were deleterious or loss of function with respect to
the benign ones. For comparisons using Species level mutations, we
compared these values averaged across all S. catenatus andS. tergeminus individuals. For comparisons that used Population
level mutations, we generated mean values for the proportion of all
derived deleterious alleles for a specific mutation category across all
individuals from single S. catenatus populations and the onlyS. tergeminus population analyzed. Finally, for Species level
mutations only, we compared the mean PROVEAN scores for all deleterious
mutations found in an individual and compared this value separately for
heterozygous and homozygous genotypes between individuals from each
species. For this analysis we coded loss of function mutations as having
a PROVEAN score of -12.
We used the nonparametric Mann-Whitney U -test for statistical
comparisons between groups. We further used the false discovery rate
method (Benjamini & Hochberg, 1995) for P -value adjustments
across multiple tests.