2.3 | Genetic diversity and inbreeding
We compared levels of genetic diversity, size and abundance of ROHs, and relatedness between combined samples of S. catenatus and S. tergeminus and between individual populations of S. catenatus. We used ROHan v.1.0 (Renaud, Hanghøj, Korneliussen, Willerslev, & Orlando, 2019) to estimate genome-wide heterozygosity (ΘW), fraction of the genome in ROH (F ROH), and number of ROHs (N ROH) for each sample. This program combines local Bayesian and hidden Markov models to generate reliable estimates of ΘW and identify ROHs from low-coverage samples; furthermore, it does not require stringent mapping and base quality filters, since these metrics are informative for the models (Renaud et al., 2019).
Following Benazzo et al. (2017), we defined ROHs as genomic regions > 50 kb with a heterozygosity rate ≤ 5 × 10−4 (i.e., ≤ 25 heterozygous genotypes in 50-kb sliding windows), thus accounting for potential sequencing errors. For this analysis we used individual BAM files downsampled to 5x coverage to make samples statistically comparable. Also, to maximize our ability to detect long ROHs (see below), we limited searches to the 135 scaffolds ≥ 2 Mb in length, which comprised ~24% of the S. catenatus genome assembly (Broe et al., in prep.).
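The correspondence between the rate threshold and the per-window count can be checked directly: 5 × 10−4 × 50,000 bp = 25 sites. The sketch below illustrates that arithmetic and a simple per-window flag. It is not ROHan's hidden Markov model, and the function names and the example counts are hypothetical.

```python
# Illustrative only: converts the heterozygosity-rate threshold into a
# per-window count and flags windows that meet it. This is NOT ROHan's
# HMM; names and inputs are hypothetical.

WINDOW_BP = 50_000      # window size from the ROH definition
MAX_HET_RATE = 5e-4     # maximum heterozygosity rate allowed within an ROH


def max_het_sites(window_bp=WINDOW_BP, max_rate=MAX_HET_RATE):
    """Largest number of heterozygous sites a window may hold."""
    return round(window_bp * max_rate)


def flag_roh_windows(het_counts, window_bp=WINDOW_BP, max_rate=MAX_HET_RATE):
    """Return True for each window whose count is at or below the limit."""
    limit = max_het_sites(window_bp, max_rate)
    return [count <= limit for count in het_counts]


if __name__ == "__main__":
    print(max_het_sites())                      # per-window limit
    print(flag_roh_windows([0, 25, 26, 400]))   # made-up window counts
```

Allowing a small nonzero count per window, rather than requiring zero heterozygous sites, is what lets the definition tolerate sequencing errors.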
We quantified the impact of inbreeding on genome-wide levels of variation by comparing individual estimates of F ROH with individual estimates of ΘW and N ROH. Inbred individuals should show a high F ROH and low ΘW (Saremi et al., 2019), whereas individuals from populations that have experienced a recent bottleneck should show an excess of N ROH (Ceballos et al., 2018). To make cross-study comparisons with other threatened and endangered species and subspecies (Benazzo et al., 2017; Grossen et al., 2020; Robinson et al., 2019; Saremi et al., 2019; van der Valk et al., 2019), we also recalculated F ROH for ROH sizes of ≥ 0.1, 1, 2, and 2.5 Mb.
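As a minimal sketch of the recalculation, F ROH at a given size cutoff can be taken as the summed length of ROHs at or above that cutoff divided by the length of sequence scanned. The segment lengths and the scanned length below are invented for illustration, and the function names are hypothetical; the actual segments come from the ROH caller.

```python
# Illustrative only: recompute F_ROH and N_ROH at several minimum ROH sizes.
# Segment lengths and the scanned length are made-up example values.

MIN_SIZES_BP = [100_000, 1_000_000, 2_000_000, 2_500_000]  # 0.1, 1, 2, 2.5 Mb


def f_roh(roh_lengths_bp, scanned_bp, min_length_bp):
    """Fraction of the scanned sequence covered by ROHs >= min_length_bp."""
    if scanned_bp <= 0:
        raise ValueError("scanned_bp must be positive")
    return sum(l for l in roh_lengths_bp if l >= min_length_bp) / scanned_bp


def n_roh(roh_lengths_bp, min_length_bp):
    """Number of ROHs at least min_length_bp long."""
    return sum(1 for l in roh_lengths_bp if l >= min_length_bp)


if __name__ == "__main__":
    lengths = [60_000, 150_000, 1_200_000, 2_600_000]  # hypothetical ROHs
    scanned = 10_000_000                               # hypothetical total
    for cutoff in MIN_SIZES_BP:
        print(cutoff, n_roh(lengths, cutoff), f_roh(lengths, scanned, cutoff))
```

Because each cutoff only removes shorter segments, F ROH cannot increase as the minimum size grows, which is a quick sanity check on any recalculated values.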
Finally, estimates of individual relatedness can provide an additional evaluation of the level of inbreeding occurring within populations. We used ANGSD v.0.930 (Korneliussen, Albrechtsen, & Nielsen, 2014) and NgsRelate v.2 (Korneliussen & Moltke, 2015) to calculate relatedness for pairs of individuals from the same population based on genotype likelihood distributions from low-coverage data. We focused on the r xy statistic (Hedrick & Lacy, 2015) because it is designed for estimating individual relatedness in populations where inbreeding occurs. For this analysis we used the downsampled BAM files (with reads mapped only to the 135 scaffolds ≥ 2 Mb), grouped by population. We set the ANGSD/NgsRelate filters to mapping and base qualities ≥ 20, SNP P ≤ 1 × 10−6, and MAF ≥ 0.05.
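To illustrate how a relatedness measure of this kind can accommodate inbreeding, the sketch below combines the nine condensed Jacquard identity coefficients (J1 to J9) into a single pairwise value. The weighting used, J1 + J7 + 0.75(J3 + J5) + 0.5 J8, is an assumption based on our reading of how NgsRelate documents its pairwise relatedness output; it should be checked against the software documentation before reuse. The function name and the example coefficient vectors are hypothetical.

```python
# Illustrative only: pairwise relatedness from nine condensed Jacquard
# coefficients. The weighting is an ASSUMPTION (our reading of the NgsRelate
# documentation), not a verified reimplementation of that software.


def relatedness_rxy(jacquard):
    """Combine J1..J9 (must sum to 1) into a single relatedness value."""
    if len(jacquard) != 9:
        raise ValueError("expected nine Jacquard coefficients")
    if abs(sum(jacquard) - 1.0) > 1e-6:
        raise ValueError("Jacquard coefficients must sum to 1")
    j1, _j2, j3, _j4, j5, _j6, j7, j8, _j9 = jacquard
    return j1 + j7 + 0.75 * (j3 + j5) + 0.5 * j8


if __name__ == "__main__":
    # Non-inbred full siblings: only J7, J8, J9 are nonzero.
    full_sibs = [0, 0, 0, 0, 0, 0, 0.25, 0.5, 0.25]
    # Non-inbred parent and offspring: exactly one allele shared (J8 = 1).
    parent_offspring = [0, 0, 0, 0, 0, 0, 0, 1.0, 0]
    print(relatedness_rxy(full_sibs))
    print(relatedness_rxy(parent_offspring))
```

In the absence of inbreeding only J7, J8, and J9 are nonzero, and the expression reduces to the familiar J7 + 0.5 J8; the J1, J3, and J5 terms are what carry the contribution of inbred states.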