2.2 Genomic data processing
Raw reads were demultiplexed, and the individual FASTQ files were processed with fastp (v.0.23.2) (Chen et al., 2018) to trim residual adapter sequences and poly-G tails and to filter out low-quality (<Q15) and short (<75 bp) reads. The filtered reads were mapped to the harbour porpoise reference genome assembly (Autenrieth et al., 2018) using the BWA-MEM algorithm (v.0.7.17) (Li & Durbin, 2009) with default settings. The resulting SAM files were converted to BAM format and sorted by leftmost coordinate with Samtools (v.1.15) (Danecek et al., 2021). Picard tools (v.2.27.2) was used to add read groups and to remove PCR and sequencing duplicates. Thereafter, the BAM files were realigned around indels with GATK (v.3.8.1) (Van der Auwera et al., 2013).
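As an illustration only, the trimming and mapping steps described above could be assembled along the following lines. This is a sketch under stated assumptions, not the exact commands used in the study: single-end input is assumed, the file names and the helper `build_preprocessing_commands` are hypothetical, and the mapping of the stated thresholds onto the fastp options `--qualified_quality_phred`, `--length_required` and `--trim_poly_g` is our reading of the text. The read-group, duplicate-removal and indel-realignment steps are omitted.

```python
import shlex


def build_preprocessing_commands(sample, reference, min_qual=15, min_len=75):
    """Return shell command strings for trimming and mapping one sample.

    Hypothetical helper: file naming and single-end input are assumptions.
    """
    raw = f"{sample}.fastq.gz"
    trimmed = f"{sample}.trimmed.fastq.gz"
    bam = f"{sample}.sorted.bam"

    # fastp trims adapters by default; poly-G trimming is requested explicitly.
    fastp = [
        "fastp",
        "-i", raw,
        "-o", trimmed,
        "--trim_poly_g",
        "--qualified_quality_phred", str(min_qual),
        "--length_required", str(min_len),
    ]

    # bwa mem with default settings writes SAM to standard output.
    bwa = ["bwa", "mem", reference, trimmed]

    # samtools sort orders by leftmost coordinate by default; "-" reads stdin.
    sort = ["samtools", "sort", "-o", bam, "-"]

    return [shlex.join(fastp), f"{shlex.join(bwa)} | {shlex.join(sort)}"]


if __name__ == "__main__":
    for cmd in build_preprocessing_commands("sample01", "reference.fa"):
        print(cmd)
```

The function only builds the command strings, so they can be inspected or logged before being passed to a shell or a workflow manager.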
We used RepeatMasker (v.4.1.2) (Smit et al., 2013) with the Dfam 3.6 database (Storer et al., 2021) to identify repetitive sequences, and removed interspersed repeats from the BAM files using Samtools. We then identified sex-linked scaffolds with SATC (Nursyifa et al., 2021) and removed them with Samtools. Additionally, we removed reads with mapping quality <30 and regions of low (less than 1/3 of the mean coverage) or excessive (more than twice the mean coverage) depth, as previously estimated with ANGSD (v.10.2.0) (Korneliussen et al., 2014). Because some population genomic analyses can be affected by the presence of first-degree relatives, we calculated relatedness statistics with NgsRelate (v.2.1) (Hanghøj et al., 2019), which uses genotype likelihoods as input. We then excluded from downstream analyses one sample from the only pair of first-degree relatives found in our dataset.
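The depth and mapping-quality thresholds above amount to simple arithmetic on the mean coverage. The sketch below shows that arithmetic; the function names are hypothetical, the example mean depth is made up for illustration, and treating the depth bounds as inclusive is an assumption, since the text does not specify it.

```python
def depth_bounds(mean_depth):
    """Return (lower, upper) depth cut-offs: 1/3 and 2x the mean coverage."""
    if mean_depth <= 0:
        raise ValueError("mean_depth must be positive")
    return mean_depth / 3.0, 2.0 * mean_depth


def keep_region(depth, mean_depth):
    """True if a region's depth lies within the bounds (inclusive: an assumption)."""
    lower, upper = depth_bounds(mean_depth)
    return lower <= depth <= upper


def keep_read(mapq, min_mapq=30):
    """True if a read's mapping quality is at least min_mapq (reads <30 are removed)."""
    return mapq >= min_mapq


if __name__ == "__main__":
    mean = 12.0  # illustrative value, not taken from the study
    print(depth_bounds(mean))            # (4.0, 24.0)
    print(keep_region(3.0, mean))        # False: below 1/3 of the mean
    print(keep_region(30.0, mean))       # False: above twice the mean
    print(keep_region(10.0, mean))       # True
    print(keep_read(29), keep_read(30))  # False True
```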