2.2 Genomic data processing
Raw reads were demultiplexed, and the individual FASTQ files were
processed with Fastp (v.0.23.2) (Chen et al., 2018)
to trim residual adapter sequences and poly-G tails and to filter
out low-quality (Phred score < 15) and short (< 75 bp)
reads. The filtered reads were mapped to the harbour porpoise
reference genome assembly (Autenrieth et al., 2018) using the BWA-MEM
algorithm (v.0.7.17) (Li & Durbin, 2009) with default settings.
Alignment SAM files were converted to BAM files and sorted
by leftmost coordinate with Samtools (v.1.15) (Danecek et
al., 2021). Picard tools (v.2.27.2) was used to add read groups
and to remove PCR and sequencing duplicates. Thereafter, the BAM
files were realigned around indels with GATK (v.3.8.1) (Van der
Auwera et al., 2013).
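As an illustration of the first three steps above (trimming and filtering, mapping, and coordinate sorting), the following minimal Python sketch assembles the corresponding command lines for one sample. It is not the authors' script: the function name `build_steps`, the file-naming scheme and the optional paired-end handling are assumptions made for this example, and the Picard and GATK steps are omitted. The fastp options shown (`--qualified_quality_phred`, `--length_required`, `--trim_poly_g`) are existing options that correspond to the thresholds reported in the text, but the exact invocation used in the study is not stated.

```python
from typing import List, Optional, Tuple

# One pipeline step: the argument vector and, if the tool writes its result
# to standard output, the file that output should be redirected to.
Step = Tuple[List[str], Optional[str]]


def build_steps(sample: str, ref: str, r1: str, r2: Optional[str] = None,
                min_qual: int = 15, min_len: int = 75) -> List[Step]:
    """Return fastp, bwa mem and samtools sort commands for one sample.

    File names are hypothetical; thresholds default to those in the text.
    """
    paired = r2 is not None
    t1 = f"{sample}.trim.R1.fastq.gz"
    t2 = f"{sample}.trim.R2.fastq.gz"

    # fastp trims adapters by default; --trim_poly_g forces poly-G trimming.
    # Bases below min_qual count as unqualified, and reads shorter than
    # min_len after trimming are discarded.
    fastp = ["fastp", "-i", r1, "-o", t1]
    if paired:
        fastp += ["-I", r2, "-O", t2]
    fastp += [
        "--qualified_quality_phred", str(min_qual),
        "--length_required", str(min_len),
        "--trim_poly_g",
    ]

    # bwa mem with default settings writes SAM to standard output.
    bwa = ["bwa", "mem", ref, t1] + ([t2] if paired else [])

    # samtools sort orders alignments by leftmost coordinate by default
    # and writes BAM when the output name ends in .bam.
    sort = ["samtools", "sort", "-o", f"{sample}.sorted.bam", f"{sample}.sam"]

    return [(fastp, None), (bwa, f"{sample}.sam"), (sort, None)]


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for argv, stdout_path in build_steps("S1", "ref.fa",
                                         "S1_R1.fastq.gz", "S1_R2.fastq.gz"):
        line = " ".join(argv)
        print(line if stdout_path is None else f"{line} > {stdout_path}")
```

Building the commands as argument lists rather than shell strings keeps the sketch easy to inspect and lets each step be passed to `subprocess.run` unchanged once the tools are installed.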
We used RepeatMasker (v4.1.2) (Smit et al., 2013) and the Dfam 3.6
(Storer et al., 2021) database to identify repetitive
sequences and interspersed repeats, which were then removed from the
BAM files using Samtools. Then, we identified sex-linked scaffolds
with the software SATC (Nursyifa et al., 2021) and used Samtools to
remove them. Additionally, we removed reads of
mapping quality < 30 and regions of low (< 1/3 of mean coverage) and
excessive (> 2x mean coverage) depth, previously estimated with ANGSD
(v.10.2.0) (Korneliussen et al., 2014). Since some population genomics
analyses can be affected by the presence of first-degree relatives, we
calculated relatedness statistics with the software NgsRelate (v.2.1)
(Hanghøj et al., 2019), which uses genotype likelihoods as
input. We then excluded one sample of the only pair of
first-degree relatives found in our dataset from the downstream
analyses.
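Two of the filtering decisions described above reduce to simple rules, which the following Python sketch makes explicit. It is an illustration under stated assumptions, not the procedure used in the study. The function names are hypothetical. Whether the depth bounds are inclusive, and whether depth was summarised per site or per region, is not specified in the text. The kinship interval used to call first-degree relatives (0.177 to 0.354, a commonly used KING-style convention) is likewise an assumption, since the statistic and cutoff applied to the NgsRelate output are not reported.

```python
from typing import Iterable, Set, Tuple


def depth_bounds(mean_depth: float) -> Tuple[float, float]:
    """Return (lower, upper) depth limits: 1/3 and 2x the mean coverage."""
    return mean_depth / 3.0, 2.0 * mean_depth


def keep_region(depth: float, mean_depth: float) -> bool:
    """True if depth lies within the limits (inclusive bounds are assumed)."""
    low, high = depth_bounds(mean_depth)
    return low <= depth <= high


def samples_to_drop(pairs: Iterable[Tuple[str, str, float]],
                    low: float = 0.177, high: float = 0.354) -> Set[str]:
    """Pick one sample to exclude from each first-degree pair.

    `pairs` holds (sample_a, sample_b, kinship) tuples. The default interval
    is an assumed convention for first-degree relatives, not a value taken
    from the study. The second sample of each flagged pair is dropped unless
    one member of the pair has already been dropped.
    """
    dropped: Set[str] = set()
    for a, b, kinship in pairs:
        if a in dropped or b in dropped:
            continue  # this pair is already broken up
        if low <= kinship <= high:
            dropped.add(b)
    return dropped


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    print(depth_bounds(30.0))
    print(keep_region(8.0, 30.0), keep_region(45.0, 30.0))
    print(samples_to_drop([("A", "B", 0.25), ("A", "C", 0.05)]))
```

With a single pair of first-degree relatives, as reported here, the choice of which member to exclude is arbitrary in this sketch; in practice it could be guided by per-sample coverage or missingness.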