2.2 | Data processing and mapping
We used Trim Galore v.0.4.5
(https://github.com/FelixKrueger/TrimGalore) to remove the adapter
sequences from the PE reads, trim low-quality 5’ and 3’ ends (i.e., ends
with missing information ‘N’, Phred-scaled scores < 20, or
poly-G tails), and discard read pairs in which either or both reads were
< 100 bp after trimming. We then used the ‘MEM’ algorithm of BWA v.0.7.15
(Li & Durbin, 2009) to map the processed PE reads to a S.
catenatus reference genome (~1.6 Gb; 7,469 scaffolds;
scaffold N50 = 1,045 Mb; Broe et al., in prep.). Next, we used SAMtools
v.1.3.1 (Li et al., 2009) to remove potential PCR duplicates, unmapped
reads, and reads with unmapped partners from the resulting BAM files.
Finally, we used GATK v.3.8 (McKenna et al., 2010) to realign indels by
applying the ‘RealignerTargetCreator’ and ‘IndelRealigner’ functions.
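
The order of steps described above can be sketched as a small Python helper that assembles one shell command per step for a single sample. This is only an illustration under stated assumptions, not the exact commands we ran: the function name, the file names, the use of 'samtools rmdup' for duplicate removal, the order of duplicate removal and flag filtering, and the GATK jar path are all hypothetical. Poly-G trimming is not represented, and GATK v.3 also expects read groups, a reference index, and a sequence dictionary, which are omitted here.

```python
import shlex

UNMAPPED = 0x4       # SAM flag bit: the read itself is unmapped
MATE_UNMAPPED = 0x8  # SAM flag bit: the read's partner is unmapped


def build_commands(sample, raw_r1, raw_r2, trimmed_r1, trimmed_r2, ref,
                   gatk_jar="GenomeAnalysisTK.jar"):
    """Return the shell commands for one sample, in processing order.

    All file names are hypothetical placeholders; trimmed_r1/trimmed_r2
    stand for whatever files the trimming step wrote.
    """
    q = shlex.quote
    exclude = UNMAPPED | MATE_UNMAPPED  # 12: drop reads with either bit set
    sorted_bam = f"{sample}.sorted.bam"
    dedup_bam = f"{sample}.dedup.bam"
    filtered_bam = f"{sample}.filtered.bam"
    intervals = f"{sample}.intervals"
    realigned_bam = f"{sample}.realigned.bam"
    return [
        # 1. Adapter and quality trimming; a pair is dropped if either
        #    read is shorter than 100 bp after trimming.
        f"trim_galore --paired --quality 20 --length 100 --trim-n "
        f"{q(raw_r1)} {q(raw_r2)}",
        # 2. Map with BWA-MEM and coordinate-sort the alignments.
        f"bwa mem {q(ref)} {q(trimmed_r1)} {q(trimmed_r2)} "
        f"| samtools sort -o {q(sorted_bam)} -",
        # 3. Remove potential PCR duplicates (assumed: 'samtools rmdup').
        f"samtools rmdup {q(sorted_bam)} {q(dedup_bam)}",
        # 4. Remove unmapped reads and reads with unmapped partners.
        f"samtools view -b -F {exclude} -o {q(filtered_bam)} {q(dedup_bam)}",
        # 5. Index the filtered BAM for the realignment steps.
        f"samtools index {q(filtered_bam)}",
        # 6. Identify intervals that need local realignment.
        f"java -jar {q(gatk_jar)} -T RealignerTargetCreator -R {q(ref)} "
        f"-I {q(filtered_bam)} -o {q(intervals)}",
        # 7. Realign reads around indels within those intervals.
        f"java -jar {q(gatk_jar)} -T IndelRealigner -R {q(ref)} "
        f"-I {q(filtered_bam)} -targetIntervals {q(intervals)} "
        f"-o {q(realigned_bam)}",
    ]


if __name__ == "__main__":
    for cmd in build_commands("sample1", "sample1_R1.fq.gz", "sample1_R2.fq.gz",
                              "sample1_R1.trimmed.fq.gz",
                              "sample1_R2.trimmed.fq.gz", "reference.fa"):
        print(cmd)
```

The helper only prints the commands rather than running them, so each step can be inspected or adapted before execution. The exclusion value 12 is the sum of the two SAM flag bits for "read unmapped" (4) and "mate unmapped" (8).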