2.2 | Data processing and mapping
We used Trim Galore v.0.4.5 (https://github.com/FelixKrueger/TrimGalore) to remove adapter sequences from the PE reads, trim low-quality 5’ and 3’ ends (i.e., ends with missing information ‘N’, Phred-scaled scores < 20, or poly-G tails), and discard read pairs in which either or both reads were < 100 bp after trimming. We then used the ‘MEM’ algorithm in BWA v.0.7.15 (Li & Durbin, 2009) to map the processed PE reads to a S. catenatus reference genome (~1.6 Gb; 7,469 scaffolds; scaffold N50 = 1,045 Mb; Broe et al., in prep.). Next, we used SAMtools v.1.3.1 (Li et al., 2009) to remove potential PCR duplicates, unmapped reads, and reads with unmapped partners from the resulting BAM files. Finally, we used GATK v.3.8 (McKenna et al., 2010) to locally realign reads around indels by applying the ‘RealignerTargetCreator’ and ‘IndelRealigner’ tools.
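For readers who wish to reproduce a comparable workflow, the steps above can be sketched as an ordered list of shell commands assembled in Python. This is a minimal illustration under stated assumptions, not the exact commands used in this study: the function name `build_commands`, the sample and file names, the read-group string, the GATK jar path, and the order of duplicate removal and filtering are all hypothetical. The flags shown are ones we believe match the tool versions cited, but they should be checked against each tool's documentation. Poly-G trimming is not covered by the flags shown, and GATK additionally expects a `.fai` index and `.dict` dictionary for the reference.

```python
import os
import shlex
from typing import List


def trimmed_name(path: str, mate: int) -> str:
    """Guess the Trim Galore paired output name, e.g. x_R1.fastq.gz -> x_R1_val_1.fq.gz.

    Assumption: default Trim Galore naming, with outputs in the working directory.
    """
    base = os.path.basename(path)
    for suffix in (".fastq.gz", ".fq.gz", ".fastq", ".fq"):
        if base.endswith(suffix):
            base = base[: -len(suffix)]
            break
    return f"{base}_val_{mate}.fq.gz"


def build_commands(sample: str, r1: str, r2: str, ref: str,
                   gatk_jar: str = "GenomeAnalysisTK.jar") -> List[str]:
    """Return shell commands, in order, for one paired-end sample (nothing is executed)."""
    q = shlex.quote
    t1, t2 = trimmed_name(r1, 1), trimmed_name(r2, 2)
    # Hypothetical read group; GATK 3 requires one to be present in the BAM header.
    read_group = rf"@RG\tID:{sample}\tSM:{sample}"
    gatk = f"java -jar {q(gatk_jar)}"
    return [
        # Adapter and quality trimming; drop pairs with a read shorter than 100 bp.
        f"trim_galore --paired --quality 20 --length 100 --trim-n {q(r1)} {q(r2)}",
        # Map with BWA-MEM and coordinate-sort the alignments.
        f"bwa mem -R {q(read_group)} {q(ref)} {q(t1)} {q(t2)}"
        f" | samtools sort -o {q(sample)}.sorted.bam -",
        # Remove potential PCR duplicates.
        f"samtools rmdup {q(sample)}.sorted.bam {q(sample)}.dedup.bam",
        # -F 12 excludes unmapped reads (0x4) and reads whose mate is unmapped (0x8).
        f"samtools view -b -F 12 -o {q(sample)}.filtered.bam {q(sample)}.dedup.bam",
        f"samtools index {q(sample)}.filtered.bam",
        # Identify intervals that need realignment, then realign reads around indels.
        f"{gatk} -T RealignerTargetCreator -R {q(ref)} -I {q(sample)}.filtered.bam"
        f" -o {q(sample)}.intervals",
        f"{gatk} -T IndelRealigner -R {q(ref)} -I {q(sample)}.filtered.bam"
        f" -targetIntervals {q(sample)}.intervals -o {q(sample)}.realigned.bam",
    ]


if __name__ == "__main__":
    # Example with placeholder file names.
    for cmd in build_commands("s1", "s1_R1.fastq.gz", "s1_R2.fastq.gz", "ref.fa"):
        print(cmd)
```

Building the commands as strings, rather than executing them directly, makes it possible to inspect or log each step before running it on real data.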