RAD-tag assembly and SNP calling
Five RAD-seq derived catalogs (Figure S1; Table S3) were generated.
Three of them included ABFT individuals and were either de novo
assembled (‘nuclear de novo’) or mapped to the Pacific bluefin tuna
nuclear (PBFT, Thunnus orientalis )
(Suda et al. 2019) or ABFT mitochondrial
(accession number NC_014052) genomes (‘nuclear mapped’ and ‘mito’). The
two others were mapped to the PBFT genome and included ABFT individuals
and 4 Southern bluefin tuna (Thunnus maccoyii ), 4 albacore
(Thunnus alalunga ) and 5 PBFT individuals
(Díaz-Arce et al. 2016) (‘nuclear mapped
+ others’) or only ABFT larvae and 4 albacore individuals (‘nuclear
mapped + ALB’). Both reference-mapped and de novo assembled catalogs
were generated for testing possible bias introduced by the use of the
reference genome from a closely related species, which is less
fragmented than the one available for ABFT (Accession GCA_003231725).
Three nuclear mapped catalogs were generated including different groups
of individuals to maximize the number of informative markers included
for each type of analysis. In order to avoid inclusion of kins in the
resulting datasets, which could bias some population structure results,
a genetic relatedness matrix using the GCTA toolbox
(Yang et al. 2011) was generated using
the genotypes obtained from the ‘nuclear mapped’ catalog (generated as
described below), and only one individual (the one with the highest
number of assembled RAD tags) of each resulting pair with relatedness
higher than 0.1 was included in subsequent analyses. Reference-based
assemblies were performed by mapping the quality-filtered reads to the
corresponding reference genome using the BWA-MEM algorithm
(Li 2013), converting the resulting SAM
files to sorted and indexed BAM files using SAMTOOLS
(Li et al. 2009) and filtering the mapped
reads to include only primary alignments and correctly mate mapped
reads. De novo assembly was performed using the ustacks, cstacks,
sstacks and tsv2bam modules of Stacks version 2.3e with a
minimum coverage depth of 3 reads per allele (this is, each of the two
possible versions of one biallelic SNP variant), a maximum of 2
nucleotide mismatches between two alleles at a same locus and a maximum
of 6 mismatches between loci
(Rodríguez-Ezpeleta et al. 2019). For all
mapped and de novo catalogs, SNPs were called using information from
paired-end reads with the gstacks module of Stacks version 2.3e.
For the ‘mito’ catalog only samples with no missing data for the three
diagnostic positions used for detecting introgression were kept, and the
heterozygous genotypes, considered to be related to sequencing or
assembly errors, were removed. For the rest of the RAD catalogs, only
samples with more than 25,000 RAD loci and SNPs contained in RAD-loci
present in at least 75% of the ABFT (‘nuclear mapped’ and ‘de novo’) or
in 75% of the individuals from each of the species included (‘nuclear
mapped + others’ and ‘nuclear mapped + ALB’) were kept and exported into
PLINK (Purcell et al. 2007) using thepopulations module of Stacks version 2.3e. Using only SNPs
derived from read 1, increasing threshold values for minimum genotyping
rate for individuals and SNPs were applied to obtain a final genotype
table with a minimum genotyping rate of 0.95 and 0.85 per SNP and
individual, respectively (except for the ‘nuclear mapped + ALB’ catalog
for which thresholds were 0.95 and 0.90, respectively). SNPs were
filtered using different minor allele frequency (MAF) thresholds
considering sample sizes of the different datasets to exclude from the
analysis rare non-informative variants that are susceptible to being
derived from sequencing or assembly errors. For the ‘nuclear mapped’ and
‘nuclear de novo’ catalogs, SNPs with a MAF < 0.05 were
removed, for the ‘nuclear mapped + others’ catalog, SNPs with MAF
< 0.05 in ABFT and MAF < 0.25 in each of the other
species were removed, and for the ‘nuclear mapped + ALB’ catalog, SNPs
with a minimum allele count of two in ABFT and those variable within
albacore were removed. For all nuclear catalogs, SNPs failing Hardy
Weinberg equilibrium test at a p-value threshold of 0.05 in
Mediterranean Sea larvae or Gulf of Mexico larvae groups were removed.
Resulting genotype tables including all SNPs or only the first SNP per
tag were converted to genepop, structure, PLINK, BayeScan, immanc, VCF
and treemix formats using populations or PGDSpider version
2.0.8.3 (Lischer and Excoffier 2011) .
From the ‘mito’ catalog, genotypes for the three diagnostic positions
used for detecting introgression were extracted using PLINK
(Purcell et al. 2007).