Abstract
Temporal variation of effective population size and gene flow determine current patterns of genetic diversity within species, and hence the genetic variation upon which natural selection can act. Although such demographic processes are well understood in terrestrial organisms, they remain largely unknown in the ocean, where species diversity is still being described. Here, we present one of the first population genomic studies in a cephalopod, Octopus insularis , which is distributed in coastal and oceanic island habitats in the Atlantic Ocean, Mexican Gulf and the Caribbean Sea. Using genomic data, we identify the South Equatorial current as the main barrier to gene flow between southern and northern parts of the range, followed by discontinuities in the habitat associated with depth. We find that genetic diversity of insular populations significantly decreases after colonization from the continental shelf, also reflecting low habitat availability. Using demographic modelling, we find signatures of a stronger population expansion for coastal relative to insular populations, consistent with estimated increases in habitat availability since the Last Glacial Maximum. The direction of gene flow is coincident with unidirectional currents and bidirectional eddies between otherwise isolated populations, suggesting that dispersal through pelagic paralarvae is determinant for population connectivity. Together, our results show that oceanic currents and habitat breaks are determinant in the diversification of marine species, shaping standing genetic variability within populations. Moreover, our results show that insular populations are particularly vulnerable to current human exploitation and selective pressures, calling the revision of their protection status.
Keywords : ddRADseq, gene flow, population structure, genetic diversity, conservation, Octopus insularis

Introduction

Genetic diversity segregating within populations is the main target of natural selection, and therefore is determinant for species adaptability and vulnerability (Frankham 1995; Lande 1995; DeWoody et al. 2021). Understanding how genetic diversity is conditioned on demographic responses to past environmental change has been a major task in evolutionary biology as it provides insights into how species might react to future environmental change (Hofmann & Sgró, 2011; Moritz & Agudo, 2013). Although demographic processes, such as temporal variation of effective population size and gene flow, are well understood in terrestrial systems, these processes remain obscured in marine systems.
Marine systems play a fundamental role on human consumption (Watson & Tidd, 2018; Pauly & Zeller, 2016, Lotze et al., 2006). Nevertheless, we are only starting to understand the number of species within marine environments (Mora et al., 2011), and the evolutionary processes underlying their diversification. Until recently, the presumed assumption was that marine systems are characterized by large populations and high dispersal (Riginos & Liggins, 2013; Sanford & Kelly, 2011; Palumbi, 1994, 1992), making them more resilient to environmental change, relative to terrestrial systems. However, demographic parameters in marine organisms and their dependence on oceanographic barriers and environmental change have remained largely untested.
Genetic and, more recently, genomic studies in marine species are shifting our understanding on diversification in the marine environment. Many morphologically conserved taxa that were taxonomized as single, widely distributed species have been recognized as multiple cryptic species (Amor et al., 2017; Duda, Kohn & Matheny, 2009), often composed by differentiated populations (Peijenburg et al., 2006), suggesting that strong barriers to gene flow can operate in marine taxa (Johannesson et al., 2018; Volk et al., 2020; Filatov et al., 2021). Genomic studies in economically important fish species revealed that effective population size varies between and within species (da Fonseca et al., 2021; Barry, Broquet & Gagnaire, 2022; Sodeland et al., 2022), suggesting that genetic drift can be a strong driver of diversification in the ocean. Such studies also show that genetic barriers between populations of the same species are coincident with temperature and salinity gradients (Jorgensen et al., 2005; Guo, Li & Merilä, 2016; Guo et al., 2015, Fisher et al., 2021), suggesting that local adaptation can further drive divergence in the presence of gene flow. Less is known about invertebrate oceanic species, particularly those with sedentary behavior, where dispersal is restricted to early developmental stages.
The cephalopod Octopus insularis Leite & Haimovici, 2008 was recognized as a different species from other more widespread octopuses’ species, including O. vulagris , which is heavily targeted by fisheries worldwide. It inhabits tropical shallow reefs, from the interdidal to depths of 40 m (Leite et al., 2008; Leite et al., 2009a; Bouth et al., 2011), where it acts as an opportunistic predator (Leite et al, 2009b; Leite et al., 2016; da Silva et al., 2018). It is distributed along the eastern continental shelf of the American continent, from the Gulf of Mexico (Flores-Valle et al. 2018) to the south of Brazil and in several oceanic islands in the south Atlantic (Leite & Haimovici, 2008; Leite et al., 2009a). This cephalopod has a short life cycle (one generation per year; Lima et al., 2017), high fecundity (~ 95,000 eggs; Lima et al., 2014), planktonic paralarvae (Lima et al., 2017), and adults display a sedentary life style (Lima et al., 2017). Studies on environmental niche modelling (Lima et al., 2020) found that coastal habitats are highly connected by suitable habitat, while most oceanic islands are disconnected from coastal habitats due to high depth. The exception is the oceanic archipelago of Rocas’ Atoll and Fernando de Noronha, which are connected to the Northwestern coast of Brazil by shallow seamounts. By projecting the current ecological niche to the Last Glacial Maximum (some 24,000 kya), Lima et al. (2020) suggested that habitat suitability increased strongly along the coast, while no major changes were seen in oceanic islands, but it remains unclear whether such environmental changes shaped current patterns of genetic diversity. Recent genetic work using a fragment of the mitochondrial COI gene from populations collected throughout the known species range (Lima et al., under review) showed four haplotypic groups; two most divergent haplogroups separated by the South Equatorial Current (Fig. 1A), and two less divergent haplogroups separated by high depth between the coast and oceanic islands. These results raise the hypothesis that sea currents and habitat availability associated to seamounts may drive the diversification of O. insularis . Although valuable, molecular studies relying on mitochondrial markers alone are prone to reflect stochastic processes affecting a single marker, by selective constrains of the mitochondria, and by demographic processes specific to females (Galtier et al., 2009). Understanding how demographic history, in particular how temporal changes of effective population size and gene flow, resulted in extant patterns of genetic diversity within populations requires the sampling of independent nuclear markers and the use of coalescent-based methods (Sousa & Hey, 2013).
Here, using thousands of SNPs sampled randomly across the genome of O. insularis , and sampling populations of this species throughout its known range, including its coastal and disjunct oceanic habitats (Fig. 1A), we perform one of the first population genomic studies of a cephalopod species (but see Timm et al., 2020). First, we identify population structure and associated oceanographic barriers to gene flow along the species range. Second, we infer phylogeographic patterns of island colonization and associated changes in genetic diversity within populations. Lastly, we understand how temporal changes in effective population size and gene flow are conditioned by changes in habitat suitability and main oceanic currents. Our findings provide general insights on the drivers of diversification in marine species with low dispersal abilities, and provide recommendations for conservation and sustainable fisheries of this ecologically and commercially relevant species.

Material and Methods

Sampling of specimens

A total of 71 individuals ofOctopus insularis were sampled from 11 localities encompassing most of its known species range (Table S1, Fig. 1A). Specimens were collected during snorkeling, scuba diving, or were purchased in fish markets, when the exact location of capture was known. Specimens were then stored in 96% ethanol at room temperature and deposited in the mollusks collection of the Federal University of Rio Grande do Norte.

Library preparation and sequencing

We extracted DNA from muscle tissue of stored specimens using DNeasy Tissue kits (Qiagen), following the manufacturer’s protocol. We prepared double digest restriction site associated DNA sequencing (ddRADSeq) libraries following the protocol of Peterson et al. (2012), as adapted by Gaither et al. (2015). In short, we used the restriction endonucleases SphI and MluCl (New England Biolabs), following DNA purification with Dynabeads M-270 Streptavidin (Life Technologies), and ligation of the universal P2 adaptors and 24 different P1 adaptors containing individual barcodes of 5 bp, which differed from one another by at least 3 bp. Prior to pooling, we cleaned the DNA libraries using magnetic beads, and pooled individuals in three groups with unique Illumina indices. We then size-selected pooled DNA to recover fragments between 376 and 450 bp using a Pippin Prep (Sage Science). Libraries were quantified using a High Sensitivity DNA Kit on a 2100 Bioanalyzer (Agilent Technologies) and sequenced by QB3 Genomics at UC Berkeley on an Illumina HiSeq 2000, producing 100 bp single end reads. We obtained 260,002,245 raw reads, which after demultiplexing corresponded to an average of 3,298,408 reads per individual (sd 1,900,730 reads/ind; Table S2).

Assembly of RAD-loci and filtering

Since there is no reference genome for Octopus insularis , we assembled de novo RAD-loci using ipyrad (v.0.9.50) (Eaton & Overcast, 2020). To assess the robustness of the de novo assembly we considered two sequence similarity thresholds (0.9 and 0.95) that are often used for studies within species (Amor, Johnson & James, 2020). We only considered reads with length >70 bp and otherwise kept the default settings of ipyrad. For each assembly we estimated number of all loci assembled across individuals, number of homologous loci after filtering, number of single nucleotide polymorphisms (SNPs), percentage of missing data, and number of parsimoniously informative SNPs.
Because the two assemblies were similar in their summary statistics (Table S3), we chose a sequence similarity threshold of 0.9 in order to avoid over splitting of loci. Two samples (SPS3 and FN5) that showed overall low read number (< 200,000) and low number of recovered loci (< 200) were discarded to reduce the amount of missing data (Table S2), leaving 69 individuals from 11 locations for our final assembly.
We exported SNPs in the variant call format (vcf), resulting in 572,012 SNPs and 299,304 homologous loci. We then generated three datasets for downstream analyses (Table S4) with the program vcftools v.0.1.12b (Danecek et al., 2011): 1) the original dataset without further filtering (herein “full”), 2) a more permissive dataset allowing for loci with a maximum of 40% missing data across all 69 individuals (–max-missingness 0.6; herein “69inds_40MD”), and 3) a more stringent dataset allowing for a maximum of 20% missing data across individuals after removing the 5 individuals with the lowest number of loci (–max-missingness 0.8 and –remove for individuals BA18, RN2A, STH2, OIC1, RN13; herein “64inds_20MD”). All of these data sets contain SNPs linked within the same ddRAD locus, as physical linkage is accounted for by the most methods (Table S4). To test if the percentage of missing data reflect the input of reads per individual, rather than a population-specific divergence, we fit a linear regression model to the comparison between the log of raw reads and the log of missing data per individual, using the “69inds_40MD” dataset, and assessed statistical significance.

Population structure

We inferred the most likely number of evolutionarily independent lineages within O. insularis using complementary methods that rely on different assumptions, using the permissive and stringent datasets (“69inds_40MD”, “64inds_20MD”).
First, we carried out a Principal Component Analysis developed for low coverage sequencing data (EMU PCA, Meisner et al., 2021). This imputation method is suitable for genetic dataset with extensive missing data, which typically occurs in ddRAD datasets, and does not depend on previously defined groups and thus cannot over-estimate population structure. We converted both datasets to bed format using Plink v1.90b6.22 (Purcell et al., 2007) and ran EMU PCA assuming eleven eigenvalues (the number of sampling localities), filtering out alleles with a minor allele frequency < 0.001 (-f 0.001).
Second, we inferred the probability of assignment of each individual to a given number of K ancestral clusters, using TESS3R (Caye et al., 2016), also considering the geographic location of the sampled individuals. We converted the vcf-files to the lfmm input format, using the vcf2lfmm function of the R-package LEA (Frichot & François., 2015). We conducted 20 independent runs, assuming K between one and eleven and allowing for maximum 1,000 iterations.
Lastly, we estimated the co-ancestry matrix between every pair of individuals, using fineRADstructure (Malinsky et al., 2018). In contrast with the previous method, this method uses linkage information within the same ddRAD locus, without any prior assumption based on sampling location. In addition to the permissive and stringent datasets, we also converted vcf-files for the “full” datasets to finerad input files with RADpainter (RADpainter hapsfromVCF), and inferred the co-ancestry matrix with the default settings.