Abstract
Temporal variation of effective population size and gene flow determine
current patterns of genetic diversity within species, and hence the
genetic variation upon which natural selection can act. Although such
demographic processes are well understood in terrestrial organisms, they
remain largely unknown in the ocean, where species diversity is still
being described. Here, we present one of the first population genomic
studies in a cephalopod, Octopus insularis , which is distributed
in coastal and oceanic island habitats in the Atlantic Ocean, Mexican
Gulf and the Caribbean Sea. Using genomic data, we identify the South
Equatorial current as the main barrier to gene flow between southern and
northern parts of the range, followed by discontinuities in the habitat
associated with depth. We find that genetic diversity of insular
populations significantly decreases after colonization from the
continental shelf, also reflecting low habitat availability. Using
demographic modelling, we find signatures of a stronger population
expansion for coastal relative to insular populations, consistent with
estimated increases in habitat availability since the Last Glacial
Maximum. The direction of gene flow is coincident with unidirectional
currents and bidirectional eddies between otherwise isolated
populations, suggesting that dispersal through pelagic paralarvae is
determinant for population connectivity. Together, our results show that
oceanic currents and habitat breaks are determinant in the
diversification of marine species, shaping standing genetic variability
within populations. Moreover, our results show that insular populations
are particularly vulnerable to current human exploitation and selective
pressures, calling the revision of their protection status.
Keywords : ddRADseq, gene flow, population structure, genetic
diversity, conservation, Octopus insularis
Introduction
Genetic diversity segregating within populations is the main target of
natural selection, and therefore is determinant for species adaptability
and vulnerability (Frankham 1995; Lande 1995; DeWoody et al. 2021).
Understanding how genetic diversity is conditioned on demographic
responses to past environmental change has been a major task in
evolutionary biology as it provides insights into how species might
react to future environmental change (Hofmann & Sgró, 2011; Moritz &
Agudo, 2013). Although demographic processes, such as temporal variation
of effective population size and gene flow, are well understood in
terrestrial systems, these processes remain obscured in marine systems.
Marine systems play a
fundamental role on human consumption (Watson & Tidd, 2018; Pauly &
Zeller, 2016, Lotze et al., 2006). Nevertheless, we are only starting to
understand the number of species within marine environments (Mora et
al., 2011), and the evolutionary processes underlying their
diversification. Until recently, the presumed assumption was that marine
systems are characterized by large populations and high dispersal
(Riginos & Liggins, 2013; Sanford & Kelly, 2011; Palumbi, 1994, 1992),
making them more resilient to environmental change, relative to
terrestrial systems. However, demographic parameters in marine organisms
and their dependence on oceanographic barriers and environmental change
have remained largely untested.
Genetic and, more recently, genomic studies in marine species are
shifting our understanding on diversification in the marine environment.
Many morphologically conserved taxa that were taxonomized as single,
widely distributed species have been recognized as multiple cryptic
species (Amor et al., 2017; Duda, Kohn & Matheny, 2009), often composed
by differentiated populations (Peijenburg et al., 2006), suggesting that
strong barriers to gene flow can operate in marine taxa (Johannesson et
al., 2018; Volk et al., 2020; Filatov et al., 2021). Genomic studies in
economically important fish species revealed that effective population
size varies between and within species (da Fonseca et al., 2021; Barry,
Broquet & Gagnaire, 2022; Sodeland et al., 2022), suggesting that
genetic drift can be a strong driver of diversification in the ocean.
Such studies also show that genetic barriers between populations of the
same species are coincident with temperature and salinity gradients
(Jorgensen et al., 2005; Guo, Li & Merilä, 2016; Guo et al., 2015,
Fisher et al., 2021), suggesting that local adaptation can further drive
divergence in the presence of gene flow. Less is known about
invertebrate oceanic species, particularly those with sedentary
behavior, where dispersal is restricted to early developmental stages.
The cephalopod Octopus insularis Leite & Haimovici, 2008 was
recognized as a different species from other more widespread octopuses’
species, including O. vulagris , which is heavily targeted by
fisheries worldwide. It inhabits tropical shallow reefs, from the
interdidal to depths of 40 m (Leite et al., 2008; Leite et al., 2009a;
Bouth et al., 2011), where it acts as an opportunistic predator (Leite
et al, 2009b; Leite et al., 2016; da Silva et al., 2018). It is
distributed along the eastern continental shelf of the American
continent, from the Gulf of Mexico (Flores-Valle et al. 2018) to the
south of Brazil and in several oceanic islands in the south Atlantic
(Leite & Haimovici, 2008; Leite et al., 2009a). This cephalopod has a
short life cycle (one generation per year; Lima et al., 2017), high
fecundity (~ 95,000 eggs; Lima et al., 2014), planktonic
paralarvae (Lima et al., 2017), and adults display a sedentary life
style (Lima et al., 2017). Studies on environmental niche modelling
(Lima et al., 2020) found that coastal habitats are highly connected by
suitable habitat, while most oceanic islands are disconnected from
coastal habitats due to high depth. The exception is the oceanic
archipelago of Rocas’ Atoll and Fernando de Noronha, which are connected
to the Northwestern coast of Brazil by shallow seamounts. By projecting
the current ecological niche to the Last Glacial Maximum (some 24,000
kya), Lima et al. (2020) suggested that habitat suitability increased
strongly along the coast, while no major changes were seen in oceanic
islands, but it remains unclear whether such environmental changes
shaped current patterns of genetic diversity. Recent genetic work using
a fragment of the mitochondrial COI gene from populations collected
throughout the known species range (Lima et al., under review) showed
four haplotypic groups; two most divergent haplogroups separated by the
South Equatorial Current (Fig. 1A), and two less divergent haplogroups
separated by high depth between the coast and oceanic islands. These
results raise the hypothesis that sea currents and habitat availability
associated to seamounts may drive the diversification of O.
insularis . Although valuable, molecular studies relying on
mitochondrial markers alone are prone to reflect stochastic processes
affecting a single marker, by selective constrains of the mitochondria,
and by demographic processes specific to females (Galtier et al., 2009).
Understanding how demographic history, in particular how temporal
changes of effective population size and gene flow, resulted in extant
patterns of genetic diversity within populations requires the sampling
of independent nuclear markers and the use of coalescent-based methods
(Sousa & Hey, 2013).
Here, using thousands of
SNPs sampled randomly across the genome of O. insularis , and
sampling populations of this species throughout its known range,
including its coastal and disjunct oceanic habitats (Fig. 1A), we
perform one of the first population genomic studies of a cephalopod
species (but see Timm et al., 2020). First, we identify population
structure and associated oceanographic barriers to gene flow along the
species range. Second, we infer phylogeographic patterns of island
colonization and associated changes in genetic diversity within
populations. Lastly, we understand how temporal changes in effective
population size and gene flow are conditioned by changes in habitat
suitability and main oceanic currents. Our findings provide general
insights on the drivers of diversification in marine species with low
dispersal abilities, and provide recommendations for conservation and
sustainable fisheries of this ecologically and commercially relevant
species.
Material and Methods
Sampling of specimens
A total of 71 individuals ofOctopus insularis were sampled from 11 localities encompassing
most of its known species range (Table S1, Fig. 1A). Specimens were
collected during snorkeling, scuba diving, or were purchased in fish
markets, when the exact location of capture was known. Specimens were
then stored in 96% ethanol at room temperature and deposited in the
mollusks collection of the Federal University of Rio Grande do Norte.
Library preparation and
sequencing
We extracted DNA from muscle tissue of stored specimens using DNeasy
Tissue kits (Qiagen), following the manufacturer’s protocol. We prepared
double digest restriction site associated DNA sequencing (ddRADSeq)
libraries following the protocol of Peterson et al. (2012), as adapted
by Gaither et al. (2015). In short, we used the restriction
endonucleases SphI and MluCl (New England Biolabs), following DNA
purification with Dynabeads M-270 Streptavidin (Life Technologies), and
ligation of the universal P2 adaptors and 24 different P1 adaptors
containing individual barcodes of 5 bp, which differed from one another
by at least 3 bp. Prior to pooling, we cleaned the DNA libraries using
magnetic beads, and pooled individuals in three groups with unique
Illumina indices. We then size-selected pooled DNA to recover fragments
between 376 and 450 bp using a Pippin Prep (Sage Science). Libraries
were quantified using a High Sensitivity DNA Kit on a 2100 Bioanalyzer
(Agilent Technologies) and sequenced by QB3 Genomics at UC Berkeley on
an Illumina HiSeq 2000, producing 100 bp single end reads. We obtained
260,002,245 raw reads, which after demultiplexing corresponded to an
average of 3,298,408 reads per individual (sd 1,900,730 reads/ind; Table
S2).
Assembly of RAD-loci and
filtering
Since there is no reference genome for Octopus insularis , we
assembled de novo RAD-loci using ipyrad (v.0.9.50) (Eaton & Overcast,
2020). To assess the robustness of the de novo assembly we considered
two sequence similarity thresholds (0.9 and 0.95) that are often used
for studies within species (Amor, Johnson & James, 2020). We only
considered reads with length >70 bp and otherwise kept the
default settings of ipyrad. For each assembly we estimated number of all
loci assembled across individuals, number of homologous loci after
filtering, number of single nucleotide polymorphisms (SNPs), percentage
of missing data, and number of parsimoniously informative SNPs.
Because the two assemblies
were similar in their summary statistics (Table S3), we chose a sequence
similarity threshold of 0.9 in order to avoid over splitting of loci.
Two samples (SPS3 and FN5) that showed overall low read number
(< 200,000) and low number of recovered loci (< 200)
were discarded to reduce the amount of missing data (Table S2), leaving
69 individuals from 11 locations for our final assembly.
We exported SNPs in the
variant call format (vcf), resulting in 572,012 SNPs and 299,304
homologous loci. We then generated three datasets for downstream
analyses (Table S4) with the program vcftools v.0.1.12b (Danecek et al.,
2011): 1) the original dataset without further filtering (herein
“full”), 2) a more permissive dataset allowing for loci with a maximum
of 40% missing data across all 69 individuals (–max-missingness 0.6;
herein “69inds_40MD”), and 3) a more stringent dataset allowing for a
maximum of 20% missing data across individuals after removing the 5
individuals with the lowest number of loci (–max-missingness 0.8 and
–remove for individuals BA18, RN2A, STH2, OIC1, RN13; herein
“64inds_20MD”). All of these data sets contain SNPs linked within the
same ddRAD locus, as physical linkage is accounted for by the most
methods (Table S4). To test if the percentage of missing data reflect
the input of reads per individual, rather than a population-specific
divergence, we fit a linear regression model to the comparison between
the log of raw reads and the log of missing data per individual, using
the “69inds_40MD” dataset, and assessed statistical significance.
Population structure
We inferred the most likely number of evolutionarily independent
lineages within O. insularis using complementary methods that
rely on different assumptions, using the permissive and stringent
datasets (“69inds_40MD”, “64inds_20MD”).
First, we carried out a Principal Component Analysis developed for low
coverage sequencing data (EMU PCA, Meisner et al., 2021). This
imputation method is suitable for genetic dataset with extensive missing
data, which typically occurs in ddRAD datasets, and does not depend on
previously defined groups and thus cannot over-estimate population
structure. We converted both datasets to bed format using Plink
v1.90b6.22 (Purcell et al., 2007) and ran EMU PCA assuming eleven
eigenvalues (the number of sampling localities), filtering out alleles
with a minor allele frequency < 0.001 (-f 0.001).
Second, we inferred the probability of assignment of each individual to
a given number of K ancestral clusters, using TESS3R (Caye et al.,
2016), also considering the geographic location of the sampled
individuals. We converted the vcf-files to the lfmm input format, using
the vcf2lfmm function of the R-package LEA (Frichot & François., 2015).
We conducted 20 independent runs, assuming K between one and eleven and
allowing for maximum 1,000 iterations.
Lastly, we estimated the
co-ancestry matrix between every pair of individuals, using
fineRADstructure (Malinsky et al., 2018). In contrast with the previous
method, this method uses linkage information within the same ddRAD
locus, without any prior assumption based on sampling location. In
addition to the permissive and stringent datasets, we also converted
vcf-files for the “full” datasets to finerad input files with
RADpainter (RADpainter hapsfromVCF), and inferred the co-ancestry matrix
with the default settings.