Molecular Ecology Resources - Authorea

Estimates of heterozygosity from single nucleotide polymorphism markers are context d...

Jarrod Sopniewski

and 1 more

March 08, 2024

Heterozygosity is frequently used to describe variation in genetic diversity amongst populations and is often estimated using single nucleotide polymorphisms (SNPs). However, methods of calculating heterozygosity from SNPs have been shown to be affected by study design and filtering parameters, reducing their utility and comparability across studies. Though solutions have been proposed to account for identified problems, in our own data, we continued to see inconsistent results. Here, we aimed to further improve methods of reducing inconsistency in these results, specifically by investigating how sample size and missing data thresholds influenced autosomal estimates of heterozygosity (heterozygosity calculated from across the genome, i.e., both fixed and variable sites). We also investigated how the exclusion of tri- and tetra-allelic sites, which is generally standard practice in such studies, could affect eventual estimates of heterozygosity. Across three distinct taxa (a frog, Litoria rubella; a tree, Eucalyptus microcarpa; and a grasshopper, Keyacris scurra), we found autosomal heterozygosity estimates to be affected by sample size when missing data is not allowed, and show that this is partly due to the exclusion of tri- and tetra-allelic loci. We also show that the biases introduced by these factors are not consistent between species, or even populations, with higher levels of actual heterozygosity tending to result in larger adverse effects. We propose a modified framework for calculating heterozygosity to reduce these inherent issues and highlight the need for further development in methods such that tri- and tetra-allelic sites can be included in the calculation of population genomics statistics.
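As a rough illustration of the "autosomal" definition used in the abstract above (heterozygous calls divided by all called sites, fixed and variable alike), the sketch below computes the statistic for one individual. It is not the authors' framework: the function name, the genotype encoding, and the choice to skip missing calls are all assumptions made for this example.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical encoding: a diploid call is a pair of alleles; None marks a missing call.
Genotype = Optional[Tuple[str, str]]


def autosomal_heterozygosity(genotypes: Sequence[Genotype]) -> float:
    """Proportion of called sites (fixed and variable) that are heterozygous."""
    # Assumption: missing calls are dropped from both numerator and denominator.
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no called sites")
    heterozygous = sum(1 for first, second in called if first != second)
    return heterozygous / len(called)


# Toy input: four called sites (one heterozygous) and one missing call.
sites = [("A", "A"), ("A", "G"), None, ("C", "C"), ("T", "T")]
print(autosomal_heterozygosity(sites))
```

Because fixed sites enter the denominator, any filter that changes how many sites are retained (for example a missing-data threshold) changes the estimate, which is the sensitivity the abstract describes.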

The population genetics of partial diapause, with applications to the aestivating mal...

Rita Mwima

and 3 more

October 27, 2023

Diapause, a form of dormancy that delays or halts reproductive development during unfavourable seasons, has evolved in many insect species. One example is aestivation, a summer adult-stage diapause, enhancing malaria vectors’ survival during the unfavourable dry season (DS) and their re-establishment in the next rainy season (RS). This work develops a novel genetic approach to estimate the number or proportion of individuals undergoing diapause, as well as the breeding sizes of the two seasons, using signals from temporal allele frequency dynamics. Using Anopheles coluzzii as an example, our modelling shows that the magnitude of drift is dampened at early RS when previously aestivating individuals reappear. Aestivation severely biases the temporal effective population size (N_e), leading to overestimation of the DS breeding size by 1/(1-α)^2 across one year, where α is the aestivating proportion. We find that sampling breeding individuals in three consecutive seasons starting from a RS is sufficient for parameter estimation, and perform extensive simulations to verify our derivations. This method does not require sampling individuals in the dormant state, the biggest challenge in most studies. We apply the method to a published An. coluzzii dataset from Thierola, Mali, and the estimated aestivating proportions were 39%-79%. These results will inform the development of genetic approaches to vector control. Beyond mosquitoes, our method and the expected evolutionary implications are applicable to any species in which a fraction of the population diapauses for more than one generation and is difficult or impossible to sample during that stage.
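To give a feel for the size of the bias stated in the abstract above, the following sketch simply evaluates the factor 1/(1-α)^2 for a given aestivating proportion α. The function name is hypothetical, and the sketch does not reproduce the authors' estimation method; it only evaluates the quoted expression.

```python
def ds_overestimation_factor(alpha: float) -> float:
    """Evaluate 1 / (1 - alpha)^2, the overestimation factor quoted in the abstract.

    alpha is the aestivating proportion and must lie in [0, 1).
    """
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must satisfy 0 <= alpha < 1")
    return 1.0 / (1.0 - alpha) ** 2


# The abstract reports estimated aestivating proportions of 39%-79%.
for alpha in (0.39, 0.79):
    print(f"alpha = {alpha:.2f}: factor = {ds_overestimation_factor(alpha):.2f}")
```

The factor grows quickly as α approaches 1: with half the population aestivating (α = 0.5), the expression evaluates to 4.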

In-situ metagenomics: A platform for on-field rapid sequencing and analysis of metage...

Javier Tamames

and 4 more

August 24, 2023

We present a complete portable pipeline for sequencing and analysis of environmental metagenomes in less than a day. This unprecedented development was possible due to the conjunction of state-of-the-art experimental and computational advances: a portable laboratory suitable for DNA extraction and sequencing with nanopore technology; the powerful metagenomic analysis pipeline SqueezeMeta, capable of providing a complete analysis in a few hours using scarce computational resources; and finally, tools for the automatic inspection of the results via a graphical user interface that can be coupled to a web server to allow remote visualization of data (SQMtools and SQMxplore). We tested the feasibility of our approach in the sequencing of the microbiota associated with volcanic rocks in La Palma, Canary Islands. We also carried out a two-day sampling campaign of marine waters in which the results obtained on the first day guided the experimental design of the second day. We demonstrate that it is possible to generate metagenomic information in less than one day, making it feasible to obtain taxonomic and functional profiles quickly and efficiently, even in field conditions. This capacity can be used in the future to perform real-time functional and taxonomic profiling of microbial communities in remote areas.

Morphological and taxonomic diversity of mesozooplankton is an important driver of ca...

Margaux Perhirin

and 5 more

February 10, 2023

Mesozooplankton is a very diverse group of small animals, ranging in size from 0.2 to 20 mm, that are unable to swim against ocean currents. It is a key component of pelagic ecosystems through its roles in the trophic networks and the biological carbon pump. Traditionally studied through microscopes, mesozooplankton can now be studied using a trait-based approach, as recently developed methods make it possible to rapidly acquire large amounts of data (morphological, molecular) at the individual scale. Here, combining quantitative imaging with metabarcoding time-series data obtained in the Sargasso Sea at the Bermuda Atlantic Time-series Study (BATS) site, we showed that organisms’ transparency might also be an important trait to consider regarding mesozooplankton impact on carbon export, contrary to the common assumption that size alone is the master trait directing most mesozooplankton-linked processes. Three distinct communities were defined based on taxonomic composition, and succeeded one another throughout the study period, with changing levels of transparency within the community. A co-occurrence network built from metabarcoding data revealed six groups of taxa. These were related to changes in the functioning of the ecosystem and/or in the community’s morphology. The importance of Diel Vertical Migration at BATS was confirmed by the existence of a group made of taxa known to be strong migrators. Finally, we assessed whether metabarcoding can provide a quantitative approach to biomass and/or abundance of certain taxa. Knowing more about mesozooplankton diversity and its impact on ecosystem functioning would allow it to be better represented in biogeochemical models.

Predicting downstream transport distance of fish eDNA in lotic environments

Didier Pont

July 24, 2023

Environmental DNA is an effective tool for describing fish biodiversity in lotic environments, but the downstream transport of eDNA released by organisms makes it difficult to interpret species detection at the local scale. In addition to biophysical degradation and exchanges at the water-sediment interface, hydrological conditions control the transport distance. We have developed an eDNA transport model that considers downstream retention and degradation processes in combination with hydraulic conditions and assumes that the sedimentation rate of very fine particles is a correct estimate of the eDNA deposition rate. Based on meta-analyses of available studies, we successively modelled the particle size distribution of fish eDNA (PSD), the relationship between the sedimentation rate and the size of very fine particles in suspension, and the influence of temperature on the degradation rate of fish eDNA. After combining the results in a mechanistic-based model, we correctly simulated the eDNA uptake distances observed in a compilation of previous experimental studies. eDNA degradation is negligible at low flow and temperature but has a comparable influence to background transfer when hydraulic conditions allow a long uptake distance. The wide prediction intervals associated with the simulations reflect the complexity of the processes acting on eDNA after shedding. This model can be useful for estimating eDNA detection distance downstream from a source point and discussing the possibility of false positive detection in eDNA samples, as shown in an example.

GenAPoPop 1.0: a user-friendly software to analyse genetic diversity and structure in...

Solenn Stoeckel

and 3 more

December 02, 2022

Autopolyploidy is quite common in most clades of eukaryotes. The emergence of sequence-based genotyping methods with individual and marker tags now enables confident allele dosage, overcoming the main obstacle to the democratization of population genetic approaches when studying the ecology and evolution of autopolyploid populations and species. Reproductive modes, including clonality, selfing and allogamy, have deep consequences for the ecology and evolution of populations and species. Analysing genetic diversity and its dynamics over generations is one efficient way to infer the relative importance of clonality, selfing and allogamy in populations. GENAPOPOP is a user-friendly solution to compute the specific corpus of population genetic indices, including indices of genotypic diversity, needed to analyse partially clonal, selfed and allogamous polysomic populations genotyped with confident allele dosage. It also easily provides the posterior probabilities of quantitative reproductive modes in autopolyploid populations genotyped at two time steps and a graphical representation of the minimum spanning trees of the genetic distances between polyploid individuals, facilitating the interpretation of the genetic coancestry between individuals in hierarchically structured populations. GENAPOPOP complements the previously existing solutions, including SPAGEDI and POLYGENE, for using genotypings to study the ecology and evolution of autopolyploid populations. It was specially developed with a simple graphical interface and workflow, and comes with a simulator to facilitate practical courses and teaching of population genetics for autopolyploid populations.

Development and validation of a DNA-based multi-species biomonitoring toolkit using a...

Dennis van der Pouw Kraan

and 3 more

July 19, 2023

Biomonitoring of marine life has been enhanced in recent years by the integration of innovative DNA-based approaches, which offer advantages over more laborious conventional techniques (e.g. direct capture) and greater taxonomic resolution especially in complex life cycles and early life stages. However, tradeoffs between throughput, sensitivity and quantitative measurements must be made when choosing between the prevailing molecular methodologies (i.e. metabarcoding or qPCR/dPCR). Thus, the aim of the present study was to demonstrate the utility of a microfluidic-enabled High Throughput quantitative PCR platform (HT-qPCR) for the rapid and cost-effective development and validation of a DNA-based multi-species biomonitoring toolkit, using larvae of 24 commercially targeted bivalve and crustacean species as a case study. The workflow was divided into three main phases: definition of target taxa and establishment of reference databases (PHASE 1); in silico selection/development and in vitro assessment of molecular assays (PHASE 2); and protocol optimization and field validation (PHASE 3). Of a total of 85 assays in silico, 42 were eventually chosen and validated in vitro. Genetic signal showed good correlation with direct visual counts by microscopy, but also showed the ability to provide quantitative data at the highest taxonomic resolution (species level) in a time- and cost-effective fashion. This study developed a biomonitoring toolkit, demonstrating the considerable advantages of this state-of-the-art technology in boosting the development and application of panels of molecular assays for the monitoring and management of natural resources that can be applied to a range of monitoring programmes. Keywords: DNA, High Throughput, qPCR, biomonitoring, shellfish

Non-invasive age estimation based on fecal DNA using methylation-sensitive high-resol...

Genfu Yagi

and 7 more

July 10, 2023

Age is necessary information for the study of the life history of wild animals. A general method to estimate the age of odontocetes is counting dental growth layer groups (GLGs). However, this method is highly invasive as it requires the capture and handling of individuals to collect their teeth. Recently, the development of DNA-based age estimation methods has been actively studied as an alternative to such invasive methods, of which many have used biopsy samples. However, if DNA-based age estimation can be developed from fecal samples, age estimation can be performed without touching or disrupting individuals, thus establishing an entirely non-invasive method. We developed an age estimation model using the methylation rate of two gene regions, GRIA2 and CDKN2A, measured through methylation-sensitive high-resolution melting (MS-HRM) from fecal samples of wild Indo-Pacific bottlenose dolphins (Tursiops aduncus). The age of individuals was known from longitudinal underwater individual identification surveys. Methylation rates were quantified from 36 samples. Both gene regions showed a significant correlation between age and methylation rate. The age estimation model was constructed based on the methylation rates of both genes, and achieved sufficient accuracy (after LOOCV: MAE = 5.08, R² = 0.34) for ecological studies of Indo-Pacific bottlenose dolphins, which have a lifespan of 40-50 years. This is the first study to report the use of non-invasive fecal samples to estimate the age of marine mammals.
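The accuracy figure quoted in the abstract above comes from leave-one-out cross-validation (LOOCV). The sketch below shows how a LOOCV mean absolute error (MAE) can be computed for a simple regression. It is only an illustration under stated assumptions: it uses a single predictor and ordinary least squares, whereas the abstract's model uses two gene regions, and the function names and toy values are invented for this example and are not data from the study.

```python
from statistics import mean
from typing import List, Tuple


def fit_line(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mean_x, mean_y = mean(x), mean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def loocv_mae(x: List[float], y: List[float]) -> float:
    """Hold out each sample in turn, refit on the rest, and average the absolute errors."""
    errors = []
    for i in range(len(x)):
        x_train = x[:i] + x[i + 1:]
        y_train = y[:i] + y[i + 1:]
        slope, intercept = fit_line(x_train, y_train)
        errors.append(abs(y[i] - (slope * x[i] + intercept)))
    return mean(errors)


# Invented toy values (methylation rate vs. age in years), not from the study.
methylation = [0.10, 0.22, 0.29, 0.41, 0.52]
age = [5.0, 14.0, 17.0, 30.0, 33.0]
print(round(loocv_mae(methylation, age), 2))
```

Each prediction is made by a model that never saw the held-out sample, so the resulting MAE is a more honest estimate of out-of-sample error than the fit on all samples.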

Plant-derived environmental DNA complements diversity estimates from traditional arth...

Sven Weber

and 8 more

June 12, 2023

Our limited knowledge about the ecological drivers of global arthropod decline highlights the urgent need for more effective biodiversity monitoring approaches. Monitoring of arthropods is commonly performed using passive trapping devices, which reliably recover diverse communities, but provide little ecological information on the sampled taxa. In particular, the manifold interactions of arthropods with plants are barely understood. A promising strategy to overcome this shortfall is environmental DNA (eDNA) metabarcoding of arthropods from plant material they have interacted with. However, the accuracy of this approach has not been sufficiently tested. In four experiments, we exhaustively test the comparative performance of plant-derived eDNA from surface washes of plants and homogenized plant material against traditional monitoring approaches. We show that the recovered communities of plant-derived eDNA and traditional approaches only partly overlap, with eDNA recovering various additional cryptic taxa. This suggests eDNA as a useful complementary tool to traditional monitoring. Despite the differences in recovered taxa, estimates of community α- and β-diversity between both approaches are well correlated, highlighting the utility of eDNA as a broad-scale tool for community monitoring. Last, eDNA outperforms traditional approaches in the recovery of plant-specific arthropod communities. Unlike traditional monitoring, eDNA revealed fine-scaled community differentiation between individual plants and even within plant compartments. Specialized herbivores in particular are better recovered with eDNA. Our results highlight the value of plant-derived eDNA analysis for large-scale biodiversity assessments that include information about community-level interactions.

POOLPARTY2: An integrated pipeline for analyzing pooled or indexed low coverage whole...

Stuart Willis

and 3 more

June 06, 2023

Whole genome sequencing data allow survey of variation from across the genome, reducing the constraint of balancing genome sub-sampling with recombination rates and linkage between sampled markers and target loci. As sequencing costs decrease, low coverage whole genome sequencing of pooled or indexed-individual samples is commonly utilized to identify loci associated with phenotypes or environmental axes in non-model organisms. There are, however, relatively few publicly available bioinformatic pipelines designed explicitly to analyze these types of data, and fewer still that process the raw sequencing data, provide useful metrics of quality control, and then execute analyses. Here, we present an updated version of a bioinformatics pipeline called POOLPARTY2 that can effectively handle either pooled or indexed DNA samples and includes new features to improve computational efficiency. Using simulated data, we demonstrate the ability of our pipeline to recover segregating variants, estimate their allele frequencies accurately, and identify genomic regions harboring loci under selection. Based on the simulated data set, we benchmark the efficacy of our pipeline against another bioinformatic suite, ANGSD, and illustrate the compatibility and complementarity of these suites by using ANGSD to generate genotype likelihoods as input for identifying linkage outlier regions using alignment files and variants provided by POOLPARTY2. Finally, we apply our updated pipeline to an empirical dataset of low coverage whole genomic data from uncurated population samples of Columbia River steelhead trout (Oncorhynchus mykiss), results from which demonstrate the genomic impacts of decades of artificial selection in a prominent hatchery stock.

Reliable NGS genotyping of MHC class I and II genes requires template-specific optimi...

Artemis Efstratiou

and 4 more

June 05, 2023

Using high-throughput sequencing for precise genotyping of multi-locus gene families, such as the Major Histocompatibility Complex (MHC), remains challenging, due to the complexity of the data and difficulties in distinguishing genuine from erroneous variants. Several dedicated genotyping pipelines for data from high-throughput sequencing, such as next-generation sequencing (NGS), have been developed to tackle the ensuing risk of artificially inflated diversity. Here, we thoroughly assess three such multi-locus genotyping pipelines for NGS data, using MHC class IIβ datasets of three-spined stickleback gDNA, cDNA, and “artificial” plasmid samples with known allelic diversity. We show that genotyping of gDNA and plasmid samples at optimal pipeline parameters was highly accurate and reproducible across methods. However, for cDNA data, the same configuration yielded decreased overall genotyping precision and consistency between pipelines. Further adjustments of key clustering parameters were required to account for higher error rates and larger variation in sequencing depth per allele, highlighting the importance of template-specific pipeline optimization for reliable genotyping of multi-locus gene families. Through accurate paired gDNA-cDNA genotyping and MHC-II haplotype inference, we show that MHC-II allele-specific expression levels correlate negatively with allele number across haplotypes. Lastly, sibship-assisted cDNA genotyping of MHC-I revealed novel variants and haplotype-based allelic segregation with a higher-than-previously-reported individual allelic diversity for MHC-I in sticklebacks. In conclusion, we here provide novel genotyping protocols for MHC-I and -II genes of the three-spined stickleback, but also evaluate the performance of popular NGS-genotyping pipelines and highlight the need for template-specific optimization for reliable multi-locus genotyping.

Leveraging whole genome sequencing to estimate telomere length in plants

Michelle Zavala Paez

and 2 more

June 02, 2023

Changes in telomere length are increasingly used to indicate species’ response to environmental stress across diverse taxa. Despite this broad use, few studies have explored telomere length in plants. However, rapid advances in sequencing approaches and bioinformatic tools now allow estimation of telomere length using whole genome sequencing (WGS) data. Thus, evaluation of new approaches for measuring telomere length in plants is needed. Traditionally, telomere length has been quantified using quantitative polymerase chain reaction (qPCR). While WGS has been extensively used in humans, no study to date has compared the effectiveness of WGS in estimating telomere length in plants relative to traditional qPCR approaches. In this study, we use one hundred Populus clones re-sequenced using short-read Illumina sequencing to quantify telomere length using three different bioinformatic approaches, Computel, K-seek, and TRIP, in addition to qPCR. Overall, telomere length estimates varied across different bioinformatic approaches, but were highly correlated across methods for individual genotypes. A positive correlation was observed between WGS estimates and qPCR; however, Computel estimates exhibited the greatest correlation. Computel incorporates genome coverage into telomere length calculations, suggesting that genome coverage is likely important to telomere length quantification when using WGS data. Overall, telomere estimates from WGS provided greater precision and accuracy relative to qPCR. The findings suggest that WGS is a promising approach for assessing telomere length and, as the field of telomere ecology evolves, may provide added value in assaying the response of plants to biotic and abiotic environments, as needed to accelerate plant breeding and conservation management.

Don't be scared of the genome's 5th base -- Explaining phenotypic variability and evo...

Joerg Tost

May 17, 2023

Epigenetic processes have taken center stage in the investigation of many biological processes, and epigenetic modifications have been shown to influence phenotype, morphology and behavioral traits such as stress resistance by affecting gene regulation and expression without altering the underlying genomic sequence. The multiple molecular layers of epigenetics synergistically construct the cell type-specific gene regulatory networks. DNA methylation occurring on the 5-carbon of cytosines in different genomic sequence contexts is the most studied epigenetic modification. DNA methylation has been shown to provide a molecular record of a large variety of environmental factors, which might persist through the entire lifetime of an organism and even be passed on to the offspring. Animals might display altered phenotypes mediated by epigenetic modifications depending on the developmental stage or the environmental conditions as well as during evolution. Therefore, the analysis of DNA methylation patterns might allow deciphering previous exposures, explaining ecologically relevant phenotypic diversity and predicting evolutionary trajectories enabling accelerated adaptation to changing environmental conditions. Despite the explanatory potential of DNA methylation, studies of DNA methylation are still scarce in the field of ecology. This might be at least partly due to the complexity of DNA methylation analysis and the interpretation of the acquired data. In the current issue of Molecular Ecology Resources, Laine and colleagues (2023) provide a detailed summary of guidelines and valuable recommendations for researchers in the field of ecology to avoid common pitfalls and perform interpretable genome-wide DNA methylation analyses.

HMSS2: an advanced tool for the analysis of sulfur metabolism, including organosulfur...

Tomohisa Tanabe

and 1 more

May 12, 2023

The global sulfur cycle has implications for human health, climate change, biogeochemistry, and bioremediation. The organosulfur compounds that participate in this cycle not only represent a vast reservoir of sulfur, but are also used by prokaryotes as sources of energy and/or carbon. Closely linked to the inorganic sulfur cycle, it involves the interaction of prokaryotes, eukaryotes, and chemical processes. However, ecological and evolutionary studies of the conversion of organic sulfur compounds are hampered by the poor conservation of the relevant pathways and their variation even within strains of the same species. In addition, several proteins involved in the conversion of sulfonated compounds are related to proteins involved in sulfur dissimilation or turnover of other compounds. Therefore, the enzymes involved in the metabolism of organic sulfur compounds are usually not correctly annotated in public databases. To address this challenge, we have developed HMSS2, a profile Hidden Markov Model-based tool for rapid annotation and synteny analysis of organic and inorganic sulfur cycle proteins in prokaryotic genomes. Compared to its previous version (HMS-S-S), HMSS2 includes several new features. HMM-based annotation is now supported by non-homology criteria and covers the metabolic pathways of important organosulfur compounds, including dimethylsulfoniopropionate, taurine, isethionate, and sulfoquinovose. In addition, the calculation speed has been increased by a factor of four and the available output formats have been extended to include iTOL-compatible datasets and customised sequence FASTA files.

Estimating contemporary effective population size from SNP data while accounting for...

Enrique Santiago

and 3 more

May 08, 2023

A new method is developed to estimate the contemporary effective population size (Ne) from linkage disequilibrium (LD) between SNPs without information on their location, which is the usual scenario in non-model species. The general theory of linkage disequilibrium is extended to include the contribution of full-sibs to the measure of LD, leading naturally to the estimation of Ne in monogamous and polygamous mating systems, as well as in multiparous species, and under non-random distributions of full-sib family size due to selection or other causes. The prediction of confidence intervals for Ne estimates was solved using a small artificial neural network trained on a dataset of over 10^5 simulation results. The method, implemented in user-friendly and fast software (currentNe), is able to estimate Ne even in problematic scenarios with large population sizes or small sample sizes, and provides confidence intervals that are more consistent than those from parametric methods or resampling.

Journeying towards best practice data management in biodiversity genomics

Natalie Forsdick

and 8 more

May 05, 2023

Advances in sequencing technologies and declining costs are increasing the accessibility of large-scale biodiversity genomic datasets. To maximise the impact of these data, a careful, considered approach to data management is essential. However, challenges associated with the management of such datasets remain, exacerbated by uncertainty among the research community as to what constitutes best practices. As an interdisciplinary team with diverse data management experience, we recognise the growing need for guidance on comprehensive data management practices that minimise the risks of data loss, maximise efficiency for stand-alone projects, enhance opportunities for data reuse, facilitate Indigenous data sovereignty and uphold the FAIR and CARE Guiding Principles. Here, we describe four fictional personas reflecting user experiences with data management to identify data management challenges across the biodiversity genomics research ecosystem. We then use these personas to demonstrate realistic considerations, compromises, and actions for biodiversity genomic data management. We also launch the Biodiversity Genomics Data Management Hub (https://genomicsaotearoa.github.io/data-management-resources/), containing tips, tricks and resources to support biodiversity genomics researchers, especially those new to data management, in their journey towards best practice. The Hub also provides an opportunity for those biodiversity researchers whose expertise lies beyond genomics and who are keen to advance their data management journey. We aim to support the biodiversity genomics community in embedding data management throughout the research lifecycle to maximise research impact and outcomes.

Inference of the distribution of fitness effects of mutations is affected by SNP filt...

Bea Andersson

and 4 more

May 03, 2023

Integrating Pool-seq uncertainties into demographic inference

João Carvalho

and 4 more

December 05, 2022

Next-generation sequencing of pooled samples (Pool-seq) is a popular method to assess genome-wide diversity patterns in natural and experimental populations. However, Pool-seq is associated with specific sources of noise, such as unequal individual contributions. Consequently, using Pool-seq for the reconstruction of evolutionary history has remained underexplored. Here we describe a novel Approximate Bayesian Computation (ABC) method to infer demographic history, explicitly modeling Pool-seq sources of error. By jointly modeling Pool-seq data, demographic history and the effects of selection due to barrier loci, we obtain estimates of demographic history parameters accounting for technical errors associated with Pool-seq. Our ABC approach is computationally efficient as it relies on simulating subsets of loci (rather than the whole genome), and on using relative summary statistics and relative model parameters. Our simulation study results indicate that Pool-seq data allow distinction between general scenarios of ecotype formation (single versus parallel origin) and inference of relevant demographic parameters (e.g., effective sizes, split times). We exemplify the application of our method to Pool-seq data from the rocky-shore gastropod Littorina saxatilis, sampled on a narrow geographical scale at two Swedish locations where two ecotypes (Wave and Crab) are found. Our model choice and parameter estimates show that ecotypes formed before colonization of the two locations (i.e., single origin) and are maintained despite gene flow. These results indicate that demographic modeling and inference can be successful based on pool-sequencing using ABC, contributing to the development of suitable null models that allow for a better understanding of the genetic basis of divergent adaptation.