Corresponding author: Brian Charlesworth
Email: Brian.Charlesworth@ed.ac.uk
Keywords: epistatic selection, genetic hitchhiking, linkage
disequilibrium, selective sweeps, demography
ABSTRACT
We write to address recent claims by Gompert et al. (2021) about
the potentially important and underappreciated phenomenon of “indirect
selection”, the observation that neutral regions may be affected by
natural selection. We argue both that this phenomenon, generally known
as genetic hitchhiking, is neither new nor poorly studied, and that
the patterns described by the authors have multiple alternative
explanations.
We wish to express a number of concerns about the recent paper by
Gompert et al. (2021), who propose a “widespread indirect
selection hypothesis”, and assert that “functionally neutral genetic
regions can be affected indirectly by natural selection, via their
statistical association with genes under direct selection”. They add
that “the genomic extent of such indirect selection, particularly across
loci not physically linked to those under direct selection, remains
poorly understood”. The authors present analyses of several datasets;
they suggest that this process could be an important phenomenon over
both short and long timescales, and may serve to make “aspects of
evolution more predictable” given that “conditional on patterns of LD,
indirect selection is a deterministic process.” We have concerns about
both their conceptual framework and their interpretation of the
experimental results.
First, the concept of “indirect selection” appears to be equivalent to
genetic hitchhiking, which is only mentioned towards the end of their
paper. The term hitchhiking is most often used in relation to the
effects of the spread of a beneficial mutation on variability at linked
sites (a selective sweep: Maynard Smith & Haigh 1974), or to the
similar effects of the elimination of deleterious mutations (background
selection: Charlesworth et al. 1993; incorrectly attributed by
Gompert et al. to Begun & Aquadro 1992). As noted in the recent
review by Charlesworth & Jensen (2021), the various forms of
hitchhiking can all be described in general terms by the Price-Robertson
identity, which states that the change in mean of a trait is governed by
the additive covariance between the trait and fitness (Robertson 1968;
Price 1970). In this case, the trait is the allele frequency at the
neutral locus, and the covariance is the product of the coefficient of
linkage disequilibrium, D, and the additive fitness effect of the
locus under selection (Santiago & Caballero 1995; Charlesworth &
Jensen 2021). We note that variation maintained by long-term balancing
selection or divergent selection between populations can also affect
variability at linked sites, but this does not involve changes in allele
frequencies that are noticeable over the short timescales considered by
Gompert et al. (2021), except in the initial phase of their
establishment (see Zeng et al. 2021 for a recent theoretical
analysis).
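The covariance statement above can be checked numerically in the simplest possible setting. The following is a minimal sketch, assuming a haploid two-locus model in which allele A1 at the selected locus has fitness 1 + s and A2 has fitness 1; the function name and the haplotype labels are illustrative and do not come from any of the cited papers. Under these assumptions, the change in frequency of the neutral allele B1 in one generation of selection equals sD divided by the mean fitness.

```python
def neutral_freq_change(x11, x12, x21, x22, s):
    """One generation of haploid selection on locus A; locus B is neutral.

    x11, x12, x21, x22 are the frequencies of haplotypes A1B1, A1B2, A2B1
    and A2B2 (they must sum to 1). A1 has fitness 1 + s and A2 has fitness 1
    (an assumed, deliberately simplified model).

    Returns a tuple: (observed change in the frequency of B1,
                      prediction s * D / mean fitness).
    """
    p = x11 + x12                # frequency of the selected allele A1
    q = x11 + x21                # frequency of the neutral allele B1
    D = x11 * x22 - x12 * x21    # coefficient of linkage disequilibrium
    w_bar = 1 + s * p            # mean fitness
    # Frequency of B1 after selection: weight each B1 haplotype by its fitness.
    q_after = (x11 * (1 + s) + x21) / w_bar
    return q_after - q, s * D / w_bar


# Example: positive LD between A1 and B1 drags B1 upwards.
observed, predicted = neutral_freq_change(0.3, 0.2, 0.1, 0.4, s=0.1)
```

With D = 0 the neutral allele frequency does not change in this model, whatever the value of s, which is the sense in which hitchhiking depends entirely on the initial association.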
Thus, in one sense their remark about the deterministic nature of
indirect selection is correct. However, the initial value of D is
generated by various forms of chance associations between alleles at the
neutral and selected loci, as is indicated by the title of their paper,
so that there is inherently a stochastic element to hitchhiking.
Furthermore, D is reduced by a factor of 1 – c each
generation, where c is the frequency of recombination between the
two loci concerned (Charlesworth & Charlesworth 2010, p.381). Multiple
theoretical analyses of hitchhiking have shown that significant effects
of selection on linked neutral sites only occur if the ratio of c to
the selection coefficient at the selected loci is small, or the
neutral loci are embedded within a large number of loci that are
simultaneously experiencing selection (Charlesworth & Jensen 2021). A
neutral locus can only be affected by a selective sweep at an unlinked
locus (c = ½) if there is extremely strong selection, involving a
much greater than two-fold selective advantage to the beneficial allele.
Similarly, background selection caused by unlinked loci is likely to
have only a minor effect on neutral variability (Charlesworth 2012).
Detectable hitchhiking effects involving unlinked loci are therefore
likely to be very infrequent.
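The per-generation decay of D by a factor of 1 – c can be made concrete with a short calculation. This is a sketch that considers recombination alone, ignoring selection and drift; the function names are hypothetical.

```python
import math


def ld_after(d0, c, t):
    """D after t generations, starting from d0, with recombination frequency c.

    Applies only the factor (1 - c) per generation; selection and drift
    are deliberately ignored in this sketch.
    """
    return d0 * (1 - c) ** t


def generations_to_fraction(c, fraction):
    """Generations needed for D to fall to `fraction` of its initial value."""
    if not (0 < c < 1) or not (0 < fraction < 1):
        raise ValueError("c and fraction must lie strictly between 0 and 1")
    return math.log(fraction) / math.log(1 - c)


# Unlinked loci (c = 0.5) versus tightly linked loci (c = 0.001):
unlinked = generations_to_fraction(0.5, 0.01)
linked = generations_to_fraction(0.001, 0.01)
```

For unlinked loci the association falls to 1% of its starting value in fewer than seven generations, whereas with c = 0.001 it takes several thousand, which illustrates why hitchhiking effects are expected mainly at closely linked sites.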
Second, their assertion that widespread hitchhiking effects are
“relatively untested” seems misplaced. One of the most consistent
patterns to emerge from the study of DNA sequence variation over the
past few decades is the generally positive correlation between neutral
variability and the local rate of recombination (Begun & Aquadro 1992;
Charlesworth & Jensen 2021). Genetic hitchhiking has long been
discussed as a major factor in generating this genome-wide correlation,
although in some species the mutagenic effect of recombination may
contribute as well (Pratto et al. 2014; Arbeithuber et al.
2015). Indeed, the abundant evidence for pervasive background selection
effects has prompted numerous authors to argue that it should be an
essential component in any evolutionary null model when conducting
population genomic studies (e.g., Comeron 2017; Pouyet et
al. 2018; Jensen et al. 2019).
Third, the authors suggest that, while hitchhiking effects have been
studied over long periods of time, “it remains unclear whether indirect
selection is pervasive on shorter time scales, such as generations or
decades”. Although it is not obvious how hitchhiking effects could
operate over long time-scales without also acting at short time-scales,
this remark reflects a more fundamental misunderstanding. As described
above, recombination events between directly selected mutations and
linked variants act to diminish hitchhiking effects, so that the
footprints associated with recent hitchhiking events are naturally
stronger than older ones. In fact, recombination, subsequent mutations,
and genetic drift cause these patterns to decay so rapidly that most
effects of individual selective sweeps are only detectable over
relatively brief periods of time since their occurrence (Przeworski
2002; Kim & Stephan 2002). If there has been strong directional
selection at certain loci in the experimental set-ups that are described
by Gompert et al. (2021), the short time-scale involved is
certainly favourable for the detection of their hitchhiking effects
(although the statistical power associated with the relatively small
experimental sample sizes and limited number of generations can be
problematic: Barrett et al. 2019).
When considering the evidence for hitchhiking effects in the cases
described by Gompert et al. (2021), we focus specifically on
their primary example, the stick insect Timema cristinae.
Previous work has documented the existence of an interesting colour
polymorphism, controlled by a 10 megabase region, the Mel-Stripe
locus, which appears to be potentially associated with a chromosomal
inversion (Lindtke et al. 2017). Among approximately 7 million
SNPs in their sample, Gompert et al. (2021) found 64 SNPs that
had \(r^{2} \geq 0.1\) with Mel-Stripe and were
also on different chromosomes, where r is the correlation
coefficient between pairs of alleles at two loci (Charlesworth &
Charlesworth 2010, p.373). The first question is whether this could be
generated by the effect of random sampling of the 492 haploid genomes
sequenced in their experimental population. Under the null hypothesis of
no LD, an \(r^{2}\) of 0.1 corresponds to a 1 d.f. \(\chi^{2}\) of
\(492 \times 0.1 = 49.2\), for which \(p = 2.31 \times 10^{-12}\),
using the incomplete gamma function with parameter 0.5 evaluated at
\(0.5\chi^{2}\) (https://keisan.casio.com/exec/system/1180573447). The expected
number of SNPs with \(r^{2} \geq 0.1\) is thus approximately
\(2.31 \times 10^{-12} \times 7 \times 10^{6} = 1.62 \times 10^{-5}\). Not
surprisingly, therefore, this explanation can be ruled out. Note that
the expected value of \(r^{2}\) generated by random
sampling in the absence of true LD is 1/492 ≈ 0.002, which is not far
from the mean value of 0.004 for all pairs of unlinked SNPs reported by
Gompert et al. (2021).
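The sampling calculation can be reproduced with a few lines of code. This is a sketch using only the Python standard library; it relies on the identity that the upper tail of a 1 d.f. chi-squared distribution at x equals erfc(sqrt(x/2)), which is the same quantity as the regularized upper incomplete gamma function with parameter 0.5 evaluated at x/2. The variable names are illustrative.

```python
import math


def chi2_1df_upper_tail(x):
    """Upper-tail probability of a chi-squared variate with 1 d.f.

    For 1 d.f. this equals erfc(sqrt(x / 2)), i.e. the regularized upper
    incomplete gamma function with parameter 0.5 evaluated at x / 2.
    """
    return math.erfc(math.sqrt(x / 2.0))


n_haploid_genomes = 492   # sample size quoted in the text
r2_threshold = 0.1        # LD threshold used to define the 64 SNPs
n_snps = 7e6              # approximate number of SNPs in the sample

# Under the null hypothesis of no LD, n * r^2 is approximately chi-squared (1 d.f.).
chi2_value = n_haploid_genomes * r2_threshold
p_value = chi2_1df_upper_tail(chi2_value)

# Expected number of SNPs reaching the threshold by sampling alone.
expected_snps = p_value * n_snps
```

Under these inputs the probability and the expected count agree with the values given in the text to the stated precision, many orders of magnitude below the 64 SNPs observed.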
There appear to be at least six possible explanations for this
unexpectedly large number of unlinked SNPs that are in fairly strong LD
with Mel-Stripe:
(1) There exist technical errors that result in spurious cases of LD or
incorrect assignment of the locations of SNPs; for example, SNPs
that are actually in the Mel-Stripe region may be placed on other
chromosomes. The quality control details needed to evaluate this
possibility were not presented by the authors.
(2) LD between neutral SNPs and the Mel-Stripe locus has been
created by random genetic drift in a panmictic population over a long
period of time. This seems improbable, as the expected value of
\(r^{2}\) with no linkage is approximately
\(1/(2N_{e})\) (Charlesworth & Charlesworth 2010,
p.383), and the size of the T. cristinae population used in the
experiment is said to be of the order of thousands of individuals (Gompert
et al. 2021).
(3) There has been very recent admixture from a genetically distinct
population or populations, resulting in patches of LD with
Mel-Stripe (Charlesworth & Charlesworth 2010, p.388). The
structure/admixture modelling needed to evaluate this possibility
appears not to have been performed.
(4) There has been a recent and severe bottleneck in population size,
generating random LD of a much higher magnitude than expected under a
constant population size (Charlesworth & Charlesworth 2010, p.389). As
with point 3, these signatures may be detectable using standard
demographic modeling approaches (reviewed in Beichman et al.
2018); again, such analyses appear not to have been performed.
(5) The sample used in the experiment has captured an ongoing sweep
involving very strong directional selection on Mel-Stripe; this
seems implausible, given the evidence that the Mel-Stripe
variants represent a long-standing balanced polymorphism (Lindtke
et al. 2017).
(6) There could be epistatic fitness interactions between the SNP loci
and Mel-Stripe. This of course does not constitute “indirect
selection” (i.e., genetic hitchhiking) of the type proposed by
Gompert et al. (2021), but is perhaps what they have in mind when
they refer to “polygenic selection”. However, it stretches
credulity that there could be tens of unlinked loci subject to very
strong epistatic fitness interactions with Mel-Stripe, resulting
in significant LD. Theoretical analyses of two-locus balanced
polymorphisms have shown that substantial LD requires the measure of
additive x additive epistasis for fitness to considerably exceed the
recombination frequency (Charlesworth & Charlesworth 2010, pp.420-425).
For example, in the simple symmetric two-locus fitness model of Lewontin
& Kojima (1960), LD is maintained at equilibrium with free
recombination only if \(2 \leq e\), where e is the epistatic
fitness parameter (fitnesses are here measured relative to the fitness
of the double heterozygote). In this case, with equilibrium allele
frequencies of 0.5 at both loci, \(r^{2} = 16D^{2} = 1 - 2/e\). Very few convincing
cases of LD maintained by epistatic selection among unlinked loci have
been described. The classic case is that of the Australian grasshopper
Keyacris scurra (formerly Moraba scurra), involving two
inversion polymorphisms that show a consistent pattern of LD across
multiple populations, and substantial deviations from Hardy-Weinberg
frequencies that indicate strong viability selection (Turner 1972). Even
here, the magnitude of \(r^{2}\) is small. In the
sample that showed the highest value of D (i.e., Royalla B
1958), the data in Table 3 of Turner (1972) give D = 0.00116 and
\(r^{2}\) = 0.00349.
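The free-recombination expression quoted under point (6) for the symmetric Lewontin & Kojima (1960) model, \(r^{2} = 1 - 2/e\), can be inverted to ask how much epistasis a given level of LD would require. The sketch below uses only that expression as stated in the text; the function names are hypothetical, and values of e below 2 are taken to give no LD at equilibrium, following the condition given above.

```python
def lk_r2_free_recombination(e):
    """Equilibrium r^2 in the symmetric two-locus model with free recombination.

    Uses r^2 = 1 - 2/e as quoted in the text; LD is maintained only if e >= 2,
    so smaller values of e return 0.
    """
    if e < 2:
        return 0.0
    return 1.0 - 2.0 / e


def epistasis_needed(r2):
    """Epistatic parameter e required to maintain a given r^2 (0 <= r2 < 1)."""
    if not (0 <= r2 < 1):
        raise ValueError("r2 must satisfy 0 <= r2 < 1")
    return 2.0 / (1.0 - r2)


# Epistasis required to maintain the r^2 = 0.1 threshold between unlinked loci.
e_for_threshold = epistasis_needed(0.1)
```

Maintaining \(r^{2} = 0.1\) between unlinked loci in this model would need e of about 2.2, with fitnesses measured relative to the double heterozygote, which gives a sense of how strong the interactions would have to be at each of the loci concerned.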
In summary, the observation of many unlinked SNPs with substantial LD in
T. cristinae seems to raise more questions than it answers, and
cannot be taken as indicating widespread “indirect selection” without
much more evidence, including a proper consideration of baseline
expectations related to both technical (e.g., data quality) and
evolutionary (e.g., population history) factors. Furthermore,
even if these alternatives were to be ruled out by further analyses,
genetic hitchhiking is neither a new nor poorly studied phenomenon.
REFERENCES
Arbeithuber, B., Betancourt, A.J., Ebner, T., & Tiemann-Boege, I.
(2015). Crossovers are associated with mutation and biased gene
conversion at recombination hotspots. Proceedings of the National
Academy of Sciences of the USA, 112, 2109–14.
Barrett, R.D.H., Laurent, S., Mallarino, R., Pfeifer, S.P., Xu, C.C.,
Foll, M., … Hoekstra, H.E. (2019). Linking a mutation to survival
in wild mice. Science, 363, 499–504.
Begun, D., & Aquadro, C.F. (1992). Levels of naturally occurring DNA
polymorphism correlate with recombination rate in Drosophila
melanogaster. Nature, 356, 519–20.
Beichman, A.C., Huerta-Sanchez, E., & Lohmueller, K.E. (2018). Using
genomic data to infer historic population dynamics of non-model
organisms. Annual Review of Ecology, Evolution and Systematics, 49, 433–56.
Charlesworth, B. (2012). The effects of deleterious mutations on
evolution at linked sites. Genetics, 190, 5–22.
Charlesworth, B., & Charlesworth, D. (2010). Elements of
evolutionary genetics. Greenwood Village, CO: Roberts and Company.
Charlesworth, B., & Jensen, J.D. (2021). Effects of selection at linked
sites on patterns of genetic variability. Annual Review of
Ecology, Evolution and Systematics, 52, 177–97.
Charlesworth, B., Morgan, M.T., & Charlesworth, D. (1993). The effect
of deleterious mutations on neutral molecular variation. Genetics,
134, 1289–303.
Comeron, J.M. (2017). Background selection as null hypothesis in
population genomics: insights and challenges from Drosophila
studies. Philosophical Transactions of the Royal Society, Series B,
372, 20160471.
Gompert, Z., Feder, J.L., & Nosil, P. (2021). Natural selection drives
genome-wide evolution via chance genetic associations. Molecular
Ecology, 00, 1–15. https://doi.org/10.1111/mec.16247.
Jensen, J.D., Payseur, B.A., Stephan, W., Aquadro, C.F., Lynch, M., …
Charlesworth, B. (2019). The importance of the Neutral Theory in 1968
and 50 years on: a response to Kern & Hahn 2018. Evolution, 73, 111–4.
Kim, Y., & Stephan, W. (2002). Detecting a local signature of genetic
hitchhiking along a recombining chromosome. Genetics, 160,
765–77.
Lewontin, R.C., & Kojima, K.-I. (1960). The evolutionary dynamics of
complex polymorphisms. Evolution, 14, 458–472.
Lindtke, D., Lucek, K., Soria-Carrasco, V., Villoutreix, R., Farkas,
T.E., Riesch, R., … Nosil, P. (2017). Long-term balancing selection on
chromosomal variants associated with crypsis in a stick insect.
Molecular Ecology, 26, 6189–6205.
Maynard Smith, J., & Haigh, J. (1974). The hitch-hiking effect of a
favourable gene. Genetical Research, 23, 23–35.
Pouyet, F., Aeschbacher, S., Thiery, A., & Excoffier, L. (2018).
Background selection and biased gene conversion affect more than 95% of
the human genome and bias demographic inferences. eLife, 7,
e36317.
Pratto, F., Brick, K., Khil, P., Smagulova, F., Petukhova, G.V., &
Camerini-Otero, R.D. (2014). DNA Recombination. Recombination initiation
maps of individual human genomes. Science, 346, 1256442.
Price, G.R. (1970). Selection and covariance. Nature, 227, 520–521.
Przeworski, M. (2002). The signature of positive selection at randomly
chosen loci. Genetics, 160, 1179–89.
Robertson, A. (1968). The spectrum of genetic variation (ed. Lewontin
R.C.), pp. 5–16. Syracuse, NY: Syracuse University Press.
Santiago, E., & Caballero, A. (1995). Effective size of populations
under selection. Genetics, 139, 1013–1030.
Turner, J.R.G. (1972). Selection and stability in the complex
polymorphism of Moraba scurra. Evolution, 26,
334–343.
Zeng, K., Charlesworth, B., & Hobolth, A. (2021). Studying models of
balancing selection using phase-type theory. Genetics, 218, iyab055.