Tests of gene flow and dynamic history inference
To characterize the demographic histories of the four lineages, we
removed hybrids to investigate gene flow, effective population size (Ne)
and divergence time. We first examined the gene flow between lineages
through the ABBA-BABA test. For four lineages, (((NE, EL), CN),
outgroup), (((CN, NW), NE), outgroup) and (((CN, NW), EL), outgroup)
showed a significant deviation of D-stat from zero (absolute value of
Z-score greater than 3), indicating that gene flow exists between
lineages EL and CN, CN and NE, CN and EL. Interestingly, no gene flow
was detected between the NW lineage and other lineages (Figure 3A and
Table S5). Next, we employed TreeMix to further investigate the complex
patterns of gene flow between populations. The results of TreeMix
analysis inferred that recent gene flow was only exhibited in the EL and
NE lineages, while historical gene flow was exhibited in NE, EL and CN
lineages (Figure S7).
Combined with the results of the ABBA-BABA test and TreeMix analysis,
the evolutionary history of the four lineages was inferred by
fastsimicoal2 using pairwise joint site frequency spectra. Based on the
best-supported model (Table S6), the effective population size was
927,589 for the ancestors. The ancestral population was differentiated
into eastern and western lineages at approximately 239 Kya (95% highest
posterior density (HPD) = 217–267 Kya), and their effective population
size were 385,321 and 433,093, respectively. Next, CN and NW separated
at approximately 211 Kya (95% HPD = 196–227 Kya), and NE and EL
separated at 168 Kya (95% HPD = 153–184 Kya). All the divergence times
were in the Middle Pleistocene. The current effective population sizes
of NE, EL, CN and NW were 561,674, 99,440, 433,094 and 385,321,
respectively (Figure 3B and Table S7). In addition, the divergence of
the four lineages accompanied six bidirectional gene flow events
inferred by fastsimcoal2, including three ancient gene flow and three
modern gene flow. The modern gene flow was higher from NE to EL than
others (CN to EL > CN to NE > EL to NE
> NE to CN > EL to CN), which indicated gene
flow between contacting lineages than those geographically isolated
ones. The credible lineage divergence pattern had a better fit based on
the comparison of the simulated dataset with the observed site frequency
spectra (Figure S9).
FST was calculated across different lineages (NE, EL, CN
and NW) to infer population genetic differentiation. At the overall
level of the genome, the FST values had the highest
value between EL and NW (> CN vs. NW > NE vs.
EL > EL vs. CN > NE vs. NW > NE
vs. CN), which indicated that the A. viridiflora complex had a
moderate level of differentiation. Lineage NW was the most
differentiated from the other lineages may be due to the lack of recent
gene flow. This pattern was also obvious from the FSTand DXY values calculated for the four lineages using 10
kb windows across genomes, which were consistent with
FST at the overall level of the genome (Figure 3C). We
compared
linkage
disequilibrium decay between different chromosomes of four lineages. The
analysis showed that CN and EL presented a greater degree of LD, while
NE and NW showed less linkage disequilibrium (indicated by
r2). When r2 = 0.1, the decay
distances of NE, EL, CN and NW were 10 kb, 37 kb, 44 kb and 6.9 kb,
respectively (Figure 3D). The rapid decay of NE and NW may have been due
to the higher genetic diversity relative to that of EL and CN.