Tests of gene flow and dynamic history inference
To characterize the demographic histories of the four lineages, we removed hybrids to investigate gene flow, effective population size (Ne) and divergence time. We first examined the gene flow between lineages through the ABBA-BABA test. For four lineages, (((NE, EL), CN), outgroup), (((CN, NW), NE), outgroup) and (((CN, NW), EL), outgroup) showed a significant deviation of D-stat from zero (absolute value of Z-score greater than 3), indicating that gene flow exists between lineages EL and CN, CN and NE, CN and EL. Interestingly, no gene flow was detected between the NW lineage and other lineages (Figure 3A and Table S5). Next, we employed TreeMix to further investigate the complex patterns of gene flow between populations. The results of TreeMix analysis inferred that recent gene flow was only exhibited in the EL and NE lineages, while historical gene flow was exhibited in NE, EL and CN lineages (Figure S7).
Combined with the results of the ABBA-BABA test and TreeMix analysis, the evolutionary history of the four lineages was inferred by fastsimicoal2 using pairwise joint site frequency spectra. Based on the best-supported model (Table S6), the effective population size was 927,589 for the ancestors. The ancestral population was differentiated into eastern and western lineages at approximately 239 Kya (95% highest posterior density (HPD) = 217–267 Kya), and their effective population size were 385,321 and 433,093, respectively. Next, CN and NW separated at approximately 211 Kya (95% HPD = 196–227 Kya), and NE and EL separated at 168 Kya (95% HPD = 153–184 Kya). All the divergence times were in the Middle Pleistocene. The current effective population sizes of NE, EL, CN and NW were 561,674, 99,440, 433,094 and 385,321, respectively (Figure 3B and Table S7). In addition, the divergence of the four lineages accompanied six bidirectional gene flow events inferred by fastsimcoal2, including three ancient gene flow and three modern gene flow. The modern gene flow was higher from NE to EL than others (CN to EL > CN to NE > EL to NE > NE to CN > EL to CN), which indicated gene flow between contacting lineages than those geographically isolated ones. The credible lineage divergence pattern had a better fit based on the comparison of the simulated dataset with the observed site frequency spectra (Figure S9).
FST was calculated across different lineages (NE, EL, CN and NW) to infer population genetic differentiation. At the overall level of the genome, the FST values had the highest value between EL and NW (> CN vs. NW > NE vs. EL > EL vs. CN > NE vs. NW > NE vs. CN), which indicated that the A. viridiflora complex had a moderate level of differentiation. Lineage NW was the most differentiated from the other lineages may be due to the lack of recent gene flow. This pattern was also obvious from the FSTand DXY values calculated for the four lineages using 10 kb windows across genomes, which were consistent with FST at the overall level of the genome (Figure 3C). We compared linkage disequilibrium decay between different chromosomes of four lineages. The analysis showed that CN and EL presented a greater degree of LD, while NE and NW showed less linkage disequilibrium (indicated by r2). When r2 = 0.1, the decay distances of NE, EL, CN and NW were 10 kb, 37 kb, 44 kb and 6.9 kb, respectively (Figure 3D). The rapid decay of NE and NW may have been due to the higher genetic diversity relative to that of EL and CN.