Global population structure
To investigate genetic clustering of P. vivax populations, we used the biallelic SNPs as input for principal component analysis (PCA) and phylogenetic analysis. Both analyses reveal three major clusters consistent with geographical origin (Figure 2A and B). Isolates from ESEA and MSEA form a differentiated cluster close to isolates from OCE. Isolates from AFR cluster close to isolates from WAS; however, these two regions are clearly separated along the fourth principal component (Supplementary Figure 1) and form separate clades in the tree (Figure 2B). Isolates from LAM form a distinct cluster in the PCA and a distinct clade in the tree. Together, these results indicate that the global P. vivax population is genetically diverse, as confirmed by high nucleotide diversity (Supplementary Figure 2), and geographically structured.
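As an illustration of how a PCA on biallelic SNPs separates samples by origin, the following is a minimal sketch and not the pipeline used in this study. It assumes a hypothetical samples-by-SNPs matrix of alternate-allele counts (0 or 1 for haploid calls); the function name `genotype_pca` and the toy matrix are illustrative only.

```python
import numpy as np

def genotype_pca(genotypes, n_components=4):
    """Project samples onto the leading principal components.

    genotypes: (n_samples, n_snps) array of alternate-allele counts
    (hypothetical input; 0/1 for haploid calls at biallelic SNPs).
    Returns the sample coordinates and the fraction of variance
    explained by each retained component.
    """
    g = np.asarray(genotypes, dtype=float)
    # Monomorphic sites carry no information about structure; drop them.
    g = g[:, g.std(axis=0) > 0]
    # Centre each SNP on its mean allele count.
    centered = g - g.mean(axis=0)
    # SVD of the centred matrix yields the principal axes directly.
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    k = min(n_components, len(s))
    coords = u[:, :k] * s[:k]
    explained = s[:k] ** 2 / np.sum(s ** 2)
    return coords, explained

# Toy data (not from the study): rows 0-2 and rows 3-5 mimic two populations
# carrying largely different alleles.
toy = np.array([
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1, 1],
    [0, 0, 0, 1, 1, 0],
])
coords, explained = genotype_pca(toy, n_components=2)
print(coords[:, 0])  # the two groups fall on opposite sides of PC1
print(explained)
```

In real data, later components can separate groups that overlap on the first ones, which is why inspecting components beyond the first two can be informative.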
Admixture analysis estimated ten (K=10) geographically distinct ancestral populations (Figure 2C). All genomes from AFR, WAS and OCE were predicted to belong predominantly to a single shared ancestry within each region, whereas genomes from LAM, ESEA and MSEA belong to distinct subpopulations within each region (i.e. ancestral populations within a region; Figure 2C). Admixture (predicted ancestry from more than one cluster) is mostly observed between subpopulations within a region (e.g., in LAM and ESEA) and rarely between regions, with the exception of the admixture with WAS observed in AFR.
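To make the notion of admixture concrete, the sketch below shows one way to summarise a matrix of ancestry proportions (one row per isolate, one column per ancestral population). It is a simplified illustration under stated assumptions, not the procedure used here: the 0.8 cut-off for calling an isolate admixed, the function name `summarize_ancestry`, and the toy proportions are all hypothetical.

```python
import numpy as np

def summarize_ancestry(q, threshold=0.8):
    """Assign each isolate to its major ancestral population and flag admixture.

    q: (n_samples, K) array of ancestry proportions (hypothetical input).
    threshold: assumed cut-off; an isolate whose largest proportion falls
    below it is flagged as admixed.
    """
    q = np.asarray(q, dtype=float)
    # Renormalise rows so that proportions sum to one.
    q = q / q.sum(axis=1, keepdims=True)
    major = q.argmax(axis=1)            # index of the dominant ancestry
    admixed = q.max(axis=1) < threshold  # True when no single ancestry dominates
    return major, admixed

# Toy proportions for three isolates and K=3 (not from the study).
toy_q = np.array([
    [0.97, 0.02, 0.01],  # predominantly a single ancestry
    [0.55, 0.40, 0.05],  # mixed ancestry between two populations
    [0.05, 0.10, 0.85],  # predominantly a single ancestry
])
major, admixed = summarize_ancestry(toy_q)
print(major)
print(admixed)
```

The choice of threshold affects how many isolates are reported as admixed, so any such cut-off should be stated alongside the results.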
In the phylogenetic tree, isolates from WAS form two separate clades, with the upper clade containing isolates from India (Figure 2B). This separate subpopulation was not supported by the admixture analysis, which estimated a single ancestral cluster for this region (Figure 2C). Therefore, while Indian isolates might be distinct from other isolates in WAS, all P. vivax isolates from this region share a common ancestry. The greatest admixture is observed among the three subpopulations in LAM (mixed ancestry proportions from K7 and K10 and, to a lesser extent, K4), indicating shared ancestry or gene flow between these subpopulations (Figure 2C).