Statistical Analyses
After alignment, we discarded the monomorphic nucleotide positions and
retained only the polymorphic ones. This left 30 positions
for EF1 and 33 positions for NaKA. The following analysis was done
for each marker separately. For each of the 14 populations, we
calculated the distribution of the four different nucleotides (A, C, G
and T) in each of the nucleotide positions. Note that if $x_1$, $x_2$, $x_3$ and $x_4$ are the
proportions of A, C, G and T in a position, then $x = (x_1, x_2, x_3, x_4)$ is a point in
four-dimensional space whose coordinates sum to 1. The corresponding
point $x^* = (\sqrt{x_1}, \sqrt{x_2}, \sqrt{x_3}, \sqrt{x_4})$ lies on the surface of the four-dimensional unit
sphere. Next we compared, for each position, the distribution of the
four different nucleotides between the 14 different populations, by
using a distance or a similarity measure (see below). This gave, for
each position, 91 pairwise distances (or similarities). These distances
(or similarities) were averaged over all relevant positions of the
marker, to obtain the final pairwise distances (or similarities) for the
marker. The results were arranged in a 14×14 symmetric distance (or
similarity) matrix. We then added the two distance (or similarity)
matrices (one for each marker) to obtain the comprehensive distance (or
similarity) between the populations. This final matrix served for
constructing population dendrograms or for performing a Principal
Coordinates Analysis (PCoA).
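The procedure described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the software used in the study: the function names, the toy alignments and the population labels (P1 to P3) are hypothetical, and the squared Euclidean distance stands in for whichever measure is chosen.

```python
from itertools import combinations

NUCLEOTIDES = "ACGT"

def nucleotide_proportions(column):
    """Proportions of A, C, G and T among one population's sequences at one position."""
    counts = [column.count(n) for n in NUCLEOTIDES]
    total = sum(counts)
    return [c / total for c in counts]

def squared_euclidean(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))

def marker_distance_matrix(alignments, distance=squared_euclidean):
    """alignments maps population -> list of aligned sequences (polymorphic
    positions only). Returns {(pop_i, pop_j): distance averaged over positions}."""
    pops = sorted(alignments)
    n_positions = len(alignments[pops[0]][0])
    # props[pop][k] is the proportion vector of that population at position k
    props = {
        pop: [nucleotide_proportions([seq[k] for seq in seqs])
              for k in range(n_positions)]
        for pop, seqs in alignments.items()
    }
    result = {}
    for p, q in combinations(pops, 2):
        per_position = [distance(props[p][k], props[q][k])
                        for k in range(n_positions)]
        result[(p, q)] = sum(per_position) / n_positions
    return result

def combine_markers(m1, m2):
    """Add the two per-marker matrices entry by entry."""
    return {pair: m1[pair] + m2[pair] for pair in m1}

# Hypothetical toy data: three populations, two markers.
ef1 = {"P1": ["AC", "AC"], "P2": ["AT", "GT"], "P3": ["GC", "GC"]}
naka = {"P1": ["A", "A"], "P2": ["A", "C"], "P3": ["C", "C"]}
combined = combine_markers(marker_distance_matrix(ef1),
                           marker_distance_matrix(naka))
```

With 14 populations, `combinations` yields 14 × 13 / 2 = 91 pairs, which matches the number of pairwise distances per position stated in the text.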
We used three different distance measures, the squared Euclidean
distance, a modified squared chord distance and the Manhattan (or city
block) distance, and one similarity measure, a modified Morisita’s
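similarity coefficient. If $x_1$, $x_2$, $x_3$ and $x_4$ are the proportions of A, C, G and T in
population 1, and $y_1$, $y_2$, $y_3$ and $y_4$ are these
proportions in population 2, then: the squared Euclidean distance = $\sum_{i=1}^{4} (x_i - y_i)^2$;
the modified squared chord distance = $\sum_{i=1}^{4} (\sqrt{x_i} - \sqrt{y_i})^2$ (which is actually the squared
length of the chord connecting $x^*$ and $y^*$ on the unit
sphere); the Manhattan distance = $\sum_{i=1}^{4} |x_i - y_i|$; and the modified Morisita’s
similarity coefficient = $2\sum_{i=1}^{4} x_i y_i \,/\, \left(\sum_{i=1}^{4} x_i^2 + \sum_{i=1}^{4} y_i^2\right)$.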
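As a rough illustration, the four measures could be computed as below. The forms used here are assumptions based on the usual textbook definitions: the chord distance is taken on square-root-transformed proportions, and the modified Morisita coefficient is taken in its Morisita-Horn form for proportions. The function names and the example vectors are hypothetical.

```python
from math import sqrt

def squared_euclidean(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))

def squared_chord(x, y):
    # Assumed form: squared distance between the square-root-transformed points.
    return sum((sqrt(a) - sqrt(b)) ** 2 for a, b in zip(x, y))

def manhattan(x, y):
    return sum(abs(a - b) for a, b in zip(x, y))

def modified_morisita(x, y):
    # Assumed form (Morisita-Horn for proportions); this is a similarity.
    numerator = 2 * sum(a * b for a, b in zip(x, y))
    denominator = sum(a * a for a in x) + sum(b * b for b in y)
    return numerator / denominator

# Hypothetical proportions of A, C, G and T at one position in two populations.
x = [0.5, 0.5, 0.0, 0.0]
y = [0.0, 0.5, 0.5, 0.0]

d_euclid = squared_euclidean(x, y)   # 0.25 + 0 + 0.25 + 0 = 0.5
d_chord = squared_chord(x, y)        # approximately 1.0
d_manhattan = manhattan(x, y)        # 0.5 + 0 + 0.5 + 0 = 1.0
s_morisita = modified_morisita(x, y) # 2 * 0.25 / (0.5 + 0.5) = 0.5
```

Identical distributions give a distance of 0 under the three distance measures and a similarity of 1 under the Morisita-type coefficient.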
We considered three different amalgamation procedures – UPGMA
(unweighted pair group method with arithmetic mean), minimum variance
(Ward’s method) and furthest neighbor (complete-linkage clustering), as
well as PCoA, using the MVSP software (Kovach Computing
Services, 2013). We could thus construct ten different unconstrained trees
(i.e., all different combinations, except that minimum variance is only
applicable for the squared Euclidean or the squared chord distances).
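The count of ten can be checked by enumerating the combinations directly. The short sketch below does only that; the string labels are illustrative and do not correspond to MVSP option names.

```python
from itertools import product

measures = ["squared Euclidean", "squared chord", "Manhattan", "modified Morisita"]
methods = ["UPGMA", "minimum variance", "furthest neighbor"]

# Minimum variance (Ward's method) is restricted to these two measures.
ward_compatible = {"squared Euclidean", "squared chord"}

trees = [
    (measure, method)
    for measure, method in product(measures, methods)
    if method != "minimum variance" or measure in ward_compatible
]
```

Four measures times three procedures gives twelve combinations, and removing minimum variance for the Manhattan and Morisita measures leaves ten.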
For each population we calculated, separately for each marker, the mean
number of different alleles per position. We then averaged over the two
markers to obtain the overall mean number of different alleles per
position in this population. In addition, we calculated for each
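population the mean expected heterozygosity of a marker, defined as $H = \frac{1}{N}\sum_{k=1}^{N}\left(1 - \sum_{i=1}^{4} x_{ik}^2\right)$,
where $x_{1k}$, $x_{2k}$, $x_{3k}$ and $x_{4k}$ are the proportions of A, C, G and T in position $k$ ($k = 1, 2, \ldots, N$, where $N$ is the number of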
positions in the marker). We then averaged the measures of the two
markers, to obtain the expected heterozygosity of the relevant
population. Similarly, for each population, we calculated the percentage
of polymorphic positions in each marker, and then averaged the
percentages of the two markers, to obtain the polymorphism measure of
the relevant population.
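A minimal sketch of these three diversity measures follows. It assumes that the expected heterozygosity at a position is one minus the sum of squared nucleotide proportions, with no sample-size correction. The function names and the toy sequences are hypothetical.

```python
NUCLEOTIDES = "ACGT"

def diversity_measures(sequences):
    """Diversity measures for one population and one marker.
    sequences: aligned sequences restricted to the marker's positions."""
    n_positions = len(sequences[0])
    total_alleles = 0
    total_heterozygosity = 0.0
    n_polymorphic = 0
    for k in range(n_positions):
        column = [seq[k] for seq in sequences]
        props = [column.count(n) / len(column) for n in NUCLEOTIDES]
        n_alleles = sum(1 for p in props if p > 0)
        total_alleles += n_alleles
        # Assumed form of expected heterozygosity at one position.
        total_heterozygosity += 1 - sum(p * p for p in props)
        if n_alleles > 1:
            n_polymorphic += 1
    return {
        "mean_alleles": total_alleles / n_positions,
        "expected_heterozygosity": total_heterozygosity / n_positions,
        "percent_polymorphic": 100 * n_polymorphic / n_positions,
    }

def average_over_markers(a, b):
    """Average each measure over the two markers."""
    return {key: (a[key] + b[key]) / 2 for key in a}

# Hypothetical toy data for one population.
ef1 = diversity_measures(["AC", "AT", "AT", "AC"])
naka = diversity_measures(["AG", "CG", "AG", "AG"])
overall = average_over_markers(ef1, naka)
```

In this toy example each marker has one monomorphic and one polymorphic position within the population, so both markers give 1.5 alleles per position and 50% polymorphic positions.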
We call an allele that is present in a population X (with a frequency
of at least 1%) exclusive to that population if it is absent from the
other population or populations with which population X is compared.
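The definition can be expressed as a small filter. The sketch below assumes that "not present" in a comparison population means a frequency of exactly zero. The function name, the allele labels and the frequencies are hypothetical.

```python
def exclusive_alleles(freqs_x, freqs_others, min_freq=0.01):
    """Alleles exclusive to population X.
    freqs_x: dict allele -> frequency in population X.
    freqs_others: list of such dicts, one per comparison population."""
    return {
        allele
        for allele, freq in freqs_x.items()
        if freq >= min_freq
        # Assumption: presence elsewhere means any non-zero frequency.
        and all(other.get(allele, 0.0) == 0.0 for other in freqs_others)
    }

# Hypothetical allele frequencies at one position.
pop_x = {"A": 0.6, "G": 0.395, "T": 0.005}
others = [{"A": 1.0}, {"A": 0.9, "C": 0.1}]

result = exclusive_alleles(pop_x, others)  # {"G"}
```

Here T is absent from the comparison populations but falls below the 1% threshold in population X, so only G qualifies.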
Statistical tests were carried out using IBM SPSS Statistics 26. All p-values are given for a two-tailed alternative.