2.5 Population genomic summary statistics
A series of diversity and demographic statistics were estimated from the
folded site frequency spectrum (SFS) with ANGSD. Genome-wide
heterozygosity was estimated per sample by first computing the folded
site allele frequency (SAF) likelihoods, using the reference genome as the
ancestral state, and then estimating the folded SFS. The folded SFS was calculated
independently for each sampling site after removing admixed individuals
(less than 70% ancestry assigned to any single cluster at K = 4) and migrant
individuals (those whose ancestry differed from the prevalent cluster of
their sampling location). We then estimated Watterson’s theta and
Tajima’s D using a sliding-window approach with a window size of 50 kb
and a step size of 10 kb. Individual inbreeding coefficients (F)
were estimated with the software ngsF (v1.2.0) (Vieira et al.,
2013). First, initial F estimates were obtained with the approximate EM
algorithm (--approx_EM), using random initial values and a maximum
root-mean-square difference (RMSD) between iterations of
1 × 10⁻⁵ (--min_epsilon). The estimates from this first run were then
used as initial values for a final run, in which the convergence threshold
(--min_epsilon) was lowered to 1 × 10⁻⁷. To avoid
convergence to local maxima, this two-step analysis was repeated ten
times, as suggested by the authors (Vieira et al., 2013).
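As a hedged illustration of the per-sample heterozygosity step: for one diploid individual the folded SFS has only two bins, the (expected) number of sites carrying zero and one minor-allele copies, so heterozygosity is the share of sites in the second bin. The sketch below assumes that two-value SFS has already been estimated (for example with realSFS) and stored as whitespace-separated numbers on one line; the function names are hypothetical.

```python
"""Sketch: per-sample heterozygosity from a folded single-individual SFS."""


def heterozygosity_from_folded_sfs(sfs: list[float]) -> float:
    """Return the proportion of heterozygous sites.

    Assumes `sfs` is the folded SFS of ONE diploid individual, i.e. two values:
    [sites with 0 minor-allele copies, sites with 1 minor-allele copy].
    """
    if len(sfs) != 2:
        raise ValueError("expected a 2-bin folded SFS for one diploid sample")
    total = sum(sfs)
    if total <= 0:
        raise ValueError("SFS contains no sites")
    return sfs[1] / total


def read_sfs_line(path: str) -> list[float]:
    """Parse a whitespace-separated SFS from the first line of `path` (assumed format)."""
    with open(path) as handle:
        return [float(x) for x in handle.readline().split()]


if __name__ == "__main__":
    # Hypothetical numbers: 999,000 homozygous vs 1,000 heterozygous sites.
    example = [999000.0, 1000.0]
    print(f"{heterozygosity_from_folded_sfs(example):.6f}")  # 0.001000
```

Working from the SFS rather than from called genotypes keeps the estimate within the genotype-likelihood framework used throughout the analysis.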
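The exclusion of admixed and migrant individuals before the per-site SFS estimation can be sketched as follows. The inputs are assumptions: a K = 4 ancestry-proportion matrix (one row per individual; the software that produced it is not named here) and a sampling-site label per individual. The sketch also assumes that a site's prevalent cluster is the most common dominant cluster among its non-admixed individuals, which is one reading of the criterion above; `filter_individuals` is a hypothetical helper.

```python
"""Sketch: drop admixed and migrant individuals before per-site SFS estimation."""
from collections import Counter


def filter_individuals(q, sites, threshold=0.70):
    """Return indices of individuals kept for per-site SFS estimation.

    q: ancestry-proportion rows, one per individual (K = 4 columns).
    sites: sampling-site label for each individual.
    threshold: minimum ancestry to a single cluster; below it = admixed.

    Assumption: a site's prevalent cluster is the most common dominant
    cluster among that site's non-admixed individuals.
    """
    dominant = [max(range(len(row)), key=row.__getitem__) for row in q]
    non_admixed = [i for i, row in enumerate(q) if max(row) >= threshold]

    # Count dominant clusters per site, over non-admixed individuals only.
    counts = {}
    for i in non_admixed:
        counts.setdefault(sites[i], Counter())[dominant[i]] += 1
    prevalent = {site: c.most_common(1)[0][0] for site, c in counts.items()}

    # Keep non-admixed individuals whose dominant cluster matches their site.
    return [i for i in non_admixed if dominant[i] == prevalent[sites[i]]]


if __name__ == "__main__":
    q = [
        [0.95, 0.03, 0.01, 0.01],  # site A, cluster 0
        [0.90, 0.05, 0.03, 0.02],  # site A, cluster 0
        [0.10, 0.85, 0.03, 0.02],  # site A, cluster 1 -> migrant
        [0.50, 0.40, 0.05, 0.05],  # site B, admixed (< 0.70)
        [0.02, 0.03, 0.92, 0.03],  # site B, cluster 2
    ]
    sites = ["A", "A", "A", "B", "B"]
    print(filter_individuals(q, sites))  # [0, 1, 4]
```

Applying the admixture filter first means a single admixed individual cannot shift which cluster counts as prevalent at its site.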
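For reference, the standard estimators behind the window statistics can be computed from a folded SFS, because both the number of segregating sites and the pairwise-difference estimator depend only on the minor-allele count at each site. This is not ANGSD's implementation, which derives per-site thetas from genotype likelihoods and summarises them per window (for example with thetaStat); the sketch assumes a folded SFS is already available for each window and, as a further assumption, only generates windows that fit entirely within a chromosome. Function names are hypothetical.

```python
"""Sketch: Watterson's theta and Tajima's D from a folded SFS, per sliding window."""
import math


def window_starts(chrom_len, size=50_000, step=10_000):
    """Yield (start, end) for windows of `size` bp advancing by `step` bp.

    Assumption: only windows lying entirely within the chromosome are kept.
    """
    start = 0
    while start + size <= chrom_len:
        yield start, start + size
        start += step


def theta_w_and_tajimas_d(folded_sfs, n):
    """Return (theta_W, theta_pi, Tajima's D) for one window.

    folded_sfs[k] = number of sites with k minor-allele copies, k = 0..n // 2;
    n = number of sampled chromosomes (2 x number of diploid individuals).
    Both thetas are window totals, not per-site values.
    """
    if len(folded_sfs) != n // 2 + 1:
        raise ValueError("folded SFS must have n // 2 + 1 bins")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))
    s = sum(folded_sfs[1:])  # segregating sites
    theta_w = s / a1
    # k * (n - k) is symmetric in k and n - k, so pi needs only the folded SFS.
    pairs = n * (n - 1) / 2
    theta_pi = sum(k * (n - k) * c for k, c in enumerate(folded_sfs)) / pairs

    # Tajima's variance constants for the normalised difference.
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1, e2 = c1 / a1, c2 / (a1**2 + a2)
    var = e1 * s + e2 * s * (s - 1)
    d = (theta_pi - theta_w) / math.sqrt(var) if var > 0 else float("nan")
    return theta_w, theta_pi, d


if __name__ == "__main__":
    print(list(window_starts(70_000)))
    # [(0, 50000), (10000, 60000), (20000, 70000)]
    # Hypothetical window SFS for 10 diploid individuals (n = 20 chromosomes).
    sfs = [49_900, 40, 20, 15, 10, 5, 4, 3, 2, 1, 0]
    print(theta_w_and_tajimas_d(sfs, n=20))
```

With a 50 kb window and a 10 kb step, each position contributes to up to five overlapping windows, so neighbouring window values are not independent.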
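To make the two-step ngsF procedure more concrete, the sketch below fits the inbreeding model underlying ngsF, in which a site's genotype is identical by descent (IBD) with probability F and otherwise follows Hardy-Weinberg proportions, by expectation-maximisation for a single individual. It is a simplification and not ngsF's code: allele frequencies are treated as known, the approximate EM of the first ngsF step is not reproduced (both steps here use the same EM and differ only in the convergence threshold), and keeping the replicate with the highest log-likelihood is an assumption of this sketch. With F as the only free parameter the log-likelihood is concave, so every start reaches the same maximum; the restarts are included only to mirror the procedure described above. All function names are hypothetical.

```python
"""Sketch of a single-individual EM for the inbreeding coefficient F.

Model (as in ngsF): at each site the genotype is IBD with probability F
(genotype 0 or 2) and otherwise in Hardy-Weinberg proportions. Unlike ngsF,
allele frequencies are assumed known and only one individual is fitted.
"""
import math
import random


def site_probs(gl, q):
    """Return (P(data | IBD), P(data | HWE)) at one site.

    gl = (L0, L1, L2): genotype likelihoods for 0, 1, 2 copies of the allele
    whose population frequency is q.
    """
    l0, l1, l2 = gl
    ibd = l0 * (1 - q) + l2 * q
    hwe = l0 * (1 - q) ** 2 + 2 * l1 * q * (1 - q) + l2 * q**2
    return ibd, hwe


def em_inbreeding(gls, freqs, f0, min_epsilon, max_iter=10_000):
    """EM from starting value f0; stop when |F_new - F_old| < min_epsilon
    (the RMSD between iterations when F is the only parameter)."""
    terms = [site_probs(gl, q) for gl, q in zip(gls, freqs)]
    f = f0
    for _ in range(max_iter):
        # E-step: posterior probability that each site is IBD.
        # M-step: the new F is the mean of those posteriors.
        f_new = sum(f * a / (f * a + (1 - f) * b) for a, b in terms) / len(terms)
        delta = abs(f_new - f)
        f = f_new
        if delta < min_epsilon:
            break
    loglik = sum(math.log(f * a + (1 - f) * b) for a, b in terms)
    return f, loglik


def two_step_replicates(gls, freqs, n_reps=10, seed=1):
    """Coarse run (1e-5) from a random start, refined run (1e-7) from its
    output, repeated n_reps times; keep the highest log-likelihood."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_reps):
        f_coarse, _ = em_inbreeding(gls, freqs, rng.uniform(0.01, 0.99), 1e-5)
        f_final, loglik = em_inbreeding(gls, freqs, f_coarse, 1e-7)
        if best is None or loglik > best[1]:
            best = (f_final, loglik)
    return best
```

A quick check is to simulate genotypes with a known F, convert them into near-certain genotype likelihoods, and confirm that the estimate returned by `two_step_replicates` lands close to the simulated value.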