Xerces Blue and Silvery Blue mapping and variant calling
The ancient DNA reads were
clipped using AdapterRemoval v2.2 (32), sequencing adapters were
removed. Only reads longer than 25bp were kept. Filtered reads were
mapped against the G. alexis assembly with Burrows-Wheeler
Aligner (BWA) (33), using the backtrack algorithm, setting no
trimming, disabling seed (-l 1024), increasing stringency for edit
distance (-n 0.01), and allowing opening of 2 gaps (33, 34) Duplicated
reads were removed using -tools MarkDuplicates (35). Mapped reads with
mapping quality below 25 were removed using Samtools (36).
Basic mapping statistics were generated using Qualimap 2 (37). The
resulting reads were examined with PMDtools and MapDamage2 to assess the
degradation rate of the data, which is a sign of authenticity (38,39).
We detected the presence of typical aDNA-damaged bases at the end of
reads and pmd score distribution that could be attributed to an
authentic museum sample (38) (Fig. S8). To avoid problems in the next
steps, we trimmed 2 nt from each read end using BamUtil trimbam (40).
Bedtools was to assess genome coverage across the reference, using
windows of 1mbp for the nuclear fraction of the genome (41). To account
for the reference genome mappability, we split the reference genome in
reads of 60bp and re-mapped them against the reference. Coverage and
mappability are displayed using Circos (42).
We used snpAD v0.3.2 (43), a program developed for genotype calling in
ancient specimens since it infers priors for each sample separately to
account for DNA damage patterns. The mapped sequences were transformed
from bam-format into snpAD-format files, priors for base composition
estimated, and genotypes were called using standard settings. The VCFs
were combined and concatenated with CombineVariants and GatherVcfs from
GATK v3.5 (44) and filtered with vcftools v0.1.12b (45) to keep only
sites within the mappable fraction of the genome with minimum read depth
of 2, max read depth of 30, genotype quality > 30, maximum
missingness of 0.6, minor allele frequency of 5% and excluding indels
and multiallelic sites.
Genotype likelihoods were obtained with ANGSD (46) version 0.916 using
the GATK model with the following parameters for all the samples:
-uniqueOnly 1 -remove_bads 1 -only_proper_pairs 1 -trim 10 -C 50 -baq
1 -minInd 5 -skipTriallelic 1 -GL 2 -minMapQ 30 -doGlf 2 -doMajorMinor 1
-doMaf 2 -minMaf 0.05 -SNP_pval 1e-6.