Xerces Blue and Silvery Blue mapping and variant calling
The ancient DNA reads were clipped using AdapterRemoval v2.2 (32), sequencing adapters were removed. Only reads longer than 25bp were kept. Filtered reads were mapped against the G. alexis assembly with Burrows-Wheeler Aligner (BWA) (33), using the backtrack algorithm, setting no trimming, disabling seed (-l 1024), increasing stringency for edit distance (-n 0.01), and allowing opening of 2 gaps (33, 34) Duplicated reads were removed using -tools MarkDuplicates (35). Mapped reads with mapping quality below 25 were removed using Samtools (36).
Basic mapping statistics were generated using Qualimap 2 (37). The resulting reads were examined with PMDtools and MapDamage2 to assess the degradation rate of the data, which is a sign of authenticity (38,39). We detected the presence of typical aDNA-damaged bases at the end of reads and pmd score distribution that could be attributed to an authentic museum sample (38) (Fig. S8). To avoid problems in the next steps, we trimmed 2 nt from each read end using BamUtil trimbam (40). Bedtools was to assess genome coverage across the reference, using windows of 1mbp for the nuclear fraction of the genome (41). To account for the reference genome mappability, we split the reference genome in reads of 60bp and re-mapped them against the reference. Coverage and mappability are displayed using Circos (42).
We used snpAD v0.3.2 (43), a program developed for genotype calling in ancient specimens since it infers priors for each sample separately to account for DNA damage patterns. The mapped sequences were transformed from bam-format into snpAD-format files, priors for base composition estimated, and genotypes were called using standard settings. The VCFs were combined and concatenated with CombineVariants and GatherVcfs from GATK v3.5 (44) and filtered with vcftools v0.1.12b (45) to keep only sites within the mappable fraction of the genome with minimum read depth of 2, max read depth of 30, genotype quality > 30, maximum missingness of 0.6, minor allele frequency of 5% and excluding indels and multiallelic sites.
Genotype likelihoods were obtained with ANGSD (46) version 0.916 using the GATK model with the following parameters for all the samples: -uniqueOnly 1 -remove_bads 1 -only_proper_pairs 1 -trim 10 -C 50 -baq 1 -minInd 5 -skipTriallelic 1 -GL 2 -minMapQ 30 -doGlf 2 -doMajorMinor 1 -doMaf 2 -minMaf 0.05 -SNP_pval 1e-6.