Population genomic resequencing, data processing, and genotyping
Sequencing. We resequenced a total of 30 individuals
from two populations each of six montane bird species from the Ethiopian
Highlands (Fig. 1, Table 1, Table S1). We extracted genomic DNA from
blood samples using QIAGEN (Hilden, Germany) DNeasy blood and tissue
kits following manufacturer guidelines. Genomic DNA extractions were
sent to the Oklahoma Medical Research Foundation (OMRF) Clinical
Genomics Center for standard Illumina shotgun sequencing library
creation and subsequent sequencing on either an Illumina HiSeq3500 or
NovaSeq6000. Samples were multiplexed on flow cells with other libraries
from unrelated projects, with the goal to obtain ~8-25x
mean genomic sequencing coverage. We were able to aim for similar
sequencing coverage across samples because avian genomes tend to have
very similar sizes (~1.2 Gbp).
Filtering and Genotyping. We used the bbduk.sh script of
the bbmap package (Bushnell, 2014) to trim adapters and quality filter
raw sequencing data. We used BWA (Li & Durbin, 2009) with the BWA-MEM
command to align filtered reads to a reference genome. Here, we used the
Zebra Finch (Taeniopygia guttata ) genome (Warren et al., 2010) as
a reference for aligning reads and genotyping. This is possible as all
birds in this study are in the same order Passeriformes, and most of the
genome is similar enough to the species studied here so as to align
reads (Fig. S1). In addition, because synteny is well conserved in birds
(Derjusheva et al., 2004; Griffin et al., 2008), analyses relying on
haplotype structure should not be strongly biased. Despite being able to
align most regions of the genome (~76% to 86% in each
individual; Table S1), some regions may not be aligned between our focal
species and the reference genome due to (1) poorly assembled regions of
the reference, (2) regions evolving quickly in the Zebra Finch or our
focal species studied here, or (3) hard to align regions such as
non-unique regions of the genome (e.g, repetitive elements, etc.).
We used samtools v1.4.1 (Li et al., 2009) to convert the BWA output SAM
file to BAM format, and lastly cleaned, sorted, added read groups to,
and removed duplicates from each BAM file using the Genome Analysis
Toolkit (GATK) v4.1.0.0 (McKenna et al., 2010). We then used GATK’s
functions HaplotypeCaller and GenotypeGVCFs to group genotype each
species’ individuals together for both variant and invariant sites.
Here, we limited analyses to chromosomes with a 1 Mbp minimum size. We
measured the distribution of sequencing coverage using the samtools
‘depth’ command (Fig. S1).
We used VCFtools v0.1.14 (Danecek et al., 2011) to initially filter all
variant and invariant site calls using the following restrictions: (1)
minimum site quality of 20, (2) minimum genotype quality of 20, (3)
minimum depth of coverage of 6, (4) maximum mean depth of coverage of
50, and (5) indel removal.
Mitochondrial Genomes. We assembled complete
mitochondrial genomes for each individual and uploaded them to GenBank
as a resource for the research community. Briefly, we extracted
potential mitochondrial reads from the raw FASTQ data using the
bbsplit.sh script from the BBMap package (Bushnell, 2014) and several
songbird mitogenomes as references. We used these putative mtDNA reads
in Geneious (BioMatters Ltd.) to assemble mitogenomes. We then used the
Geneious “Live Annotate” feature with the reference songbird
mitogenomes to annotate the new assemblies. All assemblies are
accessioned on NCBI’s GenBank: MT017889- MT017917. To visualize mtDNA
structure, we created a median joining network of the NADH Dehydrogenase
Subunit 2 (ND2) gene for each species using PopART (Bandelt, Forster, &
Röhl, 1999; Leigh & Bryant, 2015).