Population genomic resequencing, data processing, and genotyping
Sequencing. We resequenced a total of 30 individuals from two populations each of six montane bird species from the Ethiopian Highlands (Fig. 1, Table 1, Table S1). We extracted genomic DNA from blood samples using QIAGEN (Hilden, Germany) DNeasy blood and tissue kits following manufacturer guidelines. Genomic DNA extractions were sent to the Oklahoma Medical Research Foundation (OMRF) Clinical Genomics Center for standard Illumina shotgun sequencing library creation and subsequent sequencing on either an Illumina HiSeq3500 or NovaSeq6000. Samples were multiplexed on flow cells with other libraries from unrelated projects, with the goal to obtain ~8-25x mean genomic sequencing coverage. We were able to aim for similar sequencing coverage across samples because avian genomes tend to have very similar sizes (~1.2 Gbp).
Filtering and Genotyping. We used the bbduk.sh script of the bbmap package (Bushnell, 2014) to trim adapters and quality filter raw sequencing data. We used BWA (Li & Durbin, 2009) with the BWA-MEM command to align filtered reads to a reference genome. Here, we used the Zebra Finch (Taeniopygia guttata ) genome (Warren et al., 2010) as a reference for aligning reads and genotyping. This is possible as all birds in this study are in the same order Passeriformes, and most of the genome is similar enough to the species studied here so as to align reads (Fig. S1). In addition, because synteny is well conserved in birds (Derjusheva et al., 2004; Griffin et al., 2008), analyses relying on haplotype structure should not be strongly biased. Despite being able to align most regions of the genome (~76% to 86% in each individual; Table S1), some regions may not be aligned between our focal species and the reference genome due to (1) poorly assembled regions of the reference, (2) regions evolving quickly in the Zebra Finch or our focal species studied here, or (3) hard to align regions such as non-unique regions of the genome (e.g, repetitive elements, etc.).
We used samtools v1.4.1 (Li et al., 2009) to convert the BWA output SAM file to BAM format, and lastly cleaned, sorted, added read groups to, and removed duplicates from each BAM file using the Genome Analysis Toolkit (GATK) v4.1.0.0 (McKenna et al., 2010). We then used GATK’s functions HaplotypeCaller and GenotypeGVCFs to group genotype each species’ individuals together for both variant and invariant sites. Here, we limited analyses to chromosomes with a 1 Mbp minimum size. We measured the distribution of sequencing coverage using the samtools ‘depth’ command (Fig. S1).
We used VCFtools v0.1.14 (Danecek et al., 2011) to initially filter all variant and invariant site calls using the following restrictions: (1) minimum site quality of 20, (2) minimum genotype quality of 20, (3) minimum depth of coverage of 6, (4) maximum mean depth of coverage of 50, and (5) indel removal.
Mitochondrial Genomes. We assembled complete mitochondrial genomes for each individual and uploaded them to GenBank as a resource for the research community. Briefly, we extracted potential mitochondrial reads from the raw FASTQ data using the bbsplit.sh script from the BBMap package (Bushnell, 2014) and several songbird mitogenomes as references. We used these putative mtDNA reads in Geneious (BioMatters Ltd.) to assemble mitogenomes. We then used the Geneious “Live Annotate” feature with the reference songbird mitogenomes to annotate the new assemblies. All assemblies are accessioned on NCBI’s GenBank: MT017889- MT017917. To visualize mtDNA structure, we created a median joining network of the NADH Dehydrogenase Subunit 2 (ND2) gene for each species using PopART (Bandelt, Forster, & Röhl, 1999; Leigh & Bryant, 2015).