Virus extraction, RT-PCR, virus isolation
Viral RNA was extracted from swab samples using the
MagMAXTM-96 AI/ND Viral RNA Isolation Kit (Ambion,
Austin, TX) following the manufacturer’s procedures. Real-time RT-PCR
was performed using previously published procedures, primers, and probes
(Spackman et al., 2002) designed to detect the IAV matrix gene. RT-PCR
assays utilized reagents provided in the Qiagen
OneStep® RT-PCR kit (Qiagen, Germantown, MD).
Following RT-PCR screening, all
samples with Ct values <45 were attempted to be grown in
embryonated chicken egg cultures (Woolcock, 2008), the harvested
allantoic fluid were tested again by RT-PCR and samples with Ct values
<22 were selected for sequencing. This cut-off was selected
based on our previous experiences, as samples with higher Ct-values
typically yield poor quality sequence results. Egg-grown virus isolates
were sequenced using multiple standard methods including Sanger (Sanger,
Nicklen, & Coulson, 1977), Roche 454 (Margulies et al., 2005), and
Illumina HiSeq 2000 and MiSeq (Illumina, San Diego, CA) sequencing
(Dusek et al., 2014; Guan et al., 2019; Hall et al., 2014).
Datasets for
phylodynamic analyses
Global dataset: The
polymerase basic protein 2 (PB2)
segment was selected as the basis for phylodynamic analysis, given (a)
it is the largest internal segment of the IAV genome (maximizing the
number of nucleotides in the analysis (>2000 nts)), (b) it
is relatively fast evolving for an internal gene with a mean of 3.15 x
10-3 (95% HPD 1.94-4.44) substitutions per site per
year (Rubing Chen & Holmes, 2006), and (c) enables the investigation of
transmission dynamics without targeting a specific subtype.
The temporal signal of the PB2
gene and clock-like evolution of both global and local datasets were
evaluated (Supplementary figure 1). All globally available avian and
marine mammal IAV PB2 genes sequenced between 2009 and 2019, excluding
those from Iceland, were downloaded from the National Center for
Biotechnology Information Influenza Virus Resource database (NCBI IVR)
(Bao et al., 2008) on February 12, 2020, resulting in 13,434 sequences.
Iceland-derived sequences were downsampled separately, as described
below. Duplicate sequences (based on collection date, location, and
nucleotide content) and sequences with less than 75% unambiguous bases
were removed, and all vaccine derivative and
laboratory-synthesized recombinant
sequences were excluded. Sequences in the dataset were only included if
isolation dates, location, and host species were available, resulting in
7,210 remaining sequences. The subsequent downsampling strategy aimed to
reduce the number of sequence taxa for computational efficiency and to
mitigate sampling bias while maintaining genetic diversity in the
dataset.
Downsampling
of global taxa outside of Iceland (i.e. the ‘outgroup’): Four variables
were considered important for explaining genetic diversity in the global
outgroup IAV sequence dataset: geographic region, host taxa, sampling
year, and hemagglutinin (HA) subtype. Five geographic region categories
included North America, Europe, Asia, Africa, and South America
(Australia and Antarctica were removed due to insufficient sequence
counts). Fourteen HA subtype
categories included H1, H2, H3, H4, H6, H8, H10, H11, H12, H13, H14,
H15, H16, and pooled H5/7/9. H5, H7, and H9 were combined to reduce
bias, as these were over-represented in the global dataset. Four host
categories included Anseriformes, Charadriiformes, Galliformes, and
Other, which comprised all other avian taxa and marine mammals.
To inform the downsampling strategy for the outgroup and evaluate if any
of the four variables were correlated, a multiple correspondence
analysis (MCA) was performed (JMP Pro v.14.0.0 (JMP Version 14.0.0,
1989-2019)). The MCA uses
categorical data as input, which for this study included the sampling
metadata associated with each sequence (region, host taxa, year, and HA
subtype). Through representation of the variables in two-dimensional
Euclidean space, significant clustering of HA subtypes with host taxa
was detected (Supplementary figure 2), indicative of host-specific
subtypes that are a well-known feature of influenza (Olson et al.,
2014). These findings confirmed by previously published data on
species-specificity of HA subtypes (Byrd-Leotis, Cummings, &
Steinhauer, 2017; Long, Mistry, Haslam, & Barclay, 2019; Verhagen et
al., 2015) led us to downsample the dataset stratifying taxa by two
non-overlapping variables:
geographic region and HA subtype. Downsampling of the global data
resulted in 21-75 taxa per five geographic region categories and 6-30
taxa per 14 HA subtype categories, resulting in a total of 301 sequences
in the global outgroup, with relative evenness across sampling years.
This step was performed to mitigate sampling bias resulting in
over-representation of species or viral strains, while accounting for
genetic diversity in the dataset.
Downsampling of Iceland-derived taxa (i.e. the ‘ingroup’): Next,
virus sequences from Iceland (including 35 downloaded from NCBI IVR and
58 novel viruses isolated and recently sequenced by our group (n=93))
(Dusek et al., 2022) were downsampled by stratifying taxa by HA subtype
(generating 1-15 sequences per 14 HA subtype categories), resulting in
63 sequences. These 63 sequences were used for global and local discrete
trait analyses and the ingroup dataset reflected the underlying
composition of host-specific subtypes present in this localized system.
To assist with rooting and time-calibration of the tree, historical
avian sequences from NCBI IVR were downloaded for the years 1979-2008.
These were downsampled by year to ensure one sequence per year,
resulting in 30 historic sequences. The total downsampled dataset,
including the outgroup (n=301), ingroup (n=63), and historic sequences
(n=30) resulted in a total of 394 sequences (Gass J.D., 2021).
Europe-Iceland-North America
Datasets: Following analyses
which identified the most significant geographic regions acting as
sources of IAVs to Iceland, a second dataset was constructed at a
restricted scale to Europe, Iceland, and North America between 2009 and
2019. First, the cleaned global dataset described above (n=7245) was
downsampled to include significant source regions of North America
(n=3222) and Europe (n=407), totaling 3629 sequences. To identify at
lower spatial resolution the source/sink locations relevant to Iceland,
a K-means cluster analysis was performed (JMP Pro v.14.0.0 (JMP Version
14.0.0, 1989-2019)) using latitude/longitude coordinates for each of the
3629 sequences (obtained by extracting sampling location from the strain
name of each sequence and searching in www.geonames.org). A total of 20
intraregional clusters resulted. Identified clusters with <50
sequences were combined with geographically proximal clusters to
increase evenness of within cluster sequence counts for discrete traits
analyses, resulting in 13 intraregional cluster locations within North
America, Iceland, and the rest of Europe (Supplementary figure 3).
Viral sequences were then downsampled from 3629 to 743 taxa stratifying
by intraregional cluster groupings and HA subtype.
Two datasets were formed, both of
which included the 743 downsampled sequences
(557 from North America, 229 from
Europe excluding Iceland), 30 historic sequences (same as the global
analysis), and: (i) for discrete trait phylogeographic and phylodynamic
analyses (non-geographic analyses of host transmission and subtype
reassortment patterns using discrete diffusion models): 63 downsampled
Iceland-derived sequences, totaling 836 sequences, and (ii) for
continuous trait phylogeographic analysis, all 93 Iceland-derived
sequences were included, totaling 866 sequences (Gass J.D., 2021). For
purposes of clarity, we use the term ‘phylogeographic’ to refer to
analyses that are geographic in nature and ‘phylodynamic’ to refer to
analyses that focus on viral diffusion between non-geographic traits,
namely transmission between species and reassortment of HA subtypes.
The two datasets were formed
because (a) heterogeneity in sampling among locations and host taxa can
bias results of discrete trait analyses, therefore downsampling to
ensure relative homeogeneity of trait groupings (while striving to
preserve host and pathogen population diversity across space and time)
is required, and (b) continuous trait analyses are robust against
heterogeneity in sampling (Baele, Suchard, Rambaut, & Lemey, 2017;
Lemey, Rambaut, Welch, & Suchard, 2010), thus the full Iceland dataset
was included for continuous analyses (Gass J.D., 2021). Multiple
sequence alignments of PB2 sequences were performed using MUSCLE in
Geneious Prime 2020.01.02 (https://www.geneious.com) and trimmed to the
open reading frame.
Maximum-likelihood phylogenies of
PB2 segments in the downsampled datasets were reconstructed using RAxML
v8.2.12 (Stamatakis, 2006) and temporal signal was investigated using
TempEst v1.5.3 (Rambaut, Lam, Max Carvalho, & Pybus, 2016)
(Supplementary figure 1).