Introduction
The prodigious throughput of short-read sequencing technology has
revolutionized quantitative genetics by allowing multiplexed genome-wide
genotyping of large numbers of individuals with minimal ascertainment
bias (Davey et al., 2011; Andrews et al., 2016). A major technical
challenge to this approach is accurate calling of heterozygous genotypes
at low sequencing depth. To circumvent this problem,
reduced-representation libraries are often generated using restriction
enzymes (Baird et al., 2008; Elshire et al., 2011) or sequence capture
(Gnirke et al., 2009; Ali et al., 2016), increasing sequencing depth
across a subset of the genome. However, haploid or inbred individuals
can generally be genotyped and imputed much more accurately and
inexpensively than heterozygous individuals (Swarts et al., 2014). A
second challenge to genotyping with low-depth short-read data is the
possibility of “homeo-SNPs” arising from alignment of reads from
homeologous regions of the genome (Tinker et al., 2014; Hulse-Kemp et
al., 2015). These false polymorphisms can often be identified by their
excess heterozygosity relative to Hardy-Weinberg equilibrium, but
homeo-SNPs that escape filtering may interfere with imputation and
estimation of relatedness between individuals. Homeo-SNPs are
particularly problematic in polyploids and interspecific hybrids.
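
As an illustration of the heterozygosity-based filter mentioned above, the following minimal sketch compares observed heterozygosity at a biallelic site with the Hardy-Weinberg expectation (2pq) and summarizes the deviation as an inbreeding coefficient, F = 1 - H_obs/H_exp. The function names and the cutoff of F < -0.5 are hypothetical choices made for this example and are not taken from any of the cited studies.

```python
def heterozygosity_stats(n_aa, n_ab, n_bb):
    """Return (observed het, expected het under HWE, F) for one biallelic site."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotype calls at this site")
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele A
    q = 1.0 - p                       # frequency of allele B
    h_obs = n_ab / n                  # observed heterozygote frequency
    h_exp = 2.0 * p * q               # Hardy-Weinberg expectation
    # F < 0 indicates more heterozygotes than expected; monomorphic sites get F = 0
    f = 1.0 - h_obs / h_exp if h_exp > 0 else 0.0
    return h_obs, h_exp, f


def looks_like_homeo_snp(n_aa, n_ab, n_bb, f_threshold=-0.5):
    """Flag sites with strong excess heterozygosity (assumed, illustrative cutoff)."""
    _, _, f = heterozygosity_stats(n_aa, n_ab, n_bb)
    return f < f_threshold


# A site where every individual is called heterozygous is flagged;
# a site in Hardy-Weinberg proportions is not.
print(looks_like_homeo_snp(0, 100, 0))
print(looks_like_homeo_snp(25, 50, 25))
```

A filter of this kind is only meaningful in populations where Hardy-Weinberg proportions are a reasonable null expectation.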
Polyploidy and interspecific hybridization are common features of plant
evolution that are exploited in plant breeding to generate novelty,
increase vigor, and stack desirable alleles from different species (Alix
et al., 2017). Tree and vine crops often rely on interspecific hybrid
rootstocks to increase vigor and resilience to biotic or abiotic
stresses without affecting fruit or nut quality in the grafted scion
(Warschefsky et al., 2015). In California, for example, production of
almonds (Prunus dulcis) (Ledbetter and Sisterson, 2008), walnuts
(Juglans regia) (Ramasamy et al., 2021), and pistachios (Pistacia vera)
(Ferguson et al., 2002) relies on rootstocks that
are interspecific hybrids. Each of these nut crops has a mating system
that can be exploited to generate large numbers of hybrid progeny
(self-incompatibility, monoecy, and dioecy, respectively), and superior
hybrid genotypes can be propagated clonally. However, genetic gain in
tree breeding programs is generally slow due to the time and space
required for evaluation, as well as the difficulty of genotyping highly
heterozygous material.
This study evaluates different methods for generating genotype data from
elite populations of interspecific hybrid pistachio (P. atlantica
× P. integerrima; n=725) and walnut (J. microcarpa × J. regia;
n=228) rootstocks. Short-read sequencing was performed on
reduced-representation libraries for each species. A typical workflow
would be to align the resulting reads against either the maternal (P1)
or paternal (P2) genome (Figure 1). Because interspecific hybrids are
composed of one haploid gamete from each parent, we expected that
alignment to both parental genomes simultaneously (P1+P2) would result
in haploid data, avoiding depth thresholding and greatly increasing
genotyping efficiency for heterozygous germplasm.
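
One way the P1+P2 idea might be set up in practice is to concatenate the two parental assemblies into a single reference, tagging each contig with its parent of origin so that every aligned read can later be assigned to one parental haplotype. The sketch below is only an assumption about how such a reference could be built; the "P1_"/"P2_" prefixes and the function names are hypothetical and do not describe the pipeline used in this study.

```python
def prefix_fasta(fasta_text, prefix):
    """Prepend `prefix` to every sequence name in FASTA-formatted text."""
    lines = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            lines.append(">" + prefix + line[1:])  # rename the header
        else:
            lines.append(line)                     # keep sequence lines as-is
    return "\n".join(lines) + "\n" if lines else ""


def combine_references(p1_fasta, p2_fasta):
    """Build a combined P1+P2 reference with parent-tagged contig names."""
    return prefix_fasta(p1_fasta, "P1_") + prefix_fasta(p2_fasta, "P2_")


def parent_of(contig_name):
    """Recover the parent of origin from a tagged contig name."""
    for tag in ("P1", "P2"):
        if contig_name.startswith(tag + "_"):
            return tag
    raise ValueError(f"contig {contig_name!r} carries no parental tag")


# Toy example: both parents have a contig called "chr1".
combined = combine_references(">chr1\nACGT\n", ">chr1\nACGA\n")
print(combined, end="")
print(parent_of("P2_chr1"))
```

Tagging the names keeps identically named chromosomes from the two assemblies distinct in the combined reference.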