Pedigrees complement genomic study design and inference
The relationship information and metadata captured by pedigrees are invaluable for designing and implementing genomic research. For example, without pedigree data it is difficult to assess how representative a reference genome is of the wider population. Pedigrees therefore provide biologically relevant data to inform an unbiased selection of individuals for building representative, high-quality reference genomes. When selecting an individual for a reference genome in species with genetic sex determination, some researchers have preferred either the homogametic sex, to ensure adequate coverage of the homogametic sex chromosome (i.e., X or Z), or the heterogametic sex, to capture the alternative and often highly repetitive sex chromosome (i.e., Y or W; Tomaszkiewicz, Medvedev, & Makova, 2017; Rhie et al., 2021). In addition to helping select candidates for sampling based on sex, pedigrees can identify individuals that are likely to be highly inbred; sequencing such individuals assists genome assembly by reducing errors that arise from ambiguity between heterozygosity and paralogues (Hahn, Zhang, & Moyle, 2014; Rhie et al., 2021). Further, detailed pedigrees enable the selection of parent-offspring trios to generate phased de novo genome assemblies (Korbel & Lee, 2013; Koren et al., 2018; Leitwein, Duranton, Rougemont, Gagnaire, & Bernatchez, 2020). High-quality de novo reference genomes are a powerful resource for the conservation and evolutionary genomics community, facilitating read mapping (Card et al., 2014), mining for genes of interest (e.g., Greenhalgh et al., 2021), and SNP discovery and genotyping (e.g., Galla et al., 2019; Brandies, Peel, Hogg, & Belov, 2019; Gooley et al., 2020).
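Identifying likely highly inbred candidates relies on pedigree-based inbreeding coefficients. The sketch below illustrates the classical tabular method, assuming a pedigree stored as (individual, sire, dam) records with every parent listed before its offspring and unknown parents treated as unrelated, non-inbred founders. It computes pairwise kinship coefficients, and each individual's inbreeding coefficient F is the kinship between its parents. The record format and the function names `kinship_matrix` and `inbreeding_coefficients` are hypothetical; established pedigree software would normally be used for real studbooks.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical record format: (individual, sire, dam); None = unknown parent.
Record = Tuple[str, Optional[str], Optional[str]]


def kinship_matrix(ped: List[Record]) -> Dict[Tuple[str, str], float]:
    """Tabular method; every parent must appear as a record before its offspring."""
    phi: Dict[Tuple[str, str], float] = {}

    def get(a: Optional[str], b: Optional[str]) -> float:
        # An unknown parent is treated as an unrelated, non-inbred founder.
        return 0.0 if a is None or b is None else phi[(a, b)]

    seen: List[str] = []
    for ind, sire, dam in ped:
        # Kinship with each earlier individual is the mean of the parents' kinships.
        for other in seen:
            value = 0.5 * (get(sire, other) + get(dam, other))
            phi[(ind, other)] = phi[(other, ind)] = value
        # Self-kinship is (1 + F) / 2, where F is the kinship between the parents.
        phi[(ind, ind)] = 0.5 * (1.0 + get(sire, dam))
        seen.append(ind)
    return phi


def inbreeding_coefficients(ped: List[Record]) -> Dict[str, float]:
    phi = kinship_matrix(ped)
    # F of an individual equals the kinship coefficient of its two parents.
    return {
        ind: 0.0 if sire is None or dam is None else phi[(sire, dam)]
        for ind, sire, dam in ped
    }


pedigree = [
    ("A", None, None),
    ("B", None, None),
    ("C", "A", "B"),
    ("D", "A", "B"),
    ("E", "C", "D"),  # offspring of a full-sibling mating
]
F = inbreeding_coefficients(pedigree)
ranked = sorted(F, key=F.get, reverse=True)
print(ranked[0], F[ranked[0]])  # E 0.25
```

Ranking candidates by F in this way would flag E, the product of a full-sibling mating (F = 0.25), as the most inbred candidate in this toy pedigree.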
To further characterise variants across the genome, including structural variants (SVs), pedigrees may be used to inform the curation of a pangenome: a set of genome assemblies from multiple individuals that aims to capture all standing genomic diversity in a population or species of interest (Tettelin et al., 2005; Brockhurst et al., 2019). Here, a pedigree can be leveraged to identify distantly related individuals so that the pangenome is representative (Wold et al., 2021, preprint).
Pedigree data are also an invaluable resource for selecting individuals for resequencing (i.e., whole-genome resequencing, or WGS). For example, a pedigree can inform the choice of closely related family groups for genomic inquiry (e.g., Galla et al., 2020), of individuals with characterized phenotypes of interest (Nersisyan, Nikoghosyan, & Arakelyan, 2019), or of individuals that maximize representative genomic diversity across a species (Robinson et al., 2021). In the case of sable antelope (Hippotragus niger; Gooley et al., 2020), the software program PedSam (https://sites.uwm.edu/latch/software-2/) was used to streamline the selection of individuals representative of founder diversity across many managed populations for downstream diversity comparisons. In a recent study of California condors, individuals with low inbreeding and kinship coefficients were selected using the pedigree and compared in terms of runs of homozygosity using WGS (Robinson et al., 2021). When familial relationships are known from pedigrees, this information can also be used to check whether molecular genetic and genomic workflows (e.g., extraction, amplification, library preparation, or sequencing) produce data consistent with biologically relevant expectations, or whether errors were introduced along the way (see Galla et al., 2020 for details).
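Selecting individuals with low inbreeding and low kinship to one another can be framed as a small optimisation problem. The sketch below uses a simple greedy min-max heuristic. It is an illustrative assumption, not the algorithm implemented in PedSam or used in the condor study. It assumes a precomputed kinship table and inbreeding coefficients (for example, from a pedigree analysis), excludes candidates above a hypothetical inbreeding threshold (0.0625, the F expected for offspring of first cousins), and then repeatedly adds the candidate whose closest relative among those already chosen is the most distant available.

```python
from typing import Dict, List, Sequence

Kinship = Dict[str, Dict[str, float]]


def select_distant(
    ids: Sequence[str],
    kinship: Kinship,
    inbreeding: Dict[str, float],
    k: int,
    max_f: float = 0.0625,
) -> List[str]:
    """Greedy min-max kinship selection (an illustrative heuristic)."""
    if k < 1:
        raise ValueError("k must be at least 1")
    # Drop candidates above an illustrative inbreeding threshold.
    pool = [i for i in ids if inbreeding[i] <= max_f]
    if len(pool) < k:
        raise ValueError("not enough candidates pass the inbreeding filter")
    # Seed with the least inbred candidate (ties go to the earliest in `ids`).
    chosen = [min(pool, key=lambda i: inbreeding[i])]
    pool.remove(chosen[0])
    while len(chosen) < k:
        # Add the candidate whose closest relative among those already
        # chosen is the most distant one available.
        best = min(pool, key=lambda i: max(kinship[i][j] for j in chosen))
        chosen.append(best)
        pool.remove(best)
    return chosen


# Toy data: P1 and P2 are full sibs; P2 is inbred above the threshold.
kin = {
    "P1": {"P1": 0.5, "P2": 0.25, "P3": 0.0, "P4": 0.125},
    "P2": {"P1": 0.25, "P2": 0.5625, "P3": 0.0, "P4": 0.125},
    "P3": {"P1": 0.0, "P2": 0.0, "P3": 0.5, "P4": 0.0625},
    "P4": {"P1": 0.125, "P2": 0.125, "P3": 0.0625, "P4": 0.5},
}
F = {"P1": 0.0, "P2": 0.125, "P3": 0.0, "P4": 0.0}
print(select_distant(list(kin), kin, F, k=3))  # ['P1', 'P3', 'P4']
```

Greedy selection is fast and transparent but not guaranteed to find the set with the lowest overall kinship; exhaustive or optimisation-based approaches may be preferable for small candidate pools.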
Beyond informing which individuals are sampled for molecular studies, pedigrees can be pivotal to successful genetic variant discovery. For many conservation genomic research projects, variants (e.g., SNPs, SVs) are used as markers to identify and measure diversity (Hohenlohe, Funk, & Rajora, 2020; Wold et al., preprint). Artefacts from library preparation, sequencing, and bioinformatic processing can introduce false variants into datasets, which can bias downstream analyses (O’Leary, Puritz, Willis, Hollenbeck, & Portnoy, 2019). In addition to adequate filtering for sequencing depth and Hardy-Weinberg equilibrium, validated pedigrees can be used to filter false variants from datasets based on Mendelian inheritance: a variant at which offspring genotypes cannot be explained by their parents' genotypes is a candidate genotyping error. This approach has long been used in human genetics for marker validation, and in one study reduced marker error rates by 50% (Chen et al., 2013). A study in the pedigreed population of Florida scrub-jays shows great promise for this approach, identifying sex-linked and false SNPs in a reduced-representation dataset (Chen, Van Hout, Gottipati, & Clark, 2014). Further, variant discovery for the critically endangered kākāpō is being informed by Mendelian inheritance, creating a high-quality variant dataset for all individuals of this species (Joseph Guhlin, personal communication). Because genomic research on species of conservation concern is often budget-constrained, datasets are frequently hampered by low sequencing depth and consequent missing data. In human and crop genetics, imputation (i.e., algorithmically inferring the most likely alleles at missing genotypes) is one option for addressing large amounts of missing data (Hickey, Kinghorn, Tier, van der Werf, & Cleveland, 2012; Sargolzaei, Chesnais, & Schenkel, 2015). When coupled with genotypic information from family groups, imputation is more likely to be accurate, even for rare alleles (Ullah et al., 2019).
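The Mendelian filtering idea can be sketched for biallelic SNPs in parent-offspring trios. This minimal example assumes genotypes coded as alternate-allele counts (0, 1, 2) with `None` for missing calls. It flags a trio as inconsistent when no combination of one transmissible allele from each parent yields the offspring genotype, and it drops SNPs whose error rate across informative trios exceeds a threshold. The data layout, function names, and the strict default threshold of zero are hypothetical choices for illustration; real datasets usually tolerate a small error rate.

```python
from typing import Dict, List, Optional, Sequence, Set, Tuple

Genotype = Optional[int]  # alt-allele count (0, 1 or 2); None = missing call


def transmissible(g: int) -> Set[int]:
    # Alleles (coded 0 = ref, 1 = alt) that a parent with genotype g can pass on.
    return {0: {0}, 1: {0, 1}, 2: {1}}[g]


def is_mendelian_error(sire: Genotype, dam: Genotype, child: Genotype) -> Optional[bool]:
    if sire is None or dam is None or child is None:
        return None  # trio is uninformative at this SNP
    return not any(
        a + b == child for a in transmissible(sire) for b in transmissible(dam)
    )


def filter_snps(
    genotypes: Dict[str, List[Genotype]],
    trios: Sequence[Tuple[str, str, str]],
    max_error_rate: float = 0.0,
) -> List[int]:
    """Return indices of SNPs whose Mendelian error rate across trios is acceptable."""
    n_snps = len(next(iter(genotypes.values())))
    keep = []
    for snp in range(n_snps):
        calls = [
            is_mendelian_error(genotypes[s][snp], genotypes[d][snp], genotypes[c][snp])
            for s, d, c in trios
        ]
        informative = [e for e in calls if e is not None]
        # SNPs with no informative trio cannot be tested and are kept here.
        if not informative or sum(informative) / len(informative) <= max_error_rate:
            keep.append(snp)
    return keep


genotypes = {
    "sire": [0, 1, 2, 0],
    "dam": [0, 1, 2, None],
    "chick": [0, 2, 0, 1],
}
print(filter_snps(genotypes, [("sire", "dam", "chick")]))  # [0, 1, 3]
```

Here SNP 2 is removed because two homozygous-alternate parents cannot produce a homozygous-reference offspring. Sex-linked loci can produce systematic apparent errors of this kind in the heterogametic sex, which is one way such loci can be flagged rather than simply discarded.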
Although imputation is not currently common practice in conservation or ecological genomics, we anticipate that it is only a matter of time before it is explored, especially for species with large genomes that are costly to sequence at high depth (e.g., some fish, insects, and plants; Mao et al., 2020), or as a cost-effective option for conservation programs that can only sequence at low depth.
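Family-based imputation uses genotypes from relatives to constrain what a missing genotype can be. The simplest case can be illustrated with a deliberately minimal, rule-based sketch. It is far simpler than the probabilistic and haplotype-based algorithms cited above and is not any of them. It assumes biallelic SNPs coded as alternate-allele counts and fills a missing offspring call only when the parents' genotypes allow exactly one outcome. The function name `impute_offspring` and the `TRANSMIT` table are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Alleles (0 = ref, 1 = alt) a parent with each genotype can transmit.
TRANSMIT: Dict[int, Tuple[int, ...]] = {0: (0,), 1: (0, 1), 2: (1,)}


def impute_offspring(
    sire: List[Optional[int]],
    dam: List[Optional[int]],
    child: List[Optional[int]],
) -> List[Optional[int]]:
    """Fill missing offspring calls only where parental genotypes leave one option."""
    filled: List[Optional[int]] = []
    for s, d, c in zip(sire, dam, child):
        if c is None and s is not None and d is not None:
            possible = {a + b for a in TRANSMIT[s] for b in TRANSMIT[d]}
            if len(possible) == 1:
                # e.g. homozygous-ref x homozygous-alt parents -> heterozygous child
                c = possible.pop()
        filled.append(c)
    return filled


print(impute_offspring([0, 2, 1, None], [0, 0, 1, 2], [None, None, None, 1]))
# [0, 1, None, 1]
```

Calls where either parent is heterozygous or missing stay missing; recovering those requires the linkage- and haplotype-aware methods used in human and crop genetics.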