Figures
Figure 1 (a) Pedigree chart illustrating the bases of an iSNP and a DNM. To phase a DNM and determine the parent-of-origin, single reads must span from the DNM to the iSNP without fragmentation. The distance between these two points of interest dictates the feasibility of phasing the target region with short or long read sequencers. PCR amplification followed by long read sequencing allows target read lengths of up to ~10,000 bp. (b)Illustration of the three categories in which targets are grouped and phased. The grey boxes upon the thin line in each category (1, 2, and 3) represent existing exome read coverage. Each category is based on the distance between the iSNP and DNM, which determines whether it can be phased with only the existing short-read WES trio data (1), WES and long-read proband data (2) or WES and long-read trio data (3). The percentage of samples that fit each category is highlighted in blue.
Figure 2 Using validated DNMs for an anchored phasing pipeline. Pipeline overview for phasing, including an illustration of variant strength and filtering. Reads are split by the nucleotide they carry at the Sanger-validated DNM position, and homozygous variants are then identified within each of the two read groups. These variants are compared with additional supporting sequencing data and prioritised for phasing according to their category (primary, secondary, tertiary, or unvalidated). The categories reflect the extent to which each variant is validated by the WES and ONT trio data. Small deletions that have no alternative supporting data are removed because of their high false-call rate. An iSNP is selected based on its sequencing support category and used to phase the original mapped ONT reads of the proband, after which the allele frequencies, zygosity, and parent-of-origin are determined.
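The read-splitting step described in the Figure 2 legend can be illustrated with a minimal sketch. This is not the pipeline's actual code: reads are modelled as plain dictionaries mapping reference position to observed base, and the function names, the `min_fraction` threshold, and the `min_depth` threshold are all hypothetical choices made for illustration. The sketch splits reads by their base at the DNM position and reports positions where each group is near-homozygous for a different base, which are the candidate iSNPs.

```python
from collections import Counter

# Hypothetical read model: each read is a dict {reference_position: base}.

def split_reads_by_dnm(reads, dnm_pos, ref_base, alt_base):
    """Group reads by the base they carry at the validated DNM position."""
    groups = {"ref": [], "alt": []}
    for read in reads:
        base = read.get(dnm_pos)
        if base == ref_base:
            groups["ref"].append(read)
        elif base == alt_base:
            groups["alt"].append(read)
        # Reads that do not cover the DNM, or carry another base, are ignored.
    return groups

def major_allele(reads, pos, min_fraction=0.8, min_depth=3):
    """Return the base a read group is near-homozygous for at pos, else None.

    min_fraction and min_depth are assumed thresholds, not values from the study.
    """
    bases = [read[pos] for read in reads if pos in read]
    if len(bases) < min_depth:
        return None
    base, count = Counter(bases).most_common(1)[0]
    return base if count / len(bases) >= min_fraction else None

def candidate_isnps(groups, dnm_pos, min_fraction=0.8, min_depth=3):
    """Positions where the two read groups are homozygous for different bases."""
    positions = set()
    for group in groups.values():
        for read in group:
            positions.update(read)
    positions.discard(dnm_pos)

    candidates = {}
    for pos in sorted(positions):
        ref_group_base = major_allele(groups["ref"], pos, min_fraction, min_depth)
        alt_group_base = major_allele(groups["alt"], pos, min_fraction, min_depth)
        if ref_group_base and alt_group_base and ref_group_base != alt_group_base:
            candidates[pos] = (ref_group_base, alt_group_base)
    return candidates
```

In the pipeline described above, candidates found this way would then be compared against the WES and ONT trio data to assign a support category; that step, and the removal of unsupported small deletions, is omitted from the sketch.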
Figure 3 Example of the postzygotic DNM in C10orf71 visualized in IGV. The base-specific (coloured) counts and percentages at the iSNP and DNM positions are displayed at the top of the plot as a descriptive extension of the coverage bar. The postzygotic DNM is observed in both the ONT and WES allele data and is supported by the skewed base frequencies at the DNM position, while the iSNP base frequencies remain close to the expected 50:50 ratio. The IGV read view groups the reads by iSNP base. This view reveals a third allelic form in the reads, which matches the combination of DNM-site base and iSNP base expected had the DNM not occurred (C-A; 33% in ONT, 41% in WES).
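The haplotype counting behind the Figure 3 legend can be sketched as follows. This is a minimal illustration rather than the study's method: reads are again modelled as dictionaries mapping position to base, and the function names and the `min_fraction` cut-off are hypothetical. The sketch tallies each combination of DNM-site base and iSNP base across reads spanning both positions, and flags a site as possibly postzygotic when three distinct combinations are each well supported.

```python
from collections import Counter

def haplotype_fractions(reads, dnm_pos, isnp_pos):
    """Fraction of spanning reads carrying each (DNM base, iSNP base) pair."""
    counts = Counter(
        (read[dnm_pos], read[isnp_pos])
        for read in reads
        if dnm_pos in read and isnp_pos in read
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {haplotype: n / total for haplotype, n in counts.items()}

def looks_postzygotic(fractions, min_fraction=0.1):
    """True when at least three allelic forms each reach min_fraction.

    min_fraction is an assumed cut-off used to ignore sequencing noise.
    """
    supported = [f for f in fractions.values() if f >= min_fraction]
    return len(supported) >= 3
```

A germline heterozygous DNM is expected to yield two read-supported combinations, whereas a postzygotic DNM leaves a third: the unmutated base on the same iSNP haplotype as the DNM.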