Figures
Figure 1 (a) Pedigree chart illustrating the bases of
an iSNP and a DNM. To phase a DNM and determine the parent-of-origin,
single reads must span from the DNM to the iSNP without fragmentation.
The distance between these two points of interest dictates the
feasibility of phasing the target region with short- or long-read
sequencers. PCR amplification followed by long-read sequencing allows
target read lengths of up to ~10,000 bp. (b) Illustration of the three categories in which targets are grouped and
phased. The grey boxes on the thin line in each category (1, 2, and 3)
represent existing exome read coverage. Each category is based on the
distance between the iSNP and DNM, which determines whether it can be
phased with only the existing short-read WES trio data (1), WES and
long-read proband data (2), or WES and long-read trio data (3). The
percentage of samples that fit each category is highlighted in blue.
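The grouping in panel (b) can be sketched as a small decision function. This is only an illustration under stated assumptions: the 150 bp short-read span is a hypothetical placeholder, the 10,000 bp limit is the approximate amplicon length quoted in panel (a), and the use of existing exome coverage at the iSNP to separate categories 2 and 3 is our reading of the grey boxes rather than a rule stated in the caption.

```python
def phasing_category(distance_bp, isnp_in_wes_coverage,
                     short_read_span_bp=150, long_read_span_bp=10_000):
    """Return the phasing category (1, 2, or 3) for one DNM-iSNP pair.

    distance_bp          -- distance between the DNM and the iSNP
    isnp_in_wes_coverage -- True if the iSNP lies in existing exome coverage
    short_read_span_bp   -- hypothetical span a short read can bridge
    long_read_span_bp    -- approximate upper limit for amplicon long reads
    Returns None when the pair is too far apart for either approach.
    """
    if distance_bp < 0:
        raise ValueError("distance_bp must be non-negative")
    # Category 1: existing short-read WES trio data already link both sites.
    if distance_bp <= short_read_span_bp and isnp_in_wes_coverage:
        return 1
    if distance_bp <= long_read_span_bp:
        # Category 2: parental genotypes at the iSNP are known from WES,
        # so only the proband needs long reads (assumption, see lead-in).
        # Category 3: the iSNP is outside exome coverage, so the whole
        # trio needs long reads.
        return 2 if isnp_in_wes_coverage else 3
    return None
```

For example, `phasing_category(2500, True)` falls into category 2, while the same distance with an iSNP outside exome coverage falls into category 3.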
Figure 2 Using a validated DNM for an anchored phasing
pipeline. Pipeline overview for phasing, including illustration of
variant strength and filtering. Reads are split by their Sanger-validated
DNM nucleotide, then homozygous variants are identified in the
two groups. These variants are compared to additional supporting
sequencing data and prioritised for phasing according to their category:
primary, secondary, tertiary, or unvalidated. These categories are
based on the extent of the variant validation from WES and ONT trio
data. Small deletions that have no alternative supporting data are
removed due to their high false-call rate. An iSNP is selected based on
the sequencing support category and used to phase the original proband
ONT mapped reads, after which the allele frequencies, zygosity, and
parent-of-origin are determined.
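The read-splitting step can be illustrated with a minimal sketch. It assumes each read has already been reduced to a mapping from reference position to observed base, which is a hypothetical simplification; the function names and the `min_fraction` threshold are also illustrative and are not taken from the pipeline itself.

```python
from collections import Counter


def split_reads_by_dnm(reads, dnm_pos, ref_base, alt_base):
    """Split reads into (ref_group, alt_group) by their base at the DNM."""
    groups = {ref_base: [], alt_base: []}
    for read in reads:
        base = read.get(dnm_pos)
        if base in groups:  # reads with other or missing bases are dropped
            groups[base].append(read)
    return groups[ref_base], groups[alt_base]


def consensus(group, pos, min_fraction):
    """Return the base a group agrees on at pos, or None if it is mixed."""
    counts = Counter(read[pos] for read in group if pos in read)
    if not counts:
        return None
    base, n = counts.most_common(1)[0]
    return base if n / sum(counts.values()) >= min_fraction else None


def candidate_isnps(ref_group, alt_group, dnm_pos, min_fraction=0.9):
    """Find positions that are homozygous within each group but differ
    between the two groups; these are candidate informative SNPs."""
    positions = set().union(*(r.keys() for r in ref_group + alt_group))
    positions.discard(dnm_pos)
    candidates = {}
    for pos in sorted(positions):
        ref_call = consensus(ref_group, pos, min_fraction)
        alt_call = consensus(alt_group, pos, min_fraction)
        if ref_call and alt_call and ref_call != alt_call:
            candidates[pos] = (ref_call, alt_call)
    return candidates


# Toy reads: position 100 is the DNM, position 250 differs between groups.
toy_reads = [
    {100: "C", 250: "A", 300: "G"},
    {100: "C", 250: "A", 300: "G"},
    {100: "T", 250: "G", 300: "G"},
    {100: "T", 250: "G", 300: "G"},
]
ref_group, alt_group = split_reads_by_dnm(toy_reads, 100, "C", "T")
toy_candidates = candidate_isnps(ref_group, alt_group, dnm_pos=100)
```

In this toy input only position 250 is reported, because position 300 carries the same base in both groups and therefore cannot distinguish the haplotypes. Prioritising candidates by their validation category and filtering unsupported small deletions would follow as separate steps.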
Figure 3 Example of the postzygotic DNM in C10orf71
visualized in IGV. The (coloured) counts and percentages of each base at
the iSNP and DNM are displayed at the top of the plot as a descriptive
extension of the coverage bar. This postzygotic DNM is observed in the
ONT and WES allele data and is supported by the discrepancies in the DNM
positional base frequencies, while the iSNP base frequencies remain
close to the expected 50:50 ratio. The IGV read view groups the reads
by iSNP base. It shows a third allelic form whose iSNP and DNM-site
base combination matches the one expected had the DNM not occurred
(C-A 33% in ONT, 41% in WES).
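The allelic forms described above can be tallied from per-read base pairs. The sketch below is a hedged illustration only: the input format (one `(iSNP base, DNM-site base)` tuple per spanning read), the function names, the `min_fraction` cut-off, and the toy counts are all hypothetical and do not reproduce the data in the figure.

```python
from collections import Counter


def haplotype_fractions(pairs):
    """Return the fraction of reads carrying each (iSNP, DNM-site) pair."""
    counts = Counter(pairs)
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}


def has_third_allelic_form(fractions, min_fraction=0.1):
    """True if one iSNP base is linked to more than one DNM-site base,
    each above min_fraction of all reads (suggesting a postzygotic DNM)."""
    dnm_bases_per_isnp = {}
    for (isnp_base, dnm_base), frac in fractions.items():
        if frac >= min_fraction:
            dnm_bases_per_isnp.setdefault(isnp_base, set()).add(dnm_base)
    return any(len(bases) > 1 for bases in dnm_bases_per_isnp.values())


# Toy data: one haplotype (iSNP "G") carries both the DNM base "T"
# and the original base "C"; the other haplotype (iSNP "A") is unaffected.
toy_pairs = [("A", "C")] * 10 + [("G", "T")] * 6 + [("G", "C")] * 4
toy_fractions = haplotype_fractions(toy_pairs)
toy_mosaic = has_third_allelic_form(toy_fractions)
```

A germline DNM would be expected to show only two combinations, so a well-supported third combination on the DNM-carrying haplotype is the pattern this check looks for.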