Results
Whole exome sequencing (WES) of 77 infertile males and their unaffected parents identified 109 rare de novo mutations (DNMs), all of which were independently validated by Sanger sequencing. Accurate phasing and parent-of-origin calling of a DNM require DNA molecules that span both the DNM and a parentally informative single nucleotide polymorphism (iSNP) (Figure 1.a). The ability to call the parent-of-origin therefore depends primarily on read length. This limitation is evident in the WES data, where only 8% of DNMs could be phased because most iSNPs were located >300 bp away from the DNM (Figure 1.b1 and Table 1).
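The read-spanning requirement can be illustrated with a minimal sketch. It assumes each read covering both sites has already been reduced to a pair (allele observed at the iSNP, whether the read carries the DNM), and that the parental origin of each iSNP allele is known. The function name, the input representation and the `min_fraction` agreement threshold are illustrative assumptions, not part of the pipeline used in this study.

```python
from collections import Counter

def call_parent_of_origin(spanning_reads, allele_origin, min_fraction=0.9):
    """Assign a DNM to a parental haplotype from reads spanning DNM and iSNP.

    spanning_reads: iterable of (isnp_allele, carries_dnm) pairs, one per read
                    that covers both the iSNP and the DNM position.
    allele_origin:  dict mapping an iSNP allele to "paternal" or "maternal".
    min_fraction:   hypothetical agreement threshold among DNM-carrying reads.
    Returns "paternal", "maternal", or None if no confident call can be made.
    """
    votes = Counter()
    for isnp_allele, carries_dnm in spanning_reads:
        origin = allele_origin.get(isnp_allele)
        # Only reads that carry the DNM and a recognised iSNP allele are informative.
        if carries_dnm and origin is not None:
            votes[origin] += 1
    if not votes:
        return None
    origin, n_supporting = votes.most_common(1)[0]
    if n_supporting / sum(votes.values()) >= min_fraction:
        return origin
    return None
```

Reads that cover only one of the two sites contribute nothing here, which is why short reads fail once the iSNP lies further from the DNM than the read length.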
Target amplification groups
For phasing, all DNM regions were targeted with long-range PCR and sequenced using ONT long-read sequencing. Primer pairs were designed for targeted long-read phasing of the 109 targets (Supplementary Table 1). The fragment size was capped at ~10 kb to simplify PCR optimisation. Amplification success was nevertheless affected by target length within the 10 kb range, with ideal lengths found to be <4 kb (Supplementary Figure 2.a). PCR optimisation steps were required for 50% of all targets (Supplementary Figure 2.b). Amplicon size had no impact on the error rate or quality of base calls, or on allele assignment (Supplementary Figure 3 and Supplementary Table 7). Importantly, for 71% of cases an iSNP was identified within 10 kb of the DNM in the available trio-based WES data (Figure 1.b1 and 1.b2). For this group of DNMs, phasing can be done by targeted long-read sequencing of the proband only, since the iSNP has already been typed in the patients and their parents.
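As an illustration of what makes a SNP parentally informative once trio genotypes are available, the following sketch checks whether a heterozygous proband genotype has exactly one Mendelian-consistent assignment of alleles to parents. The function name and the representation of genotypes as allele pairs are assumptions made for this example; the criteria applied in the study may differ.

```python
def isnp_allele_origins(child, father, mother):
    """Return {allele: parent} if the SNP is informative in this trio, else None.

    Each genotype is a pair of alleles, e.g. ("A", "G").
    A SNP is treated as informative when the child is heterozygous and only
    one assignment of its two alleles to father and mother is possible.
    """
    if len(child) != 2 or child[0] == child[1]:
        return None  # a homozygous child cannot distinguish the two haplotypes
    a, b = child
    consistent = [
        (paternal, maternal)
        for paternal, maternal in ((a, b), (b, a))
        if paternal in father and maternal in mother
    ]
    if len(consistent) != 1:
        return None  # ambiguous (e.g. all three heterozygous) or inconsistent
    paternal, maternal = consistent[0]
    return {paternal: "paternal", maternal: "maternal"}
```

For example, a child with genotype A/G, a father with A/A and a mother with A/G yields A as the paternal and G as the maternal allele, whereas a trio in which all three individuals are A/G is uninformative.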
For cases in which no iSNP could be found in the coding region, primers were designed to cover 5 kb regions around the DNM position in the proband and parent samples (Figure 1.b3). We chose the 5 kb region based on an analysis of 4344 DNMs identified in 53 whole-genome-sequenced individuals (children) (Smits et al., 2022). This analysis revealed at least one iSNP within 5 kb of a given DNM in 81% of cases (Supplementary Tables 8 and 9). In total, we obtained long-read sequencing data with iSNPs for 77 of the 109 selected DNMs (71%; Supplementary Tables 8 and 10).
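The window analysis behind the choice of a 5 kb region can be sketched as follows. The example assumes DNM and iSNP positions on a single chromosome, with iSNP positions sorted in ascending order, and interprets "within 5 kb" as a maximum distance of 5000 bp on either side of the DNM. The function names and this interpretation of the window are assumptions for illustration only.

```python
from bisect import bisect_left, bisect_right

def has_isnp_within(dnm_pos, sorted_isnp_positions, window=5000):
    """True if at least one iSNP lies within `window` bp of the DNM."""
    lo = bisect_left(sorted_isnp_positions, dnm_pos - window)
    hi = bisect_right(sorted_isnp_positions, dnm_pos + window)
    return hi > lo

def fraction_with_isnp(dnm_positions, sorted_isnp_positions, window=5000):
    """Fraction of DNMs that have at least one iSNP inside the window."""
    if not dnm_positions:
        return 0.0
    hits = sum(
        has_isnp_within(pos, sorted_isnp_positions, window)
        for pos in dnm_positions
    )
    return hits / len(dnm_positions)
```

Evaluating `fraction_with_isnp` over a range of `window` values gives the trade-off between amplicon length and the proportion of DNMs expected to be phasable.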