Germline coding and noncoding variants (SNVs and indels)
Figure 1 summarizes the analysis workflow for the WES data. The mean sequencing depth of the exomes was 92× (Supplementary Table 2 presents the sequencing metrics and the type of genomic library for each sample); P03 was the only sample with 10× on-target coverage below 80%.
A total of 9,467 rare (population frequency <1%) germline coding nonsynonymous variants, mapping to 6,102 genes, were detected in the cohort of 30 HB patients. Details of all variants can be found in Supplementary Table 3. Of these rare variants, 2,107 met our criteria of read depth >10, Phred score >20, and alternative allele frequency >0.35; they mapped to 1,737 different genes and comprised 1,671 missense and 436 LoF variants (Figure 2a). No pathogenic (P) or likely pathogenic (LP) variants in OMIM morbid genes that could explain the syndromic phenotypes of some patients were detected.
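The filtering criteria above can be sketched as a single predicate. This is a minimal illustration, not the pipeline used in the study: the `Variant` fields, the gene labels, and the example values are hypothetical, and the rarity cutoff and quality thresholds are combined into one step here for brevity.

```python
from dataclasses import dataclass

# Thresholds quoted in the text; all names below are hypothetical.
MAX_POP_FREQ = 0.01       # rare: population frequency < 1%
MIN_DEPTH = 10            # read depth > 10
MIN_PHRED = 20            # Phred score > 20
MIN_ALT_FRACTION = 0.35   # alternative allele frequency > 0.35


@dataclass
class Variant:
    gene: str
    pop_freq: float       # frequency in population databases
    depth: int            # total reads covering the site
    phred: float          # Phred-scaled quality score
    alt_fraction: float   # alternative-allele reads / total reads


def passes_filters(v: Variant) -> bool:
    """Return True if the variant is rare and meets all quality thresholds."""
    return (
        v.pop_freq < MAX_POP_FREQ
        and v.depth > MIN_DEPTH
        and v.phred > MIN_PHRED
        and v.alt_fraction > MIN_ALT_FRACTION
    )


# Made-up records, one per failure mode, for illustration only.
variants = [
    Variant("GENE_A", 0.0005, 85, 60.0, 0.48),  # passes every criterion
    Variant("GENE_B", 0.0005, 9, 60.0, 0.50),   # fails read depth
    Variant("GENE_C", 0.0200, 85, 60.0, 0.50),  # fails population frequency
    Variant("GENE_D", 0.0005, 85, 60.0, 0.20),  # fails allele fraction
]

kept = [v.gene for v in variants if passes_filters(v)]
print(kept)
```

All comparisons are strict, so a variant with a read depth of exactly 10 is excluded, matching the ">10" wording of the criteria.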
Using a list of 222 CPGs composed of 119 known CPGs (49 of them reported in OMIM) and 103 candidates compiled by reviewing recent publications (Supplementary Table 4), we investigated the presence of rare coding variants that could be related to cancer development. No homozygous or compound heterozygous P/LP variants were observed in known or candidate CPGs. Eleven heterozygous P/LP variants mapped to 11 CPGs were detected in 10 patients, comprising 33% of the group (Table 2); VHL variants were detected in two patients. Among these 10 patients with P/LP variants in CPGs, only four presented a family history of cancer; four of them were syndromic, and three were born prematurely. One of these patients (P28) carried two variants mapped to known recessive CPGs. Eight of the eleven P/LP variants were detected in seven autosomal-dominant CPGs (known or candidate), including an APC LoF variant (OMIM #175100 FAMILIAL ADENOMATOUS POLYPOSIS 1; gastrointestinal carcinomas) and six missense variants mapped to the CHEK2 (LI-FRAUMENI SYNDROME 2; colorectal, breast and prostate cancer), DROSHA [50], MSH2 (LYNCH SYNDROME I; colorectal cancer/MISMATCH REPAIR CANCER SYNDROME 2; hematologic malignancy, brain tumors, and gastrointestinal tumors), RPS19 (DIAMOND-BLACKFAN ANEMIA 1; osteogenic sarcoma, myelodysplastic syndrome, colon cancer), VHL (VON HIPPEL-LINDAU SYNDROME; renal cell carcinoma, pheochromocytoma, hemangioblastoma, hypernephroma, pancreatic cancer, paraganglioma, adenocarcinoma of the ampulla of Vater), and TGFBR2 (COLORECTAL CANCER, HEREDITARY NONPOLYPOSIS, TYPE 6) genes. Three P/LP variants were detected in three CPGs associated with recessive conditions: an ERCC5 LoF variant (XERODERMA PIGMENTOSUM, COMPLEMENTATION GROUP G; skin cancers) and missense variants mapped to FAH (TYROSINEMIA, TYPE I; hepatocellular carcinoma) and MUTYH (FAMILIAL ADENOMATOUS POLYPOSIS 2; colorectal carcinomas).
In addition, 44 VUS mapped to 34 known/candidate CPGs were observed in 21 patients (70%) (Table 3), most of whom carried more than one variant. VUS mapped to the ATM, BRCA2, COL7A1, DHCR7, DOCK8, FANCs, and GLI3 genes were detected in more than one patient.
Thirty-two genes related to liver differentiation or function were found to be affected by rare damaging variants (ABCB11, ABCB4, ABCC2, ABCC3, AFP, AHR, ALB, CTNNB1, CYP1A1, CYP1A2, CYP2C19, CYP2C8, CYP2C9, CYP2D6, CYP3A4, CYP3A7, DLK1, FAH, FOXA2, G6PC, HIF1A, KRT7, KRT8, MET, NR1I2, ONECUT1, PAH, POU5F1, PPARG, SOX17, UGT1A6, and UGT1A9).
We also investigated whether the observed sex bias in the group could be explained by an increased burden of rare damaging variants in one of the sexes. We did not detect significant differences between male and female patients in the average number of rare damaging variants (~66.7 and 68.1, respectively), LoF variants (~15 in both groups), or rare damaging CPG variants (~8 and 6, respectively).
A total of 2,069 noncoding variants passed our filters (read depth >10, Phred score >20, alternative allele frequency >0.35, frequency in population databases <0.1%), including intronic (76%), intergenic (7%), 3′ UTR (7%), 5′ UTR (6%), and splice region (4%) variants (Supplementary Table 5; Figure 2b). These variants were annotated using SNPnexus [40], and those with CADD scores above 15 and associated with cancer (https://geneticassociationdb.nih.gov/) were prioritized for further analyses. Table 4 details the 23 prioritized rare noncoding variants. Two variants were observed in intronic regions of the CPGs BRAF and CREBBP, but with no evidence of a functional effect. Notably, P12 carried a de novo mutation in the 5′ UTR of the TCF7 gene, which encodes an important effector protein in the Wnt pathway; this patient also carried a paternally inherited coding TCF7 VUS (c.1060C>G).
Using the exome data of HB patients and a healthy control group (n=19, data not shown), we inspected a list of 220 DNA repair genes (distributed in 16 categories, including several bona fide CPGs; https://www.mdanderson.org/documents/Labs/Wood-Laboratory/human-dna-repair-genes.html), searching for rare LoF and missense variants with high in silico damage prediction (5 or more algorithms; Supplementary Table 6). Figure 3 shows the frequency of rare damaging variants detected in 12 DNA repair categories in both controls and patients. Thirty-four heterozygous variants mapped to DNA repair genes were observed in 21 patients (70%), while nine heterozygous variants were detected in nine healthy controls (47%). Although the difference was not statistically significant, there was an apparent excess of damaging variants mapped to DNA repair genes in patients. Moreover, rare damaging variants affecting specific DNA repair gene categories were observed only in patients: ubiquitin modification, poly(ADP-ribose) polymerase (PARP) enzymes that bind to DNA, nonhomologous end-joining, homologous recombination (BRCA1, EME2, SPIDR, and RAD54L), Fanconi anemia genes (BRCA2, FANCA, BRIP1, SLX4, FANCD2, and FAAP24), genes associated with sensitivity to DNA-damaging agents, and base excision repair. Significant enrichment for rare damaging variants mapped to Fanconi anemia genes was detected in the group of patients (p value 0.0338; Fisher’s exact test).
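The carrier counts reported above (21 of 30 patients versus 9 of 19 controls) can be compared with a two-sided Fisher's exact test, sketched below in plain Python. This is only an illustration: the text does not state which test was applied to this overall comparison, the function name is hypothetical, and the Fanconi anemia result cannot be reproduced this way because the per-category carrier counts are not given in this section.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Probability that the top-left cell equals x, given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    total = sum(
        p for p in (prob(x) for x in range(lo, hi + 1))
        if p <= p_obs * (1 + 1e-9)
    )
    return min(1.0, total)


# Rows: patients, controls. Columns: carriers, non-carriers.
patients_carriers, patients_total = 21, 30
controls_carriers, controls_total = 9, 19

p_value = fisher_exact_two_sided(
    patients_carriers, patients_total - patients_carriers,
    controls_carriers, controls_total - controls_carriers,
)
odds_ratio = (patients_carriers * (controls_total - controls_carriers)) / (
    (patients_total - patients_carriers) * controls_carriers
)
print(f"odds ratio = {odds_ratio:.2f}, two-sided p = {p_value:.3f}")
```

Under these assumptions the p value stays above 0.05, in line with the statement that the overall excess in patients was not statistically significant.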