WES data analysis
We used VarSeq software version 1.5.0 (Golden Helix) to annotate and
filter the variants. SNV and indel variants were filtered by read depth
(>10), Phred score (>20), and variant allele
frequency (>0.35 for germline, and >0.10 for
somatic variants). Variants were annotated using public population,
clinical, and functional databases.
Germline variants with population frequencies above 0.5% or 1% for
recessive and dominant models of inheritance, respectively, were
filtered out. Somatic mutations in the tumor sample were obtained
by excluding all germline variants. Next, variants were filtered by
RefSeq Sequence Ontology terms, and only coding non-synonymous missense
variants and loss-of-function (LoF) variants (essential splice site,
frameshift, and stop-codon gain or loss) were retained for further
analysis. In silico prediction of the pathogenicity of missense variants was based on
six algorithms provided by the dbNSFP database (version 2.4). The
potentially damaging effect was also assessed using the VEP 32
software package from Ensembl (https://www.ensembl.org/), and only
variants predicted as pathogenic by at least five different tools were
prioritized. All the LoF variants were also prioritized. The final list
of filtered variants was annotated using Varelect 33 and HPO 34 to rank genes associated with the
specific phenotype of the patients. The variants were validated by
visual inspection using the Integrative Genomics Viewer (IGV). The
prioritized germline variants were classified according to the ACMG
guidelines 35,36, using the Varsome tool 37. Supporting Information Figure S1 summarizes
the approach for WES data analysis.
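
The filtering logic described above can be sketched in a few lines of Python. This is a minimal illustration only, not the VarSeq workflow that was actually run: the thresholds are those stated in the text, whereas the `Variant` record, its field names, the consequence labels, and the `damaging_calls` count (the number of in silico tools calling a missense variant damaging, however the tools are tallied) are hypothetical names introduced for this example.

```python
from dataclasses import dataclass

# Thresholds stated in the text.
MIN_DEPTH = 10
MIN_PHRED = 20
MIN_VAF = {"germline": 0.35, "somatic": 0.10}
MAX_POP_FREQ = {"recessive": 0.005, "dominant": 0.01}  # 0.5% and 1%
MIN_DAMAGING_CALLS = 5

# Hypothetical consequence labels standing in for Sequence Ontology terms.
LOF = {"splice_site", "frameshift", "stop_gained", "stop_lost"}


@dataclass(frozen=True)
class Variant:
    key: tuple            # (chrom, pos, ref, alt); hypothetical identifier
    depth: int
    phred: float
    vaf: float
    pop_freq: float
    consequence: str
    damaging_calls: int = 0  # tools predicting a damaging effect (assumed tally)


def passes_quality(v: Variant, origin: str) -> bool:
    """Read depth, Phred score, and allele-frequency cut-offs (all strict)."""
    return v.depth > MIN_DEPTH and v.phred > MIN_PHRED and v.vaf > MIN_VAF[origin]


def passes_frequency(v: Variant, model: str) -> bool:
    """Variants above the population-frequency threshold are filtered out."""
    return v.pop_freq <= MAX_POP_FREQ[model]


def is_prioritized(v: Variant) -> bool:
    """Keep every LoF variant; keep missense only with enough damaging calls."""
    if v.consequence in LOF:
        return True
    return v.consequence == "missense" and v.damaging_calls >= MIN_DAMAGING_CALLS


def prioritize_germline(variants, model: str):
    return [v for v in variants
            if passes_quality(v, "germline")
            and passes_frequency(v, model)
            and is_prioritized(v)]


def somatic_candidates(tumor_variants, germline_variants):
    """Tumor variants that pass somatic QC and are absent from the germline set."""
    germline_keys = {v.key for v in germline_variants}
    return [v for v in tumor_variants
            if v.key not in germline_keys and passes_quality(v, "somatic")]
```

In this sketch the population-frequency filter is applied to germline variants only, as in the text, and the consequence and pathogenicity filters would be applied to the somatic candidates in a separate step.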
Two prioritized germline variants from candidate genes (CYP1A1 and CEP164) and mutational hotspots of the TERT promoter were
investigated by Sanger sequencing (primer pairs are available upon
request).