Number of disease-causing and benign MECP2 genetic variants available
Of the 13 genotype-phenotype databases identified by Townend et al. (2018), five did not meet the inclusion criteria for this study: DisGeNET, dbSNP, dbVar, Café Variome, and HGMD. DisGeNET, dbSNP, and dbVar did not provide unambiguous descriptions of variations, because the RS identifier only indicates the location of a polymorphism; the nucleotide change has to be derived from additional information that is sometimes ambiguous. Café Variome provided only the protein change which, although highly relevant in itself, cannot be translated back to an unambiguous genetic change. HGMD, the only commercial database, did not allow re-use and re-distribution of its content. The eight databases that did fulfil our inclusion criteria, together with previously anonymized data from local RTT patients, were used in this study (see Table 2). At the time of research, a total of 12,158 MECP2 variation entries were found in these databases. The databases contained between 34 (DECIPHER) and 4,706 (RettBASE) MECP2 variations (Table 2). Between 15% and 100% of these variations were unique database entries (that is, they occurred only once, in a single database). Multiple entries of the same variation were found frequently in disease-specific databases, which gives an indication of the abundance of the variant and supports its pathogenicity.
In total, we identified 4,573 RTT-causing MECP2 variants (of which 863 were unique) in databases that annotate genetic information with a diagnosis (RettBASE, ClinVar, Maastricht Rett dataset, KMD) and/or with clear phenotype descriptions (DECIPHER) stating that the variant causes RTT or a similar disorder (e.g., X-linked mental retardation); the intake criteria are listed in Sup. Table 1. We identified 617 MECP2 variants that were clearly stated to be benign, of which 209 were unique, in two of the databases that annotate with diagnosis information (RettBASE and ClinVar). Nineteen variants were annotated both as RTT-causing and as benign (Sup. Table 2).
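As an illustration of the two bookkeeping steps described above (the share of unique database entries per database, and the detection of variants annotated both as RTT-causing and as benign), a minimal Python sketch could look as follows. It is not the pipeline used in this study: the record layout `(database, variant_id, classification)`, the function names, and all toy values are assumptions made for the example only.

```python
from collections import Counter, defaultdict

# Toy records: (database, variant_id, classification).
# All names and values are invented placeholders, not study data.
records = [
    ("db1", "varA", "pathogenic"),
    ("db1", "varA", "pathogenic"),
    ("db1", "varB", "benign"),
    ("db2", "varB", "pathogenic"),
    ("db2", "varC", "pathogenic"),
]


def unique_entry_fraction(records):
    """Per database, the fraction of entries whose variant occurs
    exactly once across all databases combined."""
    totals = Counter(variant for _, variant, _ in records)
    per_db = defaultdict(list)
    for db, variant, _ in records:
        per_db[db].append(totals[variant] == 1)
    return {db: sum(flags) / len(flags) for db, flags in per_db.items()}


def conflicting_variants(records):
    """Variants that carry both a pathogenic and a benign annotation."""
    labels = defaultdict(set)
    for _, variant, label in records:
        labels[variant].add(label)
    return sorted(v for v, seen in labels.items()
                  if {"pathogenic", "benign"} <= seen)


print(unique_entry_fraction(records))
print(conflicting_variants(records))
```

In this sketch a variant counts as unique only if it appears once over the pooled entries, which mirrors the definition given in the text; a stricter or looser definition would only require changing the `totals[variant] == 1` test.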
In total, the 12,158 collected MECP2 variant entries yielded 10,968 curated and integrated variants, of which 5,038 were unique. The processed datasets are available as CSV files on Google Drive (link). Of the 10,968 curated MECP2 variations, only 11 each occur in more than 1% of all database entries; together, these 11 account for 53.7% of all database entries (data not shown).
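The frequency statement above (how many variants each exceed a given share of all entries, and what share they cover together) can be reproduced on any list of curated variant identifiers. The following Python sketch shows one way to do this; the function name `frequent_variants`, its `threshold` parameter, and the toy input are hypothetical and do not come from the study.

```python
from collections import Counter


def frequent_variants(variant_ids, threshold=0.01):
    """Return the variants whose share of all entries exceeds `threshold`,
    plus the combined share of entries that these variants account for."""
    total = len(variant_ids)
    if total == 0:
        return {}, 0.0
    counts = Counter(variant_ids)
    frequent = {v: n for v, n in counts.items() if n / total > threshold}
    combined_share = sum(frequent.values()) / total
    return frequent, combined_share


# Invented toy input: two common variants and ten singletons.
entries = ["varA"] * 60 + ["varB"] * 30 + [f"rare{i}" for i in range(10)]
frequent, share = frequent_variants(entries, threshold=0.05)
print(frequent, share)
```

With the default `threshold=0.01`, the same call applied to one identifier per curated entry would give the count of variants above 1% and their combined share of entries.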