Introduction
After genome-wide sequencing, the majority of patients with rare disease
(RD) remain without an identified genetic cause for their condition (Lee
et al., 2014; Wright et al., 2018; Yang et al., 2013). Often,
finding even one other unrelated patient with a similar phenotype and
available genomic data can lead to the identification of a shared
genetic cause (Bamshad et al., 2011). Because RDs are, by definition,
rare, for many disorders no single clinician will see two patients with the
same unsolved RD. Instead, the “matching” patients (having similar
manifestations with the same genetic cause) may be located across the
globe.
A significant barrier to identifying matching patients has been the
storage of clinical and genomic data in isolated silos. Furthermore,
much of the clinical data is collected in unstructured, non-standardized
formats, which impedes computation and sharing across groups. Clinicians
share data in numerous ways, including case discussion at conferences and
publication of case reports. However, given the rarity
of the conditions being discussed, these means of sharing are
ineffective and inefficient, which translates to delays in gene
discovery and answers for patients.
In the field of genetics, data generation is occurring at unprecedented
rates in both the clinical and research settings. The resulting
abundance of data, held in separate silos, has created an interpretation
bottleneck, and data sharing plays a crucial role in addressing this
issue. Broad data sharing will lead to better data quality and
interpretation, faster answers for patients, and rapid advancements in
the field of genetics. There is consensus among professional societies,
patients, and experts that responsible data sharing is imperative (ACMG
Board of Directors, 2017; Bush et al., 2018; Darquy et al., 2016; Rehm,
2017). Consideration must be given to the social, societal, privacy, and
policy challenges to implementing international data sharing
responsibly, and these are being tackled by groups such as the Global
Alliance for Genomics and Health (Rahimzadeh, Dyke, & Knoppers, 2016).
There is no question that data sharing will be essential to solving the
cases of patients with rare diseases who remain undiagnosed. Furthermore, it will
also support better understanding of complex, non-Mendelian conditions
and facilitate the shift toward personalized medicine.
To begin to address the problem of responsible data sharing, in 2014 we
launched PhenomeCentral, a web portal designed for the matchmaking of
cases entered by clinicians and researchers working with rare diseases
(Buske et al., 2015a). Since its entrance into the rare disease space,
PhenomeCentral has been facilitating matches and the identification of
second cases or case series leading to gene discovery and answers for
families often experiencing a diagnostic odyssey. PhenomeCentral is a
founding member of the Matchmaker Exchange (MME; Philippakis et al., 2015),
a collaborative effort to solve genetic disorders by building an
international network of rare disease databases connected by a common
application programming interface (API; Buske et al., 2015b). In this
manuscript we describe the growth of
PhenomeCentral over the past 7 years, improvements we have made to the
portal, and goals of our future work.