Discussion

An assessment of the current PhenomeCentral dataset shows steady growth in the deposition of cases for matchmaking. Most of these cases are deeply phenotyped, with an average of 11 HPO terms annotated per case, and many cases contain additional medical and family history information. A high degree of genotypic diversity was also observed within the PhenomeCentral dataset, with over 3200 unique candidate genes flagged in total. Finally, all PhenomeCentral cases were subjected to internal matching, and about 70% of cases were also consented to matching with other MME data repositories. Together, internal and external matchmaking queries returned over 62,000 matches across the entire dataset, ultimately leading to the identification of multiple novel disease-gene associations.
PhenomeCentral is based on the PhenoTips software, which has a number of advantages. Over the past five years, PhenoTips has been actively implemented in hospital systems around the world to enable improved care for RD patients, which lowers the barrier to entry as more physicians become familiar with the similar interfaces. Having PhenoTips software at the core of both clinical and research databases also expedites the migration of clinical data into research, as all PhenoTips instances support the export and import of the same standardized data files, as well as automated deposition of de-identified cases into PhenomeCentral.
Based on user feedback, we devoted considerable resources to developing the revised matchmaking filters and the My Matches table. As the MME network continues to grow and more data is deposited for matchmaking, we have begun to approach a point where nearly every candidate gene returns matches with other cases (Osmond et al., in press). Because older matchmaking submissions also continue to receive new matches years after their initial submission, it is critical that matchmaking nodes provide users with the tools to filter and track up to thousands of matches simultaneously. The new filters and My Matches table represent initial steps towards providing users with such tools; however, additional changes will be required so that matchmaking remains efficient for users.
The presence of high-quality phenotypic data in cases submitted to matchmaking represents another means of reducing the time required to resolve an increasing number of matches. The matchmaking experiences of the Care4Rare Canada research team suggest that while most MME nodes support the storage of standardized phenotypic data, more than half of cases in the MME are submitted with little to no information on clinical features (Osmond et al., in press). As a result, most matches are difficult to resolve on initial review and require lengthy email exchanges with the matching user to determine whether a given match is of interest. Conversely, matches with cases from nodes where phenotypic data is frequently provided, such as PhenomeCentral, could be ruled out on initial review over 50% of the time, drastically reducing the number of follow-up emails required. As the number of cases and candidate genes submitted to matching continues to grow, it will be critical for nodes to emphasize the importance of submitting phenotypic data to ensure that current matchmaking solutions remain practical. The philosophy of PhenomeCentral is that the upfront effort of contributing phenotypic data to cases ultimately saves time in the matchmaking process.
We believe that the current design of PhenomeCentral is well positioned for novel approaches to matchmaking, which will utilize genomic sequence data to increase the number of matches made for a given gene. The MME framework is currently based on two-sided matchmaking, an approach in which both cases in a match have the same identified candidate gene. An iteration on this approach, called one-sided matchmaking, would instead allow users to directly query the genomic sequence data of patient records in a database for variants in a candidate gene. In the future, zero-sided matchmaking, a process in which algorithms use genomic sequence data and phenotypic information to highlight matches of interest, may also become a reality. One-sided and zero-sided matchmaking, while requiring patient consent to a greater degree of data sharing, both have the potential to increase the number of matches made for a candidate gene. They will also have a greater need for detailed phenotypic data associated with cases, to ensure that the larger numbers of matches can be reviewed without resorting to lengthy email exchanges. PhenomeCentral is well suited to these matchmaking approaches, as it already allows users to upload sequencing data for cases, provides consent checkboxes to indicate which cases may use this data for matchmaking, and can display matches between candidate genes and variants identified in sequencing data.
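The distinction between two-sided and one-sided matchmaking can be illustrated with a minimal sketch. It is not the PhenomeCentral or MME implementation: the `Case` record, its field names, the consent flag, the placeholder gene labels, and both functions are hypothetical, and real matching would operate on variants and phenotype similarity rather than bare gene sets.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Case:
    """Hypothetical, simplified patient record (illustration only)."""
    case_id: str
    candidate_genes: Set[str] = field(default_factory=set)
    # Genes in which the case's sequence data contains a variant (assumed precomputed).
    variant_genes: Set[str] = field(default_factory=set)
    # Whether the patient consented to the use of sequence data in matchmaking.
    sequence_consent: bool = False


def two_sided_matches(query: Case, cases: List[Case]) -> List[str]:
    """Both cases must list the same gene as a flagged candidate."""
    return [
        c.case_id
        for c in cases
        if c.case_id != query.case_id
        and query.candidate_genes & c.candidate_genes
    ]


def one_sided_matches(query: Case, cases: List[Case]) -> List[str]:
    """Only the query names a candidate gene; the other case matches if its
    consented sequence data carries a variant in that gene."""
    return [
        c.case_id
        for c in cases
        if c.case_id != query.case_id
        and c.sequence_consent
        and query.candidate_genes & c.variant_genes
    ]


# Placeholder gene labels, not real gene symbols.
query = Case("Q", candidate_genes={"GENE_A"})
cases = [
    Case("P1", candidate_genes={"GENE_A"}),
    Case("P2", variant_genes={"GENE_A"}, sequence_consent=True),
    Case("P3", variant_genes={"GENE_A"}, sequence_consent=False),
    Case("P4", candidate_genes={"GENE_B"}),
]

two_sided = two_sided_matches(query, cases)
one_sided = one_sided_matches(query, cases)
```

In this toy dataset, the two-sided query finds only the case whose submitter flagged the same candidate gene, whereas the one-sided query additionally surfaces a consented case that was never annotated with that gene, which is how the approach could increase the number of matches returned.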
In summary, PhenomeCentral has continued to grow as a data repository dedicated to gene discovery and finding diagnoses for unsolved rare disease patients through matchmaking. The current dataset consists of cases from both large research groups and individual researchers, and contains a wide variety of candidate genes and computer-readable phenotypes. The development of new features, such as robust matching filters and cohort-wide matching tables, has helped PhenomeCentral users more efficiently manage an ever-growing number of matches. Finally, an emphasis on contributing high-quality genotypic and phenotypic data to matchmaking has both aided MME users in the quick resolution of many matches (Osmond et al., in press) and positioned PhenomeCentral to contribute to more sophisticated forms of matchmaking in the future.