INTRODUCTION
Prediction models, also known as clinical prediction models, are mathematical formulas or equations that express the relationship between multiple variables and help predict a future outcome from the specific values of certain variables. Prediction models are used extensively in many areas, including clinical settings, where their range of applications is broad. In clinical practice, a prediction model can help to screen for asymptomatic disease and identify high-risk individuals for early intervention; predict future disease, facilitating patient-doctor communication based on more objective information; support medical decision-making so that doctors and patients can make informed choices about treatment; and assist healthcare services with planning and quality management.
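As an illustration only (this section does not commit to any particular model form), one widely used prediction model for a binary outcome is logistic regression. It converts the values x_1, ..., x_k of k predictors into a predicted probability of the outcome; the symbols below are ours, not the paper's.

```latex
\hat{p} = \Pr(Y = 1 \mid x_1, \dots, x_k)
        = \frac{1}{1 + \exp\{-(\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k)\}}
```

The coefficients \beta_0, ..., \beta_k are estimated from the development data; applying the fitted equation to a new individual's predictor values gives that individual's predicted risk.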
While specific details may vary between prediction models, the goal and process of developing them are largely similar. Conventionally, a single prediction model is developed from a dataset of individuals whose outcomes are known, and the developed model is then applied to predict outcomes for future individuals. Prediction modeling has two main components: model development and model validation. Once a model has been developed using an appropriate modeling strategy, its utility is assessed through model validation. Through validation, investigators examine how the developed model performs in a dataset that was not used to develop it, to ensure that its performance is adequate for the intended purpose.
When performed on an independent dataset, model validation provides a true test of a model’s predictive ability. A model may show outstanding predictive accuracy in the dataset used to develop it, yet its accuracy may decline sharply when it is applied to a different dataset. In the era of precision health, which strongly encourages disease prevention through early detection by monitoring health and disease according to an individual’s risk, accurate prediction in model validation has become even more important for successful screening.
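The gap between accuracy in the development data and accuracy in independent data can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a method from this paper: the data are simulated from a hypothetical logistic relationship between one predictor and a binary outcome, and the "model" is a deliberately overfitted 1-nearest-neighbour classifier that memorises its development data. The helper names (`simulate`, `fit_1nn`, `accuracy`) are hypothetical.

```python
import math
import random


def simulate(n, rng):
    """Simulate n individuals with one predictor x and a binary outcome y.

    Hypothetical data-generating process (illustration only):
    x ~ Normal(0, 1) and P(y = 1 | x) = 1 / (1 + exp(-x)).
    """
    data = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        p = 1.0 / (1.0 + math.exp(-x))
        y = 1 if rng.random() < p else 0
        data.append((x, y))
    return data


def fit_1nn(train):
    """'Develop' a 1-nearest-neighbour model, which simply memorises the training data."""
    def predict(x):
        # Predict the outcome of the development individual whose x is closest.
        _, y_nearest = min(train, key=lambda row: abs(row[0] - x))
        return y_nearest
    return predict


def accuracy(model, data):
    """Proportion of individuals whose outcome the model predicts correctly."""
    return sum(model(x) == y for x, y in data) / len(data)


rng = random.Random(2024)
development = simulate(200, rng)
validation = simulate(200, rng)  # independent sample, not used for development

model = fit_1nn(development)
apparent = accuracy(model, development)
external = accuracy(model, validation)
print(f"apparent accuracy (development data): {apparent:.2f}")
print(f"accuracy in independent validation data: {external:.2f}")
```

Because each development individual is its own nearest neighbour, the apparent accuracy is 1.00 by construction, whereas accuracy in the independent sample is noticeably lower. Real prediction models usually overfit less dramatically, but this gap between apparent and validated performance is what validation is designed to reveal.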
Numerous clinical prediction models are available for different purposes; however, only a few have found their way into clinical practice. One reason is a lack of validation, particularly external validation. External validation establishes the generalizability of a prediction model. Generally, a prediction model’s accuracy is lower in subsequent applications than in the sample in which the model was first developed. For a prediction model to be generalizable, its accuracy needs to be both reproducible (in new individuals from the same population) and transportable (to different but related populations). A prediction model that cannot predict outcomes accurately in a new sample is useless. Clinicians are reluctant to trust and use prediction models that have not been well validated. Although its importance is widely recognized, external validation of prediction models remains uncommon, and this has contributed substantially to the failure to translate prediction models into clinical practice. Several clinical practice guidelines recommend incorporating into practice only those prediction models that have demonstrated good predictive accuracy in multiple validation studies.
Model validation involves several aspects, and our objective in this paper is to discuss them so as to give readers a basic understanding of the topic and its importance. Although model validation is a statistical concept, we have tried to present a nontechnical discussion in plain language. The information provided in this paper can be helpful for anyone who wishes to be better informed, to have more meaningful conversations with data analysts about their project, or, given advanced training in statistics, to apply the appropriate model validation technique. The discussion is arranged as follows. We begin by defining model validation. We then outline the major steps involved in model validation. Within these steps, we discuss the different ways of validating a model, together with their strengths and limitations (which we call “model validation procedures”), and how to assess the performance of a validated model (which we call “model performance assessment”).