Insert Figure 3
Deep learning models are particularly prone to this because their learning
behavior is hidden and they can have billions of parameters. One recent
study made this vividly clear. When researchers trained a model to
distinguish COVID-19 patients with pneumonia from patients with other
respiratory diseases based on chest radiographs, the algorithm relied on
the dates printed on the radiological images: it had found a shortcut and
classified all patients whose radiographs were dated 2020 or later as
COVID-19 cases. Thus, there is a growing
demand for ‘white box’ approaches, referring to methods and models that
are easy to explain and interpret. This need is further amplified when
the aim is to bring applications to clinical practice, which has many
technical, medical, legal, and ethical dimensions. The urgent
need for explainability has accelerated methodological innovations to
‘open the black box’. Relevant examples are SHAP (SHapley Additive
exPlanations), LIME (Local Interpretable Model-agnostic Explanations),
and CAM (Class Activation Maps). For example, SHAP was recently applied
to describe the contribution of the features selected for inclusion in
asthma prediction models. These analytical methods quantify how much each
input feature contributes to an individual prediction, providing detailed
insight into what patterns the AI model has learned.
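To illustrate how such attributions can be obtained in practice, the sketch below applies SHAP to a tree-based classifier. It is a minimal example and assumes the Python packages shap and scikit-learn; the data, feature names (e.g. fev1_pct, eosinophils), and model choice are hypothetical and only serve to show the workflow.

# Minimal sketch: per-prediction feature attribution with SHAP.
# Assumes the shap and scikit-learn packages; data and feature names are synthetic and hypothetical.
import numpy as np
import pandas as pd
import shap
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 500
X = pd.DataFrame({
    "age": rng.integers(18, 80, n),
    "fev1_pct": rng.normal(85, 15, n),       # hypothetical lung-function feature
    "eosinophils": rng.normal(300, 100, n),  # hypothetical blood marker
    "smoker": rng.integers(0, 2, n),
})
# Synthetic binary outcome loosely tied to two of the features (illustration only).
y = ((X["eosinophils"] > 350) & (X["fev1_pct"] < 80)).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

# SHAP values estimate the contribution of each feature to each individual prediction.
# (Depending on the shap version, the result may be a list with one array per class.)
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)

# Summary plot: which features drive the model's predictions overall, and in which direction.
shap.summary_plot(shap_values, X_test)

A plot like this shows at a glance which inputs the model relies on, which is exactly the kind of check that can expose shortcuts such as the date stamps in the radiograph example above.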
Validation and generalizability
A structured modeling process is essential when developing an ML
prediction model, both to create a reliable model and to establish
confidence in its outcomes. There are many ML algorithms, and it is
difficult to tell beforehand which will perform best. This is captured by
the ‘no free lunch’ theorem, which states that no single algorithm
performs best across all problems and thus emphasizes the need to develop
and evaluate ML models iteratively. Hence, multiple ML methods should be applied to the
data and their performance compared. Figure 4 depicts the steps to build
a supervised learning prediction model for disease risk. The steps
needed for unsupervised learning overlap to a large extent. Skipping or
mismanaging these steps poses a risk to model reliability, for example,
by not properly separating the training and validation data, which can
leave overfitting of the prediction model undetected and inflate its
apparent performance.
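As a concrete sketch of this iterative comparison, and of keeping training and evaluation data strictly separate, the example below compares several candidate algorithms by cross-validation on the training data and reserves a held-out test set for a single final evaluation. It assumes scikit-learn; the data are synthetic and the choice of candidate models is purely illustrative.

# Minimal sketch: comparing several ML algorithms with proper data separation (assumes scikit-learn).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for a tabular disease-risk data set.
X, y = make_classification(n_samples=1000, n_features=20, n_informative=5, random_state=0)

# Hold out a test set first, so that model selection never sees these data (guards against leakage).
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, stratify=y, random_state=0)

candidates = {
    "logistic regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "random forest": RandomForestClassifier(n_estimators=300, random_state=0),
    "support vector machine": make_pipeline(StandardScaler(), SVC(probability=True, random_state=0)),
}

# Compare the candidates by cross-validation on the training data only.
for name, model in candidates.items():
    scores = cross_val_score(model, X_train, y_train, cv=5, scoring="roc_auc")
    print(f"{name}: cross-validated AUC = {scores.mean():.3f} +/- {scores.std():.3f}")

# Only after a model has been selected is it refit and evaluated once on the untouched test set.
chosen = candidates["random forest"].fit(X_train, y_train)
print(f"held-out test AUC: {roc_auc_score(y_test, chosen.predict_proba(X_test)[:, 1]):.3f}")

In this layout, overfitting shows up as a gap between the cross-validated and held-out estimates rather than going unnoticed.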