CDSS as clinical reasoning support systems

Above, we have argued that arriving at a diagnosis and treatment plan involves a search process (exploration and investigation) that is directed by clinical experts. Specific to the reasoning of clinical experts in this search process is, for example, asking relevant and sensible questions about the case, deciding which parameters (clinical data and other) about a patient are relevant to include and which are not, formulating possible explanations for the symptoms, and seeing similarities with other cases. In this epistemological context, CDSS must support this process by answering questions asked by the clinician. For example:

1. What are likely diagnoses for a patient with symptoms x, y, z?
2. What treatments have been found effective for patients with diagnosis A, from age group B, with comorbidities C, D and E?
3. What are the chances that a patient with symptoms x, y, z has disease A? Or disease B?
4. How likely is it that treatment T will be effective for a patient with symptoms x, y, z?
5. If the patient with symptoms x, y, z has disease D, what other signs or symptoms would they have?
6. What if, instead of symptom x, the patient had symptom w?
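To give a concrete, simplified illustration of the statistical character of a question such as 3, the sketch below (in Python, with entirely invented prior and likelihood figures) shows how a system could, in principle, combine frequency information from a large patient database into a probability for a specific patient using Bayes' rule. It is meant only to indicate the kind of answer at stake here, not to describe how any existing CDSS computes its output.

```python
# Illustrative sketch of question 3 using Bayes' rule:
# P(disease | symptoms) is proportional to P(symptoms | disease) * P(disease).
# All numbers below are hypothetical placeholders for database-derived statistics.

priors = {"disease A": 0.02, "disease B": 0.05, "other/none": 0.93}
likelihood_of_symptoms = {          # P(symptoms x, y, z | disease)
    "disease A": 0.60,
    "disease B": 0.10,
    "other/none": 0.01,
}

# Unnormalized posteriors, then normalization over the candidate diagnoses.
unnormalized = {d: priors[d] * likelihood_of_symptoms[d] for d in priors}
total = sum(unnormalized.values())
posteriors = {d: round(p / total, 3) for d, p in unnormalized.items()}

print(posteriors)
# With these made-up figures: {'disease A': 0.456, 'disease B': 0.190, 'other/none': 0.354}
```

With these invented numbers, disease A comes out as the most probable explanation of symptoms x, y, z, even though its prior prevalence is low; it is precisely this kind of numerical answer that a clinician would then have to weigh against the rest of the clinical picture.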
In addition, a CDSS could be helpful in effectively searching the patient's medical records, for example to answer questions such as:
7. How often has the patient suffered from similar attacks?
8. What other drugs does the patient take, and might they interact?
9. What other examinations have been performed on this patient, and what was the outcome?
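Questions 7-9 are, in essence, structured queries over the patient's record. As a purely illustrative sketch (the record structure, field names and entries below are invented), such questions can amount to simple filters over the entries in an electronic record:

```python
# Hypothetical, minimal representation of a patient's record as a list of entries.
from datetime import date

record = [
    {"date": date(2021, 3, 2), "type": "episode", "description": "migraine attack"},
    {"date": date(2022, 7, 15), "type": "episode", "description": "migraine attack"},
    {"date": date(2022, 7, 16), "type": "medication", "description": "ibuprofen"},
    {"date": date(2023, 1, 9), "type": "examination", "description": "MRI brain, no abnormalities"},
]

# Question 7: how often has the patient suffered from similar attacks?
attacks = [e for e in record if e["type"] == "episode" and "migraine" in e["description"]]
print(len(attacks), "similar attacks on file")

# Question 9: what other examinations have been performed, and what was the outcome?
for e in record:
    if e["type"] == "examination":
        print(e["date"], e["description"])
```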
In short, the CDSS can provide information from the patient's records as well as statistical (numerical) information about illnesses and treatments in similar cases, and can thereby support all types of reasoning (deductive, inferential, hypothetical, counterfactual, analogical, etc.) employed by clinicians about their patients. Moreover, based on the patient data fed into it, the system could come up with suggestions (hypotheses) of its own. Still, it remains the clinical expert's epistemic task to 1) come up with relevant questions and 2) judge the answers. Concerning the latter, the criteria employed by a CDSS to evaluate the answers differ from the criteria employed by the clinician. Whereas the CDSS uses a very limited set of epistemic criteria (such as technical and statistical accuracy, cf. Kelly et al. 2019), a clinician's judgement must meet a more extensive set of both epistemic criteria (such as adequacy, plausibility, coherence and intelligibility) and pragmatic criteria to assess the relevance and usefulness of the knowledge for the specific situation.
In short, we have argued that clinical decision-making is a complex and sophisticated reasoning process, and that the clinician is epistemologically responsible for this process. Instead of thinking of a CDSS as a system that answers the question “what is the diagnosis for patient A with symptoms x, y, z” and, subsequently, “what is the best treatment for this patient”, it is better to think of the system as answering the numerous intermediate questions raised by a clinician in the clinical reasoning process. By answering these questions with statistical information based on large amounts of reliable data, such a system can support, substantiate and refine the clinician's reasoning process. Therefore, we propose that it is more suitable to think of CDSS as clinical reasoning support systems (CRSS). In the following paragraphs, we will further elaborate on what is needed for good use of a CRSS in clinical practice. We will defend the claim that the designers of the system and the clinicians who will use it need to collaborate from early on in the development of the CRSS.

The epistemological role of experts in developing CRSS

Above, we explained that the epistemological role of clinicians in the diagnosis and treatment of individual patients is crucial, even though CRSS can provide important support. Here we will explain that the epistemological role of clinical and AI experts is also crucial in the development of a CRSS, and that these experts need to collaborate.
In a very simple schema, the development of a CRSS consists of three phases: input, throughput and output. Human intelligence plays a crucial role in each phase.
The input in the development of a CRSS is existing medical knowledge (for knowledge-based AI systems) and available data (for data-driven systems). In the development of knowledge-based CRSS, all clinical, epidemiological and theoretical knowledge in the medical literature can be used. However, medical experts must indicate which knowledge is relevant for which purpose, which knowledge belongs together, and how reliable that knowledge is. In the development of data-driven CRSS, reliably labelled data are needed to train the system, while relevant, reliable unlabelled data are needed for the system to find patterns and correlations. Knowledge from clinical experts is needed to generate the training set (such as labelled images), and to select sets of relevant and reliable unlabelled data. In all these cases, the knowledge of clinical experts plays a role in choosing appropriate categorizations, adequate labelling, and the organization of data storage, in order to make the system searchable and expandable for clinical practice.
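By way of illustration only (all field names, values and criteria below are hypothetical), the following sketch shows what this expert-shaped input can look like: a labelled training set whose features and label vocabulary embody clinical categorizations, and a simple expert-defined filter selecting reliable unlabelled records.

```python
# The choice of features ("age", "lesion_diameter_mm", ...) and of the label
# vocabulary ("benign"/"malignant") is itself an expression of expert knowledge.
labelled_cases = [
    {"age": 34, "lesion_diameter_mm": 3.1, "asymmetry": 0.2, "label": "benign"},
    {"age": 61, "lesion_diameter_mm": 7.8, "asymmetry": 0.7, "label": "malignant"},
    {"age": 47, "lesion_diameter_mm": 5.0, "asymmetry": 0.4, "label": "benign"},
]

# For an unsupervised system, experts instead decide which unlabelled records
# are relevant and reliable enough to be searched for patterns.
def is_reliable(record):
    """Toy relevance/reliability filter encoding (hypothetical) expert criteria."""
    return record.get("lesion_diameter_mm") is not None and 0 < record["age"] < 120

unlabelled_cases = [
    {"age": 52, "lesion_diameter_mm": 4.2, "asymmetry": 0.3},
    {"age": 999, "lesion_diameter_mm": None, "asymmetry": 0.1},  # excluded as unreliable
]
curated = [r for r in unlabelled_cases if is_reliable(r)]
print(len(curated), "of", len(unlabelled_cases), "records retained")
```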
The throughput in the development of a CRSS is the machine-learning process in which the machine-learning algorithm searches for a 'model' (i.e., another algorithm) that connects the data in the training set to their labels in a statistically correct way (i.e., supervised learning), or detects statistically relevant correlations in unlabelled data (i.e., unsupervised learning). The design, development and implementation of this machine-learning process requires AI experts rather than clinical experts. However, there will be overlap between the development of the input (the labelled or unlabelled data fed into the process) and the machine-learning process, which implies that some collaboration is necessary in this phase as well.
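A minimal sketch of this throughput phase is given below, using scikit-learn purely as a stand-in; the data, feature names and choice of algorithms are hypothetical. The same feature matrix is used once with expert labels (supervised learning) and once without labels (unsupervised clustering).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

# Hypothetical features per case: [age, lesion_diameter_mm, asymmetry]
X = np.array([[34, 3.1, 0.2],
              [61, 7.8, 0.7],
              [47, 5.0, 0.4],
              [58, 6.9, 0.8]])

# Supervised learning: expert-provided labels guide the search for a model that
# connects the training data to the labels in a statistically correct way.
y = np.array([0, 1, 0, 1])          # 0 = benign, 1 = malignant (expert labels)
classifier = LogisticRegression(max_iter=1000).fit(X, y)
print(classifier.predict([[50, 6.5, 0.6]]))   # prediction for a new, unseen case

# Unsupervised learning: no labels; the algorithm looks for statistically
# relevant structure (here, two clusters) in the same data.
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(clusters)
```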
The output (or result) of these steps in the development of a CRSS is a 'model' (an algorithm). This model is implemented in the CRSS to be used in clinical practice. Before implementation, however, the model must be checked by human experts for relevance and correctness, since statistical correctness does not automatically mean that the model is adequate and relevant for the CRSS. For example, Kelly et al. (2019) describe a study in which “an algorithm was more likely to classify a skin lesion as malignant if an image had a ruler in it because the presence of a ruler correlated with an increased likelihood of a cancerous lesion” (ibid, 4). This is because the data underdetermine the model, which means that in principle many statistically correct models (algorithms) can be found (cf. McAllister 2011) that (i) connect labelled data to their labels (in the case of supervised learning), or (ii) find statistically relevant correlations in unlabelled data (in the case of unsupervised learning). To carry out this check, clinical experts must, for example, know which parameters play a role in the model and then assess, on the basis of their medical expertise, whether these are medically, biologically or physically plausible. In short, here as well the contribution of human intelligence is crucial, since medical experts, in collaboration with AI experts, must determine whether the resulting model is reliable and relevant.
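The kind of check meant here can be made concrete with a small sketch (all features, values and the model are hypothetical): after training, the experts inspect which parameters carry weight in the model, so that an implausible factor such as the presence of a ruler in the image can be spotted and questioned.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical features; "ruler_present" echoes the artefact described by Kelly et al. (2019).
feature_names = ["age", "lesion_diameter_mm", "asymmetry", "ruler_present"]
X = np.array([[34, 3.1, 0.2, 0],
              [61, 7.8, 0.7, 1],
              [47, 5.0, 0.4, 0],
              [58, 6.9, 0.8, 1],
              [29, 2.5, 0.1, 0],
              [66, 8.3, 0.9, 1]])
y = np.array([0, 1, 0, 1, 0, 1])     # 0 = benign, 1 = malignant (hypothetical labels)

model = LogisticRegression(max_iter=1000).fit(X, y)

# Show the learned weight per feature so a clinical expert can judge whether the
# model relies on plausible factors or on a dataset artefact such as the ruler.
# (In practice the features would be standardized before comparing weights.)
for name, weight in zip(feature_names, model.coef_[0]):
    print(f"{name:20s} {weight:+.3f}")
```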

Explainable and accountable CRSS to facilitate interaction with the clinician

To use a CRSS as a clinical reasoning support system in the manner we suggest above, the system itself must facilitate this way of working. This requires that a CRSS enables the clinician to evaluate its answers and to judge their accuracy and relevance for the specific patient. (Another requirement is that a CRSS is equipped with a suitable interface that allows clinicians to enter their questions, possibly even by speaking, and that the algorithm is designed to deal with the various questions posed by clinicians. This kind of flexibility might be challenging to implement, but addressing these challenges goes beyond the scope of this paper.) A well-known objection to AI for clinical practice is the opacity of the algorithm: how it establishes an outcome based on the input is ‘black-boxed’. This, of course, obscures the users’ ability to judge the accuracy and relevance of the outcome. Chin-Yee and Upshur (2019), for example, argue that because of the black-box nature of CRSS, using these systems conflicts with clinicians’ ethical and epistemic obligations to the patient. According to them, this is one of the central philosophical challenges confronting big data and machine learning in medicine.
Similarly, in their ‘Barcelona declaration for the proper development and usage of artificial intelligence in Europe’, Sloane and Silva (2020) argue that decisions made by machine-learning AI are often opaque due to the black-box nature of the patterns derived by these techniques. This can lead to unacceptable bias. Therefore, they state that “When an AI system makes a decision, humans affected by these decisions should be able to get an explanation why the decision is made in terms of language they can understand and they should be able to challenge the decision with reasoned arguments” (ibid, 489).
These requirements for the use of AI systems are captured by machine-learning developers in the concept of explainable AI. The idea of explainable AI is that humans can understand how a CRSS has produced an outcome, for example by developing algorithms that are understandable by the users. This, however, might limit the level of complexity of the algorithm, and with that negate the possible benefits of using AI. In the case of clinical use it might not be necessary to understand the exact intricacies of the algorithm, but rather to have some insight into the factors that are important or decisive in arriving at a specific prediction or advice. What machine-learning algorithms do is learn to assign weights to features in the data, in order to make optimal predictions based on those data. For clinicians, it is important to know which features are considered relevant by the algorithm and how much weight is assigned to each feature. Having that information, a clinician can judge whether the features that a CRSS picks out are indeed relevant or not (for example, an artefact in an image or an unreliable measurement). In the optimal configuration, a clinician can also enter feedback into the system, allowing the algorithm to come up with an alternative prediction and to learn for future cases.
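As an illustration (the model weights, feature names and patient values below are invented), a per-prediction explanation of this kind could be as simple as listing each feature's contribution to a linear model's score, so that the clinician can see which factors were decisive for this particular patient. Real CRSS would typically use more elaborate explanation techniques.

```python
# Hypothetical learned weights of a simple linear (logistic) model.
weights = {"age": 0.02, "lesion_diameter_mm": 0.85, "asymmetry": 1.40, "ruler_present": 2.10}
intercept = -6.0

# Hypothetical values for one patient.
patient = {"age": 50, "lesion_diameter_mm": 6.5, "asymmetry": 0.6, "ruler_present": 1}

# Contribution of each feature to the score: weight * feature value.
contributions = {f: weights[f] * patient[f] for f in weights}
score = intercept + sum(contributions.values())

for feature, contribution in sorted(contributions.items(), key=lambda kv: -abs(kv[1])):
    print(f"{feature:20s} {contribution:+.2f}")
print(f"{'total score':20s} {score:+.2f}")
# A clinician who sees "ruler_present" among the largest contributions can flag
# it as an artefact rather than a clinically relevant factor.
```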
An advantage of using an explainable AI algorithm, assuming that a CRSS should be considered a clinical reasoning support system rather than a decision system, is that it helps clinicians to explicate their reasoning process. Important in this context is that medical expertise involves a great deal of tacit knowledge that can easily remain hidden in the clinical reasoning of these experts. We have argued that epistemological responsibility entails elucidating knowledge and reasoning that otherwise remain implicit. For clinicians, however, this can be quite challenging. Using a system that formalises aspects of the reasoning process and explicates which factors are combined, and with what weight, will support clinicians in developing their ability to articulate and justify their own reasoning process. This explicit understanding, in turn, can contribute to the communication between the clinician and the patient. The explanation enables patients to understand their clinician’s reasoning process and add to it, thus empowering them to take part in the decision-making process concerning their own medical care.

Establishing a link between the CRSS and the individual patient

Sullivan (2020) argues that it is not necessarily the complexity or black-box nature of a machine-learning algorithm that limits how much understanding it can provide. If an algorithm is to aid understanding of the target phenomenon by its user (such as a scientist or a clinician), it is more important to establish how key features of the algorithm map onto features of the real-world phenomenon. This is called empirical justification. Sullivan calls a lack of this type of justification link uncertainty. Link uncertainty can be reduced by collecting evidence that supports the connection between “the causes or dependencies that the model uncover to those causes or dependencies operating in the target phenomenon” (ibid, 6).
Consider, for example, an algorithm used to classify cases of skin melanoma (Esteva et al. 2017, as referred to by Sullivan), which was developed through machine learning on large numbers of images of healthy moles and melanomas. Because there is extensive background knowledge linking the appearance of moles to instances of melanoma, for example explaining why possible interventions are effective for lesions that look a certain way, “the model can help physicians gain understanding about why certain medical interventions are relevant, and using the model can help explain medical interventions to patients” (ibid, 23). This background knowledge links the mechanisms uncovered by the AI algorithm (i.e. predicting which treatments will be effective for which cases) to relevant mechanisms in the target phenomenon (i.e. a skin lesion that does or does not require treatment). Because of this link, empirical justification is established, and clinicians can use the algorithm to answer why-questions about skin lesions.
Concerning the transparency of algorithms, Sullivan contends that our understanding is quite limited if we know nothing whatsoever about the algorithms. She argues that some insight into the weighting used by the algorithm is needed. Therefore, as long as the model is not opaque at the highest level, that is to say, as long as there is some understanding of how the system is able to identify patterns within the data, it is possible to use a complex algorithm for understanding. What is needed is “some indication that the model is picking out the real difference makers (i.e., factors that matter) for identifying a given disease and not proxies, general rules of thumb, or artefacts within a particular dataset” (ibid, 21).
In our view, Sullivan identifies an important condition for the use of CRSS in clinical practice. Based on her analysis, we infer that it is important to ensure that the algorithm used by a CRSS (developed by data-driven AI) is linked to the target phenomenon by empirical, preferably scientifically supported, evidence. Sullivan has more general links in mind: that the algorithm can be used to understand the mechanisms of a target phenomenon in general. For clinical practice we would add another important link: a link between the algorithm and the individual patient whom the clinician intends to diagnose and treat. To establish this link and use a CRSS to better understand the individual patient, clinicians need to verify that 1) the type of outcome (i.e. the disease category) produced by the CRSS is consistent with the ‘picture’ of the patient that the clinician has constructed so far; 2) the data used to train the CRSS are relevant to the patient; and 3) the input required by the CRSS is available for the patient in question and of good quality.
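By way of a sketch only (the fields, categories and thresholds are hypothetical, and real checks would be far richer), these three conditions could be operationalized as a simple pre-use checklist that a clinician or a CRSS interface runs for one patient before relying on the system's output.

```python
def link_checks(crss_output, clinician_picture, training_cohort, patient_record):
    """Return the three checks (True/False) linking the CRSS to this patient."""
    return {
        # 1) Is the type of outcome consistent with the clinician's working picture?
        "outcome_consistent": crss_output["disease_category"] in clinician_picture["differential"],
        # 2) Is the training cohort relevant to this patient (here: age range only)?
        "training_data_relevant": (
            training_cohort["age_min"] <= patient_record["age"] <= training_cohort["age_max"]
        ),
        # 3) Are the required inputs available for this patient and of good quality?
        "input_available_and_good": all(
            patient_record.get(feature, {}).get("quality") == "good"
            for feature in crss_output["required_inputs"]
        ),
    }

# Hypothetical example of use.
print(link_checks(
    crss_output={"disease_category": "melanoma", "required_inputs": ["dermatoscopy_image"]},
    clinician_picture={"differential": ["melanoma", "benign nevus"]},
    training_cohort={"age_min": 18, "age_max": 90},
    patient_record={"age": 50, "dermatoscopy_image": {"quality": "good"}},
))
```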