Influences of early diagnostic suggestions on clinical reasoning
Cognitive Research: Principles and Implications volume 7, Article number: 103 (2022)
Previous research has highlighted the importance of physicians’ early hypotheses for their subsequent diagnostic decisions. It has also been shown that diagnostic accuracy improves when physicians are presented with a list of diagnostic suggestions to consider at the start of the clinical encounter. The psychological mechanisms underlying this improvement in accuracy have not been established. It is possible that the provision of diagnostic suggestions disrupts physicians’ intuitive thinking and reduces their certainty in their initial diagnostic hypotheses. This may encourage them to seek more information before reaching a diagnostic conclusion, evaluate this information more objectively, and be more open to changing their initial hypotheses. Three online experiments explored the effects of early diagnostic suggestions, provided by a hypothetical decision aid, on different aspects of the diagnostic reasoning process. Family physicians assessed up to two patient scenarios with and without suggestions. We measured effects on certainty about the initial diagnosis, information search and evaluation, and frequency of diagnostic changes. We did not find a clear and consistent effect of suggestions and detected mainly non-significant trends, some in the expected direction. We also detected a potential biasing effect: when the most likely diagnosis was included in the list of suggestions (vs. not included), physicians who gave that diagnosis initially tended to request less information, evaluate it as more supportive of their diagnosis, become more certain about it, and change it less frequently when encountering new but ambiguous information; in other words, they seemed to validate rather than question their initial hypothesis. We conclude that further research using different methodologies and more realistic experimental situations is required to uncover both the beneficial and biasing effects of early diagnostic suggestions.
Diagnostic errors can lead to patient harm and have been identified as a global priority by the World Health Organization. Suggesting diagnoses for physicians to consider before they start testing their own hypotheses has been found to increase accuracy in previous studies that used both online, rich clinical scenarios and simulated consultations with actors-as-patients. We explored hypothesised mechanisms of this phenomenon. We did not detect a clear and consistent effect of early diagnostic suggestions on physicians’ reasoning. In contrast, we detected a strong and consistent effect of confidence: when confidence about the initial diagnosis was high, it was followed by less extensive information search, more biased evaluations, fewer diagnostic changes, and fewer differential diagnoses. Thus, interventions that succeed in curbing physicians’ confidence in their initial diagnostic hypotheses may result in improved reasoning and greater accuracy. Furthermore, we raise the possibility that diagnostic suggestions may have an undesirable effect: when physicians see their own hypothesis amongst the suggestions, they may become more confident and more biased, validating rather than questioning their hypotheses.
Diagnosis is a core task in health care. It is particularly important in primary care, where patients present with new medical problems that need to be managed appropriately, without being indiscriminately subjected to invasive or expensive investigations (Singh et al., 2017). Failure to provide a timely and accurate diagnosis can have serious consequences for patients. Furthermore, it has been suggested that most people will experience at least one diagnostic error in their lifetime (IoM, 2015). It is therefore important to find ways of supporting medical diagnosis, as it is the task that leads to the greatest potential for serious medical error (Bhasale, 1998; Fisseni et al., 2008; Kostopoulou, 2006; Kostopoulou et al., 2008).
Medical diagnosis can be supported in different ways, including training, national guidelines, checklists, and decision aids. Decision aids are usually computerised tools, algorithms or online platforms that provide advice to support decision-making (Berner et al., 1999; Short, Frischer, & Bashford, 2003). In health care, this advice could range from risk calculation and procedural guidance to diagnostic suggestions and treatment plans.
Previous research has highlighted the importance of physicians’ initial hypotheses for their subsequent diagnostic judgements. For example, there is evidence that family physicians may miss cancers, if they do not consider them early on in the diagnostic process (Kostopoulou et al., 2017a). Thus, interventions to improve diagnosis could be more effective if they are employed early, before physicians start testing their own hypotheses. Based on this principle, Kostopoulou and colleagues (2015a, 2015b, 2017b) developed a computerised decision support tool, which provides diagnostic suggestions at the start of the consultation based on the patient’s demographics, risk factors and principal symptom. Two early studies evaluated the principle of early suggestions in two different countries (UK and Greece), with GPs diagnosing information-rich clinical scenarios online, where they could request information at will. A third study integrated the principle of early suggestions into a computerised diagnostic support tool and evaluated it in a high-fidelity simulation, where family physicians consulted with actors-as-patients (“standardised patients”). In all three studies, this type of decision support improved physicians’ diagnostic accuracy, without significantly increasing consultation time or number of investigations ordered (Kostopoulou et al., 2015a, 2015b, 2017a).
The mechanism by which early diagnostic suggestions impact clinical reasoning and improve diagnostic accuracy has not yet been explored. This is a crucial next step in the development of diagnostic aids. We need to understand why they are effective, i.e. how they influence physicians’ thinking, so that we can streamline and optimise them for clinical use.
Elstein, Shulman and Sprafka’s seminal studies found that physicians generate one or very few diagnostic hypotheses early in the consultation (i.e. within the first few seconds), based on minimal information (Elstein et al., 1978). Furthermore, the Hypothesis Generation (HyGene) model, a computational memory model, suggests that only a small number of hypotheses can be held in working memory due to memory constraints, and that these will guide the subsequent elicitation and interpretation of information (Thomas et al., 2008, 2014). These early hypotheses can, however, compromise the diagnostic process by crowding out other valid hypotheses, and exerting a disproportionate influence on what information is elicited and how it is interpreted (Brownstein, 2003). Thus, physicians may elicit information that is likely to confirm their focal hypothesis and/or interpret non-diagnostic information as supportive of that hypothesis. Even diagnostic information can be made to fit a coherent narrative that has developed during a consultation, as physicians, and perhaps even patients, search for cognitive consistency (Kostopoulou et al., 2009, 2012; Russo et al., 2008).
Coherent narratives induce greater confidence in judgement. “Confidence is a feeling, which reflects the coherence of the information and the cognitive ease of processing it” (Kahneman, 2011, p. 212). Thompson and colleagues (2011, 2013) suggested that people’s first intuitive judgements are accompanied by subjective confidence, a feeling that they are right. This can make people less open to alternative interpretations, less likely to seek additional information and re-evaluate their initial judgement and less likely to change it when appropriate (Desender et al., 2018; Thompson et al., 2011, 2013). These “feelings of rightness” about an initial intuitive judgement can determine whether, and to what extent, analytical reasoning will be activated.
Applying this to the clinical encounter, the first hypothesis that comes quickly to a physician’s mind will be accompanied by some degree of certainty, experienced as a feeling of rightness. If certainty is high, it could bias the subsequent diagnostic process and outcome. Indeed, overconfidence has been linked to diagnostic error (Berner & Graber, 2008; Friedman et al., 2005; Meyer et al., 2013). Berner and Graber (2008) suggest that physicians may develop an “illusion of validity”, which makes them overestimate the accuracy of their judgements (Einhorn & Hogarth, 1978). As a result, physicians often anchor on their initial diagnostic hypotheses and become less likely to seek advice or consider other possibilities (Arkes, 2013; Dreiseitl & Binder, 2005). When they do, they may selectively seek information that supports their hypotheses (Dani, Bowen-Carpenter, & McGown, 2019; Mendel et al., 2011), and/or distort this information in favour of their hypotheses (Kostopoulou et al., 2009, 2012; Leblanc et al., 2001, 2002; Nurek et al., 2014). Furthermore, studies have found physicians not to be well calibrated, i.e. their confidence did not match their accuracy (Dawson et al., 1993; Friedman et al., 2005; Meyer et al., 2013).
Presenting physicians with diagnostic alternatives early on could reduce unwarranted certainty by reminding them of other possibilities that they should consider. “Unpacking” hypotheses, i.e. presenting specific hypotheses in place of an “other” category, has been found to have a debiasing effect on diagnostic judgements and to reduce probability estimates attached to the focal hypothesis (Redelmeier et al., 1995). Furthermore, when physicians are presented with other possibilities, they may be more willing to seek further information before reaching a diagnostic conclusion, more cautious when they evaluate non-diagnostic information, and more likely to reconsider their initial diagnostic hypothesis.
Larrick categorised debiasing strategies into motivational, technological, and cognitive (Larrick, 2004). Motivational strategies try to leverage incentives, social norms, and accountability to improve decision-making and are related to the so-called choice architecture and nudging techniques (Dolan et al., 2012; Michie et al., 2011; Thaler & Sunstein, 2009). Technological strategies aim to improve decision-making through the use of algorithms and tools, such as decision analysis and computerised decision aids (Bhandari et al., 2008; Huang et al., 2012; Raiffa, 1968). Finally, cognitive strategies include training in logical rules, statistical reasoning and awareness of one’s own biases (Gigerenzer, 2015; Nisbett, 1993).
A well-known example of a cognitive debiasing strategy that encourages analytical reasoning is “consider-the-opposite”, a technique that directs attention towards disconfirming information and facilitates consideration of alternative hypotheses (Hirt & Markman, 1995; van Brussel et al., 2020). Along with decision aids, consider-the-opposite has been found to be one of the most effective debiasing strategies in health-related judgements (Ludolph & Schulz, 2017). For example, generating arguments that contradict one’s own hypotheses or favour alternative hypotheses has been found to reduce overconfidence (Haran, Moore, & Morewedge, 2010; Hirt & Markman, 1995; Koriat et al., 1980; McKenzie, 1997), anchoring (Mussweiler et al., 2000), confirmation bias (van Brussel et al., 2020), and hindsight bias (Arkes et al., 1988). Although it is classified as a cognitive debiasing strategy, research has shown that consider-the-opposite can be implemented through technological strategies as well, such as decision aids that provide alternative hypotheses for decision-makers to consider (Bhandari et al., 2008; Dreiseitl & Binder, 2005; Harada et al., 2021; Huang et al., 2012; Sibbald et al., 2021).
Based on these findings, we hypothesised that early provision of diagnostic suggestions would reduce certainty about an initial diagnostic hypothesis and, as a consequence, lead to more extensive information search, more balanced appraisal of information, and more frequent diagnostic changes. To test these hypotheses, we conducted three online experiments where UK family physicians assessed hypothetical patient scenarios and either received a list of diagnostic suggestions or received no such help. Specifically, in Experiment 1, we tested the effects of early diagnostic suggestions on initial certainty and diagnostic change. In Experiment 2, we tested the effects of the suggestions on information search and information evaluation. In Experiment 3, we investigated these effects after some modifications to the list of suggestions.
In all the experiments, we accounted for Actively Open-minded Thinking (AOT). AOT refers to a thinking style that involves adopting various perspectives and considering arguments that oppose one’s own beliefs (Baron, 2019), seeking more information and considering alternatives (Baron, 2008; Haran, Ritov, & Mellers, 2013). Baron (2006) describes AOT as “good thinking”. It is a model of rational thinking that has been termed Active Open-mindedness because it is a) “open” to alternative explanations that oppose an initial judgement and b) “active” in searching for evidence to disconfirm pre-established beliefs (Baron, 2006, 2019). Baron further suggested that AOT is a way to prevent various biases from occurring, including overconfidence and confirmation bias. A scale to measure AOT was initially developed by Stanovich and West (1997). In our research, we used a shorter and more recent version of the scale developed by Baron (2019). It consists of eleven statements that measure how people evaluate information and form their beliefs (see Procedure).
In this within-participants experiment, we explored a potential mechanism by which early diagnostic suggestions might impact reasoning. Specifically, we tested whether diagnostic suggestions reduce physicians’ certainty about their initial diagnostic hypothesis, resulting in more frequent diagnostic changes when physicians encounter new information that is not entirely consistent with the initial hypothesis.
Participants and sample size
We powered the study to detect differences in diagnostic certainty between control and experimental conditions in a multiple linear regression. Using the G*Power software, we estimated that 392 responses would be needed to detect a small effect (Cohen’s f2 = 0.05) with 80% power and alpha of 0.05. To account for data clustering (each physician responding to two scenarios), we adjusted this number by the Design Effect (DE) (Barratt, Kirwan, & Shantikumar, 2018). This is calculated using the formula DE = 1 + (n–1)*ICC, where n is the cluster size (the two scenarios) and ICC is the intra-class correlation. The ICC of the original study was 0.05 (Kostopoulou et al., 2015b). Thus, DE = 1.05. We adjusted the number of participants required by multiplying the 392 required responses by the DE and dividing by the cluster size: (392*1.05)/2 = 205.8. Thus, we estimated that we needed to recruit 206 physicians.
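The adjustment above can be sketched as a short calculation (a minimal illustration; the function name and the choice to round up are ours):

```python
import math

def participants_needed(required_responses: int, cluster_size: int, icc: float) -> int:
    """Inflate the required number of responses by the Design Effect
    DE = 1 + (n - 1) * ICC, then convert responses to participants
    by dividing by the number of responses each participant provides."""
    design_effect = 1 + (cluster_size - 1) * icc
    return math.ceil(required_responses * design_effect / cluster_size)

# Experiment 1: 392 responses, 2 scenarios per physician, ICC = 0.05
print(participants_needed(392, 2, 0.05))  # (392 * 1.05) / 2 = 205.8, rounded up to 206
```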
We recruited fully qualified family physicians and trainees in family medicine, currently practising in England, using a database of family physicians who had participated in previous studies by the research group. We offered them a £10 Amazon voucher for their participation.
We used two patient scenarios, initially developed by Kostopoulou and colleagues (2017a) and adapted for the purposes of this experiment. One scenario depicted a patient presenting with chest pain, the other a patient presenting with breathlessness. For each scenario, we used the list of diagnostic suggestions prepared by Kostopoulou and colleagues (2017a): 18 diagnoses for one scenario and 23 for the other. The full scenarios with their lists of diagnostic suggestions are presented in the Additional file 1: S1.
The experiment was conducted online and was administered using the Qualtrics platform. Participating physicians were sent an invitation e-mail that contained a brief description of the study and a hyperlink via which they could access the study website. Upon accessing the study, physicians read the information sheet and provided their consent. Subsequently, they were asked to indicate their gender (male or female) and their professional status (fully qualified family physician or trainee). Fully qualified physicians were asked to provide their year of training completion. They were then presented with two patient scenarios in a random order. Only in one of these scenarios did participants receive diagnostic suggestions (the “Aided” condition). The provision of diagnostic suggestions was counterbalanced: half of the physicians received the suggestions in the first scenario and the other half in the second scenario. The Qualtrics randomiser ensured that each scenario was presented with and without diagnostic suggestions an equal number of times. Both scenarios were presented in two steps: at step one, physicians saw a brief patient description suggestive of a specific diagnosis. The description contained patient demographics, risk factors and the presenting problem. Participants were then asked to provide their initial diagnosis in a text box and indicate how certain they were on a visual analogue scale (VAS) ranging from 0 (Not at all certain) to 10 (Absolutely certain). At this stage, if the scenario was in the Aided condition, participants were instructed to read a list of diagnostic suggestions. They were told: “A decision aid trialled at your practice makes these diagnostic suggestions about the patient (in alphabetical order)”. To ensure that participants read the list, we used a timer that prevented physicians from progressing until 10 s had passed.
At step two, participants were presented with additional information about the patient, including physical examination results. This information was somewhat ambiguous; it was consistent with the diagnosis suggested by the initial information but could also suggest other diagnoses. Participants were then asked to update their certainty about their initial diagnosis and provide their final diagnosis and certainty about the final diagnosis. Finally, they were asked to indicate whether they would order investigations and, if so, to specify which ones (in free text). After completing the two patient scenarios, participants completed the AOT scale (Additional file 1: S2). For each AOT statement, they indicated their agreement on a 5-point scale, ranging from “Completely disagree” to “Completely agree”.
We computed change in initial certainty by subtracting the second measurement of initial certainty (elicited at step two) from the first measurement (elicited at step one). Change in diagnosis was a dichotomous variable (Yes/No) indicating whether the initial and final diagnoses differed. To determine this, we first standardised and classified diagnoses into diagnostic categories, under the guidance of the clinician co-author (BD), an experienced family physician, who was blinded to the experimental condition. For instance, angina, coronary artery disease, and ischaemic heart disease were classified as heart disease (see Additional file 1: S3). We then followed two criteria to determine whether a diagnostic change had occurred:
In case of a single initial diagnosis, any change in diagnosis, including a switch to a different diagnosis or an addition of a new diagnosis, counted as a change.
In case of multiple initial diagnoses, any change in the diagnostic set, such as the addition or removal of diagnoses, counted as a change.

Number of investigations was a count variable indicating the number of different tests that physicians ordered in each scenario.
We regressed change in certainty, change in diagnosis, and number of investigations on condition (Aided vs. Control) in multilevel regression models with a random intercept by physician, controlling for initial certainty and AOT score. In a separate model, we added a variable indicating whether the suggestions were provided in the scenario seen first or second (Condition order). We used linear regression for continuous measures, logistic regression for dichotomous measures, and Poisson regression for the count variable. Table 1 presents a summary of the results across the two conditions.
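As a rough sketch, the random-intercept model for a continuous outcome could be specified in Python with statsmodels (simulated data; the variable names, effect sizes, and package choice are ours, not the study's):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_physicians = 100

# Simulated long-format data: each physician assesses two scenarios,
# one in the Control and one in the Aided condition
physician = np.repeat(np.arange(n_physicians), 2)
aided = np.tile([0, 1], n_physicians)
initial_certainty = rng.uniform(2, 9, size=2 * n_physicians)
aot = np.repeat(rng.normal(3.8, 0.4, size=n_physicians), 2)
physician_intercept = np.repeat(rng.normal(0, 0.5, size=n_physicians), 2)
certainty_change = (-0.7 - 0.25 * aided - 0.15 * initial_certainty
                    + physician_intercept + rng.normal(0, 1, size=2 * n_physicians))

df = pd.DataFrame({"physician": physician, "aided": aided,
                   "initial_certainty": initial_certainty,
                   "aot": aot, "certainty_change": certainty_change})

# Linear multilevel model with a random intercept per physician,
# mirroring the specification described for the continuous outcomes
model = smf.mixedlm("certainty_change ~ aided + initial_certainty + aot",
                    data=df, groups=df["physician"])
result = model.fit()
print(result.params["aided"])  # estimated fixed effect of the Aided condition
```

Dichotomous outcomes (change in diagnosis) would require a mixed-effects logistic model and the count outcome a Poisson model; the sketch covers only the linear case.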
We recruited 217 family physicians. Twenty-one did not provide complete responses and two did not provide analysable diagnoses (e.g. “uncertain”); these 23 physicians were excluded from the analyses. We analysed the responses of the remaining 194 physicians, of whom 101 (52%) were male and two were trainees. The sample’s average experience was 12.63 years in family medicine (SD 9.10), ranging from 0 to 41 years (median 10 years). Participation lasted on average 10 min and 7 s (range 3 to 34 min, median 9 min). Table 1 presents descriptive statistics for the Control and Aided conditions separately.
Change in initial certainty
Across conditions, certainty about the initial diagnosis dropped by an average of 0.85 units on the 0–10 scale. As expected, this reduction was greater in the Aided condition vs. Control (means − 0.99 vs. − 0.72); however, it was not significant (b = − 0.239, [− 0.57, 0.09], p = 0.154). The higher the initial certainty, the less it dropped after additional information was provided (b = − 0.149, [− 0.25, − 0.04], p = 0.007). When the suggestions were provided in the scenario seen first (vs. second), there was a greater reduction in certainty overall (means − 1.09 vs. − 0.63, b = − 0.42 [− 0.76, − 0.09], p = 0.013), suggesting a possible spillover effect (from Aided to Control condition). We detected no association with the AOT score (b = − 0.294, [− 0.82, 0.23], p = 0.272). The regression table is available in Additional file 1: S4.
Figure 1 shows the mean change in certainty by Condition and scenario, where the bar height indicates the extent of change. We can see that in the breathlessness scenario, there was a greater difference in the extent of change between the two conditions than in the chest pain scenario. Indeed, subgroup analyses found that in the breathlessness scenario, initial certainty dropped significantly more in the Aided condition than Control (b = − 0.41, [− 0.73, − 0.09], p = 0.013) but this difference was not significant in the chest pain scenario (b = − 0.16, [− 0.74, 0.42], p = 0.583).
Change of initial diagnosis
Physicians changed their initial diagnosis 48.45% of the time. Results were in the expected direction and consistent with the results on change in certainty. Specifically, diagnostic changes were more frequent when suggestions were provided (50% vs. 46.90%), but not significantly so (OR = 1.196, [0.79, 1.80], p = 0.389). Initial certainty was negatively associated with change in diagnosis (OR = 0.799, [0.70, 0.91], p = 0.001). When suggestions were provided in the scenario seen first (vs. second), the initial diagnosis changed more frequently (OR = 1.637, [1.08, 2.47], p = 0.019), again suggesting a spillover effect. There was no association with the AOT score (OR = 1.339, [0.71, 2.53], p = 0.367). The regression tables are available in Additional file 1: S4.
Figure 2 shows that in the chest pain scenario, diagnoses changed equally frequently in both conditions, whereas in the breathlessness scenario, there were more changes in the Aided condition, consistent with the greater reduction in certainty seen above. However, subgroup analyses did not detect a significant difference in the frequency of diagnostic changes between conditions in that scenario (OR = 1.38, [0.74, 2.54], p = 0.305).
There was no significant difference between conditions in the number of investigations ordered (Control: 3.15 vs. Aided 3.18, IRR = 0.993, [0.89, 1.11], p = 0.907). Whether the suggestions were provided in the scenario seen first or second was not associated with the number of investigations (IRR = 1.00, [0.97, 1.04], p = 0.714). We detected a positive association between AOT score and number of investigations (IRR = 1.47, [1.23, 1.75], p < 0.001). The regression table is presented in Additional file 1: S4.
We did not find that early suggestions significantly and consistently reduced diagnostic certainty or led to significantly more diagnostic changes compared to control. We did however find trends in the expected direction for both outcome variables, and a significant difference between conditions in one scenario. Active open-mindedness was associated with more investigations, as expected, but with no other outcome variable. Although the ambiguous additional information led to reductions in certainty across the board, when initial certainty was high, it was more resistant to change, and it was accompanied by fewer diagnostic changes.
There are several plausible reasons why diagnostic suggestions did not influence physicians’ responses significantly. Firstly, unexpected spillover effects from Aided to Control conditions may have compromised our ability to detect an effect. Secondly, it is possible that the patient scenarios did not have the anticipated result of inducing high initial certainty; this was, in fact, rather moderate, at around the midpoint of the scale. It is possible that physicians interpreted the initial information differently from what we had intended, given that physicians did not always provide the most likely initial diagnosis that each scenario suggested. For instance, in the chest pain scenario where the most likely diagnosis was musculoskeletal chest pain (Additional file 1: S3), physicians provided a different diagnosis 25% of the time (e.g. infection, pulmonary embolism, pericarditis). In the breathlessness scenario, responses were less variable (less than 10%). Importantly, we do not know to what extent physicians took the list of suggestions into account, since some commented that they found it lengthy, distracting, and confusing. Finally, it is possible that physicians approached the scenarios with an analytical mindset from the start, in the knowledge that they were being studied, which could reduce differences between conditions. We attempted to overcome some of these limitations in Experiment 2.
In this experiment, we used two new patient scenarios with strong signals aiming to induce higher initial certainty than in Experiment 1. Each scenario had two versions, one suggesting a serious and the other a less serious disease. This was to ensure that any effect of the diagnostic suggestions was not limited by the severity of the initial diagnosis. This also gave us the opportunity to investigate whether severity of the initial diagnosis was associated with subsequent information search and evaluation. We edited the list of diagnostic suggestions to maximise its impact, by removing the least likely diagnoses and merging any overlapping diagnoses. The number of diagnostic suggestions was therefore reduced and was equal in both scenarios (12 suggestions). Participants were able to request information about the patient by choosing from a list of clinical cues that were designed to provide non-diagnostic information. Thus, in addition to diagnostic certainty and diagnostic changes, we were able to measure information search, that is, the number of cues requested before a final diagnosis was given. We also measured information evaluation, by asking participants to rate the degree to which each requested cue supported their initial diagnosis. Diagnostic suggestions were always provided in the scenario seen last, to avoid spillover effects.
Participants and sample size
We powered the study to detect differences in cue ratings between Control and Aided conditions in a multiple linear regression. Using the G*Power software, we estimated that to detect a small effect (f2 = 0.02, smaller than in Experiment 1) with 80% power and alpha of 0.05, we would need 485 responses. To account for data clustering (each physician responding to two scenarios), we adjusted this number by the Design Effect (see Experiment 1) and estimated that we would need to recruit 254 physicians.
Participants were fully qualified family physicians and trainees in family medicine, currently practising in England. They were recruited through the National Institute for Health Research (NIHR) Clinical Research Network (CRN) of North-West London (www.local.nihr.ac.uk/lcrn/north-west-london) and were offered a £10 Amazon voucher as compensation for their time.
Two patient scenarios, previously developed by Kostopoulou and colleagues (2009, 2017a), were adapted for the purposes of this experiment. One scenario described a patient presenting with chest pain, the other a patient presenting with constipation. We constructed two versions of each scenario, one indicating a non-serious and the other a serious disease. In the non-serious-disease version, the patient’s initial information suggested a benign and common diagnosis (i.e. musculoskeletal chest pain in one scenario, irritable bowel syndrome in the other). The serious-disease version suggested a serious and less common diagnosis (i.e. angina in one scenario, colorectal cancer in the other). Each scenario also contained seven information items, identical in both versions, which participants could request. These were designed to be non-diagnostic. The scenarios with their respective lists of 12 diagnostic suggestions are presented in Additional file 1: S5.
Potential participants were sent an invitation e-mail, containing a brief description of the experiment, as well as a hyperlink to access the study website. Upon accessing the study, physicians read the information sheet and provided their consent. They were then asked to indicate their gender (male, female, other), professional status (fully qualified family physician or trainee in family practice) and, if fully qualified, the year of qualification. Physicians were then presented with the patient scenarios in a random order. Scenario version (non-serious-disease vs. serious-disease) was assigned at random. The scenario seen first was presented without diagnostic suggestions (Control condition); the scenario seen second was presented with diagnostic suggestions (Aided condition). Each scenario was presented in two steps. Initially, participants read a short patient description including information such as age, body mass index (BMI), smoking and medical history, last consultation and presenting problem. They were then asked to provide their initial diagnosis in a text box and indicate how certain they were on a scale ranging from 0 (Not at all certain) to 10 (Absolutely certain). Subsequently, and in the Aided condition only, they were presented with a list of 12 diagnostic suggestions.
At step 2, all participants were given the opportunity to request up to seven additional items of information about the patient from a list of labelled clinical cues (e.g. general physical examination, family history of significant illness, pain intensity, previous episodes, other symptoms). Each cue contained neutral information with minimal diagnostic value (e.g. no family history of illness, normal resting electrocardiogram, no blood in stool). The list of cues was identical for both versions of a scenario. For each cue, participants were asked to rate how much the information supported their initial diagnosis, on a scale from 0 (No support) to 10 (Strong support). After each cue rating, they were asked to indicate whether they wished to provide a final diagnosis or request more cues. If they opted to request more cues, they were offered the remaining cues and asked to make a selection. If they indicated that they had settled on a diagnosis, or if they had requested all of the available cues, they were given the opportunity to review the patient case (i.e. the initial patient description plus any cues that they had requested) and then were asked to (1) update their certainty in the initial diagnosis; (2) provide their final diagnosis in free text; (3) rate their certainty in their final diagnosis; and (4) list the differential diagnoses that they were considering, if any. Finally, participants completed the AOT scale.
We expected that as physicians elicited more information by selecting from the list of cues, their certainty would increase rather than decrease, because the cues did not contradict the initial patient description; they were simply non-diagnostic, i.e. did not provide evidence in support of the steered diagnosis. We did however expect that in the Aided (vs. Control) condition, this increase in certainty would be smaller. Furthermore, in the Aided condition, we expected that more cues would be requested; they would be rated as less supportive of the physicians’ initial diagnosis; and initial diagnoses would change more frequently.
Change in initial certainty, change in diagnosis, and AOT scores were computed as per Experiment 1. Number of cues requested was a count variable with possible values from one to seven, which was the maximum number of cues available to physicians. Perceived cue support (measured on a 0–10 scale) was averaged across elicited cues per physician. Number of differential diagnoses was a count variable corresponding to the number of alternative diagnoses that physicians recorded after providing their final diagnosis. Severity of initial diagnosis was coded as “1” for serious and “0” for non-serious diagnoses.
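For concreteness, the derivation of these per-physician variables can be sketched in pandas; all column names and values here are illustrative, not taken from the study data:

```python
import pandas as pd

# Toy cue-rating data in long format: one row per elicited cue.
ratings = pd.DataFrame({
    "physician": [1, 1, 1, 2, 2],
    "cue_support": [6, 7, 5, 8, 9],    # 0-10 support rating per cue
})
# One row per physician: certainty before and after reviewing cues.
certainty = pd.DataFrame({
    "physician": [1, 2],
    "certainty_initial": [6, 8],
    "certainty_updated": [7, 8],
    "dx_serious": ["serious", "non-serious"],
})

derived = certainty.assign(
    # change in certainty about the initial diagnosis
    certainty_change=certainty["certainty_updated"] - certainty["certainty_initial"],
    # severity of initial diagnosis: 1 = serious, 0 = non-serious
    severity=(certainty["dx_serious"] == "serious").astype(int),
).merge(
    # perceived cue support averaged across elicited cues per physician
    ratings.groupby("physician", as_index=False)["cue_support"].mean(),
    on="physician",
)
print(derived[["physician", "certainty_change", "severity", "cue_support"]])
```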
As in Experiment 1, we standardised and classified diagnoses into diagnostic categories (Additional file 1: S6). We also categorised participants’ initial diagnoses based on their severity as either serious or non-serious. We regressed the outcome variables (number of items requested, perceived cue support, change in certainty, change in diagnosis, number of differential diagnoses) on Condition (Aided vs. Control) in multilevel regressions with random intercept per participant, controlling for initial certainty, severity of initial diagnosis, and AOT score. We used linear regression for continuous measures, logistic regression for dichotomous measures and Poisson regression for the count variables.
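As an illustration of these models, a minimal sketch of the linear multilevel regression in Python with statsmodels follows; the data are simulated and all variable names and effect sizes are our assumptions, not the study's (the multilevel logistic and Poisson variants would require other tools, e.g. R's lme4):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(42)

# Simulated data: two responses (Control, Aided) per physician,
# mirroring the within-participants design.
n_phys = 100
df = pd.DataFrame({
    "physician": np.repeat(np.arange(n_phys), 2),
    "aided": np.tile([0, 1], n_phys),               # condition indicator
    "init_certainty": rng.integers(3, 9, n_phys * 2),
    "serious": rng.integers(0, 2, n_phys * 2),      # severity of initial diagnosis
    "aot": rng.normal(4.5, 0.5, n_phys * 2),        # AOT score
})
# Outcome built with assumed effects plus a per-physician random intercept.
phys_effect = np.repeat(rng.normal(0, 1, n_phys), 2)
df["cue_support"] = (5 + 0.3 * df["aided"] + 0.5 * df["init_certainty"]
                     - 2 * df["serious"] + phys_effect
                     + rng.normal(0, 1, n_phys * 2))

# Linear multilevel model: random intercept per participant,
# controlling for initial certainty, severity, and AOT score.
result = smf.mixedlm(
    "cue_support ~ aided + init_certainty + serious + aot",
    data=df, groups=df["physician"],
).fit()
print(result.summary())
```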
We recruited 273 physicians. Fifteen of them did not complete the task and another 10 provided non-analysable responses and were therefore excluded from the analyses. Of the remaining 248 participants, 115 were males (46.40%) and 223 were fully qualified family physicians (89.90%) with on average 10.44 years’ clinical experience post-qualification (SD 9.06, range 0 to 44 years, median 9 years). Participation lasted on average 16 min (range 3 to 33 min, median 13 min). Table 2 presents descriptive statistics for the Control and Aided conditions separately. Figure 3 presents a summary of results for the main variables of interest by severity of initial diagnosis. GPs provided the expected diagnosis 78.83% of the time (391/496, Table 3).
Number of cues requested
On average, participants requested 4.26 cues (SD 1.97). Contrary to expectations, they requested slightly fewer cues when they received diagnostic suggestions than when they did not (4.16 vs. 4.37), and this difference was not significant (IRR = 0.950, [0.87, 1.03], p = 0.235). There was a negative association with initial certainty (IRR = 0.918, [0.90, 0.94], p < 0.001); a positive association with AOT scores (IRR = 1.370, [1.22, 1.53], p < 0.001); and no association with severity of initial diagnosis (IRR = 1.055, [0.97, 1.15], p = 0.228). The regression table is presented in Additional file 1: S7.
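To unpack these coefficients: an incidence-rate ratio is the exponentiated Poisson coefficient, so the values reported above translate directly into percentage changes in the expected number of cue requests (a quick check using the reported IRRs):

```python
import math

# IRRs reported above (exponentiated Poisson coefficients)
irr_certainty = 0.918   # per unit of initial certainty
irr_aot = 1.370         # per unit of AOT score

# Percentage change in expected cue requests per unit of the predictor
print(f"certainty: {(irr_certainty - 1) * 100:+.1f}%")  # -8.2%: fewer cues per unit
print(f"AOT:       {(irr_aot - 1) * 100:+.1f}%")        # +37.0%: more cues per unit

# The raw Poisson coefficient is recovered as log(IRR)
b_certainty = math.log(irr_certainty)
```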
Perceived cue support
Across scenarios and conditions, the average perceived cue support was 6.11 (SD 1.97) on the 0–10 scale. Contrary to our hypothesis, cue information was rated as more supportive of the initial diagnoses in the Aided (vs. Control) condition (b = 0.367, [0.08, 0.66], p = 0.013). When physicians provided a non-serious (vs. serious) initial diagnosis, they perceived the cue information as more supportive of it (b = − 2.05, [− 2.39, − 1.71], p < 0.001). There was a positive association between perceived cue support and initial certainty (b = 0.517, [0.42, 0.61], p < 0.001). No association with the AOT score was detected (b = 0.041, [− 0.43, 0.51], p = 0.864). The regression table is presented in Additional file 1: S7.
Change in initial certainty
After additional cues were obtained, we measured a small change in certainty about the initial diagnosis and no significant difference between conditions (b = 0.034, [− 0.27, 0.34], p = 0.829). The change was smaller for serious than non-serious initial diagnoses (b = − 1.413, [− 1.74, − 1.09], p < 0.001). In fact, certainty about serious initial diagnoses reduced on average (mean − 0.41, SD 1.80), while it increased on average for non-serious diagnoses (mean 1.24, SD 1.89). No association with AOT score was detected (b = 0.101, [− 0.30, 0.50], p = 0.618). The regression table is presented in Additional file 1: S7.
Change of initial diagnosis
Overall, participants changed their initial diagnosis 17.34% of the time. Contrary to expectations, changes were more frequent in the Control than the Aided condition (Table 2). However, the difference was not significant (OR = 0.816, [0.50, 1.33], p = 0.411). Serious initial diagnoses were more likely to change than non-serious ones (OR = 2.50, [1.50, 4.18], p < 0.001). Initial certainty was negatively associated with diagnostic change (OR = 0.680, [0.59, 0.78], p < 0.001). No association with the AOT score was detected (OR = 1.028, [0.56, 1.90], p = 0.929). The regression table is presented in Additional file 1: S7.
Number of differential diagnoses
On average, physicians recorded 1.28 differential diagnoses (SD 1.13, range 0 to 7 diagnoses, mode 1), with no significant difference between conditions (IRR = 1.062, [0.91, 1.23], p = 0.430). There was no association with the severity of initial diagnosis (IRR = 0.869, [0.74, 1.01], p = 0.074) and a positive association with the AOT score (IRR = 1.212, [1.00, 1.47], p = 0.049). Higher initial certainty was associated with fewer differential diagnoses (IRR = 0.951, [0.91, 0.99], p = 0.018).
Experiment 2 did not detect the expected effects of the diagnostic suggestions on the measures of interest. Contrary to expectations, physicians requested slightly fewer cues when they received diagnostic suggestions than when they did not, though the difference was not significant. Also contrary to our hypothesis, they rated cues as significantly more supportive of their initial diagnoses in the Aided condition. Furthermore, they changed diagnosis less frequently than in the Control condition, but this difference was not significant.
The severity of the initial diagnosis seemed an important factor in how information was evaluated. Certainty about non-serious initial diagnoses increased significantly, probably supported by cue evaluation that was more biased than for serious initial diagnoses. Moreover, non-serious diagnoses changed less frequently than serious ones. Given that non-serious diagnoses are common in primary care and that the cue information had minimal diagnostic value, it is not surprising that physicians maintained them, though we had expected more changes in the Aided condition.
As in Experiment 1, the main driver of behaviour was initial diagnostic certainty: higher certainty was associated with fewer cue requests, more biased cue evaluation (i.e. higher perceived cue support), fewer diagnostic changes and fewer diagnostic alternatives offered. A change in physicians’ certainty about their initial diagnostic hypotheses could, therefore, have a ripple effect on the diagnostic process; the Aided condition failed to produce such a change.
We cannot attribute this failure to idiosyncrasies of the scenarios, since by and large they worked as expected. To address the limitations of Experiment 1 (specifically, to reduce the variability in initial diagnostic hypotheses and induce higher initial certainty), we included stronger signals in the initial patient description. Indeed, most participants gave the initial diagnoses that we expected and did not change them, even though initial certainty was still moderate. Furthermore, we improved the list of diagnostic suggestions, removing unlikely and superfluous diagnoses, to increase its perceived relevance, usefulness and impact. We precluded potential spillover effects by ensuring that the Aided condition always appeared last. Finally, we introduced new measures of information search and evaluation to increase the chance of detecting an effect of diagnostic suggestions. Still, we observed no effect of the suggestions in the expected direction.
It is possible that the list of suggested diagnoses did not serve as intended, i.e. to create doubt in physicians about their initial diagnosis by presenting them with diagnostic alternatives. Instead, it may have served as a means for them to validate their own initial hypotheses. In most cases, the physicians’ initial diagnoses were included in the list of suggestions. Seeing their own diagnosis on the list may have reassured them that it was correct. As Ridderikhoff and van Herk (1999, p. 98) pessimistically put it, “The look at the ddx list seems to serve only one purpose: the verification of the diagnostic assumption”. We tested this hypothesis in Experiment 3.
The present experiment tested the possibility that physicians used the suggestions to validate their initial diagnosis rather than to question it. We speculated that the inclusion of physicians’ own diagnoses in the list of suggestions could have been a confirmatory and reassuring sign that they were following the correct diagnostic path. We therefore removed the most likely diagnosis, which most participants initially gave, from the list of suggested diagnoses, and asked participants to “also consider the following possible diagnoses for the patient”. We recruited an entirely new sample of family physicians for Experiment 3. They saw only one scenario, always with diagnostic suggestions. All other procedural aspects remained the same as in Experiment 2. We compared the responses of the new sample with the responses provided in the Control and Aided conditions of Experiment 2. We expected that removing the most likely diagnosis from the list of suggestions would affect diagnostic certainty, causing either a reduction or a smaller increase than in Experiment 2. Associations with severity of initial diagnoses and AOT scores were also explored.
Participants and sample size
Participants were qualified family physicians and trainees in family medicine, currently practising in England. We aimed for the same analysable sample size as in Experiment 2 (N = 248). Participants were recruited through the North-West London Clinical Research Network and were offered a £10 Amazon voucher for their participation.
We used the same materials as in Experiment 2. The only difference was that the list of diagnostic suggestions did not include the most likely diagnosis for each scenario version. Specifically, in the serious-diagnosis version of the chest pain scenario, angina was removed from the list; in the non-serious-diagnosis version, musculoskeletal chest pain was removed from the list. For the constipation scenario, colorectal cancer was removed from the list in the serious-diagnosis version and irritable bowel syndrome was removed from the list in the non-serious-diagnosis version.
Participants were randomly assigned to view one of the four possible scenario versions, always with diagnostic suggestions. Each scenario version was seen an approximately equal number of times. In all other respects, the procedure was identical to the Aided condition of Experiment 2.
The data from Experiments 2 and 3 formed three conditions:
Control condition of Experiment 2 (scenarios seen without diagnostic suggestions).
Aided condition of Experiment 2 (“Aided 1”) (scenarios seen with diagnostic suggestions, including the most likely diagnosis).
Aided condition of Experiment 3 (“Aided 2”) (scenarios seen with diagnostic suggestions, excluding the most likely diagnosis).
Experiment 2 had followed a within-participants design, where physicians saw two scenarios, one with and the other without suggestions. Therefore, Control and Aided 1 responses were not independent. Aided 2 responses were elicited from an entirely different sample of physicians, who saw one of the two scenarios used in Experiment 2. For this reason, we conducted two separate comparisons: Control vs. Aided 2 and Aided 1 vs. Aided 2. As in Experiments 1 and 2, we excluded non-analysable responses. We also excluded from all conditions (Control, Aided 1, Aided 2) responses where the physician did not provide the most likely diagnosis as their initial diagnosis. This restricted all comparisons to physicians whose initial diagnosis was the most likely diagnosis, i.e. the diagnosis included in the list of suggestions in Aided 1 but removed from it in Aided 2; only these physicians could have used the list to validate their initial hypothesis.
As in Experiment 2, we simplified physicians’ diagnoses (Additional file 1: S6) and measured the same variables (initial certainty, number of cues requested, perceived cue support, change in certainty, change in diagnosis, certainty in final diagnosis, and number of differential diagnoses) using the same scales. We ran the same analyses using linear, logistic and Poisson regression models, but this time as simple regression models (not multilevel), because each comparison (Aided 2 vs. Control and Aided 2 vs. Aided 1) included one response per physician.
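A sketch of the three single-level model families in statsmodels (simulated data; variable names and the data-generating assumptions are ours, for illustration only):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)

# One response per physician (between-participants comparison).
n = 200
df = pd.DataFrame({
    "aided2": rng.integers(0, 2, n),           # 1 = Aided 2, 0 = comparison condition
    "init_certainty": rng.integers(3, 9, n),
    "serious": rng.integers(0, 2, n),
    "aot": rng.normal(4.5, 0.5, n),
})
df["delta_certainty"] = rng.normal(0.3, 1.8, n)                        # continuous
df["changed_dx"] = rng.binomial(1, 0.2, n)                             # dichotomous
df["n_cues"] = rng.poisson(np.exp(1.6 - 0.08 * df["init_certainty"]))  # count

covariates = "aided2 + init_certainty + serious + aot"

# Linear regression for the continuous measure (change in certainty)
lin = smf.ols(f"delta_certainty ~ {covariates}", df).fit()

# Logistic regression for the dichotomous measure (diagnosis changed);
# odds ratios are the exponentiated coefficients
logit = smf.logit(f"changed_dx ~ {covariates}", df).fit(disp=0)
odds_ratios = np.exp(logit.params)

# Poisson regression for the count measure (cues requested);
# incidence-rate ratios are the exponentiated coefficients
pois = smf.poisson(f"n_cues ~ {covariates}", df).fit(disp=0)
irr = np.exp(pois.params)
```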
We recruited 258 family physicians. Eight of them were excluded due to incomplete data and another two due to non-analysable responses. Of the remaining 248 participants, 58 physicians (23.40%) did not provide the expected diagnoses and were also excluded. The final sample consisted of 190 participants: 74 males (38.90%) and 172 physicians fully qualified in family medicine (90.50%), with average experience of 10.33 years (SD 8.74, range 0–36 years, median 7 years). After we also excluded physicians’ responses that did not contain the expected diagnoses in Experiment 2, a total of 200 Control responses and 191 Aided 1 responses were included in the analyses. Participation lasted on average 13 min (range 3 to 33 min, median 10 min). Table 4 presents descriptive statistics by condition. Figure 4 presents a summary of results by condition for the main variables of interest.
Number of cues requested
On average, participants requested 4.30 cues per scenario (SD 1.98), with the most cue requests in the Aided 2 condition. No significant differences between conditions were detected (Control vs. Aided 2: IRR = 0.959, [0.87, 1.06], p = 0.389 and Aided 1 vs. Aided 2: IRR = 1.015, [0.92, 1.12], p = 0.762). As in Experiment 2, initial certainty was negatively associated with cue requests in both comparisons (Control vs. Aided 2: IRR = 0.906, [0.88, 0.93], p < 0.001 and Aided 1 vs. Aided 2: IRR = 0.915, [0.89, 0.94], p < 0.001). Similarly, AOT scores were positively associated with cue requests, but this was significant only in one comparison (Control vs. Aided 2: IRR = 1.189, [1.04, 1.35], p = 0.008 and Aided 1 vs. Aided 2: IRR = 1.113, [0.97, 1.27], p = 0.115). There were no associations with severity of initial diagnosis (Control vs. Aided 2: IRR = 0.997, [0.90, 1.10], p = 0.957 and Aided 1 vs. Aided 2: IRR = 1.042, [0.95, 1.15], p = 0.400). The regression tables are presented in Additional file 1: S8.
Perceived cue support
Perceived cue support in the Control condition was highly variable between scenarios and within scenario versions (Additional file 1: S9). In the chest pain scenario, the mean ratings ranged from 4.94 (angina version) to 6.32 (musculoskeletal chest pain version), whereas in the constipation scenario, the ratings ranged from 3.16 (colorectal cancer version) to 7.29 (irritable bowel syndrome version), on the 0–10 scale. Average perceived cue support was 5.65 (SD 2.35), the highest being in the Aided 1 condition. There were no significant differences between conditions (Control vs. Aided 2: b = 0.100, [− 0.10, 0.30], p = 0.335 and Aided 1 vs. Aided 2: b = − 0.230, [− 0.60, 0.14], p = 0.226). As in Experiment 2, non-serious (vs. serious) initial diagnoses were accompanied by more biased evaluation of information in both comparisons (Control vs. Aided 2: b = − 2.709, [− 3.13, − 2.29], p < 0.001 and Aided 1 vs. Aided 2: b = − 2.074, [− 2.45, − 1.70], p < 0.001). As in Experiment 2, initial certainty was positively associated with cue evaluations, i.e. higher perceived cue support, in both comparisons (Control vs. Aided 2: b = 0.461, [0.34, 0.58], p < 0.001 and Aided 1 vs. Aided 2: b = 0.452, [0.34, 0.56], p < 0.001). No associations with the AOT score were detected (Control vs. Aided 2: b = 0.189, [− 0.34, 0.72], p = 0.485 and Aided 1 vs. Aided 2: b = 0.403, [− 0.10, 0.904], p = 0.115). The regression tables are presented in Additional file 1: S8.
Change in initial certainty
Across scenarios and conditions, initial certainty increased by an average of 0.35 units on the 0–10 scale, with the highest increase in the Aided 1 condition. No significant differences between conditions were detected (Control vs. Aided 2: b = − 0.013, [− 0.20, 0.17], p = 0.889 and Aided 1 vs. Aided 2: b = − 0.288, [− 0.65, 0.07], p = 0.119). Serious initial diagnoses were associated with smaller change in initial certainty than non-serious diagnoses across comparisons (Control vs. Aided 2: b = − 1.868, [− 2.25, − 1.48], p < 0.001 and Aided 1 vs. Aided 2: b = − 1.309, [− 1.68, − 0.94], p < 0.001). No associations with the AOT score were detected (Control vs. Aided 2: b = 0.139, [− 0.35, 0.63], p = 0.574 and Aided 1 vs. Aided 2: b = 0.279, [− 0.21, 0.77], p = 0.262). The regression tables are presented in Additional file 1: S8.
Change of initial diagnosis
Physicians changed their diagnosis in 15.50% of the responses, with most changes in the Aided 2 condition. The difference between Aided 1 and Aided 2 was significant (OR = 2.143, [1.14, 4.03], p = 0.018), but the difference between Control and Aided 2 was not (OR = 0.960, [0.73, 1.26], p = 0.769). As in Experiment 2, serious initial diagnoses were more likely to change than non-serious ones in the Control vs. Aided 2 comparison (OR = 2.779, [1.55, 4.97], p = 0.001); no significant association was detected in the other comparison (Aided 1 vs. Aided 2: OR = 1.727, [0.93, 3.22], p = 0.085). As in both Experiments 1 and 2, initial certainty was negatively associated with diagnostic change (Control vs. Aided 2: OR = 0.623, [0.53, 0.73], p < 0.001 and Aided 1 vs. Aided 2: OR = 0.632, [0.53, 0.76], p < 0.001). No associations with the AOT score were detected (Control vs. Aided 2: OR = 0.990, [0.48, 2.03], p = 0.979 and Aided 1 vs. Aided 2: OR = 0.667, [0.28, 1.57], p = 0.354). The regression tables are presented in Additional file 1: S8.
Number of differential diagnoses
On average, physicians recorded 1.27 differential diagnoses (range 0 to 8 diagnoses, median 1), the most in the Aided 2 condition (mean 1.34, SD 1.11). There were no significant differences between conditions (Control vs. Aided 2: IRR = 0.969, [0.81, 1.15], p = 0.721 and Aided 1 vs. Aided 2: IRR = 0.924, [0.77, 1.10], p = 0.389). As in Experiment 2, initial certainty was negatively associated with the number of differential diagnoses (Control vs. Aided 2: IRR = 0.925, [0.88, 0.97], p = 0.001 and Aided 1 vs. Aided 2: IRR = 0.907, [0.86, 0.95], p < 0.001). Serious (vs. non-serious) initial diagnoses were associated with significantly fewer differential diagnoses in the Aided 1 vs. Aided 2 comparison (IRR = 0.805, [0.67, 0.96], p = 0.019); no such association was detected in the comparison between Control and Aided 2 (IRR = 0.892, [0.75, 1.07], p = 0.208). Higher AOT scores were associated with significantly more differential diagnoses in the Control vs. Aided 2 comparison (IRR = 1.405, [1.11, 1.78], p = 0.005); no such association was detected in the other comparison (Aided 1 vs. Aided 2: IRR = 1.228, [0.96, 1.58], p = 0.108).
Overall, Experiment 3 produced results consistent with Experiment 2, detecting mostly trends rather than significant differences between conditions. As expected, physicians in the Aided 2 condition requested the most cues, changed their initial diagnosis most frequently, and recorded the most differential diagnoses. In fact, diagnostic changes were significantly more frequent when the most likely diagnosis was removed from the list of suggestions (Aided 2) than when it was included (Aided 1). However, no differences from Control were detected. We also measured the most biased cue evaluation (highest perceived cue support) and the largest increase in certainty when the most likely diagnosis was included in the list (Aided 1)—though differences from Aided 2 were not significant. Thus, our results highlight a potential peril of decision support: when physicians see their focal diagnosis in the list of suggestions (Aided 1), they may become closed- rather than open-minded, feel more certain, and fail to consider other diagnoses. Removing the most likely diagnosis from the list seemed to counteract the bias but did not provide additional improvement compared to Control.
As in Experiment 2, the variable that affected all variables of interest and always in the expected direction was initial certainty. Higher initial certainty was consistently accompanied by significantly higher perceived cue support, fewer cue requests, fewer diagnostic changes, and fewer differential diagnoses.
Perceived cue support was highly variable, which could have prevented between-condition differences from emerging. The baseline ratings of perceived cue support in the Control condition varied widely both between and within scenarios (i.e. between versions of the same scenario, see Additional file 1: S9). For instance, physicians who saw the non-serious-diagnosis version of the chest pain scenario evaluated the cue information as more supportive of their initial diagnosis (mean 6.32 support for musculoskeletal chest pain) than those who saw the serious-diagnosis version (mean 4.94 support for angina). Likewise, in the constipation scenario, physicians found more support for the non-serious diagnosis (mean 7.29 for irritable bowel syndrome) than the serious one (mean 3.16 for cancer). Thus, although the cues were designed to be neutral for both diagnoses (serious and non-serious), their perceived diagnosticity differed across scenario versions. It is possible that both the severity and the familiarity of the diagnosis influenced physicians’ evaluation of information: they found more support for common, less serious diagnoses than for uncommon, serious ones.
As other studies have found, clinical cues are not evaluated independently and in isolation but within a developing, explanatory narrative of the patient problem (Kostopoulou et al., 2009, 2012). Even within a single scenario version, physicians perceived some cues as more supportive of their diagnosis than others; for instance, in the cardiac version of the chest pain scenario, the self-treatment cue information had an average rating of 2.94, whereas the pain intensity cue information had an average rating of 6.05. Although the cues had been designed to be neutral, physicians did not always perceive them as such and attached differential diagnostic value to them. This variability in the interpretation of the neutral cues, depending on initial diagnosis and scenario version, could have reduced the opportunity to find consistent differences between conditions.
This research consisted of three experiments testing the effects of early diagnostic suggestions on physicians’ reasoning. The experiments investigated different aspects of the diagnostic process including diagnostic certainty, information search, information evaluation and diagnostic changes when encountering new information. Using clinical scenarios presented online, we tested whether providing diagnostic suggestions early in the process can reduce the biasing effects of overconfidence. We did not detect significant differences from control; instead, we present some evidence that including the most likely diagnosis in the list of suggestions may operate as a validation rather than a debiasing tool. As expected, active open-mindedness was associated with more investigations, more cue requests, and more differential diagnoses.
Irrespective of the provision of diagnostic suggestions, it was initial certainty that was the main driver of behaviour: high certainty led to significantly fewer information requests (also see Meyer et al., 2013), more biased information evaluations, and fewer changes in diagnosis when encountering new information that either suggested additional possibilities or did not entirely fit with one’s leading hypothesis. The negative relationship between initial certainty and change in diagnosis was consistent across the three experiments. This finding is in line with previous research on Feelings of Rightness (i.e. subjective confidence in an initial judgement) and resistance to changing one’s mind (Thompson et al., 2013; Wang & Thompson, 2019). Previous research has also shown that confident initial judgments and first impressions are less likely to change in individual as well as in group decision-making (Kruglanski et al., 1993). Similar results have been found in a range of judgments, including consumer choices (Folke et al., 2017), syllogistic reasoning tasks (Shynkaruk & Thompson, 2006; Thompson et al., 2011, 2013; Wang & Thompson, 2019), medical diagnoses and treatment decisions (Dreiseitl & Binder, 2005; Jaimes et al., 2013; Krupat et al., 2017; Pandharipande et al., 2016), and moral dilemmas (Vega et al., 2020). The phenomenon is therefore not limited to lay people but also applies to experienced professionals. Our findings complement previous research and highlight the importance of the first (intuitive) judgments for the final decision (Ames et al., 2010; Hogarth & Einhorn, 1992; Kahneman, 2011; Kostopoulou et al., 2017b; Stone, 1994; Tversky & Kahneman, 1974).
There are several reasons that could explain why we did not detect any measurable and consistent impact of diagnostic suggestions on physicians’ behaviour. Firstly, the scenarios did not induce high diagnostic certainty. In all three experiments, certainty about the first diagnostic hypothesis was moderate. This could indicate that physicians were already sceptical about their initial diagnosis or not prepared to declare high confidence given the limited amount of information. If physicians were not especially certain about their initial diagnosis—and the list of suggestions aimed to reduce that initial certainty—then there may have been little for the list to “do”. More formally: moderate initial certainty could have weakened any debiasing effect of the diagnostic suggestions on physicians’ thinking.
Secondly, the conditions and task demands in the experiments may not reflect those of the original experiments where the phenomenon was first established (Kostopoulou et al., 2015a, 2015b; Kostopoulou et al., 2017a). In the original experiments, physicians were able to elicit information at will, either from actors or while on the phone with a researcher, who responded to their information requests by sending the answer to their screens. Thus, the interaction was rich and realistic, as if they were talking to a patient. This may have motivated them to take account of the list of suggestions more than in the present experiments.
In the original experiments, physicians could request substantially more information than in the present experiments, where only seven items were available to them; thus, they had more opportunity to test other hypotheses. In fact, an analysis of physicians’ written documentation of simulated consultations with actors, obtained by Kostopoulou et al. (2017a), found that in unaided consultations (Control) physicians recorded predominantly observations related to their final diagnosis, suggesting an almost exclusive focus on that diagnosis, whereas in aided consultations they also recorded other observations as they explored additional possibilities (Kostopoulou et al., 2021).
In the original experiments, physicians were not asked to provide explicit certainty ratings about their initial diagnosis, as was done in the present experiments. Asking people to state their confidence explicitly may make them unwilling to change it when encountering new information that does not confirm what they are thinking. Thus, explicit statements of confidence may have acted as an anchor for the subsequent confidence ratings. Across the three experiments, the average change in certainty was less than 1 unit on the 0–10 scale. The same may have occurred with the requirement to provide a single diagnostic hypothesis early on and with limited information, reducing physicians’ willingness to change diagnosis when more and ambiguous information was revealed.
The original experiments required substantial involvement from participants but also offered them substantial rewards. For the experiments using computerised clinical scenarios (Kostopoulou et al., 2015a, 2015b), physicians were invited and remunerated for a 3-h session: they completed 9 scenarios and could ask any questions they wanted (19 questions on average). They were also offered individualised feedback, which they could use as evidence of continuing professional development. In the experiment with standardised patients, participants consulted with 12 actors over 2 different days and were videotaped; they were recompensed for their time. In summary, the original experiments were substantially longer and more realistic, and offered significant benefits and rewards; hence they could secure serious participant involvement that the present experiments perhaps did not. Finally, in the original experiments, physicians diagnosed 9–12 different scenarios/standardised patients, whereas in the present experiments they responded to only 1–2 scenarios. The large variability in responses between these scenarios may have prevented us from detecting an impact of the diagnostic suggestions.
Future studies could increase the realism of the diagnostic situation by having extensive patient information available to physicians, thus providing more opportunities for active engagement; and using scenarios that allow for a final diagnosis—which could improve participants’ motivation and engagement. The use of process tracing methodologies such as eye tracking could also reveal if and how physicians interact with the decision aid by measuring number and length of fixations (Schulte-Mecklenbeck et al., 2019).
A study from researchers independent from our group has found that early diagnostic suggestions improve diagnostic accuracy by enabling physicians to consider more differential diagnoses (Sibbald et al., 2021). It is therefore worth continuing this line of work into the mechanisms of this phenomenon, including its potential (de)biasing effect.
Availability of data and materials
The datasets used in the experiments are available at https://osf.io/zwqfh/.
This expression originates from Einhorn and Hogarth’s “the persistence of the illusion of validity” (1978), whereby confidence is found to relate more to the number of prior judgements than to their accuracy.
ddx: the differential diagnoses list, the list of diagnostic alternatives that the software suggests.
Ames, D. R., Kammrath, L. K., Suppes, A., & Bolger, N. (2010). Not so fast: The (not-quite-complete) dissociation between accuracy and confidence in thin-slice impressions. Personality and Social Psychology Bulletin, 36(2), 264–277. https://doi.org/10.1177/0146167209354519
Arkes, H. R. (2013). The consequences of the hindsight bias in medical decision making. Current Directions in Psychological Science, 22(5), 356–360. https://doi.org/10.1177/0963721413489988
Arkes, H. R., Faust, D., Guilmette, T. J., & Hart, K. (1988). Eliminating the hindsight bias. Journal of Applied Psychology, 73(2), 305–307. https://doi.org/10.1037/0021-9010.73.2.305
Baron, J. (2008). Thinking and deciding (4th ed.). Cambridge University Press.
Baron, J. (2006). Thinking and deciding. Cambridge University Press.
Baron, J. (2019). Actively open-minded thinking in politics. Cognition, 188, 8–18. https://doi.org/10.1016/j.cognition.2018.10.004
Barratt, H., Kirwan, M., & Shantikumar, S. (2018). Clustered data—effects on sample size and approaches to analysis. Retrieved June 3, 2022, from https://www.healthknowledge.org.uk/public-health-textbook/research-methods/1a-epidemiology/clustered-data.
Berner, E. S., & Graber, M. L. (2008). Overconfidence as a cause of diagnostic error in medicine. American Journal of Medicine, 121(5), 2–23. https://doi.org/10.1016/j.amjmed.2008.01.001
Berner, E. S., Maisiak, R. S., Cobbs, C. G., & Taunton, O. D. (1999). Effects of a decision support system on physicians’ diagnostic performance. Journal of the American Medical Informatics Association, 6(5), 420–427. https://doi.org/10.1136/jamia.1999.0060420
Bhandari, G., Hassanein, K., & Deaves, R. (2008). Debiasing investors with decision support systems: An experimental investigation. Decision Support Systems, 46(1), 399–410. https://doi.org/10.1016/j.dss.2008.07.010
Bhasale, A. (1998). The wrong diagnosis: Identifying causes of potentially adverse events in general practice using incident monitoring. Family Practice, 15(4), 308–318.
Brownstein, A. L. (2003). Biased predecision processing. Psychological Bulletin, 129(4), 545–568. https://doi.org/10.1037/0033-2909.129.4.545
Dani, M., Bowen-Carpenter, S., & McGown, P. J. (2019). Not all strokes are strokes—an example of diagnostic confirmation bias. European Journal of Case Reports in Internal Medicine, 6(1), 001006. https://doi.org/10.12890/2019_001006
Dawson, N. V., Connors, A. F., Speroff, T., Kemka, A., Shaw, P., & Arkes, H. R. (1993). Hemodynamic assessment in managing the critically ill. Medical Decision Making, 13(3), 258–266. https://doi.org/10.1177/0272989X9301300314
Desender, K., Boldt, A., & Yeung, N. (2018). Subjective confidence predicts information seeking in decision making. Psychological Science, 29(5), 761–778. https://doi.org/10.1177/0956797617744771
Dolan, P., Hallsworth, M., Halpern, D., King, D., Metcalfe, R., & Vlaev, I. (2012). Influencing behaviour: The mindspace way. Journal of Economic Psychology, 33(1), 264–277. https://doi.org/10.1016/j.joep.2011.10.009
Dreiseitl, S., & Binder, M. (2005). Do physicians value decision support? A look at the effect of decision support systems on physician opinion. Artificial Intelligence in Medicine, 33(1), 25–30. https://doi.org/10.1016/j.artmed.2004.07.007
Einhorn, H. J., & Hogarth, R. M. (1978). Confidence in judgment: Persistence of the illusion of validity. Psychological Review, 85(5), 395–416. https://doi.org/10.1037/0033-295X.85.5.395
Elstein, A. S., Shulman, L. S., & Sprafka, S. A. (1978). Medical problem solving: An analysis of clinical reasoning. Harvard University Press.
Fisseni, G., Pentzek, M., & Abholz, H. H. (2008). Responding to serious medical error in general practice—consequences for the GPs involved: Analysis of 75 cases from Germany. Family Practice, 25(1), 9–13.
Folke, T., Jacobsen, C., Fleming, S. M., & De Martino, B. (2017). Explicit representation of confidence informs future value-based decisions. Nature Human Behaviour, 1(0002), 1–32. https://doi.org/10.1038/s41562-016-0002
Friedman, C. P., Gatti, G. G., Franz, T. M., Murphy, G. C., Wolf, F. M., Heckerling, P. S., Fine, P. L., Miller, T. M., & Elstein, A. S. (2005). Do physicians know when their diagnoses are correct? Implications for decision support and error reduction. Journal of General Internal Medicine, 20(4), 334–339. https://doi.org/10.1111/j.1525-1497.2005.30145.x
Gigerenzer, G. (2015). Risk savvy: How to make good decisions. Penguin.
Harada, Y., Katsukura, S., Kawamura, R., & Shimizu, T. (2021). Effects of a differential diagnosis list of artificial intelligence on differential diagnoses by physicians: An exploratory analysis of data from a randomized controlled study. International Journal of Environmental Research and Public Health, 18(5562), 1–8. https://doi.org/10.3390/ijerph18115562
Haran, U., Moore, D. A., & Morewedge, C. K. (2010). A simple remedy for overprecision in judgment. Judgment and Decision Making, 5(7), 467–476.
Haran, U., Ritov, I., & Mellers, B. A. (2013). The role of actively open-minded thinking in information acquisition, accuracy, and calibration. Judgment and Decision Making, 8(3), 188–201.
Hirt, E. R., & Markman, K. D. (1995). Multiple explanation: A consider-an-alternative strategy for debiasing judgments. Journal of Personality and Social Psychology, 69(6), 1069–1086. https://doi.org/10.1037/0022-3514.69.6.1069
Hogarth, R. M., & Einhorn, H. J. (1992). Order effects in belief updating: The belief-adjustment model. Cognitive Psychology, 24(1), 1–55. https://doi.org/10.1016/0010-0285(92)90002-J
Huang, H. H., Hsu, J. S. C., & Ku, C. Y. (2012). Understanding the role of computer-mediated counter-argument in countering confirmation bias. Decision Support Systems, 53(3), 438–447. https://doi.org/10.1016/j.dss.2012.03.009
IoM. (2015). Improving Diagnosis in Health Care. Washington, DC: The National Academies Press. https://doi.org/10.17226/21794
Jaimes, N., Dusza, S. W., Quigley, E. A., Braun, R. P., Puig, S., Malvehy, J., Kittler, H., Rabinovitz, H. S., Oliviero, M. C., Peter Soyer, H., Grichnik, J. M., Korzenko, A., Cabo, H., Carlos-Ortega, B., Ahlgrimm-Siess, V., Kopf, A. W., & Marghoob, A. A. (2013). Influence of time on dermoscopic diagnosis and management. Australasian Journal of Dermatology, 54(2), 96–104. https://doi.org/10.1111/ajd.12001
Kahneman, D. (2011). Thinking, fast and slow. Penguin.
Kahneman, D., & Tversky, A. (1974). Judgment under uncertainty: Heuristics and biases. Science, 185(4157), 1124–1131. https://doi.org/10.1126/science.185.4157.1124
Koriat, A., Lichtenstein, S., & Fischhoff, B. (1980). Reasons for confidence. Journal of Experimental Psychology: Human Learning and Memory, 6(2), 107–118. https://doi.org/10.1037/0278-7393.6.2.107
Kostopoulou, O. (2006). From cognition to the system: Developing a multilevel taxonomy of patient safety in general practice. Ergonomics, 49(5–6), 486–502. https://doi.org/10.1080/00140130600569038
Kostopoulou, O., Delaney, B., & Munro, C. W. (2008). Diagnostic difficulty and error in primary care—A systematic review. Family Practice, 25(6), 400–413. https://doi.org/10.1093/fampra/cmn071
Kostopoulou, O., Lionis, C., Angelaki, A., Ayis, S., Durbaba, S., & Delaney, B. (2015a). Early diagnostic suggestions improve accuracy of family physicians: A randomized controlled trial in Greece. Family Practice, 32(3), 323–328. https://doi.org/10.1093/fampra/cmv012
Kostopoulou, O., Mousoulis, C., & Delaney, B. (2009). Information search and information distortion in the diagnosis of an ambiguous presentation. Judgment and Decision Making, 4(5), 408–418.
Kostopoulou, O., Porat, T., Corrigan, D., Mahmoud, S., & Delaney, B. (2017a). Diagnostic accuracy of GPs when using an early-intervention decision support system: A high-fidelity simulation. British Journal of General Practice, 67(656), 201–208. https://doi.org/10.3399/bjgp16X688417
Kostopoulou, O., Rosen, A., Round, T., Wright, E., Douiri, A., & Delaney, B. (2015b). Early diagnostic suggestions improve accuracy of GPs: A randomised controlled trial using computer-simulated patients. British Journal of General Practice, 65(630), 49–54. https://doi.org/10.3399/bjgp15X683161
Kostopoulou, O., Russo, J. E., Keenan, G., Delaney, B., & Douiri, A. (2012). Information distortion in physicians’ diagnostic judgments. Medical Decision Making, 32(6), 831–839. https://doi.org/10.1177/0272989X12447241
Kostopoulou, O., Sirota, M., Round, T., Samaranayaka, S., & Delaney, B. (2017b). The role of physicians’ first impressions in the diagnosis of possible cancers without alarm symptoms. Medical Decision Making, 37(1), 9–16. https://doi.org/10.1177/0272989X16644563
Kostopoulou, O., Tracey, C., & Delaney, B. C. (2021). Can decision support combat incompleteness and bias in routine primary care data? Journal of the American Medical Informatics Association, 28(7), 1461–1467. https://doi.org/10.1093/jamia/ocab025
Kruglanski, A. W., Webster, D. M., & Klem, A. (1993). Motivated resistance and openness to persuasion in the presence or absence of prior information. Journal of Personality and Social Psychology, 65(5), 861–876. https://doi.org/10.1037/0022-3514.65.5.861
Krupat, E., Wormwood, J., Schwartzstein, R. M., & Richards, J. B. (2017). Avoiding premature closure and reaching diagnostic accuracy: Some key predictive factors. Medical Education, 51(11), 1127–1137. https://doi.org/10.1111/medu.13382
Larrick, R. P. (2004). Debiasing. In D. J. Koehler & N. Harvey (Eds.), Blackwell handbook of judgment and decision making (pp. 316–337). Blackwell Publishing.
Leblanc, V., Brooks, L. R., & Norman, G. R. (2002). Believing is seeing: The influence of a diagnostic hypothesis on the interpretation of clinical features. Academic Medicine, 77(10), 67–69. https://doi.org/10.1097/00001888-200210001-00022
Leblanc, V., Norman, G. R., & Brooks, L. R. (2001). Effect of a diagnostic suggestion on diagnostic accuracy and identification of clinical features. Academic Medicine, 76(10), 18–20. https://doi.org/10.1097/00001888-200110001-00007
Ludolph, R., & Schulz, P. J. (2017). Debiasing health-related judgments and decision making: A systematic review. Medical Decision Making, 38(1), 3–13. https://doi.org/10.1177/0272989X17716672
McKenzie, C. R. M. (1997). Underweighting alternatives and overconfidence. Organizational Behavior and Human Decision Processes, 71(2), 141–160. https://doi.org/10.1006/obhd.1997.2716
Mendel, R., Traut-Mattausch, E., Jonas, E., Leucht, S., Kane, J. M., Maino, K., Kissling, W., & Hamann, J. (2011). Confirmation bias: Why psychiatrists stick to wrong preliminary diagnoses. Psychological Medicine, 41(12), 2651–2659. https://doi.org/10.1017/S0033291711000808
Meyer, A., Payne, V. L., Meeks, D. W., Rao, R., & Singh, H. (2013). Physicians’ diagnostic accuracy, confidence, and resource requests: A vignette study. JAMA Internal Medicine, 173(21), 1952–1959. https://doi.org/10.1001/jamainternmed.2013.10081
Michie, S., van Stralen, M. M., & West, R. (2011). The behaviour change wheel: A new method for characterising and designing behaviour change interventions. Implementation Science, 6(1), 42. https://doi.org/10.1186/1748-5908-6-42
Mussweiler, T., Strack, F., & Pfeiffer, T. (2000). Overcoming the inevitable anchoring effect: Considering the opposite compensates for selective accessibility. Personality and Social Psychology Bulletin, 26(9), 1142–1150. https://doi.org/10.1177/01461672002611010
Nisbett, R. E. (Ed.). (1993). Rules for reasoning. Erlbaum.
Nurek, M., Kostopoulou, O., & Hagmayer, Y. (2014). Predecisional information distortion in physicians’ diagnostic judgments: Strengthening a leading hypothesis or weakening its competitor? Judgment and Decision Making, 9(6), 572–585.
Pandharipande, P., Reisner, A. T., Binder, W. D., Zaheer, A., Gunn, M. L., Linnau, K. F., Miller, C. M., Avery, L. L., Herring, M. S., Tramontano, A. C., Dowling, E. C., Abujudeh, H. H., Eisenberg, J. D., Halpern, E. F., Karen Donelan, G., & Gazelle, S. (2016). CT in the emergency department: A real-time study of changes in physician decision making. Radiology, 278(3), 812–821. https://doi.org/10.1148/radiol.2015150473
Raiffa, H. (1968). Decision analysis: Introductory lectures on choices under uncertainty. Addison-Wesley Publishing Company.
Redelmeier, D. A., Koehler, D. J., Liberman, V., & Tversky, A. (1995). Probability judgment in medicine: Discounting unspecified possibilities. Medical Decision Making, 15(3), 227–230. https://doi.org/10.1177/0272989X9501500305
Ridderikhoff, J., & van Herk, B. (1999). Who is afraid of the system? Doctors’ attitude towards diagnostic systems. International Journal of Medical Informatics, 53(1), 91–100. https://doi.org/10.1016/S1386-5056(98)00145-2
Russo, J. E., Carlson, K. A., Meloy, M. G., & Yong, K. (2008). The goal of consistency as a cause of information distortion. Journal of Experimental Psychology: General, 137(3), 456–470. https://doi.org/10.1037/a0012786
Schulte-Mecklenbeck, M., Kuehberger, A., & Johnson, J. G. (2019). A handbook of process tracing methods (2nd ed.). Routledge.
Short, D., Frischer, M., & Bashford, J. (2003). The development and evaluation of a computerised decision support system for primary care based upon “patient profile decision analysis.” Journal of Innovation in Health Informatics, 11(4), 195–202. https://doi.org/10.14236/jhi.v11i4.567
Shynkaruk, J. M., & Thompson, V. (2006). Confidence and accuracy in deductive reasoning. Memory & Cognition, 34(3), 619–632. https://doi.org/10.3758/BF03193584
Sibbald, M., Monteiro, S., Sherbino, J., Logiudice, A., Friedman, C., & Norman, G. (2021). Should electronic differential diagnosis support be used early or late in the diagnostic process? A multicentre experimental study of Isabel. BMJ Quality & Safety, 31, 1–8. https://doi.org/10.1136/bmjqs-2021-013493
Singh, H., Schiff, G. D., Graber, M. L., Onakpoya, I., & Thompson, M. J. (2017). The global burden of diagnostic errors in primary care. BMJ Quality & Safety, 26(6), 484–494. https://doi.org/10.1136/bmjqs-2016-005401
Stanovich, K. E., & West, R. F. (1997). Reasoning independently of prior belief and individual differences in actively open-minded thinking. Journal of Educational Psychology, 89(2), 342–357. https://doi.org/10.1037/0022-0663.89.2.342
Stone, D. N. (1994). Overconfidence in initial self-efficacy judgments: Effects on decision processes and performance. Organizational Behavior and Human Decision Processes. https://doi.org/10.1006/obhd.1994.1069
Thaler, R. H., & Sunstein, C. R. (2009). Nudge: Improving decisions about health, wealth and happiness. Penguin.
Thomas, R. P., Dougherty, M. R., & Buttaccio, D. R. (2014). Memory constraints on hypothesis generation and decision making. Current Directions in Psychological Science, 23(4), 264–270. https://doi.org/10.1177/0963721414534853
Thomas, R. P., Dougherty, M. R., Sprenger, A. M., & Harbison, J. I. (2008). Diagnostic hypothesis generation and human judgment. Psychological Review, 115(1), 155–185. https://doi.org/10.1037/0033-295X.115.1.155
Thompson, V., Ackerman, R., Sidi, Y., Ball, L. J., Pennycook, G., & Prowse Turner, J. (2013). The role of answer fluency and perceptual fluency in the monitoring and control of reasoning: Reply to Alter, Oppenheimer, and Epley (2013). Cognition, 128(2), 256–258.
Thompson, V., Prowse Turner, J., & Pennycook, G. (2011). Intuition, reason, and metacognition. Cognitive Psychology, 63(3), 107–140. https://doi.org/10.1016/j.cogpsych.2011.06.001
van Brussel, S., Timmermans, M., Verkoeijen, P., & Paas, F. (2020). ‘Consider the Opposite’—effects of elaborative feedback and correct answer feedback on reducing confirmation bias—a pre-registered study. Contemporary Educational Psychology, 60, 101844. https://doi.org/10.1016/j.cedpsych.2020.101844
Vega, S., Mata, A., Ferreira, M. B., & Vaz, A. R. (2020). Metacognition in moral decisions: Judgment extremity and feeling of rightness in moral intuitions. Thinking and Reasoning, 27(1), 124–141. https://doi.org/10.1080/13546783.2020.1741448
Wang, S., & Thompson, V. (2019). Fluency and feeling of rightness: The effect of anchoring and models. Psychological Topics, 28(1), 37–72. https://doi.org/10.31820/pt.28.1.3
Funding was provided by the National Institute for Health Research (NIHR) Imperial Patient Safety Translational Research Centre (PSTRC, award number PSTRC-2016–004). The authors gratefully acknowledge infrastructure support from the National Institute for Health Research (NIHR) Imperial Biomedical Research Centre (BRC). The views expressed are those of the authors and not necessarily those of the NIHR or the Department of Health and Social Care.
Ethics approval and consent to participate
Experiment 1 was approved by the Imperial Joint Research Compliance Office (ICREC: 18IC4590). Experiment 2 was approved by the Health Research Authority (REC: 19/HRA/5931). Experiment 3 was approved by the Health Research Authority (REC: 20/HRA/3867).
Competing interests
The authors declare no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Kourtidis, P., Nurek, M., Delaney, B. et al. Influences of early diagnostic suggestions on clinical reasoning. Cogn. Research 7, 103 (2022). https://doi.org/10.1186/s41235-022-00453-y