This project focuses on individual differences in two aspects of causal reasoning. In causal learning, people induce general knowledge about the strength and structure of causal relationships after observing large-sample covariation data (Griffiths and Tenenbaum 2009) or by making interventions and observing their effects (Bramley et al. 2017). In diagnostic reasoning, people use previously learned causal knowledge and a small number of observed events to make inferences about other specific events (Meder and Mayrhofer 2017a). In this paper, we focus on a particular kind of diagnostic reasoning known as fault diagnosis.
Fault diagnosis
Fault diagnosis is an arguably general-purpose process that involves finding the causes that are producing specific abnormal effects in a system (symptoms). Fault diagnosis is common in equipment repair and medicine, but the reasoning used in fault diagnosis is applicable in other domains. Legal reasoning (Fenton et al. 2013) and some scientific argumentation (e.g., identifying causes of global warming) seek to identify the causes of specific observed events. Fault diagnosis is not just done by experts. Home care nurses sometimes need to diagnose faults on home medical devices and may have difficulty doing so (Lyons and Blandford 2018). If your laptop cannot get a Wi-Fi signal but your cell phone can, the working cell phone allows you to eliminate an absent Wi-Fi signal as the cause and localize the fault to your laptop. Gugerty (1989) found that many undergraduates used this elimination strategy when diagnosing a household electrical problem. Causal attribution in social situations (Morris and Larrick 1995) uses causal inferences like discounting, which is used in fault diagnosis.
Little lab-based research has been conducted on fault diagnosis, as most research has focused on experts working in complex, knowledge-rich domains, e.g., medical diagnosis (Patel et al. 2012). In this project, we used a fault diagnosis task that is more complex than many tasks used to study diagnostic reasoning but which does not require prior expertise and can be used in a laboratory setting. We focus on three fault diagnosis strategies: backtracking from the abnormal system output, eliminating potential faults that lead into normal system output, and inference to the best explanation (IBE). In IBE, people choose a causal explanation of a set of symptoms based on simplicity (minimizing the number of faults), coverage (maximizing the number of symptoms explained), and other factors (Lombrozo and Vasilyeva 2017).
Individual differences in normative performance
Researchers may obtain an inaccurate picture of people’s cognition by focusing only on group averages, especially when individuals can complete a task using different strategies. Such differences have been shown in learning and memory (Estes 1956; Hemmer et al. 2015) and spatial cognition (Logie 2018). However, within research on reasoning, many of the early heuristics-and-biases studies focused on group averages and concluded that the average person fell short of normative standards on a variety of inductive (Nisbett et al. 1983; Tversky and Kahneman 1974) and deductive (Evans et al. 1983; Wason and Shapiro 1971) reasoning tasks.
More recent findings on individual differences have emphasized how subgroups of people vary from the group mean. First, a small group of participants reason normatively on many reasoning tasks, although most fail to do so. Second, the extent to which individuals reason normatively is positively correlated with fluid intelligence and thinking dispositions (e.g., open-mindedness), with each predictor contributing uniquely (Klaczynski and Lavalee 2005; Stanovich and West 1997, 1998; Toplak et al. 2014). In the current studies, we investigate these two research questions using the complex reasoning task of fault diagnosis.
Many cognitive science researchers define normative cognition using models at Anderson’s (1991) rational level of explanation, which describes cognitive functions in terms of optimal adaptation to goals given environmental constraints. Here, we use information gain as a metric for defining optimal or normative fault diagnosis. Information gain is a frequently used metric of normative performance in fault diagnosis (Navarro and Perfors 2011) and other information search tasks (Nelson 2005).
Fluid intelligence and thinking dispositions
The ability to concurrently process and store information in working memory is strongly correlated with performance on a variety of reasoning tasks (Kyllonen and Christal 1990). Researchers have recently focused on a particular function of working memory—supporting inferences about hypothetical situations that are decoupled from perceptual representations of the world—as an important component of reasoning (Evans and Stanovich 2013; Oaksford and Chater 2012). Evans and Stanovich (2013) suggested that tests of fluid intelligence assess this function of working memory; Shipstead et al. (2016) provided evidence supporting this viewpoint.
Stanovich (2011, 2018) proposed that tests of thinking disposition assess the degree to which people can detect the need to override less-effortful thinking that relies on prior knowledge and switch to analytic, hypothetical thinking. Common thinking disposition measures include motivation and effort toward cognitive tasks (e.g., Typical Intellectual Engagement; Goff and Ackerman 1992) and openness to changing beliefs (e.g., Actively Open-Minded Thinking; Stanovich and West 1997). In this viewpoint, fluid intelligence assesses the capability for hypothetical thinking using working memory, while thinking dispositions assess the propensity to do this.
Research questions
Our goal was to test whether the individual differences findings described above, which have been demonstrated primarily on simpler reasoning tasks, extend to complex causal reasoning tasks. We conducted three studies in which participants completed tests of fluid intelligence and thinking dispositions and a task where they diagnosed faults in physical systems. We focused on the strategies mentioned above—backtracking, elimination, and IBE—and considered two research questions related to individual differences. Are strategies that allow normative performance used less frequently than strategies not associated with normative performance (Q1)? Are strategies associated with normative performance used more frequently by people with higher fluid intelligence and thinking dispositions (Q2)? Also, to better understand the cognitive processes used in fault diagnosis, we investigated whether elimination, IBE and backtracking have characteristics of analytic or heuristic processing (Q3).
Causal learning task
In two of our studies, participants completed a causal learning task (Liljeholm and Cheng 2009) in addition to the fault diagnosis task. As noted above, learning causal models of the world and using these models to make useful inferences are two critical aspects of causal reasoning. This research design allowed us to pursue our research questions for both causal reasoning tasks and also to see whether the use of normative strategies was correlated across the two tasks. Due to length considerations, we were not able to present the findings from both tasks in this paper. We plan to present the causal learning findings and the cross-task correlations in a separate paper.
Diagnostic reasoning
In causal learning, people learn a causal model describing the relationships among some causal and effect variables (including their structure and strength) by observing many co-occurrences of cause and effect variables (Lu et al. 2008) or by observing the effects of interventions they have selected (Bramley et al. 2017; Coenen et al. 2015). People’s causal models are general in that they are applicable to many situations. Many studies of diagnostic reasoning, including ours, assume that participants have already learned a causal model that describes a particular situation, based on either in-study training or prior expertise. Then, participants observe the state of a small number of variables in the model and make inferences that update their beliefs about the state of other model variables. We consider diagnostic reasoning to be a broad category that includes the following: (1) single-step inferences between directly linked variables, including diagnostic (effect-to-cause) and predictive (cause-to-effect) inferences; (2) multistep inferences such as inference chaining and discounting (Waldmann et al. 2008); and (3) higher-level processes like fault diagnosis and forecasting. (In this paper, the term diagnostic inference refers to effect-to-cause inferences between directly connected variables, while diagnostic reasoning is a much broader term, as described here.)
Simpler diagnostic reasoning tasks
Much of the research in a recent review of diagnostic reasoning (Meder and Mayrhofer 2017a) used lab-based tasks with relatively simple causal structures, i.e., a few causes and fewer than a dozen effects. Researchers often assume that, given these simpler structures, people make diagnostic inferences in a quantitative fashion (e.g., Meder and Mayrhofer 2017b; Waldmann et al. 2008). For example, the ability to accurately estimate the posterior probability of a cause after observing its effect correlates positively with fluid intelligence and thinking dispositions (McNair and Feeney 2015; Sirota et al. 2014).
Handling complexity
Realistic fault-diagnosis problems often have complex causal structures. For example, based on verbal protocols given by physicians as they diagnosed realistic cases, Patel et al. (1990) created causal networks showing physicians’ predictive and diagnostic inferences. One physician’s network contained 12 nodes representing evidence from the case and 24 causal nodes representing physiological conditions (including the correct diagnosis). Another physician’s network contained four pieces of evidence and 15 causal nodes. In each problem of our fault diagnosis task, 25–35 potential faults were present, and participants could gather 40–60 pieces of evidence.
Research has shown that as the complexity of decisions increases (e.g., more alternatives or attributes), participants shift from evaluating alternatives using quantitative reasoning (e.g., weighted additive strategy) to qualitative strategies like elimination by aspects, which iteratively eliminates choices (Payne and Bettman 2004). Studies of diagnosis by physicians and nurses suggest that they use qualitative reasoning as well, e.g., classifying diseases as being in or out of a set of potential causes instead of assigning each disease a graded probability of causing the symptoms (Eddy and Clanton 1982; Johnson et al. 1982; Rossi and Madden 1979). Also, evidence indicates that children (Schulz and Sommerville 2006) and adults (Austerweil and Griffiths 2011; Lu et al. 2008; Yeung and Griffiths 2015) often assume qualitative, deterministic causes (either strong or absent) even in complex domains where causal strength varies continuously. Sloman and Lagnado (2015) have highlighted the importance of qualitative reasoning in causal reasoning. In the current studies, we used a deterministic task and focused on how qualitative reasoning might be used in fault diagnosis.
Fault diagnosis
Our participants solved problems like the one in Fig. 1, which shows a network of water storage tanks through which water flowed from left to right. At the start of each problem, the display showed whether clean (C) or rusty (R) water was flowing through the network input and output pipes. The network is taking in clean water but outputting rusty water because a tank is rusty. The goal of the participants was to find the rusty tank. They did this by making diagnostic tests (testing a pipe revealed whether it contained clean or rusty water) and submitting diagnoses (checking a tank revealed whether it was clean or rusty inside). Results of tests and incorrect diagnoses (a C or R by a pipe or tank) remained on the display. Participants made tests and diagnoses until they diagnosed the rusty tank. They were instructed that (1) pipe tests and tank checks were costly and should be minimized, (2) only one tank was rusty, and (3) rusty tanks had very strong effects (i.e., deterministic causes). Participants were incentivized to observe the cost constraint by imaginary monetary costs and by delays between the time when they clicked on a pipe or tank and when the test result appeared (2.5 s delay and $10 for pipe tests; 5–12.5 s delay and $80 for tank checks). The cumulative amount of money spent on each problem was updated after each test or check. After diagnosing the rusty tank, participants received feedback about how well they had met the cost constraint. Participants could place visual markers on tanks, which were intended to reduce the memory load of using diagnosis strategies. “Appendix 1” shows the networks used in the studies. Similar tasks have been used by Carlson et al. (1992), Kostopoulou and Duncan (2001), and Ham and Yoon (2007).
According to Nelson (2005), fault diagnosis is an example of the general inductive problem where people have a set of hypotheses, and data relevant to the hypotheses are currently available or potentially available from queries (e.g., diagnostic tests). In fault diagnosis, the causal hypotheses are that some system components could be faulty. The task involves efficiently selecting diagnostic tests, which allow observing the effects of potential faults and then updating beliefs about hypotheses based on observed effects. Many studies have focused on the test selection component of this inductive problem (Klayman and Ha 1987; Oaksford and Chater 1994, 2003; Ruggeri and Lombrozo 2015). In the current studies, we focused on the belief-updating component, i.e., how participants updated their beliefs about causal hypotheses using observations, such as observing rusty versus clean water in pipes. In order to highlight to participants the importance of using efficient belief-updating strategies, in each study participants saw two problems like Fig. 1, where using elimination and IBE allowed diagnosing the rusty tank without any diagnostic (pipe) tests, based only on belief updating using the initial observations.
Researchers have distinguished two types of hypothesis testing (Ruggeri and Lombrozo 2015). The more efficient constraint seeking involves making diagnostic tests that reveal the effects of faults and then updating (narrowing) the fault set based on the observed effects. Hypothesis scanning, which focuses on root causes instead of effects, involves making diagnoses of potential faults. This approach is inefficient because it only narrows the fault set by one potential fault per diagnosis and provides no new observations that allow updating the fault set. Given our focus on belief updating, we set the time and money costs for diagnoses to be much higher than for diagnostic tests of pipes in order to motivate participants to minimize the number of diagnoses and use pipe tests as their problem-solving operator. (In Study 1, where the delay after unsuccessful diagnoses was 5 s, a few participants made too many diagnoses. This tendency was reduced by setting this delay to 12.5 s in Study 2 and Study 3.) In the following, we describe the belief-updating strategies that we studied. All of these strategies narrow the fault set by eliminating some faults from consideration, although some strategies do this more efficiently than others.
General backtracking
In backtracking, reasoners update the fault set (which initially contains all tanks) at the outset of the problem by first making one-step diagnostic inferences from the observations of abnormal system state (rusty water), which generates the hypotheses that a tank outputting rusty water could be rusty or any pipe leading into it could carry rusty water. Instead of testing these hypotheses about pipes with diagnostic tests, reasoners make diagnostic inferences from them. This process is repeated recursively until no more inferences can be made. These chains of diagnostic inferences create a set of potential faults that we call the backtracking set and eliminate tanks that do not lead into rusty water. Figure 1 shows the initial backtracking set, before any diagnostic tests have been made.
After the initial update, reasoners test pipes that directly connect tanks within the current backtracking set until they find a rusty water result and then update the backtracking set again. This test–update cycle is repeated until a diagnosis can be made, i.e., when a tank is identified with all clean water inputs and rusty water outputs. Backtracking is inefficient because it ignores useful information—observations of normal system state (clean water). We call this strategy general backtracking to distinguish it from a variant of backtracking described later.
Updating the backtracking set after each rusty-water test result means that tanks that were in the previous backtracking set but are not causally upstream of the latest rusty water observation are eliminated from the fault set even though they lead into rusty water and could be rusty. For example, in Fig. 1, after pipe 2 has been tested and found to carry rusty water, the new backtracking set contains tank B and all tanks upstream of it, while tanks A, D, E and F are eliminated even though they could be rusty if there were multiple rusty tanks. This second type of elimination during backtracking depends on the single fault assumption. Whether participants using backtracking are consciously eliminating these tanks based on the single fault assumption or merely focusing on making diagnostic, upstream inferences until they find definitive evidence for a rusty tank is not clear.
Practiced elimination
At the outset of the problem, reasoners using elimination make recursive diagnostic inferences (without diagnostic testing) as in general backtracking but from the clean instead of the rusty water observations. This creates the initial elimination set (Fig. 1). Then, they test a pipe directly connecting two tanks in the current elimination set and update the set by eliminating tanks causally upstream of a clean water result and, as in general backtracking, not upstream of a rusty water result. This procedure is iterated until the faulty tank is identified. Although all the updating strategies we discuss involve elimination, only the strategy we call elimination rules out potential causes that predict effects that are disconfirmed by observations of normal system state. This approach is consistent with the idea of eliminating or ruling out hypotheses in medical diagnosis, which is discussed below. This strategy is called practiced elimination to distinguish it from a variant of elimination discussed below.
IBE
Researchers have identified a number of “explanatory virtues” (Lipton 2004) that reasoners use to choose the best explanation of some effects. These explanatory virtues include explaining more effects (coverage) (Johnson et al. 2014), having fewer root causes (simplicity) (Lombrozo 2007; Pacer and Lombrozo 2017), and being more coherent with background information (Koslowski et al. 2008). As its name implies, IBE also involves an inference procedure that evaluates the quality of various explanations in light of sometimes conflicting explanatory criteria. In fault diagnosis, starting with the current elimination set, the coverage and simplicity criteria allow eliminating potential faults that do not explain all of the observations of abnormal system state (symptoms). One inference procedure that accomplishes this is to make predictive inferences from each potential fault in the elimination set and eliminate those that are not causally upstream of all of the symptoms. This procedure is similar to one used in a machine learning model of IBE based on causal information flow (Pacer et al. 2013). Note that in this viewpoint, IBE involves testing potential faults by making hypothetical, predictive inferences.
By telling participants that there was only one rusty tank per network, we primed them to use IBE by revealing that a simple, one-fault explanation could cover or explain all the symptoms. However, we did not tell them how to make the inferences to identify the best explanation. Thus, if we find participants who always make tests and diagnoses within the IBE set, we cannot make the strong claim that they are using IBE without any aid, but we can claim that they are exhibiting the inferences that are part of IBE.
Examples
In Fig. 2a (where tank 6 is rusty), the general backtracking set is tanks 1, 2, 3, 5, 6, 8, and 9; the practiced elimination set is 3, 5, 6, 8, and 9; and the IBE set is 3 and 6. If someone tested the pipe connecting tanks 1 and 5, i.e., pipe 1–5 (with result C), and then tested pipe 2–5 (C), 6–8 (R), and 3–6 (C), these four tests would be evidence for general backtracking but not elimination or IBE since some of the tests are outside the elimination and IBE sets. If someone tested pipe 5–8 (C), 6–8 (R), and then 3–6 (C), this would be evidence for elimination but not IBE. People using IBE would only test pipe 3–6 (C), as this test isolates the fault. See the supplementary materials for demonstrations of the strategies.
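The three initial fault sets can be sketched as set operations over upstream reachability. The sketch below is illustrative only: the pipe list `PIPES` is a hypothetical nine-tank topology (the wiring of Fig. 2a is not given in the text), chosen so that the resulting sets match the ones reported above, and the names `upstream_or_self` and `initial_fault_sets` are our own. It assumes a single fault and deterministic causes, as in the task instructions.

```python
from collections import defaultdict

# Hypothetical topology (an assumption, not the paper's Fig. 2a).
# Each pipe runs from a source tank to a destination tank.
PIPES = [(1, 4), (2, 4), (1, 5), (2, 5), (3, 6), (4, 7), (5, 8), (6, 8), (6, 9)]
# Water observed at the network outputs at the start of the problem.
OUTPUTS = {7: "C", 8: "R", 9: "R"}


def upstream_or_self(tank, pipes):
    """Return the tank plus every tank that has a directed path into it."""
    parents = defaultdict(set)
    for src, dst in pipes:
        parents[dst].add(src)
    seen, stack = {tank}, [tank]
    while stack:
        for parent in parents[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def initial_fault_sets(pipes, outputs):
    """Compute the backtracking, elimination, and IBE sets before any test."""
    rusty = [upstream_or_self(t, pipes) for t, w in outputs.items() if w == "R"]
    clean = [upstream_or_self(t, pipes) for t, w in outputs.items() if w == "C"]
    # Backtracking: every tank that leads into some rusty observation.
    backtracking = set().union(*rusty)
    # Elimination: additionally drop tanks that lead into a clean observation.
    elimination = backtracking - set().union(*clean)
    # IBE (single fault): keep only tanks upstream of ALL rusty observations.
    ibe = elimination & set.intersection(*rusty)
    return backtracking, elimination, ibe


backtracking, elimination, ibe = initial_fault_sets(PIPES, OUTPUTS)
print(sorted(backtracking))  # [1, 2, 3, 5, 6, 8, 9]
print(sorted(elimination))   # [3, 5, 6, 8, 9]
print(sorted(ibe))           # [3, 6]
```

In this sketch each strategy differs only in which observations it uses: backtracking uses the rusty outputs, elimination adds the clean outputs, and IBE adds the requirement that one tank explain every symptom.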
Stepwise backtracking
In stepwise backtracking, reasoners make one-step diagnostic inferences from the observations available at the outset of the problem, and then (unlike in general backtracking) immediately test the pipes leading into the tank outputting rusty water. If a pipe test reveals rusty water, the strategy is applied recursively from this result. In Fig. 2b (with tank 5 rusty), the following test sequence exemplifies stepwise backtracking: 6–8 (C), 5–8 (R), 1–5 (C), and then 2–5 (C). Thus, testing starts at the network rusty output and moves upstream.
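A minimal sketch of stepwise backtracking as a procedure, under stated assumptions. The topology is hypothetical (Fig. 2b is not reproduced in the text), the function names are our own, and two behaviors are assumed rather than taken from the source: inputs to a tank are tested in the order they appear in `PIPES`, and the reasoner moves upstream as soon as one input tests rusty.

```python
from collections import defaultdict

# Hypothetical topology (an assumption); pipe order fixes the test order.
PIPES = [(1, 4), (2, 4), (1, 5), (2, 5), (3, 6), (4, 7), (6, 8), (5, 8), (6, 9)]


def build_parents(pipes):
    """Map each tank to the list of tanks feeding directly into it."""
    parents = defaultdict(list)
    for src, dst in pipes:
        parents[dst].append(src)
    return parents


def upstream_or_self(tank, parents):
    """Return the tank plus every tank with a directed path into it."""
    seen, stack = {tank}, [tank]
    while stack:
        for parent in parents[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def stepwise_backtracking(rusty_output_tank, fault, pipes):
    """Trace pipe tests from the rusty output upstream until a tank is checked."""
    parents = build_parents(pipes)
    current, log = rusty_output_tank, []
    while True:
        moved = False
        for src in parents[current]:
            # Deterministic single fault: a pipe carries rusty water exactly
            # when its source tank is the fault or lies downstream of it.
            result = "R" if fault in upstream_or_self(src, parents) else "C"
            log.append((f"{src}-{current}", result))
            if result == "R":
                current, moved = src, True
                break
        if not moved:
            # All inputs clean (or none exist): diagnose the current tank.
            log.append((f"check tank {current}", "R"))
            return log


for action, result in stepwise_backtracking(rusty_output_tank=8, fault=5, pipes=PIPES):
    print(action, result)
# 6-8 C
# 5-8 R
# 1-5 C
# 2-5 C
# check tank 5 R
```

With tank 5 as the fault, the trace reproduces the test sequence given for Fig. 2b. Clean results are recorded but never used to prune other tanks, which is what makes the strategy less efficient than elimination.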
Discovering elimination
Here we describe more exploratory reasoning that may allow reasoners to transition from backtracking to elimination. In Fig. 2b (general backtracking set: tanks 1, 2, 3, 5, 6, and 8), suppose a reasoner using general backtracking hypothesized that pipe 3–6 contains rusty water. Instead of making a diagnostic test of this pipe, the reasoner could make the predictive inferences that pipe 6–9 and the output of tank 9 contain rusty water. Since the final prediction is disconfirmed by observed evidence, pipes 3–6, 6–9, and 6–8 must contain clean water, and tanks 6 and 9 must be clean. A sequence of hypothetical predictive inferences like this could lead a reasoner to discover the elimination strategy. After practicing this discovery strategy, participants could develop the more efficient elimination strategy described earlier, in which they immediately rule out tanks leading to clean water without making predictive inferences. In the following, the term elimination refers to the practiced version.
Prior research on belief updating strategies
Gugerty (2007) found that college students used mostly backtracking on a version of the current task and showed little use of elimination. Johnson et al. (1982) provided evidence that medical students, but not physicians, used stepwise backtracking during diagnosis. Carlson et al. (1992) found that undergraduates trained in stepwise backtracking followed it more than an untrained group but that training did not reduce testing costs. Patel et al. (1990) and Johnson et al. (1982) documented instances of physicians using practiced elimination on patient cases.
We noted above that hypothetical thinking based on predictive inferences was used in discovering elimination and in IBE. This approach is important because hypothetical thinking has been implicated as a key part of fluid intelligence and analytic thinking (Evans and Stanovich 2013), which relate to two of our research questions. Gugerty (2007) found that participants who initially used only backtracking increased their use of elimination (relative to control participants) when trained to test fault hypotheses by predictive reasoning instead of diagnostic tests. Patel et al. (1990) and Johnson et al. (1982) observed physicians ruling out hypotheses that conflict with evidence by making predictive inferences instead of diagnostic tests. Finally, IBE has been observed in field studies of medical diagnosis (Eddy and Clanton 1982; Kassirer 1989).
Which strategies are normative?
We defined normative performance for updating strategies in terms of maximizing information gain. Information gain is a common metric for measuring people’s efficiency at posing questions to gather information, including the test selection component of fault diagnosis. Nelson (2005) showed that information gain is at least as efficient as other metrics (e.g., diagnosticity) for quantifying performance on question-posing tasks. For example, Navarro and Perfors (2011) proved that the half-split strategy (selecting the test that comes closest to eliminating half of the hypotheses) is normative in the sense that it maximizes information gain and minimizes the number of tests. Using a task where participants asked yes–no questions to determine the cause of an event, Ruggeri and Lombrozo (2015) found a developmental shift between ages 7 and 18, whereby older participants asked questions that more effectively narrowed the search space, resulting in higher information gain and fewer tests. Bramley et al. (2017) used information gain to define normative performance at selecting interventions to learn the causal structure of a system.
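To make the link between half-split and information gain concrete, the following sketch computes the expected information gain of a single yes–no test. It is a worked illustration under assumptions that go beyond the text: a single fault, a uniform prior over the candidate tanks, and a deterministic test that comes back rusty exactly when the fault is among the `n_upstream` tanks feeding the tested pipe. The function name is our own.

```python
import math


def expected_information_gain(n_hypotheses, n_upstream):
    """Expected entropy reduction, in bits, from one deterministic test.

    Assumes a uniform prior over n_hypotheses candidate faults; the test
    separates them into n_upstream tanks (rusty result) and the rest
    (clean result), and the reasoner updates the fault set accordingly.
    """
    n, k = n_hypotheses, n_upstream
    prior_entropy = math.log2(n)
    expected_posterior_entropy = 0.0
    for size in (k, n - k):
        if size > 0:
            # Outcome probability size/n, leaving `size` equally likely faults.
            expected_posterior_entropy += (size / n) * math.log2(size)
    return prior_entropy - expected_posterior_entropy


# Compare every possible split of eight candidate tanks.
gains = {k: expected_information_gain(8, k) for k in range(1, 8)}
best_split = max(gains, key=gains.get)
print(best_split, gains[best_split])  # 4 1.0
```

Under these assumptions the even split yields one full bit, whereas a test that isolates a single candidate (k = 1, comparable to checking one tank) yields roughly half a bit. The gain is realized only if the reasoner actually updates the fault set after the result.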
However, once people have selected an efficient diagnostic test, they must appropriately update the hypothesis set to realize any information gain. Consider a reasoner who conducts a half-split pipe test that reveals clean water. If this person is unaware that tanks upstream of clean water can be eliminated, she might fail to update the fault set appropriately, resulting in no information gain. For deterministic problems where no further narrowing can be accomplished by IBE, de Kleer and Williams (1987) proved that elimination minimizes the size of the fault set. Notably, the minimization of tests is not an explicit goal of elimination; it is an emergent property of making diagnostic and predictive inferences from all of the available observations. When multiple abnormal system outputs are present, elimination and IBE can be used together. To the extent that the single-fault constraint is warranted, elimination followed by IBE is normative and minimizes the size of the fault set because it uses all the information and constraints that can reduce this set.
Thus, to minimize diagnostic tests during fault diagnosis, participants should use elimination and, if needed, IBE for belief updating and half split for test generation. However, given the size and structure of the networks used in our study, the updated fault sets after use of elimination and IBE were usually so small that half-split could not be used (i.e., they contained three or fewer pipes). We asked participants to minimize the costs of diagnostic tests and diagnoses, which required minimizing the number of tests, because we wanted to assess their capabilities for normative fault diagnosis. In preliminary testing, when there was no delay after pipe tests, many participants seemed to be minimizing time use rather than costs, as they used mostly backtracking and made many very fast tests. The 2.5 s delay after each pipe test was implemented to encourage them to minimize tests.
Measuring strategy use
Elimination use and backtracking were measured on blocks of five to nine network problems that had one rusty output so that IBE could not be used. IBE use was measured on a separate block of five to nine problems that had two rusty outputs. The sequence and timing of pipe tests, tank checks, and markers placed was recorded for each problem. We used both types of diagnostic actions—pipe tests and tank checks—to measure elimination and IBE use. For each participant, strategy use variables were calculated for each network problem and then averaged over problems in a block.
Elimination
Since almost all participants confined their actions to the general backtracking fault set, the elimination use variable was designed to measure the extent to which participants went beyond backtracking to use elimination on a single problem. We measured elimination use based on how frequently participants’ actions were within the set of potentially rusty tanks identified by this strategy. The percentage of elimination actions (%ElimActions) was the percentage of the total actions for a network that were in the current elimination set, which was updated after each test. However, since the elimination set is always a subset of the general backtracking set (e.g., Fig. 1), a participant using only backtracking will have some actions fall within the elimination set by chance. Therefore, actions in the elimination set do not unambiguously indicate elimination use. Accordingly, we only gave participants credit for using elimination if their percentage of elimination actions was above the percentage that would be expected for participants using backtracking. The chance percentage of actions falling within the elimination set given use of stepwise backtracking (chance%ElimActions) was estimated for each problem by averaging 10,000 runs of a simulation that diagnosed the fault using the stepwise backtracking strategy and calculated the percentage of actions within the elimination set. Then, to calculate Elimination Use for a problem, %ElimActions was corrected for chance:
$${\text{Elimination use}} = \frac{{\% {\text{ElimActions}} - {\text{chance}}\% {\text{ElimActions}}}}{{100 - {\text{chance}}\% {\text{ElimActions}}}} \tag{1}$$
Thus, elimination use measured how consistently an individual used elimination on a problem beyond the level expected from using only stepwise backtracking. Elimination use would be 100 for participants who used elimination for all actions on the problem and 0 (on average) for participants who always used stepwise backtracking. (The chance percentage of elimination tests based on stepwise backtracking is higher than the percentage based on general backtracking. Thus, using stepwise backtracking as the baseline yields a more conservative estimate of elimination use.)
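The chance correction can be illustrated with a small Monte Carlo sketch. This is not the simulation used in the studies; it is a toy version under labeled assumptions: a hypothetical network, a random order for testing the inputs of each tank, and a pipe test counted as inside the elimination set only when both tanks it connects are in the current set. The score is multiplied by 100, which is our assumption for matching the 0 to 100 range described in the text, since Eq. 1 as written yields a proportion.

```python
import random
from collections import defaultdict

# Hypothetical network (an assumption, not one of the study networks).
PIPES = [(1, 4), (2, 4), (1, 5), (2, 5), (3, 6), (4, 7), (5, 8), (6, 8), (6, 9)]
OUTPUTS = {7: "C", 8: "R", 9: "C"}  # outputs observed when tank 5 is rusty
FAULT = 5


def build_parents(pipes):
    parents = defaultdict(list)
    for src, dst in pipes:
        parents[dst].append(src)
    return parents


def upstream_or_self(tank, parents):
    seen, stack = {tank}, [tank]
    while stack:
        for parent in parents[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def pct_elim_actions_stepwise(pipes, outputs, fault, rng):
    """Percentage of one stepwise-backtracking run's actions that fall
    inside the elimination set, which is updated after every pipe test."""
    parents = build_parents(pipes)
    rusty_out = next(t for t, w in outputs.items() if w == "R")
    elim = set(upstream_or_self(rusty_out, parents))
    for tank, water in outputs.items():
        if water == "C":
            elim -= upstream_or_self(tank, parents)
    current, in_set, total = rusty_out, 0, 0
    while True:
        sources = list(parents[current])
        rng.shuffle(sources)  # assumed: inputs are tested in random order
        moved = False
        for src in sources:
            total += 1
            in_set += int(src in elim and current in elim)
            feeders = upstream_or_self(src, parents)
            if fault in feeders:      # rusty result: keep only upstream tanks
                elim &= feeders
                current, moved = src, True
                break
            elim -= feeders           # clean result: rule out upstream tanks
        if not moved:                 # all inputs clean: check this tank
            total += 1
            in_set += int(current in elim)
            return 100.0 * in_set / total


def elimination_use(pct_elim_actions, chance_pct):
    """Eq. 1 rescaled to 0-100 (the rescaling is our assumption).
    Undefined when chance_pct equals 100."""
    return 100.0 * (pct_elim_actions - chance_pct) / (100.0 - chance_pct)


rng = random.Random(0)
runs = [pct_elim_actions_stepwise(PIPES, OUTPUTS, FAULT, rng) for _ in range(10_000)]
chance = sum(runs) / len(runs)
print(round(chance, 1))                       # close to 45 for this toy network
print(elimination_use(100.0, chance))         # 100.0
print(round(elimination_use(70.0, chance)))   # partial credit above chance
```

On this toy network a pure stepwise backtracker lands in the elimination set on 40% or 50% of actions depending on test order, so a participant needs to exceed roughly 45% before receiving any credit for elimination.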
IBE
Because the use of IBE without elimination will not minimize the fault set, we only gave people credit for using IBE if they also used elimination. Because the IBE set is always a subset of the elimination set, measuring IBE use presents the same problem as measuring elimination. Therefore, as for elimination, we only gave participants credit for using IBE if they made more tests in the IBE set than would be expected for someone using elimination but not IBE. The percentage of IBE actions (%IBEactions) was the percentage of the total actions for a network that were in the updated fault set based on using elimination and IBE. The chance percentage of actions falling within the IBE set for a person using elimination but not IBE was calculated for each network problem by simulation. Thus,
$${\text{IBE use}} = \frac{{\% {\text{IBEactions}} - {\text{chance}}\% {\text{IBEactions}}}}{{100 - {\text{chance}}\% {\text{IBEactions}}}} \tag{2}$$
Participants who used elimination and IBE for all actions on a problem would score 100 on IBE use, and those who used only elimination would score around 0.
The elimination use and IBE use variables describe how frequently individuals used these strategies. However, since the elimination set was a subset of the backtracking set, the percentage of general (or stepwise) backtracking actions cannot be used directly to characterize how frequently individuals used these strategies. In the results section, we describe how we measured the backtracking strategies and how we classified individual participants in terms of whether they consistently used any of the four strategies across all of the problems.
Hypotheses and analyses
Infrequent versus modal strategies (Q1)
We expected that elimination and IBE would be used by a small percentage of participants, with backtracking being the modal strategy. Although this is a descriptive question with an imprecise criterion, the pattern of infrequent normative performance has been found for resisting belief bias in syllogistic reasoning and the Wason selection task (Stanovich and West 1998). Also, these data are important in understanding whether some individuals can achieve consistent normative performance on reasoning tasks.
Predictors of strategy use (Q2)
Our description of the elimination and IBE strategies suggested that these strategies rely on hypothetical thinking using working memory. Therefore, following Evans and Stanovich’s (2013) assumption that fluid intelligence and thinking dispositions assess the capability and propensity, respectively, to engage in this kind of working-memory-intensive thinking, we hypothesized that fluid intelligence and thinking dispositions would correlate positively with elimination use and IBE use, with each predictor accounting for unique variance in using these strategies.
Heuristic versus analytic processing (Q3)
We also evaluated whether elimination, IBE, and backtracking involve heuristic or analytic processing. This evaluation was done in a post-hoc manner, without advancing a hypothesis. Analytic processing involves heavy use of working memory and exhaustive information processing, while heuristic processing involves processing environmental cues based on prior knowledge using mental shortcuts (Dreschler et al. 2014; Evans and Stanovich 2013). Elimination and IBE seem to have characteristics of analytic processing, as they require making and maintaining many inferences in working memory and using all the available evidence. Both backtracking strategies have characteristics of heuristic processing: they focus primarily on salient environmental cues (rusty water) and, because they ignore useful evidence from clean water, require fewer inferences. Thus, elimination and IBE should yield more accurate but slower performance compared to backtracking.
Contribution
Individual differences
Research on causal reasoning has begun to address individual differences in causal learning strategies (Bramley et al. 2017; Bramley et al. 2015; Buehner et al. 2003; Coenen et al. 2015). However, these studies did not examine fault diagnosis, and they did not investigate the cognitive correlates of reasoning strategies. Individual differences studies that have focused on diagnostic reasoning have tended to use tasks that are much simpler than fault diagnosis (McNair and Feeney 2015; Sirota et al. 2014). In contrast to these studies, the current project uses two causal reasoning tasks (fault diagnosis and causal learning) to investigate individual differences in the use of normative versus non-normative strategies and the cognitive correlates of normative strategy use. We are not aware of individual differences research that has investigated people’s performance on two causal reasoning tasks.
Cognitive processes in fault diagnosis
Our fault diagnosis task differs from other lab tasks used to study diagnostic reasoning in a number of ways. Most lab-based diagnostic reasoning tasks involve simpler causal structures than the fault diagnosis task. Also, many diagnostic reasoning tasks require participants to make a single, explicit, quantitative judgment on each problem (e.g., posterior probability), whereas in the fault diagnosis task, participants make multiple realistic actions (diagnostic tests) and, to perform effectively, must do Bayesian updating of their problem knowledge after each test.
Few studies have investigated the higher-level cognitive processes (e.g., strategies) used in fault diagnosis and diagnostic reasoning. Studies using fault diagnosis tasks similar to ours (Carlson et al. 1992; Kostopoulou and Duncan 2001; Ham and Yoon 2007) have evaluated how training methods affect diagnostic performance but have not measured how frequently participants used particular strategies. Also, we are not aware of studies that investigated whether particular fault diagnosis strategies involved analytic versus heuristic processing.
Empirical studies
Studies 2 and 3 can be considered replications of Study 1; therefore, we present the results of the studies together. Here we describe minor variations across the studies. In Study 1, fluid intelligence was measured by SAT and ACT scores and thinking dispositions by open-mindedness (Stanovich and West 1997). A limitation of Study 1 was that only one measure was used for each predictor variable. In Studies 2 and 3, we added two fluid abilities tests: verbal analogies and (because the fault diagnosis task seemed to have a spatial component) a spatial reasoning test. In Study 2, we added intellectual engagement as a thinking dispositions test (Goff and Ackerman 1992). The main focus of Study 3 was to test a hypothesis related to the causal learning task. Because this created time limitations, we did not measure thinking dispositions in Study 3. For Studies 2 and 3, because each predictor was measured using multiple tests, we used structural equation modeling (SEM) with latent variables representing fluid intelligence and thinking dispositions. This approach allowed us to assess the reliability of our predictors in the same causal model as our reasoning outcomes. Also, SEM identifies the unique variance accounted for among all observed variables, giving a more reliable estimate of all relationships involved than would be obtained by aggregating predictors.
Participants completed 18 fault diagnosis problems in Study 1 and 10 fault diagnosis problems in Studies 2 and 3. To encourage participants to use mainly diagnostic (pipe) tests, the delay after submitting an incorrect diagnosis was increased from 5 s in Study 1 to 12.5 s in Studies 2 and 3. Finally, the instructions for the fault diagnosis task were improved across the studies, as described below.