Causal illusions in the classroom: how the distribution of student outcomes can promote false instructional beliefs

Double, Kit S.; Chow, Julie Y. L.; Livesey, Evan J.; Hopfenbeck, Therese N.

doi:10.1186/s41235-020-00237-2

Original article
Open access
Published: 03 August 2020

Causal illusions in the classroom: how the distribution of student outcomes can promote false instructional beliefs

Kit S. Double ORCID: orcid.org/0000-0001-8120-1573¹,
Julie Y. L. Chow²,
Evan J. Livesey² &
…
Therese N. Hopfenbeck¹

Cognitive Research: Principles and Implications volume 5, Article number: 34 (2020) Cite this article

2907 Accesses
4 Citations
10 Altmetric
Metrics details

Abstract

Teachers sometimes believe in the efficacy of instructional practices that have little empirical support. These beliefs have proven difficult to efface despite strong challenges to their evidentiary basis. Teachers typically develop causal beliefs about the efficacy of instructional practices by inferring their effect on students’ academic performance. Here, we evaluate whether causal inferences about instructional practices are susceptible to an outcome density effect using a contingency learning task. In a series of six experiments, participants were ostensibly presented with students’ assessment outcomes, some of whom had supposedly received teaching via a novel technique and some of whom supposedly received ordinary instruction. The distributions of the assessment outcomes was manipulated to either have frequent positive outcomes (high outcome density condition) or infrequent positive outcomes (low outcome density condition). For both continuous and categorical assessment outcomes, participants in the high outcome density condition rated the novel instructional technique as effective, despite the fact that it either had no effect or had a negative effect on outcomes, while the participants in the low outcome density condition did not. These results suggest that when base rates of performance are high, participants may be particularly susceptible to drawing inaccurate inferences about the efficacy of instructional practices.

Significance statement

The article outlines a series of six experimental studies that examine whether biases in contingency learning affect the judgements participants make about the efficacy of a teaching technique. The manuscript shows an outcome density effect in judgements of novel teaching methods’ efficacy, whereby participants who are exposed to frequent positive student outcomes in a contingency learning task erroneously conclude that the teaching technique is effective. The study has implications for understanding why inaccurate beliefs are prevalent among educators and why such beliefs do not necessarily self-correct over time.

A number of widespread beliefs about instructional practice have been criticised as lacking a scientific basis (e.g. Dekker, Lee, Howard-Jones, & Jolles, 2012; Howard-Jones, 2014; Kirschner, 2017; Kirschner, Sweller, & Clark, 2006). For example, 93% of teachers still subscribe to the (now widely debunked) idea of student learning styles (Dekker et al., 2012). Like urban myths, many of these beliefs have persisted for a long time, often decades after a scientific consensus around their inaccuracy was reached (Kirschner, 2017). Inaccurate instructional beliefs are often adopted to the exclusion of more evidence-based practices, which has a significant negative effect on academic outcomes (Bruyckere, Kirschner, & Hulshof, 2015). While, such practices are often spread culturally, being passed from teacher to teacher, they are also often reinforced by inaccurate media stories, social media, and a lack of scientific education (Pasquinelli, 2012). Teacher misconceptions are remarkably robust across countries and pseudoscientific practices are increasing in schools worldwide (Ferrero, Garaizar, & Vadillo, 2016; OECD, 2002). Little is known, however, about the psychological mechanisms that reinforce inaccurate beliefs about instructional practice. Previous studies into contingency learning—how individuals learn about the statistical relationship between a behaviour and an outcome—have suggested that inaccurate beliefs about the causal effect of a behaviour can often form when the expected outcome occurs frequently, irrespective of the actual contingency between behaviour and the outcome – a so-called outcome density effect (Blanco, Matute, & Vadillo, 2013; Chow, Colagiuri, & Livesey, 2019). In the current study, we examine the outcome density effect in the context of teacher beliefs, by evaluating whether the distribution of students’ outcomes on an academic assessment (e.g. a class test) influences participants’ false beliefs about the efficacy of instructional practices.

Contingency learning

Like most beliefs that people hold, teachers’ beliefs are often causal in nature, that is, they are motivated by perceived cause and effect relationships (e.g. If I use teaching practice x my students’ performance will improve). Teachers acquire these beliefs by accumulating evidence about the cause-effect relationship through direct experience with the putative cause (often referred to as the cue); e.g. the use of a novel teaching practice, and the desired outcome, e.g. improvement in grades. If the cue is influential in changing the outcome, then the probability of the outcome occurring should differ as a function of whether the cue was present or absent. The process of extracting causal information in this way is often referred to as contingency learning (Jenkins & Ward, 1965). This difference in the probability of events is more formally captured in Allan’s delta p (Δp) index (Allan, 1980):

$$ \Delta p=p\left(O|C\right)-p\left(O|\sim C\right) $$

(1)

Δp = contingency

p(O|C) = probability of the outcome given the cue

p(O|~C) = probability of the outcome given no cue

According to Eq. 1, a positive Δp indicates a positive contingency between cue and outcome, such that the probability of the outcome occurring is greater when the cue is present than when it is absent (i.e. the novel teaching practice is effective at improving students’ grades). In contrast, a negative Δp value indicates that the novel teaching practice is producing worse outcomes than if students were not given the novel teaching practice at all. The ability to extract causal information through experience is a necessary tool for navigating the world; people are motivated to produce behaviours that lead to a desirable outcome and avoid behaviours that produce undesirable ones. In fact, people are generally good at identifying positive and negative contingencies between events (e.g. Shanks & Dickinson, 1988); however, when there is no genuine relationship between the two events, that is Δp = 0, people tend to overestimate the causal relationship and develop a false causal belief. This phenomenon is often referred to as the illusion of causality or illusory causation (for a review, see Matute et al., 2015). The illusion of causality has previously been associated with the development and maintenance of pseudomedicine beliefs (Matute, Yarritu, & Vadillo, 2011), as well as judgements of guilt in a criminal setting (Lassiter, Geers, Munhall, Ploutz-Snyder, & Breitenbecher, 2002). We argue that this cognitive bias also presents a problem to educators, as it might result in teachers endorsing teaching practices that are not effective in improving students’ academic performance. We, of course, do not suggest that all false beliefs that teachers hold are the result of observational contingency learning in the classroom or that contingency learning is the only mechanism that reinforces such beliefs; e.g. it would be difficult to see how some false beliefs, such as the belief that we only use 10% of our brains (endorsed by nearly 50% of teachers; Dekker et al., 2012), are perpetuated through contingency learning in the classroom. However, many beliefs that teachers hold are about the efficacy of their own practices which presumably are based on causal inferences about the impact of their teaching practices, which may be driven by contingency learning.

Outcome densities and causal inference

The ability to correctly estimate the contingency between two events relies on an accurate memory of the outcome occurring in the presence and absence of the cue. Experimental research on illusory causation effects has explored the frequency of cue and outcome events as potential factors that inflate false causal beliefs, where manipulations that increase cue and outcome coincidences are particularly effective in biasing strong false beliefs (e.g. Wasserman, 1990). One pertinent example of this is the outcome density effect (Blanco & Matute, 2015; Chow et al., 2019). The outcome density effect is the tendency for people to overestimate the causal relationship between a cue and an outcome when the base rate of the outcome occurring is high relative to when the base rate is low, even when the outcome is independent of the cue. The outcome density effect has been reliably produced using binary outcome events (Blanco & Matute, 2015), when the outcome event is variable (non-discrete) and ambiguous in relation to the participant’s putative causal belief (Chow et al., 2019), when the cue-outcome events are presented one trial a day across a 24-day time scale (Willett & Rottman, 2019), and when the genuine contingency is negative (Vallée-Tourangeau, Murphy, & Baker, 2005; Wasserman, Elek, Chatlosh, & Baker, 1993). In a classroom setting, the outcome density effect may play a role in biasing teachers’ ability to accurately determine the effectiveness of their teaching practices if the student cohort is high-achieving and, therefore, likely to perform well academically regardless of the teaching practice used. Some researchers have also proposed that the development of strong false beliefs are able to interfere with subsequent acquisition of real causal relationships (Yarritu, Matute, & Luque, 2015), suggesting that these beliefs may be persistent and difficult to correct.

Although we will not be manipulating the frequency of the cue in this series of experiments, it is important to note that a high frequency of cue-present trials (e.g. implementing the teaching practice regularly) also result in heightened illusory causation relative to when the cue is presented infrequently. This is referred to as the cue density bias (Allan & Jenkins, 1983; Matute et al., 2011). In theory, these event densities may present a cycle of illusory belief that is difficult to break: teachers develop strong false belief in the efficacy of an ineffectual teaching practice when they have a high-performing cohort (i.e. the outcome density effect), and this results in the persistence of the teaching practice that further strengthens the belief in its efficacy (i.e. the cue density effect), although neither the outcome or cue density effect have been shown in educationally relevant situations previously. Therefore, it is pertinent that we examine illusory causation and the outcome density effect, in particular, in an educational context to determine the extent to which students’ academic outcomes influence people’s belief in a novel teaching practice that is objectively ineffective (Experiments 1–3, 5, and 6) and even detrimental to student performance (Experiment 4).

Outcome densities in educational assessments

Classroom-based assessments are often used by teachers to gauge students’ achievement and learning. Teachers use these assessments to gauge whether their instruction and teaching methods are working, and often adjust their practices according to their students’ results on assessments (Pellegrino, Chudowsky, & Glaser, 2001). Teachers typically infer the effectiveness of their practice by considering the contingency between their practice and the aggregate level of performance due to the computational complexity and memory demands of basing their inferences on student-level data (Black & Wiliam, 2018; Fiedler, Freytag, & Meiser, 2009). For example, Fiedler, Freytag, and Unkelbach (2007) found that teachers gathering observations to infer correlations between student beliefs in a simulated classroom environment were biased by contingencies at the aggregate (i.e. classroom) level. This raises the question of whether the distribution of outcomes of a teacher’s class can influence their ability to make accurate causal inferences about the efficacy of their practices.

In practical terms, classroom assessments are typically designed so that performance is relatively high. This is typically done out of a desire to allow all students to ‘show what they know’ as well as to promote and protect students’ self-efficacy (e.g. Kang, Thompson, & Windschitl, 2014; McCabe, 2003). It is, therefore, worth noting that the distribution of most classrooms assessments will, in practice, have a high base rate with frequent positive outcomes. The real-world context of classroom assessments thus appear to be analogous to a typical high outcome density condition used in contingency learning experiments, a condition where causal illusions are significantly more likely (Matute et al., 2015). If teachers use classroom assessments to guide and evaluate their practice then designing classroom assessments such that a vast majority of students perform well may, in fact, be biasing teachers to believe in the efficacy of ineffectual practices as well as limiting the utility of classroom assessments as a source of feedback for teachers.

Current study

The current study examines whether the distribution of students’ academic outcomes affects the inferences that an observer makes about the efficacy of a novel instructional technique. Adapting the typical contingency learning paradigm for an educational context and examining both categorical and continuous outcomes, we aim to examine whether frequent positive student outcomes promote false beliefs. In Experiment 1, we first examine whether an outcome density effect is observed in the context of participants’ instructional beliefs when students’ test performance is presented as a discrete outcome (high vs. low performance). A discrete outcome format is typically used within the outcome density literature (Chow et al., 2019). In Experiments 2 and 3, we explore whether outcome density affects participants’ instructional beliefs when test performance is presented as a continuous outcome, using either a skewed (Experiment 2) or normal (Experiment 3) distribution of outcomes. In Experiment 4 we examine whether inaccurate beliefs about the efficacy of a novel instructional technique persist when there is a negative contingency between the use of the novel teaching technique and student outcomes. In Experiment 5, we examine the outcome density effect when student outcomes are presented concurrently as they might in a classroom environment, and, finally in Experiment 6 we replicate the basic outcome density finding in a sample of teachers.

Experiment 1

Participants

Participants were recruited using Amazon’s Mechanical Turk (Mturk). Participation was restricted to participants from the US who had at least a 95% approval rate on the site. Participants were paid 80 cents for participation in the study. A power analysis suggested that a minimum sample size of 78 participants was required to detect an effect size of .65 with 80% power. We recruited a total of 80 (48% female) participants to account for the possibility that data would need to be discarded. Forty-one participants were randomly allocated by computer to the high outcome density (High-OD) condition and 39 participants were allocated the low outcome density (Low-OD) condition. The average age of the participants was 37.08 (standard deviation (SD) = 9.53).

Material and procedure

Before commencing the experiment, participants were told to imagine that they were a school teacher trialling a new instructional method called ‘Kalavatic teaching’. Participants were told that each student in their class had either been taught with this novel method or an ‘ordinary teaching’ method, and it was their goal to determine whether the new teaching method was effective at improving performance in an examination. Participants were told that this novel method of teaching was hypothesised to improve memory and academic performance in students.

Participants then performed a contingency learning task that consisted of a training and test phase. Each trial during the training phase represented one student in the class. For each trial, participants observed whether Kalavatic teaching (i.e. cue-present) or ordinary teaching (i.e. cue-absent) was administered to that particular student. Participants were then asked to predict the student’s performance in the examination. Finally, after making their prediction, the participant was shown an outcome that was ostensibly the student’s actual performance in the examination (see Fig. 1). There were 30 trials in the training phase. This number of trials was selected because it maps onto typical class sizes in the US. Cue presence/absence was randomly determined for each trial and each participant such that the cue was present on 50% of trials in the training phase.

For Experiment 1, the outcome (and prediction) was binary with students either having ‘high’ or ‘low’ performance in the examination. Cue presence and absence was distinguished by presenting the teaching method used in large blue font on each trial, either Kalavatic teaching on cue-present trials or ordinary teaching on cue-absent trials. The prediction was accompanied by the instruction ‘What level of test performance do you expect?’. Participants indicated their prediction by clicking a button with their mouse (either ‘high’ or ‘low’). After making their prediction, the outcome was displayed in the format ‘Actual Performance – high [low]’ in green font. Each trial was participant paced, such that they clicked ‘Next’ when they were ready to progress to the next trial.

During the test phase, participants were asked to make a causal rating by judging how effective Kalavatic teaching was compared to ordinary teaching on a scale ranging from − 100 (‘Effectively IMPAIRS academic performance’) to 100 (‘Effectively IMPROVES academic performance’). The midpoint (zero) was labelled with the anchor (‘Completely ineffective’). The question asked ‘On a scale from − 100 to 100%, rate how effective you think the teaching technique was compared to doing ordinary teaching, if at all’. At the top of the page there was a heading: ‘Kalavatic teaching vs. ordinary teaching’ (see Fig. 2). The following note was also provided below the rating to participants while they made their judgement:

‘Note that intermediate negative values indicate the teaching technique actually makes academic performance worse whereas intermediate positive values indicate that the teaching technique was effective in improving academic performance’.

Design

Participants were randomly allocated to either a high outcome density condition or a low outcome density condition, the difference being the ratio of outcomes. For the High-OD condition 20/30 of students had ‘high’ performance in the examination (10/30 ‘low’), while for the Low-OD condition 10/30 had ‘high’ performance on the examination (20/30 ‘low’). The outcome for each trial (high/low) was presented in a random intermixed fashion from the specified distributions. In both conditions there was no contingency between the presentation of the cue (Kalavatic teaching vs. ordinary teaching) and the outcome (high vs. low; Δp = 0).

General analytical approach

In each experiment we compare the high and low outcome density conditions on each dependent variables separately. Comparisons are made using both Bayes Factors and Frequentist statistics. Where reported, the BF₁₀ is the likelihood of the alternative model compared to the null model given the data, where the null model specifies no difference between groups. For the between-within-subjects analyses of variance (ANOVAs), we report the Bayes Factor Inclusion/Exclusion (BF_inc/BF_excl) across matched models (Rouder, Morey, Verhagen, Swagman, & Wagenmakers, 2017), which indicates the evidence that a model including the interaction term is a better fit for the data compared to an equivalent model without an interaction term.

Results and discussion

Training data

We begin by examining participants’ predictions made throughout training. Of primary interest is whether outcome density condition interacts with cue presence. A 2 (Kalavatic teaching vs. ordinary teaching) X 2 (high vs. low outcome density) between-within-subject ANOVA was performed, with the proportion of ‘high’ predictions used as the dependent variable. As shown in Fig. 3a, participants made substantially more ‘high’ predictions of the student’s performance when they were taught with Kalavatic teaching (M = .63, SD = .26) compared with when they were taught with ordinary teaching (M = .47, SD = .27), F = 15.77, p < .001, η_p² = .17, BF₁₀ > 100. Furthermore, participants made more ‘high’ predictions on average when they were in the high outcome density group (M = .69, SD = .22) compared to the low outcome density group (M = .41, SD = .25), F = 79.82, p < .001, η_p² = .51, BF₁₀ > 100. Crucially, however, the interaction between cue presence and outcome density condition was not significant, F = 0.15, p = .70, η_p² = .002, BF_inc = .248. These results suggests that outcome density did not affect participants predictions of the effect of Kalavatic teaching relative to ordinary teaching in training. This finding is consistent with previous research and may suggest important differences between the processes involved in making predictions in training compared to a making causal judgements (Blanco & Matute, 2015; Chow et al., 2019). This finding will be discussed in greater detail in the general discussion.

Efficacy ratings

Next, we examined whether there was an outcome density effect with respect to participants’ causal ratings. As shown in Fig. 3b, participants in the high outcome density condition (M = 21.02, SD = 53.03) made significantly higher efficacy ratings compared to participants in the low outcome density condition (M = − 8.13, SD =53.75, t (78) = 2.44, p = .017, d = .55, BF₁₀ = 2.95). The efficacy ratings for participants in the high outcome density group were significantly higher than zero, t (40) = 2.54, p = .015, d = .40, BF₁₀ = 2.84, while the efficacy ratings for the low outcome density group were not, t (38) = .94, p = .351, d = − .15, BF₁₀ = .26.

The results of Experiment 1 suggest that the frequency of assessment outcomes affects the causal efficacy ratings made by participants with respect to a novel instructional technique. This finding replicates the classic outcome density effect that has typically been observed in causal or efficacy ratings in other learning contexts. The result provides an in-principle demonstration that similar contingency learning biases may affect judgements of educational effectiveness in the classroom. Participants observing frequent positive outcomes (high performance) appear to be more susceptible to incorrectly inferring that the novel teaching treatment is effective.

Experiment 2

While Experiment 1 established an outcome density effect on causal beliefs about instructional techniques using the discrete paradigm typical of contingency learning studies, such discrete outcomes are less frequent in the classroom. Most classroom-based assessments and exams are marked on a continuous scale (although some categorisation is also typical, e.g. A+). We therefore now turn in Experiment 2 to examining whether teaching beliefs are affected by the frequency of assessment outcomes when such outcomes are presented using a continuous scale.

The outcome density effect has recently been applied to continuous outcomes by Chow et al. (2019). They found that in a health context, continuous outcomes still produced illusory causation and outcome density effects. They used two types of distribution to test this. In Experiment 1, they used a bimodal distribution combining two normal distributions, one centred on high outcome values, one centred on low outcome values, varying the proportion of trials sampled from each distribution to create High-OD and Low-OD conditions. In Experiment 2, they used unimodal distributions that were either centred on a low value and positively skewed or centred on a high value and negatively skewed. Here in Experiment 2 we similarly utilise skewed unimodal distributions, before using symmetrical normal distributions in Experiment 3.