In a series of six experiments, we examined the outcome density effect in relation to causal illusions about instructional practices. We hypothesised that the frequency of positive outcomes plays a role in promoting inaccurate beliefs about the efficacy of ineffectual instructional practices. Across all six experiments, we showed that outcome density affects participants' causal ratings: when simulated student performance was high, participant observers were more likely to believe that a novel teaching technique was effective, despite the fact that it was either ineffective or detrimental to performance. Using a typical contingency learning paradigm, in Experiment 1 we showed that outcome density affects causal beliefs about instructional practices when the outcome is categorical. As in the vast majority of outcome density experiments, causal illusions were more likely when the salient category (i.e. high performance) was more frequent. In Experiments 2 and 3 we examined the outcome density effect in a more educationally plausible scenario, in which the outcome (i.e. students' performance) was continuous. Both experiments indicated that when the distribution of students' performance favoured frequent positive outcomes, participants were more likely to infer that a novel teaching technique was effective at promoting students' academic performance. In Experiment 4 we showed that even when the programmed contingency between the use of a novel teaching technique and students' performance was negative, participants who observed frequent positive outcomes continued to rate the teaching technique as effective. In Experiment 5, we demonstrated that outcome density effects similarly occur when student outcomes are presented concurrently, and, finally, in Experiment 6 we replicated the outcome density effect in a sample of teachers.
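For readers less familiar with this literature, the contingency manipulated in these designs can be summarised with the standard ΔP metric (cf. Jenkins & Ward, 1965). The probabilities below are hypothetical and chosen purely for illustration; they are not the exact parameters used in our experiments.

```latex
% One-way contingency between cue C (novel technique used) and
% outcome O (high student performance):
\Delta P \;=\; P(O \mid C) \;-\; P(O \mid \neg C)

% Illustrative high-density, zero-contingency case:
%   P(O \mid C) = P(O \mid \neg C) = .80 \;\Rightarrow\; \Delta P = 0
% Illustrative high-density, negative case (the logic of Experiment 4):
%   P(O \mid C) = .70,\ P(O \mid \neg C) = .90 \;\Rightarrow\; \Delta P = -.20
```

In both illustrative cases most trials end in a positive outcome, which is precisely the condition under which causal illusions were observed.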
Inaccurate beliefs about the effectiveness of instructional techniques are widespread among educators (Dekker et al., 2012). Little is known, however, about the psychological mechanisms that give rise to and maintain these beliefs. Here, we examined the role that contingency learning may play in maintaining false beliefs in teachers. While humans are generally good at tracking contingencies between events and outcomes (Shanks & Dickinson, 1988; Wasserman, 1990), there is clear evidence of robust biases that undermine these inferences (e.g. Don & Livesey, 2017; Hannah & Beneteau, 2009; Matute, Blanco, & Díaz-Lago, 2019). The current study replicates the outcome density effect in a new domain and suggests that outcome frequency may be an important factor in determining teachers' instructional beliefs. When positive outcomes are frequent, participants' causal inferences can depart markedly from reality (e.g. believing that an instructional practice is improving students' performance when it is, in fact, impairing it), a finding with substantial implications for how teachers use classroom assessments and evaluate their students' outcomes.
Importantly, the distributions used in the high outcome density conditions, where causal illusions tended to occur, closely mirror the distributions of typical classroom assessments used by practising teachers. The quality of teachers' judgements and decisions about the efficacy of their practice will be determined by the extent to which the learning environment provides access to informative feedback. This is particularly important because teachers are often encouraged to use classroom assessments frequently to evaluate and improve their practice (Black & Wiliam, 1998). Indeed, assessments have been proposed to create 'a moment of contingency' (Wiliam, 2006, p. 285), where teachers are able to use the assessment as evidence and adapt their instruction accordingly. These propositions assume that teachers can accurately infer the contingency between their practice and their students' performance on the assessment. However, if teachers' inferences are biased by the frequency of an outcome, then classroom-based assessments may have less utility as a tool for teachers to evaluate and improve their practice than proposed. Importantly, the results of these experiments make a clear, if counter-intuitive, prediction: all else being equal, teachers with high-performing students will be more susceptible to causal illusions and more likely to endorse ineffectual instructional practices. Further research is needed to examine this prediction in real-world educational contexts.
It is also worth noting that within educational research, teacher reports of efficacy are still often used as an indicator of an intervention's efficacy, and sometimes as its sole indicator (e.g. Beauchemin, Hutchins, & Patterson, 2008; Berg, Bergendahl, Lundberg, & Tibell, 2003; Lage, Platt, & Treglia, 2000; Martens, Peterson, Witt, & Cirone, 1986). The current results suggest that researchers cannot safely rely on such reports to evaluate efficacy, because they are influenced by the base rate of student performance (along with other biases such as demand characteristics). More broadly, it has been suggested that there is a weak link between practice in schools and teacher education world-wide: student teachers are provided with theory by their teacher educators that is not sufficiently integrated into classroom experiences, contributing to a long-standing mismatch between theory and practice (Lillejord & Børte, 2016; Zeichner, 2010).
Detecting causal relationships within the classroom is a difficult task. Teachers must discern the relationships between a multitude of putative causes and a large number of outcomes, across both students and time. This is in addition to various other complexities such as moderation, mediation, and autocorrelation. Within such a complex environment, individuals are likely to perceive meaningful patterns where there is only random noise (Gilovich, 1991). While, in some situations, this tendency may be less costly to the individual than failing to detect a pattern in the environment, it might also produce behaviours that are simply redundant (Blanco, 2017). In addition, as shown in Experiment 4, it may cause observers to fail to detect that a new practice is actually worsening performance, which may come at a considerable cost to students' achievement. It should also be acknowledged that most previous research on how teachers learn has focussed on how practice can be better linked to theory, and not the other way around (Korthagen, 2017).
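To illustrate how easily random noise can masquerade as a meaningful pattern, the brief simulation below computes a normative contingency estimate (ΔP) for a single small class in which the technique has, by construction, no effect. All parameters (class size, outcome base rate, seed) are hypothetical; this is a sketch of the detection problem, not a model of our experiments.

```python
import random

random.seed(0)

def noisy_delta_p(n_students=30, p_good=0.75):
    """Sample Delta-P for one 'class' where the technique is unrelated
    to the (frequent) good outcomes: the true contingency is exactly zero."""
    a = b = c = d = 0
    for _ in range(n_students):
        used = random.random() < 0.5        # technique used for this student?
        good = random.random() < p_good     # good outcome, independent of use
        if used and good:
            a += 1
        elif used:
            b += 1
        elif good:
            c += 1
        else:
            d += 1
    if a + b == 0 or c + d == 0:            # guard against degenerate samples
        return 0.0
    return a / (a + b) - c / (c + d)

# Eight simulated classes: sample contingencies scatter around zero,
# and some look convincingly positive purely by chance.
print([round(noisy_delta_p(), 2) for _ in range(8)])
```

Even before any bias enters the picture, then, a teacher observing a single class has noisy evidence to work with; the heuristic processes discussed below operate on top of this noise.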
One explanation for this bias in contingency learning is that tracking cue-outcome contingencies is resource-intensive: it requires the learner to attend to, and compute on-line, the probability of the outcome both in the presence and in the absence of the cue (De Houwer, 2009; Mitchell, De Houwer, & Lovibond, 2009). To overcome this high cognitive demand, people rely on heuristics to estimate contingencies between events, which can result in overestimating the frequency of salient events (Fiedler et al., 2009). In this case, evidence suggests that people often pay substantially more attention to instances where the cue and expected outcome co-occur (i.e. confirmatory trials; Crocker, 1982; Jenkins & Ward, 1965), which may, in turn, create inaccurate intuitions when the frequency of such confirmatory trials is high. This bias to attend to confirmatory trials also appears to be present when trial information is summarised, as in Experiment 5. Although one may argue that the presentation format is still sequential in nature (participants only ever saw one cue type on each screen), the cognitive resources required to track and compute average student performance for Kalavatic teaching and ordinary teaching on-line are greatly reduced in this format. Nevertheless, participants still showed a stronger tendency to overestimate the efficacy of Kalavatic teaching when the base rate of student performance was high than when it was low, even though student performance was equivalent under ordinary teaching methods.
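As a rough illustration of this heuristic account, the sketch below contrasts the normative ΔP with a biased estimate that over-weights confirmatory trials. The weighting scheme is hypothetical, loosely inspired by weighted-cell models of contingency judgement (e.g. Kao & Wasserman, 1993), and is not a model fitted to our data.

```python
def delta_p(a, b, c, d):
    """Normative contingency: P(outcome | cue) - P(outcome | no cue)."""
    return a / (a + b) - c / (c + d)

def weighted_judgement(a, b, c, d, w=(1.0, 0.8, 0.6, 0.4)):
    """Biased estimate that weights cell a (cue + outcome) most heavily.
    The weights are illustrative assumptions, not estimated parameters."""
    wa, wb, wc, wd = w
    return (wa * a - wb * b - wc * c + wd * d) / (a + b + c + d)

# 40 trials, cue present on half; outcome base rate .75 (high) vs .25 (low).
# Cells are (a, b, c, d) = (cue+outcome, cue alone, outcome alone, neither).
high_density = (15, 5, 15, 5)
low_density = (5, 15, 5, 15)

for label, cells in (("high", high_density), ("low", low_density)):
    print(f"{label} density: dP = {delta_p(*cells):.2f}, "
          f"biased = {weighted_judgement(*cells):.2f}")
# Both tables have dP = 0, but the biased estimate is positive only when
# good outcomes are frequent -- an outcome density effect in miniature.
```

The point is not that learners literally apply these weights, but that any estimator favouring confirmatory evidence will be pulled upwards whenever positive outcomes are common.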
In all six experiments, the clear outcome density effects observed in ratings of the efficacy of the novel educational practice were not reflected in cue-specific predictions made during training, nor in judgements of the mean outcome in the presence and absence of the novel technique made during testing. In some instances, participants gave higher predictions in the presence of the cue than in its absence (consistent with an illusory causation effect), but the strength of this difference was equivalent in the Low-OD and High-OD conditions. This is not an uncommon finding in research on the outcome density effect, which is typically observed more clearly in causal and efficacy ratings than in other measures. In other instances, particularly when the outcome was continuous and there was no meaningful contingency between cue and outcome, there was little evidence of any illusory causation in these predictions. It is worth noting that the aggregate predictions closely resemble the ratings made during training, for which participants received extensive feedback. The lack of an outcome density effect on these ratings may reflect the fact that, over the course of training, participants learnt to make accurate predictions about the expected outcome. Indeed, it is striking that participants often made these aggregate predictions accurately and yet went on to make inaccurate efficacy ratings.
Some researchers have argued that this dissociation between prediction ratings and causal ratings indicates the involvement of a different system or psychological process when participants make predictions rather than causal judgements (Allan, Siegel, & Tangen, 2005; Blanco & Matute, 2015; Waldmann, 2001). However, given that this is a single dissociation, where the effect is reliable on one measure and inconsistent on the other, it may simply come down to the sensitivity of each individual measure to subtle biases (De Houwer, Vandorpe, & Beckers, 2007; Vadillo et al., 2005; Vadillo, Musca, Blanco, & Matute, 2011). Ultimately, the extent to which aggregate predictions or causal ratings predict actual classroom behaviour is an empirical question requiring further investigation.
In any case, it is important to note that the type of measure on which outcome density effects have been observed most reliably in other contexts also generated outcome density effects in this educational context. An important assumption in causal learning research is that causal and efficacy ratings reflect beliefs about treatments, interventions, or practices that have real consequences for people's choices. That is, we choose whether or not to purchase a treatment or engage in an activity because we believe that doing so will cause a desirable outcome. It remains to be seen whether illusory causation in this context translates into real-world implications for teachers' decisions to adopt new educational practices. With this in mind, it is noteworthy just how influential some educational practices have been despite their poor evidence base and apparent lack of any causal effectiveness in the classroom (De Bruyckere et al., 2015; Kirschner et al., 2006).
The conditions that give rise to the outcome density effect are likely to be encountered in everyday life and, indeed, may be desirable in some instances. For instance, the outcome density effect occurs in medical learning scenarios when patients frequently recover spontaneously. For this reason, researchers have highlighted the relevance of the effect to the popularity of complementary and alternative medical treatments for mild ailments that are likely to remit without treatment (e.g. Echinacea use for the common cold). This has led some to comment that there is often little we can do to change the outcome density itself (Matute et al., 2015). In education, we obviously want students to perform as well as possible, which presents the same challenge for reducing outcome density effects. However, these findings suggest that a teacher with a large number of high-performing students may have difficulty identifying the optimal teaching methods that will help those students who are not performing well.
Some general solutions to biases in contingency learning have been proposed. For instance, Vadillo, Matute, and Blanco (2013) showed that individuals are less prone to the outcome density effect when there are reliable alternative causes for an outcome. It might, therefore, be worthwhile to stress to teachers that, when considering their students' performance on assessments, they need to bear in mind factors other than their teaching that will affect the outcomes (e.g. natural ability, maturation, the difficulty of the assessment). Another potential avenue for correcting biased causal judgements is to provide explicit base rate expectancies regarding student performance, particularly when the student cohort is high achieving and academic performance is therefore negatively skewed (i.e. most students perform well). In a study by Blanco and Matute (2019, Experiment 1), participants completed a zero-contingency learning task either with or without pre-training that led them to expect a high outcome base rate. Participants exposed to the high base rate in pre-training showed reduced illusory causation in their causal judgements compared to control participants, despite witnessing an identical high base rate, zero-contingency cue-outcome relationship. In a subsequent study, the reverse effect was observed: participants pre-trained on a low outcome base rate showed an inflated illusory causation effect relative to control participants (Blanco & Matute, 2019, Experiment 2). These findings suggest that causal illusions can be influenced by prior expectancies about the base rate of outcome occurrence. Although not measured in the current set of experiments, it would be interesting to examine whether teachers' prior expectations about student performance influence their susceptibility to cognitive biases such as the outcome density effect.

However, it is also worth noting that teachers' misconceptions are often resistant to interventions designed to correct them. In one study, Ferrero, Hardwicke, Konstantinidis, and Vadillo (in press) provided educators with texts refuting their misconceptions. While the intervention affected beliefs in the short term, it had no effect in the long term (after 30 days) and, disconcertingly, increased the extent to which teachers indicated that they were willing to implement educational practices based on the misinformation.
Outcome density effects as they relate to educational assessment may offer additional opportunities for intervention. For example, a relatively simple solution in practical terms might be to set more difficult classroom assessments, thereby artificially reducing the frequency of high-achieving outcomes. However, such an intervention would need to be weighed against the risk of negative side effects (e.g. harm to student self-efficacy). Another potential focus for intervention is how results are communicated to teachers, for example by using ranks or standardised scores rather than grades or percentage-correct feedback. While the present study cannot speak to the efficacy of such interventions, these appear to be plausible and worthwhile avenues for further research.
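To make the rescoring suggestion concrete, the sketch below re-expresses raw percentage-correct scores as standardised scores and within-class ranks. The transformation and the class scores are hypothetical; the paper does not endorse a specific scheme.

```python
from statistics import mean, pstdev

def standardise(scores):
    """Convert raw percentage-correct scores to z-scores. Unlike raw
    percentages, z-scores are centred on the class mean, so a uniformly
    easy assessment no longer makes every outcome look 'high'."""
    m, sd = mean(scores), pstdev(scores)
    return [(s - m) / sd for s in scores]

def rank(scores):
    """Rank within class: 1 = best performer; ties share the best rank."""
    ordered = sorted(scores, reverse=True)
    return [ordered.index(s) + 1 for s in scores]

scores = [92, 88, 95, 70, 85, 90, 78, 93]   # hypothetical class
print([round(z, 2) for z in standardise(scores)])
print(rank(scores))
```

Feedback of this kind preserves information about relative standing while removing the uniformly high raw values that, on the present account, drive the illusion.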
While the current study has shown that outcome density affects causal ratings about educational practices in an experimental paradigm, further research is needed to examine how well such findings generalise to the classroom. In particular, only Experiment 6 sampled practising teachers, and all experiments relied on simulated rather than real classroom data. While we have no reason to suspect that teachers are particularly susceptible or immune to the contingency learning mechanisms responsible for this bias, teachers may have particularly strong prior beliefs and expectations that may affect how causal illusions play out in the classroom (Mutter, Strain, & Plumlee, 2007; Yarritu & Matute, 2015). Furthermore, in the case of educational assessments, teachers have access not only to trial-by-trial observations but also to aggregate information such as means and distributions. This information may allow teachers to make more accurate inferences, particularly if they have received training in the scientific method. In some circumstances, however, it may make the effect worse, because teachers may confound aggregate-level correlations with student-level ones (Fiedler et al., 2007), as illustrated below. The findings of Experiment 5 suggest that presenting outcomes concurrently, as might occur when a teacher monitors a class's performance, did not change the underlying pattern of results: frequent high outcomes still promoted causal illusions. While there are many conceivable ways that assessment information can be aggregated in an educational context, these findings suggest that receiving information about multiple outcomes simultaneously does not necessarily eliminate the outcome density effect.
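The aggregation concern raised above can be illustrated with a deliberately extreme, hypothetical example: within each class the technique is negatively related to performance, yet both the pooled data and the class averages suggest a positive relationship. All numbers are invented for illustration.

```python
def pearson(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

# Hours of the novel technique (x) and test scores (y) per student.
class_a = ([5, 6, 7, 8], [85, 84, 83, 82])   # high-achieving class
class_b = ([1, 2, 3, 4], [65, 64, 63, 62])   # lower-achieving class

print("within class A:", pearson(*class_a))   # -1.0
print("within class B:", pearson(*class_b))   # -1.0
pooled_x = class_a[0] + class_b[0]
pooled_y = class_a[1] + class_b[1]
print("pooled:", round(pearson(pooled_x, pooled_y), 2))  # ~0.81, positive
```

A teacher (or researcher) reasoning from class-level summaries alone would, in this constructed case, draw exactly the wrong conclusion about the technique.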
The current study has shown that the outcome density effect extends to causal inferences about the efficacy of teachers' instructional practices. In six experiments we have shown that, when students frequently perform well, participants tend to infer that a novel teaching method is working, despite the fact that it has either no effect or a negative effect. This suggests that teachers may have difficulty using classroom assessments to evaluate the efficacy of their practice. This is especially concerning because most classroom assessments are designed so that a majority of students perform well, providing a context in which the outcome density effect is particularly likely. While further research is needed to examine how these phenomena play out in the classroom, the current study provides a plausible mechanism for understanding why teachers often believe in the efficacy of practices that have little scientific support.