Seeing the unseen? Illusory causal filling in FIFA referees, players, and novices

Brockhoff, Alisa; Huff, Markus; Maurer, Annika; Papenmeier, Frank

doi:10.1186/s41235-016-0008-5

ORIGINAL ARTICLE
Open access
Published: 22 September 2016

Cognitive Research: Principles and Implications volume 1, Article number: 7 (2016)

Abstract

Humans often falsely report having seen a causal link between two dynamic scenes if the second scene depicts a valid logical consequence of the initial scene. For example, a video clip shows someone kicking a ball, followed by the ball in flight. Even if the clip omits the moment of contact (i.e., the causal link), participants falsely report having seen it. In the current study, we explored the interplay of cognitive-perceptual expertise and event perception by measuring the false-alarm rates of three groups with differing levels of interest in football (soccer in North America): novices, players, and FIFA referees. We used the event-completion paradigm with video footage of a real football match, presenting either complete clips or incomplete clips (i.e., with the contact moment omitted). In the incomplete clips, the cut was followed by either a causally linked scene or an incoherent scene. Causally linked scenes induced false recognitions in all three groups: although the ball contact moment was not presented, participants indicated that they had seen the contact as often as they did in the complete condition, in which it was shown. In a second experiment, we asked the novices to detect the ball contact moment when it was either visible or not and when it was followed by either a causally or a non-causally linked scene. Here, instead of being shown pictures from the clip, the participants were given a two-alternative forced-choice task: “Yes, contact was visible” or “No, contact was not visible”. The results of Experiment 1 indicate that conceptual interpretations of simple events are independent of expertise: there were no top-down effects on perception. Participants in Experiment 2 correctly detected the ball contact moment significantly more often in the non-causal than in the causal condition, indicating that the effect observed in Experiment 1 was not an artifact of the design (e.g., a false memory induced by the presented pictures). The theoretical as well as the practical implications are discussed.

Significance

The current work is, to our knowledge, the first to combine a study of perceptual-cognitive skills with event perception; it is, therefore, mainly exploratory in nature. We took theoretical research out into the real world and investigated the role of top-down factors in event completion by testing three groups with differing levels of interest and experience (novices, players, and FIFA referees) on a simple event-completion task (Strickland & Keil, 2011). Although there is considerable evidence that expertise in sports domains is connected to superior perceptual-cognitive skills, our results indicate no influence of these skills on event perception. Instead, they support a recent publication by Firestone and Scholl (2015b), who concluded that perception may be largely independent of top-down influences. Such a proposition not only challenges our theoretical understanding of event perception but also has substantive practical implications for fairness in sports, strongly favoring the increased use of technology over perceptual training programs for match officials.

Background

During the 2010 FIFA World Cup, referees made many controversial calls that influenced match outcomes so strongly that the then-FIFA president apologized for the referees’ mistakes. In response, goal-line technologies were officially allowed in 2012, and they have since become increasingly common at the very top levels of the game. The current study was inspired by a controversial goal in a 2013 Bundesliga match in which no goal-line technology was used. The ball went through a hole in the side netting, and everyone, including the referees, mistook it for an actual goal. This rare phantom goal demonstrated the limits and biases of human perception. Such a phantom goal is all the more surprising in light of numerous studies reporting that experts have superior domain-specific perceptual-cognitive skills (e.g., Williams, 2000), an expertise that even confers an advantage in motion perception outside the expert’s domain (e.g., Romeas & Faubert, 2015). Vision and perception are shaped by individual experience and knowledge, that is, by our mental representations of events. Such representations are reconstructed and updated through experience and knowledge and provide the basis for understanding the world around us (Zacks & Tversky, 2001). However, the constant reconstruction and updating of mental representations makes event perception effortful and, thus, fragile. Strickland and Keil (2011) reported a (possibly consequential) bias in event perception: the event-completion effect. Video clips that implied causation (example sequence: an athlete running towards a ball – cut – a flying ball) produced higher false-alarm rates for pictures showing the athlete kicking the ball than video clips that did not imply any causation. The authors suggested that observers either confused online predictions (the ball will be kicked and will bounce down the field) with actually seen elements of the scene, or relied on schema- or principle-based post hoc inferences (a ball bouncing down a field must have been kicked).

Perceptual-cognitive expertise. A number of studies have reported that expert athletes show superior perceptual-cognitive skills compared to novices in sport-specific tasks, including visual cue usage (Abernethy, Gill, Parks, & Packer, 2001; Ward, Williams, & Bennett, 2002; Williams, 2000), visual search strategies (Vaeyens, Lenoir, Williams, & Philippaerts, 2007; Williams, 2000), and recall and recognition of meaningful patterns (Bell, Boshuizen, Scherpbier, & Dornan, 2009; Lesgold et al., 1988; Reingold & Sheridan, 2011; Smeeton, Ward, & Williams, 2004). Experts’ perceptual-cognitive advantages can also extend beyond the specific sports domain (Romeas & Faubert, 2015; Romeas, Guldner, & Faubert, 2016) and can help them, for example, to learn complex neutral dynamic scenes (Faubert, 2013) or to outperform novices in everyday tasks (e.g., crossing a street as a pedestrian in a crowded inner city: Chaddock, Neider, Voss, Gaspar, & Kramer, 2011). While most of these studies aimed to identify the exceptional perceptual-cognitive skills of experts by focusing on pattern recognition, decision-making, or biological motion perception, mainly in order to create training programs or prevent incidents that result in injuries, the current paper seeks a fundamental understanding of experts’ perception, or memory, of events.

Hypotheses. In the current study, we conceptually replicated the design by Strickland and Keil (2011) and tested two expert groups (football players and FIFA referees) and a control group (students with no interest in football). We asked whether the perceptual-cognitive skills of experts would prevent the event-completion effect when they observe familiar motion. According to the currently most prominent model of event perception, event segmentation theory (EST; Zacks, Speer, Swallow, Braver, & Reynolds, 2007), prediction errors occur and an event boundary is perceived when certain event features change (e.g., situational features such as spatial location and characters: Zacks, Speer, & Reynolds, 2009). If experts’ online predictions are more detailed, the missing ball contact may be more likely to be registered as a missing situational feature of the schema and, thus, not perceptually filled in. In that case, the more detailed representation would result in lower false-alarm rates for referees and players.

There is reason to hypothesize that experts’ superior perceptual-cognitive skills could prevent the event-completion effect, since experts may process visual information differently, both qualitatively and quantitatively; however, the opposite could be the case as well. Mann, Williams, Ward, and Janelle (2007) analyzed eye movements of experts and novices and found that skilled performers required fewer fixations of longer duration to gather relevant information, whereas novices made many short fixations. Thus, novices consider the potential influence of all available visual information, while experts concentrate on the relevant information by perceiving the multidimensional complexity of the situation (for further examples, see Haider & Frensch, 1996; Hattie, 2003; North & Williams, 2008). Expertise has also been shown to allow for more efficient switching of attentional focus. Underwood, Chapman, Brocklehurst, Underwood, and Crundall (2003) observed that scan paths during driving differ depending on the driver’s expertise: novices were not able to switch their focus of attention in response to potential hazards, while experts constantly monitored other road users. In the current study, the hardwired event schemata of experts could actually lead to a stronger bias if the ball contact is treated as irrelevant information in the representation of the event. Stated differently, novices may have a more detailed schema of the event (e.g., a ball kick) because, in their lives, there is no need to condense the schema for more efficient processing. Referees, however, have to make 3 or 4 decisions in each minute of actual playing time (Williams, 2013) and, thus, benefit substantially from filtering visual information rigorously. On the other hand, experts may have a more detailed schema than novices owing to frequent exposure and the ability to switch their focus of attention when needed. However, based on the EST, this again would result in a stronger event-completion effect. If experts take a rather global observational approach to familiar scenes, they may even have event models that account for missing information and changes in visual information. The missing ball contact may then not be surprising and may therefore not be detected as a prediction error; instead of the perception of an event boundary, the result would be an event-completion effect. Finally, it is also possible that there are simply no top-down effects of cognition on perception, as recently claimed by Firestone and Scholl (2015b). These authors reviewed hundreds of studies and identified general design pitfalls shared by approaches to studying the effect of cognition on perception. We will discuss our results with regard to the two distinct but interrelated systems of perception and memory.

Experimental overview

To ensure that we tested perceptual-cognitive differences in event perception, and not declarative knowledge and analysis skills, we intentionally used video clips of dynamic events that did not require knowledge of the game and depicted actions that every participant had certainly observed before, regardless of their level of interest in football. We cut scenes from a real match, including corner kicks, kick-offs, free kicks, and throw-ins. In Experiment 1, we conceptually replicated the design of Strickland and Keil (2011) and presented participants with (1) complete sequences (i.e., including the contact moment), (2) incomplete causal sequences (i.e., excluding the ball contact), or (3) incomplete non-causal sequences (i.e., excluding the ball contact and followed by a non-logical scene; example sequence: player about to throw the ball in – cut – a different player being fouled). Note, however, that our restricted sample of experts did not allow us to use a between-subjects design as in the original study. To avoid alerting participants to the purpose of the experiment, we left out one condition: a visible ball contact followed by a non-causal scene. In Experiment 2, we further controlled the design by showing video clips that either included or excluded a ball contact. Participants were fully informed about the probability of each clip type (50%) and were given a forced choice between two alternatives (ball contact seen: yes or no). The latter inevitably introduced attentional control by “knowing what to look for”; however, it helped us to better understand at which stage of information processing the bias originates. We are also aware that our (or any) design may not be able to capture the fine line between perception, memory, and post-perceptual judgment. Our results will be discussed with a focus on the event-completion effect and its occurrence in different groups. Any interpretation concerning perception or memory has to be regarded with caution.

Methods

Stimuli were presented on 15.4-inch notebooks using PsychoPy (Peirce, 2008). Participants were seated 60 cm from the screen. Footage of a football match between Young Boys Bern and Grasshoppers Zürich played on 23 March 2014 was used as stimulus material. The footage was compiled from three camera perspectives. Clips of about 20 seconds each were created, each consisting of two parts shot from different camera angles. The assignment of clips to conditions was balanced across participants in each experiment. In general, the two parts of each clip were either causally linked or not (Fig. 1c or d), and the ball release or contact (kick) moment¹ (Fig. 1b) was either visible or not. Figure 1 depicts example sequences.

In Experiment 1, we conceptually replicated the design by Strickland and Keil (2011) and used the following combinations of video clips (see Fig. 1): complete (A–B–C) vs incomplete causal (A–C) vs incomplete non-causal (A–D).² In Experiment 2, the basic idea of the design was similar; however, we measured only the detection rate of the contact moment and added a condition in which the ball contact (B) was also visible in non-causal sequences (A–B–D). In Experiment 1, each participant saw seven response pictures after each clip (see Strickland & Keil, 2011): three pictures were taken from the first part of the clip (yes fillers); three pictures were related to the yes-filler items but came from other parts of the game, such as other players preparing for a corner kick (no fillers); and one critical picture depicted the moment of ball contact or ball release (contact item). For each picture, participants indicated whether they had seen it in the clip: Yes (“press 1”) or No (“press 9”). See Fig. 2 for the response pictures belonging to the example sequences in Fig. 1. Participants also rated how certain they were about each answer on a scale from 1 (not at all) to 5 (extremely).

In Experiment 2, we showed participants 40 clips and asked whether they had seen the ball contact moment (B in Fig. 1). Instead of response pictures (Fig. 2), participants were given two forced-choice alternatives: “Yes, I have seen the ball contact” and “No, I have not seen the ball contact”. The experiment used a 2 × 2 mixed design, with ball contact (visible vs not visible) as a within-subjects factor and the second part of the clip (causal vs non-causal) as a between-subjects factor. We measured sensitivity to the contact moments as d′ and response bias as the criterion c (see Experiment 2 for further details).³
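Because d′ and c are derived from hit and false-alarm rates, a short sketch may help readers follow how they summarize yes/no detection data. It assumes that a hit is a yes response on a clip with a visible contact and a false alarm is a yes response on a clip with the contact omitted; it also assumes a log-linear correction (0.5 added to each count) to avoid infinite z-scores at rates of 0 or 1. The article does not say which correction, if any, was used, and the function name and counts are hypothetical.

```python
from statistics import NormalDist


def sdt_measures(hits, misses, false_alarms, correct_rejections):
    """Return (d_prime, c) for one participant's yes/no detection data.

    A "hit" is a yes response on a clip with a visible contact and a
    "false alarm" is a yes response on a clip with the contact omitted.
    The log-linear correction (0.5 added to each count) is an assumption
    here; it keeps rates of 0 or 1 from producing infinite z-scores.
    """
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    d_prime = z(hit_rate) - z(fa_rate)     # sensitivity
    c = -0.5 * (z(hit_rate) + z(fa_rate))  # criterion; c < 0 means a bias toward "yes"
    return d_prime, c


# Hypothetical participant: 20 contact-visible and 20 contact-omitted clips.
d_prime, c = sdt_measures(hits=16, misses=4, false_alarms=8, correct_rejections=12)
print(f"d' = {d_prime:.2f}, c = {c:.2f}")
```

With equal hit and false-alarm rates, d′ is 0. A positive d′ means visible and omitted contacts were discriminated, and a negative c means a liberal tendency to report contact.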

An expertise questionnaire tested basic declarative football knowledge using 11 questions, for example, “In which country did the last FIFA World Cup take place?” (see Additional file 1: Appendix for a complete list of questions).

Statistical analysis

In Experiment 1, we report expertise knowledge, proportion correct, proportion of yes answers, and confidence in the recognition test as separate dependent variables. Because the response variable was binary (yes or no), we analyzed effects on proportion correct and proportion of yes answers with generalized linear mixed-effects models (with a logit link), using the lme4 package (Bates, Sarkar, Bates, & Matrix, 2007; Pinheiro, Bates, DebRoy, & Sarkar, 2006) in the R environment (R Development Core Team, 2016). Participants were specified as a random factor to account for the intraclass correlation of their responses. We report type II Wald χ² tests on the fitted glmer models. Further, we report planned contrasts (based on our hypotheses and the original study’s results). Additionally, we used Bayesian statistics, computed in JASP (JASP Team, 2016), to assess the credibility of the null effect, that is, the relative evidence for the null and the alternative hypotheses. In Experiment 2, we report the sensitivity measure d′.
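As a reading aid, a logistic mixed model of the kind described above can be written as follows for the binary response y_ij of participant i on trial j. We assume a single random intercept per participant (in lme4 formula syntax, (1 | participant)); the article does not spell out the random-effects structure, and the notation is ours.

```latex
\operatorname{logit} P(y_{ij} = 1)
  = \log \frac{P(y_{ij} = 1)}{1 - P(y_{ij} = 1)}
  = \beta_0 + \mathbf{x}_{ij}^{\top} \boldsymbol{\beta} + u_i,
\qquad
u_i \sim \mathcal{N}(0, \sigma_u^2)
```

Here x_ij codes the fixed effects (e.g., expertise, item type, condition, and their interactions), and u_i absorbs each participant's overall tendency to answer yes, which is the source of the intraclass correlation the random factor accounts for.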

Experiment 1: Conceptual replication of the original study with groups with different expertise levels

Method

Participants. Three groups of participants were tested on three different occasions: 42 novices (14 male and 28 female students; age M = 25.76, SD = 6.81 years), 16 football players from a seventh-tier German football league (all male; age M = 24.81, SD = 3.64 years), and 18 referees from Switzerland appointed as officials for matches in competitions organized by the Fédération Internationale de Football Association (FIFA) (all male; age M = 32.2, SD = 4.93 years). Two referees were excluded because they had retired from their active positions as official FIFA referees. The novices participated in return for monetary compensation or course credit. The football players were students of the University of Tuebingen’s department of sports science, and their participation was a course requirement. The referees participated during one of their regular advanced training courses and were not compensated monetarily.

Design and procedure. The first part of each clip was between 11.6 and 15.1 seconds long. A keeper during a kick-off was depicted in three clips, a throw-in in one clip, a corner kick in three clips, and a free kick in two clips. A clip was either shown completely or shortened by removing the moment of ball contact (kick) or ball release (throw-in). We deleted 1–4 frames; for a given clip, exactly the same frames were deleted in the causal and the non-causal version. The second part of the clip lasted between 5.7 and 8.4 seconds. Each participant saw nine clips, three in each of three conditions: complete first part with causally linked second part (complete), shortened first part with causally linked second part (incomplete with causally linked sequence), or shortened first part with a second part that was not causally linked (incomplete with non-causally linked sequence); see Fig. 1, combinations A–B–C, A–C, and A–D. The experiment reported here took 15 minutes. Participants received instructions and immediately started the event-completion task. After each clip, seven response pictures (Strickland & Keil, 2011) were shown (see Fig. 2).
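One common way to balance clip-to-condition assignment across participants, as described in the Methods, is to rotate the assignment over three counterbalancing lists so that every clip appears in every condition equally often. The article does not describe its exact scheme; the rotation below, the function name, and the clip labels are hypothetical.

```python
CONDITIONS = ("complete", "incomplete_causal", "incomplete_noncausal")


def assign_conditions(clips, participant_index):
    """Map each clip to a condition for one participant.

    Hypothetical rotation scheme: clip i receives condition
    (i + participant_index) mod 3. Each participant therefore sees an equal
    number of clips per condition (when len(clips) is a multiple of 3), and
    each clip cycles through all conditions over three consecutive participants.
    """
    return {
        clip: CONDITIONS[(i + participant_index) % len(CONDITIONS)]
        for i, clip in enumerate(clips)
    }


clips = [f"clip_{k}" for k in range(1, 10)]  # nine hypothetical clip IDs
for p in range(3):
    assignment = assign_conditions(clips, p)
    print(p, [assignment[clip] for clip in clips[:3]])
```

With nine clips, each participant gets three clips per condition, matching the design above. Across any three consecutive participants, each clip is shown once in each condition, so clip-specific differences are not confounded with condition.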

Results

Expertise knowledge. We calculated the proportion of correctly answered questions. The football players’ declarative football knowledge was significantly higher than the novices’ (M = .86, SD = .34 and M = .51, SD = .50, respectively), t(50.83) = 10.70, p < .001. We regarded the referees’ football knowledge as a precondition for their FIFA employment and did not test them on the questionnaire.
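The fractional degrees of freedom reported above (50.83) are characteristic of Welch's unequal-variances t test, whose degrees of freedom come from the Welch–Satterthwaite approximation. The article does not name the test, so the following is a sketch under that assumption. The function works on summary statistics, and the example inputs are hypothetical, not the study's data.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom
    computed from group summary statistics."""
    se1_sq = sd1 ** 2 / n1  # squared standard error of the group-1 mean
    se2_sq = sd2 ** 2 / n2  # squared standard error of the group-2 mean
    t = (mean1 - mean2) / math.sqrt(se1_sq + se2_sq)
    df = (se1_sq + se2_sq) ** 2 / (
        se1_sq ** 2 / (n1 - 1) + se2_sq ** 2 / (n2 - 1)
    )
    return t, df


# Hypothetical summary statistics for two groups of unequal size and spread.
t, df = welch_t(mean1=0.80, sd1=0.10, n1=16, mean2=0.55, sd2=0.20, n2=42)
print(f"t({df:.2f}) = {t:.2f}")
```

The Welch–Satterthwaite df lies between min(n1, n2) − 1 and n1 + n2 − 2 and is rarely an integer. With equal group sizes and equal SDs it reduces to the familiar n1 + n2 − 2.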

Proportion correct. We analyzed participants’ performance in the recognition test. Because the critical contact item was a target in the complete condition and a distractor in the remaining two conditions, we excluded it from this analysis. We calculated the proportion of correctly answered questions and fitted a generalized linear mixed-effects model with response accuracy (correct or incorrect) as the binary dependent variable, expertise as the fixed effect, and participants as the random factor. The effect of expertise was significant [χ²(2) = 17.621, p < .01]. Post hoc Tukey comparisons specified the differences between the three expertise groups. As can be seen in Fig. 3, both expert groups outperformed the novices: players vs novices, z = 3.33, p < .01; referees vs novices, z = 3.254, p < .01. We observed no difference between the players’ and the referees’ performance (z = 0.06, p = .99).
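The scoring rule behind this analysis can be made explicit. Yes fillers are targets, no fillers are distractors, and the correct answer to the contact item depends on the condition, which is why it is left out of proportion correct. The sketch below only computes raw proportions (the article models correctness with a mixed-effects model), and the record layout and field names are hypothetical.

```python
def correct_answer(item_type, condition):
    """Return the correct response ("yes" or "no") to a recognition picture."""
    if item_type == "yes_filler":
        return "yes"  # picture was taken from the first part of the clip
    if item_type == "no_filler":
        return "no"   # related picture from another part of the match
    if item_type == "contact":
        # Target only when the contact moment was actually shown.
        return "yes" if condition == "complete" else "no"
    raise ValueError(f"unknown item type: {item_type}")


def proportion_correct(trials):
    """Proportion of correct answers over filler items; contact items are excluded."""
    scored = [t for t in trials if t["item_type"] != "contact"]
    n_correct = sum(
        t["response"] == correct_answer(t["item_type"], t["condition"])
        for t in scored
    )
    return n_correct / len(scored)


# Hypothetical trial records (field names are illustrative).
trials = [
    {"item_type": "yes_filler", "condition": "complete", "response": "yes"},
    {"item_type": "no_filler", "condition": "complete", "response": "yes"},
    {"item_type": "contact", "condition": "incomplete_causal", "response": "yes"},
    {"item_type": "yes_filler", "condition": "incomplete_causal", "response": "yes"},
]
print(proportion_correct(trials))  # 2 of the 3 filler responses are correct
```

The same correct_answer rule shows why a yes to the contact item in an incomplete clip counts as a false alarm. That response is analyzed separately as the proportion of yes answers.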

Proportion of yes answers. We analyzed effects on the binary dependent variable (yes/no answers) with a generalized linear mixed model (with a logit link), using the lme4 package in the R environment. Participants were specified as a random factor to account for the intraclass correlation of their responses. We used the raw data and fitted a model including all main effects and interactions of expertise, item type, and condition as fixed effects. We analyzed the resulting model using type II Wald χ² tests.

Our main finding is a significant two-way interaction of condition and item type [χ²(4) = 11.52, p = .021]. The three-way interaction of expertise, condition, and item type was not significant [χ²(8) = 6.91, p = .546]. Further, there was a significant main effect of item type [χ²(2) = 1262.00, p < .001] and a significant interaction of expertise and item type [χ²(4) = 41.05, p < .001]. None of the other main effects and interactions reached significance (ps > .17). Although the proportion of yes answers to the contact item was significantly lower in the non-causal condition (as expected), the false-alarm rate there was still above chance level. This pattern is in line with the results of the original study (Strickland & Keil, 2011). See Fig. 4a for the analyzed proportions in each expertise group.

To examine the interaction of the two categorical variables condition and item type, we calculated contrasts. For this purpose, we fitted a reduced glmer model (see Fig. 4b for the aggregated data used) that no longer included expertise, because expertise level (novice, player, or referee) did not interact with condition × item type (see the non-significant three-way interaction reported above). To prevent α inflation at this level of the analysis, we applied a Bonferroni correction for multiple comparisons (α = .05/3 ≈ .017). We report 95% confidence intervals (CIs) for the estimated (log) mean differences between conditions.

As expected, two of the three contrasts were significant. The number of yes answers (i.e., reports indicating that the contact moment had been seen) in the condition with implied causation (causal) differed significantly from the number of yes answers in the condition without implied causation (non-causal), z = 22.21, p < .001, with an estimated (log) mean difference of 4.03, CI [3.60, 4.46]. The non-causal incomplete condition also differed significantly from the condition in which the ball contact was included (complete condition), z = 16.51, p < .001 (estimated difference = 3.95, CI [3.52, 4.38]). The contrast of the causal vs the complete condition was not significant, z = 0.73, p = .75 (estimated difference = 0.15, CI [−0.36, 0.68]).
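The arithmetic behind this contrast analysis can be sketched in a few lines: the Bonferroni-adjusted threshold for three planned contrasts and a conversion of the reported estimates to odds ratios. The conversion assumes that the reported (log) mean differences are log-odds differences from the logit-link model. That reading is our interpretation, and the odds ratios are not statistics the article reports. The estimates and intervals are copied from the paragraph above.

```python
import math

ALPHA = 0.05
N_CONTRASTS = 3
alpha_bonferroni = ALPHA / N_CONTRASTS  # 0.01666..., i.e., about .017

# Estimates and 95% CIs as reported above: (estimate, lower, upper).
contrasts = {
    "causal vs non-causal": (4.03, 3.60, 4.46),
    "complete vs non-causal": (3.95, 3.52, 4.38),
    "causal vs complete": (0.15, -0.36, 0.68),
}

print(f"Bonferroni-adjusted alpha: {alpha_bonferroni:.4f}")
for name, (estimate, lower, upper) in contrasts.items():
    # Assumes the estimates are log-odds differences; exp() gives odds ratios.
    print(f"{name}: OR = {math.exp(estimate):.1f}, "
          f"95% CI [{math.exp(lower):.2f}, {math.exp(upper):.2f}]")
```

Under this reading, the causal vs complete interval for the odds ratio includes 1, which matches the non-significant contrast. The two contrasts against the non-causal condition correspond to odds ratios far above 1.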

Bayesian statistics. We conducted a Bayes factor analysis of the proportion of yes answers to the contact items in the conditions with and without causal implication. In a Bayesian repeated measures ANOVA, we compared a model including the main effects of condition (with or without causal implication) and expertise (novices, players, and referees) with a model that additionally included the interaction of these factors. The Bayes factor in favor of the null hypothesis (i.e., the model without the interaction) was 4.99, which is conventionally classified as substantial evidence (Rouder, Morey, Speckman, & Province, 2012; Wetzels & Wagenmakers, 2012).
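The reported Bayes factor can be unpacked as a worked equation. Here M0 denotes the model with only the main effects of condition and expertise, and M1 denotes the model that adds their interaction. The posterior probability on the right assumes equal prior odds for the two models, an assumption the article does not state.

```latex
\mathrm{BF}_{01}
  = \frac{p(\text{data} \mid \mathcal{M}_0)}{p(\text{data} \mid \mathcal{M}_1)} = 4.99,
\qquad
\mathrm{BF}_{10} = \frac{1}{\mathrm{BF}_{01}} \approx 0.20,
\qquad
P(\mathcal{M}_0 \mid \text{data})
  = \frac{\mathrm{BF}_{01}}{1 + \mathrm{BF}_{01}} \approx 0.83
  \quad \text{(equal prior odds)}
```

In words, the data are about five times more likely under the model without the interaction. The same Bayes factor would imply a different posterior probability under unequal prior odds.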

Confidence. A repeated measures ANOVA was performed with confidence as the dependent variable (see Fig. 5). We observed a significant main effect of expertise [F(2, 71) = 10.27, p < .001]: players’ and referees’ confidence was significantly higher than novices’ confidence (ps < .004), and, again, there was no difference between players and referees (p = .501). Further, we observed a significant main effect of item type [F(2, 142) = 27.20, p < .001], indicating that confidence was higher for the no-filler items than for the contact items and the yes-filler items (ps < .003). The interaction of item type and condition was borderline significant [F(2, 142) = 2.42, p = .049]. However, follow-up comparisons showed no significant differences between conditions in the confidence ratings for the contact items (ps ≥ .247).

Discussion

To capture online perceptual performance errors, we presented video clips that did or did not imply causation and afterwards asked participants whether they had seen certain pictures. While overall performance (proportion correct) was higher for experts than for novices, all participants were prone to the event-completion effect (analyzed with the proportion of yes answers). Furthermore, we measured confidence ratings to examine whether experts show an illusory superiority bias (observed in referees as a coping mechanism for stress and for self-esteem protection; e.g., Wolfson & Neave, 2007). Referees and players gave higher confidence ratings than novices; however, they also performed better, thus showing actual rather than illusory superiority. This was expected given experts’ superior recall and recognition of meaningful patterns and details (Bell et al., 2009; Lesgold et al., 1988; Reingold & Sheridan, 2011; Smeeton et al., 2004). The present results replicate the event-completion effect reported in the original study by Strickland and Keil (2011). They exemplify how the human information processing system struggles with perceiving and recalling details of an everyday event. We found these difficulties to be independent of task-specific expertise, suggesting that at a basic perceptual level, when presented with a simple action event, humans chunk or segment continuous activity in the same way, resulting in a representation as a series of discrete events (Newtson, 1973), a process that allows for online and post hoc inferences and illusory causal filling. However, before interpreting these results further, we need to ensure that the effect is not due to the study instructions, which may have biased participants to assume ball contact: participants may have assumed they had seen the contact because they did not know that omitted contact moments were an option.

The question remains whether the observed event-completion effect is based on online predictions or is rather the result of backwards mapping, an effect known from text comprehension research (e.g., Potts, Keenan, & Golding, 1988). Although backwards mapping was originally used to explain anticipation processes during text comprehension, its adaptation to causal filling in event perception is straightforward: participants decide about a recognition item at the very moment it is presented, checking whether the picture is a plausible cause of what they have already watched. In other words, a contact picture would be a plausible and natural cause of a video clip that showed a football player approaching a ball.

Experiment 2: contact – yes or no?

In a detection experiment, we presented participants with complete and incomplete stimuli with causal and non-causal continuations and asked them to indicate whether they had seen the contact moment. This may prevent backwards mapping because participants “know what to look for” before the video clip is presented. Further, without recognition items (pictures), participants are less prone to picture-based biases, which allows us to measure their discrimination performance in the non-causal and causal conditions. If the event-completion effect is primarily based on online predictions, participants’ discrimination performance should be lower in the causal than in the non-causal condition.