Stimuli were presented on 15.4-inch notebooks using PsychPy (Peirce, 2008). The participants were seated at a distance of 60 cm from the screen. Footage of a soccer match of the Young Boys Bern against the Grasshoppers Zürich that took place on 23 March 2014 was used as stimulus material. The footage was compiled out of three camera perspectives. Clips of about 20 seconds each were created. Each clip consisted of two parts shot from different camera angles. The assignment of clips to conditions was balanced across participants in each experiment. In general, the two parts of each clip were causally linked or not (Fig. 1
c or d), and the ball release or contact (kick) moment1 (Fig. 1
b) was visible or not. Figure 1 depicts example sequences.
In Experiment 1, we conceptually replicated the design by Strickland and Keil (2011) and used the following combinations of video clips (see Fig. 1): complete (A–B–C) vs incomplete causal (A–C) vs incomplete non-causal (A–D).2 In Experiment 2, the basic idea of the design was similar; however, we measured only the detection rate of the contact moment and further added a condition in which the ball contact (B) was visible in non-causal sequences as well (A–B–D). In Experiment 1, each participant saw seven response pictures (see Strickland & Keil, 2011) after each clip. Three pictures were selected from the first part of each clip (a yes filler), three pictures were related to the yes-filler items but came from other parts of the game, such as other players preparing for a corner kick (a no filler), and the critical picture depicted the moment of ball contact or ball release (contact). The participants were asked whether they had seen the picture in the clip: Yes (“press 1”) or No (“press 9”). See Fig. 2 for the response pictures for the example sequences (Fig. 1). Further, they were asked to rate how certain they were about their answer (on a scale from 1, not at all, to 5, extremely).
In Experiment 2, we showed the participants 40 clips and asked whether they had seen the ball contact moment (B in Fig. 1). Instead of response pictures (Fig. 2), we gave the participants forced-choice alternatives: “Yes, I have seen the ball contact” and “No, I have not seen the ball contact”. The experiment was conducted as a mixed 2 (ball contact visible, within) × 2 (second part of the clip: causal or non-causal, between) subject design. We measured the sensitivity to the contact moments as d
′ and response criterion c (see Experiment 2 for further details).3
An expertise questionnaire tested basic declarative football knowledge using 11 questions, for example, “In which country did the last FIFA World Cup take place?” (see Additional file 1: Appendix for a complete list of questions).
Statistical analysis
In Experiment 1, we report expertise knowledge, proportion correct, proportion of yes answers, and confidence in the recognition test as separate dependent variables. Because of the binary response variable (yes or no), we analyzed effects on proportion correct and proportion of yes answers with a generalized mixed effect model (with a logit link), using the lme4 package (Bates, Sarkar, Bates, & Matrix, 2007; Pinheiro, Bates, DebRoy, & Sarkar, 2006) in the R environment (R Development Core Team, 2016). Participants were specified as the random factor to control for their associated intraclass correlation. We present the type II Wald χ
2 test results from GLMER. Further, we provide the results of planned contrasts (based on our hypotheses and the original study’s results). Additionally, the credibility of the found null effect and the likelihood of the occurrence of the null and the alternative hypotheses are presented with Bayesian statistics and JASP (JASP Team, 2016). In Experiment 2, we report the sensitivity measure d
′.
Experiment 1: Conceptual replication of the original study with groups with different expertise levels
Method
Participants Three groups of participants were tested on three different occasions. There were 42 novices (14 male and 28 female students, age M = 25.76, SD = 6.81 years), 16 football players of a seventh German football league (all male, age M = 24.81, SD = 3.64 years), and 18 referees from Switzerland appointed as officials for matches in competitions organized by the Fédération Internationale de Football Association (FIFA) (all male, age M = 32.2, SD = 4.93 years). Two referees were excluded because they retired from their active positions as official FIFA referees. The students tested participated in return for monetary compensation or course credits. The football players were students of the University of Tuebingen’s department of sports science and their participation was a course requirement. The referees participated during one of their regular advanced training courses and were not compensated monetarily.
Design and procedure The first part of each clip was between 11.6 and 15.1 seconds long. A keeper during a kick-off was depicted in three clips, a throw-in in one clip, a corner kick in three clips, and a free kick in two clips. A clip was either shown completely or shortened by the removal of the moment of ball contact (kick) or ball release (throw-in). We deleted 1–4 frames; however, the deletion for causal and non-causal clips was always exactly the same. The second part of the clip lasted between 5.7 and 8.4 seconds. Each participant saw nine clips spread equally across three conditions: complete first part with causally linked second part (complete), shortened first part with causally linked second part (incomplete with causally linked sequence), or shortened first part with second part that was not causally linked (incomplete with non-causally linked sequence). See Fig. 1, combinations A–B–C, A–C, and A–D. The experiment reported here took 15 minutes. The participants received instructions and immediately started with the event-completion task. After each clip, seven response pictures (Strickland & Keil, 2011) were shown (see Fig. 2).
Results
Expertise knowledge We calculated the proportion of correctly answered questions. The football players’ declarative football knowledge was significantly higher compared to the novices’ (M =.86, SD =.34 and M =.51, SD =.50, respectively): t(50.83)=10.70, p <.001. We regarded the referees’ football knowledge as a precondition for their FIFA employment and did not test them on the questionnaire.
Proportion correct We analyzed participants’ performance in the recognition test. Because the critical contact item was a target item in the complete condition and a distractor item in the remaining two conditions, we excluded this item from this analysis. We calculated the proportion of correctly answered questions and fitted a generalized mixed effect model with the binary dependent variable yes/no answers. Expertise was inserted as the fixed effect, and participants were specified as the random factor. The factor expertise was significant [ χ
2(2) = 17.621 and p <.01]. Post-hoc Tukey comparisons helped to specify the difference between the three groups of expertise. As can be seen in Fig. 3, the two expert groups outperformed the novices: for players vs novices z = 3.33 and p <.01, and for referees vs novices z = 3.254 and p <.01. We observed no differences between the players’ and the referees’ performance (z = 0.06, p =.99).
Proportion of yes answers. We analyzed the effects on the binary dependent variable (yes/no answers) with a generalized linear mixed model (with a logit link), using the lme4 package in the R environment. Participants were specified as a random factor to control for their associated intraclass correlation. We used the raw data and fitted a model including all main effects and interactions of expertise, item type, and condition as fixed effects. We analyzed the resulting model using type II Wald χ
2 tests.
Our main finding is a significant two-way interaction of condition and item type [ χ
2(4)=11.52 and p =.021]. The three-way interaction of expertise, condition, and item type was not significant [ χ
2(8)=6.91 and p =.546]. Further, there was a significant main effect of item type [ χ
2(2)=1262.00 and p <.001], and a significant interaction of expertise and item type [ χ
2(4)=41.05 and p <.001]. None of the other main effects and interactions reached significance,p>.17. While the proportion of yes answers in the non-causal condition was significantly lower (as expected), it should be noted that the false-alarm rate was still over chance level. However, our findings are in line with the results found in the original study by Strickland and Keil (Strickland and Keil 2011). See Fig. 4
a for the analyzed proportions in each expertise group.
To investigate the interactive relationship of the two categorical variables condition and item type, we calculated contrasts. The underlying glmer model was now reduced (see Fig. 4
b for the aggregated data used) and did not include expertise anymore, since the given expertise level (novice, player, or referee) did not interact with condition*item type (non-significant three-way interaction reported above). To prevent α inflation at this level of the analysis, a Bonferroni correction (0.05/3=0.016) for multiple comparisons was applied. Further insights into the variability of the (log) mean difference between the observed answers are given with 95 % confidence intervals (CI).
As expected, two of the three contrasts produced significant results. The number of yes answers (i.e., the number of reports indicating that the contact moment had been seen) in the condition with implied causation (causal) differed significantly (z = 22.21 and p <.001) from the number of yes answers in the condition without implied condition (non-causal), with an estimated (log) mean difference of 4.03, CI [3.60, 4.46]. The non-causal incomplete condition also differed significantly from the condition in which the ball contact was included (complete condition), z = 16.51 and p <.001 (estimated difference = 3.95, CI [3.52, 4.38]). The contrast of the causal vs the complete condition was not significant, z = 0.73 and p = 0.75 (estimated difference = 0.15, CI [ −0.36, 0.68]).
Bayesian statistics We calculated a Bayes factor analysis for the proportion of yes answers to the contact items in the no causal implication and the conditions with causal implication. The Bayes factor evidence for the null hypothesis in a Bayesian repeated measures ANOVA comparing a model that included the main effects of condition (with causal implication or no causal implication) and expertise (novices, players, and referees) with a model including additionally the interaction of these factors amounted to 4.99, which is conventionally classified as substantial (Rouder, Morey, Speckman, & Province, 2012; Wetzels & Wagenmakers, 2012).
Confidence A repeated measures ANOVA was performed with confidence as the dependent variable (see Fig. 5). We observed a significant main effect of expertise [ F(2,71)=10.27 and p <.001]. Players’ and referees’ confidence was significantly higher than novices’ confidence,p<.004. Again, there was no difference between players and referees (p =.501). Further, we observed a significant main effect of item type [F(2,142) = 27.20 and p <.001], indicating that confidence was higher for the no-filler items compared to the contact items and the yes-filler items,p<.003. The interaction of item type and condition approached significance [F(2,142) = 2.42 and p =.049]. In this context, however, we observed no significant differences between the different conditions with regard to the contact item responses, p>=.247.
Discussion
To capture online perceptual performance errors, we presented video clips that implied causation (or not) and asked the participants afterwards whether they had seen certain pictures (or not). While overall performance (proportion correct) was higher for experts than for novices, all participants were prone to the event-completion effect (analyzed with the proportion of yes answers). Furthermore, we measured confidence rating to examine whether experts show illusionary superiority biases (observed as a coping mechanism for stress and self-esteem protection in referees; e.g., Wolfson & Neave, 2007). We observed higher confidence ratings in the referee and the player groups compared to the novices – however, they actually performed better, thus, showing an actual superiority instead of an illusionary superiority bias. This was expected based on the experts’ superior recall and recognition of meaningful patterns and details (Bell et al., 2009; Lesgold et al., 1988; Reingold & Sheridan, 2011; Smeeton et al., 2004). The results of the present study replicate the event-completion effect measured in the original study by Strickland and Keil (2011). The results exemplify how the human information processing system struggles with perceiving and recalling details of an everyday life event. We found these difficulties to be independent of task-specific expertise, suggesting that on a certain basic perceptual level, if presented with a simple action event, humans equally chunk or segment continuous activity, resulting in the representation of a series of discrete events (Newtson, 1973) – a process that allows for online and post-hoc inferences, and illusory causal fillings. However, before we interpret these results further, we need to ensure that the effect found is not due to the study instructions, which may have biased the participants to assume ball contact. Participants may have assumed they had seen contact because they did not know that omitted contact moments were an option.
The question remains whether the observed event-completion effect is a phenomenon based on online predictions or rather the result of backwards mapping, an effect known from text comprehension research (e.g., Potts, Keenan, & Golding, 1988). Although, backwards mapping was originally used to explain anticipation processes during text comprehension, its adaption to causal fillings in event perception is straightforward: participants base their decisions of a recognition item at the very moment of presentation and check if the picture is a plausible cause of what they have already watched. In other words, a contact picture would be a plausible, and natural, cause of a video clip that showed a football player approaching a ball.
Experiment 2: contact – yes or no?
In a detection experiment, we presented participants with complete and incomplete stimuli with causal and non-causal continuation and asked them to indicate whether they had seen the contact moment or not. This may prevent backwards mapping because participants “know what to look for” before the presentation of the video clip. Further, without recognition items (pictures), the participants are less prone to picture-based biases, which allows us to measure participants’ discrimination performance in the non-causal and causal conditions. If the event-completion effect is primarily a phenomenon based on online predictions, participants’ discrimination performance should be lower in the causal compared to the non-causal condition.
Method
Participants Altogether, 32 students of the University of Tuebingen (7 male and 25 female students, age M = 23.16 years, SD = 4.61) participated in the experiment in return for course credits or monetary compensation. Of these, 17 participants were assigned to the causal and 15 to the non-causal condition. We excluded from the analysis one participant who did not understand the task. Thus, 17 in the causal and 14 participants in the non-causal condition entered the final analysis.
Design and procedure Then 40 video clips were shown either as complete clips or with the ball contact excluded. The first part of each clip was between 1.4 and 15 seconds long. The clips were either causally connected or not (see “General method”). We always deleted four frames before the ball contact frame, resulting in a deletion of 160 ms in both incomplete conditions (the presentation rate of each clip was 25 frames per second). The second part of the clip was between 1.2 and 6.3 seconds long.4 Clips consisted of 14 kick-offs, 5 corners, 13 throw-ins, and 8 free kicks. Participants received specific information on the probability that the ball contact was visible (50 %). Further, they saw a process graphic of a matchstick man approaching a ball and kicking it so that they knew what “ball contact moment” or “moment of ball release” meant. The suggested experiment has been conducted as a mixed 2 (ball contact visible, within-subject manipulation) × 2 (second part of the clip: causal or non-causal, order was balanced between groups) design.
Results
We report sensitivity (d
′) and response criterion (c) from signal detection theory as dependent variables (Green & Swets, 1966). Yes answers to clips depicting the release moment (complete conditions) were counted as hits and yes answers to clips not depicting the release moment (incomplete condition) were counted as false alarms. Finally, we aggregated the data on the participant level and calculated separate independent sample t-tests for d
′ and c. Because d
′ and c are not defined for hit rates and false-alarm rates of 1.0 and 0.0, we adjusted such values to half a trial incorrect or half a trial correct, respectively.
Sensitivity Sensitivity (d
′) was well above chance (d
′=0) and was significantly higher in the non-causal (M=2.75, SD = 0.59) compared to the causal condition (M = 1.44, SD = 1.15), t(29) = 4.08, p <.001. Thus, this supports the hypothesis that participants’ online perception was distorted by the causal continuation of the scene.
Response bias We did not observe a statistically significant difference between the non-causal (M=−0.09, SD = 0.32) and the causal condition (M=−0.20, SD = 0.38) with regard to the response criterion (c), t(29) = 0.88 and p =.388.
Discussion
We observed lower discrimination performance in the group of participants who saw the causal sequel compared to the group of participants who were presented with non-causal sequences. Thus, these findings support the hypothesis that the causal continuation actually changed participants’ perceptions. A (cautious) explanation of this finding refers to EST (Zacks et al., 2007). According to EST, participants’ perceptions are based on predictions. For a non-causal continuation, these predictions fail and participants perceive an event boundary. As a consequence, participants’ representations of this moment are more precise compared to the condition with non-causal continuation in which predictions were not violated and participants did not perceive an event boundary.
General discussion
The present study was interested in the interplay of cognitive and perceptual processes in experts compared to novices. The main objective was to study the appearance of the event-completion effect in groups with different cognitive-perceptual training. However, our results also give us an idea of how internal schema-based systems and external sensory input processing may result in an automatic completion of events. The results reported here allow a number of interesting implications.
Theoretical implications for event perception
The current most prominent model of event perception is EST (Zacks et al., 2007). EST is based on various theories of perception, neurophysiology, and language processing (Carpenter, Grossberg, & Arbib, 2003; Fuster, 1990; Van Dijk, Kintsch, & Van Dijk, 1983). A fundamental principle of EST states that the processing of events forms sensory representations that are influenced by experience and knowledge. Event schemata affect the current content of the event models with top-down processes, expanding their effective capacity by assembling predictive information about the future relevance of certain features of events. When certain event features change (e.g., situational features such as spatial location and characters; Zacks et al., 2009), prediction errors occur and an event boundary is perceived. Regarding EST, our results could be explained with an error-detection mechanism that operates on a temporal buffer holding a given number of causal snapshots (Wood, 2011). The error-detection mechanism constantly checks whether online predictions based on working memory representations of the ongoing event are fulfilled. Transient increases in the violation of predictions (Zacks et al., 2009) make the current event model useless and in need of an update. As our results suggest, one missing snapshot of an event (implied causation condition) does not automatically trigger an event update because enough predictions of the event are fulfilled. Clip sequences that did not imply causality may have activated an error-detection mechanism and triggered event boundary perception processes. The original EST model describes event models as a stable representation that can only be reset or updated based on the current perceptual information available when the error-detection mechanism opens the gate. Error detection may also play an important part in the actual perception of events: the comparable number of yes answers for contact items and causal yes-filler items in our data implies that the event-completion effect is nurtured by the sensitivity of the error-detection mechanism. In other words, the more prediction errors the error monitoring allows, the more illusory causal fillings will happen. Importantly, our data suggest that expertise does not influence event perception. That indicates that top-down processes do not influence the simple mechanisms of online prediction and error detection as much as is assumed in the EST (Zacks et al., 2007). This top-down component, however, is largely underspecified in EST. Zacks and colleagues write: “This claim is based largely on parsimony and may need to be revised in the future” (Zacks et al., 2007, p. 275). At least for our stimulus material with simple structured events, the idea of an unaffected gating mechanism is in line with Firestone and Scholl (2015b): there are no top-down effects of cognition on perception.
Top-down effects and the locus of contextual biases
Did the participants in our studies actually see (falsely perceive) or did they simply report to have seen (falsely remember) the ball contact? The presented studies applied a recognition and detection test to explore the event-completion effect. However, as recently suggested by Firestone and Scholl (in press), there is a great difference between seeing and recognizing. Any top-down effect measured can be due to an influence on front-end visual processing but equally likely be due to back-end memory. In the current paper, we communicated a tendency to define the event-completion effect as due to an error that occurs in perception rather than in memory. Although we do not have clear evidence for either involvement, the results of Experiment 2 (in which we decreased the possible memory biases due to backwards mapping) do indicate that the effect is partly due to online perceptional processes. We were further biased by the majority of results found in the literature that connect memory to experience. As memory fades due to brain damage or aging, representations become increasingly changed by preexisting knowledge. Especially popular is that patients with Alzheimer’s tend to falsely remember details, words, or events that they actually did not experience (confabulation: e.g., Tallberg & Almkvist, 2001). However, experience and expertise did not influence the appearance of the event-completion effect. Thus, reversing the argument, our results could show that the event-completion effect cannot be an error in memory, because then we would have found differences between the expertise groups.
In a recent paper (Firestone & Scholl, 2015a), the authors discuss semantic (language) priming, universally understood as an effect on memory (Collins & Loftus, 1975) that may have been mistaken for top-down effects on visual processing in various studies. In semantic priming, reading a word such as “peach” lowers the threshold for related fruits in memory and they will be processed faster than an unrelated word. Language and event perception are closely related: much of what we know about our understanding of events comes from studies that asked participants to describe an event in their own words. For example, with such a linguistic account, Talmy (1975) was able to define the building blocks of motion events. However, it may be possible that the observed language structure does not only reflect how we perceive event units, but could be a general reflection of the preferred global-over-local approach of the human brain (e.g., Fink et al., 1996). If we assume that the activation of related words is comparable to the activation of related event models in memory (allowing for faster access to different scenarios and faster processing of related visual details), our null findings would again point towards a bias on the perceptual level. The wealth of experienced scenarios of the event should have activated a broader spectrum of experienced content in the experts, which should have resulted in differences between the three groups due to differences in memory activations. Firestone and Scholl (2015a) further proposed that it is possible to distinguish memory and perception clearly in practice. This seems to be a bold proposal since false memories (here, an error of commission) can be elicited within 1/20th of a second (Intraub & Dickinson, 2008). Intraub and Dickinson (2008) report a constructive error in scene representation, the boundary extension, in which observers falsely remember an image that is shown beyond the edges of the previously encountered view. When the first item is presented without a scenic structure, boundary extension does not occur (Intraub, Gottesman, & Bills, 1998). They propose that boundary extension is the result of a source-monitoring error (Johnson, Hashtroudi, & Lindsay, 1993) with a strong influence of a reality-monitoring error (Johnson & Raye, 1981). The error happens when the human brain has to distinguish between internally generated information (experience with certain structures) and externally generated information (sensory input). The authors suggest that the rapidity of such a boundary extension error is advantageous rather than harmful; it shows how the visual system incorporates fleeting views of images with spurious boundaries into a coherent representation of the world around us. The rapidity of the error may further imply that perception and memory are two processing systems of the same underlying cognitive mechanism.
Our data could be explained as the result of a distinction error between internally and externally generated information. Disregarding the traditional distinction of false memories and visual illusions and assuming an extraordinary fast engagement of both during the processing of visual input, the observed null effect of expertise in our experiment may be the result of an imbalance of weighted sources. The externally generated information processing of experts may be more efficient and more detailed; however, the internally generated information outperforms sensory input due to the system’s need to embed the event into known reality. Experiment 2 further reflects the weight of the reality source. Here, participants knew precisely what would be tested in each trial and were prepared to answer a specific question. Conscious awareness is needed to be able to report whether the stimulus was visible or not (Lamme, 2003), but even in such an enhanced state of target processing, the internally generated source overruled the external sensory input, resulting in decreased sensitivity for the detection of the ball contact moment in causally linked scenes. For the current design, the ideas mentioned above are pure speculations and may be regarded as such. Future research could be concerned with whether expertise influences the level (global or local) of event processing. For example, Beaucousin et al. (2011) recorded event-related potentials and reported that the meaningfulness of an object influences global and local information. They assumed that knowledge about the world influences the global and local levels of processing. Comparing the performance of experts and novices on meaningful and non-meaningful patterns would help us to understand better the early stages of processing.
In addition, it would be interesting to see whether experts compared to novices structure events differently, measured as event segments indicated with a button press by participants. As memory distortions can happen within 50 ms (Intraub & Dickinson, 2008), behavioral measurements may not be able to grasp the difference between memory and perception (if there is any). To really answer such a question, functional neuroimaging procedures are advisable.
Practical implications
The present findings have a serious impact on the fairness of the game. A red card may be based simply on two single observations that perceptual processes have falsely interconnected in a causal manner: player A approached player B and player B got hurt. The match official may be absolutely certain that they had seen a contact, but it may have been an event-completion effect. The top Dutch football league (Eredivisie), therefore, employs a video referee who observes video replays of the game to help the referees on the field with tricky decisions. However, since many believe that the human element of sports is lost when technologies are used, eliminating, for example, the “enjoyment of debating mistakes” (Kelso, 2010), chances are rather low that other European football leagues will follow the example of the Dutch. Even in the presence of technology, the importance of the perceptual and cognitive skills of match officials is, thus, not reduced.
Limitations
It needs to be taken into account, however, that we aimed to test basic perceptual processes and can, thus, speak only about the organization of the mind when it is faced with simple events. The perception of complex events may nonetheless be influenced by domain-specific expertise. For example, when presented with a deliberate dive, novices may not be able to differentiate between whether it was a real foul or a fake fall by the player. The cognitive-perceptual excellence and the so-called intuitive skills of an expert to analyze such an incident may be based on a highly sensitive error-detection mechanism. Such ideas, however, will require theoretical and empirical development beyond the scope of this article. Left unknown is still whether memory or perception is responsible for the effect.