Errors in visual search: Are they stochastic or deterministic?

Li, Aoqi; Hulleman, Johan; Wolfe, Jeremy M.

doi:10.1186/s41235-024-00543-z

Original article
Open access
Published: 19 March 2024

Aoqi Li¹,
Johan Hulleman¹ &
Jeremy M. Wolfe²,³

Cognitive Research: Principles and Implications volume 9, Article number: 15 (2024)

Abstract

In any visual search task in the lab or in the world, observers will make errors. Those errors can be categorized as “deterministic”: If you miss this target in this display once, you will definitely miss it again. Alternatively, errors can be “stochastic”, occurring randomly with some probability from trial to trial. Researchers and practitioners have sought to reduce errors in visual search, but different types of errors might require different techniques for mitigation. To empirically categorize errors in a simple search task, our observers searched for the letter “T” among “L” distractors, with each display presented twice. When the letters were clearly visible (white letters on a gray background), the errors were almost completely stochastic (Exp 1). An error made on the first appearance of a display did not predict that an error would be made on the second appearance. When the visibility of the letters was manipulated (letters of different gray levels on a noisy background), the errors became a mix of stochastic and deterministic. Unsurprisingly, lower-contrast targets produced more deterministic errors (Exp 2). Using the stimuli of Exp 2, we tested whether errors could be reduced using cues that guided attention around the display but knew nothing about the content of that display (Exp 3a, b). This had no effect, but cueing all item locations did succeed in reducing deterministic errors (Exp 3c).

Introduction

Individuals routinely fail to report or respond to visual stimuli that are clearly visible, “right in front of their eyes”. This is unfortunate if the stimulus is a typo in your CV. It is markedly more serious if it is a tumor in a chest x-ray. The nature of these errors is important, not least because it can have legal consequences in the case of the tumor, if not the typo (Berlin & Hendrix, 1998; Duszak & Robinson, 2022). How we or a court may think about an error may depend on whether it is stochastic, occurring randomly, or deterministic, occurring any time a target appears in a specific location in a particular scene. In this paper, we describe a method for categorizing the type of error as stochastic or deterministic and we consider possibilities for mitigation.

In some cases, a clearly visible, missed item is an unexpected item. The Simons and Chabris (1999) gorilla is the most famous example of such “inattentional blindness” (Koivisto et al., 2004; Kuhn & Tatler, 2011; Mack & Rock, 1998; Macknik et al., 2008; Simons, 2000; Simons & Chabris, 1999). Inattentional blindness has been invoked as an explanation for some real-world errors; for example, how a driver may fail to notice an unexpected road user before a road accident or, more benignly, how an audience member may be induced to believe that something has materialized from nothing in a magic show. Some researchers have claimed that such “inattentional blindness” involves attentional misdirection in magic or elsewhere. The observer is blind because the magician or the situation has moved the observer’s attention away from the critical event (Barnhart & Goldinger, 2014; Kuhn & Tatler, 2005; Kuhn et al., 2008). Other researchers have proposed that a failure to see some highly noticeable objects is due to an illusion that the space behind an occluding foreground object is experienced as empty (“the illusion of absence”, Ekroll et al., 2021). Based on the assumption that the region is empty, the observer may fail to note an item even when movement of the occluder or the observer makes the target visible. Ekroll et al. (2021) proposed this illusion as an important contributor to “looked-but-failed-to-see” (LBFTS) errors in driving situations where an item hidden by a car’s “blind spot” is not seen even when movement of the car makes the item visible.

Missed gorillas and other examples of inattentional blindness are dramatic, but they are far from the only type of LBFTS error (Wolfe et al., 2022). Clearly visible targets are routinely missed even when the searcher knows that these targets, be they typos or tumors, are relevant to their ongoing task. In typical LBFTS driving accidents, the driver will generally know that they should be watching for pedestrians, turning vehicles, etc. (Pamme et al., 2018). In medical settings, when a clinician fails to report an “incidental finding”, it will not be a missed gorilla (Drew et al., 2013). It is more likely to be a secondary, but clinically significant, finding that the clinician knows might occur in this setting (Lumbreras et al., 2010). Indeed, a missed item can be the actual target of a search (Hovda et al., 2022, 2023). Medical errors by radiologists are an example. Clinicians will sometimes miss targets like pulmonary nodules even if they are clearly visible when pointed out. Kundel et al. (1978) classified such errors into three groups based on eye tracking data: search, recognition and decision errors. These categories remain widely used in the analysis of errors. An error is deemed to be a search error when the target (e.g., a lung nodule) never falls within a “functional visual field” surrounding that target (Sanders, 1970; Wolfe et al., 2021). Recognition errors occur when the eyes fixate on or near the target but move on without the observer having apparently noted the target’s presence. These can be classified as a type of LBFTS error. Decision errors occur when the observer spends significant time looking at or near the target but still does not label it correctly. In this case, the observer did not fail to see but misclassified the item. In breast radiology, perhaps 70% of missed lesions on mammograms are retrospectively visible when pointed to, after the fact. The search-recognition-decision taxonomy can classify those errors, given eye tracking data, but classification is not explanation. Many different factors could underlie the errors, including satisfaction of search, incorrect background sampling, and incorrect first impressions (Gandomkar & Mello-Thoms, 2019). We are seeking to understand whether these and other factors operate at chance or whether some configurations of stimuli are more error prone. Moreover, we also aim to investigate ways of reducing those errors, even if it is unlikely that there is a general method to reduce all kinds of errors.

In the present work, we use a simple letter search task. Even in a very basic laboratory visual search task like a search for a perfectly visible “T” among distractor “L”s, observers will routinely miss 5–10% of targets. When targets are missed, are those errors random (henceforth “stochastic”)? That is, if participants miss, let us say, 10% of targets, is that a random set of 10% of all target trials, or are observers more likely to miss some specific targets in some specific displays? In the limit, would participants miss the same targets again, if asked to search the same displays? We will call such errors “deterministic”. In addition to examining the nature of the errors, this paper also tests several cueing interventions to see if they can reduce errors and which errors can be reduced. To categorize the errors, a set of T among L search displays was presented twice to each participant. We calculated the miss rate, $P1$, for the first time that the set of displays was shown and the miss rate, $P2$, for the second time. We also calculated the proportion of cases where both copies were missed: $P12$. If the errors are stochastic, then $P12=P1*P2$. If the errors (on the first or the second copy) are deterministic, $P12={\text{min}}(P1, P2)$. If errors are a mix of stochastic and deterministic, $P12$ will fall between these two predictions. In addition to the analysis of the qualitative nature of these errors, it is possible to calculate the relative proportions of stochastic and deterministic errors, based on the three observable quantities: $P1$, $P2$ and $P12$. This calculation allowed us to evaluate the effect of the cueing interventions. If an intervention was useful, did it reduce stochastic or deterministic errors? If these interventions reduce errors on a simple T-vs-L search task, it might be worth trying a similar strategy in socially important, real-life tasks.

Experiment 1: basic search for a T among Ls

Experiment 1 consisted of a simple visual search task where white letters were presented against a gray background.

Participants

The experiment was run online on the Pavlovia platform (https://pavlovia.org). When recruiting participants for Experiment 1 and all the subsequent experiments, we did not set a language filter to ensure that participants understood the instructions. Instead, we included practice trials and excluded participants based on d’ after data collection. Participants who did not understand the instructions and were guessing were therefore not included. The only additional filters we set were age (18–100), vision (yes) and exclusion of participants from previous studies (this criterion applies to experiments after Exp 1). For Experiment 1, we tested 20 participants (6 males, 14 females, mean age = 19.5, SD = 0.9, min = 18, max = 21) from the BSc Psychology programme at the University of Manchester. All participants reported normal or corrected-to-normal vision and gave their informed consent before they began the experiment. Participants received course credit for their participation. Ethics approval came from The University of Manchester (2023–16117-27,175).

Stimuli & apparatus

The experiment was programmed in Python and translated into JavaScript by PsychoPy (Peirce et al., 2019). The online version was hosted on Pavlovia. Figure 1 shows the stimuli for Experiment 1. They consisted of an array of white letters (one T and Ls, or Ls only) against a gray background. The length of the vertical and horizontal line segments of the Ts and Ls was 0.03 screen height (note that because we were testing online, we had relative, not absolute, control of the sizes of stimuli). The orientations of the letters were randomly and uniformly selected from rotations of 30, 60, 90, 120, 150, 180, 210, 240, 270, 300, 330, and 360 deg. The positions of the letters were randomly generated for each trial such that all items fit in a square region that had a side length of 0.7 screen height, centered on the middle of the screen. In addition, the minimum distance between any two letters was always larger than 0.1 screen height.
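The layout constraints just described can be sketched in Python. This is a minimal illustration and not the authors' PsychoPy code: the function name `make_display`, the rejection-sampling strategy, the `max_tries` restart limit, and the reading that letter centers (rather than whole letters) are confined to the square are all assumptions made for the example.

```python
import math
import random

# The twelve rotations listed in the text (360 deg is equivalent to 0 deg).
ORIENTATIONS_DEG = list(range(30, 361, 30))

def make_display(set_size, target_present, rng,
                 region=0.7, min_dist=0.1, max_tries=10000):
    """Return a list of items with letter, position (screen-height units) and orientation.

    Assumption: positions are letter centers drawn uniformly inside a square of
    side `region` centered on (0, 0), accepted only if farther than `min_dist`
    from every previously placed item.
    """
    half = region / 2
    while True:  # start over if random placement gets stuck
        positions = []
        tries = 0
        while len(positions) < set_size and tries < max_tries:
            tries += 1
            x, y = rng.uniform(-half, half), rng.uniform(-half, half)
            if all(math.hypot(x - px, y - py) > min_dist for px, py in positions):
                positions.append((x, y))
        if len(positions) == set_size:
            break
    items = []
    for i, pos in enumerate(positions):
        # Positions are random, so making the first placed item the target
        # puts the target at a random location.
        letter = "T" if (target_present and i == 0) else "L"
        items.append({"letter": letter, "pos": pos,
                      "ori": rng.choice(ORIENTATIONS_DEG)})
    return items
```

Calling `make_display(36, True, random.Random(0))` would give one target-present display at the larger set size; dense layouts may need several restarts before all items fit.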

Design & procedure

Participants searched for the letter T among Ls. Participants were instructed to press ‘j’ if they found the target and ‘f’ if they did not. The stimulus was present until response. After the initial response, participants could press the space bar within one second to reverse the response if they thought they had made a motor error. Targets were present on 50% of trials. Trial-by-trial feedback was not given, but after every block of 100 trials the proportion correct for that block was displayed. There were two set sizes, 18 and 36, fully crossed with target presence (present, absent). For each participant, we generated 75 versions of each of the four resulting combinations for a total of 300 unique stimulus displays. Each of these was presented twice. The two copies of the 300 stimuli were randomly intermixed across six blocks of 100 trials for a total of 600 trials for the experiment (please note that this means that the minimum distance between two copies could be 1, i.e., when they were presented in consecutive trials, whereas the maximum distance could be 599, i.e., first copy presented in trial 1, second copy presented in trial 600). Thus, there were three factors in this design, each with two levels: repetition (first, second), set size (18, 36), and target (present, absent). Participants completed four practice trials before they started the experiment.
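As an illustration of this design, the following Python sketch builds one participant's trial order: 75 displays in each of the four set size by target cells, two copies of each, shuffled together and cut into six blocks of 100. The function and field names (`build_trial_order`, `display_id`, `repetition`) are hypothetical, and the sketch assumes a single unconstrained shuffle over all 600 trials, as the description implies.

```python
import random

def build_trial_order(rng, n_per_cell=75, set_sizes=(18, 36), block_len=100):
    """Return a list of blocks; each block is a list of trial dicts."""
    displays = []
    for set_size in set_sizes:
        for target_present in (True, False):
            for _ in range(n_per_cell):
                displays.append({"display_id": len(displays),
                                 "set_size": set_size,
                                 "target_present": target_present})
    # Two copies of every display, randomly intermixed across the whole session.
    trials = [dict(d) for d in displays for _ in range(2)]
    rng.shuffle(trials)
    # Label each copy as the first or second appearance of its display.
    seen = set()
    for trial in trials:
        trial["repetition"] = 2 if trial["display_id"] in seen else 1
        seen.add(trial["display_id"])
    return [trials[i:i + block_len] for i in range(0, len(trials), block_len)]
```

Because the shuffle is unconstrained, the two copies of a display can land anywhere from adjacent trials to opposite ends of the session, which matches the lag range of 1 to 599 noted above.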

Analysis method

We focused on the RT data and miss rate data, with our primary interest being in the miss rate data. The RT data were subjected to a three-way repeated-measures ANOVA with target presence, set size and repetition as within-subject factors. Since the experiment involved a typical visual search task, we found the typical main effects of target presence and set size. Specifically, there were longer reaction times for absent trials and longer reaction times for larger set sizes. A two-way interaction between target and set size, showing steeper reaction time slopes for absent trials, also occurred. All of these effects are highly statistically reliable and will not be reported in detail in the Results section. The results of the full ANOVAs are shown in supplementary tables.

For miss rate data, we calculated the miss rate, $P1$, for the first time the set of displays was shown and the miss rate, $P2$, for the second time. We also calculated the proportion of cases where both copies were missed, $P12$. If the errors are stochastic, then $P12=P1*P2$. If the errors (on the first or the second repetition) are deterministic, $P12={\text{min}}(P1,P2)$. If the errors are a mix of stochastic and deterministic errors, $P1*P2<P12<{\text{min}}(P1, P2)$. To get a quantitative estimate of the relative proportion of stochastic and deterministic errors, we modelled how the errors observed in round 1 and round 2 could be decomposed into different types of error, as shown in Fig. 2. One complication worth noting is that a deterministic display may produce a stochastic error. A deterministic display is one that would produce a deterministic error. However, it is possible for an error to be produced on that trial for stochastic reasons. Imagine, for instance, that the observer is simply not paying attention on that would-be deterministic trial and pushes a response button at random.
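The three observable quantities and the two limiting predictions can be computed directly from paired miss outcomes. The sketch below is illustrative only: the function name `miss_summary` and the input format (two equal-length lists of booleans, one entry per target-present display, aligned so that position i refers to the same display in both rounds) are assumptions, not part of the original analysis code.

```python
def miss_summary(missed_first, missed_second):
    """Compute P1, P2, P12 and the stochastic/deterministic predictions for P12."""
    n = len(missed_first)
    if n == 0 or n != len(missed_second):
        raise ValueError("need two non-empty, equal-length outcome lists")
    p1 = sum(missed_first) / n                      # miss rate, first appearance
    p2 = sum(missed_second) / n                     # miss rate, second appearance
    p12 = sum(a and b for a, b in zip(missed_first, missed_second)) / n
    return {
        "P1": p1,
        "P2": p2,
        "P12": p12,
        "stochastic_prediction": p1 * p2,           # independent misses
        "deterministic_prediction": min(p1, p2),    # same displays missed again
    }
```

An observed `P12` close to `stochastic_prediction` points to random misses, whereas a value close to `deterministic_prediction` means that the same displays tend to be missed on both appearances.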

In Fig. 2, there are four possible states for a trial.^{Footnote 1} The target in a trial is either fundamentally unfindable (black) or fundamentally findable (non-black). A completely black circle represents the case where a deterministic error is made on a trial with an unfindable target. A black circle with a red border means that the unfindable target was missed stochastically. A solid red circle represents the situation where a stochastic error is made on a trial with a findable target. A blue circle represents a trial where the target is successfully found. If the target in a trial is fundamentally findable, this target cannot become fundamentally unfindable. This means that it is not possible to transition from a non-black circle to a black circle or a black circle with a red border. A transition in the opposite direction is possible though. For instance, if a cueing intervention works and reduces the number of deterministic errors, it is possible for a deterministic miss (black) on one trial to become a hit (blue) on its next appearance.

To describe the proportions of deterministic and stochastic errors, four parameters are introduced: $d1$ and $d2$ represent the proportion of deterministic errors relative to the total number of stimuli in round 1 and round 2, respectively. $s1$ and $s2$ represent the stochastic error rates for a stimulus in round 1 and round 2, respectively. In Fig. 2, Row 0 with an empty circle represents the to-be-determined status of one trial. Row 1 with four different types of circles represents the four possible outcomes of the first appearance of a trial with the notation for the corresponding probabilities in round 1. Row 2 represents the possible outcomes with the notation for the corresponding probabilities in round 2. Therefore, the observed $P1$, $P2$ and $P12$ can be theoretically decomposed into the summed error probabilities in round 1 and round 2. The following three equations can be derived (the original versions and the simplifying process can be found in the appendix):

$$P1=d1*\left(1-s1\right)+s1$$

$$P2=d2+s2*\left(d1-d2\right)+\left(1-d1\right)*s2$$

$$P12=d2+s2*\left(d1-d2\right)+s1*\left(1-d1\right)*s2$$
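To see where each term comes from, the sketch below enumerates the three kinds of display implied by the model (unfindable in both rounds, unfindable in round 1 only, findable in both rounds) and sums their miss probabilities, then compares the result with the three equations as printed. The function names are hypothetical, and the branch structure is one reading of the description of Fig. 2, in which a findable target cannot become unfindable (so $d2 \le d1$).

```python
def closed_form(d1, d2, s1, s2):
    """The three equations exactly as given in the text."""
    p1 = d1 * (1 - s1) + s1
    p2 = d2 + s2 * (d1 - d2) + (1 - d1) * s2
    p12 = d2 + s2 * (d1 - d2) + s1 * (1 - d1) * s2
    return p1, p2, p12

def enumerate_branches(d1, d2, s1, s2):
    """Sum miss probabilities over display types (assumed reading of Fig. 2)."""
    if not 0 <= d2 <= d1 <= 1:
        raise ValueError("model assumes 0 <= d2 <= d1 <= 1")
    # (proportion of displays, P(miss in round 1), P(miss in round 2))
    branches = [
        (d2, 1.0, 1.0),        # unfindable in both rounds: always missed
        (d1 - d2, 1.0, s2),    # unfindable in round 1, findable in round 2
        (1 - d1, s1, s2),      # findable in both rounds: only stochastic misses
    ]
    p1 = sum(w * m1 for w, m1, _ in branches)
    p2 = sum(w * m2 for w, _, m2 in branches)
    # Given the display type, misses in the two rounds are independent.
    p12 = sum(w * m1 * m2 for w, m1, m2 in branches)
    return p1, p2, p12
```

For any admissible parameter values, for example `d1=0.10, d2=0.04, s1=0.08, s2=0.05`, the two functions should agree up to floating-point rounding.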

For Experiment 1, there was no cueing intervention. A fixed deterministic rate was therefore assumed for round 1 and round 2, i.e., $d=d1=d2$. With this additional assumption, there is a unique solution for the above equations.

$$d=\frac{P12-P1*P2}{1-P1-P2+P12}$$

$$s1=\frac{P1-P12}{1-P2}$$

$$s2=\frac{P2-P12}{1-P1}$$
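The closed-form solution can be checked numerically. The following sketch (with hypothetical function names `forward` and `decompose`) implements the three model equations under the assumption $d=d1=d2$ and the three solution formulas above, so that known parameters can be turned into $P1$, $P2$, $P12$ and then recovered.

```python
def forward(d, s1, s2):
    """Model predictions when d1 = d2 = d."""
    p1 = d * (1 - s1) + s1
    p2 = d + (1 - d) * s2
    p12 = d + (1 - d) * s1 * s2
    return p1, p2, p12

def decompose(p1, p2, p12):
    """Recover d, s1 and s2 from the three observed proportions."""
    denom = 1 - p1 - p2 + p12   # proportion of displays hit on both appearances
    if denom <= 0 or p1 >= 1 or p2 >= 1:
        raise ValueError("solution undefined for these proportions")
    d = (p12 - p1 * p2) / denom
    s1 = (p1 - p12) / (1 - p2)
    s2 = (p2 - p12) / (1 - p1)
    return d, s1, s2
```

For instance, `decompose(*forward(0.05, 0.10, 0.08))` should return values very close to `(0.05, 0.10, 0.08)`. When $P12=P1*P2$ the estimate of $d$ is zero, and with real data sampling noise can push it slightly below zero.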

Data exclusion

Since the data were collected online, we first checked for extremely long RTs (larger than 100 s) that might cause large shifts in means and standard deviations, and left them out when calculating the limits of mean $\pm$ 2.5 SD. Then, trials with RTs more than 2.5 SD below or above the mean RT in each cell of the target × set size combination (3.47%) and trials where participants corrected their motor responses (1.07%) were removed for each observer. When one trial was removed, the other copy of the trial was also removed (93.3% remained). After the removal of the above trials, we further checked the d’ of all the participants. Participants with d’ beyond 2.5 SD from the group mean for each individual experiment were excluded. One participant with a low d’ = 1.02 was removed from Exp 1. For the remaining participants, min d’ = 2.95, max d’ = 5.84.
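The trial-level part of this exclusion procedure could look like the following sketch for a single observer. The trial format (dicts with `display_id`, `target_present`, `set_size`, `rt` in seconds and a `corrected` flag) and the function names are hypothetical; the sketch also assumes that RTs above 100 s are left out only when computing the limits and are then removed like any other outlier.

```python
from collections import defaultdict
from statistics import mean, stdev

def excluded_display_ids(trials, sd_limit=2.5, extreme_rt=100.0):
    """Return the ids of displays that lose at least one copy to exclusion."""
    by_cell = defaultdict(list)
    for t in trials:
        by_cell[(t["target_present"], t["set_size"])].append(t)
    drop = set()
    for cell_trials in by_cell.values():
        # Limits are computed without extremely long RTs so that these do not
        # inflate the mean and SD.
        usable = [t["rt"] for t in cell_trials if t["rt"] <= extreme_rt]
        m, sd = mean(usable), stdev(usable)
        low, high = m - sd_limit * sd, m + sd_limit * sd
        for t in cell_trials:
            if t["rt"] < low or t["rt"] > high or t["corrected"]:
                drop.add(t["display_id"])
    return drop

def apply_exclusion(trials):
    """Remove excluded trials together with the other copy of the same display."""
    drop = excluded_display_ids(trials)
    return [t for t in trials if t["display_id"] not in drop]
```

Dropping by `display_id` rather than by trial keeps the two rounds paired, which the calculation of $P12$ requires.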

Results

RTs

Figure 3 shows RTs on correct response trials for Experiment 1. It is clear that the first and second repetitions of the stimuli produce very similar RTs with a slight speed-up on the second appearance. The three-way repeated-measures ANOVA with target presence, set size and repetition as within-subject factors shows a main effect of repetition [F(1, 18) = 7.00, p = 0.016, ${\eta }_{p}^{2}$ = 0.28], suggesting that participants responded faster in round 2 than in round 1. Neither the interaction between target presence and repetition [F(1, 18) = 0.37, p = 0.55, ${\eta }_{p}^{2}$ = 0.02] nor the interaction between set size and repetition [F(1, 18) = 0.50, p = 0.49, ${\eta }_{p}^{2}$ = 0.03] was significant. The three-way interaction among all the factors was not significant either [F(1, 18) = 0.004, p = 0.95, ${\eta }_{p}^{2}$ = 0.00]. The full results of the three-way ANOVA are presented in Additional file 1: Figure S1. Individual RTs are given in Additional file 1: Table S1.

Miss rates

Figure 4 shows the results of the miss rate analyses for Experiment 1. In the scatter plot in Fig. 4, the proportion of targets missed twice ($P12$) is plotted as a function of the proportion of targets missed on the first appearance ($P1$). The blue dots represent the observed data for each participant with $x=P1$ and $y=P12$ calculated from human data. Each observed data point (blue dot) is paired with the stochastic prediction (red circle) and the deterministic prediction (black circle) of $P12$ given the observed $P1$ and $P2$. Therefore, for one participant with observed $P1$, $P2$ and $P12$, the observed data point (blue dot) is $(P1, P12)$, the stochastic prediction (red circle) is $(P1, P1*P2)$, and the deterministic prediction (black circle) is $(P1, \mathrm{min}(P1,P2))$. As can be seen, the observed data (blue dots) almost overlap with the stochastic predictions (red circles). The bar plot in Fig. 4 shows the parameters solved using the equations from the Analysis method section, above. The figure is based on the assumption that $d=d1=d2$, resulting in three parameters to be computed, i.e., the deterministic error proportion, $d$, the stochastic rate in round 1, $s1$, and the stochastic rate in round 2, $s2$. Because $d$, $s1$ and $s2$ are rates, they should be non-negative, so one participant with a calculated $d$ smaller than $-0.002$ was excluded. For other participants, when $-0.002 \le d < 0$, $d$ was rounded to 0. A one-sample t-test showed that the deterministic error proportion $d$ was not significantly different from 0 [t(17) = 1.47, p = 0.159, Cohen’s d = 0.35], demonstrating that errors in Experiment 1 were almost exclusively stochastic. A paired t-test comparing the stochastic rates, $s1$ and $s2$, shows that observers made fewer stochastic errors in round 2 than in round 1 [t(17) = 3.48, p = 0.003, Cohen’s d = 0.82], indicative of some learning effect over the course of Experiment 1.
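The handling of slightly negative estimates of $d$ and the one-sample t statistic can be sketched as follows. The function names are hypothetical, the example values in the usage note are made up for illustration and are not the study's data, and the sketch returns only the t statistic and degrees of freedom (a p-value would need a t distribution from a statistics library).

```python
import math
from statistics import mean, stdev

def clean_d_estimates(d_values, tolerance=0.002):
    """Exclude estimates below -tolerance and round small negative ones to 0."""
    cleaned = []
    for d in d_values:
        if d < -tolerance:
            continue                    # participant excluded
        cleaned.append(max(d, 0.0))     # -tolerance <= d < 0 becomes 0
    return cleaned

def one_sample_t(values, mu=0.0):
    """t statistic and degrees of freedom for a test of the mean against mu."""
    n = len(values)
    t = (mean(values) - mu) / (stdev(values) / math.sqrt(n))
    return t, n - 1
```

With illustrative inputs such as `[0.01, -0.001, -0.005, 0.0, 0.02]`, the value `-0.005` would be dropped and `-0.001` set to 0 before the test is run on the four remaining estimates.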

Experiment 1 discussion

Experiment 1 consisted of a simple T-vs-L search task where all the white letters were presented on a gray background. Analyses of the RTs and miss rates showed that observers responded faster and made fewer errors in round 2 than in round 1, indicative of some learning effect. Learning effects could include both an effect of time on task (participants perform better or worse over time due to practice or fatigue) and a repetition effect (participants may know something about the stimulus when they encounter it for the second time), but this is not relevant to our aim of categorizing errors as deterministic or stochastic. More importantly for present purposes, the proportion of deterministic errors, $d,$ calculated from miss rates was not significantly different from 0, indicating that errors were almost purely stochastic in this experiment. The result would be different if there was a systematic bias in search. For example, if observers tended to ignore the lower left corner of the display, then targets in the lower left would be more likely to be missed on both their first and second appearances. This is not what is found with these simple and clear stimuli. However, in many real-world search tasks (mammography, airport security), search items are not so clearly visible. In the next two experiments we therefore tested whether stochastic errors still dominate when items become harder to distinguish from the background.

Experiments 2a and 2b: letters on a noisy background

In Experiments 2a and 2b, the uniform gray background was replaced by a noisy background. The letters were also of various gray levels. The only difference between the two experiments was that Experiment 2b used a more restricted set of target contrasts and target locations compared to Experiment 2a.