When more is more: redundant modifiers can facilitate visual search
Cognitive Research: Principles and Implications volume 6, Article number: 10 (2021)
According to the Gricean Maxim of Quantity, speakers provide the amount of information listeners require to correctly interpret an utterance, and no more (Grice in Logic and conversation, 1975). In practice, however, speakers often violate the Maxim of Quantity, especially when the redundant information improves reference precision (Degen et al. in Psychol Rev 127(4):591–621, 2020). Redundant (non-contrastive) information may facilitate real-world search if it narrows the spatial scope under consideration, or improves target template specificity. The current study investigated whether non-contrastive modifiers that improve reference precision facilitate visual search in real-world scenes. In two visual search experiments, we compared search performance with and without perceptually relevant, non-contrastive modifiers in the search instruction. Participants (NExp. 1 = 48, NExp. 2 = 48) searched for a unique target object following a search instruction that contained either no modifier, a location modifier (Experiment 1: on the top left; Experiment 2: on the shelf), or a color modifier (the black lamp). In Experiment 1 only, the target was located faster when the verbal instruction included either modifier, and there was an overall benefit of color modifiers in a combined analysis of scenes and conditions common to both experiments. The results suggest that violations of the Maxim of Quantity can facilitate search when the violations include task-relevant information that either augments the target template or constrains the search space, and when at least one modifier provides a highly reliable cue. Consistent with Degen et al. (2020), we conclude that listeners benefit from non-contrastive information that improves reference precision, and engage in rational reference comprehension.
This study investigated whether providing more information than someone needs to find an object in a photograph helps them to find that object more easily, even though it means they need to interpret a more complicated sentence. Before searching a scene, participants were either given information about where the object would be located in the scene, what color the object was, or were only told what object to search for. The results showed that providing additional information helped participants locate an object in an image more easily only when at least one piece of information communicated what part of the scene the object was in, which suggests that more information can be beneficial as long as that information is specific and helps the recipient achieve a goal. We conclude that people will pay attention to redundant information when it supports their task. In practice, our results suggest that instructions in other contexts (e.g., real-world navigation, using a smartphone app, prescription instructions, etc.) can benefit from the inclusion of what appears to be redundant information.
Suppose you and several friends get together for a picnic in the park. When your friends realize they left a blanket in the car, you are asked to retrieve the blanket while they scout out a good picnic spot. Your friend hands you a set of car keys and says “it’s the green Mazda” as you make your way to the parking lot. In this context, the speaker—your friend—knows that you are searching for a car among other cars, and provided extra information about the target car (make and color) to help you find it. If your friend’s car is the only car in the parking lot when you arrive, however, the description would be overinformative, and therefore suboptimal from an audience design perspective, which assumes that linguistic expressions should be included only as required to avoid referential ambiguity (Grice 1975). The current study investigates whether overinformative modifiers—those that add information about unique targets in a scene beyond what is minimally required for identification—facilitate visual search, despite adding redundancy to an utterance.
According to audience design theories of communication (Grice 1975; Clark and Murphy 1982; Konopka and Brown-Schmidt 2014), speakers craft utterances with the listener in mind such that they account for common ground between speaker and listener (Clark and Murphy 1982), provide relevant context, and are efficient (Gibson et al. 2019). Grice (1975) famously dubbed the tendency for speakers and listeners to accommodate each other’s communicative needs the “Cooperative Principle” of conversation, and outlined Maxims (best practices) that speakers follow in cooperation with the listener to optimize conversation. In the current study, we focus on the Maxim of Quantity, which states that speakers should provide enough information for listeners to correctly identify the intended referent, and no more (Grice 1975).
Gricean Maxims are guidelines for communication, not inviolable rules that speakers obey strictly. Indeed, speakers systematically violate the Maxim of Quantity in particular (Pechmann 1989; Belke and Meyer 2002; Sedivy 2003; Gatt et al. 2011; Koolen et al. 2013; Westerbeek et al. 2015; Rubio-Fernández 2016; Gatt et al. 2017; Degen et al. 2020). Hereafter we use the term “non-contrastive” to describe modifiers that are not strictly required for unique identification of the referent (e.g., “red” in “the red pen” when only a single pen is present). Overinformative referring expressions frequently include non-contrastive color descriptors (Pechmann 1989; Belke and Meyer 2002; Sedivy 2003; Gatt et al. 2011; Koolen et al. 2013; Engelhardt and Ferreira 2016; Degen et al. 2020), except when the color is highly typical of the object (e.g., “the yellow banana”; Westerbeek et al. 2015; Rubio-Fernández 2016). Speakers are more likely to produce overinformative referring expressions that describe atypical object properties (e.g., “the brown banana”) than highly typical object features (Sedivy 2003; Mitchell et al. 2013; Westerbeek et al. 2015; Degen et al. 2020), and are more likely to include color modifiers than size modifiers (Sedivy 2003). The tendency for speakers to overinform increases with stimulus complexity (Koolen et al. 2013; Davies and Katsos 2013; Gatt et al. 2017; Degen et al. 2020). In a series of production experiments, Degen et al. (2020) replicated these patterns: Speakers systematically included non-contrastive information in referring expressions, especially when describing complex stimuli and atypical object properties. The authors fit a Rational Speech Act model with continuous semantics to the data and found that speakers elected to include non-contrastive modifiers when those modifiers made the reference more precise, and presumably more useful to the interlocutor.
In sum, speakers include strictly non-contrastive information strategically, resulting in referring expressions that are appropriately informative because the non-contrastive information is still useful.
Under a classical interpretation of Gricean Maxims, it should be more difficult for an interlocutor to arrive at the correct reference interpretation if the referential expression is either over- or underinformative. This prediction follows from the Gricean idea that listeners assume speakers are economical in their use of linguistic expressions, and will therefore interpret a modifier as implying a set of items denoted by the head noun rather than a single item. If only one such item is in fact present, listeners will be momentarily confused. However, there is mixed evidence regarding how violations of the Maxim of Quantity affect an interlocutor’s interpretation of the reference (Engelhardt et al. 2006, 2011; Arts et al. 2011; Davies and Katsos 2013; Engelhardt and Ferreira 2016; Tourtouri et al. 2017). For example, Visual World Paradigm tasks—which employ displays that are similar to visual search arrays—have shown that listeners experience comprehension difficulty when interpreting an overinformative description (Engelhardt et al. 2006) and further exhibit processing difficulties for both under- and overinformative utterances (Davies and Katsos 2013). In an attentional-cueing task, Engelhardt et al. (2011) found longer response times and an N400 following overinformative modifiers, indicating a processing penalty associated with unexpected redundant information. It is important to note that overmodification may have been detrimental in the cases discussed above because the non-contrastive modifiers did not improve reference precision (Degen et al. 2020). Other studies have reported facilitation of reference interpretation when the referring expression contains redundant modifiers. Arts et al. (2011) found that violations of the Maxim of Quantity facilitated target object identification among an array of objects when the overinformative modifiers communicated perceptually relevant information (size, color, shape) or spatial information about the target’s location in the array (e.g., left). Tourtouri et al. (2017) found that overinformative modifiers facilitated search for a target in an object array when the modifier reduced reference entropy: For example, “blue” was helpful when there were few blue objects, and not when many objects were blue. In other words, there was a benefit of non-contrastive modifiers when those modifiers made the referring expression more precise, consistent with Degen et al. (2020). To summarize, non-contrastive modifiers appear to impede reference interpretation when they add noise (e.g., when stimuli are simple or the “speaker” is not reliable), but may facilitate reference interpretation when they improve reference precision.
The predictions about reference interpretation that follow from the classical Gricean perspective—and have been partially supported empirically in psycholinguistic work (Engelhardt et al. 2006, 2011; Davies and Katsos 2013)—are rather counterintuitive in the context of visual search. In the hypothetical car search scenario, the expression “it’s the green Mazda” is overinformative because the color and make are non-contrastive, and so the expression arguably violates the Gricean Maxim of Quantity. However, the overinformative details (the make and color of the car) are perceptually relevant and therefore clearly useful for visual search. Well-defined target information has been shown to facilitate template-based guidance of search for a target object in real-world scenes (Vickery et al. 2005; Malcolm and Henderson 2009, 2010; Castelhano and Heaven 2010; Reeder and Peelen 2013; Bahle et al. 2019). In template-based guidance of visual search, the observer uses a target object cue (e.g., a word or picture) to form a template of the target object in visual working memory, which is subsequently compared to the scene during search (Rao et al. 2002; Schmidt and Zelinsky 2009; Malcolm and Henderson 2009). Targets are located faster when the template is more specific (e.g., a picture of the target vs. the name of the target object; Malcolm and Henderson 2009, 2010; Castelhano and Heaven 2010; Schmidt and Zelinsky 2009; Bravo and Farid 2009), and when the target is a highly typical exemplar of the object category (Castelhano et al. 2008; Maxfield et al. 2014)—though typicality only reduced the time required to verify the target after it was initially fixated (Castelhano et al. 2008). Counter to the Gricean prediction, there is an additive benefit when multiple cues are provided (Malcolm and Henderson 2010; Castelhano and Heaven 2010; Hout and Goldinger 2015). 
However, the degree to which additional information is beneficial depends on how consistent features are within the object category (e.g., how noisy object feature cues are; Hout et al. 2017), which is consistent with Degen et al. (2020). Target templates held in working memory can incorporate the shape of the target objects as well as diagnostic object parts (e.g., a wheel on a car; Reeder and Peelen 2013), and can incorporate color information (Bahle et al. 2019), evidenced through attention capture by distractor objects with the same shape (Reeder and Peelen 2013) or color (Bahle et al. 2019) as the template. Note that when there were only two target object categories, a single letter (the first letter of the target category name) was sufficient to build object shape information into the target object template (Reeder and Peelen 2013). Based on these findings, we would expect non-contrastive descriptors to facilitate visual search so long as they enrich the target search template.
It is unclear whether the mixed evidence on how violations of the Maxim of Quantity influence reference processing in the psycholinguistic literature, and the mismatch between the reference processing literature and empirical work on template-based visual search, is due to differences in the paradigms used in each field. For example, reference processing experiments tend not to use complex real-world scenes, and often involve button press or typed responses. When eye movements are recorded, they are not typically analyzed in the same way in the psycholinguistic literature as in the visual search literature; in Visual World Paradigm tasks, fixations made to each image in a search array during a target period of the auditory stimulus are aggregated (e.g., averaged across trials) for analysis. Furthermore, the literature on template-guided visual search has shown that some types of information improve the target template (and, by extension, template-guided search) more than others, and in many of these paradigms the information is provided in written form. It is possible that theories of reference processing can speak to why certain types of information, presented in linguistic and acoustic form, are more useful than others. The discrepancy provides a fruitful opportunity for cross-disciplinary research. In the current study, we investigated whether the beneficial effects of non-contrastive modifiers on reference processing (Arts et al. 2011; Tourtouri et al. 2017) generalize to visual search in real-world scenes, an understudied topic in both psycholinguistics and visual cognition. Real-world scenes benefit from rapid scene gist extraction (Castelhano and Henderson 2007), are processed more efficiently than cartoons or other simplified displays (Henderson and Ferreira 2004), and better approximate real-world environments.
Because real-world scenes are complex stimuli, we expect any non-contrastive modifiers that convey task-relevant information to improve reference precision (following Degen et al. 2020). In two experiments, observers searched real-world scenes for a unique target object. We manipulated reference specificity by modifying the search instruction to add either a perceptually relevant target feature or information about the target location.
Based on evidence that target object templates can contain color information (Bahle et al. 2019), in both experiments we added non-contrastive but perceptually relevant information using a color modifier (e.g., Find the black lamp). Following Arts et al. (2011), in Experiment 1 we added target location information using a prepositional phrase after the target object’s name that specified which screen quadrant the target was located in (e.g., Find the lamp on the top left). In Experiment 2, we instead expressed location information relative to an anchor object in the scene (Boettcher et al. 2018). Anchor objects are typically larger objects (e.g., a desk, table, bookshelf, etc.) on which target objects are likely to be located (e.g., Find the lamp on the shelf). In both experiments, performance on trials in which the search instruction included non-contrastive modifiers was compared to trials in which only the target object was mentioned in the search instruction (e.g., Find the lamp).
Because Malcolm and Henderson (2009, 2010) found a target template advantage specifically when observers scanned the scene (scanning time) and in the time interval between finding the target and responding accordingly (verification time), we likewise divided each trial into three discrete search epochs to determine which epochs benefit from the redundant modifiers chosen (Fig. 1). Initiation time was the latency of the first saccade following scene onset, at which point the eye first moved to search the scene (Fig. 1, white arrow). Malcolm and Henderson (2009, 2010) did not find a benefit of target template specificity on initiation times; therefore, we do not expect this measure to be sensitive to reference precision. Scanning time was the primary search epoch, defined as the time taken to fixate on the target object after the first saccade (Fig. 1, purple arrows). Verification time was the time to confirm the fixated object was indeed the target, defined as the time between the first fixation on the target object and the subject’s response (Fig. 1, green arrows). Reaction time, a common measure of search efficacy, was defined as the time between the start of the trial and the subject’s response.
We hypothesized that a non-contrastive color modifier would augment the target search template and facilitate template-guided search relative to when no modifier is present (Bahle et al. 2019). We also hypothesized that location modifiers would facilitate visual search by constraining the region of the display to be searched (Arts et al. 2011). Because non-contrastive modifiers that improve reference precision are useful to listeners (Degen et al. 2020), and following Malcolm and Henderson (2009), we expected redundant, non-contrastive modifiers to facilitate visual search. Specifically, we predicted that in both experiments the duration of the primary search epoch (scanning time) and the confirmation epoch (verification time) would be shorter when the search instruction included a non-contrastive modifier.
In Experiment 1, we compared search performance when the search instruction included non-contrastive, redundant modifiers to performance when no such modifiers were provided. Specifically, we included either the color of the target object (e.g., Find the black lamp) to augment the target object template, the location of the target object in the scene (e.g., Find the lamp on the upper left) to constrain the region of the scene to be searched, or no additional information (e.g., Find the lamp). We predicted that redundant, non-contrastive modifiers that constrain the relevant object colors and locations within a scene would facilitate visual search, such that scanning and verification times would be shorter when a redundant modifier was present in the referring expression.
Experiment 1: Method
Forty-two scene candidates were selected from Google image search. All scenes depicted human-made environments (e.g., kitchens, offices, drawers) and each contained only one instance of the target object type (e.g., only one mug).
Prior to the eye-tracking study, we conducted a norming study to verify that the intended target in each scene was relatively easy to find and that its color was easily identifiable. Fourteen native English-speaking undergraduates enrolled at UC Davis completed a Qualtrics survey. Each of the 42 scenes was presented individually. For each scene, subjects were instructed to report separately the location of the target object and its color. Responses were recorded via text box. Prior to the 42 experimental trials, subjects viewed an example trial in which a scene was displayed along with the target object’s location relative to another object in the scene (e.g., on the desk) and the target’s color (e.g., white).
Results of the norming study were used to exclude scenes as follows. Two scenes were excluded because subjects spent over 30 s searching for the object or failed to locate the object. An additional two scenes were excluded because subjects reported that more than one instance of the object was present in the scene. Finally, two more scenes were excluded because subjects did not agree on the identity of the object (e.g., they mistook another object for the target) or because no single color constituted a majority of the color responses. The remaining 36 scenes were presented as stimuli in the eye-tracking experiment.
For each scene, we defined a rectangular interest area surrounding the target object. The region of interest (ROI) was used to determine when subjects fixated on the target, and to exclude trials in which observers did not fixate the target from analysis.
Participants were 50 native English-speaking adults enrolled at UC Davis. All subjects had normal or corrected-to-normal visual acuity and normal color vision. Subjects were naive to the purpose of the experiment and provided informed consent to participate. Two of the subjects could not be accurately eye-tracked. Data from the remaining 48 subjects were analyzed.
The experiment was conducted using an EyeLink 1000+ system with a tower mount. Subjects sat approximately 83 cm from the display monitor. Head movements were stabilized using a chin and forehead rest. Stimuli were displayed at a resolution of 1024 × 768 pixels on a 21″ CRT monitor and subtended approximately 36° × 27° of visual angle. Viewing was binocular, but eye movements were recorded from the right eye only. Experiment presentation was controlled using SR Research Experiment Builder software.
The modifier manipulation was implemented via the search instruction subjects received prior to seeing the scene, which was presented in written form in the first display of each trial. The instruction either did not include a modifier (e.g., Find the lamp), included a color modifier (Find the black lamp), or included a location modifier (Find the lamp on the upper left). The color modifier was chosen from the majority response provided in the norming study, and the location modifier was chosen by determining which scene quadrant (upper left, lower left, upper right, lower right) contained the target object.
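The text does not spell out how the quadrant label was computed from the target's position, but a minimal sketch follows. It assumes a 1024 × 768 px display and a rectangular interest area around the target; the function names and data layout are hypothetical, not the authors' stimulus-preparation code.

```python
# Illustrative sketch: derive the quadrant label for the location modifier
# from the target's rectangular interest area, and build the instruction
# string for each modifier condition. Names are hypothetical.

def quadrant_label(roi, width=1024, height=768):
    """roi: (x0, y0, x1, y1) in screen pixels, origin at the top left."""
    x0, y0, x1, y1 = roi
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2  # center of the interest area
    vertical = "upper" if cy < height / 2 else "lower"
    horizontal = "left" if cx < width / 2 else "right"
    return f"{vertical} {horizontal}"

def build_instruction(noun, color=None, roi=None):
    # One instruction per condition: color modifier, location modifier,
    # or no modifier at all.
    if color is not None:
        return f"Find the {color} {noun}"
    if roi is not None:
        return f"Find the {noun} on the {quadrant_label(roi)}"
    return f"Find the {noun}"
```

For example, a lamp whose interest area sits in the top-left quarter of the screen would yield "Find the lamp on the upper left" in the location modifier condition.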
Each experimental session consisted of 36 experimental trials. The modifier manipulation was implemented within-subject such that 12 trials did not include a modifier, 12 trials included a color modifier, and the other 12 included a location modifier. The scenes and all modifiers were counterbalanced and equally distributed across three lists. Subjects were assigned to one of the three lists at random.
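The exact list construction is not specified above, but the constraints (36 scenes, 3 conditions, 3 lists, 12 trials per condition per list, each scene in a different condition on each list) are satisfied by a Latin-square rotation, sketched here as an illustrative reconstruction rather than the authors' materials.

```python
# Illustrative Latin-square counterbalancing sketch: rotate the modifier
# condition assigned to each scene across the three lists.

CONDITIONS = ["none", "color", "location"]

def make_lists(n_scenes=36, n_lists=3):
    lists = []
    for list_index in range(n_lists):
        # Shifting the condition index by the list index guarantees each
        # scene appears in a different condition on each list.
        lists.append(
            [CONDITIONS[(scene + list_index) % 3] for scene in range(n_scenes)]
        )
    return lists
```

With 36 scenes, each list contains exactly 12 trials per condition, and across the three lists every scene is seen once in every condition.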
Subjects were first instructed to search for targets in each scene and to press a button on the button box upon locating the target object. Prior to the experimental trials, a calibration procedure was performed to map eye position to screen coordinates. Calibration was considered successful if the average error was below 0.49° and the maximum error was below 0.99°. Fixations and saccades were parsed with EyeLink’s standard algorithm using velocity and acceleration thresholds (30°/s and 9500°/s²; SR Research 2017a).
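The EyeLink parser is proprietary and applies acceleration and motion criteria in addition to the velocity threshold, so the sketch below illustrates only the velocity component of the classification; the sample velocities and function name are hypothetical.

```python
# Illustrative sketch of the velocity criterion in saccade parsing: a gaze
# sample whose angular velocity exceeds 30 deg/s is treated as part of a
# saccade, otherwise as part of a fixation. The real EyeLink algorithm
# additionally uses acceleration (9500 deg/s^2) and motion thresholds.

def classify_samples(velocities_deg_per_s, velocity_threshold=30.0):
    return ["saccade" if v > velocity_threshold else "fixation"
            for v in velocities_deg_per_s]
```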
Calibration was maintained throughout the experiment using a drift correction procedure to check and correct for calibration drift. Prior to each trial, a central fixation cross was presented on screen, and the experimenter pressed a button to continue unless the drift check error exceeded 0.99° visual angle, in which case the experimenter repeated the calibration procedure.
Successful initial calibration was followed by 3 practice trials. A trial proceeded as follows (Fig. 2). After the drift check procedure, the search instruction (e.g., Find the black lamp) was presented in black 20 pt Times New Roman font on the center of a white screen. The instruction persisted until the subject pressed a button, after which a central fixation cross appeared for 500 ms, followed by the scene. The scene persisted until the subject pressed a button upon locating the target, at which point response time was recorded. After a 100-ms blank screen, the next trial began.
The procedure for the experimental trials and the practice trials was identical. After completing three practice trials, subjects completed 36 experimental trials. Trial presentation was randomized without replacement.
Prior to analysis, data were inspected in Data Viewer (SR Research 2017b) by the second author. Trials were excluded from analysis entirely (1) if a fixation landed within the target ROI immediately after the fixation cross (61 trials, all in the location modifier condition), (2) if no fixations landed in or near the target ROI, in which case observers may have failed to find the target or may have mistaken another object for the target, or (3) if the trial duration was more than 3 standard deviations above the mean trial duration. We excluded 278 trials (out of 1728 trials total) from analysis using these criteria: 66 in the control condition (3.8% of all trials), 65 in the color modifier condition (3.8% of all trials), and 147 in the location modifier condition (8.5% of all trials). Data from the remaining 1450 trials were analyzed.
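The duration-based exclusion rule can be sketched as follows. The function name and data layout are hypothetical; the authors' actual screening was carried out in Data Viewer rather than in code like this.

```python
# Illustrative sketch of criterion (3): flag trials whose duration exceeds
# the mean trial duration by more than 3 standard deviations.

import statistics

def duration_outliers(durations_ms):
    mean = statistics.mean(durations_ms)
    sd = statistics.pstdev(durations_ms)  # population SD over observed trials
    cutoff = mean + 3 * sd
    return [i for i, d in enumerate(durations_ms) if d > cutoff]
```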
We measured reaction time to gauge overall search performance, defined as the time in milliseconds between the start of a trial and the subject’s response, which terminated the trial. Following Malcolm and Henderson (2009), we divided each trial into three search epochs: initiation, scanning, and verification. Initiation time was equivalent to initial saccade latency, the milliseconds that elapsed between scene onset and when the eye first moved to search the scene. Short initiation latencies (≤ 90 ms) were excluded from analysis. Scanning time was the time taken to traverse the scene before finding the target, defined as the time between initiation (initial saccade latency) and when the target object was first fixated. Verification time was defined as the time, in milliseconds, between when the observer first fixated the target object and the end of the trial.
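Assuming a chronological record of fixations with onset and offset times (in ms relative to scene onset) and screen coordinates, the epoch definitions above can be sketched as follows; the data layout and function name are hypothetical, not the authors' analysis code.

```python
# Illustrative sketch: divide one search trial into initiation, scanning,
# and verification epochs. fixations: list of (start_ms, end_ms, x, y) in
# chronological order; target_roi: (x0, y0, x1, y1) rectangle in pixels.

def epoch_durations(fixations, response_time_ms, target_roi):
    x0, y0, x1, y1 = target_roi
    # Initiation time: latency of the first saccade, approximated here as
    # the offset of the initial (central) fixation.
    initiation = fixations[0][1]
    # First fixation whose coordinates fall inside the target interest area.
    first_target_fix = next(
        (f for f in fixations if x0 <= f[2] <= x1 and y0 <= f[3] <= y1), None
    )
    if first_target_fix is None:
        return None  # target never fixated: trial excluded from analysis
    # Scanning time: first saccade onset to first fixation on the target.
    scanning = first_target_fix[0] - initiation
    # Verification time: first target fixation to the button press.
    verification = response_time_ms - first_target_fix[0]
    return {"initiation": initiation, "scanning": scanning,
            "verification": verification, "reaction_time": response_time_ms}
```

A trial with a 250 ms initial fixation, a first target fixation beginning at 480 ms, and a button press at 1200 ms would thus yield initiation, scanning, and verification times of 250, 230, and 720 ms, respectively.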
Each of the dependent measures (initiation time, scanning time, verification time, and reaction time) was analyzed in turn using a Bayesian mixed-effects model implemented with the brms package in R (Bürkner 2017, 2018). To facilitate model convergence, the adapt_delta parameter was set to 0.999999999999 for each model, and max_treedepth was set to 15. Because the measures analyzed were ex-Gaussian distributed, each model assumed an ex-Gaussian response distribution. Unless otherwise noted, each model used the default (weakly informative) priors and was maximally specified, with modifier condition as a fixed effect and random effects of item (scene) and subject with uncorrelated random intercepts and slopes; all other parameters (e.g., number of iterations) were set to their defaults. The modifier condition variable was centered prior to analysis, and the reference level was always the no modifier condition (e.g., Find the lamp). We consider differences to be reliable if the 95% credible interval (reported as an equal-tail interval) for the comparison does not contain zero, in which case the true value of β is unlikely to be zero (Nicenboim and Vasishth 2016).
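The reliability criterion can be illustrated with a small stand-in for the brms posterior summary: given draws of a coefficient β, compute the equal-tail 95% credible interval and check whether it excludes zero. This Python sketch is for exposition only; the reported analyses used brms in R, and the function names here are hypothetical.

```python
# Illustrative sketch of the decision rule: an effect is deemed "reliable"
# when the 95% equal-tail credible interval of its posterior draws
# excludes zero.

def equal_tail_interval(draws, mass=0.95):
    draws = sorted(draws)
    n = len(draws)
    lo = draws[int(((1 - mass) / 2) * n)]          # 2.5th percentile
    hi = draws[int((1 - (1 - mass) / 2) * n) - 1]  # 97.5th percentile
    return lo, hi

def reliable(draws, mass=0.95):
    lo, hi = equal_tail_interval(draws, mass)
    return not (lo <= 0.0 <= hi)  # True if the interval excludes zero
```

For instance, draws spread symmetrically around zero yield an interval containing zero (not reliable), whereas strictly positive or strictly negative draws yield an interval that excludes it.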
Experiment 1: Results
We predicted that the presence of a redundant, non-contrastive modifier would facilitate search, despite ostensibly violating the Gricean Maxim of Quantity. Specifically, we predicted that color and location modifiers would reduce scanning times, verification times, and overall reaction times relative to the no modifier control, but that initiation times would not differ across conditions.
On average, observers made an initial saccade 243 ms after scene onset (M = 243 ms, SD = 83 ms). Initiation time was longest on average in the no modifier condition (M = 257 ms, SD = 78 ms), followed by the color modifier condition (M = 255 ms, SD = 81 ms), and was shortest when a location modifier was present (M = 209 ms, SD = 82 ms; see Fig. 3a). Contrary to our expectation that initiation times would be insensitive to the modifier manipulation, initiation times differed numerically across conditions.
The model ran for 8000 iterations. Analysis using this model revealed no reliable difference between the color modifier and no modifier conditions (β = −0.50, 95% CI = [−8.33, 7.20]). Initiation times in the location modifier condition, however, did differ reliably from the no modifier condition (β = −48.54, 95% CI = [−58.48, −38.92]; see Fig. 4 for posterior draw visualizations).
Observers required 720 ms on average to scan the scene between executing an initial saccade and fixating the target for the first time (M = 720 ms, SD = 795 ms). The scanning epoch was longest when no modifier was provided (M = 928 ms, SD = 924 ms), shorter in the presence of a color modifier (M = 779 ms, SD = 759 ms), and shortest when observers were given a location modifier (M = 402 ms, SD = 531 ms; Fig. 3b). The decrease in mean scanning time in the presence of a non-contrastive modifier is consistent with our predictions.
The final model included random slopes for condition in the subject random effect, and random intercepts in the item random effect. The model did not converge with weakly informative priors, but did converge using a more informative prior with a Student t distribution on β parameter estimates (df = 3, μ = 0, σ = 20) and on standard deviations for the by-subject random slopes (df = 3, μ = 0, σ = 10), and an exponential prior (μ = 25) on the standard deviation corresponding to the location modifier condition for the by-subject random slopes. According to the final model, the difference in scanning time when a color modifier was provided as opposed to no modifier at all was reliable (β = −58.01, 95% CI = [−89.92, −27.11]), as was the difference when a location modifier was provided compared to no modifier (β = −231.42, 95% CI = [−268.41, −196.21]; Fig. 5).
The time between initially fixating the target and trial termination, or verification time, was 740 ms on average (M = 740 ms, SD = 693 ms). Verification took longest when a color modifier was present (M = 776 ms, SD = 780 ms), was slightly shorter when no modifier was present (M = 752 ms, SD = 701 ms), and was shortest when a location modifier was provided (M = 681 ms, SD = 556 ms; Fig. 3c).
The model revealed that none of the differences reported above were reliable. Verification time did not differ reliably when a color modifier was present compared to when no modifier was used (β = 9.22, 95% CI = [−26.23, 45.58]), and the presence of a location modifier similarly did not yield a reliable difference (β = −6.82, 95% CI = [−41.96, 29.73]; Fig. 6).
On average, observers took 1677 ms to indicate that they had successfully found the target object (M = 1677 ms, SD = 1038 ms). Reaction time was longest when no modifier was present (M = 1910 ms, SD = 1152 ms), was shorter when a color modifier was present (M = 1790 ms, SD = 1026 ms), and shortest when a location modifier was used (M = 1265 ms, SD = 751 ms; Fig. 3d).
The model did not converge with weakly informative priors, but did converge using more informative priors with a Student t distribution on β parameter estimates (df = 3, μ = 0, σ = 150), on standard deviations for random slopes (df = 3, μ = 0, σ = 15), and on the β parameter of the exponential distribution (df = 3, μ = 0, σ = 25), and an exponential prior on the standard deviations (μ = 50) corresponding to the by-item random intercepts and to the no modifier condition for the by-subject random slopes. The final model ran for 6000 iterations. According to the model, reaction time did not reliably differ when a color modifier was provided as opposed to no modifier at all (β = −42.36, 95% CI = [−110.98, 25.97]), but did differ reliably when a location modifier was provided compared to no modifier (β = −336.49, 95% CI = [−412.26, −264.52]; Fig. 7).
To summarize, initiation times were reliably faster when a location modifier was present, and did not differ when a color modifier was present as opposed to no modifier at all. Both color and location modifiers reliably shortened the scanning epoch. Verification times did not differ across modifier conditions, and only the location modifier reliably shortened reaction times.
Experiment 1: Discussion
We predicted that redundant, non-contrastive modifiers that constrain the target object colors and locations within a scene would facilitate visual search. Specifically, we expected search to be faster when a modifier was provided. Only scanning time showed the predicted facilitation for search instructions that contained non-contrastive modifiers. Counter to prior findings (Malcolm and Henderson 2009, 2010), we did not find a benefit of modifiers during the verification epoch. Consistent with our predictions, a non-contrastive color modifier reliably facilitated search during the scanning epoch, likely because it augmented the target object template.
The presence of a location modifier facilitated search, resulting in shorter scanning epoch durations, shorter response times, and, surprisingly, shorter initiation times. The location modifier in Experiment 1 was clearly beneficial overall. While the location modifier was not a contrastive modifier with respect to the target object, it was contrastive with respect to the spatial layout of the scene, and because the information it provided was uniquely relevant to visual search, the benefits of narrowing the region of the display to be searched outweighed the costs associated with processing additional linguistic material. We conducted a second experiment to determine whether the observed benefit of location information would hold when the modifier instead expressed the location of the target object relative to another object in the scene.
In Experiment 2, we changed the location modifier to express the spatial relationship between target objects and anchor objects in the scene (e.g., on the desk; Boettcher et al. 2018) rather than constraining the region of the scene containing the target to a single quadrant. The color modifier was less clearly beneficial in Experiment 1 than the location modifier was. We suspected the color modifier may have been less informative for targets that typically occurred in that color (e.g., red fire extinguisher), based on the observation that, in such cases, speakers are less likely to include color adjectives in referring expressions (Sedivy 2003; Mitchell et al. 2013; Westerbeek et al. 2015; Degen et al. 2020). To address this potential limitation, in Experiment 2 we excluded scenes and targets for which the target object’s color was too typical, and added additional scenes.
We again predicted that non-contrastive modifiers that improve reference precision would facilitate search according to our primary search measures: scanning time, verification time, and reaction time.
Experiment 2: Method
Fifty-four scene candidates were selected from the stimuli used in Experiment 1, stimuli from a previous study (Henderson and Hayes 2017), the Change Blindness Database (Sareen et al. 2015), and from Google image search. All scenes depicted human-made environments (e.g., kitchens, offices, drawers) and each contained only one instance of the target object type (e.g., only one mug).
As in Experiment 1, we conducted a norming study to verify that the intended target in each scene was relatively easy to find and that its color was easily identifiable. Twenty native English-speaking undergraduates enrolled at UC Davis completed a Qualtrics survey in which each of the 54 scenes was presented individually. As in the Experiment 1 norming study, subjects reported the location of the target object and its color separately for each scene, with responses recorded via text box. Subjects also rated how typical the target's color and location were on a Likert scale (1–7). We added these Likert scales to screen out cases where the location or color was highly typical of the object (e.g., fire extinguishers are almost always red), in which case the color modifier used in the search task would be less informative. Prior to the 53 experimental trials, subjects viewed an example trial in which a scene was displayed along with its location (e.g., on the desk) and color (e.g., white).
In addition to the exclusion criteria used in Experiment 1, we excluded scenes for which color typicality ratings spanned only the highest portion of the Likert scale (rated 6 and 7 only; n = 4), scenes for which the location of the target was described as inside another object (a drawer; n = 2), scenes for which the color of the object was clearly ambiguous (a solid colored object described with colors from different color families; n = 2), one scene that had high variability in both color and location Likert scores (SDs > 1.80), and scenes for which the target object was described as multi-colored more than once (n = 2). An additional three scenes were excluded because the authors judged them too sparse to pose a challenge (i.e., search performance in all conditions would likely be at ceiling). In total, we excluded 14 scenes that met our exclusion criteria, and selected the remaining 39 scenes for the second eye-tracking experiment.
Participants were 50 native English-speaking adults enrolled at UC Davis. All subjects had normal or corrected-to-normal visual acuity and normal color vision. Subjects were naive to the purpose of the experiment and provided informed consent to participate. Two of the subjects could not be accurately tracked. Data from the remaining 48 subjects were analyzed.
The apparatus was identical to that of Experiment 1. As in Experiment 1, eye movements were recorded from the right eye only, except for two subjects for whom the right eye could not be accurately tracked, and instead eye movements were recorded from the left eye.
The modifier manipulation was identical to that of Experiment 1, except that the location modifier was a prepositional phrase that identified the location of the target object relative to an anchor object in the scene (Find the dice on the desk).
Each experimental session consisted of 39 experimental trials. The modifier manipulation was implemented within-subject such that 13 trials did not include a modifier, 13 trials included a color modifier, and the other 13 included a location modifier. The scenes and all modifiers were equally distributed and counterbalanced across 3 lists. Subjects were assigned to one of the three lists at random.
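The counterbalancing scheme above is a standard Latin-square rotation; a minimal sketch (with hypothetical scene labels, not our actual stimulus code) is:

```python
# Latin-square counterbalancing: 39 scenes x 3 conditions rotated across 3 lists,
# so each list contains 13 scenes per condition and, across lists, every scene
# appears in each condition exactly once. Scene labels are hypothetical.
CONDITIONS = ["no_modifier", "color", "location"]

def build_lists(scenes, n_lists=3):
    lists = []
    for l in range(n_lists):
        # Rotate the condition assigned to each scene by the list index.
        lists.append({s: CONDITIONS[(i + l) % len(CONDITIONS)]
                      for i, s in enumerate(scenes)})
    return lists

scenes = [f"scene_{i:02d}" for i in range(39)]
lists = build_lists(scenes)

# Each of the 3 lists contains 13 trials per condition.
for lst in lists:
    for c in CONDITIONS:
        assert sum(v == c for v in lst.values()) == 13
```

Assigning each subject to one of the three rotated lists yields the within-subject design described above.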
The experimental procedure for Experiment 2 was identical to that of Experiment 1, except that there were 39 experimental trials.
We excluded 144 trials corresponding to three scenes that were not equally represented across lists due to a counterbalancing error. An additional 179 trials (out of 1872 trials total) were excluded using the same criteria applied to the data collected in Experiment 1: 61 in the control condition (3.3% of all trials), 55 in the color modifier condition (2.9% of all trials), and 63 in the location modifier condition (3.4% of all trials). Data from the remaining 1549 trials were analyzed.
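As a sanity check, the exclusion bookkeeping above can be reproduced directly (all numbers taken from the text):

```python
# Verify the trial counts reported for Experiment 2 (numbers from the text).
total = 39 * 48                 # 39 trials x 48 subjects = 1872
counterbalance_excl = 144       # 3 mis-counterbalanced scenes x 48 subjects
criterion_excl = 61 + 55 + 63   # per-condition exclusions = 179

remaining = total - counterbalance_excl - criterion_excl
print(remaining)  # 1549

# Per-condition exclusion rates as percentages of all 1872 trials.
rates = [round(100 * n / total, 1) for n in (61, 55, 63)]
print(rates)  # [3.3, 2.9, 3.4]
```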
Calculation and definition of dependent measures were identical to those of Experiment 1.
All analyses were carried out using Bayesian mixed-effect models. Model building and analysis criteria were the same as in the data analysis in Experiment 1.
Experiment 2: Results
Our predictions for Experiment 2 were the same as those for Experiment 1. We expected that non-contrastive modifiers would improve search performance by reducing the duration of the primary search epochs and improving spatial search efficiency.
Observers made an initial saccade 253 ms after scene onset on average (M = 253 ms, SD = 84 ms). Initiation time was longest on average when no modifier was present (M = 257 ms, SD = 87 ms), followed by the color modifier condition (M = 252 ms, SD = 84 ms), and was shortest when a location modifier was present (M = 251 ms, SD = 81 ms; see Fig. 8a).
The final model ran for 4000 iterations. The model revealed no reliable difference in initiation times between the color modifier and no modifier conditions (β = −0.40, 95% CI = [−7.52, 6.99]) or between the location modifier condition and the no modifier condition (β = −0.70, 95% CI = [−8.14, 6.69]; Fig. 9).
Observers required 741 ms on average to scan the scene between executing an initial saccade and fixating the target for the first time (M = 741 ms, SD = 745 ms). The scanning epoch was longest when no modifier was provided (M = 759 ms, SD = 721 ms), shorter in the presence of a color modifier (M = 745 ms, SD = 811 ms), and shortest when observers were given a location modifier (M = 717 ms, SD = 697 ms; see Fig. 8b). The decrease in means when modifiers were present was again consistent with our predictions.
To analyze scanning time, we constructed a Bayesian mixed-effects model, which ran for 6000 total iterations. The model did not reveal a reliable difference in scanning time when there was a color modifier as compared to no modifier (β = 4.03, 95% CI = [−46.48, 57.36]). Scanning time also did not reliably differ when a location modifier was provided as compared to when no modifier was given (β = −28.73, 95% CI = [−78.48, 18.68]; Fig. 10).
Verification time was 785 ms on average (M = 785 ms, SD = 784 ms). Verification required the most time when no modifier was present (M = 812 ms, SD = 781 ms), was shorter when a location modifier was present (M = 780 ms, SD = 775 ms), and was shortest when a color modifier was provided (M = 762 ms, SD = 796 ms; see Fig. 8c). Numerically speaking, the decrease in verification time when either type of modifier was present supported our prediction.
The final model ran for 4000 iterations. The model revealed that none of the differences reported above were reliable. Verification time did not reliably differ when a color modifier was present compared to when no modifier was used (β = −17.36, 95% CI = [−54.90, 20.04]), nor did it reliably differ with the presence of a location modifier (β = −0.73, 95% CI = [−34.45, 34.07]; Fig. 11).
On average, observers took 1755 ms to indicate that they had successfully found the target object (M = 1755 ms, SD = 1104 ms). Reaction time was longest when no modifier was present (M = 1805 ms, SD = 1115 ms), was slightly shorter when a color modifier was present (M = 1737 ms, SD = 1124 ms) and shortest when a location modifier was used (M = 1724 ms, SD = 1073 ms; Fig. 8d).
The final model ran for 6000 iterations. The model revealed that the small numerical differences reported above were not reliable. Reaction time did not reliably differ when a color modifier was present compared to when no modifier was used (β = 2.22, 95% CI = [−71.13, 80.74]), or when a location modifier was present (β = −8.86, 95% CI = [−89.49, 73.14]; Fig. 12).
In sum, the numerical trends for all measures were largely consistent with our prediction in that search was faster when a non-contrastive modifier was present as opposed to no modifier at all; however, unlike the results of Experiment 1, the numerical differences for the scanning epoch were not reliable (Fig. 12).
Experiment 2: Discussion
In Experiment 2, non-contrastive modifiers did not enable observers to locate targets more quickly. Our results differed across Experiments 1 and 2 with respect to the primary search epoch: scanning time was faster in Experiment 1 for both modifier conditions relative to the unmodified baseline, but the presence of a modifier did not reliably facilitate scanning time in Experiment 2. A surprising aspect of these results is that the color modifier was the same in both experiments, yet only reliably facilitated search in Experiment 1. To further investigate the discrepancy across experiments, we conducted an exploratory analysis in which we compared search performance on the same measures analyzed previously, but only considered the scenes that were common to both experiments.
Analysis of shared items
Results based on our primary measure of search performance (scanning time) differed across experiments. A different effect of the location modifier across the two experiments might be expected, because it was not implemented the same way in both. However, we expected the color modifier to have the same effect in both experiments, and that is not what we found. To determine whether the differences can be attributed to the different scenes tested across experiments, we conducted an additional analysis restricted to the items and conditions that were common to both experiments.
There were 22 scenes shared across the two experiments. The dataset included only trials in the no modifier and color modifier conditions, as the search instruction was identical in these conditions across experiments. The combined dataset consisted of 621 trials from Experiment 1 and 636 trials from Experiment 2, a total of 1257 trials.
Dependent variables in the current analysis included scanning time (our primary measure of interest) and reaction time.
The combined dataset was analyzed in the same manner as the individual datasets for each experiment, except that all Bayesian mixed-effects models included a main effect of experiment (1 or 2) and an interaction between experiment and modifier condition. The experiment fixed effect was centered prior to analysis. Additionally, because the intent of the analysis was to further examine the difference in facilitation from the color modifier across experiments, only the no modifier and color modifier conditions were included in the models. Each model included subject and item random effects with uncorrelated random slopes and intercepts, unless otherwise noted.
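A minimal sketch of this coding scheme (hypothetical predictor names, not our actual analysis code): the experiment factor is centered to ±0.5, so its main effect is evaluated at the grand mean, and is crossed with a dummy-coded modifier condition.

```python
# Fixed-effect predictors for the combined analysis: a centered experiment
# factor (-0.5 / +0.5), a dummy-coded color-modifier condition
# (0 = no modifier, 1 = color), and their interaction.
# Function and label names are hypothetical, not the actual analysis code.
def make_predictors(experiment, condition):
    exp_c = -0.5 if experiment == 1 else 0.5
    color = 1.0 if condition == "color" else 0.0
    return {"exp_c": exp_c, "color": color, "exp_c:color": exp_c * color}

print(make_predictors(1, "color"))  # {'exp_c': -0.5, 'color': 1.0, 'exp_c:color': -0.5}
print(make_predictors(2, "none"))   # {'exp_c': 0.5, 'color': 0.0, 'exp_c:color': 0.0}
```

With this coding, the interaction term tests whether the color-modifier effect differs between experiments.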
Observers required 822 ms on average to scan the scene prior to fixating the target for the first time (M = 822 ms, SD = 798 ms). On average, scanning time was shorter in Experiment 1 (M = 816 ms, SD = 795 ms) than Experiment 2 (M = 827 ms, SD = 803 ms). When no modifier was present, scanning time was longer in Experiment 1 (M = 913 ms, SD = 892 ms) than in Experiment 2 (M = 834 ms, SD = 782 ms). When a color modifier was present, scanning time was shorter on average in Experiment 1 (M = 718 ms, SD = 670 ms) than in Experiment 2 (M = 820 ms, SD = 824 ms). The numerical trends were not consistent across experiments (Fig. 13).
The final model ran for 4000 iterations. The model revealed that the difference in scanning time when there was a color modifier as compared to no modifier was reliable (β = −78.64, 95% CI = [−128.77, −27.61]), and there was no simple main effect of experiment (β = −34.64, 95% CI = [−88.30, 19.65]). With respect to the interaction between condition and experiment, slopes between the color modifier condition and the no modifier condition did not reliably differ across experiments (β = 64.43, 95% CI = [−5.48, 131.90]; Fig. 14). The results demonstrated an overall benefit of redundant color modifiers.
Response times were 1842 ms on average (M = 1842 ms, SD = 1085 ms), and were shorter in Experiment 1 (M = 1784 ms, SD = 1028 ms) than Experiment 2 (M = 1898 ms, SD = 1136 ms). When no modifier was present, reaction times were shorter in Experiment 1 (M = 1891 ms, SD = 1132 ms) than in Experiment 2 (M = 1946 ms, SD = 1158 ms). Similarly, reaction times were shorter on average in Experiment 1 (M = 1677 ms, SD = 901 ms) than in Experiment 2 (M = 1850 ms, SD = 1114 ms) when a color modifier was present. The numerical trends were not consistent across experiments.
The final model ran for 8000 iterations total. The model did not converge with weakly informative priors, but did converge using more informative priors with Student t distributions on the β parameter estimates (df = 3, μ = 0, σ = 150) and on the standard deviations for random effects (df = 3, μ = 0, σ = 150). The model revealed no reliable difference in reaction time when a color modifier was provided as opposed to no modifier at all (β = −68.74, 95% CI = [−175.13, 36.40]). Similarly, the difference in reaction time across experiments was not reliable (β = −17.89, 95% CI = [−132.57, 100.17]), nor did the slopes reliably differ for the color modifier vs. no modifier condition across experiments (β = 44.22, 95% CI = [−44.61, 131.16]; Fig. 15).
In sum, redundant color modifiers facilitated search overall during the scanning epoch, though the benefit of color modifiers was not observed for reaction time. We conclude that redundant color modifiers facilitated search for the scenes common to both experiments, and perhaps color modifiers were less useful in the new scenes that were introduced in Experiment 2.
The current study investigated whether perceptually relevant, but non-contrastive modifiers would facilitate visual search despite increasing the complexity of the search instruction. Specifically, we compared search duration when the search instruction included information about either the color of the target object or its location in the scene to performance when only the object name was given (e.g., Find the lamp). We predicted that non-contrastive color information would serve to augment the target object template and thereby facilitate template-guided search. We predicted that the addition of location information would facilitate search by constraining the region of the display to be searched.
Our findings for the primary search measure—scanning time—partially support our hypothesis that non-contrastive, but task-relevant modifiers improve reference precision (Degen et al. 2020) and therefore facilitate visual search. In Experiment 1, color modifiers facilitated search as measured by scanning time, likely by augmenting the target object template. Our findings for Experiment 1 are consistent with prior work on template-based guidance in visual search (Malcolm and Henderson 2009, 2010; Castelhano and Heaven 2010) and further demonstrate that color information can enrich the target template, consistent with Bahle et al. (2019), even when the color is communicated verbally. As predicted, the location modifier reduced search duration in Experiment 1 during the scanning epoch, consistent with Arts et al. (2011). The numerical trends in Experiment 2 were consistent with our predictions, but the differences were not reliable. However, there was an overall benefit of color modifiers on scanning time as revealed by an analysis incorporating data from trials in both experiments that used the same scenes, suggesting that redundant color modifiers were generally beneficial.
In Experiment 1, the location modifier was more beneficial than the color modifier: While both modifiers helped observers locate the target faster, the benefit was larger for the location modifier. The simplest explanation for this observation is that location was more task-relevant. Template-based guidance, while certainly useful, is less useful in a search task than constraining the space to be searched. Because speakers frequently overmodify with color adjectives (Pechmann 1989; Belke and Meyer 2002; Sedivy 2003), it is also possible that color modifiers are less useful in part because they are more typical, or, at the very least, atypical modifiers may prime the observers to process the referring expression more carefully. Another possibility, consistent with Degen et al. (2020), is that color was a noisy, less-precise cue. A limitation of the current study is that we did not measure color entropy for the scenes tested (Toutouri et al. 2017). Augmenting the target object template with color information is less helpful in a scene with high entropy for that color (e.g., if multiple regions outside of the target object region share the target’s color). It is possible that color information did not improve reference precision if the target object’s color was well-represented in the scene. Future work should investigate whether a color modifier in the search instruction can facilitate visual search in real-world scenes better when color entropy is controlled for systematically.
The location modifier was clearly more beneficial in Experiment 1 than in Experiment 2. There are several possible explanations for why the modifier was more helpful in the first experiment. The first, and perhaps most obvious, is that the location modifier in Experiment 1 narrowed the region of the scene to be searched to a single quadrant (e.g., on the top left), which facilitated search because observers need not waste time exploring the other three scene quadrants. The instruction also allowed observers to rapidly orient to the region of the scene containing the target, as indicated by faster initiation times when the location modifier communicated the relevant scene quadrant to search. Indeed, observers may have decided where to search in the scene before the image appeared, as evidenced by faster reaction times for the location modifier in Experiment 1 only. The former explanation alone may be the reason for our disparate findings across experiments, but there are additional factors that may have rendered the modifier less useful in Experiment 2. The location modifier in Experiment 2 referenced an anchor object relative to the target object (e.g., on the shelf). For the anchor object information to be useful, observers first must extract scene gist and then use gist information to constrain the region of the display to be searched to where the anchor object would be likely to be in the scene. While scene gist extraction is rapid (Castelhano and Henderson 2007), unlike the quadrant information provided in Experiment 1, observers cannot benefit from gist before the scene appears. More importantly, the anchor objects varied in size, and in some cases referencing the anchor object may not have constrained the search region as much as in others, even if its location would have been highly predictable before the scene was shown (e.g., a shelf is probably in the upper part of the scene). 
In other words, the location information given in Experiment 2 was both noisier (anchor objects varied in size and predicted location), and less spatially constraining. Future work could address the latter limitation by controlling for the size of anchor objects in the scene.
At first glance, the difference between the results of Experiments 1 and 2 presents a puzzle. In Experiment 1, the addition of non-contrastive but task-relevant modifiers clearly facilitated visual search, as evidenced by reliably faster scanning times. While the numerical trends fell in the predicted direction in Experiment 2, the differences were not reliable, suggesting that modifiers—even the color modifier, which was the same across both experiments—were far less beneficial, at least for scenes that were introduced in Experiment 2. It is possible that the new scenes were ill-suited to template-guided search using color information, perhaps due to differences in color entropy between the new and old scenes. Future work could address that possibility by measuring and controlling for color entropy in scenes. Another possibility is that the location modifier in Experiment 1, which constrained the search region to a single quadrant of the scene, was highly reliable, and the occasional inclusion of highly reliable information may have cued subjects to attend to and fully process the search instructions throughout the experiment. Given that the modifiers in Experiment 2 were noisier—or at least were less beneficial to subjects—it is possible that subjects used a shallow processing strategy when reading the search instruction (e.g., used good-enough processing, Ferreira 2003). Reading studies have shown that temporarily ambiguous sentences were read faster when they were followed by superficial questions about the sentences as opposed to when the questions probed how the ambiguity was interpreted (Swets et al. 2008; Tan and Foltz 2020). In other words, task difficulty modulated how carefully subjects read the sentences, because reading the sentence carefully would improve performance on difficult comprehension questions, suggesting that subjects use shallow, good-enough reading strategies when reading more carefully offers them no clear benefit.
Crucially, the aforementioned task effect occurred even though subjects only encountered difficult comprehension questions 33% of the time. It is possible that observers in Experiment 1 read the search instruction more carefully than those in Experiment 2 because doing so made the task considerably easier for them when the instruction contained a location modifier (33% of the time). This may also explain why the scanning epoch was shorter in Experiment 2 on average than in Experiment 1: If observers came to rely on the information redundant modifiers provided in Experiment 1, search may have been slightly more difficult in their absence. By contrast, in Experiment 2 observers may have skimmed the instruction on each trial for the name of the target object (the head noun of the target phrase) while ignoring other information (any modifiers). It may be advantageous for observers to employ a shallow processing strategy when reading the search instruction if the redundant information is not especially useful and if maintaining the information would impose a working memory load. Future research could investigate whether good-enough processing of the search instruction predicts slower search by collecting reading measures on redundant modifiers in the search instruction and determining whether evidence for shallow processing (faster reading times or higher skip rates) for redundant information predicts slower search performance.
One surprising aspect of our results is that the search instruction affected initiation time in Experiment 1 only, which was not expected because no benefit of target template enrichment on search initiation time was reported in prior work (Malcolm and Henderson 2009, 2010). However, we suspect the difference is related to the nature of the location modifier in Experiment 1 rather than the effect of enriching reference precision more generally. The location modifier in Experiment 1 communicated which quadrant of the scene, specifically, contained the target object. It is therefore not surprising that subjects initiated search more rapidly when the instruction reduced the search region under consideration to a single quadrant, which allowed subjects to more rapidly select a region to foveate within that quadrant by ruling out candidate regions in other quadrants. An alternative explanation is that subjects shifted their gaze from the center of the screen to the aforementioned screen quadrant, and subsequently initiated search within the quadrant. The current study is unable to differentiate between these two possible explanations.
Consistent with previous paradigms used both in the visual search and reference processing literatures, our task used simple referential expressions (such as “Find the lamp”) as the search instruction, and made as few changes to the instruction as possible across conditions. The current study cannot speak to the naturalness of these simple referential expressions in everyday contexts, and it is possible that speakers would formulate search instructions differently. In a future study, the question could be addressed empirically by providing subjects with the scenes and target objects, then asking them either to formulate a search instruction de novo, or to complete a search instruction prompt (e.g., Find the _____). The consistency of the instructions produced for each scene could inform the results of the current study (e.g., perhaps the search instructions we used were more natural for some scenes than others). The productions could then serve as search instructions in a follow-up visual search task. Such a production task could also inform how speakers formulate referential expressions; namely, to determine how cooperative (or Gricean) speakers are when producing referential expressions for an interlocutor who will later search the image for the target: for example, perhaps speakers will produce a natural “on the desk” location modifier completion when there is no interlocutor (e.g., they are not told someone will use the instructions to search for the target) but would use a more helpful modifier like “on the top left” when that information could ostensibly be useful to another party.
It is also unclear how including both a color modifier and a location modifier in the search instruction might have influenced search behavior. On the one hand, adding linguistic material might bias observers to process the instructions in a good-enough manner, in which case observers should take longer to search scenes when multiple modifiers are present than when only one modifier is. On the other hand, the cues may be additively beneficial (as in Malcolm and Henderson 2010; Castelhano and Heaven 2010; Hout and Goldinger 2015), in which case having both a color and a location modifier would result in faster scanning times than the location modifier alone. Alternatively, performance when both modifiers are present might be the same as when only an informative location modifier is present, which would suggest that only the most informative cue dominates.
This study expands on work that has shown spoken language presented concurrent with scene viewing can facilitate visual search (Spivey et al. 2001; Tyler and Spivey 2001), even when the spoken information is redundant (Lupyan and Spivey 2010). We have shown that non-contrastive information presented in written form prior to scene viewing facilitates template-guided visual search, so long as the non-contrastive information is useful for referent identification.
Our results have practical implications outside of the laboratory. Interestingly, our findings suggest that neither of the two intuitive assumptions about the inclusion of non-contrastive modifiers is necessarily correct: That is, contrary to traditional Gricean accounts, speakers need not be as efficient as possible, or non-redundant in their communications, but contrary to some rational speaker accounts, inclusion of non-contrastive modifiers—even those relevant to target identification—is not always beneficial. Non-contrastive modifiers helped only when they added sufficiently useful information to the search instruction, which suggests that instructions in other domains (e.g., in the context of real-world navigation, in the design of a smartphone app or website, or in medical instructions) should be as clear, direct, and minimal as possible, and information beyond what is required for referent identification should only be added when that information is unambiguously beneficial to the user. Notably, our findings also suggest that simply piling on modifiers to assist in the formation of a more precise template is not necessarily helpful, as the utility of one modifier (e.g., color) may depend on the reliability of any others and will vary depending on whether the modifier picks out a typical or atypical property. In other words, practical, real-world decisions concerning details to include in instructions should be made strategically. Instructions should be as simple and straightforward as possible, with non-contrastive modifiers provided if they are likely to facilitate performance given the other co-present sources of information.
The current study investigated whether redundant but perceptually relevant information about the target object would facilitate visual search in real-world scenes. We conducted two eye-tracking visual search experiments in which we either included non-contrastive information in the search instruction about the color or location of the target in the scene, or provided only the name of the target object. Task-relevant, redundant modifiers in the search instruction facilitated visual search only when one such modifier was highly reliable. Consistent with Degen et al. (2020), we conclude that referring expressions containing non-contrastive information can nevertheless be appropriately informative—not overinformative—when the redundant information is useful, and that interlocutors engage in rational reference interpretation.
Availability of data and materials
All data and materials are available on the OSF: https://osf.io/uwqdm/.
In supplemental analyses, we additionally examined spatial efficiency using scan path ratio and conducted shared item analyses on all of the dependent variables reported individually, for all experimental conditions. See supplemental materials on the OSF for details.
Arts, A., Maes, A., Noordman, L. G. M., & Jansen, C. (2011). Overspecification facilitates object identification. Journal of Pragmatics, 43(1), 361–374.
Bahle, B., Matsukura, M., & Hollingworth, A. (2019). Contrasting gist-based and template-based guidance during real-world visual search. Journal of Experimental Psychology: Human Perception and Performance, 44(3), 367–386.
Belke, E., & Meyer, A. (2002). Tracking the time course of multidimensional stimulus discrimination: Analyses of viewing patterns and processing times during “same”–“different” decisions. European Journal of Cognitive Psychology, 14(2), 237–266.
Boettcher, S. E. P., Draschkow, D., Dienhart, E., & Võ, L.-H. M. (2018). Anchoring visual search in scenes: Assessing the role of anchor objects on eye movements during visual search. Journal of Vision, 18(13):11, 1–13.
Bravo, M. J., & Farid, H. (2009). The specificity of the search template. Journal of Vision, 9(1):34, 1–9.
Bürkner, P. (2017). brms: An R package for Bayesian multilevel models using Stan. Journal of Statistical Software, 80(1), 1–28.
Bürkner, P. (2018). Advanced Bayesian multilevel modeling with the R package brms. The R Journal, 10(1), 395–411.
Castelhano, M. S., & Heaven, C. (2010). The relative contribution of scene context and target features to visual search in scenes. Attention, Perception, and Psychophysics, 72(5), 1283–1297.
Castelhano, M. S., & Henderson, J. M. (2007). Initial scene representations facilitate eye movement guidance in visual search. Journal of Experimental Psychology: Human Perception and Performance, 33(4), 753–763.
Castelhano, M. S., Pollatsek, A., & Cave, K. R. (2008). Typicality aids search for an unspecified target, but only in identification and not in attentional guidance. Psychonomic Bulletin and Review, 15(4), 795–801.
Clark, H. H., & Murphy, G. L. (1982). Audience design in meaning and reference. Advances in Psychology, 9, 287–299.
Davies, C., & Katsos, N. (2013). Are speakers and listeners only moderately Gricean? An empirical response to Engelhardt et al. (2006). Journal of Pragmatics, 49, 78–106.
Degen, J., Hawkins, R. D., Graf, C., Kreiss, E., & Goodman, N. D. (2020). When redundancy is useful: A Bayesian approach to “overinformative” referring expressions. Psychological Review, 127(4), 591–621.
Engelhardt, P. E., Bailey, K. G. D., & Ferreira, F. (2006). Do speakers and listeners observe the Gricean Maxim of Quantity? Journal of Memory and Language, 54, 554–573.
Engelhardt, P. E., Demiral, Ş. B., & Ferreira, F. (2011). Over-specified referring expressions impair comprehension: An ERP study. Brain and Cognition, 77, 304–314.
Engelhardt, P. E., & Ferreira, F. (2016). Reaching sentence and reference meaning. In P. Knoeferle, P. Pyykkönen-Klauck, & M. W. Crocker (Eds.), Visually situated language comprehension (Advances in Consciousness Research, Vol. 93, p. 127). John Benjamins Publishing Company.
Ferreira, F. (2003). The misinterpretation of noncanonical sentences. Cognitive Psychology, 47(2), 164–203.
Gatt, A., Krahmer, E., van Deemter, K., & van Gompel, R. P. (2017). Reference production as search: The impact of domain size on the production of distinguishing descriptions. Cognitive Science, 41, 1457–1492.
Gatt, A., van Gompel, R. P., Krahmer, E., & van Deemter, K. (2011). Non-deterministic attribute selection in reference production. In Proceedings of the workshop on production of referring expressions: Bridging the gap between empirical, computational and psycholinguistic approaches to reference (PRE-CogSci’11). Boston, MA.
Gibson, E., Futrell, R., Piantadosi, S. T., Dautriche, I., Mahowald, K., Bergen, L., & Levy, R. (2019). How efficiency shapes human language. Trends in Cognitive Sciences, 23(5), 389–407.
Grice, H. P. (1975). Logic and conversation. In Peter Cole & Jerry L. Morgan (Eds.), Syntax and Semantics 3: Speech Acts (pp. 41–58). New York: Academic Press.
Henderson, J. M., & Hayes, T. R. (2017). Meaning-based guidance of attention in scenes as revealed by meaning maps. Nature Human Behaviour, 1(10), 743.
Henderson, J. M., & Ferreira, F. (2004). Scene perception for psycholinguists. In J. M. Henderson & F. Ferreira (Eds.), The interface of language, vision, and action: Eye movements and the visual world (pp. 1–58). Psychology Press.
Hout, M. C., & Goldinger, S. D. (2015). Target templates: The precision of mental representations affects attentional guidance and decision-making in visual search. Attention, Perception, and Psychophysics, 77, 128–149.
Hout, M. C., Robbins, A., Godwin, H. J., Fitzsimmons, G., & Scarince, C. (2017). Categorical templates are more useful when features are consistent: Evidence from eye movements during search for societally important vehicles. Attention, Perception, and Psychophysics, 79, 1578–1592.
Konopka, A. E., & Brown-Schmidt, S. (2014). Message encoding. In M. Goldrick, V. Ferreira, & M. Miozzo (Eds.), The Oxford handbook of language production (pp. 3–20). Oxford: Oxford University Press.
Koolen, R., Goudbeek, M., & Krahmer, E. (2013). The effect of scene variation on the redundant use of color in definite reference. Cognitive Science, 37, 395–411.
Lupyan, G., & Spivey, M. J. (2010). Redundant spoken labels facilitate perception of multiple items. Attention, Perception, and Psychophysics, 72(8), 2236–2253.
Malcolm, G. L., & Henderson, J. M. (2009). The effects of target template specificity on visual search in real-world scenes: Evidence from eye movements. Journal of Vision, 9(11), 8.
Malcolm, G. L., & Henderson, J. M. (2010). Combining top-down processes to guide eye movements during real-world scene search. Journal of Vision, 10(2), 4.
Maxfield, J. T., Stalder, W. D., & Zelinsky, G. J. (2014). Effects of target typicality on categorical search. Journal of Vision, 14(12), 1–11.
Mitchell, M., Reiter, E., & van Deemter, K. (2013). Typicality and object reference. Proceedings of the Annual Meeting of the Cognitive Science Society, 35(35), 3062–3067.
Nicenboim, B., & Vasishth, S. (2016). Statistical methods for linguistic research: Foundational Ideas—Part II. Language and Linguistics Compass, 10, 591–613.
Pechmann, T. (1989). Incremental speech production and referential overspecification. Linguistics, 27(1), 89–110.
Rao, R. P., Zelinsky, G. J., Hayhoe, M. M., & Ballard, D. H. (2002). Eye movements in iconic visual search. Vision Research, 42(11), 1447–1463.
Reeder, R. R., & Peelen, M. V. (2013). The contents of the search template for category-level search in natural scenes. Journal of Vision, 13(3), 13.
Rubio-Fernández, P. (2016). How redundant are color adjectives? An efficiency-based analysis of color overspecification. Frontiers in Psychology, 7, 153.
Sareen, P., Ehinger, K. A., & Wolfe, J. M. (2015). CB database: A change blindness database for objects in natural indoor scenes. Behavior Research Methods, 48(4), 1343–1348.
Schmidt, J., & Zelinsky, G. J. (2009). Search guidance is proportional to the categorical specificity of a target cue. The Quarterly Journal of Experimental Psychology, 62(10), 1904–1914.
Sedivy, J. C. (2003). Pragmatic versus form-based accounts of referential contrast: Evidence for effects of informativity expectations. Journal of Psycholinguistic Research, 32, 3–23.
Spivey, M. J., Tyler, M. J., Eberhard, K. M., & Tanenhaus, M. K. (2001). Linguistically mediated visual search. Psychological Science, 12(4), 282–287.
SR Research (2017). EyeLink 1000 plus user manual, version 1.0.2. Mississauga, ON: SR Research Ltd.
SR Research (2017). EyeLink data viewer user manual, version 3.1.97. Mississauga, ON: SR Research Ltd.
Swets, B., Desmet, T., Clifton, C., & Ferreira, F. (2008). Underspecification of syntactic ambiguities: Evidence from self-paced reading. Memory and Cognition, 36, 201–216.
Tan, M., & Foltz, A. (2020). Task sensitivity in L2 English speakers’ syntactic processing: Evidence for good-enough processing in self-paced reading. Frontiers in Psychology, 11, 575847.
Tourtouri, E. N., Delogu, F., & Crocker, M. W. (2017). Specificity and entropy reduction in situated referential processing. In Proceedings of the 39th annual meeting of the cognitive science society (pp. 3356–3361).
Tyler, M. J., & Spivey, M. J. (2001). Spoken language comprehension improves the efficiency of visual search. In Proceedings of the 23rd annual meeting of the cognitive science society (pp. 1088–1093).
Vickery, T. J., King, L.-W., & Jiang, Y. (2005). Setting up the target template in visual search. Journal of Vision, 5(1), 8.
Westerbeek, H., Koolen, R., & Maes, A. (2015). Stored object knowledge and the production of referring expressions: The case of color typicality. Frontiers in Psychology, 6, 935.
We thank the editor and two anonymous reviewers for their insightful feedback, which improved the manuscript considerably. We thank the Ferreira Lab and the Visual Cognition Lab at UC Davis for their helpful feedback on this project. We thank our research assistants John Hansen, Agata Kelman, Wallace Lau, Pierre Llorach, and Kevin Ly, who helped collect data.
This research was supported by the National Eye Institute of the National Institutes of Health under award number R01EY027792 awarded to John M. Henderson and the National Science Foundation under award number BCS-1650888 awarded to Fernanda Ferreira.
Ethics approval and consent to participate
The study was approved by the University of California, Davis Institutional Review Board, and procedures were carried out in accordance with the Declaration of Helsinki.
Competing interests
The authors have no conflicts of interest to declare.
Rehrig, G., Cullimore, R.A., Henderson, J.M. et al. When more is more: redundant modifiers can facilitate visual search. Cogn. Research 6, 10 (2021). https://doi.org/10.1186/s41235-021-00275-4