An ‘auditory scene’ is broadly defined as an array of concurrent sound sources (Gygi & Shafiro, 2010). Auditory scenes can be as simple as a set of pure tones or one or more chords, or as complex as an array of environmental sounds, like that encountered on a busy city street corner, in a restaurant kitchen during the lunch rush, or in a stadium filled for a rock concert. Increasing the complexity of an auditory scene increases the perceptual and cognitive demands of processing and can lead to misperceptions. Such misperceptions can manifest as inaccurate or inaccessible perceptual representations (Darwin et al., 1972; Näätänen & Winkler, 1999), which can be influenced by top-down factors such as attention, attributions of relevance, and interactions between short- and long-term memory processes (e.g., Zimmermann et al., 2016; Kidd et al., 2008; Gregg & Samuel, 2008; Cowan, 2001). Informational factors, such as stimulus similarity and uncertainty, also contribute to inaccurate or inaccessible perceptual experiences (see Dickerson & Gaston, 2014, for review).
An example of the pervasiveness of perceptual errors is evident in the ‘change deafness’ phenomenon, which describes the failure of listeners to notice changes when they are embedded within complex auditory scenes. Change deafness, like a similar finding in the visual literature, change blindness, demonstrates that, despite a subjective impression of coherence and completeness, perceptual experience is incomplete and can be inaccurate. Change deafness has been demonstrated across several auditory domains, including speech (Sinnett et al., 2006; Vitevitch, 2003), environmental sounds (Gregg & Samuel, 2008, 2009; Eramudugolla et al., 2005; Gregg & Snyder, 2012), and music (Agres & Krumhansl, 2008). Several authors have also demonstrated the difficulty of auditory change perception tasks using artificial (synthesized) scenes such as pure tones shaped into scenes via amplitude modulation and shaped noise arrays (e.g., Constantino et al., 2012). A review of the recent change deafness literature (Dickerson & Gaston, 2014) noted that changes are missed 20–50% of the time depending on various perceptual and cognitive factors. For example, similarity between the sound that is changed and the other sounds in the scene influences change perception accuracy, with changes that are acoustically and semantically dissimilar from background sounds producing fewer errors (Gregg & Samuel, 2009), as would be expected from a signal-to-noise ratio perspective. Gregg et al. (2014) demonstrated that familiarity is also an important factor in driving change perception errors by showing that temporally scrambled and unrecognizable sounds produced significantly more errors than unscrambled and recognizable sounds. The manner in which a change occurs also appears to influence performance. Constantino et al. (2012) found that listeners performed better when a new sound was added to a scene than when a sound was deleted.
Finally, change deafness seems to be influenced by attention: cueing or directing attention to the spatial location of a changed sound source can reduce the frequency of change perception errors (Eramudugolla et al., 2005; Backer & Alain, 2012).
The reportedly positive effect of providing a cue to the location of a change suggests that spatial position and spatial separation may be useful for perceptually segmenting a scene, which in turn may reduce change perception errors. Studies addressing the relationship between spatial separation and change perception, however, are few, are limited to virtual audio manipulations, and generally do not agree. Gregg and Samuel (2008) found no segregation advantage for spatially separated sources in a virtual array, whereas Eramudugolla et al. (2005) found that spatial separation resulted in significantly fewer change perception errors. More generally, spatial position can be a cue to successful perceptual segregation (e.g., Bregman, 1993; Yost, 1993, 1997), and a number of psychophysical studies have shown that auditory spatial cues benefit perceptual performance (e.g., Broadbent, 1954; Best et al., 2006; Jones & Litovsky, 2011), including auditory search (Eramudugolla et al., 2008). Specifically, spatial separation has been shown to reduce or eliminate informational masking effects (e.g., Ihlefeld & Shinn-Cunningham, 2008; Kidd et al., 1994), a phenomenon that may share common perceptual mechanisms with patterns in reports of change deafness (Dickerson & Gaston, 2014). There is clearly reason to expect that spatial separation may reduce change perception errors by reducing perceptual ambiguity, but, as noted above, the literature on this topic as it relates to change deafness is sparse and conflicting. The present study uses a physical multi-speaker array and compares scenes with spatially separated versus spatially co-located sounds to systematically evaluate the role of spatial separation in modulating change errors.
The use of sounds presented over speakers in the free field, rather than a virtual spatial manipulation, is an important methodological change from previous studies: virtual spatial arrays are often lateralized rather than truly localized (Yost, 1993), and artifacts associated with headphone lateralization or with the use of a generic head-related transfer function could, in part, explain the mixed results of previous studies investigating the role of spatial cues in change deafness.
In addition to the spatial manipulation, we follow the path of others in this area by examining how the type of change influences perceptual errors. In the current study, we instantiate changes via source additions and source removals. Change deafness studies in the past have manifested changes via source removals (e.g., Eramudugolla et al., 2005), ‘token and type changes’ (Gregg & Samuel, 2009), where a source in the scene is replaced with a signal that is either semantically and acoustically dissimilar (token change) or only acoustically dissimilar (type change), a ‘switch’, in which a sound is replaced with a different sound (Gregg et al., 2014), or a position ‘swap’, where two sounds change spatial position (Backer & Alain, 2012). Only Constantino et al. (2012) appear to have looked at both the addition and the deletion of a source within a single study context. Constantino et al. (2012) found that the addition of a source was easier to detect because the new source ‘pops out’ from the background, compared to a deletion, in which the information in each frequency band must be iteratively compared. The present study examines performance for both additions and removals, as there is some suggestion (from Constantino et al., 2012, and others) that fewer errors should occur for source additions, as these changes will be perceived as ‘onset events’ and may pop out. Onset events are likely to elicit an automatic allocation of attention (e.g., Samuel & Weiner, 2001), which is known to facilitate change perception in both vision (Miller, 1989) and audition (Sussman et al., 2003). Thus, the addition of a new sound to the scene should be especially salient, causing participants to make fewer errors in the addition than in the removal condition. Although source additions may be more salient events, all of the scenes in the present study commence with the same number of sounds.
Thus, an addition will result in a second scene that has two more sources than in the sound removal condition. If there is some limit on the number of stimuli that can be represented in memory, then there should be more errors in the source addition condition, with a trend toward fewer errors as the size of scene two decreases (from 5 to 4 to 3 in the Addition, No-change, and Removal conditions, respectively).
The idea that scene size, and therefore memory load, plays a role in change perception errors has been examined previously. Gregg and Samuel (2008) found evidence for change deafness (high errors) despite generally accurate performance on a cued recall task. In vision, Mitroff et al. (2004) report a similar finding; however, their results are less clear. Mitroff et al. (2004) in fact reported that memory for pre- and post-change scenes was preserved even when participants reported no awareness of a change, but in follow-up experiments they found that the stored representations are fragile; simply reversing the question order from cued-recall first to cued-recall last led to significant decrements in recall. To further explore the interaction between recall accuracy and change perception errors and to assess memory for scene elements, the present study presents participants with a cued recall task following each change perception trial.
Finally, change deafness is usually characterized in terms of hits, or accuracy, in indicating that a change has occurred. Change deafness, the failure to notice a change that has occurred, would be most directly measured by looking at hits or misses (e.g., Gregg & Samuel, 2008). However, restricting analyses to accuracy is a potential limitation because hit rates can be substantially influenced by listener response biases. In Signal Detection Theory (SDT; Macmillan & Creelman, 2005), changes in response bias correspond to changes in decision criteria that ultimately result in systematic changes in hit and false alarm rates. These systematic changes can be modeled in receiver operating characteristic space and show that, across changes in criteria, sensitivity remains the same. In the change blindness literature, the influence of response bias is recognized, and thus analyses typically report performance based on SDT measures of sensitivity in addition to accuracy measures (e.g., Mitroff et al., 2004). Although SDT approaches have not been broadly applied in the change deafness literature, there are notable examples (e.g., Eramudugolla et al., 2005; Gregg & Samuel, 2008; McAnally et al., 2010; Puschmann et al., 2013a, 2013b) that report evidence of change deafness despite using a bias-free measure of sensitivity. Here, we report measures of accuracy and SDT measures to examine patterns of hits and false alarms as well as a bias-free measure of sensitivity (d′) using an AX (same-different) task. We refer to the phenomenon of ‘change deafness’, but will also use the term ‘change discrimination’ where appropriate to denote the experimental procedure underlying measurement of the phenomenon.
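The separation of sensitivity from response bias described above can be illustrated with a minimal sketch. It assumes the textbook equal-variance Gaussian yes/no model, in which d′ = z(H) − z(F) and the criterion is c = −[z(H) + z(F)]/2. This is not necessarily the decision model used for the AX (same-different) analysis in the present study, since same-different designs are often analyzed with differencing or independent-observation models. The function names and the log-linear correction for extreme rates are illustrative assumptions, not part of the reported method.

```python
from statistics import NormalDist

# Standard normal distribution (mean 0, sd 1) used for z-transforms.
_STD_NORMAL = NormalDist()


def corrected_rates(hits, misses, false_alarms, correct_rejections):
    """Return (hit rate, false-alarm rate) from trial counts.

    Applies a log-linear correction (add 0.5 to each count) so that rates
    of exactly 0 or 1 do not produce infinite z-scores. The choice of
    correction is an assumption made for this sketch.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    return hit_rate, fa_rate


def d_prime(hit_rate, fa_rate):
    """Sensitivity under the equal-variance yes/no model: z(H) - z(F)."""
    return _STD_NORMAL.inv_cdf(hit_rate) - _STD_NORMAL.inv_cdf(fa_rate)


def criterion(hit_rate, fa_rate):
    """Response bias c = -(z(H) + z(F)) / 2; zero means no bias."""
    z_hit = _STD_NORMAL.inv_cdf(hit_rate)
    z_fa = _STD_NORMAL.inv_cdf(fa_rate)
    return -0.5 * (z_hit + z_fa)


def predicted_rates(sensitivity, bias):
    """Hit and false-alarm rates implied by a given d' and criterion c."""
    hit_rate = _STD_NORMAL.cdf(sensitivity / 2 - bias)
    fa_rate = _STD_NORMAL.cdf(-sensitivity / 2 - bias)
    return hit_rate, fa_rate


if __name__ == "__main__":
    # Hold sensitivity fixed and move the criterion: hit and false-alarm
    # rates both change, but the recovered d' does not.
    for bias in (-0.5, 0.0, 0.5):
        h, f = predicted_rates(1.5, bias)
        print(f"c={bias:+.1f}  H={h:.3f}  F={f:.3f}  d'={d_prime(h, f):.3f}")
```

Running the script holds sensitivity fixed while sweeping the criterion from liberal to conservative. Hit rates alone would suggest a change in performance across the three cases, whereas d′ is constant, which is the motivation for reporting both accuracy and SDT measures.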
To summarize, the present study fills a gap in the emerging change deafness literature by manipulating several common factors thought to influence change perception performance. We address the mixed results over the role of spatial cues in change deafness by comparing performance for spatially distributed and spatially co-located scenes using real spatial sources over a loudspeaker array. We examine two common change implementation strategies to investigate both the possibility that pop out could occur for additions and the secondary goal of evaluating set size effects. Finally, we present accuracy and SDT analyses together to eliminate the possibility that change discrimination errors are simply an artifact of listener bias (see Footnote 1).