Audience immersion: validating attentional and physiological measures against self-report

Hammond, Hugo; Armstrong, Michael; Thomas, Graham A.; Gilchrist, Iain D.

doi:10.1186/s41235-023-00475-0

Original article
Open access
Published: 19 April 2023

Hugo Hammond ORCID: orcid.org/0000-0003-2411-6924¹,
Michael Armstrong²,
Graham A. Thomas² &
Iain D. Gilchrist¹

Cognitive Research: Principles and Implications volume 8, Article number: 22 (2023)

Abstract

When an audience member becomes immersed, their attention shifts towards the media and story, and they allocate cognitive resources to represent events and characters. Here, we investigate whether it is possible to measure immersion using continuous behavioural and physiological measures. Using television and film clips, we validated dual-task reaction times, heart rate, and skin conductance against self-reported narrative engagement. We find that reaction times to a secondary task were strongly positively correlated with self-reported immersion: slower reaction times were indicative of greater immersion, particularly emotional engagement. Synchrony in heart rate across participants was associated with self-reported attentional and emotional engagement with the story, although we found no such relationship with skin conductance. These results establish both dual-task reaction times and heart rate as candidate measures for the real-time, continuous, assessment of audience immersion.

Significance statement

The average UK adult spends almost a third of their waking hours (5 hours and 40 minutes per day) watching television, film, or other online video content (Ofcom, 2021). One often-desired property of this media is that it elicits immersion: it can captivate viewers, sustain attention, and lead to total envelopment in the on-screen world. Despite the prevalence of media in our lives, however, psychologists and media creators understand relatively little about immersion. Immersion is conventionally measured through retrospective questionnaires, which may not be sensitive to in-the-moment fluctuations. More recently, attempts have been made to assess immersion using continuous neural, behavioural, and physiological measurements. In this study, we aim to validate three measures (dual-task reaction times, heart rate, and skin conductance) against a widely used immersion questionnaire: the Narrative Engagement Scale. We find that dual-task reaction times and synchrony in heart rate are strongly related to attentional and emotional engagement with the story. These results may allow researchers and creative industry professionals to better understand continuous fluctuations in immersion. This methodology could be applied throughout various stages of the media development process, for example to pre-screen versions of a scene, or even to provide real-time dynamic feedback to broadcasters or performers.

Introduction

Imagine you are watching your favourite film or television programme: your heart races, your eyes are glued to the screen, and you fail to notice that several hours have passed while you are completely absorbed in the narrative. This is immersion, which can be defined as an individual’s experience of ‘a state of deep mental involvement in which their cognitive processes (with or without sensory stimulation) cause a shift in their attentional state such that one may experience dissociation from the awareness of the physical world’ (Agrawal et al., 2020). This definition shares similarities with other concepts, for example the dissociation from real towards virtual worlds experienced in presence (Sanchez-Vives & Slater, 2005) and the deep mental involvement experienced in transportation (Green & Brock, 2000), narrative engagement (Busselle & Bilandzic, 2009), and narrative absorption (Hakemulder et al., 2017). While immersion is a broad term which may apply to a wider variety of media (Agrawal et al., 2020), here we will focus exclusively on immersion within film and television.

Immersion is dynamic and may fluctuate over the course of an experience, changing from deep mental involvement towards mind-wandering and disengagement (Esterman & Rothlein, 2019; Song et al., 2021). Content creators understand that it is infeasible to maintain an ‘edge of the seat’ level of immersion indefinitely, and explicitly design media with the intention that immersion rises and falls (Pearlman, 2015). However, despite the dynamic nature of immersion, much of the research in this area relies on single retrospective estimates from questionnaires, e.g. the Immersive Experience Questionnaire (Rigby et al., 2019). Immersion questionnaires typically measure multiple dimensions, including attention, perception of time, feelings of being spatially located or transported towards the mediated environment, emotional aspects such as theory of mind, and absorption within the narrative (see Pianzola, 2021 for a review). Questionnaires are therefore an attractive measure for capturing the multidimensional nature of immersion. The distinct ways in which individuals become immersed are sometimes classified as different types of immersion, e.g. narrative immersion, emotional immersion, and sensory/perceptual immersion (Nilsson et al., 2016; Ryan, 2001). However, as retrospective estimates, questionnaires are not sensitive to the dynamic nature of immersion and are dependent on the memory of the participant. This leaves questionnaire estimates vulnerable to memory biases such as primacy and recency effects (Glanzer & Cunitz, 1966).

More recently, alternative attempts to index immersion have been made using a range of techniques, including neural measures such as fMRI or EEG (Baldassano et al., 2018; Cohen & Parra, 2016; Dmochowski et al., 2012; Hasson et al., 2008); physiological measures such as heart rate or skin conductance (Richardson et al., 2020; Sukalla et al., 2015); behavioural paradigms such as dual-task (Bezdek & Gerrig, 2017; Hinde et al., 2018), continuous rating (Tchernev et al., 2021), or thought-listing paradigms (Magliano et al., 1996; Pjesivac et al., 2021); and other measures including eye tracking (Madsen et al., 2021) or body motion (Theodorou et al., 2019). For a longer discussion of these efforts, see Millman et al. (2022). One major finding from these techniques is that audiences often respond synchronously to the media, which can be seen in neural (Hasson, 2004), behavioural (Madsen et al., 2021), and physiological data (Madsen & Parra, 2022).

Within narrative media, immersion may be achieved as individuals construct mental models to represent characters, events, and emotions (Mar & Oatley, 2008; Thon, 2008; van Laer et al., 2014; Zacks, 2013). Early definitions of immersion almost exclusively focussed on how increasingly sophisticated display properties evoke immersion (Slater, 2003) and recent work has confirmed that immersion increases where lower-level audio-visual features of the content become more veridical (Hinde et al., 2022).

To a cognitive psychologist, the descriptions of immersion will sound very reminiscent of William James' (1890) much-quoted definition of attention as ‘taking possession of the mind… of one out of what seems several simultaneously possible objects or trains of thought’. Here, in Experiment 1, we explore the relationship between attention and immersion directly. There is a wide range of cognitive paradigms to measure attention (see Pashler, 1998), but the dual-task paradigm (Kahneman, 1973) probably most closely captures the non-spatial withdrawal of attentional resources from one task to focus on another, and so that is the focus of Experiment 1.

Dual-task reaction times are a classic and extremely well-established measure of attention within psychology (Kahneman, 1973). In this paradigm, participants complete a primary task (watching a film) alongside a simple secondary task (e.g. responding to an auditory tone). Reaction time to the secondary task is taken to indicate the available cognitive resources for that task. Given a finite amount of cognitive resources, any reduction in cognitive resources to the secondary task suggests that more resources are being allocated to the primary task (Lang & Basil, 1998; see Potter & Bolls, 2012 for a review). Dual-task reaction times have been applied previously within media research (Bezdek & Gerrig, 2017; Hinde et al., 2018, 2022; Lang, 2000; Troscianko et al., 2012); however, no study to date has directly validated the task as a measure of immersion. Dual-task reaction times have the advantage that they can provide moment-to-moment estimates of immersion and are easy and inexpensive to collect.

In this experiment, we also investigate whether synchrony in reaction times (e.g. correlations across participants arising from similar cognitive processing of the media) may be driven by immersion. We note that audience synchrony has been used both in the context of viewer co-presence (i.e. multiple audience members in the same room) and in the context of individual audience members viewing alone. In this paper, we are referring to the latter: synchrony arising from audience members’ cognitive processing of the content.

Experiment 1

In Experiment 1, we explore whether performance on the dual-task reaction time task is related to immersion as measured by a standard immersion questionnaire. Our aim is to validate dual-task reaction times against questionnaire-based, self-reported immersion. We selected the Narrative Engagement Scale (Busselle & Bilandzic, 2009) as our self-report measure, as it is widely used, designed for film and television content, and assesses dimensions of immersion that may relate to underlying cognitive and emotional processes (attentional focus, emotional engagement, narrative presence, narrative understanding). While the authors of this scale name this concept narrative engagement, we can consider this the degree to which individuals are immersed in a story (Bilandzic et al., 2019). Our design used 7 short clips which were likely to vary in immersion, so we could look at the correlations between dual-task reaction times and narrative engagement scores. A secondary objective of these experiments was to compare the full questionnaire to a single-item question assessing immersion, to determine if it is possible to reduce questionnaire length.

Methods

Participants

Experiment 1 recruited 170 participants. Participants were recruited from the University of Bristol Psychology student population and were reimbursed with course credit. The sample size was selected arbitrarily but was preregistered at https://osf.io/4fjyc. Participants were eligible for the experiment if they were aged 18 or above, had normal or corrected-to-normal vision, had unimpaired hearing, and had English as a first language (or an equivalent level of fluency). Participants were excluded if they did not watch all clips (n = 2), did not meet the eligibility criteria (n = 1), or had an error rate on the dual-task paradigm above or equal to chance (n = 3), leaving a final sample size of n = 164 (M_age = 19.93, SD ± 3.16, 138 female, 25 male, 1 preferred not to say). Note that excluding only participants performing at or below chance deviates from our preregistered exclusion criterion of < 75% correct responses (see Additional file 1: Fig. S6 for a replication of these results following the preregistered exclusion criterion). Upon reflection, we do not think it is appropriate to exclude participants for answering incorrectly, as a higher error rate may simply be a consequence of higher engagement.

Stimuli

Participants viewed clips from television and film content available on BBC iPlayer. Clips were between 141 and 184 s long and were selected from a range of genres. As clips spanned a range of genres, we expected they would account for a range of participant preferences, and therefore each clip would vary in narrative engagement within each participant. Experiment 1 used 7 clips (see Table 1 for details). Excerpts were selected which would not require any prior context to understand. Clips were presented at 1280 × 720p resolution: the maximum available on BBC iPlayer for most content, and so representative of a typical home viewing environment (note: only a limited amount of content is available in 3840 × 2160p resolution when streaming from a compatible smart television).

Table 1 Stimuli

Participants watched the content remotely, on their own laptop or desktop computer. This work was conducted during the COVID-19 pandemic, meaning we could not test in a controlled, in-person setting. However, the value of this work is that we were able to collect data from participants in their naturalistic viewing environment. The experimental window was displayed in Fullscreen, and participants were instructed to ensure they were in a quiet location where they would not be disturbed. Participants could listen using either their device’s speakers or headphones.

Measures

Reaction times

Participants heard random high (1000 Hz) and low (600 Hz) tones at 15-s intervals and were required to make a button press response (left shift for a low tone, right shift for a high tone) as soon as they heard the tone. Tones were 1-s sine waves, and so distinct from the audio characteristics of the content. Tones were presented approximately 10% louder (measured using root mean square energy) than the average volume of all clips. Participants were instructed to set their volume to a comfortable listening level and completed 10 practice trials before the main task to ensure they could discern the tones. As tones themselves may disrupt the experience of immersion, we made a compromise: rather than varying the interval between tones (which would have reduced predictability), we used a fixed interval that maximised the distance between tones, to avoid potential interference.
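
The probe construction described above can be sketched as follows. This is a minimal illustration, not the study's stimulus code: the sampling rate, the average clip RMS value, and the reading of ‘10% louder’ as an RMS level 1.1 times the clip average are all assumptions made for the example.

```python
import math

SAMPLE_RATE = 44100  # assumed sampling rate in Hz (not stated in the text)


def rms(samples):
    """Root mean square energy of a sequence of audio samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def make_tone(freq_hz, target_rms, duration_s=1.0, sample_rate=SAMPLE_RATE):
    """Return a sine wave at freq_hz, rescaled so its RMS equals target_rms."""
    n = int(duration_s * sample_rate)
    wave = [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]
    gain = target_rms / rms(wave)
    return [gain * s for s in wave]


# Hypothetical average RMS across all clips (illustrative value only).
clip_rms = 0.10

# Assumption: "10% louder" is read as an RMS level 1.1 times the clip average.
high_tone = make_tone(1000, target_rms=1.10 * clip_rms)
low_tone = make_tone(600, target_rms=1.10 * clip_rms)
```

Scaling by RMS rather than peak amplitude keeps the probe level comparable to the way clip loudness is measured later in the analysis.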

Narrative engagement

Participants completed the Narrative Engagement Scale (Busselle & Bilandzic, 2009) after watching each clip. This 12-item questionnaire is used to assess four dimensions of engagement: attentional focus, emotional engagement, narrative understanding, and narrative presence. Items are rated on a 7-point Likert scale anchored between ‘strongly disagree’ and ‘strongly agree’. We included a single additional question on immersion (‘During the program, I was very immersed’), rated using the same 7-point Likert scale, to assess the relationship between the full narrative engagement scale and a single dimension.
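
As a rough sketch of how such a questionnaire is scored, the example below averages item responses into four subscale scores and one overall score. The item ordering (three consecutive items per dimension) is a hypothetical layout chosen for the example, and any reverse-worded items are assumed to have been recoded beforehand.

```python
from statistics import mean

# Hypothetical item-to-subscale mapping: three items per dimension,
# stored in consecutive positions. The real item order may differ.
SUBSCALES = {
    "attentional_focus": [0, 1, 2],
    "emotional_engagement": [3, 4, 5],
    "narrative_presence": [6, 7, 8],
    "narrative_understanding": [9, 10, 11],
}


def score_nes(items):
    """Score one 12-item response (each item rated 1 to 7).

    Assumes reverse-worded items have already been recoded.
    Returns the mean of each subscale plus the mean of all 12 items.
    """
    if len(items) != 12:
        raise ValueError("expected 12 item responses")
    scores = {name: mean(items[i] for i in idx) for name, idx in SUBSCALES.items()}
    scores["narrative_engagement"] = mean(items)
    return scores


# Illustrative response for one participant and one clip.
example = score_nes([7, 7, 7, 1, 1, 1, 4, 4, 4, 4, 4, 4])
```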

Design

We had a within-subjects design, where participants watched each clip in a random order. Because of the total number of clips (7 in Experiment 1), it was not possible to fully counterbalance the design and so the clip order was random. The breakdown of clips in each order is provided in Additional file 1: Fig. S1. Experiment 1 was built using PsychoPy 2021.2.3 (Peirce et al., 2019) and hosted online using https://pavlovia.org, with the information sheet, consent form, and final demographic information being hosted separately using Qualtrics (https://www.qualtrics.com/uk/), version January 2022. All data were analysed using R 4.1.1 (R Core Team, 2021).

Results

Immersive narratives consume more attentional resources

Mean dual-task reaction times were M = 999 ms, SD = 427 ms with M = 92.7%, SD = 26.9% correct responses (see Additional file 1: Fig. S2 for an overall distribution of correct and incorrect responses). In subsequent analyses, responses over 3000 ms and incorrect responses were excluded, as in Hinde et al. (2018). Reaction time data were aggregated by participant and clip, to match the granularity of the self-report measures. While reaction times were not normally distributed (Additional file 1: Fig. S2), we did not transform the data, as analyses were conducted on means, which, following the central limit theorem, will conform to a normal distribution. Mean narrative engagement scores were M = 4.27, SD = 0.94. For each dimension, this is: attentional focus (M = 4.37, SD = 1.75), emotional engagement (M = 3.58, SD = 1.61), narrative presence (M = 3.90, SD = 0.73), and narrative understanding (M = 5.21, SD = 1.44). In subsequent references to ‘narrative engagement’, we describe the mean of all 12 items. When referring to a dimension of narrative engagement, we describe the mean of the subscale items which assess that dimension. The mean single-item immersion rating was M = 4.41, SD = 1.83.
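
The exclusion and aggregation steps can be illustrated with a short sketch. The trial tuple layout and the participant and clip labels are hypothetical; only the 3000 ms cut-off and the exclusion of incorrect responses come from the text.

```python
from collections import defaultdict
from statistics import mean

MAX_RT_MS = 3000  # responses slower than this are excluded


def aggregate_rts(trials):
    """Mean reaction time per (participant, clip).

    trials: iterable of (participant, clip, rt_ms, correct) tuples
    (a hypothetical layout). Incorrect responses and responses
    over MAX_RT_MS are dropped before averaging.
    """
    grouped = defaultdict(list)
    for participant, clip, rt_ms, correct in trials:
        if correct and rt_ms <= MAX_RT_MS:
            grouped[(participant, clip)].append(rt_ms)
    return {key: mean(rts) for key, rts in grouped.items()}


# Illustrative trials: one too slow, one incorrect, three retained.
trials = [
    ("p1", "clip_a", 900, True),
    ("p1", "clip_a", 1100, True),
    ("p1", "clip_a", 3500, True),   # excluded: over 3000 ms
    ("p1", "clip_a", 800, False),   # excluded: incorrect response
    ("p2", "clip_a", 1000, True),
]
means = aggregate_rts(trials)
```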

Figure 1 shows the overall correlation between reaction time and narrative engagement. To assess this relationship, we fitted a linear mixed model to our reaction time data using the ‘lme4’ package in R (Bates et al., 2007). We included a fixed effect of narrative engagement, and participant as a random intercept (see Footnote 1). We find that narrative engagement increases reaction time (b = 29.24, 95% CI [18.19, 40.26]). Our single-item question of immersion (hereafter referred to as ‘immersion’) also positively influences reaction time (b = 14.21, 95% CI [8.71, 19.71]). Individual p values are not provided for linear mixed model estimates, due to the problems associated with interpreting p values from linear mixed models (Baayen et al., 2008). However, where confidence intervals do not intersect with zero, this can be considered analogous to a significant difference at p < 0.05.
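
Written out, a random-intercept model of this kind takes the following general form. The notation is illustrative rather than taken from the analysis code: i indexes participants, j indexes clips, RT is the mean reaction time, and NE is the narrative engagement score.

```latex
\mathrm{RT}_{ij} = \beta_0 + \beta_1\,\mathrm{NE}_{ij} + u_i + \varepsilon_{ij},
\qquad
u_i \sim \mathcal{N}(0, \sigma_u^2),
\quad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
```

Here u_i is the participant-specific intercept, which absorbs stable differences in average response speed, and the reported b corresponds to the estimate of β₁. In lme4 syntax such a model would be specified along the lines of `rt ~ engagement + (1 | participant)`, where the variable names are hypothetical.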

To assess whether this relationship was robust within participants, we computed the correlation between reaction time and narrative engagement for each participant and compared this distribution against zero. This approach accounts for individual differences in preference, as the test makes no assumptions about which content participants may rate as most engaging. Using a one-sample, two-tailed t test, we found that this overall distribution was significantly greater than zero: mean r = 0.218, t(163) = 6.82, p = 1.69 × 10^–10. Similarly, participants’ individual correlation between reaction times and self-reported immersion was significantly greater than zero: mean r = 0.185, t(163) = 5.85, p = 2.58 × 10^–8.
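
The within-participant approach can be sketched as two steps: one Pearson correlation per participant across their clips, then a one-sample t statistic on the resulting correlations. The data below are invented for illustration, and the p value is omitted because it would require a t-distribution function from a statistics library.

```python
import math
from statistics import mean, stdev


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences.

    Undefined (division by zero) if either sequence is constant,
    e.g. a participant who gives every clip the same rating.
    """
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def one_sample_t(values, mu=0.0):
    """t statistic and degrees of freedom for a one-sample test against mu."""
    n = len(values)
    t = (mean(values) - mu) / (stdev(values) / math.sqrt(n))
    return t, n - 1


# Hypothetical data: per participant, mean RT (ms) and engagement per clip.
data = {
    "p1": ([950, 1000, 1080, 990], [3.0, 4.5, 6.0, 4.0]),
    "p2": ([900, 940, 930, 1010], [2.5, 5.0, 3.5, 6.5]),
    "p3": ([1100, 1050, 1120, 1090], [4.0, 4.5, 5.5, 3.0]),
}
per_participant_r = [pearson_r(rt, ne) for rt, ne in data.values()]
t_stat, df = one_sample_t(per_participant_r)
```

Because each correlation is computed within a participant, the test does not depend on participants agreeing about which clips are most engaging.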

One interpretation of these data is that the more engaging clips may simply be louder, and as such it may be more difficult to discern reaction time probes, leading to slower responses or increased errors from participants. To address this possibility, we calculated the root mean square energy (RMSE) for each clip’s audio track, as a measure of loudness. Given we only have 7 RMSE values, we are constrained to make comparisons based on those values and have averaged reaction times and tone discrimination error rate per clip for this analysis. RMSE was not significantly correlated with reaction times (r(5) = 0.580, p = 0.172), but was significantly associated with error rate (r(5) = 0.823, p = 0.023). As such, we have evidence to conclude that louder clips may be masking the detection of the tones. We therefore included RMSE as a fixed effect in subsequent linear mixed-effects models, to account for this effect of clip volume.
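
With only 7 clips, each clip-level correlation has 5 degrees of freedom, which is why a fairly large r can still fall short of significance. A minimal sketch of the standard conversion from r to a t statistic makes this concrete; the function name is our own.

```python
import math


def r_to_t(r, n):
    """Convert a Pearson r from n paired observations to a t statistic.

    Uses t = r * sqrt(df) / sqrt(1 - r^2) with df = n - 2.
    """
    df = n - 2
    t = r * math.sqrt(df) / math.sqrt(1 - r * r)
    return t, df


t_rt, df_rt = r_to_t(0.580, 7)    # RMSE vs. reaction time: t is about 1.59
t_err, df_err = r_to_t(0.823, 7)  # RMSE vs. error rate: t is about 3.24
```

The two t values, each on 5 degrees of freedom, are consistent with the reported p values of 0.172 and 0.023.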

To assess which dimensions of the Narrative Engagement Scale were influencing reaction time, we fitted a linear mixed model to our data using the ‘lme4’ package in R (Bates et al., 2007). We included fixed effects for each dimension of narrative engagement (attentional focus, emotional engagement, narrative presence, narrative understanding). We also included fixed effects for single-item immersion, clip order (which clip was viewed 1st, 2nd, 3rd, etc., given the tendency for reaction times to increase over time; Hinde et al., 2018), clip volume (RMSE), and for familiarity (whether participants had seen the clip before). Participant was set as a random intercept to account for participant-level differences in average reaction time.

As shown in Fig. 2, emotional engagement led to a significant increase in reaction time: b = 14.02, 95% CI [3.91, 24.14]. Clip order also significantly increased reaction time: b = 12.83, 95% CI [8.13, 17.54]. Narrative presence (b = 7.03, 95% CI [−12.07, 26.13]), narrative understanding (b = −3.78, 95% CI [−12.18, 4.62]), familiarity (b = 9.40, 95% CI [−6.33, 25.13]), and clip volume (b = 314.70, 95% CI [−980.44, 1610.35]) did not significantly affect reaction time. Interestingly, despite the dual-task paradigm being a measure of attention, attentional focus also did not affect reaction times (b = −3.82, 95% CI [−14.96, 7.31]). We can conclude therefore that increases in reaction time are predominantly driven by emotional engagement in the story.

We then looked to assess whether synchrony in reaction time was related to narrative engagement. We rely on the most widely used method to measure synchrony: inter-subject correlation (Nastase et al., 2019). For each clip, a correlation matrix between all pairs of participants is produced. We then take an average of each row of that matrix (each participant), which provides a score for each participant of how synchronous they are with all other participants. We find that inter-subject correlation in reaction times (ISC_RT) was not significantly related to narrative engagement (r(5) = 0.554, p = 0.197) or immersion (r(5) = 0.490, p = 0.264). As with mean reaction times, we then looked to assess whether this correlation was robust across participants, by calculating the relationship between individual participants’ ISC_RT and narrative engagement. Individual participants’ ISC_RT was significantly related to narrative engagement (mean r = 0.216, t(162) = 6.73, p = 2.84 × 10^–10) and immersion (mean r = 0.182, t(162) = 5.75, p = 4.26 × 10^–8).
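
The row-averaging step of inter-subject correlation can be sketched as follows for a single clip. Two simplifications are assumed here and are not taken from the text: each participant's self-correlation (the matrix diagonal) is left out of the row average, and every participant has a reaction time for every probe, which would not hold once incorrect or slow responses are excluded.

```python
import math
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def isc_scores(series):
    """Inter-subject correlation score per participant for one clip.

    series: dict mapping participant id to a list of reaction times,
    one per probe, in the same probe order for everyone.
    Each score is the mean correlation with all other participants
    (assumption: the diagonal self-correlation is excluded).
    """
    ids = list(series)
    return {
        p: mean(pearson_r(series[p], series[q]) for q in ids if q != p)
        for p in ids
    }


# Illustrative reaction times (ms) to four probes within one clip.
clip_rts = {
    "p1": [1, 2, 3, 4],
    "p2": [2, 4, 6, 8],
    "p3": [4, 3, 2, 1],
}
scores = isc_scores(clip_rts)
```

In this toy example p1 and p2 rise together while p3 moves in the opposite direction, so p3 receives the lowest synchrony score.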

Familiarity

Additional file 1: Fig. S4 (left) presents a breakdown of participants’ familiarity scores in Experiment 1. To assess if familiarity affected our measures, we used Welch’s two-sample, two-tailed t tests to account for the unequal sample size and variance between familiar and unfamiliar groups. Participants who had seen any of the series before were more engaged (M unfamiliar = 4.04, M familiar = 4.63, t(1006) = −11.06, p < 0.001) but did not show differences in reaction times (M unfamiliar = 985, M familiar = 989, t(939) = −0.22, p = 0.82). Similarly, participants who had seen the specific clip before were more engaged (M unfamiliar = 4.14, M familiar = 4.9, t(264) = −11.62, p < 0.001) but did not show differences in reaction times (M unfamiliar = 982, M familiar = 1019, t(226) = −1.45, p = 0.15). To summarise, familiarity was associated with higher narrative engagement, but did not affect reaction time.
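
For readers unfamiliar with Welch's test, the sketch below computes its t statistic and the Welch–Satterthwaite degrees of freedom. The group values are invented, and the p value is omitted because it would require a t-distribution function from a statistics library.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's two-sample t statistic and approximate degrees of freedom.

    Does not assume equal variances or equal group sizes. The degrees
    of freedom follow the Welch-Satterthwaite approximation and are
    generally not an integer.
    """
    va = variance(a) / len(a)  # squared standard error of group a
    vb = variance(b) / len(b)  # squared standard error of group b
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Illustrative engagement scores (hypothetical values).
unfamiliar = [1, 2, 3, 4, 5]
familiar = [2, 4, 6, 8, 10]
t_stat, df = welch_t(unfamiliar, familiar)
```

Passing the unfamiliar group first gives a negative t when the familiar group scores higher, matching the sign convention of the statistics reported above.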

Single-item question of immersion indexes the full narrative engagement scale

Finally, we looked to assess whether our single-item question of immersion was related to overall narrative engagement. We found a significant correlation between single-item immersion and the Narrative Engagement Scale: r(1146) = 0.797, p = 2.2 × 10^–16. To assess which dimensions immersion was related to, we fit a linear regression predicting immersion from each dimension of narrative engagement. Immersion was predicted by attentional focus (b = 0.61, p = 2 × 10^–16), emotional engagement (b = 0.28, p = 2 × 10^–16), and narrative presence (b = 0.50, p = 2 × 10^–16), but not narrative understanding (b = 0.01, p = 0.77). This offers a promising indication that most dimensions of engagement (excluding understanding) could be indexed by a single-item questionnaire.

Discussion

Experiment 1 provides strong evidence that greater levels of immersion are associated with slower dual-task reaction times. This provides strong support for the view that immersion arises from changes in attention (Murray, 1998; Thon, 2008). This is consistent with numerous other findings within the media literature: for example, viewers show attentional synchrony in gaze behaviour (Smith & Henderson, 2008) and viewers are resistant to oculomotor capture by salient visual distractors (Hinde et al., 2017).

A simple way to interpret these results is in terms of an enveloping of perceptual apparatus (Green & Brock, 2000). From this perspective, when immersed, fewer resources are available for the secondary task because they are dedicated towards attending and perceiving the on-screen events. For example, richer visual experiences (such as high dynamic range) lead to slower reaction times on the dual-task paradigm (Hinde et al., 2022). There is some further evidence that larger screens are also more engaging (Troscianko et al., 2012), and that viewing on a television rather than a smartphone is more immersive (Szita & Rooney, 2021). It is possible then that immersion is in part driven by simple visual features such as contrast, luminance, or chrominance.

However, visual properties of the content alone are unlikely to be sufficient to fully explain why more attention is allocated towards engaging stimuli. For example, Bezdek and Gerrig (2017) find that participants are slower to respond to dual-task reaction time probes during moments of higher narrative suspense and provide evidence that simple visual features alone are an inadequate explanation for this. Instead, we may consider immersion as a form of mental simulation, where viewers are occupied with constructing models to represent characters, events, and scenes (Zwaan, 1999). Slower reaction times then may be a consequence of the greater cognitive elaboration arising from processing the narrative, and this is consistent with our result that greater emotional engagement is associated with slower reaction times.

In Experiment 1, familiarity was associated with higher narrative engagement, but did not influence reaction times. This finding intuitively suggests that participants consume more of the content which they find engaging. However, regardless of whether participants have seen the clips before (and therefore may know what to expect), their allocation of attentional resources towards the content remains unchanged. We do note that this experiment did not ask participants how long ago they previously saw the content; participants who recently viewed the content may show differences in attentional orientation.

While dual-task reaction times are able to index the focussed attention arising during immersion, they are not without their own pitfalls. The regular probe intervals may themselves act as a distraction from becoming fully immersed within the media. As an example of this, Hammond et al. (unpublished) find evidence that reaction time probes cause subsequent physiological responses which may be associated with a startle reflex. Further, while providing moment-to-moment estimates, reaction times are not truly continuous (our experiment used an interstimulus interval of 15 s), and as such could not be used to ascertain faster-moving changes in attention and immersion.

Experiment 2

Experiment 2 looked to further explore the relationship between immersion and two physiological measures: heart rate and skin conductance. These measures have both been widely used within media psychology (Gregersen et al., 2017; Kraj et al., 2020; Richardson et al., 2020; Sukalla et al., 2015); however, they have not yet been validated against self-reported immersion. As physiological measures, they avoid the pitfalls of dual-task reaction times in that they do not disrupt the viewing experience and can easily be sampled at a higher frequency.

Heart rate and skin conductance may also be sensitive to different dimensions of immersion than dual-task reaction times. Heart rate indexes parasympathetic and sympathetic nervous system activity (Levy, 1971), and is known to vary in response to cognitive processing demands (Potter & Bolls, 2012). Skin conductance is one of the few physiological measurements singly innervated by the sympathetic nervous system, is considered a measure of arousal, and is known to vary with the emotional content of a stimulus (Boucsein, 2012). Recent research has found that time-locked correlations in heart rate between participants (synchrony) relate to attention towards narrative content (Madsen & Parra, 2022; Pérez et al., 2021; Stuldreher et al., 2020). Synchrony in skin conductance may additionally relate to the emotional content of the media (Han et al., 2021).