For analyses estimating linear mixed-effects models, we used the lme4 package (Bates, Mächler, Bolker, & Walker, 2015) in R (R Core Team, 2015) with type III sums of squares. Significance for these models was assessed using the lmerTest package (Kuznetsova, Brockhoff, & Christensen, 2015; Luke, 2017) with Satterthwaite’s approximation for degrees of freedom. To account for variance across subjects and slideshows, all dwell-time analyses included random intercepts for these two variables. Where applicable, linear mixed-effects models included orthogonal contrasts to explore effects of interest.
Slideshow resolution influences overall dwell time
In a first set of analyses, we examined the extent to which slideshow resolution impacted viewers’ per-slide log10 dwell times (i.e., prior to residualization but with outliers replaced and dwell times log10 transformed to correct for positive skew) as they advanced at their own pace through the slideshows. Average log10 per-slide dwell times for 1-fps slideshows (M = 2.70, SD = 0.24) were significantly higher than those for 2-fps slideshows (M = 2.63, SD = 0.24), β = 0.03, t(5.64) = 5.96, p = .001. Because the additional content in 2-fps slideshow versions might uniquely influence the observed dwell-time differences related to resolution, we also performed this analysis using only the subset of slides that occurred in both the 1-fps and 2-fps slideshows and thus, depicted identical content (henceforth, matched slides; this included all slides in 1-fps slideshows and odd-numbered slides in the 2-fps slideshows). As in our previous analysis, when only matched slides were considered, average log10 per-slide dwell times were still longer for slideshows at 1-fps resolution (M = 2.70, SD = 0.24) relative to slideshows at 2-fps resolution (M = 2.63, SD = 0.24), β = 0.03, t(5.51) = 6.02, p = .001. In fact, the difference in means for only the matched slides at the 2-fps resolution versus across all slides at 2-fps resolution was extremely small. Despite the overall resolution-related difference in per-slide dwell times, dwell-time patterns were strikingly aligned across 1-fps versus 2-fps versions of a given activity sequence, as can be seen in Fig. 2. As a test of the extent to which dwell-time patterns are aligned across rates of resolution, if we consider only matched slides across the 1-fps and 2-fps slideshows, observers’ log10 per-slide dwell times were highly positively correlated, r(320) = .72, p < .001, 95% CI [.67, .77]. That is, slides that elicited increased dwelling in the 1-fps slideshows were also likely to elicit increased dwelling in the 2-fps slideshows.
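The matched-slide correlation and its confidence interval can be illustrated with a minimal sketch. It assumes that matched slides are the 1st, 3rd, 5th, and so on of a 2-fps slideshow, and that the interval comes from the standard Fisher z transformation; the software actually used for this step is not stated here. The function names and toy dwell times are hypothetical. Note that r(320) implies 322 slide pairs, because the degrees of freedom for a correlation are n − 2.

```python
import math

def matched_slides(slides_2fps):
    """Keep the odd-numbered slides (1st, 3rd, ...) of a 2-fps slideshow."""
    return slides_2fps[0::2]

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def fisher_ci(r, n, z_crit=1.96):
    """Approximate 95% CI for r via the Fisher z transformation."""
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

# Toy per-slide mean log10 dwell times (hypothetical, not study data).
one_fps = [2.9, 2.6, 2.8, 2.5]
two_fps = [2.8, 2.4, 2.5, 2.6, 2.6, 2.5, 2.4, 2.3]
toy_r = pearson_r(one_fps, matched_slides(two_fps))

# Interval for the reported r = .72 with n = 322 slide pairs.
low, high = fisher_ci(0.72, 322)
```

Applied to the rounded value r = .72, this approximation gives bounds close to the reported interval; small discrepancies in the second decimal are expected from rounding.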
These results counter the simple content-tracking account: if attention to a given slide simply reflected that slide’s content, we would not expect average per-slide dwell times to differ across 1-fps and 2-fps resolutions, yet they did. These findings also seem inconsistent with the boundaries are conceptually special account, which predicts that missing boundaries in the 1-fps slideshows should elicit higher dwelling to subsequent within-unit slides. In particular, boundaries are conceptually special predicts that dwell times to a subset of within-unit slides in a given 1-fps slideshow would be high while dwell times to the corresponding within-unit slides in the relevant 2-fps slideshow (e.g., when the boundary was present) would be low. Because of this discrepancy, dwell times across the two slideshows would not be expected to correlate strongly, yet they did. In contrast, the strong positive correlation in dwell times across 1-fps and 2-fps versions is consistent with both the physical change and information-optimization accounts. It is likely that slide-to-slide pixel change (i.e., physical change) is greater in the 1-fps over the 2-fps slideshows, thus predicting longer dwell times. Under the information-optimization account, predictability from one slide to the next should be lower for 1-fps versus 2-fps slideshows, with the consequent prediction of longer average per-slide dwell times.
Dwell-time patterns replicated
In our next set of analyses, we asked whether our results replicated prior research using the dwell-time paradigm. Specifically, we tested for (1) a boundary advantage (longer dwelling on a boundary relative to within-unit slides) and (2) a hierarchical advantage (longest dwell times for coarse boundaries, shorter for fine boundaries, and shortest for within-unit slides). We also explored the extent to which these effects differed across slideshow resolution. This linear mixed-effects model thus included slide type (coarse, fine, and within) and resolution as fixed effects and random intercepts for subjects and slideshows. Additionally, it is relevant to note that in these analyses we considered all slides that were present in 1-fps and 2-fps slideshow versions. That is, slideshows filmed at 2 fps had additional content (including boundaries at both levels of structure as well as within-unit content) relative to 1-fps slideshows. Therefore, in the tests of boundary and hierarchical advantage, we asked about the extent to which such effects were robust to slideshow resolution given all slides that were present in the slideshow (not just matched slides).
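As an illustration of what an orthogonal contrast set over the three slide types could look like, the sketch below encodes a boundary-versus-within contrast and a coarse-versus-fine contrast and checks that they are orthogonal. The specific weights are an assumption made for illustration; the exact contrast coding used in the reported models is not given in this section.

```python
# Hypothetical contrast weights over slide types, ordered (coarse, fine, within).
CONTRASTS = {
    "boundary_vs_within": (1, 1, -2),  # both boundary types against within-unit slides
    "coarse_vs_fine": (1, -1, 0),      # coarse against fine boundaries
}

def dot(u, v):
    """Dot product of two weight vectors."""
    return sum(a * b for a, b in zip(u, v))

def is_orthogonal_set(contrasts):
    """True if every contrast sums to zero and all pairs have zero dot product."""
    vectors = list(contrasts.values())
    sums_to_zero = all(sum(v) == 0 for v in vectors)
    pairwise_zero = all(
        dot(vectors[i], vectors[j]) == 0
        for i in range(len(vectors))
        for j in range(i + 1, len(vectors))
    )
    return sums_to_zero and pairwise_zero

orthogonal = is_orthogonal_set(CONTRASTS)
```

With three slide types, at most two mutually orthogonal contrasts exist, so additional pairwise comparisons (for example, fine versus within) would come from a separate contrast set or follow-up test.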
This analysis of residualized dwell times yielded a significant boundary advantage (M = 0.009, SD = 0.12) relative to within-unit content (M = 0.001, SD = 0.12), β = 0.008, t(29865) = 6.16, p < .001, replicating prior research. Also replicating prior research, average residualized dwell times were greater for: (1) coarse (M = 0.017, SD = 0.13) over fine-level slides (M = 0.007, SD = 0.12), β = −0.010, t(29865) = −3.18, p = .001; (2) coarse over within-unit slides (M = 0.001, SD = 0.12), β = −0.018, t(29865) = −5.36, p < .001; and (3) fine over within-unit slides, β = −0.006, t(29865) = −3.64, p < .001. No significant main effect of resolution emerged in the analysis, β = 0.002, t(29865) = 1.75, p = .08. However, because residualization involves fitting a power curve and calculating deviations from that curve for each participant individually, the process tends to remove differences between individuals, and hence, also group-level differences (which is where resolution differences would occur). Therefore, this discrepancy from the initial set of analyses reported earlier (examining raw dwell times) is not unexpected. As well, no significant interaction emerged in the analysis, p = .16, suggesting the boundary advantage persisted across both 1-fps and 2-fps resolutions. All in all, boundary and hierarchical advantage patterns were replicated in these activity sequences, and they were robust to slideshow resolution differences. Regardless of the presence or absence of information (i.e., 1-fps vs. 2-fps resolution), boundaries that remained in the slideshow garnered enhanced attention and did so to a higher degree the greater the granularity of the boundary content.
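To make concrete why residualization removes between-participant (and hence group-level) differences, the following sketch fits a power curve to one viewer's dwell times and returns the deviations from it. It assumes the curve relates dwell time to slide position and is fit by ordinary least squares in log-log space; both choices, along with the function names and toy values, are assumptions made for illustration rather than a description of the exact procedure used.

```python
import math

def fit_power_curve(dwell_times):
    """Fit log10(dwell) = intercept + slope * log10(position) by least squares."""
    xs = [math.log10(i) for i in range(1, len(dwell_times) + 1)]
    ys = [math.log10(t) for t in dwell_times]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return intercept, slope

def residualize(dwell_times):
    """Return each slide's deviation (in log10 units) from the fitted curve."""
    intercept, slope = fit_power_curve(dwell_times)
    return [
        math.log10(t) - (intercept + slope * math.log10(i))
        for i, t in enumerate(dwell_times, start=1)
    ]

# Toy dwell times in ms for a single viewer (hypothetical, not study data).
viewer = [900.0, 640.0, 700.0, 480.0, 455.0, 520.0]
residuals = residualize(viewer)
```

Because each viewer's residuals are centered on that viewer's own curve, they sum to zero within the viewer; an overall tendency to dwell longer at 1 fps is absorbed by the fitted curve rather than carried into the residuals.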
Effect of missing boundary content
Thus far, the reported findings appear to rule out the simple content-tracking account. Examining the effect of missing boundary content was particularly useful for testing the predictions of the boundaries are conceptually special account, as described earlier. Of particular interest was the extent to which missed boundary content might spark enhanced dwelling on the slide depicting content immediately subsequent to the missed boundary. Recall that under the boundaries are conceptually special account, when a boundary is missed (as occurs in the 1-fps resolution slideshows), viewers would need to infer the missed content, leading to increased attention to the slide immediately following the missed boundary. Further, we would expect that when that same boundary is present (i.e., in the 2-fps slideshows), dwell times for the boundary slide would be high, but dwell times for the within slide immediately following the (present) boundary would be like other within-unit slides in the event sequence, and hence, relatively low.
Specifically, the conceptually special account predicts an interaction between resolution (1 fps vs. 2 fps) and slide type (boundary, within, and within-after-missed-boundary). That is, dwell times should be high for boundaries and low for within-unit slides across both 1-fps and 2-fps resolutions. The locus of the effect, however, should be the difference in dwell times for within-unit slides immediately following boundaries that are missed in the 1-fps slideshows relative to those same within-unit slides following boundaries that are present in 2-fps slideshows. At 1-fps resolution, dwell times on within-unit slides following missed boundaries should be high (as these within slides after missing boundaries are essentially functioning as the opportunity to infer missed boundary content). At 2-fps resolution, on the other hand, the relevant boundary content is present; thus, the very same within slides (that followed missed boundaries in the corresponding 1-fps slideshows) should elicit the reduced dwelling that is characteristic of within-unit content.
To explore the influence of missing boundary content, we ran a linear mixed-effects model with fixed effects of slide type (boundary, within, and within after 1-fps missed boundary) and resolution (1 fps vs. 2 fps) and random intercepts for subjects and slideshows. Residualized dwell times on boundaries (M = 0.009, SD = 0.12) and within-after-missed boundaries (M = 0.008, SD = 0.12) did not differ, β = −0.001, t(29859) = −0.38, p = .71. However, consistent with the prediction of the boundaries are conceptually special account, residualized dwell times for both boundaries and within-after-missed-boundaries were significantly higher than residualized dwell times for within-unit slides (M = −0.001, SD = 0.12), β = 0.011, t(29859) = 6.50, p < .001 and β = 0.010, t(29859) = 5.12, p < .001, respectively (Fig. 3). The main effect of resolution (1 vs. 2 fps) was not significant, p = .49, nor was the interaction between slide type and resolution, p = .06.
As an additional test of the boundaries are conceptually special account, and despite the non-significant resolution-related effects, we opted also to explore the extent to which residualized dwell times on boundary slides, within-unit slides, and within-after-missed-boundaries differed across the 1-fps and 2-fps resolutions. This analysis tested a key prediction made by the boundaries are conceptually special account. Specifically, dwelling on within slides that followed missed boundaries should be higher in 1-fps slideshows (i.e., when the boundary is missed) than on those same slides in 2-fps slideshows (i.e., where that same boundary slide is present), since, when the boundary is present, dwell times for that within slide would be expected to be similar to those for other within-unit slides in the sequence. In contrast to what the boundaries are conceptually special account would predict, there was no significant difference in residualized dwell times for within-after-missed-boundaries across 1-fps resolution (M = 0.010, SD = 0.13) and 2-fps resolution (M = 0.007, SD = 0.11), β = 0.002, t(6.43) = 0.61, p = .56. Additionally, residualized dwell times for boundary slides did not significantly differ across slideshows viewed at 1-fps resolution (M = 0.011, SD = 0.13) and 2-fps resolution (M = 0.007, SD = 0.11), β = 0.002, t(4.63) = 0.63, p = .56, nor did residualized dwell times for within-unit slides (M1fps = −0.004, SD1fps = 0.12; M2fps = −0.0002, SD2fps = 0.11), β = −0.001, t(4.39) = −0.99, p = .37. This pattern of results held when boundaries were considered separately at the coarse and fine levels of hierarchical structure.
In summary, residualized dwell times to within slides following a missed boundary were elevated relative to other within slides, as the conceptually special account predicts. However, a second result clearly militates against the account: when that boundary was actually present (i.e., in the 2-fps slideshows), residualized dwell times for the same within slide were equivalently elevated. This disconfirms the conceptually special account’s prediction that dwell times for within slides following missed boundaries will be higher than those for the same slides when the preceding boundary slide is present. The information-optimization account, in contrast, is consistent with both these findings, and all findings thus far reported. First, the general difficulty of predicting from one slide to the next would increase as resolution decreases, engendering overall per-slide increases in dwell time for 1-fps compared to 2-fps slideshows. Second, for the information-optimization account, boundary slides would be expected to garner increased attention simply because they forecast immediately upcoming low predictability. Given that the predictability structure of the activity depicted in 1-fps and 2-fps slideshows is identical, the account predicts a comparable pattern of dwelling on slides across resolution differences (at least, as long as the resolution is high enough that the nature of the activity is discernible). That is, the account predicts—for both 1-fps and 2-fps slideshows—that attention will ramp up in anticipation of boundaries and continue to stay high immediately following boundaries, but will reduce for within-unit slides that occur further away from boundary regions (where predictability correspondingly increases). Our next analyses directly investigated this idea.
Dwell times reflect boundary regions
In this set of exploratory analyses, we further investigated the information-optimization account by examining the time course of residualized dwell times before and after event boundaries. Pre-boundary slides were classified as slides occurring one or two slides before an event boundary, while post-boundary slides were classified as slides occurring one or two slides after the boundary. Thus, the region factor had a total of five levels: (1) two pre-boundary, (2) one pre-boundary, (3) boundary, (4) one post-boundary, and (5) two post-boundary. We explored these region effects separately across the coarse and fine levels of hierarchical structure in a set of two linear mixed-effects models. As in the boundary and hierarchical advantage analyses reported earlier, we included all boundary slides in the slideshows; therefore, slideshows filmed at 2 fps had additional content (including boundaries at both levels of structure as well as within-unit content) relative to 1-fps slideshows.
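The five-level region coding can be sketched as follows. The sketch assumes zero-based slide indices and, where the windows around two boundaries overlap, assigns a slide to its nearest boundary (with ties resolved in favor of the pre-boundary label). How overlapping windows were actually handled is not stated here, so that rule, like the function name, is a hypothetical choice.

```python
# Region labels keyed by a slide's offset from the boundary slide.
REGION_BY_OFFSET = {
    -2: "two pre-boundary",
    -1: "one pre-boundary",
    0: "boundary",
    1: "one post-boundary",
    2: "two post-boundary",
}

def label_regions(n_slides, boundary_indices):
    """Label each slide by its position relative to the nearest boundary.

    Slides more than two positions from every boundary get None.
    Assumption: ties between an earlier and a later boundary go to 'pre'.
    """
    labels = []
    for i in range(n_slides):
        offsets = [i - b for b in boundary_indices if abs(i - b) <= 2]
        if not offsets:
            labels.append(None)
            continue
        # Smallest absolute offset wins; a negative offset (pre) wins ties.
        nearest = min(offsets, key=lambda d: (abs(d), d))
        labels.append(REGION_BY_OFFSET[nearest])
    return labels

# A 10-slide toy sequence with a single boundary at index 4.
regions = label_regions(10, [4])
```

In this toy sequence, indices 2 through 6 receive the five region labels in order and all other slides are left unlabeled.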
Overall, we predicted that slides closer to boundaries would elicit increased dwelling while slides further from event boundaries would have reduced dwell times, replicating Hard et al.’s (2011) previous findings that dwell times ramp up as event boundaries approach, and that they decrease thereafter. Under the information-optimization account, we might predict that these effects would differ across slide type and rate of resolution. The regions surrounding coarse boundaries are likely less predictable than regions surrounding fine boundaries. Therefore, we might expect that increased dwell times for slideshow regions would persist longer when boundaries fall at the coarse level of hierarchical structure and be more focused at the fine level of structure. Further, we might also predict that it would take longer to resolve unpredictability at the lower 1-fps resolution and therefore, dwell times would remain high for longer after a boundary in these 1-fps slideshow versions. Because of their exploratory nature, the goal of the next set of analyses was simply to characterize the pattern of dwell times before and after event boundaries across 1-fps and 2-fps resolution and coarse- and fine-level slides.
The first linear mixed-effects model included boundary region (two pre-coarse, one pre-coarse, coarse, one post-coarse, and two post-coarse) and resolution (1 fps vs. 2 fps) as fixed effects and subjects and slideshows as random effects (intercepts). For coarse-level event boundaries, the effect of region was best characterized by a linear trend, β = 0.02, t(5151.11) = 5.37, p < .001. While the effect of resolution was not significant, β = −0.004, t(5.18) = −0.54, p = .61, we did find a significant interaction between resolution and the observed linear trend, β = 0.008, t(5151.10) = 1.92, p = .05. Follow-up tests (separate mixed-effects models for 1-fps and 2-fps resolutions) revealed that this interaction was synergistic in nature. While the linear trend was present across both the 1-fps resolution, β = 0.03, t(1609.47) = 4.18, p < .001, and 2-fps resolution, β = 0.01, t(3425.02) = 3.16, p = .002, the effect appeared stronger for slideshows viewed at a rate of 1 fps. As depicted in Fig. 4, for coarse-level boundaries at both 1-fps and 2-fps resolutions, dwell times increased across the two pre-boundary slides, but then remained relatively high across the coarse-level event boundary and the two post-boundary slides.
The next linear mixed-effects models focused on fine-level boundaries across slideshows viewed at 1-fps and 2-fps resolutions. We again included fixed effects of boundary region (two pre-fine, one pre-fine, fine, one post-fine, and two post-fine) and resolution and random intercepts for subjects and slideshows. For fine-level event boundaries, the boundary region effect was best characterized by a quadratic trend, β = −0.02, t(10,249.32) = −5.24, p < .001. The effect of resolution was not significant, β = 0.001, t(5.15) = 0.19, p = .86, nor did it interact with boundary region (p = .71). As depicted in Fig. 5, dwell times for both 1-fps and 2-fps resolutions increased across the two pre-boundary slides, peaking around the fine-level event boundary, and they began to decline thereafter.
Taken together, these findings largely replicate the results of Hard et al. (2011) and seem to provide further support for the information-optimization account of event processing. As predicted under this account, we observed that dwell times ramped up in advance of boundary slides at both the coarse and fine levels of hierarchical structure. In some cases, especially at the coarse level of structure, dwell times remained high after event boundaries. This, perhaps, reflects the relatively lower predictability junctures represented by coarse-level boundaries. As can be seen in Fig. 4, which includes additional pre- and post-boundary slides, dwell times did begin to decline at about four slides after the coarse-level boundary; perhaps this is the point at which event sequences become relatively more predictable and thus require less attention. For fine-level boundaries, dwell times ramped up before boundaries and began to decline not long after, perhaps because it takes less time to resolve the unpredictability that occurs at fine-level boundaries. Under this account, and as we observed, dwell times should begin declining shortly after the fine-level boundary occurs. While our results thus far generally favor the information-optimization account of event processing, we have not yet ruled out the physical change account. We next directly tested the extent to which physical change (operationalized here as slide-to-slide pixel change) was related to dwell-time patterns.
Physical change and dwell-time patterns
Across all slides in 1-fps and 2-fps slideshow versions, we calculated slide-to-slide pixel change using the algorithm outlined in Loucks and Baldwin (2009). Briefly, for each slide, this algorithm compares the RGB values of each pixel to the RGB values of the corresponding pixel in the immediately preceding slide and generates a change value. It was not possible to calculate the pixel change for the first slide in all slideshows (because there was no immediately previous slide); therefore, the first slide was dropped from these analyses.
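A minimal sketch of a slide-to-slide pixel-change computation is given below. It assumes each slide is a grid of (R, G, B) tuples and that the change value is the summed absolute difference across channels and pixels; the exact difference metric of Loucks and Baldwin (2009) is not specified in this section, so that choice and the function names are assumptions made for illustration.

```python
def pixel_change(prev_slide, slide):
    """Sum of absolute RGB differences between corresponding pixels.

    Assumption: summed absolute difference stands in for the change metric.
    """
    total = 0
    for prev_row, row in zip(prev_slide, slide):
        for (r0, g0, b0), (r1, g1, b1) in zip(prev_row, row):
            total += abs(r1 - r0) + abs(g1 - g0) + abs(b1 - b0)
    return total

def slideshow_pixel_change(slides):
    """Change value for every slide except the first, which has no predecessor."""
    return [pixel_change(a, b) for a, b in zip(slides, slides[1:])]

# Three toy 1 x 2 pixel "slides" (hypothetical values).
slides = [
    [[(0, 0, 0), (10, 10, 10)]],
    [[(0, 0, 0), (20, 10, 10)]],
    [[(5, 5, 5), (20, 10, 10)]],
]
changes = slideshow_pixel_change(slides)
```

The returned list is one element shorter than the slideshow, which mirrors the exclusion of each slideshow's first slide from the pixel-change analyses.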
It seems likely that the slide-to-slide physical change would be heightened for 1-fps over 2-fps slideshows simply due to differences in the rate at which information unfolds across these two levels of resolution. Our first pixel-change analysis directly tested the extent to which physical change differed across 1-fps and 2-fps slideshows. In this analysis, we ran a regression predicting the pixel change from the resolution (1 fps vs. 2 fps). As predicted, the slide-to-slide pixel change for 1-fps slideshows (M = 1.11 × 10⁷, SD = 4.92 × 10⁶) was significantly larger than for slides in the 2-fps slideshows (M = 8.84 × 10⁶, SD = 4.14 × 10⁶), β = 1.11 × 10⁶, t(956) = 7.32, p < .001. To ensure that the additional content present in 2-fps slideshow versions was not the sole reason for this difference in pixel change, we next asked whether these results held when we considered only matched slides (i.e., those identical across 1-fps and 2-fps slideshows). We again found that the average slide-to-slide pixel change was larger for 1-fps slideshow versions (M = 1.11 × 10⁷, SD = 4.92 × 10⁶) than 2-fps versions (M = 8.95 × 10⁶, SD = 4.25 × 10⁶), β = 1.06 × 10⁶, t(638) = 5.82, p < .001.
Additionally, under the physical change account, the degree to which physical change is enhanced for a given slide should be directly related to the degree to which attention is enhanced for that slide. To explore the relation between physical change and enhanced attention, we ran a regression predicting the mean per-slide residualized dwell time from pixel change. As predicted by the physical change account, the pixel change was positively related to the mean per-slide residualized dwell time, β = 2.35 × 10⁻⁹, t(956) = 9.83, p < .001.
Finally, to explore further the plausibility of a physical change account, we tested the extent to which the effects of slideshow resolution and boundary advantage held when controlling for pixel change. We ran another regression analysis predicting the mean residualized dwell time for all matched slides from slideshow resolution, whether the slide was a boundary or within-unit slide, and pixel change. Together, these predictors explained a significant amount of variance (12%) in the residualized dwell time, R² = .12, F(7, 632) = 12.78, p < .001. In a model that included pixel change and interactions with resolution, the boundary advantage effect remained significant, β = 0.012, t(632) = 3.12, p = .002, as did the effect of pixel change, β = 2.04 × 10⁻⁹, t(632) = 5.44, p < .001. The effect of resolution was not significant, β = −0.001, t(632) = −0.32, p = .75. Thus, in contrast to the physical change account, the boundary advantage effect accounted for variance in residualized dwell time above and beyond the effects of resolution and pixel change. However, there was a significant interaction between the boundary advantage effect and pixel change, β = −7.7261 × 10⁻¹⁰, t(632) = −2.06, p = .04. Further exploration revealed that pixel change was not strongly related to dwell times for boundary slides in general, r(154) = .15, p = .06, 95% CI [−.01, .30], nor at the coarse, r(26) = .19, p = .33, 95% CI [−.19, .53], and fine, r(126) = .15, p = .09, 95% CI [−.02, .32], levels of structure, but it was positively correlated with dwell times for within-unit slides, r(482) = .36, p < .001, 95% CI [.28, .44]. None of the additional two- or three-way interactions were significant (p’s > .06). Thus, it appears that pixel change does contribute to dwell-time patterns to some extent, but is more predictive of dwell times for within-unit slides. On the one hand, these analyses provide support for the physical change account. On the other hand, they cast doubt on the idea that physical change alone provides a complete account of attentional patterns in event processing. All in all, the outcome of the pixel-change analyses generally replicated the findings from Hard et al. (2011) described earlier.
Overall dwell times predict recall of slideshow content
Immediately after viewing each of the slideshows, participants were given a free recall task: they were asked to list all the actions that they remembered from the slideshows they had just viewed. Because it has previously been demonstrated that removing boundary content negatively impacts event memory (e.g., Newtson & Engquist, 1976; Schwan & Garsoffky, 2004), we first explored the extent to which these results were replicated when activity was viewed via the novel dwell-time procedure. For all participants (N = 124), we compared their recall score for the slideshow they viewed at 1-fps resolution to their recall score for the slideshow viewed at 2-fps resolution. Four participants were missing recall data from one slideshow each due to their failure to follow the instructions (these participants reported actions that had occurred in the practice slideshow rather than the coffee (2), boot (1), or tidying (1) slideshow). We ran a linear mixed-effects model including a fixed effect of resolution (1 fps vs. 2 fps) and random intercepts for subjects and slideshows. On average, participants recalled 47% (SD = 22%) of the listed actions from each of the slideshows. In contrast to our predictions, however, participants’ recall for items from slideshows they viewed at 2 fps (M = 50%, SD = 22%) did not significantly differ from slideshows they viewed at 1 fps (M = 44%, SD = 22%), β = −0.03, t(5.93) = −0.55, p = .61.
Additionally, previous research suggests that observers’ skill in explicit segmentation tasks (e.g., Zacks et al., 2006) and the extent to which they implicitly increase attention to event boundaries (e.g., Hard et al., 2011) positively predicts event memory. In light of such findings, we anticipated that viewers’ log10 dwell times, specifically dwell times for event boundaries, would positively predict the number of activities they were able to recall. This prediction was confirmed; viewers’ average log10 dwell time for boundary slides in a given slideshow was significantly positively correlated with the number of activities recalled from that slideshow, r(242) = .37, p < .001, 95% CI [.26, .47]. However, the log10 dwell time to within-unit slides was also predictive of the number of actions recalled, r(242) = .37, p < .001, 95% CI [.26, .48], suggesting that this relation was not unique to boundary slides. As well, when controlling for log10 dwell time to within-unit slides, the correlation between boundary slides and number of actions recalled was no longer observed, r(242) = .01, p = .88. Therefore, this package of findings seems best described as a correlation between average per-slide log10 dwell time and recall, r(242) = .37, p < .001, 95% CI [.26, .48]. Perhaps unsurprisingly, participants who, on average, displayed longer per-slide attention to a given slideshow were able to recall more actions from the activity stream depicted in that slideshow.
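The logic of controlling for within-unit dwell time can be illustrated with the standard first-order partial correlation formula. The sketch below is not the authors' analysis code, and the correlation between boundary and within-unit dwell times is not reported in this section, so the value 0.97 used for it is purely hypothetical.

```python
import math

def partial_corr(r_xy, r_xz, r_yz):
    """Correlation between x and y after partialling z out of both."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# x = boundary dwell time, y = recall, z = within-unit dwell time.
r_boundary_recall = 0.37   # reported
r_within_recall = 0.37     # reported
r_boundary_within = 0.97   # hypothetical; not reported in this section

partial = partial_corr(r_boundary_recall, r_boundary_within, r_within_recall)
```

When boundary and within-unit dwell times are themselves very highly correlated, little unique boundary-related variance remains, and the partial correlation with recall shrinks toward zero, which is the pattern described above.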
Of interest in our final analysis was the degree to which dwell time was predictive of event memory above and beyond the effects of resolution. In this regression analysis, we predicted participants’ memory score for each slideshow that they viewed from their average log10 dwell time, the resolution at which they viewed that slideshow, and their interaction. We found that these predictors together explained a significant amount of variance (18%) in recall scores, R2 = .18, F(3, 240) = 18.01, p < .001. Again, participants’ log10 dwell times were predictive of free recall scores, β = 0.43, t(240) = 6.90, p < .001. As in our earlier analysis comparing recall for 1-fps versus 2-fps slideshows, we found that resolution was not a significant predictor of recall scores, β = −0.14, t(240) = −0.81, p = .42. There was also no interaction between log10 dwell time and resolution, β = 0.03, t(240) = 0.54, p = .59, suggesting that log10 dwell time was a positive predictor of recall across both levels of resolution. In sum, participants’ average per-slide log10 dwell time for a given slideshow appears to have been uniquely predictive of the number of actions they recalled and, again, this effect was robust to differences in resolution.