Detection Response Task
The DRT data reflect responses to the onset of the red light in the peripheral detection task. RT was measured to the nearest millisecond. Hit rate was calculated from responses to the red light: a response was coded as a “hit” and a non-response was coded as a “miss”. The RT and hit rate data for the DRT are plotted as a function of Age × Condition in Figs. 1 and 2, respectively, and as a function of Session × Condition in Figs. 3 and 4, respectively. The IVIS data are broken down by whether participants were actively engaged in an IVIS interaction, denoted by a suffix of “-1” (i.e., IVIS-1), or were operating the vehicle without a concurrent secondary-task interaction, denoted by a suffix of “-0” (i.e., IVIS-0).
DRT performance is inversely related to the workload in the driving task (e.g., Strayer et al., 2015a, 2015b). Thus, increases in RT and decreases in hit rate are indicative of an increase in the workload experienced by the driver. As can be seen in Figs. 1 and 2, RT increased and hit rates decreased as a function of the experimental condition and the age of the participant. Additionally, the age-related differences observed in the single-task baseline were amplified in the IVIS-1 condition. Figs. 3 and 4 show that RT decreased and hit rates increased with practice, and that the practice effects observed in the single-task baseline were larger in the IVIS-1 condition.
MANOVA
The DRT data were first analyzed using a 3 (Age) × 10 (Vehicle; Footnote 3) × 4 (Condition) × 2 (Session) MANOVA that included both reaction time and hit rate as dependent variables (Footnote 4). There were significant main effects of Age (F(4, 454) = 14.07, p < 0.001, η2 = 0.110), Condition (F(6, 1362) = 164.86, p < 0.001, η2 = 0.421), and Session (F(2, 226) = 48.61, p < 0.001, η2 = 0.301). In addition, Condition interacted with Age (F(12, 1362) = 8.15, p < 0.001, η2 = 0.067), Vehicle (F(54, 1362) = 1.53, p = 0.009, η2 = 0.057), and Session (F(6, 1362) = 12.54, p < 0.001, η2 = 0.052). None of the other effects were significant.
Reaction time
The reaction time data from the DRT were analyzed using a 3 (Age) × 10 (Vehicle) × 4 (Condition) × 2 (Session) Analysis of Variance (ANOVA). The analysis revealed significant main effects of Age (F(2, 227) = 31.71, p < 0.001, η2 = 0.218), Condition (F(3, 681) = 894.29, p < 0.001, η2 = 0.798), and Session (F(1, 227) = 84.65, p < 0.001, η2 = 0.272). In addition, as can be seen in Fig. 1, Condition interacted with Age (F(6, 681) = 15.75, p < 0.001, η2 = 0.122), Vehicle (F(27, 681) = 2.00, p = 0.002, η2 = 0.074), and Session (F(3, 681) = 16.62, p < 0.001, η2 = 0.068). None of the other effects were significant.
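As a quick consistency check, the effect sizes reported for the reaction time ANOVA can be recovered from the F ratios and their degrees of freedom. The sketch below assumes that η2 denotes partial eta squared, which the text does not state explicitly; the helper name `partial_eta_squared` is ours and is used for illustration only.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Recover partial eta squared from a univariate F ratio.

    Uses the identity eta_p^2 = (F * df_effect) / (F * df_effect + df_error).
    Assumption: the reported effect size is partial eta squared.
    """
    numerator = f_value * df_effect
    return numerator / (numerator + df_error)


# F ratios and degrees of freedom copied from the reaction time ANOVA above.
reaction_time_effects = {
    "Age": (31.71, 2, 227),
    "Condition": (894.29, 3, 681),
    "Session": (84.65, 1, 227),
}

for effect, (f_value, df_effect, df_error) in reaction_time_effects.items():
    eta = partial_eta_squared(f_value, df_effect, df_error)
    print(f"{effect}: partial eta squared = {eta:.3f}")
```

Under this assumption, the recomputed values for the three main effects agree with the reported effect sizes to three decimal places. The identity applies to univariate F tests and is not meant for the multivariate tests reported in the MANOVA.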
Hit rate
The hit rate data from the DRT task were analyzed using a 3 (Age) × 10 (Vehicle) × 4 (Condition) × 2 (Session) ANOVA. The analysis revealed significant main effects of Age (F(2, 227) = 17.87, p < 0.001, η2 = 0.136), Condition (F(3, 681) = 129.15, p < 0.001, η2 = 0.363), and Session (F(1, 227) = 53.61, p < 0.001, η2 = 0.191). In addition, as shown in Fig. 2, Condition interacted with Age (F(6, 681) = 7.94, p < 0.001, η2 = 0.065), Vehicle (F(27, 681) = 1.87, p = 0.005, η2 = 0.069), and Session (F(3, 681) = 12.44, p < 0.001, η2 = 0.052). None of the other effects were significant.
The Condition × Age interaction (Figs. 1 and 2) indicates that the costs of the IVIS interactions were greater for older adults than for younger adults. RT increased with age by 18.2 % in the single-task condition and by 29.7 % in the IVIS-1 condition. A similar analysis of hit rates found a decrease with age of 2.1 % in the single-task condition and of 8.5 % in the IVIS-1 condition. This interaction was also found in the log transformed RT data (F(2, 227) = 10.17, p < 0.001, η2 = 0.071) (Footnote 5).
The Condition × Session interaction (Figs. 3 and 4) indicates that the effects of practice were more pronounced when participants were using the IVIS than when they were in the single-task condition. RT decreased with practice by 3.5 % in the single-task condition and by 9.0 % in the IVIS-1 condition. A similar comparison on hit rates found an increase with practice of 1.4 % in the single-task condition and of 5.7 % in the IVIS-1 condition.
The MANOVA reported above found a significant Condition × Vehicle interaction that requires additional analysis to interpret. The interaction could be due to difficulties operating the vehicle, workload differences with the IVIS interactions, or a combination of the two. To discriminate between these interpretations, we created a composite of the DRT measures obtained from the IVIS-1 condition by taking the weighted average of the z-transformed RT and hit rate data. This transformation was necessary because RT and hit rate are on different scales; the resulting score was centered at 0 with a standard deviation of 1 (Footnote 6). A similar procedure was used to compute the single-task and OSPAN composite scores.
Figure 5 presents the average of the z-transformed DRT data plotted as a function of Vehicle in the IVIS-1 condition. For comparison, the z-transformed DRT data for the single-task and OSPAN conditions are also included in Fig. 5. To better understand the Condition × Vehicle interactions reported above, a between-subject ANOVA was performed on the z-transformed data from the IVIS-1 condition. This analysis revealed a significant effect of Vehicle (F(9, 247) = 2.03, p = 0.037). By contrast, a similar analysis on the z-transformed data from the single-task and OSPAN conditions failed to yield a significant effect of Vehicle (F(9, 247) = 0.16, p = 0.320 and F(9, 247) = 1.04, p = 0.411, respectively). Moreover, an Analysis of Covariance (ANCOVA) on the data obtained in the IVIS-1 condition that held constant any performance differences in the single-task condition found a significant effect of the IVIS voice-based interaction (F(9, 246) = 3.29, p < 0.001, η2 = 0.107). This pattern is important because it indicates that there were significant differences in DRT performance when participants were interacting with the IVIS, but there were no significant differences in DRT performance when they were just driving the vehicle. That is, the workload differences were associated with the IVIS voice-based interaction and not with driving the vehicle by itself.
Residual costs
A surprising finding was that off-task performance in the DRT differed significantly from single-task performance. Given that drivers were not engaged in any secondary-task activities during the off-task portions of the drive, this difference suggests that there were residual costs that persisted after the IVIS interaction had terminated. Figure 6 presents the residual costs plotted as a function of the time since the IVIS interaction terminated; the solid blue line reflects the best-fitting power function:
$$ f(x)=a\,{x}^{-0.1878} $$
(1)
where a = exp(6.6915), with R² = 0.98.
The residual costs took a significant amount of time to dissipate. In fact, the data indicate that off-task performance reflects a mixture of “single-task” performance and the persistent costs associated with the IVIS interactions from the immediately preceding on-task period. One way to contextualize these residual costs is to use the logic underlying the workload scale developed by Strayer et al. (2015b) to estimate, based solely on the DRT RT data, when the cognitive workload would reach a category 4 level (approximately 6 s), when it would reach a category 3 level (approximately 9 s), and when it would reach a category 2 level (approximately 15 s). The residual costs are notable because of their magnitude, their duration, and the fact that they are obtained even when there is no active switch to perform another task. They appear to reflect the lingering act of disengaging from the cognitive processing associated with the IVIS task and fully re-engaging attention to the driving environment. From a practical perspective, the data indicate that a driver who terminates a voice-based interaction is not immediately free of impairment. Indeed, the residual costs are at a category 3 level of impairment 9 s after the IVIS interaction had terminated. At the 25 mph speed limit in our study, drivers would have traveled more than the length of a football field during this 9-s interval.
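The fitted power function and the distance estimate above can be illustrated with a short calculation. This is a minimal sketch, not the analysis code used in the study: the function names are hypothetical, x is assumed to be in seconds, and the units of f(x) are simply those of Fig. 6, which are not assumed here.

```python
import math

# Parameters taken from Eq. 1.
A = math.exp(6.6915)   # scale parameter a
EXPONENT = -0.1878     # power-law exponent


def residual_cost(seconds_since_interaction: float) -> float:
    """Value of the fitted power function at a given time since the interaction ended.

    Assumption: the argument is in seconds and must be positive.
    """
    if seconds_since_interaction <= 0:
        raise ValueError("time since the interaction must be positive")
    return A * seconds_since_interaction ** EXPONENT


def distance_traveled_ft(speed_mph: float, seconds: float) -> float:
    """Distance covered at a constant speed, in feet (5280 ft per mile)."""
    feet_per_second = speed_mph * 5280 / 3600
    return feet_per_second * seconds


# Fitted values at the three time points mentioned in the text.
for t in (6, 9, 15):
    print(f"t = {t:>2} s: f(t) = {residual_cost(t):.1f}")

# Distance covered at the 25 mph speed limit during the 9-s interval.
print(f"distance in 9 s at 25 mph: {distance_traveled_ft(25, 9):.0f} ft")
```

At a constant 25 mph a vehicle covers 330 ft (110 yards) in 9 s, which is consistent with the football-field comparison; the fitted curve decreases monotonically across the 6, 9, and 15 s time points.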
Subjective
Subjective assessments of workload were made using the NASA TLX and supplementary questions on the intuitiveness and complexity of the IVIS systems. The NASA TLX is a subjective measure of workload that is composed of six sub-scales that range from 0 (no workload) to 21 (very high workload). As illustrated in Fig. 7, the subjective workload increased as a function of Condition. Figure 8 shows that the subjective workload decreased with practice. Figure 9 documents an increase in the subjective workload as a function of age of the participant.
NASA TLX
The six sub-scales of the NASA TLX were analyzed using a 3 (Age) × 10 (Vehicle) × 3 (Condition) × 2 (Session) ANOVA. There were significant main effects of Vehicle (F(54, 1362) = 1.47, p = 0.016, η2 = 0.055), Condition (F(12, 900) = 72.10, p < 0.001, η2 = 0.490), and Session (F(6, 222) = 28.51, p < 0.001, η2 = 0.435). In addition, Condition interacted with Age (F(24, 1880) = 2.46, p < 0.001, η2 = 0.032), Vehicle (F(108, 2724) = 1.60, p < 0.001, η2 = 0.060), and Session (F(12, 900) = 3.36, p < 0.001, η2 = 0.043). The Session × Vehicle (F(54, 1362) = 1.36, p = 0.045, η2 = 0.051) and the Session × Age × Vehicle interactions were also significant (F(108, 1362) = 1.30, p = 0.025, η2 = 0.094). None of the other effects were significant.
As with the DRT analysis described above, there was a significant Condition × Vehicle interaction in the TLX data, which we followed up with an analysis parallel in structure to the one used for the DRT. We created a composite of the TLX measures obtained from the IVIS condition by taking the weighted average of the z-transformed sub-scales of the TLX. A similar transform was used to compute the single-task and OSPAN composite scores. As with the DRT analysis, a main effect of Vehicle, holding constant any differences in single-task workload ratings (using ANCOVA), would indicate that the subjective workload of the IVIS interactions differed as a function of Vehicle.
Figure 10 presents the average of z-transformed TLX data plotted as a function of Vehicle in the IVIS condition. For comparison, performance in the single-task and OSPAN conditions is also included in Fig. 10. A between-subject ANOVA that compared the z-transformed data from the IVIS condition found a significant effect of Vehicle (F(9, 247) = 3.08, p = 0.002). A similar analysis on the z-transformed data found a significant effect of Vehicle in the single-task condition (F(9, 247) = 1.96, p = 0.044; a post-hoc analysis found that the Mazda, Hyundai, and Nissan vehicles had higher NASA TLX workload ratings than the VW and Equinox) but not in the OSPAN condition (F(9, 247) = 1.21, p = 0.292). An ANCOVA on the data from the IVIS condition that held constant the performance differences observed in the single-task condition also found a significant effect of IVIS interaction (F(9, 246) = 2.93, p = 0.003, η2 = 0.097). As with the DRT data reported above, this pattern is important because it indicates that there were significant differences in TLX performance when participants were interacting with the IVIS, over and above any differences of just driving the different vehicles. That is, the workload differences were associated with the IVIS voice-based interaction over and above any differences associated with operating the vehicle by itself.
Intuitiveness
Participants were also asked to rate how intuitive, usable, and easy it was to use the IVIS. Figure 11 presents the intuitiveness ratings for the IVIS voice-based interactions on a 21-point scale where 1 reflects “not at all” and 21 reflects “very much”. A 3 (Age) × 10 (Vehicle) × 2 (Session) split-plot ANOVA found that intuitiveness varied as a function of Vehicle (F(9, 227) = 4.55, p < 0.001, η2 = 0.153). None of the other effects were significant (all other p values >0.14).
Complexity
Participants were also asked to rate how complex, difficult, and confusing it was to use the IVIS. Figure 12 presents the complexity ratings for the IVIS voice-based interactions on a 21-point scale where 1 reflects “not at all” and 21 reflects “very much”. A 3 (Age) × 10 (Vehicle) × 2 (Session) split-plot ANOVA found that complexity ratings varied as a function of Age (i.e., older adults found the IVIS interactions to be more complex; F(2, 227) = 6.21, p = 0.002, η2 = 0.052) and Vehicle (F(9, 227) = 4.82, p < 0.001, η2 = 0.160). None of the other effects was significant (all other p values >0.07).
Video analysis
Three performance measures were derived from analysis of the video. These were task completion time, glance location, and practice frequency.
Task completion time
Task completion time, the average task duration for the six tasks in the IVIS condition, is plotted in Fig. 13. The data were analyzed using a mixed model ANOVA with Age and Vehicle as between-subject factors and Session as a within-subject factor. As can be seen in the figure, the time to complete the task varied as a function of Vehicle (F(9, 174) = 20.16, p < 0.001, η2 = 0.511). Additionally, there was a main effect of Session (F(1, 174) = 11.8, p < 0.001, η2 = 0.063) and the Vehicle × Session interaction was also significant (F(9, 174) = 2.04, p < 0.05, η2 = 0.095). However, the main effect of Age was not significant (F(2, 174) = 1.26, p = 0.285, η2 = 0.014) and neither were any of the interactions with Age. These data suggest that practice reduced task completion time but that the size of this improvement depended on the vehicle. Not surprisingly given the long time on task, participants in the Nissan showed the greatest improvement in task completion time, improving from 37.6 s on average during the first session to 28.5 s during the final session; however, even after practice, the duration of the interactions with the Nissan was longer than that of any of the other vehicles, whether in the first session or after practice.
Glance location
The percentage of time that drivers spent looking forward, down, and scanning mirrors was analyzed using a 3 (Age) × 10 (Vehicle) × 3 (Condition) × 2 (Session) × 3 (Glance location) mixed model ANOVA with Age and Vehicle as between-subject factors and Session and Condition as within-subject factors. Glance location is plotted as a function of Condition in Fig. 14. There was a significant main effect of Glance location (F(2, 412) = 1247, p < 0.001, η2 = 0.868) and the Glance location × Condition interaction was also significant (F(4, 824) = 10.81, p < 0.001, η2 = 0.057). None of the other effects were significant.
A simplified 3 (Glance location) × 3 (Condition) repeated measures ANOVA was conducted on the data presented in Fig. 14. Both the main effect of Glance location (F(2, 856) = 126.17, p < 0.001, η2 = 0.983) and the Glance location × Condition interaction were significant (F(4, 856) = 52.9, p < 0.001, η2 = 0.198). Performing the voice tasks with the IVIS led to a reduction in the glance time to the mirrors and forward roadway with a corresponding increase in glance time to the dashboard displays. Similarly, performing the OSPAN task led to a reduction in the glance time to mirrors and dashboard displays with a corresponding increase in glance time to the forward roadway. Given that the primary task was to drive the vehicle and that the secondary tasks were primarily cognitive in nature, it is not surprising that drivers maintained their eyes on the forward roadway the majority of the time.
Practice frequency
The frequency of practice was coded from the video recordings. On average, participants completed a total of 21.8 (standard deviation = 19.3) voice-based tasks during the 5 days that they had the vehicle. As shown in Fig. 15, the age of the participant did not affect the amount of practice with the IVIS voice systems. Participants gained the most practice with the music selection task, followed by the contact-calling task, then the number dialing task. The practice data were analyzed using a 3 (Age) × 4 (Practiced item: contact call, number dial, music selection, other) ANOVA. The main effect of Practiced item was significant (F(3, 522) = 41.1, p < 0.001), but neither the main effect of Age nor the Age × Practiced item interaction was significant.
The cognitive distraction scale
A primary objective of the current research was to compare the cognitive workload associated with IVIS interactions in ten different vehicles as drivers of different ages completed common IVIS voice-based tasks (e.g., voice dialing, music selection, etc.). Because the different dependent measures collected in this research were recorded on different scales, each was transformed to a standardized score. This involved z-transforming the two DRT measures and the six NASA TLX measures to have a mean of 0 and a standard deviation of 1. The standardized scores were then weighted and summed to provide an aggregate measure of cognitive distraction. Weighting was equally assigned to the DRT and TLX so that each accounted for 50 % of the collective rating. Finally, the aggregated standardized scores were scaled such that the non-distracted single-task driving condition anchored the low-end (category 1) and the OSPAN task anchored the high-end (category 5) of the cognitive distraction scale. For each of the other tasks, the relative position compared to the low and high anchors provided an index of the cognitive workload for that activity when concurrently performed while operating a motor vehicle. The four-step protocol for developing the cognitive distraction scale is listed below.
- Step 1: For each dependent measure, the standardized scores were computed using zi = (xi − X)/SD, where X refers to the overall mean and SD refers to the pooled standard deviation.
- Step 2: For each dependent measure, the standardized condition averages were computed by collapsing across subjects.
- Step 3: The standardized averages were computed with an equal weighting for secondary (i.e., DRT performance) and subjective (i.e., NASA TLX performance) metrics. The measures within each metric were also equally weighted. For example, the secondary task workload metric was comprised of an equal weighting of the measures DRT RT and DRT hit rate.
- Step 4: The standardized mean differences were range-corrected so that the non-distracted single-task condition had a rating of 1.0 and the OSPAN task had a rating of 5.0:
$$ {\mathrm{X}}_{\mathrm{i}}=\left(\frac{{\mathrm{X}}_{\mathrm{i}}-\mathit{min}}{\mathit{max}-\mathit{min}}\right)\times 4.0+1 $$
(5)
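The four steps above can be sketched in a few lines of code. This is an illustrative sketch under stated assumptions, not the scoring code used in the study: all function names and input values are hypothetical, the overall mean and pooled standard deviation are passed in rather than estimated, and hit rate is sign-reversed so that larger values always indicate higher workload (an assumption the protocol does not spell out).

```python
from statistics import mean


def standardize(x: float, overall_mean: float, pooled_sd: float) -> float:
    """Step 1: z = (x - overall mean) / pooled SD."""
    return (x - overall_mean) / pooled_sd


def condition_average(z_values: list[float]) -> float:
    """Step 2: collapse standardized scores across subjects."""
    return mean(z_values)


def aggregate(z_rt: float, z_hit: float, z_tlx_subscales: list[float]) -> float:
    """Step 3: equal weighting within and between the DRT and TLX metrics.

    Assumption: hit rate is sign-reversed so that higher means more workload.
    """
    drt_metric = mean([z_rt, -z_hit])
    tlx_metric = mean(z_tlx_subscales)
    return 0.5 * drt_metric + 0.5 * tlx_metric


def range_correct(x: float, low_anchor: float, high_anchor: float) -> float:
    """Step 4: map the single-task anchor to 1.0 and the OSPAN anchor to 5.0."""
    return ((x - low_anchor) / (high_anchor - low_anchor)) * 4.0 + 1.0


# Hypothetical reaction times (ms) for three subjects in one IVIS condition,
# with a hypothetical overall mean of 600 ms and pooled SD of 100 ms.
z_rt_ivis = condition_average(
    [standardize(rt, 600.0, 100.0) for rt in (610.0, 650.0, 630.0)]
)

# Hypothetical condition-level standardized averages, for illustration only.
single_task = aggregate(-0.8, 0.6, [-0.7] * 6)
ospan = aggregate(0.9, -0.7, [0.8] * 6)
ivis = aggregate(z_rt_ivis, -0.2, [0.2] * 6)

for name, score in (("single-task", single_task), ("IVIS", ivis), ("OSPAN", ospan)):
    print(f"{name}: {range_correct(score, single_task, ospan):.2f}")
```

In this sketch the minimum and maximum in the range correction are taken to be the single-task and OSPAN aggregates, which matches the description of those conditions as the low and high anchors of the scale.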
The cognitive workload scale for the different conditions is presented in Fig. 16. By definition, the single-task condition had a rating of 1.0 and the OSPAN condition had a rating of 5.0. The rating for the different IVIS interactions varied considerably across vehicles, from a low rating of 2.37 to a high of 4.57. Instances where the pairwise difference between adjacent systems was significant are denoted by an asterisk in Fig. 16.