New automobiles provide a number of features that allow motorists to perform a variety of secondary tasks unrelated to the primary task of driving. Many of these in-vehicle information system (IVIS) interactions are complex and multimodal. For example, to select a music option a driver might push a button on the steering wheel, issue a voice-based command, view options presented on a liquid crystal display (LCD) located in the center stack, and then select an option using the touchscreen controls on the LCD. Complex multimodal IVIS interactions such as this may distract motorists from the primary task of driving by diverting the eyes, hands, and/or mind from the roadway (Regan, Hallett, & Gordon, 2011; Regan & Strayer, 2014).
Driver distraction arises from a combination of sources (Ranney, Garrott, & Goodman, 2000; Strayer, Watson, & Drews, 2011). Impairments to driving can be caused by a competition for visual information processing, for example when motorists take their eyes off the road to perform IVIS interactions. Impairments can also come from manual interference, as in cases where drivers take their hands off the steering wheel to perform a task. Finally, cognitive sources of distraction occur when attention is withdrawn from the processing of information necessary for the safe operation of a motor vehicle. These sources of distraction can operate independently, but they are not mutually exclusive, and therefore different IVIS interactions can result in impairments from one or more of these sources. In fact, few if any tasks are “process pure” (Jacoby, 1991) and instead often place demands on multiple resources (Wickens, 2008).
Driver distraction is caused by a diversion of attention from the primary task of operating a motor vehicle (Regan et al., 2011; Regan & Strayer, 2014) resulting in impairments to driving. In some cases, this may involve the concurrent performance of a task that is unrelated to driving (e.g., placing a cell phone call). In other cases, this may involve mis-prioritization of the component tasks associated with operating the vehicle (e.g., attending to a navigational display instead of attending to the forward roadway). It is useful to consider two theoretical accounts for why such interference occurs (e.g., Bergen, Medeiros-Ward, Wheeler, Drews, & Strayer, 2014).
On the one hand, domain-general accounts attribute dual-task interference to a competition for general computational or attentional resources that are distributed flexibly between the various tasks (e.g., Kahneman, 1973; Navon & Gopher, 1979). When two tasks require more resources than are available, performance on one or both of the tasks is impaired. This class of models suggests a transitive property of interference, such that if two tasks, A and B, exhibit dual-task interference and two tasks, B and C, exhibit dual-task interference, then the combination of tasks A and C should also exhibit dual-task interference so long as none of the tasks has reached a data limit.
On the other hand, domain-specific accounts attribute dual-task interference to competition for specific computational resources. The more similar two tasks are, in terms of specific processing resources, the greater the interference, or “code conflict” (e.g., Navon & Miller, 1987), or “crosstalk” (e.g., Pashler, 1994). In essence, two tasks that compete for the same neural hardware cannot be performed at the same time without impairments to one or both tasks. In the context of driving, for example, the visual system cannot simultaneously process visual information from the forward roadway and information presented on a center stack display or heads-up display. As drivers performed different IVIS tasks, we looked for evidence of domain-general interference, evidence of domain-specific interference, and situations where both accounts would be supported.
Prior research has evaluated workload when motorists performed activities unrelated to driving. For example, the Crash Avoidance Metrics Partnership (CAMP; Angell et al., 2006) investigated the effects of twenty-two different secondary tasks requiring a combination of visual, manual, and cognitive resources on driving performance. Some of the visual-manual tasks required participants to tune the radio or adjust fan speed using physical buttons located in the center console. Auditory-vocal tasks required drivers to listen to a book-on-tape or sports broadcasts and answer related questions. Distinctive driver-performance profiles suggested that task-induced driver workload was multimodal and characterized by different combinations of visual, manual, and cognitive components. In particular, relative to a baseline driving condition, visual-manual tasks were associated with a decrease in the detection of driving-related events and greater time spent glancing away from the forward roadway. By contrast, auditory-vocal tasks tended to focus the driver’s gaze on the forward roadway and resulted in better lane position maintenance - a phenomenon referred to as cognitive tunneling (see Medeiros-Ward, Cooper, & Strayer, 2014; Victor, Harbluk, & Engström, 2005).
In a series of studies, Reimer, Mehler, and colleagues (McWilliams, Reimer, Mehler, Dobres, & McAnulty, 2015; Mehler et al., 2015; Reimer et al., 2014) tested real-world infotainment systems. In Mehler et al. (2015), participants drove two vehicles (2013 Chevrolet Equinox, 2013 Volvo XC60) and interacted with the infotainment systems (MyLink and Sensus, respectively). A combination of ocular measures, subjective workload ratings, and behavioral metrics (e.g., task completion time) was adopted to examine levels of driver workload associated with completing contact calling and navigation-related tasks. Results showed that using visual-manual systems resulted in longer and more frequent off-road glances than auditory-vocal systems. Self-report measures of workload for voice interfaces were higher than those for visual-manual systems. However, the task completion time data showed mixed results, with benefits of auditory-vocal systems observed with MyLink disappearing when drivers used the Sensus system.
Our prior research provided a comprehensive assessment of the cognitive workload associated with voice-based interactions, an activity known to divert attention from the driving task and lead to cognitive distraction (Strayer et al., 2015; Strayer, Cooper, Turrill, Coleman, & Hopman, 2016, 2017b). We used converging methods to provide a systematic analysis of the workload associated with different voice-based interactions. This included collecting a variety of performance measures (e.g., primary-task measures, secondary-task measures, subjective measures, and physiological measures) to provide a fine-grained assessment of variations in driver workload as drivers performed different tasks (e.g., calling and dialing, audio entertainment, text messaging). In Strayer et al. (2016), 257 subjects participated in a week-long evaluation of the IVIS interactions in one of 10 different model-year 2015 automobiles. After an initial assessment of cognitive workload, participants took the vehicle home for 5 days and practiced using the system. At the end of the 5 days of practice, they returned and the workload of these IVIS interactions was reassessed. The cognitive workload was found to be moderate to high and was associated with the intuitiveness and complexity of the system and the time it took participants to complete the interaction. Importantly, practice did not eliminate the interference. In fact, interactions that were difficult on the first day were still relatively difficult to perform after a week of practice. There were also long-lasting residual costs after the IVIS interactions had terminated. We suggested that these higher levels of workload should serve as a caution because voice-based interactions can be cognitively demanding and ought not to be used indiscriminately while operating a motor vehicle.
Task duration is central to the issue of workload assessment. A simple but elegant argument for the importance of task duration has been outlined by Shutko and Tijerina (2006). They suggest that evaluation of task duration is critical not because it reflects a cumulative effect of load, but because it represents the time over which an unexpected event might occur. Using a simple exposure-based model, they argue that all else being equal, a task that takes twice as long to complete will result in twice the potential risk of an adverse event. Other models suggest a cascading negative effect of task duration on situation awareness (e.g., Fisher & Strayer, 2014; Strayer & Fisher, 2016).
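Expressed as a simple exposure model (the notation here is ours; Shutko and Tijerina did not present a formula), the argument is that expected risk scales linearly with task duration:

$$E[\text{adverse events}] \approx \lambda \, T,$$

where $\lambda$ is an (assumed constant) rate of unexpected, safety-relevant events per unit time and $T$ is the task duration, so that a task taking $2T$ carries twice the expected exposure of a task taking $T$, all else being equal.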
There is no clear consensus on what constitutes an acceptable interaction time for a secondary task. Problematically, the issue is confounded by research suggesting that secondary-task performance is often sensitive to whether testing is completed in a static (i.e., not driving) or dynamic (i.e., driving) environment (Young et al., 2005), the age of participants (McWilliams, Reimer, Mehler, Dobres, & Coughlin, 2015), and performance characteristics of the primary or secondary tasks (Tsimhoni, Yoo, & Green, 1999). Because of the visual demands associated with driving, visual secondary tasks generally take longer to complete when performed concurrently with driving. Additionally, due to natural aging processes, older adults generally take longer to perform tasks than younger adults. These issues aside, a number of organizations have provided guidance on what constitutes an acceptable secondary task duration (e.g., Driver Focus-Telematics Working Group, 2006; Japan Automobile Manufacturers Association, 2004; National Highway Traffic Safety Administration, 2013).
For example, the National Highway Traffic Safety Administration (NHTSA, 2013) has issued a set of voluntary guidelines for visual/manual tasks suggesting that tasks should require no more than 12 s of total eyes off road time (TEORT) to complete. This 12-s rule is based on the societally acceptable risk associated with tuning an analog in-car radio. In the visual occlusion method specified by NHTSA to evaluate visual-manual tasks, motorists view the driving environment for a total of 12 s and have their vision occluded for a total of 12 s, alternating in 1.5-s on/off intervals. When assessed with the visual occlusion methodology, the NHTSA guidelines therefore provide an implicit maximum of 24 s of total task time (i.e., 12 s of shutter open time + 12 s of shutter closed time). While intended for visual/manual tasks, these guidelines provide a reasonable upper limit for multimodal task durations of any type.
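Under the stated 1.5-s on/off cadence, the implicit maximum can be unpacked as follows (the per-interval breakdown is our own illustration of the arithmetic):

$$12\,\mathrm{s} \div 1.5\,\mathrm{s} = 8 \text{ viewing intervals}, \qquad 8 \times (1.5\,\mathrm{s\ open} + 1.5\,\mathrm{s\ closed}) = 24\,\mathrm{s} \text{ of total task time}.$$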
An important prerequisite for duration-based measures of secondary task performance is the definition of a task. We use the definition provided by Burns, Harbluk, Foley, and Angell (2010), which is derived from the Alliance of Automobile Manufacturers, International Standards Organization (ISO), and JAMA guidelines. Burns et al. suggest that a task can be defined as a sequence of inputs leading to a goal at which the driver will normally persist until the goal is reached. However, we differentiate between continuous and discrete tasks, which are shaped by different performance goals. Fundamental to discrete secondary tasks is a performance goal with a finite beginning and end state (e.g., changing the audio source, dialing a phone number, calling a contact, entering a destination into a navigation unit, etc.). Conversely, continuous tasks are characterized by performance maintenance over an indefinite period of time, often with no clear termination state (Schmidt & Lee, 2005) (e.g., conversing via a cell phone, listening to music, following route guidance, etc.). Given the nature of discrete tasks, a failure to account for task duration during assessment provides an incomplete picture of distraction potential.
Research questions
An important knowledge gap concerns the workload associated with performing complex multimodal IVIS interactions. What are the visual and cognitive demands associated with different modes of IVIS interactions (e.g., auditory/vocal interactions versus visual/manual interactions)? To what degree do the different IVIS task types (e.g., audio entertainment, calling and dialing, text messaging, navigation, etc.) place differential demands on visual and cognitive resources? Vehicles clearly differ in their configuration and layout, but do they differ in the visual and cognitive demands of IVIS interactions? Are there tradeoffs for IVIS interactions performed with one task or mode of interaction versus another? For example, auditory/vocal inputs may have lower levels of visual demand than issuing commands using a visual/manual touchscreen, but the time taken to perform the interaction may be longer for the former than the latter. Surprisingly little is known about how these complex multimodal IVIS interactions impact the driver’s workload. Given the ubiquity of these systems, the current research sought to address three interrelated questions concerning this knowledge gap.
First, are some task types more impairing than others? IVIS interactions support a variety of secondary tasks that are unrelated to the primary task of driving. Some of these interactions may be considered sufficiently impairing that they are locked out by the automaker when the vehicle is in motion (e.g., social media interactions are locked out by most automakers). However, not all secondary tasks are equivalent in distraction potential (e.g., Strayer et al., 2015). They differ in terms of task goals (e.g., play a song, send a text, place a call, etc.). Tasks differ in duration, ranging from a few seconds to a few minutes to complete, with greater distraction potential associated with greater task duration (e.g., Burns et al., 2010). Tasks also differ in the way they are implemented and may be performed using different modes of interaction (i.e., tasks may be easier to perform using one mode of interaction than another). Tasks may also be performed using a streamlined “one-shot” interaction, or via a series of interactive steps. The current research assessed which task types were most distracting. It is possible that some tasks may be too demanding to be enabled when the vehicle is in motion, regardless of the mode of interaction.
Second, are some modes of interaction more distracting than others? In many instances, a task can be performed using auditory/vocal commands, visual/manual interactions, or, as in the example discussed above, a hybrid combination of both auditory/vocal and visual/manual interactions. If the workload associated with one mode of interaction differs from another, the differences may be offset by the time it takes to perform the interaction. For example, a visual/manual touchscreen interaction may divert the driver’s eyes from the roadway while an auditory/vocal interaction may keep the eyes on the road; however, if an auditory/vocal interaction takes longer to perform than the visual/manual interaction, any benefits of the former may not be realized. Moreover, just because auditory/vocal interactions tend to keep the eyes on the road does not guarantee that drivers will see what they are looking at (Strayer, Drews, & Johnston, 2003; Strayer & Fisher, 2016). The current research is designed to provide an objective benchmark for the level of distraction caused by different modes of IVIS interaction.
Third, are IVIS interactions easier to perform in some vehicles than others? A trip to the automobile dealer’s showroom will quickly illustrate that vehicles differ in the features, functions, and type of human-machine interface of the IVIS. Are these differences in the IVIS merely cosmetic, or do the differences result in differential workload to perform the same IVIS functions? Vehicles differ in the number and complexity of button interactions on the steering wheel; the size, resolution, and functions supported on the center stack LCD; the manual buttons on the center stack and their configuration; and other unique modes of interaction (e.g., heads-up displays, gesture controls, rotary dials, writing pads, etc.). Moreover, vehicles often provide more than one way to perform a task. There are often cross-modal interactions wherein the task is initiated using one mode of interaction (e.g., voice commands) and then transitions to another mode of interaction (e.g., touchscreen interactions). Some IVIS interactions are ubiquitous (e.g., calling and dialing and audio entertainment), whereas others are supported by one automaker but not another (e.g., destination entry for a navigation system while the vehicle is in motion). The current research compared the IVIS interactions supported by different automakers to determine if they differ in the workload associated with their use. If there are differences in the overall demand of the IVIS interactions, what are the bases for the differences?
Experimental overview
Our prior research found that it was necessary for the driver to be driving the vehicle in order to accurately assess the concurrent workload associated with IVIS interactions - that is, dynamic testing rather than static testing (cf. SAE J2365, 2016). This was true for IVIS interactions with high levels of cognitive demand, such as using voice commands to interact with the IVIS (e.g., Strayer et al., 2015, 2016, 2017b). For cognitively demanding interactions, the task of driving added a constant increase to the estimates of driver workload (e.g., the time to perform a purely voice-based IVIS interaction in a moving vehicle was increased by a constant over the time to perform the same interaction in a stationary vehicle). This problem was exacerbated for IVIS interactions with high levels of visual demand, such as making selections on a center stack touchscreen, where the time to perform an IVIS interaction in a moving vehicle was an increasing linear function of the time to perform the same interaction in a stationary vehicle. Consequently, all estimates of driver workload in the current research were obtained while participants were driving the vehicle and engaged in IVIS interactions or driving in one of the control conditions (i.e., a dynamic testing method). The driving route we used was a low-density residential section of roadway with a speed limit of 25 MPH, chosen for the relatively modest driving demands imposed by the roadway.
To properly scale the driver’s workload while interacting with the IVIS, several control conditions were required. First, a single-task driving baseline was needed to estimate the workload of the driver when they were driving the vehicle without the additional workload imposed by the IVIS interactions. This single-task baseline controls for any differences between participants and the workload associated with driving the different vehicles. The single-task baseline anchors the low end of the cognitive and visual workload estimates derived in our research.
To scale cognitive demand, a high workload cognitive task was selected that could be performed in the same way by all participants in all vehicles. The high workload referent task we used was an N-back task (e.g., Mehler, Reimer, & Dusek, 2011; Zhang, Angell, Pala, & Shimonomoto, 2015) in which a pre-recorded series of numbers ranging from 0 to 9 was presented at a rate of one digit every 2.25 s. Participants were instructed to say out loud the number that was presented two trials earlier in the sequence. The N-back task places a high level of cognitive demand on the driver without imposing any visual demands. Using the single-task baseline and N-back referent provided a way to standardize the cognitive demand of the different IVIS interactions. That is, after controlling for any differences in workload associated with different vehicles using the single-task baseline, IVIS interactions can be directly compared to the N-back task to provide an objective measure of the cognitive demand associated with their performance.
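The scoring logic of the 2-back referent can be illustrated with a short sketch (the digit range, presentation rate, and N = 2 follow the description above; the function and data structures are illustrative and not the software actually used):

```python
# Minimal sketch of 2-back scoring: a spoken response is correct when it
# matches the digit presented two positions earlier in the sequence.
N = 2                       # the "N" in N-back
PRESENTATION_RATE_S = 2.25  # one digit every 2.25 s, as described above

def score_n_back(presented, responses, n=N):
    """presented: digits shown in order; responses: the digit spoken after
    each presentation (None if no response was given)."""
    correct = scorable = 0
    for i, response in enumerate(responses):
        if i < n:            # no target exists for the first n digits
            continue
        scorable += 1
        if response == presented[i - n]:
            correct += 1
    return correct / scorable if scorable else float("nan")

# Example: after the digits 3, 7, 1, 7 the correct responses on the third
# and fourth presentations are 3 and 7, respectively.
print(score_n_back([3, 7, 1, 7], [None, None, 3, 7]))  # -> 1.0
```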
To scale the visual demand of the IVIS interactions, a high workload visual referent task was selected that could be performed in the same way by all participants in all vehicles. The high workload task we used was a variant of the ISO TS 14198 Surrogate Reference Task (SuRT; Engström & Markkula, 2007; Mattes, Föhl, & Schindhelm, 2007; Zhang et al., 2015) that required participants to use their finger to touch the location of target items (larger circles) presented in a field of distractors (smaller circles) on an iPad Mini tablet computer that was mounted in a similar position in all the vehicles. Immediately after touching the location of the target, a new display was presented with a different configuration of targets and distractors. The trial sequence would not advance until the correct location was touched on the screen. The SuRT task, illustrated in Fig. 1, is based on a feature search (e.g., Treisman & Gelade, 1980) for the size of the larger circle, and the participant’s response is to identify the location of the target (as opposed to a present/absent response).
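A SuRT-style trial can be sketched as follows (a hypothetical illustration; the circle counts, radii, and screen size are our assumptions rather than the ISO TS 14198 parameters or the settings used in this study):

```python
import random

# Sketch of a SuRT-style trial: one larger target circle among smaller
# distractor circles; the trial advances only when the touch lands on the target.
SCREEN_W, SCREEN_H = 1024, 768    # assumed display size in pixels
N_DISTRACTORS = 20                # assumed distractor count
R_DISTRACTOR, R_TARGET = 15, 25   # assumed radii in pixels

def make_trial():
    """Return all circles to draw (positions may overlap in this sketch) and the target."""
    distractors = [(random.uniform(0, SCREEN_W), random.uniform(0, SCREEN_H), R_DISTRACTOR)
                   for _ in range(N_DISTRACTORS)]
    target = (random.uniform(0, SCREEN_W), random.uniform(0, SCREEN_H), R_TARGET)
    return distractors + [target], target

def touch_is_on_target(touch_xy, target):
    tx, ty, r = target
    x, y = touch_xy
    return (x - tx) ** 2 + (y - ty) ** 2 <= r ** 2

# The experiment loop would redraw a new configuration of circles only after
# touch_is_on_target(...) returns True for the participant's touch.
circles, target = make_trial()
```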
Drivers were instructed to perform the SuRT as a secondary task while giving the driving task the highest priority. The SuRT places a high level of visual demand on the driver because they must look at the display in order to locate the targets and then touch the display to indicate their response. Using the single-task baseline and SuRT referent provides a way to standardize the visual demand of the different IVIS interactions. That is, after controlling for any differences in workload associated with different vehicles using the single-task baseline, IVIS interactions can be directly compared to the SuRT task to provide an objective measure of the visual demand associated with their performance.
The N-back referent task induces a high level of cognitive demand and does not present any visual information for the driver to look at. However, it is well known that high levels of cognitive demand often alter the visual scanning behavior of the driver (e.g., see Strayer & Fisher, 2016 for a review). That is, the N-back task may impair what the driver sees. Similarly, the SuRT referent induces a high level of visual demand by requiring the driver to look at a touchscreen to locate a target amongst distractors. However, in addition to taking the driver’s eyes off the roadway, the SuRT task also requires visual attention to perform. Pilot testing of the SuRT task found a visual search slope of approximately 20 msec/item, a value above the upper threshold associated with automatic visual search (e.g., Schneider & Shiffrin, 1977; Shiffrin & Schneider, 1977). Thus, the SuRT task has high visual/manual demand and modest cognitive demand.
The current research used converging performance measures to benchmark the workload of the IVIS interactions. This included the collection of subjective estimates of workload from the driver using the NASA Task Load Index (NASA-TLX; Hart & Staveland, 1988) at the end of testing each IVIS interaction.
We also assessed driver workload using the Detection Response Task (DRT), an ISO protocol for measuring the attentional effects of cognitive load (ISO 17488, 2015). The DRT procedure involves presenting a simple stimulus (e.g., a changing light or vibrating buzzer) every 3–5 s and requiring the driver to respond when they detect these events by pressing a microswitch (button) attached to the driver’s left thumb against the steering wheel. That is, the DRT is a simple response time (RT) task that is performed concurrently with other activities (e.g., driving). As the workload of driving and/or the IVIS interactions increases, the reaction time to the DRT stimulus increases and the likelihood of detecting the DRT stimulus (i.e., the hit rate) decreases (e.g., Strayer et al., 2015, 2016, 2017b). The DRT has proven to be very sensitive to dynamic changes in the driver’s workload (e.g., Strayer et al., 2017a). The DRT provides an objective assessment of the driver’s workload associated with different IVIS interactions, with minimal interference in the performance of the driving task (see Strayer et al., 2013; Castro, Cooper, & Strayer, 2016; Palada, Strayer, Neal, Ballard, & Heathcote, 2017).
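The core DRT loop reduces to scheduling a stimulus at a random 3–5-s interval and logging reaction time and hits; the sketch below illustrates that logic (the 2.5-s response window, function names, and hardware hooks are our assumptions, not quoted from ISO 17488):

```python
import random
import time

RESPONSE_WINDOW_S = 2.5  # assumed cutoff; no button press within this window counts as a miss

def run_drt_block(n_trials, stimulus_on, stimulus_off, wait_for_press):
    """stimulus_on/stimulus_off toggle the light or buzzer; wait_for_press(timeout)
    returns the reaction time in seconds, or None if the window elapses.
    These hooks stand in for the actual hardware interface."""
    reaction_times, hits = [], 0
    for _ in range(n_trials):
        time.sleep(random.uniform(3.0, 5.0))        # inter-stimulus interval of 3-5 s
        stimulus_on()                               # light changes color or buzzer vibrates
        rt = wait_for_press(timeout=RESPONSE_WINDOW_S)
        stimulus_off()
        if rt is not None:
            reaction_times.append(rt)
            hits += 1
    return reaction_times, hits / n_trials          # RTs for detected stimuli, and hit rate
```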
We used two variants of the DRT in our research. The first variant was a vibrotactile DRT, in which a vibrating buzzer that feels similar to a vibrating smartphone was attached to the participant’s left collarbone, and a microswitch was attached to a finger on the driver’s left hand so that it could be depressed against the steering wheel when they detected the vibration. The vibrotactile DRT provides a sensitive measure of the participant’s cognitive load as they perform different IVIS interactions. As the cognitive demand increases, the RT to the vibrotactile DRT increases. These RT differences were calibrated using the single-task baseline and N-back referent to anchor the workload of the IVIS interactions in different vehicles.
Specifically, evaluation of the cognitive demand of any IVIS interaction involved an initial subtraction of any differences between vehicles and/or participants obtained in the single-task baseline (i.e., this defined the relative demand associated with an IVIS interaction). This relative cognitive demand was then compared to that of the N-back task (i.e., the difference between the N-back task and the single-task baseline defined the relative cognitive demand of the N-back task). The Cognitive Demand Ratio (CDR) was defined as the ratio of the relative cognitive demand of an IVIS interaction to the relative cognitive demand associated with the N-back task.
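Expressed in terms of the vibrotactile DRT reaction times (RT) described above, the ratio is:

$$\mathrm{CDR} = \frac{RT_{\mathrm{IVIS}} - RT_{\mathrm{single}}}{RT_{N\text{-back}} - RT_{\mathrm{single}}},$$

where $RT_{\mathrm{single}}$, $RT_{\mathrm{IVIS}}$, and $RT_{N\text{-back}}$ are the DRT reaction times obtained in the single-task baseline, during the IVIS interaction, and during the N-back referent, respectively.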
The CDR provides a standardized metric for comparison across IVIS interactions (both within a vehicle and between vehicles). For example, if an IVIS interaction has a CDR that is between 0 and 1, the cognitive demand of that interaction is greater than the single-task baseline and less than the N-back task. If an IVIS interaction has a CDR greater than 1, then the cognitive demand of that IVIS interaction exceeds the N-back task. Furthermore, if the CDR of an IVIS interaction in one vehicle is greater than the same IVIS interaction in another vehicle, the two vehicles differ in the cognitive demand of that interaction, with the former being greater than the latter.
The second variant of the DRT used a light that was projected onto the windshield in the driver’s line of sight as they looked at the forward roadway. When the DRT light changed from orange to red, the participant was instructed to press the microswitch attached to their finger when they detected the changing light (the same response that was used for the vibrotactile DRT). The visual DRT provides a sensitive measure of the participant’s visual load as they perform different IVIS interactions. As the visual demand increases, the detection of the changing light decreases (i.e., a decrease in hit rate). These hit rate differences were calibrated using the single-task baseline and SuRT task to anchor the workload of the IVIS interactions in different vehicles.
Evaluation of the visual demand of any IVIS interaction involved an initial subtraction of any differences between vehicles and/or participants obtained in the single-task baseline (i.e., this defined the relative visual demand associated with an IVIS interaction). This relative visual demand was then compared to that of the SuRT task (i.e., the difference between the SuRT referent and the single-task baseline defined the relative visual demand of the SuRT task). The Visual Demand Ratio (VDR) was defined as the ratio of the relative visual demand of an IVIS interaction to the relative visual demand associated with the SuRT task.
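Using the visual DRT hit rates (HR) described above as the measure of visual demand, the ratio can be written as:

$$\mathrm{VDR} = \frac{HR_{\mathrm{single}} - HR_{\mathrm{IVIS}}}{HR_{\mathrm{single}} - HR_{\mathrm{SuRT}}},$$

where $HR_{\mathrm{single}}$, $HR_{\mathrm{IVIS}}$, and $HR_{\mathrm{SuRT}}$ are the hit rates to the DRT light in the single-task baseline, during the IVIS interaction, and during the SuRT referent, respectively.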
As with CDR, VDR provides a standardized metric for comparison across IVIS interactions (both within a vehicle and between vehicles). For example, if an IVIS interaction has a VDR that is between 0 and 1, the visual demand of that interaction is greater than the single-task baseline and less than the SuRT task. If an IVIS interaction has a VDR greater than 1, then the visual demand of that IVIS interaction exceeds the SuRT task. Furthermore, if the VDR of an IVIS interaction in one vehicle is greater than the same IVIS interaction in another vehicle, the two vehicles differ in the visual demand of that interaction, with the former being greater than the latter.
In order to capture the effects of task duration, our measures of momentary cognitive, visual, and subjective task demand were combined into a metric of overall demand and scaled by task completion time. Tasks that took longer than 24 s resulted in an upward biasing of overall demand, whereas tasks that took less than 24 s resulted in a downward bias. Of the metrics that fed into the overall workload metric, total task time may be the most amenable to modification through design. Our investigation found that factors such as menu depth, display clutter, system responsivity, dialog verbosity, cellular connection stability, and server performance all play a significant role in task duration (e.g., Biondi, Getty, Cooper, & Strayer, 2018). The time required for a user to complete a task can be reduced through careful performance evaluation, resulting in a reduction in exposure duration.
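One plausible reading of this scaling, using the 24-s criterion from the NHTSA guidelines as the pivot, is a simple linear weighting (the functional form below is our illustration, not a formula quoted from the study):

$$\mathrm{Overall\ demand} = \bar{D} \times \frac{t_{\mathrm{task}}}{24\,\mathrm{s}},$$

where $\bar{D}$ is the combined momentary cognitive, visual, and subjective demand and $t_{\mathrm{task}}$ is the mean task completion time; tasks taking longer than 24 s are biased upward and shorter tasks downward, as described above.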