Humans usually do not execute actions solely within a single effector system, but constantly coordinate actions across several output domains (e.g., manual, oculomotor, vocal, or other motor control units). This is also true for navigating a car in traffic, where we need to steer the car manually and control the speed with our foot while scanning the environment or controlling the driving trajectory with our eyes. The integration and coordination of behavior across several specific output modalities in response to complex action control demands has been studied under the umbrella term “cross-modal action” (Huestegge & Hazeltine, 2011). However, while cross-modal action has been thoroughly studied in basic research (see below), surprisingly little is known about these phenomena in more applied settings; for example, regarding the interaction of manual steering and gaze behavior while driving. Most existing studies in this domain focused on the interaction of steering and gaze behavior while driving through curves (Land & Lee, 1994; Wilson, Stephenson, Chattington, & Marple-Horvat, 2007) and showed that drivers look in the direction in which they are about to steer (see also Pfeuffer, Kiesel, & Huestegge, 2016, for basic mechanisms underlying anticipatory oculomotor control), whereas other studies analyzed the cross-modal interplay between verbal tasks (i.e., talking on a phone) and (spatial) visual attention while driving (e.g., Atchley, Dressel, Jones, Burson, & Marshall, 2011). However, these results are not informative regarding the issue of gaze–steering interaction when drivers respond to suddenly appearing objects, which require a swift response from the driver. Specifically, to date little is known about how long it takes the driver to show a steering reaction to sudden hazardous events and how gaze behavior (spatially and temporally) interacts with steering behavior. The present study intends to fill this gap by providing a first approach to this topic.
In the following, we will summarize previous findings about driver reactions to suddenly appearing objects, and about basic mechanisms of cross-modal action control with a focus on the interaction of gaze behavior and manual responses.
Driver reactions to suddenly appearing objects
Previous studies suggest that drivers often tend to prefer braking over steering to avoid an accident, sometimes even in situations where steering might be the better collision avoidance strategy (Adams, 1994; Adams, Flannagan, & Sivak, 1995; Dozza, 2013; Malaterre, Ferrandez, Fleury, & Lechner, 1988; Malaterre & Lechner, 1990; McGehee et al., 1999; Wiacek & Najm, 1999). However, in those cases where drivers try to avoid an obstacle by steering, they tend to swerve in the moving direction of the obstacle (Scanlon, Kusano, & Gabler, 2015; Weber & Färber, 2015; Wiacek & Najm, 1999). According to Malaterre and Lechner (1990) and Weber and Färber (2015), the time to collision (TTC) strongly influences the likelihood of an evasive steering maneuver. Specifically, their data suggest a U-shaped relationship, in that the likelihood of an evasive maneuver increases if the TTC is smaller than 1.8 s or greater than 2.2 s.
So far, most studies reporting steering reaction time (RT) have been conducted in laboratory settings (Müsseler, Aschersleben, Arning, & Proctor, 2009; Proctor, Wang, & Pick, 2004; Wang, Proctor, & Pick, 2003) and report data on manual steering responses to spatially presented visual or auditory stimuli. More specifically, they have focused on the influence of spatial compatibility between the position of the imperative stimulus and the required steering reaction. Spatial compatibility is a special form of stimulus–response (S–R) compatibility, which is thought to be based on the spatial association between a stimulus and the required response (Proctor & Vu, 2006). In general, high (spatial) S–R compatibility is assumed to yield a shorter RT than low (spatial) S–R compatibility. This prediction does not only hold for task-relevant stimulus features (e.g., pressing a right key in response to a stimulus on the right), but also for task-irrelevant stimulus features; for example, when a stimulus requiring a right key press is displayed on the right (vs left) side of a display although the stimulus presentation location is not task relevant (“Simon effect”; Simon, 1969; Simon & Rudell, 1967).
In the context of traffic psychology, both Müsseler et al. (2009) and Wang et al. (2003) found that spatially compatible stimuli (i.e., stimuli that are spatially compatible with the required steering response) lead to faster steering RTs compared to spatially incompatible stimuli, with an advantage of approximately 60 ms (Wang et al., 2003) or 15 ms (Müsseler et al., 2009), while steering RTs ranged from 425 to 625 ms. Interestingly, however, Müsseler et al. (2009) also showed that spatial compatibility effects can be reversed in specific driving situations: participants responded about 21 ms faster when they steered away from a pedestrian (spatially incompatible response) compared to steering toward a pedestrian (spatially compatible response). They assumed that such a reversed compatibility effect is the consequence of stimulus valence, which might be associated with a corresponding response (e.g., avoidance of stimuli with “negative”, hazard-related valence and approach toward stimuli with “positive” valence). This indicates that mere spatial S–R compatibility might only exert a negligible influence in driving situations, where the driver has specific goals and intentions which are associated with the driving task and depend on experience and training (e.g., avoid collisions while driving). Consequently, the avoidance response, although spatially incompatible with the stimulus, might eventually be carried out faster because of its congruency with the drivers’ current goals in the context of a highly trained driving task (i.e., goal congruency might override spatial compatibility).
The reversed compatibility effect is also interesting because in real driving situations the avoidance reaction might be more complex than the approach reaction. Specifically, to select the appropriate reaction, drivers must perceive and identify the stimulus first, a process that, although humans can in principle covertly shift attention without any eye movements, should usually be accompanied by corresponding oculomotor behavior (Findlay & Gilchrist, 2003). In the case of a stimulus requiring the driver to steer toward it (approach), drivers can focus on this stimulus during the entire process of preparing and executing the steering response and might not have to divert their visual attention to plan and control the required vehicle trajectory. In this case, the required direction of the attentional (i.e., typically oculomotor) and manual response is the same (cross-modal congruency). In the case of a stimulus requiring the driver to steer away (avoid), however, drivers need to attend to the stimulus as well (in order to respond to it), but also need to manually initiate the required evasive vehicle trajectory in the opposite direction. In this case, the spatial direction of (initial) visual processing and manual responding differs (cross-modal incongruency). How oculomotor and manual control interact in such situations is an as yet unresolved issue.
Cross-modal action control: basic mechanisms of the interaction of gaze and manual actions
In basic cognitive research, studies focusing on cross-modal action control have shown that gaze behavior can interfere with concurrent manual actions (Hodgson, Müller, & O’Leary, 1999; Huestegge & Adam, 2011; Huestegge & Koch, 2009), a finding that can be considered a special case of performance costs associated with multitasking (Pashler, 1994). For example, Hodgson et al. (1999) and Huestegge and Koch (2009) showed that both manual and gaze RTs are delayed under simultaneous gaze and manual response demands (see also Huestegge, 2011; Huestegge, Pieczykolan, & Koch, 2014; Tibber, Grant, & Morgan, 2009). These dual-response costs (i.e., additional time to initiate a response in the context of another response vs alone) typically increase when one or both responses are incompatible with the stimulus or incompatible with each other (see Huestegge, 2011). Such performance costs for incompatible responses across effector systems occur not only for explicitly instructed saccades in the context of manual responses, but also for incidental (not explicitly instructed) saccades (Huestegge & Adam, 2011). Across all of these basic research studies, manual responses were associated with greater dual-response costs and incompatibility effects than gaze responses. Additionally, the oculomotor response was typically initiated earlier than the manual response under both compatible and incompatible response requirements. Further studies have replicated these findings and suggested a general prioritization of oculomotor responses over manual responses (oculomotor dominance; Huestegge & Koch, 2013; Pieczykolan & Huestegge, 2014). Hodgson et al. (1999) assumed that manual and visual responses might share a common attentional representation of space, which could be an explanation of why especially spatially incompatible motor programs of these different effector systems interfere so strongly with each other.
In sum, these studies indicate a possible response delay, especially for manual responses, when incompatible oculomotor and manual responses have to be executed in close temporal proximity in a driving situation, and a general tendency to execute saccades prior to manual responses. However, these predictions have not yet been tested, and it is possible that the general goal to optimize vehicle control may yield quite different control strategies in complex driving situations than under more basic task demands in reduced laboratory settings.
The present study
With this study, we wanted to analyze the interaction of gaze and manual steering responses while driving as an applied example of cross-modal action control. More specifically, we tried to narrow down the gap between laboratory research and applied research by developing a more realistic yet still standardized experimental setting in a dynamic driving simulator, which incorporates the experimental setup in a driving task. A general question is whether typical result patterns found in reduced, highly controlled basic research settings reflect fundamental cognitive mechanisms that thereby also generalize to more complex real-life tasks (e.g., navigating through traffic). In contrast, it is possible that many effects found in basic research setups are absent (or at least strongly modulated) in more complex, realistic environments due to a strong adaptivity and flexibility of cognitive sets based on changing situations, task demands, and goals.
First, as already mentioned, Huestegge and Adam (2011) showed that even mere incidental (as opposed to explicitly instructed) saccades during the preparation of concurrent manual responses significantly affected manual RTs in terms of spatial congruency effects: if the direction of the saccade was compatible with the position of the required manual response (cross-modal action congruency), the manual RT was faster compared to trials in which the saccade was incompatible (cross-modal action incongruency). Assuming that such basic laboratory findings generalize to more complex settings and goals, one would expect that such cross-modal action incongruency should also negatively affect steering RT in driving situations, especially when drivers have to steer away from (avoid) a suddenly appearing object (assuming that the driver should usually gaze at the object in order to process it). Conversely, in situations where the driver must steer toward (approach) an object, faster steering RTs would be expected due to cross-modal action congruency (in addition to spatial S–R congruency).
However, in a driving context, such an effect might also be counteracted by the particular goals of a driver. The results of Müsseler et al.’s (2009) study indicated that it is also reasonable to expect no such difference in steering RTs because in both (approach and avoidance) conditions, spatial congruency emerges between the steering movement and the current goal in terms of the intended driving direction. This more high-level, conceptual spatial congruency may override any low-level cross-modal action (and S–R) congruency effects (and probably also affect gaze behavior in general, see following hypothesis).
Second, based on the many laboratory studies already referred to, one might expect that after stimulus onset an oculomotor response should usually be initiated prior to the manual response due to the well-known general latency differences between these effector systems that were observed regardless of any particular S–R or R–R congruency conditions (oculomotor dominance in cross-modal dual-response tasks; Huestegge & Koch, 2013). In a driving context, it is furthermore reasonable to assume that the stimulus must be perceived (typically by looking at it) before an appropriate manual steering response will be initiated.
However, it is also possible that covert attention (i.e., without observable eye movements) is used for stimulus processing, especially in avoidance situations where the manual steering trajectory should be planned away from the stimulus, and where oculomotor control may predominantly be devoted to planning and monitoring an optimal vehicle trajectory. Thus, it is entirely possible that avoidance situations are associated with fewer saccades toward the stimulus (i.e., saccades dedicated to stimulus decoding), but more (slightly delayed) saccades in the intended steering direction (for planning/monitoring the steering responses). Since these saccades would serve a different goal, they might well be executed after the initial manual steering response. Thus, again, we consider it possible that a robust finding from basic research (here: with respect to cross-modal response sequence effects) might not generalize (or at least be strongly modulated) in a complex driving situation due to different underlying goals of the subject.
Third, we wanted to explore whether the combination of both response demands (i.e., mixing of approach and avoidance demands within one experimental block of trials) has an influence on the response pattern and the coordination of oculomotor and manual responses. In real-life situations, drivers might often have to choose between several response options (e.g., steer toward or away from an upcoming target) and to select the correct option (in relation to current task goals) within a short timeframe. Therefore, it is important to implement an environment which requires the driver to dynamically adapt his/her response strategies in accordance with the specific stimulation conditions. Based on findings from basic cognitive research (e.g., Kiesel et al., 2010), we expected typical performance effects associated with experimental blocks involving task switches compared to blocks involving a constant task; that is, an increase in RTs and error rates for blocks requiring both approach and avoidance tasks. Again, we reasoned that even though detrimental effects of task switching are well replicated across innumerable basic laboratory experiments, it is important to explicitly test to what extent corresponding effects can also be relevant in more complex real-life task demands such as driving a vehicle.