Distraction in diagnostic radiology: How is search through volumetric medical images affected by interruptions?

Williams, Lauren H.; Drew, Trafton

doi:10.1186/s41235-017-0050-y

Original article
Open access
Published: 20 February 2017


Cognitive Research: Principles and Implications volume 2, Article number: 12 (2017)


Abstract

Observational studies have shown that interruptions are a frequent occurrence in diagnostic radiology. The present study used an experimental design in order to quantify the cost of these interruptions during search through volumetric medical images. Participants searched through chest CT scans for nodules that are indicative of lung cancer. In half of the cases, search was interrupted by a series of true or false math equations. The primary cost of these interruptions was an increase in search time with no corresponding increase in accuracy or lung coverage. This time cost was not modulated by the difficulty of the interruption task or an individual’s working memory capacity. Eye-tracking suggests that this time cost was driven by impaired memory for which regions of the lung were searched prior to the interruption. Potential interventions will be discussed in the context of these results.

Significance

Radiologists are frequently interrupted during the interpretation of medical images. The current research provides the first attempt to quantify the effect of these interruptions using an experimental design. In our study, we found that interruptions lead to a significant increase in task completion time. Through the use of eye-tracking, we were able to determine that this inefficiency is driven by impaired memory for previously searched areas of the image. In natural settings, these results translate to longer patient turnaround times and increase the cost of providing and receiving healthcare. By establishing a causal link between interruptions and productivity loss, we aim to encourage healthcare providers to reduce unnecessary interruptions in radiology reading rooms. In addition, our eye-tracking results hint at potential interventions, such as eye-tracking feedback, that may help lower the cost of unavoidable interruptions.

Background

Interruptions have been identified as a prevalent and potentially harmful occurrence in radiology reading rooms. A recent workflow analysis found that radiologists are interrupted once every 12.1 min on average during regular business hours (Ratwani, Wang, Fong, & Cooper, 2016). These interruptions are primarily in the form of medical questions during in-person or phone-call interactions. During after-hours radiology, interruptions may be even more common. At many academic institutions, after-hours phone calls are handled by a single radiology resident (Balint et al., 2014). A recent study found that on-call radiologists receive an average of 72 phone calls during a typical 12-h overnight shift (Yu, Kansagra, & Mongan, 2014). This rate of interruption equates to a 63% chance of being interrupted by a phone call for every 10 min spent reading a computed tomography (CT) scan. In a separate analysis of after-hours reading environments, increases in phone-call volume were associated with an increase in the number of errors made by radiology residents (Balint et al., 2014). Similarly, interruptions have been linked to medical errors in other tasks, such as dispensing medication (Westbrook, Woods, Rob, Dunsmuir, & Day, 2010).

The significance of a potential link between interruptions and medical errors is difficult to overstate. In 2000, the Institute of Medicine’s To Err is Human report implicated medical errors in almost 100,000 deaths and over 1 million injuries in America each year (Kohn, Corrigan, & Donaldson, 2000). Although there is no official count of these casualties, more recent estimates have placed medical errors as the third leading cause of death in America (Makary & Daniel, 2016). In addition to injury and loss of life, medical errors are a substantial financial burden to society. Diagnostic errors are the leading cause of successful medical malpractice litigation and result in the highest payout per case (Tehrani et al., 2013; Whang, Baker, Patel, Luk, & Castro, 2013). Compared to doctors from other specialties, radiologists are disproportionately named as defendants in these claims (Physician Insurers Association of America, 2004; Whang et al., 2013).

Interruptions have also been linked to decreased productivity in the workplace. Between 1999 and 2010, the institutional workload for radiologists increased tenfold (McDonald et al., 2015). After adjusting for increases in staff over this period, the number of images that need to be interpreted increased from 2.9 to 10.1 per minute. Faced with this increasing workload, radiologists are under substantial pressure to maintain productivity. In the workflow analysis by Ratwani et al. (2016), the mean time spent handling each interruption was 2.4 min. In 10.6% of these interruptions, the secondary task was completely unrelated to medicine. It is likely that all interruptions have negative consequences on productivity, but these unrelated interruptions might be particularly problematic. Task-switching research has consistently demonstrated that interleaving multiple tasks takes more time than completing each of the tasks separately (for a review, see Monsell, 2003). This time cost is typically on the order of milliseconds in laboratory tasks, but these results have been replicated on a larger scale in a number of applied settings. For example, task-completion time doubled when telecommunications workers were interrupted by a secondary task (Eyrolle & Cellier, 2000).

In recent years, advancements in medical imaging technology have dramatically increased the size and complexity of the radiologist’s workload. Two-dimensional (2D) film images, such as chest radiographs, have largely been replaced by volumetric images such as CT or positron emission tomography (PET) scans. These images consist of hundreds, and sometimes even thousands, of images stacked together to form three-dimensional (3D) representations of the body (Andriole et al., 2011). By most accounts, these imaging techniques have led to positive patient outcomes (Mathieson, Mayo, Staples, & Müller, 1989; National Lung Screening Trial Research Team, 2011). However, the effects of interrupting radiologists during these large, complex images are unknown. To the extent that medical images are analogous to laboratory visual search paradigms, we can gain insight from the literature on interrupted visual search. Spatial memory, which we define as memory for locations in a visual scene, is thought to play an important role in the successful resumption of an interrupted visual search task. Primary support for this idea comes from the rapid resumption literature, which demonstrates that interrupted search is resumed more quickly than a new search can be initiated (Lleras, Rensink, & Enns, 2005). These results suggest that memory for the scene is retained throughout the interruption and is therefore able to facilitate task resumption. However, the interruptions used in these paradigms are typically brief, unfilled time delays. More complex interruptions, such as secondary search tasks, have been shown to disrupt memory for visual search arrays after only a few seconds, and this memory seems to be completely eradicated by interruptions with longer durations (Shen & Jiang, 2006).
These results are consistent with known constraints on suspected mechanisms of memory in visual search, such as inhibition of return, which is limited in both capacity and duration (for a review, see Wang & Klein, 2010). Although spatial memory in visual search seems to be relatively fragile, the ability to remember where you were in a task has also proven to be a key component in resuming interrupted computer tasks (Ratwani, Andrews, McCurry, Trafton, & Peterson, 2007; Ratwani & Trafton, 2008). In large volumetric images, it may be difficult to maintain these important spatial representations of the task during an interruption. This impaired memory could have a negative impact on task completion time and error rate by causing regions of the image to be unnecessarily revisited or completely overlooked after an interruption.

In the human-computer interaction literature, Altmann and Trafton’s (2002) Memory for Goals model has been a useful framework for understanding and predicting the effects of interruption. According to this model, the success of task resumption is dependent on the relative activation level of goal-relevant information in memory. When a primary task is interrupted, relevant information about the task must be temporarily stored in memory. In order to resume the task, this goal-relevant information must be retrieved from memory. This model predicts that goals with greater activation will be retrieved more quickly and have a smaller time cost. The strength of goal activation is constrained by three factors: interference (e.g. strength of irrelevant goals), strengthening (e.g. goal rehearsal), and priming (e.g. cues in the environment). This model makes many predictions about interruptions that have been successfully tested in the literature (Altmann & Trafton, 2004; Chung & Byrne, 2008; Hodgetts & Jones, 2006; Monk, Boehm-Davis, Mason, & Trafton, 2004; Monk, Trafton, & Boehm-Davis, 2008; Trafton, Altmann, & Brock, 2005; Trafton, Altmann, Brock, & Mintz, 2003). Although the majority of these findings have been in human–computer interaction tasks, this model may also be an effective framework for understanding the effects of interruptions in visual search tasks.

Radiologists work in an increasingly complex and highly disruptive environment. Despite indications that interruptions might be both harmful and frequent in this environment, the effects of these interruptions have yet to be examined using an experimental design. The purpose of the current research is to quantify the cost of interruptions in diagnostic radiology in terms of error rate and search time. In addition to these behavioral measures, eye-tracking will be used to gain a qualitative understanding of how interruptions affect search through volumetric images. We anticipate that interruptions will lead to an increase in errors and search time. Based on the existing literature, these effects are expected to be driven by impaired memory for which regions of the image were searched prior to the interruption.

Experiment 1

Materials and methods

Participants

Twenty-nine students from the University of Utah participated in the study for course credit or $10 an hour. The experimental design was approved by the Institutional Review Board and all participants provided informed consent. Data from three participants were discarded prior to analysis: two for uncorrected vision impairments and one for completing the majority of trials too quickly to reach the experimental condition. Twenty-six participants were included in the data analysis (16 women, mean age = 21.6 years, age range = 18–42 years). Due to difficulty with calibration, eye-tracking data are missing for one participant. Participants were not medically trained and had no experience interpreting medical images prior to the study.

Primary task

Chest CT scans are volumetric representations of the lung that consist of stacked axial images. During lung cancer screening, radiologists search for small nodules that “pop in and out of view” as the reader scrolls through the depth of the image. In the current study, participants searched through 21 (1 practice, 20 experimental) chest CT scans for nodules. The CT scans had a 1024 × 1024 resolution and were centered on a 1920 × 1080 monitor. The images subtended approximately 25° of visual angle. Each CT scan consisted of 51 lung slices and 2 to 11 artificially embedded nodules that had a diameter in the range of 18–26 pixels. The up and down arrow keys were used to freely scroll through the depth of the lung. Participants were instructed to search the lung thoroughly and mark detected nodules using the computer mouse. There was an unlimited amount of time to view each CT scan and search was self-terminated by clicking on a box located on the side of the screen. Each participant underwent an instructional period and a practice trial before proceeding to the experiment. Task stimuli were presented using the Psychophysics toolbox in Matlab (Brainard, 1997).

Interruption task

Half of the trials were interrupted by a series of ten true or false math equations (see Fig. 1). Each CT scan was assigned a specific interruption time in the range of 30–60 s following search onset. The math equations were randomly generated with numbers between 1 and 10 using the format A * B – C = D. In each set, the solution shown was correct for 50% of the equations. Participants responded using the right arrow key for correctly solved equations and the left arrow key for incorrectly solved equations. The screen flashed red for 100 ms following an incorrect response.
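The interruption set described above can be sketched in a few lines of Python. This is only an illustration, not the authors' Matlab code: the function names are hypothetical, and several details are assumptions (A, B, and C are drawn from 1 to 10 while D is left unconstrained, exactly half of the ten items are true, and a false item shows an answer that is off by a small nonzero amount).

```python
import random

def make_equation(rng, is_true):
    """Build one 'A * B - C = D' item; A, B, C are drawn from 1..10 (assumption)."""
    a, b, c = (rng.randint(1, 10) for _ in range(3))
    correct = a * b - c
    if is_true:
        shown = correct
    else:
        # Assumption: false items are off by a small nonzero amount.
        shown = correct + rng.choice([-3, -2, -1, 1, 2, 3])
    return f"{a} * {b} - {c} = {shown}", is_true

def make_interruption_set(n_items=10, seed=None):
    """Return a shuffled list of (equation_text, is_true) with half true, half false."""
    rng = random.Random(seed)
    truth = [True] * (n_items // 2) + [False] * (n_items - n_items // 2)
    rng.shuffle(truth)
    return [make_equation(rng, t) for t in truth]

if __name__ == "__main__":
    for text, is_true in make_interruption_set(seed=1):
        print(text, "(true)" if is_true else "(false)")
```

Fixing the proportion of true items at exactly one half per set is one reading of the 50% figure; drawing each item's truth value independently with probability 0.5 would be another.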

Participants were instructed to be as accurate as possible and to treat both tasks with equal importance. CT scans were divided into two groups such that an equal number of participants were interrupted on each group of images. This design ensures that any observed effect is due to the interruptions rather than any differences in difficulty across CT scans. The images were presented in a random order and participants did not know if or when a case would be interrupted at the start of each trial.

Eye-tracking

Eye-movements were recorded using the Eyelink 1000 Plus (SR Research, Ontario, Canada). Participants were positioned in a chinrest approximately 64 cm away from the computer. A nine-point calibration procedure was performed every five to seven trials. Eye position was sampled at 500 Hz. In order to obtain x, y, and z coordinates for the volumetric images, the position in depth was co-registered at each time point using Eyelink messages.
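The text does not describe how the depth messages were matched to gaze samples. One plausible approach, sketched here purely as an assumption, is to log a timestamped message whenever the displayed slice changes and then assign each gaze sample the slice from the most recent message. The function name and data layout are hypothetical.

```python
from bisect import bisect_right

def coregister_depth(sample_times, slice_events):
    """Assign a slice index (z) to each gaze sample.

    sample_times: timestamps of gaze samples.
    slice_events: time-sorted list of (timestamp, slice_index), one entry
                  per slice change (hypothetical message log).
    Samples recorded before the first event get None.
    """
    event_times = [t for t, _ in slice_events]
    depths = []
    for t in sample_times:
        i = bisect_right(event_times, t) - 1  # index of the latest event at or before t
        depths.append(slice_events[i][1] if i >= 0 else None)
    return depths
```

Combined with the recorded x and y positions, this yields one (x, y, z) triple per sample.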

Behavioral results

Math performance

On average, participants spent 38.2 s completing the math problems. The problems were solved correctly 90% of the time.

Search time

After accounting for the amount of time spent on math problems, participants spent significantly more time searching interrupted cases (M = 196.66 s, SD = 67.9 s) than control cases (M = 182.66 s, SD = 56.81 s), t(25) = 2.62, p = 0.015, Cohen’s d = 0.22 (see Fig. 2a). The average time cost was 14 s (median: 10.68 s, range: –37 to 88 s), which is an 8% increase in search time for interrupted cases.
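The arithmetic behind the reported time cost can be checked directly from the condition means. The sketch below also computes Cohen's d from the root-mean-square of the two standard deviations. That is only one of several conventions, and the text does not state which one was used, but it is consistent with the reported value.

```python
from math import sqrt

def time_cost(interrupted_s, control_s):
    """Return (absolute cost in seconds, cost as a percentage of control search time)."""
    cost = interrupted_s - control_s
    return cost, 100.0 * cost / control_s

def cohens_d_pooled(mean_a, sd_a, mean_b, sd_b):
    """Cohen's d using the root-mean-square of the two SDs (an assumed convention)."""
    pooled_sd = sqrt((sd_a ** 2 + sd_b ** 2) / 2)
    return (mean_a - mean_b) / pooled_sd

cost_s, cost_pct = time_cost(196.66, 182.66)
d = cohens_d_pooled(196.66, 67.9, 182.66, 56.81)
print(f"time cost = {cost_s:.2f} s ({cost_pct:.1f}% of control), d = {d:.2f}")
```

With the means and standard deviations reported above, this gives a cost of 14.00 s, or about 7.7% of control search time (8% after rounding), and d of about 0.22.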

Accuracy

Overall, 70% of the nodules were detected (see Fig. 2b). There were no significant differences in the number of missed nodules per lung between interrupted (M = 1.82, SD = 0.78) and control cases (M = 1.89, SD = 0.78), t(25) = 0.76, p = 0.45, Cohen’s d = 0.09. False alarms were infrequent and the number of false alarms per lung did not differ between interruption (M = 0.05, SD = 0.09) and control cases (M = 0.08, SD = 0.18), t(25) = 1.27, p = 0.21.

Eye-tracking results

Useful field of view

In order to calculate lung coverage and refixation rate, we need to estimate the useful field of view (UFOV) in a volumetric image. The UFOV is the area around a foveated point that can be attended without moving the eyes (Ball, Beard, Roenker, Miller, & Griggs, 1988). In the literature, a 5° diameter estimate has been used to study search through 2D medical images (Kundel, Nodine, & Krupinski, 1989; Nodine, Mello-Thoms, Kundel, & Weinstein, 2002). However, UFOV is known to decrease with the complexity of the stimuli and it is unknown if the added depth dimension in CT scans changes this estimate (Young & Hulleman, 2013). Furthermore, novices may not be able to extract as much information in a single fixation as expert radiologists (Krupinski, 2012; Kundel & La Follette Jr, 1972; Manning, Ethell, Donovan, & Crawford, 2006). To account for these factors in our study, we used a smaller (2.5°) estimate in our calculations, which increases the precision of the measure and is closer to most estimates of foveal vision (Wandell, 1995).
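To apply a UFOV estimate to gaze data, the angle has to be converted to pixels. A rough conversion can be derived from the display geometry given in the methods (a 1024-pixel-wide image subtending approximately 25°), assuming a linear mapping between pixels and degrees and treating 2.5° as a diameter. Both are simplifying assumptions, and the constant and function names are hypothetical.

```python
IMAGE_WIDTH_PX = 1024    # CT image width in pixels, from the methods
IMAGE_WIDTH_DEG = 25.0   # approximate visual angle subtended by the image

def deg_to_px(deg):
    """Convert visual angle to pixels with a linear approximation (assumption)."""
    return deg * IMAGE_WIDTH_PX / IMAGE_WIDTH_DEG

ufov_diameter_px = deg_to_px(2.5)       # roughly 102 pixels
ufov_radius_px = ufov_diameter_px / 2   # roughly 51 pixels
```

Under these assumptions, the 2.5° estimate corresponds to a circle of roughly 51 pixels in radius on each slice.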

Coverage

Lung coverage was calculated using the x, y, and z coordinates for each sampling point. Image processing applications in Matlab were used to create black and white versions of each lung slice. White pixels represented lung tissue and black pixels represented areas that were not lung tissue. For each lung slice, a new image was generated with black circles centered at each set of visited coordinates (see Fig. 3a). Each circle subtended 2.5° of visual angle. Lung coverage was calculated as 1 minus the proportion of white pixels in the original image that remained white in the new image. There were no significant differences in lung coverage between interruption (42%) and control (40%) trials, t(24) = 1.83, p = 0.08, Cohen’s d = 0.13 (see Fig. 3b).
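The coverage measure can be illustrated with a minimal pure-Python sketch. It is not the authors' Matlab pipeline: lung masks are represented as sets of pixel coordinates rather than images, and coverage is pooled over all slices, which the text does not specify. The function name and data layout are hypothetical.

```python
def lung_coverage(lung_masks, gaze_samples, radius_px):
    """Proportion of lung pixels that fall within radius_px of any gaze sample.

    lung_masks: dict mapping slice index -> set of (x, y) lung ("white") pixels.
    gaze_samples: iterable of (x, y, z) gaze coordinates, z being the slice index.
    """
    remaining = {z: set(pixels) for z, pixels in lung_masks.items()}
    r2 = radius_px ** 2
    for gx, gy, gz in gaze_samples:
        if gz not in remaining:
            continue  # sample on a slice with no lung mask
        # "Paint black" every lung pixel inside the circle around this sample.
        remaining[gz] = {
            (x, y) for (x, y) in remaining[gz]
            if (x - gx) ** 2 + (y - gy) ** 2 > r2
        }
    total = sum(len(pixels) for pixels in lung_masks.values())
    left = sum(len(pixels) for pixels in remaining.values())
    return 1.0 - left / total
```

A set-based mask keeps the example dependency-free; with real 1024 × 1024 slices, an array-based implementation would be far faster.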

Search resumption

The accuracy of search resumption was calculated by dividing the lung into quadrants and comparing the locations of the last pre-interruption fixation and the first post-interruption fixation. If memory is retained following the interruption, search should resume somewhere in the vicinity of the most recently searched region of the image. However, search was resumed in the correct quadrant only 23.1% of the time, which does not differ significantly from chance (25%), t(24) = 0.63, p = 0.534. Furthermore, the rate of inaccurate search resumption (M = 76.9%) is significantly greater than the overall rate of quadrant changes between consecutive fixations (M = 27.6%), t(24) = 11.43, p < 0.001. In other words, the two fixations surrounding the interruptions are in different quadrants of the lung far more often than consecutive fixations during typical search. This suggests that the high rate of quadrant changes between the pre- and post-interruption fixations does not reflect an overall tendency to frequently switch quadrants of the lung during search. Instead, these results suggest that interruptions impair memory for which region of the lung was searched immediately prior to the interruption.
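The quadrant comparison can be sketched as follows. How the lung was divided into quadrants is not described, so this example simply splits the image at an assumed center point. The function names are hypothetical.

```python
def quadrant(x, y, center_x, center_y):
    """Map an (x, y) position to a quadrant index 0..3 around an assumed center."""
    col = 0 if x < center_x else 1
    row = 0 if y < center_y else 1
    return 2 * row + col

def resumption_accuracy(fixation_pairs, center_x, center_y):
    """Proportion of interruptions after which search resumed in the same quadrant.

    fixation_pairs: list of (pre, post), where pre is the (x, y) of the last
    pre-interruption fixation and post the (x, y) of the first fixation afterwards.
    """
    same = sum(
        quadrant(*pre, center_x, center_y) == quadrant(*post, center_x, center_y)
        for pre, post in fixation_pairs
    )
    return same / len(fixation_pairs)
```

With four quadrants, a resumption location chosen at random would match the pre-interruption quadrant about 25% of the time, which is the chance level used for comparison.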

Refixation rate

Traditional eye-movement classifications, such as fixations and saccades, do not have a clear definition in volumetric space. For example, a reader may scroll through several layers of depth while maintaining their gaze at a fixed x and y position. Although eye-tracking software would classify this as a fixation, it is not a fixation in the traditional sense. For current purposes, each eye-tracker-defined fixation was treated as a cylinder that permeated each layer of the lung that was visited during that time period (see Fig. 4a). The base of the cylinder was a circle that subtended 2.5° of visual angle. Each time a cylinder overlapped with another cylinder, it was classified as a refixation. In order to account for the time differences between trials, we calculated refixation rate instead of the absolute number of refixations. Refixation rate was defined as the proportion of total fixations that fell within 2.5° of a previous fixation during a given time period. In other words, refixation rate is a measure of how frequently previously viewed spatial locations are searched relative to novel spatial locations.
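A minimal sketch of the cylinder-based refixation measure is given below. It assumes each fixation is summarized by its x and y position and the range of slices visited while it lasted, and it counts a fixation as a refixation when it overlaps any earlier fixation in both depth and image position. The function name, the data layout, and the choice to count each fixation at most once are assumptions.

```python
from math import hypot

def refixation_rate(fixations, diameter_px):
    """Proportion of fixations whose cylinder overlaps an earlier fixation's cylinder.

    fixations: time-ordered list of (x, y, z_min, z_max), where z_min..z_max is
               the range of slices visited during the fixation.
    diameter_px: cylinder base diameter in pixels; two circles of this diameter
                 overlap when their centers are closer than one diameter.
    """
    if not fixations:
        return 0.0
    refixations = 0
    for i, (x, y, z0, z1) in enumerate(fixations):
        for px, py, pz0, pz1 in fixations[:i]:
            overlaps_in_depth = z0 <= pz1 and pz0 <= z1
            if overlaps_in_depth and hypot(x - px, y - py) < diameter_px:
                refixations += 1
                break  # count each fixation at most once
    return refixations / len(fixations)
```

Whether fixations from before a given time window count as "previous" fixations is not stated; this sketch compares only against the fixations passed in.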

Each individual CT scan was associated with a unique interruption time, which allowed us to compare equivalent time periods across interruption and control trials. For example, if one group of participants was interrupted during Scan X at 40 s and the other group was not interrupted during Scan X, the refixation rate for Scan X would be calculated relative to the 40-s time point for each group. The critical comparisons were between the refixation rates for the interruption trials and the control trials during the same time periods for a given scan. During the 30 s prior to the interruption time for each CT scan, there were no significant differences in refixation rate between interruption (M = 0.41, SD = 0.08) and control (M = 0.41, SD = 0.08) trials, t(24) = 0.52, p = 0.61. However, we found a significant difference in refixation rate between interruption (M = 0.45, SD = 0.08) and control trials (M = 0.41, SD = 0.09) in the 30 s immediately following the interruption time, t(24) = 3.151, p = 0.004, Cohen’s d = 0.41 (see Fig. 4b). In the 30–60-s post-interruption period, the difference in refixation rate between interruption (M = 0.41, SD = 0.08) and control trials (M = 0.42, SD = 0.08) returned to baseline, t(24) = 0.36, p = 0.72.
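The time-locking described above can be sketched as a small helper that labels each fixation by its window relative to the scan's assigned interruption time. It assumes timestamps are expressed in search time, with the clock paused during the math task on interrupted trials, which is an assumption rather than a detail given in the text. The function name and labels are hypothetical.

```python
def window_label(t_search_s, interruption_time_s):
    """Label a search-time timestamp relative to the scan's interruption time.

    Returns "pre" for the 30 s before the interruption time, "post_0_30" and
    "post_30_60" for the two 30-s windows after it, and None otherwise.
    """
    offset = t_search_s - interruption_time_s
    if -30 <= offset < 0:
        return "pre"
    if 0 <= offset < 30:
        return "post_0_30"
    if 30 <= offset < 60:
        return "post_30_60"
    return None
```

Because the same interruption time is applied to both groups for a given scan, control trials are labeled with the time at which that scan would have been interrupted.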

For interruption trials, the refixation rate during the 0–30-s time period significantly correlates with each individual’s average time cost (search time for interruption trials – search time for control trials), r(23) = 0.43, p = 0.03. This correlation is not significant at baseline or in the 30–60-s time window. Furthermore, a median split of the data reveals a significantly higher refixation rate in the post-interruption period for the participants with the greatest time cost (M = 0.48, SD = 0.08) than the participants with the smallest time cost (M = 0.41, SD = 0.08), t(23) = 2.34, p = 0.029.
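The two analyses in this paragraph, a Pearson correlation and a median split, can be sketched with the standard library alone. The handling of the middle participant in an odd-sized sample is an assumption (here it falls into the high-cost group), and the function names are hypothetical.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def median_split(time_costs, refix_rates):
    """Split refixation rates into (low-cost, high-cost) groups by time cost."""
    order = sorted(range(len(time_costs)), key=lambda i: time_costs[i])
    half = len(order) // 2
    low = [refix_rates[i] for i in order[:half]]
    high = [refix_rates[i] for i in order[half:]]
    return low, high
```

Here each participant would contribute one time cost (interruption minus control search time) and one refixation rate from the 0–30-s post-interruption window.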

Experiment 2

In Experiment 1, interruptions led to an increase in task completion time. Although one would expect longer search times to lead to increased lung coverage and fewer missed nodules, there were no differences across the two conditions. The eye-tracking measures revealed that this inefficiency of search seems to be driven by impaired spatial memory during the time period immediately following the interruption. In Experiment 2, we sought to determine which features of interruptions might modulate the associated time cost. The Memory for Goals model (Altmann & Trafton, 2002) suggests that the difficulty of the interruption task will influence the magnitude of the interruption cost. According to this account, difficult interruptions impair the ability to maintain goal-relevant information during an interruption to a greater extent than easy interruptions. Therefore, we should observe a greater time cost if we increase the difficulty of the interruption task. However, Cades, Davis, Trafton, and Monk (2007) emphasize that it is the ability to rehearse goal-relevant information, rather than subjective task difficulty, which predicts the disruptiveness of an interruption. Based on this more nuanced interpretation of the Memory for Goals model, we might not observe any additional time cost for more difficult interruption tasks. In Experiment 1, the accuracy of search resumption did not differ significantly from chance. This suggests that the ability to rehearse relevant spatial information was impaired by the interruption. Once the opportunity to rehearse has been eliminated, more difficult interruptions might not place any additional demands on the participant. However, if there is an effect of task difficulty beyond the ability to rehearse, we should observe an increase in time cost for difficult interruptions.

In addition to the difficulty manipulation, we administered a visuospatial working memory task to determine if individual differences in working memory capacity explain the variation in interruption cost. Working memory, perhaps more than any other cognitive measure, has been linked to meaningful outcomes, such as reading comprehension (Daneman & Carpenter, 1980), academic performance (Colom, Escorial, Shih, & Privado, 2007), and fluid intelligence (Kane, Hambrick, & Conway, 2005; Unsworth, Fukuda, Awh, & Vogel, 2014). Most notably, working memory capacity explains individual variation in multitasking ability (Redick, 2016). Individuals with high working-memory capacity might have the cognitive resources to better maintain task-relevant information in memory throughout an interruption. According to the Memory for Goals framework (Altmann & Trafton, 2002), this ability would allow these individuals to resume the task more quickly. Therefore, we expect to observe a negative correlation between working memory capacity and time cost.