In line with our prior work (Ruginski et al., 2016), we hypothesized that participants viewing the cone of uncertainty would report that the hurricane was larger at a future time point. It was an open question whether judgments of intensity would also be associated with the depicted size of the cone. We predicted that those viewing the ensemble display would report that the size and intensity of the storm remained the same in the future because the size cue from the cone was not present. On the other hand, for ensemble hurricane track displays (Fig. 2b, d), it is possible that the individual tracks and their relationship to one another are the salient features used to interpret the hurricane forecast. The tracks in the ensemble display employed by Ruginski et al. (2016) became increasingly farther apart as the distance from the center of the storm increased, which could be associated with a decrease in perceived intensity of the storm. We predicted that participants viewing the ensemble display would believe that the storm was less intense where the individual tracks were farther apart (an effect of distance from the center of the storm). However, because the cone of uncertainty lacks this salient spread of tracks, we predicted that judgments of intensity when viewing the cone would not be affected by distance from the center of the storm.
Methods
Participants
Participants were 182 undergraduate students currently attending the University of Utah who completed the study for course credit. Three individuals were excluded from final analyses for failing to follow instructions. Of the 179 included in analyses, 83 were male and 183 were female, with a mean age of 21.78 years (SD = 5.72). Each participant completed only one condition, either size task with cone (n = 40), size task with ensemble display (n = 42), intensity task with cone (n = 48), or intensity task with ensemble display (n = 48).
Stimuli
Stimuli were presented online using the Qualtrics web application (Qualtrics [Computer software], 2005). In each trial, participants were presented with a display depicting a hurricane forecast. The hurricane forecast images were generated using prediction advisory data from two historical hurricanes, available on the NHC website (http://www.nhc.noaa.gov/archive). The cone of uncertainty and an ensemble display technique were both used to depict the two hurricanes (Fig. 2).
A custom computer code was written to construct the summary and ensemble displays, using the algorithm described on the NHC website (http://www.nhc.noaa.gov/aboutcone.shtml). The ensemble and summary displays were created using the code of Cox et al. (2013). The resulting displays were a subset of the five visualization techniques used in Ruginski et al. (2016), which depicted two hurricanes and were randomly presented to participants. All were digitally composited over a map of the U.S. Gulf Coast that had been edited to minimize distracting labeling. These images were displayed to the subjects at a pixel resolution of 740 × 550. A single location of an ‘oil rig’ depicted as a red dot was superimposed on the image at one of 12 locations defined relative to the centerline of the cone and the cone boundaries. We chose the following distances to place the oil rigs relative to the centerline of the cone, 69, 173, 277, 416, 520, and 659 km (Fig. 3), which correspond to 0.386, 0.97, 1.56, 2.35, 2.94, and 3.72 cm from the center line of the hurricane on the map.
Relative points with respect to the center and cone boundary were chosen so that three points fell outside the cone boundary (277, 173, and 69 km), three points fell within the cone boundary (416, 520, and 659 km), and so that no points appeared to touch the visible center line or boundary lines. Underneath the forecast, a scale ranging from A to I was displayed along with visual depictions. For the intensity task, the scale was indicated by gauges, and for the size task the scale was indicated by circles (Fig. 4). Each circle was scaled by 30% from the prior circle. Each gauge was scaled by 1 ‘tick’ from the prior gauge. The starting size and intensity of the hurricane were overlaid on the beginning of the hurricane track forecast for each trial. Three starting sizes and intensities (C, E, G) were presented in a randomized order.
Salience assessment
To test the previously stated prediction about salience of features of ensemble and summary displays, we utilized the Itti et al. (1998) salience model. Prior research has employed the Itti et al. (1998) salience model to test the salience of cartographic images and found that this model is a reasonable approximation of bottom-up attention (Fabrikant et al., 2010; Hegarty et al., 2010). The Itti et al. (1998) salience model was run in Matlab (2016, Version 9.1.0.441655) using the code provided by Harel et al., (2007). The results of this analysis suggest that the most salient visual features of the cone of uncertainty are the borders of the cone and the centerline (Fig. 5a). Additionally, the salient visual features of the ensemble display are the relative spread of hurricane tracks (Fig. 5b).
Design
We utilized a 2 (visualization type) × 2 (hurricane) × 3 (starting size or intensity) × 12 (oil rig location) mixed factorial design for each task (size and intensity). Hurricane starting size or intensity and the oil rig location were within-participant variables, resulting in a total of 72 trials per participant. Participants were randomly assigned to one of two visualization conditions (summary or ensemble display) and one of two tasks (size or intensity) as between-participant factors.
Procedure
Individuals were first given a simple explanation of the task and visualization. Participants completing the size task were provided with the following instructions:
“Throughout the study you will be presented with an image that represents a hurricane forecast, similar to the image shown above. You will be provided with the initial hurricane size (diameter) at a particular point in time, indicated by the circle shown at the apex (beginning) of the hurricane forecast. An oil rig is located at the red dot. Assume that the hurricane were to hit the oil rig (at the red dot). Your task will be to select the size that best represents what the hurricane’s diameter would be when it reaches the location of the oil rig.”
Additionally, each trial included the text as a reminder of the task, “Assume that the hurricane were to hit the oil rig (at the red dot). Your task is to select the size that best represents what the hurricane’s diameter would be when it reaches the location of the oil rig.”
For the intensity task, participants were provided the instructions:
“Throughout the study you will be presented with an image that represents a hurricane forecast, similar to the image shown above. You will be provided with the initial hurricane wind speed at a particular point in time, indicated by the gauge shown at the apex (beginning) of the hurricane forecast. As the arm of the gauge rotates clockwise, the wind speed increases. For example, gauge A represents the lowest wind speed and gauge I the highest wind speed. An oil rig is located at the red dot. Assume that the hurricane were to hit the oil rig (at the red dot). Your task will be to select the gauge that best represents what the hurricane’s wind speed would be when it reaches the location of the oil rig.”
Each trial also contained the instructions, “Assume that the hurricane were to hit the oil rig (at the red dot). Your task is to select the gauge that best represents what the hurricane’s wind speed would be when it reaches the location of the oil rig.”
Following the instructions, participants completed all of the trials presented in a different random order for each participant. Finally, participants answered questions related to comprehension of the hurricane forecasts. These included two questions specifically relevant to the current research question: “The display shows the hurricane getting larger over time.” and “The display indicates that the forecasters are less certain about the path of the hurricane as time passes.” These questions also included a measure of the participants’ understanding of the response glyphs used in the experiment by asking them to indicate which of two wind gauges had a higher speed or to match the size of circles. Participants who did not adequately answer these questions were excluded from the analysis (two participants for the wind speed gauges, one for the size circles).
Data analysis
Multilevel models (MLM) were fit to the data using Hierarchical Linear Modeling 7.0 software and restricted maximum likelihood estimation procedures (Raudenbush & Bryk, 2002). Multilevel modeling is a generalized form of linear regression used to analyze variance in experimental outcomes predicted by both individual (within-participants) and group (between-participants) variables. A MLM was appropriate for modeling our data and testing our hypotheses for two major reasons. Firstly, MLM allows for the inclusion of interactions between continuous variables (in our case, distance) and categorical predictors (in our case, the type of visualization). Secondly, MLM uses robust estimation procedures appropriate for partitioning variance and error structures in mixed and nested designs (repeated measures nested within individuals in this case).
We transformed the dependent variable before analysis by calculating the difference between the starting value of the hurricane (either size or intensity) and the participant’s judgment. A positive value of the difference score represents an increase in judged size or intensity. In addition, although an ordinal variable by definition, we treated the dependent variable Likert scale as continuous in the model because it contained over five response categories (Bauer & Sterba, 2011).
For the distance variable, we analyzed the absolute value of oil rig distances, regardless of which side of the hurricane forecast they were on, as none of our hypotheses related to whether oil rigs were located on a particular side. We divided the distance by 10 before analysis so that the estimated model coefficient would correspond to a 10-km change (rather than a 1-km change). The mixed two-level regression models tested whether the effect of distance from the center of forecasts (level 1) varied as a function of visualization (level 2). Visualization was dummy coded such that the cone visualization was coded as 0 and the ensemble display as 1. We tested separate models for the intensity and size tasks. Self-report measures of experience with hurricanes and hurricane prone regions were also collected. As the participants were students at the University of Utah, so few had experienced a hurricane (3%) or had lived in hurricane-affected regions (7%) that we did not include these measures as covariates.
Results – Size
Level 1 of our multilevel model is described by:
$$ Chang{e}_{ij}={\beta}_{0j}+{\beta_{1j}}^{\ast}\left( Distanc{e}_{ij}\right)+{r}_{ij}; $$
and level 2 by:
$$ {\upbeta}_{0\mathrm{j}}={\upgamma}_{00}+{\upgamma_{01}}^{\ast}\left({\mathrm{Visualization}}_{\mathrm{j}}\right)+{\mathrm{u}}_{0\mathrm{j}} $$
$$ {\upbeta}_{1\mathrm{j}}={\upgamma}_{10}+{\upgamma_{11}}^{\ast}\left({\mathrm{Visualization}}_{\mathrm{j}}\right)+{\mathrm{u}}_{1\mathrm{j}} $$
Where i represents trials, j represents individuals, and the β and γ are the regression coefficients. The error term rij indicates the variance in the outcome variable on a per trial basis, and u0j on a per person basis. Though people are assumed to differ on average (u0j) in the outcome variable, we tested to determine whether the effect of distance differed per person (u1j) using a variance-covariance components test. We found that the model including a random effect of distance fit the data better than the model not including this effect, and so the current results reflect that model (χ2 = 955.95, df = 2, P < 0.001). Including this term allowed us to differentiate between the variance accounted for in judgments specific to a fixed effect of distance and the variance accounted for in judgments specific to a random effect of person.
Our primary hypothesis was that we would see greater size judgments with the cone compared to the ensemble display, reflecting a misinterpretation that the hurricane grows over time. Consistent with this prediction, we found a significant main effect of visualization type on average change in size judgments (γ
01
= −0.69, standard error (SE) = 0.33, t-ratio = −2.08, df = 80, P = 0.04). This effect indicates that, at the center of the hurricane, individuals viewing the cone visualization had a 0.69 greater increase in their original size judgment compared with individuals viewing the ensemble visualization (Fig. 6). However, the oil rig distance from the center of the storm did not significantly alter change in size judgments (γ
10
= 0.01, SE = 0.01, t-ratio = 1.43, df = 80, P = 0.16) and the effect of distance from the center of the storm on change in size judgments did not differ based on visualization type (γ
11
= −0.01, SE = 0.01, t-ratio = −1.32, df = 80, P = 0.19). Further, the main effect of visualization type on the average change in size judgment was also supported by results of the post-test question. A t-test, in which yes was coded as 1 and no as 0, revealed that participants viewing the cone (M = 0.70, SE = 0.04) were significantly more likely to report that the display showed the hurricane getting larger over time compared to the ensemble display (M = 0.39, SE = 0.05), t(176) = 4.436, P < 0.001, 95% CI 0.17–0.45, Cohen’s d = 0.66.
Results – Intensity
The multilevel model used for the intensity data included the exact same variables as the size model. Similar to the first model, we found that the model including a random effect of distance fit the data better than the model not including this effect, and so the current results reflect that model (χ2 = 704.81, df = 2, P < 0.001).
For intensity, we expected to see a greater effect of distance from the center of the storm on judgments with the ensemble display compared to the cone, reflecting participants’ attention to the increasing spread of tracks as the distance from the center increase for the ensemble display. First, we found a significant main effect of visualization type on average change in intensity judgments (γ
01
= −0.85, SE = 0.33, t-ratio = −2.58, df = 95, P = 0.01). This indicates that, at the center of the hurricane, individuals viewing the cone visualization increased their intensity judgment by 0.85 (almost a full wind gauge) more than those who viewed the ensemble visualization at the center of the hurricane. Second, we found a significant main effect of distance from the center of the storm (γ
10
= −0.02, SE = 0.01, t-ratio = −3.28, df = 95, P = 0.001), which is qualified by a significant cross-level interaction between distance and visualization type (γ
11
= −0.02, SE = 0.01, t-ratio = −3.33 df = 95, P = 0.001). To decompose the interaction between distance from the center of the storm and visualization type, we computed simple slope tests for the cone and ensemble visualizations (Fig. 7). This revealed that the association between distance from the center of the hurricane and change in intensity judgment is different from zero for each visualization (cone visualization: Estimate = −0.02, SE = 0.01, χ2 = 64.74, P < 0.001; ensemble visualization: Estimate = −0.04, SE = 0.004, χ2 = 10.74, P = 0.001) and stronger for the ensemble visualization (χ2 = 101.89, P < 0.001). This result suggests that judgments of intensity decreased with distance more for the ensemble display than for the cone, consistent with a focus on the relative spread of hurricane tracks. In addition, using a t-test, a post-test question revealed that participants viewing the ensemble display (M = 0.53, SE = 0.04) were more likely to report that the display indicated the forecasters were less certain about the path of the hurricane over time compared to the cone (M = 0.39, SE = 0.05), t(176) = −1.97, P = 0.04, 95% CI −0.29 to −0.0003, Cohen’s d = 0.29.
Discussion
The results of this experiment showed that novice users interpret the size and intensity of a hurricane represented by ensemble and summary displays differently. Our prior work showed different damage ratings over time with the cone compared to the ensemble display, but it was unclear whether these were being driven by interpretations of size or intensity because a more general concept of ‘damage’ was used (Ruginski et al., 2016). In the current study, we found a similar pattern of greater increase in both size and intensity reported at the center of the hurricane with the cone, compared to the ensemble display. Furthermore, we found an effect of decreasing intensity judgments with distance from the center of the storm that was greater for the ensemble display than for the cone.
These findings support our hypothesis that a salient feature of the cone is the border that shows the diameter of the cone, which is more likely to influence viewers’ beliefs that the storm is growing over time compared to the ensemble display, which does not have this visually salient feature. We saw evidence of the participants’ beliefs that the cone represented the storm growing in size with both objective judgments of size (which increased more relative to judgments made using the ensemble display) and self-reported interpretations of the cone of uncertainty. Our second hypothesis that participants viewing the ensemble display would believe that the storm was less intense where the individual tracks were farther apart was supported by results of the intensity task conditions. Here, while intensity ratings were higher for the cone compared to the ensemble display, the rate of decrease in ratings of intensity as distance from the center of the storm increased was greater for the ensemble display than the cone. Together, these findings demonstrate that, in the context of hurricane forecasts, the salient visual features of the display bias viewers’ interpretations of the ensemble hurricane tracks.
More generally, we suggest that summary displays will be most effective for cases in which spatial boundaries of variables such as uncertainty cannot be misconstrued as presenting physical boundaries. In contexts like cartography, where spatial layouts inherently represent physical space, ensemble displays provide a promising alternative to summary displays. Although our findings suggest that ensemble displays seem to have some advantages over summary displays to communicate data with uncertainty in a geospatial context, it may also be the case that ensemble displays provoke additional unintended biases. We tested one potential ensemble display bias in Experiment 2.