Introducing hat graphs

Witt, Jessica K.

doi:10.1186/s41235-019-0182-3

Brief report
Open access
Published: 14 August 2019


Jessica K. Witt ORCID: orcid.org/0000-0003-1139-1599¹

Cognitive Research: Principles and Implications volume 4, Article number: 31 (2019)


Abstract

Visualizing data through graphs can be an effective way to communicate one’s results. The bar graph is a ubiquitous and common technique for communicating behavioral data. It was invented in 1786, and its format has changed little since. Here, a replacement for the bar graph is proposed. The new format, called a hat graph, maintains some of the critical features of the bar graph, such as its discrete elements, but eliminates redundancies that are problematic when the baseline is not at zero. Hat graphs also include design elements based on Gestalt principles of grouping and on graph design principles. The effectiveness of the hat graph was tested in five empirical studies. Participants were nearly 40% faster to find and identify the condition that led to the biggest difference from baseline to final test when the data were plotted with hat graphs than with bar graphs. Participants were also more sensitive to the magnitude of an effect plotted with a hat graph than with a bar graph whose baseline was restricted to zero. The recommendation is to use hat graphs when plotting data from discrete categories.

Significance

Visualizations are an important way to communicate results from data. As “big data” has an increasing impact on daily life, communication of data is of critical importance. The bar graph is ubiquitous and yet has not been fundamentally updated since its inception in the eighteenth century. The hat graph is offered as a modernized version that more effectively communicates differences across conditions. Hat graphs increased the speed of finding the condition associated with the biggest difference by nearly 40% relative to bar graphs. Hat graphs also increased sensitivity to the size of an effect by 30% and eliminated bias in estimating effect size. Hat graphs can significantly improve how scientists communicate their data.

Introducing hat graphs

The bar graph is commonly used to visualize data in psychology. In a recent issue of Psychological Science (2018, volume 29, issue 12), 50% of the articles included a bar graph. The bar graph dates back to 1786 (Playfair, 1786), and the first bar graph bears great resemblance to typical bar graphs used today. Yet bar graphs are problematic for presenting results from behavioral research. Bar graphs depict values in two ways: by the relative position of the end of the bar and by the length of the bar (and perhaps in a third way, by the area of the bar). These sources of information will be inconsistent with each other if the baseline of the graph is not set to zero. When comparing conditions represented by separate bars, the relative position of the ends of the bars will accurately reflect the differences between the conditions, but the relative difference in length will exaggerate those differences (Healy, 2019; Pandey, Rall, Satterthwaite, Nov, & Bertini, 2015; Pennington & Tuttle, 2009). Thus, the rule for bar graphs is to always set the baseline to zero. Of the articles in that issue that used bar graphs, all but one used zero as the baseline. However, even a baseline at zero creates large biases in readers’ perceptions of the size of effects depicted in bar graphs: big effects can appear small with a baseline at zero (Witt, in press). One way to improve readers’ perceptions is to maximize compatibility between the visual size of the effect and the effect size being depicted. Given that effect size in psychology is often measured in SD units, and an effect size of 0.8 is considered “big” (Cohen, 1988), it is sensible to set the range of the y-axis to 1.5 SDs. With this range, big effects look big and small effects look small (Witt, in press). This range is problematic for bar graphs, however, because the different features carry conflicting meanings when the baseline is not set to zero.
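The 1.5-SD recommendation can be checked with simple arithmetic: if the y-axis spans 1.5 SDs, an effect of size d occupies d / 1.5 of the axis height. The sketch below is illustrative only; the helper names are hypothetical, and centring the axis on a chosen value (for example, the grand mean) is an assumption rather than a detail stated here.

```python
def effect_fraction_of_axis(d: float, axis_range_sd: float = 1.5) -> float:
    """Share of the y-axis height spanned by an effect of d SDs."""
    if axis_range_sd <= 0:
        raise ValueError("axis_range_sd must be positive")
    return d / axis_range_sd


def y_axis_limits(center: float, sd: float, axis_range_sd: float = 1.5) -> tuple[float, float]:
    """Hypothetical helper: axis limits spanning axis_range_sd SDs around `center`."""
    half_span = axis_range_sd * sd / 2
    return center - half_span, center + half_span


# Cohen's small, medium, and large benchmarks
for d in (0.2, 0.5, 0.8):
    print(f"d = {d}: {effect_fraction_of_axis(d):.0%} of the axis")
# d = 0.2: 13% of the axis
# d = 0.5: 33% of the axis
# d = 0.8: 53% of the axis
```

Under this range a large effect fills roughly half of the axis, while a small effect fills only about an eighth of it.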

Alternatives to bar graphs include point graphs and line graphs. Point graphs have the advantage that only one feature specifies the data, namely relative position, so non-zero baselines do not create inconsistencies. But point graphs evoke less strongly the Gestalt grouping principles that help observers perceive pairs of data points as belonging together. This grouping problem can be solved by connecting the points with a line, thus making the graph a line graph. Line graphs are a natural choice when communicating trends, such as differences in a dependent variable across a continuous independent variable, because a single feature of the line (its slope) represents the trend, without the reader having to integrate across multiple features (Carswell & Wickens, 1996). Line graphs are not, however, a natural choice when communicating discrete values. They can even lead readers to misinterpret discrete variables as continuous. For example, when comparing distinct groups such as construction workers versus librarians, people were more likely to make continuous comparisons like “the more librarian a person is, the shorter he is” rather than discrete comparisons like “librarians tend to be shorter than construction workers” when presented with line graphs than when presented with bar graphs (Zacks & Tversky, 1999). Lines are also an excellent choice for communicating interactions because an interaction appears as non-parallel (or crossing) lines, so it can be perceived by comparing the slopes of the lines rather than by integrating across four or more bars. Some have recommended that line graphs be used to display interactions even across discrete categories (Kosslyn, 2006).

Rather than having to choose among these trade-offs between bar and line graphs, another option is to design a new kind of graph that has the desirable properties of bar graphs (proper interpretation of discrete categories) and the desirable properties of line graphs (configural properties that signal the effect of interest, and unrestricted settings for the y-axis). Often the purpose of a graph is to report a difference between two conditions (a main effect) or a difference between differences in conditions (an interaction). Bar graphs are not the most effective or efficient way to communicate differences because they require additional processing. According to Pinker’s theory of graph comprehension, objects give rise to “message flags” that make the objects’ values “easily extractable” from the graph (Pinker, 1990, p. 108). For bar graphs, each bar has an associated message flag to signal its height, but extracting the difference across two bars (or the differences across two pairs of bars in the case of an interaction) requires additional processes of what Pinker refers to as interrogation. For bar graphs, this requires top-down visual search to locate the relevant bars and then a mental comparison of their relative heights. A better way to communicate a difference is to represent it as a single object. The difference would then have its own message flag automatically associated with it, rather than requiring these additional interrogation processes.

To achieve these objectives, the traditional bar graph was transformed. First, the tops of the bars were retained while the bars themselves were removed. This removes the redundancy between specifying the values by the tops of the bars and by the length of the bars. Removing redundancy is one of the recommendations made by Tufte (2001), and once bar length no longer signifies value, the y-axis does not have to start at zero, because the tops are the only indicator of value. Second, the difference between each pair of bars was made into its own object by keeping only the portion of the second bar that differed from the first bar (see Fig. 1). Third, the components directly abutted each other in order to evoke strong Gestalt principles of grouping, namely connectedness and proximity. The new format is called a hat graph because the resulting graphs resemble hats. The “brim” of the hat represents the value for condition 1, and the top of the “crown” of the hat represents the value for condition 2. The height of the crown represents the difference. Because a single object (the crown) represents the difference, it should be easier and faster to see the differences represented in the graph. This prediction was tested in experiments 1 and 2.
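The three transformation steps can be expressed as plain geometry. The sketch below computes, for one category, a brim (a horizontal segment at the condition-1 value) and a crown (a box from the brim to the condition-2 value), so that the crown's height equals the difference. The function name, the default widths, and the choice to centre the crown on the brim are assumptions for illustration; the layout in Fig. 1 may differ. A negative difference simply yields a crown that hangs below the brim.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HatGeometry:
    brim: tuple[tuple[float, float], tuple[float, float]]  # segment endpoints at value1
    crown: tuple[float, float, float, float]  # (x_left, y_bottom, width, height)
    difference: float  # value2 - value1, the crown's signed height


def hat_geometry(x: float, value1: float, value2: float,
                 brim_width: float = 1.0, crown_width: float = 0.5) -> HatGeometry:
    """Hypothetical helper: geometry of one hat centred at x.

    The brim marks condition 1; the crown abuts the brim and extends to
    condition 2, so the difference is drawn as a single object.
    """
    brim = ((x - brim_width / 2, value1), (x + brim_width / 2, value1))
    crown = (x - crown_width / 2,        # crown centred on the brim (assumption)
             min(value1, value2),        # box starts at the lower of the two values
             crown_width,
             abs(value2 - value1))       # box height is the size of the difference
    return HatGeometry(brim=brim, crown=crown, difference=value2 - value1)


# Baseline of 3.0 and final score of 5.5 for an advertisement plotted at x = 2
hat = hat_geometry(2.0, 3.0, 5.5)
print(hat.crown)       # (1.75, 3.0, 0.5, 2.5)
print(hat.difference)  # 2.5
```

Keeping the geometry separate from drawing means the crown tuple can be handed to any plotting routine that draws a rectangle from a corner, a width, and a height.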

A second prediction was that hat graphs would lead to better sensitivity and less bias in estimating the magnitude of the effect. This prediction was based on the idea that hat graphs allow for more flexibility in setting the range of the y-axis, and that setting the y-axis range to 1.5 SDs, as recommended by Witt (in press), improves sensitivity and decreases bias relative to showing the full range. This prediction was tested in experiment 3.

Experiment 1

Participants were shown images depicting attitude scores on baseline and final tests for three or six advertisements. Their task was to indicate which advertisement produced the largest improvement in attitude from the baseline test to the final test.

Method

Participants

Twenty-two participants volunteered in exchange for course credit. A large effect was assumed, given the theoretical reasons to expect that hat graphs would have an advantage over bar graphs. A power analysis for a paired-samples t test with an effect size of d = 0.80 and alpha = 0.05 (two-tailed) showed that 14 pairs were needed to achieve 80% power. Data collection was scheduled to stop on a day on which this number was likely to be achieved; more participants were recruited than needed, resulting in 95% power to find an effect size of d = 0.80.
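The power figures can be approximately reproduced from the noncentral t distribution. The sketch below uses SciPy and treats d as the standardized mean of the paired differences (d_z), which is an assumption about how the effect size was defined; the helper names are hypothetical. Because `pairs_needed` returns the first whole n that reaches the target power, it can differ by one pair from a figure obtained by rounding a fractional n.

```python
from scipy import stats


def paired_t_power(d: float, n: int, alpha: float = 0.05) -> float:
    """Two-tailed power of a paired-samples t test with n pairs and effect size d_z."""
    df = n - 1
    noncentrality = d * n ** 0.5
    t_crit = stats.t.ppf(1 - alpha / 2, df)
    # Probability that the test statistic lands in either rejection region
    return (stats.nct.sf(t_crit, df, noncentrality)
            + stats.nct.cdf(-t_crit, df, noncentrality))


def pairs_needed(d: float, target_power: float = 0.80, alpha: float = 0.05) -> int:
    """Smallest number of pairs whose power reaches target_power."""
    n = 2
    while paired_t_power(d, n, alpha) < target_power:
        n += 1
    return n


print(pairs_needed(0.80))                  # minimum pairs for 80% power
print(round(paired_t_power(0.80, 22), 2))  # power with all 22 participants
```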

Stimuli and apparatus

Stimuli were displayed on computer monitors. The stimuli were created using data simulated and plotted in R (R Core Team, 2017). Four factors were manipulated. One factor was graph type (hat graph versus bar graph); each set of simulated data was plotted with both a hat graph and a bar graph. Another factor was the number of advertisements (three or six). A third factor was the position of the target (best) advertisement. Target positions were evenly distributed across the locations, and each position was used twice for the graphs with only three advertisements. The fourth factor was the alignment across advertisements. One third of the graphs were aligned to have similar baselines, so the target advertisement also had the highest final score. One third were aligned to have similar final scores, so the target advertisement had the lowest baseline score. And one third were aligned at the mean value between baseline and final scores (see Fig. 2). Each combination was generated four times so that there were several variants to show to the participants. This resulted in 288 unique graphs (288 = 2 × 2 × 6 × 3 × 4).
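The stimulus count follows from fully crossing the four factors with four simulated variants per cell. The enumeration below is a sketch; the labels are hypothetical, and it assumes, as the 2 × 2 × 6 × 3 × 4 product implies, that graphs with three advertisements reuse each target position twice so that both set sizes contribute six target-position cells.

```python
from itertools import product

GRAPH_TYPES = ("bar", "hat")
SET_SIZES = (3, 6)
ALIGNMENTS = ("baseline", "final", "mean")
N_VARIANTS = 4  # independently simulated data sets per cell


def target_slots(n_ads: int) -> list[str]:
    """Six target positions per set size; with three ads, A-C each appear twice."""
    return ["ABCDEF"[i % n_ads] for i in range(6)]


design = [
    (graph, n_ads, slot, target, alignment, variant)
    for graph, n_ads in product(GRAPH_TYPES, SET_SIZES)
    for slot, target in enumerate(target_slots(n_ads))
    for alignment, variant in product(ALIGNMENTS, range(N_VARIANTS))
]

print(len(design))                                    # 288
print(sum(1 for cell in design if cell[0] == "hat"))  # 144 trials in the hat-graph block
```

Half of the cells belong to each graph type, which matches the 144 trials per block described in the procedure.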

Data for the graphs were created from simulations. Because the participants’ task was to indicate which advertisement produced the largest change in attitude (see Procedure), the critical data are the differences between conditions. For the non-target advertisements, the difference between baseline and final scores was the mean of 100 samples from a normal distribution with a mean of 1 and an SD of 2. For the target advertisements, the difference between baseline and final scores was the mean of 100 samples from a normal distribution with a mean of 2.5 and an SD of 1. Thus, the target advertisement produced an expected increase in attitude 2.5 times larger than that of the non-target advertisements. For the baseline-aligned and mean-aligned graphs, these difference scores were added to the baseline scores, which were the mean of 100 samples from a normal distribution with a mean of 3 and an SD of 1. For the final-aligned graphs, the final scores were the mean of 100 samples from a normal distribution with a mean of 5.5 and an SD of 1, and the difference scores were subtracted from the final scores to compute baseline scores. The process was the same for the target conditions, with the exception that 0.5 was subtracted from the baseline condition so that it would not align with the other baseline conditions.
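The generating process for the baseline-aligned and final-aligned graphs can be sketched with Python's standard library. This is an illustrative reimplementation, not the original R code, and the function names are hypothetical. The mean-aligned construction and the 0.5 adjustment to the target's baseline are not specified in enough detail to reproduce, so the sketch leaves them out.

```python
import random
import statistics


def mean_of_draws(mu: float, sd: float, rng: random.Random, k: int = 100) -> float:
    """Mean of k draws from Normal(mu, sd), the rule used for every simulated value."""
    return statistics.fmean(rng.gauss(mu, sd) for _ in range(k))


def simulate_graph(n_ads: int, target: int, alignment: str,
                   rng: random.Random) -> list[tuple[float, float]]:
    """Return (baseline, final) pairs for one graph (hypothetical helper)."""
    pairs = []
    for ad in range(n_ads):
        # Expected change: 2.5 points for the target ad, 1 point for the others
        diff = (mean_of_draws(2.5, 1, rng) if ad == target
                else mean_of_draws(1, 2, rng))
        if alignment == "baseline":
            baseline = mean_of_draws(3, 1, rng)
            final = baseline + diff
        elif alignment == "final":
            final = mean_of_draws(5.5, 1, rng)
            baseline = final - diff
        else:
            raise ValueError("only 'baseline' and 'final' alignment are sketched")
        pairs.append((baseline, final))
    return pairs


rng = random.Random(2019)
graph = simulate_graph(n_ads=6, target=2, alignment="baseline", rng=rng)
changes = [final - baseline for baseline, final in graph]
print("ABCDEF"[changes.index(max(changes))])  # the target, ad C
```

Averaging 100 draws shrinks each value's SD tenfold, which is why the target's change reliably stands out even though the non-target distribution has an SD of 2.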

Two graphs were created for each set of data: a bar graph and a hat graph. Thus, the data contained in the graphs were identical across graph types. In the bar graphs, the baseline bars were white and the final bars were black. In the hat graphs, the lines were black and the crowns were white. The y-axis ranged from 1 to 8 on every graph. Advertisements were labeled A–F and were always in alphabetical order.

Procedure

Participants completed two blocks of trials, one with bar graphs and one with hat graphs. Start order was counterbalanced across participants. For the hat graphs, they were shown these initial instructions: “An advertising company is interested in which ads lead to the biggest changes in attitude. They ran a study testing several different ads. In each study, they measured attitude at BASELINE (before seeing any ads) and again at the FINAL test (after seeing the ads). All of the ads increased attitude. Your task is to determine which ad produced the BIGGEST increase in attitude. The baseline attitude will be shown as a horizontal line. The final attitude is shown as the top of the box. The height of the box shows the change in attitude from baseline to final test. Which ad produces the biggest change? Enter your response for each graph on the keyboard. Respond as fast and accurately as possible. Press ENTER to begin”. For the bar graphs, the instructions were the same except that, instead of the description of the hat graph, participants were told the following: “The baseline attitude will be shown in white boxes. The final attitude will be shown in black boxes. The difference between the white and black boxes shows change in attitude from baseline to final test.”

On each trial, a graph was shown after a fixation screen of 500 ms, and participants entered a response A–C (for graphs with three advertisements) or A–F (for graphs with six advertisements). The graph remained visible until participants made their response, at which point a blank screen was shown for 500 ms before the next trial began. Participants completed 144 trials with one type of graph before switching to the block of trials with the other graph type. Order within block was randomized.

Results and discussion

Reaction times (RTs) were positively skewed, so they were log-transformed. The data were first explored for outliers. RTs lying more than 1.5 times the interquartile range (IQR) below the first quartile or above the third quartile, computed for each subject in each condition, were excluded (3% of the data). Next, mean RTs and mean accuracy scores were calculated for each subject and each condition and plotted in separate boxplots. One participant was beyond 1.5 times the IQR for both measures, and three participants were beyond 1.5 times the IQR for accuracy scores. These participants were excluded. For the remaining participants, accuracy was nearly perfect (mean (M) = 98.9%, SD = 1.3%), so the analysis focused on RTs.
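The trimming rule is Tukey's boxplot fence: a value is kept only if it lies within 1.5 × IQR of the first and third quartiles. The sketch below applies it to log RTs separately for each subject and condition; it is a hypothetical reimplementation, and the record field names are assumptions. Quartile conventions differ slightly across software (the `inclusive` method used here matches R's default `quantile` type 7, whereas R's `boxplot` uses Tukey's hinges), so borderline trials may be classified differently.

```python
import math
import statistics
from collections import defaultdict


def tukey_fences(values: list[float], k: float = 1.5) -> tuple[float, float]:
    """Lower and upper fences: k * IQR beyond the first and third quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def trim_log_rts(trials: list[dict]) -> list[dict]:
    """Drop trials whose log RT falls outside the fences for its subject and condition."""
    cells = defaultdict(list)
    for trial in trials:
        cells[(trial["subject"], trial["condition"])].append(trial)
    kept = []
    for cell in cells.values():
        logs = [math.log(t["rt_ms"]) for t in cell]
        lo, hi = tukey_fences(logs)
        kept.extend(t for t, x in zip(cell, logs) if lo <= x <= hi)
    return kept


trials = [{"subject": 1, "condition": "hat", "rt_ms": rt}
          for rt in (900, 950, 1000, 1020, 1050, 1100, 1150, 9000)]
print([t["rt_ms"] for t in trim_log_rts(trials)])  # the 9000-ms trial is dropped
```

The same `tukey_fences` helper could be applied to the per-subject mean RTs and accuracy scores for the participant-level screening step.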

Data were analyzed with linear mixed models using the lme4 and lmerTest packages in R (Bates, Mächler, Bolker, & Walker, 2015; Kuznetsova, Brockhoff, & Christensen, 2017). A linear mixed model was run with log RT as the dependent variable. The independent factors were graph type (bar or hat), number of advertisements (three or six), graph alignment (baseline, final, mean), and initial graph type (bar or hat). All were entered as categorical factors, with the first level listed above as the reference level. Two-way interactions between graph type and each of the other factors were also included. The random effects included by-participant intercepts and by-participant slopes for graph type. Estimation used restricted maximum likelihood, with Satterthwaite’s method for the degrees of freedom. Effect sizes were calculated with the formula from Westfall, Kenny, and Judd (2014). The emmeans R package was used to extract marginal means on the original scale (non-transformed RTs) from the model for the plots (Lenth, 2019).
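For readers working in Python, a roughly analogous model can be sketched with statsmodels' `mixedlm`, here fitted to simulated, hypothetical data whose built-in effects are arbitrary. This is not the lme4/lmerTest analysis reported above: statsmodels reports Wald z tests rather than Satterthwaite-corrected t tests, and the Westfall, Kenny, and Judd (2014) effect sizes are not computed. With string-valued factors and treatment coding, the alphabetically first level is the reference, which here coincides with the reference levels described above.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2019)

# Simulated (hypothetical) log RTs: 18 participants x 2 graph types x 2 set sizes
# x 3 alignments x 4 repetitions
rows = []
for participant in range(18):
    first_block = "bar" if participant % 2 == 0 else "hat"  # counterbalanced order
    intercept = rng.normal(0, 0.15)  # by-participant intercept deviation
    slope = rng.normal(0, 0.05)      # by-participant graph-type slope deviation
    for graph_type in ("bar", "hat"):
        for n_ads in ("3", "6"):
            for alignment in ("baseline", "final", "mean"):
                for _ in range(4):
                    log_rt = (7.5 + intercept
                              + (-0.46 + slope) * (graph_type == "hat")
                              + 0.15 * (n_ads == "6")
                              + rng.normal(0, 0.2))
                    rows.append({"participant": participant, "graph_type": graph_type,
                                 "n_ads": n_ads, "alignment": alignment,
                                 "first_block": first_block, "log_rt": log_rt})
data = pd.DataFrame(rows)

# Main effects plus two-way interactions of graph type with each other factor;
# random intercepts and graph-type slopes per participant; REML estimation
model = smf.mixedlm("log_rt ~ graph_type * (n_ads + alignment + first_block)",
                    data, groups=data["participant"], re_formula="~graph_type")
fit = model.fit(reml=True)
print(fit.summary())
```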

Graph type had a large effect on RTs, d = 1.20, t = 11.76, p < .001. Relative to bar graphs, responses to hat graphs were 37% faster (see Fig. 3). When the random-effect coefficients were used to estimate the impact of graph type on each participant, the model estimated that all 18 participants responded faster to hat graphs than to bar graphs (see Fig. 4).
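Because the model was fitted to log RTs, a fixed-effect coefficient b corresponds to a multiplicative change of exp(b) in RT. The percentages reported here may instead have been computed from the marginal means on the original scale, so the conversion below is only an approximate, illustrative check; the coefficient values in it are hypothetical, chosen to show the arithmetic.

```python
import math


def percent_change(log_coefficient: float) -> float:
    """Percent change in RT implied by a coefficient on the log-RT scale."""
    return 100 * (math.exp(log_coefficient) - 1)


def log_coefficient_for(percent: float) -> float:
    """Inverse: the log-scale coefficient corresponding to a given percent change."""
    return math.log(1 + percent / 100)


print(round(percent_change(-0.46)))       # -37, i.e., RTs about 37% shorter
print(round(log_coefficient_for(16), 3))  # 0.148, a 16% increase in RT
```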

The number of items had a small-to-medium effect on RTs, d = 0.39, t = 15.05, p < .001. Going from three to six items was associated with a 16% increase in RT (see Fig. 3). The interaction between number of items and graph type was negligible, d = 0, t = 0.09, p > .92. Thus, although hat graphs increased the speed of finding the largest difference, they did not make the search more efficient; greater efficiency would have appeared as a shallower slope across number of items for the hat graphs.

The medium effect of initial graph type on RTs (d = 0.50, t = 2.07, p = .054) is better explained by its large interaction with graph type, d = 0.87, t = 6.47, p < .001. The speed advantage for hat graphs was greater in people who had first completed a block of trials with bar graphs than in those who started with hat graphs (see Fig. 5). This interaction could also be interpreted as two main effects: faster responses for hat graphs and faster responses in the second block.

Graph alignment had a very small influence on RTs (Fig. 6). RTs were slightly slower when the baseline scores were aligned than when the final scores were aligned, d = 0.15, t = 4.62, p < .001, and slightly slower when the baseline scores were aligned than when the mean scores were aligned, d = 0.13, t = 4.26, p < .001. There was no difference between RTs for graphs with aligned final scores and aligned mean scores, d = 0.01, t = 0.36, p = .72. The interactions between graph alignment and graph type were similarly small (graph type and baseline versus final, d = 0.17, t = 3.82, p < .001; graph type and baseline versus mean, d = 0.08, t = 1.84, p = .066; graph type and final versus mean, d = 0.09, t = 1.98, p = .048). More importantly, responses were faster to hat graphs than to bar graphs in all alignment conditions, ds ≥ 0.99, ps < .001. This shows that the advantage of hat graphs over bar graphs is robust: it does not depend on one particular alignment between the parts of the graph.

Relative to bar graphs, hat graphs improved the speed of finding the advertisement that produced the biggest boost in attitude. The results are consistent with the graph-design principle that making the difference a single object, which gives rise to its own message flag, makes graph comprehension easier and thus faster.