Visual narratives are a ubiquitous modern experience that can occur as static images with accompanying text (e.g., picture books, comics), or as continuous visual experiences that incorporate aural stimuli (e.g., film, TV). Most visual narratives are multi-modal in nature in that they contain visual and linguistic content. As a result, when engaging with visual narratives, consumers must integrate across these modalities in order to understand them (Cohn, 2016; Magliano, Loschky, Clinton, & Larson, 2013). While there is growing interest in the study of visual narrative comprehension, many researchers have opted to use materials that do not contain language (e.g., Magliano, Kopp, McNerney, Radvansky, & Zacks, 2012; Magliano, Larson, Higgs, & Loschky, 2016; Magliano & Zacks, 2011; Zacks, Speer, & Reynolds, 2009; Zacks, Speer, Swallow, & Maley, 2010). There is some evidence, however, that both visual and linguistic information support comprehension (e.g., Magliano, Dijkstra, & Zwaan, 1996). Moreover, there is robust evidence that visual and linguistic content support learning from multimedia contexts (e.g., Mayer, 2009), and there is reason to believe that visual and linguistic content support narrative comprehension in profound ways. However, because these information streams are not always equally important in conveying a narrative (e.g., Cohn, 2016), how visual and linguistic information convey meaning is an open question. In the present study, we explored the extent to which visual and linguistic content in sequential visual narratives (i.e., visual narratives that consist of static images such as comics) support comprehension and in what context these sources of information convey important and distinct information about the story.
Theories of text comprehension universally assume that narrative comprehension requires one to construct a coherent mental model that, in part, reflects how the explicitly conveyed narrative events are situationally related (e.g., related in terms of space, time, causality; Zwaan, Magliano, & Graesser, 1995; Zwaan & Radvansky, 1998). Narrative plots are structured around characters performing intentional actions (Gee & Kegl, 1983; Mandler & Johnson, 1977; Rumelhart, 1975; Stein & Nezworski, 1978; Trabasso, van den Broek, & Suh, 1989). As such, inferring the relationships between explicitly conveyed actions and the goals that motivated them is an important basis for constructing coherent mental models (Long, Golding, & Graesser, 1992; Magliano, Taylor, & Kim, 2005; McNamara & Magliano, 2009; Suh & Trabasso, 1993). Research has shown that these mental models are organized in terms of events and event structure, with distinct boundaries between them (e.g., Kurby & Zacks, 2008; Radvansky, 2012; Radvansky, Krawietz, & Tamplin, 2011; Radvansky & Zacks, 2014). Comprehenders tend to track a set of situational dimensions across a narrative and segment their representation when these dimensions change. For example, people will track, from moment to moment, the goal of a character. If that goal changes in the currently processed information unit (e.g., sentence, comic panel, movie scene), people tend to update their mental models to accommodate that change (Kurby & Zacks, 2012; Zacks & Swallow, 2007; Zwaan, Langston, & Graesser, 1995), and such updating leads to the perception of an event boundary (Speer, Zacks, & Reynolds, 2007).
This additional processing at event boundaries has behavioral and cognitive consequences: reading times tend to slow down at changes (Radvansky & Copeland, 2010; Zacks, Kumar, Abrams, & Mehta, 2009; Zwaan, Magliano, & Graesser, 1995), memory increases for the new content (Swallow, Zacks, & Abrams, 2009), and memory for previous event information becomes less accessible than current event information (Radvansky et al., 2011; Speer & Zacks, 2005; Swallow et al., 2009). In the current study, given that goal changes are predictive of event segmentation and updating, we ask how viewers of sequential narratives might extract goal information during event processing.
Visual and linguistic content in picture stories can be coordinated to understand how the actions of the characters are related to explicitly established goals. Sometimes these information streams indicate that there is continuity of goal structure. For example, consider the two-panel sequence in Fig. 1 from the comic Black Cobra (1954). In the first panel, two men are seated at a table. The dialogue between the characters establishes that one character has the goal of having an individual killed, and the other character is a hit man who can accomplish that goal. Their bodily positions are consistent with that goal in that they convey that the characters are having a conversation about the contract. The language and images in the second panel are coherently related to the first panel because they are consistent with the continuation of the goal. Specifically, the dialogue conveys that the hit man has agreed and has established a written contract, and the bodily positions of the characters also convey that the contract has been accepted (i.e., the hit man is handing a piece of paper to the client). Presumably, a mental model of the events in the two panels would reflect the situational consistencies between the language and the visual content in terms of the goals of the characters. We see the situation reflected in this panel as akin to situations in text in which a reader has to infer how an explicitly described action is causally related to an explicitly established goal (e.g., Suh & Trabasso, 1993). However, in this example the action and goal are in the same panel.
However, visual and linguistic information streams may also indicate shifts in goals of characters. For example, consider the two-panel sequence in Fig. 2 from the same comic. In the first panel, the dialogue implies that the characters are in the middle of a fight and the visual stream gives specific details on who is fighting and how the fight is progressing. Specifically, the Black Cobra is using one of his enemies’ bodies to attack another. The second panel takes place in the same location (indicated by the Black Cobra’s enemies in the background) and involves the Black Cobra rescuing a doctor. Importantly, the bodily positions and actions of characters have dramatically changed and reflect that their goals have changed (i.e., rescuing someone is a separate goal from fighting one’s enemies). The introduction of a new character, the actions of the primary character (i.e., the Black Cobra), and the dialogue all convey that there is a shift in the goals of the primary character.
Figure 1 reflects the situation that we were interested in investigating because it resembles a common situation in narrative texts but has features unique to visual narratives. Specifically, as discussed above, text specifies content that conveys a goal of the characters (e.g., the statement of guaranteeing satisfaction indicates that the job has been accepted) and the action conveyed can be understood as being in the service of accomplishing that goal (e.g., handing over a contract). Viewers of visual narratives similarly need to infer the relationships between character actions and goals, but those actions are conveyed in pictures via changes in bodily position from panel to panel (e.g., the hit man was previously sitting at a table, clasping his hands). As such, inferring the relationship between texts conveying goals and visually depicted actions necessarily requires an integration of linguistic and visual content. Given that the study of visual narratives is relatively new, this phenomenon has not received empirical attention. Moreover, given the importance of inferring the relationships between goals and actions in establishing narrative coherence (e.g., Suh & Trabasso, 1993), this is a candidate for exploring how verbal and visual content are processed and integrated to support visual narrative comprehension.
Linguistic and visual content can have different relationships in conveying the narrative. Cohn (2016) argues that information streams in multimodal narratives vary along two dimensions: dominance and assertiveness. Dominance refers to the degree to which an information stream carries semantic information that contributes to constructing a mental model of the narrative (Cohn, 2016). For example, Figs. 1 and 2 both show co-dominant visual and verbal information streams in which both the pictures and the dialogue contribute to the understanding of the narrative. In addition to dominance, multimodal narratives may also vary in their assertiveness, or the degree to which there is a grammar-like structure that dictates order (Cohn, 2016). In the first panel of Fig. 2, the Black Cobra is shown fighting his enemies. This panel is followed by an image of him rescuing the doctor, with his enemies incapacitated in the background. Thus, the order of pictures clearly cannot be switched without changing the understanding of the narrative. On the other hand, in Fig. 1, if the pictures were reversed, the meaning of the visual sequence would not be changed (the characters sitting at the table in conversation and exchanging the contract could visually happen in any order). In both Figs. 1 and 2, we can see that the order of the text cannot be changed without altering the narrative. Thus, Fig. 1 would be described as a co-dominant, verbal-assertive narrative, whereas Fig. 2 would be described as a co-dominant, co-assertive narrative (Cohn, 2016). Although there are many relationships between verbal and visual streams in visual narratives, for the purpose of the current study, we focused on co-dominant narratives, with co-assertive relationships around the critical point.
Specifically, we were interested in seeing whether changes in body position and changes in goal continuity are processed independently when visual and verbal information streams are co-dominant as well as co-assertive.
Overview of study, hypothesis, and predictions
The goal of the present study was to explore the relative impact of the linguistic and visual content on the processing of sequential narratives. We explored this issue in the context of event segmentation. People habitually recognize the boundaries that make up mundane everyday events (Speer, Swallow, & Zacks, 2003; Zacks, Speer, Vettel, & Jacoby, 2006; Zacks, Tversky, & Iyer, 2001) and narratives (Kurby & Zacks, 2012; Magliano et al., 2012; Magliano, Miller, & Zwaan, 2001; Magliano et al., 2005; Zacks, Speer, Swallow, Braver, & Reynolds, 2007). Segmentation for mundane, everyday activities is heavily influenced by perceptually salient changes in bodily position (Zacks, 2004), presumably because these are informative of progression towards the accomplishment of a goal (Kurby & Zacks, 2011). In the context of narratives, segmentation is influenced by changes in situational continuities (time, space, causality, goals) such that boundaries between narrative events are perceived when there are shifts in these dimensions (Kurby & Zacks, 2012; Magliano et al., 2012, 2001, 2005; Zacks, Kumar, et al., 2009). Importantly, when the visual stream suggests that there are changes in the goals of characters in the context of picture stories (Magliano et al., 2012) or films (Magliano et al., 2005; Magliano & Zacks, 2011; Zacks, Kumar, et al., 2009), viewers tend to perceive a narrative boundary.
In the first experiment of this study, participants viewed sequential narratives that contained text (see Fig. 3). The text provided a narration of the events depicted in the pictures, and the pictures depicted the characters engaged in goal-directed activities. An explicit goal (e.g., finish psychology paper) of the character was established in the verbal content (see first and second panels in Fig. 3). At a critical panel (panel 5 in Fig. 3), we manipulated whether the verbal content conveyed an event that was continuous (e.g., needed to print the paper) or discontinuous (e.g., needed to print a plane ticket) with the prior established goal (see Fig. 4 for a representation of the four conditions of the experiment). We also manipulated whether the bodily position in the critical panel was continuous with that of the prior panel or indicated a change in position (i.e., discontinuous). Importantly, the changes in bodily position could be interpreted as being either consistent or inconsistent with the prior goal (e.g., the character could be printing the paper or printing the plane ticket). In experiments 1 and 2, participants read the picture stories. In experiment 1, we recorded both viewing time and segmentation behavior via a unitization task, and in experiment 2 we recorded viewing time only. The unitization task involved participants indicating when there were changes in the events conveyed in the story pictures. Participants were not instructed about what constitutes a change in events; rather, they were allowed to determine this with no experimenter input. Segmentation judgments have been found to be highly reliable within and across participants (Magliano et al., 2012; Zacks et al., 2001). Additionally, these judgments are correlated with theoretically meaningful changes in videos (Zacks, Kumar, et al., 2009) and text (Speer et al., 2007; Speer & Zacks, 2005; Zacks, Kumar, et al., 2009).
There are two possible hypotheses for how people process changes in goal status and bodily positions when comprehending static visual narratives. First, it may be that changes in body position and goals are processed independently of each other. If this is the case, the probability of segmentation would increase when there is a shift in goal and when there is a shift in body position, but there would be no interaction. Alternatively, it is possible that there is an interaction between goal and body position such that shifts in body position increase segmentation likelihood, but only when goal continuity is maintained. Specifically, the disruption of the goal in the discontinuous goal condition would overshadow the impact of the body shift in the body discontinuous condition. However, in the continuous goal condition, the shift in bodily position indicates incremental and meaningful change toward the completion of the goal (e.g., completing the paper) that may require updating (Kurby & Zacks, 2012).