Although there is as yet no unified theory of embodiment, scholars of embodied cognition generally agree that mental processes are mediated by body-based systems, including body shape, movement, and scale; motor systems, including the neural systems engaged in action planning; and the systems involved in sensation and perception (Alibali & Nathan, 2012, p. 248).
The embodied mind
Theories of embodied cognition propose to explain the genesis of human conceptual knowledge representation and cognitive processing as rooted, to varying degrees, in the shape of the human body and its action within the environment. The pursuit of unifying various aspects of cognition with respect to bodily form and action, however, has given rise to various interpretations of the “embodiment” construct and of the degree to which behavior and knowledge representation can be viewed as body-based. Given the polysemous nature of embodied cognition and the incomplete theoretical parsimony of the literature, we first clarify what precisely we mean by embodiment, the evidence taken to support this account, and what implications arise from this theoretical framing.
Although theories of embodied cognition have been around for decades, there is no singular view of what is meant by the term “embodiment” or embodied action (Anderson, 2003). In a broad sense, embodiment is characterized by a shared assumption that the body, its particular form, and its sensory capacities supply a cognitive system with a rich input stream that shapes knowledge representation and the later cognitive processing of those representations. Despite this class of shared assumptions, however, researchers operationalize “the embodied mind” differently in empirical work, and it is not always acknowledged that embodiment serves as an umbrella term for lines of work separated by differing philosophical assumptions. Wilson (2002) has provided a succinct review of these threads, their assumptions, and the claims they have advanced. Embodiment, as referenced in this article, is consistent with Wilson’s sixth claim: namely, that offline cognition is body-based. Embodiment in this sense is a fundamentally brain-based phenomenon, where “[the] function of the sensorimotor resources is to run a simulation of some aspect of the physical world, as a means of representing information or drawing inferences” (Wilson, 2002, p. 633).
Some important existence proofs of embodiment have arisen out of studies of semantics, suggesting that meaning and knowledge representation engage sensorimotor simulation. Humans are quicker to identify whether a common household tool (e.g., frying pan, hammer) is in the correct or inverted orientation when the stimulus is presented in an orientation consistent with how a human would grasp that object for action (Tucker & Ellis, 1998). In demonstrating the action-sentence compatibility effect (ACE), Glenberg and Kaschak (2002) found that participants are quicker to judge whether a phrase is sensical when they move their body in a way congruent with the motion implied by the sentence. For example, if participants were asked to press a button that required extension of the arm away from the body when reading the sentence “Mary pulled the drawer toward herself,” it would take longer to judge this sentence as sensical than in the condition where the motion and the semantic meaning of the sentence were aligned.
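For illustration only (the notation here is ours, not Glenberg and Kaschak’s), the ACE amounts to a simple contrast between mean response times in the two movement conditions:

\[
\Delta RT \;=\; \overline{RT}_{\mathrm{incongruent}} \;-\; \overline{RT}_{\mathrm{congruent}} \;>\; 0,
\]

where a reliably positive \(\Delta RT\) indicates that sensibility judgments are slowed when the required arm movement conflicts with the motion implied by the sentence.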
Further behavioral evidence from language processing studies has shown that comprehension of certain grammatical constructions results from a mental simulation within the action space implied by the sentence. Kaschak and Glenberg (2000) demonstrated that meaning in language is not purely syntactic: when presented with innovative denominal verbs constructed from nouns, participants were quicker to judge sentences as sensical when the affordances of the noun were consistent with the action implied in the sentence. For example, the sentence “the woman crutched the goalie the ball” would be judged as sensical over the same grammatical construction “the woman egg-shelled the goalie the ball,” because a crutch has particular affordances, such as rigidity and extension, that allow the woman to transfer the ball to the goalie and that an egg shell lacks. Of importance, these effects require no common association between verb and object. Detectable differences in reading time are also found when participants read sentences that combine the affordances of objects that exhibit no typical association, but whose affordances mesh during mental simulation (e.g., “hang the coat on the vacuum”). Kaschak and Glenberg argue that a view of language comprehension as a manipulation and combination of abstract symbolic knowledge would not predict such reading time differences. Further, they argue that these studies lend evidence to support the view that humans draw on modality-specific information even though they are displaced in space from the scene, actors, objects, and syntactic relationships implied by a sentence.
Such behavioral findings are consistent with Wilson’s (2002) body-based view of offline cognition. A growing number of neuroimaging studies have also sought to directly image the brain regions engaged in these object recognition and conceptual processing tasks. For example, the motor system has been shown to activate selectively when individuals observe objects that have common action affordances or read action-related words. Functional magnetic resonance imaging (fMRI) studies have demonstrated that when right-handed participants see images of tools (e.g., a hammer), higher levels of activation are observed in the left ventral premotor cortex compared to viewing objects with no typical associated hand movements (e.g., an elephant), suggesting that perception of manipulable objects automatically elicits imagined interactions with those objects (Chao & Martin, 2000). When participants read action-related words such as kick, pick, and lick, motor-specific leg, hand, and mouth areas of the brain activate in response to the word, indicating sensorimotor activation in the comprehension of these words (Pulvermüller, 2005). Moreover, virtual lesions induced via transcranial magnetic stimulation (TMS) have been shown to produce differences in behavioral performance with respect to comprehension of such action words (Pulvermüller, Hauk, Nikulin, & Ilmoniemi, 2005).
These empirical findings provide substance to the claim that cognition and knowledge representation engage body-based simulation. However, note that we reject the “strong” formulation of the embodiment hypothesis, given that it is incongruous with available neuroscientific evidence (Chatterjee, 2010; Meteyard et al., 2012). That is, the claim that human knowledge and cognitive processing are completely embodied and composed solely of sensorimotor content is an untenable position. Recent critiques have emphasized that while the evidence clearly shows that modal areas of the brain activate during conceptual processing, it has not been ruled out that a more abstract form of conceptual knowledge simply cascades to these areas of the brain in a functionally unimportant manner (Mahon & Caramazza, 2008). Meteyard et al. (2012) reviewed a number of theories of embodiment and evaluated the claims advanced about semantic processing against the neuroscientific and neuropsychological literature. The theories of embodiment were graded on a continuum and characterized as: (1) non-embodied, (2) secondary embodiment, (3) weakly embodied, or (4) strongly embodied. At one end of the continuum, non-embodied theories encompass traditional cognitivist views of representation. At the other end, strongly embodied theories argue that primary perceptual cortices are directly recruited in knowledge representation and that veridical sensory impressions are simulated during semantic grounding. In the middle of the continuum, theories of secondary and weak embodiment disagree primarily on the extent to which the modal regions of the brain are directly implicated in knowledge representation. Theories of secondary embodiment propose “that the semantic system is independent of but directly associated with sensory and motor information,” whereas theories of weak embodiment “propose that semantic representations are at least partly constituted by sensory-motor information” (Meteyard et al., 2012, pp. 791–2).
Meteyard and colleagues (2012) argued that the current evidence best aligns with claims from weak embodiment, where functional neural clusters have been found in regions parallel (e.g., anterior) to primary sensory cortices. Consistent with the functional-anatomical hypothesis (Chatterjee, 2008), weak embodiment holds that functional neural clusters organized near primary perceptual cortices, but non-isomorphic to them, serve to abstract features of experience and provide input into higher-order representational systems. In a weakly embodied view, abstracted modal experiences converge during the access of mental representations in convergence zones, where simulation “…may instead be the activation of feature conjunctions sufficient to represent a given object, or word” (Meteyard et al., 2012, pp. 794–5).
Establishing that knowledge is not solely embodied does not rule out the possibility that all knowledge has some embodied component. Addressing this possibility in its full complexity will likely require years of targeted behavioral and brain imaging studies, but current evidence suggests that cognitive processes that recruit mental imagery have a more strongly embodied character: “…we find ourselves supporting a position where primary sensory and motor regions are not activated during routine semantic processing (in opposition to strong embodiment) but may be so for deeper processing related to imagery” (Meteyard et al., 2012, p. 801). Despite the ongoing debate, the stance advanced by Chatterjee (2010) that an embodied/disembodied dichotomy has “outlived its usefulness” in the face of the neuroscientific evidence moves the field beyond asking questions about the existence of embodiment. Instead, nuanced research questions that probe when and to what extent conceptual knowledge is embodied are likely to be more generative moving forward. Therefore, embodiment is better understood through the grounding metaphor: particular concepts, and even classes of concepts, are grounded in perception and action states from an individual’s prior experience, and such conceptual knowledge is mediated by simulation as a function of task demands.
Such behavioral and brain imaging studies provide a strong counterpoint to traditional cognitivist models of mental representation and computation (e.g., Newell & Simon, 1972; Pylyshyn, 1984), in which amodal symbols exhibit non-analog mappings to the external world. Instead, complementary lines of evidence support the view that important concept-driven processes (e.g., language comprehension) can draw on simulation of analog properties of the body and experience with the external world to ground meaning. Consistent with the reviewed literature, grounded views of embodiment do not position human perception as a veridical recording system (Barsalou, 2008; Meteyard et al., 2012). Rather, visual, auditory, kinesthetic, olfactory, and somatosensory experience provide a rich input spectrum to the cognitive system, from which features, relationships, and states are schematically abstracted (a process that is itself subject to error) and can generalize beyond the immediate situation in which they were produced (Barsalou, 1999).
As theories of embodiment are translated into accounts of learning, an open question remains about the mechanism by which new knowledge – especially abstract knowledge – can be accounted for within an embodied framework and how neurally abstracted sensory impressions are implicated in broader cognitive representation. While settling on a specific mechanism is outside the scope of this paper, a few possibilities are worth considering. The first is that embodied actions may provide students with novel representations for structuring information and problem solving. In this view, the performance of actions with the body provides learners with new representations, such as representational gestures, that foreground aspects of a domain in a stable form that can be readily reproduced. These body-based representations might become part of a learner’s “toolbox,” providing utility for reasoning on novel tasks, where bodily action might serve to alter the way in which the learner structures their thinking.
A second possible mechanism is analogy/metaphor. Analogy and metaphor have long received attention as mechanisms by which sensorimotor impressions derived from experience help structure more abstract reasoning (Lakoff & Johnson, 1980, 1999; for a review see Jamrozik et al., 2016). In this view, individuals repeatedly access sensory and motor impressions of concrete objects and events and come to abstract the more generalizable relational properties of these instances that apply to other knowledge. For instance, the phrase “negotiation is a muscle” could be understood by first accessing sensory impressions of muscles: muscles are flexible and can apply force; muscles can be strengthened with practice. These generalized relationships can then become ascribed to “negotiation” in a way that is not made explicit in the turn of phrase alone. Of importance, fMRI work has constrained this view: sensorimotor simulation varies both as a function of the experiential nature of the source of the metaphor and as a function of the actual accrued exposure the individual has had perceiving and interacting with the source of that metaphor (Jamrozik et al., 2016). Such a mechanism is wide-ranging in its explanatory capacity and resembles other accounts, such as Barsalou’s (1999) perceptual symbol systems, where “multimodal traces of neural activity that contain at least some of the motor information present during actual sensorimotor experience” (Goldin-Meadow & Beilock, 2010, p. 665) can ground meaning in simulation.
A third possible mechanism is that performing actions with the body may serve to sharpen existing spatial representations to which a learner already has access. In spatial domains, the analogy/metaphor mechanism raises the question of what precisely serves as the source from which source-target mapping may proceed. Under this third view, to think spatially may actually involve reactivating an individual’s prior representation of a spatial concept and making it salient in the new context. This existing spatial representation, perhaps in the form of a motor or image schema, can then be mapped onto the novel task in a way that may hone the existing representation.
This integrative view of mind and body is a departure point from which to reframe what is at the disposal of a learner in a learning environment and what consequential utility might result from instruction that aims to promote embodiment around STEM concepts. Given the previous assertion that the “strong” form of embodiment is untenable, it is equally unsupported by current evidence that all facets of knowledge accessed during problem solving in STEM tasks simply lack an embodied alternative. However, there is evidence in the extant literature that human concepts of motion in space and the representation of spatial relationships constitute a class of knowledge that frequently recruits modal simulation, and that the body should thus serve as an inroad to promoting domain-relevant understanding of spatial concepts.
On the embodiment of spatial thought
Embodiment is not novel per se in investigations of human spatial thought and cognition. Developmental psychologists have long explored the connection between body–environment feedback, concept formation, and the development of mental representations (cf. Wellsby & Pexman, 2014). Piaget’s observations of his children and their development of push/pull schemas from interacting with blocks progressively removed from their immediate reach showed how infants learn about the allowed classes of interactions with their environments through action and perception feedback (Piaget, 1952; Piaget & Inhelder, 1956). In robotics and artificial intelligence, researchers found that by providing robots with biologically inspired perceptuomotor systems able to perceive, process, and encode aspects of the external world, they could create a form of intelligence that emerged in the absence of rich explicit internal representations of the environment (Brooks, 1991; Kirsh, 1991).
Recently, Waller (2014) argued that “[s]patial thought may be an excellent venue for [the modal basis of internal spatial representations], and may be relatively better poised than many other research domains to provide evidence for the constitutive claims of embodied cognition” (p. 148). Many tasks that are used as proxy measures for the human capacity to decode and manipulate spatial entities rely on analog mental simulation. For example, Shepard and Metzler’s (1971) canonical finding of a linear relationship between an individual’s response time and the angular disparity between two block figures in tests of speeded rotation suggests that processes like mental rotation rely on imagistic manipulation of the blocks as if they were actually being moved in the physical world. Moreover, interference studies using the Shepard and Metzler figures have demonstrated that when individuals are asked to rotate a joystick either aligned or anti-aligned with the required mental manipulation of the blocks, detectable response time differences emerge, with the aligned condition being quicker (Wexler, Kosslyn, & Berthoz, 1998). Chu and Kita (2008, 2011) have shown that individuals who gesture to solve similar mental rotation tasks outperform their non-gesturing counterparts and that this is a trainable skill.
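The chronometric signature of this finding can be summarized, purely for illustration (the notation is ours, not Shepard and Metzler’s), as a linear model of response time as a function of angular disparity:

\[
RT(\theta) \;\approx\; a + b\,\theta,
\]

where \(\theta\) is the angular disparity between the two block figures, \(a\) is a baseline time for encoding and responding, and \(b\) is the additional time incurred per unit of rotation. A reliably positive slope \(b\) is what motivates interpreting mental rotation as an analog, rate-limited transformation rather than a discrete symbolic comparison.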
If spatial knowledge exhibited a non-analog correspondence to modal experience, then preferred reference frames in spatial memory tasks should not be observed (Waller, 2014). In tasks probing judgment of relative distance, participants consistently prefer to encode the location of objects with respect to their natural corporeal orientation to gravity. Rather than a chance distribution of a participant’s reference frame with respect to an array of objects, there is a strong bias to align the upward z-axis of the array with the viewer’s bodily axis. In addition to the preference for the vertical bodily axis, Franklin and Tversky (1990) have demonstrated that location judgments are not made equally along all bodily axes. Testing the spatial framework hypothesis, Franklin and Tversky showed that an isotropic notion of space is undercut by response time biases: individuals in an upright position were fastest to identify objects from an imagined array above and below them, slower to identify objects in front of or behind them, and slowest to identify the location of objects on the lateral left–right body axis. Moreover, Kosslyn, Ball, and Reiser (1978) observed that spatial representations in memory preserve metric properties. Kosslyn and colleagues demonstrated that, irrespective of whether a participant viewed an actual spatial pictorial representation or simply imagined one, response times on tasks where individuals were asked to scan the image were nearly identical. Taken together, these findings point to an understanding of space that is modal and analog.
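The image-scanning results admit a similarly simple illustrative formulation (again, the notation is ours rather than Kosslyn and colleagues’):

\[
RT(d) \;\approx\; c + k\,d,
\]

where \(d\) is the metric distance to be scanned between two locations in the image and \(k\) is the scanning time per unit of distance. That response times track \(d\) in the same way whether the display is perceived or merely imagined is what licenses the claim that mental images preserve the metric structure of the represented space.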
Some of the earliest empirical evidence supporting the embodiment of spatial thought in modern psychology originated in cognitive linguistics. By investigating the implicit conceptual structure embedded in human language, Lakoff and Johnson (1980, 1999) argued that human concepts are fundamentally grounded in bodily experience and arise from experience with the world. In particular, Lakoff and Johnson provided evidence that spatial concepts are fundamentally embodied. They cited cross-linguistic analyses of spatial language to show that despite millennia of separate evolution, various human languages contain remarkably similar concepts of space that map to the anatomy of the human body. In English, for example, the constructions “the ball is on top of the box” or “the dog is behind the tree” can be interpreted from an egocentric reference frame that conceptualizes spatial relationships as structurally isomorphic to the human body. That is, in the first sentence “top” designates the point on an object most distal from the pull of gravity. For a human this location corresponds to the head and, by extension, to the analogous location on the box. In the second example, the tree acquires the attributes “front” and “back” because one hemisphere of the tree is in view (as would be the case in human discourse) while the other hemisphere is occluded from vision.
The use of the body and embodied knowledge to represent and think spatially has also been identified among expert STEM professionals engaged in their discipline. Ethnographic studies of scientists engaged in authentic practice have found that complex spatial ideas are often conveyed using representational gesture-based and body-based metaphors. When explaining the complex configuration of the protein thrombin, research biologists frequently recruited representational gestures, using their hands to demonstrate the complex conformational changes of the protein in the presence of thrombomodulin (Becvar, Hollan, & Hutchins, 2005). In a study of physicists collaboratively working to understand the relationship between temperature and magnetic transitions in a particular material, Ochs, Gonzales, and Jacoby (1996) found that scientists drew on body-based metaphors during discourse when they were confused about novel hypotheses (“When I come down I’m in the domain state”). Moreover, Nobel Laureate geneticist Barbara McClintock has long been recognized for her innovative approach of imagining herself as the plants she studied, “perceiving” the chromosomes (Henriksen, Good, & Mishra, 2015). McClintock’s work led to a number of significant breakthroughs in scientific understanding of gene expression, the exchange of genetic information during meiosis, and the preservation of information in telomeres and centromeres.
Neuroimaging studies have only recently begun to probe how the brain represents spatial information and the connection between spatial perception, conception, and language. When individuals view a visual scene and are then asked to imagine it in the absence of the stimulus, as many as 90% of the voxels that are active during online perception are also activated during imagined viewing of the scene (Ganis, Thompson, & Kosslyn, 2004; Kosslyn, Thompson, & Alpert, 1997). Studies of spatial language have found that the separate grammatical constructions for manner and path found in spoken language comport with the neural divergence of manner and path information along ventral and dorsal pathways. Regions of lateral temporal cortex associated with action perception appear to mediate semantic grounding of spatial language, with more metaphoric uses of spatial language mediated more anteriorly along the middle temporal gyrus (Chatterjee, 2010). These findings are broadly consistent with the thesis that humans conceive of space in a manner captured by the grounded account of embodiment: mental representation maintains analog and metric properties, spatial computation interfaces with the motor system, humans exhibit a strong preference to encode spatial relationships consistent with body orientation, highly similar brain regions activate during perception and imagination of a visual scene, and spatial language reflects real regional specializations for conceiving and perceiving spatial relationships such as object manner and path. Thus, we propose that if conceptual knowledge of space is mediated by body-based systems, it should be groundable through bodily action.
Promoting spatial thinking through embodied actions
National reform efforts have emphasized that spatial ways of thinking and problem solving are not broadly represented in contemporary curricula (National Research Council, 2006). In contrast to the long-standing history of reform in mathematics and verbal literacy education, researchers and educators have paid comparatively little attention to supporting learners at all levels to master knowledge of space, spatial concepts, and the concomitant habits of mind that produce critical thinkers in STEM. The report argues that such a constellation of skills constitutes a particular form of literacy, subject to normative forces, where the literate student should be able to: (1) “have the habit of mind of thinking spatially,” that is, know the contexts in which it is appropriate to think spatially; (2) “practice spatial thinking in an informed way,” that is, do so in a principled manner built on a solid understanding of underlying concepts and tools of representation; and (3) “adopt a critical stance to spatial thinking,” that is, critically evaluate sources of data as well as products of problem solving (National Research Council, 2006, p. 20).
Of the aspects of spatial literacy that the National Research Council (2006) report highlights, supporting learners during problem solving to know how to use concepts and representations of space to structure problems in a domain is most clearly related to traditional learning outcomes. Teaching students to think spatially, then, requires considering how embodied actions can promote and “… [nurture] the practical, emotional, or imaginative states that are thought to undergird formal analytical thought” (Waller, 2014, p. 147, emphasis added). Embodied actions, as the name suggests, are related to embodiment (and in fact, deeply so), but the distinction between the two is important. We operationalize embodied actions as the purposeful body positions and movements that an individual engages in during a learning activity, where these body states and actions exhibit a non-trivial relationship to the targeted learning objective. In the case of spatial thinking, an embodied action would represent any purposeful body state or motion that reproduces a structural mapping to a spatial concept during learning. This distinction between embodiment and embodied action is important because if embodiment is ultimately a brain-based phenomenon, then embodied actions represent the physical antecedents of the embodied mind. Moreover, learning environments can only directly manipulate the physical actions of the learner in pursuit of promoting embodiment around a learning objective.
With this distinction in mind, there is already promising evidence that embodied actions play an important role in spatial thinking. Research on gesture for learning has shown that gesture can influence how learners approach spatial problems. For example, when students are given classic gear chaining problems in engineering, learners taught to physically trace the direction of gears are more likely to use gesture in their solution strategy (Alibali, Spencer, Knox, & Kita, 2011). Goldin-Meadow, Cook, and Mitchell (2009) showed that when students are taught to group addends with a gesture not explicitly mentioned in speech, they take up the grouping gesture as part of their explanations. Of importance, these gestures do not just influence strategy choice; they can also improve learning outcomes over traditional instruction. For example, students who used gestures to represent the grouping of addends in a mathematical equality were significantly more likely to mention grouping as a strategy in their verbal explanation, and their gesture frequency was associated with higher scores on an achievement assessment. Similarly, chemistry students who are taught to use their hands to represent and maintain complex spatial relationships in organic molecules outperform their peers when asked to draw structural diagrams of a molecule from multiple perspectives (Stieff, Lira, & Scopelitis, 2016).
All of these examples of disciplinary learning show that gesture, and arguably embodied actions more broadly, may serve as a useful resource in improving spatial thinking precisely because they ground understanding of spatial concepts in bodily action. In fact, as a manifestation of the embodied mind (Alibali, 2005; Hostetter & Alibali, 2008), gesture has been argued to enhance thought by foregrounding action in mental representation (Goldin-Meadow & Beilock, 2010). As argued previously, spatial thought may regularly be grounded in the simulation of perception and action states as a means to structure information and draw inferences: mental representations of space maintain analog features of the external world (Kosslyn et al., 1978) and mental imagistic processes such as mental rotation interface with the body’s motor system (Chu & Kita, 2008; Waller, 2014; Wexler et al., 1998). Compellingly, the imagery and computation underlying spatial thinking can be selectively enhanced through instruction employing embodied actions. For example, Craig, Nersessian, and Catrambone (2002) demonstrated that, when presented with the classic “radiation problem” (Gick & Holyoak, 1980), which involves deciding how to irradiate a tumor without killing the patient, participants who were asked to make gestures describing their solution strategy performed much better than participants who only made a drawing of their solution strategy. Studies such as these demonstrate that gesture may have a unique role in supporting spatial thinking that goes beyond drawing a learner’s attention to spatial information or making spatial information more salient.
Broader habits of mind to think spatially and to engage in spatial processes of reasoning may also be instructionally supported through embodied actions. Gestures, for example, have been shown to play an important role in learning interactions: they externalize imagistic aspects of thought and coordinate shared attention around representational tools during learning (Alibali & Nathan, 2012), and speech-gesture discordance indicates a learner’s receptiveness to instruction (Alibali & Goldin-Meadow, 1993; Breckinridge-Church & Goldin-Meadow, 1986). Gestures can also feed back into and lay the foundation for new knowledge (Goldin-Meadow et al., 2009). Such research suggests that gesture has clear utility as a visible formative assessment tool, but that it also serves a broader purpose in individual cognition. Supporting learners to think spatially means considering learning environment designs that provide the imaginative mental states about concepts of space, tools of representation, and processes of reasoning that underlie more formal thought (Waller, 2014). The reciprocal quality of gesture and other embodied actions, which both externalize thought and provide novel, retrievable resources during learning and problem solving, makes them a promising means through which to promote complex analytical reasoning such as spatial thinking. The position that embodied actions can act as resources to improve spatial thinking in STEM stands in sharp contrast to interventionist approaches that aim to do so through proxy training of spatial ability.
Constraining the breadth of embodiment
We have remained largely optimistic about the potential for embodied actions to improve spatial thinking, but we wish to constrain this position meaningfully: foremost, we strongly adhere to the philosophy that learning environments should always serve as a testbed where theories of learning can be placed in harm’s way (Cobb, Confrey, DiSessa, Lehrer, & Schauble, 2003). This means that although we are arguing that spatial thinking can be grounded in body-based simulation, we equally advocate a healthy skepticism throughout. By testing the viability of learning environments derived from theory, we can provide real evidence about whether viewing cognition as offline body-based simulation grounded in sensorimotor experience is consequential for supporting students across the STEM spectrum. Embodiment is not new to education research, and various scholars have brought a unique focus to how we understand embodiment as an individual and group phenomenon. Arguably, multiple lines of investigation are beneficial if the research field is to achieve any kind of theoretical parsimony. However, despite the benefits we stand to gain from studying embodiment as it plays out in situ, we also argue that how and when embodied actions best support spatial thinking is still poorly understood, and that it is necessary to first investigate these mechanisms through appropriate reductionism (cf. Núñez, 2012).
We acknowledge that learning often happens through complex interactions embedded in a sociomaterial environment. Some work has even specifically sought to understand the embodiment of disciplinary knowledge in social settings (e.g., Alibali & Nathan, 2012) or to advocate distributed embodiment in supporting students’ understanding of complex systems dynamics (e.g., Lindgren & Johnson-Glenberg, 2013). While there is strength in understanding embodiment as it arises in social contexts, such an approach encounters a few unavoidable confounds when trying to better understand embodiment as a cognitive phenomenon and its role in learning. Consider a dyad working together toward understanding the concept of the water cycle. In the process, these individuals produce a rich exchange of dialogue as well as various behaviors such as posture shifts, gestures, and inscriptions to explain how water moves from reservoirs, into the atmosphere, and then back down through precipitation. Let us say that one student is explaining how “water evaporates to become clouds (hand moves up, palm open).” This student's partner asks a clarifying question about evaporation and performs a similar gesture. Does the second student gesture because they are simulating evaporation in a way analogous to the first student? Or does this gesture now serve as a shared representation to facilitate communication? While such externalizations may provide insight into evolving spatial thinking, it is not clear whether the observed utterances result from the internal simulations that would have been observed from each individual working alone. Rather, it is likely that the functional roles of language, gesture, and inscription shift fluidly for these learners as they move between individual reflection, self-explanation, attending to their interlocutor, and working to construct inferences in their dyad. Gesture, in particular, which provides insight into the non-linguistic imagistic aspects of speech, is highly susceptible to social settings: the threshold above which someone may gesture is influenced by the implicit mores of the interaction (McNeill, 1992).
Such confounds to studying embodiment as a group process significantly limit the ability of research to draw rigorous claims about the extent to which grounded simulation may support targets such as spatial thinking. This is especially problematic given that recent critiques from neuroscience indicate that conceptual grounding may exhibit a developmental arc. Rather than conceptual grounding being binary, it may actually evolve along a continuum toward abstract knowledge representation (Chatterjee, 2010). For example, an individual may come to understand the spatial concept “above” over the course of multiple exposures to arrays of objects arranged in an invariant configuration that shapes their simulation of aboveness. This schematization of the relationships encoded in “above” no longer pertains to any specific objects, and the concept of “above” becomes flexible with respect to its referents. Chatterjee refers to this as referential ambiguity and argues that such abstraction is fundamental for concepts such as spatial relationships and configurations. If reasoning about space depended on first indexing specific objects, it would hold little utility in novel situations.
In addition to embodiment being subject to a developmental trajectory, spatial thinking is also a complex construct and represents no unified process (National Research Council, 2006). Spatial thinking subsumes various cognitive operations of visualizing and operating on spatial information, but it also more broadly implicates the tools and processes of reasoning that situate such cognitive competencies. We might instead assume that the role embodiment plays in supporting learners, as they develop an understanding of concepts of space, the use of tools to represent data and spatial phenomena, and the patterns of reasoning in their domain, will draw differentially on simulation. In fact, the demands of some spatial tasks may more readily activate offline simulation of perceptual and motor states (e.g., imagining an object moving through space, imagining positioning oneself at different vantage points around an object) than others (e.g., employing the formalisms of the Cartesian plane to characterize the velocity of an object). Given such unanswered questions, focusing on embodiment as a cognitive phenomenon at the individual level is likely to yield rich contributions to both the literature and theory. The development of spatial thinking, and the extent to which domain-relevant spatial concepts and processes of reasoning can be fostered through embodied action, remain underexplored and warrant investigations in parallel to efforts that focus on the social nature of embodiment.
There are also likely to be individual differences with respect to embodiment that remain underexamined in the extant literature. Perceptual symbols and simulators (Barsalou, 1999) represent broad theoretical constructs intended to accommodate all aspects of cognitive representation. As such, we should expect variation in the extent to which conceptual knowledge reliably produces similar sensorimotor activation across individuals. It is comparatively easy to hypothesize that the representation of tangible objects such as hand-manipulable tools (e.g., a hammer) should more readily give rise to similar neural activation across individuals in networks associated with hand movement and action planning than the representation of concrete objects that cannot be moved by hand (e.g., a building). However, specifying the content of simulation a priori becomes increasingly difficult as a concept becomes increasingly abstract (e.g., consider “the basis of freedom” or “world peace”). Such complexity may be attributable to Barsalou’s (1999) argument that perceptual simulators are simultaneously rational and empirical. He argues that simulators are rational because they are rooted in genetic factors. Indeed, the vast majority of humans exhibit a shared anatomy and physiology – bodily and neurally – that privileges particular classes of data from the world (audition, vision, proprioception, olfaction, etc.) and that is constrained by anatomical factors of the body such as the location of important sensory organs in the head, the bilateral symmetry of the body, and typical handedness preferences. Simulators, though, are also empirical because humans constantly abstract and accrete sensorimotor impressions from the external world in ways that reflect idiosyncratic experience. Humans broadly share a genetic basis for simulation, but the reality that each human inhabits a somewhat unique environment should give rise to variation in the perceptions and conceptions of the world that subserve offline cognition.
Under the notion of variable embodiment (Barsalou, 1999), individual differences should thus arise from idiosyncrasies that impact either the rational or the empirical basis of simulation. Factors that affect the rational basis of simulation arise from atypical anatomies and physiologies. For example, an individual who lacks input from some perceptual organ (e.g., is deaf or blind), has a congenital defect, has had a lesion or stroke, or is born with atypical anatomy (e.g., missing a limb) has a significantly altered rational input into the cognitive system, and this should give rise to unique embodiment. Within populations that do not have physiological disparities, differential embodiment should instead arise from the unique empirical experiences underlying simulation. Differences in experience occur constantly, given that no two humans inhabit the same corner of an ecological niche. In an fMRI study, ballet and capoeira dancers were shown videos of their particular dance style as well as of the style they had no experience performing. Calvo-Merino, Glaser, Grèzes, Passingham, and Haggard (2005) found that even though both groups of dancers had extensive experience engaging in their respective forms of dance, brain areas associated with action control activated more when dancers viewed a video depicting the form of dance they had personally performed. This effect was found even within a particular style of dance when male ballet dancers watched video of moves performed only by their female counterparts (Calvo-Merino, Grèzes, Glaser, Passingham, & Haggard, 2006). Thus, representation and simulation differ measurably as a function of direct experience, and experience observing an action differs from experience physically performing it. In the case of the dancers, the specificity of motor movement mattered, and the findings suggested that “[h]aving produced an action affected the ways the dancers perceived the action, suggesting that the systems involved in action production subserve action perception” (Goldin-Meadow & Beilock, 2010, p. 666).