Visual completion from 2D cross-sections: Implications for visual theory and STEM education and practice

Gagnier, Kristin Michod; Shipley, Thomas F.

doi:10.1186/s41235-016-0010-y

Original article
Open access
Published: 22 September 2016

Cognitive Research: Principles and Implications, volume 1, Article number: 9 (2016)

Abstract

Accurately inferring three-dimensional (3D) structure from only a cross-section through that structure is not possible. However, many observers seem to be unaware of this fact. We present evidence for a 3D amodal completion process that may explain this phenomenon and provide new insights into how the perceptual system processes 3D structures. Across four experiments, observers viewed cross-sections of common objects and reported whether regions visible on the surface extended into the object. If they reported that the region extended, they were asked to indicate the orientation of extension or that the 3D shape was unknowable from the cross-section. Across Experiments 1, 2, and 3, participants frequently inferred 3D forms from surface views, showing a specific prior to report that regions in the cross-section extend straight back into the object, with little variance in orientation. In Experiment 3, we examined whether 3D visual inferences made from cross-sections are similar to other cases of amodal completion by examining how the inferences were influenced by observers’ knowledge of the objects. Finally, in Experiment 4, we demonstrate that these systematic visual inferences are unlikely to result from demand characteristics or response biases. We argue that these 3D visual inferences have been largely unrecognized by the perception community, and have implications for models of 3D visual completion and science education.

Significance

Practitioners of science, technology, engineering, and mathematics (STEM) disciplines such as radiologists, geologists, and surgeons often have to make inferences about the three-dimensional (3D) structure of objects (organs, rocks, tumors) from two-dimensional (2D) surface views such as magnetic resonance imaging (MRI) slices or outcrops of rock. These inferences represent a challenging visual problem that has not been explored in the literature on amodal visual completion. Additionally, these inferences can be challenging for students, and understanding why they are difficult has the potential to inform education. In this paper, we present data that suggest specific priors in how observers infer 3D structures from 2D cross-sections; these priors influence both the accuracy of observers’ inferences and recognition of when accurate inferences are not possible. These priors have implications for theories of visual processing and also in science education for how to develop effective pedagogical approaches to teaching students to make inferences about the 3D structures from 2D cross-sections.

Visual Completion from 2D Cross-Sections: Implications for Visual Theory and STEM Education and Practice

If you inspect the rust-colored lines on each side of the rock in Fig. 1, no doubt you will have the impression of planar layers of mineral in the marble. Observers seem to have no difficulty connecting the similar-colored lines on each side, and making an inference about how they extend into the rock. Notice that we use the term inference here because one cannot actually see the planar form inside the rock. This may seem like a trivial achievement. In this case, accurately inferring the 3D shape of the layers from the surface features is possible because two sides of the rock are visible, so the internal structure may be inferred by filling in the plane between the lines. However, consider the problem if only a single side were visible, as is the case in a single cross-sectional view of any object. In this case, the single planar layer of marble would appear as a line on the surface; however, the line would provide no information about the orientation of the internal layer.

To illustrate, we provide an example outside of geology and invite the reader to consider the slice of bread shown in Fig. 2a. Most people report a clear impression that the cinnamon layer extends straight back into the slice, often arguing, “How could it be otherwise; surely the bread was baked with a layer that extended from one end to the other.” This logic is faulty, as it assumes (a) completely regular dough thickness during the rolling of dough and cinnamon, (b) perfect alignment of the shaped dough with the pan, and (c) completely isotropic rising. More importantly, in this case, it is incorrect. The dip^{Footnote 1} of the cinnamon is shown in Fig. 2b; the cinnamon extends into this slice at a dip angle of 60°, not 90°.

Here we present work on a class of 3D visual inferences that has not received much attention in the visual perception community – inferring the 3D interior structure of an object from surface features and possible constraints on these inferences. This visual task is important for many science, technology, engineering, and mathematics (STEM) disciplines in which 3D structures are measured or displayed using a series of cross-sections. (For example, a radiologist infers the 3D structure of a tumor from MRI scan slices, and a geologist infers 3D structures in the Earth from outcrops visible on the surface.) This task is challenging for students (Kali & Orion, 1996) and experts (Bond, Lunn, Shipton, & Lunn, 2012) alike. Understanding visual inferences about 3D structures can offer insights into principles of visual processing and has implications for STEM education and practice. Our aims are to begin to characterize visual inferences about 3D structure from surface features and to highlight the similarity between this inference and amodal completion (Kanizsa, 1979; Michotte, Thinès, & Crabbé, 1964).

Background

A central problem in midlevel vision is to understand how the visual system uses light projected onto a 2-dimensional (2D) surface to represent 3-dimensional (3D) properties of the world. Our world is 3D, and thus we have to mentally represent and process the 3D structure of objects. Yet, the retinal image is flat, so the 3D structure of an object has to be inferred by the visual system. Much of the work on this problem has been focused broadly on the question of how spatial relations are recovered from the 2D retinal projection. Seminal work on this problem has identified how 3D shape is inferred from stereoscopic disparity (Marr & Poggio, 1976), edges (Marr & Hildreth, 1980), contour shape (Marr & Nishihara, 1978; Ullman, 1989), shading (Buelthoff & Yuille, 1991; Hayward, 1998), silhouette (Koenderink & Van Doorn, 1991), motion (i.e., the kinetic depth effect; Wallach & O’Connell, 1953), and textural gradients (Gibson, 1950), collectively referred to as shape-from-X (Buelthoff & Yuille, 1991).

The problem of inferring the 3D shape from a cross-section is similar to other 3D shape-from-X problems where observers have to make inferences about the 3D shape from limited perceptual information. Despite the inherent ambiguity of structures visible on a cross-sectional surface, observers may nevertheless have clear and systematic impressions of the shape and orientation these structures take as they extend into the object. The possibility that observers have clear and systematic impressions of how regions extend into cross-sections, or priors, first came to our attention in discussions with geologists, who noted that their students often reported that geological structures seen in an outcrop (a cross-section of rock) extended straight back into the rock (Kali & Orion, 1996). The purpose of the work presented in this paper is to examine the generality and implications of this observation.

We borrow the term prior from the Bayesian framework, which models decision-making under uncertainty. This framework has been applied successfully to understanding the role of prior knowledge in visual perception, as well as perceptual illusions and constancies (Kali & Orion, 1996). This work proposes that experience on a personal or evolutionary time scale supports unconscious statistical inferences (Helmholtz, 1867) about the probability of the environment’s being a particular way, given specific sensory input. For example, in apparent motion, there is a specific prior to infer a straight path between objects that successively appear in different locations, unless the objects are biological, where curved paths may be more likely (Heptulla-Chatterjee, Freyd, & Shiffrar, 1996). In such a case, priors may reveal something about the statistical likelihood of objects or events in the environment.
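
The role a prior plays when the sensory input is uninformative can be illustrated with a minimal sketch of Bayes' rule over a handful of candidate dip angles. This is only an illustration of the framework, not a model fitted to any data: the candidate angles, the prior probabilities, and the function name `posterior` are all hypothetical. The sketch assumes that a single cross-section is equally consistent with every dip angle, so the likelihood is flat.

```python
def posterior(prior, likelihood):
    """Combine a discrete prior and likelihood with Bayes' rule, then normalize."""
    unnormalized = [p * l for p, l in zip(prior, likelihood)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


# Hypothetical candidate dip angles (degrees) and an illustrative prior
# that favors 90 degrees. These numbers are made up for the sketch.
angles = [30, 60, 90, 120, 150]
prior = [0.05, 0.15, 0.60, 0.15, 0.05]

# Assumption: one cross-section is equally consistent with every dip angle,
# so the likelihood is flat and carries no information.
flat_likelihood = [1.0] * len(angles)

post = posterior(prior, flat_likelihood)

# With a flat likelihood the posterior simply reproduces the prior,
# so the most probable angle is whatever the prior favored.
best_angle = angles[post.index(max(post))]
```

Under these assumptions the inferred angle is determined entirely by the prior, which is the sense in which a systematic report of 90° from an ambiguous cross-section would reveal a prior.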

Across four experiments, we examined visual inferences from structures visible only in cross-sectional views (e.g., the rust-colored lines in Fig. 1 and the cinnamon swirl in Fig. 2). We showed undergraduate psychology students photographs (Experiments 1 and 2) or 3D objects (Experiment 3) where a single surface was visible and asked for their impression of how a specific region extended into the object. We examined whether (a) observers tended to infer 3D forms from single surface views or whether this phenomenon was restricted to a few rocks or geology students, (b) there is a prior to infer regions visible in a cross-section as extending back into the object at 90°, (c) observers can correctly infer the 3D structure if given sufficient information, and (d) the visual inferences made from cross-sections occur in spite of world knowledge and thus may be similar to the visual inferences seen in amodal completion (Kanizsa & Gerbino, 1982; Michotte et al., 1964), where structures that are not visible are inferred from visible structures.

To preview our results, in Experiments 1, 2, and 3, observers did indeed report regions in cross-sections as extending straight back into the object at 90°. However, in Experiment 2, when observers were given sufficient information, they were generally correct in their inferences and thus did not show evidence of this prior. Finally, in Experiments 3 and 4, we show that this prior is not adjusted by memory or beliefs, suggesting similarities with amodal completion. We argue that this prior to report regions as extending straight back is revealing about how the perceptual system processes information about the interior of objects from information on the object’s surface and is relevant to STEM education where students learn about interior structures from cross-sections.

Experiment 1

The purpose of Experiment 1 was to describe naïve observers’ reports of the internal structure of objects inferred from surface patterns. Do observers (a) report 3D forms, and, if so, (b) how do they report those regions as continuing into the object? Observers viewed photographs of cross-sections of everyday objects (rocks, wood, food) and indicated whether a region highlighted with a red line (shown in Fig. 2a) was present only on the surface or whether it extended into the object. If they thought it extended into the object, they used a bar attached to an inclinometer to show the angle of extension. If participants tended to infer regions in a cross-section as extending straight back in three dimensions, then we would expect dip estimates to cluster near 90°. If inferences of cross-sections are not systematic, then we would expect to see a wide range of estimates of extension (e.g., some estimates at 45°, some at 170°).

Anticipating there might be some individual differences, we sought to determine whether there was a relationship between the estimated dip angle and performance on measures of spatial reasoning about perspective and orientation of 3D forms. We hypothesized that observers who performed better at measures of spatial reasoning might be more likely to recognize that one cannot know the true 3D shape from a single cross-sectional view.

Methods

Participants

Participants were 30 Temple University undergraduates (19 females) fulfilling a requirement for an introductory psychology course.

Stimuli

The stimuli consisted of 17 color photographs of cross-sections of common objects such as food, wood, and rocks, as shown in Fig. 2a. One practice image (a Swiss roll) and 16 experimental images were used. Stimuli fell into three categories: (a) biological [fruits (n = 3); wood (n = 2); vegetable, fish, and meat (n = 2)], (b) geological [rocks (n = 4)], and (c) analogues to igneous rocks, that is, food products that were originally liquid and are now solid (blue cheese, chocolate with almonds, and cinnamon bread). The images were approximately 25 × 18 cm when presented on the screen.

These categories were selected because the internal structure of the objects ranged from highly structured and constrained by the environment (e.g., wood grain) to relatively unconstrained (e.g., minerals in rock), and thus the angle of extension into the object is either predictable within a certain range or completely unpredictable. For example, the internal structure of wood is constrained by the environment. As tree structures are generally concentric cylinders, the extension into a block of wood is a function of the angle of the cut relative to the cylinders. The internal composition of rocks can be structured, but the orientation of a mineral’s surface relative to the cutting plane is essentially arbitrary, and thus the 3D structure is unpredictable from a single cross-section. This is also true for geologically analogous stimuli such as nuts in chocolate—the orientation of the nut relative to the orientation of the cut is arbitrary.

These objects were chosen with two additional constraints: (a) that we were physically able to slice each object and measure the true angle at which each highlighted region extended into the object, and (b) that we included a variety of objects that might be familiar to participants.

Apparatus

Stimuli were presented on a 20-inch Dell monitor. As shown in Fig. 3a, the monitor was positioned parallel to the ground.

Procedure

Participants were tested individually in a well-illuminated room. They viewed each picture while standing with both eyes open and positioned over the center of the monitor (as shown in Fig. 3a). Participants were told that we were interested in their opinions of how regions continue in three dimensions. They were told that sometimes they would see pictures where they might have a strong sense that a region continued and sometimes they might have a sense that something was present only on the surface. To illustrate these two cases, participants were shown a cross-section of the front of a Swiss roll (a filled pastry with alternating layers of chocolate cake and cream) and crayon marks on paper. All participants reported the layers of the Swiss roll as extending into the object and the crayon marks as present only on the surface.

Observers were then shown 16 pictures. For each, they indicated whether the region indicated with a red line was present only on the surface (like the crayon marks on the paper) or extended into the object (like the Swiss roll). If they thought the region extended into the object, they indicated the orientation using a stainless steel bar with an attached inclinometer (to measure angle). Participants placed the end of the bar on the red line and then moved the bar to indicate the angle. Angles were measured relative to the ground plane, which defined 0° (i.e., if the bar was angled straight down, as shown in Fig. 3a, the angle measurement was 90°). After this, they reported their confidence in their response on a 5-point scale (5 indicates “extremely confident”). Prior to viewing the 16 pictures, participants practiced orienting the bar on the image of the Swiss roll.

To be sure that there were no differences in the estimates based on the orientation of the picture, participants viewed all 16 pictures in their original orientation and rotated 180°. This allowed us to calculate any bias due to body position relative to the image. Finally, participants were shown the pictures a third time and asked to identify each picture. For any response given a confidence rating of 0 or 1, we further probed the participants’ uncertainty by asking them which of the following reasons best described their confidence rating: (a) they have no idea what the orientation could be, (b) the orientation is not knowable from the picture, or (c) there could be a range of possible orientations at which the region extended into the object. After this, participants completed three measures of spatial reasoning.

Measures of Spatial Reasoning

The Geologic Block Cross-Sectioning Test (GBCT; Ormand et al., 2014) is a measure of visualizing volumetric forms from cross-sections. Participants were given 8 minutes to complete 14 problems in which they had to select the cross-section that resulted from a pictured cut into a geologic block diagram (as shown in Fig. 4).

The Object Perspective Taking Test (Kozhevnikov & Hegarty, 2001) is a test in visualizing the locations of objects when seen from a specific perspective. A configuration of seven objects is shown, and participants imagine standing at the position of one object, facing another object, and they indicate the direction to a third object. Participants had 5 minutes to complete 12 questions, and the dependent measure is angular error.
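
An angular error measure of this kind is commonly computed as the smallest absolute angle between the indicated direction and the correct direction. The following minimal sketch shows one way to do this, assuming directions are expressed in degrees on a full circle. The function names are hypothetical, and the exact scoring procedure used for the test is not described in this section.

```python
def angular_error(response_deg, correct_deg):
    """Smallest absolute angle (0 to 180 degrees) between two directions."""
    diff = abs(response_deg - correct_deg) % 360
    # Differences larger than a half turn are shorter the other way around.
    return 360 - diff if diff > 180 else diff


def mean_angular_error(responses_deg, correct_deg):
    """Average the per-item angular errors for one participant."""
    errors = [angular_error(r, c) for r, c in zip(responses_deg, correct_deg)]
    return sum(errors) / len(errors)
```

The wrap-around step matters because a response of 350° to a correct direction of 10° is off by 20°, not 340°.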

The paper-and-pencil version (Peters et al., 1995) of the Vandenberg and Kuse (1978) Mental Rotations Test measures skill in visualizing objects after they are rotated. Each problem shows five line drawings of 3D forms similar to those used by Shepard and Metzler (1971): the target form is on the left, and four answer choices are presented on the right. For each problem, participants identify the two choices that are identical to, but rotated versions of, the target form. The test has two parts with 12 problems each, and participants were given 3 minutes to complete each part.

Unbiased Estimate Measurement

To calculate each participant’s unbiased estimate for each picture, we averaged the first estimate and 180° minus the second estimate (made when the picture was rotated 180°). Overall, participants exhibited a bias of 4.1° toward their body.
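
The averaging described above can be sketched as follows. The function and parameter names are hypothetical. The half-difference used for the bias term is an assumption about how a constant body-relative offset would be recovered, not a computation stated in the text, and its sign convention is arbitrary.

```python
def unbiased_estimate(upright_deg, rotated_deg):
    """Average the first estimate with the mirrored second estimate.

    The second estimate was made with the picture rotated 180 degrees,
    so it is mapped back to the original frame as 180 minus its value.
    """
    mirrored = 180.0 - rotated_deg
    return (upright_deg + mirrored) / 2.0


def body_bias(upright_deg, rotated_deg):
    """Assumed bias term: half the difference between the two estimates
    once both are expressed in the original frame."""
    mirrored = 180.0 - rotated_deg
    return (upright_deg - mirrored) / 2.0
```

For example, a participant who sets the bar to 94° in both orientations has a mirrored second estimate of 86°, giving an unbiased estimate of 90° and, under the assumed definition, a constant offset of 4°.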

Results and Discussion

Participants were fairly confident that the 3D form could be determined from a cross-sectional view. In 74 % of the trials, they reported the region as “going into” the object and gave an estimate of dip with a mean confidence of 3.2 (SD 1.1). There was variation across images. Dip estimates were given most often for the salmon cross-section (97 % of the trials) and least often for the papaya cross-section (37 %).

Figure 5a shows the true dip angle (how the highlighted region actually extended into the object) and participants’ mean dip estimate for each picture. Although there is a wide range of true dip angles, participants reported regions as extending straight back into the object. Fifteen of the 16 pictures have mean dip estimates that are not significantly different from 90°.^{Footnote 2} Figure 5b shows the frequency distribution of dip estimates across all pictures. The estimates cluster near 90°, with relatively little variability: 76 % of responses were within 10° of 90°. When participants reported the regions as extending into the object, they said the regions extended straight back.
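
A clustering summary such as the share of responses within 10° of 90° can be computed with a short helper. This is a minimal sketch with hypothetical names; whether the 10° boundary is inclusive is an assumption, and the sample values in the usage note are made up rather than taken from the experiment.

```python
def proportion_within(estimates_deg, target_deg=90.0, tolerance_deg=10.0):
    """Fraction of dip estimates within a tolerance of a target angle.

    Assumption: the boundary is inclusive, so an estimate exactly
    tolerance_deg away from the target counts as within range.
    """
    hits = sum(1 for e in estimates_deg if abs(e - target_deg) <= tolerance_deg)
    return hits / len(estimates_deg)
```

With the illustrative estimates 85°, 90°, 100°, and 120°, three of the four fall within 10° of 90°, so the function returns 0.75.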

Next, we examined performance on our three measures of spatial reasoning. The mean number correct on the GBCT was 3.9 (SD 2.0) out of 14. The mean angular error for the Object Perspective Taking Test was 52.2 (SD 26.3).^{Footnote 3} The mean number correct on the Mental Rotations Test was 7.4 (SD 3.3) out of 24. Performance on the spatial measures did not predict the mean dip angle estimates (R² = .005, F < 1). This is perhaps not surprising, given the limited range of reported dip angles; most regions were reported as extending straight back into the objects.

In sum, the responses indicated a prior to infer that structures visible in a single cross-section extend straight back into the object at 90°. This was not limited to rocks, but also occurred for familiar everyday objects.

Experiment 2

In Experiment 2, we further probed the nature of the prior. We reasoned that perhaps observers knew that the orientation of an extension was unknowable from a single cross-sectional view, but because they were given only two answer choices (surface vs. extends in), they were prevented from expressing this knowledge. To address this, participants in Experiment 2 selected one of the following answer choices for each picture: (a) the region extends into the object, and I can show you the angle of extension; (b) the region is present only on the surface; (c) the region extends in, but from the picture you cannot know how it extends in; and (d) from the picture, you cannot tell if the region extends in or is on the surface (i.e., the answer is unknowable). As in Experiment 1, we examined whether there was a relationship between spatial reasoning performance and how likely a participant was to understand the uncertainty of a cross-sectional view. We also changed the orientation of the image to be sure that the effect observed in Experiment 1 was not dependent on the particulars of the viewing perspective. In Experiment 2, participants viewed the screen oriented perpendicular to the ground and indicated their dip estimates from this position (as shown in Fig. 3b).

Finally, we included two additional types of stimuli. First, we added five additional pictures in which the region was present only on the surface to serve as a check to ensure participants were using our dependent measure correctly. Second, after giving their responses to the pictures, participants viewed 3D models of layers of Play-Doh (as shown in Fig. 6). As observers could see how a region shown on the top extended in three dimensions by also looking at the cross-section on the side, we could determine if participants could correctly infer the orientation of extension when given sufficient information to solve the problem.