Learning hierarchically organized science categories: simultaneous instruction at the high and subtype levels

Nosofsky, Robert M.; Slaughter, Colin; McDaniel, Mark A.

doi:10.1186/s41235-019-0200-5

Original article
Open access
Published: 19 December 2019

Cognitive Research: Principles and Implications volume 4, Article number: 48 (2019)

Abstract

Background

Most science categories are hierarchically organized, with various high-level divisions comprising numerous subtypes. If we suppose that one’s goal is to teach students to classify at the high level, past research has provided mixed evidence about whether an effective strategy is to require simultaneous classification learning of the subtypes. This past research was limited, however, either because authentic science categories were not tested, or because the procedures did not allow participants to form strong associations between subtype-level and high-level category names. Here we investigate a two-stage response-training procedure in which participants provide both a high-level and subtype-level response on most trials, with feedback provided at both levels. The procedure is tested in experiments in which participants learn to classify large sets of rocks that are representative of those taught in geoscience classes.

Results

The two-stage procedure yielded high-level classification performance that was as good as the performance of comparison groups who were trained solely at the high level. In addition, the two-stage group achieved far greater knowledge of the hierarchical structure of the categories than did the comparison controls.

Conclusion

In settings in which students are tasked with learning high-level names for rock types that are commonly taught in geoscience classes, it is best for students to learn simultaneously at the high and subtype levels (using training techniques similar to the presently investigated one). Beyond providing insights into the nature of category learning and representation, these findings have practical significance for improving science education.

Significance

A fundamental part of science education involves teaching the categories of the target domain. Furthermore, in numerous cases, the categories are hierarchically organized, with high-level divisions broken down into fundamental subtypes. This research addresses the question of whether requiring students to learn the subtypes may sometimes lead to more effective teaching of the high-level divisions themselves. The question is pursued here in basic-research laboratory experiments that investigate performance in a real-world science domain, namely rock classification in the geologic sciences. In particular, the participants in our studies learn to classify sets of images of rocks into categories that are commonly taught in college-level introductory geoscience courses. The results from the work provide firm suggestions for methods that are likely to be effective for teaching the hierarchical structure of categories in the science classroom.

Introduction

An integral part of science education is learning the categories of the domain of interest. For instance, botany focuses on classifying and learning plants, entomology on classifying and learning insects, and geology on classifying and learning rocks. As argued below, learning these categories is fundamental to scientific reasoning and inference and forms a significant component of college-level science curricula.

In the present research, our example target domain is rock classification in the geologic sciences. As is true of numerous natural science categories, rock types have a graded structure, with clear prototypical instances at their centers, but also with many less typical instances (Rosch, 1973; Smith & Medin, 1981). Thus, individual samples of the same type of rock can often display remarkable within-category variability. In addition, as is also true of most natural categories, the boundary lines dividing different rock types are often fuzzy, and the distributions of members from contrasting categories may sometimes even overlap. Moreover, rock categories have a hierarchical structure in which broader level categories (igneous, metamorphic, sedimentary) subsume lower level subtypes organized within each broad-level category (as displayed in Table 1). In the senses described above, rock classification appears to be both a challenging and representative example of natural science category learning.

Table 1 A breakdown of the rock high levels and subtype levels used in the study

Teaching rock classifications is one of the early goals in geoscience education. Introductory college-level geology textbooks devote multiple chapters to the classification of rocks (e.g., Marshak, 2015; Tarbuck & Lutgens, 2017), as does the National Association of Geoscience Teachers/American Geological Institute Laboratory manual in physical geology (Cronin, 2018). The textbook and laboratory manual chapters provide detailed descriptions of the major categories of rocks, and they attempt to characterize the key features and dimensions that organize and compose the rock categories. Further, laboratory sessions and field work associated with college-level introductory geoscience courses often devote significant amounts of time to the training of rock classifications.

Teaching fundamental categories, such as rock types in geology, is a core component of science curricula for a good reason. Categories are the building blocks of our basic thought processes and they provide an efficient means to allow us to reason about the nature of the world and draw inferences. Examples of the important role of rock classification in reasoning and inference abound in geology. For example, as conceptualized in the geologic sciences, one of the broad high-level divisions of rocks is the class of igneous rocks; this high-level division is composed of rocks formed from the solidification of magma. A major distinction between categories of igneous rocks is that of intrusive versus extrusive rocks. Intrusive igneous rocks, such as granite, are formed when magma solidifies at depth. In this case, the magma cools slowly, allowing large crystalline mineral structures to develop resulting in a coarse grain. By contrast, extrusive igneous rocks, such as rhyolite, are formed when magma solidifies in a surface environment. In this case, the magma cools quickly, resulting in a fine-grained crystalline structure. A geologist examining a terrain might therefore obtain clues about its history by determining whether the rocks that compose the terrain are intrusive or extrusive igneous rocks as evidenced by the grain size of the rocks.

As alluded to above, in numerous scientific domains the categories are hierarchically organized. For example, geologic scientists divide rocks into three broad, high-level categories: igneous, metamorphic and sedimentary (Marshak, 2015; Tarbuck & Lutgens, 2017). These broad categories are defined by how the rocks are formed. In brief, whereas igneous rocks are formed from the solidification of magma, metamorphic rocks are formed when other rocks are exposed to extreme heat and pressure, causing them to undergo changes in their physical or chemical structure. Finally, sedimentary rocks are formed when mineral and organic particles are deposited on the floor of bodies of water and are eventually cemented together. Each of these broad, high-level divisions is in turn broken down into fundamental subtypes. For example, common subtypes of igneous rocks are granite, obsidian and pumice; common subtypes of metamorphic rocks are gneiss, marble and quartzite; and common subtypes of sedimentary rocks are sandstone, shale and limestone.

If we suppose that the goal of the instructor is to teach students to classify into the high-level divisions of igneous, metamorphic and sedimentary, then a reasonable hypothesis is that, to achieve that goal, it might be best to focus training on that high level, without also requiring that students learn to discriminate among all the subtypes. Consistent with the principle of transfer-appropriate processing (e.g., Blaxton, 1989; Thomas & McDaniel, 2007), such training would focus on the outcome that is the instructor’s primary goal. Recent work reported by Noh, Yan, Vendetti, Castel, and Bjork (2014) is consistent with this hypothesis. In one of their conditions, these researchers had participants learn to classify pictures of snakes into the high-level categories of venomous versus nonvenomous. The participants’ high-level classification performance was better if they focused their learning solely at that high level, rather than also being required to simultaneously learn to discriminate among different subtypes of the venomous and nonvenomous snakes. Presumably, by focusing on the high level, participants learned more effectively to attend to features that are highly diagnostic of membership in the contrasting high-level categories (Nosofsky, 1986; Shepard, Hovland, & Jenkins, 1961). For example, venomous snakes tend to have arrow-shaped heads whereas nonvenomous snakes tend to have spoon-shaped heads. By contrast, use of the head-shape feature does not allow one to discriminate among different subtypes of venomous snakes or different subtypes of nonvenomous ones.

However, other research has sometimes pointed in the opposite direction, with subtype-level training being shown to be beneficial. For example, using a particular artificially designed category structure, several researchers found that learners displayed more accurate classification when trained and tested at a subtype level of a hierarchy than at a higher, more general level (Lassaline, Wisniewski, & Medin, 1992; Palmeri, 1999; Verheyen, Ameel, Rogers, & Storms, 2008).

Nosofsky, Sanders, Gerdom, Douglas, and McDaniel (2017) found evidence consistent with the hypothesis that the best teaching strategy may vary depending on the structure of the categories being learned. Using real-world rock categories as their target domain, these researchers compared two different teaching strategies across two different category structures. One teaching strategy focused solely on teaching the high-level divisions whereas, in the second, participants simultaneously learned to classify at both the high and subtype levels (see below for further details). The to-be-learned category structures, illustrated schematically in Fig. 1, were either compact or dispersed. As shown in Fig. 1, in both category-structure conditions each high-level division (igneous, metamorphic and sedimentary) was composed of three subtypes of rocks. In the compact condition, the subtypes were chosen such that all three subtypes belonging to the same high-level division were highly similar to one another, while being dissimilar to the subtypes from the alternative high-level categories. Thus, each high-level category formed a relatively compact cluster in a multidimensional similarity space. By contrast, in the dispersed condition, the subtypes were chosen such that each of the three subtypes belonging to the same high-level division were dissimilar to one another, occupying separate clusters of the similarity space. At the same time, each subtype was similar to individual subtypes from both other high-level divisions. We acknowledge that readers may find the category structure in the dispersed condition to be contrived because three different categories with the same high-level name are clustered together within each of the similarity groups. Nevertheless, we emphasize here that the category structure was produced by sampling real-world subtypes from the actual high-level divisions of igneous, metamorphic and sedimentary rocks. 
In other words, in the real world, there are many cases, for example, of igneous subtypes that are highly similar to metamorphic subtypes, and of igneous subtypes that are highly dissimilar from one another. We expand on this point in greater detail below.

In brief, Nosofsky et al. (2017) found that learning the high-level names of the rocks in the compact condition was better when the training procedure focused solely on teaching the high-level names for the rocks. By contrast, learning the high-level names of the rocks in the dispersed condition was better when the training procedure required participants to simultaneously learn both the high-level and subtype-level names. The latter result is of potentially high practical significance. Nosofsky et al. (2017) and Nosofsky, Sanders, Meagher, and Douglas (2018) conducted extensive similarity-scaling studies in which participants rated the similarity among pairs of items drawn from a large battery of igneous, metamorphic and sedimentary rocks that are representative of those taught in college-level introductory geoscience classes. Multidimensional scaling analyses of the similarity-judgment data revealed that the structure of the igneous, metamorphic and sedimentary categories does in fact appear to be relatively disorganized and dispersed (although not to the extreme degree illustrated in the right panel of Fig. 1).

Although suggestive, there were at least two major limitations of Nosofsky et al.’s (2017) initial study in terms of its practical implications. First, as already noted, the researchers constructed the compact and dispersed conditions by selectively sampling rock subtypes from the three high-level categories. A natural question is how the alternative teaching strategies would fare if participants were tasked with classifying a larger set of ‘authentically’ sampled rocks that are more representative of those taught in introductory geoscience classes.

A second and more fundamental limitation concerns the detailed method that Nosofsky et al. (2017) used in the condition in which participants learned to classify at both the high and subtype levels simultaneously. In particular, Nosofsky et al. used a ‘simultaneous paired-naming’ procedure, in which the response alternatives associated with each rock consisted simultaneously of the high-level category and the subtype name. For example, one rock might be designated as ‘igneous-granite’ and another as ‘metamorphic-marble’. Both members of the paired name were always simultaneously present when participants made their responses. To measure high-level naming performance, the researchers scored a response as correct if the participant indicated the correct high-level name, regardless of the subtype-level response that was indicated. Unfortunately, however, although the procedure was well motivated from a theoretical standpoint (see formal modeling presented by Nosofsky et al., 2017), from a practical standpoint it does not allow one to determine whether participants actually learned the high-level names at all. In particular, a participant could have learned at least some rocks solely at the subtype-naming level, without ever establishing an association between the subtype name and the high-level name. For example, if a participant learned that a particular rock sample was granite, then he or she would press the ‘igneous-granite’ response key and receive credit towards a correct high-level categorization response. It is unknown, however, whether the participant could have correctly classified the sample as ‘igneous’ if the subtype name (‘granite’) was not simultaneously present.

Miyatsu, Nosofsky, and McDaniel (in press) conducted a series of experiments to begin to address both of the above-stated limitations. One change to Nosofsky et al.’s (2017) experiment was that, rather than using the selectively sampled compact and dispersed structures, Miyatsu et al. had participants learn a larger number of rock subtypes that provided a more representative sampling of the sets of igneous, metamorphic and sedimentary rocks found in the natural world. A second change involved the training and testing procedures in the conditions in which participants were trained to classify at both levels of the rock category hierarchy. Of most direct relevance to the current practical question were the procedures used by Miyatsu et al. (in press) in their experiments 1 and 2. In their experiment 1, Miyatsu et al. used an observational training procedure in which participants studied pictures of rocks with names assigned to them. One group of participants studied the pictures with just the high-level names, whereas the second group studied the pictures with both the high-level and subtype names. (During this observational training participants were provided with general instructions to learn the names associated with the rocks.) Following the observational training phase, participants were tested on their ability to classify both old and new rock pictures into their high-level categories. Participants in the high-level name-only training group performed significantly better at the time of the test (on both old and new items) than did participants in the paired-name training group. One limitation of this design, however, involves the problem that many participants in the paired-name training group may have focused their learning on the subtype names without ever forming associations between the subtype-level and high-level names. 
Clearly, such participants would then be severely impaired when tested on their ability to classify the items into their high-level categories.

To potentially address this limitation, in a second experiment Miyatsu et al. (in press) tested two new groups of participants. The first group was again trained using observational training of only the high-level names, and the participants knew that they would eventually be tested on their ability to produce the correct high-level classifications. A second group, however, first engaged in observational classification training at only the subtype level (and were instructed to learn the subtype-level name assignments). Next, the participants in this group engaged in a separate paired-associate training phase in which they were trained on the pairings between the subtype-level and high-level category names. Finally, during the subsequent test phase, participants in this group attempted to classify each rock into its high-level category. Just as in their experiment 1, Miyatsu et al. found that the group that was trained on only the high-level names performed significantly better in high-level classification at the time of the test than did the participants in the subtype-level/paired-associate training group. Unfortunately, however, this new experimental design ended up with essentially the same main limitation as the previous one: during the paired-associate training phase, participants in Miyatsu et al.’s study achieved an accuracy level of only .76 in producing the high-level category name associated with each subtype name. Thus, even if participants had learned to classify extremely accurately at the subtype level, it stands to reason that the high-level classification performance for this group (during the final test phase) would be impaired. In addition, Miyatsu et al.’s experiment 2 design also had the limitation that, for the subtype-level/paired-associate group, no form of high-level training occurred in the initial classification-learning phase (participants were trained at only the subtype level). 
Thus, there was no opportunity for participants to learn to give greater attention to features that were diagnostic at the high level of classification.^{Footnote 1}

The central motivation of our present research was to continue to investigate the potential utility of subtype-level training in improving high-level classification in this rock domain. Our main goal was to try to develop a training procedure that maintained the potential advantages of simultaneous high-level and subtype-level training, while directly addressing the limitation that many participants may fail to learn associations between the subtype-level and high-level names. To preview, we introduce a new condition in which classification training again takes place simultaneously at both the high and subtype levels, but which places stronger emphasis than the previous studies on the goal of learning both levels, and which provides continuous practice throughout training to promote the achievement of this goal. We compare the performance of this new training group to that of two comparison groups who are trained using procedures similar to those in the previous studies of Nosofsky et al. (2017) and Miyatsu et al. (in press).

Experiment 1A

Across three different conditions, participants learned to classify images of rocks into the high-level categories of igneous, metamorphic and sedimentary. In all conditions, the instructions to the participants emphasized that their primary task was to learn these high-level category assignments. In some of the conditions, the participants also learned to classify the rocks into their subtype categories. The complete set of rocks comprised 30 subtypes, 10 subtypes from each of the three high-level categories. The subtypes are listed in Table 1. The subtypes are highly representative of those that are commonly taught in introductory college-level geoscience classes, and are among the major ones listed and described in introductory textbooks (e.g., Marshak, 2015; Tarbuck & Lutgens, 2017). Because it was unrealistic to expect a participant to learn all 30 subtypes in a single 1-h session, each individual participant was randomly assigned 15 of the 30 subtypes to learn (five from each of the three high-level categories).^{Footnote 2}

In condition 1, participants were trained on only the high-level names of the rocks. An example screenshot of the question prompt on a typical trial is presented in Fig. 2. As illustrated, on each trial, an individual rock was presented, and the participant attempted to classify it into one of the three high-level categories. Feedback was provided only with respect to the high-level category to which the rock belonged. In the test phase, participants continued to classify items at only this high level of categorization.

In condition 2, participants learned simultaneously to classify rocks into both their high-level and subtype-level categories. The condition used a two-stage response procedure. An example screenshot of the first stage of an individual trial is presented in Fig. 3. As illustrated, an individual rock would be presented in the center of the screen. Underneath the rock, the high-level responses igneous, metamorphic and sedimentary were shown in three columns, and beneath each high-level name were shown the subtypes for that high-level category. In the first stage, participants were prompted to enter the high-level response for the rock. Once the high-level response was selected, the second response stage began. As shown in the example screenshot in Fig. 4, the participant was prompted to select the subtype name from among the possibilities for the selected high level. For instance, if a participant had responded that the rock was metamorphic, he or she would then be prompted to select the rock’s subtype name within the metamorphic column. This same two-stage procedure for collecting responses continued to be used in the testing phase of condition 2.

Our central idea in implementing the two-stage response procedure of condition 2 was that it might combine in synergistic fashion various elements of previously tested procedures that have advantageous components. First, because participants are required initially to classify at the high level, they may be motivated to search for features that are diagnostic at that high level. Second, the requirement that participants also learn the subtype-level categories may foster the learning of aspects of the category structure that are disorganized and dispersed (e.g., in which subtypes from contrasting high-level categories are highly similar to one another). Third, the requirement that participants make two separate responses on each individual trial — first the high-level classification response and then the subtype-level one — might be effective in allowing participants to develop learned associations between the high-level and subtype-level names of the rocks.

Nevertheless, in terms of assessing the participants’ acquired knowledge, this experimental condition has the same limitation as did the simultaneous paired-name condition that had been tested in Nosofsky et al.’s (2017) experiment. In particular, because the high-level and subtype-level names were simultaneously present, a participant could in principle focus on only the subtype names during both training and test. On each trial, if the participant decided that a rock was, for instance, ‘granite’, then he or she could enter the corresponding high-level category response (‘igneous’) by making reference to the column in which granite appeared. Thus, an alternative condition was required to evaluate the extent to which the training procedure is effective in allowing participants to directly classify the rocks at the high level of categorization.

We addressed this requirement by also conducting condition 3. With one exception described below, the training phase for condition 3 was identical to that in condition 2. The key difference across the conditions arose at the time of the test. Whereas in condition 2 we continued to present the subtypes along with the high-level names at the time of the test (in the column format illustrated in Figs. 3 and 4), in condition 3 the subtypes were no longer presented. Instead, just as in condition 1, the question prompt now made reference to only the high-level categories (as illustrated in Fig. 2). Thus, condition 3 provided a pure test of the participants’ ability to classify the rocks into the high-level categories, without the benefit of an external cue that linked the subtype names to the high-level names.

The second difference between conditions 2 and 3 arose during the training phase. On 80% of the trials, the same two-stage response procedure was used in condition 3 as in condition 2. However, on 20% of the trials, the question prompt was the same as in condition 1; that is, participants were required to classify the rock into one of the high-level categories without the benefit of the external cue showing which subtype names were linked with which high-level names (as in Fig. 2). In addition, on these trials, a reminder message was provided at the bottom of the computer screen stating: “Remember, your primary job is to learn the high-division names.” We included these high-level-only trials to remind participants that their primary task was to learn the high-level name for each rock and to discourage participants from developing a strategy of relying solely on learning the subtype-level names.
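The 80%/20% mix of trial types in condition 3 can be illustrated with a short sketch. The Python below is a hypothetical illustration, not the experiment's actual MATLAB code. It assumes the 90-trial training blocks described in the Procedure, and it assumes that exactly 20% of the trials in each block were high-level-only trials placed at random positions; the text does not say whether the proportion was fixed per block or sampled trial by trial. The function name `condition3_trial_types` and the labels `high_only` and `two_stage` are invented for the example.

```python
import random

# Reminder shown on high-level-only trials (quoted from the text).
REMINDER = "Remember, your primary job is to learn the high-division names."


def condition3_trial_types(n_trials=90, p_high_only=0.20, rng=None):
    """Label each trial of one training block as 'two_stage' or 'high_only'.

    Assumption: the proportion is exact within a block and the positions
    of the high-level-only trials are random.
    """
    rng = rng or random.Random()
    n_high_only = round(n_trials * p_high_only)
    labels = ["high_only"] * n_high_only + ["two_stage"] * (n_trials - n_high_only)
    rng.shuffle(labels)
    return labels


schedule = condition3_trial_types(rng=random.Random(1))
print(schedule.count("high_only"), schedule.count("two_stage"))  # prints: 18 72
```

Under these assumptions, a participant in condition 3 would see 18 high-level-only trials and 72 two-stage trials in each training block.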

A schematic summary of the training and testing procedures across the three conditions is provided in Fig. 5.

Method

Participants

Ninety-five undergraduate students from Indiana University Bloomington participated as part of a requirement for their introductory psychology courses. All participants had normal or corrected-to-normal vision and reported having normal color vision. All reported that they had little or no previous experience in rock classification. Each participant was randomly assigned to a condition, with 32 participants in condition 1, 32 in condition 2, and 31 in condition 3. These sample sizes were as large as or larger than those in the individual conditions of the closely related studies that most directly motivated the present research and that found significant differences in the outcomes of the broad- versus specific-level training (Miyatsu et al., in press, experiments 1 and 2; Noh et al., 2014; Nosofsky et al., 2017). (As it turned out, the correlation on the repeated old–new item performance measure in our study was r = .74; this yielded power = .628 to detect a medium-size main effect of training procedure on test-phase performance, and power = .966 to detect a large-size effect.)

Stimuli and apparatus

The stimuli consisted of 360 pictures of rocks from the three broad divisions of igneous, metamorphic and sedimentary rocks. Each broad division comprised 10 subtype categories, listed in Table 1. There were 12 samples of each subtype. The rock picture samples were taken from a variety of online sources (for a fuller description of these stimulus materials, see Nosofsky, Sanders, Meagher, & Douglas, 2018). The experiment was programmed in MATLAB using the Psychophysics Toolbox (Brainard, 1997) on a personal computer running Microsoft Windows.

Procedure

For each individual participant, 5 of the 10 subtypes from each of the three broad divisions of igneous, metamorphic and sedimentary rocks were randomly selected. The participant learned the items from only these randomly selected subtypes. We used this procedure of sampling a subset of the categories because pilot work suggested that overall learning performance would be poor if participants were required to try to learn all 30 categories in a single 1-h session.

In all conditions, the procedure consisted of a training phase and a test phase. The training phase consisted of three training blocks with 90 trials in each block, whereas the test phase consisted of one block with 120 trials. The test phase included presentations of old items from the training phase as well as novel transfer items from the studied categories. Across all conditions, for each individual participant, the members of each rock subtype were randomly assigned as either training or novel transfer stimuli. For each subtype, there were six randomly chosen training examples and four randomly chosen transfer examples.

Across all conditions in the training phase, on each trial, a picture of a training rock was displayed in the center of the screen and the participant attempted to classify it. Each of the individual training items was presented once per block, with the order of presentation of the 90 training items randomized. An analogous procedure was used across all conditions in the test phase. The tested stimuli consisted of four of six randomly selected training examples from each of the 15 subtype categories, and of the four novel transfer items from each of the 15 subtype categories, for a total of 120 test items. The order of presentation of the 120 items was randomized.

The nature of the training and test procedures in each of the conditions has already been described in our introduction to this experiment; here we provide only some additional methodological details. First, in cases in which participants were required to classify the rocks into their high-level categories, they did so by pressing the “i” key for igneous, “m” for metamorphic and “s” for sedimentary. In cases in which participants were required to indicate the subtype category of the rock, they did so by pressing a number on the keyboard that preceded the subtype name on the computer screen (as illustrated in Fig. 4). During the training phase, the computer displayed corrective feedback at the end of each trial (with the rock picture remaining on screen), stating that the participant was either “correct” or “incorrect” followed by the correct response. In condition 1 (and in the 20% of trials in condition 3 that required only a high-level response), the corrective feedback was with respect to only the high-level category of the rock (for example, “Correct! Igneous” or “Incorrect: Sedimentary”). In condition 2 (and in the 80% of trials in condition 3 that used the two-stage response procedure), the corrective feedback was provided after both responses were made. The computer provided simultaneous feedback at both levels, such as “Correct! Igneous-gabbro” or “Incorrect: Metamorphic-marble”. In all conditions, the feedback remained on the screen for 1 s following correct responses and for 2 s following incorrect responses (with the picture of the rock remaining on the screen). The feedback was followed by a 0.5-s inter-trial interval consisting of a blank screen. At the end of each training block, the computer reported to the participants their overall percentage of correct responses. The methodological details of the test phase were the same as already described for the training phase, except no corrective feedback was provided during the test-phase trials. 
Instead, the computer simply displayed a message of “okay” to indicate to the participants that their response had been recorded. At the end of the testing phase, the participants were thanked and were provided with a debriefing of the purpose of the experiment. The experimental session lasted roughly 50 min.
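The response keys and feedback rules described above can be summarized in a small sketch. The Python below is hypothetical and is not the authors' implementation. The function name `training_feedback` and its arguments are invented for the example. It also assumes that a two-stage trial was scored as "Correct!" only when both the high-level and the subtype-level responses were right, which the text does not state explicitly.

```python
# Response keys for the high-level categories, as described in the text.
HIGH_LEVEL_KEYS = {"i": "Igneous", "m": "Metamorphic", "s": "Sedimentary"}
ITI_SECONDS = 0.5  # blank-screen inter-trial interval


def training_feedback(correct_high, response_high,
                      correct_subtype=None, response_subtype=None):
    """Return (message, duration_in_seconds) for one training trial.

    Pass the subtype arguments only for two-stage trials. Assumption: a
    two-stage trial counts as correct only if both responses are correct.
    """
    two_stage = correct_subtype is not None
    is_correct = response_high == correct_high and (
        not two_stage or response_subtype == correct_subtype
    )
    # The feedback always names the correct answer at the trained level(s).
    label = f"{correct_high}-{correct_subtype}" if two_stage else correct_high
    message = f"Correct! {label}" if is_correct else f"Incorrect: {label}"
    # Feedback stays up for 1 s after correct and 2 s after incorrect responses.
    return message, (1.0 if is_correct else 2.0)


# High-level-only trial answered correctly with the "i" key.
print(training_feedback("Igneous", HIGH_LEVEL_KEYS["i"]))
# Two-stage trial with an incorrect high-level response.
print(training_feedback("Metamorphic", HIGH_LEVEL_KEYS["i"], "marble", "granite"))
```

In the test phase no such feedback would be generated; per the text, only a neutral “okay” message was shown.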