Successful retrieval is a fundamental expectation of a well-functioning memory. Yet, what we can access from memory fluctuates based on importance, salience, cues, frequency of previous retrieval, and context (e.g., Kornell et al., 2015; Light & Carter-Sobell, 1970; Smith & Vela, 2001; Tulving & Thomson, 1973). As such, retrieval failures are a common occurrence (Kornell & Bjork, 2009). In daily life, such experiences vary broadly, from struggling to remember a person’s name to hesitating when trying to recall a particular fact while discussing the details of the latest political exchanges. When retrieval fails, there is a range of underlying causes and associated mental experiences or phenomenological states, from the sensation of “drawing a complete blank” or having nothing come to mind, to perhaps having a tip-of-the-tongue sensation (TOT; Schwartz, 1999). Despite the near-universal nature of these experiences, it is noteworthy that much of the relevant research has focused on relatively basic retrieval from the knowledge base or on laboratory list-learning paradigms. Here, we instead explore the phenomenology associated with retrieval failures for a unique type of material: information about complex, real-world public events occurring within the previous decade or so that were, indirectly at least, “experienced” by participants in real time (i.e., as they occurred). These events are not historical, the information about them was not likely to have been learned formally, and memories for the associated details are likely linked with some episodic details in memory. We elaborate more below on the specific materials; first, we provide an overview of the relevant prior literature.
An extensive literature has investigated the TOT feeling of imminent retrieval associated with information that is just at the threshold of accessibility, including across the lifespan (Brown, 1991; Burke et al., 1991; see Schwartz, 1999; Schwartz & Metcalfe, 2014, for reviews). But of course, not all retrieval failures result in this very particular feeling (Koriat & Lieblich, 1977). As such, the literature on “feeling of knowing” (FOK), in which participants use numerical scales to rate their “feeling that one will be able to recognize—from a list of items—an item that is currently inaccessible” (Schwartz, 2006, p. 153), has attempted to quantify the continuum of retrieval failure experiences more broadly (Hart, 1965; Koriat, 1993, 1995). Whereas TOT research typically assesses performance on vocabulary items (i.e., information stored in the knowledge base; Eysenck, 1979), FOKs have been used to examine retrieval of both general knowledge and more traditional laboratory-based episodic material (e.g., Hertzog et al., 2010; Schacter, 1983).
Building on these studies, within the context of retrieval of general knowledge, most recently, researchers leveraged natural language use, rather than numerical scales, to study the phenomenological and behavioral differences between a lack of accessibility versus availability (Coane & Umanath, 2019; cf. Tulving & Pearlstone, 1966). That is, they studied a basic difference between self-identified not remembering and not knowing and the ways in which these experiences are described in order to understand phenomenological experiences associated with retrieval failures (see also Hart, 1965; Smith & Clark, 1993). Coane and Umanath (2019) reported that participants’ definitions tended to indicate that not remembering reflected a temporary failure in accessibility (marginal knowledge; Berger et al., 1999), whereas not knowing reflected that the sought-after information was not part of the knowledge base and therefore not available. Thus, they found that these participant-generated definitions were consistent with Tulving and Pearlstone’s (1966) classic explanation of accessibility (retrievability) versus availability (storage). More specifically, from naïve participants to memory experts, participants’ definitions of Don’t Remember (DR) and Don’t Know (DK) spontaneously associated DR with a lack of access in the moment and forgetting, whereas DK was often defined as never having learned particular information at all.
The materials in Coane and Umanath’s (2019) investigation of DR/DK were from published norms of general knowledge (Tauber et al., 2013). Most of the questions from knowledge norms refer to events that occurred, or information that originated, several decades prior (e.g., the name of the first cosmonaut) or are historical in nature (prior to any living age group’s lifetime) and were likely learned as part of formal education. They also refer to concepts that are relatively fixed (e.g., geography, scientific processes) or to culturally defined contexts such as literature and movies (Coane & Umanath, 2021; Nelson & Narens, 1980; Tauber et al., 2013). Such general knowledge is typically defined as “crystallized knowledge,” reflecting the long-term persistence and, importantly, the decontextualized nature of this information. Thus, these stimuli involve information learned long ago, in an educational context, rehearsed and retrieved often enough over time for the material to be solidified in memory, and generally not tied to specific event experiences and memories.
In fact, much of the research on the knowledge base, both in general and for older adults in particular, has examined general knowledge that is relatively stable or crystallized (Verhaeghen, 2003) and is often included in tests of intelligence or neuropsychological functioning (Kaufman & Kaufman, 1993; Wechsler, 1981). Older adults typically have strongly preserved knowledge, comparable to or exceeding that of younger adults until very late in life (see Umanath & Marsh, 2014, for a review). In sum, the materials examined in Coane and Umanath (2019) were likely devoid of episodic traces such as the time and place of acquisition, and any personally relevant details or affective responses, thus falling within the realm of semantic memory.
At the other end of the episodic/semantic memory spectrum, Lukasik et al. (2020) applied DR/DK to unanswerable questions in a traditional episodic memory context with mostly younger adults. Participants were presented with narratives and accompanying photos as the study materials. Then, they were tested on their memory for these materials in a recognition format with options to select the correct answer among lures as well as “I don’t know” and “I don’t remember” (without instructions on how or when to use these options). Critically, the test included questions regarding details that the participants had never seen, rendering those questions unanswerable; in these cases, the correct response would be “I don’t know.” Participants did respond DK significantly more often to unanswerable questions than answerable ones, providing evidence that participants were able to distinguish between using DR and DK. Lukasik et al. (2020) speculated that providing the DR response would lead to use of DK only when participants believed the questions were unanswerable. Instead, their collected data on participants’ explanations of how they used DR and DK generally replicated Coane and Umanath (2019): DK was used whenever participants felt an answer was unavailable, whether because the detail was never presented or, more commonly, because they thought they had missed it at encoding, whereas DR was used whenever they felt the answer was available but inaccessible. Based on this work, it seems that the distinction between not remembering and not knowing is experienced similarly, and is similarly effective for characterizing memories, both squarely within the realm of general knowledge and in at least one traditional episodic memory context. Further work is certainly needed in more episodic- or event memory-related studies to establish that this is wholly the case.
Understanding retrieval failures beyond semantic and event memory
In the present work, we test the validity of DR and DK with materials that potentially exist in the gray area between episodic and/or event memory (as defined by Rubin & Umanath, 2015) and semantic memory: public news events. We primarily selected events that were somewhat “viral” in nature: very popular and receiving extensive media coverage for a few days or weeks and then being covered less frequently as new stories emerge.
Materials like these are of practical and theoretical interest for several reasons. First, they can extend traditional memory research beyond the typical single learning episode under tight experimental control and can help meet the challenge of connecting “real world” and laboratory research (Koriat & Goldsmith, 1996). This is an important step in establishing the external validity of DR and DK for capturing a lack of accessibility versus availability.
Second, such memories are typically acquired through naturalistic exposure to media (e.g., radio, television, newspapers, social media). Most laboratory studies examining long-term episodic memory include relatively simple, well-controlled stimuli and delays of less than a day (and often less than an hour), whereas laboratory studies examining semantic memory rely on vocabulary tests or general knowledge (i.e., crystallized knowledge) acquired years or decades prior. Therefore, use of such stimuli allows us to explore long-term memory processes beyond these limits (Bahrick et al., 2013) in naturalistic, non-controlled learning environments (i.e., “in the wild”). The acquisition contexts are variable in terms of modalities, source, and a host of characteristics, such as where people were when they learned of these events, whom they were with, and their emotional reactions. Clearly, these contextual elements are not typically associated with semantic memory or the knowledge base, but with episodic memory (Tulving, 1972). As is assumed by many models and theoretical approaches, however, repeated exposure and the associated accumulation of memory traces lead to an abstraction process and the loss of episodic traces (Baddeley, 1988; Conway et al., 1997; Nelson & Shiffrin, 2013; Schank & Abelson, 1995; Versace et al., 2009).
Third, given the unique nature of these stimuli, these types of events provide an opportunity to capture information that exists in the space between the extreme ends of semantic and episodic memory: knowledge that is accompanied by episodic details such as where one was when the information was learned, emotional responses, etc., but that may be in the process of taking on the characteristics of semantic memory (e.g., information that is known, not remembered, decontextualized traces; Brown, 1990). Importantly, we did not assess participants’ episodic memories for these events; rather, we were interested in how they used the terms DR and DK. As reviewed above, the previous work examining use of DR and DK has been limited to materials that attempted to be purely semantic or purely episodic, something that characterizes much of memory research to date (see Rubin & Umanath, 2015, for discussion).
In recent work focused on successful retrieval, using the same materials, Coane et al. (2022) found that when younger adults and older adults retrieved fact-based details about news stories from the previous decade in an experimental task, they provided a high rate of both remember and know responses, suggesting that this information may not be fully semanticized (because remembering is associated with retrieval from episodic or event memory, not the knowledge base). Thus, the use of these stimuli has been previously validated, and it has been established that the populations we are examining have been exposed to the material and have preserved memory traces. Furthermore, these types of materials appear to share characteristics of both episodic and semantic memory, at least based on the phenomenological responses given by participants.
Need for establishing the external validity of metacognitive judgments
It is also not only important, but necessary to test the effective usability of DR and DK for capturing the experiences of retrieval failures for different types of materials. Reliance on phenomenology can be problematic if participants and researchers do not consistently agree on the meaning of terms. For example, given the frequency with which older adults complain about retrieval failures (Cavanaugh et al., 1983), developing and validating ways that intuitively and consistently allow laypeople and researchers to understand the perceived causes of these failures is essential for implementing effective strategies for resolving or minimizing such challenges. Lack of clarity in how memorial experiences are described can limit the effectiveness of any intervention or limit the precision of theoretical approaches.
Bahrick et al. (2011) developed a stage model for the validation of metacognitive concepts, including naming the concept, instructions to participants, exploring the nature of participants’ phenomenological reports, and using behavioral data for validation. Coane and Umanath (2019) provided a foundation of internal validity for DR and DK. Moving beyond Bahrick and colleagues’ (2011) step of exploring participants’ phenomenological reports discussed above, they showed that younger and older adults have the metacognitive ability to use these simple phrases to effectively distinguish between a lack of accessibility versus availability when responding to general knowledge questions, behaviorally validating participants’ definitions. That is, when an initial DK response was given on a short-answer test (with or without correct answer feedback), performance on a later multiple-choice or short-answer general knowledge test was generally lower than after initial DR responses. In other words, when information was not accessible, participants were better able to recognize it among foils or recall it following feedback than when it was deemed not available.
Here, the focus is on the next important step in validating DR and DK as a means of metacognitively delineating types of retrieval failures: behaviorally testing their external validity. Typically, external validity includes generalizability to other people, other research, and settings (Morling, 2017). Understanding the generalizability and boundary conditions for the usefulness of these terms is not only theoretically important and sound, but also necessary for effective implementation.
For comparison, consider the Remember–Know (R/K) paradigm that is used to understand the phenomenology and underlying processes of successful retrieval (Gardiner, 1988; Tulving, 1984). It also relies on participants’ understanding and correct reporting of their internal mental experiences (see Tulving, 1989, for a critique of this reliance). Despite a multitude of studies that have yielded similar findings with regard to how remembering and knowing are affected by various manipulations (see Dunn, 2004; Gardiner et al., 2002), the paradigm continues to be scrutinized for its basic face validity (Geraci et al., 2009; McCabe & Geraci, 2009; Perfect et al., 1996; Rubin & Umanath, 2015; Strack & Forster, 1995; Williams & Moulin, 2015; Yonelinas, 2002; for a review, see Umanath & Coane, 2020). That is, participants require extensive instructions (Barber et al., 2008; Gardiner & Java, 1990; Rajaram, 1993; Yonelinas, 2002) and a very particular experimental context (e.g., Gardiner, 1988) for the terms to successfully map onto recollection and familiarity, which is what the vast majority of researchers up to this point have been using the terms to understand (see Umanath & Coane, 2020, for a review). Even so, slight modifications in the instructions lead to large differences in usage and performance (Eldridge et al., 2002; Geraci & McCabe, 2006; Geraci et al., 2009; McCabe et al., 2011; Rotello et al., 2005; Williams & Lindsay, 2019).
The current work explicitly attempts to prevent such a disconnect between participants and researchers in using DR and DK for capturing and characterizing experiences of retrieval failure from the outset, rather than discovering such a fundamental issue after extensive (potentially problematic) usage. We examine the extent to which the terms DR and DK effectively distinguish accessibility versus availability failures in other settings, in the form of a different set of materials described below: complex, naturalistically acquired common events occurring over the previous decade. We also continue to consider generalizability to other people by including samples of older adults. If the original findings turn out to be constrained to a specific type of knowledge, clearly, the use of DR and DK will be limited in its scope and application.
The present research
Two waves of data were collected to examine the generalizability of the phenomenology of retrieval failures to real-world knowledge of events from public news media. Our stimuli were brief descriptions of relatively recent (2006–2016) news stories regarding a variety of topics from politics to pop culture and natural disasters.
Under the circumstances specific to such stimuli, do DR and DK mean the same things and are they used in the same ways as prior work has found? This is the empirical question we aim to answer. In particular, for materials that are potentially familiar—due to their viral nature—but not accessible in the moment, use of DK might take on more of a face-saving role: Rather than admitting a failure in remembering a detail from a public news event, participants might prefer to use DK to signal a lack of certainty or an unwillingness to commit to an answer (Smith & Clark, 1993).
Samples of older adults participated in the present studies to consistently address generalizability of these metacognitive measures across age. Older and younger adults differ along a number of dimensions, especially those of relevance to the present questions, as mentioned above: memory, knowledge, and metacognition. Older adults tend to attend to news more than younger adults, at least traditional news media like radio, newspapers, and television (Bourne et al., 2020). As such, older adults might outperform younger adults in overall accuracy and might give more DR responses, indicating an awareness that the information is available, albeit temporarily inaccessible, consistent with greater and maintained general knowledge (Park, 2000) as well as overall increased experience of retrieval failures (Cavanaugh et al., 1983). However, older adults also tend to perform worse than younger adults on traditional episodic tasks, have well-documented deficits in episodic metacognition (e.g., Souchay et al., 2000, 2007; Thomas et al., 2011) and in encoding new information (Balota et al., 2000; Park, 2000), and even report these difficulties themselves (Hertzog & Dixon, 1994). In semantic tasks, by contrast, older adults’ self-assessments are as accurate as those of younger adults (e.g., Backman & Karlsson, 1985; Hertzog & Dunlosky, 2011; Lachman et al., 1979; Morson et al., 2015). Thus, more DK responses might occur, reflective of an absence of information stored in memory, if the information was simply not encoded or had decayed. Older adults might also use DK more often to reflect uncertainty or to save face: In this case, accuracy for DK items would reflect an underestimation of knowledge (Smith & Clark, 1993; see Coane & Umanath, 2019).
Study 2 was a replication of Study 1 in which we extended the retention interval of these naturalistic stimuli by approximately 18 months. Given the novelty of the stimuli and the relative paucity of work demonstrating how usage of DR and DK maps onto memory performance and metamemory accuracy, and in the spirit of reproducible science, a replication was warranted to help establish the reliability of the effects. Hereafter, we refer to the two testing points as Wave 1 and Wave 2, to emphasize the similarity across them and the fact that this was not a longitudinal study examining forgetting at the individual level. The second wave of testing did allow us to extend the retention interval for the events. In particular, for the younger adults tested in Wave 2, some of the events occurred when they were in early elementary school. Therefore, the difference between not remembering and not knowing might have been less salient, because the familiarity of those events decreased, thereby compressing the range of stored information toward lower levels of retention.
In addition to assessing objective memory performance for these items, we also obtained a measure of self-rated familiarity for each event. Importantly, these ratings were collected prior to participants answering specific questions. Thus, familiarity was evaluated prior to any explicit retrieval attempt (although it is likely that some covert or implicit retrieval took place in assessing the event’s familiarity). Therefore, we could assess the familiarity of items subsequently given a DR or DK response when an explicit retrieval attempt failed. Coane et al. (2022) found that retrieval success of these public events was associated with both the phenomenological indicators of knowing (retrieval from semantic stores) and remembering (retrieval from episodic stores). Furthermore, know responses were more accurate than remember responses in a subsequent multiple-choice task, whereas familiarity, perhaps surprisingly, did not differ as a function of phenomenological responses. Here, we mirrored this work, but focused on retrieval failures. Given prior evidence that DR responses are associated with inaccessible information and DK responses with unavailable information, familiarity should be higher for items subsequently given a DR response than for those given a DK response. Furthermore, if DR responses are given when retrieval failure is only temporary, familiarity of DR items might be similar to that of items correctly answered. Alternatively, early assessments of familiarity might predict subsequent retrieval failures, such that DR responses are associated with lower familiarity than correct responses, indicating relative accuracy in how participants assess the ease of retrieval.
The work described below is meant to provide an incremental contribution and further assurance of the replicability and validity of the use of DR and DK. In light of the replication crisis currently affecting the field of psychology (and other disciplines; Nosek & Errington, 2020; Nosek et al., 2022), it is crucially important to demonstrate that novel findings are, indeed, robust to replication across different factors. As argued by Nosek and Errington (2020), “The purpose of replication is to advance theory by confronting existing understanding with new evidence” (p. 3). Thus, as mentioned above, here we provide new evidence to critically evaluate the extent to which our earlier claim (that not remembering and not knowing map onto retrieval failures of accessibility and availability, respectively) is robust across variations in participants, stimuli, and historical context. Our contribution with the present work, then, is to provide an examination and potential validation of older and younger adults’ accurate metacognitive usage of DR and DK for information about public news media, content that differs from the previously explored materials in the many ways described above. This represents an important step not only in establishing the external validity of DR and DK, but also in furthering our understanding of older adults’ metacognition.