In the American justice system, witnesses to a crime are generally asked to repeat their account of an incident numerous times: to the police, in a deposition, in meetings with attorneys, and in court. When a story is recounted so many times, and under such varied conditions, discrepancies are likely to occur. Such inconsistencies are often viewed as a sign of an unreliable and untrustworthy witness. Lawyers search for inconsistencies in witnesses’ accounts and use them to attack witnesses’ credibility (Alavi & Ahmad, 2002; Kerper, 1997), drawing their entire testimony into question. The technique is quite effective: A study using mock trials found that mock jurors exposed to inconsistent testimony found the eyewitness less effective and the defendant less culpable, and, as a consequence, they were less likely to convict (Berman & Cutler, 1996).
Despite the legal implications, surprisingly little research has been done to investigate the relationship between consistency and accuracy of recall. Basic memory research does not commonly focus on the form of accuracy relevant to this scenario, a form sometimes called output-bound accuracy (Koriat & Goldsmith, 1994). Output-bound accuracy is to be contrasted with the type of accuracy that is typically referred to in laboratory studies, which can be called input-bound accuracy. Input-bound accuracy refers to the proportion of studied items that are successfully recalled (i.e., the number of correctly recalled items divided by the number of things originally studied). This measure of memory is used in the vast majority of basic laboratory research on memory and places an emphasis on the amount of correct information reported about an event (Koriat & Goldsmith, 1994). In eyewitness literature, this is often referred to as completeness, as it reflects how thoroughly a witness describes an event. (This is the “whole truth” part of the famous oath used in U.S. courts.) Output-bound accuracy refers to the percentage of the items recalled that are correct (i.e., the number of correct things recalled divided by the total number of things recalled). This is the “nothing but the truth” part of the oath and is more relevant in circumstances where the specific details of the original event are unknown, such as the eyewitness scenario we have been relying on here. It reflects how much of a witness’s testimony is true. In this paper, when we talk about the accuracy of a memory, we mean specifically its output-bound accuracy.
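To make the two measures concrete, the following minimal Python sketch computes both from a studied list and a recall protocol. The function names and the example words are illustrative assumptions, not materials from our experiments, and the sketch assumes that both lists are non-empty and that each item is counted once.

```python
def input_bound_accuracy(studied, recalled):
    """Completeness: correctly recalled items / number of studied items."""
    studied, recalled = set(studied), set(recalled)
    return len(studied & recalled) / len(studied)


def output_bound_accuracy(studied, recalled):
    """Accuracy of the report: correctly recalled items / total items recalled."""
    studied, recalled = set(studied), set(recalled)
    return len(studied & recalled) / len(recalled)


# Hypothetical study list of ten words.
studied = ["apple", "chair", "river", "candle", "violin",
           "hammer", "pillow", "garden", "ticket", "bridge"]

# Hypothetical recall protocol: four correct items and one intrusion ("window").
recalled = ["apple", "river", "violin", "garden", "window"]

completeness = input_bound_accuracy(studied, recalled)   # 4 of 10 studied items
accuracy = output_bound_accuracy(studied, recalled)      # 4 of 5 reported items
print(completeness, accuracy)
```

In this hypothetical protocol the report is incomplete (4 of 10 studied items) but mostly accurate (4 of 5 reported items), which illustrates why the two measures can diverge.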
Most of the relevant research on the relationship between consistency and accuracy can be found in the eyewitness memory literature. This research uses paradigms that, as we detail below, provide challenging circumstances under which to evaluate the accuracy of recall. It is thus perhaps not surprising that there are inconsistencies among those reports. The goal of the present research is to use the many advantages of very basic laboratory memory research to complement the small amount of applied work on this interesting problem.
An additional important distinction is related to the types of inconsistencies across multiple recall opportunities. Here we can distinguish between two general types: forgotten details, which are provided in an earlier account and omitted in a later one, and reminisced details, which are included in a later account but not mentioned in prior accounts. Note here that forgotten details are not ones that were never reported, but rather ones reported at least once and then omitted in later testimony. The focus of research on the consistency of recall has been on reminiscence, probably because forgetting but not reminiscence is compatible with the well-accepted notion that memory declines over time (Erdelyi, 2010).
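The distinction among consistent, forgotten, and reminisced details can be sketched for the simple two-test case as follows. This is a minimal illustration under stated assumptions: the function names, the word lists, and the treatment of each reported item as a single scoreable unit are hypothetical and are not taken from any of the studies cited here.

```python
def classify_details(first_test, second_test):
    """Sort reported items by their pattern of production across two tests."""
    first, second = set(first_test), set(second_test)
    return {
        "consistent": first & second,   # reported on both tests
        "forgotten": first - second,    # reported first, omitted later
        "reminisced": second - first,   # omitted first, reported later
    }


def output_bound_by_category(categories, studied):
    """Proportion of correct items within each category (None if the category is empty)."""
    studied = set(studied)
    return {
        name: (len(items & studied) / len(items) if items else None)
        for name, items in categories.items()
    }


# Hypothetical materials: "window" and "ladder" were never studied (intrusions).
studied = ["apple", "chair", "river", "candle", "violin"]
test1 = ["apple", "chair", "window"]
test2 = ["apple", "river", "window", "candle", "ladder"]

categories = classify_details(test1, test2)
accuracy = output_bound_by_category(categories, studied)
print(categories)
print(accuracy)
```

Note that an intrusion can itself be consistent (here, "window" appears on both tests), so consistency of production and accuracy are logically separable.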
Inconsistent recall could reflect a number of established psychological phenomena. One possibility is that rememberers may be more willing to report memories that they are unsure about on later opportunities than on earlier ones. Such a shift in reporting policy would result in new details or new memories being included in later testimony (cf. Koriat & Goldsmith, 1996). Another possibility is that exposure to postevent information from another source leads to changes in testimony (Johnson, Hashtroudi, & Lindsay, 1993), or that postevent information reminds the rememberer of forgotten aspects of the original event (Benjamin & Ross, 2011; Benjamin & Tullis, 2010; Tullis, Benjamin, & Ross, 2014). Yet another possibility is that reminiscence occurs simply because there is more cumulative retrieval time on later than on earlier retrieval attempts, an explanation Roediger and Thorpe (1978) proposed to account for hypermnesia (the enhancement of memory over multiple recall tests that is seen when reminiscence exceeds forgetting). None of these perspectives, however, explains why details are sometimes reported and then omitted at a later date.
However, one general perspective, widely accepted in the memory literature, provides a straightforward explanation of both forgetting and reminiscence without reference to additional events or mechanisms. By that view, the success of any retrieval event is a product of both the cues available during retrieval and the ones experienced during encoding (Tulving & Thomson, 1973). In an eyewitness situation, the various retrieval attempts may take place under highly varying physical, mental, and emotional conditions, all of which can serve as retrieval cues. As retrieval cues change, details that were once recalled may become inaccessible and details that were inaccessible may be recalled (see also Fisher, Brewer, & Mitchell, 2009). Such fluctuation is a well-accepted theoretical mechanism for spontaneous recovery of previously extinguished associations in animal behavior, and can be straightforwardly applied to human recall as well (Estes, 1955; see also Bower, 1972).
If inconsistent recall reflects the fluctuation of cues across retrieval situations, then two effects should be apparent. First, details that are recalled across situations are more likely to be accurate than inconsistently recalled details, because those memories have proven themselves to be more robust to the variance of cues. Second, reminisced details should not be less likely to be accurate than forgotten ones. Any differences in accuracy between reminisced and forgotten details that have been reported in the literature may be due to a confounding with the elapsed time since the original encoding event for those two effects. Because cues are thought to fluctuate with time, the cues available at a more distant retrieval attempt are less likely to overlap with those present during encoding.
From the cue fluctuation perspective, reminisced memories differ from forgotten memories only insofar as they are first produced after a longer retention interval—the fact that they are produced after a failed attempt at retrieval gives them no special status. Consequently, both forgotten and reminisced details should be less accurate than consistently recalled details, but there is no reason to expect that they should be different from one another once the retention interval is controlled. It is for this reason that we take great care in our experiment to control for the retention interval.
We also seek to control for other sources of inconsistency in recall. Our use of the same free recall task across tests minimizes the possibility of shifts in reporting policy, as do our relatively short retention intervals. The introduction of postevent information is eliminated by keeping subjects in the laboratory between tests. Reminding is reduced by placing demanding but unrelated distractor tasks between all of the experimental events. We minimize the influence of total retrieval time by forcing subjects to recall for an extended period on each of the individual tests. As will become clear, inconsistencies are still ample under these conditions, suggesting that cue fluctuation may provide a more coherent explanation of the relevant phenomena.
A basic understanding of the relationship between accuracy and consistency is important in part because the American legal system tends to make assumptions about the accuracy of inconsistent witnesses. First, it is assumed that inconsistent details are inaccurate (Fisher et al., 2009). Second, it is assumed that some types of inconsistencies are worse than others. This is particularly true of reminisced memories, which lawyers are trained to use to discredit the witness in a process known in the legal community as “impeachment by omission” (McElhaney, 1987). This effect finds some support in the memory literature as well: Untrained observers expect reminisced memories to be much lower in accuracy than consistently produced information, and even to be lower in accuracy than they actually are (Oeberst, 2012). Trained detectives exhibit the same bias (Krix, Sauerland, Lorei, & Rispens, 2015). Third, it is assumed that inconsistent testimony is indicative of an unreliable witness; that is, that their entire testimony should be called into question, and not just the inconsistent parts. A fourth assumption follows from the third: that inconsistent testimony should be uncommon.
Three reports in the eyewitness memory literature bear on these issues directly (Brock, Fisher, & Cutler, 1999; Gilbert & Fisher, 2006; Oeberst, 2015). Gilbert and Fisher (2006) had subjects watch short mock crime videos; the subjects were then interviewed on two separate occasions. When comparing these two interviews, they found that inconsistencies were ubiquitous, with 98% of all subjects reminiscing at least two details. On average, subjects recalled 20.4 details consistently, reminisced 8.4 details on the second test, and forgot 9.2 details from the first test to the second. Consistently recalled items were in fact significantly more accurate than forgotten items, which were in turn significantly more accurate than reminisced items; however, the accuracy of all the reported items was high. Brock and colleagues (1999) used a similar procedure: Subjects viewed a video of a traffic accident and were then interviewed twice (using the cognitive interview method; Fisher et al., 2009). They found that forgotten details were less likely to be accurate than consistently recalled details. (They did not examine reminisced details.) A third study using a video of a theft found that reminisced and forgotten details exhibited similar levels of accuracy, and were only slightly lower than those evidenced by consistent details (Krix et al., 2015). A similar pattern was reported by Oeberst (2015), who evaluated memory for a live event in a classroom.
The use of an eyewitness task to evaluate memory consistency has costs and benefits. On the plus side, the generalization from these results to actual eyewitness memory is more straightforward than if a more traditional list-based laboratory memory task had been used. On the other hand, aspects of the procedure lead to questions about validity that might be more easily resolved in a list memory task and the controls that it affords. Most importantly, it is difficult to objectively define a “detail” in a report of a witnessed event. If a subject reports that a suspect wore pants and then later that the suspect wore jeans, is that a consistent report (because they reported the pants on both occasions) or a reminiscence (because there is an additional detail on the second report that the pants are made of denim)? Additionally, details may not be independent—remembering some details may allow for logical deductions about other details. Furthermore, not all of the details are of equal importance: Some may be trivial and others crucial. If a subject reports the presence of a blue sky, is it reasonable to consider that a relevant detail? All of these difficulties may be the source of some of the inconsistencies between the results in the prior literature—it is unclear from those papers, for example, whether forgotten details are more accurate than reminisced details or of equivalent accuracy. These problems are easily solved by moving to a list-based memory task, where the number and labels for the memoranda are clear and subjects know exactly what they are expected to report. Moreover, in a list-based task, items are independent and of roughly equal importance. This research is thus best considered to be use-inspired rather than applied. Our task does not “look like” an eyewitness memory event, but the basic findings may be relevant to those situations, especially if confirmed by a wider variety of designs. 
As with all such research, there may be factors involved in real-world eyewitness situations that limit the generalizability of the results of our work.
Only one experiment that we know of in the literature (Oeberst, 2012; Experiment 1) used a list-based procedure and examined the relationship between accuracy and consistency of recall. In that experiment, reminisced items were not found to be less accurate than consistently recalled items, a result inconsistent with all of the results from eyewitness paradigms reviewed earlier.
Our procedure uses a three-test design, which enables us to additionally evaluate whether an item reminisced early (say, during a deposition) is more likely to be accurate than an item reminisced later (say, during court). We know of no extant literature examining this question. Our experiments use a unique two-group, three-block procedure that enables control of key potential confounders in making this comparison.
A related question that follows from an investigation of accuracy and consistency concerns individual differences. Do people who report more inconsistent details exhibit lower accuracy for the consistently recalled material that they produce? Gilbert and Fisher (2006) found a small, statistically nonsignificant negative correlation between the number of inconsistencies and the accuracy of the rest of the testimony. However, that analysis suffers from the same potential measurement difficulties discussed earlier, which may make small correlations difficult to detect. Taken at face value, the lack of a statistically significant correlation would cast doubt on the commonly held belief that inconsistent witnesses are not trustworthy. Yet, the cue fluctuation perspective, combined with individual differences, would seem to predict that inconsistent recall should be related to lower accuracy among consistently recalled items. If subjects vary in their willingness to report a memory when the mnemonic evidence is uncertain (as we know they do; Koriat & Goldsmith, 1996), then some of the variance in the number of inconsistencies produced will reflect differences in a reporting criterion. Rememberers with a lower reporting criterion will exhibit overall lower accuracy (because of the addition of low-confidence reports to the protocol) and also produce more inconsistent details (because the exact content of those low-confidence reports will be more likely to vary with fluctuations in cues). Consequently, in contrast to the past research, we expect to see a positive relationship between accuracy and consistency across subjects.
In our experiments, we set out to evaluate three questions that follow directly from the prior literature and the cue fluctuation perspective:
1. Are inconsistently recalled items less accurate than consistent items?
2. Do reminisced and forgotten items exhibit similar levels of accuracy when the time of production is controlled?
3. Do people who demonstrate more inconsistent recall have lower overall accuracy, even for items that are recalled consistently?
We chose to evaluate these relationships using list-learning tasks. To this end, we had subjects study common items, thus rendering the scoring (almost) completely objective and eliminating the aforementioned concerns regarding evaluating recall of mock crimes.
One confound ubiquitous in prior research is the different production times for items that were reminisced or forgotten. By definition, in a two-test design, reminisced items are produced for the first time at a later delay from the study event than are forgotten items. In our experiments, we used a between-subjects design in which both groups take multiple tests but the retention intervals are arranged such that one group takes its second test at the same lag at which the other group takes its first test. This procedure allows a comparison of reminisced items from one group with the forgotten and consistent items from the other group, because the time of the original production is the same for reminisced, forgotten, and consistent details.
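The logic of this comparison can be illustrated with a short Python sketch. The group labels, the lags (in minutes), and the recall protocols below are hypothetical values chosen only for illustration; they are not the parameters or data of our experiments.

```python
# Hypothetical test lags (minutes after study) for two groups with three tests each.
# The "early" group's second test and the "late" group's first test share a lag.
schedules = {"early": [5, 15, 25], "late": [15, 25, 35]}


def first_production_lag(recall_by_test, lags):
    """Map each reported item to the lag of the test on which it was first produced."""
    first = {}
    for lag, recalled in zip(lags, recall_by_test):
        for item in recalled:
            first.setdefault(item, lag)  # keep only the earliest production
    return first


# Hypothetical protocols: one list of reported items per test.
early_recall = [["apple"], ["apple", "river"], ["apple", "river"]]
late_recall = [["chair", "candle"], ["chair"], ["chair"]]

early_first = first_production_lag(early_recall, schedules["early"])
late_first = first_production_lag(late_recall, schedules["late"])

# "river" is reminisced by the early group on its second test (lag 15).
# "candle" (later forgotten) and "chair" (consistent) are first produced
# by the late group on its first test, also at lag 15.
print(early_first["river"], late_first["candle"], late_first["chair"])
```

Under these assumed lags, the reminisced item from one group and the forgotten and consistent items from the other group are all first produced at the same retention interval, which is the property the design relies on.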