When police officers ask witnesses for confidence, they tend to use verbal confidence scales with only two levels. Psychologists examining confidence–accuracy relationships usually conduct research with fine-grained numeric scales, often 20- or 100-point scales. We asked whether the strong relation between confidence and accuracy found with fine-grained scales would replicate with two- and four-level scales using verbal statements of confidence. The answer is yes: high confidence indicates high accuracy about equally for both scale types. Further, adding numbers to verbal scales did not change performance. We used two different lineups that varied markedly in difficulty. Although correct identification rates were low for the difficult set, high-confidence responses were still highly accurate (87%). We conclude that high-confidence responses on verbal scales typical of those used by police are highly accurate.
The relationship between confidence and accuracy with verbal and numeric confidence scales
The relationship between confidence and accuracy has long been debated within eyewitness research, but the debate has recently moved toward resolution. Studies from the 1980s until relatively recently, using a point-biserial correlation technique (a correlation between the dichotomous accuracy variable and the corresponding confidence ratings), led to the conclusion that there was little to no relationship between confidence and accuracy (Kassin, Ellsworth, & Smith, 1989; Wells & Murray, 1984). However, Juslin, Olsson, and Winman (1996) argued that even when the point-biserial correlation was small, a strong confidence–accuracy (CA) relationship could still exist in the data. They computed a calibration plot using the formula C = # correct ID/(# correct ID + # incorrect ID), in which the incorrect IDs include all filler IDs from target-absent (TA) and target-present (TP) lineups, and showed a strong CA relation even when the point-biserial correlation was low. Because fillers in TP lineups are known to be innocent, another way to compute calibration plots is to exclude these filler IDs. Plots computed this way, now called confidence–accuracy characteristic plots (CAC; Mickes, 2015), place confidence (in bins) on the abscissa and accuracy, from low to high, on the ordinate. Typically, these plots show that high confidence is associated with quite high accuracy (Mickes, 2015; Palmer, Brewer, Weber, & Nagesh, 2013; Sauer, Brewer, Zweck, & Weber, 2010; Wixted, Mickes, Clark, Gronlund, & Roediger, 2015; for a review, see Wixted & Wells, 2017). This outcome appears to hold for initial witness reports. If a witness has been repeatedly tested and has moved from a low-confidence initial identification to a high-confidence courtroom identification, then an error is likely. High confidence in the courtroom usually should not be weighed heavily (if at all) if the witness’s confidence at the first lineup was low.
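The two accuracy measures described above can be illustrated with a minimal sketch. The class name, function names, and all counts below are hypothetical and are not taken from any study. The sketch assumes the simple form described in the text, in which the CAC-style measure differs from the calibration measure only by dropping TP filler IDs. Published CAC analyses may apply further adjustments to TA identifications (for example, scaling by lineup size), which this sketch omits.

```python
from dataclasses import dataclass


@dataclass
class BinCounts:
    """Identification counts at one confidence level (hypothetical data)."""
    correct_ids: int    # suspect IDs from target-present (TP) lineups
    tp_filler_ids: int  # filler IDs from TP lineups
    ta_ids: int         # IDs from target-absent (TA) lineups (all incorrect)


def calibration_accuracy(b: BinCounts) -> float:
    """C = correct / (correct + incorrect), counting every filler ID as incorrect."""
    total = b.correct_ids + b.tp_filler_ids + b.ta_ids
    return b.correct_ids / total if total else float("nan")


def cac_accuracy(b: BinCounts) -> float:
    """Same ratio, but TP filler IDs are excluded (assumed simple CAC form)."""
    total = b.correct_ids + b.ta_ids
    return b.correct_ids / total if total else float("nan")


# Made-up counts for two confidence bins, for illustration only.
bins = {
    "low": BinCounts(correct_ids=20, tp_filler_ids=15, ta_ids=20),
    "high": BinCounts(correct_ids=45, tp_filler_ids=3, ta_ids=5),
}

for label, counts in bins.items():
    print(label, round(calibration_accuracy(counts), 3), round(cac_accuracy(counts), 3))
```

Because TP filler IDs appear only in the denominator of the calibration measure, the CAC-style value for a given bin is never lower than the calibration value.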
Studies using CAC analysis have often employed numeric confidence scales (e.g. 20-point scales or 100-point scales), which do not reflect eyewitness identification as conducted by police departments. Unlike researchers in laboratory studies, police departments usually have eyewitnesses express their confidence verbally or on a scale with only a few levels (e.g. highly confident or somewhat confident) instead of a 100-point scale. Behrman and Richards (2005) reported that eyewitnesses typically used phrases such as “He resembles the guy” or “I think he did it” to indicate their certainty (or lack thereof). Wells (2014) reported that the Houston Police Department used a three-level verbal scale (positive, strong tentative, or weak tentative) for eyewitness identifications. Together, these police procedures raise an important question: Are such verbal confidence scales with few levels of confidence as predictive of eyewitness accuracy as the more fine-grained numeric scales used in laboratory studies? This is the issue we addressed in our paper, but we first review related findings.
Dodson and Dobolyi (2015a) showed that providing verbal or numeric labels and varying the number of confidence points on a 100-point scale (6 points: 0, 20, 40, 60, 80, 100; or 11 points: 0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100) did not change the CA relationship for eyewitness identification. Nonetheless, they employed a 100-point scale, which is unlikely to be used by police departments. Recently, Tekin and Roediger (2017) compared narrow ranges (4- and 5-point scales) to wider ranges (20- and 100-point scales) and concluded that the scale range did not affect the CA relationship with numeric scales. However, they used unrelated words and faces as materials, not lineups.
In the present study, we compared two- and four-level scales using lineups, because the four-level scale can be compared directly to the two-level scale by combining levels. The four-level scale may also be applicable in some police departments. Although previous research from our lab revealed that the 4-point and wider (e.g. 20-point, 100-point) scales did not differ from one another in the CA relation (Tekin & Roediger, 2017), no one has examined the issue with smaller scales (e.g. two- and four-level scales). Importantly, we used verbal confidence statements, as often used by police departments, and we also examined whether providing numerical values for the verbal confidence statements benefited the CA relationship compared with verbal statements alone. In a quest for external validity, we did not compare verbal scales to purely numeric scales because eyewitnesses are unlikely to give a numeric confidence without a verbal statement; thus, we added numbers to the verbal confidence scales to make them comparable. In addition, we employed two different material sets to establish some generalizability. Interestingly, one lineup turned out to be much more difficult for individuals than the other, and thus we could examine the effect of lineup difficulty on the CA relationship.
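As a rough illustration of how combining levels makes the two scale types comparable, the sketch below collapses four-level ratings into two bins and computes the proportion correct in each. The split point (levels 1 and 2 as "low", levels 3 and 4 as "high"), the function names, and the example trials are assumptions made for illustration; the text does not state how the levels were combined.

```python
from collections import Counter


def collapse_to_two(level: int) -> str:
    """Map a 1-4 confidence rating to 'low' (1-2) or 'high' (3-4).

    The split point is an assumption made for this sketch.
    """
    if level not in (1, 2, 3, 4):
        raise ValueError(f"expected a rating from 1 to 4, got {level!r}")
    return "high" if level >= 3 else "low"


def accuracy_by_bin(trials):
    """Return {bin: proportion correct} for (level, correct) pairs."""
    hits, totals = Counter(), Counter()
    for level, correct in trials:
        bin_label = collapse_to_two(level)
        totals[bin_label] += 1
        hits[bin_label] += int(correct)
    return {b: hits[b] / totals[b] for b in totals}


# Hypothetical trials: (four-level rating, whether the ID was correct).
example_trials = [(1, False), (2, True), (3, True), (4, True), (4, False)]
print(accuracy_by_bin(example_trials))
```

Once collapsed, the four-level data yield one accuracy value per bin, which can be set beside the corresponding values from a two-level scale.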
The current experiment addressed three primary questions. First, do small scales produce similar CA relationships (e.g. 2 values of confidence compared to 4 values), as is true for larger numeric scales (e.g. 20 points compared to 100 points)? Second, does adding numbers to purely verbal scales affect the CA relationship compared to using only verbal scales? Third, do the results replicate across two different sets of material (crime scenes and associated lineups)? To answer these questions, individuals viewed two videos, made identifications for possible suspects in each video with TP and TA lineups, and then indicated their confidence on either: (1) a verbal-only two-level scale; (2) a verbal + numeric 2-point scale; (3) a verbal-only four-level scale; or (4) a verbal + numeric 4-point scale.