From passport checks to buying age-restricted items, photographic identity documents (IDs) are the most commonly used proof of one’s identity. Although passport control increasingly relies on automated technology, when the identity is in question, or when the passport holder is a minor, human observers make the final decision.
Research has repeatedly shown that face matching is a challenging task, and that even motivated and trained individuals make a considerable number of mistakes (Kemp, Towell, & Pike, 1997; White, Kemp, Jenkins, Matheson, & Burton, 2014), often irrespective of experience (White, Kemp, Jenkins, Matheson, & Burton, 2014; Wirth & Carbon, 2017). In their seminal study, Kemp et al. (1997) examined the accuracy of experienced cashiers in detecting fraudulent IDs. They found that, despite a financial incentive to do well, cashiers accepted approximately 35% of foil ID cards even when the card bearer’s appearance did not resemble that of the foil depicted in the document’s photograph. Even under optimal laboratory conditions, when photographs are taken on the same day, participants sampled opportunistically from the general population err on between 11% and 20% of matching trials (Burton, White, & McNeill, 2010). In real-life settings, these optimal conditions are rarely preserved. With a typical passport valid for ten years, factors such as ageing (e.g., White, Phillips, Hahn, Hill, & O’Toole, 2015), hairstyle changes (Ellis, Shepherd, & Davies, 1979), wearing glasses (Kramer & Ritchie, 2016), and general within-person variability in appearance (Ritchie & Burton, 2017) can all be detrimental to face-matching accuracy.
To address this issue, a number of studies have concentrated on individual differences in face matching and on ways to improve photographic ID: for instance, by providing multiple images of the same person (Dowsett, Sandford, & Burton, 2016), restricting viewing to internal facial features (Kemp, Caon, Howard, & Brooks, 2016), training observers in face matching (Alenezi & Bindemann, 2013; Dowsett & Burton, 2015; Moore & Johnston, 2013; White, Kemp, Jenkins, & Burton, 2014), or giving specific instructions on which features to focus on (Megreya & Bindemann, 2018).
Given the limited success of training regimes (cf. Megreya & Bindemann, 2018) and the few effective ways of improving ID documents for human observers, several studies have proposed that selecting individuals from the high end of the face-processing-ability spectrum would be the best strategy for improving operational accuracy while more adequate training methods are developed (Bobak, Dowsett, & Bate, 2016; Bobak, Hancock, & Bate, 2015; Robertson, Noyes, Dowsett, Jenkins, & Burton, 2016). Indeed, so-called super-recognisers have been found to outperform typical perceivers on standard face-matching tasks, both as a group and at the individual level (Bobak et al., 2016; Robertson et al., 2016), with some performing on par with or better than leading computer algorithms (Phillips et al., 2018). However, it is unlikely that super-recognisers could be employed across all face-matching scenarios (e.g., border control and the sale of age-restricted items in stores). Therefore, most face-matching tasks will continue to be problematic.
While experimental work on face matching has typically concentrated on person properties (the variability in individual appearance caused by, for example, facial expression, hairstyle, pose, age, or paraphernalia), considerably less research has examined image properties (i.e., changes that can be applied after images have been taken) and their effect on face processing. One such image property is colour, previously shown to be relevant for face recognition (Yip & Sinha, 2002), face detection (Bindemann & Burton, 2009), gender classification (Nestor & Tarr, 2008), and non-face object recognition (Bramão, Reis, Petersson, & Faísca, 2011). Kemp, Pike, White, and Musselman (1996) showed that completely inverting the hue, such that a typical face appears in shades of blue, had almost no effect on the recognition of familiar faces but did impair recognition of previously unfamiliar faces. Yip and Sinha (2002) showed that colour information matters for face recognition when the availability of other cues is diminished, for instance when faces are blurred, but not when the images are of high quality. This is because colour information facilitates low-level analysis and the segmentation of features within a face (such as separating the mouth contour or the hairline), rather than aiding identification directly (but see Abudarham & Yovel, 2016; Bindemann & Burton, 2009). However, Abudarham and Yovel (2016) identified several critical features, such as hair and eye colour, that are invariant across changes in a person’s appearance and are pertinent to recognising their identity. Changing these features appears to considerably alter the perception of identity, whereas variations in other features do not. For instance, chin shape was classified as a non-critical feature because it varies with rigid and non-rigid face motion, whereas eye colour and hair colour remain constant, provided they are not deliberately disguised with coloured contact lenses or hair dye.
It is thus plausible that colour is an important factor not only in face recognition and detection but also in face matching. Yet one of the most commonly used tests of face-matching ability, the Glasgow Face Matching Test (Burton et al., 2010), is administered using grayscale images, while other tasks, such as the Model Face Matching Test (MFMT; Dowsett & Burton, 2015) and the new Kent Face Matching Test (Fysh & Bindemann, 2017), use colour photographs. It is unclear what effect incongruence in image colour may have on face-matching performance. This matters because, in real-life situations, a grayscale ID photograph is commonly compared against the live appearance of the person presenting it. For instance, EU driving licences, Polish national identity cards (which are valid for international air travel within the EU), and Polish and Republic of Ireland passports contain grayscale photographs (for examples, see Fig. 1). Other countries, such as Canada, allow applicants to submit either grayscale or colour photographs for their passports. These documents are used for identity verification at airports and when buying age-restricted items. Thus, if image hue influences face-matching performance, this could have important implications for the design of photographic ID.
In this study, we investigated whether image colour affects accuracy in matching photographs. We used unconstrained images from the well-established MFMT (Dowsett & Burton, 2015) and a newly designed face-matching task capturing the natural variability in people’s appearance. This is important because, in real-world situations, people vary in their everyday appearance and many IDs do not have to adhere to strict passport-style image-capture guidelines. We tested participants under three conditions: “colour”, “grayscale”, and “mixed” (where one image was presented in colour and one in grayscale). The addition of the mixed trials is the main advancement of this study over those previously reported in the literature and is important from both theoretical and applied perspectives. We hypothesised that, if colour facilitates low-level analysis, grayscale images and/or hue incongruence between photographs may disrupt this process, leading to a decrease in overall accuracy in these conditions relative to when both images are presented in colour. Additionally, if hair and eye colour are critical features that observers use to recognise unfamiliar individuals (Abudarham & Yovel, 2016), we would expect decreased performance in the mixed and grayscale conditions. However, if colour is a general diagnostic of identity (i.e., it helps the observer extract a robust representation of a face, integrating hue, shading, and fine-grained featural information, from which they can generalise to other instances of the same identity), then a single clear, high-quality image may provide sufficient identity information to compare against the second picture in a “mixed” trial. In that case, we would expect reduced performance only in the grayscale trials.
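For readers who wish to reproduce stimuli of this kind, the three conditions can be generated programmatically from colour source images. The sketch below is an illustrative assumption, not the stimulus pipeline used in this study: it adopts the standard ITU-R BT.601 luma weights for the grayscale conversion, and all function and variable names are ours.

```python
import numpy as np

# ITU-R BT.601 luma weights -- an assumed conversion method; the study does
# not specify how its grayscale stimuli were produced.
LUMA_WEIGHTS = np.array([0.299, 0.587, 0.114])

def to_grayscale(rgb: np.ndarray) -> np.ndarray:
    """Collapse an H x W x 3 RGB image to luminance, replicated across the
    three channels so colour and grayscale stimuli share one array format."""
    luma = rgb @ LUMA_WEIGHTS              # weighted sum over the colour axis
    return np.repeat(luma[..., None], 3, axis=-1)

def make_trial(img_a: np.ndarray, img_b: np.ndarray, condition: str):
    """Build one matching trial for the three conditions described above:
    'colour'    -> both images unchanged
    'grayscale' -> both images converted to grayscale
    'mixed'     -> one colour image, one grayscale image"""
    if condition == "colour":
        return img_a, img_b
    if condition == "grayscale":
        return to_grayscale(img_a), to_grayscale(img_b)
    if condition == "mixed":
        return img_a, to_grayscale(img_b)
    raise ValueError(f"unknown condition: {condition}")
```

Replicating each channel in the grayscale output keeps all stimuli in the same RGB format, so colour, grayscale, and mixed trials can be presented through an identical display routine.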