A grey area: how does image hue affect unfamiliar face matching?

Bobak, Anna K.; Mileva, Viktoria R.; Hancock, Peter J. B.

doi:10.1186/s41235-019-0174-3

Original article
Open access
Published: 22 July 2019

Cognitive Research: Principles and Implications volume 4, Article number: 27 (2019)

Abstract

The role of image colour in face identification has received little attention in research despite the importance of identifying people from photographs in identity documents (IDs). Here, in two experiments, we investigated whether colour congruency of two photographs, shown side by side, affects face-matching accuracy. Participants were presented with two images from the Models Face Matching Test (experiment 1) and a newly devised matching task incorporating female faces (experiment 2) and asked to decide whether they show the same person or two different people. The photographs were either both in colour, both in grayscale, or mixed (one in grayscale and one in colour). Participants were more likely to accept a pair of images as a “match”, i.e. same person, in the mixed condition, regardless of whether the identity of the pair was the same or not. This demonstrates a clear shift in bias between “congruent” colour conditions and the mixed trials. In addition, there was a small decline in accuracy in the mixed condition, relative to when the images were presented in colour. Our study provides the first evidence that the hue of document photographs matters for face-matching performance. This finding has important implications for the design and regulation of photographic ID worldwide.

Significance statement

Photographic documents, such as national identity cards, driving licences, and passports, are the most common means of verifying an individual’s identity. This is despite most research suggesting that unfamiliar face matching is difficult and error prone. Much attention in the literature has been devoted to factors influencing face matching, such as image quality, time between the taking of photographs, and the presence of paraphernalia, such as glasses. However, no work has considered the influence of photograph colour on the accuracy of face matching, and current identity documents (IDs) are often printed in grayscale, e.g., the European Union (EU) driving licence or Polish and Canadian passports. The findings of this paper highlight the potential pitfall of using grayscale images in IDs. People are more inclined to accept a pair of images as a match when one is in grayscale and one is in colour. This detrimental effect is particularly important in the mismatched trials, i.e. when the two images show two different people. While it is unclear whether this effect persists in trained or highly skilled individuals (e.g., passport officers), our participants were sampled from a population that often works in service sector industries where routine ID inspections are commonplace. We call on policy makers to re-think image colouration in photographic identity documents.

Background

From passport checks to buying age-restricted items, photographic identity documents (IDs) are the most commonly used proof of one’s identity. Although passport control increasingly relies on automated technology, when the identity is in question, or when the passport holder is a minor, human observers make the final decision.

Research has repeatedly shown that face matching is a challenging task and even motivated and trained individuals make a considerable number of mistakes (Kemp, Towell, & Pike, 1997; White, Kemp, Jenkins, Matheson, & Burton, 2014), often independently of experience (White et al., 2014; Wirth & Carbon, 2017). In their seminal study, Kemp et al. (1997) examined the accuracy of experienced cashiers in detecting fraudulent IDs. They found that despite a financial incentive to do well, cashiers accepted approximately 35% of foil ID cards even when the card bearer did not resemble the foil depicted in the document’s photograph. Under optimal laboratory conditions, when photographs are taken on the same day, participants sampled opportunistically from the general population make errors on between 11% and 20% of trials in a matching task (Burton, White, & McNeill, 2010). In real-life settings, these optimal conditions are rarely preserved. With a typical passport document valid for ten years, factors such as age (e.g., White, Phillips, Hahn, Hill, & O’Toole, 2015), hairstyle changes (Ellis, Shepherd & Davies, 1979), wearing glasses (Kramer & Ritchie, 2016), and general within-person appearance idiosyncrasies (Ritchie & Burton, 2017) can all be detrimental to face-matching accuracy.

To address this issue, a number of studies have concentrated on the individual differences in face matching and the ways to improve photographic ID by, for instance, providing multiple images of the same person (Dowsett, Sandford, & Burton, 2016), restricting the viewing to internal features (Kemp, Caon, Howard, & Brooks, 2016), face-matching training (Alenezi & Bindemann, 2013; Dowsett & Burton, 2015; Moore & Johnston, 2013; White, Kemp, Jenkins, & Burton, 2014), and by giving specific instructions on which features to focus on (Megreya & Bindemann, 2018).

With the limited success of training regimes (cf. Megreya & Bindemann, 2018) and few effective ways of improving ID documents for human observers, several studies proposed that selecting individuals from the high end of the face processing ability spectrum would be the best strategy for improving operational accuracy while more adequate training methods are developed (Bobak, Dowsett, & Bate, 2016; Bobak, Hancock, & Bate, 2015; Robertson, Noyes, Dowsett, Jenkins, & Burton, 2016). Indeed, so-called super-recognisers have been found to outperform typical perceivers on standard face-matching tasks both as a group and at the individual level (Bobak et al., 2016; Robertson et al., 2016) with some performing on par with or better than the leading computer algorithms (Phillips et al., 2018). However, employing super-recognisers for all possible face-matching scenarios (e.g., border control and selling age-restricted items in stores) is unlikely to be feasible. Therefore, most face-matching tasks will continue to be problematic.

While experimental work on face matching has typically concentrated on person properties (the variability in individual appearance such as that caused by facial expression, hairstyle, pose, age, or paraphernalia), considerably less research has examined image properties (i.e., changes that can be applied after images have been taken) and their effect on face processing. One such image property is colour, previously shown to be relevant for face recognition (Yip & Sinha, 2002), face detection (Bindemann & Burton, 2009), gender classification (Nestor & Tarr, 2008), and non-face object recognition (Bramão, Reis, Petersson, & Faísca, 2011). Kemp, Pike, White, and Musselman (1996) showed that completely inverting the hue, such that a typical face appears in shades of blue, had almost no effect on the recognition of familiar faces but did affect recognition of previously unfamiliar faces. Yip and Sinha (2002) showed that colour information does matter for face recognition when availability of other cues is diminished, for instance when faces are blurred, but not when the images are of high quality. This is due to colour information facilitating low-level analysis and segmenting features within a face (such as separating the mouth contour or the hairline), rather than aiding identification directly (but see Abudarham & Yovel, 2016; Bindemann & Burton, 2009). However, Abudarham and Yovel (2016) identified several critical features, such as hair and eye colour, that are invariant across changes in one’s appearance and are pertinent to recognising one’s identity. Changing these features appears to considerably alter the perception of identity, while variations in other features do not. For instance, chin shape was defined as a non-critical feature that differs depending on rigid and non-rigid face motion, but eye colour and hair colour remain the same, provided they are not disguised deliberately with coloured contact lenses or hair dye.

It is thus plausible that colour is an important factor not only in face recognition or detection, but also in face matching. Yet one of the most commonly used tests of face-matching ability, the Glasgow Face Matching Test (Burton et al., 2010), is administered using grayscale images, while other tasks, such as the Models Face Matching Test (MFMT) (Dowsett & Burton, 2015), or the new Kent Matching Test (Fysh & Bindemann, 2017), utilise colour photographs. It is unclear what effect image colour incongruence may have on face-matching performance. This is important, because in real-life situations, it is common for a grayscale ID photograph to be compared with an individual in front of the person performing the check. For instance, EU driving licences, Polish national identity cards that are valid for international air travel within the EU, and Polish and Republic of Ireland passports contain grayscale photographs (for examples see Fig. 1). Other countries such as Canada allow applicants to submit either grayscale or coloured photographs for their passports. These documents are used for identity verification at airports and when buying age-restricted items. Thus, if image hue influences face-matching performance, this could have important implications for the design of photographic ID.

In this study, we investigated whether image colour affects accuracy in the matching of photographs. We used unconstrained images from the well-established MFMT (Dowsett & Burton, 2015) and a newly designed face-matching task capturing the natural variability in people’s appearance. This is important, because in real-world situations people vary in their everyday appearance and many IDs do not have to adhere to strict passport-like image capture guidelines. We tested participants under three conditions: “colour”, “grayscale”, and “mixed” (where one image was presented in colour and one in grayscale). The addition of the mixed trials is the main advance of this study over those previously reported in the literature and is of importance from both theoretical and applied perspectives. We hypothesised that, if colour facilitates low-level analysis, grayscale images and/or hue incongruency between photographs may disrupt this process, leading to a decrease in overall accuracy in these conditions relative to when both images are presented in colour. Additionally, if hair and eye colour are critical features that individuals use for recognising unfamiliar individuals (Abudarham & Yovel, 2016), we would expect decreased performance in “mixed” and grayscale conditions. However, if colour is a general diagnostic for one’s identity (i.e. it is helpful for extracting a robust representation of one’s face by integrating hue, shading, and fine-grained featural information) from which one can generalise to other instances of the same identity, one clear and high-quality image may be enough to extract identity information sufficient to compare this identity to a second picture in a “mixed” matching trial. We would then merely expect reduced performance in the grayscale trials.

Experiment 1

Method

Participants

A total of 42 students (30 female; age, mean (M) = 20, SD = 3.5; all with self-reported normal or corrected-to-normal vision) at a university in the UK took part in the study on a voluntary basis and without reimbursement. The study was approved by the General University Ethics Panel and was carried out in accordance with the recommendations of the World Medical Association Declaration of Helsinki. Sample size was determined based on previous research (e.g., Kramer & Ritchie, 2016) and data collection stopped once the pre-determined number of participants was reached.

Materials

Our materials consisted of a total of 90 MFMT trials: 45 matched and 45 mismatched trials divided into three sets of 30 trials (15 matched and 15 mismatched per set). All three sets were of equal difficulty (the baseline average accuracy for each set was determined by pilot testing in Dowsett & Burton, 2015). In this study, we called these three sets of 30 face pairs A, B, and C. All images measured 300 (width (W)) × 420 (height (H)) pixels and did not contain visible jewellery, but hair and clothing were not cropped, to mimic the natural conditions under which face matching would occur (Fig. 2). We created three variations of every pair: (1) colour condition as per the original study, (2) grayscale condition, where both images were presented in grayscale, and (3) mixed condition, where one image of each pair was presented in colour and one in grayscale. Images were converted from colour to grayscale using IrfanView software (http://www.irfanview.com/).
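The colour-to-grayscale step and the construction of a mixed pair can be illustrated with a minimal sketch. The conversion formula applied by IrfanView is not stated in the text, so the Rec. 601 luma weights used below (0.299, 0.587, 0.114) are an assumption chosen purely for illustration, and all function names are hypothetical.

```python
# Illustrative only: the Rec. 601 luma weights are an assumption, not the
# documented behaviour of the software used in the study.
def to_gray(pixel):
    """Map one (R, G, B) pixel (0-255 per channel) to a single grey level."""
    r, g, b = pixel
    return round(0.299 * r + 0.587 * g + 0.114 * b)


def image_to_gray(image):
    """Convert rows of RGB pixels into rows of grey pixels (R = G = B)."""
    return [[(to_gray(p),) * 3 for p in row] for row in image]


def make_mixed_pair(left, right, gray_side):
    """Build a 'mixed' pair: one image in grayscale, the other left in colour."""
    if gray_side == "left":
        return image_to_gray(left), right
    return left, image_to_gray(right)
```

Under these assumed weights a pure red pixel (255, 0, 0) maps to grey level 76 and a pure green pixel to 150, which shows how two clearly different hues can end up as fairly similar grey levels once colour is removed.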

Procedure and apparatus

Each participant saw all 90 pairs. The colour condition in which they saw each set was counterbalanced, i.e. some participants saw set A in colour, some saw it in the mixed condition, and others in the grayscale condition, and likewise for sets B and C. All participants saw all three colour conditions (within-subjects design), with trials presented in random order (not blocked) to mimic the natural environment in which those checking identity documents may operate (see Fig. 2 for examples of face pairs). In the mixed condition pairs, the grayscale images appeared equally often on each side of the screen.
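One way to realise this kind of counterbalancing is a 3 × 3 Latin square rotation, sketched below. The text specifies only that the set-to-condition assignment was counterbalanced, that trials were intermixed, and that the grayscale image appeared equally often on each side; the rotation by participant index, the function names, and the trial dictionary layout are assumptions made for illustration.

```python
import random

SETS = ["A", "B", "C"]
CONDITIONS = ["colour", "grayscale", "mixed"]


def assign_conditions(participant_index):
    """Rotate conditions across sets (Latin square) by participant index."""
    shift = participant_index % 3
    return {s: CONDITIONS[(i + shift) % 3] for i, s in enumerate(SETS)}


def build_trials(participant_index, pairs_per_set=30, seed=None):
    """Return an intermixed (unblocked) trial list for one participant."""
    rng = random.Random(seed)
    mapping = assign_conditions(participant_index)
    trials = []
    for s in SETS:
        cond = mapping[s]
        # Half 'left', half 'right', so the grayscale image appears equally
        # often on each side in the mixed condition.
        sides = ["left", "right"] * (pairs_per_set // 2)
        rng.shuffle(sides)
        for pair_id in range(pairs_per_set):
            trials.append({
                "set": s,
                "pair": pair_id,
                "condition": cond,
                "gray_side": sides[pair_id] if cond == "mixed" else None,
            })
    rng.shuffle(trials)  # intermix the three conditions rather than blocking
    return trials
```

With 42 participants, this rotation would place 14 participants in each of the three set-to-condition assignments.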

On each of the 90 trials, the pairs of images were presented side by side, one to the left and one to the right of the centre of the screen. The viewing distance was not fixed. Participants were instructed to decide whether the two images presented on screen were of the same person or of two different people, and to respond with the “s” key for “same” and the “k” key for “different”. These response buttons remained the same throughout the experiment for each participant. There was no time restriction placed on participants. Testing took place in dimly lit cubicles using 19 in. monitors at a resolution of 1280 × 1024 pixels and a refresh rate of 60 Hz.

Results

All participants’ data were used in the analyses. Accuracy was analysed separately for matched and mismatched trials due to the weak correlation between performance on matched and mismatched trials as reported in the literature, which suggests that these trials represent distinct processes (Megreya & Burton, 2007).

For matched trials, percentage correct was analysed using one-way within-subjects analysis of variance (ANOVA) with three levels (colour, grayscale, and mixed). There was a significant main effect of image hue, F(2,82) = 9.96, p < .001, η²_p = 0.19. Follow-up pairwise comparisons (Bonferroni corrected) showed that participants were more accurate in the “colour” and “mixed” conditions than in the “grayscale” condition, p = .045, d = 0.40 (95% CI 0.11, 0.72) and p < .001, d = 0.77 (95% CI 0.44, 1.15), respectively (see Table 1 for a summary of means and SD). The mixed and colour conditions did not differ from each other: p = .279, d = 0.29 (95% CI −0.04, 0.64).
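For readers unfamiliar with the analysis, the sketch below shows how a one-way within-subjects ANOVA partitions variance into condition, participant, and residual components. It is a textbook formulation, not the authors' analysis script: the function name is hypothetical, the input scores are made up, and no sphericity correction or p value is computed.

```python
def rm_anova_one_way(scores):
    """One-way within-subjects ANOVA.

    scores: one row per participant, each row holding that participant's
    score in every condition (same condition order for everyone).
    Returns (F, df_effect, df_error, partial_eta_squared).
    """
    n = len(scores)       # number of participants
    k = len(scores[0])    # number of conditions
    grand = sum(sum(row) for row in scores) / (n * k)
    cond_means = [sum(row[j] for row in scores) / n for j in range(k)]
    subj_means = [sum(row) / k for row in scores]

    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    ss_error = ss_total - ss_cond - ss_subj  # condition x participant residual

    df_effect = k - 1
    df_error = (k - 1) * (n - 1)
    f_value = (ss_cond / df_effect) / (ss_error / df_error)
    eta_p2 = ss_cond / (ss_cond + ss_error)
    return f_value, df_effect, df_error, eta_p2


# Made-up scores for three participants in three conditions (not study data).
toy = [[1, 2, 3], [2, 4, 3], [3, 3, 6]]
f_value, df1, df2, eta_p2 = rm_anova_one_way(toy)
```

With 42 participants and three conditions, the same formulas give 2 and 82 degrees of freedom, matching the F(2,82) reported above.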

Table 1 Average performance for all conditions (standard deviations are in parentheses)

Accuracy was also examined in mismatched trials, using within-subjects ANOVA with three hue levels. There was a significant main effect of condition, F(2,82) = 23.60, p < .001, η²_p = 0.365. Pairwise comparisons (Bonferroni corrected) revealed that performance was lower in the mixed condition than in colour and grayscale conditions, p < .001, d = 0.64 (95% CI 0.34, 0.98) and p < .001, d = 0.89 (95% CI 0.56, 1.28), respectively. Accuracy in grayscale and colour conditions did not differ, p = .073, d = 0.22 (95% CI 0.04, 0.41).
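The effect size for the mismatched trials can be cross-checked from the F ratio and its degrees of freedom, using the standard identity η²_p = (F × df_effect) / (F × df_effect + df_error). The short sketch below applies it to the values reported in the preceding paragraph; the function name is hypothetical.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Mismatched trials, main effect of condition: F(2, 82) = 23.60
eta_mismatch = partial_eta_squared(23.60, 2, 82)
print(round(eta_mismatch, 3))  # 0.365
```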

In keeping with other recent studies in the field of face matching, we also analysed signal detection measures to separate the effects of sensitivity and response bias on match and mismatch image trials. d prime was calculated by subtracting the z scores for false alarms (FA), i.e. when participants responded “same” in mismatched trials, from the z scores for hits (H), i.e. when participants correctly identified two images as “same” in matched trials. Response bias (criterion c) was calculated by taking the negative of the average of the z scores for H and FA (Macmillan & Creelman, 2004). A one-way within-subjects ANOVA of d prime scores showed a non-significant trend for hue condition, F(2,82) = 2.65, p = .077, η²_p = 0.06. However, the critical comparison is between the colour and mixed conditions, and these differed significantly on a paired t test, t(41) = 2.31, p = .028, d = 0.35 (95% CI 0.05, 0.66).
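The two signal detection measures described above can be sketched as follows. The formulas follow the text (d prime = z(H) − z(FA); c = −[z(H) + z(FA)] / 2). The clamping of hit and false-alarm rates away from 0 and 1 is an assumption added here so that the z transform stays finite; the text does not say how, or whether, extreme rates were adjusted. Function names are hypothetical.

```python
from statistics import NormalDist


def z_score(p):
    """Inverse of the standard normal cumulative distribution function."""
    return NormalDist().inv_cdf(p)


def clamp_rate(count, n):
    """Keep a rate inside [1/(2n), 1 - 1/(2n)] (assumed correction, see lead-in)."""
    return min(max(count / n, 1 / (2 * n)), 1 - 1 / (2 * n))


def sdt_measures(hits, n_match, false_alarms, n_mismatch):
    """Return (d_prime, criterion_c) for one participant in one condition.

    hits: "same" responses on matched trials.
    false_alarms: "same" responses on mismatched trials.
    """
    z_h = z_score(clamp_rate(hits, n_match))
    z_fa = z_score(clamp_rate(false_alarms, n_mismatch))
    d_prime = z_h - z_fa
    criterion_c = -(z_h + z_fa) / 2  # positive = conservative ("different") bias
    return d_prime, criterion_c
```

For example, a participant with 12 hits on 15 matched trials and 3 false alarms on 15 mismatched trials would have a d prime of about 1.68 and a criterion of 0, i.e. no bias towards either response. A larger number of "same" responses on both trial types pushes c below zero (a liberal bias) without necessarily changing d prime.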

An analogous analysis of response bias showed a highly significant main effect of condition, F(2,82) = 29.24, p < .001, η²_p = 0.42, with a very large effect size. Follow-up comparisons (Bonferroni corrected) showed that participants had a significantly more conservative bias (they were more likely to reject a pair as a mismatch) in the grayscale condition than in the colour condition, p = .002, d = 0.39 (95% CI 0.18, 0.60), and the mixed condition, p < .001, d = 1.03 (95% CI 0.71, 1.41). Participants were also more likely to respond “different” in the colour condition than in the mixed condition, p < .001, d = 0.57 (95% CI 0.28, 0.90) (see Fig. 3). This reflects the matching data shown in Table 1: the mixed condition produces the highest match accuracy, at the cost of the worst mismatch accuracy. Participants are simply more likely to declare a match in the mixed condition than in the colour and grayscale conditions.

Discussion

In experiment 1, we examined how image hue affects face-matching performance in a group of young British adults. While the overall accuracy did not differ between conditions (Table 1), when we examined d prime there was a trend towards individuals being better at discriminating faces (i.e. deciding whether they were the same or two different faces) when both were presented in colour relative to when the colour of the images differed. Even more clearly, there were differences in how participants approached matched and mismatched trials depending on the colour congruency. In the colour and grayscale conditions, participants were significantly more biased to respond conservatively (i.e. that a pair was a mismatch). This was more pronounced for the grayscale pairs. This pattern of responses was not present in the mixed-hue condition, where accuracy was comparable in matched and mismatched trials. More importantly, in mismatched trials, participants were significantly less accurate in the mixed condition than in the colour and grayscale conditions. This clear shift in bias (see Fig. 3) may be explained by the additional difference between the two images in each pair. That is, in the colour and grayscale conditions the two side-by-side images differed only in the specific pictures displayed (see Fig. 2 top and middle rows). However, within the mixed condition, the two images differed in which pictures were displayed but also in the extra dimension of having one in colour and one in grayscale. This may have led participants to discount perceived differences between the two images (in mismatch pairs), and to attribute those differences to the image hues rather than to differences in actual identities.

These biases were unexpected, and because we anticipated differences in performance to affect the mixed condition irrespective of the trial type, we sought to replicate this effect in another experiment with a more ecologically valid face set (Additional file 1).