Individual differences in hyper-realistic mask detection
Cognitive Research: Principles and Implications volume 3, Article number: 24 (2018)
Abstract
Hyper-realistic masks present a new challenge to security and crime prevention. We have recently shown that people’s ability to differentiate these masks from real faces is extremely limited. Here we consider individual differences as a means to improve mask detection. Participants categorized single images as masks or real faces in a computer-based task. Experiment 1 revealed poor accuracy (40%) and large individual differences (5–100%) for high-realism masks among low-realism masks and real faces. Individual differences in mask categorization accuracy remained large when the Low-realism condition was eliminated (Experiment 2). Accuracy for mask images was not correlated with accuracy for real face images or with prior knowledge of hyper-realistic face masks. Image analysis revealed that mask and face stimuli were most strongly differentiated in the region below the eyes. Moreover, high-performing participants tracked the differential information in this area, but low-performing participants did not. Like other face tasks (e.g. identification), hyper-realistic mask detection gives rise to large individual differences in performance. Unlike many other face tasks, performance may be localized to a specific image cue.
Significance
The proliferation of Hollywood-style silicone masks has caught the security sector unawares. These whole-face masks allow wearers to transform their facial appearance in seconds and are readily accepted as real faces. The implications for security and crime prevention are potentially far-reaching, as undetected face masks undermine the connection between facial appearance and personal identity. Psychological research on face perception has discovered large individual differences in identification ability. The present studies similarly reveal large individual differences in the completely novel task of hyper-realistic mask detection and identify a specific region under the eyes that may drive accurate performance. Our findings raise the interesting prospect of selecting personnel for very narrow cognitive tasks. They also suggest that performance on this particular task may be responsive to training. Either route could improve our ability to distinguish hyper-realistic face masks from real faces.
Background
In a number of high-profile criminal cases, offenders have used hyper-realistic face masks (Fig. 1) to transform their facial appearance, leading police to pursue suspects who looked nothing like the actual offenders (e.g. different race or age; Bernstein, 2010). In a separate incident, an airline passenger wearing a hyper-realistic mask boarded an international flight without the deception being noticed (Zamost, 2010). These cases suggest that, in practical settings, hyper-realistic face masks can be difficult to distinguish from real faces. Experimental evidence bears out this conclusion. In a series of studies (Sanders et al., 2017), we examined incidental detection of unexpected but attended hyper-realistic masks in both photographic and live presentations. In all of these studies, viewers accepted hyper-realistic masks as real faces. These findings extend a tradition of research into realism of artificial stimuli. The Uncanny Valley phenomenon originally considered a range of human-like stimuli from puppets to robots (Mori, 1970; Mori, MacDorman, & Kageki, 2012). In recent years, the focus has shifted somewhat to computer-generated images (e.g. Nightingale, Wade, & Watson, 2017), but the very success of computer graphics has raised awareness that on-screen images may be digitally generated or enhanced. One of the interesting aspects of hyper-realistic masks is that they also fool the eye in the physical world (Sanders et al., 2017), where digital image manipulation has not yet encroached.
The finding that spontaneous mask detection is unreliable suggests that specific measures may be required if detection rates are to be improved. Here we pursue an individual differences approach to the problem. Over the last decade, individual differences have become an important topic in face perception research, not least because they suggest a route to improving performance in applied settings. For face identification, the range of ability is bracketed by two extremes. At the high end are super-recognizers, who rarely make errors (Bobak, Bennetts, Parris, Jansari, & Bate, 2016; Robertson, Noyes, Dowsett, Jenkins, & Burton, 2016; Russell, Duchaine, & Nakayama, 2009); at the low end are people with developmental prosopagnosia, who rarely exceed chance performance (Behrmann & Avidan, 2005; Duchaine & Nakayama, 2005). Between these extremes lies a spectrum of ability on standardized face identification tests (e.g. Burton, White, & McNeill, 2010; Duchaine & Nakayama, 2006).
These findings have led some researchers to suggest that personnel selection could play a useful role in optimizing occupational face recognition (White, Kemp, Jenkins, Matheson, & Burton, 2014). For example, Metropolitan Police super-recognizers have been found to score unusually high on a range of face identification tests (Robertson et al., 2016).
For mask detection, the cognitive situation is somewhat different. Here the challenge is not individuation at the subordinate level (Rosch, Mervis, Gray, Johnson, & Boyes-Braem, 1976), but rather categorization at the basic level, albeit for the unusual case where one basic category (masks) deliberately mimics the other (faces). As the current task involves face/non-face categorization, it arguably has more in common with face detection than with face identification (see Bindemann & Lewis, 2013, for a careful dissection of these issues).
The analogy with face detection may have some broad predictive value for the present case. Large individual differences in face detection ability have recently been reported (Robertson, Jenkins, & Burton, 2017) and they appear to dissociate from face identification ability. However, one important difference is that face detection hinges on the presence or absence of a face-like pattern (e.g. two eyes above a nose above a mouth). That criterion will not help the viewer in the current task, as hyper-realistic face masks and real faces both present face-like patterns. Thus, the intuition is that hyper-realistic mask detection will require finer discrimination than face detection tasks demand.
As yet, very little is known about individual differences in this finer perceptual task. For example, we do not know the expected range of ability. Nor do we know any factors that might differentiate high performers from low performers. The present studies address these issues by asking whether some people are better than others at categorizing masks and faces, and what they may be doing that allows them to perform well. The overarching aim is to establish whether an individual differences approach might be as useful in hyper-realistic mask detection as it has been in face identification.
We begin in Experiment 1 by comparing detection of low-realism and high-realism masks in the context of real faces. In Experiment 2, we eliminated low-realism masks to focus participants on the harder comparison (high-realism masks vs real faces). Finally, we undertook an image analysis to compare use of information for high- and low-accuracy subgroups.
Experiment 1
Previous studies of hyper-realistic mask perception have assessed spontaneous detection of masks during an orthogonal task (social inference ratings; Sanders et al., 2017). Detection rates approached floor levels in that situation, precluding individual differences analysis. In this study, we sought to increase detection rates by: (1) explicitly instructing participants that the task was to distinguish masks from real faces; (2) presenting masks and faces equally often (50% prevalence); and (3) explaining this prevalence rate to participants. These measures were intended to license “mask” responses, even when participants were not certain. We expected that low-realism masks and real faces would be categorized accurately. Our main interest was in the range of performance for high-realism masks.
Method
Ethics statement
Ethics approval for all experiments was obtained from the departmental ethics committee at the University of York.
Participants
Thirty members of the volunteer panel at the University of York (21 women, 9 men; mean age = 22 years, age range = 18–41 years) took part in exchange for a small payment or course credit.
Stimuli and design
To collect images of high-realism masks, we entered the search terms “realistic masks,” “hyper-realistic masks,” and “realistic silicone masks” into Google Images. We selected images that: (1) exceeded 150 pixels in height; (2) showed the mask in roughly frontal aspect; (3) showed the eye region without occlusions; and (4) included real hair eyebrows. We used the same criteria to search the websites of mask manufacturers (e.g. RealFlesh Masks, SPFX, CFX) and topical forums on social media (e.g. Silicone Mask Sickos, Silicone mask addicts). Our aim here was to sample “ambient” photos of hyper-realistic masks that represent the range of the mask images in the visual world (Jenkins et al., 2011). For this reason, we avoided promotional studio photographs of the masks and instead used photos of the masks in situ. This search resulted in 37 hyper-realistic mask images that met the inclusion criteria.
For comparison, we collected 37 images of low-realism masks by entering search terms such as “Halloween,” “party,” “mask,” “masquerade,” “face-mask,” and “party mask” in Google Images and selecting the first images that met inclusion criteria 1–3 above.
We also collected 74 real-face images for use as fillers in the mask/face categorization task. To ensure that the demographic distribution among our real face images was similar to that portrayed by the high-realism masks, we entered the search terms “young male,” “old male,” “young female,” and “old female” into Google Images. We then accepted images that met criteria 1–3 until the distribution of faces across these categories was the same as for the high-realism mask images. All photos were cropped to show the head region only and resized to 540 × 385 pixels for presentation (see Fig. 2).
The final image set consisted of 148 photographs (37 high-realism masks, 37 low-realism masks, 74 real faces). Each participant viewed the 148 images intermixed in a different random order (within-subjects design).
Procedure
Participants were instructed that half of the images showed real faces and half of the images showed masks. They were also informed that mask trials would contain both low-realism masks and high-realism masks. Each trial consisted of a centrally presented image (a mask or a face) together with the prompt “Is this person wearing a mask?” and response options “Yes – Press M” and “No – Press Z.” The display remained on screen until response, upon which the following trial began automatically. No time limit was imposed. Participants completed three practice trials, followed by 148 experimental trials in a unique random order. The entire experiment took approximately 10 min to complete.
Results and discussion
Group performance
Real face images were correctly classified on 96.3% of trials and were not analyzed further. Performance on mask trials is summarized in Fig. 3. As expected, low-realism masks were categorized reliably (M = 98.2%, SE = 0.4, CI = 97.6–99.0). High-realism masks were categorized much less reliably (M = 40.4%, SE = 5.6, CI = 29.2–51.5), meaning that the clear majority of these masks (59.6%) were misclassified as real faces. A within-subjects t-test confirmed that this difference in accuracy was statistically significant (t(29) = 10.29, p < 0.001).
Reaction time (RT) data followed a similar pattern. Correct responses to low-realism mask trials were relatively fast (M = 895 ms, SE = 35, CI = 831–959), whereas correct responses to high-realism mask trials took roughly twice as long (M = 1629 ms, SE = 142, CI = 1352–1901). Again, the difference between mask conditions was statistically robust (t(29) = 5.86, p < 0.001).
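For illustration, the within-subjects comparison can be sketched as follows. This is a minimal sketch: the accuracy arrays are simulated stand-ins for the 30 participants' condition means, not the study data.

```python
# Minimal sketch of a within-subjects (paired) t-test across conditions;
# the arrays below are hypothetical stand-ins for 30 participants' scores.
import numpy as np
from scipy.stats import ttest_rel

rng = np.random.default_rng(1)
low_realism_acc = rng.uniform(0.95, 1.00, size=30)   # near ceiling
high_realism_acc = rng.uniform(0.05, 1.00, size=30)  # wide spread

# Paired comparison: same participants contribute to both conditions
t, p = ttest_rel(low_realism_acc, high_realism_acc)
print(f"t(29) = {t:.2f}, p = {p:.3g}")
```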
Individual differences
As can be seen in Fig. 4, there was little variability in accuracy in the low-realism mask condition (range 95–100%), with performance compressed against ceiling for this easy task. In contrast, accuracy in the high-realism condition spanned the entire range (5–100%). Unsurprisingly, there was no correlation between high- and low-realism mask trial performance (r = 0.182, p = 0.335).
Overall, classification judgements were much harder for high-realism masks than for low-realism masks. More importantly for the current study, the data reveal striking individual differences in performance for the high-realism condition. A few observers detected hardly any hyper-realistic face masks in this experiment, but a few detected nearly all of them.
One possible interpretation of this pattern is that low-realism masks make high-realism masks hard to detect, by encouraging viewers to draw the category boundary in the wrong place ([real faces + high-realism masks] vs [low-realism masks] as opposed to [real faces] vs [high-realism + low-realism masks]). Prior knowledge of hyper-realistic face masks could protect against this error, leading to high overall accuracy. To address this possibility, we next repeated the experiment without the low-realism mask condition. We also asked participants whether they had encountered hyper-realistic face masks before the experiment.
Experiment 2
This experiment was the same as Experiment 1, except for the following changes. First, we replaced the low-realism mask stimuli with high-realism mask stimuli, in order to focus participants on the difficult judgments (real faces vs hyper-realistic face masks). As before, we informed participants that half of the trials would contain real faces and half of them would contain masks. We expected the new composition of trials to elicit errors in both directions (i.e. masks mistaken for faces and faces mistaken for masks). Our main interest was the distribution of performance in this situation. To test for effects of prior mask knowledge on performance, we also collected self-report ratings at the end of the experiment.
Method
Participants
Thirty members of the volunteer panel at the University of York (24 women, 6 men; mean age = 20 years, age range = 18–24 years) took part in exchange for a small payment or course credit.
Stimuli and design
Additional stimuli were collected via Internet search, using the method described in Experiment 1. Once again, the proportions of young male, old male, young female, and old female items were matched across real face and high-realism mask images. The final image set consisted of 148 photographs (74 high-realism masks and 74 real faces). Each participant viewed the 148 images intermixed in a different random order (within-subjects design).
Procedure
The procedure was the same as for Experiment 1, except that the low-realism trials were replaced with high-realism trials (see Fig. 5). To test whether individual differences in performance could be explained by prior knowledge of hyper-realistic face masks, we asked participants to rate their prior knowledge on a 7-point Likert scale at the end of the experiment.
Results and discussion
Group performance
Overall categorization performance is summarized in Fig. 6. As can be seen from the figure, classification of real face images was accurate, but not at ceiling (M = 91.2%, SE = 2.0, CI = 87.3–95.1). Accuracy for high-realism masks was relatively low (M = 73.7%, SE = 2.7, CI = 68.3–79.0), indicating that hyper-realistic masks were frequently misclassified as real faces (26.3%). A within-subjects t-test confirmed that this difference in classification accuracy was statistically significant (t(29) = 6.78, p < 0.001).
To analyze discriminability and bias, we also carried out a signal detection analysis of correct performance (d’ = 1.56, SE = 0.43, CI = 1.40–1.72, p(correct) = 0.825). This indicates that participants were able to differentiate between masks and real faces. A criterion analysis (C = −0.61, SE = 0.09, CI = −0.97 to −0.43) indicates a modest bias towards responding “mask.” After correcting for this bias, the ability to discriminate masks from real faces remained (d’ corrected = 1.36, SE = 0.43, CI = 1.40–1.72, p(correct) = 0.854).
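For illustration, the signal detection measures can be computed as follows. This is a minimal sketch, assuming “mask” is coded as the signal category and using a log-linear correction for extreme rates; the trial counts are hypothetical, not the study data, and the sign of C depends on which category is coded as the signal.

```python
# Sketch of the signal detection computation; counts are hypothetical.
from scipy.stats import norm

def sdt_measures(hits, misses, false_alarms, correct_rejections):
    """d' (sensitivity) and C (criterion), coding "mask" as the signal:
    a hit is a mask trial answered "mask"; a false alarm is a real-face
    trial answered "mask"."""
    # Log-linear correction keeps rates away from 0 and 1, which would
    # otherwise send the z-transform to +/- infinity.
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    z_hit, z_fa = norm.ppf(hit_rate), norm.ppf(fa_rate)
    d_prime = z_hit - z_fa
    criterion = -0.5 * (z_hit + z_fa)  # sign convention follows signal coding
    return d_prime, criterion

# Hypothetical participant: 74 mask trials, 74 real-face trials
d, c = sdt_measures(hits=55, misses=19, false_alarms=7, correct_rejections=67)
print(f"d' = {d:.2f}, C = {c:.2f}")
```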
There was no significant difference in reaction times between real face trials (M = 1301 ms, SE = 93, CI = 1121–1480) and high-realism mask trials (M = 1283 ms, SE = 71, CI = 1145–1421; t(29) = 0.34, p = 0.73).
Individual differences
As can be seen in Fig. 7, almost everyone performed above chance in both conditions. Classification accuracy was in the range of 65–100% in the real face condition and 43–91% in the high-realism mask condition. Interestingly, there was no correlation in performance between the two conditions (r = − 0.04, p = 0.83).
We also measured individual differences in discriminability between masks and real faces in terms of sensitivity (d’: range = 0.59–2.34) and criterion scores (C: range = −1.86 to 0.32).
Prior mask knowledge
Self-report ratings of prior mask knowledge were generally low (M = 2.67, SD = 1.03), suggesting little or no exposure to hyper-realistic face masks before the experiment. More importantly, there was no significant correlation between prior mask knowledge and performance in either the high-realism mask condition (r = 0.025, p = 0.898) or the real face condition (r = 0.319, p = 0.092), discriminability (r = 0.295, p = 0.120), or bias (r = − 0.218, p = 0.256) (see Fig. 8).
Overall error rates were high (20%) despite the simplicity of the task and despite the fact that participants were informed about the prevalence of mask and real face trials. We note that error rates were somewhat higher in the mask condition (30%) than in the real face condition (10%), meaning that overall, masks were mistaken for faces more often than faces were mistaken for masks. Interestingly, some participants were highly accurate in correctly categorizing the masks. However, accuracy in the mask condition was not explained by accuracy in the real face condition, nor by prior exposure to hyper-realistic face masks. In the final study, we ask whether high-performing individuals are using specific visual cues to support their accurate judgements.
Image analysis
The purpose of the image analysis was to compare the use of visual information by high classification accuracy and low classification accuracy participants in Experiment 2. Our specific interests were: (1) the availability of visual cues—that is, whether mask and face images differed reliably; (2) the nature of any reliable visual cues—specifically, their spatial location; and (3) whether high-performing and low-performing participants made different use of these cues. We addressed these issues by using categorization data from Experiment 2.
The logic of this image analysis is as follows. The appearance of the mask stimuli and the face stimuli can be summarized by generating an average image for each stimulus category (an average mask and an average face). Systematic differences between these two categories can then be visualized by subtracting the average face from the average mask to create a difference image. This difference image indicates which regions of the stimulus are most informative for mask/face classification. Our hypothesis is that high-performing participants tracked this information more closely than low-performing participants. To test this hypothesis, we used categorization responses from Experiment 2 to generate difference images for the high-performing and low-performing subgroups. This allowed us to compare the perceptual difference images (based on participants’ categorization of the stimuli) against the physical difference image (based on the actual stimulus categories). By undertaking this comparison for different slices of the image, we were able to quantify participants’ tracking of category-level regularities across different face regions.
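For concreteness, the core of this logic can be sketched in a few lines of Python. The sketch assumes pre-aligned, equal-sized grayscale images and hypothetical directory names; the study itself built shape-normalized texture averages via landmark warping (Kramer, Jenkins, & Burton, 2017), for which plain pixel averaging stands in here. The perceptual averages for each subgroup would be computed the same way, over the images as that subgroup classified them.

```python
# Sketch of the averaging and subtraction steps; directory names are
# hypothetical, and plain pixel averaging stands in for landmark-based
# shape normalization.
import glob
import numpy as np
from PIL import Image

def average_image(paths):
    """Pixelwise mean of a set of pre-aligned, equal-sized grayscale images."""
    stack = np.stack([np.asarray(Image.open(p).convert("L"), dtype=float)
                      for p in paths])
    return stack.mean(axis=0)

mask_avg = average_image(sorted(glob.glob("masks/*.png")))  # e.g. 74 mask stimuli
face_avg = average_image(sorted(glob.glob("faces/*.png")))  # e.g. 74 face stimuli

# Lighter regions = larger category difference; absolute value ignores sign
veridical_diff = np.abs(mask_avg - face_avg)
```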
Method
Participant subgroups
To establish a strong manipulation of the independent variable (categorization accuracy for masks), we divided participants into performance quintiles (N = 6 per subgroup) and contrasted the highest and lowest quintiles. A 2 × 2 mixed ANOVA with the within-subjects factor of image type (mask, real face) and the between-subjects factor of subgroup (high, low) confirmed that these subgroups were statistically distinct with respect to their classification scores. Consistent with the whole-group analysis, we found a significant main effect of image type, with higher accuracy for real face trials (M = 90.0%, SE = 1.4, CI = 83.6–95.7) than for mask trials (M = 72.5%, SE = 1.5, CI = 65.9–79.1; F(1,10) = 13.76, p = 0.004, η² = 0.58). More importantly, there was also a significant main effect of subgroup, with the high-accuracy group (M = 90.2%, SE = 0.8, CI = 86.7–93.8) reliably outperforming the low-accuracy group (M = 72.1%, SE = 2.1, CI = 62.9–81.2; F(1,10) = 85.44, p < 0.001, η² = 0.89). There was no significant interaction between these factors (F(1,10) = 1.78, p = 0.212).
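For illustration, the quintile split can be sketched as follows; the accuracy values are simulated stand-ins for the 30 participants' mask-classification scores, and the column names are illustrative.

```python
# Sketch of the rank-based quintile split: 30 participants -> 6 per bin.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
scores = pd.DataFrame({
    "participant": np.arange(1, 31),
    "mask_acc": rng.uniform(0.43, 0.91, size=30),  # observed range in Expt 2
})

scores["quintile"] = pd.qcut(scores["mask_acc"], q=5, labels=False)
high_group = scores[scores["quintile"] == 4]  # top quintile (N = 6)
low_group = scores[scores["quintile"] == 0]   # bottom quintile (N = 6)
```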
Face averages
We next constructed six average images (Burton, Jenkins, Hancock, & White, 2005) from the following six image sets: (1) actual masks (N = 74); (2) actual faces (N = 74); (3) perceived masks for high performers; (4) perceived faces for high performers; (5) perceived masks for low performers; and (6) perceived faces for low performers (weighted averages of images as classified; N > 50 for all). Seven images (five masks, two real faces) were excluded from this analysis because the camera angle did not allow accurate landmarking of the photographs (see Kramer, Jenkins, & Burton, 2017 for implementation details). The six weighted texture averages for the remaining images are shown in Fig. 9.
Difference images
To ask what distinguishes masks from real faces, we next computed a difference image (average mask minus average face) separately for the veridical categories, the high-performance group, and the low-performance group. These three difference images are shown in Fig. 9 (lighter regions indicate greater difference). The veridical difference image (Fig. 9, center) indicates that the region surrounding the eyes is especially informative, presumably because the eye holes in the mask can produce local anomalies in appearance (e.g. surface discontinuities if the mask is not flush with the wearer’s face; complexion discontinuities if the skin around the wearer’s eyes is exposed). The question is whether observers pick up on these subtle cues. Visual comparison confirms that the difference image for the high-performer group (Fig. 9, left) closely resembles the veridical difference image (Fig. 9, center). The difference image for the low-performer group (Fig. 9, right) resembles the veridical difference image less closely. This global pattern is perhaps to be expected, given the formation of the subgroups: if high performers did not track the veridical categories, they would not be high performers. However, local variations in this pattern may reveal specific cues that high performers exploit and low performers overlook. We investigated this possibility by comparing correlations between different image slices.
Image correlations
To avoid spurious inflation of correlation values by black background pixels, we first cropped the background from each difference image to create a rectangular face image (300 × 228 pixels) that retained all of the internal features. To allow direct comparison across equally sized regions, we then divided each rectangular image into 30 horizontal slices (10 × 228 pixels; see Fig. 9). Successive rows of pixels in each slice were concatenated to form a single pixel vector (1 × 2280 pixels), in which the grayscale intensity of each pixel is specified by an integer value between 0 (black) and 255 (white). These intensity values formed the input to the correlation analysis.
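A sketch of this slice-wise correlation follows, continuing the illustrative Python above; the random placeholder arrays stand in for the three cropped difference images.

```python
# Sketch of the per-slice correlation between two difference images.
import numpy as np
from scipy.stats import pearsonr

# Placeholders for the cropped (300 x 228) difference images; in practice
# these come from the averaging-and-subtraction step sketched earlier.
rng = np.random.default_rng(2)
veridical_diff = rng.random((300, 228))
high_performer_diff = veridical_diff + 0.05 * rng.random((300, 228))
low_performer_diff = veridical_diff + 0.50 * rng.random((300, 228))

def slice_correlations(diff_a, diff_b, n_slices=30):
    """Pearson r between corresponding horizontal slices of two images."""
    rows = diff_a.shape[0] // n_slices  # 300 / 30 = 10 rows per slice
    return [pearsonr(diff_a[i * rows:(i + 1) * rows].ravel(),  # 10 x 228 -> 2280 px
                     diff_b[i * rows:(i + 1) * rows].ravel())[0]
            for i in range(n_slices)]

high_rs = slice_correlations(high_performer_diff, veridical_diff)
low_rs = slice_correlations(low_performer_diff, veridical_diff)
```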
Figure 9 shows the results of these image correlations (r values), separately for each slice. As can be seen from the figure, correlations between the veridical difference image and the high-performer image are consistently high across image slices (range = 0.87–0.99). Correlations between the veridical difference image and the low-performer image are lower overall and much more variable (range = 0.59–0.95). Most strikingly, there is a distinct notch in correlation values between the low-performer and veridical difference images directly under the eyes (image slice 15; r = 0.59). In fact, this was the lowest correlation in the entire analysis. Importantly, the notch does not appear in the correlations between the high-performer and veridical difference images (image slice 15; r = 0.95).
To summarize, our comparison of mask and face images suggests that the eye surround is the most informative region for separating these two categories. High performers appear to use information below the eye in a way that low performers do not. What information could be in this region? We suggest two possibilities. First, in a real face, the region below the eyes normally includes the lower eyelashes—an area of high local contrast. The masks in our stimulus set do not include eyelashes. If the mask covers the wearer’s eyelashes, it will typically reduce local contrast. Reduced local contrast under the eye may be a cue to mask detection. Second, in a real face, skin complexion below the eyes normally changes gradually on a local scale. The masks in our stimulus set do not necessarily match the complexion of the wearer. If the mask exposes any skin below the wearer’s eyes, it may cause an apparent discontinuity in skin coloration. Discontinuity in complexion under the eye may be a cue to mask detection. Each of these possibilities suggests that the precise fit of the mask around the wearer’s eyes is critical. Shade from the brow will tend to conceal cues in the upper eye region, at least under normal illumination conditions (light source above). However, the same illumination conditions will tend to highlight cues in the lower eye region, making them more salient.
General discussion
Across three studies, we investigated individual differences in hyper-realistic mask detection—specifically, the ability to categorize images as masks or real faces. In Experiment 1, we found large individual differences in a mask/face categorization task for high-realism masks, low-realism masks, and real faces. Although low-realism masks (and real faces) were categorized accurately overall (> 98% correct), high-realism masks were not (40% correct). More importantly, from an individual differences perspective, accuracy in the high-realism condition ranged from floor (5%) to ceiling (100%), despite the consistently high accuracy for other stimulus types.
In Experiment 2, we discarded the low-realism mask condition to focus exclusively on the difficult categorization—hyper-realistic masks vs real faces. Perhaps surprisingly, removing the easy condition improved performance in the difficult condition considerably (74% correct). This seemingly paradoxical result underscores the importance of the context in which a categorization decision is taken. The absence of an obvious category distinction (cf. Experiment 1), combined with information about the distribution of stimuli, presumably led participants in Experiment 2 to approach the task differently. Nevertheless, we still observed a wide range of performance, even in this very different cognitive situation. Accuracy ranged from near chance (43%) to near ceiling (91%). Interestingly, accuracy in the real face condition also varied widely (65–100%). However, performance in these two conditions was uncorrelated and was not explained by previous exposure to hyper-realistic face masks.
Both of these experiments revealed large individual differences in hyper-realistic mask detection, in the sense that some people were much more accurate than others at categorizing masks and real faces. These findings suggest that stable differences in ability may be worth pursuing. It is too early to say whether some individuals exhibit a special talent for this task. Conclusive evidence would require estimates of test–retest reliability and consistently high performance across a range of tasks (Russell et al., 2009; Robertson et al., 2016). Until then, we suggest another possible route to improved detection rates—one that does not depend on screening for high-aptitude individuals. In our image analysis, we asked what high performers are doing that low performers are not. This analysis revealed a candidate visual cue that these subgroups used differently—the area under the eyes. Hyper-realistic mask images and real face images diverged more strongly in this area than in other areas. Moreover, high performers and low performers diverged strongly in the extent to which the area under the eyes predicted their responses. This intriguing finding raises the question of whether mask detection could be improved by drawing attention to this region. If so, it could pave the way for a simple training intervention. This is a tantalizing prospect, especially as benefits of training in face identification tasks have proven difficult to pin down (Towler, White, & Kemp, 2014, 2017; White et al., 2014). Eye-tracking data in combination with accuracy rates, before and after training, should elucidate the potential of this approach.
Finally, it is worth returning to the somewhat artificial nature of this task. The experiment was specifically contrived to encourage detection of hyper-realistic masks. For example, we focused on masks in the task instructions and spelled out the distribution of mask and face stimuli. In view of this strong framing, the detection rate for these masks seems rather low. Nevertheless, it almost certainly overestimates the rate of spontaneous detection when a mask framing is absent. Sanders et al. (2017) reported extremely low rates of spontaneous detection, both for photographic presentations in the lab and live viewing of mask wearers outdoors. On the other hand, none of these studies has measured detection during active social interaction with the mask wearer (e.g. conversation). We expect that, in a more interactive context, additional cues from speech and movement could increase detection rate, but that is a matter for future studies.
Across these studies, we show that distinguishing hyper-realistic masks from real faces is a difficult task. Some people are much better than others at picking out hyper-realistic masks, and these large individual differences are not readily explained by correct categorization of real faces or by prior exposure to hyper-realistic masks. We suggest that they may be explained by differential use of specific visual cues and identify the region under the eyes as a promising candidate.
Abbreviations
RT: Reaction time
References
Behrmann, M., & Avidan, G. (2005). Congenital prosopagnosia: Face-blind from birth. Trends in Cognitive Sciences, 9(4), 180–187.
Bernstein, S. (2010). Masks so realistic they’re arresting the wrong guy. Retrieved from http://articles.latimes.com/2010/dec/08/business/la-fi-mask-20101209. Accessed 4 Oct 2017.
Bindemann, M., & Lewis, M. B. (2013). Face detection differs from categorization: Evidence from visual search in natural scenes. Psychonomic Bulletin & Review, 20(6), 1140–1145.
Bobak, A. K., Bennetts, R. J., Parris, B. A., Jansari, A., & Bate, S. (2016). An in-depth cognitive examination of individuals with superior face recognition skills. Cortex, 82, 48–62.
Burton, A. M., Jenkins, R., Hancock, P. J., & White, D. (2005). Robust representations for face recognition: The power of averages. Cognitive Psychology, 51(3), 256–284.
Burton, A. M., White, D., & McNeill, A. (2010). The Glasgow face matching test. Behavior Research Methods, 42(1), 286–291.
Duchaine, B., & Nakayama, K. (2005). Dissociations of face and object recognition in developmental prosopagnosia. Journal of Cognitive Neuroscience, 17(2), 249–261.
Duchaine, B., & Nakayama, K. (2006). The Cambridge face memory test: Results for neurologically intact individuals and an investigation of its validity using inverted face stimuli and prosopagnosic participants. Neuropsychologia, 44(4), 576–585.
Jenkins, R., White, D., Van Montfort, X., & Burton, A. M. (2011). Variability in photos of the same face. Cognition, 121(3), 313–323.
Kramer, R. S., Jenkins, R., & Burton, A. M. (2017). InterFace: A software package for face image warping, averaging, and principal components analysis. Behavior Research Methods, 49(6), 2002–2011.
Mori, M. (1970). The uncanny valley. Energy, 7(4), 33–35.
Mori, M., MacDorman, K. F., & Kageki, N. (2012). The uncanny valley: The original essay by Masahiro Mori. IEEE Spectrum.
Nightingale, S. J., Wade, K. A., & Watson, D. G. (2017). Can people identify original and manipulated photos of real-world scenes? Cognitive Research: Principles and Implications, 2(1), 30.
Robertson, D. J., Jenkins, R., & Burton, A. M. (2017). Face detection dissociates from face identification. Visual Cognition, 25(7–8), 740–748.
Robertson, D. J., Noyes, E., Dowsett, A. J., Jenkins, R., & Burton, A. M. (2016). Face recognition by Metropolitan Police super-recognisers. PLoS One, 11(2), e0150036.
Rosch, E., Mervis, C. B., Gray, W. D., Johnson, D. M., & Boyes-Braem, P. (1976). Basic objects in natural categories. Cognitive Psychology, 8(3), 382–439.
Russell, R., Duchaine, B., & Nakayama, K. (2009). Super-recognizers: People with extraordinary face recognition ability. Psychonomic Bulletin & Review, 16(2), 252–257.
Sanders, J. G., Ueda, Y., Minemoto, K., Noyes, E., Yoshikawa, S., & Jenkins, R. (2017). Hyper-realistic face masks: A new challenge in person identification. Cognitive Research: Principles and Implications, 2(1), 43.
Towler, A., White, D., & Kemp, R. I. (2014). Evaluating training methods for facial image comparison: The face shape strategy does not work. Perception, 43(2–3), 214–218.
Towler, A., White, D., & Kemp, R. I. (2017). Evaluating the feature comparison strategy for forensic face identification. Journal of Experimental Psychology: Applied, 23(1), 47.
White, D., Kemp, R. I., Jenkins, R., Matheson, M., & Burton, A. M. (2014). Passport officers’ errors in face matching. PLoS One, 9(8), e103510.
Zamost, S. (2010). Exclusive: Man in disguise boards international flight. Retrieved from http://edition.cnn.com/2010/WORLD/americas/11/04/canada.disguised.passenger/index.html. Accessed 4 Oct 2017.
Acknowledgments
We thank Ailish Byrne for coding data and the FaceVar lab group for discussions on earlier versions of this work.
Funding
Economic and Social Research Council (Studentship ES/J500215/1), Prins Bernhard Cultuurfonds (grant E/30/30.13.0630/HVH/IE), and University of York Research Priming Fund (H0022034) to Rob Jenkins.
Availability of data and materials
Data and materials are available upon request.
Author information
Contributions
JGS and RJ conceived and designed the experiments. JGS performed the experiments. JGS and RJ analyzed the data. JGS and RJ contributed reagents/materials/analysis tools. JGS and RJ wrote the paper. Both authors read and approved the final manuscript.
Ethics declarations
Ethics approval and consent to participate
Ethical approval was granted by the departmental ethics committee at the University of York. All participants provided informed consent in advance.
Competing interests
The authors declare that they have no competing interests.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
About this article
Cite this article
Sanders, J.G., Jenkins, R. Individual differences in hyper-realistic mask detection. Cogn. Research 3, 24 (2018). https://doi.org/10.1186/s41235-018-0118-3