Part of our interest in hyper-realistic masks stems from their use in security settings. At first sight, it is difficult to credit that a person wearing a full mask could board a plane unchallenged. How are we to make sense of such incidents? Do they reflect inattention on the part of the observer, or perhaps an unwillingness to confront the mask wearer? Or could it be that, in these situations, hyper-realistic masks are indistinguishable from real faces? In our experiments, almost no one reported noticing the mask, despite attending to the mask and answering several questions about its appearance. This was true for photographic images presented onscreen. It was also true for live confederates presented outdoors. The numbers are sobering. Of the 280 participants who viewed hyper-realistic masks in these studies (60 in Experiment 1; 60 in Experiment 2; 160 in Experiment 3), only two spontaneously reported the mask and only three more reported the mask following further prompting. Interestingly, all five of these participants viewed the mask live (Experiment 3) and at the closer viewing distance of 5 m. These are low detection rates. Evidently, the information available even in near-distance, live viewing (visual detail, 3D form, motion) did not allow viewers to distinguish hyper-realistic masks from real faces with any generality. Nevertheless, the clustering of these few participants by viewing condition suggests that the available information may have some diagnostic value, above and beyond that which is available at longer viewing distances or in photographic presentations.
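As a rough illustration of how low these rates are, the counts reported above can be converted into proportions with an approximate uncertainty interval. The sketch below is not part of the reported analyses: it assumes simple pooling across the three experiments, and the Wilson score interval and the function name `wilson_interval` are illustrative choices.

```python
from math import sqrt


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


# Participants who viewed hyper-realistic masks, as reported in the text.
viewers = {"Experiment 1": 60, "Experiment 2": 60, "Experiment 3": 160}
total = sum(viewers.values())

spontaneous = 2   # reported the mask unprompted
prompted = 3      # reported the mask only after further prompting
detected = spontaneous + prompted

# Pooling across experiments is an assumption made for illustration only.
spontaneous_rate = spontaneous / total
overall_rate = detected / total
low, high = wilson_interval(detected, total)

print(f"Spontaneous detection: {spontaneous}/{total} = {spontaneous_rate:.1%}")
print(f"Any detection:         {detected}/{total} = {overall_rate:.1%}")
print(f"Approximate 95% interval for any detection: {low:.1%} to {high:.1%}")
```

Because all five detections came from a single viewing condition, a pooled rate of this kind understates detection in that condition and overstates it elsewhere. It is best read as a summary of scale rather than an estimate for any one setting.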
Other aspects of our results bear out this interpretation. In Question 3 of each experiment, we asked participants to guess whether they were in the Mask condition or the No Mask condition (2AFC). The intention here was to draw out more covert detection of hyper-realistic masks, perhaps arising from an uncanny valley phenomenon. We anticipated that the wording of Question 3, combined with the sensitivity of 2AFC as a measure, might lead to a ceiling effect in responses, with all participants guessing that they were in the Mask condition. As it turned out, 2AFC performance did not approach ceiling in any of the experiments (with the planned exception of the low-realism masks in Experiment 3). Instead, ‘mask’ responses were the minority in Experiment 1, Experiment 2 and the Far condition of Experiment 3. Even in the Near condition of Experiment 3, ‘mask’ responses were not reliably above 50%.
Presumably, there must be some critical distance at which viewers spontaneously and accurately distinguish hyper-realistic masks from real faces. After all, painted silicone and human skin are different materials with different surface properties (Motoyoshi, Nishida, Sharan, & Adelson, 2007). We do not know what this critical distance might be, but we can now be confident that the Near distance in Experiment 3 (5 m) exceeds it. That finding may have implications for mask detection in the real world. Classic work on proxemics (Hall, 1966) divides interpersonal space into four radial zones. In this scheme, intimate distance (0–1.5 feet; 0–0.5 m) is associated with physical contact and whispering, personal distance (1.5–4 feet; 0.5–1.2 m) is reserved for interactions among close friends or family, social distance (4–12 feet; 1.2–3.7 m) accommodates interactions among acquaintances, and public distance (>12 feet; >3.7 m) is occupied by strangers. Our upper bound of 5 m suggests that any critical distance for mask detection falls within social space (4–12 feet; 1.2–3.7 m) or closer in this scheme. Nevertheless, most people do not enter this space. Strangers in particular tend to be seen at longer range, where we now know mask detection is unreliable. One important exception is photo-ID checks (e.g., passport control), which are typically carried out at a distance of one or two metres (Verhoff, Witzel, Kreutz, & Ramsthaler, 2008; Noyes & Jenkins, 2017). Future studies should assess mask detection performance at this closer range. However, anecdotal reports of mask use on airlines (Zamost, 2010) and the prevalence of identification errors in live-to-photo comparisons (Kemp, Towell, & Pike, 1997; Davis & Valentine, 2009; White et al., 2014) do not inspire confidence.
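The zone boundaries described above can be written as a simple lookup, which makes it easy to see where particular viewing distances fall. This is a minimal sketch using the metric boundaries quoted in the text. The function name `proxemic_zone` is hypothetical, and treating each upper boundary as inclusive is an assumption, since the published ranges share their endpoints.

```python
# Upper boundaries (in metres) of Hall's three inner zones, as quoted in the text.
# Anything beyond the last boundary counts as public distance.
ZONE_UPPER_BOUNDS_M = [
    (0.5, "intimate"),
    (1.2, "personal"),
    (3.7, "social"),
]


def proxemic_zone(distance_m):
    """Return the name of the proxemic zone for a viewing distance in metres.

    Assumption: each upper boundary belongs to the nearer zone.
    """
    if distance_m < 0:
        raise ValueError("distance must be non-negative")
    for upper_bound, name in ZONE_UPPER_BOUNDS_M:
        if distance_m <= upper_bound:
            return name
    return "public"


# Distances mentioned in the text: the Near condition and typical photo-ID checks.
for label, metres in [("Near condition", 5.0),
                      ("photo-ID check, one metre", 1.0),
                      ("photo-ID check, two metres", 2.0)]:
    print(f"{label}: {metres} m -> {proxemic_zone(metres)} distance")
```

On these boundaries, the 5 m Near condition sits in public space, whereas photo-ID checks at one or two metres fall in personal or social space.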
These proxemic considerations raise some interesting questions about the appearances of hyper-realistic masks and their social effects. To date, mask manufacturers have followed a single strategy for evading detection, namely the pursuit of ever greater realism. An interesting direction for future research would be to assess the viability of a complementary strategy: evading detection by manipulating the behaviour of onlookers. It is almost tautological that the less approachable a mask looks, the less inclined viewers will be to approach it and the less likely they will be to reach the critical distance for detection. A similar argument could be made for attractiveness. To the extent that facial attractiveness summons attention (Shimojo, Simion, Shimojo, & Scheier, 2003; Sui & Liu, 2009) and increases dwell time (Leder, Tinio, Fuchs, & Bohrn, 2010), a less attractive mask should receive less scrutiny. Based on such principles, it may be possible to devise a hyper-realistic mask that deflects scrutiny by (1) maximising viewing distance and (2) minimising visual attention. A brutish-looking pickpocket might arrive at a different set of priorities, favouring a highly approachable mask that allows them to move closer to a target.
In future studies, it would be interesting to isolate the information that leads viewers to guess that they are in the Mask condition. The fact that ‘mask’ responses were more prevalent in the Near condition than the Far condition suggests that high spatial frequency information plays an important role. However, it is not clear whether decisions are driven by local visual features (e.g., surface discontinuities around the eyes or mouth), by more holistic visual features (e.g., wrinkle patterns over the whole face), or by higher-level inferences that are abstracted from such information (e.g., social attributions based on facial appearance). If reliable cues can be established, they could potentially form the basis of a training programme aimed at enhancing mask detection. For passive viewing situations, such as reviewing recorded footage, this could be as simple as encouraging observers to monitor for particular visual features.
For interactive situations, such as live identity checks, more active approaches may be feasible. Our informal observation is that wearing a hyper-realistic mask attenuates some forms of facial movement. Even with good contact between the face and the mask, manipulating the mask places additional demands on facial muscles, relative to normal facial movement. Moreover, movements that may be clear and distinct at the internal surface of the mask (where they are initiated) will be partly absorbed by the silicone on their way to the external surface (where they are seen). These attenuation effects may be negligible for coarse movements such as rotation of the head on the neck, and opening and closing of the jaw. Nevertheless, emotional expressions such as smiles and frowns generally appear muted, and subtle expressions are often lost altogether.
The overall facial impression, at least in extended interactions, is one of blunted animacy. It is possible that, under appropriate testing conditions, this impression might be enough to cue detection of a hyper-realistic mask, perhaps by tipping the interaction into the uncanny valley. However, it may also encourage false positives for low-animacy real faces. Thus, blunted animacy in the face may be more diagnostic when it is paired with incongruous animacy cues from the body or voice. Various aspects of facial appearance, including apparent age, gender and emotion, can shape viewers’ expectations about how a person is likely to move and speak (e.g., Lander, Hill, Kamachi, & Vatikiotis-Bateson, 2007; Johnson, McKay, & Pollick, 2011; Van den Stock, Righart, & De Gelder, 2007; Montepare & Zebrowitz-McArthur, 1988). Violations of those expectations, such as sprinting centenarians, may allow viewers to infer the presence of a mask, even if the mask itself is entirely convincing.
Speech could be revealing for other reasons too. Normal speech comprehension is strongly supported by visual lip-reading (Campbell, 2008; McGurk & MacDonald, 1976). However, the lips of a hyper-realistic mask fully cover the lips of the wearer (Fig. 1). This arrangement has a number of implications for speech and lip-reading. First, it introduces a physical barrier between the wearer’s lips, presumably impeding production of phonemes that require contact between the lips (e.g., /b/, /p/, /m/), or between the teeth and the lower lip (e.g., /f/, /v/). Second, it reduces the pliability of the whole mouth area, presumably impeding articulation more generally. Reduced lip movement implies reduced visual support for speech understanding (Campbell, 2008). It also suggests that hyper-realistic masks may affect the auditory stream in distinctive ways. Ironically, auditory information may provide the best hope of solving this difficult visual task.
Perception of emotional expression, uncanny valley effects, cue integration and speech comprehension are all matters that can be unpicked experimentally. Our observation (Experiment 1) of elevated detection rates for participants with prior knowledge of hyper-realistic masks suggests that training to enhance performance is possible at least in principle. The optimal form of training remains to be determined.
We also tested for other-race effects in mask detection. Other-race effects were originally observed in the context of face identification – a task that requires fine perceptual discriminations. Given that distinguishing hyper-realistic masks from real faces also requires fine perceptual discriminations, we wondered whether performance would be poorer for other-race faces than for own-race faces. The evidence on this particular point was not very clear. Floor effects in the Open question and Prompted question make it difficult to draw any conclusions about race effects in overt detection, beyond noting that the task defeated own-race and other-race viewers alike. The same manipulation did have some impact on responses to the 2AFC item, but even here the different experiments present a mixed picture. Experiment 1 (UK participants) and Experiment 2 (Japanese participants) were both based entirely on Western face images. Comparing across experiments, Japanese viewers were somewhat more likely than UK participants to guess that they were in the Mask condition (rather than the No Mask condition), but this difference was not statistically significant. Experiment 3, using a fully crossed design and a larger sample, found a significant difference in the same direction, namely that other-race viewers were more likely than own-race viewers to guess that they were in the Mask condition. On its own, this effect might suggest an other-race advantage in distinguishing real faces from hyper-realistic masks, which would contrast with the other-race disadvantage that is standard in identification tasks. However, the Real face condition undermines this interpretation – for real faces, too, other-race viewers were disproportionately likely to guess that they were in the Mask condition. That finding is not consistent with an other-race advantage in distinguishing real faces from hyper-realistic masks. Instead, it suggests an overall bias towards guessing ‘mask’.
This interpretation of the 2AFC data accords with the array challenge findings (Experiments 1 and 2). In the array challenge, Japanese participants picked out the mask significantly less often than the UK participants. Given that the stimuli were Western face images, this pattern resembles the expected disadvantage for other-race faces. It is not obvious how one might square an other-race disadvantage in the array challenge with an other-race advantage in the 2AFC. However, no such tension arises between an other-race disadvantage in the array challenge and a decision bias in the 2AFC.
Why might other-race viewers be especially inclined to guess that they are in the Mask condition? One possibility is that, at least in the campus locations tested, other-race faces are simply less prevalent than own-race faces. That being the case, if the confederate presents an other-race face, the participant has to weigh the balance of probabilities. Either they just happen to be witnessing a (relatively) rare event, or they are subject to an experimental manipulation. Presumably, some proportion of participants finds the latter explanation more compelling than the former. If this argument is sound, we expect that equating the frequencies of own-race and other-race stimuli in a laboratory experiment should give rise to an other-race disadvantage.
Hyper-realistic masks fool most people most of the time. This finding should be unsettling, not least because it indicates a new frontier in deception. Covering the face may be grounds for suspicion when the intent is to conceal identity. Yet, historically, such deception has been easy to detect. In hyper-realistic masks, we confront the prospect of face coverings that shroud the wearer, yet are themselves accepted as real faces. It is difficult to estimate how many of these masks are already in circulation. However, as documented cases attest, their proliferation poses a challenge for face recognition in applied settings, including crime prevention and border control. We expect that increasingly sophisticated manufacturing techniques will continue to improve the quality of these masks and to drive prices down. Keeping pace with these improvements will require increasingly sophisticated countermeasures, perhaps including consciousness raising, personnel development and supplementary imaging methods. Machine vision researchers have made some interesting progress on this front (e.g., Erdogmus & Marcel, 2014; Kose & Dugelay, 2013). The conditions are conducive to a new arms race in face identification between deception and detection.