That person is now with or without a mask: how encoding context modulates identity recognition

Garcia-Marques, Teresa; Oliveira, Manuel; Nunes, Ludmila

doi:10.1186/s41235-022-00379-5

Original article
Open access
Published: 01 April 2022

Cognitive Research: Principles and Implications volume 7, Article number: 29 (2022)

Abstract

Previous research has mostly approached face recognition and target identification by focusing on face perception mechanisms, but memory mechanisms also appear to play a role. Here, we examined how the presence of a mask interferes with the memory mechanisms involved in face recognition, focusing on the dynamic interplay between encoding and recognition processes. We address two known memory effects: (a) effects of matching study and test conditions (i.e., by presenting masked and/or unmasked faces) and (b) testing expectation effects (i.e., knowing in advance that a mask could be put on or taken off). Across three experiments using a yes/no recognition paradigm, the presence of a mask was orthogonally manipulated at the study and the test phases. All data showed no evidence of matching effects. In Experiment 1, the presence of masks either at study or test impaired the correct identification of a target. But in Experiments 2 and 3, in which the presence of masks at study or test was manipulated within participants, only masks presented at test impaired face identification. In these conditions, test expectations led participants to use similar encoding strategies to process masked and unmasked faces. Across all studies, participants were more liberal (i.e., used a more lenient criterion) when identifying masked faces presented at the test. We discuss these results and propose that to better understand how people may identify a face wearing a mask, researchers should take into account that memory is an active process of discrimination, in which expectations regarding test conditions may induce an encoding strategy that enables overcoming perceptual deficits.

Introduction

One of the public health guidelines established as a response to the current coronavirus outbreak was the use of masks that cover the bottom half of an individual’s face. Here, we draw on the memory processes involved in facial recognition to examine how the occlusion of the bottom half of a face impacts facial recognition and target identification.

Before the COVID-19 pandemic, researchers had already addressed how facial occlusions—the use of sunglasses, hats/caps, scarfs, beards, medical masks, and religious veils—impact targets’ recognition/identification (e.g., Davies & Flin, 1984; Hockley et al., 1999; Mansour et al., 2017; Nguyen & Pezdek, 2017; Righi et al., 2012; Terry, 1994). All these forms of facial occlusion appeared to deteriorate the recognition of individuals’ identities (e.g., Davies et al., 1977; McKelvie, 1976; Patterson & Baddeley, 1977).

In line with these studies, Freud et al. (2020) showed evidence indicating that face masks impaired identity recognition, regardless of whether they were present when a face was initially encountered (i.e., at encoding) or during a recognition test. Specifically, in their Experiment 2, Freud et al. (2020) found that the presence of a mask in either the study phase or the test phase equally impaired facial recognition. However, this disturbance did not occur when the faces were inverted, that is, when the holistic processing of the faces was already disrupted (e.g., Richler et al., 2011). Thus, masks appeared to impair face recognition by disrupting the holistic processing that sustains our ability to identify or recognize a face.

Yet the effects of occluding a face initially, at the encoding phase, or subsequently, at the test phase, might not be independent. Specifically, the match or mismatch of encoding and test conditions might influence whether occlusions impair recognition. For instance, sunglasses occluding the region of the eyes impair target recognition more when encoding and test conditions mismatch (i.e., when the target is initially seen with glasses and subsequently shown without glasses or vice versa) than when the target is shown in both encoding and test with sunglasses (e.g., Leder et al., 2011; Patterson & Baddeley, 1977; Righi et al., 2012; Terry, 1993, 1994). Also, the presence of occlusions during encoding might be more detrimental than during retrieval, as removing the glasses during testing decreases accurate facial identification more than adding them (Douma et al., 2012). Manley et al. (2019) showed that matching appears to occur for masked faces in a lineup identification paradigm, where they orthogonally manipulated the presence of a facial mask both at encoding (seeing an unfamiliar unmasked face vs. a masked face) and retrieval (identifying an unmasked face vs. a masked face). The identification of a face that had been encoded as a masked face was higher in a masked-face lineup than in an unmasked-face lineup, showing matching effects.

Only designs that orthogonally manipulate the presence of a mask at encoding and retrieval are able to detect the interplay between encoding and retrieval conditions that is assumed to lead to matching effects. This interplay was first hypothesized within the encoding-specificity framework (e.g., Tulving & Thomson, 1973), assuming that “what is stored is determined by what is perceived and how it is encoded, and what is stored determines what retrieval cues are effective in providing access to what is stored” (p. 353). Congruent with this hypothesis, research has indicated that memory performance increases when the cognitive processes involved in encoding operations are similar/relevant to those involved in retrieval operations, either because the processing conditions of retrieval and encoding are the same (i.e., transfer-appropriate processing; see Franks et al., 2000, for a review) or because the conditions of encoding are relevant to or appropriate for the retrieval strategy (Gardiner et al., 1972; Nairne, 2002). In the recognition of occluded faces, matching effects supposedly occur through transfer-appropriate processing (see Manley et al., 2019), related to the holistic processing imposed by a full face and the feature-based processing imposed by the presence of a mask. The research on transfer-appropriate processing suggests that the impact of a mask on facial recognition also likely depends on the dynamic relationship between encoding and retrieval conditions, so that we should expect to find evidence of matching effects when testing for the impact of masks in face recognition.

The reviewed literature suggests that to better understand the current daily impact of a mask in face recognition, we need more evidence than that offered by studies that focus on how masks interfere with face perception mechanisms and that did not orthogonally manipulate the presence of masks in the encoding and test phases. We need evidence about how masks interfere with memory mechanisms and whether such memory mechanisms may help to overcome deficits promoted by masks interfering with the holistic apprehension of a face. The hypothesis that a matching effect is likely to occur with regard to masked and unmasked faces at encoding and retrieval (faces studied with a mask should be better remembered with a mask than without it) is one example of how a memory mechanism can overcome negative effects of masks on face recognition.

Another example relates to the fact that a “masked-faces” memory context may lead faces to be processed differently. A case in point is the memory testing effect, which shows that the expectations of a memory test can induce the use of different encoding strategies (e.g., Finley & Benjamin, 2012). Knowing in advance that our environment is dynamic and that a specific face can later be met with or without a mask is likely to lead individuals to adapt their encoding strategies. Given that memory is an active process, individuals can exercise strategic control over encoding and recall processes (for a review, see Tullis & Benjamin, 2015) and adapt their encoding strategies to what they anticipate encountering in the test phase (Benjamin, 2007; Dunlosky & Kane, 2007; Finley et al., 2010; Serra & Metcalfe, 2009). Illustrating this, Finley and Benjamin (2012) induced participants to expect a cued recall test or a free recall test and showed that those who received a test that matched their expectations outperformed those who received a mismatched test. Garcia-Marques et al. (2015) followed up on this hypothesis, showing that experienced retrieval contexts can affect subsequent encoding. That is, the specific requirements of retrieval contexts appear to affect subsequent encoding with consequences for recognition and free recall performance. In their experiments, after learning the structure of the test, participants adopted an encoding strategy that avoided a conceptual-based encoding and instead relied on feature-based encoding, which facilitated performance in the test. Performance at retrieval depends on individuals’ ability to attend, during encoding, to the cues that are relevant to subsequent recognition (e.g., Eysenck, 1979; Geiselman et al., 1986; Jacoby et al., 1979; Nairne, 2002; Roediger & Guynn, 1996).

In the case of facial recognition, although holistic processing usually supports it, individuals experience different facial features as varying in their diagnostic value for facial identity apprehension (e.g., Nam et al., 2012; but see Sporer, 1991). Previous research suggests that the region of the eyes has some of the most informative value for facial identification (e.g., Davies et al., 1977; Gosselin & Schyns, 2001; Haig, 1985, 1986; Christie et al., 1981) and that facial identification accuracy is more affected by the omission or alteration of upper facial features (e.g., eyes) than lower facial features (e.g., mouth; see Davies et al., 1977; Sinha et al., 2006). The studies developed by Sadr et al. (2003) clarified that not only the eyes but also the eyebrows are among the most important facial features affecting face recognition, as their participants were better at correctly identifying famous faces lacking eyes than lacking eyebrows. If this is the case, masks do not cover the most diagnostic cues for facial recognition.

Several other studies have shown that, in a mutating memory context, individuals adapt their encoding strategies, impacting their face recognition abilities (e.g., Light et al., 1979; Sporer, 1991; Wells & Hryciw, 1984). For instance, by instructing participants to make attributional judgments to disguised faces, Patterson and Baddeley (1977) obtained an improvement in face recognition when compared with undisguised faces (but see Davies & Flin, 1984, for a null effect). Also, if instructions call attention to the individuality of a face, participants will likely attend to more detailed features of the face (Schwartz & Yovel, 2016; Hugenberg et al., 2010). For example, presenting a person’s name along with a face improves the recognition of that face relative to when no name is presented (Schwartz & Yovel, 2016).

Although test expectations may be provided by task instructions, it is most likely that individuals implicitly appraise the context and create their own expectations about the memory environment. Participants in a heterogeneous encoding context of masked and unmasked targets are likely to apprehend the mutating features of such an environment and encode the targets’ distinctive features more than they would in a homogeneous encoding context of solely masked or unmasked targets. Previous research showed that recognition can be modulated by encoding conditions, mostly defined by the composition of study lists: mixed lists (i.e., items with and without a manipulated feature appear intermixed in the same list) are compared with pure lists (i.e., items with and without the manipulated feature appear in separate lists) (Jonker et al., 2014; Mulligan & Peterson, 2015; see McDaniel & Bugg, 2008, for a review). The impact of list composition has been detected in production effects (e.g., MacLeod et al., 2010), generation effects (e.g., Slamecka & Graf, 1978), bizarreness effects (e.g., Einstein & McDaniel, 1987), perceptual interference effects (Nairne, 1988), and picture-complexity-superiority effects (Nguyen & McDaniel, 2015). These effects occur in mixed-list but not in pure-list designs, suggesting that encoding conditions that foster the processing of distinctive features may be able to overcome the negative impact of the disruptions of holistic processing. In line with this, Winograd (1981) showed that when participants focused on the processing of distinctive features between items, their performance in a subsequent facial identification task was better than when participants were not led to focus on those distinctive features.

In sum, although evidence shows that masks interfere with holistic processing, disturbing facial identification/recognition, the role that encoding and retrieval dynamics might play in this process is not yet clear. To help clarify this issue, we tested how features of the memory context intervene in masked-face identification/recognition. We tested for evidence of: (a) matching effects (better performance in matching than mismatching study-test conditions), (b) list composition effects (mixed lists compared to pure lists would make processing of masked and unmasked faces more similar), and (c) effects of previous testing (whether individuals’ prior expectations regarding future test features will guide their encoding).

Current studies

Our research plan encompasses a set of three independent face recognition experiments, with a design that fully crossed masked versus unmasked conditions, presented at study and at test. In Experiment 1, we used pure lists in a between-participants manipulation. Two of these four experimental conditions (the mismatch conditions) are identical to the ones used by Freud et al. (2020). By using all possible combinations of masked vs. unmasked at study or at test, we aim to clarify whether Freud et al.’s (2020) results replicate in a context where participants experience a mutating environment, and whether the effect is sustained by two simple main effects or instead emerges from an interaction between study and test conditions. We expected to find that masks generally impair target identification/recognition, given the known interference with holistic perceptual processing. However, memory processes can also interfere with performance in a different way. Thus, we expected that masks would promote less impairment if they were present both at study and test, suggesting that when individuals study a face with a mask, they will subsequently better recognize the upper facial region (i.e., a matching effect).

Experiments 2 and 3 tested whether the same interplay between study and test conditions occurred when masked faces were mixed with unmasked faces, that is, in a within-participants design with mixed lists. We assumed that in this memory context participants would better attend to distinctive features of the upper half of the face, regardless of whether the face was wearing a mask. If that is the case, the presence of a mask at encoding should not deteriorate participants’ memory performance (measured by memory sensitivity). This memory performance should become more dependent on the features of the stimuli that better represent a “distinctive” cue (i.e., a cue that is uniquely associated with the to-be-remembered item; see Roediger & Guynn, 1996), namely the region of the eyes.

In Experiment 3, we directly tested the relevance of test expectations offered by instructions. To do so, we compared conditions where the instructions directed individuals’ attention to features that are likely to change between the study and the test phases (i.e., in the test phase the same face could appear with or without a mask) with conditions where instructions did not direct attention to any particular features.

We tested our hypotheses by using signal detection theory (SDT) and thus assessing the sensitivity (d’) and bias or decision criterion (c) indexes. According to SDT (e.g., Kadlec, 1999; Kellen et al., 2021; Lockhart & Murdock, 1970; Van der Kellen et al., 2008), better accuracy is defined as a higher sensitivity (d’) of target identity, and response caution is defined by using a more conservative criterion in making a positive identification.
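For reference, both indexes are conventionally derived from the hit rate H and the false-alarm rate FA, with z denoting the inverse of the standard normal cumulative distribution function. The article does not spell out its formulas, so the standard equal-variance Gaussian definitions below are an assumption:

```latex
d' = z(H) - z(FA), \qquad c = -\tfrac{1}{2}\,\bigl[\, z(H) + z(FA) \,\bigr]
```

Under these definitions, a larger d′ indicates better discrimination of studied from new faces, c > 0 indicates a conservative criterion (fewer "I recognize the person" responses), and c < 0 indicates a liberal one.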

For the between-participants design, we expected differences in d’ both in the study and the test conditions, such that unmasked faces would always be better identified than masked faces, regardless of being presented at study or test. The matching hypothesis would translate into a specific interaction between the study and test masked versus unmasked conditions, with performance being higher for matching conditions than for mismatching conditions. The same would be less likely to occur with a mixed-list design in Experiments 2 and 3. If, as expected, participants use the same diagnostic features to similarly encode masked and unmasked faces, no main effect of study conditions should emerge, and matching effects should be less likely to occur. In Experiment 3, we expected to clarify the role that instructions explicitly directing attention to the mutating features of the environment play in preventing the impact of a mask at encoding.

Differences in the decision criterion (c) between experimental conditions can occur because the presence of a mask either reduces or increases the likelihood of participants saying they recognize a face (positive responses). In the first case, responses should show that participants were more cautious (higher c) in providing an identification when the face is wearing a mask or more lenient (lower c) in providing an identification when the face is not wearing a mask. As an exploratory hypothesis, we also analyzed whether mixed lists induced more lenience, as a result of participants having followed an encoding strategy more independent of the presence of a mask.

Experiment 1

Participants

A sample of 169 Portuguese students (82% women) with a mean age of 21 years (SD = 4.89; range 18–27) participated in this study in exchange for credits in an introductory psychology course. For power analysis, we relied on Shapiro and Penrod’s (1986) meta-analysis, which reported an effect size for correct identifications (hits) of disguised faces of d = 0.71, and considered an effect size of f = 0.25. Power analysis using G*Power (Faul et al., 2007) suggested the need for at least 128 participants to detect an effect size of f = 0.25 with 80% power and α = 0.05, regarding the detection of the two main effects and the interaction associated with the planned between-participants design (study list: unmasked face vs. masked face × test list: unmasked face vs. masked face).
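As a rough plausibility check on the reported sample size, the power of a one-degree-of-freedom effect can be approximated with a normal distribution. This is only a sketch: it is not the exact noncentral-F computation that G*Power performs, the function name `approx_power` is hypothetical, and the noncentrality parameter is assumed to be f² × N.

```python
from statistics import NormalDist


def approx_power(f, n_total, alpha=0.05):
    """Normal approximation to the power of a 1-df F test (not G*Power's exact method)."""
    norm = NormalDist()
    delta = (f ** 2 * n_total) ** 0.5      # square root of the assumed noncentrality f^2 * N
    z_crit = norm.inv_cdf(1 - alpha / 2)   # two-sided critical value
    # Probability of exceeding the critical value in either tail
    return norm.cdf(delta - z_crit) + norm.cdf(-delta - z_crit)


# Planned minimum sample (N = 128) versus the sample actually collected (N = 169)
print(round(approx_power(0.25, 128), 2))
print(round(approx_power(0.25, 169), 2))
```

With f = 0.25 and N = 128 the approximation lands close to the 80% target, and the larger collected sample yields higher approximate power.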

Design

This experiment has a 2 × 2 between-participants factorial design, with face type at study (masked vs. unmasked) and face type at test (masked vs. unmasked) as independent variables.

Materials

The full-face stimuli were extracted from Face Research Lab London Set (DeBruine & Jones, 2017). Thirty-two colored photographs of faces (16 men, 16 women) portraying neutral expressions were selected from the central gaze directions set. The faces selected had similar ages and similar hairstyles (see Fig. 1).

Face masking was implemented using the OpenCV v3.4.2 and dlib v19.19 modules within a Python 3.7 environment. A masked version of each face image was created by overlaying an edited image of a medical-looking face mask (retrieved from Google Images) onto each face image. Because faces differed in size and structure, we developed a program that dynamically resized and fit the mask image to a fixed configuration of facial landmarks defining the facial region typically covered by a face mask. To address variations in facial shape, each face’s landmarks were dynamically determined by a histogram of oriented gradients (HOG)-based face detector (Dalal & Triggs, 2005) pretrained on a large set of faces under highly variable conditions of expression and environmental factors (300-W database; Sagonas et al., 2016).
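The geometric step of fitting a mask image to detected landmarks can be sketched as follows. The article does not say which landmarks defined the masked region, so this sketch assumes the common 68-point layout (indices 0 to 16 tracing the jawline, 27 to 30 the nose bridge) and takes the upper nose bridge as the top edge of the mask. The function names `mask_box` and `scale_factors` are hypothetical, and the landmark detection and image overlay themselves (done with dlib and OpenCV in the original program) are omitted.

```python
def mask_box(landmarks):
    """Return (x, y, width, height) of the region a mask would cover.

    `landmarks` is a list of 68 (x, y) tuples in the assumed 68-point layout.
    """
    jaw = landmarks[0:17]
    left = min(x for x, _ in jaw)
    right = max(x for x, _ in jaw)
    top = landmarks[28][1]            # upper nose bridge (assumed upper mask edge)
    bottom = max(y for _, y in jaw)   # lowest jaw point, i.e. the chin
    return left, top, right - left, bottom - top


def scale_factors(mask_size, box):
    """Horizontal and vertical factors that stretch the mask image onto the box."""
    mask_w, mask_h = mask_size
    _, _, box_w, box_h = box
    return box_w / mask_w, box_h / mask_h


# Synthetic landmarks for illustration only: a V-shaped jaw and one nose-bridge point.
points = [(0, 0)] * 68
for i in range(17):
    points[i] = (100 + 10 * i, 300 - abs(8 - i) * 10)
points[28] = (180, 150)

box = mask_box(points)
print(box, scale_factors((320, 300), box))
```

Because the box is recomputed from each face's own landmarks, the same mask image is stretched differently for wide and narrow faces, which is the behavior the paragraph above describes.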

At study, all the 32 faces were presented either with a mask (pure masked list) or without a mask (pure unmasked list). For the test phase, half of the studied faces were presented (i.e., old items). For counterbalancing these materials, each gender set of 16 faces was randomly divided into two subsets of eight faces. The test lists either presented one or the other subset. The faces used in the test phase were the exact same faces that had been studied but they could be presented as they had been studied (i.e., masked or unmasked) or with masks added or removed, relative to the study phase. A total of 20 additional faces taken from the same database (10 men, 10 women) were chosen to be randomly presented in the test phase as “new” faces.
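The counterbalancing scheme just described can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' procedure: the face identifiers, the function name `build_test_lists`, and the fixed random seed are all hypothetical.

```python
import random


def build_test_lists(men, women, new_faces, seed=0):
    """Split each gender set in half and build two 36-item test lists."""
    rng = random.Random(seed)
    men, women = men[:], women[:]   # copy so the caller's lists are not reordered
    rng.shuffle(men)
    rng.shuffle(women)
    half_m, half_w = len(men) // 2, len(women) // 2
    old_a = men[:half_m] + women[:half_w]
    old_b = men[half_m:] + women[half_w:]
    lists = []
    for old in (old_a, old_b):
        # Each test list holds one subset of studied faces plus all new faces
        items = [(face, "old") for face in old] + [(face, "new") for face in new_faces]
        rng.shuffle(items)          # random presentation order at test
        lists.append(items)
    return lists


men = [f"M{i:02d}" for i in range(16)]
women = [f"W{i:02d}" for i in range(16)]
new_faces = [f"N{i:02d}" for i in range(20)]
list_a, list_b = build_test_lists(men, women, new_faces)
print(len(list_a), len(list_b))
```

Each list contains 16 old faces (8 men, 8 women) and the same 20 new faces, and the two lists together cover all 32 studied faces.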

Procedure

Participants were invited to take part in a face memory study, and after informed consent was obtained, they accessed a link to a Qualtrics survey that supported the experimental procedure and guaranteed the equal distribution of participants across the four experimental conditions, defined by the type of face studied (unmasked face vs. masked face) and used at test (unmasked face vs. masked face). The instructions stressed that in the memory test, participants should recognize not the photograph but the person in it: “Try to remember the person behind the mask to be able to correctly recognize her or him.” In the study phase, each participant attended to 32 faces randomly presented at the center of the screen. For half of the participants, those faces belonged to a pure list of unmasked faces, and for the other half the faces belonged to a pure list of masked faces. Each face was shown for 10 s. Then, participants performed a filler perceptual task for 15 min (estimating the width or the length of an image) in order to displace the content from working memory. In the test phase, participants saw 16 studied and 20 new faces, presented randomly at the center of the screen. These faces either matched or mismatched the study mask condition. On each screen, two statements were presented below the face: “I recognize the person” and “This is a new person.” Participants selected the one that better represented their answer. At the end, participants provided their demographic data (age and gender) and were thanked for their participation.

Dependent measures

The proportions of correct face recognition (hits) and incorrect positive responses (false alarms; FAs) were calculated for each participant. These were used to calculate d′ as an index of sensitivity and c as an index of general response tendencies (see, for example, Kadlec, 1999).
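A minimal sketch of how these two indexes can be computed from one participant's response counts is given below. It assumes the standard equal-variance Gaussian model. The log-linear correction for extreme rates is an assumption added here for robustness, as the article does not report how hit or false-alarm rates of 0 or 1 were handled. The function name `sdt_indexes` and the example counts are hypothetical.

```python
from statistics import NormalDist


def sdt_indexes(hits, n_old, false_alarms, n_new):
    """Return (d_prime, c) under the equal-variance Gaussian SDT model."""
    # Log-linear correction (an assumption, not stated in the article):
    # add 0.5 to each count and 1 to each total so rates never reach 0 or 1.
    hit_rate = (hits + 0.5) / (n_old + 1)
    fa_rate = (false_alarms + 0.5) / (n_new + 1)
    z = NormalDist().inv_cdf
    d_prime = z(hit_rate) - z(fa_rate)        # sensitivity
    c = -0.5 * (z(hit_rate) + z(fa_rate))     # criterion; positive = conservative
    return d_prime, c


# Hypothetical participant: 10 hits on 16 old faces, 6 false alarms on 20 new faces
d_prime, c = sdt_indexes(10, 16, 6, 20)
print(round(d_prime, 2), round(c, 2))
```

The denominators of 16 and 20 match the numbers of old and new faces shown at test in Experiment 1.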

Results

We used a 2 × 2 between-participants ANOVA to analyze the proportion of hits, sensitivity (d′), and bias (c). Post hoc analysis supporting the interpretation of the interactions relied on Tukey statistics.

Proportion of correct identifications (hits)

Although faces studied without a mask were better identified than faces studied with a mask (M = 0.61, SE = 0.02 and M = 0.57, SE = 0.02, respectively), this difference did not reach conventional levels of significance, F(1, 164) = 3.26, MSE = 0.09, p = 0.068, η_p² = 0.02. Neither the main effect of test condition, F(1, 164) = 2.44, MSE = 0.06, p = 0.129, nor the interaction, F(1, 164) = 0.01, n.s.,^{Footnote 1} was significant.

Sensitivity (d′)

In this analysis, only the study conditions showed a reliable main effect, F(1, 164) = 4.97, MSE = 1.15, p = 0.027, η_p² = 0.029, occurring because faces studied with a mask were less discriminable than faces studied without a mask (M = 0.60, SE = 0.05 and M = 0.77, SE = 0.05, respectively). In the test phase, although results suggested that masked faces were less accurately identified than unmasked faces (M = 0.62, SE = 0.05 and M = 0.75, SE = 0.05, respectively), this difference did not reach significance, F(1, 164) = 3.28, MSE = 0.75, p = 0.072, η_p² = 0.02. However, test and study conditions interacted, F(1, 164) = 10.39, MSE = 2.40, p = 0.002, η_p² = 0.06. The pattern of the study by test interaction suggests that a matching effect was at work (see Fig. 2). Faces studied without a mask were better recognized without it (M = 0.95) than with it (M = 0.58), t(165) = 3.60, p = 0.002, d = 0.77. Faces studied with a mask were better recognized with a mask (M = 0.65, SE = 0.08) than without it (M = 0.54, SE = 0.07), although this comparison was not significant, t(164) = 0.99, n.s.
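The four cell means reported in this paragraph can be used to check the marginal means and to express the interaction as a single contrast. The sketch below uses unweighted averages of the cells, which is an assumption; with unequal cell sizes the reported marginal means could differ slightly.

```python
# Cell means of d' as reported in the text, keyed by (study, test) condition.
cells = {
    ("unmasked", "unmasked"): 0.95,
    ("unmasked", "masked"): 0.58,
    ("masked", "masked"): 0.65,
    ("masked", "unmasked"): 0.54,
}


def marginal(factor_index, level):
    """Unweighted mean of the cells sharing one level of a factor (0 = study, 1 = test)."""
    values = [m for key, m in cells.items() if key[factor_index] == level]
    return sum(values) / len(values)


study_unmasked = marginal(0, "unmasked")
study_masked = marginal(0, "masked")

# Interaction contrast: cost of a mask at test after unmasked study
# minus the same cost after masked study.
cost_after_unmasked = cells[("unmasked", "unmasked")] - cells[("unmasked", "masked")]
cost_after_masked = cells[("masked", "unmasked")] - cells[("masked", "masked")]
interaction = cost_after_unmasked - cost_after_masked

print(round(study_unmasked, 3), round(study_masked, 3), round(interaction, 2))
```

The unweighted marginals agree with the reported study-condition means (0.77 and 0.60) up to rounding. The contrast also shows that most of the interaction comes from the cost of adding a mask after unmasked study, since the corresponding difference after masked study is small and reversed in sign.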

Criterion (c)

Results indicated that the study conditions did not directly impact individual response tendencies, F(1, 164) = 0.60, n.s. However, response tendencies differed as a function of the test condition, F(1, 164) = 8.05, MSE = 1.24, p = 0.005, η_p² = 0.05, showing that participants were more lenient (lower c) in their identifications of faces tested with a mask than without a mask (M = 0.01, SE = 0.04 and M = 0.18, SE = 0.04, respectively).

The main effect of test condition was qualified by the study condition (see Fig. 3), and this significant interaction, F(1, 164) = 3.90, MSE = 0.60, p = 0.049, η_p² = 0.02, occurred because the effect of the test condition was clear for faces studied without a mask [t(165) = 3.44, p = 0.004, d = 0.74] but not significant for faces studied with a mask [t(164) = 0.06, n.s.].
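The partial eta squared values reported with the F tests in this Results section can be recovered from each F statistic and its degrees of freedom, which offers a quick consistency check. The sketch below uses the standard identity η_p² = (F × df_effect) / (F × df_effect + df_error). The function name is hypothetical.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Recover partial eta squared from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# F values reported for the d' and c analyses, all with df = (1, 164).
reported = {
    "study (d')": 4.97,
    "study x test (d')": 10.39,
    "test (c)": 8.05,
    "study x test (c)": 3.90,
}
for label, f_value in reported.items():
    print(label, round(partial_eta_squared(f_value, 1, 164), 3))
```

The recovered values agree with the reported effect sizes (0.029, 0.06, 0.05, and 0.02) once rounded to the precision given in the text.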

Discussion Experiment 1

The results of Experiment 1 indicate that, as expected, the presence of a mask at study interferes with subsequent levels of recognition/identification of a face. This effect is better detected with d’ than with a simple count of hits (correct identifications) because d’ also takes false alarms into account and thus estimates the ability to discriminate between signal and noise during the test phase. Data also suggest that a mask at test appears to interfere with levels of recognition, although only by interacting with study conditions, because there was no clear main effect of test condition. These results are partially consistent with Freud et al.’s (2020) conclusions about masks interfering with encoding and retrieval processes. In fact, our sensitivity data replicated the results they had obtained in their second experiment: when comparing only conditions with a mismatch between study and test conditions (masked-unmasked versus unmasked-masked), they obtained no differences between conditions. However, the results obtained by Freud et al. (2020) might be explained by their approach of isolating the two cells of a full design where the study and test conditions interact. Our data suggest that such an approach might not be optimal to fully understand the impact of masks in face recognition.

Regarding the d’ index, and contrary to what was expected, the interaction between study and test conditions did not fully cross. As such, rather than documenting better performance in matching study-test conditions, the interaction may have emerged only because it was easier for those in the unmasked-unmasked condition to identify a target than for those in the other three conditions.

The c index of the SDT approach also clarifies that participants are more lenient in offering positive identifications of a masked face than an unmasked face. However, the effect occurs more clearly if at study the faces were seen without a mask. One possible reason is that by continuing to rely on the holistic processing that supported encoding, participants projected their memories over all the masked faces they saw at test. If that is the case, the same results should not be expected in the within-participants design of our next studies because the use of such a strategy is less likely to occur in a context where encoding strategies are not homogeneous, and a single strategy does not serve well all the stimuli encountered. The strategies that are adapted to each type of stimulus tend to work better when isolated in between-participants designs (see Forrin et al., 2016).