Face masks versus sunglasses: limited effects of time and individual differences in the ability to judge facial identity and social traits

Bennetts, Rachel J.; Johnson Humphrey, Poppy; Zielinska, Paulina; Bate, Sarah

doi:10.1186/s41235-022-00371-z

Original article
Open access
Published: 16 February 2022

Cognitive Research: Principles and Implications volume 7, Article number: 18 (2022)

Abstract

Some research indicates that face masks impair identification and other judgements such as trustworthiness. However, it is unclear whether those effects have abated over time as individuals adjust to widespread use of masks, or whether performance is related to individual differences in face recognition ability. This study examined the effect of masks and sunglasses on face matching and social judgements (trustworthiness, competence, attractiveness). In Experiment 1, 135 participants across three different time points (June 2020–July 2021) viewed unedited faces and faces with masks, sunglasses, or both. Both masks and sunglasses similarly decreased matching performance. The effect of masks on social judgements varied depending on the judgement and whether the face was depicted with sunglasses. There was no effect of timepoint on any measure, suggesting that the effects of masks have not diminished. In Experiment 2, 12 individuals with developmental prosopagnosia (DP) and 10 super-recognisers (SRs) completed the same tasks. The effect of masks on identity matching was reduced in SRs, whereas the effects of masks and sunglasses for the DP group did not differ from controls. These findings indicate that face masks significantly affect face perception, depending on the availability of other facial information, and are not modified by exposure.

Introduction

In response to the COVID-19 pandemic, use of face coverings (masks)^{Footnote 1} was introduced to mitigate the spread of the disease in many countries (Felter & Bussemaker, 2020). Masks offer substantial public health benefits (Brooks & Butler, 2021; Howard et al., 2021), but there is preliminary evidence that they may impair social interactions by obscuring areas of the face that carry cues to emotions, identity, speech information, and other social judgements (Biermann et al., 2021; Carbon, 2020; Freud et al., 2020; Marini et al., 2021; Noyes et al., 2021; Saunders et al., 2021). However, the consistency of these effects over time (i.e., as people adjust to the use of masks in everyday life) and across individuals remains unclear. In this study, we examine the effects of masks on two face perception tasks: face identity matching and social judgements (trustworthiness, competence, and attractiveness), and compare them to the effects of sunglasses. Specifically, we examine (1) whether the effects of face coverings remained consistent over 13 months during the pandemic; and (2) whether the effects of face coverings on face perception are related to individual differences in face recognition ability.

Masks and identification

There are good reasons to believe that identity matching (specifically, matching unfamiliar faces) could be adversely affected by masks. Decades of research have established that unfamiliar face matching is error-prone, and even small variations between images can reduce accuracy. For example, in simple tasks which present two images side by side and ask participants to judge whether they are the same or a different person, accuracy tends to fall between around 80% and 90% (e.g., Burton et al., 2010; Carragher & Hancock, 2020; Megreya & Burton, 2007). Accuracy declines further when images contain naturalistic variability (e.g., pictures taken at different time-points and in different settings; Bate et al., 2018; Fysh & Bindemann, 2018). Similarly, the addition of simple props or occlusions, such as spectacles or sunglasses, also has a negative impact on unfamiliar face matching (Graham & Ritchie, 2019; Kramer & Ritchie, 2016), particularly when only one image is displayed with eyewear.

There is comparatively less work on occlusions which, like masks, obscure the lower face. Freud et al. (2020) and Marini et al. (2021) both investigated the effects of masks on face learning. Marini et al. found that presenting faces with masks in the learning phase impeded subsequent recognition. Notably, these effects were even present when the faces were shown with transparent masks (Marini et al., 2021), supporting the claim that even subtle occlusions can impair unfamiliar face processing. Freud et al. reported similar results: they found that learning masked faces impeded subsequent recognition accuracy (and vice versa).

Studies on face matching also support the claim that masks impede unfamiliar face identification. Carragher and Hancock (2020) investigated the effects of masks on simultaneous face matching, and found that both human observers and computer-based face recognition systems are negatively affected by the addition of masks to images. Notably, these effects were similar in scale regardless of whether one or both images to be matched were shown wearing a mask. Furthermore, masks also biased participants’ responding, making it more likely that they would classify unfamiliar faces as “different” or “mismatched”. Noyes et al. (2021) found similar results (although a smaller effect size) for both accuracy and bias using more naturalistic images (images sourced from the internet, rather than images with masks edited on).

Noyes et al. (2021) also examined the effects of different types of facial occlusions, and reported that the effect of masks on unfamiliar face matching was slightly larger than the effect of sunglasses. Nonetheless, accuracy in both conditions remained well above chance levels, suggesting that occlusion of any one area or feature does not abolish face recognition abilities. This is in line with research using the “bubbles” technique (which reveals areas of the face necessary for identification; Gosselin & Schyns, 2001) and research into “critical features” in face recognition (Abudarham & Yovel, 2016) which suggest that information from both the upper and lower regions of the face is important during face identification tasks. Noyes et al.’s results indicate that individuals may use this information flexibly when some areas of the face are occluded. However, it is unclear what effect multiple occlusions (i.e., sunglasses and masks together) may have on identification, particularly with more variable images (e.g., images taken with different cameras or at different timepoints).

Face coverings and social judgements

Identity is not the only information carried in a face. People frequently make social attributions such as how trustworthy, competent, or attractive a person is based on their face, and these attributions can have consequences for a wide range of behaviours, such as behaviour in economic games, voting choices, and dating (Todorov et al., 2015).

Compared to identity, there have been fewer studies examining the effects of masks and other occlusions on social judgements. Graham and Ritchie (2019) assessed the effects of spectacles and sunglasses on social trait judgements, and found that sunglasses (but not spectacles) reduced judgements of trustworthiness, but did not affect judgements of competence or attractiveness. Information from both the eye and mouth region is involved in trustworthiness and competence judgements (Dotsch & Todorov, 2012; Olivola & Todorov, 2010; Riggio & Riggio, 2010), so it is reasonable to assume that masks could lead to similar perceptual effects. In support of this, Marini et al. (2021) found that transparent face masks did not have a significant effect on trustworthiness judgements, whereas opaque masks did affect trustworthiness judgements for some faces (specifically, they increased trustworthiness ratings for “untrustworthy” faces) (see also Biermann et al., 2021). Attractiveness judgements are influenced by facial symmetry (Rhodes, 2006) and contrast, particularly in the eye region (Killian et al., 2018). Consequently, the potential effects of masks on attractiveness judgements are unclear: they may occlude some important cues to attractiveness by making symmetry judgements more difficult, whilst leaving other cues available.

Thus, there is good reason to believe that face coverings, including masks and sunglasses, affect identification and social judgements. However, less is known about whether and how this varies over time.

Variability of effects over time

Much of the data about masks and face processing reported to date was collected in the early months of the pandemic—for example, Biermann et al. (2021) collected data between July and October 2020, when masks had been mandatory for three to six months in Germany (where the data was collected). At this point, most individuals in Western countries had relatively limited exposure to masks, so it is unsurprising that this unfamiliar perceptual occlusion disrupted face perception. However, there is some evidence that face perception can adapt to perceptual input over time. For example, own-group biases (e.g., the own-race effect, own-age effect) are a phenomenon whereby people show poorer performance when identifying faces of a different social group than their own (Anastasi & Rhodes, 2005; Meissner & Brigham, 2001). These biases are often attributed to a lack of perceptual experience with the “out-group”. However, prolonged contact with the “out-group” (Hancock & Rhodes, 2008; Harrison & Hole, 2009) and training programmes focused on individuating out-group faces (Tanaka & Pierce, 2009) can ameliorate these biases, indicating that exposure and training may mitigate perceptual limitations in our face processing system.

It is possible that the same processes could act to mitigate or compensate for the effects of masks on identification. In support of this, a very recent paper found that training individuals to focus on diagnostic features (e.g., ears or visible marks) could improve covered face recognition (Carragher et al., 2021). It is possible that, following prolonged periods of exposure to masks (i.e., in the 12–16 months since face coverings were mandated in certain areas of the UK), individuals could adapt or develop their own strategies which could reduce the effects of masks on identification. A change in effects over time could also explain why previous studies (Carragher & Hancock, 2020; Noyes et al., 2021) found different effect sizes for unfamiliar face matching, despite the use of relatively similar tasks.

Opinions on masks and the proportion of people wearing them regularly have also varied over time (Nolsoe, 2021; Smith, 2020), which raises the possibility that social judgements about mask-wearers could also vary depending on the context at the time of data collection. However, no research to date has attempted to determine whether or how the effects of masks on face perception have changed over time as individuals adapted to the effects of masks. Consequently, the first aim of the current study was to examine the effects of masks and other facial occlusions (sunglasses) on identification and social judgements at three different time points, spanning more than 13 months.

Variability of effects across individuals

The second aim of the current study was to examine how the effects of masks on identification vary between individuals. It is apparent from data collected in previous studies that the effects of masks on identification vary substantially between individuals (Marini et al., 2021; Noyes et al., 2021). However, it is unclear what factors can account for this variability.

One possibility is that the effects of masks on identification might be associated with face recognition ability. Face recognition varies substantially in the general population (Bowles et al., 2009; Germine et al., 2011). At one extreme of this variability, there are some individuals who have very poor face recognition skills, despite relatively normal intellectual capacities and low-level vision—this is referred to as developmental prosopagnosia (DP; also sometimes referred to as ‘congenital prosopagnosia’; Bate & Tree, 2017; Corrow et al., 2016; Susilo & Duchaine, 2013). At the other extreme, some individuals have extraordinarily good face recognition skills—these individuals have been referred to as “super-recognisers” (Bate et al., 2018; Bennetts et al., 2017; Bobak et al., 2016a; Ramon, 2021; Russell et al., 2009).

Currently, we do not know of any research in DP that has investigated naturalistic face occlusions such as sunglasses and face coverings. While face perception in DP is heterogeneous (Bate et al., 2019c; Dalrymple et al., 2014; Klargaard et al., 2018; Palermo et al., 2011), there is some evidence that, on a group level, individuals with DP might show particular difficulty with naturalistic face transformations (e.g., matching images despite changes in viewpoint, lighting, or other transformations) (White et al., 2017). Further, some individuals with DP also report relying on unusual feature-based or extra-facial strategies to recognise individuals (Adams et al., 2020; Murray et al., 2018). The use of atypical strategies in DP is supported by eye-tracking research which shows that, compared to typical controls, some individuals with DP spend a higher proportion of their time looking at unusual areas of the face (e.g., the mouth, Bobak et al., 2017; hairline, neck, and chin, Schwarzer et al., 2007) or body (Bobak et al., 2017). This may make individuals with DP particularly vulnerable to the effects of masks, especially in cases where extra-facial information is limited or unreliable.

While SRs tend to perform exceptionally well on tasks involving face memory, their performance on face perception tasks is also heterogeneous (e.g., Bate et al., 2018, 2019d; Bobak et al., 2016a; Noyes et al., 2021). Furthermore, SRs are not impervious to the same biases that affect face recognition in the typical population—for example, SRs display “own-age” and “own-ethnicity” biases (Bate et al., 2019a, 2020). Noyes et al. (2021) also found that SRs performed worse when matching unfamiliar faces with masks or sunglasses, compared to uncovered faces. Notably, though, their pattern of performance across conditions was different to that of typical perceivers: while SRs were equally good at matching faces with sunglasses and face coverings, typical perceivers showed slightly better performance for faces with sunglasses than those with masks. As for typical perceivers, it is unclear how SRs’ performance might be affected by multiple occlusions (sunglasses and masks).

The current study

This study used edited stimuli from a pre-existing, well-validated database of face images to examine the effects of masks and sunglasses (alone and in combination) on face identity matching and social judgements. To determine the effects over time, data was collected from different groups of participants with typical face recognition abilities at three points in time between June 2020 and July 2021 (Experiment 1). To examine whether face recognition abilities are associated with the effects of different face occlusions, we compared performance on the matching task across three groups of individuals: those with typical face recognition, individuals with DP, and SRs; and examined the relationship between self-reported face recognition ability and the effects of masks and sunglasses (Experiment 2).

Experiment 1

Methods

Participants

Data for Experiment 1 was collected at three different timepoints: June 2020, February 2021, and August 2021. A total of 150 participants (50 per time point) completed the study. Participants were recruited via online participant recruitment services (Prolific.ac and Testable Minds), with the restriction that the study should be available to people who were living within the UK. Subsequently, 15 participants were excluded from analysis: nine were outside the age range for the study (18–60 years of age), and six were outliers in the face matching task (> 3 SDs from the mean in measures of sensitivity or bias). One participant only completed the ratings tasks, not the matching task.

The final sample for analysis included 135 participants (62 female, 71 male, 2 other, M_age = 32.61 years, SD = 10.88); 44 in June 2020 (26 female, 18 male, M_age = 33.64 years, SD = 10.88); 44 in February 2021 (18 female, 26 male, M_age = 28.75 years, SD = 7.43); and 47 in August 2021 (18 female, 27 male, 2 other, M_age = 35.26 years, SD = 10.57). Power calculations (G*Power 3.1.9.2) indicated that this sample size was sufficient to detect an effect of masks and an interaction between masks and timepoint with a small-to-medium effect size (d = 0.24) with 90% power, assuming a correlation of r = 0.68 between the within-subjects variables (this was based on the data obtained in Experiment 1). Noyes et al. (2021) reported an effect size of d = 0.57 for unpractised control participants; thus, our study has sufficient power to detect smaller effects of masks than have previously been found in the literature.

The vast majority of the sample (111) identified their ethnicity as Caucasian/White (44 from June 2020; 27 from February 2021; 38 from August 2021); 19 identified their ethnicity as Asian or Pacific Islander (0 from June 2020; 14 from February 2021; five from August 2021); three identified their ethnicity as Black (0 from June 2020; one from February 2021; two from August 2021); one identified their ethnicity as Hispanic or Latino (from February 2021); and two identified their ethnicity as Other (both from August 2021).

Materials

Face images

The target stimuli consisted of 60 identities (30 female) from the Glasgow Unfamiliar Face Database (GUFD; Burton et al., 2010). Sixty identities (30 female), matched to the target stimuli in gender and similar in age, skin tone, hair colour, and hair style, were selected as distractor images (as in the Glasgow Face Matching Test, there was some overlap between test and distractor identities to ensure a good match between images). The GUFD images have been used for prior research and include images of the same face, with a neutral facial expression, captured with multiple cameras. The stimuli thus include some small variations in face size, colouration, lighting, head angle, and hairstyle, making it difficult to match images based on pictorial cues (e.g., skin tone, specific idiosyncrasies in an image) alone.

Two images taken with different cameras (C1 and C2) were selected for each target identity. All images were selected to show the face from a roughly frontal viewpoint (with variation of up to 10 degrees), but the pairs included some variation in colouration, hairstyles, and face size. Some pairs also showed small changes in eye gaze or facial pose (e.g., mouth closed/mouth slightly open). Images were resized to 800 × 600 pixels. The first image (C1) was not edited further. The second image of each individual (C2) was edited in Adobe Photoshop to show the face wearing (1) a medical-style face mask; (2) sunglasses; and (3) both sunglasses and a face mask (Fig. 1). Images of sunglasses and face masks were selected via an online search (Google images). To prevent participants becoming overly familiar with the accessories, multiple versions of sunglasses and masks were selected (3 masks, 6 sunglasses) and applied to equal numbers of faces (within a single identity, the same accessories were always shown). While the accessories varied in terms of colour and style, the basic shape of the accessories was consistent within genders (male sunglasses were a slightly different shape to female sunglasses), and the images were resized and warped to ensure they covered a similar area of each face. A single image of each distractor identity (from the C2 camera) was selected and edited in the same way as the target images.

In sum, for each unedited target face, there were four “matching” images of the same identity and four “mismatching” images of a different identity (unedited, mask only, sunglasses only, mask and sunglasses). For clarity, the unedited image from camera 1 will be referred to as the “comparison image”, the matching images from camera 2 as “target images”, and the mismatching images as “distractor images”.

Design and procedure

Participants completed four face processing tasks. The first three tasks involved rating the attractiveness, competence, and trustworthiness of each face. The final task involved matching faces based on identity. The design for all tasks was similar: fully within-subjects, with all participants providing ratings/accuracy data for faces in all four conditions (unedited/control, mask only, sunglasses only, sunglasses and mask). Timepoint was also included as a between-subjects variable in the analyses.

For each rating task, participants were asked to rate 60 target images (15 in each condition: unedited/control, mask only, sunglasses only, sunglasses and mask) for trustworthiness, competence, and attractiveness on 7-point Likert scales. Participants viewed the target face in the centre of the screen, with the Likert scale presented above it. The Likert scales ranged from 1 (Very Untrustworthy/Unattractive/Incompetent) to 7 (Very Trustworthy/Attractive/Competent), with 4 representing “Neutral”. Responses were made via keypress (the 1–7 keys on the keyboard). There was no time limit to respond, and images stayed on screen until a response was recorded. Prior to each task, there were three practice trials. The ratings tasks were blocked (so participants provided all the trustworthiness ratings in a single block, all the competence ratings in a separate block, etc.), and their order of presentation was randomised between participants. The order of presentation of faces in each ratings task was randomised. The allocation of different faces to different conditions was counterbalanced between participants.

For the matching task, participants saw pairs of images (one comparison image, paired with either a target or distractor image) presented simultaneously, and were asked to indicate whether the two images depicted the same person or two different people by clicking the “SAME” or “DIFFERENT” button onscreen. There was no time limit to respond, and images stayed on screen until a response was recorded. Participants completed 120 trials in total: 30 in each condition (unedited/control, mask only, sunglasses only, sunglasses and mask), of which half were same identity trials and half were different identity trials.

There were two practice trials at the beginning of the task, and a short break in the middle of the task. Participants did not receive feedback on the practice trials. As in the ratings task, the allocation of different faces to different conditions was counterbalanced across participants, and trials were presented in a random order. Following the matching task, participants completed the PI20 (see Experiment 2 for further details).
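The matching-task trial structure and counterbalancing described above can be illustrated with a short sketch. This is not the authors' experiment code (the study was run on Testable.org and Qualtrics). The function name, the condition labels, the rotation of four face sets across conditions, and the assumption that each target identity contributes one same-identity and one different-identity trial are all hypothetical choices made only to reproduce the reported counts (120 trials, 30 per condition, half same and half different).

```python
import random

# Hypothetical condition labels; the paper names the four conditions in prose.
CONDITIONS = ["unedited", "mask", "sunglasses", "mask+sunglasses"]


def build_matching_trials(identities, group, seed=None):
    """Build a randomised trial list for one participant.

    identities: list of target identity labels (60 in the study).
    group: counterbalancing group (0-3); rotates face sets across conditions.
    Assumption: each identity yields one "same" and one "different" trial.
    """
    n_cond = len(CONDITIONS)
    if len(identities) % n_cond != 0:
        raise ValueError("identities must divide evenly into face sets")
    set_size = len(identities) // n_cond

    trials = []
    for set_index in range(n_cond):
        # Rotate the set-to-condition mapping by counterbalancing group.
        condition = CONDITIONS[(set_index + group) % n_cond]
        face_set = identities[set_index * set_size:(set_index + 1) * set_size]
        for identity in face_set:
            for trial_type in ("same", "different"):
                trials.append({
                    "identity": identity,
                    "condition": condition,
                    "trial_type": trial_type,
                })

    # Present trials in a random order (seeded here only for reproducibility).
    random.Random(seed).shuffle(trials)
    return trials


trials = build_matching_trials(list(range(60)), group=1, seed=0)
```

Across the four hypothetical counterbalancing groups, every face set appears once in every condition, which is the property the counterbalancing is intended to guarantee.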

Participants at the second and third timepoints (February and August 2021) viewed the same identities in the same conditions (i.e., the same faces were presented with sunglasses/masks/both) in the ratings and matching tasks. The target images presented in the matching task were the same as the images used in the ratings task, meaning that each target image was viewed four times across the entire experiment. Due to a difference in task programming, we were unable to control whether participants at the first timepoint viewed the same faces in the same conditions for the ratings and matching tasks. Participants were not informed that they would see the same identities in the rating and matching tasks.

All data collection took place online via platforms designed for online tasks (Testable.org and Qualtrics). Prior to the experiment, all participants provided informed consent via an online consent form. This project was approved by the institutional Research Ethics Committee, references 11697-A-Apr/2020- 25416-1; 11697-A-Jul/2021- 33456-1; 21052-A-Jul/2021- 33366-2.

Statistical analyses

Matching task

Scores for all participants were calculated in terms of hits (the number of correct “same” responses) and correct rejections (the number of correct “different” responses). This data was also used to calculate signal detection theory (SDT) measures of sensitivity. Due to the non-normal distribution of the data (many participants achieved perfect or near-perfect accuracy in some conditions), the analysis for this task used non-parametric measures of sensitivity (A) and bias (b) (Zhang & Mueller, 2005). The measure A ranges from 0 to 1, with 0.5 indicating chance performance and 1 indicating perfect performance; the measure b is used as an indicator of response bias (i.e., whether the participant has a tendency to say that the images are the same or different). A b of 1 indicates a neutral response criterion, whereas a higher score indicates conservative responding (a tendency to indicate that the two faces were different) and a lower score indicates more liberal responding (a tendency to indicate that the two faces were the same) (Macmillan & Creelman, 2005). Examination of the average A and b across conditions revealed six participants who were extreme outliers (> 3 SDs from the mean) on at least one measure; these participants were excluded from all analyses.
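For readers unfamiliar with these measures, the following is a minimal sketch of the piecewise formulas for A and b given by Zhang and Mueller (2005), written from the published definitions rather than taken from the authors' analysis scripts. The function name is hypothetical, "same" responses are treated as the signal (so the false-alarm rate is one minus the correct-rejection rate), and the handling of the two degenerate corner cases is an assumption.

```python
def sdt_nonparametric(hit_rate, fa_rate):
    """Return (A, b) for one participant in one condition.

    hit_rate: proportion of "same" responses on same-identity trials.
    fa_rate:  proportion of "same" responses on different-identity trials.
    Formulas follow Zhang & Mueller (2005) and assume hit_rate >= fa_rate.
    """
    h, f = hit_rate, fa_rate
    if not (0.0 <= h <= 1.0 and 0.0 <= f <= 1.0):
        raise ValueError("rates must lie in [0, 1]")
    if f > h:
        raise ValueError("formulas assume hit_rate >= fa_rate")

    if f <= 0.5 <= h:
        a = 0.75 + (h - f) / 4 - f * (1 - h)
        b = (5 - 4 * h) / (1 + 4 * f)
    elif h < 0.5:  # f <= h < 0.5
        if h == 0.0:
            # Assumption: h = f = 0 is treated as chance with neutral bias
            # (the limit along the chance diagonal), avoiding 0/0.
            return 0.5, 1.0
        a = 0.75 + (h - f) / 4 - f / (4 * h)
        b = (h ** 2 + h) / (h ** 2 + f)
    else:  # 0.5 < f <= h
        if f == 1.0:
            # Assumption: h = f = 1 is treated the same way as h = f = 0.
            return 0.5, 1.0
        a = 0.75 + (h - f) / 4 - (1 - h) / (4 * (1 - f))
        b = ((1 - f) ** 2 + (1 - h)) / ((1 - f) ** 2 + (1 - f))
    return a, b


# Example: 12 of 15 hits and no false alarms in one condition.
a, b = sdt_nonparametric(12 / 15, 0 / 15)
```

In this example b exceeds 1, which corresponds to the conservative pattern described above: the participant misses some matches but never wrongly responds "same".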

We analysed response time (RT) for trials with correct responses only. Any RTs more than 3 SDs from a participant’s mean RT were excluded from calculations.
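A minimal sketch of this trimming rule is given below. It is illustrative only: the function name and the (RT, correct) tuple layout are hypothetical, and the code assumes that the mean and SD are computed over a participant's correct-response RTs using the sample standard deviation, which the text does not specify.

```python
from statistics import mean, stdev


def trimmed_correct_rts(trials, n_sd=3.0):
    """Return one participant's correct-trial RTs after SD-based trimming.

    trials: iterable of (rt, correct) tuples for a single participant.
    Assumption: mean and sample SD are computed over correct trials only.
    """
    rts = [rt for rt, correct in trials if correct]
    if len(rts) < 2:
        # An SD cannot be estimated from fewer than two values.
        return rts
    m = mean(rts)
    sd = stdev(rts)
    # Keep RTs within n_sd standard deviations of the participant's mean.
    return [rt for rt in rts if abs(rt - m) <= n_sd * sd]


# Nineteen ordinary correct responses, one extreme correct response,
# and one incorrect response (dropped before trimming).
example = [(1.0, True)] * 19 + [(100.0, True), (2.0, False)]
kept = trimmed_correct_rts(example)
```

Note that with very few trials per cell a single extreme value inflates the SD so much that it can never exceed the 3 SD cut-off, so a rule of this kind is only informative when it is applied to a reasonable number of RTs per participant.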

Ratings tasks

The mean rating for each condition was calculated for each participant for the three ratings tasks. Mean ratings could range from 1 to 7.

All the data from the study can be accessed at https://osf.io/m2ch8/?view_only=a5b8edc88bcc4d6ca3f7b9b56b57b0d6.

Preliminary analyses

Previous research suggests that age can influence face identity perception (Bowles et al., 2009; Megreya & Bindemann, 2015) and some social judgements (Zebrowitz et al., 2013). Consequently, participants were divided into two age groups: younger adults (18–39 years old) and older adults (40–59 years old), and data from all tasks was entered into a series of ANOVAs including age group as a between-subjects factor. The main effects of age group and interactions with other variables were not significant for any of the key dependent variables (A, ratings), all p’s > .100. Furthermore, entering age as a covariate in the analyses did not change the pattern of results. Consequently, age was excluded from further analyses.

Initial examination of the data revealed departures from normality (Shapiro–Wilk p’s < .05) in many variables, with the data (particularly from the matching task) showing substantial skew. However, as ANOVA models are relatively robust to departures from normality (Blanca et al., 2017), and there was no evidence of violation of the homogeneity of variance assumption (all Levene’s p’s > .05) we proceeded with the planned analyses.

Results

Matching task

Performance on the matching task at each timepoint is summarised in Table 1 and displayed in Figs. 2 (SDT measures) and 3 (accuracy and response time). SDT data from the matching task (A and b) was initially entered into two 3 (timepoint: June 2020; February 2021; August 2021) × 2 (mask: mask; no mask) × 2 (sunglasses: sunglasses; no sunglasses) ANOVAs.

Table 1 Mean (SD) for each timepoint for the matching and ratings tasks

The ANOVA on A revealed main effects of mask, F(1,131) = 102.08, p < .001, ηρ² = .44, and sunglasses, F(1,131) = 68.38, p < .001, ηρ² = .34. On average, unmasked faces (M = 0.95, SD = 0.04) were matched better than masked faces (M = 0.92, SD = 0.04); and faces without sunglasses (M = 0.95, SD = 0.05) were matched better than faces with sunglasses (M = 0.92, SD = 0.05).
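The partial eta squared values reported alongside each F statistic can be recovered from the F value and its degrees of freedom using the standard identity: F multiplied by the effect degrees of freedom, divided by that same product plus the error degrees of freedom. The sketch below applies it to the two main effects on A; the function name is hypothetical and the code is a reader's check, not part of the authors' analysis.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared implied by an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Main effects on A reported above, both with F(1, 131).
mask_effect = round(partial_eta_squared(102.08, 1, 131), 2)        # 0.44
sunglasses_effect = round(partial_eta_squared(68.38, 1, 131), 2)   # 0.34
```

Both values agree with the reported effect sizes to two decimal places.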

These main effects were superseded by an interaction between masks and sunglasses, F(1,131) = 5.04, p = .026, ηρ² = .04 (see Fig. 2). Simple pairwise comparisons (Bonferroni-corrected) confirmed that masks or sunglasses alone impaired recognition compared to the unedited faces, p’s < .001, and faces with both masks and sunglasses were matched significantly worse than sunglasses or masks alone, p’s < .001. There was no significant difference between performance with masks alone and performance with sunglasses alone, p > .99. However, the effect of masks on identification (i.e., the difference between performance for masked and unmasked faces) was slightly, but significantly, higher when the faces were depicted with sunglasses (M = 0.043, SD = 0.07) than without sunglasses, (M = 0.028, SD = 0.04), F(1,131) = 5.04, p = .026, ηρ² = .04.

There was no main effect of timepoint, F(2,131) = 0.70, p = .500, ηρ² = .01, and timepoint did not interact significantly with any other effects, mask × timepoint: F(2,131) = 1.06, p = .351, ηρ² = .02, sunglasses × timepoint: F(2,131) = 0.19, p = .825, ηρ² = .00, mask × sunglasses × timepoint: F(2,131) = 0.24, p = .790, ηρ² = .00. Thus, the effects of masks and sunglasses, alone or in combination, did not significantly differ across timepoints.

The ANOVA on b revealed significant main effects of masks, F(1,131) = 111.44, p < .001, ηρ² = .46, and sunglasses, F(1,131) = 38.72, p < .001, ηρ² = .23. Masked faces (M = 1.35, SD = 0.41) led to more conservative patterns of responding than unmasked faces (M = 0.98, SD = 0.41); likewise, faces with sunglasses (M = 1.30, SD = 0.43) led to more conservative responding than faces without sunglasses, (M = 1.03, SD = 0.43) (see Fig. 2).

There was no main effect of timepoint on bias, F(2,131) = 0.50, p = .607, ηρ² = .01, and no interactions were significant, all p’s > .30.

Effects of trial type

To explore the effects of masks on matched and mismatched trials separately, accuracy data was entered into a 3 (timepoint: June 2020; February 2021; August 2021) × 2 (mask: mask; no mask) × 2 (sunglasses: sunglasses; no sunglasses) × 2 (trial type: matched; mismatched) ANOVA. As in the A analysis, the main effects of mask and sunglasses, and the interaction between masks and sunglasses, were significant, mask: F(1,131) = 118.86, p < .001, ηρ² = .48; sunglasses, F(1,131) = 75.71, p < .001, ηρ² = .37, mask × sunglasses: F(1,131) = 4.38, p = .038, ηρ² = .02. Pairwise comparisons (Bonferroni-corrected) on the interaction showed a similar pattern to the A data, with unedited faces matched significantly more accurately than faces with masks or sunglasses, p’s < .001, and faces with a single occlusion matched better than faces with both masks and sunglasses, p’s < .001, but no significant difference between faces with masks only and faces with sunglasses only, p = 1.

The main effect of trial type was not significant, F(1,131) = 1.63, p = .203, ηρ² = .01, but trial type interacted with both mask, F(1,131) = 101.90, p < .001, ηρ² = .44, and sunglasses, F(1,131) = 32.93, p < .001, ηρ² = .20. The three-way interaction was not significant, F(1,131) = 0.31, p = .578, ηρ² = .00. Follow-up comparisons showed that neither masks nor sunglasses made a significant difference to accuracy in mismatched trials, p’s = 1; however, accuracy in matched identity trials significantly decreased when either masks or sunglasses were introduced, p’s < .001 (see Fig. 3).

Once again, the effects of masks and sunglasses did not differ across timepoints: the main effect of timepoint was not significant, F(2,131) = 0.54, p = .587, ηρ² = .01; and none of the interactions involving timepoint were significant, all p’s > 0.5.

Response time

A 3 (timepoint: June 2020; February 2021; August 2021) × 2 (mask: mask; no mask) × 2 (sunglasses: sunglasses; no sunglasses) × 2 (trial type: matched; mismatched) ANOVA revealed a similar pattern of findings to the main analysis on accuracy for matched and mismatched trials: there were significant main effects of masks and sunglasses, but not trial type, mask: F(1,131) = 26.85, p < .001, ηρ² = .17; sunglasses, F(1,131) = 65.49, p < .001, ηρ² = .33, trial type: F(1,131) = 1.65, p = .201, ηρ² = .01. There were significant interactions between mask and trial type, F(1,131) = 27.25, p < .001, ηρ² = .17, and sunglasses and trial type: F(1,131) = 13.18, p < .001, ηρ² = .09. Both masks and sunglasses led to slower responses than unoccluded faces in matched identity trials, p’s < .001. Response times to masked and unmasked faces were not significantly different for mismatched trials, p > 0.9, but responses to faces with sunglasses were slower than to faces without sunglasses in mismatched identity trials, p = .02.

There were no significant main effects of, or interactions with, timepoint, p’s > 0.08, and no other interactions were significant, p’s > 0.1.

Difficult trials

Participants performed very well in the matching task overall (see Figs. 2 and 3). It is possible that ceiling effects could obscure some subtle differences between conditions or timepoints; consequently, we repeated the analyses on a reduced dataset that contained a subset of more difficult trials. Due to counterbalancing, there were four sets of faces (each with 15 target identities) presented to participants. We excluded the five target identities with the highest baseline accuracy in each set (based on unedited trials) from the analysis. Data for these faces was removed from all conditions. This resulted in a reduced dataset, containing the most difficult 2/3 of trials in the experiment. Average accuracy in the unedited condition for the reduced dataset was 92.4% (compared to 94.1% in the full dataset); this is similar to the levels of accuracy reported for the original Glasgow Face Matching Test (89.9% in Burton et al., 2010). The analyses reported above were repeated on this more difficult dataset. For brevity, we report only a summary of the analyses here; however, the data from the more difficult trials is openly available alongside the full dataset at https://osf.io/m2ch8/?view_only=a5b8edc88bcc4d6ca3f7b9b56b57b0d6.
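The selection rule described above (drop the five easiest target identities in each set, judged by accuracy on unedited trials, and keep the rest) can be sketched as follows. This is a minimal illustration, not the authors' analysis script: the data structure, set name, identity labels, and accuracy values are all hypothetical, and ties are broken by input order, which is an assumption.

```python
def hardest_identities(baseline_accuracy, n_drop=5):
    """Return, per face set, the identities left after dropping the n_drop easiest.

    baseline_accuracy maps set name -> {identity: accuracy on unedited trials}.
    """
    kept = {}
    for set_name, accuracy in baseline_accuracy.items():
        # Rank identities from highest to lowest baseline accuracy.
        ranked = sorted(accuracy, key=accuracy.get, reverse=True)
        kept[set_name] = sorted(ranked[n_drop:])
    return kept

# Toy data: one set of 15 identities with increasing baseline accuracy.
toy = {"set_A": {f"id{i:02d}": 0.80 + 0.01 * i for i in range(1, 16)}}
kept = hardest_identities(toy)

print(len(kept["set_A"]), kept["set_A"][0], kept["set_A"][-1])
```

With 15 identities per set and five removed, 10 of 15 remain, which corresponds to the 2/3 of trials retained in the reduced dataset.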

Overall, the pattern of results for the reduced dataset was identical to that for the full dataset. For sensitivity (A), the ANOVA revealed no significant effects of timepoint, all p’s > 0.2, but all main effects and interactions involving masks and sunglasses were significant, p’s < .05. As in the main analysis, participants performed better with unedited faces than those with masks or sunglasses, p’s < .001; and worse with masks and sunglasses compared to masks or sunglasses in isolation, p’s < .02. There was no significant difference in performance for faces shown with sunglasses or with masks alone, p > 0.9. Similarly, for accuracy, the results for the reduced dataset mirrored those for the full dataset: no significant main effect or interactions involving timepoint, p’s > 0.3, and the same pattern of performance across conditions and trial types as in the complete dataset.

Ratings tasks

The mean ratings given to faces in each condition at each timepoint are shown in Table 1. Separate 3 (timepoint: June 2020; February 2021; August 2021) × 2 (mask: mask; no mask) × 2 (sunglasses: sunglasses; no sunglasses) ANOVAs were carried out on the trustworthiness, competence, and attractiveness ratings.

Trustworthiness

The main effect of masks on trustworthiness judgements was not significant, F(1,132) = 0.34, p = .558, ηρ² = .01. The main effect of sunglasses on trustworthiness judgements was significant, F(1,132) = 163.97, p < .001, ηρ² = .55; faces with sunglasses (M = 3.31, SD = 0.80) were rated as less trustworthy, on average, than those without sunglasses (M = 4.28, SD = 0.80). There was also a significant interaction between masks and sunglasses, F(1,132) = 12.48, p < .001, ηρ² = .06. Follow-up simple main effects analyses revealed that there was a significant negative effect of masks on trustworthiness judgements when the faces were shown wearing sunglasses, p = .021, but not when the faces were shown without sunglasses, p = .27 (see Fig. 4).

The main effect of timepoint was not significant, F(2,132) = 1.42, p = .247, ηρ² = .02, nor were the two-way interactions between timepoint and masks or sunglasses, p’s > 0.20, or the three-way interaction between masks, sunglasses, and timepoint, F(2,132) = 2.33, p = .10, ηρ² = .03.

Competence

Similar to trustworthiness, the analysis on competence judgements revealed no main effect of masks, F(1,132) = 3.67, p = .058, ηρ² = .03, but a significant main effect of sunglasses, F(1,132) = 88.26, p < .001, ηρ² = .40 and a significant interaction between masks and sunglasses, F(1,132) = 14.15, p < .001, ηρ² = .10. While the main effect of sunglasses was similar to trustworthiness, with sunglasses leading to lower competence ratings on average (sunglasses: M = 3.70, SD = 0.82, no sunglasses: M = 4.45, SD = 0.82), the pattern of ratings for the mask × sunglasses interaction diverged substantially from trustworthiness judgements. When faces were depicted with sunglasses, masks did not have a significant effect on competence ratings, p = 1; however, when faces were depicted without sunglasses, masks increased competence ratings, p = .006 (see Fig. 4).

There was no main effect of timepoint, F(2,132) = 0.95, p = .388, ηρ² = .01, and no interactions with timepoint were significant, p’s > 0.08.

Attractiveness

As for both trustworthiness and competence, the ANOVA on attractiveness judgements revealed no significant main effect of masks, F(1,132) = 0.03, p = .955, ηρ² = .00, but a significant main effect of sunglasses, F(1,132) = 68.88, p < .001, ηρ² = .34, and a significant interaction between masks and sunglasses, F(1,132) = 22.95, p < .001, ηρ² = .15. Once again, sunglasses led to lower ratings on average (sunglasses: M = 3.10, SD = 0.81, no sunglasses: M = 3.54, SD = 0.81). Pairwise comparisons did not reveal a difference between unedited and masked faces, p = .135, or between faces depicted with sunglasses only and faces depicted with both sunglasses and masks, p = .104. However, follow-up analysis exploring the interaction revealed that the effect of masks on attractiveness ratings (i.e., the difference between ratings for masked and unmasked faces) was significantly different for faces depicted with (M = 0.15, SD = 0.76) and without sunglasses (M = −0.15, SD = 0.76), F(1,132) = 22.95, p < .001, ηρ² = .15 (see Fig. 4).
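The "effect of masks" used in this follow-up is a difference score (masked rating minus unmasked rating), computed separately for faces with and without sunglasses; the interaction asks whether those two difference scores differ. The sketch below shows only that arithmetic, using made-up ratings for four hypothetical participants. It does not reproduce the ANOVA or the study's data.

```python
from statistics import mean

def mask_effect(masked, unmasked):
    """Per-participant difference: rating with mask minus rating without mask."""
    return [m - u for m, u in zip(masked, unmasked)]

# Hypothetical mean ratings per participant in each condition (illustration only).
sunglasses_and_mask = [3.5, 3.0, 3.5, 3.0]
sunglasses_only = [3.0, 3.0, 3.5, 3.0]
mask_only = [3.5, 3.5, 4.0, 3.0]
unedited = [3.5, 4.0, 4.0, 3.5]

effect_with_sunglasses = mask_effect(sunglasses_and_mask, sunglasses_only)
effect_without_sunglasses = mask_effect(mask_only, unedited)

# A nonzero contrast means the mask effect depends on whether sunglasses are present.
interaction_contrast = mean(effect_with_sunglasses) - mean(effect_without_sunglasses)
print(mean(effect_with_sunglasses), mean(effect_without_sunglasses), interaction_contrast)
```

In this toy example the mask effect is slightly positive with sunglasses and slightly negative without, the same qualitative shape as the attractiveness means reported above.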

Once again, there was no significant effect of timepoint, F(2,132) = 2.83, p = .062, ηρ² = .04, and no interaction between timepoint and any other variable, p’s > 0.1.

Discussion

Experiment 1 examined the effects of different facial coverings (masks, sunglasses) on face perception across three timepoints. Consistent with previous research (Carragher & Hancock, 2020; Graham & Ritchie, 2019; Noyes et al., 2021), the findings suggest that face masks and sunglasses have a significant effect on face matching ability: both result in a significant decrease in sensitivity in a face matching task. The effects of masks on trait judgements were somewhat variable, and depended on the judgement being made and the presence of other face coverings (i.e., sunglasses).

In general, the effects of masks were consistent across the three timepoints measured in this research. This suggests that extended exposure to mask-wearing over the course of a year has not reduced the effects of masks on face processing. However, while the effects of face coverings did not differ significantly over time, there was substantial variability between participants, regardless of timepoint. For example, within our sample, the mask effect varied from negligible or even negative (a mask advantage) to a 23% reduction in overall face matching accuracy for some participants. On a broader level, the variance associated with individual differences in the A analysis equated to 51.5% of the total variance in the data. In Experiment 2, we examined whether this variability was related to individual differences in face recognition ability. First, we compared the effects of masks and sunglasses on face matching in individuals with extremely good (super-recognisers) and very poor (developmental prosopagnosia) face recognition. Second, we examined whether the effects of masks correlate with self-reported face recognition ability in the general population.