Corneal reflections and skin contrast yield better memory of human and virtual faces

Vaitonytė, Julija; Alimardani, Maryam; Louwerse, Max M.

doi:10.1186/s41235-022-00445-y

Original article
Open access
Published: 18 October 2022


Cognitive Research: Principles and Implications volume 7, Article number: 94 (2022)

Abstract

Virtual faces have been found to be rated less human-like and remembered worse than photographic images of humans. What it is in virtual faces that yields reduced memory has so far remained unclear. The current study investigated face memory in the context of virtual agent faces and human faces, real and manipulated, considering two factors of predicted influence, i.e., corneal reflections and skin contrast. Corneal reflections referred to the bright points in each eye that occur when the ambient light reflects from the surface of the cornea. Skin contrast referred to the degree to which skin surface is rough versus smooth. We conducted two memory experiments, one with high-quality virtual agent faces (Experiment 1) and the other with the photographs of human faces that were manipulated (Experiment 2). Experiment 1 showed better memory for virtual faces with increased corneal reflections and skin contrast (rougher rather than smoother skin). Experiment 2 replicated these findings, showing that removing the corneal reflections and smoothening the skin reduced memory recognition of manipulated faces, with a stronger effect exerted by the eyes than the skin. This study highlights specific features of the eyes and skin that can help explain memory discrepancies between real and virtual faces and in turn elucidates the factors that play a role in the cognitive processing of faces.

Introduction

Humans are a highly visually oriented species (Van Essen, 2004), and one type of visual stimulus captures our attention especially strongly: the face (Hershler & Hochstein, 2005). This is not surprising, as face processing plays an important role in human communicative interactions (Hernández-Gutiérrez et al., 2021). Superior face processing skills in humans can be linked to highly specialized neural circuitry (Duchaine & Yovel, 2015), as well as to the high variability in human facial morphology, particularly in comparison to other species (Sheehan & Nachman, 2014).

Given the specialized neuro-cognitive mechanisms for human face processing, it becomes increasingly relevant to understand the extent to which human face processing extends to the faces of entities emulating the appearance of humans, for instance, Intelligent Virtual Agents (IVAs). IVAs are embodied virtual characters that can interact with humans using verbal, para-verbal, and nonverbal behaviors (Lugrin, 2021). With advances in computer graphics (Alexander et al., 2009), the faces of IVAs have become photorealistic (Seymour et al., 2017). The growing prevalence of IVAs as stimuli in research (Kätsyri et al., 2020) and the interest in deploying them in applications such as e-commerce (Etemad-Sajadi, 2016), healthcare (Robinson et al., 2014), and education (Belpaeme et al., 2018) highlight the importance of investigating the similarities between the processing of IVA faces (henceforth, virtual faces) and the processing of natural human faces.

Despite the progress in computer-generated imagery, the processing of virtual faces differs from the processing of natural human faces. Previous perception research revealed that computer-generated faces are generally distinguishable from natural faces (Farid & Bravo, 2012; Vaitonytė et al., 2021), and that information across the whole face, the eyes, and the skin is employed to make this distinction (Balas & Tonsager, 2014). Relatedly, the fidelity of specific features in the eyes and the skin (as discussed in detail below) has recently been found to be responsible for identifying virtual agent faces as virtual rather than human-like (Vaitonytė et al., 2021).

Memory studies, too, point to a processing discrepancy between computer-generated and human faces (Balas & Pacella, 2015; Crookes et al., 2015; Kätsyri, 2018). Balas and Pacella (2015) used real photographs and identity-matched computer-generated counterparts, created using FaceGen software, which allows importing a frontal and two lateral photographs of the face to generate an individualized avatar. Balas and Pacella (2015) found that real faces were significantly better remembered than computer-generated faces. Crookes et al. (2015) who used real photographs of Caucasian and Asian faces and computer-generated counterparts, as well as computer-generated Caucasian and Asian faces generated at random using FaceGen, reached the same conclusion, i.e., higher memory recognition accuracy was found for natural than computer-generated faces. Crookes et al. (2015) also showed that the Other Race effect (ORE; better face memory for one’s own race than other races) was reduced for computer-generated faces compared to real facial photographs.

Similarly, Kätsyri (2018) employed real photographs and virtual faces that were generated in FaceGen and that were also matched on low-level features (global luminosity and spatial frequency contents). Virtual faces matched on low-level visual characteristics were still recognizable as virtual. Regarding participant memory, while Kätsyri (2018) found that the sensitivity index d' was not higher for real faces than virtual faces, the response bias index c was higher for virtual faces, indicating participants found that virtual faces were more similar to one another than real human faces were. It was previously suggested that computer-generated faces lack discriminating information in the form of fine-grained surface texture (Crookes et al., 2015). This prediction is compatible with face recognition literature, showing that when spatial frequency information is reduced (where spatial frequencies refer to luminance variations, with high spatial frequencies encoding fast luminance variations and hence more detail), face recognition accuracy also drops (Sandford et al., 2018). However, while the heterogeneity of facial details may be important for remembering virtual faces as it is with natural faces, this prediction has not been directly tested.

Vaitonytė et al. (2021) showed both experimentally and computationally that the intricacy of facial details is important for perceived human-likeness, allowing perceivers to distinguish between natural human faces and virtual faces. The specific features that were indicative of the face being virtual were skin contrast, i.e., the degree to which skin texture is rough versus smooth, and corneal reflections, i.e., the white foci in each eye that occur when the ambient light reflects from the surface of the cornea. Reductions in skin contrast and corneal reflections caused the face to be perceived as low in human-likeness.

The reasons behind the predicted influence of the skin contrast and corneal reflections are based on the previous literature that reported the features affecting face recognizability, i.e., spatial frequency information (Sandford et al., 2018) and contrast polarity (Gilad et al., 2009). Reducing spatial frequency information negatively affects face recognition accuracy because the face loses detail (Sandford et al., 2018), whereas the potential role of corneal reflections might be associated with the broader role that contrast polarity relationships play in face recognition (Gilad et al., 2009). Contrast polarity relationships refer to the eyes being darker than the forehead and the cheeks, known to be a remarkably stable feature; if these relationships are reversed, for instance, by applying contrast negation, the face recognition becomes impaired. According to Gilad et al. (2009), contrast polarity relationships may be important for typical face processing, in that they represent regularities in the data that get incorporated by the visual system when it learns about visual objects. It is unclear how regular the presence of corneal reflections is. Corneal reflections might be less regular than contrast polarity relations across the face because depending on the lighting conditions, corneal reflections may be pronounced or reduced. However, one may argue that corneal reflections are a feature that gets incorporated into face representations by the visual system.

In the current study, we examined the extent to which the previously identified features of skin contrast and corneal reflections impacted the memory of different faces. We tapped into more general cognitive processes by asking participants to remember stimuli. In day-to-day life, people commonly encounter situations that require remembering different faces, whereas assessing human-likeness is arguably infrequent. We examined the role of skin contrast and corneal reflections in memory by conducting two experiments. Experiment 1 used high-quality virtual faces collected “in the wild,” while Experiment 2 used human faces obtained from a picture database that were further manipulated to reduce skin contrast and corneal reflections. Although both experiments were conducted with the same group of participants, they are best understood as independent experiments because of the different nature of the stimuli. Dawel et al. (2021) argued that an approach that seeks convergent evidence using a set of controlled images and a set of images collected “in the wild” is valuable when studying face processing. Our approach is, however, not amenable to directly comparing memory recognition performance for virtual faces versus human faces; rather, we sought to understand more generally whether the predicted features influenced memory.

Experiment 1

Experiment 1 used high-quality virtual faces available from different companies that employ cutting-edge techniques (i.e., 3D scanning) to test how facial details present in virtual faces were associated with participant face memory performance. Following Vaitonytė et al. (2021), we predicted that the virtual faces with higher skin contrast and a higher number of corneal reflections would be remembered better compared to the virtual faces with lower skin contrast (i.e., smoother skin) and fewer corneal reflections.

Method

Participants

Sixty-three students at Tilburg University (34 females, 28 males, and one person who preferred not to indicate gender, Age: Mean = 22.65, SD = 4.19) took part in the experiment in exchange for either partial course credits or a candy bar. The majority of participants identified with Caucasian ethnicity (n = 42), followed by Asian (n = 9), Black (n = 2), Hispanic (n = 2), and Middle Eastern (n = 2) ethnicities, while 6 participants indicated “Other.” Participants were recruited via the university participant pool and advertisements put on campus. The experiment was approved by the Research Ethics and Data Management Committee of the Tilburg School of Humanities and Digital Sciences (identification code: REC#2019/03). All participants provided consent prior to their participation in the experiment.

Stimuli

Experimental stimuli consisted of photorealistic virtual agent faces (n = 24, 12 female), collected from the Internet using the following criteria: (1) the image of the virtual face had to be of high quality, (2) the face had to be presented in frontal view, and (3) the face was not covered with hair that obscured facial features. No changes were made to the facial images of virtual faces. We used the search terms “digital human” and “digital humans” with the aim of collecting virtual faces created by companies that work in the realm of “digital human” technology. These companies use techniques such as 3D scanning and/or deep neural networks, which permit creating photorealistic faces. To prepare the stimuli, virtual faces were cropped to an oval, removing all non-facial information (e.g., hair), with a constant height (800 pixels) and a slightly varying width (from 550 to 650 pixels) due to inherent variation in facial width. By collecting virtual agent faces, we aimed to obtain a sample of photorealistic virtual faces that was also a more spontaneous set of faces. Such a stimulus set can be considered similar to the face photographs collected “in the wild” that are sometimes used in studies on face recognition, in which, for example, lighting conditions or face age may differ among images.

Procedure

Participants received instructions both in writing and verbally. First, participants were presented with written instructions, after which they digitally signed an informed consent form and filled in a demographic questionnaire. The written instructions, the informed consent form, and the questionnaire were presented in Qualtrics (Qualtrics, 2021). Following this, the experimenter verbally explained the tasks in Experiment 1 and Experiment 2. We combined the presentation of instructions for Experiment 1 with those for Experiment 2 since one experiment followed the other, with a short break in between. The decision against combining the virtual faces and the human faces from Experiment 2 into a single experiment was based on the two classes of images forming clearly distinct groups.

Participants were told that they would see a series of virtual agent faces that needed to be memorized (henceforth “study” phase). Next, they would get statements whose veracity needed to be judged (distractor task), and they would then again see a set of virtual agent faces and would decide whether they had previously seen each face or not (henceforth “test” phase). There was no time limit in the “test” phase, but participants were asked to use their immediate judgment to make decisions. Participants were asked to use their index fingers to press the M key if the face was “old” (presented previously) or the Z key if the face was “new” (presented for the first time). The distractor task between “study” and “test” included general knowledge statements (e.g., “Monaco is the smallest country in the world”), for which participants selected whether they thought the statement was true or false. The duration of the distractor task varied slightly across participants because it depended on each participant’s speed, but on average it lasted approximately 2–3 min. Neither answers nor response times in the distractor task were relevant or used for the analysis.

Stimuli were presented using the PsychoPy software (Peirce, 2007) on a laptop (Dell E5480) with a screen resolution of 1920 × 1080. Participants sat approximately 40 cm from the screen. Four face stimuli (one virtual face and three human faces), none of which were part of the experimental stimuli, were given to participants as practice. Having finished the practice trials, participants started Experiment 1, in which they saw 16 virtual faces (half female) (Fig. 1A), each presented for two seconds, following Schyns et al. (2002). Following the “study” phase, participants read eight statements and indicated their veracity. In the “test” phase, participants saw 16 virtual faces, of which 8 had not been shown previously. Virtual faces to be shown to participants as “old” or “new” were selected randomly. However, due to the limited number of available virtual agent faces, the assignment was not counterbalanced. Therefore, all participants saw the same set of images as “new” and the same set of images as “old.” Participants’ accuracy and response times (RTs) were collected as dependent variables.

Computational measures

While we did not manipulate virtual faces, we used two computational measures, as previously described in Vaitonytė et al. (2021), to assess skin smoothness versus roughness and the presence of corneal reflections. For the assessment of skin smoothness, we measured contrast variations in facial images, which we converted to grayscale before carrying out the calculations. The developed measure for skin, termed “skin contrast,” identified for each pixel in a facial image the biggest difference with its adjacent pixels (each pixel had 8 neighbors). The resulting matrix of per-pixel differences was then used to derive the median for each facial image. Therefore, the output of the algorithm was the median value of skin contrast per image. In this computation, we used the whole face as input because: (1) taking the whole face versus isolated parts did not yield differing results, and (2) taking whole faces confers better generalizability. For the computational assessment of corneal reflections, we counted the white foci in each eye within a selection comprising the iris and pupil. Each image was first converted to grayscale, and then bright and dark pixels were identified. Finally, each group of bright pixels that were connected to each other was identified as a corneal reflection, yielding the number of corneal reflections as the measure.
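
The two measures described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the treatment of border pixels, the use of 8-connectivity for grouping bright pixels, and the brightness threshold of 200 are all assumptions made for the example. Images are assumed to be grayscale and given as lists of rows of intensity values (0 to 255).

```python
from statistics import median


def skin_contrast(gray):
    """Median over all pixels of the largest absolute difference
    between a pixel and any of its (up to) 8 neighbours."""
    rows, cols = len(gray), len(gray[0])
    max_diffs = []
    for r in range(rows):
        for c in range(cols):
            biggest = 0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    if dr == 0 and dc == 0:
                        continue
                    nr, nc = r + dr, c + dc
                    # Assumption: border pixels simply have fewer neighbours.
                    if 0 <= nr < rows and 0 <= nc < cols:
                        biggest = max(biggest, abs(gray[r][c] - gray[nr][nc]))
            max_diffs.append(biggest)
    return median(max_diffs)


def count_corneal_reflections(eye_gray, bright_threshold=200):
    """Count groups of connected bright pixels in a grayscale eye region.
    The threshold and the 8-connectivity rule are hypothetical choices."""
    rows, cols = len(eye_gray), len(eye_gray[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or eye_gray[r][c] < bright_threshold:
                continue
            # New bright pixel: flood-fill everything connected to it.
            count += 1
            seen[r][c] = True
            stack = [(r, c)]
            while stack:
                cr, cc = stack.pop()
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        nr, nc = cr + dr, cc + dc
                        if (0 <= nr < rows and 0 <= nc < cols
                                and not seen[nr][nc]
                                and eye_gray[nr][nc] >= bright_threshold):
                            seen[nr][nc] = True
                            stack.append((nr, nc))
    return count
```

Under these assumptions, a perfectly uniform image has a skin contrast of 0, and the value grows as neighbouring pixels differ more strongly; the reflection count depends on the chosen threshold, so it would need to be tuned to the image set.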

Analysis

Data were preprocessed and analyzed in R (version 4.0.3; R Core Team, 2021). We transformed participant responses into sensitivity index d′ and the response bias index c in the framework of the Signal Detection Theory (SDT, Stanislaw & Todorov, 1999). In addition, we fitted Generalized Linear Mixed Models (GLMM) to raw accuracy scores and response times via the lme4 package (Bates et al., 2015). Response times longer than three standard deviations from the mean were removed, affecting 2.15% of the data from Experiment 1. We included the SDT measures in addition to raw accuracy scores to account for the bias introduced by participants, in that in SDT, sensitivity to the task (d′) is measured independently of response bias (c).
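
As an illustration of the response-time trimming rule, the following Python sketch drops trials whose RT exceeds the mean by more than three standard deviations. The analysis itself was run in R; the function name, the use of the sample standard deviation, and the pooling of all trials into one list are assumptions made for the example, since the text does not specify whether the cutoff was computed overall or per participant.

```python
from statistics import mean, stdev


def trim_slow_rts(rts_ms, n_sd=3.0):
    """Keep response times that are at most n_sd sample standard
    deviations above the mean (hypothetical pooled cutoff)."""
    cutoff = mean(rts_ms) + n_sd * stdev(rts_ms)
    return [rt for rt in rts_ms if rt <= cutoff]
```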

Responses could be assigned to one of four categories: (1) Hits when the face was correctly identified as old, (2) Misses when participants failed to categorize old faces as such, (3) False alarms when unseen faces were mistakenly identified as old, and (4) Correct rejections when unseen faces were identified as new. When calculating d′, transformations were applied to values of hits and false alarms of 0 using the log-linear rule (Hautus, 1995) to avoid obtaining values of positive and negative infinity. Higher d′ values suggested higher sensitivity, indicative of having many hits and few false alarms. The response bias index c indicated the response favored by participants, i.e., conservative or liberal. Conservative responding is represented by positive values of c and can be understood as the preference to respond “new,” while negative values of c are indicative of liberal responding, i.e., a preference to respond “old.” In Experiment 1, we were interested in the values of d′ across the virtual faces. The sensitivity index d′ was calculated for each virtual face (n = 16), i.e., averaged across participants, and for each participant (n = 63), i.e., averaged across virtual face images.
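
The two indices can be computed from the four response counts as sketched below, using d′ = z(H) − z(F) and c = −[z(H) + z(F)]/2, where H and F are the hit and false-alarm rates and z is the inverse of the standard normal cumulative distribution function. This is a minimal Python sketch rather than the R code used for the analysis; the function name is hypothetical, and the log-linear correction (adding 0.5 to the hit and false-alarm counts and 1 to the number of old and new trials) is applied to every cell here, which is an assumption about how the rule was used.

```python
from statistics import NormalDist


def sdt_indices(hits, misses, false_alarms, correct_rejections):
    """Return (d_prime, c) using log-linear corrected rates."""
    z = NormalDist().inv_cdf  # inverse standard normal CDF
    # Log-linear rule: keeps both rates strictly between 0 and 1.
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    d_prime = z(hit_rate) - z(fa_rate)
    c = -0.5 * (z(hit_rate) + z(fa_rate))
    return d_prime, c
```

For example, `sdt_indices(8, 0, 0, 8)` stays finite even though the uncorrected hit rate is 1 and the uncorrected false-alarm rate is 0, which is the situation the correction is meant to handle.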

In the mixed-effects model analyses, we looked at whether the computational measures, i.e., the number of corneal reflections and the skin contrast score, obtained for each virtual face as described above (both treated as continuous predictors in the models), were predictive of the accuracy and the RT responses. Mixed-effects logistic regression analysis was conducted on participants’ original responses: 1 = correct and 0 = incorrect, whereby correct meant that the participant correctly remembered whether or not a face was presented earlier. Linear mixed-effects regression was conducted on RTs (measured in milliseconds). Across all analyses, participants were a random factor while we did not include items as a random factor to avoid eliminating item variance that needed to be attributed to the predictors of interest (i.e., the number of corneal reflections and the skin contrast score). Model comparisons were conducted via the likelihood ratio (LR) tests to determine the significance of predictors of interest.
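
The likelihood ratio comparison between two nested models that differ by a single predictor can be illustrated as follows. The statistic is twice the difference in log-likelihood and is referred to a chi-square distribution with one degree of freedom, whose upper-tail probability has the closed form erfc(√(x/2)). This Python sketch only reproduces the arithmetic of the test; the function names and the log-likelihood values in the usage note are hypothetical, and fitting the mixed models themselves (done with lme4 in R) is not shown.

```python
from math import erfc, sqrt


def lr_statistic(loglik_reduced, loglik_full):
    """Likelihood ratio statistic for two nested models."""
    return 2.0 * (loglik_full - loglik_reduced)


def chi2_sf_df1(x):
    """Upper-tail probability of a chi-square variable with 1 df."""
    return erfc(sqrt(x / 2.0))
```

With hypothetical log-likelihoods of −500.00 for the reduced model and −498.63 for the full model, `lr_statistic(-500.0, -498.63)` gives a statistic of about 2.74, and `chi2_sf_df1` converts that statistic into a p value.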

Results

Sensitivity index d′ and response bias index c

The values of the sensitivity index d′ were generally high across virtual face images, with a mean of 2.05 (a value of 0 indicates no sensitivity). This suggests that the virtual faces were overall remembered well, with the exception of three virtual faces (bottom row in Fig. 2) yielding negative scores. Figure 2 presents the virtual faces that were employed to test memory in Experiment 1 together with the sensitivity index d′ and the computational measures associated with each face. A slight conservative bias was found among participants, that is, a tendency toward responding that a face had not been seen before (average of c scores = 0.63, a positive value). Overall, the results of Experiment 1 suggested that participants were able to recognize virtual faces well.

Accuracy

We first ascertained which of the models were significant using LR tests. The number of corneal reflections in the eyes was a significant predictor when compared to the null model, χ²(1) = 38.18, p < 0.001. When skin contrast as the predictor was added to the model, the model improved, χ²(1) = 13.94, p < 0.001 compared to the model that only contained the number of corneal reflections.

Overall, we found that participants were more likely to have higher memory recognition accuracy when virtual faces had a higher number of corneal reflections (β = 0.19, SE = 0.05, z = 3.64, 95% CI [0.09, 0.30], p < 0.001), and when the faces had increased skin contrast (β = 63.83, SE = 18.01, z = 3.54, 95% CI [29.00, 102.08], p < 0.001).

Response times

Contrary to the accuracy analysis, adding the number of corneal reflections as a predictor did not improve the response-time model over the null model, χ²(1) = 2.74, p = 0.10. In line with the accuracy analysis, adding skin contrast as a predictor did improve the model, χ²(1) = 85.25, p < 0.001; that is, the model with the number of corneal reflections and skin contrast as predictors was significantly better than the model containing only the number of corneal reflections.

We found that virtual faces with increased corneal reflections yielded faster response times (β = −43.15, SE = 7.88, t(914.85) = −5.48, 95% CI [−58.60, −27.69], p < 0.001). However, higher skin contrast yielded significantly slower (not faster) response times (β = 31062.75, SE = 3281.45, t(917.23) = 9.47, 95% CI [24621.28, 37500.92], p < 0.001).

Discussion

The results of Experiment 1 showed that across the virtual face images, the values of the sensitivity index d′ were high, with the exception of three faces. High memory recognition accuracy in general might, however, have resulted from the relatively small number of faces to memorize. In terms of raw accuracy scores, skin contrast and corneal reflections were predictive of memory performance. The virtual faces with a higher skin contrast and a higher number of corneal reflections were more likely to be remembered than the faces with reductions in these features. The two features thus covaried with accuracy in the hypothesized direction, i.e., facilitating memory performance. Participants responded significantly faster to the faces in which the eyes had a higher number of corneal reflections, whereas increased skin contrast yielded significantly longer response times. This difference in the direction of the observed effects on response times was not predicted. A more detailed skin appearance, as indicated by higher values of skin contrast, may have required more processing time because skin is a more global feature (compared to corneal reflections, which are more localized).

Based on these results with virtual faces, a face is more likely to be remembered when its skin texture is rougher (rather than smoother) and its eyes include corneal reflections (rather than having them reduced). The downside of Experiment 1, however, is that we did not control for any extraneous factors (e.g., face distinctiveness). It is possible that distinctiveness, known to affect the ability to remember faces (Valentine, 1991), covaried with the features of interest and thus explains the results as much as those features do. A controlled set of faces is therefore needed to seek convergent evidence regarding the extent to which altering skin contrast and the presence of corneal reflections affect face memory. We address this caveat in Experiment 2.

Experiment 2

Experiment 2 used human faces and examined how face memory was affected by the extent of manipulation made to different faces by smoothening the skin and removing/preserving the corneal reflections. We predicted that violations in the two features would negatively impact memory performance. Based on the prior literature (Vaitonytė et al., 2021), we expected that the absence of corneal reflections would exert a stronger influence, in that the worst memory would be observed with respect to the faces lacking corneal reflections, with and without changes in skin.