What’s in a face? The role of facial features in ratings of dominance, threat, and stereotypicality

Faces judged as stereotypically Black are perceived negatively relative to less stereotypical faces. In this experiment, artificial faces were constructed to examine the effects of nose width, lip fullness, and skin reflectance, as well as to study the relations among perceived dominance, threat, and Black stereotypicality. Using a multilevel structural equation model to isolate contributions of the facial features and the participant demographics, results showed that stereotypicality was related to wide nose, darker reflectance, and to a lesser extent full lips; threat was associated with wide nose, thin lips, and low reflectance; dominance was mainly related to nose width. Facial features explained variance among faces, suggesting that face-type bias in this sample was related to specific face features rather than particular characteristics of the participant. People’s perceptions of relations across these traits may underpin some of the sociocultural disparities in treatment of certain individuals by the legal system.


Significance statement
Faces judged as stereotypically Black (i.e., Afrocentric) are perceived negatively relative to less stereotypical faces, and this face-type bias influences a variety of real-world outcomes including employment and legal decisions. Dominance is a first-impression trait that is cued by facial structure and is associated with threat and criminality. In this experiment, we investigated whether facial features that are perceived as dominant and threatening may be consistent with stereotypically Black features and thereby explain some of the biased treatment of people who have this face-type. Artificial faces were constructed to manipulate facial features to study the relations among perceived dominance, threat, and Black stereotypicality. People were shown faces with different combinations and variations of the facial features typically associated with stereotypicality: nose width, lip fullness, and skin tone (here manipulated as reflectance, i.e., shadowing and texture). After presentation, people judged how well each face represented the three factors of interest (traits). Results showed that stereotypicality was related to wide nose and darker reflectance and to a lesser extent full lips; threat was associated with wide nose, thin lips, and low reflectance; dominance was mainly related to nose width. People were influenced by the facial features when making trait judgments, while the demographics of the perceiver (race, age, gender) did not change how the faces were judged. These results suggest that the extent to which people perceive dominance, threat, and stereotypicality as related may underpin some of the sociocultural disparities in treatment of certain individuals in an applied context.
The "Barbie Bandits", two attractive teenage girls who robbed banks in Georgia (Joseph, 2009), likely were successful during their heists because they surprised bank tellers with their atypical appearance. Jeremy Meeks, the "Sexy mugshot guy" (Rayne, 2016), who was arrested for robbery and assault, gained notoriety and a modeling contract as a result of good looks, despite his criminal activity. People judge faces quickly, making first impression judgments in as little as 100 ms (Bar et al., 2006; Willis & Todorov, 2006). Speeded judgments are often biased and based on little or no information about actual behavior. Instead, people form impressions of one another and assume character traits based in part on facial structure and the extent to which facial cues support preconceived expectations for behavior (Blair et al., 2004a, 2004b; Dotsch & Todorov, 2012; Kleider-Offutt et al., 2017a, 2017b). Face judgment research finds commonalities in facial structure that lead to judgments of dominance, trustworthiness, and a variety of other trait-based assumptions (for a review, see Zebrowitz et al., 2011). These judgments may play a role in how people are perceived and may relate to important applied decisions, such as political elections (Todorov et al., 2005), military rank (Mazur et al., 1984; Mueller & Mazur, 1998), and court system outcomes relating to sentence severity and guilty verdicts (Blair et al., 2004a, 2004b; Kleider-Offutt et al., 2017a, 2017b; Porter et al., 2010). These face trait judgments occur for race- and gender-ambiguous faces, suggesting that susceptibility to biased assessment may be ubiquitous (Ito et al., 2011; Kaminska et al., 2020). However, in scientific research and the news media, Black faces specifically garner biased judgment (Dixon, 2017; Dixon & Azocar, 2007; Kleider-Offutt, 2019; Kleider-Offutt et al., 2017a, 2017b).
The focus of the current study is to identify facial features associated with assumed behavioral traits that underpin biased judgments of Black individuals.
Black men, specifically, are vulnerable to face-type bias and assumed criminality due to associations with the Black man criminal stereotype (Kleider et al., 2012; Kleider-Offutt, 2019; Kleider-Offutt et al., 2018; Knuycky et al., 2014). Black men with stereotypically Black features are often judged more negatively and as more criminal in real-world and laboratory settings than are their counterparts who possess more atypical features (Blair et al., 2004a, 2004b; Kleider et al., 2012). In addition, men with more stereotypical features are more likely to be misidentified (Flowe & Humphries, 2011; Kleider-Offutt et al., 2017a, 2017b) and given more punitive sentences (Eberhardt et al., 2006) than are Black men judged as possessing fewer stereotypical features in criminal cases. For example, Black men who were misidentified as the perpetrator in a crime, incarcerated, and later exonerated based on DNA evidence (i.e., factually innocent), were judged by an independent sample of people as being more stereotypically Black than were Black exonerees who were falsely incarcerated for reasons other than eyewitness identification error (Kleider-Offutt et al., 2017a, 2017b). These findings suggest a bias to associate certain face-types with negative (e.g., criminal) actions (Kleider-Offutt et al., 2017a, 2017b).
Discussions around what drives this bias suggest that stereotypically Black features may activate negative racial stereotypes that can result in associations with fear (Golkar et al., 2015; Olsson et al., 2005). A body of research is focused on identifying what aspects of a Black face lead to negative associations for White participants. Some studies find that darker skin tone is what drives the effect (Maddox & Gray, 2002). Alternatively, some research suggests that facial features and skin tone are used together (Deregowski et al., 1975; Livingston & Brewer, 2002), while others argue that they are used independently to inform these negative associations (for a review, see Hagiwara et al., 2012; Stepanova & Strube, 2009). Although this is important work that aims to better understand what features cue negative responses, these studies did not test the specific features, or combination of features, that compose a stereotypical Black face, which is the next step in understanding why some within-race faces are judged especially harshly. One study did test specific features to determine prototypicality for several race groups. Strom et al. (2012) tested how facial metrics (e.g., face width, feature size) and skin tone influenced judgments of prototypicality across Black, White, and Korean faces. Results for Black faces showed that facial metrics had the biggest influence on White perceivers' prototypicality ratings, while skin tone was consistently impactful for Black and Korean perceivers. Black face prototypicality was not specifically identified by metrics; however, relative to White faces, Black faces were rated as having a wider nose, thicker lips, and a wider jawline (Strom et al., 2012). Aside from this study, the bulk of the research that attributes behavioral associations to Black face-types generally suggests that stereotypicality includes some combination of a wider nose, fuller lips, and darker skin (e.g., Blair, 2006; Blair et al., 2004a, 2004b).
Thus, testing and identifying what features specifically define a stereotypically Black face will inform what cues associations to criminality and negative judgments. People have stereotypes about what makes a criminal face (MacLin & Herrera, 2006; MacLin & MacLin, 2004): such faces are expected to have long, shaggy, dark hair; tattoos; beady eyes; pockmarks; and scars. Faces rated high in criminality may also be identified from police lineups on appearance alone (Flowe & Humphries, 2011), and such a response is associated with criminal face-type bias. Similarly, participants making speeded first impression judgments of convict faces revealed that criminality was determined immediately and was related to judgments of low trustworthiness and high dominance (Klatt et al., 2016). These studies focused on Caucasian faces, but similar biases occur for Black faces (e.g., Kleider et al., 2012).
How people form these judgments so quickly is a point of discussion. One idea is that people infer personality traits from the similarity of a person's facial features to emotional expressions (i.e., the Emotion Overgeneralization hypothesis; Zebrowitz, 2004). Emotionally neutral faces that look angry are perceived as high in dominance, while neutral faces that appear happy are perceived as trustworthy. To test the influence of these traits on criminality, Flowe and Humphries (2011) had participants rate cropped faces of actors and inmates, with no clothing or background information available, on criminality, anger, dominance, trustworthiness, and maturity (i.e., baby-facedness). Results showed that, regardless of face group, both male and female faces that were judged high in criminality were also judged as high in dominance and low in trustworthiness, with angry faces being perceived as the most dominant. This suggests that a possible cue to determining that a face is threatening (i.e., associated with fear) and also criminal is the extent to which the face looks dominant. This relationship is borne out by face trait models showing that the more dominant a face is perceived to be, the more threatening it is judged, and that these impressions of threat are closely tied to criminal appearance (Funk et al., 2017).
To investigate the relationship between facial cues and trait assessments, Oosterhof and Todorov (2008) hypothesized a framework for face evaluation. They used a data-driven approach, based on principal components analysis of 2D facial images, wherein people made judgments of face traits and the researchers then determined which facial features mapped onto which traits. Through this computational modeling approach, they could model social perception of faces tied to facial structure that influenced a specific judgment, such as dominance or trustworthiness. Using this approach, they could modify the structure of new faces to increase or decrease how trustworthy or dominant they looked. These models have been examined in several studies (Todorov et al., 2013; Walker & Vetter, 2009), suggesting that spontaneous trait inferences made based on facial appearance are derived from valence and dominance. In Todorov et al.'s (2008, 2011, 2013) model of face evaluation, valence is a cue to whether a person should be approached or avoided, while dominance cues the likelihood of a person inflicting physical harm. Features of faces associated with happiness and anger (i.e., valence) are overgeneralized to determine whether a person is trustworthy and should be approached or avoided. Facial features that appear dominant (e.g., looking more masculine or mature) are used to evaluate physical strength. From an evolutionary standpoint, these findings suggest that these cues are adaptive for determining who to approach and who to avoid. In support of this idea, Todorov et al. (2013) found that assessments of threat derived from facial appearance are negatively associated with perceptions of trustworthiness and positively associated with perceptions of dominance. In a similar vein, Hehman et al. (2017) investigated the contribution of dominance, trustworthiness, and youthful-attractiveness on face judgments focusing on the different contributions of the perceiver and the stimuli. They found that trait-based factors representing character (e.g., dominance) are driven more by the perceiver than are factors based on appearance (e.g., attractiveness). The authors explained how cross-classified regression can estimate the amount of variance due to faces, raters, and error, and that trait impressions are derived from several sources.
What makes a face dominant, trustworthy, and threatening is well established; what is less clear is what features, or combination of features, make a face stereotypically Black, and how those features may relate to these other traits. Could it be that features that are consistently rated as dominant are consistent with features that are rated as stereotypically Black, and therefore threatening? The current study takes the next step in addressing this gap in the literature.
We plan to evaluate whether specific facial features, or combinations of features, considered stereotypically Black are also associated with dominance and threat. We hypothesize that Black stereotypicality, dominance, and threat will be positively related traits. To test this expectation, we will focus on three main aims: (1) to examine how lip fullness, nose width, and skin reflectance correspond to ratings of dominance, threat, and stereotypicality; (2) to examine the extent to which rater characteristics may affect face ratings; (3) to evaluate the extent to which ratings of dominance, threat, and stereotypicality are related to each other after accounting for the effects of facial features and rater demographics.
Together these results will help to determine whether some of the bias found in judgments of more versus less stereotypically Black faces is underpinned by feature judgments that are afforded to all faces with these features. In addition, the participant sample used in this study is primarily Black women, while much of the research to date on face-type bias focuses on a White sample. Assessing trait judgments in a sample of people who are the target of the biased judgments will aid in understanding not only the cultural implications of face-type bias but the ubiquitous nature of such judgments. Moreover, this work addresses the need for face perception research to extend beyond primarily White samples, as the fluidity of face judgments may be based on context and the racial group that one identifies with (Willadsen-Jensen & Ito, 2008). Kleider-Offutt et al. Cogn. Research (2021) 6:53

Materials
FaceGen Modeller software (Singular Inversions, Toronto, Canada) was used to generate an average, baseline face (i.e., no feature manipulations) that was subsequently altered on different feature dimensions to create our core stimulus set. Stimulus faces were computer-generated (Fig. 1) to afford complete control over feature manipulations. Additionally, faces were presented without hair or specific skin tone (i.e., faces were racially ambiguous): each face was initially generated as a 'European' face in FaceGen Modeller and further adjusted within the software to appear slightly darker in complexion, to isolate responses to the manipulated features as much as possible. Faces were presented in full color to participants.
Building from the average, baseline face, each successive stimulus face was manipulated to contain a specific level of nose width (wide, average, thin), lip fullness (full, average, thin), and/or reflectance (skin texture and brightness; none, medium, or high). Nose and lip features, specifically, were adjusted using the built-in sliding scale controls in FaceGen Modeller. Furthermore, each level of each feature (e.g., thin nose, full lips, etc.) was scaled to the same value for each face with that specific feature. To achieve varying levels of reflectance, we altered the contrast of the photographs (i.e., no contrast [no reflectance], 50% contrast [medium reflectance], 100% contrast [high reflectance]). It is important to note that reflectance is not meant to cue race in this paradigm, but rather we are interested in whether manipulations of skin texture and brightness, which have previously been shown to signal dominance and threat, interact with nose and lip manipulations to influence judgments of perceived stereotypically Black faces.
In total, nine faces were created with different combinations of nose width and lip fullness, and each of these nine faces was further manipulated for each level of reflectance. These three features, with three levels each, yielded a set of 27 distinct stimulus faces in total. While the stimulus set is relatively small, we maintained maximal control over the unique faces, which allowed us to assess the individual and combined influence of each feature on our outcomes of interest. Furthermore, preratings of the stimuli were not collected since the goal of the study was to obtain information regarding first impressions of specifically manipulated facial characteristics (nose, lips, and reflectance).
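The full crossing described above can be enumerated in a few lines. The following is only an illustrative sketch: the level labels come from the text, while the dictionary keys and the function name `build_stimulus_set` are hypothetical and not part of the original materials.

```python
from itertools import product

# Level labels as given in the Materials section.
NOSE_LEVELS = ("thin", "average", "wide")
LIP_LEVELS = ("thin", "average", "full")
REFLECTANCE_LEVELS = ("none", "medium", "high")


def build_stimulus_set():
    """Return one dict per face for the full 3 x 3 x 3 feature crossing.

    The dict keys are illustrative names, not identifiers from the study.
    """
    return [
        {"nose": nose, "lips": lips, "reflectance": refl}
        for nose, lips, refl in product(NOSE_LEVELS, LIP_LEVELS, REFLECTANCE_LEVELS)
    ]


stimuli = build_stimulus_set()

# Nine nose-by-lip combinations, each rendered at three reflectance levels.
nose_lip_pairs = {(s["nose"], s["lips"]) for s in stimuli}
print(len(nose_lip_pairs), len(stimuli))  # prints "9 27"
```

Because every cell of the design is present exactly once, each feature level appears in nine faces, which is what allows the individual and combined influence of the features to be separated.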

Procedure
In a computer laboratory with seven partitioned workspaces, each participant was randomly presented with the 27 unique facial models sequentially at the center of their computer screen. Before the presentation of each stimulus face, a fixation cross appeared in the center of the screen for 500 ms. The fixation cross was then replaced by a stimulus face for an additional 500 ms. Although prior literature has shown that individuals can form a first impression in as little as 38 ms, initial test subjects were given 100 ms to view a face. However, participants expressed stress and discomfort concerning the speed of presentation time. Thus, the stimulus presentation time was increased to 500 ms to reduce the likelihood of a potential stress response among participants, while also maintaining the desired speeded nature of the task.
Following the presentation of each face, participants provided judgments on a variety of randomized perceived inherent traits of the face and behavioral attributes (e.g., dominance, stereotypicality, threat). The full list of traits and application-based questions that were assessed, including those not used in the current report, can be found in "Appendix 1". The response scale ranged from 1 (not at all) to 7 (extremely) for each trait judgment. Participants had unlimited time to make their response via keypress (1-7). No two participants saw the facial stimuli presented in the exact same order, nor did participants make behavioral/applied judgments in the same order for each face. Given that both the facial stimuli and associated judgments were fully randomized for each participant, we did not expect any carryover effects.
It is important to note that each of the 27 stimulus faces was shown for a total of 14 consecutive trials in which respondents would rate the face on eight traits and then respond to six applied judgment questions. These trait ratings and judgment questions are listed in "Appendix 2". In these trials, a face would appear for 500 ms, then a judgment question, then the same face would appear for another 500 ms followed by a different judgment question, and so on until that specific stimulus had been rated on 14 different trait and judgment questions. As an initial complex multivariate model, we present in this paper an analysis of the three traits of stereotypicality, dominance, and threat.
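To make the trial structure concrete, the sketch below builds one participant's schedule under stated assumptions. The counts (27 faces, eight trait ratings, six applied questions, 500 ms exposure) come from the text; the question labels, the per-participant seed, and the choice to shuffle trait and applied questions separately (traits first) are assumptions made only for illustration. Fixation-cross timing is left out because the text does not say whether it preceded every repeated exposure.

```python
import random

N_FACES = 27      # distinct stimulus faces (from the text)
N_TRAITS = 8      # trait ratings per face (from the text)
N_APPLIED = 6     # applied judgment questions per face (from the text)
FACE_MS = 500     # exposure before each question (from the text)


def build_schedule(seed):
    """Return a list of trials for one participant.

    Assumption: faces are shuffled per participant, and for each face the
    trait questions are shuffled and asked before the shuffled applied
    questions. Question labels are placeholders, not the study's wording.
    """
    rng = random.Random(seed)
    face_order = list(range(N_FACES))
    rng.shuffle(face_order)

    schedule = []
    for face in face_order:
        traits = [f"trait_{i}" for i in range(N_TRAITS)]
        applied = [f"applied_{i}" for i in range(N_APPLIED)]
        rng.shuffle(traits)
        rng.shuffle(applied)
        # The same face is re-shown for FACE_MS before every question.
        for question in traits + applied:
            schedule.append({"face": face, "question": question, "face_ms": FACE_MS})
    return schedule


schedule = build_schedule(seed=1)
print(len(schedule))  # 27 faces x 14 questions = 378 trials
```

Under this reading each participant completes 378 rating trials, of which only the 81 involving dominance, threat, and stereotypicality enter the analysis reported here.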
After completing the face rating task, participants completed the Symbolic Racism 2000 Scale (Henry & Sears, 2002; not included in the following models) and a brief demographics questionnaire.

Analysis
The model was a joint set of three multilevel regressions: responses to 27 faces within 341 raters, where each rating (dominance, threat, or stereotypicality) was predicted by facial features and rater characteristics. The general form of this model of face f by rater r can be conceptually represented by:
$$\mathrm{Rating}_{fr} = \mathrm{Nose}_{f} + \mathrm{Lip}_{f} + \mathrm{Reflectance}_{f} + \mathrm{Race}_{r} + \mathrm{Age}_{r} + \mathrm{Gender}_{r} + e_{r}$$
where $\mathrm{Rating}_{fr}$ represents the response for that trait (stereotypicality, dominance, or threat); $\mathrm{Nose}_{f}$ represents the level of nose width (thin, average, or wide) for that face; $\mathrm{Lip}_{f}$ represents the level of lip fullness (thin, average, full) for that face; $\mathrm{Reflectance}_{f}$ represents skin texture and brightness (low, moderate, high) for that face, plus all two-way interactions for these three features (not shown); $\mathrm{Race}_{r}$ represents the race of the rater (Black, White, Hispanic/Latinx, Asian, Biracial, or other); and $e_{r}$ is random error. The full representation of these variables and how they were coded is presented in "Appendix 2". This regression was fit jointly for the three traits: dominance, threat, and stereotypicality (i.e., as a simultaneous structural equation model of three rating outcomes). All models were fit in Mplus version 8.1 (Muthén & Muthén, 2017), treating the 7-point ratings as continuous (findings were highly similar when we treated the ratings as categorical, so we chose to report here the simpler, continuous score model).
Because ratings were nested within faces and within raters (i.e., were cross-classified; Hehman et al., 2017), we initially fit a trivariate model of the three outcome ratings nested within faces and raters. However, in this cross-classified model, the face level was nearly fully explained, with near-zero residual variances. This is an understandable finding because we modeled all 27 feature patterns, which were designed into the study. We therefore fit the same data to a two-level model of ratings in raters, and the model fit was essentially the same (cross-classified DIC = on 96 parameters; two-level DIC = on 90 parameters). Moreover, we graphed the model-based predictions and found no strong substantive differences. We therefore present the technically simpler two-level results.
In addition, we wish to evaluate the pattern of all possible effects and to discourage dichotomous yes/no thinking about individual effects (especially in the presence of interactions). We therefore focus on the overall model-implied effects in the graphs, presented in the appendices, which are expected to be invariant under different coding schemes. Indeed, we wish to discourage unrealistic, overly narrow reliance on p-values for individual effects because our model is attempting to capture the design of all 3 features, each with 3 levels. We therefore rely on the graphs of the model-implied effects, rather than estimates of individual parameters.

Question 1: what features predict dominance, threat, and Black stereotypicality?

Effects of facial features
Facial features were modeled as contrasts of two extremes, each around an average: noses were thin, average, or wide; lips were thin, average, or full; reflectance was none, medium, or high. These features were modeled as dummy variables for thin and for wide/full (versus average) noses and lips, and for none/high versus medium reflectance (see "Appendix 2" for equations). This coding scheme allowed us to directly model every condition in the experiment without making assumptions of linearity or equal intervals between low, average, and high conditions of the facial features. In addition, all two-way interactions of these dummy variables were also modeled. Because these six main effects and 12 two-way interactions can be cumbersome to display and difficult to interpret, we present graphs for the model-implied effects on each of the three outcomes. The parameter estimates for the predictions by facial features are presented in "Appendix 3" Table 2.
Figure 2 presents the model-estimated ratings for dominance. The vertical axis is the predicted rating on the 7-point scale. The horizontal axis represents nose width: thin, average, and wide. Each panel represents one level of lip fullness: thin, average, and full. Within each panel, there is a separate line for reflectance: none (light gray), medium (gray), and high (black). The dotted horizontal line represents the model-predicted average (intercept).
The steep upward slopes in Fig. 2 show an appreciable effect for nose width, suggesting that wider noses were seen as more dominant while thinner noses were seen as less dominant. The effect of nose width ranged up to around half a unit on the 7-point scale. The other lines did not differ much from each other, and all lines in the three panels fell within half a unit of four, suggesting only small effects of lip fullness and reflectance on judgments of dominance.
Fig. 2 Model-predicted dominance ratings. The vertical axis is the predicted dominance score, with a dotted line for the model-implied mean. The horizontal axis represents the levels of nose width. The three lines represent degrees of reflectance (none, medium, high). Each panel represents the degree of lip fullness
Figure 3 shows the estimated ratings for threat. In all three panels, there is an upward trend for wide noses, but negligible differences for thin versus average noses. This suggests that wider noses were generally seen as more threatening. The line for no reflectance (light gray) is the highest, while there was little distinction between medium (gray) and high (black) reflectance. Thus, the absence of reflectance (i.e., none) was associated with increased judgments of threat. Thin lips (left panel) were generally more threatening than average and full lips (middle and right panels, respectively).
Fig. 3 Model-predicted threat ratings. The vertical axis is the predicted threat score, with a black, dotted line for the model-implied mean. The horizontal axis represents the levels of nose width. The three lines represent degrees of reflectance (none, medium, high). Each panel represents the degree of lip fullness
Figure 4 shows the estimated total effects for stereotypicality. The steep upward slope across all three panels suggests that wider noses were seen as more stereotypical of Black faces, while thinner noses were seen as less stereotypical. In a similar fashion, there is a moderate amount of separation between high (black), medium (gray), and low (light gray) reflectance, with high reflectance positioned higher on the scale and low reflectance positioned lower. Therefore, higher reflectance was associated with increased judgments of stereotypicality (higher lines overall), whereas lower reflectance was associated with decreased judgments of stereotypicality (lower lines). Full lips (right panel) were slightly more stereotypical (higher lines) than average and thin lips (middle and left panels, respectively), although this distinction is somewhat diminished (i.e., modulated) by the combinations of nose width and reflectance.
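The dummy coding described at the start of this section (two dummy variables per feature against the middle level, plus all two-way interactions across features) can be sketched as follows. This is an illustration only, not the authors' Mplus specification: the column names and the function `design_row` are hypothetical, and the actual coding equations are those in "Appendix 2".

```python
from itertools import combinations, product

# Reference (middle) level listed first for each feature, as in the text.
FEATURES = {
    "nose": ("average", "thin", "wide"),
    "lips": ("average", "thin", "full"),
    "reflectance": ("medium", "none", "high"),
}


def design_row(face):
    """Return dummy-coded main effects and two-way interactions for one face."""
    main = {}
    for feature, (_ref, *others) in FEATURES.items():
        for level in others:
            main[f"{feature}_{level}"] = int(face[feature] == level)

    row = dict(main)
    # Interactions are products of dummies from two different features.
    for f1, f2 in combinations(FEATURES, 2):
        for l1, l2 in product(FEATURES[f1][1:], FEATURES[f2][1:]):
            row[f"{f1}_{l1}:{f2}_{l2}"] = main[f"{f1}_{l1}"] * main[f"{f2}_{l2}"]
    return row


faces = [dict(zip(FEATURES, levels)) for levels in product(*FEATURES.values())]
rows = [design_row(face) for face in faces]

n_main = sum(1 for name in rows[0] if ":" not in name)
n_interactions = sum(1 for name in rows[0] if ":" in name)
print(n_main, n_interactions)  # prints "6 12"
```

With this coding the face that is average on nose and lips with medium reflectance has zeros in every column, so the intercept is its predicted rating and each coefficient is a departure from that reference face.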

Effects of rater characteristics
The parameter estimates for the rater level of the full, conditional model are presented in "Appendix 3" Table 3. The three predictors were older age (> 21 years old), race (White, Asian, Hispanic/Latinx, Biracial, or other, each coded as its own dummy variable), and nonfemales (91.4% male, 6.2% non-binary, 2.4% prefer not to respond). These predictors were in reference to younger adults, Black participants, and women, respectively.
Age had no substantial effects on any of the three outcomes of interest. Gender had a small effect on judgments of dominance, such that non-female participants judged faces as less dominant (β = −0.14). Additionally, participant race had an effect on judgments of Black stereotypicality, such that White (β = −0.31), Asian (β = −0.21), and Hispanic/Latinx (β = −0.20) participants judged faces as less stereotypical. Race also had a small effect on perceptions of threat, such that Asian participants rated faces as more threatening (β = 0.16).
Table 1 shows the correlations among the three outcomes after accounting for the effects of rater demographics (i.e., rater-level random effects). The model-implied means (intercepts) and standard deviations of the ratings are shown at the bottom of the table. The three traits were all positively correlated with one another (r = 0.34-0.55). This finding indicates that, even after accounting for rater demographics and facial features, ratings of threat, dominance, and stereotypicality were positively related.

Discussion
Inequality in the way people are judged and ultimately treated in a variety of contexts, including the legal system, may begin with biased first impressions based on facial features. Previous research links first impression judgments of certain facial features/structure to perceptions of dominance and indicators of threat (Toscano et al., 2016). Moreover, facial features that are perceived as being stereotypically Black are touted as a harbinger for biased judgments related to criminality (e.g., Kleider et al., 2012; Kleider-Offutt et al., 2017a, 2017b). However, perceptions of faces likely result from a combination of the features of the faces as well as differences among the perceivers. The current study investigated what specific facial features were associated with judgments of Black stereotypicality and whether these features were also perceived as dominant and threatening. This research may provide initial information to better understand within-race variability in treatment and why some Black individuals are perceived as dominant and/or threatening without performing any overt actions to indicate negative behavior.

Overview of findings
Face judgment is complex, involving faces and raters (Hehman et al., 2017). Ignoring such differences due to faces and raters may be misleading with regard to relationships across traits. For example, zero-order correlations ("Appendix 3" Table 4) would suggest that dominance is positively related to both threat (r = 0.34) and stereotypicality (r = 0.20), but that ratings of threat and stereotypicality are essentially unrelated. However, our model shows that all three traits were positively related (r = 0.34-0.55) after controlling for facial features and rater differences. Also, our results suggest that participant demographics do little to explain the relationship between these traits. This suggests that the relationship across these three traits is largely driven by facial features and not driven by the specific perceiver demographics (i.e., race, age, gender) assessed in this study. It may be that the relationships among these traits are due in part to ubiquitous facial structure cues or due to features of the perceivers not tested here.
Fig. 4 Model-predicted stereotypicality ratings. The vertical axis is the predicted stereotypicality score, with a black, dotted line for the model-implied mean. The horizontal axis represents the levels of nose width. The three lines represent degrees of reflectance (none, medium, high). Each panel represents the degree of lip fullness
Previous applied research on Black face-type bias describes stereotypically Black features as a combination of nose and lip width and skin tone (here reflectance); thus, this research focused on only those features. Importantly, we found that nose width, lip fullness, and reflectance had complex effects that differed by the trait being rated. A wide nose, thin lips, and the absence of reflectance were associated with higher ratings of threat. A wide nose and higher reflectance were associated with increased judgments of stereotypicality. A wide nose was the only feature substantially related to higher ratings of dominance. This suggests that a stereotypical Black face includes a wide nose and high skin reflectance but the only feature that is consistent with dominance is a wide nose. Black stereotypicality, dominance, and threat were related to faces with a wide nose. This potentially suggests that nose width is a cue indicative of a Black face but may simultaneously cue dominance and threat. Moreover, the finding that higher skin reflectance was related to Black stereotypicality but not dominance or threat is inconsistent with other literature. If higher reflectance is a shading or texture of skin, it makes sense that this would be tied to Black stereotypicality in line with previous research (e.g., Livingston & Brewer, 2002; Maddox & Gray, 2002). However, Todorov et al. (2013) found that in addition to dominance, threat is also cued by higher reflectance, while we found the opposite.
In line with this idea, the three traits were positively correlated among raters (r = 0.34-0.55), suggesting moderate to strong consistency: personal biases, not explained by demographic differences, may influence trait judgments to a fair extent. Overall, this suggests that even after controlling for facial features and demographics, participants agree to a moderate to strong degree that stereotypically Black faces are dominant and threatening.
Together, these results suggest that a stereotypical face-type is a combination of wide nose and higher reflectance and, to a lesser extent, full lips. Thus, a face is not likely to be judged as stereotypical based on full lips alone. This refines and validates previous work noting that a stereotypically Black face is some combination of a wide nose, full lips, and darker skin (e.g., Blair, 2006; Blair et al., 2004a, 2004b). The current study shows that among a diverse population of mostly non-White people, a stereotypical Black face is cued by a wide nose and higher reflectance. In addition, the relations among trait dominance, threat, and stereotypicality suggest that a wide nose, consistent for all three traits, may play a role in some Black people being judged as dominant and threatening. Compared to a person who is less stereotypically Black, with lighter reflectance and a relatively narrow nose, a stereotypically Black person is more likely to be judged as dominant and threatening, and potentially perceived negatively, by people making quick judgments. This work also suggests that for people who are not White, as in our sample, Black stereotypicality is related to threat and dominance (r = 0.34 each). Although demographic differences did not substantially influence our outcomes, we suggest that the racial makeup of our sample may be why some of our results diverge from previous work regarding reflectance, threat, and dominance (Todorov et al., 2013). From an applied standpoint, face-type bias related to Black stereotypicality may lead to judgments of dominance, which is positive in some circumstances (e.g., for a boxer or military personnel) and less advantageous in others, where it can lead to negative judgments. Together, these findings suggest, potentially, that when people see a stereotypically Black face, it may cue assessments of dominance and threat, which are traits consistently related to criminality.
Thus, it may be that some aspects of the facial features tested here underpin criminal face-type bias reported in previous research. These effects upon ratings cued by these facial features are important because without contextual information, people are left to rely on hasty first impression cues to predict traits or behavior, and perceivers are likely to rate these different traits fairly similarly.

Limitations
While we used the demographic information available in our model, our sample of raters was primarily young, Black, and female, and likely does not allow powerful tests of differences in ratings due to age, gender, and race. A more diverse sample would be informative and could yield not merely more generalizability, but interesting tests of differences in perceptions. However, it is noteworthy that our sample diverges from much of the previous research focused on face-type bias, which has tested trait assumptions within majority White samples (e.g., Blair, 2006; Blair et al., 2004a, 2004b; Eberhardt et al., 2006; Hagiwara et al., 2012). Although our study may be limited in generalizability, using a sample of people who may be the target of Black face-type bias is especially important. The findings here suggest that even people who are part of a minoritized group, and who may themselves have encountered racial bias, are still prone to judge features representative of their racial group as dominant and threatening in some circumstances, lending support to the ubiquitous nature of biased racial judgments. In addition, we intentionally used a small set of features on artificial faces. The facial features in the current study were specifically designed and controlled to test features considered to be stereotypically Black and/or dominant in previous applied studies. More variation on more features with more faces could also provide more information about effects upon perceptions.

Conclusion
The current sociocultural climate suggests that there is a need for people to be more cognizant of how they perceive and interact with individuals from different groups. First impressions based on facial features can lead to face-type bias and can serve as a vehicle to perpetuate faulty expectations of behavior. Throughout the legal system, people are assessed from the time of first interview (e.g., when stopped on the street or pulled over in their vehicle) to trial and sentencing. An awareness of race-based biases in face judgment could be disseminated throughout the legal system as training for law enforcement and triers of fact as well as become part of jury instruction to community members who serve as jurors. An awareness of biased tendencies will not stop people from having a bias but may slow knee-jerk decisions that are made prior to considering facts and evidence. Most misidentified men who were exonerated based on DNA evidence are Black (The Innocence Project, 2021), which suggests biased expectations are at work. Knowing that some Black individuals are judged as dominant and possibly threatening based on their facial structure should encourage citizens, law enforcement, and the legal system generally, to pause before making judgments that could have long-term impact.

Appendix 1
Full list of traits assessed (1 (not at all) to 7 (extremely)): How ________ was the previous face?:

Full list of applied questions assessed (1 (not at all likely) to 7 (extremely likely)): How likely is it that you would ________?:
• Sit next to this person on the bus?
• Share an Uber with this person?
• Talk to this person at a party?
• Trust this person with your money?
• Vote for this person in an election?
• Trust this person to deliver a valuable package?

Cross-classified model equations
The cross-classified model for i ratings given by j raters across k faces upon the three traits t was specified as:

Level i (ratings):

$$Y_{ijkt} = \beta_{1jt} + \beta_{2kt} + e_{ijkt}$$

where $Y_{ijkt}$ is the rating i given by person j for face k upon trait t. $\beta_{1jt}$ is the main effect of person j upon trait t. $\beta_{2kt}$ is the main effect of face k upon trait t. $e_{ijkt}$ is random error, distributed normally with free covariance across the three traits (i.e., each with mean zero and freely estimated variance and covariances).
Level j (raters):

$$\beta_{1jt} = \gamma_{10t} + \gamma_{11t}\,\mathrm{White}_j + \gamma_{12t}\,\mathrm{Asian}_j + \gamma_{13t}\,\mathrm{Hisp}_j + \gamma_{14t}\,\mathrm{Old}_j + \gamma_{15t}\,\text{Non-Female}_j + u_{1jt}$$

where $\gamma_{10t}$ is the grand mean of trait t. $\gamma_{11t}\,\mathrm{White}_j$ is the fixed effect of White (relative to Black raters) upon trait t. $\gamma_{12t}\,\mathrm{Asian}_j$ is the fixed effect of Asian (relative to Black raters) upon trait t. $\gamma_{13t}\,\mathrm{Hisp}_j$ is the fixed effect of Hispanic/Latinx (relative to Black raters) upon trait t. $\gamma_{14t}\,\mathrm{Old}_j$ is the fixed effect of being older than 21 years (i.e., dummy coded) upon trait t. $\gamma_{15t}\,\text{Non-Female}_j$ is the fixed effect of being non-female (male, non-binary, prefer not to respond) upon trait t. $u_{1jt}$ is random error for rater j, distributed normally with free covariance across the three traits (i.e., each with mean zero and freely estimated variance and covariances).
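Level k (faces):

$$\beta_{2kt} = \gamma_{21t}\,\mathrm{NT}_k + \gamma_{22t}\,\mathrm{NW}_k + \gamma_{23t}\,\mathrm{LT}_k + \gamma_{24t}\,\mathrm{LF}_k + \gamma_{25t}\,\mathrm{RN}_k + \gamma_{26t}\,\mathrm{RH}_k + (\text{two-way interaction terms with coefficients } \gamma_{27t}, \ldots, \gamma_{38t}) + u_{2kt}$$

where each of the three facial features is dummy coded relative to its midpoint (zero): nose (thin, wide: NT, NW), lips (thin, full: LT, LF), and reflectance (no, high: RN, RH). $\gamma_{21t}$ and $\gamma_{22t}$ are the fixed effects of thin and wide nose (relative to average nose) upon trait t. $\gamma_{23t}$ and $\gamma_{24t}$ are the fixed effects of thin and full lips (relative to average lips) upon trait t. $\gamma_{25t}$ and $\gamma_{26t}$ are the fixed effects of no and high reflectance (relative to average reflectance) upon trait t. $\gamma_{27t}$ through $\gamma_{38t}$ are the fixed effects of all two-way interactions of the respective facial features upon trait t. $u_{2kt}$ is random error for face k, distributed normally with free covariance across the three traits (i.e., each with mean zero and freely estimated variance and covariances).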
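The dummy coding described above can be illustrated with a short sketch. It assumes a full factorial of three levels per feature (27 hypothetical faces), which is not stated in this appendix, and the ordering of the interaction columns is arbitrary here rather than the ordering tied to particular coefficients in the fitted model. The function and variable names are illustrative only.

```python
from itertools import combinations, product

# Three levels per feature; the middle ("average") level is the reference, coded zero.
NOSE = ("thin", "average", "wide")
LIPS = ("thin", "average", "full")
REFLECTANCE = ("none", "average", "high")

# The two dummy variables that belong to each feature.
FEATURE_DUMMIES = (("NT", "NW"), ("LT", "LF"), ("RN", "RH"))


def main_effects(nose, lips, reflectance):
    """Six main-effect dummies for one face."""
    return {
        "NT": int(nose == "thin"), "NW": int(nose == "wide"),
        "LT": int(lips == "thin"), "LF": int(lips == "full"),
        "RN": int(reflectance == "none"), "RH": int(reflectance == "high"),
    }


def interaction_pairs():
    """All pairs of dummies drawn from two different features."""
    pairs = []
    for first, second in combinations(FEATURE_DUMMIES, 2):
        pairs.extend(product(first, second))
    return pairs


def design_row(nose, lips, reflectance):
    """Main-effect dummies plus the products forming the two-way interactions."""
    row = main_effects(nose, lips, reflectance)
    for a, b in interaction_pairs():
        row[f"{a}:{b}"] = row[a] * row[b]
    return row


# Hypothetical full factorial of the three features (an assumption of this sketch).
faces = list(product(NOSE, LIPS, REFLECTANCE))
design = [design_row(*face) for face in faces]
```

Under these assumptions each face has 6 main-effect dummies and 12 interaction products, one per coefficient in the face-level equation, and the all-average face is coded zero on every column.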

Two-level model equations
Because most of the variance components for faces ($u_{2kt}$) in the cross-classified model were estimated close to zero, we fit the model as a two-level model of i ratings within j raters for the three traits t. Because this is a simple restriction of the cross-classified model (no variance components for k faces), we keep the same notation, but drop the k subscripts: where $Y_{ijt}$ is the rating i given by person j upon trait t. $\beta_{1jt}$ is the main effect of person j upon trait t. $e_{ijt}$ is random error, distributed normally with free covariance across the three traits (i.e., each with mean zero and freely estimated variance and covariances).
where all parameters are the same as those provided for the cross-classified model; only the random effects for faces ($u_{2kt}$) are excluded (i.e., restricted to zero).
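As a rough illustration of what a model of ratings nested within raters implies, the sketch below simulates data for a single trait with a random rater intercept and one fixed face effect (a wide-nose dummy). It is not the fitted model: all parameter values are invented for illustration, only one trait and one feature are included, placing the face effect at the rating level is an assumption of the sketch, and simulated ratings are continuous rather than bounded between 1 and 7.

```python
import random
from statistics import mean


def simulate_two_level(n_raters=200, n_wide=9, n_other=18,
                       gamma_10=4.0, gamma_wide=0.5,
                       sd_rater=1.0, sd_error=1.0, seed=1):
    """Simulate ratings on one trait, with ratings nested within raters.

    All parameter values are illustrative assumptions, not study estimates.
    """
    rng = random.Random(seed)
    rows = []  # (rater id, wide-nose dummy, rating)
    for j in range(n_raters):
        # Rater level: intercept = grand mean + rater random effect
        beta_1j = gamma_10 + rng.gauss(0.0, sd_rater)
        for wide in [1] * n_wide + [0] * n_other:
            # Rating level: rater intercept + fixed face effect + random error
            y = beta_1j + gamma_wide * wide + rng.gauss(0.0, sd_error)
            rows.append((j, wide, y))
    return rows


def wide_nose_contrast(rows):
    """Mean rating of wide-nose faces minus mean rating of all other faces."""
    wide = [y for _, w, y in rows if w == 1]
    other = [y for _, w, y in rows if w == 0]
    return mean(wide) - mean(other)


def intraclass_correlation(sd_rater, sd_error):
    """Share of the random variance that is attributable to raters."""
    return sd_rater ** 2 / (sd_rater ** 2 + sd_error ** 2)


rows = simulate_two_level()
estimate = wide_nose_contrast(rows)
icc = intraclass_correlation(1.0, 1.0)
```

Because every simulated rater rates the same set of faces, the rater intercepts cancel in the wide-nose contrast, so the contrast recovers the fixed effect up to rating-level noise. With no face-level random effect, all remaining dependence among ratings comes from raters.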