Humans versus AI: whether and why we prefer human-created compared to AI-created artwork

Bellaiche, Lucas; Shahi, Rohin; Turpin, Martin Harry; Ragnhildstveit, Anya; Sprockett, Shawn; Barr, Nathaniel; Christensen, Alexander; Seli, Paul

doi:10.1186/s41235-023-00499-6

Original article
Open access
Published: 04 July 2023


Lucas Bellaiche ORCID: orcid.org/0000-0002-5548-4271¹,
Rohin Shahi¹,
Martin Harry Turpin²,
Anya Ragnhildstveit³,
Shawn Sprockett⁴,
Nathaniel Barr⁵,
Alexander Christensen⁶ &
…
Paul Seli¹

Cognitive Research: Principles and Implications volume 8, Article number: 42 (2023)


"We hear much these days about the remarkable new thinking machines. We are told that these machines can be made to take over much of men's thinking…eventually about the only economic value of brains left would be in the creative thinking of which they are capable…"

- J. P. Guilford, “Creativity” (1950), p. 446.

Abstract

With the recent proliferation of advanced artificial intelligence (AI) models capable of mimicking human artworks, AI creations might soon replace products of human creativity, although skeptics argue that this outcome is unlikely. One possible reason this may be unlikely is that, independent of the physical properties of art, we place great value on the imbuement of the human experience in art. An interesting question, then, is whether and why people might prefer human- compared to AI-created artworks. To explore these questions, we manipulated the purported creator of pieces of art by randomly assigning a “Human-created” or “AI-created” label to paintings actually created by AI, and then assessed participants’ judgements of the artworks across four rating criteria (Liking, Beauty, Profundity, and Worth). Study 1 found increased positive judgements for human- compared to AI-labelled art across all criteria. Study 2 aimed to replicate and extend Study 1 with additional ratings (Emotion, Story, Meaningful, Effort, and Time to create) intended to elucidate why people more-positively appraise Human-labelled artworks. The main findings from Study 1 were replicated, with narrativity (Story) and perceived effort behind artworks (Effort) moderating the label effects (“Human-created” vs. “AI-created”), but only for the sensory-level judgements (Liking, Beauty). Positive personal attitudes toward AI moderated label effects for more-communicative judgements (Profundity, Worth). These studies demonstrate that people tend to be negatively biased against AI-created artworks relative to purportedly human-created artwork, and suggest that knowledge of human engagement in the artistic process contributes positively to appraisals of art.

Introduction

Art is widely considered to be a uniquely human phenomenon. It can encapsulate and communicate our emotions, be used to express our individualistic and communal experiences, and serve as a social commentary on these experiences, all of which are elements that are commonly thought to be human-specific (Chatterjee, 2014). Yet, art is also a perceptual product that is engaged through the senses, sometimes independently of the experient’s awareness of the inclusion (or lack thereof) of these human elements in artworks they evaluate. Here, an important distinction emerges between art as (1) a purely physical stimulus and (2) a deeper communicative medium of the human experience. The importance of this distinction lies in its ability to permit investigations of the separate and joint influences that these two conceptions of art may have on people’s appraisals of art, which in turn contributes to our understanding of the factors and processes involved in aesthetic appraisals. And, with the recent development of highly advanced artificial intelligence (AI) models that can produce, as purely physical stimuli, high-quality artworks that are indiscernible from human-created artworks (Gangadharbatla, 2022; Johnson, 1997; Mazzone & Elgammal, 2019), the Computer Age in which we live has afforded considerable experimental control over these two conceptions of art. Here, across two studies, we explored the question of whether and why humans might tend to prefer ostensibly human-created artworks over AI-created artworks.

Although the process of appraising artworks undoubtedly involves a subjective element, consistent preferential patterns have nevertheless emerged across people’s aesthetic judgements of such artworks (e.g., blue is the most preferred color, whereas yellow is most disliked [Bornstein, 1975; Komar & Melamid, 1999; Palmer et al., 2013]; and representational art is enjoyed over abstract art [Heinrichs & Cupchik, 1985; Kettlewell et al., 1990; Knapp & Wulff, 1963; Mastandrea et al., 2011, 2021]). However, the ways in which we judge art are often sensitive to factors in the environment, not just intrinsic preferences. One common finding across literatures of judgement and decision-making is that labels play an important role in people’s general evaluations of things. For instance, people prefer wine they are falsely told is more expensive than other wine (Plassmann et al., 2008). The influence that a mere label has on our judgements is also made evident by research showing that, when participants are provided Coca-Cola either with or without its label, they report greater enjoyment of the labelled beverage than the non-labelled beverage, despite their otherwise identical compositions (McClure et al., 2004). As demonstrated by these studies, our judgements rely heavily on contextual information. And, perhaps unsurprisingly, the role that context plays in our appraisals extends to judgements of art as well. For example, when Newman and Bloom (2012) provided participants artworks labelled either as originals or as identical forgeries, participants preferred the originals over the forgeries: a finding that was taken to indicate that humans are sensitive to an authentic process of creative production as determined simply by a label. In other words, people consider non-sensory aspects of art, like context and background, in their judgements and evaluations of art (Blank et al., 1984; Chatterjee & Vartanian, 2016; Winner, 1982).

The influential role that labels play in our judgements is also made clear in contemporary lines of research focused on AI. The general finding emerging from this research is that, in many instances, the label of “AI” is taken to be a pejorative. For instance, Liu et al. (2022) conducted an experiment wherein participants were asked to read a number of emails and were told that some of the emails (but not others) were drafted with the help of an AI-based language model. Results indicated that, when participants were told that AI played a role in drafting the email, their reports of trust in the email writer decreased. This anti-AI bias seems especially apparent in domains that people assume to be human-specific (e.g., those concerning affect or creativity; Wilson, 2011). Research has only just begun to connect the spheres of aesthetics and AI, yet the discussion of AI-produced creativity arose many decades ago. Indeed, in his 1950 Presidential Address to the American Psychological Association (APA), J. P. Guilford foreshadowed the development of AI in modern society, noting that, if advances in “thinking machines” (i.e., AI) reached their forecasted height, then “the only economic value of brains left would be in the creative thinking of which they are capable” (Guilford, 1950). While Guilford’s position is certainly provocative, he may have overestimated the extent to which the apparent ‘last bastion’ of human utility, “creativity,” is uniquely human; instead, it could be that, with recent advances in “thinking machines,” even our creative abilities could lose their economic value when competing with the abilities of AI.

Initial forays into the AI-art interaction seem to suggest that, not only is AI art sufficiently sensorily similar to human-created art that people fail to accurately discern its true creator (human or AI), but also that people tend to derogate AI-created art as compared to human-created art (Chamberlain et al., 2018; Gangadharbatla, 2022; Mazzone & Elgammal, 2019). For example, Chamberlain et al. (2018) found that participants rated human-created artworks as higher in aesthetic value than AI-created artworks. Interestingly, however, this bias was malleable: indeed, when participants were shown videos of robots that were physically producing art, participants’ anti-AI art judgements were reduced, suggesting that people may consider the level of artistic effort exerted by the creator as a marker of the quality of the produced artwork. Similar findings emerged in a more-recent study in which participants were shown, and asked to provide judgements of, two abstract paintings (Chiarella et al., 2022). While both paintings were in fact created by humans, by randomly assigning a label of “human-created” or “AI-created” to each painting, the researchers deceived participants into believing that one of the paintings was created by AI. Results of this study indicated that participants tended to prefer the painting that was labelled “human-created” relative to the painting labelled “AI-created.” Notably, this preference for human over AI art is not specific to visual art. In fact, several recent studies have reported similar anti-AI findings in music (Shank et al., 2022), creative writing (Raj et al., 2023), dance (Darda & Cross, 2023), poetry (Köbis & Mossink, 2020), and non-art texts (Darda et al., 2023). This bias in aesthetic judgement is further intensified when judging more “human” aspects of art, like evoked emotion, suggesting the need for multiple judgement criteria in AI aesthetics research (Raj et al., 2023).

Critically, however, other studies have yielded results that are at odds with those from the aforementioned studies, finding little-to-no differences in evaluations of human- and AI-created artworks (Hong & Curran, 2019; Israfilzade, 2020; Xu et al., 2020). Additionally, though not an explicit investigation of aesthetic judgements per se, one study found that participants were equally likely to consider hypothetical products created by human and AI artists as “art” (Mikalonytė & Kneer, 2022). Put differently, people did not tend to perceive AI artists’ hypothetical products as being lesser than human artists’ hypothetical products. Given these mixed results, it is challenging to ascertain whether humans do in fact prefer human-created artworks over AI-created artworks, and, if so, why this is so. Thus, here, we examined these two questions with the intention of shedding more light on their answers.

The current studies

Across two studies, we sought to extend the extant (albeit sparse) literature on attitudes toward artworks created by humans as compared to artworks created by AI platforms. Our studies had three primary aims. First, while a handful of studies have examined the question of whether people differ with respect to their judgements of human-created versus AI-created artworks, as noted above, the results have been mixed, leaving the answer to this question unclear. Thus, by employing two large-sample (Ns = 150 and 151), highly powered, within-subjects studies including relatively large stimulus sets, we sought to provide greater clarity on the answer to this question. We hypothesized that humans will indeed show a preference for human-labelled art relative to AI-labelled art.

Second, we sought to identify some of the more-complex criteria that people might rely upon when making aesthetic judgements. Many aesthetic studies have used simple probes assessing general “liking” and “aesthetic beauty” to index art judgements (for a review, see Chatterjee & Cardillo, 2022). However, most models of aesthetics consider aesthetic judgements to be a multifaceted and/or hierarchical cascade of evaluations, implicating more-complex and elaborative processes in judgements of art that seek to understand meaning and communication through the piece (Berlyne, 1960, 1971; Chatterjee & Vartanian, 2016; Cupchik & Berlyne, 1979; Graf & Landwehr, 2015; Leder et al., 2004; Silvia, 2005). This is supported, in part, by the growing discussion of art not only as a bottom-up appraisal of visual features that induces liking or judgements of beauty, but also as a device for communication and socio-epistemic value (see Sherman & Morrissey, 2017). This distinction considers human expression and, as such, is essential to investigate in aesthetic judgements of art created by AI. Thus, in Study 1, in addition to simply indexing Liking and Beauty, which we consider to be more-passive, surface-level appraisals in line with fluency models of aesthetics (see Leder et al., 2004; Graf & Landwehr, 2015), we also obtained assessments of how profound participants found each painting to be, and how much money they would (hypothetically) spend on each painting. These latter criteria require higher cognitive elaboration to determine communicative properties of art, as discussed in Graf and Landwehr (2015) and Sherman and Morrissey (2017).
In Study 2, we further developed our probing procedure by, in addition to asking the four questions from Study 1 (i.e., Liking, Beauty, Profundity, Worth), asking participants to rate each painting on the following additional communicative attributes of the art to understand potential interactions between surface-level and communicative engagement processes: emotionality (how much the artwork evoked an emotion in the viewer; Emotion), narrativity (the degree to which the artwork portrayed an imagined narrative in the viewer; Story), meaningfulness (how personally meaningful the artwork was to the viewer; Meaningful), perceived effort (how much effort the viewer thought went into the creation of the artwork; Effort), and perceived time (how much time the viewer thought went into the creation of the artwork; Time). We predicted that the anticipated preference for human-labelled art over AI-labelled art can be, at least partially, explained by a perceived lack of integration of the human experience in AI-created artworks as measured by these wide-ranging aesthetic criteria.

Third, given that aesthetic appraisals are personal and subjective (Chatterjee, 2014; Roseman & Evdokas, 2004), also imperative in their investigation is consideration of individual differences that might reliably predict these judgements. In this vein, past work has considered the personality trait of openness to experience and its interactions with artwork and other creative outputs (Kaufman, 2013; McCrae, 2007; McCrae & Greenberg, 2014), with Silvia et al. (2015) considering it “an essentially aesthetic trait” (p. 376). In addition, openness has been shown to influence judgements of art depending on painting type, such that people who are more open to experience tend to have an increased appreciation for abstract art compared to less-open participants (Feist & Brady, 2004). Other personality traits explored in aesthetics and creativity research include empathy and embodied cognitions (Freedberg & Gallese, 2007; Rusu, 2017), and beliefs of creative mindsets (Hass et al., 2016; Karwowski, 2014). To provide greater clarity on the possible influence of individual-differences measures on appraisals of artwork, in Study 1, we assessed age and scores on the cognitive reflection test (CRT), and in Study 2, age, CRT scores, openness to experience, personal attitudes toward AI, empathy, growth mindsets, and fixed mindsets (below, we outline our rationale for inclusion of these individual-differences measures).

Study 1

Method

Study 1 was approved by the University of Waterloo Research Ethics Board (31067). All data and stimuli can be found at https://osf.io/cgw8v/.

Participants

One-hundred and fifty participants, each with at least 100 approved human intelligence tasks and an approval rating of above 90%, were recruited through Amazon’s Mechanical Turk. Power analyses were not performed a priori. Instead, we reviewed the most closely related studies we could find, finding that, in Study 1 of Chamberlain et al. (2018), 65 participants completed the study. To ensure our study was well-powered, we decided to nearly triple this sample size. Participants were told at sign-up that they would be evaluating pieces of art and filling out questionnaires meant to probe the way they think. Participants were directed to Qualtrics to complete the study, and then were debriefed following completion of the study and were compensated for their time. One participant was excluded for bad data (i.e., they used the same response throughout the entire study), resulting in 149 participants (M_age = 42.35, SD = 11.59; female = 65).

Materials

Thirty AI-created paintings, considered by the authors of this article to be of high quality, were taken from ArtBreeder, which is a machine learning website that produces art (see Fig. 1 for sample images). The images were open-source and pre-existing and were not created by the authors. While it might seem most reasonable to obtain artworks from both AI and humans (and not deceive participants), one possible problem with this approach is that there could be error introduced into the selection process. For instance, although one could do their best to ensure that any selected human and AI artworks are comparable, without first conducting a norming study (across all of our measures), this outcome could not be verified. For this reason, we opted to present a stimulus set that was exclusively AI-created.

Of the 30 AI-created paintings, half (15) were representational (i.e., reflected an easily recognizable figure or object), and half (15) were abstract (i.e., they included partially or completely unrecognizable referents), as rated by the first author. This distinction was made in the present study because past results have repeatedly revealed that people tend to prefer representational over abstract paintings (Heinrichs & Cupchik, 1985; Kettlewell et al., 1990; Knapp & Wulff, 1963; Mastandrea et al., 2011, 2021). Here, we not only wanted to attempt to replicate this finding, but also to determine whether evaluations of different painting types (representational and abstract) differ as a function of whether the purported creator was a human or AI. Research on AI-art perceptions has primarily used abstract paintings as stimuli, which could limit our understanding of the true role of painting type in aesthetic judgements (Chiarella et al., 2022; Israfilzade, 2020). However, Chamberlain et al. (2018) presented varied painting types but found no interaction between painting creator and painting type on general aesthetic value ratings; with our extension of rating criteria, potentially different processes of judgements that may rely both on painting type and creator could be illuminated. Relatedly, Gangadharbatla (2022) found an increased willingness for people to believe that abstract art tends to be created by AI (compared to humans), whereas Chamberlain et al. (2018) found an increased willingness for people to believe that representational art tends to be created by humans. Thus, it seems that painting type does bear some relationship to the perceived creator of the painting; to date, however, the nature of this relationship is unclear.

Importantly, each image was presented in random order and had individual-level randomization of a label of “human-created” or “AI-created.” In other words, different labels were assigned to different images across participants. Thus, on average, participants were presented 15 images with an AI label, and 15 images with a human label, even though all images were in fact AI-created.^{Footnote 1} Participants were asked to rate each image on the following criteria: “How much do you like this image?” (Liking), “How beautiful/aesthetically pleasing is this image?” (Beauty), “How profound or meaningful is this image?” (Profundity), and “How much money would this work be worth?” (Worth). All criteria were answered on a 1–5 Likert scale [“Not at all”… “Very much”], except for Worth, which was answered on a 1–5 Likert scale with possible responses [“None at all”… “Worth quite a lot”]. In addition, for each image, participants were asked “Based on the label above, was this image created by a human or an artificial intelligence computer program?” (Label-Check). This question was used as an attention check. Incorrect responses to the Label-Check were excluded from data analysis, resulting in the removal of 113 trials (out of 4470 total trials across the 149 participants). No participants themselves were removed on the basis of this attention check, only trials.
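The labelling and exclusion procedure described above can be sketched in a few lines of Python. This is only an illustration, not the Qualtrics implementation used in the study: the function names (`assign_labels`, `keep_valid_trials`), the trial dictionary layout, the seed, and the assumption that each label is an independent 50/50 draw are all hypothetical choices made for the example.

```python
import random

LABELS = ("Human-created", "AI-created")

def assign_labels(image_ids, rng):
    """Shuffle presentation order and draw a random label for each image.

    Assumption: labels are independent 50/50 draws, so a participant sees
    15 images per label only on average, not exactly.
    """
    order = list(image_ids)
    rng.shuffle(order)
    return [(image_id, rng.choice(LABELS)) for image_id in order]

def keep_valid_trials(trials):
    """Drop trials whose Label-Check answer does not match the label shown."""
    return [t for t in trials if t["label_check"] == t["label"]]

# One hypothetical participant rating 30 images.
rng = random.Random(0)  # fixed seed so the sketch is reproducible
schedule = assign_labels(range(30), rng)
```

Because exclusion works on trials rather than participants, a participant who fails the Label-Check on a few images still contributes their remaining trials, which matches the trial-level exclusion described above.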

Before rating the images, participants were asked to complete the 7-item cognitive reflection test (CRT; Frederick, 2005; Toplak et al., 2014). This extended CRT, which has been shown to have high internal consistency (α = 0.74), consists of seven numeracy questions and is administered to assess an individual’s cognitive ability to override instinctual responses for a correct answer to the problem (Campitelli & Gerrans, 2014). Thus, the CRT considers individual differences in quantitative skills and bias-overriding. Our rationale for including the CRT was to determine whether CRT scores predict different ratings across human- and AI-created artwork: If we do observe an anti-AI bias—with more-positive judgements for human artworks than AI artworks—then one possibility is that individuals who have a greater ability to override their intuitions may show less of a bias, given that this bias may be the result of an automatic (intuitive) response to devalue artworks created by AI. Notably, both CRT and age were included as exploratory variables.

Procedure

Participants first provided informed consent following a description of the nature of the study. Next, they were given two bot checks (zero participants failed these checks, and all of them therefore moved forward with the study). Participants were then directed to complete demographic information, followed by the CRT. When finished with these tasks, participants received instructions for the remaining portion of the survey, which asked them to rate the 30 AI-created paintings on Liking, Beauty, Profundity, and Worth, with randomized labels of AI- or human-created. Following ratings, participants were debriefed on the purpose of the study, once again provided consent for their data to be used for analyses, and were redirected for compensation.

Study 1 results

Paired t-tests with Bonferroni corrections showed increased ratings for human-labelled over AI-labelled art for all four criteria: Liking (t(148) = 2.644, p = 0.036, d = 0.17), Beauty (t(148) = 3.499, p = 0.002, d = 0.22), Profundity (t(148) = 7.725, p < 0.001, d = 0.47), and Worth (t(148) = 10.042, p < 0.001, d = 0.61) (Fig. 2). In addition, for each criterion, a difference score between average human-labelled art rating and average AI-labelled art rating was calculated for each participant and was plotted against the participant’s CRT score. Only the relationship between CRT and Beauty difference scores emerged as significant, with lower CRT scores associated with higher Beauty difference scores (that is, higher average Beauty judgements were given for human-labelled than AI-labelled art; r = −0.17, p = 0.042).^{Footnote 2} No such significant relationships emerged elsewhere (Liking: p = 0.07, Profundity: p = 0.95, Worth: p = 0.37). The same difference scores were also plotted against age; again, no significant relationships emerged (p = 0.29, p = 0.20, p = 0.26, p = 0.59). A significant relationship emerged between age and CRT performance (r = 0.23, p = 0.004), such that older participants performed better on the CRT.
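The analyses in this paragraph rest on three simple computations: a paired t statistic on per-participant mean ratings, a Bonferroni adjustment across the four criteria, and a Pearson correlation between difference scores and an individual-differences measure. The sketch below shows each step using only the Python standard library. It is not the authors' analysis script; the function names and all numbers in the toy data are hypothetical, and it does not reproduce the reported effect sizes, whose exact formula is not stated in the text.

```python
import math
import statistics

def paired_t(human, ai):
    """Return (t, df) for a paired t-test on per-participant mean ratings."""
    diffs = [h - a for h, a in zip(human, ai)]
    n = len(diffs)
    standard_error = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.fmean(diffs) / standard_error, n - 1

def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capping at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical per-participant mean Beauty ratings under each label.
human_means = [3.5, 4.0, 2.5, 3.0, 4.5]
ai_means = [3.0, 3.5, 2.5, 2.0, 4.0]
crt_scores = [2, 5, 6, 1, 3]  # hypothetical CRT totals (0 to 7)

t_stat, df = paired_t(human_means, ai_means)
difference_scores = [h - a for h, a in zip(human_means, ai_means)]
r_crt = pearson_r(crt_scores, difference_scores)
adjusted = bonferroni([0.009, 0.0005, 0.5, 0.3])  # hypothetical raw p-values
```

Converting `t_stat` to a p-value needs the t distribution, which the standard library does not provide; a statistics package would normally handle that step.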

Lastly, exploratory linear mixed-effects models using maximum likelihood estimation were performed for each of the four main criteria as the outcome variables: Liking, Beauty, Profundity, and Worth. Across models, participants and paintings were random effects, resulting in crossed random effects models. For all models, our manipulated variables of the art’s Label (AI or Human) and Painting Type (abstract or representational) were included, and we tested a potential interaction between Label and Painting Type. Across all criteria, a significant main effect of Painting Type emerged, such that participants preferred representational over abstract paintings regardless of Label (p’s < 0.001). For Liking, a non-significant main effect of Label (p = 0.057) was qualified by a significant interaction between Painting Type and Label (p = 0.041). Additionally, for Profundity, a main effect of Label (p < 0.001) was qualified by a significant interaction between Painting Type and Label (p = 0.036). When a painting is representational, art is liked more and found more profound when it is created by a human as opposed to AI. No significant interaction between Painting Type and Label emerged in Beauty (p = 0.21) or Worth (p = 0.18).
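One way to write a crossed random-effects model of this kind is given below, for the rating of participant i on painting j. This is a sketch under assumptions rather than the authors' exact specification: it includes random intercepts only (the text does not say whether random slopes were fitted), and the symbols are introduced here for illustration.

```latex
\begin{aligned}
y_{ij} &= \beta_0 + \beta_1\,\mathrm{Label}_{ij} + \beta_2\,\mathrm{Type}_{j}
          + \beta_3\,(\mathrm{Label}_{ij} \times \mathrm{Type}_{j})
          + u_i + v_j + \varepsilon_{ij}, \\
u_i &\sim \mathcal{N}(0, \sigma_u^2), \qquad
v_j \sim \mathcal{N}(0, \sigma_v^2), \qquad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2).
\end{aligned}
```

Here Label varies within both participants and paintings because labels were randomized per participant, whereas Type is a fixed property of each painting. The coefficient on the product term corresponds to the Painting Type by Label interaction tested above.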

Study 1 discussion

Across all four rating criteria (Liking, Beauty, Profundity, and Worth), we found an anti-AI bias, with participants showing a preference for art labelled as “human-created” compared to “AI-created.” This finding is particularly important given the within-subjects design of our study, its relatively large sample size, range of rating criteria, range of painting types, and randomization of labels across a large set of paintings.

Study 1 primarily served as a test of whether an anti-AI-art bias exists across a range of aesthetic judgement criteria. Given our relatively large sample, we were confident in the findings of this study, which, on the whole, mirrored findings of Chamberlain et al. (2018). However, extending these results, here we showed that this pattern of judgements for AI-art biases is reflected across appraisal processes outside of general aesthetic preferences. For instance, not only is AI-art liked less, but it is also viewed as less worthy and less profound, which may have interesting implications for the ways in which people will consume AI-art in the future.

In addition, though some studies also used a randomization of labels (as in the present study; e.g., Chiarella et al., 2022), our study was unique in its use of only AI-created images, across many more stimuli, in a within-subjects design. Statistically, this ensured higher power and increased confidence in our findings. In other words, participants were given the opportunity to judge both human- and AI-labelled artworks, and this permitted a sounder comparison of participants’ judgements across the two sources. Perhaps most important, though, was that all artworks were from AI in actuality. This underscores the fact that participants are generally incapable of noticing which source created which paintings (as supported by Gangadharbatla, 2022, and Chamberlain et al., 2018), which ultimately reflects the noteworthy quality of AI-art today. Though we did not explicitly ask participants for their confidence in the accuracy of our labels, our decision to exclusively select AI-created stimuli enabled us to dissociate top-down from bottom-up processing in these aesthetic judgements. That is, actual human versus actual AI images may differ in certain visual, bottom-up qualities.^{Footnote 3} Thus, by exclusively selecting AI artwork and randomizing labels on a random subset of paintings, we ensured that any observed effects were necessarily top-down and isolated to source manipulation, as opposed to bottom-up, or driven by potential inherent differences across human- and AI-created artworks.

Somewhat surprising was the lack of statistically significant associations between our individual-differences measures (CRT and age) and the difference scores of human-AI ratings for each participant, which might be expected to track an AI bias. Although lower cognitive reflection scores were associated with a stronger Beauty preference for human-labelled over AI-labelled art, no other significant relationship emerged between our individual-differences measures and the human-AI difference scores. Past work with the CRT has found negative relationships between CRT scores and perceived profundity of randomly generated statements (Pennycook et al., 2015), implying a relationship between cognitive reflection traits and subjective judgements of profundity that could extend to aesthetics. However, in Study 1, we found no support for this relationship. Given that the CRT assesses intuition-overriding in mathematical domains, we can conclude that (1) anti-AI biases largely do not rely on quantitative skills, (2) the anti-AI bias is not an intuitive response as probed by the mathematical problems in the CRT, and/or (3) we need more-sophisticated statistical analyses and/or power to uncover the true underlying relationships (if any) among these measures. In addition, the lack of statistically significant relationships between age and judgements of art was somewhat surprising. Though exploratory, this is at odds with some previous aesthetic studies that have reported age-dependent differences in judgements of art. Specifically, Mockros (1993) reported that general aesthetic judgements were rated higher by novice professional adults than novice undergraduates. More research is warranted on this topic given that we found no such age effect, which perhaps implies that younger participants, whose lives are more heavily dominated by AI than those of older participants, do not hold different views on the creative products of AI than older adults do.

Study 2

The results from Study 1 provide initial insight into participant judgements of art that is believed to be generated either by AI algorithms or by humans. While not reliably predictable by individual differences such as age or CRT scores, a bias against AI art emerged. This bias, however, clearly depended on the criterion used to assess the artwork. Certainly, individuals do not just consider measures of Liking, Beauty, Profundity, and Worth when assessing artwork, but a host of other engagement processes.

This nuance to a general “anti-AI bias” matches with contemporary aesthetics models that argue that aesthetic judgements are rather complex. Indeed, largely pioneered by David Berlyne’s new experimental aesthetics (1960, 1971), art appreciation has been argued to be a consequence of elaborative appraisals of criteria like novelty, ambiguity, and complexity (which he deemed “collative” properties; Cupchik & Berlyne, 1979). Other emerging models have since provided more nuance to Berlyne’s traditional behaviorist approach (see Berlyne, 1975; Silvia, 2005), including more cognitive principles, and an increased understanding of the processes people utilize to engage with art. These models all agree in their proposal that humans engage in a multi-process evaluation of art, considering both sensory and non-sensory aspects (e.g., the Aesthetic Triad, Chatterjee & Vartanian, 2016; the pleasure-interest model, Graf & Landwehr, 2015; the information-processing model, Leder et al., 2004). Accordingly, many aesthetic properties—some more literal and surface-level, some non-sensory and more elaborative—can be probed during interactions with art. For instance, findings from Chamberlain et al. (2018) show that participants specifically cite that brush-strokes and other surface-level properties of the artistic process influence their evaluations of the final product. Moreover, when viewing anthropomorphized videos of robots painting, participants show greater appreciation for computer-generated art than when no anthropomorphized video of the robot is provided. This mirrors findings by Hong et al. (2022), who found that anthropomorphization of an AI-music algorithm led to higher acceptance of the algorithm as a true “musician,” which in turn led to increased aesthetic appreciation. Collectively, these studies suggest that there may be an implicit role of effortfulness or embodiment behind people’s evaluations of art.
More specifically, it may be that people’s preferences for art increase as their beliefs about the amount of effort that went into creating a piece of art increase.

As informed by the multi-level models above, including additional judgement criteria could help us to shed further light on the reasons why people make the aesthetic judgements they make. Thus, in addition to Liking, Beauty, Profundity, and Worth, in Study 2 we also included the following judgement criteria: emotionality (Emotion), perceived narrativity (Story), personal meaning (Meaningful), perceived effort (Effort), and estimated time to create (Time). Importantly, while these now nine criteria may or may not behave in similar ways to one another, some may moderate ratings of others. In this way, we could gain insight into how levels of processing act independently or influence one another in this new sphere of aesthetic judgements with art created by non-humans. We specifically included these criteria given emotion’s prominent role in aesthetics and creativity (Silvia, 2005), the use of narratives as engines of consumption in other forms of art (a growing literature in music cognition investigates narratives in response to music, e.g., Margulis et al., 2019; McAuley et al., 2021), the act of deriving meaning from artwork that is often posited in models of aesthetics (e.g., Leder et al., 2004; Pelowski et al., 2017), and the fact that effort and time involved in creation of a product are often a heuristic for quality (Kruger et al., 2004).

In addition, we extended the battery of individual-differences measures given our mostly null results from the individual-differences measures in Study 1. While we chose to keep both age and CRT scores as potential individual differences, we also included empathy skills (to determine if participants differ in judgement based on the ability to empathize with other agents, including AI, perhaps explaining AI-art judgements), openness to experience (given its role in aesthetic encounters; Kaufman, 2013; McCrae, 2007; McCrae & Greenberg, 2014; Silvia et al., 2015), personal attitudes toward AI (which we hypothesized could predict judgements of artwork made by AI), and facets of the creative mindset scale (CMS; Karwowski, 2014), which include growth and fixed mindsets. The CMS specifically asks about views of who can produce products of creativity, and we thus deemed it important to include in this study to determine if growth mindsets lead to higher AI-art ratings as we hypothesized. In addition, to detect more sensitive relationships between both individual-differences measures and judgement criteria, we aimed to use more-sophisticated statistical modeling with a pre-registered design. In sum, through implementing linear mixed models, we sought to determine whether we would replicate and explain the anti-AI effect with a wider set of judgement criteria (to parallel common multi-level processing models of aesthetics) and an extended set of individual-differences measures.