When the AIE is applied, participants show higher intrinsic motivation, more need satisfaction, and higher performance after being able to customize their avatar (Birk, Atkins, et al., 2016; Birk, Mandryk, et al., 2016; Birk, Mandryk, et al., 2016; Birk & Mandryk, 2018; Birk & Mandryk, 2019). Thus, in this study, participants ran through the SSG twice: once with a customized avatar and once with a preset avatar. The order of the two SSG blocks was counterbalanced across subjects. A participant’s performance with a self-made, customized avatar was compared to their performance with a generic, preset avatar.
Method
Sample
We recruited participants via Amazon Mechanical Turk (MTurk) and recorded 71 complete datasets (28 female, 43 male, mean age = 37.92, SD = 11.55). Based on previous research on the AIE (Birk, Atkins, et al., 2016; Birk, Mandryk, et al., 2016), we expected a medium-sized effect of f = 0.35 as well as a medium-sized correlation between repeated performance measures of r = 0.4 (Friehs & Frings, 2018, 2019a; Friehs et al., 2020a, 2020b, 2020c; Friehs, Brauner, et al., 2021; Friehs, Frings, et al., 2021). Together with an α-value of 0.05 and a power of 1 – β = 0.95, a sample of at least 30 participants was planned (power analysis was carried out using G*Power 3.1.3; Faul, Erdfelder, Lang, & Buchner, 2007). To also test for smaller effects, and to account for potential interactions with the between-subject factor of SSG order (see above), we collected data from a larger sample. Participants were monetarily compensated for their participation ($12/h) and the whole study took 50 min to complete. The study was approved by the local ethics committee. All participants provided written informed consent.
Design
The study had a repeated-measures design to evaluate the effect of avatar identification on performance within participants. Thus, we used a 2 (avatar identification: high vs. low) × 2 (order: high identification first vs. low identification first) mixed design, with the order being counterbalanced between participants. The main dependent variable was the stop-signal reaction time (SSRT, i.e., the estimate of time needed to respond to the stop-signal and to cancel the movement), which is a measure of the reactive inhibition process.
Procedure
Participants were briefed about their tasks in the study. Afterward, they were either tasked with customizing one avatar to make it resemble their own character (high-identification condition) or were provided with a generic avatar (low-identification condition).
In the customized condition, participants could adjust the appearance and personality of their avatar. Possible customizations included gender, height, weight, muscles, heads, the shape of facial features, chest size and the color of the skin, eyes, hair and beard. Further, the participant could choose the style and color of the avatar’s clothing, as well as add accessories such as glasses, headphones, or hats. Furthermore, participants were asked to describe the personality of their avatar by manipulating five 7-point scales, each of which described one personality trait, based on the 10-item short version of the Big Five inventory (BFI-10) (Rammstedt et al., 2013). Figure 1 shows the different steps of the avatar editor.
Participants were asked to complete one of the two SSG conditions (high-identification or low-identification) with the accompanying avatar. Once the first SSG condition was finished, participants filled out the Player Identification Scale (PIS; Van Looy et al., 2012) to assess identification with the avatar. After the second SSG condition, the PIS was presented a second time. Once both sessions were completed, participants answered demographic questions.
Stop-signal game
We used a previously validated, gamified version of the SST. Here, the game takes the form of an infinite runner. The game was implemented using the Unity3D engine (for technical details please refer to ). The underlying task architecture was based on the recommendations for the use of the SST by Verbruggen and colleagues (Verbruggen & Logan, 2008, 2009; Verbruggen et al., 2019). The SSG further presented the participants with a cover story that helped contextualize their performance. Specifically, participants were told they were lost in an enchanted forest and a fairy would help them escape by pointing either to the left or right at every crossroads. However, they were further told that a witch might attempt to take on the appearance of the fairy in order to trick the player into going deeper into the haunted woods. This witch, however, can be detected by an audio cue. This audio cue serves as the stop-signal in the task. Figure 2 depicts the SSG.
Each session consisted of a total of 300 trials, containing 75% go- and 25% stop-trials. The 300 trials were divided into three blocks with a 15 s break in between. Participants were instructed to react as fast and accurately as possible to the go-stimulus (i.e., a fairy pointing left or right) with the left or right arrow key and to withhold their reaction when a stop-signal (i.e., a noise presented over headphones) occurred. The go-stimulus was presented for a maximum of 1500 ms or until reaction. The stop-signal was played over the headphones following a variable delay (the stop-signal delay, SSD). The SSD represents the delay between the onset of the go- and the stop-signal and was initially set to 250 ms. The SSD was continuously adjusted with a staircase procedure to obtain a probability of responding of 50%. After the reaction was successfully stopped (i.e., button press was inhibited), the SSD was increased by 50 ms, whereas when the participants did not stop successfully, the SSD was decreased by 50 ms. The inter-trial interval was set to a random value between 500 and 1500 ms. Several different performance measures were logged and calculated, including the SSD and the probability of making a (wrong) response when a stop-signal is presented (p(response|signal)). Furthermore, two variables that are directly related to accuracy were logged: first, the number of omission errors (reflecting the probability of a missed response on no-signal trials) and second, the number of choice errors (reflecting the probability of a wrong response on no-signal trials). Additionally, we logged two RT variables: no-signal RT, which reflects the speed of correct responses on trials without a stop-signal, and signal RT, which indicates the latency of the incorrectly executed response on stop-signal trials. Furthermore, the probability of a correct inhibition (i.e., the likelihood of inhibiting an already initiated action) was recorded for each participant.
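The staircase rule described above can be illustrated with a minimal Python sketch. It is not the Unity3D implementation used in the study; the function names and the lower bound of 0 ms on the SSD are assumptions made for illustration only.

```python
def next_ssd(current_ssd, stopped_successfully, step=50, floor=0):
    """Return the SSD (ms) for the next stop trial.

    A successful stop makes the next stop trial harder (longer SSD);
    a failed stop makes it easier (shorter SSD).
    The floor of 0 ms is an assumption, not stated in the paper.
    """
    if stopped_successfully:
        return current_ssd + step
    return max(floor, current_ssd - step)


def run_staircase(outcomes, start_ssd=250):
    """Track the SSD over a sequence of stop-trial outcomes.

    outcomes: iterable of booleans, True = response was inhibited.
    Returns the list of SSDs, starting with the initial value.
    """
    ssds = [start_ssd]
    for stopped in outcomes:
        ssds.append(next_ssd(ssds[-1], stopped))
    return ssds


# Two successful stops followed by three failed stops.
trace = run_staircase([True, True, False, False, False])
print(trace)
```

Because the SSD moves up after every successful stop and down after every failed stop, it tends to settle around the delay at which a participant stops on about half of the stop trials.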
Most importantly, the stop-signal reaction time (SSRT) could be calculated based on a participant’s performance. The estimation of the SSRT was based on the integration method with replacement of omissions (for a detailed description please refer to Friehs et al., 2020a, 2020b; Verbruggen et al., 2013, 2019).
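As a rough illustration of the integration method with replacement of omissions, the following Python sketch replaces missed go responses with the slowest observed go RT, takes the go RT at the rank corresponding to p(response|signal), and subtracts the mean SSD. The function name, the input format, and the use of a simple ceiling-based rank (rather than an interpolated quantile) are assumptions; the cited papers remain the authoritative description.

```python
import math


def ssrt_integration(go_rts, stop_ssds, responded_on_stop):
    """Estimate SSRT (ms) with the integration method.

    go_rts: RTs on no-signal trials in ms; None marks an omission.
    stop_ssds: SSD in ms for every stop trial.
    responded_on_stop: booleans, True = participant failed to stop.
    """
    observed = [rt for rt in go_rts if rt is not None]
    max_rt = max(observed)
    # Replace omissions with the maximum observed go RT, then sort.
    filled = sorted(max_rt if rt is None else rt for rt in go_rts)

    p_respond = sum(responded_on_stop) / len(responded_on_stop)
    # Rank of the go RT that cuts off the fastest p_respond share
    # of the distribution (simple ceiling rule, an assumption).
    n = math.ceil(p_respond * len(filled))
    n = min(max(n, 1), len(filled))
    nth_rt = filled[n - 1]

    mean_ssd = sum(stop_ssds) / len(stop_ssds)
    return nth_rt - mean_ssd


# Illustrative (made-up) data: eight go trials, four stop trials.
go = [400, 450, 500, None, 550, 600, 650, 700]
ssd = [250, 300, 250, 200]
failed_to_stop = [True, False, True, False]
print(ssrt_integration(go, ssd, failed_to_stop))
```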
Player identification scale
The PIS was used to measure the participant’s identification with the avatar (Van Looy et al., 2012). The scale encapsulates three subscales: similarity identification, embodied identification, and wishful identification. Similarity identification is measured through items such as “My character is similar to me in many ways”, “I identify with my character” and “My character is an extension of myself”. Embodied identification is measured through items such as “When I am playing, it feels as if I am my character”, “In the game, it is as if I become one with my character” and “In the game, it is as if I act directly through my character”. Wishful identification is measured through items such as “If I could become like my character, I would”, “My character is an example to me” and “My character has characteristics that I would like to have”. Participants rated their agreement to different avatar-related statements on a 5-point scale from “strongly agree” to “strongly disagree”.
Analysis plan
Data analysis was done in five phases. First, in the data reduction stage, any participant with faulty or invalid data, as well as participants who made too many errors, was removed from analysis (see the Data reduction section below for more details). Second, in a preliminary analysis stage, all prerequisites for analysis were evaluated. This includes showing that there is a statistical difference between the average signal-response time and the average no-signal RT (for more details see Verbruggen et al., 2019) as well as reliability analysis for the player identification scale and its subscales. Third, as a form of manipulation check, we evaluated whether player identification differed between the custom and generic avatar condition. Fourth, overall performance was evaluated. Fifth, we investigated the hypothesis that player identification, as measured by the PIS, is predictive of a performance improvement in the SSG as indicated by a changed response inhibition. Specifically, the difference in avatar identification between conditions should be predictive of the SSRT difference between generic and custom avatar conditions.
Data reduction
Although MTurk data quality has been shown to be reliable in general, there are still task-specific exclusion criteria to be considered (Buhrmester et al., 2011; Chmielewski & Kucker, 2020). We followed the recommendations by Verbruggen and colleagues (Verbruggen et al., 2019; Verbruggen & Logan, 2015; see also Friehs, Brauner, et al., 2021; Friehs, Frings, et al., 2021). First, we tested the horse-race assumption for every participant by comparing signal-response reaction time (RT) and no-signal RT. The horse-race assumption states that SSRT can only reliably be estimated if the RT on unsuccessful stop trials is smaller than the mean go-RT. Second, participants were excluded if their p(response|signal) was smaller than 0.25 or larger than 0.75 in either session. Third, outliers were determined based on the Tukey outlier criterion (Tukey, 1977), and removed if the accuracy on go-trials was 3 or more standard deviations below the sample mean. Based on these criteria, eighteen participants had to be excluded, resulting in a final sample of 53 subjects (mean age = 39.1, SD = 12.1, 33 male and 20 female). Participants’ self-identified ethnicity was predominantly White (n = 40), with a minority of people identifying as Asian (n = 6), Black or African American (n = 5), American Indian or Alaskan (n = 1) or Hispanic/Latino (n = 1).
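The first two exclusion criteria can be expressed as a simple per-session check. The sketch below is illustrative only: the function names and the dictionary layout are assumptions, and the go-accuracy outlier criterion is deliberately left out because it depends on the sample distribution.

```python
def session_is_valid(mean_signal_rt, mean_no_signal_rt,
                     p_respond_given_signal, lower=0.25, upper=0.75):
    """Check one SSG session against the first two criteria.

    Horse-race check: RTs on failed stop trials must be faster
    than RTs on go trials. The p(response|signal) must lie
    within [lower, upper].
    """
    horse_race_ok = mean_signal_rt < mean_no_signal_rt
    p_ok = lower <= p_respond_given_signal <= upper
    return horse_race_ok and p_ok


def keep_participant(sessions):
    """A participant is kept only if every session is valid."""
    return all(session_is_valid(**s) for s in sessions)


# Illustrative (made-up) summary values for one participant.
participant = [
    {"mean_signal_rt": 520, "mean_no_signal_rt": 610,
     "p_respond_given_signal": 0.48},
    {"mean_signal_rt": 505, "mean_no_signal_rt": 590,
     "p_respond_given_signal": 0.81},  # outside 0.25-0.75
]
print(keep_participant(participant))
```

In this example the second session violates the p(response|signal) bound, so the participant would be excluded even though the first session is valid.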
Results
The results show that the reactive inhibition process (as measured by SSRT) is affected by identification with the custom, participant-made avatar. Specifically, a performance increase over time was observed in participants with increased identification. For details on results, see Fig. 3 for the results on avatar identification and Fig. 4 for the performance change predicted by identification.
Preliminary SSG analysis
To validate the gathered data, it is recommended to show that there is a statistical difference between the average signal RT (i.e., RTs of false responses during stop-trials) and the average no-signal RT (i.e., RTs of correct responses during go-trials) for each experimental condition (Verbruggen & Logan, 2015; Verbruggen et al., 2019). We crossed the trial type (signal vs. no-signal) and the avatar identification (high vs. low) in a 2 × 2 ANOVA. Results show significantly different RTs between signal and no-signal trials, as indicated by the significant main effect of trial type, F(1, 52) = 299.27, p < 0.001, ηp2 = 0.85. No main effect of avatar identification and no interaction was found, F(1, 52) = 2.45, p = 0.12 and F(1, 52) = 2.98, p = 0.09, respectively. Together, these results demonstrate a valid SSG measurement.
Manipulation check
The PIS served as a manipulation check. Reliability scores (Cronbach’s alpha) for all PIS subscales were computed, split by the order in which participants went through the study procedure. Results show that Cronbach’s alpha was satisfactorily high across all subscales: similarity identification (custom avatar first = 0.93, generic avatar first = 0.95), embodied identification (custom avatar first = 0.97, generic avatar first = 0.95), wishful identification (custom avatar first = 0.87, generic avatar first = 0.94). Hence, player identification was measured with high reliability. A 2 (avatar identification: high vs. low) × 2 (order: high identification first vs. low identification first) × 3 (subscale: similarity vs. embodiment vs. wishful) MANOVA with the PIS scores as the dependent variable was calculated. Most importantly, the main effect of avatar identification was significant, F(1, 51) = 24.37, p < 0.001, ηp2 = 0.32, indicating higher identification with the avatar in the high-identification (i.e., custom avatar) condition than in the low-identification condition. Moreover, the main effect of subscale was significant, F(2, 50) = 45.83, p < 0.001, ηp2 = 0.65, indicating that scores on the similarity subscale were higher than those on the wishful or embodiment subscales. The main effect of order did not reach statistical significance, F(1, 51) = 2.53, p = 0.12. The two-way interaction of avatar identification and subscale was significant, F(2, 50) = 13.42, p < 0.001, ηp2 = 0.35. This indicates that the difference between the two avatar identification conditions varied across the subscales. The two-way interactions of avatar identification and order as well as subscale and order were not significant, F(1, 51) = 0.05, p = 0.82 and F(2, 50) = 1.74, p = 0.19, respectively. The three-way interaction was also not significant, F(2, 50) = 0.50, p = 0.61.
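For readers unfamiliar with the reliability coefficient reported above, Cronbach’s alpha for a subscale with k items is k/(k − 1) × (1 − sum of item variances / variance of the summed score). The following Python sketch shows that computation on made-up ratings; the function name and data layout are assumptions, and it is not the software used for the reported values.

```python
from statistics import variance


def cronbach_alpha(items):
    """Compute Cronbach's alpha for one subscale.

    items: list of k lists, one per item, each holding the
    ratings of the same respondents in the same order.
    Uses sample variances (n - 1 in the denominator).
    """
    k = len(items)
    item_variance_sum = sum(variance(item) for item in items)
    # Summed score per respondent across all k items.
    totals = [sum(ratings) for ratings in zip(*items)]
    return (k / (k - 1)) * (1 - item_variance_sum / variance(totals))


# Illustrative (made-up) ratings: 3 items, 4 respondents.
# Identical items are perfectly consistent, so alpha is 1.
ratings = [
    [1, 2, 3, 4],
    [1, 2, 3, 4],
    [1, 2, 3, 4],
]
print(cronbach_alpha(ratings))
```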
Overall performance analysis
The main dependent variable in the SSG is the SSRT as an indicator of the speed of the reactive response inhibition process. However, a 2 (avatar identification: high vs. low) × 2 (order: high identification first vs. low identification first) ANOVA did not reveal a significant overall effect of avatar identification on SSRT, and neither the effect of task order nor the interaction was significant; all Fs < 2.45, all ps > 0.12. A comparable 2 (avatar identification: high vs. low) × 2 (order: high identification first vs. low identification first) ANOVA with the SSD as the dependent variable revealed two non-significant main effects of condition (F(1, 51) = 1.01, p = 0.32) and order (F(1, 51) = 1.27, p = 0.27), while the interaction (F(1, 51) = 15.08, p < 0.001, ηp2 = 0.23) was statistically significant, which indicates a practice effect from session 1 to 2. The same data pattern was found in no-signal RTs (i.e., RTs of correct responses during go trials) as well as in signal RTs (i.e., RTs of false responses during stop trials). In both ANOVAs, a significant interaction was found, F(1, 51) = 11.75, p < 0.001, ηp2 = 0.19 for no-signal RTs, and F(1, 51) = 9.67, p < 0.01, ηp2 = 0.16 for signal RTs. However, no main effect of avatar identification was found, F(1, 51) = 3.03, p = 0.09 for no-signal RTs and F(1, 51) = 1.07, p = 0.31 for signal RTs, nor was there a main effect of order, F(1, 51) = 0.28, p = 0.60 for no-signal RTs and F(1, 51) = 0.51, p = 0.48 for signal RTs. With regard to performance errors, due to the overall low number of errors, omission and commission errors were combined and overall accuracy was submitted to the analysis. The 2 (avatar identification: high vs. low) × 2 (order: high identification first vs. low identification first) ANOVA with accuracy scores as the dependent variable resulted in no significant main effects (all Fs < 2.02, all ps > 0.16).
Relation between reactive response inhibition and avatar identification
To tackle our main research question and hypothesis, we further filtered the participants and included only people who showed a higher average PIS for custom as compared to generic avatars, i.e., participants who could be characterized as responders to the AIE manipulation. Specifically, delta(PIS) = custom(PIS) – generic(PIS) needed to be > 0. Forty people fulfilled this criterion; 16 played with their custom character first and 24 with the generic one first. Further, the inhibition performance advantage for custom avatars was calculated; delta(SSRT) = generic(SSRT) – custom(SSRT). Thus, since smaller SSRTs indicate a faster inhibition process, difference scores > 0 signify a performance advantage for the custom compared to the generic condition. Descriptively, participants improved over time by 21 ms (i.e., ~ 5% from 414 to 393 ms), if they played with the self-relevant avatar first, but performance got worse by 9 ms (i.e., ~ 3% from 396 to 387 ms) if they played with the generic avatar first (see Fig. 4A). In the initial analysis step, the delta(SSRT) was correlated with the delta(PIS) scores for all subscales as well as with the overall mean difference. Across all responders, delta(SSRT) did not significantly correlate with delta(PIS) (r = 0.09, p = 0.57), delta(PIS similarity) (r = 0.11, p = 0.50), delta(PIS embodiment) (r = 0.08, p = 0.63) or delta(PIS wishful) (r = 0.04, p = 0.83). However, previous results showed that order may be important, given that it might change the frame of reference. Thus, correlations were re-calculated after splitting participants by the order in which they completed the two conditions.
If the generic avatar was used first, an identical pattern emerged with no significant correlations overall as well as for all subscales; delta(PIS) (r = 0.03, p = 0.87), delta(PIS similarity) (r = 0.04, p = 0.84), delta(PIS embodiment) (r = 0.15, p = 0.50) and delta(PIS wishful) (r = − 0.03, p = 0.88). However, if participants played with the custom avatar first, the results changed. Specifically, the correlation between delta(SSRT) and delta(PIS similarity) was significant (r = 0.56, p < 0.05), while all others were not: delta(PIS) (r = 0.40, p = 0.13), delta(PIS embodiment) (r = 0.17, p = 0.53) and delta(PIS wishful) (r = 0.34, p = 0.19). Additionally, all subscales were submitted in a stepwise manner to a regression in order to predict delta(SSRT). This procedure chooses predictors purely based on statistical merit, so only significant predictors with p ≤ 0.05 are included. The regression was again split by order. The overall regression model was significant only when the custom avatar condition was completed first (F(1, 15) = 6.22, p < 0.05), with an R2 = 0.31 (i.e., 31% of variance explained by the significant predictors). The only significant predictor was delta(PIS similarity) (t = 2.50, p < 0.05, Cohen’s d = 1.34; standardized model: delta(SSRT) = 0.56 * delta(PIS similarity)). For a depiction of the results and a visual reference, refer to Fig. 4B.
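The difference scores and correlations described in this section can be sketched as follows. This is an illustrative Python reconstruction under stated assumptions: the dictionary keys, function names, and example values are hypothetical, and the sketch does not reproduce the reported statistics or the stepwise regression.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def responder_deltas(participants):
    """Return (delta_pis, delta_ssrt) pairs for responders only.

    delta(PIS)  = custom(PIS)   - generic(PIS), must be > 0
    delta(SSRT) = generic(SSRT) - custom(SSRT), > 0 means the
    custom avatar condition showed faster inhibition.
    """
    pairs = []
    for p in participants:
        delta_pis = p["pis_custom"] - p["pis_generic"]
        if delta_pis > 0:
            delta_ssrt = p["ssrt_generic"] - p["ssrt_custom"]
            pairs.append((delta_pis, delta_ssrt))
    return pairs


# Illustrative (made-up) data; the third participant is a
# non-responder and is filtered out.
sample = [
    {"pis_custom": 4.0, "pis_generic": 3.0, "ssrt_custom": 380, "ssrt_generic": 400},
    {"pis_custom": 3.5, "pis_generic": 3.0, "ssrt_custom": 390, "ssrt_generic": 400},
    {"pis_custom": 2.0, "pis_generic": 3.0, "ssrt_custom": 410, "ssrt_generic": 400},
    {"pis_custom": 4.5, "pis_generic": 3.0, "ssrt_custom": 370, "ssrt_generic": 400},
]
pairs = responder_deltas(sample)
r = pearson_r([d[0] for d in pairs], [d[1] for d in pairs])
print(len(pairs), r)
```

Splitting by order would amount to calling `responder_deltas` separately on the participants of each order group before computing the correlation.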