The experiment was conducted with voluntary participants recruited among undergraduate students in psychology at the University of Vienna, in exchange for “experimental participation” course credits. All participants had normal or corrected-to-normal vision and signed an informed consent form (including agreement to the public sharing of their anonymous test data) before beginning the experiment. Each participant was randomly assigned to one of three conditions: Control group (no specific instructions regarding speed or accuracy), Speed group (instructions emphasizing speed), or Accuracy group (instructions emphasizing accuracy).
We initially opened 80 slots for participation in each of the three groups. We had preregistered that, if the Bayes factor (BF) for the one-way analysis of variance (ANOVA) across the three groups, for probe–irrelevant RT mean differences, had not reached 5, we would collect 25 more participants in each group, repeating this step up to a maximum of 130 participations (i.e., students who came and completed the test) per group.
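The stopping rule described above can be sketched as a small decision function. This is only an illustration: the function name `next_step` and its parameters are hypothetical, the BF is taken as an input rather than computed, and the sketch checks only whether the BF has reached 5 (whether strong evidence for the null would also have counted as "reaching" the threshold is not stated here).

```python
def next_step(n_per_group: int, bf: float,
              bf_threshold: float = 5.0,
              batch: int = 25,
              n_max: int = 130) -> int:
    """Return how many more participants to collect per group (0 means stop).

    Hypothetical helper: stop once the BF has reached the threshold or the
    maximum sample size per group is reached; otherwise add one more batch.
    """
    if bf >= bf_threshold or n_per_group >= n_max:
        return 0
    return min(batch, n_max - n_per_group)


# Starting from 80 per group, two further batches of 25 lead to the cap of 130.
print(next_step(80, 2.0), next_step(105, 2.0), next_step(130, 2.0))
```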
For a power of 0.9 and alpha at 0.05, an effect size as low as Cohen’s d = 0.40 can be detected with 130 participants, in consideration of the critical between-subjects t tests comparing each pair of the three groups (Champely, 2020). Assuming a base correct detection rate (CDR) of 0.80, an SD of 33.6 for “guilty” predictors (probe–irrelevant RT mean differences), and an SD of 23.5 for “innocent” predictors, the CDR gain corresponding to the effect size of d = 0.40 would be 0.08 (hence, the improved CDR would be 0.88; see Lukács & Specker, 2020). In consideration of cost-efficiency, this potential improvement seemed a reasonable minimum size of interest to us (e.g., Lakens et al., 2018), especially given that differences in real-life cases are likely to be smaller than in strictly controlled laboratory experiments such as ours.
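As a rough plausibility check of the power figure, the following sketch approximates the power of a two-sample t test with a normal approximation. It is not the calculation cited above (Champely, 2020): it assumes 130 participants per group and a two-sided test, neither of which is spelled out in the text, and the function name `approx_power` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(d: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided, two-sample t test (equal group sizes).

    Uses the normal approximation; the tiny probability of rejecting in the
    wrong direction is ignored.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)   # two-sided critical value
    ncp = d * sqrt(n_per_group / 2)     # expected test statistic under d
    return z.cdf(ncp - z_crit)


print(round(approx_power(0.40, 130), 3))
```

Under these assumptions the approximation comes out close to 0.90, in line with the stated power; an exact calculation based on the noncentral t distribution would differ slightly.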
However, as can be seen in the results, the BF was well above 5, at 19.45, already at 240 participants; hence, we stopped collecting at this point. We excluded five participants from the analysis based on our preregistered exclusion criteria (see below), leaving 235 valid tests in our analysis: 78 subjects (age = 22.1 ± 3.0; 26 male) in the accuracy group; 78 subjects (age = 21.9 ± 3.6; 27 male) in the speed group; 79 subjects (age = 22.5 ± 5.1; 24 male) in the control group.
At the beginning of the experiment, participants were asked to state and verify their given name, surname and birthday, as well as provide further demographic information. Participants were then presented with a random list of eight different dates (month and day) and eight different surnames that did not include their own and, regarding surnames, matched their surnames as closely as possible in character length. They were asked to select up to two dates and surnames from the list that in any way seemed familiar or meaningful to them or stood out to them from the rest of the list. Subsequently, five items per category (dates and surnames) were randomly selected for the CIT from the non-chosen items (as this ensured that the irrelevants were indeed irrelevant). Within each category, one of these items was randomly chosen as the target, while the remaining four served as irrelevants.
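The selection step for one item category can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `select_cit_items` and the example dates are hypothetical.

```python
import random


def select_cit_items(candidates, flagged, rng=random):
    """Pick one target and four irrelevants from the items the participant did not flag.

    `candidates` is the list of eight presented items; `flagged` holds the
    (up to two) items the participant marked as familiar or meaningful.
    """
    pool = [item for item in candidates if item not in flagged]
    selected = rng.sample(pool, 5)   # five items for the CIT, in random order
    target, irrelevants = selected[0], selected[1:]
    return target, irrelevants


# Hypothetical example list of eight dates, two of which were flagged.
dates = ["JAN 05", "FEB 11", "MAR 23", "APR 02",
         "MAY 17", "JUN 30", "AUG 09", "OCT 21"]
target, irrelevants = select_cit_items(dates, {"MAR 23", "JUN 30"})
print(target, irrelevants)
```

Because `sample` returns the items in random order, taking the first one as the target is equivalent to drawing the target at random from the five selected items.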
Within their respective condition (control vs. speed vs. accuracy), participants were randomly assigned to complete either the block with surname items first or the block with date items first, with the other item category following in the second block.
The probe was the respective participant’s real surname as stated at the beginning of the experiment in one block and the participant’s birthday in the other block. During the RT-CIT, participants were asked to categorize items that were presented in the center of the screen by pressing either “E” or “I” on their keyboard. They were asked to press one of those keys whenever they saw the probe or an irrelevant. Whenever the target appeared, they were asked to press the other key. Whether they were instructed to use “E” or “I” to categorize the probe and irrelevants and, respectively, the other key to categorize the target, varied randomly between subjects.
Apart from these main items (probe, target, irrelevants), we included two kinds of fillers: (a) expressions referring to familiarity and self-relatedness (e.g., “FAMILIAR,” “MINE,” etc.) that had to be categorized with the same key as the target (and, thus, opposite to the probe and the irrelevants), and (b) expressions referring to unfamiliarity and other-relatedness (e.g., “UNFAMILIAR,” “OTHER,” etc.) that had to be categorized with the same key as the probe and irrelevants. It is assumed (Lukács et al., 2017) that fillers further slow down responses to the probes because the probes have to be categorized together with the semantically incompatible expressions referring to unfamiliarity (cf. Nosek et al., 2007; Rosch et al., 1976). In addition, by increasing the complexity of the otherwise excessively simple task, fillers prevent strategically focusing on the target and, thereby, ignoring, to some extent, the probe and its meaning and relevance (Anderson, 1991; Hu et al., 2013; Reber, 1989; Verschuere et al., 2015; Visu-Petra et al., 2013). These assumptions, as well as the necessity of this specific arrangement of fillers, have been strongly supported by the findings of a recent study (Lukács & Ansorge, 2021).
The inter-trial interval (i.e., between the end of one trial and the beginning of the next) always randomly varied between 500 and 800 ms. In case of a correct response, the next trial followed. In case of an incorrect response or no response within the given time limit, the feedback “Falsch!” [“Wrong!”] or “Zu langsam!” [“Too slow!”] in red color appeared, respectively, in place of the stimulus for 500 ms, followed by the next trial.
To begin the test, participants were guided through three practice rounds. In the first round, they were asked to categorize only the filler items as being either familiarity-referring (“vertraut”, “mein”, “relevant” [“familiar”, “mine”, “relevant”]) or unfamiliarity-referring (“unvertraut”, “fremd”, “unbekannt”, “andere”, “sonstiges”, “irrelevant” [“unfamiliar”, “foreign”, “unknown”, “other”, “other”, “irrelevant”]). In this first practice round, participants were required to have at least 80% valid responses. The time limit for their response was 1 s. If participants failed to reach 80% valid responses, they were reminded of the instructions and had to retake this practice round.
In the second practice round, participants were asked to categorize items as either familiar (i.e., pressing the key assigned to classify the target) or unfamiliar (i.e., pressing the key assigned to classify the probe and irrelevants). Depending on the condition, participants were presented with either dates or surnames and were told to categorize the target as familiar and all other appearing items as unfamiliar. To ensure that participants paid attention to the stimulus and that resulting differences in RTs (and ERs) in the main task were not caused by misunderstanding the instructions or uncertainty about how to respond, each trial in this round required a correct response. For this, participants were given an extended time limit for their response (10 s). To ensure that neither accuracy nor speed was already enforced in this practice round and to avoid bias in the following main task, each item was only shown once, and the round, thus, consisted of only six trials (probe + target + irrelevants). In the case of an incorrect response, participants were immediately reminded of the instructions and had to retake this practice round.
In the third and final practice round, fillers and main items were presented together and had to be classified as familiar or unfamiliar. The time limit for the participants’ response was decreased again (to the initial 1 s) and a certain rate of mistakes was allowed, though 60% valid responses for each item type (familiarity-referring filler, unfamiliarity-referring filler, target, main items [probe and irrelevants together]) were required to pass this round. Otherwise, participants were reminded of the instructions again and had to retake the practice round.
The main task followed and contained two blocks for each test. In the speed group, participants were instructed to respond to the items as fast as possible, to react quickly, and to focus on speed. They were then presented with all items, with the main items and probe consisting of either dates in the first block and surnames in the second block or vice versa, depending on random assignment. In order to avoid examinees regressing to their natural mean SAT (Heitz, 2014; Schouten & Bekker, 1967), they were reminded of their instructions between the first and second blocks. In the accuracy group, the procedure remained the same, with the exception that instead of the instructions with a focus on speed, participants were told to respond as accurately as possible, to always press the correct response key, and to focus on accuracy. In the control group, participants were not provided with particular speed or accuracy instructions. The response time limit remained at 1 s.
In each block, each probe, irrelevant, and target was repeated 18 times (hence, 18 probe, 72 irrelevant, and 18 target trials, in each block). The order of these items was randomized in groups: First, all six items (one probe, four irrelevants, and one target) in the given category were presented in a random order; then, the same six items were presented in another random order (but with the restriction that the first item in the next group was never the same as the last item in the previous group). Fillers were placed among these items in a random order, but with the restrictions that a filler trial was never followed by another filler trial and each of the nine fillers preceded each of the other items (probes, targets, and irrelevants) exactly one time. (Thus, 9 × 6 = 54 fillers were presented per block, and 54 out of the 108 other items were preceded by a filler.)
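One way to generate a trial sequence that satisfies the restrictions described above is sketched below. This is an illustration under stated assumptions, not the authors' implementation: the function names `main_item_order` and `add_fillers` and the item labels are hypothetical, and the fillers are assigned by randomly choosing, for each main item, nine of its 18 occurrences to be preceded by one filler each.

```python
import random


def main_item_order(items, repetitions=18, rng=random):
    """Concatenate random permutations of `items`, one per repetition.

    The first item of a group never equals the last item of the previous group.
    """
    order = []
    for _ in range(repetitions):
        group = list(items)
        rng.shuffle(group)
        # Reshuffle until the group boundary does not repeat an item.
        while order and group[0] == order[-1]:
            rng.shuffle(group)
        order.extend(group)
    return order


def add_fillers(order, fillers, rng=random):
    """Insert fillers so that each filler precedes each distinct item exactly once."""
    positions = {}
    for idx, item in enumerate(order):
        positions.setdefault(item, []).append(idx)
    filler_before = {}
    for idxs in positions.values():
        chosen = rng.sample(idxs, len(fillers))  # occurrences that get a filler
        shuffled = list(fillers)
        rng.shuffle(shuffled)
        for idx, filler in zip(chosen, shuffled):
            filler_before[idx] = filler
    trials = []
    for idx, item in enumerate(order):
        if idx in filler_before:
            trials.append(filler_before[idx])
        trials.append(item)  # every filler is directly followed by a main item
    return trials


# Hypothetical labels: one probe, one target, four irrelevants, nine fillers.
items = ["probe", "target", "irr1", "irr2", "irr3", "irr4"]
fillers = [f"filler{i}" for i in range(1, 10)]
trials = add_fillers(main_item_order(items), fillers)
print(len(trials))
```

Since each filler is inserted directly in front of a main item, no filler trial can be followed by another filler trial, and the block contains 108 main-item trials plus 9 × 6 = 54 filler trials.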
We preregistered to exclude data from all participants, within each of the three experimental groups, whose accuracy rate lay more than three interquartile ranges (IQRs) outside the IQR, based on the IQR of each given group, for any of the following item types: (a) main items (probe and irrelevants merged), (b) targets, and (c) fillers (all fillers merged). Based on these criteria, five participants had to be excluded from the analysis. For all further analyses, responses with RTs below 150 ms were excluded.
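The exclusion criterion can be illustrated with a short sketch for one group and one item type. It rests on an interpretation that the text does not spell out: a value is flagged when it lies below the first quartile minus three IQRs or above the third quartile plus three IQRs. The quantile method, the function name `iqr_outliers`, and the example accuracy rates are likewise assumptions.

```python
from statistics import quantiles


def iqr_outliers(accuracy_by_participant: dict, k: float = 3.0) -> set:
    """Return IDs whose accuracy lies more than k IQRs outside the quartiles.

    Intended to be applied separately per group and per item type.
    """
    values = list(accuracy_by_participant.values())
    q1, _, q3 = quantiles(values, n=4, method="inclusive")  # assumed method
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return {pid for pid, acc in accuracy_by_participant.items()
            if acc < lower or acc > upper}


# Hypothetical accuracy rates for six participants in one group.
example = {"p1": 0.95, "p2": 0.96, "p3": 0.97,
           "p4": 0.98, "p5": 0.94, "p6": 0.40}
print(iqr_outliers(example))
```

A participant would then be excluded if flagged for any of the three item types (main items, targets, or fillers).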
All data analysis was carried out in R (R Core Team, 2019; via: Kelley, 2018; Lukács, 2021a; Morey & Rouder, 2018; Robin et al., 2011). For all one-way ANOVAs and all between-subject t-tests, we used Welch’s correction (Delacre et al., 2017, 2019).
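For readers unfamiliar with Welch’s correction, the following sketch computes Welch’s t statistic and the Welch–Satterthwaite degrees of freedom for two independent samples. It is a Python illustration of the standard formulas, not the R code used for the analyses; the function name `welch_t` and the example values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(x, y):
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of freedom."""
    vx = variance(x) / len(x)   # squared standard error of the mean, sample x
    vy = variance(y) / len(y)   # squared standard error of the mean, sample y
    t = (mean(x) - mean(y)) / sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df


# Hypothetical data with unequal variances.
t, df = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(round(t, 3), round(df, 3))
```

Unlike Student’s t test, this version does not pool the two variances, so the degrees of freedom shrink when the group variances differ.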