Participants and design
The main training study employed a three-group pre-post design. A power analysis conducted in G*Power 3.0.10 (Faul, Erdfelder, Buchner, & Lang, 2009; Faul, Erdfelder, Lang, & Buchner, 2007) indicated that a sample size of 50 participants per group would be adequate for detecting a small to medium effect. Previous research on deceit detection training reported small to medium effect sizes, and many studies used sample sizes of about 50 participants (Driskell, 2012; Hauch et al., 2014). All participants had to be free of neurological or psychological diagnoses and have normal or corrected-to-normal vision; all were screened for dementia and completed tests of visual and hearing acuity. Participants were recruited from our existing participant database and from the local community using fliers, word of mouth, and advertisements, and they received US$25 for their participation. No participant had taken part in the validation study.
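For readers without access to G*Power, an equivalent calculation can be approximated in Python with statsmodels. This is a minimal sketch, assuming a one-way ANOVA across the three groups with a small-to-medium effect of Cohen's f = 0.25, α = .05, and power = .80; the exact parameter values entered into G*Power are not reported above, so the inputs below are assumptions.

```python
from statsmodels.stats.power import FTestAnovaPower

# Approximate reproduction of the reported power analysis (assumed inputs):
# one-way ANOVA, 3 groups, Cohen's f = 0.25, alpha = .05, power = .80.
analysis = FTestAnovaPower()
n_total = analysis.solve_power(effect_size=0.25, k_groups=3,
                               alpha=0.05, power=0.80)
print(f"Total N required: {n_total:.0f} (about {n_total / 3:.0f} per group)")
```

Under these assumptions the required total sample is roughly 160, or just over 50 participants per group, consistent with the target described above.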
The main training study included a total of 157 participants (ages 57–87 years; mean age = 68.76 years, SD = 5.80). Each participant was randomly assigned to one of three experimental conditions: verbal training, facial training, or control. Participants were screened for dementia with the Modified Telephone Interview for Cognitive Status (TICS-M, 15 items; Brandt, Spencer, & Folstein, 1988; Welsh, Breitner, & Magruder-Habib, 1993). Nine participants scored below the cutoff of 21 and were excluded from further analysis, leaving 148 participants (ages 60–87 years; mean age = 68.96, SD = 5.75; n = 51 male and 91 female participants) for analyses. By group, there were 51 participants (15 male, 36 female) in the facial condition, 48 participants (22 male, 29 female) in the verbal condition, and 49 participants (20 male, 29 female) in the control condition. Within each condition, participants were randomly assigned to video set A or video set B.
Stimuli creation
To create the videos of truth/lie statements, we recruited 20 middle-aged adults (ages 40–59 years; 50% women), because the majority of fraud against the elderly is perpetrated by middle-aged adults (Burnett, Xia, Suchting, & Dyer, 2017; DeLiema, Yonashiro-Cho, Gassoumis, Yon, & Conrad, 2017). Participants (hereafter referred to as suspects) were asked to lie or tell the truth about their opinions on six controversial topical issues (legalizing marijuana, euthanasia, labor unions, abortion, cloning of human cells, and government healthcare). We selected lies about attitudes and opinions (rather than about a theft) because these are the most common types of lies (DePaulo et al., 2003) and we wanted to avoid putting suspects and judges into unfamiliar situations. Indeed, on average, individuals tell one or two social lies per day. Suspects were randomly assigned to lie about their opinion on three topics and tell the truth about their opinion on the other three. The interrogator was blind to which topics were lies. Suspects were offered a monetary incentive (US$20) if they could convince the experimenter that they were telling the truth on all six statements. Past research has shown that a monetary incentive is important for creating a high-stakes lie scenario that best imitates the situations in which con artists lie in actual fraud or other criminal contexts (Frank & Ekman, 1997; Hauch et al., 2014). Suspects were videotaped (head and shoulders) while being interrogated with a standard set of questions (e.g., When did you first develop this opinion? Who was with you when you first developed this opinion? Are you lying to me now?).
Recent work in forensic psychology highlights the importance of employing the principle of differential recall enhancement in order to elicit content that provides discernible cues to deception (Colwell, Hiscock-Anisman, & Fede, 2013). The idea is that a person who is being honest believes that their honesty is transparent and therefore they do not try to manage their impression. In addition, a person who is being honest is able to recall the actual events from memory, which takes less effort than creating information while trying to appear consistent and credible (in the case of someone lying). We employed the principle of differential recall enhancement by simply asking at the end of each interview for the suspect to “Describe in as much detail as possible everything you remember about your opinion on this topic”. This final question can highlight differences in the credibility of statements and is now recommended as a basic requirement of standard interviewing techniques by police officers (Colwell et al., 2013).
Validation of stimuli
In order to select the best set of stimuli for the main training study, we conducted a validation study with the sets of six statements from all 20 middle-aged suspects (a total of 120 videos, which ranged from 56 s to 4 min 20 s; mean (M) = 1 min 54 s, SD = 49 s). The aim was to select one truth and one lie from each suspect that yielded typical performance (based on prior research), to avoid ceiling or floor effects and to allow for an increase in accuracy after the training intervention. A lifespan sample of 23 men and women (18–78 years old; M = 53.13 years, SD = 18.81; 70% women) participated in the validation study. Because judging 120 videos would likely fatigue participants, each participant viewed 30 video statements in a randomized order and judged each statement as a truth or a lie. Each video was judged by 8–10 participants (with age groups equally represented). In order to select stimuli that elicited typical, near-chance detection of truth versus deception in naïve participants, we selected the one truth and one lie statement from each suspect that were closest to 50% accuracy in the validation study. Each suspect had three truth videos and three lie videos from which to choose one truth and one lie video for the main study. For some suspects, accuracy across the different truth or lie statements was very low, whereas for other suspects it was much higher than 50%. For example, one suspect’s three truth videos were detected with 90%, 90%, and 100% accuracy in the validation study. In this case, we chose one of the 90%-accuracy truth videos for this suspect to be included in the main study. Another suspect’s lie videos were detected with 0%, 0%, and 11% accuracy in the validation study. We selected this suspect’s 11%-accuracy lie video to be included in the main study. This yielded 20 truth-lie pairs to use at pre-test and post-test (before and after the training intervention). The final set of videos ranged in validation-study accuracy from 11 to 90%. For the main study, we created two video sets (A and B), each containing 10 videos (5 truths and 5 lies) for pre-test and 10 videos (5 truths and 5 lies) for post-test. Each suspect was represented only once in each video set. Thus, if the truth statement from suspect 702 was used in video set A, then the lie statement from suspect 702 was used in video set B.
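The selection rule, one truth and one lie per suspect, each closest to 50% validation accuracy, can be expressed compactly. The sketch below is illustrative only: the data frame and column names are hypothetical, with rows corresponding to the suspect 702 example above.

```python
import pandas as pd

# Hypothetical illustration of the stimulus-selection rule; the rows mirror
# the suspect 702 example (truth accuracies 90/90/100%, lies 0/0/11%).
videos = pd.DataFrame({
    "suspect":  [702] * 6,
    "veracity": ["truth", "truth", "truth", "lie", "lie", "lie"],
    "accuracy": [0.90, 0.90, 1.00, 0.00, 0.00, 0.11],
})

# For each suspect x veracity cell, keep the video closest to chance (50%).
selected = (
    videos.assign(dist=(videos["accuracy"] - 0.50).abs())
          .sort_values("dist")
          .groupby(["suspect", "veracity"], as_index=False)
          .first()
)
print(selected)  # retains a 90%-accuracy truth and the 11%-accuracy lie
```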
These videos were then coded for the presence of valid facial and verbal cues to deception. To code the facial expressions in the videos, we used the Affectiva Affdex facial expression recognition engine within the iMotions Biometric Research Platform (version 7.1 software, 2018; Copenhagen, Denmark). First, two coders independently watched the playback of the videos, with the video box and points of reference displayed, to identify videos with artifacts that interfered with the facial processing. One person was chewing gum and one person brought their hand to their mouth several times, both of which interfered with the automated coding; these two suspects’ videos were excluded. This left 36 videos to code for facial expressions (18 truth-lie pairs). Because we expected that individuals who were lying would try to hide some of their facial expressions (Porter & ten Brinke, 2008), we used a criterion threshold of 10% to capture fleeting and partially concealed facial expressions. In a previous study, 90% of lies were correctly categorized as lies based solely on the presence of fear or disgust (Frank & Ekman, 1997). Coding of real televised footage of emotional pleas about a missing relative found that the presence of disgust and lower face smiles was predictive of liars (ten Brinke & Porter, 2012). Based on this prior evidence, we focused on the basic facial expressions of joy, fear, and disgust, as well as the presence of smiles (or “duping delight”). We included both joy expressions and smiles in case the duping delight leaked through only on the lower half of the face, in a smile. Frames were sampled every 17 ms. For each frame, if the software reported greater than 10% confidence that the emotion was present, that frame received a 1; otherwise it received a 0. We then multiplied the number of frames that exceeded the threshold by 17 to obtain the total duration (in milliseconds) of each facial expression per video. Next, we divided this duration by the total duration of the video to obtain a percent duration of each facial expression for each video. Given that there are individual differences in the facial expressions that leak out when lying (Frank & Ekman, 2004), we examined each video pair for the strongest emotional expressions. Nine video pairs differed on smile duration, with lie videos (M = 8.90%, SE = 3.22%) containing significantly greater overall durations of smiles than truth videos (M = 4.98%, SE = 2.50%) (paired samples t test: t(8) = 2.35, p = .047). Five video pairs (including one of the video pairs in the smile group) differed on the duration of the facial expression of disgust, with lie videos (M = 1.44%, SE = 0.70%) containing significantly greater overall durations of facial expressions of disgust than truth videos (M = 0.11%, SE = 0.09%) (related samples Wilcoxon signed rank test, p = .043). This left six video pairs without clear facial expression cues differentiating truth from lie. Thus, we confirmed that 13 (of 20) video pairs contained valid facial cues to deception.
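A minimal sketch of this scoring pipeline follows. The function and variable names are ours, not part of the iMotions software, and the smile values fed to the paired test are placeholders standing in for the durations produced by the real Affdex coding.

```python
import numpy as np
from scipy import stats

FRAME_MS = 17      # frames sampled every 17 ms (~60 Hz)
THRESHOLD = 10.0   # confidence (%) above which an expression counts as present

def percent_duration(frame_confidences, video_ms):
    """Percent of a video during which one expression exceeded the threshold."""
    present = np.asarray(frame_confidences) > THRESHOLD  # 1/0 per frame
    return 100.0 * present.sum() * FRAME_MS / video_ms

# Placeholder smile durations (%) for nine truth-lie pairs, standing in for
# the per-video values that percent_duration() yields on the coded videos.
smile_lie   = np.array([12.0, 4.1, 9.8, 15.2, 6.3, 11.0, 7.7, 5.9, 8.1])
smile_truth = np.array([ 6.2, 3.0, 4.9,  8.8, 2.1,  6.5, 4.0, 5.1, 4.2])
t_stat, p_val = stats.ttest_rel(smile_lie, smile_truth)  # paired t test, df = 8
print(f"t(8) = {t_stat:.2f}, p = {p_val:.3f}")
```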
The verbal content of the videos was transcribed and verified. To code the videos for valid verbal cues to deception, two coders independently coded transcripts of the 40 videos on the 14 verbal cues presented in Table 1. All cues were coded as present (1) or absent (0), except quantity of details, which was coded as absent (0), some (1), or a lot (2). The codes were not mutually exclusive; each transcript could be coded as having anywhere from 0 to 14 cues present. The coders first trained on transcripts from eight videos that were not selected for this study and then coded half of the 40 transcripts from the videos used in this study (20 transcripts). Coders worked independently, and the transcripts were stripped of identifying information such as gender and veracity. The mean kappa across the 14 cues was κ = 0.88 (SD = 0.22), and the Z score (mean κ divided by the SD of κ) was 3.94, indicating that agreement was significantly greater than chance (Z > 1.96). Discrepancies were resolved through discussion, and the final agreed-upon codes were used to determine whether the videos contained valid verbal cues to deception. After achieving reliability with the second coder on the first 20 transcripts, a single coder coded the final 20 transcripts.
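For illustration, this reliability computation can be reproduced as follows; the present/absent ratings and the per-cue kappa values below are placeholders, not the study's codes.

```python
import numpy as np
from sklearn.metrics import cohen_kappa_score

# Placeholder present (1) / absent (0) codes for one verbal cue across
# 20 transcripts, one array per independent coder.
coder1 = np.array([1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 1])
coder2 = np.array([1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1])
kappa_one_cue = cohen_kappa_score(coder1, coder2)

# Repeating this for all 14 cues yields one kappa per cue; the reported
# criterion divides the mean kappa by the SD of the kappas and compares
# the result with 1.96. The 14 values below are placeholders.
kappas = np.array([0.88, 1.00, 0.79, 0.65, 1.00, 0.92, 0.55, 1.00,
                   0.85, 0.90, 1.00, 0.72, 0.95, 1.00])
z = kappas.mean() / kappas.std(ddof=1)
print(f"mean kappa = {kappas.mean():.2f}, Z = {z:.2f}")
```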
Across all 20 video pairs, McNemar’s test for repeated measures showed that truth videos (19 of 20) were more likely to contain logical structure than lie videos (13 of 20), p = .031. Quantity of details and spontaneous corrections were also valid verbal cues for a subset of these video pairs (10 video pairs in total, 3 of which showed both cues). For quantity of details, the related samples Wilcoxon signed rank test was significant (p = .008) for seven video pairs, with truth videos (M = 1.43, SD = 0.53) containing a greater quantity of details than lie videos (M = 0.43, SD = 0.55). For spontaneous corrections, across six video pairs, all of the truth videos had spontaneous corrections whereas none of the lie videos did (McNemar’s test for repeated measures, p = .031).
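These paired tests can be run in Python as sketched below. The 2 × 2 table shows one arrangement of concordant and discordant pairs consistent with the reported totals (19/20 truths and 13/20 lies containing logical structure), and the quantity-of-details codes are placeholders chosen to match the reported means.

```python
import numpy as np
from scipy.stats import wilcoxon
from statsmodels.stats.contingency_tables import mcnemar

# Logical structure, present (1) / absent (0), cross-classified within each
# truth-lie pair; the discordant-cell split is one arrangement consistent
# with 19/20 truths and 13/20 lies containing the cue.
#                 lie = 1  lie = 0
table = np.array([[13,      6],    # truth = 1
                  [ 0,      1]])   # truth = 0
print(mcnemar(table, exact=True).pvalue)  # exact binomial test: p = .031

# Quantity-of-details codes (0 = absent, 1 = some, 2 = a lot) for the seven
# relevant pairs; placeholder values matching the reported means.
truth_details = np.array([1, 1, 1, 2, 2, 1, 2])
lie_details   = np.array([0, 0, 0, 1, 1, 0, 1])
print(wilcoxon(truth_details, lie_details).pvalue)  # signed rank test
```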
For the dependent variables of pre-test and post-test deceit detection accuracy, we focused on the video pairs with valid cues to deception. Each set of videos (pre-test A, post-test A, pre-test B, and post-test B) contained at least two lies and two truths with valid cues to deception for the facial, verbal, and control conditions. Videos that did not contain facial or verbal cues were excluded from the control accuracy scores. We converted the number correct to percent correct for each condition (separately for pre-test and post-test) as our main dependent variables.
Procedure
At pre-test and post-test, participants sat at a computer wearing headphones. They adjusted the headphone volume prior to the task using a music sample. Instructions were presented on the computer. Participants were told to watch each video and then judge whether the suspect was lying or telling the truth. They were told that anywhere from ¼ to ¾ of the people in the videos were lying. They responded after each video by circling truth or lie on a sheet of paper. Each video was numbered 1 through 10. Participants saw either video set A or video set B. After each test (pre and post), participants rated how accurate they thought they were at judging who was lying (on a 10-point scale). They also rated how confident they felt about their judgments (on a 10-point scale). Finally, participants listed the cues they used to detect deception. The procedure was identical at pre-test, which occurred prior to training, and at post-test, which occurred following training. The entire study took about 2.5 h (with breaks) to complete.
Training
Facial training was a self-paced tutorial based on Paul Ekman's Micro-Expression Training Tool (eMETT 3.0; available at paulekman.com). This training includes a benchmark assessment of accuracy in identifying micro-expressions of emotion, followed by 75 min of training and practice with feedback, and then an improvement measure that assesses micro-expression accuracy again after training. The emotion recognition training consisted of three parts: (1) a tutorial on the evidence for valid facial cues to deception, namely that fear of getting caught, shame about lying, and joy or “duping delight” leak out during deceptive statements (approximately 15 min; Frank & Ekman, 1997); (2) completion of the computerized training tool, which has been shown to improve the accuracy of micro-expression identification and includes instructions on how to identify different micro-expressions of emotion (e.g., fear, joy, shame; approximately 45 min); and (3) practice items with feedback that required identifying the different micro-expressions of emotion (approximately 20 min). These practice micro-expressions are the same micro-expressions that have been identified as valid cues to deception in videos from a mock crime scenario (Frank & Ekman, 1997, 2004).
We created the verbal training to match the facial training, drawing on the set of valid verbal cues from the literature, including the 14 criteria of the CBCA described above (Table 1) and additional verbal cues reported in the literature (e.g., less fluency; Driskell, 2012; Hauch et al., 2014). The CBCA has shown satisfactory reliability and validity for detecting adult lies (Gödert, Gamer, Rill, & Vossel, 2005; Landry & Brigham, 1992). The training consisted of three parts: (1) a tutorial on the evidence for valid versus invalid verbal cues to deception (approximately 10 min); (2) detailed descriptions of valid verbal cues to deceit, adapted from DePaulo et al. (2003) and Steller and Köhnken (1989) (approximately 40 min); and (3) practice with sample transcripts of statements containing each of the valid verbal cues (approximately 20 min). Participants gained practice identifying valid verbal cues to deceit and received feedback on their ability to identify these cues.
In order to provide diversity in the practice items, the items were drawn from two sources of truths and lies on different topics: (1) transcripts of videos of truths and lies obtained from ten Brinke and colleagues (ten Brinke et al., 2014), and (2) videos we collected for another study. The ten Brinke et al. (2014) videos consist of six individuals lying and six telling the truth about whether they stole money in a mock crime scenario in which they were instructed either to steal the money or not, but always to claim innocence. These videos were transcribed, and good and bad examples of each of the valid verbal cues to deception were used as sample items in the practice part of the training. In addition, we collected videos of 20 young (18–30 years; 50% women) and 20 older (60–85 years; 50% women) adults lying or telling the truth about six of their personal hopes and dreams (three lies and three truths from each person). These videos were transcribed, and the transcripts were used as material for practice items of valid and invalid verbal cues to deception.
Participants started with a benchmark assessment in which they categorized verbal statements into categories of valid verbal cues to deception. There were five categories: quantity of detail, contextual embedding, admitting doubt, spontaneous corrections, and self-deprecation. Next, participants worked through a self-paced tutorial in which they learned the categories and received feedback on practice trials. Finally, participants completed an improvement measure to assess their accuracy at categorizing verbal cues after the training. The entire verbal training took about 75 min.
For the control condition, participants completed a series of questions on the computer presented using Qualtrics software (Qualtrics, Provo, UT). Questions included logic and math problems, personality items, perceptual items/optical illusions, and puzzles. This took about 75 min.
Coding of cues
Participants listed the cues or strategies they used to make truth/lie judgments twice during the study: once after the deceit detection pre-test (after all 10 truth/lie judgments) and once after the deceit detection post-test (after the second set of all 10 truth/lie judgments). This yielded two thought-listing responses for each participant. Specifically, participants were asked: “What cues or strategies did you use to determine which statements were truths and which statements were lies?”
A theory- and data-driven coding scheme was developed for these open-ended responses. The coding scheme included the following 14 categories: hesitation, facial expressions, eye movements, logical response, recall of comments, speech characteristics, nonverbal behavior, nervous manner, details/context, personal beliefs, liar’s use of notes, miscellaneous, no cue reported/guessing, and not codeable. Two coders independently coded 20% of the 314 responses. Inter-rater agreement was high (85–100%). Coders discussed discrepancies to reach agreement. A single coder coded the remainder of the responses.