Memory accuracy
In separate repeated-measures ANOVAs for the three dependent measures (proportion correct, proportion of true/false reversals, and proportion of trials incorrectly labelled as “never seen”), we examined the impact of threat of shock (threat vs. safe), veracity (true vs. false), and valence (positive vs. neutral vs. negative) as within-subjects factors on memory accuracy.
Proportion correct
As hypothesized, analysis of the proportion of correct responses revealed a significant veracity x threat of shock interaction, F(1,97) = 4.86, p = 0.030, ηp2 = 0.048. Follow-up pairwise comparisons revealed that response accuracy was significantly lower for trials in which participants had been informed that a pairing was false while under threat (M = 0.69, SE = 0.02) compared to trials in which participants had been told that a pairing was true while under threat (M = 0.74, SE = 0.02), mean difference (MD) = 0.051, 95% CI [0.009, 0.093], p = 0.017, but that for safe trials the proportion of correct responses for negated (i.e., “false”) trials (M = 0.73, SE = 0.02) did not significantly differ from the proportion of correct responses for non-negated (i.e., “true”) trials (M = 0.72, SE = 0.02), MD = 0.003, 95% CI [−0.036, 0.430], p = 0.866 (see Fig. 3).
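The partial eta squared values reported in this section can be checked against the accompanying F statistics and degrees of freedom using the standard identity ηp2 = (F × df1) / (F × df1 + df2). The following is a minimal sketch of that check; the function name is illustrative and is not taken from the authors' analysis code.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Recover partial eta squared from an F statistic and its degrees of freedom.

    Uses the identity eta_p^2 = (F * df_effect) / (F * df_effect + df_error).
    The function name and signature are illustrative assumptions.
    """
    if f_value < 0 or df_effect <= 0 or df_error <= 0:
        raise ValueError("F must be non-negative and both df must be positive")
    numerator = f_value * df_effect
    return numerator / (numerator + df_error)


# Veracity x threat of shock interaction for proportion correct: F(1,97) = 4.86
print(round(partial_eta_squared(4.86, 1, 97), 3))  # 0.048
```

Applied to the interaction above, the identity reproduces the reported value of 0.048. It also accepts fractional degrees of freedom, as arise after a sphericity correction.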
A significant veracity x valence interaction also emerged (see Footnote 1), F(1.88,181.94) = 3.68, p = 0.030, ηp2 = 0.037. Deconstructing this interaction, we found that for positively valenced information, the proportion of correct responses was significantly lower for trials that were negated (M = 0.68, SE = 0.02) compared to non-negated (“true”) trials (M = 0.74, SE = 0.02), MD = −0.05, 95% CI [0.005, 0.100], p = 0.032. However, for neutral trials we did not see a significant decrease in correct responses for negated trials (M = 0.71, SE = 0.02) compared to non-negated trials (M = 0.76, SE = 0.02), MD = −0.05, 95% CI [0.001, −1.04], p = 0.057, nor did this pattern emerge for negatively valenced trials that were negated (M = 0.74, SE = 0.02) compared to non-negated (M = 0.70, SE = 0.02), MD = 0.03, 95% CI [−0.088, 0.024], p = 0.026. No further main effects or interactions reached significance, including those related to valence (all p’s > 0.13).
Proportion true/false reversal errors
Analysis of the proportion of true/false reversal errors (that is, when participants mistook false items for true items, or true items for false items) revealed no significant veracity x threat of shock interaction, F(1,97) = 0.49, p = 0.482, ηp2 = 0.005 (see Fig. 4, panel A). This was contrary to our hypotheses that threat of shock would specifically lead participants to mistake false pairings for true ones.
A significant veracity x valence interaction was detected, F(2,194) = 3.21, p = 0.042, ηp2 = 0.032. Echoing the above findings for correct responses, follow-up pairwise comparisons indicated that for positively valenced information, the proportion of true/false reversal responses was significantly higher for trials that were negated (M = 0.25, SE = 0.02) compared to non-negated (“true”) trials (M = 0.21, SE = 0.02), MD = 0.05, 95% CI [0.002, 0.090], p = 0.042. This was not the case for neutral or negatively valenced trials. We found no main effect of threat of shock for true/false reversals, F(1,97) = 0.019, p = 0.892, ηp2 = 0.00, nor did any further main effects or interactions reach significance (all p’s > 0.19).
Proportion of “never seen” errors
However, when examining the proportion of trials in which participants incorrectly answered that they had never seen the pairing before, we observed a significant veracity x threat of shock interaction, F(1,97) = 7.09, p = 0.009, ηp2 = 0.068. Follow-up pairwise comparisons indicated that for threat trials, the proportion of incorrect “never seen” responses was significantly higher for negated trials (i.e., “FALSE”; M = 0.070, SE = 0.01) compared to non-negated trials (i.e., “TRUE”; M = 0.056, SE = 0.007), MD = 0.023, 95% CI [0.001, 0.045], p = 0.037. Meanwhile, this was not the case for safe trials, where the proportion of incorrect “never seen” responses did not differ significantly between negated trials (i.e., “FALSE”; M = 0.042, SE = 0.007) and non-negated trials (i.e., “TRUE”; M = 0.056, SE = 0.009), MD = 0.014, 95% CI [−0.03, 0.032], p = 0.10 (see Fig. 4, panel B). No further main effects or interactions reached significance (all p’s > 0.17).
Manipulation checks
We performed a manipulation check on participants’ skin conductance responses (SCR). To determine the impact of threat of shock on participants’ physiological responding, we conducted an ANOVA with threat of shock (safe versus threat of shock) as a within-subjects factor. We observed a significant main effect of threat of shock, F(1,97) = 189.23, p < 0.001, ηp2 = 0.66, whereby participants exhibited significantly more physiological arousal during threat of shock trials (M = 1.13, SE = 0.05) compared to safe trials (M = 0.79, SE = 0.04), MD = 0.34, 95% CI [0.29, 0.39], p < 0.001. SCR responses on shock trials were uninterpretable due to movement artefacts.
Self-report measures
Paired samples t tests revealed that participants self-reported significantly more stress when they saw the cue signalling threat of shock (M = 3.86, SD = 0.81) compared to the cue signalling safety (M = 1.50, SD = 0.80); t(97) = 26.23, p < 0.001, d = 2.65. Participants also reported that their expectations of receiving a shock were significantly greater when the cue signalling threat of shock was presented (M = 3.50, SD = 0.97) than when the cue signalling safety was presented (M = 1.54, SD = 1.00); t(97) = 15.11, p < 0.001, d = 1.52.
Confidence ratings
A paired samples t test on mean confidence ratings for correct versus incorrect responses revealed that participants rated their confidence as significantly higher for correct responses (M = 3.82, SD = 0.65) than for incorrect responses (M = 2.81, SD = 0.73); t(97) = 16.90, p < 0.001, d = 1.72.
Additionally, participants’ confidence ratings for (a) correct and (b) incorrect responses were submitted to two separate 2 (threat of shock: threat vs. safe) × 2 (veracity: true vs. false) × 3 (valence: positive vs. neutral vs. negative) within-subjects ANOVAs.
For correct responses, a significant veracity x threat of shock interaction emerged, F(1,78) = 4.79, p = 0.032, ηp2 = 0.058 (see Footnote 2). Follow-up pairwise comparisons revealed that for face-descriptor pairings learned under threat of shock, participants had significantly lower confidence for correct responses to non-negated (“true”) pairings (M = 3.79, SE = 0.09) than for correct responses to negated pairings (M = 3.94, SE = 0.08), MD = −0.15, 95% CI [−0.272, −0.028], p = 0.016. Although this stands in apparent contrast with the patterns of accuracy, it is important to note the small magnitude of the effects involving confidence ratings, which at their largest amounted to 0.17 on a 1–5 rating scale. Given that confidence ratings were not a main outcome of interest in the current study, future work should directly target the interacting effect of arousal and negation of information on confidence for what people remember. Notably, in the absence of threat of shock, participants’ confidence in their responses to non-negated (M = 3.92, SE = 0.08) and negated (M = 3.87, SE = 0.09) face-descriptor pairings did not significantly differ (MD = 0.05, 95% CI [−0.094, 0.202], p = 0.47). A significant main effect of valence also emerged, F(2,156) = 7.32, p = 0.001, ηp2 = 0.086. Follow-up pairwise comparisons revealed that participants expressed higher confidence in their responses for neutral descriptors (M = 3.99, SE = 0.08) than for positive descriptors (M = 3.98, SE = 0.08), MD = 0.15, 95% CI [0.057, 0.246], p = 0.002, or negative descriptors (M = 3.81, SE = 0.08), MD = 0.17, 95% CI [0.072, 0.271], p = 0.001. Confidence ratings between positive and negative face-descriptors did not significantly differ, p = 0.69. No further main or interaction effects reached significance (all p’s > 0.18).
For incorrect responses, no main or interaction effects reached significance (all p's > 0.17).
Interim summary
Our analysis of the behavioural findings suggests that for memories encoded in safe conditions, participants can update, or correct, representations of the veracity of information with little impact on either negated or non-negated items. Conversely, memory specifically suffers for items that are subsequently negated under looming threat (i.e., threat of shock).
Further research is necessary to draw conclusions about the mechanism underlying this pattern. In contrast to earlier work (e.g., Gilbert et al. 1990), incorrect responses did not necessarily stem from participants miscategorizing negated items as non-negated. Instead, in the present study, these errors seemed to reflect a small number of instances where participants reported never having seen a pairing at all when it had been negated under threat.
Linear ballistic accumulator modelling
These results suggest that looming threat affects negated representations differently from non-negated memories, but without further analysis it is unclear whether this pattern was due to participants actually having poorer memory of the negated face-descriptor pairings or to decisional processes. For example, it could be that when memories are encoded under threat, participants report these memories more impulsively, leading to more errors when information has been negated.
To help determine what underlay the higher error rate for information negated under threat, we fit our choice and response time data from the memory phase to a linear ballistic accumulator (LBA) race model (Brown and Heathcote 2008). To keep the model tractable, and as word valence was not directly relevant to this question, we did not include any effect of word valence in the LBA model (though trials from all valences were included during model fitting). The LBA assumes that when making a response from a selection of alternatives, each alternative is represented by an accumulator. These accumulators race against each other, with the speed of each accumulator given by its drift rate. Drift rate is determined by the amount of evidence in favour of each alternative: the more evidence for one alternative over another (in this case, the stronger the memory), the higher its drift rate will be. At some point in the race, one of the accumulators reaches a predetermined threshold, whereupon it "wins" the race and is selected. The higher the threshold for an alternative, the more evidence must be collected for that alternative to win the race, and the less such decisions are susceptible to decision noise, or "gut feeling"/flippant responses. Conversely, lower thresholds lead to more decision noise and fewer correct responses.
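The race described above can be illustrated with a short simulation. This is a minimal sketch of a generic LBA trial, not the authors' fitted model (which is reported in Additional file 1). All parameter values, the response labels, and the function names are illustrative assumptions, as is the convention of redrawing a trial when every sampled drift rate is negative.

```python
import random


def simulate_lba_trial(mean_drifts, threshold, start_max=0.5,
                       drift_sd=1.0, t0=0.2, rng=random):
    """Simulate one LBA trial and return (winning label, response time).

    mean_drifts maps each response alternative to its mean drift rate.
    Each accumulator starts at a uniform point in [0, start_max] and rises
    linearly at a drift sampled from a normal distribution. All default
    values are illustrative assumptions, not fitted estimates.
    """
    while True:
        finish_times = {}
        for label, mean_drift in mean_drifts.items():
            start = rng.uniform(0.0, start_max)
            drift = rng.gauss(mean_drift, drift_sd)
            if drift > 0:  # accumulators with negative drift never finish
                finish_times[label] = (threshold - start) / drift
        if finish_times:  # otherwise redraw the trial (assumed convention)
            winner = min(finish_times, key=finish_times.get)
            return winner, t0 + finish_times[winner]


def choice_proportions(mean_drifts, threshold, n_trials=5000, seed=0):
    """Estimate how often each alternative wins the race."""
    rng = random.Random(seed)
    counts = {label: 0 for label in mean_drifts}
    for _ in range(n_trials):
        winner, _rt = simulate_lba_trial(mean_drifts, threshold, rng=rng)
        counts[winner] += 1
    return {label: count / n_trials for label, count in counts.items()}


# Hypothetical drift rates: the stronger the memory, the higher the drift
# for the correct alternative relative to the two error types.
example_drifts = {"correct": 2.0, "reversal": 1.0, "never_seen": 0.5}
print(choice_proportions(example_drifts, threshold=1.5))
```

In this sketch, raising the mean drift rate for the "correct" accumulator stands in for a stronger memory, while changing `threshold` stands in for a change in response caution. These are the two sources of error that the modelling aims to tell apart.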
The aggregated parameter estimates for the thresholds (Fig. 5a) suggested no difference between thresholds for threat trials versus safe trials, and this was supported by our analyses, t(97) = 0.26, p = 0.794. (The full report of all modelling methods and analyses can be found in Additional file 1.) We also found no difference in the thresholds for any of the different response types (correct versus true/false reversal versus never seen; Fig. 5b), F(2,194) = 2.35, p = 0.098. This indicated that participants did not differ in their level of impulsiveness when responding to threat or safe trials and that participants were not biased to make any particular response type. However, as suggested by Fig. 5a, we found that participants required less evidence on foil trials (where the face-descriptor pairing had not been previously seen) before making a response compared to threat trials, t(97) = 14.63, p < 0.001, or safe trials, t(97) = 14.31, p < 0.001, indicating that participants appeared to be able to recognize when they had not seen a face-descriptor pairing before, and required less evidence when making a response on those trials.
In drift rate, there was a clear effect of response type (correct versus true/false reversal versus never seen responses), F(2,194) = 743.43, p < 0.001, with faster drift rates for correct responses than for incorrect responses (true/false reversals), t(391) = 28.77, p < 0.001, and faster drift rates for incorrect trials than for never-seen trials, t(391) = 14.25, p < 0.001. We also observed a response type x veracity interaction (Fig. 5c), F(2,194) = 10.67, p < 0.001, with negated trials having lower drift rates for the correct response and higher drift rates for the incorrect response compared to non-negated trials. This indicated that participants overall had a poorer memory for negated (“false”) trials than for non-negated (“true”) trials, in line with the behavioural findings. We did not observe any veracity x threat of shock interaction, threat of shock x response type interaction, or three-way interaction (all p’s > 0.48). This suggests that although memory for negated items was overall worse than memory for non-negated items, there was no evidence that memory was affected by threat of shock in the model.