Visual learning is a critical skill in medical diagnosis education. For example, neurologists make diagnostic decisions by viewing magnetic resonance scans, radiologists analyze mammograms for evidence of cancer, and dermatologists inspect skin lesions for melanoma. Anecdotally, medical educators often introduce more typical cases first (those that have the classic representation of the symptom) before introducing more atypical cases. This suggests that an easy-to-hard schedule might already be in use by medical educators. One medical training study in cytopathology (Evered et al., 2014) manipulated the difficulty of the training items and suggested that training should avoid images along category boundaries. However, it was unclear whether the easy-to-hard schedule was superior to the hard-to-easy schedule.
The purpose of the present training task was to teach trainees to correctly diagnose whether a pigmented lesion is melanoma or benign. Previously, Xu, Rourke, Robinson, and Tanaka (2016) showed that trainees can improve significantly in melanoma diagnosis after receiving perceptual training that combines exposure to multiple exemplars of pigmented skin lesion images, immediate feedback on the correctness of each diagnosis, and a requirement to reach an accuracy criterion of 90% on all the training images. In this study, instead of being scheduled randomly, training items were introduced following either the easy-to-hard or the hard-to-easy schedule.
Experiment 2 directly compares the performance and knowledge retention of groups trained using an easy-to-hard training schedule and a hard-to-easy training schedule. Categorizing lesions is a good test of the predictions of the two types of scheduling procedures because it requires integration of information across multiple dimensions (e.g. size, coloration, symmetry, and contour) (Ashby & Spiering, 2004; Spiering & Ashby, 2008). When categorization requires perceptual integration, some studies have found a learning advantage for the easy-to-hard approach (Church et al., 2013; Liu et al., 2008; McLaren & Suret, 2000), while others have reported a learning advantage for the hard-to-easy approach (Spiering & Ashby, 2008). In the current experiment, all participants received the same number of easy, medium, and difficult training trials. Participants in the easy-to-hard group were trained with the easy items first, followed by the medium items and then the hard items. Participants in the hard-to-easy group learned items in the reverse order. Item difficulty was determined using the Ease values of the skin lesion images obtained in Experiment 1. Performance under the two schedules was tested before training, immediately after training, and two weeks later. Pretest and post-test performance was then correlated with the Ease values of individual test items. The two training conditions were compared by examining overall performance, as well as difficulty-specific performance, in the immediate and two-week post-tests.
Method
Participants
Based on an a priori power analysis using the criteria of Cohen’s d = 0.8 (large effect size, Cohen, 1988), alpha = 0.05, power = 0.8, and an attrition rate of 20%, we planned to test 31 participants in each of the training conditions. Sixty-two undergraduate students from the University of Victoria participated in the study. All of the participants had normal or corrected-to-normal vision and none of them had received formal medical training. Thirty-one participants (seven men) were randomly assigned to the easy-to-hard condition and the other 31 participants (10 men) to the hard-to-easy condition. The average ages of the easy-to-hard (M = 22.7, SD = 5.4) and hard-to-easy (M = 21.8, SD = 4.4) groups were not significantly different (t(60) = 0.72, p = 0.47, Cohen’s d = 0.18), nor were the gender ratios (χ² = 0.73, p = 0.40).
Melanoma diagnosis test (MDT)
The MDT is a measure of the ability to discriminate between melanoma and benign pigmented skin lesion images. In the MDT, six images of each of the four types of melanoma and benign lesions were selected (48 images in total) from the image pool. A mixture of easy, medium, and hard items was selected for melanoma and benign lesions. The melanoma and benign lesions had an average Ease value of 0.60 (SD = 0.10) and 0.57 (SD = 0.12), respectively. In each trial, participants saw one skin lesion image and were asked to judge whether the lesion was “Benign” or “Melanoma” by clicking the buttons presented under the image. The MDT served as the pretest (before training), immediate post-test (immediately after training), and delayed post-test (two weeks after training). The images were identical in all three tests, with the exception that images were rotated 90° clockwise for the immediate post-test and 180° clockwise for the two-week post-test. Images used in the MDT were never used in the training.
Training
Twelve images of each of the four types of melanoma and benign skin lesions were used for training (96 in total). Images used during training were never used in the MDT. All the benign and melanoma lesion images were first sorted by their Ease values. The 16 melanoma images (regardless of their sub-types) with the highest Ease values were labeled as easy items, the 16 melanoma images with the lowest Ease values were labeled as hard items, and the remaining 16 melanoma images were labeled as medium items. The same method was used to group the benign lesions. As a result, for melanoma lesions, the easy, medium, and hard items had mean Ease values of 0.72 (SD = 0.06), 0.59 (SD = 0.03), and 0.49 (SD = 0.05), respectively. For benign lesions, the easy, medium, and hard items had mean Ease values of 0.72 (SD = 0.08), 0.56 (SD = 0.03), and 0.43 (SD = 0.07), respectively. Each of the easy, medium, and hard training blocks contained 16 melanoma and 16 benign lesion images. In the training, participants in the easy-to-hard (hard-to-easy) condition received four iterations of the easy (hard) training block, followed by four iterations of the medium training block, and, finally, four iterations of the hard (easy) training block. When a training block was repeated, the same images were used as in the previous block, but each appeared randomly in one of four orientations (i.e. upright, inverted, rotated 90° clockwise, and rotated 90° counterclockwise). As a result, all participants completed 384 trials during training. In each trial, participants were required to decide whether the lesion image presented on the screen was melanoma or benign by clicking on the “Melanoma” or “Benign” button presented underneath the image. Feedback about the accuracy of the diagnosis was provided immediately after participants responded.
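The grouping and blocking procedure described above can be sketched in a few lines of Python. This is only an illustration, not the code used in the study: the function names, the data layout (pairs of image identifier and Ease value), the shuffling of trial order within each block iteration, and the random orientation on the first presentation are all assumptions that the text does not specify.

```python
import random

ORIENTATIONS = ["upright", "inverted", "cw90", "ccw90"]

def split_by_ease(items, per_level=16):
    """Sort (image_id, ease) pairs by Ease, highest first, and cut into three levels."""
    ranked = sorted(items, key=lambda pair: pair[1], reverse=True)
    return {
        "easy": ranked[:per_level],
        "medium": ranked[per_level:2 * per_level],
        "hard": ranked[2 * per_level:3 * per_level],
    }

def build_schedule(melanoma, benign, order="easy-to-hard", iterations=4, seed=0):
    """Return the full trial list for one participant (hypothetical helper)."""
    rng = random.Random(seed)
    mel, ben = split_by_ease(melanoma), split_by_ease(benign)
    levels = ["easy", "medium", "hard"]
    if order == "hard-to-easy":
        levels.reverse()
    trials = []
    for level in levels:
        # One block = 16 melanoma + 16 benign images of the same difficulty level.
        block = [(img, "melanoma") for img, _ in mel[level]]
        block += [(img, "benign") for img, _ in ben[level]]
        for _ in range(iterations):
            shuffled = block[:]
            rng.shuffle(shuffled)  # assumption: trial order is shuffled per iteration
            for img, label in shuffled:
                trials.append({
                    "image": img,
                    "label": label,
                    "level": level,
                    "orientation": rng.choice(ORIENTATIONS),
                })
    return trials
```

With 48 melanoma and 48 benign images, three difficulty levels, and four iterations per block, the sketch yields 3 × 4 × 32 = 384 trials, matching the trial count reported above.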
Procedure
The detailed training and pre/post-test arrangements are illustrated in Fig. 4. All 62 participants visited the lab on day 1. They first took the MDT as the pretest. Then, they were randomly assigned to the easy-to-hard or hard-to-easy condition for the training. After they completed the training, they were given the MDT as the immediate post-test. All participants were invited to complete the second post-test remotely, using their own computers, 14 days after the first post-test. Both the MDT and the training were programmed in JavaScript using the jsPsych library (de Leeuw, 2015) and deployed using an online data collection platform developed by the lab led by the senior author of this study. Skin lesion images were 300 × 300 pixels in size. On day 1, the pretest, training, and immediate post-test were conducted in the lab. Participants viewed the images on a 22-inch monitor with a resolution of 1680 × 1050 pixels at a viewing distance of approximately 70 cm, resulting in a visual angle of 6.9° × 7.0°. However, participants were not instructed to maintain this viewing distance during the experiment. The two-week post-test was done remotely, so the display size and viewing distance were unknown.
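As a rough check on the reported visual angle, the standard formula θ = 2·arctan(s / 2D) can be applied to the stimulus size s and viewing distance D. The sketch below assumes square pixels and that the 22-inch figure is the diagonal of the visible display area; neither detail is stated in the text, and the helper names are hypothetical.

```python
import math

def cm_per_pixel(diagonal_in, res_w, res_h):
    """Physical size of one pixel, assuming square pixels and a visible diagonal."""
    diagonal_px = math.hypot(res_w, res_h)
    return diagonal_in * 2.54 / diagonal_px

def visual_angle_deg(size_px, pixel_cm, distance_cm):
    """Visual angle subtended by a stimulus of size_px pixels at distance_cm."""
    size_cm = size_px * pixel_cm
    return math.degrees(2 * math.atan(size_cm / (2 * distance_cm)))

pixel_cm = cm_per_pixel(22, 1680, 1050)
angle = visual_angle_deg(300, pixel_cm, 70)
print(f"{angle:.1f} degrees per side")
```

Under these assumptions a 300-pixel image subtends about 6.9° per side, close to the reported 6.9° × 7.0°. The small difference in the second dimension could come from the measured physical dimensions of the actual display.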
Results and discussion
Improvement across training sessions
Using correct detection of melanoma as a hit (H) and categorization of a benign lesion as melanoma as a false alarm (FA), sensitivity (d′) was calculated for each individual as the difference between the Z transform of the H rate and the Z transform of the FA rate (i.e. d′ = Z(H) − Z(FA)). The measure of d′ was calculated for each training block across all three training sessions (Fig. 5). Visual inspection suggested that in each session, training performance improved continuously in both groups. Moreover, in Session 1, where the easy-to-hard (hard-to-easy) group repeated four blocks of training with easy (hard) items, training performance was better in the easy-to-hard than in the hard-to-easy group. This pattern was mirrored in Session 3, where the easy-to-hard (hard-to-easy) group repeated four blocks of training with hard (easy) items and training performance was better in the hard-to-easy than in the easy-to-hard group. Interestingly, in Session 2, where both groups of participants repeated four blocks of medium-difficulty items, the easy-to-hard group performed better than the hard-to-easy group. These observations were tested with a 3 × 2 ANOVA examining the two-way interaction between the within-subject variable Session (Sessions 1–3) and the between-subject variable Training Policy (easy-to-hard versus hard-to-easy). This interaction was significant (F(2,120) = 854.51, p < 0.001, η² = 0.85). Bonferroni-corrected comparisons were conducted between the two groups for each session. There were significant group differences in all sessions: Session 1 (t(60) = 23.93, p < 0.001, Cohen’s d = 6.07); Session 2 (t(60) = 4.51, p < 0.001, Cohen’s d = 1.15); and Session 3 (t(60) = −26.35, p < 0.001, Cohen’s d = 6.69).
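The sensitivity measure defined above can be illustrated with a minimal Python sketch. The Z transform is the inverse of the standard normal cumulative distribution function. Because that transform is undefined for rates of exactly 0 or 1, the sketch applies a log-linear correction (adding 0.5 to each count); this correction is an assumption for illustration, as the text does not say how extreme rates were handled, and the function name is hypothetical.

```python
from statistics import NormalDist

def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity d' = Z(H) - Z(FA) for one participant in one block."""
    # Assumed log-linear correction: keeps both rates strictly between 0 and 1.
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf  # inverse standard normal CDF (the Z transform)
    return z(hit_rate) - z(fa_rate)

# Example: a 32-trial block with 16 melanoma and 16 benign images.
print(d_prime(hits=12, misses=4, false_alarms=4, correct_rejections=12))
```

A participant responding at chance (equal hit and false-alarm rates) obtains d′ = 0, and larger values indicate better discrimination between melanoma and benign lesions.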
The most important finding from the training data was from the between-group comparison in Session 2, which provides evidence regarding the advantages of the easy-to-hard policy. Unlike Session 1 or Session 3, both groups were trained using the exact same items in Session 2. Before Session 2, the two groups of participants had different training experience. In Session 1, the easy-to-hard group was trained with the easy items whereas the hard-to-easy group was trained using the hard items. Therefore, any differences emerging from the between-group comparison can only be attributed to the different training history for the two groups. These results suggest that learning the easy items first established a better foundation for the trainees to learn the medium difficulty items in the subsequent session.
Post-training gain
The sensitivity (d′) measure was used to compare performance in the MDTs administered before and immediately after the training. A 2 × 2 ANOVA was conducted, with Test (pre versus post) as the within-subject variable and Training Policy (easy-to-hard versus hard-to-easy) as the between-subject variable (Fig. 6). The main effect of Test (F(1,60) = 145.32, p < 0.001, η² = 0.49) was significant, indicating that both groups improved after the training. However, neither the main effect of Training Policy nor the interaction between Test and Training Policy was significant (all Fs < 2.1, all ps > 0.15). Similar results were found when H and FA rates were analyzed separately, with only the main effect of Test being significant. The direct comparison between pretest and post-test performance shows that the easy-to-hard and hard-to-easy training policies improved overall melanoma diagnosis performance to the same degree.
Ease value predictions
Another important question is whether the Ease values can accurately predict diagnosis performance on the lesion images. If the Ease values are a good predictor of the diagnostic difficulty of the lesion images, participant performance should correlate with the predictions. It was also hypothesized that the accuracy of the MDT items should not correlate significantly with the Ease values in the pretest MDT, but that the correlation should be significantly larger for the same items in the post-test MDTs. The Ease algorithm effectively constitutes a simple model of an expert. Before training, participant performance should correlate poorly with the predictions of an expert model. In contrast, trained participant performance should correlate highly with the predictions of an expert model.
The Ease values of the 48 items in the MDT were correlated with the actual performance on those items in the pretest, immediate post-test, and two-week post-test. All 62 participants’ data were used to compare the correlation between the Ease values and pretest performance with the correlation between the Ease values and immediate post-test performance (Fig. 7). The results showed that the Ease values did not correlate significantly with the accuracy of the items in the pretest MDT (r = 0.15, p = 0.32), but correlated significantly with the accuracy of the items in the immediate post-test (r = 0.66, p < 0.001). The difference between these two correlation coefficients was significant (p < 0.005), indicating that the improved correlation is due to training.
The significant correlation between Ease and performance in the immediate post-test was then examined separately for the easy-to-hard and hard-to-easy conditions. The results showed that this correlation was significant in both the easy-to-hard (r = 0.60, p < 0.001) and hard-to-easy conditions (r = 0.68, p < 0.001), and that the correlation coefficients of the two groups were not significantly different (p = 0.52). For both training policies, equivalent and significant correlations were found between the Ease values and actual performance in the post-test MDT, but not in the pretest MDT. This suggests that participants’ internal representation of the category structure became more expert-like, as measured by the predictions of the Ease algorithm.
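The text does not name the test used to compare correlation coefficients. One common option is the Fisher r-to-z test for two independent correlations, sketched below with n = 48 (the number of MDT items) for each coefficient. Both the choice of test and the sample sizes are assumptions for illustration. This version also ignores the dependency that arises because the same 48 items and Ease values enter both correlations, which a test for dependent correlations would take into account.

```python
import math
from statistics import NormalDist

def fisher_z_test(r1, n1, r2, n2):
    """Two-tailed Fisher r-to-z test for two independent correlations."""
    z1, z2 = math.atanh(r1), math.atanh(r2)        # Fisher transformation
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))    # standard error of z1 - z2
    z = (z1 - z2) / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p

# Pretest versus immediate post-test correlations with Ease.
print(fisher_z_test(0.15, 48, 0.66, 48))
# Easy-to-hard versus hard-to-easy correlations in the immediate post-test.
print(fisher_z_test(0.60, 48, 0.68, 48))
```

Under these assumptions the first comparison gives p below 0.005 and the second gives p of roughly 0.52, in line with the values reported above, although this agreement does not establish that this was the procedure actually used.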
Retention
Retention was measured using data from the first post-test (immediately after training) and the second post-test (14 days later). Fifty-two of the 62 participants (25 in the easy-to-hard condition) completed the two-week post-test, resulting in an attrition rate of 16%. In order to investigate the performance change between the immediate post-test and the two-week post-test separately for easy, medium, and hard items, items in the MDT were binned into easy (16 items), medium (16 items), and hard (16 items) sets based on their Ease values. A 3 × 2 × 2 ANOVA was conducted with Difficulty (easy, medium, and hard) and Test (immediate post-test versus two-week post-test) as within-subject variables and Training Policy (easy-to-hard and hard-to-easy) as the between-subject variable (Fig. 8). A main effect was found for Test (F(1,50) = 20.34, p < 0.001, η² = 0.05). However, the main effect of Training Policy and the two-way interactions involving Training Policy were not significant (all Fs < 3.3, all ps > 0.07, all η² < 0.008), indicating that performance dropped to a similar degree in both groups from the first to the second post-test. Importantly, the three-way interaction between Training Policy, Test, and Difficulty was significant (F(2,100) = 5.96, p < 0.01, η² = 0.02). In order to further investigate this three-way interaction, retention scores were calculated as the difference between performance at the immediate post-test and at the two-week post-test. Bonferroni-corrected t-tests comparing the degree of decay between the two groups showed that the easy-to-hard group had less decay in both the easy (t(49) = 2.21, p < 0.05, Cohen’s d = 0.62) and medium items (t(49) = 2.75, p < 0.01, Cohen’s d = 0.77) but had equivalent decay in the hard items (t(49) = −1.65, p = 0.11, Cohen’s d = 0.46).
In summary, both groups showed equivalent overall performance drops at the two-week post-test. These results indicate that visual categorization knowledge deteriorated for participants in both groups. However, between-group differences were found when the performance decay was examined at the level of item difficulty. The easy-to-hard condition resulted in better retention of performance on the easy and medium items.