Aggregations of judgments often outperform those of individuals. This phenomenon, often termed “the wisdom of crowds” (Surowiecki 2005), has been shown in many decision and prediction contexts, including mathematical problems (Yi et al. 2012), game shows (Lee et al. 2011), and elections (Gaissmaier and Marewski 2011). However, most crowdsourcing methods have an important limitation—they cannot detect cases in which the majority is wrong.
One recently developed aggregation approach, the “surprisingly popular” method (Prelec et al. 2017; henceforth “SP”), has shown promise in overcoming this weakness. The SP method leverages metacognitive awareness: people who are correct, but in the minority, often know that their response is rare. Participants answer one additional question: what percentage of other participants will make the same judgment they did? These estimates are then compared to participants’ actual judgments. When an option is chosen more often than the average metacognitive judgment predicts, it is “surprisingly popular” and is selected by the method.
For example, suppose participants are asked whether Reno, Nevada, is east of Los Angeles, California. Because most of Nevada is east of most of California, people often respond that Reno is east of Los Angeles. This is incorrect; Reno is some 86 miles west of Los Angeles. Suppose that 30% of people know this. They also—importantly—know that this knowledge is rare, and estimate, on average, that 15% of others are also correct. Now consider the 70% of people who are incorrect; suppose that they believe, on average, that 90% of others agree with their answer (implying that only 10% answer “west”). Thus, although the average metajudgment was that only 11.5% of people (0.30 × 15% + 0.70 × 10%) believe that Reno is west of Los Angeles, that answer was actually given by 30% of respondents, making it “surprisingly popular.”
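The selection rule in this example can be sketched in a few lines of code. The sketch below uses the hypothetical Reno/Los Angeles numbers from the text; the function name and data layout are illustrative, not taken from Prelec et al.'s (2017) implementation.

```python
def surprisingly_popular(actual_shares, predicted_shares):
    """Select the option whose actual vote share exceeds the crowd's
    average prediction of that share by the largest margin."""
    return max(actual_shares,
               key=lambda opt: actual_shares[opt] - predicted_shares[opt])

# Actual vote shares: 70% answer "east", 30% answer "west".
actual = {"east": 0.70, "west": 0.30}

# Average metacognitive estimates of each answer's popularity:
# the incorrect 70% expect 90% agreement (so they predict 10% for "west");
# the correct 30% expect only 15% agreement.
predicted = {
    "east": 0.70 * 0.90 + 0.30 * (1 - 0.15),  # 0.885
    "west": 0.70 * (1 - 0.90) + 0.30 * 0.15,  # 0.115
}

print(surprisingly_popular(actual, predicted))  # -> west
```

Although only 11.5% of respondents were predicted to answer “west,” 30% actually did, so “west” is surprisingly popular and is selected.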
Most demonstrations of the SP method have examined judgments in which the correct answer is known. Although improving the accuracy of such judgments may inform understanding of judgments about as-yet-unsolved questions, it does not necessarily follow that improvements in problem solving imply improvements in prediction. Leveraging the SP method to improve prediction of future events is a particularly exciting potential application of this approach.
Lee et al. (2018) provided the first test of whether the SP method can improve collective judgments of unknown events—that is, future outcomes. Lee et al. (2018) had participants predict the winners of National Football League (NFL) games in the 2017 season. They found that, among participants who indicated that they were “extremely knowledgeable” about football, the SP method yielded better predictions than many NFL media figures, an alternative aggregation method (confidence-weighted judgments), and a prominent algorithmic approach to prediction (by fivethirtyeight.com). However, SP was inferior to the democratic method (the modal judgment). Given these mixed results, Lee et al. (2018) were appropriately cautious in their conclusions. First, they noted that participants could readily make metacognitive judgments about future events, just as they can about factual questions. Second, they emphasized the importance of expertise in yielding accurate predictions using the SP method. However, several important questions remain unanswered.
First, does the SP method actually yield more accurate predictions than other aggregation methods? The most straightforward implication of Lee et al. (2018) is that SP does not clearly outperform other approaches. Nevertheless, it may be that the particular NFL season examined by Lee et al. (2018) is not representative of future events, sporting events, or even NFL seasons. Thus, it remains useful to provide additional tests of the SP method.
Second, does the SP method perform better when it aggregates judgments made by experts? Prelec et al. (2017) did not find systematic differences in the effectiveness of the SP method based on expertise. In contrast, Lee et al. (2018) found that the SP method was more effective when it aggregated only judgments made by self-assessed experts. However, this selection decision was exploratory (Lee et al. 2018, p. 326), and moreover, self-assessments of expertise are not always accurate (Kruger and Dunning 1999).
To examine these questions, we conducted three studies in which participants predicted future outcomes. We compared the SP method to other methods of aggregating crowdsourced judgments and also assessed expertise by testing domain knowledge. Study 1 examined predictions of NFL games made by students, Study 2 examined predictions of the 2018 midterm elections made by MTurk workers, and Study 3 examined predictions of NBA games made by members of the /r/NBA and /r/sportsbook subreddits and students in a sport psychology course. We hypothesized that the SP method, when applied to judgments made by experts, would yield more accurate forecasts than other crowdsourcing approaches.