Response sheets
The NASA task was divided into three phases: (a) an individual task, (b) a group task, and (c) feedback. The individual and group decision-making phases were each completed on a separate page. In the individual task phase, the survival situation was described at the top of the page and an answer form was presented at the bottom. The answer form comprised 15 rows corresponding to the 15 items and five columns: items, rank order, usefulness, confidence, and degree of agreement with others’ answers. In the items column, the 15 items were listed vertically as in a previous study (Hall & Watson, 1970). Participants completed the rank order column by ranking the 15 items in terms of their relative importance for survival. In the usefulness column, participants wrote freely about how the items could be used for survival. In the confidence column, participants rated their confidence in each ranking on a 5-point scale (1 = not at all; 5 = very much). In the degree of agreement column, participants estimated how much others’ rank orders would agree with their own (0% = not at all; 100% = perfectly). The group task phase was similar to the individual task except for the degree of agreement column, in which participants estimated the extent to which other groups’ answers would agree with their own group’s (0% = not at all; 100% = perfectly). In this paper, the responses on degree of agreement were not used for either the individual or the group phase because of several missing values. In the feedback phase, participants received feedback on the results of the task. The feedback page contained 15 rows corresponding to the 15 items and five columns: (a) individual answers, (b) group answers, (c) correct answers, (d) individual error scores, and (e) group error scores. The correct answers determined by NASA experts were not printed on the page; instead, the experimenter projected them on a monitor. Participants were asked to fill in each column and calculate the individual and group error scores, comparing the individual and group answers with the correct answers.
Procedure
The experiment comprised two main parts: individual decision-making and group decision-making. In the individual decision-making phase, the experimenter provided each participant with the response sheets and instructions on how to complete them (10 min). Participants were asked to rank the objects individually and to evaluate their confidence in each ranking and its degree of agreement with others’ answers (15 min). In the group decision-making phase, groups of five members were randomly predetermined from a class attendance list. The experimenter projected the group member lists with student codes and group positions in the classroom, and each participant was instructed to move to their assigned position (10 min). Where groups of four or five people could not be formed because of absent members, the students present were regrouped on the spot into four- or five-person groups. The experimenter then provided instructions on group decision-making. All groups were instructed to employ the method of group consensus described by Hall and Watson (1970). First, every group member had to agree on the ranking of each of the 15 survival items before it was entered as the group decision. Second, members had to avoid conflict-reducing techniques such as majority voting. Third, members had to avoid readily changing their opinions simply to avoid conflict and reach agreement. To prevent intergroup influences, the researchers ensured that participants interacted only within their groups, with no cross-talk between groups. After group decision-making, the experimenter projected the correct answers on the monitor. Finally, at the end of the NASA task, each group member individually answered questions about the group work.
Analyses
Data analyses were conducted using the R statistical programming language (version 3.3.2). First, error scores (indexing task performance) and the standard Borda and CW-Borda count aggregations were calculated. Seven analyses were then performed: comparisons among group decisions, standard Borda, CW-Borda, and individual decisions on the four NASA task indices; the predictability of confidence on error scores; the utilization of confidence for each group; and simulation of CW-Borda and standard Borda under varying group size and confidence-weighting sensitivity.
An error score is the sum of the absolute differences between the ranks assigned to the items by the NASA experts and those assigned by the participants. Lower error scores indicate better performance, that is, more adequate judgments. Error scores were calculated from the individual and group rank orders themselves, not from the values participants computed during the feedback phase. If any group member recorded rank orders that differed from the others’, the group’s majority rank orders were used for analysis.
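For concreteness, the error score can be sketched in R as follows (function and variable names are illustrative, not taken from the study’s code):

```r
# Error score: sum of absolute differences between a respondent's ranks
# and the NASA experts' ranks (lower = better).
error_score <- function(respondent_ranks, expert_ranks) {
  sum(abs(respondent_ranks - expert_ranks))
}

# Example with three items: ranks [2, 1, 3] against expert ranks [1, 2, 3]
error_score(c(2, 1, 3), c(1, 2, 3))  # |2-1| + |1-2| + |3-3| = 2
```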
Standard Borda and CW-Borda were calculated for each of the 25 groups in the present study by aggregating the rank orders of the four or five group members. In the basic Borda count method (Marden, 2014), weighted counts are assigned such that the first-choice item receives a count of N (where N is the number of items), the second-choice item receives a count of N − 1, and so on. These counts are summed across group members, and the item with the highest count is considered the “most preferred.” To weight the Borda count by confidence, the softmax function was used. The softmax function outputs a vector representing a probability distribution over a list of potential outcomes. It is defined as follows:
$$ W\left({y}_{xi}\right)=\frac{e^{k{y}_{xi}}}{\sum_{j=1}^{n}{e}^{k{y}_{xj}}}, \tag{1} $$
where W(yxi) is the weight for item x and group member i, yxi is the subjective confidence (on a 5-point scale) of member i’s ranking of item x, and k is a sensitivity parameter (an inverse temperature) regulating how strongly the weight varies with confidence. When k is low, differences in weight among group members shrink (k = 0 yields equal weights, i.e., no weighting); when k is high, the weights of higher-confidence members increase and the weights of lower-confidence members decrease. The denominator is the sum of the exponentiated inputs over the members of one group (n = group size). For example, for a three-person group with k = 1, the softmax function turns confidences (logits) [1, 3, 5] into weights (probabilities) [0.02, 0.12, 0.87], which sum to 1. In the present study, k = 1 was used as the default in the analyses of the four NASA task indices, and k was varied from 0 to 5 in steps of 0.5 in the analyses of the utilization of confidence and in the simulation. Note that standard Borda is the special case of CW-Borda with k = 0.
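A minimal R sketch of this weighting, reproducing the numerical example above (names are illustrative):

```r
# Softmax weighting of Eq. (1): converts one item's confidence ratings
# across group members into weights that sum to 1.
softmax_weights <- function(confidence, k = 1) {
  exp(k * confidence) / sum(exp(k * confidence))
}

round(softmax_weights(c(1, 3, 5), k = 1), 2)  # 0.02 0.12 0.87
round(softmax_weights(c(1, 3, 5), k = 0), 2)  # 0.33 0.33 0.33 (equal weights)
```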
Based on the above, CW-Borda count aggregation proceeded as follows. First, the softmax function computed each group member’s weight for each item from the confidence ratings given in the individual task phase. Second, each rank-order score (i.e., first-ranked item = 15 points, second = 14 points, third = 13 points, etc.) was multiplied by the corresponding member’s weight. Third, these weighted scores were summed across members, completing the CW-Borda count aggregation (the item with the highest count was considered the “most preferred”).
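Under our reading of these three steps, the full aggregation can be sketched in R as follows (the matrix layout and names are our assumptions, not the study’s code):

```r
# CW-Borda count aggregation. `ranks` and `conf` are members-by-items
# matrices of rank orders and 5-point confidence ratings.
cw_borda <- function(ranks, conf, k = 1) {
  n_items <- ncol(ranks)
  scores  <- n_items + 1 - ranks   # first-ranked item = 15 points, etc.
  # Step 1: softmax weights across members, computed per item (column)
  weights <- apply(conf, 2, function(y) exp(k * y) / sum(exp(k * y)))
  # Steps 2-3: weight each member's scores, then sum across members
  counts  <- colSums(scores * weights)
  rank(-counts, ties.method = "min")  # highest count -> rank 1
}
```

With k = 0 the weights are equal, so the resulting ranking reduces to that of the standard Borda count.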
Comparison among group decisions, standard Borda, CW-Borda, and individual decisions based on the four NASA task indices
Decision adequacy
Error scores were calculated for group decisions, standard Borda, CW-Borda, and individual decisions as the difference from the correct rank order identified by the experts. For individual decisions, the error scores of the members of each group were averaged. A one-way repeated-measures analysis of variance (ANOVA) was conducted with condition (group decisions, standard Borda, CW-Borda, and individual decisions) as the independent variable and error score as the dependent variable. We investigated whether group decision-making, standard Borda, and CW-Borda outperformed individual decision-making, and whether group decision-making outperformed standard Borda and CW-Borda.
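Such an analysis can be sketched with the base aov function in R (the data frame and column names are hypothetical):

```r
# `d` is a long-format data frame with one error score per group and
# condition: columns group_id, condition, and error.
summary(aov(error ~ condition + Error(group_id / condition), data = d))
```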
Synergism
Two scores were computed, for weak and strong cognitive synergy, respectively. Weak synergy was calculated as the difference between the group’s performance and the mean of the individual scores within the group; strong synergy was calculated as the difference between the group’s performance and the score of the best-performing member of the group (Larson Jr., 2007). Two separate one-way repeated-measures ANOVAs were then conducted, with condition (group, standard Borda, and CW-Borda) as the independent variable and the weak and strong cognitive synergy scores, respectively, as the dependent variable.
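A minimal sketch of the two synergy scores in R (the sign convention, with positive values indicating that the group outperformed the baseline, is our assumption, since lower error scores are better):

```r
# `group_err`: a group's error score; `member_errs`: its members'
# individual error scores.
weak_synergy   <- function(group_err, member_errs) mean(member_errs) - group_err
strong_synergy <- function(group_err, member_errs) min(member_errs) - group_err
```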
Utilization of resources
We analyzed whether group decision-making utilized more pre-discussion resources than standard Borda and CW-Borda did. Specifically, we calculated the frequency with which group members’ pre-discussion decisions were actually adopted in group decision-making. First, the number of group members whose individual decision matched the group’s was counted for each item. For example, if four participants ranked an item as [1, 3, 5, 3] and the group answer was 3, the count would be 2. These counts were summed across items, and the sum was divided by the product of the number of group members and the number of items. This frequency rate was calculated for group decision-making, standard Borda, and CW-Borda. A one-way repeated-measures ANOVA was conducted to determine whether group decision-making used more pre-discussion resources than standard Borda and CW-Borda, with condition (group, standard Borda, and CW-Borda) as the independent variable and frequency of utilization of resources as the dependent variable. Next, two regression analyses were conducted to determine whether the utilization of resources predicted the error score, in terms of the difference between group decision-making and standard Borda and between group decision-making and CW-Borda (see the sketch after the Creativity analysis below). The independent variable was the difference in utilization of resources between the group and standard Borda (or CW-Borda); the dependent variable was the difference in error score between the group and standard Borda (or CW-Borda).
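The frequency rate itself can be sketched in R as follows (names are illustrative):

```r
# Proportion of member-item cells whose pre-discussion answer matches
# the group answer. `ranks`: members-by-items matrix; `group_ranks`:
# vector of the group's 15 answers.
utilization <- function(ranks, group_ranks) {
  matches <- sweep(ranks, 2, group_ranks, FUN = "==")
  sum(matches) / (nrow(ranks) * ncol(ranks))
}
```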
Creativity
Creativity was assessed by calculating the frequency with which correct rankings were produced by the group despite not being present in the group members’ pre-discussion decision resources. First, the number of group answers not present in the pre-discussion resources was counted; among these group answers, the number of items with an error score within 1 was counted. For example, if three items were ranked by participant A as [3, 1, 2] and by participant B as [3, 2, 1], compared with the correct answers of [1, 2, 3], the coded numbers would be [0, 1, 1] and the final count would be 2. The counts were then divided by the total number of items (i.e., 15). This frequency rate was calculated for group decision-making and CW-Borda. A one-way repeated-measures ANOVA was conducted to determine whether group decision-making produced more creative solutions than standard Borda and CW-Borda, with condition (group, standard Borda, and CW-Borda) as the independent variable and frequency of creativity as the dependent variable. Next, two regression analyses were conducted to determine whether creativity predicted the error score, in terms of the difference between group decision-making and standard Borda and between group decision-making and CW-Borda. The independent variable was the difference in creativity between the group and standard Borda (or CW-Borda); the dependent variable was the difference in error score between the group and standard Borda (or CW-Borda).
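The difference-score regressions take the same simple form for both the utilization-of-resources and creativity analyses; a hypothetical R sketch (column names are ours):

```r
# `diffs` holds one row per group: the group-minus-Borda differences in
# the predictor (utilization or creativity) and in the error score.
summary(lm(d_error ~ d_predictor, data = diffs))
```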
Effects of confidence weighting inherent in the group decisions
Predictability of confidence on error score
Hierarchical linear modeling (HLM) was conducted using Mplus version 7.31 (Muthén & Muthén, 1998–2012). The predictability of confidence on error scores for each item was examined using two models: a random intercept model, and a random intercept and slope model. The random intercept model is defined as follows:
$$ {\displaystyle \begin{array}{c}\mathrm{Level}\ 1:{error\ score}_{si}={\beta}_{0i}+{\beta}_{1i}{(confidence)}_{si}+{e}_{si},\\ {}\mathrm{Level}\ 2:{\beta}_{0i}={\gamma}_{00}+{\mu}_{0i},\\ {}{\beta}_{1i}={\gamma}_{10},\end{array}} \tag{2} $$
where, at level 1 (the within-item level), error scoresi is the error score of subject s on item i (the dependent variable), β0i is the mean error score of item i (the intercept), (confidence)si is the subjective confidence of subject s for item i, β1i is the slope representing the predictive effect of confidence, and esi is the residual, the deviation of subject s’s error score from the mean of item i. At level 2 (the between-item level), β0i is the mean error score of item i, γ00 is the mean error score across all items (the intercept), and μ0i is the deviation of item i’s mean error score from that overall mean. In the random intercept model, γ00 and γ10 each take a single value (an intercept and a regression coefficient, respectively), termed fixed effects. In contrast, esi at level 1 and μ0i at level 2 take values that vary across level-1 and level-2 units, respectively, termed random effects (i.e., a regression residual or a variance component).
Building on the random intercept model, a random slope was added to form the random intercept and slope model, defined as follows:
$$ {\displaystyle \begin{array}{c}\mathrm{Level}\ 1:{error\ score}_{si}={\beta}_{0i}+{\beta}_{1i}{(confidence)}_{si}+{e}_{si},\\ {}\mathrm{Level}\ 2:{\beta}_{0i}={\gamma}_{00}+{\mu}_{0i},\\ {}{\beta}_{1i}={\gamma}_{10}+{\mu}_{1i},\end{array}} \tag{3} $$
where, at level 2, β0i and β1i are estimated with both fixed and random effects: β1i is the slope (the effect of confidence) for item i, γ10 is the mean slope across items (the fixed effect), and μ1i is the deviation of item i’s slope from that mean.
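As an illustration, both models can be sketched with the R package lme4 rather than the Mplus software used in the study (the data frame and column names are hypothetical):

```r
library(lme4)
# Random intercept model, Eq. (2): intercept varies across items
m1 <- lmer(error ~ confidence + (1 | item), data = d)
# Random intercept and slope model, Eq. (3): slope also varies across items
m2 <- lmer(error ~ confidence + (1 + confidence | item), data = d)
anova(m1, m2)  # does allowing a random slope improve fit?
```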
Utilization of confidence for group decision-making
The sensitivity of confidence weighting that best fit the group answers was estimated for each group as follows. First, error scores of CW-Borda were calculated for each group, with the sensitivity parameter k in the softmax function varied from 0 to 5 in steps of 0.5. Second, difference scores were calculated for each group by subtracting the error score of the group answer from that of CW-Borda; the smaller the difference score, the closer CW-Borda is to the group decision. Finally, the value of k yielding the smallest difference between the error score of the group answer and that of CW-Borda was taken for each group.
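A sketch of this grid search in R, reusing the error_score and cw_borda functions above (whether the minimized difference is signed or absolute is not specified; we take the absolute value here):

```r
# Best-fitting sensitivity k for one group.
fit_k <- function(ranks, conf, group_err, expert_ranks) {
  ks    <- seq(0, 5, by = 0.5)
  diffs <- sapply(ks, function(k) {
    error_score(cw_borda(ranks, conf, k), expert_ranks) - group_err
  })
  ks[which.min(abs(diffs))]
}
```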
Simulation of CW-Borda and standard Borda count aggregations according to change in group size and sensitivity of confidence weighting
Finally, CW-Borda and standard Borda count aggregations were simulated with increasing group sizes and sensitivities of confidence weighting to investigate whether the effect of confidence depended on group size and weighting sensitivity. The simulation used the data on individual rank orders and their confidence levels. Group size was varied from 1 to 50, and the sensitivity of confidence weighting from 0 to 5 in steps of 0.5. For each group size, individual rank orders were randomly selected, and the CW-Borda and standard Borda count aggregations were computed. This procedure was repeated 10,000 times, and a mean error score was computed for each combination of group size and sensitivity of confidence weighting.
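A sketch of one cell of this simulation in R, reusing the functions above (whether members were sampled with or without replacement is not stated; we assume with replacement):

```r
# Mean CW-Borda error for a given group size and sensitivity k.
# `all_ranks` / `all_conf`: participants-by-items matrices of all
# individual rank orders and confidence ratings.
simulate_cell <- function(all_ranks, all_conf, expert_ranks,
                          size, k, reps = 10000) {
  errs <- replicate(reps, {
    idx <- sample(nrow(all_ranks), size, replace = TRUE)
    error_score(cw_borda(all_ranks[idx, , drop = FALSE],
                         all_conf[idx, , drop = FALSE], k),
                expert_ranks)
  })
  mean(errs)
}
```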