Mathematically aggregating experts’ predictions of possible futures

Keywords: Forecasting; Data Aggregation; Judgment; Expert Testimony; Group Processes; Public Opinion; Bayes Theorem; Models, Statistical; Statistics and Probability; Social Statistics; Psychology; Awareness; Humans; Students; Research Personnel
DOI: 10.1371/journal.pone.0256919
Publication Date: 2021-09-02T17:52:51Z
ABSTRACT
Structured protocols offer a transparent and systematic way to elicit and combine (aggregate) probabilistic predictions from multiple experts. These judgments can be aggregated behaviourally or mathematically to derive a final group prediction. Mathematical rules (e.g., weighted linear combinations of judgments) provide an objective approach to aggregation. The quality of this aggregation can be defined in terms of accuracy, calibration and informativeness. These measures can be used to compare different aggregation approaches and help decide which aggregation produces the “best” final prediction. When experts’ performance can be scored on similar questions ahead of time, these scores can be translated into performance-based weights, and a performance-based weighted aggregation can then be used. When this is not possible, several other aggregation methods, informed by measurable proxies for good performance, can be formulated and compared. Here, we develop a suite of aggregation methods, informed by previous experience and the available literature. We differentially weight our experts’ estimates by measures of reasoning, engagement, openness to changing their mind, informativeness, prior knowledge, and the extremity, asymmetry or granularity of estimates. Next, we investigate the relative performance of these aggregation methods using three datasets. The main goal of this research is to explore how measures of knowledge and behaviour of individuals can be leveraged to produce a better-performing combined group judgment. Although the accuracy, calibration, and informativeness of the majority of methods are very similar, a couple of the aggregation methods consistently distinguish themselves as among the best or worst. Moreover, the majority of methods outperform the usual benchmarks provided by the simple average or the median of estimates.
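To make the idea of a weighted linear combination concrete, the following is a minimal sketch in Python, not the authors' implementation. The function names, the example probabilities and the weights are all hypothetical, and the Brier score is used here only as one common accuracy measure; the abstract does not say which scoring rules the paper applies.

```python
from statistics import mean, median


def weighted_linear_pool(probs, weights):
    """Weighted linear combination of probability judgments.

    Weights are normalised to sum to 1, so equal weights reproduce
    the simple average benchmark.
    """
    if len(probs) != len(weights):
        raise ValueError("probs and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(w * p for w, p in zip(weights, probs)) / total


def brier_score(forecast, outcome):
    """Squared error of a probability forecast for a binary outcome (0 or 1).

    Lower is better. Assumed here as an illustrative accuracy measure.
    """
    return (forecast - outcome) ** 2


# Hypothetical data: three experts' probabilities that an event occurs,
# and hypothetical weights (e.g. derived from some proxy for performance).
probs = [0.9, 0.6, 0.3]
weights = [2.0, 1.0, 1.0]

pooled = weighted_linear_pool(probs, weights)
benchmark_mean = mean(probs)
benchmark_median = median(probs)

# Compare the aggregates, supposing the event did occur (outcome = 1).
outcome = 1
for name, value in [("weighted", pooled),
                    ("mean", benchmark_mean),
                    ("median", benchmark_median)]:
    print(f"{name}: forecast={value:.3f}, Brier={brier_score(value, outcome):.4f}")
```

In this toy case the weighted aggregate scores better than either benchmark only because the hypothetical weights happen to favour the expert closest to the outcome; whether a given weighting scheme helps in practice is exactly the empirical question the abstract describes.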