I just finished reading Superforecasting: The Art and Science of Prediction by Philip E. Tetlock and Dan Gardner. It’s a pretty good book about the fallacies that make people, including experts, generally very bad at forecasting world events, and about the methods the authors used to generate vastly superior predictions from groups of volunteers.
The book begins by summarizing the Expert Political Judgment (EPJ) study, which ran from 1984 to 2003 and asked 284 experts in various fields to make roughly 28,000 predictions about the future. The study’s headline-grabbing conclusion was that “the average expert was roughly as accurate as a dart-throwing chimp”. The more interesting conclusion was that some experts were more accurate than random chance, which implied that forecasting is not necessarily a pointless endeavor.
The book goes on to point out that the problem with public experts is that nobody really keeps track of how accurate their predictions are. There are many reasons for this, most having to do with the predictions being worded in ways that make them untestable:
- Many predictions use words like “may” which don’t specify the actual predicted probability. As in, anything “may” happen, but that doesn’t say how likely it is. More generally, it turns out that words like “likely”, “quite”, “significant”, and “risk of” mean very different concrete numbers to different people, so it’s hard to determine if predictions using them came true or not.
- Many predictions don’t specify a time frame. This makes it hard to say when enough time has passed without them happening for them to be declared wrong. For instance, the statement “quantitative easing will cause runaway inflation” doesn’t specify a “by” date, so it’s impossible to say if it was wrong or if the runaway inflation just hasn’t happened yet.
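Once a forecast does come with an explicit probability and a deadline, it becomes scoreable. The book discusses grading forecasts with the Brier score, the mean squared difference between the stated probability and the 0-or-1 outcome. A minimal sketch (the example numbers are made up):

```python
# Brier score: mean squared difference between forecast probability and outcome.
# Lower is better: 0.0 is perfect, and always saying 50% scores 0.25.

def brier_score(forecasts, outcomes):
    """forecasts: probabilities in [0, 1]; outcomes: 1 if the event happened, else 0."""
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)

# "X may happen" can't be scored, but "70% chance of X by 31 December" can:
print(round(brier_score([0.7, 0.2, 0.9], [1, 0, 1]), 3))  # 0.047
```

Note that the score only becomes meaningful over many forecasts, which is exactly why the vague, deadline-free predictions described above escape accountability.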
Additionally, since many experts build their careers on the accuracy of their forecasts, or rather on their ability to sell their predictions and narratives about what is going on in the world, they have little incentive to make testable predictions. Mind you, this needn’t be malicious behavior; it could just be various cognitive biases conspiring to paint a rosier picture in experts’ minds about their past performance than what actually happened.
The book then describes the Good Judgment Project (GJP), a set of experiments into determining and improving the accuracy of aggregate forecasts made by ordinary people. For the initial setup, some 3,800 volunteers were gathered and asked to make predictions about carefully worded scenarios (e.g. “Will Serbia be officially granted European Union candidacy by 31 December 2011?”). Some of these volunteers worked alone, and some were organized into small randomly composed teams. Some received training, and some did not. The predictions were then averaged together and extremized to be used as part of IARPA’s forecasting tournament in the early 2010s. The GJP won the tournament in 2014 by a wide margin.
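The book doesn’t spell out the exact transform the GJP used, but a common way to extremize an averaged forecast is to push it away from 0.5, on the reasoning that independent forecasters each hold only part of the available evidence, so the pooled probability should be more confident than the raw mean. A sketch under that assumption (the exponent is illustrative):

```python
def extremize(p, a=2.5):
    """Push an aggregate probability p away from 0.5.
    a > 1 extremizes; a = 1 leaves p unchanged. The exponent here is illustrative."""
    return p ** a / (p ** a + (1 - p) ** a)

def aggregate(forecasts, a=2.5):
    """Average the individual forecasts, then extremize the mean."""
    p = sum(forecasts) / len(forecasts)
    return extremize(p, a)

# Five forecasters average to 0.7; extremizing pushes the pooled answer closer to 0.9.
print(aggregate([0.6, 0.65, 0.7, 0.75, 0.8]))
```

The intuition: if five people independently arrive at 70%, their combined evidence arguably justifies more confidence than any one of them had alone.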
One interesting result of the GJP was that some people were consistently better at forecasting than their peers. Additionally, what made these “superforecasters” so super was not any obvious single trait. Many were smarter than average by conventional metrics, but they weren’t geniuses. Many were numerate, having backgrounds in math, science, or computer programming, but they tended not to rely on complex numerical models. Many followed the news closely, but not as a full-time occupation. In fact, it seemed that “the critical ingredient was the style of thinking”.
This different style of thinking is summarized on the Ten Commandments for Aspiring Superforecasters page. I think the most important bits are:
- Superforecasters broke large questions into smaller ones and then aggregated the answers.
- Superforecasters considered multiple scenarios and aggregated their probabilities. In particular, they were “actively open-minded” in seeking new scenarios that could factor into their forecasts.
- Superforecasters considered the “outside” view of scenarios before adjusting the probabilities by the “inside” view. That is, they first considered the likelihood of a scenario as an example of a larger set of scenarios before adjusting that by how likely and compelling the scenario was in itself.
- Superforecasters tended to use finer degrees of probability in their predictions. For instance, they would specify their forecasts in 5% increments, whereas regular forecasters would use increments of 10%, and regular people might just use increments of 50% (i.e. “will definitely not happen”, “might happen”, “will definitely happen”).
- Superforecasters tended to conduct post-mortems on both their failed and successful forecasts to see how they could improve in the future.
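The outside-then-inside ordering above can be read as a Bayesian update: start from a reference-class base rate, then shift it by how much more likely the inside-view evidence would be if the scenario were going to happen. A minimal sketch with made-up numbers (the base rate and likelihoods are purely illustrative):

```python
# Outside view first: a base rate from a reference class of similar cases,
# then a Bayesian update for inside-view evidence. All numbers are made up.

def bayes_update(prior, likelihood_if_true, likelihood_if_false):
    """Posterior probability after observing evidence with the given likelihoods."""
    numerator = prior * likelihood_if_true
    return numerator / (numerator + (1 - prior) * likelihood_if_false)

# Outside view: suppose 20% of comparable candidacy bids succeed within a year.
base_rate = 0.20

# Inside view: supportive news, judged 3x as likely if the bid is going to succeed.
posterior = bayes_update(base_rate, likelihood_if_true=0.6, likelihood_if_false=0.2)
print(round(posterior, 2))  # 0.43
```

The point of the ordering is anchoring: starting from the base rate keeps a vivid inside-view story from dominating, since the evidence can only move the estimate in proportion to how diagnostic it actually is.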
Another interesting result was that organizing superforecasters into teams with a shared communication channel, such as a forum, improved their individual forecasts.
The book concludes by noting that, although superforecasters displayed intellectual humility rather than confidence, and would refuse to make strong 100% predictions about the future, this could paradoxically be a boon to decision making in organizations. The examples given are the German army in World War II and the American army in the Gulf War, both of which functioned by passing goals, rather than strict orders, down the chain of command. The lower levels were then expected to achieve the given goals in the best way they saw available.
The book feels fluffy and not very tight to me. I think this is because it was meant to serve two different purposes:
- to report on the findings of the GJP, and
- to convince the reader that similar forecasting methods should be used widely.
I think the conclusion of the report-on-findings part is best summarized on the commandments page. That contains the advice on improving the accuracy of one’s forecasts. The book also reports on the specific amount of improvement observed during the IARPA tournament, but it seems hard to say whether similar improvements would manifest under different circumstances.
The convince-the-reader part is a collection of anecdotes on how various experts failed to predict certain world events and thought experiments about how those scenarios could have unfolded differently. These stories, interspersed throughout the book, are well written and show where the authors are coming from. That said, they are still anecdotes and lengthen the text considerably.
Overall, I would say the book is definitely worth a read, especially if you are interested in discussions about fallacies and cognitive biases à la Thinking, Fast and Slow.