Why are some experts good at making predictions when others fare no better than chance?


An Optimistic Skeptic

Introduces Tetlock’s research. Often misinterpreted as ‘experts can’t forecast’ but actually more an investigation into why some experts can while others can’t.

Accurate and well-calibrated forecasting is important. Will Iraq deploy WMDs? Will bailing out banks improve unemployment rates? Should I invest money in the stock market? We rarely score experts in this field - when we do, they often perform no better than chance.

Contrast Laplace’s demon vs Lorenz’s butterflies - clearly some aspects of some systems are inherently unpredictable over time. Just as clearly, most aspects of daily life are very predictable. We are rarely surprised by minute-to-minute events.

Question is how close to the theoretical limits do we get, and how can we do better.

Equally interesting, how well calibrated is our confidence ie do we know when our predictions are good?

Optimistic skeptic - perfect forecasting is not possible, but we can do much better than we do right now.

Tetlock ran the Good Judgment Project. >20k volunteers making predictions through website. Demolished everyone else in IARPA competition, including the intelligence agencies.

This is the reason I have high confidence in the main ideas of this book - as well as seeming truthy and being based on research, they have been tested on difficult problems with a wide variety of people and produced excellent results.

Point of the book is that foresight is a real, measurable skill. It can be taught, and reasonable improvements can be made with fairly minimal investment. Only real problem is overcoming human psychology - people are overconfident in their own predictions and so averse to being wrong that they will sabotage their own measurements by retroactively reinterpreting predictions.

Tetlock makes a prediction - the best results will be achieved by blending computer-based forecasting and human insight. Neither will supplant the other.

Illusions of Knowledge

Focuses on human biases that make discovering knowledge hard.

Until the mid-20th century, the field of medicine was full of confident experts who had years of expertise and anecdotal evidence that their treatments worked. They turned out to be mostly wrong and, on average, more likely to harm than help.

Rise of evidence-based medicine in the 50s, pushed heavily by Archie Cochrane. Heavily opposed by doctors who saw it as a challenge to their expertise. They felt like they were already experts and didn’t need any external validation / criticism, even though evidence showed that many of their treatments were useless or worse. Especially resistant to trials, which they saw as unethical when they already ‘knew’ what the most effective treatment was.

Similar stories in politics even today - vicious debate over effectiveness of various policies but with little validity. Humans tend to feel like they know what they are talking about.

Suffering from the illusion of knowledge.

This is going to be a common theme through all of these notes - the importance of escaping naive realism / what-you-see-is-all-there-is. Feeling right is not the same as actually being right, feeling confident does not correlate well with successful predictions, the map is not the territory etc.

Discusses System 1 vs System 2. Covered better in other books, but in short: System 1 is unconscious, fast, effortless, automatic and associative where System 2 is conscious, slow, effortful, sequential and logical. System 1 is the default mode of thinking and runs most of our life but has a number of systematic heuristics/biases/flaws. Since it’s below conscious introspection we can’t notice these directly, only by observing large numbers of mistakes. Activating System 2 takes conscious effort and is tiring, so is only activated when we realize that we need it.

Similarly, because the decisions made by System 1 are below conscious perception we aren’t aware of the reasons for them. Various ingenious experiments demonstrate that, rather than being aware of this lack of introspection, people will generate theories from scratch and believe them to be the original reason ie post-hoc rationalization of System 1 activity.

Main source of naive realism seems to be believing uncritically the data passed up to conscious attention by System 1. Explains why cognitive reflection, mindfulness etc are useful. Similarly, there needs to be some word for the corresponding illusion where people believe that they have full access to their own thought processes when in fact that perception is generated post-hoc.

The Cognitive Reflection Test is a series of questions that test how good the testee is at activating System 2 at appropriate times ie noticing when System 1 is likely to make mistakes.

Same section connects the urge to rationalize post-hoc with overconfidence in general, arguing that both are related to confirmation bias. We uncritically believe the first theory that fits and stop looking for alternative explanations.

Remember: confidence == coherent story, not confidence == accuracy.

Bait and switch - System 1 often replaces hard questions with easy questions without the thinker noticing. Many biases fit this pattern eg availability bias replaces ‘how likely is X’ with ‘how easily can I imagine X / how many instances of X can I remember’ - reasonable heuristic in many environments but not always correct.

Tetlock coins ‘tip-of-your-nose perspective’ for the subjective reality created by System 1.

Note the message is not ‘System 2 is better than System 1’. System 1 is fast, automatic and effortless. Makes it very good at pattern matching. The question is when should we trust it? Experiments show that in some fields the intuitive hunches of experts are eerily prescient, recognizing patterns in data that they aren’t even able to process consciously. But in other fields the intuitive hunches of experts are no better than random. In both cases the subjective feeling of confident knowledge is the same. Want to figure out what makes the difference, and whether it is something we can train.

Keeping score

Discusses the importance of making concrete, testable predictions and checking the results.

Most predictions are so unspecific that it is hard to judge after the fact whether or not they were correct. In some cases this is deliberate - leaves pundits wiggle room to avoid embarrassment.

Being wrong is unpleasant. Most people, consciously or otherwise, optimize for avoiding appearing wrong. Better to optimize for avoiding being wrong. In the latter frame of mind, discovering that you are wrong is welcome - you get the chance to be less wrong.

What makes a good forecast?

It should be completely unambiguous ie it must be easy for observers to agree later on whether or not you were wrong. Ambiguity leaves room to fool yourself. Some prediction sites require each prediction to have a referee, which seems like a useful way to keep yourself honest.

It should have a concrete finish date. Predictions like ‘unemployment will go down after stimulus’ leave too much room to spend years saying ‘any minute now’.

It should be probabilistic. For predictions to be used as the basis for decisions it is important to know what level of confidence is expressed, and to calibrate that confidence it is necessary to practice.

It should use specific numbers for probabilities. Tetlock gives various anecdotes where terrible mistakes were made because different people did not assign the same meaning to phrases like ‘There is a serious possibility that…’, with interpretations ranging from 20% to 80% chance in one case. Also, non-specific language again creates room to wiggle out of wrong predictions.

You should make lots of predictions. Given partial knowledge and probabilistic events, there is no way to judge if your ‘70% chance of rain’ was misinformed or just unlucky. With large numbers of similar predictions we can begin to judge accuracy and calibration.

Well-calibrated: predicted probability == observed probability, across all ranges from 0% to 100%. Overconfidence: prediction probability > observed probability. Possible to be eg well calibrated between 20% and 80% but poorly calibrated at extremes, or vice versa.
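A minimal sketch of what checking calibration could look like, assuming forecasts are probabilities for yes/no events and outcomes are recorded as 0 or 1. The function name, the bin width and the data are all made up for illustration, not taken from the book.

```python
from collections import defaultdict

def calibration_table(forecasts, outcomes, bin_width=0.1):
    """Group forecasts into probability bins and compare the mean forecast
    in each bin with how often the event actually happened."""
    n_bins = round(1 / bin_width)
    bins = defaultdict(list)
    for p, happened in zip(forecasts, outcomes):
        # Clamp so that p == 1.0 lands in the top bin.
        index = min(int(p * n_bins), n_bins - 1)
        bins[index].append((p, happened))
    table = []
    for index in sorted(bins):
        pairs = bins[index]
        mean_forecast = sum(p for p, _ in pairs) / len(pairs)
        observed = sum(h for _, h in pairs) / len(pairs)
        table.append((mean_forecast, observed, len(pairs)))
    return table

# Made-up record: ten 70% forecasts where the event happened 7 times
# (well calibrated) and ten 90% forecasts where it happened only 6 times
# (overconfident).
forecasts = [0.7] * 10 + [0.9] * 10
outcomes = [1] * 7 + [0] * 3 + [1] * 6 + [0] * 4

for mean_forecast, observed, count in calibration_table(forecasts, outcomes):
    print(f"forecast {mean_forecast:.2f}  observed {observed:.2f}  n={count}")
```

With only ten forecasts per bin the observed rates are very noisy, which is the practical reason for making lots of predictions.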

Turns out later that it’s sometimes good enough for the shared information to be high. The GJP actually re-calibrated the combined judgments of its participants to achieve better results. If you find your System 1 is systematically overconfident in some domain, System 2 can remember to assign lower confidence to those predictions.

Resolution: how extreme your predicted probabilities are. The reason we value this is that it’s easy to be well-calibrated by lumping many classes of events together and taking the average rate. A weather forecaster who makes predictions using the seasonal average will be well-calibrated over time but not as useful as one who manages the same calibration while taking current conditions into account. The latter can achieve better resolution.

Tetlock uses the Brier score to score predictions. Lower is better. This is a scoring rule with the useful property that the best expected score is achieved by reporting the actual probability of the event (rather than eg deliberately extremizing your prediction to score better) so the optimal strategy is to give honest predictions instead of gaming the scoring.
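A rough sketch of the Brier score and of the 'honesty is optimal' property, assuming the common single-outcome form (mean squared error between forecast and 0/1 outcome, range 0 to 1, lower is better). For binary events the original two-category form, which ranges from 0 to 2, is exactly twice this. The 70% 'true' probability is a made-up number.

```python
def brier_score(forecasts, outcomes):
    """Mean squared difference between forecast probability and 0/1 outcome."""
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)

def expected_brier(reported, true_probability):
    """Expected score for reporting `reported` when the event really
    happens with probability `true_probability`."""
    return (true_probability * (1 - reported) ** 2
            + (1 - true_probability) * reported ** 2)

true_probability = 0.7  # hypothetical
candidates = [i / 100 for i in range(101)]
best_report = min(candidates, key=lambda r: expected_brier(r, true_probability))

print(best_report)                            # the honest report wins
print(expected_brier(0.7, true_probability))  # honest
print(expected_brier(1.0, true_probability))  # extremized, scores worse
```

Algebraically the expected score is p(1 - p) + (r - p)^2 for true probability p and report r, so any deviation from the honest report adds a penalty.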

I wonder if a better measure would be shared information between event and prediction. This takes into account calibration and resolution, and also the probabilities of different event classes eg we could look at shared information over forecasts in general, or focus on shared information given that the forecaster predicts tornadoes. Is this the same as logarithmic scoring?

Also have to compare scores to other forecasters on the same problems. Some predictions are easier to make than others, so the only fair comparison is to use the same problems across the board. Also compare scores to simple strategies like ‘same weather as yesterday’ or ‘same weather as this day last year’ to see how much value the forecaster is actually adding.
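One way to express 'how much value is the forecaster adding' is a skill score relative to a baseline, often called a Brier skill score. A small sketch with an invented rain record and a 'same weather as yesterday' baseline. All the numbers are made up and the choice of 0.5 for the first day of the baseline is my own assumption.

```python
def brier_score(forecasts, outcomes):
    """Mean squared difference between forecast probability and 0/1 outcome."""
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)

def skill_score(forecast_score, baseline_score):
    """1.0 is perfect, 0.0 is no better than the baseline, negative is worse."""
    return 1 - forecast_score / baseline_score

outcomes = [1, 0, 0, 1, 0, 0, 0, 1]  # made-up record, 1 = rain
forecaster = [0.8, 0.2, 0.1, 0.7, 0.3, 0.2, 0.1, 0.6]

# Baseline: predict yesterday's weather with certainty.
# There is no yesterday for the first day, so use 0.5 there (assumption).
persistence = [0.5] + [float(o) for o in outcomes[:-1]]

forecaster_score = brier_score(forecaster, outcomes)
baseline_score = brier_score(persistence, outcomes)
print(forecaster_score, baseline_score)
print(skill_score(forecaster_score, baseline_score))
```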

Various experiments run with experts in different fields. Two distinct groups emerge. One group does no better than random chance (and sometimes worse). Other group does slightly better than default strategies like ‘predict no change’. Not amazing results, but clearly some foresight exists.

Notable difference between the groups: those with no foresight were dogmatic and convinced of a few Big Ideas whereas those with some foresight were more pragmatic, drew on a range of conceptual tools and weighed many perspectives. Named these groups hedgehogs and foxes after Archilochus “the fox knows many things but the hedgehog knows one big thing”. Foxes won on both calibration and resolution ie they weren’t just playing it safe, they were better at identifying when they had reason to be confident.

The explanation given is that filtering everything through a strongly believed Big Idea exacerbates confirmation bias, increasing confidence while at the same time filtering out important evidence.

It’s not clear in this book how rigorously this was established or how well-replicated it is. TODO find out…

Hedgehogs are much more popular across the board. Media likes bold predictions and simple theories. Politicians are frequently quoted as wanting to hear confidence. People dislike uncertainty.

This theme will be repeated later in unrelated books - uncertainty is known to be a huge stressor. There are suggestions on ways to make it more tolerable. Intended for managing stress, but would also be useful for making oneself less likely to embrace false confidence to escape uncertainty.

Wisdom of the crowds - average group predictions are often more accurate than individual predictions. Tetlock stresses that there is no magic here - it is simply a result of combining the perspective of many people to make an artificial fox. How well this works depends on how much accurate information there is to aggregate. If the crowd has no information then the aggregate result has no information.

Advice is to behave like a crowd yourself. Aggregate information from as many different sources and perspectives as you can.

No guidance on how to weigh different perspectives. Nor is it clear whether the exercise by itself is enough to improve results or whether the habit is itself a symptom of a particular kind of mind. Later shows that training in prediction in general is effective, which is weak evidence that training in foxiness would be effective.


Aims to demonstrate that there is some element of skill to forecasting, not just luck.

Notes a common bait and switch - replace “was this a good prediction” with “did this have a good outcome”. Pernicious in politics where even if the speaker doesn’t believe it, it will still convince the audience and be a useful attack.

IARPA competition was started to improve decision making in US intelligence agencies. Want to know what methods actually produce results, as opposed to just seeming truthy. Tetlock compares this to the rise of evidence-based medicine ie applying the experimental method to intelligence work. What other fields could benefit from similar approaches?

Competition aimed to focus on problems which were hard but possible. Acknowledged clear theoretical limits to what humans can predict and focused on improving in the areas where we can make predictions in theory but fail in practice.

Good Judgment Project took the combined predictions of thousands of volunteers. Averaged the predictions but weighted successful forecasters more heavily. Extremized the result, because this produced better results in practice - Tetlock’s plausible explanation is that combining disparate information should increase confidence, not just produce the average confidence.
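A sketch of the weight-then-extremize idea. These notes don't record which transform or parameters the GJP actually used, so the power-law form below, the exponent `a` and the weights are all assumptions chosen for illustration.

```python
def weighted_mean(forecasts, weights):
    """Average the forecasts, counting better forecasters more heavily."""
    return sum(p * w for p, w in zip(forecasts, weights)) / sum(weights)

def extremize(p, a):
    """Push p away from 0.5 (assumed transform, not necessarily GJP's).
    a > 1 extremizes, a == 1 leaves p unchanged."""
    return p ** a / (p ** a + (1 - p) ** a)

forecasts = [0.6, 0.7, 0.8]  # hypothetical individual forecasts
weights = [1.0, 1.0, 2.0]    # hypothetical track-record weights

pooled = weighted_mean(forecasts, weights)
print(pooled)                  # weighted average, about 0.725
print(extremize(pooled, 2.0))  # pushed further from 0.5
```

The intuition for extremizing: if three people each reach 70% from different evidence, the combined evidence justifies more than 70%. How much more depends on how independent their information is, which is why the exponent has to be tuned empirically.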

The Good Judgment Project beat everyone, including the intelligence agencies who had access to classified info.

Even more amazing, some of the individual members of GJP alone beat the intelligence agencies.

Tetlock studied 58 of the top forecasters, who beat the average Brier score by around 60%. Many of them did not regress to the mean over the later years of competition - in fact, on average their lead over the other contestants increased. To test if this effect was caused by singling out the superforecasters and putting them on special teams, they repeated the experiment with later generations of forecasters and observed the same effect. I’m not clear on how that actually rules out the effect of special teams, since they put the later generations on special teams too.

Overall the correlation between individual scores for all forecasters from year to year was around 0.65, which is sufficiently high to indicate that there is some real skill involved and not just plain luck.


Argues that raw intelligence is not the main contributor to success in forecasting.

Note attributes common to the superforecasters: intelligent, wide range of general knowledge and modern affairs, often highly numerate (even though most used little math in their process). Hypothesis - numeracy is correlated with placing trust in data over intuition, and with repeated exposure to being clearly and inescapably wrong.

Intelligence - on both fluid intelligence and crystallized intelligence (aka knowledge) tests forecasters score higher than ~70% of the general population, and superforecasters higher than ~80%. Intelligent, but not unusual genius. Tetlock postulates declining returns on both intelligence and knowledge, explaining why volunteers were able to beat professional intelligence analysts. Would have liked to see the graph of intelligence/knowledge vs score to see if it does indeed level off at the top.

Also common themes to their problem solving process. Fermi estimate -> outside view -> inside view -> synthesis -> loop.

Fermi estimates - rather than just guess, break a problem down into its parts until you reach numbers that are easier to guess accurately. Prevents bait-and-switch by concentrating focus onto the structure of the problem. In probabilistic questions, enforces the correct use of the laws of probability which System 1 systematically misuses.

Gives a long example of using Fermi estimates to predict whether polonium would be found when testing Yasser Arafat’s body, showing how it catches several bait-and-switch moments eg replacing “will polonium be found” with “would Israel poison him”.
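A sketch of how a Fermi-style breakdown enforces the laws of probability: split the question into mutually exclusive scenarios, guess each piece separately, then recombine with the law of total probability. The scenarios and every number below are invented for illustration and are not the figures from the book's example.

```python
def total_probability(branches):
    """branches is a list of (P(scenario), P(event | scenario)) pairs.
    The scenarios must be mutually exclusive and cover every case."""
    total_prior = sum(prior for prior, _ in branches)
    if abs(total_prior - 1.0) > 1e-9:
        raise ValueError("scenario probabilities must sum to 1")
    return sum(prior * likelihood for prior, likelihood in branches)

# Hypothetical breakdown of "will the test come back positive?"
branches = [
    (0.2, 0.5),   # the underlying thing happened AND the test can still detect it
    (0.8, 0.05),  # it did not happen, but the test is positive anyway (error, contamination)
]

print(total_probability(branches))  # 0.2*0.5 + 0.8*0.05, about 0.14
```

Writing the branches down makes the bait-and-switch visible: the first number in each pair answers 'did it happen', which is a different question from 'will the test find it'.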

Outside view - aka reference-class forecasting - before focusing on the details of the current problem, first obtain a base estimate by averaging over many similar problems. Thought to counter anchoring - a (poorly replicated) bias where guesses can be heavily influenced by completely unrelated numbers. Effective at reminding oneself to account for priors / base-rate. Also perhaps a counter to story-oriented thinking / inaccurate stereotyping?

Gives an example with a long description of a family followed by asking to predict whether they have a pet. Common responses are led astray by the ethnicity of the family and the number of the children. Superforecasters instead start by finding out what percentage of families in the area have pets and then adjusting from that base estimate if necessary.

There are inherent difficulties in how you pick the reference class eg should our reference class be American families, or Italian-American families, or Italian-American families within this neighborhood, or Italian-American families with this income bracket etc. It’s very easy to construct problems where two seemingly valid reference classes for the same event produce wildly different base rates. I currently have no good models/techniques for choosing reference classes.

Inside view. Finally we use the specific details of the problem to adjust the base rate. Have to be sure not to double-count features we already used for our reference class eg if the reference class was families-with-children, can’t reuse the children to adjust the base rate.

Both the outside and inside view get applied to the individual components of the Fermi estimate. Can also use outside view of the problem as a whole to sanity-check results.

Synthesis - finally, try to find other perspectives. This is a counter to confirmation bias and what-you-see-is-all-there-is - trying to account for unknown unknowns. Get fresh perspectives from other people if possible, but even deliberately trying to generate alternate perspectives yourself is useful. Research on creativity and out-of-the-box thinking may be useful here for generating fresh ideas.

Loop - keep going until you are done. No mention of how superforecasters judge when they have generated enough perspectives, viewed enough data etc. Typical overconfidence would lead to finishing too early.

No surprise - superforecasters score highly on the Cognitive Reflection Test.

Notes that superforecasters score highly on tests of need for cognition, openness to experience and active open-mindedness. No magic here. Good tools for countering known biases but using them does not guarantee improved predictions. You have to actively want to find the truth more than you want to avoid looking/feeling wrong.

Bumper sticker summary:

For superforecasters, beliefs are hypotheses to be tested, not treasures to be guarded.


Discusses probabilistic reasoning.

Various examples where media and the press portray confidence as heroic and probabilistic estimates as hedging/dissembling.

Some plausible evolutionary psychology just-so stories about why this happens: Probability of risk requires being on alert which is physically draining/damaging. Better to just switch off at low values of confidence. Somewhat aligns with the fact that uncertainty is known to cause stress responses. Alternative just-so story seen elsewhere is that false confidence is useful for convincing other highly-evolved lie detectors - if you don’t even know you are lying about your confidence then you won’t give off subtle signals for others to catch you out.

Easy to demonstrate that people suck at probability. Bad at updating priors. Bad at conditional probabilities. Confuse probability with scores - if >50% it should happen. A flaw I’ve noticed in a lot of otherwise numerate people is confusing the map with the territory - believing that probabilities exist in the world itself rather than being a property of one’s partial knowledge of the world. This is the underlying flaw exposed by the Monty Hall problem.
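For reference, an exact enumeration of the Monty Hall problem. This assumes the usual statement of the puzzle (three doors, the host always opens a goat door and picks uniformly when he has two options), which is not spelled out in these notes.

```python
from fractions import Fraction
from itertools import product

def monty_hall_win_probability(switch):
    """Exact probability of winning the car under the standard rules."""
    doors = range(3)
    wins = Fraction(0)
    # Nine equally likely (car position, first pick) combinations.
    for car, pick in product(doors, doors):
        # The host may only open a door that is neither the pick nor the car.
        openable = [d for d in doors if d != pick and d != car]
        for opened in openable:
            weight = Fraction(1, 9) / len(openable)
            final = pick
            if switch:
                final = next(d for d in doors if d != pick and d != opened)
            if final == car:
                wins += weight
    return wins

print(monty_hall_win_probability(switch=False))  # 1/3
print(monty_hall_win_probability(switch=True))   # 2/3
```

The car never moves. What changes between the first pick and the offer to switch is the player's information, which is the map/territory point.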

A useful model is to assume that people only have a few settings along the range from impossible to guaranteed. Forecasters who use more settings on the dial (going from nearest 10% to nearest 5% or nearest 1%) are more accurate. Rounding the answers of superforecasters to the nearest 5% worsens their score, whereas regular forecasters don’t suffer much from rounding even to the nearest 20%, indicating that the granular answers of the superforecasters do actually reflect real information.
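A toy calculation of why rounding hurts a forecaster whose fine-grained numbers carry real information. It assumes an idealized forecaster who knows the true probability of every event exactly, and true probabilities spread evenly from 1% to 99%. Both are my simplifications, not the book's experiment.

```python
def round_to_step(p, step):
    """Round a probability to the nearest multiple of step."""
    return round(p / step) * step

def expected_brier(reported, true_probability):
    """Expected single-outcome Brier score (lower is better)."""
    return (true_probability * (1 - reported) ** 2
            + (1 - true_probability) * reported ** 2)

def mean_expected_brier(true_probabilities, step):
    """Average expected score when every forecast is rounded to `step`."""
    scores = [expected_brier(round_to_step(p, step), p) for p in true_probabilities]
    return sum(scores) / len(scores)

true_probabilities = [i / 100 for i in range(1, 100)]
for step in (0.01, 0.05, 0.2):
    score = mean_expected_brier(true_probabilities, step)
    print(f"nearest {step:.0%}: {score:.4f}")
```

The score gets worse as the rounding gets coarser. A forecaster whose estimates are already noisy at the 20% level has little to lose, which matches the observation about regular forecasters.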

Another explanation of why many people struggle with probability is that it runs counter to their story-oriented view of the universe. People believe in fate and define their identities by the unique events that shaped them. There is a negative correlation between prediction accuracy and belief in fate / meant-to-happen thinking. Not clear that this result is related to probabilistic thinking. Could be explained instead by the earlier result that confidence == coherent story - perhaps people who don’t think in stories don’t suffer that bias as much.


Discusses prediction updating over time.

Rebuts the idea that the superforecasters did better simply by putting more time in than the others (consuming more news and updating predictions more often). Already know that superforecasters don’t have substantially more general knowledge than other forecasters. Superforecasters’ initial forecasts were still 50% more accurate than other forecasters, so they still would have won without updating. Not a solid rebuttal - superforecasters clearly do spend lots of time researching their first forecast too.

Discusses ‘belief perseverance’ - people make beliefs but can be extremely resistant to updating them in the face of fresh evidence. Also under-update ie make less of a change than the evidence demands. Beliefs are more likely to be fixed when they are closer to our identity eg in a domain that we feel expert in. See keep your identity fluid.

Dilution effect - being presented with irrelevant information can weaken our confidence in relevant information. Kinda like the availability heuristic - replace “what is the confidence” with “how much of the evidence supports it”.

Superforecasters make many small updates to their predictions. While many are aware of Bayes’ theorem, few of them use it explicitly. Would have liked to see experiments on how well they can apply it eg the false positive problem. I suspect that even the forecasters who don’t know Bayes’ theorem explicitly would do well, otherwise how are they making accurate updates?
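For concreteness, the false positive problem as a single Bayesian update. The base rate and test accuracies are hypothetical numbers of the kind usually used to pose the problem.

```python
def bayes_update(prior, p_evidence_if_true, p_evidence_if_false):
    """Posterior probability of the hypothesis after seeing the evidence."""
    numerator = prior * p_evidence_if_true
    return numerator / (numerator + (1 - prior) * p_evidence_if_false)

# Hypothetical test: 1% of people have the condition, the test catches
# 90% of real cases and wrongly flags 9% of healthy people.
posterior = bayes_update(prior=0.01, p_evidence_if_true=0.9, p_evidence_if_false=0.09)
print(f"{posterior:.3f}")  # about 0.092, far below the intuitive ~90%
```

The intuitive answer substitutes 'how accurate is the test' for 'how likely is the condition given a positive', which is another bait and switch, and ignores the base rate.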

Perpetual Beta

Discusses how to practice.

Brief interlude about growth mindset.

Practice doesn’t always lead to skill gains, otherwise all our experts would be good forecasters. What makes practice effective?

Forecasters who read the booklet prepared by Tetlock and practiced forecasting performed better than forecasters who only read the booklet and forecasters who only practiced. Not surprisingly, practice is much more effective when you have an idea of what you are aiming for.

Effective practice also needs clear and timely feedback. Ambiguous feedback makes it harder to learn what you are doing right or wrong. Slow feedback gives you fewer opportunities to correct mistakes. This is supported by research on the accuracy of expert intuition in various domains - experts in domains where feedback is poor (eg financial pundits wait a long time to find out if they are right or wrong) are no better than laymen.

Hindsight bias - believing after the fact that the correct decision was obvious - is a major impediment eg experts who made predictions about the fall of the Soviet Union were asked to recall their estimates much later, and on average recalled a 31% higher confidence than they actually gave at the time. If you cannot acknowledge being wrong then you are not actually receiving feedback.

Anecdotally, the superforecasters were more likely to share postmortems and to credit some of their success to luck rather than foresight.

Mentions, without references, that research shows that good calibration doesn’t transfer between fields. Practice must be done in the field in which you want to be an expert. TODO track down the reference.

Superforecasters traits:

Notes that not every forecaster has every trait. ‘Perpetual beta’ is the strongest predictor of success, 3x stronger than intelligence. Defines perpetual beta as ‘the degree to which one is committed to belief updating and self-improvement’ but does not say how that is measured.

It’s not clear to what extent all of these are supported by the research vs anecdotal evidence, especially given as I was dubious of some of the earlier lines of reasoning. TODO find the actual research.


Shifts focus to making predictions as a team.

Discusses the Bay of Pigs disaster. Rare consensus on all sides of the political spectrum that the plan was clearly awful. Independent analysis supports that it’s not just hindsight bias.

Led to the coining of groupthink - “members of any small cohesive group tend to maintain esprit de corps by unconsciously developing a number of shared illusions and related norms that interfere with critical thinking and reality testing.” ie the crowd is not always wise.

Forecasters organised into teams with advice to practice “constructive confrontation” and warned about groupthink. Advised to replace attacking with precise questioning. In effect, asking for the reasoning process behind the claim. Similar to reasons for using Fermi estimates in the first place.

Teams worked cooperatively but made their own predictions. Teams were 23% more accurate than individuals. People who scored highly enough to be placed on superforecaster teams increased their accuracy by 50%. Comparisons like these in the book vary terms - I’m assuming that they all mean Brier score.

Compared teams to commercial prediction markets. Forecaster teams lost by ~20% but superforecaster teams won by 15-30%.

There was a correlation between the AOM (active open-mindedness) of the team and its accuracy. How do you measure a team’s AOM? The AOM of the team was not just a product of the AOM of its members, but also of the group dynamics that emerged.

Went through team interactions and categorized people as givers, matchers or takers depending on how much they contributed to the group vs how much they gained. Teams with givers were more successful. No idea what this is measuring - at a guess improvement in score?

Declares that “diversity trumps ability”. I don’t see any mention of evidence.

Overall, this chapter is not hugely convincing. Feels tacked on.

The leader’s dilemma

Near-universal agreement that (appearance of) confidence is vital for leadership. But false confidence leads to bad forecasting!

Discusses the shift in military organization begun by the Wehrmacht, from rigid hierarchies to pushing decisions down the chain of command. Communicate information and goals, not orders. Allows correctly responding to new circumstances without needing to wait for new orders. Allow uncertainty about situation but not about decisions.

Distinguishes intellectual humility (maintaining healthy doubt of your own ideas) from self-doubt (doubting your own abilities / attributes). Effectively, recognizing that uncertainty is unavoidable, not a sign of personal flaws.

Are they really so super?

Following the theme of the book, Tetlock includes criticism from other researchers.

Debate with Kahneman. Kahneman believes that bias is essentially unavoidable. Tetlock believes that in specific contexts humans can be trained to reduce bias. They worked together to test superforecasters for various biases.

Scope insensitivity - basically forgetting to account for variables. In this experiment, ask forecasters to make same prediction but give them each different time frames. Scope insensitivity predicts that the predictions will not vary enough between the different time frames. Superforecasters showed fair resistance to this bias (giving 15% odds if asked about a three month period vs 24% odds if asked about a six month period).
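A back-of-envelope check on what 'varying enough' might mean here. It assumes the event is equally and independently likely in each three-month window (a constant hazard rate), which is my simplification and not a claim from the book.

```python
def scale_probability(p, periods):
    """Probability of at least one occurrence over `periods` back-to-back
    windows, assuming each window independently has probability p."""
    return 1 - (1 - p) ** periods

three_month = 0.15  # figure quoted above
six_month = scale_probability(three_month, periods=2)
print(f"{six_month:.4f}")  # about 0.2775
```

Under that assumption a fully scope-sensitive forecaster would say roughly 28% for six months, so the observed 24% is close but still slightly under-adjusted. A completely scope-insensitive forecaster would stay at 15%.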

Postulates that superforecasters run thought experiments, varying variables such as time frame, and that this makes them consciously aware that they have to account for those variables. No evidence, but this seems like a useful habit anyway.

Also postulates that superforecasters are so practiced in engaging System 2 that it comes naturally. Doesn’t explain why calibration doesn’t transfer across fields. Perhaps a finer-grained theory is that they know which factors are likely to trip them up in their specific field?

Nassim Taleb contends that forecasting is not as useful as it seems because no one can even imagine, let alone predict, the Black Swan events that make the largest impact. Tetlock contends that many Black Swan events were imagined in advance, and that there are also plenty of predictable but important changes that shape the world just as much. More importantly, while the occurrence of some events is not predictable, the GJP shows that the consequences often are and this can be important for preparing for them.

A more compelling rebuttal from a different book - knowing the distribution of earthquake magnitudes allows predicting a 1/300 chance per year of the Fukushima earthquake. That doesn’t tell you whether it will happen, or when, but it does tell you that it is likely enough to be worth preparing for.

What’s next?

Biggest impediment to accurate forecasting is that making predictions is more often about advancing the interests of the predictor than it is about reaching for the truth.

Sees some promise in the fact that some fields have transformed entirely despite huge opposition. Was not inevitable, but was achievable through persistence. Evidence-based medicine is well established. Evidence-based policy is being introduced in some countries. Sabermetrics is well established, and more generally sport science is now well-regarded.

Discusses elements of good judgment other than good forecasting. Primary is good questioning - how do you decide which prediction problems to think about in the first place? This relates back to the Black Swan criticism - how do you avoid getting blindsided?

Wonders whether good forecasters also make good questioners, or whether generating questions comes more naturally to hedgehogs with Big Ideas.

Holds out hope for resolving partisan disputes by forcing both sides to work within a prediction framework so that over time their claims can be evaluated. Compares this to the scientific world, where progress is eventually made because both sides of a debate are forced to play by rules that allow evidence to accumulate.


The ten commandments of forecasting.

  1. Triage. Focus on questions where extra attention is likely to improve accuracy. Not too hard, not too easy. Weigh the consequences of wasting time on an unpredictable event vs failing to predict a predictable event before deciding whether to invest effort on a problem.
  2. Break seemingly intractable problems into tractable sub-problems ie use Fermi estimates.
  3. Strike the right balance between the inside and outside views.
  4. Strike the right balance between under- and over-reacting to new evidence.
  5. Look for the clashing causal forces at work in each problem. Weigh all the perspectives.
  6. Strive to distinguish as many degrees of doubt as the problem permits, but no more ie get comfortable using fine-grained estimates for fine-grained problems.
  7. Strike the right balance between under- and over-confidence, between prudence and decisiveness.
  8. Look for the errors behind your mistakes but beware of rearview-mirror hindsight biases. Own your failures!
  9. Bring out the best in others and let others bring out the best in you. Team dynamics matter as much as team composition.
  10. Master the error-balancing bicycle. These commandments are not enough - you need deep, deliberative practice.
  11. Don’t treat commandments as commandments.

It’s not surprising that a book which praises foxes ends with a list of tools and suggestions rather than a list of Big Ideas. Notice the number of commandments which are about balancing different ideas.


The appendix and the list of traits do a good job of summarizing the core lessons of the book. The methods seem well worth adopting, between the empirical evidence and the convincing alignment with other cogsci results. I’m not convinced by all of the explanations as to why the methods work, and there are a number of places where I want to follow up the references to see if they are more convincing.

The later chapters on teams and leadership appear much weaker. They are delivered with just as much confidence but, unless I missed some, far fewer references.

I’m particularly interested in the debate with Kahneman about whether or not these results reflect real changes in superforecasters’ abilities to overcome cognitive flaws, since this bears directly on the question of whether training/education can increase effective intelligence. The fact that calibration doesn’t transfer across fields is a poor sign.

Worth contrasting with The Signal and the Noise which is more reliant on anecdotal evidence but goes much further into exploring what kinds of events are predictable.