How to Effectively Communicate Forecast Probability and Analytic Confidence

06/Feb/21 14:11

At Britten Coyne Partners, we have often observed that research on issues related to anticipating, assessing, and adapting in time to emergent strategic threats is poorly shared across the military, intelligence, academic, and practitioner communities. This post is another of our ongoing attempts to share key research findings across these silos.

David Mandel is a senior scientist at Defence Research and Development Canada, specializing in intelligence, influence, and collaboration issues. Based on our review of the research, we regard Mandel as a world leader in the effective communication of forecast probability and uncertainty, which is the subject of this post.

Background

Many analysts agree that, even before the COVID pandemic arrived, the world had entered a period of “unprecedented” or “radical” uncertainty and disruptive change.

In this environment, avoiding strategic failure in part depends on effectively meeting three forecasting challenges:

• Asking the right forecasting questions;
• Accurately estimating the probability of different outcomes; and
• Effectively communicating the degree and nature of the uncertainty associated with your forecast.

As we have noted in past posts on our Strategic Risk Blog, as well as in our Strategic Risk Governance and Management course, techniques to help forecasters ask the right questions have received only moderate attention.

That said, some powerful methods have been developed, including scenario analysis (which I first encountered in 1984 when taking a course from Pierre Wack, who popularized it at Shell); prospective hindsight, such as Gary Klein’s pre-mortem method; and Robert Lempert and Steven Bankes’ exploratory ensemble modeling approach.

In contrast to the challenge of asking the right questions, much greater attention has been paid to the development of methods to help analysts accurately forecast the answers to them, particularly in the context of complex adaptive systems (which generate most of the uncertainty we confront today).

In addition to the extensive research on this challenge conducted by the intelligence and military communities, we have also recently seen many excellent academic and commercial works, including best-selling books like “Future Babble” by Dan Gardner, “Superforecasting” by Philip Tetlock and Dan Gardner, and “The Signal and the Noise” by Nate Silver.

Compared to the first two challenges, the critical issue of forecast uncertainty, and in particular how to effectively communicate it, has received far less attention.

Some authors have constructed taxonomies to describe the sources of forecast uncertainty (e.g., “Classifying and Communicating Uncertainties in Model-Based Policy Analysis," by Kwakkel, Walker, and Marchau).

Other analysts have attempted to estimate the likely extent of forecast uncertainty in complex adaptive systems.

For example, in “The Prevalence Of Chaotic Dynamics In Games With Many Players”, Sanders et al find that in games where players can take many possible actions in every period in pursuit of their long-term goals (which may differ), system behavior quickly becomes chaotic and unpredictable as the number of players increases. The authors conclude that, “complex non-equilibrium behavior, exemplified by chaos, may be the norm for complicated games with many players.”

In “Prediction and Explanation in Social Systems”, Hoffman et al also analyze the limits to predictability in complex adaptive social systems.

They observe, “How predictable is human behavior? There is no single answer to this question because human behavior spans the gamut from highly regular to wildly unpredictable. At one extreme, a study of 50,000 mobile phone users found that in any given hour, users were in their most visited location 70% of the time; thus, on average, one could achieve 70% prediction accuracy with the simple heuristic, ‘Jane will be at her usual spot today’.”

“At the other extreme, so-called ‘black swan’ events are thought to be intrinsically impossible to predict in any meaningful sense. Last, for outcomes of intermediate predictability, such as presidential elections, stock market movements, and feature films’ revenues, the difficulty of prediction can vary tremendously with the details of the task.”

The authors note that, “the more that outcomes are determined by extrinsic random factors, the lower the theoretical best performance that can be attained by any method.”

In “Exploring Limits to Prediction in Complex Social Systems”, Martin et al also address the question, “How predictable is success in complex social systems?” To analyze it, they evaluate the ability of multiple methodologies to predict the size and duration of Twitter cascades.

The authors conclude that, “Despite an unprecedented volume of information about users, content, and past performance, our best performing models can explain less than half of the variance in cascade sizes … This result suggests that even with unlimited data predictive performance would be bounded well below deterministic accuracy.”

“Although higher predictive power [than what we achieved] is possible in theory, such performance requires a homogeneous system and perfect ex-ante knowledge of it: even a small degree of uncertainty … leads to substantially more restrictive bounds on predictability … We conclude that such bounds [on predictability] for other complex social systems for which data are more difficult to obtain are likely even lower.”

In sum, forecasts of future outcomes produced by complex adaptive systems (e.g., the economy, financial markets, product markets, interacting combatants, etc.) are very likely to be accompanied by a substantial amount of uncertainty.

David Mandel’s Insights

A critical question is how to effectively communicate a forecast’s probability and its associated uncertainty to decision makers.

A recent review concluded that, given its importance, this is an issue that surprisingly has not received much attention from researchers (“Communicating Uncertainty About Facts, Numbers And Science”, by van der Bles et al).

That is somewhat strange, because this is not a new problem.

For example, in 1964 the CIA’s Sherman Kent published his confidential memo on “Words of Estimative Probability”, which highlighted the widely varying numerical probabilities that different people attached to verbal expressions such as “possible”, “likely”, “probable”, or “almost certain”. Over the succeeding fifty years, multiple studies have replicated and extended Kent’s conclusions.

Yet in practice, verbal expressions of estimative probability, without accompanying quantitative expressions, are still widely used.

For example, it was only after recommendations from the 9/11 Commission Report, and direction by the Intelligence Reform and Terrorism Prevention Act (IRTPA) of 2004, that on 21 June 2007 the Office of the Director of National Intelligence (ODNI) released Intelligence Community (IC) Directive (ICD) 203.

This Directive established intelligence community-wide analytic standards intended to, “meet the highest standards of integrity and rigorous analytic thinking.”

ICD 203 includes the following table for translating “words of estimative probability” into quantitative probability estimates:

[Table: ICD 203 words of estimative probability and their corresponding probability ranges]
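To make the idea of a translation table concrete, here is a minimal Python sketch of how such a lexicon can be represented and queried. The terms and percentage ranges are those we recall from the 2015 revision of ICD 203, so treat them as an assumption and check them against the directive itself. The function name and the rule for handling shared endpoints are our own choices for illustration.

```python
from typing import Optional

# Terms and ranges (in percent) as recalled from the 2015 revision of ICD 203.
# This is an assumption of the sketch; verify against the directive before use.
ICD_203_TERMS = [
    ("almost no chance", 1, 5),
    ("very unlikely", 5, 20),
    ("unlikely", 20, 45),
    ("roughly even chance", 45, 55),
    ("likely", 55, 80),
    ("very likely", 80, 95),
    ("almost certain", 95, 99),
]


def term_for(probability_pct: float) -> Optional[str]:
    """Return the first term whose range contains the probability, else None.

    Adjacent ranges share an endpoint, so a boundary value such as 5 is
    assigned to the lower term here (an arbitrary choice for this sketch).
    """
    for term, low, high in ICD_203_TERMS:
        if low <= probability_pct <= high:
            return term
    return None


print(term_for(70))      # falls in the 55 to 80 range
print(term_for(0.0001))  # below 1 percent, so no term applies
```

Note that any probability below 1% or above 99% has no corresponding term in this table, and that a single term such as "likely" covers a 25 percentage point range.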

In our experience, nobody has written more about these issues than David Mandel.

To be sure, “Assessing Uncertainty in Intelligence” by Friedman and Zeckhauser is an important paper. However, it pales in comparison to the volume and breadth of Mandel’s research, including his contributions to and editorship of NATO’s exhaustive June 2020 report on “Assessment and Communication of Uncertainty in Intelligence to Support Decision-Making”.

In what follows, we’ll review some of Mandel’s key findings, insights, and recommendations in three critical areas: (1) Communicating probability forecasts; (2) Communicating the degree of forecast uncertainty (or “analytic confidence”); and (3) Why organizations have been reluctant to adopt what researchers have found to be the most effective practices in both these areas.

Effectively Communicating Probability Forecasts

“As Sherman Kent aptly noted [in 1964], substantive intelligence is largely human judgment made under conditions of uncertainty. Among the most important assessments are those that not only concern unknowns but also potentially unknowables, such as the partially formed intentions of a leader in an adversarial state.”

“In such cases, the primary task of the analyst is not to state what will happen but to accurately assess the probabilities of alternative possibilities as well as the degree of error in the assessments and to give clear explanations for the basis of such assessments.” (Source: “Intelligence, Science, and the Ignorance Hypothesis”, by David Mandel).

“Most intelligence organizations today use some variant of the Kent-Foster approach. That is, they rely on curated sets of linguistic probability terms presented as ordered scales. Previously, some of these scales did not use numeric probability equivalencies. However, nowadays most standards assign numeric ranges to stipulate the meaning of each linguistic probability term.”

“Efforts to transform the vagueness of natural language into something clearer reflect a noble goal, but the curated-list approach is flawed in practice and in principle. For example, flaws in practice include the fact that each standard uses a common approach, yet each differs sufficiently to undermine interoperability among key collaborative partners; e.g., an even chance issued by NATO could mean unlikely, roughly even chance, or likely in the US system.”

“Current standards also prevent analysts from communicating probabilities less than 1% or greater than 99%. This pre-empts analysts from distinguishing “one in a hundred” from “one in a million.” In the US standard, “one in a hundred” is the smallest communicable probability, while in the NATO and UK standards, “one in a million” would be indistinguishable from “one in ten.” Orders of magnitude should matter to experts because orders of magnitude matter in everyday life. A threat that has a 10% chance of occurring may call for a different response than if it had a one-in-a-million chance of occurring instead.”
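A short worked example shows why orders of magnitude matter to a decision maker. The sketch below is our own illustration, not something taken from Mandel's paper: it computes a simple probability-weighted loss, and the impact figure is purely hypothetical.

```python
def expected_loss(probability: float, impact: float) -> float:
    """Probability-weighted loss for a single threat."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    return probability * impact


# Hypothetical loss (in any currency) if the threat materializes.
IMPACT = 50_000_000

one_in_ten = expected_loss(0.10, IMPACT)
one_in_a_million = expected_loss(1e-6, IMPACT)

# How many times larger the first expected loss is than the second.
ratio = one_in_ten / one_in_a_million

print(f"1 in 10:        {one_in_ten:,.0f}")
print(f"1 in 1,000,000: {one_in_a_million:,.0f}")
print(f"ratio:          {ratio:,.0f}")
```

With the same impact, the two expected losses differ by a factor of roughly one hundred thousand, yet a lexicon whose lowest category spans both probabilities would describe the two threats with the same words.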

“Intelligence organizations have naively assumed that they can quash the unruliness of linguistic probabilities simply by stating their intended meaning. Yet ample research shows that when people have direct access to a translation table, a large proportion still interprets linguistic expressions inconsistently with the prescribed meanings.”

“Noting the abysmal rates of shared understanding when probability lexicons are provided, researchers have recommended that numeric ranges be reported alongside linguistic probabilities in assessments [as in ICD 203]. However, this approach has yielded only modest improvements in shared understanding.”

“Studies show that people generally prefer to communicate probabilistic information linguistically, but that they also prefer to receive it numerically. These preferences are exhibited across a range of expert judgment communities, but are particularly pronounced when judgments are based on unreliable or incomplete information, as is characteristic of intelligence analysis.”

“Decision-makers want useful (i.e., timely, relevant, and accurate) information to support their decisions; they don’t wish to be reminded repeatedly what probability terms should mean to them when consuming intelligence. Any standard that encourages analysts to express anything other than their best probability estimate for the event being judged is suboptimal.”

Mandel also stresses that, “Explanation is [also] vital to intelligence since without it, a decision-maker would not know how the particular assessment was reached. Numeric assessments and clear explanations should work together to yield effective intelligence.”

(Source: “Uncertainty, Intelligence, and National Security Decision Making”, by David Mandel and Daniel Irwin).

Related research has also found that allowing forecasters to use narrower probability ranges than those specified in national guidelines like ICD 203 improves forecast accuracy. (See “The Value of Precision in Probability Assessment: Evidence from a Large-Scale Geopolitical Forecasting Tournament”, by Friedman et al).

Another problem is that, “Linguistic probabilities also convey ‘directionality,’ a linguistic feature related to but distinct from probability.

“Directionality is a characteristic of probabilistic statements that calls attention to the potential occurrence or non-occurrence of an event. For instance, if someone tells you there is some chance they will make it to an event, you will probably be more inclined to expect them to attend than if they had said it was doubtful, even though both terms tend to be understood as conveying low probabilities … These implicit suggestions can influence decision-making outside of the decision-maker’s awareness.” (Source: “Uncertainty, Intelligence, and National Security Decision Making”, by David Mandel and Daniel Irwin).

“Communicating probabilities numerically rather than verbally also benefits forecasters’ credibility. Verbal probabilities convey implicit recommendations more clearly than probability information, whereas numeric probabilities do the opposite. Prescriptively, we propose that experts distinguish forecasts from advice, using numeric probabilities for the former and well-reasoned arguments for the latter.” (Source: “Cultivating Credibility With Probability Words And Numbers”, by Robert Collins and David Mandel).

Effectively Communicating Forecast Confidence (or Uncertainty)

Probabilistic forecasts rest on a combination of (1) facts; (2) assumptions about critical uncertainties; (3) the evidence (of varying reliability and information value) supporting those assumptions; and (4) the logic used to reach the forecaster’s conclusion.

A forecast is typically assessed either directly (by judging the strength of its assumptions and logic), or indirectly, on the basis of the forecaster’s stated confidence in her/his conclusions.

In our forecasting work with clients over the years, we have found that discussing the assumptions made about critical uncertainties, and, less frequently, the forecast logic itself, generates very productive discussions and improved predictive accuracy.

In particular, we have found Marvin Cohen’s approach quite practical. His research found that the greater the number of assumptions about “known unknowns” [i.e., recognized uncertainties] that underlie a forecast, and the weaker the evidence that supports them, the lower the confidence one should have in the forecast’s accuracy.

Cohen also cautions that the more assumptions about “known unknowns” that are used in a forecast logic, the more likely it is that more potentially critical “unknown unknowns” remain to be discovered, which again should lower your confidence in the forecast (e.g., see, “Metarecognition in Time-Stressed Decision Making: Recognizing, Critiquing, and Correcting”, by Cohen, Freeman, and Wolf).
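One rough way to see why more assumptions and weaker evidence should lower confidence is to treat each assumption as having some chance of being correct and multiply those chances together. This is our own simplified illustration, not Cohen's method, and it assumes the assumptions are independent of one another, which will often not be true in practice. The probabilities used are hypothetical.

```python
from math import prod
from typing import Iterable


def chance_all_hold(assumption_probs: Iterable[float]) -> float:
    """Probability that every assumption holds, assuming independence."""
    return prod(assumption_probs)


# Two well-supported assumptions versus five weakly supported ones
# (hypothetical probabilities, chosen only for illustration).
few_strong = chance_all_hold([0.9, 0.9])
many_weak = chance_all_hold([0.7] * 5)

print(f"two assumptions at 0.9:  {few_strong:.2f}")
print(f"five assumptions at 0.7: {many_weak:.2f}")
```

Under these assumptions, a forecast resting on five weakly supported assumptions is far less likely to have all of its foundations intact than one resting on two well-supported ones.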

Mandel focuses on expressions of “analytic confidence” in a forecast, which are the established practice in the intelligence world.

In a number of different publications, he highlights many shortcomings in the ways that analytic confidence is currently communicated to users of estimative probability forecasts.

“Given that intelligence is typically derived from incomplete and ambiguous evidence, analysts must accurately assess and communicate their level of uncertainty to consumers. One facet of this perennial challenge is the communication of analytic confidence, or the level of confidence that an analyst has in his or her judgments, including those already qualified by probability terms such as “very unlikely” or “almost certainly”.

“Analytic confidence levels indicate the extent to which “assessments and estimates are supported by information that varies in scope, quality and sourcing.”

“Consumers [i.e., forecast users] are better equipped to make sound decisions when they understand the methodological and evidential strength (or flimsiness) of intelligence assessments. Effective communication of confidence also militates against the pernicious misconception that the Intelligence Community (IC) is omniscient.”

“Most intelligence organizations have adopted standardized lexicons for rating and communicating analytic confidence. These standards provide a range of confidence levels (e.g., high, moderate, low), along with relevant rating criteria…

“There is evidence that expressions of confidence are easily misinterpreted by consumers … There is also evidence that the terms stipulated in confidence standards are misunderstood (or at least misapplied) by intelligence practitioners.”

For example, here is the three-level confidence scale used by the Canadian Forces Intelligence Command (CFINTCOM):

[Image: CFINTCOM analytic confidence scale]

In the CFINTCOM framework, “Analytic confidence is based on three main factors:

(1) Evidence: “the strength of the knowledge base, to include the quality of the evidence and our depth of understanding about the issue.”

(2) Assumptions: “the number and importance of assumptions used to fill information gaps.”

(3) Reasoning: “the strength of the logic underpinning the argument, which encompasses the number and strength of analytic inferences as well as the rigour of the analytic methodology applied to the product.”

To show how widely standards for communicating forecast confidence vary, Mandel contrasts those used by intelligence and military organizations with the framework and ratings used by the Intergovernmental Panel on Climate Change (IPCC):

[Image: IPCC confidence framework and ratings]

After comparing the approaches used by different NATO members, Mandel finds that, “The analytic confidence standards examined generally incorporate the following determinants:

• Source reliability;
• Information credibility;
• Evidence consistency/convergence;
• Strength of logic/reasoning; and
• Quantity and significance of assumptions and information gaps.”

However, he also notes that, “few [national] standards attempt to operationalize these determinants or outline formal mechanisms for evaluation. Instead, they tend to provide vague, qualitative descriptions for each confidence level, which may lead to inconsistent confidence assessments.”

“Issues may also arise from the emphasis most standards place on evidence convergence as a determinant of analytic confidence … Convergence can help eliminate false assumptions and false/deceptive information, but may not necessarily prevent analysts from deriving high confidence from outdated information. Under current standards, a large body of highly credible and consistent information could contribute to high analytic confidence, despite being out of date. A possible solution would be to incorporate a measure of information recency”.

“The emphasis on convergence may also lead analysts to inflate their confidence by accumulating seemingly useful but redundant information” (e.g., multiple reports based on same underlying data).
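The two problems just described, outdated information and redundant reporting, can be illustrated with a small sketch. It is our own hypothetical example rather than a method proposed by Mandel: the report records, field names, and 90-day cutoff are all invented for illustration. The idea is to count distinct underlying origins among recent reports, rather than counting the reports themselves.

```python
from datetime import date

# Hypothetical reports. R1 and R2 draw on the same underlying origin,
# and R4 is well out of date.
REPORTS = [
    {"id": "R1", "origin": "sensor-A", "dated": date(2021, 1, 20)},
    {"id": "R2", "origin": "sensor-A", "dated": date(2021, 1, 22)},
    {"id": "R3", "origin": "source-B", "dated": date(2021, 1, 25)},
    {"id": "R4", "origin": "source-C", "dated": date(2019, 6, 1)},
]


def independent_recent_origins(reports, as_of, max_age_days):
    """Return the set of distinct origins among reports no older than the cutoff."""
    return {
        report["origin"]
        for report in reports
        if (as_of - report["dated"]).days <= max_age_days
    }


origins = independent_recent_origins(REPORTS, as_of=date(2021, 2, 6), max_age_days=90)

print(f"{len(REPORTS)} reports, {len(origins)} independent recent origins")
```

Four consistent reports look like strong convergence, but in this example only two independent and recent origins actually stand behind them.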

“In evaluating information convergence, confidence standards also fail to weigh the reliability of confirming sources against disconfirming sources, or how relationships between sources may unduly influence their likelihood of convergence. Focusing heavily on convergence can also introduce order effects, whereby information received earlier faces fewer hurdles to being judged credible.”

Mandel concludes, “It is unlikely that current analytic confidence standards incorporate all relevant determinants. For instance, confidence levels, as traditionally expressed, fail to consider how much estimates might shift with additional information, which is often a key consideration for consumers deciding how to act on an estimate.

“Under certain circumstances, the information content of an assessment may be less relevant to decision makers than how much that information (and the resultant forecast estimate) may change in the future. Analytic confidence scales could incorporate a measure of “responsiveness,” expressed as the probability that an estimate will change due to additional collection and analysis over a given time period (e.g., there is a 70% chance of x, but by the end of the month, there is a 50% chance that additional intelligence will increase the estimated likelihood of x to 90%).”
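The numbers in this example carry an implication worth spelling out. If we assume the analyst's estimates are coherent, so that today's estimate equals the probability-weighted average of the estimates the analyst expects to hold later, then the 70%, 50%, and 90% figures pin down what the estimate must be, on average, if the additional intelligence does not raise it. That coherence assumption is ours, not something stated in the source, and the function name is hypothetical.

```python
def implied_other_estimate(current: float, p_shift: float, shifted: float) -> float:
    """Average estimate in the branch where the shift does not happen.

    Solves current = p_shift * shifted + (1 - p_shift) * other for `other`,
    assuming today's estimate is the expected value of the future estimate.
    """
    if not 0.0 < p_shift < 1.0:
        raise ValueError("p_shift must be strictly between 0 and 1")
    other = (current - p_shift * shifted) / (1.0 - p_shift)
    if not 0.0 <= other <= 1.0:
        raise ValueError("inputs are not jointly coherent")
    return other


# Figures from the quoted example: 70% now, 50% chance of moving to 90%.
other = implied_other_estimate(current=0.70, p_shift=0.50, shifted=0.90)

print(f"implied estimate if no upward shift: {other:.0%}")
```

Under that assumption, a 50% chance of moving up to 90% implies that in the remaining cases the estimate falls to about 50% on average. A decision maker told only "70%" would not know that the estimate is this likely to move sharply in either direction.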

“In addition to responsiveness and evidence characteristics, current conceptions of analytic confidence fail to convey the level of consensus or range of reasonable opinion about a given estimate. Analysts can arguably assess uncertainty more effectively when the range of plausible viewpoints is narrower, and evidence characteristics and the range of reasonable opinion vary independently.”

For example, “In climate science, different assumptions between scientific models can lead researchers to predict significantly different outcomes using the same data. For this reason, current climate science standards incorporate model agreement/consensus as a determinant of analytic confidence.”

(Source: “How Intelligence Organizations Communicate Confidence (Unclearly)”, by Daniel Irwin and David Mandel).

Mandel also observes that, “analysts are usually instructed to assess probability and confidence as if they were independent constructs. This fails to explain that confidence is a second-order judgment of uncertainty capturing one’s subjective margin of error in a probabilistic estimate. That is, the less confident analysts are in their estimates, the wider their credible probability intervals should be.

“An analyst who believes the probability of an event lies between 50% and 90% (i.e., 70% plus or minus 20%) is less confident than an analyst who believes that the probability lies between 65% and 75% (i.e., 70% plus or minus 5%). The analyst providing the wider margin of error plays it safer than the analyst providing the narrower interval, presumably because the former is less confident than the latter.”

(Source: “Uncertainty, Intelligence, and National Security Decision Making”, by David Mandel and Daniel Irwin).
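The relationship between confidence and interval width can be expressed in a few lines of code. The following is a minimal sketch using the two analysts from the quoted example. The class and function names are our own, and treating a narrower interval as higher confidence is the only rule it encodes.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ProbabilityEstimate:
    point: float   # best estimate, e.g. 0.70
    margin: float  # subjective margin of error, e.g. 0.20

    @property
    def interval(self) -> Tuple[float, float]:
        """Credible interval, clipped so it stays between 0 and 1."""
        low = max(0.0, self.point - self.margin)
        high = min(1.0, self.point + self.margin)
        return (low, high)

    @property
    def width(self) -> float:
        low, high = self.interval
        return high - low


def more_confident(a: ProbabilityEstimate, b: ProbabilityEstimate) -> ProbabilityEstimate:
    """Return the estimate with the narrower interval (ties go to `a`)."""
    return a if a.width <= b.width else b


wide = ProbabilityEstimate(point=0.70, margin=0.20)    # 50% to 90%
narrow = ProbabilityEstimate(point=0.70, margin=0.05)  # 65% to 75%

print(more_confident(wide, narrow))
```

Both analysts report the same point estimate of 70%, so a standard that records probability and confidence as unrelated labels would hide the fact that the two judgments differ only in the width of the interval around that estimate.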

Organizational Obstacles to Adopting More Effective Methods

In words that are equally applicable to private sector forecasts, Mandel notes that, “intelligence analysis and national security decision-making are pervaded by uncertainty. The centrality of uncertainty to decision-making at the highest policy levels underscores the primacy of accurately assessing and clearly communicating uncertainties to decision-makers. This is a central analytic function of intelligence.”

“Most substantive intelligence is not fact but expert judgment made under uncertainty. Not only does the analyst have to reason through uncertainties to arrive at sound and hopefully accurate judgments, but the uncertainties must also be clearly communicated to policymakers who must decide how to act upon the intelligence.”

“Thomas Fingar, former US Deputy Director of National Intelligence, described the role of intelligence as centrally focusing on reducing uncertainty for the decision-maker. While analysts cannot always reduce uncertainty, they should be able to accurately estimate and clearly communicate key uncertainties for decision-makers.”

“Given the importance of uncertainty in intelligence, one might expect the intelligence community to draw upon relevant science aimed at effectively handling uncertainty, much as it has done to fuel its vast collections capabilities. Yet remarkably, methods for uncertainty communication are far from having been optimized, even though the problem of uncertainty communication has resurfaced in connection with significant intelligence failures.”

We could make the same argument about the importance of accurately assessing uncertainty and emerging strategic threats in the private sector, and its association with many corporate failures. As directors, executives, and consultants, we have frequently observed the absence of best practices for communicating forecast uncertainty in private sector organizations around the world.

Mandel goes on, “Given the shortcomings of the current approach to uncertainty communication and the clear benefits of using numeric probabilities, why hasn’t effective reform happened?”

“In part, organizational inertia reflects the fact that most intelligence consumers have limited time in office, finite political capital, and crowded agendas. Efforts to tackle intelligence-community esoterica deplete resources and promise little in the way of electoral payoff. High turnover of elected officials also ensures short collective memory; practitioners can count on mistakes being forgotten without having to modify their tradecraft [i.e., analytical practices]. Even when commissions are expressly tasked with intelligence reform, they often lack the requisite knowledge base, resulting in superficial solutions.”

“Beyond these institutional barriers, intelligence producers and consumers alike may view it in their best interests to sacrifice epistemic quality in intelligence to better serve other pragmatic goals.”

“For forecast consumers, linguistic probabilities provide wiggle room to interpret intelligence estimates in ways that align with their policy preconceptions and preferences—and if things go wrong, they have the intelligence community to blame for its lack of clarity. Historically, intelligence consumers have exploited imprecision to justify decisions and deflect blame when they produced negative outcomes.”

Unfortunately, that’s equally true in the private sector.

(Source: “Uncertainty, Intelligence, and National Security Decision Making”, by David Mandel and Daniel Irwin).

However, it’s not just forecast consumers who are to blame for the current state of affairs. As Mandel notes, “Given that there is far more to lose by overconfidently asserting claims that prove to be false than by underconfidently making claims that prove to be true, intelligence organizations are likely motivated to make timid forecasts that water down information value to decision-makers—a play-it-safe strategy that anticipates unwelcome entry into the political blame games that punctuate history.”

(Source: “Intelligence, Science, and the Ignorance Hypothesis”, by David Mandel).

Conclusion

As we noted at the outset, even before the COVID pandemic arrived the world had entered a period of unprecedented or radical uncertainty and disruptive change.

In this environment, avoiding failure in part depends on effectively meeting three forecasting challenges:

• Asking the right forecasting questions;
• Accurately estimating their possible outcomes; and
• Effectively communicating the degree and nature of the uncertainty associated with your forecast.

Meeting these challenges has proven to be difficult in the world of professional intelligence analysis; it is even more difficult in the private sector, as the history of corporate failure painfully shows.

Of these three challenges, effectively communicating the degree and nature of the uncertainty associated with forecasts has received the least attention.

Fortunately, David Mandel has made it his focus. His research is too little known and appreciated outside the intelligence community (and even within it, unfortunately).

By briefly summarizing his research here, we hope Mandel’s work can help far more organizations to improve their forecasting practices and substantially improve their chances of avoiding failure and achieving their goals.

Britten Coyne Partners advises clients how to establish methods, processes, structures, and systems that enable them to better anticipate, accurately assess, and adapt in time to emerging threats and avoid strategic failures. Through our affiliate, The Strategic Risk Institute, we also provide online and in-person courses leading to a Certificate in Strategic Risk Governance and Management.

