THE MAP IS NOT THE LAND
All Models Are Wrong, Some Are Useful
The Wrong & The Useful
Whenever we make a decision, we do so with a set of assumptions, some conscious and some subconscious. Even something as simple as deciding to go for a hike rests on assumptions such as “it would be good to get some exercise outside” or “there won’t be a bear ready to eat me.” This is entirely reasonable and necessary when making a decision. Where we get into trouble is when we’re not careful about the underlying validity of the things we’re assuming to be true. When our assumptions are false, our decisions are likely to produce counterproductive or, at best, unintended consequences. That brings us to models, forecasts, and measurements.
While undoubtedly useful and important, models, forecasts, and measurements carry significant risks when used in decision-making. To be clear up front, this is a “baby in the bathwater” situation. I’m not advocating that you throw out every bit of data you use to make decisions and “use your gut” going forward. Instead, the intent of this article is to make these tools more useful to you by highlighting their vulnerabilities and risks. By understanding their shortcomings, you’re far better equipped to benefit from models while avoiding their pitfalls.
“All models are wrong, but some are useful” is the guiding mantra of this approach. The phrase is attributed to George Box, a statistician, although the concept shows up in earlier writings as well. It recognizes the inherent and unavoidable limitations of models while acknowledging that they can be incredibly useful. That balance is essential to using models effectively, and unfortunately it isn’t widely practiced. Today, there is a rampant overreliance on models that leads to poor decision-making at nearly all levels of the public and private sectors. Consider the incredible failures of polling models for the 2016 election, where many polls “put Clinton’s chance of winning at anywhere from 70% to as high as 99%.” Additionally, although there are many contributing factors, look to the replication crisis in the scientific community: some studies have found that only 11% of “landmark” cancer biology research replicates, only about 40% of psychology research replicates, and about a third of economic studies from top-tier journals don’t replicate.
Despite these shortcomings, leaders and policy makers routinely rely on these tools – often with far too much faith – to guide their decision making. This article will explore how the very nature of models makes them inaccurate, how we “confuse the map for the world,” why we fixate on what we can measure and how it harms us, and how to think about models to make them as useful as possible.
Please note: For this article, I will be using the term “model” loosely to include forecasts, measurements, logic models, statistical models, and so forth.
Models are simplified representations – which means they are incomplete
It is the inherent nature of models to leave information and context out. They serve as a way for us to simplify the incredible multitude of variables that go into a decision, process, or system. This is essential; without some form of simplification we couldn’t develop models for much beyond extremely simple games (e.g., tic-tac-toe, blackjack). The simplification is obvious when considering maps, which are designed to represent, abstractly, the geographical features of an area. A map may capture rough elevation changes, represent the general vegetation features of the land, and provide a visual representation of the distances between features, but it still leaves out immense detail compared to the actual land itself. This is obviously true for models designed to represent something, so what about models designed to predict?
For example, if I wanted to develop a model to predict what sandwich a friend may order at lunch, one that included every variable I would need to consider, it would include things like the availability of certain ingredients, how hungry they are, what events my friend encountered throughout the day, how their mood influences what they eat and order, how the time of day influences their decisions, and so forth, with each of these components requiring extensive supporting knowledge as well (e.g., knowing which ingredients are available requires knowledge of the restaurant’s current supply chain). Alternatively, I could simplify the model to something like “if very hungry, big sub sandwich; if not very hungry, simple ham and cheese.” Simplifying to focus on one variable (how hungry they are) may be all I need to make an accurate prediction most of the time, but it is important to note that it leaves a multitude of variables out. If there isn’t a lot of consequence to being wrong, approaches like this are relatively harmless.
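To make the simplification concrete, here is a minimal sketch of that one-variable sandwich model; the hunger threshold and menu items are invented purely for illustration:

```python
# A minimal sketch of the one-variable "sandwich model" described above.
# The hunger scale, threshold, and menu items are made up for illustration.

def predict_sandwich(hunger: int) -> str:
    """Predict a friend's lunch order from a single variable: hunger (0-10)."""
    return "big sub sandwich" if hunger >= 7 else "ham and cheese"

# The model ignores mood, time of day, ingredient availability, and so on.
# It may still be right most of the time, but every omitted variable is a
# way for it to be wrong.
print(predict_sandwich(hunger=9))  # -> "big sub sandwich"
print(predict_sandwich(hunger=3))  # -> "ham and cheese"
```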
Unfortunately, many of the things we attempt to model in the real world are far more complex and interconnected than guessing what our friend will order for lunch, and the stakes are much higher. The trajectory of a pandemic, stock prices, the effectiveness of new drugs, weather and climate, business and economic trends, hazard and vulnerability analyses, and so forth are all filled with incredible complexity, interconnectedness, and a potential for cascading and interacting consequences. Given their size and scope, it makes sense that some information is left out; it would be impossible to capture it all. In this way, all models are “wrong” in that they don’t display reality, in its entirety, for what it is. There is nothing inherently bad about this, but there are significant tradeoffs that often go unrecognized.
The Ludic Fallacy, coined by Nassim Taleb, describes the mistake of using games to model real life. It comes from a tendency to believe that, although we are removing or ignoring variables that influence the outcome, we still have an accurate model that can be relied upon. The last piece is critical: it is entirely reasonable to design a model by removing or ignoring variables, but it is essential that we remain skeptical of its applicability and accuracy. Unfortunately, that skepticism isn’t widely practiced, particularly in the face of motivated reasoning and uncertainty avoidance. The COVID-19 pandemic was a case study in the overconfidence of models and experts, with “…only 44% of predictions from the expert group [falling] within their own 75% confidence ranges” and with many of these models lacking “a thorough validation and a clear communication of their uncertainties.” Novelty, complexity, lack of comprehensive data, individual and collective action, and the sheer number of contributing variables make such predictions near impossible. This is in addition to ongoing statistical challenges such as sampling biases and errors, faulty statistical analysis (particularly around fat-tailed distributions), and motivated model or study design (funding loyalty, “publish or perish,” political whims, etc.).
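To see what it means for only 44% of predictions to land inside a stated 75% confidence range, here is a small, hypothetical simulation of an overconfident forecaster; the spreads below are invented solely to reproduce that kind of gap:

```python
# Hypothetical illustration: a forecaster states 75% confidence ranges but
# underestimates the true spread of outcomes, so actual coverage falls short.
import random

random.seed(0)
claimed_sigma = 1.0                 # spread the forecaster believes (assumed)
true_sigma = 2.0                    # actual spread of outcomes (assumed)
half_width = 1.15 * claimed_sigma   # ~75% central interval for a normal

trials = 100_000
hits = sum(1 for _ in range(trials)
           if abs(random.gauss(0, true_sigma)) <= half_width)

print(f"Nominal coverage: 75%, actual coverage: {hits / trials:.0%}")
# With these made-up numbers, roughly 43-44% of outcomes land inside the
# stated interval: confidently labeled, quietly miscalibrated.
```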
Keep in mind, the issue here isn’t that we attempt to model these systems and phenomena; it is that we are too confident in our modeling or forecasting and do not adequately account for what happens if we are wrong. It is a failure to remember that the very nature of a model is to leave things out, to not give the whole picture, and in doing so we risk missing a critical interaction or variable that determines the final outcome. Often, when a prediction is accompanied by a lot of letters behind the author’s name, involves a bunch of math and maybe some colorful graphs, and boils everything down to a concrete number, we tend to view it as truly reliable, mistaking precision for accuracy (precision bias).
Leaders and policy makers tend to grab onto these sorts of findings because they feel solid and reliable. They give a clear target and something to measure progress against. They reduce uncertainty even if the forecasted outcome is negative. Meanwhile, those who engage in careful skepticism around such findings (often those skilled in designing models) and those who must deal with the consequences if the model is wrong (practitioners, those impacted) tend to take a far more realistic perspective on forecasting. While they may form opinions on how accurate the model is and act accordingly, they almost always also consider the consequences if it is wrong and plan for them. It isn’t hard to find someone with a story about how a forecast, model, or analyst was wrong and they were left cleaning up the mess.
We tend to replace the thing with the symbol: “the map is not the land”
As hinted at above, there is a tendency to become fixated on models and measures to our own detriment, tuning out contradicting facts or failing to acknowledge our lack of comprehensive understanding. We have a tendency, particularly during times of stress and uncertainty, to replace the “thing” itself with the symbol. This phenomenon is apparent across nearly all parts of society. How many purchases are made to display a symbol of wealth as opposed to increasing one’s true wealth or quality of life? Alternatively, consider the college ranking system, where some colleges have been accused of submitting faulty data in an attempt to rise in the rankings. Instead of actually improving the conditions and learning environment at the college, administrators became more focused on improving the perception of the college, particularly when it came to rankings. The rankings (the symbol) became more important than the thing itself (the learning environment at the college). These symbols (measures, models) can be harmful if not approached with adequate skepticism and a commitment to the underlying thing itself (actual outcomes).
One of the first major challenges that arises when a model, and subsequently a measure, is used to guide decision-making is that the model/measure takes on a value of its own. Since these models are often created to guide a decision of some consequence, people will care about what the measure shows and the actions taken because of it. In this way, the model/measure itself shapes behavior, which, while it may be intended to do so in some circumstances, will likely lead to outcomes that are not fully intended. This is captured in Goodhart’s law, which states that “[a]ny observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes.” To put it simply, when we use a measure as a target, it often ceases to be a useful measure. Take, for example, the case of some researchers rising in prominence because their citation counts, a key measure of a researcher’s career performance, increased dramatically, despite the overwhelming majority of those citations being self-citations. This is a case of mistaking the symbol (total number of citations) for the thing itself (the importance and relevance of developed works as recognized by other researchers in the field). It is the difference between operations and public affairs; one is based on what is actually done while the other is focused on what people think was done.
This leads us to our second principle related to “confusing the symbol for the thing itself”: the more important the model/measure is, the more likely people are to try to game it. This is also called Campbell’s law, which holds that “the more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor.” Imagine a used car lot with a set of employees. To drive sales, the manager comes up with an idea to give bonuses based on the number of cars sold. At the end of the month, while reviewing the sales, the manager notices that the lot moved more cars than in any month before, and despite this, monthly profit decreased! Looking closer at the transactions, the manager realizes that cars were sold at much lower margins because the sales staff were trying to move vehicles to hit their bonus. By designing a measure that focused on the number of cars sold, the manager inadvertently set up a measure designed to be gamed to the business’s detriment. Look to the United States financial system, both in 2008 and now, as another example: an incredible amount of fragility and risk has been allowed to accumulate due to the gaming of risk models and measures (among many other factors). Despite being implemented to help reduce risk, these approaches have done little to mitigate catastrophic and cascading risk to our systems. It is the primacy of the symbol over the thing, and a lack of recognition for how measures and models are gamed, that leads us astray here.
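A toy sketch of the car-lot example makes the pattern easy to see; the deal counts and margins below are invented for illustration:

```python
# Toy sketch (numbers invented) of the car-lot example: when the bonus targets
# units sold, staff discount heavily to close deals, so the tracked metric
# improves while the thing it stood for (profit) falls.

def month_results(deals: int, avg_margin: float) -> tuple[int, float]:
    """Return (cars sold, total profit) for a month."""
    return deals, deals * avg_margin

# Before the bonus: fewer deals, healthy margins.
before = month_results(deals=40, avg_margin=2000.0)   # (40, 80000)
# After the bonus: staff chase volume by cutting prices.
after = month_results(deals=55, avg_margin=1200.0)    # (55, 66000)

print(f"Cars sold: {before[0]} -> {after[0]}")           # metric goes up
print(f"Profit:    {before[1]:.0f} -> {after[1]:.0f}")   # outcome goes down
```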
Finally, our fixation on what we can measure leads us to ignore what we cannot. In an attempt to move away from uncertainty or to approach things more scientifically, there is a tendency within policy and leadership circles to focus on what is measurable. As noted previously in this article, this makes some sense; a clear target provides both direction and a way to evaluate progress toward a goal. This sort of thinking is even embedded in popular approaches to developing goals. The SMART criteria, widely used in management programs and taught for developing objectives as part of the Incident Command System, center on this concept as well, with “M” standing for “Measurable”: a target should be quantifiable so progress can be evaluated. We love to measure; it makes things feel more solid, more controllable. The issue isn’t that measurement shouldn’t be a tool in our toolbox, or that it isn’t immensely useful; it is that fixating only on what is measurable causes us to leave out warning signals and critical information. This fixation has been dubbed the McNamara fallacy and was described as follows by Daniel Yankelovich, a social scientist:
“The first step is to measure whatever can be easily measured. This is OK as far as it goes. The second step is to disregard that which can't be easily measured or to give it an arbitrary quantitative value. This is artificial and misleading. The third step is to presume that what can't be measured easily really isn't important. This is blindness. The fourth step is to say that what can't be easily measured really doesn't exist. This is suicide.” - "Corporate Priorities: A continuing study of the new demands on business" (1972)
If you’ve worked within an organization or association of any moderate size, you can likely call up personal examples where you saw each of these steps. Within emergency management, the second step is particularly rampant when it comes to pseudo-measurement. The number of risk, hazard, and vulnerability analyses with largely abstract measurements is as mind-numbing as it is frightening. Often these analyses are filled with rough intuitive guesses made by planners during workshops, are packed with data that isn’t properly sampled to represent the indicator or is based on faulty assumptions, and combine all sorts of incompatible measures into a single indicator. All in an effort to have a neatly color-coded chart or map, or to break down an incredibly complex concept, such as vulnerability, into a single number, as if combining all that information into one number is scientific because it is precise and measurable (once again, confusing precision for accuracy).
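As a hypothetical illustration of why a single composite score deserves skepticism, consider how much the answer can depend on arbitrary weighting choices; the regions, indicators, and weights below are all invented:

```python
# Hypothetical example: three incompatible indicators rolled into one
# "vulnerability score." The regions, values, and weights are all invented.

regions = {
    # (flood exposure 1-5, median income percentile 0-1, workshop guess 1-10)
    "Region A": (4, 0.30, 2),
    "Region B": (2, 0.80, 9),
}

def composite(scores, weights):
    """Weighted sum of indicators squeezed onto made-up 0-1 scales."""
    flood, income, guess = scores
    normalized = (flood / 5, 1 - income, guess / 10)  # higher = more vulnerable
    return sum(w * x for w, x in zip(weights, normalized))

for weights in [(0.5, 0.3, 0.2), (0.2, 0.3, 0.5)]:
    worst = max(regions, key=lambda r: composite(regions[r], weights))
    print(f"Weights {weights} -> most 'vulnerable': {worst}")

# The "most vulnerable" region flips with the weights, even though nothing
# on the ground changed.
```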
This is not to say the act of going through that analysis isn’t extremely valuable or that most of the data itself isn’t useful, but rather, it is unrealistic to imagine that we can accurately represent incredibly complex processes in our world with a single model or measure. The issue here isn’t the use of models and measurements. Instead, what makes us vulnerable is a fixation on what is measurable, ignoring what is not, and mistaking precision for accuracy. It is far more robust to capture what we do know and be extremely aware and frank about what we do not, including the limitations and biases of our models.
Lessons Learned - Making the Wrong Useful
While this article strives to create some degree of caution and skepticism around models, it would be foolish to suggest that these tools are not incredibly useful. There is rarely a disaster response I’ve been on that doesn’t utilize, and effectively so, some sort of model. Weather forecasts are routinely used to inform disaster response activities, hazardous plume modeling is deployed for both preparedness and response planning, and even mass care activities (sheltering, feeding) can be forecasted to some degree by using socioeconomic data. I’m not against models; I’m against elevating them beyond their usefulness to the point they make us vulnerable and self-assured in our thinking. With that in mind, I’d like to propose some useful concepts and principles to avoid these pitfalls.
The first principle is embedded in the phrase that defines this article: “all models are wrong, but some models are useful.” By recognizing that models are, by their very nature, wrong because they leave information out, we are empowered to do two things. First, we can approach each model ready to ask, “in what way is this model wrong?” Starting with skepticism allows us to properly orient to what is left out of the model or analysis, paying attention first to its blind spots. If you cannot identify what the model leaves out, you either don’t know enough about the model or it is modeling something extremely simple (e.g., poker odds). If the person who built the model cannot tell you what it leaves out, they either do not understand the thing they are modeling or are not willing to share its shortcomings; these shortcomings exist in any model that deals with any degree of complexity greater than casino games. The more aware you are of a model’s shortcomings, the more useful the model becomes, because you can apply proper skepticism to what it may get wrong.
Secondly, this phrase encourages us to practice intellectual humility when engaging with models and the plans built on them. The most practical way to do this is to care just as much about what you will do if the model is wrong as you do about it being correct. Instead of only evaluating the likelihood of the model being correct about a particular issue, consider how to manage the consequences of either a false positive or a false negative. Even if you’re 95% sure a model is correct, if the consequence of it being wrong is death and ruin, it isn’t a bet worth taking, especially if the upside of it being correct isn’t equally impactful. Practicing intellectual humility means accounting for the consequences of our decision-making, or modeling in this case, being incorrect. This lesson became apparent to me early on in my disaster response career.
During my first major response, I had a choice to locate a shelter at a school that was very close to the area impacted by a wildfire or at a stadium that was roughly 20 minutes away. While not a long commute, the stadium would add time to every trip a survivor or responder took during the incident. I was assured that, based on forecasting work, it was highly unlikely the smoke would move to impact the school. Despite these assurances, I decided the risk of moving an entire shelter of recently evacuated people was simply too high compared to the potential benefit of saving 20 minutes from everyone’s commute. The school was instead used for a response coordination center, which had to move operations shortly after due to wildfire smoke. In this case, I had literally no idea what the true odds of the smoke coming were, but I certainly understood the consequences that would follow if it did.
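The arithmetic behind that kind of call can be sketched out even with made-up numbers; aside from the 95% figure carried over from the earlier example, everything below is invented to show the structure of the comparison rather than any real estimate:

```python
# Back-of-the-envelope sketch of deciding under asymmetric consequences.
# All figures are invented, illustration-only "cost units."

p_forecast_correct = 0.95   # assume we are quite confident in the forecast

# Option 1: trust the forecast and take the closer site.
upside_if_right = 20        # small, steady convenience gain
downside_if_wrong = 10_000  # relocating a full shelter mid-incident

# Option 2: hedge and take the farther site, paying a small cost either way.
certain_cost = 20

expected_cost_trust = ((1 - p_forecast_correct) * downside_if_wrong
                       - p_forecast_correct * upside_if_right)
expected_cost_hedge = certain_cost

print(f"Expected cost if we trust the forecast: {expected_cost_trust:.0f}")  # ~481
print(f"Expected cost if we hedge:              {expected_cost_hedge:.0f}")  # 20
# Even at 95% confidence, the rare but severe downside dominates the modest upside.
```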
The second principle will be much shorter to explain but is equally important: be careful about the desire to find what you’re looking for. While this seems obvious (we’re all subject to confirmation bias and motivated reasoning), it is especially important when dealing with data and models because they feel solid and scientific. Things that feel solid and scientific are extremely tempting when dealing with highly complex or uncertain situations. We need to keep in mind that people are inherently uncomfortable with uncertainty (uncertainty avoidance), we often try to ignore complexity and second-order effects (myopia), and we tend to practice unrealistic optimism in the face of strong negative events (optimism bias). These tendencies are actually enabled by having more data, and the amount of available data is growing dramatically year over year. With enough data, you can find nearly anything you want to find. Consider the absurd correlations captured by this website, almost all with far stronger correlations than you’ll find in most academic literature. This tendency to find what we want to find is even stronger when it comes to issues that are moral or political in nature. One clear lesson from research on morality and reasoning comes from how we judge evidence. With evidence that supports our stance we tend to ask “can I believe this?”, and with evidence that contradicts our stance we tend to ask “must I believe this?” With motivated reasoning, our biases, and enough data, we can find evidence to support nearly any position we wish to take. A commitment to finding the truth as a paramount value, even when we don’t like our results, is critical to navigating both the noise and political agendas.
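A quick, hypothetical demonstration of “with enough data, you can find nearly anything”: generate many unrelated random series and simply keep the best-looking correlation.

```python
# Sketch of correlation-hunting: nothing here is causal; the relationship is
# manufactured purely by searching through enough random candidates.
import random
from statistics import correlation  # Python 3.10+

random.seed(42)
n_points, n_series = 10, 1000
target = [random.gauss(0, 1) for _ in range(n_points)]
noise_series = [[random.gauss(0, 1) for _ in range(n_points)]
                for _ in range(n_series)]

best = max(abs(correlation(target, s)) for s in noise_series)
print(f"Strongest correlation found among {n_series} random series: {best:.2f}")
# With short series and enough candidates, it is common to find at least one
# pairing with |r| > 0.8 despite there being no relationship at all.
```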
We now come to the issue of using measurements to track and reward progress. As noted, measures will naturally become gamed, don’t represent the entire picture, focus too heavily on what can be quantified while tending to ignore what cannot, and may cause us to focus on pseudo-precision over accuracy. These challenges should be considered natural and inherent, something to be expected and accounted for when using measures. In addition to recognizing these tradeoffs, there are a few practical techniques as well. We should avoid devaluing both qualitative information and the expertise gained from actually doing the work, as opposed to theorizing about it. There is a wealth of research showing the importance of operational experience in managing crisis, complexity, and risk (see recognition-primed decision making, for example). We also must continue to practice intellectual humility by recognizing that our attempts to measure and direct progress may produce unintended consequences.
Lastly, it is often far more useful to focus on the broad direction than on goal posts. While many management processes attempt to set a measurable objective to determine movement and success (e.g., the SMART approach), progress in the real world often isn’t linear, and we are not nearly as good at planning the future as we would like to believe. Efforts often stall, have starts and stops, and at times leap ahead when favorable and unforeseen circumstances align. By focusing more on goal posts than on the broad direction, we are operating with the subtle belief that we can forecast with high accuracy how both our actions and the actions of the rest of the world will unfold. By focusing instead on our broad direction over time, we account both for our intent to move in some direction and for the fact that the world gets a say in how that goes. Furthermore, this allows us to account for shortcomings in our measurement approach, as even a ruler with inaccurate markings can tell you whether something is growing. This isn’t to say I don’t set clear goal posts when I design an objective; I do, but I do not elevate those goal posts beyond what they are: a rough target set with imperfect knowledge and judgment.
Hopefully, through this article you’ve been able to see both the “baby” and the “bathwater” when it comes to using models and measurements to guide our decision-making and planning. If it isn’t clear by now: these tools are incredibly useful if approached with the proper skepticism and knowledge of their limitations. Doing so takes both intellectual humility and courage, as it often requires us to challenge held assumptions and sit with uncomfortable uncertainty, neither of which is particularly fun. Although uncomfortable, it is far better than the alternative of failing to achieve our mission, falling short of what we are capable of, and setting up our teams for failure.