Noise | Shortform

1-Page Summary

Noise, by Daniel Kahneman, Olivier Sibony, and Cass R. Sunstein, is about how to improve the judgments that affect some of the most important aspects of our lives, including our justice system, medical care, education, and business decisions. As the title suggests, the book focuses on noise, which the authors define as unexpected and unwanted variance in human judgments. The authors argue that if we can understand what noise is, then we can reduce it—and thereby drastically reduce unfairness, loss of money, and even loss of life.

Noise draws on the authors’ expertise across multiple fields. Kahneman is a Nobel Prize-winning psychologist and the author of the award-winning Thinking, Fast and Slow. Sibony is a professor of strategy, a business consultant, and an author of business strategy books. Sunstein is a legal scholar and co-author of the award-winning Nudge. Noise also draws on decades of research and the authors’ own experiences as noise consultants in business settings. The book aims at a general audience, but it may be of particular interest to anyone in charge of an organization that depends on human judgment.

This guide is organized into two major parts. We begin by explaining what noise is and why it’s such a problem. To understand it better, we break noise down into different types and analyze the psychological tendencies that produce it. Then, we explore strategies designed to reduce noise. Throughout the guide, we expand on Noise’s arguments by connecting them to similar ideas from other works and to contexts ranging from financial trading to baseball scouting.

What Noise Is and Why It Matters

Before we can tackle the problem of noise, we have to understand what noise is, where it comes from, and why it’s a problem worth solving. This section begins by looking at how noise introduces error into judgments and how these errors lead to unfairness, financial loss, and physical harm. Then, we’ll look at how noise arises as a result of the way our minds work.

What Noise Is

The authors define noise as one of the two main errors in human judgment (the other being bias). To understand noise, we must first define judgment.

Judgment

A judgment is an attempt to mentally assign a value to something in order to choose a course of action. The authors break judgments down into predictions and evaluations.

Predictions aim to come as close as possible to some correct value or answer. The authors point out that insurance underwriters make predictions when they prepare quotes, and they’re aiming at a theoretical goldilocks number (just right). If the premium is too low, the company loses money. If the premium is too high, the company loses customers. (Shortform note: A similar predictive calculation comes into play in any field that, like insurance, requires balancing risk against potential profit. In a worst-case scenario, noise in these calculations can lead to a full-blown financial collapse.)

Likewise, doctors make predictions when they diagnose patients; they are trying to find the correct cause(s) of the patients’ ailments. The authors point out that you can measure the accuracy of a predictive judgment by comparing the prediction to the correct answer once it’s known. (Shortform note: Although in principle you can check a prediction’s accuracy by comparing it to the result, measuring predictive accuracy is a complicated task in its own right. Many predictions are too vague or too qualified for us to really judge them.)

Other judgments are evaluations; they have no correct answer, but instead require the decision-maker to balance pros and cons as best as possible. The authors point out that judges make evaluations when they decide how to sentence criminals, or whether to grant asylum. Similarly, teachers make evaluations when they grade essays. The authors contrast evaluative judgments with predictive judgments by pointing out that since there is no “correct” answer to an evaluative judgment, you can’t measure the accuracy of an evaluation in the same way you can with a prediction.

Critiques of Noise’s Statistical Basis

Noise uses relatively complex statistical concepts and formulas to argue that reducing noise is always beneficial because, mathematically, doing so reduces overall error. These statistical concepts rely on comparing erroneous values to a known, correct value—which doesn’t exist in evaluative judgments. As a result, some reviewers have questioned whether it’s accurate to call variances in evaluations “noise.” Meanwhile, other reviewers have critiqued the authors’ use and explanation of statistical concepts and terminology.

Readers should be aware of these criticisms. With that in mind, this guide forgoes these statistical foundations in order to focus on the larger, actionable principles that run throughout Noise.

As a general rule, judgments are neither purely factual nor purely opinion-based. A doctor reading the results of your blood panel isn’t making a judgment (though if she sees an anomaly, she might make a judgment about its cause). Likewise, your preference for one band over another isn’t a judgment either.

Given how important professional judgments are in so many areas of our lives, we would certainly hope and expect that these judgments are accurate, consistent, and error-free. We expect a certain amount of deviation from one judger to the next and from one case to the next, because judgments take place in situations where we expect some room for disagreement among qualified, well-informed, reasonable people.

(Shortform note: This is especially true in the case of evaluative judgments, which occur in situations where individual subjectivity comes into play. We know and accept that some teachers and some judges are more or less lenient than others. All the same, we also expect that our schools, courts, and other public institutions should be fair and consistent. One teacher might give a paper a B+, whereas another might give that same paper an A-. But if the same student paper receives an A from one teacher and an F from another, something has gone wrong.)

(Shortform note: This guide occasionally uses the term “judger” or “judgers” as a generic way to describe anyone who makes a professional judgment as defined above. We use this term to avoid confusion with judges as the people who preside over courtrooms (though by our definition, these judges are also judgers).)

Noise (and Bias)

To improve judgments, the authors argue that we need to reduce error as much as possible by correcting for noise and bias. The authors use the following metaphor to explain noise and bias: Think of a judgment as a target at a shooting range, where inconsistency = noise and inaccuracy = bias.

If you see a target with all the shots grouped tightly around the bullseye, you know that the shooters were both accurate and consistent (no bias, no noise).
If you see a target with all the shots grouped tightly some distance away from the bullseye, you know the shooters were consistent but not accurate (no noise, but biased).
If the shots are generally centered on the bullseye, but not tightly grouped, you know the shooters were generally accurate but inconsistent (not biased, but noisy).
If you see a target with the shots spread widely and not centered on the bullseye, you know the shooters were both noisy and biased.

The authors make the further point that if you flip over the targets so that you can see the shots but not the bullseye, you can no longer detect bias or accuracy, but you can still easily see noise, because noise refers to the spread of the shots. That means that you can detect and correct for noise without knowing the correct answer to a prediction. It also means that you can detect noise in evaluative judgments, which, as we’ve seen, are situations where there is no correct answer by which to measure the quality of the judgment. In both cases, you can also address noise without needing to know whether the judgments were biased or not. That’s because noise is the degree of inconsistency between one judgment and the next.

Don’t Mistake Low Noise for Accuracy

It’s important to keep in mind that reducing noise is not a matter of improving accuracy per se—it simply means reducing variation. This is one way in which the target shooting metaphor could be misleading. We might be tempted to think that reducing noise would get us closer to the bullseye, when in fact, reducing noise only means bringing our shots closer to each other. They might still be off the mark.

For example, in the image above, it’s possible that after reducing noise, all of the shots might converge around the shot marked “A” in the graphic, rather than around the bullseye as we would hope. In that case, we’d still need to improve our overall aim by reducing bias or finding other ways to be more accurate.

In The Signal and the Noise, Nate Silver provides further caution against conflating noise reduction with accuracy. Silver uses a graphic almost identical to the one in Noise, but in his case, the spread of the shots represents precision—the tighter the grouping, the more precise the forecast. The problem, he says, is that when we look at forecasts, we tend to mistake precision for accuracy with sometimes devastating results—as in the 2008 financial crisis. High accuracy (in other words, low noise) creates the illusion that forecasters are on target and thereby masks both uncertainty and bias.

While noise and bias contribute equally to overall error, the authors stress that noise is a more pertinent problem than bias because it’s less recognized and less understood. They argue that as a society, we realize that bias is an issue and try to prevent or correct for it. The same isn’t true of noise. They also suggest that noise is harder to understand than bias because it occurs on a statistical level (you need a certain number of shots before you can see how spread apart they are) which is different from how we think. We’ll explore this idea at length later.

We might be tempted to think that noise averages out over time. The authors argue that it doesn’t. They point out that if the target being aimed at is a just sentence, an accurate diagnosis, or a prudent business decision, then every miss is costly, and these costs don’t cancel each other out—they compound each other.

Is “Noise” a New Idea?

The authors argue that noise, as they define it, is a new and unexplored idea. Some of Noise’s critics suggest that the book’s arguments are just a repackaged version of previous work by other authors or of common-sense ideas.

To be sure, the ideas in Noise do draw on similar work from other sources. While the definition of noise as variance specific to human judgment does seem to be novel, the broader idea of noise as statistical variance isn’t. For example, Fischer Black identifies noise (which he contrasts with information) as a fundamental component of several economic models. Similarly, Nate Silver defines noise as junk data (as opposed to signal, by which he means useful information or meaningful patterns) in his treatise on how to improve our predictions.

Moreover, Noise builds extensively on ideas that Kahneman previously explored in Thinking, Fast and Slow. As we’ll see, many of the underlying sources of noise trace back to errors and biases outlined in that book. In fact, we could even think of Noise as an exploration of what happens when the thinking errors from Thinking, Fast and Slow manifest on a larger scale in systems and organizations.

Three Types of Noise

Because we have defined noise as the amount of variance in outcomes, it would be easy to think of noise as random. But it isn’t. Once we know what to look for, we can see that noise comes in three main types: level noise, pattern noise, and occasion noise.

1) Level noise occurs when one person’s average judgments differ from the average person’s average judgments in a consistent way. For example, some teachers grade more or less harshly than others over time. Similarly, some economic forecasters are more or less optimistic than others over time. The key idea here is that level noise refers to the overall patterns of each judger compared with the average overall patterns of all judgers. (Shortform note: Level noise may not be consistent over time; in fact, it may be noisy itself. One study has shown that graders inflate scores over time because they mistake their growing comfort with the act of grading for an increase in the quality of the materials being graded. An effect like this doesn’t invalidate Noise’s point so much as it demonstrates how complicated the problem is.)

2) Pattern noise is the deviation that occurs when a judger is unusually affected by a specific situation for one reason or another. For example, a forecaster might typically be more optimistic than most, but a specific scenario (for example, evaluating a startup company) causes her to be more pessimistic than most of her peers would be about the same case.

Pattern noise occurs as the result of people’s personalities and unique experiences. Some of this noise is stable over time. Conversely, some pattern noise is transient—the result of current or recent circumstances.

For example, maybe our forecaster lost money on startups earlier in her career, and now she is always cautious about them. This is an example of stable pattern noise.
Or, maybe our forecaster read an article this morning about a different startup that failed spectacularly, and so she’s feeling cautious about startups right now; yesterday she might have felt differently. This is an example of transient pattern noise.

(Shortform note: There’s some overlap between the concept of transient pattern noise and the concept of occasion noise outlined below. Though the authors don’t say so explicitly, the difference seems to be that transient pattern noise results from factors specific to an individual, whereas occasion noise consists of more universally applicable factors that affect everyone in similar ways.)

3) Occasion noise describes the variability within a single person caused by numerous seemingly random factors. The authors point to studies showing that judgments can be affected by any of the following:

The weather: One study shows that college admissions officers weigh academic credentials more heavily on cloudy days and non-academic factors more heavily on sunny days.
Mood (and sports): Several studies show that judges sentence more harshly following a loss by their local football team and more leniently following a win. Presumably, their sports-induced mood is influencing their judgment.
Time of day: Studies have found that doctors are more likely to prescribe opioids toward the end of the day than earlier in the day. It’s possible that when doctors are tired, stressed, and rushed, they make diagnostic mistakes and possibly reach for a quick fix in pill form.
The order in a series of judgments: When judges have granted asylum to several people in a row, they become increasingly likely to deny asylum (and vice versa), likely due to an unconscious attempt to maintain balance.
The order that information is presented: If you hear that a politician is smart, driven, charismatic, and ruthless, you probably form a different picture than if you heard about a politician who is ruthless, charismatic, driven, and smart.

How to Fight Occasion Noise

Occasion noise is tricky. Like pattern noise, it can be hard to predict it in advance or even to notice it while it’s happening. Plus, you can’t exactly standardize the weather to make sure everyone gets the same judgments. That said, there are a few techniques that might help minimize the influence of occasion noise:

Some types of occasion noise (such as that arising from the order of information) can be controlled with proper decision-making strategies like the ones listed later in this guide.

Other occasion noise (like tired doctors overprescribing opioids) might be prevented by watching out for information overload and excessive stress; though that’s easier said than done in many professions.

We might be able to mitigate factors like weather, mood, and time of day by having multiple judgers assess a situation independently before comparing notes. The idea is that different judgers will be subject to different occasion factors and will balance out. Ideally, the judgers will also follow a procedure that minimizes occasion noise by telling them what information to pay attention to (see the sample hiring procedure later in this guide to learn how).

The Three Types of Noise in Action

Though the authors break noise into three types for analytical purposes, it’s worth pointing out that in practice, any or all of the three types can be at play in a given situation. To get a sense of how that works, imagine the following. You’ve been convicted of shoplifting. Judge Thompson will decide your sentence.

Judge Thompson, on average, delivers much lighter sentences than his colleagues. This is level noise—and reason for you to be optimistic about your punishment.

His parents own a small retail business that has had serious problems with shoplifting over the years. He therefore sentences shoplifters much more harshly than do his peers. This is pattern noise—and bad news for you.

He just returned from a relaxing holiday and is in a great mood this morning. This is occasion noise—maybe you’ll get a break after all.

If your case had gone to a different judge, all of these variables would be different—and so would your sentence. This is one way the justice system is noisy. But if you’d committed a different crime, or even if you’d caught Judge Thompson on a different day, your sentence would likewise be different. That’s another way the justice system is noisy. And the same holds true for any system relying on human judgments.

Where Noise Comes From

Once we understand what noise is and why it matters, we can move toward finding ways to reduce it. But to do so effectively, we need to look more closely at where noise comes from. We’ve already seen a few sources of noise–such as personal biases (preferences, backgrounds, affiliations, beliefs, and so on) that lead to level noise and pattern noise, as well as the more random factors (mood, weather, the order or timing of decisions or information, and so on) that contribute to occasion noise. In addition, noise occurs because of the way our minds see the world and because of the way we act in group situations.

(Shortform note: A lot of the ideas in this section reflect Kahneman’s previous work in Thinking, Fast and Slow. Noise acknowledges these connections but mostly glosses over them. We spell them out more clearly below since they help clarify the ideas in this section.)

Psychological Source #1: Cause and Effect Thinking

The authors argue that a major reason why our judgments are noisy is that we think about the world in terms of cause and effect. This is also why we don’t notice noise and have a hard time understanding it when it’s pointed out: noise is statistical and made up of many cases, whereas our minds tend to consider one case at a time.

This preference for causality is misleading because it’s shaped through the lens of hindsight. Once an outcome is known, we examine what we know about the situation and attribute one or more of those factors as the cause of the outcome. The authors point out that most events are neither completely surprising nor completely expected. We don’t give much consideration at all to these “normal” events, and as a result, it seems like they would have been completely predictable–when in reality, we couldn’t have reliably predicted them had we tried.

Because we think the causes of everything that happens around us are obvious and inevitable, we think we can predict things. We don’t see just how arbitrary and contingent most events are (at least from the perspective of prediction). The authors explain that this is because the relevant causes often only become known at the same moment the outcome is known.

The Narrative Fallacy

These ideas are related to the narrative fallacy from Thinking, Fast and Slow, whereby we explain occurrences as though they fit a coherent story, when in fact they may have been completely random. Imagine that a company conducted a round of layoffs by firing employees whose names were drawn out of a hat. If you didn’t know how the layoffs were decided, and if one of the fired employees was your friend, you might recall an argument your friend recently had with her boss and assume the boss had it out for her.

If, on the other hand, the employee in question was someone you didn’t like much, perhaps you’d attribute the firing to some deficiency in skill or character. In either case, you would probably conclude that the firing made logical sense (even if it was unfair) and believe that it was foreseeable; in reality, it was entirely random.

Psychological Source #2: Matching Operation

Another source of noise arises from an intuitive matching operation by which we attempt to predict or evaluate something by comparing it to similar things we have more information about. This operation introduces noise because of the oversimplifications inherent in the procedure, as well as the limit on our ability to discern quantitative differences with any great resolution. (Shortform note: This is a type of heuristic, a concept explored in Thinking, Fast and Slow. A heuristic is an operation our mind performs to solve a difficult problem quickly. Specifically, the mind tries to substitute something easier or more familiar to generate an answer.)

For example, you can tell if it’s sunny or cloudy out. That’s a qualitative judgment: Are there clouds in the sky or not? You can generally tell if it’s hot or cold, too. But if you were exposed to a series of different temperatures and asked to rank them from coldest to warmest, you would quickly make mistakes. (According to the authors, studies have found that we can rank things into about seven levels of quality or intensity before we start to make ranking errors.) You’ll do okay if you can directly compare one item to another, but given a set of items and asked to rank or categorize them, you’ll make errors more easily than you’d think.

Finally, the authors point out that any type of judgment that requires assessing things on a scale becomes noisier as the scale becomes less defined. Without proper context and a shared frame of reference for what values mean and how they should be assigned, judgers are forced to guess in a way that makes the judgment arbitrary. Since each person guesses differently, the scale becomes noisy. (Shortform note: We’ll explore ways to improve rating scales later in this guide.)

Not only do we each produce our own noise through the way we understand the world, but when people work in groups to reach judgments, social factors add new sources of noise.

For one thing, popularity (real or perceived) affects how people view information. An idea that receives public support or popularity early on is more likely to succeed, regardless of the idea’s inherent merit. This phenomenon is called an information cascade. When one person shares an opinion, the next speaker is more likely to agree with that person unless they have good reasons not to. The effect becomes stronger with each person who gets on board with the initial opinion. When a group is making a judgment, many people might start out undecided or with mixed feelings. Their decision, therefore, is usually determined by the opinion that began the information cascade.

Because most people are in agreement at the end of the process, we think the outcome was inevitable—but it wasn’t. Given a different starting point, a different outcome could have occurred. This isn’t obvious to us because each real-world situation (like this one) only plays out once.

Groups are also susceptible to polarization, which means that members move to a more extreme version of their initial opinions. If each member of a hiring committee feels mildly enthusiastic about candidate A, by the end of the meeting, they might now feel passionately excited about candidate A. Conversely, if some members feel mildly enthusiastic about candidate A while others feel mildly enthusiastic about candidate B, then the polarizing effect could lead to a stalemate in which half the committee strongly supports A while strongly opposing B, and vice versa.

Overcoming Groupthink

Information cascades and polarization can also feed into each other. Through random chance, the first speaker influences the group in a certain direction (starting an information cascade), and then the polarization effect ensures that the group moves decisively in that direction—even if no member of the group felt particularly decisive about any direction when they came into the meeting. The interaction between these two phenomena might be one source of what’s traditionally been called “groupthink.”

In Originals, Adam Grant argues that in a corporate setting, groupthink also directly results from the company’s attitudes toward dissenting voices. In these settings, being transparent, inviting dissent, and choosing leaders who genuinely welcome criticism can all help reduce the risk of groupthink when making decisions. These principles are worth keeping in mind when we discuss the wisdom of crowds later in this guide. You can only tap into crowd wisdom when a group consists of people with different viewpoints and when those people feel free to voice their ideas.

How to Reduce or Eliminate Noise

Now that we understand what noise is and where it comes from, we can look at steps to reduce or eliminate noise from judgments. The authors of Noise offer a few solutions, including mechanical judgment tools (models and algorithms) that can replace or augment human judgment as well as a set of suggestions for reducing noise in human decision making.

Detecting and Measuring Noise

Typically, the first step in reducing noise is figuring out how much noise there is in the first place. This step is necessary because administrators tend to believe that their organizations make judgments consistently, and until they can see the problem firsthand, they may be resistant to change.

To determine how much noise is present in a company, organization, or system, the authors outline a noise audit process they use when consulting with businesses. The book includes an appendix with detailed guidelines for conducting a noise audit. The general gist is that an organization would give a set of sample cases to all of its members whose job it is to make judgments about such cases. For example, an insurance company would give a set of sample claims to all of its adjusters. The judgers being audited complete their judgments independently, and then the results are compared to see how much variability there is throughout the organization.

Mechanical Judgments

Once noise has been detected, there are several options for reducing it. One option is to remove human judgment from the equation altogether. To do so, decision-making can be handled via statistical models or computer algorithms.

Though they touch on several mechanical judgment methods (which we’ll explain briefly below), the authors are more interested in reducing noise in human judgment rather than replacing human judgment with mechanical judgment. This is in part because, as the authors point out, mechanical predictions currently can’t do anything humans can’t do—they just do it with better predictive accuracy. The authors argue that this improved accuracy mostly results from the elimination of noise, and so we might see the efficacy of mechanical judgments more as a demonstration of the benefits of noise reduction rather than as a blanket solution to the problem of noise.

(Shortform Note: Although the authors come down in favor of improving rather than replacing human judgments, they perhaps don’t make this point as clearly as they could, given the way some reviewers focus on the dangers of algorithms as a major criticism of the book’s recommendations. Indeed, the authors spend a lot of time explaining models and algorithms and defending them from potential criticism, which perhaps creates a misleading impression of how central they are to Noise’s proposed course of action. To keep the focus on ways to improve human judgment, we’ve kept the following discussion of models and algorithms brief and to the point.)

Statistical Models

One way to make predictions is by using a statistical model. A statistical model is a formula that uses weighted variables to calculate the probability of an outcome. For example, you could build a statistical model that predicts the likelihood of a student graduating college by assigning weights to factors like high school GPA, SAT scores, number of extracurricular activities, whether the student’s parents graduated college, and so on.

Studies have shown that simple statistical models that apply weighted averages of relevant variables consistently outperform human predictive judgments. In fact, the authors provide studies suggesting that any statistical model, whether carefully crafted or cobbled together at random, can predict outcomes better than humans can.

The authors argue that this superior performance is simply because statistical models (and by extension, algorithms) eliminate noise. Even the crudest or most arbitrary model has the advantage of being consistent in every single case. And while human judgers can weigh subtle subjective factors that a model can’t take into account, the authors suggest that this subjectivity tends to add more noise than predictive clarity. As we saw earlier, we’re not very good at recognizing which factors are relevant to our predictions.

Computer Algorithms

Another more recent and more complex form of mechanical judgment is the computer algorithm. The authors explain that computer algorithms build on the basic idea of statistical modeling, but they also come with additional benefits that improve their accuracy. Because they take into account massive data sets and can be programmed to learn from their own analysis, algorithms can detect patterns that humans cannot. These patterns can form new rules that improve the accuracy of the judgments.

The authors acknowledge that algorithms are not perfect—and that if they are trained using data that reflects human bias, they will reproduce that bias. For example, if an algorithm built to predict criminal recidivism is built from a data set that reflects racial biases in the justice system, the algorithm will perpetuate those racial biases. (Shortform note: For example, after years of development, Amazon discovered that its recruitment algorithm systematically favored men over women. Likewise, Facebook’s advertising algorithms have come under fire for helping to spread everything from fake news to hate speech.)

Combining Mechanical and Human Judgment

Because the authors are most interested in finding ways to improve human judgment, they don’t give much attention to the option of combining human and mechanical judgment. This hybrid approach has real-life precedents and may sometimes be the best way to tackle a problem.

For example, after the success of Michael Lewis’s Moneyball, some baseball teams began favoring rigorous statistical analysis over traditional scouting when deciding which players to acquire. At the time, there wasn’t a great statistical way to measure players’ fielding skills, so some teams neglected defense in favor of more easily measured offensive skills. In practice, these teams gave up so many runs that they offset the benefits of their new statistical approach.

In more recent years, most teams have adopted statistical modeling techniques, but the most successful teams have combined these models with old-fashioned human scouting. This hybrid approach works because scouts can account for things that models can’t, such as the mental factors needed to succeed in professional baseball.

Baseball provides a counterargument to Noise’s cautions against human subjectivity. But the key here is that teams have learned how to combine human and mechanical judgments in ways that maximize the strengths and minimize the weaknesses of each.

Decision Hygiene

Despite the potential advantages of mechanical judgments, the authors are most interested in finding ways to reduce noise in human judgments. They say that the best way to improve human judgments is by implementing “decision hygiene”—consistent, preventative measures put in place to minimize the chance of noise. Decision hygiene consists of a loose set of suggestions, practices, and principles which we explore below. (Shortform note: With one exception (see the Sample Hiring Procedure below), the authors don’t lay out a specific, systematic course of action. Presumably, organizations should strive to implement as many of the following suggestions as are relevant and practicable.)

Think Statistically

Recall that our normal, causal way of thinking is prone to errors and biases that manifest as noise. To make our thinking more accurate, we have to take a statistical view. The authors suggest that instead of treating each case as its own unique item, we should learn to think of it as a member of a larger class of similar things. Then, when predicting how likely an outcome is, we should consider how likely that outcome is across the whole class. Returning to an earlier example, if we’re trying to predict the likelihood that a student will graduate from college, we first need to know what percentage of all incoming college students end up graduating from college.

How to Think Statistically

Our failure to think statistically is a major theme in Thinking, Fast and Slow. In that book, Kahneman offers a more detailed look at thinking errors of this type and suggests ways to overcome them. As is also suggested in Noise, the basic idea is to take base probabilities into account.

In The Signal and the Noise, Nate Silver suggests another approach to statistical thinking based on a statistical formula known as Bayes’ Theorem. When making a prediction using Bayes’ Theorem, you start with a preliminary guess about the likelihood of an event. Ideally, this guess is based on hard data, such as a base probability. Then you make some calculations in which you adjust the starting probability in the face of specific evidence relating to the thing you are trying to predict. Finally, you repeat this process as many times as you can, each time starting with your most recently updated probability.

This approach has two advantages. First, it explicitly accounts for the noise in human judgment by building human estimates and predictions into the formula. Second, it calls for repeated testing of a prediction or hypothesis in order to improve accuracy in response to updated evidence. Interestingly, Silver argues that a Bayesian approach would have prevented the replicability crisis that has recently plagued the sciences—including some of the studies in Thinking, Fast and Slow.

Choose (and Train) Better Judgers

The authors argue that it’s possible to improve the quality of human judgers. We can do so by finding better judgers in the first place and by helping judgers improve their techniques and processes.

There are two factors to keep in mind when identifying good judgers. Some fields deal with objectively right or wrong outcomes; in these cases judgers can be measured by their objective results. However, as the authors point out, other fields are based instead on expertise, which can’t be measured with a metric. But judgers in any field can be assessed by their overall intelligence, their cognitive style, and their open-mindedness; these traits are correlated with better judgment skills. The authors emphasize, however, that intelligence alone doesn’t make someone a good judger. The other two traits are just as important, if not more.

The authors also note that some members of the general population are superforecasters, and their predictions are consistently more accurate than those of the average trained expert. Ideally, these are the people we should hire or appoint as judgers. The authors identify several traits exhibited by superforecasters that we can use to choose better judgers, or to better train the judgers already in place:

They are open-minded.
They are willing to update their opinions and predictions when new evidence arises.
They naturally think statistically; unlike most of us, it does occur to them to consider factors like base rates.
They break down problems and consider elements using probability rather than relying on a holistic “gut feeling” about the answer.

Hedgehogs and Foxes

Noise’s discussion of superforecasters draws on the work of Philip E. Tetlock and Dan Gardner. In Superforecasting, Tetlock and Gardner offer a particularly colorful description of what makes superforecasters so super: They tend to be foxes, not hedgehogs. The basic idea is that a person with a hedgehog personality tends to see the world through the lens of one big idea, they make snap judgments about things, and they’re extremely confident in their predictions. By contrast, a person with a fox personality tends to collect little bits of information about a lot of things, approach a problem slowly and from multiple angles, and be cautious and qualified about his or her predictions.

As you might guess, Tetlock and Gardner suggest that foxes make better predictors than hedgehogs. Luckily, the rest of us can practice fox skills, too. We can learn to recognize and avoid our own cognitive biases. We can generate multiple perspectives on a problem. And we can learn how to break down problems into smaller questions.

If these techniques feel familiar, that’s because they are essentially the same as many of the recommendations in Noise.

Sequence Information Carefully

Because judgments are subject to influence from information, contextual clues, confirmation bias, and so on, it’s important to carefully control and sequence the information that judgers receive. The authors provide a few guidelines for implementing this strategy:

As a basic rule, judgers should only be given what they need when they need it.
We must make sure independent judgments are in fact independent; if the person verifying the result knows the first person’s conclusion, he or she is more likely to verify it.
Finally, the authors suggest that judgers should document their conclusions at each step in the process; and if new information leads them to change their decisions, they should explain and justify why.

(Shortform note: It’s also important to consider how much information judgers receive. Both Malcolm Gladwell and Nate Silver point out that information overload leads to bad decisions, either because we don’t focus on what’s most important, or because we get overwhelmed and fall back on familiar patterns and preconceived notions.)

Aggregate Judgments

Another way to reduce noise, and to actually turn it into a positive, is by aggregating judgments. You can collect several independent judgments and then compare or average them; or you can assemble teams who will reach a judgment together. According to the authors, these techniques harness the wisdom of crowds, a demonstrated effect by which the judgments of a group of people tend, as a whole, to center on the correct answer.

This technique works best if you assemble a team whose strengths, weaknesses, and biases balance each other out. The idea is to get as many different perspectives on a problem as you can in hopes of finding the best answer somewhere in the middle.

(Shortform note: The authors say elsewhere that noise doesn’t average out, but that’s for a bunch of noisy decisions in a system; here we’re talking about averaging out opinions before a final decision is made and before any action is taken.)

One practical way to aggregate judgments within a typical meeting setting is the estimate-talk-estimate procedure:

First, each member of the group privately makes an estimate—some kind of forecast, prediction, or assessment.
Then each person explains and justifies his or her estimate.
After the discussion, each member then makes a new estimate based on the discussion. These second-round judgments are aggregated into a final decision.

Because this procedure requires that each person start with an independent judgment, it reduces the noise that comes from information cascades and polarization. At the same time, it balances individual psychological biases by encouraging outlier opinions to move toward the middle. (Shortform note: This estimate-talk-estimate procedure has drawbacks as well. For example, because its goal is to build consensus, it can discourage dissent and lead to a false sense of agreement much like the information cascades it is meant to avoid. Alternative approaches like policy delphi and argument delphi avoid this pitfall by aiming not at consensus, but at generating a wide range of dissenting perspectives.)

How to Make Better Judgments on Your Own

Most of the authors’ suggestions for noise reduction are targeted at organizations, but what if you want to improve your own judgments as an individual? Some of the suggestions in this section are simple enough to adopt as an individual. For example, you can practice thinking statistically or breaking down problems on your own. But how can you aggregate judgments if you are working alone instead of in a group?

The trick is to generate as many perspectives as possible before you make a decision or a prediction. One way to do that is to read as much as you can about the problem at hand. Find as many different perspectives and opinions as possible—remember, you are trying to replicate the benefit of crowd wisdom, which only works when you bring together a diversity of viewpoints.

Another way to generate alternate perspectives is to deliberately search for information that would disprove your prediction or your preferred course of action. This technique is called negative empiricism, and it gives you more perspective on a problem while also avoiding some of the logical fallacies you might otherwise fall prey to.

Break Judgments Into Smaller Components

The authors suggest that it’s easier to avoid noise when you break an overall decision into a set of smaller, more concrete subjudgments. Standardized procedures, checklists, guidelines, and assessments help here. For example, educators can reduce noise in essay grading by using rubrics. Asking the grader to assign individual scores to the paper’s originality, logical clarity, organization, and grammar before computing a final grade makes judgment easier. Breaking down a judgment in this way also helps make sure that every judger is following the same procedures and paying attention to the same factors.

The authors concede that this strategy isn’t perfect. They point out that in the field of mental health, the DSM—a book meant to aid and standardize mental diagnoses—has hardly reduced diagnostic noise. One reason is that psychiatrists and psychologists are likely to read signs and symptoms through the lens of their training and background. In other words, different theoretical understandings of the mind and of these kinds of disorders shape how different professionals understand the facts with which they are presented.

Are Some Fields Just Noisy?

The authors suggest that mental health diagnoses are inconsistent because of the different training and theoretical orientations of different mental health professionals. That’s true, but there’s also reason to think that mental health might be an inherently noisy field.

One reason for this is that mental health conditions overlap and influence each other: if you suffer from depression, there’s a good chance you also suffer from anxiety. Likewise, it can be difficult to separate mental health from physical health. Moreover, professionals disagree on the best practices for diagnosing and treating mental health issues, including basic questions such as whether a given set of symptoms is a disorder or just a difference.

These factors suggest that some fields might be more prone to noise—and more resistant to noise reduction—than others. That’s not to say that mental health care, for instance, can’t be made less noisy. Doing so just might require analysis and reform that is beyond the scope of the noise hygiene techniques we’re exploring here.

Use Rules and Standards

One way to break judgments into smaller parts is to implement rules and/or standards. (Shortform note: The authors introduce rules and standards as part of a larger discussion about the pros and cons of implementing noise reduction. We think it’s worth looking at rules and standards as noise hygiene strategies, which is why we’ve included them here.)

Rules offer explicit guidance typically tied to objective measures. For example, there is a maximum allowable blood alcohol content above which a driver can be charged with drunk driving.
Standards are suggestive guidelines that require some amount of subjective interpretation and implementation. For example, law enforcement officers are trained to recognize potential signs of impairment (e.g., erratic driving) and to issue field sobriety tests.

In deciding between rules and standards, the authors say we should first determine which will lead to more errors. They also point out that sometimes it isn’t possible to implement rules because the people making the rules can’t agree (for example, because of political or moral differences) or because the people making the rules don’t have the information needed to write an appropriate rule.

The authors further suggest that in some cases, the best approach is to combine rules and standards. Mandatory sentencing guidelines take this approach, setting a minimum and maximum sentence for a given crime (rule) and otherwise asking judges to determine a just sentence for each individual case (standard).

Second-Order Decisions

Rules and standards are examples of what Sunstein and Edna Ullmann-Margalit call second-order decisions—strategies we use to reduce our cognitive burdens when decisions are too numerous, too repetitive, too difficult, or too ambiguous to make one by one. Other second-order decisions include:

Presumptions, which are rule-like guidelines that allow the possibility of exceptions in some cases.

Routines, such as always brushing your teeth right before bed.

Taking small, reversible steps, such as pet-sitting for a neighbor’s dog before making the commitment to adopt a dog of your own.

Picking at random rather than choosing deliberately, such as throwing a dart at a map to decide where to go on vacation.

Delegating, such as allowing your partner to choose dinner tonight.

Heuristics, such as the matching operation described earlier in this guide.

Use Better Scales

As noted earlier, a lot of noise comes from our attempt to judge things using scales. If the scale is unclear, too complex, or inappropriate for the task, there will be noise. If the scale requires judgers to interpret or calibrate the scale themselves, there will be noise. Therefore, in cases where scales are useful or necessary, we need to design better ones.

The authors argue that as a general rule, comparative scales are less noisy than absolute scales. The authors give the example of job performance ratings, which are noisy in part because the traditional numerical scales are unclear and are interpreted differently from one reviewer to the next. What constitutes a “6” in “communication skills,” or in “leadership?” Without explicit guidance about what the numbers mean and how they correlate to the qualities they measure, each person will have a different understanding of how to score an employee.

Instead of evaluating employees in terms of an absolute number, the authors say it’s better to rank employees. For example, on an employee’s communication, ask whether their skills fell in the top 20% of the company, or the next 20%, and so on. As noted earlier, we are generally better at comparing things than at quantifying them in the abstract.

(Shortform note: Recall the earlier discussion of matching operations and the way our minds substitute an easier question in place of a more complex one. Without clear guidance, something similar probably happens with a vague rating scale, as we replace the question “How does X’s communication rate out of 10?” with something like “How impressed am I with X’s communication?” or “How clear do I find X?”)

A comparative scale also provides concrete anchor points and clear descriptions or markers for each point. A good anchor point correlates a specific value on the scale with a relevant example of the thing being evaluated (if you’re grading a paper and you know that a “C” grade represents average work, that’s your anchor point). To minimize noise, anchor points should be provided ahead of time so that each judger starts with the same frame of reference.

(Shortform note: Anchoring is another concept drawn from Thinking, Fast and Slow. The basic idea of anchoring is that an initial piece of information (for example, a suggested donation amount) has a major influence on the actions we take (in this case, how much we decide to donate). By suggesting that scales come with clear anchor points, the authors of Noise seek to take advantage of this psychological effect by using it to calibrate judgers’ assessments.)

Example: A Sample Hiring Procedure

To get a sense of how to apply these decision hygiene practices in a real-world setting, the authors provide an overview of Google’s hiring process. In brief, the process is as follows:

Determine what skills are most important to the position you are hiring for.
Develop scales to measure each candidate on each skill determined in step 1.
Interview each candidate multiple times with different interviewers (Google uses four interviews). The purpose of the interviews is to rate the candidates on their skills. Interviews must be conducted independently of each other (interviewers can’t compare notes yet).
The hiring team meets to discuss their results, review the data they have amassed, and, finally, to share their impressions and make an overall decision.

Putting It All Together

The above process synthesizes many of the suggestions we have explored in the second half of this guide. The authors point that out, but not in a clear and methodical way. The following analysis shows how Google’s procedure incorporates several decision hygiene techniques:

In step 1, the company breaks down a bigger judgment—Whom should we hire?—into smaller components.

In step 2, the company uses the insights into building better scales to ensure that they collect high-quality, consistent data

In step 3, the company carefully sequences information, making sure interviews are truly independent of one another. This sequencing is aided by setting clear rules and standards that govern the actual interviews and keep them consistent and on point.

In step 4, the company aggregates judgments by asking multiple interviewers to reach a decision together.

On a larger level, the process as a whole also sequences information by asking interviewers not to consider their subjective impressions until this last step. By doing so, the company leaves room for its hiring team to have personal reactions to candidates, but it makes sure those reactions are mediated by the hard data collected throughout the rest of the process and by the group wisdom of the multiple interviewers (each of whom has had a chance to independently form an opinion about the candidate).

While this procedure specifically describes a hiring process, the authors point out that the process can easily be adapted to other types of business decisions, such as whether to make an investment or whether to acquire or merge with a rival company. (Shortform note: The authors provide a detailed hypothetical scenario to show how to do this. To keep it simple, refer back to the principles outlined above: Break down the problem, figure out how to gather the data you need, keep careful control over the information-gathering process, and then aggregate the data and the resulting judgments.)

Shortform Takeaway: Improving Evaluative Judgments

As you’ve seen throughout this guide, a lot of the ideas contained in the book have been explored elsewhere, including in the authors’ own previous works. Yet there is at least one important takeaway from the book that does seem like a new idea, and that is the argument that we should treat evaluations the same way we treat predictions.

To explain that point further, recall that Noise breaks judgments down into two types: predictive (e.g., forecasting a stock’s future value) and evaluative (e.g., grading an essay). We’re accustomed to accepting that evaluative judgments are inherently subjective—there is no correct answer by which to measure their quality, and so it seems that there is no way to improve the quality and accuracy of evaluations. Certainly there is less literature on how to improve evaluations than there is on how to improve predictions.

Yet, if we accept Noise’s arguments that evaluations and predictions are the same kind of thing (judgments), that both suffer from noise to the same degree and for the same reasons, and that reducing noise is a good thing, then it follows that we can improve our evaluative judgments. We can do so by subjecting our evaluations to the same advice that many authors have already offered for making better predictions. It’s a point that Noise makes early on but warrants a highlight, because this insight appears to be truly original.

Exercise: How Has Noise Affected You?

We’re all affected by the judgment errors that noise leads to. These errors can reduce the quality and consistency of our legal system, educational institutions, workplaces, and more.

Think of a time you relied on a judgment made by an institution of some sort. Describe the situation. What was the context? What was the judgment?
What actions or outcomes did the judgment lead to? Remember, this could be a situation that benefited you or harmed you.
Now think about the ways noise may have affected the judgment. Don’t worry about accurately analyzing the situation; the point is to brainstorm about how noise may have affected the judgment you received.
Are there ways you could have avoided or counteracted the noise you encountered? How could the institution that judged you have reduced the noise that affected their judgment?

Exercise: Improve Your Own Judgments

Let’s explore how you might apply the ideas in Noise to improve the quality of your judgments.

Think of a situation where you make judgments with important stakes. Maybe it’s your job to make professional judgments. If not, you might think of a major personal decision involving a prediction or an evaluation, like how to invest your money, or where to live. Describe the judgment and its context.
Now, identify some kinds of noise that affect the judgment.
Which techniques or strategies could you use to improve your judgment(s)? How would they help, and how would you implement them?