CHP 7
In reality, of course, you know that punches in movies are actually just choreographed with little or no
actual contact. But knowing this doesn't stop System 1 from making automatic causal inferences. In
other words, it's a kind of cognitive illusion: System 1 will continue to infer causation even when System
2 knows it isn't really there.
Now imagine a red dot on your computer screen. A green dot moves towards it, and as soon as they
touch, the red dot moves away at the same speed and in the same direction. If the timing is right, you
can't avoid feeling like you saw a causal connection, a transfer of force. But if, instead, they don't touch
and the red dot only moves after pausing for a moment, it feels like the red dot moved itself.
Consciously, we know the dots are just pixels on a screen that don't transfer force at all. But we can't
shake the sense that we're seeing a direct causal connection in one case and not the other.
In addition, our minds are so prone to perceiving patterns that we often seem to find them even in
random, patternless settings—for example, we see faces in the clouds and animal shapes in the stars. In
the environments of our ancestors, there was a great advantage to finding genuine patterns, and little
downside to over-detecting them. (Is that vague shape in the shadows a predator, or nothing at all?
Better to err on the side of caution!)
The result is a mind that's perhaps a bit too eager to find patterns. For example, if you consider the
sequence "2...4...," the next number probably just pops into your head. But hang on—what is the next
number? Some people think of 6, while others think of 8 or even 16. (Do we add 2 to the previous
number, double it, or square it?) As soon as System 2 kicks in, we realize the answer could be any of
these. For a moment, though, the answer may seem obvious. System 1 completes the pattern with the
simple earnestness of a retriever bringing back a stick.
Even when our observations have no pattern at all, we can't help but
suspect some other factor at work. For example, suppose we map
recent crimes across a city, yielding something like the picture on the
left. Most of us would find it suspicious that there are so many
clusters: we'd want to know what's causing the incidents to collect in
some places and not others. But in fact this picture shows a
completely random distribution. Real randomness generates clusters, even though we expect each data
point to have its own personal space. This is called the clustering illusion, and it can send us seeking
causal explanations where none exist.
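If you'd like to see this for yourself, here is a small Python sketch (my own illustration, with an assumed grid size and point count) that scatters points uniformly at random and counts how many land in each cell of a grid:

import random

random.seed(1)   # fix the seed so the run is repeatable
GRID = 5         # divide the area into a 5 x 5 grid of cells
POINTS = 50      # scatter 50 points uniformly at random

counts = [[0] * GRID for _ in range(GRID)]
for _ in range(POINTS):
    col = random.randrange(GRID)   # every cell is equally likely
    row = random.randrange(GRID)
    counts[row][col] += 1

for row_counts in counts:
    print(row_counts)

With 50 points spread over 25 cells, the average is 2 per cell, but a typical run shows cells holding 5 or 6 points right next to empty ones. The clusters are real, yet nothing causes them.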
At some level, we know it's absurd to assume that one event caused another just because it happened
first; but it's a remarkably easy mistake to make on the fly. Often the problem is one of communication. A
report that two events occurred in sequence is often taken to convey that a causal relationship links
them. For example, suppose I say "A fish jumped, and the water rippled." It's fairly clear that I'm
suggesting the fish caused the ripples. Now suppose I say, "After meeting my new boyfriend, my
grandma had a heart attack." The speaker might just be reporting a sequence of events, but it's still
natural for an audience to seek a potential causal connection.
Complex causes
The causal stories that come naturally to us are often very simple. For example, "The cause of the fire
was a match." But in our complex world, many of the things we want to understand don't arise from a
single cause. There may be no answer to the question, "What was the cause?"—not because there was
no cause, but because there were too many interconnected causes—each of which played a part.
In fact, it's rare for anything to have a single cause. When we talk about the cause of an event, we usually
mean the one factor that is somehow most out of the ordinary. For example, if a fire breaks out, in most
ordinary contexts, the cause is a source of ignition like a match. But there are other contexts where the
presence of fuel or even oxygen might be the most out-of-the-ordinary factor. For example, imagine an
experiment in a vacuum under extremely high temperatures. If the researchers were depending on a lack
of oxygen to keep things from burning up, then the unexpected presence of oxygen would count as the
cause.
We can also distinguish between the immediate causes of events
and the distal causes that explain the immediate causes. For
example, a certain drought might have a clear immediate cause, such
as a long-term lack of rain. But it would be useful to know what
caused that lack of rain: perhaps a combination of changes in
regional temperatures and wind patterns. And these factors in turn
may be part of a global change in climate that is largely due to rising
levels of greenhouse gases. Each causal factor is a node in a network that itself has causes, and
combines with other factors to bring about its effects. The real causal story is rarely ever a simple one.
In fact, this inference is so natural that the language we use to report a correlation often gets
straightforwardly interpreted as reporting causation. When people hear that two things are "associated"
or "linked" or "related," they often misinterpret that as a claim about a causal connection. But in the
sciences, these expressions are typically used to indicate a correlation that may or may not be causal.
Because we could go wrong at any step, we should become less confident with each interim conclusion.
This means that a good argument of this sort requires very strong evidence at each step. (We'll look at
the relevant rule for probability in Chapter 8, but just to get a sense of how inferential weakness can
compound, suppose you're 80% confident that the first conclusion is true, and 80% confident that the
second is true given that the first is true. In that case, you should only be 64% confident that they are
both true. And if you're 80% confident that the third is true given that the first two are true, you should
only be about 51% confident that all three conclusions are true [1].)
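As a quick arithmetic check of those numbers, here is a sketch in Python using the 80% figures from the text:

confidence_1 = 0.80            # confidence in the first conclusion
confidence_2_given_1 = 0.80    # confidence in the second, given the first
confidence_3_given_12 = 0.80   # confidence in the third, given the first two

print(confidence_1 * confidence_2_given_1)
# 0.64: confidence that the first two conclusions are both true
print(confidence_1 * confidence_2_given_1 * confidence_3_given_12)
# 0.512: confidence that all three are true, about 51%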
So a good causal argument from correlation requires that we establish three things with a high degree of
confidence: (1) that a correlation exists in the cases we've observed; (2) that this means there is a general
correlation that holds beyond the cases we've observed; and (3) that this general correlation is not
misleading: it really results from A causing B.
We'll go through these steps one at a time, but first it's worth getting clear on exactly what correlations
are.
Let's start with binary correlation: if A occurs at a higher rate when B occurs than it does otherwise,
then there is a positive binary correlation between the two factors. And if we replace "higher" in this
definition with "lower," then there is a negative (or inverse) binary correlation between the two factors.
(If we don't specify and just say that two things are "correlated," we mean they are positively correlated.)
To establish a correlation, we need to know whether males own cell phones at a higher rate than
females do. But that simply does not follow from the fact that most males own cell phones and most cell
phone owners are male. Here's why:
Most people in the world own cell phones. So we should expect most males to own a cell phone,
even if males and females own cell phones at the same rate.
Most people in the world are male (by a small margin). So we should expect most cell phone
owners to be male even if males and females own cell phones at the same rate.
Putting these two facts together still doesn't give us a correlation, because they could both be true even
if males and females own cell phones at the same rate.
A useful rule of thumb that can help identify correlations is to ask yourself whether learning that factor A
is present provides you with any evidence that factor B is also present. In this case, the two bullet points
above don't give us any reason to think someone is more likely to own a cell phone after learning that
they are male.
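Here is a toy calculation in Python (the population figures are invented for illustration) in which both bullet points hold even though the ownership rates are identical, so there is no correlation at all:

males, females = 102.0, 98.0   # males are a slight majority (invented numbers)
ownership_rate = 0.7           # the same rate for both groups

male_owners = males * ownership_rate       # 71.4
female_owners = females * ownership_rate   # 68.6

print(male_owners / males)      # 0.7: most males own a cell phone
print(male_owners / (male_owners + female_owners))
# about 0.51: most cell phone owners are male
# Yet learning someone is male tells us nothing about ownership,
# because the rate is 0.7 for males and females alike.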
2. Correlation is symmetrical—if it holds in one direction, it also holds in the other. In other words, if A
occurs at a higher rate when B occurs than it does otherwise, then B occurs at a higher rate when A
occurs than it does otherwise. This may not seem obvious at first, but it's true.
For example, suppose it rains twice as often on Mondays as it does on other days. Does it follow that
Mondays occur at a higher rate on rainy days than they do otherwise?
The answer is yes. It might sound strange to say, "Mondays occur at a higher rate on rainy days than they
do otherwise," but it's true in our example. The proportion of rainy days that are Mondays will have to be
higher than one in seven, and the proportion of non-rainy days that are Mondays will have to be lower
than one in seven. (As it happens, more than a sixth of rainy days will have to be Mondays. If you know
how to work out these values, it's worth spending the time convincing yourself with examples that
correlation is symmetrical.) This means that learning that it's a Monday is some evidence that it's rainy,
and learning that it's rainy is also some evidence that it's a Monday (if you don't know what day it is).
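To check the symmetry with concrete numbers, here is a Python sketch of the example above, assuming (for illustration) that it rains on 40% of Mondays and 20% of other days:

days = 700                   # one hundred weeks
mondays = 100
rain_on_monday = 0.4         # twice the rate of other days
rain_on_other = 0.2

rainy_mondays = mondays * rain_on_monday          # 40
rainy_others = (days - mondays) * rain_on_other   # 120
rainy_days = rainy_mondays + rainy_others         # 160

print(rainy_mondays / rainy_days)   # 0.25: share of rainy days that are Mondays
print((mondays - rainy_mondays) / (days - rainy_days))
# about 0.11: share of dry days that are Mondays

Mondays are 1/7 (about 0.14) of all days, but a quarter of rainy days and only about a ninth of dry days. So learning that it's rainy really is some evidence that it's a Monday.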
3. Finally, in the definition above, the term "binary" specifies that the correlation we're talking about has
to do with factors we're treating as all-or-nothing rather than as coming in degrees. The rate of a factor
simply has to do with how often it's present and absent. For example, in the case above, it's either
Monday or it's not, and we're treating whether it's rainy as a simple yes/no question. But some
correlations have to do with the degree or intensity of a factor.
For example, the height and diameter of trees both come in degrees. And at least on average, the greater
a tree's diameter, the greater its height (and vice versa). In this sense, the two features are correlated. But
it would make no sense to say that height occurs at a higher rate with diameter, because all trees have
both height and diameter. Unlike a binary correlation, which relates all-or-nothing factors, this
correlation is scalar.
If we replace only one instance of the word "greater" with "lesser," then one factor increases as the other
decreases, giving us a negative (or inverse) scalar correlation. (There are other possibilities—for
example, A is binary and B is scalar—but let's not worry about that here.)
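For scalar correlations, the standard summary is the Pearson correlation coefficient, which runs from -1 to 1. Here is a minimal Python sketch with invented tree measurements:

# Invented measurements for five trees
diameters = [20, 35, 40, 55, 70]   # cm
heights = [12, 16, 18, 22, 27]     # m

n = len(diameters)
mean_d = sum(diameters) / n
mean_h = sum(heights) / n

covariance = sum((d - mean_d) * (h - mean_h)
                 for d, h in zip(diameters, heights)) / n
std_d = (sum((d - mean_d) ** 2 for d in diameters) / n) ** 0.5
std_h = (sum((h - mean_h) ** 2 for h in heights) / n) ** 0.5

print(covariance / (std_d * std_h))
# about 0.999: greater diameter goes with greater height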
Illusory correlations
We turn now to the first of three ways in which we can wrongly conclude from an apparent correlation
that a causal relationship exists between two factors. Recall the three inferential steps from above:
1. We observed a correlation between A and B;
2. There is a general correlation between A and B; and
3. A causes B.
The first kind of error is that we're wrong about (1): it only seems to us like A and B correlate in our
observed sample.
But why might we get the false impression that A and B correlate in our sample? Taking just the case of
binary correlation, we may be overestimating the rate at which A occurs along with B in our sample, or
underestimating the rate at which it occurs without B in our sample—or both. This might happen for
various reasons, such as motivated reasoning, selective recall, and selective noticing.
For example, recall the idea that people behave strangely more often when the moon is full. That's a
claim about the correlation between two factors. I might think that I have good evidence for that claim
because a correlation seems to exist between full moons and strange behavior in my experience, which I
could then generalize. But the apparent correlation in my experience may itself be an illusion: perhaps I
notice and remember the strange things people do on full-moon nights, while overlooking similar
behavior the rest of the time.
Another kind of mistake is simply that we fail to think proportionally. For example, suppose we've only
observed Bob when it's cold and we notice that he has worn a hat 70% of the time. Can we conclude
that there's a correlation in our observations between his wearing a hat and cold temperatures? Of
course not! What if he wears a hat 70% of the time regardless of the temperature? In that case, there's no
special correlation between his hat wearing and the cold: he just loves wearing hats.
If we are told, "Most of the time when it's cold, Bob wears a hat," it's easy to forget that this is not enough
to establish a correlation. To infer a correlation, we have to assume that Bob doesn't also wear a hat
most of the time even when it's not cold. Maybe this is a safe assumption to make, but maybe not. The
point is that if we just ignore it, we are neglecting the base rate, a mistake we encountered in the
previous chapter.
We can visualize the point this way. To establish a correlation between A and B, we must not only check
the proportion of B cases in which A occurs, but also compare that with the proportion of non-B cases in
which A occurs. Picture a two-by-two chart with the B cases on the left, the non-B cases on the right, and
the A cases along the top. We first ask what proportion of all the cases on the left are in the top-left, and
then ask what proportion of all the cases on the right are in the top-right.
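Here is the Bob example as a Python calculation, with observation counts invented for illustration:

# A 2 x 2 table of invented observations of Bob
hat_cold, nohat_cold = 14, 6     # cold days
hat_warm, nohat_warm = 28, 12    # warm days

rate_when_cold = hat_cold / (hat_cold + nohat_cold)
rate_when_warm = hat_warm / (hat_warm + nohat_warm)

print(rate_when_cold, rate_when_warm)   # 0.7 and 0.7
# "Most of the time when it's cold, Bob wears a hat" is true,
# but there is no correlation: the hat rate is the same either way.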
Consider a final example—this time, one of selective recall. As we saw in a previous chapter, when asked
whether Italians tend to be friendly, we search our memory harder for examples of friendly Italians than
for examples of unfriendly Italians. Where A is being friendly, and B is being Italian, that means we focus
on the top left-hand side of the box, and do a poor job at estimating the proportion of B cases that are A.
But things are even worse than that, because the loose generalization "Italians tend to be friendly" is
plausibly a question of correlation—is the rate of friendly people among Italians higher than the
proportion of friendly people among non-Italians? In that case, we have to evaluate not only the
proportion of A cases in the B area, but also the proportion of A cases in the non-B area. So our selective
search for cases in the top left-hand side is absurdly inadequate. We need to check all four boxes of
cases we've observed.
Generalizing correlations
Suppose we've avoided these errors and correctly identified a correlation in our experience. The next
point at which our causal inference can flounder is when we generalize from our sample to conclude
that a correlation exists in the general population. (After all, our observations usually only constitute a
small sample of the relevant cases.) In the previous chapter, we saw several reasons why our sample
might fail to match the wider set of cases. All the same lessons apply when we're generalizing about
correlations from a sample—for example, we need to be aware of sampling biases, participation biases,
response biases, and so on.
However, there is one important difference when we're dealing with correlations. When estimating the
proportion of individuals with some feature, we said that a "sufficiently large" sample is one that gives us
a sufficiently narrow confidence interval. But when we are interested in a correlation between two
features, we want a sample large enough to make our observed correlation statistically significant.
So what does this mean, exactly? A correlation is statistically significant when we'd be sufficiently
unlikely to find a correlation at least this large in a random sample of this size without there being some
correlation in the larger population. We can work this out by supposing that there is no correlation in the
larger population, simulating many random samples of this size, and working out what proportion of
those samples would show a correlation at least as large as the one we observe, merely by chance.
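Here is a simulation of that procedure in Python (the sample size, rates, and observed difference are all assumed for illustration). It estimates how often a sample of 100 would show a rate difference at least as large as the one observed, if the two binary factors were in fact independent:

import random

random.seed(0)
N = 100                  # sample size (assumed)
P_A, P_B = 0.5, 0.5      # under the null, A and B are independent
OBSERVED_DIFF = 0.15     # A's rate was 15 points higher among B cases (assumed)

def rate_difference(sample):
    """Rate of A among B cases minus rate of A among non-B cases."""
    b_cases = [a for a, b in sample if b]
    non_b_cases = [a for a, b in sample if not b]
    if not b_cases or not non_b_cases:
        return 0.0
    return sum(b_cases) / len(b_cases) - sum(non_b_cases) / len(non_b_cases)

trials, extreme = 10_000, 0
for _ in range(trials):
    sample = [(random.random() < P_A, random.random() < P_B) for _ in range(N)]
    if rate_difference(sample) >= OBSERVED_DIFF:
        extreme += 1

print(extreme / trials)   # the simulated p-value; below .05 counts as significant

Under these assumptions the estimate comes out around .06 or .07, so a 15-point difference in a sample of this size would fall just short of significance at the .05 level.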
As with confidence intervals, the threshold for a statistically significant correlation is somewhat arbitrary.
By convention, sufficiently unlikely in the social sciences means there's less than a 5% chance of seeing a
correlation of this size or larger in our sample without there being some correlation in the larger
population. This corresponds to a p-value of .05. (In areas like physics, however, the threshold is often
more stringent.)
So how strong is the evidence from a study that finds a statistically significant correlation? Note that if H
= there is a correlation in the population and E = there is a correlation of at least this size in the sample,
then statistical significance ensures a low value for P( E | ~H )—namely .05. Usually we can also assume
that we'd be much more likely to see this correlation if there really is a correlation in the population as a
whole, meaning that P( E | H ) is fairly high in comparison. In that case, statistical significance translates
into a fairly high strength factor for the evidence provided by our sample. But note that the strength
factor is not exactly overwhelming. A sample correlation that is just barely statistically significant will
have at best a strength factor of 20 in favor of the generalization.
To make this point more vivid, imagine we find a barely statistically
significant correlation in our sample. As we've seen, this means
roughly a 5% chance of seeing a correlation like this in our sample
even if there's no correlation in the population as a whole. So if
twenty studies like ours were conducted, we should expect one to
find a statistically significant correlation even if there's absolutely no
correlation in the population!
If we also take into account the file drawer effect and bias for surprising findings in scientific journals, we
should be even more careful. When we see a published study with a surprising result and a p-value just
under .05, we should keep in mind that many similar studies may have been conducted that found no exciting or significant
results and went unpublished. This means that the evidence provided for a surprising correlation by a
single study with that level of significance may be far from conclusive.
This selection effect only compounds for science reporting in the popular media. Studies with surprising
or frightening results are far more likely to make their way into the popular media than those with boring
results. In addition, such studies are often reported in highly misleading ways—for example, by
interpreting correlations as though they established causation. For these reasons, if we're not experts
in the relevant field, we should be very careful when forming opinions from studies we encounter in the
popular media. It can help to track down the original study, which is likely to contain a much more
careful interpretation of the data, often noting weaknesses in the study itself, and rarely jumping to
causal conclusions. But even this will not erase the selection effect inherent in the fact that we're only
looking at this study—rather than other less exciting ones—because we heard about it in a media report.
This is one of many reasons why there is really no substitute for consulting the opinions of scientific
experts, at least if there is anything close to a consensus in the field. The experts have already
synthesized the evidence from a wide variety of studies, so they're in a much better position to assess
the real significance of new studies. Relatedly, we can look for a meta-analysis on the question—a type
of study that tries to integrate the evidence from all the available studies on the topic.
Section Questions
7-1
Which of the following is true? (Use your commonsense knowledge of the world.)
A Cloudy days are correlated with rainy days, and rainy days are correlated with cloudy days
B Cloudy days are correlated with rainy days, but rainy days are not correlated with cloudy days
C Rainy days are correlated with cloudy days, but cloudy days are not correlated with rainy days
7-2
Suppose Jasmine smiles a lot, even when she's unhappy. Which fact would guarantee that her smiling and
happiness are correlated?
B The fraction of her smiling time in which she's happy is greater than the fraction of her time in general in which she's happy.
C Most of the time when she smiles, she's happy and most of the time when she's happy, she smiles.
7-3
Suppose the correlation in our sample is statistically significant (p < .05). This means that...
A the probability that a random sample would show a correlation at least as large as the one in our sample if there were no correlation in the general population is < .05
B the probability that the correlation we observe does not exactly match the correlation in the general population is < .05
C the probability that there is no causal relationship in the general population is < .05
D the probability that a random sample would show a correlation at least this large if there were no causal relationship in the general population is < .05
7-4
There is a correlation between cardiovascular fitness and the risk of heart attack. Using your background
knowledge, what's the best way to characterize this correlation?
Misleading correlations
There is no general recipe for ruling out misleading correlations, partly because theorists disagree about
the precise nature of causation. But we can all agree on some common ways in which a correlation can
be misleading. In this section, we'll focus on five in particular:
reverse causation
common cause
a side effect (e.g., placebo)
regression to the mean
mere chance
Let's consider each of these in turn.
Reverse causation
Sometimes, when A and B are genuinely correlated, we get the direction of causation wrong: we think
that A causes B when actually B causes A.
For example, suppose we study people's overall level of happiness and also whether they get and stay
married. We find a correlation between being happy and being married. Given this, it's tempting to
conclude that marriage increases happiness. But what if the causal relationship goes the other way
around—happy people are just more likely to get and stay married?
We can't rule this possibility out just by observing the correlation. But there are other things we could
do. For example, we could study whether getting married is related to change in happiness over time
rather than overall happiness. If we find a greater increase in happiness over time for people who got
married, that would provide more evidence that marriage is the cause. On the other hand, we should be
extremely wary about a simple causal story like Marriage makes people happier. For example, it could be
that the people who got married used to be less happy because they wanted to be married; but the
people who didn't get married aren't the sort of people who would have been happier if they got
married anyway. Only a randomized controlled study could really rule this possibility out, and that
would be impossible to run! (We'd have to take a group of people and randomly pick some to get
married and some to stay unmarried.)
"I wish they didn't turn on the seatbelt sign so much! Every time they do, it gets
bumpy."
—Billy, from Family Circus (Bil Keane)
As you'll recall, correlation is symmetrical. But it's interesting to note how we state correlations when we
want to suggest a causal relationship: we always mention the alleged cause first. So, for example, there
is obviously a correlation between how serious a fire is and how many firefighters go to the fire. But
stating the correlation the other way round suggests a causal relationship in the wrong direction:
The greater the number of firefighters who go to a fire, the worse the fire is!
Statements like this are actually true if we take them as merely reporting correlations. But they also
suggest causal relationships—in the wrong direction. The fact that there is a standard way to
communicate causal relationships simply by stating a correlation is a telling sign about our tendency to
look for causal stories and run the two things together.
Sometimes a correlation could be taken to provide evidence for a causal claim in either direction. In that
case, it's possible to influence which causal conclusion people jump to, simply by choosing which of the
correlated things you mention first. And of course, when media outlets report scientific results, they have
an incentive to spin the correlation in the most exciting way possible.
So, for example, what's the most sensational way to report a study finding that couples in their 40s who
look younger have sex more often than couples in their 40s who look less young? One could report that
result in many ways, for example by saying that looking younger is correlated with having more sex, or
having more sex is correlated with looking younger. Not only did media outlets choose the second way
of reporting, they often just leapt to the causal conclusion that "sex is the secret to looking younger," a
conclusion with no good evidence for it at all. The idea that having more sex could cause us to look
younger might be appealing, but the correlation found in the study is just as consistent with the
conclusion that looking younger leads to more frequent sex.
Then again, maybe what's going on is slightly more complicated: perhaps people who have the time
and motivation to maintain their youthful appearance are also more likely to have the time and
motivation to maintain an active sex life. In that case, the correlation is due to a common cause, which is
the topic of our next section.
Common cause
Probably the most important type of misleading correlation is when two factors are correlated due to a
common cause—that is, a third factor that influences both of the others. The mistake is to think that A
and B are correlated because A causes B, when actually the correlation is due to the fact that C causes
both A and B.
Suppose a child who lives in a temperate zone believes that snow comes in the winter because the
clouds want to cover up the trees that have lost their leaves. The child has noticed a real correlation:
every year after the leaves fall from the trees, the snow falls from the clouds. And there is a real causal
relationship here, just not directly between those two factors. Instead,
both factors share a common cause: falling temperatures.
Common causes are a major problem for non-randomized studies
that find correlations between two factors in a large population. For
example, suppose we find that swimming is correlated with better
health outcomes than running or playing most team sports. This by
itself should not make us very confident that swimming actually improves health, especially since we
know that swimming requires access to a swimming pool, which in turn may require pool fees, etc. In
other words, it's plausible that swimming is at least somewhat influenced by income, which we know
also influences health outcomes. So the correlation we've discovered might just be due to a common
cause: income affects people's recreational activities, and separately affects their health outcomes. We
can try to control for this effect by comparing health outcomes only across people of the same income.
But this may not solve the more general problem of possible common causes in our study: there could
be other factors related to socioeconomic status that affect recreational activities and also health, and
that aren't quite captured by income.
Take another example. Suppose we want to know whether broccoli is good for people's health, so we
take a large group of people, look at how much broccoli they eat, and then track their health outcomes.
A major problem with this approach is that broccoli is widely considered to be a healthy food. So even if
it's not actually affecting people's health, we should expect the kind of person who eats more broccoli to
also be the kind of person who does other things that are considered healthy—like exercising, refraining
from tobacco, etc. This kind of common cause—being the sort of person who cares about health or
safety—is essentially impossible to measure, which means it can't be ruled out as the real cause without
using a randomized trial. (More on that below.)
The same holds for many other behaviors that are widely considered beneficial, such as buying a car
that seems safe. Suppose we find out that Volvos are less likely to be involved in fatal accidents than
Fords. Is this good evidence that Volvos are safer than Fords? Ironically, the reason it's not is precisely
that Volvos have a reputation for safety, which means Volvo owners are more likely to be safety-
conscious to begin with, and safety-conscious drivers are less likely to be in fatal accidents. (Luckily, we
can also compare cars using crash-test ratings that take the driver out of the equation.)
"I used to think correlation implied causation. Then I took a statistics class.
Now I don't."
"Sounds like the class helped."
"Well, maybe."
—Randall Munroe, xkcd.com
Some factors are so pervasive that they end up being common causes for a great many correlations. In
addition to socioeconomic status, consider global trends like population growth and economic progress.
Together, these have led to a steady increase in a great many measurable factors, so that we would find
at least a rough correlation over the last twenty years between such apparently unrelated things as the
number of avocados consumed in Michigan, the average quality of wifi in Spain, and the number of
haircuts per capita given in India every day.
Here's a final and well-known example. Many people have used anecdotal evidence to argue that a
causal link exists between vaccines and autism. Since certain signs of autism tend to occur around the
same time as the recommended age for the MMR vaccine, there are many parents who can report that
soon after the vaccination, they noticed signs of autism in their child. However, because age is a likely
common cause, a proper test would compare children of the same age who are vaccinated with those
who are not. The most comprehensive studies are quite definitive that vaccines do not cause
autism. However, if correlation really were causation, perhaps we should conclude there's a causal
relationship in the other direction, as this SMBC comic suggests!
Side effects
Sometimes there is a genuine causal relationship between A and B, but it's not the relationship we
expected. In particular, B may be caused by a side effect of A.
For example, a study may find that a drug is correlated with reported reduction in pain, but also that a
fake pill with no active ingredients is equally effective. This is not the same as saying the drug has no
effect. Instead, its effect is not due to its chemical composition—it's due to the expectation that it will be
effective. When a treatment is effective due to this kind of expectation, that's known as the placebo
effect.
Note that placebo treatments needn't be pills. They can be anything that we expect to be effective.
Suppose I think that a walk in the park will help my headache. If I take a walk and then feel better, it could
be the fresh air and exercise that helped, or it could be the expectation that I'd feel better. An even more
likely explanation, perhaps, is that my walk made me feel better through a combination of these factors.
Don't get me wrong—I'm not saying the hologram wristbands once marketed to improve athletic
performance didn't work. In fact, my guess is that people really do tend to perform better with such
bands, at least while they are thinking about them. The
mechanism of expectation is extremely powerful, especially for things like athletic performance, where
the individual's psychological state is enormously important. That's why many people—no doubt
including many of the people marketing the bands—believe the bands work. They really do work! They
just work by way of the placebo effect, not by holographic power. (In contrast, placebo treatments tend
not to work for things like reducing the size of a tumor.)
This is one reason why using placebo treatments is an important part of testing medical interventions:
it's the only way of being sure that the treatment has efficacy beyond that of the placebo effect. (And as
you might imagine, there is some controversy about prescribing an intervention just because it's likely to
have a positive placebo effect.)
Regression to the mean
Imagine a graph in which a value (the blue line) generally increases over time but also bounces around a
lot above and below its overall trend (the red line). Now suppose we were to selectively choose points on
the blue line that are significantly higher than the red line. Even though the overall trend is upwards, we
would expect the value to fall a bit from those local high points. Likewise, if we were to start from only
points that are much lower than the trendline, we would expect subsequent data points to be higher.
This is what it means for data to regress towards the mean.
Regression to the mean is an extremely common thing, but can be very misleading. For example, when I
have an extremely bad day, I tend to have a better day the next day, due to regression to the mean. Now
suppose I apply some intervention only on the bad days. (It could be anything: taking a pill, meditating,
or calling my mom.) If I do this every time I have a terrible day, it will create a pattern between applying
the intervention and then feeling better a bit later. This is a real correlation, but it's only due to a kind of
selection effect. I didn't randomly select days to apply the intervention, and then test whether I felt
better. I only selected days that were especially bad—i.e., days when I was likely to regress to the mean
the next day regardless.
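Here is a Python sketch of this selection effect (the day-quality numbers are invented). Day quality is pure noise with no trend, the "intervention" does nothing, and yet intervening only on terrible days is reliably followed by improvement:

import random

random.seed(0)

# Each day's quality is independent noise: mean 50, standard deviation 10.
days = [random.gauss(50, 10) for _ in range(10_000)]

improvements = []
for today, tomorrow in zip(days, days[1:]):
    if today < 35:                 # a terrible day: apply the "intervention"
        improvements.append(tomorrow - today)

print(len(improvements))                      # several hundred terrible days
print(sum(improvements) / len(improvements))  # about +19 on average
# The "intervention" did nothing, but the next day looks much better
# simply because we started from an unusually bad one.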
Or suppose you're the mayor of a city and you want to give people the
false impression that you've improved the traffic situation. You
can find the intersections that had particularly high numbers of
accidents this year, as compared to previous years. Then install an
intervention, like a traffic camera, a new light pattern, or extra signs.
Chances are the following year the rate of accidents will drop at those
intersections, even if the interventions had no effect! The magic is in picking the right intersections—
regression to the mean will do the rest.
The same illusion should be expected for any interventions that are applied only in cases when things
are especially bad. It doesn't matter whether the intervention is alternative medicine, conventional
medicine, supernatural healing, or a new investment strategy. These things may or may not be effective:
but we get no evidence that they're effective when we see the improvements that we should expect
anyway due to regression to the mean.
Here's a final example. Are punishments more effective than rewards in influencing children's behavior?
If we think in terms of regression to the mean, we can see that there is likely to be a two-fold illusion:
Punishments tend to follow especially bad behavior, which was likely to improve anyway as it regressed to the mean. So punishment looks effective.
Rewards tend to follow especially good behavior, which was likely to worsen anyway as it regressed to the mean. So reward looks counterproductive.
Taken together, these are likely to reinforce the idea that punishment is more effective than reward, even
if it's not.
So how does regression to the mean relate to the placebo effect? Even though they are quite different
things, they often operate together. This is because many misleading correlations involve some kind
of intervention (e.g., a medicine) applied to particularly bad cases. But then:
People often expect the intervention to work, which in turn can give rise to a placebo effect, even if
the intervention is not otherwise effective.
If we start from particularly bad cases, we should expect things to regress to the mean even if the
intervention is not effective.
Luckily, both types of misleading correlation can be ruled out using the same kind of study, as we'll see
below.
Mere chance
The issue here is not that our sample showed a correlation just by chance—let's suppose we've ruled out
that possibility by using a sufficiently large and unbiased sample. The issue is that there might even be a
correlation in the population as a whole merely by chance. Luckily, such correlations tend to be very
narrow and strange. For example, consider the following graph, which tracks two factors in the US over a
decade:
This chart shows a highly significant scalar correlation for this particular period of time: for the most
part, the greater the number of letters in the winning word in a given year, the greater the number of
people killed in the US by venomous spiders. It's extremely unlikely that this correlation over a decade
would happen by chance.
Given this, can we simply reject the hypothesis that the correlation is due to chance? Not at all, because
however unlikely it is that this correlation would happen by chance, a causal relationship between the
two factors is even less likely. It would be ludicrous to think we could reduce the number of people being
killed by venomous spiders by reducing the length of the winning word in the spelling bee—or vice
versa. There just is no causal mechanism that could explain the connection. (A causal mechanism is the
specific way in which one event causes another. For example, the causal mechanism by which smoking
causes cancer involves the formation of DNA adducts by the carcinogens from cigarette smoke that are
taken into the body.)
If we had to predict whether the correlation would continue into the future, we should predict that it will
not. So how can we explain the correlation? Simple: even though it was very unlikely for it to happen by
chance, it just did. After all, sometimes very unlikely things do happen! And as a matter of fact, I was not
very surprised when I encountered this correlation, because I found it with a powerful selection effect: I
went looking on the internet for an example of bizarre correlations. My search brought me to a website
called Spurious Correlations, whose owner had sifted through vast troves of data to come up with the
weirdest correlations he could find. Given the complex data sets he was using, there were bound to be
some very unlikely chance correlations.
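We can mimic that sifting process in a few lines of Python (the number of series and years are assumed for illustration). Every series below is pure noise, yet searching all the pairs reliably turns up an impressively strong "correlation":

import random

random.seed(0)
YEARS, SERIES = 10, 200   # a decade of annual data for 200 unrelated quantities

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

data = [[random.random() for _ in range(YEARS)] for _ in range(SERIES)]

best = max(abs(pearson(data[i], data[j]))
           for i in range(SERIES) for j in range(i + 1, SERIES))
print(round(best, 3))
# With 200 series there are 19,900 pairs to sift through, and the
# strongest match is typically above 0.9 by chance alone.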
One lesson here is that we should be wary of charts that overlay multiple different axes with different
scales. Part of what makes these correlations look so impressive is that the y-axes have been truncated
and scaled to emphasize the fact that the factors move up and down together.
The more important lesson, however, is that we should not be surprised to see very unlikely
coincidences when they are brought to our attention through a process that sifts through a great many
events specifically to search for coincidences. Every day in the United States, there are more than 300 people
who experience events so incredibly unlikely that they would only happen to one in a million people on
a given day. If we could search through the country to find those events, we'd have no shortage of
material to amaze us. [2]
Of course, when people experience extremely strange events, they are likely to talk about them, and
others will often amplify their voice so that we do end up hearing about them. Given our
interconnectedness, we should therefore expect to hear about some genuine events that are so unlikely to
happen by chance that they seem to be better explained by ghosts, aliens, or other X-factors unknown to
science. Should we conclude that such events are evidence for those X-factors? "After all," some say, "the
probability of this event occurring just by chance through natural causes is incredibly low. So this event
is strong evidence for X."
That might look like a pretty good argument until we remember the selection effect involved. We have
an enormous pool of possible events, coupled with a process that sifts through them and focuses on the
most improbable ones. We should expect such a method to give rise to data that is very misleading.
(Note, also, that the argument fails to carefully assess the probability of the evidence given the X-factor
hypothesis: for example, does the hypothesis that ghosts exist really make it likely that this event would
happen to this person, given the zillions of possible ways for ghosts to manifest?) [3]
Establishing causation
Suppose our observations do show a correlation. Can they take us all the way from (1) to (3)? We need a
large enough study that avoids sampling bias to establish (2), and we need our evidence to somehow
distinguish between a genuine causal relationship and all the misleading correlations we've discussed in
§7.3.
The hypothesis we are testing is that A causes B. What we want is evidence with a high strength factor,
either for or against that hypothesis. This means we want evidence E that meets one of these conditions:
E would be much more likely to be observed if A causes B than if it doesn't; or
E would be much less likely to be observed if A causes B than if it doesn't.
Ideally, we can devise an experiment that will give us an unmistakable result no matter what: evidence
that could basically only have been observed if A causes B, or evidence that could basically only have
been observed if A does not cause B. In practice, this is very hard to obtain because there are so many
ways for a correlation to link A to B even if A does not cause B. What we want is a study that can
simultaneously rule all of these possibilities out.
This is why the gold standard for establishing causation from correlation is a double-blind randomized
controlled trial. In a randomized controlled trial, subjects are randomly selected and placed into two
groups. The treatment under investigation is applied only to one group, but both groups are followed
and assessed for relevant changes. In a double-blind study, neither the subjects nor the experimenters
that interact with or assess them know which group the subjects are in. (Usually subjects in the
treatment arm are aware of the treatment—such as a medication; so blinding the trial requires the
control group to receive a placebo treatment.)
Note how many possible sources of error this eliminates, by ensuring that no factors other than the
treatment might explain differences in the observed outcomes between the two groups. To illustrate,
suppose we ran a study like this and found a significant difference between the groups, with effect B
occurring in the treatment group and not the control group. Here is how the design rules out each
alternative explanation:
Bias in the sample. Since subjects are randomized, there's no sampling bias between the two
groups. Since the experimenters are blinded, they can't accidentally bias the sample after the
randomization by treating the groups differently.
Reverse causation. The correlation can't be due to reverse causation, because the cause of
treatment is the randomizing procedure of the experiment itself.
Common cause. There could be a common cause for (i) being selected for the study and also for (ii)
exhibiting B. But this would affect both groups equally, and we found a difference between the
groups.
Placebo. Because the subjects don't know which group they're in, any placebo effect would show
up in both groups.
Regression to the mean. The subjects could be regressing to the mean—if, for example, they were
selected due to having some condition. But again, regression to the mean would affect both groups
equally.
In short, the design of the study ensures that we'd be unlikely to get the result we did if A does not cause
B. So a study like this provides strong evidence that A does cause B. On the flip side, if a study like this
shows no difference between the groups, that's a result we'd only expect if A does not cause B, so we get
strong evidence that it does not.
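Here is a Python sketch of why the design works (all of the effect sizes are invented). Every subject enrolls while doing badly, every subject gets a placebo boost, and only the treatment group gets a genuine effect:

import random

random.seed(0)

def improvement(treated):
    noise = random.gauss(0, 10)    # individual variation
    regression = 20                # recovery that would happen anyway
    placebo = 5                    # expectation effect: both groups get a pill
    real_effect = 8 if treated else 0
    return noise + regression + placebo + real_effect

treatment = [improvement(True) for _ in range(500)]
control = [improvement(False) for _ in range(500)]

print(sum(treatment) / len(treatment))   # about 33
print(sum(control) / len(control))       # about 25
# Both groups improve a lot (placebo plus regression to the mean),
# but the 8-point gap between the groups can only be the treatment.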
Of course, our evidence would be even stronger if it stemmed from several different sources, all pointing
in the same direction. This would help to ensure that the result doesn't stem from any particular flaw in
the study's design, any error on the part of the experimenters, or sheer bad luck. Evidence that stems
from such a wide variety of experiments can be described as robust.
Section Questions
7-5
Suppose we find that eating an expensive health supplement is correlated with lower mortality in the general
population. Before concluding that the result is causal, we should be primarily concerned about... (note: your job
here is to distinguish which is the most probable non-causal explanation.)
A reverse causation
B a common cause
C regression to the mean
D placebo effect
7-6
Which of the following is most likely an error due to forgetting about common cause?
A Philosophy majors do better on the LSAT and GRE than any other major, so majoring in philosophy causes better performance on those tests.
B In a study, people were randomly split into two groups. One group was instructed to do an hour of exercise a week, and the other group was instructed to refrain from exercise. The group that exercised reported feeling better at the end of the week.
C The more firefighters respond to a fire, the bigger the fire! Firefighters must be causing the fire.
D Most people who went to see a doctor for stomach cramps felt better a few days later, whether or not they were treated with a drug.
7-7
Suppose a study looking at the long-term health data of a population finds that recreational exercise is better
correlated with good health than exercise during work (e.g., manual labor). The study concludes that exercise
you enjoy is more beneficial than exercise you don't. Given your background knowledge, which of the following
should make you most suspicious of drawing this conclusion from the data?
A The study relied on daily diaries in which subjects self-reported their exercise level.
B There was no group of subjects in the study who did no exercise at all.
7-8
A large number of students took a test of reaction time. We identified the students who got the very worst scores,
and had them drink a protein shake. Everyone else got a similar shake with no protein in it, and none of the
subjects were told which shake they were receiving. Then everyone retook the test. The students who had drunk
the protein shake increased their scores on the second test much more than the average student did. (The
difference was statistically significant.) What should we be most concerned about before concluding that the
protein shake caused the improvement?
A regression to the mean
B reverse causation
C placebo effect
D common cause
7-9
We are testing the hypothesis that treatment A improves pain. In a double-blind randomized controlled trial of
people in more pain than usual, we find an equal improvement in pain in both the treatment group given A, and a
control group not given A. What can we say about the role of the placebo effect and/or regression to the mean?
A Given the study design, either of these factors (or both together) could be the cause of the improvement in both groups.
B Given the study design, the placebo effect is probably the cause of the improvement in both groups.
C Given the study design, regression to the mean is probably the cause of the improvement in both groups.
D Given the study design, neither of these factors could be the cause of the improvement.
7-10
Assessing whether there is a plausible causal mechanism that could explain a correlation...
A is not necessary when we find a correlation in a population that had a < .05 probability of happening by chance, since we can reject the "mere chance" hypothesis regardless of how implausible the causal mechanism is
B is important because unless we find a correlation in a population that had a < .05 probability of happening by chance, we can reject the "mere chance" hypothesis if the causal mechanism is sufficiently implausible
C is important because even when we find a correlation in a population that had a < .05 probability of happening by chance, whether we should reject the "mere chance" hypothesis can depend on the plausibility of the causal mechanism
D is not necessary, but when the mechanism is sufficiently plausible, we should never reject the causal hypothesis regardless of whether the correlation is statistically significant
Key terms
Causal argument: the attempt to establish a causal connection between two factors (i.e., anything that
can stand in a causal relation such as events, situations, or features of objects).
Causal mechanism: the specific way in which one event causes another. For example, the causal
mechanism by which smoking causes cancer involves the formation of DNA adducts by the carcinogens
from cigarette smoke that are taken into the body.
Clustering illusion: a form of pattern-seeking in which people tend to think that random distributions
over an area are clustering too much to be random.
Common cause: Two events, A and B, are correlated due to common cause when some third event C is
responsible for both of them, and that's why they occur together at a higher rate than alone.
Correlation: If A occurs at a higher rate when B occurs than it does otherwise, we say A and B have
a positive binary correlation. If A occurs to a greater degree when B occurs to a greater degree, we have a
positive scalar correlation. Negative correlations are statistical relationships in the opposite direction: A
occurs at a lower rate when B occurs than it does otherwise, or A occurs to a lesser degree as B occurs to
a greater degree. If the term “correlated” is used without specifying positive or negative, we assume that
the term refers to a positive correlation.
Double-blind: an experiment is double-blind when neither the subject nor the experimenter is aware of
which subjects belong to the control arm and which belong to the experimental arm of the trial. This
experimental design helps to rule out experimenter effects as a possible explanation for observed
differences in outcomes between the control group and the group receiving the intervention.
Immediate vs. distal causes: A distal cause of x is one that is effective through intermediate causes. An
immediate (or proximate) cause of x is one that is immediately responsible for it.
Mere chance (as an explanation for a correlation): when there is a genuine correlation between two
factors but there is no causal connection between them. We often identify correlations due to mere
chance by assessing the plausibility of the causal mechanism required for a causal connection between
the relevant factors.
Pattern-seeking: the tendency to be over-sensitive to patterns even in scarce data that could be
entirely random.
Placebo effect: a positive effect arising from the expectation that an intervention (usually medical or
dietary) will be effective. This effect works entirely through a subject's psychology. For example, when
subjects take pills that they believe are effective for pain or depression, some report that the pills are
effective even when they are not in fact biologically active.
Post hoc ergo propter hoc: Latin for “after this therefore because of this”. It's the name given to the
fallacy of assuming that because event B happens after event A, it must have been caused by A.
Randomized controlled trial: in this kind of experiment, subjects are randomly divided into two
groups, and some intervention (e.g., a drug) is applied to members of one group only. This procedure
helps to rule out other factors (aside from the intervention being tested) that might explain differences in
the observed outcomes between the two groups.
Regression to the mean: the tendency, when we select a data point that lies far from the mean, for
adjacent data points to lie closer to the mean. This tendency can result in misleading correlations.
Reverse causation: when we propose that A causes B in order to explain a correlation between them,
but in fact the correlation is explained by the fact that B causes A.
Robust evidence: evidence that stems from a wide range of experiments (i.e., from different sources
and from different kinds of experiments). This helps to ensure that we are drawing on lots of data, and
that the result does not stem from some flaw in the study's design, or error on the part of the
experimenters.
Side effect (as an explanation for a correlation): when two factors, A and B, are correlated due to
some additional consequence of the presence of one of the factors, which is not the alleged causal
mechanism. For example, a drug may be correlated with a reported reduction in pain, even if a fake pill
with no active ingredients would be just as effective. This doesn't mean that the pill has no effect; rather,
its effect is not due to the drug it contains but to a side effect of taking the drug: namely, the expectation
that the pill will be effective.
Footnotes
1. Note that this is not exactly the same thing as saying you should be .51 confident that the third conclusion is true. For
example, (3) might happen to be true even if (1) is false: it just so happens that there is a causal relationship overall even
though your sample was bad and had no real correlation. Exactly how confident you should be in (3) after examining your
sample depends on how plausible it was to begin with. But the key point about the strength of this particular three-step
argument remains.
2. cf. the quote attributed to Penn Jillette: "Million-to-one odds happen eight times a day in New York."
3. It's better evidence for the much more specific hypothesis that this house is haunted by a specific kind of ghost who likes to
behave in a specific way. But this should be treated to begin with as far less likely than the simple hypothesis that ghosts exist.
Image Credits
Banner image of spider web with droplets: image by Mindz licensed under CC0 / cropped from original. Lone tree and sunset:
image by mbll licensed under Pixabay license. Desert with cracked earth: image by Marion Wunder licensed under Pixabay
license. Ripples in water with floating ice: image licensed under Pixabay license. Random distribution of points: image by
Blythwood licensed under CC BY-SA 4.0. Chain links closeup with frost: image by Markus Spiske licensed under Pexels license.
Rain on glass: image by Markus Spiske licensed under Pexels license. Tree trunk in forest: image licensed under CC0. Full
moon and winter woods: image by Robson Machado licensed under Pixabay license. Smoggy city: image by Götz Friedrich,
licensed under Pixabay license. Winter trees with snow: image by Hermann Schmider licensed under Pixabay license. Autism
and vaccines comic: image by Zach Weinersmith, used with permission; Park benches with orange leaves: image by Pepper
Mint licensed under Pixabay license. City junction at night from above: image licensed under by Pixabay license. Spurious
correlation graphs: images by Tyler Vigen licensed under CC BY 4.0. All other images are the author's own.
Reason Better: An Interdisciplinary Guide to Critical Thinking © 2019 David Manley. All rights reserved
version 1.4