02 Producing Data, Sampling
What percentage of voters approve of the way the U.S. President is handling his job?
This is difficult to determine exactly as there are more than 250 million people of voting
age in the U.S.
But it’s not difficult to estimate this percentage quite well:
Sample 1,000 (say) voters at random. Then use the approval percentage among those
voters as an estimate for the approval percentage of all voters.
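The sampling idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the true approval rate `true_rate` is a made-up value (in practice it is unknown), and each sampled voter is modeled as an independent draw, which approximates simple random sampling from a very large population. The function name `estimate_approval` is hypothetical.

```python
import random


def estimate_approval(population_rate, sample_size, seed=None):
    """Return the approval percentage in one random sample of voters."""
    rng = random.Random(seed)
    # Each sampled voter approves with probability population_rate.
    approvals = sum(rng.random() < population_rate for _ in range(sample_size))
    return 100.0 * approvals / sample_size


true_rate = 0.45  # hypothetical value; unknown in a real poll
estimate = estimate_approval(true_rate, sample_size=1000, seed=1)
print(f"Estimated approval among all voters: {estimate:.1f}%")
```

Rerunning with a different seed gives a slightly different estimate, which is the chance error discussed next.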
What is statistical inference?
Since the sample is drawn at random, the estimate will differ from the parameter
(the true percentage among all voters) due to chance error. Drawing another sample
will result in a different chance error.
The chance error (sampling error) gets smaller as the sample size gets bigger.
Moreover, we can compute how large the chance error is likely to be.
Neither is true of bias (systematic error):
Increasing the sample size just repeats the error on a larger scale, and typically we don’t
know how large the bias is.
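A small simulation can make the contrast concrete. The sketch below assumes simple random sampling, for which the standard error of a sample percentage is 100 * sqrt(p(1 - p)/n). The rates `true_rate` and `biased_rate` are made-up values, and the bias is modeled, purely for illustration, as a sampling procedure that reaches approvers too often.

```python
import math
import random


def sample_percentage(rate, n, rng):
    """Approval percentage in a sample of n voters, each approving with probability rate."""
    hits = sum(rng.random() < rate for _ in range(n))
    return 100.0 * hits / n


def standard_error_pct(rate, n):
    """Likely size of the chance error, in percentage points (simple random sampling)."""
    return 100.0 * math.sqrt(rate * (1.0 - rate) / n)


rng = random.Random(0)
true_rate = 0.45    # hypothetical true approval rate
biased_rate = 0.50  # hypothetical: the biased procedure over-represents approvers

for n in (100, 1_000, 10_000, 100_000):
    se = standard_error_pct(true_rate, n)
    unbiased_error = sample_percentage(true_rate, n, rng) - 100.0 * true_rate
    biased_error = sample_percentage(biased_rate, n, rng) - 100.0 * true_rate
    print(f"n={n:>7}  SE={se:5.2f}  unbiased error={unbiased_error:+6.2f}  "
          f"biased error={biased_error:+6.2f}")
```

The standard error shrinks by a factor of about 3.2 for every tenfold increase in n, while the error of the biased procedure settles near 5 percentage points instead of near zero.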
Observational Studies
People who eat red meat have higher rates of certain cancers than people who don’t
eat red meat.
- This means that there is an association between red meat consumption and
  cancer: there is a link between these two.
- But this does not mean that eating red meat causes cancer: people who don’t eat
  red meat are known to exercise more and drink less alcohol, and it could be the
  latter two issues that cause the difference in cancer rates.
This is an observational study: it measures outcomes of interest without assigning
any treatment, and this can be used to establish association.
But association is not causation, because there may be confounding factors such
as exercise that are associated with both red meat consumption and cancer.
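Confounding can be illustrated with a simulated population. In the sketch below every number is made up: meat eaters are assumed to exercise less often, and cancer risk is assumed to depend only on exercise, not on meat. This says nothing about real cancer risks; it only shows how an association can appear without any causal effect.

```python
import random

rng = random.Random(42)
counts = {}  # (eats_meat, exercises) -> [number of people, number with cancer]

for _ in range(200_000):
    eats_meat = rng.random() < 0.5
    # Confounder: meat eaters exercise less often (hypothetical rates).
    exercises = rng.random() < (0.3 if eats_meat else 0.7)
    # Cancer risk depends only on exercise, not on meat (hypothetical rates).
    cancer = rng.random() < (0.05 if exercises else 0.10)
    cell = counts.setdefault((eats_meat, exercises), [0, 0])
    cell[0] += 1
    cell[1] += cancer


def rate(keys):
    """Cancer rate among everyone in the given (eats_meat, exercises) cells."""
    people = sum(counts[k][0] for k in keys)
    cases = sum(counts[k][1] for k in keys)
    return cases / people


meat_rate = rate([(True, True), (True, False)])
no_meat_rate = rate([(False, True), (False, False)])
print(f"All subjects:    meat {meat_rate:.3f}  vs  no meat {no_meat_rate:.3f}")

for exercises in (True, False):
    label = "Exercisers:     " if exercises else "Non-exercisers: "
    print(f"{label} meat {rate([(True, exercises)]):.3f}  "
          f"vs  no meat {rate([(False, exercises)]):.3f}")
```

Overall, meat eaters show a higher cancer rate, yet within each exercise group the two rates are nearly equal: the association is produced entirely by the confounder.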
Randomized controlled experiments
To establish causation, an experiment is required:
A treatment (e.g. eating red meat) is assigned to people in the treatment group but
not to people in the control group.
Then the outcomes in the two groups are compared. To rule out confounders, both
groups should be similar, apart from the treatment. To this end:
- The subjects are assigned into treatment and control groups at random.
- When possible, subjects in the control group get a placebo: it resembles the
  treatment but is neutral. Assigning a placebo makes sure that both groups are
  equally affected by the placebo effect: the idea of being treated may have an
  effect by itself.
- The experiment is double-blind: neither the subjects nor the evaluators know the
  assignments to treatment and control.
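Random assignment itself is simple to carry out. A minimal sketch, assuming a list of subject identifiers and two groups of equal size; the function name `assign_groups` and the subject labels are hypothetical:

```python
import random


def assign_groups(subjects, seed=None):
    """Randomly split subjects into a treatment group and a control group."""
    rng = random.Random(seed)
    shuffled = list(subjects)  # copy so the caller's list is left unchanged
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


subjects = [f"subject_{i:02d}" for i in range(1, 21)]
treatment, control = assign_groups(subjects, seed=7)
print("Treatment:", sorted(treatment))
print("Control:  ", sorted(control))
```

Because chance alone decides who lands in which group, known and unknown confounders tend to be balanced between the groups. In a double-blind experiment, this assignment list would be kept from both the subjects and the evaluators until the outcomes are recorded.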
The placebo effect
The placebo effect is still not fully understood and is one of the most interesting
phenomena in science.
‘The weird power of the placebo effect, explained’ by Brian Resnick (7/7/2017)
The logic of randomized controlled experiments