3 SamplingDistributions Complete
What are the risks of excessive drinking at a department
party?
Inferential Statistics
Inferential statistics are used to estimate “parameters” in the
population from parameter estimates in a sample drawn from
that population.
Sampling Error
Since a sample does not include all members of the
population, parameter estimates generally differ from
parameters on the entire population (e.g., use mean height of
a sample of 1000 people to estimate mean height of US
population).
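A minimal sketch of sampling error in R. The population here is a hypothetical stand-in (simulated normal scores with mean 0 and SD 23.67, an assumption for illustration), not the course data set.

```r
# Hypothetical stand-in population (assumption: normal, mean 0, sd 23.67)
set.seed(1)
population <- rnorm(10000, mean = 0, sd = 23.67)

# Draw one sample of 1000 and compare its mean with the population mean
one_sample <- sample(population, 1000)
pop_mean <- mean(population)
sample_mean <- mean(one_sample)

# Sampling error: the gap between the parameter estimate and the parameter
sampling_error <- sample_mean - pop_mean
print(c(population = pop_mean, sample = sample_mean, error = sampling_error))
```

The error is typically small but not zero, and it changes every time a new sample is drawn.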
Hypothetical Sampling Distribution
A sampling distribution is a probability distribution of all
possible samples of size N taken from a population
A sampling distribution can be formed for any population
parameter.
Each time you draw a sample of size N from a population,
you can calculate an estimate of that population parameter
from that sample.
Because of sampling error, these parameter estimates will
not exactly equal the population parameter. They will not
equal each other either. They will form a distribution.
A sampling distribution, like a population, is an abstract
concept that represents the outcome of repeated (infinite)
sampling. You will typically only have one sample.
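The idea of repeated sampling can be approximated by simulation. The sketch below assumes a simulated population (normal, mean 0, SD 23.67) and uses 1000 samples of N = 10 as a finite stand-in for the infinite sampling distribution.

```r
# Hypothetical stand-in population (assumption: normal, mean 0, sd 23.67)
set.seed(2)
population <- rnorm(10000, mean = 0, sd = 23.67)

N <- 10            # participants per sample
n_samples <- 1000  # finite approximation of "infinite" repeated sampling

# One parameter estimate (the mean) per sample
sample_means <- replicate(n_samples, mean(sample(population, N)))

# The estimates differ from each other: they form a distribution
print(head(round(sample_means, 2)))
print(summary(sample_means))
```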
What if we didn’t need samples?
Research question: How do inhabitants of a remote Pacific
island feel about the ocean?
Population size = 10,000
Dependent measure: Ocean liking scale scores range from -100
(strongly dislike) to 100 (strongly like); 0 represents
neutral
Hypotheses: H0: $\mu = 0$; Ha: $\mu \neq 0$
> lm.describeData(d)
      var     n mean    sd median    min   max skew kurtosis
Like0   1 10000    0 23.67    0.1 -86.64 84.46    0    -0.03
Ocean Liking Scale Scores in Full Population
> windows() #quartz() for MAC users
> par('cex' = 1.5, 'lwd' = 2, 'font.axis'=1.5, 'font.lab' = 2)
> hist(d$Like0, col='yellow')
Parameter Estimation and Testing
What do you conclude?
Inhabitants of island ARE neutral on average on the
Ocean Liking Scale; $\mu = 0$
Obtain a Sample
Your friend is a poor graduate student too. All she can afford
is N=10 too.
> dS$Sample2 = sample(d$Like0,10)
> lm.describeData(dS$Sample2,1)
n mean sd min max
sample2 10 1.04 22.42 -22.74 44.43
Sampling Distribution of the Mean
You can construct a sampling distribution for any sample
statistic (e.g., mean, s, min, max, r, B0, B1)
For the mean, you can think of the sampling distribution
conceptually as follows:
1. Imagine drawing many samples (let's say 1000 samples,
but in theory the sampling distribution is infinite) of
N=10 participants (10 participants in each sample) from
your population
...
Sampling Distribution of the Mean
What will the mean of the sample means be? In other
words, what is the mean of the sampling distribution?
The mean of the sample means (i.e., the mean of the
sampling distribution) will equal the population mean of
raw scores on the dependent measure. This is
important b/c it indicates that the sample mean is an
unbiased estimator of the population mean.
Sampling Distribution of the Mean
The mean is an unbiased estimator:
The mean of the sample means will equal the mean of the
population. Therefore individual sample means will neither
systematically under- nor overestimate the population mean.
Raw Ocean Liking scores
n mean sd median min max skew kurtosis
Like0 10000 0 23.67 0.1 -86.64 84.46 0 -0.03
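Unbiasedness can be checked by simulation. This sketch assumes a simulated population (normal, mean 0, SD 23.67) in place of the real Like0 scores and compares the mean of many sample means with the population mean.

```r
# Hypothetical stand-in population (assumption: normal, mean 0, sd 23.67)
set.seed(3)
population <- rnorm(10000, mean = 0, sd = 23.67)

# 10,000 samples of N = 10, keeping the mean of each
sample_means <- replicate(10000, mean(sample(population, 10)))

# For an unbiased estimator, the mean of the estimates sits at the parameter
bias <- mean(sample_means) - mean(population)
print(c(mean_of_sample_means = mean(sample_means),
        population_mean = mean(population),
        difference = bias))
```

Individual sample means miss in both directions, but the misses average out to roughly zero.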
Standard Error (SE)
The standard deviation of the sampling distribution (i.e., the
standard deviation of the infinite sample means) is equal to:
$$\frac{\sigma}{\sqrt{N_{sample}}}$$
where $\sigma$ is the standard deviation of the population of raw
scores.
This variability in the sampling distribution is due to
sampling error.
Therefore, b/c we use sample statistics (parameter
estimates) to estimate population parameters, we would like
to minimize sampling error.
The standard deviation of the sampling distribution for a
population parameter has a technical name. It is called the
standard error of the statistic. Here, we are talking about the
standard error of the mean.
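A sketch comparing the simulated standard error with the formula value (population SD divided by the square root of the sample size). The population is an assumed stand-in (normal, mean 0, SD 23.67), and 10,000 samples approximate the sampling distribution.

```r
# Hypothetical stand-in population (assumption: normal, mean 0, sd 23.67)
set.seed(4)
population <- rnorm(10000, mean = 0, sd = 23.67)
N <- 10

# Empirical SE: standard deviation of many sample means
sample_means <- replicate(10000, mean(sample(population, N)))
empirical_se <- sd(sample_means)

# Theoretical SE: population sd (N denominator) over sqrt(sample size)
sigma <- sqrt(mean((population - mean(population))^2))
theoretical_se <- sigma / sqrt(N)

print(c(empirical = empirical_se, theoretical = theoretical_se))
```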
Standard Error
What factors affect the size of the sampling error of the mean
(i.e., the standard error)?
$$\frac{\sigma}{\sqrt{N_{sample}}}$$
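The formula has two inputs: the population standard deviation and the sample size. A short sketch (the helper name `standard_error` and the chosen values are illustrative assumptions) shows how each one moves the SE.

```r
# Illustrative helper: SE of the mean from population sd and sample size
standard_error <- function(sigma, n) sigma / sqrt(n)

# Effect of sample size: each quadrupling of N halves the SE
n_values <- c(10, 40, 160, 640)
se_by_n <- standard_error(23.67, n_values)
print(data.frame(N = n_values, SE = round(se_by_n, 2)))

# Effect of population variability: SE scales linearly with sigma
sigma_values <- c(10, 20, 40)
se_by_sigma <- standard_error(sigma_values, 10)
print(data.frame(sigma = sigma_values, SE = round(se_by_sigma, 2)))
```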
Factors that Affect the Standard Error (SE)
Normal Pop and Various Sampling Distributions
NOTES: Population size = 100,000; Simulated 10,000 samples
Uniform Pop and Various Sampling Distributions
Skewed Pop and Various Sampling Distributions
NOTE: x-axis scale changes across figures on this slide
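The figures on these slides can be approximated with a small simulation. This sketch assumes an exponential population as the skewed case and, to keep the run short, uses 2,000 samples per sample size rather than 10,000. The `skewness` helper is defined here for illustration.

```r
# Assumed skewed population: exponential scores, size 100,000
set.seed(5)
population <- rexp(100000, rate = 1)

# Simple moment-based skewness (illustrative helper)
skewness <- function(x) mean((x - mean(x))^3) / sd(x)^3

# Skewness of the sampling distribution of the mean at several sample sizes
sample_sizes <- c(2, 10, 100)
skew_by_n <- sapply(sample_sizes, function(n) {
  means <- replicate(2000, mean(sample(population, n)))
  skewness(means)
})
names(skew_by_n) <- paste0("N=", sample_sizes)

print(round(c(population = skewness(population), skew_by_n), 2))
```

The sampling distribution of the mean becomes more symmetric as N increases, even though the raw scores stay skewed.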
An Important Normal Distribution: Z-scores
Z scores are normally distributed scores with a mean of 0
and a standard deviation of 1.
You can therefore think of a z-score as telling you the
position of the score in terms of standard deviations above
or below the mean.
The probability distribution is known for z-scores.
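A brief sketch of converting a raw score to a z-score and looking up its probability with `pnorm`. The raw score of 47.34 is a made-up example value; the mean and SD are taken from the Ocean Liking population.

```r
mu <- 0          # population mean of Ocean Liking scores
sigma <- 23.67   # population standard deviation
raw_score <- 47.34  # hypothetical raw score for illustration

# z-score: distance from the mean in standard deviation units
z <- (raw_score - mu) / sigma

# Known probabilities from the standard normal distribution
p_below <- pnorm(z)                      # proportion of scores below z
p_above <- pnorm(z, lower.tail = FALSE)  # proportion of scores above z

print(c(z = z, p_below = p_below, p_above = p_above))
```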
Probability of Parameter estimate given H0
How could you use the z-score distribution to determine
the probability of obtaining a sample mean (parameter
estimate) of 2.40 if you draw a sample of N=10 from a
population of Ocean Liking scores with a population mean
(parameter) of 0?
Hypothetical Sampling Distribution for H0
If H0 is true, the sampling distribution has a mean of 0 and a
standard deviation of $\sigma / \sqrt{N_{sample}} = 23.7 / \sqrt{10} = 7.5$
Hypothetical Sampling Distribution for H0
If H0 is true and this is the sampling distribution (in blue),
how likely is it to get a sample mean of 2.4 or more extreme?
Pretty likely...
But we can do better than that...
Our first inferential test: the z-test
$z = \frac{2.4 - 0}{7.5} = 0.32$; p = .749
pnorm(0.32, mean=0, sd=1, lower.tail=FALSE) * 2
0.7489683
Area in each tail: 37.4%
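The whole calculation can be wrapped in a small function. This is a sketch: `z_test` is a hypothetical helper written for illustration, not a base R function, and it assumes the population standard deviation is known.

```r
# Two-tailed one-sample z-test (illustrative helper; assumes known sigma)
z_test <- function(sample_mean, mu0, sigma, n) {
  se <- sigma / sqrt(n)                        # standard error of the mean
  z <- (sample_mean - mu0) / se                # estimate in SE units under H0
  p <- 2 * pnorm(abs(z), lower.tail = FALSE)   # both tails
  list(se = se, z = z, p = p)
}

# Values from the slides: sample mean 2.4, H0 mean 0, sigma 23.7, N = 10
result <- z_test(sample_mean = 2.4, mu0 = 0, sigma = 23.7, n = 10)
print(round(unlist(result), 3))
```

The unrounded SE is slightly below 7.5, so z and p differ from the slide values only in the third decimal place.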
t vs. z
$z = \frac{2.4 - 0}{7.5} = 0.32$
Where did we get the 2.4 from in our z test?
Our sample mean from our study. This is our parameter
estimate of the population mean of OLS (Like0) scores
Where did we get the 7.5 from in our z test and what is the
problem with this?
This was our estimate of the standard deviation of the
sampling distribution: $\sigma / \sqrt{N_{sample}}$
We do not know $\sigma$.
t vs. z
How can we estimate $\sigma$?
We can use our sample standard deviation (s), but s is a
negatively biased parameter estimate. On average, it will
underestimate $\sigma$.
So what do we do?
We account for this underestimation of $\sigma$ and therefore of the
standard deviation (standard error) of the sampling
distribution by using the t distribution rather than the z
distribution to calculate the probability of our parameter
estimate if H0 is true.
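The negative bias in s can be seen in a simulation. This sketch assumes normally distributed scores with a population SD of 23.67 and draws 10,000 samples of N = 10.

```r
# Assumption: normal population with known sigma, for illustration only
set.seed(6)
sigma <- 23.67
N <- 10

# Sample standard deviation (s) from each of 10,000 samples
sample_sds <- replicate(10000, sd(rnorm(N, mean = 0, sd = sigma)))

# On average, s falls a little short of sigma
mean_s <- mean(sample_sds)
print(c(sigma = sigma, mean_of_s = mean_s, ratio = mean_s / sigma))
```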
t vs. z
The bias in s decreases with increasing N. Therefore, t
approaches z with larger sample sizes
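One way to see the convergence is to compare two-tailed .05 critical values. The sketch uses df = N - 1 for a one-sample test; the particular df values are chosen for illustration.

```r
# Critical values for a two-tailed test at alpha = .05
df_values <- c(5, 9, 30, 100, 1000)   # df = 9 corresponds to N = 10
t_crit <- qt(0.975, df = df_values)
z_crit <- qnorm(0.975)

print(data.frame(df = df_values,
                 t_crit = round(t_crit, 3),
                 z_crit = round(z_crit, 3)))
```

With small samples t demands a more extreme value than z; the gap shrinks as df grows.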
Null Hypothesis Significance Testing (NHST)
1. Divide reality regarding the size of the population parameter
into two non-overlapping possibilities (null hypothesis and
alternative hypothesis).
3. Collect data.