Sampling Distributions
Summer 2003
The Central Limit Theorem (CLT)
If random variable Sn is defined as the sum of n
independent and identically distributed (i.i.d.)
random variables, X1, X2, …, Xn, each with mean µ
and std. deviation σ,
then, for large enough n (typically n≥30), Sn is
approximately Normally distributed with
parameters: $\mu_{S_n} = n\mu$ and $\sigma_{S_n} = \sqrt{n}\,\sigma$
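The parameters above can be checked numerically. The following is a minimal sketch, assuming a fair six-sided die as the underlying distribution (a hypothetical choice, not from the slides); the helper names `pmf_of_sum` and `mean_sd` are also illustrative. It builds the exact distribution of the sum by repeated convolution, then compares one cumulative probability against the Normal curve (using a +0.5 continuity correction, which is an added assumption).

```python
import math
from statistics import NormalDist

def pmf_of_sum(pmf, n):
    """Exact pmf of the sum of n i.i.d. copies of a discrete r.v. given as {value: prob}."""
    total = {0: 1.0}
    for _ in range(n):
        nxt = {}
        for s, ps in total.items():
            for x, px in pmf.items():
                nxt[s + x] = nxt.get(s + x, 0.0) + ps * px
        total = nxt
    return total

def mean_sd(pmf):
    """Mean and standard deviation of a discrete pmf."""
    mu = sum(x * p for x, p in pmf.items())
    var = sum((x - mu) ** 2 * p for x, p in pmf.items())
    return mu, math.sqrt(var)

# Hypothetical single-draw distribution: a fair six-sided die.
die = {face: 1 / 6 for face in range(1, 7)}
mu, sigma = mean_sd(die)

n = 30
sum_pmf = pmf_of_sum(die, n)
mu_sn, sigma_sn = mean_sd(sum_pmf)

print(mu_sn, n * mu)                      # mean of the sum vs. n * mu
print(sigma_sn, math.sqrt(n) * sigma)     # sd of the sum vs. sqrt(n) * sigma

# Compare P(Sn <= 110) with the Normal approximation (continuity-corrected).
exact_cdf = sum(p for s, p in sum_pmf.items() if s <= 110)
approx_cdf = NormalDist(mu_sn, sigma_sn).cdf(110.5)
print(exact_cdf, approx_cdf)
```

The standard deviation of the sum grows with the square root of n, not with n itself, which is why the two printed sd values agree.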
[Figure: probability distribution plots, including panels for p=.8, n=10 and p=.8, n=25]
15.063 Summer 2003
Using the Normal Approximation to the Binomial…
If r.v. X is Binomial (n, p), with
E(X) = np and VAR(X) = np(1-p),
we can use a Normal r.v. Y with mean np and variance
np(1-p) to calculate probabilities for r.v. X (i.e., the
binomial).
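As a rough illustration of the approximation, the sketch below compares an exact binomial probability with the value from a Normal r.v. with mean np and variance np(1-p). The function names, the cutoff k, and the +0.5 continuity correction are assumptions introduced here, not part of the slides.

```python
import math
from statistics import NormalDist

def binom_cdf(k, n, p):
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def normal_approx_cdf(k, n, p):
    """Approximate P(X <= k) using Y ~ Normal(np, np(1-p)).

    The +0.5 continuity correction is an added assumption.
    """
    y = NormalDist(mu=n * p, sigma=math.sqrt(n * p * (1 - p)))
    return y.cdf(k + 0.5)

# Illustrative values; k is a hypothetical cutoff.
n, p, k = 25, 0.8, 18
exact = binom_cdf(k, n, p)
approx = normal_approx_cdf(k, n, p)
print(f"exact={exact:.4f}  approx={approx:.4f}")
```

Rerunning with a smaller n or a p closer to 0 or 1 should show the two values drifting apart, which is the situation the large-enough-n condition guards against.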
The approximation is good if n is large enough for the
given p, i.e., it must pass the following test:
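$\bar{X} = \frac{\sum X}{n}$
$\bar{X} \pm Z \frac{\sigma}{\sqrt{n}}$ or
$\bar{X} - Z \frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + Z \frac{\sigma}{\sqrt{n}}$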
Idea: If we take a large enough random sample (i.e., n≥30) for r.v. X
(i.e., the population of interest), then we can estimate the mean, µ, for
r.v. X even if we do not know the distribution of X. Note: use the sample
SD, s, if the population SD, σ, is not known:
$S^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$, $S = \sqrt{S^2}$
More on s vs. σ later
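The sample SD calculation can be sketched as follows. The data values and the function name `sample_variance` are hypothetical; the result is cross-checked against `statistics.variance`, which uses the same n−1 divisor.

```python
import math
import statistics

def sample_variance(xs):
    """Sample variance S^2 with the n - 1 divisor."""
    n = len(xs)
    x_bar = sum(xs) / n
    return sum((x - x_bar) ** 2 for x in xs) / (n - 1)

data = [12.0, 15.0, 11.0, 14.0, 13.0]   # hypothetical observations
s2 = sample_variance(data)
s = math.sqrt(s2)

print(s2)                          # 2.5
print(statistics.variance(data))   # same value from the standard library
print(s)
```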
The value of z is determined by the confidence level assigned to the
interval (see next slide)
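One way to obtain z for a given confidence level is to invert the standard Normal CDF. This is a minimal sketch assuming a two-sided interval; the function name `z_for_confidence` is illustrative.

```python
from statistics import NormalDist

def z_for_confidence(level):
    """Two-sided z value: leaves (1 - level) / 2 in each tail."""
    alpha = 1 - level
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.90, 0.95, 0.99):
    print(level, round(z_for_confidence(level), 3))
```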
We would, for example, say that we are 95% confident the true
mean for X falls in the interval:
$\bar{X} - 1.96 \frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + 1.96 \frac{\sigma}{\sqrt{n}}$
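The interval above might be computed as in the sketch below. The sample values and the known population SD of 2.0 are made up for illustration, and `mean_confidence_interval` is a hypothetical helper name.

```python
import math

def mean_confidence_interval(xs, sigma, z=1.96):
    """Interval x_bar +/- z * sigma / sqrt(n) for the population mean."""
    n = len(xs)
    x_bar = sum(xs) / n
    half_width = z * sigma / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width

# Hypothetical sample of n = 35 observations (n >= 30).
data = [50 + (i % 7) for i in range(35)]
sigma = 2.0   # assumed known population SD; substitute s if it is unknown

low, high = mean_confidence_interval(data, sigma)
print(f"95% CI for the mean: [{low:.3f}, {high:.3f}]")
```

Because the half-width shrinks with the square root of n, quadrupling the sample size only halves the width of the interval.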
Low___________ High____________
315,889
Overconfidence
Respondent     Topic             Target   Result
Kellogg MBAs   Starting salary   49%      85%