
Sampling Distributions of Estimators

Rohini Somanathan


What is a sampling distribution?

Since estimators are statistics, they are random variables with their own probability distributions

We call these sampling distributions because they are induced by the sample and determined by the joint
distribution of the sample

The sampling distribution of an estimator based on a random sample is determined by F (the population distribution
from which the sample is drawn) and the sample size n

If F is unknown, then we cannot know the exact sampling distribution

For many estimators, we may have asymptotic distributions (large-sample approximations)

Given a sampling distribution, we can


– make appropriate trade-offs between sample size and the precision of our estimator, since sampling distributions
depend on sample size.
– obtain interval estimates rather than point estimates once we have a sample: an interval estimate is a
random interval such that the true parameter lies within it with a given probability (say 95%).
– choose between two estimators: we can, for instance, calculate the mean-squared error of each estimator,
Eθ [(θ̂ − θ)²], using the distribution of θ̂.
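The last point can be illustrated with a short Monte Carlo sketch (not from the notes; the estimator names and parameter values are chosen for illustration), comparing the MSE of two estimators of θ for a Uniform[0, θ] sample: twice the sample mean versus the sample maximum.

```python
import random

# Monte Carlo MSE comparison of two estimators of theta for a
# Uniform[0, theta] sample: 2 * sample mean (method of moments)
# versus the sample maximum (the MLE, biased downward).
random.seed(0)
theta, n, reps = 1.0, 20, 20000

mse_mom, mse_mle = 0.0, 0.0
for _ in range(reps):
    x = [random.uniform(0, theta) for _ in range(n)]
    mom = 2 * sum(x) / n          # method-of-moments estimator
    mle = max(x)                  # sample maximum
    mse_mom += (mom - theta) ** 2
    mse_mle += (mle - theta) ** 2

mse_mom /= reps
mse_mle /= reps
# Theory: MSE(2*Xbar) = theta^2/(3n) ~ 0.0167, while
# MSE(max) = 2*theta^2/((n+1)(n+2)) ~ 0.0043, so the maximum wins here.
print(mse_mom, mse_mle)
```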


Application: sample size and precision

Examples:
1. Suppose Xi ∼ N(θ, 4) and we want E(X̄n − θ)² ≤ .1. This is simply the variance of X̄n , and we know
X̄n ∼ N(θ, 4/n), so 4/n ≤ .1 requires n ≥ 40.
2. Consider a random sample of size n from a Uniform distribution on [0, θ], and the statistic
U = max{X1 , . . . , Xn }. The CDF of U is given by:
F(u) = 0          if u ≤ 0
F(u) = (u/θ)^n    if 0 < u < θ
F(u) = 1          if u ≥ θ

We can now use this to see how large our sample must be if we want a certain level of precision in our estimate
for θ. Suppose we want the probability that our estimate lies within .1θ for any level of θ to be bigger than 0.95:

Pr(|U − θ| ≤ .1θ) = Pr(θ − U ≤ .1θ) = Pr(U ≥ .9θ) = 1 − F(.9θ) = 1 − 0.9^n

We want this to be bigger than 0.95, or 0.9^n ≤ 0.05. Since the LHS is decreasing in n, we need
n ≥ log(.05)/log(.9) = 28.43. Our minimum sample size is therefore 29.
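This boundary can be checked directly (a quick sketch, not part of the notes): n = 28 just misses the target probability while n = 29 reaches it.

```python
import math

# Smallest n with 1 - 0.9**n >= 0.95, i.e. 0.9**n <= 0.05.
n_min = math.ceil(math.log(0.05) / math.log(0.9))
print(n_min)                       # the minimum sample size

# Check the boundary directly: n = 28 falls short, n = 29 suffices.
print(1 - 0.9**28, 1 - 0.9**29)
```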


Special Sampling Distributions


A few distributions are essential for statistical inference

Chi-square: The Chi-square distribution with n degrees of freedom (χ²n ) is in the family of Gamma distributions.

Student's t: This is the ratio of a standard Normal to the square root of an independent χ²n variable divided by its degrees of freedom.

F-distribution: This is the ratio of two independent chi-square variables (χ²m and χ²n ), each divided by its degrees of freedom.

Normal: We are already familiar with this

We will now define these and connect them with each other.


Joint distribution of sample mean and sample variance

For a random sample from a Normal distribution, the MLEs are µ̂ = X̄n and σ̂² = (1/n) Σ_{i=1}^{n} (Xi − X̄n )².

Theorem: If X1 , . . . , Xn form a random sample from a normal distribution with mean µ and variance σ², then
the sample mean X̄n and the sample variance (1/n) Σ_{i=1}^{n} (Xi − X̄n )² are independent random variables, and

X̄n ∼ N(µ, σ²/n)

Σ_{i=1}^{n} (Xi − X̄n )² / σ² ∼ χ²_{n−1}

Note: This is only for normal samples.


The t-distribution
Let Z ∼ N(0, 1), let Y ∼ χ²v , and let Z and Y be independent random variables. Then

X = Z / √(Y/v) ∼ tv

Features of the t-distribution:

The t-density is symmetric with a maximum value at x = 0.

The shape of the density is similar to that of the standard normal (bell-shaped) but with fatter tails.
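The construction above can be simulated directly (a sketch, not from the notes; v = 5 is an arbitrary choice): building t variates as Z/√(Y/v) gives draws that are centered at 0 with variance v/(v − 2), larger than the standard normal's 1, reflecting the fatter tails.

```python
import random
import statistics

# Build t_5 variates from the definition X = Z / sqrt(Y/v), with Z
# standard normal and Y a chi-square with v = 5 df (a sum of 5
# squared standard normals), Z and Y independent.
random.seed(2)
v, reps = 5, 50000

draws = []
for _ in range(reps):
    z = random.gauss(0, 1)
    y = sum(random.gauss(0, 1) ** 2 for _ in range(v))
    draws.append(z / (y / v) ** 0.5)

# Symmetric around 0, with variance v/(v-2) = 5/3 > 1.
print(statistics.mean(draws))
print(statistics.variance(draws))
```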


Relation to random normal samples


RESULT 1: Define S²n = Σ_{i=1}^{n} (Xi − X̄n )². The random variable

U = √n (X̄n − µ) / √(S²n/(n − 1)) ∼ t_{n−1}


Proof: We know that √n (X̄n − µ)/σ ∼ N(0, 1) and that S²n/σ² ∼ χ²_{n−1}. Dividing the first random variable by the square
root of the second divided by its degrees of freedom, the σ in the numerator and denominator cancels and we obtain U.

Implication: We cannot make statements about |X̄n − µ| using the normal distribution if σ² is unknown. This result
allows us to use the estimate σ̂² = Σ_{i=1}^{n} (Xi − X̄n )²/n, since (X̄n − µ)/(σ̂/√(n − 1)) ∼ t_{n−1}

RESULT 2: As n → ∞, U → Z ∼ N(0, 1)

To see why: U can be written as √((n−1)/n) · √n (X̄n − µ)/σ̂ ∼ t_{n−1}. As n gets large, σ̂ gets very close to σ and √((n−1)/n) is close
to 1.

F⁻¹(.55) = .129 for t10 , .127 for t20 and .126 for the standard normal distribution. The differences between these
values increase for higher values of their distribution functions (why?)
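These quantiles can be reproduced numerically (a sketch assuming SciPy is available; the notes themselves use Stata):

```python
from scipy.stats import norm, t

# The .55 quantiles quoted above: the t quantile shrinks toward the
# normal quantile as the degrees of freedom grow.
q10 = t.ppf(0.55, 10)
q20 = t.ppf(0.55, 20)
qn = norm.ppf(0.55)
print(round(q10, 3), round(q20, 3), round(qn, 3))

# Further into the tail the gap is much wider, e.g. at the .975 quantile:
print(round(t.ppf(0.975, 10), 3), round(norm.ppf(0.975), 3))
```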


Interval estimates and confidence intervals for µ

Given σ2 , let us see how we can obtain an interval estimate for µ, i.e. an interval which is likely to contain µ with a
pre-specified probability.
Since (X̄n − µ)/(σ/√n) ∼ N(0, 1), Pr(−2 < (X̄n − µ)/(σ/√n) < 2) = .955

But this event is equivalent to the events −2σ/√n < X̄n − µ < 2σ/√n and X̄n − 2σ/√n < µ < X̄n + 2σ/√n

With known σ, each of the random variables X̄n − 2σ/√n and X̄n + 2σ/√n is a statistic. We have therefore
derived a random interval within which the population parameter lies with probability .955, i.e.

Pr(X̄n − 2σ/√n < µ < X̄n + 2σ/√n) = .955 = γ

Notice that there are many intervals for the same γ; this is the shortest one.

Now, given our sample, our statistics take particular values and the resulting interval either contains or does not
contain µ. We can therefore no longer talk about the probability that it contains µ because the experiment has
already been performed.
We say that (x̄n − 2σ/√n, x̄n + 2σ/√n) is a 95.5% confidence interval for µ. Alternatively, we may say that µ
lies in the above interval with confidence γ, or that the above interval is a confidence interval for µ with
confidence coefficient γ
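The "random interval" interpretation can be checked by simulation (a sketch, not from the notes; µ, σ and n are arbitrary): across repeated samples, the interval X̄n ± 2σ/√n covers µ about 95.5% of the time.

```python
import random

# Coverage check for the random interval (Xbar - 2*sigma/sqrt(n),
# Xbar + 2*sigma/sqrt(n)) with sigma known.
random.seed(3)
mu, sigma, n, reps = 10.0, 3.0, 25, 20000
half = 2 * sigma / n**0.5

hits = 0
for _ in range(reps):
    xbar = sum(random.gauss(mu, sigma) for _ in range(n)) / n
    if xbar - half < mu < xbar + half:
        hits += 1

print(hits / reps)   # close to .955
```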


Confidence Intervals for means: example 1

Using the exact distribution of X̄n

X1 , . . . , Xn form a random sample from a normal distribution with unknown µ and σ² = 10.

x̄n is found to be 7.164 with n = 40.

An 80% confidence interval for the mean µ is given by

(7.164 − 1.282·√(10/40), 7.164 + 1.282·√(10/40)) or (6.523, 7.805)

The confidence coefficient is .8

Stata command: display invnormal(.9)
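The same interval can be reproduced with the Python standard library (`statistics.NormalDist` plays the role of Stata's invnormal here):

```python
from statistics import NormalDist

# Reproduce the 80% interval: the quantile z_{.9} leaves 10% in each tail.
z = NormalDist().inv_cdf(0.9)         # ~ 1.2816 (Stata: invnormal(.9))
half = z * (10 / 40) ** 0.5           # sigma^2 = 10, n = 40
lo, hi = 7.164 - half, 7.164 + half
print(round(lo, 3), round(hi, 3))     # matches (6.523, 7.805)
```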


Confidence Intervals for means: example 2

Using the CLT approximation for the distribution of X̄n

Let X̄n denote the sample mean of a random sample of size 25 from a distribution
with variance 100 and mean µ. In this case, σ/√n = 2 and, making use of the central
limit theorem, the following statement is approximately true:

Pr(−1.96 < (X̄n − µ)/2 < 1.96) = .95 or Pr(X̄n − 3.92 < µ < X̄n + 3.92) = .95

If the sample mean is given by x̄n = 67.53, an approximate 95% confidence interval
for µ is given by (63.61, 71.45).


Confidence Intervals for means: example 3

Using the MLE for the variance and the t-distribution

Suppose we are interested in a confidence interval for the mean of a normal


distribution but do not know σ2 .

We know that

(X̄n − µ) / (σ̂/√(n − 1)) ∼ t_{n−1}

so we can use the t-distribution with (n − 1) degrees of freedom to construct our
interval estimate.

With n = 10, x̄n = 3.22, σ̂ = 1.17, a 95% confidence interval is given by

(3.22 − (2.262)(1.17)/√9, 3.22 + (2.262)(1.17)/√9) = (2.34, 4.10)

(display invt(9,.975) gives you 2.262)
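The arithmetic can be verified with a few lines of Python, reusing the quantile 2.262 quoted from Stata above:

```python
# Reproduce the t-based interval: n = 10, so 9 degrees of freedom.
n, xbar, sigma_hat = 10, 3.22, 1.17
c = 2.262                             # t_{.975, 9}, from invt(9,.975)
half = c * sigma_hat / (n - 1) ** 0.5
print(round(xbar - half, 2), round(xbar + half, 2))  # matches (2.34, 4.10)
```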


Confidence Intervals for differences in means


Consider X1 , . . . , Xn and Y1 , . . . , Ym where Xi ∼ N(µ1 , σ²) and Yi ∼ N(µ2 , σ²) are independent

Denote the sample means and (MLE) variances by X̄, Ȳ, σ̂²1 , σ̂²2 .

We know (using previous results) that:


X̄ and Ȳ are normally and independently distributed with means µ1 and µ2 and variances σ²/n and σ²/m

(X̄n − Ȳm ) ∼ N(µ1 − µ2 , σ²/n + σ²/m), so [(X̄n − Ȳm ) − (µ1 − µ2 )] / √(σ²/n + σ²/m) ∼ N(0, 1)

nσ̂²1/σ² ∼ χ²_{n−1} and mσ̂²2/σ² ∼ χ²_{m−1} , so their sum (nσ̂²1 + mσ̂²2 )/σ² ∼ χ²_{n+m−2} . Therefore

U = [(X̄n − Ȳm ) − (µ1 − µ2 )] / √[ (nσ̂²1 + mσ̂²2 )/(n + m − 2) · (1/n + 1/m) ] ∼ t_{n+m−2}

Denote the denominator of U by R.

Suppose we want a 95% confidence interval for the difference in the means:

Using the above t-distribution, we find a number b for which Pr(−b < U < b) = .95

The random interval ((X̄ − Ȳ) − bR, (X̄ − Ȳ) + bR) contains the true difference in means with 95% probability.

A confidence interval is now based on sample values, (x̄n − ȳm ) and corresponding sample variances.

Based on the CLT, we can use the same procedure even when our samples are not normal.
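The steps above can be sketched in code (not from the notes: the data are simulated purely for illustration, and the quantile b = 2.060 for 25 degrees of freedom is taken from standard t tables rather than computed):

```python
import random

# Pooled two-sample t interval for mu1 - mu2, assuming a common
# variance as in the derivation above.
random.seed(4)
n, m = 12, 15
x = [random.gauss(5.0, 2.0) for _ in range(n)]
y = [random.gauss(3.0, 2.0) for _ in range(m)]

xbar, ybar = sum(x) / n, sum(y) / m
sx2 = sum((v - xbar) ** 2 for v in x) / n   # MLE variances, as in the notes
sy2 = sum((v - ybar) ** 2 for v in y) / m

# R is the denominator of U; b is t_{.975} with n + m - 2 = 25 df.
R = ((n * sx2 + m * sy2) / (n + m - 2) * (1 / n + 1 / m)) ** 0.5
b = 2.060
lo, hi = (xbar - ybar) - b * R, (xbar - ybar) + b * R
print(lo, hi)    # a 95% interval for mu1 - mu2
```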

The F-distribution

RESULT: Let Y ∼ χ²m , Z ∼ χ²n , and let Y and Z be independent random variables. Then

F = (Y/m) / (Z/n) = nY/(mZ)

has an F-distribution with m and n degrees of freedom.


– The F-density is given by:

f(x) = [Γ((m+n)/2) · m^(m/2) · n^(n/2) · x^(m/2−1)] / [Γ(m/2) · Γ(n/2) · (mx + n)^((m+n)/2)] · I(0,∞)(x)

– m and n are referred to as the numerator and denominator degrees of freedom respectively.
– The F density is defined on the positive real numbers and is skewed to the right.
– From the definition of the F distribution (as a ratio of two scaled χ² variables), it follows that the
reciprocal of an F_{v1,v2} variable has an F_{v2,v1} distribution. So if X ∼ F(m, n), then 1/X ∼ F(n, m).
– The square of a random variable with a t distribution with n degrees of freedom has an F distribution with
(1, n) degrees of freedom.

The F test is used in many hypothesis testing problems (e.g. testing for equality of variances of two distributions)
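The last two bullet points can be verified at the quantile level (a sketch assuming SciPy is available; the degrees of freedom are arbitrary choices):

```python
from scipy.stats import f, t

# t_n squared is F(1, n): since P(T^2 <= c) = P(-sqrt(c) <= T <= sqrt(c)),
# the .975 t quantile squared equals the .95 F quantile.
assert abs(t.ppf(0.975, 10) ** 2 - f.ppf(0.95, 1, 10)) < 1e-6

# If X ~ F(m, n) then 1/X ~ F(n, m): the upper quantile of one is the
# reciprocal of the lower quantile of the other.
assert abs(f.ppf(0.95, 4, 8) - 1 / f.ppf(0.05, 8, 4)) < 1e-6

print(round(t.ppf(0.975, 10) ** 2, 4), round(f.ppf(0.95, 1, 10), 4))
```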
