
8 Stat Rec

statistica

Uploaded by

Alice Rossi
Copyright
© © All Rights Reserved

How to use CLT

An insurance company has 10,000 automobile policyholders. If the


expected yearly claim per policyholder is 260 with a standard
deviation of 800, approximate the probability that the total yearly
claim exceeds 2.8 million.
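Let $C_i$ be the claim for the $i$-th customer, with $E[C_i] = 260$ and $V[C_i] = 800^2$, and let
$$S = \sum_{i=1}^{10000} C_i$$
be the sum of all claims.
We can use the CLT to approximate the distribution of $S$:
$$S \overset{\text{approx}}{\sim} N\left(10000 \cdot 260,\; 10000 \cdot 800^2\right),$$
that is, $E[S] = 2\,600\,000$ and $\sqrt{V[S]} = 100 \cdot 800 = 80\,000$. Then
$$P(S > 2\,800\,000) = P\left(\frac{S - 2\,600\,000}{80\,000} > \frac{2\,800\,000 - 2\,600\,000}{80\,000}\right) \approx P(Z > 2.5)$$
$$= 1 - \Phi(2.5) = 1 - 0.9938 = 0.0062$$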
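The calculation above can be checked numerically. This is a minimal sketch using only the Python standard library; the variable names are illustrative, and it assumes, as the CLT argument does, that the individual claims are independent.

```python
from math import sqrt
from statistics import NormalDist

n = 10_000             # number of policyholders
mean_claim = 260       # expected yearly claim per policyholder
sd_claim = 800         # standard deviation of a single claim
threshold = 2_800_000  # total claim level of interest

# CLT: S is approximately N(n * mean, n * sd^2), assuming independent claims
mean_total = n * mean_claim
sd_total = sqrt(n) * sd_claim

z = (threshold - mean_total) / sd_total
p_exceeds = 1 - NormalDist().cdf(z)

print(f"z = {z:.2f}, P(S > {threshold}) is approximately {p_exceeds:.4f}")
```

The standardized value is 2.5, so the result agrees with the table lookup $1 - \Phi(2.5)$.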
Example
Darth Vader wants to measure the distance between the Death
Star and Tatooine. However, due to atmospheric disturbances,
measurements will not yield the exact distance d. As a result,
Vader has decided to make a series of 36 measurements and then
use their average value as an estimate of the actual distance.
Assume that the values of the successive measurements are
independent random variables with a mean of d light years and a
standard deviation of 2 light years.
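• Approximate the probability that the estimated value of the
distance will be within 0.5 light-years from d.

Let $X_1, \dots, X_{36}$ be the measurements, with $E[X_i] = d$ and $V[X_i] = 4$.
$\bar{X} = \frac{1}{36}\sum_{i=1}^{36} X_i$ is the estimate of the distance.
$n = 36$ is large enough, meaning that the CLT can be applied:
$$P(|\bar{X} - d| \le 0.5) = P\left(\frac{|\bar{X} - d|}{2/\sqrt{36}} \le \frac{0.5}{2/\sqrt{36}}\right) \approx P(|Z| \le 1.5)$$
$$= P(-1.5 \le Z \le 1.5) = \Phi(1.5) - \Phi(-1.5) = \Phi(1.5) - (1 - \Phi(1.5)) = 2\Phi(1.5) - 1$$
$$= 2 \cdot 0.9332 - 1 = 0.8664$$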
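As a quick numerical check of the probability that the average of 36 measurements falls within 0.5 light years of d, here is a small sketch with the Python standard library (variable names are illustrative, not from the slides):

```python
from math import sqrt
from statistics import NormalDist

n = 36        # number of measurements
sigma = 2.0   # standard deviation of one measurement (light years)
tol = 0.5     # required accuracy (light years)

# Standardize the tolerance with the standard deviation of the sample mean
z = tol / (sigma / sqrt(n))

# P(|Z| <= z) = 2 * Phi(z) - 1
p_within = 2 * NormalDist().cdf(z) - 1

print(f"z = {z:.2f}, P(|mean - d| <= {tol}) is approximately {p_within:.4f}")
```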
Example (continued)
In the same setting (independent measurements with a mean of d light years and a standard deviation of 2 light years):
• How many measurements does Vader need in order to be at least
95% certain that his estimate is accurate to within 0.5 light
years?

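We now consider $n$ measurements, with $n$ generic, so that $\bar{X} \overset{\text{approx}}{\sim} N(d, 4/n)$. We need
$$0.95 \le P(|\bar{X} - d| \le 0.5) = P\left(\frac{|\bar{X} - d|}{2/\sqrt{n}} \le \frac{0.5}{2/\sqrt{n}}\right) \approx P\left(|Z| \le \frac{\sqrt{n}}{4}\right) = 2\Phi\left(\frac{\sqrt{n}}{4}\right) - 1$$
$$0.95 \le 2\Phi\left(\frac{\sqrt{n}}{4}\right) - 1 \iff \Phi\left(\frac{\sqrt{n}}{4}\right) \ge \frac{1.95}{2} = 0.975 \iff \frac{\sqrt{n}}{4} \ge z_{0.025} = 1.96$$
$$\sqrt{n} \ge 4 \cdot 1.96 = 7.84 \iff n \ge 7.84^2 \approx 61.47$$
so at least 62 measurements are needed.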

How large is “large n”?
How large n should be to have a good approximation depends on
the shape of the population distribution. According to the textbook:

A general rule of thumb is that you can be confident of the normal
approximation whenever the sample size n is at least 30.
In most cases the normal approximation is valid for much smaller
sample sizes.
Indeed, usually a sample size of 5 will suffice for the approximation
to be valid.

(I would be a little more cautious about this last statement)
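To see why some caution is reasonable, here is a small sketch for one skewed population. It assumes an Exponential(1) population, which is a choice made for this illustration and not taken from the slides. For that population the sum of n observations has an Erlang distribution, so the exact tail probability of the sample mean is available in closed form and can be compared with the CLT value.

```python
from math import exp, sqrt
from statistics import NormalDist

def exact_upper_tail(n, x):
    """P(mean of n iid Exponential(1) variables > x).

    The sum is Erlang(n, 1), so P(sum > t) = sum over k < n of e^{-t} t^k / k!.
    """
    t = n * x
    term, total = exp(-t), 0.0
    for k in range(n):
        total += term
        term *= t / (k + 1)
    return total

# CLT value of P(mean > mu + 2 standard deviations), the same for every n
normal_tail = 1 - NormalDist().cdf(2.0)

tails = {}
for n in (5, 30, 100):
    x = 1 + 2 / sqrt(n)  # population mean 1, sd of the sample mean 1 / sqrt(n)
    tails[n] = exact_upper_tail(n, x)
    print(f"n = {n:3d}: exact = {tails[n]:.4f}, normal approximation = {normal_tail:.4f}")
```

For this population the exact tail probability at n = 5 is noticeably larger than the normal value, and the gap shrinks as n grows. A symmetric population would behave much better at small n.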

NOTE: If the population is Normal, then X̄ is normal for all n
(this is not an approximation from the CLT; it is due to the additive property of normal random variables).
Normal approximation to the Binomial distribution
One of the first important applications of the CLT was related to
Binomial random variables.
We know that a Binomial random variable X with parameters
(n, p) can be expressed as a sum of n independent random variables
$$X = E_1 + E_2 + \cdots + E_n$$
with $E[E_i] = p$ and $V[E_i] = p(1 - p)$.

As a consequence of the CLT, when n is large X is approximately normal
with expectation $np$ and variance $np(1 - p)$.
Equivalently,
$$\frac{X - np}{\sqrt{np(1 - p)}} \quad \text{and} \quad \frac{X/n - p}{\sqrt{p(1 - p)/n}}$$
are approximately standard normal.
Problem
Suppose that 60 percent of the residents of a city are in favor of
teaching evolution in high school.
1. Determine the expected value and standard deviation of the
proportion of a random sample of size n that is in favor when
n = 10, n = 100, n = 1000, n = 10000

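Let $X$ be the number of successes in the sample, so that $X/n$ is the sample proportion. With $p = 0.6$,
$$E[X] = np, \qquad V[X] = np(1 - p)$$
$$E\left[\frac{X}{n}\right] = \frac{1}{n} \cdot np = p, \qquad V\left[\frac{X}{n}\right] = \frac{1}{n^2} \cdot np(1 - p) = \frac{p(1 - p)}{n}$$
The expectation of the sample proportion is $p = 0.6$ for any possible $n$.
The variance of the sample proportion is $\frac{p(1 - p)}{n} = \frac{0.24}{n}$, which becomes smaller and smaller as $n$ increases.
Of course, the standard deviation is $\sqrt{\text{variance}}$.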
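The four requested standard deviations follow directly from $\sqrt{p(1 - p)/n}$. A minimal sketch that tabulates them (the dictionary name `sd_by_n` is illustrative):

```python
from math import sqrt

p = 0.6                          # proportion of residents in favor
sizes = (10, 100, 1000, 10000)   # sample sizes from the problem

# Standard deviation of the sample proportion X / n for each sample size
sd_by_n = {n: sqrt(p * (1 - p) / n) for n in sizes}

for n, sd in sd_by_n.items():
    print(f"n = {n:5d}: E[X/n] = {p}, sd[X/n] = {sd:.5f}")
```

Each tenfold increase in n divides the standard deviation by $\sqrt{10}$.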


Problem (continued)
In the same setting (60 percent of the residents in favor):
2. Find the probability that over 55 percent of the members of
the sample are in favor of the proposal if the sample size is
n = 10, n = 100, n = 1000, n = 10000

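For $n = 10$:
$$P\left(\frac{X}{n} > 0.55\right) = P(X > 5.5) = P(X = 6) + P(X = 7) + \cdots + P(X = 10) = \sum_{k=6}^{10} \binom{10}{k}\, 0.6^k\, 0.4^{10-k}$$
or, in R, sum dbinom(k, size = 10, prob = 0.6) over k = 6, ..., 10.

For $n = 100$:
$$P\left(\frac{X}{n} > 0.55\right) = P(X > 55) = \sum_{k=56}^{100} \binom{100}{k}\, 0.6^k\, 0.4^{100-k}$$
or we can use the normal approximation:
$$P\left(\frac{X}{n} > 0.55\right) = P\left(\frac{X/n - 0.6}{\sqrt{0.24/100}} > \frac{0.55 - 0.6}{\sqrt{0.24/100}}\right) \approx P(Z > -1.02) = \Phi(1.02) \approx 0.846$$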
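The same two routes, exact binomial sum and normal approximation without continuity correction, can be evaluated for all four sample sizes. This is a sketch under stated assumptions: the function names are illustrative, and the binomial terms are computed on the log scale with `math.lgamma` so that n = 10000 does not overflow.

```python
from math import exp, lgamma, log, sqrt
from statistics import NormalDist

p = 0.6
cutoff_percent = 55   # "over 55 percent"

def binom_pmf(k, n):
    """P(X = k) for X ~ Binomial(n, p), computed on the log scale."""
    log_pmf = (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
               + k * log(p) + (n - k) * log(1 - p))
    return exp(log_pmf)

def exact_tail(n):
    """P(X / n > 0.55), summing the binomial terms strictly above the cutoff."""
    k_min = (cutoff_percent * n) // 100 + 1   # smallest count strictly above 55% of n
    return sum(binom_pmf(k, n) for k in range(k_min, n + 1))

def normal_tail(n):
    """Normal approximation to P(X / n > 0.55), no continuity correction."""
    z = (cutoff_percent / 100 - p) / sqrt(p * (1 - p) / n)
    return 1 - NormalDist().cdf(z)

for n in (10, 100, 1000, 10000):
    print(f"n = {n:5d}: exact = {exact_tail(n):.4f}, normal = {normal_tail(n):.4f}")
```

Both columns approach 1 as n grows, because the standard deviation of the sample proportion shrinks while the gap between 0.6 and 0.55 stays fixed.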
Continuity correction for Normal approx to Binomial
When using Normal approximation to Binomial, note that:
since the normal is a continuous random variable,
P(X = i) would always be approximated as 0
even if it’s strictly positive (because the Binomial is discrete).


To overcome this problem, it is best to compute
$$P(X = i) = P(i - 0.5 < X \le i + 0.5)$$
This is called the continuity correction.

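Example
Suppose for a Binomial (n = 100, p = 0.40) you need to
approximate $P(35 \le X \le 40)$:
$$P(35 \le X \le 40) = P(34.5 \le X \le 40.5)$$
$$= P\left(\frac{34.5 - 40}{\sqrt{24}} \le \frac{X - np}{\sqrt{np(1 - p)}} \le \frac{40.5 - 40}{\sqrt{24}}\right)$$
$$\approx P(-1.12 \le Z \le 0.10)$$
$$= \Phi(0.10) - \Phi(-1.12)$$
$$= \Phi(0.10) - (1 - \Phi(1.12))$$
$$= 0.5398 - (1 - 0.8686) = 0.4084$$

The interval from 34.5 to 40.5 comes from applying the correction to each term:
$$P(35 \le X \le 40) = P(X = 35) + P(X = 36) + \cdots + P(X = 40)$$
$$\approx P(34.5 \le X \le 35.5) + P(35.5 \le X \le 36.5) + \cdots + P(39.5 \le X \le 40.5)$$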
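To see what the correction buys, the sketch below compares the exact binomial probability with the normal approximation, with and without the half-unit adjustment. It uses only the Python standard library, and the variable names are illustrative.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 100, 0.40
lo, hi = 35, 40

# Exact probability: sum of the binomial terms from 35 to 40
exact = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(lo, hi + 1))

# Normal distribution with the same mean and variance as the binomial
approx = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))

corrected = approx.cdf(hi + 0.5) - approx.cdf(lo - 0.5)   # with continuity correction
uncorrected = approx.cdf(hi) - approx.cdf(lo)             # without it

print(f"exact = {exact:.4f}, corrected = {corrected:.4f}, uncorrected = {uncorrected:.4f}")
```

The corrected value comes out near 0.41. The small difference from 0.4084 arises because the worked example rounds the standardized limits to two decimals before using the table.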
Summary of sample mean properties
No matter what the population distribution is, denote
$\mu$ = the population expectation
$\sigma^2$ = the population variance
then the sample mean $\bar{X} = \frac{X_1 + \cdots + X_n}{n}$ will have
• $E[\bar{X}] = \mu$
• $V[\bar{X}] = \frac{\sigma^2}{n}$
• CLT: for large n, the distribution is approximately normal

Expectation of the sample variance $S^2$
Remember: $S^2 = \frac{1}{n - 1}\sum_{i=1}^{n} (X_i - \bar{X})^2$

It is possible to prove that $E[S^2] = \sigma^2$

(no time to do it in class, if interested see textbook for details)

this explains the denominator $n - 1$

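The identity $E[S^2] = \sigma^2$ can be verified exactly on a toy case. The sketch below assumes a hypothetical population with three equally likely values and enumerates every possible sample of size 3 drawn with replacement, using exact fractions so that no rounding is involved.

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 6]   # hypothetical population, each value equally likely
n = 3                    # sample size

mu = Fraction(sum(population), len(population))
sigma2 = sum((x - mu) ** 2 for x in population) / len(population)

def sum_sq_dev(sample):
    """Sum of squared deviations of the sample from its own mean."""
    xbar = Fraction(sum(sample), len(sample))
    return sum((x - xbar) ** 2 for x in sample)

# All iid samples of size n are equally likely, so expectations are plain averages
samples = list(product(population, repeat=n))
mean_sum_sq = sum(sum_sq_dev(s) for s in samples) / len(samples)

mean_s2 = mean_sum_sq / (n - 1)   # expectation of S^2 (denominator n - 1)
mean_biased = mean_sum_sq / n     # expectation with denominator n instead

print(sigma2, mean_s2, mean_biased)
```

Dividing by $n - 1$ reproduces $\sigma^2$ exactly, while dividing by $n$ gives $\frac{n-1}{n}\sigma^2$, which is too small.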
Sampling from a normal population
When the population is normally distributed,
• We have seen that the sample mean $\bar{X}$ is normal for all n:
$\frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ is standard normal for all n

• Now we discuss a result that lets us obtain probabilities
for the sample variance $S^2 = \frac{1}{n - 1}\sum_{i=1}^{n} (X_i - \bar{X})^2$:
$\frac{(n - 1)S^2}{\sigma^2}$ has a $\chi^2_{n-1}$ distribution

xi
t.fi
f

NOTE EI 22 YESTERDAY
• Rather counterintuitive, but important: $\bar{X}$ and $S^2$ are
independent

Problem
1. The following data sets come from normal populations whose
standard deviation is specified. In each case, determine the
value of a statistic whose distribution is chi-squared, and tell
how many degrees of freedom this distribution has.
(a) 104, 110, 100, 98, 106; σ = 4
(b) 1.2, 1.6, 2.0, 1.5, 1.3, 1.8; σ = 0.5
(c) 12.4, 14.0, 16.0; σ = 2.4
2. Explain why a chi-squared random variable having n degrees
of freedom will approximately have the distribution of a
normal random variable when n is large.
Hint: Use the central limit theorem.

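1. Since the populations are normal, the statistic
$$\frac{(n - 1)S^2}{\sigma^2} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{\sigma^2}$$
has a $\chi^2_{n-1}$ distribution. For (a), $n = 5$ and $\bar{x} = 103.6$, so the observed value of that statistic is
$$\frac{(104 - 103.6)^2 + (110 - 103.6)^2 + (100 - 103.6)^2 + (98 - 103.6)^2 + (106 - 103.6)^2}{4^2} = \frac{91.2}{16} = 5.7$$
with $n - 1 = 4$ degrees of freedom.

2. Let $Z_1, \dots, Z_n$ be independent standard normal random variables. A $\chi^2_n$ random variable can be written as
$$Z_1^2 + Z_2^2 + \cdots + Z_n^2 = Y_1 + Y_2 + \cdots + Y_n, \qquad Y_i = Z_i^2 \sim \chi^2_1,$$
which is a sum of independent, identically distributed random variables, so we can apply the CLT when $n$ is large.
Hence, when $n$ is large enough, a $\chi^2_n$ is approximately normal with some parameters $\mu$ and $\sigma^2$.
SPOILER: $N(n, 2n)$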
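The statistic $\sum (x_i - \bar{x})^2 / \sigma^2$ from part 1 can be evaluated for all three data sets in one pass. A minimal sketch with illustrative names (`datasets`, `chi2_statistic`), using only the Python standard library:

```python
from statistics import fmean

# Data sets (a), (b), (c) with their specified population standard deviations
datasets = {
    "a": ([104, 110, 100, 98, 106], 4.0),
    "b": ([1.2, 1.6, 2.0, 1.5, 1.3, 1.8], 0.5),
    "c": ([12.4, 14.0, 16.0], 2.4),
}

def chi2_statistic(data, sigma):
    """Return ((n - 1) S^2 / sigma^2, degrees of freedom) for one data set."""
    xbar = fmean(data)
    stat = sum((x - xbar) ** 2 for x in data) / sigma ** 2
    return stat, len(data) - 1

results = {label: chi2_statistic(data, sigma) for label, (data, sigma) in datasets.items()}

for label, (stat, df) in results.items():
    print(f"({label}) statistic = {stat:.4f}, degrees of freedom = {df}")
```

In each case the degrees of freedom are one less than the number of observations.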
The t distribution
If we standardize the sample mean using the sample variance (instead
of the population variance),
$$\frac{\bar{X} - \mu}{S/\sqrt{n}}$$
is no longer normal.
It is said to have a t distribution with $n - 1$ degrees of freedom
($T_{n-1}$).
The density function of a t looks similar to a standard normal
density, although it is somewhat more spread out, resulting in its
having “larger tails”.

As the degrees of freedom parameter increases, the density becomes
more and more similar to the standard normal density (see picture
next slide).

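The "larger tails" claim can be made concrete by evaluating both densities at a point far from zero. The sketch below writes out the standard closed-form t density with `math.gamma`; the function names are illustrative, and the choice of the point x = 3 is arbitrary.

```python
from math import exp, gamma, pi, sqrt

def t_pdf(x, d):
    """Density of the t distribution with d degrees of freedom."""
    c = gamma((d + 1) / 2) / (sqrt(d * pi) * gamma(d / 2))
    return c * (1 + x * x / d) ** (-(d + 1) / 2)

def normal_pdf(x):
    """Density of the standard normal distribution."""
    return exp(-x * x / 2) / sqrt(2 * pi)

x = 3.0   # a point in the tail
for d in (2, 5, 30, 100):
    print(f"d = {d:3d}: t density = {t_pdf(x, d):.5f}, normal density = {normal_pdf(x):.5f}")
```

At x = 3 the t density is well above the normal density for small d and moves toward it as d increases.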
[Figure: plot of standard normal and t densities]

Quantiles of the t distribution
If $T_d$ is a t random variable with d degrees of freedom, its
$100(1 - \alpha)$ percentile is
$t_{d,\alpha}$ such that $P(T_d > t_{d,\alpha}) = \alpha$
(same concept as $z_\alpha$ for the standard normal)

Table in the appendix of the textbook.
