
Lesson17n18 SampleMeanCLT

This document discusses the central limit theorem (CLT). It states that as the sample size increases, the sampling distribution of the sample mean approaches a normal distribution, even if the population is not normally distributed. It provides examples showing how the distribution of sample means becomes more normal-looking as the sample size increases from n=1 to n=5 to n=10 when sampling from an exponential distribution. The CLT is important because it allows us to use normal approximations even for non-normal populations when the sample size is large enough, typically n=30 or more.

Uploaded by

Nhi Hoàng

7.3 SAMPLE MEAN DISTRIBUTION
7.4 CENTRAL LIMIT THEOREM
REVIEW OF IMPORTANT RAND VAR DEFS & PROPS
Expectation: \(\mu = E[X] = \sum_{i=1}^{n} x_i \, P(X = x_i)\)

Variance: \(\sigma^2 = V[X] = E[(X - \mu)^2] = E[X^2] - \mu^2\)

Properties: \(E[cX] = c\,E[X]\), \(E[X + Y] = E[X] + E[Y]\), \(V[cX] = c^2\,V[X]\)

If X and Y are independent: \(V[X + Y] = V[X] + V[Y]\)


SECTION 7.3 SAMPLE MEAN DISTRIBUTION
Suppose that 100 registered voters in a county are selected for jury duty.
How many will show up when called upon? We can use that figure to get a proportion.
This is an example of a sample proportion, which is an example of a sample mean, where
the random variable is 0 if you don’t show up and 1 if you do.
Can we use this to estimate the proportion of all voters who would show up if called
upon?
DEFINITION OF THE SAMPLE MEAN DISTRIBUTION
If we sample n times from a population with mean μ and std dev σ, the sample
mean random variable is
\[\bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n} = \frac{1}{n}\sum_{i=1}^{n} X_i\]

Exercise: use the properties from the first slide to prove

1. \(E[\bar{X}] = \mu\)    2. \(V[\bar{X}] = \dfrac{\sigma^2}{n}\)

For result 2, if we take the square root of both sides, we get:

\[SD[\bar{X}] = \frac{\sigma}{\sqrt{n}}\]
This last expression is the standard error for the sample mean.
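One way to work the exercise is sketched below. It assumes the \(X_i\) are independent draws from the population, which the variance step needs; the expectation step does not.

```latex
\begin{align*}
E[\bar{X}] &= E\!\left[\frac{1}{n}\sum_{i=1}^{n} X_i\right]
            = \frac{1}{n}\sum_{i=1}^{n} E[X_i]
            = \frac{1}{n}\,(n\mu) = \mu \\[4pt]
V[\bar{X}] &= V\!\left[\frac{1}{n}\sum_{i=1}^{n} X_i\right]
            = \frac{1}{n^2}\sum_{i=1}^{n} V[X_i]
            = \frac{1}{n^2}\,(n\sigma^2) = \frac{\sigma^2}{n}
\end{align*}
```

The first line uses \(E[cX] = cE[X]\) and additivity of expectation; the second uses \(V[cX] = c^2 V[X]\) and additivity of variance for independent variables.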
GRAPHS OF SAMPLE MEAN DISTRIBUTIONS
Plots of sample mean distributions of
different sample sizes taken from the
standard normal.

As the sample size increases:


1. the mean remains where it is
2. the spread decreases
SAMPLE MEAN DISTRIBUTION OF A NON-
NORMAL, N=1 AND 2
Even if X is non-normal, the sample mean becomes more and more normal as n gets
large, a result known as the Central Limit Theorem.
• We illustrate by using a simple, discrete uniform distribution for \(X_i\).
• Even for n = 2, we see the bud of normal, namely a triangle.
SAMPLE MEAN DISTRIBUTION OF A NON-
NORMAL, N=3
We can model this experiment with a coin flip.
However, instead of heads and tails, or 1 and 0 as outcomes, we use 2 & 1.
If we flip the coin 3 times, there are 8 possible outcomes, each with equal probability: 111,
112, 121, 122, 211, 212, 221, 222
Grouping based on their sums:

Sum    Outcomes
3      111
4      112, 121, 211
5      122, 212, 221
6      222

Dividing the sums by 3 and counting to get a frequency table:

Mean       Outcomes         f
3/3 = 1    111              1
4/3        112, 121, 211    3
5/3        122, 212, 221    3
6/3 = 2    222              1
SAMPLE MEAN DISTRIBUTION OF A NON-NORMAL,
N=3
Dividing the frequencies by 8 to get probabilities and changing the vertical table to a
horizontal, we get:

Mean    1      4/3    5/3    2
prob    1/8    3/8    3/8    1/8

Which is a further step toward the shape of a normal distribution.

Since the distribution is symmetric, we see easily that the mean is 1.5, consistent with
\(E[\bar{X}] = \mu\).
VARIANCE OF SAMPLE MEAN OF A NON-NORMAL,
N=3
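To find the variance, we use the formula \(V[\bar{X}] = E[\bar{X}^2] - \mu^2\).

Mean squared    1      16/9    25/9    4
prob            1/8    3/8     3/8     1/8

so
\[E[\bar{X}^2] = 1\cdot\frac{1}{8} + \frac{16}{9}\cdot\frac{3}{8} + \frac{25}{9}\cdot\frac{3}{8} + 4\cdot\frac{1}{8} = \frac{3}{24} + \frac{16}{24} + \frac{25}{24} + \frac{12}{24} = \frac{56}{24} = \frac{7}{3}\]
and
\[V[\bar{X}] = E[\bar{X}^2] - \mu^2 = \frac{7}{3} - \left(\frac{3}{2}\right)^2 = \frac{7}{3} - \frac{9}{4} = \frac{28 - 27}{12} = \frac{1}{12}\]

which is consistent with the result \(V[\bar{X}] = \sigma^2/n\) (here \(\sigma^2 = 1/4\) for a single flip and n = 3).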


7.3.2. If \(X_1\) and \(X_2\) are a sample of size 2 from a population whose
distribution is given by: P{X = 1} = 0.7, P{X = 2} = 0.3

(a) Compute E[X].
(b) Compute Var(X).
(c) What are the possible values of \(\bar{X} = (X_1 + X_2)/2\)?
(d) Determine the probabilities that \(\bar{X}\) assumes the values in (c).
(e) Using (d), directly compute \(E[\bar{X}]\) and \(V[\bar{X}]\).
(f) Are your answers to (a), (b), and (e) consistent with the formulas
presented in this section?
SOLUTION TO 7.3.2
(a) \(\mu = E[X] = 1(0.7) + 2(0.3) = 1.3\)
(b) \(\sigma^2 = V[X] = E[X^2] - \mu^2 = 1(0.7) + 4(0.3) - 1.3^2 = 1.9 - 1.69 = 0.21\)
(c),(d)
Mean    1       1.5     2
prob    0.49    0.42    0.09

(e) \(E[\bar{X}] = 1(0.49) + 1.5(0.42) + 2(0.09) = 1.3\)

Mean squared    1       2.25    4
prob            0.49    0.42    0.09

\(E[\bar{X}^2] = 1(0.49) + 2.25(0.42) + 4(0.09) = 1.795\)
\(V[\bar{X}] = 1.795 - 1.69 = 0.105\)
(f) which agree with \(E[\bar{X}] = \mu = 1.3\) and \(V[\bar{X}] = \sigma^2/n = 0.21/2 = 0.105\).
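The hand enumerations above (the three-flip coin with outcomes 1 and 2, and Exercise 7.3.2) can be checked by brute force. The following is a minimal sketch, assuming independent draws; the function names are illustrative and do not come from the slides. Exact fractions are used so the results can be compared with the hand calculations without rounding.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def sample_mean_distribution(values, probs, n):
    """Exact distribution of the mean of n independent draws.

    values/probs describe one draw; every one of the len(values)**n
    outcome sequences is enumerated, so keep n small.
    """
    dist = Counter()
    for outcome in product(range(len(values)), repeat=n):
        p = Fraction(1)
        for i in outcome:
            p *= probs[i]  # independence: probabilities multiply
        mean = Fraction(sum(values[i] for i in outcome), n)
        dist[mean] += p
    return dict(dist)


def mean_and_variance(dist):
    """E and V of a discrete distribution given as {value: probability}."""
    mu = sum(x * p for x, p in dist.items())
    var = sum(x * x * p for x, p in dist.items()) - mu ** 2
    return mu, var


# Coin with outcomes 1 and 2, flipped n = 3 times.
half = Fraction(1, 2)
coin = sample_mean_distribution([1, 2], [half, half], 3)
coin_mu, coin_var = mean_and_variance(coin)

# Exercise 7.3.2: P{X = 1} = 0.7, P{X = 2} = 0.3, sample of size 2.
ex = sample_mean_distribution([1, 2], [Fraction(7, 10), Fraction(3, 10)], 2)
ex_mu, ex_var = mean_and_variance(ex)

for label, dist in (("coin, n=3", coin), ("7.3.2, n=2", ex)):
    print(label, {str(x): str(p) for x, p in sorted(dist.items())})
print("coin:  mean", coin_mu, "variance", coin_var)
print("7.3.2: mean", ex_mu, "variance", ex_var)
```

Enumeration grows exponentially in n, so this is only a check for small samples; for larger n the formulas \(E[\bar{X}] = \mu\) and \(V[\bar{X}] = \sigma^2/n\) do the work.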
ATM WITHDRAWALS
7.3.4. The amount of money withdrawn in each transaction at an automatic
teller of a branch of the Bank of America has mean $80 and standard
deviation $40. What are the mean and standard deviation of the average
amount withdrawn in the next 20 transactions?
Solution: As a preliminary task, we extract the relevant information:
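• \(\mu = 80\) and \(\sigma = 40\) (dollars)
• The relevant distribution is \(\bar{X}\) with \(n = 20\), and we desire \(E[\bar{X}]\) and \(SD[\bar{X}]\).
We are now ready to do the calculations:
\[E[\bar{X}] = \mu = 80, \qquad SD[\bar{X}] = \frac{\sigma}{\sqrt{n}} = \frac{40}{\sqrt{20}} \approx 8.94\]
In words: The mean for the average amount withdrawn in the next 20 transactions is $80 and
the standard deviation is about $8.94, i.e., on average, the sample mean will differ from $80 by about $9.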
DEF OF X HAT AND SOME RESULTS
Given \(X_i\), a family of independent RVs each with mean μ and SD σ, define the sum distribution X "hat" to be
\[\hat{X} = X_1 + X_2 + \cdots + X_n = \sum_{i=1}^{n} X_i\]

Then
\[E[\hat{X}] = n\mu \quad\text{and}\quad V[\hat{X}] = n\sigma^2 \;\Rightarrow\; SD[\hat{X}] = \sigma\sqrt{n}\]
Exercise: Prove the above results using \(\hat{X} = n\bar{X}\), the corresponding results for \(\bar{X}\) and properties of
expectation and SD reviewed last class.
A FERRY BOAT AT CAPACITY
7.3.7. The weight of a randomly chosen person riding a ferry has expected
value 155 pounds and standard deviation 28 pounds. The ferry has
the capacity to carry 100 riders. Find the expected value and standard
deviation of the total passenger weight load of a ferry at capacity.
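Solution: \(\mu = 155\), \(\sigma = 28\), \(n = 100\).
Using \(E[\hat{X}] = n\mu\), we get \(100(155) = 15{,}500\) lbs.
Using \(SD[\hat{X}] = \sigma\sqrt{n}\), we get \(28\sqrt{100} = 280\) lbs.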
Note the relatively low standard deviation compared with the mean. This result is based on
independence which may not be true. Examples:
1. A group of children on an elementary school field trip could be riding the ferry in which case the
weight will be as much as 5000 lbs less than the mean.
2. The ferry could be filled with a high school football team and the weight might be 5000 lbs above
the mean.
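The ATM and ferry problems use the same two pairs of formulas, one for the sample mean and one for the sample sum. A minimal sketch follows; the helper names are made up for illustration, and both helpers assume independent observations, as the slides do.

```python
import math


def sample_mean_stats(mu, sigma, n):
    """Mean and SD of the sample mean: mu and sigma / sqrt(n)."""
    return mu, sigma / math.sqrt(n)


def sample_sum_stats(mu, sigma, n):
    """Mean and SD of the sample sum: n * mu and sigma * sqrt(n)."""
    return n * mu, sigma * math.sqrt(n)


# 7.3.4: average of the next 20 ATM withdrawals (mu = $80, sigma = $40).
atm_mean, atm_sd = sample_mean_stats(80, 40, 20)

# 7.3.7: total weight of 100 ferry riders (mu = 155 lbs, sigma = 28 lbs).
ferry_mean, ferry_sd = sample_sum_stats(155, 28, 100)

print(f"ATM average:  mean = {atm_mean}, SD = {atm_sd:.2f}")
print(f"Ferry total:  mean = {ferry_mean}, SD = {ferry_sd:.0f}")
```

Note the opposite effect of n on the two spreads: the SD of the mean shrinks like \(1/\sqrt{n}\), while the SD of the sum grows like \(\sqrt{n}\).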
7.4 CENTRAL LIMIT THEOREM
Suppose we repeat an experiment over and over again, for example calling up 100 registered voters for jury
duty.
• Each time we will get a different number of jurors showing up.
• However, if we average the proportions we get,
• the result will be much more likely to be close to the actual proportion of all registered voters
who would show up than the proportion from a single day.
The Central Limit Theorem makes this precise by describing how such averages are distributed.
CENTRAL LIMIT THEOREM
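For any RV with mean μ and SD σ, the sample mean \(\bar{X}\) of n independent observations is approximately normal for large n, with mean \(\mu\) and SD \(\sigma/\sqrt{n}\).

Pictured is \(\bar{X}\) for n = 1, 5 & 10 with the exponential distribution.

How large does n have to be for \(\bar{X}\) to be "normal"?
• For most purposes n = 30 is sufficient (although even by n = 10, it's close).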
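The effect described above can be reproduced with a small simulation. This is a sketch under stated assumptions: the population is exponential with rate 1 (so μ = σ = 1), the seed and trial count are arbitrary, and skewness is used here as one convenient measure of non-normality (it is 0 for a normal distribution and, for the mean of n exponentials, 2/√n in theory).

```python
import random
import statistics


def sample_mean_summary(n, trials=20000, seed=0):
    """Simulate `trials` sample means of n exponential(1) draws.

    Returns (mean, SD, skewness) of the simulated sample means.
    """
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(trials)
    ]
    mu = statistics.fmean(means)
    sd = statistics.pstdev(means)
    # Third standardized moment: near 0 for a bell-shaped distribution.
    skew = statistics.fmean(((m - mu) / sd) ** 3 for m in means)
    return mu, sd, skew


for n in (1, 5, 10, 30):
    mu, sd, skew = sample_mean_summary(n)
    print(f"n={n:2d}  mean={mu:.3f}  sd={sd:.3f}  skewness={skew:.3f}")
```

The mean should stay near 1, the SD should track \(1/\sqrt{n}\), and the skewness should fall toward 0 as n grows, which is the numerical counterpart of the pictures becoming more bell-shaped.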
MORE ILLUSTRATION OF CENTRAL LIMIT THEOREM (CLT)

For an animation using a uniform distribution for X, see

http://www.statisticalengineering.com/images/CLTuniform.gif

Some suggestions for viewing:

Default size is small:
• Increase the viewing area size (e.g., by pressing ctrl-+).
Note how
• the distribution quickly takes on a bell shape;
• the standard deviation decreases as n increases.
DIRECTIONS FOR HOMEWORK EXERCISES
First identify which distribution you will use, i.e., \(\bar{X}\) or \(\hat{X}\).
Draw a normal graph (No picture, no credit!) with
• both standard and nonstandard horizontal labelings;
• appropriate inequality shaded.
We will illustrate one example of each type.
7.4.2. Frequent fliers of a particular airline fly a random number of miles
each year, having mean and standard deviation (in thousands of miles) of
23 and 11, respectively. As a promotional gimmick, the airline has decided
to randomly select 20 of these fliers and give them, as a bonus, a check of
$10 for each 1000 miles flown.
Approximate the probability that the total amount paid out is
(a) Between $4500 and $5000
(b) More than $5200

• The trick to deciding whether to use \(\bar{X}\) or \(\hat{X}\) is to determine whether an
average or a sum of outcomes is being asked for.
• We are asked for a total, not an average, so we want to work with \(\hat{X}\).
• μ and σ are given in thousands of miles.
• To convert to $, multiply by 10 (payouts are $10 for each thousand
miles) to get μ=$230 and σ=$110. Hence,
\[E[\hat{X}] = n\mu = 20(230) = 4600\]
and
\[SD[\hat{X}] = \sigma\sqrt{n} = 110\sqrt{20} = 220\sqrt{5} \approx 492\]

(a) Form the appropriate inequality: \(4500 < \hat{X} < 5000\)
and standardize:
\[-100 < \hat{X} - 4600 < 400\]
\[\frac{-100}{220\sqrt{5}} < \frac{\hat{X} - 4600}{220\sqrt{5}} < \frac{400}{220\sqrt{5}}\]
\[-0.203 = -\sqrt{5}/11 < Z < 4\sqrt{5}/11 = 0.813\]
Using Excel,
=NORM.S.DIST(4*SQRT(5)/11, TRUE)-NORM.S.DIST(-SQRT(5)/11, TRUE)=.3725
Finish with a sentence:
• The probability that the total payout is between $4500 and $5000 is 37%.
(b) Similarly, \(\hat{X} > 5200\), and standardizing:
\[\hat{X} - 4600 > 600\]
\[\frac{\hat{X} - 4600}{220\sqrt{5}} > \frac{600}{220\sqrt{5}}\]
\[Z > 6\sqrt{5}/11 = 1.22\]
Using Excel:
=1-NORM.S.DIST(6*SQRT(5)/11, TRUE)=.1113
The probability that the total payout by the airline is over $5200 is 11%.
14. The lifetime of a certain type of electric bulb has expected value 500 hours
and standard deviation 60 hours. Approximate the probability that the sample
mean of 20 such light bulbs is less than 480 hours.
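• We are asked for an average (sample mean) so we work with \(\bar{X}\):
\[E[\bar{X}] = \mu = 500 \quad\text{and}\quad SD[\bar{X}] = \frac{\sigma}{\sqrt{n}} = \frac{60}{\sqrt{20}} = \frac{30}{\sqrt{5}} = 6\sqrt{5} \approx 13.4\]
• The problem is to find the probability that \(\bar{X} < 480\).
• Standardize:
\[\bar{X} - 500 < -20\]
\[\frac{\bar{X} - 500}{6\sqrt{5}} < \frac{-20}{6\sqrt{5}}\]
\[Z < -2\sqrt{5}/3 = -1.49\]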
• Using Excel: =NORM.S.DIST(-2*SQRT(5)/3, TRUE)=.06802
The chance that sample mean of 20 bulbs is less than 480 hrs is 6.8%.
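The three normal-approximation answers above (7.4.2 (a) and (b), and exercise 14) can also be reproduced outside Excel. The sketch below uses `statistics.NormalDist` from the Python standard library (3.8+), whose `cdf` plays the role of NORM.DIST with the mean and SD supplied directly, so no manual standardizing is needed. The helper names are illustrative only.

```python
import math
from statistics import NormalDist


def sum_distribution(mu, sigma, n):
    """CLT approximation for the sum of n draws: N(n*mu, sigma*sqrt(n))."""
    return NormalDist(n * mu, sigma * math.sqrt(n))


def mean_distribution(mu, sigma, n):
    """CLT approximation for the mean of n draws: N(mu, sigma/sqrt(n))."""
    return NormalDist(mu, sigma / math.sqrt(n))


# 7.4.2: total payout to 20 fliers, with mu = $230 and sigma = $110 per flier.
payout = sum_distribution(230, 110, 20)
p_between = payout.cdf(5000) - payout.cdf(4500)   # part (a)
p_over = 1 - payout.cdf(5200)                     # part (b)

# Exercise 14: mean lifetime of 20 bulbs, with mu = 500 and sigma = 60 hours.
bulbs = mean_distribution(500, 60, 20)
p_short = bulbs.cdf(480)

print(f"P(4500 < total < 5000) = {p_between:.4f}")
print(f"P(total > 5200)        = {p_over:.4f}")
print(f"P(mean < 480)          = {p_short:.5f}")
```

With n = 20 these are approximations, since the CLT guideline of n = 30 is not met; the slides make the same approximation.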
DIFFERENCE OF SAMPLE MEANS
DISTRIBUTION
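Given 2 distributions with
• means \(\mu_1, \mu_2\),
• std devs \(\sigma_1, \sigma_2\),
• sample sizes \(n_1\) and \(n_2\),
the distribution of the difference between the sample means, \(\bar{X}_1 - \bar{X}_2\), has
• \(E[\bar{X}_1 - \bar{X}_2] = \mu_1 - \mu_2\) and \(SD[\bar{X}_1 - \bar{X}_2] = \sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}\)
(These formulas follow easily from the properties of the mean and variance, provided the two samples are independent.)
• The difference distribution can be considered normal if
the respective sample mean distributions can be considered normal.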
EXAMPLE 10.1 TIRES FROM 2 DIFFERENT
PROCESSES
Two new methods for producing a tire have been proposed.
The manufacturer believes there will be no appreciable difference in the lifetimes of tires produced.
To test the plausibility of such a hypothesis, a sample of 9 tires is produced by method 1 and a
sample of 7 tires by method 2.
The first sample of tires is to be road-tested at location A and the second at location B.
Table 10.1 Tire Lives in Units of 1000 Kilometers

Tires tested at A    Tires tested at B
66.4                 58.2
61.6                 60.4
60.5                 55.2
59.1                 62
63.6                 57.3
61.4                 58.7
62.5                 56.1
64.4
60.7
EXAMPLE 10.1 TIRES FROM 2 DIFFERENT PROCESSES
(CONT.)
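It is known that the lifetime of a tire tested at either of these locations is a normal random variable with
a mean life due to the tire
but with a variance that is due to the location. Tires tested at location
• A have \(\sigma_A = 3\) thousand km;
• B have \(\sigma_B = 4\) thousand km.
Find the values of the sample means for each location, the difference in the sample means and the
standard deviation for the difference distribution.
\(\bar{x}_A = 560.2/9 \approx 62.24\) thousand km and \(\bar{x}_B = 407.9/7 \approx 58.27\) thousand km, so \(\bar{x}_A - \bar{x}_B \approx 3.97\).
Similarly:
\[SD[\bar{X}_A - \bar{X}_B] = \sqrt{\frac{\sigma_A^2}{9} + \frac{\sigma_B^2}{7}} = \sqrt{\frac{9}{9} + \frac{16}{7}} \approx 1.81\]

The difference of the sample means value is 4.0 thousand km
and the standard deviation is 1.8 thousand km.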
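The tire calculation can be reproduced from Table 10.1 directly. In this sketch the standard deviation for location B (4 thousand km) is taken from the slide, while the value 3 thousand km for location A is an assumption: it is the value that reproduces the stated result of 1.8. The variable names are illustrative.

```python
import math
from statistics import fmean

# Table 10.1, tire lives in units of 1000 km.
tires_a = [66.4, 61.6, 60.5, 59.1, 63.6, 61.4, 62.5, 64.4, 60.7]
tires_b = [58.2, 60.4, 55.2, 62, 57.3, 58.7, 56.1]

SIGMA_A = 3.0  # ASSUMPTION: chosen to match the slide's answer of 1.8
SIGMA_B = 4.0  # given on the slide

mean_a = fmean(tires_a)
mean_b = fmean(tires_b)
diff = mean_a - mean_b

# SD of the difference of two independent sample means.
sd_diff = math.sqrt(SIGMA_A ** 2 / len(tires_a) + SIGMA_B ** 2 / len(tires_b))

print(f"mean A = {mean_a:.2f}, mean B = {mean_b:.2f}")
print(f"difference = {diff:.2f}, SD of difference = {sd_diff:.2f}")
```

The observed difference is a bit more than two standard deviations of the difference distribution, which is the comparison the manufacturer's "no appreciable difference" hypothesis would be judged against.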
SURVEYS AND POLLS SUBSECTION
With last year's presidential election still on some of our minds,
• Why are polls examples of sampling distributions?
In what follows, we will provide some information on
• sources of error with an ultimate focus on sampling error;
• margin of error and its direct relation to standard error.
Much of the material has been gleaned from Wikipedia:
https://en.wikipedia.org/wiki/Opinion_poll
There you can find an excellent mini-history of polling and embedded links to more
detailed treatments of a particular aspect of polling.
SURVEYS AND POLLS: SOURCES OF ERROR 1
Sampling error = estimating statistic (sample) – actual parameter value (population)
• uncertainty that comes from selecting sample, rather than whole population:
• probabilistic, easy to quantify as given by the CLT;
• can be decreased by increasing the sample size.
SURVEYS AND POLLS: SOURCES OF ERROR 2
Response bias results from answers given by respondents that do not reflect true beliefs.
• often results from wording or ordering of questions.
• may be deliberately engineered by unscrupulous pollsters
• ill-considered answers to hasten end of questioning or avoid embarrassment
• 4% of Americans report they have personally been decapitated
• Social pressure to be “politically correct” or hide racism or sexism.
Nonresponse bias: people do not answer calls from strangers, or refuse to answer poll
• Response rates have been declining & are down to about 10%
• If the characteristics of those who agree to be interviewed markedly differ from those who decline,
this will contribute to selection bias (unrepresentative sample)
SURVEYS AND POLLS: MARGIN OF ERROR
A 3% margin of error (MOE) means that if the same
procedure is used a large # of times,
95% of the time the true population average will be within
the sample estimate ±3%, i.e.,
|sampling error|≦ 3%
The margin of error:
• Is roughly twice the standard error;
• can be reduced by using a larger sample.
For a fair coin, \(\sigma = \sqrt{p(1-p)} = 0.5\), so MOE \(\approx 2\sigma/\sqrt{n} = 1/\sqrt{n}\).
• To get a MOE of 1%:
• need a sample of 10,000.
In practice, pollsters balance
1. cost of a larger sample
2. the increase in sampling error from a smaller
sample
and select a sample size of 500–1,000.
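The trade-off between sample size and margin of error can be tabulated with the "twice the standard error" rule from this slide. A minimal sketch, assuming the worst case p = 0.5 unless told otherwise; the function names are made up for illustration.

```python
import math


def margin_of_error(n, p=0.5):
    """Approximate 95% margin of error: twice the standard error."""
    return 2 * math.sqrt(p * (1 - p) / n)


def sample_size_for(moe, p=0.5):
    """Smallest n whose margin of error is at most `moe`.

    The tiny offset guards against floating-point noise pushing an
    exact answer (such as 10,000) up to the next integer.
    """
    return math.ceil(4 * p * (1 - p) / moe ** 2 - 1e-9)


for n in (200, 500, 1000, 10000):
    print(f"n = {n:6d}  MOE = {100 * margin_of_error(n):.1f}%")

print("n needed for a 1% MOE:", sample_size_for(0.01))
```

Because the MOE falls like \(1/\sqrt{n}\), halving it requires four times the sample, which is why typical polls settle for a few percent rather than paying for 10,000 responses.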
SURVEYS AND POLLS EXAMPLE
To make things simple, focus on the support of a particular candidate, say Trump.
Think of a poll as the flipping of a (bent) coin a certain number of times (n).
If someone surveyed supports Trump, think of that as a "head".
• Suppose that in the state being polled, say Vermont, Trump has 39% support.
(Note that we are playing God here, as we will never know this.)
Then the 39% should be thought of as the "p" in the coin flip.
SURVEYS AND POLLS EXAMPLE (CONT)
So the state is Vermont and Trump has 39% support, i.e., p = 0.39.
Suppose 200 people are surveyed, and we count the support for Trump.
• This sampling distribution is the binomial distribution,
and is a special case of the sum distribution \(\hat{X}\), with \(E[\hat{X}] = np\) and \(SD[\hat{X}] = \sqrt{np(1-p)}\).
• However, results of a survey are given as a proportion,
so the distribution should be that of the sample mean \(\bar{X}\).
SURVEYS AND POLLS QUESTION 1
So the state is Vermont and Trump has 39% support, i.e., p = 0.39.
• 200 people are surveyed
• sampling distribution has \(E[\bar{X}] = p = 0.39\) and standard error \(\sqrt{p(1-p)/n} = \sqrt{0.39(0.61)/200} \approx 0.034\)
Suppose that we can no longer play God and instead our survey says that Trump’s support is 43%
(which is just over one standard error from Trump’s actual support).
Question: As a pollster, how would you report the result?
Answer: In a newspaper, results are rounded to the nearest percent and so we use the margin of error
of roughly 7% (twice the standard error):
“The City Tech pollster group has determined that Pres. Trump has 43 ±7% support in the state of
Vermont.”
SURVEYS AND POLLS QUESTION 2
So the state is Vermont and Trump has 39% support, i.e., p = 0.39.
• 200 people are surveyed
• sampling distribution has \(E[\bar{X}] = p = 0.39\) and standard error \(\sqrt{p(1-p)/n} = \sqrt{0.39(0.61)/200} \approx 0.034\)
Suppose that we can no longer play God and instead our survey says that Trump’s support is 43%
(which is just over one standard error from Trump’s actual support).
Question: Using the result of the poll, what is your prediction for Trump’s chance of winning the
state? (Assuming only 2 candidates, Trump needs 50% to win.)
Answer: Our survey says 43% Trump support, so we conclude that Trump's chance of winning
Vermont is about 2%.

x       z                          P(x > 0.5)
0.43    (0.5-0.43)/0.034 = 2.06    =1-NORM.S.DIST((0.5-0.43)/0.034,TRUE) = 2%
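The numbers in the two poll questions can be recomputed as follows. This sketch follows the slides' approach of centering a normal curve on the survey result of 43% and using the rounded standard error 0.034; both choices come from the slides, while the function and variable names are illustrative.

```python
import math
from statistics import NormalDist


def poll_standard_error(p, n):
    """Standard error of a sample proportion: sqrt(p(1-p)/n)."""
    return math.sqrt(p * (1 - p) / n)


# Question 1: standard error and margin of error for p = 0.39, n = 200.
se = poll_standard_error(0.39, 200)
moe = 2 * se  # margin of error is roughly twice the standard error

# Question 2: chance that support exceeds 50% given a survey result of 43%,
# using the slide's rounded standard error of 0.034.
z = (0.5 - 0.43) / 0.034
p_win = 1 - NormalDist().cdf(z)  # same as 1-NORM.S.DIST(z, TRUE)

print(f"standard error  = {se:.4f}")
print(f"margin of error = {100 * moe:.0f}%")
print(f"z = {z:.2f}, P(x > 0.5) = {100 * p_win:.0f}%")
```

`NormalDist()` with no arguments is the standard normal, so its `cdf` matches Excel's NORM.S.DIST with the cumulative flag set to TRUE.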
