Lecture 9

Uploaded by

tlinhvu1305
Lecture 9.

Chapter 9
Sampling distributions
9.3 Introduction to sampling distribution
9.4 Sampling distribution of the sample mean $\bar{X}$
9.5 Sampling distribution of the sample proportion $\hat{p}$

9.3 Introduction to sampling distribution
• In real life, calculating the parameters of
populations is usually prohibitive (too costly or
impractical) because populations are very large.
• Rather than investigating the whole population, we
take a sample, calculate a statistic related to
the parameter of interest and make an
inference about the parameter of the
population (based on the statistic of the sample).
• The sampling distribution of the statistic is
the tool that tells us how close the statistic is
to the parameter.
Recall what statistics is
“Statistics is a way to get information from data”

[Diagram labels: Data, Statistics, Information]

Data: Facts, especially numerical facts, collected together for reference or information.
Information: Knowledge communicated concerning some particular fact.

“Statistics is a tool for creating new understanding from a set of numbers”
Definitions: Oxford English Dictionary

Key statistical concepts


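[Diagram: a population and a sample (a subset of the population); the population has a parameter, the sample has a statistic.]

• Populations have parameters: $\mu$, $\sigma^2$, $\sigma$, $p$ (probability)
• Samples have statistics: $\bar{x}$, $s^2$, $s$, $\hat{p}$ (relative frequency)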
• Based on values of statistics of the sample,
inferences can be made about parameters of the
population.

9.4 Sampling Distribution of the Sample Mean
Example, page 346: A fair die is thrown infinitely many
times, with the random variable X = Number of spots
showing on any throw. The probability distribution of X is:

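x      1    2    3    4    5    6
P(x)  1/6  1/6  1/6  1/6  1/6  1/6

and the mean and variance (of the population of
numbers of spots observed) are:

$\mu = \sum x P(x) = 1(1/6) + 2(1/6) + \dots + 6(1/6) = 3.5$
$\sigma^2 = \sum (x - \mu)^2 P(x) = (1 - 3.5)^2(1/6) + \dots + (6 - 3.5)^2(1/6) = 2.92$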
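The population mean and variance of a single fair die can be checked with a few lines of Python. This is only a sketch of the definitions $\mu = \sum x P(x)$ and $\sigma^2 = \sum (x - \mu)^2 P(x)$; the variable names are illustrative and not from the lecture.

```python
from fractions import Fraction

# Probability distribution of one fair die: each face has probability 1/6.
faces = range(1, 7)
prob = Fraction(1, 6)

# Population mean: sum of x * P(x).
mu = sum(x * prob for x in faces)

# Population variance: sum of (x - mu)^2 * P(x).
sigma2 = sum((x - mu) ** 2 * prob for x in faces)

print(float(mu))                 # 3.5
print(round(float(sigma2), 2))   # 2.92
```

Exact fractions are used so that the variance comes out as 35/12 without rounding error.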

Sampling Distribution of Two Dice


A sampling distribution of the sample mean $\bar{X}$ for samples of size n = 2
is created by looking at all 36 samples of two dice
and their means: $\bar{x}_1 = 1.0$, $\bar{x}_2 = 1.5$, …, $\bar{x}_{36} = 6.0$.

While there are 36 possible samples of size 2, there
are only 11 values for $\bar{x}$, and some (e.g. $\bar{x} = 3.5$)
occur more frequently than others (e.g. $\bar{x} = 1$).

Sampling Distribution of Two Dice
The sampling distribution of $\bar{X}$ is shown below:
$\bar{x}$  P($\bar{x}$)
1.0 1/36
1.5 2/36
2.0 3/36
2.5 4/36
3.0 5/36
3.5 6/36
4.0 5/36
4.5 4/36
5.0 3/36
5.5 2/36
6.0 1/36
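
The table can be reproduced by brute-force enumeration. The following is a minimal sketch, assuming the two dice are treated as an ordered pair so that all 36 samples are equally likely; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 36 equally likely ordered samples of two dice.
samples = list(product(range(1, 7), repeat=2))

# Count how many samples give each sample mean.
counts = Counter(Fraction(a + b, 2) for a, b in samples)

# Sampling distribution: P(x-bar) = count / 36.
distribution = {m: Fraction(c, len(samples)) for m, c in sorted(counts.items())}

for m in distribution:
    print(f"{float(m):.1f}  {counts[m]}/36")

# Mean and variance of the sampling distribution of the sample mean.
mean_xbar = sum(m * p for m, p in distribution.items())
var_xbar = sum((m - mean_xbar) ** 2 * p for m, p in distribution.items())
print(float(mean_xbar), float(var_xbar))
```

The mean of the sampling distribution equals the population mean 3.5, and its variance is half the population variance (35/24 versus 35/12).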

Compare…
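Compare the distribution of X (values 1, 2, …, 6)
with the sampling distribution of $\bar{X}$ (values 1.0, 1.5, …, 6.0),
which is approximately normal.

[Figure: the distribution of X next to the sampling distribution of $\bar{X}$, with $\mu$ marked.]

Note that: $\mu_{\bar{x}} = \mu_x$ and $\sigma^2_{\bar{x}} = \sigma^2_x / 2$.

The sampling distribution of the statistic is the tool that tells us
how close the statistic is to the parameter $\mu$.
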
Generalize…
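We can generalize the mean and variance of the
sampling of 2 dice:

$\mu_{\bar{x}} = \mu_x$, $\sigma^2_{\bar{x}} = \sigma^2_x / 2$

to sampling of n dice (roll n dice and calculate
the sample mean $\bar{X}$):

$\mu_{\bar{x}} = \mu_x$, $\sigma^2_{\bar{x}} = \sigma^2_x / n$, $\sigma_{\bar{x}} = \sigma_x / \sqrt{n}$

The standard deviation of the sampling distribution of the
sample mean is called the standard error.

The above formulas can be proved using the laws of
expected value and variance for:
$\bar{X} = (X_1 + X_2 + \dots + X_n)/n$, where the $X_i$ are independent
random variables with the same mean $\mu$ and variance $\sigma^2$
(they may be, but need not be, normally distributed).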
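
As a sketch of that proof, assuming only the standard laws $E(aX) = aE(X)$, $V(aX) = a^2 V(X)$, and the additivity of variances for independent random variables, the mean, variance and standard error of $\bar{X}$ follow in three lines:

```latex
\begin{align*}
E(\bar{X}) &= E\!\left(\frac{X_1 + \dots + X_n}{n}\right)
            = \frac{1}{n}\sum_{i=1}^{n} E(X_i) = \frac{n\mu}{n} = \mu \\
V(\bar{X}) &= \frac{1}{n^2}\sum_{i=1}^{n} V(X_i)
            = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}
            \qquad \text{(uses independence)} \\
\sigma_{\bar{x}} &= \sqrt{V(\bar{X})} = \frac{\sigma}{\sqrt{n}}
\end{align*}
```

Nothing in these steps requires the $X_i$ to be normal; only independence and a common mean and variance are used.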

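[Figure: sampling distributions of $\bar{X}$ for n = 5, 10 and 25 dice, each drawn over the range 1 to 6 and centered at 3.5.]

n = 5 (roll 5 dice): $\mu_{\bar{x}} = 3.5$, $\sigma^2_{\bar{x}} = \sigma^2_x / 5 = 0.5833$
n = 10 (roll 10 dice): $\mu_{\bar{x}} = 3.5$, $\sigma^2_{\bar{x}} = \sigma^2_x / 10 = 0.2917$
n = 25 (roll 25 dice): $\mu_{\bar{x}} = 3.5$, $\sigma^2_{\bar{x}} = \sigma^2_x / 25 = 0.1167$

Notice that $\sigma^2_{\bar{x}}$ is smaller than $\sigma^2_x$.
The larger the sample size, the smaller $\sigma^2_{\bar{x}}$. Therefore,
$\bar{x}$ tends to fall closer to $\mu$ as the
sample size increases.
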
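The shrinking variance can also be seen in a small simulation. The sketch below is illustrative only: the seed, the number of repetitions and the helper name `simulate_means` are assumptions, not part of the lecture.

```python
import random
from statistics import fmean, pvariance

random.seed(9)        # arbitrary fixed seed so the run is repeatable

SIGMA2 = 35 / 12      # variance of one fair die, about 2.9167
REPS = 20_000         # simulated samples per sample size (assumption)

def simulate_means(n, reps=REPS):
    """Roll n dice `reps` times and return the list of sample means."""
    return [fmean(random.randint(1, 6) for _ in range(n)) for _ in range(reps)]

results = {}
for n in (5, 10, 25):
    means = simulate_means(n)
    results[n] = (fmean(means), pvariance(means))
    print(f"n={n:2d}  mean={results[n][0]:.3f}  "
          f"variance={results[n][1]:.4f}  theory={SIGMA2 / n:.4f}")
```

The simulated variances should land close to 0.5833, 0.2917 and 0.1167, while the simulated means stay near 3.5.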
Central Limit Theorem
• If a random sample is drawn from a normal
population, then the sampling distribution of the
sample mean $\bar{X}$ is normally distributed for all
values of n (sample size).
• If a random sample is drawn from any population,
the sampling distribution of the sample mean
$\bar{X}$ is approximately normal for a sufficiently
large sample size. The larger the sample size,
the more closely the sampling distribution of $\bar{X}$ will
resemble a normal distribution.
• A sample size of 30 may be sufficiently large to
allow us to use the normal distribution as an
approximation for the sampling distribution of $\bar{X}$.

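A rough way to watch the Central Limit Theorem at work is to sample from a clearly non-normal population and track how asymmetric the sample means are. The sketch below assumes an exponential population with rate 1 (an illustrative choice, not from the lecture) and uses skewness as a crude measure of non-normality, since a normal distribution has skewness 0.

```python
import random
from statistics import fmean

random.seed(30)   # arbitrary fixed seed for repeatability
REPS = 20_000     # number of simulated samples per sample size (assumption)

def skewness(values):
    """Moment-based skewness: third central moment / (variance ** 1.5)."""
    m = fmean(values)
    m2 = fmean((v - m) ** 2 for v in values)
    m3 = fmean((v - m) ** 3 for v in values)
    return m3 / m2 ** 1.5

def sample_means(n, reps=REPS):
    """Means of `reps` samples of size n from an exponential(1) population."""
    return [fmean(random.expovariate(1.0) for _ in range(n)) for _ in range(reps)]

skew = {n: skewness(sample_means(n)) for n in (2, 30)}
for n, s in skew.items():
    print(f"n={n:2d}  skewness of sample means = {s:.2f}")
```

For this population the skewness of $\bar{X}$ is $2/\sqrt{n}$ in theory, so it should fall from roughly 1.4 at n = 2 to roughly 0.4 at n = 30, moving toward the value 0 of a normal distribution.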

Sampling Distribution of the Sample Mean


1. $\mu_{\bar{x}} = \mu_x$
2. $\sigma^2_{\bar{x}} = \sigma^2_x / n$
3. If X is normal, $\bar{X}$ is normal. If X is non-normal,
$\bar{X}$ is approximately normally distributed for
sufficiently large sample size.

We can standardize the sampling distribution of the sample mean $\bar{X}$ as

$Z = \dfrac{\bar{X} - \mu}{\sigma / \sqrt{n}}$

Sampling Distribution of the Sample Mean
The summaries above assume that the population is
infinitely large. However, if the population is finite the
standard error is

$\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}} \sqrt{\dfrac{N - n}{N - 1}}$

where N is the population size and $\sqrt{\dfrac{N - n}{N - 1}}$
is the finite population correction factor.

In practice, most applications involve populations that
qualify as large. As a consequence the finite population
correction factor is usually omitted.

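To see why the correction factor is usually omitted, the standard error can be computed with and without it. This is a minimal sketch; the function name `standard_error` and the numbers used are hypothetical and chosen only for illustration.

```python
from math import sqrt

def standard_error(sigma, n, N=None):
    """Standard error of the sample mean.

    sigma: population standard deviation
    n:     sample size
    N:     population size; None means the population is treated as infinite
    """
    se = sigma / sqrt(n)
    if N is not None:
        # Finite population correction factor sqrt((N - n) / (N - 1)).
        se *= sqrt((N - n) / (N - 1))
    return se

# Illustrative numbers: sigma = 100, n = 25.
print(standard_error(100, 25))               # 20.0
print(standard_error(100, 25, N=100))        # clearly smaller than 20
print(standard_error(100, 25, N=100_000))    # almost 20
```

When N is large relative to n, the factor is so close to 1 that dropping it changes the standard error very little.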

Example 9.1 pages 355 - 357


A stock broker has observed that the annual return
in the construction industry is a normally
distributed random variable with a mean of 12.5%
and a standard deviation of 2.5%.

a/ Find the probability that a randomly selected
stock in the industry will have an annual return
less than 10.825%:

$P(X < 10.825) = P\left(\dfrac{X - \mu}{\sigma} < \dfrac{10.825 - 12.5}{2.5}\right) = P(Z < -0.67) = 0.2514$

c/ Find the probability that 4 randomly
selected stocks in the industry will have a
mean annual return less than 10.825%.
Here $\sigma_{\bar{x}} = \sigma / \sqrt{n} = 2.5 / \sqrt{4} = 1.25$, so

$P(\bar{X} < 10.825) = P\left(\dfrac{\bar{X} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} < \dfrac{10.825 - 12.5}{1.25}\right) = P(Z < -1.34) = 0.0901$

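Both probabilities in Example 9.1 can be checked with Python's standard library instead of a normal table. This is a sketch; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

MU, SIGMA = 12.5, 2.5   # annual return: mean and standard deviation, in percent

# a/ One stock: X ~ Normal(MU, SIGMA).
p_single = NormalDist(MU, SIGMA).cdf(10.825)

# c/ Mean of n = 4 stocks: X-bar ~ Normal(MU, SIGMA / sqrt(n)).
n = 4
p_mean = NormalDist(MU, SIGMA / sqrt(n)).cdf(10.825)

print(round(p_single, 4))   # 0.2514
print(round(p_mean, 4))     # 0.0901
```

The only difference between the two parts is the standard deviation: 2.5 for a single stock, 1.25 for the mean of four.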
Example, pages 342 and 358
Salaries of a business school’s graduates:
The average weekly income of graduates one year
after graduation is $1000. Suppose the distribution
of weekly income has a standard deviation of $100.
What is the probability that 25 randomly-selected
graduates have an average weekly income of less
than $950?
Solution
Let X be the weekly income of graduates one year
after graduation. We consider sampling distribution
of sample mean 𝐗 for sample size n = 25 and have
$P(\bar{X} < 950) = P\left(\dfrac{\bar{X} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} < \dfrac{950 - 1000}{100/\sqrt{25}}\right) = P(Z < -2.5) = 0.0062$


Example, pages 342 and 358 (contd.)


Salaries of a business school’s graduates:
The average weekly income of graduates one year
after graduation is $1000. Suppose the distribution
of weekly income has a standard deviation of $100.
A statistical inference: If a random sample of 25
graduates actually had an average weekly income of
$950, what would you conclude about the validity of
the claim that the average weekly income is $1000?
Solution
With μ = 1000, the probability of having a sample
mean of 950 or less is very low (0.0062). So the claim that
the average weekly income is $1000 is probably
unjustified. It would be more reasonable to assume
that the average weekly income μ is smaller than
$1000, because then a sample mean of $950
becomes more probable.

Using Sampling Distributions for
Inference
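– To make inferences about population
parameters we use sampling distributions.
– The symmetry of the normal distribution along with the
sampling distribution of the mean lead to:

$P(-1.96 < Z < 1.96) = 0.95$, or $P\left(-1.96 < \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}} < 1.96\right) = 0.95$, where $1.96 = z_{.025}$ and $-1.96 = -z_{.025}$.

This can be written as $P\left(-1.96\dfrac{\sigma}{\sqrt{n}} < \bar{X} - \mu < 1.96\dfrac{\sigma}{\sqrt{n}}\right) = 0.95$

which becomes $P\left(\mu - 1.96\dfrac{\sigma}{\sqrt{n}} < \bar{X} < \mu + 1.96\dfrac{\sigma}{\sqrt{n}}\right) = 0.95$

In general, $P\left(\mu - z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}} < \bar{X} < \mu + z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}\right) = 1 - \alpha$
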

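[Figure: standard normal distribution Z with $\alpha = 0.05$, showing an area of 0.025 in each tail beyond –1.96 and 1.96; and the normal distribution of $\bar{X}$, showing an area of 0.025 in each tail beyond $\mu - 1.96\,\sigma/\sqrt{n}$ and $\mu + 1.96\,\sigma/\sqrt{n}$.]
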
Example, pages 342 and 358 (contd.)

Substituting μ = 1000, σ = 100, and n = 25 from the example,

$P\left(1000 - 1.96\dfrac{100}{\sqrt{25}} < \bar{X} < 1000 + 1.96\dfrac{100}{\sqrt{25}}\right) = 0.95$, or
$P(960.8 < \bar{X} < 1039.2) = 0.95$

Conclusion: A statistical inference can be made

• There is a 95% chance that the sample mean falls
within the interval [960.8, 1039.2] if the population
mean is 1000 and σ = 100.
• Since the sample mean was 950, the population mean
is probably not 1000.

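The interval and the tail probability used in the salary example can be reproduced as follows. This is a sketch under the example's assumptions (μ = 1000, σ = 100, n = 25); the function name `sample_mean_interval` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def sample_mean_interval(mu, sigma, n, alpha=0.05):
    """Interval that contains X-bar with probability 1 - alpha,
    given the population mean mu and standard deviation sigma."""
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}, about 1.96 for alpha = 0.05
    margin = z * sigma / sqrt(n)
    return mu - margin, mu + margin

low, high = sample_mean_interval(mu=1000, sigma=100, n=25)
print(round(low, 1), round(high, 1))   # 960.8 1039.2

# Probability of a sample mean below 950 when mu really is 1000.
p_below_950 = NormalDist(1000, 100 / sqrt(25)).cdf(950)
print(round(p_below_950, 4))           # 0.0062
```

An observed sample mean of 950 lies outside the 95% interval, which is the basis for doubting the claim that μ = 1000.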

9.5 Sampling Distribution of the Sample Proportion
• The parameter of interest for nominal data is the
population proportion p (“theoretical relative
frequency”) of data that fall into a category, or
the proportion of time a particular outcome
(success) occurs.
• To estimate the population proportion p we use
the sample proportion $\hat{p}$ (“experimental relative
frequency”).
• We prefer to use the normal approximation to the
binomial distribution to make inferences about $\hat{p}$.

Normal Approximation
to Binomial Distribution
Binomial distribution with n=20 and p=0.5 can be
approximated by the normal distribution with the
same mean and standard deviation.
($\mu = np = 10$ and $\sigma = \sqrt{np(1 - p)} = 2.24$)


Normal Approximation
to Binomial Distribution
• Normal approximation to the binomial distribution
works best when
– the number of experiments (sample size) is
large, and
– the probability of success p is close to 0.5.

• For the approximation to provide good results:
$np \ge 5$ and $n(1 - p) \ge 5$.

Normal Approximation
to Binomial Distribution
To calculate P(X = 10) using the normal distribution, we can find
the area under the normal curve between 9.5 and 10.5:

P(X = 10) ≈ P(9.5 < Y < 10.5) = 0.1742

where Y is a normal random variable approximating
the binomial random variable X.


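Explanation
$\mu = np = 20(0.5) = 10$; $\sigma^2 = np(1 - p) = 20(0.5)(1 - 0.5) = 5$; $\sigma = \sqrt{5} = 2.24$

The exact probability is P(X = 10) = 0.176.

[Figure: the area under the normal curve between 9.5 and 10.5, $P(9.5 < Y_{Normal} < 10.5)$, is the approximation.]

$P(X_{Binomial} = 10) \approx P(9.5 < Y < 10.5) = P\left(\dfrac{9.5 - 10}{2.24} < Z < \dfrac{10.5 - 10}{2.24}\right) \approx P(-0.22 < Z < 0.22) = 0.1742$
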
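The exact binomial probability and its normal approximation can be compared directly. This is a sketch for the n = 20, p = 0.5 case above; note that the code does not round z to two decimals, so its approximation (about 0.177) differs slightly from the table-based 0.1742.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 20, 0.5

# Exact binomial probability P(X = 10).
exact = comb(n, 10) * p**10 * (1 - p)**(n - 10)

# Normal approximation with the continuity correction: P(9.5 < Y < 10.5),
# where Y has the same mean and standard deviation as the binomial X.
Y = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))
approx = Y.cdf(10.5) - Y.cdf(9.5)

print(round(exact, 4))    # 0.1762
print(round(approx, 4))
```

Either way, the approximation agrees with the exact value to about two decimal places.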
More Normal Approximation Exercises

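$P(X \le 8) \approx P(Y < 8.5)$

$P(X \ge 14) \approx P(13.5 < Y < n + 0.5) \approx P(Y > 13.5)$

[Figure: normal curves with the area to the left of 8.5 and the area to the right of 13.5 shaded.]

For large n the effect of the
continuity correction factor 0.5 is
very small and will be omitted.
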

Approximate Sampling Distribution of a Sample Proportion $\hat{p}$
• We have the sample proportion $\hat{p} = X/n$, where X is the
number of successes in n trials and is binomially distributed,
with mean np and variance np(1 – p).
• If both $np \ge 5$ and $n(1 - p) \ge 5$, then X can be approximated by
Y, normally distributed with the same mean and variance, so the
standardized X is approximately standard normally distributed. Hence

$Z = \dfrac{X - np}{\sqrt{np(1 - p)}} = \dfrac{n\hat{p} - np}{\sqrt{np(1 - p)}}$, or $Z = \dfrac{\hat{p} - p}{\sqrt{p(1 - p)/n}}$

can be considered as having a standard normal distribution.
($\hat{p}$ can be considered as normally distributed with mean p and
standard deviation $\sqrt{p(1 - p)/n}$.)

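The mean and standard deviation of $\hat{p}$ quoted above follow from the binomial mean and variance of X. A short derivation sketch, using only $E(aX) = aE(X)$ and $V(aX) = a^2 V(X)$:

```latex
\begin{align*}
E(\hat{p}) &= E\!\left(\frac{X}{n}\right) = \frac{E(X)}{n} = \frac{np}{n} = p \\
V(\hat{p}) &= V\!\left(\frac{X}{n}\right) = \frac{V(X)}{n^2}
            = \frac{np(1 - p)}{n^2} = \frac{p(1 - p)}{n} \\
\sigma_{\hat{p}} &= \sqrt{\frac{p(1 - p)}{n}}
\end{align*}
```

Dividing the numerator and denominator of $(X - np)/\sqrt{np(1 - p)}$ by n gives the second form of Z.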
Example 9.2 revised, page 363
The Laurier company’s brand has a market share of
45.6%. In a survey, 300 consumers were asked which
brand they prefer. What is the probability that more than
50% of the respondents say they prefer the Laurier
brand?
Solution: The number of respondents who prefer Laurier
is binomial with n = 300 and p = 0.456. Also, np =
300(0.456) = 136.8 > 5, and n(1 – p) = 300(1 – 0.456)
= 163.2 > 5. Therefore, $\hat{p}$ is approximately normal with mean p =
0.456 and standard error

$\sigma_{\hat{p}} = \sqrt{\dfrac{p(1 - p)}{n}} = \sqrt{\dfrac{0.456(1 - 0.456)}{300}} = 0.0288$

Hence

$P(\hat{p} > 0.50) = P\left(\dfrac{\hat{p} - p}{\sqrt{p(1 - p)/n}} > \dfrac{0.50 - 0.456}{0.0288}\right) = P(Z > 1.53) = 0.0630$

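The calculation in this example can be checked numerically. This is a sketch using the example's n = 300 and p = 0.456; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n, p = 300, 0.456

# Rule-of-thumb check for using the normal approximation.
assert n * p >= 5 and n * (1 - p) >= 5

se = sqrt(p * (1 - p) / n)        # standard error of p-hat
z = (0.50 - p) / se               # standardized value of 0.50
prob = 1 - NormalDist().cdf(z)    # P(p-hat > 0.50)

print(round(se, 4))     # 0.0288
print(round(z, 2))      # 1.53
print(round(prob, 3))   # 0.063
```

So there is roughly a 6% chance that more than half of the 300 respondents prefer the brand.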

Summary: page 366

Home assignment:

- Section 9.4 Exercises pages 359 - 361: 9.6, 9.9, 9.15, 9.20

- Section 9.5 Exercises page 364: 9.24, 9.25, 9.28, 9.30
