Lecture 3

Biostatistics lecture 3
The Normal Distribution

• Using Statistics
• Properties of the Normal Distribution
• The Standard Normal Distribution
• The Transformation of Normal Random Variables
• The Inverse Transformation
• The Normal Approximation of Binomial Distributions
LEARNING OBJECTIVES

After studying this chapter, you should be able to:

• Identify when a random variable will be normally distributed
• Use the properties of normal distributions
• Explain the significance of the standard normal distribution
• Compute probabilities using normal distribution tables
• Transform a normal distribution into a standard normal distribution
• Convert a binomial distribution into an approximate normal distribution
Introduction
As n increases, the binomial distribution approaches a ...
[Figure: binomial distributions with p = .5 for n = 6, n = 10, and n = 14; horizontal axis x, vertical axis P(x)]

Normal Probability Density Function:

$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}} \quad \text{for } -\infty < x < \infty$$

where e = 2.7182818... and π = 3.14159265...

[Figure: normal distribution with μ = 0, σ = 1; horizontal axis x, vertical axis f(x)]
Properties of the Normal Distribution

• The normal is a family of bell-shaped and symmetric distributions.
Because the distribution is symmetric, one-half (.50 or 50%) lies on either side of the mean.
Each is characterized by a different pair of mean, μ, and variance, σ². That is: X ~ N(μ, σ²).
Each is asymptotic to the horizontal axis.
The area under any normal probability density function within kσ of μ is the same for any normal distribution, regardless of the mean and variance.
Properties of the Normal Distribution
(continued)

• If several independent random variables are normally distributed, then their sum will also be normally distributed.
• The mean of the sum will be the sum of all the individual means.
• The variance of the sum will be the sum of all the individual
variances (by virtue of the independence).
Properties of the Normal Distribution
(continued)

• If X1, X2, …, Xn are independent normal random variables, then their sum S will also be normally distributed with
• E(S) = E(X1) + E(X2) + … + E(Xn)
• V(S) = V(X1) + V(X2) + … + V(Xn)
• Note: It is the variances that can be added above and not the
standard deviations.
Properties of the Normal Distribution
– Example

Example 4.1: Let X1, X2, and X3 be independent random variables that are normally distributed with means and variances as shown.

      Mean   Variance
X1    10     1
X2    20     2
X3    30     3

Let S = X1 + X2 + X3. Then E(S) = 10 + 20 + 30 = 60 and V(S) = 1 + 2 + 3 = 6. The standard deviation of S is √6 = 2.45.
Properties of the Normal Distribution
(continued)

• If X1, X2, …, Xn are independent normal random variables, then the random variable Q defined as Q = a1X1 + a2X2 + … + anXn + b will also be normally distributed with
• E(Q) = a1E(X1) + a2E(X2) + … + anE(Xn) + b
• V(Q) = a1² V(X1) + a2² V(X2) + … + an² V(Xn)
• Note: It is the variances (each weighted by its squared coefficient) that can be added above and not the standard deviations.
Properties of the Normal Distribution
– Example

Example 4.3: Let X1, X2, X3 and X4 be independent random variables that are normally distributed with means and variances as shown. Find the mean and variance of Q = X1 - 2X2 + 3X3 - 4X4 + 5.

      Mean   Variance
X1    12     4
X2    -5     2
X3    8      5
X4    10     1

E(Q) = 12 – 2(-5) + 3(8) – 4(10) + 5 = 11
V(Q) = 4 + (-2)²(2) + 3²(5) + (-4)²(1) = 73
SD(Q) = √73 = 8.544
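
The rule for a linear combination of independent normal random variables can be checked numerically. The sketch below is only an illustration: the helper name `linear_combination_moments` is an assumption made here, not something from the lecture, and it reuses the coefficients, means, and variances of Example 4.3.

```python
import math

def linear_combination_moments(coeffs, means, variances, constant=0.0):
    """Mean and variance of Q = sum(a_i * X_i) + b for independent X_i."""
    mean_q = sum(a * m for a, m in zip(coeffs, means)) + constant
    # Variances add with squared coefficients; the constant b does not contribute.
    var_q = sum(a * a * v for a, v in zip(coeffs, variances))
    return mean_q, var_q

# Example 4.3: Q = X1 - 2*X2 + 3*X3 - 4*X4 + 5
mean_q, var_q = linear_combination_moments(
    coeffs=[1, -2, 3, -4],
    means=[12, -5, 8, 10],
    variances=[4, 2, 5, 1],
    constant=5,
)
sd_q = math.sqrt(var_q)
print(mean_q, var_q, round(sd_q, 3))
```

Setting every coefficient to 1 and the constant to 0 gives the plain-sum case of Example 4.1.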
Normal Probability Distributions
All of these are normal probability density functions, though each has a different mean and variance.
[Figure: three normal density curves: W with μ = 40, σ = 1; X with μ = 30, σ = 5; Y with μ = 50, σ = 3]

W ~ N(40, 1)    X ~ N(30, 25)    Y ~ N(50, 9)

[Figure: standard normal density curve with μ = 0, σ = 1]

Z ~ N(0, 1)

Consider:
P(39 ≤ W ≤ 41)
P(25 ≤ X ≤ 35)
P(47 ≤ Y ≤ 53)
P(-1 ≤ Z ≤ 1)

The probability in each case is an area under a normal probability density function.
The Standard Normal Distribution

The standard normal random variable, Z, is the normal random variable with mean μ = 0 and standard deviation σ = 1: Z ~ N(0, 1²).

[Figure: standard normal distribution with μ = 0 and σ = 1; horizontal axis Z, vertical axis f(z)]
Finding Probabilities of the Standard
Normal Distribution: P(0 ≤ Z ≤ 1.56)

Standard Normal Probabilities

z     .00     .01     .02     .03     .04     .05     .06     .07     .08     .09
0.0   0.0000  0.0040  0.0080  0.0120  0.0160  0.0199  0.0239  0.0279  0.0319  0.0359
0.1   0.0398  0.0438  0.0478  0.0517  0.0557  0.0596  0.0636  0.0675  0.0714  0.0753
0.2   0.0793  0.0832  0.0871  0.0910  0.0948  0.0987  0.1026  0.1064  0.1103  0.1141
0.3   0.1179  0.1217  0.1255  0.1293  0.1331  0.1368  0.1406  0.1443  0.1480  0.1517
0.4   0.1554  0.1591  0.1628  0.1664  0.1700  0.1736  0.1772  0.1808  0.1844  0.1879
0.5   0.1915  0.1950  0.1985  0.2019  0.2054  0.2088  0.2123  0.2157  0.2190  0.2224
0.6   0.2257  0.2291  0.2324  0.2357  0.2389  0.2422  0.2454  0.2486  0.2517  0.2549
0.7   0.2580  0.2611  0.2642  0.2673  0.2704  0.2734  0.2764  0.2794  0.2823  0.2852
0.8   0.2881  0.2910  0.2939  0.2967  0.2995  0.3023  0.3051  0.3078  0.3106  0.3133
0.9   0.3159  0.3186  0.3212  0.3238  0.3264  0.3289  0.3315  0.3340  0.3365  0.3389
1.0   0.3413  0.3438  0.3461  0.3485  0.3508  0.3531  0.3554  0.3577  0.3599  0.3621
1.1   0.3643  0.3665  0.3686  0.3708  0.3729  0.3749  0.3770  0.3790  0.3810  0.3830
1.2   0.3849  0.3869  0.3888  0.3907  0.3925  0.3944  0.3962  0.3980  0.3997  0.4015
1.3   0.4032  0.4049  0.4066  0.4082  0.4099  0.4115  0.4131  0.4147  0.4162  0.4177
1.4   0.4192  0.4207  0.4222  0.4236  0.4251  0.4265  0.4279  0.4292  0.4306  0.4319
1.5   0.4332  0.4345  0.4357  0.4370  0.4382  0.4394  0.4406  0.4418  0.4429  0.4441
1.6   0.4452  0.4463  0.4474  0.4484  0.4495  0.4505  0.4515  0.4525  0.4535  0.4545
1.7   0.4554  0.4564  0.4573  0.4582  0.4591  0.4599  0.4608  0.4616  0.4625  0.4633
1.8   0.4641  0.4649  0.4656  0.4664  0.4671  0.4678  0.4686  0.4693  0.4699  0.4706
1.9   0.4713  0.4719  0.4726  0.4732  0.4738  0.4744  0.4750  0.4756  0.4761  0.4767
2.0   0.4772  0.4778  0.4783  0.4788  0.4793  0.4798  0.4803  0.4808  0.4812  0.4817
2.1   0.4821  0.4826  0.4830  0.4834  0.4838  0.4842  0.4846  0.4850  0.4854  0.4857
2.2   0.4861  0.4864  0.4868  0.4871  0.4875  0.4878  0.4881  0.4884  0.4887  0.4890
2.3   0.4893  0.4896  0.4898  0.4901  0.4904  0.4906  0.4909  0.4911  0.4913  0.4916
2.4   0.4918  0.4920  0.4922  0.4925  0.4927  0.4929  0.4931  0.4932  0.4934  0.4936
2.5   0.4938  0.4940  0.4941  0.4943  0.4945  0.4946  0.4948  0.4949  0.4951  0.4952
2.6   0.4953  0.4955  0.4956  0.4957  0.4959  0.4960  0.4961  0.4962  0.4963  0.4964
2.7   0.4965  0.4966  0.4967  0.4968  0.4969  0.4970  0.4971  0.4972  0.4973  0.4974
2.8   0.4974  0.4975  0.4976  0.4977  0.4977  0.4978  0.4979  0.4979  0.4980  0.4981
2.9   0.4981  0.4982  0.4982  0.4983  0.4984  0.4984  0.4985  0.4985  0.4986  0.4986
3.0   0.4987  0.4987  0.4987  0.4988  0.4988  0.4989  0.4989  0.4989  0.4990  0.4990

Look in the row labeled 1.5 and the column labeled .06 to find P(0 ≤ Z ≤ 1.56) = 0.4406.

[Figure: standard normal distribution with the area between 0 and 1.56 shaded]
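
The table entries are areas between 0 and z under the standard normal curve, so a lookup can be cross-checked with the error function. This is a minimal sketch using only Python's standard library; the function names `standard_normal_cdf` and `table_area` are illustrative choices, not part of the lecture.

```python
import math

def standard_normal_cdf(z):
    """P(Z <= z) for a standard normal Z, written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def table_area(z):
    """Area between 0 and z, the quantity tabulated above: P(0 <= Z <= z)."""
    return standard_normal_cdf(z) - 0.5

area = table_area(1.56)
print(round(area, 4))  # agrees with the table entry in row 1.5, column .06
```

Adding .5 to a table area gives the cumulative probability P(Z ≤ z), and subtracting the table area from .5 gives a tail probability.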
Finding Probabilities of the Standard
Normal Distribution: P(Z < -2.47)

To find P(Z < -2.47):
Find the table area for 2.47: P(0 < Z < 2.47) = .4932
By symmetry, the area to the left of -2.47 equals the area to the right of 2.47, so
P(Z < -2.47) = .5 - P(0 < Z < 2.47) = .5 - .4932 = 0.0068

z     ...   .06     .07     .08
2.3   ...   0.4909  0.4911  0.4913
2.4   ...   0.4931  0.4932  0.4934
2.5   ...   0.4948  0.4949  0.4951

[Figure: standard normal distribution; the area to the left of -2.47 is 0.0068 and the table area for 2.47 is 0.4932]
Finding Probabilities of the Standard
Normal Distribution: P(1 ≤ Z ≤ 2)

To find P(1 ≤ Z ≤ 2):
1. Find the table area for 2.00: F(2) = P(Z ≤ 2.00) = .5 + .4772 = .9772
2. Find the table area for 1.00: F(1) = P(Z ≤ 1.00) = .5 + .3413 = .8413
3. P(1 ≤ Z ≤ 2.00) = P(Z ≤ 2.00) - P(Z ≤ 1.00) = .9772 - .8413 = 0.1359

z     .00     ...
0.9   0.3159  ...
1.0   0.3413  ...
1.1   0.3643  ...
.     .
1.9   0.4713  ...
2.0   0.4772  ...
2.1   0.4821  ...

[Figure: standard normal distribution; the area between 1 and 2 is P(1 ≤ Z ≤ 2) = .9772 - .8413 = 0.1359]
Finding Values of the Standard Normal
Random Variable: P(0 ≤ Z ≤ z) = 0.40

To find z such that P(0 ≤ Z ≤ z) = .40:
1. Find a probability as close as possible to .40 in the table of standard normal probabilities.
2. Then determine the value of z from the corresponding row and column.

z     .00     .01     .02     .03     .04     .05     .06     .07     .08     .09
0.0   0.0000  0.0040  0.0080  0.0120  0.0160  0.0199  0.0239  0.0279  0.0319  0.0359
0.1   0.0398  0.0438  0.0478  0.0517  0.0557  0.0596  0.0636  0.0675  0.0714  0.0753
0.2   0.0793  0.0832  0.0871  0.0910  0.0948  0.0987  0.1026  0.1064  0.1103  0.1141
0.3   0.1179  0.1217  0.1255  0.1293  0.1331  0.1368  0.1406  0.1443  0.1480  0.1517
0.4   0.1554  0.1591  0.1628  0.1664  0.1700  0.1736  0.1772  0.1808  0.1844  0.1879
0.5   0.1915  0.1950  0.1985  0.2019  0.2054  0.2088  0.2123  0.2157  0.2190  0.2224
0.6   0.2257  0.2291  0.2324  0.2357  0.2389  0.2422  0.2454  0.2486  0.2517  0.2549
0.7   0.2580  0.2611  0.2642  0.2673  0.2704  0.2734  0.2764  0.2794  0.2823  0.2852
0.8   0.2881  0.2910  0.2939  0.2967  0.2995  0.3023  0.3051  0.3078  0.3106  0.3133
0.9   0.3159  0.3186  0.3212  0.3238  0.3264  0.3289  0.3315  0.3340  0.3365  0.3389
1.0   0.3413  0.3438  0.3461  0.3485  0.3508  0.3531  0.3554  0.3577  0.3599  0.3621
1.1   0.3643  0.3665  0.3686  0.3708  0.3729  0.3749  0.3770  0.3790  0.3810  0.3830
1.2   0.3849  0.3869  0.3888  0.3907  0.3925  0.3944  0.3962  0.3980  0.3997  0.4015
1.3   0.4032  0.4049  0.4066  0.4082  0.4099  0.4115  0.4131  0.4147  0.4162  0.4177
.     .       .       .       .       .       .       .       .       .       .

The closest entry to .40 is .3997, in row 1.2 and column .08, so

P(0 ≤ Z ≤ 1.28) ≈ .40

Also, since P(Z ≤ 0) = .50,

P(Z ≤ 1.28) ≈ .90

[Figure: standard normal distribution; area to the left of 0 = .50, area between 0 and z = 1.28 is .40 (.3997)]
99% Interval around the Mean

To have .99 in the center of the distribution, there should be (1/2)(1-.99) = (1/2)(.01) = .005 in each tail of the distribution, and (1/2)(.99) = .495 in each half of the .99 interval. That is:

P(0 ≤ Z ≤ z.005) = .495

z     ...   .04     .05     .06     .07     .08     .09
2.4   ...   0.4927  0.4929  0.4931  0.4932  0.4934  0.4936
2.5   ...   0.4945  0.4946  0.4948  0.4949  0.4951  0.4952
2.6   ...   0.4959  0.4960  0.4961  0.4962  0.4963  0.4964

Look to the table of standard normal probabilities to find that .495 falls midway between the entries for 2.57 (.4949) and 2.58 (.4951):

-z.005 = -2.575    z.005 = 2.575

P(-2.575 ≤ Z ≤ 2.575) = .99

[Figure: standard normal distribution; total area in center = .99 (.495 on each side of 0), area in each tail = .005, with -z.005 = -2.575 and z.005 = 2.575 marked]
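
Reading the table "backwards" amounts to inverting the cumulative distribution function. As a rough cross-check, the sketch below uses `statistics.NormalDist.inv_cdf` from the Python standard library; the helper name `z_for_table_area` is an assumption made for illustration. Note that the exact quantile for .995 rounds to 2.576, while the table midpoint used above is 2.575.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)

def z_for_table_area(area):
    """Return z such that P(0 <= Z <= z) equals the given table area."""
    # The table area excludes the left half of the curve, so add .5 back.
    return std_normal.inv_cdf(0.5 + area)

z_40 = z_for_table_area(0.40)    # area .40 between 0 and z
z_005 = z_for_table_area(0.495)  # leaves .005 in the right tail
print(round(z_40, 2), round(z_005, 3))
```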
The Transformation of Normal
Random Variables
The area within kσ of the mean is the same for all normal random variables. So an area under any normal distribution is equivalent to an area under the standard normal. In this example: P(40 ≤ X ≤ 60) = P(-1 ≤ Z ≤ 1) since μ = 50 and σ = 10.

The transformation of X to Z:

$$Z = \frac{X - \mu_x}{\sigma_x}$$

Transformation:
(1) Subtraction: (X - μx)
(2) Division by σx

The inverse transformation of Z to X:

$$X = \mu_x + Z\sigma_x$$

[Figure: normal distribution with μ = 50, σ = 10 (horizontal axis X) mapped onto the standard normal distribution with σ = 1.0 (horizontal axis Z)]
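Example: Using the Normal Transformation

X ~ N(160, 30²)

$$\begin{aligned}
P(100 \le X \le 180) &= P\left(\frac{100-\mu}{\sigma} \le \frac{X-\mu}{\sigma} \le \frac{180-\mu}{\sigma}\right) \\
&= P\left(\frac{100-160}{30} \le Z \le \frac{180-160}{30}\right) \\
&= P(-2 \le Z \le 0.6666) \\
&= 0.4772 + 0.2475 = 0.7247
\end{aligned}$$

Using the Normal Transformation

Example

X ~ N(127, 22²)

$$\begin{aligned}
P(X < 150) &= P\left(\frac{X-\mu}{\sigma} < \frac{150-\mu}{\sigma}\right) \\
&= P\left(Z < \frac{150-127}{22}\right) \\
&= P(Z < 1.045) \\
&= 0.5 + 0.3520 = 0.8520
\end{aligned}$$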
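
The two examples above can be reproduced by standardizing the limits and evaluating the standard normal CDF. This is a sketch under the assumption that Python's `statistics.NormalDist` is an acceptable stand-in for the printed table; the function name `normal_interval_probability` is hypothetical. Results may differ from the slide values in the fourth decimal place because the table is rounded and interpolated.

```python
from statistics import NormalDist

def normal_interval_probability(mu, sigma, lower=None, upper=None):
    """P(lower <= X <= upper) for X ~ N(mu, sigma^2), computed by standardizing."""
    z = NormalDist()  # standard normal, mean 0 and standard deviation 1
    p_upper = 1.0 if upper is None else z.cdf((upper - mu) / sigma)
    p_lower = 0.0 if lower is None else z.cdf((lower - mu) / sigma)
    return p_upper - p_lower

p1 = normal_interval_probability(160, 30, lower=100, upper=180)  # P(100 <= X <= 180)
p2 = normal_interval_probability(127, 22, upper=150)             # P(X < 150)
print(round(p1, 3), round(p2, 3))
```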
The Transformation of Normal
Random Variables

The transformation of X to Z:

$$Z = \frac{X - \mu_x}{\sigma_x}$$

The inverse transformation of Z to X:

$$X = \mu_x + Z\sigma_x$$

The transformation of X to Z, where a and b are numbers:

$$P(X < a) = P\left(Z < \frac{a-\mu}{\sigma}\right)$$

$$P(X > b) = P\left(Z > \frac{b-\mu}{\sigma}\right)$$

$$P(a < X < b) = P\left(\frac{a-\mu}{\sigma} < Z < \frac{b-\mu}{\sigma}\right)$$
Normal Probabilities (Empirical Rule)

• The probability that a normal random variable will be within 1 standard deviation from its mean (on either side) is 0.6826, or approximately 0.68.
• The probability that a normal random variable will be within 2 standard deviations from its mean is 0.9544, or approximately 0.95.
• The probability that a normal random variable will be within 3 standard deviations from its mean is 0.9974.

[Figure: standard normal distribution; horizontal axis Z, vertical axis f(z)]
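
The three probabilities in the empirical rule come from doubling the table areas for z = 1, 2, and 3. The sketch below recomputes them without a table; `probability_within` is a name chosen here for illustration. Because the table values are rounded to four decimals before doubling, the slide figures (0.6826, 0.9544, 0.9974) can differ from the unrounded results in the last digit.

```python
from statistics import NormalDist

def probability_within(k):
    """P(mu - k*sigma <= X <= mu + k*sigma) for any normal X, i.e. P(-k <= Z <= k)."""
    z = NormalDist()
    return z.cdf(k) - z.cdf(-k)

for k in (1, 2, 3):
    print(k, round(probability_within(k), 4))
```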
The Inverse Transformation
The area within kσ of the mean is the same for all normal random variables. To find a probability associated with any interval of values for any normal random variable, all that is needed is to express the interval in terms of numbers of standard deviations from the mean. That is the purpose of the standard normal transformation. If X ~ N(50, 10²),

$$P(X > 70) = P\left(\frac{X-\mu}{\sigma} > \frac{70-\mu}{\sigma}\right) = P\left(Z > \frac{70-50}{10}\right) = P(Z > 2)$$

That is, P(X > 70) can be found easily because 70 is 2 standard deviations above the mean of X: 70 = μ + 2σ. P(X > 70) is equivalent to P(Z > 2), an area under the standard normal distribution.

Example 4-12: X ~ N(124, 12²)
P(X > x) = 0.10 and P(Z > 1.28) ≈ 0.10
x = μ + zσ = 124 + (1.28)(12) = 139.36

z     ...   .07     .08     .09
1.1   ...   0.3790  0.3810  0.3830
1.2   ...   0.3980  0.3997  0.4015
1.3   ...   0.4147  0.4162  0.4177

[Figure: normal distribution with μ = 124, σ = 12; the point x = 139.36 cuts off a right-tail area of 0.10]
The Inverse Transformation (Continued)

Example: X ~ N(5.7, 0.5²)
P(X > x) = 0.01 and P(Z > 2.33) ≈ 0.01
x = μ + zσ = 5.7 + (2.33)(0.5) = 6.865

z     ...   .02     .03     .04
2.2   ...   0.4868  0.4871  0.4875
2.3   ...   0.4898  0.4901  0.4904
2.4   ...   0.4922  0.4925  0.4927

[Figure: normal distribution with μ = 5.7, σ = 0.5; area = 0.49 between the mean and x.01 = μ + zσ = 5.7 + (2.33)(0.5) = 6.865, area = 0.01 in the right tail; z.01 = 2.33 on the standard normal scale]

Example: X ~ N(2450, 400²)
P(a < X < b) = 0.95 and P(-1.96 < Z < 1.96) = 0.95
x = μ ± zσ = 2450 ± (1.96)(400) = 2450 ± 784 = (1666, 3234)
P(1666 < X < 3234) = 0.95

z     ...   .05     .06     .07
1.8   ...   0.4678  0.4686  0.4693
1.9   ...   0.4744  0.4750  0.4756
2.0   ...   0.4798  0.4803  0.4808

[Figure: normal distribution with μ = 2450, σ = 400; areas of .4750 on each side of the mean and .0250 in each tail, between z = -1.96 and z = 1.96 on the standard normal scale]
Finding Values of a Normal Random
Variable, Given a Probability

1. Draw pictures of the normal distribution in question and of the standard normal distribution.
2. Shade the area corresponding to the desired probability.
3. From the table of the standard normal distribution, find the z value or values.
4. Use the transformation from z to x to get value(s) of the original random variable.

z     ...   .05     .06     .07
1.8   ...   0.4678  0.4686  0.4693
1.9   ...   0.4744  0.4750  0.4756
2.0   ...   0.4798  0.4803  0.4808

x = μ ± zσ = 2450 ± (1.96)(400) = 2450 ± 784 = (1666, 3234)

[Figure: normal distribution with μ = 2450, σ = 400 and the standard normal distribution, each with the central area .9500 shaded (.4750 on each side of the mean); z = -1.96 and z = 1.96 marked]
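
Steps 3 and 4 of the procedure (find z, then transform back to x) can be sketched in code as follows. This assumes the standard library's `NormalDist.inv_cdf` in place of a table lookup, and the helper names `central_interval` and `upper_cutoff` are hypothetical. Since the code uses unrounded z values, results can differ slightly from hand calculations with z = 1.96 or z = 1.28.

```python
from statistics import NormalDist

def central_interval(mu, sigma, probability):
    """Return (a, b), symmetric about mu, with P(a < X < b) = probability."""
    tail = (1.0 - probability) / 2.0        # area left over in each tail
    z = NormalDist().inv_cdf(1.0 - tail)    # about 1.96 when probability = 0.95
    return mu - z * sigma, mu + z * sigma   # inverse transformation x = mu +/- z*sigma

def upper_cutoff(mu, sigma, right_tail):
    """Return x with P(X > x) = right_tail."""
    z = NormalDist().inv_cdf(1.0 - right_tail)
    return mu + z * sigma

a, b = central_interval(2450, 400, 0.95)
x = upper_cutoff(124, 12, 0.10)
print(round(a), round(b), round(x, 2))
```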
Finding Values of a Normal Random
Variable, Given a Probability
The normal distribution with μ = 3.5 and σ = 1.323 is a close approximation to the binomial with n = 7 and p = 0.50.

Normal: P(X < 4.5) = 0.7749
Binomial: P(X ≤ 4) = 0.7734

[Figure: normal distribution with μ = 3.5, σ = 1.323 beside the binomial distribution with n = 7, p = 0.50]

MTB > cdf 4.5;
SUBC> normal 3.5 1.323.
Cumulative Distribution Function
Normal with mean = 3.50000 and standard deviation = 1.32300
      x    P( X <= x)
 4.5000        0.7751

MTB > cdf 4;
SUBC> binomial 7,.5.
Cumulative Distribution Function
Binomial with n = 7 and p = 0.500000
      x    P( X <= x)
   4.00        0.7734
The Normal Approximation of Binomial
Distribution

The normal distribution with μ = 5.5 and σ = 1.6583 is a closer approximation to the binomial with n = 11 and p = 0.50.

Normal: P(X < 4.5) = 0.2732
Binomial: P(X ≤ 4) = 0.2744

[Figure: normal distribution with μ = 5.5, σ = 1.6583 beside the binomial distribution with n = 11, p = 0.50]
Approximating a Binomial Probability
Using the Normal Distribution

$$P(a \le X \le b) \approx P\left(\frac{a - np}{\sqrt{np(1-p)}} \le Z \le \frac{b - np}{\sqrt{np(1-p)}}\right)$$

for n large (n ≥ 50) and p not too close to 0 or 1.00

or:

$$P(a \le X \le b) \approx P\left(\frac{a - 0.5 - np}{\sqrt{np(1-p)}} \le Z \le \frac{b + 0.5 - np}{\sqrt{np(1-p)}}\right)$$

for n moderately large (20 ≤ n < 50).

NOTE: If p is either small (close to 0) or large (close to 1), use the Poisson approximation.
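
To see how close the continuity-corrected approximation is, an exact binomial probability can be compared with its normal counterpart. The following is a minimal sketch with illustrative function names (`binomial_cdf`, `normal_approx_cdf`); it uses the n = 11, p = 0.50 case discussed earlier, which is smaller than the sample sizes recommended above and serves only as a numerical check.

```python
import math
from statistics import NormalDist

def binomial_cdf(k, n, p):
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def normal_approx_cdf(k, n, p):
    """Approximate P(X <= k) with N(np, np(1-p)) and a 0.5 continuity correction."""
    mu = n * p
    sigma = math.sqrt(n * p * (1 - p))
    return NormalDist(mu, sigma).cdf(k + 0.5)

exact = binomial_cdf(4, 11, 0.5)
approx = normal_approx_cdf(4, 11, 0.5)
print(round(exact, 4), round(approx, 4))
```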
Confidence interval Using Statistics

• Consider the following statements:

x̄ = 550
• A single-valued estimate that conveys little information about the actual value of the population mean.
We are 99% confident that μ is in the interval [449,551]
• An interval estimate which locates the population mean within a narrow interval, with a high level of confidence.
We are 90% confident that μ is in the interval [400,700]
• An interval estimate which locates the population mean within a broader interval, with a lower level of confidence.
Confidence Interval or Interval
Estimate

A confidence interval or interval estimate is a range or interval of numbers believed to include an unknown population parameter. Associated with the interval is a measure of the confidence we have that the interval does indeed contain the parameter of interest.

• A confidence interval or interval estimate has two components:
A range or interval of values
An associated level of confidence
Confidence Interval for μ
When σ Is Known
• If the population distribution is normal, the sampling distribution of the mean is normal.
• If the sample is sufficiently large, regardless of the shape of the population distribution, the sampling distribution of the mean is approximately normal (Central Limit Theorem).
In either case:

$$P\left(\mu - 1.96\frac{\sigma}{\sqrt{n}} < \bar{x} < \mu + 1.96\frac{\sigma}{\sqrt{n}}\right) = 0.95$$

or

$$P\left(\bar{x} - 1.96\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + 1.96\frac{\sigma}{\sqrt{n}}\right) = 0.95$$

[Figure: standard normal distribution with the central 95% interval shaded]
Confidence Interval for μ when σ is Known
(Continued)
Before sampling, there is a 0.95 probability that the interval

$$\mu \pm 1.96\frac{\sigma}{\sqrt{n}}$$

will include the sample mean (and 5% that it will not).

Conversely, after sampling, approximately 95% of such intervals

$$\bar{x} \pm 1.96\frac{\sigma}{\sqrt{n}}$$

will include the population mean (and 5% of them will not).

That is, $\bar{x} \pm 1.96\frac{\sigma}{\sqrt{n}}$ is a 95% confidence interval for μ.
A 95% Interval around the Population
Mean
Approximately 95% of sample means can be expected to fall within the interval

$$\left[\mu - 1.96\frac{\sigma}{\sqrt{n}},\ \mu + 1.96\frac{\sigma}{\sqrt{n}}\right]$$

Conversely, about 2.5% can be expected to be above $\mu + 1.96\frac{\sigma}{\sqrt{n}}$ and 2.5% can be expected to be below $\mu - 1.96\frac{\sigma}{\sqrt{n}}$.

So 5% can be expected to fall outside the interval $\left[\mu - 1.96\frac{\sigma}{\sqrt{n}},\ \mu + 1.96\frac{\sigma}{\sqrt{n}}\right]$.

[Figure: sampling distribution of the mean with the central 95% between μ - 1.96σ/√n and μ + 1.96σ/√n and 2.5% in each tail; individual sample means x̄ plotted below, with 95% falling within the interval, 2.5% falling below it, and 2.5% falling above it]
The 95% Confidence Interval for μ

A 95% confidence interval for μ when σ is known and sampling is done from a normal population, or a large sample is used, is:

$$\bar{x} \pm 1.96\frac{\sigma}{\sqrt{n}}$$

The quantity $1.96\frac{\sigma}{\sqrt{n}}$ is often called the margin of error or the sampling error.

For example, if n = 25, σ = 20, and x̄ = 122, a 95% confidence interval is:

$$\bar{x} \pm 1.96\frac{\sigma}{\sqrt{n}} = 122 \pm 1.96\frac{20}{\sqrt{25}} = 122 \pm (1.96)(4) = 122 \pm 7.84 = [114.16,\ 129.84]$$
A (1-α)100% Confidence Interval for μ

We define $z_{\alpha/2}$ as the z value that cuts off a right-tail area of α/2 under the standard normal curve. (1-α) is called the confidence coefficient. α is called the error probability, and (1-α)100% is called the confidence level.

$$P\left(Z > z_{\alpha/2}\right) = \alpha/2$$

$$P\left(Z < -z_{\alpha/2}\right) = \alpha/2$$

$$P\left(-z_{\alpha/2} < Z < z_{\alpha/2}\right) = (1 - \alpha)$$

(1-α)100% Confidence Interval:

$$\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$

[Figure: standard normal distribution with central area (1 - α) between -z_{α/2} and z_{α/2} and area α/2 in each tail]
Critical Values of z and Levels of
Confidence

(1 - α)    α/2      z_{α/2}
0.99       0.005    2.576
0.98       0.010    2.326
0.95       0.025    1.960
0.90       0.050    1.645
0.80       0.100    1.282

[Figure: standard normal distribution with central area (1 - α) between -z_{α/2} and z_{α/2} and area α/2 in each tail]



The Level of Confidence and the
Width of the Confidence Interval
When sampling from the same population, using a fixed sample size, the higher the confidence level, the wider the confidence interval.

80% Confidence Interval: $\bar{x} \pm 1.28\frac{\sigma}{\sqrt{n}}$

95% Confidence Interval: $\bar{x} \pm 1.96\frac{\sigma}{\sqrt{n}}$

[Figure: two standard normal distributions showing the 80% and the 95% central intervals]
The Sample Size and the Width of the
Confidence Interval
When sampling from the same population, using a fixed confidence level, the larger the sample size, n, the narrower the confidence interval.

[Figure: sampling distributions of the mean with 95% confidence intervals for n = 20 and for n = 40]

Example

• Shrimmy is planning to invest heavily in the black tiger breed. As part of the decision, the company wants to estimate the average amount of black tiger shrimp a family of four would need per month. A random sample of n = 100 families is obtained, and in this sample the average amount of shrimp in pounds per month is 6.5 and the population standard deviation is known to be 3.2. Construct a 95% confidence interval for the average amount of shrimp consumed by the entire population of families of four.
Confidence Interval or Interval Estimate for μ
When σ Is Unknown - The t Distribution

If the population standard deviation, σ, is not known, replace σ with the sample standard deviation, s. If the population is normal, the resulting statistic:

$$t = \frac{\bar{X} - \mu}{s/\sqrt{n}}$$

has a t distribution with (n - 1) degrees of freedom.
• The t is a family of bell-shaped and symmetric distributions, one for each number of degrees of freedom.
• The expected value of t is 0.
• For df > 2, the variance of t is df/(df-2). This is greater than 1, but approaches 1 as the number of degrees of freedom increases. The t is flatter and has fatter tails than does the standard normal.
• The t distribution approaches a standard normal as the number of degrees of freedom increases.

[Figure: standard normal density compared with t densities for df = 20 and df = 10]


Confidence Intervals for μ when σ is
Unknown - The t Distribution

A (1-α)100% confidence interval for μ when σ is not known (assuming a normally distributed population) is given by:

$$\bar{x} \pm t_{\alpha/2}\frac{s}{\sqrt{n}}$$

where $t_{\alpha/2}$ is the value of the t distribution with n-1 degrees of freedom that cuts off a tail area of α/2 to its right.
The t Distribution

df    t0.100  t0.050  t0.025  t0.010  t0.005
---   -----   -----   ------  ------  ------
1     3.078   6.314   12.706  31.821  63.657
2     1.886   2.920   4.303   6.965   9.925
3     1.638   2.353   3.182   4.541   5.841
4     1.533   2.132   2.776   3.747   4.604
5     1.476   2.015   2.571   3.365   4.032
6     1.440   1.943   2.447   3.143   3.707
7     1.415   1.895   2.365   2.998   3.499
8     1.397   1.860   2.306   2.896   3.355
9     1.383   1.833   2.262   2.821   3.250
10    1.372   1.812   2.228   2.764   3.169
11    1.363   1.796   2.201   2.718   3.106
12    1.356   1.782   2.179   2.681   3.055
13    1.350   1.771   2.160   2.650   3.012
14    1.345   1.761   2.145   2.624   2.977
15    1.341   1.753   2.131   2.602   2.947
16    1.337   1.746   2.120   2.583   2.921
17    1.333   1.740   2.110   2.567   2.898
18    1.330   1.734   2.101   2.552   2.878
19    1.328   1.729   2.093   2.539   2.861
20    1.325   1.725   2.086   2.528   2.845
21    1.323   1.721   2.080   2.518   2.831
22    1.321   1.717   2.074   2.508   2.819
23    1.319   1.714   2.069   2.500   2.807
24    1.318   1.711   2.064   2.492   2.797
25    1.316   1.708   2.060   2.485   2.787
26    1.315   1.706   2.056   2.479   2.779
27    1.314   1.703   2.052   2.473   2.771
28    1.313   1.701   2.048   2.467   2.763
29    1.311   1.699   2.045   2.462   2.756
30    1.310   1.697   2.042   2.457   2.750
40    1.303   1.684   2.021   2.423   2.704
60    1.296   1.671   2.000   2.390   2.660
120   1.289   1.658   1.980   2.358   2.617
∞     1.282   1.645   1.960   2.326   2.576

[Figure: t distribution with df = 10; area = 0.10 in each tail beyond ±1.372 and area = 0.025 in each tail beyond ±2.228]

Whenever σ is not known (and the population is assumed normal), the correct distribution to use is the t distribution with n-1 degrees of freedom. Note, however, that for large degrees of freedom, the t distribution is approximated well by the Z distribution.
Example

A blood analyst wants to estimate the average AFP index of the Vietnamese people. A random blood sample of size 15 yields an average of x̄ = 10.37 ng/ml and a standard deviation of s = 3.5 ng/ml. Assuming a normal population of the AFP values, give a 95% confidence interval for the average AFP value of the Vietnamese population. (AFP = alpha-fetoprotein)

df    t0.100  t0.050  t0.025  t0.010  t0.005
---   -----   -----   ------  ------  ------
1     3.078   6.314   12.706  31.821  63.657
.     .       .       .       .       .
13    1.350   1.771   2.160   2.650   3.012
14    1.345   1.761   2.145   2.624   2.977
15    1.341   1.753   2.131   2.602   2.947
.     .       .       .       .       .

The critical value of t for df = (n - 1) = (15 - 1) = 14 and a right-tail area of 0.025 is:

t0.025 = 2.145

The corresponding confidence interval or interval estimate is:

$$\bar{x} \pm t_{0.025}\frac{s}{\sqrt{n}} = 10.37 \pm 2.145\frac{3.5}{\sqrt{15}} = 10.37 \pm 1.94 = [8.43,\ 12.31]$$
Large Sample Confidence Intervals for
the Population Mean

df    t0.100  t0.050  t0.025  t0.010  t0.005
---   -----   -----   ------  ------  ------
1     3.078   6.314   12.706  31.821  63.657
.     .       .       .       .       .
120   1.289   1.658   1.980   2.358   2.617
∞     1.282   1.645   1.960   2.326   2.576

Whenever σ is not known (and the population is assumed normal), the correct distribution to use is the t distribution with n-1 degrees of freedom. Note, however, that for large degrees of freedom, the t distribution is approximated well by the Z distribution.
Large Sample Confidence Intervals for
the Population Mean

A large-sample (1-α)100% confidence interval for μ:

$$\bar{x} \pm z_{\alpha/2}\frac{s}{\sqrt{n}}$$

Example: An environmental scientist wants to estimate the average amount of NOx in a given region. A random sample of 100 data points gives x̄ = 357.60 ppm and s = 140.00 ppm. Give a 95% confidence interval for μ, the average amount of NOx in any sample taken.

$$\bar{x} \pm z_{0.025}\frac{s}{\sqrt{n}} = 357.60 \pm 1.96\frac{140.00}{\sqrt{100}} = 357.60 \pm 27.44 = [330.16,\ 385.04]$$
Exercise 1
Exercise 2
Large-Sample Confidence Intervals
for the Population Proportion, p

The estimator of the population proportion, p, is the sample proportion, p̂. If the sample size is large, p̂ has an approximately normal distribution, with E(p̂) = p and V(p̂) = pq/n, where q = (1 - p). When the population proportion is unknown, use the estimated value, p̂, to estimate the standard deviation of p̂.

For estimating p, a sample is considered large enough when both n·p and n·q are greater than 5.
Large-Sample Confidence Intervals
for the Population Proportion, p

A large-sample (1-α)100% confidence interval for the population proportion, p:

$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}$$

where the sample proportion, p̂, is equal to the number of successes in the sample, x, divided by the number of trials (the sample size), n, and q̂ = 1 - p̂.
Example

A marketing research firm wants to estimate the share that foreign companies have in the American market for certain products. A random sample of 100 consumers is obtained, and it is found that 34 people in the sample are users of foreign-made products; the rest are users of domestic products. Give a 95% confidence interval for the share of foreign products in this market.

$$\begin{aligned}
\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}} &= 0.34 \pm 1.96\sqrt{\frac{(0.34)(0.66)}{100}} \\
&= 0.34 \pm (1.96)(0.04737) \\
&= 0.34 \pm 0.0928 \\
&= [0.2472,\ 0.4328]
\end{aligned}$$

Thus, the firm may be 95% confident that foreign manufacturers control anywhere from 24.72% to 43.28% of the market.
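
A short script can reproduce the proportion interval from the raw counts. This is a sketch only: the function name `proportion_confidence_interval` is hypothetical, and the critical value comes from `statistics.NormalDist.inv_cdf` rather than the printed table, so the last digit may differ from a hand calculation with z = 1.96.

```python
import math
from statistics import NormalDist

def proportion_confidence_interval(successes, n, confidence=0.95):
    """Large-sample interval p_hat +/- z * sqrt(p_hat * q_hat / n)."""
    p_hat = successes / n
    q_hat = 1.0 - p_hat
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    margin = z * math.sqrt(p_hat * q_hat / n)
    return p_hat - margin, p_hat + margin

low, high = proportion_confidence_interval(34, 100)
print(round(low, 4), round(high, 4))
```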
Exercise 3
Confidence Intervals for the Population Variance:
The Chi-Square (χ²) Distribution

• The sample variance, s², is an unbiased estimator of the population variance, σ².
• Confidence intervals for the population variance are based on the chi-square (χ²) distribution.
The chi-square distribution is the probability distribution of the sum of several independent, squared standard normal random variables.
The mean of the chi-square distribution is equal to the degrees of freedom parameter (E[χ²] = df). The variance of a chi-square is equal to twice the number of degrees of freedom (V[χ²] = 2df).
The Chi-Square (χ²) Distribution

• The chi-square random variable cannot be negative, so it is bounded by zero on the left.
• The chi-square distribution is skewed to the right.
• The chi-square distribution approaches a normal as the degrees of freedom increase.

[Figure: chi-square densities for df = 10, df = 30, and df = 50; horizontal axis χ², vertical axis f(χ²)]

In sampling from a normal population, the random variable:

$$\chi^2 = \frac{(n-1)s^2}{\sigma^2}$$

has a chi-square distribution with (n - 1) degrees of freedom.
Confidence Interval for the Population
Variance
A (1-α)100% confidence interval for the population variance* (where the population is assumed normal) is:

$$\left[\frac{(n-1)s^2}{\chi^2_{\alpha/2}},\ \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}\right]$$

where $\chi^2_{\alpha/2}$ is the value of the chi-square distribution with n - 1 degrees of freedom that cuts off an area α/2 to its right and $\chi^2_{1-\alpha/2}$ is the value of the distribution that cuts off an area of α/2 to its left (equivalently, an area of 1 - α/2 to its right).

* Note: Because the chi-square distribution is skewed, the confidence interval for the population variance is not symmetric.
Example

In an automated process, a machine fills cans of coffee. If the average amount filled is different from what it should be, the machine may be adjusted to correct the mean. If the variance of the filling process is too high, however, the machine is out of control and needs to be repaired. Therefore, from time to time regular checks of the variance of the filling process are made. This is done by randomly sampling filled cans, measuring their amounts, and computing the sample variance. A random sample of 30 cans gives an estimate s² = 18,540. Give a 95% confidence interval for the population variance, σ².

$$\left[\frac{(n-1)s^2}{\chi^2_{\alpha/2}},\ \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}\right] = \left[\frac{(30-1)18540}{45.7},\ \frac{(30-1)18540}{16.0}\right] = [11765,\ 33604]$$
Example (continued)

Area in Right Tail

df    .995    .990    .975    .950    .900    .100    .050    .025    .010    .005
.     .       .       .       .       .       .       .       .       .       .
28    12.46   13.56   15.31   16.93   18.94   37.92   41.34   44.46   48.28   50.99
29    13.12   14.26   16.05   17.71   19.77   39.09   42.56   45.72   49.59   52.34
30    13.79   14.95   16.79   18.49   20.60   40.26   43.77   46.98   50.89   53.67

$$\chi^2_{0.975} = 16.05 \qquad \chi^2_{0.025} = 45.72$$

[Figure: chi-square distribution with df = 29; central area 0.95 between χ² = 16.05 and χ² = 45.72, with area 0.025 in each tail]
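
The variance interval divides (n - 1)s² by the two chi-square critical values, which here are read from the table for df = 29. The sketch below uses the hypothetical helper `variance_confidence_interval` and the unrounded table values 45.72 and 16.05; the slide's figures of 11765 and 33604 correspond to the critical values rounded to 45.7 and 16.0, so the results differ slightly.

```python
def variance_confidence_interval(sample_variance, n, chi2_upper, chi2_lower):
    """Interval ((n-1)s^2 / chi2_upper, (n-1)s^2 / chi2_lower) for a normal population.

    chi2_upper cuts off alpha/2 to its right; chi2_lower cuts off alpha/2 to its left.
    Both are read from a chi-square table with n - 1 degrees of freedom.
    """
    numerator = (n - 1) * sample_variance
    # Dividing by the larger critical value gives the lower limit, and vice versa.
    return numerator / chi2_upper, numerator / chi2_lower

# Table values for df = 29 and a 95% interval (inputs, not computed here).
low, high = variance_confidence_interval(18540, 30, chi2_upper=45.72, chi2_lower=16.05)
print(round(low), round(high))
```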
Sample-Size Determination

Before determining the necessary sample size, three questions must be answered:
• How close do you want your sample estimate to be to the unknown parameter? (What is the desired bound, B?)
• What do you want the desired confidence level (1-α) to be so that the distance between your estimate and the parameter is less than or equal to B?
• What is your estimate of the variance (or standard deviation) of the population in question?

For example: A (1-α) confidence interval for μ is

$$\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$

where the bound, B, is the half-width $z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$.
Exercise 4
Sample Size and Standard Error

The sample size determines the bound of a statistic, since the standard error of a statistic shrinks as the sample size increases.

[Figure: two sampling distributions of a statistic, one for sample size n and one for sample size 2n, each labeled with its standard error]



Minimum Sample Size: Mean and
Proportion
Minimum required sample size in estimating the population mean, μ:

$$n = \frac{z_{\alpha/2}^2\,\sigma^2}{B^2}$$

Bound of estimate:

$$B = z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$

Minimum required sample size in estimating the population proportion, p:

$$n = \frac{z_{\alpha/2}^2\,pq}{B^2}$$
Example

A microbiologist wants to conduct an experiment to estimate the average amount of micro-organisms in the water of a popular river. He plans to determine the average amount of micro-organisms to within 120 µg/ml, with 95% confidence. From past records, an estimate of the population standard deviation is σ = 400 µg/ml. What is the minimum required sample size?

$$n = \frac{z_{\alpha/2}^2\,\sigma^2}{B^2} = \frac{(1.96)^2(400)^2}{120^2} = 42.684 \approx 43$$

The result is rounded up, since the sample size must be a whole number at least as large as the computed minimum.
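
Both minimum-sample-size formulas can be wrapped in small helpers that round up to the next whole number. This is a sketch under the assumption that the critical value is taken from `statistics.NormalDist.inv_cdf`; the function names `min_sample_size_mean` and `min_sample_size_proportion` are illustrative and not from the lecture.

```python
import math
from statistics import NormalDist

def min_sample_size_mean(sigma, bound, confidence=0.95):
    """Smallest n with z_{alpha/2} * sigma / sqrt(n) <= bound."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    return math.ceil((z * sigma / bound) ** 2)

def min_sample_size_proportion(p, bound, confidence=0.95):
    """Smallest n with z_{alpha/2} * sqrt(p * (1 - p) / n) <= bound."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    return math.ceil(z * z * p * (1.0 - p) / bound ** 2)

# Microbiologist example: sigma = 400, bound B = 120, 95% confidence.
print(min_sample_size_mean(sigma=400, bound=120))
```

For a proportion with no prior estimate, p = 0.5 is the most conservative input because it maximizes p(1 - p).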
