Lesson 1 Normal Distribution P


THE COOPERATIVE UNIVERSITY COLLEGE

OF KENYA

DISTANCE AND eLEARNING


Nairobi, Kenya
E-mail: [email protected]

BUSINESS STATISTICS II

LAST REVISION ON September 21, 2015


©CUCK-DCeL

STATISTICS

This presentation is intended to be covered within one week. The notes, examples
and exercises should be supplemented with a good textbook. Most of
the exercises have solutions/answers appearing elsewhere and accessible by
clicking the green EXERCISE tag. To move back to the same page, click
the same tag appearing at the end of the solution/answer.

Errors and omissions in these notes are entirely the responsibility of the
author who should only be contacted through [email protected]. In such
a case, kindly ensure that you specify the module, the lesson number and
the page.

CUCK, The Ultimate leader in Cooperative Education
Contents

1 Normal Distribution
1.1 The Normal Distribution
1.1.1 Description
1.2 The Standard normal Distribution
1.3 Normal tables
1.4 Sampling distributions
Solutions to Exercises

LESSON 1
Normal Distribution

Learning outcomes
Upon completion of this lesson you should be able to:
1. Define the normal probability distribution and describe its properties

2. Compute standardized scores

3. Read normal tables

4. Apply the concept of normality to estimate probabilities of certain outcomes

1.1. The Normal Distribution


The Normal Probability Distribution is one of the most useful and important
continuous distributions in statistics. It is used frequently for the following
reasons, among others:

• The Normal distribution has many convenient mathematical properties.

• Many natural phenomena have distributions which when studied have been
shown to be close to that of the Normal Distribution.

• The Central Limit Theorem shows that the Normal Distribution is a suitable
model for the mean of a large sample, regardless of the distribution of the
population from which the sample is drawn.

1.1.1. Description
The Normal distribution describes a continuous variable that takes on values on the
real number line. The formula for the Normal has two parameters: the mean, µ,
and the variance, σ². The parameter µ is a “location” parameter and σ² is a “scale”
parameter. The distribution is symmetric about the mean, as shown in the following figure.


Consider the following plot from a certain study of the heights of American men.

It is clear that the very tall are as few as the very short. Most heights cluster
around the mean of 174 cm. The heights range from about 150 cm, which is roughly
174 − 3(6.7) cm, to about 195 cm, which is roughly 174 + 3(6.7) cm. This is in line
with Tchebysheff’s theorem, which guarantees that at least 1 − 1/3² ≈ 89% of the
values lie within three standard deviations of the mean.

Functional form
A continuous random variable, X, is normally distributed with a probability density
function given by:

\[ f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left( -\frac{(x-\mu)^2}{2\sigma^2} \right) \]

where µ and σ are the mean and the standard deviation respectively. It can also be
written as

\[ f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2} \]
The expected value of a distribution is defined as the “probability weighted sum”
of outcomes. For X ∼ N(µ, σ²),

\[ E(X) = \int_{-\infty}^{+\infty} f(x)\cdot x \,dx = \mu \]

and the variance of a distribution is the “probability weighted sum” of the squared
differences between outcomes and their expected value:

\[ Var(X) = \int_{-\infty}^{+\infty} f(x)\cdot [x - E(X)]^2 \,dx \]

which can be rearranged as

\[ Var(X) = \int_{-\infty}^{+\infty} f(x)\cdot x^2 \,dx - [E(X)]^2 = E(X^2) - [E(X)]^2 = \sigma^2 \]

It is now clear that the parameters µ and σ² are simply equal to the expected value
and variance (respectively).
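
The two integrals above can be checked numerically. The following is a minimal sketch, not part of the original notes: it approximates both integrals with a midpoint Riemann sum for the N(3, 25) distribution used in the illustrations. The function name `moments_by_integration`, the integration range of ±10 standard deviations and the number of steps are all assumptions chosen for illustration.

```python
from statistics import NormalDist

def moments_by_integration(mu, sigma, half_width=10.0, steps=20_000):
    """Approximate E(X) and Var(X) with a midpoint Riemann sum
    over the interval mu ± half_width * sigma (an assumed cut-off)."""
    dist = NormalDist(mu, sigma)
    lower = mu - half_width * sigma
    dx = 2 * half_width * sigma / steps
    first = 0.0   # accumulates the integral of f(x) * x
    second = 0.0  # accumulates the integral of f(x) * x^2
    for i in range(steps):
        x = lower + (i + 0.5) * dx      # midpoint of the i-th strip
        weight = dist.pdf(x) * dx       # f(x) dx
        first += x * weight
        second += x * x * weight
    # Var(X) = E(X^2) - [E(X)]^2
    return first, second - first ** 2

mean, variance = moments_by_integration(mu=3.0, sigma=5.0)
print(f"E(X) is about {mean:.4f}, Var(X) is about {variance:.4f}")
```

The printed values should agree with µ = 3 and σ² = 25 to several decimal places, which is what the rearranged variance formula predicts.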

Definition 1. Tchebysheff’s theorem: For any set of observations, x1, x2, . . . , xn,
at least 1 − 1/k² of the values will lie within k standard deviations of the mean,
where k ≥ 1.

Empirical Rule: For any symmetrical, bell-shaped distribution, approximately
68% of the observations will lie within ±1σ of the mean (µ); approximately 95% of
the observations will lie within ±2σ of the mean and approximately 99.7% within
±3σ of the mean.
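
To see how the two statements compare, the sketch below (an illustration added here, not taken from the notes) tabulates Tchebysheff’s lower bound next to the exact proportion for a normal distribution. The helper names `chebyshev_lower_bound` and `normal_within` are hypothetical; `NormalDist` comes from Python's standard `statistics` module.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

def chebyshev_lower_bound(k):
    """Tchebysheff's guarantee: at least 1 - 1/k^2 within k standard deviations."""
    return 1 - 1 / k ** 2

def normal_within(k):
    """Exact proportion of a normal distribution within k standard deviations."""
    return Z.cdf(k) - Z.cdf(-k)

for k in (1, 2, 3):
    print(f"k={k}: Tchebysheff >= {chebyshev_lower_bound(k):.4f}, "
          f"normal = {normal_within(k):.4f}")
```

Tchebysheff’s bound holds for any set of observations, so it is much weaker than the Empirical Rule, which relies on the bell shape.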

Illustrations
The probability density function of a Normal distribution with µ = 3 and σ = 5 is
shown in Figure 1.1.

• How would one describe that?

• How does this distribution change in appearance if µ and σ² are changed?
Figure 1.2 illustrates what happens.

Figure 1.1: Normal Distribution (density of N(3, 25); horizontal axis: possible values of x; vertical axis: probability of x)



While Figure 1.2 may look repetitious, it does convey one very important attribute
of the Normal distribution: it always keeps the same shape. Whatever the
parameter values, it is unimodal and symmetric. The graphs look the same because
the X-axis is allowed to re-scale itself to use up the allocated space.
If we restrict the display so that the axes of all of the figures are kept the same,
in a position that suits the largest set of values, then the impact of changing the
parameters is a bit more apparent.

The result is to be seen in Figure 1.3.

1.2. The Standard normal Distribution


A normal distribution with a mean of 0 and a standard deviation of 1 is called the
standard normal distribution. Every normally distributed variable can be transformed
into a standard normal variable by computing the Z score. The Z value is the
distance between a selected value, designated x, and the population mean µ, divided
by the population standard deviation, σ:

\[ Z = \frac{X - \mu}{\sigma} \]
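
As a small illustration of the transformation, the sketch below standardizes a few heights using the mean of 174 cm and standard deviation of 6.7 cm quoted in the heights example of Section 1.1.1. The function name `z_score` and the three chosen heights are assumptions made for this example only.

```python
def z_score(x, mu, sigma):
    """Return the number of standard deviations that x lies from the mean mu."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma

mu, sigma = 174.0, 6.7  # values quoted in the heights example

for height in (160.6, 174.0, 187.4):
    print(f"height {height} cm -> z = {z_score(height, mu, sigma):+.2f}")
```

A negative Z value means the observation is below the mean and a positive one means it is above, so the heights 160.6 cm and 187.4 cm sit the same distance from the mean on opposite sides.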

Figure 1.2: Variety of Normals (panels: N(3,4), N(6,4), N(3,16), N(6,16), N(3,36), N(6,36); horizontal axis: possible values of x; vertical axis: probability of x)

Figure 1.3: Normals with varying variance (panels: N(3,4), N(6,4), N(9,4), N(3,16), N(6,16), N(9,16), all drawn on the same axes; horizontal axis: possible values of x; vertical axis: probability of x)

Figure 1.4: Standard Normal curve (horizontal axis: Z values from −4 to 4; vertical axis: probability)

The transformed values will always give the curve in Figure 1.4. Notice that the
central value of Z is zero (0) and the curve is still symmetric.
We determine probabilities based upon distance from the mean (i.e., the number
of standard deviations).
NOTE:

• The probability is the proportion of area under the standard normal curve.

• The probabilities have been computed and published under the name Normal
probability tables. The values quoted in these notes are cumulative areas to the
left of z, that is, P(Z < z). Some published tables instead give the area
between the mean and z; with such a table, add 0.5 to the tabulated value for
a positive z.

• Because of symmetry,

P(Z > 0) = P(Z < 0) = 0.5

• Tables show probabilities rounded to 4 decimal places, e.g.

If Z < 1.96 then the probability is 0.9750, and we write P(Z < 1.96) = 0.9750

If Z > −1.96 then, by symmetry, the probability is also 0.9750, and we write P(Z > −1.96) = 0.9750

Figure 1.5: Standard Normal with shaded area (horizontal axis: Z values; vertical axis: probability)

Figure 1.6: Standard Normal with shaded area (horizontal axis: Z values; vertical axis: probability)

1.3. Normal tables


From the normal table (provided separately):

1. P(Z < 1) = 0.8413

2. P(Z < 2.97) = 0.9985 ⇒ P(Z > 2.97) = 0.0015

3. P(Z < 0) = 0.5000

4. P(Z < −1) = P(Z > 1) = 1 − 0.8413 = 0.1587

5. P(Z > 2) = 1 − P(Z < 2) = 1 − 0.9772 = 0.0228

P(Z > 1.42) = 0.0778

P(Z > −2.54) =
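
Table readings such as these can be cross-checked in software. The sketch below is an illustration added here, assuming Python 3.8 or later, where the standard library provides `statistics.NormalDist`; the helper names `phi` and `upper_tail` are hypothetical.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

def phi(z):
    """Cumulative probability P(Z < z), the quantity read from the table."""
    return Z.cdf(z)

def upper_tail(z):
    """P(Z > z), obtained as 1 - P(Z < z)."""
    return 1 - Z.cdf(z)

print(f"P(Z < 1)    = {phi(1):.4f}")
print(f"P(Z < 2.97) = {phi(2.97):.4f}")
print(f"P(Z < -1)   = {phi(-1):.4f}")
print(f"P(Z > 2)    = {upper_tail(2):.4f}")
print(f"P(Z > 1.42) = {upper_tail(1.42):.4f}")
```

The unfinished reading above can be checked the same way once it has been attempted by hand with the table.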

Example. The daily water usage per person in Thika is normally distributed with
a mean of 20 gallons and a standard deviation of 5 gallons. What is the probability
that a person from Thika selected at random will use:

(a) less than 20 gallons per day?
(b) less than 25 gallons per day?
(c) more than 30 gallons per day?
(d) What percent of the population uses between 25 and 30 gallons?
Solution: We cannot read the probabilities directly. We must standardize our values
as follows:

(a) \[ P(X < 20) = P\left( \frac{X-\mu}{\sigma} < \frac{20-20}{5} \right) = P(Z < 0) = 0.5 \]

(b) \[ P(X < 25) = P\left( \frac{X-\mu}{\sigma} < \frac{25-20}{5} \right) = P(Z < 1) = 0.8413 \]

(c) \[ P(X > 30) = P\left( \frac{X-\mu}{\sigma} > \frac{30-20}{5} \right) = P(Z > 2) = 0.0228 \]

(d) \[ P(25 < X < 30) = P\left( \frac{25-20}{5} < \frac{X-\mu}{\sigma} < \frac{30-20}{5} \right) = P(1 < Z < 2) = \;? \]

If we denote P(Z < k) as Φ(k), where k is positive, then the following rules could
be useful to you:

P(Z < k) = Φ(k)
P(Z > k) = 1 − Φ(k)
P(Z > −k) = Φ(k)
P(Z < −k) = 1 − Φ(k)
P(k1 < Z < k2) = Φ(k2) − Φ(k1)

For example, P(Z > 2) = 1 − Φ(2) = 1 − 0.9772 = 0.0228.

Therefore the area between Z = 1 and Z = 2 is given by P(1 < Z < 2) =
Φ(2) − Φ(1) = 0.9772 − 0.8413 = 0.1359 ⇒ 13.59%.
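
The Thika example can also be worked without standardizing by hand. The following is a rough sketch under the example's stated assumptions (mean 20 gallons, standard deviation 5 gallons); the variable names are illustrative only, and `NormalDist` is taken from Python's standard `statistics` module.

```python
from statistics import NormalDist

usage = NormalDist(mu=20, sigma=5)  # daily water usage in gallons

p_a = usage.cdf(20)                    # (a) P(X < 20)
p_b = usage.cdf(25)                    # (b) P(X < 25)
p_c = 1 - usage.cdf(30)                # (c) P(X > 30)
p_d = usage.cdf(30) - usage.cdf(25)    # (d) P(25 < X < 30)

for label, p in (("a", p_a), ("b", p_b), ("c", p_c), ("d", p_d)):
    print(f"({label}) {p:.4f}")
```

Working directly with the unstandardized distribution gives the same probabilities as converting to Z first, because the cumulative probability depends only on how many standard deviations a value lies from the mean.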
For more interactive examples on how to read the normal table, go online to:
http://www.mathsisfun.com/data/standard-normal-distribution-table.html
NOTE: If X is normally distributed with mean µ and variance σ², we write this
statement as X ∼ N(µ, σ²).
Quiz

1. Given that X ∼ N (34, 4),

(a) P (X < 40) =

(b) P (X > 30) =

(c) P (32 < X < 33) =

(d) P (30 < X < 35) =

Figure 1.7: Sampling Process

2. An airline has a regular flight from one airport to another. The airline models
the duration of a flight as a normally distributed random variable with a mean
of 185 minutes and a variance of 36 square minutes. Use this model to calculate,
to one decimal place, the percentage of these flights that are completed in less
than 3 hours. Answer = %

Exercise 1. Given that X ∼ N(45, 12), find:

1. P (X < 44)

2. P (X > 42)

3. P (40 < X < 46)

1.4. Sampling distributions

Exercise 2. Discuss the various types of sampling methods and classify them
into probability and non-probability sampling methods.
A sampling distribution is created by sampling.

• It is the probability distribution of a given statistic based on a random sample
of size n.

• It may be considered as the distribution of the statistic for all possible samples
of a given size.

• That is,

– we draw samples of size n from a given population;

– we compute a statistic (e.g., a mean, proportion, standard deviation) for
each sample;

– the probability distribution of this statistic is called a sampling distribution.

• It depends on the underlying distribution of the population, the statistic being
considered, and the sample size used.

Figure 1.8: Sampling Distribution

The method employs the rules of probability and the laws of expected value and
variance to derive the sampling distribution. For example, consider the roll of one
and two dice.
Example.
A fair die is thrown infinitely many times, with the random variable X = number of
spots on any throw.
The probability distribution of X is

x           1     2     3     4     5     6
P(X = x)    1/6   1/6   1/6   1/6   1/6   1/6

\[ \mu = E(X) = \sum x\,P(x) = 1\times\frac{1}{6} + 2\times\frac{1}{6} + 3\times\frac{1}{6} + 4\times\frac{1}{6} + 5\times\frac{1}{6} + 6\times\frac{1}{6} = 3.5 \]

\[ \sigma^2 = E(X^2) - (E(X))^2 = 1\times\frac{1}{6} + 4\times\frac{1}{6} + 9\times\frac{1}{6} + 16\times\frac{1}{6} + 25\times\frac{1}{6} + 36\times\frac{1}{6} - 3.5^2 = 2.92 \]

\[ \sigma = \sqrt{\sigma^2} = 1.71 \]
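
The arithmetic for a single die can be reproduced exactly with fractions. This is a minimal sketch added for illustration; the variable names are assumptions, and only Python's standard `fractions` and `math` modules are used.

```python
from fractions import Fraction
from math import sqrt

faces = range(1, 7)
p = Fraction(1, 6)  # each face of a fair die is equally likely

mean = sum(x * p for x in faces)                   # E(X)
second_moment = sum(x * x * p for x in faces)      # E(X^2)
variance = second_moment - mean ** 2               # E(X^2) - [E(X)]^2
std_dev = sqrt(variance)

print(f"mean = {mean} = {float(mean)}")
print(f"variance = {variance} (about {float(variance):.2f})")
print(f"standard deviation is about {std_dev:.3f}")
```

Using exact fractions avoids rounding in the intermediate steps, so the variance comes out as 35/12 before it is rounded for display.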
A sampling distribution is created by looking at all samples of size n = 2 (i.e. two
dice) and their means.
Each entry in the table below is the mean of the values observed on the two dice.

die1/die2 1 2 3 4 5 6
1 1 1.5 2 2.5 3 3.5
2 1.5 2 2.5 3 3.5 4
3 2 2.5 3 3.5 4 4.5
4 2.5 3 3.5 4 4.5 5
5 3 3.5 4 4.5 5 5.5
6 3.5 4 4.5 5 5.5 6

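Notice that while there are 36 possible samples of size 2, there are only 11 unique
values of the mean, and some, such as 3.5, occur more frequently than others. The
sampling distribution of x̄ is shown below:

x̄            1     1.5   2     2.5   3     3.5   4     4.5   5     5.5   6
P(X̄ = x̄)    1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36

\[ \mu_{\bar{x}} = E(\bar{X}) = \sum \bar{x}\,P(\bar{x}) = 1\times\frac{1}{36} + 1.5\times\frac{2}{36} + \dots + 6\times\frac{1}{36} = 3.5 \]

\[ \sigma_{\bar{x}}^2 = E(\bar{X}^2) - (E(\bar{X}))^2 = 1\times\frac{1}{36} + 2.25\times\frac{2}{36} + \dots + 36\times\frac{1}{36} - 3.5^2 = 1.46 \]

\[ \sigma_{\bar{x}} = \sqrt{\sigma_{\bar{x}}^2} = 1.21 \]

It is easy to see that
\[ \mu_x = \mu_{\bar{x}} = E(X) = 3.5 \]
but
\[ \sigma_x^2 = 2.92 \neq \sigma_{\bar{x}}^2 = 1.46 \]
A keen look reveals that
\[ \sigma_{\bar{x}}^2 = \sigma_x^2 / 2 = 1.46 \]
In other words, for general n,

\[ \mu_{\bar{x}} = E(X), \qquad \sigma_{\bar{x}}^2 = \frac{\sigma_x^2}{n} \]

The standard deviation of the sampling distribution is called the standard error:

\[ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \]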
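
The two-dice sampling distribution can be built by brute-force enumeration, which makes the relationship between the two variances easy to verify. The sketch below is an added illustration, not the author's method; all names in it are assumptions, and it relies only on the Python standard library.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

faces = range(1, 7)

# Population (single die) mean and variance.
pop_mean = sum(Fraction(x, 6) for x in faces)
pop_var = sum(Fraction(x * x, 6) for x in faces) - pop_mean ** 2

# Enumerate all 36 equally likely samples of size n = 2 and record each sample mean.
n = 2
counts = Counter(Fraction(a + b, n) for a, b in product(faces, repeat=n))
sampling_dist = {m: Fraction(c, 36) for m, c in sorted(counts.items())}

mean_xbar = sum(m * p for m, p in sampling_dist.items())
var_xbar = sum(m * m * p for m, p in sampling_dist.items()) - mean_xbar ** 2

print("number of distinct sample means:", len(sampling_dist))
print("mean of sample means:", mean_xbar)
print("variance of sample means:", var_xbar, "vs population variance / n:", pop_var / n)
```

The same enumeration idea extends to three or more dice by changing `n` and replacing the denominator 36 with `6 ** n`, although the number of samples grows quickly.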


Definition:
The sampling distribution of the mean of a random sample
drawn from any population (with finite mean and variance) is
approximately normal for a sufficiently large sample size.

The larger the sample size, the more closely the sampling distribution of X̄ will
resemble a normal distribution.

Definition: Central Limit Theorem
The sampling distribution of the sample mean approaches a
normal distribution as the sample size increases.

In many practical situations, a sample size of 30 may be sufficiently large to allow
us to use the normal distribution as an approximation for the sampling distribution
of the mean.
Note: If X is normal, the sample mean is normal, so the Central Limit Theorem is
not needed in this case. Consider the questions below:

1. Census data show that the heights of adult males in a city are normally
distributed with a mean of 69.2 inches and a standard deviation of 4.3 inches.
A sample of 35 is selected. What is the probability that one of these subjects
has a height greater than 70 inches? This is about the data distribution.

2. Census data show that the heights of adult males in a city are normally
distributed with a mean of 69.2 inches and a standard deviation of 4.3 inches.
A sample of 35 is selected. What is the probability that the mean height of these
subjects is greater than 70 inches? This is about the sampling distribution.
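
The difference between the two questions lies in which standard deviation is used: σ for a single subject, and the standard error σ/√n for the sample mean. The following sketch is one possible way to work both, added here for illustration; the variable names are assumptions and `NormalDist` comes from Python's standard `statistics` module.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 69.2, 4.3, 35

# Question 1: one subject, so use the data distribution N(mu, sigma^2).
single = NormalDist(mu, sigma)
p_single = 1 - single.cdf(70)

# Question 2: the sample mean, so use the sampling distribution N(mu, sigma^2 / n).
standard_error = sigma / sqrt(n)
sample_mean = NormalDist(mu, standard_error)
p_mean = 1 - sample_mean.cdf(70)

print(f"P(X > 70)     is about {p_single:.4f}")
print(f"standard error = {standard_error:.4f}")
print(f"P(X-bar > 70) is about {p_mean:.4f}")
```

Because the standard error is much smaller than σ, a sample mean above 70 inches is considerably less likely than a single height above 70 inches.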

Examples
The Dean of the School of Business Studies claims that the average salary of the
school’s graduates one year after graduation is $800 per month with a standard
deviation of $100. A second-year student would like to check whether the claim about
the mean is correct. He does a survey of 25 people who graduated one year ago and
determines their monthly salary. He discovers the sample mean to be $750. Is this
consistent with the Dean’s claim?
Note: The mean of $800 and the standard deviation of $100 are the claimed population values.
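
One way to approach this question, sketched here as an illustration rather than as the notes' own solution, is to ask how likely a sample mean of $750 or less would be if the Dean's claim were true. The sketch assumes the sampling distribution of the mean can be treated as normal with standard error σ/√n; the variable names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

claimed_mean, sigma, n = 800.0, 100.0, 25
observed_mean = 750.0

standard_error = sigma / sqrt(n)                   # sigma / sqrt(n)
z = (observed_mean - claimed_mean) / standard_error
p_at_most = NormalDist().cdf(z)                    # P(X-bar <= 750) if the claim holds

print(f"standard error = {standard_error:.1f}")
print(f"z = {z:.2f}")
print(f"P(X-bar <= 750) is about {p_at_most:.4f}")
```

A very small probability here would suggest that a sample mean as low as $750 is hard to reconcile with a population mean of $800, under the stated assumptions.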


Quiz

1. A certain variable X is such that X ∼ N(71, 7.4). If a sample of size 100 is taken
from this distribution:

(a) What is the probability that a value from this sample is less than 70.01?
Answer =

(b) Find the probability that the value lies between 70.01 and 72.2.
Answer =

(c) What is the probability that the mean of this sample exceeds 73.67? Answer =

Suggested materials for further reading


1. Wonnacott, T. H. and Wonnacott, R. J. (1990). Introductory Statistics for Business
and Economics. 2nd Edition. John Wiley and Sons Inc.

2. Mason, R. D., Lind, D. A. and Marchal, W. G. (1999). Statistical Techniques
in Business and Economics. Irwin McGraw-Hill, Boston.

3. Pelosi, M. K. and Sandifer, T. M. (1976). Elementary Statistics. John
Wiley & Sons, Inc.

4. Gujarati, D. N. (2006). Basic Econometrics. 3rd Edition. McGraw-Hill, Inc.,
New York.

5. Keller, G., Warrack, B. and Bartel, H. (1994). Statistics for Management and
Economics. 3rd Edition. Wadsworth Publishing Company, Belmont, California,
USA.


Solutions to Exercises
Exercise 1.

1. P (X < 44) = 0.3863

2. P (X > 42) = 0.807

3. P (40 < X < 46) = 0.5395

