EDA Notebook 4 Normal Distributions


NOTRE DAME OF MARBEL UNIVERSITY
COLLEGE OF ENGINEERING AND TECHNOLOGY
Alunan Avenue, City of Koronadal

ENGINEERING DATA ANALYSIS
Notebook 4
First Semester, SY. 2022-2023

ENGR. JENEL E. TONOGBANUA, ECE
Instructor

Engineering Data Analysis (ENGMATH 114)

Introduction
Suppose you are taking a Statistics and Probability course along with your classmates in this school. At the end of the semester, all 100 of your classmates complete a final exam consisting of 100 multiple-choice questions. The scores that you and your classmates received are as shown.

You can tell from looking at the data that the highest score a student received was 100% and the lowest score was 60%. What you might not have been able to tell just by glancing at the table is that the data is normally distributed.

The normal, a continuous distribution, is the most important of all the distributions. It is widely used and even more widely abused. Its graph is bell-shaped. You see the bell curve in almost all disciplines. Some of these include psychology, business, economics, the sciences, nursing, and, of course, mathematics. Some of your instructors may use the normal distribution to help determine your grade.

Most IQ scores are normally distributed. Often real-estate prices fit a normal distribution. The normal distribution is extremely important, but it cannot be applied to everything in the real world.

The Normal Distribution


A normal distribution is a bell-shaped frequency distribution curve. Most of the data values in a normal distribution tend to cluster around the mean. The further a data point is from the mean, the less likely it is to occur. There are many things, such as intelligence, height, and blood pressure, that naturally follow a normal distribution.

For example, if you took the height of one hundred 22-year-old women and created a histogram by plotting height on the x-axis and the frequency at which each of the heights occurred on the y-axis, you would get an approximately normal distribution.

The Normal Distribution


The normal distribution has two parameters (two numerical descriptive measures): the mean (μ) and the standard deviation (σ). If X is a quantity to be measured that has a normal distribution with mean μ and standard deviation σ, we designate this by writing

X ~ N(μ, σ)

It can be read as "The random variable X is normally distributed with mean μ and standard deviation σ."

The probability density function is a rather complicated function. Do not memorize it. It is not necessary.

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}$$
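The density function can be evaluated directly, which may help in seeing how μ and σ enter it. The following is a minimal sketch in Python; the function name normal_pdf is an assumption made for illustration and is not part of the notebook.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma) at x, where sigma is the standard deviation."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma  # distance from the mean in standard deviations
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

# Height of the standard normal curve at its center
print(round(normal_pdf(0.0, 0.0, 1.0), 4))  # 0.3989
```

Evaluating the function at two points equally far from μ, such as 3 and 7 when μ = 5, returns the same height, which reflects the symmetry of the bell curve.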

The curve is symmetric about a vertical line drawn through the mean, μ. In theory, the mean is the same as the median, because the graph is symmetric about μ. As the notation indicates, the normal distribution depends only on the mean and the standard deviation. Since the area under the curve must equal one, a change in the standard deviation, σ, causes a change in the shape of the curve; the curve becomes fatter or skinnier depending on σ.

A change in μ causes the graph to shift to the left or right. This means there are an infinite number of normal probability distributions. One of special interest is called the standard normal distribution.
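The trade-off between height and width can be checked numerically. The sketch below uses NormalDist from Python's standard statistics module; the helper area_under_curve, its trapezoidal integration, and the three σ values are assumptions chosen for illustration.

```python
from statistics import NormalDist

def area_under_curve(dist, n_sigma=8.0, steps=4000):
    """Trapezoidal estimate of the area under dist.pdf over mean +/- n_sigma standard deviations."""
    lo = dist.mean - n_sigma * dist.stdev
    hi = dist.mean + n_sigma * dist.stdev
    h = (hi - lo) / steps
    total = 0.5 * (dist.pdf(lo) + dist.pdf(hi))
    for i in range(1, steps):
        total += dist.pdf(lo + i * h)
    return total * h

# Same mean, different standard deviations: print sigma, peak height, estimated area
for sigma in (0.5, 1.0, 2.0):
    d = NormalDist(mu=0.0, sigma=sigma)
    print(sigma, round(d.pdf(0.0), 4), round(area_under_curve(d), 6))
```

As σ grows, the peak height falls while the estimated area stays very close to one, which is why a larger σ gives a fatter, lower curve.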
The Standard Normal Curve

A Standard Normal Curve is a normal probability distribution that has a mean μ = 0 and a standard deviation σ = 1.

The standard normal distribution is a normal distribution of standardized values called z-scores. A z-score is measured in units of the standard deviation. For example, if the mean of a normal distribution is five and the standard deviation is two, the value 11 is three standard deviations above (or to the right of) the mean. The calculation is as follows:

x = μ + zσ = 5 + (3)(2) = 11

The z-score is three.

Characteristics of a Standard Normal Curve
1. The distribution is bell-shaped.
2. The curve is symmetrical about its center.
3. The mean, the median and the mode coincide at the center.
4. The width of the curve is determined by the standard deviation of the distribution.

Characteristics of Standard Normal Curve (continued)

5. The tails of the curve flatten out indefinitely along the horizontal axis, always approaching the axis but never touching it. The curve is asymptotic to the base line. The area under the curve is 1. Thus it represents the probability, the proportion, or the percentage associated with specific sets of measurement values.

The z-scores (Standard Scores)

If X is a normally distributed random variable and X ~ N(μ, σ), then the z-score of a value x is

$$z = \frac{x - \mu}{\sigma}$$

The z-score tells you how many standard deviations the value x is above (to the right of) or below (to the left of) the mean, μ.

Values of x that are larger than the mean have positive z-scores, and values of x that are smaller than the mean have negative z-scores. If x equals the mean, then x has a z-score of zero.

Computing z-scores

The formula for computing z-scores is given by:

$$z = \frac{x - \mu}{\sigma}$$

where
x = the raw score or normal value
μ = the mean
σ = the standard deviation

If the z-score is given and the raw score is to be found instead, the formula can be rearranged so that

$$x = \mu + z\sigma$$
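Both directions of the conversion are one-line computations. A minimal sketch in Python follows; the function names z_score and raw_score are assumptions made for illustration, and the numbers come from the mean-five, standard-deviation-two example used earlier in this notebook.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma

def raw_score(z, mu, sigma):
    """Raw value x that corresponds to a given z-score."""
    return mu + z * sigma

print(z_score(11, 5, 2))   # 3.0
print(raw_score(3, 5, 2))  # 11
```

Applying raw_score to the output of z_score returns the original value, which is a quick way to check a hand computation.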
Example 1: Computing z-scores
Find the z-score for each of the following:
A. X ~ N(5, 6), x = 17.
B. X ~ N(5, 6), x = 1.

Solution:
A. This says that X is a normally distributed random variable with mean μ = 5 and standard deviation σ = 6. Suppose x = 17. Then:

$$z = \frac{x - \mu}{\sigma} = \frac{17 - 5}{6} = \frac{12}{6} = 2$$

This means that x = 17 is two standard deviations (2σ) above or to the right of the mean μ = 5.

B. X ~ N(5, 6), x = 1.

$$z = \frac{x - \mu}{\sigma} = \frac{1 - 5}{6} = \frac{-4}{6} \approx -0.67$$

This means that x = 1 is 0.67 standard deviations (–0.67σ) below or to the left of the mean μ = 5.

Summarizing, when z is positive, x is above or to the right of μ, and when z is negative, x is to the left of or below μ. Or, when z is positive, x is greater than μ, and when z is negative, x is less than μ.

Example 2: Computing z-scores

What is the z-score of x, when x = 1 and X ~ N(12, 3)?

Solution:

$$z = \frac{x - \mu}{\sigma} = \frac{1 - 12}{3} = -\frac{11}{3} \approx -3.67$$

Example 3: Computing z-scores in a verbal problem

Some doctors believe that a person can lose five pounds, on the average, in a month by reducing his or her fat intake and by exercising consistently. Suppose weight loss has a normal distribution. Let X = the amount of weight lost (in pounds) by a person in a month. Use a standard deviation of two pounds. X ~ N(5, 2). Fill in the blanks.

a. Suppose a person lost ten pounds in a month. The z-score when x = 10 pounds is z = ________. This z-score tells you that x = 10 is ________ standard deviations to the ________ (right or left) of the mean _____ (What is the mean?).

b. Suppose a person gained 3 pounds (a negative weight loss). Then z = ________. This z-score tells you that x = –3 is ________ standard deviations to the __________ (right or left) of the mean.

Solution:
a. z = 2.5. This z-score tells you that x = 10 is 2.5 standard deviations to the right of the mean, five.
b. z = –4. This z-score tells you that x = –3 is four standard deviations to the left of the mean.

The Empirical Rule


If X is a random variable and has a normal distribution with mean μ and standard deviation σ, then the Empirical Rule states the following:
• About 68% of the x values lie between –1σ and +1σ of the mean μ (within one standard deviation of the mean).
• About 95% of the x values lie between –2σ and +2σ of the mean μ (within two standard deviations of the mean).
• About 99.7% of the x values lie between –3σ and +3σ of the mean μ (within three standard deviations of the mean).

The Empirical Rule


Notice that almost all the x values lie within three standard deviations of the mean.
• The z-scores for +1σ and –1σ are +1 and –1, respectively.
• The z-scores for +2σ and –2σ are +2 and –2, respectively.
• The z-scores for +3σ and –3σ are +3 and –3, respectively.
The empirical rule is also known as the 68-95-99.7 rule.
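The three percentages in the rule are rounded values, and they can be recovered from the standard normal distribution. The sketch below assumes the standard identity P(–k < Z < k) = erf(k/√2), which is not derived in this notebook; the function name prob_within is chosen for illustration.

```python
import math

def prob_within(k):
    """P(-k < Z < k) for a standard normal Z, computed with the error function."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(prob_within(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```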
Example 4: Empirical Rule

Suppose x has a normal distribution with mean 50 and standard deviation 6.

About 68% of the x values lie within one standard deviation of the mean. Therefore, about 68% of the x values lie between –1σ = (–1)(6) = –6 and 1σ = (1)(6) = 6 of the mean 50. The values 50 – 6 = 44 and 50 + 6 = 56 are within one standard deviation from the mean 50. The z-scores are –1 and +1 for 44 and 56, respectively.

About 95% of the x values lie within two standard deviations of the mean. Therefore, about 95% of the x values lie between –2σ = (–2)(6) = –12 and 2σ = (2)(6) = 12 of the mean 50. The values 50 – 12 = 38 and 50 + 12 = 62 are within two standard deviations from the mean 50. The z-scores are –2 and +2 for 38 and 62, respectively.

About 99.7% of the x values lie within three standard deviations of the mean. Therefore, about 99.7% of the x values lie between –3σ = (–3)(6) = –18 and 3σ = (3)(6) = 18 of the mean 50. The values 50 – 18 = 32 and 50 + 18 = 68 are within three standard deviations of the mean 50. The z-scores are –3 and +3 for 32 and 68, respectively.
Example 5: Empirical Rule
From 1984 to 1985, the mean height of 15 to 18-year-old males from Chile was 172.36 cm, and the standard deviation was 6.34 cm. Let Y = the height of 15 to 18-year-old males in 1984 to 1985. Then Y ~ N(172.36, 6.34).

a. About 68% of the y values lie between what two values? These values are ________________. The z-scores are ________________, respectively.
b. About 95% of the y values lie between what two values? These values are ________________. The z-scores are ________________, respectively.
c. About 99.7% of the y values lie between what two values? These values are ________________. The z-scores are ________________, respectively.

Solution:
a. About 68% of the values lie between 166.02 cm and 178.7 cm. The z-scores are –1 and 1.
b. About 95% of the values lie between 159.68 cm and 185.04 cm. The z-scores are –2 and 2.
c. About 99.7% of the values lie between 153.34 cm and 191.38 cm. The z-scores are –3 and 3.
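The interval endpoints in Example 5 are simply μ ± kσ for k = 1, 2, 3. A short sketch that reproduces them is given here; the function name empirical_intervals is an assumption made for illustration.

```python
def empirical_intervals(mu, sigma):
    """Map k = 1, 2, 3 to the interval (mu - k*sigma, mu + k*sigma)."""
    return {k: (mu - k * sigma, mu + k * sigma) for k in (1, 2, 3)}

# Heights from Example 5: mean 172.36 cm, standard deviation 6.34 cm
for k, (lo, hi) in empirical_intervals(172.36, 6.34).items():
    print(f"within {k} sd: {lo:.2f} cm to {hi:.2f} cm")
```

The same function applied to Example 4 (mean 50, standard deviation 6) gives 44 to 56, 38 to 62, and 32 to 68.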

Using the Normal Distribution

A. The shaded area in the following graph indicates the area to the left of x. This area is represented by the probability P(X < x).
B. The area to the right is then P(X > x) = 1 – P(X < x).

If the area to the left is 0.0228, then the area to the right is 1 – 0.0228 = 0.9772.

Remember,
P(X < x) = Area to the left of the vertical line through x.
P(X > x) = 1 – P(X < x) = Area to the right of the vertical line through x.
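Left and right areas can also be computed rather than read from a table. The sketch below uses NormalDist from Python's standard statistics module. The value z = –2 is an assumption: the notebook does not say which z-score produced the area 0.0228, but the standard normal area to the left of –2 rounds to that figure.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

def area_left(z):
    """P(Z < z): area to the left of the vertical line through z."""
    return Z.cdf(z)

def area_right(z):
    """P(Z > z) = 1 - P(Z < z): area to the right of the vertical line through z."""
    return 1.0 - Z.cdf(z)

print(round(area_left(-2.0), 4))   # 0.0228
print(round(area_right(-2.0), 4))  # 0.9772
```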
Example 6: Using the Normal Distribution

The final exam scores in a statistics class were normally distributed with a mean of 63 and a standard deviation of five.
a. Find the probability that a randomly selected student scored more than 65 on the exam.
b. Find the probability that a randomly selected student scored less than 85.
c. Find the 90th percentile (that is, find the score k that has 90% of the scores below k and 10% of the scores above k).
d. Find the 70th percentile.

Solution
a. Let X = a score on the final exam. X ~ N(63, 5), where μ = 63 and σ = 5. The corresponding z-score is

$$z = \frac{x - \mu}{\sigma} = \frac{65 - 63}{5} = \frac{2}{5} = 0.4$$

Given this z-score, use the table to find that the area to the left is 0.6554. Then, find P(x > 65).

P(x > 65) = 1 − 0.6554 = 0.3446
Example 6: Using the Normal Distribution (continued)

b. Find the probability that a randomly selected student scored less than 85.

The corresponding z-score for x = 85 is

$$z = \frac{x - \mu}{\sigma} = \frac{85 - 63}{5} = \frac{22}{5} = 4.4$$

The z-table has no entries beyond z = 3.4, but continuing past that point the area to the left approaches 1. An area of 1 represents the entire bell-shaped region, so for z = 4.4 essentially the whole region under the curve is shaded and the probability is approximately 1.
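Parts a and b of Example 6 can be cross-checked without a table. This is a sketch using NormalDist from Python's standard statistics module; the variable names are chosen for illustration.

```python
from statistics import NormalDist

scores = NormalDist(mu=63, sigma=5)  # final exam scores, X ~ N(63, 5)

p_more_than_65 = 1.0 - scores.cdf(65)  # part a: area to the right of 65
p_less_than_85 = scores.cdf(85)        # part b: area to the left of 85

print(round(p_more_than_65, 4))  # 0.3446
print(p_less_than_85 > 0.9999)   # True
```

The computed value for part b is slightly less than 1, which agrees with the statement that the area approaches 1 without reaching it.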
COLLEGE OF ENGINEERING AND TECHNOLOGY
Example 6: Using the Normal Distribution (Computing Percentile)

c. Find the 90th percentile (that is, find the score k that has 90% of the scores below k and 10% of the scores above k).

For this kind of problem, find the probability 0.90 in the table. This is between z = 1.28 and z = 1.29. We can use the average z = 1.285. Using this, we compute the corresponding raw score and assign it as k.

Example 6: Using the Normal Distribution (Computing Percentile, continued)

c. (continued) Substituting z = 1.285, μ = 63, and σ = 5:

$$z = \frac{x - \mu}{\sigma} \quad\Rightarrow\quad 1.285 = \frac{k - 63}{5}$$

$$5(1.285) = k - 63 \quad\Rightarrow\quad k = 5(1.285) + 63 = 69.425$$

That means 90% of the scores are below 69.425, and consequently only 10% are above it.

d. Find the 70th percentile.

Looking at the table, 0.70 is between z = 0.52 and z = 0.53; we will use their average z = 0.525 to compute the raw score k.

$$z = \frac{x - \mu}{\sigma} \quad\Rightarrow\quad 0.525 = \frac{k - 63}{5}$$

$$5(0.525) = k - 63 \quad\Rightarrow\quad k = 5(0.525) + 63 = 65.625$$

That means 70% of the scores are below 65.625, and consequently only 30% are above it.
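The averaged table values are approximations, so it can be useful to compare them with a direct inverse-CDF computation. The sketch below uses NormalDist.inv_cdf from Python's standard statistics module; the dictionary name results is an assumption made for illustration.

```python
from statistics import NormalDist

mu, sigma = 63, 5
scores = NormalDist(mu=mu, sigma=sigma)

results = {}
# (percentile, averaged z read from the table in the example)
for p, z_table in ((0.90, 1.285), (0.70, 0.525)):
    k_table = mu + sigma * z_table  # raw score from the table-based z
    k_exact = scores.inv_cdf(p)     # raw score from the inverse CDF
    results[p] = (k_table, k_exact)
    print(p, round(k_table, 3), round(k_exact, 3))
```

The two approaches agree to within about 0.02 of a point for both percentiles, so averaging adjacent table entries is adequate at this precision.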

Example 7: Using the Normal Distribution (Verbal Problem)

A personal computer is used for office work at home, research, communication, personal finances, education, entertainment, social networking, and a myriad of other things. Suppose that the average number of hours a household personal computer is used for entertainment is 2 hours per day. Assume the times for entertainment are normally distributed and the standard deviation for the times is half an hour (0.5 hrs).
a. Find the probability that a household personal computer is used for entertainment between 1.8 and 2.75 hours per day.
b. Find the maximum number of hours per day that the bottom quartile of households uses a personal computer for entertainment.

Solution:
a. Let X = the amount of time (in hours) a household personal computer is used for entertainment. X ~ N(2, 0.5), where μ = 2 and σ = 0.5. Solve for the z-scores corresponding to x = 1.8 and x = 2.75:

$$z = \frac{x - \mu}{\sigma} = \frac{1.8 - 2}{0.5} = \frac{-0.2}{0.5} = -0.4$$

$$z = \frac{x - \mu}{\sigma} = \frac{2.75 - 2}{0.5} = \frac{0.75}{0.5} = 1.5$$
Example 7: Using the Normal Distribution (Verbal Problem) cont'd

Next, use the z-table to find the probabilities (or areas) corresponding to z = –0.4 and z = 1.5.

Writing Z for the standard normal variable, we get P(Z < –0.4) = 0.3446 and P(Z < 1.5) = 0.9332. Then,

P(–0.4 < Z < 1.5) = P(Z < 1.5) – P(Z < –0.4)
= 0.9332 – 0.3446
= 0.5886

The probability that a household personal computer is used between 1.8 and 2.75 hours per day for entertainment is 0.5886.
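A probability between two values is a difference of two left-tail areas, and that can be computed on the original scale without converting to z-scores first. The following is a sketch using NormalDist from Python's standard statistics module, with variable names chosen for illustration.

```python
from statistics import NormalDist

hours = NormalDist(mu=2.0, sigma=0.5)  # entertainment hours per day, X ~ N(2, 0.5)

# P(1.8 < X < 2.75) = P(X < 2.75) - P(X < 1.8)
p_between = hours.cdf(2.75) - hours.cdf(1.8)
print(round(p_between, 4))  # 0.5886
```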
Example 7 Continuation

b. Find the maximum number of hours per day that the bottom quartile of households uses a personal computer for entertainment.

Solution:
To find the maximum number of hours per day that the bottom quartile of households uses a personal computer for entertainment, find the 25th percentile, k, where P(x < k) = 0.25.

Looking in the table for the area 0.25, we find that the corresponding z-score lies between z = −0.67 and z = −0.68. Their average is z = −0.675.

Using z = −0.675, we compute the raw score k:

$$z = \frac{x - \mu}{\sigma} \quad\Rightarrow\quad -0.675 = \frac{k - 2}{0.5}$$

$$(-0.675)(0.5) = k - 2 \quad\Rightarrow\quad k = 2 - 0.675(0.5) = 1.6625$$

The maximum number of hours per day that the bottom quartile of households uses a personal computer for entertainment is 1.66 hours.
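The quartile cut points can be obtained in one call as a cross-check. This sketch uses NormalDist.quantiles from Python's standard statistics module, which with n=4 returns the three cut points that divide the distribution into four equal-probability parts; the variable names are chosen for illustration.

```python
from statistics import NormalDist

hours = NormalDist(mu=2.0, sigma=0.5)  # entertainment hours per day, X ~ N(2, 0.5)

# Cut points at the 25th, 50th, and 75th percentiles
q1, q2, q3 = hours.quantiles(n=4)
print(round(q1, 2))  # 1.66
```

The middle cut point equals the mean of 2 hours, as expected for a symmetric distribution.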
