
Module 6 Data Management Probabilities and Normal Distribution


MODULE 6

Data Management: Probabilities and Normal Distribution

6.1 Introduction
The normal curve, also known as the Gaussian curve or the normal probability
curve, is the most fundamental distribution curve in statistics. In this section,
we shall discuss applications of the normal curve to the performance of
students in class or in their daily activities using standard scores, or 𝑧-scores.
6.2 Learning Outcomes
At the end of this section, you will be able to:

1. give the importance of a normal distribution;
2. differentiate between a normal distribution and a skewed distribution;
3. give the significance of the standard score, or 𝑧-score;
4. compute areas under the normal curve; and
5. solve problems involving the normal distribution.

6.3 What You Need to Know


A continuous distribution that is symmetric and bell-shaped, with the mean,
median, and mode all equal, is called a normal distribution. Its graph is called
a normal curve. The normal distribution is often referred to as the Gaussian
distribution in honor of Carl Friedrich Gauss.
The graph of a normal distribution is symmetrical and approximates the bell
shape (see Figure 1). The area under the normal curve is equal to 1 (or 100%).
Since the mean is the same as the mode, the highest point on the bell
corresponds to the mean. Since the median is the same as the mean, 50% of
the values are below the mean and 50% are above the mean.

Figure 1. A normal curve

The normal distribution is used to find probabilities by finding the area
under the curve. The area under the curve to the left of any given 𝑧-score
can be determined using Table 1.

The area under the curve over an interval is the same as the probability that a
value will fall in that interval.

What is Probability?

By a probability, we mean the likelihood of occurrence of a particular
situation, which is described by a number between 0 and 1, inclusive. We
may think of this as a percentage between 0% and 100%, inclusive.

A situation that is not very likely to occur has a probability close to 0 while
a situation that is very likely to occur has a probability close to 1.

For instance, the probability of being struck by lightning is close to
0. However, if we randomly choose a freshman student from Isabela State
University, it is very likely that the student is under 20 years old, so the
probability is close to 1.

Because any situation has a 0% to 100% chance of occurring, probabilities
are always between 0 and 1, inclusive. If a situation is sure to occur, its
probability is 1. If it cannot occur, its probability is 0.

Figure 2. The Standard Normal Model

A standard normal model is a normal distribution with a mean of 0 and a
standard deviation of 1. It has some distinct properties.

6.3.1 Properties of a normal distribution


1. The mean, mode, and median are all equal.
2. The curve is symmetric at the center (i.e. around the mean, 𝜇).
3. Exactly half of the values are to the left of center and exactly half the values
are to the right.
4. The total area under the curve is 1.

Because of these properties, the following are observed based on the empirical
rule (Chebyshev’s theorem, by contrast, gives weaker bounds that apply to any
distribution).
1. Approximately 68% of the data values will fall within 1 standard deviation
of the mean.
2. Approximately 95% of the data values will fall within 2 standard deviations
of the mean.
3. Approximately 99.7% of the data values will fall within 3 standard
deviations of the mean.
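
The three percentages above can be checked numerically. The following is a minimal sketch, assuming Python 3.8 or later, whose standard library provides statistics.NormalDist; the helper name area_within is only illustrative and is not part of the module.

```python
from statistics import NormalDist

# Standard normal model: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)

def area_within(k):
    """Area under the standard normal curve between -k and +k standard deviations."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {area_within(k):.4f}")
# within 1 standard deviation(s): 0.6827
# within 2 standard deviation(s): 0.9545
# within 3 standard deviation(s): 0.9973
```

The printed areas round to the 68%, 95%, and 99.7% quoted in the empirical rule.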

The standard deviation controls the spread of the distribution. A smaller
standard deviation indicates that the data is tightly clustered around the mean;
the normal distribution will be taller and narrower. A larger standard deviation
indicates that the data is spread out around the mean; the normal distribution
will be flatter and wider.
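
To see the effect of the standard deviation on the height of the curve, one can compare the highest point of several normal curves. This is a small sketch using Python's statistics.NormalDist; the three standard deviations are hypothetical values chosen only for illustration.

```python
from statistics import NormalDist

# Hypothetical standard deviations, all with mean 0, chosen only for illustration.
sigmas = (0.5, 1.0, 2.0)

# The curve is highest at the mean, so pdf(0) gives the peak height.
peak_heights = {sigma: NormalDist(mu=0, sigma=sigma).pdf(0) for sigma in sigmas}

for sigma, height in peak_heights.items():
    print(f"sigma = {sigma}: peak height = {height:.4f}")
```

The smaller the standard deviation, the taller the peak; since the total area must remain 1, a taller curve is also narrower.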

6.3.2 Standard Normal Model: Distribution of Data

One way of figuring out how data are distributed is to plot them in a graph.
If the data are symmetrically distributed around the mean, you may come up with
a bell curve. A bell curve has a small percentage of the points on both tails and
the bigger percentage on the inner part of the curve. In the standard normal
model, about 5 percent of your data would fall into the “tails” (colored darker
orange in Figure 2) and 95 percent will be in between. For example, for test
scores of students, the normal distribution would show 2.5 percent of students
getting very low scores and 2.5 percent getting very high scores. The rest will
be in the middle, not too high or too low. The shape of the standard normal
distribution is shown in Figure 2.

The standard normal distribution could help you figure out which subjects
you are doing well in and which subjects need more effort because of low
scores. If you get a score in one subject that is higher than your score in
another subject, you might think that you are better in the subject where you
got the higher score. This is not always true.

You can only say that you are better in a particular subject by comparing how
many standard deviations each score is above the mean. The standard
deviation tells you how tightly your data is clustered around the mean; it allows
you to compare different distributions that have different types of data,
including different means.

For example, if you get a score of 90 in Math and 95 in English, you might
think that you are better in English than in Math. However, in Math, your score
is 2 standard deviations above the mean. In English, it’s only one standard
deviation above the mean. It tells you that in Math, your score is far higher than
most of the students (your score falls into the tail).
Based on this data, you actually performed better in Math than in English!
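
The comparison above can be sketched in Python. The scores of 90 and 95 come from the example, but the text does not give the class means and standard deviations, so the values used here are hypothetical, chosen only so that the resulting 𝑧-scores match the example (2 for Math, 1 for English).

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# Scores from the example; the class means and standard deviations are
# hypothetical values chosen so the z-scores match the text.
math_z = z_score(90, mean=80, sd=5)      # 2.0
english_z = z_score(95, mean=90, sd=5)   # 1.0

# The subject with the larger z-score is the stronger performance
# relative to the rest of the class.
better_subject = "Math" if math_z > english_z else "English"
print(math_z, english_z, better_subject)
```

Although the raw English score is higher, the Math score has the larger 𝑧-score, which is why it counts as the better performance.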

The key to solving questions involving the normal curve is understanding
what the area under a standard normal curve represents. The total area under
a standard normal distribution curve is 100% (which is “1” as a decimal). For
example, the left half of the curve is 50%, or 0.5. So the probability of a random
variable appearing in the left half of the curve is 0.5.

Since not all problems are this simple, a 𝑧-table has been prepared. A 𝑧-table
lists those probabilities according to the number of standard deviations from
the mean. The mean is at the center of the standard normal distribution, so a
probability of 50% corresponds to zero standard deviations.

There are different types of 𝑧-tables. It is important to read and check the
information given before we proceed to finding probabilities. The table which we
will use gives the probabilities to the left of a given 𝑧-value. We also take note
that since the total area under the normal curve is 1, the probability values are
also the areas to the left of a given 𝑧-value.

For instance, if 𝑧 = 1.65, then we go to the row for 𝑧 = 1.6 in the table. Then
we move to the right and get the value in the column that corresponds to 0.05.
Thus, the area to the left of 𝑧 = 1.65 is 0.9505.
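
When a printed table is not at hand, the same entry can be computed. This is a minimal sketch, assuming Python 3.8 or later; area_left_of is an illustrative helper name, and it rounds to four decimal places to mimic a table entry.

```python
from statistics import NormalDist

def area_left_of(z):
    """Area under the standard normal curve to the left of z, rounded like a z-table entry."""
    return round(NormalDist().cdf(z), 4)

# Row 1.6 and column 0.05 of the table correspond to z = 1.6 + 0.05 = 1.65.
row, column = 1.6, 0.05
print(area_left_of(row + column))  # 0.9505
```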

Source: https://www.math.arizona.edu/~jwatkins/normal-table.pdf

We will give more illustrations on finding probabilities using the 𝑧-table. This
time, we follow the steps given.

1. Area below 𝑧.

Question: What is the probability at 𝑧 ≤ 1.65?

Steps, with the actual process and result:

1. In this case, we will get the area to the left of 𝑧 = 1.65 and denote this as
   𝑃(𝑧 ≤ 1.65). It will help if we draw a curve and shade the area we want to
   get. This part is important because it will give us an idea of what the final
   answer will be. Here, we know that the probability is greater than 0.5.
2. We refer to the table for the next step. As given in the previous example, we
   locate 1.6 and move to the right until we reach the value that corresponds to
   the column of 0.05.

Thus, 𝑃(𝑧 ≤ 1.65) = 0.9505.

2. Area above 𝑧.

Question: What is the area at 𝑧 ≥ 1.65?

Steps, with the actual process and result:

1. For this one, we will get the area to the right of 𝑧 = 1.65 and denote this as
   𝑃(𝑧 ≥ 1.65). This time we know that the probability we should get must be
   lower than 0.5.
2. We locate the value in the table similar to what we have done in the first
   case.
3. Since the table gives the area to the left and we need the area to the right,
   we subtract the area to the left from 1 to get the area to the right of
   𝑧 = 1.65.
   𝑃(𝑧 ≥ 1.65) = 1 − 0.9505 = 0.0495

3. Area between two 𝑧-values.

Question: What is the area at −0.78 ≤ 𝑧 ≤ 1.65?

Steps, with the actual process and result:

1. We draw the area on the normal curve and we see that the area is between
   the values 𝑧 = −0.78 and 𝑧 = 1.65. We denote the probability as
   𝑃(−0.78 ≤ 𝑧 ≤ 1.65).
2. We locate the values in the table similar to what we have done in the first
   two cases. This time we illustrate how we get the value for 𝑧 = −0.78: we
   locate −0.7 and move to the right until we reach the column of 0.08, which
   gives 0.2177.
3. To get 𝑃(−0.78 ≤ 𝑧 ≤ 1.65), we get the difference between the values we
   obtained at 𝑧 = −0.78 and 𝑧 = 1.65.
   𝑃(−0.78 ≤ 𝑧 ≤ 1.65) = 0.9505 − 0.2177 = 0.7328
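
The three cases (area below, area above, and area between two 𝑧-values) can be sketched as three small functions. This is only an illustration using Python's statistics.NormalDist in place of the printed table; the function names are made up for this sketch.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal model: mean 0, standard deviation 1

def area_below(z):
    """P(Z <= z): the area to the left of z, which is what the table lists."""
    return Z.cdf(z)

def area_above(z):
    """P(Z >= z): subtract the area to the left from the total area of 1."""
    return 1 - Z.cdf(z)

def area_between(z_low, z_high):
    """P(z_low <= Z <= z_high): difference of the two areas to the left."""
    return Z.cdf(z_high) - Z.cdf(z_low)

print(round(area_below(1.65), 4))           # 0.9505
print(round(area_above(1.65), 4))           # 0.0495
print(round(area_between(-0.78, 1.65), 4))  # 0.7328
```

The printed values agree with the three table-based results obtained in the cases above.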

Learning Activity 1

Direction. Find the following probabilities.

1. 𝑃(𝑧 ≤ −1.73) 2. 𝑃(𝑧 ≥ −0.67)

3. 𝑃(−1.73 ≤ 𝑧 ≤ −0.67)

6.3.3 Applications of the Normal Distribution
How do you know that a word problem involves normal distribution? Look
for the key phrase “assume the variable is normally distributed” or “assume the
variable is approximately normal.”
Example 1. The mean time to complete a certain psychology examination is 34
minutes with a standard deviation of 8 minutes. If the time to complete the
examination is approximately normally distributed, what is the probability
that a student will complete the examination

(a) in less than 28 minutes?
(b) in more than 40 minutes?
(c) between 28 and 40 minutes?

Solution.
(a)
Steps, with the actual process and result:
1. List the given mean and standard deviation.
   𝜇 = 34 minutes, 𝜎 = 8 minutes
2. Compute the 𝑧-score of 𝑥 = 28 minutes.
   𝑧 = (𝑥 − 𝜇)/𝜎 = (28 − 34)/8 = −0.75
3. Find the probability 𝑃(𝑧 ≤ −0.75).
   𝑃(𝑧 ≤ −0.75) = 0.2266

The probability that a student will complete the examination in less than 28
minutes is 0.2266.

(b)
Steps, with the actual process and result:
1. List the given mean and standard deviation.
   𝜇 = 34 minutes, 𝜎 = 8 minutes
2. Compute the 𝑧-score of 𝑥 = 40 minutes.
   𝑧 = (𝑥 − 𝜇)/𝜎 = (40 − 34)/8 = 0.75
3. Find the probability 𝑃(𝑧 ≥ 0.75).
   𝑃(𝑧 ≥ 0.75) = 1 − 0.7734 = 0.2266

The probability that a student will complete the examination in more than 40
minutes is 0.2266.
(c)
Steps, with the actual process and result:
1. List the given mean and standard deviation.
   𝜇 = 34 minutes, 𝜎 = 8 minutes
2. Compute the 𝑧-scores of 𝑥 = 28 minutes and 𝑥 = 40 minutes.
   𝑧 = (𝑥 − 𝜇)/𝜎 = (28 − 34)/8 = −0.75
   𝑧 = (𝑥 − 𝜇)/𝜎 = (40 − 34)/8 = 0.75
3. Find the probability 𝑃(−0.75 ≤ 𝑧 ≤ 0.75).
   𝑃(−0.75 ≤ 𝑧 ≤ 0.75) = 0.7734 − 0.2266 = 0.5468

The probability that a student will complete the examination between 28 and 40
minutes is 0.5468.
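
The three parts of Example 1 can be cross-checked without a table. The sketch below assumes Python 3.8 or later and uses statistics.NormalDist directly on the exam-time distribution, so no conversion to 𝑧-scores is needed; the variable names are illustrative. Because the table values are rounded to four decimal places, the computed answer for part (c) may differ from the table-based answer in the last digit.

```python
from statistics import NormalDist

# Given in Example 1: mean 34 minutes, standard deviation 8 minutes.
exam_time = NormalDist(mu=34, sigma=8)

p_less_28 = exam_time.cdf(28)                      # (a) area to the left of 28
p_more_40 = 1 - exam_time.cdf(40)                  # (b) area to the right of 40
p_between = exam_time.cdf(40) - exam_time.cdf(28)  # (c) area between 28 and 40

print(f"(a) {p_less_28:.4f}")
print(f"(b) {p_more_40:.4f}")
print(f"(c) {p_between:.4f}")
```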

Example 2.
The time to complete a mathematics exam is approximately normally
distributed with a mean of 30 minutes and a standard deviation of 7 minutes.
If 100 students take the examination, how many should finish in less than 25
minutes?

Solution.
Steps, with the actual process and result:
1. List the given mean, standard deviation, and the number of students.
   𝜇 = 30 minutes, 𝜎 = 7 minutes
   Number of students = 100
2. Compute the 𝑧-score of 𝑥 = 25 minutes.
   𝑧 = (𝑥 − 𝜇)/𝜎 = (25 − 30)/7 ≈ −0.71
3. Find the probability 𝑃(𝑧 ≤ −0.71).
   Using the 𝑧-table, we obtain 𝑃(𝑧 ≤ −0.71) = 0.2389.
   The probability that a student will complete the examination in less than
   25 minutes is 0.2389.
4. Get the number of students who complete the examination in less than 25
   minutes by multiplying the number of students by the obtained probability.
   100 × 0.2389 = 23.89 ≈ 24 students

Thus, about 24 students are expected to finish in less than 25 minutes.
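
The steps of Example 2 can be sketched as follows, assuming Python 3.8 or later. The 𝑧-score is rounded to two decimal places on purpose, to follow the table-based procedure; the variable names are illustrative.

```python
from statistics import NormalDist

mean, sd, students = 30, 7, 100   # given in Example 2

# Step 2: z-score of x = 25, rounded to two decimals as a z-table requires.
z = round((25 - mean) / sd, 2)

# Step 3: area to the left of z under the standard normal curve.
p_less_25 = NormalDist().cdf(z)

# Step 4: expected number of students, rounded to a whole student.
expected_count = round(students * p_less_25)

print(z, round(p_less_25, 4), expected_count)  # -0.71 0.2389 24
```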

Example 3.

A company gives an employment test to all applicants for a job. The results
of the test are normally distributed with a mean score of 124 and a standard
deviation of 16. If only the top 75% of the applicants are to be interviewed, what
score must an applicant have to be interviewed?

Solution.

Steps, with the actual process and result:

1. List the given information.
   𝜇 = 124, 𝜎 = 16
   The top 75% of the applicants will be interviewed.
2. Draw the area under the normal curve indicating the top 75% = 0.7500. In
   this case, we shade 75% from the right.
3. Since the shaded region is from the right, we consider the area
   1 − 0.7500 = 0.2500, which is the area to the left of the 𝑧-value we want to
   obtain. In this case, we look for the area equal to or closest to 0.2500.
   The area closest to 0.2500 is 0.2514. The 𝑧-value that corresponds to this
   area is −0.67.
4. Convert the 𝑧-value obtained to a score by deriving 𝑥 from the formula
   𝑧 = (𝑥 − 𝜇)/𝜎. We have 𝑥 = 𝜎𝑧 + 𝜇.
   𝑥 = 𝜎𝑧 + 𝜇 = −0.67(16) + 124 = 113.28 ≈ 113

From the result, applicants with a score of at least 113 will be interviewed.
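
Looking up a 𝑧-value from an area is the reverse of the usual table lookup. As a rough cross-check, Python's statistics.NormalDist offers inv_cdf for this (Python 3.8 or later); the variable names below are illustrative. The computed 𝑧-value is not rounded to two decimals the way a table entry is, so the cutoff differs slightly from 113.28 before rounding.

```python
from statistics import NormalDist

mean, sd = 124, 16        # given in Example 3
area_to_left = 1 - 0.75   # top 75% shaded from the right leaves 25% on the left

# z-value whose area to the left is 0.25 (the table's closest entry is -0.67).
z_cutoff = NormalDist().inv_cdf(area_to_left)

# Convert the z-value back to a test score: x = sigma * z + mu.
cutoff_score = sd * z_cutoff + mean

print(round(z_cutoff, 4), round(cutoff_score))
```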
