MMW (Module5)
MODULE FIVE
DATA MANAGEMENT
CORE IDEA
Statistical tools derived from mathematics are useful in
processing and managing numerical data to describe a
phenomenon and predict values.
Learning Outcome:
5. Use a variety of statistical tools to process and manage
numerical data.
6. Use the methods of linear regression and correlations to
predict the value of a variable given certain conditions.
7. Advocate the use of statistical data in making important
decisions.
Unit Lessons:
Lesson 5.1 The Data
Lesson 5.2 Measures of Central Tendency
Lesson 5.3 Measures of Dispersion
Lesson 5.4 Measures of Relative Position
Lesson 5.5 Normal Distributions
Lesson 5.6 Linear Correlation
Lesson 5.7 Linear Regression
MATHEMATICS IN THE MODERN WORLD
Lesson
5.1 The Data
Specific Objectives
It is written in the Holy Book that “the truth shall set us free;” therefore,
understanding statistics paves the way towards intellectual freedom. For without
sufficient knowledge about it, we may be doomed to a life of half-truth. Statistics
will provide deeper insights to critically evaluate information and to bring us to the
well-lit arena of practicality.
Discussions
of data but descriptive statistics will provide us certain tools to make the data
manageable to handle and conveniently neat to describe.
Inferential Statistics. We could probably argue that descriptive statistics, with its
capacity to describe, is sufficient to depict any given information. While it is
effective for describing a manageable amount of data, it can hardly cope with a
sizeable amount of data. For this kind of situation, inferential statistics is the
alternative technique. Inferential statistics has the ability to “infer” and to
generalize, and it offers the right tools to predict values that are not really
known.
Let us consider the fictitious situation we made under descriptive statistics, but this
time, instead of reporting the approximate monthly earnings of some workers, we
want to estimate the monthly earnings of all the workers in a certain region. With
descriptive statistics alone, this would be practically impossible, because we cannot
ask all the workers in the entire region about their monthly income. With
inferential statistics, we would instead select just a small number of workers and
ask them about their monthly income. From there, we can predict or approximate, in
a “more or less” fashion, the monthly income of all workers in the entire region.
Measurement
Scales of Measurement
- Nominal Scale : Categorical Data
- Ordinal Scale : Ranked Data
- Interval/Ratio Scale : Measurement Data
Nominal Scale. It concerns categorical data. It simply means using numbers to
label categories, and the data are obtained by counting the frequency of occurrence
within each category. One condition is that the categories must be independent or
mutually exclusive: once something is identified under a certain category, it cannot
be assigned at the same time to another category.
Obviously, those numbers only serve as labels and they do not contain any
numerical weight. Thus, we cannot say that married people (having been labelled
2) have more marital status than single people (having been labelled 1).
Ordinal Scale: It concerns ranked data. There are instances wherein comparison is
necessary and cannot be avoided. The ordinal scale ranks the observations in order
to generate information to the extent of “greater than” or “less than.” But ranked
data are also limited to the extent of “greater than” or “less than”; they cannot
tell how much greater or how much less.
The ordinal scale is best illustrated by sports activities like a fun run. Finding the
order of finish among the participants in a fun run always comes up with a ranking.
However, ranked data cannot provide information about the difference in time
between the 1st placer and the 2nd placer. Relative to this, reading reports with
ordinal information can also be tricky. For example, a TV commercial extols a certain
brand for being the number one product in the country. This may seem acceptable,
but if you learned that there is no other product of its kind, then the message of the
commercial would definitely be received with a smirk.
Interval Scale: It deals with measurement data. In the nominal scale, we use
numbers to label categories while in the ordinal scale we use numbers to merely
provide information regarding greater than or less than. However, in interval scale
we assign numbers in such a way that there is meaning and weight on the value of
points between intervals. This scale of measurement provides more information
about the data. Consider the comparative illustration below:
As you may have noticed, the interval scale provides substantial information about
the grades of the students: Student A earned a grade of 99, and so on and so forth.
Now look at the information given by the ordinal data. It is simply about ranking.
With this kind of information, Student B can proudly and rightfully claim 2nd place in
the ranking. The ordinal scale is a trusted friend that keeps a secret: the grade of
Student B, though claiming 2nd place, is actually 74. Let us analyze the nominal data
in our example. With this scale, the school can only sadly announce that one student
passed and four students failed. Nominal data cannot provide more information, and
in particular cannot put Student A in a brighter limelight. The audience may assume
that Student A earned a grade just a little higher than the passing mark, and
Student A's grade of 99 will remain hidden forever.
Sample. The small number of observations taken from the total number making up
a population is called a sample. As long as the observations or data do not cover the
entire population, they are always considered a sample. For instance, in a
population of 100, 1 is considered a sample, and 30 is clearly a sample. It may
seem absurd, but 99 taken from 100 is still considered a sample. Not until we include
that last number (making it 100) could we claim that it is already a population and
no longer a sample.
Statistic. Any measure obtained from the sample is called a statistic. Whenever we
describe the sample, the measures we use are statistics. Since a sample is easier to
observe or gather than the population, statistics are simpler to gather than
parameters.
Graphical representation
Graphs. A graph is another way to visually show the behavior of data. To create a
graph, the distribution of scores must first be organized. For instance, presenting
the scores below in an unorganized manner can provide confusing information or
no information at all; reporting raw scores can even keep some significant scores
from being noticed.
120, 65, 110, 75, 105, 80, 105,
85, 100, 85, 100, 90, 95, 90, 90
But when we arrange the scores from highest to lowest, which is a form of score
distribution, some pieces of information are gradually brought forth and exposed.
Distribution of Scores
120
110
105
105
100
100
95
90
90
90
85
85
80
75
65
X f
(Raw score) (Frequency of Occurrence)
---------------------------------------------------------------------------
120 1
110 1
105 2
100 2
95 1
90 3
85 2
80 1
75 1
65 1
------------------------------------------------------------------------
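As a minimal sketch (not part of the original module), the tally in the frequency table above can be reproduced in Python with `collections.Counter`. The variable names `scores` and `frequency` are illustrative choices.

```python
from collections import Counter

# The 15 raw scores from the example above
scores = [120, 65, 110, 75, 105, 80, 105,
          85, 100, 85, 100, 90, 95, 90, 90]

# Count how often each raw score occurs
frequency = Counter(scores)

# Print the table from the highest score to the lowest
print(f"{'X':>5} {'f':>3}")
for score in sorted(frequency, reverse=True):
    print(f"{score:>5} {frequency[score]:>3}")
```

Sorting the keys in descending order mirrors the highest-to-lowest arrangement used in the table.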
Another way of showing the data in graphical form is by using Microsoft Excel, as
illustrated in the graphs below, which show the frequency polygon of the scores in
our example above.
Notice that the two graphs in the illustration of the frequency polygon may appear
different, but they are actually the same and they disclose the same information.
This illustration should make you realize that unless you see things with a critical eye,
a graph can create a false impression of what the data really reveal. It shows how
graphs can be used to distort reality if you are not equipped with a critical
statistical mind. This type of deceitful cleverness in distorting graphs is common
among corporations that want to portray gigantic leaps in sales in order to attract
more clients or buyers.
Lesson
5.2 Measures of Central Tendency
Specific Objectives
Discussion
As we venture into the realm of descriptive statistics, let us now focus on describing
the nature of quantitative data. By using an appropriate descriptive technique,
we can organize and neatly summarize small and large data distributions alike.
The procedure, utilizing measures of central tendency, allows us to
precisely describe the centrality of a data distribution.
Measures of central tendency are methods that can be used to determine information
regarding the average, ranking, and category of any data distribution. The mean,
median and mode are the three tools for obtaining measures of central tendency. But
only by knowing and using the appropriate tool can the most accurate estimate of
centrality be achieved. The objective of the measures of central tendency is to
describe the centrality of the distribution with a single numerical value. This single
value must provide a clear description of the common trait being observed in the
distribution of scores.
The Mean
The most widely used measure of central tendency is the mean ($\bar{X}$). It is
the arithmetic average of all the scores. The mean can be determined by adding all
the scores together and then dividing by the total number of scores. The basic
formula for the mean is as follows:
$\bar{X} = \frac{\sum x}{N}$
where $\bar{X}$ is the mean, $\sum x$ is the sum of all the scores, and $N$ is the
entire number of observations being dealt with.
In the example below concerning the annual income of 12 workers, the mean can
be found by calculating the average score of the distribution.
X
===========================
Php 200,000.00
200,000.00
195,000.00
194,000.00
194,000.00
194,000.00
193,000.00
190,000.00
185,000.00
180,000.00
180,000.00
176,000.00
===========================
$\sum x$ = Php 2,281,000.00
$\bar{X} = \frac{\sum x}{N} = \frac{2{,}281{,}000.00}{12} \approx$ Php 190,083.33
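The arithmetic above can be checked with a short Python sketch. The list `incomes` simply copies the 12 annual incomes from the table; the variable names are illustrative.

```python
# The 12 annual incomes (in Php) from the table above
incomes = [200_000, 200_000, 195_000, 194_000, 194_000, 194_000,
           193_000, 190_000, 185_000, 180_000, 180_000, 176_000]

total = sum(incomes)   # sum of all the scores
n = len(incomes)       # N, the number of observations
mean = total / n       # arithmetic average

print(f"Sum  = Php {total:,.2f}")
print(f"N    = {n}")
print(f"Mean = Php {mean:,.2f}")
```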
Mean of Skewed Distribution. There are situations wherein the mean cannot be
trusted to provide a measure of central tendency because it portrays an extremely
distorted picture of the average value of a distribution of scores. For instance, let
us still consider our example of annual incomes, but this time with some
adjustment. Let us introduce another score: the annual income of an affluent new
neighbor who moved to this town just recently. This new neighbor has
an extremely high annual income, far above the others.
X
===========================
New neighbor
Php 2, 500,000.00
200,000.00
200,000.00
195,000.00
194,000.00
194,000.00
194,000.00
193,000.00
190,000.00
185,000.00
180,000.00
180,000.00
176,000.00
===========================
$\sum x$ = Php 4,781,000.00
$\bar{X} = \frac{\sum x}{N} = \frac{4{,}781{,}000.00}{13} \approx$ Php 367,769.23
As you may have noticed, the mean income of about Php 367,769.00 this time provides
a highly misleading picture of great prosperity for this neighborhood. The
distribution was unbalanced by the extreme score of the new affluent neighbor. This
is what we call a skewed distribution.
When the tail goes to the right, the curve is positively skewed; when it goes to the
left, it is negatively skewed. The skew is in the direction of the tail-off of scores, not
of the majority of scores. The mean is always pulled toward the extreme score in a
skewed distribution. When the extreme score is at the low end, then the mean is
too low to reflect centrality. When the extreme score is at the high end, the mean
is too high.
The Median
The median is the point that separates the upper half from the lower half of the
distribution. It is the middle point or midpoint of any distribution. If the
distribution is made up of an even number of scores, the median can be found by
determining the point that lies halfway between the two middlemost scores.
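In the portion of a distribution shown below, the two middlemost scores are
190,000.00 and 185,000.00:
193,000.00
190,000.00
185,000.00
180,000.00
$\text{Median} = \frac{190{,}000 + 185{,}000}{2} = 187{,}500$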
X
===========================
Php 2,500,000.00   (extreme score)
200,000.00
200,000.00
195,000.00
194,000.00
194,000.00
194,000.00   (Median)
193,000.00
190,000.00
185,000.00
180,000.00
180,000.00
176,000.00
===========================
As you can observe, even with the presence of an extreme score at the high end of
the distribution, the value of the median remains essentially undisturbed.
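To see how differently the mean and the median react to one extreme score, the comparison can be sketched with Python's standard `statistics` module. The lists copy the incomes from the tables above; `with_neighbor` is an illustrative name.

```python
from statistics import mean, median

# The original 12 annual incomes (in Php)
incomes = [200_000, 200_000, 195_000, 194_000, 194_000, 194_000,
           193_000, 190_000, 185_000, 180_000, 180_000, 176_000]

# The same distribution after the affluent new neighbor moves in
with_neighbor = [2_500_000] + incomes

print(f"Mean without neighbor:   {mean(incomes):,.2f}")
print(f"Mean with neighbor:      {mean(with_neighbor):,.2f}")
print(f"Median without neighbor: {median(incomes):,.2f}")
print(f"Median with neighbor:    {median(with_neighbor):,.2f}")
```

In this sketch the mean nearly doubles, while the median moves only from the midpoint of the two middlemost scores of the 12 incomes to the 7th of the 13 scores.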
The Mode
Another measure of central tendency is called the mode. It is the most frequently
occurring score in a distribution. In a histogram, the mode is always located
beneath the tallest bar.
X
===========================
Php 2, 500,000.00
200,000.00
200,000.00
195,000.00
194,000.00
194,000.00 Mode
194,000.00
193,000.00
190,000.00
185,000.00
180,000.00
180,000.00
176,000.00
===========================
The mode provides an extremely fast way of knowing the centrality of the
distribution. You can immediately spot the mode by simply looking at the data and
finding the dominant value. It is the most frequently occurring score.
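For larger data sets, spotting the mode by eye becomes harder. A minimal Python sketch, assuming the 13 incomes from the table above, uses `statistics.multimode` (which returns every most-frequent value) together with `collections.Counter`:

```python
from collections import Counter
from statistics import multimode

# The 13 annual incomes (in Php), including the new neighbor
incomes = [2_500_000, 200_000, 200_000, 195_000, 194_000, 194_000,
           194_000, 193_000, 190_000, 185_000, 180_000, 180_000, 176_000]

counts = Counter(incomes)     # frequency of each score
modes = multimode(incomes)    # list of the most frequent score(s)

for value in modes:
    print(f"Mode: {value:,} (occurs {counts[value]} times)")
```

`multimode` is used here instead of `mode` so that a distribution with more than one mode would still be reported in full.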
The best way to illustrate the comparative applicability of the mean, median and
mode is to look again at the skewed distribution.
[Figure: positively skewed income curve. Vertical axis: Frequency of Occurrence. Marked points: Mode = 10,000; Median = 20,000; Mean = 100,000]
Income distributions are usually skewed to the right because the low end has a fixed
limit of zero while the high end has no limit. If we consider the area under the curve
to be 100 percent, then the median is the exact midpoint of the distribution: the areas
below and above the median are each equal to 50 percent. Thus, if the median income is
P20,000.00, this means that 50% of the households have an income below
P20,000.00 and 50% of the households have an income above P20,000.00. On the
other hand, the mean in our figure above indicates a high income of P100,000.00. The
value of the mean gives a distorted picture of reality because it is being unduly
influenced by a few affluent income earners at the high end of the curve whose monthly
income is around P500,000.00; this is what makes the curve positively skewed. The
modal income, which is P10,000.00 per month, also seems to distort reality, this time
towards the low side. The mode is always the highest point of the curve. In this
example, the mode represents the most frequently earned income, and it is far lower
than the median income of P20,000.00. Both the mean and the mode give a false
portrait of the typical income, and the truth lies somewhere in between.
The scale of measurement on which the data are based oftentimes dictates the
measure of central tendency to be used. Interval data permit the calculation of all
three measures of central tendency. Nominal and ordinal data cannot be used to
calculate the mean; a mean computed from ordinal data can give an extremely
confusing and wrong result. Since the median is about ranking (half of the scores fall
above it and half fall below it), an ordinal arrangement is necessary for finding the
median. For nominal data, however, neither the mean nor the median can be
used. Nominal data simply use a number as a label for a category,
and the only measure of central tendency permissible for nominal data is the mode.
Lesson
5.3 Measures of Dispersion
Specific Objectives
The measures of central tendency only provide information about the similarity or
typicality of scores. But to fully describe the distribution, we need to gain
information about how scores differ or vary. The description of the distribution can
only be complete if some information of its variability is known. To substantiate the
information provided by the measures of centrality, some degree of dispersion
must also be brought into the light.
Discussion
Measures of Variability
There are three measures of variability: the range, the standard deviation and the
variance. These three measures give information about the spread of the scores in
a distribution. Metaphorically, variability asserts that a glass half-full is also
half-empty: being half-full is about centrality and being half-empty is about variability.
The example below shows the calculation of the range from a distribution of annual
incomes:
X
===========================
Php 200,000.00   Highest Score (HS)
200,000.00
195,000.00
194,000.00
194,000.00
194,000.00
193,000.00
190,000.00
185,000.00
180,000.00
180,000.00
176,000.00   Lowest Score (LS)
===========================
Range = HS - LS = 200,000 - 176,000 = 24,000
The range gives information about the scattering of the scores by using merely the
two extreme points. On the other hand, this poses a severe limitation on the
capability of the range to report score deviation. If you add new scores within the
distribution, the range will not report any change in the deviation. Also, adding
just one extreme score to an otherwise ordinary distribution will increase the
range even if no other deviations occurred within the distribution. The range is
therefore not stable enough to indicate variability. Nevertheless,
it is still a method for finding the variability of any given distribution.
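Both limitations of the range can be illustrated with a short Python sketch. The helper name `score_range` is an illustrative choice, and the two added scores (190,000 and 2,500,000) are picked only to show the effect.

```python
# The 12 annual incomes (in Php) from the range example
incomes = [200_000, 200_000, 195_000, 194_000, 194_000, 194_000,
           193_000, 190_000, 185_000, 180_000, 180_000, 176_000]

def score_range(scores):
    """Range = highest score (HS) minus lowest score (LS)."""
    return max(scores) - min(scores)

base = score_range(incomes)
# A new score inside the distribution leaves the range unchanged
with_inner = score_range(incomes + [190_000])
# A single extreme score changes the range drastically
with_extreme = score_range(incomes + [2_500_000])

print(f"Range of the 12 incomes:        {base:,}")
print(f"After adding 190,000:           {with_inner:,}")
print(f"After adding 2,500,000:         {with_extreme:,}")
```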
The Standard Deviation. The standard deviation (SD) is the lifeblood of the
variability concept. It measures how much the scores in
the distribution typically differ from the mean of the distribution. Unlike the range,
which utilizes only two extreme scores, the SD employs every score in the distribution.
It is computed with reference to the mean (not the median or the mode), and it
requires that the scores be in interval form.
A distribution with a small standard deviation shows that the trait being measured is
homogeneous, while a distribution with a large standard deviation indicates that
the trait being measured is heterogeneous. A distribution with zero standard
deviation implies that the scores are all the same (e.g., 10, 10, 10, 10, 10). Although it
may seem like stating the obvious, it is important to note that if all the scores are
the same, there is no dispersion, no deviation, and no scattering of scores in the
distribution, so there can never be less than zero variability.
In calculating the standard deviation, we can either use the computational method
or the deviation method. Both methods provide the same answer. However, in this
lesson, we will use the computational method because it is designed for electronic
calculators.
The mean of a distribution is symbolized as $\bar{X}$. The computational formula for
the standard deviation is:
$SD = \sqrt{\frac{\sum X^2}{N} - \bar{X}^2}$
The formula simply states that the standard deviation (SD) is equal to the square
root of the difference between the sum of the squared raw scores divided by
the number of cases and the squared mean (Sprinthall, 1994). Below is an example
of how to obtain the standard deviation using the computational method.
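$SD = \sqrt{\frac{434{,}283{,}000{,}000}{12} - (190{,}083.33)^2}$
$SD \approx 7{,}653.521$ (carrying the unrounded mean, $2{,}281{,}000/12$, through the calculation)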
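The computational method can be sketched in Python as follows. The sketch assumes that the example uses the 12 annual incomes from Lesson 5.2 (their squares do sum to 434,283,000,000) and cross-checks the result against `statistics.pstdev`, the population standard deviation in the standard library.

```python
import math
from statistics import pstdev

# The 12 annual incomes (in Php) used in the example
incomes = [200_000, 200_000, 195_000, 194_000, 194_000, 194_000,
           193_000, 190_000, 185_000, 180_000, 180_000, 176_000]

n = len(incomes)
sum_of_squares = sum(x * x for x in incomes)   # sum of X squared
mean = sum(incomes) / n                        # unrounded mean

# Computational method: SD = sqrt(sum(X^2)/N - mean^2)
sd = math.sqrt(sum_of_squares / n - mean ** 2)

print(f"Sum of X^2 = {sum_of_squares:,}")
print(f"SD (computational method) = {sd:,.3f}")
print(f"SD (statistics.pstdev)    = {pstdev(incomes):,.3f}")
```

Keeping the mean unrounded matters here: squaring a mean rounded to the nearest peso shifts the result by several pesos, because the two terms under the square root are very large and nearly equal.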
Math Quiz          Section A      Section B
                   Scores         Scores
Mean               100            100
Median             100            100
Mode               100            100
N                  30             30
HS, LS, Range      130, 70, 60    130, 70, 60
SD                 10             2
[Figure: frequency curves for Section A and Section B. Vertical axis: Frequency of occurrence; horizontal axis: scores from 0 to 130, with 70, 100 and 130 marked]
Two Frequency Distributions of Scores
As can be seen in the figure above, there is just a slight bulge in the middle of
the distribution of Section A. This means that it has many scores deviating widely
from the mean (100), which is reflected in its large standard deviation (10).
In Section B, however, which has a smaller standard deviation (2), most of the scores
gather closely around the mean (100), thereby creating a towering lump. Comparing
these two distributions reveals the disparity in the values of the standard
deviation between the two sections. The section A having a large standard
$X$ is any raw score in the distribution.
$x$ is the deviation score. It is equal to the raw score, $X$, minus the mean, $\bar{X}$: $x = X - \bar{X}$
$V = SD^2 = \frac{\sum X^2}{N} - \bar{X}^2 = \frac{\sum x^2}{N}$
While the standard deviation expresses how spread out the scores are from the mean
by taking the square root of the variance, the variance itself is the average of the
squared deviations of the scores from the mean (the mean being the average of all
the scores in the distribution). It may appear unnecessary to study the variance
when the standard deviation seems complete. But there are
situations wherein it is more efficient to work directly with variances than to
keep returning to the standard deviation. In fact, the F ratio
makes full use of this special property of variability.
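The two expressions for the variance, one from raw scores and one from deviation scores, can be compared numerically. The following is a minimal Python sketch using the 12 annual incomes from Lesson 5.2 as assumed data; the variable names are illustrative.

```python
import math

# The 12 annual incomes (in Php) from Lesson 5.2
incomes = [200_000, 200_000, 195_000, 194_000, 194_000, 194_000,
           193_000, 190_000, 185_000, 180_000, 180_000, 176_000]

n = len(incomes)
mean = sum(incomes) / n

# Deviation method: average of the squared deviation scores x = X - mean
deviations = [x - mean for x in incomes]
variance_deviation = sum(d * d for d in deviations) / n

# Computational method: sum(X^2)/N - mean^2
variance_computational = sum(x * x for x in incomes) / n - mean ** 2

print(f"Variance (deviation method):     {variance_deviation:,.2f}")
print(f"Variance (computational method): {variance_computational:,.2f}")
print(f"SD = sqrt(variance):             {math.sqrt(variance_deviation):,.3f}")
```

Up to floating-point rounding, both methods give the same variance, and its square root is the standard deviation.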
Lesson
5.4 Measures of Relative Position
Specific Objectives
In the previous lesson, we have demonstrated two separate but related measures
that can show the characteristics of the scores in a distribution. These are the
measures of central tendency and the measures of variability. In this lesson, we can
further explore all the possibilities that might occur in the relationship of centrality
and variability (i.e., mean and standard deviation). Let us consider having two sets
of distribution and different case scenarios that might occur in comparing their
respective means and standard deviations.
Discussion
The z- Score
Case A
𝜇1 = 𝜇2
𝜎1 = 𝜎2
As shown in Case A, it is possible that two distributions can generate almost the
same means (𝜇) and almost the same standard deviations (𝜎).
Case B
𝜇1 ≠ 𝜇2
𝜎1 = 𝜎2
It is also possible that two distributions have different means (𝜇) but similar
standard deviations (𝜎).
Case C
𝜇1 = 𝜇2
𝜎1 ≠ 𝜎2
Here in Case C, the two distributions have the same means (𝜇) but they differ
in standard deviation (𝜎).
Case D
𝜇1 ≠ 𝜇2
𝜎1 ≠ 𝜎2
Population                                        Sample
$z = \frac{X - \mu}{\sigma}$                      $z = \frac{X - \bar{x}}{s}$
$X$ refers to a raw score from the population     $X$ refers to a raw score from the sample
$\mu$ pertains to the mean of the population      $\bar{x}$ pertains to the mean of the sample
$\sigma$ is the population standard deviation     $s$ is the estimated standard deviation
Both formulas express the same relationship among the raw score, the mean and the
standard deviation. The only distinction between the two formulas is whether
the distribution was generated from the population or from the sample. The
formula on the left refers to z-scores from the population, while the formula on
the right refers to z-scores from the sample.
$z = \frac{X - \mu}{\sigma}$
The formula explains that the values of the mean and standard deviation
can be combined to transform a raw score ($X$) into a standard score ($z$). The
z-score equation, $z = \frac{X - \mu}{\sigma}$, can convert the raw score of any group
into a common value, and it enables comparison between scores coming from different
group distributions. Below is an illustration of a standardized scale. As you may
notice in this z-scaling, the mean is always zero and the standard deviation is
always one unit.
[Figure: standardized z-scale with $\mu = 0$ and $\sigma = 1$]
To further clarify the concept of the z-score, let us assume that you are taking physics
and biology courses. In your final examinations, you earned a grade of 95 in physics
and 85 in biology. Now the question is: In which exam did you do better?
It seems obvious, based on the face value of the scores, that you did better in
physics than in biology. But to make a serious comparison between your
scores on the two tests, we must take into consideration
how well your classmates performed as a whole group. This requires additional
information about the mean and standard deviation of both the physics and
biology groups. Let us assume that we can get the needed information right away,
as follows:
𝜇 𝜎
(population mean) (population SD)
Physics 85 10
Biology 75 5
Now, let us substitute that information into the z-score formula and compute for
the z score values
Physics: $Z_p = \frac{X_p - \mu_p}{\sigma_p} = \frac{95 - 85}{10} = 1.0$
Biology: $Z_b = \frac{X_b - \mu_b}{\sigma_b} = \frac{85 - 75}{5} = 2.0$
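The same substitution can be written as a small Python function. This is only a sketch; the function name `z_score` and its parameter names are illustrative.

```python
def z_score(raw, mean, sd):
    """Convert a raw score into a standard score: z = (X - mu) / sigma."""
    return (raw - mean) / sd

# Physics: X = 95, mu = 85, sigma = 10
z_physics = z_score(95, 85, 10)
# Biology: X = 85, mu = 75, sigma = 5
z_biology = z_score(85, 75, 5)

print(f"Physics z-score: {z_physics:.1f}")
print(f"Biology z-score: {z_biology:.1f}")
print("Better relative performance:",
      "biology" if z_biology > z_physics else "physics")
```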
Finally, let us place these z-score values into a z-scale to clearly illustrate the
measures.
$Z_p = 1.0$ (physics score of 95); $Z_b = 2.0$ (biology score of 85)
Z          -4    -3    -2    -1     0    +1    +2    +3    +4
Physics    45    55    65    75    85    95   105   115   125
Biology    55    60    65    70    75    80    85    90    95
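The raw-score rows of the z-scale follow from rearranging the z-score formula into $X = \mu + z\sigma$. A minimal Python sketch, using the means and standard deviations given for the two subjects (the function name `raw_from_z` is illustrative):

```python
def raw_from_z(z, mean, sd):
    """Invert the z-score formula: X = mu + z * sigma."""
    return mean + z * sd

z_values = list(range(-4, 5))   # -4, -3, ..., +4

# Physics: mu = 85, sigma = 10; Biology: mu = 75, sigma = 5
physics_scale = [raw_from_z(z, 85, 10) for z in z_values]
biology_scale = [raw_from_z(z, 75, 5) for z in z_values]

print("Z       ", z_values)
print("Physics ", physics_scale)
print("Biology ", biology_scale)
```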
Notice that in the illustration, we can clearly compare the relative positions of the
scores on one standardized scale. Notice also that the means of both subjects are
reconciled to a common mean of 0 ($\mu = 0$). Likewise, the standard deviations of
both subjects are calibrated to a unit of one ($\sigma = 1$). Thus, a comparison can
now be made between your final examination scores. As displayed, your score of 95 in
physics lines up with 1.0 on the z-scale, while your score of 85 in biology lines up
with 2.0 on the z-scale. It is clear that you did much better in the biology exam
($Z_b = 2.0$), contrary to our earlier impression that you did better in physics. This
example is only a glimpse showing that standardized scores are the building blocks
that provide the foundation of inferential statistics.
Percentile
To locate a specific point in any distribution, percentiles, quartiles and deciles are
the tools that can be used. The relative position of the raw score can be described
precisely by converting it into a percentile. A percentile refers to a point in the
distribution below which a given percentage of scores fall.
Based on the figure above, a score at the 97th percentile ($P_{97}$) is at the very high
end of the distribution because an enormous proportion (97%) of the scores are below
that point. A score at the 3rd percentile ($P_3$), however, is an extremely low score
because only 3% of the scores are below that point. The figure above also shows that
the 50th percentile divides the distribution exactly in half. The position of the 50th
percentile is also the location of the median.
To provide a better understanding of the role of the percentile, let us assume that
your College Admission Test result reflected a score at the 97th percentile. This does
not indicate that out of 100 questions, you made only around three
mistakes. Instead, it means that 97% of those who took the exam did not perform
better than you. However, the remaining 3% did perform better than you.
The percentile of any given data value (x) can be determined by dividing the
number of data values less than x by the total number of data values, and then
multiplying the result by 100. For instance, consider a College Admission
Test administered to 5000 students, in which your score of 800 was higher than the
scores of 4000 examinees. With this information, we can determine the percentile
of your score by using the formula:
$\text{Percentile of } x = \frac{\text{number of data values less than } x}{\text{total number of data values}} \times 100 = \frac{4000}{5000} \times 100 = 80$
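The percentile rule described above can be sketched as a one-line Python function. The function name `percentile_of` and its parameters are illustrative, and only the counts from the example are used since the individual scores are not given.

```python
def percentile_of(values_below, total_values):
    """Percentile = (number of values less than x / total number of values) * 100."""
    return values_below * 100 / total_values

# 5000 examinees; the score of 800 is higher than 4000 of them
percentile = percentile_of(4000, 5000)
print(f"A score of 800 is at percentile {percentile:.0f}")
```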
Quartiles. As the name implies, quartiles divide the distribution into quarters, at
the points Q1, Q2 and Q3.
The first quartile, Q1, is at the 25th percentile. The second quartile, Q2,
coincides with the median, which is at the 50th percentile. The third quartile, Q3, is
at the 75th percentile. The quartiles can be determined by using the following
procedures:
Let us consider this example and determine Q1, Q2, and Q3.
X
===========================
Php 200,000.00
200,000.00
195,000.00
194,000.00
193,000.00
192,000.00
191,000.00
190,000.00
185,000.00
181,000.00
180,000.00
176,000.00
===========================
First, make sure that the scores are arranged from highest to lowest.
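Position of Q1 = .25 (n+1) = .25 (12+1) = 3.25
Counting from the lowest score, Q1 lies one quarter of the way from the 3rd score
(181,000) to the 4th score (185,000), so Q1 = 182,000.
Position of Q3 = .75 (n+1) = .75 (12+1) = 9.75
Counting from the lowest score, Q3 lies three quarters of the way from the 9th score
(194,000) to the 10th score (195,000), so Q3 = 194,750.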
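The position rule p(n+1) with interpolation between neighboring scores can be sketched in Python. The helper `quartile` below is an illustrative implementation, not a function from the module; it counts positions from the lowest score. For comparison, the sketch also calls `statistics.quantiles`, whose default "exclusive" method is based on the same (n+1) positions.

```python
import math
from statistics import quantiles

# The 12 annual incomes (in Php) from the quartile example
incomes = [200_000, 200_000, 195_000, 194_000, 193_000, 192_000,
           191_000, 190_000, 185_000, 181_000, 180_000, 176_000]

def quartile(scores, p):
    """Score at position p*(n+1), counted from the lowest score,
    interpolating linearly between the two neighboring scores."""
    ordered = sorted(scores)
    n = len(ordered)
    position = p * (n + 1)
    k = math.floor(position)      # rank of the score just below the position
    fraction = position - k       # how far to move toward the next score
    if k < 1:
        return ordered[0]
    if k >= n:
        return ordered[-1]
    return ordered[k - 1] + fraction * (ordered[k] - ordered[k - 1])

q1, q2, q3 = (quartile(incomes, p) for p in (0.25, 0.50, 0.75))
print(f"Q1 = {q1:,.0f}, Q2 (median) = {q2:,.0f}, Q3 = {q3:,.0f}")
print("statistics.quantiles:", quantiles(incomes, n=4))
```

Other quartile conventions exist (spreadsheets and statistical packages differ), so results from other tools may not match these values exactly.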
Box-and-Whisker Plots
A box-and-whisker plot displays a graphical summary of a set of data. It provides
information about the minimum and maximum scores in the distribution, the
1st quartile and 3rd quartile, as well as the 2nd quartile or the median. Observe the
figure below.
X
===============
Php 200,000.00   HS
200,000.00
195,000.00   Q3
194,000.00
193,000.00
192,000.00
                 Median
191,000.00
190,000.00
185,000.00   Q1
181,000.00
180,000.00
176,000.00   LS
================
Box-and-whisker plots are easy to construct, and they readily show important
information about the distribution of scores in a simple diagram. Also, it is not
necessary to label the final product.
251
MATHEMATICS IN THE MODERN WORLD
Lesson
5.5 The Normal Distributions
Specific Objective
If the mean and standard deviation are the heart and brain of descriptive statistics,
then perhaps the normal curve is its lifeblood. In the preceding lesson, we discussed
in passing the z-scores, wherein the mean is always zero and the standard deviation
is fixed at 1. In this lesson, it is now proper to finally introduce the normal curve.
The normal curve is actually a theoretical distribution. It is a unimodal frequency
distribution curve. The scores are laid out along the X axis while the frequency of
occurrence is given by the Y axis.
Discussions
1. The majority of the scores cluster around the middle of the distribution, and
fewer scores are scattered at the extreme sides or tail ends of the curve.
2. It is always symmetrical and perfectly balanced.
3. Being a theoretical distribution, its mean, median and mode are all
equal.
4. It uses standard deviation units along the x-axis.
5. The normal curve is asymptotic to the abscissa, and the total area under
the curve is 1.0 or 100%.
6. In its standard (z-score) form, the normal curve has a mean of zero and a
standard deviation of 1 unit.
The table we will be using is a right-tail z-table. This table is used to find the area
between z = 0 and any positive z value, that is, an area on the right side of the
normal curve. The z-score table gives the percentages for only half
of the curve. But since the normal curve is symmetrical, a z-score to
the right of the mean yields the same percentage as the same z-score to the left of
the mean.
For example, to look up a z-score of 0.68 using the z-score table, look for 0.6 in the
leftmost column, then look for the second decimal, 0.08, in the top row. The
table value is 0.25175, which represents a percentage of 25.175%. It is the percentage
of cases falling between the z-score and the mean.
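When a printed z-table is not at hand, the same area can be computed from the error function in Python's `math` module, since the area between the mean and z under the standard normal curve equals $\frac{1}{2}\,\mathrm{erf}(|z|/\sqrt{2})$. The function name `area_mean_to_z` is an illustrative choice, and the sketch is a cross-check rather than a replacement for the table.

```python
import math

def area_mean_to_z(z):
    """Area under the standard normal curve between the mean (z = 0) and z.
    The absolute value reflects the symmetry of the curve."""
    return 0.5 * math.erf(abs(z) / math.sqrt(2))

area = area_mean_to_z(0.68)
print(f"Area between the mean and z = 0.68: {area:.5f} ({area * 100:.3f}%)")
print(f"Same area for z = -0.68:            {area_mean_to_z(-0.68):.5f}")
```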
[Figure: normal curve with the area between the mean and the z-score of 0.68 shaded]
25.175% is the percentage of cases falling between the z-score (0.68) and the
mean.
Now, let us consider some situations that might possibly occur in using the z-table.
Case 1. Finding the percentage of cases falling between the z-score and the mean.
[Figure: the shaded area between the z-score and the mean is 27.337% on either side of the mean]
As an example for Case 1, the z-score of +0.75 will generate a z-table value of
0.27337 or 27.337%. In the same way, the z-score of -0.75 will generate the same
z-table value of 0.27337 or 27.337%. Notice that the value is always a
positive number since a percentage area is always positive.
Case 2. Finding the percentage of cases above the given z-score. It is important to
remember for this case that the total area of the normal curve is 1.0 or 100%. It is
also essential to keep in mind that the right half of the normal curve is 50%, as is
the left half (50%). You also need to consider that the z-table always provides a
percentage value in relation to the mean.
[Figure: (a) area above a positive z-score, +0.75; (b) area above a negative z-score, -0.75]
For Case 2(a), to find the area above the given z-score, determine the equivalent
z-table value and subtract it from the total area of the right half, which is
50%. For example, to find the percentage of cases above the z-score of +0.75, find
the z-table value of +0.75, which is 0.27337 (27.337%), then subtract it from the total
area of the right half of the normal curve, which is 50%. This is 50% - 27.337% =
22.663%.
For Case 2(b), in order to determine the area above the given z-score (the z-score
here is a negative number because it is situated on the left side of the normal curve),
simply find the equivalent z-table value then add 50%. Again, always keep in mind
that the z-table only provides the percentage of cases between the z-score and the mean
and not an entire side of the curve. To cite another example, let us find the
percentage of cases above the z-score of -0.75. The z-table value of -0.75 is
0.27337, which is equivalent to 27.337%. To this number, add the percentage
area of the entire right side, which is 50%. So this is 27.337% + 50% = 77.337%.
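The two sub-cases of Case 2 can be sketched in a few lines. This is an illustrative sketch only: the function name `percent_above` is hypothetical, and it reuses the z-score of 0.68 from the earlier table lookup (table value 0.25175) rather than a value taken from the figures.

```python
import math

def percent_above(z):
    """Percentage of cases above z under the standard normal curve."""
    # Area between the mean and z, as a z-table would report it.
    table = 0.5 * math.erf(abs(z) / math.sqrt(2.0))
    # Case 2(a): positive z, subtract the table value from the right half.
    # Case 2(b): negative z, add the table value to the whole right half.
    area = 0.5 - table if z >= 0 else 0.5 + table
    return 100.0 * area

print(round(percent_above(0.68), 2))   # 24.83
print(round(percent_above(-0.68), 2))  # 75.17
```

The two results add up to 100%, which is a quick way to check a hand computation for a pair of opposite z-scores.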
Case 3. Finding the percentage of cases below the given z-score. The principle
used in Case 2 also applies to Case 3.
[Figure: (a) area below a negative z-score (-0.75), to the left of the mean; (b) area below a positive z-score (+0.75), to the right of the mean]
For Case 3(a), try to determine the percentage of cases below the z-score of -0.75.
Using an analysis similar to Case 2(a), the z-table value must be subtracted from
the total area of the left side (50%). If your computation is correct, your answer is 22.663%.
For Case 3(b), to determine the percentage of cases below the z-score of +0.75, note that the
z-table value only covers the percentage of cases between the z-score and the
mean, so you need to add 50%, which is the percentage of cases on the left side of
the normal curve. Your computation must generate an answer of 77.337%.
[Figure: normal curve with the area between the z-scores -0.75 and +0.75 shaded around the mean]
To illustrate Case 4, let us determine the percentage of cases between two
z-scores, -0.75 and +0.75. The -0.75 z-score gives a
z-table value of 27.337%. The +0.75 z-score gives the same z-table value of
27.337%. Thus, the percentage of cases between -0.75 and +0.75 is simply the sum of
the two percentages: 27.337% + 27.337% = 54.674%.
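The "between two z-scores" case can also be written as a difference of cumulative areas. The sketch below is a minimal illustration under that assumption; `cumulative` and `percent_between` are hypothetical helper names, and the z-scores of -0.68 and +0.68 are reused from the earlier table lookup.

```python
import math

def cumulative(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def percent_between(z_low, z_high):
    """Percentage of cases between two z-scores."""
    return 100.0 * (cumulative(z_high) - cumulative(z_low))

print(round(percent_between(-0.68, 0.68), 2))  # 50.35, twice the table value of 25.175%
```

When the two z-scores lie on opposite sides of the mean, this difference equals the sum of the two table values, as in Case 4. When both lie on the same side, it equals the difference of the two table values.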
We are now familiar with the z-score concept and with finding the
percentages of area above, below, and between z-scores. We also
know the z-score formula: if the mean
and standard deviation are known, we can subtract the mean from the raw score,
divide by the standard deviation, and obtain the z-score.
\[ z = \frac{x - \bar{x}}{SD} \]
The z-score reveals the location of the raw score relative to the mean in standard
deviation units. The z-score accounts for both the mean of the distribution and the
amount of variability. Now, let us determine the practical use of the z-score in the context
of a normal distribution of raw scores.
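The conversion from a raw score to a z-score is a one-line computation. The sketch below is illustrative only: `z_score` is a hypothetical function name, and the mean of 85 and standard deviation of 10 are the values used in this lesson's physics-score example.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that the raw score x lies from the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / sd

print(z_score(95, 85, 10))  # 1.0, one standard deviation above the mean
print(z_score(80, 85, 10))  # -0.5, half a standard deviation below the mean
```

A positive result places the raw score to the right of the mean and a negative result places it to the left, which tells you which half of the normal curve to work with.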
Case A. When the percentage of cases is between the raw score and the mean.
The normal distribution of physics scores has a mean of 85 and a standard deviation
of 10. What percentage of scores will fall between the physics score of 95 and the
mean?
Initially, we need to convert the raw score of 95 into its equivalent z-score.
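\[ z = \frac{x - \bar{x}}{SD} = \frac{95 - 85}{10} = 1.0 \]
[Figure: normal curve with 34.13% of cases shaded between the mean (85) and the raw score of 95 (z = 1.00)]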
Case B. When the percentage of cases falls below a raw score. Using the same
example, on a normal distribution of scores in physics class, with a mean of 85 and
a standard deviation of 10, what percentage of physics scores fall below a score of
95?
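Again, we need to convert the raw score of 95 into its equivalent z-score.
\[ z = \frac{x - \bar{x}}{SD} = \frac{95 - 85}{10} = 1.0 \]
The z-table value for a z-score of 1.00 is 34.13%. Adding the 50% of cases that lie
below the mean gives 50% + 34.13% = 84.13% of physics scores falling below 95.
[Figure: normal curve with the 50% of cases below the mean (85) and the 34.13% of cases between the mean and 95 (z = 1.00) shaded]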
[Figure: normal curve marking the raw scores 80 (z = -0.5), the mean 85, and 95 (z = 1.00)]
the raw scores of 95 and 80. The answer is 53.28%. This means that about 1 in 2 students
got a score between 95 and 80 (i.e. between your score and your friend’s score).
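The raw-score percentages in this lesson can be cross-checked without a printed table. The sketch below is one possible check, assuming Python 3.8 or later, where the standard library provides `statistics.NormalDist`; the variable names are our own.

```python
from statistics import NormalDist

# Physics scores from the example: mean 85, standard deviation 10.
physics = NormalDist(mu=85, sigma=10)

between_mean_and_95 = 100 * (physics.cdf(95) - physics.cdf(85))  # raw score vs. mean
below_95 = 100 * physics.cdf(95)                                 # below a raw score
between_80_and_95 = 100 * (physics.cdf(95) - physics.cdf(80))    # between two raw scores

print(f"{between_mean_and_95:.2f}%")  # 34.13%
print(f"{below_95:.2f}%")             # 84.13%
print(f"{between_80_and_95:.2f}%")    # 53.28%
```

Here `cdf` returns the area to the left of a raw score, so the conversion to a z-score happens inside the distribution object rather than as a separate step.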
At this point, we have already made a significantly long journey, from the measures of
central tendency to the measures of variability and finally to the measures of relative
position. We are now in a position of no longer just seeking answers to questions but of
seeking questions beyond the conventions established by the answers.
Lesson
5.6 The Linear Correlation: Pearson r
Specific Objective
Discussions
The Pearson R Linear Correlation.
correlation but it can never justify any causal relation that may appear
or seem obvious.
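\[ r = \frac{\dfrac{\Sigma xy}{N} - (\bar{x})(\bar{y})}{SD_x \, SD_y} \]
where N is the number of subjects, \bar{x} and \bar{y} are the means, and SD_x and SD_y are the standard deviations.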
The Pearson r value may fall under three possible scenarios. If the value of 𝑟 is positive, then
there is a positive correlation. If it is negative, then there is a negative correlation. If the value of 𝑟 is
around 0, then it means that almost no linear correlation is found.
[Figure: three scatter plots illustrating 𝒓 = +𝟏, 𝒓 = −𝟏, and 𝒓 = 𝟎]
We will explain the nature of linear correlation by using an example. Assume that
we want to determine if there is a correlation between hours of study and the grades
of students last semester. Initially, we need to randomly select students (let us say 10)
and ask them about their average grade last semester as well as the number of
hours they spent studying per week in that semester. Let us presume that
they provided us with these two pieces of information right away.
===============================================
Student Hours of Study (x) Grade (Y)
===============================================
A 15 2.75
B 35 1.25
C 05 3.00
D 20 2.50
E 30 1.50
F 40 1.00
G 20 2.25
H 25 1.75
I 25 2.00
J 08 3.00
\[ r = \frac{\dfrac{\Sigma xy}{N} - (\bar{x})(\bar{y})}{SD_x \, SD_y} \]
Now let us take into account the data below as our example to illustrate the
formula.
==================================================================================
Student   Hours of Study (x)   x²      Grade (y)   y²      xy
==================================================================================
A         15                   225     2.75        7.56    41.25
B         35                   1225    1.25        1.56    43.75
C         05                   25      3.00        9.00    15.00
D         20                   400     2.50        6.25    50.00
E         30                   900     1.50        2.25    45.00
F         40                   1600    1.00        1.00    40.00
G         20                   400     2.25        5.06    45.00
H         25                   625     1.75        3.06    43.75
I         25                   625     2.00        4.00    50.00
J         08                   64      3.00        9.00    24.00
==================================================================================
          Σx = 223             Σx² = 6089   Σy = 21   Σy² = 48.75   Σxy = 397.75
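\[ \bar{x} = \frac{\Sigma x}{N} = \frac{223}{10} = 22.3 \qquad \bar{y} = \frac{\Sigma y}{N} = \frac{21}{10} = 2.1 \]
\[ SD_x = \sqrt{\frac{\Sigma x^2}{N} - \bar{x}^2} = \sqrt{\frac{6089}{10} - 22.3^2} = 10.56 \]
\[ SD_y = \sqrt{\frac{\Sigma y^2}{N} - \bar{y}^2} = \sqrt{\frac{48.75}{10} - 2.1^2} = 0.682 \]
\[ r = \frac{\dfrac{\Sigma xy}{N} - (\bar{x})(\bar{y})}{SD_x \, SD_y} = \frac{\dfrac{397.75}{10} - (22.3)(2.1)}{(10.56)(0.682)} \]
\[ r = -0.979 \]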
Point to Ponder: Why do you think we generated a negative r value?
Thus, we could say that the correlation between hours of study and grades of
students achieved a Pearson r value of -0.979. Do not be confused by the
negative sign in our final answer. This sign indicates the direction of the
correlation line. You should take into consideration that a grade of 1.0 has a strong
academic weight in our grading system, but once plugged into the computation it is
interpreted by the formula as a small number. Nevertheless, with full knowledge of the
concept you can always come up with the right interpretation.
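The hand computation of r can be reproduced from the ten data pairs. The following is a minimal sketch that follows the same formula as the worked solution (mean of the products minus the product of the means, divided by the product of the standard deviations); the names `hours`, `grades`, and `pearson_r` are our own.

```python
import math

hours = [15, 35, 5, 20, 30, 40, 20, 25, 25, 8]
grades = [2.75, 1.25, 3.00, 2.50, 1.50, 1.00, 2.25, 1.75, 2.00, 3.00]

def pearson_r(xs, ys):
    """Pearson r using the N-denominator standard deviations, as in the text."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sd_x = math.sqrt(sum(x * x for x in xs) / n - mean_x ** 2)
    sd_y = math.sqrt(sum(y * y for y in ys) / n - mean_y ** 2)
    mean_xy = sum(x * y for x, y in zip(xs, ys)) / n
    return (mean_xy - mean_x * mean_y) / (sd_x * sd_y)

print(round(pearson_r(hours, grades), 3))  # -0.979
```

The negative sign appears because more hours of study go with numerically smaller grades, and in this grading system a smaller number is the better grade.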
Since the analysis concerns only these 10 students and does not treat them as a
sample representing a larger population, Guilford’s suggested interpretation for the values of r can be used
without hindrance.
===============================================================
Does it mean that better grades can be achieved by spending more time studying?
Does it mean that spending more time studying is a by-product of better grades?
Does it mean that another factor influenced better grades and study habits?
All three of these explanations are possible. But the point is that correlation alone
is not enough to identify which is the real explanation. Pearson r is not a tool for
establishing causation. It is only a tool to describe the linear correlation between two
observed traits.
Lesson
5.7 The Least-Squares Regression Line
Specific Objective
Discussion
Bivariate data simply means that we can graphically represent two variables (x and y)
in a scatter plot wherein each point represents a pair of scores.
A scatter plot is necessary in order to determine the regression line. The regression
line is a generated straight line that lies closest to all the points in the scatter plot.
Our example below illustrates the construction of a scatter plot based on the
data from our previous example on hours of study
and grade.
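[Figure: scatter plot of hours of study versus grade with the fitted regression line; 𝑑1 to 𝑑10 mark the vertical deviations of the ten points from the line]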
As shown in the scatter plot above, the straight line is called the least-squares
regression line. This generated line minimizes the sum of the squares of the vertical
deviations from each data point to the line. This means that of all the possible lines
that can describe the trend of the points, this
generated line has the best fit. Each 𝑑𝑛 represents the vertical distance from the point (x, y) to
the line.
\[ d_1^2 + d_2^2 + d_3^2 + d_4^2 + d_5^2 + d_6^2 + d_7^2 + d_8^2 + d_9^2 + d_{10}^2 \]
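To make the idea of "minimizing the sum of squared deviations" concrete, the sketch below compares two candidate lines on the study-hours data. It is an illustration under stated assumptions: the first line uses the slope and intercept reported in this lesson (-0.06321 and 3.509), the second line (-0.05 and 3.3) is a hypothetical alternative chosen only for comparison, and the function name is our own.

```python
hours = [15, 35, 5, 20, 30, 40, 20, 25, 25, 8]
grades = [2.75, 1.25, 3.00, 2.50, 1.50, 1.00, 2.25, 1.75, 2.00, 3.00]

def sum_squared_deviations(slope, intercept):
    """Sum of squared vertical deviations of the data points from a line."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(hours, grades))

fitted = sum_squared_deviations(-0.06321, 3.509)  # line reported in this lesson
other = sum_squared_deviations(-0.05, 3.3)        # hypothetical alternative line

print(fitted < other)  # True: the least-squares line has the smaller sum
```

Any other straight line tried in place of the alternative would also give a sum at least as large as that of the exact least-squares line, which is what "best fit" means here.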
In the least-squares line, the correlation established around the
regression line is the basis for the resulting prediction. But in order to make predictions,
three important ingredients must be on hand: (1) the equation of the best-fit line,
(2) the slope of the line, and (3) the y-intercept of the line.
\[ y = mx + b \]
\[ m = \frac{n(\Sigma xy) - (\Sigma x)(\Sigma y)}{n(\Sigma x^2) - (\Sigma x)^2} \qquad b = \frac{(\Sigma y) - m(\Sigma x)}{n} \]
To apply this formula to our given data, we need to find the value of each
summation.
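\[ m = \frac{n(\Sigma xy) - (\Sigma x)(\Sigma y)}{n(\Sigma x^2) - (\Sigma x)^2} = \frac{10(397.75) - (223)(21)}{10(6089) - 49729} = -0.06321 \]
\[ b = \frac{(\Sigma y) - m(\Sigma x)}{n} = \frac{(21) - (-0.06321)(223)}{10} = 3.509 \]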
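The slope and intercept can be recomputed directly from the raw data with the same summation formulas. This is a minimal sketch; `least_squares_line` is a hypothetical function name, and the data lists repeat the ten pairs from the table.

```python
hours = [15, 35, 5, 20, 30, 40, 20, 25, 25, 8]
grades = [2.75, 1.25, 3.00, 2.50, 1.50, 1.00, 2.25, 1.75, 2.00, 3.00]

def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares regression line."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    slope = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    intercept = (sum_y - slope * sum_x) / n
    return slope, intercept

m, b = least_squares_line(hours, grades)
print(round(m, 5), round(b, 4))  # -0.06321 3.5096
```

The intercept computed at full precision is about 3.5096, which the worked solution reports as 3.509.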
Finally, substituting the values to the given formula:
𝒚𝒑𝒓𝒆𝒅 = 𝒎𝒙 + 𝒃
𝒚𝒑𝒓𝒆𝒅 = -0.06321x + 3.509
Slope(𝒎) = -0.06321
𝒚 intercept (𝒃) = 3.509
Since we have already determined the regression line, let us simply plug in
the necessary values of 𝑥 to obtain the predicted “𝑦”.
================================================================
𝒚𝒑𝒓𝒆𝒅 = 𝒎𝒙 + 𝒃
𝒚𝒑𝒓𝒆𝒅 = -0.06321x + 3.509
================================================================
𝒙        −𝟎.𝟎𝟔𝟑𝟐𝟏𝒙        −𝟎.𝟎𝟔𝟑𝟐𝟏𝒙 + 𝟑.𝟓𝟎𝟗 = 𝒚𝒑𝒓𝒆𝒅
37 -2.33877 1.17023
22 -1.39062 2.11838
08 -0.50568 3.00332
================================================================
The predicted grade of students is around 1.17 for the student who spends 37
hours of study, 2.12 for the one spending 22 hours of study, and just a passing grade
of 3.0 for the one engaged for eight hours of study.
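The prediction step is a direct substitution into the regression equation. The sketch below is illustrative; `predict_grade` is a hypothetical name, and the default slope and intercept are the rounded values from this lesson.

```python
def predict_grade(study_hours, slope=-0.06321, intercept=3.509):
    """Predicted grade from weekly hours of study, using y_pred = mx + b."""
    return slope * study_hours + intercept

for h in (37, 22, 8):
    print(f"{h} hours -> {predict_grade(h):.2f}")
# 37 hours -> 1.17
# 22 hours -> 2.12
# 8 hours -> 3.00
```

The line was fitted on students who studied between 5 and 40 hours per week, so predictions for values far outside that range should be treated with caution.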
For this culminating requirement in Module Five, you need to work together in
groups of 3 or 4.
1. Your task is to prepare a study proposal that can contribute to a solution to
any social problem.
2. You must use statistical methods for your data processing and analyses.
3. Your final output, which details your project proposal, must be no more than
8 pages.
4. Please follow the outline provided below:
a. Title page (not included in the page count)
-An example of a problem to be addressed: In this COVID-19 pandemic,
how can we reduce human traffic in wet market places?
b. Background and Statement of the Problem
c. Literature Review
d. Proposed Study with emphasis on how statistics will be used
i. Data to be collected
ii. Methods of data collection and data gathering instrument
iii. Data gathering procedure
iv. Method of Analyses
e. Discussion of how your project proposal can address the identified problem.
f. References (APA or MLA)
Chapter Test 5
Multiple Choice. Choose the letter of the correct answer and write it on the
blank provided at the left side of the test paper.
==================================================================
__________ 1. It is a branch of statistics that deals with data analysis, and one of its
techniques is to “describe” data in symbolic form and abbreviated fashion.
a. Inferential Statistics c. Descriptive Statistics
b. Statistics and Probability d. Probability
__________ 2. It is a branch of statistics that has the ability to “infer” and to generalize.
It is also the right tool to predict values that are not really known.
a. Inferential Statistics c. Descriptive Statistics
b. Statistics and Probability d. Probability
a. 53 c. 55
b. 54 d. 56
__________ 8. It is the middle point or midpoint of any distribution. It separates the upper
half from the lower half of the distribution.
a. Mean c. Mode
b. Median d. Range
__________ 9. The following is a list of retirement ages for the workers in a production
plant: 65, 64, 65, 61, 62, 64, 65, 63, 63, 65, 64. What is the median?
a. 64 c. 63
b. 65 d. 62
__________ 10. It is the most frequently occurring score in the distribution.
a. Mean c. Mode
b. Median d. Range
__________ 11. These three measures can provide the information about spread of the
scores in the distribution.
a. Mean, median and mode c. Range, standard deviation and variance
b. Mean, range, and variance d. Range, median and mode
a. 2.70 c. 2.72
b. 2.71 d. 2.74
__________ 14. What is the median?
a. 3.00 c. 1.75
b. 2.25 d. 2.00
__________ 15. If the standard deviation in a distribution is 4, what is the variance?
a. 8 c. 32
b. 16 d. 64
__________ 16. In comparing different groups, there must be a standard scale that can
reconcile both means and standard deviation in single standard form. It is
only then that direct comparison is possible because transformed scores
from different distributions will share common scores and these common
scores are called ____.
a. Percentile c. T-score
b. Quartile d. Z-score
__________ 17. Jerry took the College Admission Test and his score was at the 89th percentile.
What does it indicate?
a. 89% of those who took the exam scored lower than Jerry.
b. Out of 100 items of questions, Jerry had 11 mistakes.
c. Jerry answered 89 questions correctly.
d. 89% of those who took the exam scored higher than Jerry.
__________ 18. It divides the distribution into quarters.
a. Percentile c. T-score
b. Quartile d. Z-score
__________ 19. The third quartile Q3 is on what percentile rank?
a. 25th percentile c. 100th percentile
b. 50th percentile d. 75th percentile
Given the data chart of selected persons with their ages and daily incomes,
calculate the Pearson’s correlation coefficient.