
MMW (Module5)

Module Five of 'Mathematics in the Modern World' focuses on data management and the use of statistical tools to process numerical data for analysis and prediction. It covers various lessons including measures of central tendency, dispersion, and the importance of understanding statistics in decision-making. The module emphasizes the distinction between descriptive and inferential statistics, as well as the significance of different scales of measurement.

Uploaded by

gamarrochristine
Copyright
© © All Rights Reserved

MATHEMATICS IN THE MODERN WORLD

MODULE FIVE

DATA MANAGEMENT

CORE IDEA
Statistical tools derived from mathematics are useful in
processing and managing numerical data to describe a
phenomenon and predict values.
Learning Outcome:
5. Use a variety of statistical tools to process and manage
numerical data.
6. Use the methods of linear regression and correlations to
predict the value of a variable given certain conditions.
7. Advocate the use of statistical data in making important
decisions.

Unit Lessons:
Lesson 5.1 The Data
Lesson 5.2 Measures of Central Tendency
Lesson 5.3 Measures of Dispersion
Lesson 5.4 Measures of Relative Position
Lesson 5.5 Normal Distributions
Lesson 5.6 Linear Correlation
Lesson 5.7 Linear Regression

 Time Allotment: Ten lecture hours


Lesson
5.1 The Data

Specific Objectives

1. To understand the nature of statistics.
2. To gain deeper insights into the different levels of measurement.
3. To clarify the meaning of some important key concepts.
4. To explore the strengths and limitations of graphical
representation.

It is written in the Holy Book that “the truth shall set us free;” therefore,
understanding statistics paves the way towards intellectual freedom. For without
sufficient knowledge about it, we may be doomed to a life of half-truth. Statistics
will provide deeper insights to critically evaluate information and to bring us to the
well-lit arena of practicality.

Discussions

General Fields of Statistics: Descriptive Statistics and Inferential Statistics

Descriptive Statistics. If statistics in general deals with the analysis of data,
then descriptive statistics is the part of the field concerned with “describing” data
in symbolic forms and abbreviated fashions. Sometimes we deal with such a large
amount of data that it is impossible to describe it as it is, but descriptive statistics
provides tools that make the data manageable to handle and conveniently neat to
describe.

To explore the characteristics of descriptive statistics, let us create a fictitious
situation. What does it mean if someone tells you that the majority of workers earn
approximately P20,000.00 a month? Were you able to dissect the idea behind
the plain statement? Does it trigger your mind to question further?

This statement is a piece of information that describes a particular trait or
characteristic of a group of workers. Supplied with this single piece of information
but armed with statistical inquisitiveness, we can use descriptive statistics to
describe the given information further, to the extent of its depth and breadth.

Inferential Statistics. We could probably argue that descriptive statistics, with its
characteristic to describe, is sufficient to depict any given information. While it is
effective in describing data that can actually be collected, it cannot describe data
that are too numerous to gather in full. For this kind of situation, inferential
statistics is the alternative technique. Inferential statistics has the ability to
“infer” and to generalize, and it offers the right tools to predict values that are not
really known.

Let us consider the fictitious situation we made under descriptive statistics, but this
time, instead of reporting the approximate monthly earnings of some workers, we
want to determine the estimated monthly earnings of all the workers in a certain
region. If we attempted to apply descriptive statistics, it would be impossible to ask
all the workers in the entire region about their monthly income. But by using
inferential statistics, we would instead select just a small number of workers and
ask them about their monthly income. From there, we can predict or approximate
in a “more or less” fashion the monthly income of all workers in the entire region.

Of course, inference or generalization is a risky process; that is why we need to
ensure that the small group of workers we selected is approximately
representative of the workers in the entire region. Nevertheless, such an inference
or prediction is better than chance accuracy.
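The idea of inferring from a small group can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population of 10,000 workers, its income figures, and the sample size of 200 are all hypothetical and are generated at random; none of them come from the module.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: monthly incomes (in pesos) of 10,000 workers.
# These values are simulated, not real data.
population = [random.gauss(20_000, 3_000) for _ in range(10_000)]

# Inferential step: ask only a small random sample of the workers.
sample = random.sample(population, 200)

# In practice the population mean is unknown; it is computed here
# only to show how close the sample-based estimate comes to it.
population_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)

print(f"population mean: {population_mean:,.2f}")
print(f"sample mean:     {sample_mean:,.2f}")
```

Drawing the sample at random is what makes it approximately representative; a sample taken only from one workplace or one income bracket would not support the same generalization.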

Measurement

Measurement essentially means quantifying an observation according to a certain
rule. For instance, the presence of fever can be quantified by using a thermometer.
Body weight can be determined by using a weighing scale. Mental ability can be
quantified by using a written examination that generates scores. Sometimes the
quantification is done by simply counting. In quantifying an observation, there are
two types of quantitative information: variable and constant. A variable is
something that can be measured and observed to vary, while a constant is
something that does not vary and maintains a single value.

Scales of Measurement
- Nominal Scale : Categorical Data
- Ordinal Scale : Ranked Data
- Interval/Ratio Scale : Measurement Data

To quantify an observation, it is necessary to identify its scale of measurement,
also known as its level of measurement. The scale of measurement is the gateway
to the fascinating world of statistics. Without sufficient knowledge of it, all our
statistical learning leads nowhere.

Nominal Scale. It is concerned with categorical data. It simply means using numbers
to label categories; the data are obtained by counting the frequency of occurrence
within each category. One condition is that the categories must be independent or
mutually exclusive. This implies that once something is identified under a certain
category, it cannot at the same time be assigned to another category.

For example, suppose we want to measure a group of people according to marital
status. We can categorize marital status by simply assigning a number, for instance
“1” for single and “2” for married.

Marital Status: Single (1) and Married (2)

Obviously, those numbers only serve as labels and they do not carry any
numerical weight. Thus, we cannot say that married people (having been labelled
2) have more marital status than single people (having been labelled 1).

Ordinal Scale. It is concerned with ranked data. There are instances wherein
comparison is necessary and cannot be avoided. The ordinal scale ranks the
observations in order to generate information to the extent of “greater than”
or “less than.” But ranked data are also limited to that extent: they cannot tell
how much greater or how much less.

The ordinal scale is best illustrated in sports activities like a fun run. Finding the
order of finish among the participants in a fun run always produces a ranking.
However, ranked data cannot provide information as to the difference in time
between the 1st placer and the 2nd placer. Relative to this, reading reports with
ordinal information is also tricky. For example, a TV commercial extols a certain
brand for being the number one product in the country. This may seem acceptable,
but if you learn that there is no other product of its kind, then the message of the
commercial will definitely be received with a smirk.


Interval Scale. It deals with measurement data. In the nominal scale, we use
numbers to label categories, while in the ordinal scale we use numbers merely to
provide information regarding greater than or less than. In the interval scale,
however, we assign numbers in such a way that the distance between points on the
scale has meaning and weight. This scale of measurement provides more information
about the data. Consider the comparative illustration below:

Academic performance of five students in a certain class

                Student A   Student B   Student C   Student D   Student E
Interval Data   99          74          73          70          70
Ordinal Data    1st         2nd         3rd         4th         5th
Nominal Data    Passed      Failed      Failed      Failed      Failed

As you may have noticed, the interval scale provides substantial information about
the grades of the students. Student A earned a grade of 99, and so on and so forth.
Now look at the information given by the ordinal data. It is simply about ranking.
With this kind of information, Student B can proudly and rightfully claim the 2nd
place in the ranking. The ordinal scale is a trusted friend that keeps a secret: the
grade of Student B, though claiming 2nd place, is actually 74. Let us now analyze the
nominal data in our example. With this scale, the school can only announce, sadly,
that one student passed and four students failed. Nominal data cannot provide more
information, and in particular cannot put Student A in a brighter limelight. The
audience may assume that Student A got a grade just a little higher than the passing
mark, and Student A's grade of 99 will remain hidden forever.
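The loss of information from one scale to the next can be sketched in Python. The grades are those in the table above; the passing mark of 75 is an assumption made only for this illustration (the module does not state it), chosen because it reproduces the Passed/Failed row.

```python
# Interval data: the actual grades from the table above.
grades = {"A": 99, "B": 74, "C": 73, "D": 70, "E": 70}

PASSING_MARK = 75  # assumed for illustration; not stated in the module

# Ordinal data: keep only the order. Python's sort is stable, so the tie
# between D and E is broken by listing order, as in the table.
ordered = sorted(grades, key=grades.get, reverse=True)
ranks = {student: place for place, student in enumerate(ordered, start=1)}

# Nominal data: keep only the category.
labels = {
    student: "Passed" if grade >= PASSING_MARK else "Failed"
    for student, grade in grades.items()
}

for student in grades:
    print(student, grades[student], ranks[student], labels[student])
```

Going from grades to ranks to labels only ever discards information: the ranks cannot be turned back into grades, and the labels cannot be turned back into ranks.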

Ratio Scale. This is an extension of the interval scale. It also pertains to
measurement data, but the ratio scale additionally assumes an absolute zero point,
so that ratios between values are meaningful. Because of this, we oftentimes cannot
utilize the ratio scale in the social sciences. We cannot justify an absolute zero
with which to gauge intelligence. We cannot say that Student A, with a grade of 99,
has an intelligence several times superior to that of Student E, who earned a grade
of 70.


Key Concepts in Statistics

Population. A population can be defined as an entire group of people, things, or
events having at least one trait in common (Sprinthall, 1994). A common trait is the
binding factor that lets us group a cluster and call it a population. Merely having a
cluster of people, things, or events is not enough to call it a population. At least
one common trait must be established to make a population. On the other hand,
adding too many common traits limits the size of the population. In the illustration
below, notice how each added trait can severely reduce the size or membership of
the population.

A group of students (this is a population, since the common trait is “students”)
A group of male students
A group of male students attending the Statistics class
A group of male students attending the Statistics class with an iPhone
A group of male students attending the Statistics class with an iPhone and earphones

As we read the list, we can mentally visualize that the size of the population is
dramatically becoming smaller and as we add more traits we may wonder if anyone
still qualifies. The more common traits we add, the more we reduce the designated
population.

Parameter. In gauging the entire population, any measure obtained is called a
parameter. If someone asks you what the parameter of a study is, bear in mind that
he or she is referring to a measure that describes the entire population. In
situations where the entire population is difficult to measure, the parameters are
stated in the form of an estimate or inference.

Sample. The small number of observations taken from the total number making up
a population is called a sample. As long as the observations or data are not the
totality of the entire population, they are always considered a sample. For instance,
in a population of 100, 1 is considered a sample. 30 is clearly a sample. It may
seem absurd, but 99 taken from 100 is still considered a sample. Not until we include
that last number (making it 100) could we claim that it is already a population and
no longer a sample.

Statistic. In gauging the sample, any measure obtained from the sample is called a
statistic. Whenever a measure describes the sample, it is a statistic. Since a
sample is easier to observe or gather than the population, statistics are
simpler to obtain than parameters.

Graphical representation

Graphs. A graph is another way to show the behavior of data visually. To create a
graph, the distribution of scores must first be organized. For instance, presenting
the scores below in an unorganized manner provides confusing information or none
at all; reporting raw scores can even keep some significant scores from being
noticed.
120, 65, 110, 75, 105, 80, 105,
85, 100, 85, 100, 90, 95, 90, 90

But when we arrange the scores from highest to lowest, which is a form of score
distribution, some pieces of information are gradually brought forth and exposed.

Distribution of Scores
120
110
105
105
100
100
95
90
90
90
85
85
80
75
65

The score distribution can be organized further in the form of a frequency
distribution. A frequency distribution lists each raw score together with its
frequency of occurrence, and it provides clearer insights about the behavior of the
scores.

X              f
(Raw score)    (Frequency of Occurrence)
------------------------------------------
120            1
110            1
105            2
100            2
95             1
90             3
85             2
80             1
75             1
65             1
------------------------------------------
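The same frequency distribution can be tallied by a short Python sketch. It uses the 15 raw scores listed earlier in this lesson and the standard-library `Counter`; the variable names are only illustrative.

```python
from collections import Counter

# The 15 raw scores from the lesson, in their original unorganized order.
scores = [120, 65, 110, 75, 105, 80, 105,
          85, 100, 85, 100, 90, 95, 90, 90]

# Count how many times each raw score occurs.
frequency = Counter(scores)

# Print the table from the highest raw score to the lowest.
print(f"{'X':>5} {'f':>3}")
for raw_score in sorted(frequency, reverse=True):
    print(f"{raw_score:>5} {frequency[raw_score]:>3}")
```

The frequencies add up to the number of scores, which is a quick check that no score was lost in the tally.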

Another way of presenting a frequency distribution is in graphical form, such as a
histogram. A graph has the advantage of showing a visual representation of the
data. This kind of presentation is more appealing to the general audience.

[Figure: histogram of the scores; horizontal axis: Raw scores (60 to 130); vertical axis: Frequency of Occurrence]


The data can also be graphed by using Microsoft Excel, as illustrated in the graphs
below. They show the frequency polygon of the scores in our example above.

Notice that in the illustration of the frequency polygon, the two graphs may appear
different, but they are actually the same and they disclose the same information.
This illustration should make you realize that unless you see things with a critical
eye, a graph can create a false impression of what the data really reveal. It is an
obvious case of how graphs can be used to distort reality if you are not equipped
with a critical statistical mind. This type of deceitful cleverness in distorting graphs
is common among some corporations, which use it as tinsel to camouflage weak
results or to portray gigantic leaps in sales in order to attract more clients or
buyers.


Lesson
5.2 Measures of Central Tendency

Specific Objectives

1. To know the different measures of central tendency.
2. To comprehend the limitations of the three measures.
3. To realize the effect of the measures in the distribution.
4. To critically know how to select appropriate measure to
describe a certain distribution.

Discussion

As we venture into the realm of descriptive statistics, let us now focus on describing
the nature of quantitative data. By using an appropriate descriptive technique,
we can organize and neatly summarize small and large data distributions alike.
The procedure, utilizing measures of central tendency, allows us to describe the
centrality of a data distribution precisely.

Measures of central tendency are methods that can be used to determine information
regarding the average, ranking, and category of any data distribution. The mean,
median, and mode are the three tools for obtaining measures of central tendency.
But only by knowing and using the appropriate tool can the most accurate estimate
of centrality be achieved. The objective of the measures of central tendency is to
describe the centrality of the distribution in a single numerical value. This single
value must provide a clear description of the common trait being observed in the
distribution of scores.


The Mean

The most widely used measure of central tendency is the mean ($\bar{X}$). It is
the arithmetic average of all the scores. The mean can be determined by adding all
the scores together and then by dividing by the total number of scores. The basic
formula for the mean is as follows:

$$\bar{X} = \frac{\sum X}{N}$$

where $\bar{X}$ is the mean, $\sum$ is the operational term “summation,” indicating
to add all measures of $X$, $X$ stands for the raw scores, and $N$ is the entire
number of observations being dealt with.

In the example below concerning the annual income of 12 workers, the mean can
be found by calculating the average score of the distribution.
X
===========================
Php 200,000.00
200,000.00
195,000.00
194,000.00
194,000.00
194,000.00
193,000.00
190,000.00
185,000.00
180,000.00
180,000.00
176,000.00
===========================
$\sum X$ = Php 2,281,000.00

$$\bar{X} = \frac{\sum X}{N} = \frac{2{,}281{,}000.00}{12} = \text{Php } 190{,}083.33$$

In this example, the mean is an appropriate measure of central tendency because
the distribution is fairly well-balanced. This means that there are no extremely high
or extremely low scores in either direction that can unusually influence the average
of the scores. Thus, the mean value of 190,083.33 represents the total picture of
the distribution (i.e., annual incomes). This means that in a “more or less” or
approximate fashion it describes the entire distribution.
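The arithmetic can be checked with a short Python sketch that follows the definition of the mean step by step, using the annual incomes of the 12 workers listed above. The variable names are only illustrative.

```python
# Annual incomes (in pesos) of the 12 workers in the example.
incomes = [200_000, 200_000, 195_000, 194_000, 194_000, 194_000,
           193_000, 190_000, 185_000, 180_000, 180_000, 176_000]

total = sum(incomes)   # the summation of all raw scores
n = len(incomes)       # the number of observations, N
mean = total / n       # the arithmetic average

print(f"Sum  = Php {total:,.2f}")
print(f"N    = {n}")
print(f"Mean = Php {mean:,.2f}")
```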

Mean of Skewed Distribution. There are situations wherein the mean cannot be
trusted to provide a measure of central tendency because it portrays an extremely
distorted picture of the average value of a distribution of scores. For instance, let
us still consider our example of annual incomes, but this time with some
adjustment. Let us introduce another score: the annual income of an affluent new
neighbor who moved to this town just recently. This new neighbor has an annual
income extremely far above the others.

X
===========================
Php 2,500,000.00 (new neighbor)
200,000.00
200,000.00
195,000.00
194,000.00
194,000.00
194,000.00
193,000.00
190,000.00
185,000.00
180,000.00
180,000.00
176,000.00
===========================
$\sum X$ = Php 4,781,000.00

$$\bar{X} = \frac{\sum X}{N} = \frac{4{,}781{,}000.00}{13} = \text{Php } 367{,}769.23$$

As you may have noticed, the mean income of Php 367,769.23 this time provides
a highly misleading picture of great prosperity for this neighborhood. The
distribution was unbalanced by the extreme score of the new affluent neighbor. This
is what we call a skewed distribution.

Here are some graphic illustrations of a skewed distribution:

When the tail goes to the right, the curve is positively skewed; when it goes to the
left, it is negatively skewed. The skew is in the direction of the tail-off of scores, not
of the majority of scores. The mean is always pulled toward the extreme score in a
skewed distribution. When the extreme score is at the low end, then the mean is
too low to reflect centrality. When the extreme score is at the high end, the mean
is too high.

The Median

The median is the point that separates the upper half from the lower half of the
distribution. It is the middle point or midpoint of any distribution. If the
distribution is made up of an even number of scores, the median can be found by
determining the point that lies halfway between the two middlemost scores.

193,000.00
190,000.00
185,000.00
180,000.00

$$\text{Median} = \frac{190{,}000 + 185{,}000}{2} = 187{,}500$$

Arranging scores to form a distribution means listing them sequentially, either
highest to lowest or lowest to highest. Unlike the mean, the median is not unduly
affected by the extreme scores of a skewed distribution. Whenever the mean cannot
provide centrality because of extreme scores present, the median can be used to
provide a more accurate representation.

Calculation of the Median

X
===========================
Php 2,500,000.00 <-- extreme score
200,000.00
200,000.00
195,000.00
194,000.00
194,000.00
194,000.00 <-- Median (the 7th of the 13 scores)
193,000.00
190,000.00
185,000.00
180,000.00
180,000.00
176,000.00
===========================

As you observed, even with the presence of an extreme score at the high end of the
distribution, the value of the median is hardly disturbed: it moves only from
193,500.00 (the midpoint of the original 12 scores) to 194,000.00.
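A brief Python sketch makes the contrast between the two measures concrete. It uses the same annual incomes as the example and the standard-library `statistics` module; the variable names are only illustrative.

```python
import statistics

# The original 12 annual incomes (in pesos).
incomes = [200_000, 200_000, 195_000, 194_000, 194_000, 194_000,
           193_000, 190_000, 185_000, 180_000, 180_000, 176_000]

# The same distribution after the affluent new neighbor moves in.
with_neighbor = incomes + [2_500_000]

for label, data in [("12 workers", incomes), ("with neighbor", with_neighbor)]:
    mean = statistics.mean(data)
    median = statistics.median(data)  # sorts the data internally
    print(f"{label:>14}: mean = {mean:,.2f}   median = {median:,.2f}")
```

One extreme score nearly doubles the mean, while the median shifts by only Php 500.00.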

The Mode

Another measure of central tendency is called the mode. It is the most frequently
occurring score in a distribution. In a histogram, the mode is always located
beneath the tallest bar.


Finding the mode of a distribution of raw scores (Annual Income)

X
===========================
Php 2, 500,000.00
200,000.00
200,000.00
195,000.00
194,000.00
194,000.00 Mode
194,000.00
193,000.00
190,000.00
185,000.00
180,000.00
180,000.00
176,000.00
===========================

The mode provides an extremely fast way of gauging the centrality of the
distribution. You can immediately spot the mode by simply looking at the data and
finding the dominant value, that is, the most frequently occurring score.
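For longer lists, spotting the mode by eye becomes unreliable, so the count can be left to Python. The sketch below uses the 13 annual incomes from the example; `statistics.multimode` (available from Python 3.8) is included because a distribution can have more than one mode.

```python
import statistics

# The 13 annual incomes (in pesos), including the new neighbor.
incomes = [2_500_000, 200_000, 200_000, 195_000, 194_000, 194_000,
           194_000, 193_000, 190_000, 185_000, 180_000, 180_000, 176_000]

mode = statistics.mode(incomes)            # the most frequent score
all_modes = statistics.multimode(incomes)  # every score tied for most frequent

print(f"Mode = Php {mode:,.2f}")
print(f"All modes = {all_modes}")
```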

Appropriate Use of the Mean, Median and Mode

The best way to illustrate the comparative applicability of the mean, median and
mode is to look again at the skewed distribution.

[Figure: positively skewed curve of monthly household income; vertical axis: Frequency of Occurrence. Marked points: Mode = 10,000; Median = 20,000; Mean = 100,000]

Distribution of monthly income per household in a certain municipality.

Income distributions are usually skewed to the right because the low end has a fixed
limit of zero while the high end has no limit. If we consider the area under the curve
to be 100 percent, then the median is the exact midpoint of the distribution: the
areas below and above the median are each equal to 50 percent. Thus, if the median
income is P20,000.00, this means that 50% of the households have an income below
P20,000.00 and 50% of the households have an income above P20,000.00. On the
other hand, the mean in our figure above indicates a high income of P100,000.00,
which reflects the positive skew of the curve. The value of the mean gives a
distorted picture of reality because it is unduly influenced by a few affluent income
earners at the high end of the curve whose monthly income is around P500,000.00.
The modal income, which is P10,000.00 per month, also seems to distort reality, this
time towards the low side. The mode is always the highest point of the curve. In this
example, the mode represents the most frequently earned income; it is far lower
than the median income of P20,000.00. Both the mean and the mode give a false
portrait of what is typical in the distribution, and the truth lies somewhere in
between.


Effects of the Scale of Measurement Used

The scale of measurement on which the data are based oftentimes dictates the
measure of central tendency to be used. Interval data allow the calculation of all
three measures of central tendency. Nominal and ordinal data cannot be used to
calculate the mean; a mean computed from ordinal data can give an extremely
confusing and wrong result. Since the median is about ranking (it is the point above
which half of the scores fall and below which the other half fall), an ordinal
arrangement is all that is needed to find the median. For nominal data, however,
neither the mean nor the median can be used. Nominal data simply use a number as
a label for a category, and the only measure of central tendency permissible for
nominal data is the mode.

In summary, if the interval data distribution is fairly well balanced, it is appropriate
to use the mean to measure the central tendency. If the distribution of the interval
data is skewed, you may either remove the outlier or adopt the median. If the
interval data distribution manifests a significant clustering of scores, then consider
analyzing the scores visually to find the dominant value, which is the mode.


Lesson
5.3 Measures of Dispersion

Specific Objectives

1. To know the different measures of variability.
2. To comprehend the strengths of the three measures.
3. To realize the effect of the measures in the distribution.
4. To critically select an appropriate tool for a certain situation.

The measures of central tendency only provide information about the similarity or
typicality of scores. But to fully describe the distribution, we need to gain
information about how scores differ or vary. The description of the distribution can
only be complete if some information of its variability is known. To substantiate the
information provided by the measures of centrality, some degree of dispersion
must also be brought into the light.

Discussion

Measures of Variability

There are three measures of variability: the range, the standard deviation, and the
variance. These three measures give information about the spread of the scores in
a distribution. Metaphorically, variability asserts that a glass half-full is also
half-empty. Being half-full is about centrality and being half-empty is about
variability.

The Range. The range, symbolized by R, describes the variability of scores by
merely providing the width of the entire distribution. The range can be found by
simply determining the difference between the highest score and the lowest score.
This difference is always a single value.

The example below shows the calculation of the range from a distribution of annual
incomes:

X
===========================
Php 200,000.00   Highest Score (HS)
    200,000.00
    195,000.00
    194,000.00
    194,000.00
    194,000.00
    193,000.00
    190,000.00
    185,000.00
    180,000.00
    180,000.00
    176,000.00   Lowest Score (LS)
===========================

$$R = HS - LS = 200{,}000 - 176{,}000 = 24{,}000$$

The strength of the range is that it gives information about the scattering of the
scores by merely using the two extreme points. On the other hand, this same
reliance on two points is a severe limitation. If you add new scores that fall within
the distribution, the range reports no change in variability. Conversely, adding just
one extreme score to an otherwise ordinary distribution changes the range
considerably, even if nothing else within the distribution has changed. The range is
therefore not stable enough to indicate variability. Nevertheless, it is still one
method of gauging the variability of any given distribution.
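Both limitations can be seen in a small Python sketch. The 12 annual incomes are those of the example; the two interior scores added (188,000 and 191,000) are hypothetical values chosen only to fall inside the distribution, and the extreme score is the new neighbor's income from Lesson 5.2.

```python
def score_range(scores):
    """Range R = highest score minus lowest score."""
    return max(scores) - min(scores)

# Annual incomes (in pesos) of the 12 workers.
incomes = [200_000, 200_000, 195_000, 194_000, 194_000, 194_000,
           193_000, 190_000, 185_000, 180_000, 180_000, 176_000]

base = score_range(incomes)

# Hypothetical extra scores that fall inside the distribution:
# the range does not notice them at all.
with_interior = score_range(incomes + [188_000, 191_000])

# A single extreme score changes the range drastically.
with_extreme = score_range(incomes + [2_500_000])

print(base, with_interior, with_extreme)
```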

The Standard Deviation. The standard deviation (SD) is the lifeblood of the
variability concept. It measures how much the scores in the distribution typically
differ from the mean of the distribution. Unlike the range, which utilizes only two
extreme scores, the SD employs every score in the distribution. It is computed with
reference to the mean (not the median or the mode), and it requires that the scores
be in interval form.

A distribution with a small standard deviation shows that the trait being measured
is homogeneous, while a distribution with a large standard deviation indicates that
the trait being measured is heterogeneous. A distribution with zero standard
deviation implies that the scores are all the same (e.g., 10, 10, 10, 10, 10). Although
it may seem like stating the obvious, it is important to note that if all the scores are
the same, there is no dispersion, no deviation, and no scattering of scores in the
distribution. Consequently, there can never be less than zero variability.

In calculating the standard deviation, we can either use the computational method
or the deviation method. Both methods provide the same answer. However, in this
lesson, we will use the computational method because it is designed for electronic
calculators.

The formula for the computational method is provided below:

$$SD = \sqrt{\frac{\sum X^2}{N} - \bar{X}^2}$$

The raw score in a distribution is symbolized as $X$

The mean of a distribution is symbolized as $\bar{X}$

The number of scores in a distribution is symbolized as $N$

The formula simply states that the standard deviation (SD) is equal to the square
root of the difference between the sum of the squared raw scores divided by the
number of cases and the squared mean (Sprinthall, 1994). Below is an example of
how to obtain the standard deviation using the computational method, applied to
the annual incomes of the 12 workers from Lesson 5.2.

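$$SD = \sqrt{\frac{434{,}283{,}000{,}000}{12} - (190{,}083.33\ldots)^2}$$

$$SD \approx 7653.521$$

(The mean is carried unrounded, as 2,281,000 / 12; squaring the rounded value
190,083 gives a slightly different result.)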
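The computational method can also be followed step by step in Python. This is a sketch using the 12 annual incomes from Lesson 5.2; the comparison with `statistics.pstdev` (the population standard deviation, which divides by N) is added only as a cross-check and is not part of the module.

```python
import math
import statistics

# Annual incomes (in pesos) of the 12 workers.
incomes = [200_000, 200_000, 195_000, 194_000, 194_000, 194_000,
           193_000, 190_000, 185_000, 180_000, 180_000, 176_000]

n = len(incomes)
sum_of_squares = sum(x * x for x in incomes)   # sum of squared raw scores
mean = sum(incomes) / n                        # kept unrounded

# Computational method: SD = sqrt(sum(X^2) / N - mean^2)
sd = math.sqrt(sum_of_squares / n - mean ** 2)

print(f"Sum of X^2 = {sum_of_squares:,}")
print(f"SD         = {sd:.3f}")
print(f"pstdev     = {statistics.pstdev(incomes):.3f}")
```

Note that this formula divides by N, so it matches `statistics.pstdev`, not `statistics.stdev`, which divides by N - 1.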

Computer Note: Exploring MS Excel to find the value of SD


The concept of standard deviation can be clarified further by using an illustration
of the score distributions of students in Section A and in Section B, assuming that
both distributions (Section A scores and Section B scores) have precisely the same
measures of central tendency and the same range. The only difference between
these two distributions is in their standard deviations, Section A having a value
that is greater than the value of Section B. The data are shown in the figure below.

                 Section A      Section B
Math Quiz        Scores         Scores
Mean             100            100
Median           100            100
Mode             100            100
N                30             30
HS, LS, Range    130, 70, 60    130, 70, 60
SD               10             2

[Figure: frequency curves for Section A and Section B; horizontal axis: scores from 70 to 130, centered at 100; vertical axis: Frequency of occurrence]

Two Frequency Distributions of Scores

As can be noticed in the figure above, there is just a slight bulge in the middle of
the distribution of Section A. This means that it has many scores deviating widely
from the mean (100), which is reflected in its large standard deviation (10).
In Section B, which has a smaller standard deviation (2), most of the scores
gather closely around the mean (100), thereby creating a towering lump. Comparing
the two distributions reveals the disparity in standard deviation between the two
sections. Section A, having a large standard deviation, behaves in a heterogeneous
manner, while Section B, having a small standard deviation, behaves in a
homogeneous way.
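The same situation can be reproduced on a small scale in Python. The two lists below are hypothetical scores invented for this sketch (ten per section rather than thirty, so their standard deviations are not the 10 and 2 of the figure); they are constructed so that the mean, median, mode, and range agree while the spread differs.

```python
import statistics

# Hypothetical quiz scores, constructed for illustration only.
section_a = [70, 80, 85, 90, 100, 100, 110, 115, 120, 130]  # spread out
section_b = [70, 100, 100, 100, 100, 100, 100, 100, 100, 130]  # clustered

for name, scores in [("Section A", section_a), ("Section B", section_b)]:
    print(
        name,
        "mean =", statistics.mean(scores),
        "median =", statistics.median(scores),
        "mode =", statistics.mode(scores),
        "range =", max(scores) - min(scores),
        "SD =", round(statistics.pstdev(scores), 2),
    )
```

Every measure of central tendency and the range report the two sections as identical; only the standard deviation reveals that Section A is the more heterogeneous group.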

The Variance. Variance is another technique for assessing disparity in a
distribution. In the simplest sense, variance is the square of the standard deviation.
The formula is illustrated below:

$$V = SD^2 = \frac{\sum X^2}{N} - \bar{X}^2 = \frac{\sum x^2}{N}$$

where $X$ is any raw score in the distribution and $x$ is the deviation score, equal
to the raw score $X$ minus the mean $\bar{X}$: $x = X - \bar{X}$.
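A short derivation, added here as a check rather than taken from the module, shows why the raw-score form and the deviation-score form of the variance agree. It uses only the definitions $x = X - \bar{X}$ and $\bar{X} = \sum X / N$:

```latex
\begin{aligned}
\frac{\sum x^2}{N}
  &= \frac{\sum (X - \bar{X})^2}{N} \\
  &= \frac{\sum X^2}{N} - 2\bar{X}\,\frac{\sum X}{N} + \frac{N\bar{X}^2}{N} \\
  &= \frac{\sum X^2}{N} - 2\bar{X}^2 + \bar{X}^2 \\
  &= \frac{\sum X^2}{N} - \bar{X}^2
\end{aligned}
```

For the annual incomes of the 12 workers, this gives $V = 36{,}190{,}250{,}000 - 36{,}131{,}673{,}611.11 \approx 58{,}576{,}388.89$, whose square root is the standard deviation of about 7653.521 obtained earlier.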

Conceptually, variance conveys the same information as standard deviation. If both
the standard deviation and the variance have large values, the distribution is
heterogeneous; when they both have small values, they likewise agree that the
distribution is homogeneous.

While standard deviation finds out how to spread out the distribution scores from
the mean by exploring the square root of the variance, the variance, on the other
hand, calculates the average degree by which each score differs from the mean -
i.e. the average of all the scores in the distribution. It may appear to be unnecessary
to study variance where, in fact, standard deviation seems complete. But there are
situations wherein it is more efficient to work directly with variances than to
frequently make courtesy appearances to the standard deviation. In fact, F Ratio
takes full utilization of this special property of variability.
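
The two forms of the variance formula can be checked against each other numerically. The following is a minimal sketch in Python; the five scores are hypothetical values chosen only for illustration, and the function name is an assumption rather than part of the module.

```python
import math

def variance_and_sd(scores):
    """Return (variance from deviation scores, variance from raw scores, SD)."""
    n = len(scores)
    mean = sum(scores) / n
    # Deviation-score form: average of the squared deviations from the mean.
    v_dev = sum((x - mean) ** 2 for x in scores) / n
    # Raw-score form: mean of the squared scores minus the squared mean.
    v_raw = sum(x * x for x in scores) / n - mean ** 2
    # The standard deviation is the square root of the variance.
    return v_dev, v_raw, math.sqrt(v_dev)

scores = [70, 85, 100, 115, 130]  # hypothetical scores
v_dev, v_raw, sd = variance_and_sd(scores)
print(v_dev, v_raw, round(sd, 2))
```

Both forms should agree up to rounding, which is why either one may be used in hand computation.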


Lesson
5.4 Measures of Relative Position

Specific Objectives

1. To gain deeper understanding about the Z-score


2. To realize the important role of percentile, and quartile in a
distribution
3. To interpret the analysis reported by box-and-whisker plots

In the previous lessons, we demonstrated two separate but related measures
that show the characteristics of the scores in a distribution: the measures of
central tendency and the measures of variability. In this lesson, we explore
the possibilities that might occur in the relationship between centrality
and variability (i.e., mean and standard deviation). Let us consider two
distributions and the different scenarios that might occur in comparing their
respective means and standard deviations.

Discussion
The z-Score

Case A

𝜇1 = 𝜇2
𝜎1 = 𝜎2

As shown in Case A, it is possible that two distributions can generate almost the
same means (𝜇) and almost the same standard deviations (𝜎).


Case B

𝜇1 ≠ 𝜇2
𝜎1 = 𝜎2

It is also possible that two distributions have different means (𝜇) but similar
standard deviations (𝜎).

Case C

𝜇1 = 𝜇2
𝜎1 ≠ 𝜎2

Here in Case C, the two distributions have the same means (𝜇) but they differ
in standard deviation (𝜎).

Case D

𝜇1 ≠ 𝜇2
𝜎1 ≠ 𝜎2

In Case D, the distributions differ in terms of means (𝜇) and in terms of
standard deviations (𝜎).

This preliminary discussion shows that comparing two distributions is
complex. Several scenarios must be considered. Sometimes two distributions differ
in terms of means, and sometimes they differ in terms of standard deviations.
Groups usually differ in terms of centrality as well as in terms of disparity.
Thus, in order to compare two different groups, there must be a common scale that
can reconcile both means and standard deviations in a single standard form. It is
only when we convert scores from different distributions to a common score that
direct comparison is possible. This common score is called the z-score.
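Below are the formulas for finding the z-score.

$$z = \frac{X - \mu}{\sigma} \qquad\qquad z = \frac{X - \bar{x}}{s}$$

Left formula (population): $X$ refers to a raw score from the population, $\mu$ pertains to the mean of the population, and $\sigma$ is the population standard deviation.
Right formula (sample): $X$ refers to a raw score from the sample, $\bar{x}$ pertains to the mean of the sample, and $s$ is the estimated standard deviation.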

Both formulas express the same relationship among the raw score, the mean, and the
standard deviation. The only distinction between the two formulas is whether
the distribution was generated from the population or from the sample. The
formula on the left gives the z-scores from the population, while the formula on
the right gives the z-scores from the sample.

$$z = \frac{X - \mu}{\sigma}$$

The formula shows that the mean and standard deviation can be combined to
transform a raw score ($X$) into a standard score ($z$). The z-score equation,
$z = \frac{X - \mu}{\sigma}$, converts the raw score of any group into a common
value, and it enables comparison between scores coming from different
distributions. Below is an illustration of a standardized scale. As you may
notice in this z-scale, the mean is always zero and the standard deviation is
always one unit.


𝜇=0

𝜎=1

To further clarify the concept of the z-score, let us assume that you are taking
physics and biology courses. In your final examinations, you earned a grade of 95
in physics and 85 in biology. Now the question is: In which exam did you do better?

Based on the face value of the scores, it seems obvious that you did better in
physics than in biology. But to make a serious comparison between the two tests,
we must consider how well your classmates performed as a whole group. This
requires additional information about the mean and standard deviation of both the
physics and biology groups. Let us assume that we have that information, as
follows:

            𝜇 (population mean)    𝜎 (population SD)
Physics     85                     10
Biology     75                     5

Now, let us substitute that information into the z-score formula and compute
the z-score values.

Physics: $Z_p = \frac{X_p - \mu_p}{\sigma_p} = \frac{95 - 85}{10} = 1.0$

Biology: $Z_b = \frac{X_b - \mu_b}{\sigma_b} = \frac{85 - 75}{5} = 2.0$
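
The same comparison can be sketched in a few lines of Python. This is only an illustration of the population z-score formula applied to the physics and biology figures given above; the function and variable names are assumptions.

```python
def z_score(raw, mean, sd):
    """Convert a raw score to a z-score: (raw - mean) / sd."""
    return (raw - mean) / sd

# Values from the example: raw score, population mean, population SD.
z_physics = z_score(95, 85, 10)
z_biology = z_score(85, 75, 5)

# The exam with the larger z-score is the better relative performance.
better = "biology" if z_biology > z_physics else "physics"
print(z_physics, z_biology, better)
```

Because both scores are expressed in standard deviation units, they can be compared directly even though the two exams have different means and spreads.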


Finally, let us place these z-score values on a z-scale to clearly illustrate the
measures.

                                 Zp = 1.0  Zb = 2.0
          |_____|_____|_____|_____|_____|_____|_____|_____|
Z        -4    -3    -2    -1     0    +1    +2    +3    +4
Physics  45    55    65    75    85    95   105   115   125
Biology  55    60    65    70    75    80    85    90    95

Notice that in the illustration, we can clearly compare the relative position of
the scores on one standardized scale. Notice also that the means of both subjects
are reconciled to a common mean of 0 (𝜇 = 0). Likewise, the standard deviations of
both subjects are calibrated to a unit of one (𝜎 = 1). Thus, your final
examination scores can now be compared. As displayed, your score of 95 in physics
falls directly beneath 1.0 on the z-scale, and your score of 85 in biology falls
directly beneath 2.0. It is clear that, relative to your classmates, you did much
better in the biology exam (Zb = 2.0) than in the physics exam (Zp = 1.0),
contrary to our first impression. This example is only a glimpse to show that
standardized scores are building blocks that provide the foundation for
inferential statistics.

Percentile

To locate a specific point in any distribution, percentiles, quartiles and deciles are
the tools that can be used. The relative position of the raw score can be described
precisely by converting it into a percentile. A percentile refers to a point in the
distribution below which a given percentage of scores fall.


Figure: a distribution with the 3rd percentile and the 97th percentile marked.

Based on the figure above, a score at the 97th percentile (P97) is at the very high
end of the distribution because an enormous share (97%) of the scores are below
that point. A score at the 3rd percentile (P3), however, is an extremely low score
because only 3% of the scores are below that point. The figure above also shows
that the 50th percentile divides the distribution exactly in half. The position of
the 50th percentile is also the location of the median.

To better understand the role of the percentile, let us assume that your College
Admission Test result reflected a 97th percentile score. This does not indicate
that out of 100 questions, you made around three mistakes. Instead, it means that
97% of those who took the exam scored lower than you. The remaining 3% performed
as well as or better than you.

The percentile of any given data value (x) can be determined by dividing the
number of data values less than x by the total number of data values, and then
multiplying the result by 100. For instance, consider a College Admission
Test administered to 5000 students, in which your score of 800 was higher than the
scores of 4000 examinees. With this information, we can determine the percentile
of your score by using the formula:

$$\text{Percentile of score } x = \frac{\text{number of examinees who did worse than you}}{\text{total number of examinees}} \times 100 = \frac{4000}{5000} \times 100 = 80$$

Your score of 800 places you at the 80th percentile.
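
The percentile computation can be sketched in Python as follows. The list of scores is hypothetical: it is built only so that exactly 4000 of the 5000 examinees score below 800, matching the example, and the function name is an assumption.

```python
def percentile_rank(scores, x):
    """Percentage of scores strictly below x."""
    below = sum(1 for s in scores if s < x)
    return 100 * below / len(scores)

# Hypothetical data: 4000 examinees below 800, one at 800, 999 above 800.
scores = [700] * 4000 + [800] + [900] * 999
rank = percentile_rank(scores, 800)
print(rank)
```

Only the count of scores below x matters here, not how far below they are.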

Quartiles. As the name implies, quartiles divide the distribution into quarters.

Figure: a distribution divided into quarters by Q1, Q2, and Q3.

The first quartile, Q1, is at the 25th percentile. The second quartile, Q2,
coincides with the median, which is at the 50th percentile. The third quartile, Q3,
is at the 75th percentile. Each quartile can be located by using the following
positions, where n is the number of scores:

For Q1: The value of x is in the position .25 (n+1)
For Q2: The value of x is in the position .50 (n+1)
For Q3: The value of x is in the position .75 (n+1)


Let us consider this example and determine Q1, Q2, and Q3.

X
===========================
Php 200,000.00
200,000.00
195,000.00
194,000.00
193,000.00
192,000.00
191,000.00
190,000.00
185,000.00
181,000.00
180,000.00
176,000.00
===========================
First, make sure that the scores are arranged in order (here, from highest to
lowest). The positions used below are counted starting from the lowest score.

1. Calculating the 1st quartile (Q1), or the 25th percentile

The x score is in the position .25 (n+1):

Position of Q1 = .25 (n+1)
               = .25 (12+1)
               = 3.25

The value of x corresponding to this position is 181,000 + .25 (185,000 − 181,000).

Thus, Q1 = 182,000


2. Calculating the 2nd quartile (Q2), or the 50th percentile

The x score is in the position .50 (n+1):

Position of Q2 = .50 (n+1)
               = .50 (12+1)
               = 6.5

The value of x corresponding to this position is 191,000 + .50 (192,000 − 191,000).

Thus, Q2 = 191,500

3. Calculating the 3rd quartile (Q3), or the 75th percentile

The x score is in the position .75 (n+1):

Position of Q3 = .75 (n+1)
               = .75 (12+1)
               = 9.75

The value of x corresponding to this position is 194,000 + .75 (195,000 − 194,000).

Thus, Q3 = 194,750
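
The three quartile computations follow one procedure, which can be sketched in Python as below. The function name is an assumption; the sketch counts positions from the lowest score and interpolates between neighboring scores when the position is not a whole number, as in the worked steps.

```python
def quartile(scores, q):
    """Value at position q*(n+1), counted from the lowest score."""
    data = sorted(scores)              # lowest to highest
    pos = q * (len(data) + 1)          # 1-based position, may be fractional
    k = int(pos)                       # whole part of the position
    frac = pos - k                     # fractional part of the position
    if k < 1:
        return data[0]
    if k >= len(data):
        return data[-1]
    # Interpolate between the k-th and (k+1)-th scores.
    return data[k - 1] + frac * (data[k] - data[k - 1])

scores = [200000, 200000, 195000, 194000, 193000, 192000,
          191000, 190000, 185000, 181000, 180000, 176000]
q1, q2, q3 = (quartile(scores, q) for q in (0.25, 0.50, 0.75))
print(q1, q2, q3)
```

Other textbooks and software packages use slightly different position rules, so quartiles from other tools may differ a little from the values obtained with the (n+1) rule.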

Box-and-Whisker Plots

A box-and-whisker plot displays a graphical summary of a set of data. It provides
information about the minimum and the maximum scores in the distribution, the
1st quartile and 3rd quartile, as well as the 2nd quartile, or the median. Observe
the figure below.


Now, let us find the five-point summary of our previous example.

X
===============
Php 200,000.00 HS
200,000.00
195,000.00 Q3
194,000.00
193,000.00
192,000.00
Median
191,000.00
190,000.00
185,000.00 Q1
181,000.00
180,000.00
176,000.00 LS
================

Five-point summary: LS = 176,000; Q1 = 182,000; Median (Q2) = 191,500;
Q3 = 194,750; HS = 200,000.

Box-and-whisker plots are easy to construct, and they directly show important
information about the distribution of scores in a simple diagram. Also, it is not
necessary to label the final product.

Figure: box-and-whisker plot of the scores drawn over a number line.


Lesson
5.5 The Normal Distributions

Specific Objectives

1. To understand the concept of normal distribution
2. To gain knowledge on how to use the z-table efficiently
3. To identify and classify some situations pertaining to normal distribution
4. To understand the applicability of normal distribution in real life.

If the mean and standard deviation are the heart and brain of descriptive
statistics, then perhaps the normal curve is its lifeblood. In the preceding
lesson, we discussed in passing the z-scores, wherein the mean is always zero and
the standard deviation is fixed at 1. In this lesson, it is now proper to finally
introduce the normal curve. The normal curve is a theoretical distribution. It is
a unimodal frequency distribution curve. The scores are laid out along the X axis,
while the frequency of occurrence is given by the Y axis.


Discussions

Here are some key characteristics of the normal curve.

1. The majority of the scores cluster around the middle of the distribution, and
fewer scores are scattered at both extremes, or tail ends, of the curve.
2. It is always symmetrical and perfectly balanced.
3. Being a symmetrical, unimodal theoretical distribution, its mean, median, and
mode are all equal.
4. It uses standard deviation units along the x-axis.
5. The normal curve is asymptotic to the abscissa, and the total area under
the curve is 1.0 or 100%.
6. The standard normal curve (the z-scale) has a mean of zero and a standard
deviation of 1 unit.

The Empirical Rule for a normal distribution

68% of data within 1 sd

95% of data within 2 sd

99.7% of data within 3 sd
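
The three percentages of the Empirical Rule can be checked numerically. The sketch below uses the error function from Python's standard `math` module, which gives the proportion of a normal distribution lying within k standard deviations of the mean; the function name `area_within` is an assumption.

```python
import math

def area_within(k):
    """Proportion of a normal distribution within k SDs of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    # Print each proportion as a percentage rounded to one decimal place.
    print(k, round(100 * area_within(k), 1))
```

The rule's figures of 68%, 95%, and 99.7% are rounded versions of these proportions.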


z Scores. The z-scores are enormously useful in interpreting the relative position
of a raw score, taking into account the centrality of the distribution and the
amount of variability. With the z-score, we can understand an individual’s
performance relative to the performance of the entire group being
measured. But before we delve deeper into the concepts of the z-score, it is
imperative to learn how to use the z-score table. A copy of the z-table can be
accessed at this website address:
https://www.calculator.net/z-score-calculator.html

The table we will be using is a right-tail z-table. This table gives the area
between z = 0 (the mean) and any positive z value, that is, an area on the right
side of the normal curve. The z-score table gives percentages for only half
of the curve. But since the normal curve is symmetrical, a z-score to the right
of the mean yields the same percentage as the same z-score to the left of the mean.

Mean line

For example, to look up a z-score of 0.68 in the z-score table, look for 0.6 in
the far-left column, then look for the second decimal, 0.08, in the top row. The
table value is 0.25175, which represents about 25.17%. It is the percentage
of cases falling between the z-score and the mean.


Figure: normal curve with the mean and the z-score of 0.68 marked. 25.17% is the
percentage of cases falling between the z-score (0.68) and the mean.

Now, let us consider some situations that might occur in using the z-table.

Case 1. Finding the percentage of cases falling between the z-score and the mean.

Figure: two normal curves, one with a negative z-score (−0.75) and one with a
positive z-score (+0.75). In each, the area between the z-score and the mean is
27.337%.


As an example for Case 1, the z-score of +0.75 corresponds to a z-table value of
0.27337, or 27.337%. In the same way, the z-score of −0.75 corresponds to the same
z-table value of 0.27337, or 27.337%. Notice that the value is always a
positive number, since a percentage of area is always positive.

Case 2. Finding the percentage of cases above the given z-score. It is important to
remember for this case that the total area of the normal curve is 1.0 or 100%. It is
also essential to keep in mind that the right half of the normal curve is 50%, as
is the left half (50%). You also need to consider that the z-table always provides
a percentage value in relation to the mean.

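Figure: (a) a positive z-score of +0.75: the area between the mean and the z-score
is 27.337%, and the area above the z-score is 22.663%. (b) a negative z-score of
−0.75: the area between the z-score and the mean is 27.337%, and the area to the
right of the mean is 50%.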

For Case 2(a), to find the area above a given positive z-score, determine the
equivalent z-table value and subtract it from the total area of the right half,
which is 50%. For example, to find the percentage of cases above the z-score of
+0.75, find the z-table value of 0.75, which is 0.27337 (27.337%), then subtract it
from the total area of the right half of the normal curve, which is 50%. This gives
50% − 27.337% = 22.663%.

For Case 2(b), to determine the area above a given z-score that is negative (it is
situated on the left side of the normal curve), simply find the equivalent z-table
value and add 50%. Again, always keep in mind that the z-table only provides the
percentage of cases between the z-score and the mean, not the entire right side of
the curve. As an example, let us find the percentage of cases above the z-score of
−0.75. The z-table value for 0.75 is 0.27337, which is equivalent to 27.337%. To
this number, add the percentage area of the entire right side, which is 50%. This
gives 27.337% + 50% = 77.337%.

Case 3. Finding the percentage of cases below the given z-score. The principle we
made in Case 2 is the same principle that can be applied in Case 3.

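Figure: (a) a negative z-score of −0.75: the area below the z-score is 22.663%.
(b) a positive z-score of +0.75: the area to the left of the mean is 50%, and the
area between the mean and the z-score is 27.337%.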

For Case 3(a), try to determine the percentage of cases below the z-score of −0.75.
Using an analysis similar to Case 2(a), the z-table value must be subtracted from
the total area of the left side (50%). If your computation is correct, your answer
is 22.663%.

For Case 3(b), to determine the percentage of cases below the z-score of +0.75,
remember that the z-table value covers only the percentage of cases between the
z-score and the mean, so you need to add 50%, which is the percentage of cases on
the left side of the normal curve. Your computation must generate an answer of
77.337%.


Case 4. Finding the percentage of cases between the two z-scores.

Figure: normal curve with the z-scores −0.75 and +0.75 marked on either side of the
mean. The area between each z-score and the mean is 27.337%.

To illustrate Case 4, let us determine the percentage of cases between two
z-scores, −0.75 and +0.75. The −0.75 z-score has a z-table value of 27.337%, and
the +0.75 z-score has the same z-table value of 27.337%. Thus, the percentage of
cases between −0.75 and +0.75 is the sum of the two percentages:
27.337% + 27.337% = 54.674%.
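
The four cases can also be worked out without a printed table. The sketch below is one possible way to do it in Python, using the error function from the standard `math` module in place of the z-table lookup; all function names are assumptions introduced for illustration.

```python
import math

def area_mean_to_z(z):
    """Area between the mean and z (the value a half-curve z-table reports)."""
    return 0.5 * math.erf(abs(z) / math.sqrt(2))

def area_above(z):
    """Case 2: subtract from 50% for a positive z, add 50% for a negative z."""
    t = area_mean_to_z(z)
    return 0.5 - t if z >= 0 else 0.5 + t

def area_below(z):
    """Case 3: everything that is not above z."""
    return 1.0 - area_above(z)

def area_between(z1, z2):
    """Case 4: area between two z-scores."""
    return abs(area_below(z2) - area_below(z1))

for z in (0.75, -0.75):
    print(z, area_mean_to_z(z), area_above(z), area_below(z))
print(area_between(-0.75, 0.75))
```

Multiplying any of these proportions by 100 gives the percentage of cases.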

Translating the raw score into the z-score.

We are now familiar with the z-score concepts and with the percentages of area
above, below, and between z-scores. We also know from the z-score formula that if
the mean and standard deviation are known, we can subtract the mean from the raw
score, divide by the standard deviation, and obtain the z-score.

$$z = \frac{x - \bar{x}}{SD}$$

The z-score expresses the location of the raw score relative to the mean in
standard deviation units. The z-score accounts for both the mean of the
distribution and the amount of variability. Now, let us examine the practical use
of the z-score in the context of a normal distribution of raw scores.


Case A. When the percentage of cases is between the raw score and the mean.
The normal distribution of physics scores has a mean of 85 and a standard deviation
of 10. What percentage of scores will fall between the physics score of 95 and the
mean?

First, we need to convert the raw score of 95 into its equivalent z-score.

$$z = \frac{x - \bar{x}}{SD} = \frac{95 - 85}{10} = 1.0$$

Then draw the normal curve:

Figure: normal curve with the mean ($\bar{X}$ = 85) and the score 95 (z = 1.00)
marked; the area between them is 34.13%.

Next, look up the z-score value in the table
(https://www.calculator.net/z-score-calculator.html). The z-table value is
0.34134, or 34.13%. That is the percentage of scores falling between the physics
score of 95 and the mean. This means that around 1 in 3 students (34.13%) scored
between the mean and 95.

Case B. When the percentage of cases falls below a raw score. Using the same
example, on a normal distribution of scores in a physics class, with a mean of 85
and a standard deviation of 10, what percentage of physics scores fall below a
score of 95?

First, convert the raw score of 95 into its equivalent z-score.

$$z = \frac{x - \bar{x}}{SD} = \frac{95 - 85}{10} = 1.0$$

Next, draw the normal curve:

Figure: normal curve with the mean ($\bar{X}$ = 85) and the score 95 (z = 1.00)
marked; the area below the mean is 50%, and the area between the mean and 95 is
34.13%.

Finally, look up the z-score in the z-table
(https://www.calculator.net/z-score-calculator.html) and take the corresponding
value. It is 0.34134, or 34.13%. Then add 50% to 34.13% to get the sum, 84.13%. The
percentage of physics scores that fall below a score of 95 is 84.13%. This means
that if 100 students took the examination and your score is 95, then your physics
grade surpassed the grades of about 84 students.

Case C. When the percentage of cases is above a raw score. On a normal
distribution of scores in a physics class, with a mean of 85 and a standard
deviation of 10, what percentage of physics scores are above a score of 95?

Again, we need to convert the raw score of 95 into its equivalent z-score.

$$z = \frac{x - \bar{x}}{SD} = \frac{95 - 85}{10} = 1.0$$

Then draw the normal curve:

Figure: normal curve with the mean ($\bar{X}$ = 85) and the score 95 (z = 1.00)
marked; the area between the mean and 95 is 34.13%, and the area above 95 is
15.87%.

Look up the z-score in the table
(https://www.calculator.net/z-score-calculator.html) and take the corresponding
value. It is 0.34134, or 34.13%. Then subtract 34.13% from 50%. The answer is
15.87%. This is the percentage of cases above the score of 95. This means that if
100 students took the examination and your score is 95, then around 16 students
surpassed your physics grade of 95.

Case D. When the percentage of cases is between two raw scores. On a normal
distribution of physics scores, the mean is 85 and the standard deviation is 10.
Your physics score is 95 and your friend’s score is 80. You want to determine what
percentage of students got a score between your friend’s score of 80 and your
score of 95.

Again, convert the raw scores of 95 and 80 into their equivalent z-scores.

$$z = \frac{x - \bar{x}}{SD} = \frac{95 - 85}{10} = 1.0 \qquad\qquad z = \frac{x - \bar{x}}{SD} = \frac{80 - 85}{10} = -0.5$$

Then draw the normal curve:

Figure: normal curve with the scores 80 (z = −0.5), 85 (the mean, $\bar{X}$), and
95 (z = 1.00) marked; the area between 80 and the mean is 19.15%, and the area
between the mean and 95 is 34.13%.

Look up the z-scores in the table
(https://www.calculator.net/z-score-calculator.html): find the percentage of cases
for the z-value 1.0 and the percentage of cases for the z-value −0.5. The
percentages are 34.13% and 19.15%, respectively. Add the two values to get the
percentage of cases between the raw scores of 80 and 95. The answer is 53.28%. This
means that about 1 in 2 students got a score between 80 and 95 (i.e., between your
friend’s score and your score).
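
Cases A to D can be cross-checked with the `NormalDist` class from Python's standard `statistics` module (available in Python 3.8 and later). This is only a sketch of an alternative to the table lookup; the variable names are assumptions.

```python
from statistics import NormalDist

# Physics scores: mean of 85 and standard deviation of 10, as in the cases above.
physics = NormalDist(mu=85, sigma=10)

# cdf(x) is the proportion of scores at or below x.
between_mean_and_95 = physics.cdf(95) - physics.cdf(85)   # Case A
below_95 = physics.cdf(95)                                 # Case B
above_95 = 1 - physics.cdf(95)                             # Case C
between_80_and_95 = physics.cdf(95) - physics.cdf(80)      # Case D

for value in (between_mean_and_95, below_95, above_95, between_80_and_95):
    print(round(100 * value, 2))
```

Here the conversion to a z-score happens inside `cdf`, so the raw scores can be used directly.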

At this point, we have already made a significantly long journey, from the measures
of central tendency to the measures of variability and finally to the measures of
relative position. We are now in a position not only to seek answers to questions
but also to seek questions beyond the conventions established by the answers.


Lesson
5.6 The Linear Correlation: Pearson r

Specific Objective

1. To know the characteristics of Pearson r


2. To solve problems dealing with linear correlations
3. To understand the limitations of linear correlations

At the beginning of this course, we defined mathematics as the science
of patterns. We realized that nature follows a certain kind of
mathematical structure as we observed both patterns and irregularities.
Whenever we see patterns, irregularities also beg to be noticed, and
whenever we see irregularities, some patterns suddenly wave for
attention.

Linear correlation is not about patterns as such; it is about looking at
irregularities and patiently waiting for the patterns to manifest. This
lesson deals with determining the connections between things that seem
unrelated and declaring whether some correlations are indeed
significant.

Discussions
The Pearson r Linear Correlation

The Product-Moment Correlation Coefficient, or Pearson r, is a statistical
tool that can determine the linear association between two distributions
or groups. This tool can only establish the strength of association or
correlation; it can never justify any causal relation that may appear
or seem obvious.

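The formula below is the computational method for calculating the
Pearson r:

$$r = \frac{\frac{\Sigma XY}{N} - (\bar{x})(\bar{y})}{SD_x \, SD_y}$$

Here $N$ is the number of subjects, $\bar{x}$ and $\bar{y}$ are the means, and
$SD_x$ and $SD_y$ are the standard deviations.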

The Pearson r value may fall under three possible scenarios. If the value of 𝑟 is
positive, there is a positive correlation. If it is negative, there is a negative
correlation. If the value of 𝑟 is around 0, there is almost no linear correlation.

Figure: three scatter plots illustrating 𝑟 = +1, 𝑟 = −1, and 𝑟 = 0.

An example of positive correlation is the height and weight of a person. Under
normal circumstances, a gain in height also means a gain in weight. An
example of negative correlation is the relationship between length of employment
and degree of attractiveness. As you may observe, the physical attractiveness of an
employee is affected by the chronological advancement of his or her age. An
example of zero correlation might be the relationship between the grades of
students living in highland areas and the study habits of students living in
lowland areas. You should also remember that Pearson 𝑟 does not generate a value
less than −1 or more than +1. Any answer below −1 or above +1 can be attributed to
a wrong computation.

We will explain the nature of linear correlation by using an example. Assume that
we want to determine if there is a correlation between hours of study and grades
of students last semester. Initially, we need to randomly select students (let us
say 10) and ask them about their average grade last semester as well as the number
of hours they spent studying per week in that semester. Let us presume that they
provided us with these two pieces of information right away.

===============================================
Student Hours of Study (x) Grade (Y)
===============================================
A 15 2.75
B 35 1.25
C 05 3.00
D 20 2.50
E 30 1.50
F 40 1.00
G 20 2.25
H 25 1.75
I 25 2.00
J 08 3.00

But before we use the Pearson r formula, we need to ensure that this is the
correct statistical tool for determining the correlation between hours of study
and grades. Let us check some basic Pearson r requirements:


1. Random selection of participants.


2. Traits being measured must not depart significantly from normality
3. The measurements on both distributions must be in the form of
interval data.
4. Comparing only two groups.
5. And the goal is to determine the linear correlation between two
groups.

The formula for solving the Pearson 𝑟 is:

$$r = \frac{\frac{\Sigma XY}{N} - (\bar{x})(\bar{y})}{SD_x \, SD_y}$$

$X$ refers to one variable and $Y$ refers to the other variable
$\bar{x}$ and $\bar{y}$ refer to the mean of $X$ and the mean of $Y$
$SD_x$ and $SD_y$ refer to the standard deviations of $X$ and $Y$, respectively
$N$ refers to the number of paired observations (subjects)
$\Sigma$ is the symbol for summation

Now let us take into account the data below as our example to illustrate the
formula.
=================================================================================
Student Hours of Study (𝑥) 𝑥2 Grade (𝑦) 𝑦2 𝑥𝑦
==================================================================================
A 15 225 2.75 7.56 41.25
B 35 1225 1.25 1.56 43.75
C 05 25 3.00 9.00 15.00
D 20 400 2.50 6.25 50.00
E 30 900 1.50 2.25 45.00
F 40 1600 1.00 1.00 40.00
G 20 400 2.25 5.06 45.00
H 25 625 1.75 3.06 43.75
I 25 625 2.00 4.00 50.00
J 08 64 3.00 9.00 24.00
====================================================================================
𝚺𝒙=223 𝚺𝒙𝟐=6089 𝚺𝐲=21 𝚺𝒚𝟐 = 48.75 𝚺𝒙𝒚=397.75

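$$\bar{x} = \frac{\Sigma x}{N} = \frac{223}{10} = 22.3 \qquad\qquad \bar{y} = \frac{\Sigma y}{N} = \frac{21}{10} = 2.1$$

$$SD_x = \sqrt{\frac{\Sigma x^2}{N} - \bar{x}^2} = \sqrt{\frac{6089}{10} - 22.3^2} = 10.56$$

$$SD_y = \sqrt{\frac{\Sigma y^2}{N} - \bar{y}^2} = \sqrt{\frac{48.75}{10} - 2.1^2} = 0.682$$

$$r = \frac{\frac{\Sigma XY}{N} - (\bar{x})(\bar{y})}{SD_x \, SD_y} = \frac{\frac{397.75}{10} - (22.3)(2.1)}{(10.56)(0.682)}$$

$$r = -0.979$$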
Point to Ponder: Why do you think we generated a negative r value?

Thus, the correlation between hours of study and grades of students has a
Pearson r value of −0.979. Do not be confused by the negative sign in our final
answer. The sign indicates the direction of the correlation line. Take into
consideration that a grade of 1.0 carries the highest academic weight in our
grading system, but once plugged into the computation it is treated by the formula
as a small number. Thus, more hours of study go with numerically lower, and
therefore better, grades. With full knowledge of the concept, you can always come
up with the right interpretation.

Since the distribution exclusively concerns the 10 students and is not treated as
a sample of a population, Guilford’s suggested interpretation for the values of r
can be used without hindrance.
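
The hand computation of r can be verified with a short Python sketch that follows the same computational formula. The data are the ten students' hours and grades from the table above; the function name is an assumption.

```python
import math

hours = [15, 35, 5, 20, 30, 40, 20, 25, 25, 8]
grades = [2.75, 1.25, 3.00, 2.50, 1.50, 1.00, 2.25, 1.75, 2.00, 3.00]

def pearson_r(xs, ys):
    """Pearson r using the computational formula with population SDs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sd_x = math.sqrt(sum(x * x for x in xs) / n - mean_x ** 2)
    sd_y = math.sqrt(sum(y * y for y in ys) / n - mean_y ** 2)
    mean_xy = sum(x * y for x, y in zip(xs, ys)) / n
    return (mean_xy - mean_x * mean_y) / (sd_x * sd_y)

r = pearson_r(hours, grades)
print(round(r, 3))
```

Because the sketch carries full precision instead of rounded intermediate values, its result may differ from a hand computation in the last decimal place.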

Guilford’s Interpretation for the values of r

r value              Interpretation
===============================================================
Less than .20        Almost negligible relationship
.20-.40              Definite but small relationship
.40-.70              Substantial relationship
.70-.90              Marked relationship
.90-1.00             Very dependable relationship
===============================================================

Based on Guilford’s suggested interpretation, there is a very dependable
relationship between hours of study and grades of students.

Does it mean that better grades can be achieved by spending more time studying?
Does it mean that spending more time studying is a by-product of better grades?
Does it mean that another factor influenced both better grades and study habits?

All three of these explanations are possible. But the point is that correlation
alone is not enough to identify which is the real explanation. Pearson r is not a
tool for establishing causation. It is only a tool for describing the linear
correlation between two observed traits.


Lesson
5.7 The Least-Squares Regression Line

Specific Objectives

1. Define linear regression
2. Define scatter plot
3. Compute the least-squares regression line

In the previous lesson, we discussed Pearson r as a powerful tool for determining
linear correlation. It is an important tool for investigating associations,
considering that different mathematical patterns are all around us, such as the
connection of high tide and low tide to human behavior, the association between
height and weight, and the correlation between the metaphoric flap of a butterfly
in Japan and a weather disturbance in South America a year after.

But correlation confines us to merely associating. Correlation in and by itself
cannot establish causation to warrant prediction. In this lesson on regression
analysis, not only can we connect and associate some observable patterns, we can
also finally make basic predictions.

Discussion

Bivariate Scatter Plot

Bivariate simply means that there are two variables (x and y), which we can
represent graphically in a scatter plot wherein each point represents a pair of
scores. A scatter plot is necessary in order to determine the regression line. The
regression line is a generated straight line that lies closest to all the points
in the scatter plot.


Our example below illustrates the construction of a scatter plot based on the data
from our previous example on hours of study and grade.

Figure: scatter plot of hours of study (x) and grade (y) with the regression line;
$d_1, d_2, \dots, d_{10}$ mark the deviations of the ten points from the line.

As shown in the scatter plot above, the straight line is called the least-squares
regression line. This generated line minimizes the sum of the squares of the
vertical deviations from each data point to the line. This means that of all the
possible lines that could describe the trend of the points, this generated line
has the best fit. Each $d_n$ represents the vertical distance from a point (x, y)
to the line, and the quantity being minimized is

$$d_1^2 + d_2^2 + d_3^2 + d_4^2 + d_5^2 + d_6^2 + d_7^2 + d_8^2 + d_9^2 + d_{10}^2$$

In the least-squares line, the correlation that can be established around the
regression line is the basis for the resulting prediction. But in order to make
predictions, three important ingredients must be on hand: 1. the equation of the
best-fit line, 2. the slope of the line, and 3. the y-intercept of the line.

The Formula for the Least-Squares Regression Line

There must be $n$ ordered pairs: $(x_1, y_1), (x_2, y_2), (x_3, y_3), \dots, (x_n, y_n)$

$$y = mx + b$$

$$m = \frac{n(\Sigma xy) - (\Sigma x)(\Sigma y)}{n(\Sigma x^2) - (\Sigma x)^2} \qquad\qquad b = \frac{(\Sigma y) - m(\Sigma x)}{n}$$

To apply this formula to our given data, we need to find the value of each
summation.

In finding the value of $m$:

$$m = \frac{n(\Sigma xy) - (\Sigma x)(\Sigma y)}{n(\Sigma x^2) - (\Sigma x)^2} = \frac{10(397.75) - (223)(21)}{10(6089) - 49729} = -0.06321$$

In finding the value of $b$:

$$b = \frac{(\Sigma y) - m(\Sigma x)}{n} = \frac{(21) - (-0.06321)(223)}{10} = 3.509$$

Finally, substituting the values to the given formula:


𝒚𝒑𝒓𝒆𝒅 = 𝒎𝒙 + 𝒃
𝒚𝒑𝒓𝒆𝒅 = -0.06321x + 3.509
Slope(𝒎) = -0.06321
𝒚 intercept (𝒃) = 3.509

In the preceding lesson, we were able to establish the strength of correlation of
this example using Pearson r. We found a very strong relationship between hours
of study (𝑥) and grade (𝑦) (i.e., r = −0.979). Now let us predict the grades of
students who spent the following weekly study hours: 37, 22, and 8.

Since we have already determined the regression line, let us simply plug in
the necessary values of 𝑥 and solve for “𝑦”.

================================================================
$y_{pred} = mx + b$
$y_{pred} = -0.06321x + 3.509$
================================================================
 x      -0.06321x      -0.06321x + 3.509 = y_pred
 37     -2.33877       1.17023
 22     -1.39062       2.11838
  8     -0.50568       3.00332
================================================================

The predicted grade is about 1.17 for the student who studies 37 hours a week,
2.12 for the student who studies 22 hours, and 3.00, just a passing grade, for the
student who studies eight hours.
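
The same predictions can be reproduced with a short script. This is a sketch that assumes the rounded slope and intercept from the worked example; the function name predict_grade is hypothetical.

```python
# Rounded coefficients taken from the worked example.
m = -0.06321  # slope
b = 3.509     # y-intercept


def predict_grade(hours):
    """Predicted grade for a given number of weekly study hours."""
    return m * hours + b


for hours in (37, 22, 8):
    print(f"{hours} hours -> predicted grade {predict_grade(hours):.2f}")
```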


Module Five: Project Proposal Requirement

For this culminating requirement in Module Five, you need to work together in
groups of 3 or 4.
1. Your task is to prepare a study proposal that can contribute to a solution to
a social problem.
2. You must use statistical methods for your data processing and analyses.
3. Your final output, which details your project proposal, must be no more than
8 pages.
4. Please follow the outline provided below:
a. Title page (not included in the page count)
- An example of a problem to be addressed: During the COVID-19 pandemic,
how can we reduce human traffic in wet markets?
b. Background and Statement of the Problem
c. Literature Review
d. Proposed Study with emphasis on how statistics will be used
i. Data to be collected
ii. Methods of data collection and data gathering instrument
iii. Data gathering procedure
iv. Method of Analyses
e. Discussion of how your project proposal can address the identified problem
f. References (APA or MLA)

5. Below is the format guideline:


Paper substance (if printed): 20
Margin: Normal
Orientation: Portrait
Paper size: 8.5 x 13
Font type: Arial
Font size: 12
All text: Justified
Line spacing: 1.5
Page number: Page x of x

Your project proposal will be graded based on these criteria:


1. Soundness of the proposal (1/3)
2. Appropriate use of statistical method (1/3)
3. Coherence (1/3)


Chapter Test 5
Multiple Choice. Choose the letter of the correct answer and write it on the
blank provided at the left side of the test paper.
==================================================================
__________ 1. It is a branch of statistics that deals with data analysis, and one of its
techniques is to “describe” data in symbolic form and in an abbreviated fashion.
a. Inferential Statistics c. Descriptive Statistics
b. Statistics and Probability d. Probability

__________ 2. It is a branch of statistics that has the ability to “infer” and to generalize.
It is also the right tool to predict values that are not really known.
a. Inferential Statistics c. Descriptive Statistics
b. Statistics and Probability d. Probability

__________ 3. It is essentially quantifying an observation according to a certain rule. It
is also assigning numbers in a prescribed way.
a. Variable c. Data
b. Measurement d. Constant
__________ 4. If the data are labelled 1st, 2nd, 3rd, and so on, on what kind of scale
do they fall?
a. Nominal Scale c. Categorical Data
b. Interval Scale d. Ordinal Scale
__________ 5. Personal biodata falls under what kind of scale?
a. Nominal Scale c. Categorical Data
b. Interval Scale d. Ordinal Scale
__________ 6. These are tools that provide information regarding the average,
ranking, and category of a large number of scores.
a. Measures of Central Tendency c. Measures of Dispersion
b. Measures of Variability d. Measurement
__________ 7. Mean is the arithmetic average of all scores. In the data: 34, 56, 75, 43,
and 67, what is the mean?


a. 53 c. 55
b. 54 d. 56
__________ 8. It is the middle point or midpoint of any distribution. It separates the upper
half from the lower half of the distribution.
a. Mean c. Mode
b. Median d. Range
__________ 9. The following is a list of retirement ages for the workers in a production
plant: 65, 64, 65, 61, 62, 64, 65, 63, 63, 65, 64. What is the median?
a. 64 c. 63
b. 65 d. 62
__________ 10. It is the most frequently occurring score in the distribution.
a. Mean c. Mode
b. Median d. Range
__________ 11. These three measures can provide information about the spread of the
scores in the distribution.
a. Mean, median and mode c. Range, standard deviation and variance
b. Mean, range, and variance d. Range, median and mode

The grade-point averages of selected university students were computed.
The data are as follows:
Student GPA
1 3.75
2 3.00
3 3.00
4 1.75
5 2.00
6 2.25
7 3.25

__________ 12. What is the range?


a. 1.00 c. 1.75
b. 2.25 d. 2.00

__________ 13. What is the mean (to the nearest hundredth)?


a. 2.70 c. 2.72
b. 2.71 d. 2.74
__________ 14. What is the median?
a. 3.00 c. 1.75
b. 2.25 d. 2.00
__________ 15. If the standard deviation in a distribution is 4, what is the variance?
a. 8 c. 32
b. 16 d. 64
__________ 16. In comparing different groups, there must be a standard scale that can
reconcile both the means and the standard deviations in a single standard
form. Only then is direct comparison possible, because transformed scores
from different distributions will share common scores, and these common
scores are called ____.
a. Percentile c. T-score
b. Quartile d. Z-score
__________ 17. Jerry took the College Admission Test and placed at the 89th percentile.
What does this indicate?
a. 89% of those who took the exam scored lower than Jerry.
b. Out of 100 questions, Jerry made 11 mistakes.
c. Jerry answered 89 questions correctly.
d. 89% of those who took the exam scored higher than Jerry.
__________ 18. It divides the distribution into quarters.
a. Percentile c. T-score
b. Quartile d. Z-score
__________ 19. The third quartile Q3 is on what percentile rank?
a. 25th percentile c. 100th percentile
b. 50th percentile d. 75th percentile

__________ 20. In this box-whisker plot, what does the middle dot or point denote?

[Figure: box-whisker plot]

a. Q1 c. Q3
b. Median d. Maximum score
__________ 21. It is a unimodal frequency distribution where the scores are scattered on
the X-axis while the frequency of occurrence is defined by the Y-axis.
a. Z-distribution curve c. Normal curve
b. Distribution curve d. X-axis and Y-axis curve
__________ 22. This table only gives the percentage for half of the curve, but both the
right and the left of the mean yield the same percentage since the said
curve is symmetrical.
a. Z-table c. T-table
b. Percentage Table d. Normal table
__________ 23. Scores on an English test have an average of 80 with a standard deviation
of 6. What is the z-score of a student who earned a 75 on the test?
a. -0.97 c. -0.88
b. -0.76 d. -0.83
__________ 24. A group of children compared what they received while trick-or-treating.
The average number of pieces of candy received is 43 with a standard
deviation of 2. What is the z-score corresponding to 20 pieces of candy?
a. -11.5 c. -9.5
b. -10.5 d. -12.5
__________ 25. The mean growth in the thickness of trees in a forest is found to be 0.5
cm per year with a standard deviation of 0.1 cm per year. What is the
z-score corresponding to 1 cm per year?
a. 4 c. 6
b. 5 d. 7
__________ 26. It is a statistical tool that can be used to determine the linear association
between two distributions or groups. It can only establish the strength of
association or correlation, but it can never justify any causal relation that
may appear and seem obvious.
a. Correlation c. Pearson R correlation
b. Pearson correlation d. Association
__________ 27. There is no linear correlation found if the value of r is ___.
a. -1 c. 2
b. +1 d. 0
__________ 28. The linear correlation is said to be a substantial relationship if the value of
r is ___.
a. Less than 0.20 c. Between 0.70 and 0.90
b. Between 0.40 and 0.70 d. More than 1

Given the data chart of selected persons with their ages and daily incomes,
calculate Pearson’s correlation coefficient.

Person   Age (x)   Income (y)   xy       x²      y²
1        20        150          3000     400     22500
2        30        300          9000     900     90000
3        40        500          20000    1600    250000
4        50        750          37500    2500    562500
Total    140       1700         69500    5400    925000

Mean of x = 35                      Mean of y = 425
Standard Deviation of x = 11.18     Standard Deviation of y = 225

__________ 29. What is the value of r?


a. 1 c. 0.99
b. 0.89 d. 0
__________ 30. What is the interpretation for the r value?
a. Very dependable relationship c. Almost negligible relationship
b. Substantial relationship d. Marked relationship
__________ 31. It is used in making predictions between two variables.
a. Linear correlation c. Linear regression
b. Linear relationship d. Regression correlation
__________ 32. It uses dots to represent values for two different numeric variables. It is
also important in determining the regression line.
a. Scatter plot c. Box-whisker plot
b. Scatter line d. Regression line

The data below pertain to the experience (in number of years) of some workers
in a company and their performance ratings. Estimate the performance
rating for a worker with 20 years of experience.

Worker   Experience (x)   Performance (y)   xy      x²
1        16               87                1392    256
2        12               88                1056    144
3        18               89                1602    324
4        4                68                272     16
5        3                78                234     9
6        10               80                800     100
7        5                75                375     25
8        12               83                996     144
Total    80               648               6727    1018

__________ 33. What is the value of the slope of the line?
a. 1.13 c. 1.15
b. 1.14 d. 1.16
__________ 34. What is the value of the y-intercept?
a. 69.6 c. 69.8
b. 69.7 d. 69.9
__________ 35. What is the value predicted by the regression line?
a. 92.0 c. 92.2
b. 92.1 d. 92.3


Answers: Chapter Test 5


1. C
2. A
3. B
4. D
5. A
6. A
7. C
8. B
9. A
10. C
11. C
12. D
13. B
14. A
15. B
16. D
17. A
18. B
19. D
20. B
21. C
22. A
23. D
24. A
25. B
26. C
27. D
28. B
29. C
30. A
31. C
32. A
33. A
34. B
35. D
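
The computational items in the answer key can be cross-checked with a short script. The sketch below uses Python's standard statistics module and illustrative variable names; it recomputes the values for items 7, 9, 12 to 14, 23 to 25, 29, and 33 to 35 from the data given in the test.

```python
import statistics
from math import sqrt

# Items 7 and 9: mean and median.
mean_7 = statistics.mean([34, 56, 75, 43, 67])
ages = [65, 64, 65, 61, 62, 64, 65, 63, 63, 65, 64]
median_9 = statistics.median(ages)

# Items 12 to 14: range, mean, and median of the GPA data.
gpa = [3.75, 3.00, 3.00, 1.75, 2.00, 2.25, 3.25]
gpa_range = max(gpa) - min(gpa)
gpa_mean = round(statistics.mean(gpa), 2)
gpa_median = statistics.median(gpa)


def z_score(score, mean, sd):
    """Standard score: how many standard deviations from the mean."""
    return (score - mean) / sd


# Items 23 to 25: z-scores.
z_23 = z_score(75, 80, 6)
z_24 = z_score(20, 43, 2)
z_25 = z_score(1, 0.5, 0.1)

# Item 29: Pearson r from the table totals (age and income).
n, sx, sy, sxy, sx2, sy2 = 4, 140, 1700, 69500, 5400, 925000
r = (n * sxy - sx * sy) / sqrt((n * sx2 - sx ** 2) * (n * sy2 - sy ** 2))

# Items 33 to 35: regression line from the table totals (experience and rating).
n, sx, sy, sxy, sx2 = 8, 80, 648, 6727, 1018
slope = (n * sxy - sx * sy) / (n * sx2 - sx ** 2)
intercept = (sy - slope * sx) / n
rating_20 = slope * 20 + intercept

print("7:", mean_7, " 9:", median_9)
print("12:", gpa_range, " 13:", gpa_mean, " 14:", gpa_median)
print("23:", round(z_23, 2), " 24:", z_24, " 25:", round(z_25, 2))
print("29:", round(r, 2))
print("33:", round(slope, 2), " 34:", round(intercept, 1),
      " 35:", round(rating_20, 1))
```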
