
Module-4 Mathematics in The Modern World

This document summarizes Module 4 of a statistics course which focuses on problem solving and reasoning. The module contains 6 lessons covering topics like data collection and presentation, measures of central tendency, dispersion, relative position, probabilities, and correlation/regression. It also provides an introduction to the course's purpose of using statistical tools to process and manage numerical data.

MODULE 4

PROBLEM SOLVING AND REASONING

Lesson 1 Collection and Presentation of Data

Lesson 2 Measures of Central Tendency

Lesson 3 Measures of Dispersion

Lesson 4 Measures of Relative Position

Lesson 5 Probabilities and Normal Distribution

Lesson 6 Correlation and Linear Regression



MODULE 1
DATA MANAGEMENT

INTRODUCTION

Statistical tools derived from mathematics are useful in processing and
managing numerical data in order to describe a phenomenon and predict values.
The intention of this part of the course is to build on the concepts and skills learned
prior to university/college, deepen what has been taught and learned, and
highlight the skills in interpreting statistical results.

OBJECTIVES

At the end of this module, you will be able to:


1. Use a variety of statistical tools to process and manage numerical data.
2. Use mathematics in other areas such as finance, business, environment and
recreation.

GECC 103 – Mathematics in the Modern World Module 4



Lesson 1

COLLECTION AND PRESENTATION OF DATA

Statistics has several meanings. It is frequently used to refer to recorded data such
as the number of traffic accidents, the size of enrollment, or the number of patients
visiting a clinic. Statistics is also used to denote characteristics calculated for a set of
data – for example, mean, standard deviation, or correlation. In another context,
statistics refers to statistical methodology and theory.
In short, STATISTICS is a body of techniques and procedures dealing with the
collection, organization/presentation, analysis, and interpretation of information that can
be stated numerically.
Steps in Statistical Investigation
1. Defining the problem.
2. Collection of data – refers to the process of obtaining numerical measurements.
3. Tabulation and presentation of data – refers to the organization of data in tables,
graphs or charts, so that logical conclusions can be derived from them.
4. Analysis of data – pertains to the process of deriving from the given data relevant
information from which numerical descriptions can be formulated.
5. Interpretation of data – refers to the task of drawing conclusions from the analyzed
data.

Branches of Statistics
Descriptive Statistics – deals with the collection, enumeration, classification and
graphical representation of data and computation of values to describe the characteristics
of data. An example is the census conducted by the NSO, in which all residents are
requested to provide information such as age, sex, and marital status. The data obtained
in such census can then be compiled and arranged into tables and graphs that describe the
characteristics of the population at a given time.
Inferential Statistics – is concerned with reaching conclusions about large groups of
data by studying the characteristics of samples drawn from the population, and making
inferences on previously formulated hypotheses. An example is an opinion poll such as the
SWS survey, which attempts to draw inferences as to the outcome of an election. In such
a poll, a sample of individuals (frequently fewer than 2000) is selected, their preferences
are tabulated, and inferences are made as to how more than 80 million persons would
vote if an election were held that day.

Population and Sample


Collection of data can be made either from a population or from a sample. The
population includes all objects of interest, whereas the sample is only a portion of the
population. Parameters are associated with populations and statistics with samples.
Parameters are usually denoted using Greek letters (μ, σ), while statistics are usually
denoted using Roman letters (x̄, s).
Variable refers to a factor, property, attribute, characteristic, or behavior that
differentiates a group of persons, a set of things, events, conditions or approaches from
another group(s) or set(s) and which takes on two or more dimensions, categories or levels
with descriptive or numerical values that can be measured qualitatively or quantitatively.
Examples: sex, socio-economic status, geographic location, managerial style and
motivational factors.
Variables can either be dependent or independent depending on their use. If they
are the controlling factors then they are the independent variables. Otherwise they are
dependent on these controlling factors.
Variables may also be discrete or continuous. A discrete variable results from either
a finite number of possible values or a countable number of possible values (i.e., exact
counts). A continuous variable results from infinitely many possible values that can be
associated with points on a continuous scale in such a way that there are no gaps or
interruptions. The number of people who play professional baseball is discrete data, as
are golf scores, because they represent exact counts; in contrast, distance, force, and time
represent continuous numerical data.
Variables are qualitative if they take the form of attributes or categories such as
race, sex, religion, “YES” or “NO” responses. They are quantitative in nature if they
come as measurements such as weights, heights, temperature and other measurable
quantities.
Scales of Measurements
Data can be grouped into one of four categories or scales: nominal, ordinal,
interval, and ratio.
Nominal scales consist of mutually exclusive categories without qualitative
differentiation between categories. Data consisting of names, labels, or categories serve
as examples. Nominal data cannot be arranged in an ordering scheme (e.g., low-to-high).
Some nominal scales have only two categories, such as gender (male or female) or yes or no.
Other nominal data include more than two categories such as political parties. Remember,
associate the term nominal with “name only”.
Ordinal data can be arranged in some order, but differences between data values
either cannot be determined or are meaningless. Placement on a ladder tournament is an
example of ordinal data. The person on top of the ladder has performed better than the
person ranked second, but there is no indication of how much better. Rankings of Olympic
competitors represent another example of ordinal data; differences between ranks 2
and 3 make little sense except to order the individuals. With ordinal data, the ordering
says nothing about ability or differences between ranks other than that they are different.
Interval data. If numbers can be assigned in such a way that equal numerical
differences correspond to equal increments in the property, we have what we call interval
data. It does not have a true zero point, although a zero point may for convenience be
arbitrarily defined. Examples are IQ test scores, Celsius temperature, and degree of
agreement with a certain issue as measured on a five-point scale.
Ratio data represents the interval data scale modified to include an inherent zero
starting point where zero indicates that none of the quantity is present. With ratio data
the scale is based on an ordered structure, has equal distance between scale points, and
uses zero to represent absence of the value. All units are equidistant from each other and
proportional or ratio comparisons are appropriate. All measures of distance, force, or time
are based on a ratio scale. A negative score cannot exist on a ratio scale; a person cannot
run a race in negative seconds, weigh less than zero pounds, or score negative points in
golf or basketball.

Learning Activity 4.1



I. State which of the following represents Discrete Data or Continuous Data.


1. The monthly income of graduate students.
2. The weight of new born babies recorded in a Nursery section.
3. The number of fouls per game.
4. The number of out-patients in a certain hospital.
5. Daily temperature during summer.
6. The number of newspapers sold in San Fernando City on April 25, 1998.
7. The time it takes to finish washing your clothes.
8. Grade in Math of a certain BPA student.
9. Blood pressure of a hypertensive patient.
10. The number of Mechanical Engineering Board Passers last examination.

II. Categorize each of the following as either nominal, ordinal, interval or ratio
measurements.
11. Color of the eyes
12. Rank in the military
13. Students rating in College Admission Test
14. Basketball Scores
15. Major field of Specialization
16. Emotional Quotient
17. Bank Deposits
18. Level of Commitment of Mayors
19. Tuition Fee
20. Classification of Municipalities.

Collection of Data


Any statistical investigation must necessarily be based on correct and accurate
data. In order to ensure correctness and accuracy of the data, one must know the right
sources and methods of collecting data.

Types of Data:
1. Primary Data – refers to data or information gathered firsthand, directly from
an original source. Examples: autobiographies, diaries, first-person accounts.
2. Secondary Data – refers to information taken from published or unpublished
data previously gathered by other individuals or agencies. Examples:
books, magazines, journals, newspapers, theses and dissertations.

Methods in the Collection of Data


1. Interview Method – a person-to-person exchange between the interviewer and
the interviewee. It provides consistent and precise information.
2. Questionnaire Method – written responses are given to prepared questions. A
questionnaire elicits answers to the problem of a study.
3. Registration Method – the gathering of information is enforced by certain laws;
the information is kept systematized and available to all.
4. Observation Method – the investigator observes the behavior of persons or
organizations and their outcomes; usually used when subjects cannot talk or hear.
5. Experiment Method – used when the objective is to determine the cause-and-effect
relationship of certain phenomena under controlled conditions.

Presentation of Data

To give useful information, data should be summarized or organized into a reduced
form more appropriate for an effective analysis and interpretation.

Methods
1. Textual form – is used in presenting data in a paragraph or narrative form.
2. Tabular form – a very effective and efficient means of organizing and summarizing
data because a lot of information can be seen from a single table and it makes
comparison of figures quick under each category. Data appears in rows and
columns.

Frequency Distribution Table


It refers to the tabular arrangement of data by categories/classes with their
corresponding class frequencies.

Definition of Terms:
1. Class interval and class limits – a symbol defining a class, such as 60-62, is called
a class interval. The end numbers 60 and 62 are called class limits; the smaller
number 60 is the lower class limit and the larger number 62 is the upper class limit.
The terms class (category) and class interval are often used interchangeably,
although the class interval is actually a symbol for the class.


2. Class boundaries – more precise expressions of the class limits, obtained by
extending each limit by 0.5 of a unit (for example, the class 60-62 has boundaries
59.5 and 62.5). They are also called the true limits; a class boundary is situated
midway between the upper limit of one interval and the lower limit of the next.
3. Class frequency – refers to the number of observations belonging to a class
interval.
4. Class mark/midpoint (X) – the midpoint of the class, obtained by adding the lower
and upper limits and dividing the sum by 2.
5. Class size, width or length – the difference between the upper class boundary and
the lower class boundary of an interval.

Steps in Constructing Frequency Distribution


1. Find the largest and smallest values and compute the range:
Range = Maximum - Minimum
2. Compute the number of classes and the class width using the following
formulas:
For Number of Classes:

Sturges' Formula:
$k = 1 + 3.3 \log N$
Where: k = no. of classes, N = no. of scores/cases
(log is the common, base-10 logarithm)

For Class Width:

$c = \frac{R}{k}$
Where: c = class width/interval size; R = Range; k = no. of classes

3. Organize the class intervals.
4. Tally each score under the class interval to which it belongs.
5. Count the tallies and record the count under “f”. Then add up the
frequencies (N = total).
6. Compute the midpoint for each class interval and put it under column
X.
$X = \frac{LS + HS}{2}$
Where: X = midpoint, LS = lowest score in the class interval and
HS = highest score in the class interval.
7. Find the cumulative frequencies. Depending on what you're trying to
accomplish, it may not be necessary to find the cumulative
frequencies.

Example 1
The following are the scores of 50 students in the midterm examination
of Mathematics in the Modern World (MMW). Construct a frequency
distribution table.
50 85 91 54 62 72 68 70 79 90
58 35 52 61 93 98 60 62 76 99
64 78 49 88 73 51 69 80 93 89
68 98 66 96 55 77 57 61 70 92
46 73 83 91 79 53 62 59 82 93

Solution
Step 1: Find the largest and smallest values and compute the range.
Lowest Score = 35
Highest Score = 99
Range (R) = 99 – 35 = 64
Step 2: Compute the number of classes and the class width.
Number of classes
N = 50
$k = 1 + 3.3 \log N$
$k = 1 + 3.3 \log 50 = 6.6 \approx 7$ classes
Class width
$c = \frac{R}{k} = \frac{64}{7} = 9.14 \approx 9$

Step 3: Organize the class intervals. Use the lowest score as the lower
limit of the lowest class. Add c to each succeeding lower limit. Because
seven classes of width 9 cover only 35 to 97, an eighth class (98-106) is
needed to include the highest scores, 98 and 99.
Class interval
35-43
44-52
53-61
62-70
71-79
80-88
89-97
98-106
Steps 4 and 5: Tally each score under the class interval to which it belongs.
Summarize under column f (frequency).
Class interval   Tally           f
35-43            |               1
44-52            |||||           5
53-61            ||||| ||||      9
62-70            ||||| |||||     10
71-79            ||||| |||       8
80-88            |||||           5
89-97            ||||| ||||      9
98-106           |||             3
                                 N = 50

Step 6: Compute the midpoint for each class interval and put it under
column X.
Class interval   f    X
35-43            1    39
44-52            5    48
53-61            9    57
62-70            10   66
71-79            8    75
80-88            5    84
89-97            9    93
98-106           3    102
                 N = 50

Step 7: Find the cumulative frequencies. Depending on what you're
trying to accomplish, it may not be necessary to find the cumulative
frequencies. Here <cf is the "less than" cumulative frequency, accumulated
from the lowest class upward, and >cf is the "greater than" cumulative
frequency, accumulated from the highest class downward.

Frequency Distribution of the Results of the Midterm Examination in MMW

Class interval   f    X     <cf   >cf
35-43            1    39    1     50
44-52            5    48    6     49
53-61            9    57    15    44
62-70            10   66    25    35
71-79            8    75    33    25
80-88            5    84    38    17
89-97            9    93    47    12
98-106           3    102   50    3
                 N = 50
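
The seven steps can also be traced in code. The following is a minimal Python sketch, not part of the module itself: the function name `frequency_distribution` and the row layout are illustrative assumptions, and the sketch assumes integer scores so that a class of width c starting at a lower limit L ends at L + c - 1.

```python
import math

# Scores of the 50 students from Example 1.
scores = [
    50, 85, 91, 54, 62, 72, 68, 70, 79, 90,
    58, 35, 52, 61, 93, 98, 60, 62, 76, 99,
    64, 78, 49, 88, 73, 51, 69, 80, 93, 89,
    68, 98, 66, 96, 55, 77, 57, 61, 70, 92,
    46, 73, 83, 91, 79, 53, 62, 59, 82, 93,
]


def frequency_distribution(data):
    """Return rows of (lower, upper, f, midpoint, <cf, >cf).

    Hypothetical helper that follows Steps 1 to 7; assumes integer data.
    """
    n = len(data)
    data_range = max(data) - min(data)        # Step 1: range
    k = round(1 + 3.3 * math.log10(n))        # Step 2: Sturges' formula
    width = round(data_range / k)             # Step 2: class width
    rows = []
    lower = min(data)                         # Step 3: start at the lowest score
    less_cf = 0
    while lower <= max(data):                 # keep adding classes until all scores fit
        upper = lower + width - 1
        f = sum(lower <= x <= upper for x in data)   # Steps 4 and 5: tally and count
        midpoint = (lower + upper) / 2               # Step 6: class mark
        greater_cf = n - less_cf                     # Step 7: scores in this class or higher
        less_cf += f                                 # Step 7: scores in this class or lower
        rows.append((lower, upper, f, midpoint, less_cf, greater_cf))
        lower += width
    return rows


for lower, upper, f, x, lcf, gcf in frequency_distribution(scores):
    print(f"{lower}-{upper}\t{f}\t{x}\t{lcf}\t{gcf}")
```

Because the loop keeps adding classes until the highest score is covered, it produces eight classes for this data set even though Sturges' formula suggests seven.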

3. Graphical form
Data can also be presented in graphical form. This form is the most
effective means of organizing and presenting statistical data because the
important relationships are brought out more clearly and creatively in
virtually solid and colorful figures.

Types of Graph
a. Histogram – consists of an abscissa which depicts the class boundaries and a
perpendicular ordinate which shows the frequency of observations.
Steps: Prepare the x and y axis. Mark the x-axis representing the
class boundaries, and y-axis the frequencies. The bases of the bars
are plotted on the x-axis where the width of the base corresponds to
the real limits or class boundaries of the class interval. The center
of the base falls on the midpoint of the class interval.

[Figure: histogram titled "Scores in MMW Midterm Examination"; x-axis: Class Boundaries, y-axis: Frequency]

b. Frequency Polygon – constructed by plotting a point above the midpoint of each
class interval at a height equal to the class frequency.
Steps: Label the points on the base line. Plot the midpoints; scores
within an interval are treated as concentrated at its midpoint. When all the
midpoints are plotted, join them with a series of straight line segments; additional
points at both ends are needed to close the polygon.

[Figure: frequency polygon titled "Scores in MMW Midterm Examination"; x-axis: Class Midpoints, y-axis: Frequency]

Interpreting Organized Data


After the data have been organized or presented using frequency
distributions or graphs, analysis and interpretation come in. Interpretation is the
process of making sense of numerical data that has been collected, presented and
analyzed.


Example 2: Using the frequency distribution table below, answer the questions
that follow:

Frequency Distribution of the Results of the Midterm Examination in MMW

Class interval   f    X     <cf   >cf
35-43            1    39    1     50
44-52            5    48    6     49
53-61            9    57    15    44
62-70            10   66    25    35
71-79            8    75    33    25
80-88            5    84    38    17
89-97            9    93    47    12
98-106           3    102   50    3
                 N = 50

1. What percent of the students obtained a score within 80-88?
2. How many students got a score lower than 62?
3. If the passing score is 80, what percentage of the students passed the
examination?

Answers:
1. 10% of the students (5 out of 50) obtained a score within 80-88.
2. 15 students scored lower than 62 (the <cf of the class 53-61).
3. The percentage of the students who passed the
examination is 34% (the >cf of the class 80-88 is 17, and 17/50 = 34%).
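
The three answers can be checked directly from the class frequencies. This is a small Python sketch under the assumption that the table is stored as a list of `((lower, upper), f)` pairs; the variable names are illustrative only.

```python
# Class limits and frequencies from the table in Example 2.
table = [
    ((35, 43), 1), ((44, 52), 5), ((53, 61), 9), ((62, 70), 10),
    ((71, 79), 8), ((80, 88), 5), ((89, 97), 9), ((98, 106), 3),
]
n = sum(f for _, f in table)

# 1. Percent of students with a score within 80-88.
pct_80_88 = 100 * dict(table)[(80, 88)] / n

# 2. Number of students below 62: add the frequencies of every class
#    whose upper limit is below 62 (the <cf of the class 53-61).
below_62 = sum(f for (lower, upper), f in table if upper < 62)

# 3. Percent of students at or above the passing score of 80
#    (the >cf of the class 80-88, divided by n).
pct_passed = 100 * sum(f for (lower, upper), f in table if lower >= 80) / n

print(pct_80_88, below_62, pct_passed)
```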

Learning Activity 4.2


The following are the scores of 50 students in Mathematics. Construct a
frequency distribution table and present it using any of the graphical presentation.
43 68 20 33 39
47 60 40 47 81
80 67 48 78 48
55 69 55 55 57
70 80 63 71 63
62 69 70 39 57
40 62 25 62 52
57 54 46 81 46
79 46 52 63 28
61 37 59 48 55


Lesson 2

MEASURES OF CENTRAL TENDENCY

The first type of descriptive statistics identifies the center of the
distribution of scores. These are called measures of central tendency because they
all identify the center of the distribution in different ways. The three most
important measures of central tendency are the mean, the median and the mode.

MEAN ($\bar{x}$)
- The mean is the average of the set of scores. By far, the most common
measure of central tendency in statistics is the mean. It is also the measure
of central tendency most sensitive to extreme values.

A. Mean for Ungrouped Data


Arithmetic Mean – The most commonly used measure of central
tendency. The sum of the values of a group of items divided by the number of
such items.

The sample mean: $\bar{x} = \frac{\sum x}{n}$
Where: $\bar{x}$ = sample mean
$\sum x$ = the sum of all the scores
n = total number of cases

The population mean: $\mu = \frac{\sum x}{N}$
Where: $\mu$ = population mean
$\sum x$ = the sum of all the scores
N = total number of cases

Example 3: Consider the scores of ten people who took a make-up quiz in Algebra.
12 14 16 10 5 8 18 7 10 4
The sum of the scores is $\sum x = 104$, so the mean score is
$\bar{x} = \frac{\sum x}{n} = \frac{104}{10} = 10.4$

Weighted Arithmetic Mean – can be expressed as the sum of the values
multiplied by their corresponding weights, divided by the total weight.
The formula is: $\bar{x} = \frac{\sum fx}{\sum f}$
Where: f = weight or frequency of each item
x = value of each item


Example 4: The final grades of a student at the end of the semester are the following:
Subjects                                       Grades (x)   Units (f)
GECC 101 – Art Appreciation                    85.00        3
GECC 104 – Ethics                              88.00        3
HMGT 102 – Kitchen Essentials                  84.00        3
Tour 103 – Quality Service Management          89.00        3
GECC 103 – Mathematics in the Modern World     87.00        3
Tour 104 – Philippine Tourism Culture          90.00        3
Phed 102 – Individual/Dual Sports              92.00        2

Then the mean grade of the student is:

$\bar{x} = \frac{3(85.00) + 3(88.00) + 3(84.00) + 3(89.00) + 3(87.00) + 3(90.00) + 2(92.00)}{3 + 3 + 3 + 3 + 3 + 3 + 2} = \frac{1753}{20}$

$\bar{x} = 87.65$
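
The arithmetic mean of Example 3 and the weighted mean of Example 4 can be reproduced with a few lines of Python. This is only an illustrative sketch; the helper names `mean` and `weighted_mean` are assumptions made here, not names used by the module.

```python
def mean(values):
    """Arithmetic mean: sum of the values divided by the number of values."""
    return sum(values) / len(values)


def weighted_mean(values, weights):
    """Weighted mean: sum of f*x divided by the sum of the weights f."""
    return sum(f * x for x, f in zip(values, weights)) / sum(weights)


# Example 3: make-up quiz scores.
quiz_scores = [12, 14, 16, 10, 5, 8, 18, 7, 10, 4]
print(mean(quiz_scores))

# Example 4: final grades (x) weighted by units (f).
grades = [85.00, 88.00, 84.00, 89.00, 87.00, 90.00, 92.00]
units = [3, 3, 3, 3, 3, 3, 2]
print(round(weighted_mean(grades, units), 2))
```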

B. Mean for Grouped Data


Grouped data are data which are arranged in a frequency distribution.
There are two methods we use to compute for the mean of grouped data,
the long method and the coded method. In this module we are going to
make use of the long method or midpoint method only.

a. Long Method or Midpoint Method

The formula is:
a. $\bar{x} = \frac{\sum fX}{n}$ (for sample)

b. $\mu = \frac{\sum fX}{N}$ (for population)

Where: f = frequency of each class
X = midpoint or class mark of a class
N = total frequency in the population distribution
n = total frequency in the sample distribution

Characteristics of the Mean


1. It is the most reliable measure of central tendency.
2. Summarizes data in a way that is easy to understand.
3. Uses all the data.
4. Used in many statistical applications.
5. It is the best measure to use when the distribution is symmetrical or normal.
6. The mean is sensitive or greatly affected by extreme values.


Example 5: The following is the distribution of scores of 50 students in a
Statistics Examination. Compute for the mean using the long method.
Scores      Number of Students (f)   X     fX
91 – 97     2                        94    188
84 – 90     3                        87    261
77 – 83     10                       80    800
70 – 76     15                       73    1095
63 – 69     11                       66    726
56 – 62     6                        59    354
49 – 55     3                        52    156
            N = 50                         ΣfX = 3580

Solution
$\bar{x} = \frac{\sum fX}{n} = \frac{3580}{50} = 71.60$

- The average score of the students in the Statistics Examination is 71.60.
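
As a check on Example 5, the long (midpoint) method can be sketched in Python as follows. The function name `grouped_mean` and the `((lower, upper), f)` layout are assumptions made for illustration.

```python
# Class limits and frequencies from Example 5.
exam_classes = [
    ((91, 97), 2), ((84, 90), 3), ((77, 83), 10), ((70, 76), 15),
    ((63, 69), 11), ((56, 62), 6), ((49, 55), 3),
]


def grouped_mean(classes):
    """Long method: sum of f*X over sum of f, where X is the class midpoint."""
    total_f = sum(f for _, f in classes)
    total_fx = sum(f * (lower + upper) / 2 for (lower, upper), f in classes)
    return total_fx / total_f


print(grouped_mean(exam_classes))
```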

MEDIAN ($\tilde{x}$)
- A measure of central tendency that occupies the middle position in an
array of values. It is the number that divides the bottom 50% of the data
from the top 50%.

a. Median for Ungrouped Data

The median for ungrouped data is the middlemost value when the
data or scores are arranged in ascending or descending order.
If n is odd:
$\tilde{x} = x_{\frac{n+1}{2}}$
If n is even:
$\tilde{x} = \frac{x_{\frac{n}{2}} + x_{\frac{n}{2}+1}}{2}$

b. Median for Grouped Data


Formula:

$\tilde{x} = L_B + \left( \frac{\frac{n}{2} - cf_p}{f_{md}} \right) i$

Where: $L_B$ = lower boundary of the median class
(the median class is the class interval where $\frac{n}{2}$ is found)
$cf_p$ = cumulative frequency (<cf) before the median class
$i$ = class width (interval size)
$f_{md}$ = frequency of the median class

Characteristics of the Median


1. Easy to understand and easy to compute.
2. The point/score that divides the distribution into two halves.
3. The median is not affected by extreme values.

Example 6: The Median for Ungrouped Data

Compute for the median of the following scores.
a. 3, 9, 2, 8, 5, 7
Arrange the scores:
x1 = 2, x2 = 3, x3 = 5, x4 = 7, x5 = 8, x6 = 9
Since n is even:
$\tilde{x} = \frac{x_{\frac{n}{2}} + x_{\frac{n}{2}+1}}{2} = \frac{x_{\frac{6}{2}} + x_{\frac{6}{2}+1}}{2} = \frac{x_3 + x_4}{2} = \frac{5 + 7}{2} = 6$

b. 34, 56, 27, 25, 98, 12, 32, 54, 47
Arrange the scores:
x1 = 12, x2 = 25, x3 = 27, x4 = 32, x5 = 34,
x6 = 47, x7 = 54, x8 = 56, x9 = 98

Since n is odd:
$\tilde{x} = x_{\frac{n+1}{2}} = x_{\frac{9+1}{2}} = x_{\frac{10}{2}} = x_5 = 34$
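
The odd and even cases of the ungrouped median translate directly into code. The sketch below is illustrative; `median` is a name chosen here, and Python's zero-based indexing means that the 1-based position (n + 1)/2 becomes index n // 2.

```python
def median(values):
    """Middle value of the sorted data (odd n) or mean of the two middle values (even n)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # x_{(n+1)/2} in 1-based notation
    return (ordered[mid - 1] + ordered[mid]) / 2  # (x_{n/2} + x_{n/2 + 1}) / 2


print(median([3, 9, 2, 8, 5, 7]))                    # Example 6a
print(median([34, 56, 27, 25, 98, 12, 32, 54, 47]))  # Example 6b
```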

Example 7. The Median for Grouped Data

Scores      Number of Students (f)   <cf
91 – 97     2                        50
84 – 90     3                        48
77 – 83     10                       45
70 – 76     15                       35
63 – 69     11                       20
56 – 62     6                        9
49 – 55     3                        3
            N = 50

Find n/2 = 50/2 = 25. Locate 25 in the <cf column to find the median
class: the first class whose <cf reaches 25 is 70 – 76, so it is the median class.

$\tilde{x} = L_B + \left( \frac{\frac{n}{2} - cf_p}{f_{md}} \right) i$

$\tilde{x} = 69.5 + \left( \frac{25 - 20}{15} \right) \times 7 = 71.83$
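
The grouped-median computation of Example 7 can be sketched in Python as below. The function name `grouped_median` is an illustrative assumption, the classes must be listed from lowest to highest, and integer class limits are assumed so that the lower boundary is the lower limit minus 0.5.

```python
# (lower limit, upper limit, f) from Example 7, listed from the lowest class up.
exam_classes = [
    (49, 55, 3), (56, 62, 6), (63, 69, 11), (70, 76, 15),
    (77, 83, 10), (84, 90, 3), (91, 97, 2),
]


def grouped_median(classes):
    """Median for grouped data: L_B + ((n/2 - cf_p) / f_md) * i."""
    n = sum(f for _, _, f in classes)
    cf_before = 0                                # <cf before the current class
    for lower, upper, f in classes:
        if cf_before + f >= n / 2:               # first class whose <cf reaches n/2
            lower_boundary = lower - 0.5         # assumes integer class limits
            width = upper - lower + 1
            return lower_boundary + (n / 2 - cf_before) / f * width
        cf_before += f


print(round(grouped_median(exam_classes), 2))
```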

MODE ($\hat{x}$)

- It is the most frequently occurring score in a series.
- A distribution that consists of only one of each score has n modes. A
distribution where a single score is most frequent has one mode and is
called unimodal. When there are ties for the most frequent score, the
distribution is bimodal if two scores tie or multimodal if more than two
scores tie.

A. Mode for Ungrouped Data


The most frequently occurring score is the mode. Arrange the scores
from least to greatest and inspect them to determine the score/s that has/have
the most occurrences. The score or value which occurs the greatest number of
times in the data is the mode.

B. Mode for Grouped Data


The class with the highest frequency is the modal class.

The formula is:

$\hat{x} = L_B + \left( \frac{d_1}{d_1 + d_2} \right) i$

Where: $L_B$ = lower boundary of the modal class
$d_1$ = the difference between the frequency of the modal class
and the frequency of the class preceding (next lower than) the modal class
$d_2$ = the difference between the frequency of the modal class
and the frequency of the class after (next higher than) the modal class
$i$ = class width (interval size)

Example 8: Find the mode of the following scores.


23 21 24 21 23 30 16 18 23 28 25 26
Arrange the scores, so that it will be easier to find the most frequently occurring
score.
16 18 21 21 23 23 23 24 25 26 28 30
The mode is 23.
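
Counting occurrences by hand can be mirrored with `collections.Counter` from the Python standard library. The helper name `modes` is an assumption for illustration; it returns a list so that bimodal and multimodal data are handled as well.

```python
from collections import Counter


def modes(values):
    """Return every value that occurs the greatest number of times."""
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)


scores = [23, 21, 24, 21, 23, 30, 16, 18, 23, 28, 25, 26]  # Example 8
print(modes(scores))
```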

Example 9: The following is the distribution of scores of 50 students in a
Statistics Examination. Compute for the mode.
Find the class with the highest frequency.
Scores      Number of Students (f)
91 – 97     2
84 – 90     3
77 – 83     10
70 – 76     15
63 – 69     11
56 – 62     6
49 – 55     3
            N = 50

The modal class is 70 – 76, with a frequency of 15.
d1 = 15 - 11 = 4
d2 = 15 - 10 = 5

$\hat{x} = L_B + \left( \frac{d_1}{d_1 + d_2} \right) i$

$\hat{x} = 69.5 + \left( \frac{4}{4 + 5} \right) \times 7 = 72.61$
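
The grouped-mode formula of Example 9 can be sketched in Python. The name `grouped_mode` is illustrative; the sketch assumes integer class limits, classes listed from lowest to highest, a single modal class, and it treats a missing neighbouring class as having frequency 0, which is an assumption not stated in the module.

```python
# (lower limit, upper limit, f) from Example 9, listed from the lowest class up.
exam_classes = [
    (49, 55, 3), (56, 62, 6), (63, 69, 11), (70, 76, 15),
    (77, 83, 10), (84, 90, 3), (91, 97, 2),
]


def grouped_mode(classes):
    """Mode for grouped data: L_B + (d1 / (d1 + d2)) * i."""
    freqs = [f for _, _, f in classes]
    m = freqs.index(max(freqs))                        # position of the modal class
    lower, upper, f_modal = classes[m]
    f_lower = freqs[m - 1] if m > 0 else 0             # next lower class (assumed 0 if absent)
    f_higher = freqs[m + 1] if m < len(freqs) - 1 else 0
    d1 = f_modal - f_lower
    d2 = f_modal - f_higher
    width = upper - lower + 1
    return (lower - 0.5) + d1 / (d1 + d2) * width


print(round(grouped_mode(exam_classes), 2))
```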


Learning Activity 4.3

Answer the following

1. Given the following scores: 7, 13, 8, 5, 9, 12, 15, 22, 10, and 9 find the mean,
median and mode.
2. Using the learning activity in lesson 1, compute the mean, median and mode of
the set of scores.


Lesson 3

MEASURES OF DISPERSION

Measure of Dispersion is a measure that describes how spread out or
scattered a set of data is. It is also known as a measure of variation or measure of
spread.
A. Range
- the simplest of the measures of dispersion.
- the difference between the highest (maximum) and lowest (minimum)
values.

- $R = \text{Highest Value} - \text{Lowest Value}$ (for ungrouped data)
- $R = U.B._H - L.B._L$ (for grouped data)
Where: $U.B._H$ is the upper boundary of the highest class
$L.B._L$ is the lower boundary of the lowest class

Characteristics of the Range:

1. Easy to compute and understand.
2. Emphasizes the extreme values.
3. Most unstable and unreliable measure, because its value can fluctuate
greatly with a change in just a single score, either the lowest or the highest
score.

Example 10: Find the range of the following sets of data:

Set A: 5, 7, 8, 8, 9, 10, 11, 12 → Range = HV – LV = 12 – 5 = 7
Set B: 8, 9, 9, 10, 11, 11, 12, 12 → Range = HV – LV = 12 – 8 = 4
Set C: 7, 7, 8, 8, 9, 9, 10, 10, 10 → Range = HV – LV = 10 – 7 = 3
Interpretation: Based on the computed range for sets A, B, and C, it can be
concluded that set A has greater variability than B and C.
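
The three ranges in Example 10 can be verified with a one-line helper. The name `value_range` is an illustrative choice (`range` itself is a Python built-in and is deliberately not reused).

```python
def value_range(values):
    """Range for ungrouped data: highest value minus lowest value."""
    return max(values) - min(values)


set_a = [5, 7, 8, 8, 9, 10, 11, 12]
set_b = [8, 9, 9, 10, 11, 11, 12, 12]
set_c = [7, 7, 8, 8, 9, 9, 10, 10, 10]
print(value_range(set_a), value_range(set_b), value_range(set_c))
```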

B. Variance and Standard Deviation

Variance ($s^2$ and $\sigma^2$)

- Examines how far, on average, each score is away from the mean. The
sample variance is symbolized as $s^2$ and the population variance as $\sigma^2$.
- The variance of a population is equal to the sum of the squared deviations
about the mean divided by the number of scores.


Variance of a Population
- The average of the squares of the distances from the population mean. It is
the sum of the squares of the deviations from the mean divided by the
population size. The units of the variance are the squares of the units of the
population data.
Variance of a Population Formula

$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$

Where: $\sigma^2$ = variance of a population
X = values of observations in the population
μ = population mean
N = total number of observations in the population

Variance of a Sample
- Unbiased estimator of a population variance. Instead of dividing by the
population size, the sum of the squares of the deviations from the sample
mean is divided by one less than the sample size. The units of the variance
are the squares of the units of the data.

Variance of a Sample Formula

$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$

Where: $s^2$ = variance of a sample
x = values of observations in the sample
$\bar{x}$ = sample mean
n = total number of observations in the sample


Standard Deviation
- The square root of the variance. The population standard deviation is the
square root of the population variance and the sample standard deviation is
the square root of the sample variance. The units of the standard deviation
are the same as the units of the population/sample.

Standard Deviation of a Population Formula

$\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}}$
Where: σ = standard deviation of a population
X = values of observations in the population
μ = population mean
N = total number of observations in the population

Standard Deviation of a Sample

$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
Where: s = standard deviation of a sample
x = values of observations in the sample
$\bar{x}$ = sample mean
n = total number of observations in the sample

Variance and Standard Deviation for Ungrouped Data


To compute for the variance of ungrouped data, the following steps should
be undertaken:
1. Find the Mean of the set of scores.
2. Subtract the Mean from each score/number and square the result
3. Then get the summation of those squared differences.
4. To compute for the variance, divide the summation by the total number
of scores minus 1.
5. To compute for the standard deviation, just get the square root of the
variance.


Example 11: Compute for the variance and standard deviation of the following
sample data:
x      $x - \bar{x}$   $(x - \bar{x})^2$
22     -5              25
24     -3              9
26     -1              1
28     1               1
30     3               9
32     5               25
$\sum x = 162$         $\sum (x - \bar{x})^2 = 70$

$\bar{x} = \frac{\sum x}{n} = \frac{162}{6} = 27$

$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{70}{6 - 1} = \frac{70}{5} = 14.00$

$s = \sqrt{14.00} = 3.74$
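
The five steps for ungrouped data map onto a short Python sketch. The function names are illustrative assumptions; the standard library's `statistics.variance` and `statistics.stdev` compute the same sample quantities and could be used instead.

```python
import math


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n                              # Step 1
    squared_devs = [(x - mean) ** 2 for x in values]    # Step 2
    return sum(squared_devs) / (n - 1)                  # Steps 3 and 4


def sample_std(values):
    """Square root of the sample variance (Step 5)."""
    return math.sqrt(sample_variance(values))


data = [22, 24, 26, 28, 30, 32]   # Example 11
print(sample_variance(data), round(sample_std(data), 2))
```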

The Short Cut formula or the Sum of Squares formula

The shortcut formula for the computation of the variance and standard
deviation does not utilize the mean.
To use the formulas, simply square the individual scores and get the
summation of the scores and the squared scores, then substitute the values into the
formula.

Variance of a Sample:
$s^2 = \frac{n \sum x^2 - (\sum x)^2}{n(n - 1)}$

Standard Deviation of a Sample:
$s = \sqrt{\frac{n \sum x^2 - (\sum x)^2}{n(n - 1)}}$

Where: $s^2$ = variance of a sample
s = standard deviation of a sample
x = values of observations in the sample
n = total number of observations in the sample
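
As a quick check that the shortcut formula agrees with the definition, the sketch below applies it to the data of Example 11, where Σx = 162 and Σx² = 4444. The function name is a hypothetical choice for illustration.

```python
import math


def sample_variance_shortcut(values):
    """Shortcut formula: (n * sum(x^2) - (sum x)^2) / (n * (n - 1))."""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return (n * sum_x2 - sum_x ** 2) / (n * (n - 1))


data = [22, 24, 26, 28, 30, 32]   # same sample as Example 11
variance = sample_variance_shortcut(data)
print(variance, round(math.sqrt(variance), 2))
```

The result matches the value obtained from the squared deviations, without computing the mean first.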


VARIANCE AND STANDARD DEVIATION FORMULAS FOR GROUPED DATA

Variance of a Sample:
$s^2 = \frac{n \sum fX^2 - (\sum fX)^2}{n(n - 1)}$

Standard Deviation of a Sample:
$s = \sqrt{\frac{n \sum fX^2 - (\sum fX)^2}{n(n - 1)}}$

Where: $s^2$ = sample variance
s = sample standard deviation
X = midpoint or class mark
f = frequency in a class
n = total number of observations in the sample

Example 12: Compute for the Variance and Standard Deviation

Height (in inches)   f    X    fX    $X^2$   $fX^2$
45 – 49              3    47   141   2209    6627
50 – 54              4    52   208   2704    10816
55 – 59              6    57   342   3249    19494
60 – 64              7    62   434   3844    26908
65 – 69              10   67   670   4489    44890
70 – 74              7    72   504   5184    36288
75 – 79              6    77   462   5929    35574
80 – 84              4    82   328   6724    26896
85 – 89              3    87   261   7569    22707
                     n = 50    $\sum fX = 3350$   $\sum fX^2 = 230200$

$s^2 = \frac{n \sum fX^2 - (\sum fX)^2}{n(n - 1)} = \frac{50(230200) - 3350^2}{50(49)} = \frac{287500}{2450} = 117.35$

$s = \sqrt{\frac{n \sum fX^2 - (\sum fX)^2}{n(n - 1)}} = \sqrt{117.35} = 10.83$
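
The grouped-data computation of Example 12 can be reproduced with the following Python sketch. The function name `grouped_sample_variance` and the `(lower, upper, f)` layout are assumptions made for illustration.

```python
import math

# (lower limit, upper limit, f) from Example 12, heights in inches.
height_classes = [
    (45, 49, 3), (50, 54, 4), (55, 59, 6), (60, 64, 7), (65, 69, 10),
    (70, 74, 7), (75, 79, 6), (80, 84, 4), (85, 89, 3),
]


def grouped_sample_variance(classes):
    """(n * sum(f*X^2) - (sum(f*X))^2) / (n * (n - 1)), with X the class midpoint."""
    n = sum(f for _, _, f in classes)
    sum_fx = sum(f * (lo + hi) / 2 for lo, hi, f in classes)
    sum_fx2 = sum(f * ((lo + hi) / 2) ** 2 for lo, hi, f in classes)
    return (n * sum_fx2 - sum_fx ** 2) / (n * (n - 1))


variance = grouped_sample_variance(height_classes)
print(round(variance, 2), round(math.sqrt(variance), 2))
```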

Interpreting the Standard Deviation


The standard deviation is the most useful and important measure of
variation/dispersion. It is widely used in research and is used in drawing inferences

GECC 103 – Mathematics in the Modern World Module 4


23

from samples to populations. The interpretation of the standard deviation is of


great importance in Research and Statistics.

 Chebyshev’s Theorem
Chebyshev’s theorem gives a lower bound on the proportion of scores in a frequency distribution that lie within a given distance of the mean.

Chebyshev’s Theorem states that the proportion or percentage of any data set that lies within k standard deviations of the mean (where k is any number greater than 1) is at least

$$1 - \frac{1}{k^2}$$

Example 13: If the mean score of the students enrolled in an English class is 66 points with a standard deviation of 5 points, at least what percentage of the scores must lie between 46 and 86?
Solution: $\bar{x} - k(s) = 46$
$66 - k(5) = 46$
$5k = 20 \rightarrow k = 4$

$$1 - \frac{1}{k^2} = 1 - \frac{1}{4^2} = 1 - \frac{1}{16} = \frac{15}{16} = 0.9375 \text{ or } 93.75\%$$

∴ At least 93.75% of the data lie between 46 and 86.
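The steps of Example 13 can be sketched in code: find k from the interval endpoints, then evaluate the bound. The helper name and its symmetry check are assumptions made for illustration, not part of the module.

```python
def chebyshev_lower_bound(mean, std_dev, low, high):
    """Minimum proportion of data guaranteed to lie between low and high.

    The interval is assumed to be symmetric about the mean, as in Example 13.
    """
    k = (high - mean) / std_dev
    if abs((mean - low) / std_dev - k) > 1e-9:
        raise ValueError("interval must be symmetric about the mean")
    if k <= 1:
        raise ValueError("the bound is informative only for k greater than 1")
    return 1 - 1 / k ** 2

bound = chebyshev_lower_bound(mean=66, std_dev=5, low=46, high=86)
print(f"At least {bound:.2%} of the scores lie between 46 and 86")
```

Because the theorem holds for any data set, the result is a guaranteed minimum, not an estimate of the actual percentage.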

Learning Activity 4.4

Answer the following

1. Given the following scores: 7, 13, 8, 5, 9, 12, 15, 22, 10, and 9 find the range,
variance and standard deviation.

2. Using the learning activity in lesson 1, compute the range, variance and
standard deviation of the set of scores.

Lesson 4

MEASURES OF RELATIVE POSITION



Measures of Relative Position are conversions of values, usually standardized test scores, to show where a given value stands in relation to other values of the same grouping.

Standard Scores (z-Scores)

A standard score (or z-score) indicates how many standard deviations an element is from the mean. A standard score can be calculated from the following formula.

$$z = \frac{X - \mu}{\sigma}$$

where z is the z-score, X is the value of the element or the raw score, μ is the population mean, and σ is the standard deviation.

How to interpret z-scores:

• A z-score less than 0 represents an element less than the mean.
• A z-score greater than 0 represents an element greater than the mean.
• A z-score equal to 0 represents an element equal to the mean.
• A z-score equal to 1 represents an element that is 1 standard deviation greater than the mean; a z-score equal to 2, 2 standard deviations greater than the mean; etc.
• A z-score equal to -1 represents an element that is 1 standard deviation less than the mean; a z-score equal to -2, 2 standard deviations less than the mean; etc.

Example 14: Veniz scored 55 on a mathematics test that had a mean of 45 and a standard deviation of 10. On an English test with a mean of 56 and a standard deviation of 12, she scored 70. Compare her relative positions on the two tests.
Solution: Convert her scores for the two tests to standard scores:
For Mathematics:

$$z = \frac{X - \mu}{\sigma} = \frac{55 - 45}{10} = 1.00$$

For English:

$$z = \frac{X - \mu}{\sigma} = \frac{70 - 56}{12} = 1.17$$

Since the standard score for English is larger, her relative position in English is higher than her relative position in Mathematics.

Example 15: Suppose that the mean of a test is 122 and the standard deviation s is 24. If Josie earns a score of 146 on the test, her deviation from the mean is 146 − 122 = 24. Dividing Josie’s deviation of 24 by the s of the test gives her a z-score of 1.00. If Edlyn’s score is 110, then what is Edlyn’s z-score?

$$z = \frac{110 - 122}{24} = -0.50$$

Example 16: Two equivalent intelligence tests are given to similar groups, but the tests are designed with different scales. The statistics for the tests are listed below. Which is better: a score of 145 on Test I or a score of 60 on Test II?

Test I          Test II
Mean = 100      Mean = 40
s = 15          s = 5

z-score for Test I:

$$z = \frac{145 - 100}{15} = 3.00$$

z-score for Test II:

$$z = \frac{60 - 40}{5} = 4.00$$

Therefore, a score of 145 on Test I is 3.00 standard deviations above the mean and a score of 60 on Test II is 4.00 standard deviations above the mean. This implies that 60 on Test II is the better score.
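The z-score comparisons in Examples 14 to 16 all follow the same pattern, sketched below. The function and variable names are illustrative; the numbers are taken from the examples.

```python
def z_score(raw, mean, std_dev):
    """Number of standard deviations that raw lies above (+) or below (-) the mean."""
    return (raw - mean) / std_dev

# Example 14: Veniz's two tests
veniz_math = z_score(55, mean=45, std_dev=10)
veniz_english = z_score(70, mean=56, std_dev=12)

# Example 15: Edlyn's score
edlyn = z_score(110, mean=122, std_dev=24)

# Example 16: scores on two tests with different scales
test_1 = z_score(145, mean=100, std_dev=15)
test_2 = z_score(60, mean=40, std_dev=5)

print(round(veniz_math, 2), round(veniz_english, 2))
print(edlyn)
print("Test II is the better score" if test_2 > test_1 else "Test I is the better score")
```

Converting to z-scores first is what makes scores from different scales comparable; the raw scores 145 and 60 cannot be compared directly.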

PERCENTILE
A percentile is a measure indicating the value below which a given percentage
of observations in a group of observations fall. For example, the 80th percentile is
the value below which 80% of the observations may be found.

Percentile for a Given Data Value


Given a set of data and a data value x,

$$\text{Percentile of score } x = \frac{\text{number of data values less than } x}{\text{total number of data values}} \times 100$$

Example 17: On an examination given to 4,500 students, Jedd’s score of 340 was higher than the scores of 2,898 students who took the examination. What is the percentile of Jedd’s score?

Solution:

$$\text{Percentile} = \frac{2898}{4500} \times 100 = 0.644 \times 100 = 64.4 \approx 64$$

Jedd’s score of 340 places him at the 64th percentile.
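The percentile formula used in Example 17 can be sketched in two small functions: one that works from a count of lower scores (as in the example) and one that counts directly from a list of data values. Both names are hypothetical helpers introduced for illustration.

```python
def percentile_from_count(count_below, total):
    """Percentile of a score, given how many data values fall below it."""
    return count_below / total * 100

def percentile_of_score(values, x):
    """Percentile of x within a list of data values."""
    below = sum(1 for v in values if v < x)
    return percentile_from_count(below, len(values))

# Example 17: a score higher than 2,898 of 4,500 scores
example_17 = percentile_from_count(2898, 4500)
print(round(example_17))
```

The second function applies the same formula when the raw data are available, for example `percentile_of_score([1, 2, 3, 4], 3)` counts two values below 3 out of four.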

QUARTILE
Quartiles are the values that divide a distribution, arranged in order of magnitude, into four (4) equal parts.
• Q1 – the value below which the first one-fourth of the distribution falls.
• Q2 – the value below which two-fourths or half of the distribution falls. This is also the median of the distribution.
• Q3 – the value below which three-fourths of the distribution falls.

Example 18: Find Q1, Q2 and Q3 of the following scores.

23, 21, 24, 21, 22, 30, 16, 18, 22, 28, 25, 26, 30
Solution:
Step 1: Arrange the scores.
16 18 21 21 22 22 23 24 25 26 28 30 30
Step 2: Find the median or Q2. The median is 23.
Step 3: Find the median of the data values that fall below Q2.
16 18 21 21 22 22
$$\text{median} = \frac{21 + 21}{2} = 21$$
Q1 = 21
Step 4: Find the median of the data values that are above Q2.
24 25 26 28 30 30
$$\text{median} = \frac{26 + 28}{2} = 27$$
Q3 = 27
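The four steps of Example 18 can be sketched as a small function. It follows the median-of-halves procedure shown above, in which the overall median is left out of both halves when the number of scores is odd; other textbooks and software packages use slightly different quartile conventions, so results may differ elsewhere. The function name is illustrative.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves steps of Example 18."""
    data = sorted(values)                  # Step 1: arrange the scores
    n = len(data)
    half = n // 2
    lower = data[:half]                    # values below the median position
    upper = data[half + 1:] if n % 2 else data[half:]  # values above it
    return median(lower), median(data), median(upper)

scores = [23, 21, 24, 21, 22, 30, 16, 18, 22, 28, 25, 26, 30]
q1, q2, q3 = quartiles(scores)
print(q1, q2, q3)
```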

Box-and-Whisker Plots
A box-and-whisker plot or boxplot is a diagram based on the five-number
summary of a data set. The five-number summary of a data set consists of the five
numbers determined by computing the minimum, Q1 or the 1st Quartile, median, Q3
or the 3rd Quartile, and maximum value of the data set.
To construct a box-and-whisker plot, first draw an equal interval scale on
which to make the box plot. The boxplot is a visual representation of the
distribution of the data. Greater distances in the diagram should correspond to
greater distances between numeric values.
Using the equal interval scale, draw a rectangular box with one end at Q1 and the other end at Q3. Then draw a vertical segment inside the box at the median value. Finally, draw two horizontal segments, one from the Q1 end of the box down to the minimum value and one from the Q3 end of the box up to the maximum value (these segments are called the "whiskers").

Example 19: Draw a box-and-whisker plot for the data set 16, 18, 21, 21, 22, 22,
23, 24, 25, 26, 28, 30, 30.
Solution:
1. Find/Compute for the five-number summary:
Minimum =16, Q1 = 21, Median = 23, Q3 = 27 and Maximum = 30.
2. Plot the values.


Learning Activity 4.5

Answer the following


1. A data set has a mean of 212 and a standard deviation of 40. Find the z-
score for each of the following
a. x = 200 b. x = 224
c. x = 300 d. x = 100
2. On a reading test, Alyson’s score of 455 was higher than the scores of
4256 of the 7210 students who took the test. Find the percentile,
rounded to the nearest percent, for Alyson’s score.
3. The following are grades of 20 students in their Research project: 85, 88,
87, 86, 89, 90, 92, 88, 94, 90, 85, 89, 85, 84, 82, 87, 83, 84, 91, and 92.
Find:
a. Median
b. Q1 
c. Q3
d. Percentile rank of 89
e. Draw a boxplot of the data set.


Lesson 5

PROBABILITIES AND NORMAL DISTRIBUTION

The normal probability curve is the most commonly used theoretical distribution in statistical inference. De Moivre developed the mathematical equation of the normal curve in 1733. It is sometimes called the Gaussian distribution in honor of Carl Friedrich Gauss, who derived the equation in the 19th century.
In most cases, this is used to describe the distribution of variables such as grades of students, weights or heights of persons, incomes of families, or IQ.

The Normal Curve


A normal curve is a bell-shaped curve which shows the probability distribution of a continuous random variable. The normal curve represents a normal distribution. The total area under the normal curve is one. The parameters involved in a normal distribution are the mean (μ) and the standard deviation (σ).

Characteristics of the Normal Curve


1. The curve is symmetrical and bell-shaped.
2. The number of cases, N, is infinite.
3. The three measures of central tendency (mean, median and mode) coincide at one point at the center of the distribution.
4. The height of the curve indicates the frequency of cases, expressed as probability, proportion or percentage.
5. The basic unit of measurement is expressed in sigma units (σ) or standard deviations along the baseline. The sigma units are also called z-scores (x/σ, where x is the deviation of a score from the mean).
6. Two parameters are used to describe the curve: the mean (μ) and the standard deviation (σ). For the standard normal curve, the mean is equal to zero (μ = 0) and the standard deviation is equal to 1 (σ = 1).
7. Standard deviations or z-scores to the right of the mean (above the mean) are expressed as positive values, while those to the left of the mean (below the mean) are expressed as negative values.


The Empirical Rule


In a normal distribution, approximately:
- 68% of the data lie within 1 standard deviation from the mean.
- 95% of the data lie within 2 standard deviations from the mean.
- 99.7% of the data lie within 3 standard deviations from the mean.
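
The three percentages in the Empirical Rule are rounded areas under the normal curve. A minimal sketch, assuming Python 3.8 or later for `statistics.NormalDist`, recovers them from the standard normal distribution; the helper name is illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def proportion_within(k):
    """Area under the standard normal curve between -k and +k standard deviations."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {proportion_within(k):.2%}")
```

The computed areas are slightly more precise than the rule itself, which is why the rule is stated as approximate.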

Standard Normal Distribution


The standard normal curve represents a normal curve with mean 0 and standard deviation 1.
It is helpful to convert raw scores to z-scores using the following formulas:
For population:

$$z_x = \frac{x - \mu}{\sigma}$$

For sample:

$$z_x = \frac{x - \bar{x}}{s}$$

Tables and calculators are used to determine the area under the normal curve. The following table of Areas under the Normal Curve will help. Each entry is the area between the mean (z = 0) and the given z-score. Since the normal curve is symmetrical, the area for a negative z-score is the same as the area for the corresponding positive z-score.


Areas under the Normal Curve


Z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
0.0 0.0000 0.0040 0.0080 0.0120 0.0160 0.0199 0.0239 0.0279 0.0319 0.0359
0.1 0.0398 0.0438 0.0478 0.0517 0.0557 0.0596 0.0636 0.0675 0.0714 0.0753
0.2 0.0793 0.0832 0.0871 0.0910 0.0948 0.0987 0.1026 0.1064 0.1103 0.1141
0.3 0.1179 0.1217 0.1255 0.1293 0.1331 0.1368 0.1406 0.1443 0.1480 0.1517
0.4 0.1554 0.1591 0.1628 0.1664 0.1700 0.1736 0.1772 0.1808 0.1844 0.1879
0.5 0.1915 0.1950 0.1985 0.2019 0.2054 0.2088 0.2123 0.2157 0.2190 0.2224
0.6 0.2257 0.2291 0.2324 0.2357 0.2389 0.2422 0.2454 0.2486 0.2517 0.2549
0.7 0.2580 0.2611 0.2642 0.2673 0.2704 0.2734 0.2764 0.2794 0.2823 0.2852
0.8 0.2881 0.2910 0.2939 0.2967 0.2995 0.3023 0.3051 0.3078 0.3106 0.3133
0.9 0.3159 0.3186 0.3212 0.3238 0.3264 0.3289 0.3315 0.3340 0.3365 0.3389
1.0 0.3413 0.3438 0.3461 0.3485 0.3508 0.3531 0.3554 0.3577 0.3599 0.3621
1.1 0.3643 0.3665 0.3686 0.3708 0.3729 0.3749 0.3770 0.3790 0.3810 0.3830
1.2 0.3849 0.3869 0.3888 0.3907 0.3925 0.3944 0.3962 0.3980 0.3997 0.4015
1.3 0.4032 0.4049 0.4066 0.4082 0.4099 0.4115 0.4131 0.4147 0.4162 0.4177
1.4 0.4192 0.4207 0.4222 0.4236 0.4251 0.4265 0.4279 0.4292 0.4306 0.4319
1.5 0.4332 0.4345 0.4357 0.4370 0.4382 0.4394 0.4406 0.4418 0.4429 0.4441
1.6 0.4452 0.4463 0.4474 0.4484 0.4495 0.4505 0.4515 0.4525 0.4535 0.4545
1.7 0.4554 0.4564 0.4573 0.4582 0.4591 0.4599 0.4608 0.4616 0.4625 0.4633
1.8 0.4641 0.4649 0.4656 0.4664 0.4671 0.4678 0.4686 0.4693 0.4699 0.4706
1.9 0.4713 0.4719 0.4726 0.4732 0.4738 0.4744 0.4750 0.4756 0.4761 0.4767
2.0 0.4772 0.4778 0.4783 0.4788 0.4793 0.4798 0.4803 0.4808 0.4812 0.4817
2.1 0.4821 0.4826 0.4830 0.4834 0.4838 0.4842 0.4846 0.4850 0.4854 0.4857
2.2 0.4861 0.4864 0.4868 0.4871 0.4875 0.4878 0.4881 0.4884 0.4887 0.4890
2.3 0.4893 0.4896 0.4898 0.4901 0.4904 0.4906 0.4909 0.4911 0.4913 0.4916
2.4 0.4918 0.4920 0.4922 0.4925 0.4927 0.4929 0.4931 0.4932 0.4934 0.4936
2.5 0.4938 0.4940 0.4941 0.4943 0.4945 0.4946 0.4948 0.4949 0.4951 0.4952
2.6 0.4953 0.4955 0.4956 0.4957 0.4959 0.4960 0.4961 0.4962 0.4963 0.4964
2.7 0.4965 0.4966 0.4967 0.4968 0.4969 0.4970 0.4971 0.4972 0.4973 0.4974
2.8 0.4974 0.4975 0.4976 0.4977 0.4977 0.4978 0.4979 0.4979 0.4980 0.4981
2.9 0.4981 0.4982 0.4982 0.4983 0.4984 0.4984 0.4985 0.4985 0.4986 0.4986
3.0 0.4987 0.4987 0.4987 0.4988 0.4988 0.4989 0.4989 0.4989 0.4990 0.4990
3.1 0.4990 0.4991 0.4991 0.4991 0.4992 0.4992 0.4992 0.4992 0.4993 0.4993
3.2 0.4993 0.4993 0.4994 0.4994 0.4994 0.4994 0.4994 0.4995 0.4995 0.4995
3.3 0.4995 0.4995 0.4995 0.4996 0.4996 0.4996 0.4996 0.4996 0.4996 0.4997
3.4 0.4997 0.4997 0.4997 0.4997 0.4997 0.4997 0.4997 0.4997 0.4997 0.4998
3.5 0.4998 0.4998 0.4998 0.4998 0.4998 0.4998 0.4998 0.4998 0.4998 0.4998
3.6 0.4998 0.4998 0.4999 0.4999 0.4999 0.4999 0.4999 0.4999 0.4999 0.4999
3.7 0.4999 0.4999 0.4999 0.4999 0.4999 0.4999 0.4999 0.4999 0.4999 0.4999
3.8 0.4999 0.4999 0.4999 0.4999 0.4999 0.4999 0.4999 0.4999 0.4999 0.4999
3.9 0.5000 0.5000 0.5000 0.5000 0.5000 0.5000 0.5000 0.5000 0.5000 0.5000


Example 20: Find the area under the standard normal curve for the following z-
scores and draw and shade the corresponding area on the curve.
a. Between z = 0 and z = 0.50

Solution: Using the table, the area between the mean and a z-score of 0.50
corresponds to .1915. Hence, the area is 0.1915 or 19.15%.

b. Between z = -1.50 and z = 0.50

Solution: Using the table, the area from z = -1.50 to the mean (0) is .4332 and the area from the mean to z = 0.50 is .1915. Hence, the total area is equal to the sum of 0.4332 and 0.1915, which is 0.6247 or 62.47%.

c. Between z = - 2.40 and z = 0

Solution: Using the table, the area between a z-score of -2.4 and the mean
is .4918. Hence, the area is 0.4918 or 49.18%.


d. To the left of z = 2.30

Solution: Using the table, the area from the mean to z = 2.3 is 0.4893. Adding the area of 0.5 that lies to the left of the mean, the total area to the left of z = 2.30 is 0.9893 or 98.93%.

e. To the right of z = 1.00

Solution: Using the table, the area from the mean to a z-score of 1.0 is 0.3413. Subtracting this from the area of 0.5 that lies to the right of the mean, the total area to the right of z = 1.0 is 0.1587 or 15.87%.
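
The five areas in Example 20 can be cross-checked without the printed table. The sketch below assumes Python 3.8 or later and uses the cumulative distribution function of the standard normal curve; the dictionary keys simply mirror the lettered parts of the example.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

def area_between(z_low, z_high):
    """Area under the standard normal curve between two z-scores."""
    return Z.cdf(z_high) - Z.cdf(z_low)

areas = {
    "a": area_between(0, 0.50),       # between z = 0 and z = 0.50
    "b": area_between(-1.50, 0.50),   # between z = -1.50 and z = 0.50
    "c": area_between(-2.40, 0),      # between z = -2.40 and z = 0
    "d": Z.cdf(2.30),                 # to the left of z = 2.30
    "e": 1 - Z.cdf(1.00),             # to the right of z = 1.00
}

for part, area in areas.items():
    print(part, round(area, 4))
```

The cumulative function already includes the 0.5 area to the left of the mean, so parts d and e need no separate addition or subtraction of table entries.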


Learning Activity 4.6

Answer the following


1. Find the area under the standard normal curve for the following z-scores
and draw and shade the corresponding area on the curve.
a. between z = -1.5 and z = 2.12
b. to the right of z = 2.33
c. to the left of z = 0.64

2. A survey of 1000 women ages 20 to 30 found that their heights were normally distributed, with a mean of 65 in. and a standard deviation of 2.5 in. (a) How many of the women have a height that is within 1 standard deviation of the mean? (b) How many of the women have a height that is between 60 in. and 70 in.?

3. The cholesterol levels of a group of young women at a university are normally distributed, with a mean of 185 and a standard deviation of 39. What percent of the young women have a cholesterol level
a. greater than 219?
b. between 190 and 225?


Lesson 6

CORRELATION AND LINEAR REGRESSION

Linear Regression
In practice, a relationship is often found to exist between two (or more) variables, and one may want to express this relationship in mathematical form by finding an equation connecting the variables. To do this, first collect data showing the corresponding values of the variables. Next, plot the points on the rectangular coordinate system. The resulting graph is called a scatter plot or scatter diagram.

Scatterplot
An effective way to see a relationship in data is to display the information
as a scatter plot. It shows how two variables relate to each other by showing how
closely the data points fit to a line. If the variables are correlated, the points will
fall along a line or curve. The better the correlation, the tighter the points will
hug the line.
A simple scatterplot can be used to (a) determine whether a relationship is linear, (b) detect outliers and (c) graphically present a relationship. For example, checking whether a relationship is linear (or not) is important because linearity is an assumption when analyzing data using correlation and regression.
Various types of correlation can be interpreted through the patterns
displayed on Scatterplots. These are: positive (values increase together), negative
(one value decreases as the other increases), null (no correlation). The strength of
the correlation can be determined by how closely packed the points are to each
other on the graph. Points that end up far outside the general cluster of points are
known as outliers.

Example 21: Sample Scatterplots

Source: https://datavizcatalogue.com/methods/scatterplot.html


Linear regression tries to model the relationship between two variables by fitting a linear equation to the observed data. One variable is considered to be an explanatory (independent) variable, and the other is considered to be the dependent variable.

A linear regression line has an equation of the form Y = aX + b, where X is the explanatory variable and Y is the dependent variable; a is the slope of the line and b is the intercept (the value of Y when X = 0).

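To compute the slope a:

$$a = \frac{n\left(\sum xy\right) - \left(\sum x\right)\left(\sum y\right)}{n\left(\sum x^2\right) - \left(\sum x\right)^2}$$

To compute the intercept b:

$$b = \frac{\sum y - a\left(\sum x\right)}{n}$$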
Step-by-step Procedure:
Step 1: For each (x, y) point calculate x² and xy

Step 2: Sum all x, y, x² and xy, which gives us Σx, Σy, Σx² and Σxy
(Σ means "sum up")

Step 3: Calculate Slope a.

Step 4: Calculate Intercept b.

Step 5: Assemble the equation of a line: y = ax + b

Example 22: The table below shows some data for the first ten (10) years of a certain manufacturing and canning company, Marina. Each row in the table shows Marina’s sales for a year, and the amount spent on advertising in that year. Calculate the regression equation for the data using advertising as the explanatory variable.

Year    Advertising (in million pesos)    Sales (in million pesos)
1       18                                665
2       23                                758
3       25                                823
4       28                                1078
5       30                                1199
6       33                                1301
7       39                                1472
8       47                                1500
9       52                                1604
10      61                                1699

Solution:
Year    X     Y       XY        X²
1       18    665     11970     324
2       23    758     17434     529
3       25    823     20575     625
4       28    1078    30184     784
5       30    1199    35970     900
6       33    1301    42933     1089
7       39    1472    57408     1521
8       47    1500    70500     2209
9       52    1604    83408     2704
10      61    1699    103639    3721
        ΣX = 356    ΣY = 12099    ΣXY = 474021    ΣX² = 14406

To compute the slope a:

$$a = \frac{n\left(\sum xy\right) - \left(\sum x\right)\left(\sum y\right)}{n\left(\sum x^2\right) - \left(\sum x\right)^2} = \frac{10(474021) - (356)(12099)}{10(14406) - (356)^2} = 24.992$$

To compute the intercept b:

$$b = \frac{\sum y - a\left(\sum x\right)}{n} = \frac{12099 - 24.992(356)}{10} = 320.185$$

The regression equation is: Y = 24.992X + 320.185
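
The five-step procedure can be sketched in code using the Example 22 data. The function name is an assumption made for illustration. Note that the code carries the unrounded slope into the intercept, so its intercept comes out near 320.175 (the figure reported from MS Excel later in this lesson) instead of the 320.185 obtained by hand with the slope rounded to 24.992.

```python
advertising = [18, 23, 25, 28, 30, 33, 39, 47, 52, 61]
sales = [665, 758, 823, 1078, 1199, 1301, 1472, 1500, 1604, 1699]

def least_squares_line(xs, ys):
    """Return slope a and intercept b of the line y = ax + b."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)                 # Step 2: sums of x and y
    sum_xy = sum(x * y for x, y in zip(xs, ys))     # Steps 1-2: sum of xy
    sum_x2 = sum(x * x for x in xs)                 # Steps 1-2: sum of x squared
    a = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)  # Step 3
    b = (sum_y - a * sum_x) / n                                    # Step 4
    return a, b

a, b = least_squares_line(advertising, sales)
print(f"Y = {a:.3f}X + {b:.3f}")                    # Step 5

# Predicted sales for a hypothetical advertising spend of 40 million pesos
predicted = a * 40 + b
print(round(predicted, 1))
```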

How to perform Regression Analysis in MS Excel


1. Encode your data in MS Excel worksheet.
2. On the Data tab, in the Analysis group, click Data Analysis.


3. A dialogue box will appear, select Regression and click OK.

4. A new dialogue box for Regression will appear. Select the Y Range. This is the response variable (also called the dependent variable). Select the X Range. These are the explanatory variables (also called independent variables). These columns must be adjacent to each other. Check Labels → click in the Output Range box and select any vacant cell in the worksheet → check Residuals → click OK.

5. Excel produces the following Summary Output (rounded to 3 decimal places).


R Square measures how well the regression line fits the data: the closer its value is to 1, the better the fit.

Significance F and P-values


To check if the results are reliable or statistically significant, look at
Significance F (0.000052). If this value is less than 0.05, you're OK. It means that it
is statistically significant. If Significance F is greater than 0.05, it's probably better
to stop using this set of independent variables. Delete a variable with a high P-
value (greater than 0.05) and rerun the regression until Significance F drops below
0.05.

Coefficients
The regression line is: y = 24.992x + 320.175. In other words, for each unit increase in advertising, sales increase by 24.992 units. This is important information.
*The same example was used in performing the regression analysis in MS Excel. There might be a slight difference in the final answer due to manual computation and rounding off of data.

Correlation
Correlation is a bivariate analysis that measures the strength of association between two variables and the direction of the relationship.
The main result of a correlation is called the correlation coefficient (or "r"). It ranges from -1.0 to +1.0.
When the value of the correlation coefficient is exactly +1 or –1, there is said to be a perfect degree of association between the two variables. The closer r is to +1 or –1, the stronger the correlation. The direction of the relationship is simply the sign of the correlation: + indicates a positive relationship between the variables and - indicates a negative relationship between the variables.
Interpreting Correlation

Correlation is an effect size and so we can verbally describe the strength of the correlation using the guide that Evans (1996) suggests for the absolute value of r:
• .00-.19 “very weak”
• .20-.39 “weak”
• .40-.59 “moderate”
• .60-.79 “strong”
• .80-1.0 “very strong”

Pearson Correlation

Pearson r correlation is the most widely used correlation statistic to measure the degree of the relationship between linearly related variables. The calculation of Pearson’s correlation coefficient and subsequent significance testing of it requires the following data assumptions to hold: interval or ratio level; linearly related; and bivariate normally distributed.

The following is the formula for r:

$$r = \frac{n\left(\sum xy\right) - \left(\sum x\right)\left(\sum y\right)}{\sqrt{n\left(\sum x^2\right) - \left(\sum x\right)^2} \cdot \sqrt{n\left(\sum y^2\right) - \left(\sum y\right)^2}}$$

Where:
r = Pearson r correlation coefficient
n = number of paired values in the data set
∑xy = sum of the products of paired scores
∑x = sum of x scores
∑y = sum of y scores
∑x² = sum of squared x scores
∑y² = sum of squared y scores

Example 23: A study investigated the relationship of height and self-esteem of 20 randomly selected women. The following are the heights in inches and the level of their self-esteem. Solve for the correlation coefficient r.

Height (in inches) Self-Esteem Level


68 4.10
71 4.60
62 3.80
75 4.40
58 3.20
60 3.10
67 3.80
68 4.10
71 4.30
69 3.70
68 3.50
67 3.20
63 3.70
62 3.30
60 3.40
63 4.00
65 4.10
67 3.80
63 3.40
61 3.60

Solution: Solve for x², y² and xy together with their summations.

Height (in inches) X    Self-Esteem Level Y    X²      Y²       XY
68                      4.1                    4624    16.81    278.80
71                      4.6                    5041    21.16    326.60
62                      3.8                    3844    14.44    235.60
75                      4.4                    5625    19.36    330.00
58                      3.2                    3364    10.24    185.60
60                      3.1                    3600    9.61     186.00
67                      3.8                    4489    14.44    254.60
68                      4.1                    4624    16.81    278.80
71                      4.3                    5041    18.49    305.30
69                      3.7                    4761    13.69    255.30
68                      3.5                    4624    12.25    238.00
67                      3.2                    4489    10.24    214.40
63                      3.7                    3969    13.69    233.10
62                      3.3                    3844    10.89    204.60
60                      3.4                    3600    11.56    204.00
63                      4.0                    3969    16.00    252.00
65                      4.1                    4225    16.81    266.50
67                      3.8                    4489    14.44    254.60
63                      3.4                    3969    11.56    214.20
61                      3.6                    3721    12.96    219.60
ΣX = 1308    ΣY = 75.1    ΣX² = 85912    ΣY² = 285.45    ΣXY = 4937.6

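Substitute values into the formula:

$$r = \frac{n\left(\sum xy\right) - \left(\sum x\right)\left(\sum y\right)}{\sqrt{n\left(\sum x^2\right) - \left(\sum x\right)^2} \cdot \sqrt{n\left(\sum y^2\right) - \left(\sum y\right)^2}}$$

$$r = \frac{20(4937.6) - (1308)(75.1)}{\sqrt{20(85912) - (1308)^2} \cdot \sqrt{20(285.45) - (75.1)^2}}$$

$$r = \frac{98752 - 98230.8}{\sqrt{7376} \cdot \sqrt{68.99}} = \frac{521.2}{713.35} = 0.731$$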
Interpretation: The r coefficient of 0.731 indicates a strong positive relationship between height and self-esteem level. In this sample, shorter women tended to have lower self-esteem levels and taller women tended to have higher self-esteem levels.
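
The computation in Example 23 can be sketched in code. The function names are assumptions made for illustration; `evans_label` simply encodes the Evans (1996) verbal guide quoted earlier in this lesson.

```python
import math

heights = [68, 71, 62, 75, 58, 60, 67, 68, 71, 69,
           68, 67, 63, 62, 60, 63, 65, 67, 63, 61]
esteem = [4.1, 4.6, 3.8, 4.4, 3.2, 3.1, 3.8, 4.1, 4.3, 3.7,
          3.5, 3.2, 3.7, 3.3, 3.4, 4.0, 4.1, 3.8, 3.4, 3.6]

def pearson_r(xs, ys):
    """Pearson correlation coefficient from the raw-score formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = (math.sqrt(n * sum_x2 - sum_x ** 2)
                   * math.sqrt(n * sum_y2 - sum_y ** 2))
    return numerator / denominator

def evans_label(r):
    """Verbal strength of |r| following the Evans (1996) guide."""
    size = abs(r)
    if size < 0.20:
        return "very weak"
    if size < 0.40:
        return "weak"
    if size < 0.60:
        return "moderate"
    if size < 0.80:
        return "strong"
    return "very strong"

r = pearson_r(heights, esteem)
direction = "positive" if r > 0 else "negative"
print(round(r, 3), evans_label(r), direction)
```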

Learning Activity 4.7

Answer the following


1. The following are data for 12 individuals’ daily sodium intake and their systolic blood pressure readings.

Person Sodium BP
1 6.8 154
2 7.0 167
3 6.9 162
4 7.2 175
5 7.3 190
6 7.0 158
7 7.0 166
8 7.5 195
9 7.3 189
10 7.1 186
11 6.5 148
12 6.4 140

A researcher is interested in learning how strong the association is between these variables and how well blood pressure can be predicted from sodium intake.

a. Calculate the value of r and the regression equation for the data.
b. Test the hypothesis at 0.05 level of significance.
c. What would be a likely blood pressure for a person with sodium of
6.3? How about sodium of 7.6?
