Module 7 Data Management Regression and Correlation
7.1 Introduction
In daily activities, the relationship between variables often needs to be
established before a decision is made. For example, the school registrar must
predict enrollment before preparing the class schedules, and the sequence of
courses to be offered must be known before a feasible flow chart can be
prepared. In this section, we discuss correlation analysis, a commonly used
measure of association that describes the linear relationship between two
variables. The term “relationship” means that changes in the two variables are
associated with each other. The relationship can be direct (both variables
increase together) or inverse (one variable increases as the other decreases).
Correlation is used to determine whether a relationship exists between two
variables and how strong that relationship is.
Correlation and linear regression help us deal with the relationship
between two or more continuous variables. We shall study the dependence of
one variable, the dependent variable, on another, the independent variable.
7.2 Learning Outcome
After finishing this module, you are expected to:
1. Positive Linear Correlation - general trend in the plotted points is from
bottom left to top right.
2. Negative Linear Correlation - general trend in the plotted points is from top
left to bottom right.
3. No Linear Correlation - no general trend in the plotted points, or a
non-linear trend.
The strength of the linear correlation can be judged by looking at how closely
the points approximate a straight line.
Example 1
The following table shows the Height (X) vs. Weight (Y) measurements (both in
inches) for 10 men:
x 70.8 66.2 71.7 68.7 67.6 69.2 66.5 67.2 68.3 65.6
y 42.5 40.2 44.4 42.8 40.0 47.3 43.4 40.1 42.1 36.0
Example 2.
The following table gives the resale value of a car bought in 1970 at
Php200,000.00.
x (year) 1970 1973 1976 1979 1982 1985 1988 1991 1994 1997
y (Php 000) 200 150 145 135 120 100 79 65 54 35.0
Interpretation: The diagram indicates a negative linear correlation between the
variables.
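The direction read from the diagram can be cross-checked numerically. The sketch below is only an illustration: it takes 𝑥 as the year and 𝑦 as the resale value in thousands of pesos (an assumption about the table's labels), and `trend_direction` is a hypothetical helper, not part of the module. It looks at the sign of the summed products of deviations from the means, which tells the direction of a linear trend but not its strength.

```python
# Data from Example 2: year (x) and resale value in thousands of pesos (y).
years = [1970, 1973, 1976, 1979, 1982, 1985, 1988, 1991, 1994, 1997]
values = [200, 150, 145, 135, 120, 100, 79, 65, 54, 35.0]


def trend_direction(xs, ys):
    """Classify the linear trend by the sign of sum((x - mean_x) * (y - mean_y)).

    Hypothetical helper: it reports direction only, not strength.
    """
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    co_deviation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if co_deviation > 0:
        return "positive"
    if co_deviation < 0:
        return "negative"
    return "none"


direction = trend_direction(years, values)
print(direction)  # prints "negative"
```

A result of "negative" agrees with the interpretation above: the resale value falls as the year increases.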
Example 3.
Below are the scores in an examination. Make a scatter plot and interpret
the data.
7.3.1.2 Coefficient of Correlation
A more precise method of determining the type and strength of a linear
correlation is to calculate the coefficient of linear correlation 𝑟, also known as
the Pearson product-moment correlation coefficient, for the two variables using
the formula:
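One standard computational form of this coefficient for 𝑛 ordered pairs (𝑥, 𝑦), given here on the assumption that it is the form intended, is:

```latex
r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}
         {\sqrt{n\sum x^{2} - \left(\sum x\right)^{2}}\;
          \sqrt{n\sum y^{2} - \left(\sum y\right)^{2}}}
```

The value of 𝑟 always lies between −1 and 1. Values near 1 or −1 indicate a strong linear correlation, and values near 0 indicate a weak one.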
Example 4.
Scores of students in the Midterm and Final Examinations were gathered.
The teacher wants to find the strength of linear relationship between the Midterm
scores and the Final Term scores. What is the coefficient of linear correlation?
Solution.
The scatter plot in the example suggests that a positive correlation exists
between Midterm and Final term scores.
1. Compute 𝑥𝑦, 𝑥², and 𝑦² for each pair, together with the column totals.
From the result, we know that the Midterm score and the Final
term score have a strong positive linear correlation.
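The same column-total procedure can be sketched in a few lines of Python. This is a minimal illustration, not the module's own worked solution: the function name `pearson_r` is hypothetical, the computational form of 𝑟 in the comment is assumed to be the one intended, and the data are the height and weight pairs of Example 1, used here only because the computation depends on nothing but the paired values.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient from column totals.

    Assumed formula:
    r = (n*Sxy - Sx*Sy) / (sqrt(n*Sxx - Sx^2) * sqrt(n*Syy - Sy^2))
    """
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))  # total of the xy column
    sum_x2 = sum(x * x for x in xs)              # total of the x^2 column
    sum_y2 = sum(y * y for y in ys)              # total of the y^2 column
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator


# Paired measurements from Example 1.
heights = [70.8, 66.2, 71.7, 68.7, 67.6, 69.2, 66.5, 67.2, 68.3, 65.6]
weights = [42.5, 40.2, 44.4, 42.8, 40.0, 47.3, 43.4, 40.1, 42.1, 36.0]

r = pearson_r(heights, weights)
print(round(r, 2))
```

For these ten pairs the coefficient comes out at roughly 0.65, a positive correlation of moderate strength.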
7.3.2 Regression Analysis
After a relationship between paired data, which are referred to as bivariate
data, has been discovered, one can model the relationship with an equation. One
method of determining a linear relationship for bivariate data is called linear
regression.
In linear regression, we assume that a change in 𝑥 (independent variable)
will lead directly to a change in 𝑦 (dependent variable). Sometimes, we are
interested in predicting the value of 𝑦 from the value of 𝑥. Generally, it is not
logical to believe that 𝑦 caused 𝑥. By convention, we plot the independent variable
along the horizontal axis or the 𝑥-axis and the dependent variable along the
vertical axis or 𝑦-axis.
Furthermore, simple linear regression is similar to correlation in that the
purpose is to measure to what extent there is a linear relationship between two
variables. In particular, the purpose of linear regression is to "predict" the value
of the dependent variable based upon the values of one or more independent
variables. The relationship is summarized by a regression equation consisting of
a slope and an intercept. The slope represents the amount by which the dependent
variable increases or decreases for each unit increase in the independent
variable, and the intercept indicates the value of the dependent variable when
the independent variable takes the value zero.
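The equation of the least-squares regression line is $\hat{y} = ax + b$, where
$$a = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{n\sum x^{2} - \left(\sum x\right)^{2}}$$
and
$$b = \bar{y} - a\bar{x}$$
The notation $\bar{x}$ represents the mean of the 𝑥 values and $\bar{y}$ represents the
mean of the 𝑦 values.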
Example 6.
Find the equation of the least-squares line for the ordered pairs in the table
below.
𝑥 𝑦
2.5 3.4
3.0 4.9
3.3 5.5
3.5 6.6
3.8 7.0
4.0 7.7
4.2 8.3
4.5 8.7
Solution.
From the scatter plot in this example, we see that there is a positive
correlation between the two sets of data.
We now proceed with the process of finding the equation of the regression
line.
Steps Actual process and results
1. Prepare the columns for 𝑥𝑦 and 𝑥².
The regression line is given by the red line in the next figure.
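The computation for this example can be checked with a short Python sketch. It assumes the usual least-squares formulas, slope 𝑎 = (𝑛Σ𝑥𝑦 − Σ𝑥Σ𝑦) / (𝑛Σ𝑥² − (Σ𝑥)²) and intercept 𝑏 equal to the mean of 𝑦 minus 𝑎 times the mean of 𝑥. The function name `least_squares_line` is hypothetical.

```python
def least_squares_line(xs, ys):
    """Return (slope a, intercept b) of the least-squares line y-hat = a*x + b."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))  # total of the xy column
    sum_x2 = sum(x * x for x in xs)              # total of the x^2 column
    slope = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    intercept = sum_y / n - slope * (sum_x / n)  # b = mean(y) - a * mean(x)
    return slope, intercept


# Ordered pairs from the table in Example 6.
xs = [2.5, 3.0, 3.3, 3.5, 3.8, 4.0, 4.2, 4.5]
ys = [3.4, 4.9, 5.5, 6.6, 7.0, 7.7, 8.3, 8.7]

a, b = least_squares_line(xs, ys)
print(f"y-hat = {a:.4f}x + ({b:.4f})")
```

With these eight pairs the column totals are Σ𝑥 = 28.8, Σ𝑦 = 52.1, Σ𝑥𝑦 = 195.86, and Σ𝑥² = 106.72, which give a slope of about 2.7303 and an intercept of about −3.3164.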
Example 7.
Use the equation of the least-squares line from the previous example to
predict the average 𝑦 value for each of the following 𝑥 values.
a. 2.8
b. 4.8
Solution.
Steps Actual process and results
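As a sketch of the substitution, assume the least-squares line for the table in Example 6 has slope 𝑎 ≈ 2.7303 and intercept 𝑏 ≈ −3.3164 (values obtained by applying the usual least-squares formulas to that table). The predictions are then:

```latex
\begin{aligned}
\hat{y} &\approx 2.7303x - 3.3164 \\
\text{a. } x = 2.8:\quad \hat{y} &\approx 2.7303(2.8) - 3.3164 \approx 4.3 \\
\text{b. } x = 4.8:\quad \hat{y} &\approx 2.7303(4.8) - 3.3164 \approx 9.8
\end{aligned}
```

Note that 2.8 lies within the range of the observed 𝑥 values (2.5 to 4.5), while 4.8 lies slightly outside it, so the second prediction is an extrapolation and should be treated with more caution.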
Example 8.
Five children aged 2, 3, 5, 7, and 8 years weigh 14, 20, 32, 42, and 44
kilograms, respectively.
b. Based on this data, what is the approximate weight of a six-year-old
child?
Solution.
(a)
Steps Actual process and results
1. Prepare the table with columns for the quantities required by the slope
and intercept formulas.
2. Compute the slope.
(b)
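A worked version of the arithmetic is sketched below. It assumes that age is the independent variable 𝑥, weight is the dependent variable 𝑦, and the usual least-squares formulas for the slope 𝑎 and intercept 𝑏 apply, with 𝑛 = 5.

```latex
\begin{aligned}
\sum x &= 25, \quad \sum y = 152, \quad \sum xy = 894, \quad \sum x^{2} = 151 \\
a &= \frac{5(894) - (25)(152)}{5(151) - (25)^{2}} = \frac{670}{130} \approx 5.154 \\
b &= \bar{y} - a\bar{x} \approx 30.4 - 5.154(5) \approx 4.631 \\
\hat{y} &\approx 5.154x + 4.631 \\
x = 6:\quad \hat{y} &\approx 5.154(6) + 4.631 \approx 35.6
\end{aligned}
```

Under these assumptions, the approximate weight of a six-year-old child is about 35.6 kilograms.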