Chapter 5: Correlation and Regression Analysis
OBJECTIVES:
1. Differentiate correlation and regression analysis
2. Draw a scatter plot for a set of ordered pairs
3. Identify and graph the equation of the regression line
4. Calculate correlations and regressions using MS Excel
CORRELATION AND REGRESSION ANALYSIS
Correlation is the degree of relationship between variables. Correlation analysis seeks
to determine how well a linear or other equation describes or explains the relationship
between variables. A correlation implies an “association” between two variables.
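For N pairs of observations (x, y), the correlation coefficient is computed as

$$r = \frac{N\sum xy - \sum x \sum y}{\sqrt{\left[N\sum x^2 - \left(\sum x\right)^2\right]\left[N\sum y^2 - \left(\sum y\right)^2\right]}}$$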
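The Pearson correlation coefficient, r, can take values from -1 to +1. A positive value
indicates that the two variables tend to increase together, a negative value indicates that
one tends to decrease as the other increases, and a value of 0 indicates that there is no
linear association between the two variables. This is shown in Figure 7.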
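The computational formula for r translates directly into code. The following is a minimal Python sketch (the function name `pearson_r` is our own, not from this chapter) that evaluates the formula term by term. It is applied to the Mathematics and Physics scores used in the worked example of this chapter.

```python
import math


def pearson_r(x, y):
    """Pearson r via the computational formula:
    r = (N*Σxy - Σx*Σy) / sqrt((N*Σx² - (Σx)²) * (N*Σy² - (Σy)²))
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired samples of length >= 2")
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))  # Σxy
    sum_x2 = sum(xi * xi for xi in x)              # Σx²
    sum_y2 = sum(yi * yi for yi in y)              # Σy²
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Mathematics (x) and Physics (y) scores from the chapter's example
math_scores = [65, 63, 67, 64, 68, 62, 70, 66, 68, 67, 69, 71]
physics_scores = [68, 66, 68, 65, 69, 66, 68, 65, 71, 67, 68, 70]
print(round(pearson_r(math_scores, physics_scores), 2))  # 0.7
```

On Python 3.10 or later, `statistics.correlation(x, y)` computes the same Pearson coefficient and can serve as a cross-check.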
LINEAR REGRESSION
$$Y = a + bx$$
where
Y = predicted value
a = y-intercept
b = slope of the regression line
x = the value of X used to predict Y
$$a = \bar{y} - b\bar{x}$$
where:
$\bar{y}$ = mean value of Y
$\bar{x}$ = mean value of X
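$$b = \frac{N\sum xy - \sum x \sum y}{N\sum x^2 - \left(\sum x\right)^2}$$
where:
$\sum x$ = sum of the values of x
$\sum y$ = sum of the values of y
$\sum x^2$ = sum of the squares of the values of x
$\sum xy$ = sum of the products of the paired values of x and y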
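The formulas for b and a can be sketched the same way. This minimal Python version (the helper name `regression_line` is hypothetical) computes b from the column sums and then a = ȳ − b x̄ without rounding any intermediate value. The worked example in this chapter rounds b to 0.48 before computing a, so its intercept (35.58) differs slightly from the unrounded value printed here.

```python
def regression_line(x, y):
    """Return (a, b) for the least-squares line Y = a + bx.

    b = (N*Σxy - Σx*Σy) / (N*Σx² - (Σx)²)
    a = ȳ - b*x̄
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired samples of length >= 2")
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = sum_y / n - b * sum_x / n  # ȳ - b * x̄
    return a, b


math_scores = [65, 63, 67, 64, 68, 62, 70, 66, 68, 67, 69, 71]
physics_scores = [68, 66, 68, 65, 69, 66, 68, 65, 71, 67, 68, 70]
a, b = regression_line(math_scores, physics_scores)
print(f"Y = {a:.2f} + {b:.2f}x")  # Y = 35.82 + 0.48x
```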
Example
Below are the scores of 12 college students in Mathematics and Physics tests of 80 items
each.
Mathematics (x) 65 63 67 64 68 62 70 66 68 67 69 71
Physics (y) 68 66 68 65 69 66 68 65 71 67 68 70
Solution
Step 1: Draw a scatter plot. If the scatter plot does not show any (linear) trend, stop the
analysis and conclude “no relationship.” Otherwise, proceed to Step 2.
The scatter plot indicates an upward linear trend between Mathematics and Physics
proficiency. Thus, “there is reason to believe that they are related.”
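Step 2: Compute x², y², and xy for each student, and total each column.
Number   Mathematics (x)   Physics (y)   x²      y²      xy
1        65                68            4225    4624    4420
2        63                66            3969    4356    4158
3        67                68            4489    4624    4556
4        64                65            4096    4225    4160
5        68                69            4624    4761    4692
6        62                66            3844    4356    4092
7        70                68            4900    4624    4760
8        66                65            4356    4225    4290
9        68                71            4624    5041    4828
10       67                67            4489    4489    4489
11       69                68            4761    4624    4692
12       71                70            5041    4900    4970
Total    800               811           53418   54849   54107
Substituting the totals into the formula for r:

$$r = \frac{12(54107) - (800)(811)}{\sqrt{\left[12(53418) - (800)^2\right]\left[12(54849) - (811)^2\right]}} = \frac{484}{\sqrt{(1016)(467)}} \approx 0.70$$

The value r ≈ 0.70 indicates a positive linear correlation between Mathematics and Physics scores.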
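The Step 2 worksheet can be reproduced programmatically, which is a convenient way to check the hand-computed squares, products, and totals. This is a minimal Python sketch, assuming the scores are typed in as plain lists:

```python
# Mathematics (x) and Physics (y) scores from the example
x = [65, 63, 67, 64, 68, 62, 70, 66, 68, 67, 69, 71]
y = [68, 66, 68, 65, 69, 66, 68, 65, 71, 67, 68, 70]

# One row per student: (number, x, y, x², y², xy)
rows = [(i + 1, xi, yi, xi ** 2, yi ** 2, xi * yi) for i, (xi, yi) in enumerate(zip(x, y))]

print(f"{'No.':>3} {'x':>4} {'y':>4} {'x^2':>6} {'y^2':>6} {'xy':>6}")
for row in rows:
    print(f"{row[0]:>3} {row[1]:>4} {row[2]:>4} {row[3]:>6} {row[4]:>6} {row[5]:>6}")

# Column totals used in the formulas for r, b, and a
names = ["x", "y", "x2", "y2", "xy"]
totals = {name: sum(r[k] for r in rows) for k, name in enumerate(names, start=1)}
print(totals)  # {'x': 800, 'y': 811, 'x2': 53418, 'y2': 54849, 'xy': 54107}
```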
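Step 3: Formulate the regression line equation by first solving for the values of b and a.
Solving for b:

$$b = \frac{12(54107) - (800)(811)}{12(53418) - (800)^2} = \frac{484}{1016} \approx 0.48$$

Solving for a, with $\bar{x} = 800/12 \approx 66.67$ and $\bar{y} = 811/12 \approx 67.58$:

$$a = \bar{y} - b\bar{x} = 67.58 - 0.48(66.67) \approx 35.58$$

Substituting a and b into Y = a + bx gives the regression line equation:

$$y = 35.58 + 0.48x$$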
We can now estimate scores in Physics (y) by substituting a score in Mathematics (x) into
the regression line equation. For instance, if x = 75, then

$$y = 35.58 + 0.48(75) = 71.58$$

Therefore, the estimated score in Physics is 71.58, or approximately 72, if the score in
Mathematics is 75.
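Prediction is a single substitution into Y = a + bx. The short Python sketch below (the `predict` helper is hypothetical) evaluates the example's rounded equation at x = 75 and, for comparison, the unrounded least-squares coefficients. Here 484 and 1016 are N∑xy − ∑x∑y and N∑x² − (∑x)² computed from the example data, and 800 and 811 are ∑x and ∑y.

```python
def predict(a, b, x):
    """Predicted Y from the regression line Y = a + bx."""
    return a + b * x


# Coefficients as rounded in the worked example (b rounded before computing a)
print(round(predict(35.58, 0.48, 75), 2))  # 71.58

# Unrounded least-squares coefficients from the column totals (N = 12)
b_exact = 484 / 1016
a_exact = 811 / 12 - b_exact * 800 / 12
print(round(predict(a_exact, b_exact, 75), 2))  # 71.55
```

Both predictions round to 72, so the rounding in the worked example does not change the final estimate.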
1. Test scores of nine (9) students are shown below. What can you say about the
strength of the correlation between these sets of scores in Trigonometry and
Geometry?
Trigonometry 43 41 50 47 35 33 50 33 54
Geometry 48 45 47 43 33 28 48 31 57
2. Calculate the degree of linear relationship between the number of minutes spent
studying and the score in the examination.
Number of minutes      27 50 57 15 18 48 52 55 28 32
Score in examination   40 53 52 24 21 35 40 39 47 36
3. The number of hours spent per week viewing television (y) and the number of years
of education (x) were recorded for ten randomly selected individuals. The results are
given below:
x 12 14 11 16 16 18 12 20 10 12
y 10 9 15 8 5 4 20 4 16 15
a. Draw the scatter diagram.
b. Find the correlation coefficient of x and y and interpret your answer.
c. Find the regression line equation.
d. What are the predicted values of y if x is 15, 17, and 19?
Subject                      1     2     3     4     5     6     7     8     9     10
Estrone in saliva (x)        7.4   7.5   8.5   9.0   9.0   11.0  13.0  14.0  14.5  16.0
Estrone in free plasma (y)   30.0  25.0  31.5  27.5  39.5  38.0  43.2  49.0  55.0  48.5
c. If the estrone level in saliva is 12.1, predict the level of estrone in free plasma.
5. Compute the correlation ratio between test scores and teaching method.
Teaching Method 54 61 75 63 82 52 63 50
Test scores 76 80 89 80 88 83 79 82
6. A researcher believes that a person who works in academe receives a yearly salary
increment for every year spent in it. The researcher conducted the study, gathered
data, and sought to create a linear regression equation to represent this belief.
Below are the gathered data.
8. A study was conducted to examine the association between adult immunity and
juvenile mortality in southern fur seals. To do so, researchers determined the
percentage of adult southern fur seals in different island populations that carried
a certain antibody in their blood, and they also determined the mortality rate of seal
pups on those same islands. Is there a significant relationship between adult
southern fur seal immunity and seal pup mortality on these islands?
Antibody presence (%)   35  58  69  43  94  26   7    9    12   45   11   66  51
Pup mortality           115 98  109 63  24  226  357  339  112  145  111  36  54
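$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_r x_r$$
where:
$y$ = dependent (predicted) variable
$x_1, x_2, \ldots, x_r$ = independent variables
$\beta_0, \beta_1, \ldots, \beta_r$ = regression coefficients
The least-squares estimates of $\beta_0, \beta_1, \beta_2, \ldots, \beta_r$ are obtained by solving
simultaneous linear equations. For two independent variables ($r = 2$), these are:

$$\begin{aligned}
n\beta_0 + \beta_1\sum x_1 + \beta_2\sum x_2 &= \sum y \\
\beta_0\sum x_1 + \beta_1\sum x_1^2 + \beta_2\sum x_1 x_2 &= \sum x_1 y \\
\beta_0\sum x_2 + \beta_1\sum x_1 x_2 + \beta_2\sum x_2^2 &= \sum x_2 y
\end{aligned}$$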
Example Problem:
1. The data below give the number of class periods missed by 12 students taking
Business Statistics. The data are recorded in the following table:
b. Estimate the grade of a student whose test score is 75 and who has missed 3 classes.
Car Number                 1     2     3     4     5     6     7     8     9     10
Miles per gallon (y)       17.9  16.5  16.4  16.8  18.8  15.5  17.5  16.4  15.9  18.3
Weight in tons ($x_1$)     1.35  1.90  1.70  1.80  1.30  2.05  1.60  1.80  1.85  1.40
Temperature in °F ($x_2$)  90    30    80    40    35    45    50    60    65    30
a. Fit a regression equation of the form $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2$.
b. Estimate the miles per gallon of an automobile that weighs 2.50 tons when the
temperature is 85 °F.
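The normal equations for two independent variables form a 3 × 3 linear system, so they can be solved directly. The sketch below assumes plain Python (no NumPy) and uses Gaussian elimination with partial pivoting, a standard method for small systems that is our choice and not necessarily the chapter's. The helper name `fit_two_predictors` is hypothetical. It builds the sums that appear in the normal equations and is applied to the car data above.

```python
def fit_two_predictors(y, x1, x2):
    """Least-squares (b0, b1, b2) for y = b0 + b1*x1 + b2*x2,
    found by solving the three normal equations."""
    n = len(y)
    s1, s2, sy = sum(x1), sum(x2), sum(y)
    s11 = sum(a * a for a in x1)                   # Σx1²
    s22 = sum(c * c for c in x2)                   # Σx2²
    s12 = sum(a * c for a, c in zip(x1, x2))       # Σx1x2
    s1y = sum(a * v for a, v in zip(x1, y))        # Σx1y
    s2y = sum(c * v for c, v in zip(x2, y))        # Σx2y
    # Augmented matrix [coefficients | right-hand side] of the normal equations
    m = [
        [float(n), s1, s2, sy],
        [s1, s11, s12, s1y],
        [s2, s12, s22, s2y],
    ]
    # Forward elimination with partial pivoting
    for col in range(3):
        pivot = max(range(col, 3), key=lambda row: abs(m[row][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= factor * m[col][c]
    # Back substitution
    beta = [0.0, 0.0, 0.0]
    for r in range(2, -1, -1):
        known = sum(m[r][c] * beta[c] for c in range(r + 1, 3))
        beta[r] = (m[r][3] - known) / m[r][r]
    return beta[0], beta[1], beta[2]


mpg = [17.9, 16.5, 16.4, 16.8, 18.8, 15.5, 17.5, 16.4, 15.9, 18.3]
weight = [1.35, 1.90, 1.70, 1.80, 1.30, 2.05, 1.60, 1.80, 1.85, 1.40]
temp = [90, 30, 80, 40, 35, 45, 50, 60, 65, 30]

b0, b1, b2 = fit_two_predictors(mpg, weight, temp)
print(f"y = {b0:.3f} + ({b1:.3f})x1 + ({b2:.4f})x2")
print("Predicted mpg at 2.50 tons and 85 F:", round(b0 + b1 * 2.50 + b2 * 85, 2))
```

Note that 2.50 tons lies outside the observed weights (1.30 to 2.05 tons), so the prediction in part b is an extrapolation and should be read with caution.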