Correlation Regression 1
Why do most students who are good in Mathematics also perform well in Physics? Why does
blood pressure tend to rise with age? Why do students with high IQ tend to perform well academically? These
questions concern relationships between variables. In this chapter, we shall learn how
to describe the relationship between two variables.
Example: Construct a scatter plot for the following data on age and blood pressure.
Age, x        43   48   56   61   67   70
Pressure, y  128  120  135  143  141  152
Solution: Draw and label the x and y axes. Plot each point on the graph.
[Figure: scatter plot of pressure (vertical axis, 120 to 160) against age (horizontal axis, 40 to 80).]
Notice that the points on the scatter plot do not lie on one line. However, the points closely follow
a straight line. This line is called a trend line.
The relationship between two variables is described in terms of strength and direction.
Types of Correlation according to Direction
In terms of direction, the relationship between two variables may be positive, negative, or zero.
Positive Correlation
If a positive correlation exists, then the points on the scatter plot closely follow a straight line
slanting up to the right.
Negative Correlation
If a negative correlation exists, then the points on the scatter plot closely follow a straight line
slanting down to the right.
Zero Correlation
If a zero correlation exists, then the points on the scatter plot are randomly scattered. The points do
not closely follow a straight line.
Types of Correlation according to Strength
Perfect Correlation – exists when all the points on the scatter plot lie on a straight line. When the points
on the scatter plot do not lie on a straight line, the relationship may be very high, high, moderately high,
low, negligible, or zero.
[Figures: scatter plots illustrating High Positive Correlation and High Negative Correlation.]
Computing the Pearson Product-Moment Correlation Coefficient
Example:
A store manager wishes to find out whether there is a relationship between the age of the employees and
the number of sick days they incur each year. The data for the sample are shown. Calculate the correlation
coefficient (r) and describe the relationship in terms of strength and direction.
Employee A B C D E F
Age (X) 18 26 39 48 53 58
Days (Y) 16 12 9 5 6 2
Solution:
It will be easier to solve for r if a table of values is constructed. Find the values of XY, X²,
and Y², and place these values in the corresponding columns of the table.
Employee    X    Y    XY     X²    Y²
A          18   16   288    324   256
B          26   12   312    676   144
C          39    9   351   1521    81
D          48    5   240   2304    25
E          53    6   318   2809    36
F          58    2   116   3364     4
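The computation of r from the table can be sketched in Python. This sketch assumes the usual computational form of the Pearson formula, r = [nΣXY – ΣXΣY] / √([nΣX² – (ΣX)²][nΣY² – (ΣY)²]), which matches the columns of the table. The function name pearson_r is illustrative and does not come from the text.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson r via the computational (raw-sums) formula."""
    n = len(xs)
    sum_x = sum(xs)                               # column total of X
    sum_y = sum(ys)                               # column total of Y
    sum_xy = sum(x * y for x, y in zip(xs, ys))   # column total of XY
    sum_x2 = sum(x * x for x in xs)               # column total of X squared
    sum_y2 = sum(y * y for y in ys)               # column total of Y squared
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Data from the table: employee ages and sick days per year.
ages = [18, 26, 39, 48, 53, 58]
sick_days = [16, 12, 9, 5, 6, 2]

r = pearson_r(ages, sick_days)
print(round(r, 2))  # -0.98
```

The column totals used inside the function are ΣX = 242, ΣY = 50, ΣXY = 1625, ΣX² = 10998, and ΣY² = 546.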
Thus, r = -0.98 indicates a very high negative correlation between the age of employees and the
number of sick days. This means that older employees tend to have fewer sick days, while younger
employees tend to have more sick days.
8.3 Testing the Significance of the Pearson Product-Moment Correlation Coefficient r
The test statistic for testing the significance of r is computed using the t-test:
t = r √[(n – 2) / (1 – r²)]
where r = correlation coefficient
n = sample size
df = n – 2
Example:
A soft drink distributor is interested in finding out whether the number of cases of soft drinks ordered is related to
the travel time needed to deliver them. The following data have been obtained from past experience.
Number of Cases of Soft Drinks (X) Travel Time in Minutes (Y)
24 21
6 3
16 6
64 15
10 21
25 61
35 20
1. Compute the correlation coefficient (r).
2. Test the significance of the correlation coefficient at 0.05 level of significance.
Solution:
1. To compute the correlation coefficient, prepare a table like the one shown below.
 X    Y     XY     X²     Y²
24   21    504    576    441
 6    3     18     36      9
16    6     96    256     36
64   15    960   4096    225
10   21    210    100    441
25   61   1525    625   3721
35   20    700   1225    400
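As a check on the table, the column totals and their substitution into the correlation formula can be written out as follows. The computational form of the Pearson formula is assumed here because it uses exactly the columns tabulated above; the result is rounded to three decimal places.

```latex
\begin{aligned}
n &= 7,\quad \sum X = 180,\quad \sum Y = 147,\quad \sum XY = 4013,\quad \sum X^2 = 6914,\quad \sum Y^2 = 5273 \\
r &= \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left[n\sum X^2 - \left(\sum X\right)^2\right]\left[n\sum Y^2 - \left(\sum Y\right)^2\right]}} \\
  &= \frac{7(4013) - (180)(147)}{\sqrt{\left[7(6914) - 180^2\right]\left[7(5273) - 147^2\right]}} \\
  &= \frac{1631}{\sqrt{(15998)(15302)}} \approx 0.104
\end{aligned}
```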
Step 3
To compute the test value, use the formula for testing the significance of r.
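The test value can be checked with a short Python sketch. It assumes the standard test statistic t = r √[(n – 2) / (1 – r²)] with df = n – 2, and it recomputes r from the column totals of the table so that no rounding is carried into t. The function name t_statistic is illustrative only.

```python
from math import sqrt

def t_statistic(r, n):
    """t value for testing the significance of r, with df = n - 2."""
    return r * sqrt((n - 2) / (1 - r ** 2))

# Column totals of the soft drink table (n = 7 deliveries).
n = 7
sum_x, sum_y = 180, 147
sum_xy, sum_x2, sum_y2 = 4013, 6914, 5273

# Pearson r from the raw sums (computational formula).
r = (n * sum_xy - sum_x * sum_y) / sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)
t = t_statistic(r, n)
print(round(r, 3), round(t, 3))  # 0.104 0.234
```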
Step 4
Decide whether to reject the null hypothesis. Since the absolute value
of the computed t value (0.234) is less than the absolute value of the tabular or critical value
(2.571), do not reject the null hypothesis.
Step 5
There is no significant relationship between the number of cases ordered and the travel
time that they are delivered.
Example:
The following data show the number of years passenger jeepneys have been in use and their
corresponding depreciated prices in thousand pesos.
Jeep Age in Years (X) Price in Php 1 000 (Y)
A 5 85
B 4 103
C 6 70
D 5 82
E 5 89
F 5 98
G 6 66
H 6 95
I 2 169
J 7 70
K 7 48
1. Determine the regression equation for predicting the price of the jeepney in terms of its years of
usage.
2. Predict the price of a jeepney that has been in use for 3 years.
Solution:
Step 1 We need to establish first that the age and the depreciated price of a jeepney are significantly correlated
before we can perform regression analysis. We shall compute first the correlation coefficient.
X     Y    XY   X²      Y²
5    85   425   25    7225
4   103   412   16   10609
6    70   420   36    4900
5    82   410   25    6724
5    89   445   25    7921
5    98   490   25    9604
6    66   396   36    4356
6    95   570   36    9025
2   169   338    4   28561
7    70   490   49    4900
7    48   336   49    2304
Step 2 We test the significance of r using the t-test. Let us test its significance at 0.05 level. The critical
value of t at 0.05 level, two-tailed test, and df = n – 2 = 11 – 2 = 9 is ±2.262.
Since the absolute value of the computed value is greater than the absolute value of the critical
value, we conclude that the correlation coefficient is significant.
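The values behind Steps 1 and 2 can be worked out from the column totals of the table (ΣX = 58, ΣY = 975, ΣXY = 4732, ΣX² = 326, ΣY² = 96129, n = 11). The sketch below assumes the computational form of the Pearson formula and the standard t statistic with df = n – 2; figures are rounded, so the last digit may differ slightly from a hand computation that rounds r earlier.

```latex
\begin{aligned}
r &= \frac{11(4732) - (58)(975)}{\sqrt{\left[11(326) - 58^2\right]\left[11(96129) - 975^2\right]}}
   = \frac{-4498}{\sqrt{(222)(106794)}} \approx -0.9238 \\
t &= r\sqrt{\frac{n-2}{1-r^2}} \approx -0.9238\sqrt{\frac{9}{1-(-0.9238)^2}} \approx -7.24
\end{aligned}
```

Here |t| ≈ 7.24 exceeds the critical value 2.262, which agrees with the conclusion stated above.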
Step 3 Since there is a significant relationship between the age and the depreciated price of the jeepney,
we can proceed to regression analysis to predict the price in terms of age.
We compute the values of b0 and b1.
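A minimal Python sketch of this computation follows. It assumes the usual least-squares formulas for the line Y = b0 + b1X, namely b1 = [nΣXY – ΣXΣY] / [nΣX² – (ΣX)²] and b0 = [ΣY – b1ΣX] / n. The function name fit_line is illustrative and does not come from the text.

```python
def fit_line(xs, ys):
    """Least-squares intercept b0 and slope b1 for Y = b0 + b1*X."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)  # slope
    b0 = (sum_y - b1 * sum_x) / n                                   # intercept
    return b0, b1

# Jeepney data: age in years (X) and price in thousand pesos (Y).
ages = [5, 4, 6, 5, 5, 5, 6, 6, 2, 7, 7]
prices = [85, 103, 70, 82, 89, 98, 66, 95, 169, 70, 48]

b0, b1 = fit_line(ages, prices)
print(round(b0, 2), round(b1, 2))  # 195.47 -20.26
```

The negative slope means that each additional year of use lowers the predicted price by about 20.26 thousand pesos.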
Step 4 To predict the price of a jeepney which is 3 years old, we substitute X = 3 in the regression
equation.
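The substitution can be written out as follows. The coefficients shown are assumptions in the sense that they are recomputed here from the data table with the usual least-squares formulas (b1 = –4498 / 222 and b0 = [ΣY – b1ΣX] / n) and kept to four decimal places to limit rounding error.

```latex
\begin{aligned}
\hat{Y} &= b_0 + b_1 X \approx 195.4685 - 20.2613X \\
\hat{Y}(3) &\approx 195.4685 - 20.2613(3) = 195.4685 - 60.7839 = 134.6846 \approx 134.68
\end{aligned}
```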
Since the prices of the jeepneys are expressed in thousand pesos, we multiply 134.68 by 1 000.
Therefore, the predicted price of a jeepney which is 3 years old is Php 134 680.
Chapter 8
Chapter Test
Name: ___________________________________ Date: _________Score: _____ Rating: _____
Course & Section: __________________________ Instructor: ____________________________
I. The average family income and annual savings of nine families are shown below.
Annual Income (in Php 10 000) (X)   Annual Savings (in Php 10 000) (Y)
36 1
39 2
42 2
45 5
48 5
51 6
54 7
56 8
59 7