Chapter 3 ODL
3.1 INTRODUCTION
Sometimes two variables are found to be related to each other. A change in one variable
may cause the other variable to change. For example, an increase in the price of sugar may
cause the price of certain foods to increase, because the cost of producing those foods
rises.
Correlation analysis is a statistical method used to measure the strength of the relationship
between two variables.
The first step in determining whether a relationship exists between two variables is to plot or
graph the available data. Normally, the independent variable (also called an explanatory
variable or a predictor variable) is labelled on the horizontal axis and the dependent variable
(also called a response variable) on the vertical axis. These paired variables are then plotted
on the graph. This graph is called a scatter diagram.
If the points in the scatter diagram form a pattern (increasing or decreasing), this indicates
that there is a relationship between the two variables. If the points do not show any pattern
and are randomly scattered, we can assume that the two variables are not related.
Examples of different scatter diagram patterns are shown below.
STA108 2
Example 1
Draw a scatter diagram for the following data and state the type of relationship between the
variables.
[Scatter diagram. Horizontal axis: Number of Hours Study per Week, scaled from 0 to 18. Vertical axis: scaled from 0 to 20.]
There is a strong positive correlation between the number of hours of study per week and
the test mark.
Example 2
Fuzi Company supplies prawns to restaurants. The demand for prawns depends on the price
per kg. The data are shown below:
Draw a scatter diagram for the following data and state the type of relationship between the
variables.
The linear correlation coefficient provides a measure of the strength of the relationship
between two variables. Two methods commonly used for this purpose are Pearson’s Product
Moment Correlation Coefficient (for quantitative data) and Spearman’s Rank Correlation
Coefficient (for qualitative data that can be ranked).
Pearson’s Product Moment Correlation Coefficient is used to measure the strength of the
relationship between two variables that are quantitative in nature. It is normally denoted by r.
The sign (- or +) for r identifies the kind of relationship between the two quantitative variables
and the magnitude of r describes the strength of relationship.
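$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$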
where
r = correlation coefficient
n = number of observations
A value of r close to -1.0 indicates that the two variables have a strong negative linear
relationship. A negative relationship means that as one variable increases, the other
tends to decrease, and vice versa.
On the other hand, a value of r close to 1.0 indicates that the two variables have a
strong positive linear relationship. A positive relationship means that as one variable
increases, the other tends to increase as well, and vice versa.
A correlation close to or equal to 0 means that there is no linear relationship between the
two variables: an increase or decrease in the value of one variable is not accompanied by a
consistent increase or decrease in the other. However, it does not imply that the two
variables are definitely unrelated. The two variables might be related in a nonlinear way, or
they may not have any relationship at all.
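The three cases described above (r close to -1, close to 1, and close to 0) can be checked numerically. The following is a minimal sketch only: the function name `pearson_r` and the three small data sets are assumptions made up for illustration and are not data from this chapter. The last data set follows y = x², so the variables are perfectly related, yet r is 0 because the relationship is not linear.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's r computed from the sums formula given in the text."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Hypothetical data, chosen only to illustrate each case.
x = [-2, -1, 0, 1, 2]
r_positive = pearson_r(x, [2 * v + 1 for v in x])  # points on an increasing line
r_negative = pearson_r(x, [5 - 3 * v for v in x])  # points on a decreasing line
r_curved = pearson_r(x, [v * v for v in x])        # points on the curve y = x^2

print(r_positive, r_negative, r_curved)
```

A value of r near 0 should therefore be read together with the scatter diagram before concluding that two variables are unrelated.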
[Figure: scale of r running from -1 through -0.5, 0 and 0.5 to 1]
The following guidelines can be used to interpret the value of Pearson’s product moment
correlation coefficient:
Example 3
Calculate Pearson’s product moment correlation coefficient, r for the following set of data and
interpret the value.
x 3 5 8 10 13 15 18 20 28
y 30 35 41 50 51 60 65 66 70
    x      y     x²      y²      xy
    3     30      9     900      90
    5     35     25    1225     175
    8     41     64    1681     328
   10     50    100    2500     500
   13     51    169    2601     663
   15     60    225    3600     900
   18     65    324    4225    1170
   20     66    400    4356    1320
   28     70    784    4900    1960
∑ 120    468   2100   25988    7106
∑x = 120
∑y = 468
∑x² = 2100
∑y² = 25988
∑xy = 7106
n = 9
$$r = \frac{9(7106) - 120(468)}{\sqrt{\left[9(2100) - 120^2\right]\left[9(25988) - 468^2\right]}} = 0.9529$$
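The arithmetic in Example 3 can be re-checked step by step. The sketch below simply mirrors the hand calculation (column sums first, then the formula for r); the variable names are illustrative choices, not notation from the text.

```python
from math import sqrt

# Data from Example 3.
x = [3, 5, 8, 10, 13, 15, 18, 20, 28]
y = [30, 35, 41, 50, 51, 60, 65, 66, 70]

n = len(x)
sum_x = sum(x)                               # 120
sum_y = sum(y)                               # 468
sum_x2 = sum(v * v for v in x)               # 2100
sum_y2 = sum(v * v for v in y)               # 25988
sum_xy = sum(a * b for a, b in zip(x, y))    # 7106

numerator = n * sum_xy - sum_x * sum_y
denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

print(round(r, 4))  # 0.9529
```

Since r is close to 1, the data show a strong positive linear relationship between x and y.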
Example 4
Table below shows the interest rate for car loans and the average number of customers who
apply for the loans in a month from a finance company. Calculate Pearson’s product moment
correlation coefficient, r.
Interest rate in %, x      6.0  6.2  6.5  6.8  7.0  7.2  7.5  7.8  8.0  8.2  8.4  8.7
Number of applicants, y     80   80   78   75   70   60   60   55   50   48   45   40
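A hand calculation for Example 4 can be checked with a short script. This sketch uses the deviation-from-the-mean form of r, which is algebraically equivalent to the sums formula given earlier but is not the form written in the text; the names `s_xx`, `s_yy` and `s_xy` are assumptions introduced here for illustration.

```python
from math import sqrt

# Data from Example 4.
x = [6.0, 6.2, 6.5, 6.8, 7.0, 7.2, 7.5, 7.8, 8.0, 8.2, 8.4, 8.7]  # interest rate, %
y = [80, 80, 78, 75, 70, 60, 60, 55, 50, 48, 45, 40]              # number of applicants

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Sums of squared deviations and of cross-products of deviations.
s_xx = sum((a - mean_x) ** 2 for a in x)
s_yy = sum((b - mean_y) ** 2 for b in y)
s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))

r = s_xy / sqrt(s_xx * s_yy)
print(round(r, 3))  # approximately -0.986
```

The value is close to -1, which suggests a strong negative linear relationship: higher interest rates go with fewer loan applicants.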
Coefficient of Determination
The coefficient of determination is the ratio of the explained variation to the total variation. It
is normally denoted by r².
$$r^2 = (0.91)^2 = 0.8281$$
The term r² is usually expressed as a percentage. Therefore, r² = 0.8281 means that 82.81% of the
variation in the dependent variable is explained by the regression model (that is, by the
independent variable), while the remaining 17.19% is due to other factors or random error.
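The conversion from r to the explained and unexplained percentages can be sketched in a few lines, using the value r = 0.91 from the text. The variable names are illustrative only.

```python
r = 0.91  # correlation coefficient used in the text

r_squared = r ** 2                                  # coefficient of determination
explained_pct = round(r_squared * 100, 2)           # variation explained by the model
unexplained_pct = round((1 - r_squared) * 100, 2)   # variation left unexplained

print(explained_pct, unexplained_pct)  # 82.81 17.19
```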
One method of finding the best-fitting line for the data is the method of least squares. A
regression line with a positive slope indicates that there is a direct relationship between the two
variables. This means that as x increases, y tends to increase as well, and vice versa.
A negative slope indicates an inverse relationship between the two variables. This means that if x
increases, y will decrease and if x decreases, y will increase. Thus, x and y are always moving in
opposite directions.
If the slope is zero, then we say that there is no linear relationship between the two variables.
The linear regression equation for sample data can be written in the form of y = a + bx.
x = independent variable
y = dependent variable
The values of a and b in the regression line y = a + bx can be calculated using the following formula:
$$b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2}$$
$$a = \frac{\sum y}{n} - b\,\frac{\sum x}{n}$$
Example 5
Find the least squares regression line of y on x for the following data:
x 3 6 9 11 16 18
y 2 8 11 14 19 21
a) x=5
b) x = 14
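The least squares formulas for a and b can be applied to the Example 5 data as follows. This is a sketch under two assumptions: the function name `least_squares` is introduced here for illustration, and parts (a) and (b) are taken to ask for the estimated value of y at the given x, which the text does not state explicitly.

```python
def least_squares(x, y):
    """Return (a, b) for the line y = a + bx using the sums formulas."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(p * q for p, q in zip(x, y))
    sum_x2 = sum(p * p for p in x)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = sum_y / n - b * sum_x / n
    return a, b


# Data from Example 5.
x = [3, 6, 9, 11, 16, 18]
y = [2, 8, 11, 14, 19, 21]

a, b = least_squares(x, y)

# Assumed reading of parts (a) and (b): estimate y at x = 5 and x = 14.
y_at_5 = a + b * 5
y_at_14 = a + b * 14

print(round(a, 4), round(b, 4))            # -0.284 1.2175
print(round(y_at_5, 2), round(y_at_14, 2)) # 5.8 16.76
```

With these data the fitted line is approximately y = -0.2840 + 1.2175x.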
Example 6
A production manager collected data on production cost and the quantity produced for 10
consecutive days. These data are given below:
Day 1 2 3 4 5 6 7 8 9 10
Quantity (‘000 units) 10 13 20 18 17 15 16 14 11 12
Cost (RM ‘000) 20 28 38 35 33 30 34 29 23 25
a) Find the regression equation for the production cost, y on the production quantity, x using
the least squares method.
b) Explain the meaning of the constants a and b in the equation.
c) Estimate the production cost when the production quantity is 25000 units.
d) Find the product moment correlation coefficient.
e) Find the coefficient of determination.
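Parts (a), (c), (d) and (e) of Example 6 can be checked with the sketch below. It works from the corrected sums `s_xx`, `s_yy` and `s_xy` (names assumed here for illustration), which give the same a, b and r as the formulas in the text. Because quantity is recorded in thousands of units, 25000 units is entered as x = 25.

```python
from math import sqrt

# Data from Example 6.
x = [10, 13, 20, 18, 17, 15, 16, 14, 11, 12]   # quantity ('000 units)
y = [20, 28, 38, 35, 33, 30, 34, 29, 23, 25]   # cost (RM '000)

n = len(x)
sum_x, sum_y = sum(x), sum(y)

# Corrected sums: the numerator and denominators of the text's formulas, each divided by n.
s_xy = sum(p * q for p, q in zip(x, y)) - sum_x * sum_y / n
s_xx = sum(p * p for p in x) - sum_x ** 2 / n
s_yy = sum(q * q for q in y) - sum_y ** 2 / n

b = s_xy / s_xx                    # slope
a = sum_y / n - b * sum_x / n      # intercept
cost_at_25 = a + b * 25            # part (c): 25000 units means x = 25
r = s_xy / sqrt(s_xx * s_yy)       # part (d)
r_squared = r ** 2                 # part (e)

print(round(a, 4), round(b, 4))          # 4.0606 1.7424
print(round(cost_at_25, 2))              # 47.62
print(round(r, 4), round(r_squared, 4))  # 0.9827 0.9657
```

On this reading, a is the estimated cost (about RM 4,061) when nothing is produced, and b says that each additional 1,000 units adds about RM 1,742 to the cost. The estimated cost for 25000 units is about RM 47,621. Since the largest observed quantity is 20, this estimate lies outside the range of the data and should be treated with caution.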
Example 7
The following table shows how many weeks six persons have worked at an automobile inspection
station and the number of cars each one inspected between noon and 2 P.M. on a given day: