Chapter 3 ODL

The document discusses correlation and regression analysis. It defines correlation as a statistical method to measure the relationship between two variables. A scatter diagram is used to visually depict the relationship, showing either a positive, negative, or no correlation. Pearson's correlation coefficient (r) measures the strength and direction of the linear correlation between two quantitative variables on a scale of -1 to 1. The coefficient of determination (r²) expresses the proportion of the variation in one variable that is predictable from the other variable.

Uploaded by

Noriani Zakaria
Copyright
© All Rights Reserved

STA108 1

CHAPTER 3: CORRELATION AND REGRESSION

3.1 INTRODUCTION

Sometimes two variables are found to be related to each other in some way. A change in one
variable may cause the other variable to change because of the influence of the first variable
on the second. For example, an increase in the price of sugar may cause the price of certain
foods to increase, because it raises their production cost.

Correlation analysis is a statistical method used to measure the strength of the relationship
between two variables.

3.2 THE SCATTER DIAGRAM

The first step in determining whether a relationship exists between two variables is to plot or
graph the available data. Normally, the independent variable (also called an explanatory
variable or a predictor variable) is labelled on the horizontal axis and the dependent variable
(also called a response variable) on the vertical axis. These paired variables are then plotted
on the graph. This graph is called a scatter diagram.

If the points in the scatter diagram form a pattern (increasing or decreasing), this indicates
that there is a relationship between the two variables. If the scatter diagram does not show
any pattern, or the points are randomly scattered, we can assume that the two variables do not
have a relationship between them. Examples of different scatter diagram patterns are shown below.

Example 1

Draw a scatter diagram for the following data and state the type of relationship between the
variables.

Number of Hours Study per Week    1    3    5    7    9    13   17
Test Mark                         0    5    11   14   19   22   30

[Figure: scatter diagram "Test Mark Obtained for Number of Hours Study per Week". Horizontal axis: Number of Hours Study per Week (0 to 18). Vertical axis: Test Mark (0 to 35).]

There is a strong positive correlation between the number of hours of study per week and the
test mark.

Example 2

Fuzi Company supplies prawns to restaurants. The demand for prawns depends on the price
per kg. The data are shown below:

Price per kg (RM)    Sales (kg)
20                   600
22                   550
24                   480
26                   450
28                   400
30                   330
32                   250

Draw a scatter diagram for the data above and state the type of relationship between the
variables.

3.3 LINEAR CORRELATION COEFFICIENT

The linear correlation coefficient provides a measure of the strength of the relationship
between two variables. Two methods commonly used for this purpose are Pearson’s Product
Moment Correlation Coefficient (for quantitative data) and Spearman’s Rank Correlation
Coefficient (for ranked, qualitative data).

Pearson’s Product Moment Correlation Coefficient

Pearson’s Product Moment Correlation Coefficient is used to measure the strength of the
relationship between two variables that are quantitative in nature. It is normally denoted by r.
The sign (- or +) for r identifies the kind of relationship between the two quantitative variables
and the magnitude of r describes the strength of relationship.

The formula for Pearson’s correlation coefficient, r, is as follows:

$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$

where

r = correlation coefficient

n = number of observations

∑x = sum of the x values

∑y = sum of the y values

∑xy = sum of the product of x and y

∑x² = sum of the squared values of x

∑y² = sum of the squared values of y

The value of r always lies in the range -1.0 ≤ r ≤ 1.0.
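The formula for r can be sketched in Python as follows. This is a minimal illustration, not part of the original notes: the function name `pearson_r` and the input checks are assumptions added here, and the sample data are the hours and test marks from Example 1.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r computed from the raw sums n, Σx, Σy, Σxy, Σx², Σy².

    The name and the error handling are illustrative assumptions.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("xs and ys must have the same length (at least 2)")
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)

    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        raise ValueError("r is undefined when x or y has no variation")
    return numerator / denominator


# Hours of study per week and test marks from Example 1.
hours = [1, 3, 5, 7, 9, 13, 17]
marks = [0, 5, 11, 14, 19, 22, 30]
print(f"r = {pearson_r(hours, marks):.4f}")
```

Working from the raw sums mirrors the hand calculation, so each intermediate value can be checked against a working table.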

A value of r that is close to -1.0 indicates that the two variables have a strong negative linear
relationship. A negative relationship means that an increase in one variable is accompanied by
a decrease in the other variable, and vice versa.

On the other hand, a value of r that is close to 1.0 indicates that the two variables have a
strong positive linear relationship. A positive relationship means that an increase in one
variable is accompanied by an increase in the other variable, and vice versa.

A correlation close to or equal to 0 means that there is no linear relationship between the
two variables. This means that an increase or decrease in the value of one variable is not
accompanied by a consistent increase or decrease in the other. However, it does not imply that
the two variables are definitely unrelated. The two variables might be related in a nonlinear
way, or they may not have any relationship at all.

r = -1      Perfect negative correlation
r = -0.5    Moderate negative correlation
r = 0       No correlation
r = 0.5     Moderate positive correlation
r = 1       Perfect positive correlation

The following guidelines can be used to interpret the value of Pearson’s product moment
correlation coefficient:

Pearson’s product moment correlation coefficient, r    Interpretation
-1                                                     Perfect negative correlation
From -0.9 to -0.7                                      Strong negative correlation
From -0.6 to -0.4                                      Moderate negative correlation
From -0.3 to -0.1                                      Weak negative correlation
0                                                      No correlation
From 0.1 to 0.3                                        Weak positive correlation
From 0.4 to 0.6                                        Moderate positive correlation
From 0.7 to 0.9                                        Strong positive correlation
1                                                      Perfect positive correlation
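The guidelines list r to one decimal place, so they do not say how to label values that fall between the bands, such as 0.35 or 0.95. The sketch below is one possible way to apply them in code. The function name `interpret_r` and the cut-offs 0.05, 0.35 and 0.65 (the midpoints of the gaps) are assumptions made here, not part of the guidelines, and any value above 0.65 but below exactly 1 is treated as strong.

```python
def interpret_r(r):
    """Label r using the guideline bands.

    Assumption: the cut-offs 0.05, 0.35 and 0.65 are the midpoints of the
    gaps between the listed bands; they are not stated in the guidelines.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    size = abs(r)
    if size < 0.05:  # assumed: effectively zero
        return "No correlation"
    direction = "positive" if r > 0 else "negative"
    if size == 1:
        strength = "Perfect"
    elif size >= 0.65:  # assumed cut-off between moderate and strong
        strength = "Strong"
    elif size >= 0.35:  # assumed cut-off between weak and moderate
        strength = "Moderate"
    else:
        strength = "Weak"
    return f"{strength} {direction} correlation"


for value in (-1, -0.5, 0, 0.2, 0.9529):
    print(value, "->", interpret_r(value))
```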

Example 3

Calculate Pearson’s product moment correlation coefficient, r for the following set of data and
interpret the value.

x 3 5 8 10 13 15 18 20 28
y 30 35 41 50 51 60 65 66 70

x      y      x²      y²       xy
3      30     9       900      90
5      35     25      1225     175
8      41     64      1681     328
10     50     100     2500     500
13     51     169     2601     663
15     60     225     3600     900
18     65     324     4225     1170
20     66     400     4356     1320
28     70     784     4900     1960
Total
120    468    2100    25988    7106

∑x = 120

∑y = 468

∑x² = 2100

∑y² = 25988

∑xy = 7106

n = 9

$$r = \frac{9(7106) - 120(468)}{\sqrt{\left[9(2100) - 120^2\right]\left[9(25988) - 468^2\right]}} = 0.9529$$

There is a strong positive correlation between x and y.
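The working for Example 3 can be reproduced with a short script. This is only a sketch for checking the arithmetic: the variable names are illustrative, and the printed layout is an assumption rather than the notes' own format.

```python
from math import sqrt

# Data from Example 3.
xs = [3, 5, 8, 10, 13, 15, 18, 20, 28]
ys = [30, 35, 41, 50, 51, 60, 65, 66, 70]

# One row per observation: x, y, x², y², xy.
rows = [(x, y, x * x, y * y, x * y) for x, y in zip(xs, ys)]
# Column totals: Σx, Σy, Σx², Σy², Σxy.
totals = tuple(sum(column) for column in zip(*rows))

print(f"{'x':>6}{'y':>6}{'x^2':>8}{'y^2':>8}{'xy':>8}")
for row in rows:
    print("{:>6}{:>6}{:>8}{:>8}{:>8}".format(*row))
print("{:>6}{:>6}{:>8}{:>8}{:>8}".format(*totals))

n = len(xs)
sum_x, sum_y, sum_x2, sum_y2, sum_xy = totals
r = (n * sum_xy - sum_x * sum_y) / sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)
print(f"r = {r:.4f}")  # 0.9529, matching the hand calculation
```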



Example 4

The table below shows the interest rate for car loans and the average number of customers who
apply for the loans in a month at a finance company. Calculate Pearson’s product moment
correlation coefficient, r.

Interest rate in %, x      6.0  6.2  6.5  6.8  7.0  7.2  7.5  7.8  8.0  8.2  8.4  8.7
Number of applicants, y    80   80   78   75   70   60   60   55   50   48   45   40

Coefficient of Determination

The coefficient of determination is the ratio of the explained variation to the total variation. It
is normally denoted by r².

For example, if the correlation coefficient r = 0.91, the coefficient of determination is

$$r^2 = (0.91)^2 = 0.8281$$

The value of r² is usually expressed as a percentage. Therefore, r² = 0.8281 means that 82.81% of
the variation in the dependent variable is explained by the model, while the remaining 17.19% of
the variation is unexplained and is attributed to random error.
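The arithmetic in this example can be checked with a few lines of Python. The function name `coefficient_of_determination` is an illustrative assumption, and the results are rounded to two decimal places only for display.

```python
def coefficient_of_determination(r):
    """Square the correlation coefficient r to obtain r²."""
    return r ** 2


r = 0.91
r2 = coefficient_of_determination(r)
explained_pct = round(100 * r2, 2)          # share of variation explained
unexplained_pct = round(100 * (1 - r2), 2)  # share left unexplained

print(f"r^2 = {r2:.4f}")
print(f"explained: {explained_pct}%, unexplained: {unexplained_pct}%")
```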

3.4 REGRESSION LINE

One method of finding the best-fitting line through the data is the method of least squares. A
regression line with a positive slope indicates that there is a direct relationship between the two
variables. This means that if x increases, y will increase as well, and vice versa.

A negative slope indicates an inverse relationship between the two variables. This means that if x
increases, y will decrease, and if x decreases, y will increase. Thus, x and y always move in
opposite directions.

If the slope is zero, then we say that the two variables are not linearly related.

The linear regression equation for sample data can be written in the form of y = a + bx.

x = independent variable

y = dependent variable

a = value of y where the regression line intersects with the y-axis

b = slope of the regression line (change in y per unit change in x)

The values of a and b in the regression line y = a + bx can be calculated using the following formulas:

$$b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2}$$

$$a = \frac{\sum y}{n} - b\left(\frac{\sum x}{n}\right)$$
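The formulas for a and b can be sketched in Python as follows. The function names `least_squares_line` and `predict` are illustrative assumptions, the data are the hours and test marks from Example 1, and the value x = 10 is chosen only to show how a fitted line is used for estimation.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the least squares line y = a + b*x.

    The name and the input checks are illustrative assumptions.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("xs and ys must have the same length (at least 2)")
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)

    denominator = n * sum_x2 - sum_x ** 2
    if denominator == 0:
        raise ValueError("the slope is undefined when all x values are equal")
    b = (n * sum_xy - sum_x * sum_y) / denominator  # slope
    a = sum_y / n - b * (sum_x / n)                 # y-intercept
    return a, b


def predict(a, b, x):
    """Estimate y at a given x from the fitted line."""
    return a + b * x


# Hours of study per week and test marks from Example 1.
hours = [1, 3, 5, 7, 9, 13, 17]
marks = [0, 5, 11, 14, 19, 22, 30]
a, b = least_squares_line(hours, marks)
print(f"y = {a:.4f} + {b:.4f}x")
print(f"estimated mark for 10 hours: {predict(a, b, 10):.2f}")
```

Estimates are most reliable for x values inside the range of the observed data.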

Example 5

Find the least squares regression line of y on x for the following data:

x 3 6 9 11 16 18
y 2 8 11 14 19 21

Determine the values of y when:

a) x = 5
b) x = 14

Example 6

A production manager collected data on production cost and the quantity produced for 10
consecutive days. These data are given below:

Day 1 2 3 4 5 6 7 8 9 10
Quantity (‘000 units) 10 13 20 18 17 15 16 14 11 12
Cost (RM ‘000) 20 28 38 35 33 30 34 29 23 25

a) Find the regression equation for the production cost, y on the production quantity, x using
the least squares method.
b) Explain the meaning of the constants a and b in the equation.
c) Estimate the production cost when the production quantity is 25000 units.
d) Find the product moment correlation coefficient.
e) Find the coefficient of determination.

Example 7

The following table shows how many weeks six persons have worked at an automobile inspection
station and the number of cars each one inspected between noon and 2 P.M. on a given day:

Number of weeks employed, x    Number of cars inspected, y
2                              13
7                              21
9                              23
1                              14
5                              15
12                             21

a) Obtain a linear regression line using the least squares method.
b) Explain the meaning of the constants a and b in the equation.
c) Calculate the coefficient of determination and explain the value obtained.
d) Estimate how many cars a person can be expected to inspect during the same two-hour
period if he or she has been working at the inspection station for eight weeks.
