Lesson 6.2 Correlation and Regression Analysis Final Edition

This document discusses correlation and simple regression analysis. It defines correlation analysis and regression analysis, and the difference between them. The key aspects of correlation covered include the Pearson correlation coefficient and how to interpret it. Regression analysis is introduced as using independent variables to predict dependent variables. Methods for determining correlation strength and the significance of correlation are also presented.

LESSON 6.2 CORRELATION AND SIMPLE REGRESSION

Lesson 6.2 Correlation and Simple Regression Analysis


Introduction
In this lesson, you will learn correlation analysis and simple regression analysis, which
enable you to develop a model that predicts the values of a numerical variable based on
the values of other variables. In correlation analysis, you determine the strength of the
relationship between two variables. In regression analysis, the variable you wish to
predict is called the dependent variable, and the variables used to make the prediction
are called independent variables. In addition to predicting values of the dependent
variable, regression analysis allows you to identify the type of mathematical relationship
that exists between a dependent and an independent variable, to quantify the effect that
changes in the independent variable have on the dependent variable, and to identify
unusual observations. For example, as the director of planning, you may wish to predict
the sales of a store based on the size of the store, and to determine how strongly sales
are related to the size of the area.

Learning Outcomes:
Upon completion of this unit, you should be able to:
1. Describe the coefficient of correlation
2. Describe the relationship between regression and correlation
3. Use regression analysis to predict the value of a dependent variable based on an
independent variable
4. Explain the meaning of the regression coefficients b0 and b1 (written as a and b in
this lesson)
5. Evaluate the assumptions of regression analysis and know what to do if the assumptions
are violated
6. Make inferences about the slope and correlation coefficient
7. Estimate mean values and predict individual values

Discussion

Pearson Product-Moment Correlation

The Pearson product-moment correlation coefficient (or Pearson correlation
coefficient, for short) is a measure of the strength of a linear association between two
variables. It is denoted by r for a sample and by ρ (read as "rho") for a population.
Basically, a Pearson product-moment correlation attempts to draw a line of best fit through
the data of two variables, and the Pearson correlation coefficient, r, indicates how far
away the data points are from this line of best fit (i.e., how well the data points fit
this line).

The Pearson correlation coefficient, r, can take a range of values from -1 to +1
(−1 ≤ r ≤ 1). A value of 0 indicates that there is no linear association between the two
variables. A value greater than 0 indicates a positive association; that is, as the value
of one variable increases, so does the value of the other variable. A value less than 0
indicates a negative association; that is, as the value of one variable increases, the
value of the other variable decreases. This is shown in the diagram below:

How can we determine the strength of association based on the Pearson correlation
coefficient?

The stronger the association of the two variables, the closer the Pearson correlation
coefficient, r, will be to either +1 or -1 depending on whether the relationship is positive or
negative, respectively. Achieving a value of +1 or -1 means that all your data points are
included on the line of best fit – there are no data points that show any variation away
from this line. Values for r between +1 and -1 (for example, r = 0.8 or -0.4) indicate that there
is variation around the line of best fit. The closer the value of r to 0 the greater the variation
around the line of best fit. Different relationships and their correlation coefficients are
shown in the diagram below:
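
The same behavior can be checked numerically. The sketch below is only an illustration: the three small data sets and the helper name `pearson_r` are made up for this purpose and are not part of the lesson. It uses the computational formula for r given later in this lesson, and shows that points lying exactly on a line give r = +1 or r = -1, while points that vary around the line give a value in between.

```python
import math

def pearson_r(xs, ys):
    """Pearson r from the raw-sums (computational) formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Hypothetical data, chosen only to illustrate the three cases
xs = [1, 2, 3, 4, 5]
on_rising_line = [2 * x + 1 for x in xs]    # every point lies on y = 2x + 1
on_falling_line = [10 - 3 * x for x in xs]  # every point lies on y = 10 - 3x
scattered = [2, 1, 4, 3, 5]                 # rising trend with some variation

r_up = pearson_r(xs, on_rising_line)
r_down = pearson_r(xs, on_falling_line)
r_mid = pearson_r(xs, scattered)
print(r_up, r_down, r_mid)
```

For these made-up points the first two sets give the extreme values, and the scattered set gives a value strictly between 0 and 1.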

When you interpret the coefficient of correlation, you should be guided by the
following reminders:
1. A relationship between the variables does not necessarily mean that one affects the
other. Correlation does not imply a cause-and-effect relationship.
2. If the computed coefficient of correlation is high, it does not mean that one
factor is strongly dependent on the other. The pairing must also make sense. For
instance, correlating the weight and the grades of students does not make
sense, while correlating the weight and the height of students does.
3. If two variables are related to each other and the computed r is high, then
there is a reason to believe that they are meant to be associated.

Suggested guidelines for interpreting Pearson's correlation coefficient

Strength of association    Negative              Positive
No correlation             r = 0                 r = 0
Weak                       −0.25 ≤ r < 0         0 < r ≤ 0.25
Moderate                   −0.50 ≤ r < −0.25     0.25 < r ≤ 0.50
Strong                     −0.75 ≤ r < −0.50     0.50 < r ≤ 0.75
Very strong                −1 < r < −0.75        0.75 < r < 1
Perfect                    r = −1                r = 1
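
As a minimal sketch, the guidelines can be written as a small lookup function. The function name `describe_strength` is hypothetical, and the cut-offs of 0.25, 0.50 and 0.75 are treated as continuous band edges, which is an assumption made here so that every value of r falls into exactly one band.

```python
def describe_strength(r):
    """Return the guideline label for a Pearson correlation coefficient r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    if r == 0:
        return "no correlation"
    size = abs(r)
    # Assumed band edges: 0.25, 0.50 and 0.75 (upper edge included in each band)
    if size == 1:
        label = "perfect"
    elif size > 0.75:
        label = "very strong"
    elif size > 0.50:
        label = "strong"
    elif size > 0.25:
        label = "moderate"
    else:
        label = "weak"
    direction = "positive" if r > 0 else "negative"
    return f"{label} {direction} correlation"

for value in (0.8, -0.4, 0.0, -1.0):
    print(value, "->", describe_strength(value))
```

Such labels are only rules of thumb; the reminders above about cause and effect still apply to any value of r.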

The formula is

$$r = \frac{n\sum XY - (\sum X)(\sum Y)}{\sqrt{\left[n\sum X^2 - (\sum X)^2\right]\left[n\sum Y^2 - (\sum Y)^2\right]}}$$

Example: Test the relationship between the quiz score in mathematics and statistics.
math score statistics score
78 82
92 88
86 91
83 90
95 92
85 85
91 89
76 81
88 96
79 77

Solution:

Student   Math score (X)   Statistics score (Y)   XY      X²      Y²
1         78               82                     6396    6084    6724
2         92               88                     8096    8464    7744
3         86               91                     7826    7396    8281
4         83               90                     7470    6889    8100
5         95               92                     8740    9025    8464
6         85               85                     7225    7225    7225
7         91               89                     8099    8281    7921
8         76               81                     6156    5776    6561
9         88               96                     8448    7744    9216
10        79               77                     6083    6241    5929
Total     853              871                    74539   73125   76165

That is, ∑X = 853, ∑Y = 871, ∑XY = 74539, ∑X² = 73125, ∑Y² = 76165, with n = 10.

$$r = \frac{n\sum XY - (\sum X)(\sum Y)}{\sqrt{\left[n\sum X^2 - (\sum X)^2\right]\left[n\sum Y^2 - (\sum Y)^2\right]}} = \frac{10(74539) - (853)(871)}{\sqrt{\left[10(73125) - (853)^2\right]\left[10(76165) - (871)^2\right]}}$$

$$r = \frac{745390 - 742963}{\sqrt{[731250 - 727609][761650 - 758641]}} = \frac{2427}{\sqrt{(3641)(3009)}} = \frac{2427}{\sqrt{10955769}} = \frac{2427}{3309.95} = 0.7332 \approx 0.73$$
Since the value of r is 0.73, there is a strong positive linear relationship
between the students' math and statistics scores. Thus students who scored high in
mathematics tend to score high in statistics as well.
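
The hand computation can be reproduced with a short script. This is a sketch using only the Python standard library; the variable names are chosen here for illustration, and the data are the ten score pairs from the example.

```python
import math

math_scores = [78, 92, 86, 83, 95, 85, 91, 76, 88, 79]   # X
stat_scores = [82, 88, 91, 90, 92, 85, 89, 81, 96, 77]   # Y

n = len(math_scores)
sum_x = sum(math_scores)
sum_y = sum(stat_scores)
sum_xy = sum(x * y for x, y in zip(math_scores, stat_scores))
sum_x2 = sum(x * x for x in math_scores)
sum_y2 = sum(y * y for y in stat_scores)

# Same raw-sums formula as in the worked solution
numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

print("sums:", sum_x, sum_y, sum_xy, sum_x2, sum_y2)
print("r =", round(r, 4))
```

The printed sums should match the totals row of the solution table, which is a quick way to catch encoding mistakes before interpreting r.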

Test of significance of correlation coefficient r

To test the significance of the correlation coefficient r, use the t-test for r. The null
hypothesis is that there is no linear relationship between the two variables (ρ = 0). The
test statistic is

$$t_c = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$$

with n − 2 degrees of freedom.

The test statistic t has the same sign as the correlation coefficient r.

Example: From the example above, r = 0.7332 (0.73 when rounded) and n − 2 = 8. At the 0.05
level of significance, the tabular value is 2.306.

$$t_c = \frac{0.7332\sqrt{10-2}}{\sqrt{1-(0.7332)^2}} = \frac{2.0738}{\sqrt{0.4624}} = \frac{2.0738}{0.6800} = 3.05$$

Since the computed value (3.05) is greater than the tabular value (2.306), we reject the
null hypothesis. Thus there is a significant relationship between the scores in
mathematics and the scores in statistics.
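
The test can be sketched in a few lines of Python. The script below assumes the unrounded coefficient r = 0.7332 from the worked example and takes the tabular value 2.306 directly from the lesson rather than computing it; the variable names are illustrative.

```python
import math

r = 0.7332          # correlation coefficient from the worked example
n = 10              # number of paired observations
t_critical = 2.306  # tabular value for 8 degrees of freedom, as given in the lesson

degrees_of_freedom = n - 2
t_computed = r * math.sqrt(degrees_of_freedom) / math.sqrt(1 - r ** 2)

# Compare magnitudes, since t has the same sign as r
reject_null = abs(t_computed) > t_critical

print("t =", round(t_computed, 2))
print("reject null hypothesis:", reject_null)
```

Using the rounded value r = 0.73 instead gives a slightly smaller statistic (about 3.02), which still exceeds 2.306, so the conclusion does not change.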

Simple Regression Analysis

Correlation versus Regression

Correlation analysis is used to measure the strength of the association (linear
relationship) between two variables. It is concerned only with the strength of the
relationship, and no causal effect is implied. Regression analysis, in contrast, is used
to: a) predict the value of a dependent variable based on the value of at least one
independent variable, and b) explain the impact of changes in an independent variable
on the dependent variable. The dependent variable is the variable you wish to explain,
while the independent variable is the variable used to explain the dependent variable.

In simple linear regression, a single numerical independent variable, X, is used to
predict the numerical dependent variable, Y, such as using the size of a store to predict
the annual sales of the store.

SIMPLE LINEAR REGRESSION MODEL

$$y = a + bx$$ (regression equation)

Where:
a = estimate of the Y-intercept for the population

$$a = \frac{\sum y \sum x^2 - \sum x \sum xy}{n\sum x^2 - (\sum x)^2}$$

b = estimate of the slope for the population

$$b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2}$$

y = dependent variable (sometimes referred to as the response variable) for an
observation
x = independent variable (sometimes referred to as the explanatory variable) for an
observation

From the example above (math score x, statistics score y, n = 10), the required sums are:

∑x = 853, ∑y = 871, ∑xy = 74539, ∑x² = 73125, ∑y² = 76165

Solution

$$a = \frac{\sum y \sum x^2 - \sum x \sum xy}{n\sum x^2 - (\sum x)^2} = \frac{(871)(73125) - (853)(74539)}{10(73125) - (853)^2} = 30.24$$

$$b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2} = \frac{10(74539) - (853)(871)}{10(73125) - (853)^2} = 0.67$$

So the regression equation is

$$y = 30.24 + 0.67x$$

So we can say that for every one-point increase in the mathematics score, the statistics
score is predicted to increase by 0.67 points.
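
As a check on the arithmetic, the sketch below computes a and b from the same ten score pairs and then uses the fitted line for a prediction. The helper name `predict_statistics_score` and the math score of 80 are hypothetical, chosen only to show how the equation is used.

```python
math_scores = [78, 92, 86, 83, 95, 85, 91, 76, 88, 79]   # x
stat_scores = [82, 88, 91, 90, 92, 85, 89, 81, 96, 77]   # y

n = len(math_scores)
sum_x = sum(math_scores)
sum_y = sum(stat_scores)
sum_xy = sum(x * y for x, y in zip(math_scores, stat_scores))
sum_x2 = sum(x * x for x in math_scores)

# Both coefficients share the same denominator
denominator = n * sum_x2 - sum_x ** 2
a = (sum_y * sum_x2 - sum_x * sum_xy) / denominator   # intercept
b = (n * sum_xy - sum_x * sum_y) / denominator        # slope

def predict_statistics_score(math_score):
    """Predicted statistics score from the fitted line y = a + b*x."""
    return a + b * math_score

print("a =", round(a, 2), "b =", round(b, 2))
print("predicted statistics score for a math score of 80:",
      round(predict_statistics_score(80), 2))
```

Predictions are most trustworthy for math scores inside the observed range (76 to 95 here); the line says little about scores far outside it.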

Interpreting the y-intercept a and slope b.

Example 2. Suppose a statistics professor wants to use the number of hours a student
studies for a statistics final exam (X) to predict the final exam score (Y). A regression
model was fit based on data collected for a class during the previous semester, with the
following result:

𝑦 = 35.5 + 3𝑥

What is the interpretation of the Y intercept, a, and the slope, b?

Interpretation:
The Y intercept a = 35.5 indicates that when the student does not study for the
final exam, the mean final exam score is predicted to be 35.5. The slope b = 3 indicates
that for each increase of one hour in studying time, the mean change in the final exam
score is predicted to be +3.0. In other words, the final exam score is predicted to
increase by 3 points for each one-hour increase in studying time.
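
A short sketch makes the two interpretations visible. The function name `predicted_exam_score` and the study times of 0, 4 and 5 hours are hypothetical values picked for illustration; only the equation y = 35.5 + 3x comes from the example.

```python
INTERCEPT = 35.5   # a: predicted mean score with zero hours of study
SLOPE = 3.0        # b: predicted change in score per additional hour

def predicted_exam_score(hours_studied):
    """Predicted final exam score from y = 35.5 + 3x."""
    return INTERCEPT + SLOPE * hours_studied

score_0 = predicted_exam_score(0)   # equals the intercept
score_4 = predicted_exam_score(4)
score_5 = predicted_exam_score(5)

print(score_0, score_4, score_5)
print("gain from one extra hour:", score_5 - score_4)   # equals the slope
```

The value at zero hours reproduces the intercept, and the difference between any two study times one hour apart reproduces the slope.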

Example. Using Microsoft Excel

Test if there is a significant relationship between the area of a store and its weekly sales.
(Note: the same commodities are being sold in the stores, and they are in the same location/street.)

X (area of store,      Y (average sales per week,
in square meters)      in thousand pesos)
20                     12
24                     15
25                     16
30                     20
23                     21
24                     21
27                     23
28                     25
32                     24
34                     25
40                     29
36                     25
32                     26
34                     26
38                     28

Step one. Encode the data in Microsoft Excel.

Step two. Go to the Data tab and look for Data Analysis.

Step three. Click Data Analysis and find Regression.

Step four. Click Regression and click OK. Fill in the input ranges for Y and X, select
Output Range, and click any cell that is free from data.

Step five. Click OK.

Step six. Record the following results from the output:


Pearson r is the value of Multiple R, which is 0.86.

The probability value is 3.74878E-05, that is, 3.74878 × 10^-5 or equivalently 0.0000374878.

For the regression equation y = a + bx, a = 1.281701 (or 1.28) and b = 0.708668 (or 0.71);
thus the regression equation is y = 1.28 + 0.71x.

Interpretation.
The results show a very strong positive relationship (r = 0.86): the larger the area of a
store, the higher its sales per week. The relationship is significant, since the
probability value (0.0000375, which rounds to 0.00) is less than the 0.05 level of
significance.
The regression equation reveals that for every one-square-meter increase in area, sales
per week are predicted to increase by 0.71 thousand pesos, or about 710 pesos.
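
The Excel figures can be cross-checked without a spreadsheet. The sketch below is an assumption-laden alternative rather than the lesson's method: it applies the same raw-sums formulas from earlier in this lesson to the store data, using only the Python standard library. It does not compute the probability value, which requires a t-distribution table or a statistics package.

```python
import math

area = [20, 24, 25, 30, 23, 24, 27, 28, 32, 34, 40, 36, 32, 34, 38]    # X, square meters
sales = [12, 15, 16, 20, 21, 21, 23, 25, 24, 25, 29, 25, 26, 26, 28]   # Y, thousand pesos

n = len(area)
sum_x, sum_y = sum(area), sum(sales)
sum_xy = sum(x * y for x, y in zip(area, sales))
sum_x2 = sum(x * x for x in area)
sum_y2 = sum(y * y for y in sales)

sxy = n * sum_xy - sum_x * sum_y   # shared numerator of r and b
sxx = n * sum_x2 - sum_x ** 2
syy = n * sum_y2 - sum_y ** 2

r = sxy / math.sqrt(sxx * syy)                 # Excel's "Multiple R"
b = sxy / sxx                                  # slope
a = (sum_y * sum_x2 - sum_x * sum_xy) / sxx    # intercept

print("r =", round(r, 2))
print("a =", round(a, 6), "b =", round(b, 6))
print("pesos per extra square meter:", round(b * 1000))
```

Because sales are recorded in thousand pesos, the slope is multiplied by 1000 to express the predicted weekly increase in pesos.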

Note: Pearson product-moment correlation and regression analysis apply only to interval
and ratio data, and the data must be paired observations.
