Lesson 6.2 Correlation and Regression Analysis Final Edition
Learning Outcomes:
Upon completion of this unit, you should be able to:
1. Describe the coefficient of correlation
2. Describe the relationship between regression and correlation
3. Use regression analysis to predict the value of a dependent variable based on an
independent variable
4. Explain the meaning of the regression coefficients b0 (intercept) and b1 (slope)
5. Evaluate the assumptions of regression analysis and know what to do if the assumptions
are violated
6. Make inferences about the slope and correlation coefficient
7. Estimate mean values and predict individual values
Discussion
How can we determine the strength of association based on the Pearson correlation
coefficient?
The stronger the association of the two variables, the closer the Pearson correlation
coefficient, r, will be to either +1 or -1 depending on whether the relationship is positive or
negative, respectively. Achieving a value of +1 or -1 means that all your data points are
included on the line of best fit – there are no data points that show any variation away
from this line. Values for r between +1 and -1 (for example, r = 0.8 or -0.4) indicate that there
is variation around the line of best fit. The closer the value of r to 0 the greater the variation
around the line of best fit. Different relationships and their correlation coefficients are
shown in the diagram below:
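As a numeric illustration of these cases, here is a minimal Python sketch. The data values and the helper name `pearson_r` are hypothetical, chosen only to show that points lying exactly on a line give r = +1 or -1, while scattered points give a value in between.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation computed from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
on_rising_line = [2, 4, 6, 8, 10]    # y = 2x, no variation around the line
on_falling_line = [10, 8, 6, 4, 2]   # y = 12 - 2x, no variation around the line
scattered = [2, 5, 3, 8, 6]          # rising overall, with variation

r_rising = pearson_r(x, on_rising_line)
r_falling = pearson_r(x, on_falling_line)
r_scattered = pearson_r(x, scattered)

print(r_rising, r_falling, round(r_scattered, 2))
```

The first two results sit at the extremes because every point falls on the line; the third is positive but smaller than 1 because the points vary around the line of best fit.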
When you interpret a coefficient of correlation, be guided by the
following reminders:
1. A relationship between two variables does not necessarily mean that one affects the
other. Correlation does not imply a cause-and-effect relationship.
2. A high computed coefficient of correlation does not mean that one
factor is strongly dependent on the other. For instance, correlating a student's
weight with the student's grade does not make sense, whereas correlating weight
with height does.
3. If two variables are meaningfully related to each other and the computed r is high, then
there is reason to believe that they are associated.
The formula is
$$r = \frac{n\sum XY - (\sum X)(\sum Y)}{\sqrt{\left[n\sum X^2 - (\sum X)^2\right]\left[n\sum Y^2 - (\sum Y)^2\right]}}$$
Example: Test the relationship between the quiz score in mathematics and statistics.
Math score (X)    Statistics score (Y)
78                82
92                88
86                91
83                90
95                92
85                85
91                89
76                81
88                96
79                77
Solution:
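The sums needed by the formula, and the resulting r, can be checked numerically. The following is a minimal Python sketch using the ten score pairs from the example; the variable names are illustrative and not part of the lesson.

```python
from math import sqrt

# Quiz scores from the example: X = mathematics, Y = statistics.
math_scores = [78, 92, 86, 83, 95, 85, 91, 76, 88, 79]
stat_scores = [82, 88, 91, 90, 92, 85, 89, 81, 96, 77]

n = len(math_scores)
sum_x = sum(math_scores)
sum_y = sum(stat_scores)
sum_xy = sum(x * y for x, y in zip(math_scores, stat_scores))
sum_x2 = sum(x * x for x in math_scores)
sum_y2 = sum(y * y for y in stat_scores)

# Computational formula for the Pearson correlation coefficient.
numerator = n * sum_xy - sum_x * sum_y
denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

print(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2)
print(round(r, 2))  # 0.73
```

With n = 10, the sums are 853, 871, 74539, 73125 and 76165, which gives r of about 0.73, a fairly strong positive relationship.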
To test the significance of the correlation coefficient r, use the t-test for r:
$$t_c = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$$
with n - 2 degrees of freedom.
The test statistic t has the same sign as the correlation coefficient r.
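Example: In the example above, r = 0.73 (0.7332 before rounding) and n - 2 = 8. At the 0.05
level of significance, the tabular value is 2.306.
$$t_c = \frac{0.7332\sqrt{10-2}}{\sqrt{1-(0.7332)^2}} = \frac{2.074}{\sqrt{0.4624}} = 3.05$$
Substituting the rounded r = 0.73 instead gives about 3.02; the conclusion is the same.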
Since the computed value (3.05) is greater than the tabular value (2.306), we reject the null
hypothesis of no correlation. Thus there is a significant relationship between the scores in
mathematics and the scores in statistics.
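The test can be sketched in a few lines of Python. The function name `t_for_r` is hypothetical, the critical value 2.306 is taken from the text rather than computed, and r = 0.7332 is an assumption obtained by carrying the correlation for the ten score pairs to four decimals.

```python
from math import sqrt

def t_for_r(r, n):
    """t statistic for testing a correlation coefficient, with n - 2 degrees of freedom."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)

n = 10
t_critical = 2.306  # tabular value quoted in the text (df = 8, 0.05 level)

t_unrounded = t_for_r(0.7332, n)  # r carried to four decimals (assumption)
t_rounded = t_for_r(0.73, n)      # r rounded to two decimals first

# Reject the null hypothesis when |t| exceeds the tabular value.
reject_null = abs(t_unrounded) > t_critical

print(round(t_unrounded, 2), round(t_rounded, 2), reject_null)
```

Rounding r before substituting shifts the statistic slightly (about 3.02 instead of 3.05), but both values exceed 2.306, so the decision does not change.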
Solution
$$a = \frac{\sum y \sum x^2 - \sum x \sum xy}{n\sum x^2 - (\sum x)^2} = \frac{(871)(73125) - (853)(74539)}{10(73125) - (853)^2} = 30.24$$
$$b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2} = \frac{10(74539) - (853)(871)}{10(73125) - (853)^2} = 0.67$$
So the regression equation is
$$y = 30.24 + 0.67x$$
That is, for every one-point increase in the mathematics score, the predicted
statistics score increases by 0.67 points.
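The intercept and slope can be reproduced from the sums used in the solution. This is a minimal Python sketch; the function name `predict_statistics` and the sample mathematics score of 80 are hypothetical, added only to show how the fitted line is used for prediction.

```python
# Sums from the solution above (x = mathematics score, y = statistics score).
n = 10
sum_x, sum_y = 853, 871
sum_x2, sum_xy = 73125, 74539

denominator = n * sum_x2 - sum_x ** 2
a = (sum_y * sum_x2 - sum_x * sum_xy) / denominator  # intercept
b = (n * sum_xy - sum_x * sum_y) / denominator       # slope

def predict_statistics(math_score):
    """Predicted statistics score for a given mathematics score."""
    return a + b * math_score

print(round(a, 2), round(b, 2))          # 30.24 0.67
print(round(predict_statistics(80), 2))  # hypothetical mathematics score of 80
```

Predictions are most trustworthy for mathematics scores inside the observed range of 76 to 95.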
Example 2. Suppose a statistics professor wants to use the number of hours a student
studies for statistics (X) to predict the final exam score (Y). A regression model was fit
based on data collected for a class during the previous semester, with the following results:
𝑦 = 35.5 + 3𝑥
Interpretation:
The Y intercept a = 35.5 indicates that when the student does not study for the
final exam, the mean final exam score is 35.5. The slope b = 3 indicates that for each
increase of one hour in studying time, the mean change in the final exam score is
predicted to be +3.0. In other words, the final exam score is predicted to increase by 3
points for each one-hour increase in studying time.
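The interpretation can be made concrete by evaluating the fitted equation at a few study times. In this minimal Python sketch the function name and the chosen hours are hypothetical; only the intercept 35.5 and slope 3 come from the example.

```python
INTERCEPT = 35.5  # a: predicted mean score with zero hours of study
SLOPE = 3.0       # b: predicted change in score per additional hour

def predicted_exam_score(hours_studied):
    """Predicted final exam score from the fitted line y = 35.5 + 3x."""
    return INTERCEPT + SLOPE * hours_studied

# Hypothetical study times, for illustration only.
for hours in (0, 1, 2, 5):
    print(hours, predicted_exam_score(hours))
```

Each additional hour raises the prediction by exactly 3 points, and zero hours returns the intercept, matching the interpretation above.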
23 21
24 21
27 23
28 25
32 24
34 25
40 29
36 25
32 26
34 26
38 28
Step two
Click the Data tab, then find and click Data Analysis.
Interpretation.
The results show a very high positive relationship: the larger the area of a store, the
higher its sales per week. The relationship is also significant, since the
probability value is 0.00.
The regression equation reveals that for every one-unit increase in area, sales per week
increase by 0.71 thousand pesos, or 710 pesos.
Note: Pearson Product Moment Correlation and Regression Analysis apply only to interval
and ratio data, and the observations must be paired.