Lesson 1-Correlation
MATH 212
ENGINEERING DATA ANALYSIS
Faculty Information:
Getting help
TABLE OF CONTENTS
Module 5 Outcomes
Lesson 1
Application 1
References
MODULE OVERVIEW
MODULE OUTCOME
Learning Outcome:
Introduction
Abstraction
Properties of r
1. The value of r does not depend on the unit of measurement for either variable. For
example, if x is height, the corresponding z score is the same whether height is
expressed in inches, meters, or miles, and thus the value of the correlation coefficient
is not affected. The correlation coefficient measures the inherent strength
of the linear relationship between two numerical variables.
2. The value of r does not depend on which of the two variables is considered x.
3. The value of r is between -1 and +1. A value near the upper limit, +1, indicates
a strong positive relationship, whereas an r close to the lower limit, -1, suggests a
strong negative relationship.
Figure 1 shows a useful way to describe the strength of relationship based on r. It
may seem surprising that a value of r as extreme as -0.5 or 0.5 should be in the weak
category. Even a weak correlation can indicate a meaningful relationship.
5. The value of r is a measure of the extent to which x and y are linearly related—that
is, the extent to which the points in the scatterplot fall close to a straight line. A
value of r close to 0 does not rule out any strong relationship between x and y;
there could still be a strong relationship that is not linear.
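The properties above can be checked numerically. The Python sketch below is only an illustration: `pearson_r` is a hypothetical helper that applies the form r = S_xy / sqrt(S_xx S_yy), and it reuses the Algebra and English grades of Example 1 as sample data. It rescales one variable (property 1), swaps x and y (property 2), reverses the sign of y (property 3), and finally feeds in a perfect but non-linear pattern (property 5).

```python
import math


def pearson_r(x, y):
    """Correlation coefficient r = S_xy / sqrt(S_xx * S_yy) (illustrative helper)."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    s_xy = sum(a * b for a, b in zip(x, y)) - sx * sy / n
    s_xx = sum(a * a for a in x) - sx ** 2 / n
    s_yy = sum(b * b for b in y) - sy ** 2 / n
    return s_xy / math.sqrt(s_xx * s_yy)


# Example 1 grades (x = Algebra, y = English), reused only as sample data.
algebra = [72, 90, 79, 80, 88, 90, 78, 82, 90, 70]
english = [76, 88, 80, 78, 90, 92, 80, 82, 89, 68]

r = pearson_r(algebra, english)

# Property 1: changing the unit (percent -> proportion) leaves r unchanged.
r_rescaled = pearson_r([g / 100 for g in algebra], english)

# Property 2: swapping the roles of x and y leaves r unchanged.
r_swapped = pearson_r(english, algebra)

# Property 3: r stays within [-1, +1]; reversing the direction of y flips its sign.
r_negated = pearson_r(algebra, [-g for g in english])

# Property 5: y is completely determined by x (y = x^2), yet r = 0.
xs = [-3, -2, -1, 0, 1, 2, 3]
r_quadratic = pearson_r(xs, [v ** 2 for v in xs])

print(round(r, 3), round(r_rescaled, 3), round(r_swapped, 3))
print(round(r_negated, 3), round(r_quadratic, 3))
```

For x values symmetric about 0, y = x^2 gives S_xy = 0, so r is exactly 0 even though the relationship is perfect; it simply is not linear.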
A value of r close to 1 indicates that the larger values of one variable tend to be
associated with the larger values of the other variable. This is far from saying that a
large value of one variable causes the value of the other variable to be large.
Correlation measures the extent of association, but association does not imply
causation. It frequently happens that two variables are highly correlated not because
one is causally related to the other but because they are both strongly related to a third
variable.
Scientific experiments can frequently make a strong case for causality by carefully
controlling the values of all variables that might be related to the ones under study.
Then, if y is observed to change in a “smooth” way as the experimenter changes the
value of x, a plausible explanation would be that there is a causal relationship between
x and y. In the absence of such control and ability to manipulate values of one
variable, we must admit the possibility that an unidentified underlying third variable
is influencing both the variables under investigation.
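$$ r = \frac{S_{xy}}{\sqrt{S_{xx}}\,\sqrt{S_{yy}}} $$
where:
$$ S_{xy} = \sum xy - \frac{(\sum x)(\sum y)}{n} $$
$$ S_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} $$
$$ S_{yy} = \sum y^2 - \frac{(\sum y)^2}{n} $$
Hypotheses: $H_o: \rho = 0$ against (a) $H_1: \rho > 0$, (b) $H_1: \rho < 0$, or (c) $H_1: \rho \neq 0$.
3. Rejection Criterion:
a. Reject $H_o$ if $t > t_{\alpha}$.
b. Reject $H_o$ if $t < -t_{\alpha}$.
c. Reject $H_o$ if $|t| > t_{\alpha/2}$.
$$ t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} $$
where t = computed t-statistic (with n - 2 degrees of freedom)
r = correlation coefficient
n = sample size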
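The test can be sketched in Python as follows. The function names `t_statistic` and `reject_h0` and the `alternative` labels are hypothetical; the critical value is passed in as an argument because this module reads it from a t table with n - 2 degrees of freedom rather than computing it. The values r = 0.5 and n = 12 are made up for illustration, and 1.812 and 2.228 are the standard table values of t_0.05 and t_0.025 for 10 degrees of freedom.

```python
import math


def t_statistic(r, n):
    """Computed t-statistic for H_o: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


def reject_h0(t, t_crit, alternative):
    """Apply rejection criterion a, b or c.

    t_crit is a tabulated value for n - 2 degrees of freedom:
    t_alpha for the one-tailed cases, t_(alpha/2) for the two-tailed case.
    """
    if alternative == "greater":    # a. H_1: rho > 0
        return t > t_crit
    if alternative == "less":       # b. H_1: rho < 0
        return t < -t_crit
    if alternative == "two-sided":  # c. H_1: rho != 0
        return abs(t) > t_crit
    raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")


# Illustrative (made-up) values: r = 0.5 from a sample of n = 12, so df = 10.
t = t_statistic(0.5, 12)
print(f"t = {t:.3f}")
print("a. one-tailed, t_0.05 = 1.812:", reject_h0(t, 1.812, "greater"))
print("c. two-tailed, t_0.025 = 2.228:", reject_h0(t, 2.228, "two-sided"))
```

Here t is about 1.83, so criterion a rejects H_o while criterion c does not: the same r and n can lead to different decisions depending on the alternative hypothesis.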
Example 1: Compute and interpret the correlation coefficient for the following
grades of ten students selected at random. Is there a significant relationship
between mathematics and English grades?
Algebra Grade (x)   72  90  79  80  88  90  78  82  90  70
English Grade (y)   76  88  80  78  90  92  80  82  89  68
Solution:
Application:
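Solving for r:
$$ r = \frac{S_{xy}}{\sqrt{S_{xx}}\,\sqrt{S_{yy}}} $$
where:
$$ S_{xy} = \sum xy - \frac{(\sum x)(\sum y)}{n} = 67886 - \frac{(819)(823)}{10} = 482.3 $$
$$ S_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 67577 - \frac{(819)^2}{10} = 500.9 $$
$$ S_{yy} = \sum y^2 - \frac{(\sum y)^2}{n} = 68237 - \frac{(823)^2}{10} = 504.10 $$
Therefore,
$$ r = \frac{482.3}{\sqrt{500.9}\,\sqrt{504.10}} = 0.960 $$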
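The hand computation above can be checked with a few lines of Python. The variable names are illustrative; the comments give the intermediate sums and the S values used in the solution.

```python
import math

# Example 1 data: x = Algebra grade, y = English grade (n = 10 students)
x = [72, 90, 79, 80, 88, 90, 78, 82, 90, 70]
y = [76, 88, 80, 78, 90, 92, 80, 82, 89, 68]
n = len(x)

sum_x, sum_y = sum(x), sum(y)               # 819, 823
sum_x2 = sum(v * v for v in x)               # 67577
sum_y2 = sum(v * v for v in y)               # 68237
sum_xy = sum(a * b for a, b in zip(x, y))    # 67886

s_xy = sum_xy - sum_x * sum_y / n            # 482.3
s_xx = sum_x2 - sum_x ** 2 / n               # 500.9
s_yy = sum_y2 - sum_y ** 2 / n               # 504.1

r = s_xy / (math.sqrt(s_xx) * math.sqrt(s_yy))
print(f"S_xy = {s_xy:.1f}, S_xx = {s_xx:.1f}, S_yy = {s_yy:.1f}, r = {r:.3f}")
```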
Based on the qualitative interpretation of r, the result indicates a very strong
positive linear relationship between the English and Mathematics grades of the
students. That is, students with higher English grades tend to have higher
Mathematics grades.
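3. Rejection Criterion:
Reject $H_o$ if $|t| > t_{\alpha/2}$.
$|t| > t_{0.025} = 2.306$ with $n - 2 = 8$ degrees of freedom (refer to Appendix Table 1)
$$ t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} = \frac{0.960\sqrt{10-2}}{\sqrt{1-(0.960)^2}} = 9.697 $$
Since $|t| = 9.697 > 2.306$, reject $H_o$. There is a significant relationship between the Mathematics and English grades at the 0.05 level of significance.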
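The test statistic and decision of this example can be sketched in Python. It uses r = 0.960 as rounded in the solution and the tabulated critical value 2.306; the code does not compute the critical value itself.

```python
import math

r = 0.960       # correlation coefficient from Example 1 (rounded as in the solution)
n = 10          # number of students
t_crit = 2.306  # t_0.025 with n - 2 = 8 degrees of freedom, from the t table

t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
reject = abs(t) > t_crit  # two-tailed criterion: reject H_o if |t| > t_(alpha/2)
print(f"t = {t:.3f}; reject H_o: {reject}")
```

Feeding in the unrounded r (about 0.9598) gives t of roughly 9.67 instead; the decision is the same.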
The coefficient of determination, $r^2$:
• gives the proportion of the total variation in y that is accounted for by the
independent variable x.
• ranges from 0 to 1, or from 0 to 100 if expressed in %.
The nearer the value is to 1 (or 100%), the better the fit of the regression line.
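As an illustration applied to Example 1, r^2 can be computed directly from S_xy, S_xx and S_yy, since r^2 = S_xy^2 / (S_xx S_yy). The helper name `coefficient_of_determination` is hypothetical.

```python
def coefficient_of_determination(s_xy, s_xx, s_yy):
    """r^2 = S_xy^2 / (S_xx * S_yy); always between 0 and 1."""
    return s_xy ** 2 / (s_xx * s_yy)


# S values from the Example 1 solution (x = Algebra, y = English)
r2 = coefficient_of_determination(482.3, 500.9, 504.1)
print(f"r^2 = {r2:.4f}, i.e. about {r2 * 100:.1f}% of the variation in y")
```

This gives about 0.921, so roughly 92% of the variation in the English grades is accounted for by the Algebra grades. Squaring the rounded r = 0.960 gives 0.9216; the small difference comes only from rounding.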
Application
1. Here are the numbers of hours that 10 Math 121 students spent studying for a
final exam (x) and their scores on that exam (y).
Closure
Well done! You have just finished Lesson 1 of this module. If there are parts of the
lesson that need clarification, please ask your tutor during your face-to-face or
online interactions. You may proceed to Lesson 2, which introduces Inference for Two
or More Samples.
References
Peck, R., Olsen, C., & Devore, J. L. (2012). Introduction to Statistics and Data
Analysis (4th ed.). Boston, MA: Brooks/Cole, Cengage Learning.
Walpole, R. E., & Myers, R. H. (1993). Probability and Statistics for Engineers and
Scientists (5th ed.). New York: Macmillan Publishing Company.