
ENGINEERING DATA ANALYSIS

University of Southeastern Philippines


COLLEGE OF ENGINEERING
Obrero, Davao City

MATH 212
ENGINEERING DATA ANALYSIS

DALIA M. RECONALLA, Ph.D


August 2020


Faculty Information:

Name: Dalia M. Reconalla


Email: [email protected]
Contact Number: 0906-209-6611
Office: College of Engineering
Contact Number: (082) 224-3334
Consultation Hours: By appointment, which may be arranged through:
o Official email
o Facebook Messenger/Facebook group chat
o Text or call

Getting help

For academic concerns (College/Adviser - Contact details)


For administrative concerns (College Dean - Contact details)
For UVE concerns (KMD - Contact details)
For health and wellness concerns (UAGC, HSD and OSAS - Contact details)


TABLE OF CONTENTS

Cover Page
Faculty Information
Table of Contents
Module 5 Overview
Module 5 Outcomes
Lesson 1
Application 1
References
Appendix Table I


MODULE OVERVIEW

In this module, we will examine the relationship between two numerical variables: we will determine whether a relationship exists and how strong it is. We will then make inferences about the population correlation coefficient using the sample correlation coefficient. This module covers three lessons:

o Lesson 1: Correlation: Estimating the Strength of Linear Relationship
o Lesson 2: Simple Linear Regression
o Lesson 3: Multiple Linear Regression

MODULE OUTCOME

At the completion of the module, you should be able to:

o Estimate the strength of the relationship between two variables and among more than two variables.
o Formulate simple linear and multiple linear regression models.
o Test for significant relationships among the variables in multiple linear regression.


Learning Outcome:

o Estimate the strength of the relationship between two variables.

Time Frame: Week 10

Introduction

Correlation deals primarily with the magnitude and direction of relationships. It measures the strength of the relationship between two variables of interest; this strength is quantified by the correlation coefficient, and whether the relationship is statistically significant can then be tested.

Abstraction

Correlation analysis is a statistical technique used to determine the strength, or degree, of the linear relationship between two variables.

The measure of the degree of linear relationship is called the correlation coefficient, r.

The more pronounced the linear relationship, the greater the magnitude of the correlation coefficient.

Properties of r

1. The value of r does not depend on the unit of measurement for either variable. For example, if x is height, the corresponding z scores are the same whether height is expressed in inches, meters, or miles; because r can be computed from the z scores of x and y, its value is not affected. The correlation coefficient measures the inherent strength of the linear relationship between two numerical variables.

2. The value of r does not depend on which of the two variables is considered x.

3. The value of r is between −1 and +1. A value near the upper limit, +1, indicates a strong positive relationship, whereas an r close to the lower limit, −1, suggests a strong negative relationship.
Figure 1 shows a useful way to describe the strength of a relationship based on r. It may seem surprising that a value of r as extreme as −0.5 or 0.5 should be in the weak category. Even a weak correlation can indicate a meaningful relationship.

Figure 1. Describing the strength of a linear relationship.


Source: Introduction to Statistics and Data Analysis by Peck, Olsen, and Devore (2012)


4. A correlation coefficient of r = 1 occurs only when all the points in a scatterplot of the data lie exactly on a straight line that slopes upward. Similarly, r = −1 only when all the points lie exactly on a downward-sloping line. Only when there is a perfect linear relationship between x and y in the sample does r take on one of its two possible extreme values.

5. The value of r is a measure of the extent to which x and y are linearly related, that is, the extent to which the points in the scatterplot fall close to a straight line. A value of r close to 0 does not rule out a strong relationship between x and y; there could still be a strong relationship that is not linear.
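Properties 1, 2 and 5 can be checked numerically. The Python sketch below is illustrative only: the height and weight values are hypothetical numbers made up for the demonstration, and `pearson_r` is a helper defined here that uses the computational formula r = S_xy / √(S_xx S_yy) given later in this lesson. Converting inches to centimetres and swapping x and y should leave r unchanged, while y = x² on x values symmetric about 0 gives r = 0 even though y is completely determined by x.

```python
import math

def pearson_r(x, y):
    """Sample correlation coefficient r = S_xy / sqrt(S_xx * S_yy)."""
    n = len(x)
    s_xy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
    s_xx = sum(a * a for a in x) - sum(x) ** 2 / n
    s_yy = sum(b * b for b in y) - sum(y) ** 2 / n
    return s_xy / math.sqrt(s_xx * s_yy)

# Hypothetical data, for illustration only: heights (inches) and weights (pounds).
height_in = [60, 62, 65, 68, 70, 72]
weight_lb = [115, 120, 140, 150, 165, 180]

# Property 1: rescaling x by a positive factor (inches -> centimetres) leaves r unchanged.
height_cm = [h * 2.54 for h in height_in]
r_in = pearson_r(height_in, weight_lb)
r_cm = pearson_r(height_cm, weight_lb)

# Property 2: swapping which variable is called x leaves r unchanged.
r_swapped = pearson_r(weight_lb, height_in)

# Property 5: y = x**2 on x values symmetric about 0 is an exact but non-linear
# relationship, yet r is 0.
x_sym = [-3, -2, -1, 0, 1, 2, 3]
y_sq = [v ** 2 for v in x_sym]
r_curve = pearson_r(x_sym, y_sq)

print(f"r (inches) = {r_in:.4f}, r (cm) = {r_cm:.4f}, r (swapped) = {r_swapped:.4f}")
print(f"r for y = x^2: {r_curve:.4f}")
```

Note that only a positive rescaling preserves r exactly; multiplying one variable by a negative number would flip the sign of r.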

Range of r          Degree/Strength of Relationship
±1                  Perfect relationship
±0.80 to ±0.99      Strong
±0.50 to ±0.79      Moderate
±0.01 to ±0.49      Weak
0                   No correlation

Note: One should be careful in interpreting the correlation coefficient when it is near zero. It is possible that variables x and y are strongly related but not in a linear way.
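The table of ranges can be turned into a small lookup. This Python sketch rests on one assumption the table does not state: 0.80 and 0.50 are treated as the lower bounds of the Strong and Moderate bands, so a value such as 0.795 that falls between the printed ranges is assigned to the lower band. The function name `describe_strength` is hypothetical. As the note warns, a "Weak" or "No correlation" label says nothing about possible non-linear relationships.

```python
import math

def describe_strength(r):
    """Return the verbal description of r from the table of ranges.

    Assumption: 0.80 and 0.50 are the lower bounds of the Strong and Moderate
    bands, so values between the printed ranges (e.g. 0.795) go to the lower band.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    size = abs(r)
    if size == 0.0:
        return "No correlation"
    if math.isclose(size, 1.0):
        label = "Perfect relationship"
    elif size >= 0.80:
        label = "Strong"
    elif size >= 0.50:
        label = "Moderate"
    else:
        label = "Weak"
    direction = "positive" if r > 0 else "negative"
    return f"{label} ({direction})"

for value in (0.960, -0.65, 0.3, 0.0):
    print(value, "->", describe_strength(value))
```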

Correlation and Causation

A value of r close to 1 indicates that the larger values of one variable tend to be
associated with the larger values of the other variable. This is far from saying that a
large value of one variable causes the value of the other variable to be large.

Correlation measures the extent of association, but association does not imply
causation. It frequently happens that two variables are highly correlated not because
one is causally related to the other but because they are both strongly related to a third
variable.
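A small simulation can illustrate the third-variable explanation. In this hypothetical Python sketch, a lurking variable z drives both x and y, and nothing lets x influence y; the noise levels, sample size and seed are arbitrary choices. The sample r between x and y still comes out high (the theoretical value for these settings is 1/1.09 ≈ 0.92), but once the known contribution of z is subtracted from each variable, the remaining parts are essentially uncorrelated.

```python
import math
import random

def pearson_r(x, y):
    """Sample correlation coefficient r = S_xy / sqrt(S_xx * S_yy)."""
    n = len(x)
    s_xy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n
    s_xx = sum(a * a for a in x) - sum(x) ** 2 / n
    s_yy = sum(b * b for b in y) - sum(y) ** 2 / n
    return s_xy / math.sqrt(s_xx * s_yy)

random.seed(1)  # fixed seed so the simulation is reproducible
n = 1000

# Hypothetical lurking variable z; x and y each depend on z plus independent noise.
# Nothing in the simulation lets x influence y.
z = [random.gauss(0, 1) for _ in range(n)]
x = [zi + random.gauss(0, 0.3) for zi in z]
y = [zi + random.gauss(0, 0.3) for zi in z]

r_xy = pearson_r(x, y)

# Subtract the known contribution of z from each variable and correlate the remainders.
x_left = [xi - zi for xi, zi in zip(x, z)]
y_left = [yi - zi for yi, zi in zip(y, z)]
r_left = pearson_r(x_left, y_left)

print(f"r(x, y) = {r_xy:.3f}")
print(f"r after removing z = {r_left:.3f}")
```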

Scientific experiments can frequently make a strong case for causality by carefully controlling the values of all variables that might be related to the ones under study. Then, if y is observed to change in a "smooth" way as the experimenter changes the value of x, a plausible explanation would be that there is a causal relationship between x and y. In the absence of such control and the ability to manipulate the values of one variable, we must admit the possibility that an unidentified underlying third variable is influencing both variables under investigation.

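Estimation of Correlation Coefficient, $\rho$

Based on a simple random sample of size n, an estimator of the population correlation coefficient $\rho$ is the sample correlation coefficient, r, defined as

$$r = \frac{S_{xy}}{\sqrt{S_{xx}\,S_{yy}}}$$

where $S_{xy}$ is the sum of the cross-products of x and y (corrected for their means), and $S_{xx}$ and $S_{yy}$ are the corresponding sums of squares:

$$S_{xy} = \sum xy - \frac{(\sum x)(\sum y)}{n}$$

$$S_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} \qquad S_{yy} = \sum y^2 - \frac{(\sum y)^2}{n}$$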
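The computational formulas translate directly into code. The Python sketch below (the function names `sums_of_squares` and `sample_r` are hypothetical) computes S_xy, S_xx and S_yy from raw sums and cross-checks S_xy against the equivalent deviation form Σ(x − x̄)(y − ȳ) on a small made-up data set.

```python
import math

def sums_of_squares(x, y):
    """Return (S_xy, S_xx, S_yy) using the computational formulas."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    s_xy = sum(a * b for a, b in zip(x, y)) - sum_x * sum_y / n
    s_xx = sum(a * a for a in x) - sum_x ** 2 / n
    s_yy = sum(b * b for b in y) - sum_y ** 2 / n
    return s_xy, s_xx, s_yy

def sample_r(x, y):
    """Sample correlation coefficient r = S_xy / sqrt(S_xx * S_yy)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    s_xy, s_xx, s_yy = sums_of_squares(x, y)
    return s_xy / math.sqrt(s_xx * s_yy)

# Made-up data for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

# Deviation form of S_xy, for comparison with the computational form.
x_bar, y_bar = sum(x) / len(x), sum(y) / len(y)
s_xy_dev = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))

print(sums_of_squares(x, y), s_xy_dev, round(sample_r(x, y), 3))
```

For these five pairs, S_xy = 6, S_xx = 10 and S_yy = 6, so r = 6/√60 ≈ 0.775.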

Test of Hypothesis About the Correlation Coefficient, $\rho$

1. State the null and the alternative hypotheses.
$H_0$: $\rho = 0$. There is no correlation between x and y.
$H_1$: a) $\rho > 0$. There is a significant positive correlation between x and y.
b) $\rho < 0$. There is a significant negative correlation between x and y.
c) $\rho \neq 0$. There is a significant correlation between x and y.

2. Test Statistic: Use the t-test at the $\alpha$ level of significance with $n - 2$ degrees of freedom.

3. Rejection Criterion:
a. Reject $H_0$ if $t_c > t_{\alpha}$.
b. Reject $H_0$ if $t_c < -t_{\alpha}$.
c. Reject $H_0$ if $|t_c| > t_{\alpha/2}$.

4. Computation for the test statistic:

$$t_c = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$$

where $t_c$ = computed t-statistic
r = correlation coefficient
n = sample size

5. Decision: State your decision based on the rejection criterion and the computed t-statistic.

6. Make a conclusion and interpret the results of the hypothesis test.
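Steps 3 and 4 can be sketched as two small Python functions. The names `correlation_t` and `reject_h0`, and the labels "greater", "less" and "two-sided" for alternatives a, b and c, are hypothetical. The critical value is passed in from a t table with n − 2 degrees of freedom (such as Appendix Table I) rather than computed, because Python's standard library has no t-distribution quantile function.

```python
import math

def correlation_t(r, n):
    """Computed t-statistic t_c = r * sqrt(n - 2) / sqrt(1 - r**2), with n - 2 df."""
    if n < 3:
        raise ValueError("need at least 3 pairs of observations")
    if abs(r) >= 1:
        raise ValueError("t_c is undefined when |r| = 1 (a perfect linear fit)")
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

def reject_h0(t_c, t_crit, alternative="two-sided"):
    """Apply rejection criterion a, b or c.

    t_crit comes from a t table with n - 2 df: t_alpha for the one-tailed
    alternatives, t_alpha/2 for the two-tailed one.
    """
    if alternative == "greater":      # a) H1: rho > 0
        return t_c > t_crit
    if alternative == "less":         # b) H1: rho < 0
        return t_c < -t_crit
    if alternative == "two-sided":    # c) H1: rho != 0
        return abs(t_c) > t_crit
    raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")

t_c = correlation_t(0.960, 10)
print(round(t_c, 3), reject_h0(t_c, 2.306))
```

With r = 0.960 and n = 10, `correlation_t` returns about 9.697, the same value obtained by hand in Example 1.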

Example 1: Compute and interpret the correlation coefficient for the following
grades of ten students selected at random. Is there a significant relationship
between mathematics and English grades?

Algebra grade   72  90  79  80  88  90  78  82  90  70
English grade   76  88  80  78  90  92  80  82  89  68


Solution:

Let x be the grade in Mathematics and y be the grade in English.

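Computations are presented in the table:

Student   x     y     xy      x²      y²
1         72    76    5472    5184    5776
2         90    88    7920    8100    7744
3         79    80    6320    6241    6400
4         80    78    6240    6400    6084
5         88    90    7920    7744    8100
6         90    92    8280    8100    8464
7         78    80    6240    6084    6400
8         82    82    6724    6724    6724
9         90    89    8010    8100    7921
10        70    68    4760    4900    4624
Total     819   823   67886   67577   68237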

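Solving for $r = \dfrac{S_{xy}}{\sqrt{S_{xx}\,S_{yy}}}$, where:

$$S_{xy} = \sum xy - \frac{(\sum x)(\sum y)}{n} = 67886 - \frac{(819)(823)}{10} = 482.3$$

$$S_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} = 67577 - \frac{(819)^2}{10} = 500.9$$

$$S_{yy} = \sum y^2 - \frac{(\sum y)^2}{n} = 68237 - \frac{(823)^2}{10} = 504.10$$

Therefore,

$$r = \frac{482.3}{\sqrt{(500.9)(504.10)}} = 0.960$$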

Based on the qualitative interpretation of r, the result indicates a strong positive linear relationship between the students' English and Mathematics grades. That is, students with higher Mathematics grades tend to have higher English grades.
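The hand computation can be reproduced in Python. This sketch simply recomputes the column totals and the quantities S_xy, S_xx, S_yy and r from the ten pairs of grades listed in Example 1; the variable names are illustrative.

```python
import math

# Example 1 data: x = Algebra (Mathematics) grade, y = English grade.
algebra = [72, 90, 79, 80, 88, 90, 78, 82, 90, 70]
english = [76, 88, 80, 78, 90, 92, 80, 82, 89, 68]
n = len(algebra)

# Column totals of the computation table.
sum_x, sum_y = sum(algebra), sum(english)
sum_xy = sum(x * y for x, y in zip(algebra, english))
sum_x2 = sum(x * x for x in algebra)
sum_y2 = sum(y * y for y in english)

# Computational formulas for the corrected sums.
s_xy = sum_xy - sum_x * sum_y / n
s_xx = sum_x2 - sum_x ** 2 / n
s_yy = sum_y2 - sum_y ** 2 / n
r = s_xy / math.sqrt(s_xx * s_yy)

print(f"S_xy = {s_xy:.1f}, S_xx = {s_xx:.1f}, S_yy = {s_yy:.1f}, r = {r:.3f}")
```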

Testing the Degree of Relationship Between Mathematics and English Grades

1. State the null and the alternative hypotheses.



$H_0$: $\rho = 0$. There is no correlation between the students' Mathematics grades and English grades.
$H_1$: $\rho \neq 0$. There is a significant degree of relationship between the students' Mathematics grades and English grades.

2. Test Statistic: Use the t-test at the $\alpha = 0.05$ level of significance.

3. Rejection Criterion:
Reject $H_0$ if $|t_c| > t_{\alpha/2}$ with $n - 2 = 8$ degrees of freedom, where $t_{0.025} = 2.306$ (refer to Appendix Table I).

4. Computation for the test statistic:

$$t_c = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} = \frac{0.960\sqrt{10-2}}{\sqrt{1-(0.960)^2}} = 9.697$$

5. Decision: Since $|t_c| = 9.697 > t_{0.025} = 2.306$, reject $H_0$.

6. Interpretation: The result reveals that there is a significant degree of relationship between the students' Math and English grades.
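Steps 1 to 6 for Example 1 can be run end to end as below. In this sketch the critical value 2.306 is hard-coded from the t table (df = 8), and r is kept unrounded. Because of that, t_c comes out near 9.67 rather than the 9.697 obtained above by first rounding r to 0.960; the decision to reject H0 is the same.

```python
import math

# Example 1 data: x = Algebra (Mathematics) grade, y = English grade.
algebra = [72, 90, 79, 80, 88, 90, 78, 82, 90, 70]
english = [76, 88, 80, 78, 90, 92, 80, 82, 89, 68]
n = len(algebra)

def sample_r(x, y):
    """Sample correlation coefficient r = S_xy / sqrt(S_xx * S_yy)."""
    m = len(x)
    s_xy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / m
    s_xx = sum(a * a for a in x) - sum(x) ** 2 / m
    s_yy = sum(b * b for b in y) - sum(y) ** 2 / m
    return s_xy / math.sqrt(s_xx * s_yy)

# Step 1: H0: rho = 0 versus H1: rho != 0 (two-tailed).
# Steps 2-3: alpha = 0.05, df = n - 2 = 8, critical value t_0.025 = 2.306 from the t table.
t_crit = 2.306

# Step 4: computed t-statistic, using the unrounded r.
r = sample_r(algebra, english)
t_c = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Step 5: decision.
decision = "reject H0" if abs(t_c) > t_crit else "fail to reject H0"
print(f"r = {r:.4f}, t_c = {t_c:.2f}, {decision}")
```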

The Coefficient of Determination

The overall measure of the adequacy of the regression equation is provided by the coefficient of determination, $r^2$.

• $r^2$ gives the proportion of the total variation in y that is accounted for by the independent variable x.
• $r^2$ ranges from 0 to 1, or from 0 to 100 if expressed in %.

The nearer the value is to 1 (or 100%), the better the fit of the regression line.

Illustration: In Example 1, r = 0.960, so the coefficient of determination is $r^2 = (0.960)^2 = 0.9216$. This tells us that 92.16% of the variation in English grade is explained by the variation in Mathematics grade.
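The reading of r² as the proportion of variation in y accounted for by x can be checked with the summary values of Example 1. This sketch assumes two standard least-squares results for the regression line covered in Lesson 2: the slope b1 = S_xy/S_xx and the unexplained (residual) variation SSE = S_yy − b1·S_xy, with S_yy as the total variation in y. The explained share (S_yy − SSE)/S_yy then equals r² exactly. With unrounded r it is about 0.9212; the 0.9216 above comes from squaring the rounded value 0.960.

```python
import math

# Summary quantities from Example 1 (x = Mathematics grade, y = English grade).
s_xy, s_xx, s_yy = 482.3, 500.9, 504.1

r = s_xy / math.sqrt(s_xx * s_yy)

# Assumed least-squares results (covered in Lesson 2):
b1 = s_xy / s_xx          # slope of the fitted line y-hat = b0 + b1 * x
sse = s_yy - b1 * s_xy    # variation in y NOT accounted for by the line
explained = s_yy - sse    # variation in y accounted for by x

share_from_r = r ** 2
share_from_fit = explained / s_yy   # s_yy is the total variation in y

print(f"r^2 = {share_from_r:.4f}; explained / total = {share_from_fit:.4f}")
```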


Application

1. Here are the numbers of hours that 10 Math 121 students spent studying for a final exam (x) and their scores on that exam (y).

a. Calculate the correlation coefficient r for these data.
b. At the 0.05 level of significance, test the claim that there is a significant correlation between the hours spent studying and the exam score.
c. Determine the coefficient of determination and interpret the result.

Closure
Well done! You have just finished Lesson 1 of this module. If any part of the lesson needs clarification, please ask your tutor during your face-to-face or online interactions. You may proceed to Lesson 2, which will introduce you to Simple Linear Regression.


References

Broto, A. S. (2007). Simplified Approach to Inferential Statistics (1st ed.). National. Philippines.

Carambas, Z. U. (2011). Basic Probability and Statistics. Valencia Educational Supply, Baguio City.

Peck, R., Olsen, C., & Devore, J. L. (2012). Introduction to Statistics and Data Analysis (4th ed.). Brooks/Cole, Cengage Learning, 20 Channel Center Street, Boston, MA 02210, USA.

Ott, R. L., & Longnecker, M. (2010). An Introduction to Statistical Methods and Data Analysis (6th ed.). Brooks/Cole, Cengage Learning, CA, USA.

Roussas, G. (2003). Introduction to Probability and Statistical Inference. Elsevier Science, USA.

Walpole, R. E., & Myers, R. H. (1993). Probability and Statistics for Engineers and Scientists (5th ed.). Macmillan Publishing Company, New York.

Weiss, N. A. (2012). Elementary Statistics (8th ed.). Addison-Wesley, Pearson Education, Inc., Boston, MA.

Woodbury, G. (2002). An Introduction to Statistics (1st ed.). Thomson Learning, Inc., USA.


Appendix Table I. Values of $t_{\alpha}$
