
Correlation Analysis and Regression 22

Correlation and regression analysis examine the relationship between two variables. Correlation coefficients range from -1 to 1 and indicate the strength and direction of the linear relationship between the variables. A value of 0 means no linear relationship, while values closer to 1 or -1 indicate a stronger positive or negative linear relationship. Regression analysis uses independent variables to predict or explain the values of dependent variables through linear functions or models.

Uploaded by

Allen Kurt Ramos

CORRELATION AND REGRESSION ANALYSIS
ARMANDO C. MANZANO
Correlation

◦ Correlation is concerned with the relationship between two variables. It measures the association, or strength of relationship, between two variables, say x and y.
◦ It describes the extent to which changes in one variable are accompanied by changes in the other variable.
◦ The coefficient of correlation, denoted by ρ (the Greek letter rho) for a population or r for a sample, measures the similarity of the changes in the values of x and y. Its range is
−1 ≤ r ≤ +1
◦ If y increases when x increases, r is positive. If y decreases when x increases, r is negative. If y has no linear relationship with x, then r = 0.
▪ Only concerned with the strength of the relationship
▪ No causal effect is implied

Bivariate data
Bivariate data are data sets in which each subject has two observations associated with it.
TYPES
POSITIVE CORRELATION – exists when high scores in one variable are associated with high scores in the second variable, or low scores in one variable are associated with low scores in the other.
NEGATIVE CORRELATION – exists when high scores in one variable are associated with low scores in the second, or vice versa.
ZERO CORRELATION – exists when the points on the scatter diagram are spread in a random manner.
PERFECT CORRELATION – exists when all points lie on a straight line.
THE STRENGTH OR DEGREE OF THE RELATIONSHIP IS BASED ON THE FOLLOWING RANGES OF THE CORRELATION COEFFICIENT:

Ranges of r          Degree/strength of relationship
±1.00                perfect relationship
±0.90 to ±0.99       very strong/very high
±0.70 to ±0.89       strong/high
±0.40 to ±0.69       moderate/substantial
±0.20 to ±0.39       weak/small
±0.01 to ±0.19       negligible/almost none
0                    no correlation
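The ranges above can be applied mechanically. The following is a minimal Python sketch, not part of the slides: the function name `describe_strength` is hypothetical, and it assumes that a value falling between two listed bands (for example 0.895) takes the label of the lower band.

```python
def describe_strength(r: float) -> str:
    """Return the verbal label from the table of ranges for a given r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    size = abs(r)  # the sign gives the direction, the size gives the strength
    if size == 1.0:
        return "perfect relationship"
    if size >= 0.90:
        return "very strong/very high"
    if size >= 0.70:
        return "strong/high"
    if size >= 0.40:
        return "moderate/substantial"
    if size >= 0.20:
        return "weak/small"
    if size > 0.0:
        # Assumption: anything above 0 but below 0.20 counts as negligible.
        return "negligible/almost none"
    return "no correlation"


print(describe_strength(0.886))   # strong/high
print(describe_strength(-0.95))   # very strong/very high
```

The sign of r is ignored here on purpose, since the table describes strength only; direction can be read from the sign separately.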
Scatter Diagram or Scatter Plot

◦ A scatter plot is used to show a rough estimate of the relationship between two variables.
Scatter Diagram…
◦ It appears that there is in fact a relationship: the greater the house size, the greater the selling price.
Patterns of Scatter Diagrams…
◦ Linearity and direction are the two concepts we are interested in.
[Figure: scatter diagrams showing a positive linear relationship, a negative linear relationship, and a weak or non-linear relationship]
SCATTER PLOT EXAMPLES
[Figure: scatter plots of y against x showing strong relationships, weak relationships, and no relationship]
CORRELATION COEFFICIENT

A descriptive measure usually denoted by r, which ranges from -1 to 1.
It measures the degree of relationship between two variables.
FEATURES OF R
Unit free
Ranges between -1 and 1
The closer to -1, the stronger the
negative linear relationship
The closer to 1, the stronger the
positive linear relationship
The closer to 0, the weaker the
linear relationship
Correlation Coefficient: Simple Definition, Formula,
Easy Steps

◦ Correlation coefficients are used to measure how strong a relationship is between two variables. There are several types of correlation coefficient, but the most popular is Pearson's. Pearson's correlation (also called Pearson's R) is a correlation coefficient commonly used in linear regression. If you're starting out in statistics, you'll probably learn about Pearson's R first. In fact, when anyone refers to the correlation coefficient, they are usually talking about Pearson's.
Pearson Product-Moment Correlation
◦ What does this test do?
◦ The Pearson product-moment correlation coefficient (or Pearson correlation coefficient, for short) is a measure of the strength of a linear association between two variables and is denoted by r. Basically, a Pearson product-moment correlation attempts to draw a line of best fit through the data of two variables, and the Pearson correlation coefficient, r, indicates how far away all these data points are from this line of best fit (i.e., how well the data points fit this new model/line of best fit).
What values can the Pearson correlation coefficient
take?

◦ The Pearson correlation coefficient, r, can take a range of values from +1 to -1. A value of 0 indicates that there is no linear association between the two variables. A value greater than 0 indicates a positive association; that is, as the value of one variable increases, so does the value of the other variable. A value less than 0 indicates a negative association; that is, as the value of one variable increases, the value of the other variable decreases. This is shown in the diagram below.
Formula
◦ Pearson Product Moment Coefficient of Correlation

$$ r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}} $$

◦ where:
r = Sample correlation coefficient
n = Sample size
x = Value of the independent variable
y = Value of the dependent variable
CALCULATION EXAMPLE
Tree height, y    Trunk diameter, x    xy      y²       x²
35                8                    280     1225     64
49                9                    441     2401     81
27                7                    189     729      49
33                6                    198     1089     36
60                13                   780     3600     169
21                7                    147     441      49
45                11                   495     2025     121
51                12                   612     2601     144
Σy = 321          Σx = 73              Σxy = 3142    Σy² = 14111    Σx² = 713
CALCULATION EXAMPLE

$$ r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}} = \frac{8(3142) - (73)(321)}{\sqrt{\left[8(713) - (73)^2\right]\left[8(14111) - (321)^2\right]}} = 0.886 $$

[Figure: scatter plot of tree height, y, against trunk diameter, x]
r = 0.886 → relatively strong positive
linear association between x and y
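The calculation above can be reproduced with a short Python sketch. The function name `pearson_r` and the variable names are illustrative choices, not part of the slides; the formula and the data are those of the tree example.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson r from the raw sums used in the computational formula."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        raise ValueError("r is undefined when x or y does not vary")
    return numerator / denominator


trunk_diameter = [8, 9, 7, 6, 13, 7, 11, 12]    # x
tree_height = [35, 49, 27, 33, 60, 21, 45, 51]  # y

r = pearson_r(trunk_diameter, tree_height)
print(round(r, 3))  # 0.886
```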
Example: The data below summarize ten pairs of midterm and final grades. Let us try to predict a student's final grade from a given midterm grade.
Let x = midterm grade
y = final grade

x    75  70  65  90  85  85  80  70  65  90
y    80  75  65  95  90  85  90  75  70  90
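One way to work this example is sketched below in Python. It assumes an ordinary least-squares line of the form y = a + bx, which the slides do not spell out, and computes r in the deviation form that is algebraically equivalent to the computational formula. All names (`predict_final`, `s_xy`, and so on) are illustrative.

```python
from math import sqrt

midterm = [75, 70, 65, 90, 85, 85, 80, 70, 65, 90]  # x
final = [80, 75, 65, 95, 90, 85, 90, 75, 70, 90]    # y

n = len(midterm)
mean_x = sum(midterm) / n
mean_y = sum(final) / n

# Sums of cross-products and squares of deviations from the means.
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(midterm, final))
s_xx = sum((x - mean_x) ** 2 for x in midterm)
s_yy = sum((y - mean_y) ** 2 for y in final)

r = s_xy / sqrt(s_xx * s_yy)          # correlation coefficient, about 0.949
slope = s_xy / s_xx                   # b in y = a + b x (least-squares assumption)
intercept = mean_y - slope * mean_x   # a in y = a + b x


def predict_final(midterm_grade):
    """Predicted final grade for a given midterm grade on the fitted line."""
    return intercept + slope * midterm_grade


print(round(r, 3))
print(round(predict_final(80), 1))
```

A prediction of this kind is only reasonable for midterm grades inside the observed range of 65 to 90.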
EXERCISE
Identify the type of correlation for each pair of variables

Temperature and air conditioning cost
School attendance and achievement
Investment period and interest earned
Weight and IQ
Temperature and ice cream sales
Age and agility
Amount of exercise and body weight
◦ Pearson product moment correlation coefficient
◦ Coefficient of determination = R squared
◦ Indicates the proportion of the variance in one variable that can be associated with the variance in the other variable.
COEFFICIENT OF DETERMINATION, R²

The coefficient of determination is the proportion of the total variation in the dependent variable that is explained by variation in the independent variable.
COEFFICIENT OF DETERMINATION, R²

Note: In the single independent variable case, the coefficient of determination is

$$ R^2 = r^2 $$

where:
R² = Coefficient of determination
r = Simple correlation coefficient
INTRODUCTION TO REGRESSION ANALYSIS

◦ Regression analysis is used to:
▪ Predict the value of a dependent variable based on the value of at least one independent variable
▪ Explain the impact of changes in an independent variable on the dependent variable
◦ Dependent variable: the variable we wish to explain
◦ Independent variable: the variable used to explain the dependent variable
SIMPLE LINEAR REGRESSION MODEL

◦ Only one independent variable, x
◦ Relationship between x and y is described by a linear function
◦ Changes in y are assumed to be caused by changes in x
TYPES OF REGRESSION MODELS
[Figure: four scatter plot panels showing a positive linear relationship, a negative linear relationship, a relationship that is not linear, and no relationship]
COEFFICIENT OF DETERMINATION, R²
(continued)
Coefficient of determination

$$ R^2 = \frac{SSR}{SST} = \frac{\text{sum of squares explained by regression}}{\text{total sum of squares}} $$

Note: In the single independent variable case, the coefficient of determination is

$$ R^2 = r^2 $$

where:
R² = Coefficient of determination
r = Simple correlation coefficient
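The note above can be checked numerically with a small Python sketch on the tree height and trunk diameter data. It assumes a least-squares line fitted to the data, takes SST as the sum of squared deviations of y from its mean, and takes SSR as the sum of squared deviations of the fitted values from that mean; these definitions and all variable names are assumptions made for illustration.

```python
from math import sqrt

x = [8, 9, 7, 6, 13, 7, 11, 12]        # trunk diameter
y = [35, 49, 27, 33, 60, 21, 45, 51]   # tree height

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
s_xx = sum((xi - mean_x) ** 2 for xi in x)

# Least-squares line (assumed fitting method).
slope = s_xy / s_xx
intercept = mean_y - slope * mean_x
fitted = [intercept + slope * xi for xi in x]

sst = sum((yi - mean_y) ** 2 for yi in y)        # total sum of squares
ssr = sum((fi - mean_y) ** 2 for fi in fitted)   # explained by regression

r_squared = ssr / sst
r = s_xy / sqrt(s_xx * sst)                      # simple correlation coefficient

print(round(r_squared, 3), round(r ** 2, 3))     # both about 0.785
```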
EXAMPLES OF APPROXIMATE R² VALUES

R² = 1
Perfect linear relationship between x and y: 100% of the variation in y is explained by variation in x.
[Figure: two plots of y against x, each labeled R² = 1]
EXAMPLES OF APPROXIMATE R² VALUES

0 < R² < 1
Weaker linear relationship between x and y: some but not all of the variation in y is explained by variation in x.
[Figure: two plots of y against x]
EXAMPLES OF APPROXIMATE R² VALUES

R² = 0
No linear relationship between x and y: the value of y does not depend on x. (None of the variation in y is explained by variation in x.)
[Figure: plot of y against x, labeled R² = 0]
EXAMPLE

◦ The relationship between the number of sales calls and the number of units sold is given by r = 0.759.
◦ The coefficient of determination is r squared = 0.576.
◦ This means that 57.6% of the variation in the number of units sold is explained, or accounted for, by the variation in the number of sales calls.
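As a quick check of the arithmetic in this example, squaring the given r (the only input taken from the slide) gives:

```latex
R^2 = r^2 = (0.759)^2 = 0.576081 \approx 0.576
```

Expressed as a percentage, 0.576 is the 57.6% quoted in the example.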
◦ Correlation is a measure of the linear relationship between two variables and does not mean there is a causal relationship between them.
◦ Example. (Explain why there is no causal relationship between the variables; other factors must have been the causes.)
▪ IQ level and onset of menstrual period among females
▪ Entrance test result and grades
REGRESSION ANALYSIS
The process of developing an equation,
Preliminaries
Regression equation
How well a regression line fits the data
R² = 1: perfect fit
R² = 0
0 < R² < 0.5: not a good fit