Lecture 9: Simple Linear Regression and Correlation (CSE303)

Correlation and Regression
Topics Covered:
◼ Is there a relationship in bivariate data (between x and y)?
◼ What is the strength of this relationship?
 Pearson's r
◼ Can we describe this relationship and use it to predict y from x?
 Regression
◼ Is the relationship we have described statistically significant?
 t-test
The relationship in bivariate data (between x and y)
◼ Correlation: is there a relationship between 2 variables?
◼ Regression: how well does an independent variable predict the dependent variable?
◼ CORRELATION ≠ CAUSATION
 In order to infer causality: manipulate the independent variable and observe the effect on the dependent variable
Scattergrams

[Three scatterplots of Y against X: positive correlation, negative correlation, no correlation]

Variance vs Covariance
◼ First, a note on your sample:
◼ If you wish to assume that your sample is representative of the general population (RANDOM EFFECTS MODEL), use the degrees of freedom (n – 1) in your calculations of variance or covariance.
◼ But if you simply want to describe your current sample (FIXED EFFECTS MODEL), substitute n for the degrees of freedom.
Variance vs Covariance
◼ Do two variables change together?

Variance:
• Gives information on the variability of a single variable:

  sx² = Σⁿᵢ₌₁ (xᵢ – x̄)² / (n – 1)

Covariance:
• Gives information on the degree to which two variables vary together.
• Note how similar the covariance is to variance: the equation simply multiplies x's error scores by y's error scores, as opposed to squaring x's error scores:

  cov(x, y) = Σⁿᵢ₌₁ (xᵢ – x̄)(yᵢ – ȳ) / (n – 1)
Covariance

  cov(x, y) = Σⁿᵢ₌₁ (xᵢ – x̄)(yᵢ – ȳ) / (n – 1)

◼ When X and Y increase together: cov(x, y) = pos.
◼ When X increases as Y decreases (or vice versa): cov(x, y) = neg.
◼ When there is no constant relationship: cov(x, y) = 0
Example Covariance

[Scatterplot of the five (x, y) points below]

  x    y    xᵢ – x̄   yᵢ – ȳ   (xᵢ – x̄)(yᵢ – ȳ)
  0    3    –3        0         0
  2    2    –1       –1         1
  3    4     0        1         0
  4    0     1       –3        –3
  6    6     3        3         9
  x̄ = 3   ȳ = 3               Σ = 7

  cov(x, y) = Σ(xᵢ – x̄)(yᵢ – ȳ) / (n – 1) = 7 / 4 = 1.75

What does this number tell us?

A positive covariance indicates that the two variables tend to move in the same
direction, while a negative covariance indicates that they tend to move in opposite
directions.
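
As a concrete check, here is a minimal Python sketch that reproduces the covariance of 1.75 from the example above (the variable names are mine, not from the lecture):

    # Sample covariance of the example data, using the n - 1 denominator
    x = [0, 2, 3, 4, 6]
    y = [3, 2, 4, 0, 6]

    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n

    # Multiply x's error scores by y's error scores, sum, and divide by n - 1
    cov_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
    print(cov_xy)  # 1.75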
Problem with Covariance:
◼ The value obtained by covariance is dependent on the size of
the data’s standard deviations: if large, the value will be
greater than if small… even if the relationship between x and y
is exactly the same in the large versus small standard
deviation datasets.
Example of how the covariance value relies on variance

High variance data                        Low variance data

Subject   x     y    x error × y error    x    y    x error × y error
1         101   100  2500                 54   53   9
2         81    80   900                  53   52   4
3         61    60   100                  52   51   1
4         51    50   0                    51   50   0
5         41    40   100                  50   49   1
6         21    20   900                  49   48   4
7         1     0    2500                 48   47   9
Mean      51    50                        51   50

Sum of x error × y error: 7000            Sum of x error × y error: 28

Covariance: 1166.67                       Covariance: 4.67
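
A short Python check of both datasets (the cov helper function is my own, not from any library):

    # Covariance depends on scale: same perfect relationship, very different values
    def cov(x, y):
        n = len(x)
        x_bar, y_bar = sum(x) / n, sum(y) / n
        return sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / (n - 1)

    high_x = [101, 81, 61, 51, 41, 21, 1]
    high_y = [100, 80, 60, 50, 40, 20, 0]
    low_x = [54, 53, 52, 51, 50, 49, 48]
    low_y = [53, 52, 51, 50, 49, 48, 47]

    print(round(cov(high_x, high_y), 2))  # 1166.67
    print(round(cov(low_x, low_y), 2))    # 4.67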


Solution: Pearson's r
◼ On its own, the covariance value does not tell us much, because it depends on the spread of the data

▪ Solution: standardise this measure

◼ Pearson's r standardises the covariance value.

◼ Divide the covariance by the product of the standard deviations of X and Y:

  rxy = cov(x, y) / (sx·sy)
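
Continuing the running example in Python, standardising the covariance of 1.75 gives r (statistics is in the standard library; its stdev uses the n – 1 denominator):

    import statistics

    x = [0, 2, 3, 4, 6]
    y = [3, 2, 4, 0, 6]
    n = len(x)

    x_bar, y_bar = statistics.mean(x), statistics.mean(y)
    cov_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / (n - 1)

    # Divide by the product of the standard deviations to standardise
    r = cov_xy / (statistics.stdev(x) * statistics.stdev(y))
    print(r)  # ≈ 0.35 for this data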
Pearson's r continued
◼ Sample correlation coefficient:

  r = Σ(xᵢ – x̄)(yᵢ – ȳ) / √[ Σ(xᵢ – x̄)² · Σ(yᵢ – ȳ)² ]

◼ Population correlation coefficient:

  ρ = cov(X, Y) / (σx·σy)

Linear Correlation Coefficient

The sign of r indicates the direction of the linear relationship between the variables; its magnitude indicates the strength.
• If r is near 1, the two variables have a strong positive linear relationship.
• If r is near 0, the two variables have no linear relationship.
• If r is near –1, the two variables have a strong negative linear relationship.
Limitations of r
◼ When r = 1 or r = –1:
 We can predict y from x with certainty
 all data points are on a straight line: y = ax + b
◼ r is actually r̂
 r = true r of the whole population
 r̂ = estimate of r based on the sample data
◼ r is very sensitive to extreme values:

[Scatterplot: a single extreme point can strongly distort r]
Pearson's r Example
Calculate the correlation coefficient of the given data. Here n = 5.

[The data table and the worked solution were presented as figures and are not reproduced here]
Regression
◼ Correlation tells you if there is an association
between x and y but it doesn’t describe the
relationship or allow you to predict one
variable from the other.

◼ To do this we need REGRESSION!


Best-fit Line
◼ The aim of linear regression is to fit a straight line, ŷ = ax + b, to data that gives the best prediction of y for any value of x

◼ This will be the line that minimises the distance between the data and the fitted line, i.e. the residuals

[Diagram: the fitted line ŷ = ax + b, where a = slope and b = intercept; points are the true values yᵢ, the line gives the predicted values ŷ, and ε marks the residual error between them]
Least Squares Regression
◼ To find the best line we must minimise the sum of the squares of the residuals (the vertical distances from the data points to our line)

  Model line: ŷ = ax + b        a = slope, b = intercept
  Residual (ε) = y – ŷ
  Sum of squares of residuals = Σ(y – ŷ)²

◼ We must find the values of a and b that minimise Σ(y – ŷ)²
Finding b
◼ First we find the value of b that gives the minimum sum of squares

[Diagram: candidate lines with the same slope but different intercepts b, each with residuals ε]

◼ Trying different values of b is equivalent to shifting the line up and down the scatter plot
Finding a
◼ Now we find the value of a that gives the minimum sum of squares

[Diagram: candidate lines with the same intercept b but different slopes a]

◼ Trying out different values of a is equivalent to changing the slope of the line, while b stays constant
Minimising sums of squares
◼ Need to minimise Σ(y – ŷ)²
◼ ŷ = ax + b
◼ So we need to minimise the sum of squares (S):

  S = Σ(y – ax – b)²

◼ If we plot the sums of squares for all different values of a and b we get a parabola, because it is a squared term

[Plot: S against values of a and b; the minimum S is where the gradient = 0]

◼ So the minimum sum of squares is at the bottom of the curve, where the gradient is zero.
The maths bit
◼ The minimum sum of squares is at the bottom of the curve, where the gradient = 0

◼ So we can find the a and b that give the minimum sum of squares by taking partial derivatives of Σ(y – ax – b)² with respect to a and b separately

◼ Then we set these derivatives to zero and solve for the values of a and b that give the minimum sum of squares
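
Filling in this step explicitly (a standard derivation; the slides jump straight to the result):

  ∂S/∂b = –2 Σ(y – ax – b) = 0   →   b = ȳ – a·x̄
  ∂S/∂a = –2 Σ x(y – ax – b) = 0   →   a = Σ(xᵢ – x̄)(yᵢ – ȳ) / Σ(xᵢ – x̄)²

Since Σ(xᵢ – x̄)(yᵢ – ȳ)/(n – 1) = cov(x, y) = r·sx·sy, this is the same as the a = r·sy/sx form given on the next slide.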
The solution
◼ Doing this gives the following equation for a:

  a = r·sy / sx        r = correlation coefficient of x and y
                       sy = standard deviation of y
                       sx = standard deviation of x

◼ From this you can see that:
▪ A low correlation coefficient gives a flatter slope (small value of a)
▪ A large spread of y, i.e. a high standard deviation, results in a steeper slope (high value of a)
▪ A large spread of x, i.e. a high standard deviation, results in a flatter slope (low value of a)
The solution cont.
◼ Our model equation is ŷ = ax + b
◼ This line must pass through the mean, so:

  ȳ = a·x̄ + b   →   b = ȳ – a·x̄

◼ We can put our equation for a into this, giving:

  b = ȳ – (r·sy / sx)·x̄       r = correlation coefficient of x and y
                              sy = standard deviation of y
                              sx = standard deviation of x

◼ The smaller the correlation, the closer the intercept is to the mean of y
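
A minimal Python sketch of these two formulas, again on the running example data (fit_line is my own helper name, not from the lecture):

    import statistics

    def fit_line(x, y):
        # Least-squares fit: slope a = r * sy / sx, intercept b = y_bar - a * x_bar
        n = len(x)
        x_bar, y_bar = statistics.mean(x), statistics.mean(y)
        sx, sy = statistics.stdev(x), statistics.stdev(y)
        cov_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
        r = cov_xy / (sx * sy)
        a = r * sy / sx           # slope
        b = y_bar - a * x_bar     # intercept: the line passes through (x_bar, y_bar)
        return a, b

    a, b = fit_line([0, 2, 3, 4, 6], [3, 2, 4, 0, 6])
    print(a, b)  # ≈ 0.35 and ≈ 1.95, so the fitted line is y_hat = 0.35x + 1.95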
Back to the model
◼ Substituting a and b into the model:

  ŷ = a·x + b = (r·sy/sx)·x + ȳ – (r·sy/sx)·x̄

  which rearranges to:   ŷ = (r·sy/sx)·(x – x̄) + ȳ

◼ If the correlation is zero, we will simply predict the mean of y for every value of x: our regression line is just a flat straight line at height ȳ

◼ But this isn't very useful.

◼ We can calculate the regression line for any data, but the important question is how well this line fits the data, i.e. how good it is at predicting y from x
How good is our model?
◼ Total variance of y:

  sy² = Σ(y – ȳ)² / (n – 1) = SSy / dfy

◼ Variance of the predicted y values (ŷ):

  sŷ² = Σ(ŷ – ȳ)² / (n – 1) = SSpred / dfŷ

  This is the variance explained by our regression model.

◼ Error variance:

  serror² = Σ(y – ŷ)² / (n – 2) = SSer / dfer

  This is the variance of the error between our predicted y values and the actual y values, and thus is the variance in y that is NOT explained by the regression model.
How good is our model? cont.
◼ Total variance = predicted variance + error variance

  sy² = sŷ² + ser²

◼ Conveniently, via some complicated rearranging:

  sŷ² = r²·sy²      so      r² = sŷ² / sy²

◼ So r² is the proportion of the variance in y that is explained by our regression model
How good is our model? cont.
◼ Insert r²·sy² into sy² = sŷ² + ser² and rearrange to get:

  ser² = sy² – r²·sy² = sy²·(1 – r²)

◼ From this we can see that the greater the correlation, the smaller the error variance, and so the better our prediction
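
A quick numerical check of this decomposition on the running example, in terms of the sums of squares from which the variances above are built (the exact identity SSy = SSpred + SSer is what underlies the variance version on the slide):

    import statistics

    x = [0, 2, 3, 4, 6]
    y = [3, 2, 4, 0, 6]
    y_bar = statistics.mean(y)

    # Fitted line from the earlier sketch: y_hat = 0.35x + 1.95
    a, b = 0.35, 1.95
    y_hat = [a * xi + b for xi in x]

    ss_total = sum((yi - y_bar) ** 2 for yi in y)               # SSy
    ss_pred = sum((yh - y_bar) ** 2 for yh in y_hat)            # SSpred
    ss_err = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))    # SSer

    print(round(ss_total, 2), round(ss_pred + ss_err, 2))  # 20.0 and 20.0
    print(round(ss_pred / ss_total, 4))  # 0.1225 = r**2, since r was 0.35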
Is the model significant?
◼ i.e. do we get a significantly better prediction of y from our regression equation than by just predicting the mean?

◼ F-statistic (after some complicated rearranging):

  F(dfŷ, dfer) = sŷ² / ser² = … = r²·(n – 2) / (1 – r²)

◼ And it follows that (because F = t²):

  t(n–2) = r·√(n – 2) / √(1 – r²)

  So all we need to know are r and n.
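
A sketch of this test in Python (assuming SciPy is available for the t distribution; r and n are taken from the running example, where the relationship is clearly not significant):

    from math import sqrt
    from scipy import stats

    r, n = 0.35, 5
    t = r * sqrt(n - 2) / sqrt(1 - r ** 2)   # t with n - 2 degrees of freedom

    # Two-tailed p-value from the t distribution
    p = 2 * stats.t.sf(abs(t), df=n - 2)
    print(t, p)  # t ≈ 0.65, p ≈ 0.56: no significant linear relationship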
General Linear Model
◼ Linear regression is actually a form of the General Linear Model, where the parameters are a, the slope of the line, and b, the intercept:

  y = ax + b + ε

◼ A General Linear Model is just any model that describes the data in terms of a straight line
Multiple regression
◼ Multiple regression is used to determine the effect of a number of independent variables, x₁, x₂, x₃ etc., on a single dependent variable, y
◼ The different x variables are combined in a linear way and each has its own regression coefficient:

  y = a₁x₁ + a₂x₂ + … + aₙxₙ + b + ε

◼ The a parameters reflect the independent contribution of each independent variable, x, to the value of the dependent variable, y.
◼ i.e. the amount of variance in y that is accounted for by each x variable after all the other x variables have been accounted for
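
A minimal NumPy sketch of fitting such a model by ordinary least squares (the data here is made up purely for illustration, constructed so that y = 2x₁ + 1x₂ + 1):

    import numpy as np

    # Two independent variables and one dependent variable (hypothetical data)
    x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0])
    y = np.array([5.0, 6.0, 11.0, 12.0, 17.0])

    # Design matrix: one column per x variable, plus a column of ones for the intercept b
    X = np.column_stack([x1, x2, np.ones_like(x1)])

    coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
    a1, a2, b = coeffs
    print(a1, a2, b)  # ≈ 2.0, 1.0, 1.0 for this constructed data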
Calculate the regression coefficient and obtain the lines of regression for the following data.

Solution:
[The data table (X, Y, XY columns) and the worked computation of the regression coefficient and regression equation of X on Y were presented as figures and are not reproduced here]
Calculate the regression coefficient and obtain the lines of regression for the following data.

Solution:
[The data table (X, Y, XY columns) and the worked computation of the regression coefficient of Y on X were presented as figures and are not reproduced here]

Regression equation of Y on X:

  Y = 0.929X – 3.716 + 11
    = 0.929X + 7.284

The regression equation of Y on X is Y = 0.929X + 7.284.
Calculate the two regression equations of X on Y and Y on X from the data given below, taking deviations from the actual means of X and Y.

Estimate the likely demand when the price is Rs. 20.

Solution:
[The data table and the worked derivations of the regression equation of X on Y and the regression equation of Y on X were presented as figures and are not reproduced here; the regression of Y on X comes out as Y = –0.25X + 44.25]

When X is 20, Y will be

  Y = –0.25(20) + 44.25
    = –5 + 44.25
    = 39.25

(When the price is Rs. 20, the likely demand is 39.25.)
The following table shows the sales and advertisement expenditure of a firm.

[The data table was presented as a figure and is not reproduced here]

Coefficient of correlation r = 0.9. Estimate the likely sales for a proposed advertisement expenditure of Rs. 10 crores.

Solution:
[The worked derivation was presented as a figure and is not reproduced here; the regression of X (sales) on Y (expenditure) comes out as X = 6Y + 4]

When advertisement expenditure is 10 crores, i.e. Y = 10, then sales X = 6(10) + 4 = 64, which implies the likely sales figure is 64.
The two regression lines are 3X+2Y=26 and 6X+3Y=31. Find the correlation coefficient.
Solution:
Let the regression equation of Y on X be 3X+2Y = 26
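The remainder of the worked solution appeared as a figure; reconstructing the standard steps:

  From 3X + 2Y = 26:  Y = 13 – (3/2)X,  so b_yx = –3/2
  The other line, 6X + 3Y = 31, is then X on Y:  X = 31/6 – (1/2)Y,  so b_xy = –1/2

  r² = b_yx · b_xy = (–3/2)(–1/2) = 3/4

  Both regression coefficients are negative, so r is negative:
  r = –√(3/4) ≈ –0.866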
In a laboratory experiment on correlation research study the equation of the two regression
lines were found to be 2X–Y+1=0 and 3X–2Y+7=0 . Find the means of X and Y. Also work out
the values of the regression coefficient and correlation between the two variables X and Y.
Solution:
Solving the two regression equations we get mean values of X and Y
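The worked figures appeared as images; reconstructing the standard steps:

  Solving 2X – Y + 1 = 0 and 3X – 2Y + 7 = 0 simultaneously:
  from the first, Y = 2X + 1; substituting into the second,
  3X – 2(2X + 1) + 7 = 0  →  –X + 5 = 0  →  X̄ = 5, Ȳ = 11

  Taking 3X – 2Y + 7 = 0 as the regression of Y on X:
  Y = (3/2)X + 7/2,  so b_yx = 3/2
  and 2X – Y + 1 = 0 as the regression of X on Y:
  X = (Y – 1)/2,  so b_xy = 1/2
  (The opposite assignment would give b_yx · b_xy = 4/3 > 1, which is impossible.)

  r² = b_yx · b_xy = 3/4, and since both coefficients are positive,
  r = +√(3/4) ≈ 0.866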
