Correlation and regression

The document discusses correlation and regression, focusing on the relationship between two variables, x and y, and how to predict y from x using regression analysis. It explains concepts such as Pearson's r, covariance, and the least squares method for finding the best-fit line in linear regression. Additionally, it touches on the significance of the model and introduces multiple regression as a method to analyze the effects of multiple independent variables on a single dependent variable.

Uploaded by

Meredith Chelsea R

Correlation and Regression
By R. Meredith Chelsea
Assistant Professor
PG & Research Department of International Business
SRCAS
Topics Covered:
• Is there a relationship between x and y?
• What is the strength of this relationship?
  - Pearson’s r
• Can we describe this relationship and use it to predict y from x?
  - Regression
• Is the relationship we have described statistically significant?
  - t test
• Relevance to SPM
  - GLM
The relationship between x and y
• Correlation: is there a relationship between two variables?
• Regression: how well does a given independent variable predict the dependent variable?
• CORRELATION ≠ CAUSATION
  - To infer causality, manipulate the independent variable and observe the effect on the dependent variable.
Scattergrams
[Figure: three scatterplots of Y against X, titled "Positive correlation", "Negative correlation" and "No correlation"]

Variance vs Covariance
• First, a note on your sample:
  - If you wish to assume that your sample is representative of the general population (RANDOM EFFECTS MODEL), use the degrees of freedom (n - 1) in your calculations of variance or covariance.
  - But if you simply want to assess your current sample (FIXED EFFECTS MODEL), substitute n for the degrees of freedom.
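The choice of denominator can be checked on a small example. The sketch below is only an illustration: the five values are hypothetical, and it uses Python's standard `statistics` module, where `variance` divides by n - 1 and `pvariance` divides by n.

```python
import statistics

# Hypothetical sample, chosen only for illustration.
sample = [0.0, 2.0, 3.0, 4.0, 6.0]

# Divide by n - 1: treats the data as a sample from a wider population.
var_n_minus_1 = statistics.variance(sample)

# Divide by n: describes only the data at hand.
var_n = statistics.pvariance(sample)

print(var_n_minus_1, var_n)
```

The sum of squared deviations is the same in both cases; only the divisor changes, so the n - 1 version is always the larger of the two.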
Variance vs Covariance
• Do two variables change together?

Variance:
• Gives information on the variability of a single variable.

$$s_x^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

Covariance:
• Gives information on the degree to which two variables vary together.
• Note how similar the covariance is to the variance: the equation simply multiplies x’s error scores by y’s error scores as opposed to squaring x’s error scores.

$$\operatorname{cov}(x, y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$$
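Covariance

$$\operatorname{cov}(x, y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$$

• When X and Y increase together: cov(x, y) is positive.
• When one of X and Y increases as the other decreases: cov(x, y) is negative.
• When there is no consistent relationship: cov(x, y) = 0.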
Example Covariance

   x     y     x_i - x̄    y_i - ȳ    (x_i - x̄)(y_i - ȳ)
   0     3       -3          0              0
   2     2       -1         -1              1
   3     4        0          1              0
   4     0        1         -3             -3
   6     6        3          3              9
 x̄ = 3  ȳ = 3                            Σ = 7

$$\operatorname{cov}(x, y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1} = \frac{7}{4} = 1.75$$

What does this number tell us?
[Figure: scatterplot of the five (x, y) points]
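The arithmetic in this example can be reproduced with a few lines of Python. This is a minimal sketch using the five (x, y) pairs from the table; the function name `covariance` is just an illustrative choice.

```python
x = [0, 2, 3, 4, 6]
y = [3, 2, 4, 0, 6]

def covariance(xs, ys):
    """Sample covariance with n - 1 in the denominator."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Multiply each x error score by the matching y error score.
    products = [(xi - mean_x) * (yi - mean_y) for xi, yi in zip(xs, ys)]
    return sum(products) / (n - 1)

cov_xy = covariance(x, y)
print(cov_xy)  # 1.75, the value on the slide
```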
Problem with Covariance:
• The value of the covariance depends on the size of the data’s standard deviations: if they are large, the covariance will be greater than if they are small, even if the relationship between x and y is exactly the same in the two datasets.
Example of how covariance value relies on variance

            High variance data                 Low variance data
Subject     x      y     x error * y error     x     y     x error * y error
1         101    100     2500                  54    53    9
2          81     80      900                  53    52    4
3          61     60      100                  52    51    1
4          51     50        0                  51    50    0
5          41     40      100                  50    49    1
6          21     20      900                  49    48    4
7           1      0     2500                  48    47    9
Mean       51     50                           51    50

Sum of x error * y error:   7000 (high variance)   28 (low variance)
Covariance:              1166.67 (high variance)   4.67 (low variance)


Solution: Pearson’s r
• On its own, the covariance value is hard to interpret, because it depends on the scale of the data.
• Solution: standardise this measure.
• Pearson’s r standardises the covariance value.
• It divides the covariance by the product of the standard deviations of X and Y:

$$r_{xy} = \frac{\operatorname{cov}(x, y)}{s_x s_y}$$
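To see what the standardisation buys us, the sketch below applies it to the high-variance and low-variance datasets from the preceding table. The helper name `cov_and_r` is an illustrative choice, not something from the slides.

```python
import math

def cov_and_r(xs, ys):
    """Return (sample covariance, Pearson's r) for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys)) / (n - 1)
    s_x = math.sqrt(sum((a - mean_x) ** 2 for a in xs) / (n - 1))
    s_y = math.sqrt(sum((b - mean_y) ** 2 for b in ys) / (n - 1))
    return cov, cov / (s_x * s_y)

# High-variance and low-variance datasets from the table.
high_x = [101, 81, 61, 51, 41, 21, 1]
high_y = [100, 80, 60, 50, 40, 20, 0]
low_x = [54, 53, 52, 51, 50, 49, 48]
low_y = [53, 52, 51, 50, 49, 48, 47]

cov_high, r_high = cov_and_r(high_x, high_y)
cov_low, r_low = cov_and_r(low_x, low_y)
print(round(cov_high, 2), round(r_high, 6))
print(round(cov_low, 2), round(r_low, 6))
```

The two covariances differ by a factor of 250, yet r is the same (up to floating-point rounding) for both datasets, because in each one y is x minus 1.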
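Pearson’s r continued

$$\operatorname{cov}(x, y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$$

$$r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{(n - 1)\, s_x s_y}$$

$$r_{xy} = \frac{\sum_{i=1}^{n} Z_{x_i} Z_{y_i}}{n - 1}$$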
Limitations of r
• When r = 1 or r = -1:
  - We can predict y from x with certainty
  - All data points are on a straight line: y = ax + b
• r is actually r̂
  - r = true r of the whole population
  - r̂ = estimate of r based on the data
• r is very sensitive to extreme values.
[Figure: scatterplot accompanying this point]
Regression
• Correlation tells you whether there is an association between x and y, but it doesn’t describe the relationship or allow you to predict one variable from the other.
• To do this we need REGRESSION!

Best-fit Line
• The aim of linear regression is to fit a straight line, ŷ = ax + b, to the data that gives the best prediction of y for any value of x.
• This will be the line that minimises the distance between the data and the fitted line, i.e. the residuals.
• In ŷ = ax + b, a is the slope and b is the intercept.
[Figure: scatterplot with the fitted line ŷ = ax + b; ŷ = predicted value, y_i = true value, ε = residual error]
Least Squares Regression
• To find the best line we must minimise the sum of the squares of the residuals (the vertical distances from the data points to our line).

Model line: ŷ = ax + b, where a = slope and b = intercept
Residual (ε) = y - ŷ
Sum of squares of residuals = Σ(y - ŷ)²

• We must find the values of a and b that minimise Σ(y - ŷ)².
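As a rough illustration of the quantity being minimised, the sketch below computes the sum of squared residuals for two candidate lines. It reuses the five (x, y) pairs from the covariance example; the two candidate lines and the function name are assumptions made for the illustration.

```python
# Data reused from the covariance example earlier in the slides.
x = [0, 2, 3, 4, 6]
y = [3, 2, 4, 0, 6]

def sum_squared_residuals(xs, ys, a, b):
    """Sum of (y - y_hat)^2 for the line y_hat = a*x + b."""
    return sum((yi - (a * xi + b)) ** 2 for xi, yi in zip(xs, ys))

# Candidate 1: an arbitrary guess, y_hat = x.
ssr_guess = sum_squared_residuals(x, y, a=1.0, b=0.0)

# Candidate 2: the least-squares line for these data, y_hat = 0.35x + 1.95.
ssr_fit = sum_squared_residuals(x, y, a=0.35, b=1.95)

print(ssr_guess, ssr_fit)
```

The least-squares line has the smaller sum of squares (about 17.55 against 26), which is exactly the sense in which it is the "best" line.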
Finding b
• First we find the value of b that gives the minimum sum of squares.
• Trying different values of b is equivalent to shifting the line up and down the scatter plot.
[Figure: the same line drawn at three different intercepts b, with residuals ε marked]
Finding a
• Now we find the value of a that gives the minimum sum of squares.
• Trying out different values of a is equivalent to changing the slope of the line, while b stays constant.
[Figure: three lines with different slopes through the same intercept b]
Minimising sums of squares
• We need to minimise Σ(y - ŷ)²
• ŷ = ax + b
• So we need to minimise the sum of squares S = Σ(y - ax - b)²
• If we plot the sum of squares against different values of a and b, we get a parabola in each parameter (a bowl-shaped surface over both), because it is a squared term.
• So the minimum sum of squares is at the bottom of the curve, where the gradient is zero.
[Figure: sum of squares (S) plotted against values of a and b; min S is at the point where the gradient = 0]
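One way to get a feel for this curve is a brute-force search: evaluate S for many combinations of a and b and keep the smallest. The sketch below does this for the five (x, y) pairs from the covariance example. The grid limits and step size are arbitrary assumptions, and a real analysis would use the closed-form solution rather than a search.

```python
# Data reused from the covariance example earlier in the slides.
x = [0, 2, 3, 4, 6]
y = [3, 2, 4, 0, 6]

def sum_of_squares(a, b):
    """S = sum of (y - a*x - b)^2 over all data points."""
    return sum((yi - a * xi - b) ** 2 for xi, yi in zip(x, y))

# Hypothetical search grid: slopes from -1.00 to 2.00 and intercepts
# from 0.00 to 4.00, both in steps of 0.05.
candidates = [(i / 20, j / 20) for i in range(-20, 41) for j in range(0, 81)]

# Keep the (a, b) pair with the smallest sum of squares.
best_a, best_b = min(candidates, key=lambda ab: sum_of_squares(*ab))
print(best_a, best_b, sum_of_squares(best_a, best_b))
```

Every other grid point gives a larger S, so moving away from the best pair in any direction climbs the sides of the bowl.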
The maths bit
• The minimum sum of squares is at the bottom of the curve, where the gradient = 0.
• So we can find the a and b that give the minimum sum of squares by taking partial derivatives of Σ(y - ax - b)² with respect to a and b separately.
• Then we set these derivatives equal to 0 and solve, which gives us the values of a and b that give the minimum sum of squares.
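For readers who want to see the steps, here is one way the derivation can be written out. It is a sketch of the standard least-squares argument rather than a transcript of the original slides; the symbols S, \bar{x}, \bar{y}, s_x, s_y and r follow the definitions used elsewhere in this deck.

```latex
S = \sum_{i=1}^{n} (y_i - a x_i - b)^2

% Set both partial derivatives to zero.
\frac{\partial S}{\partial b} = -2 \sum_{i=1}^{n} (y_i - a x_i - b) = 0
\quad\Rightarrow\quad b = \bar{y} - a \bar{x}

\frac{\partial S}{\partial a} = -2 \sum_{i=1}^{n} x_i (y_i - a x_i - b) = 0

% Substitute b = \bar{y} - a\bar{x} into the second equation and solve for a.
a = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}
  = \frac{\operatorname{cov}(x, y)}{s_x^2}
  = \frac{r\, s_y}{s_x}
```

The last step uses r = cov(x, y) / (s_x s_y), so cov(x, y) = r s_x s_y.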
The solution
• Doing this gives equations for a and b. For a:

$$a = \frac{r\, s_y}{s_x}$$

r = correlation coefficient of x and y
s_y = standard deviation of y
s_x = standard deviation of x

• From this you can see that:
  - A low correlation coefficient gives a flatter slope (small value of a)
  - A large spread of y, i.e. a high standard deviation, results in a steeper slope (high value of a)
  - A large spread of x, i.e. a high standard deviation, results in a flatter slope (small value of a)
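The effect of each spread on the slope can be checked numerically. The sketch below reuses the five (x, y) pairs from the covariance example and stretches y or x by a factor of 2, which doubles that variable's standard deviation while leaving r unchanged. The function name `slope` is an illustrative choice.

```python
import math

def slope(xs, ys):
    """Least-squares slope computed as a = r * s_y / s_x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    ss_x = sum((v - mean_x) ** 2 for v in xs)
    ss_y = sum((v - mean_y) ** 2 for v in ys)
    sp_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    s_x = math.sqrt(ss_x / (n - 1))
    s_y = math.sqrt(ss_y / (n - 1))
    r = sp_xy / math.sqrt(ss_x * ss_y)
    return r * s_y / s_x

x = [0, 2, 3, 4, 6]
y = [3, 2, 4, 0, 6]

base = slope(x, y)                       # original data
wider_y = slope(x, [2 * v for v in y])   # spread of y doubled
wider_x = slope([2 * v for v in x], y)   # spread of x doubled
print(base, wider_y, wider_x)
```

Doubling the spread of y doubles the slope, while doubling the spread of x halves it.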
The solution cont.
• Our model equation is ŷ = ax + b
• This line must pass through the point of means (x̄, ȳ), so:

$$\bar{y} = a\bar{x} + b \quad\Rightarrow\quad b = \bar{y} - a\bar{x}$$

• We can put our equation for a into this, giving:

$$b = \bar{y} - \frac{r\, s_y}{s_x}\,\bar{x}$$

r = correlation coefficient of x and y
s_y = standard deviation of y
s_x = standard deviation of x

• The smaller the correlation, the closer the intercept is to the mean of y.
Back to the model

$$\hat{y} = ax + b = \frac{r\, s_y}{s_x}\, x + \bar{y} - \frac{r\, s_y}{s_x}\, \bar{x}$$

Rearranges to:

$$\hat{y} = \frac{r\, s_y}{s_x}\,(x - \bar{x}) + \bar{y}$$

• If the correlation is zero, we will simply predict the mean of y for every value of x, and our regression line is just a flat straight line crossing the y-axis at ȳ.
• But this isn’t very useful.
• We can calculate the regression line for any data, but the important question is how well this line fits the data, or how good it is at predicting y from x.
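Two consequences of the rearranged equation can be checked directly: the line always passes through the point of means, and with zero correlation every prediction equals the mean of y. The sketch below is illustrative only. The first dataset is the covariance example from earlier in the slides, and the second is a hypothetical three-point dataset constructed so that its correlation is exactly zero.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y_hat = a*x + b."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sp_xy = sum((p - mean_x) * (q - mean_y) for p, q in zip(xs, ys))
    ss_x = sum((p - mean_x) ** 2 for p in xs)
    a = sp_xy / ss_x           # algebraically equal to r * s_y / s_x
    b = mean_y - a * mean_x    # forces the line through the point of means
    return a, b

# Covariance example data: mean of x is 3, mean of y is 3.
a, b = fit_line([0, 2, 3, 4, 6], [3, 2, 4, 0, 6])
prediction_at_mean_x = a * 3 + b

# Hypothetical data with zero correlation: the fitted line is flat.
flat_a, flat_b = fit_line([1, 2, 3], [2, 4, 2])
flat_predictions = [flat_a * v + flat_b for v in (0, 10, 100)]
print(prediction_at_mean_x, flat_a, flat_predictions)
```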
How good is our model?
• Total variance of y:

$$s_y^2 = \frac{\sum (y - \bar{y})^2}{n - 1} = \frac{SS_y}{df_y}$$

• Variance of predicted y values (ŷ). This is the variance explained by our regression model:

$$s_{\hat{y}}^2 = \frac{\sum (\hat{y} - \bar{y})^2}{n - 1} = \frac{SS_{pred}}{df_{\hat{y}}}$$

• Error variance. This is the variance of the error between our predicted y values and the actual y values, and thus is the variance in y that is NOT explained by the regression model:

$$s_{error}^2 = \frac{\sum (y - \hat{y})^2}{n - 2} = \frac{SS_{er}}{df_{er}}$$
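How good is our model cont.
• Total variance = predicted variance + error variance

$$s_y^2 = s_{\hat{y}}^2 + s_{er}^2$$

• Strictly, this split is exact for the sums of squares: SS_y = SS_pred + SS_er.
• Conveniently, via some complicated rearranging:

$$s_{\hat{y}}^2 = r^2 s_y^2 \quad\Rightarrow\quad r^2 = \frac{s_{\hat{y}}^2}{s_y^2}$$

• So r² is the proportion of the variance in y that is explained by our regression model.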
How good is our model cont.
• Insert r² s_y² into s_y² = s_ŷ² + s_er² and rearrange to get:

$$s_{er}^2 = s_y^2 - r^2 s_y^2 = s_y^2 (1 - r^2)$$

• From this we can see that the greater the correlation, the smaller the error variance, and so the better our prediction.
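These relationships can be verified numerically. The sketch below works with sums of squares, where the split into explained and unexplained parts is exact, and reuses the five (x, y) pairs from the covariance example. The variable names are illustrative.

```python
# Data reused from the covariance example earlier in the slides.
x = [0, 2, 3, 4, 6]
y = [3, 2, 4, 0, 6]
n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n

ss_x = sum((xi - mean_x) ** 2 for xi in x)
sp_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

# Least-squares line and its predictions.
a = sp_xy / ss_x
b = mean_y - a * mean_x
y_hat = [a * xi + b for xi in x]

ss_total = sum((yi - mean_y) ** 2 for yi in y)            # SS_y
ss_pred = sum((yh - mean_y) ** 2 for yh in y_hat)         # explained by the model
ss_error = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # left unexplained

r_squared = sp_xy ** 2 / (ss_x * ss_total)
print(ss_total, ss_pred, ss_error, r_squared)
```

For these data r² is 0.1225, so the line explains about 12% of the variation in y and leaves about 88% unexplained.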
Is the model significant?
• i.e. do we get a significantly better prediction of y from our regression equation than by just predicting the mean?
• F-statistic (after some complicated rearranging):

$$F_{(df_{\hat{y}},\, df_{er})} = \frac{s_{\hat{y}}^2}{s_{er}^2} = \dots = \frac{r^2 (n - 2)}{1 - r^2}$$

• And it follows (because F = t²) that:

$$t_{(n-2)} = \frac{r \sqrt{n - 2}}{\sqrt{1 - r^2}}$$

• So all we need to know are r and n.
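Since the test statistic depends only on r and n, it is easy to compute. The sketch below uses t = r√(n - 2) / √(1 - r²) and F = r²(n - 2) / (1 - r²), which is the form consistent with F = t². The values r = 0.35 and n = 5 come from the covariance example earlier in the slides; the function names are illustrative. Turning t into a p-value would additionally need the t distribution with n - 2 degrees of freedom, which is not computed here.

```python
import math

def regression_t(r, n):
    """t statistic with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

def regression_f(r, n):
    """F statistic with (1, n - 2) degrees of freedom."""
    return r ** 2 * (n - 2) / (1 - r ** 2)

r, n = 0.35, 5
t_value = regression_t(r, n)
f_value = regression_f(r, n)
print(round(t_value, 4), round(f_value, 4))
```

Squaring the t value reproduces the F value, as the slide states.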
General Linear Model
• Linear regression is actually a form of the General Linear Model where the parameters are a, the slope of the line, and b, the intercept:
y = ax + b + ε
• A General Linear Model is just any model that describes the data in terms of a straight line.
Multiple regression
• Multiple regression is used to determine the effect of a number of independent variables, x1, x2, x3 etc., on a single dependent variable, y.
• The different x variables are combined in a linear way and each has its own regression coefficient:

$$y = a_1 x_1 + a_2 x_2 + \dots + a_n x_n + b + \varepsilon$$

• The a parameters reflect the independent contribution of each independent variable, x, to the value of the dependent variable, y, i.e. the amount of variance in y that is accounted for by each x variable after all the other x variables have been accounted for.
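A small worked example may help make the coefficients concrete. The sketch below fits y = a1·x1 + a2·x2 + b by solving the normal equations in plain Python. Everything in it is an assumption for illustration: the data are hypothetical and noise-free (generated from y = 2·x1 + 3·x2 + 1), the function names are invented, and the simple solver does not guard against perfectly collinear predictors.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix * unknowns = rhs by Gauss-Jordan elimination with pivoting."""
    n = len(rhs)
    # Augmented matrix [matrix | rhs], copied so the inputs are not modified.
    aug = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        # Swap in the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Eliminate this column from every other row.
        for row in range(n):
            if row != col:
                factor = aug[row][col] / aug[col][col]
                aug[row] = [v - factor * p for v, p in zip(aug[row], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def fit_multiple_regression(rows, ys):
    """Least-squares coefficients [a1, ..., ak, b] for y = a1*x1 + ... + ak*xk + b."""
    design = [list(row) + [1.0] for row in rows]  # trailing 1 carries the intercept b
    k = len(design[0])
    # Normal equations: (X^T X) coefficients = X^T y
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yv for d, yv in zip(design, ys)) for i in range(k)]
    return solve_linear_system(xtx, xty)

# Hypothetical, noise-free data generated from y = 2*x1 + 3*x2 + 1.
rows = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1), (1, 3)]
ys = [2 * x1 + 3 * x2 + 1 for x1, x2 in rows]

a1, a2, b = fit_multiple_regression(rows, ys)
print(round(a1, 6), round(a2, 6), round(b, 6))
```

Because the data contain no noise, the fit recovers the generating coefficients: each a describes how y changes with its own x while the other x is held fixed.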
SPM
• Linear regression is a GLM that models the effect of ONE independent variable, x, on ONE dependent variable, y.
• Multiple regression models the effect of several independent variables, x1, x2 etc., on ONE dependent variable, y.
• Both are types of General Linear Model.
• GLM can also allow you to analyse the effects of several independent x variables on several dependent variables, y1, y2, y3 etc., in a linear combination.
• This is what SPM does, and all will be explained next week!
