
CORRELATION AND REGRESSION

Session 17 - 24
OVERVIEW

• Regression and correlation analyses are based on the relationship, or association,
between two (or more) variables. The known variable (or variables) is called the
independent variable(s). The variable we are trying to predict is the dependent
variable.
• Regression analysis provides a “best-fit” mathematical equation for the values of
the two variables.
• The equation may be linear (a straight line) or curvilinear, but we will be
concentrating on the linear type.
• Correlation analysis measures the strength of the relationship between the
variables.
• Let the two variables be y and x. These are called the dependent (y) and
independent (x) variables, since a typical purpose for this type of analysis is to
estimate or predict what y will be for a given value of x.
• Relationships can be direct where the dependent variable increases as the independent
variable increases.
• Relationships can also be inverse rather than direct. In these cases, the dependent variable
decreases as the independent variable increases.
• There can be a causal relationship between variables; that is, the
independent variable “causes” the dependent variable to change.

• This is the case in some situations, but in many cases other factors cause
the changes in both the dependent and the independent variables.

• For this reason, it is important that you consider the relationships found by
regression to be relationships of association but not necessarily of cause
and effect.
Scatter Diagrams
• A scatter diagram can give us two types of information.
• Visually, we can look for patterns that indicate that the variables are related.
• Then, if the variables are related, we can see what kind of line, or
estimating equation, describes this relationship.
• If the relationship shown by the data points is well described by a straight
line, we say that the relationship is linear.
• The relationship between the X and Y variables can also take the form of a
curve. Statisticians call such a relationship curvilinear.
ESTIMATION USING THE
REGRESSION LINE

• The equation for a straight line where the dependent variable Y is
determined by the independent variable X is:

Ŷ = a + bX        (Equation 12-1)

where a is the Y-intercept and b is the slope of the line.
• Suppose we know that a is 3 and b is 2. Let us determine what Y would be for an X equal to 5.
Substituting the values of a, b, and X into Equation 12-1, we find the corresponding value of Y to be
Ŷ = 3 + 2(5) = 13.
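As a minimal illustration, the substitution above can be written as a small Python function; the values a = 3, b = 2, and X = 5 are the ones from the worked example.

# Evaluate the estimating equation Y-hat = a + b*X for given a, b, and X.
def estimate_y(a: float, b: float, x: float) -> float:
    return a + b * x

print(estimate_y(a=3, b=2, x=5))  # prints 13, matching the worked example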
USING THE ESTIMATION EQUATION FOR A STRAIGHT LINE
FINDING THE VALUES FOR a AND b

• The value of a (the Y-intercept) can be found by locating the point where the line crosses the Y-axis.
• The value of b (the slope) can be found by taking any two points (X₁, Y₁) and (X₂, Y₂) on the line and using:

b = (Y₂ − Y₁) / (X₂ − X₁)
THE METHOD OF LEAST SQUARES
• How can we fit a line mathematically if none of the points lies on the line?
• The line will have a good fit, if it minimizes the error between the estimated points on the line and the
actual observed points that were used to draw it.
• The points that lie on the estimating line are represented as Ŷ (read “Y-hat”).
THE LEAST-SQUARES CRITERION

• The least-squares criterion requires that the sum of the squared deviations between the Y values in the
scatter diagram and the Ŷ values predicted by the equation be minimized. In symbolic terms:

minimize Σ(Yᵢ − Ŷᵢ)²
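To make the criterion concrete, here is a minimal Python sketch (assuming NumPy and SciPy are available; the x and y arrays are made-up illustration values, not data from the slides). It minimizes the sum of squared deviations numerically and compares the result with the closed-form least-squares slope and intercept given on the next slide.

# Least-squares criterion: choose a and b to minimize SSE = sum((y_i - (a + b*x_i))**2).
import numpy as np
from scipy.optimize import minimize

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

def sse(params):
    a, b = params
    return np.sum((y - (a + b * x)) ** 2)

result = minimize(sse, x0=[0.0, 0.0])                        # numerical minimization of SSE
b_closed = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)    # closed-form slope
a_closed = y.mean() - b_closed * x.mean()                    # closed-form intercept
print(result.x, (a_closed, b_closed))                        # both give (approximately) the same line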
LEAST SQUARES REGRESSION LINE
DETERMINING THE LEAST-SQUARES
REGRESSION LINE

yᵢ = a₁ + b₁xᵢ + eᵢ        xᵢ = a₂ + b₂yᵢ + eᵢ

b (y on x) (= b₁) = cov(x, y) / σx²        b (x on y) (= b₂) = cov(x, y) / σy²

b (y on x) = (Σxᵢyᵢ − n x̄ ȳ) / (Σxᵢ² − n x̄²)        b (x on y) = (Σxᵢyᵢ − n x̄ ȳ) / (Σyᵢ² − n ȳ²)
[Figure: Scatter diagram and least-squares regression line]
EXAMPLE: LEAST SQUARE METHOD

Year    Sales (Crores)
2015    76
2016    80
2017    130
2018    144
2019    138
2020    120
2021    174
2022    190
EXAMPLE: LEAST SQUARE METHOD

Year (tᵢ)    Sales Yᵢ (Crores)    Xᵢ = tᵢ − t̄    XᵢYᵢ     Xᵢ²
2015         76                   -3.5           -266     12.25
2016         80                   -2.5           -200      6.25
2017         130                  -1.5           -195      2.25
2018         144                  -0.5            -72      0.25
2019         138                   0.5             69      0.25
2020         120                   1.5            180      2.25
2021         174                   2.5            435      6.25
2022         190                   3.5            665     12.25
t̄ = 2018.5                        ΣXᵢ = 0        ΣXᵢYᵢ = 616    ΣXᵢ² = 42
EXAMPLE: LEAST SQUARE METHOD

• Solution: Ŷᵢ = a + bXᵢ = 131.5 + 14.67 Xᵢ
• b = ΣXᵢYᵢ / ΣXᵢ² = 616 / 42 = 14.67
• a = Ȳ = 131.5
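A short Python sketch reproducing this example (assuming NumPy is available); the year coding Xᵢ = tᵢ − t̄ follows the table above.

# Reproduce the least-squares trend for the sales example: Y-hat = 131.5 + 14.67 * X.
import numpy as np

years = np.arange(2015, 2023)                   # 2015 .. 2022
sales = np.array([76, 80, 130, 144, 138, 120, 174, 190], dtype=float)

x = years - years.mean()                        # deviations -3.5, -2.5, ..., 3.5
b = np.sum(x * sales) / np.sum(x ** 2)          # slope = 616 / 42 = 14.67
a = sales.mean()                                # intercept = Y-bar = 131.5 (since sum of x is 0)
print(f"Y-hat = {a:.1f} + {b:.2f} * X")         # Y-hat = 131.5 + 14.67 * X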
EXAMPLE: STOCK PRICE DATA

Month     Stock Price (00)
Oct-21    4.8
Nov-21    4.1
Dec-21    6.0
Jan-22    6.5
Feb-22    5.8
Mar-22    5.2
Apr-22    6.8
May-22    7.4
Jun-22    6.0
Jul-22    5.6
Aug-22    7.5
Sep-22    7.8
Oct-22    6.3
Nov-22    5.9
Dec-22    8.0
Jan-23    8.4
REGRESSION EQUATION

The regression equation is:

Stock price = 4.8525 + 0.17985 × t

where t is the time period, coded t = 1 for Oct-21 through t = 16 for Jan-23.
R, R², AND ADJUSTED R²
MEASURES OF ASSOCIATION

• The sample covariance measures the strength of the linear relationship between two
variables (called bivariate data)

• The sample covariance:

cov(X, Y) = Σᵢ₌₁ⁿ (Xᵢ − X̄)(Yᵢ − Ȳ) / (n − 1)
• Only concerned with the strength of the relationship
• No causal effect is implied.
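A minimal Python sketch of the sample covariance formula above; the two arrays are made-up illustration values, not data from the slides.

# Sample covariance: cov(X, Y) = sum((X_i - X_bar)(Y_i - Y_bar)) / (n - 1).
import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
y = np.array([1.5, 3.1, 4.9, 7.2, 8.8])

cov_manual = np.sum((x - x.mean()) * (y - y.mean())) / (len(x) - 1)
cov_numpy = np.cov(x, y, ddof=1)[0, 1]          # same quantity via NumPy's covariance matrix
print(cov_manual, cov_numpy)                    # the two values agree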
INTERPRETING COVARIANCE

• Covariance between two random variables:

 cov(X, Y) > 0: X and Y tend to move in the same direction
 cov(X, Y) < 0: X and Y tend to move in opposite directions
 cov(X, Y) = 0: X and Y have no linear relationship (they are uncorrelated, though not necessarily independent)

• It is not possible to determine the relative strength of the relationship
from the size of the covariance, since it depends on the units of X and Y.
CORRELATION

• Correlation refers to the sympathetic movement of variables, either in the
same or in opposite directions.
• Measures the relative strength of the linear relationship between two
variables.
• Simple correlation deals with co-variation of two variables
• Multiple and partial correlations involve a study of co-variation between
more than two variables.
• The relationship between variables is established and measured
quantitatively with a view to making estimates based on them.
CORRELATION

• Correlation between variables may be of varying degrees: from perfect at one
extreme down to high, moderate, low, and no correlation at the other.
• Correlation may be linear or non-linear.
• Graphically, correlation is studied by means of a scatter diagram.
• If the dots representing pairs of data values fall on a straight line, the
correlation is perfect. The degree of correlation decreases as the points lie
farther and farther away from the line. An upward movement of the points with a
rightward movement along the horizontal axis indicates positive correlation, while
a downward movement indicates negative correlation. Widely scattered dots with no
clear direction, or dots in a line parallel to either axis, mean an absence of
correlation.
KARL PEARSON’S COEFFICIENT OF
CORRELATION

• Numerically, the correlation is measured and expressed in terms of Karl
Pearson's coefficient of correlation.
• It is defined as the ratio of covariance to the product of standard
deviations of the two series involved.
• Its sign indicates the direction, and its magnitude measures the degree
of correlation.
• The coefficient of correlation varies between ±1
• It is independent of the change of origin and scale.
COEFFICIENT OF CORRELATION

• Sample coefficient of correlation:

R = cov(X, Y) / (Sx · Sy)

• where

cov(X, Y) = Σᵢ₌₁ⁿ (Xᵢ − X̄)(Yᵢ − Ȳ) / (n − 1)

Sx = √[ Σᵢ₌₁ⁿ (Xᵢ − X̄)² / (n − 1) ]        Sy = √[ Σᵢ₌₁ⁿ (Yᵢ − Ȳ)² / (n − 1) ]
FEATURES OF CORRELATION
COEFFICIENT, R

• Unit free
• Ranges between –1 and 1
• The closer to –1, the stronger the negative linear
relationship
• The closer to 1, the stronger the positive linear relationship
• Equal to 1, perfect correlation
• Equal to 0, no correlation
For a given series of paired data, the following information is available:
Covariance between X and Y series = -17.8
Standard deviation of X series = 6.6
Standard deviation of Y series = 4.2
No. of pairs of observations = 20
Calculate the coefficient of correlation.

r = cov(X, Y) / (Sx · Sy) = −17.8 / (6.6 × 4.2) = −17.8 / 27.72 = −0.642
Thus, the variables are negatively correlated.
RANK CORRELATION

• Rank correlation is calculated essentially where the variables under
consideration cannot be quantified, being measured on an ordinal scale.
• However, it can be calculated even where the variables are objectively
quantifiable.
• This is done by ranking the given data on the basis of the values involved.
• Like Karl Pearson's coefficient of correlation, the rank correlation
coefficient also varies between ±1.
• The presence of extreme observations in the data does not distort the
value of the rank correlation coefficient.
COEFFICIENT OF RANK CORRELATION

COMPARISON OF THE RANKS OF FIVE STUDENTS

GENERATING INFORMATION TO COMPUTE THE RANK-CORRELATION COEFFICIENT
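The rank tables from the slides are not reproduced here, but the standard rank-correlation (Spearman) formula is rₛ = 1 − 6Σd² / (n(n² − 1)), where d is the difference between the paired ranks and n is the number of pairs. A minimal Python sketch follows; the two rank lists are made-up illustration values, not the five-student comparison from the slides.

# Spearman rank correlation: r_s = 1 - 6*sum(d^2) / (n*(n^2 - 1)) (formula for untied ranks).
import numpy as np

rank_x = np.array([1, 2, 3, 4, 5])              # e.g. ranks given by one judge
rank_y = np.array([2, 1, 4, 3, 5])              # e.g. ranks given by a second judge

d = rank_x - rank_y                             # rank differences
n = len(rank_x)
r_s = 1 - 6 * np.sum(d ** 2) / (n * (n ** 2 - 1))
print(r_s)                                      # 0.8 for this illustration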
COEFFICIENT OF DETERMINATION, R²

• The coefficient of determination is the portion of the total variation in the dependent
variable that is explained by variation in the independent variable.
• The coefficient of determination is also called r-squared and is denoted as R².

R² = SSR / SST = regression sum of squares / total sum of squares

0 ≤ R² ≤ 1
EXAMPLES OF APPROXIMATE R² VALUE

[Figure: scatter plots of Y against X with all points lying exactly on a line]

• R² = 1
• Perfect linear relationship between X and Y:
• 100% of the variation in Y is explained by variation in X
EXAMPLES OF APPROXIMATE R² VALUE

[Figure: scatter plots of Y against X with points loosely clustered around a line]

• 0 < R² < 1
• Weaker linear relationships between X and Y:
• Some but not all of the variation in Y is explained by variation in X
EXAMPLES OF APPROXIMATE R² VALUE

[Figure: scatter plot of Y against X with no pattern]

• R² = 0
• No linear relationship between X and Y.
• The value of Y does not depend on X. (None of the variation in Y is
explained by variation in X.)
ADJUSTED R²

• R-squared increases every time you add an independent variable to the model,
but adjusted R-squared does not always increase.
• The adjusted R-squared value actually decreases when the added term
doesn’t improve the model fit by a sufficient amount.
• It shows how well a regression model makes predictions.

Adjusted R² = 1 − [((1 − R²) × (n − 1)) / (n − k − 1)]
where
n – number of points in your data set
k – number of independent variables in the model, excluding the constant
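A minimal Python sketch of the adjusted R-squared formula above; the R², n, and k values are made-up illustration numbers, not from the slides.

# Adjusted R^2 = 1 - ((1 - R^2) * (n - 1)) / (n - k - 1).
def adjusted_r_squared(r_squared: float, n: int, k: int) -> float:
    return 1 - ((1 - r_squared) * (n - 1)) / (n - k - 1)

print(adjusted_r_squared(r_squared=0.90, n=30, k=3))  # ~0.888, slightly below the plain R^2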
POINT ESTIMATES USING THE
REGRESSION LINE

• Making point estimates based on the regression line is simply a matter of substituting a
known or assumed value of x into the equation, then calculating the estimated value of
y.
• For example, if a job applicant were to score x = 15 on the manual dexterity test, we
would predict this person would be capable of producing 64.2 units per hour on the
assembly line.
DEGREES OF FREEDOM

One independent variable: DF = 1
Total DF = N − 1

In linear regression, the degrees of freedom refer to the number of independent
observations available for estimating the parameters of the regression model.
DOF IN SIMPLE LINEAR REGRESSION

In linear regression, the degrees of freedom refer to the number of independent
observations available for estimating the parameters of the regression model.

Total DOF is N − 1.
Independent variables have k DOF.
DOF for error is N − k − 1.


MEASURES OF VARIATION

SSR = Regression Sum of Squares
SSE = Error Sum of Squares
SST = Total Sum of Squares

MS = SS / DF        F = MSR / MSE
MEASURES OF VARIATION

• Total variation is made up of two parts:

SST = SSR + SSE
(Total Sum of Squares = Regression Sum of Squares + Error Sum of Squares)

SST = Σ(Yᵢ − Ȳ)²        SSR = Σ(Ŷᵢ − Ȳ)²        SSE = Σ(Yᵢ − Ŷᵢ)²

where
Ȳ  = mean value of the dependent variable
Yᵢ = observed value of the dependent variable
Ŷᵢ = predicted value of Y for the given Xᵢ value
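A minimal Python sketch of this decomposition; the x and y arrays are made-up illustration values. It fits the least-squares line, computes SST, SSR, and SSE, and confirms that SST = SSR + SSE and that SSR/SST equals R².

# Decompose the total variation into explained (SSR) and unexplained (SSE) parts.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a = y.mean() - b * x.mean()
y_hat = a + b * x                               # predicted values on the regression line

sst = np.sum((y - y.mean()) ** 2)               # total variation
ssr = np.sum((y_hat - y.mean()) ** 2)           # explained variation
sse = np.sum((y - y_hat) ** 2)                  # unexplained variation
print(sst, ssr + sse, ssr / sst)                # SST equals SSR + SSE; SSR/SST is R-squared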


MEASURES OF VARIATION

[Figure: the deviations at a point Xᵢ shown around the regression line — SSE = Σ(Yᵢ − Ŷᵢ)², SST = Σ(Yᵢ − Ȳ)², SSR = Σ(Ŷᵢ − Ȳ)²]
Measures of Variation
• SST = total sum of squares (Total Variation)
• Measures the variation of the Yᵢ values around their mean Ȳ

• SSR = regression sum of squares (Explained Variation)


• Variation attributable to the relationship between X and Y

• SSE = error sum of squares (Unexplained Variation)


• Variation in Y attributable to factors other than X
