MEFall2023 5

The document provides an overview of correlation and regression, focusing on the measurement of the relationship between random variables using Pearson's coefficient of correlation. It explains the concepts of linear correlation, regression models, and the methods for calculating regression coefficients. Additionally, it includes examples and exercises to illustrate the application of these statistical concepts.

Uploaded by Muhammad Ibrahim

Corr & Reg

Probability and Random Variables


The math, the computation, and examples.

Prof. Dr. Asad Ali

Department of Applied Mathematics and Statistics


Institute of Space Technology
Islamabad, Pakistan

Chapter 5: Correlation & Regression

Correlation:
In statistics, "correlation" is a tool which measures the degree, or strength, of the relationship between two or more random variables. Two variables are said to be correlated if a change in one of them is accompanied by a change in the other. Correlation is denoted by 'r' for sample data and by 'ρ' (rho) for population data. There are many types of correlation, such as linear, quadratic, exponential, etc. We will be concerned only with linear correlation. Different types of linear correlation between X and Y are depicted in the following scatter plots.

Given n pairs of observations (x₁, y₁), (x₂, y₂), ..., (xₙ, yₙ) taken on two rvs X and Y, their linear correlation is defined as

r = Σ(X − X̄)(Y − Ȳ) / √( Σ(X − X̄)² Σ(Y − Ȳ)² )

This is called Pearson's coefficient of correlation.

For computational purposes (calculator) the above formula can be rewritten as

r = [ΣXY − (ΣX)(ΣY)/n] / √( [ΣX² − (ΣX)²/n] [ΣY² − (ΣY)²/n] )

Or take 1/n as common from the numerator and denominator and cancel it, to give

r = [n ΣXY − ΣX ΣY] / √( [n ΣX² − (ΣX)²] [n ΣY² − (ΣY)²] )

Use whichever you find easier to remember.
A few properties to remember:
−1 ≤ r ≤ +1, i.e. 0 ≤ |r| ≤ 1.
The magnitude of r indicates the strength of the relationship, whereas the sign indicates the direction of the relationship.
r = −1 indicates a perfect negative linear relationship and r = +1 a perfect positive one. This happens when X and Y are exact linear functions of each other, e.g. X = 2Y.
The correlation coefficient is a symmetric quantity, i.e. if you interchange the places of the two variables it remains the same: r_xy = r_yx.
The correlation coefficient is a pure number, independent of the units of measurement. That is, if X is measured in km and Y in kg, they can still be correlated.
The correlation is independent of changes in origin and scale. For example, if you replace X by U = (X − μ_X)/σ_X and Y by V = (Y − μ_Y)/σ_Y, then r_XY = r_UV.
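The symmetry and invariance properties above are easy to verify numerically. Below is a minimal Python sketch of Pearson's r; the data values are made up purely for illustration:

```python
import math

def pearson_r(x, y):
    """Pearson's linear correlation coefficient of paired samples x and y."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    syy = sum((b - ybar) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

x = [1.0, 2.0, 3.0, 4.0, 5.0]          # illustrative data
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r_xy = pearson_r(x, y)
r_yx = pearson_r(y, x)                  # symmetry: r_xy = r_yx
u = [(a - 3.0) / 1.5 for a in x]        # change of origin and scale on X
r_uv = pearson_r(u, y)                  # invariance: r_uv = r_xy
print(round(r_xy, 4))
```

Any change of origin and scale with a positive scale factor, as in the standardization U = (X − μ_X)/σ_X, leaves r unchanged.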

Example 68.
Find the Pearson’s linear correlation coefficient of the following data.
X 2.4 3.4 4.6 3.7 2.2 3.3 4.0 2.1
Y 1.33 2.12 1.80 1.65 2.00 1.76 2.11 1.63
Solution:
The Pearson's linear correlation coefficient is given by

r = [ΣXY − (ΣX)(ΣY)/n] / √( [ΣX² − (ΣX)²/n] [ΣY² − (ΣY)²/n] )
To get the required quantities we construct the following table.

s.no   X      Y      X²      Y²        XY
1      2.4    1.33   5.76    1.7689    3.192
2      3.4    2.12   11.56   4.4944    7.208
3      4.6    1.80   21.16   3.2400    8.280
4      3.7    1.65   13.69   2.7225    6.105
5      2.2    2.00   4.84    4.0000    4.400
6      3.3    1.76   10.89   3.0976    5.808
7      4.0    2.11   16.00   4.4521    8.440
8      2.1    1.63   4.41    2.6569    3.423
Σ      25.7   14.4   88.31   26.4324   46.856
Now putting the values in the above formula:

r = [46.856 − (25.7)(14.4)/8] / √( [88.31 − (25.7)²/8] [26.4324 − (14.4)²/8] )
  = 0.5960 / √( (5.75)(0.5124) )
  = 0.3473

This indicates a weak correlation between the two variables.
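The arithmetic of Example 68 can be checked with a short script; it evaluates both computational forms of r on the same eight pairs:

```python
import math

X = [2.4, 3.4, 4.6, 3.7, 2.2, 3.3, 4.0, 2.1]
Y = [1.33, 2.12, 1.80, 1.65, 2.00, 1.76, 2.11, 1.63]
n = len(X)

sx, sy = sum(X), sum(Y)
sxx = sum(x * x for x in X)
syy = sum(y * y for y in Y)
sxy = sum(x * y for x, y in zip(X, Y))

# Calculator form (divide the sums by n inside each bracket)
r1 = (sxy - sx * sy / n) / math.sqrt((sxx - sx ** 2 / n) * (syy - sy ** 2 / n))
# n-multiplied form (the 1/n factors cancelled)
r2 = (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
print(round(r1, 4), round(r2, 4))   # both ≈ 0.3473, matching the slide
```

The two forms are algebraically identical, so they agree to floating-point precision.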


The strength and significance of the correlation
The following general categories give a quick way of interpreting a calculated r value:
0.0 to 0.2: very weak to negligible correlation
0.2 to 0.4: weak, low correlation (not very significant)
0.4 to 0.7: moderate correlation
0.7 to 0.9: strong, high correlation
0.9 to 1.0: very strong correlation
These interpretations apply under both ± signs, since the sign merely indicates the direction of the relationship.


Regression
We often want to predict the values of one variable based on the knowledge of other variable(s).
In general, for this purpose we use certain mathematical models in which one variable depends in
one or more ways on one or more (independent) variables. These mathematical models can either
be deterministic or probabilistic. The deterministic models are those in which for each value of one
(the independent) variable there is a fixed value of the other (dependent) variable. For example,
consider the Celsius-Fahrenheit model

F = 32 + (9/5)C

For C = 37, F always equals 98.6. Obviously, for a given value of C there is a fixed
value of F , so it is a deterministic model. However, in most problems the relationship of variables
is not deterministic. For example, for a given human age, there is no fixed human body weight.
Different people with exactly the same date (and even time) of birth can have different weights.
Thus, the weight here is a random variable as we can’t predict its value for a given age. To predict
the weight corresponding to a given age in the face of uncertainty, we need a probabilistic model.
Regression provides those mathematical models. A simple linear regression model consists of a
linear deterministic model Yi = a + bXi of two variables X and Y plus a random error term ei :
Yᵢ = a + bXᵢ + eᵢ,   i = 1, 2, ..., n

where the constants a and b represent the intercept and slope, respectively, of the resulting regression line. This linear equation models the dependency of Y on X in a probabilistic manner and is called the simple linear regression model. The word simple means that there are only two variables, X and Y.
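The deterministic/probabilistic distinction can be made concrete with a small sketch. The weight-age coefficients and error variance below are invented purely for illustration, not estimates from real data:

```python
import random

def c_to_f(c):
    # Deterministic model: each value of C gives exactly one F, every time.
    return 32 + 9 * c / 5

def weight_given_age(age, a=3.0, b=1.9, sigma=4.0):
    # Hypothetical probabilistic model Y = a + b*X + e with e ~ N(0, sigma^2);
    # a, b and sigma are made-up values used only to illustrate the idea.
    return a + b * age + random.gauss(0, sigma)

print(c_to_f(37))                                  # always 98.6
random.seed(42)
print(weight_given_age(20), weight_given_age(20))  # two different values for the same age
```

Repeated calls to the deterministic model always return the same value; repeated calls to the probabilistic model scatter around the line a + b·age.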

The dependent variable Y is also called the response variable or the regressand. Similarly, the independent variable X is also called the explanatory variable, predictor variable, or regressor. The error term e, also called the residual, is what introduces randomness into this model and is assumed normally distributed with mean zero and variance σ², i.e. e ∼ N(0, σ²). A typical regression line overlaid on the scatter plot of the (X, Y) data points is shown in the following scatter plot.

What we actually need is to estimate a and b from the given values of X and Y to get an estimate of the above linear model, that is, Ŷᵢ = â + b̂Xᵢ. The residual eᵢ = Yᵢ − Ŷᵢ is then the difference between the observed and estimated responses.

Now, how do we calculate â and b̂?
The first thing we need is b̂, which can be calculated as

b̂ = Σ(X − X̄)(Y − Ȳ) / Σ(X − X̄)²

This is almost the same formula as that for the correlation, except that in the denominator the term Σ(Y − Ȳ)² and the square root are removed. The computationally convenient forms are

b̂ = [ΣXY − (ΣX)(ΣY)/n] / [ΣX² − (ΣX)²/n]   or   b̂ = [n ΣXY − ΣX ΣY] / [n ΣX² − (ΣX)²]

The intercept â can then be calculated easily, as follows:

â = Ȳ − b̂X̄

In reality, the formulas for the coefficients â and b̂ are established using the method of least squares, in which we choose those values of a and b that minimize the sum of the squared differences (the residuals) between the observed responses Yᵢ and the estimated responses Ŷᵢ, that is, Σ(Yᵢ − Ŷᵢ)² = Σe²ᵢ. Put simply, we choose the values of a and b that deviate as little as possible from their true values.
Note: You can also denote the coefficients a and b by α and β and their estimates by α̂ and β̂.

The least squares estimators (LSE) of a and b
As we said before, we choose those values of a and b that give the smallest sum of squared residuals, i.e. that minimize Σe²ᵢ.
The sum of squared residuals is given as

Σe²ᵢ = Σ(Yᵢ − Ŷᵢ)² = Σ(Yᵢ − â − b̂Xᵢ)²   ∵ Ŷᵢ = â + b̂Xᵢ

Taking the derivative with respect to a and equating it to zero gives

(d/da) Σe²ᵢ = 2 Σ(Yᵢ − â − b̂Xᵢ)(−1) = 0

Simplifying, we get

ΣYᵢ = nâ + b̂ ΣXᵢ   (1)

Similarly, differentiating with respect to b and equating to zero gives

(d/db) Σe²ᵢ = 2 Σ(Yᵢ − â − b̂Xᵢ)(−Xᵢ) = 0

Simplifying, we get

ΣXᵢYᵢ = â ΣXᵢ + b̂ ΣXᵢ²   (2)
Solving equations (1) and (2) simultaneously for â and b̂ gives the LSE of a and b:

b̂ = [ΣXY − (ΣX)(ΣY)/n] / [ΣX² − (ΣX)²/n]

and

â = [ΣY ΣX² − ΣX ΣXY] / [n ΣX² − (ΣX)²]

But in practice, to estimate a (i.e. to calculate â) we use the following easier formula:

â = Ȳ − b̂X̄

Note: Calculation (or computation) and estimation are two different things. For example, multiplying 7 by 8 is a calculation; there is no specific rule or reference used in calculations. Whereas we estimate a quantity (or a parameter) by using a certain rule. For example, we estimate the population mean (µ) by the sample mean (X̄).
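The least-squares formulas above translate directly into code; a minimal sketch:

```python
def least_squares(X, Y):
    """Least-squares estimates (a_hat, b_hat) for the model Y_i = a + b*X_i + e_i."""
    n = len(X)
    sx, sy = sum(X), sum(Y)
    sxy = sum(x * y for x, y in zip(X, Y))
    sxx = sum(x * x for x in X)
    # Slope from the n-multiplied computational form
    b_hat = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    # Intercept from the easier formula a_hat = Ybar - b_hat * Xbar
    a_hat = sy / n - b_hat * sx / n
    return a_hat, b_hat

# Sanity check on exact data Y = 3 + 2X (all residuals are zero):
print(least_squares([0.0, 1.0, 2.0, 3.0], [3.0, 5.0, 7.0, 9.0]))   # → (3.0, 2.0)
```

On data generated exactly from a line, the estimates recover the true intercept and slope, which is a quick way to test any implementation.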

Example 69.
Fit a linear regression line to the data in Example 68, using Y as the response variable (or regress Y on X).
Solution:
When Y is taken as response, our estimated linear regression model is
Ŷi = â + b̂Xi

Now b̂ is given as:

b̂ = [ΣXY − (ΣX)(ΣY)/n] / [ΣX² − (ΣX)²/n]
  = [46.856 − (25.7)(14.4)/8] / [88.31 − (25.7)²/8]
  = 0.1037

Also, X̄ = 3.2125 and Ȳ = 1.8. Therefore,

â = Ȳ − b̂X̄ = 1.8 − (0.1037)(3.2125) = 1.4669

Hence, the estimated regression line is

Ŷi = 1.4669 + 0.1037Xi



Putting the values of the explanatory variable X in the estimated equation we get the estimated
responses Ŷ .
Ŷi 1.7158 1.8195 1.9439 1.8506 1.6950 1.8091 1.8817 1.6847
If we plot the observed response Y and the estimated response Ŷ, both against the explanatory variable X, we get the following graph.

This straight line enables us to interpret how X and Y vary together, and also helps in predicting the values of Y corresponding to any other (missing, past, future) values of X.
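The fit of Example 69, including the row of fitted values, can be reproduced in a few lines:

```python
X = [2.4, 3.4, 4.6, 3.7, 2.2, 3.3, 4.0, 2.1]
Y = [1.33, 2.12, 1.80, 1.65, 2.00, 1.76, 2.11, 1.63]
n = len(X)

sx, sy = sum(X), sum(Y)
sxy = sum(x * y for x, y in zip(X, Y))
sxx = sum(x * x for x in X)

b_hat = (sxy - sx * sy / n) / (sxx - sx ** 2 / n)   # slope
a_hat = sy / n - b_hat * sx / n                     # intercept: Ybar - b_hat * Xbar
y_hat = [a_hat + b_hat * x for x in X]              # fitted responses

print(round(a_hat, 4), round(b_hat, 4))             # ≈ 1.4669  0.1037
print([round(v, 4) for v in y_hat])
```

A least-squares fit always makes the residuals sum to zero, which gives a quick built-in sanity check on the computed coefficients.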
Do Exercises 10.14(b) 10.14(c), 10.15(b), 10.6, 10.7, 10.8, 10.9(a and b parts).
Class Quiz:
What is the difference between correlation and regression (google it). Exercises 10.17 and 10.7.