7.1 - Motivation - Correlation & Regression

Ismor Fischer, 5/29/2012 7.1-1

7. Correlation and Regression

7.1 Motivation
POPULATION

Random Variables X, Y: numerical (Contrast with § 6.3.1.)

How can the association between X and Y (if any exists) be
1) characterized and measured?
2) mathematically modeled via an equation, i.e., Y = f(X)?

Recall:
$\mu_X = \mathrm{Mean}(X) = E[X]$, $\mu_Y = \mathrm{Mean}(Y) = E[Y]$
$\sigma_X^2 = \mathrm{Var}(X) = E[(X - \mu_X)^2]$, $\sigma_Y^2 = \mathrm{Var}(Y) = E[(Y - \mu_Y)^2]$

Definition: Population Covariance of X, Y

$\sigma_{XY} = \mathrm{Cov}(X, Y) = E[(X - \mu_X)(Y - \mu_Y)]$
Equivalently,* $\sigma_{XY} = E[XY] - \mu_X \mu_Y$

SAMPLE, size n
Recall:
$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$, $\bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i$

$s_x^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$, $s_y^2 = \frac{1}{n-1} \sum_{i=1}^{n} (y_i - \bar{y})^2$

Definition: Sample Covariance of X, Y

$s_{xy} = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$

Note: Whereas $s_x^2 \ge 0$ and $s_y^2 \ge 0$, $s_{xy}$ is unrestricted in sign.

*Exercise: Algebraically expand the expression $(X - \mu_X)(Y - \mu_Y)$, and use the properties of
mathematical expectation given in 3.1. This motivates an alternate formula for $s_{xy}$.
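
As a hedged sketch of where the exercise leads (the steps are still worth working through by hand), expanding the product and applying linearity of expectation gives the population identity. The same algebra applied to sums suggests one common form of the alternate sample formula, shown on the last line:

```latex
\begin{align*}
E[(X - \mu_X)(Y - \mu_Y)] &= E[XY - \mu_Y X - \mu_X Y + \mu_X \mu_Y] \\
  &= E[XY] - \mu_Y E[X] - \mu_X E[Y] + \mu_X \mu_Y \\
  &= E[XY] - \mu_X \mu_Y, \\[4pt]
s_{xy} &= \frac{1}{n-1} \left( \sum_{i=1}^{n} x_i y_i - n \, \bar{x} \, \bar{y} \right).
\end{align*}
```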

For the sake of simplicity, let us assume that the predictor variable X is
nonrandom (i.e., deterministic), and that the response variable Y is random.
(The subsequent techniques can, however, be extended to random X as well.)
Example: X = fat (grams), Y = cholesterol level (mg/dL)
Suppose the following sample of n = 5 data pairs (i.e., points) is obtained and
graphed in a scatterplot, along with some accompanying summary statistics:

                          Data (n = 5)               Mean              Variance
X = fat (g)               60   70   80   90   100    $\bar{x} = 80$    $s_x^2 = 250$
Y = cholesterol (mg/dL)   210  200  220  280  290    $\bar{y} = 240$   $s_y^2 = 1750$

Sample Covariance

$$s_{xy} = \frac{1}{5-1} \left[ (60-80)(210-240) + (70-80)(200-240) + (80-80)(220-240) + (90-80)(280-240) + (100-80)(290-240) \right] = 600$$
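
The summary statistics and the covariance above can be checked numerically. The following is a minimal sketch using only Python's standard library; the helper name `sample_covariance` is an illustrative assumption and does not come from the notes.

```python
from statistics import mean, variance

# Data from the fat vs. cholesterol example.
x = [60, 70, 80, 90, 100]        # X = fat (grams)
y = [210, 200, 220, 280, 290]    # Y = cholesterol level (mg/dL)


def sample_covariance(xs, ys):
    """Sample covariance with the n - 1 divisor, as in the definition of s_xy."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(xs, ys)) / (n - 1)


x_bar, y_bar = mean(x), mean(y)          # sample means
s_x2, s_y2 = variance(x), variance(y)    # sample variances (n - 1 divisor)
s_xy = sample_covariance(x, y)           # sample covariance

print(x_bar, y_bar, s_x2, s_y2, s_xy)
```

The printed values should agree with the table and the hand computation: means 80 and 240, variances 250 and 1750, and covariance 600.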

As the name implies, the variance measures the extent to which a single variable
varies (about its mean). Similarly, the covariance measures the extent to which
two variables vary (about their individual means), with respect to each other.

Ideally, if there is no association of any kind between two variables X and Y (as
in the case where they are independent), then a scatterplot would reveal no
organized structure, and covariance = 0; e.g., X = adult head size, Y = IQ.
Clearly, in a case such as this, the variable X is not a good predictor of the
response Y. Likewise, if the variables X = age, Y = body temperature (°F) are
measured in a group of healthy individuals, then the resulting scatterplot would
consist of data points that are very nearly lined up horizontally (i.e., zero slope),
reflecting a constant mean response value of Y = 98.6°F, regardless of age X.
Here again, covariance = 0 (or nearly so); X is not a good predictor of the
response Y. See figures.∗

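[Figures: scatterplot of Y = IQ score vs. X = Head Circumference, showing no organized structure; scatterplot of Y = Body Temp (°F) vs. X = Age, with points lying near the horizontal line Y = 98.6.]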

However, in the preceding “fat vs. cholesterol” example, there is a clear
“positive trend” exhibited in the scatterplot. Overall, it seems that as X
increases, Y increases, and conversely, as X decreases, Y decreases. The simplest
mathematical object that has this property is a straight line with positive slope,
and so a linear description can be used to capture such “first-order” properties of
the association between X and Y. The two questions we now ask are…

1) How can we measure the strength of the linear association between X and Y?
Answer: Linear Correlation Coefficient

2) How can we model the linear association between X and Y, essentially via an
equation of the form Y = mX + b?
Answer: Simple Linear Regression
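
As a hedged preview of both answers for the fat vs. cholesterol data, the sketch below uses the standard textbook definitions of the linear correlation coefficient and the least-squares line. These formulas are an assumption here: this section only names the two tools, and the notes' own definitions appear in later sections.

```python
from math import sqrt

# Summary statistics taken from the fat vs. cholesterol example.
x_bar, y_bar = 80, 240
s_x2, s_y2, s_xy = 250, 1750, 600

# Assumed standard definitions (not stated in this section):
r = s_xy / (sqrt(s_x2) * sqrt(s_y2))   # linear correlation coefficient
m = s_xy / s_x2                        # least-squares slope
b = y_bar - m * x_bar                  # least-squares intercept

print(round(r, 3), m, b)
```

Under these assumed definitions, r comes out close to 0.907, and the fitted line is approximately Y = 2.4 X + 48, consistent with the positive trend in the scatterplot.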

∗ Caution: The covariance can equal zero under other conditions as well; see Exercise in the next section.
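
One way to see the caution concretely is the small sketch below. The data are hypothetical and chosen only for illustration (they are not from the notes, and this is not necessarily the exercise referred to): Y is completely determined by X, yet the sample covariance is zero because the relationship is symmetric rather than linear.

```python
from statistics import mean

# Hypothetical illustrative data: Y = X**2 exactly, so X and Y are
# strongly related, but not linearly.
x = [-2, -1, 0, 1, 2]
y = [xi ** 2 for xi in x]    # [4, 1, 0, 1, 4]

n = len(x)
x_bar, y_bar = mean(x), mean(y)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)

print(s_xy)
```

The positive and negative products cancel exactly, so zero covariance by itself does not establish that no association exists.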

Before moving on to the next section, some important details are necessary in
order to provide a more formal context for this type of problem. In our example,
the response variable of interest is cholesterol level Y, which presumably has some
overall probability distribution in the study population. The mean cholesterol level
of this population can therefore be denoted $\mu_Y$ (or, recall, the expectation $E[Y]$) and
estimated by the “grand mean” $\bar{y} = 240$. Note that no information about X is used.
Now we seek to characterize the relation (if any) between cholesterol level Y and
fat intake X in this population, based on a random sample using n = 5 fat intake
values (i.e., $x_1 = 60$, $x_2 = 70$, $x_3 = 80$, $x_4 = 90$, $x_5 = 100$). Each of these fixed $x_i$
values can be regarded as representing a different amount of fat grams consumed
by a subpopulation of individuals, whose cholesterol levels Y, conditioned on that
value of $X = x_i$, are assumed to be normally distributed. The conditional mean
cholesterol level of each of these distributions could therefore be denoted $\mu_{Y | X = x_i}$
(equivalently, the conditional expectation $E[Y | X = x_i]$) for i = 1, 2, 3, 4, 5. (See
figure; note that, in addition, we will assume that the variances “within groups” are
all equal (to $\sigma^2$), and that the groups are independent of one another.) If no relation
between X and Y exists, we would expect to see no organized variation in Y as X
changes, and all of these conditional means would either be uniformly “scattered”
around, or exactly equal to, the unconditional mean $\mu_Y$; recall the discussion on
the preceding page. But if there is a true relation between X and Y, then it becomes
important to characterize and model the resulting (nonzero) variation.
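
The assumptions described in this paragraph can be summarized compactly. This is only a restatement in symbols, written as a sketch; the notation $N(\cdot, \cdot)$ for a normal distribution with given mean and variance is an assumption not used elsewhere in this section.

```latex
\begin{gather*}
Y \mid X = x_i \;\sim\; N\!\left( \mu_{Y \mid X = x_i},\; \sigma^2 \right),
  \qquad \mu_{Y \mid X = x_i} = E[\,Y \mid X = x_i\,],
  \qquad i = 1, 2, 3, 4, 5, \\
\text{idealized ``no relation'' case:} \quad
  \mu_{Y \mid X = x_1} = \mu_{Y \mid X = x_2} = \cdots = \mu_{Y \mid X = x_5} = \mu_Y .
\end{gather*}
```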

[Figure: We can consider n = 5 subpopulations, each of whose cholesterol levels Y are normally distributed with common standard deviation σ, and whose means $\mu_{Y | X = 60}$, $\mu_{Y | X = 70}$, $\mu_{Y | X = 80}$, $\mu_{Y | X = 90}$, $\mu_{Y | X = 100}$ are conditioned on X = 60, 70, 80, 90, 100 fat grams, respectively.]
