Simple Linear Regression and Correlation: Abrasion Loss vs. Hardness

The document summarizes key concepts in simple linear regression: 1) Simple linear regression models the relationship between a dependent variable Y and an independent variable X using the linear equation Y = β0 + β1X + ε, where β0 and β1 are parameters estimated from sample data. 2) The parameters β0 and β1 are estimated by the method of least squares, which finds the values of β0 and β1 that minimize the sum of squared residuals between observed and predicted Y values. 3) The assumptions of simple linear regression include a linear relationship between Y and X, and independent, normally distributed errors with constant variance.

Chapter 11: SIMPLE LINEAR REGRESSION

AND CORRELATION

Part 1: Simple Linear Regression (SLR)


Introduction
Sections 11-1 and 11-2
Abrasion Loss vs. Hardness

Price of clock vs. Age of clock

[Scatterplot: Price Sold at Auction (y-axis) vs. Age of Clock in years (x-axis), with points coded by number of Bidders.]

1
• Regression is a method for studying the
relationship between two or more
quantitative variables

• Simple linear regression (SLR):


One quantitative dependent variable
- response variable
- dependent variable
-Y
One quantitative independent variable
- explanatory variable
- predictor variable
-X

• Multiple linear regression:


One quantitative dependent variable
Many quantitative independent variables

– You’ll see this in STAT:3200/IE:3760 Applied Linear Regression, if you take it.
2
• SLR Examples:
– predict salary from years of experience
– estimate effect of lead exposure on school
testing performance
– predict force at which a metal alloy rod
bends based on iron content

3
• Example: Health data
Variables:
Percent of Obese Individuals
Percent of Active Individuals
Data from CDC. Units are regions of U.S. in 2014.

PercentObesity PercentActive
1 29.7 55.3
2 28.9 51.9
3 35.9 41.2
4 24.7 56.3
5 21.3 60.4
6 26.3 50.9
.
.
.
[Scatterplot: Percent Obese (y-axis) vs. Percent Active (x-axis).]

4
A scatterplot or scatter diagram can give us a general idea of the relationship between obesity and activity...

[Scatterplot: Percent Obese (y-axis) vs. Percent Active (x-axis).]

The points are plotted as the pairs (xi, yi) for i = 1, . . . , 25.

Inspection suggests a linear relationship between obesity and activity (i.e. a straight line would go through the bulk of the points, and the points would look randomly scattered around this line).
5
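As a hedged illustration (not part of the original slides), here is a minimal Python sketch of such a scatterplot, using the first six rows of the CDC table above; matplotlib is an assumed plotting choice.

    import matplotlib.pyplot as plt

    # First six (PercentActive, PercentObesity) pairs from the table above;
    # the full CDC data set is not reproduced here.
    percent_active = [55.3, 51.9, 41.2, 56.3, 60.4, 50.9]
    percent_obese = [29.7, 28.9, 35.9, 24.7, 21.3, 26.3]

    # Plot the (x_i, y_i) pairs: activity on the x-axis, obesity on the y-axis.
    plt.scatter(percent_active, percent_obese)
    plt.xlabel("Percent Active")
    plt.ylabel("Percent Obese")
    plt.show()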
Simple Linear Regression
The model

• The basic model

Yi = β0 + β1xi + εi

– Yi is the observed response or dependent variable for observation i

– xi is the observed predictor, regressor, explanatory variable, independent variable, covariate

– εi is the error term

– the εi are iid N(0, σ²)


(iid means independently and identically distributed)

6
– So, E[Yi|xi] = β0 + β1xi + 0 = β0 + β1xi

The conditional mean (i.e. the expected value of Yi given xi, or after conditioning on xi) is “β0 + β1xi” (a point on the regression line).

– Or, as another notation, E[Y |x] = µY |x

– The random scatter around the mean (i.e. around the line) follows a N(0, σ²) distribution.

7
Example: Consider the model that regresses Oxygen purity on Hydrocarbon level in a distillation process with...

β0 = 75 and β1 = 15

For each xi there is a different Oxygen purity mean (which is the center of a normal distribution of Oxygen purity values).

Plugging in xi to (75 + 15xi) gives you the conditional mean at xi.

8
The conditional mean for x = 1:

E[Y |x] = 75 + 15 · 1 = 90

The conditional mean for x = 1.25:

E[Y |x] = 75 + 15 · 1.25 = 93.75

9
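As a hedged sketch (not from the slides), the Python snippet below evaluates these conditional means and simulates responses scattering around them; β0 = 75 and β1 = 15 come from the example, while σ = 1.5 and the x values are arbitrary choices for illustration.

    import numpy as np

    beta0, beta1 = 75.0, 15.0   # parameters from the Oxygen purity example
    sigma = 1.5                 # assumed error standard deviation (not given in the slides)

    def conditional_mean(x):
        # E[Y | x] = beta0 + beta1 * x
        return beta0 + beta1 * x

    print(conditional_mean(1.0))    # 90.0
    print(conditional_mean(1.25))   # 93.75

    # Each simulated Y_i is its conditional mean plus N(0, sigma^2) noise.
    rng = np.random.default_rng(0)
    x = np.array([1.0, 1.25, 1.5])
    y = conditional_mean(x) + rng.normal(0.0, sigma, size=x.size)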
These values that randomly scatter around a
conditional mean are called errors.

The random error of observation i is denoted εi. The errors around a conditional mean are normally distributed, centered at 0, and have a variance of σ², i.e. εi ∼ N(0, σ²).

Here, we assume all the conditional distributions of the errors are the same, so we’re using a constant variance model.

V[Yi|xi] = V(β0 + β1xi + εi) = V(εi) = σ²

10
• The model can also be written as:

Yi|xi ∼ N(β0 + β1xi, σ²)    (the first argument is the conditional mean)
– mean of Y given x is β0 + β1x (known as
conditional mean)

– β0 + β1xi is the mean value of all the Y ’s for the given value of xi

The regression line itself represents all the conditional means.

Not all of the observed points will fall on the line; there is some random noise around the mean (we model this part with an error term).

Usually, we will not know β0, β1, or σ², so we will estimate them from the data.
11
• Some interpretation of parameters:

– β0 is the conditional mean when x = 0

– β1 is the slope, also stated as the change in the mean of Y per 1-unit change in x

– σ² is the variability of responses about the conditional mean

12
Simple Linear Regression
Assumptions

• Key assumptions

– linear relationship exists between Y and x

*we say the relationship between Y and x is linear if the means of the conditional distributions of Y |x lie on a straight line

– independent errors
(this essentially equates to independent
observations in the case of SLR)

– constant variance of errors

– normally distributed errors

13
Simple Linear Regression
Estimation

We wish to use the sample data to estimate the population parameters: the slope β1 and the intercept β0.
• Least squares estimation
– To choose the ‘best fitting line’ using least
squares estimation, we minimize the sum
of the squared vertical distances of each
point to the fitted line.

14
– We let ‘hats’ denote predicted values or
estimates of parameters, so we have:

ŷi = β̂0 + β̂1xi

where ŷi is the estimated conditional mean for xi, β̂0 is the estimator for β0, and β̂1 is the estimator for β1.

– We wish to choose β̂0 and β̂1 such that we minimize the sum of the squared vertical distances of each point to the fitted line, i.e. minimize Σᵢ₌₁ⁿ (yi − ŷi)².

– Or minimize the function g:

g(β̂0, β̂1) = Σᵢ₌₁ⁿ (yi − ŷi)² = Σᵢ₌₁ⁿ (yi − (β̂0 + β̂1xi))²
15
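To make the minimization concrete, here is a small Python sketch (my own illustration, not from the slides) that minimizes g numerically over (β̂0, β̂1); the data values are placeholders, and scipy.optimize is an assumed choice, since in practice the closed-form solution on the next slides is used instead.

    import numpy as np
    from scipy.optimize import minimize

    # Placeholder data; any paired x, y values would do.
    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

    def g(b):
        # Sum of squared vertical distances from the line b[0] + b[1] * x.
        b0, b1 = b
        return np.sum((y - (b0 + b1 * x)) ** 2)

    # Numerically search for the (b0, b1) pair that minimizes g.
    result = minimize(g, x0=[0.0, 0.0])
    print(result.x)   # numerically close to the least squares estimates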
– This vertical distance of a point from the fitted line is called a residual. The residual for observation i is denoted ei and

ei = yi − ŷi

– So, in least squares estimation, we wish to minimize the sum of the squared residuals (or error sum of squares, SSE).

– To minimize

g(β̂0, β̂1) = Σᵢ₌₁ⁿ (yi − (β̂0 + β̂1xi))²

we take the derivative of g with respect to β̂0 and β̂1, set each equal to zero, and solve:

∂g/∂β̂0 = −2 Σᵢ₌₁ⁿ (yi − (β̂0 + β̂1xi)) = 0

∂g/∂β̂1 = −2 Σᵢ₌₁ⁿ (yi − (β̂0 + β̂1xi)) xi = 0

16
Simplifying the above gives:

n β̂0 + β̂1 Σᵢ₌₁ⁿ xi = Σᵢ₌₁ⁿ yi

β̂0 Σᵢ₌₁ⁿ xi + β̂1 Σᵢ₌₁ⁿ xi² = Σᵢ₌₁ⁿ yi xi

And these two equations are known as the least squares normal equations.

Solving the normal equations gets us our estimators β̂0 and β̂1...

17
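As an illustration (not in the original slides), the two normal equations form a 2x2 linear system that can be solved directly; a minimal numpy sketch with placeholder data:

    import numpy as np

    # Placeholder data.
    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
    n = x.size

    # Normal equations:
    #   n*b0        + (sum xi)*b1   = sum yi
    #   (sum xi)*b0 + (sum xi^2)*b1 = sum yi*xi
    A = np.array([[n, x.sum()],
                  [x.sum(), (x ** 2).sum()]])
    rhs = np.array([y.sum(), (x * y).sum()])

    beta0_hat, beta1_hat = np.linalg.solve(A, rhs)
    print(beta0_hat, beta1_hat)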
Simple Linear Regression
Estimation

– Estimate of the slope:

β̂1 = Σᵢ₌₁ⁿ (xi − x̄)(yi − ȳ) / Σᵢ₌₁ⁿ (xi − x̄)² = Sxy / Sxx

– Estimate of the Y-intercept:

β̂0 = ȳ − β̂1x̄

The point (x̄, ȳ) will always be on the least squares line.

Alternative formulas for β̂0 and β̂1 are also given in the book.

18
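A minimal Python sketch (my own illustration, not code from the slides) implementing these closed-form estimates; applied to the raw cigarette data on the next slide, it should reproduce the software values, assuming the raw x and y values are available.

    import numpy as np

    def least_squares_fit(x, y):
        # Returns (beta0_hat, beta1_hat) from the Sxy/Sxx formulas.
        x = np.asarray(x, dtype=float)
        y = np.asarray(y, dtype=float)
        x_bar, y_bar = x.mean(), y.mean()
        sxy = np.sum((x - x_bar) * (y - y_bar))
        sxx = np.sum((x - x_bar) ** 2)
        beta1_hat = sxy / sxx
        beta0_hat = y_bar - beta1_hat * x_bar   # the line passes through (x_bar, y_bar)
        return beta0_hat, beta1_hat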
• Example: Cigarette data
(Nicotine vs. Tar content)

[Scatterplot: Nicotine (Nic, y-axis) vs. Tar (x-axis).]

n = 25

Least squares estimates from software:

β̂0 = 0.1309 and β̂1 = 0.0610

Summary statistics:

Σᵢ₌₁ⁿ xi = 305.4,  x̄ = 12.216
Σᵢ₌₁ⁿ yi = 21.91,  ȳ = 0.8764
19
Σᵢ₌₁ⁿ (yi − ȳ)(xi − x̄) = 47.01844
Σᵢ₌₁ⁿ (xi − x̄)² = 770.4336
Σᵢ₌₁ⁿ xi² = 4501.2
Σᵢ₌₁ⁿ yi² = 22.2105

Using the previous formulas and the summary statistics...

β̂1 = Sxy / Sxx = 47.01844 / 770.4336 = 0.061029

and

β̂0 = ȳ − β̂1x̄
   = 0.8764 − 0.061029(12.216)
   = 0.130870

(Same estimates as software)

20
Simple Linear Regression
Estimating σ²

• One of the assumptions of simple linear regression is that the variance for each of the conditional distributions of Y |x is the same at all x-values (i.e. constant variance).

• In this case, it makes sense to pool all the observed error information (in the residuals) to come up with a common estimate for σ².

21
Recall the model:

Yi = β0 + β1xi + εi,  with the εi iid N(0, σ²)

– We use the error sum of squares (SSE) to estimate σ²...

σ̂² = SSE / (n − 2) = Σᵢ₌₁ⁿ (yi − ŷi)² / (n − 2) = MSE

∗ SSE = error sum of squares = Σᵢ₌₁ⁿ (yi − ŷi)²

∗ MSE is the mean squared error

∗ E[MSE] = E[σ̂²] = σ² (unbiased estimator)

∗ σ̂ = √σ̂² = √MSE

22
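A short Python sketch (not from the slides) of this pooled estimate, assuming the slope and intercept have already been estimated (e.g. with a helper like the least_squares_fit function sketched earlier):

    import numpy as np

    def estimate_sigma2(x, y, beta0_hat, beta1_hat):
        # Pooled variance estimate: MSE = SSE / (n - 2).
        x = np.asarray(x, dtype=float)
        y = np.asarray(y, dtype=float)
        y_hat = beta0_hat + beta1_hat * x   # fitted conditional means
        residuals = y - y_hat               # e_i = y_i - y_hat_i
        sse = np.sum(residuals ** 2)        # error sum of squares
        n = y.size
        return sse / (n - 2)                # two mean parameters were estimated

    # sigma_hat = np.sqrt(estimate_sigma2(...)) estimates the error standard deviation.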
∗ ‘2’ is subtracted from n in the denominator because we’ve used 2 degrees of freedom for estimating the slope and intercept (i.e. there were 2 parameters estimated when modeling the conditional mean)

∗ When we estimated σ² for a single normal population, we divided Σᵢ₌₁ⁿ (yi − ȳ)² by (n − 1) because we only estimated 1 mean-structure parameter, µ; now we estimate two parameters for our mean structure, β0 and β1.

23
