
Simple Linear Regression

Material from Devore's book (Ed 8), and Cengagebrain.com
Simple Linear Regression

[Scatterplots of Rating (y-axis, ticks 20–80) versus Sugar (x-axis, ticks 0–15), repeated over three slides with annotations added.]
The Simple Linear Regression Model

The simplest deterministic mathematical relationship between two variables x and y is a linear relationship: y = β₀ + β₁x.

The objective of this section is to develop an equivalent linear probabilistic model.

If the two (random) variables are probabilistically related, then for a fixed value of x, there is uncertainty in the value of the second variable.

So we assume Y = β₀ + β₁x + ε, where ε is a random variable.

Two variables are related linearly "on average" if, for fixed x, the actual value of Y differs from its expected value by a random amount (i.e., there is random error).
A Linear Probabilistic Model

Definition: The Simple Linear Regression Model

There are parameters β₀, β₁, and σ², such that for any fixed value of the independent variable x, the dependent variable is a random variable related to x through the model equation

Y = β₀ + β₁x + ε

The quantity ε in the model equation is the "error": a random variable, assumed to be symmetrically distributed with E(ε) = 0 and V(ε) = σ²_ε = σ² (no assumption made about the distribution of ε, yet).
A Linear Probabilistic Model

X: the independent, predictor, or explanatory variable (usually known). NOT RANDOM.

Y: the dependent or response variable. For fixed x, Y is a random variable.

ε: the random deviation or random error term. For fixed x, ε is a random variable.

What exactly does ε do?
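One way to answer that is to simulate the model. A minimal sketch (the parameter values below are made up for illustration): for a fixed x, repeated draws of Y differ only through the random error ε.

```python
import random

# Hypothetical true parameters, chosen only for illustration.
beta0, beta1 = 5.0, 2.0   # true intercept and slope
sigma = 1.5               # standard deviation of the random error

def simulate_Y(x):
    """One draw from the model Y = beta0 + beta1*x + eps, eps ~ N(0, sigma^2)."""
    eps = random.gauss(0.0, sigma)
    return beta0 + beta1 * x + eps

# For a fixed x, Y is still random: it scatters about its mean beta0 + beta1*x.
x = 10.0
print("E(Y | x = 10) =", beta0 + beta1 * x)            # 25.0
print("five simulated Y values:",
      [round(simulate_Y(x), 2) for _ in range(5)])
```

Each draw lands near 25 but rarely on it; that gap is exactly the random error ε.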
A Linear Probabilistic Model

The points (x₁, y₁), …, (xₙ, yₙ) resulting from n independent observations will then be scattered about the true regression line:

[Figure: observed points scattered about the true regression line.]
A Linear Probabilistic Model

How do we know simple linear regression is appropriate?

- Theoretical considerations
- Scatterplots
A Linear Probabilistic Model

If we think of an entire population of (x, y) pairs, then μ_Y|x is the mean of all y values for which x equals that fixed value, and σ²_Y|x is a measure of how much these values of y spread out about the mean value.

If, for example, x = age of a child and y = vocabulary size, then μ_Y|5 is the average vocabulary size for all 5-year-old children in the population, and σ²_Y|5 describes the amount of variability in vocabulary size for this part of the population.
A Linear Probabilistic Model

Interpreting parameters:

β₀ (the intercept of the true regression line): the average value of Y when x is zero.

β₁ (the slope of the true regression line): the expected (average) change in Y associated with a 1-unit increase in the value of x.
A Linear Probabilistic Model

What is σ²_Y|x? How do we interpret σ²_Y|x?

Homoscedasticity: we assume the variance (amount of variability) of the distribution of Y values to be the same at each different value of fixed x (i.e., the homogeneity of variance assumption).
When errors are normally distributed…

(a) distribution of ε

(b) distribution of Y for different values of x

The variance parameter σ² determines the extent to which each normal curve spreads out about the regression line.
A Linear Probabilistic Model

When σ² is small, an observed point (x, y) will almost always fall quite close to the true regression line, whereas observations may deviate considerably from their expected values (corresponding to points far from the line) when σ² is large.

Thus, this variance can be used to tell us how good the linear fit is.

But how do we define "good"?
Estimating Model Parameters

The values of β₀, β₁, and σ² will almost never be known to an investigator.

Instead, sample data consist of n observed pairs (x₁, y₁), …, (xₙ, yₙ), from which the model parameters and the true regression line itself can be estimated.

The data (pairs) are assumed to have been obtained independently of one another.
Estimating Model Parameters

Here,

Yᵢ = β₀ + β₁xᵢ + εᵢ for i = 1, 2, …, n,

and the n deviations ε₁, ε₂, …, εₙ are independent r.v.'s. (Y₁, Y₂, …, Yₙ are then independent too. Why?)
Estimating Model Parameters

The "best fit" line is motivated by the principle of least squares, which can be traced back to the German mathematician Gauss (1777–1855):

A line provides the best fit to the data if the sum of the squared vertical distances (deviations) from the observed points to that line is as small as it can be.
Estimating Model Parameters

The sum of squared vertical deviations from the points (x₁, y₁), …, (xₙ, yₙ) to the line y = b₀ + b₁x is then

f(b₀, b₁) = Σᵢ [yᵢ − (b₀ + b₁xᵢ)]²

The point estimates of β₀ and β₁, denoted by β̂₀ and β̂₁, are called the least squares estimates: they are those values that minimize f(b₀, b₁).
Estimating Model Parameters

The fitted regression line or least squares line is then the line whose equation is y = β̂₀ + β̂₁x.

The minimizing values of b₀ and b₁ are found by taking partial derivatives of f(b₀, b₁) with respect to both b₀ and b₁, equating them both to zero [analogously to f′(b) = 0 in univariate calculus], and solving the equations

∂f/∂b₀ = −2 Σ [yᵢ − (b₀ + b₁xᵢ)] = 0
∂f/∂b₁ = −2 Σ xᵢ[yᵢ − (b₀ + b₁xᵢ)] = 0
Estimating Model Parameters

The least squares estimate of the slope coefficient β₁ of the true regression line is

β̂₁ = b₁ = Sxy / Sxx

Shortcut formulas for the numerator and denominator of β̂₁ are

Sxy = Σxᵢyᵢ − (Σxᵢ)(Σyᵢ)/n and Sxx = Σxᵢ² − (Σxᵢ)²/n

(Typically, columns for xᵢ, yᵢ, xᵢyᵢ, and xᵢ² are constructed, and then Sxy and Sxx are calculated.)
Estimating Model Parameters

The least squares estimate of the intercept β₀ of the true regression line is

β̂₀ = b₀ = ȳ − β̂₁x̄ = (Σyᵢ − β̂₁Σxᵢ)/n

The computational formulas for Sxy and Sxx require only the summary statistics Σxᵢ, Σyᵢ, Σxᵢ², and Σxᵢyᵢ. (Σyᵢ² will be needed shortly for the variance.)
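The shortcut formulas map directly to code. A minimal sketch in plain Python (the function name and the tiny data set are ours, for illustration only):

```python
def least_squares(xs, ys):
    """Slope and intercept estimates via the shortcut formulas
    Sxy = sum(x*y) - sum(x)*sum(y)/n and Sxx = sum(x^2) - sum(x)^2/n."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    Sxy = sum_xy - sum_x * sum_y / n
    Sxx = sum_x2 - sum_x ** 2 / n
    b1 = Sxy / Sxx                      # slope: b1 = Sxy / Sxx
    b0 = sum_y / n - b1 * sum_x / n     # intercept: b0 = y-bar - b1 * x-bar
    return b0, b1

# Tiny made-up data set, just to exercise the formulas.
b0, b1 = least_squares([1.0, 2.0, 3.0, 4.0], [2.1, 3.9, 6.2, 7.8])
print(f"fitted line: y = {b0:.3f} + {b1:.3f} x")   # y = 0.150 + 1.940 x
```

The columns the slide mentions (xᵢ, yᵢ, xᵢyᵢ, xᵢ²) are exactly the four running sums the function accumulates.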
Example (fitted regression line)

The cetane number is a critical property in specifying the ignition quality of a fuel used in a diesel engine.

Determination of this number for a biodiesel fuel is expensive and time-consuming.

The article "Relating the Cetane Number of Biodiesel Fuels to Their Fatty Acid Composition: A Critical Study" (J. of Automobile Engr., 2009: 565–583) included the following data on x = iodine value (g) and y = cetane number for a sample of 14 biofuels (see next slide).
Example (fitted regression line), cont'd

The iodine value (x) is the amount of iodine necessary to saturate a sample of 100 g of oil. The article's authors fit the simple linear regression model to these data, so let's do the same.

Calculating the relevant statistics gives

Σxᵢ = 1307.5, Σyᵢ = 779.2, Σxᵢ² = 128,913.93, Σxᵢyᵢ = 71,347.30,

from which Sxx = 128,913.93 − (1307.5)²/14 = 6802.7693

and Sxy = 71,347.30 − (1307.5)(779.2)/14 = −1424.41429
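As a quick check, plugging the slide's four summary sums into the formulas reproduces Sxx, Sxy, and the estimated coefficients:

```python
# Summary statistics from the slide (n = 14 biofuels).
n = 14
sum_x, sum_y = 1307.5, 779.2
sum_x2, sum_xy = 128913.93, 71347.30

Sxx = sum_x2 - sum_x ** 2 / n        # 6802.7693, matching the slide
Sxy = sum_xy - sum_x * sum_y / n     # -1424.41429, matching the slide

b1 = Sxy / Sxx                       # approx -0.20939
b0 = sum_y / n - b1 * sum_x / n      # approx 75.212

print(f"Sxx = {Sxx:.4f}, Sxy = {Sxy:.5f}")
print(f"fitted line: cetane = {b0:.3f} + ({b1:.5f}) * iodine")
```

So the estimated line is ŷ ≈ 75.212 − 0.2094x: each additional gram of iodine value is associated with a drop of about 0.21 in predicted cetane number.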
Example (fitted regression line), cont'd

[Scatter plot with the least squares line superimposed.]
Fitted Values

Fitted values:
The fitted (or predicted) values ŷ₁, …, ŷₙ are obtained by substituting x₁, …, xₙ into the equation of the estimated regression line: ŷᵢ = β̂₀ + β̂₁xᵢ.

Residuals:
The differences yᵢ − ŷᵢ between the observed and fitted y values.

Residuals are estimates of the true error. WHY?
Sum of the residuals

When the estimated regression line is obtained via the principle of least squares, the sum of the residuals should in theory be zero, if the error distribution is symmetric, since

Σᵢ (yᵢ − ŷᵢ) = 0
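This is easy to verify numerically. A sketch with made-up data (the helper function is ours): fit the line, compute the residuals, and sum them.

```python
def fit_and_residuals(xs, ys):
    """Least squares fit, returning the fitted values and residuals."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    Sxx = sum((x - xbar) ** 2 for x in xs)
    Sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b1 = Sxy / Sxx
    b0 = ybar - b1 * xbar
    fitted = [b0 + b1 * x for x in xs]                  # y-hat_i
    residuals = [y - f for y, f in zip(ys, fitted)]     # y_i - y-hat_i
    return fitted, residuals

_, residuals = fit_and_residuals([1.0, 2.0, 3.0, 4.0, 5.0],
                                 [1.9, 4.2, 5.8, 8.3, 9.6])
print("residuals:", [round(e, 2) for e in residuals])
print("sum of residuals:", round(sum(residuals), 12))   # 0.0 up to rounding
```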
Example (fitted values)

Suppose we have the following data on filtration rate (x) versus moisture content (y):

[Data table: 20 (x, y) pairs.]

Relevant summary quantities (summary statistics) are

Σxᵢ = 2817.9, Σyᵢ = 1574.8, Σxᵢ² = 415,949.85, Σxᵢyᵢ = 222,657.88, and Σyᵢ² = 124,039.58,

from which Sxx = 18,921.8295 and Sxy = 776.434.

Calculation of residuals?
Example (fitted values), cont'd

All predicted values (fits) and residuals appear in the accompanying table.

[Table of fitted values and residuals.]
Fitted Values

We interpret the fitted value ŷᵢ as the value of y that we would predict or expect when using the estimated regression line with x = xᵢ; thus ŷᵢ is the estimated true mean for that population when x = xᵢ (based on the data).

The residual is a positive number if the point lies above the line and a negative number if it lies below the line; the point (xᵢ, ŷᵢ) falls on the line itself.

The residual can be thought of as a measure of deviation, and we can summarize the notation in the following way:

Yᵢ = β̂₀ + β̂₁xᵢ + ε̂ᵢ = Ŷᵢ + ε̂ᵢ  ⟹  ε̂ᵢ = Yᵢ − Ŷᵢ
Residual Plots

Revenue = 2.7 × Temperature − 35

Residual = Observed − Predicted

Temperature (Celsius) | Revenue (Observed) | Revenue (Predicted) | Residual (Observed − Predicted)
28.2                  | $44                | $41                 | $3
21.4                  | $23                | $23                 | $0
32.9                  | $43                | $54                 | −$11
24.0                  | $30                | $29                 | $1
etc.                  | etc.               | etc.                | etc.
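The predicted and residual columns follow mechanically from the stated line; a few lines of code reproduce them (up to the table's whole-dollar rounding):

```python
# Fitted line from the slide: Revenue = 2.7 * Temperature - 35.
def predict(temp_celsius):
    return 2.7 * temp_celsius - 35

observations = [(28.2, 44), (21.4, 23), (32.9, 43), (24.0, 30)]  # (Celsius, $)
for temp, observed in observations:
    predicted = predict(temp)
    residual = observed - predicted      # Residual = Observed - Predicted
    print(f"{temp:5.1f} C   observed ${observed}   "
          f"predicted ${predicted:.2f}   residual ${residual:+.2f}")
```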


Residual Plots (contd.)

The same regression run on two different lemonade stands: one where the model is very accurate, one where the model is not.

[Residual plots for the two stands.]

Residual Plots (contd.)
Ideally, residual plots look like these, i.e.:
1. They're pretty symmetrically distributed, tending to cluster towards the middle of the plot.
2. They're clustered around the lower single digits of the y-axis (e.g., 0.5 or 1.5, not 30 or 150).
3. In general, there aren't any clear patterns.
Residual Plots (contd.)

Some not-so-ideal residual plots:
Example Residual Plots and Their Diagnoses: Y-Axis Imbalanced

Some exceptionally high values of Y for normal values of X.
Example Residual Plots and Their Diagnoses: Heteroscedasticity

Heteroscedasticity means that the residuals get larger as the prediction moves from small to large (or from large to small).
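Heteroscedasticity is easy to manufacture in a simulation, which also shows what a residual plot would reveal. A sketch (all numbers made up): the error spread grows with x, so residuals in the high-x half come out markedly larger.

```python
import random

random.seed(1)

# Simulate y = 10 + 2x + eps, where the spread of eps grows with x.
xs = [i / 2 for i in range(2, 42)]                      # x from 1.0 to 20.5
ys = [10 + 2 * x + random.gauss(0, 0.3 * x) for x in xs]

# Ordinary least squares fit.
n = len(xs)
xbar, ybar = sum(xs) / n, sum(ys) / n
b1 = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
      / sum((x - xbar) ** 2 for x in xs))
b0 = ybar - b1 * xbar
res = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

# Compare the typical residual size in the low-x and high-x halves.
half = n // 2
print("mean |residual|, small x:", round(sum(map(abs, res[:half])) / half, 2))
print("mean |residual|, large x:", round(sum(map(abs, res[half:])) / (n - half), 2))
# The second number is typically much larger: the fanning-out pattern
# a residual plot makes visible.
```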
Example Residual Plots and Their Diagnoses: Nonlinear

A nonlinear pattern means your model doesn't accurately represent the relationship between "Temperature" and "Revenue."
Example Residual Plots and Their Diagnoses: Outliers

• If it is a data entry error, the outlier is just wrong: delete it.
• If it is a legitimate outlier, assess the impact of the outlier.
Outliers
Data points that diverge in a big way from the overall
pattern are called outliers. There are four ways that a data
point might be considered an outlier.
• It could have an extreme X value compared to other
data points.
• It could have an extreme Y value compared to other
data points.
• It could have extreme X and Y values.
• It might be distant from the rest of the data, even
without extreme X or Y values.
Outliers (contd.)

Each type of outlier is depicted graphically in the scatterplots below.

[Four scatterplots, one per type of outlier.]
Influential Points
An influential point is an outlier that greatly affects the
slope of the regression line. One way to test the influence
of an outlier is to compute the regression equation with
and without the outlier.
Influential Points (contd.)

This type of analysis is illustrated below. The scatterplots are identical, except that one plot includes an outlier. When the outlier is present, the slope is flatter (−4.10 vs. −3.32); so this outlier would be considered an influential point.
Influential Points (contd.)
Here, one chart has a single outlier, located at the high
end of the X axis (where x = 24). As a result of that single
outlier, the slope of the regression line changes greatly,
from -2.5 to -1.6; so the outlier would be considered an
influential point.
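The compute-with-and-without test scripts naturally. A sketch with made-up points (not the chart's actual data) that mimics a single outlier at the high end of the x axis:

```python
def slope(points):
    """Least squares slope for a list of (x, y) pairs."""
    n = len(points)
    xbar = sum(x for x, _ in points) / n
    ybar = sum(y for _, y in points) / n
    Sxy = sum((x - xbar) * (y - ybar) for x, y in points)
    Sxx = sum((x - xbar) ** 2 for x, _ in points)
    return Sxy / Sxx

# Made-up data falling near a steep negative line...
data = [(1, 98), (3, 92), (5, 88), (7, 81), (9, 77), (11, 70)]
# ...plus one outlier at the high end of the x axis.
outlier = (24, 85)

print("slope without outlier:", round(slope(data), 2))              # -2.74
print("slope with outlier:   ", round(slope(data + [outlier]), 2))  # -0.52
# The large change in slope marks the added point as influential.
```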
