Introduction

A simple linear regression model


Abuses of regression
Interpreting R results

PROBABILITY AND STATISTICS

CHAPTER 8: SIMPLE LINEAR REGRESSION AND CORRELATION

Dr. Phan Thi Huong

HoChiMinh City University of Technology


Faculty of Applied Science, Department of Applied Mathematics
Email: [email protected]

HCM city — 2021.

Dr. Phan Thi Huong Probability and Statistics


OUTLINE

1 INTRODUCTION
2 A SIMPLE LINEAR REGRESSION MODEL
3 ABUSES OF REGRESSION
4 INTERPRETING R RESULTS

LEARNING OUTCOMES
After careful study of this chapter, you should be able to do the following:
1 Understand how the method of least squares is used to estimate the parameters in a linear regression model.
2 Test statistical hypotheses and construct confidence intervals on regression model parameters.
3 Use the regression model to predict a future observation.
4 Analyze residuals to determine whether the regression model is an adequate fit to the data or whether any underlying assumptions are violated.
5 Apply the correlation model.
6 Use R software to fit simple linear regression models and interpret the output.
REGRESSION ANALYSIS AND PAIRED DATA

We are often interested in determining the relationship between a pair of variables. For instance:
How does the amount of money spent on advertising a new product relate to the first month's sales figures for that product?
How does the height of a father relate to that of his son?
How does the electrical energy consumption of a house relate to the size of the house?
...
⇒ The relationship between such variables is not deterministic.
⇒ The collection of statistical tools used to model and explore relationships between variables that are related in a nondeterministic manner is called regression analysis.
Model definition
Introduction
Regression parameters
A simple linear regression model
Coefficient of determination
Abuses of regression
Sample correlation coefficient
Interpreting R results
Analysis of residuals: assessing the model

A SIMPLE LINEAR REGRESSION MODEL

Simple linear regression considers a single predictor (independent) variable $x$ and a dependent (response) variable $Y$.
Suppose that, for a specified value $x$ of the independent variable, the value of the response variable $Y$ can be expressed as

$$Y = \beta_0 + \beta_1 x + \varepsilon, \tag{1}$$

where
$\beta_0$, $\beta_1$ are unknown parameters called the regression coefficients;
$\varepsilon$ is the random error, assumed to be normally distributed with $E(\varepsilon) = 0$ and $\mathrm{Var}(\varepsilon) = \sigma^2$.

The simple linear regression model in equation (1) states that the mean of the random variable $Y$ is related to $x$ by the straight-line relationship

$$E[Y \mid x] = \beta_0 + \beta_1 x,$$

where $\beta_0$ and $\beta_1$ are the intercept and the slope of the line, respectively.

ASSUMPTIONS OF THE ERROR TERM

Given $n$ pairs of observations $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ collected from a random sample of size $n$, equation (1) gives

$$Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \quad i = 1, 2, \ldots, n.$$

Simple linear regression requires that:
The error terms $\varepsilon_i$ are mutually independent.
$\varepsilon_i \sim N(0, \sigma^2)$, or equivalently $Y_i \sim N(\beta_0 + \beta_1 x_i, \sigma^2)$.
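To make the model assumptions concrete, the following is a minimal sketch that simulates responses from equation (1). It is written in Python rather than R purely for illustration; the function name `simulate_responses`, the x values, and the parameter values are all hypothetical and do not come from the lecture.

```python
import random

def simulate_responses(xs, beta0, beta1, sigma, seed=0):
    """Draw Y_i = beta0 + beta1 * x_i + eps_i with independent eps_i ~ N(0, sigma^2)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in xs]

# Hypothetical inputs and parameters, chosen only for illustration.
xs = [4, 7, 8, 12, 15, 20, 25]
ys = simulate_responses(xs, beta0=-17.0, beta1=4.0, sigma=15.0)

for x, y in zip(xs, ys):
    print(f"x = {x:2d}, simulated Y = {y:7.2f}")
```

Each simulated response scatters around the line $\beta_0 + \beta_1 x$; setting `sigma=0.0` removes the error term and returns points exactly on the line.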

A SCATTER DIAGRAM FOR PAIRED DATA

How can we tell whether an observed dataset is a candidate for a simple linear regression model?
Consider a simple example: how the speed of a car affects its stopping distance, that is, how far it travels before it comes to a stop.

The cars dataset contains 50 observations of two variables, speed (mph) and dist (ft):

       speed     dist
  1     4.00     2.00
  2     4.00    10.00
  3     7.00     4.00
  4     7.00    22.00
  5     8.00    16.00
...      ...      ...
 48    24.00    93.00
 49    24.00   120.00
 50    25.00    85.00
FIGURE 1: The scatter diagram of the cars dataset.

⇒ A scatter diagram of the observed data can suggest whether a linear regression model is appropriate.
ESTIMATING THE REGRESSION PARAMETERS

Let $\hat\beta_0$ and $\hat\beta_1$ be estimates of $\beta_0$ and $\beta_1$, respectively.
The fitted regression line is given by

$$\hat{Y} = \hat\beta_0 + \hat\beta_1 x.$$

The residual $e_i = y_i - (\hat\beta_0 + \hat\beta_1 x_i) = y_i - \hat{y}_i$ describes the error in the fit of the model to the $i$th observation $y_i$.

The key concept: a good fitted regression line should be "close to the observed data".
$\hat\beta_0$ and $\hat\beta_1$ are found by the least-squares method.
DEFINITION
For a dataset of $n$ observations $(x_1, y_1), \ldots, (x_n, y_n)$, the sum of squares for errors is defined by

$$SSE = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \bigl[y_i - (\hat\beta_0 + \hat\beta_1 x_i)\bigr]^2.$$

The least-squares method finds the estimates $\hat\beta_0$ and $\hat\beta_1$ by minimizing $SSE$. These estimates are called the least squares estimates.
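As a small illustration of the criterion, the sketch below evaluates SSE for two candidate lines on the first five rows of the cars data listed in the scatter-diagram slide. Python is used only for illustration; the helper `sse` and both candidate lines are hypothetical choices, not fitted values.

```python
def sse(xs, ys, b0, b1):
    """Sum of squared residuals e_i = y_i - (b0 + b1 * x_i)."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

# First five rows of the cars data: speed (mph) and dist (ft).
speed = [4, 4, 7, 7, 8]
dist = [2, 10, 4, 22, 16]

# Two hypothetical candidate lines (intercept, slope), picked by hand.
sse_a = sse(speed, dist, b0=-10.0, b1=3.5)
sse_b = sse(speed, dist, b0=0.0, b1=1.0)

print(f"SSE for line A: {sse_a}")
print(f"SSE for line B: {sse_b}")
```

Line A has the smaller SSE, so by the least-squares criterion it fits these five points better than line B. The least squares estimates are the pair $(\hat\beta_0, \hat\beta_1)$ for which no other line achieves a smaller SSE.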

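THEOREM
The least squares estimates of the intercept and slope in the simple linear regression model are

$$\hat\beta_1 = \frac{\sum_{i=1}^{n} x_i y_i - \dfrac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n}}{\sum_{i=1}^{n} x_i^2 - \dfrac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}} = \frac{S_{xy}}{S_{xx}}, \qquad \hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x},$$

where $S_{xx}$ and $S_{xy}$ are defined by

$$S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}, \qquad \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n},$$

$$S_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^{n} x_i y_i - \frac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n}, \qquad \bar{y} = \frac{\sum_{i=1}^{n} y_i}{n}.$$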
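A brief sketch of where these formulas come from, using the standard calculus argument (this is the usual textbook route and may differ from the derivation given in lecture): setting the partial derivatives of $SSE$ with respect to $\hat\beta_0$ and $\hat\beta_1$ to zero gives the two normal equations, which are then solved for the estimates.

```latex
\begin{align*}
\frac{\partial\, SSE}{\partial \hat\beta_0}
  &= -2\sum_{i=1}^{n}\bigl(y_i - \hat\beta_0 - \hat\beta_1 x_i\bigr) = 0
  &&\Longrightarrow\quad
  n\hat\beta_0 + \hat\beta_1\sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i, \\
\frac{\partial\, SSE}{\partial \hat\beta_1}
  &= -2\sum_{i=1}^{n} x_i\bigl(y_i - \hat\beta_0 - \hat\beta_1 x_i\bigr) = 0
  &&\Longrightarrow\quad
  \hat\beta_0\sum_{i=1}^{n} x_i + \hat\beta_1\sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} x_i y_i .
\end{align*}
% The first equation gives \hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x}.
% Substituting this into the second equation:
\begin{align*}
\hat\beta_1\left(\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}\right)
  &= \sum_{i=1}^{n} x_i y_i - \frac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n}
  \quad\Longrightarrow\quad
  \hat\beta_1 = \frac{S_{xy}}{S_{xx}} .
\end{align*}
```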
EXAMPLE 1
A large midwestern bank is planning on introducing a new word processing system to its secretarial staff. To learn about the amount of training that is needed to effectively implement the new system, the bank chose eight employees of roughly equal skill. These workers were trained for different amounts of time and were then individually put to work on a given project. The following data indicate the training times and the resulting times (both in hours) that it took each worker to complete the project.

Training time (= x)               22     18     30     16     25     20     10     14
Time to complete project (= Y)    18.4   19.2   14.5   19.0   16.6   17.7   24.4   21.0

EXAMPLE 1 (CONTINUED)
(A) What is the estimated regression line?
(B) Predict the amount of time it would take a worker who receives 28 hours of training to complete the project.
(C) Find the residual $e_i$ of the observation $(x_i, y_i) = (22, 18.4)$.
Solution:
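One way to carry out the computation is sketched below, applying the $S_{xx}$ and $S_{xy}$ computing formulas from the theorem to the Example 1 data. Python is used here only for illustration (the chapter itself works in R), and the variable names are hypothetical.

```python
# Data from Example 1: training time x and completion time y, both in hours.
x = [22, 18, 30, 16, 25, 20, 10, 14]
y = [18.4, 19.2, 14.5, 19.0, 16.6, 17.7, 24.4, 21.0]
n = len(x)

# S_xx and S_xy from the computing formulas in the theorem.
s_xx = sum(xi ** 2 for xi in x) - sum(x) ** 2 / n
s_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n

# Least squares estimates of the slope and the intercept.
beta1_hat = s_xy / s_xx
beta0_hat = sum(y) / n - beta1_hat * sum(x) / n

# (A) Estimated regression line.
print(f"y_hat = {beta0_hat:.4f} + ({beta1_hat:.4f}) x")

# (B) Predicted completion time after 28 hours of training.
pred_28 = beta0_hat + beta1_hat * 28
print(f"prediction at x = 28: {pred_28:.3f} hours")

# (C) Residual of the observation (22, 18.4).
resid = 18.4 - (beta0_hat + beta1_hat * 22)
print(f"residual at (22, 18.4): {resid:.3f}")
```

With these data the sketch gives $S_{xx} = 281.875$ and $S_{xy} = -125.35$, hence a fitted line of roughly $\hat{y} = 27.466 - 0.445x$, a predicted completion time of about 15.0 hours at $x = 28$, and a residual of about 0.72 for the observation $(22, 18.4)$.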

Consider the simple linear model $Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$, $i = 1, \ldots, n$, where the $i$th error term satisfies $\varepsilon_i \sim N(0, \sigma^2)$.
How would we estimate $\sigma^2$?
THEOREM
The mean squared error (MSE) of a simple linear regression is defined by

$$MSE = \frac{SSE}{n - 2}.$$

The mean squared error is an unbiased estimator of $\sigma^2$, so we take

$$\hat\sigma^2 = MSE = \frac{SSE}{n - 2}.$$
Proof:
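The slide leaves the proof blank. The following is only a sketch of one standard argument, not necessarily the proof given in lecture. It assumes two well-known results that are not stated in this part of the chapter: $\mathrm{Var}(\hat\beta_1) = \sigma^2 / S_{xx}$ and $E[\hat\beta_1] = \beta_1$. Here $S_{yy} = \sum_{i=1}^{n} (Y_i - \bar{Y})^2$.

```latex
\begin{align*}
SSE &= S_{yy} - \hat\beta_1 S_{xy} = S_{yy} - \hat\beta_1^{2} S_{xx}
      && \text{since } S_{xy} = \hat\beta_1 S_{xx}, \\
E[S_{yy}] &= \beta_1^{2} S_{xx} + (n-1)\sigma^{2}
      && \text{since } Y_i - \bar{Y} = \beta_1 (x_i - \bar{x}) + (\varepsilon_i - \bar\varepsilon), \\
E[\hat\beta_1^{2}] &= \mathrm{Var}(\hat\beta_1) + \beta_1^{2}
      = \frac{\sigma^{2}}{S_{xx}} + \beta_1^{2}, \\
E[SSE] &= \beta_1^{2} S_{xx} + (n-1)\sigma^{2} - \sigma^{2} - \beta_1^{2} S_{xx}
      = (n-2)\sigma^{2}, \\
E[MSE] &= \frac{E[SSE]}{n-2} = \sigma^{2}.
\end{align*}
```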
$SST = \sum_{i=1}^{n} (y_i - \bar{y})^2 = S_{yy}$ is the total sum of squares. SST measures the total variation of the $y_i$, that is, their variation about the average value $\bar{y}$.
$SSR = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 = \hat\beta_1 S_{xy}$ is the regression sum of squares. SSR measures the variation of the $y_i$ explained by the different values of $x$.
$SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$ is the error sum of squares. SSE measures the variation of the $y_i$ due to random error.

Thus, we have the fundamental identity

$$\sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 + \sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \tag{2}$$

that is, $SST = SSR + SSE$.
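The decomposition can be checked numerically. The sketch below computes the three sums of squares directly from their definitions for the Example 1 data; it is written in Python for illustration only, and the variable names are hypothetical.

```python
# Data from Example 1: training time x and completion time y, both in hours.
x = [22, 18, 30, 16, 25, 20, 10, 14]
y = [18.4, 19.2, 14.5, 19.0, 16.6, 17.7, 24.4, 21.0]
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least squares fit via S_xy / S_xx.
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
beta1_hat = s_xy / s_xx
beta0_hat = y_bar - beta1_hat * x_bar
y_hat = [beta0_hat + beta1_hat * xi for xi in x]

# Sums of squares computed directly from their definitions.
sst = sum((yi - y_bar) ** 2 for yi in y)
ssr = sum((fi - y_bar) ** 2 for fi in y_hat)
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))

print(f"SST = {sst:.4f}, SSR = {ssr:.4f}, SSE = {sse:.4f}")
print(f"SSR + SSE = {ssr + sse:.4f}")
```

Up to floating-point rounding, `ssr + sse` reproduces `sst`, and `ssr` agrees with `beta1_hat * s_xy`. Note that the identity holds only for the least squares line, not for an arbitrary line.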

A more convenient computing formula for SSE is

$$SSE = SST - \hat\beta_1 S_{xy}.$$

The estimated standard deviation of the errors (the residual standard error) is

$$\hat\sigma = \sqrt{\frac{SSE}{n - 2}}.$$

$\hat\sigma$ measures the typical deviation of the observed data $y_i$ from the fitted regression line.
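As a sketch of how the computing formula is used, the following applies $SSE = SST - \hat\beta_1 S_{xy}$ to the Example 1 data and then evaluates $SSE/(n-2)$ and its square root. Python is used for illustration only, and the variable names are hypothetical.

```python
import math

# Data from Example 1: training time x and completion time y, both in hours.
x = [22, 18, 30, 16, 25, 20, 10, 14]
y = [18.4, 19.2, 14.5, 19.0, 16.6, 17.7, 24.4, 21.0]
n = len(x)

# Computing formulas for S_xx, S_xy and S_yy (= SST).
s_xx = sum(xi ** 2 for xi in x) - sum(x) ** 2 / n
s_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n
s_yy = sum(yi ** 2 for yi in y) - sum(y) ** 2 / n

beta1_hat = s_xy / s_xx

# Shortcut: SSE = SST - beta1_hat * S_xy, with no need for fitted values.
sse = s_yy - beta1_hat * s_xy
mse = sse / (n - 2)          # estimate of sigma^2
sigma_hat = math.sqrt(mse)   # square root of SSE / (n - 2)

print(f"SSE = {sse:.4f}, MSE = {mse:.4f}, sqrt(MSE) = {sigma_hat:.4f}")
```

For these data the shortcut gives an SSE of about 5.34 and an MSE of about 0.89, so the observed completion times deviate from the fitted line by a little under one hour on a typical observation.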

EXERCISE 2
The following data give, for certain years between 1982 and 2002, the percentages of British women who were cigarette smokers.

Treat these data as coming from a linear regression model, with the input being the year and the response being the percentage. Take 1982 as the base year, so 1982 has input value $x = 0$, 1986 has input value $x = 4$, and so on.
(A) Estimate the value of $\sigma^2$.
(B) Predict the percentage of British women who smoked in 1997.

HYPOTHESIS TESTS IN SIMPLE LINEAR REGRESSION

Hypothesis tests on $\beta_1$ include the following cases:

(a) $H_0: \beta_1 = b_1$ versus $H_1: \beta_1 \neq b_1$
(b) $H_0: \beta_1 = b_1$ versus $H_1: \beta_1 < b_1$
(c) $H_0: \beta_1 = b_1$ versus $H_1: \beta_1 > b_1$

where $b_1$ and a significance level $\alpha$ are given.

An important hypothesis is $\beta_1 = 0$. Its importance lies in the
fact that it is equivalent to stating that the response does not
depend on the value of the input; in other words, there is no
regression on the input value.

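THEOREM
Let $Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$ be a simple linear regression model for a
dataset of $n$ independent observations, where $\varepsilon_i \sim N(0, \sigma^2)$.
If $\hat{\beta}_0$ and $\hat{\beta}_1$ are the least-squares estimates of
$\beta_0$ and $\beta_1$, respectively, then
1. $\hat{\beta}_0$ and $\hat{\beta}_1$ are normally distributed.
2. The expectations and variances of $\hat{\beta}_0$ and $\hat{\beta}_1$ are

$$E(\hat{\beta}_0) = \beta_0, \qquad Var(\hat{\beta}_0) = \left(\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}\right)\sigma^2, \tag{3}$$

$$E(\hat{\beta}_1) = \beta_1, \qquad Var(\hat{\beta}_1) = \frac{\sigma^2}{S_{xx}}. \tag{4}$$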
Proof:
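A sketch of the argument for the slope, assuming the usual closed form $\hat{\beta}_1 = S_{xy}/S_{xx}$ and errors with common variance $\sigma^2$; this is the standard derivation, offered as a reading aid and not necessarily the proof given in the lecture. The intercept case follows the same pattern.

```latex
\begin{align*}
\hat{\beta}_1 &= \frac{S_{xy}}{S_{xx}}
  = \frac{\sum_{i=1}^{n} (x_i - \bar{x})\,Y_i}{S_{xx}}
  && \text{since } \sum_{i=1}^{n} (x_i - \bar{x}) = 0, \\
E(\hat{\beta}_1) &= \frac{\sum_{i=1}^{n} (x_i - \bar{x})(\beta_0 + \beta_1 x_i)}{S_{xx}}
  = \beta_1\,\frac{\sum_{i=1}^{n} (x_i - \bar{x})\,x_i}{S_{xx}} = \beta_1, \\
\operatorname{Var}(\hat{\beta}_1) &= \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2 \operatorname{Var}(Y_i)}{S_{xx}^2}
  = \frac{\sigma^2 S_{xx}}{S_{xx}^2} = \frac{\sigma^2}{S_{xx}}.
\end{align*}
```

Because $\hat{\beta}_1$ is a linear combination of the independent normal variables $Y_i$, it is itself normally distributed, which gives part 1 for the slope.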

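A hypothesis test on $\beta_1$ proceeds as follows:

1. State the hypotheses $H_0$ and $H_1$.
2. State the significance level $\alpha$.
3. Compute the test statistic, which follows a $t$ distribution with $n-2$ degrees of freedom when $H_0$ is true:

$$T_{\beta_1} = \frac{\hat{\beta}_1 - b_1}{SE(\hat{\beta}_1)} \sim t(n-2), \qquad \text{where} \qquad SE(\hat{\beta}_1) = \sqrt{\frac{\hat{\sigma}^2}{S_{xx}}}.$$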

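4. Determine the rejection region or compute the p-value. Here $t_{\beta_1}$ is the observed value of $T_{\beta_1}$, $T_{n-2}$ is a random variable with a $t(n-2)$ distribution, and $t^{n-2}_{\alpha}$ is its upper-$\alpha$ critical value, i.e. $P(T_{n-2} > t^{n-2}_{\alpha}) = \alpha$.

| Alternative hypothesis | Rejection region | p-value |
|---|---|---|
| $H_1: \beta_1 \neq b_1$ | $\lvert t_{\beta_1} \rvert > t^{n-2}_{\alpha/2}$ | $p = 2P(T_{n-2} \geq \lvert t_{\beta_1} \rvert)$ |
| $H_1: \beta_1 < b_1$ | $t_{\beta_1} < -t^{n-2}_{\alpha}$ | $p = P(T_{n-2} \leq t_{\beta_1})$ |
| $H_1: \beta_1 > b_1$ | $t_{\beta_1} > t^{n-2}_{\alpha}$ | $p = P(T_{n-2} \geq t_{\beta_1})$ |

5. Conclude whether $H_0$ is rejected: reject it when the statistic falls in the rejection region or, equivalently, when $p < \alpha$.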
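The five steps can be sketched in Python for the two-sided test of $H_0: \beta_1 = 0$. This is an illustration only: it uses the Example 4 data from these slides, a 5 percent significance level chosen for the sketch, and a critical value 2.776 (the upper 0.025 point of $t$ with 4 degrees of freedom) copied from a standard $t$ table rather than computed.

```python
import math

# Data from Example 4 in these slides: salespeople (x) and cars sold (y).
x = [5, 7, 4, 2, 4, 8]
y = [22, 20, 15, 9, 17, 25]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sst = sum((yi - y_bar) ** 2 for yi in y)

beta1_hat = s_xy / s_xx
sse = sst - beta1_hat * s_xy
sigma2_hat = sse / (n - 2)

# Steps 1-2: H0: beta1 = b1 against H1: beta1 != b1, at alpha = 0.05 (assumed).
b1 = 0.0
T_CRIT = 2.776  # upper 0.025 critical value of t(4), taken from a t table

# Step 3: test statistic.
se_beta1 = math.sqrt(sigma2_hat / s_xx)
t_stat = (beta1_hat - b1) / se_beta1

# Steps 4-5: compare with the two-sided rejection region.
reject_h0 = abs(t_stat) > T_CRIT

print(f"beta1_hat = {beta1_hat:.4f}, SE = {se_beta1:.4f}, t = {t_stat:.4f}")
print("Reject H0" if reject_h0 else "Do not reject H0")
```

For a one-sided alternative only the comparison in the last step changes, following the table above.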

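A hypothesis test on $\beta_0$ proceeds as follows:

1. State the hypotheses $H_0$ and $H_1$.
2. State the significance level $\alpha$.
3. Compute the test statistic, which follows a $t$ distribution with $n-2$ degrees of freedom when $H_0$ is true:

$$T_{\beta_0} = \frac{\hat{\beta}_0 - b_0}{SE(\hat{\beta}_0)} \sim t(n-2), \qquad \text{where} \qquad SE(\hat{\beta}_0) = \sqrt{\hat{\sigma}^2\left(\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}\right)}.$$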

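4. Determine the rejection region or compute the p-value. Here $t_{\beta_0}$ is the observed value of $T_{\beta_0}$, $T_{n-2}$ is a random variable with a $t(n-2)$ distribution, and $t^{n-2}_{\alpha}$ is its upper-$\alpha$ critical value.

| Alternative hypothesis | Rejection region | p-value |
|---|---|---|
| $H_1: \beta_0 \neq b_0$ | $\lvert t_{\beta_0} \rvert > t^{n-2}_{\alpha/2}$ | $p = 2P(T_{n-2} \geq \lvert t_{\beta_0} \rvert)$ |
| $H_1: \beta_0 < b_0$ | $t_{\beta_0} < -t^{n-2}_{\alpha}$ | $p = P(T_{n-2} \leq t_{\beta_0})$ |
| $H_1: \beta_0 > b_0$ | $t_{\beta_0} > t^{n-2}_{\alpha}$ | $p = P(T_{n-2} \geq t_{\beta_0})$ |

5. Conclude whether $H_0$ is rejected: reject it when the statistic falls in the rejection region or, equivalently, when $p < \alpha$.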
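The intercept test follows the same pattern. Below is a minimal Python sketch, assuming the variance in equation (3) for the standard error, the Example 4 data from these slides, and a hypothetical helper name `intercept_t_statistic`; the critical value 2.776 is again read from a $t$ table for 4 degrees of freedom at the 5 percent level.

```python
import math


def intercept_t_statistic(x, y, b0=0.0):
    """Return (beta0_hat, SE(beta0_hat), t statistic) for H0: beta0 = b0."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)

    beta1_hat = s_xy / s_xx
    beta0_hat = y_bar - beta1_hat * x_bar
    sigma2_hat = (sst - beta1_hat * s_xy) / (n - 2)

    # Standard error based on Var(beta0_hat) = (1/n + x_bar^2 / S_xx) * sigma^2.
    se_beta0 = math.sqrt(sigma2_hat * (1 / n + x_bar ** 2 / s_xx))
    return beta0_hat, se_beta0, (beta0_hat - b0) / se_beta0


# Data from Example 4 in these slides.
x = [5, 7, 4, 2, 4, 8]
y = [22, 20, 15, 9, 17, 25]

beta0_hat, se_beta0, t_stat = intercept_t_statistic(x, y, b0=0.0)
T_CRIT = 2.776  # upper 0.025 critical value of t(4), taken from a t table
reject_h0 = abs(t_stat) > T_CRIT

print(f"beta0_hat = {beta0_hat:.4f}, SE = {se_beta0:.4f}, t = {t_stat:.4f}")
print("Reject H0" if reject_h0 else "Do not reject H0")
```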

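CONFIDENCE INTERVALS ON PARAMETERS

THEOREM
Under the assumption that the observations are normally and
independently distributed, a $100(1-\alpha)\%$ confidence interval on the
slope $\beta_1$ in simple linear regression is

$$\hat{\beta}_1 - t^{n-2}_{1-\alpha/2}\sqrt{\frac{\hat{\sigma}^2}{S_{xx}}} \;\leq\; \beta_1 \;\leq\; \hat{\beta}_1 + t^{n-2}_{1-\alpha/2}\sqrt{\frac{\hat{\sigma}^2}{S_{xx}}} \tag{5}$$

Similarly, a $100(1-\alpha)\%$ confidence interval on the intercept $\beta_0$ is

$$\hat{\beta}_0 - t^{n-2}_{1-\alpha/2}\sqrt{\left(\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}\right)\hat{\sigma}^2} \;\leq\; \beta_0 \;\leq\; \hat{\beta}_0 + t^{n-2}_{1-\alpha/2}\sqrt{\left(\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}\right)\hat{\sigma}^2} \tag{6}$$

Here $t^{n-2}_{1-\alpha/2}$ denotes the $1-\alpha/2$ quantile of the $t$ distribution with $n-2$ degrees of freedom, which is the same number as the upper-$\alpha/2$ critical value used in the two-sided tests.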
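Intervals (5) and (6) can be computed as follows. This is a minimal Python sketch for 95 percent intervals on the Example 4 data from these slides; the quantile 2.776 for $t$ with 4 degrees of freedom is an assumption taken from a $t$ table, and the variable names are illustrative.

```python
import math

# Data from Example 4 in these slides: salespeople (x) and cars sold (y).
x = [5, 7, 4, 2, 4, 8]
y = [22, 20, 15, 9, 17, 25]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sst = sum((yi - y_bar) ** 2 for yi in y)

beta1_hat = s_xy / s_xx
beta0_hat = y_bar - beta1_hat * x_bar
sigma2_hat = (sst - beta1_hat * s_xy) / (n - 2)

T_QUANTILE = 2.776  # 0.975 quantile of t(4), taken from a t table (alpha = 0.05)

# Half-widths of intervals (5) and (6).
half_width_slope = T_QUANTILE * math.sqrt(sigma2_hat / s_xx)
half_width_intercept = T_QUANTILE * math.sqrt((1 / n + x_bar ** 2 / s_xx) * sigma2_hat)

slope_ci = (beta1_hat - half_width_slope, beta1_hat + half_width_slope)
intercept_ci = (beta0_hat - half_width_intercept, beta0_hat + half_width_intercept)

print(f"95% CI for beta1: ({slope_ci[0]:.4f}, {slope_ci[1]:.4f})")
print(f"95% CI for beta0: ({intercept_ci[0]:.4f}, {intercept_ci[1]:.4f})")
```

An interval for $\beta_1$ that excludes 0 agrees with rejecting $H_0: \beta_1 = 0$ in the two-sided test at the same level.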

EXERCISE 3
The following table relates the number of sunspots that appeared
each year from 1970 to 1980 to the number of automobile accident
deaths during that year. The data for automobile accident deaths
are in units of 1000 deaths.

Test the hypothesis that the number of automobile accident deaths is not
related to the number of sunspots. Use the 5 percent level of significance.

COEFFICIENT OF DETERMINATION

DEFINITION
The coefficient of determination is the proportion of the total variation in
the response variable that is explained by the different values of the
independent variable. It is computed by

$$R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST} \tag{7}$$

Note that $0 \leq R^2 \leq 1$.
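Both forms of equation (7) can be evaluated directly. The sketch below, in Python with the Example 4 data from these slides, assumes the usual definitions $SSR = \sum (\hat{y}_i - \bar{y})^2$ and $SSE = \sum (y_i - \hat{y}_i)^2$; the names are illustrative.

```python
# Data from Example 4 in these slides: salespeople (x) and cars sold (y).
x = [5, 7, 4, 2, 4, 8]
y = [22, 20, 15, 9, 17, 25]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

beta1_hat = s_xy / s_xx
beta0_hat = y_bar - beta1_hat * x_bar
fitted = [beta0_hat + beta1_hat * xi for xi in x]

# Decomposition of the total variation: SST = SSR + SSE.
sst = sum((yi - y_bar) ** 2 for yi in y)
ssr = sum((fi - y_bar) ** 2 for fi in fitted)
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))

r2_from_ssr = ssr / sst
r2_from_sse = 1 - sse / sst

print(f"SST = {sst:.4f}, SSR = {ssr:.4f}, SSE = {sse:.4f}")
print(f"R^2 = {r2_from_ssr:.4f} (SSR/SST), {r2_from_sse:.4f} (1 - SSE/SST)")
```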

A value of $R^2$ near 1 indicates that most of the variation in the
response data is explained by the different values of the independent
variable. In other words, the linear regression model explains the
relationship between $Y$ and $x$ well.

A value of $R^2$ near 0 indicates that little of the variation in the response is explained
by the different values of $x$; that is, the linear association between $Y$ and $x$
is weak.

EXAMPLE 4
A new-car dealer is interested in the relationship between the
number of salespeople working on a weekend and the number of
cars sold. Data were gathered for six consecutive Sundays:

Number of salespeople    5   7   4   2   4   8
Number of cars sold     22  20  15   9  17  25

(A) Determine the estimated regression line.
(B) What is the coefficient of determination?
(C) How much of the variation in the number of automobiles sold
is explained by the number of salespeople?
(D) Test the null hypothesis that the mean number of sales does
not depend on the number of salespeople working.
SAMPLE CORRELATION COEFFICIENT

DEFINITION
Consider a sample of $n$ observations $(X_i, Y_i)$, $i = 1, \ldots, n$. The
sample correlation coefficient $r_{XY}$ is defined by

$$r_{XY} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2 \sum_{i=1}^{n} (Y_i - \bar{Y})^2}} = \frac{S_{XY}}{\sqrt{S_{XX}\, SST}} \tag{8}$$

Note that

$$\hat{\beta}_1 = r_{XY}\sqrt{\frac{SST}{S_{XX}}},$$

thus

$$r_{XY}^2 = \hat{\beta}_1^2\,\frac{S_{XX}}{SST} = \hat{\beta}_1\,\frac{S_{XY}}{SST} = \frac{SSR}{SST}.$$

• The coefficient of determination $R^2$ in a simple linear regression
model equals the square of the sample correlation coefficient:

$$R^2 = r_{XY}^2.$$
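The two identities can be verified numerically. This Python sketch uses the Example 4 data from these slides and computes $r_{XY}$ from equation (8); the variable names are illustrative.

```python
import math

# Data from Example 4 in these slides: salespeople (x) and cars sold (y).
x = [5, 7, 4, 2, 4, 8]
y = [22, 20, 15, 9, 17, 25]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sst = sum((yi - y_bar) ** 2 for yi in y)

# Sample correlation coefficient, equation (8).
r_xy = s_xy / math.sqrt(s_xx * sst)

# Slope written two ways: S_xy / S_xx and r_xy * sqrt(SST / S_xx).
beta1_hat = s_xy / s_xx
beta1_from_r = r_xy * math.sqrt(sst / s_xx)

# Coefficient of determination, using SSR = beta1_hat * S_xy.
r_squared = beta1_hat * s_xy / sst

print(f"r_XY = {r_xy:.4f}, r_XY^2 = {r_xy ** 2:.4f}, R^2 = {r_squared:.4f}")
```

The sign of $r_{XY}$ matches the sign of the fitted slope, information that $R^2$ alone does not carry.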

The range of $r_{XY}$ is $-1 \leq r_{XY} \leq 1$:
• $-1 \leq r_{XY} < 0$: negative correlation. The closer $r_{XY}$ is to $-1$,
the stronger the negative linear correlation between $X$ and $Y$.
• $0 < r_{XY} \leq 1$: positive correlation. The closer $r_{XY}$ is to 1, the
stronger the positive linear correlation between $X$ and $Y$.
• The closer $r_{XY}$ is to 0, the weaker the linear correlation between $X$ and
$Y$. $r_{XY} = 0$ indicates no linear correlation between $X$ and $Y$ in the sample.

ANALYSIS OF RESIDUALS: ASSESSING THE MODEL

• Analysis of residuals is used to assess the assumptions of simple
linear regression models.
• The assumptions of simple linear regression models:
Linearity: $Y = \beta_0 + \beta_1 x + \varepsilon$, where $\beta_0$
and $\beta_1$ are the regression coefficients, so that given $x$ we
have $E(Y|x) = \beta_0 + \beta_1 x$.
Constant variance: the variance $\sigma^2$ of $Y$ is the same for all
values of $x$, i.e. $Var(Y|x) = \sigma^2$.

Normal distribution: $Y|x \sim N(\beta_0 + \beta_1 x, \sigma^2)$.
Independence: the observations of $Y$ are independent.
⇒ To check normality, we use the normal probability plot (Q-Q
plot) of the residuals or the standardized residuals.
⇒ To check linearity, independence, and constant variance, we
use the scatter plot of the residuals or the standardized residuals.

THE Q-Q PLOT
A Q–Q plot is a plot of the quantiles of two distributions
against each other, or a plot based on estimates of the
quantiles. The pattern of points in the plot is used to compare
the two distributions.
The points plotted in a Q–Q plot are always non-decreasing
when viewed from left to right. If the two distributions being
compared are identical, the Q–Q plot follows the 45° line
$y = x$.
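The coordinates of a normal Q-Q plot of residuals can be computed without any plotting library. The Python sketch below uses the Example 4 data from these slides and pairs the sorted residuals with standard normal quantiles; the plotting positions $(i - 0.5)/n$ are an assumption, being one common convention among several.

```python
from statistics import NormalDist

# Data from Example 4 in these slides: salespeople (x) and cars sold (y).
x = [5, 7, 4, 2, 4, 8]
y = [22, 20, 15, 9, 17, 25]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
beta1_hat = s_xy / s_xx
beta0_hat = y_bar - beta1_hat * x_bar

residuals = [yi - (beta0_hat + beta1_hat * xi) for xi, yi in zip(x, y)]

# Sample quantiles: the ordered residuals.
sample_quantiles = sorted(residuals)

# Theoretical quantiles: standard normal quantiles at the assumed
# plotting positions (i - 0.5) / n, i = 1, ..., n.
standard_normal = NormalDist()
theoretical_quantiles = [standard_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]

for q_theo, q_samp in zip(theoretical_quantiles, sample_quantiles):
    print(f"theoretical {q_theo:8.4f}   sample {q_samp:8.4f}")
```

Since raw residuals are not on the standard normal scale, points lying close to some straight line (not necessarily $y = x$) are what support the normality assumption; with standardized residuals the reference line is $y = x$.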

STANDARDIZED RESIDUALS
The standardized residuals are defined as

$$E_i = \frac{Y_i - (\hat{\beta}_0 + \hat{\beta}_1 x_i)}{\sqrt{SSE/(n-2)}}, \qquad i = 1, 2, \ldots, n.$$

When the simple linear regression model is correct, the
standardized residuals are approximately independent standard
normal random variables. Thus,
• they should be randomly distributed about 0, with about 95
percent of their values between $-2$ and $+2$ (since
$P(-1.96 < Z < 1.96) = 0.95$);
• their scatter plot should not indicate any distinct pattern.
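As an illustration, the standardized residuals of the Example 4 data from these slides can be computed and screened against the $\pm 2$ band. This is a minimal Python sketch with illustrative names; with only six observations the 95 percent guideline is a rough check at best.

```python
import math

# Data from Example 4 in these slides: salespeople (x) and cars sold (y).
x = [5, 7, 4, 2, 4, 8]
y = [22, 20, 15, 9, 17, 25]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
beta1_hat = s_xy / s_xx
beta0_hat = y_bar - beta1_hat * x_bar

residuals = [yi - (beta0_hat + beta1_hat * xi) for xi, yi in zip(x, y)]
sse = sum(e ** 2 for e in residuals)

# Standardized residuals: residuals divided by sqrt(SSE / (n - 2)).
scale = math.sqrt(sse / (n - 2))
standardized = [e / scale for e in residuals]

# Indices of observations whose standardized residual lies outside [-2, 2].
outside_band = [i for i, e in enumerate(standardized) if abs(e) > 2]

for xi, e in zip(x, standardized):
    print(f"x = {xi}: standardized residual = {e:7.4f}")
print(f"Observations outside [-2, 2]: {outside_band}")
```

Plotting `standardized` against `x` (or against the fitted values) gives the scatter plot that should show no distinct pattern.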

ABUSES OF REGRESSION

Regression is widely used and frequently misused; we briefly mention
several common abuses of regression here.
• Regression relationships are valid only for values of the regressor
variable within the range of the original data. ⇒ Be careful
when extrapolating.
• It is hard to define what level of $R^2$ is appropriate to claim that the
model fits well. Essentially, it varies with the application and
the domain studied.
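One way to guard against accidental extrapolation is to make the prediction routine refuse inputs outside the observed range. The Python sketch below does this for the Example 4 data from these slides; the function name `predict_within_range` and the choice to raise an error (rather than warn) are assumptions made for illustration.

```python
# Data from Example 4 in these slides: salespeople (x) and cars sold (y).
x = [5, 7, 4, 2, 4, 8]
y = [22, 20, 15, 9, 17, 25]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
beta1_hat = s_xy / s_xx
beta0_hat = y_bar - beta1_hat * x_bar


def predict_within_range(x_new):
    """Predict the mean response at x_new, refusing to extrapolate."""
    x_min, x_max = min(x), max(x)
    if not x_min <= x_new <= x_max:
        raise ValueError(
            f"x = {x_new} lies outside the observed range [{x_min}, {x_max}]; "
            "the fitted line is not supported by data there."
        )
    return beta0_hat + beta1_hat * x_new


print(f"Prediction at x = 6: {predict_within_range(6):.4f}")

try:
    predict_within_range(20)
except ValueError as err:
    print(f"Refused: {err}")
```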

INTERPRETING R RESULTS
