Linear Regression
PHOK Ponna
2023–2024
Contents
1 Introduction
2 The Simple Linear Regression Model
3 Inferences About the Least-Squares Estimators
4 Predicting a Particular Value of Y
5 Correlation
6 Matrix Notation for Linear Regression
Introduction
In this chapter, we examine the relationship between one or more variables and create a model that can be used for predictive purposes. Our aim is to build a model and study inferential procedures when one dependent and several independent variables are present.
We denote by Y the random variable to be predicted, also called the dependent variable (or response variable), and by xᵢ the independent (or predictor) variables used to model (or predict) Y.
The process of finding a mathematical equation that best fits the noisy data is known as regression analysis.
There are different forms of regression: simple linear, nonlinear, multiple, and others.
The primary use of a regression model is prediction. When using a model to predict Y for a particular set of values of x₁, ..., xₖ, one may want to know how large the error of prediction might be.
In general, after collecting the sample data, regression analysis involves the following steps.
1 Hypothesize the form of the model,
Y = f(x₁, ..., xₖ; β₀, β₁, ..., βₖ) + ε,
where ε represents the random error term. We assume E(ε) = 0, while V(ε) = σ² is unknown. From this we obtain E(Y) = f(x₁, ..., xₖ; β₀, β₁, ..., βₖ).
2 Use the sample data to estimate the unknown parameters in the model.
3 Check the goodness of fit of the proposed model.
4 Use the model for prediction.
The Simple Linear Regression Model
Definition 1
A multiple linear regression model relating a random response Y to a set of predictor variables x₁, ..., xₖ is an equation of the form
Y = β₀ + β₁x₁ + ⋯ + βₖxₖ + ε.
With k = 1 this reduces to the simple linear regression model, Y = β₀ + β₁x + ε.
The Method of Least Squares
Definition 2
The sum of squares for errors (SSE), or sum of squares of the residuals, for the n data points (x₁, y₁), ..., (xₙ, yₙ) is
SSE = Σᵢ₌₁ⁿ eᵢ² = Σᵢ₌₁ⁿ [yᵢ − (β̂₀ + β̂₁xᵢ)]².
The least-squares estimates that minimize SSE are
β̂₁ = Sxy / Sxx,   β̂₀ = ȳ − β̂₁x̄,
where
Sxy = Σxᵢyᵢ − (Σxᵢ)(Σyᵢ)/n,   Sxx = Σxᵢ² − (Σxᵢ)²/n.
The Simple Linear Regression Model
Example 1
Use the method of least squares to fit a straight line to the accompanying data points. Give the estimates of β₀ and β₁. Plot the points and sketch the fitted least-squares line. The observed data values are given in the following table.
x:  −1  0  2  −2  5  6  8  11  12  −3
y:  −5  −4  2  −7  6  9  13  21  20  −9
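The fit can be carried out directly from the formulas in Definition 2. The following is a minimal Python sketch (numpy is assumed available; it is not part of the slides):

```python
# Least-squares fit for the Example 1 data, using Sxy/Sxx directly.
import numpy as np

x = np.array([-1, 0, 2, -2, 5, 6, 8, 11, 12, -3], dtype=float)
y = np.array([-5, -4, 2, -7, 6, 9, 13, 21, 20, -9], dtype=float)
n = len(x)

Sxy = np.sum(x * y) - x.sum() * y.sum() / n   # 534.2
Sxx = np.sum(x**2) - x.sum()**2 / n           # 263.6

beta1 = Sxy / Sxx                    # slope, approximately 2.0266
beta0 = y.mean() - beta1 * x.mean()  # intercept, approximately -3.101
print(beta0, beta1)
```

The fitted line is therefore ŷ ≈ −3.101 + 2.027x.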
Properties of the Least-Squares Estimators for the Model Y = β₀ + β₁x + ε
Theorem 1
Let Y = β₀ + β₁x + ε be a simple linear regression model with ε ∼ N(0, σ²), and let the errors εᵢ associated with different observations yᵢ (i = 1, ..., n) be independent. Then
(a) β̂₀ and β̂₁ have normal distributions.
(b) The means and variances are given by
E(β̂₀) = β₀,   V(β̂₀) = (1/n + x̄²/Sxx) σ²,
and
E(β̂₁) = β₁,   V(β̂₁) = σ²/Sxx,   so that σ_β̂₁ = σ/√Sxx.
Thus, 𝛽ˆ 0 and 𝛽ˆ 1 are unbiased estimators of 𝛽0 and 𝛽1 , respectively.
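Theorem 1 can be illustrated by simulation. The sketch below is not from the slides; true parameter values are assumed purely for illustration. It repeatedly generates data from Y = β₀ + β₁x + ε and averages the least-squares estimates, which settle near the true values, as unbiasedness predicts:

```python
# Monte Carlo illustration of Theorem 1: the least-squares estimators
# are unbiased. True values beta0 = -3, beta1 = 2 are assumed here.
import numpy as np

rng = np.random.default_rng(0)
x = np.array([-1, 0, 2, -2, 5, 6, 8, 11, 12, -3], dtype=float)
Sxx = ((x - x.mean())**2).sum()
beta0, beta1, sigma = -3.0, 2.0, 1.0   # hypothetical true parameters

estimates = []
for _ in range(5000):
    y = beta0 + beta1 * x + rng.normal(0.0, sigma, size=x.size)
    b1 = ((x - x.mean()) * (y - y.mean())).sum() / Sxx
    b0 = y.mean() - b1 * x.mean()
    estimates.append((b0, b1))

print(np.mean(estimates, axis=0))  # close to (-3, 2)
```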
Estimating σ² and σ
Theorem 2
Let a random sample of size n be given. Then
(a) the error sum of squares can be expressed as
SSE = Syy − β̂₁Sxy;
(b) an unbiased estimator of σ² is S² = MSE = SSE/(n − 2).
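The shortcut formula in part (a) can be checked against the direct residual computation; a small sketch on the Example 1 data (numpy assumed):

```python
# Verify SSE = Syy - beta1_hat * Sxy and compute MSE for Example 1.
import numpy as np

x = np.array([-1, 0, 2, -2, 5, 6, 8, 11, 12, -3], dtype=float)
y = np.array([-5, -4, 2, -7, 6, 9, 13, 21, 20, -9], dtype=float)
n = len(x)

Sxx = ((x - x.mean())**2).sum()
Syy = ((y - y.mean())**2).sum()                 # 1090.4
Sxy = ((x - x.mean()) * (y - y.mean())).sum()   # 534.2
b1 = Sxy / Sxx
b0 = y.mean() - b1 * x.mean()

SSE_direct = ((y - (b0 + b1 * x))**2).sum()
SSE_shortcut = Syy - b1 * Sxy                   # Theorem 2(a)
MSE = SSE_shortcut / (n - 2)                    # estimates sigma^2
print(SSE_direct, SSE_shortcut, MSE)            # SSE ≈ 7.81 both ways
```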
The Coefficient of Determination
Definition 3
The total sum of squares is
SST = Syy = Σ(yᵢ − ȳ)² = Σyᵢ² − (Σyᵢ)²/n,
and the coefficient of determination is
r² = 1 − SSE/SST,
the proportion of the total variation in the observed yᵢ that is explained by the fitted model.
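Continuing the running example, r² is one line on top of the quantities already computed (a sketch, numpy assumed):

```python
# Coefficient of determination for the Example 1 data.
import numpy as np

x = np.array([-1, 0, 2, -2, 5, 6, 8, 11, 12, -3], dtype=float)
y = np.array([-5, -4, 2, -7, 6, 9, 13, 21, 20, -9], dtype=float)

Sxx = ((x - x.mean())**2).sum()
Sxy = ((x - x.mean()) * (y - y.mean())).sum()
SST = ((y - y.mean())**2).sum()
SSE = SST - (Sxy / Sxx) * Sxy

r2 = 1 - SSE / SST
print(r2)   # about 0.993: the line explains ~99% of the variation
```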
Inferences about the least-squares estimators
Theorem 3
The assumptions of the simple linear regression model imply that the standardized variables
T₁ = (β̂₁ − β₁) / √(MSE/Sxx) ∼ t(n − 2)
and
T₀ = (β̂₀ − β₀) / [MSE(1/n + x̄²/Sxx)]^{1/2} ∼ t(n − 2).
Confidence Intervals For β₀ and β₁
From Theorem 3, 100(1 − α)% confidence intervals for β₁ and β₀ are
β̂₁ ± t_{α/2,n−2} √(MSE/Sxx)   and   β̂₀ ± t_{α/2,n−2} [MSE(1/n + x̄²/Sxx)]^{1/2}.
Example 2
Construct 95% confidence intervals for β₀ and β₁. The observed data values are given in the following table.
x:  −1  0  2  −2  5  6  8  11  12  −3
y:  −5  −4  2  −7  6  9  13  21  20  −9
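A sketch of these interval computations on the data above (numpy and scipy assumed available):

```python
# 95% confidence intervals for beta1 and beta0 (Example 2 data).
import numpy as np
from scipy import stats

x = np.array([-1, 0, 2, -2, 5, 6, 8, 11, 12, -3], dtype=float)
y = np.array([-5, -4, 2, -7, 6, 9, 13, 21, 20, -9], dtype=float)
n = len(x)

Sxx = ((x - x.mean())**2).sum()
b1 = ((x - x.mean()) * (y - y.mean())).sum() / Sxx
b0 = y.mean() - b1 * x.mean()
MSE = ((y - (b0 + b1 * x))**2).sum() / (n - 2)

t = stats.t.ppf(1 - 0.05 / 2, n - 2)                  # t_{0.025,8} ≈ 2.306
half1 = t * np.sqrt(MSE / Sxx)                        # half-width for beta1
half0 = t * np.sqrt(MSE * (1/n + x.mean()**2 / Sxx))  # half-width for beta0
print((b1 - half1, b1 + half1))
print((b0 - half0, b0 + half0))
```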
Hypothesis-Testing Procedures For β₁
Null hypothesis: H₀: β₁ = β₁₀ (β₁₀ is a specific value of β₁).
Test statistic value:
t = (β̂₁ − β₁₀) / √(MSE/Sxx).
Rejection regions:
Hₐ: β₁ > β₁₀:  t > t_{α,n−2} (upper-tailed test)
Hₐ: β₁ < β₁₀:  t < −t_{α,n−2} (lower-tailed test)
Hₐ: β₁ ≠ β₁₀:  |t| > t_{α/2,n−2} (two-tailed test)
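As an illustration, the common special case H₀: β₁ = 0 (no linear relationship) can be run on the Example 1 data; a sketch with scipy assumed for the critical value:

```python
# Two-sided test of H0: beta1 = 0 at alpha = 0.05.
import numpy as np
from scipy import stats

x = np.array([-1, 0, 2, -2, 5, 6, 8, 11, 12, -3], dtype=float)
y = np.array([-5, -4, 2, -7, 6, 9, 13, 21, 20, -9], dtype=float)
n = len(x)

Sxx = ((x - x.mean())**2).sum()
b1 = ((x - x.mean()) * (y - y.mean())).sum() / Sxx
b0 = y.mean() - b1 * x.mean()
MSE = ((y - (b0 + b1 * x))**2).sum() / (n - 2)

t_stat = (b1 - 0) / np.sqrt(MSE / Sxx)     # about 33.3
t_crit = stats.t.ppf(1 - 0.05 / 2, n - 2)  # about 2.306
print(abs(t_stat) > t_crit)                # True, so reject H0
```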
Hypothesis-Testing Procedures For β₀
Null hypothesis: H₀: β₀ = β₀₀ (β₀₀ is a specific value of β₀).
Test statistic value:
t = (β̂₀ − β₀₀) / [MSE(1/n + x̄²/Sxx)]^{1/2}.
Rejection regions:
Hₐ: β₀ > β₀₀:  t > t_{α,n−2}
Hₐ: β₀ < β₀₀:  t < −t_{α,n−2}
Hₐ: β₀ ≠ β₀₀:  |t| > t_{α/2,n−2}
Hypothesis-Testing Procedures For β₀ and β₁
Example 3
Regression and ANOVA
The splitting of the total sum of squares SST into a part SSE, which
measures unexplained variation, and a part SSR, which measures
variation explained by the linear relationship, is strongly reminiscent of
one-way ANOVA.
Notation:
SST = Σᵢ₌₁ⁿ (yᵢ − ȳ)²,   SSE = Σᵢ₌₁ⁿ (yᵢ − ŷᵢ)²,   SSR = Σᵢ₌₁ⁿ (ŷᵢ − ȳ)².
Theorem 4
SST = SSE + SSR.
Regression and ANOVA
For testing H₀: β₁ = 0, the statistic
F = MSR/MSE
has an F(1, n − 2) distribution when H₀ is true, and we reject H₀ if f = MSR/MSE ≥ F_{α,1,n−2}.
ANOVA table:
Source of variation   df      Sum of Squares   Mean Square          f
Regression            1       SSR              MSR = SSR/1          MSR/MSE
Error                 n − 2   SSE              MSE = SSE/(n − 2)
Total                 n − 1   SST
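The full decomposition and F test can be assembled in a few lines; a sketch on the running data (numpy and scipy assumed):

```python
# ANOVA F test of H0: beta1 = 0 on the Example 1 data.
import numpy as np
from scipy import stats

x = np.array([-1, 0, 2, -2, 5, 6, 8, 11, 12, -3], dtype=float)
y = np.array([-5, -4, 2, -7, 6, 9, 13, 21, 20, -9], dtype=float)
n = len(x)

b1 = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean())**2).sum()
b0 = y.mean() - b1 * x.mean()
yhat = b0 + b1 * x

SST = ((y - y.mean())**2).sum()
SSE = ((y - yhat)**2).sum()
SSR = ((yhat - y.mean())**2).sum()     # SST = SSE + SSR (Theorem 4)

f = (SSR / 1) / (SSE / (n - 2))        # about 1108 here
f_crit = stats.f.ppf(1 - 0.05, 1, n - 2)
print(f >= f_crit)                     # True, so reject H0
```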
Regression and ANOVA
Example 4
In a study of baseline characteristics of 20 patients with foot ulcers, we
want to see the relationship between the stage of ulcer (determined
using the Yarkony-Kirk scale, a higher number indicating a more severe
stage, with range 1 to 6), and duration of ulcer (in days). Suppose we have the data shown in the table below.
(a) Give an ANOVA table to test 𝐻0 : 𝛽 1 = 0 vs. 𝐻𝑎 : 𝛽 1 ≠ 0. What is
the conclusion of the test based on 𝛼 = 0.05?
(b) Write down the expression for the least-squares line.
Predicting a particular value of 𝑌
A 100(1 − α)% prediction interval for a new value of Y at x = x* is
ŷ ± t_{α/2,n−2} · S · √(1 + 1/n + (x* − x̄)²/Sxx),
where S² = SSE/(n − 2).
Predicting a particular value of 𝑌
Example 5
Using the data given in Example 3, obtain a 95% prediction interval at x = 5.
Hint: Ŷ = −3.1011 + 2.0266x; at x = 5, Ŷ = 7.0319; x̄ = 3.8, Sxx = 263.6, SSE = 7.79028, S = 0.9868, t₀.₀₂₅,₈ = 2.306.
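A sketch of the computation (numpy and scipy assumed); the quantities match the hint up to rounding:

```python
# 95% prediction interval at x = 5 for the running data.
import numpy as np
from scipy import stats

x = np.array([-1, 0, 2, -2, 5, 6, 8, 11, 12, -3], dtype=float)
y = np.array([-5, -4, 2, -7, 6, 9, 13, 21, 20, -9], dtype=float)
n = len(x)

Sxx = ((x - x.mean())**2).sum()
b1 = ((x - x.mean()) * (y - y.mean())).sum() / Sxx
b0 = y.mean() - b1 * x.mean()
S = np.sqrt(((y - (b0 + b1 * x))**2).sum() / (n - 2))

x_star = 5.0
y_hat = b0 + b1 * x_star                 # about 7.03
half = stats.t.ppf(0.975, n - 2) * S * np.sqrt(
    1 + 1/n + (x_star - x.mean())**2 / Sxx)
print(y_hat - half, y_hat + half)        # about (4.64, 9.43)
```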
The Population and Sample Correlation Coefficients
Definition 4
The population correlation coefficient of two random variables 𝑋 and 𝑌
is defined by
ρ = ρ_{X,Y} = Cov(X, Y) / (σ_X σ_Y),
where σ_X and σ_Y are the standard deviations of X and Y, respectively.
The sample correlation coefficient for the pairs (x₁, y₁), ..., (xₙ, yₙ) is
r = Sxy / √(Sxx Syy) = Σ(xᵢ − x̄)(yᵢ − ȳ) / √(Σ(xᵢ − x̄)² · Σ(yᵢ − ȳ)²).
The Sample Correlation Coefficient 𝑟
Properties of 𝑟
1 The value of r does not depend on which of the two variables is labeled x and which is labeled y.
2 The value of r does not depend on the units in which x and y are measured.
3 −1 ≤ r ≤ 1.
4 r = ±1 if and only if all (xᵢ, yᵢ) pairs lie on a straight line.
Assumption on 𝑋 and 𝑌
Assumption
We assume that the pair (X, Y) has a bivariate normal probability distribution, that is, its joint pdf is
f(x, y) = 1/(2π σ_X σ_Y √(1 − ρ²)) · exp{ −[(x − μ_X)²/σ_X² − 2ρ(x − μ_X)(y − μ_Y)/(σ_X σ_Y) + (y − μ_Y)²/σ_Y²] / (2(1 − ρ²)) },
for (x, y) ∈ ℝ².
Theorem 5
Assume that (𝑋 , 𝑌) has bivariate normal distribution. Then 𝑋 and 𝑌
are independent if and only if 𝜌 = 0.
Inference about ρ
For testing H₀: ρ = 0, the test statistic is
t = r√(n − 2) / √(1 − r²),
which has a t(n − 2) distribution when H₀ is true.
Rejection regions:
Hₐ: ρ > 0:  t ≥ t_{α,n−2}
Hₐ: ρ < 0:  t ≤ −t_{α,n−2}
Hₐ: ρ ≠ 0:  either t ≥ t_{α/2,n−2} or t ≤ −t_{α/2,n−2}
Other Inferences Concerning 𝜌
Theorem 6
When (X₁, Y₁), ..., (Xₙ, Yₙ), with n > 3, is a sample from a bivariate normal distribution, the rv
V = (1/2) ln((1 + R)/(1 − R))
has approximately a normal distribution with mean and variance
μ_V = (1/2) ln((1 + ρ)/(1 − ρ)),   σ_V² = 1/(n − 3).
Other Inferences Concerning 𝜌
𝐻 𝑎 : 𝜌 > 𝜌0 𝑧 ≥ 𝑧𝛼
𝐻 𝑎 : 𝜌 < 𝜌0 𝑧 ≤ −𝑧 𝛼
𝐻 𝑎 : 𝜌 ≠ 𝜌0 either 𝑧 ≥ 𝑧 𝛼/2 or 𝑧 ≤ −𝑧 𝛼/2
Statistics ITC 36 / 46
Inference about 𝜌
Example 6
For the data given in Example 3, would you say that the variables 𝑋
and 𝑌 are independent? Use 𝛼 = 0.05. Assume that (𝑋 , 𝑌) is bivariate
normally distributed.
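A sketch of the test in Example 6 on the running data (numpy and scipy assumed):

```python
# Test H0: rho = 0 (independence under bivariate normality), alpha = 0.05.
import numpy as np
from scipy import stats

x = np.array([-1, 0, 2, -2, 5, 6, 8, 11, 12, -3], dtype=float)
y = np.array([-5, -4, 2, -7, 6, 9, 13, 21, 20, -9], dtype=float)
n = len(x)

r = np.corrcoef(x, y)[0, 1]                      # about 0.996
t_stat = r * np.sqrt(n - 2) / np.sqrt(1 - r**2)  # about 33.3
t_crit = stats.t.ppf(0.975, n - 2)

print(abs(t_stat) >= t_crit)  # True: reject H0, X and Y are dependent
```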
Other Inferences Concerning ρ
A 100(1 − α)% confidence interval for μ_V is v ± z_{α/2}/√(n − 3). Transforming its endpoints back to the ρ scale yields the confidence interval
( (e^{2c₁} − 1)/(e^{2c₁} + 1),  (e^{2c₂} − 1)/(e^{2c₂} + 1) ),
where c₁ and c₂ are the left and right endpoints, respectively, of the interval for μ_V.
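A sketch of this interval for the running data (numpy and scipy assumed; note that (e^{2c} − 1)/(e^{2c} + 1) = tanh(c)):

```python
# 95% confidence interval for rho via Fisher's transformation.
import numpy as np
from scipy import stats

x = np.array([-1, 0, 2, -2, 5, 6, 8, 11, 12, -3], dtype=float)
y = np.array([-5, -4, 2, -7, 6, 9, 13, 21, 20, -9], dtype=float)
n = len(x)

r = np.corrcoef(x, y)[0, 1]
v = 0.5 * np.log((1 + r) / (1 - r))      # Fisher transform of r
z = stats.norm.ppf(0.975)
c1 = v - z / np.sqrt(n - 3)
c2 = v + z / np.sqrt(n - 3)

print(np.tanh(c1), np.tanh(c2))          # about (0.984, 0.999)
```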
Matrix Notation For Linear Regression
Let X be the n × (k + 1) matrix whose i-th row is (1, xᵢ₁, xᵢ₂, ..., xᵢₖ), and let
Y = (y₁, y₂, ..., yₙ)ᵀ,   𝜷 = (β₀, β₁, ..., βₖ)ᵀ,   𝜺 = (ε₁, ε₂, ..., εₙ)ᵀ.
Thus the n equations representing the linear model can be rewritten in matrix form as
Y = X𝜷 + 𝜺.
Matrix Notation For Linear Regression
Writing the model componentwise as
Yᵢ = β₀ + β₁xᵢ₁ + β₂xᵢ₂ + ... + βₖxᵢₖ + εᵢ,   i = 1, 2, ..., n,
the least-squares estimator solves the normal equations
(XᵀX)b = XᵀY,
so that
𝜷̂ = b = (XᵀX)⁻¹XᵀY,
and the fitted values are Ŷ = X𝜷̂.
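A sketch of the matrix computation on the running simple-regression data (k = 1; numpy assumed):

```python
# Least squares via the normal equations: beta_hat = (X^T X)^{-1} X^T Y.
import numpy as np

x = np.array([-1, 0, 2, -2, 5, 6, 8, 11, 12, -3], dtype=float)
y = np.array([-5, -4, 2, -7, 6, 9, 13, 21, 20, -9], dtype=float)

X = np.column_stack([np.ones_like(x), x])   # first column of 1s

# Solve (X^T X) b = X^T y rather than forming the inverse explicitly,
# which is the numerically preferable route.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)                              # about [-3.101, 2.027]

y_fitted = X @ beta_hat                      # Y_hat = X beta_hat
```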
Matrix Notation For Linear Regression
Example 7
The following data relate to the prices (𝑌) of five randomly chosen
houses in a certain neighborhood, the corresponding ages of the houses
(𝑥 1 ), and square footage (𝑥 2 ).