
CHAPTER VI

Linear Regression Models

PHOK Ponna

Institute of Technology of Cambodia


Department of Applied Mathematics and Statistics (AMS)

2023–2024

Contents

1 Introduction

2 Simple Linear Regression
      The Method of Least Squares
      Properties of the Least-Squares Estimators for the Model Y = β₀ + β₁x + ε
      Estimation of Error Variance σ²

3 Inference on the Least-Squares Estimators
      Regression and ANOVA

4 Predicting a Particular Value of Y

5 Correlation

6 Matrix Notation for Linear Regression
Introduction

Let x = length of work experience (in years).

Let y = monthly individual wage (in USD) of a person with x years of work experience.

Aim: You may want to know the relationship between income y and work experience x.

Suppose that you asked 100 individuals and obtained the data points (xᵢ, yᵢ). A plot of these points, shown below, is called a scatter plot.

Question:
    Overall, what can you say about the relationship between x and y?
    Can you sketch the relationship curve?
[Figure: scatter plot of the 100 data points (xᵢ, yᵢ).]
Introduction

In this chapter,
    We examine the relationship between one or more variables and create a model that can be used for predictive purposes. Our aim is to create a model and study inferential procedures when one dependent and several independent variables are present.
    We denote by Y the random variable to be predicted, also called the dependent variable (or response variable), and by xᵢ the independent (or predictor) variables used to model (or predict) Y.
    The process of finding a mathematical equation that best fits the noisy data is known as regression analysis.
    There are different forms of regression: simple linear, nonlinear, multiple, and others.
    The primary use of a regression model is prediction. When using a model to predict Y for a particular set of values of x₁, ..., x_k, one may want to know how large the error of prediction might be.
    Regression analysis, in general, involves the following steps after the sample data have been collected.
Introduction

Procedure for Regression Modeling

1. Hypothesize the form of the model as

       Y = f(x₁, ..., x_k; β₀, β₁, ..., β_k) + ε,

   where ε represents the random error term. We assume E(ε) = 0, while V(ε) = σ² is unknown. From this we obtain E(Y) = f(x₁, ..., x_k; β₀, β₁, ..., β_k).
2. Use the sample data to estimate the unknown parameters in the model.
3. Check the goodness of fit of the proposed model.
4. Use the model for prediction.

The function f(x₁, ..., x_k; β₀, β₁, ..., β_k) contains the independent or predictor variables x₁, ..., x_k (assumed to be nonrandom) and the unknown parameters or weights β₀, β₁, ..., β_k; ε represents the random error. We now introduce the simplest form of a regression model, called simple linear regression.
The Simple Linear Regression Model

Definition 1
A multiple linear regression model relating a random response Y to a set of predictor variables x₁, ..., x_k is an equation of the form

    Y = β₀ + β₁x₁ + ... + β_k x_k + ε,

where β₀, ..., β_k are unknown parameters and ε is a random deviation or error term that is normally distributed with mean 0 and variance σ², and the various ε's are independent of one another. Simple linear regression is the special case in which k = 1.

The n observed pairs (x₁, y₁), (x₂, y₂), ..., (x_n, y_n) are regarded as having been generated independently of one another from the model equation (first fix x = x₁ and observe Y₁ = β₀ + β₁x₁ + ε₁, then fix x = x₂ and observe Y₂ = β₀ + β₁x₂ + ε₂, and so on).
The Method of Least Squares

Definition 2
The sum of squares for errors (SSE), or sum of squares of the residuals, for all of the n data points (x₁, y₁), ..., (x_n, y_n) is

    SSE = Σᵢ₌₁ⁿ eᵢ² = Σᵢ₌₁ⁿ [yᵢ − (β̂₀ + β̂₁xᵢ)]²,

where eᵢ, the residual, is the deviation of yᵢ from its predicted value ŷᵢ = β̂₀ + β̂₁xᵢ.

The least-squares approach is to find the values β̂₀ and β̂₁ that minimize SSE:

    (β̂₀, β̂₁) ∈ argmin_{β₀,β₁} SSE.

The quantities β̂₀ and β̂₁ are called the least-squares estimates of the parameters β₀ and β₁, and the corresponding line ŷ = β̂₀ + β̂₁x is called the least-squares line.
Derivation of β̂₀ and β̂₁

If SSE attains a minimum, the partial derivatives of SSE with respect to β₀ and β₁ are zero. Hence β̂₀ and β̂₁ are solutions to the normal equations:

    Σyᵢ = nβ₀ + β₁ Σxᵢ,   Σxᵢyᵢ = β₀ Σxᵢ + β₁ Σxᵢ².

The least-squares estimates β̂₀ and β̂₁ are given by

    β̂₁ = S_xy / S_xx,   β̂₀ = ȳ − β̂₁ x̄,

where

    S_xy = Σxᵢyᵢ − (Σxᵢ)(Σyᵢ)/n,   S_xx = Σxᵢ² − (Σxᵢ)²/n.
The Simple Linear Regression Model

Example 1
Use the method of least squares to fit a straight line to the accompanying data points. Give the estimates of β₀ and β₁. Plot the points and sketch the fitted least-squares line. The observed data values are given in the following table.

x: −1  0  2  −2  5  6  8  11  12  −3
y: −5 −4  2  −7  6  9 13  21  20  −9

Hint: S_xx = 263.6, S_xy = 534.2, x̄ = 3.8, ȳ = 4.6.
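As a quick check of the hint values, here is a minimal Python sketch (variable names are illustrative; only the standard library is used) that computes S_xx, S_xy, and the least-squares estimates for the Example 1 data:

    # Least-squares fit for the Example 1 data.
    x = [-1, 0, 2, -2, 5, 6, 8, 11, 12, -3]
    y = [-5, -4, 2, -7, 6, 9, 13, 21, 20, -9]
    n = len(x)

    sum_x, sum_y = sum(x), sum(y)
    S_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum_x * sum_y / n   # 534.2
    S_xx = sum(xi**2 for xi in x) - sum_x**2 / n                      # 263.6

    beta1 = S_xy / S_xx                    # slope, about 2.02656
    beta0 = sum_y / n - beta1 * sum_x / n  # intercept, about -3.1009

    print(S_xx, S_xy, beta0, beta1)  # the slides round to 2.0266 and -3.1011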
Properties of the Least-Squares Estimators for the Model Y = β₀ + β₁x + ε

Theorem 1
Let Y = β₀ + β₁x + ε be a simple linear regression model with ε ∼ N(0, σ²), and let the errors εᵢ associated with different observations yᵢ (i = 1, ..., n) be independent. Then
(a) β̂₀ and β̂₁ have normal distributions.
(b) The means and variances are given by

    E(β̂₀) = β₀,   V(β̂₀) = (1/n + x̄²/S_xx) σ²,

and

    E(β̂₁) = β₁,   V(β̂₁) = σ²/S_xx,   so σ_{β̂₁} = σ/√S_xx.

Thus, β̂₀ and β̂₁ are unbiased estimators of β₀ and β₁, respectively.
Estimating σ² and σ

Theorem 2
For a random sample of size n:
(a) The error sum of squares can be expressed as

    SSE = S_yy − β̂₁ S_xy.

(b) E[SSE] = (n − 2)σ².

Thus, an unbiased estimator of the error variance σ² is σ̂² = SSE/(n − 2). We call this quantity the mean square error and write

    MSE = SSE/(n − 2).
The Coefficient of Determination

Definition 3
The total sum of squares is

    SST = S_yy = Σ(yᵢ − ȳ)² = Σyᵢ² − (Σyᵢ)²/n.

The coefficient of determination, denoted by R², is given by

    R² = 1 − SSE/SST.

It is interpreted as the proportion of observed y variation that can be explained by the simple linear regression model (attributed to an approximate linear relationship between y and x).
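Continuing the Example 1 sketch, SSE, MSE, and R² follow from the same summary quantities. A minimal illustration (it assumes the variables from the previous snippet are still in scope):

    # Error and fit summaries for the Example 1 data.
    S_yy = sum(yi**2 for yi in y) - sum_y**2 / n   # total sum of squares SST, 1090.4
    SSE = S_yy - beta1 * S_xy                      # Theorem 2(a)
    MSE = SSE / (n - 2)                            # unbiased estimate of sigma^2
    R2 = 1 - SSE / S_yy                            # about 0.9928

    # Note: full precision gives SSE ≈ 7.814; the slides' SSE = 7.79028
    # comes from plugging the rounded slope 2.0266 into Theorem 2(a).
    print(SSE, MSE, R2)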
Inferences about the Least-Squares Estimators

Theorem 3
The assumptions of the simple linear regression model imply that the standardized variable

    T₁ = (β̂₁ − β₁) / √(MSE/S_xx) ∼ t(n − 2),

where MSE = SSE/(n − 2). Similarly, the standardized variable

    T₀ = (β̂₀ − β₀) / [ MSE(1/n + x̄²/S_xx) ]^{1/2} ∼ t(n − 2).
Inferences about the Least-Squares Estimators

Confidence Intervals for β₀ and β₁

A 100(1 − α)% CI for β₁ is given by

    ( β̂₁ − t_{α/2,n−2} √(MSE/S_xx),  β̂₁ + t_{α/2,n−2} √(MSE/S_xx) ).

A 100(1 − α)% CI for β₀ is given by

    β̂₀ ± t_{α/2,n−2} [ MSE(1/n + x̄²/S_xx) ]^{1/2}.
Confidence Intervals for β₀ and β₁

Example 2
The observed data values are given in the following table.

x: −1  0  2  −2  5  6  8  11  12  −3
y: −5 −4  2  −7  6  9 13  21  20  −9

1. Construct a 95% confidence interval for β₀ and interpret.
2. Construct a 95% confidence interval for β₁ and interpret.

Hint: S_xx = 263.6, S_xy = 534.2, x̄ = 3.8, ȳ = 4.6, SSE = 7.79028, MSE = 0.973785, t_{0.025,8} = 2.306.
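A short sketch of the interval arithmetic, plugging in the hint values directly (the critical value t_{0.025,8} = 2.306 is taken from the hint rather than computed):

    import math

    # Hint values for Example 2.
    Sxx, xbar, n = 263.6, 3.8, 10
    MSE, t_crit = 0.973785, 2.306
    b0_hat, b1_hat = -3.1011, 2.0266

    # 95% CI for beta1: b1_hat +/- t * sqrt(MSE / Sxx)
    half1 = t_crit * math.sqrt(MSE / Sxx)
    print(b1_hat - half1, b1_hat + half1)   # about (1.886, 2.167)

    # 95% CI for beta0: b0_hat +/- t * sqrt(MSE * (1/n + xbar**2 / Sxx))
    half0 = t_crit * math.sqrt(MSE * (1 / n + xbar**2 / Sxx))
    print(b0_hat - half0, b0_hat + half0)   # about (-3.996, -2.206)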
Hypothesis-Testing Procedures for β₁

H₀: β₁ = β₁₀ (β₁₀ is a specific value of β₁)

Test statistic value:

    t = (β̂₁ − β₁₀) / √(MSE/S_xx)

One-sided test: Hₐ: β₁ > β₁₀ or Hₐ: β₁ < β₁₀; rejection region: t > t_{α,n−2} (upper-tailed) or t < −t_{α,n−2} (lower-tailed).
Two-sided test: Hₐ: β₁ ≠ β₁₀; rejection region: |t| > t_{α/2,n−2}.
Hypothesis-Testing Procedures for β₀

H₀: β₀ = β₀₀ (β₀₀ is a specific value of β₀)

Test statistic value:

    t = (β̂₀ − β₀₀) / [ MSE(1/n + x̄²/S_xx) ]^{1/2}

One-sided test: Hₐ: β₀ > β₀₀ or Hₐ: β₀ < β₀₀; rejection region: t > t_{α,n−2} (upper-tailed) or t < −t_{α,n−2} (lower-tailed).
Two-sided test: Hₐ: β₀ ≠ β₀₀; rejection region: |t| > t_{α/2,n−2}.
Hypothesis-Testing Procedures for β₀ and β₁

Example 3
The observed data values are given in the following table.

x: −1  0  2  −2  5  6  8  11  12  −3
y: −5 −4  2  −7  6  9 13  21  20  −9

1. Test the hypothesis H₀: β₁ = 2 versus Hₐ: β₁ ≠ 2 using the 0.05 level of significance.
2. Test the hypothesis H₀: β₀ = −3 versus Hₐ: β₀ ≠ −3 using the 0.05 level of significance.

Hint: S_xx = 263.6, S_xy = 534.2, x̄ = 3.8, ȳ = 4.6, SSE = 7.79028, MSE = 0.973785, t_{0.025,8} = 2.306, β̂₀ = −3.1011, β̂₁ = 2.0266.
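The two test values follow by plugging the hint values into the statistics above; a minimal sketch:

    import math

    Sxx, xbar, n = 263.6, 3.8, 10
    MSE, t_crit = 0.973785, 2.306
    b0_hat, b1_hat = -3.1011, 2.0266

    t1 = (b1_hat - 2) / math.sqrt(MSE / Sxx)                      # H0: beta1 = 2
    t0 = (b0_hat + 3) / math.sqrt(MSE * (1 / n + xbar**2 / Sxx))  # H0: beta0 = -3

    print(t1, abs(t1) > t_crit)   # about 0.438, False -> fail to reject H0
    print(t0, abs(t0) > t_crit)   # about -0.260, False -> fail to reject H0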
Regression and ANOVA

The splitting of the total sum of squares SST into a part SSE, which measures unexplained variation, and a part SSR, which measures variation explained by the linear relationship, is strongly reminiscent of one-way ANOVA.

Notations

    SST = Σᵢ₌₁ⁿ (yᵢ − ȳ)²,   SSE = Σᵢ₌₁ⁿ (yᵢ − ŷᵢ)²,   SSR = Σᵢ₌₁ⁿ (ŷᵢ − ȳ)²

Theorem 4
SST = SSE + SSR
Regression and ANOVA

To test H₀: β₁ = 0 vs. Hₐ: β₁ ≠ 0, we could use the statistic

    F = MSR/MSE ∼ F(1, n − 2)

and reject H₀ if f = MSR/MSE ≥ F_{α,1,n−2}.

ANOVA table

Source of variation   df      Sum of Squares   Mean Square         f
Regression            1       SSR              MSR = SSR/1         MSR/MSE
Error                 n − 2   SSE              MSE = SSE/(n − 2)
Total                 n − 1   SST
Regression and ANOVA

Example 4
In a study of baseline characteristics of 20 patients with foot ulcers, we want to see the relationship between the stage of ulcer (determined using the Yarkony-Kirk scale, a higher number indicating a more severe stage, with range 1 to 6) and the duration of the ulcer (in days). Suppose we have the data shown in the table below.
(a) Give an ANOVA table to test H₀: β₁ = 0 vs. Hₐ: β₁ ≠ 0. What is the conclusion of the test based on α = 0.05?
(b) Write down the expression for the least-squares line.

Stage of ulcer (x): 4  3  5  4  4  3  3  4  6  3
Duration (d):      18  6 20 15 16 15 10 18 26 15
Stage of ulcer (x): 3  4  3  2  3  2  2  3  5  6
Duration (d):       8 16 17  6  7  7  8 11 21 24

Hint: SSR = 570.04, SSE = 133.16, SST = 703.20, d̂ = 4.61x − 2.40.
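A sketch that reproduces the hinted ANOVA quantities from the raw data (the critical value F_{0.05,1,18} ≈ 4.41 is a standard table value, not computed here):

    x = [4, 3, 5, 4, 4, 3, 3, 4, 6, 3, 3, 4, 3, 2, 3, 2, 2, 3, 5, 6]
    d = [18, 6, 20, 15, 16, 15, 10, 18, 26, 15, 8, 16, 17, 6, 7, 7, 8, 11, 21, 24]
    n = len(x)

    Sxx = sum(v**2 for v in x) - sum(x)**2 / n
    Sxd = sum(a * b for a, b in zip(x, d)) - sum(x) * sum(d) / n
    Sdd = sum(v**2 for v in d) - sum(d)**2 / n

    b1 = Sxd / Sxx                       # about 4.61
    b0 = sum(d) / n - b1 * sum(x) / n    # about -2.40

    SST = Sdd                  # 703.20
    SSE = Sdd - b1 * Sxd       # about 133.16
    SSR = SST - SSE            # about 570.04
    f = (SSR / 1) / (SSE / (n - 2))   # about 77.1

    print(f, f > 4.41)   # True -> reject H0: beta1 = 0 at alpha = 0.05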
Predicting a Particular Value of Y

Let Ŷ₀ denote a predictor of a particular value of Y, say Y₀, and let the corresponding value of x be x₀. We choose Ŷ₀ to be the estimate of E(Y | x₀), namely Ŷ₀ = β̂₀ + β̂₁x₀.

Prediction interval for a particular value of Y
A 100(1 − α)% prediction interval for Y₀ is

    Ŷ₀ ± t_{α/2,n−2} · S · √(1 + 1/n + (x₀ − x̄)²/S_xx),

where S² = SSE/(n − 2).
Predicting a Particular Value of Y

Example 5
Using the data given in Example 3, obtain a 95% prediction interval at x = 5.

Hint: Ŷ = −3.1011 + 2.0266x; at x = 5, Ŷ = 7.0319; x̄ = 3.8, S_xx = 263.6, SSE = 7.79028, S = 0.9868, t_{0.025,8} = 2.306.
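A sketch of the prediction-interval arithmetic at x₀ = 5, using the hint values:

    import math

    b0_hat, b1_hat = -3.1011, 2.0266
    xbar, Sxx, n = 3.8, 263.6, 10
    S, t_crit = 0.9868, 2.306

    x0 = 5
    y_hat = b0_hat + b1_hat * x0   # about 7.0319
    half = t_crit * S * math.sqrt(1 + 1 / n + (x0 - xbar)**2 / Sxx)
    print(y_hat - half, y_hat + half)   # about (4.639, 9.424)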
The Population and Sample Correlation Coefficients

Definition 4
The population correlation coefficient of two random variables X and Y is defined by

    ρ = ρ_{X,Y} = Cov(X, Y) / (σ_X σ_Y),

where σ_X and σ_Y are the standard deviations of X and Y, respectively.

In general, we do not know the value of ρ. Usually, given sample data, ρ is estimated by the sample correlation ρ̂ = r defined below.

Definition 5
The sample correlation coefficient for the n pairs (x₁, y₁), ..., (x_n, y_n) is

    r = S_xy / √( Σ(xᵢ − x̄)² · Σ(yᵢ − ȳ)² ) = S_xy / √(S_xx S_yy).
The Sample Correlation Coefficient r

Properties of r
1. The value of r does not depend on which of the two variables is labeled x and which is labeled y.
2. The value of r is independent of the units in which x and y are measured.
3. −1 ≤ r ≤ 1.
4. r = 1 if and only if (iff) all (xᵢ, yᵢ) pairs lie on a straight line with positive slope, and r = −1 iff all (xᵢ, yᵢ) pairs lie on a straight line with negative slope.
5. The square of the sample correlation coefficient gives the value of the coefficient of determination that would result from fitting the simple linear regression model; in symbols, r² = R².
Assumption on X and Y

Assumption
We assume that the pair (X, Y) has a bivariate normal probability distribution, that is, its joint pdf is

    f(x, y) = [1 / (2π σ_X σ_Y √(1 − ρ²))] · exp{ −[ (x − μ_X)²/σ_X² − 2ρ(x − μ_X)(y − μ_Y)/(σ_X σ_Y) + (y − μ_Y)²/σ_Y² ] / [2(1 − ρ²)] },   (x, y) ∈ ℝ².

Theorem 5
Assume that (X, Y) has a bivariate normal distribution. Then X and Y are independent if and only if ρ = 0.
Inference about ρ

Testing for the absence of correlation

Let R denote the random variable whose realization is r. When H₀: ρ = 0 is true, the test statistic

    T = R√(n − 2) / √(1 − R²) ∼ t(n − 2),

and the test value is

    t = r√(n − 2) / √(1 − r²).

Alternative Hypothesis   Rejection Region for Level α Test
Hₐ: ρ > 0                t ≥ t_{α,n−2}
Hₐ: ρ < 0                t ≤ −t_{α,n−2}
Hₐ: ρ ≠ 0                either t ≥ t_{α/2,n−2} or t ≤ −t_{α/2,n−2}
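As an illustration, plugging in the sample correlation of the Example 1 data (r = 0.99641 with n = 10; these values appear in the hint to Example 6 below) gives a very large test value. A minimal sketch:

    import math

    r, n = 0.99641, 10   # sample correlation and size (hint values, Example 6)
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r**2)
    print(t)   # about 33.3, far beyond t_{0.025,8} = 2.306 -> reject H0: rho = 0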
Other Inferences Concerning ρ

Theorem 6
When (X₁, Y₁), ..., (X_n, Y_n), with n > 3, is a sample from a bivariate normal distribution, the rv

    V = (1/2) ln[(1 + R)/(1 − R)]

has approximately a normal distribution with mean and variance

    μ_V = (1/2) ln[(1 + ρ)/(1 − ρ)],   σ_V² = 1/(n − 3).
Other Inferences Concerning ρ

Testing for the population correlation

The test statistic for testing H₀: ρ = ρ₀ is

    Z = [ V − (1/2) ln((1 + ρ₀)/(1 − ρ₀)) ] / (1/√(n − 3)).

Alternative Hypothesis   Rejection Region for Level α Test
Hₐ: ρ > ρ₀               z ≥ z_α
Hₐ: ρ < ρ₀               z ≤ −z_α
Hₐ: ρ ≠ ρ₀               either z ≥ z_{α/2} or z ≤ −z_{α/2}
Inference about ρ

Example 6
For the data given in Example 3, would you say that the variables X and Y are independent? Use α = 0.05. Assume that (X, Y) is bivariate normally distributed.

Hint: Σxᵢ = 38, Σyᵢ = 46, Σxᵢyᵢ = 709, Σxᵢ² = 408, Σyᵢ² = 1302, n = 10, r = 0.99641, z = 8.3618, z_{0.025} = 1.96.
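The z value in the hint follows from Theorem 6: under H₀: ρ = 0 the centering term μ_V vanishes, so z = v·√(n − 3). A short sketch:

    import math

    r, n = 0.99641, 10
    v = 0.5 * math.log((1 + r) / (1 - r))   # Fisher transform of r
    z = v * math.sqrt(n - 3)                # H0: rho = 0, so mu_V = 0
    print(z, abs(z) > 1.96)   # about 8.36, True -> reject H0; X and Y are dependent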
Other Inferences Concerning ρ

A confidence interval for the population correlation

To obtain a CI for ρ, we first derive an interval for μ_V = (1/2) ln[(1 + ρ)/(1 − ρ)]. Standardizing V, writing a probability statement, and manipulating the resulting inequalities yields

    ( v − z_{α/2}/√(n − 3),  v + z_{α/2}/√(n − 3) )

as a 100(1 − α)% interval for μ_V, where v = (1/2) ln[(1 + r)/(1 − r)]. This interval can then be manipulated to yield a CI for ρ.

A 100(1 − α)% confidence interval for ρ is

    ( (e^{2c₁} − 1)/(e^{2c₁} + 1),  (e^{2c₂} − 1)/(e^{2c₂} + 1) ),

where c₁ and c₂ are the left and right endpoints, respectively, of the interval for μ_V.
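Continuing the Example 6 numbers, a sketch of the CI construction; note that the back-transform (e^{2c} − 1)/(e^{2c} + 1) is simply tanh(c):

    import math

    r, n, z_crit = 0.99641, 10, 1.96
    v = 0.5 * math.log((1 + r) / (1 - r))
    c1 = v - z_crit / math.sqrt(n - 3)   # left endpoint for mu_V
    c2 = v + z_crit / math.sqrt(n - 3)   # right endpoint for mu_V

    # Back-transform each endpoint to the rho scale.
    print(math.tanh(c1), math.tanh(c2))   # about (0.984, 0.999), a 95% CI for rho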
Matrix Notation for Linear Regression

Recall that we used an additive model equation to relate a dependent variable y to independent variables x₁, ..., x_k. That is, we used the model

    Y = β₀ + β₁x₁ + β₂x₂ + ... + β_k x_k + ε,

where ε ∼ N(0, σ²), and the various ε's are independent of one another. Simple linear regression is the special case in which k = 1.

Suppose that we have n observations, each consisting of a y value and values of the k predictors (so each observation consists of k + 1 numbers). We then have

    yᵢ = β₀ + β₁x_{i1} + β₂x_{i2} + ... + β_k x_{ik} + εᵢ,   i = 1, 2, ..., n,

where x_{ij} is the jth independent variable for the ith observation, and the εᵢ's are independent.
Matrix Notation for Linear Regression

Define the following matrices:

    X = [ 1  x_{11}  x_{12}  ...  x_{1k}
          1  x_{21}  x_{22}  ...  x_{2k}
          ⋮    ⋮       ⋮            ⋮
          1  x_{n1}  x_{n2}  ...  x_{nk} ],    Y = [ y₁, y₂, ..., yₙ ]ᵀ
Matrix Notation for Linear Regression

    β = [ β₀, β₁, ..., β_k ]ᵀ,    ε = [ ε₁, ε₂, ..., εₙ ]ᵀ

Thus the n equations representing the linear model can be rewritten in matrix form as

    Y = Xβ + ε.
Matrix Notation for Linear Regression

In particular, for the n observations from the simple linear model of the form

    Y = β₀ + β₁x + ε,

we can write Y = Xβ + ε, where

    X = [ 1  x₁
          1  x₂
          ⋮  ⋮
          1  xₙ ],    Y = [ y₁, y₂, ..., yₙ ]ᵀ,    β = [ β₀, β₁ ]ᵀ,    ε = [ ε₁, ε₂, ..., εₙ ]ᵀ.
Matrix Notation for Linear Regression

We now estimate β₀, β₁, β₂, ..., β_k using the principle of least squares: find b₀, b₁, b₂, ..., b_k to minimize

    Σᵢ₌₁ⁿ [yᵢ − (b₀ + b₁x_{i1} + b₂x_{i2} + ... + b_k x_{ik})]² = (Y − Xb)ᵀ(Y − Xb) = ‖Y − Xb‖²,

where b is the column vector with entries b₀, b₁, ..., b_k, and ‖u‖ is the length of u.

If we equate to zero the partial derivative with respect to each of the coefficients, we obtain the normal equations:

    (XᵀX)b = XᵀY

Assuming the matrix XᵀX is invertible, we obtain

    β̂ = b = (XᵀX)⁻¹XᵀY
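In code, the normal equations are typically solved directly rather than by forming the inverse explicitly. A minimal NumPy sketch (the helper name fit_ols is illustrative, not part of the slides):

    import numpy as np

    def fit_ols(X, y):
        """Least-squares estimate: solve the normal equations (X^T X) b = X^T y."""
        # Solving the linear system is numerically preferable to
        # computing (X^T X)^{-1} and multiplying it out.
        return np.linalg.solve(X.T @ X, X.T @ y)

Solving the system directly avoids the extra cost and rounding error of an explicit matrix inverse; the algebraic result is the same whenever XᵀX is invertible.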

We now summarize the procedure for obtaining a multiple linear regression equation.
Matrix Notation for Linear Regression

PROCEDURE TO OBTAIN A MULTIPLE LINEAR REGRESSION EQUATION
1. Rewrite the n observations

       Yᵢ = β₀ + β₁x_{1i} + β₂x_{2i} + ... + β_k x_{ki} + εᵢ,   i = 1, 2, ..., n,

   in matrix notation as Y = Xβ + ε.
2. Compute (XᵀX)⁻¹ and obtain the estimator of β as

       β̂ = (XᵀX)⁻¹XᵀY

3. Then the regression equation is

       Ŷ = Xβ̂
Matrix Notation for Linear Regression

Example 7
The following data relate to the prices (Y) of five randomly chosen houses in a certain neighborhood, the corresponding ages of the houses (x₁), and their square footage (x₂).

Price y (thousands of dollars)   Age x₁ (years)   Square footage x₂ (thousands of square feet)
100                              1                1
80                               5                1
104                              5                2
94                               10               2
130                              20               3

Fit a multiple linear regression model Y = β₀ + β₁x₁ + β₂x₂ + ε to the foregoing data.

Ans: Ŷ = 66.12 − 0.3794x₁ + 21.4365x₂.
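As a check, the fit_ols sketch above reproduces the stated coefficients on the Example 7 data (the leading column of ones corresponds to the intercept β₀):

    import numpy as np

    # Design matrix: intercept column, age x1, square footage x2.
    X = np.array([[1.0,  1.0, 1.0],
                  [1.0,  5.0, 1.0],
                  [1.0,  5.0, 2.0],
                  [1.0, 10.0, 2.0],
                  [1.0, 20.0, 3.0]])
    y = np.array([100.0, 80.0, 104.0, 94.0, 130.0])

    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    print(beta_hat)   # approximately [66.12, -0.3794, 21.4365]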

