
CHAPTER VI

Linear Regression Models

PHOK Ponna

Institute of Technology of Cambodia


Department of Applied Mathematics and Statistics (AMS)

2023–2024

Contents

1 Introduction

2 Simple Linear Regression
    The Method of Least Squares
    Properties of the Least-Squares Estimators for the Model Y = β₀ + β₁x + ε
    Estimation of Error Variance σ²

3 Inference on the Least-Squares Estimators
    Regression and ANOVA

4 Predicting a Particular Value of Y

5 Correlation

6 Matrix Notation for Linear Regression
Introduction

Let x = length of work experience (in years).
Let y = monthly wage (in USD) of an individual with x years of work experience.
Aim: We want to know the relationship between income y and work experience x.
Suppose that you asked 100 individuals and obtained the data points (xᵢ, yᵢ). A plot of these points, shown in the next slide, is called a scatter plot.
Question:
Overall, what can you say about the relationship between x and y?
Can you sketch the relationship curve?

[Figure: scatter plots of the data points (xᵢ, yᵢ), monthly wage against years of work experience.]
Introduction

In this chapter,
We examine the relationship between one or more variables and create a model that can be used for predictive purposes. Our aim is to build a model and study inferential procedures when one dependent and several independent variables are present.
We denote by Y the random variable to be predicted, also called the dependent (or response) variable, and by x₁, …, x_k the independent (or predictor) variables used to model (or predict) Y.
The process of finding a mathematical equation that best fits the noisy data is known as regression analysis.
There are different forms of regression: simple linear, nonlinear, multiple, and others.
The primary use of a regression model is prediction. When using a model to predict Y for a particular set of values of x₁, …, x_k, one may want to know how large the error of prediction might be.
In general, after collecting the sample data, regression analysis involves the following steps.
Introduction

Procedure for Regression Modeling

1 Hypothesize the form of the model as

Y = f(x₁, …, x_k; β₀, β₁, …, β_k) + ε,

where ε represents the random error term. We assume E(ε) = 0, while V(ε) = σ² is unknown. From this we obtain E(Y) = f(x₁, …, x_k; β₀, β₁, …, β_k).
2 Use the sample data to estimate the unknown parameters in the model.
3 Check the goodness of fit of the proposed model.
4 Use the model for prediction.

The function f(x₁, …, x_k; β₀, β₁, …, β_k) contains the independent or predictor variables x₁, …, x_k (assumed to be nonrandom) and the unknown parameters (or weights) β₀, β₁, …, β_k; ε represents the random error variable. We now proceed to introduce the simplest form of a regression model, called simple linear regression.
The Simple Linear Regression Model

Definition 1
A multiple linear regression model relating a random response Y to a set of predictor variables x₁, …, x_k is an equation of the form

Y = β₀ + β₁x₁ + … + β_k x_k + ε,

where β₀, …, β_k are unknown parameters and ε is a random deviation or error term that is normally distributed with mean 0 and variance σ²; the various ε's are independent of one another. Simple linear regression is the special case in which k = 1.
The n observed pairs (x₁, y₁), (x₂, y₂), …, (xₙ, yₙ) are regarded as having been generated independently of each other from the model equation (first fix x = x₁ and observe Y₁ = β₀ + β₁x₁ + ε₁, then fix x = x₂ and observe Y₂ = β₀ + β₁x₂ + ε₂, and so on).
The Method of Least Squares

Definition 2
The sum of squares for errors (SSE), or sum of squares of the residuals, for the n data points (x₁, y₁), …, (xₙ, yₙ) is

SSE = ∑ᵢ₌₁ⁿ eᵢ² = ∑ᵢ₌₁ⁿ [yᵢ − (β̂₀ + β̂₁xᵢ)]²,

where the residual eᵢ is the deviation of yᵢ from its predicted value ŷᵢ = β̂₀ + β̂₁xᵢ.
The least-squares approach is to find β̂₀ and β̂₁ that minimize SSE:

(β̂₀, β̂₁) ∈ argmin_{β₀,β₁} SSE.

The quantities β̂₀ and β̂₁ are called the least-squares estimates of the parameters β₀ and β₁, and the corresponding line ŷ = β̂₀ + β̂₁x is called the least-squares line.
Derivation of 𝛽ˆ 0 and 𝛽ˆ 1

If SSE attains a minimum, the partial derivatives of SSE with respect to β₀ and β₁ are zero. Hence β̂₀ and β̂₁ are solutions of the normal equations:

∑ᵢ₌₁ⁿ yᵢ = nβ₀ + β₁ ∑ᵢ₌₁ⁿ xᵢ,   ∑ᵢ₌₁ⁿ xᵢyᵢ = β₀ ∑ᵢ₌₁ⁿ xᵢ + β₁ ∑ᵢ₌₁ⁿ xᵢ².

The least-squares estimates β̂₀ and β̂₁ are given by

β̂₁ = S_xy / S_xx,   β̂₀ = ȳ − β̂₁x̄,

where

S_xy = ∑ xᵢyᵢ − (∑ xᵢ)(∑ yᵢ)/n,   S_xx = ∑ xᵢ² − (∑ xᵢ)²/n.
The Simple Linear Regression Model

Example 1
Use the method of least squares to fit a straight line to the
accompanying data points. Give the estimates of 𝛽0 and 𝛽 1 . Plot the
points and sketch the fitted least-squares line. The observed data
values are given in the following table.
𝑥 −1 0 2 −2 5 6 8 11 12 −3
𝑦 −5 −4 2 −7 6 9 13 21 20 −9

Hint: S_xx = 263.6, S_xy = 534.2, x̄ = 3.8, ȳ = 4.6.
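As a quick numerical check, here is a minimal Python sketch (assuming NumPy is available; variable names are illustrative) that reproduces the hinted quantities and the fitted line:

import numpy as np

x = np.array([-1, 0, 2, -2, 5, 6, 8, 11, 12, -3], dtype=float)
y = np.array([-5, -4, 2, -7, 6, 9, 13, 21, 20, -9], dtype=float)

n = len(x)
Sxy = np.sum(x * y) - x.sum() * y.sum() / n   # 534.2
Sxx = np.sum(x ** 2) - x.sum() ** 2 / n       # 263.6
beta1 = Sxy / Sxx                             # slope:     about  2.0266
beta0 = y.mean() - beta1 * x.mean()           # intercept: about -3.1011
print(f"fitted line: y-hat = {beta0:.4f} + {beta1:.4f} x")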

Properties of the Least-Squares Estimators for the Model 𝑌 = 𝛽 0 + 𝛽1 𝑥 + 𝜀

Theorem 1
Let 𝑌 = 𝛽 0 + 𝛽1 𝑥 + 𝜀 be a simple linear regression model with
𝜀 ∼ 𝑁(0, 𝜎2 ), and let the errors 𝜀𝑖 associated with different
observations 𝑦 𝑖 (𝑖 = 1, . . . , 𝑛) be independent. Then
(a) 𝛽ˆ 0 and 𝛽ˆ 1 have normal distributions.
(b) The means and variances are given by

E(β̂₀) = β₀,   V(β̂₀) = (1/n + x̄²/S_xx) σ²,

and

E(β̂₁) = β₁,   V(β̂₁) = σ²/S_xx,   so that σ_β̂₁ = σ/√S_xx.

Thus, β̂₀ and β̂₁ are unbiased estimators of β₀ and β₁, respectively.
Estimating 𝜎2 and 𝜎

Theorem 2
For a random sample of size n:
(a) The error sum of squares can be expressed as

SSE = S_yy − β̂₁ S_xy.

(b) E[SSE] = (n − 2)σ².

Thus, an unbiased estimator of the error variance σ² is σ̂² = SSE/(n − 2).
We call this quantity the mean square error:

MSE = SSE/(n − 2).
The Coefficient of Determination

Definition 3
The total sum of squares is

SST = S_yy = ∑(yᵢ − ȳ)² = ∑yᵢ² − (∑yᵢ)²/n.

The coefficient of determination, denoted by R², is given by

R² = 1 − SSE/SST.

It is interpreted as the proportion of observed y variation that can be explained by the simple linear regression model (attributed to an approximate linear relationship between y and x).
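For the Example 1 data, SSE, MSE, and R² follow directly from the shortcut formulas above; a self-contained sketch, again assuming NumPy:

import numpy as np

x = np.array([-1, 0, 2, -2, 5, 6, 8, 11, 12, -3], dtype=float)
y = np.array([-5, -4, 2, -7, 6, 9, 13, 21, 20, -9], dtype=float)

n = len(x)
Sxy = np.sum(x * y) - x.sum() * y.sum() / n   # 534.2
Sxx = np.sum(x ** 2) - x.sum() ** 2 / n       # 263.6
Syy = np.sum(y ** 2) - y.sum() ** 2 / n       # SST = 1090.4
beta1 = Sxy / Sxx
SSE = Syy - beta1 * Sxy   # about 7.81 (the slides' 7.79 comes from the rounded slope 2.0266)
MSE = SSE / (n - 2)       # unbiased estimate of sigma^2
R2 = 1 - SSE / Syy        # about 0.993: the line explains ~99% of the y variation
print(SSE, MSE, R2)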
Outline

1 Introduction

2 Simple Linear regression


The Method of Least Squares
Properties of the Least-Squares Estimators for the Model
𝑌 = 𝛽0 + 𝛽1 𝑥 + 𝜀
Estimation of Error Variance 𝜎2

3 Inference on the Least-squares Estimators


Regression and ANOVA

4 Predicting a particular value of 𝑌

5 Correlation

6 Matrix Notation For Linear Regression

Statistics ITC 17 / 46
Inferences about the least-squares estimators

Theorem 3
The assumptions of the simple linear regression model imply that the standardized variable

T₁ = (β̂₁ − β₁) / √(MSE/S_xx) ∼ t(n − 2),

where MSE = SSE/(n − 2). Similarly, the standardized variable

T₀ = (β̂₀ − β₀) / [MSE(1/n + x̄²/S_xx)]^{1/2} ∼ t(n − 2).
Inferences about the least-squares estimators

Confidence Intervals for β₀ and β₁

A 100(1 − α)% CI for β₁ is given by

( β̂₁ − t_{α/2,n−2} √(MSE/S_xx),   β̂₁ + t_{α/2,n−2} √(MSE/S_xx) ).

A 100(1 − α)% CI for β₀ is given by

β̂₀ ± t_{α/2,n−2} [MSE(1/n + x̄²/S_xx)]^{1/2}.
Confidence Intervals For 𝛽 0 and 𝛽1

Example 2
The observed data values are given in the following table.
𝑥 −1 0 2 −2 5 6 8 11 12 −3
𝑦 −5 −4 2 −7 6 9 13 21 20 −9

1 Construct a 95% confidence interval for 𝛽 0 and interpret.


2 Construct a 95% confidence interval for 𝛽 1 and interpret.

Hint: S_xx = 263.6, S_xy = 534.2, x̄ = 3.8, ȳ = 4.6, SSE = 7.79028, MSE = 0.973785, t₀.₀₂₅,₈ = 2.306.
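Both intervals can be computed from the hinted summary statistics; a sketch assuming SciPy for the t critical value:

from scipy import stats

n, Sxx, xbar = 10, 263.6, 3.8
b0, b1, MSE = -3.1011, 2.0266, 0.973785
tcrit = stats.t.ppf(0.975, n - 2)                # 2.306 for a 95% CI

se_b1 = (MSE / Sxx) ** 0.5                       # standard error of beta1-hat
se_b0 = (MSE * (1 / n + xbar ** 2 / Sxx)) ** 0.5 # standard error of beta0-hat
print("beta1:", (b1 - tcrit * se_b1, b1 + tcrit * se_b1))  # about (1.886, 2.167)
print("beta0:", (b0 - tcrit * se_b0, b0 + tcrit * se_b0))  # about (-3.996, -2.206)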

Hypothesis-Testing Procedures For 𝛽 1

H₀: β₁ = β₁₀ (β₁₀ is a specified value of β₁)

Test statistic value:

t = (β̂₁ − β₁₀) / √(MSE/S_xx)

Alternative Hypothesis        Rejection Region for Level α Test
Hₐ: β₁ > β₁₀                  t > t_{α,n−2} (upper-tailed)
Hₐ: β₁ < β₁₀                  t < −t_{α,n−2} (lower-tailed)
Hₐ: β₁ ≠ β₁₀                  |t| > t_{α/2,n−2} (two-tailed)
Hypothesis-Testing Procedures For 𝛽 0

H₀: β₀ = β₀₀ (β₀₀ is a specified value of β₀)

Test statistic value:

t = (β̂₀ − β₀₀) / [MSE(1/n + x̄²/S_xx)]^{1/2}

Alternative Hypothesis        Rejection Region for Level α Test
Hₐ: β₀ > β₀₀                  t > t_{α,n−2} (upper-tailed)
Hₐ: β₀ < β₀₀                  t < −t_{α,n−2} (lower-tailed)
Hₐ: β₀ ≠ β₀₀                  |t| > t_{α/2,n−2} (two-tailed)
Hypothesis-Testing Procedures For 𝛽 0 and 𝛽1

Example 3

The observed data values are given in the following table.


𝑥 −1 0 2 −2 5 6 8 11 12 −3
𝑦 −5 −4 2 −7 6 9 13 21 20 −9

1 Test the hypothesis H₀: β₁ = 2 versus Hₐ: β₁ ≠ 2 using the 0.05 level of significance.
2 Test the hypothesis H₀: β₀ = −3 versus Hₐ: β₀ ≠ −3 using the 0.05 level of significance.

Hint: S_xx = 263.6, S_xy = 534.2, x̄ = 3.8, ȳ = 4.6, SSE = 7.79028, MSE = 0.973785, t₀.₀₂₅,₈ = 2.306, β̂₀ = −3.1011, β̂₁ = 2.0266.
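A sketch of both tests, reusing the hinted summary statistics (SciPy assumed only for the critical value):

from scipy import stats

n, Sxx, xbar = 10, 263.6, 3.8
b0, b1, MSE = -3.1011, 2.0266, 0.973785
tcrit = stats.t.ppf(0.975, n - 2)                        # 2.306

t1 = (b1 - 2) / (MSE / Sxx) ** 0.5                       # about  0.44
t0 = (b0 - (-3)) / (MSE * (1/n + xbar**2 / Sxx)) ** 0.5  # about -0.26
# |t1| < 2.306 and |t0| < 2.306: fail to reject both null hypotheses
print(t1, t0, tcrit)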

Regression and ANOVA

The splitting of the total sum of squares SST into a part SSE, which
measures unexplained variation, and a part SSR, which measures
variation explained by the linear relationship, is strongly reminiscent of
one-way ANOVA.
Notation

SST = ∑ᵢ₌₁ⁿ (yᵢ − ȳ)²,   SSE = ∑ᵢ₌₁ⁿ (yᵢ − ŷᵢ)²,   SSR = ∑ᵢ₌₁ⁿ (ŷᵢ − ȳ)².

Theorem 4
SST = SSE + SSR
Regression and ANOVA

To test H₀: β₁ = 0 vs. Hₐ: β₁ ≠ 0, we use the statistic

F = MSR/MSE ∼ F(1, n − 2)

and reject H₀ if f = MSR/MSE ≥ F_{α,1,n−2}.

ANOVA table

Source of variation   df      Sum of Squares   Mean Square         f
Regression            1       SSR              MSR = SSR/1         MSR/MSE
Error                 n − 2   SSE              MSE = SSE/(n − 2)
Total                 n − 1   SST
Regression and ANOVA

Example 4
In a study of baseline characteristics of 20 patients with foot ulcers, we
want to see the relationship between the stage of ulcer (determined
using the Yarkony-Kirk scale, a higher number indicating a more severe
stage, with range 1 to 6), and duration of ulcer (in days). Suppose we
have the data shown in Table below.
(a) Give an ANOVA table to test 𝐻0 : 𝛽 1 = 0 vs. 𝐻𝑎 : 𝛽 1 ≠ 0. What is
the conclusion of the test based on 𝛼 = 0.05?
(b) Write down the expression for the least-squares line.

Stage of Ulcer (x)   4   3   5   4   4   3   3   4   6   3
Duration (d)        18   6  20  15  16  15  10  18  26  15
Stage of Ulcer (x)   3   4   3   2   3   2   2   3   5   6
Duration (d)         8  16  17   6   7   7   8  11  21  24

Hint: 𝑆𝑆𝑅 = 570.04, 𝑆𝑆𝐸 = 133.16, 𝑆𝑆𝑇 = 703.20, 𝑑 = 4.61𝑥 − 2.40.
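Part (a) can be checked with a short sketch, assuming NumPy and SciPy; it rebuilds the ANOVA quantities from the raw data:

import numpy as np
from scipy import stats

x = np.array([4,3,5,4,4,3,3,4,6,3, 3,4,3,2,3,2,2,3,5,6], dtype=float)
d = np.array([18,6,20,15,16,15,10,18,26,15, 8,16,17,6,7,7,8,11,21,24], dtype=float)

n = len(x)
Sxx = np.sum(x**2) - x.sum()**2 / n
Sxd = np.sum(x*d) - x.sum() * d.sum() / n
b1 = Sxd / Sxx                        # about  4.61
b0 = d.mean() - b1 * x.mean()         # about -2.40
dhat = b0 + b1 * x                    # fitted durations
SSR = np.sum((dhat - d.mean())**2)    # about 570.04
SSE = np.sum((d - dhat)**2)           # about 133.16
f = (SSR / 1) / (SSE / (n - 2))       # about 77.1
fcrit = stats.f.ppf(0.95, 1, n - 2)   # about 4.41
print(f, fcrit)                       # f >> fcrit: reject H0: beta1 = 0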


Predicting a particular value of 𝑌

Let Ŷ₀ denote a predictor of a particular value Y₀ of Y when x = x₀. We choose Ŷ₀ = Ê(Y | x₀) = β̂₀ + β̂₁x₀.

Prediction interval for a particular value of Y
A 100(1 − α)% prediction interval for Y at x is

Ŷ ± t_{α/2,n−2} · S · √(1 + 1/n + (x − x̄)²/S_xx),

where S² = SSE/(n − 2).
Predicting a particular value of 𝑌

Example 5
Using the data given in Example 3, obtain a 95% prediction interval at
𝑥 = 5.
Hint: Ŷ = −3.1011 + 2.0266x; at x = 5, Ŷ = 7.0319; x̄ = 3.8, S_xx = 263.6, SSE = 7.79028, S = 0.9868, t₀.₀₂₅,₈ = 2.306.
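A sketch of the computation, assuming SciPy and the hinted values:

from scipy import stats

n, Sxx, xbar, S = 10, 263.6, 3.8, 0.9868
x0 = 5.0
yhat = -3.1011 + 2.0266 * x0                   # 7.0319
half = stats.t.ppf(0.975, n - 2) * S * (1 + 1/n + (x0 - xbar)**2 / Sxx) ** 0.5
print(yhat - half, yhat + half)                # about (4.64, 9.42)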

The Population and Sample Correlation Coefficients

Definition 4
The population correlation coefficient of two random variables X and Y is defined by

ρ = ρ_{X,Y} = Cov(X, Y) / (σ_X σ_Y),

where σ_X and σ_Y are the standard deviations of X and Y, respectively.

In general, we do not know the value of ρ. Given sample data, ρ is usually estimated by the sample correlation coefficient ρ̂ = r defined below.

Definition 5
The sample correlation coefficient for the n pairs (x₁, y₁), …, (xₙ, yₙ) is

r = S_xy / √(∑(xᵢ − x̄)² ∑(yᵢ − ȳ)²) = S_xy / √(S_xx S_yy).
The Sample Correlation Coefficient 𝑟

Properties of r
1 The value of r does not depend on which of the two variables is labeled x and which is labeled y.
2 The value of r is independent of the units in which x and y are measured.
3 −1 ≤ r ≤ 1.
4 r = 1 if and only if (iff) all (xᵢ, yᵢ) pairs lie on a straight line with positive slope, and r = −1 iff all (xᵢ, yᵢ) pairs lie on a straight line with negative slope.
5 The square of the sample correlation coefficient gives the value of the coefficient of determination that would result from fitting the simple linear regression model; in symbols, r² = R².
Assumption on 𝑋 and 𝑌

Assumption
We assume that the pair (X, Y) has a bivariate normal probability distribution, that is, its joint pdf is

f(x, y) = [1 / (2π σ_X σ_Y √(1 − ρ²))] exp{ −[ (x − μ_X)²/σ_X² − 2ρ(x − μ_X)(y − μ_Y)/(σ_X σ_Y) + (y − μ_Y)²/σ_Y² ] / [2(1 − ρ²)] },   (x, y) ∈ ℝ².

Theorem 5
Assume that (X, Y) has a bivariate normal distribution. Then X and Y are independent if and only if ρ = 0.
Inference about 𝜌

Testing for the absence of correlation

Let R denote the random variable whose realization is r. When H₀: ρ = 0 is true, the test statistic

T = R√(n − 2) / √(1 − R²) ∼ t(n − 2),

and the test value is

t = r√(n − 2) / √(1 − r²).

Alternative Hypothesis        Rejection Region for Level α Test
Hₐ: ρ > 0                     t ≥ t_{α,n−2}
Hₐ: ρ < 0                     t ≤ −t_{α,n−2}
Hₐ: ρ ≠ 0                     t ≥ t_{α/2,n−2} or t ≤ −t_{α/2,n−2}
Other Inferences Concerning 𝜌

Theorem 6
When (X₁, Y₁), …, (Xₙ, Yₙ), with n > 3, is a sample from a bivariate normal distribution, the rv

V = (1/2) ln[(1 + R)/(1 − R)]

has approximately a normal distribution with mean and variance

μ_V = (1/2) ln[(1 + ρ)/(1 − ρ)],   σ_V² = 1/(n − 3).
Other Inferences Concerning 𝜌

Testing for the population correlation

The test statistic for testing H₀: ρ = ρ₀ is

Z = [ V − (1/2) ln((1 + ρ₀)/(1 − ρ₀)) ] / (1/√(n − 3)).

Alternative Hypothesis        Rejection Region for Level α Test
Hₐ: ρ > ρ₀                    z ≥ z_α
Hₐ: ρ < ρ₀                    z ≤ −z_α
Hₐ: ρ ≠ ρ₀                    z ≥ z_{α/2} or z ≤ −z_{α/2}
Inference about 𝜌

Example 6
For the data given in Example 3, would you say that the variables 𝑋
and 𝑌 are independent? Use 𝛼 = 0.05. Assume that (𝑋 , 𝑌) is bivariate
normally distributed.

Hint: ∑ᵢ₌₁ⁿ xᵢ = 38, ∑ᵢ₌₁ⁿ yᵢ = 46, ∑ᵢ₌₁ⁿ xᵢyᵢ = 709, ∑ᵢ₌₁ⁿ xᵢ² = 408, ∑ᵢ₌₁ⁿ yᵢ² = 1302, n = 10, r = 0.99641, z = 8.3618, z₀.₀₂₅ = 1.96.
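A sketch of the test, assuming NumPy; it recomputes r from the raw data and applies the Fisher transformation of Theorem 6 with ρ₀ = 0:

import numpy as np

x = np.array([-1, 0, 2, -2, 5, 6, 8, 11, 12, -3], dtype=float)
y = np.array([-5, -4, 2, -7, 6, 9, 13, 21, 20, -9], dtype=float)

n = len(x)
Sxy = np.sum(x*y) - x.sum()*y.sum()/n
Sxx = np.sum(x**2) - x.sum()**2/n
Syy = np.sum(y**2) - y.sum()**2/n
r = Sxy / np.sqrt(Sxx * Syy)           # about 0.99641
v = 0.5 * np.log((1 + r) / (1 - r))    # Fisher transform of r
z = v * np.sqrt(n - 3)                 # about 8.36 > 1.96: reject rho = 0
print(r, z)                            # conclude X and Y are not independent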

Other Inferences Concerning 𝜌

Confidence interval for the population correlation

To obtain a CI for ρ, we first derive an interval for μ_V = (1/2) ln[(1 + ρ)/(1 − ρ)]. Standardizing V, writing a probability statement, and manipulating the resulting inequalities yields

( v − z_{α/2}/√(n − 3),   v + z_{α/2}/√(n − 3) )

as a 100(1 − α)% interval for μ_V, where v = (1/2) ln[(1 + r)/(1 − r)]. This interval can then be manipulated to yield a CI for ρ.
A 100(1 − α)% confidence interval for ρ is

( (e^{2c₁} − 1)/(e^{2c₁} + 1),   (e^{2c₂} − 1)/(e^{2c₂} + 1) ),

where c₁ and c₂ are the left and right endpoints, respectively, of the interval for μ_V.
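A sketch of this interval for the Example 6 data, assuming NumPy; note that (e^{2c} − 1)/(e^{2c} + 1) is exactly tanh(c):

import numpy as np

n, r = 10, 0.99641
v = 0.5 * np.log((1 + r) / (1 - r))    # point estimate of mu_V
z = 1.96                               # z_{alpha/2} for a 95% CI
c1 = v - z / np.sqrt(n - 3)            # left endpoint for mu_V
c2 = v + z / np.sqrt(n - 3)            # right endpoint for mu_V
print(np.tanh(c1), np.tanh(c2))        # about (0.984, 0.999)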

Matrix Notation For Linear Regression

Recall that we used an additive model equation to relate a dependent variable y to independent variables x₁, …, x_k. That is, we used the model

Y = β₀ + β₁x₁ + β₂x₂ + … + β_k x_k + ε,

where ε ∼ N(0, σ²) and the various ε's are independent of one another. Simple linear regression is the special case in which k = 1.
Suppose that we have n observations, each consisting of a y value and values of the k predictors (so each observation consists of k + 1 numbers). We then have

y₁ = β₀ + β₁x₁₁ + β₂x₁₂ + … + β_k x₁ₖ + ε₁
⋮
yₙ = β₀ + β₁xₙ₁ + β₂xₙ₂ + … + β_k xₙₖ + εₙ,

where xᵢⱼ is the jth independent variable for the ith observation, i = 1, 2, …, n, and the εᵢ are independent.
Matrix Notation For Linear Regression

Define the following matrices:

    ⎡ 1  x₁₁  x₁₂  ⋯  x₁ₖ ⎤         ⎡ y₁ ⎤
X = ⎢ 1  x₂₁  x₂₂  ⋯  x₂ₖ ⎥ ,   Y = ⎢ y₂ ⎥
    ⎢ ⋮    ⋮    ⋮        ⋮  ⎥         ⎢ ⋮  ⎥
    ⎣ 1  xₙ₁  xₙ₂  ⋯  xₙₖ ⎦         ⎣ yₙ ⎦
Matrix Notation For Linear Regression


β = ( β₀, β₁, …, β_k )ᵀ ,   ε = ( ε₁, ε₂, …, εₙ )ᵀ .

Thus the n equations representing the linear model can be rewritten in matrix form as

Y = Xβ + ε.
Matrix Notation For Linear Regression

In particular, for the n observations from the simple linear model of the form

Y = β₀ + β₁x + ε

we can write

Y = Xβ + ε,

where

    ⎡ 1  x₁ ⎤
X = ⎢ 1  x₂ ⎥ ,   Y = ( y₁, y₂, …, yₙ )ᵀ ,   β = ( β₀, β₁ )ᵀ ,   ε = ( ε₁, ε₂, …, εₙ )ᵀ .
    ⎢ ⋮   ⋮  ⎥
    ⎣ 1  xₙ ⎦
Matrix Notation For Linear Regression

We now estimate β₀, β₁, β₂, …, β_k using the principle of least squares: find b₀, b₁, b₂, …, b_k to minimize

∑ᵢ₌₁ⁿ [ yᵢ − (b₀ + b₁xᵢ₁ + b₂xᵢ₂ + … + b_k xᵢₖ) ]² = (Y − Xb)ᵀ(Y − Xb) = ‖Y − Xb‖²,

where b is the column vector with entries b₀, b₁, …, b_k, and ‖u‖ is the length of u.
If we equate to zero the partial derivative with respect to each of the coefficients, we are led to the normal equations:

(XᵀX) b = XᵀY.

Assuming the matrix XᵀX is invertible, we obtain

β̂ = b = (XᵀX)⁻¹XᵀY.

We now summarize the procedure to obtain a multiple linear regression equation.
Matrix Notation For Linear Regression

PROCEDURE TO OBTAIN A MULTIPLE LINEAR REGRESSION EQUATION

1 Rewrite the n observations

Yᵢ = β₀ + β₁xᵢ₁ + β₂xᵢ₂ + … + β_k xᵢₖ + εᵢ,   i = 1, 2, …, n,

in matrix notation as Y = Xβ + ε.
2 Compute (XᵀX)⁻¹ and obtain the estimator of β as

β̂ = (XᵀX)⁻¹XᵀY.

3 Then the fitted regression equation is

Ŷ = Xβ̂.
Matrix Notation For Linear Regression

Example 7
The following data relate to the prices (𝑌) of five randomly chosen
houses in a certain neighborhood, the corresponding ages of the houses
(𝑥 1 ), and square footage (𝑥 2 ).

Price y (thousands of dollars)   Age x₁ (years)   Square footage x₂ (thousands of square feet)
100                               1                1
 80                               5                1
104                               5                2
 94                              10                2
130                              20                3

Fit a multiple linear regression model 𝑌 = 𝛽0 + 𝛽 1 𝑥1 + 𝛽 2 𝑥2 + 𝜀 to the


foregoing data.

Ans: Ŷ = 66.12 − 0.3794x₁ + 21.4365x₂.
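A sketch of the computation, assuming NumPy; the design matrix gets a leading column of ones for the intercept, and the normal equations are solved directly:

import numpy as np

x1 = np.array([1, 5, 5, 10, 20], dtype=float)       # age (years)
x2 = np.array([1, 1, 2, 2, 3], dtype=float)         # square footage (thousands)
y = np.array([100, 80, 104, 94, 130], dtype=float)  # price (thousands of dollars)

X = np.column_stack([np.ones_like(x1), x1, x2])     # design matrix with intercept column
beta = np.linalg.solve(X.T @ X, X.T @ y)            # normal equations (X'X)b = X'Y
print(beta)                                         # about [66.12, -0.3794, 21.4365]

For larger problems, np.linalg.lstsq(X, y, rcond=None) gives the same coefficients with better numerical stability than forming XᵀX explicitly.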

