Lecture 4

The document discusses linear regression analysis and hypothesis testing regarding the slope parameter (β1) in a normal error regression model. Specifically, it covers: 1) The null hypothesis that the slope (β1) is equal to 0, indicating no relationship between the predictor (X) and response (Y) variables. 2) How to construct statistical tests to evaluate this null hypothesis using the sampling distribution of the slope (b1) estimated from the data. 3) That the sampling distribution of b1 is normal, allowing hypothesis tests to be performed, and derives the mean and variance of this distribution.


Inference in Regression Analysis

Dr. Frank Wood


Today: Normal Error Regression Model
Yi = β0 + β1 Xi + εi

• Yi is the value of the response variable in the ith trial
• β0 and β1 are parameters
• Xi is a known constant, the value of the predictor variable in the ith trial
• εi ~ iid N(0, σ²)
• i = 1, …, n
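
As a quick numerical illustration (a sketch, not part of the original slides), the following Python snippet simulates data from this model; the particular values β0 = 9, β1 = 2, and σ = 2 are assumptions chosen to roughly match the simulated example used on later slides.

```python
import numpy as np

# Assumed parameter values (roughly matching the simulated example
# used later in these slides: true line y = 2x + 9, sigma about 2).
beta0, beta1, sigma = 9.0, 2.0, 2.0

rng = np.random.default_rng(0)
X = np.arange(1, 12, dtype=float)          # known constants X_1, ..., X_n
eps = rng.normal(0.0, sigma, size=X.size)  # iid N(0, sigma^2) errors
Y = beta0 + beta1 * X + eps                # Y_i = beta_0 + beta_1 X_i + eps_i

print(np.c_[X, Y])
```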


Inferences concerning β1
• Tests concerning β1 (the slope) are often of interest, particularly

      H0 : β1 = 0
      Ha : β1 ≠ 0

• The null hypothesis model

      Yi = β0 + (0)Xi + εi

  implies that there is no relationship between Y and X


Review : Hypothesis Testing
• Elements of a statistical test
– Null hypothesis, H0
– Alternative hypothesis, Ha
– Test statistic
– Rejection region


Review : Hypothesis Testing - Errors
• Errors
– A type I error is made if H0 is rejected when H0 is
true. The probability of a type I error is denoted
by α. The value of α is called the level of the test.
– A type II error is made if H0 is accepted when Ha
is true. The probability of a type II error is
denoted by β.


P-value
• The p-value, or attained significance level, is
the smallest level of significance α for which
the observed data indicate that the null
hypothesis should be rejected.


Null Hypothesis
• If β = 0 then with 95% confidence the b1
would fall
40
in some range around zero
Guess, y = 0x + 21.2, mse: 37.1
True, y = 2x + 9, mse: 4.22
35

30
Response/Output

25

20

15

10
1 2 3 4 5 6 7 8 9 10 11
Predictor/Input
Alternative Hypothesis : Least Squares Fit
• The least squares estimate b1, suitably rescaled, is the test statistic.

[Figure: least squares fit to the same simulated data. Estimate: y = 2.09x + 8.36 (mse: 4.15); True: y = 2x + 9 (mse: 4.22). Axes: Predictor/Input vs. Response/Output.]
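
The comparison in these two figures can be reproduced with a short sketch like the one below: fit the least squares line, then compare its error mean square with that of a zero-slope guess through the sample mean. The simulated data (true line y = 2x + 9, σ = 2) and the random seed are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
X = np.arange(1, 12, dtype=float)
Y = 9.0 + 2.0 * X + rng.normal(0.0, 2.0, size=X.size)  # assumed true line y = 2x + 9

# Least squares estimates for the simple linear regression model
b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
b0 = Y.mean() - b1 * X.mean()

def mse(y, yhat, p=2):
    """Error mean square with an n - p denominator (p = 2 for both fits,
    mirroring the labels used on the slide)."""
    return np.sum((y - yhat) ** 2) / (len(y) - p)

print("least squares fit:", b0, b1, "mse:", mse(Y, b0 + b1 * X))
print("zero-slope guess :", Y.mean(), 0.0, "mse:", mse(Y, np.full_like(Y, Y.mean())))
```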


Testing This Hypothesis
• We only have a finite sample.
• A different finite set of samples (from the same population / source) will almost always produce different estimates of β0 and β1 (i.e., different b0, b1) under the same estimation procedure.
• b0 and b1 are random variables whose sampling distributions can be statistically characterized.
• Hypothesis tests can be constructed using these distributions.

Example : Sampling Dist. Of b1
• The point estimator for β1 is

      b1 = Σ(Xi − X̄)(Yi − Ȳ) / Σ(Xi − X̄)²

• The sampling distribution of b1 is the distribution over b1 that occurs when the predictor variables Xi are held fixed and the observed outputs are repeatedly sampled.


Sampling Dist. Of b1 In Normal Regr. Model
• For a normal error regression model the sampling distribution of b1 is normal, with mean and variance given by

      E(b1) = β1
      V(b1) = σ² / Σ(Xi − X̄)²

• To show this we need to go through a number of algebraic steps.
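
Before the algebra, a Monte Carlo sketch like the following can be used to check these two facts numerically: hold the Xi fixed, repeatedly resample the outputs, and compare the empirical mean and variance of b1 with β1 and σ²/Σ(Xi − X̄)². The true parameter values and design points are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
beta0, beta1, sigma = 9.0, 2.0, 2.0        # assumed true values
X = np.arange(1, 12, dtype=float)          # X_i held fixed across replications
Sxx = np.sum((X - X.mean()) ** 2)

b1_samples = []
for _ in range(20000):                     # repeatedly sample the outputs
    Y = beta0 + beta1 * X + rng.normal(0.0, sigma, size=X.size)
    b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / Sxx
    b1_samples.append(b1)

b1_samples = np.array(b1_samples)
print("empirical mean of b1:", b1_samples.mean(), " theory:", beta1)
print("empirical var of b1 :", b1_samples.var(),  " theory:", sigma**2 / Sxx)
```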


First step
• To show that

      Σ(Xi − X̄)(Yi − Ȳ) = Σ(Xi − X̄)Yi

  we observe

      Σ(Xi − X̄)(Yi − Ȳ) = Σ(Xi − X̄)Yi − Σ(Xi − X̄)Ȳ
                        = Σ(Xi − X̄)Yi − Ȳ Σ(Xi − X̄)
                        = Σ(Xi − X̄)Yi − Ȳ ΣXi + Ȳ n (ΣXi / n)
                        = Σ(Xi − X̄)Yi


Slope as linear combination of outputs
• b1 can be expressed as a linear combination of the Yi's:

      b1 = Σ(Xi − X̄)(Yi − Ȳ) / Σ(Xi − X̄)²
         = Σ(Xi − X̄)Yi / Σ(Xi − X̄)²
         = Σ ki Yi

  where

      ki = (Xi − X̄) / Σ(Xi − X̄)²


Properties of the ki’s
• It can be shown that

      Σ ki = 0
      Σ ki Xi = 1
      Σ ki² = 1 / Σ(Xi − X̄)²

  (possible homework). We will use these properties to prove various properties of the sampling distributions of b1 and b0.
write on board
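
A quick numerical check of these properties (and of b1 = Σ ki Yi from the previous slide) might look like the following sketch; the design points and simulated responses are assumptions for illustration.

```python
import numpy as np

X = np.arange(1, 12, dtype=float)                       # any fixed design works here
k = (X - X.mean()) / np.sum((X - X.mean()) ** 2)        # k_i = (X_i - Xbar) / sum (X_i - Xbar)^2

print(np.sum(k))                                        # ~ 0
print(np.sum(k * X))                                    # ~ 1
print(np.sum(k ** 2), 1 / np.sum((X - X.mean()) ** 2))  # these agree

# b1 as a linear combination of the Y_i's: sum k_i Y_i matches the usual ratio formula
rng = np.random.default_rng(3)
Y = 9.0 + 2.0 * X + rng.normal(0.0, 2.0, size=X.size)
b1_linear = np.sum(k * Y)
b1_ratio = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
print(b1_linear, b1_ratio)                              # identical up to floating point error
```
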
Normality of b1’s Sampling Distribution
• Useful fact:
  – A linear combination of independent normal random variables is normally distributed.
  – More formally: when Y1, …, Yn are independent normal random variables, the linear combination a1Y1 + a2Y2 + … + anYn is normally distributed, with mean Σ ai E(Yi) and variance Σ ai² V(Yi).


Normality of b1’s Sampling Distribution
• Since b1 is a linear combination of the Yi's and each Yi is an independent normal random variable, b1 is normally distributed as well:

      b1 = Σ ki Yi,   ki = (Xi − X̄) / Σ(Xi − X̄)²

write on board


b1 is an unbiased estimator
• This can be seen using two of the properties of the ki's:

      E(b1) = E(Σ ki Yi) = Σ ki E(Yi) = Σ ki (β0 + β1 Xi)
            = β0 Σ ki + β1 Σ ki Xi
            = β0 (0) + β1 (1)
            = β1


Variance of b1
• Since the Yi are independent random variables with variance σ² and the ki's are constants, we get

      V(b1) = V(Σ ki Yi) = Σ ki² V(Yi)
            = Σ ki² σ² = σ² Σ ki²
            = σ² / Σ(Xi − X̄)²

  Note that this assumes that we know σ².
  – Can we?


Estimated variance of b1
• If we don't know σ² then we can replace it with the MSE estimate.
• Remember

      s² = MSE = SSE / (n − 2) = Σ(Yi − Ŷi)² / (n − 2) = Σ ei² / (n − 2)

  Plugging in, we get

      V(b1) = σ² / Σ(Xi − X̄)²
      V̂(b1) = s² / Σ(Xi − X̄)²
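
Putting the last two slides together, a minimal sketch of the estimated variance and standard error of b1 might look as follows; the simulated data are an assumption for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
X = np.arange(1, 12, dtype=float)
Y = 9.0 + 2.0 * X + rng.normal(0.0, 2.0, size=X.size)  # assumed simulated data
n = X.size

b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
b0 = Y.mean() - b1 * X.mean()
resid = Y - (b0 + b1 * X)

s2 = np.sum(resid ** 2) / (n - 2)                # s^2 = MSE = SSE / (n - 2)
var_b1_hat = s2 / np.sum((X - X.mean()) ** 2)    # estimated V(b1)
se_b1 = np.sqrt(var_b1_hat)                      # estimated standard error of b1
print(s2, var_b1_hat, se_b1)
```
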
Digression : Gauss-Markov Theorem
• In a regression model where E(εi) = 0, V(εi) = σ² < ∞, and εi and εj are uncorrelated for all i ≠ j, the least squares estimators b0 and b1 are unbiased and have minimum variance among all unbiased linear estimators.
  – Remember

      b1 = Σ(Xi − X̄)(Yi − Ȳ) / Σ(Xi − X̄)²
      b0 = Ȳ − b1 X̄

Proof
• The theorem states that b1 has minimum variance among all unbiased linear estimators of the form

      β̂1 = Σ ci Yi

• As this estimator must be unbiased we have

      E(β̂1) = Σ ci E(Yi) = β1
      Σ ci E(Yi) = Σ ci (β0 + β1 Xi) = β0 Σ ci + β1 Σ ci Xi = β1


Proof cont.
• Given these constraints

      β0 Σ ci + β1 Σ ci Xi = β1

  clearly it must be the case that Σ ci = 0 and Σ ci Xi = 1
  (write these on board as conditions of unbiasedness)

• The variance of this estimator is

      V(β̂1) = Σ ci² V(Yi) = σ² Σ ci²


Proof cont.
• Now define ci = ki + di, where the ki are the constants we already defined and the di are arbitrary constants. Let's look at the variance of the estimator:

      V(β̂1) = Σ ci² V(Yi) = σ² Σ (ki + di)²
             = σ² (Σ ki² + Σ di² + 2 Σ ki di)

  Note we just demonstrated that

      σ² Σ ki² = V(b1)


Proof cont.
• Now, by showing that Σ ki di = 0, we're almost done

      Σ ki di = Σ ki (ci − ki)
              = Σ ki ci − Σ ki²
              = Σ ci (Xi − X̄) / Σ(Xi − X̄)² − 1 / Σ(Xi − X̄)²
              = (Σ ci Xi − X̄ Σ ci) / Σ(Xi − X̄)² − 1 / Σ(Xi − X̄)²
              = 0

  from the conditions of unbiasedness

Proof end
• So we are left with

      V(β̂1) = σ² (Σ ki² + Σ di²)
             = V(b1) + σ² Σ di²

  which is minimized when all the di = 0.

• This means that the least squares estimator b1 has minimum variance among all unbiased linear estimators.


Sampling Distribution of (b1 − β1)/S(b1)
• b1 is normally distributed, so (b1 − β1)/√V(b1) is a standard normal variable.
• We don't know V(b1), so it must be estimated from data. We have already denoted its estimate V̂(b1).
• Using this estimate, it can be shown that

      (b1 − β1) / Ŝ(b1) ~ t(n − 2),   where Ŝ(b1) = √V̂(b1)


Where does this come from?
• We need to rely upon the following theorem:
  – For the normal error regression model,

      SSE / σ² = Σ(Yi − Ŷi)² / σ² ~ χ²(n − 2)

    and is independent of b0 and b1.

• Intuitively this follows the standard result for the sum of squared normal random variables.
  – Here there are two linear constraints imposed by the regression parameter estimation, each of which reduces the number of degrees of freedom by one.
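
A simulation sketch like the one below can be used to check this claim informally: since a χ²(n − 2) variable has mean n − 2 and variance 2(n − 2), the replicated values of SSE/σ² should match those moments. The true parameter values are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)
beta0, beta1, sigma = 9.0, 2.0, 2.0    # assumed true values
X = np.arange(1, 12, dtype=float)
n = X.size

sse_over_sigma2 = []
for _ in range(20000):
    Y = beta0 + beta1 * X + rng.normal(0.0, sigma, size=n)
    b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
    b0 = Y.mean() - b1 * X.mean()
    sse = np.sum((Y - (b0 + b1 * X)) ** 2)
    sse_over_sigma2.append(sse / sigma**2)

sse_over_sigma2 = np.array(sse_over_sigma2)
# A chi-square(n - 2) variable has mean n - 2 and variance 2(n - 2).
print(sse_over_sigma2.mean(), n - 2)
print(sse_over_sigma2.var(), 2 * (n - 2))
```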


Another useful fact : t distribution
• Let z and χ²(ν) be independent random variables (standard normal and chi-square, respectively). We then define a t random variable as follows:

      t(ν) = z / √(χ²(ν) / ν)

• This version of the t distribution has one parameter, the degrees of freedom ν.
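
This construction is easy to verify by simulation, for example by comparing quantiles of z / √(χ²(ν)/ν) samples with the t(ν) quantiles from scipy.stats; the choice ν = 9 below is an arbitrary assumption.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
nu = 9                                           # degrees of freedom (arbitrary choice)
z = rng.standard_normal(200000)
chi2 = rng.chisquare(nu, size=200000)
t_samples = z / np.sqrt(chi2 / nu)               # t(nu) = z / sqrt(chi2(nu) / nu)

# Compare empirical quantiles with the t(nu) quantiles
for q in (0.025, 0.5, 0.975):
    print(q, np.quantile(t_samples, q), stats.t.ppf(q, df=nu))
```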


Distribution of the studentized statistic
• To derive the distribution of this statistic, first we do the following rewrite

      (b1 − β1) / Ŝ(b1) = [(b1 − β1) / S(b1)] / [Ŝ(b1) / S(b1)]

  where (b1 − β1)/S(b1) is a standard normal variable and

      Ŝ(b1) / S(b1) = √( V̂(b1) / V(b1) )


Studentized statistic cont.
• And note the following:

      V̂(b1) / V(b1) = [MSE / Σ(Xi − X̄)²] / [σ² / Σ(Xi − X̄)²] = MSE / σ² = SSE / (σ²(n − 2))

  where we know (by the given theorem) that the distribution of the last term is chi-square and independent of b1 and b0:

      SSE / (σ²(n − 2)) ~ χ²(n − 2) / (n − 2)

Studentized statistic final
• But by the given definition of the t distribution we have our result

      (b1 − β1) / Ŝ(b1) ~ t(n − 2)

  because, putting everything together, we can see that

      (b1 − β1) / Ŝ(b1) ~ z / √( χ²(n − 2) / (n − 2) )


Confidence Intervals and Hypothesis Tests
• Now that we know the sampling distribution of the studentized slope, (b1 − β1)/Ŝ(b1) ~ t(n − 2), we can construct confidence intervals and hypothesis tests easily.
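
For example, a minimal sketch of the resulting test of H0: β1 = 0 against Ha: β1 ≠ 0 and a 95% confidence interval for β1 might look as follows (simulated data assumed for illustration).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
X = np.arange(1, 12, dtype=float)
Y = 9.0 + 2.0 * X + rng.normal(0.0, 2.0, size=X.size)  # assumed simulated data
n = X.size

# Least squares fit and estimated standard error of b1
Sxx = np.sum((X - X.mean()) ** 2)
b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / Sxx
b0 = Y.mean() - b1 * X.mean()
s2 = np.sum((Y - (b0 + b1 * X)) ** 2) / (n - 2)        # MSE
se_b1 = np.sqrt(s2 / Sxx)                              # estimated S(b1)

# Test H0: beta1 = 0 against Ha: beta1 != 0
t_stat = (b1 - 0.0) / se_b1
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 2)

# 95% confidence interval for beta1
t_crit = stats.t.ppf(0.975, df=n - 2)
ci = (b1 - t_crit * se_b1, b1 + t_crit * se_b1)

print("t =", t_stat, "p =", p_value, "95% CI:", ci)
```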
