
Stats 100C – Linear Models

University of California, Los Angeles

Duc Vu
Fall 2021

This is Stats 100C – Linear Models taught by Professor Christou. There is no official textbook
for the course; instead, handouts and reference materials are distributed and can be accessed
through the class website. You can find other math/stats lecture notes on my personal blog.
Let me know by email if you notice anything mathematically wrong or concerning. Thank you!

Contents

1 Lec 1: Sep 27, 2021 4


1.1 Simple Linear Regression Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.2 Prediction Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

2 Lec 2: Sep 29, 2021 6


2.1 Linear Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

3 Lec 3: Oct 1, 2021 10


3.1 Gauss-Markov Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
3.2 Estimation of Variance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
3.3 Distribution Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

4 Lec 4: Oct 4, 2021 13


4.1 Centered Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
4.2 Distribution Theory Using the Centered Model . . . . . . . . . . . . . . . . . . . . . 13

5 Lec 5: Oct 6, 2021 16


5.1 Distribution Theory Using Non-Centered Model . . . . . . . . . . . . . . . . . . . . . 16
5.2 A Note on Gamma Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
5.3 Coefficient of Determination . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

6 Lec 6: Oct 8, 2021 19


6.1 Variance & Covariance Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
6.2 Inference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
6.3 Prediction Interval . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

7 Lec 7: Oct 11, 2021 22


7.1 Hypothesis Testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22

8 Lec 8: Oct 13, 2021 25


8.1 Likelihood Ratio Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
8.2 Power Analysis in Simple Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

9 Lec 9: Oct 15, 2021 28


9.1 Extra Sum of Squares Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
9.2 Power Analysis Using Non-Central F Distribution . . . . . . . . . . . . . . . . . . . 30

10 Lec 10: Oct 18, 2021 31


10.1 Multiple Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

11 Lec 11: Oct 20, 2021 34


11.1 Multiple Regression (Cont’d) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34

12 Lec 12: Oct 22, 2021 37


12.1 Gauss-Markov Theorem in Multiple Regression . . . . . . . . . . . . . . . . . . . . . 37
12.2 Gauss-Markov Theorem For a Linear Combination . . . . . . . . . . . . . . . . . . . 38
12.3 Review of Multivariate Normal Distribution . . . . . . . . . . . . . . . . . . . . . . . 38

13 Lec 13: Oct 25, 2021 40


13.1 Theorems in Multivariate Normal Distribution . . . . . . . . . . . . . . . . . . . . . 40

14 Lec 14: Oct 27, 2021 44


14.1 Mean and Variance in Multivariate Normal Distribution . . . . . . . . . . . . . . . . 44
14.2 Independent Vectors in Multiple Regression . . . . . . . . . . . . . . . . . . . . . . . 45
14.3 Partial Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46

15 Lec 15: Oct 29, 2021 47


15.1 Partial Regression (Cont’d) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47

16 Lec 16: Nov 1, 2021 50


16.1 Partial Regression (Cont’d) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
16.2 Partial Correlation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51

17 Lec 17: Nov 3, 2021 53


17.1 Constrained Least Squares . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53

18 Lec 18: Nov 5, 2021 56


18.1 Quadratic Forms of Normally Distributed Random Variables . . . . . . . . . . . . . 56

19 Lec 19: Nov 8, 2021 60


19.1 Quadratic Forms and Their Distribution – Overview . . . . . . . . . . . . . . . . . . 60
19.2 Another Proof of Quadratic Forms and Their Distribution . . . . . . . . . . . . . . . 61
19.3 Efficiency of Least Squares Estimators . . . . . . . . . . . . . . . . . . . . . . . . . . 62

20 Lec 20: Nov 10, 2021 63


20.1 Information Matrix and Efficient Estimator . . . . . . . . . . . . . . . . . . . . . . . 63
20.2 Centered Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64

21 Lec 21: Nov 12, 2021 66


21.1 Confidence Intervals in Multiple Regression . . . . . . . . . . . . . . . . . . . . . . . 66
21.2 Hypothesis Testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67

22 Lec 22: Nov 15, 2021 69


22.1 F Test for the General Linear Hypothesis . . . . . . . . . . . . . . . . . . . . . . . . 69
22.2 F Statistics and t statistics in Multiple Regression . . . . . . . . . . . . . . . . . . . 69
22.3 Power Analysis in Multiple Regression . . . . . . . . . . . . . . . . . . . . . . . . . . 70
22.4 F Statistics Using the Extra Sum of Squares . . . . . . . . . . . . . . . . . . . . . . . 70


23 Lec 23: Nov 17, 2021 72


23.1 Testing the Overall Significance of the Model . . . . . . . . . . . . . . . . . . . . . . 72
23.2 Likelihood Ratio Test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
23.3 Multi-Collinearity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73

24 Lec 24: Nov 19, 2021 75


24.1 Centered and Scaled Model in Matrix/Vector Form . . . . . . . . . . . . . . . . . . . 75

25 Lec 25: Nov 22, 2021 78


25.1 Multi-Collinearity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
25.2 Generalized Least Squares . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79

26 Lec 26: Nov 24, 2021 81


26.1 Generalized Least Squares (Cont’d) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
26.2 Comparing Regression Equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82

27 Lec 27: Nov 29, 2021 83


27.1 Comparing Regression Equations (Cont’d) . . . . . . . . . . . . . . . . . . . . . . . . 83
27.2 Deleting a Single Point in Multiple Regression . . . . . . . . . . . . . . . . . . . . . . 84

28 Lec 28: Dec 1, 2021 85


28.1 Deleting a Single Point in Multiple Regression (Cont’d) . . . . . . . . . . . . . . . . 85
28.2 Influential Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86

29 Lec 29: Dec 3, 2021 87


29.1 Influential Analysis (Cont’d) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
29.2 Externally Studentized Residual . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
29.3 A Note on Variable Selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89


§1 Lec 1: Sep 27, 2021

§1.1 Simple Linear Regression Models


Consider
\[ Y_i = \mu + \varepsilon_i \]
with $\varepsilon_i \overset{\text{i.i.d.}}{\sim} N(0, \sigma)$; specifically, $Y_1, \dots, Y_n \overset{\text{i.i.d.}}{\sim} N(\mu, \sigma)$. We want to estimate $\mu$ and $\sigma^2$ using least squares or the method of maximum likelihood (MML).

Method of Least Squares (OLS – Ordinary Least Squares):
\[ \min Q = \sum_{i=1}^n (Y_i - \mu)^2, \qquad \frac{\partial Q}{\partial \mu} = -2\sum (Y_i - \mu) = 0, \qquad \sum Y_i - n\hat\mu = 0 \;\Longrightarrow\; \hat\mu = \bar Y \]

Method of Maximum Likelihood (MML):
\[ f(y_i) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{1}{2\sigma^2}(y_i - \mu)^2} = \big(2\pi\sigma^2\big)^{-\frac12} e^{-\frac{1}{2\sigma^2}(y_i - \mu)^2} \]
\[ L = f(y_1)\cdots f(y_n) = \big(2\pi\sigma^2\big)^{-\frac n2} e^{-\frac{1}{2\sigma^2}\sum (y_i - \mu)^2}, \qquad \ln L = -\frac n2 \ln 2\pi\sigma^2 - \frac{1}{2\sigma^2}\sum (y_i - \mu)^2 \]
\[ \frac{\partial \ln L}{\partial \mu} = 0, \qquad \frac{\partial \ln L}{\partial \sigma^2} = 0 \]
Solving the above, we obtain the MLEs of $\mu$ and $\sigma^2$:
\[ \hat\mu = \bar y, \qquad \hat\sigma^2 = \frac{\sum (y_i - \hat\mu)^2}{n} = \frac{\sum (y_i - \bar y)^2}{n} \]
Notice that $\hat\sigma^2$ is biased, and we adjust it to be unbiased as follows:
\[ S^2 = \frac{\sum (y_i - \bar y)^2}{n - 1} \]

§1.2 Prediction Problem


Given $Y_1, \dots, Y_n$, we want to predict a new $Y$, e.g., $Y_0$. An educated guess here is $\hat Y_0 = \bar Y$.

1. Predictor assumption: $\hat Y_0 = \sum_{i=1}^n a_i Y_i$.

2. We want $\hat Y_0$ to be unbiased, i.e., $E\hat Y_0 = \mu$:
\[ E\sum a_i Y_i = \mu, \qquad \sum a_i EY_i = \mu \;\Longrightarrow\; \sum a_i = 1 \]


3. Minimize the mean square error of prediction, i.e.,
\[ E\big(Y_0 - \hat Y_0\big)^2 \quad \text{s.t.} \quad \sum a_i = 1 \]
Notice that this is a constrained optimization problem; we use the method of Lagrange multipliers to obtain
\[ \min Q = E\big(Y_0 - \hat Y_0\big)^2 - 2\lambda\Big(\sum a_i - 1\Big) \]
Note: $EW^2 = \operatorname{var}(W) + (EW)^2$, and here $E(Y_0 - \hat Y_0) = 0$, so
\begin{align*}
\min Q &= \operatorname{var}\big(Y_0 - \hat Y_0\big) - 2\lambda\Big(\sum a_i - 1\Big) \\
&= \operatorname{var}(Y_0) + \operatorname{var}(\hat Y_0) - 2\operatorname{cov}\big(Y_0, \hat Y_0\big) - 2\lambda\Big(\sum a_i - 1\Big) \\
&= \sigma^2 + \sigma^2 \sum a_i^2 - 2\lambda\Big(\sum a_i - 1\Big),
\end{align*}
where $\operatorname{cov}(Y_0, \hat Y_0) = 0$ because the new $Y_0$ is independent of $Y_1, \dots, Y_n$. Then
\[ \frac{\partial Q}{\partial a_i} = 2\sigma^2 a_i - 2\lambda = 0 \;\Longrightarrow\; a_i = \frac{\lambda}{\sigma^2} \]
Notice that $a_1 = a_2 = \dots = a_n = \lambda/\sigma^2$. So
\[ \sum a_i = \frac{n\lambda}{\sigma^2} = 1 \;\Longrightarrow\; \lambda = \frac{\sigma^2}{n} \]
Thus we can see that $a_i = \frac1n$, and therefore, since $\hat Y_0 = \sum a_i Y_i$, it follows that $\hat Y_0 = \bar Y$.
Prediction Interval:
\[ Y_0 - \hat Y_0 \sim N\!\left(0,\ \sigma\sqrt{1 + \tfrac1n}\right) \]
Recall from 100B that
\[ \frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1} \]
So,
\[ \frac{\dfrac{Y_0 - \hat Y_0 - 0}{\sigma\sqrt{1 + \frac1n}}}{\sqrt{\dfrac{(n-1)S^2}{\sigma^2}\Big/(n-1)}} = \frac{Y_0 - \hat Y_0}{S\sqrt{1 + \frac1n}} \sim t_{n-1} \]
We can now construct the prediction interval for $Y_0$ as follows:
\[ P\!\left(-t_{\alpha/2;\,n-1} \le \frac{Y_0 - \hat Y_0}{S\sqrt{1 + \frac1n}} \le t_{\alpha/2;\,n-1}\right) = 1 - \alpha \]
Finally, $Y_0 \in \hat Y_0 \pm t_{\alpha/2;\,n-1}\, S\sqrt{1 + \frac1n}$.

Remark 1.1. Compare this to the confidence interval for $\mu$: $\mu \in \bar Y \pm t_{\alpha/2;\,n-1}\, S/\sqrt n$.
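As a quick numerical sketch of these two intervals (the sample values below are made up for illustration, not from class), scipy's t quantile does all the work:

```python
import numpy as np
from scipy import stats

# Hypothetical sample
y = np.array([4.1, 5.3, 4.8, 5.9, 5.1, 4.6, 5.4, 5.0])
n = len(y)
ybar = y.mean()
S = y.std(ddof=1)                         # unbiased S (divides by n - 1)
alpha = 0.05
tcrit = stats.t.ppf(1 - alpha / 2, df=n - 1)

# Prediction interval for a new Y0: Ybar +/- t * S * sqrt(1 + 1/n)
pi = (ybar - tcrit * S * np.sqrt(1 + 1/n), ybar + tcrit * S * np.sqrt(1 + 1/n))
# Confidence interval for mu: Ybar +/- t * S / sqrt(n)
ci = (ybar - tcrit * S / np.sqrt(n), ybar + tcrit * S / np.sqrt(n))
print(pi, ci)                             # the PI is always wider than the CI
```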


§2 Lec 2: Sep 29, 2021

§2.1 Linear Regression


Consider a simple regression model

Yi = β0 + β1 Xi + εi
or Yi = β1 Xi + εi

Data:

y      x
y_1    x_1
...    ...
y_n    x_n

[Flowchart: data and theory feed into the model; the model is estimated and checked ("is it adequate?"); if no, return to the model step; if yes, use it.]

where the parameters are
\[ \begin{cases} \beta_0: \text{intercept} \\ \beta_1: \text{slope} \end{cases} \]
and $X_1, \dots, X_n$ are predictors that are not random; $\varepsilon_1, \dots, \varepsilon_n$ are random error terms (also called disturbances or stochastic terms); and $Y_1, \dots, Y_n$ are the random response variables.
Assumption (Gauss-Markov Conditions):
\[ E(\varepsilon_i) = 0, \qquad \operatorname{var}(\varepsilon_i) = \sigma^2, \]
and $\varepsilon_1, \dots, \varepsilon_n$ are independent. Using the Gauss-Markov conditions,
\[ EY_i = \beta_0 + \beta_1 X_i, \qquad \operatorname{var}(Y_i) = \sigma^2 \]
Least squares minimizes
\[ \min Q = \sum \varepsilon_i^2 = \sum (Y_i - \beta_0 - \beta_1 X_i)^2 \]
\[ \frac{\partial Q}{\partial \beta_0} = -2\sum (Y_i - \beta_0 - \beta_1 X_i) = 0, \qquad \frac{\partial Q}{\partial \beta_1} = -2\sum (Y_i - \beta_0 - \beta_1 X_i)\, X_i = 0 \]


So,
\[ \begin{cases} \sum y_i - n\beta_0 - \beta_1 \sum x_i = 0 \\ \sum x_i y_i - \beta_0 \sum x_i - \beta_1 \sum x_i^2 = 0 \end{cases}
\;\Longrightarrow\;
\begin{cases} n\beta_0 + \beta_1 \sum x_i = \sum y_i \\ \beta_0 \sum x_i + \beta_1 \sum x_i^2 = \sum x_i y_i \end{cases} \quad \text{– normal equations} \]
We can solve the above to get $\hat\beta_0, \hat\beta_1$. In matrix form,
\[ \begin{pmatrix} n & \sum x_i \\ \sum x_i & \sum x_i^2 \end{pmatrix} \begin{pmatrix} \hat\beta_0 \\ \hat\beta_1 \end{pmatrix} = \begin{pmatrix} \sum y_i \\ \sum x_i y_i \end{pmatrix}
\;\Longrightarrow\;
\begin{pmatrix} \hat\beta_0 \\ \hat\beta_1 \end{pmatrix} = \begin{pmatrix} n & \sum x_i \\ \sum x_i & \sum x_i^2 \end{pmatrix}^{-1} \begin{pmatrix} \sum y_i \\ \sum x_i y_i \end{pmatrix} \]
Determinant of the matrix:
\[ n\sum x_i^2 - \Big(\sum x_i\Big)^2 = n\left[\sum x_i^2 - \frac{(\sum x_i)^2}{n}\right] = n\sum (x_i - \bar x)^2 \ge 0 \]

If $x_1 = x_2 = \dots = x_n = \bar x$, then $\sum (x_i - \bar x)^2 = 0$ and the matrix is singular. Otherwise, from the normal equations we get
\[ \hat\beta_0 = \bar y - \hat\beta_1 \bar x \quad \text{from (1)} \]
and plugging (1) into (2) gives
\[ \hat\beta_1 = \frac{\sum x_i y_i - \frac1n (\sum x_i)(\sum y_i)}{\sum x_i^2 - \frac{(\sum x_i)^2}{n}}
= \frac{\sum (x_i - \bar x)(y_i - \bar y)}{\sum (x_i - \bar x)^2}
= \frac{\sum (x_i - \bar x)\, y_i}{\sum (x_i - \bar x)^2} \tag{*} \]
\[ = \frac{\sum (y_i - \bar y)\, x_i}{\sum (x_i - \bar x)^2}
= \frac{\sum x_i y_i - n\bar x \bar y}{\sum x_i^2 - \frac{(\sum x_i)^2}{n}} \]
Note: From (*), we have
\[ \hat\beta_1 = \frac{\sum (x_i - \bar x)\, y_i}{\sum (x_i - \bar x)^2}
= \frac{(x_1 - \bar x)\, y_1}{\sum (x_i - \bar x)^2} + \dots + \frac{(x_n - \bar x)\, y_n}{\sum (x_i - \bar x)^2}
= k_1 y_1 + \dots + k_n y_n = \sum_{i=1}^n k_i y_i \]


where $k_i = \frac{x_i - \bar x}{\sum (x_i - \bar x)^2}$. Notice that
\[ \sum k_i = 0, \qquad \sum k_i^2 = \frac{1}{\sum (x_i - \bar x)^2}, \qquad \sum k_i x_i = \frac{\sum (x_i - \bar x)\, x_i}{\sum (x_i - \bar x)^2} = 1 \]
Properties of $\hat\beta_1$:
\[ E\hat\beta_1 = E\sum k_i y_i = \sum k_i E y_i = \sum k_i (\beta_0 + \beta_1 x_i) = \beta_0 \sum k_i + \beta_1 \sum k_i x_i = \beta_1 \quad \text{– unbiased} \]
For the variance,
\[ \operatorname{var}(\hat\beta_1) = \operatorname{var}\Big(\sum k_i y_i\Big) = \sum k_i^2 \operatorname{var}(Y_i) = \frac{\sigma^2}{\sum (x_i - \bar x)^2} \]

Properties of $\hat\beta_0$:
\[ \hat\beta_0 = \bar y - \hat\beta_1 \bar x = \sum \frac{y_i}{n} - \bar x \sum k_i y_i = \sum \Big(\frac1n - \bar x k_i\Big)\, y_i = \sum_{i=1}^n l_i y_i \]
where $l_i = \frac1n - \bar x k_i$, and the properties of $l_i$ are
\[ \sum l_i = 1, \qquad \sum l_i^2 = \sum \Big(\frac1n - \bar x k_i\Big)^2 = \sum \Big(\frac{1}{n^2} + \bar x^2 k_i^2 - \frac{2}{n}\bar x k_i\Big) = \frac1n + \frac{\bar x^2}{\sum (x_i - \bar x)^2}, \qquad \sum l_i x_i = 0 \]
Now, we can easily show that $\hat\beta_0$ is unbiased:
\[ E\hat\beta_0 = E\sum l_i y_i = \sum l_i E y_i = \sum l_i (\beta_0 + \beta_1 x_i) = \beta_0 \sum l_i + \beta_1 \sum l_i x_i = \beta_0 \]


Thus,
\[ \operatorname{var}\big(\hat\beta_0\big) = \operatorname{var}\Big(\sum l_i y_i\Big) = \sigma^2 \sum l_i^2 = \sigma^2 \left(\frac1n + \frac{\bar x^2}{\sum (x_i - \bar x)^2}\right) \]
The fitted value is
\[ \hat y_i = \hat\beta_0 + \hat\beta_1 x_i = \bar y + \hat\beta_1 (x_i - \bar x) \]
and the residual is defined as $e_i = y_i - \hat y_i$, with properties
\[ \sum e_i = 0, \qquad \sum e_i x_i = 0, \qquad \sum e_i \hat y_i = 0 \]

Estimation Using MML:

Assume $\varepsilon_1, \dots, \varepsilon_n \overset{\text{i.i.d.}}{\sim} N(0, \sigma)$. Then $Y_i \sim N(\beta_0 + \beta_1 X_i, \sigma)$. The log-likelihood function is
\[ \ln L = -\frac n2 \ln 2\pi\sigma^2 - \frac{1}{2\sigma^2} \sum (y_i - \beta_0 - \beta_1 x_i)^2 \]
So we need to solve
\[ \frac{\partial \ln L}{\partial \beta_0} = 0, \qquad \frac{\partial \ln L}{\partial \beta_1} = 0 \]
to get $\hat\beta_0, \hat\beta_1$, which are the same as from the least squares method. Also,
\[ \frac{\partial \ln L}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4} \sum (y_i - \beta_0 - \beta_1 x_i)^2 = 0 \;\Longrightarrow\; \hat\sigma^2 = \frac{\sum e_i^2}{n} \]
Then,
\[ \sum (y_i - \bar y)^2 = \sum \big(\underbrace{y_i - \hat y_i}_{e_i} + \hat y_i - \bar y\big)^2 \]
which we expand to get
\[ \underbrace{\sum (y_i - \bar y)^2}_{\text{SST}} = \underbrace{\sum e_i^2}_{\text{SSE}} + \underbrace{\sum (\hat y_i - \bar y)^2}_{\text{SSR}} \]
in which
SST: sum of squares total
SSE: sum of squares error
SSR: sum of squares regression
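A small simulation sketch of these formulas and of the decomposition SST = SSE + SSR (the data-generating values below are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)                  # fixed predictors
y = 2.0 + 0.5 * x + rng.normal(0, 1, size=50)    # hypothetical beta0 = 2, beta1 = 0.5, sigma = 1

Sxx = np.sum((x - x.mean())**2)
b1 = np.sum((x - x.mean()) * y) / Sxx            # beta1_hat = sum((xi - xbar) yi) / Sxx
b0 = y.mean() - b1 * x.mean()                    # beta0_hat = ybar - beta1_hat * xbar
yhat = b0 + b1 * x
e = y - yhat

SST = np.sum((y - y.mean())**2)
SSE = np.sum(e**2)
SSR = np.sum((yhat - y.mean())**2)
assert np.isclose(SST, SSE + SSR)                # SST = SSE + SSR
assert np.isclose(e.sum(), 0) and np.isclose((e * x).sum(), 0)   # residual properties
```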


§3 Lec 3: Oct 1, 2021

§3.1 Gauss-Markov Theorem
Recall $\hat\beta_1 = \sum k_i Y_i$, where $k_i = \frac{x_i - \bar x}{\sum (x_i - \bar x)^2}$. Consider now
\[ b_1 = \sum a_i Y_i, \]
another linear unbiased estimator of $\beta_1$ (not the least squares one). Then $E b_1 = \beta_1$, or $E\sum a_i Y_i = \beta_1$. So
\[ \beta_1 = \sum a_i E Y_i = \sum a_i (\beta_0 + \beta_1 X_i) = \beta_0 \sum a_i + \beta_1 \sum a_i X_i \]
Thus,
\[ \begin{cases} \sum a_i = 0 \\ \sum a_i x_i = 1 \end{cases} \]
and we know that
\[ \operatorname{var}(b_1) = \operatorname{var}\Big(\sum_{i=1}^n a_i Y_i\Big) = \sigma^2 \sum a_i^2, \qquad \operatorname{var}(\hat\beta_1) = \sigma^2 \sum k_i^2 = \frac{\sigma^2}{\sum (x_i - \bar x)^2} \]
Now let $a_i = k_i + d_i$. Then
\[ \operatorname{var}(b_1) = \sigma^2 \sum (k_i + d_i)^2 = \sigma^2 \sum k_i^2 + \sigma^2 \sum d_i^2 + 2\sigma^2 \sum k_i d_i \]
We need to show $\sum k_i d_i = 0$:
\[ \sum k_i d_i = \sum k_i (a_i - k_i) = \sum k_i a_i - \sum k_i^2
= \frac{\sum (x_i - \bar x)\, a_i}{\sum (x_i - \bar x)^2} - \frac{1}{\sum (x_i - \bar x)^2}
= \frac{\sum x_i a_i}{\sum (x_i - \bar x)^2} - \frac{\bar x \sum a_i}{\sum (x_i - \bar x)^2} - \frac{1}{\sum (x_i - \bar x)^2} = 0, \]
using $\sum a_i x_i = 1$ and $\sum a_i = 0$. So $\operatorname{var}(b_1) \ge \operatorname{var}(\hat\beta_1)$, and therefore $\hat\beta_1$ is the best linear unbiased estimator (BLUE).

§3.2 Estimation of Variance
Using MML,
\[ \hat\sigma^2 = \frac{\sum e_i^2}{n} \]
Is it unbiased?
\[ E\hat\sigma^2 = \frac{\sum E e_i^2}{n} = \frac{\sum \big[\operatorname{var}(e_i) + (E e_i)^2\big]}{n} \]


Note: ei = Yi − Ŷi = Yi − β̂0 − β̂1 Xi . So


h i
Eei = E Yi − β̂0 − β̂1 Xi = (β0 + β1 Xi ) − (β0 + β1 Xi ) = 0

Then, P
var(ei )
E σ̂ 2 =
n
Notice that
ei = Yi − β̂0 − β̂1 Xi
or
ei = Yi − Y − β̂1 (Xi − X)
where Ŷi = Y + β̂1 (Xi − X). Substitute in and we get
h i
var(ei ) = var Yi − Y − β̂1 (Xi − X)

= var(Yi ) + var(Y ) + (Xi − X)2 var(β̂1 ) − 2 cov(Yi , Y ) − 2(Xi − X) cov(Yi , β̂1 )


+ 2(Xi − X) cov(Y , β̂1 )

Let’s compute each term there.

Yi = β0 + β1 Xi + εi
var(Yi ) = σ 2
P
εi
Y = β0 + β1 X +
n
σ2
var(Y ) =
n  
Y1 + . . . + Yi + . . . + Yn
cov(Yi , Y ) = cov Yi ,
n
1 1 1
= cov(Yi , Y1 ) + . . . + cov(Yi , Yi ) + . . . + cov(Yi , Yn )
n n n
σ2
=
n
X
cov(Yi , β̂1 ) = cov(Yi , ki Yi )
= cov(Yi , k1 Y1 ) + . . . + cov(Yi , ki Yi ) + . . . + cov(Yi , kn Yn )
= k1 cov(Yi , Y1 ) + . . . + ki cov(Yi , Yi ) + . . . + kn cov(Y1 , Yn )
xi − x
= σ 2 ki = σ 2 P
(xi − x)2
Note: A property of covariance

cov(aY, bQ) = ab cov(Y, Q)

And for the last term,


 
Y1 + . . . + Yn
cov(Y , β̂1 ) = cov , k1 Y1 + . . . + kn Yn
n
Y1 Yn
= cov( , k1 Y1 + . . . + kn Yn ) + . . . + cov( , k1 Y1 + . . . + kn Yn )
n n
σ2 σ2 σ2
= k1 + k2 + . . . + kn
n n n
σ2 X
= ki = 0
n


Now we're ready to compute the variance:
\[ \operatorname{var}(e_i) = \sigma^2 + \frac{\sigma^2}{n} + \frac{\sigma^2 (x_i - \bar x)^2}{\sum (x_i - \bar x)^2} - \frac{2\sigma^2}{n} - \frac{2\sigma^2 (x_i - \bar x)^2}{\sum (x_i - \bar x)^2}
= \sigma^2 \left(1 - \frac1n - \frac{(x_i - \bar x)^2}{\sum (x_i - \bar x)^2}\right) \]
Therefore,
\[ E\hat\sigma^2 = \frac{\sum \operatorname{var}(e_i)}{n} = \sigma^2\, \frac{\sum_{i=1}^n \left(1 - \frac1n - \frac{(x_i - \bar x)^2}{\sum (x_i - \bar x)^2}\right)}{n} = \frac{n-2}{n}\,\sigma^2 \]
It follows that the unbiased estimator of $\sigma^2$ is
\[ S_e^2 = \frac{n}{n-2}\,\hat\sigma^2 = \frac{\sum e_i^2}{n-2} \]
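A short Monte Carlo sketch of this result (the true values below are hypothetical): repeatedly fitting the line and averaging the values of $\sum e_i^2/(n-2)$ should land near $\sigma^2$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, beta0, beta1, sigma = 30, 1.0, 2.0, 1.5      # hypothetical true values
x = np.linspace(0, 5, n)
Sxx = np.sum((x - x.mean())**2)

reps = []
for _ in range(5000):
    y = beta0 + beta1 * x + rng.normal(0, sigma, n)
    b1 = np.sum((x - x.mean()) * y) / Sxx
    b0 = y.mean() - b1 * x.mean()
    e = y - b0 - b1 * x
    reps.append(np.sum(e**2) / (n - 2))          # S_e^2 for this sample
print(np.mean(reps), sigma**2)                   # average should be close to sigma^2 = 2.25
```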

§3.3 Distribution Theory


Let $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$ and assume $\varepsilon_1, \dots, \varepsilon_n \overset{\text{i.i.d.}}{\sim} N(0, \sigma)$. Then
\[ \hat\beta_1 = \sum k_i Y_i \;\Longrightarrow\; \hat\beta_1 \sim N\!\left(\beta_1,\ \frac{\sigma}{\sqrt{\sum (x_i - \bar x)^2}}\right) \]
\[ \hat\beta_0 = \sum l_i Y_i \;\Longrightarrow\; \hat\beta_0 \sim N\!\left(\beta_0,\ \sigma\sqrt{\frac1n + \frac{\bar x^2}{\sum (x_i - \bar x)^2}}\right) \]
We will show $\frac{(n-2)S_e^2}{\sigma^2} \sim \chi^2_{n-2}$ in the next lecture.


§4 Lec 4: Oct 4, 2021

§4.1 Centered Model
Consider the model: Yi = β0 + β1 Xi + εi , i = 1, . . . , n and Gauss-Markov conditions hold, i.e.,

E [εi ] = 0
var [εi ] = σ 2

i.i.d
for i = 1, . . . , n and ε1 , . . . , εn are independent (we assume ε1 , . . . , εn ∼ N (0, σ)). This is non-
centered model. Let’s look at a centered model

Yi = β0 + β1 Xi ± β1 X + εi
Yi = β0 + β1 X + β1 (Xi − X) + εi
Yi = γ0 + β1 Zi + εi – centered model

where P
γ 0 = β0 P + β1 X and Zi = Xi − X.
Note: zi = (xi − x) = 0 and z = 0. So,
P P P
(zi − z)yi zi yi (xi − x)yi
β̂1 = P 2
= P 2 = P – same as non-centered model
(zi − z) zi (xi − x)2
γ̂0 = y − β̂1 z = y

Notice ŷi = y + β̂1 (xi − x) which is the same as ŷi of the non-centered model.

§4.2 Distribution Theory Using the Centered Model
Have
 
Yi ∼ N γ0 + β1 Xi − X , σ
!
σ
β̂1 ∼ β1 , pP
(xi − x)2
 
σ
γ̂0 = Y ∼ N γ0 , √
n
(n−2)Se2 2
Now, let’s show that σ2 ∼ Xn−2 . We have

Yi − γ0 − β1 (Xi − X)
∼ N (0, 1)
σ
 2
Yi − γ0 − β1 (Xi − X)
∼ X12
σ2
It follows that Pn  2
i=1 Yi − γ0 − β1 (Xi − X)
∼ Xn2
σ2
(n−2)Se2 e2i
P
Notice that σ2 = σ2 . Let’s manipulate this expression. First, let

Ph i2
Yi − γ0 − β1 (Xi − X) ± γ̂0 ± β̂1 (Xi − X)
L=
σ2


Then,
Ph i2
yi − γ̂0 − β̂1 (xi − x) + (γ̂0 − γ0 ) + (β̂1 − β1 )(xi − x)
L=
σ2
Ph i2
ei + (γ̂0 − γ0 ) + (β̂1 − β1 )(xi − x
=
σ2
2
e2i (β̂1 − β1 )2 (xi − x)2
P P P
n (γ̂0 − γ0 ) 2(γ̂0 − γ0 ) ei
= + + +
σ2 σ2 σ2 σ2
P P
2(β̂1 − β1 ) ei (xi − x) 2(γ̂0 − γ0 )(β̂1 − β1 ) (xi − x)
+ +
σ2 σ2
So far,
" #2
2
(n − 2)Se2 γ̂0 − γ0
P
[yi − γ0 − β1 (xi − x)] β̂1 − β1
= + √ +
σ{z2 2
pP
| } | σ{z } |σ/{z n} σ/ (xi − x)2
2 Xn ? | {z }
X12 X12

Q = Q1 + Q2 + Q3
Let’s use moment generating function to find “?”. Notice that Q1 , Q2 , Q3 are independent why?

MQ (t) = MQ1 +Q2 +Q3


MQ (t) = MQ1 (t) · MQ2 (t) · MQ3 (t)
We have
n
Q ∼ Xn2 =⇒ MQ (t) = (1 − 2t)− 2
1
Q2 ∼ X12 =⇒ MQ2 (t) = (1 − 2t)− 2
1
Q3 ∼ X12 =⇒ MQ3 (t) = (1 − 2t)− 2
−n+2
=⇒ MQ1 (t) = (1 − 2t) 2

(n − 2)Se2 2
=⇒ Q1 = ∼ Xn−2
σ2
Note: If Y ∼ Γ(α, β) then
MY (t) = (1 − βt)−α
and
McY (t) = MY (ct)
Let’s now find the distribution of s2e .
σ2
Se2 = Q1
n−2
σ2
 
MSe2 (t) = M σ2 (t) = MQ1 t
n−2 Q1 n−2
 −n+2
2σ 2
 2

MSe2 (t) = 1− t
n−2
Therefore,
n − 2 2σ 2
 
Se2 ∼Γ ,
2 n−2
2σ 4
ESe2 = σ 2 , var(Se2 ) =
n−2


Another way to show this result is to use the non-centered model


P 2
Yi − β0 − β1 Xi ± β̂0 ± β̂1 Xi
σ2


§5 Lec 5: Oct 6, 2021

§5.1 Distribution Theory Using Non-Centered Model
(n−2)Se2 2
Recall that we want to show σ2 ∼ Xn−2 using the non-centered model Yi = β0 + β1 Xi + εi for
i.i.d
ε1 , . . . , εn ∼ N (0, σ). Then, Yi ∼ N (β0 + β1 Xi , σ). Let
P 2
Yi − β0 − β1 Xi ± β̂0 ± β̂1 Xi
M= ∼ Xn2
σ2
Then,
P 2
yi − β̂0 − β̂1 xi + (β̂0 − β0 ) + (β̂1 − β1 )xi
M=
σ2
e2i 2
(β̂1 − β1 )2 x2i
P P P P
n(β̂0 − β0 ) 2(β̂0 − β0 ) ei 2(β̂1 − β1 ) ei xi
= + + + +
σ2 σ2 σ2 σ2 σ2
P
2(β̂0 − β0 )(β̂1 − β1 ) xi
+
σ2
P 2
n(β̂0 − β0 )2 (β̂1 − β1 )2 x2i
P P
ei 2(β̂0 − β0 )(β̂1 − β1 ) xi
= 2
+ + + (**)
| σ{z } | σ2 σ2 {z σ2 }
2
(n−2)Se ?
σ2

Let D = β̂0 + β̂1 X = Y and consider


2
(β̂1 − β1 )2 (D − (β0 + β1 x))
+ (*)
var(β̂1 ) var(D)
 
σ
Note: β̂1 ∼ N β1 , √P and
(xi −x)2

Yi = β0 + β1 Xi + εi
P P
Yi εi
Y = = β0 + β1 X +
n n
 
So Y ∼ N β0 + β1 X, √σn and thus D−(β 0 +β1 X)

σ/ n
∼ N (0, 1). It follows that each term in (*) follows
chi-square distribution with 1 degree of freedom. Now, we have

(β̂1 − β1 )2 X 2 n(β̂0 − β0 )2 (β̂1 − β1 )2 2 2(β̂0 − β0 )(β̂1 − β1 ) X


(∗) = (x i − x) + + nx + xi
σ2 σ2 σ2 σ2
x2i − nx2

(β̂1 − β1 )2 (β̂1 − β1 )2 nx2
P
n(β̂0 − β0 )2
P
2(β̂0 − β0 )(β̂1 − β1 ) xi
= + + +
σ2 σ2 σ2 σ2
which is equivalent to the last three terms of (**). We just need to show that

cov(Y , β̂1 ) = 0
cov(Y , ei ) = 0
cov(β̂1 , ei ) = 0

Remark 5.1. Under normality, zero covariance implies independence.


§5.2 A Note on Gamma Distribution


Let Q ∼ Γ(α, β). Then

EQ = αβ
var(Q) = αβ 2
MQ (t) = (1 − βt)−α
Γ(α + k)β k
EQk =
Γ(α)

where Z ∞
Γ(α) = xα−1 e−x dx
0
is the Gamma function.
Property:

Γ(α) = (α − 1)Γ(α − 1)
Γ(α + 1) = αΓ(α)

If α is an integer, then
Γ(α) = (α − 1)!
 2

n−2 2σ
Recall that Se2 ∼ Γ 2 , n−2

2σ 4
ESe2 = σ 2 , var(Se2 ) =
n−2
Is Se unbiased estimator of σ?
 1
ESe = E Se2 2
  12
n−2 1 2σ 2
Γ 2 + 2 n−2
= n−2

Γ 2
r    
2 n−1 n−2
=σ Γ /Γ
n−2 2 2
= σA
Se
Thus, it’s biased and we can adjust the result to be unbiased, i.e., A.
If Y ∼ Xn2 , then
n
MY (t) = (1 − 2t)− 2
which is Γ n2 , 2 .


§5.3 Coefficient of Determination
Recall X X X
(yi − y)2 = e2i + (ŷi − y)2
| {z } | {z } | {z }
SST SSE SSR

where Ŷi = y + β̂1 (xi − x). We define R2 as

SSR SSE
R2 = or R2 = 1 −
SST SST


and 0 ≤ R2 ≤ 1. We have
 
var(Ŷi ) = var y + β̂1 (xi − x)
!
2 1 (xi − x)2
=σ +P 2
n (xi − x)

Another way to show this is to express Ŷi as a linear combination of Y1 , . . . , Yn .

Ŷi = y + β̂1 (xi − x)


P
yj X
= + (xi − x) kj yj
n
X1 
= + (xi − x)kj yj
n
X1 2
var(Ŷi ) = σ 2 + (xi − x)kj
n
X 1 2

= σ2 + (x i − x)2 2
kj + (x i − x)kj
n2 n
(xi − x)2
 
1
= σ2 +P
n (xi − x)2

Consider
P X 
X yl X 1
ei = yi − ŷi = yi − y − β̂1 (xi − x) = al yl − − (xi − x) kl yl = al − − (xi − x)kl yl
n n

where (
1, if l = i
al =
0, otherwise


§6 Lec 6: Oct 8, 2021

§6.1 Variance & Covariance Operations


Have
X X  Xn X
n X X
cov ai Yi , bj Yj = ai bj cov(Yi , Yj ) = ai bi cov(Yi , Yi ) = σ 2 ai bi
i=1 j=1

because Y1 , . . . , Yn are independent.

Example 6.1
Consider β̂0 and β̂1
X X 
cov(β̂0 , β̂1 ) = cov li Yi , ki Yj
X
= σ2 li ki
X  1  
2
=σ − ki x ki
n
1X X
= σ2 ki − σ 2 x ki2
n
σ2 x
= −P
(xi − x)2

Or
   
cov β̂0 , β̂1 = cov Y − β̂1 X, β̂1
 
= cov Y , β̂1 − X var(β̂1 )
−xσ 2
=P
(xi − x)2

Example 6.2
Consider Ŷi and Ŷj
   
cov Ŷi , Ŷj = cov y + β̂1 (xi − x), y + β̂1 (xj − x)
σ2 (xi − x)(xj − x) 2
= +0+0+ P σ
n (xi − x)2
 
1 (xi − x)(xj − x)
= σ2 + P
n (xi − x)2

When i = j,
(xi − x)2
 
2 1
var(Ŷi ) = σ +P
n (xi − x)2


Example 6.3 (Cont’d)


Notice that
P
yl X
Ŷi = y + β̂1 (xi − x) = + (xi − x) k l yl
 n
X 1 X
= + (xi − x)kl yl = al yl
n
X
Yˆj = . . . = bv yv
  X
cov Ŷi , Ŷj = σ 2 al bl
X1 
1

2
=σ + (xi − x)kl + (xj − x)kl
n n
 
1 (xi − x)(xj − x)
= σ2 + P
n (xi − x)2

§6.2 Inference
Construct a $1-\alpha$ confidence interval for $\beta_1$: $P(L \le \beta_1 \le U) = 1 - \alpha$. We know
\[ \hat\beta_1 \sim N\!\left(\beta_1,\ \frac{\sigma}{\sqrt{\sum (x_i - \bar x)^2}}\right), \qquad \frac{(n-2)S_e^2}{\sigma^2} \sim \chi^2_{n-2} \]
Consider $\operatorname{cov}(\hat\beta_1, e_i) = 0$. Under normality, since their covariance is 0, $\hat\beta_1$ and $S_e^2$ are independent. Thus,
\[ \frac{\dfrac{\hat\beta_1 - \beta_1}{\sigma/\sqrt{\sum (x_i - \bar x)^2}}}{\sqrt{\dfrac{(n-2)S_e^2}{\sigma^2}\Big/(n-2)}} = \frac{\hat\beta_1 - \beta_1}{S_e/\sqrt{\sum (x_i - \bar x)^2}} \sim t_{n-2} \]
Pivot Method:
\[ P\!\left(-t_{\alpha/2;\,n-2} \le \frac{\hat\beta_1 - \beta_1}{S_e/\sqrt{\sum (x_i - \bar x)^2}} \le t_{\alpha/2;\,n-2}\right) = 1 - \alpha \]
and after some manipulation we get
\[ P\!\left(\hat\beta_1 - t_{\alpha/2;\,n-2}\cdot\frac{S_e}{\sqrt{\sum (x_i - \bar x)^2}} \le \beta_1 \le \hat\beta_1 + t_{\alpha/2;\,n-2}\cdot\frac{S_e}{\sqrt{\sum (x_i - \bar x)^2}}\right) = 1 - \alpha \]
We are $1-\alpha$ confident that
\[ \beta_1 \in \left[\hat\beta_1 \pm t_{\alpha/2;\,n-2}\cdot\frac{S_e}{\sqrt{\sum (x_i - \bar x)^2}}\right] \]
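A minimal helper that computes this interval, assuming x and y are numpy arrays from a simple regression data set (the function name is just illustrative):

```python
import numpy as np
from scipy import stats

def slope_ci(x, y, alpha=0.05):
    """1 - alpha confidence interval for beta1 in Y = beta0 + beta1*X + eps."""
    n = len(y)
    Sxx = np.sum((x - x.mean())**2)
    b1 = np.sum((x - x.mean()) * y) / Sxx
    b0 = y.mean() - b1 * x.mean()
    e = y - b0 - b1 * x
    se2 = np.sum(e**2) / (n - 2)                 # S_e^2
    se_b1 = np.sqrt(se2 / Sxx)                   # Se / sqrt(sum (xi - xbar)^2)
    t = stats.t.ppf(1 - alpha / 2, df=n - 2)
    return b1 - t * se_b1, b1 + t * se_b1
```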


For β̂0 ,  
s
2
1 x
β̂0 ∼ N β0 , σ +P 
n (xi − x)2
and we proceed similarly to obtain
 s 
2
1 x
β0 ∈ β̂0 ± t α2 ;n−2 · Se +P 
n (xi − x)2

Say if we want to construct a confidence interval for β0 − 2β1 :


var(β̂0 − 2β̂1 ) = var(β̂0 ) + 4 var(β̂1 ) − 4 cov(β̂0 , β̂1 )
x2
 
2 1 4 4x
=σ + P + P + P
n (xi − x)2 (xi − x)2 (xi − x)2
(x + 2)2
 
1
= σ2 +P
n (xi − x)2
So, s !
1 (x + 2)2
β̂0 − 2β̂1 ∼ N β0 − 2β1 , σ +P
n (xi − x)2
Thus, the C.I. is " s #
1 (x + 2)2
β0 − 2β1 ∈ β̂0 − 2β̂1 ± t α2 ; n−2 · Se +P
n (xi − x)2

§6.3 Prediction Interval
Prediction interval for $Y_0$ when $X = X_0$. Let's begin with the error of prediction, $Y_0 - \hat Y_0$. We know

• $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
• $Y_0 = \beta_0 + \beta_1 X_0 + \varepsilon_0$
• $\hat Y_0 = \hat\beta_0 + \hat\beta_1 X_0$

So
\[ E(Y_0 - \hat Y_0) = 0, \qquad \operatorname{var}(Y_0 - \hat Y_0) = \operatorname{var}(Y_0) + \operatorname{var}(\hat Y_0) - 2\operatorname{cov}(Y_0, \hat Y_0) = \sigma^2\left(1 + \frac1n + \frac{(x_0 - \bar x)^2}{\sum (x_i - \bar x)^2}\right) \]
We apply the same procedure as in the inference section: with
\[ Y_0 - \hat Y_0 \sim N\!\left(0,\ \sigma\sqrt{1 + \frac1n + \frac{(x_0 - \bar x)^2}{\sum (x_i - \bar x)^2}}\right), \qquad \frac{(n-2)S_e^2}{\sigma^2} \sim \chi^2_{n-2}, \]
\[ \Longrightarrow\; Y_0 \in \hat Y_0 \pm t_{\alpha/2;\,n-2}\, S_e\sqrt{1 + \frac1n + \frac{(x_0 - \bar x)^2}{\sum (x_i - \bar x)^2}} \]
C.I. for $EY_0$ for a given $X = X_0$: with
\[ \hat Y_0 \sim N\!\left(\beta_0 + \beta_1 X_0,\ \sigma\sqrt{\frac1n + \frac{(x_0 - \bar x)^2}{\sum (x_i - \bar x)^2}}\right), \qquad \frac{(n-2)S_e^2}{\sigma^2} \sim \chi^2_{n-2}, \]
\[ \Longrightarrow\; EY_0 \in \hat Y_0 \pm t_{\alpha/2;\,n-2}\cdot S_e\sqrt{\frac1n + \frac{(x_0 - \bar x)^2}{\sum (x_i - \bar x)^2}} \]
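A small sketch combining both intervals at a given $x_0$ (the helper name and flag are mine, not from the notes); the only difference between them is the extra "1 +" inside the square root:

```python
import numpy as np
from scipy import stats

def interval_at_x0(x, y, x0, alpha=0.05, mean_response=False):
    """PI for a new Y0 at X = x0, or (mean_response=True) a CI for E[Y0]."""
    n = len(y)
    Sxx = np.sum((x - x.mean())**2)
    b1 = np.sum((x - x.mean()) * y) / Sxx
    b0 = y.mean() - b1 * x.mean()
    e = y - b0 - b1 * x
    se = np.sqrt(np.sum(e**2) / (n - 2))
    y0_hat = b0 + b1 * x0
    extra = 0.0 if mean_response else 1.0        # the PI carries the extra "1 +"
    half = stats.t.ppf(1 - alpha / 2, n - 2) * se * np.sqrt(extra + 1/n + (x0 - x.mean())**2 / Sxx)
    return y0_hat - half, y0_hat + half
```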


§7 Lec 7: Oct 11, 2021

§7.1 Hypothesis Testing
Consider the model:
Yi = β0 + β1 Xi + εi

Example 7.1
Hypothesis testing examples

H0 : β1 = 0, Ha : β1 6= 0
H0 : β1 = 1, Ha : β1 6= 1
H0 : β0 = 0, Ha : β0 6= 0
H0 : β0 + β1 = 0, Ha : β0 + β1 6= 0
)
β0 = β0∗
H0 : , Ha : not true
β1 = β1∗

Let’s consider the following two-sided test

H0 : β 1 = 0
Ha : β1 6= 0

Recall under H0 ,
 
σ
β̂1 ∼ N 0, √P

 β̂
(xi −x)2
=⇒ t = pP 1 ∼ tn−2
(n−2)Se2 2
 Se / (xi − x)2
∼ Xn−2

σ2

We reject H0 if t > t α2 ; n−2 or t < −t α2 ; n−2 . Using a 1 − α C.I.

Se
β1 ∈ β̂1 ± t α2 ; n−2 pP
(xi − x)2
For example, for −2 ≤ β1 ≤ 2, we do not reject H0 .

p − value = 2P (t > t∗ )

We reject H0 if p-value < α.


Test H0 : β1 = 0 using the F statistics. Under H0 ,
!
σ
β̂1 ∼ N 0, pP
(xi − x)2
β̂ − 0
pP1 ∼ N (0, 1)
σ/ (xi − x)2
Then,
β̂12 (xi − x)2
P
∼ X12
σ2
and we know
(n − 2)Se2 2
∼ Xn−2
σ2


Therefore, we can form the F statistics


β̂12(xi −x)2
P
β̂12 (xi − x)2
P
σ2 /1
(n−2)Se2
= ∼ F1, n−2
/(n − 2) Se2
σ2

Definition 7.2 (F Distribution) — Let U ∼ Xn2 and V ∼ Xm


2
and U, V are independent. Then,
U
n
V
∼ Fn,m
m

We can observe that t2n−2 = F1, n−2 . In general,

Z ∼ N (0, 1)
U ∼ Xn2
Z, U are independent
Z Z 2 /1
p ∼ tn =⇒ ∼ F1, n
U/n U/n

F1,n−2
α

F1−α;1,n−2

Let’s find the expected value of the F statistics.


• Denominator:
ESe2 = σ 2

• Numerator:
X X
E β̂12 (xi − x)2 = (xi − x)2 E β̂12
X  
= (xi − x)2 var(β̂1 + (E β̂1 )2
σ2
X  
= (xi − x)2 P + β 2
1
(xi − x)2
X
= σ 2 + β12 (xi − x)2

Under H0 the ratio is approximately equal to 1. If H0 is not true the ratio is greater than 1.
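A minimal sketch of this F test of $H_0: \beta_1 = 0$ (function name is illustrative); it also returns the t statistic so one can check $t^2 = F$ numerically:

```python
import numpy as np
from scipy import stats

def f_test_slope(x, y):
    """F test of H0: beta1 = 0 in simple regression, with the equivalent t statistic."""
    n = len(y)
    Sxx = np.sum((x - x.mean())**2)
    b1 = np.sum((x - x.mean()) * y) / Sxx
    b0 = y.mean() - b1 * x.mean()
    e = y - b0 - b1 * x
    se2 = np.sum(e**2) / (n - 2)
    F = b1**2 * Sxx / se2                        # ~ F(1, n-2) under H0
    t = b1 / np.sqrt(se2 / Sxx)                  # t^2 equals F
    p = stats.f.sf(F, 1, n - 2)                  # p-value from the F(1, n-2) tail
    return F, t, p
```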


Now, for β̂0 ,


 q 2

β̂0 ∼ N 0, σ n1 + P x
(xi −x)2
 β̂0
=⇒ t = q ∼ tn−2
(n−2)Se2 2 Se 1
+ P x
2
∼ Xn−2

σ2 n (xi −x)2

and consider H0 : β1 = 1(β1 − 1 = 0) and Ha : β1 6= 1 (β1 − 1 6= 0). Then under H0 ,

β̂ − 1
pP1 ∼ N (0, 1)
σ/ (xi − x)2

Test Statistics:
β̂1 − 1
pP ∼ tn−2
Se / (xi − x)2
Using F statistics
(β̂1 − 1)2 (xi − x)2
P
∼ X12
σ2
and thus
(β̂1 − 1)2 (xi − x)2
P
2
∼ F1, n−2
Se


§8 Lec 8: Oct 13, 2021

§8.1 Likelihood Ratio Test
Consider
Yi = β1 Xi + εi
H0 : β1 = 0
Ha : β1 6= 0
We know
!
σ
β̂1 ∼ N 0, pP
x2i
(n − 1)Se2 2
∼ Xn−1
σ2
β̂12 x2i
P
β̂1

So ttest : ∼ tn−1 and Ftest : Se2 ∼ F1,n−1 .
x2i
P
Se /
Likelihood Ratio Test (LRT):

For testing: H0 : β1 = 0
For the model: Yi = β0 + β1 Xi + εi
Show that this LRT is equivalent to the F statistic.
We reject H0 if
L(ŵ)
Λ= <k
L(ω̂)
where L(ŵ) is the maximized likelihood function under H0 and L(ω̂) is maximized likelihood
function under no restrictions. Under H0 : β1 = 0, we have Yi = β0 + εi . The likelihood function is
n 1
P 2
L = (2πσ 2 )− 2 e− 2σ2 (yi −β0 )
n 1 X
ln L = − ln 2πσ 2 − 2 (yi − β0 )2
2 2σ
β̂0 = y
(yi − y)2
P
2
σ̂0 =
n

e2i
P
Under no restriction, the estimates are the MLEs of β0 , β1 , σ 2 which are β̂0 , β̂1 and σ̂12 = n . Back
to LRT, we have
L(ŵ)
Λ=
L(ω̂)
1
(yi −y)2
P
n −
(2πσ02 )− 2 e 2σ02

= − 1
P
e2i
<k
2
(2πσ12 )e 2σ1

Note:
X
(yi − y)2 = nσ02
X
e2i = nσ12


So,
n n
(2πσ̂02 )− 2 e− 2
n n < k
(2πσ̂12 )− 2 e− 2
σ̂12 2
< kn
σ̂02
P 2
ei /n 2
P < kn
(yi − y)2 /n

Notice that
X X X
(yi − y)2 = e2i + (ŷi − y)2
X X X
(yi − y)2 = e2i + β̂12 (xi − x)2

So,

e2i
P
2
< kn
e2i + β̂12 (xi − x)2
P P

1 2
< kn
β̂12 (xi −x)2
P
1+ P 2
ei
2
β̂1 (xi − x)2
P
2
> k− n − 1
(n − 2)Se2
β̂1 (xi − x)2
P  2 
−n
> (n − 2) k − 1 = k0
Se2

We reject H0 if
β̂12 (xi − x)2
P
> k0
Se2
Recall we stated that we reject H0 if Λ = L( ŵ)
L(ω̂) < k. Let’s find k. First, we need α (type I error).
Before that, we know
β̂12 (xi − x)2
P
∼ F1,n−2
Se2
So,
 
P F1,n−2 > k 0 H0 is true = α

§8.2 Power Analysis in Simple Regression
Using the non-central t distribution

Definition 8.1 (Non-central t) — Let Z ∼ N (δ, 1) and U ∼ Xn2 and Z and U are independent.
Then,
Z
p ∼ tn (NCP = δ)
U/n

Back to the t ratio. If H0 is true,


√Pβ̂1
σ/ (xi −x)2
q
(n−2)Se2
σ2 /(n − 2)


follows a central $t_{n-2}$, in which the numerator follows a standard normal distribution. If $H_0$ is not true, then the numerator follows $N\!\left(\frac{\beta_1\sqrt{\sum (x_i - \bar x)^2}}{\sigma},\ 1\right)$. Thus the ratio follows $t_{n-2}$ with noncentrality parameter $\mathrm{NCP} = \frac{\beta_1\sqrt{\sum (x_i - \bar x)^2}}{\sigma}$. Finally, the power is
\[ 1 - \beta = P\big(t_{n-2}(\mathrm{NCP}) > t_{\alpha/2;\,n-2}\big) + P\big(t_{n-2}(\mathrm{NCP}) < -t_{\alpha/2;\,n-2}\big) \]
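A small sketch of this power computation using scipy's noncentral t (the design x, $\beta_1$, and $\sigma$ are whatever the user supplies; the function name is illustrative):

```python
import numpy as np
from scipy import stats

def slope_test_power(beta1, sigma, x, alpha=0.05):
    """Power of the two-sided t test of H0: beta1 = 0, via the noncentral t distribution."""
    n = len(x)
    Sxx = np.sum((x - x.mean())**2)
    ncp = beta1 * np.sqrt(Sxx) / sigma           # noncentrality parameter from the notes
    tcrit = stats.t.ppf(1 - alpha / 2, df=n - 2)
    nt = stats.nct(df=n - 2, nc=ncp)             # noncentral t with n-2 df
    return nt.sf(tcrit) + nt.cdf(-tcrit)

# Example: power grows with |beta1|, with Sxx, and as sigma shrinks
x = np.linspace(0, 10, 25)
print(slope_test_power(beta1=0.3, sigma=2.0, x=x))
```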


§9 Lec 9: Oct 15, 2021

§9.1 Extra Sum of Squares Method


So far, we have learnt several ways for hypothesis testing for, e.g.,

Yi = β0 + β1 Xi + εi
H0 : β1 = 0
Ha : β1 6= 0

which are

1. t statistics
2. F statistics
3. Likelihood ratio test

4. Extra sum of square principle (reduced and full model)

(SSER − SSEF )/(dfR − dfF )


∼ F1,n−2
SSEF /dfF
P 2)
SSEF = ei
dfF = n − 2

Under H0 : β1 = 0 we have a reduced model

Yi = β0 + εi =⇒ β̂0 = y

(yi − y)2 and dfR = n − 1. Thus,


P
Therefore SSER =
P 
(yi − y)2 − e2i / (n − 1 − (n − 2))
P
P 2
ei /(n − 2)

Note: X X X
(yi − y)2 = e2i + β̂12 (xi − x)2
| {z } | {z } | {z }
SST SSE SSR

So,

β̂12 (xi − x)2


P
∼ F1,n−2
Se2
!2
β̂1
pP ∼ t2n−2
Se / (xi − x)2


Example 9.1
Use the extra sum of squares method to test

H0 : β1 = 1
Ha : β1 6= 1

Reduced model: Yi = β0 + xi + εi

Yi − xi = β0 + εi
β̂0 = y − x
X 2
SSER = (yi − xi − (y − x))
X 2
= (yi − y − (xi − x))
X X X
= (yi − y)2 + (xi − x)2 − 2 (xi − x)(yi − y) (*)

Note:
X X X
(yi − y)2 = e2i + β̂12 (xi − x)2
P
(xi − x)(yi − y)
β̂1 = P
(xi − x)2
X X
=⇒ (xi − x)(yi − y) = β̂1 (xi − x)2

So, we have
X X X X
(∗) = (xi − x)2 +
e2i + β̂12 (xi − x)2 − 2β̂1 (xi − x)2
X X
SSER = e2i + (β̂1 − 1)2 (xi − x)2

Test statistics:
(SSER − SSEF )/(dfR − dfF )
SSEF /dfF
P P 
e2i + (β̂1 − 1)2 (xi − x)2 − e2i / (n − 1 − (n − 2))
P
P 2
ei /(n − 2)
2
(xi − x)2
P
(β̂1 − 1)
∼ F1,n−2
Se2

Proof. Under H0 ,   
β̂ ∼ N 1, √P σ
1
(xi −x)2
 (n−2)Se2 2
σ2 ∼ Xn−2
So,
 2
β̂1 −1)
√(P /1
σ/ (xi −x)2
(n−2)Se2
σ2 /(n − 2)
2
(xi − x)2
P
(β̂1 − 1)
∼ F1,n−2
Se2


§9.2 Power Analysis Using Non-Central F Distribution

Definition 9.2 — 1. Y ∼ N (µ, 1) then Y 2 ∼ X12 (θ = µ2 )

2. Suppose Y ∼ N (µ, σ)

Y µ 
∼N ,1
σ σ
Y2 µ2
2
∼ X12 (θ = 2 )
σ σ

MGF of Y ∼ X12 (NCP = θ). Then


− 21 t
MY (t) = (1 − 2t) eθ 1−2t
− 12
If θ = 0 =⇒ MY (t) = (1 − 2t) .
Consider now
i.i.d
Y1 , Y2 , . . . , Yn ∼ N (µ, σ)
Y12 Yn2
Find distribution of Q = σ2 + ... + σ2 .
 n
µ2 t
− 21 σ2 1−2t
MQ (t) = (1 − 2t) e
n nµ2 t
= (1 − 2t)− 2 e σ2 1−2t
P 2
nµ2
 
Yi 2
Q= ∼ X n θ =
σ2 σ2

Non-Central F Distribution: Let U ∼ Xn2 (NCP = θ) and V ∼ Xm


2
. If U, V are independent, then

U/n
∼ Fn,m (NCP = θ)
V /m

Back to simple regression:


!
σ
β̂1 ∼ N β 1 , pP
(xi − x)2
!
β̂1 β1
pP ∼N pP ,1
σ/ (xi − x)2 σ/ (xi − x)2
β̂12 (xi − x)2 β12 (xi − x)2
P  P 
2
∼ X 1 θ =
σ2 σ2
(n − 2)Se2 2
∼ Xn−2
σ2
β̂12 (xi −x)2
P
β12 (xi − x)2
 P 
σ2 /1
(n−2)Se2
∼ F1,n−2 θ =
2 /(n − 2) σ2
σ

Thus,
1 − β = P (F1,n−2 (θ) > F1−α;1,n−2 )


§10 Lec 10: Oct 18, 2021

§10.1 Multiple Regression


Consider:
Yi = β0 + β1 Xi1 + β2 Xi2 + . . . + βk Xik + εi , i = 1, . . . , n
where we have k predictors
      
Y1 1 x11 ... x1k β0 ε1
 Y2  1 x12 ... x2k   β1   ε 2 
 ..  =  .. ..   ..  +  .. 
      
..
 .  . . .  .   . 
Yn 1 xn1 ... xnk βk εn
Y = Xβ + ε

where

Y : n × 1 response vector
X : n × (k + 1) regression matrix
β : (k + 1) × 1 parameter vector
ε : n × 1 error vector

Assumption: Gauss-Markov conditions



E [εi ] = 0, i = 1, . . . , n 

var(εi ) = σ 2 , i = 1, . . . , n =⇒ E [ε] = 0, var(ε) = σ 2 I

ε1 , ε2 , . . . , εn are independent

 
Y1
Let Y =  ...  be a random vector with mean vector
 

Yn

    
Y1 EY1 µ1
 ..   ..   .. 
µ = E [Y] = E  .  =  .  =  . 
Yn EYn µn

and variance covariance matrix


σ12
 
σ12 ... σ1n
 σ21 σ22 ... σ2n 
Σ = E [Y − µ] [Y − µ] =  .
 
.. .. .. 
 .. . . . 
σn1 σn2 ... σn2
 
Y1 − µ1
 Y2 − µ2   
E  Y1 − µ1 Y2 − µ2 ... Yn − µn
 
..
 . 
Yn − µn


 
a1
Properties: Let a =  ...  be a vector of constants and let a0 Y be a linear combination Y. Then
 

an
X
E [a0 Y] = a0 EY = a0 µ = ai µi
0 0
var (a Y) = a Σa

Let A be an m × n matrix of constant and consider AY (m × 1 vector). Then

E [AY] = AEY = Aµ
var (AY) = AΣA0

Using the Gauss-Markov conditions

EY = E [Xβ + ε] = Xβ
var(Y) = var (Xβ + ε) = σ 2 I

Estimation of β using Least Squares:


1. Geometric interpretation of least squares – orthogonal projection

X0 (Y − Xβ) = 0
X0 Y − X0 Xβ = 0
X0 Xβ = X0 Y
−1
β̂ = (X0 X) X0 Y

which is the least squares estimator of β.


2. Minimize the error sum of squares
X
min Q = ε2i

or min Q = ε0 ε but Y = Xβ + ε. Or
0
min Q = (Y − Xβ) (Y − Xβ)

Then,

min Q = Y0 Y − Y0 Xβ − βX0 Y + β 0 X0 Xβ
= Y0 Y − 2Y0 Xβ + βX0 Xβ
∂Q
=0 (*)
∂β

Note: Matrix
 and  vector differentiation:
θ1
Let θ =  ...  and g(θ) be a function of θ. Then
 

θp
 ∂g(θ) 
∂θ1
∂g(θ)  . 
=  .. 

∂θ ∂g(θ)

∂θp


Let g(θ) = c0 θ. Then,


∂g(θ)
=c
∂θ
Let A be a symmetric matrix and consider g(θ) = θ 0 Aθ. Then,

∂g(θ)
= 2Aθ
∂θ
So applying these results to (*), we obtain
\[ -2X'Y + 2X'X\beta = 0 \;\Longrightarrow\; X'X\beta = X'Y, \]
which are the normal equations. Notice that
\[ \hat\beta = (X'X)^{-1} X'Y, \]
which is the OLS estimator of $\beta$.
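A quick numerical sketch of the normal equations (the simulated X, $\beta$, and $\sigma$ below are hypothetical); solving $X'X\beta = X'Y$ directly is numerically preferable to forming $(X'X)^{-1}$ explicitly:

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 100, 3
X0 = rng.normal(size=(n, k))                     # k predictors
X = np.column_stack([np.ones(n), X0])            # regression matrix with intercept column
beta = np.array([1.0, 0.5, -2.0, 0.0])           # hypothetical true coefficients
y = X @ beta + rng.normal(0, 1, n)

# Solve the normal equations X'X beta = X'y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
assert np.allclose(beta_hat, beta_lstsq)         # same answer as a least-squares solver
```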


§11 Lec 11: Oct 20, 2021

§11.1 Multiple Regression (Cont'd)
Recall that

Y = Xβ + ε
E [ε] = 0
var(ε) = σ 2 I

Least squares: X
min ε2i = ε0 ε = (Y − Xβ)0 (Y − Xβ)
Normal Equations:
−1
X0 Xβ = X0 Y =⇒ β̂ = (X0 X) X0 Y
Note that X is not a square matrix, so X0 X has to go together in order for it to be invertible.

X = (1, x1 , x2 , . . . , xk )
0
 
1
10 x 1 10 x 2 10 xk
 
x01  n ...
 0   x1 1 x01 x1
 0
x01 x2 ... x01 xk 
X0 X =  x 2  1 x1 x2 . . . xk =  .
  
.. .. ..
 ..

 ..  . . . 
 . 
x0k 1 x0k x1 ... x0k xk
x0k

which is a symmetric (k + 1) × (k + 1) matrix. We have


X
x1 x1 = x2i1
X
x1 0 x2 = xi1 xi2

Partition X and β

X = 1 X(0)
 
β0
β=
β(0)

Model:
 
Y= 1 X(0) β0 β(0) + ε
Y = β0 1 + X(0) β(0) + ε

Then,

10
 
0

XX= 1 X(0)
X(0) 0
10 X(0)
 
n
=
X(0) 0 1 X(0) 0 X(0)

So  
β̂0
β̂1  
10 X(0) 10 Y
   
β̂0 n
β̂ =  .  = ˆ =
 
 ..  β(0) X(0) 0 1 X(0) 0 X(0) X(0) Y
β̂k


Also,  0
10
  
0 1Y
XY= Y=
X(0) 0 X0(0) Y
Fitted Values:

Ŷi = β̂0 + β̂1 xi1 + β̂2 xi2 + . . . + β̂k xik


 
    β̂0
Ŷ1 1 x11 x12 . . . x1k  
 Ŷ2  1 x21 x22 . . . x2k   β̂1 
..  β̂2 
 
 .  =  ..
  
. .. .. 
 .  . . . .   .. 

.
Ŷn 1 xn1 xn2 . . . xnk
β̂k
or
Ŷ = Xβ̂
or
−1
Ŷ = X (X0 X) X0 Y = HY
−1
where H = X (X0 X) X0 which is n × n “hat” matrix.
Properties of H:

1. $H' = H$ (symmetric), using $(ABC)' = C'B'A'$.

2. $HH = H$ (idempotent): $X(X'X)^{-1}X'X(X'X)^{-1}X' = H$.

3. $\operatorname{tr} H = \operatorname{tr}\big[X(X'X)^{-1}X'\big] = \operatorname{tr}\big[(X'X)^{-1}X'X\big] = \operatorname{tr} I_{k+1} = k + 1$, using the cyclic property of the trace: $\operatorname{tr}(ABC) = \operatorname{tr}(BCA) = \operatorname{tr}(CAB) \ne \operatorname{tr}(BAC)$ in general.

4. $HX = X$, or $H(1, x_1, \dots, x_k) = (1, x_1, \dots, x_k)$.
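A quick numerical check of these four properties with a small simulated design matrix (values are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 20, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
H = X @ np.linalg.inv(X.T @ X) @ X.T             # hat matrix

assert np.allclose(H, H.T)                       # 1. symmetric
assert np.allclose(H @ H, H)                     # 2. idempotent
assert np.isclose(np.trace(H), k + 1)            # 3. trace = k + 1
assert np.allclose(H @ X, X)                     # 4. HX = X
```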
Residuals:

ei = yi − ŷi i = 1, . . . , n
e = y − ŷ
e = y − xβ̂
e = Y − HY
e = (I − H) Y = (I − H) (Xβ + ε)
= (I − H) Xβ + (I − H) ε
= (I − H) ε

Overall, we have two expressions for e

e = (I − H) Y
e = (I − H) ε

Notice that the error sum of squares


0
X
SSE = e2i = e0 e = [(I − H) Y] [(I − H) Y] = Y0 (I − H) Y

or
0
SSE = [(I − H) ε] [(I − H) ε] = ε0 (I − H) ε


Properties of β̂:
h i
−1 −1
E β̂ = E X0 X X0 Y = (X0 X) X0 X |{z}
EY = β

which is unbiased.
h i
−1 −1 −1
var (β) = var (X0 X) X0 Y = (X0 X) X0 σ 2 IX (X0 X)
−1
= σ 2 (X0 X)

which is variance covariance matrix of β̂. Specifically,


 
v00 v01 ... v0k
  v10 v11 ... v1k 
−1
var β̂ = σ 2 (X0 X) = σ2  .
 
.. .. .. 
 .. . . . 
vk0 vk1 ... vkk
 
var β̂0 = σ 2 v00

var(β̂1 ) = σ 2 v11
 
cov β̂1 , β̂2 = σ 2 v12

where
−1
(X0 X) = {vij }i=1,...,n;j=1,...,n


§12 Lec 12: Oct 22, 2021

§12.1 Gauss-Markov Theorem in Multiple Regression
−1
Let β̂ = (X0 X) X0 Y be the least squares estimator of β and ∗
 let b =M Y be an unbiased
∗ 0 −1 0
estimator of β (not the least squares). Let’s define M = M + X X X .
b is unbiased
Eb = β
because
EM∗ Y = β
or
h   i
−1
E M + X0 X X0 Y = β
 
−1
M + (X0 X) X0 Xβ = β
MXβ + β = β
MX = 0

Check var(b). h i
−1
var(b) = var (M∗ Y) = var M + (X0 X) X0 Y

Note:
var(AY) = AΣA0
where var(Y) = σ 2 I. Then,
h ih i
−1 −1
var(b) = σ 2 M + (X0 X) X0 M0 + X (X0 X)
−1 −1
= σ 2 MM0 + σ 2 MX (X0 X) + σ 2 (X0 X) X0 M0
−1 −1
+ σ 2 (X0 X) X0 X (X0 X)
−1
= σ 2 MM0 + σ 2 (X0 X)
= σ 2 MM0 + var(β̂1 )

A matrix B is positive definite if for a non zero vector a

a0 Ba > 0

Aside Note:
var (aY0 ) = a0 Σa > 0
Now, let a be a non zero vector
0
a0 MM0 a = (M0 a) (M0 a)
= q0 q
X
= qi2 > 0
 
Therefore, MM0 is a positive definite matrix and thus var(b) ≥ var β̂ .


§12.2 Gauss-Markov Theorem For a Linear Combination
We have
   
var a0 β̂ = a0 var β̂ a
−1
= σ 2 a0 (X0 X) a
or
 
var a0 β̂0 + a1 β̂1 + a2 β̂2 = a20 var(β̂0 ) + a21 var(β̂1 ) + a22 var(β̂2 ) + 2a0 a1 cov(β̂0 , β̂1 )

+ 2a0 a2 cov(β̂0 , β̂2 ) + 2a1 a1 cov(β̂1 , β̂2 )


0
Let’s compare it to var(a b).
var (a0 b) = a0 var(b)a
h i
−1
= σ 2 a0 MM0 + (X0 X) a
−1
= σ 2 a0 MM0 a + σ 2 a0 (X0 X) a
 
= σ 2 a0 MM0 a + var a0 β̂
 
Thus, var(a0 b) ≥ var a0 β̂ .
Special Case:
 
0
0
 .. 
 
.
 
1
a= 
.
 .. 
 
0
0
var(bi ) ≥ var(β̂i )

§12.3 Review of Multivariate Normal Distribution
i.i.d
Normality assumption: ε1 , . . . , εn ∼ N (0, δ)
ε ∼ Nn 0, σ 2 I


Let Y ∼ Nn (µ, Σ)
1 − 12 − 12 (y−µ)0 Σ−1 (y−µ)
f (y) = n |Σ| e
(2π) 2
Consider
f (ε) = f (ε1 ) · f (ε2 ) . . . f (εn )
)
1 − 12 1 2
I)−1 ε
1 2 = n σ2 I e− 2 ε(σ
f (εi ) = √1 e− 2σ2 Σi (2π) 2
σ 2π
So 1 0
n
f (ε) = (2πσ 2 )− 2 e− 2σ2 ε ε =⇒ ε ∼ Nn (0, σ 2 I)
Joint MGF: Let Y ∼ Nn (µ, Σ). Then
0 0 1 0
MY (t) = Eet Y = et µ+ 2 t Σt
 
t1
 .. 
where t =  . .
tn


Theorem 12.1
Let Y ∼ Nn (µ, Σ) and let A be m × n matrix of constant and c m × 1 vector of constants.
Using the joint mgf

AY ∼ Nm (Aµ, AΣA0 )
AY + c ∼ Nm (Aµ + c, AΣA0 )

Notice that
\[ Y = X\beta + \varepsilon, \qquad \varepsilon \sim N_n(0, \sigma^2 I) \;\Longrightarrow\; Y \sim N_n(X\beta, \sigma^2 I), \]
i.e., $EY = X\beta$ and $\operatorname{var}(Y) = \sigma^2 I$.


§13 Lec 13: Oct 25, 2021

§13.1 Theorems in Multivariate Normal Distribution
Consider: Y ∼ Nn (µ, Σ)
1 − 12 − 12 0
f (y) = n |Σ| e (y − µ) Σ−1 (y − µ)
(2π) 2
0 0 1 0
My (t) = Eet y = et µ+ 2 t Σt
1
Proof. Let Z ∼ N (0, I) and Y = Σ 2 Z + µ. Then, the spectral decomposition of Σ is
 
λ1 0
Σ = PΛP0 , Λ=
 .. 
. 
0 λn
1 1
Σ 2 = PΛ 2 P0

So,
1 2
MZi (ti ) = Eeti zi = e 2 ti
0
MZ (t) = Eet z = Eet1 z1 +...+tn zn
= Eet1 z1 · Eet2 z2 . . . Eetn zn
1 0
= e2t t
MY (t) = M 1 (t)
Σ 2 Z+µ
 1 
t0 Σ 2 Z+µ
= Ee
 1 0
t0 µ Σ2 t Z
=e Ee
1
Let t∗ = Σ 2 t. Then
0 ∗0
MY (t) = et µ Eet Z

0 0 1 ∗0
= et µ MZ (t∗ ) = et µ e 2 t T BA

0 1 0
= et µ+ 2 t Σt

Theorem 13.1
Let A be m × n matrix of constants and C be m × 1 vector of constants. Then

AY + C ∼ Nm (Aµ + C, AΣA0 )
AY ∼ Nm (Aµ, AΣA0 )

Proof. We have
0
MAY+C (t) = Eet (AY+C)
0 0 0
= et C · Ee(A t) Y


Let t∗ = A0 t. Then
0
MAY+C (t) = et C · MY (t∗ )
0 ∗0 0
µ+ 21 t∗ Σt∗
= et C · et
0 1 0 0
= et (Aµ+C)+ 2 t AΣA t

Thus, AY + C ∼ Nm (Aµ + C, AΣA0 ).

Theorem 13.2
Let Y Nn (µ, Σ),      
Q1 µ1 Σ11 Σ12
Y= , µ= , Σ=
Q2 µ2 Σ21 Σ22
Note that 

Y1
 Y2 
 
 
Y=
 
Y
 3 

 Y4 
Y5
Then,

Q1 ∼ Np (µ1 , Σ11 )
Q2 ∼ Nn−p (µ2 , Σ22 )

Proof. Use the above theorem 


Q1 = Ip 0 Y = AY
Then,

EQ1 = EAY
 
 µ1
= I 0
µ2
= µ1
var (Q1 ) = var (AY)
  
 Σ11 Σ12 I
= AΣA0 = I 0
Σ21 Σ22 00
= Σ11

 √ 
If A = a0 (row vector), then a0 Y ∼ N a0 µ, a0 Σa .


Theorem 13.3  
Q1
Independence for Y =
Q2
 
µ1
Y ∼ N (µ, Σ) , µ =
µ2
 
Σ11 Σ12
Σ=
Σ21 Σ22

Then, the MGF is


0 1 0
MY (t) = et µ+ 2 t Σt
0 0 1 0 1 0 0
= et1 µ1 +t2 µ2 + 2 t1 Σ11 t1 + 2 t2 Σ22 t2 +t1 Σ12 t2

If Σ12 = 0, then
0 1 0 0 1 0
MY (t) = et1 µ1 + 2 t1 Σ11 t1 et2 µ2 + 2 t2 Σ22 t2
= MQ1 (t1 ) · MQ2 (t2 )

Thus, Q1 , Q2 are independent ⇐⇒ cov (Q1 , Q2 ) = 0.

For AY, BY, we have


cov (AY, BY) = AΣB0

Theorem 13.4
We have
Q1 Q2 ∼ Np µ1 + Σ12 Σ−1 −1

22 (Q2 − µ2 ) , Σ11 − Σ12 Σ22 Σ21

Back to multiple regression


ε ∼ N 0, σ 2 I

Y = Xβ + ε,
Y ∼ N Xβ, σ 2 I


Then, the likelihood function is


1 − 1 2 −1
σ 2 I 2 e− 2 (y−xβ)(σ I) (y−xβ)
1
L = f (y) = n
(2π) 2
or
n 1 2 0
L = (2πσ 2 )− 2 e− 2 σ (y−xβ) (y−xβ)
n 1 0
ln L = − ln 2πσ 2 − 2 (y − xβ) (y − xβ)
2 2σ
Thus for β,
∂ ln L −1
= 0 =⇒ β̂ = (X0 X) X0 Y
∂β
and estimation for σ 2
∂ ln L µ 1 0
= − 2 + 4 (y − xβ) (y − xβ) = 0
∂σ 2 2σ 2σ
 0  
Y − Xβ̂ Y − Xβ̂ e0 e
σ̂ 2 = =
n n


Now, e = (I − H) Y = (I − H)ε. Therefore,

e0 e = Y0 (I − H) Y = ε0 (I − H) ε
e0 e
σ̂ 2 =
n
Y0 (I − H)Y
=
n
ε0 (I − H) ε
=
n


§14 Lec 14: Oct 27, 2021

§14.1 Mean and Variance in Multivariate Normal Distribution
Consider
Y = Xβ + ε
ε ∼ Nn 0, σ 2 I


=⇒ Y ∼ Nn Xβ, σ 2 I


Joint pdf of Y is
− n2 1 0
f (y) = 2πσ 2 e− 2σ2 (y−xβ) (y−xβ)
Using the method of maximum we obtain the MLEs of β and σ 2
−1
β̂ = (X0 X) X0 Y
which is the same as the least squares estimator. And
 0  
y − xβ̂ y − xβ̂ e0 e
σ̂ 2 = =
n n
Note that e = (I − H)Y or e = (I − H)ε. Therefore,
e0 e = Y0 (I − H)Y or e0 e = ε0 (I − H)ε
So
1 0
E σ̂ 2 = Ee e
n  
1  0
= E ε (I − H)ε
n | {z }
scalar
1
= E [tr(I − H)εε0 ]
n
1
= tr [E (I − H) εε0 ]
n
1
= tr [(I − H) E (εε0 )]
n
Note:
0
Σ = E (Y − µ) (Y − µ)
E [YY0 ] = Σ + µµ0
where
E(ε) = 0
var(ε) = σ 2 I
Then,
1 
E σ̂ 2 = tr (I − H) σ 2 I + 000

n
1
= tr (I − H) σ 2 I
n
σ2
= tr (I − H)
n


Let’s compute tr (I − H).

tr (I − H) = tr(I) − tr(H)
h i
−1
= tr(I) − tr X (X0 X) X0
h i
−1
= tr(I) − tr (X0 X) X0 X
= tr(In ) − tr(Ik+1 )
=n−k−1

So,
n−k−1
E σ̂ 2 = σ 2
n
which is biased. Therefore, the unbiased estimator of σ 2 is

n e0 e n e0 e
Se2 = σ̂ 2 = =
n−k−1 n n−k−1 n−k−1
In simple regression (k = 1 – one predictor)

e0 e
P 2
ei
Se2 = =
n−2 n−2

Now, let’s find the mean and variance of Ŷ and e.

Ŷ = HY
E Ŷ = HEY
= HXβ
= Xβ

Note: HX = X.
 
var Ŷ = var (HY)
= σ2 H

For e,

Ee = E [(I − H)Y]
= E [Y − HY]
= Xβ − Xβ
=0
var(e) = var [(I − H)Y]
= σ 2 (I − H)

§14.2 Independent Vectors in Multiple Regression
If Y ∼ Nn (µ, Σ), then AY and BY are independent iff

cov (AY, BY) = AΣB0 = 0

Apply this result for multiple regression


   
cov Ŷ, e , cov β̂, e


or use      
Ŷ HY H
= = Y = AY
e (I − H)Y I−H

Y ∼ Nn Xβ, σ 2 I .

var (AY) = A var(Y)A0


 
2 H 
=σ H I−H
I−H
 
2 H 0

0 I−H

Ŷ and e are independent. Similarly, we can show that β̂ and e are independent.

§14.3 Partial Regression
Consider 
X = X1 X2
with the following three models
−1
Y = X1 β 1 + ε =⇒ β̂ 1 = (X01 X1 ) X01 Y
−1
Y = X2 β 2 + ε =⇒ β̂ 2 = (X02 X2 ) X02 Y

and
Y = Xβ + ε or Y = X1 β 1 + X2 β 2 + ε


§15 Lec 15: Oct 29, 2021

§15.1 Partial Regression (Cont'd)
Normal equation:
X0 Xβ̂ = X0 Y
using
X01
   
0 β̂ 12
X = and β̂ =
X02 β̂ 21
Then,

X01 X01 X1 X01 X2


   
0

XX= X1 X2 =
X02 X02 X1 X02 X2

and
X01
   0 
0 X1 Y
XY= Y=
X02 X02 Y
and the normal equations are
 0
X01 X2
   0 
X1 X1 β̂ 12 X1 Y
=
X02 X1 X02 X2 β̂ 21 X02 Y

Then,

X01 X1 β̂ 12 + X01 X2 β̂ 21 = X01 Y (1)


X02 X1 β̂ 12 + X02 X2 β̂ 21 = X02 Y (2)

From (1),
X01 X1 β̂ 12 = X01 Y − X01 X2 β̂ 21
So,
−1 −1 0
β̂ 12 = (X01 X1 ) X01 Y − (X01 X1 ) X1 X2 β̂ 21 (3)
| {z }
β̂ 1

Let’s find β̂ 21 by substitute (3) into (2).


h i 0
−1 −1
X02 X1 (X01 X1 ) X01 Y − (X01 X1 ) X01 X2 β̂ 21 + X2 X2 β̂ 21 = X02 Y
−1 −1
X02 X1 (X01 X1 ) X01 Y − X02 X1 (X01 X1 ) X01 X2 β̂ 21 + (X02 X2 ) β̂ 21 = X2 0 Y

Then,
 
−1 −1
X02 X2 β̂ 21 − X02 X1 (X01 X1 ) X01 X2 β̂ 21 = X02 Y − X02 X1 (X01 X1 ) X01 Y
h i h i
−1 −1
X02 I − X1 (X01 X1 ) X01 X2 β̂ 21 = X02 I − X1 (X01 X1 ) X01 Y

X02 [I − H1 ] X2 β̂ 21 = X02 [I − H1 ] Y
X02 (I − H1 ) (I − H1 ) X2 β̂ 21 = X02 (I − H1 ) (I − H1 ) Y
0 0
[(I − H) X2 ] [(I − H) X2 ] β̂ 21 = [(I − H) X2 ] [(I − H1 ) Y]

Note:
(I − H1 ) Y = Y∗


which is residuals from regression of Y on X1 . Suppose



X2 = x3 x4 x5
Here k = 5 and 
X= 1 x1 x2 x3 x4 x5
where  
X1 = 1 x1 x2 , X2 = x 3 x4 x5
Then,

(I − H1 ) X2 = (I − H1 ) x3 x4 x5
 
= (I − H1 ) x3 (I − H1 ) x4 (I − H1 ) x5
= X2 ∗

So,  
0 0
X∗2 X∗2 β̂ 21 = X∗2 Y∗
and thus  0 −1 0
β̂ 21 = X∗2 X∗2 X∗2 Y∗
Special Case 1:

X = 1 X(0)
 
β0
β=
β (0)

Now, let’s use partial regression to find β̂ (0) .


Regression Y on 1: Y = β0 1 + ε and
 
  y1 − y
1
Y∗ = (I − H1 ) Y = I − 1 (10 1) 10 Y = I − 110 Y =  ... 
h i
−1  
n
yn − y
X∗(0) regress X(0) on 1

X∗(0) = (I − H1 ) X(0)
 
1
= I − 110 X(0)
n
    
1 1
= I − 110 x1 , . . . , I − 110 xk
n n
 
x11 − x1 . . . x1k − xk
 x21 − x1 . . . x2k − xk 
=
 
.. .. 
 . . 
xn1 − x1 ... xnk − xk
 
β1
 .. 
Finally, to estimate the vector of the slopes β (0) = . 
βk
 
  x11 − x1 ... x1k − xk
y1 − y  x21 − x1 ... x2k − xk 
We regress  ...  on
   
 .. .. 
 . . 
yn − y
xn1 − x1 ... xnk − xk


 0 −1 0
to get β̂ (0) = X∗(0) X∗(0) X∗(0) Y∗ where
 
1 0
X∗(0) = I−
11 X(0)
n
 
1
Y∗ = I − 110 Y
n


§16 Lec 16: Nov 1, 2021

§16.1 Partial Regression (Cont'd)
Consider:
Y = Xβ + ε
Then,
−1
β̂ = (X0 X) X0 Y
The partial regression of Y∗ on X∗2
 0 −1 0
β̂ 21 = X∗2 X∗2 X∗2 Y∗

i.e., Y∗ = X∗2 β 2 + ε.
Special Case 2: Begin with
Y = Xβ + ε
with k predictors. Then, we add an extra predictor Z. The new model is

Y = Xβ + cZ + ε

Use partial regression to estimate c.


1. Regress Y on X → e residuals
2. Regress Z on X → Z∗ residuals.
3. Regress e on $Z^*$ to get
\[ \hat c = \big(Z^{*\prime} Z^{*}\big)^{-1} Z^{*\prime} e
\quad\text{or}\quad
\hat c = \frac{Z^{*\prime} e}{Z^{*\prime} Z^{*}} = \frac{e' Z^{*}}{Z^{*\prime} Z^{*}} \]
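A numerical sketch of this three-step residual-on-residual procedure (simulated data; helper names are mine), checking that $\hat c$ matches the coefficient of Z from fitting the full model directly:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])   # existing predictors (with intercept)
z = rng.normal(size=n)                                        # extra predictor Z
y = X @ np.array([1.0, 2.0, -1.0]) + 0.7 * z + rng.normal(0, 1, n)

def resid(A, v):
    """Residuals from regressing v on the columns of A."""
    return v - A @ np.linalg.lstsq(A, v, rcond=None)[0]

e = resid(X, y)                              # step 1: regress Y on X
z_star = resid(X, z)                         # step 2: regress Z on X
c_hat = (z_star @ e) / (z_star @ z_star)     # step 3: regress e on Z*

# Same coefficient as fitting the full model [X, Z] directly
full = np.linalg.lstsq(np.column_stack([X, z]), y, rcond=None)[0]
assert np.isclose(c_hat, full[-1])
```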
Change in the error sum of squares when a new predictor is added in the model

Y = Xβ + ε (1)
Y = Xβ + cZ + ε (2)

Residuals using (1)


e = Y − Xβ̂
Residuals using (2)
u = Y − Xδ̂ − ĉZ
Now, we need to find δ̂

Y = Xβ + cZ + ε
 
 β
Y= X Z +ε
c
 
β
Y=w +ε
c
Y = wη + ε

Normal equations:
w0 wη = w0 Y


or

X0 Xδ̂ + X0 Zĉ = X0 Y (1)


0 0 0
Z Xδ̂ + Z Zĉ = Z Y (2)

From (1)
−1
δ̂ = (X0 X) [X0 Y − X0 Zĉ]
or
−1
δ̂ = β̂ − (X0 X) X0 Zĉ
Now back to u
−1
u = Y − X β̂ + X (X0 X) X0 Zĉ − ĉZ
 
−1
u = e − I − X (X0 X) X0 Zĉ
= e − [I − H] Zĉ
= e − Z∗ ĉ

The SSE is

SSEXZ = u0 u
0
= (e − Z∗ ĉ) (e − Z∗ ĉ)
0 0
= e0 e − 2Z∗ eĉ + Z∗ Z∗ ĉ2
0 0
= e0 e − 2ĉ2 Z∗ Z∗ + Z∗ Z∗ ĉ2
0
= e0 e − Z∗ Z∗ ĉ2

Thus, we can conclude that adding a new predictor would never increase SSE, i.e., u0 u ≤ e0 e. Note
that the new R2 is

2 u0 u
RXZ =1−
SST
0
e0 e Z∗ Z∗ ĉ2
=1− +
SST SST
∗0 ∗ 2
2 Z Z ĉ
= RX +
SST
2 2
So, RXZ ≥ RX .

§16.2 Partial Correlation
Consider
Yi = β0 + β1 Xi1 + β2 Xi2 + εi
where 
Yi : income

Xi1 : age

Xi2 : number of years of education

• Regress Y on X1 → Y∗ residuals.
• Regress X2 on X1 → X∗2 residuals.


2 cov2 (Y∗ , X∗2 )


rYX =
2 |X1
var(X2 ∗ ) var(Y∗ )
hP  ∗
 ∗
i2
Y1∗ − Y ∗
Xi2 − X 2 /(n − 1)
=
(Yi∗ −Y ∗ )
2 2
(Xi2 2 )
∗ −X ∗
P P

n−1 n−1
P ∗ 2
( Y Xi2 )
= P ∗2i P
( X2 ) ( Yi∗2 )
 0 2
Y∗ X∗2
= 0
X∗2 X∗2 (Y∗0 Y∗ )


Another method:
• Regress Y on X1 , X2 , . . . , Xk−1 → Y∗ .
• Regress Xk on X1 , X2 , . . . , Xk−1 → X2k .
SSE (Y on X1 , . . . , Xk−1 ) − SSE (Y on X1 , . . . , Xk )
rY2 Xk |X1 ,...,xk−1 =
SSE (Y on X1 , . . . , Xk−1 )


§17 Lec 17: Nov 3, 2021

§17.1 Constrained Least Squares


Consider
Y = Xβ + ε
We want to estimate β subject to a set of linear constraints of the form cβ = γ where C : m × k + 1,
β : k + 1 × 1 and γ : m × 1.
Suppose k = 4 (
β0 + 2β1 − 3β2 + 5β3 − β4 = 5
2β0 − β1 + β2 + 3β3 = 10
or  
β0
  β1   
1 2 −3 5 −1  β2  = 5

2 −1 1 3 0    10
β3 
β4
We still minimize $(Y - X\beta)'(Y - X\beta)$, but now subject to $c\beta = \gamma$.
Method of Lagrange Multipliers:
0
min Q = (Y − Xβ) (Y − Xβ) + 2λ0 (cβ − γ)
So,
∂Q
= −2X0 Y + 2X0 Xβ + 2cλ = 0
∂β
Solve for β to get β̂c
−1
β̂ c = (X0 X) [X0 Y − c0 λ]
−1
β̂ c = β̂ − (X0 X) c0 λ
Now, we need to find λ. So
−1
cβ̂ c = cβ̂ − c (X0 X) c0 λ
−1
γ = cβ̂ − c (X0 X) c0 λ
h i−1  
−1
λ = c (X0 X) c0 cβ̂ − γ
Therefore,
h i−1  
−1 −1
β̂ c = β̂ − (X0 X) c0 c (X0 X) c0 cβ̂ − γ
Fitted values:
h i−1  
−1 −1
Ŷc = Xβ̂ c = Xβ̂ − X (X0 X) c0 c (X0 X) c0 cβ̂ − γ
Residuals:
h i−1  
−1 −1
ec = Y − Ŷc = e + X (X0 X) c0 c (X0 X) c0 cβ̂ − γ
Error sum of squares:
0
SSEc = ec ec
 0  
−1 0 −1 −1 0 −1
h i  h i 
0 −1 0 0 0 −1 0 0
= e + X (X X) c c (X X) c cβ̂ − γ e + X (X X) c c (X X) c cβ̂ − γ

= e0 e + e0 X [. . .] + [. . .] X0 e
 0 h i−1 h i−1  
−1 −1 −1 −1
+ cβ̂ − γ c (X0 X) c0 c (X0 X) X0 X (X0 X) c0 c (X0 X) c0 cβ̂ − γ


Finally,
\[ e_c' e_c = e'e + \big(c\hat\beta - \gamma\big)' \big[c\, (X'X)^{-1} c'\big]^{-1} \big(c\hat\beta - \gamma\big) \]
We can deduce that $SSE_c \ge SSE$.
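A small numerical sketch of the constrained estimator (the constraint $\beta_1 - \beta_2 = 0$ and the simulated data are hypothetical), checking that the constraint holds exactly and that $SSE_c \ge SSE$:

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 50, 4                                     # p = k + 1 coefficients
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = X @ np.array([1.0, 2.0, 3.0, 4.0]) + rng.normal(0, 1, n)

C = np.array([[0.0, 1.0, -1.0, 0.0]])            # constraint: beta1 - beta2 = 0
g = np.array([0.0])

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y                            # unconstrained beta_hat
A = XtX_inv @ C.T @ np.linalg.inv(C @ XtX_inv @ C.T)
b_c = b - A @ (C @ b - g)                        # constrained estimator

assert np.allclose(C @ b_c, g)                   # the constraint holds exactly
sse   = np.sum((y - X @ b)**2)
sse_c = np.sum((y - X @ b_c)**2)
assert sse_c >= sse                              # SSE_c >= SSE
```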


MLE of σ 2
e0 e
σ̂ 2 =
n
For the constrained model 0
ec ec
σ̂c2 =
n
and
(n − k − 1 + m)σ 2
E σ̂c2 =
n
Method Using the Canonical Form of the Model:

cβ = γ
 
 β1
c = c1 c2 , β =
β2
c1 β 1 + c2 β 2 = γ
 
 β
−3 5 −1  2 
     
1 2 β0 5
+ β3 =
2 −1 β1 1 3 0 10
β4

Back to the model using the same partition we get

Y = X1 β 1 + X2 β 2 + ε

Then,

Y = X1 c−1
1 [γ − c2 β 2 ] + X2 β 2 + ε
Y − X1 c1 γ = X2 − X1 c−1
−1

1 c2 β 2 + ε
Yr = X2r β 2 + ε

which is the same form as Y = Xβ + ε. Thus,


 0 −1 0
β̂ 2c = X2r X2r X2r Yr

and therefore,  
β̂1c = c−1
1 γ − c 2 β̂ 2c

Overall,
h i−1  
−1 −1
β̂ c = β̂ − (X0 X) c0 c (X0 X) c0 cβ̂ − γ
or  
β̂
β̂ c = 1c
β̂ 2c

which is from canonical form. Next, let’s find the mean and variance of β̂ c .

E β̂ c = β

Notice that  
−1 0 −1
h i
0 −1 0 0
β̂ c = I − (X X) c c (X X) c c β̂ + const = Aβ̂


So
−1
var(β̂ c ) = σ 2 A (X0 X) A0
or using the canonical model
  
var(β̂1c ) cov β̂ 1c , β̂ 2c
var(β̂ c ) =    
cov β̂ 1c , β̂ 2c var(β̂ 2c )


§18 Lec 18: Nov 5, 2021
Consider:

Y = Xβ + ε
i.i.d
ε1 , . . . , εn ∼ N (0, σ)
ε ∼ Nn 0, σ 2 I



Then, Y ∼ Nn Xβ, σ 2 I
−1
β̂ = (X0 X) X0 Y
 
−1
β̂ ∼ Nk+1 β, σ 2 (X0 X)

β̂ 1 ∼ N (β1 , σ v11 )
 
v00 v01 . . . v0k
v10 v11 . . . v1k 
−1
(X0 X) =  .
 
.. .. 
 .. . . 
vk1 vk2 ... vkk

§18.1 Quadratic Forms of Normally Distributed Random Variables
We have

a) Z ∼ Nn (0, I)
i.i.d
Z1 , . . . , Zn ∼ N (0, 1)
Zi2 ∼ X12
X
Zi2 ∼ Xn2
Z0 Z ∼ Xn2

b) Z ∼ Nn 0, σ 2 I . Then,

Zi ∼ N (0, σ)
Zi
∼ N (0, 1)
σ
Zi2
2
∼ X12
σ
P 2
Zi
∼ Xn2
σ2
Z
∼ Nn (0, I)
σ
Z0 Z
∼ Xn2
σ2
In multiple regression,

ε ∼ Nn 0, σ 2 I


ε0 ε
∼ Xn2
σ2



c) Y ∼ Nn µ, σ 2 I

Yi − µi
Yi ∼ N (µi , σ) =⇒ ∼ N (0, 1)
σ
 2
Yi − µi
∼ X12
σ
X  Yi − µi 2
∼ Xn2
σ
0
(Y − µ) (Y − µ)
∼ Xn2
σ2
In multiple regression

Y ∼ Nn Xβ, σ 2 I

0
(Y − Xβ) (Y − Xβ)
∼ Xn2
σ2
1
d) Y ∼ Nn (µ, Σ), use V = Σ− 2 (Y − µ). Σ is symmetric matrix

Σ = PΛP0
 
λ1 0
 λ2 
Λ=
 
.. 
 . 
0 λn

where |Σ − λI| = 0. If x is a new zero vector such that Σx = λx, we say that x is an
eigenvector of Σ. Normalize the eigenvectors so that they have length 1

(λ1 , e1 ), (λ2 , e2 ), . . . , (λn , en )


e0i ej = 0, e0i ei = 1

P = e1 . . . en
PP0 = I
Σ = PΛP0 = λ1 e1 e01 + λ2 e2 e02 + . . . + λn en e0n

Result:
1 1
Σ− 2 = PΛ− 2 P0
Properties:
 1
0 1
Σ− 2 = Σ− 2
1 1
Σ− 2 Σ− 2 = Σ−1
1 1
Σ 2 = PΛ 2 P0
1 1
Σ 2 = PΛ 2 P0
 1 0 1
Σ2 = Σ2
1 1
Σ2 Σ2 = Σ


Back to the transformation


1
V = Σ− 2 (Y − µ)
1
EV = Σ− 2 E (Y − µ) = 0
h 1
i
var(V) = var Σ− 2 (Y − µ)
h 1 1
i
= var Σ− 2 Y − Σ− 2 µ
 1

= var Σ− 2 Y
1 1
= Σ− 2 var(Y)Σ− 2
1 1
= Σ− 2 ΣΣ− 2
=I
1
So, V ∼ Nn (0, I). Then V0 V ∼ Xn2 and because V = Σ− 2 (Y − µ), it follows that
 1
  1
 1 1
0 0
Σ− 2 (Y − µ) Σ− 2 (Y − µ) = (Y − µ) Σ− 2 Σ− 2 (Y − µ) ∼ Xn2

Therefore,
0
(Y − µ) Σ−1 (Y − µ) ∼ Xn2
In multiple regression  
−1
β̂ ∼ Nk+1 β, σ 2 (X0 X)
1
 
We want to create a X 2 random variable using the distribution of β̂. Let V = (X0 X) 2 β̂ − β .

1
 
EV = (X0 X) 2 E β̂ − β = 0
1
h  i
var(V) = var (X0 X) 2 β̂ − β
1 1
= (X0 X) 2 var(β̂) (X0 X) 2
1 1
−1
= σ 2 (X0 X) 2 (X0 X) (X0 X) 2
= σ2 I

We have so far

V ∼ Nk+1 0, σ 2 I


V0 V 2
∼ Xk+1
σ2
 0  
β̂ − β X0 X β̂ − β
2
∼ Xk+1
σ
Summary:
( (Y−Xβ)0 (Y−Xβ)
σ2 ∼ Xn2
0 0
(β̂−β) X X(β̂−β) 2
σ2 ∼ Xk+1

(n−k−1)Se2 2
Problem 18.1. Show that σ2 ∼ Xn−k−1
Proof. Have  0  
Y − Xβ ± Xβ̂ Y − Xβ ± Xβ̂
∼ Xn2
σ2


Rearrange and expand


  0       0
e + X β̂ − β e + X β̂ − β ee 0 e0 X β̂ − β β̂ − βX0 e
= 2 + +
σ2 σ σ2 σ2
 0  
β̂ − β X0 X β̂ − β
+
σ2
 0  
0
ee0 β̂ − β X X β̂ − β
= 2 +
σ σ2
e0 e
Note: Se2 = n−k−1 =⇒ e0 e = (n − k − 1)Se2
 0  
(n − k − 1)S 2 β̂ − β X0 X β̂ − β
0 e
(Y − Xβ) (Y − Xβ) /σ 2 = +
| {z } σ2 σ{z2
∼X 2
| }
n 2
∼Xk+1

We know cov(β̂, e) = 0.

Q = Q1 + Q2
MQ (t) = MQ1 (t) · MQ2 (t)
MQ (t)
MQ1 (t) =
MQ2 (t)
n
(1 − 2t)− 2
= k+1
(1 − 2t)− 2

n−k−1
= (1 − 2t)− 2

(n−k−1)Se2 2
So, Q1 = σ2 ∼ Xn−k−1 .
In simple regression, k = 1,

(n − 2)Se2 2
∼ Xn−2
σ2
σ2
Se2 = Q1
n−k−1
So,

MSe2 (t) = M σ2 (t)


n−k−1 Q1

σ2 t
 
= MQ1
n−k−1
− n−k−1
2σ 2 t
 2

= 1−
n−k−1
 
n−k−1 2σ 2
Thus, Se2 ∼ Γ 2 , n−k−1

ESe2 = σ 2
2σ 4
var(Se2 ) =
n−k−1


§19 Lec 19: Nov 8, 2021

§19.1 Quadratic Forms and Their Distribution – Overview
1. Z ∼ N (0, I)
Z0 Z ∼ Xn2

2. Z ∼ N 0, σ 2 I
Z0 Z
∼ Xn2
σ2
and
ε0 ε
∼ Xn2
σ2

3. Y ∼ Nn µ, σ 2 I
0
(Y − µ) (Y − µ)
∼ Xn2
σ2
or 0
(Y − Xβ) (Y − Xβ)
∼ Xn2
σ2
4. Y ∼ N (µ, Σ). From the spectral decomposition,
1
V = Σ− 2 (Y − µ)

Then,

V ∼ Nn (0, I)

From 1), V0 V ∼ Xn2 or


0
(Y − µ) Σ−1 (Y − µ) ∼ Xn2
 
−1
β̂ ∼ Nk+1 β, σ 2 (X0 X)

1
 
V = (X0 X) 2 β̂ − β
V ∼ Nk+1 0, σ 2 I


From 2),
V0 V 2
∼ Xk+1
σ2
Finally,  0  
β̂ − β X0 X β̂ − β
2
∼ Xk+1
σ2
Also, recall that we showed in last lecture

(n − k − 1) Se2 2
∼ Xn−k−1
σ2


§19.2 Another Proof of Quadratic Forms and Their Distribution
1. Let Y ∼ Nn (0, I) and Z = P0 Y where P is an orthogonal matrix where P0 P = I. Then,
Z ∼ Nn (0, I).

2. Let A be a symmetric and idempotent matrix. Then the eigenvalues are 0 or 1.

Proof. Have Ax = λx. Multiply both sides by x0

x0 Ax = λx0 x
x0 AAx = λx0 x
0
(Ax) (Ax) = λx0 x
λ2 x0 x = λx0 x

Therefore, λ = 0 or λ = 1.
Question 19.1. How many 1’s?

Using the trace of A,

tr A = tr (PΛP0 )
= tr (ΛPP0 )
= tr Λ

3. Let Y ∼ N (0, I) and suppose A is a symmetric and idempotent matrix. Then Y0 AY ∼ X12
where r = tr (A) (number of eigenvalues equal to 1).

A = PΛP0 =⇒ Y0 AY = Y0 PΛP0 Y = Z0 ΛZ from 1)

Then,
Y0 AY = z12 + z22 + . . . + zr2 ∼ Xr2
where Z ∼ Nn (O, I) =⇒ zi ∼ N (0, 1), and so zi2 ∼ X12
(n−k−1)Se2 2
4. Use the previous theorem (3.) to show that σ2 ∼ Xn−k−1

e0 e
Se2 = =⇒ e0 e = (n − k − 1)Se2
n−k−1
e0 e 2
WTS: σ2 ∼ Xn−k−1

Proof. Have )
e = (I − H) Y
=⇒ e = (I − H) ε
Y = Xβ + ε
Therefore,
e0 e ε0 (I − H) ε ε ε 0

2
= 2
= (I − H) = ε∗ (I − H) ε∗
σ σ σ σ
where ε ∼ N (0, I). Using the theorem above (3.), we conclude that

0 (n − k − 1)Se2
ε∗ (I − H) ε∗ = 2
∼ Xtr(I−H) = Xn−k−1
σ2


§19.3 Efficiency of Least Squares Estimators


Let θ̂ be an unbiased estimator of θ. Then,
  1
var θ̂ ≥
nI(θ)

This is known as the Cramer-Rao Lower Bound. Recall the score function

∂ ln f (x; θ)
S=
∂θ
and the information matrix
2
E∂ 2 f (x; θ)

∂ ln f (x; θ)
I(θ) = E =− = var(S)
∂θ ∂θ2

and nI(θ) is the information in the sample. An estimator is efficient if


• It is unbiased
• its variance is equal to the Cramer-Rao lower bound.

Also,
∂ 2 ln L
I(θ) = −E
∂θ2
for Y1 , . . . , Yn i.i.d


§20 Lec 20: Nov 10, 2021

§20.1 Information Matrix and Efficient Estimator
Let $Y_1, Y_2, \dots, Y_n \overset{\text{i.i.d.}}{\sim} N(\mu, \sigma^2)$. Is $\bar{y}$ an efficient estimator for $\mu$, where
\[ E\bar{y} = \mu, \qquad \operatorname{var}(\bar{y}) = \frac{\sigma^2}{n} \; ? \]
Consider the pdf
\[ f(y_i) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{1}{2\sigma^2}(y_i - \mu)^2}, \qquad
 L = (2\pi\sigma^2)^{-\frac{n}{2}} e^{-\frac{1}{2\sigma^2}\sum (y_i - \mu)^2} \]
\[ \ln L = -\frac{n}{2}\ln 2\pi\sigma^2 - \frac{1}{2\sigma^2}\sum (y_i - \mu)^2 \]
\[ \frac{\partial \ln L}{\partial \mu} = \frac{1}{\sigma^2}\sum (y_i - \mu) = \frac{1}{\sigma^2}\left(\sum y_i - n\mu\right), \qquad
 \frac{\partial^2 \ln L}{\partial \mu^2} = -\frac{n}{\sigma^2} \]
Cramer-Rao lower bound:
\[ \frac{1}{-E\left[\frac{\partial^2 \ln L}{\partial \mu^2}\right]} = \frac{1}{\frac{n}{\sigma^2}} = \frac{\sigma^2}{n} \]
Thus $\bar{y}$ is an efficient estimator for $\mu$.
Let $\hat{\theta}$ be an estimator of $\theta = (\theta_1, \dots, \theta_p)'$. Check:
1. $E\hat{\theta} = \theta$;
2. find $\operatorname{var}(\hat{\theta})$ and compare it with the inverse of the information matrix $I^{-1}(\theta)$, where
\[ I(\theta) = -E \begin{pmatrix}
\frac{\partial^2 \ln L}{\partial \theta_1^2} & \frac{\partial^2 \ln L}{\partial \theta_1 \partial \theta_2} & \dots & \frac{\partial^2 \ln L}{\partial \theta_1 \partial \theta_p} \\
\frac{\partial^2 \ln L}{\partial \theta_2 \partial \theta_1} & \frac{\partial^2 \ln L}{\partial \theta_2^2} & \dots & \frac{\partial^2 \ln L}{\partial \theta_2 \partial \theta_p} \\
\vdots & \vdots & & \vdots \\
\frac{\partial^2 \ln L}{\partial \theta_p \partial \theta_1} & \frac{\partial^2 \ln L}{\partial \theta_p \partial \theta_2} & \dots & \frac{\partial^2 \ln L}{\partial \theta_p^2}
\end{pmatrix} \]

In multiple regression the parameters are $\beta_0, \beta_1, \dots, \beta_k, \sigma^2$:
\[ Y = X\beta + \varepsilon, \qquad \varepsilon \sim N_n(0, \sigma^2 I), \qquad Y \sim N_n(X\beta, \sigma^2 I) \]
\[ \implies \ln L = -\frac{n}{2}\ln 2\pi\sigma^2 - \frac{1}{2\sigma^2}(Y - X\beta)'(Y - X\beta)
 = -\frac{n}{2}\ln 2\pi\sigma^2 - \frac{1}{2\sigma^2}\left(Y'Y - 2Y'X\beta + \beta'X'X\beta\right) \]
Then, with $\theta = (\beta', \sigma^2)'$, the information matrix (written element by element over $\beta_0, \dots, \beta_k, \sigma^2$) has the block form
\[ I(\theta) = -E \begin{pmatrix}
\frac{\partial^2 \ln L}{\partial \beta \partial \beta'} & \frac{\partial^2 \ln L}{\partial \beta \partial \sigma^2} \\
\frac{\partial^2 \ln L}{\partial \sigma^2 \partial \beta'} & \frac{\partial^2 \ln L}{\partial (\sigma^2)^2}
\end{pmatrix} \]


Then,
\[ \frac{\partial \ln L}{\partial \beta} = -\frac{1}{2\sigma^2}(-2X'Y + 2X'X\beta), \qquad
 \frac{\partial^2 \ln L}{\partial \beta \partial \beta'} = -\frac{1}{2\sigma^2}(2X'X) = -\frac{X'X}{\sigma^2} \]
\[ \frac{\partial^2 \ln L}{\partial \beta \partial \sigma^2} = \frac{1}{2\sigma^4}(-2X'Y + 2X'X\beta), \qquad
 E\left[\frac{\partial^2 \ln L}{\partial \beta \partial \sigma^2}\right] = 0 \]
\[ \frac{\partial \ln L}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}(Y - X\beta)'(Y - X\beta) \]
\[ \frac{\partial^2 \ln L}{\partial (\sigma^2)^2} = \frac{n}{2\sigma^4} - \frac{1}{\sigma^6}(Y - X\beta)'(Y - X\beta), \qquad
 E\left[\frac{\partial^2 \ln L}{\partial (\sigma^2)^2}\right] = \frac{n}{2\sigma^4} - \frac{n}{\sigma^4} = -\frac{n}{2\sigma^4} \]
Thus,
\[ I(\theta) = \begin{pmatrix} \frac{X'X}{\sigma^2} & 0 \\ 0' & \frac{n}{2\sigma^4} \end{pmatrix}, \qquad
 I^{-1}(\theta) = \begin{pmatrix} \sigma^2(X'X)^{-1} & 0 \\ 0' & \frac{2\sigma^4}{n} \end{pmatrix} \]
Notice that $E\hat{\beta} = \beta$ and $\operatorname{var}(\hat{\beta}) = \sigma^2(X'X)^{-1}$, which attains the bound, so $\hat{\beta}$ is an efficient estimator of $\beta$. For the error variance,
\[ S_e^2 = \frac{e'e}{n-k-1}, \qquad ES_e^2 = \sigma^2, \qquad \operatorname{var}(S_e^2) = \frac{2\sigma^4}{n-k-1} \]

§20.2 Centered Model
Consider $Y = X\beta + \varepsilon$ with
\[ X = \begin{pmatrix} 1 & X_{(0)} \end{pmatrix}, \qquad \beta = \begin{pmatrix} \beta_0 \\ \beta_{(0)} \end{pmatrix} \]
Then,
\[ Y = \beta_0 1 + X_{(0)}\beta_{(0)} + \varepsilon \pm \frac{1}{n}11'X_{(0)}\beta_{(0)} \]
Rearranging this expression we obtain
\[ Y = \beta_0 1 + \frac{1}{n}11'X_{(0)}\beta_{(0)} + \left(I - \frac{1}{n}11'\right)X_{(0)}\beta_{(0)} + \varepsilon
 = 1\left(\beta_0 + \frac{1}{n}1'X_{(0)}\beta_{(0)}\right) + \left(I - \frac{1}{n}11'\right)X_{(0)}\beta_{(0)} + \varepsilon
 = \gamma_0 1 + Z\beta_{(0)} + \varepsilon \]
Estimate the centered model:
\[ \begin{pmatrix} \hat{\gamma}_0 \\ \hat{\beta}_{(0)} \end{pmatrix}
 = \begin{pmatrix} 1'1 & 1'Z \\ Z'1 & Z'Z \end{pmatrix}^{-1}\begin{pmatrix} 1'Y \\ Z'Y \end{pmatrix}
 = \begin{pmatrix} n & 0' \\ 0 & Z'Z \end{pmatrix}^{-1}\begin{pmatrix} 1'Y \\ Z'Y \end{pmatrix} \]


Thus,
\[ \hat{\gamma}_0 = \bar{y} \]
\[ \hat{\beta}_{(0)} = (Z'Z)^{-1}Z'Y
 = \left[X_{(0)}'\left(I - \tfrac{1}{n}11'\right)X_{(0)}\right]^{-1}X_{(0)}'\left(I - \tfrac{1}{n}11'\right)Y
 = \left(X_{(0)}^{*\prime}X_{(0)}^*\right)^{-1}X_{(0)}^{*\prime}Y^* \]
Observe that $Y \sim N_n\left(\gamma_0 1 + Z\beta_{(0)}, \sigma^2 I\right)$. Then,
\[ \frac{\left(Y - \gamma_0 1 - Z\beta_{(0)}\right)'\left(Y - \gamma_0 1 - Z\beta_{(0)}\right)}{\sigma^2} \sim \chi^2_n \]
• Fitted values: $\hat{Y} = 1\hat{\gamma}_0 + Z\hat{\beta}_{(0)}$
• Residuals: $e = Y - \hat{Y} = Y - 1\hat{\gamma}_0 - Z\hat{\beta}_{(0)}$
Note: fitted values and residuals are the same for both the centered and the non-centered model.


§21 Lec 21: Nov 12, 2021

§21.1 Confidence Intervals in Multiple Regression
Consider
\[ \hat{\beta} \sim N_{k+1}\left(\beta, \sigma^2(X'X)^{-1}\right) \]
Let's find a $1-\alpha$ confidence interval for $\beta_1$. With $v_{11}$ the corresponding diagonal element of $(X'X)^{-1}$,
\[ \hat{\beta}_1 \sim N\left(\beta_1, \sigma\sqrt{v_{11}}\right), \qquad \frac{(n-k-1)S_e^2}{\sigma^2} \sim \chi^2_{n-k-1} \]
\[ \frac{\frac{\hat{\beta}_1 - \beta_1}{\sigma\sqrt{v_{11}}}}{\sqrt{\frac{(n-k-1)S_e^2}{\sigma^2}/(n-k-1)}} = \frac{\hat{\beta}_1 - \beta_1}{S_e\sqrt{v_{11}}} \sim t_{n-k-1} \]
\[ P\left(-t_{\frac{\alpha}{2};n-k-1} \le \frac{\hat{\beta}_1 - \beta_1}{S_e\sqrt{v_{11}}} \le t_{\frac{\alpha}{2};n-k-1}\right) = 1 - \alpha \]
Finally,
\[ \beta_1 \in \hat{\beta}_1 \pm t_{\frac{\alpha}{2};n-k-1}\cdot S_e\sqrt{v_{11}} \]
In general, to construct a confidence interval for $a'\beta$:
\[ a'\hat{\beta} \sim N\left(a'\beta, \sigma\sqrt{a'(X'X)^{-1}a}\right) \]
Then,
\[ \frac{\frac{a'\hat{\beta} - a'\beta}{\sigma\sqrt{a'(X'X)^{-1}a}}}{\sqrt{\frac{(n-k-1)S_e^2}{\sigma^2}/(n-k-1)}} = \frac{a'\hat{\beta} - a'\beta}{S_e\sqrt{a'(X'X)^{-1}a}} \sim t_{n-k-1} \]
Finally,
\[ a'\beta \in a'\hat{\beta} \pm t_{\frac{\alpha}{2};n-k-1}\cdot S_e\sqrt{a'(X'X)^{-1}a} \]

If $a' = (0, 1, 0, 0, \dots, 0)$, then $a'\beta = \beta_1$.
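The interval for $a'\beta$ is easy to compute directly. Below is a minimal sketch (simulated data, not from the course; it assumes numpy and scipy) that builds the $95\%$ interval $a'\hat{\beta} \pm t_{\alpha/2;n-k-1}\,S_e\sqrt{a'(X'X)^{-1}a}$ for a single coefficient.

import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, k = 60, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
bhat = XtX_inv @ X.T @ y
e = y - X @ bhat
Se2 = e @ e / (n - k - 1)

a = np.array([0.0, 1.0, 0.0])                  # picks out beta_1, so a'beta = beta_1
est = a @ bhat
se = np.sqrt(Se2 * (a @ XtX_inv @ a))
tcrit = stats.t.ppf(0.975, df=n - k - 1)       # alpha = 0.05
print(est - tcrit * se, est + tcrit * se)      # 95% CI for beta_1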


Prediction interval for $Y_0$: for a given $x_0' = \begin{pmatrix} 1 & x_{01} & x_{02} & \dots & x_{0k} \end{pmatrix}$, with the model $Y = X\beta + \varepsilon$, the predictor of the new response is
\[ \hat{Y}_0 = x_0'\hat{\beta} \]

66
Duc Vu (Fall 2021) 21 Lec 21: Nov 12, 2021

 
The error of the prediction is $Y_0 - \hat{Y}_0$, with $E(Y_0 - \hat{Y}_0) = EY_0 - E\hat{Y}_0 = x_0'\beta - x_0'\beta = 0$. Note that $Y_0 = x_0'\beta + \varepsilon_0$, and $Y_0$ is independent of $\hat{Y}_0$, so
\[ \operatorname{var}(Y_0 - \hat{Y}_0) = \operatorname{var}(Y_0) + \operatorname{var}(\hat{Y}_0)
 = \sigma^2 + \sigma^2 x_0'(X'X)^{-1}x_0 = \sigma^2\left(1 + x_0'(X'X)^{-1}x_0\right) \]
Then,
\[ Y_0 - \hat{Y}_0 \sim N\left(0, \sigma\sqrt{1 + x_0'(X'X)^{-1}x_0}\right), \qquad \frac{(n-k-1)S_e^2}{\sigma^2} \sim \chi^2_{n-k-1} \]
With this, we can construct a t ratio
\[ \frac{\frac{Y_0 - \hat{Y}_0}{\sigma\sqrt{1 + x_0'(X'X)^{-1}x_0}}}{\sqrt{\frac{(n-k-1)S_e^2}{\sigma^2}/(n-k-1)}}
 = \frac{Y_0 - \hat{Y}_0}{S_e\sqrt{1 + x_0'(X'X)^{-1}x_0}} \sim t_{n-k-1} \]
and the prediction interval for $Y_0$ is
\[ Y_0 \in \hat{Y}_0 \pm t_{\frac{\alpha}{2};n-k-1}\cdot S_e\sqrt{1 + x_0'(X'X)^{-1}x_0} \]

For a given $x_0' = \begin{pmatrix} 1 & x_{01} & x_{02} & \dots & x_{0k} \end{pmatrix}$, $\hat{Y}_0 = x_0'\hat{\beta}$ and
\[ \hat{Y}_0 \sim N\left(x_0'\beta, \sigma\sqrt{x_0'(X'X)^{-1}x_0}\right), \qquad \frac{(n-k-1)S_e^2}{\sigma^2} \sim \chi^2_{n-k-1} \]
So the confidence interval for $EY_0$ is
\[ EY_0 = x_0'\beta \in \hat{Y}_0 \pm t_{\frac{\alpha}{2};n-k-1}\cdot S_e\sqrt{x_0'(X'X)^{-1}x_0} \]

§21.2 Hypothesis Testing
Suppose $k = 5$; then
\[ Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_3 X_{i3} + \beta_4 X_{i4} + \beta_5 X_{i5} + \varepsilon_i \]
Suppose we want to test
1. $H_0: \beta_1 = 0$, $H_a: \beta_1 \neq 0$
2. $H_0: \beta_3 = 2$, $H_a: \beta_3 \neq 2$
3. $H_0: \beta_2 - \beta_5 = 0$, $H_a: \beta_2 - \beta_5 \neq 0$
4. $H_0: \beta_2 = \beta_5 = 0$, $H_a:$ not true
5. $H_0: \beta_1 = \beta_2 = \beta_3 = \beta_4 = \beta_5 = 0$ (i.e. $\beta_{(0)} = 0$), $H_a:$ not true
All of the above can be expressed in the form
\[ H_0: C\beta = \gamma, \qquad H_a: C\beta \neq \gamma \]



1. $C = \begin{pmatrix} 0 & 1 & 0 & 0 & 0 & 0 \end{pmatrix}$, $\gamma = 0$

2. $C = \begin{pmatrix} 0 & 0 & 0 & 1 & 0 & 0 \end{pmatrix}$, $\gamma = 2$

3. $C = \begin{pmatrix} 0 & 0 & 1 & 0 & 0 & -1 \end{pmatrix}$, $\gamma = 0$

4. $C = \begin{pmatrix} 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \end{pmatrix}$, $\gamma = \begin{pmatrix} 0 \\ 0 \end{pmatrix}$. Check:
\[ C\beta = \begin{pmatrix} \beta_2 \\ \beta_5 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix} \]

5. \[ C = \begin{pmatrix}
0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 1
\end{pmatrix}, \qquad \gamma = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 0 \end{pmatrix} \]
or $C = \begin{pmatrix} 0 & I \end{pmatrix}$.

In general, $C$ is an $m \times (k+1)$ matrix.

\[ H_0: C\beta = \gamma \iff C\beta - \gamma = 0, \qquad H_a: C\beta \neq \gamma \iff C\beta - \gamma \neq 0 \]
Consider $C\hat{\beta} - \gamma$ and find its distribution under $H_0$:
\[ E\left(C\hat{\beta} - \gamma\right) = 0, \qquad \operatorname{var}\left(C\hat{\beta} - \gamma\right) = \sigma^2 C(X'X)^{-1}C' \]
Therefore,
\[ C\hat{\beta} - \gamma \sim N_m\left(0, \sigma^2 C(X'X)^{-1}C'\right) \]
and let
\[ V = \left[C(X'X)^{-1}C'\right]^{-\frac{1}{2}}\left(C\hat{\beta} - \gamma\right) \]
Then $EV = 0$ and $\operatorname{var}(V) = \sigma^2 I_{m\times m}$. So $V \sim N_m(0, \sigma^2 I)$ and $\frac{V'V}{\sigma^2} \sim \chi^2_m$, i.e.
\[ \frac{\left(C\hat{\beta} - \gamma\right)'\left[C(X'X)^{-1}C'\right]^{-1}\left(C\hat{\beta} - \gamma\right)}{\sigma^2} \sim \chi^2_m \]
Also,
\[ \frac{(n-k-1)S_e^2}{\sigma^2} \sim \chi^2_{n-k-1} \]
and $\hat{\beta}$ and $S_e^2$ are independent. Therefore,
\[ \frac{\frac{\left(C\hat{\beta} - \gamma\right)'\left[C(X'X)^{-1}C'\right]^{-1}\left(C\hat{\beta} - \gamma\right)}{\sigma^2}/m}{\frac{(n-k-1)S_e^2}{\sigma^2}/(n-k-1)} \sim F_{m,\,n-k-1} \]
or
\[ \frac{\left(C\hat{\beta} - \gamma\right)'\left[C(X'X)^{-1}C'\right]^{-1}\left(C\hat{\beta} - \gamma\right)}{mS_e^2} \sim F_{m,\,n-k-1} \]


§22 Lec 22: Nov 15, 2021

§22.1 F Test for the General Linear Hypothesis
Consider:
\[ H_0: C\beta = \gamma, \qquad H_a: C\beta \neq \gamma \]
Under $H_0$: $C\hat{\beta} - \gamma \sim N_m\left(0, \sigma^2 C(X'X)^{-1}C'\right)$, so
\[ \frac{\left(C\hat{\beta} - \gamma\right)'\left[C(X'X)^{-1}C'\right]^{-1}\left(C\hat{\beta} - \gamma\right)}{\sigma^2} \sim \chi^2_m, \qquad \frac{(n-k-1)S_e^2}{\sigma^2} \sim \chi^2_{n-k-1} \]
\[ \implies F = \frac{\left(C\hat{\beta} - \gamma\right)'\left[C(X'X)^{-1}C'\right]^{-1}\left(C\hat{\beta} - \gamma\right)}{mS_e^2} \sim F_{m,\,n-k-1} \]
Reject $H_0$ if $F > F_{1-\alpha;\,m,\,n-k-1}$.
Note: $E(mS_e^2) = m\sigma^2$ is the expected value of the denominator. The expected value of the numerator (using properties of the trace) is
\[ m\sigma^2 + (C\beta - \gamma)'\left[C(X'X)^{-1}C'\right]^{-1}(C\beta - \gamma) \]
If $H_0$ is true, the second term is 0, the two expected values match, and the $F$ ratio is on average close to 1; otherwise the numerator is inflated and $F$ tends to be large.
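For concreteness, here is a minimal sketch of the computation (simulated data, hypothetical hypothesis $H_0: \beta_2 = \beta_5 = 0$; numpy/scipy assumed). It forms the quadratic-form numerator, divides by $mS_e^2$, and reads off the p-value from the $F_{m, n-k-1}$ distribution.

import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n, k = 80, 5
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
beta_true = np.array([1.0, 0.0, 1.5, 2.0, 0.0, -1.0])
y = X @ beta_true + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
bhat = XtX_inv @ X.T @ y
Se2 = (y - X @ bhat) @ (y - X @ bhat) / (n - k - 1)

C = np.zeros((2, k + 1)); C[0, 2] = 1.0; C[1, 5] = 1.0    # H0: beta_2 = beta_5 = 0, so m = 2
gamma = np.zeros(2)
m = C.shape[0]

d = C @ bhat - gamma
F = d @ np.linalg.solve(C @ XtX_inv @ C.T, d) / (m * Se2)
p_value = stats.f.sf(F, m, n - k - 1)
print(F, p_value)                              # reject H0 if F > F_{1-alpha; m, n-k-1}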

§22.2 F Statistics and t statistics in Multiple Regression


Suppose $H_0: \beta_1 = 0$, $H_a: \beta_1 \neq 0$, $k = 5$ and $m = 1$. Then
\[ C = \begin{pmatrix} 0 & 1 & 0 & 0 & 0 & 0 \end{pmatrix}, \qquad \gamma = 0 \]
and $C\hat{\beta} - \gamma = \hat{\beta}_1$. Then the $F$ statistic is
\[ \frac{\hat{\beta}_1^2}{S_e^2 v_{11}} \sim F_{1,\,n-k-1} \]
Now test $H_0: \beta_1 = 0$ using the $t$ statistic:
\[ \hat{\beta}_1 \sim N\left(0, \sigma\sqrt{v_{11}}\right), \quad \frac{(n-k-1)S_e^2}{\sigma^2} \sim \chi^2_{n-k-1}
 \implies \frac{\hat{\beta}_1}{S_e\sqrt{v_{11}}} \sim t_{n-k-1} \]
Thus $t_{n-k-1}^2 = F_{1,\,n-k-1}$.


Suppose
\[ H_0: a'\beta = 0, \qquad H_a: a'\beta \neq 0 \]
$t$ statistic: under $H_0$, $a'\hat{\beta} \sim N\left(0, \sigma\sqrt{a'(X'X)^{-1}a}\right)$ and $\frac{(n-k-1)S_e^2}{\sigma^2} \sim \chi^2_{n-k-1}$. Then,
\[ \frac{a'\hat{\beta}}{S_e\sqrt{a'(X'X)^{-1}a}} \sim t_{n-k-1} \]


§22.3 Power Analysis in Multiple Regression
Let $Y \sim N_n(\mu, I)$. Then $Y'Y \sim \chi^2_n(\text{NCP} = \mu'\mu)$. Let $Y \sim N_n(\mu, \sigma^2 I)$. Then,
\[ \frac{Y}{\sigma} \sim N_n\left(\frac{\mu}{\sigma}, I\right), \qquad \frac{Y'Y}{\sigma^2} \sim \chi^2_n\left(\text{NCP} = \frac{\mu'\mu}{\sigma^2}\right) \]
Let $Q \sim \chi^2_n(\text{NCP} = \theta)$. Its moment generating function is
\[ M_Q(t) = (1-2t)^{-\frac{n}{2}} e^{\theta\frac{t}{1-2t}} \]
When $H_0$ is not true,
\[ C\hat{\beta} - \gamma \sim N_m\left(C\beta - \gamma, \sigma^2 C(X'X)^{-1}C'\right) \]
Let $V = \frac{\left[C(X'X)^{-1}C'\right]^{-\frac{1}{2}}\left(C\hat{\beta} - \gamma\right)}{\sigma}$. Then,
\[ V \sim N_m\left(\frac{\left[C(X'X)^{-1}C'\right]^{-\frac{1}{2}}(C\beta - \gamma)}{\sigma}, I\right) \]
\[ \frac{\left(C\hat{\beta} - \gamma\right)'\left[C(X'X)^{-1}C'\right]^{-1}\left(C\hat{\beta} - \gamma\right)}{\sigma^2}
 \sim \chi^2_m\left(\text{NCP} = \frac{(C\beta - \gamma)'\left[C(X'X)^{-1}C'\right]^{-1}(C\beta - \gamma)}{\sigma^2}\right) \]
Recall the non-central $F$ distribution: if $U \sim \chi^2_n(\text{NCP} = \theta)$ and $V \sim \chi^2_m$ are independent, then
\[ \frac{U/n}{V/m} \sim F_{n,m}(\text{NCP} = \theta) \]
Apply this for the power analysis:
\[ \frac{\frac{\left(C\hat{\beta} - \gamma\right)'\left[C(X'X)^{-1}C'\right]^{-1}\left(C\hat{\beta} - \gamma\right)}{\sigma^2}/m}{\frac{(n-k-1)S_e^2}{\sigma^2}/(n-k-1)}
 = \frac{\left(C\hat{\beta} - \gamma\right)'\left[C(X'X)^{-1}C'\right]^{-1}\left(C\hat{\beta} - \gamma\right)}{mS_e^2} \sim F_{m,\,n-k-1}(\text{NCP} = \theta) \]
with $\theta = \frac{(C\beta - \gamma)'\left[C(X'X)^{-1}C'\right]^{-1}(C\beta - \gamma)}{\sigma^2}$.
The power is
\[ 1 - \beta = P\left(F_{m,\,n-k-1}(\text{NCP} = \theta) > F_{1-\alpha;\,m,\,n-k-1}\right) \]

§22.4 F Statistics Using the Extra Sum of Squares


Under $H_0: C\beta = \gamma$, we have a constrained least squares problem with
\[ \hat{\beta}_c = \hat{\beta} - (X'X)^{-1}C'\left[C(X'X)^{-1}C'\right]^{-1}\left(C\hat{\beta} - \gamma\right) \]
and
\[ \text{SSE}_c = e_c'e_c = e'e + \left(C\hat{\beta} - \gamma\right)'\left[C(X'X)^{-1}C'\right]^{-1}\left(C\hat{\beta} - \gamma\right) \]
Using the extra sum of squares,
\[ \frac{(\text{SSE}_c - \text{SSE}_F)/(df_R - df_F)}{\text{SSE}_F/df_F} \sim F_{df_R - df_F,\,df_F} \]


where dfF = n − k − 1 and dfR = n − (k − m) − 1 so dfR − dfF = m, e.g.,

k=5
H0 : β1 = β2 = 0
Full : n − 5 − 1
Reduced : n − 3 − 1
=⇒ m = 2

Thus,
\[ \frac{\left(C\hat{\beta} - \gamma\right)'\left[C(X'X)^{-1}C'\right]^{-1}\left(C\hat{\beta} - \gamma\right)}{mS_e^2} \sim F_{m,\,n-k-1} \]
which is the same as method 1.
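The numerical equivalence of the two methods is easy to confirm. A small sketch (simulated data; $H_0: \beta_1 = \beta_2 = 0$ chosen just for illustration) computes the general-linear-hypothesis F and the full-versus-reduced F and prints both.

import numpy as np

rng = np.random.default_rng(5)
n, k = 100, 5
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ rng.normal(size=k + 1) + rng.normal(size=n)

def sse(Xmat, y):
    b, *_ = np.linalg.lstsq(Xmat, y, rcond=None)
    r = y - Xmat @ b
    return r @ r

# Method 1: general linear hypothesis F for H0: beta_1 = beta_2 = 0
XtX_inv = np.linalg.inv(X.T @ X)
bhat = XtX_inv @ X.T @ y
Se2 = sse(X, y) / (n - k - 1)
C = np.zeros((2, k + 1)); C[0, 1] = 1.0; C[1, 2] = 1.0
d = C @ bhat
F1 = d @ np.linalg.solve(C @ XtX_inv @ C.T, d) / (2 * Se2)

# Method 2: extra sum of squares with the reduced model dropping X1 and X2
X_red = X[:, [0, 3, 4, 5]]
F2 = ((sse(X_red, y) - sse(X, y)) / 2) / (sse(X, y) / (n - k - 1))
print(F1, F2)                                  # the two F statistics agree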


§23 Lec 23: Nov 17, 2021

§23.1 Testing the Overall Significance of the Model
Consider
\[ H_0: \beta_{(0)} = 0, \qquad H_a: \beta_{(0)} \neq 0 \]
We can test this hypothesis using the $F$ test for the general linear hypothesis,
\[ \frac{\left(C\hat{\beta} - \gamma\right)'\left[C(X'X)^{-1}C'\right]^{-1}\left(C\hat{\beta} - \gamma\right)}{kS_e^2} \sim F_{k,\,n-k-1} \]
where $m = k$ in this case. We can also use the test statistic
\[ \frac{\text{MSR}}{\text{MSE}} = \frac{\text{SSR}/k}{\text{SSE}/(n-k-1)} \]
Note: $\text{SSR} = \left(C\hat{\beta} - \gamma\right)'\left[C(X'X)^{-1}C'\right]^{-1}\left(C\hat{\beta} - \gamma\right)$ where $C = \begin{pmatrix} 0 & I_k \end{pmatrix}$.

§23.2 Likelihood Ratio Test
Consider:
\[ H_0: C\beta = \gamma, \qquad H_a: C\beta \neq \gamma \]
We reject $H_0$ if
\[ \Lambda = \frac{L(\hat{w})}{L(\hat{\omega})} < k \]
where
• $L(\hat{w})$: maximized likelihood function under $H_0$
• $L(\hat{\omega})$: maximized likelihood function under no restriction
Note that
\[ Y = X\beta + \varepsilon \implies Y \sim N_n\left(X\beta, \sigma^2 I\right) \]
Thus, the likelihood function is
\[ L = (2\pi\sigma^2)^{-\frac{n}{2}} e^{-\frac{1}{2\sigma^2}(y - X\beta)'(y - X\beta)} \]
Without any restrictions,
\[ \hat{\beta} = (X'X)^{-1}X'Y \quad \text{and} \quad \hat{\sigma}_1^2 = \frac{(y - X\hat{\beta})'(y - X\hat{\beta})}{n} = \frac{e'e}{n} \]
Under $H_0$,
\[ \hat{\beta}_c = \hat{\beta} - (X'X)^{-1}C'\left[C(X'X)^{-1}C'\right]^{-1}\left(C\hat{\beta} - \gamma\right) \]


and
\[ \hat{\sigma}_0^2 = \frac{(y - X\hat{\beta}_c)'(y - X\hat{\beta}_c)}{n} = \frac{e_c'e_c}{n} \]
Back to the LRT, we have
\[ \Lambda = \frac{(2\pi\hat{\sigma}_0^2)^{-\frac{n}{2}} e^{-\frac{1}{2\hat{\sigma}_0^2}e_c'e_c}}{(2\pi\hat{\sigma}_1^2)^{-\frac{n}{2}} e^{-\frac{1}{2\hat{\sigma}_1^2}e'e}} < k \]
Replacing
\[ e_c'e_c = n\hat{\sigma}_0^2, \qquad e'e = n\hat{\sigma}_1^2 \]
(so that both exponents equal $-n/2$ and cancel), we obtain
\[ \frac{e'e}{e_c'e_c} < k^{\frac{2}{n}} \]
Also,
\[ e_c'e_c = e'e + \left(C\hat{\beta} - \gamma\right)'\left[C(X'X)^{-1}C'\right]^{-1}\left(C\hat{\beta} - \gamma\right) \]
Thus,
\[ \frac{1}{1 + \frac{\left(C\hat{\beta} - \gamma\right)'\left[C(X'X)^{-1}C'\right]^{-1}\left(C\hat{\beta} - \gamma\right)}{e'e}} < k^{\frac{2}{n}} \]
We see that if $H_0$ is true then $C\hat{\beta} \approx \gamma$ and therefore the ratio above is approximately equal to 1. If $H_0$ is not true then the ratio above is less than 1. Manipulating the above expression, we have
\[ \frac{\left(C\hat{\beta} - \gamma\right)'\left[C(X'X)^{-1}C'\right]^{-1}\left(C\hat{\beta} - \gamma\right)}{mS_e^2} > \left(k^{-\frac{2}{n}} - 1\right)\frac{n-k-1}{m} = k' \]
Use the significance level $\alpha$ (type I error) to find $k'$ (rejection region):
\[ P\left(F_{m,\,n-k-1} > k'\right) = \alpha \]
Therefore, $k' = F_{1-\alpha;\,m,\,n-k-1}$: reject $H_0$ when the $F$ statistic falls in the upper $\alpha$ tail of the $F_{m,\,n-k-1}$ distribution.

§23.3 Multi-Collinearity
This is a problem that arises when some predictors are highly correlated with other predictors.


Example 23.1
Suppose $k = 2$. Test
\[ H_0: \beta_1 = \beta_2 = 0 \quad \text{vs.} \quad H_a: \text{at least one } \beta_i \neq 0 \]
using the $F$ statistic. Suppose we reject $H_0$ (at least one $\beta_i \neq 0$). Then test $\beta_1 = 0$ and $\beta_2 = 0$ individually:
\[ H_0: \beta_1 = 0, \; H_a: \beta_1 \neq 0 \qquad \text{and} \qquad H_0: \beta_2 = 0, \; H_a: \beta_2 \neq 0 \]
Suppose we don't reject $H_0$ in both tests. This contradiction between the $F$ statistic and the $t$ statistics is a problem caused by multi-collinearity.

Multi-collinearity inflates the variance of $\hat{\beta}_i$ and therefore the corresponding $t$ statistics will be small. To explain this we will use the centered and scaled model
\[ Y = \gamma_0 1 + Z\beta_{(0)} + \varepsilon \]
where $Z = \left(I - \frac{1}{n}11'\right)X_{(0)}$. Or
\[ Y_i = \gamma_0 + \beta_1(X_{i1} - \bar{X}_1) + \beta_2(X_{i2} - \bar{X}_2) + \dots + \beta_k(X_{ik} - \bar{X}_k) + \varepsilon_i, \qquad i = 1, 2, \dots, n, \]
or
\[ Y_i = \gamma_0 + \beta_1 Z_{i1} + \beta_2 Z_{i2} + \dots + \beta_k Z_{ik} + \varepsilon_i \]
where $Z_1, \dots, Z_k$ are the centered predictors.


§24 Lec 24: Nov 19, 2021

§24.1 Centered and Scaled Model in Matrix/Vector Form
Consider the centered model
\[ Y = \gamma_0 1 + Z\beta_{(0)} + \varepsilon \]
\[ \gamma_0 = \beta_0 + \frac{1}{n}1'X_{(0)}\beta_{(0)} = \beta_0 + \beta_1\bar{x}_1 + \dots + \beta_k\bar{x}_k, \qquad Z = \left(I - \frac{1}{n}11'\right)X_{(0)} \]
or
\[ Y_i = \gamma_0 + \beta_1(x_{i1} - \bar{x}_1) + \beta_2(x_{i2} - \bar{x}_2) + \dots + \beta_k(x_{ik} - \bar{x}_k) + \varepsilon_i
 = \gamma_0 + \beta_1 Z_{i1} + \beta_2 Z_{i2} + \dots + \beta_k Z_{ik} + \varepsilon_i \]
Centering and scaling: multiply and divide each centered predictor by $\sqrt{\sum_i (x_{ij} - \bar{x}_j)^2}$. Then,
\[ Y_i = \gamma_0 + \beta_1\sqrt{\sum_i (x_{i1} - \bar{x}_1)^2}\,\frac{x_{i1} - \bar{x}_1}{\sqrt{\sum_i (x_{i1} - \bar{x}_1)^2}} + \dots + \beta_k\sqrt{\sum_i (x_{ik} - \bar{x}_k)^2}\,\frac{x_{ik} - \bar{x}_k}{\sqrt{\sum_i (x_{ik} - \bar{x}_k)^2}} + \varepsilon_i \]
or
\[ Y_i = \gamma_0 + \delta_1 Z_{s_{i1}} + \dots + \delta_k Z_{s_{ik}} + \varepsilon_i \]
where
\[ \delta_j = \beta_j\sqrt{\sum_i (x_{ij} - \bar{x}_j)^2}, \qquad Z_{s_{ij}} = \frac{x_{ij} - \bar{x}_j}{\sqrt{\sum_i (x_{ij} - \bar{x}_j)^2}} \]
From the centered model,
\[ Y = \gamma_0 1 + \underbrace{ZD^{-1}}_{Z_s}\underbrace{D\beta_{(0)}}_{\delta_{(0)}} + \varepsilon \]
where
\[ D = \begin{pmatrix} \sqrt{\sum_i (x_{i1} - \bar{x}_1)^2} & & 0 \\ & \ddots & \\ 0 & & \sqrt{\sum_i (x_{ik} - \bar{x}_k)^2} \end{pmatrix} \]
Then
\[ Y = \gamma_0 1 + Z_s\delta_{(0)} + \varepsilon \]
Note:
1. $1'Z_s = 1'ZD^{-1} = 1'\left(I - \frac{1}{n}11'\right)X_{(0)}D^{-1} = 0'$
2. $Z_s'1 = 0$
3. $Z_s'Z_s$ has entries $Z_{s_j}'Z_{s_l}$, for $j, l = 1, \dots, k$.


Let's examine $Z_{s_1}'Z_{s_1}$ and $Z_{s_1}'Z_{s_2}$:
\[ Z_{s_1}'Z_{s_1} = \sum_i \frac{(x_{i1} - \bar{x}_1)^2}{\sum_i (x_{i1} - \bar{x}_1)^2} = 1 \]
Similarly for $Z_{s_1}'Z_{s_2}$,
\[ Z_{s_1}'Z_{s_2} = \frac{\sum_i (x_{i1} - \bar{x}_1)(x_{i2} - \bar{x}_2)/(n-1)}{\sqrt{\sum_i (x_{i1} - \bar{x}_1)^2}\sqrt{\sum_i (x_{i2} - \bar{x}_2)^2}/(n-1)} = r_{12}, \]
the sample correlation between $X_1$ and $X_2$. Then,
\[ Z_s'Z_s = R = \begin{pmatrix}
1 & r_{12} & \dots & r_{1k} \\
r_{21} & 1 & \dots & r_{2k} \\
\vdots & \vdots & & \vdots \\
r_{k1} & r_{k2} & \dots & 1
\end{pmatrix} \]
Estimation of $Y = \gamma_0 1 + Z_s\delta_{(0)} + \varepsilon$:
\[ \begin{pmatrix} \hat{\gamma}_0 \\ \hat{\delta}_{(0)} \end{pmatrix}
 = \begin{pmatrix} 1'1 & 1'Z_s \\ Z_s'1 & Z_s'Z_s \end{pmatrix}^{-1}\begin{pmatrix} 1'Y \\ Z_s'Y \end{pmatrix}
 = \begin{pmatrix} n & 0' \\ 0 & Z_s'Z_s \end{pmatrix}^{-1}\begin{pmatrix} 1'Y \\ Z_s'Y \end{pmatrix}
 = \begin{pmatrix} \frac{1}{n}1'Y \\ (Z_s'Z_s)^{-1}Z_s'Y \end{pmatrix} \]

So
\[ \hat{\gamma}_0 = \bar{y}, \]
which is the same as the estimate of $\gamma_0$ in the centered model. And
\[ \hat{\delta}_{(0)} = (Z_s'Z_s)^{-1}Z_s'Y \]
Properties:
\[ E\hat{\delta}_{(0)} = (Z_s'Z_s)^{-1}Z_s'EY = (Z_s'Z_s)^{-1}Z_s'\left(\gamma_0 1 + Z_s\delta_{(0)}\right) = 0 + (Z_s'Z_s)^{-1}Z_s'Z_s\delta_{(0)} = \delta_{(0)} \]
\[ \operatorname{var}(\hat{\delta}_{(0)}) = \operatorname{var}\left[(Z_s'Z_s)^{-1}Z_s'Y\right] = \sigma^2 R^{-1} \]
Non-centered model:
\[ Y = X\beta + \varepsilon \quad \text{or} \quad Y = \beta_0 1 + X_{(0)}\beta_{(0)} + \varepsilon \]
Centered model:
\[ Y = \gamma_0 1 + Z\beta_{(0)} + \varepsilon \]
Centered/scaled model:
\[ Y = \gamma_0 1 + Z_s\delta_{(0)} + \varepsilon \]
where
\[ \gamma_0 = \beta_0 + \frac{1}{n}1'X_{(0)}\beta_{(0)}, \qquad \delta_{(0)} = D\beta_{(0)} \]


So
\[ \hat{\beta}_0 = \hat{\gamma}_0 - \frac{1}{n}1'X_{(0)}\hat{\beta}_{(0)} = \bar{y} - \frac{1}{n}1'X_{(0)}D^{-1}\hat{\delta}_{(0)}, \qquad \hat{\beta}_{(0)} = D^{-1}\hat{\delta}_{(0)} \]
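The relationships among the three parameterizations can be verified numerically. A minimal sketch with simulated predictors (hypothetical data, not from the course) constructs $Z_s$, checks that $Z_s'Z_s$ equals the correlation matrix $R$, and recovers the non-centered OLS coefficients from $\hat{\delta}_{(0)}$.

import numpy as np

rng = np.random.default_rng(6)
n, k = 50, 3
X0 = rng.normal(size=(n, k))                    # predictors without the intercept column
y = 2.0 + X0 @ np.array([1.0, -1.0, 0.5]) + rng.normal(size=n)

Z = X0 - X0.mean(axis=0)                        # centered predictors
scale = np.sqrt((Z**2).sum(axis=0))             # diagonal of D
Zs = Z / scale                                  # centered and scaled predictors

R = Zs.T @ Zs
print(np.allclose(R, np.corrcoef(X0, rowvar=False)))   # Zs'Zs is the correlation matrix R

gamma0_hat = y.mean()
delta_hat = np.linalg.solve(R, Zs.T @ y)        # (Zs'Zs)^{-1} Zs'y
beta_hat = delta_hat / scale                    # beta_(0) = D^{-1} delta_(0)
beta0_hat = gamma0_hat - X0.mean(axis=0) @ beta_hat

full = np.linalg.lstsq(np.column_stack([np.ones(n), X0]), y, rcond=None)[0]
print(np.allclose(full, np.r_[beta0_hat, beta_hat]))   # agrees with OLS on the non-centered model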


§25 Lec 25: Nov 22, 2021

§25.1 Multi-Collinearity
Let $X_1, X_2, \dots, X_k$ be predictors, some of which are highly correlated with other predictors. Earlier we saw that $\delta_1 = \beta_1\sqrt{\sum_i (x_{i1} - \bar{x}_1)^2}$, so
\[ \hat{\delta}_1 = \hat{\beta}_1\sqrt{\sum_i (x_{i1} - \bar{x}_1)^2}, \qquad \hat{\beta}_1 = \frac{\hat{\delta}_1}{\sqrt{\sum_i (x_{i1} - \bar{x}_1)^2}}, \qquad \operatorname{var}(\hat{\beta}_1) = \frac{\operatorname{var}(\hat{\delta}_1)}{\sum_i (x_{i1} - \bar{x}_1)^2} \]
So let's find the variance of $\hat{\delta}_1$ using the centered and scaled model:
\[ \operatorname{var}(\hat{\delta}_{(0)}) = \sigma^2 R^{-1} = \sigma^2 \begin{pmatrix}
1 & r_{12} & r_{13} & \dots & r_{1k} \\
r_{21} & 1 & r_{23} & \dots & r_{2k} \\
\vdots & \vdots & & & \vdots \\
r_{k1} & r_{k2} & r_{k3} & \dots & 1
\end{pmatrix}^{-1} \]
Therefore, $\operatorname{var}(\hat{\delta}_1) = \sigma^2 R^{-1}[1,1]$, the first diagonal element. Using the inverse of a partitioned matrix,
\[ \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}^{-1}
 = \begin{pmatrix} C_{11}^{-1} & -C_{11}^{-1}A_{12}A_{22}^{-1} \\ -A_{22}^{-1}A_{21}C_{11}^{-1} & A_{22}^{-1} + A_{22}^{-1}A_{21}C_{11}^{-1}A_{12}A_{22}^{-1} \end{pmatrix} \]
Here $A_{11} = 1$, $A_{12} = r'$, $A_{21} = r$, $A_{22} = R_{22}$, and $C_{11} = A_{11} - A_{12}A_{22}^{-1}A_{21}$. Therefore,
\[ \operatorname{var}(\hat{\delta}_1) = \frac{\sigma^2}{1 - r'R_{22}^{-1}r} \]
We will show that $\operatorname{var}(\hat{\delta}_1) = \frac{\sigma^2}{1 - R_1^2}$, where $R_1^2$ is the R-square from the regression of $X_1$ on $X_2, X_3, \dots, X_k$. Instead, we can regress $Z_{s_1}$ on $Z_{s_2}, Z_{s_3}, \dots, Z_{s_k}$, because we have seen that the three models (non-centered, centered, and centered/scaled) are equivalent. Here is the model:
\[ Z_{s_{i1}} = \alpha_0 + \alpha_1 Z_{s_{i2}} + \alpha_2 Z_{s_{i3}} + \dots + \alpha_{k-1} Z_{s_{ik}} + \varepsilon_i \]
\[ R_1^2 = \frac{\text{SSR}}{\text{SST}}, \qquad \text{SST} = \sum_i \left(Z_{s_{i1}} - \bar{Z}_{s_1}\right)^2 = \sum_i Z_{s_{i1}}^2 = Z_{s_1}'Z_{s_1} = 1, \]
since $\bar{Z}_{s_1} = 0$. So we have $R_1^2 = \text{SSR} = \sum_i \left(\hat{Z}_{s_{i1}} - \bar{Z}_{s_1}\right)^2 = \hat{Z}_{s_1}'\hat{Z}_{s_1}$. Here $\hat{Z}_{s_1} = HZ_{s_1}$, where $H$ is the hat matrix using $Z_{s_2}, Z_{s_3}, \dots, Z_{s_k}$. Therefore $R_1^2 = Z_{s_1}'HZ_{s_1}$, or
\[ R_1^2 = Z_{s_1}'Z_s^*\left(Z_s^{*\prime}Z_s^*\right)^{-1}Z_s^{*\prime}Z_{s_1} = r'R_{22}^{-1}r \]


Earlier we found that
\[ \operatorname{var}(\hat{\delta}_1) = \frac{\sigma^2}{1 - r'R_{22}^{-1}r} = \frac{\sigma^2}{1 - R_1^2} \]
Now back to the variance of $\hat{\beta}_1$:
\[ \operatorname{var}(\hat{\beta}_1) = \frac{\operatorname{var}(\hat{\delta}_1)}{\sum_i (x_{i1} - \bar{x}_1)^2} \]
Replacing $\operatorname{var}(\hat{\delta}_1) = \frac{\sigma^2}{1 - R_1^2}$, we obtain
\[ \operatorname{var}(\hat{\beta}_1) = \frac{\sigma^2}{(1 - R_1^2)\sum_i (x_{i1} - \bar{x}_1)^2} \]
Therefore, if $R_1^2$ is close to 1, then $\operatorname{var}(\hat{\beta}_1)$ is large.


Detection of multi-collinearity: use the variance inflation factor (VIF). For each predictor $j$ compute
\[ \text{VIF}_j = \frac{1}{1 - R_j^2} \]
where $R_j^2$ is the R-square from the regression of predictor $x_j$ on the other predictors. For example, if $\text{VIF}_j > 10$, then $R_j^2 > 0.90$, which means $x_j$ is highly correlated with the other predictors.
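Below is a minimal VIF computation written directly from this definition (the helper function and the nearly collinear example data are hypothetical, for illustration only).

import numpy as np

def vif(X0):
    """Variance inflation factors for the columns of a predictor matrix X0 (no intercept)."""
    n, k = X0.shape
    out = []
    for j in range(k):
        y = X0[:, j]
        Xj = np.column_stack([np.ones(n), np.delete(X0, j, axis=1)])  # regress X_j on the others
        yhat = Xj @ np.linalg.lstsq(Xj, y, rcond=None)[0]
        r2 = 1.0 - ((y - yhat)**2).sum() / ((y - y.mean())**2).sum()
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

# Example with two nearly collinear predictors
rng = np.random.default_rng(7)
x1 = rng.normal(size=200)
x2 = x1 + 0.05 * rng.normal(size=200)           # almost a copy of x1
x3 = rng.normal(size=200)
print(vif(np.column_stack([x1, x2, x3])))       # first two VIFs are large, third is near 1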

§25.2 Generalized Least Squares


Consider the model
\[ Y = X\beta + \varepsilon \]
So far we assumed the Gauss-Markov conditions. Suppose now
\[ E\varepsilon = 0, \qquad \operatorname{var}(\varepsilon) = \sigma^2 V \]
where $V$ is a known symmetric (positive definite) matrix of constants. If we use the ordinary least squares (OLS) estimator $\hat{\beta} = (X'X)^{-1}X'Y$, we still get $E\hat{\beta} = \beta$ because $EY = X\beta$, but
\[ \operatorname{var}(\hat{\beta}) = \operatorname{var}\left[(X'X)^{-1}X'Y\right] = \sigma^2(X'X)^{-1}X'VX(X'X)^{-1} \]
Therefore $\hat{\beta}$ is not BLUE, because the Gauss-Markov conditions do not hold. We transform the model as follows: let $V^{-\frac{1}{2}}$ be the inverse square root matrix of $V$. Multiply the model on both sides by $V^{-\frac{1}{2}}$:
\[ V^{-\frac{1}{2}}Y = V^{-\frac{1}{2}}X\beta + V^{-\frac{1}{2}}\varepsilon \quad \text{or} \quad Y^* = X^*\beta + \varepsilon^* \]
\[ E\varepsilon^* = E\left(V^{-\frac{1}{2}}\varepsilon\right) = 0, \qquad \operatorname{var}(\varepsilon^*) = \operatorname{var}\left(V^{-\frac{1}{2}}\varepsilon\right) = \sigma^2 V^{-\frac{1}{2}}VV^{-\frac{1}{2}} = \sigma^2 I \]
With this transformation we see that the Gauss-Markov conditions hold. Therefore, we estimate $\beta$ using
\[ \hat{\beta}_{GLS} = \left(X^{*\prime}X^*\right)^{-1}X^{*\prime}Y^* \]
Replacing $X^* = V^{-\frac{1}{2}}X$ and $Y^* = V^{-\frac{1}{2}}Y$, we get
\[ \hat{\beta}_{GLS} = \left(X'V^{-1}X\right)^{-1}X'V^{-1}Y \]


Then the mean is
\[ E\hat{\beta}_{GLS} = \left(X'V^{-1}X\right)^{-1}X'V^{-1}EY = \left(X'V^{-1}X\right)^{-1}X'V^{-1}X\beta = \beta, \]
so the estimator is unbiased. And
\[ \operatorname{var}(\hat{\beta}_{GLS}) = \operatorname{var}\left[\left(X'V^{-1}X\right)^{-1}X'V^{-1}Y\right] = \sigma^2\left(X'V^{-1}X\right)^{-1} \]
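A minimal GLS sketch follows (hypothetical heteroscedastic data with a known diagonal $V$, not from the notes): it forms $\hat{\beta}_{GLS} = (X'V^{-1}X)^{-1}X'V^{-1}Y$ and the unbiased error-variance estimate based on the transformed residuals.

import numpy as np

rng = np.random.default_rng(8)
n, k = 60, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
beta = np.array([1.0, 2.0, -1.0])

v = rng.uniform(0.5, 4.0, size=n)               # known, unequal error variances (sigma^2 = 1 here)
V = np.diag(v)
y = X @ beta + rng.normal(size=n) * np.sqrt(v)

V_inv = np.diag(1.0 / v)
beta_gls = np.linalg.solve(X.T @ V_inv @ X, X.T @ V_inv @ y)

# transformed residuals V^{-1/2}(y - X beta_gls), using the Cholesky factor of V
e_gls = np.linalg.solve(np.linalg.cholesky(V), y - X @ beta_gls)
Se2_gls = e_gls @ e_gls / (n - k - 1)           # unbiased estimator of sigma^2
print(beta_gls, Se2_gls)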


§26 Lec 26: Nov 24, 2021

§26.1 Generalized Least Squares (Cont'd)
Estimate $\beta$ by direct minimization of the error sum of squares using the transformed model
\[ Y^* = X^*\beta + \varepsilon^*, \]
i.e. $\min \varepsilon^{*\prime}\varepsilon^*$ or $\min (Y^* - X^*\beta)'(Y^* - X^*\beta)$. Replacing $Y^* = V^{-\frac{1}{2}}Y$ and $X^* = V^{-\frac{1}{2}}X$, we minimize
\[ Q = (Y - X\beta)'V^{-1}(Y - X\beta) = Y'V^{-1}Y - 2Y'V^{-1}X\beta + \beta'X'V^{-1}X\beta \]
\[ \frac{\partial Q}{\partial \beta} = -2X'V^{-1}Y + 2X'V^{-1}X\beta = 0 \implies \hat{\beta}_{GLS} = \left(X'V^{-1}X\right)^{-1}X'V^{-1}Y \]
Assume now $\varepsilon \sim N_n(0, \sigma^2 V)$. Then $Y \sim N_n(X\beta, \sigma^2 V)$ and
\[ L = \frac{1}{(2\pi)^{\frac{n}{2}}}\left|\sigma^2 V\right|^{-\frac{1}{2}} e^{-\frac{1}{2}(y - X\beta)'(\sigma^2 V)^{-1}(y - X\beta)} \]
\[ \ln L = -\frac{n}{2}\ln 2\pi\sigma^2 - \frac{1}{2}\ln|V| - \frac{1}{2\sigma^2}(y - X\beta)'V^{-1}(y - X\beta) \]
Setting $\frac{\partial \ln L}{\partial \beta} = 0$ gives again
\[ \hat{\beta}_{GLS} = \left(X'V^{-1}X\right)^{-1}X'V^{-1}Y \]

Estimation of $\sigma^2$:
\[ \frac{\partial \ln L}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}(y - X\beta)'V^{-1}(y - X\beta) = 0 \]
\[ \hat{\sigma}^2 = \frac{(y - X\hat{\beta}_{GLS})'V^{-1}(y - X\hat{\beta}_{GLS})}{n} = \frac{(y^* - X^*\hat{\beta}_{GLS})'(y^* - X^*\hat{\beta}_{GLS})}{n} = \frac{e_{GLS}'e_{GLS}}{n} \]
Use the properties of the trace to find $E\hat{\sigma}^2$:
\[ E\hat{\sigma}^2 = \frac{1}{n}E\left[(y - X\hat{\beta}_{GLS})'V^{-1}(y - X\hat{\beta}_{GLS})\right]
 = \frac{1}{n}\left[\operatorname{tr}\left(V^{-1}\operatorname{var}(Y - X\hat{\beta}_{GLS})\right) + 0\right], \]
because $E[Y - X\hat{\beta}_{GLS}] = 0$. So
\[ Y - X\hat{\beta}_{GLS} = Y - X\left(X'V^{-1}X\right)^{-1}X'V^{-1}Y = \left[I - X\left(X'V^{-1}X\right)^{-1}X'V^{-1}\right]Y \]
and
\[ \operatorname{var}(Y - X\hat{\beta}_{GLS}) = \sigma^2\left[I - X\left(X'V^{-1}X\right)^{-1}X'V^{-1}\right]V\left[I - V^{-1}X\left(X'V^{-1}X\right)^{-1}X'\right]
 = \sigma^2 V - \sigma^2 X\left(X'V^{-1}X\right)^{-1}X' \]



Back to the expectation:
\[ E\hat{\sigma}^2 = \frac{1}{n}\operatorname{tr}\left[V^{-1}\left(\sigma^2 V - \sigma^2 X\left(X'V^{-1}X\right)^{-1}X'\right)\right]
 = \frac{1}{n}\left(\sigma^2\operatorname{tr} I_n - \sigma^2\operatorname{tr} I_{k+1}\right) = \frac{n-k-1}{n}\sigma^2 \]
Thus, the unbiased estimator of $\sigma^2$ is $S_{e_{GLS}}^2 = \frac{e_{GLS}'e_{GLS}}{n-k-1}$.

§26.2 Comparing Regression Equations


Suppose we have two data sets on the same variables:
\[ Y_1 = X_1\beta_1 + \varepsilon_1, \qquad Y_2 = X_2\beta_2 + \varepsilon_2 \]
Let $\beta_1 = \begin{pmatrix} \beta^{(1)} \\ \beta_1^{(2)} \end{pmatrix}$, $\beta_2 = \begin{pmatrix} \beta^{(1)} \\ \beta_2^{(2)} \end{pmatrix}$. Note that
\[ \beta^{(1)}: p \times 1, \qquad \beta_1^{(2)}: (k+1-p) \times 1, \qquad \beta_2^{(2)}: (k+1-p) \times 1 \]
Suppose we want to test $\beta_1^{(2)} = \beta_2^{(2)}$ (assume that the first $p$ elements of $\beta_1$ and $\beta_2$ are the same). We can construct one model as follows:
\[ \begin{pmatrix} Y_1 \\ Y_2 \end{pmatrix}
 = \begin{pmatrix} X_1^{(1)} & X_1^{(2)} & 0 \\ X_2^{(1)} & 0 & X_2^{(2)} \end{pmatrix}
 \begin{pmatrix} \beta^{(1)} \\ \beta_1^{(2)} \\ \beta_2^{(2)} \end{pmatrix}
 + \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \end{pmatrix} \]
Therefore $Y = X\beta + \varepsilon$, and the hypothesis $\beta_1^{(2)} = \beta_2^{(2)}$ can be tested using
\[ H_0: C\beta = 0, \qquad H_a: C\beta \neq 0 \]


§27 Lec 27: Nov 29, 2021

§27.1 Comparing Regression Equations (Cont'd)
We can use the $F$ test for the general linear hypothesis:
\[ \frac{\left(C\hat{\beta} - \gamma\right)'\left[C(X'X)^{-1}C'\right]^{-1}\left(C\hat{\beta} - \gamma\right)}{mS_e^2} \sim F_{m,\,n-k-1} \]

Example 27.1
Suppose $k = 5$ and $p = 3$. We want to test
\[ H_0: \beta_3^{(1)} = \beta_3^{(2)}, \; \beta_4^{(1)} = \beta_4^{(2)}, \; \beta_5^{(1)} = \beta_5^{(2)}, \qquad H_a: \text{not true} \]
\[ C = \begin{pmatrix}
0 & 0 & 0 & 1 & 0 & 0 & -1 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & -1 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & -1
\end{pmatrix} \]
and
\[ \beta = \left(\beta_0, \beta_1, \beta_2, \beta_3^{(1)}, \beta_4^{(1)}, \beta_5^{(1)}, \beta_3^{(2)}, \beta_4^{(2)}, \beta_5^{(2)}\right)' \]
In general,
\[ C = \begin{pmatrix} 0_{k+1-p,\,p} & I_{k+1-p} & -I_{k+1-p} \end{pmatrix} \]
Therefore, $C$ is $(k+1-p) \times \left(2(k+1) - p\right)$.

We can also test the hypothesis using the extra sum of squares principle. Under the null hypothesis, the model is expressed as follows:
\[ \begin{pmatrix} Y_1 \\ Y_2 \end{pmatrix}
 = \begin{pmatrix} X_1^{(1)} & X_1^{(2)} \\ X_2^{(1)} & X_2^{(2)} \end{pmatrix}
 \begin{pmatrix} \beta^{(1)} \\ \beta^* \end{pmatrix}
 + \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \end{pmatrix} \]
where $\beta^*$ is the common beta subvector under $H_0$. Therefore,
\[ \frac{(\text{SSE}_R - \text{SSE}_F)/(df_R - df_F)}{\text{SSE}_F/df_F} \sim F_{df_R - df_F,\,df_F} \]
\[ df_F = n - p - 2(k+1-p) = n + p - 2(k+1), \qquad df_R = n - p - (k+1-p) = n - k - 1 \]


Example 27.2
Suppose $k = 5$, $p = 4$:
\[ H_0: \beta_4^{(1)} = \beta_4^{(2)}, \; \beta_5^{(1)} = \beta_5^{(2)}, \qquad H_a: \text{not true} \]
Formulation: stack the two data sets, keeping the first four coefficients common and giving each data set its own $\beta_4$ and $\beta_5$:
\[ \begin{pmatrix} y_{11} \\ y_{21} \\ \vdots \\ y_{n1} \\ y_{12} \\ y_{22} \\ \vdots \\ y_{n2} \end{pmatrix}
 = \begin{pmatrix}
1 & x_{11}^{(1)} & x_{12}^{(1)} & x_{13}^{(1)} & x_{14}^{(1)} & x_{15}^{(1)} & 0 & 0 \\
1 & x_{21}^{(1)} & x_{22}^{(1)} & x_{23}^{(1)} & x_{24}^{(1)} & x_{25}^{(1)} & 0 & 0 \\
\vdots & & & & & & & \vdots \\
1 & x_{n1}^{(1)} & x_{n2}^{(1)} & x_{n3}^{(1)} & x_{n4}^{(1)} & x_{n5}^{(1)} & 0 & 0 \\
1 & x_{11}^{(2)} & x_{12}^{(2)} & x_{13}^{(2)} & 0 & 0 & x_{14}^{(2)} & x_{15}^{(2)} \\
1 & x_{21}^{(2)} & x_{22}^{(2)} & x_{23}^{(2)} & 0 & 0 & x_{24}^{(2)} & x_{25}^{(2)} \\
\vdots & & & & & & & \vdots \\
1 & x_{n1}^{(2)} & x_{n2}^{(2)} & x_{n3}^{(2)} & 0 & 0 & x_{n4}^{(2)} & x_{n5}^{(2)}
\end{pmatrix}
\begin{pmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \beta_3 \\ \beta_4^{(1)} \\ \beta_5^{(1)} \\ \beta_4^{(2)} \\ \beta_5^{(2)} \end{pmatrix}
 + \begin{pmatrix} \varepsilon_{11} \\ \varepsilon_{21} \\ \vdots \\ \varepsilon_{n1} \\ \varepsilon_{12} \\ \varepsilon_{22} \\ \vdots \\ \varepsilon_{n2} \end{pmatrix} \]
\[ C = \begin{pmatrix} 0 & 0 & 0 & 0 & 1 & 0 & -1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 & 0 & -1 \end{pmatrix} \]
\[ df_F = n - 8, \qquad df_R = n - 6 \]

§27.2 Deleting a Single Point in Multiple Regression
We want to explore the effect of deleting a single point in multiple regression
• Effect on β̂
• Effect on Se2
• Effect on fitted values

We can delete one point at a time and run a new regression each time to see the effect. But this
will require n + 1 regressions (one on the full data set and n regressions when we delete data point
i, i = 1, . . . , n). There is a more automated way and the result is based on the residuals, ei , and
leverage values hii from the regression of the full data set.

Consider $Y = X\beta + \varepsilon$ and suppose we want to delete data point $i$. Partition the model as
\[ \begin{pmatrix} Y_{(i)} \\ Y_i \end{pmatrix} = \begin{pmatrix} X_{(i)} \\ x_i' \end{pmatrix}\beta + \begin{pmatrix} \varepsilon_{(i)} \\ \varepsilon_i \end{pmatrix} \]
where $Y_{(i)}$, $X_{(i)}$ are the vector $Y$ and the matrix $X$ after deleting point $i$ from the data set. We are now working with the model
\[ Y_{(i)} = X_{(i)}\beta + \varepsilon_{(i)}, \qquad \hat{\beta}_{(i)} = \left(X_{(i)}'X_{(i)}\right)^{-1}X_{(i)}'Y_{(i)}, \]
where $\hat{\beta}_{(i)}$ is the estimator of the vector $\beta$ after deleting data point $i$.


§28 Lec 28: Dec 1, 2021

§28.1 Deleting a Single Point in Multiple Regression (Cont'd)
Let's find expressions for $X_{(i)}'X_{(i)}$ and $X_{(i)}'Y_{(i)}$:
\[ X'X = \begin{pmatrix} X_{(i)}' & x_i \end{pmatrix}\begin{pmatrix} X_{(i)} \\ x_i' \end{pmatrix} = X_{(i)}'X_{(i)} + x_i x_i'
 \implies X_{(i)}'X_{(i)} = X'X - x_i x_i' \]
Result: let $A$ be a square invertible matrix and $b$ a vector with $b'A^{-1}b \neq 1$. Then
\[ \left(A - bb'\right)^{-1} = A^{-1} + \frac{A^{-1}bb'A^{-1}}{1 - b'A^{-1}b} \]
We can verify this by multiplying both sides by $A - bb'$: the product is the identity matrix. In our problem $A$ is $X'X$ and $b$ is $x_i$, so the result gives $\left(X_{(i)}'X_{(i)}\right)^{-1}$. Now let's find $X_{(i)}'Y_{(i)}$. From
\[ X'Y = \begin{pmatrix} X_{(i)}' & x_i \end{pmatrix}\begin{pmatrix} Y_{(i)} \\ y_i \end{pmatrix} = X_{(i)}'Y_{(i)} + x_i y_i \]
we get $X_{(i)}'Y_{(i)} = X'Y - x_i y_i$. Substituting both into $\hat{\beta}_{(i)} = \left(X_{(i)}'X_{(i)}\right)^{-1}X_{(i)}'Y_{(i)}$ and simplifying, we find
\[ \hat{\beta}_{(i)} = \hat{\beta} - \frac{(X'X)^{-1}x_i e_i}{1 - h_{ii}} \]
Then
\[ \hat{\beta} - \hat{\beta}_{(i)} = \frac{(X'X)^{-1}x_i e_i}{1 - h_{ii}} \]
This is the difference in the estimator of $\beta$ before and after deleting data point $i$.
Effect on fitted values:
\[ \hat{Y}_i - \hat{Y}_{i(i)} = x_i'\hat{\beta} - x_i'\hat{\beta}_{(i)} = x_i'\left(\hat{\beta} - \hat{\beta}_{(i)}\right)
 = x_i'\frac{(X'X)^{-1}x_i e_i}{1 - h_{ii}} = \frac{h_{ii}}{1 - h_{ii}}e_i \]
Finally, we can show that the error sum of squares after deleting data point $i$ is connected with the error sum of squares of the full data set as follows:
\[ (n-k-2)S_{e(i)}^2 = (n-k-1)S_e^2 - \frac{e_i^2}{1 - h_{ii}} \]
• $S_{e(i)}^2$ is the unbiased estimator of $\sigma^2$ after deleting data point $i$
• $S_e^2$ is the unbiased estimator of $\sigma^2$ using the full data set
• $e_i$, $h_{ii}$ are residual $i$ and leverage $i$ using the full data set
\[ e = (I - H)Y, \qquad e_i = \left[(I - H)Y\right]_i, \qquad h_{ii} = x_i'(X'X)^{-1}x_i \]
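These downdate identities are straightforward to verify numerically. A small sketch (simulated data; the deleted index is arbitrary) compares the closed-form $\hat{\beta}_{(i)}$ with an actual refit on $n-1$ observations and checks the SSE relation.

import numpy as np

rng = np.random.default_rng(9)
n, k = 30, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ np.array([1.0, 0.5, -2.0]) + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
bhat = XtX_inv @ X.T @ y
e = y - X @ bhat
H = X @ XtX_inv @ X.T

i = 7                                            # data point to delete (arbitrary choice)
hii, ei = H[i, i], e[i]

beta_i_formula = bhat - XtX_inv @ X[i] * ei / (1 - hii)        # closed-form downdate
keep = np.delete(np.arange(n), i)
beta_i_refit = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]
print(np.allclose(beta_i_formula, beta_i_refit))

# SSE relation: SSE_(i) = SSE - e_i^2 / (1 - h_ii)
r = y[keep] - X[keep] @ beta_i_refit
print(np.isclose(r @ r, e @ e - ei**2 / (1 - hii)))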


Adding a data point in multiple regression:
\[ \begin{pmatrix} Y \\ y_0 \end{pmatrix} = \begin{pmatrix} X \\ x_0' \end{pmatrix}\beta + \begin{pmatrix} \varepsilon \\ \varepsilon_0 \end{pmatrix}, \qquad Y_{new} = X_{new}\beta + \varepsilon_{new} \]
\[ X_{new}'X_{new} = \begin{pmatrix} X' & x_0 \end{pmatrix}\begin{pmatrix} X \\ x_0' \end{pmatrix} = X'X + x_0 x_0' \]
Result: let $A$ be a square invertible matrix and $b$ a vector such that $1 + b'A^{-1}b \neq 0$. Then
\[ \left(A + bb'\right)^{-1} = A^{-1} - \frac{A^{-1}bb'A^{-1}}{1 + b'A^{-1}b} \]
Here $A$ is $X'X$ and $b$ is $x_0$. Also,
\[ X_{new}'Y_{new} = \begin{pmatrix} X' & x_0 \end{pmatrix}\begin{pmatrix} Y \\ y_0 \end{pmatrix} = X'Y + x_0 y_0 \]
Finally,
\[ \hat{\beta}_{new} = \hat{\beta} + \frac{(X'X)^{-1}x_0 e_0}{1 + h_{00}} \]
where $e_0 = y_0 - x_0'\hat{\beta}$ and $h_{00} = x_0'(X'X)^{-1}x_0$.

§28.2 Influential Analysis
Internally studentized residuals:
\[ e_i \sim N\left(0, \sigma\sqrt{1 - h_{ii}}\right), \quad \frac{(n-k-1)S_e^2}{\sigma^2} \sim \chi^2_{n-k-1}
 \implies \frac{\frac{e_i}{\sigma\sqrt{1 - h_{ii}}}}{\sqrt{\frac{(n-k-1)S_e^2}{\sigma^2}/(n-k-1)}} = \frac{e_i}{S_e\sqrt{1 - h_{ii}}} \]
This ratio does not follow a $t$ distribution because $e_i$ and $S_e$ are not independent. Let $r_i = \frac{e_i}{S_e\sqrt{1 - h_{ii}}}$. We will show that
\[ \frac{r_i^2}{n-k-1} \sim \text{beta}\left(\frac{1}{2}, \frac{1}{2}(n-k-2)\right) \]
So we need to show that
\[ \frac{e_i^2}{\text{SSE}(1 - h_{ii})} \sim \text{beta}\left(\frac{1}{2}, \frac{1}{2}(n-k-2)\right) \]

1. $e = (I - H)\varepsilon$, thus
\[ e_i = c_i'(I - H)\varepsilon \]
where $c_i' = \begin{pmatrix} 0 & 0 & \dots & 1 & 0 & \dots & 0 \end{pmatrix}$ with the 1 at the $i$th position. So
\[ e_i^2 = e_i e_i = \varepsilon'(I - H)c_i c_i'(I - H)\varepsilon \]

2. $\text{SSE} = \varepsilon'(I - H)\varepsilon$

§29 Lec 29: Dec 3, 2021

§29.1 Influential Analysis (Cont'd)
Let's express $\frac{e_i^2}{\text{SSE}(1 - h_{ii})}$ as follows:
\[ \frac{\varepsilon'(I - H)c_i c_i'(I - H)\varepsilon}{\varepsilon'(I - H)\varepsilon\,(1 - h_{ii})} \]
Dividing numerator and denominator by $\sigma^2$,
\[ \frac{\frac{\varepsilon'}{\sigma}\,\frac{(I - H)c_i c_i'(I - H)}{1 - h_{ii}}\,\frac{\varepsilon}{\sigma}}{\frac{\varepsilon'}{\sigma}(I - H)\frac{\varepsilon}{\sigma}} = \frac{Z'QZ}{Z'(I - H)Z} \]
Here $Z = \frac{\varepsilon}{\sigma} \sim N_n(0, I)$ and $Q = \frac{(I - H)c_i c_i'(I - H)}{1 - h_{ii}}$.

3. We have
\[ QQ = \frac{(I - H)c_i c_i'(I - H)(I - H)c_i c_i'(I - H)}{(1 - h_{ii})^2}
 = \frac{(I - H)c_i\left[c_i'(I - H)c_i\right]c_i'(I - H)}{(1 - h_{ii})^2}
 = \frac{(I - H)c_i c_i'(I - H)}{1 - h_{ii}} = Q, \]
since $c_i'(I - H)c_i = 1 - h_{ii}$. Thus $Q$ is a symmetric and idempotent matrix. Because $Z \sim N_n(0, I)$, it follows that $Z'QZ \sim \chi^2_{\operatorname{tr}(Q)}$, and
\[ \operatorname{tr}(Q) = \operatorname{tr}\left[\frac{(I - H)c_i c_i'(I - H)}{1 - h_{ii}}\right] = \frac{c_i'(I - H)(I - H)c_i}{1 - h_{ii}} = 1 \]
Thus $Z'QZ \sim \chi^2_1$.
4. Back to the ratio:
\[ \frac{Z'QZ}{Z'(I - H)Z} = \frac{Z'QZ}{Z'(I - H - Q)Z + Z'QZ} \]
Moreover,
\[ (I - H - Q)Q = Q - HQ - QQ = Q - 0 - Q = 0, \qquad \text{since } HQ = \frac{H(I - H)c_i c_i'(I - H)}{1 - h_{ii}} = 0 \]
Thus $Z'(I - H - Q)Z$ and $Z'QZ$ are independent.


5. Consider
\[ (I - H - Q)(I - H - Q) = (I - H)(I - H) - (I - H)Q - Q(I - H) + QQ = (I - H) - Q - Q + Q = I - H - Q \]
Therefore $I - H - Q$ is a symmetric and idempotent matrix. It follows that
\[ Z'(I - H - Q)Z \sim \chi^2_{\operatorname{tr}(I - H - Q)}, \quad \text{i.e.} \quad Z'(I - H - Q)Z \sim \chi^2_{n-k-2} \]


Result: let $X \sim \Gamma(\alpha_1, \beta)$, $Y \sim \Gamma(\alpha_2, \beta)$, with $X, Y$ independent. Let $U = X + Y$ and $V = \frac{X}{X+Y}$. Then $U, V$ are independent and
\[ U \sim \Gamma(\alpha_1 + \alpha_2, \beta), \qquad V \sim \text{beta}(\alpha_1, \alpha_2) \]
Here $Z'QZ \sim \chi^2_1$, i.e. $Z'QZ \sim \Gamma\left(\frac{1}{2}, 2\right)$, and $Z'(I - H - Q)Z \sim \chi^2_{n-k-2}$, i.e. $\sim \Gamma\left(\frac{1}{2}(n-k-2), 2\right)$. Therefore,
\[ \frac{Z'QZ}{Z'(I - H - Q)Z + Z'QZ} \sim \text{beta}\left(\frac{1}{2}, \frac{1}{2}(n-k-2)\right) \]
We can conclude that
\[ \frac{r_i^2}{n-k-1} \sim \text{beta}\left(\frac{1}{2}, \frac{1}{2}(n-k-2)\right) \]

§29.2 Externally Studentized Residual
Consider the ratio
\[ t_i = \frac{e_i}{S_{e(i)}\sqrt{1 - h_{ii}}} \]
where $S_{e(i)}^2$ is the unbiased estimator of $\sigma^2$ after data point $i$ is deleted from the data set. Notice that
\[ t_i^2 = \frac{e_i^2}{S_{e(i)}^2(1 - h_{ii})} = \frac{e_i^2(n-k-2)}{(n-k-2)S_{e(i)}^2(1 - h_{ii})} \]
But $(n-k-2)S_{e(i)}^2 = (n-k-1)S_e^2 - \frac{e_i^2}{1 - h_{ii}}$. Then,
\[ t_i^2 = \frac{e_i^2(n-k-2)}{\left[(n-k-1)S_e^2 - \frac{e_i^2}{1 - h_{ii}}\right](1 - h_{ii})} = \frac{e_i^2(n-k-2)}{(n-k-1)S_e^2(1 - h_{ii}) - e_i^2} \]
Note: $r_i^2 = \frac{e_i^2}{S_e^2(1 - h_{ii})} \implies e_i^2 = r_i^2 S_e^2(1 - h_{ii})$. Then,
\[ t_i^2 = \frac{r_i^2 S_e^2(1 - h_{ii})(n-k-2)}{(n-k-1)S_e^2(1 - h_{ii}) - r_i^2 S_e^2(1 - h_{ii})} = \frac{r_i^2(n-k-2)}{n-k-1-r_i^2} = (n-k-2)\frac{B}{1-B}, \]
where $B = \frac{r_i^2}{n-k-1}$, and $\frac{r_i^2}{n-k-1} \sim \text{beta}\left(\frac{1}{2}, \frac{1}{2}(n-k-2)\right)$. From homework #10, exercise #5: if $B \sim \text{beta}\left(\frac{1}{2}\alpha, \frac{1}{2}\beta\right)$, then
\[ \frac{\beta B}{\alpha(1 - B)} \sim F_{\alpha, \beta} \]
Here $\alpha = 1$, $\beta = n - k - 2$. It follows that
\[ t_i^2 = (n-k-2)\frac{B}{1-B} \sim F_{1,\,n-k-2} \]
and therefore
\[ t_i = \frac{e_i}{S_{e(i)}\sqrt{1 - h_{ii}}} \sim t_{n-k-2} \]
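Both kinds of studentized residuals can be computed without refitting, using only $e_i$, $h_{ii}$, and SSE from the full fit. A minimal sketch (simulated data, not from the notes) computes them and checks the algebraic relation $t_i = r_i\sqrt{(n-k-2)/(n-k-1-r_i^2)}$ implied by the derivation above.

import numpy as np

rng = np.random.default_rng(10)
n, k = 30, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
y = X @ np.array([0.5, 1.0, -1.0]) + rng.normal(size=n)

H = X @ np.linalg.inv(X.T @ X) @ X.T
e = y - H @ y
h = np.diag(H)
SSE = e @ e
Se2 = SSE / (n - k - 1)

r = e / np.sqrt(Se2 * (1 - h))                       # internally studentized residuals
Se2_i = (SSE - e**2 / (1 - h)) / (n - k - 2)         # leave-one-out variance estimates
t = e / np.sqrt(Se2_i * (1 - h))                     # externally studentized residuals

print(np.allclose(t, r * np.sqrt((n - k - 2) / (n - k - 1 - r**2))))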


§29.3 A Note on Variable Selection
Effect on the regression when predictors are removed from the model:

a) Effect on $\hat{\beta}$. Suppose
\[ Y = X\beta + \varepsilon \quad \text{or} \quad Y = X_1\beta_1 + X_2\beta_2 + \varepsilon \]
is the correct model, but we decide to use $Y = X_1\beta_1 + \varepsilon$. Then $\hat{\beta}_1 = (X_1'X_1)^{-1}X_1'Y$ and therefore
\[ E\hat{\beta}_1 = (X_1'X_1)^{-1}X_1'(X_1\beta_1 + X_2\beta_2) = \beta_1 + (X_1'X_1)^{-1}X_1'X_2\beta_2, \]
which is not unbiased (unless $X_1'X_2 = 0$ or $\beta_2 = 0$).

b) Effect on the variance-covariance matrix of $\hat{\beta}$:
• From the short regression: $\operatorname{var}(\hat{\beta}_1) = \sigma^2(X_1'X_1)^{-1}$
• From the long regression:
\[ \operatorname{var}(\hat{\beta}_{1\cdot 2}) = \sigma^2\left(X_1^{*\prime}X_1^*\right)^{-1} = \sigma^2\left(X_1'(I - H_2)X_1\right)^{-1} = \sigma^2\left(X_1'X_1 - X_1'H_2X_1\right)^{-1} \]
Result: if $A^{-1} \ge B^{-1}$ then $A \le B$. Then
\[ \left[\operatorname{var}(\hat{\beta}_1)\right]^{-1} - \left[\operatorname{var}(\hat{\beta}_{1\cdot 2})\right]^{-1} = \frac{1}{\sigma^2}X_1'H_2X_1 \ge 0
 \implies \operatorname{var}(\hat{\beta}_1) \le \operatorname{var}(\hat{\beta}_{1\cdot 2}) \]
