
Linear statistical models


The less than full rank model

Yao-ban Chan


The less than full rank model

In previous sections we used the linear model

y = Xβ + ε
in the knowledge (or assumption) that X , of dimension n × p, is of
full rank, i.e. r (X ) = p.

This assumption is important because a full rank X implies that
X T X is invertible, and therefore the normal equations

X T X b = X T y

have a unique solution.


The less than full rank model

Unfortunately, not all linear models fall into this category.

For example, consider the one-way classification model with fixed
effects.

In this model, samples come from k distinct (sub-)populations,
with different characteristics. We wish to determine the differences
between these populations.


One-way classification model

For example:
- A medical researcher compares three different types of pain
  relievers for effectiveness in relieving arthritis;
- A botanist studies the effects of four experimental treatments
  used to enhance the growth of tomato plants; or
- An engineer investigates the sulfur content in the five major
  coal seams in a particular geographic region.


One-way classification model

Let yij be the j th sample taken from the i th population. Then the
model we use is
yij = µ + τi + εij ,
for i = 1, 2, . . . , k and j = 1, 2, . . . , ni , where
- k is the number of populations/treatments;
- ni is the number of samples from the i-th population.


One-way classification model

     
$$
\begin{bmatrix}
y_{11} \\ y_{12} \\ \vdots \\ y_{21} \\ y_{22} \\ \vdots \\ y_{k,n_k}
\end{bmatrix}
=
\begin{bmatrix}
1 & 1 & 0 & \cdots & 0 \\
1 & 1 & 0 & \cdots & 0 \\
\vdots & \vdots & \vdots & & \vdots \\
1 & 0 & 1 & \cdots & 0 \\
1 & 0 & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & & \vdots \\
1 & 0 & 0 & \cdots & 1
\end{bmatrix}
\begin{bmatrix}
\mu \\ \tau_1 \\ \tau_2 \\ \vdots \\ \tau_k
\end{bmatrix}
+
\begin{bmatrix}
\varepsilon_{11} \\ \varepsilon_{12} \\ \vdots \\ \varepsilon_{21} \\ \varepsilon_{22} \\ \vdots \\ \varepsilon_{k,n_k}
\end{bmatrix}
$$

y = X β + ε

The first column of X is the sum of the remaining columns, and
therefore X is not of full rank.

One-way classification model

Example. Three different treatment methods for removing organic
carbon from tar sand wastewater are compared: air flotation, foam
separation, and ferric-chloride coagulation. A study is conducted
and the amounts of carbon removed are:

AF FS FCC
34.6 38.8 26.7
35.1 39.0 26.7
35.3 40.1 27.0
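For the examples that follow it is convenient to have these measurements in
R. A minimal sketch (the object names y, treatment and carbon are ours, not
from the original example):

> y <- c(34.6, 35.1, 35.3,    # air flotation (AF)
+        38.8, 39.0, 40.1,    # foam separation (FS)
+        26.7, 26.7, 27.0)    # ferric-chloride coagulation (FCC)
> treatment <- factor(rep(c("AF", "FS", "FCC"), each = 3),
+                     levels = c("AF", "FS", "FCC"))
> carbon <- data.frame(y = y, treatment = treatment)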


One-way classification model

The linear model is


$$
\begin{bmatrix}
34.6 \\ 35.1 \\ 35.3 \\ 38.8 \\ 39.0 \\ 40.1 \\ 26.7 \\ 26.7 \\ 27.0
\end{bmatrix}
=
\begin{bmatrix}
1 & 1 & 0 & 0 \\
1 & 1 & 0 & 0 \\
1 & 1 & 0 & 0 \\
1 & 0 & 1 & 0 \\
1 & 0 & 1 & 0 \\
1 & 0 & 1 & 0 \\
1 & 0 & 0 & 1 \\
1 & 0 & 0 & 1 \\
1 & 0 & 0 & 1
\end{bmatrix}
\begin{bmatrix}
\mu \\ \tau_1 \\ \tau_2 \\ \tau_3
\end{bmatrix}
+
\begin{bmatrix}
\varepsilon_{11} \\ \varepsilon_{12} \\ \varepsilon_{13} \\ \varepsilon_{21} \\ \varepsilon_{22} \\ \varepsilon_{23} \\ \varepsilon_{31} \\ \varepsilon_{32} \\ \varepsilon_{33}
\end{bmatrix}
$$

y = X β + ε


The less than full rank model

The difficulty with a less than full rank model is that X T X is
singular. This means that the normal equations do not have a
unique solution.

However, the problem goes deeper than that: not only can we not
estimate the parameters, but the parameters themselves are not
well defined.


The less than full rank model

In a one-way classification model, the response variable from
population i has a mean of µ + τi . Thus, for our carbon removal
example we might have

µ + τ1 = 36
µ + τ2 = 39
µ + τ3 = 27.

So our parameters might be µ = 34, τ1 = 2, τ2 = 5, τ3 = −7.

However, we can also have µ = 30, τ1 = 6, τ2 = 9, τ3 = −3.

In fact we can choose µ to be any real number, and still describe
the system.


Reparametrization

One way we can tackle the less than full rank model is to convert
to a full rank model. We can then use all the machinery we have
developed.

Example. Consider the one-way classification model with k = 3.
The less than full rank model for this is

yij = µ + τi + εij ,

for i = 1, 2, 3, j = 1, 2, . . . , ni .

However, we can write the mean of each population as

µi = µ + τi .


Reparametrization

Then we can recast the model as

yij = µi + εij ,

with corresponding matrices


 
$$
X =
\begin{bmatrix}
1 & 0 & 0 \\
1 & 0 & 0 \\
\vdots & \vdots & \vdots \\
0 & 1 & 0 \\
0 & 1 & 0 \\
\vdots & \vdots & \vdots \\
0 & 0 & 1
\end{bmatrix},
\qquad
\beta =
\begin{bmatrix}
\mu_1 \\ \mu_2 \\ \mu_3
\end{bmatrix}.
$$


Reparametrization

The columns of X are now linearly independent, and so this is a full
rank model that we can analyse. Simple matrix calculations give us

$$
X^T X =
\begin{bmatrix}
n_1 & 0 & 0 \\
0 & n_2 & 0 \\
0 & 0 & n_3
\end{bmatrix},
\qquad
(X^T X)^{-1} =
\begin{bmatrix}
1/n_1 & 0 & 0 \\
0 & 1/n_2 & 0 \\
0 & 0 & 1/n_3
\end{bmatrix},
$$

$$
X^T y =
\begin{bmatrix}
\sum_{i=1}^{n_1} y_{1i} \\
\sum_{i=1}^{n_2} y_{2i} \\
\sum_{i=1}^{n_3} y_{3i}
\end{bmatrix},
\qquad
b = (X^T X)^{-1} X^T y =
\begin{bmatrix}
\sum_{i=1}^{n_1} y_{1i}/n_1 \\
\sum_{i=1}^{n_2} y_{2i}/n_2 \\
\sum_{i=1}^{n_3} y_{3i}/n_3
\end{bmatrix}.
$$


Reparametrization

Therefore, the least squares estimates for each of the population
means are the means of the samples drawn from that population:

$$
\hat{\mu}_i = \bar{y}_i = \frac{1}{n_i} \sum_{j=1}^{n_i} y_{ij}.
$$

Linear functions of the parameters, of the form tT β, are estimated
using tT b. For example, the function µ1 − µ2 is estimated by

$$
\bar{y}_1 - \bar{y}_2 = \frac{1}{n_1} \sum_{i=1}^{n_1} y_{1i} - \frac{1}{n_2} \sum_{i=1}^{n_2} y_{2i}.
$$
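In R, the reparametrised (cell-means) model can be fitted by removing the
intercept, so that one coefficient per population is estimated. A minimal
sketch using the carbon data frame entered earlier (the object name
cellmeans is ours):

> cellmeans <- lm(y ~ 0 + treatment, data = carbon)
> coef(cellmeans)    # the three sample means: 35.0, 39.3 and 26.8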


Reparametrization

The standard assumption that the errors are normally distributed
with mean 0 and variance σ 2 I is interpreted in this context to
mean that all populations have a common variance σ 2 (but
different means). The estimator for this variance is

$$
s^2 = \frac{y^T y - y^T X (X^T X)^{-1} X^T y}{n - p} = \frac{y^T y - y^T X b}{n - p}.
$$


Reparametrization

For the example,

$$
\begin{aligned}
s^2 &= \frac{1}{n-3} \left[ \sum_{i=1}^{3} \sum_{j=1}^{n_i} y_{ij}^2
- \begin{bmatrix} \sum_{i=1}^{n_1} y_{1i} & \sum_{i=1}^{n_2} y_{2i} & \sum_{i=1}^{n_3} y_{3i} \end{bmatrix}
\begin{bmatrix} \sum_{i=1}^{n_1} y_{1i}/n_1 \\ \sum_{i=1}^{n_2} y_{2i}/n_2 \\ \sum_{i=1}^{n_3} y_{3i}/n_3 \end{bmatrix} \right] \\
&= \frac{1}{n-3} \left[ \sum_{i=1}^{3} \sum_{j=1}^{n_i} y_{ij}^2 - \sum_{i=1}^{3} \frac{1}{n_i} \left( \sum_{j=1}^{n_i} y_{ij} \right)^2 \right] \\
&= \frac{1}{n-3} \sum_{i=1}^{3} \left[ \sum_{j=1}^{n_i} y_{ij}^2 - \frac{1}{n_i} \left( \sum_{j=1}^{n_i} y_{ij} \right)^2 \right].
\end{aligned}
$$


Reparametrization
This can be written as a ‘pooled’ variance

$$
s^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2 + (n_3 - 1)s_3^2}{(n_1 - 1) + (n_2 - 1) + (n_3 - 1)}
$$

where the si2 are the individual population variance estimators

$$
s_i^2 = \frac{1}{n_i - 1} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_{i\cdot})^2.
$$

More generally, for a one-way classification model with k levels,

$$
s^2 = \frac{\sum_{i=1}^{k} (n_i - 1)s_i^2}{\sum_{i=1}^{k} (n_i - 1)}.
$$
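We can check the pooled-variance form numerically on the carbon removal
data, reusing y and treatment from the sketch above:

> ni  <- tapply(y, treatment, length)
> si2 <- tapply(y, treatment, var)
> sum((ni - 1) * si2) / sum(ni - 1)    # pooled variance, about 0.2167

This agrees with the value of s2 obtained for this example later in these
notes.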


Reparametrization

In general, it is always possible to re-parameterise a less than full
rank model into a full rank model.

However, this is not always desirable.

For the one-way classification model, we have a nice interpretation
of the (re-)parameters as the population means. But this is not
always possible.


Reparametrization
Example. Consider the two-way classification model (without
interaction), with two levels of each factor:

yij = µ + τi + βj + εij ,   i , j = 1, 2.

The design matrix for this model is

$$
X =
\begin{bmatrix}
1 & 1 & 0 & 1 & 0 \\
\vdots & \vdots & \vdots & \vdots & \vdots \\
1 & 1 & 0 & 0 & 1 \\
\vdots & \vdots & \vdots & \vdots & \vdots \\
1 & 0 & 1 & 1 & 0 \\
\vdots & \vdots & \vdots & \vdots & \vdots \\
1 & 0 & 1 & 0 & 1 \\
\vdots & \vdots & \vdots & \vdots & \vdots
\end{bmatrix}.
$$


Reparametrization

It is obvious that the first column is the sum of the next two
columns, and also the sum of the 4th and 5th columns. Thus
r (X ) = 3.

This means that we have to remove 2 parameters — which ones?
This makes interpretability much harder!

Fortunately, we do not have to re-parameterise our models: we can
develop theory for the less than full rank model.


Conditional inverses

We start with more linear algebra. Here we introduce the concept
of conditional inverses.

Definition 6.1
Let A be an n × p matrix. The p × n matrix Ac is called a
conditional inverse for A if and only if

AAc A = A.


Conditional inverses

If A is square and nonsingular, then A−1 = Ac , so conditional
inverses are an extension of regular inverses to non-square and
singular matrices.

Example. Consider the (singular) matrices

$$
A =
\begin{bmatrix}
2 & 4 & 2 \\
1 & 0 & -1 \\
3 & 1 & -2
\end{bmatrix},
\qquad
A_1 =
\begin{bmatrix}
0 & 1 & 0 \\
1/4 & -1/2 & 0 \\
0 & 0 & 0
\end{bmatrix}.
$$


Conditional inverses

We have

$$
A A_1 A =
\begin{bmatrix}
2 & 4 & 2 \\
1 & 0 & -1 \\
3 & 1 & -2
\end{bmatrix}
\begin{bmatrix}
0 & 1 & 0 \\
1/4 & -1/2 & 0 \\
0 & 0 & 0
\end{bmatrix}
\begin{bmatrix}
2 & 4 & 2 \\
1 & 0 & -1 \\
3 & 1 & -2
\end{bmatrix}
=
\begin{bmatrix}
2 & 4 & 2 \\
1 & 0 & -1 \\
3 & 1 & -2
\end{bmatrix}
\begin{bmatrix}
1 & 0 & -1 \\
0 & 1 & 1 \\
0 & 0 & 0
\end{bmatrix}
=
\begin{bmatrix}
2 & 4 & 2 \\
1 & 0 & -1 \\
3 & 1 & -2
\end{bmatrix}
= A.
$$

Therefore A1 is a conditional inverse for A.


Conditional inverses

But it can also be shown that

$$
A_2 =
\begin{bmatrix}
0 & 1 & 0 \\
0 & -3 & 1 \\
0 & 0 & 0
\end{bmatrix}
$$

is also a conditional inverse for A! So conditional inverses are not
unique.

That is why we speak of a conditional inverse for A, not the
conditional inverse for A.

Of course, if A is nonsingular, then the conditional inverse is
uniquely the regular inverse. We can use this in the above example
to show that A is singular.
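As a quick numerical check (a minimal sketch; the object names are ours),
both matrices satisfy Definition 6.1:

> A  <- matrix(c(2,1,3, 4,0,1, 2,-1,-2), 3, 3)    # the matrix A above, entered by columns
> A1 <- matrix(c(0,1/4,0, 1,-1/2,0, 0,0,0), 3, 3)
> A2 <- matrix(c(0,0,0, 1,-3,0, 0,1,0), 3, 3)
> all.equal(A %*% A1 %*% A, A)
[1] TRUE
> all.equal(A %*% A2 %*% A, A)
[1] TRUE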


Finding a conditional inverse

For a square matrix to have a regular inverse, it must satisfy some
nonsingularity conditions. However, this is not the case for a
conditional inverse.

Theorem 6.2
Let A be an n × p matrix. Then A has a conditional inverse.
Moreover, conditional inverses can be constructed as follows:
1. Find a minor M of A which is nonsingular and of dimension
r (A) × r (A).
2. Replace M in A with (M −1 )T and the other entries with
zeros.
3. Transpose the resulting matrix.


Finding a conditional inverse

Proof. Let’s assume M is the principal (top left) minor of A. We
write

$$
A =
\begin{bmatrix}
M & A_{12} \\
A_{21} & A_{22}
\end{bmatrix}.
$$

The procedure constructs a p × n matrix B which can be
partitioned as

$$
B =
\begin{bmatrix}
M^{-1} & 0 \\
0 & 0
\end{bmatrix}.
$$


Finding a conditional inverse

Then we have

$$
ABA =
\begin{bmatrix}
M & A_{12} \\
A_{21} & A_{22}
\end{bmatrix}
\begin{bmatrix}
M^{-1} & 0 \\
0 & 0
\end{bmatrix}
\begin{bmatrix}
M & A_{12} \\
A_{21} & A_{22}
\end{bmatrix}
=
\begin{bmatrix}
I & 0 \\
A_{21} M^{-1} & 0
\end{bmatrix}
\begin{bmatrix}
M & A_{12} \\
A_{21} & A_{22}
\end{bmatrix}
=
\begin{bmatrix}
M & A_{12} \\
A_{21} & A_{21} M^{-1} A_{12}
\end{bmatrix}.
$$

We merely have to show that A21 M −1 A12 = A22 .


Finding a conditional inverse

This follows because r (A) is the size of M ; we can write all other
columns of A as linear combinations of the first r (A) columns. In
other words, there exists a matrix R such that

$$
\begin{bmatrix} A_{12} \\ A_{22} \end{bmatrix}
=
\begin{bmatrix} M \\ A_{21} \end{bmatrix} R
\;\Longrightarrow\;
A_{12} = M R
\;\Longrightarrow\;
R = M^{-1} A_{12}
\;\Longrightarrow\;
A_{22} = A_{21} R = A_{21} M^{-1} A_{12}.
$$


Finding a conditional inverse

Example. From the previous example,

$$
A =
\begin{bmatrix}
2 & 4 & 2 \\
1 & 0 & -1 \\
3 & 1 & -2
\end{bmatrix}.
$$

It can be seen that r (A) = 2, so we take the principal 2 × 2 minor

$$
M =
\begin{bmatrix}
2 & 4 \\
1 & 0
\end{bmatrix}.
$$


Finding a conditional inverse

Then

$$
(M^{-1})^T = \left( -\frac{1}{4}
\begin{bmatrix}
0 & -4 \\
-1 & 2
\end{bmatrix} \right)^T
=
\begin{bmatrix}
0 & 1/4 \\
1 & -1/2
\end{bmatrix}
$$

and

$$
A^c =
\begin{bmatrix}
0 & 1/4 & 0 \\
1 & -1/2 & 0 \\
0 & 0 & 0
\end{bmatrix}^T
=
\begin{bmatrix}
0 & 1 & 0 \\
1/4 & -1/2 & 0 \\
0 & 0 & 0
\end{bmatrix}.
$$

This is the conditional inverse A1 of the earlier example, so we
have seen that it works.

On the other hand, if we take the lower left 2 × 2 minor, following
the procedure gives us A2 . So this procedure can produce more
than one conditional inverse.
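The construction of Theorem 6.2 is straightforward to code. The following is
a minimal sketch (the function cond.inverse and its arguments are ours); it
is told which rows and columns contain a nonsingular r (A) × r (A) minor, and
it reuses the matrices A, A1 and A2 from the sketch above:

> cond.inverse <- function(A, rows, cols) {
+   B <- matrix(0, nrow(A), ncol(A))
+   B[rows, cols] <- t(solve(A[rows, cols]))   # invert the minor and transpose it
+   t(B)                                       # final transpose (step 3 of Theorem 6.2)
+ }
> all.equal(cond.inverse(A, 1:2, 1:2), A1)     # principal minor reproduces A1
[1] TRUE
> all.equal(cond.inverse(A, 2:3, 1:2), A2)     # lower left minor reproduces A2
[1] TRUE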


Conditional inverse properties

Let A be an n × p matrix of rank r , where n ≥ p ≥ r . Then
- r (A) = r (AAc ) = r (Ac A);
- (Ac )T = (AT )c ;
- Ac A, AAc , I − Ac A and I − AAc are idempotent;
- A = A(AT A)c (AT A) and AT = (AT A)(AT A)c AT .


Conditional inverse properties

We say that an expression involving a conditional inverse is unique
if it is the same no matter what conditional inverse we use.

- A(AT A)c AT is unique, symmetric, and idempotent;
- r (A(AT A)c AT ) = r ;
- I − A(AT A)c AT is unique, symmetric and idempotent;
- r (I − A(AT A)c AT ) = n − r .


Conditional inverse properties

Proof.

$$
[A(A^T A)^c A^T]^T = A[(A^T A)^c]^T A^T = A[(A^T A)^T]^c A^T = A(A^T A)^c A^T.
$$

$$
A(A^T A)^c A^T \, A(A^T A)^c A^T = \left[ A(A^T A)^c A^T A \right] (A^T A)^c A^T = A(A^T A)^c A^T.
$$

$$
r(A(A^T A)^c A^T) \ge r(A(A^T A)^c A^T A) = r(A) \ge r(A(A^T A)^c A^T).
$$


R Example

> library(MASS)
> A <- matrix(c(2,-6,3,1,6,4,-2,-1,0),3,3)
> det(A)
[1] 89
> A # non-singular
[,1] [,2] [,3]
[1,] 2 1 -2
[2,] -6 6 -1
[3,] 3 4 0


R Example

> Ac <- ginv(A)


> Ac
[,1] [,2] [,3]
[1,] 0.04494382 -0.08988764 0.1235955
[2,] -0.03370787 0.06741573 0.1573034
[3,] -0.47191011 -0.05617978 0.2022472
> solve(A)
[,1] [,2] [,3]
[1,] 0.04494382 -0.08988764 0.1235955
[2,] -0.03370787 0.06741573 0.1573034
[3,] -0.47191011 -0.05617978 0.2022472


R Example

> A <- matrix(c(2,-6,3,1,6,4,3,0,7),3,3)


> det(A)
[1] 0
> A # singular
[,1] [,2] [,3]
[1,] 2 1 3
[2,] -6 6 0
[3,] 3 4 7


R Example

> Ac <- ginv(A)


> Ac
[,1] [,2] [,3]
[1,] 0.025713835 -0.084240416 0.03659883
[2,] 0.009149708 0.080454330 0.04369774
[3,] 0.034863543 -0.003786086 0.08029658
> round(A %*% Ac %*% A, 5)
[,1] [,2] [,3]
[1,] 2 1 3
[2,] -6 6 0
[3,] 3 4 7


R Example

> Ac2 <- matrix(0,3,3)


> Ac2[1:2,1:2] <- t(solve(A[1:2,1:2]))
> Ac2 <- t(Ac2)
> Ac2
[,1] [,2] [,3]
[1,] 0.3333333 -0.05555556 0
[2,] 0.3333333 0.11111111 0
[3,] 0.0000000 0.00000000 0
> A %*% Ac2 %*% A
[,1] [,2] [,3]
[1,] 2 1 3
[2,] -6 6 0
[3,] 3 4 7


R Example

> Ac3 <- matrix(0,3,3)


> Ac3[2:3,1:2] <- t(solve(A[2:3,1:2]))
> Ac3 <- t(Ac3)
> Ac3
[,1] [,2] [,3]
[1,] 0 -0.09523810 0.1428571
[2,] 0 0.07142857 0.1428571
[3,] 0 0.00000000 0.0000000
> A %*% Ac3 %*% A
[,1] [,2] [,3]
[1,] 2 1 3
[2,] -6 6 0
[3,] 3 4 7


R Example

> library(Matrix)
> rankMatrix(A)[1]
[1] 2
> rankMatrix(Ac2 %*% A)[1]
[1] 2
> round(A %*% ginv(t(A) %*% A) %*% t(A) %*% A, 5)
[,1] [,2] [,3]
[1,] 2 1 3
[2,] -6 6 0
[3,] 3 4 7


R Example

> A %*% ginv(t(A) %*% A) %*% t(A)


[,1] [,2] [,3]
[1,] 0.16516801 -0.09938476 0.35778514
[2,] -0.09938476 0.98816848 0.04259347
[3,] 0.35778514 0.04259347 0.84666351
> AtAc2 <- matrix(0,3,3)
> AtAc2[1:2,1:2] <- solve((t(A) %*% A)[1:2,1:2])
> A %*% AtAc2 %*% t(A)
[,1] [,2] [,3]
[1,] 0.16516801 -0.09938476 0.35778514
[2,] -0.09938476 0.98816848 0.04259347
[3,] 0.35778514 0.04259347 0.84666351


Solving the normal equations

Let us now solve the normal equations

X T X b = X T y.

First, we must make sure that they have a solution!

Theorem 6.3
The system Ax = g is consistent if and only if the rank of the
augmented matrix [ A  g ] is equal to the rank of A.


Solving the normal equations

 
Proof. (⇐) Since r ([ A  g ]) = r (A), g must be a linear
combination of the columns of A.

Therefore there exist constants x1 , x2 , . . . , xp so that

x1 a1 + x2 a2 + . . . + xp ap = g

where ai is the i-th column of A.


Solving the normal equations

But if we put this into matrix notation and set

$$
x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_p \end{bmatrix},
$$

then this is exactly the system Ax = g.

Therefore the system is consistent.


Solving the normal equations

Theorem 6.4
In the general linear model y = X β + ε, the normal equations

XTXb = XTy
are consistent.


Solving the normal equations

Proof. It is obvious that r (X T X ) ≤ r ([ X T X   X T y ]), as
adding a column cannot decrease the number of linearly
independent columns.

However,

$$
r([\, X^T X \;\; X^T y \,]) = r(X^T [\, X \;\; y \,]) \le r(X^T) = r(X^T X).
$$

Theorem 6.3 now shows that the normal equations are consistent.


Solving the normal equations

Now that we know the normal equations always have a solution,
how can we find one?

Theorem 6.5
Let Ax = g be a consistent system. Then Ac g is a solution to the
system, where Ac is any conditional inverse for A.

Proof. Since Ax∗ = g for some x∗ ,

AAc g = AAc Ax∗ = Ax∗ = g.

Therefore, Ac g solves the system.


Solving the normal equations

From this theorem, we see that

b = (X T X )c X T y

solves the normal equations, for any conditional inverse.

However, in the less than full rank model, different conditional
inverses may result in different solutions.


Solving the normal equations

Example. Suppose we have a one-way classification model with
two classes and one sample from each class. The design matrix is

$$
X = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix}.
$$

Supposing that yT = [6, 8] we get

$$
X^T X = \begin{bmatrix} 2 & 1 & 1 \\ 1 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix},
\qquad
X^T y = \begin{bmatrix} 14 \\ 6 \\ 8 \end{bmatrix}.
$$

Solving the normal equations

The normal equations are

$$
\begin{bmatrix} 2 & 1 & 1 \\ 1 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix}
\begin{bmatrix} b_0 \\ b_1 \\ b_2 \end{bmatrix}
=
\begin{bmatrix} 14 \\ 6 \\ 8 \end{bmatrix}.
$$

Since the first column of X T X is the sum of the next two, X T X
is not of full rank. It is easy to see that r (X T X ) = 2.

Solving the normal equations

To find a conditional inverse of X T X , we apply Theorem 6.2,
using the nonsingular minor

$$
\begin{bmatrix} 2 & 1 \\ 1 & 1 \end{bmatrix}.
$$

This gives

$$
(X^T X)^c = \begin{bmatrix} 1 & -1 & 0 \\ -1 & 2 & 0 \\ 0 & 0 & 0 \end{bmatrix}
$$

and therefore

$$
b = (X^T X)^c X^T y =
\begin{bmatrix} 1 & -1 & 0 \\ -1 & 2 & 0 \\ 0 & 0 & 0 \end{bmatrix}
\begin{bmatrix} 14 \\ 6 \\ 8 \end{bmatrix}
=
\begin{bmatrix} 8 \\ -2 \\ 0 \end{bmatrix}.
$$

Solving the normal equations


 
However, using the minor

$$
\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}
$$

gives the conditional inverse

$$
(X^T X)^c = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix},
$$

which gives the solution

$$
b = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 14 \\ 6 \\ 8 \end{bmatrix}
=
\begin{bmatrix} 0 \\ 6 \\ 8 \end{bmatrix}.
$$

Both these solutions solve the normal equations, and are equally
valid! This is the problem with the less than full rank model.
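A minimal R sketch of this two-class example (the object names are ours)
confirms that both vectors solve the normal equations:

> X <- matrix(c(1,1, 1,0, 0,1), 2, 3)    # design matrix of the two-class example
> y <- c(6, 8)
> XtX <- t(X) %*% X
> Xty <- t(X) %*% y
> b1 <- rbind(c(1,-1,0), c(-1,2,0), c(0,0,0)) %*% Xty    # first conditional inverse: (8, -2, 0)
> b2 <- diag(c(0,1,1)) %*% Xty                           # second conditional inverse: (0, 6, 8)
> XtX %*% b1 - Xty    # zero, so b1 solves the normal equations
> XtX %*% b2 - Xty    # zero, so b2 also solves them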


Carbon removal example

Example. Consider the earlier carbon removal example. We have

$$
X^T X =
\begin{bmatrix}
9 & 3 & 3 & 3 \\
3 & 3 & 0 & 0 \\
3 & 0 & 3 & 0 \\
3 & 0 & 0 & 3
\end{bmatrix}
$$

so a conditional inverse is

$$
(X^T X)^c =
\begin{bmatrix}
0 & 0 & 0 & 0 \\
0 & 1/3 & 0 & 0 \\
0 & 0 & 1/3 & 0 \\
0 & 0 & 0 & 1/3
\end{bmatrix}.
$$

Carbon removal example

We can also calculate

$$
X^T y =
\begin{bmatrix}
303.3 \\ 105 \\ 117.9 \\ 80.4
\end{bmatrix}.
$$

Using the conditional inverse above gives us a solution to the
normal equations:

$$
b = (X^T X)^c X^T y =
\begin{bmatrix}
0 \\ 35 \\ 39.3 \\ 26.8
\end{bmatrix}.
$$

Solving the normal equations

If the model is less than full rank, the normal equations have an
infinite number of solutions.

Theorem 6.6
Let Ax = g be a consistent system. Then

x = Ac g + (I − Ac A)z

solves the system, where z is an arbitrary p × 1 vector.


Solving the normal equations

Proof. We know that Ac g solves the system, so

Ax = A [Ac g + (I − Ac A)z]
= AAc g + (A − AAc A)z
= g + (A − A)z = g.

Thus, for the normal equations, any vector of the form

b = (X T X )c X T y + [I − (X T X )c X T X ]z

satisfies the equations.


Solving the normal equations

Example. In the two-class example above, one solution to the
normal equations was (X T X )c X T y = [ 8 −2 0 ]T .

Using the same conditional inverse we have

$$
(X^T X)^c X^T X =
\begin{bmatrix} 1 & -1 & 0 \\ -1 & 2 & 0 \\ 0 & 0 & 0 \end{bmatrix}
\begin{bmatrix} 2 & 1 & 1 \\ 1 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix}
=
\begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & -1 \\ 0 & 0 & 0 \end{bmatrix}.
$$

Solving the normal equations

Then another solution to the normal equations is

$$
\begin{aligned}
b &= (X^T X)^c X^T y + [I - (X^T X)^c X^T X]z \\
&= \begin{bmatrix} 8 \\ -2 \\ 0 \end{bmatrix}
+ \left( \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
- \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & -1 \\ 0 & 0 & 0 \end{bmatrix} \right)
\begin{bmatrix} z_1 \\ z_2 \\ z_3 \end{bmatrix}
= \begin{bmatrix} 8 - z_3 \\ -2 + z_3 \\ z_3 \end{bmatrix}
\end{aligned}
$$

for arbitrary z3 . For example, [ 7 −1 1 ]T is a solution.

Solving the normal equations

The converse of the above theorem is also true: all solutions to the
system can be expressed in this form.

Theorem 6.7
Let Ax = g be a consistent system and let x0 be any solution to
the system. Then for any Ac ,

x0 = Ac g + (I − Ac A)z

where z = x0 .


Solving the normal equations

Proof. Since x0 solves the system, we have

Ac g + (I − Ac A)z = Ac g + (I − Ac A)x0
= Ac g + x0 − Ac Ax0
= Ac g + x0 − Ac g = x0 .

For the normal equations, this means that any solution can be
expressed as

b = (X T X )c X T y + [I − (X T X )c X T X ]z

for any conditional inverse (X T X )c , and some z.


Solving the normal equations


Example. In the two-class example, we found the solution

$$
b_1 = \begin{bmatrix} 8 \\ -2 \\ 0 \end{bmatrix}
$$

using our original conditional inverse.

But we also noted that the conditional inverse

$$
(X^T X)^c_2 = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
$$

produces the solution

$$
b_2 = \begin{bmatrix} 0 \\ 6 \\ 8 \end{bmatrix}.
$$

Solving the normal equations

Using the theorem, the first solution can be written in terms of the
second solution:

$$
\begin{aligned}
b_1 &= (X^T X)^c_2 X^T y + (I - (X^T X)^c_2 X^T X)z \\
&= \begin{bmatrix} 0 \\ 6 \\ 8 \end{bmatrix}
+ \left( \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
- \begin{bmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 2 & 1 & 1 \\ 1 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix} \right)
\begin{bmatrix} 8 \\ -2 \\ 0 \end{bmatrix} \\
&= \begin{bmatrix} 0 \\ 6 \\ 8 \end{bmatrix}
+ \begin{bmatrix} 1 & 0 & 0 \\ -1 & 0 & 0 \\ -1 & 0 & 0 \end{bmatrix}
\begin{bmatrix} 8 \\ -2 \\ 0 \end{bmatrix}
= \begin{bmatrix} 8 \\ -2 \\ 0 \end{bmatrix}.
\end{aligned}
$$

Exam marks example

We compare the marks of students in 3 different mathematics
classes. There is another factor (IQ), but we ignore this for the
time being.
> maths <- read.csv("../data/maths.csv")
> str(maths)
'data.frame': 30 obs. of 5 variables:
$ X : int 1 2 3 4 5 6 7 8 9 10 ...
$ maths.y: int 81 84 81 79 78 79 81 85 72 79 ...
$ iq : int 99 103 108 109 96 104 96 105 94 91 ...
$ class : int 1 1 1 1 1 1 1 1 1 1 ...
$ class.f: int 1 1 1 1 1 1 1 1 1 1 ...
> maths$class.f <- factor(maths$class.f)


Exam marks example


> plot(maths$class, maths$maths.y)
[Scatter plot of maths$maths.y against maths$class]


Exam marks example


> plot(maths$class.f, maths$maths.y)
[Box plots of maths$maths.y for each of the three classes]


Exam marks example

> (y <- maths$maths.y)


[1] 81 84 81 79 78 79 81 85 72 79 85 78 93 80 83 95 90 89
[26] 91 88 93 90 78
> n <- dim(maths)[1]
> k <- length(levels(maths$class.f))
> X <- matrix(0,n,k+1)
> X[,1] <- 1
> X[maths$class.f==1,2] <- 1
> X[maths$class.f==2,3] <- 1
> X[maths$class.f==3,4] <- 1


Exam marks example


> X
[,1] [,2] [,3] [,4]
[1,] 1 1 0 0
[2,] 1 1 0 0
[3,] 1 1 0 0
[4,] 1 1 0 0
[5,] 1 1 0 0
[6,] 1 1 0 0
[7,] 1 1 0 0
[8,] 1 1 0 0
[9,] 1 1 0 0
[10,] 1 1 0 0
[11,] 1 0 1 0
[12,] 1 0 1 0
[13,] 1 0 1 0
[14,] 1 0 1 0
[15,] 1 0 1 0
[16,] 1 0 1 0
[17,] 1 0 1 0
[18,] 1 0 1 0
[19,] 1 0 1 0
[20,] 1 0 1 0
[21,] 1 0 0 1

Exam marks example: reparametrisation

> Xre <- X[,-1]


> (b <- solve(t(Xre) %*% Xre, t(Xre) %*% y))
[,1]
[1,] 79.9
[2,] 86.5
[3,] 89.4


Exam marks example: reparametrisation


> modelre <- lm(y ~ 0 + X[,2] + X[,3] + X[,4])
> summary(modelre)
Call:
lm(formula = y ~ 0 + X[, 2] + X[, 3] + X[, 4])

Residuals:
Min 1Q Median 3Q Max
-14.40 -1.80 0.85 3.60 10.50

Coefficients:
Estimate Std. Error t value Pr(>|t|)
X[, 2] 79.900 2.053 38.92 <2e-16 ***
X[, 3] 86.500 2.053 42.14 <2e-16 ***
X[, 4] 89.400 2.053 43.55 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 6.492 on 27 degrees of freedom


Multiple R-squared: 0.9948, Adjusted R-squared: 0.9942
F-statistic:  1729 on 3 and 27 DF,  p-value: < 2.2e-16

Exam marks example


Let’s look at the normal equations.
> t(X) %*% X
[,1] [,2] [,3] [,4]
[1,] 30 10 10 10
[2,] 10 10 0 0
[3,] 10 0 10 0
[4,] 10 0 0 10
> t(X) %*% y
[,1]
[1,] 2558
[2,] 799
[3,] 865
[4,] 894


Exam marks example


> XtXc <- matrix(0,4,4)
> XtXc[2:4,2:4] <- solve((t(X) %*% X)[2:4,2:4])
> (b <- XtXc %*% t(X) %*% y)
[,1]
[1,] 0.0
[2,] 79.9
[3,] 86.5
[4,] 89.4
> round(t(X) %*% X %*% b - t(X) %*% y, 3)
[,1]
[1,] 0
[2,] 0
[3,] 0
[4,] 0

Exam marks example

> (b2 <- ginv(t(X) %*% X) %*% t(X) %*% y)


[,1]
[1,] 63.95
[2,] 15.95
[3,] 22.55
[4,] 25.45
> round(t(X) %*% X %*% b2 - t(X) %*% y, 3)
[,1]
[1,] 0
[2,] 0
[3,] 0
[4,] 0


Exam marks example


> I4 <- diag(4)
> z <- c(2,8,-2,1)
> (b3 <- b + (I4 - XtXc %*% t(X) %*% X) %*% z)
[,1]
[1,] 2.0
[2,] 77.9
[3,] 84.5
[4,] 87.4
> round(t(X) %*% X %*% b3 - t(X) %*% y, 3)
[,1]
[1,] 0
[2,] 0
[3,] 0
[4,] 0

Exam marks example

> b + (I4 - XtXc %*% t(X) %*% X) %*% b3


[,1]
[1,] 2.0
[2,] 77.9
[3,] 84.5
[4,] 87.4
> b3
[,1]
[1,] 2.0
[2,] 77.9
[3,] 84.5
[4,] 87.4


Estimability

Now we know how to solve the normal equations; furthermore, we
know how to find all solutions for them.

But which solution(s) do we want?

Or rather, which solutions can we find?


Estimability

Some quantities do not change no matter what solutions we use
for the normal equations. We call these quantities estimable.

A trivial example is the responses y.

Definition 6.8
In the general linear model y = X β + ε, a function tT β is said to
be estimable if there exists a vector c such that E [cT y] = tT β.

In other words, a quantity is estimable if there is a linear unbiased
estimator for it.


Estimability

Theorem 6.9
In the general linear model y = X β + ε, tT β is estimable if and
only if there is a solution to the linear system X T X z = t.

Proof. (⇐) Let z0 be a solution to X T X z = t and put c = X z0 .
Then

$$
E[c^T y] = E[z_0^T X^T y] = z_0^T X^T E[y] = z_0^T X^T X \beta = t^T \beta,
$$

so tT β is estimable.


Estimability

Example. Consider our two-class example. We had

$$
X = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix},
\qquad
X^T X = \begin{bmatrix} 2 & 1 & 1 \\ 1 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix}.
$$

Consider the combination of parameters β1 − β2 . This corresponds
to tT β where

$$
t = \begin{bmatrix} 0 \\ 1 \\ -1 \end{bmatrix}.
$$

Estimability

We look for a solution to the system

$$
\begin{bmatrix} 2 & 1 & 1 \\ 1 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix}
\begin{bmatrix} z_1 \\ z_2 \\ z_3 \end{bmatrix}
=
\begin{bmatrix} 0 \\ 1 \\ -1 \end{bmatrix}.
$$

This system has solution z1 = 0, z2 = 1, z3 = −1, so β1 − β2 is
estimable.

Estimability

Theorem 6.10
In the general linear model y = X β + ε, tT β is estimable if and
only if
tT (X T X )c X T X = tT ,
for some (and thus all) conditional inverse of (X T X ).

Proof. (⇐) Assume that tT (X T X )c X T X = tT , so

X T X ((X T X )c )T t = X T X (X T X )c t = t.

This means that (X T X )c t is a solution to the system X T X z = t,


and Theorem 6.9 implies that tT β is estimable.


Estimability

(⇒) Suppose that tT β is estimable. By Theorem 6.9, there exists
a solution to the system X T X z = t.

We know that a solution is z = (X T X )c t. (Note that the
conditional inverse is arbitrary.)

In other words,

X T X (X T X )c t = t

and by taking transposes, we see that this gives the required
condition.


Estimability

Example. Consider the previous example. Let us take the
conditional inverse

$$
(X^T X)^c = \begin{bmatrix} 1 & -1 & 0 \\ -1 & 2 & 0 \\ 0 & 0 & 0 \end{bmatrix}
$$

and consider again the quantity β1 − β2 , which corresponds to
t = [ 0 1 −1 ]T .

Estimability

Then

$$
t^T (X^T X)^c (X^T X) =
\begin{bmatrix} 0 & 1 & -1 \end{bmatrix}
\begin{bmatrix} 1 & -1 & 0 \\ -1 & 2 & 0 \\ 0 & 0 & 0 \end{bmatrix}
\begin{bmatrix} 2 & 1 & 1 \\ 1 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix}
=
\begin{bmatrix} 0 & 1 & -1 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & -1 \\ 0 & 0 & 0 \end{bmatrix}
=
\begin{bmatrix} 0 & 1 & -1 \end{bmatrix}
= t^T,
$$

so again we see that β1 − β2 is estimable.

Estimability
On the other hand, suppose we take t = [ 1 0 0 ]T so that
tT β = β0 .

Then we have

$$
t^T (X^T X)^c (X^T X) =
\begin{bmatrix} 1 & 0 & 0 \end{bmatrix}
\begin{bmatrix} 1 & -1 & 0 \\ -1 & 2 & 0 \\ 0 & 0 & 0 \end{bmatrix}
\begin{bmatrix} 2 & 1 & 1 \\ 1 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix}
=
\begin{bmatrix} 1 & 0 & 0 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & -1 \\ 0 & 0 & 0 \end{bmatrix}
=
\begin{bmatrix} 1 & 0 & 1 \end{bmatrix}
\ne t^T,
$$

so β0 is not estimable.


Estimability

Example. We return to the carbon removal example. We are
interested in seeing if the three carbon removal treatments have
(significantly) different means.

To test this, we look at the quantities τ1 − τ2 and τ1 − τ3 .

If both of these are (close to) 0, then the treatments are not
significantly different.


Estimability

We have

$$
X^T X = 3
\begin{bmatrix}
3 & 1 & 1 & 1 \\
1 & 1 & 0 & 0 \\
1 & 0 & 1 & 0 \\
1 & 0 & 0 & 1
\end{bmatrix},
\qquad
(X^T X)^c X^T X =
\begin{bmatrix}
0 & 0 & 0 & 0 \\
1 & 1 & 0 & 0 \\
1 & 0 & 1 & 0 \\
1 & 0 & 0 & 1
\end{bmatrix},
$$

and the coefficient vectors

$$
t_1 = \begin{bmatrix} 0 \\ 1 \\ -1 \\ 0 \end{bmatrix},
\qquad
t_2 = \begin{bmatrix} 0 \\ 1 \\ 0 \\ -1 \end{bmatrix}.
$$

Estimability

 
$$
t_1^T (X^T X)^c X^T X =
\begin{bmatrix} 0 & 1 & -1 & 0 \end{bmatrix}
\begin{bmatrix}
0 & 0 & 0 & 0 \\
1 & 1 & 0 & 0 \\
1 & 0 & 1 & 0 \\
1 & 0 & 0 & 1
\end{bmatrix}
=
\begin{bmatrix} 0 & 1 & -1 & 0 \end{bmatrix}
= t_1^T,
$$

so $t_1^T \beta$ = τ1 − τ2 is estimable.

$$
t_2^T (X^T X)^c X^T X =
\begin{bmatrix} 0 & 1 & 0 & -1 \end{bmatrix}
\begin{bmatrix}
0 & 0 & 0 & 0 \\
1 & 1 & 0 & 0 \\
1 & 0 & 1 & 0 \\
1 & 0 & 0 & 1
\end{bmatrix}
=
\begin{bmatrix} 0 & 1 & 0 & -1 \end{bmatrix}
= t_2^T,
$$

so $t_2^T \beta$ = τ1 − τ3 is also estimable.
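The criterion of Theorem 6.10 is easy to check numerically. A minimal
sketch for the carbon removal design (the object names are ours; ginv()
from the MASS package supplies a conditional inverse):

> library(MASS)
> X <- cbind(1, diag(3)[rep(1:3, each = 3), ])    # the 9 x 4 carbon removal design matrix
> XtX <- t(X) %*% X
> XtXc <- ginv(XtX)
> t1 <- c(0, 1, -1, 0)                            # tau1 - tau2
> round(t1 %*% XtXc %*% XtX, 5)                   # equals t1, so tau1 - tau2 is estimable
> t0 <- c(1, 0, 0, 0)                             # mu on its own
> round(t0 %*% XtXc %*% XtX, 5)                   # differs from t0, so mu is not estimable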


Estimability

Next we will prove that no matter what conditional inverse we use,


we will still generate the same estimate for an estimable quantity.

Theorem 6.11 (A Gauss-Markov Theorem)


In the general linear model y = X β + ε, suppose tT β is
estimable. Then the best linear unbiased estimator (BLUE) for
tT β is zT X T y, where z is a solution to the system X T X z = t.
Furthermore, this estimate is the same for any solution of the
system, and can be written tT b, where b is any solution to the
normal equations.


Estimability

Proof. We first show unbiasedness of the estimator.

E [zT X T y] = zT X T E [y]
= zT X T X β
= tT β.

BLUEness is more involved, but is similar to the proof of Theorem
4.4.

Now suppose we have two solutions to the system X T X z = t,
called z0 and z1 . Let b be any solution to the normal equations:

X T X b = X T y.


Estimability

The best linear unbiased estimator of tT β is

$$
z_0^T X^T y = z_0^T X^T X b = (X^T X z_0)^T b = t^T b.
$$

Since b is an arbitrary solution, this is the same no matter what
solution we choose.

Similarly,

$$
z_1^T X^T y = t^T b = z_0^T X^T y.
$$

Thus the best linear unbiased estimator is unique, and equal to
tT b.


Estimability

Example. Let’s look again at the two-class example. We know
that β1 − β2 is estimable. We also know that solutions to the
normal equations include

$$
b = \begin{bmatrix} 8 \\ -2 \\ 0 \end{bmatrix},
\qquad
b' = \begin{bmatrix} 0 \\ 6 \\ 8 \end{bmatrix}.
$$

To estimate β1 − β2 , we can use

$$
t^T b = \begin{bmatrix} 0 & 1 & -1 \end{bmatrix}
\begin{bmatrix} 8 \\ -2 \\ 0 \end{bmatrix} = -2.
$$

Estimability

However, from Theorem 6.11, we can also use

$$
t^T b' = \begin{bmatrix} 0 & 1 & -1 \end{bmatrix}
\begin{bmatrix} 0 \\ 6 \\ 8 \end{bmatrix} = -2.
$$

This estimate is the same as the previous one, which follows from
the theorem: any solution to the normal equation, using any
conditional inverse, will produce exactly the same estimate.

In other words, the estimator is unique.


Estimability
Example. Back to the carbon removal example. We have shown
that τ1 − τ2 and τ1 − τ3 are estimable. We estimate them by

$$
t_1^T b = \begin{bmatrix} 0 & 1 & -1 & 0 \end{bmatrix}
\begin{bmatrix} 0 \\ 35 \\ 39.3 \\ 26.8 \end{bmatrix} = -4.3
$$

and

$$
t_2^T b = \begin{bmatrix} 0 & 1 & 0 & -1 \end{bmatrix}
\begin{bmatrix} 0 \\ 35 \\ 39.3 \\ 26.8 \end{bmatrix} = 8.2
$$

respectively.

Again, no matter what conditional inverse we use, these estimates
remain the same.

Estimability theorems

Theorem 6.12
In the linear model y = X β + ε, elements of X β are estimable.

Proof. We know that E [y] = X β. Now take ei to be the i-th
standard basis vector.

We have

$$
(X\beta)_i = e_i^T X\beta = e_i^T E[y] = E[e_i^T y]
$$

and so the i-th element of X β is estimable.


Estimability theorems

Example. Consider the carbon removal example. We have

$$
X =
\begin{bmatrix}
1 & 1 & 0 & 0 \\
1 & 1 & 0 & 0 \\
1 & 1 & 0 & 0 \\
1 & 0 & 1 & 0 \\
1 & 0 & 1 & 0 \\
1 & 0 & 1 & 0 \\
1 & 0 & 0 & 1 \\
1 & 0 & 0 & 1 \\
1 & 0 & 0 & 1
\end{bmatrix},
\qquad
\beta =
\begin{bmatrix}
\mu \\ \tau_1 \\ \tau_2 \\ \tau_3
\end{bmatrix}.
$$

We know that we cannot estimate the parameter vector β, because
it is not uniquely determined.

Estimability theorems

However, the real quantities of interest are the mean responses
from the three treatments. These are:

$$
\mu + \tau_1 = \begin{bmatrix} 1 & 1 & 0 & 0 \end{bmatrix} \beta,
\qquad
\mu + \tau_2 = \begin{bmatrix} 1 & 0 & 1 & 0 \end{bmatrix} \beta,
\qquad
\mu + \tau_3 = \begin{bmatrix} 1 & 0 & 0 & 1 \end{bmatrix} \beta,
$$

and each of these are elements of X β. Therefore, they are
estimable.

In a one-way classification model with any number of levels, µ + τi
is always estimable.

Estimability theorems

Theorem 6.13
Let $t_1^T \beta, t_2^T \beta, \ldots, t_k^T \beta$ be estimable functions, and let

$$
z = a_1 t_1^T \beta + a_2 t_2^T \beta + \ldots + a_k t_k^T \beta.
$$

Then z is estimable, and the best linear unbiased estimator for z is

$$
a_1 t_1^T b + a_2 t_2^T b + \ldots + a_k t_k^T b.
$$


Estimability theorems

Proof. By definition,

$$
z = (a_1 t_1 + a_2 t_2 + \ldots + a_k t_k)^T \beta.
$$

From Theorem 6.10,

$$
\begin{aligned}
(a_1 t_1 + a_2 t_2 + \ldots + a_k t_k)^T (X^T X)^c X^T X
&= a_1 t_1^T (X^T X)^c X^T X + a_2 t_2^T (X^T X)^c X^T X + \ldots + a_k t_k^T (X^T X)^c X^T X \\
&= a_1 t_1^T + a_2 t_2^T + \ldots + a_k t_k^T \\
&= (a_1 t_1 + a_2 t_2 + \ldots + a_k t_k)^T.
\end{aligned}
$$

Therefore z is estimable, with BLUE

$$
(a_1 t_1 + a_2 t_2 + \ldots + a_k t_k)^T b.
$$


Estimability theorems

Of particular interest in many studies is the way different
populations compare against each other. To attach a numerical
value to these comparisons, we form linear combinations

$$
a_1 \tau_1 + a_2 \tau_2 + \ldots + a_k \tau_k,
\qquad \text{where } \sum_{i=1}^{k} a_i = 0.
$$

These treatment contrasts wipe out the effect of the overall mean
response, to describe the differences between populations.


Estimability theorems

In a one-way classification model, any treatment contrast is
estimable.

If

$$
z = a_1 \tau_1 + a_2 \tau_2 + \ldots + a_k \tau_k
$$

is a treatment contrast, then

$$
z = \left( \sum_{i=1}^{k} a_i \right) \mu + a_1 \tau_1 + a_2 \tau_2 + \ldots + a_k \tau_k
= a_1 (\mu + \tau_1) + a_2 (\mu + \tau_2) + \ldots + a_k (\mu + \tau_k)
$$

is a linear combination of the estimable functions µ + τi , and is
therefore estimable.


Estimability theorems

Of particular interest among treatment contrasts is the contrast of
the form τi − τj , for some i ≠ j . This is because

τi − τj = (µ + τi ) − (µ + τj )

is the difference between the mean responses in populations i and
j.

We would expect to estimate this contrast by the corresponding
difference in sample means, ȳi − ȳj . We can show using the theory
we have developed that this is in fact the case.


Estimability theorems

Example. We do this for k = 3 and the contrast τ1 − τ2 . Our
matrices are

$$
y = \begin{bmatrix}
y_{11} \\ \vdots \\ y_{1n_1} \\ y_{21} \\ \vdots \\ y_{2n_2} \\ y_{31} \\ \vdots \\ y_{3n_3}
\end{bmatrix},
\qquad
X = \begin{bmatrix}
1 & 1 & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots \\
1 & 1 & 0 & 0 \\
1 & 0 & 1 & 0 \\
\vdots & \vdots & \vdots & \vdots \\
1 & 0 & 1 & 0 \\
1 & 0 & 0 & 1 \\
\vdots & \vdots & \vdots & \vdots \\
1 & 0 & 0 & 1
\end{bmatrix},
\qquad
\beta = \begin{bmatrix}
\mu \\ \tau_1 \\ \tau_2 \\ \tau_3
\end{bmatrix}.
$$


Estimability theorems

Direct multiplication gives

$$
X^T y = \begin{bmatrix}
\sum_{i=1}^{3} \sum_{j=1}^{n_i} y_{ij} \\
\sum_{j} y_{1j} \\
\sum_{j} y_{2j} \\
\sum_{j} y_{3j}
\end{bmatrix},
\qquad
X^T X = \begin{bmatrix}
n & n_1 & n_2 & n_3 \\
n_1 & n_1 & 0 & 0 \\
n_2 & 0 & n_2 & 0 \\
n_3 & 0 & 0 & n_3
\end{bmatrix}.
$$

Using the conditional inverse algorithm on the lower right corner of
X T X gives

$$
(X^T X)^c = \begin{bmatrix}
0 & 0 & 0 & 0 \\
0 & 1/n_1 & 0 & 0 \\
0 & 0 & 1/n_2 & 0 \\
0 & 0 & 0 & 1/n_3
\end{bmatrix}.
$$


Estimability theorems
Therefore a solution to the normal equations is

$$
b = (X^T X)^c X^T y = \begin{bmatrix} 0 \\ \bar{y}_1 \\ \bar{y}_2 \\ \bar{y}_3 \end{bmatrix}.
$$

We have τ1 − τ2 = [ 0 1 −1 0 ] β, so the best linear unbiased
estimator for τ1 − τ2 is

$$
\begin{bmatrix} 0 & 1 & -1 & 0 \end{bmatrix}
\begin{bmatrix} 0 \\ \bar{y}_1 \\ \bar{y}_2 \\ \bar{y}_3 \end{bmatrix}
= \bar{y}_1 - \bar{y}_2.
$$

If we took any conditional inverse, we would get the same result.


Exam marks example


We return to the maths dataset. Recall that b, b2 and b3 are all
solutions to the normal equations.
> (tt <- c(0,1,-1,0))
[1] 0 1 -1 0
> round(tt %*% XtXc %*% t(X) %*% X, 5) # estimable
[,1] [,2] [,3] [,4]
[1,] 0 1 -1 0
> (tt2 <- c(1,1,1,1))
[1] 1 1 1 1
> tt2 %*% XtXc %*% t(X) %*% X # not estimable
[,1] [,2] [,3] [,4]
[1,] 3 1 1 1

Exam marks example

> tt %*% b
[,1]
[1,] -6.6
> tt %*% b2
[,1]
[1,] -6.6
> tt %*% b3
[,1]
[1,] -6.6
> mean(maths$maths.y[maths$class.f==1]) -
+ mean(maths$maths.y[maths$class.f==2])
[1] -6.6

Exam marks example

> tt2 %*% b


[,1]
[1,] 255.8
> tt2 %*% b2
[,1]
[1,] 127.9
> tt2 %*% b3
[,1]
[1,] 251.8


Exam marks example

For the less than full rank model, R uses contrasts for its tests.
The two main contrast sets are contr.treatment and
contr.sum. For the one-way classification model:

Label          contr.treatment   contr.sum
Intercept      µ1                µ̄
factor1                          µ1 − µ̄
factor2        µ2 − µ1           µ2 − µ̄
factor3        µ3 − µ1           µ3 − µ̄
...            ...               ...
factor(k-1)    µk−1 − µ1         µk−1 − µ̄
factor(k)      µk − µ1


Exam marks example

In terms of our parameters:

Label          contr.treatment   contr.sum
Intercept      µ + τ1            µ + (1/k) Σ τi
factor1                          τ1 − (1/k) Σ τi
factor2        τ2 − τ1           τ2 − (1/k) Σ τi
factor3        τ3 − τ1           τ3 − (1/k) Σ τi
...            ...               ...
factor(k-1)    τk−1 − τ1         τk−1 − (1/k) Σ τi
factor(k)      τk − τ1


Exam marks example


> contrasts(maths$class.f) <- contr.treatment(k)
> model <- lm(maths.y ~ class.f, data = maths)
> summary(model)
Call:
lm(formula = maths.y ~ class.f, data = maths)

Residuals:
Min 1Q Median 3Q Max
-14.40 -1.80 0.85 3.60 10.50

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 79.900 2.053 38.922 < 2e-16 ***
class.f2 6.600 2.903 2.273 0.03117 *
class.f3 9.500 2.903 3.272 0.00292 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 6.492 on 27 degrees of freedom


Multiple R-squared:  0.2941,	Adjusted R-squared:  0.2418

Exam marks example


> contrasts(maths$class.f) <- contr.sum(k)
> model2 <- lm(maths.y ~ class.f, data = maths)
> summary(model2)
Call:
lm(formula = maths.y ~ class.f, data = maths)

Residuals:
Min 1Q Median 3Q Max
-14.40 -1.80 0.85 3.60 10.50

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 85.267 1.185 71.943 < 2e-16 ***
class.f1 -5.367 1.676 -3.202 0.00348 **
class.f2 1.233 1.676 0.736 0.46818
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 6.492 on 27 degrees of freedom


Multiple R-squared:  0.2941,	Adjusted R-squared:  0.2418
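Whichever contrast set is used, the fitted class means agree. A short sketch
rearranging the printed coefficients of model and model2 above:

> # contr.treatment: class 1 is the baseline, so mean_j = Intercept + class.fj
> coef(model)[1] + c(0, coef(model)[2:3])
> # contr.sum: the last effect is minus the sum of the displayed ones
> coef(model2)[1] + c(coef(model2)[2:3], -sum(coef(model2)[2:3]))
> # both give 79.9, 86.5 and 89.4, the class means found earlier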

Exam marks example


> plot(model, which=1)

[Residuals vs Fitted plot for lm(maths.y ~ class.f); observations 20, 22 and 30 are labelled]


Exam marks example


> plot(model, which=2)

[Normal Q-Q plot of standardized residuals for lm(maths.y ~ class.f); observations 20, 22 and 30 are labelled]


Exam marks example


> plot(model, which=3)

[Scale-Location plot for lm(maths.y ~ class.f); observations 20, 22 and 30 are labelled]


Exam marks example


> plot(model, which=5)
[Residuals vs Factor Levels plot (constant leverage) for lm(maths.y ~ class.f); observations 20, 22 and 30 are labelled]


Estimating σ 2 in the less than full rank model

In the full rank model, we estimated σ 2 by

$$
s^2 = \frac{SS_{Res}}{n - p},
$$

where n is the sample size, p is the number of parameters, and
SSRes is the sum of squares of the residuals:

SSRes = (y − X b)T (y − X b) = yT [I − X (X T X )−1 X T ]y.


Estimating σ 2 in the less than full rank model

For the less than full rank model we can still define the residual
sum of squares as

SSRes = (y − X b)T (y − X b),

where b is any solution to the normal equations.

Although b can vary, X b will not, because X β is estimable.
Therefore SSRes is invariant to the choice of b.


Estimating σ 2 in the less than full rank model

Theorem 6.14

SSRes = yT [I − X (X T X )c X T ]y.

Proof. Let b = (X T X )c X T y. Then

SSRes = (yT − bT X T )(y − X b)
      = yT y − 2yT X b + bT X T X b
      = yT y − 2yT X (X T X )c X T y + yT X (X T X )c X T X (X T X )c X T y
      = yT y − 2yT X (X T X )c X T y + yT X (X T X )c X T y
      = yT [I − X (X T X )c X T ]y.


Estimating σ 2 in the less than full rank model

How do we find an estimator for σ 2 ?

Let’s consider SSRes again. Take H = X (X T X )c X T and
remember that HX = X .

E [SSRes ] = E [yT (I − H )y]
           = tr (I − H )σ 2 + (X β)T (I − H )X β
           = tr (I − H )σ 2 + β T X T X β − β T X T HX β
           = tr (I − H )σ 2 + β T X T X β − β T X T X β
           = tr (I − H )σ 2 .


Estimating σ 2 in the less than full rank model

Since I − H is symmetric and idempotent, we have

E [SSRes ] = r (I − H )σ 2 = (n − r )σ 2 ,

where r = r (X ), the rank of X .

Theorem 6.15
In the general linear model y = X β + ε, suppose X has rank r
and ε has mean 0 and variance σ 2 I . Then an unbiased estimator
for σ 2 is

$$
\frac{SS_{Res}}{n - r}.
$$


Estimating σ 2 in the less than full rank model


Example. We return to the carbon removal example.
> y <- c(34.6,35.1,35.3,38.8,39.0,40.1,26.7,26.7,27.0)
> X <- matrix(c(rep(1,9),rep(0,27)),9,4)
> X[1:3,2] <- 1
> X[4:6,3] <- 1
> X[7:9,4] <- 1
> X
[,1] [,2] [,3] [,4]
[1,] 1 1 0 0
[2,] 1 1 0 0
[3,] 1 1 0 0
[4,] 1 0 1 0
[5,] 1 0 1 0
[6,] 1 0 1 0
[7,] 1 0 0 1
[8,] 1 0 0 1
[9,] 1 0 0 1

Estimating σ 2 in the less than full rank model

> (b <- ginv(t(X)%*%X)%*%t(X)%*%y)


[,1]
[1,] 25.275
[2,] 9.725
[3,] 14.025
[4,] 1.525
> e <- y - X%*%b
> (SSRes <- sum(e^2))
[1] 1.3
> (s2 <- SSRes/(9-3))
[1] 0.2166667


Exam marks example

> library(Matrix)
> (SSRes <- sum((y-X%*%b)^2))
[1] 1137.8
> sum(y^2) - t(y) %*% X %*% XtXc %*% t(X) %*% y
[,1]
[1,] 1137.8
> (s2 <- SSRes/(n - rankMatrix(X)[1]))
[1] 42.14074


Exam marks example

> deviance(model)
[1] 1137.8
> deviance(model)/model$df.residual
[1] 42.14074


Interval estimation in the less than full rank model

We can find point estimates for estimable quantities. The next step is to try and find confidence intervals for them.

For the Gauss-Markov theorem we only required that ε has mean 0 and variance σ 2 I . However, to find confidence intervals, we need some idea of the distribution of the variables, so we suppose that ε ∼ MVN (0, σ 2 I ).


Interval estimation in the less than full rank model

Recall that in the full rank model, we generated confidence intervals by finding a t-distributed quantity, created by dividing a standard normal variable by the square root of an independent χ2 variable divided by its degrees of freedom.

The χ2 variable was


SSRes / σ 2 ,
which had n − p degrees of freedom.

The σ 2 term was not known, but cancelled out another σ 2 term in
the numerator to leave us with something that we could calculate.

We can proceed in a similar manner for the less than full rank
model.


Interval estimation in the less than full rank model

Theorem 6.16
In the general linear model y = X β + ε, assume
ε ∼ MVN (0, σ 2 I ). Then

(n − r )s 2 / σ 2 = SSRes / σ 2
has a χ2 distribution with n − r degrees of freedom.

Theorem 6.17
In the general linear model y = X β + ε, assume
ε ∼ MVN (0, σ 2 I ). If tT β is estimable, then tT b is independent
of s 2 .
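
These distributional facts can be illustrated by simulation (a sketch, not from the slides; the three-group design, the group means 10, 12 and 8, the choice σ = 2, and ginv() from the MASS package are all illustrative assumptions). With n = 9 observations and r = 3, Theorem 6.16 says SSRes /σ 2 should behave like a χ2 variable with 6 degrees of freedom.

> library(MASS)                                    # for ginv()
> g <- factor(rep(1:3, each = 3))
> X <- cbind(1, model.matrix(~ g - 1))             # rank 3, n = 9
> H <- X %*% ginv(t(X) %*% X) %*% t(X)             # the matrix X (X'X)^c X'
> set.seed(2)
> sims <- replicate(5000, {
+   y <- X %*% c(0, 10, 12, 8) + rnorm(9, sd = 2)  # sigma = 2
+   sum((y - H %*% y)^2) / 4                       # SSRes / sigma^2
+ })
> mean(sims)                                       # should be close to n - r = 6
> mean(sims > qchisq(0.95, df = 6))                # should be close to 0.05

Recording tT b alongside s 2 in the same simulation would give a similar empirical check of the independence claimed in Theorem 6.17.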


Interval estimation in the less than full rank model

The steps to derive a confidence interval are very similar to those for the full rank case, but with two small differences. Firstly, we can only find confidence intervals for quantities that are estimable!

Secondly, we replace the inverse (X T X )−1 by the conditional inverse (X T X )c .

All other steps are the same.


Interval estimation in the less than full rank model

We have

Var tT b = Var tT (X T X )c X T y
= tT (X T X )c X T σ 2 IX (X T X )c t
= σ 2 tT (X T X )c t.

Thus

[(tT b − tT β) / (σ √(tT (X T X )c t))] / √(s 2 /σ 2 )
has a t distribution with n − r degrees of freedom.


Interval estimation in the less than full rank model

This gives us the confidence interval for the (estimable) quantity tT β, using a t distribution with n − r degrees of freedom:

tT b ± tα/2 s √(tT (X T X )c t).

This formula can also be used to find confidence intervals for the
individual parameters, if they are estimable.


Interval estimation in the less than full rank model

Example. We return again to the carbon removal example.


Suppose we want to find a 95% confidence interval for τ1 − τ2 .
> (tt <- c(0,1,-1,0))
[1] 0 1 -1 0
> ta <- qt(0.975,9-3)
> halfwidth <- ta*sqrt(s2*t(tt)%*%ginv(t(X)%*%X)%*%tt)
> tt%*%b + c(-1,1)*halfwidth
[1] -5.22997 -3.37003
In particular, we can say with 95% confidence that the first carbon removal treatment is not as effective as the second.


Interval estimation in the less than full rank model

Example. We showed earlier that in a 3-level 1-way classification model, the contrast τ1 − τ2 can be estimated by the difference between the corresponding sample means, ȳ1 − ȳ2 .

We also had
t = (0, 1, −1, 0)T ,    (X T X )c = diag(0, 1/n1 , 1/n2 , 1/n3 ).


Interval estimation in the less than full rank model


Therefore we have
  
tT (X T X )c t = (0, 1, −1, 0) (X T X )c (0, 1, −1, 0)T = 1/n1 + 1/n2
and the confidence interval is


ȳ1 − ȳ2 ± tα/2 s √(1/n1 + 1/n2 ).

You may have seen this formula before. The linear models
framework has allowed us to derive it from first principles.
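
As a quick numerical check (a sketch, assuming the carbon removal objects y, s2 and ta created earlier are still in the workspace, with n1 = n2 = 3), this formula reproduces the interval for τ1 − τ2 obtained above from the general expression tT b ± tα/2 s √(tT (X T X )c t):

> ybar1 <- mean(y[1:3])                      # treatment 1 sample mean
> ybar2 <- mean(y[4:6])                      # treatment 2 sample mean
> (ybar1 - ybar2) + c(-1, 1) * ta * sqrt(s2) * sqrt(1/3 + 1/3)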


Exam marks example

We find a confidence interval for the estimable quantity µ + τ1 , the mean mark of class 1.

> tt <- as.vector(c(1,1,0,0))


> halfwidth <- qt(0.975,df=n-k)*sqrt(s2*t(tt)%*%XtXc%*%tt)
> tt %*% b + c(-1,1)*halfwidth

[1] 75.68796 84.11204

> newdata <- data.frame(class.f=factor(1))


> predict(model, newdata, interval="confidence", level=0.95)

fit lwr upr


1 79.9 75.68796 84.11204


Exam marks example

We find a prediction interval for a new student from class 1.

> tt <- as.vector(c(1,1,0,0))


> halfwidth <- qt(0.975,df=n-k)*sqrt(s2)*
+ sqrt(1+t(tt)%*%XtXc%*%tt)
> tt %*% b + c(-1,1)*halfwidth

[1] 65.93024 93.86976

> newdata <- data.frame(class.f=factor(1))


> predict(model, newdata, interval="prediction", level=0.95)

fit lwr upr


1 79.9 65.93024 93.86976


Exam marks example

We now find a confidence interval for the estimable quantity τ1 − τ2 , the difference between the first two classes.

> tt <- as.vector(c(0,1,-1,0))


> halfwidth <- qt(0.975,df=n-k)*sqrt(s2*t(tt)%*%XtXc%*%tt)
> tt %*% b + c(-1,1)*halfwidth

[1] -12.5567252 -0.6432748

> confint(model, level=0.95)

2.5 % 97.5 %
(Intercept) 75.6879592 84.11204
class.f2 0.6432748 12.55673
class.f3 3.5432748 15.45673


Exam marks example

We have to express more obscure parameter combinations relative to the treatment contrasts used. Remember:

Label        contr.treatment    contr.sum
Intercept    µ + τ1             µ + (τ1 + τ2 + τ3 )/3
class.f1     (none)             τ1 − (τ1 + τ2 + τ3 )/3
class.f2     τ2 − τ1            τ2 − (τ1 + τ2 + τ3 )/3
class.f3     τ3 − τ1            (none)
So for contr.treatment, τ1 − τ2 = −class.f2.
> library(gmodels)
> ci <- estimable(model, c(0,-1,0), conf.int=0.95)
> c(ci$Lower, ci$Upper)
[1] -12.5567252 -0.6432748
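
For reference (a sketch, independent of the exam data), the two coding matrices themselves can be printed directly in R; the rows are the factor levels and the columns correspond to the non-intercept coefficients:

> contr.treatment(3)
  2 3
1 0 0
2 1 0
3 0 1
> contr.sum(3)
  [,1] [,2]
1    1    0
2    0    1
3   -1   -1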


Exam marks example

For the contr.sum model, we have


Intercept = µ + (1/3)(τ1 + τ2 + τ3 )
class.f1 = (2/3)τ1 − (1/3)(τ2 + τ3 )
class.f2 = (2/3)τ2 − (1/3)(τ1 + τ3 )
τ1 − τ2 = class.f1 − class.f2

> ci2 <- estimable(model2, c(0,1,-1), conf.int=0.95)


> c(ci2$Lower, ci2$Upper)
[1] -12.5567252 -0.6432748


Exam marks example

To find the difference between class 3 and the average of the other
two classes, we need
τ3 − (1/2)τ2 − (1/2)τ1 = (τ3 − τ1 ) − (1/2)(τ2 − τ1 )
                       = class.f3 − (1/2) class.f2.

> ci3 <- estimable(model, c(0,-0.5,1), conf.int=0.95)


> c(ci3$Lower, ci3$Upper)
[1] 1.041325 11.358675


Exam marks example

For contr.sum:
class.f1 + class.f2 = (1/3)τ1 + (1/3)τ2 − (2/3)τ3
τ3 − (1/2)τ2 − (1/2)τ1 = −(3/2) class.f1 − (3/2) class.f2

> ci4 <- estimable(model2, c(0,-1.5,-1.5), conf.int=0.95)


> c(ci4$Lower, ci4$Upper)
[1] 1.041325 11.358675
