MathModel_Lecture 8
1 Multicollinearity
A basic assumption in the multiple linear regression model is that the data matrix X in Lecture 5 has full column rank K + 1, that is, the column vectors of X are linearly independent. Let 0_n denote the vector of n zeros. If there exists a nonzero vector a = (a_0, a_1, . . . , a_K)^⊤ such that
\[
Xa = 0_n, \tag{1.1}
\]
then we say that the independent variables X_1, X_2, . . . , X_K exhibit perfect multicollinearity. In practical problems, perfect multicollinearity is rare. However, it is common that (1.1) holds approximately, that is,
\[
Xa \approx 0_n. \tag{1.2}
\]
When the independent variables X_1, X_2, . . . , X_K satisfy (1.2), we say that they exhibit collinearity.
Suppose that the regression model
\[
y = \beta_0 + \beta_1 X_1 + \cdots + \beta_K X_K + \varepsilon
\]
exhibits perfect multicollinearity. Then the column rank of the data matrix X in Lecture 5 is strictly smaller than K + 1. In this case, |X^⊤X| = 0 and the inverse of X^⊤X does not exist. When (1.2) holds, we still have rank(X) = K + 1, but the determinant of X^⊤X will be very small, that is, |X^⊤X| ≈ 0, which implies that (X^⊤X)^{-1} must have some very large diagonal entry. Note that the dispersion matrix (covariance matrix) of b = (X^⊤X)^{-1}X^⊤y is given by D(b) = σ²(X^⊤X)^{-1}, which is shown as follows:
\[
\begin{aligned}
D(b) &= D\bigl((X^\top X)^{-1}X^\top y\bigr) \\
     &= (X^\top X)^{-1}X^\top D(y)\,X(X^\top X)^{-1} \qquad (\text{use the fact } D(Ay) = A\,D(y)A^\top) \\
     &= \sigma^2 (X^\top X)^{-1}X^\top X(X^\top X)^{-1} \qquad (\text{since } D(y) = \sigma^2 I_n) \\
     &= \sigma^2 (X^\top X)^{-1}.
\end{aligned}
\]
Since the diagonal entries of D(b) are var(b_0), var(b_1), . . . , var(b_K), there exists some j ∈ {0, 1, . . . , K} such that var(b_j) is very large, which leads to low accuracy in estimating β_j.
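The effect can be illustrated numerically. Below is a minimal sketch (using numpy on simulated data; the names x1, x2 and the noise level eps are illustrative, with x2 nearly collinear with x1 when eps is small) showing that |X^⊤X| shrinks and the diagonal entries of (X^⊤X)^{-1} blow up as the columns become nearly collinear:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
x1 = rng.normal(size=n)

for eps in [1.0, 0.1, 0.01]:
    x2 = x1 + eps * rng.normal(size=n)            # x2 becomes nearly collinear with x1 as eps shrinks
    X = np.column_stack([np.ones(n), x1, x2])     # data matrix with an intercept column
    XtX = X.T @ X
    max_diag = np.diag(np.linalg.inv(XtX)).max()  # largest diagonal entry of (X'X)^{-1}
    print(f"eps={eps:5.2f}  det(X'X)={np.linalg.det(XtX):14.2f}  max diag of inverse={max_diag:9.4f}")
```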
We make the above claims precise in the following two propositions.

Proposition 1.1. If the independent variables satisfy (1.2), then X^⊤X has an eigenvalue close to 0 and hence |X^⊤X| ≈ 0.

Proof. Note that X^⊤X is symmetric positive semi-definite. There exists an orthogonal matrix U ∈ R^{(K+1)×(K+1)} (consisting of the eigenvectors of X^⊤X) such that X^⊤X = UΛU^⊤, where Λ = diag{λ_1, . . . , λ_{K+1}} contains the eigenvalues of X^⊤X. From (1.2), a^⊤X^⊤Xa = ‖Xa‖² ≈ 0, that is,
\[
z^\top \Lambda z \approx 0, \tag{1.5}
\]
where z := U^⊤a. Since U^⊤ is nonsingular and a ≠ 0, we have that z ≠ 0, that is, there exists some index j′ such that z_{j′} ≠ 0. From (1.5), we have that
\[
\lambda_{j'} z_{j'}^2 \le \sum_{j=1}^{K+1} \lambda_j z_j^2 = z^\top \Lambda z \approx 0,
\]
so λ_{j′} ≈ 0. Hence X^⊤X has an eigenvalue close to 0, and |X^⊤X|, being the product of the eigenvalues, is approximately 0.
Proposition 1.2. Let A ∈ R^{n×n} be a symmetric positive definite matrix. If |A| → ∞, then A has at least one very large diagonal entry.

Proof. Since A is symmetric positive definite, there exists an orthogonal matrix U = (u_{ij}) (consisting of eigenvectors of A) such that
\[
A = U\Lambda U^\top, \tag{1.6}
\]
where Λ = diag{λ_1, λ_2, . . . , λ_n} and λ_1 ≥ λ_2 ≥ · · · ≥ λ_n > 0 are the eigenvalues of A. Note that |A| = λ_1 λ_2 · · · λ_n, so λ_1 ≥ |A|^{1/n} → ∞ as |A| → ∞. It follows from (1.6) that
\[
a_{jj} = \sum_{i=1}^{n} \lambda_i u_{ji}^2 \ge \lambda_1 u_{j1}^2, \qquad j = 1, 2, \ldots, n.
\]
Since u_1 = (u_{11}, u_{21}, . . . , u_{n1})^⊤ is a unit vector, it is impossible that all components of u_1 are sufficiently small; that is, there exists some j such that u_{j1} is not too small (indeed, max_j u_{j1}^2 ≥ 1/n), which implies that
\[
a_{jj} \ge \lambda_1 u_{j1}^2 \to \infty.
\]
This completes the proof.
We next provide an example of bivariate regression showing that as the correlation between the independent variables increases, the variances of the estimators also increase. Consider linear regression of the dependent variable y on two independent variables x_1 and x_2. Assume that x_1, x_2 and y are all centered, so the intercept term in the regression equation is 0. The regression equation is given by
\[
\hat{y} = b_1 x_1 + b_2 x_2.
\]
Define
\[
L_{11} = \sum_{i=1}^{n} \bigl(x_1^{(i)}\bigr)^2, \qquad
L_{12} = \sum_{i=1}^{n} x_1^{(i)} x_2^{(i)}, \qquad
L_{22} = \sum_{i=1}^{n} \bigl(x_2^{(i)}\bigr)^2.
\]
With the centered data, the data matrix is X = (x_1, x_2) and
\[
X^\top X = \begin{pmatrix} L_{11} & L_{12} \\ L_{12} & L_{22} \end{pmatrix}.
\]
Let r_{12} = L_{12}/\sqrt{L_{11}L_{22}} denote the sample correlation coefficient of x_1 and x_2, so that L_{12}^2 = r_{12}^2 L_{11}L_{22}. This implies that
\[
(X^\top X)^{-1}
= \frac{1}{|X^\top X|}\begin{pmatrix} L_{22} & -L_{12} \\ -L_{12} & L_{11} \end{pmatrix}
= \frac{1}{L_{11}L_{22} - L_{12}^2}\begin{pmatrix} L_{22} & -L_{12} \\ -L_{12} & L_{11} \end{pmatrix}
= \frac{1}{L_{11}L_{22}\,(1 - r_{12}^2)}\begin{pmatrix} L_{22} & -L_{12} \\ -L_{12} & L_{11} \end{pmatrix}.
\]
Thus,
\[
\operatorname{var}(b_1) = \frac{\sigma^2}{(1 - r_{12}^2)L_{11}}, \qquad
\operatorname{var}(b_2) = \frac{\sigma^2}{(1 - r_{12}^2)L_{22}}.
\]
Now we see that as the correlation between x_1 and x_2 increases, the variances of b_1 and b_2 also increase. When x_1 and x_2 are perfectly correlated, that is, |r_{12}| = 1, the variances tend to +∞.
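As a quick numerical check, here is a minimal sketch (numpy, simulated data; the names x1, x2, L11, r12 mirror the notation above, and the correlation level 0.95 is arbitrary) verifying that the closed-form variance of b_1 agrees with the corresponding diagonal entry of σ²(X^⊤X)^{-1}:

```python
import numpy as np

rng = np.random.default_rng(1)
n, sigma2 = 200, 1.0

x1 = rng.normal(size=n)
x2 = 0.95 * x1 + np.sqrt(1 - 0.95**2) * rng.normal(size=n)  # strongly correlated with x1
x1 = x1 - x1.mean()                                         # center the regressors
x2 = x2 - x2.mean()

X = np.column_stack([x1, x2])
L11, L12, L22 = (x1**2).sum(), (x1 * x2).sum(), (x2**2).sum()
r12 = L12 / np.sqrt(L11 * L22)

var_b1_formula = sigma2 / ((1 - r12**2) * L11)         # closed-form variance from the lecture
var_b1_matrix = sigma2 * np.linalg.inv(X.T @ X)[0, 0]  # diagonal entry of sigma^2 (X'X)^{-1}
print(var_b1_formula, var_b1_matrix)                   # the two values coincide
```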
2 Ridge Regression
When the independent variables exhibit multicollinearity, the smallest eigenvalue of X^⊤X tends to 0. Note that if λ is an eigenvalue of X^⊤X, then λ + κ is an eigenvalue of X^⊤X + κI. To avoid eigenvalues that are too small, we can add a matrix κI (κ > 0) to X^⊤X. Considering the different scales of the variables, we first standardize the data and still denote the data matrix by X. We call
\[
b(\kappa) = (X^\top X + \kappa I)^{-1} X^\top y
\]
the ridge estimate of β, where κ > 0 is the ridge parameter. It is easy to verify that the optimization problem for ridge regression is given by
\[
\min_{\beta}\; \|y - X\beta\|_2^2 + \kappa \|\beta\|_2^2.
\]
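One can check numerically that the closed form b(κ) solves this problem. A minimal sketch (numpy, simulated standardized data; κ = 0.5 is arbitrary) verifying that the gradient of the penalized objective vanishes at β = b(κ):

```python
import numpy as np

rng = np.random.default_rng(4)
n, K, kappa = 50, 3, 0.5
X = rng.normal(size=(n, K))
X = (X - X.mean(axis=0)) / X.std(axis=0)   # standardized data matrix, as in the lecture
y = rng.normal(size=n)

b_kappa = np.linalg.solve(X.T @ X + kappa * np.eye(K), X.T @ y)  # closed-form ridge solution b(kappa)
grad = 2 * X.T @ (X @ b_kappa - y) + 2 * kappa * b_kappa         # gradient of ||y - Xb||^2 + kappa ||b||^2
print(np.linalg.norm(grad))                                      # ~1e-13: b(kappa) is the minimizer
```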
The ridge estimator b(κ) is a biased estimator of β when κ ≠ 0, and its norm is strictly smaller than that of the least squares estimator b.

Proof. Note that
\[
b(\kappa) = (X^\top X + \kappa I)^{-1} X^\top X\, b
          = \bigl[I - \kappa (X^\top X + \kappa I)^{-1}\bigr] b,
\qquad
E\bigl(b(\kappa)\bigr) = \bigl[I - \kappa (X^\top X + \kappa I)^{-1}\bigr] \beta.
\]
From the above equation, we know that b(κ) is a biased estimator of β when κ ≠ 0. For any eigenvalue λ of X^⊤X, the corresponding eigenvalue of I − κ(X^⊤X + κI)^{-1} is given by 1 − κ/(λ + κ) < 1, which implies that ‖I − κ(X^⊤X + κI)^{-1}‖_2 < 1, and hence ‖b(κ)‖ < ‖b‖ for b ≠ 0.
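The shrinkage property ‖b(κ)‖ < ‖b‖ can also be seen numerically. A minimal sketch (numpy, simulated data with one nearly collinear column; the κ values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
n, K = 100, 3
X = rng.normal(size=(n, K))
X[:, 2] = X[:, 0] + 0.05 * rng.normal(size=n)   # make two columns nearly collinear
X = (X - X.mean(axis=0)) / X.std(axis=0)        # standardize the data, as in the lecture
y = X @ np.array([1.0, 2.0, 1.0]) + rng.normal(size=n)
y = y - y.mean()

b = np.linalg.solve(X.T @ X, X.T @ y)           # ordinary least squares estimate b
print(f"kappa=  0.0  ||b||        = {np.linalg.norm(b):.4f}")
for kappa in [0.1, 1.0, 10.0]:
    b_kappa = np.linalg.solve(X.T @ X + kappa * np.eye(K), X.T @ y)  # ridge estimate b(kappa)
    print(f"kappa={kappa:5.1f}  ||b(kappa)|| = {np.linalg.norm(b_kappa):.4f}")
# the norm of b(kappa) is strictly smaller than ||b|| and decreases as kappa grows
```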
3 Principal Component Analysis

Each sample x = (x_1, x_2, . . . , x_K)^⊤ can be regarded as a point in K-dimensional space. The directions of the PCA space represent the directions of maximum variance of the given data, as shown in Figure 1. The PCA space consists of K principal components (PCs). Each principal component accounts for a different amount of variance in its direction; the principal components are uncorrelated, and the first one points in the direction of maximum variance.
Performing a linear transformation on the vector x ∈ R^K, we obtain a new vector z ∈ R^K, i.e.,
\[
\begin{cases}
z_1 = a_{11} x_1 + a_{12} x_2 + \cdots + a_{1K} x_K, \\
z_2 = a_{21} x_1 + a_{22} x_2 + \cdots + a_{2K} x_K, \\
\;\;\vdots \\
z_K = a_{K1} x_1 + a_{K2} x_2 + \cdots + a_{KK} x_K,
\end{cases}
\]
that is, z = Ax, where A = (a_{ij}) ∈ R^{K×K} is the transformation matrix.
Lemma 3.1. Let a ∈ R^K be a constant vector and let Σ be the covariance matrix of the random vector x ∈ R^K. Then var(a^⊤x) = a^⊤Σa.
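A short verification of the lemma (writing μ := E x):
\[
\operatorname{var}(a^\top x)
= E\bigl[(a^\top (x - \mu))^2\bigr]
= E\bigl[a^\top (x - \mu)(x - \mu)^\top a\bigr]
= a^\top E\bigl[(x - \mu)(x - \mu)^\top\bigr] a
= a^\top \Sigma a.
\]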
According to Lemma 3.1, if we do not impose any constraint on the transformation matrix A, the variance of z_i can be arbitrarily large. We shall assume that the following three principles hold:
(1) A^⊤A = I;
(2) z_i and z_j are uncorrelated for i ≠ j;
(3) z_1 attains the maximum variance among all A satisfying principle (1), and z_i attains the ith largest variance, i ∈ N_K.
Based on the above three principles, we call z_i the ith principal component of the original variable x.
We next explain the relationship between maximum variance and the principal components. Let Σ be the covariance matrix of x ∈ R^K and let λ_1 ≥ λ_2 ≥ · · · ≥ λ_K ≥ 0 be the eigenvalues of Σ. The spectral decomposition of Σ is given by
\[
\Sigma = U \Lambda U^\top = \sum_{i=1}^{K} \lambda_i u_i u_i^\top,
\]
where u_i is the unit eigenvector associated with λ_i and U = (u_1, u_2, . . . , u_K). Since {u_1, . . . , u_K} is an orthonormal basis of R^K, the first row of A can be expanded as A_{1,:}^⊤ = Σ_{i=1}^{K} α_i u_i, and by Lemma 3.1,
\[
\operatorname{var}(z_1) = A_{1,:}\,\Sigma\,A_{1,:}^\top = \sum_{i=1}^{K} \lambda_i \alpha_i^2 \le \lambda_1 \sum_{i=1}^{K} \alpha_i^2.
\]
By the fact that ‖A_{1,:}^⊤‖ = 1, we are able to show that Σ_{i=1}^{K} α_i^2 = 1, which is presented as follows:
\[
\sum_{i=1}^{K} \alpha_i^2
= \Bigl\langle \sum_{i=1}^{K} \alpha_i u_i, \; \sum_{i=1}^{K} \alpha_i u_i \Bigr\rangle
= \|A_{1,:}^\top\|^2 = 1.
\]
Hence var(z_1) ≤ λ_1.
Moreover, it is easy to verify that var(z_1) = λ_1 when A_{1,:} = u_1^⊤. Therefore, z_1 = u_1^⊤x has the maximum variance λ_1.
In fact, we are able to prove that the ith principal component of x is given by
\[
z_i = u_i^\top x, \qquad i = 1, 2, \ldots, K,
\]
and we have
\[
\operatorname{var}(z_i) = \lambda_i, \qquad
\operatorname{cov}(z_i, z_j) = u_i^\top \Sigma u_j = 0, \quad i \ne j.
\]
Steps of PCA:
1. Given a data matrix X = (x^{(1)}, x^{(2)}, . . . , x^{(T)}), where T represents the total number of samples and x^{(i)} ∈ R^K represents the ith sample.
2. Compute the mean of all samples: μ = (1/T) Σ_{i=1}^{T} x^{(i)}.
3. Subtract the mean from every sample to obtain the centered data matrix D = (x^{(1)} − μ, x^{(2)} − μ, . . . , x^{(T)} − μ).
4. Compute the covariance matrix of the centered data, Σ = (1/T) D D^⊤, and its eigenvalues and eigenvectors; let W = (u_1, . . . , u_k) consist of the eigenvectors corresponding to the k largest eigenvalues.
5. Project the centered data onto the selected eigenvectors to obtain the principal component scores
\[
Z = W^\top D.
\]
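The steps above can be sketched in a few lines of numpy (a minimal illustration with simulated data; the names X, D, W, Z follow the notation of the steps, and k is chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(3)
K, T, k = 5, 200, 2                       # dimension, sample size, number of PCs kept

X = rng.normal(size=(K, T))               # data matrix: each column is one sample x^(i)
mu = X.mean(axis=1, keepdims=True)        # step 2: mean of all samples
D = X - mu                                # step 3: centered data matrix

Sigma = (D @ D.T) / T                     # step 4: covariance matrix of the centered data
eigvals, eigvecs = np.linalg.eigh(Sigma)  # eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]         # indices of eigenvalues in descending order
W = eigvecs[:, order[:k]]                 # eigenvectors of the k largest eigenvalues

Z = W.T @ D                               # step 5: principal component scores (k x T)
print(Z.shape)                            # (2, 200)
print(np.round(np.cov(Z, bias=True), 3))  # approximately diag(lambda_1, ..., lambda_k)
```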