
Cholesky Decomposition

Cholesky decomposition is a tool that decomposes a variance matrix into a product of a lower triangular matrix and its transpose. It has many uses including variance decomposition, inverting positive definite matrices, fitting emulators, and validating emulators. The Cholesky decomposition identifies independent components or variables that decompose the variance matrix sequentially, with later components conditioning on earlier ones. This property makes it useful for design of experiments and dimension reduction.


Cholesky decomposition –

a tool with many uses

Tony O’Hagan, Leo Bastos

MUCM/SAMSI, April 07 Slide 1


Pronunciation
 Cholesky was French …
 Sholesky
 But his name is probably Polish …
 Kholesky
 Or maybe Russian …
 Sholesky or
 Kholesky or
 Tsholesky
 Take your pick!!

http://mucm.group.shef.ac.uk Slide 2
Variance decompositions
 Given a variance matrix Σ for a random vector X, there
exist many matrices C such that
Σ = C CT
 If C is such a matrix then so is CO for any orthogonal O
 Then var(C-1X) = I
 The elements of C-1X are uncorrelated with unit variance

 E.g.
 Eigenvalue decomposition C = QΛ½
 Q orthogonal and Λ diagonal

 Familiar in principal component analysis

 Unique pds square root C = QΛ½QT


 Σ = C2

http://mucm.group.shef.ac.uk Slide 3
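
As a quick illustration of the slide above, here is a minimal numpy sketch (the 3×3 matrix and sample size are made up, not from the slides): both the eigenvalue square root and the Cholesky factor satisfy C CT = Σ, and C-1X then has identity variance.

```python
import numpy as np

rng = np.random.default_rng(0)
Sigma = np.array([[4.0, 2.0, 0.5],
                  [2.0, 3.0, 1.0],
                  [0.5, 1.0, 2.0]])

# Eigenvalue decomposition: C = Q Lambda^(1/2)
lam, Q = np.linalg.eigh(Sigma)
C_eig = Q @ np.diag(np.sqrt(lam))

# Cholesky decomposition: the unique lower-triangular choice of C
L = np.linalg.cholesky(Sigma)

# Both satisfy C C^T = Sigma ...
assert np.allclose(C_eig @ C_eig.T, Sigma)
assert np.allclose(L @ L.T, Sigma)

# ... so the elements of C^-1 X are uncorrelated with unit variance
X = rng.multivariate_normal(np.zeros(3), Sigma, size=200_000).T
Z = np.linalg.solve(L, X)
print(np.round(np.cov(Z), 2))   # close to the identity matrix
```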
Cholesky decomposition
 The Cholesky decomposition corresponds to the
unique lower triangular matrix C
 Which I’ll write as L to emphasise it’s lower triangular
 Partition Σ and L as

     Σ = | Σ11  Σ21T |        L = | L11   0  |
         | Σ21  Σ22  |            | L21  L22 |

 Then
 L11L11T = Σ11 , L22L22T = Σ22.1 = Σ22 – Σ21Σ11-1Σ21T
 L-1 is also lower triangular
 Therefore
 The first p elements of L-1X decompose the variance
matrix of the first p elements of X
 Remaining elements decompose the residual variance
of the remaining elements of X conditional on the first p

http://mucm.group.shef.ac.uk Slide 4
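
The partition property can be checked directly with numpy's Cholesky routine; a small sketch with an arbitrary 4×4 positive definite matrix and p = 2 (numbers are illustrative only):

```python
import numpy as np

Sigma = np.array([[4.0, 2.0, 0.5, 1.0],
                  [2.0, 3.0, 1.0, 0.8],
                  [0.5, 1.0, 2.0, 0.3],
                  [1.0, 0.8, 0.3, 1.5]])
p = 2
S11, S21, S22 = Sigma[:p, :p], Sigma[p:, :p], Sigma[p:, p:]

L = np.linalg.cholesky(Sigma)
L11, L22 = L[:p, :p], L[p:, p:]

# Leading block: L11 L11^T = Sigma11
assert np.allclose(L11 @ L11.T, S11)

# Trailing block: L22 L22^T = Sigma22.1 = Sigma22 - Sigma21 Sigma11^-1 Sigma21^T
S22_1 = S22 - S21 @ np.linalg.solve(S11, S21.T)
assert np.allclose(L22 @ L22.T, S22_1)
```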
Computing Cholesky
 Recursion; take p = 1
 L11 = √(Σ11) (scalar)
 L21 = Σ21/L11 (column vector divided by scalar)
 L22 is the Cholesky decomposition of Σ22.1
 Inverse matrix is usually computed in the same
recursion
 Much faster and more accurate than
eigenvalue computation
 Method of choice for inverting pds matrices

http://mucm.group.shef.ac.uk Slide 5
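
The p = 1 recursion on the slide translates almost line for line into code. A sketch for exposition only; in practice one would call np.linalg.cholesky, which uses LAPACK:

```python
import numpy as np

def cholesky_recursive(Sigma):
    """Cholesky factor via the p = 1 recursion: peel off one row/column at a time."""
    S = np.array(Sigma, dtype=float)
    n = S.shape[0]
    L = np.zeros((n, n))
    for k in range(n):
        L[k, k] = np.sqrt(S[0, 0])             # L11 = sqrt(Sigma11)   (scalar)
        L[k+1:, k] = S[1:, 0] / L[k, k]        # L21 = Sigma21 / L11   (column / scalar)
        # Residual Sigma22.1 = Sigma22 - Sigma21 Sigma11^-1 Sigma21^T = Sigma22 - L21 L21^T
        S = S[1:, 1:] - np.outer(L[k+1:, k], L[k+1:, k])
    return L

Sigma = np.array([[4.0, 2.0, 0.5],
                  [2.0, 3.0, 1.0],
                  [0.5, 1.0, 2.0]])
L = cholesky_recursive(Sigma)
print(np.allclose(L, np.linalg.cholesky(Sigma)))   # True
```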
Principal Cholesky
 There are many Cholesky decompositions
 Permute the variables in X
 Obviously gives different decomposition
 Principal Cholesky decomposition (PCD)
 At each step in recursion, permute to bring the largest
diagonal element of Σ (or Σ22.1) to the first element
 Analogous to principal components
 First Cholesky component is the element of X with
largest variance
 Second is a linear combination of the first and the element with largest variance given the first
 And so on

http://mucm.group.shef.ac.uk Slide 6
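
A sketch of the principal Cholesky decomposition as described above, with an optional truncation tolerance (used on the next two slides). The function name and interface are mine, not from the slides:

```python
import numpy as np

def principal_cholesky(Sigma, tol=0.0):
    """Pivoted ("principal") Cholesky: at each step bring the largest remaining
    diagonal element of Sigma22.1 to the front.  Stops early (truncates) once
    that diagonal falls below tol.  Returns the permutation, the factor L for
    the permuted matrix, and the number of columns retained."""
    S = np.array(Sigma, dtype=float)
    n = S.shape[0]
    perm = np.arange(n)
    L = np.zeros((n, n))
    for k in range(n):
        j = k + int(np.argmax(np.diag(S)[k:]))     # largest remaining conditional variance
        S[[k, j], :] = S[[j, k], :]                # symmetric row/column swap
        S[:, [k, j]] = S[:, [j, k]]
        L[[k, j], :k] = L[[j, k], :k]
        perm[[k, j]] = perm[[j, k]]
        if S[k, k] <= tol:                         # remaining variance negligible: truncate
            return perm, L[:, :k], k
        L[k, k] = np.sqrt(S[k, k])
        L[k+1:, k] = S[k+1:, k] / L[k, k]
        S[k+1:, k+1:] -= np.outer(L[k+1:, k], L[k+1:, k])
    return perm, L, n

Sigma = np.array([[4.0, 2.0, 0.5],
                  [2.0, 3.0, 1.0],
                  [0.5, 1.0, 2.0]])
perm, L, rank = principal_cholesky(Sigma)
print(perm)                                            # element with largest variance comes first
print(np.allclose(L @ L.T, Sigma[np.ix_(perm, perm)])) # True
```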
Numerical stability
 Important when decomposing or inverting near-
singular matrices
 A problem that arises routinely with Gaussian process
methods
 I’ve been using PCD for this purpose for about 20 years
 Division by √(Σ11) is the major cause of instability
 Rounding error is magnified
 Rounding error can even make the quantity under the square root –ve
 PCD postpones this problem to late stages of the
recursion
 The whole of Σ22.1 is then small, and we can truncate
 Analogous to not using all the principal components

http://mucm.group.shef.ac.uk Slide 7
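
A small demonstration of the stability point, reusing the principal_cholesky sketch above. The covariance (squared-exponential correlations over made-up 1-d points, two of them numerically coincident) is only an illustration:

```python
import numpy as np

# Near-singular covariance: two design points that are numerically identical
x = np.array([0.0, 0.1, 0.1 + 1e-9, 0.5, 0.9])
Sigma = np.exp(-((x[:, None] - x[None, :]) / 0.2) ** 2)
print(np.linalg.cond(Sigma))              # enormous (possibly infinite) condition number

try:
    np.linalg.cholesky(Sigma)             # the pivot for the duplicate point is at rounding-error level
    print("plain Cholesky ran, but only just")
except np.linalg.LinAlgError:
    print("plain Cholesky failed: rounding made a pivot non-positive")

# Principal Cholesky postpones the tiny pivot to the last step, where the whole
# of Sigma22.1 is small and can simply be truncated
perm, L, rank = principal_cholesky(Sigma, tol=1e-8)
print(rank, perm[rank:])                  # rank 4: one of the coincident points is dropped
print(np.max(np.abs(Sigma[np.ix_(perm, perm)] - L @ L.T)))   # truncation error is tiny
```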
Fitting emulators
 A key step in the fitting process is inverting the matrix
A of covariances between the design points
 Typically needs to be done many times
 Problems arise when points are close together relative
to correlation length parameters
 A becomes ill-conditioned
 E.g. points on trajectory of a dynamic model

 Or in Herbie’s optimisation

 PCD allows us to identify redundant design points


 Simply discard them
 But it’s important to check fit
 Observed function values at discarded points should be
consistent with the (narrow) prediction bounds given
included points

http://mucm.group.shef.ac.uk Slide 8
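
A toy version of the redundant-design-point idea, again reusing principal_cholesky from above. The design points, the sin() stand-in for the simulator and the tolerance are all invented for illustration, and the "emulator" is reduced to a plain zero-mean, unit-variance GP so that the check-fit step stays short:

```python
import numpy as np

def sqexp_cov(xa, xb, lengthscale=0.2):
    """Squared-exponential covariance between two sets of 1-d design points."""
    return np.exp(-((xa[:, None] - xb[None, :]) / lengthscale) ** 2)

# Design points along a trajectory: two pairs are nearly coincident
x = np.array([0.0, 0.1, 0.1005, 0.35, 0.351, 0.7, 1.0])
y = np.sin(2 * np.pi * x)                      # stand-in for the simulator output

A = sqexp_cov(x, x)                            # matrix of covariances between design points
perm, L, rank = principal_cholesky(A, tol=1e-4)
kept, dropped = perm[:rank], perm[rank:]
print("kept:", np.sort(x[kept]), "dropped:", x[dropped])   # one of each close pair is discarded

# Check fit: predict the discarded points from the kept ones
K_kk = sqexp_cov(x[kept], x[kept])
K_dk = sqexp_cov(x[dropped], x[kept])
w = np.linalg.solve(K_kk, y[kept])
mean = K_dk @ w
var = 1.0 - np.einsum('ij,ij->i', K_dk, np.linalg.solve(K_kk, K_dk.T).T)
print("observed:           ", y[dropped])
print("predicted +/- 2 sd: ", mean, 2 * np.sqrt(np.maximum(var, 0.0)))
# The observed values should sit inside these (narrow) bounds
```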
Validating emulators
 Having fitted the emulator, we test its predictions
against new model runs
 We can look at standardised residuals for individual
predicted points
 But these are correlated
 Mahalanobis distance to test the whole set
D = (y – m)TV-1(y – m)
 Where y is new data, m is mean vector and V is
predictive variance matrix
 Approx χ2 with df equal to number of new points
 Any decomposition of V decomposes D
 Eigenvalues useful but may be hard to interpret

http://mucm.group.shef.ac.uk Slide 9
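
A minimal sketch of the Mahalanobis check. The predictive mean m and variance V here are random stand-ins for emulator output, and y is simulated from the prediction itself, so D really should look like a chi-squared draw:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 20                                    # number of new validation runs

m = rng.normal(size=n)                    # stand-in predictive mean
B = rng.normal(size=(n, n))
V = B @ B.T + 0.1 * np.eye(n)             # stand-in predictive variance matrix
y = rng.multivariate_normal(m, V)         # "new data", here drawn from the prediction

D = (y - m) @ np.linalg.solve(V, y - m)   # Mahalanobis distance
print(D)
print(stats.chi2(n).ppf([0.025, 0.975]))  # D should usually fall inside this range
```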
Validating emulators (2)
 PCD keeps the focus on individual points, but
conditions them on other points
 Initially picks up essentially independent points
 Interest in later points, where variance is
reduced conditional on other points
 In principle, points with the smallest conditional
variances may provide most stringent tests
 But may be badly affected by rounding error

http://mucm.group.shef.ac.uk Slide 10
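
Continuing the toy example above: with L the principal Cholesky factor of V (sketch earlier), the elements of u = L-1(y – m) are uncorrelated with unit variance and their squares decompose D in the pivoted order described on this slide. Everything here reuses objects defined in the previous sketches:

```python
import numpy as np
from scipy.linalg import solve_triangular

perm, L, _ = principal_cholesky(V)          # pivoted order: largest variance first
u = solve_triangular(L, (y - m)[perm], lower=True)
print(np.allclose(np.sum(u ** 2), D))       # True: the u_k^2 decompose D

# Early u_k test essentially independent points; later u_k test points
# conditionally on the earlier ones (small conditional variance, hence
# potentially stringent, but also the most sensitive to rounding error).
print(np.round(u ** 2, 2))
```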
Example
Nilson-Kuusk Model
Analysis by Leo Bastos
5 inputs, 150 training runs
100 validation runs
D = 1217.9

http://mucm.group.shef.ac.uk Slide 11
Example (cont)

Plots of PCD components against each of the 5 inputs

http://mucm.group.shef.ac.uk Slide 12
Functional outputs
 Various people have used principal
components to do dimension reduction on
functional or highly multivariate outputs
 PCD selects instead a subset of points on the
function (or individual outputs)
 Could save time if not all outputs are needed
 Or save observations when calibrating
 Helps with interpretation
 Facilitates eliciting prior knowledge

http://mucm.group.shef.ac.uk Slide 13
Design
 Given a set of candidate design points, PCD picks
ones with highest variances
 Strategy already widely advocated

[Figure: order in which the maximum-variance (PCD) rule picks design points from a grid of candidates. Four corners first. Next come centres of the sides. The next 12 points fill in a 5x5 grid. Then the centre and the four blue points.]

http://mucm.group.shef.ac.uk Slide 14
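
The "highest variance first" design rule is exactly the principal Cholesky pivot order. A sketch on a made-up 5×5 candidate grid with a squared-exponential covariance (length scale chosen arbitrarily), reusing principal_cholesky from above:

```python
import numpy as np

def sqexp_cov2(xa, xb, lengthscale=0.3):
    """Squared-exponential covariance over 2-d candidate points."""
    d2 = np.sum((xa[:, None, :] - xb[None, :, :]) ** 2, axis=-1)
    return np.exp(-d2 / lengthscale ** 2)

g = np.linspace(0.0, 1.0, 5)
cand = np.array([[a, b] for a in g for b in g])   # 5x5 grid of candidate points
K = sqexp_cov2(cand, cand)

# Each pivot is the candidate with the largest variance conditional on the
# points already chosen, i.e. a greedy maximum-variance design
perm, L, _ = principal_cholesky(K)
print(cand[perm[:9]])    # the four corners come out first, then points far from them
```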
Design (2)
 However, an alternative is to pick points that most
reduce uncertainty in remaining points
 At each step of the recursion, choose point that
maximises L11 + L21L21T/L11

[Figure: order of selection under the total-variance criterion. First we get the centre. Then 4 points around it. Next the 4 green points (notice first loss of symmetry). The next 12 points are all over the place, but still largely avoid the central area. The four blue points are a surprise!]

http://mucm.group.shef.ac.uk Slide 15
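
The same greedy loop with the alternative criterion. I read the slide's L11 + L21L21T/L11 as the total reduction in the remaining variance, i.e. Σ11 + Σ21TΣ21/Σ11 for the candidate pivot; that reading is my interpretation rather than something stated on the slide. The grid and covariance are the ones from the previous sketch:

```python
import numpy as np

S = K.copy()
chosen, remaining = [], list(range(len(cand)))
for _ in range(9):
    # score_j = S[j,j] + sum_{i != j} S[i,j]^2 / S[j,j]  (total variance removed by picking j)
    scores = np.sum(S ** 2, axis=0) / np.diag(S)
    j = int(np.argmax(scores))
    chosen.append(remaining[j])
    l = S[:, j] / np.sqrt(S[j, j])
    S = S - np.outer(l, l)                 # condition every remaining candidate on the pick
    S = np.delete(np.delete(S, j, axis=0), j, axis=1)
    del remaining[j]

print(cand[chosen])    # tends to start at the centre rather than the corners
```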
Design (3)
 Consider adding points to an existing latin hypercube
design

Total variance version again fills space less evenly, but …?

http://mucm.group.shef.ac.uk Slide 16
Conclusion(s)

I ♥ Cholesky!

http://mucm.group.shef.ac.uk Slide 17
