Cholesky Decomposition
Variance decompositions
Given a variance matrix Σ for a random vector X, there exist many matrices C such that
Σ = CCᵀ
If C is such a matrix then so is CO for any orthogonal O
Then var(C⁻¹X) = I
The elements of C⁻¹X are uncorrelated with unit variance
E.g.
Eigenvalue decomposition C = QΛ½
Q orthogonal and Λ diagonal
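As a quick illustration, a minimal numpy sketch (the matrix and sample size are arbitrary) of the eigenvalue construction, checking both Σ = CCᵀ and var(C⁻¹X) = I:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
Sigma = A @ A.T                       # an arbitrary variance matrix

# eigenvalue decomposition: Sigma = Q Lam Q^T, so C = Q Lam^(1/2)
lam, Q = np.linalg.eigh(Sigma)
C = Q * np.sqrt(lam)                  # scales column j of Q by sqrt(lam[j])
assert np.allclose(C @ C.T, Sigma)    # Sigma = C C^T

# var(C^-1 X) = I: check on a large sample
X = rng.multivariate_normal(np.zeros(4), Sigma, size=100_000)
Z = np.linalg.solve(C, X.T).T         # rows are draws of C^-1 X
print(np.cov(Z, rowvar=False).round(2))   # approximately the identity
```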
Cholesky decomposition
The Cholesky decomposition corresponds to the unique lower triangular matrix C with positive diagonal
Which I'll write as L to emphasise it's lower triangular
Partition Σ and L as
Σ = [ Σ11  Σ21ᵀ ]      L = [ L11   0  ]
    [ Σ21  Σ22  ]          [ L21  L22 ]
Then
L11L11ᵀ = Σ11, L21L11ᵀ = Σ21, L22L22ᵀ = Σ22.1 = Σ22 − Σ21Σ11⁻¹Σ21ᵀ
L⁻¹ is also lower triangular
Therefore
The first p elements of L⁻¹X decompose the variance matrix of the first p elements of X
Remaining elements decompose the residual variance of the remaining elements of X conditional on the first p
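A small numeric sketch (arbitrary 5×5 matrix, p = 2) checking the block identities and that L⁻¹ is lower triangular:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5))
Sigma = A @ A.T                       # positive definite variance matrix
L = np.linalg.cholesky(Sigma)

p = 2                                 # partition size
S11, S21, S22 = Sigma[:p, :p], Sigma[p:, :p], Sigma[p:, p:]
L11, L22 = L[:p, :p], L[p:, p:]

assert np.allclose(L11 @ L11.T, S11)                 # L11 L11^T = Sigma11
S22_1 = S22 - S21 @ np.linalg.solve(S11, S21.T)      # Sigma22.1
assert np.allclose(L22 @ L22.T, S22_1)               # L22 L22^T = Sigma22.1
assert np.allclose(np.triu(np.linalg.inv(L), 1), 0)  # L^-1 lower triangular
```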
Computing Cholesky
Recursion; take p = 1
L11 = √(Σ11) (scalar)
L21 = Σ21/L11 (column vector divided by scalar)
L22 is the Cholesky decomposition of Σ22.1
The inverse matrix is usually computed in the same recursion
Much faster and more accurate than eigenvalue computation
The method of choice for inverting positive definite symmetric matrices
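A direct transcription of this recursion (a minimal sketch, unrolled into a loop rather than literal recursion; names are illustrative):

```python
import numpy as np

def chol_recursive(Sigma):
    """Cholesky factor via the p = 1 recursion from the slide."""
    S = np.array(Sigma, dtype=float)     # working copy, becomes Sigma22.1
    n = S.shape[0]
    L = np.zeros_like(S)
    for k in range(n):
        L[k, k] = np.sqrt(S[k, k])               # L11 = sqrt(Sigma11)
        L[k+1:, k] = S[k+1:, k] / L[k, k]        # L21 = Sigma21 / L11
        # the next step recurses on Sigma22.1 = Sigma22 - L21 L21^T
        S[k+1:, k+1:] -= np.outer(L[k+1:, k], L[k+1:, k])
    return L

# agrees with the library routine
Sigma = np.array([[4.0, 2.0], [2.0, 3.0]])
assert np.allclose(chol_recursive(Sigma), np.linalg.cholesky(Sigma))
```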
Principal Cholesky
There are many Cholesky decompositions
Permuting the variables in X obviously gives a different decomposition
Principal Cholesky decomposition (PCD)
At each step in the recursion, permute to bring the largest diagonal element of Σ (or Σ22.1) to the first position
Analogous to principal components
The first Cholesky component is the element of X with largest variance
The second is a linear combination of the first and the element with the largest conditional variance given the first
And so on
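A minimal sketch of PCD following this recipe (the function name and truncation tolerance are mine; the tolerance anticipates the numerical stability slide below):

```python
import numpy as np

def principal_cholesky(Sigma, tol=0.0):
    """Principal Cholesky decomposition: at each step, pivot the largest
    diagonal element of Sigma (then of Sigma22.1) to the front.
    Returns an n x r factor L and a permutation perm such that
    Sigma[perm][:, perm] is (approximately) L @ L.T."""
    S = np.array(Sigma, dtype=float)
    n = S.shape[0]
    perm = np.arange(n)
    L = np.zeros((n, n))
    r = n
    for k in range(n):
        j = k + int(np.argmax(np.diag(S)[k:]))   # largest remaining variance
        if S[j, j] <= tol:                       # whole of Sigma22.1 small:
            r = k                                # truncate
            break
        # bring the pivot to the front: swap rows/cols of S, rows of L
        S[[k, j]] = S[[j, k]]; S[:, [k, j]] = S[:, [j, k]]
        L[[k, j]] = L[[j, k]]; perm[[k, j]] = perm[[j, k]]
        L[k, k] = np.sqrt(S[k, k])               # sqrt of the pivot
        L[k+1:, k] = S[k+1:, k] / L[k, k]
        S[k+1:, k+1:] -= np.outer(L[k+1:, k], L[k+1:, k])   # Sigma22.1
    return L[:, :r], perm
```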
Numerical stability
Important when decomposing or inverting near-singular matrices
A problem that arises routinely with Gaussian process methods
I’ve been using PCD for this purpose for about 20 years
Division by √(Σ11) is the major cause of instability
Rounding error is magnified
Rounding error can even make the pivot Σ11 negative, so that √(Σ11) fails
PCD postpones this problem to the late stages of the recursion
The whole of Σ22.1 is then small, and we can truncate
Analogous to not using all the principal components
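To see the truncation in action, here is the earlier principal_cholesky sketch applied to the kind of near-singular matrix a Gaussian process routinely produces (the kernel, spacing and tolerance are all illustrative):

```python
import numpy as np

# squared-exponential covariance over densely packed points:
# numerically near-singular
x = np.linspace(0.0, 1.0, 50)
Sigma = np.exp(-0.5 * ((x[:, None] - x[None, :]) / 0.5) ** 2)

L, perm = principal_cholesky(Sigma, tol=1e-12)   # sketch from the PCD slide
print(L.shape[1])                                # retained rank, well below 50
Sp = Sigma[np.ix_(perm, perm)]
print(np.abs(Sp - L @ L.T).max())                # truncation error is tiny
```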
Fitting emulators
A key step in the fitting process is inverting the matrix A of covariances between the design points
Typically needs to be done many times
Problems arise when points are close together relative to correlation length parameters
A becomes ill-conditioned
E.g. points on trajectory of a dynamic model
Or in Herbie’s optimisation
Validating emulators
Having fitted the emulator, we test its predictions against new model runs
We can look at standardised residuals for individual predicted points
But these are correlated
Mahalanobis distance to test the whole set
D = (y − m)ᵀV⁻¹(y − m)
where y is the new data, m is the mean vector and V is the predictive variance matrix
Approximately χ² with df equal to the number of new points
Any decomposition of V decomposes D
Eigenvalues useful but may be hard to interpret
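Computed via a Cholesky factor of V, D falls out as a sum of squared, uncorrelated components; a minimal sketch (the function name is mine):

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular

def mahalanobis_components(y, m, V):
    """D = (y - m)^T V^-1 (y - m) via the Cholesky factor of V.
    z holds uncorrelated standardised components and D = sum(z**2),
    so the decomposition of V decomposes D."""
    L = cholesky(V, lower=True)
    z = solve_triangular(L, y - m, lower=True)
    return z, float(z @ z)
```

With PCD in place of the plain Cholesky, the components z become the conditionally standardised residuals discussed on the next slide.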
Validating emulators (2)
PCD keeps the focus on individual points, but conditions them on other points
Initially picks up essentially independent points
Interest is in the later points, where variance is reduced conditional on other points
In principle, points with the smallest conditional variances may provide the most stringent tests
But these may be badly affected by rounding error
Example
Nilson-Kuusk Model
Analysis by Leo Bastos
5 inputs, 150 training runs
100 validation runs
D = 1217.9
Example (cont)
[Figure: plots of the PCD components against each of the 5 inputs]
Functional outputs
Various people have used principal components to do dimension reduction on functional or highly multivariate outputs
PCD instead selects a subset of points on the function (or of individual outputs)
Could save time if not all outputs are needed
Or save observations when calibrating
Helps with interpretation
Facilitates eliciting prior knowledge
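A minimal illustration of selecting outputs this way, reusing the principal_cholesky sketch from the PCD slide (the covariance and tolerance are invented): the pivot order directly names the outputs to keep.

```python
import numpy as np

# illustrative covariance over 200 points of a functional output
t = np.linspace(0.0, 1.0, 200)
V = np.exp(-0.5 * ((t[:, None] - t[None, :]) / 0.2) ** 2)

L, perm = principal_cholesky(V, tol=1e-8)   # sketch from the PCD slide
keep = perm[:L.shape[1]]                    # the subset of outputs retained
print(len(keep), np.sort(keep)[:10])
```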
Design
Given a set of candidate design points, PCD picks the ones with highest variances
Strategy already widely advocated
[Figure: the order in which PCD picks points from a 5×5 grid of candidates. The four corners come first, next the centres of the sides, then fill in the centre; the next 12 points and finally the four blue points complete the 5×5 grid.]
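A sketch of this on the 5×5 candidate grid of the figure (the kernel and correlation length are invented), again reusing the principal_cholesky sketch from the PCD slide:

```python
import numpy as np

# 5x5 grid of candidate design points with a squared-exponential covariance
g = np.linspace(0.0, 1.0, 5)
X = np.array([(a, b) for a in g for b in g])
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
Sigma = np.exp(-0.5 * d2 / 0.3 ** 2)

L, perm = principal_cholesky(Sigma, tol=1e-10)  # sketch from the PCD slide
print(perm[:9])   # early picks have the largest (conditional) variances
```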
Design (2)
However, an alternative is to pick points that most reduce uncertainty in the remaining points
At each step of the recursion, choose the point that maximises L11² + L21ᵀL21, i.e. Σ11 + Σ21ᵀΣ21/Σ11, the total variance removed by conditioning on that point (sketched below)
[Figure: the order in which the total-variance criterion picks points. First we get the centre. Then 4 points around it. Next the 4 green points (notice the first loss of symmetry). The next 12 points are all over the place, but still largely avoid the central area. The four blue points are a surprise!]
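A sketch of this alternative criterion (names are mine): each step conditions on the point that removes the most total variance from the rest, which is exactly Σ11 + Σ21ᵀΣ21/Σ11 for the candidate brought to the front.

```python
import numpy as np

def total_variance_select(Sigma, k):
    """Greedy design: each step picks the point whose conditioning
    removes the most total variance from the remaining points."""
    S = np.array(Sigma, dtype=float)   # residual covariance, updated in place
    chosen = []
    for _ in range(k):
        d = np.diag(S).copy()
        # variance removed by conditioning on j: sum_i S[i, j]^2 / S[j, j]
        score = (S ** 2).sum(axis=0) / np.where(d > 1e-12, d, np.inf)
        j = int(np.argmax(score))
        chosen.append(j)
        S -= np.outer(S[:, j], S[:, j]) / S[j, j]   # condition on point j
    return chosen
```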
Design (3)
Consider adding points to an existing Latin hypercube design
[Figure: the total variance version again fills space less evenly, but …?]
Conclusion(s)
I ♥ Cholesky!