Lecture 7
Iterative solvers and optimization
Conjugate gradients
Preconditioning
Review material can be found in Ch. 6 of Aster et al. (2005) and Section
2.3 of Rawlinson and Sambridge (2003).
Fully nonlinear inversion
and parameter search
Recap: linear and nonlinear inverse problems
Linear problems
Single minimum
Gradient methods work
Quadratic convergence
Many unknowns
d = Gm
φ(d, m) = (d − Gm)^T C_D^{−1} (d − Gm) + μ (m − m_o)^T C_M^{−1} (m − m_o)
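Because this linear objective is quadratic in m, it has a single minimum that can be written in closed form, which is why gradient methods converge so well here. A minimal sketch, assuming a small dense problem (the function name and toy numbers are illustrative, not from the lecture):

import numpy as np

def damped_least_squares(G, d, Cd_inv, Cm_inv, m0, mu):
    # phi(m) = (d - G m)^T Cd^-1 (d - G m) + mu (m - m0)^T Cm^-1 (m - m0)
    # Setting the gradient to zero gives the single (quadratic) minimum:
    #   (G^T Cd^-1 G + mu Cm^-1)(m - m0) = G^T Cd^-1 (d - G m0)
    A = G.T @ Cd_inv @ G + mu * Cm_inv      # Hessian: symmetric positive definite
    b = G.T @ Cd_inv @ (d - G @ m0)
    return m0 + np.linalg.solve(A, b)

# Toy example: 3 data, 2 unknowns (numbers are illustrative only)
G = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]])
d = np.array([1.0, 2.0, 1.9])
m_hat = damped_least_squares(G, d, np.eye(3), np.eye(2), np.zeros(2), mu=0.1)
print(m_hat)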
Recap: linear and nonlinear inverse problems
φ(d, m) = (δd − G δm)^T C_D^{−1} (δd − G δm) + μ (m − m_o)^T C_M^{−1} (m − m_o)
Recap: linear and nonlinear inverse problems
Example: nonlinear inverse problems
Courtesy P. King
Monte Carlo methods
Monte Carlo methods
…but did someone think of it earlier ?
S = k ln W
Monte Carlo methods
…but Monte Carlo solutions had been around for even longer
Buffon’s needle problem
h/n = (2 / (tπ)) { l − √(l² − t²) − t sin⁻¹(t/l) } + 1
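The expected hit fraction above can also be obtained by direct Monte Carlo simulation; the classic demonstration uses the short-needle case (l ≤ t), where the crossing probability 2l/(πt) yields an estimate of π. A minimal sketch (the function name, seed and parameters are illustrative assumptions):

import numpy as np

rng = np.random.default_rng(0)

def buffon_pi(n, l=1.0, t=2.0):
    # Short-needle case (l <= t): the needle crosses a line when the distance
    # from its centre to the nearest line is below (l/2) sin(theta).
    # P(cross) = 2 l / (pi t), so pi is estimated as 2 l n / (t h).
    x = rng.uniform(0.0, t / 2.0, n)            # centre-to-line distance
    theta = rng.uniform(0.0, np.pi / 2.0, n)    # needle orientation
    h = np.sum(x <= (l / 2.0) * np.sin(theta))  # number of hits
    return 2.0 * l * n / (t * h)

print(buffon_pi(1_000_000))   # close to 3.14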
Direct search
Direct search methods
Simulated Annealing
(Thermodynamic analogy)
Genetic/evolutionary algorithms
(Biological analogy)
Neighbourhood algorithm
Uniform random search
For M unknowns we have an M-dimensional parameter space. The volume of a cube with side L is V = L^M.
The curse of dimensionality always gets you in the end!
Fraction of the total volume lying inside an inner cube of side a:

  M     a = L/2       a = 0.9L
  1     50%           90%
  2     25%           81%
  5     3%            59%
 10     0.1%          35%
 20     1/10000 %     12%
 50     10^−15 %      0.5%

The proportion of volume in the outside shell always dominates over the interior: all the volume tends to be in the exterior shell as M increases.
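The table entries follow directly from the fraction (a/L)^M; a few lines of code (a sketch, not part of the slides) reproduce them and show how fast the interior volume vanishes:

# Fraction of an M-dimensional cube of side L occupied by an inner cube of side a
for M in [1, 2, 5, 10, 20, 50]:
    print(f"M={M:3d}   a=L/2: {0.5**M:.2e}   a=0.9L: {0.9**M:.2e}")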
Example: Uniform random search
Press (1968)
Nested uniform search
(Figure: search over S East (s/km) and S North (s/km).)
φ(m) = ½ (d − g(m))^T C_D^{−1} (d − g(m)) + (μ/2) (m − m_o)^T C_M^{−1} (m − m_o)
Global optimization: Simulated Annealing
σ(m, T) = e^{−φ(m)/T} = [e^{−φ(m)}]^{1/T}
σ(m, T) is a probability density function for m at temperature T. In Simulated Annealing we associate the energy φ(m) with the negative log-likelihood, or the objective function of the inverse problem to be minimized, e.g.
φ(m) = ½ (d − g(m))^T C_D^{−1} (d − g(m)) + (μ/2) (m − m_o)^T C_M^{−1} (m − m_o)
The minimum of φ(m) corresponds to the maximum of the PDF σ(m, T).
(A quadratic φ(m) and the corresponding Gaussian σ(m, T).)
σ(m, T) = e^{−φ(m)/T}
If the system is cooled too quickly then we get trapped in a local minimum. If it is cooled too slowly then we waste a lot of energy (forward) evaluations. The optimum annealing schedule will depend on the complexity of the energy function φ(m).
Global optimization: Simulated Annealing
Global optimization using a heat bath
But what role does T play ?
(Figure: σ(m, T) = e^{−φ(m)/T} plotted against φ(m) for T = 1000, 100, 10, 1, 0.1 and 0.01.)
Global optimization: Simulated Annealing
Δφ = φ(m_new) − φ(m_cur)

Accept the new model if the fit improves: if Δφ < 0, then m_cur = m_new.
Accept the new model with probability p = e^{−Δφ/T} if the fit is worse.
(Figure: probability of accepting the new model, P(accepted model).)
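The two rules above are the Metropolis acceptance criterion. A minimal sketch of how they combine with a cooling schedule (the geometric schedule, step size and test function are illustrative assumptions, not the lecture's example):

import numpy as np

rng = np.random.default_rng(0)

def simulated_annealing(phi, m0, step=0.1, T0=10.0, alpha=0.99, n_iter=5000):
    # phi: objective (energy) to minimize; T is lowered geometrically each step.
    m_cur, phi_cur, T = m0.copy(), phi(m0), T0
    for _ in range(n_iter):
        m_new = m_cur + step * rng.standard_normal(m_cur.shape)   # random perturbation
        d_phi = phi(m_new) - phi_cur
        # accept if the fit improves, or with probability exp(-d_phi/T) if it worsens
        if d_phi < 0 or rng.random() < np.exp(-d_phi / T):
            m_cur, phi_cur = m_new, phi_cur + d_phi
        T *= alpha   # too fast -> trapped in a local minimum; too slow -> wasted evaluations
    return m_cur, phi_cur

# Toy multi-minimum objective in 2-D
phi = lambda m: np.sum(m**2) + 2.0 * np.sin(5.0 * m).sum()
print(simulated_annealing(phi, np.array([3.0, -2.0])))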
Simulated Annealing Example: TSP
In a famous paper Kirkpatrick et al. (1983) showed how simulated
annealing could be used to solve a difficult combinatorial optimization
problem known as the Travelling salesman problem.
Recap: Simulated Annealing
Global optimization: Genetic algorithms
(Figure: human genotypes and the corresponding phenotypes of Albert and Sharon.)
Genetic algorithms: bit encoding
Here the actual binary coding would make sense: 0 would mean sand and 1 would
represent gold. What if we want to describe a function taking on arbitrary values?
Example:
layer thickness d and velocity c. Then a model vector m would look like:
… and could simply be described by a long bit-string. It is your choice how many
bits you use for the possible range of values for each parameter.
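One simple way to build that bit-string is to quantize each parameter over a chosen range with a chosen number of bits; the ranges and bit counts below are arbitrary illustrative choices, not values from the lecture:

def encode(value, vmin, vmax, nbits):
    # Map a real value in [vmin, vmax] onto an nbits-long bit string
    levels = 2**nbits - 1
    k = round((value - vmin) / (vmax - vmin) * levels)
    return format(k, f"0{nbits}b")

def decode(bits, vmin, vmax):
    levels = 2**len(bits) - 1
    return vmin + int(bits, 2) / levels * (vmax - vmin)

# Hypothetical example: thickness d in [0, 50] km, velocity c in [2, 8] km/s, 8 bits each
d, c = 12.5, 3.4
m_bits = encode(d, 0.0, 50.0, 8) + encode(c, 2.0, 8.0, 8)   # one long bit-string
print(m_bits, decode(m_bits[:8], 0.0, 50.0), decode(m_bits[8:], 2.0, 8.0))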
Genetic algorithms: Selection
(Figure: model fitness used for selection.)
Genetic algorithms: Crossover
Randomly choose a common point on the strings to cut and swap over.
(Figure: the strings and their fitness before and after the crossover.)
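A minimal sketch of that cut-and-swap operation on two bit-strings (the function name and example strings are illustrative):

import random

def crossover(parent_a, parent_b):
    # Cut both strings at a common random point and swap the tails
    point = random.randint(1, len(parent_a) - 1)
    return (parent_a[:point] + parent_b[point:],
            parent_b[:point] + parent_a[point:])

print(crossover("1100110011001100", "0000111100001111"))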
Genetic algorithms: Iterations
Example showing a GA
operating on the peaks
function above
Ad hoc choices: any chosen mapping can be arbitrary and may not work for all ranges of data fit.
Genetic algorithms: features
Neighbourhood Algorithm
Parameter search: Neighbourhood algorithm
Parameter search: Neighbourhood algorithm
A conceptually simple method for adaptive sampling of multi-dimensional parameter spaces.
n = 3, m = 1
Repeat the process iteratively, each time updating the Voronoi cells and generating n samples from a random walk inside m neighbourhoods.
Parameter search: Neighbourhood algorithm
How is it implemented ?
σ(m) = 1/V_i if m is inside cell i, and 0 otherwise
First we sample from the conditional PDF along the x_1 axis. We must solve for the intersection points between the 1-D axis and the edges of each Voronoi cell.
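A hedged sketch of the core operation of drawing a new sample uniformly inside one Voronoi cell. For brevity it uses simple rejection sampling rather than the axis-intersection calculation described above (which is what makes the real algorithm efficient); the names and the toy misfit are illustrative assumptions:

import numpy as np

rng = np.random.default_rng(0)

def sample_in_cell(models, i_cell, lo, hi, max_tries=10_000):
    # Propose uniform points within the bounds and keep the first one whose
    # nearest previous model is models[i_cell], i.e. a point inside cell i.
    for _ in range(max_tries):
        x = rng.uniform(lo, hi)
        dists = np.linalg.norm(models - x, axis=1)
        if np.argmin(dists) == i_cell:
            return x
    raise RuntimeError("no sample accepted; the cell may be very small")

# Toy 2-D example: resample inside the cell of the best-fitting model so far
models = rng.uniform(0.0, 1.0, size=(20, 2))
misfit = np.sum((models - 0.3)**2, axis=1)   # stand-in objective
best = int(np.argmin(misfit))
print(sample_in_cell(models, best, np.zeros(2), np.ones(2)))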
Parameter search: Neighbourhood algorithm
Example: Neighbourhood algorithm
Examples: Neighbourhood algorithm
Examples: direct search
A comparison of techniques
Neighbourhood algorithm
What to do with all the samples generated by a global
search algorithm ?
The appraisal problem
Mapping out the region of acceptable fit
Global search: exploration vs exploitation
Global search: Parallelisation