SMSTC Lecture Notes: Lecture 4
INVERSE PROBLEMS
Lecture 4: SVD. Revision of probability and statistics
Anya Kirpichnikova, University of Stirling ([email protected])
www.smstc.ac.uk
Contents
4.1 Solution of the mixed-determined problem
4.1.1 Natural solution
4.1.2 SVD: what we want
4.1.3 SVD: results
4.1.4 Is that what we wanted?
4.1.5 Natural generalised inverse
4.1.6 SVD: Derivation
4.1.7 SVD: Examples
4.2 Probability Revision
4.2.1 Random variables, p.d.f., c.d.f., expected value
4.2.2 Other ways to describe R.V.
4.2.3 Correlated data
4.3 Functions of R.V.: how are p(d) and p(m) related?
4.3.1 Example 1D
4.3.2 Example 1D nonlinear
4.3.3 General case
4.3.4 General linear case: important conversion formulae
4.3.5 Example 2D
4.3.6 Example: how useful Eq 4.2 can be
4.3.7 Example of uncorrelated data with uniform variance
4.4 Gaussian p.d.f.
4.4.1 Univariate Gaussian
4.4.2 Multivariate Gaussian
4.4.3 K(m) = d case
4.1 Solution of the mixed-determined problem

4.1.1 Natural solution

Split the model into a part m_p that is determined by the data and a part m_0 that lies in the null space of K, m = m_p + m_0, and split the data correspondingly into d = d_p + d_0. Then

K(m_p + m_0) = d_p + d_0
a [email protected]
4–1
and the solution length is

L = m^T m = [m_p + m_0]^T [m_p + m_0] = m_p^T m_p + m_p^T m_0 + m_0^T m_p + m_0^T m_0 = m_p^T m_p + m_0^T m_0,

since m_0 is orthogonal to m_p. On the right-hand side, m_p^T m_p is determined by d, and m_0^T m_0 is determined by a priori information. The total overall error is then

E = [d_p + d_0 − K m_p]^T [d_p + d_0 − K m_p]

(using K m_0 = 0); simplifying further, with d_0 orthogonal to d_p, we have

E = [d_p − K m_p]^T [d_p − K m_p] + d_0^T d_0.
• S_p(m) contains m_p, i.e. linear combinations of the m_k that can be determined by Km = d;
• S_0(m) contains m_0, linear combinations of the m_k that are unilluminated by Km = d; note K m_0 = 0;
• S(m) = S_p(m) ⊕ S_0(m);
• S_p(d) is the subspace of S(d) that is spanned by Km.
4.1.2 SVD: what we want

We look for a factorisation of K of the form

K = U Λ V^T    (4.1)

where

• U is an N × N orthogonal matrix: U U^T = U^T U = I_N;
• V is an M × M orthogonal matrix: V V^T = V^T V = I_M;
• Λ is an N × M diagonal matrix whose diagonal elements are called singular values; they are usually arranged in Λ in decreasing size, so that Λ can be partitioned into a p × p diagonal sub-matrix Λ_p of the p non-zero singular values and several zero matrices:

Λ = ( Λ_p  0 )
    (  0   0 )
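A minimal numerical sketch (assuming Python with numpy; the matrix K below is an arbitrary illustration, not one used elsewhere in these notes) of computing the factorisation (4.1) and partitioning out Λ_p:

import numpy as np

K = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [0.0, 0.0]])              # illustrative N x M matrix, N = 3, M = 2

U, s, Vt = np.linalg.svd(K)             # s holds the singular values, largest first
tol = max(K.shape) * np.finfo(float).eps * s.max()
p = int(np.sum(s > tol))                # number of non-zero singular values

Up = U[:, :p]                           # first p columns of U
Vp = Vt[:p, :].T                        # first p columns of V
Lp = np.diag(s[:p])                     # the p x p sub-matrix Lambda_p

# K is recovered from the partitioned factors alone
assert np.allclose(K, Up @ Lp @ Vp.T)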
• The natural solution to the inverse problem contains no component in S_0(m), and its prediction error e has no component in S_p(d).
• m^est = V_p Λ_p^{-1} U_p^T d has no component in S_0(m) (show that as an exercise).
• Show that e = d − K m^est has no component in S_p(d) (prove as an exercise).

As V_p^T V_p = U_p^T U_p = Λ_p Λ_p^{-1} = I_p, m^est is the natural solution.
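A sketch of the natural solution in code (numpy assumed; the rank-deficient K and the data d are made-up illustrations). It also checks the standard fact that the natural solution coincides with the Moore-Penrose pseudoinverse applied to d:

import numpy as np

K = np.array([[1.0, 1.0],
              [1.0, 1.0]])             # rank-deficient: p = 1 < M = 2
d = np.array([2.0, 2.0])

U, s, Vt = np.linalg.svd(K)
p = int(np.sum(s > 1e-12))
m_est = Vt[:p, :].T @ np.diag(1.0 / s[:p]) @ U[:, :p].T @ d

# agrees with the Moore-Penrose pseudoinverse of K
assert np.allclose(m_est, np.linalg.pinv(K) @ d)

# no component in S_0(m): projections onto the null-space directions
# (the remaining rows of Vt) vanish
assert np.allclose(Vt[p:, :] @ m_est, 0.0)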
The model resolution matrix of the natural generalised inverse is R = V_p V_p^T, which means that model parameters will be perfectly resolved only if V_p spans the complete S(m), i.e. p = M.

The natural generalised inverse has the following data resolution matrix:

N = U_p U_p^T.
4.1.6 SVD: Derivation

Step 1

Form the (N + M) × (N + M) symmetric matrix

S = (  0   K )
    ( K^T  0 )

and consider its eigenvalue problem

S w^(i) = λ_i w^(i).

Step 2

Writing the eigenvector as w^(i) = [u^(i), v^(i)]^T, where u^(i) has length N and v^(i) has length M:

S w^(i) = λ_i w^(i)  ⇒  (  0   K ) ( u^(i) ) = λ_i ( u^(i) )
                        ( K^T  0 ) ( v^(i) )       ( v^(i) )

therefore

K v^(i) = λ_i u^(i)   and   K^T u^(i) = λ_i v^(i).
Step 3

Assume there is a positive eigenvalue λ_i > 0 with eigenvector [u^(i), v^(i)]^T. Then −λ_i < 0 is also an eigenvalue, with eigenvector [−u^(i), v^(i)]^T: flipping the sign of the u-part reverses the sign in both relations above.

If there exist p positive eigenvalues, then there are also p negative ones, and hence N + M − 2p zero eigenvalues.
Then

K v^(i) = λ_i u^(i)  ⇒  K^T K v^(i) = K^T [λ_i u^(i)] = λ_i [K^T u^(i)] = λ_i [λ_i v^(i)] = λ_i² v^(i),

so v^(i) is an eigenvector of K^T K with eigenvalue λ_i². Similarly,

K^T u^(i) = λ_i v^(i)  ⇒  K K^T u^(i) = K [λ_i v^(i)] = λ_i [K v^(i)] = λ_i [λ_i u^(i)] = λ_i² u^(i),

so u^(i) is an eigenvector of K K^T with eigenvalue λ_i².
Step 4

Collecting the relations K v^(i) = λ_i u^(i) over all i into matrix form gives

K V = U Λ.

Step 5

Post-multiplying by V^T and using V V^T = I_M,

K = U Λ V^T

is the required SVD of K.
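This construction can be checked numerically (numpy assumed; K is again a made-up illustration): the eigenvalues of S are ±λ_i together with N + M − 2p zeros.

import numpy as np

K = np.array([[2.0, 0.0],
              [1.0, 1.0],
              [0.0, 3.0]])                  # N = 3, M = 2
N, M = K.shape

S = np.block([[np.zeros((N, N)), K],
              [K.T, np.zeros((M, M))]])     # symmetric (N+M) x (N+M) matrix

evals = np.sort(np.linalg.eigvalsh(S))
svals = np.linalg.svd(K, compute_uv=False)  # singular values of K (all non-zero here)

# eigenvalues of S: {+lambda_i, -lambda_i} plus N + M - 2p zeros
expected = np.sort(np.concatenate([svals, -svals,
                                   np.zeros(N + M - 2 * len(svals))]))
assert np.allclose(evals, expected)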
4.1.7 SVD: Examples

Step 1: Generate U

The eigenvectors [x, y, z, w]^T of K K^T satisfy the following equations, together with normalisation:

For λ_1:   x² + y² = 1,   z = w = 0,   y = (1/14)(−5 + √221) x
           ⇒ x = −0.817416, y = −0.576048, z = 0, w = 0.

For λ_2:   x² + y² = 1,   z = w = 0,   y = (1/14)(−5 − √221) x
           ⇒ x = 0.576048, y = −0.817416, z = 0, w = 0.

Since eigenvectors are determined only up to an overall sign, we may fix the signs so that

U = ( 0.8174  −0.5760  0  0 )
    ( 0.5760   0.8174  0  0 )
    ( 0        0       1  0 )
    ( 0        0       0  1 )
Step 2: Generate V

Similarly to Step 1:

K^T K = (  5  11 )   ⇒   λ_1 = 29.8661,  λ_2 = 0.1339,
        ( 11  25 )

and the matrix made of the corresponding eigenvectors is

V = ( 0.4046  −0.9145 )
    ( 0.9145   0.4046 )
Step 3: Generate Λ

The singular values are the square roots of the common non-zero eigenvalues of K^T K and K K^T, arranged in decreasing order, so

Λ = ( 5.4650  0      )
    ( 0       0.3659 )
    ( 0       0      )
    ( 0       0      )
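The example's matrix K itself is not shown in this excerpt. One 4 × 2 matrix consistent with the stated K^T K (and, up to signs, with U and V above) is used below; treat it as an assumption. With it, Steps 1-3 can be reproduced numerically:

import numpy as np

# assumed example matrix, chosen so that K^T K = [[5, 11], [11, 25]]
K = np.array([[2.0, 4.0],
              [1.0, 3.0],
              [0.0, 0.0],
              [0.0, 0.0]])
assert np.allclose(K.T @ K, [[5.0, 11.0], [11.0, 25.0]])

U, s, Vt = np.linalg.svd(K)
print(np.round(s**2, 4))   # eigenvalues of K^T K: [29.8661  0.1339]
print(np.round(s, 4))      # singular values:      [5.465   0.3659]
print(np.round(U, 4))      # columns match Step 1 up to sign
print(np.round(Vt.T, 4))   # columns match Step 2 up to sign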
4.2 Probability Revision

4.2.1 Random variables, p.d.f., c.d.f., expected value

A measurement d is treated as a random variable (R.V.) described by a probability density function (p.d.f.) p(d), as we assume the measurement takes its value from (−∞, ∞) or [d_min, d_max]. The probability of the R.V. d taking a value in the interval (d_1, d_2) is then

P(d_1, d_2) = ∫_{d_1}^{d_2} p(d) dd.
4.2.2 Other ways to describe R.V.

• width of a distribution (a wide distribution means noisy data; a narrow one means relatively noise-free data).

To measure the width, we multiply the distribution by a function which is zero exactly at the peak (mean) of the distribution and grows fast in its vicinity, say a parabola (d − ⟨d⟩)² = (d − µ(d))². This leads to the variance σ²:

σ² = ∫_{−∞}^{∞} (d − ⟨d⟩)² p(d) dd.
We can also calculate the mean and variance from N realisations d_i of the data and get the sample mean and sample variance

⟨d⟩^est = (1/N) ∑_{i=1}^{N} d_i,    (σ²)^est = (1/(N−1)) ∑_{i=1}^{N} (d_i − ⟨d⟩^est)².
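In code (numpy assumed; the realisations are made up), the two estimators read:

import numpy as np

d = np.array([1.9, 2.1, 2.0, 1.8, 2.2])          # N = 5 hypothetical realisations
N = len(d)

mean_est = d.sum() / N                           # <d>^est
var_est = ((d - mean_est) ** 2).sum() / (N - 1)  # (sigma^2)^est, note the 1/(N-1)

# numpy equivalents
assert np.isclose(mean_est, np.mean(d))
assert np.isclose(var_est, np.var(d, ddof=1))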
For a multivariate p.d.f. p(d) of the data vector d, the univariate (marginal) distribution of d_i is obtained by integrating over the other N − 1 variables d_j, j ≠ i:

p(d_i) = ∫_{−∞}^{∞} ⋯ ∫_{−∞}^{∞} p(d) dd_j dd_k … dd_s,    j, k, …, s ≠ i.
4.2.3 Correlated data

The covariance of d_1 and d_2 is

cov(d_1, d_2) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} [d_1 − ⟨d_1⟩][d_2 − ⟨d_2⟩] p(d_1, d_2) dd_1 dd_2,

where the brackets [d_1 − ⟨d_1⟩] and [d_2 − ⟨d_2⟩] check how much of the data lie on the same/opposite sides of their means.

For a data vector d, the individual mean of d_i is calculated by taking N integrals:

⟨d_i⟩ = ∫_{−∞}^{∞} ⋯ ∫_{−∞}^{∞} d_i p(d) dd_1 dd_2 … dd_N = ∫_{−∞}^{∞} d_i p(d_i) dd_i.
4.3 Functions of R.V.: how are p(d) and p(m) related?

4.3.1 Example 1D

We consider one datum d and one model parameter m, related by

m(d) = 2d.

Suppose p(d) is uniform on (0, 1): the p.d.f. is constant, and the area under the curve should total one, so p(d) = 1. The p.d.f. of m is also uniform, but on the interval (0, 2), so since the total area should again be one, p(m) = 1/2. Thus p(m) is not merely p(d(m)); it also accounts for the stretching (or shrinking) of the m-axis with respect to the d-axis:

1 = ∫_{d_min}^{d_max} p(d) dd = ∫_{m_min}^{m_max} p[d(m)] |dd/dm| dm = ∫_{m_min}^{m_max} p(m) dm.

Here m = 2d ⇒ d = m/2, and the stretching factor is |dd/dm|; the absolute value accommodates the case when the transformation reverses the limits of integration, which would change the direction of integration. In our example |dd/dm| = 1/2.
4.3.2 Example 1D nonlinear

If instead, say, m(d) = d² with the same uniform p(d) = 1 on (0, 1), then d = √m and |dd/dm| = 1/(2√m), so

p(m) = 1/(2√m),   0 < m < 1,

which still integrates to one but is no longer constant: the nonlinear transformation concentrates probability near m = 0.
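Both 1D examples can be checked by Monte Carlo (numpy assumed): transform uniform samples and compare histograms against the analytic results (for the nonlinear case, against bin probabilities from the c.d.f. P(m < x) = √x).

import numpy as np

rng = np.random.default_rng(0)
d = rng.uniform(0.0, 1.0, 200_000)       # samples with p(d) = 1 on (0, 1)

# linear case m = 2d: p(m) = 1/2 on (0, 2)
m_lin = 2.0 * d
dens, _ = np.histogram(m_lin, bins=20, range=(0.0, 2.0), density=True)
assert np.allclose(dens, 0.5, atol=0.02)

# nonlinear case m = d^2: p(m) = 1/(2 sqrt(m)) on (0, 1), c.d.f. sqrt(m)
m_sq = d ** 2
counts, edges = np.histogram(m_sq, bins=20, range=(0.0, 1.0))
expected = (np.sqrt(edges[1:]) - np.sqrt(edges[:-1])) * len(d)
assert np.allclose(counts, expected, rtol=0.1)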
4.3.4 General linear case: important conversion formulae

For a linear function m = Md of the R.V. d, the means convert as ⟨m⟩ = M⟨d⟩.

Proof.

⟨m_i⟩ = ∫ m_i p(m) dm_1 … dm_N = ∫ ∑_j M_ij d_j p[d(m)] |det(∂d/∂m)| |det(∂m/∂d)| dd_1 … dd_N =

      = ∑_j M_ij ∫ d_j p(d) dd_1 … dd_N = ∑_j M_ij ⟨d_j⟩,

where p(m) = p[d(m)] |det(∂d/∂m)| and dm_1 … dm_N = |det(∂m/∂d)| dd_1 … dd_N, so the two Jacobian determinants cancel.
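A quick Monte Carlo check of the conversion formulae (numpy assumed), using the transformation of the 2D example below: means obey ⟨m⟩ = M⟨d⟩ and covariances obey [cov m] = M [cov d] M^T (the covariance rule restated in 4.4.2).

import numpy as np

rng = np.random.default_rng(1)
d = rng.uniform(0.0, 1.0, size=(1_000_000, 2))  # <d> = (0.5, 0.5), [cov d] = I/12

M = np.array([[1.0, 1.0],
              [1.0, -1.0]])
m = d @ M.T                                     # m1 = d1 + d2, m2 = d1 - d2

print(m.mean(axis=0), M @ d.mean(axis=0))       # both approx [1, 0]
print(np.cov(m.T))                              # approx diag(1/6, 1/6)
print(M @ np.cov(d.T) @ M.T)                    # same, via the conversion formula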
4.3.5 Example 2D

Consider two p.d.f.s for the R.V. d_1 and d_2, both uniform on (0, 1). Consider R.V. such that

m_1 = d_1 + d_2
m_2 = d_1 − d_2

To have unit total probability, p(d) = 1 (the support is a square with side one), and

M = ( 1   1 ),   |det M| = 2,   J = 1/2   ⇒   p(m) = p(d) J = 1/2.
    ( 1  −1 )
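The same example checked by sampling (numpy assumed): the transformed points fill a rotated square of area 2 with constant density 1/2.

import numpy as np

rng = np.random.default_rng(2)
d = rng.uniform(0.0, 1.0, size=(500_000, 2))

M = np.array([[1.0, 1.0],
              [1.0, -1.0]])
m = d @ M.T                              # m1 = d1 + d2, m2 = d1 - d2

# density estimate from a small box well inside the support, centred at (1, 0)
box = (np.abs(m[:, 0] - 1.0) < 0.25) & (np.abs(m[:, 1]) < 0.25)
print(box.mean() / (0.5 * 0.5))          # approx 0.5, the predicted p(m)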
4.4 Gaussian p.d.f.

Importance: the Gaussian is the limiting p.d.f. for sums of R.V. (central limit theorem), i.e. as long as the noise in the data comes from several sources of comparable size, it will tend to follow a Gaussian p.d.f.

4.4.2 Multivariate Gaussian

When m = Md and p(d) is the Gaussian of Eq (4.3), p(m) is also Gaussian, with mean µ(m) = Mµ(d) and covariance [cov m] = M [cov d] M^T: all linear functions of Gaussian R.V. are again Gaussian.
Given a Gaussian p_A(d) with mean µ(d_A) and covariance [cov d]_A, and a Gaussian p_B(d) with mean µ(d_B) and covariance [cov d]_B, their product p_C(d) ∝ p_A(d) p_B(d) is again Gaussian, with mean

µ(d_C) = [cov d_C] ( [cov d_A]^{-1} µ(d_A) + [cov d_B]^{-1} µ(d_B) )

and variance

[cov d_C]^{-1} = [cov d_A]^{-1} + [cov d_B]^{-1}.
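In code (numpy assumed; the means and covariances are made up), the combination rule can be verified directly: the log of the product differs from the log of the combined Gaussian by a constant independent of the evaluation point.

import numpy as np

cov_A = np.array([[2.0, 0.5], [0.5, 1.0]]); mu_A = np.array([1.0, 0.0])
cov_B = np.array([[1.0, 0.0], [0.0, 3.0]]); mu_B = np.array([0.0, 2.0])

# inverse covariances add; means combine weighted by inverse covariances
cov_C = np.linalg.inv(np.linalg.inv(cov_A) + np.linalg.inv(cov_B))
mu_C = cov_C @ (np.linalg.solve(cov_A, mu_A) + np.linalg.solve(cov_B, mu_B))

def log_quad(x, mu, cov):
    """Quadratic part of a Gaussian log-density (normalisation dropped)."""
    r = x - mu
    return -0.5 * r @ np.linalg.solve(cov, r)

xs = [np.array([0.3, 0.7]), np.array([-1.2, 2.5])]
consts = [log_quad(x, mu_A, cov_A) + log_quad(x, mu_B, cov_B)
          - log_quad(x, mu_C, cov_C) for x in xs]
assert np.isclose(consts[0], consts[1])   # difference is constant in x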
4.4.3 K(m) = d case

If the theory Km = d holds in the mean, i.e.

K(m) = µ(d),

then

p(d) = 1 / ( (2π)^{N/2} (det [cov d])^{1/2} ) exp( −(1/2) [d − K(m)]^T [cov d]^{-1} [d − K(m)] ).

Here Km must not be a function of the R.V.: K is the set of quantities that define the shape of the data p.d.f., and if auxiliary variables in K are random, they should be treated as part of the data.
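A direct evaluation of this likelihood (numpy assumed; K, m and [cov d] below are made-up illustrations):

import numpy as np

def gaussian_likelihood(d, Km, cov):
    """p(d) for data d with mean K(m) = Km and covariance cov."""
    N = len(d)
    r = d - Km
    norm = (2 * np.pi) ** (-N / 2) * np.linalg.det(cov) ** (-0.5)
    return norm * np.exp(-0.5 * r @ np.linalg.solve(cov, r))

K = np.array([[1.0, 0.0],
              [1.0, 1.0]])
m = np.array([0.5, 0.5])
cov = np.array([[0.1, 0.0],
                [0.0, 0.2]])
d_obs = np.array([0.6, 0.9])
print(gaussian_likelihood(d_obs, K @ m, cov))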
Exercises

4–1. Show that m^est = V_p Λ_p^{-1} U_p^T d has no component in S_0(m).