COMPARISON OF FIVE
LEAST-SQUARES INVERSION TECHNIQUES
IN RESISTIVITY SOUNDING*
ABSTRACT
HOVERSTEN, G.M., DEY, A. and MORRISON, H.F. 1982, Comparison of Five Least-Squares Inversion Techniques in Resistivity Sounding, Geophysical Prospecting 30, 688-715.
A brief history of the development of the inverse problem in resistivity sounding is presented, together with the development of the equations governing the least-squares inverse. Five algorithms for finding the minimum of the least-squares problem are described and their speed of convergence is compared on data from two planar earth models. Of the five algorithms studied, the ridge-regression algorithm required the fewest number of forward problem evaluations to reach a desired minimum.
Solution space statistics, including (1) parameter standard errors, (2) parameter correlation coefficients, (3) model parameter eigenvectors, and (4) data eigenvectors are discussed. The type of weighting applied to the data affects these statistical parameters. Weighting the data by taking log₁₀ of the observed and calculated values is comparable to weighting by the inverse of a constant data error. The most reliable parameter standard errors are obtained by weighting by the inverse of observed data errors. All other solution statistics, such as data-parameter eigenvector pairs, have more physical significance when inverse data error weighting is used.
INTRODUCTION
Interpretation of resistivity soundings has been a topic of research since the early
1900s. Contributions to the recent geophysical literature have dealt with the develop-
ment and application of a wide variety of one-dimensional inversion techniques and
approaches for estimating uncertainties in the resulting parameters. This paper pre-
sents a comparison of five least-squares minimization algorithms. The comparison is
made in terms of the number of forward problem evaluations required by each
technique to reach a residual minimum. Three weighting schemes which affect the
estimated parameter errors are compared at the minimum reached for a specific
model.
Until the advent of fast digital computers, the interpreter relied primarily on
curve matching procedures, where albums of theoretical curves (Compagnie Generale de Geophysique 1955, 1963, Mooney and Wetzel 1956, Flathe 1955, Orellana and Mooney 1966, Rijkswaterstaat 1969) are used alone or in conjunction with the
auxiliary point method of partial curve matching (Kalenov 1957, Orellana and
Mooney 1966, Zohdy 1965). This method, while undoubtedly the most convenient
and simple, suffers from the drawback that the published curves cover only a limited
number of cases.
HISTORICAL DEVELOPMENT OF INVERSE METHODS
The forward problem, expressed in terms of integral expressions for potential and apparent resistivity for the Schlumberger electrode array, was developed by Stefanesco, Schlumberger and Schlumberger (1930). With these and the associated kernel function, development began on interpretation based on determining earth parameters by fitting a theoretical apparent resistivity curve or kernel function to its
observed counterpart. Similar expressions for apparent resistivity have been
developed for many other electrode arrays (Roy and Apparao 1971, Alpin 1966,
Keller and Frischknecht 1966). This general approach remains a very popular
method of inversion today. Another popular inversion method, which will not be
considered here, is based on the use of Dar Zarrouk parameters. This is well
described by Zohdy (1965, 1968, 1974a, 1974b, 1974c).
Stefanesco et al. (1930) derived the integral expressions

$$V(r) = \frac{\rho_1 I}{2\pi}\left[\frac{1}{r} + 2\int_0^\infty \Theta(\lambda, k, t)\, J_0(\lambda r)\, d\lambda\right] \qquad (1)$$

and

$$\rho_a(r) = \rho_1\left[1 + 2 r^2 \int_0^\infty \Theta(\lambda, k, t)\, J_1(\lambda r)\, \lambda\, d\lambda\right], \qquad (2)$$

where I is the current applied to the earth, r is the current electrode spacing, V(r) is the electrode potential at r = AB/2, ρ_a(r) is the Schlumberger apparent resistivity, ρ₁ is the top layer resistivity, J₀ and J₁ are the zero- and first-order Bessel functions, respectively, Θ(λ, k, t) is the Stefanesco kernel function, λ is the integration variable, k is the resistivity reflection coefficient, and t is the layer thickness.
The Slichter kernel (Vozoff 1958) is defined by

$$T(\lambda, k, t) = \rho_1\bigl[1 + 2\Theta(\lambda, k, t)\bigr]. \qquad (3)$$
This kernel can also be related to apparent resistivity through the inverse Hankel transformation of (2):

$$\rho_a(r) = r^2 \int_0^\infty T(\lambda, k, t)\, J_1(\lambda r)\, \lambda\, d\lambda. \qquad (4)$$
The Slichter kernel can also be expressed in a closed, nonintegrable recursive form T₁,₂,…,ₙ(λ) for an arbitrary number of n layers of thickness tᵢ (Sunde 1949), in which the recursion is driven by the ratio

$$P_{1,2,\ldots,n} = \frac{\rho_1 - \rho_2\, T_{2,3,\ldots,n}}{\rho_1 + \rho_2\, T_{2,3,\ldots,n}}. \qquad (5)$$
Least-squares inversion is based on linearizing the forward functional about a current parameter estimate. Expanding in a Taylor series gives

$$O(X, P)_i \approx C(X, P^0)_i + \sum_{j=1}^{M} \frac{\partial C(X, P^0)_i}{\partial P_j}\,(P_j - P_j^0), \qquad j = 1, \ldots, M, \quad i = 1, \ldots, N, \qquad (6)$$

where O(X, P)ᵢ is the ith observation or kernel as a function of X, the system parameter vector; C(X, P⁰)ᵢ is the ith calculated function value [i.e. (2) or (5)] for X and P⁰, the current estimate of the unknown parameter set; and (Pⱼ − Pⱼ⁰) is the linear estimate of the correction needed in the jth unknown parameter.

Rewriting (6) in matrix notation, neglecting second- and higher-order terms,

$$\Delta G = A\,\Delta P, \qquad (7)$$

where

$$\Delta G_i = O(X, P)_i - C(X, P^0)_i, \qquad A_{ij} = \frac{\partial C(X, P^0)_i}{\partial P_j}, \qquad (8)$$

and the least-squares solution for the parameter correction is

$$\Delta P = (A^T A)^{-1} A^T\, \Delta G. \qquad (9)$$

Vozoff (1958) minimized the squared misfit between kernels,

$$\Delta G = \sum_{i=1}^{N} \bigl[K_{o,i} - K_{1,2,\ldots,n,\,i}\bigr]^2, \qquad (10)$$

where K₁,₂,…,ₙ is the kernel calculated for a particular parameter set, and K_o is the kernel integrated from field data.
Vozoff (1958) considered two methods based on gradients of the system matrix to
minimize (10). The first was a functional iteration method and the second a steepest descent method. The functional iteration procedure follows Newton's technique for finding roots of a nonlinear equation of one unknown parameter. The procedure is discussed by Hildebrand (1949). It consists essentially of an iterative Taylor series technique. An initial estimate is made for the values of the parameters, assuming that the estimate is quite close to the true values. The difference, or objective function, ΔG is approximated by the first two terms of its Taylor series expansion in terms of the unknown parameters. The expansion is then used to calculate the necessary changes in the initial estimate. The procedure is iterated until no further changes are needed.
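To make the functional iteration concrete, a minimal sketch of the Taylor series update of equations (6)-(9) is given below in Python. This is our illustration, not code from the paper; the finite-difference step and the convergence tolerance are assumptions.

import numpy as np

def functional_iteration(forward, p0, d_obs, n_iter=20, eps=1e-4):
    # Iterative Taylor series (Gauss-Newton) technique: linearize the
    # forward problem about the current estimate and solve eq. (9).
    p = np.asarray(p0, dtype=float).copy()
    for _ in range(n_iter):
        d0 = forward(p)
        # finite-difference Jacobian A_ij = dC_i/dP_j of eq. (8)
        A = np.empty((d0.size, p.size))
        for j in range(p.size):
            dp = np.zeros_like(p)
            dp[j] = eps * max(abs(p[j]), 1.0)
            A[:, j] = (forward(p + dp) - d0) / dp[j]
        dG = d_obs - d0                                 # residual, eq. (7)
        dP, *_ = np.linalg.lstsq(A, dG, rcond=None)     # correction, eq. (9)
        p += dP
        if np.linalg.norm(dP) <= 1e-8 * (1.0 + np.linalg.norm(p)):
            break                                       # no further changes needed
    return p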
The steepest descent method deals directly with the difference function ΔG. If one maps ΔG in its m-parameter space, an m-dimensional least-squares surface is generated. Any choice of a parameter vector not at the minimum gives a point on the least-squares surface where its m-dimensional gradient is calculated. The parameter vector is changed so that the value of ΔG descends along the steepest initial gradient. Because the surface is not truly described by its first-order gradient alone, the initial direction will probably not extend to the true minimum. Therefore the parameter vector is changed until the minimum of the least-squares surface ΔG is found in that particular direction. At this point a new gradient is calculated and a new descent is made in the direction of steepest gradient. This procedure is iterated until a point of zero gradient is reached or until ΔG becomes less than some predetermined value.
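A corresponding sketch of the steepest descent loop, again ours rather than Vozoff's; the centered-difference gradient and the bounded line search (from SciPy) are assumptions:

import numpy as np
from scipy.optimize import minimize_scalar

def steepest_descent(dG, p0, n_iter=50, eps=1e-6):
    # Repeatedly descend along the steepest gradient of the least-squares
    # surface dG, minimizing along each direction before re-evaluating.
    p = np.asarray(p0, dtype=float).copy()
    for _ in range(n_iter):
        g = np.array([(dG(p + eps * e) - dG(p - eps * e)) / (2.0 * eps)
                      for e in np.eye(p.size)])
        if np.linalg.norm(g) < 1e-12:
            break                           # point of zero gradient reached
        d = -g / np.linalg.norm(g)          # steepest initial direction
        # find the minimum of dG in that particular direction
        res = minimize_scalar(lambda a: dG(p + a * d),
                              bounds=(0.0, 1e3), method='bounded')
        p = p + res.x * d
    return p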
These two methods of minimizing a function of sums of squares are forerunners of
newer and more advanced methods which will be dealt with in this paper. The
general format of a least-squares approach is now widely used in both the kernel
domain and the apparent resistivity domain since the speed of new large computers
makes this time-consuming method more feasible and, more importantly, because it
lends itself well to statistical evaluation of the resulting parameters.
Many modern least-squares techniques are centered on the method of the gener-
alized inverse. The generalized inverse has been thoroughly discussed for the inver-
sion of surface wave and free oscillation data by Wiggins (1972) and Jackson (1972).
Jupp and Vozoff (1975), Vozoff and Jupp (1975) and Inman, Ryu and Ward (1973)
found that in many resistivity problems, when model parameters are strongly cor-
related, the system matrix (8) was nearly singular. The resulting non-orthogonality
was not satisfactorily dealt with by the generalized inverse. Hoerl and Kennard
(1970a, 1970b) developed a theory which shows that linear estimation from non-
orthogonal data could be improved by the use of biased estimators. Hoerl gave the
name “ridge regression” to this method. Marquardt (1970) discussed the relationship between the generalized inverse and ridge regression and summarized his findings in these words:
The ridge and generalized inverse estimators share many properties. Both are superior to
least-squares for ill-conditioned problems. The generalized inverse solution is especially
relevant for precisely zero eigenvalues (in the system matrix). The ridge solution is compu-
tationally simpler and seems better suited to coping with very small, but non-zero,
eigenvalues.
Most practical resistivity problems, owing to the many highly correlated par-
ameters that can result, involve small but non-zero eigenvalues. For this reason
Inman (1975) used ridge regression, with good results, for the inversion of resistivity
data. The ridge regression approach has also been used recently by Glenn and Ward
(1976), Rijo, Pelton, Feitosa and Ward (1977) and Petrick, Pelton and Ward (1977).
FIVE LEAST-SQUARES MINIMIZATION ALGORITHMS
The following descriptions of the algorithms closely follow the published literature.
Some notation has been changed from the original publications for consistency
within this paper.
1. Simplex method

In this method (Spendley, Hext and Himsworth 1962, Nelder and Mead 1965) ΔG is evaluated at the vertices Q_k of a simplex in parameter space; Q_h denotes the vertex with the highest value ΔG_h, Q_l the vertex with the lowest value ΔG_l, and Q̄ the centroid of the vertices with k ≠ h. The reflected point is

$$Q^* = (1 + \alpha)\bar{Q} - \alpha Q_h,$$

where α, the reflection coefficient, is an arbitrary positive constant. This expression places Q* on the line joining Q_h and Q̄, on the far side of Q̄ from Q_h, with the distance (Q*, Q̄) = α(Q_h, Q̄). If the ΔG* which corresponds to Q* falls between ΔG_h and ΔG_l, Q_h is replaced by Q* and the next iteration proceeds with the newly defined simplex.
If the reflection produces a new minimum, ΔG* < ΔG_l, then a further step is taken to try to find a still lower minimum. The expanded point Q** is given by the relation

$$Q^{**} = \gamma Q^* + (1 - \gamma)\bar{Q},$$

where the expansion coefficient γ is the ratio of the distance (Q**, Q̄) to the distance (Q*, Q̄). If ΔG** < ΔG_l, Q_h is replaced by Q** and the process begins again. However, if ΔG** > ΔG_l, this is called a “failed expansion”, and Q_h is replaced by Q* before restarting.
The third operation, contraction, is used if, on reflecting Q_h to Q*, the condition ΔG* > ΔG_k holds for all k ≠ h, i.e. if replacing Q_h by Q* would leave ΔG* the maximum;
then a new Q_h is defined to be either the old Q_h or Q*, whichever has the lower ΔG value. The contracted point is defined by

$$Q^{***} = \beta Q_h + (1 - \beta)\bar{Q}.$$

The contraction coefficient β lies between 0 and 1 and is defined by the ratio of the distance (Q***, Q̄) to the distance (Q_h, Q̄). Q*** is used in place of Q_h unless ΔG*** > min(ΔG_h, ΔG*), i.e. unless the contracted point Q*** is worse than the better of Q_h and Q*. In this case of a “failed contraction”, all the Q_k are replaced by (Q_k + Q_l)/2 and the process is restarted.
The iteration process is set to run until ΔG reaches or falls below some preset minimum value. The application of the simplex method to the minimization of a function of many variables is discussed by Nelder and Mead (1965).
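One reflection-expansion-contraction cycle can be sketched as follows. This is an illustration only, not the authors' implementation; the coefficient values α = 1, γ = 2, β = 0.5 are conventional defaults and are assumptions here.

import numpy as np

def simplex_iteration(Q, f, alpha=1.0, gamma=2.0, beta=0.5):
    # Q: array of simplex vertices (one per row); f: the objective dG.
    vals = np.array([f(q) for q in Q])
    h, l = np.argmax(vals), np.argmin(vals)
    cent = Q[np.arange(len(Q)) != h].mean(axis=0)   # centroid excluding Q_h
    q_r = (1 + alpha) * cent - alpha * Q[h]         # reflected point Q*
    f_r = f(q_r)
    if f_r < vals[l]:                               # new minimum: try expansion
        q_e = gamma * q_r + (1 - gamma) * cent      # expanded point Q**
        Q[h] = q_e if f(q_e) < vals[l] else q_r     # failed expansion keeps Q*
    elif f_r < np.max(np.delete(vals, h)):          # between lowest and highest
        Q[h] = q_r
    else:                                           # contraction
        if f_r < vals[h]:                           # keep the better of Q_h, Q*
            Q[h], vals[h] = q_r, f_r
        q_c = beta * Q[h] + (1 - beta) * cent       # contracted point Q***
        if f(q_c) < min(vals[h], f_r):
            Q[h] = q_c
        else:                                       # failed contraction: shrink
            Q[:] = (Q + Q[l]) / 2.0                 # all Q_k -> (Q_k + Q_l)/2
    return Q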
2. Unconstrained global optimization (Bremermann)
This routine was developed initially for the solution of systems of nonlinear equa-
tions of up to 100 variables (Bremermann 1970). The routine has also been used for
finding maxima or minima of sums of squares, sums of exponentials, and curve
fitting. The method is briefly described as follows:
(a) ΔG is evaluated for the initial estimate of the parameter set P⁰.
(b) A random direction r is chosen. The probability distribution of r is an N-dimensional Gaussian with σ₁ = σ₂ = … = σ_N = 1.
(c) On the line determined by P⁰ and r, the restriction of ΔG to this line is approximated by five-point Lagrangian interpolation centered at P⁰, with the points equidistant at spacing h, which is a parameter of the method.
(d) The Lagrangian interpolation of the restriction of ΔG is a fourth-degree polynomial in the parameter λ describing the line P⁰ + λr. The five coefficients of the Lagrangian interpolation polynomial are determined.
(e) The derivative of the interpolation polynomial is a third-degree polynomial with one or three real roots. The roots are computed by Cardan's formula (Bremermann 1970).
(f) If there is one real root λ₁, the procedure is iterated from the point P⁰ + λ₁r with a new random direction, provided that ΔG(P⁰ + λ₁r) ≤ ΔG(P⁰). If this inequality does not hold, the method is iterated from P⁰ with a new random direction.
(g) When there are three real roots λ₁, λ₂, λ₃, the polynomial is evaluated at P⁰ + λ₁r, P⁰ + λ₂r, and P⁰ + λ₃r. Considering also the value at P⁰, the procedure is iterated from the point where ΔG has the smallest value. If ΔG attains this smallest value at more than one point, the algorithm chooses one of them.
(h) Iteration continues until a predetermined number of iterations has been run or until a prescribed minimum has been reached.
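One Bremermann iteration, steps (b)-(g), can be sketched as below. This is our illustration; NumPy's polynomial root finder stands in for Cardan's formula, and the step h and the random seed are assumptions.

import numpy as np

def bremermann_step(f, p0, h=0.1, rng=np.random.default_rng(0)):
    # (b) random direction from an N-dimensional unit Gaussian
    r = rng.standard_normal(p0.size)
    # (c) five equidistant samples of dG along the line p0 + lambda*r
    lam = h * np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
    vals = [f(p0 + l * r) for l in lam]
    # (d) degree-4 interpolating polynomial through the five samples
    quartic = np.polyfit(lam, vals, 4)
    # (e) real roots of the cubic derivative
    roots = np.roots(np.polyder(quartic))
    real = [z.real for z in roots if abs(z.imag) < 1e-12]
    # (f), (g) restart from the candidate with the smallest dG (p0 included)
    best = min([0.0] + real, key=lambda l: f(p0 + l * r))
    return p0 + best * r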
3. Peckham’s method
This method was specifically designed for minimizing a sum of squares of nonlinear
functions without calculating the gradients which make up the matrix A (Peckham
1970).
The work of Spendley, Hext and Himsworth (1962) and Nelder and Mead (1965)
in developing methods for minimizing functions in which the function is evaluated at
( N + 1) or more points forming a simplex in N-dimensional space suggested to
Peckham that, for problems where the function is a sum of squares, the function
values at a set of ( N + 1) or more points might be used to estimate values for the
coefficients Cᵢ and Aᵢⱼ in (8). These could then be used in (9) for a linear estimate of
the minimum. One iteration consists of replacing the point of the set with the highest
function value by the linear estimate of the minimum position.
Assume that there are function values Q_{kl} for a set of p points with parameters P_{jl}, where p ≥ N + 1 and l = 1, 2, …, p. Now consider the minimization at each of the points in the M-dimensional hyperspace. The linear approximation is obtained by picking C and A to minimize the N expressions

$$\sum_{l=1}^{p} \omega_l^2 \Bigl[Q_{kl} - C_k - \sum_{j} A_{kj} P_{jl}\Bigr]^2, \qquad k = 1, \ldots, N. \qquad (11)$$

With the change of variables P′_{jl} = ω_l P_{jl} and Q′_{kl} = ω_l Q_{kl}, the values of C and A which minimize (11) are given by

$$P' P'^T A^T = P' Q'^T,$$

with

$$C = \frac{1}{R}\, Q'\omega, \qquad R = \omega^T \omega.$$

If these values are substituted into (9), the linear estimate of the parameter set at the minimum is

$$P_E = -\frac{1}{R}\,(P' P'^T)\,(P' Q'^T Q' P'^T)^{-1}\,(P' Q'^T Q'\omega). \qquad (12)$$

In order to solve equation (12) it is rewritten as

$$P_E = -\frac{1}{R}\, P' P'^T Z, \qquad (13)$$

where

$$(P' Q'^T Q' P'^T)\, Z = P' Q'^T Q'\omega. \qquad (14)$$
Equation (14) is the normal equation of a linear least-squares problem, i.e. the Euclidean norm ‖Q′ω − Q′P′ᵀZ‖ has a minimum when Z satisfies (14).
The solution is obtained by use of orthogonal transformations (Golub 1965). Peckham (1970) used the ALGOL procedure “Orthlin 2” for his solution. In addition, the values of ω_l were chosen to give function values near the minimum more weight in determining C and A:

$$\omega_l = \frac{1}{\Delta G_l}.$$
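The essential idea, fitting a linear (affine) surrogate to the residual vectors at the simplex points and jumping to its least-squares minimum, can be sketched as follows. This is a paraphrase of the method, not Peckham's matrix algebra or his ALGOL routine; the weighted least-squares solver is an assumption.

import numpy as np

def peckham_estimate(P, R, w):
    # P: (p, M) parameter sets; R: (p, N) residual vectors at those sets;
    # w: (p,) weights, e.g. 1/dG_l at each point.
    X = np.hstack([np.ones((P.shape[0], 1)), P]) * w[:, None]
    Y = R * w[:, None]
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)   # fit R ~ C + P A^T
    C, A = coef[0], coef[1:].T                     # C: (N,), A: (N, M)
    # linear estimate of the minimum: solve A p = -C in the least-squares sense
    p_est, *_ = np.linalg.lstsq(A, -C, rcond=None)
    return p_est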
4. Ridge regression
Rather than using function values in an M-dimensional hyperspace to estimate values for Cᵢ and Aᵢⱼ, it is very popular to use direct forward problem calculations for Cᵢ and finite differences of these values to calculate Aᵢⱼ. The simplest approach is to solve (7), neglecting weighting, for ΔP by calculating the least-squares inverse (AᵀA)⁻¹Aᵀ. However, Hoerl and Kennard (1970a) show that when (AᵀA) is nearly singular, as it can be in many geophysical problems, the average difference between the estimated ΔP and the true ΔP becomes very large. The ridge regression method (Levenberg 1944, Foster 1961, Marquardt 1963, 1970) seeks to reduce this difference during the iteration process by damping the diagonal terms of (AᵀA). The ridge regression estimate ΔP_RR is

$$\Delta P_{RR} = (A^T A + K I)^{-1} A^T\, \Delta G. \qquad (15)$$
The eigenvalues of (AᵀA + KI) are (λᵢ² + K), where λᵢ² are the eigenvalues of AᵀA. Any very small eigenvalues of the least-squares estimator will be increased in the ridge regression estimator by the amount K. Hence the inversion of the matrix (AᵀA + KI) will be more stable. Increasing the size of all the eigenvalues results in a significant decrease of (a) the mean squared distance between the true ΔP and ΔP_RR, and (b) the variance of the estimated solution.
The basic concept of the technique is that the best direction for finding a reduced sum of squares lies somewhere between the direction given by the Taylor series increment and the direction of steepest descent. In (15), when K = 0, ΔP approaches the Taylor series direction. This ensures second-order convergence near the minimum. The effect of decreasing K in each iteration on the resulting intermediate model is that the model will at first fit the broad, low-frequency aspects of the data, with higher-frequency components being fitted as K decreases. Hoerl and Kennard
(1970a) give an excellent example of the comparison of the ridge regression inverse
and the classic least-squares inverse. A more detailed discussion of ridge regression
and its relationship to the generalized inverse is covered by Marquardt (1970). The
algorithm tested is routine ZXSSQ from the IMSL computer library.
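A single ridge regression (Marquardt) step of equation (15), with a finite-difference Jacobian, can be sketched as follows; this is our illustration, not the IMSL routine ZXSSQ, and the step size is an assumption.

import numpy as np

def ridge_step(forward, p, d_obs, K, eps=1e-4):
    d0 = forward(p)
    # finite-difference Jacobian A of eq. (8)
    A = np.empty((d0.size, p.size))
    for j in range(p.size):
        dp = np.zeros_like(p)
        dp[j] = eps * max(abs(p[j]), 1.0)
        A[:, j] = (forward(p + dp) - d0) / dp[j]
    dG = d_obs - d0
    # damped normal equations, eq. (15): (A^T A + K I) dP = A^T dG
    dP = np.linalg.solve(A.T @ A + K * np.eye(p.size), A.T @ dG)
    return p + dP

In practice K is started large (a heavily damped, steepest-descent-like step) and decreased toward zero as the minimum is approached, recovering the Taylor series direction.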
5. Spiral algorithm

The “spiral algorithm” and its comparison to methods by Marquardt and Powell are described in a paper by Jones (1970) for a number of models whose derivatives with respect to the parameters (i.e. the elements of A) are analytic. Only the general concepts of the algorithm, following Jones (1970), will be given here.
The main principle of the algorithm is that a reduction in ΔG can always be found in the parameter-space plane defined by the Taylor series estimate of the minimum, the steepest descent estimate of the minimum, and the starting point for that iteration (see fig. 1).

Fig. 1. Contours of constant sum of squares, illustrating the Taylor series direction t and the steepest descent direction s toward the minimum sum of squares.

Figure 2 represents the plane ODT in parameter space, where O is the starting point for an iteration, T is the Taylor series point, and D is the steepest descent point, chosen such that the distance OD is equal to OT. The basic strategy is that the next starting point should be as far away from O as possible while keeping the number of evaluations of the least-squares surface to a minimum.
Within an iteration, the first point checked is the Taylor series point T, which is generated by Marquardt's method. If ΔG_T < ΔG_O, this point is accepted as the new minimum and a new Taylor series point is calculated from there. If ΔG_T ≥ ΔG_O, then the linear approximation of the model at O is not valid at T. This implies that the sum-of-squares valley must curve in one of the two directions indicated by the dashed lines in fig. 2. In order to try to intercept the valley, the spiral through the search points S is searched. The curve moves out from T at an angle into the search area OTD and returns to O along a tangent to the line OD. The optimum equation for the spiral found by Jones (expressed in polar coordinates with origin at O) is

$$\rho_{n+1} = \frac{2\rho_n}{1 + \rho_n}.$$

This relation was chosen to ensure that the points L become closer together as they near D.

The coordinates (a, θ) of the points L are given by the relations

$$\theta = \tan^{-1}\left[\frac{\rho\sin\gamma}{1 - \rho + \rho\sin\gamma}\right]$$

and

$$a = \frac{r_O\,\rho\sin\gamma}{\sin\theta}.$$

With the starting point O as origin, the coordinates of the point S in parameter space are given, in terms of t and d (the coordinates of T and D respectively), by the relation

$$s = \frac{a\rho}{r}\, d + (1 - \rho)\, t. \qquad (16)$$
Equation (16) is the main operating equation of the spiral algorithm, aside from the Marquardt technique for generating T. If ΔG_T > ΔG_O, each successive search point is derived as a weighted sum of the two parameter-space vectors t and d.

The entire algorithm is more complicated than described here, since it contains a
provision for dealing with spurious local minima. Interpolations are also performed
when three consecutive search points yield reduced sums of squares, the interpolated
minimum being checked to speed convergence.
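Ignoring the exact polar geometry of equation (16), the generation of search points that sweep from the Taylor point toward the steepest descent point, crowding together near D, can be sketched as below. The update rule for ρ and its starting value follow the reconstruction above and should be treated as assumptions.

import numpy as np

def spiral_search_points(t, d, n=6, rho0=0.25):
    # t, d: coordinates of T and D relative to the origin O.
    # Each search point is a weighted sum of t and d; the weight update
    # makes successive points become closer together as they near D.
    pts, rho = [], rho0
    for _ in range(n):
        pts.append(rho * np.asarray(d) + (1.0 - rho) * np.asarray(t))
        rho = 2.0 * rho / (1.0 + rho)
    return pts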
PARAMETER STATISTICS
The simplest view of the inverse problem considered here is that of an automated “curve fitting” procedure. A model is derived by finding a parameter distribution that will produce a theoretical function, either ρ_a or the kernel function, that fits the observed data in some least-squares sense. Weighted least-squares are usually used so that the data can be fitted in one of two ways: (1) data are fitted uniformly when the percentage data error is equal for all data (e.g. resistivity data are usually assumed to have a constant percentage error at all electrode spacings), and (2) data are selectively fitted when data errors are variable (e.g. in EM soundings data error is usually a function of frequency).
An ideal weighting scheme for the first case is logarithmic (Rijo et al. 1977); the function to be minimized is

$$\Delta G = \sum_{i=1}^{N} \bigl[\log_{10}(\mathrm{observed}_i) - \log_{10}(\mathrm{calculated}_i)\bigr]^2. \qquad (17)$$

Since the relative data errors are constant, this scheme weights the data equally and eliminates the requirement for a weight vector to be included.

In the case of variable data error it would seem appropriate to weight each data point in inverse proportion to its error. The following equation represents this type of weighting:

$$\Delta G = \sum_{i=1}^{N} \left[\frac{\mathrm{observed}_i - \mathrm{calculated}_i}{\sigma_i}\right]^2, \qquad (18)$$

where σᵢ is the standard error of the ith observation.
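The two residual definitions can be written compactly; a minimal sketch (σᵢ being the standard error of the ith observation):

import numpy as np

def residuals_log10(obs, calc):
    # eq. (17): equal relative weighting via log10 of the data
    return np.log10(obs) - np.log10(calc)

def residuals_inverse_error(obs, calc, sigma):
    # eq. (18): each point weighted by the inverse of its own error
    return (np.asarray(obs) - np.asarray(calc)) / np.asarray(sigma)

The sum of squares of either residual vector is the ΔG that the minimization routines reduce.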
Parameter standard errors and correlations are derived from the covariance matrix V evaluated at the minimum:

$$V = \sigma^2 (A^T W^{-1} A)^{-1}, \qquad (19)$$

where

$$\sigma^2 = \frac{\Delta G^T W^{-1}\, \Delta G}{N - M}, \qquad W_{ij} = \begin{cases} \sigma_i^2, & i = j, \\ 0, & i \neq j, \end{cases}$$

and

$$(\Delta G)^2 = \sum_{i=1}^{N} \bigl[\mathrm{observed}_i - \mathrm{calculated}_i\bigr]^2.$$
If log₁₀ weighting is being used, as in equation (17), then W is replaced by the identity matrix and ΔG is formed from the log₁₀ of the observed and calculated values.
The parameter standard errors are defined by the square roots of the diagonal terms of V (e.g. √V₁₁ equals the standard error for parameter number 1).
The correlation matrix is the diagonally normalized covariance matrix. Its terms are the correlation coefficients, which are measures of the linear dependence between parameters. The correlation matrix C is given by Jenkins and Watts (1968) as

$$C_{ij} = \frac{V_{ij}}{\sqrt{V_{ii}\, V_{jj}}}.$$

If an element C_ij is near ±1, then the ith and jth parameters are strongly linearly dependent. For example, if i represented the thickness t and j the resistivity ρ of a layer (i.e. C_ij represents the correlation between the thickness and resistivity of a layer), then only the ratio t/ρ is well determined by the data if C_ij ≈ +1. This case is true for layers that are highly conductive relative to their surroundings. If C_ij ≈ −1, then only the product ρt is well determined, as is the case for relatively resistive layers. This is the familiar equivalence problem discussed, for example, by Sunde (1949).
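The statistics of equation (19) follow directly from the Jacobian at the minimum; a minimal sketch (the diagonal weight matrix follows the definition above):

import numpy as np

def parameter_statistics(A, dG, sigma_d):
    # A: (N, M) Jacobian at the minimum; dG: (N,) residual vector;
    # sigma_d: (N,) data standard errors (the diagonal of W is sigma_d**2).
    N, M = A.shape
    Winv = np.diag(1.0 / np.asarray(sigma_d) ** 2)
    s2 = dG @ Winv @ dG / (N - M)              # variance estimate
    V = s2 * np.linalg.inv(A.T @ Winv @ A)     # covariance matrix, eq. (19)
    std = np.sqrt(np.diag(V))                  # parameter standard errors
    C = V / np.outer(std, std)                 # correlation matrix C_ij
    return V, std, C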
The relationship between parameter correlations and parameter standard errors is well explained by Inman (1975), and we will paraphrase him here. If the correlations are small, then the standard errors, given by the square roots of the diagonals of (19), are a good measure of the uncertainty of each parameter. If, however, two parameters are highly correlated, i.e. C_ij ≈ ±1, then the standard deviations will be larger than the actual uncertainties. Figure 3 illustrates this fact with a generalized slice of solution space. The two coordinate axes correspond to two parameters of the estimated layered earth model. The ellipse indicates a confidence region within which the residual sum of squares is expected to lie for a certain fraction of repeated experiments. This region also defines the values of the parameters ρ₂ (resistivity) and t₂ (thickness) which give a residual sum of squares within the contour. The origin is defined by the parameter values at the final solution. The tilt of the axis of the ellipse is
a measure of the degree of correlation between the two parameters. If the standard
errors from (19) are taken to be the true deviation estimates, then the ellipse
is enclosed by a box whose sides are defined by the standard deviation. The box,
which ignores parameter correlation, represents a much larger confidence region
than the ellipse. By using the standard deviation implied by the box one obtains a
very conservative estimate of the parameter confidence interval for correlated par-
ameters. Therefore, by considering the standard deviations in conjunction with par-
ameter correlations, a more realistic parameter standard deviation can be arrived at
which is always less than or equal to the standard deviation computed from (19).
Two models, one described by Inman (1975) and one of our choice (called model 3),
are considered for comparison of the inversion routines and to illustrate some con-
cepts of the parameter statistics (see below, figs 11 and 12).
A common misconception about parameter standard errors is that a single model parameter can be varied by its estimated standard error with no significant change resulting in the calculated forward problem. In fact, the parameter standard errors and correlation coefficients must be viewed as representing a complex interactive system that describes combinations of parameter changes which can be made without a significant change in the estimated least-squares residual. For example, consider model 3 with its conductive middle layer. The eigenvectors, eigenvalues, correlation coefficients, and parameter standard errors, calculated using data errors as weights in equation (18), are shown in fig. 8. Note the high positive correlation between ρ₂ and t₂, indicating a linear relation between the two parameters, i.e. S₂ = t₂/ρ₂. Figure 4 shows the sounding curve for model 3 along with error envelopes generated by varying t₂ by its standard error. Similarly, fig. 5 shows the error envelope generated by varying ρ₂ by its standard error. Clearly the change in ρ_a is much larger than the 1% error assumed in the data. However, if the ratio t₂/ρ₂ is varied simultaneously through both parameters' standard errors, as indicated by their correlation (see fig. 6), the change observed in ρ_a is of the order of 1%, the assumed error level, and thus is not a statistically significant change.
The parameter eigenvectors and their associated eigenvalues are also very useful
in defining the relation between parameters and their overall effect on the data
generated from a particular model.
Lanczos (1961) factored the system matrix A into its row (parameter) and column (data) eigenvectors, representing the input and output of the linearized system.
Fig. 6. Model 3 sounding curve with 1% variation of t₂/ρ₂.
Fig. 7. Parameter and data eigenvectors with associated eigenvalues, parameter correlations, and best-fit model parameters for minimization without weighting.
Fig. 8. Parameter and data eigenvectors with associated eigenvalues, parameter correlations, and best-fit model parameters for minimization with data weighted by their standard errors (standard error 1% of data value).

Correlation matrix (fig. 8):
      ρ₁     t₁     ρ₂     t₂     ρ₃     True model   Best fit ± standard error
ρ₁   1.0                                  100.0        ρ₁ = 99.99 ± 0.39
t₁  −0.41   1.0                            50.0        t₁ = 49.85 ± 0.56
ρ₂   0.15  −0.83   1.0                      3.0        ρ₂ = 3.31 ± 0.81
t₂   0.15  −0.83   0.99   1.0             100.0        t₂ = 110.81 ± 27.80
ρ₃   0.03  −0.27   0.36   0.36   1.0     1000.0        ρ₃ = 1005.7 ± 11.90
Fig. 9. Parameter and data eigenvectors, parameter correlations, and best-fit model parameters for weighting by taking log₁₀ of the observed and calculated data.

Correlation matrix (fig. 9):
      ρ₁     t₁     ρ₂     t₂     ρ₃     True model   Best fit ± standard error
ρ₁   1.0                                  100.0        ρ₁ = 99.99 ± 0.04
t₁  −0.41   1.0                            50.0        t₁ = 49.85 ± 0.06
ρ₂   0.15  −0.83   1.0                      3.0        ρ₂ = 3.31 ± 0.09
t₂   0.15  −0.83   0.99   1.0             100.0        t₂ = 110.81 ± 3.10
ρ₃   0.02  −0.21   0.28   0.29   1.0     1000.0        ρ₃ = 1005.7 ± 1.30
Then the decomposition of the matrix A is
similar to the decomposition of the impulse response of an ordinary linear filter in
terms of sinusoids (eigenvectors) of various amplitudes (eigenvalues). For a linear
filter the amplitude response at a particular frequency determines how the filter will
amplify the corresponding spectral component of the output.
Similarly, if we think of the matrix A as a filter relating the input (parameters) to the output (calculated data), then the eigenvalues are the amplification coefficients which determine the magnitude of the effect of the linear combination of parameters vᵢ on the linear combination of data uᵢ. Small eigenvalues and their associated eigenvectors represent the spectral components which are poorly transferred through
can optimize data sets to contain the maximum information related to a model
parameter of particular interest. The subject of experiment design by this method is
discussed by Glenn and Ward (1976).
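In matrix terms this is the singular value decomposition of the Jacobian, A = UΛVᵀ; a minimal sketch:

import numpy as np

def eigen_decomposition(A):
    # Lanczos factorization of the system matrix: columns of V are the
    # parameter eigenvectors, columns of U the data eigenvectors, and the
    # singular values lam act as the amplification coefficients.
    U, lam, Vt = np.linalg.svd(A, full_matrices=False)
    return U, lam, Vt.T

Small values in lam flag the linear combinations of parameters (columns of V) whose effect is transmitted weakly into the data (columns of U).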
Figure 8 presents (i) the parameter eigenvectors (columns of V), (ii) the eigenvalues (diagonal elements of Λ), (iii) the data eigenvectors (columns of U), whose numbering refers to the data points labelled in fig. 12, (iv) the parameter correlation coefficients, and (v) the model parameter values with estimated standard errors. Some insight into
the physical significance of this eigenvector decomposition can be gained by con-
sidering the effects of varying parameters on the sounding curve. Figure 13 shows the variation in ρ_a caused by changing ρ₁ by approximately 50%. Similarly, fig. 14 shows the variation in ρ_a caused by a 50% change in t₁. Note that the variation in ρ_a occurs from data point 1 to data points 10 or 11. Compare this with the second and third eigenvector pairs of fig. 8, both of whose parameter eigenvectors are composed of ρ₁, t₁ components. Their corresponding data eigenvectors have components from positions 1 to 10. The same relation holds for the other eigenvector pairs. Compare fig. 4, produced by changing ρ₂, with the first eigenvector pair of fig. 8. The correspondence between the eigenvector pair and the changes induced in ρ_a by varying a particular parameter is not as clear for the fourth and fifth vector pairs of fig. 8. However, if the two are lumped together (their eigenvalues are of the same order of magnitude), the effect of varying t₂ or ρ₂ manifests itself in data from positions 9 to 21, as expected.
Effects of weighting
From a comparison of figs 8 and 9 it is obvious (as proposed by Rijo et al. 1977) that taking log₁₀ of the observed and calculated data is the same as weighting by a fixed relative error. The only difference appears in the estimated standard errors, where those in fig. 8 used σ² = 1 in equation (19) and those in fig. 9 used

$$\sigma^2 = \frac{\Delta G_{\log_{10}}^T\, \Delta G_{\log_{10}}}{N - M}$$

in equation (19). The standard errors estimated in fig. 8 encompass the true parameter errors in all cases, whereas the standard errors shown in fig. 9 are consistently too small.
The eigenvector pairs, correlations, and parameter standard errors calculated with no data weighting are shown in fig. 7. The eigenvectors are essentially the same as those in figs 8 and 9, with one major exception: the data eigenvectors associated with the parameters ρ₂, t₂, and ρ₃ have their components shifted to larger spacings. This bias occurs because the large ρ_a values toward larger AB/2 dominate the sum of squares. In effect, each data point is weighted by its own magnitude. In addition, a much higher degree of parameter correlation is found when no data weighting is used, and the estimated parameter standard errors are much larger than those calculated by the weighted schemes. For an example of this, compare the correlation coefficients between ρ₃ and ρ₂ and between ρ₃ and t₂ in figs 7 and 8.
Weighting the data by their standard errors, as indicated in equation (18), seems to be the most advisable because it is the most flexible to varying data errors and yields the smallest standard errors consistent with the true error, as shown in figs 7 through 9.
The parameter eigenvectors and correlation coefficients for Inman's (1975) model are given in fig. 10.

Fig. 10. Parameter eigenvectors and eigenvalues, and parameter correlations for Inman's (1975) model.

Correlation matrix (fig. 10):
      ρ₁     t₁     ρ₂      t₂     ρ₃
ρ₁   1.0    0.86   0.21  −0.22   0.11
t₁          1.0    0.57  −0.58   0.16
ρ₂                 1.0   −0.988  0.24
t₂                        1.0   −0.29
ρ₃                               1.0

The first three eigenvalues are all of about the same order of
magnitude, with the last two very much smaller. The linear combination of parameters represented by the first three eigenvectors has the greatest effect on the sounding curve. In these three eigenvectors the elements ρ₁ and t₁ have opposite signs while the elements ρ₂ and t₂ have the same sign. This indicates that if ρ₂ and t₂ are both changed in the same direction, the effect on the sounding curve (fig. 11) will be larger than the effect of similar changes on other parameters. In addition, if ρ₁ increases while t₁ decreases, or vice versa, the sounding curve will also change. The eigenvector associated with λ₄² = 0.081 indicates, since λ₄ is small, that increasing or decreasing ρ₁ and t₁ together will have little effect on the sounding curve (fig. 11). In other words, the ratio t₁/ρ₁ is the combination of these parameters which affects the sounding curve the most (note that the correlation coefficient between ρ₁ and t₁ is +0.86). The eigenvector associated with λ₅² = 0.0097 indicates that increasing ρ₂ while decreasing t₂, or vice versa, has little effect on the sounding curve (i.e. only the product ρ₂t₂ affects fig. 12). Again, this is also indicated by the correlation coefficient between ρ₂ and t₂, which is −0.988.

Figs 11 and 12. Sounding curves of ρ_a against electrode spacing AB/2 (m) for the two test models, with the data points numbered (ρ₃ = 1000 Ωm for model 3).
A measure of the fit between the observed and calculated apparent resistivity is the data variance σ², given by Hamilton (1964) as

$$\sigma^2 = \frac{\sum_{i=1}^{N} \bigl[\rho_{o,i} - \rho_{c,i}\bigr]^2}{N - M},$$

where ρ_o and ρ_c are the observed and calculated apparent resistivities.
Each of the five routines was used to invert data from both models; the final value of σ², the number N of forward problem evaluations, and the resulting parameters are listed below.

Model 3        σ²     N    ρ₁      t₁    ρ₂    t₂     ρ₃     t₂/ρ₂
Peckham       0.44    75  100.20  49.7  3.6   121.2   998   33.66
Marquardt     0.188   28  101.01  49.6  3.97  133.3  1005   33.57
Bremermann  212.0    192  101.53  50.2  3.92  133.4  1107   34.0
Simplex       0.034  332  100.02  49.6  3.6   120.0  1002   33.33
Spiral        0.048   45  100.01  50.1  2.7    91.7   997   33.96
True values               100     50    3     100    1000   33

Model 4        σ²     N    ρ₁     t₁     ρ₂      t₂      ρ₃     ρ₂t₂
Peckham       0.065  100   9.99   9.98  385.2   252.3  10.0   97185.0
Marquardt     0.019   29   9.99   9.96  382.8   254.7   9.99  97499.0
Bremermann    1.56    67  10.41  11.29  524.23  181.87 10.05  95342.0
Simplex       9.03   299  10.00  10.3   594.8   162.0  10.09  96357.0
Spiral        0.041   43  10.00   9.89  380.3   256.4  10.0   97508.0
True values               10     10     390     250    10     97500
The table also contains the linear combination of parameters for the middle layer which is best determined by the data: for model 3 (conductive middle layer) this is t₂/ρ₂, while for Inman's model (resistive middle layer, model 4) it is the product ρ₂t₂.

The initial guess for Inman's model was that used by Inman (1975):

ρ₁ = 8 Ωm, t₁ = 15 m,
ρ₂ = 500 Ωm, t₂ = 150 m,
ρ₃ = 5 Ωm.
Fig. 15. Progression of the estimate of ρ₂ for model 3 as a function of the number of forward problem evaluations. (- - - -) Marquardt, (· · · ·) spiral, (-) Peckham.
Figure 15 shows that Peckham's parameter estimates develop an oscillation in values near the true solution, whereas the other two routines converge much more rapidly to a solution.

One final property of the solutions that should be noted is the expression of the equivalence principle, as seen in the progression of the longitudinal conductance S₂ = t₂/ρ₂ (fig. 17). For all three routines the longitudinal conductance S₂ reached the true value much faster than either individual parameter, and even when both parameters ρ₂ and t₂ are in error (as they are for both spiral and Peckham), their ratio t₂/ρ₂ has been accurately determined. This is a realization of the fact that, for a thin conductive layer, S₂ is the quantity best determined by the data.

Fig. 17. Progression of the estimate of t₂/ρ₂ for model 3 as a function of the number of forward problem evaluations. (- - - -) Marquardt, (· · · ·) spiral, (-) Peckham.
Considering both models, the ranking in terms of reaching the lowest σ² with the fewest number of forward problem evaluations is (1) ridge regression, (2) spiral algorithm, (3) Peckham's method, (4) Bremermann's method, (5) simplex method.
It should be noted that, for Bremermann’s method, the number of function evalu-
ations used in the iterative process is independent of the number of parameters describ-
ing the system, while the other four routines require more function evaluations as the
number of parameters increases. For this reason, the Bremermann method would
compare more favorably for models with a large number of parameters.
CONCLUSION
It has been demonstrated that parameter statistics such as parameter standard
errors, parameter correlation, and associated eigenvectors can be greatly affected by
choice of data weighting. The use of inverse data error weighting is flexible and yields
the most reliable parameter standard error estimates. In addition, the relationships
between parameter and data eigenvectors are physically correct and not biased as in
the case where no weighting is used.
In the comparison of the five least-squares minimization algorithms, the ridge
regression algorithm proved to require the fewest number of forward problem evalu-
ations to reach a desired fit. The ranking of the five algorithms is the same for both models tested, indicating that the relative speeds of the algorithms are, at least to some degree, model-independent.
ACKNOWLEDGMENTS
This work was supported by the Assistant Secretary for Conservation and
Renewable Energy, Office of Renewable Technology, Division of Geothermal and
Hydropower Technologies of the US Department of Energy under Contract
No. W-7405-ENG-48. A FORTRAN IV version of the spiral algorithm was obtained
for testing through the courtesy of Shell Research Ltd.
REFERENCES
ALPIN, L.M. 1966, Dipole Methods for Measuring Earth Conductivity, translated by G.V. Keller, Consultants Bureau.
BREMERMANN, H. 1970, A method of unconstrained global optimization, Mathematical Biosciences 9, 1-15.
COMPAGNIE GENERALE DE GEOPHYSIQUE 1955, Abaques de sondage électrique, Geophysical Prospecting 3, Suppl. 3.
COMPAGNIE GENERALE DE GEOPHYSIQUE, 1963, Abaques de sondage electrique, EAEG, The
Hague.
CROUS, C.M. 1971, Computer-assisted interpretation of electrical soundings, MS thesis, Colorado School of Mines.
FLATHE, H. 1955, A practical method of calculating geoelectric model graphs for horizontally
stratified media, Geophysical Prospecting 3, 268-294.
FOSTER,M. 1961, An application of the Wiener-Kolmogorov smoothing theory to matrix
inversion, Journal of the Society for Industrial and Applied Mathematics 9, 387-392.
GHOSH,D.P. 1971, The application of linear filter theory to the direct interpretation of
geoelectric resistivity measurements, Geophysical Prospecting 19, 192-217.
GINZBURG, A., LOEWENTHAL, D. and SHOHAM, Y. 1976, On the automated interpretation of
direct current resistivity, Pure and Applied Geophysics 114, 983-995.
GLENN, W.E. and WARD, S.H. 1976, Statistical evaluation of electrical sounding methods. Part 1: Experiment design, Geophysics 41, 1207-1222.
GOLUB, G. 1965, Numerical methods for solving linear least squares problems, Numerische Mathematik 7, 206-216.
HAMILTON, W.C. 1964, Statistics in Physical Sciences, Estimation, Hypothesis Testing, and
Least Squares, Ronald Press Co., New York.
HILDEBRAND, F.B. 1949, Advanced Calculus for Engineers, Prentice Hall, New York.
HOERL,A.E. and KENNARD, R.W. 1970a, Ridge regression: Biased estimation for nonortho-
gonal problems, Technometrics 12, 55-67.
HOERL,A.E. and KENNARD, R.W. 1970b, Ridge regression: Applications to nonorthogonal
problems, Technometrics 12, 69-82.
INMAN, J.R., RYU, J. and WARD, S.H. 1973, Resistivity inversion, Geophysics 38, 1088-1108.
INMAN, J.R. 1975, Resistivity inversion with ridge regression, Geophysics 40, 798-817.
JACKSON, D.D. 1972, Interpretation of inaccurate, insufficient, and inconsistent data, Geophy-
sical Journal of the Royal Astronomical Society 28, 97-109.
JENKINS, G.M. and WATTS,D.G. 1968, Spectral Analysis and its Application, Holden-Day Inc.,
San Francisco.
JONES, A. 1970, Spiral: a new algorithm for non-linear parameter estimation using least squares, Computer Journal 13, 301-308.
JUPP,D.L.B. and VOZOFF,K. 1975, Stable iterative methods for the inversion of geophysical
data, Geophysical Journal of the Royal Astronomical Society 42, 952-976.
KALENOV, E.N. 1957, Interpretation of Vertical Electrical Sounding Curves, Gostoptekhizdat,
Moscow.
KELLER, G.V. and FRISCHKNECHT, F.C. 1966, Electrical Methods in Geophysical Prospecting,
Pergamon Press, New York.
KOEFOED, 0. 1965, Direct methods of interpreting resistivity observation, Geophysical Pros-
pecting 13, 568-591.
KOEFOED, 0. 1966, The direct interpretation of resistivity observations made with a Wenner
electrode configuration, Geophysical Prospecting 14, 71-79.
KOEFOED, O. 1968, The Application of the Kernel Function in Interpreting Geoelectrical Measurements, p. 111, Gebrüder Borntraeger, Berlin.
KUNETZ, G. 1966, Principles of Direct-current Resistivity Prospecting, p. 103, Gebrüder Borntraeger, Berlin.
LANCZOS, C. 1961, Linear Differential Operators, D. Van Nostrand, London.
LANGER, R.E. 1933, An inverse problem in differential equations, Bulletin of the American Mathematical Society, Series 2, 39, 814-820.
LEVENBERG, K. 1944, A method for the solution of certain nonlinear problems in least squares,
Quarterly of Applied Mathematics, 2, 164-168.
MARQUARDT, D.W. 1963, An algorithm for least-squares estimation of non-linear parameters,
Journal of the Society for Industrial and Applied Mathematics, 11, 431-441.
MARQUARDT, D.W. 1970, Generalized inverses, ridge regression, biased linear estimation, and
non-linear estimation, Technometrics 12, 591-612.
MEINARDUS, H.A. 1967, The kernel function in direct-current resistivity sounding, MS thesis, Colorado School of Mines.
MEINARDUS, H.A. 1970, Numerical interpretation of resistivity soundings over horizontal beds, Geophysical Prospecting 18, 415-433.
MOONEY, H.M. and WETZEL, W.W. 1956, The Potentials about a Point Electrode and Appar-
ent Resistivity Curves, University of Minnesota Press, Minneapolis.
NELDER,J.A. and MEAD,R. 1965, A simplex method for function minimization, Computer
Journal 7, 308-313.
ONODERA, S. 1960, The kernel function in the multiple-layer resistivity problem, Journal of
Geophysical Research 65, 3787-3794.
ORELLANA, E. and MOONEY, H.M. 1966, Master Tables and Curves for Vertical Electrical
Sounding over Layered Structures, Interciencia, Madrid.
PECKHAM, G. 1970, A new method for minimizing a sum of squares without calculating gradients, Computer Journal 13, 418-420.
PEKERIS, C.K. 1940, Direct method of interpretation in resistivity prospecting, Geophysics 5,
31-42.
PETRICK, W.R., PELTON,W.H. and WARD,S.H. 1977, Ridge regression inversion applied to
crustal resistivity sounding data from South Africa, Geophysics 42, 995-1006.
POWELL,M.J.D. 1964, An efficient method for finding the minimum of a function of several
variables without calculating derivatives, Computer Journal 7, 155-162.
RIJO, L., PELTON, W.H., FEITOSA, E.C. and WARD, S.H. 1977, Interpretation of apparent resistivity data from Apodi Valley, Rio Grande do Norte, Brazil, Geophysics 42, 811-822.
ROY, A. and APPARAO, A. 1971, Depth of investigation in direct current methods, Geophysics 36, 943-959.
RIJKSWATERSTAAT 1969, Standard Graphs for Resistivity Prospecting, EAEG, The Hague.
SLICHTER, L.B. 1933, Interpretation of resistivity prospecting for horizontal structures, Physics
4, 307-322; also erratum, p. 407.
SPENDLEY, W., HEXT,G.R. and HIMSWORTH, F.R. 1962, Sequential application of simplex
design in optimization and evolutionary operation, Technometrics 4, 441-461.
STEFANESCO, S.S., SCHLUMBERGER, C. and SCHLUMBERGER, M. 1930, Sur la distribution électrique potentielle autour d'une prise de terre ponctuelle dans un terrain à couches horizontales, homogènes et isotropes, Journal de Physique et le Radium 1, 132-140.
STEVENSON, A.F. 1934, On the theoretical determination of earth resistance from surface
potential measurements, Physics 5, 114-124.
SUNDE, E.D. 1949, Earth Conduction Effects in Transmission Systems, Dover Publications, Inc., New York.
VANYAN,L.L., MOROZOVA, G.M. and LOZHEMITINA, L. 1962, On the calculation of theoretical
electrical sounding curves, Prikladnaya Geofizika 34, 135-144 (in Russian).
VOZOFF, K. 1958, Numerical resistivity analysis: horizontal layers, Geophysics 23, 536-556.
VOZOFF, K. and JUPP, D.L.B. 1975, Joint inversion of geophysical data, Geophysical Journal of the Royal Astronomical Society 42, 977-991.
WIGGINS,R.A. 1972, The generalized inverse problem, Reviews of Geophysics and Space
Physics 10, 251-286.
ZOHDY,A.A.R. 1965, The auxiliary point method of electrical sounding interpretation, and its
relationship to the Dar Zarrouk parameters, Geophysics 30, 644-660.
ZOHDY,A.A.R. 1968, The effect of current leakage and electrode spacing errors on resistivity
measurements, in Geological Survey Research, US Geological Survey Professional Paper
600-D, pp. D258-D264.
ZOHDY,A.A.R. 1974a, Use of Dar Zarrouk curves in the interpretation of vertical electrical
sounding data, US Geological Survey Bulletin, 1313-D.
ZOHDY,A.A.R. 1974b, A computer program for the calculation of Schlumberger sounding
curves by convolution, US Geological Survey Report GD-74-010, PB-232-056.
ZOHDY,A.A.R. 1974c, A computer program for the automatic interpretation of Schlumberger
sounding curves over horizontally stratified media, US Geological Survey Report
GD-74-017, PB-232-703.