
Geophysical Prospecting 30, 688-715, 1982.

COMPARISON OF FIVE
LEAST-SQUARES INVERSION TECHNIQUES
IN RESISTIVITY SOUNDING*

G.M. HOVERSTEN***, A. DEY** and H.F. MORRISON***

ABSTRACT
HOVERSTEN, G.M., DEY, A. and MORRISON, H.F. 1982, Comparison of Five Least-Squares
Inversion Techniques in Resistivity Sounding, Geophysical Prospecting 30, 688-715.
A brief history of the development of the inverse problem in resistivity sounding is
presented together with the development of the equations governing the least-squares inverse. Five
algorithms for finding the minimum of the least-squares problem are described and their speed
of convergence is compared on data from two planar earth models. Of the five algorithms
studied, the ridge-regression algorithm required the fewest number of forward problem
evaluations to reach a desired minimum.
Solution space statistics, including (1) parameter standard errors, (2) parameter
correlation coefficients, (3) model parameter eigenvectors, and (4) data eigenvectors are
discussed. The type of weighting applied to the data affects these statistical parameters. Weighting
the data by taking log₁₀ of the observed and calculated values is comparable to weighting
by the inverse of a constant data error. The most reliable parameter standard errors are obtained
by weighting by the inverse of observed data errors. All other solution statistics, such as data-
parameter eigenvector pairs, have more physical significance when inverse data error weighting
is used.

INTRODUCTION
Interpretation of resistivity soundings has been a topic of research since the early
1900s. Contributions to the recent geophysical literature have dealt with the develop-
ment and application of a wide variety of one-dimensional inversion techniques and
approaches for estimating uncertainties in the resulting parameters. This paper pre-
sents a comparison of five least-squares minimization algorithms. The comparison is
made in terms of the number of forward problem evaluations required by each

* Received March 1981; revision August 1981.


** Now with Chevron Resources Inc., San Francisco, California 94119, USA.
*** Engineering Geosciences, Material Science and Mineral Engineering, University of Cali-
fornia, Berkeley, California 94720, USA.




technique to reach a residual minimum. Three weighting schemes which affect the
estimated parameter errors are compared at the minimum reached for a specific
model.
Until the advent of fast digital computers, the interpreter relied primarily on
curve matching procedures, where albums of theoretical curves (Compagnie Gener-
ale de Geophysique 1955, 1963, Mooney and Wetzel 1956, Flathe 1955, Orellana and
Mooney 1966, Rijkswaterstaat 1969) are used alone or in conjunction with the
auxiliary point method of partial curve matching (Kalenov 1957, Orellana and
Mooney 1966, Zohdy 1965). This method, while undoubtedly the most convenient
and simple, suffers from the drawback that the published curves cover only a limited
number of cases.

HISTORICAL DEVELOPMENT OF INVERSE METHODS

The forward problem expressed in terms of integral expression for potential and
apparent resistivity for the Schlumberger electrode array was developed by Stefan-
esco, Schlumberger and Schlumberger (1930). With these and the associated kernel
function, development began on interpretation based on determining earth par-
ameters by fitting a theoretical apparent resistivity curve or kernel function to its
observed counterpart. Similar expressions for apparent resistivity have been
developed for many other electrode arrays (Roy and Apparao 1971, Alpin 1966,
Keller and Frischknecht 1966). This general approach remains a very popular
method of inversion today. Another popular inversion method, which will not be
considered here, is based on the use of Dar Zarrouk parameters. This is well
described by Zohdy (1965, 1968, 1974a, 1974b, 1974c).
Stefanesco et al. (1930) derived the integral expressions

V(r) = (ρ₁ I / 2πr) [1 + 2r ∫₀^∞ Θ(λ, k, t) J₀(λr) dλ],    (1)

ρₐ(r) = ρ₁ [1 + 2r² ∫₀^∞ Θ(λ, k, t) J₁(λr) λ dλ],    (2)

where I is the current applied to the earth, r is the current electrode spacing, V(r) is
the electrode potential at r = AB/2, ρₐ(r) is the Schlumberger apparent resistivity, ρ₁
is the top layer resistivity, J₀ and J₁ are the zero- and first-order Bessel functions, respectively,
Θ(λ, k, t) is the Stefanesco kernel function, λ is the integration variable, k is the
resistivity reflection coefficient, and t is the layer thickness.
The Slichter kernel (Vozoff 1958) is defined by

T(λ, k, t) = ρ₁[1 + 2Θ(λ, k, t)].    (3)

This kernel can also be related to apparent resistivity through the inverse Hankel
transformation of (2):

T(λ, k, t) = ∫₀^∞ (ρₐ(r)/r) J₁(λr) dr.    (4)

The Slichter kernel can also be expressed in a closed, nonintegrable, recursive form
for an arbitrary number n of layers of thickness tᵢ (Sunde 1949), where

T_{1,2,…,n}(λ) = [1 − P_{1,2,…,n} e^{−2λt₁}] / [1 + P_{1,2,…,n} e^{−2λt₁}],    (5)

with

P_{1,2,…,n} = (ρ₁ − ρ₂ T_{2,3,…,n}) / (ρ₁ + ρ₂ T_{2,3,…,n}),

P_{(m−1)m,…,n} = (ρ_{m−1} − ρ_m T_{m(m+1),…,n}) / (ρ_{m−1} + ρ_m T_{m(m+1),…,n}),

and

P_{(n−1)n} = (ρ_{n−1} − ρ_n) / (ρ_{n−1} + ρ_n).
The first general approach mentioned above consists of an algorithm that uses (2)
or some similar expression to calculate the forward problem and then to vary model
parameters until the calculated apparent resistivity ρₐ matches the observed data to
some specified tolerance. Alternatively, the kernel function is derived from the apparent
resistivity data via (4), and (5) is used as the forward problem in a least-squares
method. This latter approach has the advantage that (5) is much faster to evaluate
than the forward problem represented by (2).
The inverse problem was first approached in the kernel domain by Slichter (1933)
using a solution for surface potentials developed by Langer (1933) for a one-
dimensional resistivity function which varied continuously with depth. Slichter’s
procedure was to determine the kernel function from the apparent resistivity and
then solve for the conductivity profile from the kernel function. Stevenson (1934)
modified the approach to accommodate a stepwise resistivity function of depth. A
partly graphic, partly numerical method was developed by Pekeris (1940) which also
used the Slichter kernel. The ease with which the kernel function can be calculated
encouraged many workers to concentrate on interpretation in the kernel domain.
Slichter used a power series representation for the kernel and later recursion formu-
lae [i.e. equation ( 5 ) ] were developed (Sunde 1949, Flathe 1955, Vanyan, Morozova
and Lozhemitina 1962, Kunetz 1966, Meinardus 1967).
More recently, Ghosh (1971) developed a set of coefficients for transforming
Schlumberger and Wenner sounding curves into the corresponding Slichter kernels,
using linear filter theory rather than numerical integration. Koefoed (1965, 1966,
1968) used the raised kernel H(λ). Many other authors have also done recent work
concerning inversion in the kernel domain. These include Crous (1971), Meinardus
(1967, 1970), Onodera (1960), Pekeris (1940), Vozoff (1958), and Ginzburg, Loewen-
thal and Shoham (1976).

Whether inversion is performed with kernel functions or directly with measured


data, fitting the calculated to the observed data is usually carried out in a least-
squares sense. The least-squares fitting is in turn performed on a linearized version of
the governing equations.
The most common method for generating this system of equations is to expand
the calculated functions, such as (2) and (5), in a Taylor series about an initial
estimate P⁰ in the parameter space. The series, neglecting second and higher order
terms, is

O(X, P)ᵢ = C(X, P⁰)ᵢ + Σ_{j=1}^{M} ∂/∂Pⱼ [C(X, P⁰)ᵢ] (Pⱼ − Pⱼ⁰),    (6)

with

j = 1, M and i = 1, N,

where O(X, P)ᵢ is the ith observation or kernel as a function of X, the system
parameter vector, C(X, P⁰)ᵢ is the ith calculated function value [i.e. (2) or (5)] for X
and P⁰, the current estimate of the unknown parameter set, and (Pⱼ − Pⱼ⁰) is the
linear estimate of the correction needed in the jth unknown parameter.
Rewriting (6) in matrix notation, neglecting second and higher order terms,

ΔG = A ΔP,    (7)

where

ΔGᵢ = O(X, P)ᵢ − C(X, P⁰)ᵢ,

Aᵢⱼ = ∂C(X, P⁰)ᵢ/∂Pⱼ evaluated at X = Xᵢ (the system matrix),    (8)

and

ΔPⱼ = Pⱼ − Pⱼ⁰.

Equation (7) represents a system of N linear equations in M unknowns, and since
this is a nonlinear problem, the solution of (7) provides only a linear estimate of ΔP,
the correction needed to reach the minimum. The classic least-squares inverse is

ΔP = (AᵀA)⁻¹AᵀΔG.    (9)

The solution must be iterated, beginning each time from the current estimate P⁰ of
the parameter set.
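As a concrete illustration of the iteration built from equations (7)-(9), the following sketch (NumPy, with a hypothetical forward-problem function forward(p) and a finite-difference approximation to the system matrix; none of these names come from the paper) applies the classic least-squares correction repeatedly from an initial estimate.

import numpy as np

def gauss_newton(forward, p0, observed, n_iter=10, h=1e-4):
    # forward(p) returns the calculated data vector C(X, P); forward and p0
    # are hypothetical stand-ins for illustration only.
    p = np.asarray(p0, dtype=float)
    for _ in range(n_iter):
        c = forward(p)
        dg = observed - c                            # delta-G of equation (7)
        A = np.empty((dg.size, p.size))              # system matrix of equation (8)
        for j in range(p.size):
            pj = p.copy()
            pj[j] += h * max(1.0, abs(p[j]))         # forward-difference step
            A[:, j] = (forward(pj) - c) / (pj[j] - p[j])
        dp, *_ = np.linalg.lstsq(A, dg, rcond=None)  # stable solution of equation (9)
        p = p + dp                                   # iterate from the new estimate
    return p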
Vozoff (1958) calculated the kernel by inverse Hankel transformation and then
used least-squares minimization of

ΔG = Σ_{i=1}^{N} [K_{1,2,…,n}(λᵢ) − K_o(λᵢ)]²,    (10)

where K_{1,2,…,n} is the kernel calculated for a particular parameter set, and K_o is the
kernel integrated from the field data.
Vozoff (1958) considered two methods based on gradients of the system matrix to

minimize (10). The first was a functional iteration method and the second a steepest
descent method. The functional iteration procedure follows Newton’s technique for
finding roots of a nonlinear equation of one unknown parameter. The procedure is
discussed by Hildebrand (1949). It consists essentially of an iterative Taylor series
technique. An initial estimate is made for the values of the parameters, assuming that
the estimate is quite close to the true values. The difference or the objective function
AG is approximated by the first two terms of its Taylor series expansion in terms of
the unknown parameters. The expansion is then used to calculate the necessary
changes in the initial estimate. The procedure is iterated until no further changes are
needed.
The steepest descent method deals directly with the difference function AG. If one
maps AG in its m-parameter space, an m-dimensional least-squares surface is gen-
erated. Any choice of a parameter vector not at the minimum gives a point on the
least-squares surface where its m-dimensional gradient is calculated. The parameter
vector is changed so that the value of AG descends along the steepest initial gradient.
Because the surface is not truly described by its first-order gradient alone, the initial
direction will probably not extend to the true minimum. Therefore the parameter
vector is changed until the minimum of the least-squares surface AG is found in that
particular direction. At this point a new gradient is calculated and a new descent is
made in the direction of steepest gradient. This procedure is iterated until a point of
zero gradient is reached or until AG becomes less than some predetermined value.
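A minimal sketch of the steepest descent procedure just described, assuming only a generic misfit function delta_g(p); the numerical gradient, step sizes, and stopping tolerance are illustrative choices, not values from Vozoff (1958).

import numpy as np

def steepest_descent(delta_g, p0, n_iter=50, h=1e-5, tol=1e-10):
    # delta_g(p) is a generic sum-of-squares misfit; all names are assumptions.
    p = np.asarray(p0, dtype=float)
    for _ in range(n_iter):
        g0 = delta_g(p)
        grad = np.array([(delta_g(p + h * e) - g0) / h for e in np.eye(p.size)])
        if np.linalg.norm(grad) < tol:          # stop at (numerically) zero gradient
            break
        d = -grad / np.linalg.norm(grad)        # steepest initial direction
        step = 1.0                              # crude search for the minimum along d
        while delta_g(p + step * d) >= g0 and step > 1e-12:
            step *= 0.5
        p = p + step * d                        # new point; recompute the gradient next pass
    return p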
These two methods of minimizing a function of sums of squares are forerunners of
newer and more advanced methods which will be dealt with in this paper. The
general format of a least-squares approach is now widely used in both the kernel
domain and the apparent resistivity domain since the speed of new large computers
makes this time-consuming method more feasible and, more importantly, because it
lends itself well to statistical evaluation of the resulting parameters.
Many modern least-squares techniques are centered on the method of the gener-
alized inverse. The generalized inverse has been thoroughly discussed for the inver-
sion of surface wave and free oscillation data by Wiggins (1972) and Jackson (1972).
Jupp and Vozoff (1975), Vozoff and Jupp (1975) and Inman, Ryu and Ward (1973)
found that in many resistivity problems, when model parameters are strongly cor-
related, the system matrix (8) was nearly singular. The resulting non-orthogonality
was not satisfactorily dealt with by the generalized inverse. Hoerl and Kennard
(1970a, 1970b) developed a theory which shows that linear estimation from non-
orthogonal data could be improved by the use of biased estimators. Hoerl gave the
name “ridge regression” to this method. Marquardt (1970) discussed the relationship
between the generalized inverse and ridge regression and summarized his findings in
these words:

The ridge and generalized inverse estimators share many properties. Both are superior to
least-squares for ill-conditioned problems. The generalized inverse solution is especially
relevant for precisely zero eigenvalues (in the system matrix). The ridge solution is compu-
tationally simpler and seems better suited to coping with very small, but non-zero,
eigenvalues.

Most practical resistivity problems, owing to the many highly correlated par-
ameters that can result, involve small but non-zero eigenvalues. For this reason
Inman (1975) used ridge regression, with good results, for the inversion of resistivity
data. The ridge regression approach has also been used recently by Glenn and Ward
(1976), Rijo, Pelton, Feitosa and Ward (1977) and Petrick, Pelton and Ward (1977).

FIVE LEAST-SQUARES MINIMIZATION ALGORITHMS
The following descriptions of the algorithms closely follow the published literature.
Some notation has been changed from the original publications for consistency
within this paper.

1. The simplex method


The method is designed for the minimization of a function of M variables without
constraints.
Let Q₀, Q₁, …, Q_m be the (m + 1) points in the m-dimensional parameter space which
define the current least-squares surface, or simplex. Each point Q_k has its own parameter
vector of length m associated with it. Let ΔG_k be the function value at Q_k and
define h as the suffix such that ΔG_h = max_k(ΔG_k) and l as the suffix such that
ΔG_l = min_k(ΔG_k).
Next we define Q̄ as the centroid of the points with k ≠ h. (Qᵢ, Qⱼ) represents the
distance from Qᵢ to Qⱼ. At each stage in the process one of three operations (reflection,
contraction, or expansion) is used to replace Q_h with a new point in the
simplex. These three possible replacement points are defined as follows: the reflection
of Q_h is denoted Q* and its coordinates are defined by the relation

Q* = (1 + α)Q̄ − αQ_h,

where α, the reflection coefficient, is an arbitrary positive constant. This expression
indicates that Q* lies on the line joining Q_h and Q̄, on the far side of Q̄ from Q_h, with
(Q*, Q̄) = α(Q_h, Q̄). If the ΔG* which corresponds to Q* falls between ΔG_h and ΔG_l, Q_h
is replaced by Q* and the next iteration proceeds with the newly defined simplex.
If the reflection produces a new minimum ΔG* < ΔG_l, then a step is taken to try
and find a further minimum. The expanded point Q** is given by the relation

Q** = γQ* + (1 − γ)Q̄,

where the expansion coefficient γ is the ratio of the distance (Q**, Q̄) to the distance
(Q*, Q̄). If ΔG** < ΔG_l, Q_h is replaced by Q**, and the process begins again. However,
if ΔG** > ΔG_l, it is called a “failed expansion”, and Q_h is replaced by Q* before
restarting.
The third operation of contraction is used if, on reflecting Q_h to Q*, the condition
ΔG* > ΔG_k exists for all k ≠ h, i.e. if replacing Q_h by Q* would leave ΔG* the maximum;

then a new Q_h is defined to be either the old Q_h or Q*, whichever has the lower ΔG
value. The contracted point is defined by

Q*** = βQ_h + (1 − β)Q̄.

The contraction coefficient β lies between 0 and 1 and is defined by the ratio of the
distance (Q***, Q̄) to the distance (Q_h, Q̄). Q*** is used in place of Q_h unless
ΔG*** > min(ΔG_h, ΔG*), i.e. unless the contracted point Q*** is worse than the better of
Q_h and Q*. In this case of a “failed contraction”, all the Q_k's are replaced by
(Q_k + Q_l)/2 and the process is restarted.
The iteration process is set to run until ΔG reaches or falls below some preset
minimum value. The application of the simplex method to the minimization of a
function of many variables is discussed by Nelder and Mead (1965).
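The reflection, expansion, and contraction rules can be condensed into a short routine; the fragment below is an illustrative sketch of one simplex update, with the conventional coefficients α = 1, γ = 2, β = 0.5 assumed rather than quoted from Nelder and Mead (1965).

import numpy as np

def simplex_step(delta_g, Q, alpha=1.0, gamma=2.0, beta=0.5):
    # Q is an (m+1) x m array whose rows are the simplex points Q_k.
    vals = np.array([delta_g(q) for q in Q])
    h, l = int(np.argmax(vals)), int(np.argmin(vals))
    qbar = Q[np.arange(len(Q)) != h].mean(axis=0)   # centroid of points with k != h
    q_r = (1 + alpha) * qbar - alpha * Q[h]         # reflection Q*
    g_r = delta_g(q_r)
    if g_r < vals[l]:                               # new minimum: try to expand
        q_e = gamma * q_r + (1 - gamma) * qbar      # expansion Q**
        Q[h] = q_e if delta_g(q_e) < vals[l] else q_r
    elif g_r < np.max(np.delete(vals, h)):          # between extremes: accept reflection
        Q[h] = q_r
    else:                                           # reflection failed: contract
        if g_r < vals[h]:
            Q[h] = q_r                              # keep the better of Q_h and Q*
        q_c = beta * Q[h] + (1 - beta) * qbar       # contraction Q***
        if delta_g(q_c) < min(vals[h], g_r):
            Q[h] = q_c
        else:                                       # failed contraction: shrink toward Q_l
            Q[:] = (Q + Q[l]) / 2.0
    return Q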
2. Unconstrained global optimization (Bremermann)
This routine was developed initially for the solution of systems of nonlinear equa-
tions of up to 100 variables (Bremermann 1970). The routine has also been used for
finding maxima or minima of sums of squares, sums of exponentials, and curve
fitting. The method is briefly described as follows:
(a) ΔG is evaluated for the initial estimate of the parameter set P⁰.
(b) A random direction r is chosen. The probability distribution of r is an
N-dimensional Gaussian with σ₁ = σ₂ = … = σₙ = 1.
(c) On the line determined by P⁰ and r, the restriction of ΔG to this line is approximated
by five-point Lagrangian interpolation centered at P⁰ and equidistant with
the distance h, which is a parameter of the method.
(d) The Lagrangian interpolation of the restriction of ΔG is a fourth-degree polynomial
in the parameter λ describing the line P⁰ + λr. The five coefficients of the Lagrangian
interpolation polynomial are determined.
(e) The derivative of the interpolation polynomial is a third-degree polynomial with
one or three real roots. The roots are computed by Cardan's formula (Bremermann
1970).
(f) If there is one root λ₁, the procedure is iterated from the point P⁰ + λ₁r with a new
random direction, provided that ΔG(P⁰ + λ₁r) ≤ ΔG(P⁰). If this inequality does not
hold, the method is iterated from P⁰ with a new random direction.
(g) When there are three real roots λ₁, λ₂, λ₃, the polynomial is evaluated at
P⁰ + λ₁r, P⁰ + λ₂r and P⁰ + λ₃r. Also considering the value at P⁰, the procedure is
iterated from the point where ΔG has the smallest value. If ΔG has its smallest value
at more than one point, the algorithm chooses one of them.
(h) Iteration continues until a predetermined number of iterations has been run or
until a prescribed minimum has been reached.
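A sketch of a single Bremermann-style step following the description above; the misfit function delta_g, the spacing h, and the helper names are assumptions for illustration, and the five-point Lagrangian interpolation is realized here as an exact degree-four polynomial fit.

import numpy as np

def bremermann_step(delta_g, p0, h=0.1, rng=None):
    # One random-direction step of the global optimization routine (illustrative).
    rng = np.random.default_rng() if rng is None else rng
    p0 = np.asarray(p0, dtype=float)
    r = rng.standard_normal(p0.size)                  # Gaussian random direction
    lams = h * np.array([-2.0, -1.0, 0.0, 1.0, 2.0])  # five equidistant points
    vals = [delta_g(p0 + lam * r) for lam in lams]
    quartic = np.polyfit(lams, vals, 4)               # exact degree-4 interpolation
    roots = np.roots(np.polyder(quartic))             # cubic derivative: 1 or 3 real roots
    real = [z.real for z in roots if abs(z.imag) < 1e-12]
    best = min([0.0] + real, key=lambda lam: delta_g(p0 + lam * r))
    return p0 + best * r                              # move only if it does not worsen delta_g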

3. Peckham’s method
This method was specifically designed for minimizing a sum of squares of nonlinear
functions without calculating the gradients which make up the matrix A (Peckham
1970).

The work of Spendley, Hext and Himsworth (1962) and Nelder and Mead (1965)
in developing methods for minimizing functions in which the function is evaluated at
(N + 1) or more points forming a simplex in N-dimensional space suggested to
Peckham that, for problems where the function is a sum of squares, the function
values at a set of (N + 1) or more points might be used to estimate values for the
coefficients Cᵢ and Aᵢⱼ in (8). These could then be used in (9) for a linear estimate of
the minimum. One iteration consists of replacing the point of the set with the highest
function value by the linear estimate of the minimum position.
Assume that there are function values Q_il for a set of p points with parameters P_jl,
where p ≥ N + 1 and l = 1, 2, …, p. Now consider the minimization at each of the
points in the M-dimensional hyperspace. The linear approximation is obtained by
picking C and A to minimize the N expressions

Σ_{l=1}^{p} ω_l² [Q_il − Cᵢ − Σ_{j=1}^{M} Aᵢⱼ P_jl]²,    (11)

where i = 1, 2, …, N and ω_l is a weighting factor to determine the relative importance
of each point in the simplex. The weighted mean is chosen as the origin for Pⱼ,
so that

Σ_{l=1}^{p} ω_l² P_jl = 0.

With the change of variables P′_jl = ω_l P_jl and Q′_il = ω_l Q_il, the values of C and A
which minimize ΔG are given by

P′P′ᵀAᵀ = P′Q′ᵀ,

with

C = (1/R) Q′ω,

where ω is the vector of weights ω_l and R = Σ_{l=1}^{p} ω_l².

If these values are substituted into (9), the linear estimate of the parameter set
at the minimum is

P_E = −(1/R)(P′P′ᵀ)(P′Q′ᵀQ′P′ᵀ)⁻¹(P′Q′ᵀQ′ω).    (12)

In order to solve equation (12) it is rewritten as

P_E = −(1/R) P′P′ᵀ Z,    (13)

where

(P′Q′ᵀQ′P′ᵀ) Z = P′Q′ᵀQ′ω.    (14)

Equation (14) is the normal equation of a linear least-squares problem, i.e. the
Euclidean norm ‖Q′ω − Q′P′ᵀZ‖ has a minimum when Z satisfies (14).
The solution is obtained by use of an orthogonal transformation (Golub 1965).
Peckham (1970) used the ALGOL procedure “Orthlin 2” for his solution. In addition,
the values of ω_l were chosen to give function values near the minimum more weight
in determining C and A:

ω_l = 1/ΔG_l.
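Based on the reconstruction of equations (12)-(14) given above, the linear estimate can be computed by solving the least-squares problem for Z with an orthogonal (QR-based) factorization rather than by forming the normal equations explicitly; the sketch below is illustrative only and its array names are assumptions.

import numpy as np

def peckham_linear_estimate(Pp, Qp, w, R):
    # Pp: M x p array of weighted parameter coordinates P'
    # Qp: N x p array of weighted function values Q'
    # w:  length-p vector of weights; R: normalizing scalar from the text.
    rhs = Qp @ w                                    # Q'w
    lhs = Qp @ Pp.T                                 # Q'P'^T
    Z, *_ = np.linalg.lstsq(lhs, rhs, rcond=None)   # minimizes ||Q'w - Q'P'^T Z||
    return -(Pp @ (Pp.T @ Z)) / R                   # P_E of equation (13)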

4. Ridge regression
Rather than using function values in an M-dimensional hyperspace to estimate
values for Cᵢ and Aᵢⱼ, it is very popular to use direct forward problem calculations for
Cᵢ and finite differences of these values to calculate Aᵢⱼ. The simplest approach is to
solve (7), neglecting weighting, for ΔP by calculating the least-squares inverse
(AᵀA)⁻¹Aᵀ. However, Hoerl and Kennard (1970a) show that when (AᵀA) is nearly
singular, as it can be in many geophysical problems, the average difference between
the estimated ΔP and the true ΔP becomes very large. The ridge regression method (Levenberg
1944, Foster 1961, Marquardt 1963, 1970) seeks to reduce this difference during
the iteration process by damping the diagonal terms of (AᵀA). The ridge regression
estimate ΔP_RR is

ΔP_RR = (AᵀA + KI)⁻¹AᵀΔG,    (15)

where I is the identity matrix and K ≥ 0. According to Inman (1975):

The eigenvalues of (AᵀA + KI) are (λᵢ² + K), where λᵢ² are the eigenvalues of AᵀA. Any
very small eigenvalues of the least-squares estimator will be increased in the ridge regression
estimator by a factor K. Hence the inversion of the matrix (AᵀA + KI) will be more
stable. Increasing the size of all the eigenvalues results in a significant decrease of (a) the
mean of the squared length between the true ΔP and ΔP_RR, and (b) the variance of the
estimated solution.

The basic concept of the technique is that the best direction for finding a reduced
sum of squares lies somewhere between the direction given by the Taylor series
increment and the direction of steepest descent. In (15), when K = 0, ΔP approaches
the Taylor series direction. This ensures second-order convergence near the minimum.
The effect of decreasing K in each iteration on the resulting intermediate
model is that the model will at first fit the broad, low-frequency aspects of the data,
with higher-frequency components being fitted as K decreases. Hoerl and Kennard
(1970a) give an excellent example of the comparison of the ridge regression inverse
and the classic least-squares inverse. A more detailed discussion of ridge regression
and its relationship to the generalized inverse is given by Marquardt (1970). The
algorithm tested is routine ZXSSQ from the IMSL computer library.
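A minimal sketch of one ridge-regression correction of equation (15), with a finite-difference system matrix; the choice of differencing step and of how K is scheduled between iterations are assumptions here, and ZXSSQ's internal strategy is more elaborate.

import numpy as np

def ridge_step(forward, p, observed, K, h=1e-4):
    # One correction dP = (A^T A + K I)^-1 A^T dG, equation (15).
    p = np.asarray(p, dtype=float)
    c = forward(p)
    dg = observed - c
    A = np.empty((dg.size, p.size))
    for j in range(p.size):                       # finite-difference system matrix
        pj = p.copy()
        pj[j] += h * max(1.0, abs(p[j]))
        A[:, j] = (forward(pj) - c) / (pj[j] - p[j])
    dp = np.linalg.solve(A.T @ A + K * np.eye(p.size), A.T @ dg)
    return p + dp

Reducing K toward zero as the fit improves moves the correction from the steepest-descent direction toward the Taylor series direction, as described above.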

5. The spiral algorithm

The “spiral algorithm” and its comparison with methods by Marquardt and Powell are
described in a paper by Jones (1970) for a number of models whose derivatives with
respect to the parameters (i.e. the elements of A) are analytic. Only the general
concepts of the algorithm, following Jones (1970), will be given here.
The main principle of the algorithm is that a reduction in ΔG can always be
found in the parameter space plane defined by the Taylor series estimate of the
minimum, the steepest descent estimate of the minimum, and the starting point for
that iteration (see fig. 1).

Fig. 1. Contours of constant sum of squares, illustrating different directions toward the minimum
sum of squares (t = Taylor series direction, s = steepest descent direction).


Fig. 2. Geometric picture of the operation of the spiral algorithm.



Figure 2 represents the plane ODT in parameter space,
where O is the starting point for an iteration, T is the Taylor series point and D is the
steepest descent point, chosen such that the distance OD is equal to OT. The basic
strategy is that the next starting point should be as far away from O as possible while
keeping the number of evaluations of the least-squares surface to a minimum.
Within an iteration, the first point checked is the Taylor series point T, which is
generated by Marquardt's method. If ΔG_T < ΔG_O, this point is accepted as the new
minimum and a new Taylor series point is calculated from there. If ΔG_T ≥ ΔG_O,
then the linear approximation of the model at O is not valid at T. This implies that
the sum of squares valley must curve in one of the two directions indicated by the
dashed lines in fig. 2. In order to try and intercept the valley, the spiral OST is
searched. The curve moves out from T at angle β into the search area OTD and
returns to O along a tangent to line OD. The optimum equation for the spiral found
by Jones (expressed in polar coordinates with origin at O) is

r = r₀[1 − θ cos β − (1 − cos β)(θ/γ)²],


where r is the distance OS and r₀ is the distance OT.
The points S that are checked on the spiral are determined from a sequence of
points L which are generated on the segment TD. The points L divide the segment
TD in the ratio p : (1 − p), where p is computed from the recurrence relation

p_{n+1} = 2p_n/(1 + p_n).

This relation was chosen to ensure that the points L become closer together as they
near D.
The coordinates (a, θ) of the points L are given by the relations

θ = tan⁻¹ [p sin γ / (1 − p + p cos γ)]

and

a = r₀ p sin γ / sin θ.

With the starting point O as origin, the coordinates of the point S in parameter
space are given in terms of t and d, the coordinates of T and D respectively, by the
relation

s = (r/a)[p d + (1 − p) t].    (16)

Equation (16) is the main operating equation of the spiral algorithm, aside from
the Marquardt technique for generating T. If ΔG_T > ΔG_O, each successive search
point is derived as a weighted sum of the two parameter space vectors T and D.
The entire algorithm is more complicated than described here since it contains a

provision for dealing with spurious local minima. Interpolations are also performed
when three consecutive search points yield reduced sums of squares, the interpolated
minimum being checked to speed convergence.

PARAMETER STATISTICS
The simplest view of the inverse problem considered here is that of an automated
“curve fitting” procedure. A model is derived by finding a parameter distribution
that will produce a theoretical function, either ρₐ or the kernel function, that fits the
observed data in some least-squares sense. Weighted least-squares are usually used
so that the data can be fitted in one of two ways: (1) data are fitted uniformly when
the percentage data error is equal for all data (e.g. resistivity data are usually assumed
to have a constant percentage error at all electrode spacings), and (2) data are
selectively fitted when data errors are variable (e.g. in EM soundings the data error is
usually a function of frequency).
An ideal weighting scheme for the first case is logarithmic (Rijo et al. 1977); the
function to be minimized is

ΔG = Σ_{i=1}^{N} [log₁₀(observedᵢ) − log₁₀(calculatedᵢ)]².    (17)

Since the relative data errors are constant, this scheme weights the data equally and
eliminates the requirement for a weight vector to be included.
In the case of variable data error it would seem appropriate to weight each data
point in inverse proportion to its error. The following equation represents this type
of weighting:

ΔG = Σ_{i=1}^{N} [(observedᵢ − calculatedᵢ)/σᵢ]²,    (18)

where σᵢ is the standard error of the ith data point.
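Written out directly, the two weighting schemes of equations (17) and (18) are simple sums of squares; in the sketch below obs, calc, and sigma stand for the observed data, calculated data, and data standard errors (names assumed).

import numpy as np

def misfit_log10(obs, calc):
    # Equation (17): logarithmic weighting, suited to a constant relative data error.
    d = np.log10(obs) - np.log10(calc)
    return float(np.sum(d ** 2))

def misfit_inverse_error(obs, calc, sigma):
    # Equation (18): each residual scaled by the standard error of its datum.
    d = (obs - calc) / sigma
    return float(np.sum(d ** 2))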


For the least-squares inverse the weights enter as a matrix, and (9) becomes
ΔP = (AᵀW⁻¹A)⁻¹AᵀW⁻¹ΔG, where W is the diagonal matrix of data variances σᵢ².
The choice of a weighting scheme does not appreciably affect the speed of a minimization.
However, it can drastically affect the position of the minimum in parameter
space and the parameter statistics.
The effects of weighting represented by (17) and (18) along with “no weighting”
on statistical parameters such as standard errors, correlation coefficients, and eigen-
vectors will be discussed after these statistical parameters have been examined.
The statistical parameters which are useful in characterizing our models are: (1)
parameter standard errors, and (2) parameter correlation coefficients. In addition to
these statistical parameters, the parameter and data eigenvectors with their asso-
ciated eigenvalues can yield great insight into the relations between individual model
parameters and specific data.

Parameter standard errors and correlations are derived from the covariance
matrix V evaluated at the minimum:

V = σ²(AᵀW⁻¹A)⁻¹,    (19)

where

σ² = ΔGᵀW⁻¹ΔG / (N − M),

Wᵢⱼ = σᵢ² for i = j and 0 for i ≠ j,

and

ΔGᵀΔG = Σ_{i=1}^{N} [observedᵢ − calculatedᵢ]².

If log₁₀ weighting is being used, as in equation (17), then the weighting is replaced by taking log₁₀
of the observed and calculated values in ΔG.
The parameter standard errors are defined by the square roots of the diagonal
terms of V (e.g. √V₁₁ equals the standard error for parameter number 1).
The correlation matrix is the diagonally normalized covariance matrix. Its terms
are the correlation coefficients, which are measures of the linear dependence between
parameters. The correlation matrix C is given by Jenkins and Watts (1968) as

Cᵢⱼ = Vᵢⱼ / √(Vᵢᵢ Vⱼⱼ).

If an element Cᵢⱼ is near ±1, then the ith and jth parameters are strongly linearly
dependent. For example, if i represented the thickness t and j the resistivity ρ of a
layer (i.e. Cᵢⱼ represents the correlation between the thickness and resistivity of a
layer), then only the ratio t/ρ is well determined by the data if Cᵢⱼ ≈ +1. This case is
true for layers that are highly conductive relative to their surroundings. If Cᵢⱼ ≈ −1,
then only the product ρt is well determined, as is the case for relatively resistive
layers. This is the familiar equivalence problem discussed, for example, by Sunde
(1949).
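The quantities above follow mechanically from the system matrix at the minimum; the sketch below (array names assumed) evaluates the covariance matrix of equation (19), the parameter standard errors, and the correlation matrix.

import numpy as np

def parameter_statistics(A, dg, sigma):
    # A: N x M system matrix at the minimum; dg: residual vector; sigma: data
    # standard errors. All names are illustrative assumptions.
    N, M = A.shape
    Winv = np.diag(1.0 / sigma ** 2)                # W^-1 with W = diag(sigma_i^2)
    s2 = (dg @ Winv @ dg) / (N - M)                 # residual variance of eq. (19)
    V = s2 * np.linalg.inv(A.T @ Winv @ A)          # covariance matrix, eq. (19)
    std_err = np.sqrt(np.diag(V))                   # parameter standard errors
    C = V / np.outer(std_err, std_err)              # diagonally normalized: correlations
    return V, std_err, C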
The relationship between parameter correlations and parameter standard errors
is well explained by Inman (1975), and we will paraphrase him here. If the correla-
tions are small, then the standard errors, given by the square roots of the diagonals of
(19), are a good measure of the uncertainty of each parameter. If, however, two
parameters are highly correlated, i.e. Cᵢⱼ ≈ ±1, then the standard deviations will be
larger than the actual uncertainties. Figure 3 illustrates this fact with a generalized
slice of solution space. The two coordinate axes correspond to two parameters of the
estimated layered earth model. The ellipse indicates a confidence region within which
the residual sum of squares is expected to lie for a certain fraction of the repeated
experiments. This region also defines the values of the parameters ρ₂ (resistivity) and
t₂ (thickness) which give a residual sum of squares within the contour. The origin is
defined by the parameter value at the final solution.

Fig. 3. Standard error ellipse (after Inman 1975).

The tilt of the axis of the ellipse is a measure of the degree of correlation between the two parameters. If the standard
errors from (19) are taken to be the true deviation estimates, then the ellipse
is enclosed by a box whose sides are defined by the standard deviation. The box,
which ignores parameter correlation, represents a much larger confidence region
than the ellipse. By using the standard deviation implied by the box one obtains a
very conservative estimate of the parameter confidence interval for correlated par-
ameters. Therefore, by considering the standard deviations in conjunction with par-
ameter correlations, a more realistic parameter standard deviation can be arrived at
which is always less than or equal to the standard deviation computed from (19).
Two models, one described by Inman (1975) and one of our choice (called model 3),
are considered for comparison of the inversion routines and to illustrate some con-
cepts of the parameter statistics (see below, figs 11 and 12).
A common misconception about parameter standard errors is that a single model
parameter can be varied by its estimated standard error with no significant change
resulting in the calculated forward problem. In fact, the parameter standard errors
and correlation coefficients must be viewed as representing a complex interactive
system that describes combinations of parameter changes which can be made without
a significant change in the estimated least-squares residual. For example, consider
model 3 with its conductive middle layer. The eigenvectors, eigenvalues,
correlation coefficients, and parameter standard errors, calculated using data errors
as weights in equation (18), are shown in fig. 8. Note the high positive correlation
between ρ₂ and t₂, indicating a linear relation between the two parameters, i.e.
S₂ = t₂/ρ₂. Figure 4 shows the sounding curve for model 3 along with the error envelope
generated by varying ρ₂ by its standard error. Similarly, fig. 5 shows the error
envelope generated by varying t₂ by its standard error. Clearly the change in ρₐ is
much larger than the 1% error assumed in the data. However, if the ratio t₂/ρ₂ is
varied (see fig. 6) by changing both parameters simultaneously by their standard
errors, as indicated by their correlation, the change observed in ρₐ is of the order of 1%,
the assumed error level, and thus is not a statistically significant change.
The parameter eigenvectors and their associated eigenvalues are also very useful
in defining the relation between parameters and their overall effect on the data
generated from a particular model.

Fig. 4. Model 3 sounding curve with 1% variation of ρ₂.

Lanczos (1961) factored the system matrix A into its row (parameter) and column
(observation) eigenvectors. The generalized inverse of A is defined in terms of these
eigenvectors and eigenvalues as

H = A⁻¹ = VΛ⁻¹Uᵀ.

The matrix U consists of q (q is the rank of A) eigenvectors uᵢ of length N associated
with the columns (data) of A. V is made up of the q eigenvectors vᵢ of length M
associated with the rows (parameters) of A. The matrix Λ⁻¹ is the inverse of the
diagonal matrix comprised of the eigenvalues of A. Figures 7 through 9 show these
quantities for model 3. They will be discussed later.
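In practice the Lanczos factorization can be obtained from a standard singular-value decomposition; a brief illustrative sketch, keeping only the q eigenvalues above a tolerance and forming H = VΛ⁻¹Uᵀ:

import numpy as np

def generalized_inverse(A, tol=1e-10):
    # U holds the data eigenvectors, V the parameter eigenvectors,
    # lam the eigenvalues of the system matrix A.
    U, lam, Vt = np.linalg.svd(A, full_matrices=False)
    q = int(np.sum(lam > tol * lam.max()))           # effective rank q of A
    U, lam, V = U[:, :q], lam[:q], Vt[:q].T
    H = V @ np.diag(1.0 / lam) @ U.T                 # H = V Lambda^-1 U^T
    return H, U, lam, V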

Fig. 5. Model 3 sounding curve with 1% variation of t₂.



Fig. 6. Model 3 sounding curve with 1% variation of t₂/ρ₂.

The q parameter eigenvectors comprising V are a new parametrization of the
model; they are the q specific linear combinations of the parameters that can be
uniquely determined by the data. Similarly, the q data eigenvectors are the linear
combinations of data which are tied, through the assumed model, to the linear
combinations of parameters formed in V.
Fig. 7. Parameter and data eigenvectors with associated eigenvalues, parameter correlations,
and best-fit model parameters for minimization without weighting. The correlation matrix
(lower triangle, parameter order ρ₁, t₁, ρ₂, t₂, ρ₃), true model, and best fit ± standard error are:

ρ₁:  1.0                               true 100.0    ρ₁ = 99.99 ± 0.21
t₁: −0.27  1.0                         true 50.0     t₁ = 49.85 ± 1.053
ρ₂:  0.17 −0.96  1.0                   true 3.0      ρ₂ = 3.31 ± 2.17
t₂:  0.16 −0.96  1.0   1.0             true 100.0    t₂ = 110.81 ± 73.26
ρ₃:  0.16 −0.90  0.94  0.94  1.0       true 1000.0   ρ₃ = 1005.7 ± 4.48

Fig. 8. Parameter and data eigenvectors with associated eigenvalues, parameter correlations,
and best-fit model parameters for minimization with data weighted by their standard errors
(standard error 1% of the data value). The correlation matrix (lower triangle, parameter order
ρ₁, t₁, ρ₂, t₂, ρ₃), true model, and best fit ± standard error are:

ρ₁:  1.0                               true 100.0    ρ₁ = 99.99 ± 0.39
t₁: −0.41  1.0                         true 50.0     t₁ = 49.85 ± 0.56
ρ₂:  0.15 −0.83  1.0                   true 3.0      ρ₂ = 3.31 ± 0.81
t₂:  0.15 −0.83  0.99  1.0             true 100.0    t₂ = 110.81 ± 27.80
ρ₃:  0.03 −0.27  0.36  0.36  1.0       true 1000.0   ρ₃ = 1005.7 ± 11.90

Fig. 9. Parameter and data eigenvectors, parameter correlations, and best-fit model parameters
for weighting by taking log₁₀ of the observed and calculated data. The correlation matrix
(lower triangle, parameter order ρ₁, t₁, ρ₂, t₂, ρ₃), true model, and best fit ± standard error are:

ρ₁:  1.0                               true 100.0    ρ₁ = 99.99 ± 0.04
t₁: −0.41  1.0                         true 50.0     t₁ = 49.85 ± 0.06
ρ₂:  0.15 −0.83  1.0                   true 3.0      ρ₂ = 3.31 ± 0.09
t₂:  0.15 −0.83  0.99  1.0             true 100.0    t₂ = 110.81 ± 3.10
ρ₃:  0.02 −0.21  0.28  0.29  1.0       true 1000.0   ρ₃ = 1005.7 ± 1.30

A useful analogy is to consider the eigenvectors as spectral components of the
input and output of the linearized system. The decomposition of the matrix A is then
similar to the decomposition of the impulse response of an ordinary linear filter in
terms of sinusoids (eigenvectors) of various amplitudes (eigenvalues). For a linear
filter the amplitude response at a particular frequency determines how the filter will
amplify the corresponding spectral component of the output.
Similarly, if we think of the matrix A as a filter relating the input (parameters) to
the output (calculated data), then the eigenvalues are the amplification coefficients
which determine the magnitude of the effect of the linear combination of parameters
vᵢ on the linear combination of data uᵢ. Small eigenvalues and their associated
eigenvectors represent the spectral components which are poorly transferred through
the earth model. By considering these eigenvector-eigenvalue decompositions, one
can optimize data sets to contain the maximum information related to a model
parameter of particular interest. The subject of experiment design by this method is
discussed by Glenn and Ward (1976).
Figure 8 presents (i) the parameter eigenvectors (columns of V), (ii) the eigenvalues
(diagonal elements of Λ), (iii) the data eigenvectors (columns of U), whose numbering
refers to the data points labelled in fig. 12, (iv) the parameter correlation coefficients,
and (v) the model parameter values with estimated standard errors. Some insight into
the physical significance of this eigenvector decomposition can be gained by considering
the effects of varying parameters on the sounding curve. Figure 13 shows the
variation in ρₐ caused by changing ρ₁ by approximately 50%. Similarly, fig. 14
shows the variation in ρₐ caused by a 50% change in t₁. Note that the variation in ρₐ
occurs from data point 1 to data points 10 or 11. Compare this with the second and
third eigenvector pairs of fig. 8, both of whose parameter eigenvectors are composed
of ρ₁, t₁ components. Their corresponding data eigenvectors have components from
positions 1 to 10. The same relation holds for the other eigenvector pairs. Compare
fig. 4, produced by changing ρ₂, with the first eigenvector pair of fig. 8. The correspondence
between the eigenvector pair and the changes induced in ρₐ by varying a
particular parameter is not as clear for the fourth and fifth vector pairs of fig. 8.
However, if the two are lumped together (the eigenvalues are of the same order of
magnitude), the effect of varying t₂ or ρ₂ manifests itself in data from positions 9 to
21, as expected.

Effects of weighting
From a comparison of figs 8 and 9 it is obvious (as proposed by Rijo et al. 1977) that
taking log₁₀ of the observed and calculated data is the same as weighting by a fixed
relative error. The only difference appears in the estimated standard errors, where
those in fig. 8 used σ² = 1 in equation (19) and those in fig. 9 used

σ² = ΔG_{log₁₀}ᵀ ΔG_{log₁₀} / (N − M)

in equation (19). The standard errors estimated in fig. 8 encompass the true parameter
errors in all cases, whereas the standard errors shown in fig. 9 are consistently
too small.

The eigenvector pairs, correlations, and parameter standard errors calculated with
no data weighting are shown in fig. 7. The eigenvectors are essentially the same as
those in figs 8 and 9, with one major exception: the data eigenvectors associated with
the parameters ρ₂, t₂, and ρ₃ have their components shifted to larger spacings. This bias
occurs because the large ρₐ values toward larger AB/2 dominate the sum of squares.
In effect, each data point is weighted by its own magnitude. In addition, a much
higher degree of parameter correlation is found when no data weighting is used, and
the estimated parameter standard errors are much larger than those calculated by the
weighted schemes. For an example of this, compare the correlation coefficients for
ρ₃ρ₂ and ρ₃t₂ in figs 7 and 8.
Weighting the data by their standard errors, as indicated in equation (18), seems to
be the most advisable because it is the most flexible for varying data errors and yields
the smallest standard errors consistent with the true error, as shown in figs 7
through 9.
Fig. 10. Parameter eigenvectors and eigenvalues, and parameter correlations for Inman's
(1975) model. The correlation matrix (upper triangle, parameter order ρ₁, t₁, ρ₂, t₂, ρ₃) is:

ρ₁:  1.0   0.86   0.21  −0.22   0.11
t₁:        1.0    0.57  −0.58   0.16
ρ₂:               1.0   −0.988  0.24
t₂:                      1.0   −0.29
ρ₃:                             1.0

The parameter eigenvectors and correlation coefficients for Inman's (1975) model
are given in fig. 10. The first three eigenvalues are all of about the same order of
magnitude, with the last two very much smaller. The linear combinations of parameters
represented by the first three eigenvectors have the greatest effect on the
sounding curve. In these three eigenvectors the elements ρ₁ and t₁ have opposite
signs while the elements ρ₂ and t₂ have the same sign. This indicates that if ρ₂ and t₂
are both changed in the same direction, the effect on the sounding curve (fig. 11) will
be larger than the effect of similar changes on other parameters. In addition, if ρ₁
increases while t₁ decreases, or vice versa, the sounding curve will also change.

Fig. 11. Schlumberger sounding curve for Inman's (1975) model.

Fig. 12. Schlumberger sounding curve for model 3 (ρ₃ = 1000 Ωm).

Fig. 13. Variation of ρₐ for model 3 with 10% change in ρ₁.



The eigenvector associated with λ₄² = 0.081 indicates, since λ₄ is small, that increasing or
decreasing ρ₁ and t₁ together will have little effect on the sounding curve (fig. 11). In
other words, the ratio t₁/ρ₁ is the combination of these parameters which affects the
sounding curve the most (note that the correlation coefficient between ρ₁ and t₁ is
+0.86). The eigenvector associated with λ₅² = 0.0097 indicates that increasing ρ₂
while decreasing t₂, or vice versa, has little effect on the sounding curve (i.e. only the
product ρ₂t₂ affects fig. 12). Again, this is also indicated by the correlation
coefficient between ρ₂ and t₂, which is −0.988.

Fig. 14. Variation of ρₐ for model 3 with 10% change in t₁.

By considering the parameter eigenvectors and parameter correlation coefficients
of this model, it can be seen that the product ρ₂t₂ is better determined by, and has a
greater effect on, these data than either ρ₂ or t₂ separately. This is also true, to a lesser
extent, of the parameters in the ratio t₁/ρ₁.

Comparison of the inversion algorithms


The five methods discussed here were compared on two three-layer planar earth
models (figs 11 and 12). The data were generated by a numerical evaluation of
equation (2). Each of the five routines was modified to use log parameters to ensure
determination of physically meaningful parameter sets [i.e. no negative parameter
values (see Rijo et al. 1977)]. Each routine minimizes the same ΔG [equation (17)] and
begins its process from the same initial guess of the parameter vector.
The comparison can be made in terms of the number of forward problem evaluations
required to reach a desired minimum and by considering the accuracy of the
determined parameters. The criterion for defining the degree of fit between the

served and calculated apparent resistivity is the data variance c2,given by Hamilton
(1964) as:

c =
2 Ci”=lI00 - Pcl?
N-M ’

where ( N - M ) is the number of degrees of freedom of the system.


Note that the measure of the fit is defined in real space and not in log or weighted
space.
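A small sketch of the fit criterion (names assumed), emphasizing that the residuals are taken in the original apparent-resistivity values even when the minimization itself works on log₁₀ data.

import numpy as np

def data_variance(rho_obs, rho_calc, n_params):
    # Hamilton-style data variance: squared residuals in real (unlogged,
    # unweighted) apparent resistivity per degree of freedom.
    resid = np.asarray(rho_obs) - np.asarray(rho_calc)
    return float(np.sum(resid ** 2) / (resid.size - n_params))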
The results of the five inversions on both models are given in table 1. The number
of forward problem evaluations, N, required to reach a data variance estimate σ²,
and the resulting parameters, are shown. The table also contains the linear combination
of the middle-layer parameters which is best determined by the data: for Inman's
model (resistive middle layer) this is the product ρ₂t₂, and for model 3 (conductive
middle layer) it is the ratio t₂/ρ₂.

Table 1. Results of the five inversions on models 3 and 4.

              σ       N     ρ₁       t₁      ρ₂       t₂       ρ₃      t₂/ρ₂ or ρ₂t₂
Model 3
Peckham       0.44    75    100.20   49.7    3.6      121.2    998     33.66
Marquardt     0.188   28    101.01   49.6    3.97     133.3    1005    33.57
Bremermann    212.0   192   101.53   50.2    3.92     133.4    1107    34.0
Simplex       0.034   332   100.02   49.6    3.6      120.0    1002    33.33
Spiral        0.048   45    100.01   50.1    2.7      91.7     997     33.96
True values                 100      50      3        100      1000    33

Model 4
Peckham       0.065   100   9.99     9.98    385.2    252.3    10.0    97185.0
Marquardt     0.019   29    9.99     9.96    382.8    254.7    9.99    97499.0
Bremermann    1.56    67    10.41    11.29   524.23   181.87   10.05   95342.0
Simplex       9.03    299   10.00    10.3    594.8    162.0    10.09   96357.0
Spiral        0.041   43    10.00    9.89    380.3    256.4    10.0    97508.0
True values                 10       10      390      250      10      97500

σ = data variance estimate. N = number of calculations of the matrix A.
(For model 3 the last column is t₂/ρ₂; for model 4 it is ρ₂t₂.)

The initial guess for Inman's model was that used by Inman (1975):

ρ₁ = 8 Ωm,    t₁ = 15 m,
ρ₂ = 500 Ωm,  t₂ = 150 m,
ρ₃ = 5 Ωm.

The initial guess for model 3 was:

ρ₁ = 80 Ωm,   t₁ = 100 m,
ρ₂ = 20 Ωm,   t₂ = 50 m,
ρ₃ = 500 Ωm.
The true model parameters of Inman’s model and model 3 are given in figs 11
and 12 respectively.
A general convergence criterion of σ = 0.05 was used. The values of σ can
become very much less than 0.05 in a single iteration (e.g. on iteration 30, σ = 0.36
for Marquardt's method; on iteration 31, σ = 0.026). Iterations were stopped if the
number of function calls was excessive compared with the other routines; this only
occurred for the Bremermann and simplex methods.
In a comparison of this type, where we are using ideal data and requiring a
close fit, all the parameters are just about equally resolved, given a low σ. For both
models, three methods, viz. ridge regression, Peckham's method and the spiral algorithm,
all reached fairly close values and very similar parameter estimates (table 1).
For this reason, a comparison can be made simply in terms of N, the number of
function calls. The iterative progression of parameters in the different routines is of
interest since it can give an indication of how a particular routine reaches a model.
For Inman's model the ranking in terms of N is (1) ridge regression, (2) spiral, (3)
Peckham, (4) Bremermann, and (5) simplex. Considering the accuracy of the parameters
ρ₂ and t₂, it is interesting to note that Peckham's method has more accurate estimates
for ρ₂ and t₂ than the ridge regression or spiral methods, although the additional
50-plus function calls outweigh the increased accuracy. The Bremermann and
simplex estimates for t₂ are very poor, but it is interesting to note in the case of both
simplex and Bremermann that, although ρ₂ and t₂ are in error by as much as 100%,
their product ρ₂t₂ is only 2% low.
The ranking in terms of N for model 3 is identical to the ranking for Inman's
model. All of the routines gave good estimates for ρ₁, t₁, and ρ₃. All five also gave good
estimates of the ratio t₂/ρ₂, even though the individual parameters were either both
high (i.e. Peckham, ridge regression, Bremermann, simplex) or both low (i.e. spiral).
The progression of the conductive middle-layer parameters ρ₂, t₂, and t₂/ρ₂ is
plotted as a function of forward problem evaluations in figs 15, 16, and 17 respectively
for the three fastest routines. The behavior of the two parameters ρ₂ and t₂ is
characteristic of the other parameters (not shown).
It is immediately obvious that the parameter progression for Peckham's method is
characterized by large oscillations from one forward problem evaluation to another,
as contrasted with the rather smooth transitions of parameter values for the spiral algorithm
and for Marquardt's method. The oscillations in Peckham's method are due to
using a simplex to define the forward problem function space, since each new set of
parameter estimates is obtained through an arbitrary operation, such as reflection or
contraction. This is to be contrasted with the almost monotonic progression
of parameters in the spiral and Marquardt routines, which essentially use Taylor
series increments to define the forward problem function space.
Fig. 15. Progression of the estimate of ρ₂ for model 3 as a function of the number of forward
problem evaluations. (- - - -) Marquardt, (. . . .) spiral, (-) Peckham.

It is also worth noting that Peckham's parameter estimates develop an oscillation
in values near the true solution, whereas the other two routines converge much more
rapidly to a solution.
One final property of the solutions that should be noted is the expression of the
equivalence principle as seen in the progression of the longitudinal conductance
S₂ = t₂/ρ₂.
Fig. 16. Progression of the estimate of t₂ for model 3 as a function of the number of forward
problem evaluations. (- - - -) Marquardt, (. . . .) spiral, (-) Peckham.

Fig. 17. Progression of the estimate of t₂/ρ₂ for model 3 as a function of the number of forward
problem evaluations. (- - - -) Marquardt, (. . . .) spiral, (-) Peckham.

For all three routines the longitudinal conductance S₂ reached the true
value much faster than either individual parameter, and even when both parameters
ρ₂ and t₂ are in error (as they are for both spiral and Peckham), their ratio t₂/ρ₂ has
been accurately determined. This is a realization of the fact that, for a thin conductive
layer, S₂ is the quantity best determined by the data.
Considering both models, the ranking in terms of reaching the lowest σ² with the
fewest number of forward problem evaluations is (1) ridge regression, (2) spiral algorithm,
(3) Peckham's method, (4) Bremermann's method, (5) simplex method.
It should be noted that, for Bremermann’s method, the number of function evalu-
ations used in the iterative process is independent of the number of parameters describ-
ing the system, while the other four routines require more function evaluations as the
number of parameters increases. For this reason, the Bremermann method would
compare more favorably for models with a large number of parameters.

CONCLUSION
It has been demonstrated that parameter statistics such as parameter standard
errors, parameter correlation, and associated eigenvectors can be greatly affected by
choice of data weighting. The use of inverse data error weighting is flexible and yields
the most reliable parameter standard error estimates. In addition, the relationships
between parameter and data eigenvectors are physically correct and not biased as in
the case where no weighting is used.
In the comparison of the five least-squares minimization algorithms, the ridge
regression algorithm proved to require the fewest number of forward problem evaluations
to reach a desired fit. The ranking of the five algorithms is the same for both
models tested, indicating that the relative speeds of the algorithms are, at least to some
degree, model-independent.

ACKNOWLEDGMENTS
This work was supported by the Assistant Secretary for Conservation and
Renewable Energy, Office of Renewable Technology, Division of Geothermal and
Hydropower Technologies of the US Department of Energy under Contract
No. W-7405-ENG-48. A FORTRAN IV version of the spiral algorithm was obtained
for testing through the courtesy of Shell Research Ltd.

REFERENCES
ALPIN,L.M. 1966, Dipole methods for measuring earth conductivity, trans. KELLER, G.V.,
Consultants Bureau.
BREMERMANN, H. 1970, A method of unconstrained global optimization, Mathematical Bio-
sciences, 9, 1-15.
COMPAGNIE GENERALE DE GEOPHYSIQUE, 1955, Abaques de sondage tlectrique: Geophysical
Prospecting 3, Suppl. 3.
COMPAGNIE GENERALE DE GEOPHYSIQUE, 1963, Abaques de sondage electrique, EAEG, The
Hague.
CROUS, C.M. 1971, Computer-assisted interpretation of electrical soundings, MS thesis, Colorado
School of Mines.
FLATHE, H. 1955, A practical method of calculating geoelectric model graphs for horizontally
stratified media, Geophysical Prospecting 3, 268-294.
FOSTER,M. 1961, An application of the Wiener-Kolmogorov smoothing theory to matrix
inversion, Journal of the Society for Industrial and Applied Mathematics 9, 387-392.
GHOSH,D.P. 1971, The application of linear filter theory to the direct interpretation of
geoelectric resistivity measurements, Geophysical Prospecting 19, 192-217.
GINZBURG, A., LOEWENTHAL, D. and SHOHAM, Y. 1976, On the automated interpretation of
direct current resistivity, Pure and Applied Geophysics 114, 983-995.
GLENN,W.E. and WARD,S.H. 1976, Statistical evaluation of electrical sounding methods.
Part 1: Experiment design, Geophysics 41, 6A, 1207-1222.
GOLUB, G. 1965, Numerical methods for solving linear least squares problems, Numerische
Mathematik 7, 206-216.
HAMILTON, W.C. 1964, Statistics in Physical Sciences, Estimation, Hypothesis Testing, and
Least Squares, Ronald Press Co., New York.
HILDEBRAND, F.B. 1949, Advanced Calculus for Engineers, Prentice Hall, New York.
HOERL,A.E. and KENNARD, R.W. 1970a, Ridge regression: Biased estimation for nonortho-
gonal problems, Technometrics 12, 55-67.
HOERL,A.E. and KENNARD, R.W. 1970b, Ridge regression: Applications to nonorthogonal
problems, Technometrics 12, 69-82.
INMAN, J.R., RYU,J. and WARD,S.H. 1973, Resistivity inversion, Geophysics 38, 1088-1108.
INMAN, J.R. 1975, Resistivity inversion with ridge regression, Geophysics 40, 798-817.
JACKSON, D.D. 1972, Interpretation of inaccurate, insufficient, and inconsistent data, Geophy-
sical Journal of the Royal Astronomical Society 28, 97-109.
JENKINS, G.M. and WATTS,D.G. 1968, Spectral Analysis and its Application, Holden-Day Inc.,
San Francisco.
JONES,A. 1970, Spiral-a new algorithm for non-linear parameter estimation using least
squares, Computer Journal 12, 301-308.

JUPP,D.L.B. and VOZOFF,K. 1975, Stable iterative methods for the inversion of geophysical
data, Geophysical Journal of the Royal Astronomical Society 42, 952-976.
KALENOV, E.N. 1957, Interpretation of Vertical Electrical Sounding Curves, Gostoptekhizdat,
Moscow.
KELLER, G.V. and FRISCHKNECHT, F.C. 1966, Electrical Methods in Geophysical Prospecting,
Pergamon Press, New York.
KOEFOED, 0. 1965, Direct methods of interpreting resistivity observation, Geophysical Pros-
pecting 13, 568-591.
KOEFOED, 0. 1966, The direct interpretation of resistivity observations made with a Wenner
electrode configuration, Geophysical Prospecting 14, 71-79.
KOEFOED, 0. 1968, The Application of the Kernel Function in Interpreting Geoelectrical
Measurements, p. 111, Gebrüder Borntraeger, Berlin.
KUNETZ, G. 1966, Principles of Direct-current Resistivity Prospecting, p. 103, Gebrüder
Borntraeger, Berlin.
LANCZOS, C. 1961, Linear Differential Operators, D. Van Nostrand, London.
LANGER,R.E. 1933, An inverse problem in differential equations, Bulletins of the American
Mathematical Society, Series 2, 29, 814-820.
LEVENBERG, K. 1944, A method for the solution of certain nonlinear problems in least squares,
Quarterly of Applied Mathematics, 2, 164-168.
MARQUARDT, D.W. 1963, An algorithm for least-squares estimation of non-linear parameters,
Journal of the Society for Industrial and Applied Mathematics, 11, 431-441.
MARQUARDT, D.W. 1970, Generalized inverses, ridge regression, biased linear estimation, and
non-linear estimation, Technometrics 12, 591-612.
MEINARDUS, H.A. 1967, The kernel function in direct-current resistivity sounding, MS thesis,
Colorado School of Mines.
MEINARDUS, H.A. 1970, Numerical interpretation of resistivity soundings over horizontal
beds, Geophysical Prospecting 18,415-433.
MOONEY, H.M. and WETZEL, W.W. 1956, The Potentials about a Point Electrode and Appar-
ent Resistivity Curves, University of Minnesota Press, Minneapolis.
NELDER,J.A. and MEAD,R. 1965, A simplex method for function minimization, Computer
Journal 7, 308-313.
ONODERA, S. 1960, The kernel function in the multiple-layer resistivity problem, Journal of
Geophysical Research 65, 3787-3794.
ORELLANA, E. and MOONEY, H.M. 1966, Master Tables and Curves for Vertical Electrical
Sounding over Layered Structures, Interciencia, Madrid.
PECKHAM, G. 1970, A new method for minimizing a sum of squares without calculating
gradients, Computer Journal 13, 418-420.
PEKERIS, C.K. 1940, Direct method of interpretation in resistivity prospecting, Geophysics 5,
31-42.
PETRICK, W.R., PELTON,W.H. and WARD,S.H. 1977, Ridge regression inversion applied to
crustal resistivity sounding data from South Africa, Geophysics 42, 995-1006.
POWELL,M.J.D. 1964, An efficient method for finding the minimum of a function of several
variables without calculating derivatives, Computer Journal 7, 155-162.
RIJO, L., PELTON, W.H., FEITOSA, E.C. and WARD, S.H. 1977, Interpretation of apparent
resistivity data from Apodi Valley, Rio Grande do Norte, Brazil, Geophysics 42,
811-822.
ROY, A. and APPARAO, A. 1971, Depth of investigation in direct current methods, Geophysics
36, 943-959.
RIJKSWATERSTAAT, 1969, Standard Graphs for Resistivity Prospecting, EAEG, The Hague.

SLICHTER, L.B. 1933, Interpretation of resistivity prospecting for horizontal structures, Physics
4, 307-322; also erratum, p. 407.
SPENDLEY, W., HEXT,G.R. and HIMSWORTH, F.R. 1962, Sequential application of simplex
design in optimization and evolutionary operation, Technometrics 4, 441-461.
STEFANESCO, S.S., SCHLUMBERGER, C. and SCHLUMBERGER, M. 1930, Sur la distribution
électrique potentielle autour d'une prise de terre ponctuelle dans un terrain à couches
horizontales, homogènes et isotropes, Journal de Physique et le Radium 1, 132-140.
STEVENSON, A.F. 1934, On the theoretical determination of earth resistance from surface
potential measurements, Physics 5, 114-124.
SUNDE,E.O. 1949, Earth Conductor Effects in Transmission Systems, Dover Publications,
Inc., New York.
VANYAN,L.L., MOROZOVA, G.M. and LOZHEMITINA, L. 1962, On the calculation of theoretical
electrical sounding curves, Prikladnaya Geofizika 34, 135-144 (in Russian).
VOZOFF,K. 1958, Numerical resistivity analysis-Horizontal layers, Geophysics 28, 536-556.
VOZOFF, K. and JUPP,D.L.B. 1975,Joint inversion of geophysical data, Geophysical Journal of
the Royal Astronomical Society 42, 977-991.
WIGGINS,R.A. 1972, The generalized inverse problem, Reviews of Geophysics and Space
Physics 10, 251-286.
ZOHDY,A.A.R. 1965, The auxiliary point method of electrical sounding interpretation, and its
relationship to the Dar Zarrouk parameters, Geophysics 30, 644-660.
ZOHDY,A.A.R. 1968, The effect of current leakage and electrode spacing errors on resistivity
measurements, in Geological Survey Research, US Geological Survey Professional Paper
600-D, pp. D258-D264.
ZOHDY,A.A.R. 1974a, Use of Dar Zarrouk curves in the interpretation of vertical electrical
sounding data, US Geological Survey Bulletin, 1313-D.
ZOHDY,A.A.R. 1974b, A computer program for the calculation of Schlumberger sounding
curves by convolution, US Geological Survey Report GD-74-010, PB-232-056.
ZOHDY,A.A.R. 1974c, A computer program for the automatic interpretation of Schlumberger
sounding curves over horizontally stratified media, US Geological Survey Report
GD-74-017, PB-232-703.
