Maximum-Likelihood Seismic Deconvolution

IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, VOL. GE-21, NO. 1, JANUARY 1983

JOHN J. KORMYLO, MEMBER, IEEE, AND JERRY M. MENDEL, FELLOW, IEEE
Abstract-The purpose of this paper is to describe a broad spectrum of seismic deconvolution problems and solutions which we refer to collectively as maximum-likelihood (seismic) deconvolution (MLD). Our objective is to perform deconvolution and wavelet estimation for the case of nonminimum phase wavelets. Our approach is to exploit state-variable technology, maximum-likelihood estimation, and a sparse spike train (Bernoulli-Gaussian) model for the reflection signal. Our solution requires detection of significant reflectors, wavelet and variance identification (nonlinear optimization), and estimation of the spike density parameter.
Fig. 1. Single-channel system whose measured output z(k) is filtered to give an estimate μ̂(k) of the system input μ(k). The MVD filter needs information about V(k), n(k), and μ(k).
I. INTRODUCTION

SEISMIC DECONVOLUTION is the signal-processing procedure for removing the effects of the seismic source wavelet from a seismogram, the result being an estimate of the reflectivity sequence. We express the measured seismogram z(k) as

z(k) = Σ_{i=1}^{k} V(k - i) μ(i) + n(k) ≜ V_R(k) + n(k).   (1)
In this model V_R(k) is the noise-free seismogram; n(k) is "measurement noise," which accounts for physical effects not explained by V_R(k), as well as sensor noise; V(i), i = 0, 1, ..., is a sequence associated with the seismic source wavelet; and μ(i) is the reflectivity sequence, which contains valuable information about the earth's subsurface. Many deconvolution procedures exist [1]; however, most are limited by restrictive modeling assumptions such as minimum-phase source wavelets and stationary noise and/or reflectivity sequences. Mendel [2], [3] and Mendel and Kormylo [4], [5] developed a state-space/minimum-variance estimation procedure for deconvolution that has the flexibility to eliminate these restrictive modeling assumptions. They developed deconvolution filters which perform optimal smoothing (Fig. 1); but these filters require certain information about the seismogram model (1) that usually is not available ahead of time. Often, one does not know the source wavelet. In a land experiment, it is too expensive to drill a deep hole to measure it; in a marine experiment, it is sometimes measured, but oftentimes it is not.

Manuscript received January 13, 1982; revised March 23, 1982. This work was supported in part by the National Science Foundation under Grant NSF ECS-7926454, the U.S. Geological Survey under Grant 14-08-0001-G-553, Chevron Oil Field Research Company under Contract 76, and by the Sponsors of the University of Southern California Geo-Signal Processing Program. The work was performed at the Department of Electrical Engineering, University of Southern California, Los Angeles, CA.
J. J. Kormylo is with the Exxon Production Research Company, Houston, TX 77001.
J. M. Mendel is with the Department of Electrical Engineering, University of Southern California, Los Angeles, CA 90007.
A recorded marine signature is a near-field measurement which differs dramatically from the far-field signature needed for seismic processing; hence, even if the marine wavelet is recorded, it is of limited use. Variances of n(k) and μ(k) are also, generally, not known a priori. Of course, if all of this information is known then minimum-variance deconvolution (MVD) can be applied directly to the seismic data. We shall refer to this as the "everything is known" case. If, on the other hand, some or all of the information needed by the deconvolution filter in Fig. 1 is unavailable, it must be estimated.

The purpose of this paper is to describe a broad spectrum of seismic deconvolution problems and solutions, ranging from the "everything is known" case to the "almost nothing is known" case. We refer to these problems and their solution techniques as maximum-likelihood (seismic) deconvolution (MLD). This is a generic name which includes both conditional and unconditional likelihood [6], terms which are explained in detail in Section III. Although we may not know V(k) and/or the statistics of n(k), we will always assume that some limited information is known about μ(k). By "limited information" we mean that μ(k) cannot be measured; at best, we may know some statistical information about it. For example, we may know that μ(k) is Bernoulli-Gaussian (this model is described in Section II), but we do not know the parameters that characterize its probability density function.

To reiterate, the ultimate objective in seismic deconvolution is to design a digital signal processor which reconstructs the reflectivity sequence as accurately as possible. It must be able to do this using only noisy measurements and a priori limited information about the input. In general, we must also estimate the effective source wavelet, which is often nonminimum phase. Traditional approaches to deconvolution usually model reflectivity as white Gaussian noise [7], [8], either explicitly or implicitly by using a quadratic objective function. This approach cannot resolve different phase realizations for the wavelet. We instead model reflectivity as a sparse spike train,
corresponding to an earth model which consists of a few strong reflectors. We assume that the reflectors are uniformly distributed over time (Bernoulli sequence) and that the corresponding reflection amplitudes are Gaussian distributed. The resulting likelihood objective function is not quadratic. Our approach is similar to other sparse spike train methods, such as the L1-norm method [9] and Minimum Entropy deconvolution [10]. These other methods use nonquadratic objective functions which are preferential to sparse spike trains. They are somewhat ad hoc in nature and may break down at low signal-to-noise ratios. An important difference between our approach and all other seismic deconvolution methods is the use of a low-order autoregressive moving-average (ARMA) model for the wavelet. Our principal reason for using an ARMA wavelet is to reduce the order of the resulting MVD filter, with a corresponding reduction in computational effort.
II. MODEL DESCRIPTION

Our starting point for MLD is a state-variable model for z(k) in (1). The state-variable model parameterizes wavelet V(k), i.e., we assume that

V(z^-1) = (β1 z^-1 + β2 z^-2 + ... + βn z^-n) / (1 + α1 z^-1 + α2 z^-2 + ... + αn z^-n)   (2)

in which α = col(α1, α2, ..., αn) and β = col(β1, β2, ..., βn) are the wavelet's parameters. Our state-variable model is

x(k + 1) = Φ x(k) + γ μ(k)   (3)

z(k) = h' x(k) + n(k)   (4)

where x(k) = col[x1(k), x2(k), ..., xn(k)],

Φ = [ 0        I_{n-1}
      -α_n  -α_{n-1}  ...  -α_1 ]   (5)

in which I_{n-1} denotes the (n - 1) × (n - 1) identity matrix,

γ = col(0, 0, ..., 0, 1)   (6)

and

h = col(β_n, β_{n-1}, ..., β_1).   (7)

Measurement noise n(k) is assumed to be Gaussian and white, and

E{n²(k)} = R.   (8)

In order to complete the problem description we must characterize the reflectivity sequence μ(k). In seismic deconvolution, μ(k) is often assumed to be a white sequence. We adhere to this assumption. We have found it useful to model μ(k) as a Bernoulli-Gaussian sequence ([11], [12], [13], [27]), which can be expressed very conveniently in the following product form:

μ(k) = r(k) q(k).   (9)

In this model q(k) is a Bernoulli sequence [i.e., a random sequence of zeros and ones [14]] with parameter λ:

Pr[q(k)] = λ for q(k) = 1, and 1 - λ for q(k) = 0   (10)

and r(k) is a zero-mean Gaussian white noise sequence with variance C. Sequences q(k) and r(k) are assumed to be statistically independent. One can describe μ(k) as a random spike sequence; when μ(k) ≠ 0 its amplitude is Gaussian. Additionally [27], μ(k), as defined in (9), is white, and

E{μ²(k)} = Cλ.   (11)

The product model is very versatile. If q(k) is deterministic and r(k) is Gaussian and white, then μ(k) is Gaussian, white, and nonstationary; its variance varies with q(k) as a function of time. If r(k) is constant and q(k) is Bernoulli, then μ(k) is again white; now, however, it is just a random sequence of zeros and ones. Reflectivity sequence μ(k) is white and is characterized by two parameters, C and λ. We are not able to measure μ(k); hence, this decomposition and description of r(k) and q(k) constitutes what we shall mean by "limited information" about μ(k).

Based on the discussions in this section, Fig. 1 can be reinterpreted as in Fig. 2. The wavelet information is contained in the 2n ARMA α- and β-parameters; the noise information is contained in variance R (plus our Gaussian assumption); and the reflectivity sequence information is contained in our product model and related quantities, such as Cλ, or knowledge of reflector locations q(k).

Fig. 2. Single-channel system of Fig. 1, redrawn to show the limited information available to the deconvolution filter: wavelet parameters (α, β), noise variance R, and reflectivity parameters C, λ, and q(k).
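To make the product model and the ARMA measurement model concrete, the following minimal sketch simulates a Bernoulli-Gaussian reflectivity sequence and a synthetic seismogram along the lines of (2)-(11). It is illustrative only: the function names are ours, and the numerical values of n, λ, C, R, and the α, β coefficients are hypothetical choices, not values taken from this paper.

```python
import numpy as np

def arma_state_space(alpha, beta):
    """Companion-form realization (Phi, gamma, h) of the ARMA wavelet in (2), per (5)-(7)."""
    n = len(alpha)
    Phi = np.zeros((n, n))
    Phi[:-1, 1:] = np.eye(n - 1)           # shift structure in the first n-1 rows
    Phi[-1, :] = -np.asarray(alpha)[::-1]  # last row carries -alpha_n ... -alpha_1
    gamma = np.zeros(n); gamma[-1] = 1.0   # gamma = col(0, ..., 0, 1), eq. (6)
    h = np.asarray(beta)[::-1]             # h = col(beta_n, ..., beta_1), eq. (7)
    return Phi, gamma, h

def simulate_seismogram(alpha, beta, N, lam, C, R, rng):
    """Generate mu(k) = r(k) q(k) and z(k) according to (3), (4), (9), (10)."""
    Phi, gamma, h = arma_state_space(alpha, beta)
    q = rng.random(N) < lam                      # Bernoulli event sequence, eq. (10)
    r = rng.normal(0.0, np.sqrt(C), N)           # Gaussian amplitudes with variance C
    mu = r * q                                   # sparse reflectivity, eq. (9)
    x = np.zeros(len(alpha))
    z = np.zeros(N)
    for k in range(N):
        z[k] = h @ x + rng.normal(0.0, np.sqrt(R))   # eq. (4)
        x = Phi @ x + gamma * mu[k]                  # eq. (3)
    return mu, q, z

rng = np.random.default_rng(0)
# Hypothetical 4th-order wavelet: stable poles, one zero outside the unit circle (nonminimum phase).
poles = [0.7 * np.exp(1j * 0.6), 0.7 * np.exp(-1j * 0.6), 0.5 * np.exp(1j * 1.2), 0.5 * np.exp(-1j * 1.2)]
alpha = np.real(np.poly(poles))[1:]
beta = np.real(np.poly([1.5, -1.2, 0.3]))
mu, q, z = simulate_seismogram(alpha, beta, N=400, lam=0.05, C=1.0, R=0.01, rng=rng)
```

A run of this sketch produces exactly the kind of sparse reflectivity and noisy seismogram that the remainder of the paper seeks to invert.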
III. MAXIMUM-LIKELIHOOD FORMULATION

Our ultimate objective is to simultaneously estimate the correct phase realization of wavelet V(k), the statistical parameters associated with n(k) and μ(k) [i.e., R, C, and λ], and the input μ(k) for all k. This is the "almost nothing is known" case mentioned in Section I. The fact that we assume μ(k) is Bernoulli-Gaussian implies something is known at the onset; hence, it is not proper to refer to this as the "nothing is known" case. This general problem is a very difficult one. Our approach to its solution is based on conditional and unconditional likelihood functions, and on the method of maximum likelihood (ML) ([15], [16], and [17], for example). The utility of the product model is that all of the reflection amplitude information is contained in the "amplitude" sequence r(k) and all of the reflection location information is contained in the "event" sequence q(k), so that we can estimate the reflection amplitudes and locations separately.

Let r̂(k) and q̂(k) denote the maximum-likelihood estimates (MLE's) of r(k) and q(k), respectively. The invariance property of maximum-likelihood estimation [17] lets us compute the MLE of μ(k), namely μ̂(k), as

μ̂(k) = r̂(k) q̂(k).   (12)

We shall exploit this separation principle extensively in the sequel.

Let θ denote a vector of constant but unknown parameters. We assume that measurements are available to us at k = 1, 2, ..., N, and we collect these measurements into an N × 1 vector z, where

z = col[z(1), z(2), ..., z(N)].   (13)

The likelihood of θ given the observations z is defined to be proportional to the value of the probability density function of the observations given the parameters, i.e.,

L{θ|z} ∝ p(z|θ)   (14)

where L is the likelihood function and p is the probability density function (PDF); L{θ|z} is referred to as the conditional likelihood function [6]. The MLE θ̂ is the value of θ which maximizes L{θ|z} for a given set of measurements z. ML is closely related to the Bayesian approach to estimation of random quantities. If we consider θ to be a vector of random variables, then, using Bayes' rule, we see that

p(θ|z) = p(z|θ) p(θ) / p(z)   (15)

where p(z) is not a function of θ. The most probable estimate of θ is, therefore, the value which maximizes the function

S{θ|z} ∝ p(z|θ) p(θ)   (16)

which is referred to as the unconditional likelihood function [6]. We define parameter vector θ as containing the ARMA parameters in (2) and the statistical parameters, i.e.,

θ = col(α1, ..., αn, β1, ..., βn, R, C, λ).   (17)

Because we shall be obtaining MLE's of r(k) and q(k), as well as θ, we define two additional vectors r and q as

r = col[r(1), r(2), ..., r(N)]   (18)

and

q = col[q(1), q(2), ..., q(N)].   (19)

Estimation of r and q is fundamentally different than the estimation of θ, since θ contains deterministic elements, whereas r and q contain random elements. One approach to estimating θ, r, and q is to use the following likelihood expression:

S{r, q, θ|z} ∝ p(z|r, q, θ) p(r|θ) Pr(q|θ).   (20)

In this case, S{r, q, θ|z} is conditional with respect to θ and unconditional with respect to r and q. The likelihood expression in (20) uses a probability function for q rather than a PDF, since q(k) only takes on two discrete values, 0 and 1. In order to obtain (20), we have also used the fact that r and q are statistically independent. When S{r, q, θ|z} is maximized with respect to r and q, far too many events are detected [18]. This is not altogether surprising, since it is well known that the ML criterion cannot solve the "order determination" problem in system identification [28], and both order determination and event detection involve adding degrees of freedom to the model. The usual solution is to replace (20) by a modified-likelihood function, such as Akaike's information criterion [29], which is not biased towards additional degrees of freedom. Our solution is to treat r as a collection of nuisance parameters [6]. By this approach r is integrated out of S{r, q, θ|z} in (20), and we first estimate θ and q by maximizing

S{q, θ|z} ∝ p(z|q, θ) Pr(q|θ)   (21)

where

p(z|q, θ) = ∫ p(z|r, q, θ) p(r|θ) dr.   (22)

In the second stage we treat q̂ and θ̂ as known (true) values of q and θ. Then, estimation of r(k) is equivalent to estimation of a Gaussian μ(k), for when q and θ are known, we can rewrite (3) and (4) as

x(k + 1) = Φ x(k) + γ_q(k) r(k)   (23)

z(k) = h' x(k) + n(k)   (24)

where E{n²(k)} = R. We can then estimate the white amplitude sequence r(k) using Mendel and Kormylo's MVD formulas, since those formulas only require Φ, γ_q(k), h, and R. Observe that γ_q(k) is a time-varying γ-vector. Once we have r̂(k), then μ̂(k) follows from (12). Regardless of which one of the two preceding approaches is taken, one first finds θ̂ and q̂, after which one finds r̂ via MVD as just described. For rigorous discussions supporting these statements see [1] and [18].
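To make the marginalization in (21) and (22) concrete: for fixed q and θ the model is linear and Gaussian, so z is zero-mean Gaussian with a covariance that can be written down directly, and r never has to be estimated in the first stage. The sketch below evaluates -2 ln of that marginal likelihood in a naive batch form under the convolution model (1). The convolution-matrix construction and function names are ours; the recursive (Kalman-filter) evaluation actually used in the paper is far more efficient but algebraically equivalent.

```python
import numpy as np

def wavelet_conv_matrix(v, N):
    """N x N lower-triangular convolution matrix H with H[k, i] = v[k - i] for wavelet samples v."""
    H = np.zeros((N, N))
    for k in range(N):
        for i in range(max(0, k - len(v) + 1), k + 1):
            H[k, i] = v[k - i]
    return H

def neg2_log_marginal_likelihood(z, q, v, C, R, lam):
    """-2 ln S{q, theta | z}, up to an additive constant, with r integrated out analytically.

    Given q, z = H diag(q) r + n with r ~ N(0, C I) and n ~ N(0, R I), so
    z ~ N(0, Sigma_q) where Sigma_q = C H diag(q) H' + R I.  The Bernoulli prior
    Pr(q | lambda) contributes -2[m ln(lam) + (N - m) ln(1 - lam)].
    """
    N = len(z)
    H = wavelet_conv_matrix(v, N)
    Hq = H * q[np.newaxis, :]                 # keep only the columns where q(k) = 1
    Sigma = C * Hq @ Hq.T + R * np.eye(N)
    _, logdet = np.linalg.slogdet(Sigma)
    quad = z @ np.linalg.solve(Sigma, z)
    m = int(np.sum(q))
    prior = -2.0 * (m * np.log(lam) + (N - m) * np.log(1.0 - lam))
    return quad + logdet + prior
```

This batch objective plays the same role as the recursively computed J1(q) that appears later in problem C.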
IV. DETAILED PROBLEM FORMULATIONS AND SOME RESULTS

Our approach to solving the total problem of estimating r, q, and θ is to look at several subproblems where all but a few of these quantities are assumed to be known a priori. Finding ML estimates for q and θ is a problem in nonlinear optimization; hence, iterative algorithms have been developed to update the components of q and θ so as to increase the likelihood at each iteration. Taking advantage of the invariance property of ML estimates, we then combine these iterative algorithms using a Block-Component Method. We now proceed to describe ten problems that have been studied. These problems range from the "everything is known" case (MVD) to the "almost nothing is known" case. Important aspects of each problem are summarized in Table I, so that the reader can see the forest from the trees.

A. Minimum-Variance Deconvolution (MVD)

Mendel [2], [3] and Mendel and Kormylo [4], [5] developed a MVD filter for obtaining μ̂(k) (k = 1, 2, ..., N) when all elements of θ in (17) are known ahead of time. Their solution solves the deconvolution problem when "everything is known." It is available both for single- and multichannel systems, although it has been numerically tested only in the single-channel case.

Using a pseudo-random number generator, we generated the Bernoulli-Gaussian sequence shown in Fig. 3.
TABLE I
SUMMARY OF DECONVOLUTION PROBLEMS A-J (SEE SECTION IV)

A. Minimum-variance deconvolution (MVD): all of θ in (17) known; estimate μ(k); MVD (fixed-lag or fixed-interval) smoother.
B. Maximum-likelihood deconvolution (MLD): q and C, R, α, β known; estimate r and μ(k); MVD applied to the time-varying model (23), (24).
C. Detection of events: all of θ known; detect q; threshold, recursive, Gaussian-sum, and single-most-likely-replacement (SMLR) detectors; the SMLR detector works best.
D. Wavelet estimation: C, R, and q known; estimate α, β; Marquardt-Levenberg algorithm; requires a good initial-guess wavelet.
E. Variance estimation: q and α, β known; estimate θ1 = col(√C, √R); Marquardt-Levenberg algorithm.
F. Variance estimation and event detection: α, β and λ known; estimate θ1 and detect q; Marquardt-Levenberg algorithm plus SMLR detector.
G. λ-estimation: θ and q known; estimate λ; closed-form solution λ̂ = m/N.
H. Event detection and λ-estimation (adaptive detection): C, R, α, β known; estimate λ and detect q; alternating SMLR detection and update (39).
I. Event detection and statistical-parameters estimation: α, β known; estimate θ1, λ and detect q; Block-Component Method (Fig. 12).
J. Estimating everything: only the model structure known; estimate R, α, β, λ and detect q; Block-Component Method (Fig. 13).
Fig. 3. Synthetic Bernoulli-Gaussian reflectivity sequence μ(k).

Fig. 4. Fourth-order ARMA source wavelet.

Fig. 5. Synthetic seismogram data z(k) (two-way travel time, in seconds).
This signal was then convolved with the fourth-order ARMA wavelet shown in Fig. 4, and white Gaussian noise was added to produce the synthetic seismogram data [i.e., z(k) in Fig. 2] shown in Fig. 5. Fig. 6 depicts MVD estimates of μ(k) which were obtained using a fixed-lag smoother with a lag of 5 units (see [5] for details). Observe that the MVD filter, in which we use (11) for E{μ²(k)}, produces a nonzero value for μ̂(k) at every time point.
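For readers who want a feel for what the MVD filter computes, here is a minimal batch-form sketch of minimum-variance deconvolution for the model z = Hμ + n with E{μ²(k)} = Cλ and E{n²(k)} = R. It is a Wiener-type smoother that stands in for the recursive fixed-lag smoother of [5], which we do not reproduce; the helper wavelet_conv_matrix is the hypothetical one defined in the earlier sketch, and the numerical values shown are assumptions.

```python
import numpy as np

def mvd_batch(z, v, C_lambda, R):
    """Batch minimum-variance estimate of a white input mu from z = H mu + n.

    mu_hat = C_lambda * H' (C_lambda * H H' + R I)^{-1} z, i.e., the linear
    minimum-variance smoother implied by E{mu^2} = C*lambda and E{n^2} = R.
    """
    N = len(z)
    H = wavelet_conv_matrix(v, N)               # convolution matrix built from wavelet samples v
    Sigma_z = C_lambda * H @ H.T + R * np.eye(N)
    return C_lambda * H.T @ np.linalg.solve(Sigma_z, z)

# mu_hat = mvd_batch(z, v, C_lambda=0.05, R=0.01)
# As in Fig. 6, every sample of mu_hat is nonzero, even where mu(k) = 0.
```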
Fig. 6. MV reflection estimates. Circles depict true reflections and bars depict estimates ([2], [5]).

B. Maximum-Likelihood Deconvolution (MLD)

Kormylo [11] and Kormylo and Mendel [12] developed a MLD filter for estimating r and, subsequently, μ̂(k) (k = 1, 2, ..., N) when q and col(C, R, α1, ..., αn, β1, ..., βn) are known ahead of time. In this case, the likelihood function is

S{r|z, q, θ} ∝ p(z|r, q, θ) p(r|θ)   (25)

and the ML solution reduces to the minimum-variance solution of problem A, except that we use the time-varying model (23) in place of (3) and (11).
Fig. 7. ML reflection estimates obtained when the event locations q are known a priori (problem B).

Fig. 8. Reflection estimates obtained using the SMLR detector (problem C).
Since q is known a priori, we use q̂(k) = q(k) in (23). For the data depicted in Figs. 3, 4, and 5, and using the given values of q (determined from Fig. 3), we obtained the ML deconvolved values of μ(k) depicted in Fig. 7. As one can see by comparing Figs. 6 and 7, the ML estimates are far superior to the minimum-variance estimates of μ(k). Observe that the ML estimator produces a nonzero-valued estimate only at those time points at which an event occurs. This is because

μ̂(k|N) = q(k) r̂(k|N)   (26)

which is zero when q(k) = 0 and is r̂(k|N) when q(k) = 1. Of course, usually q is not known a priori, and so we are led to problem C.

C. Detection of Events

We have studied four ML detection procedures for detecting event locations q when all elements of θ in (17) are known ahead of time. Detection is a practical alternative to estimating the N binary values of q(k) (N can be very large). Since each q(k) can take one of two possible values, there are 2^N possible q sequences, and it is impractical to test such a large number of sequences for the one that maximizes a likelihood function. In this case the likelihood function is [1], [18]

S4{q|z, θ} ∝ p(z|q, θ) Pr(q|θ) = exp[-J1(q)/2]   (27)

where

J1(q) = Σ_{k=1}^{N} [z̃²_q(k|k-1)/r_q(k) + ln r_q(k)] - 2m ln λ - 2(N - m) ln(1 - λ)   (28)

and

m = Σ_{k=1}^{N} q(k).   (29)

Observe that m is the number of locations at which q(k) = 1. In (28), z̃_q(k|k-1) is the innovations process obtained for a specific q, and r_q(k) is the variance of z̃_q(k|k-1). The four detection procedures are: threshold detector, recursive detector, Gaussian-sum detector, and single most likely replacement (SMLR) detector. All procedures have one thing in common: they use threshold information obtained from Kalman filters and our associated state-variable model in (3) and (4). The SMLR detector seemed to give the best results, as determined via simulations.

Because the SMLR detector gave the best results, we briefly describe it here. Given a reference sequence q̂_i, a test sequence q̂_{i+1} is generated by changing q̂_i at a single location k, i.e.,

q̂_{i+1}(j) = q̂_i(j) for j ≠ k, and q̂_{i+1}(k) = 1 - q̂_i(k)   (30)

where k is chosen so as to maximize the corresponding increase in likelihood. Observe that q̂_{i+1} differs from q̂_i at only a single location k. Each iteration is equivalent to generating N different sequences q̂_{i+1} using (30) for k = 1, 2, ..., N and comparing the corresponding J1(q̂_{i+1}) to each other and to J1(q̂_i). Whichever sequence q̂_{i+1} is most likely then replaces q̂_i. Convergence occurs when none of the q̂_{i+1} sequences is more likely than q̂_i. We ([1], [18]) have shown that: 1) each iteration can be performed using only as much computation as required by one Kalman smoother (two Kalman filters), rather than N Kalman filters as one might first suspect; 2) convergence guarantees local optimality in the sense that, when convergence has occurred, adding or removing a reflection at any single location does not increase the likelihood; and 3) each iteration of the procedure is guaranteed to improve the likelihood.

The SMLR detector is not self-starting. We initialize it with results obtained from a threshold detector. For the data depicted in Figs. 3, 4, and 5, we obtained the SMLR detected estimates of μ(k) depicted in Fig. 8. There are five missed detections and one false alarm (at k = 1).
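The following sketch shows the single-most-likely-replacement idea of (30) in its most naive form: it scores each candidate q with the batch objective from the earlier neg2_log_marginal_likelihood sketch instead of the Kalman-smoother recursions of [1], [18], so it costs N likelihood evaluations per sweep and is meant only to make the logic concrete. The function name and the suggested initialization are ours, not the paper's.

```python
import numpy as np

def smlr_detect(z, v, C, R, lam, q0, max_sweeps=50):
    """Single most likely replacement (SMLR) detection, naive batch version.

    At each sweep, try flipping q at every single location and keep the flip that
    most decreases J(q) = -2 ln S{q, theta | z}; stop when no single flip helps
    (local optimality in the sense described in the text).
    """
    q = q0.astype(float).copy()
    J = neg2_log_marginal_likelihood(z, q, v, C, R, lam)
    for _ in range(max_sweeps):
        best_k, best_J = None, J
        for k in range(len(q)):
            q[k] = 1.0 - q[k]                       # single-location replacement, eq. (30)
            Jk = neg2_log_marginal_likelihood(z, q, v, C, R, lam)
            if Jk < best_J:
                best_k, best_J = k, Jk
            q[k] = 1.0 - q[k]                       # undo the trial flip
        if best_k is None:                          # no flip increases the likelihood
            break
        q[best_k] = 1.0 - q[best_k]
        J = best_J
    return q, J

# The detector is not self-starting; a simple (hypothetical) threshold initialization:
# q0 = (np.abs(mu_hat) > 3 * np.sqrt(R)).astype(float)
```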
D. Wavelet Estimation

We have developed ([11], [19]) a ML procedure for estimating the wavelet's ARMA parameters, col(α1, ..., αn, β1, ..., βn), when C, R, and q are known ahead of time. For this problem, it is well known ([16], for example) that

L1{θ|z, q, R, C} ∝ p(z|θ, q, R, C) = exp[-J2(θ)/2]   (31)

where

J2(θ) = Σ_{k=1}^{N} [z̃²_θ(k|k-1)/r_θ(k) + ln r_θ(k)]   (32)

in which z̃_θ(k|k-1) is the innovations process for the model (3), (4) with parameters θ and r_θ(k) is its variance. After trying many different optimization procedures for maximizing the likelihood function, we chose the Marquardt-Levenberg algorithm ([20], [21], [22]). A good initialization procedure is needed to choose θ_0, or else convergence of θ̂ to θ may not occur.

One of the most important theoretical results developed during this study is the following identifiability condition associated with the identification of nonminimum phase wavelets ([1], [19]). Given that μ(k) equals r(k) q(k) and that we cannot measure μ(k), if q(k) is a "broad-band" signal such that Φ_qq(ω) ≠ 0
for all ω, then the ML wavelet V̂(k) resolves the true wavelet V(k) to within a constant scale factor, i.e.,

V̂(k) = a V(k), all k = 0, 1, 2, ...   (33)

where

a = ±(C_T/C)^{1/2}   (34)

and C_T denotes the true value of variance C. When q(k) is a Bernoulli process, we satisfy the preceding broad-band identifiability condition. The broad-band requirement for q(k) has an interesting resemblance to the requirement of "persistently exciting" input signals which occurs in system identification [23]. Whereas an observed input serves to "excite" the state-vector model of a Kalman filter, q(k) serves to "excite" the covariance model. Observe that even if C = C_T, a = ±1; hence, there will always be a sign ambiguity that cannot be resolved by ML estimation.

For the data depicted in Figs. 2, 3, and 4 and the initial-guess wavelet depicted in Fig. 9, we obtained the ML estimated wavelet depicted in Fig. 10. Convergence occurred in 14 iterations of the Marquardt-Levenberg algorithm. Except for a scale factor, the estimated and true wavelets are virtually identical.

Fig. 9. Solid line depicts true wavelet and dashed line depicts initial-guess (fourth-order) ARMA wavelet.

Fig. 10. Solid line depicts true wavelet and dashed line depicts final ML estimated wavelet (for a signal-to-noise ratio of 10).

E. Variance Estimation

We have developed a ML procedure for estimating the two noise variances C and R when q and the ARMA parameters col(α1, ..., αn, β1, ..., βn) are known ahead of time. Letting θ1 = col(√C, √R), we estimate θ1 as in problem D via the Marquardt-Levenberg algorithm. By estimating √C and √R instead of C and R, we ensure R ≥ 0 and C ≥ 0, since R = (√R)² and C = (√C)²; regardless of the sign of √R or √C, R ≥ 0 and C ≥ 0. For this problem the likelihood function is L{θ1|z, q, α, β} ∝ p(z|θ1, q, α, β).

F. Variance Estimation and Event Detection

We have developed ([11], [13]) a ML estimation and detection procedure for simultaneously estimating θ1 = col(√C, √R) and detecting event locations q when the ARMA parameters α, β and λ are known ahead of time. For this problem, which is a combination of problems C and E,

S5{q, θ1|z, λ, α, β} ∝ p(z|θ1, q, α, β) Pr(q|λ).   (35)

The solution to this problem requires using the Marquardt-Levenberg algorithm to estimate θ1 and the SMLR detector to obtain q̂.

G. Bernoulli-Parameter Estimation (λ-Estimation)

When θ = col(C, R, α1, ..., αn, β1, ..., βn) and q are known ahead of time, then the MLE of λ is ([11], [12])

λ̂ = m/N   (36)

where m, the number of nonzero events in q, is defined in (29). Observe that once q is known, λ̂ is independent of θ. The likelihood function for this problem is

L{λ|q} ∝ Pr(q|λ).   (37)

Of course, the more realistic problem is one in which q is not known ahead of time; hence, problem H.

H. Event Detection and λ-Estimation (Adaptive Detection)

We have developed ([11], [13]) a ML procedure for simultaneously estimating λ and detecting event locations q when θ = col(C, R, α1, ..., αn, β1, ..., βn) is known ahead of time. The likelihood function in this case is

S6{λ, q|z, θ} ∝ p(z|q, θ) Pr(q|λ).   (38)

Our solution is to alternate SMLR detection of q (for a fixed value of λ) with the problem-G update of λ, i.e.,

λ_{i+1} = m(λ_i)/N   (39)

where m(λ_i) denotes the number of events detected by the SMLR detector when λ = λ_i. Kormylo [11] has proven that, under the assumption of optimal detection, (39) converges monotonically to a local maximum of S6 in (38) in N or fewer steps for any λ_0 chosen within the interval [0, 1]. We use the SMLR detector to obtain q̂ (the SMLR detector is suboptimal). When it is used, Kormylo [11] and Kormylo and Mendel [18] have proven that alternating iterations of the SMLR detector and update (39) also converge in a finite number of steps, but not necessarily monotonically in λ, nor in N or fewer steps.

Alternating iterations of the SMLR detector and update (39) provides us with a simple Block-Component Method for simultaneous ML estimation of λ and detection of q. It is summarized in Fig. 11. This procedure can be interpreted as an adaptive detection method. The SMLR detector depends on a threshold (see [11] or [18] for details) which can be computed when θ and λ are known. In the present situation, when θ is known but λ is not, the threshold cannot be computed until λ is estimated. From (39) we see that λ depends on the results obtained from the SMLR detector, namely
m(λ_i) = Σ_{k=1}^{N} q̂(k|λ_i).   (40)

Since the detector threshold changes with changing λ, we have an adaptive detector. When λ̂ → λ_T (i.e., the true value of λ), adaptation is completed.

Fig. 11. A Block-Component Method for ML estimation of λ and ML detection of q.

I. Event Detection and Statistical-Parameters Estimation

The next more complicated problem is one in which all three statistical parameters, λ, C, and R, are estimated while q is being detected. As in problem F, we estimate θ1 = col(√C, √R) rather than C and R. This problem combines problems F and H; its likelihood function is

S7{q, θ1, λ|z, α, β} ∝ p(z|θ1, q, α, β) Pr(q|λ).   (41)

Our solution procedure ([1], [13]) is to maximize (41) using a Block-Component iterative update on parameters θ1, λ, and q, as depicted in Fig. 12. Holding λ and θ1 constant, q is estimated by the SMLR detector; holding q and θ1 constant, λ is estimated by (39); and, holding q and λ constant, θ1 is estimated by the Marquardt-Levenberg algorithm. Each step of this procedure is guaranteed to increase the likelihood function, and we should converge to a local maximum of S7.

J. Simultaneous Wavelet Estimation, Event Detection, and Statistical-Parameters Estimation (Estimating Everything)

This last problem is the "almost nothing is known" case, in which we simultaneously estimate θ = col(R, α1, ..., αn, β1, ..., βn) and λ (the wavelet parameters and the statistical parameters) and detect event locations q. We do not estimate variance C, since we can at best estimate V(k) to within a constant scale factor (see problem D). The likelihood function for this problem is

S8{q, θ, λ|z} ∝ p(z|q, θ) Pr(q|λ).   (42)

As in problem H, a Block-Component Method is used to maximize S8 ([11], [24]). Holding λ and q constant, the Marquardt-Levenberg algorithm is used to update θ; holding θ and q constant, λ is updated via (39); and holding θ and λ constant, q is updated using the SMLR detector. Each of these steps guarantees an increase in likelihood. Unlike the simple alternation used in problem I, we have found it more effective to use the procedure depicted in Fig. 13 for solving the complete problem. The difference between this new method and the one in paragraph I is that here we update q and λ several times for each time we update θ. Our reason for doing this is that the update on θ requires considerably more computation than the updates on q and λ, due to the gradients and Hessians required by the Marquardt-Levenberg algorithm; hence, our new procedure represents a more efficient allocation of computation among the three block update algorithms.

Fig. 13. Block-Component Method for problem J (estimating everything).

For the data in Figs. 2, 3, and 4, the Block-Component Method converged to the wavelet and reflection estimates depicted in Figs. 14 and 15, respectively. The z-transform transfer function of the estimated wavelet is given by (43), from which we see that V̂(z) is nonminimum phase. The estimated wavelets for every sixth iteration are depicted in Fig. 16. Additionally, Fig. 17 depicts the convergence of
-2 ln S8{q̂, θ̂, λ̂|z}; the horizontal line marks -2 ln S8{q_T, θ_T, λ_T} as a reference. Convergence of λ̂ and R̂ is depicted in Figs. 18 and 19, respectively. In these figures, the horizontal lines mark λ_T and R_T, respectively.

Fig. 14. Solid line depicts true wavelet and dashed line depicts final ML estimated wavelet (for a signal-to-noise ratio of 10). While this fourth-order wavelet was being estimated, the event sequence q was being detected and statistical parameters R and λ were being estimated.

Fig. 15. Estimated reflections as obtained from the SMLR detector and a fixed-interval deconvolution filter.

Fig. 16. ML estimated wavelets at every sixth iteration of the Block-Component Method.

Fig. 17. Convergence of -2 ln S8{q̂, θ̂, λ̂|z} versus iteration number.

Fig. 18. Convergence of λ̂ versus iteration number.

Fig. 19. Convergence of R̂ versus iteration number.

Convergence of our Block-Component Method seems to occur in three stages. The first stage (iterations 1-4) consists of convergence to an incorrect phase realization of the wavelet, and is characterized by a rapid increase in likelihood (Fig. 17). The modeling errors for these early wavelets produce large values for R̂ (Fig. 19). Since using an incorrect phase realization for the wavelet corresponds to convolving the reflection signal with an all-pass filter, we detect "bunches" of reflections for each significant true reflection, as shown in Fig. 20. This also produces large values for λ̂ (Fig. 18).

The second stage (iterations 5-16) consists of converging to the correct phase realization for the wavelet, and is characterized by a much slower increase in likelihood. During this stage the detected "bunches" of reflections become more like single reflectors, with the relative amplitudes of false reflections to true reflections gradually decreasing until finally the false reflections are no longer detected.

The third and final stage (iterations 17-25) represents a "locking in" on the true wavelet. It is initiated by a rapid
decrease in λ̂ as the false reflections are no longer detected. At this point there is an increase in the rate of change of the θ-parameters and the corresponding likelihood. Convergence occurs suddenly and dramatically.
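To summarize the structure of the procedure just described, here is a schematic sketch of the Block-Component alternation for problem J. It reuses the hypothetical smlr_detect and neg2_log_marginal_likelihood helpers from the earlier sketches, optimizes the wavelet's impulse-response samples directly (rather than the ARMA parameters) purely for brevity, and stands in a generic numerical optimizer for the Marquardt-Levenberg update; the inner/outer loop split follows the text, but the iteration counts and the use of scipy.optimize.minimize are our assumptions.

```python
import numpy as np
from scipy.optimize import minimize

def block_component_mld(z, v0, C, R0, lam0, q0, n_outer=10, n_inner=5):
    """Schematic Block-Component Method for problem J (Fig. 13).

    Alternates: (i) several rounds of SMLR event detection plus the closed-form
    lambda update lam = m/N, then (ii) one wavelet/variance update obtained by
    numerically maximizing the likelihood (a stand-in for Marquardt-Levenberg).
    C is held fixed, consistent with the scale ambiguity discussed in problem D.
    """
    N = len(z)
    v, R, lam, q = np.asarray(v0, float), R0, lam0, q0.astype(float)
    for _ in range(n_outer):
        for _ in range(n_inner):                       # cheap updates: q and lambda
            q, _ = smlr_detect(z, v, C, R, lam, q)
            lam = max(q.sum() / N, 1.0 / N)            # eq. (39), floored to keep ln(lam) finite

        def neg_log_lik(p):                            # p packs wavelet samples and sqrt(R)
            return neg2_log_marginal_likelihood(z, q, p[:-1], C, p[-1] ** 2, lam)

        p0 = np.concatenate([v, [np.sqrt(R)]])
        res = minimize(neg_log_lik, p0, method="Nelder-Mead")  # stand-in for Marquardt-Levenberg
        v, R = res.x[:-1], res.x[-1] ** 2
    return v, R, lam, q
```

The several cheap q and λ updates per expensive wavelet update mirror the allocation-of-computation argument given for Fig. 13.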
V. APPLICATION TO REAL DATA

Needless to say, there exist certain discrepancies between our assumed model and real seismic data. These discrepancies include geological effects such as spherical divergence, backscatter, multiples, and absorption; they also include recording effects such as geophone response and recording filters. Many of these effects can be included in our state-variable model. We are presently trying to determine which effects should be included.

Spherical divergence is the easiest effect to model; it consists of a time-varying scale factor applied to the reflection amplitudes [25]. Since this modification does not increase the model order, there is no reason for not including it. On the other hand, absorption is poorly understood, and may not be simple to model. Essentially, it is observed that deeper data is lower in frequency content than shallow data; beyond this, one can only guess. Ideally we would like to use a continuously varying wavelet model, but to do that we would need a parameterized model for this variation. Until such a model is available, we must resort to time-gating.

A backscatter model classifies reflections into two categories: reflections from small features and large reflections from layer interfaces. To include backscatter in our state-variable model we introduce a second input signal (system noise) μ_B(k), which is modeled as zero-mean white Gaussian noise with unknown variance Q_B, i.e., we modify (3) to

x(k + 1) = Φ x(k) + γ [μ(k) + μ_B(k)]   (44)

where μ(k) is still given by product model (9). When processing real data it is the combined input sequence μ1(k) = μ(k) + μ_B(k) that is detected. For detection one must use an equivalent product model for μ1(k), namely

μ1(k) ≜ q1(k) r1(k)   (45)

where r1(k) is, by definition, unit-variance white Gaussian noise, and

q1(k) ≜ [E{μ1²(k)|q(k)}]^{1/2} = [C q(k) + Q_B]^{1/2}.   (46)

It is straightforward to show that signals q1(k) r1(k) and q(k) r(k) + μ_B(k) are statistically equivalent through their first two moments. Without backscatter in the model one tends to detect an event at every time point, which of course reduces MLD to MVD. The backscatter variance Q_B raises the threshold level for event detection so that only strong events are detected.

During MLD we obtain μ̂1(k), but, of course, it is μ̂(k) that we are really interested in. It is straightforward to show [using a modified version of (45) in [3]] that

μ̂1(k|N) = [Q_B + C q(k)] γ' ψ(k|N)   (47)

where ψ(k|N) is computed using (46) in [3]. Since μ1(k) = μ(k) + μ_B(k),

μ̂(k|N) = C q(k) γ' ψ(k|N).   (48)

We cannot use (48) to compute μ̂(k|N) because q(k) is unknown; it is q1(k) that has been determined via detection, and not q(k). From (46), observe that

C q(k) = q1²(k) - Q_B   (49)

hence

μ̂(k|N) = [q1²(k) - Q_B] γ' ψ(k|N)   (50)

= {[q1²(k) - Q_B]/q1²(k)} μ̂1(k|N).   (51)

This is a formula that can be used to compute μ̂(k|N). We observe that μ̂1(k|N) has a nonzero value at every value of t_k, whereas μ̂(k|N) only has a nonzero value at those values of t_k for which q(k) = 1. This is as it should be, since μ1(k) = μ(k) + μ_B(k) has nonzero values at every value of t_k, due to the backscatter term μ_B(k), whereas μ(k) = r(k) q(k) is nonzero only at those values of t_k for which q(k) = 1. For details on how to adapt the SMLR detector to the equivalent product model, see [ch. 10, [27]].

Strong multiples cannot be distinguished from primaries in MLD. To model multiples accurately requires a very high-order ARMA model for the earth and, as such, is not feasible. A possible ad hoc solution is to regard multiples as a known input to the system. The method used to generate this multiples signal is then ignored by the Kalman filter, since the multiples are treated as a known bias signal by that filter. Alternatively, one can simply regard multiples as any other reflection, and rely on downstream processing to remove them [26].

Including geophone models or recording filter models is only meaningful when their impulse responses are known a priori. Otherwise these effects are indistinguishable from the wavelet model. Furthermore, if the pass-band of these filters exceeds the wavelet spectrum, they are significant only to the observation noise model.

The observation noise model turns out to be crucial to our deconvolution method. While obviously such noise must exist, recording filters guarantee that it will not be white. While there are many different sources for this noise, it may be better to lump all noise sources together as one colored Gaussian noise term. Fig. 21 depicts a general block diagram to include all of these effects.

Other problems with real data are in determining model orders and initial-guess parameters. Solving these problems requires a lot of trial and error.

Figs. 22 and 23 depict some real-data results obtained for problem J. In each figure, the top trace is the data which was processed to give the reflectivity sequence and source wavelet, which are in the third and fourth traces, respectively. To validate the results, the third and fourth traces were convolved to give the second trace. Observe the remarkably close agreement between the actual and approximated data.
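The statistical equivalence claimed below (45) and (46), namely that q1(k) r1(k) and q(k) r(k) + μ_B(k) share their first two moments, is easy to check numerically. The following Monte Carlo sketch does exactly that; the values of C, Q_B, and λ are hypothetical, and the function name is ours.

```python
import numpy as np

def moment_check(C=1.0, QB=0.05, lam=0.1, N=200_000, seed=1):
    """Compare first two moments of mu1 = q*r + muB with its equivalent product model q1*r1."""
    rng = np.random.default_rng(seed)
    q = (rng.random(N) < lam).astype(float)
    r = rng.normal(0.0, np.sqrt(C), N)
    muB = rng.normal(0.0, np.sqrt(QB), N)
    mu1 = q * r + muB                        # combined input, as in (44)

    q1 = np.sqrt(C * q + QB)                 # eq. (46): conditional standard deviation of mu1
    r1 = rng.normal(0.0, 1.0, N)             # unit-variance white Gaussian noise, eq. (45)
    mu1_equiv = q1 * r1

    print("means:     %+.4f vs %+.4f" % (mu1.mean(), mu1_equiv.mean()))
    print("variances: %.4f vs %.4f" % (mu1.var(), mu1_equiv.var()))
    # both variances should be close to lam*C + QB

moment_check()
```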
Unfortunately, closeness between the approximated data and the actual data can be achieved for a wide range of μ(k) and V(k), so comparing these data does not "validate" our method. To do this, one needs to compare μ̂(k) to well-log data.

Our ML approach to solving all of the problems discussed in Section IV depends on a state-variable model which is initialized at time zero, and on data that is associated with such a model. Unfortunately, the first portion of real data is often discarded because it is highly unreliable (it contains near-field effects, and deconvolution is based on a far-field model); hence, there is an initialization effect that needs to be studied. A study performed by the second author (and D. Kuan, a student in the Electrical Engineering Department at the University of Southern California, Los Angeles) at Shell Development Company (with Dr. W. Moorehead) during the Summer of 1980 revealed that the Block-Component Method depicted in Fig. 13 appears to be insensitive to initialization of the data.

Figs. 22 and 23. Real-data results for problem J: actual data, approximated data, and estimated reflectivity μ̂1(k) versus time.

VI. CONCLUSIONS

The early work of Mendel and Kormylo [5], for example, demonstrated feasibility of concept for performing MVD. A MVD filter requires a priori knowledge about the source wavelet and noise statistics. When this information is known, MVD works quite well. Usually, some or all of this information is not known. ML techniques, as described in this paper, represent the next step beyond MVD. It is indeed possible to simultaneously estimate the correct-phase source wavelet and the noise statistics, and to deconvolve the data. Results so far are limited to: 1) single-channel systems; 2) time-invariant and linear wavelets; and 3) estimation and detection procedures that use all of the data from an experiment (non-real-time procedures).

MLD is a high-resolution technique. It should be used in conjunction with other cheaper and faster techniques that can provide it with reasonably good initial estimates for wavelet parameters (including the wavelet's order), noise statistics, and times of occurrence of significant spikes.

REFERENCES

[1] G. M. Webster, Deconvolution, vols. I and II, Geophysics Reprint Series. Tulsa, OK: Society of Exploration Geophysicists, 1978.
[2] J. M. Mendel, "White noise estimators for seismic data processing in oil exploration," IEEE Trans. Automat. Contr., vol. AC-22, pp. 694-706, 1977.
[3] J. M. Mendel, "Minimum-variance deconvolution," IEEE Trans. Geosci. Remote Sensing, vol. GE-19, pp. 161-171, 1981.
[4] J. M. Mendel and J. Kormylo, "New fast optimal white-noise estimators for deconvolution," IEEE Trans. Geosci. Electron., vol. GE-15, pp. 32-41, 1977.
[5] J. M. Mendel and J. Kormylo, "Single-channel white-noise estimators for deconvolution," Geophysics, vol. 43, pp. 102-124, 1978.
[6] N. E. Nahi, Estimation Theory and Applications. New York: Wiley, 1969.
[7] E. A. Robinson, "Predictive decomposition of time series with application to seismic exploration," Geophysics, vol. 32, pp. 418-484, 1967.
[8] P. R. Gutowski, E. A. Robinson, and S. Treitel, "Spectral estimation: Fact or fiction," IEEE Trans. Geosci. Electron., vol. GE-16, pp. 80-84, 1978.
[9] E. S. Siraki and H. L. Taylor, "An application of sparse spike train concepts to seismic data processing," presented at the 48th Annual Meeting of the SEG, San Francisco, CA, 1978.
[10] R. A. Wiggins, "Minimum entropy deconvolution," Geoexploration, vol. 16, pp. 21-25, 1978.
[11] J. Kormylo, "Maximum-likelihood seismic deconvolution," Ph.D. dissertation, University of Southern California, Los Angeles, CA, 1979.
[12] J. Kormylo and J. M. Mendel, "On maximum-likelihood detection and estimation of reflection coefficients," presented at the 48th Annual Meeting of the SEG, San Francisco, CA, 1978.
[13] J. Kormylo and J. M. Mendel, "Applying maximum-likelihood deconvolution to well-log impedance data," presented at the 49th Annual Meeting of the SEG, New Orleans, LA, 1979.
[14] A. Papoulis, Probability, Random Variables, and Stochastic Processes. New York: McGraw-Hill, 1965.
[15] F. C. Schweppe, Uncertain Dynamic Systems. Englewood Cliffs, NJ: Prentice-Hall, 1973.
[16] R. K. Mehra, "Identification of stochastic linear dynamic systems using Kalman filter representation," AIAA J., vol. 9, pp. 28-31, 1971.
[17] A. W. F. Edwards, Likelihood. New York: Cambridge University Press, 1972.
[18] J. Kormylo and J. M. Mendel, "Maximum-likelihood detection and estimation of Bernoulli-Gaussian processes," IEEE Trans. Inform. Theory, vol. IT-28, no. 3, pp. 482-488, May 1982.
[19] J. Kormylo and J. M. Mendel, "Identifiability of non-minimum phase linear stochastic systems," in Proc. 1980 IEEE Conf. Decision Contr., Albuquerque, NM, pp. 684-690, 1980.
[20] D. M. Himmelblau, Applied Nonlinear Programming. New York: McGraw-Hill, 1972.
[21] D. W. Marquardt, "An algorithm for least-squares estimation of nonlinear parameters," J. Soc. Indust. Appl. Math., vol. 11, pp. 431-441, 1963.
[22] Y. Bard, "Comparison of gradient methods for the solution of nonlinear parameter estimation problems," SIAM J. Numer. Anal., vol. 7, pp. 157-186, 1970.
[23] K. J. Astrom and T. Bohlin, "Numerical identification of linear dynamical systems from normal operating records," in Proc. Int. Fed. Automat. Contr. Symp. Adaptive Control, Teddington, Middlesex, England, 1965.
[24] J. Kormylo and J. M. Mendel, "Maximum-likelihood seismic deconvolution," presented at the 50th Annual Meeting of the SEG, Houston, TX, 1980.
[25] J. Kormylo and J. M. Mendel, "Simultaneous spherical divergence correction and optimal deconvolution," IEEE Trans. Geosci. Remote Sensing, vol. GE-18, pp. 273-280, 1980.
[26] J. M. Mendel and M. Shiva, "Normal-incidence geo-optimal deconvolution," to be published.
[27] J. M. Mendel, "Optimal seismic deconvolution," to be published.
[28] H. Kwakernaak, "Estimation of pulse heights and arrival times," Automatica, vol. 16, pp. 367-377, 1980.
[29] H. Akaike, "A new look at the statistical model identification," IEEE Trans. Automat. Contr., vol. AC-19, no. 6, pp. 716-723, 1974.
John J. Kormylo (S'77-M'78) received the B.S. degree in engineering science from Florida State University, Tallahassee, in 1972, the M.S. degree in engineering science from the University of South Florida, Tampa, in 1976, and the Ph.D. degree in electrical engineering from the University of Southern California, Los Angeles, in 1979.
He is currently a Senior Research Engineer at the Exxon Production Research Company, Houston, TX.
Dr. Kormylo is a member of Eta Kappa Nu, Tau Beta Pi, the Society of Exploration Geophysicists, and several other organizations.
Jerry M. Mendel (S'59-M'61-SM'72-F'78) received the B.S. degree in mechanical engineering and the M.S. and Ph.D. degrees in electrical engineering from the Polytechnic Institute of Brooklyn, Brooklyn, NY, in 1959, 1960, and 1963, respectively.
His experience has included teaching courses in electrical engineering at the Polytechnic Institute of Brooklyn from 1960 to 1963, and has also included various consulting positions. From July 1963 to January 1974 he was with the McDonnell Douglas Astronautics Company on a full-time basis. Currently he is Professor and Associate Chairman of Electrical Engineering at the University of Southern California, Los Angeles. He teaches courses in estimation theory and seismic data processing. His primary research interest is estimation and identification for system modeling and control. During the past six years he has been studying applications of estimation and systems theory to seismic data processing for oil exploration, and is Director of the USC Geo-Signal Processing Program. He has published over 120 technical papers and is author of the text Discrete Techniques of Parameter Estimation: The Equation Error Formulation (Dekker, 1973) and co-editor (with K. S. Fu of Purdue University) of Adaptive, Learning and Pattern Recognition Systems (Academic Press, 1970). He served as Editor of the IEEE Control Systems Society's IEEE Transactions on Automatic Control, is Consulting Editor of the Control and Systems Theory Series for Marcel Dekker, Inc., and is Associate Editor of Automatica and the IEEE Transactions on Geoscience and Remote Sensing.
Dr. Mendel is a member of the Society of Exploration Geophysicists, the European Association for Exploration Geophysicists, the American Association for the Advancement of Science, Tau Beta Pi, and Pi Tau Sigma, and is a registered Professional Control Systems Engineer in California. He received the SEG 1976 Outstanding Presentation Award for a paper on the application of Kalman filtering to deconvolution. He is also Vice-President for technical activities of the IEEE Control Systems Society.