A parallel windowed fast discrete curvelet transform applied to seismic processing

Darren Thomson*, Gilles Hennenfent, Henryk Modzelewski and Felix J. Herrmann


Seismic Laboratory for Imaging and Modeling, Earth & Ocean Sciences Department, University of British
Columbia

SUMMARY

We propose using overlapping, tapered windows to process seismic data in parallel. This method consists of numerically tight linear operators and adjoints that are suitable for use in iterative algorithms. This method is also highly scalable and makes parallel processing of large seismic data sets feasible. We use this scheme to define the Parallel Windowed Fast Discrete Curvelet Transform (PWFDCT), which we apply to a seismic data interpolation algorithm. The successful performance of our parallel processing scheme and algorithm on a two-dimensional synthetic data set is shown.

INTRODUCTION

Realistic seismic data sets, especially three-dimensional ones, can be extremely large. It is not uncommon for a data set to be orders of magnitude larger than what one could hope to fit in a computer's memory. This presents obvious challenges when attempting to process seismic data and ultimately use it to create images of the subsurface. One approach is to simply shrink the data set by downsampling or by considering only a small subsection of the full data set. In many situations, however, this is not satisfactory, and so a means for processing (and later imaging) the full data set is needed. Ideally, such a scheme would provide a straightforward way to make use of existing useful methods for processing small data sets.

The obvious solution to problems that require more memory than any single computer can offer is to use a large number of computers in parallel. Parallel systems of various types are very common in high performance computing, and are applied to problems of all sorts, including many in seismic data processing. For instance, a parallel implementation of the Fast Discrete Curvelet Transform (FDCT), which can be applied to a variety of problems in seismic processing, exists (Ying et al., 2005). However, there are limitations to the scalability of this implementation, to the extent that processing typical large seismic data sets is still intractable. As is the case with many classes of parallel algorithms, scalability is limited by growth in the amount of communication required between processors working in parallel, though this is dependent on the properties of the specific system being used (see e.g. Gupta and Kumar, 1993). Other classes of parallel algorithms involve splitting data sets into small pieces ("windows") and considering each window separately. While this is perfectly legitimate in many applications, problems can arise at window borders for a variety of reasons, including, commonly, artifacts related to the periodicity of the transform or to the discontinuity across the border.

Here, we consider the application of overlapping windows processed in parallel to a seismic data interpolation algorithm that makes use of the FDCT. These overlapping windows include a taper that is applied to overlapping regions such that the overall energy of the system is preserved throughout the operation. We restrict ourselves to two-dimensional data here, but this method is easily generalized to an arbitrary number of dimensions. This parallelization scheme addresses shortcomings of other schemes that could be applied to this problem, and can be generalized to create parallel windowed versions of a wide variety of operators.

This paper begins with a brief review of parallel algorithms in scientific computing. We then discuss our parallelization in more detail, including how it has been applied to seismic data interpolation, as well as some specific new issues that our scheme introduces to this (and other) algorithms. Finally, we show and discuss the results of applying this algorithm to a synthetic seismic data set with missing traces.

BACKGROUND

Parallel computation is necessary to solve realistic problems in many fields. The way in which computations are split in order to run in parallel is highly dependent on the particular problem in question, but two very basic approaches are commonly used in a wide variety of applications. Any algorithm that involves a large number of independent operations on independent data is easily run in parallel by simply giving each computing node a share of the work, without the need for any communication between nodes for the duration of the process. A large Monte Carlo experiment, for instance, is easily run in parallel in this way. Another approach to building parallel algorithms is to give each node a unique window of the data set that it is capable of handling. In some cases, these windows can have small overlaps. This is particularly useful in solving differential equations, where, for example, finite difference solvers require knowledge of a function at a given point of interest as well as the value of that same function at all of the points surrounding it (see e.g. Fan et al., 1997). However, under some circumstances, running an algorithm unchanged in parallel becomes very difficult. The FFT, for instance, requires a massive amount of communication when the data are spread across multiple nodes in parallel. As the number of nodes grows, the amount of information that needs to be communicated becomes infeasible. By extension, this also impacts the FDCT (Ying et al., 2005).

FFT-based operations, like the FDCT, as well as other operations that are not easily run in parallel, often have desirable properties that can be exploited to find solutions to various problems. Thus, it is advantageous to have a system under which these operations can be performed in parallel on large amounts of data. One approach is to split the data into subsections, or windows, and operate on these separately. This approach is "embarrassingly parallel"; in other words, no communication between neighbouring processes is necessary during processing. However, this approach often fails due to the creation of artifacts at the edges of the windows (Fan et al., 1997) as a result of, for example, the Gibbs phenomenon, which arises when an approximation of the data (rather than the full representation in the transform domain) is taken. Approximations in transform domains are common, so a parallelization approach that eliminates the impact of edge artifacts is necessary. Even when approximations are not taken on separated windows, it is possible that differences between windows will be introduced through the process. Also, to be scalable such that it could feasibly handle large data sets, any parallel processing scheme must minimize the amount of necessary communication between nodes. Furthermore, any parallelized operation that is going to be performed repeatedly (i.e. as part of some iterative solver) needs to have a defined adjoint operation, and should preserve energy. Together, these properties create a numerically tight frame that will not introduce any growing error in an iterative process.

SEG/New Orleans 2006 Annual Meeting 2767


Downloaded 18 Nov 2010 to 146.164.6.188. Redistribution subject to SEG license or copyright; see Terms of Use at http://segdl.org/
THEORY

We propose a method to parallelize an arbitrary linear operator that avoids problems related to edge artifacts and preserves overall energy. It requires relatively little communication between parallel nodes, making it highly scalable. Our particular interest here is focused on a scalable parallel generalization of the FDCT, but we stress the fact that any linear operator could take the place of the FDCT in this framework.

Parallel Windowed FDCT

The basic structures of this framework are overlapping and tapered windows. This scheme has previously been used in various parallel processing applications (see e.g. Fan et al., 1997). The details of these structures are quite flexible. In general, the sizes of the windows and the overlapping regions do not have to be uniform throughout the system. Here, though, we will only consider equally sized windows with uniform overlap throughout the system. One can express the overlap between windows by the value ε, which represents the depth to which one window receives adjacent data from its neighbours (Mallat, 1998), as illustrated in Figure 1. The overlapping regions of the windows are tapered such that the energy of the system remains constant when points in the data set are duplicated due to the overlaps. The tapering also ensures that the data go smoothly to zero at the window edges, eliminating the potential of creating edge artifacts.

The taper is applied across the outer region of the overlapping windows, affecting points that are within a distance of 2ε from an edge. The tapering function b must satisfy the relationship

    {b'}^2 + {b''}^2 = 1,    (1)

where b' and b'' signify the tapers in adjacent windows that cover the same region. For consistency, we consider the application of an identical tapering function b to all windows. It follows from Eq. 1 that

    b_n^2 + b_{2\varepsilon - n + 1}^2 = 1,    (2)

where n is an integer on the interval [1, 2ε], and subject to the boundary conditions b_1 = 0 and b_{2ε} = 1. This condition ensures that the energy of the system is conserved when summing values from adjacent windows that represent the data at the same location. There are many examples of functions that satisfy this relationship (Mallat, 1998), the simplest being:

    b_n = \sin\left( \frac{(n - 1)\pi}{2(2\varepsilon - 1)} \right).    (3)

To taper the end of the data set along each dimension, the function b is translated to the points n = {N, N − 1, ..., N − 2ε + 1}, where N is the total size of the overlapping windows. The tapering function is then

    b_n = \sin\left( \frac{(N - n)\pi}{2(2\varepsilon - 1)} \right).    (4)

Once the data is split into overlapping windows and tapered, operations, like the FDCT in our particular case, can be performed on each window independently. The shape of the taper function and the overlap of neighbouring windows are both shown in Figure 1. The data in this case is split into sixteen overlapping windows, four horizontally and four vertically. The dashed lines in the image represent the edges of the overlapping windows, with the region between nearby parallel lines being shared between the two windows. The taper function is shown, where one can see the value going to zero at the window edges. Since the image and all windows are square in this case, the taper functions along the vertical axis are the same as the horizontal ones that are shown. Also, note that tapering is not done at the edges of the image, since this would remove energy from the system. This issue can be successfully dealt with in a number of ways, but we omit this discussion.

We now consider the entire process in the context of linear operators acting on data vectors. In the parallel context, it is useful to distinguish between global vectors and operators, which comprise the entire parallel system, and local vectors and operators that exist only on individual nodes. Our notation will reflect this distinction. For instance, the distributed operator and vector \mathbf{A} and \mathbf{x} are easily distinguished from the local operator and vector A_i and x_i that exist on a node indexed by i. Note that there will be cases when we want to apply a local operator A_i to the data on every node in a global vector \mathbf{x}. This involves a block diagonal global operator where each block consists of the particular A_i corresponding to a given x_i, which is simply a subsection of \mathbf{x}. We denote the block diagonal global operator for a local operator as [A_i].

An arbitrary global data vector is expanded into overlapping sections using the global windowing operator W. The tapering operator T is then applied across the system. It is easy to see that the tapering operator is diagonal, and will have repeating patterns in blocks representing each node. Alternatively, one can simply look at the tapering operator as a local T_i on each node. Once W and T have been applied, any arbitrary linear operator A_i can be applied on each node.

Figure 1: 2D synthetic seismic data with 30% missing vertical traces. Dotted lines represent borders of overlapping windows. Taper function is shown above for clarity.

Perhaps the most useful part of considering windowing and tapering as linear operators in matrix form is that looking at these matrices makes understanding the adjoints of these operations simple. Since T is diagonal, it is its own adjoint. For W, the adjoint operation involves summing together overlapping regions. Since the forward operation was a "scatter," the adjoint becomes a "gather," where data that correspond to the same point are summed (Claerbout, 1992). In a parallel computing realization, this means that a given node sends the data in its outer band to the nodes to which they belong, then gathers data related to its inner band from those same neighbours, and sums them together. Importantly, the combination of these processes satisfies the relation

    W^H T^H T W = I.    (5)

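Eqs. 2 and 5 can be checked numerically on a small example. The following is a minimal one-dimensional sketch in plain Python, not the authors' implementation: the window layout (`window_bounds`), the function names, and the sizes `eps`, `core` and `length` are assumptions made for illustration. It assumes the data length is a multiple of `core` and that `core >= 2 * eps`, so that only adjacent windows overlap.

```python
import math
import random


def taper(eps):
    """Rising taper b_n for n = 1..2*eps (Eq. 3), stored 0-indexed."""
    return [math.sin(k * math.pi / (2 * (2 * eps - 1))) for k in range(2 * eps)]


def window_bounds(length, core, eps):
    """Half-open [start, stop) range of each window (hypothetical 1D layout)."""
    count = length // core
    return [(max(0, i * core - eps), min(length, (i + 1) * core + eps))
            for i in range(count)]


def window_weights(start, stop, length, eps):
    """Diagonal of the local taper T_i; edges of the data set stay untapered."""
    b = taper(eps)
    w = [1.0] * (stop - start)
    if start > 0:              # interior left edge: rising taper (Eq. 3)
        w[:2 * eps] = b
    if stop < length:          # interior right edge: falling taper (Eq. 4)
        w[-2 * eps:] = b[::-1]
    return w


def forward(x, core, eps):
    """T W x: scatter x into overlapping windows, then taper each window."""
    out = []
    for start, stop in window_bounds(len(x), core, eps):
        w = window_weights(start, stop, len(x), eps)
        out.append([wi * xi for wi, xi in zip(w, x[start:stop])])
    return out


def adjoint(windows, length, core, eps):
    """W^H T^H: taper each window again, then gather by summing overlaps."""
    y = [0.0] * length
    for (start, stop), win in zip(window_bounds(length, core, eps), windows):
        w = window_weights(start, stop, length, eps)
        for k, (wi, vi) in enumerate(zip(w, win)):
            y[start + k] += wi * vi
    return y


random.seed(0)
eps, core, length = 4, 16, 64          # illustrative sizes only
x = [random.gauss(0.0, 1.0) for _ in range(length)]

b = taper(eps)
eq2_error = max(abs(b[k] ** 2 + b[2 * eps - 1 - k] ** 2 - 1.0)
                for k in range(2 * eps))

windows = forward(x, core, eps)
x_back = adjoint(windows, length, core, eps)
eq5_error = max(abs(a - c) for a, c in zip(x, x_back))

energy_in = sum(v * v for v in x)
energy_windows = sum(v * v for win in windows for v in win)
```

In this sketch `eq2_error` and `eq5_error` stay at rounding level, and `energy_windows` matches `energy_in`, which is the energy preservation that the tapered overlap is designed to provide.
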
This ensures perfect reconstruction of the data when applying the operators followed by their adjoints. It implies that energy is preserved through the entire process. It follows that in any iterative algorithm instabilities or inaccuracies will not be introduced by this windowing and tapering process. The use of operator adjoints and the preservation of energy distinguishes our method from previous uses of overlapping and tapered windows.

At this point, it is useful to define a new operator that we will call the Parallel Windowed FDCT (PWFDCT), which is most simply described as the block diagonal global operator that is created by applying the FDCT C_i independently on each node after applying the global windowing and tapering operators W and T. In other words,

    \mathbf{C} := [C_i] T W.    (6)

Since the local FDCT C is numerically tight, it follows that

    \mathbf{C}^H \mathbf{C} = I.    (7)

Other properties of C are similarly shared by \mathbf{C}. It is possible, then, to make use of the PWFDCT in existing algorithms that include the FDCT. It should be noted, though, that curvelets at very large scales (or, equivalently, low frequencies) are not represented in the same way they would be if a single FDCT were performed on the global data. However, since the benefits of curvelets are mostly found at the finer scales (higher frequencies), the lack of large scale curvelet representation is not a problem. This makes a wide variety of algorithms capable of handling data sets much larger than otherwise possible. An example of a seismic data interpolation algorithm where the PWFDCT takes the place of the FDCT follows. The PWFDCT will be applied to other seismic processing problems, including primary-multiple separation and seismic deconvolution, in the future.

Interpolation

Hennenfent and Herrmann (2006) exploit the multiscale, multidirectional and continuity properties of seismic wavefronts, which lead to sparsity of seismic data in the curvelet domain, to solve the interpolation problem (see also Herrmann and Hennenfent, 2006, for more details). Reformulated using global operators and vectors, the interpolated data \tilde{f} is obtained by \tilde{f} = \mathbf{C}^H \tilde{\mathbf{x}}, where

    \tilde{\mathbf{x}} = \arg\min_{\mathbf{x}} \|\mathbf{x}\|_1 \quad \text{s.t.} \quad \|y - R\,\mathbf{C}^H \mathbf{x}\|_2 \le \epsilon.    (8)

In this expression, y represents the acquired data with missing traces in otherwise regularly sampled data, R a restriction (or so-called picking) operator that extracts the acquired traces from the interpolated data, \mathbf{x} the PWFDCT representation of the interpolated data, and \epsilon the size of the noise present in the acquired data (not to be confused with the overlap depth ε of the windows). Eq. 8 is solved using a large-scale problem solver for ℓ1-regularized minimization based on cooling method optimization and an iterative thresholding algorithm (Daubechies et al., 2004).

Thresholding

Algorithms that involve nonlinear thresholding in a transform domain present a problem when using tapered overlapping windows. The curvelet-based interpolation method described above is but one example of an entire class of algorithms that use approximation or estimation in a transform domain (in particular a domain where the data in question is sparse), all of which involve thresholding. Fundamental to the approximation and estimation process is the selection of the most significant coefficients in the transform domain. The problem arises from the very fact that the overlapping regions are tapered. If we consider the FDCT, it is clear that curvelets representing data in the tapered regions will have their amplitudes reduced by the taper. Thus, when applying a uniform threshold value over the global system, it is likely that coefficients that would otherwise be considered "significant" will fall below the threshold solely because of the impact of the tapering. This introduces errors that have the potential to grow through the iterative processes that commonly make use of thresholding. Thus, it is important to correct threshold values to account for the impact of the tapering, such that coefficients that should be kept are not accidentally discounted.

A number of methods for correcting threshold values are possible. The problem essentially involves finding an operator in the transform domain that is the equivalent of the tapering operator. In the case of the FDCT, we are interested in the diagonal operator D such that

    C^H D C = T.    (9)

This operator D can then be applied to the threshold vector to obtain a corrected threshold vector that can be used to more properly distinguish between significant and insignificant data in the transform (curvelet) domain. Since the FDCT is localized in both space and spatial frequency, the approximated diagonal operator D is expected to be an accurate approximation.

The simplest way to obtain D is through a large Monte Carlo sampling, where random noise realizations are tapered and then transformed. The root mean square (RMS) of all of the transformed results then gives an arbitrarily accurate approximation of D. Optionally, the same noise realizations can also be transformed without the taper applied. The RMS of the transformed tapered noise realizations can then be divided by the RMS of the transformed untapered noise realizations to remove any remaining noise artifacts from the Monte Carlo sampling.

It is also possible, in our case, to approximate D by evaluating the taper function at the centroid of each curvelet, and using that value as a weighting to apply to the relevant coefficient. However, this approach explicitly ignores the spatial extent of curvelets, since the value of the taper at the centroid of a given curvelet will in general not represent the overall effect of the tapering on that curvelet.

Other methods for finding D are possible, but further discussion is omitted here.

Scalability

The parallelization scheme described herein is expected to be highly scalable. The computational cost will, as in any parallel operation, have two aspects. The first is the cost of the operations on each node. In the language of this paper, this is the cost of the arbitrary local operator A_i, or, for the Parallel Windowed FDCT, the local FDCT C_i.

In general, though, the limiting factor in most parallel processing applications is the cost of communication between parallel nodes, and here our method has advantages. The communication is contained in the windowing operator W. In the forward operation, each node only needs to communicate with the nodes containing adjoining data windows. It is straightforward to verify that each node needs to communicate (M + N)2ε − 4ε^2 points, where M and N are the dimensions of the overlapping windows. It is important to note that the amount of communication does not depend on the number of windows used or the size of the full data set. This distinguishes our approach from, for instance, the parallel FFT and other operations based on it, and implies that the method we describe will scale to very large sizes without growth in the amount of necessary communication between neighbouring nodes, which is especially important when communication between nodes is costly in relation to computation (Gupta and Kumar, 1993).

RESULTS

The interpolation algorithm described above was tested, making use of the PWFDCT, on the synthetic seismic data set with missing vertical traces shown in Figure 1. The particular global windowing and tapering operators W and T divided the data set into sixteen windows, with the overlap and taper parameterized by ε = 16.

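The way an iterative thresholding solver with a decreasing ("cooled") threshold approaches the interpolation problem of Eq. 8 can be illustrated on a toy problem. The sketch below is written under loud assumptions and is not the solver used in the paper: an orthonormal DCT stands in for the PWFDCT, the signal is one-dimensional, the data are noise free, and the cooling schedule, iteration counts and all function names are hypothetical choices.

```python
import math
import random


def dct_matrix(n):
    """Orthonormal DCT-II matrix, a stand-in for the sparsifying transform."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        rows.append([scale * math.cos(math.pi * (j + 0.5) * k / n)
                     for j in range(n)])
    return rows


def analysis(c, f):
    """x = C f."""
    return [sum(ckj * fj for ckj, fj in zip(row, f)) for row in c]


def synthesis(c, x):
    """f = C^H x."""
    n = len(x)
    return [sum(c[k][j] * x[k] for k in range(n)) for j in range(n)]


def soft(v, lam):
    """Soft thresholding of a single coefficient."""
    return math.copysign(max(abs(v) - lam, 0.0), v)


def interpolate(y, mask, c, lam_start, lam_stop, outer=10, inner=5):
    """Iterative soft thresholding with a geometrically cooled threshold."""
    x = [0.0] * len(y)
    for step in range(outer):
        lam = lam_start * (lam_stop / lam_start) ** (step / (outer - 1))
        for _ in range(inner):
            f = synthesis(c, x)
            # Residual on acquired samples only (mask plays the role of R).
            residual = [m * (yi - fi) for m, yi, fi in zip(mask, y, f)]
            update = analysis(c, residual)
            x = [soft(xi + ui, lam) for xi, ui in zip(x, update)]
    return x


random.seed(1)
n = 64
c = dct_matrix(n)
x_true = [0.0] * n
x_true[3], x_true[7], x_true[12] = 4.0, -3.0, 2.0   # sparse toy model
f_true = synthesis(c, x_true)
mask = [0.0 if random.random() < 0.3 else 1.0 for _ in range(n)]  # ~30% missing
y = [m * fi for m, fi in zip(mask, f_true)]

peak = max(abs(v) for v in analysis(c, y))
x_hat = interpolate(y, mask, c, 0.5 * peak, 0.01 * peak)
f_hat = synthesis(c, x_hat)

misfit = math.sqrt(sum((m * (yi - fi)) ** 2 for m, yi, fi in zip(mask, y, f_hat)))
data_norm = math.sqrt(sum(v * v for v in y))
roundtrip_error = max(abs(a - b)
                      for a, b in zip(f_true, synthesis(c, analysis(c, f_true))))
```

Here `f_hat` plays the role of the interpolated data and `misfit` is the residual on the acquired samples. Replacing the DCT stand-in with a tight windowed transform would leave the loop unchanged, since the iteration only relies on the transform and its adjoint.
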

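The Monte Carlo estimate of the threshold correction D described in the Thresholding section can be sketched as follows. This is an illustration under assumptions rather than the authors' code: a single level of the orthonormal Haar transform stands in for the FDCT (both are localized, but they are otherwise very different), the window is one-dimensional, and the names, the window length and the number of trials are hypothetical.

```python
import math
import random


def taper(eps):
    """Rising sine taper of length 2*eps (Eq. 3), stored 0-indexed."""
    return [math.sin(k * math.pi / (2 * (2 * eps - 1))) for k in range(2 * eps)]


def haar_level(v):
    """One level of the orthonormal Haar transform (stand-in for the FDCT)."""
    half = len(v) // 2
    s = math.sqrt(0.5)
    approx = [s * (v[2 * k] + v[2 * k + 1]) for k in range(half)]
    detail = [s * (v[2 * k] - v[2 * k + 1]) for k in range(half)]
    return approx + detail


def rms_of_transform(realizations, weights):
    """RMS over realizations of each coefficient of the weighted noise."""
    acc = [0.0] * len(weights)
    for noise in realizations:
        coeffs = haar_level([w * v for w, v in zip(weights, noise)])
        for k, ck in enumerate(coeffs):
            acc[k] += ck * ck
    return [math.sqrt(a / len(realizations)) for a in acc]


def estimate_d(weights, trials, rng):
    """Ratio of tapered to untapered RMS, using the same noise for both."""
    n = len(weights)
    noise = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(trials)]
    tapered = rms_of_transform(noise, weights)
    untapered = rms_of_transform(noise, [1.0] * n)
    return [t / u for t, u in zip(tapered, untapered)]


eps, n = 4, 32                                   # illustrative sizes only
b = taper(eps)
weights = b + [1.0] * (n - 4 * eps) + b[::-1]    # taper at both window edges
d_hat = estimate_d(weights, 2000, random.Random(0))

# For this stand-in transform the exact correction is known in closed form.
pairs = [math.sqrt(0.5 * (weights[2 * k] ** 2 + weights[2 * k + 1] ** 2))
         for k in range(n // 2)]
expected = pairs + pairs
max_err = max(abs(d - e) for d, e in zip(d_hat, expected))

# A corrected threshold vector is the uniform threshold scaled by d_hat.
threshold = 1.0
corrected = [threshold * d for d in d_hat]
```

For coefficients supported entirely in the untapered interior, the ratio is exactly one because the same noise realizations are used in the numerator and denominator, which is the role of the optional normalization step described above.
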

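The per-node communication count quoted in the Scalability section, (M + N)2ε − 4ε^2, can be cross-checked by direct counting. The sketch below assumes, as our reading of the formula, that the communicated points are those lying within a depth ε of the border of an M x N window; the function names and the window size are hypothetical, since the paper does not state the window dimensions used.

```python
def communicated_points(m, n, eps):
    """Closed-form count from the text: (M + N) * 2 * eps - 4 * eps**2."""
    return (m + n) * 2 * eps - 4 * eps ** 2


def border_band_points(m, n, eps):
    """Brute-force count of points within depth eps of the window border."""
    return sum(1
               for i in range(m)
               for j in range(n)
               if min(i, m - 1 - i, j, n - 1 - j) < eps)


m, n, eps = 64, 64, 16            # hypothetical window size, eps as in RESULTS
per_node = communicated_points(m, n, eps)
counted = border_band_points(m, n, eps)

# The per-node count has no dependence on how many windows tile the data set.
per_node_by_window_count = {count: communicated_points(m, n, eps)
                            for count in (16, 256, 4096)}
```

Both counts agree for this example, and the per-node figure is the same whether the data set is tiled by 16 windows or by 4096, which is the property the scalability argument rests on.
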
To demonstrate the importance of using overlapping and tapering, we did a similar interpolation using windows that were neither tapered nor overlapping. Through most of these windows, the algorithm works just as well as it does in overlapping windows. However, edge artifacts are evident near window borders, as would be expected. In Figure 2, we show a subsection of the full output of interpolation runs using overlapping and non-overlapping windows. The same subsection is used for both, and contains one full window and parts of its nearest neighbours. In the example using non-overlapping windows (Figure 2(a)), the edges of the windows are obvious from the artifacts that appear. When overlapping windows are used (Figure 2(b)), the artifacts are no longer evident, and it is not clear at all where the window borders are. In essence, the interpolation performs just as well in the proximity of the window edges as it does in the middle of the window, which is clearly untrue for non-overlapping windows. The importance of overlapping and tapering is thus, as expected, clear from this example. We also compared different methods for correcting the threshold value in the curvelet domain to account for the taper. When the taper was not taken into account in thresholding, errors were evident in the overlapping region. When correcting threshold values by using a Monte Carlo sampling or by evaluating the taper function at curvelet centroids, artifacts related to erroneous thresholding are eliminated. We omit the figures demonstrating this due to space limitations.

Figure 2: Interpolation output for (a) non-overlapping windows and for (b) overlapping, tapered windows with threshold correction. Arrows indicate locations of artifacts in (a) and show the improvement in (b).

CONCLUSIONS

Since seismic data sets are typically very large, it is important to have the capability to process data sets much larger than a single computer could potentially hold in memory. In order to run many algorithms in parallel, it is not sufficient to process data in separate pieces. At the same time, these same algorithms are often not scalable in their normal form due to exponential growth in the amount of communication between nodes that they require. For these reasons, we have developed a scheme that involves overlapping, tapered data windows that can be processed in parallel and that is highly scalable, since the communication costs do not grow with the number of processing nodes. We have used this method to define the PWFDCT, but we stress that this method is general and that the FDCT can be replaced by an arbitrary operator acting on each overlapping window independently as desired.

We applied the PWFDCT to a seismic data interpolation algorithm that is shown, in another presentation to this conference, to be successful using the FDCT. We have demonstrated that good results can be achieved with the PWFDCT in this algorithm, and shown the importance of the overlapping and tapering. Furthermore, we have demonstrated that it is possible to correct for the effect of tapering on threshold values in the transform domain.

Much future work is expected to arise from the ideas described herein. In particular, we will apply the PWFDCT to other seismic data processing algorithms, enabling them to work on much larger data sets than is currently feasible. The parallelization method described here will also be applied to other operators besides the FDCT. Finally, we hope that this method will open up new possibilities in parallel signal processing in a variety of fields.

ACKNOWLEDGMENTS

The authors would like to thank the authors of the Fast Discrete Curvelet Transform (FDCT) for making this code available at www.curvelet.org. This work was in part financially supported by NSERC Discovery Grant 22R81254 of Felix J. Herrmann and was carried out as part of the SINBAD project with support, secured through ITF (the Industry Technology Facilitator), from the following organizations: BG Group, BP, Chevron, ExxonMobil and Shell.

REFERENCES

Claerbout, J. F., 1992, Earth soundings analysis: Processing versus inversion: Blackwell Scientific Publishing.
Daubechies, I., M. Defrise, and C. de Mol, 2004, An iterative thresholding algorithm for linear inverse problems with a sparsity constraint: Comm. Pure Appl. Math., 1413–1457.
Fan, J., K. T. Nihei, L. Meyer, N. Cook, and J. Rector, 1997, Overlap domain decomposition method for wave propagation: Presented at the 67th SEG Conference & Exhibition.
Gupta, A. and V. Kumar, 1993, The scalability of FFT on parallel computers: IEEE Transaction on Parallel and Distributed Systems.
Hennenfent, G. and F. J. Herrmann, 2006, Application of stable seismic signal recovery to seismic interpolation. (submitted for presentation at 76th SEG Conference & Exhibition).
Herrmann, F. J. and G. Hennenfent, 2006, Non-parametric seismic data recovery with curvelet frames. (to be submitted).
Mallat, S., 1998, A wavelet tour of signal processing: Academic Press.
Ying, L., L. Demanet, and E. Candès, 2005, 3D discrete curvelet transform. (submitted for publication).

EDITED REFERENCES
Note: This reference list is a copy-edited version of the reference list submitted by the
author. Reference lists for the 2006 SEG Technical Program Expanded Abstracts have
been copy edited so that references provided with the online metadata for each paper will
achieve a high degree of linking to cited sources that appear on the Web.

REFERENCES
Claerbout, J. F., 1992, Earth soundings analysis, Processing versus inversion: Blackwell
Scientific Publishing.
Daubechies, I., M. Defrise, and C. de Mol, 2004, An iterative thresholding algorithm for
linear inverse problems with a sparsity constraint: Communications on Pure and
Applied Mathematics, 1413–1457.
Gupta, A., and V. Kumar, 1993, The scalability of FFT on parallel computers: IEEE
Transaction on Parallel and Distributed Systems.
Fan, J., K. T. Nihei, L. Meyer, N. Cook, and J. Rector, 1997, Overlap domain
decomposition method for wave propagation: 67th Annual International Meeting,
SEG, Expanded Abstracts, 1485–1488.
Hennenfent, G., and F. J. Herrmann, 2006, Application of stable seismic signal recovery
to seismic interpolation: Presented at the 76th Annual International Meeting,
SEG.
Herrmann, F. J., and G. Hennenfent, 2006, Non-parametric seismic data recovery with
curvelet frames: to be submitted.
Mallat, S., 1998, A wavelet tour of signal processing: Academic Press.
Ying, L., and E. Candès, 2005, 3D discrete curvelet transform: submitted for publication.
