
Deep Learning for Simultaneous Seismic Image Super-Resolution and Denoising

Jintao Li, Xinming Wu, and Zhanxuan Hu

Manuscript received June 30, 2020; revised November 6, 2020, January 6, 2021, and January 30, 2021; accepted January 30, 2021. This work was supported by the National Science Foundation of China under Grant 41974121. (Corresponding author: Xinming Wu.)
Jintao Li and Xinming Wu are with the School of Earth and Space Sciences, University of Science and Technology of China, Hefei 230026, China (e-mail: [email protected]; [email protected]).
Zhanxuan Hu is with the School of Computer Science and OPTIMAL, Northwestern Polytechnical University, Xi'an 710072, China (e-mail: [email protected]).
Color versions of one or more figures in this article are available at https://doi.org/10.1109/TGRS.2021.3057857.
Digital Object Identifier 10.1109/TGRS.2021.3057857

Abstract—Seismic interpretation is often limited by low-resolution and noisy data. To deal with this issue, we propose to leverage a deep convolutional neural network (CNN) to achieve seismic image super-resolution and denoising simultaneously. To train the CNN, we simulate many synthetic seismic images with different resolutions and noise levels to serve as training data sets. To improve the perceptual quality, we use a loss function that combines the ℓ1 loss and a multiscale structural similarity loss. Extensive experimental results on both synthetic and field seismic images demonstrate that the proposed workflow can significantly improve the perceptual quality of the original data. Compared to conventional methods, the network obtains better performance in enhancing detailed structural and stratigraphic features, such as thin layers and small-scale faults. From the seismic images super-sampled by our CNN method, a fault detection method can compute more accurate fault maps than from the original seismic images.

Index Terms—Deep learning, geophysical image processing, image denoising, super-resolution.

I. INTRODUCTION

SEISMIC interpretation is sensitive to the quality of seismic data. Due to the limitations of seismic acquisition and processing, field seismic data often suffer from low resolution and noise corruption, which bring challenges to subsequent seismic interpretation. Two potential technologies to solve these issues are image super-resolution and image denoising.

In the last two decades, many researchers have developed numerous methods to increase the resolution of seismic images. These methods can be roughly grouped into two categories: high-density acquisition [1], [2] and broadband seismic [3]–[5]. The former, as the name suggests, generally increases the horizontal resolution by acquiring a densely sampled data set utilizing a larger number of sources and receivers, while the latter improves the vertical resolution by recording a full range of frequencies, including the low- and high-frequency parts. However, both of them are costly in data acquisition and processing, because they require more sources and receivers to increase the density of samples and better instruments to record a wider range of frequencies during acquisition. Besides, a significantly higher computational cost is required for processing the data sets.

For the second potential technology, seismic image denoising, a number of effective methods have been proposed [6]–[11]. These methods enhance the structural and stratigraphic features and attenuate random noise in a seismic image by constructing structure-oriented filters to smooth the image along reflections. To construct such structure-oriented filters, researchers can utilize anisotropic diffusion [6], [8], [11], the steered Kuwahara filter [7], plane-wave prediction [9], and the steered bilateral filter [10]. Wu and Guo [11] proposed a method to simultaneously enhance reflections, faults, and channels in a seismic image by using fast explicit diffusion (FED). Although these methods can attenuate random noise and enhance the structural and stratigraphic features, they also damage some useful details of geological structures in the seismic image.

In recent years, with the advancement of hardware computing power, especially graphics processing units (GPUs), many deep learning methods have been proposed and have achieved success in computer vision tasks including natural image super-resolution and denoising [12]–[15]. These methods use deep convolutional neural networks (CNNs) to achieve remarkable performance, and once the model is well trained, it takes only a brief amount of time in application. Inspired by these CNNs, this article deviates from the traditional methods and leverages a deep CNN to achieve seismic image super-resolution and denoising simultaneously. Nevertheless, directly applying such methods to seismic images often encounters two significant issues. The first issue is the lack of training data. Unlike natural images, we cannot obtain a large amount of noise-free field seismic images with high resolution as labels. Some authors [16] select a modern, high-fidelity 3-D seismic survey with well-imaged faults as the donor survey; however, it may be insufficient for training. The second is perceptual quality. Existing CNN-based methods generally use a mean absolute error (ℓ1) loss, which tends to generate blurry and overly smoothed results, especially near faults, and therefore limits the subsequent seismic interpretation.

To tackle the first issue, we follow the workflow proposed by Wu et al. [17], [18] and generate 800 synthetic seismic volumes. Subsequently, we extract plenty of 2-D inputs from the generated 3-D volumes to serve as the training data sets. Besides, to tackle the second issue, we replace the ℓ1 loss
with a new objective, a combination of the ℓ1 loss and a multiscale structural similarity (MS-SSIM) [19] loss, which has also been used in computer vision tasks [20]. The network used in our method is a variant of U-net [21], which introduces a subpixel layer [22] and several residual blocks; the details can be found in Section IV. Some machine learning algorithms [23] leverage multikernel learning to improve accuracy and stability. To validate the performance of the proposed method, we conduct extensive tests on both synthetic and field seismic data. The experimental results demonstrate that the network, trained on only the synthetic data, can significantly improve the perceptual quality of field seismic data and enhance the detailed structural and stratigraphic features, such as thin layers and small-scale faults. From the seismic images super-sampled by our CNN method, a fault detection method can compute a more accurate fault map than from the original seismic images.

II. PROBLEM DEFINITION

Image super-resolution and image denoising are both low-level vision tasks and are processed similarly. We first analyze the principle of the two problems. For seismic super-resolution, the relationship between high-resolution and low-resolution seismic images follows the formula

I_L = Down(I_H, δ_1)    (1)

where I_L and I_H represent the low-resolution and high-resolution seismic images, respectively, Down denotes a degradation mapping function, and δ_1 denotes the parameters of Down. The goal of super-resolution is to reconstruct I_S, an approximation of the high-resolution seismic image I_H, from the low-resolution seismic image I_L through a CNN or other methods.

For seismic denoising, if I_p and I_N denote the clean and noisy seismic images, respectively, then the relationship can be expressed as

I_N = I_p + n    (2)

where n is the noise added to I_p. Our purpose is to obtain the noise distribution n and then subtract it from the noisy seismic image I_N to obtain an output image that approximates the clean seismic image I_p.

In this work, we tackle these two issues simultaneously. The noise-free seismic images with high resolution and the noisy seismic images with low resolution are used as the ground truth I_gt and the input I_input, respectively. We recover an approximation I_output of the ground truth I_gt from the input, following

I_output = N(I_input, δ_2)    (3)

where N denotes the CNN and δ_2 denotes the parameters of the network. We aim to leverage a deep CNN to achieve seismic image super-resolution and denoising simultaneously. The details of the CNN model are given in the following sections.

III. TRAINING DATA SETS

Before training a model for super-resolution and denoising together, we need many 2-D high-resolution clean seismic images as the ground truth. In practice, however, such data sets are rare. To this end, we follow the workflow provided by Wu et al. [17], [18] to build realistic structure models and then extract thousands of 2-D sections for training.

A. Generate Training Data

We first generate 800 synthetic 3-D seismic cubes with size 256 × 256 × 256, as shown in Fig. 1. In this workflow, we first build an initial reflectivity model with all flat layers. Subsequently, we need to add some structures to simulate a field seismic image. Folding and faulting are the most important structures of field seismic data. We vertically shear the initial model to create folding structures and then utilize volumetric vector fields to simulate faulting in the model [Fig. 1(a)]. In this way, we obtain a reflectivity model with realistic folding and faulting structures. We further convolve the generated model with a wavelet to simulate synthetic seismic volumes.

As discussed for the conventional methods in the Introduction, a super-resolution seismic image in general owns a wider frequency band than a native seismic image. According to this principle, we convolve the generated reflectivity model [Fig. 1(a)] with a high-frequency wavelet to obtain the 3-D seismic volume [Fig. 1(b)] from which 2-D label images are extracted. To generate the corresponding input training images, we first convolve the same reflectivity model [Fig. 1(a)] with a low-frequency Ricker wavelet to obtain a relatively low-resolution seismic volume [Fig. 1(c)], where random noise is further added to obtain a more realistic seismic volume [Fig. 1(d)]. From the noisy and low-frequency seismic volume, we finally extract the same 2-D sections and downsample them as the input training images. In this way, we obtain training image pairs with the same structures but different resolutions.
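To make this pair-generation step concrete, the following sketch illustrates, under simplifying assumptions, how one label/input pair could be produced from a single reflectivity section: each vertical trace is convolved with a high-frequency and a low-frequency Ricker wavelet, and the low-frequency version is later downsampled to form the network input. The reflectivity array, the peak frequencies, and the sampling interval below are hypothetical placeholders, not values taken from this article.

```python
import numpy as np

def ricker(peak_freq, dt=0.004, length=0.256):
    """Zero-phase Ricker wavelet with a given peak frequency (Hz)."""
    t = np.arange(-length / 2, length / 2, dt)
    a = (np.pi * peak_freq * t) ** 2
    return (1.0 - 2.0 * a) * np.exp(-a)

def reflectivity_to_seismic(reflectivity, peak_freq, dt=0.004):
    """Convolve each vertical trace (column) of a 2-D reflectivity section with a Ricker wavelet."""
    w = ricker(peak_freq, dt)
    out = np.zeros_like(reflectivity, dtype=float)
    for ix in range(reflectivity.shape[1]):
        out[:, ix] = np.convolve(reflectivity[:, ix], w, mode="same")
    return out

# hypothetical sparse reflectivity section (in the article it comes from a folded and faulted model)
reflectivity = np.random.randn(256, 256) * (np.random.rand(256, 256) > 0.95)

label_image = reflectivity_to_seismic(reflectivity, peak_freq=25.0)   # high-frequency label
input_image = reflectivity_to_seismic(reflectivity, peak_freq=10.0)   # low-frequency version
input_image = input_image[::2, ::2]                                   # downsample by a factor of 2
```

The same reflectivity section feeds both convolutions, which is what keeps the structures identical across the pair while only the frequency content and sampling differ.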
Fig. 2(a) and (b) shows the frequency spectra of 2-D seismic sections extracted from the automatically generated low-resolution [Fig. 1(c)] and high-resolution [Fig. 1(b)] seismic volumes, respectively. We observe that the spectrum [Fig. 2(b)] of a high-resolution seismic section shows a wider frequency band with significantly more high-frequency components than that [Fig. 2(a)] of a low-resolution section.

In generating the training data pairs of low-resolution (input) and high-resolution (label or output) seismic images, the peak frequencies of the wavelets are randomly chosen; however, we make sure that the spectrum band of the high-resolution seismic image is always wider than that of the corresponding low-resolution image. The range of peak frequencies is 5–25 Hz. Using randomly varying peak frequencies for different training data pairs helps to train a better generalized model for different field data sets, which typically show different peak frequencies.

In generating the input seismic data for training, we add random colored noise into the data, as shown in Figs. 1(d) and 3(b), where the added noise looks more realistic than the simple white noise in Fig. 3(a). In order to increase the diversity and generalization of the training data, the signal-to-noise ratio (SNR) for each training sample is randomly defined in the range of [4, 14].
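A minimal sketch of one way to add band-limited (colored) noise at a prescribed SNR is shown below: white noise is smoothed with a small filter to color it and then rescaled so that the signal-to-noise power ratio matches the target. The smoothing length and the use of a uniform filter are assumptions made for illustration, not the article's exact noise model.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def add_colored_noise(image, snr, smooth=3, rng=None):
    """Add colored noise so that signal_power / noise_power equals `snr`."""
    rng = rng or np.random.default_rng()
    noise = rng.standard_normal(image.shape)
    noise = uniform_filter(noise, size=smooth)             # smooth white noise -> colored noise
    signal_power = np.mean(image ** 2)
    noise_power = np.mean(noise ** 2)
    noise *= np.sqrt(signal_power / (snr * noise_power))   # rescale to the target SNR
    return image + noise

# SNR drawn uniformly from [4, 14], following the training data description above
noisy_input = add_colored_noise(np.random.randn(256, 256), snr=np.random.uniform(4, 14))
```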


Fig. 1. From a folded and faulted reflectivity model (a), we compute two seismic volumes in (b) and (c) by convolving a high-frequency and a low-frequency Ricker wavelet, respectively. From the high-frequency volume, we extract 2-D slices (e) as training labels. In the low-frequency volume, we further add random noise to obtain a noisy seismic volume (d) from which we extract 2-D slices (f) and then downsample them to obtain input training data.

Fig. 2. Amplitude spectra of 2-D seismic sections extracted from Fig. 1(c) and (b), respectively: (a) spectrum map of the seismic volume convolved with low-frequency wavelets and (b) spectrum map of the seismic volume convolved with high-frequency wavelets.

Fig. 3. Comparison of synthetic seismic images with different noise: (a) synthetic image with white noise and (b) synthetic image with colored noise.

Fig. 4. Experimental results on synthetic seismic data: (a) clean and high-resolution seismic section extracted from Fig. 1(b); (b) same section extracted from Fig. 1(d); (c) input seismic section downsampled from (b); and (d) recovered seismic section using our method.

To prepare the training data sets, we must generate many pairs of 2-D seismic images I_gt and I_input. Inline or crossline 2-D seismic sections, used as the high-resolution images (I_gt), are extracted from the 3-D synthetic seismic volumes with high-frequency wavelets [Fig. 4(a)]. The low-resolution seismic images (I_input) [Fig. 4(c)], used as input, are obtained by downsampling by a factor of 2 the same 2-D sections [Fig. 4(b)] extracted from the volumes with random noise. We expect our network to upscale the input downsampled image and broaden its frequency band, like the conventional methods that achieve super-resolution by using dense receivers and broad frequencies during data acquisition. In our experiment, we divided 600 of the total 800 3-D volumes into the training set, 75 volumes for
validation, and 75 volumes for the test set. The remaining 50 volumes are retained to contribute the sections for fault detection. From each 3-D volume, we extract two 2-D sections for generating one of the training/validation/testing bins. Our work aims at reconstructing the high-resolution images I_output from the low-resolution images with noise I_input. In particular, I_output is expected to be close to the original high-resolution images I_gt.

B. Data Augmentation

Data augmentation is one of the most useful methods for improving the performance of deep models, and there are some successful instances in geophysics [24]. In order to avoid a large memory footprint for training, we first crop the pairs of 2-D seismic images into small patches at random. However, the information contained in a small patch is often insufficient to recover the details between I_input and I_gt [25]. Thus, we choose the size of the input seismic patches to be 96 × 96 to balance these two concerns, and the corresponding size of the high-resolution seismic patches is 192 × 192. In addition, we apply a simple geometric manipulation, randomly flipping the pairs of patches horizontally, to increase the diversity of the training data sets.
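A small sketch of this kind of paired augmentation is given below: a random 96 × 96 crop from the input and the co-located 192 × 192 crop from the label (twice the size because of the factor-2 super-resolution), followed by a random joint horizontal flip. Function and variable names are illustrative, not taken from the authors' code.

```python
import numpy as np

def random_paired_crop_flip(inp, lab, in_size=96, scale=2, rng=None):
    """Crop a random in_size patch from the input and the matching
    (in_size * scale) patch from the label, then jointly flip with probability 0.5."""
    rng = rng or np.random.default_rng()
    i = rng.integers(0, inp.shape[0] - in_size + 1)
    j = rng.integers(0, inp.shape[1] - in_size + 1)
    p_in = inp[i:i + in_size, j:j + in_size]
    p_lab = lab[i * scale:(i + in_size) * scale, j * scale:(j + in_size) * scale]
    if rng.random() < 0.5:                      # random horizontal flip of both patches
        p_in, p_lab = p_in[:, ::-1], p_lab[:, ::-1]
    return p_in.copy(), p_lab.copy()
```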
Fig. 5. Network architecture used in our proposed method.

IV. ARCHITECTURE AND TRAINING DETAILS

The network of this article is a variant of U-net. To obtain more realistic results, we use an MS-SSIM loss function [20] to avoid overly smoothed structural edges. Finally, we evaluate our method on synthetic data sets.

A. CNN Architecture

The network architecture used in our method is illustrated in Fig. 5, which consists of three parts: a standard U-net, a subpixel layer, and several residual blocks. The U-net is an encoder-decoder network and includes four downsampling blocks and corresponding upsampling blocks. Each downsampling block consists of a max-pooling layer with kernel 2 × 2 and stride 2 and two convolution layers with kernel 3 × 3, and each convolution layer is followed by a batch normalization layer and a rectified linear unit (ReLU). The numbers of feature channels through the four downsampling blocks are 64, 128, 256, 512, and 1024, respectively. The upsampling block mirrors the downsampling block: it enlarges the feature size by a transposed convolution layer. Then, we concatenate the output of the transposed convolution layer with the feature maps from the downsampling block at the same level. After that, the output is fed to two convolution layers that follow the same design as in the downsampling block except for the number of feature channels.

The goal of introducing the subpixel convolution layer [22] is to conduct upsampling. Dense acquisition is another way to increase resolution without spatial aliasing, and we use the subpixel convolution layer to simulate this way of achieving high resolution. Besides, the image size stays small in the main part of our network, i.e., the U-net; thus, we can reduce training time and save GPU memory. In practice, if we remove the subpixel convolution layer and the input size is 256 × 256, the training time becomes two to three times longer. Here, we first increase the number of feature channels by convolution and then reshape them to enlarge the resolution of the inputs. Unlike the transposed convolution used in the first part, the subpixel convolution layer provides more contextual information through a larger receptive field, which is beneficial for generating more realistic details [26].

The last part, the residual blocks, learns more high-frequency information and details from input to target. Each residual block contains two convolution layers, and each convolution layer is followed by a batch normalization layer and a ReLU. Then, a skip connection covers the two convolution layers. In practice, we achieve good performance with just three residual blocks. When we removed these three residual blocks, the peak signal-to-noise ratio (PSNR) on the test set dropped from 29.068 to 28.803. The performance improvement from the residual blocks is even more obvious visually. Fig. 6 shows a field example using the
trained U-Net with and without the three residual blocks. The result with residual blocks shows fewer artifacts and more recovered details. Finally, we use a convolution layer with kernel 1 × 1 to reduce the number of feature channels to match the ground truth.

Fig. 6. Comparison test in a field example: (a) result without the residual blocks and (b) result with the residual blocks.
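To make the last two parts concrete, the sketch below shows, in PyTorch, one residual block of the kind described (two 3 × 3 convolutions, each followed by batch normalization and ReLU, wrapped by a skip connection) and a subpixel upsampling step that first increases the number of feature channels by convolution and then reshapes them with a pixel-shuffle layer. Channel counts and the example network tail are illustrative assumptions, not a reproduction of the authors' exact implementation.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Two conv-BN-ReLU layers wrapped by a skip connection, as described in the text."""
    def __init__(self, channels=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return x + self.body(x)          # skip connection covering the two convolutions

class SubpixelUpsample(nn.Module):
    """Increase feature channels by convolution, then reshape them to enlarge the image."""
    def __init__(self, channels=64, scale=2):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels * scale ** 2, 3, padding=1)
        self.shuffle = nn.PixelShuffle(scale)   # subpixel (pixel-shuffle) layer [22]

    def forward(self, x):
        return self.shuffle(self.conv(x))

# hypothetical tail of the network: subpixel upsampling, three residual blocks, 1x1 output conv
tail = nn.Sequential(SubpixelUpsample(64, 2),
                     ResidualBlock(64), ResidualBlock(64), ResidualBlock(64),
                     nn.Conv2d(64, 1, kernel_size=1))
```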
B. Loss Functions

We train our network using a new loss function that combines the ℓ1 loss and the MS-SSIM loss. Due to its advantage in performance and convergence over the mean squared error (MSE) or ℓ2 loss [12], the ℓ1 loss has been widely used for image super-resolution. Mathematically, the ℓ1 loss is defined as

L_1 = (1/N) Σ_{i,j} |I_SR(i, j) − I_HR(i, j)|    (4)

where N is the total number of pixels. In practice, however, a network trained using only the ℓ1 loss will generate unsatisfying high-resolution images with smooth textures. The reason is that the ℓ1 loss minimizes only the pixel-wise distance between output and target and ignores the texture structures. To tackle this issue, we introduce a more sophisticated loss term derived from MS-SSIM, which is also used in computer vision tasks [20].

MS-SSIM, an assessment of image quality, is sensitive to local structure variations and more appropriate for the human visual system (HVS). It is an improved version of SSIM [27], where SSIM can be mathematically defined as

SSIM(x, y) = [l(x, y)]^α · [c(x, y)]^β · [s(x, y)]^γ    (5)

where

l(x, y) = (2 μ_x μ_y + c_1) / (μ_x² + μ_y² + c_1)
c(x, y) = (2 σ_xy + c_2) / (σ_x² + σ_y² + c_2)
s(x, y) = (σ_xy + c_3) / (σ_x σ_y + c_3).    (6)

Here, x and y are two images; μ_i and σ_i represent the mean and the standard deviation of image i, and σ_xy is the covariance between images x and y. c_1, c_2, and c_3 are three constants that avoid instability when the denominators are too small. l(x, y), c(x, y), and s(x, y) represent three measurements between x and y: luminance (or amplitude in a seismic image), contrast, and structure, respectively. α, β, and γ are the corresponding weights of the three measurements and have to be positive. In general, we set c_1 = 1 × 10^−4 and c_2 = 2 c_3 = 9 c_1. MS-SSIM is then defined as

MS-SSIM(x, y) = [l_M(x, y)]^(α_M) · Π_{j=1}^{M} [c_j(x, y)]^(β_j) · [s_j(x, y)]^(γ_j)    (7)

where M is the number of scales. In general, M is set to 5, α_M = 0.1333, and β_j = γ_j = [0.0448, 0.2856, 0.3001, 0.2363, 0.1333]. "Multiscale" means that we measure the SSIM at different scales, i.e., we first zoom out the image pairs by a factor of 2^(j−1), then calculate each term of SSIM, and finally multiply the terms together with the corresponding weights α_j, β_j, and γ_j. We zoom out the image pairs simply by using an average pooling layer. It must be noted that the range of MS-SSIM values is not 0 to 1, because the covariance σ_xy can be negative, which may make s(x, y) a negative number. To use it as a loss function, we normalize it to the range 0 to 1, which can be done simply by computing MS-SSIM = (MS-SSIM + 1)/2; a ReLU layer can also be used for the same purpose. The higher the MS-SSIM value, the more similar the two images are.

To improve the perceptual quality of the recovered images, we combine the ℓ1 loss and the MS-SSIM loss and obtain a new loss function defined as

L_Mix = α · L_MS-SSIM + (1 − α) · L_1    (8)

where

L_MS-SSIM = 1 − MS-SSIM(I_SR, I_HR)    (9)

and α is the weight of the loss function, which we empirically set to α = 0.6. Table I shows the PSNR values on the same test data set with different weights α; our network achieves the best performance when α = 0.6. A field example in Fig. 7 also visually demonstrates that α = 0.6 is the best choice: when α = 0.6, the recovered seismic image yields fewer artifacts and looks more realistic.

TABLE I. PSNR on the same test data set but with different loss function weights α.

The comparison between the ℓ1 loss and the mix loss is illustrated in Fig. 8. The three seismic images are extracted from the upper left region shown in Fig. 4 using the different loss functions and their corresponding ground truth. The output of the mix loss [Fig. 8(b)] shows noticeably sharper discontinuities near faults than that of the ℓ1 loss [Fig. 8(c)]. The mix loss leads to more realistic and perceptually convincing results, which is most apparent in the areas denoted by the red arrows. For example, near the area highlighted by the left arrow in the ℓ1 loss result [Fig. 8(c)], the fault is smoothed out and the reflections are continuous across the fault, which is untrue compared to the ground truth [Fig. 8(a)]. This would mislead the following seismic interpretation tasks such as fault detection. By using the mix loss, we are able to better preserve the fault discontinuities, as denoted by the red arrows in Fig. 8(b).


Fig. 7. Results (a)–(e) by using a loss function weight (α) of 0.3, 0.4, 0.5, 0.6, and 0.7, respectively.

Fig. 8. Comparison between the ground truth (a), the result of the proposed mix loss (b), and the traditional ℓ1 loss (c). These subimages are extracted from the upper left region shown in Fig. 4.

Fig. 9. Training record: (a) loss function values on the training and validation data sets and (b) performance curves of PSNR on the validation data sets.

C. Training Details

As discussed in the section on generating training data, we generated 1200 pairs of 2-D images for training. Every input seismic image and label seismic image is normalized to [0, 1] by the following formula:

x* = (x − x_min) / (x_max − x_min)    (10)

where x* is the normalized seismic image and x_max and x_min are the maximum and minimum values of each input seismic image, respectively. We then preprocess all the seismic images by the data augmentation discussed before.

We train our model with the ADAM optimizer [28] and set the parameters β_1 = 0.9, β_2 = 0.999, and ε = 10^−8. The learning rate is initialized to 1e−4. We set the batch size to 16 and extract 16 × 1000 patch pairs in total from the training data sets. We train our network over 150 epochs. We provide the training details in Fig. 9, where Fig. 9(a) reports the loss function values on the training and validation data sets and Fig. 9(b) reports the performance curves of PSNR on the validation data sets. Although the loss function and PSNR curves do not converge until nearly 100 epochs, it only takes about 4 h to finish a training task on an NVIDIA Tesla V100.
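The sketch below shows, under simplifying assumptions, how the normalization in (10), the optimizer settings, and the PSNR metric tracked in Fig. 9(b) fit together in a training step. The stand-in network, loss, and random patches are placeholders so the loop runs on its own; the real components are the network of Section IV-A and the mix loss of Section IV-B.

```python
import torch
import torch.nn as nn

def normalize(x):
    """Scale to [0, 1] following Eq. (10) (applied here to the whole batch for brevity)."""
    return (x - x.min()) / (x.max() - x.min() + 1e-12)

def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio for images in [0, 1], as tracked in Fig. 9(b)."""
    mse = torch.mean((pred - target) ** 2)
    return 10.0 * torch.log10(max_val ** 2 / mse)

# stand-in network and loss so the loop below runs
model = nn.Sequential(nn.Upsample(scale_factor=2), nn.Conv2d(1, 1, 3, padding=1))
criterion = nn.L1Loss()

# optimizer settings as stated in the text
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, betas=(0.9, 0.999), eps=1e-8)

for epoch in range(2):                                  # the article trains for 150 epochs
    inputs = normalize(torch.randn(16, 1, 96, 96))      # hypothetical 96 x 96 input patches
    labels = normalize(torch.randn(16, 1, 192, 192))    # corresponding 192 x 192 labels
    optimizer.zero_grad()
    outputs = model(inputs)
    loss = criterion(outputs, labels)
    loss.backward()
    optimizer.step()
    print(f"epoch {epoch}: loss {loss.item():.4f}, PSNR {psnr(outputs, labels).item():.2f} dB")
```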
We first evaluate the performance of our CNN model on synthetic seismic images, i.e., the test sets that are not involved in training and validation. The experimental results are shown in Fig. 4, where Fig. 4(a) is the clean, high-resolution seismic image used as the ground truth, and Fig. 4(c) and (d) are the input noisy seismic image with low resolution and the output seismic image recovered by our CNN model, respectively. Compared with the low-resolution noisy seismic image, the recovered seismic image provides enhanced structural features and sharper geologic edges, especially at faults and seismic horizons. The thin layers are recovered well even if there are only some blurred traces almost invisible to the human eye in the low-resolution image. In addition, the result also offers an effect of denoising: the seismic section between two seismic horizons in the recovered image is smoother and cleaner than in the input seismic image.

Fig. 10. Three traces extracted from the positions of the vertical lines shown in Fig. 4. The green, blue, and red curves, respectively, represent the traces extracted from the input, output, and ground truth seismic images.

Furthermore, we compare the amplitude characteristics of three traces and report the results in Fig. 10. These three traces are extracted from the output seismic section (red) and the corresponding pair of the low-resolution seismic image (green) and the ground truth (blue) at the corresponding colored vertical lines. The waveforms of the three curves are in approximate agreement, keeping the same shape, but the red curve yields more details than the green curve, and the ground truth curve maintains similar characteristics to the output trace. This characteristic is well manifested in the range of samples
0–50 of the curve, where many thin layers are covered by random noise in Fig. 4(c). These details may be faults or thin layers that appear blurry or show only a small change of amplitude compared to their surroundings in the low-resolution seismic image. In a word, clean and high-resolution seismic sections with enhanced faults and thin layers are generated by applying the input seismic images to the CNN model, and they can facilitate subsequent seismic interpretation. Our method of simultaneous super-resolution and denoising is also highly efficient: it takes only several seconds to process all 150 test images, each with a size of 128 × 128.

V. APPLICATIONS

We feed several field seismic images to the well-trained model and make a comparison with a conventional method. We also deploy the results of our CNN model into fault detection to confirm the ability of our well-trained model.

A. Several Real Examples

Our CNN model achieves good effectiveness and generalization on both synthetic and field seismic data sets even though it is trained with synthetic seismic data only. To verify the capability of the model, we feed some 2-D field seismic images acquired in different 3-D surveys to the well-trained CNN model; the sampling interval for each example is 4 ms. Before feeding the field seismic images to the model, each image is normalized in the same way as the synthetic data sets to make it consistent with training. Besides, the dimension size of the input seismic image is not fixed and is only required to be divisible by 2^t; otherwise, we need to resize the input seismic image so that its dimensions are divisible by 2^t, where t is the number of downsampling steps of the architecture. We use t = 4 in our experiment.
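One simple way to satisfy this divisibility requirement is to pad the image up to the next multiple of 2^t before inference and crop the corresponding region from the result. The sketch below uses edge padding, which is an assumption made for illustration; resizing by interpolation, as mentioned in the text, is an alternative.

```python
import numpy as np

def pad_to_multiple(image, t=4):
    """Pad a 2-D image with edge values so both dimensions are divisible by 2**t."""
    m = 2 ** t
    pad_h = (-image.shape[0]) % m
    pad_w = (-image.shape[1]) % m
    padded = np.pad(image, ((0, pad_h), (0, pad_w)), mode="edge")
    return padded, image.shape

def crop_output(output, original_shape, scale=2):
    """Crop the network output back to the region corresponding to the original image.
    The factor `scale` accounts for the super-resolution upscaling."""
    h, w = original_shape
    return output[:h * scale, :w * scale]
```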
Fig. 11. Comparison between our method and BroadSeis: (a) 2-D native field seismic section; (b) result recovered by our method; and (c) result by BroadSeis.

Fig. 12. Comparison of Fig. 11 in detail: (a), (b), and (c) are three patches (the yellow boxes) extracted from Fig. 11.

Fig. 11 shows a real example where the native seismic image [Fig. 11(a)] is directly captured from the paper on BroadSeis [3]. Feeding this native image into our trained CNN model, we obtain an improved image [Fig. 11(b)] where the noise is effectively removed and the resolution of detailed features (e.g., thin layers and small-scale faults) is significantly improved. Our result [Fig. 11(b)] shows even more details than the improved image [Fig. 11(c)] obtained by the BroadSeis technique. BroadSeis requires more expensive acquisition and processing costs, while our CNN-based method requires no extra acquisition cost and takes only milliseconds to compute the result shown in Fig. 11(b). Fig. 12(a)–(c) show a zoomed-in view of the yellow boxes in the native image, our result, and the BroadSeis image, respectively. From these
subimages, we can more clearly observe that our CNN-based method [Fig. 12(b)] significantly enhances the detailed structures of thin layers and of faults with small throws (as denoted by the red arrows). Those faults are relatively small-scale ones, but they certainly exist: they can be seen roughly in the native seismic image but appear very blurry, while the image processed by our method provides a better view of those faults.

Fig. 13. Experimental results of the proposed method on field seismic data: (a), (b), and (c) are three field seismic sections; (d), (e), and (f) are the corresponding recovered results; and (g), (h), and (i) are the corresponding interpolation results (using bicubic). In particular, (b) shows large amplitude values (see the color bar at the bottom) which are much different from the synthetic seismic sections used for training.

Fig. 13 shows another three field examples, where Fig. 13(a)–(c) are three native field seismic images acquired in different surveys, Fig. 13(d)–(f) are the corresponding results computed by our CNN-based method of simultaneous super-resolution and denoising, and Fig. 13(g)–(i) are the corresponding seismic images obtained by upsampling the native images through bicubic interpolation with a factor of 2. The size of the images obtained by bicubic interpolation is consistent with that of the CNN output images. The interpolated images look very similar to the native images, which indicates that the conventional interpolation method does not help to improve the resolution of details in the original images. Compared to the native images (the top row of Fig. 13) and the corresponding interpolated images (the bottom row of Fig. 13), our results show much clearer structures, with noise removed and higher resolution of detailed structural and stratigraphic features such as small-scale faults and thin layers. Three subimages (Fig. 14) are extracted from the yellow box area of the second field data set in Fig. 13 to provide a detailed comparison. Although the structural features and the amplitude values in these field images are significantly different from our synthetic training data, our trained CNN model still works well, as in the synthetic tests. This indicates that our CNN model, trained with only synthetic data sets, generalizes well to various field data sets.

Fig. 15 shows some seismic traces that are extracted from the native images and our improved images in Fig. 13. Because the bottom row of Fig. 13 is just an interpolated result that is very similar to the native images, we do not provide the traces of the three bottom images. The top subfigure in Fig. 15 shows the traces of the first field data set [Fig. 13(a) and (d)], and the traces in the middle figure are extracted from the second data set
[Fig. 13(b) and (e)]. The traces in the bottom subfigure are extracted from the third field data set [Fig. 13(c) and (f)]. The blue curves represent the native seismic image traces and the red ones are the traces of our output images. Compared to the blue curves (native images), the red curves (our results) show similar waveform trends and characteristics but much more detail.

Fig. 14. Comparison in detail: (a), (b), and (c) are extracted from Fig. 13(b), (e), and (h) (the yellow boxes).

Fig. 15. Traces analysis of the field seismic images (blue) and the results (red) of our CNN method shown in Fig. 13.

Spectrum analysis of the three field seismic data sets is illustrated in Fig. 16, where each frequency amplitude is averaged over all the traces in a 2-D section. The blue and red curves represent the amplitude spectra of our output seismic sections and of the input sections, respectively. As we expect, the frequency bands of our output seismic sections are wider than the bands of the native input seismic sections, especially in the high-frequency part. It must be noted that some of the recovered detailed features are not necessarily true in the results of our CNN, especially in areas with quite low data quality, such as the lower left area of Fig. 13(a), where the results are more likely to contain artifacts.
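This kind of spectrum analysis is straightforward to reproduce: take the amplitude spectrum of every vertical trace and average across the section. The sketch below assumes a 2-D section sampled at 4 ms, the interval quoted earlier for the field examples; the input arrays are hypothetical.

```python
import numpy as np

def mean_amplitude_spectrum(section, dt=0.004):
    """Average amplitude spectrum over all traces (columns) of a 2-D seismic section."""
    spectra = np.abs(np.fft.rfft(section, axis=0))      # amplitude spectrum of each trace
    freqs = np.fft.rfftfreq(section.shape[0], d=dt)     # frequency axis in Hz
    return freqs, spectra.mean(axis=1)

# example: compare an input section and the corresponding output section
freqs, spec_in = mean_amplitude_spectrum(np.random.randn(256, 128))
_, spec_out = mean_amplitude_spectrum(np.random.randn(256, 128))
```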
k + 1 i=0 j =0 pi j + j =0 p j i − pii
blue curves represent the native seismic image traces and the
red ones are the traces of our output images. Compared to where the total number of classes is k+1, and pi j is the number
the blue curves (native images), the red curves (our results) of pixels of class i inferred to belong to class j . To calculate
show similar waveform trends and characteristics but much these two assessments, we assume the calculated faults are
more details. accurate when the likelihood value is greater than a threshold.
Spectrum analysis of the three field seismic data is illus- These curves also show that fault detection in our output
trated in Fig. 16 where each frequency amplitude is averaged seismic image (green curves) is significantly more accurate
over all the traces in a 2-D section. The blue and red than in the corresponding input seismic image (blue curves).
curves represent the amplitude spectrum of our output seismic The accuracy of input increases as the threshold increases,
sections and the input sections, respectively. As we expect, while the other two curves change very little. Because the fault
the frequency bands of our output seismic sections are wider detected result of the input seismic image is greatly affected
than the bands of native input seismic sections, especially by the noise. This indicates that our network is helpful for the
in high-frequency part. It must be noted that some of the next step of seismic structure interpretation by simultaneously
recovered detailed features are not necessarily true in the improving the seismic resolution and removing the noise in
results of our CNN, especially in the areas with quite low the seismic image.
data quality, such as the lower left area of Fig. 13(a), where We also illustrated a real example. Fig. 19(a) and (b),
the results are more likely to contain artifacts. respectively, show the fault detection results computed from
the native seismic image [Fig. 13(b)] and the correspond-
B. Fault Detection ing super-sampled and denoised seismic image [Fig. 13(e)].
Fault detection is one of the most important tasks in Almost all the individual faults detected on our CNN results
seismic interpretation as faults often indicate the locations of are sharper than those computed on the native seismic image.
petroleum reservoirs. The experiment on synthetic and field In addition, the fault features are less noise and more contin-
data shows fault detection can be significantly benefited from uously tracked. This means that the positions of the faults are
our CNN method. predicted more accurately after processing the seismic data by
We first apply fault detection on 100 synthetic 2-D sections using our CNN method.
where we know the ground truth of the faults for comparison. In summary, we conclude that our method is indeed benefi-
Fig. 17(a) shows true faults overlaid on the high-resolution cial to fault detection, and the images processed by our CNN

model do provide a better view for fault detection, with clearer and sharper fault features and more accurate fault locations.

Fig. 16. Spectrum analysis of three field seismic images.

Fig. 17. Comparison of fault detection: (a) true fault map overlaid on the seismic label image; (b) fault likelihood map scanned from the high-resolution seismic label; (c) fault likelihood map scanned from the output seismic image; and (d) fault likelihood map scanned from the input low-resolution and noisy seismic image.

Fig. 18. Quantitative evaluation of the fault maps computed from the high-resolution seismic label image (red), the seismic image recovered by our network (green), and the low-resolution and noisy seismic image input to the network: (a) pixel accuracy and (b) MIoU.

Fig. 19. Comparison of fault detection: (a) fault detection in the native seismic image and (b) fault detection in the seismic image enhanced by our CNN method.

VI. CONCLUSION

In this article, we developed a novel CNN-based method for achieving seismic image super-resolution and denoising simultaneously. Due to the lack of seismic labels in the real world, we generate plenty of synthetic seismic sections to train our CNN, and we use a loss function that combines the ℓ1 loss and the MS-SSIM loss. This loss function improves the perceptual quality and alleviates overly smoothed geological edges. Our proposed method performs well on both synthetic and field seismic data. Multiple examples demonstrate that our method is able to significantly enhance the detailed structural and stratigraphic features (e.g., thin layers and small-scale faults) in the input seismic images. Besides, the fault detection results confirm the effectiveness of our method.

However, our proposed method still has some limitations. Some recovered detailed features need to be further verified, as some of them (especially those in areas with low data quality) may be artifacts. In addition, the loss gap between training and validation may indicate slight overfitting. For future work, we will investigate in more detail the tradeoffs between overfitting, training data set size, and architecture.
Besides, we want to leverage transfer learning to reduce the gap between synthetic and field seismic data. By doing this, the performance on field seismic data is expected to be further improved.

REFERENCES

[1] Y.-G. Zhang, Y. Wang, and J.-J. Yin, "Single point high density seismic data processing analysis and initial evaluation," Shiyou Diqiu Wuli Kantan (Oil Geophys. Prospecting), vol. 45, no. 2, pp. 201–207, 2010.
[2] F. Xiao et al., "High-density 3D point receiver seismic acquisition and processing–A case study from the Sichuan Basin, China," First Break, vol. 32, no. 1, 2014.
[3] R. Soubaras, R. Dowle, and R. Sablon, "BroadSeis: Enhancing interpretation and inversion with broadband marine seismic," CSEG Recorder, vol. 37, no. 7, pp. 40–46, 2012.
[4] T. Rebert, R. Sablon, N. Vidal, P. Charrier, and R. Soubaras, "Improving pre-salt imaging with variable-depth streamer data," in Proc. SEG Tech. Program Expanded Abstr., Sep. 2012, pp. 1–5.
[5] Y. Wang, J. Wang, X. Wang, W. Sun, and J. Zhang, "Broadband processing key technology research and application on slant streamer," in Proc. Int. Geophys. Conf., Beijing, China, Apr. 2018, pp. 135–138.
[6] G. C. Fehmers and C. F. Höcker, "Fast structural interpretation with structure-oriented filtering," Geophysics, vol. 68, no. 4, pp. 1286–1293, 2003.
[7] N. M. AlBinHassan, Y. Luo, and M. N. Al-Faraj, "3D edge-preserving smoothing and applications," Geophysics, vol. 71, no. 4, pp. P5–P11, Jul. 2006.
[8] D. Hale, "Structure-oriented smoothing and semblance," Colorado School Mines, Golden, CO, USA, CWP Rep. 635, 2009.
[9] Y. Liu, S. Fomel, and G. Liu, "Nonlinear structure-enhancing filtering using plane-wave prediction," Geophys. Prospecting, vol. 58, no. 3, pp. 415–427, May 2010.
[10] D. Hale, "Structure-oriented bilateral filtering," Colorado School Mines, Golden, CO, USA, CWP Rep. 695, 2011, pp. 239–248.
[11] X. Wu and Z. Guo, "Detecting faults and channels while enhancing seismic structural and stratigraphic features," Interpretation, vol. 7, no. 1, pp. T155–T166, Feb. 2019.
[12] B. Lim, S. Son, H. Kim, S. Nah, and K. M. Lee, "Enhanced deep residual networks for single image super-resolution," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. Workshops (CVPRW), Jul. 2017, pp. 136–144.
[13] K. Zhang, W. Zuo, Y. Chen, D. Meng, and L. Zhang, "Beyond a Gaussian denoiser: Residual learning of deep CNN for image denoising," IEEE Trans. Image Process., vol. 26, no. 7, pp. 3142–3155, Jul. 2017.
[14] V. Lempitsky, A. Vedaldi, and D. Ulyanov, "Deep image prior," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., Jun. 2018, pp. 9446–9454.
[15] T. Dai, J. Cai, Y. Zhang, S.-T. Xia, and L. Zhang, "Second-order attention network for single image super-resolution," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2019, pp. 11065–11074.
[16] P. Lu, M. Morris, S. Brazell, C. Comiskey, and Y. Xiao, "Using generative adversarial networks to improve deep-learning fault interpretation networks," Lead. Edge, vol. 37, no. 8, pp. 578–583, Aug. 2018.
[17] X. Wu, L. Liang, Y. Shi, and S. Fomel, "FaultSeg3D: Using synthetic data sets to train an end-to-end convolutional neural network for 3D seismic fault segmentation," Geophysics, vol. 84, no. 3, pp. IM35–IM45, 2019.
[18] X. Wu, Z. Geng, Y. Shi, N. Pham, S. Fomel, and G. Caumon, "Building realistic structure models to train convolutional neural networks for seismic structural interpretation," Geophysics, vol. 85, no. 4, pp. WA27–WA39, Jul. 2020.
[19] Z. Wang, E. P. Simoncelli, and A. C. Bovik, "Multiscale structural similarity for image quality assessment," in Proc. 37th Asilomar Conf. Signals, Syst. Comput., vol. 2, 2003, pp. 1398–1402.
[20] H. Zhao, O. Gallo, I. Frosio, and J. Kautz, "Loss functions for image restoration with neural networks," IEEE Trans. Comput. Imag., vol. 3, no. 1, pp. 47–57, Mar. 2017.
[21] O. Ronneberger, P. Fischer, and T. Brox, "U-Net: Convolutional networks for biomedical image segmentation," in Proc. Int. Conf. Med. Image Comput. Comput.-Assist. Intervent. Springer, 2015, pp. 234–241.
[22] W. Shi et al., "Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016, pp. 1874–1883.
[23] X. Liu, X. Chen, J. Li, X. Zhou, and Y. Chen, "Facies identification based on multikernel relevance vector machine," IEEE Trans. Geosci. Remote Sens., vol. 58, no. 10, pp. 7269–7282, Oct. 2020.
[24] F. Li, H. Zhou, Z. Wang, and X. Wu, "ADDCNN: An attention-based deep dilated convolutional neural network for seismic facies analysis with interpretable spatial–spectral maps," IEEE Trans. Geosci. Remote Sens., vol. 59, no. 2, pp. 1733–1744, Feb. 2021.
[25] J. Kim, J. K. Lee, and K. M. Lee, "Accurate image super-resolution using very deep convolutional networks," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016, pp. 1646–1654.
[26] Z. Wang, J. Chen, and S. C. H. Hoi, "Deep learning for image super-resolution: A survey," 2019, arXiv:1902.06068. [Online]. Available: http://arxiv.org/abs/1902.06068
[27] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, "Image quality assessment: From error visibility to structural similarity," IEEE Trans. Image Process., vol. 13, no. 4, pp. 600–612, Apr. 2004.
[28] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," 2014, arXiv:1412.6980. [Online]. Available: http://arxiv.org/abs/1412.6980
[29] D. Hale, "Methods to compute fault images, extract fault surfaces, and estimate fault throws from 3D seismic images," Geophysics, vol. 78, no. 2, pp. O33–O43, Mar. 2013.
[30] X. Wu and D. Hale, "3D seismic image processing for faults," Geophysics, vol. 81, no. 2, pp. IM1–IM11, Mar. 2016.
[31] A. Garcia-Garcia, S. Orts-Escolano, S. Oprea, V. Villena-Martinez, and J. Garcia-Rodriguez, "A review on deep learning techniques applied to semantic segmentation," 2017, arXiv:1704.06857. [Online]. Available: https://arxiv.org/abs/1704.06857

Jintao Li received the B.S. degree in geophysics from the University of Science and Technology of China (USTC), Hefei, China, in 2020, where he is pursuing the M.S. degree with the Computational Interpretation Group (CIG). His research interests include deep-learning applications in geophysics, including seismic super-resolution, denoising, and seismic facies analysis.

Xinming Wu received the Ph.D. degree in geophysics from the Colorado School of Mines, Golden, CO, USA, in 2016. He was a Post-Doctoral Fellow with the Bureau of Economic Geology, University of Texas at Austin, Austin, TX, USA. He is a Professor with the School of Earth and Space Sciences, University of Science and Technology of China (USTC), Hefei, China. His research interests include image processing, machine learning, 3-D seismic interpretation, subsurface modeling, and geophysical inversion. Dr. Wu received the Society of Exploration Geophysicists (SEG) J. Clarence Karcher Award in 2020 and served as an SEG Honorary Lecturer for South and East Asia in 2020. He was also a recipient of the Best Paper Award in Geophysics in 2016, the Best Student Poster Paper Award at the 2017 SEG Annual Convention, and the Honorable Mention Award for Best Paper at the 2018 SEG Annual Convention.

Zhanxuan Hu is pursuing the Ph.D. degree with Northwestern Polytechnical University, Xi'an, China. His research interests include topics in machine learning and its application in geophysics.
