MEDICAL IMAGE SUPER-RESOLUTION VIA DEEP MULTI-SCALE NETWORK IN UNIFORM DISCRETE CURVELET TRANSFORM DOMAIN
Chunpeng Wang⋆, Simiao Wang†, Bin Ma⋆∗, Jian Li⋆∗, Xiangjun Dong⋆, Zhiqiu Xia‡∗

⋆School of Information, Qilu University of Technology (Shandong Academy of Sciences)
†School of Electronic and Information Engineering, Liaoning Technical University
‡Faculty of Electronic Information and Electrical Engineering, Dalian University of Technology
ABSTRACT
This paper proposes a new medical image super-resolution (SR) network, namely the deep multi-scale network (DMSN), in the uniform discrete curvelet transform (UDCT) domain. DMSN is made up of a set of cascaded multi-scale fusion (MSF) blocks. In each MSF block, we use convolution kernels of different sizes to adaptively detect local multi-scale features, and local residual learning (LRL) is then used to learn effective features from the preceding MSF block and the current multi-scale features. After obtaining the multi-scale features of the different MSF blocks, we use global feature fusion (GFF) to jointly and adaptively learn global hierarchical features in a holistic manner. Finally, in contrast to prediction methods that operate in the spatial domain, we apply DMSN in the UDCT domain, which enables a better representation of the global topological structure and local texture detail of HR images. DMSN shows superior performance over other state-of-the-art medical image SR methods.

Index Terms— super-resolution, deep multi-scale network, uniform discrete curvelet transform, local residual learning, global feature fusion

1. INTRODUCTION

In clinical medicine, high-resolution (HR) medical images are visual and effective tools for physicians to make accurate diagnoses. However, the acquisition of HR medical images is complicated by many factors. Low-resolution (LR) medical images can badly influence physicians' diagnoses; thus, super-resolution (SR) techniques for medical images [1] have gradually become extremely crucial.

Owing to their powerful learning ability, CNN-based methods [2, 3, 4, 5, 6, 7, 8, 9, 10, 11] are widely used to address natural image SR tasks and have achieved impressive results. From the first SR network, SRCNN [2], to the latest RCAN [12], the number of convolutional layers has increased from 3 to 400, which proves that increasing the network depth can result in better SR results. In addition, current deep SR networks are typically built as a series of identical feature extraction blocks (FEBs), so the ability of each FEB to extract features plays a crucial role in the final SR performance. Based on this consideration, this paper proposes an efficient multi-scale fusion block to effectively exploit features.

Because the transform domain can preserve the context and texture information of an image at different levels, image SR reconstruction in the transform domain has attracted some attention. As a classic image transformation, the wavelet transform has been used for SR of natural and face images. However, the direct extension of a wavelet to 2D by the tensor product of two 1D wavelets is no longer optimal for representing an image that has features along smooth curves. To overcome this limitation, we use the uniform discrete curvelet transform (UDCT) for SR in this paper, which is a real 2D image representation tool with multi-scale, multi-directional, and anisotropic features. Fig. 1 shows the performance of our network, indicating that the super-resolved medical images produced by the proposed DMSN have abundant details.

Fig. 1: The performance of our network: the left side is the original image. The right side is the red zone of the LR image (8×), the SR image, and the original image from top to bottom.

This work was supported by the National Natural Science Foundation of China (Nos. 61802212, 61872203, and 61502241). Data used in this publication were generated by the National Cancer Institute Clinical Proteomic Tumor Analysis Consortium (CPTAC).

This paper makes the following contributions: (1) The difference between natural images and medical images gives rise to significant differences in textural detail and edge structure; in light of this, we constructed a database applicable to medical image SR (SRMIdataset) to improve the learning effects of CNN-based SR methods. (2) We proposed a new multi-scale fusion (MSF) block
to construct our network in a cascaded manner. In each MSF block, the features of multiple receptive fields are efficiently exploited through convolution kernels of different sizes. By using local residual learning (LRL) and global feature fusion (GFF), our DMSN can jointly and adaptively learn hierarchical features in a holistic manner. (3) Existing CNN-based SR methods mostly concentrate on the spatial domain, leading to over-smooth reconstruction results; therefore, UDCT is applied to effectively restore the global topology and local edge detail information of HR images. (4) Results showed improvements in peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) and augmentation of the texture detail and edge structure of medical images.

Fig. 2: Network structure: (a) DMSN, (b) MSF block, and (c) UDCT prediction.
2. RELATED WORK

In recent years, deep learning has aroused widespread interest as a method for overcoming the defects of conventional shallow learning methods. Dong et al. [2] pioneered the application of a CNN to image SR and unveiled the super-resolution convolutional neural network (SRCNN), whose output is significantly better than that attained using conventional methods. Based on this, many CNN-based super-resolution algorithms have been proposed [4, 1, 3]. The above models share something in common: their network structures have fewer than 10 layers. However, network models applied to other computer vision tasks indicate that network depth does count in deep learning. As a result, researchers have started to apply deep network models to SR [13, 5, 6, 7, 8]. Recently, many CNN-based SR methods construct the entire SR network by concatenating a series of identical feature extraction blocks [14, 15, 12], indicating that the ability of each block plays a key role in the SR performance of the deep network.

The above methods complete image SR in the spatial domain of the image but often generate overly smooth output that loses textural details. By contrast, image SR in the transform domain can preserve the image's context and texture information in different layers to produce better SR results. With that in mind, Guo et al. [9] designed a deep wavelet super-resolution (DWSR) network that acquires the HR image by predicting the "missing details" of the wavelet coefficients of the LR image. Later, the same team [10] integrated the discrete cosine transform (DCT) into a CNN and put forward an orthogonally regularized deep network (ORDSR). In addition, Huang et al. [11] applied the wavelet transform to CNN-based face SR to validate that this approach can accurately capture the global topology information and local textural details of faces.

3. METHOD

3.1. Network Structure

Our network structure is shown in Fig. 2(a). It consists of three parts: the shallow feature extraction module, the multi-scale feature extraction module, and the up-sample module. We solve the following problem:

\hat{\theta} = \arg\min_{\theta} \frac{1}{N} \sum_{i=1}^{N} L^{SR}\big(F_{\theta}(I_i^{LR}), I_i^{HR}\big),   (1)

where \theta = \{w_1, w_2, \ldots, w_m, b_1, b_2, \ldots, b_m\} stands for the weights and biases of the convolutional layers, and N is the number of training samples. L^{SR} is the loss function for minimizing the difference between F_{\theta}(I_i^{LR}) and I_i^{HR}.

The most widely used objective optimization function for images is the MSE function. However, Lim et al. [16] have demonstrated that training with the MSE loss is not a good choice. As a better alternative, the MAE loss function can be defined as

L^{SR} = \frac{1}{N} \sum_{i=1}^{N} \big\| F_{\theta}(I_i^{LR}) - I_i^{HR} \big\|_1.   (2)

Hui et al. [14] empirically found that fine-tuning an MAE-trained network with the MSE loss can further improve its performance. In order to avoid introducing unnecessary training tricks and to reduce computation, we use the L1 (MAE) function alone.
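To make Eq. (2) concrete, here is a minimal PyTorch sketch of the L1 training objective; the function and variable names are ours, not from the paper's released code.

```python
import torch
import torch.nn as nn

# Eq. (2): mean absolute error between the prediction F_theta(I_LR)
# and the ground truth I_HR, averaged over the batch.
l1_loss = nn.L1Loss()

def sr_loss(model: nn.Module, lr_batch: torch.Tensor, hr_batch: torch.Tensor) -> torch.Tensor:
    """L1 training objective for one batch of LR/HR pairs."""
    sr_batch = model(lr_batch)          # F_theta(I_i^LR)
    return l1_loss(sr_batch, hr_batch)  # (1/N) sum ||F_theta(I_i^LR) - I_i^HR||_1
```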
After the shallow feature extraction module, we obtain F_0 and input it into the multi-scale feature extraction module, which contains a set of cascaded MSF blocks. We then obtain the global feature F_{GF} by fusing the features from all the MSF blocks:

F_{GF} = H_{GFF}([F_1, \cdots, F_D]),   (3)

where [F_1, \cdots, F_D] denotes the concatenation of the feature-maps produced by MSF blocks 1, \cdots, D, and H_{GFF} is a composite function of 1×1 and 3×3 convolutions.

Global residual learning is then utilized to obtain the feature-maps before up-scaling:

F_{DF} = F_{-1} + F_{GF},   (4)

where F_{-1} represents the shallow feature-maps. In this way, all the layers before the GFF are fully utilized together with our proposed MSF blocks. Finally, we feed F_{DF} into a 17×17 deconvolution layer to obtain the HR output. Except for the deconvolutional layer, every layer is followed by a ReLU.
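As an illustration of this data flow, the following PyTorch sketch wires the pieces of Fig. 2(a) together. It is our own reading of the architecture, not the authors' code: `msf_block_cls` stands for the MSF block of Sec. 3.2, and the padding of the 17×17 deconvolution is chosen here so that a 4× scale factor comes out exactly.

```python
import torch
import torch.nn as nn

class DMSN(nn.Module):
    """Shallow features -> D cascaded MSF blocks -> global feature fusion
    (Eq. 3) -> global residual learning (Eq. 4) -> 17x17 deconvolution."""

    def __init__(self, msf_block_cls, in_ch=1, feats=64, num_blocks=18, scale=4):
        super().__init__()
        self.conv_a = nn.Conv2d(in_ch, feats, 3, padding=1)  # produces F_{-1}
        self.conv_b = nn.Conv2d(feats, feats, 3, padding=1)  # produces F_0
        self.blocks = nn.ModuleList(msf_block_cls(feats) for _ in range(num_blocks))
        self.gff = nn.Sequential(                  # H_GFF: 1x1 then 3x3 convolution
            nn.Conv2d(num_blocks * feats, feats, 1), nn.ReLU(inplace=True),
            nn.Conv2d(feats, feats, 3, padding=1), nn.ReLU(inplace=True),
        )
        # For scale=4: out = (in - 1)*4 - 2*7 + 17 + 1 = 4*in, i.e. exact 4x upscaling.
        self.deconv = nn.ConvTranspose2d(feats, in_ch, 17, stride=scale,
                                         padding=7, output_padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        f_minus1 = self.relu(self.conv_a(x))
        f = self.relu(self.conv_b(f_minus1))
        block_outputs = []
        for block in self.blocks:                     # F_1, ..., F_D
            f = block(f)
            block_outputs.append(f)
        f_gf = self.gff(torch.cat(block_outputs, dim=1))  # Eq. (3)
        f_df = f_minus1 + f_gf                            # Eq. (4)
        return self.deconv(f_df)                          # no ReLU after the deconvolution
```

With the `MSFBlock` sketched after Sec. 3.2, `DMSN(MSFBlock, in_ch=7)` would match the seven-subband UDCT input described in Sec. 3.3.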
3.2. MSF block

The proposed MSF block is shown in Fig. 2(b). In each MSF block, we construct a three-bypass network in which each bypass uses a different convolutional kernel. In this way, the information in the bypasses can be shared with one another so that image features can be detected at different scales. The operation can be defined as:

C_3 = \sigma(w^1_{3\times3} * F_{d-1} + b^1),   (5)
C_5 = \sigma(w^1_{5\times5} * F_{d-1} + b^2),   (6)
C_7 = \sigma(w^1_{7\times7} * F_{d-1} + b^3),   (7)
H_1 = \sigma(w^2_{1\times1} * [C_3, C_5] + b^4),   (8)
H_2 = \sigma(w^2_{1\times1} * [C_5, C_7] + b^5),   (9)
H_3 = \sigma(w^2_{1\times1} * [C_7, C_3] + b^6),   (10)
F_d = w^3_{1\times1} * [H_1, H_2, H_3] + b^7 + F_{d-1},   (11)

where w and b stand for the weights and biases, respectively, F_{d-1} and F_d are the input and output of the d-th MSF block, and \sigma(\cdot) denotes the ReLU function.
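A direct transcription of Eqs. (5)–(11) into a PyTorch module might look as follows; this is a sketch under our own naming, with "same" padding assumed so that the three bypasses can be concatenated.

```python
import torch
import torch.nn as nn

class MSFBlock(nn.Module):
    """Three-bypass multi-scale fusion block (Eqs. 5-11): 3x3/5x5/7x7
    bypasses, pairwise 1x1 fusion, and a final 1x1 fusion with a local residual."""

    def __init__(self, feats=64):
        super().__init__()
        self.conv3 = nn.Conv2d(feats, feats, 3, padding=1)  # Eq. (5)
        self.conv5 = nn.Conv2d(feats, feats, 5, padding=2)  # Eq. (6)
        self.conv7 = nn.Conv2d(feats, feats, 7, padding=3)  # Eq. (7)
        self.fuse1 = nn.Conv2d(2 * feats, feats, 1)         # Eq. (8)
        self.fuse2 = nn.Conv2d(2 * feats, feats, 1)         # Eq. (9)
        self.fuse3 = nn.Conv2d(2 * feats, feats, 1)         # Eq. (10)
        self.out = nn.Conv2d(3 * feats, feats, 1)           # Eq. (11)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, f_prev):
        c3 = self.relu(self.conv3(f_prev))
        c5 = self.relu(self.conv5(f_prev))
        c7 = self.relu(self.conv7(f_prev))
        h1 = self.relu(self.fuse1(torch.cat([c3, c5], dim=1)))
        h2 = self.relu(self.fuse2(torch.cat([c5, c7], dim=1)))
        h3 = self.relu(self.fuse3(torch.cat([c7, c3], dim=1)))
        # Local residual learning: no activation after the final 1x1 convolution.
        return self.out(torch.cat([h1, h2, h3], dim=1)) + f_prev
```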
3.3. Uniform Discrete Curvelet Transform

Wavelet analysis cannot "optimally" represent image functions with straight lines and curves. The curvelet transform is a very effective image representation method that improves the ability to process complex lines. Several discrete curvelet and curvelet-like transforms have been proposed in past years; they can be divided into discrete transforms based on the fast Fourier transform (FFT) and those based on filter-bank (FB) implementations. UDCT [17] is a new discrete curvelet transform that uses the ideas of both the FFT-based discrete curvelet transform and the filter-bank-based contourlet transform, and it has an excellent frequency response and extremely low redundancy.

As shown in Fig. 2(c), the low-frequency subband and the six high-frequency subbands of a one-level UDCT are entered into the network structure as the "Input". The seven subbands of the SR image are the "Output". The low-frequency subband is applied to effectively restore the global topology, while the high-frequency subbands capture important structural information. It is worth mentioning that UDCT can be used in different SR networks, which is a simple and effective way to improve their performance. Regarding the role of UDCT, further experiments are reported in Section 4.4. The detailed process of the UDCT implementation can be found in [17].
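To our knowledge there is no standard Python package for UDCT, so the following sketch of the Fig. 2(c) flow treats the forward and inverse transforms as caller-supplied placeholders (`udct_forward`, `udct_inverse`); only the overall wiring is shown, as our own illustration.

```python
import numpy as np
from scipy.ndimage import zoom  # order=3 gives cubic interpolation

def sr_in_udct_domain(lr_image, model, udct_forward, udct_inverse, scale=4):
    """Fig. 2(c) as pseudocode: the CNN refines the seven one-level UDCT
    subbands of the interpolated LR image into those of the HR image.
    udct_forward/udct_inverse are placeholders for a UDCT implementation [17]."""
    coarse = zoom(lr_image, scale, order=3)        # "Bicubic SR" initial estimate
    subbands = udct_forward(coarse)                # 1 low- + 6 high-frequency subbands
    x = np.stack(subbands, axis=0)[np.newaxis]     # (1, 7, H, W) network input
    refined = model(x)[0]                          # predicted subbands of the SR image
    return udct_inverse(list(refined))             # inverse UDCT -> HR image
```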
4. EXPERIMENTS

In the experiments, the performance of the proposed DMSN is evaluated on both qualitative and quantitative aspects. PSNR and SSIM are used for quantitative evaluation. The methods selected for comparison—the very deep convolutional network (VDSR) [13], the deep recursive residual network (DRRN) [7], the deep persistent memory network (MemNet) [8], and the information distillation network (IDN) [14]—are all state-of-the-art deep learning SR methods.

4.1. Medical Image Database

Image databases of four body parts found in The Cancer Imaging Archive (TCIA) [18]—breast, brain, lung, and kidney—are integrated to create a database applicable to medical image SR. This database comprises 400 medical images: 100 images for each body part. A total of 280 medical images (70 images for each body part) compose the training set; the remaining 120 images compose the test set.

4.2. Implementation Details

Data augmentation is performed on the 280-image training set described in Section 4.1. Inspired by [7, 13], flipped and rotated versions of the training images are considered; specifically, we rotate the original images by 90°, 180°, and 270° and flip them horizontally. After that, each original image has seven additional augmented versions, as sketched below. The training images are split into 41×41 patches with a step of 31, considering both the training time and storage complexity. Our network contains 18 MSF blocks. The number of feature maps used in all the convolutional layers is 64. The learning rate is initialized to 10⁻⁴ for all layers and is halved every 50 epochs. Training our model takes roughly one day with Tesla P40 GPUs.
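The augmentation and patch extraction just described are straightforward; a small NumPy sketch (the helper names are ours), where the eight dihedral variants comprise the original image plus its seven augmented versions:

```python
import numpy as np

def augment(image):
    """Yield the original image and its seven augmented versions:
    the 90/180/270-degree rotations plus horizontal flips of each."""
    for k in range(4):                # 0, 90, 180, 270 degrees
        rotated = np.rot90(image, k)
        yield rotated                 # k = 0 is the original image
        yield np.fliplr(rotated)      # horizontally flipped version

def extract_patches(image, size=41, step=31):
    """Split a training image into size x size patches with the given step."""
    h, w = image.shape[:2]
    for top in range(0, h - size + 1, step):
        for left in range(0, w - size + 1, step):
            yield image[top:top + size, left:left + size]
```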
4.3. Evaluation of Results

In this section, we evaluate the performance of our method on the four databases (i.e., breast, brain, lung, and kidney). PSNR and SSIM [19] are used to measure image quality. For a fair comparison, we use the released code of the above models and train all models on the same training set. The PSNR and SSIM values for comparison (scales: 4× and 8×) are shown in Table 1, with values in bold indicating the optimal results.
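PSNR and SSIM as used here can be computed with scikit-image; a minimal sketch, assuming 8-bit grayscale arrays:

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate(sr, hr):
    """Return (PSNR in dB, SSIM) of a super-resolved image against its ground truth."""
    psnr = peak_signal_noise_ratio(hr, sr, data_range=255)
    ssim = structural_similarity(hr, sr, data_range=255)
    return psnr, ssim
```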
Table 1: Comparison of PSNR/SSIM for different methods.

[Figure: PSNR (dB) of DMSN versus DMSN+UDCT on the Breast, Brain, Lung, and Kidney test sets.]

The table shows that, when evaluated on the four databases, our proposed DMSN obtains higher PSNR and SSIM on average than the other methods.
6. REFERENCES

[1] Kensuke Umehara, Junko Ota, and Takayuki Ishida, "Application of super-resolution convolutional neural network for enhancing image resolution in chest CT," J. Digit. Imaging, vol. 31, no. 4, pp. 441–450, 2018.

[2] Chao Dong, Chen Change Loy, Kaiming He, and Xiaoou Tang, "Learning a deep convolutional network for image super-resolution," in European Conference on Computer Vision (ECCV), 2014, pp. 184–199.

[3] Wenzhe Shi, Jose Caballero, Ferenc Huszár, Johannes Totz, Andrew P. Aitken, Rob Bishop, Daniel Rueckert, and Zehan Wang, "Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 1874–1883.

[4] Chao Dong, Chen Change Loy, and Xiaoou Tang, "Accelerating the super-resolution convolutional neural network," in European Conference on Computer Vision (ECCV), 2016, pp. 391–407.

[5] Jiwon Kim, Jung Kwon Lee, and Kyoung Mu Lee, "Deeply-recursive convolutional network for image super-resolution," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 1637–1645.

[6] Xiao-Jiao Mao, Chunhua Shen, and Yu-Bin Yang, "Image restoration using very deep convolutional encoder-decoder networks with symmetric skip connections," in Proceedings of the 30th International Conference on Neural Information Processing Systems (NIPS), 2016, pp. 2810–2818.

[7] Ying Tai, Jian Yang, and Xiaoming Liu, "Image super-resolution via deep recursive residual network," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 3147–3155.

[8] Ying Tai, Jian Yang, Xiaoming Liu, and Chunyan Xu, "MemNet: A persistent memory network for image restoration," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 4539–4547.

[9] Tiantong Guo, Hojjat Seyed Mousavi, Tiep Huu Vu, and Vishal Monga, "Deep wavelet prediction for image super-resolution," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2017, pp. 104–113.

[10] Tiantong Guo, Hojjat Seyed Mousavi, and Vishal Monga, "Orthogonally regularized deep networks for image super-resolution," arXiv preprint arXiv:1802.02018, 2018.

[11] Huaibo Huang, Ran He, Zhenan Sun, and Tieniu Tan, "Wavelet-SRNet: A wavelet-based CNN for multi-scale face super resolution," in Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2017, pp. 1689–1697.

[12] Yulun Zhang, Kunpeng Li, Kai Li, Lichen Wang, Bineng Zhong, and Yun Fu, "Image super-resolution using very deep residual channel attention networks," in European Conference on Computer Vision (ECCV), 2018, pp. 294–310.

[13] Jiwon Kim, Jung Kwon Lee, and Kyoung Mu Lee, "Accurate image super-resolution using very deep convolutional networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 1646–1654.

[14] Zheng Hui, Xiumei Wang, and Xinbo Gao, "Fast and accurate single image super-resolution via information distillation network," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018, pp. 723–731.

[15] Yulun Zhang, Yapeng Tian, Yu Kong, Bineng Zhong, and Yun Fu, "Residual dense network for image super-resolution," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.

[16] Bee Lim, Sanghyun Son, Heewon Kim, Seungjun Nah, and Kyoung Mu Lee, "Enhanced deep residual networks for single image super-resolution," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2017, pp. 136–144.

[17] Truong T. Nguyen and Hervé Chauris, "Uniform discrete curvelet transform," IEEE Trans. Signal Process., vol. 58, no. 7, pp. 3618–3634, 2010.

[18] Kenneth Clark, Bruce Vendt, Kirk Smith, John Freymann, Justin Kirby, Paul Koppel, Stephen Moore, Stanley Phillips, David Maffitt, Michael Pringle, et al., "The Cancer Imaging Archive (TCIA): Maintaining and operating a public information repository," J. Digit. Imaging, vol. 26, no. 6, pp. 1045–1057, 2013.

[19] Zhou Wang, Alan C. Bovik, Hamid R. Sheikh, and Eero P. Simoncelli, "Image quality assessment: From error visibility to structural similarity," IEEE Trans. Image Process., vol. 13, no. 4, pp. 600–612, 2004.