Abstract
In recent years, deep learning-based HS–MS fusion has become a very active research area
for the super-resolution of hyperspectral images. Deep convolutional neural
networks (CNNs) help to extract detailed spectral and spatial features from the
hyperspectral image. In a CNN, each convolution layer takes its input from the
previous layer, which can cause information loss as the depth of the
network increases. This loss of information leads to the vanishing gradient problem,
particularly in the case of very high-resolution images. To overcome this problem,
in this work we propose a novel HS–MS ResNet fusion architecture built on skip
connections. The ResNet fusion architecture contains residual blocks with different
numbers of stacked convolution layers; in this work we tested residual blocks with two,
three, and four stacked convolution layers. To strengthen the gradients and reduce the
negative effects of gradient vanishing, we implemented the ResNet fusion architecture
with different skip connections: short, long, and dense. We
measured the strength of our ResNet fusion method against traditional
methods on four public datasets using standard quality measures and found that
our method outperforms all other compared methods.
1. Introduction
Spectral imaging technology captures a contiguous spectrum for each image pixel
over a selected range of wavelength bands. Thus, spectral images
accommodate more information than conventional monochromatic or RGB images.
The wide range of spectral information available in a hyperspectral image brings
spectral imaging technology into a new horizon of research for analyzing pixel
content at the macroscopic level. This tremendous change in the image processing research
area promises revolutionary developments in many walks of human life in the coming
years. In general, spectral images are divided into either Multispectral (<20 numbers
Hyperspectral Imaging - A Perspective on Recent Advances and Applications
2. Review of literature
Many algorithms have been proposed to enhance the spatial quality of HS images
in the past few decades. One of the most popular and attractive approaches is HS–MS image fusion,
Hyperspectral and Multispectral Image Fusion Using Deep Convolutional Neural Network…
DOI: http://dx.doi.org/10.5772/intechopen.105455
which is mainly divided into four groups: component substitution (CS), multiresolution
analysis (MRA), the Bayesian approach, and spectral unmixing (SU) [6]. The
CS and MRA methods are described under the concept of an injection framework, in
which the high-quality information from one image is injected into another [7].
Apart from these, Bayesian-based methods use the posterior distribution of prior
information about the target image; the posterior distribution of the target image
is conditioned on the given HS and MS images [8]. Later, spectral-unmixing-based
HS–MS image fusion was introduced and has become one of the most promising and
widely used methods for enhancing the quality of HS images.
In the SU method, the quality of the abundance estimation depends strongly on the
accuracy of the endmembers. Therefore, any obstruction during the endmember
extraction process leads to inconsistency in the abundance estimation. To
overcome this limitation, Paatero and Tapper in 1994 [9] introduced the nonnegative
matrix factorization (NMF) method, which was popularized by Lee and
Seung in 1999 [10]. It has become an emerging tool for processing high-dimensional
data due to its automatic feature extraction capability. The main advantage of the
NMF method is that it yields a unique solution to the problem compared to other
unmixing techniques [11]. In general, spectral unmixing based on NMF jointly
estimates both the endmembers and the corresponding fractional abundances in a
single step, represented mathematically as follows:

Y = EA (1)
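As an illustration of how the factorization in Eq. (1) can be computed, the sketch below applies the Lee–Seung multiplicative updates to a toy nonnegative matrix. The matrix sizes, iteration count, and random initialization are assumptions for this example, not the CNMF procedure discussed later in the chapter.

```python
import numpy as np

def nmf_unmix(Y, p, iters=500, eps=1e-9, seed=0):
    """Factor a nonnegative matrix Y (bands x pixels) as Y ~ E A, where
    E (bands x p) holds p endmember spectra and A (p x pixels) holds
    fractional abundances, using Lee-Seung multiplicative updates for
    the Frobenius objective ||Y - EA||_F^2."""
    rng = np.random.default_rng(seed)
    E = rng.random((Y.shape[0], p)) + eps   # nonnegative initialization
    A = rng.random((p, Y.shape[1])) + eps
    for _ in range(iters):
        A *= (E.T @ Y) / (E.T @ E @ A + eps)  # abundance update
        E *= (Y @ A.T) / (E @ A @ A.T + eps)  # endmember update
    return E, A

# Toy example: mix two known spectra, then recover a rank-2 factorization.
E_true = np.array([[1.0, 0.1], [0.2, 1.0], [0.5, 0.5]])   # 3 bands, 2 endmembers
A_true = np.random.default_rng(1).random((2, 40))          # 2 x 40 abundances
Y = E_true @ A_true
E, A = nmf_unmix(Y, p=2)
err = np.linalg.norm(Y - E @ A) / np.linalg.norm(Y)        # relative error
```

The multiplicative form of the updates keeps both factors nonnegative throughout, which is why these rules appear in many unmixing pipelines.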
Adding these two regularization terms, with two convex subproblems, helps to
upgrade the performance of the existing CNMF method. However, performance
degradation may occur in the CO-CNMF algorithm as the noise level increases.
It is therefore necessary to add image denoising and spatial smoothing
constraints to this fusion method.
Yang et al. in 2019 [17] introduced a CNMF method with total variation and
signature-based regularizations (TVSR), named TVSR-CNMF. The TV regularizer is
added to the abundance matrix to ensure the spatial smoothness of the image.
Similarly, a signature-based regularizer (SR) is added to the endmember matrix
to extract high-quality spectral data. This method therefore helps to reconstruct
a hyperspectral image of good spatial and spectral quality.
Yang et al. in 2019 [18] introduced a sparsity and proximal minimum-volume
regularized CNMF method named SPR-CNMF. The minimum-volume regularizer
minimizes the distance between the selected endmembers and the center of mass of
the selected region in the image, which reduces the computational complexity. It
refines the fusion at each iteration until it reaches the simplex with minimum
volume. This method improves the fusion performance by controlling the loss of
cubic structural information.
Influenced by this work, we implemented an unmixing-based fusion algorithm named
fully constrained CNMF (FC-CNMF). This method is a modified version of CNMF that
includes all of the spatial and spectral constraints available in the literature.
In our method, a minimum-volume simplex constraint is imposed on the endmember
matrix to fully exploit the spectral information. Similarly, sparsity and total
variation constraints are incorporated into the abundance matrix to provide
dimensionality reduction and spatial smoothness. Finally, we evaluated the
quality of the fused image obtained by FC-CNMF against the methods discussed in
the literature using standard quality measures. From these evaluations, we found
that our method performs better, yielding high fidelity in the reconstructed
images.
These traditional approaches reconstruct the high-resolution hyperspectral image
by fusing the high-quality data from hyperspectral and multispectral images. To
improve the quality of the reconstructed images, they rely on constraints such as
sparsity, the minimum-volume simplex, and total variation regularization. The
performance and quality of the reconstructed HS image are highly influenced by
these constraints, so the existing methods still leave ample room to enhance the
quality of HSI.
Deep learning (DL) is a subbranch of machine learning (ML) and has recently shown
remarkable performance, especially in image processing and computer vision. DL is
based on artificial neural networks and has been widely used in areas such as
super-resolution, classification, image fusion, and object detection. DL-based
image fusion methods can extract deep features automatically from the image. They
therefore overcome the difficulties faced by conventional image fusion methods
and make the whole fusion process easier and simpler.
A deep learning-based HS–MS image fusion concept was first introduced by Palsson
et al. in 2017 [19]. They used a 3-D convolutional neural network (3D-CNN) to
fuse the LR–HS and HR–MS images into an HR–HS image.
This method improves the quality of the hyperspectral image while reducing noise
and computational cost. However, the authors focused on enhancing the spatial data
of the LR–HS image without any changes to the spectral information, which caused
degradation of the spectral data [19].
Later, Masi et al. in 2017 [20] proposed a CNN architecture for image
super-resolution, which uses a deep CNN to extract both spatial and spectral
features. The deep CNN acquires features from HSI with a very complex
spatial–spectral structure. However, the authors used a single-branch CNN
architecture, which makes it difficult to extract discriminating features from
the image.
To overcome this drawback, Shao and Cai in 2018 [21] designed a fusion method
that extends the CNN with the depth of a 3D-CNN to obtain better fusion
performance. They used a remote sensing image fusion neural network (RSIFNN)
with two separate CNN branches: one branch extracts the spectral data and the
other extracts the spatial data from the image. In this way, the method exploits
both the spectral and spatial information of the input images to reconstruct a
hyperspectral image with high spectral and spatial resolution.
Yang et al. in 2019 [22] introduced a deep two-branch CNN for HS–MS fusion. This
method uses a two-branch CNN architecture to extract spectral and spatial
features from the LR–HSI and HR–MSI. The features extracted by the two branches
are concatenated and then passed to a fully connected convolution layer to obtain
the HR–HSI. In conventional fusion methods, the HR–HSI is reconstructed in a
band-by-band fashion, whereas CNNs reconstruct all bands jointly, which helps to
reduce the spectral distortion in the fused image. However, this method uses a
fully connected layer for image reconstruction, which is heavily weighted and
increases the number of network parameters.
Chen et al. in 2020 [23] introduced a spectral–spatial feature extraction fusion
CNN (S2FEF-CNN), which extracts joint spectral and spatial features using three
S2FEF blocks. The S2FEF method uses 1D and 2D convolution networks to extract
spectral and spatial features and then fuses them. It uses a fully connected
network layer for dimensionality reduction, which further reduces the network
parameters during fusion. This method shows good results with less computational
complexity than other deep learning-based fusion methods.
Although deep learning-based fusion methods have achieved tremendous
improvements, they still possess several drawbacks [24]. As the network goes
deeper, its performance saturates and then rapidly degrades. This is because each
convolution layer takes its input from the output of the previous layer, so by
the time the signal reaches the last layer, much of the meaningful information
from the initial layers has been lost. The information loss worsens as the
network architecture grows deeper. This brings negative effects such as
overfitting of the data, and the effect is called the vanishing gradient
problem [25].
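The degradation with depth has a simple numerical illustration: backpropagating through a stack of saturating activations multiplies the gradient by the activation's derivative at every layer (at most 0.25 for the sigmoid), so the signal reaching the early layers decays geometrically. A toy sketch, with an arbitrary depth and pre-activation value:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Backpropagated gradient magnitude through 20 sigmoid layers, ignoring
# the weight factors: each layer multiplies by sigmoid'(z) <= 0.25.
grad = 1.0
for _ in range(20):
    z = 0.5                                   # a typical pre-activation
    grad *= sigmoid(z) * (1.0 - sigmoid(z))   # derivative of the sigmoid

# grad is now roughly 2.6e-13: the early layers receive almost no signal.
```

ReLU activations and, as discussed below, skip connections are the standard remedies for this geometric decay.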
Due to the vanishing gradient problem, existing deep learning-based fusion
methods cannot extract detailed features from high-dimensional images. He et al.
[26] introduced a deep network with residual learning to address the vanishing
gradient problem. In this framework, a residual block is added between the layers
to diminish the performance degradation; networks built with this concept are
called residual networks, or ResNets. Therefore, in this work, our aim is to
introduce this ResNet architecture into the standard CNN to extract more detailed
features from both the spatial and spectral data of HSI.
3.1 Dataset
The four real datasets used in this work are Washington DC Mall, Botswana, Pavia
University, and Indian Pines. The Washington DC Mall dataset is a well-known
dataset captured by the HYDICE sensor; it covers a spectral range from 400 to
2500 nm, with a 1278 × 307 pixel size and 191 bands. The Botswana dataset,
captured by the Hyperion sensor over the Okavango Delta in Botswana, covers a
spectral range from 400 to 2500 nm with a 1476 × 256 pixel size and 145 bands.
The Pavia University dataset was captured by the reflective optics spectrographic
imaging system (ROSIS-3) at the University of Pavia, northern Italy, in 2003. It
has a spectral range from 430 to 838 nm, a 610 × 340 pixel size, and 103 bands.
Finally, the AVIRIS Indian Pines dataset was captured by the AVIRIS sensor over
the Indian Pines test site in northwestern Indiana, USA, in 1992. It covers a
spectral range from 400 to 2500 nm, with a 512 × 614 pixel size and 192 bands
[26]. All these datasets have been widely used in earlier spectral
unmixing-based fusion research.
A. Convolution layer
The convolution layer extracts various features from the input image with
the help of filters. In the convolution layer, a mathematical operation is
performed between the input image and a filter with an m × m kernel size. The
filter slides across the input image, computing the dot product of the filter
and the corresponding part of the image. This process is repeated to convolve
the kernel over the whole image, and the output of the convolution operation is
called a feature map. The feature map contains the essential information about
the image, such as the boundaries and edges of objects [28].
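The sliding dot product described above can be written directly in NumPy. The kernel here is a hand-picked edge detector and the image a toy array, both assumptions for illustration; real CNN layers learn their kernels and handle channels, stride, and padding.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """'Valid' 2D cross-correlation (what CNN layers compute): slide the
    kernel across the image and take the dot product at each position."""
    m, n = kernel.shape
    H, W = image.shape
    out = np.empty((H - m + 1, W - n + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + m, j:j + n] * kernel)
    return out

# A 4x4 image with a vertical edge, filtered by a 3x3 Sobel-style kernel.
image = np.array([[0, 0, 1, 1]] * 4, dtype=float)
sobel_x = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
fmap = conv2d_valid(image, sobel_x)   # 2x2 feature map marking the edge
```

Every 3 × 3 window of this image spans the vertical edge, so the resulting 2 × 2 feature map responds uniformly with value 4.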
B. Pooling layer
The convolution layer is followed by a pooling layer, which reduces the size of
the feature map while maintaining the essential features. There are two common
types of pooling: max pooling and average pooling. Max pooling takes the largest
element in each region of the feature map, whereas average pooling computes the
average of the elements in each region [28].
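For non-overlapping windows, both pooling variants reduce to a reshape-and-reduce in NumPy. A minimal sketch with an assumed 4 × 4 feature map:

```python
import numpy as np

def pool2d(fmap, size=2, mode="max"):
    """Non-overlapping pooling with a size x size window (stride = size).
    Assumes the feature-map dimensions are divisible by `size`."""
    H, W = fmap.shape
    # Split into (row-block, row-in-block, col-block, col-in-block).
    blocks = fmap.reshape(H // size, size, W // size, size)
    return blocks.max(axis=(1, 3)) if mode == "max" else blocks.mean(axis=(1, 3))

fmap = np.array([[1., 2., 5., 6.],
                 [3., 4., 7., 8.],
                 [0., 0., 1., 1.],
                 [0., 4., 1., 1.]])
mx = pool2d(fmap, mode="max")   # [[4, 8], [4, 1]]
avg = pool2d(fmap, mode="avg")  # [[2.5, 6.5], [1.0, 1.0]]
```

Either way the 4 × 4 map shrinks to 2 × 2, halving each spatial dimension while keeping a summary of every window.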
C. Activation function
One of the most important characteristics of any CNN is its activation function.
There are several activation functions, such as sigmoid, tanh, softmax, and ReLU,
each with its own importance. ReLU is the most commonly used activation function
in DL and accounts for the nonlinear nature of the input data [28].
y = F(W_i, x) + x (2)

Here x is the input and y is the output of the residual unit; y then becomes the
input to the next residual block. The function F(W_i, x) represents the output of
the stacked convolution layers, and W_i is the weight associated with the i-th
residual block. Figure 1 uses two convolution layers in the residual unit, so the
output of this residual unit can be written as:

y = W_2 · ReLU(W_1 · x) + x (3)

where ReLU represents the rectified linear unit activation function, and W_1 and
W_2 are the weights associated with convolution layers 1 and 2 of residual block
A. Deep residual networks consist of many stacked residual blocks, and each block
can be formulated in general as:

x_{i+1} = F(x_i, W_i) + x_i (4)

where F is the output of a residual block with l stacked convolution layers and
x_i is the residual input to the i-th residual block; x_{i+1} then becomes the
output of the i-th residual block, computed through the skip connection and
element-wise addition. After passing through the ReLU activation layer, the
output of the residual network can be represented as:

x_out = ReLU(x_{i+1}) (5)

Figure 1.
HS–MS fusion using CNN.
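The residual unit just described — two stacked layers, a skip connection, element-wise addition, then ReLU — can be sketched with dense layers standing in for the convolutions, a simplification of the actual blocks:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_unit(x, W1, W2):
    """One residual unit with two stacked layers (dense layers stand in
    for convolutions): the learned residual is added to the input via
    the skip connection, then ReLU is applied."""
    F = W2 @ relu(W1 @ x)   # output of the two stacked layers
    return relu(F + x)      # skip connection + element-wise addition

rng = np.random.default_rng(0)
x = rng.standard_normal(8)
W1, W2 = rng.standard_normal((8, 8)), rng.standard_normal((8, 8))
y = residual_unit(x, W1, W2)

# With zero weights the residual vanishes and the unit acts as an
# identity mapping (up to the final ReLU).
identity = residual_unit(x, np.zeros((8, 8)), np.zeros((8, 8)))
```

The zero-weight case shows the identity-mapping property that keeps gradients flowing through deep stacks of such blocks.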
4. Problem formulation
Z = EA + R (6)

where Z is the original reference image, E and A are the endmember and abundance
matrices, and R is the residual matrix. The observed Y_h and Y_m are spatially
and spectrally degraded versions of the image Z, respectively, represented
mathematically by:

Y_m ≈ SZ + R_m (7)

Y_h ≈ ZB + R_h (8)

where B ∈ ℝ^(N × N/d) is a Gaussian blur filter with blurring factor d, used to
degrade the spatial quality of the reference hyperspectral image Z and obtain the
LR–HSI Y_h. The spectral response function S ∈ ℝ^(L_m × L) downsamples the
spectral dimension of the reference hyperspectral image Z to obtain the HR–MSI
Y_m. The term L_m denotes the number of spectral bands in the multispectral image
after downsampling. In this work, the reference image Z is spectrally
downsampled using the standard Landsat 7 multispectral response, which provides a
high-quality visual image of the Earth's surface, as the HR–MSI with L_m = 7
[28]. Both B and S are sparse matrices containing zeros and ones. In the
literature, the residual matrices R_m and R_h are generally assumed to be
zero-mean Gaussian noise, so the original CNMF method solves:

min_{E, A ≥ 0} ||Y_h − EAB||_F² + ||Y_m − SEA||_F² (9)
However, in this work, we treat the residual terms R_m and R_h as nonnegative
residual matrices to account for the nonlinearity effects in the image fusion [29].
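The degradation operators in Eqs. (7) and (8) can be made concrete with small matrices. The band-averaging S and pixel-averaging B below are illustrative stand-ins with assumed sizes; the chapter's actual S is the Landsat 7 response and its B a Gaussian blur.

```python
import numpy as np

# Illustrative sizes: L bands, N pixels, Lm multispectral bands, blur factor d.
L, N, Lm, d = 8, 16, 2, 4
Z = np.random.default_rng(0).random((L, N))   # reference HR-HSI (Z = EA + R)

# S (Lm x L): each multispectral band averages a group of contiguous HS bands.
S = np.kron(np.eye(Lm), np.ones((1, L // Lm)) / (L // Lm))

# B (N x N/d): averages every d neighbouring pixels (blur + decimation).
B = np.kron(np.eye(N // d), np.ones((d, 1)) / d)

Ym = S @ Z   # HR-MSI: few bands, all pixels   -> Eq. (7) without noise
Yh = Z @ B   # LR-HSI: all bands, fewer pixels -> Eq. (8) without noise
```

The shapes make the roles explicit: Y_m keeps all pixels but only L_m bands, while Y_h keeps all bands but only N/d pixels.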
The objective function of the original CNMF method in Eq. (9) can then be
re-written as:

min_{E, A, R_h, R_m ≥ 0} ||Y_h − EAB − R_h||_F² + ||Y_m − SEA − R_m||_F² (10)

Eq. (10) therefore represents the proposed HS–MS fusion model, which includes the
nonlinear nature of the image. To implement this model, we use the standard deep
neural network architectures CNN and ResNet. To further enhance the proposed
method, we implemented a modified ResNet architecture with different stacked
layers and multiple skip connections.
5. Problem implementation
Similarly, a Conv2D() convolution filter with kernel size r × r and weights w is
used to extract the spatial data from the HR–MSI image Y_m, represented as:

f_spat = Conv2D(ReLU(F(w_ij, Y_m))) (12)

The two convolutional layers use the ReLU (rectified linear unit) activation
function, i.e., ReLU(x) = max(x, 0), to provide a nonlinear mapping of the data.
Finally, the extracted spatial and spectral features are fused to obtain the
high-quality reconstructed image as shown in Eq. (4):

F = ReLU(f_spec · f_spat) (13)
Conv2D, 32 filters, 3 × 3
Conv2D, 64 filters, 3 × 3
Conv2D, 1 filter, 1 × 1

Table 1.
The simple CNN fusion architecture.
In a CNN, each layer takes as input the output of the previous layer, which loses
information as the network architecture grows deeper. This problem in deep neural
networks leads to overfitting of the data and is known as the vanishing gradient
problem [24]. To overcome this, we implemented HS–MS fusion using an alternative
ResNet-based network architecture. In ResNet, we introduce a skip connection
between two convolution layers. This skip connection helps to carry the identity
information throughout the deep convolution network.
The ResNet fusion architecture for HS–MS fusion uses residual (skip) connections,
which improve the feature extraction capability. For the implementation, we use a
1D ResNet to extract the spectral features from the LR–HSI and a 2D ResNet to
extract the spatial features from the HR–MSI. Both the 1D and 2D ResNet
architectures consist of three residual blocks, each having two convolutional
layers and 64 filters, as shown in Figure 2. A 3 × 3 kernel size for the 2D
ResNet and a 1 × 3 kernel size for the 1D ResNet are used to extract the spatial
and spectral data from the MSI and HSI. Each residual block has a ReLU activation
layer to accommodate the nonlinearity constraints included in the proposed
hyperspectral image fusion model, as explained in Eq. (10). Finally, the feature
embedding and image reconstruction are performed using another 2D CNN.
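A quick back-of-the-envelope count of the parameters in these residual branches, assuming 64 input channels for every layer and bias terms (the true first layers differ because their input channel count depends on each dataset's bands):

```python
def conv_params(k_h, k_w, c_in, c_out):
    """Weights plus biases of a single convolution layer."""
    return k_h * k_w * c_in * c_out + c_out

blocks, layers_per_block, filters = 3, 2, 64

# Spatial branch: six 3x3 conv layers; spectral branch: six 1x3 conv layers.
params_2d = blocks * layers_per_block * conv_params(3, 3, filters, filters)
params_1d = blocks * layers_per_block * conv_params(1, 3, filters, filters)

# Under these assumptions: params_2d = 221,568 and params_1d = 74,112.
```

Counts like these explain why the skip-connection variants compared later differ mainly in how many filters each block must carry, not in the cost of a single convolution.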
Figure 2.
Residual block with two stacked layers.
f_spec^i = f_spec^{i-1}(Y_h^l) + r^{i-1}(Y_h^l) (16)

where Y_h denotes the input LR–HSI data, i indexes the residual units
(i = 1, 2, …, I), and l indexes the convolution layers (l = 1, 2, …, L). The
weight of the convolution kernel is represented as W. Finally, the ReLU
activation function is applied to introduce nonlinearity into the output of the
deep network:

F_spec = ReLU(f_spec) (17)
f_spat^i = f_spat^{i-1}(Y_m^l) + r^{i-1}(Y_m^l) (20)

where Y_m denotes the input HR–MSI data, i indexes the residual blocks
(i = 1, 2, …, I), and l indexes the convolution layers (l = 1, 2, …, L). The
weight of the convolution kernel is represented as W. Finally, as in the spectral
extraction, ReLU is applied to introduce nonlinearity into the spatial output of
the deep network:

F_spat = ReLU(f_spat) (21)
Then, the feature embedding and image reconstruction are performed using a ReLU
activation layer. The proposed ResNet fusion framework is shown in Figure 3. The
final generated HR–HSI Z can therefore be written as:

Z = ReLU(F_Z) (23)
Figure 3.
The framework of the proposed ResNet Fusion architecture.
Figure 4.
Representation of short, long, and dense skip connections on ResNet.
In a dense skip connection, each layer in the ResNet receives feature maps from
all the preceding layers, which limits the number of filters and network
parameters needed for extracting deep features. To obtain a high-fidelity
reconstructed image, we propose a modified version of ResNet with long and dense
skip connections, as shown in Figure 4. Figure 4 shows three ResNet
architectures, each with three residual blocks (Res Block) and one of the three
types of skip connections. Algorithm 1 summarizes the procedure of our proposed
ResNet fusion method.
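The dense variant can be sketched as follows: each block consumes the concatenation of the network input and all earlier block outputs. The widths and random weights below are assumptions for illustration only:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def dense_skip_forward(x, weights):
    """Forward pass of a densely connected stack: block i receives the
    concatenation of the input and ALL earlier block outputs, so
    weights[i] maps the growing feature vector back to the block width."""
    features = [x]
    for W in weights:
        concat = np.concatenate(features)   # all preceding feature maps
        features.append(relu(W @ concat))
    return features[-1]

rng = np.random.default_rng(0)
width, blocks = 4, 3
weights = [rng.standard_normal((width, width * (i + 1)))  # inputs grow: 4, 8, 12
           for i in range(blocks)]
out = dense_skip_forward(rng.standard_normal(width), weights)
```

Because every block sees all earlier features directly, each block can stay narrow, which is the parameter saving reported for the dense-skip architecture below.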
Table 2.
The performance evaluation of different fused algorithms on four hyperspectral datasets.
UIQI show good spatial quality and a high-fidelity reconstructed image with less
spectral distortion. Table 2 further shows that good spectral preservation is
obtained on the Botswana dataset: the SAM value is reduced by more than 0.02 dB.
At the same time, significant spatial preservation is achieved on the Indian
Pines dataset, where the PSNR value increases by 1.5 dB.
The above work is extended by introducing different numbers of stacked
convolution layers in the residual block of the ResNet. The experimental results
are shown in Table 3. From the SAM values in Table 3, it is clear that the
spectral information of the image degrades as the number of stacked layers in the
residual block increases. The UIQI values in Table 3 also reveal that the quality
of the reconstructed image diminishes as the number of stacked layers increases.
The PSNR and ERGAS show stable performance, which confirms the spatial
consistency of our proposed method. From the results in Table 3, we therefore
conclude that the ResNet fusion network with two stacked convolution layers
acquires the most discriminative features from the source images and guarantees
the quality of the reconstructed image.
Figure 5 is the visual representation of the output of our proposed ResNet fusion
method on the four benchmark datasets against all other baseline methods. From
the figure, it is evident that ResNet fusion with two stacked convolution layers
produces better results in most of the highlighted areas of the four datasets.
Table 3.
The performance of ResNet fusion by varying the stacked layers.

Figure 5.
The ground truth and fused images of the different methods on the four benchmark datasets.

We further extend the ResNet fusion architecture to reduce the number of
parameters, making our proposed method more efficient and effective at handling
high-dimensional data. For that, we applied short, long, and dense skip
connections to the ResNet architecture with two stacked convolution layers.
Table 4 gives the total number of network parameters required by the ResNet
architecture with each skip connection. From Table 4, it is clear that the ResNet
architecture with a dense skip connection requires far fewer network parameters
than ResNet with short or long skip connections.

CNN: 31,586,081 parameters

Table 4.
The performance of the different skip connections.
A. Time complexity

The performance and running time of all the proposed algorithms on the four
benchmark datasets are compared in Figure 6. From this figure, it is evident that
ResNet fusion with a dense skip connection took the least running time while
showing good performance in reconstructing a high-fidelity hyperspectral image.
Figure 6.
The running time (in seconds) of the traditional and deep learning HS–MS image
fusion methods (FC-CNMF, CNN, short-skip ResNet, and long-skip ResNet) on the
Washington DC Mall, Pavia University, Indian Pines, and Botswana datasets.
Comparing ResNet with long and short skip connections, the long-skip ResNet
fusion architecture shows better performance and running time than the short-skip
version. Evaluating all of the ResNet fusion architectures, ResNet with a dense
skip connection outperformed the other two. Comparing performance and running
time overall, the FC-CNMF method showed better performance and time than the
CNN-based fusion. Finally, we conclude that ResNet with a dense skip connection,
with fewer network parameters, shows outstanding performance in reconstructing an
HR–HSI of good spatial and spectral quality compared with all the other proposed
methods. However, although all our proposed methods perform well, the cost
incurred in terms of time is high.
Name | Layer | Kernel size | Input size | Input content | Stride | Padding | Activation | Output size | Output content
Flatten layer | Conv 9 (1D-CNN) | 1 × 1 | 32 | 1D Conv 8 | 1 | same | ReLU | 1 | Spectral data
Upsampling layer | Conv 10 (2D-CNN) | 3 × 3 | 1 | Spectral/Spatial data | 1 | same | ReLU | 32 | Spectral × Spatial
Output layer | Conv 11 (2D-CNN) | 3 × 3 | 32 | Spectral × Spatial | 1 | same | ReLU | 64 | Fused image

Table 5.
ResNet-dense-skip architecture of HS–MS image fusion.
We experimented with three skip connections: short, long, and dense. From this
experiment, we found that ResNet with a dense skip connection reduces the number
of network parameters to a large extent. Finally, we built a generative ResNet
model for the fusion of the HS–MS images, as shown in Table 5. The ResNet fusion
model uses 1D and 2D convolution networks. These two convolution networks each
consist of three residual blocks; each residual block contains two convolution
layers with 64 filters, a 3 × 3 kernel size, stride = 1, max pooling, and
padding = same. To make the information flow accurately throughout the network,
we use a dense skip connection. Finally, a 2D convolution decodes the
reconstructed image into the original format.
7. Conclusion
Author details
© 2022 The Author(s). Licensee IntechOpen. This chapter is distributed under the terms of
the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0),
which permits unrestricted use, distribution, and reproduction in any medium, provided
the original work is properly cited.
References