
IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, VOL. 17, 2024

Context Aware Edge-Enhanced GAN for Remote Sensing Image Super-Resolution

Zhihan Ren, Lijun He, and Jichuan Lu

Abstract—Remote sensing images are essential in many fields, such as land cover classification and building extraction. The huge difference between the directly acquired remote sensing images and the actual scene, caused by the complex degradation process and hardware limitations, seriously affects the performance achieved by the same classification or segmentation model. Therefore, using super-resolution (SR) algorithms to improve image quality and achieve better results is an effective approach. However, current SR methods only focus on the similarity of pixel values between SR and high-resolution (HR) images without considering perceptual similarities, which usually leads to oversmoothed and blurred edge details. Moreover, little attention has been paid to human visual habits and machine vision applications for remote sensing images. In this work, we propose the context aware edge-enhanced generative adversarial network (CEEGAN) SR framework to reconstruct visually pleasing images that can be practically applied in actual scenarios. In the generator of CEEGAN, we build an edge feature enhanced module (EFEM) to enhance the edges by combining the edge features with context information. An edge restoration block is designed to fuse multiscale edge features enhanced by EFEM and reconstruct a refined edge map. Furthermore, we design an edge loss function to constrain the similarity of the generated SR and HR images in the edge domain. Experimental results show that our proposed method can obtain SR images with better reconstruction performance. Meanwhile, CEEGAN achieves the best results on classification and semantic segmentation datasets for machine vision applications.

Index Terms—Edge enhancement, generative adversarial network (GAN), remote sensing images, super-resolution (SR).

Manuscript received 12 October 2023; revised 7 November 2023; accepted 10 November 2023. Date of publication 15 November 2023; date of current version 14 December 2023. This work was supported in part by the National Key R&D Program of China under Grant 2022ZD0115802, in part by the Central Guidance on Local Science and the Technology Development Fund under Grant 2022ZY1-CGZY-01HZ01, in part by the Natural Science Foundation of Sichuan Province under Grant 2022NSFSC0966, and in part by the Key R&D Project in Shaanxi Province under Grant 2023-ZDLNY-65. (Corresponding author: Lijun He.)

Zhihan Ren is with the Shaanxi Key Laboratory of Deep Space Exploration Intelligent Information Technology, School of Information and Communications Engineering, Xi'an Jiaotong University, Xi'an 710049, China (e-mail: [email protected]).

Lijun He is with the Shaanxi Key Laboratory of Deep Space Exploration Intelligent Information Technology, School of Information and Communications Engineering, Xi'an Jiaotong University, Xi'an 710049, China, and also with the Sichuan Digital Economy Industry Development Research Institute, Chengdu, Sichuan 610036, China (e-mail: [email protected]).

Jichuan Lu is with the China Mobile Communications Group Shaanxi Company Ltd., Xi'an 710077, China (e-mail: [email protected]).

Digital Object Identifier 10.1109/JSTARS.2023.3333271

© 2023 The Authors. This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4.0/

I. INTRODUCTION

SUPER-RESOLUTION (SR) is the process of restoring a high-resolution (HR) image from a given low-resolution (LR) image. With the development of remote sensing image application technology, remote sensing images are extensively used in hyperspectral applications [1], [2], [3], [4], [5], [6], [7], [8], object detection [9], [10], change detection [11], [12], [13], and other fields. However, image SR is an ill-posed problem because one LR image may degenerate from several different HR images. The SR algorithm is to find an optimal HR image from all possible solutions. The quality of remote sensing images is limited by hardware equipment and natural environment interference, such as clouds and fog, which leads to serious degradation of the acquired remote sensing images, so that they no longer match the actual scene. However, natural image SR algorithms neglect these factors, leading to unsatisfactory results on remote sensing images. Specifically, incorrect edges can cause semantic segmentation models to be unable to distinguish pixels at the edge, or image classification models to be unable to correctly classify images. Therefore, the development of a remote sensing image SR algorithm that can accommodate both human visual perception and machine vision applications is crucial.

Over the past few decades, SR has attracted great attention from researchers and many SR methods have emerged. Existing methods can be divided into two main categories: 1) traditional methods and 2) deep learning-based methods. Traditional methods can be further categorized into interpolation methods and reconstruction methods. Interpolation methods use the same kernel without considering the position of the pixel point. Interpolation methods, such as nearest neighbor interpolation, bilinear interpolation, and bicubic interpolation [14], can improve the image resolution based on the content of the image itself but cannot provide more information. However, they may cause edge blurring and fail to achieve the desired visual quality. The reconstruction-based methods [15], [16] solve the problem from the perspective of image degradation models, assuming that the HR images are obtained from the LR images with appropriate motion transformation, blurring, and noise. These methods constrain the generation of SR images by extracting information from LR images and combining it with prior knowledge. However, reconstruction-based methods suffer from the problems of complex optimization methods and high computational costs.

With the development of deep learning, SR algorithms based on convolutional neural networks (CNN) have been proposed [17]. Based on this, a large body of research [18], [19], [20], [21], [22], [23], [24], [25] has been devoted to optimizing networks by using L1 or L2 loss functions. These methods
use CNN to learn the implicit mapping between pairs of LR and HR images and then predict the HR images corresponding to the LR images based on the learned mapping relationship. However, these methods select PSNR as the evaluation metric, and the generated images with high PSNR values may not align with human visual perception. In addition to the PSNR-oriented methods, there is another type of methods [26], [27], [28], [29], [30], [31], [32], [33], [34], [35], [36] that introduce the generative adversarial network (GAN) [37] into the field of SR and aim to generate more realistic images. Although the images generated by GAN-based methods are more realistic compared to PSNR-oriented methods, the high-frequency details learned by GAN-based methods may not be consistent with the actual situation due to the lack of constraints and the severe degradation of remote sensing images. Jiang et al. [29] noticed that existing methods cannot generate correct edges and proposed a denoising approach through mask processing. However, this method solely focused on extracting and utilizing features from the edge domain, neglecting the valuable contextual semantic information present in remote sensing images. Moreover, it does not impose any constraints on the enhanced edges, resulting in suboptimal performance. Recently, the transformer and the diffusion model have been widely used in the field of computer vision [38], [39], [40]. Some works have extended the transformer and the diffusion model to the field of SR. SwinIR [41] proposed an image reconstruction model based on the Swin Transformer [40]. The transformer-based enhancement network (TransENet) [42] proposed a multistage enhancement structure based on the transformer. Inspired by the denoising diffusion probabilistic model, image super-resolution via iterative refinement (SR3) [43] generates SR images through a stochastic iterative denoising process.

In general, there are still some limitations of the existing algorithms.

1) The images generated by the PSNR-oriented methods may not align with human visual perception since they ignore the perceptual similarity, which results in oversmoothed images.

2) The practical applications of GAN-based methods may be limited due to their susceptibility to noise and inaccuracies in generating essential high-frequency information in images, especially in the case of remote sensing images with complex degradation processes.

To address the above problems, we propose an SR framework that jointly exploits context semantic information for edge enhancement of remote sensing images, named context aware edge-enhanced generative adversarial network (CEEGAN). The main contributions of this work are summarized as follows.

1) An SR framework for both human vision and machine analysis: The existing SR frameworks are primarily focused on restoration at the pixel level, and a high PSNR may not satisfy the requirements for high-level computer vision tasks in practical scenarios. Remote sensing images usually contain a multitude of objects with different scales, shapes, and complex spatial relationships. Therefore, remote sensing images possess highly rich contextual semantic features. However, in the complex degradation process of remote sensing images, the contextual semantic features can be severely affected. CEEGAN performs the information exchange between contextual semantic features and edge features, and integrates contextual features into the image, aiming to restore the texture information as much as possible in the generated SR images. Both types of features complement each other to satisfy the needs of both human and machine vision.

2) An edge enhancement module based on context guidance: To enhance the images with blurred edge details, we explore and integrate multiscale (MS) edge features and context semantic features in the edge feature enhanced module (EFEM). The high-level context semantic features are beneficial to guide the extraction and enhancement of edge features. Therefore, we keep the information interaction between the context semantic and edge branches to help the edge branch understand the high-level semantic information in the image. Moreover, we fuse the MS features enhanced by EFEMs and reconstruct the edge map through the edge restoration block (ERB).

3) An edge loss function to generate more realistic images: Because L1 and L2 losses neglect edges, which leads to oversmoothed images, and GAN loss is sensitive to noise, we expand the attention of SR models from the pixel domain to the edge domain by designing an edge loss function that constrains the SR and HR images in the edge domain, making the generated images more realistic and preserving important edge details.

The rest of this article is organized as follows. Section II summarizes the related works. Section III introduces the details of CEEGAN. The experimental results are given in Section IV. Finally, Section V concludes this article.

II. RELATED WORK

A. PSNR-Oriented SR Models

CNN was introduced to SR by Dong et al. [17], who pioneered a super-resolution convolutional neural network (SRCNN) with three convolutional layers. SRCNN utilized the L2 loss function to optimize the PSNR, aiming to surpass traditional methods with the strong nonlinear fitting capability of CNN. With the introduction of ResNet [44], SRResNet [26] introduced the residual network into the SR network to combine low-level and deep-level features and improve the network's learning ability. To enhance the performance of the network, Lim et al. [27] proposed the enhanced deep residual network (EDSR), which removed the BN layer and increased the model size without increasing computational resources. Furthermore, Zhang et al. [18] proposed a residual channel attention network (RCAN) with a channel attention mechanism, which can adaptively rescale channel features. With the development of the transformer, Chen et al. [45] constructed the first transformer model for image SR, and Liang et al. [41] proposed SwinIR, an image reconstruction model based on the Swin Transformer [40].

Remote sensing images are more complex than natural images in terms of the degradation process and contain objects of varying sizes and shapes, which makes SR of remote sensing
images more challenging. Therefore, some works proposed SR algorithms for remote sensing images, considering their unique characteristics and challenges. Lei et al. [19] designed a local-global combined network, which combined local and global information by cascading shallow and deep feature mappings. To further integrate information from different depths, Zhang et al. [20] proposed the mixed high-order attention network, which made full use of hierarchical features through high-order attention. Some works addressed the problem of remote sensing image SR from the direction of MS features. The MS attention network [21] employed convolutions with different kernel sizes to extract MS features of remote sensing images and used the channel attention mechanism to fuse features at different scales. Dong et al. [22] proposed a second-order MS super-resolution network to maximize the use of learned MS information by exploiting small-difference and large-difference features at the local and global levels, respectively. The hybrid-scale self-similarity exploitation network (HSENet) [23] used self-similarity to learn internal recurrence at single and cross scales in remote sensing images, achieving stronger feature representation. In contrast to the abovementioned methods, TransENet [42] proposed a transformer-based MS enhancement structure to fuse high- and low-dimensional features.

B. GAN-Based SR Models

Ledig et al. [26] first applied GAN to image SR reconstruction and proposed SRGAN, which added perceptual loss and adversarial loss to make the generated images more realistic. To fully leverage the features across different layers, the ultradense GAN (udGAN) [36] proposed the ultradense residual block, which reforms the internal layout of the residual block into a two-dimensional matrix topology. To improve the judgment of discriminators in GAN, enhanced SRGAN (ESRGAN) [28] proposed a relativistic GAN that allowed the discriminator to predict relative realness instead of the absolute value and used the network interpolation method to balance the conflict between objective and subjective evaluation metrics. Similarly, coupled-discriminated GANs [46] proposed a discriminator that makes judgments based on SR images and HR images to better distinguish the input. To solve the problem of unclear edges in generated images, the edge-enhanced GAN (EEGAN) [29] proposed an edge enhancement strategy that purifies noisy images to generate precise edges and enhance image contours. The multiattention GAN [30] proposed branch attention to integrate upsampled LR images with high-level features. Different from the PSNR-oriented SR methods, which pursue high PSNR values, the GAN-based methods prefer to generate more realistic images. However, the GAN-based methods are sensitive to noise, which leads the generator to add incorrect high-frequency details to SR images. This shortcoming limits the performance of SR images in subsequent high-level computer vision tasks.

Fig. 1. Overall structure of the CEEGAN. Edge extraction denotes the Laplacian operator. The blue line indicates the network flow of CEEGAN. The orange line indicates the calculation process of the loss function of CEEGAN. SRLoss contains L1 loss, adversarial loss, and perceptual loss.

III. METHOD

A. Overview of the Proposed CEEGAN

Fig. 1 illustrates the overall framework of CEEGAN. The generator of CEEGAN can be divided into the following three parts: initial feature extraction (IFE), EFEM, and ERB.

The original LR image I_LR ∈ R^(H×W×3), where H and W denote the height and width of the image, respectively, is first processed by the pre-super-resolution module (PSRM) to generate a Pre-SR image I_P ∈ R^(4H×4W×3), which can achieve reconstruction of most regions in LR images except for the edges. The Pre-SR image is simultaneously fed into the edge and context semantic feature extraction layers to obtain the corresponding features. In the
EFEM, the bidirectional edge feature fusion block (BiEFFB) and the bidirectional context feature fusion block (BiCFFB) are used for bidirectional weighted fusion to learn the importance of features between different scales. Simultaneously, we design the context edge information exchange block (CEIEB) to exchange information between the two branches, which is helpful in guiding edge information using high-level context semantic information. For the MS enhanced features output by the EFEMs, the ERB is deployed to aggregate them into an enhanced edge E* ∈ R^(4H×4W×3). Finally, we replace the blurred edges in I_P with the enhanced edges E* to obtain a reconstructed image I_SR ∈ R^(4H×4W×3) with refined and accurate edges.

For the discriminator, the VGG-Discriminator [26] is chosen to determine whether the output image I_SR of the generator is real or fake, which is widely used in the SR domain.
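For illustration, the overall data flow of the generator can be summarized by the following minimal PyTorch-style sketch. The module internals (PSRM, feature extractors, EFEM, ERB) are placeholders standing for the components detailed in the rest of this section; the sketch is our own illustrative rendering of the pipeline, not the exact implementation.

import torch.nn as nn

class CEEGANGenerator(nn.Module):
    # Illustrative sketch of the generator pipeline: PSRM -> initial feature
    # extraction -> N cascaded EFEMs -> ERB -> edge replacement, see (1)-(7) below.
    def __init__(self, psrm, laplacian, edge_extractor, ctx_extractor, efems, erb):
        super().__init__()
        self.psrm = psrm                      # pre-super-resolution module (SwinIR-based)
        self.laplacian = laplacian            # fixed eight-directional Laplacian, (1)
        self.edge_extractor = edge_extractor  # multiscale conv stack on the edge map
        self.ctx_extractor = ctx_extractor    # multiscale conv stack on the Pre-SR image
        self.efems = nn.ModuleList(efems)     # N cascaded EFEMs
        self.erb = erb                        # edge restoration block

    def forward(self, lr):
        pre_sr = self.psrm(lr)                # I_P, 4x upscaled Pre-SR image
        edge_map = self.laplacian(pre_sr)     # L(I_P)
        f_e = self.edge_extractor(edge_map)   # initial multiscale edge features F_e^1
        f_c = self.ctx_extractor(pre_sr)      # initial multiscale context features F_c^1
        f_e_init = f_e                        # kept for the residual connection in (4)
        for efem in self.efems:
            f_e, f_c = efem(f_e, f_c)         # bidirectional fusion + information exchange
        enhanced_edge = self.erb(f_e, f_e_init)       # E*
        return pre_sr + enhanced_edge - edge_map      # I_SR = I_P + E* - L(I_P), (7)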
B. Initial Feature Extraction

Due to the complex degradation in LR images, a significant amount of crucial high-frequency information is lost, making it difficult to achieve satisfactory results by directly enhancing the edges of LR images. Therefore, to provide exact information for subsequent modules to extract features, the PSRM is built to process the LR image and generate Pre-SR images. The PSRM is an SR network that builds upon the strengths of SwinIR [41], which is a state-of-the-art solution in the SR domain, while also addressing its limitations. Although SwinIR can generate relatively clear images, its use of the L1 loss function may cause the generated SR images to appear overly smooth. This is due to the tendency of the network to generate images with pixel values closer to the average, resulting in the loss of high-frequency information and details.

To address the limited ability of existing methods to accurately recover edges using only edge domain features, we extract features from both the pixel and edge domains via two convolutional layers. Let F_c^1 = [f_c1^1, f_c2^1, f_c3^1] and F_e^1 = [f_e1^1, f_e2^1, f_e3^1] denote the initial MS context features and edge features extracted by the corresponding convolutional layers. Taking the edge features as an example, an edge image of size 4H × 4W sequentially passes through three convolutional blocks, resulting in outputs of size 4H × 4W, 2H × 2W, and H × W, respectively. Specifically, each convolutional block consists of three convolutional operators with kernel sizes of 1 × 1, 3 × 3, and 3 × 3. The only difference among the three blocks lies in the stride of the last 3 × 3 convolutional operator, which is 1, 2, and 2, respectively. The computation process for the context features is the same as that for the edge features. The convolutions in the pixel domain can extract rich context semantic features, which can guide the enhancement of the edge features. Moreover, objects in remote sensing images usually cover a large span in the scale dimension, which makes it challenging for a single-scale feature to capture all the information needed for SR reconstruction. Therefore, the proposed convolutional layers can extract MS features, which enables them to adapt to various object sizes in remote sensing images.
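A minimal sketch of such an initial MS extractor is given below; only the kernel sizes and strides are specified above, so the channel width and the LeakyReLU activations are assumptions on our side.

import torch.nn as nn

def conv_block(in_ch, out_ch, last_stride):
    # One block of the initial extractor: 1x1 conv, 3x3 conv, 3x3 conv; only the
    # stride of the last 3x3 conv differs between the three blocks (1, 2, 2).
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=1),
        nn.LeakyReLU(0.2, inplace=True),
        nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
        nn.LeakyReLU(0.2, inplace=True),
        nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1, stride=last_stride),
        nn.LeakyReLU(0.2, inplace=True),
    )

class InitialFeatureExtractor(nn.Module):
    # Produces three scales (4H x 4W, 2H x 2W, H x W) from a 4H x 4W input,
    # as described above; the same structure is used for the edge and context branches.
    def __init__(self, in_ch=3, ch=64):
        super().__init__()
        self.block1 = conv_block(in_ch, ch, last_stride=1)
        self.block2 = conv_block(ch, ch, last_stride=2)
        self.block3 = conv_block(ch, ch, last_stride=2)

    def forward(self, x):
        f1 = self.block1(x)   # 4H x 4W
        f2 = self.block2(f1)  # 2H x 2W
        f3 = self.block3(f2)  # H x W
        return [f1, f2, f3]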
For the edge extraction, there are several candidate methods, such as Canny [47], HED [48], and EDTER [49]. Although these edge extraction algorithms can obtain accurate contours of objects, they cannot reflect the high-frequency information in noncontour regions, which is equally important for edge reconstruction. Moreover, many objects in remote sensing images do not have clear and well-defined target edges, and we need the trend of the object contours in the image. Therefore, these edge extraction methods cannot meet the requirements of our proposed framework. The Laplacian is a second-order derivative-based edge-aware operator, which can extract the edges in an image while suppressing the smooth regions. Due to its nonlinear nature, it can enhance image edges and fine details without introducing artifacts or ringing effects. Therefore, the eight-directional Laplacian operator is chosen for edge extraction, as it can represent the intensity of the change at each position in the image. The Laplacian operator L(·) of a given image h can be expressed as

L(h) = [[−1, −1, −1], [−1, 8, −1], [−1, −1, −1]] ⊗ h    (1)

where ⊗ is the convolution operator.
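In practice, (1) amounts to filtering the image with a fixed 3 × 3 kernel. A minimal PyTorch sketch, applied channel-wise and written as our own illustrative rendering, is:

import torch
import torch.nn.functional as F

def laplacian_edges(img: torch.Tensor) -> torch.Tensor:
    # Eight-directional Laplacian of (1) for an N x C x H x W tensor,
    # implemented as a fixed (non-learnable) depthwise convolution.
    kernel = torch.tensor([[-1., -1., -1.],
                           [-1.,  8., -1.],
                           [-1., -1., -1.]], device=img.device, dtype=img.dtype)
    kernel = kernel.view(1, 1, 3, 3).repeat(img.shape[1], 1, 1, 1)
    # groups=C filters each channel independently.
    return F.conv2d(img, kernel, padding=1, groups=img.shape[1])

Because the kernel is symmetric, cross-correlation and true convolution coincide here, so the sketch matches (1) exactly.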
REN et al.: CONTEXT AWARE EDGE-ENHANCED GAN FOR REMOTE SENSING IMAGE SUPER-RESOLUTION 1367

Fig. 2. Architecture of the BiEFFB and BiCFFB. ⊕ denotes the weighted sum operation. The LeakyReLU after each Conv layer is omitted for simplicity.

C. Edge Feature Enhanced Module

Due to the particular characteristics of remote sensing images, natural factors such as clouds and fog often interfere with the objects and blur the edge features of the objects. In addition, owing to the complex background characteristics of remote sensing images, a large amount of noise is introduced in the feature extraction process. These issues lead to insufficient initial edge feature characterization ability. To address this problem, we design bidirectional connection and weighted feature fusion blocks, named BiEFFB and BiCFFB. To preserve the spatial consistency of the two categories of features, we maintain the same architectural settings of the convolution parameters. As shown in Fig. 2, BiEFFB or BiCFFB consists of four 3 × 3 convolutional layers with stride 1. Let f̂_e1^(i+1) denote the smallest size feature in the output of the ith EFEM; it can be calculated as follows:

f̂_e1^(i+1) = LReLU(Conv((w_1^(i+1) · f̂_e1^i + w_2^(i+1) · Resize(f̂_e2^(i+1))) / (w_1^(i+1) + w_2^(i+1) + ε)))    (2)

where LReLU(·) indicates the LeakyReLU activation function, and Resize(·) represents the use of interpolation operations to make the dimensions consistent. w_1^(i+1) and w_2^(i+1) are two learnable weights that refer to the importance of each feature. The superscript i + 1 indicates that the variable represents the weight for the (i + 1)th EFEM. ε is a small value added to increase stability and is set to 10^−6. According to the same calculation method, we can obtain the enhanced features F̂_e^(i+1) = [f̂_e1^(i+1), f̂_e2^(i+1), f̂_e3^(i+1)] and F̂_c^(i+1) = [f̂_c1^(i+1), f̂_c2^(i+1), f̂_c3^(i+1)] from F̂_e^i and F̂_c^i. The quality of the edge features is essential to determining the quality of the final edge-enhanced images obtained by the subsequent modules. To gradually convert the edge features containing noise into accurate features, N EFEMs are deployed as a cascaded module to fuse and exchange information for each context and edge feature. The analysis and discussion of the number of EFEM cascades are presented in Section IV-D4.
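A minimal sketch of one such fusion node is given below; BiEFFB and BiCFFB share this structure and differ only in the features they operate on. The weight parameterization, its ReLU clamping, and the activation slope are assumptions of ours that simply keep the normalization in (2) well behaved.

import torch
import torch.nn as nn
import torch.nn.functional as F

class WeightedFusion(nn.Module):
    # One fusion node of BiEFFB/BiCFFB as in (2): two inputs are combined with
    # learnable, normalized scalar weights, then refined by a 3x3 conv + LeakyReLU.
    def __init__(self, channels, eps=1e-6):
        super().__init__()
        self.w = nn.Parameter(torch.ones(2))
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.act = nn.LeakyReLU(0.2, inplace=True)
        self.eps = eps

    def forward(self, own_feat, other_feat):
        # Resize the feature coming from the other scale so both inputs match spatially.
        other_feat = F.interpolate(other_feat, size=own_feat.shape[-2:],
                                   mode='bilinear', align_corners=False)
        w = F.relu(self.w)  # keep the fusion weights non-negative
        fused = (w[0] * own_feat + w[1] * other_feat) / (w[0] + w[1] + self.eps)
        return self.act(self.conv(fused))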
Fig. 3. Architecture of the CEIEB. Variables with the hat symbol indicate raw features without information exchange, and variables without the hat symbol indicate refined features after information exchange. ⊗ and ⊕ denote the element-wise multiplication and sum operation, respectively. The LeakyReLU after each Conv layer is omitted for simplicity.

To overcome the limitation of current edge enhancement methods that only use edge features, we design a method that exploits the interplay between the context semantic features present in the pixel domain and the edge features to improve the accuracy of edge reconstruction. As shown in Fig. 3, the context semantic features from the context branch facilitate the edge branch's understanding of high-level context semantic information and contribute to the network's ability to learn edges better, resulting in more accurate edge reconstruction results. The information exchange process can be expressed as follows:

F_e^i = F̂_e^i + LReLU(Conv(F̂_c^i ⊙ F̂_e^i))
F_c^i = F̂_c^i + LReLU(Conv(F̂_e^i ⊙ F̂_c^i))    (3)

where ⊙ denotes the element-wise multiplication operation. In practical applications, we calculate each element contained in F̂_e^i and F̂_c^i according to (3). The results F_e^i and F_c^i can be used as input for the next EFEM. The use of multiple EFEMs allows for the progressive refinement of image features and the improvement of image quality over successive iterations. After N EFEMs of enhancement, we can obtain the final results F_e^(N+1) and F_c^(N+1), representing the enhanced edge and context features, respectively.
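The exchange in (3) can be sketched as follows; the channel width and activation slope are assumptions, and the same block is applied independently at every scale.

import torch.nn as nn

class CEIEB(nn.Module):
    # Context-edge information exchange of (3): each branch is refined by a
    # convolved, element-wise product with the other branch, added back as a residual.
    def __init__(self, channels):
        super().__init__()
        self.conv_e = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv_c = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.act = nn.LeakyReLU(0.2, inplace=True)

    def forward(self, f_e_hat, f_c_hat):
        f_e = f_e_hat + self.act(self.conv_e(f_c_hat * f_e_hat))  # edge branch, guided by context
        f_c = f_c_hat + self.act(self.conv_c(f_e_hat * f_c_hat))  # context branch, guided by edges
        return f_e, f_c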

Fig. 4. Architecture of the ERB. R2 and R4 denote the 2× upscale and 4× upscale interpolation operation, respectively. ⊗ denotes the element-wise multiplication operation.

Fig. 5. Visualization comparison of different methods on the UCMerced dataset. (a) Baseballdiamond38. (b) Building12. The GT images are zoomed in and displayed in the top-left corner.

D. Edge Restoration Block

Most SR methods only extract and utilize features at a single scale, which results in limited utilization of the image's information and consequently restricts the effectiveness of the model. To take advantage of MS features, we design the ERB, which can aggregate the enhanced MS edge features obtained from the EFEMs to obtain an enhanced edge image that can reflect the real edges of objects of different scales and classes. The structure of the ERB is depicted in Fig. 4. Let F_R = [f_r1, f_r2, f_r3] denote the output features of the residual connection at different scales. These features are resized to the same shape as the SR image and concatenated together to form a unified feature representation F_M. The process can be expressed as

F_R = F_e^(N+1) + F_e^1 = [f_r1, f_r2, f_r3]    (4)

F_M = Concat(f_r1, R2(f_r2), R4(f_r3))    (5)

where Concat(·) represents the concatenation operation in the channel dimension, and R2 and R4 denote the 2× upscale and 4× upscale interpolation operations, respectively. After the scaling operation, all three features have the same feature dimension, i.e., f_r1, R2(f_r2), R4(f_r3) ∈ R^(4H×4W×C), where C denotes the number of output channels. The output F_M ∈ R^(4H×4W×3C) will be used for further feature fusion.

Distinct from features present in the pixel domain, features extracted from the edge map contain limited information. Additionally, in the edge map, distinct objects may be represented by the same value, making it challenging to accurately distinguish between different objects during edge map reconstruction. To reduce the impact of noise and improve the accuracy of the reconstructed image, it is necessary to deeply explore the relationship between different regions of the input image. Coordinate attention (CA) [50] is employed to enhance the model's understanding of edge features; it integrates location information into the channel dimension, enabling more precise localization and identification of the target of interest. By leveraging CA, our model gains a deeper understanding of the spatial context and improves its ability to accurately capture and utilize edge features for enhanced performance.

As shown in Fig. 4, the output of the CA is then passed through a convolution network consisting of three layers with kernel sizes of 1 × 1, 3 × 3, and 1 × 1, respectively. Each layer is followed by a LeakyReLU activation function. The MS features are fused to produce the enhanced edge map E*. The process can be expressed as follows:

E* = FS(CA(F_M))    (6)

where CA(·) denotes the CA module and FS(·) represents the feature fusion module.

As shown in Fig. 1, to obtain the SR image I_SR, we add the desired amount of edge enhancement, denoted as E* − L(I_P), to the Pre-SR image I_P. The plus and minus in Fig. 1 represent addition and subtraction operations, respectively, performed pixel by pixel when computing the SR image. This calculation process can be expressed as

I_SR = I_P + E* − L(I_P).    (7)
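Putting (4)-(7) together, the ERB and the final edge replacement can be sketched as below. The channel widths of the fusion head and the concrete coordinate attention implementation are assumptions; any standard CA module [50] can be plugged in for the coord_att argument.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ERB(nn.Module):
    # Edge restoration block of (4)-(6): residual connection with the initial edge
    # features, upsample-and-concatenate across scales, coordinate attention, and a
    # 1x1 / 3x3 / 1x1 fusion head producing the enhanced edge map E*.
    def __init__(self, channels, coord_att):
        super().__init__()
        self.ca = coord_att  # coordinate attention module operating on 3 * channels
        self.fuse = nn.Sequential(
            nn.Conv2d(3 * channels, channels, kernel_size=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(channels, 3, kernel_size=1),
            nn.LeakyReLU(0.2, inplace=True),
        )

    def forward(self, f_e_final, f_e_initial):
        # (4): per-scale residual connection between enhanced and initial edge features.
        f_r = [a + b for a, b in zip(f_e_final, f_e_initial)]
        size = f_r[0].shape[-2:]  # 4H x 4W, the largest scale
        # (5): rescale the two smaller scales and concatenate along channels.
        f_m = torch.cat([f_r[0],
                         F.interpolate(f_r[1], size=size, mode='bilinear', align_corners=False),
                         F.interpolate(f_r[2], size=size, mode='bilinear', align_corners=False)],
                        dim=1)
        return self.fuse(self.ca(f_m))  # (6): E* = FS(CA(F_M))

# (7): the SR image replaces the blurred edges of the Pre-SR image with E*:
# i_sr = i_p + e_star - laplacian_edges(i_p)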

Fig. 6. Visualization comparison of different methods on the AID dataset. (a) Center253. (b) Viaduct242.

E. Loss Functions in CEEGAN

In the SR field, the L1 loss is a widely used loss function. It can reflect the difference between HR and SR images in the pixel domain. The L1 loss is calculated as follows:

L_1 = ||I_HR − I_SR||_1    (8)

where I_HR refers to the ground-truth image. The L1 loss is beneficial for optimizing PSNR, but the network will tend to output smooth results without sufficient high-frequency detail because the L1 loss and the PSNR metric are fundamentally inconsistent with the subjective evaluation of human observers. GAN-based methods typically use the sum of a pixel loss and an adversarial loss. In addition, to ensure the generation of high-frequency details, it is also necessary to utilize a perceptual loss [51], which is usually computed with a pretrained VGG network. The adversarial loss L_adv and the perceptual loss L_per are defined as

L_adv = −log(D(I_SR)) = −log(D(G(I_LR)))    (9)

L_per = Σ_(l=1)^n ω_l ||φ_l(I_HR) − φ_l(I_SR)||_2^2    (10)

where φ_l(·) represents the lth layer of the VGG19 network and ω_l refers to the weight of the lth layer. G(·) and D(·) refer to the generator and the discriminator, respectively.

Although the combination of the abovementioned loss functions performs well on natural images, it suffers from the severe image degradation in the field of remote sensing images. This results in reconstructed images that still have noise and artifacts in the high-frequency detail parts. To solve this problem, we strengthen the constraints on the high-frequency component of the image by designing the edge loss L_Edge, which is formulated as follows:

L_Edge = ||L(I_HR) − L(I_SR)||_1.    (11)

Finally, the total loss for the generator is given by

L_G = L_1 + αL_adv + βL_per + γL_Edge    (12)

where α, β, and γ are the weighting parameters of each component, which are designed to balance the magnitudes of the different losses. In our experiments, we set them to 0.1, 1, and 0.5.

For the discriminator, the loss widely used in other GAN-based methods is selected, which is calculated as

L_D = −log(D(I_HR)) − log(1 − D(I_SR)).    (13)

We train D by minimizing L_D, which allows it to distinguish whether the input image is synthesized by G.
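The complete training objective of (8)-(13), with the weights used in our experiments, can be sketched as follows. The VGG feature extractor and the Laplacian operator are supplied by the caller, so their exact configuration is an assumption here; the mean squared error stands in for the squared L2 norm of (10).

import torch
import torch.nn.functional as F

def generator_loss(i_sr, i_hr, d_fake, vgg_feats_sr, vgg_feats_hr, laplacian,
                   alpha=0.1, beta=1.0, gamma=0.5, layer_weights=None):
    # Total generator objective of (12); d_fake = D(G(I_LR)) is the discriminator
    # probability for the SR image, and vgg_feats_* are lists of VGG19 feature maps.
    l1 = F.l1_loss(i_sr, i_hr)                                   # (8)
    l_adv = -torch.log(d_fake + 1e-8).mean()                     # (9)
    if layer_weights is None:
        layer_weights = [1.0] * len(vgg_feats_sr)
    l_per = sum(w * F.mse_loss(fs, fh)                           # (10), MSE as a proxy
                for w, fs, fh in zip(layer_weights, vgg_feats_sr, vgg_feats_hr))
    l_edge = F.l1_loss(laplacian(i_hr), laplacian(i_sr))         # (11)
    return l1 + alpha * l_adv + beta * l_per + gamma * l_edge    # (12)

def discriminator_loss(d_real, d_fake):
    # (13): standard discriminator loss; d_real = D(I_HR), d_fake = D(I_SR).
    return -(torch.log(d_real + 1e-8) + torch.log(1.0 - d_fake + 1e-8)).mean()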

TABLE I
RESULTS OF COMPARISON WITH DIFFERENT METHODS ON THE UCMERCED DATASET

IV. EXPERIMENTS

A. Datasets and Evaluation Metrics

1) Datasets: In this work, we select four public remote sensing datasets, including UCMerced [52], AID [53], NWPU-RESISC45 [54], and LoveDA [55], for SR, classification, and semantic segmentation. The UCMerced dataset contains 21 scene categories.¹ Each class has 100 images with a spatial resolution of one foot, and the size of all images is 256 × 256. We randomly select 80% of them as the training set and the rest as the validation set. The AID dataset contains 10 000 images in 30 classes of scenes.² All images are 600 × 600 in size. For the AID dataset, we randomly select 80% of the images as the training set and ten images per class as the validation set. The NWPU-RESISC45 remote sensing dataset is a large-scale public dataset for remote sensing image scene classification released by Northwestern Polytechnical University, which contains 45 different scenes. It contains a total of 31 500 images, each with a size of 256 × 256. The LoveDA dataset contains 5987 high spatial resolution images with 166 768 annotated objects from three different cities. It encompasses two domains, urban and rural. Compared to existing datasets, the LoveDA dataset presents considerable challenges due to its MS objects and complex background samples.

¹All 21 classes of the UCMerced dataset: 1—Agricultural, 2—Airplane, 3—Baseballdiamond, 4—Beach, 5—Buildings, 6—Chaparral, 7—Denseresidential, 8—Forest, 9—Freeway, 10—Golfcourse, 11—Harbor, 12—Intersection, 13—Mediumresidential, 14—Mobilehomepark, 15—Overpass, 16—Parkinglot, 17—River, 18—Runway, 19—Sparseresidential, 20—Storagetanks, and 21—Tenniscourt.

²All 30 classes of the AID dataset: 1—Airport, 2—Bareland, 3—Baseballdiamond, 4—Beach, 5—Bridge, 6—Center, 7—Church, 8—Commercial, 9—Denseresidential, 10—Desert, 11—Farmland, 12—Forest, 13—Industrial, 14—Meadow, 15—Mediumresidential, 16—Mountain, 17—Park, 18—Parking, 19—Playground, 20—Pond, 21—Port, 22—Railwaystation, 23—Resort, 24—River, 25—School, 26—Sparseresidential, 27—Square, 28—Stadium, 29—Storagetanks, and 30—Viaduct.

2) Evaluation Metrics: To make a comprehensive comparison of model performance, we select three commonly used evaluation metrics that focus on different aspects. The first metric is PSNR, which is an objective criterion for evaluating images. However, a higher PSNR does not necessarily correspond to better perceptual quality. To better simulate human visual perception, Zhang et al. [56] proposed the learned perceptual image patch similarity (LPIPS), which measures the similarity of two images in a way that is more in line with human judgment. To validate the performance of models in real-world scenarios, we select the natural image quality evaluator (NIQE) as a no-reference metric. It should be noted that lower LPIPS and NIQE values represent better SR reconstruction. For the classification experiment, top-1 accuracy is used as the metric. In the semantic segmentation experiment, we select three general evaluation indicators: overall accuracy (OA), mean intersection over union (mIoU), and F1-score.
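For reference, PSNR and LPIPS can be computed as in the following sketch, which relies on the publicly available lpips package accompanying [56]; the [0, 1] input range and the AlexNet backbone are assumptions and should match how the images are actually stored.

import torch
import lpips  # pip install lpips

def psnr(sr: torch.Tensor, hr: torch.Tensor, max_val: float = 1.0) -> float:
    # Peak signal-to-noise ratio for images scaled to [0, max_val].
    mse = torch.mean((sr - hr) ** 2)
    return (10.0 * torch.log10(max_val ** 2 / mse)).item()

# LPIPS reference implementation of [56]; it expects N x 3 x H x W tensors in [-1, 1].
lpips_fn = lpips.LPIPS(net='alex')

def lpips_score(sr: torch.Tensor, hr: torch.Tensor) -> float:
    # For a single image pair; lower values indicate higher perceptual similarity.
    return lpips_fn(sr * 2 - 1, hr * 2 - 1).item()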
B. Implementation Details and Experimental Environment

For training, the input patches are cropped to a size of 128 × 128 pixels with a batch size of 16. Meanwhile, we use random rotation and horizontal flipping to augment the training samples. For optimization, we use the Adam optimizer with β1 = 0.9, β2 = 0.99, and ε = 10^−8. We train the PSRM in the IFE for 1600 epochs and fine-tune the whole CEEGAN for 400 epochs. The learning rate is initialized to 2 × 10^−4 and is halved every 400 epochs and every 100 epochs for the two stages, respectively. In our experiments, the raw images in each dataset are treated as real HR references, and the corresponding LR images are generated by Gaussian blur and bicubic interpolation to construct HR-LR pairs for training and evaluation. For the classification and semantic segmentation experiments, we first use the degraded images as the input of the different SR reconstruction methods; then, we evaluate the corresponding results with the same classification or semantic segmentation model. The proposed method is implemented with the PyTorch framework under Ubuntu 18.04 and CUDA 10.2. All experiments are run on NVIDIA 2080Ti GPUs.
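A minimal sketch of this HR-LR pair construction is shown below; the Gaussian kernel size and standard deviation are our own illustrative choices, as the exact blur parameters are not listed above.

import torch
import torch.nn.functional as F
from torchvision.transforms import functional as TF

def make_lr(hr: torch.Tensor, scale: int = 4, blur_sigma: float = 1.2) -> torch.Tensor:
    # Synthesizes an LR counterpart of an HR batch (N x C x H x W, values in [0, 1])
    # by Gaussian blurring followed by bicubic downsampling.
    blurred = TF.gaussian_blur(hr, kernel_size=7, sigma=blur_sigma)
    lr = F.interpolate(blurred, scale_factor=1.0 / scale, mode='bicubic', align_corners=False)
    return lr.clamp(0.0, 1.0)

# Example: a 256 x 256 HR patch becomes a 64 x 64 LR input for the x4 SR models.
# lr = make_lr(hr_batch, scale=4)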

TABLE II
RESULTS OF COMPARISON WITH DIFFERENT METHODS ON THE AID DATASET

C. Performance of Human Visual Perception and Machine Vision Applications

In this section, we compare the proposed method with state-of-the-art SR methods, including the PSNR-oriented approaches SRResNet [26], EDSR [27], RCAN [18], SwinIR [41], TransENet [42], and SR3 [43], and the GAN-based methods EEGAN [29], SRGAN [26], and ESRGAN [28]. EEGAN and TransENet were proposed for remote sensing images, and the other methods were proposed for natural images. For a fair comparison, we train these models using the same training datasets.

1) Quantitative Results: Different scenes have different learning difficulties, which tends to lead to large fluctuations in the metrics. To make a fair comparison, we calculate PSNR and LPIPS separately for each category and report the average scores. Tables I and II show the results of the different methods for ×4 upscaling on the UCMerced and AID datasets, respectively. From a macro perspective, the PSNR-oriented methods dominate in PSNR, and the GAN-based methods outperform them by a significant margin in terms of LPIPS. Because the high-frequency information added by the GAN aims to produce more realistic images, it may result in a decrease in PSNR. CEEGAN outperforms the compared methods in terms of LPIPS on both datasets, which means it can obtain results with better visual quality. This is because the EFEM and the edge loss, designed for edge enhancement in CEEGAN, make the model pay more attention to edge details when restoring images. Meanwhile, the ERB can fuse MS features so that refined details can be obtained on both large and small objects. The clearer the edge details are, the more realistic the image can be. Although EEGAN also focuses on improving image quality through edge enhancement, our method further incorporates edge domain constraints and context features to guide edge reconstruction, resulting in better performance in terms of LPIPS. It can also be observed that CEEGAN achieves the highest PSNR among the GAN-based methods on the UCMerced dataset.

2) Qualitative Comparison: To directly compare the reconstruction results obtained by different methods, Baseballdiamond38 and Building12 from the UCMerced dataset and Center253 and Viaduct242 from the AID dataset are selected to show the details of the SR images. As shown in Figs. 5 and 6, these SR images show that CEEGAN can obtain more realistic reconstruction results at the edges of objects, such as roads and buildings. To intuitively demonstrate the effectiveness of our method on edge enhancement, we compare the images before and after edge enhancement, their corresponding edge maps, and the difference between the two edges, as shown in Fig. 7. From Fig. 7(j), it can be seen that the differences between E* and L(I_P) are mainly focused on the object edges, which proves that the region of interest of CEEGAN is the edge. The addition of high-frequency information can make the image more realistic and comfortable for human vision.

Fig. 7. Comparison of images before and after edge enhancement is implemented. (a) Pre-SR image I_P. (b) Edge map of Pre-SR L(I_P). (c) Enhanced edge map E*. (d) SR image I_SR. (e) Edge map of SR L(I_SR). (f) GT image I_HR. (g) Edge map of GT L(I_HR). (h) Difference of (b) and (g), |L(I_HR) − L(I_P)|. (i) Difference of (c) and (g), |E* − L(I_HR)|. (j) Difference of (b) and (c), |E* − L(I_P)|.

TABLE III
CLASSIFICATION RESULTS ON UCMERCED AND NWPU-RESISC45 DATASETS

TABLE IV
SEMANTIC SEGMENTATION RESULTS ON LOVEDA DATASET

3) Classification Results on UCMerced and NWPU-RESISC45 Datasets: To assess the application performance of CEEGAN in real scenarios, we conduct image classification experiments on the UCMerced and NWPU-RESISC45 datasets. First, we train a ResNet-34 [44] image classification network on the original images of both datasets. Subsequently, we apply bicubic interpolation and Gaussian blur downsampling to the HR images of size 256 × 256 to generate LR images of size 64 × 64. We then utilize the various SR methods to obtain the corresponding reconstructed images. Finally, we input the reconstructed images into the image classification network to obtain the classification results. As shown in Table III, the classification task performs best on the SR images obtained by CEEGAN. Compared to SwinIR and SR3, which rank second in the evaluation, CEEGAN achieves significant improvements of 0.58% and 1.91% on the UCMerced and NWPU-RESISC45 datasets, respectively. Furthermore, it is worth noting that our method is capable of generating results that closely resemble HR images, indicating that our method can effectively preserve essential image features and produce highly realistic images. These advantages highlight the effectiveness and potential of our proposed method in practical applications.

4) Semantic Segmentation Results on LoveDA Dataset: To further verify the practicality of our method, we design a comparative experiment for the semantic segmentation task. The specific experimental process is similar to that of the image classification task. The downsampled images are reconstructed by the different SR models. Then, they are input into the DeepLabv3 [57] model to compare the segmentation results. It should be noted that the semantic segmentation models are trained on the LoveDA dataset, and the SR models are trained on the UCMerced dataset. Table IV lists the semantic segmentation evaluation results of the SR images obtained by the different SR models on the LoveDA dataset. Fig. 8 visualizes the semantic segmentation results. Our method achieves the best performance not only in the classification task but also in semantic segmentation, with the highest values in OA, F1-score, and mIoU. This can be attributed to the ability of the edge loss to constrain the edge map, allowing the generation of accurate high-frequency information, which enables CEEGAN to achieve results far beyond EEGAN in the semantic segmentation experiments. The results prove that CEEGAN can not only improve the model's understanding of the entire image but also improve the model's ability to understand images at the pixel scale.

Fig. 8. Visualization comparison of reconstruction results using different SR methods in the semantic segmentation experiments.

Fig. 9. Comparison of SR reconstruction results on real-world scene images using different methods. Red indicates the model that achieved the lowest NIQE.

5) Comparative Results of Real-World Scenario SR: In real-world scenarios, the degradation process of images may not align with the assumptions made in creating SR datasets. Therefore, we select the no-reference metric NIQE to evaluate the performance of the different models on a real-world scene image from the WorldView-3 satellite. Fig. 9 shows the SR images reconstructed by the different SR models along with their corresponding NIQE values. It can be observed that CEEGAN achieves the lowest NIQE value, indicating that CEEGAN can produce more natural results when faced with unknown degraded images.

6) Model Complexity Comparison: To compare the running efficiency of different models, we compare the parameters, floating point operations (FLOPs), runtimes, LPIPS, and classification accuracy on the UCMerced dataset. Although our primary objective is to optimize the performance of the SR model for machine vision applications in remote sensing images, the results presented in Table V indicate that our model can also offer advantages in terms of parameter numbers, FLOPs, and runtime. In addition, it outperforms some recent models for remote sensing image SR, such as TransENet and EEGAN, in terms of runtime. Moreover, compared to SR3, the state-of-the-art diffusion model in the field of SR, CEEGAN demonstrates advantages in terms of parameters, FLOPs, and runtimes. Although CEEGAN may exhibit a gap in model efficiency compared to earlier algorithms like ESRGAN, it achieves superior results in terms of LPIPS and practical applications by leveraging larger models and more advanced architectural designs. The comparative experimental results on model complexity confirm that CEEGAN acquires a better balance between complexity and accuracy and has more potential for practical applications in real remote sensing scenarios.

TABLE V
QUANTITATIVE COMPARISON OF PARAMETERS, FLOPS, RUNTIMES, LPIPS, AND CLASSIFICATION ACCURACY FOR DIFFERENT METHODS

TABLE VI
QUANTITATIVE COMPARISON OF DIFFERENT COMPONENTS IN EFEM

TABLE VII
QUANTITATIVE COMPARISON OF DIFFERENT COMPONENTS IN ERB

TABLE VIII
QUANTITATIVE COMPARISON OF DIFFERENT LOSS FUNCTIONS

D. Ablation Studies

In this section, we design a series of experiments on the UCMerced dataset to validate the effectiveness and necessity of each component in our proposed method. All models are trained with the same settings.

1) Effect of Each Part in EFEM: We first investigate the impact of each component in EFEM, including the BiEFFB, BiCFFB, and CEIEB. Table VI lists the evaluation results of the different settings. The result of case IV is 4.03% higher than the result of case II on LPIPS, which indicates the guiding role of the semantic branch on the edge branch. Case III adds the semantic branch to case II; instead of exchanging information between every pair of BiEFFB and BiCFFB, the exchange takes place once in the ERB. This achieves a 0.75% improvement over case II on LPIPS, which demonstrates the effectiveness of our proposed edge enhancement module based on context guidance. By comparing the results of cases III and IV, it can be concluded that the addition of CEIEB improves LPIPS by 3.30%, which indicates the necessity of semantic communication after each edge enhancement.

2) Effect of Each Part in ERB: We conduct an ablation study on the MS architecture and the CA mechanism in ERB, as shown in Table VII. Compared to case IV, case I removes the MS structure and the CA mechanism in ERB and only utilizes the edge features from the maximum scale as input for edge reconstruction, which leads to a decrease of 0.0307 in LPIPS. Case II adds the MS architecture to case I. However, due to the absence of CA, which helps the model understand the importance of different channels and coordinates, the reconstruction results only improve by 0.0025. Case III adds the CA mechanism to case I, while still using only the maximum-size features. Although it results in a 0.0197 improvement in LPIPS, single-scale features cannot fully represent the objects that exist in the image. Compared to the combined usage of MS and CA in case IV, there is still a 0.0110 gap in maximizing the utilization of effective image information for edge reconstruction.

3) Effect of GAN-Loss and Edge Loss: We verify the effect of the edge loss with and without EFEM. As listed in Table VIII, cases I-IV are different loss function combinations with EFEM and ERB, and cases V-VIII remove the whole edge enhancement process. The GAN-Loss includes the adversarial loss and the perceptual loss as presented in (9) and (10). The data from cases II and VI show that the addition of GAN-Loss causes a decrease in both PSNR and LPIPS, which is consistent with the purpose of adding GAN-Loss. The data from cases III and VII show that the addition of the edge loss also causes a decrease in PSNR and LPIPS without EFEM, while its effect is more limited compared to that of GAN-Loss. When EFEM is available, the addition of the edge loss can improve both PSNR and LPIPS. This result indicates that the edge loss can work together with EFEM to recover more realistic image edge details in the absence of GAN-Loss. However, since the edge loss is still essentially an L1 loss, the network is not optimized in the direction of conforming to human visual habits. Cases IV and VIII demonstrate that the combined use of both the edge loss and GAN-Loss makes the model optimal in LPIPS with or without EFEM. This also implies that, under the constraints of these two loss functions, the network can conform to human visual habits and be suitable for practical machine vision applications by learning how to incorporate high-frequency information that is difficult to recover with the L1 loss during the reconstruction process.

TABLE IX
QUANTITATIVE COMPARISON OF DIFFERENT NUMBERS OF EFEMS

Fig. 10. Performance of LPIPS with different numbers of EFEMs.

4) Number of EFEMs: To examine how the number of EFEMs affects the edge reconstruction and the performance of the model, ablation experiments are carried out, and the results are shown in Table IX. Our experimental results show that increasing the number of EFEMs can improve the model performance. As shown in Fig. 10, the benefit of adding the same number of EFEMs gradually decreases, while the increase in parameter count is fixed. Although EFEM can improve the model performance, its excessive use may increase model complexity and result in a potential loss of effectiveness, leading to overfitting and lengthier training times. Therefore, we decided to limit the number of EFEMs used in practice to three, as this strikes a balance between model performance and complexity.

V. CONCLUSION

In this work, a context aware edge-enhanced GAN is proposed for remote sensing SR. To address the limitations of existing edge reconstruction methods, we propose a new edge enhancement method that simultaneously exploits contextual semantic features with edge features and maintains information exchange during the gradual enhancement process. To maintain the accuracy of the high-frequency information, the edge loss function is designed to constrain the generated edges in the edge domain.

We compare the SR results on the UCMerced and AID datasets with current advanced methods and achieve the best results on LPIPS. The ablation experiments prove the effectiveness of each part of CEEGAN and of the edge loss function. To further demonstrate the suitability of our results for practical remote sensing applications, classification and segmentation experiments are conducted on the UCMerced and LoveDA datasets, respectively. CEEGAN achieves the best results on multiple evaluation metrics, which proves that our method is more suitable for the practical application of remote sensing image SR.

REFERENCES

[1] G. Cheng, J. Han, P. Zhou, and D. Xu, "Learning rotation-invariant and fisher discriminative convolutional neural networks for object detection," IEEE Trans. Image Process., vol. 28, no. 1, pp. 265-278, Jan. 2019.
[2] R. Dian, A. Guo, and S. Li, "Zero-shot hyperspectral sharpening," IEEE Trans. Pattern Anal. Mach. Intell., vol. 45, no. 10, pp. 12650-12666, Oct. 2023.
[3] R. Dian, T. Shan, W. He, and H. Liu, "Spectral super-resolution via model-guided cross-fusion network," IEEE Trans. Neural Netw. Learn. Syst., to be published, doi: 10.1109/TNNLS.2023.3238506.
[4] L. He, W. Zhang, J. Shi, and F. Li, "Cross-domain association mining based generative adversarial network for pansharpening," IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 15, pp. 7770-7783, 2022.
[5] X. Guan, F. Li, X. Zhang, M. Ma, and S. Mei, "Assessing full-resolution pansharpening quality: A comparative study of methods and measurements," IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 16, pp. 6860-6875, Jul. 2023.
[6] S. Mei et al., "Lightweight multiresolution feature fusion network for spectral super-resolution," IEEE Trans. Geosci. Remote Sens., vol. 61, Jan. 2023, Art. no. 5501414.
[7] S. Mei, X. Li, X. Liu, H. Cai, and Q. Du, "Hyperspectral image classification using attention-based bidirectional long short-term memory network," IEEE Trans. Geosci. Remote Sens., vol. 60, Aug. 2021, Art. no. 5509612.
[8] S. Mei, C. Song, M. Ma, and F. Xu, "Hyperspectral image classification using group-aware hierarchical transformer," IEEE Trans. Geosci. Remote Sens., vol. 60, Sep. 2022, Art. no. 5539014.
[9] G.-S. Xia et al., "DOTA: A large-scale dataset for object detection in aerial images," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2018, pp. 3974-3983.
[10] J. Wang, F. Li, and H. Bi, "Gaussian focal loss: Learning distribution polarized angle prediction for rotated object detection in aerial images," IEEE Trans. Geosci. Remote Sens., vol. 60, May 2022, Art. no. 4707013.
[11] S. Ji, S. Wei, and M. Lu, "Fully convolutional networks for multisource building extraction from an open aerial and satellite imagery data set," IEEE Trans. Geosci. Remote Sens., vol. 57, no. 1, pp. 574-586, Jan. 2019.
[12] S. Saha, F. Bovolo, and L. Bruzzone, "Unsupervised deep change vector analysis for multiple-change detection in VHR images," IEEE Trans. Geosci. Remote Sens., vol. 57, no. 6, pp. 3677-3693, Jun. 2019.
[13] M. Liu, Q. Shi, A. Marinoni, D. He, X. Liu, and L. Zhang, "Super-resolution-based change detection network with stacked attention module for images with different resolutions," IEEE Trans. Geosci. Remote Sens., vol. 60, Jul. 2022, Art. no. 4403718.
[14] R. Keys, "Cubic convolution interpolation for digital image processing," IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-29, no. 6, pp. 1153-1160, Dec. 1981.
[15] R. Schultz and R. Stevenson, "A Bayesian approach to image expansion for improved definition," IEEE Trans. Image Process., vol. 3, no. 3, pp. 233-242, May 1994.
[16] C.-Y. Yang and M.-H. Yang, "Fast direct super-resolution by simple functions," in Proc. IEEE Int. Conf. Comput. Vis., 2013, pp. 561-568.
[17] C. Dong, C. C. Loy, K. He, and X. Tang, "Image super-resolution using deep convolutional networks," IEEE Trans. Pattern Anal. Mach. Intell., vol. 38, no. 2, pp. 295-307, Feb. 2016.
[18] Y. Zhang, K. Li, K. Li, L. Wang, B. Zhong, and Y. Fu, "Image super-resolution using very deep residual channel attention networks," in Proc. Eur. Conf. Comput. Vis., 2018, pp. 286-301.
[19] S. Lei, Z. Shi, and Z. Zou, "Super-resolution for remote sensing images via local-global combined network," IEEE Geosci. Remote Sens. Lett., vol. 14, no. 8, pp. 1243-1247, Aug. 2017.
[20] D. Zhang, J. Shao, X. Li, and H. T. Shen, "Remote sensing image super-resolution via mixed high-order attention network," IEEE Trans. Geosci. Remote Sens., vol. 59, no. 6, pp. 5183-5196, Jun. 2021.
[21] S. Zhang, Q. Yuan, J. Li, J. Sun, and X. Zhang, "Scene-adaptive remote sensing image super-resolution using a multiscale attention network," IEEE Trans. Geosci. Remote Sens., vol. 58, no. 7, pp. 4764-4779, Jul. 2020.
[22] X. Dong, L. Wang, X. Sun, X. Jia, L. Gao, and B. Zhang, "Remote sensing image super-resolution using second-order multi-scale networks," IEEE Trans. Geosci. Remote Sens., vol. 59, no. 4, pp. 3473-3485, Apr. 2021.
[23] S. Lei and Z. Shi, "Hybrid-scale self-similarity exploitation for remote sensing image super-resolution," IEEE Trans. Geosci. Remote Sens., vol. 60, Apr. 2022, Art. no. 5401410.
[24] Y. Xiao, Q. Yuan, K. Jiang, J. He, Y. Wang, and L. Zhang, "From degrade to upgrade: Learning a self-supervised degradation guided adaptive network for blind remote sensing image super-resolution," Inf. Fusion, vol. 96, pp. 297-311, Aug. 2023.
[25] J. Feng et al., "A deep multitask convolutional neural network for remote sensing image super-resolution and colorization," IEEE Trans. Geosci. Remote Sens., vol. 60, Feb. 2022, Art. no. 5407915.

[26] C. Ledig et al., "Photo-realistic single image super-resolution using a generative adversarial network," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2017, pp. 4681-4690.
[27] B. Lim, S. Son, H. Kim, S. Nah, and K. Mu Lee, "Enhanced deep residual networks for single image super-resolution," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. Workshops, 2017, pp. 136-144.
[28] X. Wang et al., "ESRGAN: Enhanced super-resolution generative adversarial networks," in Proc. Eur. Conf. Comput. Vis. Workshops, 2018, pp. 63-79.
[29] K. Jiang, Z. Wang, P. Yi, G. Wang, T. Lu, and J. Jiang, "Edge-enhanced GAN for remote sensing image superresolution," IEEE Trans. Geosci. Remote Sens., vol. 57, no. 8, pp. 5799-5812, Aug. 2019.
[30] S. Jia, Z. Wang, Q. Li, X. Jia, and M. Xu, "Multiattention generative adversarial network for remote sensing image super-resolution," IEEE Trans. Geosci. Remote Sens., vol. 60, Jun. 2022, Art. no. 5624715.
[31] R. Dong, L. Zhang, and H. Fu, "RRSGAN: Reference-based super-resolution for remote sensing image," IEEE Trans. Geosci. Remote Sens., vol. 60, Jan. 2022, Art. no. 5601117.
[32] M. S. Moustafa and S. A. Sayed, "Satellite imagery super-resolution using squeeze-and-excitation-based GAN," Int. J. Aeronautical Space Sci., vol. 22, no. 6, pp. 1481-1492, Dec. 2021.
[33] K. Jiang, Z. Wang, P. Yi, J. Jiang, J. Xiao, and Y. Yao, "Deep distillation recursive network for remote sensing imagery super-resolution," Remote Sens., vol. 10, no. 11, Nov. 2018, Art. no. 1700.
[34] Y. Xiao, X. Su, Q. Yuan, D. Liu, H. Shen, and L. Zhang, "Satellite video super-resolution via multiscale deformable convolution alignment and temporal grouping projection," IEEE Trans. Geosci. Remote Sens., vol. 60, Sep. 2022, Art. no. 5610819.
[35] P. Yi, Z. Wang, K. Jiang, J. Jiang, T. Lu, and J. Ma, "A progressive fusion generative adversarial network for realistic and consistent video super-resolution," IEEE Trans. Pattern Anal. Mach. Intell., vol. 44, no. 5, pp. 2264-2280, May 2022.
[36] Z. Wang, K. Jiang, P. Yi, Z. Han, and Z. He, "Ultra-dense GAN for satellite imagery super-resolution," Neurocomputing, vol. 398, pp. 328-337, Jul. 2020.
[37] I. J. Goodfellow et al., "Generative adversarial nets," in Proc. Int. Conf. Neural Inf. Process. Syst., 2014.
[38] A. Dosovitskiy et al., "An image is worth 16x16 words: Transformers for image recognition at scale," in Proc. 9th Int. Conf. Learn. Representations, 2021.
[39] N. Carion, F. Massa, G. Synnaeve, N. Usunier, A. Kirillov, and S. Zagoruyko, "End-to-end object detection with transformers," in Proc. Eur. Conf. Comput. Vis., 2020, pp. 213-229.
[40] Z. Liu et al., "Swin transformer: Hierarchical vision transformer using shifted windows," in Proc. IEEE/CVF Int. Conf. Comput. Vis., 2021, pp. 9992-10002.
[41] J. Liang, J. Cao, G. Sun, K. Zhang, L. Van Gool, and R. Timofte, "SwinIR: Image restoration using Swin transformer," in Proc. IEEE/CVF Int. Conf. Comput. Vis. Workshops, 2021, pp. 1833-1844.
[42] S. Lei, Z. Shi, and W. Mo, "Transformer-based multistage enhancement for remote sensing image super-resolution," IEEE Trans. Geosci. Remote Sens., vol. 60, Dec. 2022, Art. no. 5615611.
[43] C. Saharia, J. Ho, W. Chan, T. Salimans, D. J. Fleet, and M. Norouzi, "Image super-resolution via iterative refinement," IEEE Trans. Pattern Anal. Mach. Intell., vol. 45, no. 4, pp. 4713-4726, Apr. 2022.
[44] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2016, pp. 770-778.
[45] H. Chen et al., "Pre-trained image processing transformer," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2021, pp. 12299-12310.
[46] S. Lei, Z. Shi, and Z. Zou, "Coupled adversarial training for remote sensing image super-resolution," IEEE Trans. Geosci. Remote Sens., vol. 58, no. 5, pp. 3633-3643, May 2020.
[47] J. Canny, "A computational approach to edge detection," IEEE Trans. Pattern Anal. Mach. Intell., vol. PAMI-8, no. 6, pp. 679-698, Nov. 1986.
[48] S. Xie and Z. Tu, "Holistically-nested edge detection," in Proc. IEEE Int. Conf. Comput. Vis., 2015, pp. 1395-1403.
[49] M. Pu, Y. Huang, Y. Liu, Q. Guan, and H. Ling, "EDTER: Edge detection with transformer," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2022, pp. 1392-1402.
[50] Q. Hou, D. Zhou, and J. Feng, "Coordinate attention for efficient mobile network design," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2021, pp. 13713-13722.
[51] J. Johnson, A. Alahi, and L. Fei-Fei, "Perceptual losses for real-time style transfer and super-resolution," in Proc. Eur. Conf. Comput. Vis., 2016, pp. 694-711.
[52] Y. Yang and S. Newsam, "Bag-of-visual-words and spatial extensions for land-use classification," in Proc. 18th SIGSPATIAL Int. Conf. Adv. Geographic Inf. Syst., 2010, pp. 270-279.
[53] G.-S. Xia et al., "AID: A benchmark data set for performance evaluation of aerial scene classification," IEEE Trans. Geosci. Remote Sens., vol. 55, no. 7, pp. 3965-3981, Jul. 2017.
[54] G. Cheng, J. Han, and X. Lu, "Remote sensing image scene classification: Benchmark and state of the art," Proc. IEEE, vol. 105, no. 10, pp. 1865-1883, Oct. 2017.
[55] J. Wang, Z. Zheng, A. Ma, X. Lu, and Y. Zhong, "LoveDA: A remote sensing land-cover dataset for domain adaptive semantic segmentation," in Proc. Neural Inf. Process. Syst. Track Datasets Benchmarks, 2021.
[56] R. Zhang, P. Isola, A. A. Efros, E. Shechtman, and O. Wang, "The unreasonable effectiveness of deep features as a perceptual metric," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2018, pp. 586-595.
[57] L.-C. Chen, G. Papandreou, F. Schroff, and H. Adam, "Rethinking atrous convolution for semantic image segmentation," 2017, arXiv:1706.05587.

Zhihan Ren received the B.S. degree in information engineering from the College of Communication Engineering, Jilin University, Changchun, China, in 2023. He is currently working toward the M.S. degree in information and communication engineering with the School of Information and Communications Engineering, Xi'an Jiaotong University, Xi'an, China. His research interests include deep learning and remote sensing image understanding and processing.

Lijun He received the B.S. degree in information engineering and the Ph.D. degree in information and communications engineering from the School of Information and Communications Engineering, Xi'an Jiaotong University, Xi'an, China, in 2008 and 2016, respectively. She is currently an Associate Professor with the School of Information and Communications Engineering, Xi'an Jiaotong University. Her research interests include video communication and transmission, video analysis, processing, and compression techniques.

Jichuan Lu received the master's degree in communication and information system from the School of Communication and Information Engineering, Xidian University, Xi'an, China, in 2011. He is currently a Senior Expert with China Mobile Communications Group, Xi'an, China, mainly engaged in research in the fields of cloud computing and AI.

You might also like