Context Aware Edge-Enhanced GAN For Remote Sensing Image Super-Resolution

Abstract—Remote sensing images are essential in many fields, such as land cover classification and building extraction. The large gap between directly acquired remote sensing images and the actual scene, caused by the complex degradation process and hardware limitations, seriously degrades the performance of a given classification or segmentation model. Therefore, using super-resolution (SR) algorithms to improve image quality is an effective way to achieve better results. However, current SR methods focus only on the similarity of pixel values between SR and high-resolution (HR) images without considering perceptual similarity, which usually leads to oversmoothed images and blurred edge details. Moreover, little attention has been paid to human visual habits and machine vision applications for remote sensing images. In this work, we propose the context aware edge-enhanced generative adversarial network (CEEGAN) SR framework to reconstruct visually pleasing images that can be practically applied in real scenarios. In the generator of CEEGAN, we build an edge feature enhanced module (EFEM) to enhance edges by combining edge features with context information. An edge restoration block (ERB) is designed to fuse the multiscale edge features enhanced by EFEMs and reconstruct a refined edge map. Furthermore, we design an edge loss function to constrain the similarity between the generated SR and HR images in the edge domain. Experimental results show that our proposed method obtains SR images with better reconstruction performance. Meanwhile, CEEGAN achieves the best results on classification and semantic segmentation datasets for machine vision applications.

Index Terms—Edge enhancement, generative adversarial network (GAN), remote sensing images, super-resolution (SR).

Manuscript received 12 October 2023; revised 7 November 2023; accepted 10 November 2023. Date of publication 15 November 2023; date of current version 14 December 2023. This work was supported in part by the National Key R&D Program of China under Grant 2022ZD0115802, in part by the Central Guidance on Local Science and the Technology Development Fund under Grant 2022ZY1-CGZY-01HZ01, in part by the Natural Science Foundation of Sichuan Province under Grant 2022NSFSC0966, and in part by the Key R&D Project in Shaanxi Province under Grant 2023-ZDLNY-65. (Corresponding author: Lijun He.)

Zhihan Ren is with the Shaanxi Key Laboratory of Deep Space Exploration Intelligent Information Technology, School of Information and Communications Engineering, Xi'an Jiaotong University, Xi'an 710049, China (e-mail: [email protected]).

Lijun He is with the Shaanxi Key Laboratory of Deep Space Exploration Intelligent Information Technology, School of Information and Communications Engineering, Xi'an Jiaotong University, Xi'an 710049, China, and also with the Sichuan Digital Economy Industry Development Research Institute, Chengdu, Sichuan 610036, China (e-mail: [email protected]).

Jichuan Lu is with the China Mobile Communications Group Shaanxi Company Ltd., Xi'an 710077, China (e-mail: [email protected]).

Digital Object Identifier 10.1109/JSTARS.2023.3333271

© 2023 The Authors. This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4.0/

I. INTRODUCTION

SUPER-RESOLUTION (SR) is the process of restoring a high-resolution (HR) image from a given low-resolution (LR) image. With the development of remote sensing image application technology, remote sensing images are extensively used in hyperspectral applications [1], [2], [3], [4], [5], [6], [7], [8], object detection [9], [10], change detection [11], [12], [13], and other fields. However, image SR is an ill-posed problem because one LR image may degenerate from several different HR images; the goal of an SR algorithm is to find the optimal HR image among all possible solutions. The quality of remote sensing images is limited by hardware equipment and natural environment interference, such as clouds and fog, which leads to serious degradation of the acquired remote sensing images so that they no longer match the actual scene. However, natural image SR algorithms neglect these factors, leading to unsatisfactory results on remote sensing images. Specifically, incorrect edges can prevent semantic segmentation models from distinguishing pixels at object boundaries, or prevent image classification models from correctly classifying images. Therefore, the development of a remote sensing image SR algorithm that can accommodate both human visual perception and machine vision applications is crucial.

Over the past few decades, SR has attracted great attention from researchers, and many SR methods have emerged. Existing methods can be divided into two main categories: 1) traditional methods and 2) deep learning-based methods. Traditional methods can be further categorized into interpolation methods and reconstruction methods. Interpolation methods apply the same kernel regardless of the position of the pixel. Interpolation methods, such as nearest neighbor interpolation, bilinear interpolation, and bicubic interpolation [14], can improve the image resolution based on the content of the image itself but cannot provide more information. Moreover, they may cause edge blurring and fail to achieve the desired visual quality. The reconstruction-based methods [15], [16] solve the problem from the perspective of image degradation models, assuming that the LR images are obtained from the HR images through appropriate motion transformation, blurring, and noise. These methods constrain the generation of SR images by extracting information from LR images and combining it with prior knowledge. However, reconstruction-based methods suffer from complex optimization procedures and high computational costs.

With the development of deep learning, SR algorithms based on convolutional neural networks (CNNs) have been proposed [17]. Based on this, a large amount of research [18], [19], [20], [21], [22], [23], [24], [25] has been devoted to optimizing networks by using L1 or L2 loss functions. These methods
use CNN to learn the implicit mapping between pairs of LR and HR images and then predict the HR images corresponding to the LR images based on the learned mapping relationship. However, these methods select PSNR as the evaluation metric, and the images they generate with high PSNR values may not align with human visual perception. In addition to the PSNR-oriented methods, there is another type of method [26], [27], [28], [29], [30], [31], [32], [33], [34], [35], [36] that introduces the generative adversarial network (GAN) [37] into the field of SR, aiming to generate more realistic images. Although the images generated by GAN-based methods are more realistic than those of PSNR-oriented methods, the high-frequency details they learn may not be consistent with the actual scene due to the lack of constraints and the severe degradation of remote sensing images. Jiang et al. [29] noticed that existing methods cannot generate correct edges and proposed a denoising approach through mask processing. However, this method solely focused on extracting and utilizing features from the edge domain, neglecting the valuable contextual semantic information present in remote sensing images. Moreover, it does not impose any constraints on the enhanced edges, resulting in suboptimal performance. Recently, the transformer and the diffusion model have been widely used in the field of computer vision [38], [39], [40], and some works have extended them to SR. SwinIR [41] proposed an image reconstruction model based on the Swin Transformer [40]. The transformer-based enhancement network (TransENet) [42] proposed a multistage enhancement structure based on the transformer. Inspired by the denoising diffusion probabilistic model, image super-resolution via iterative refinement (SR3) [43] performs SR through a random iterative denoising process.

In general, there are still some limitations of the existing algorithms.
1) The images generated by PSNR-oriented methods may not align with human visual perception since these methods ignore perceptual similarity, which results in oversmoothed images.
2) The practical applications of GAN-based methods may be limited due to their susceptibility to noise and their inaccuracies in generating essential high-frequency information, especially in the case of remote sensing images with complex degradation processes.

To address the above problems, we propose an SR framework that exploits joint context semantic information for edge enhancement of remote sensing images, named the context aware edge-enhanced generative adversarial network (CEEGAN). The main contributions of this work are summarized as follows.
1) An SR framework for both human vision and machine analysis: Existing SR frameworks are primarily focused on restoration at the pixel level, and a high PSNR may not satisfy the requirements of high-level computer vision tasks in practical scenarios. Remote sensing images usually contain a multitude of objects with different scales, shapes, and complex spatial relationships, and therefore possess highly rich contextual semantic features. However, in the complex degradation process of remote sensing images, the contextual semantic features can be severely affected. CEEGAN performs information exchange between contextual semantic features and edge features and integrates contextual features into the image, aiming to restore as much texture information as possible in the generated SR images. Both types of features complement each other to satisfy the needs of both human and machine vision.
2) An edge enhancement module based on context guidance: To enhance images with blurred edge details, we explore and integrate multiscale (MS) edge features and context semantic features in the edge feature enhanced module (EFEM). The high-level context semantic features are beneficial for guiding the extraction and enhancement of edge features. Therefore, we keep the information interaction between the context semantic and edge branches to help the edge branch understand the high-level semantic information in the image. Moreover, we fuse the MS features enhanced by EFEMs and reconstruct the edge map through the edge restoration block (ERB).
3) An edge loss function to generate more realistic images: Because the L1 and L2 losses neglect edges, which leads to oversmoothed images, and the GAN loss is sensitive to noise, we expand the attention of SR models from the pixel domain to the edge domain by designing an edge loss function that constrains the SR and HR images in the edge domain, making the generated images more realistic and preserving important edge details.

The rest of this article is organized as follows. Section II summarizes the related works. Section III introduces the details of CEEGAN. The experimental results are given in Section IV. Finally, Section V concludes this article.

II. RELATED WORK

A. PSNR-Oriented SR Models

CNN was introduced to SR by Dong et al. [17], who pioneered the super-resolution convolutional neural network (SRCNN) with three convolutional layers. SRCNN utilized the L2 loss function to optimize the PSNR, aiming to surpass traditional methods through the strong nonlinear fitting capability of CNNs. With the introduction of ResNet [44], SRResNet [26] brought the residual network into SR to combine low-level and deep-level features and improve the network's learning ability. To enhance the performance of the network, Lim et al. [27] proposed the enhanced deep residual network (EDSR), which removed the BN layers and increased the model size without increasing computational resources. Furthermore, Zhang et al. [18] proposed the residual channel attention network (RCAN) with a channel attention mechanism, which can adaptively rescale channel features. With the development of the transformer, Chen et al. [45] constructed the first transformer model for image SR, and Liang et al. [41] proposed SwinIR, an image reconstruction model based on the Swin Transformer [40].

Remote sensing images are more complex than natural images in terms of the degradation process and contain objects of varying sizes and shapes, which makes SR of remote sensing
images more challenging. Therefore, some works proposed SR algorithms designed for the unique characteristics and challenges of remote sensing images. Lei et al. [19] designed a local-global combined network, which combined local and global information by cascading shallow and deep feature mappings. To further integrate information from different depths, Zhang et al. [20] proposed the mixed high-order attention network, which makes full use of hierarchical features through high-order attention. Some works addressed remote sensing image SR from the direction of MS features. The MS attention network [21] employed convolutions with different kernel sizes to extract MS features of remote sensing images and used a channel attention mechanism to fuse features at different scales. Dong et al. [22] proposed a second-order MS super-resolution network to maximize the use of learned MS information by exploiting small-difference and large-difference features at the local and global levels, respectively. The hybrid-scale self-similarity exploitation network (HSENet) [23] used self-similarity to learn internal recurrence patterns within a single scale and across scales in remote sensing images, achieving stronger feature representation. In contrast to the abovementioned methods, TransENet [42] proposed a transformer-based MS enhancement structure to fuse high- and low-dimensional MS features.

B. GAN-Based SR Models

Ledig et al. [26] first applied GAN to image SR reconstruction and proposed SRGAN, which added perceptual loss and adversarial loss to make the generated images more realistic. To fully leverage the features across different layers, the ultradense GAN (udGAN) [36] proposed the ultradense residual block, which reforms the internal layout of the residual block into a two-dimensional matrix topology. To improve the judgment of discriminators in GAN, the enhanced SRGAN (ESRGAN) [28] proposed a relativistic GAN that allows the discriminator to predict relative realness instead of an absolute value and used network interpolation to balance the conflict between objective and subjective evaluation metrics. Similarly, coupled-discriminated GANs [46] proposed a discriminator that makes judgments based on both SR and HR images to better distinguish the input. To solve the problem of unclear edges in generated images, the edge-enhanced GAN (EEGAN) [29] proposed an edge enhancement strategy that purifies noisy images to generate precise edges and enhance image contours. The multiattention GAN [30] proposed branch attention to integrate upsampled LR images with high-level features. Different from the PSNR-oriented SR methods, which pursue high PSNR values, the GAN-based methods prefer to generate more realistic images. However, GAN-based methods are sensitive to noise, which leads the generator to add incorrect high-frequency details to SR images. This shortcoming limits the performance of SR images in subsequent high-level computer vision tasks.

Fig. 1. Overall structure of the CEEGAN. Edge extraction denotes the Laplacian operator. The blue line indicates the network flow of CEEGAN. The orange line indicates the calculation process of the loss function of CEEGAN. SRLoss contains L1 loss, adversarial loss, and perceptual loss.

III. METHOD

A. Overview of the Proposed CEEGAN

Fig. 1 illustrates the overall framework of CEEGAN. The generator of CEEGAN can be divided into the following three parts: initial feature extraction (IFE), EFEM, and ERB.

The original LR image $I_{LR} \in \mathbb{R}^{H \times W \times 3}$, where $H$ and $W$ denote the height and width of the image, respectively, is first processed by the presuper-resolution module (PSRM) to generate a Pre-SR image $I_P \in \mathbb{R}^{4H \times 4W \times 3}$, which reconstructs most regions of the LR image except for the edges. The Pre-SR image is simultaneously fed into the edge and context semantic feature extraction layers to obtain the corresponding features. In the
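Putting this overview together, a minimal schematic sketch of the generator data flow is given below. The module bodies are simple placeholders, the MS structure is collapsed to a single scale, and the final fusion of $I_P$ with the refined edge map $E^*$ as a residual addition is our assumption; the sketch illustrates the described flow rather than the exact CEEGAN architecture.

```python
import torch
import torch.nn as nn

# Schematic sketch of the CEEGAN generator data flow (not the exact model):
# PSRM -> parallel edge/context feature extraction -> EFEM cascade -> ERB.
class CEEGANGeneratorSketch(nn.Module):
    def __init__(self, num_efem=3):
        super().__init__()
        self.psrm = nn.Upsample(scale_factor=4, mode="bicubic")  # stand-in for PSRM
        self.edge_feat = nn.Conv2d(3, 64, 3, padding=1)          # edge branch
        self.context_feat = nn.Conv2d(3, 64, 3, padding=1)       # context semantic branch
        # Each EFEM mixes edge and context features (placeholder conv).
        self.efems = nn.ModuleList(
            [nn.Conv2d(128, 64, 3, padding=1) for _ in range(num_efem)]
        )
        # ERB fuses the EFEM-enhanced features into a refined edge map E*.
        self.erb = nn.Conv2d(64 * num_efem, 3, 3, padding=1)

    def forward(self, lr):
        pre_sr = self.psrm(lr)                               # Pre-SR image I_P
        f_e = self.edge_feat(pre_sr)
        f_c = self.context_feat(pre_sr)
        enhanced = []
        for efem in self.efems:                              # context-guided enhancement
            f_e = efem(torch.cat([f_e, f_c], dim=1))
            enhanced.append(f_e)
        edge_map = self.erb(torch.cat(enhanced, dim=1))      # refined edge map E*
        return pre_sr + edge_map                             # assumed fusion of I_P and E*
```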
features, which can guide the enhancement of edge features. Moreover, objects in remote sensing images usually cover a large span in the scale dimension, which makes it challenging for a single-scale feature to capture all the information needed for SR reconstruction. Therefore, the proposed convolutional layer can extract MS features, which enables it to adapt to various object sizes in remote sensing images.

For the edge extraction, there are several methods, such as Canny [47], HED [48], and EDTER [49]. Although an edge extraction algorithm can obtain accurate contours of objects, it cannot

$$= \mathrm{LReLU}\left(\mathrm{Conv}\left(\frac{w_1^{i+1} \cdot \hat{f}_{e1}^{\,i} + w_2^{i+1} \cdot \mathrm{Resize}\left(\hat{f}_{e2}^{\,i+1}\right)}{w_1^{i+1} + w_2^{i+1} + \epsilon}\right)\right) \tag{2}$$

where $\mathrm{LReLU}(\cdot)$ indicates the LeakyReLU activation function, $\mathrm{Resize}(\cdot)$ represents the use of interpolation operations to make the dimensions consistent, and $w_1^{i+1}$ and $w_2^{i+1}$ are two learnable weights that refer to the importance of each feature. The superscript $i+1$ indicates that the variable represents the weight for
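To illustrate (2), a minimal sketch of this normalized, learnable fusion is given below, assuming the truncated last term of the denominator is a small constant $\epsilon$ for numerical stability; the module name and channel width are ours, not the paper's.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Minimal sketch of the weighted fusion in (2): two edge features are combined
# with learnable importance weights w1, w2, normalized by (w1 + w2 + eps),
# then passed through Conv and LeakyReLU.
class WeightedEdgeFusion(nn.Module):
    def __init__(self, channels=64, eps=1e-4):
        super().__init__()
        self.w1 = nn.Parameter(torch.ones(1))  # learnable importance of f_e1^i
        self.w2 = nn.Parameter(torch.ones(1))  # learnable importance of f_e2^(i+1)
        self.eps = eps
        self.conv = nn.Conv2d(channels, channels, 3, padding=1)
        self.act = nn.LeakyReLU(0.2)

    def forward(self, f_e1, f_e2):
        # Resize(.): interpolate f_e2 to match f_e1's spatial dimensions.
        f_e2 = F.interpolate(f_e2, size=f_e1.shape[-2:], mode="bilinear",
                             align_corners=False)
        fused = (self.w1 * f_e1 + self.w2 * f_e2) / (self.w1 + self.w2 + self.eps)
        return self.act(self.conv(fused))
```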
Fig. 4. Architecture of the ERB. R2 and R4 denote the 2× upscale and 4× upscale interpolation operation, respectively. ⊗ denotes the element-wise multiplication
operation.
Fig. 5. Visualization comparison of different methods on UCMerced dataset. (a) Baseballdiamond38. (b) Building12. The GT images are zoomed in and displayed
in the top-left corner.
where $I_{HR}$ refers to the ground-truth image. The L1 loss is beneficial for optimizing PSNR, but the network will tend to output smooth results without sufficient high-frequency detail, because the L1 loss and the PSNR metric are fundamentally inconsistent with the subjective evaluation of human observers. GAN-based methods typically use the sum of pixel loss and adversarial loss. In addition, to ensure the generation of high-frequency details, it is also necessary to utilize a perceptual loss [51], which is usually computed with a pretrained VGG network. The adversarial loss $L_{adv}$ and perceptual loss $L_{per}$ are defined as

$$L_{adv} = -\log\left(D\left(I_{SR}\right)\right) = -\log\left(D\left(G\left(I_{LR}\right)\right)\right) \tag{9}$$

$$L_{per} = \sum_{l=1}^{n} \omega_l \left\|\phi_l\left(I_{HR}\right) - \phi_l\left(I_{SR}\right)\right\|_2^2 \tag{10}$$

where $\phi_l(\cdot)$ represents the $l$th layer of the VGG19 network, $\omega_l$ refers to the weight of the $l$th layer, and $G(\cdot)$ and $D(\cdot)$ refer to the generator and the discriminator, respectively.

Although the combination of the abovementioned loss functions performs well on natural images, it suffers from the severe image degradation found in remote sensing images, which results in reconstructed images that still contain noise and artifacts in the high-frequency details. We solve this problem by strengthening the constraint on the high-frequency component of the image: we design the edge loss $L_{Edge}$, which is formulated as

$$L_{Edge} = \left\|L\left(I_{HR}\right) - L\left(I_{SR}\right)\right\|_1 \tag{11}$$

where $L(\cdot)$ denotes the Laplacian-based edge extraction operation (see Fig. 1). Finally, the total loss for the generator is given by

$$L_G = L_1 + \alpha L_{adv} + \beta L_{per} + \gamma L_{Edge} \tag{12}$$

where $\alpha$, $\beta$, and $\gamma$ are weighting parameters designed to balance the magnitudes of the different loss components. In our experiments, we set them to 0.1, 1, and 0.5, respectively.
Fig. 6. Visualization comparison of different methods on AID dataset. (a) Center253. (b) Viaduct242.
For the discriminator, we adopt the loss widely used in other GAN-based methods, which is calculated as

$$L_D = -\log\left(D\left(I_{HR}\right)\right) - \log\left(1 - D\left(I_{SR}\right)\right). \tag{13}$$

We train $D$ by minimizing $L_D$, which allows it to distinguish whether the input image is synthesized by $G$.
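For concreteness, a sketch of (9)–(13) in PyTorch is given below. The 3 × 3 Laplacian kernel stands in for the edge extraction operator $L(\cdot)$ of Fig. 1, and `d_net` (a discriminator emitting probabilities in (0, 1)) and `vgg_features` (returning the chosen VGG19 activations) are assumed helpers; the exact kernel and layer choices of CEEGAN are not reproduced here.

```python
import torch
import torch.nn.functional as F

# One 3x3 Laplacian kernel per RGB channel, applied depthwise as L(.).
LAPLACIAN = torch.tensor([[0., 1., 0.],
                          [1., -4., 1.],
                          [0., 1., 0.]]).view(1, 1, 3, 3).repeat(3, 1, 1, 1)

def edge_map(img):
    return F.conv2d(img, LAPLACIAN.to(img.device), padding=1, groups=3)

def generator_loss(sr, hr, d_net, vgg_features, layer_weights,
                   alpha=0.1, beta=1.0, gamma=0.5):      # weights from (12)
    l1 = F.l1_loss(sr, hr)                               # pixel loss
    l_adv = -torch.log(d_net(sr)).mean()                 # adversarial loss (9)
    l_per = sum(w * F.mse_loss(fs, fh)                   # perceptual loss (10)
                for w, fs, fh in zip(layer_weights,
                                     vgg_features(sr), vgg_features(hr)))
    l_edge = F.l1_loss(edge_map(sr), edge_map(hr))       # edge loss (11)
    return l1 + alpha * l_adv + beta * l_per + gamma * l_edge  # total loss (12)

def discriminator_loss(sr, hr, d_net):                   # discriminator loss (13)
    return (-torch.log(d_net(hr)) - torch.log(1. - d_net(sr.detach()))).mean()
```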
two domains, including urban and rural. Compared to existing
datasets, the LoveDA dataset presents considerable challenges
IV. EXPERIMENTS due to its MS objects and complex background samples.
A. Datasets and Evaluation Metrics 2) Evaluation Metrics: To make a comprehensive compari-
son of the performance of the model, we select three commonly
1) Datasets: In this work, we select four public remote used evaluation metrics that focus on different aspects. The first
sensing datasets, including UCMerced [52], AID [53], NWPU- metric is PSNR, which is an objective criterion for evaluating
RESISC45 [54], and LoveDA [55] for SR, classification, and images. However, a higher PSNR does not necessarily corre-
semantic segmentation. The UCMerced dataset contains 21 spond to better perceptual quality. To better simulate human
scene categories.1 Each class has 100 images with a spatial visual perception, Zhang et al. [56] proposed learned perceptual
resolution of one foot, and the size of all images is 256 × 256. We image patch similarity (LPIPS), which measures the similarity
randomly select 80% of them as the training set and the rest as of two images in a way that is more in line with human judgment.
the validation set. The AID dataset contains 10 000 images in 30 To validate the performance of models in real-world scenarios,
TABLE I
RESULTS OF COMPARISON WITH DIFFERENT METHODS ON THE UCMERCED DATASET
To validate the performance of models in real-world scenarios, we select the natural image quality evaluator (NIQE) as a no-reference metric. It should be noted that lower LPIPS and NIQE values represent better SR reconstruction.

For the classification experiment, top-1 accuracy is used as the metric. For the semantic segmentation experiment, we select three general evaluation indicators: overall accuracy (OA), mean intersection over union (mIoU), and F1-score.
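Of these image-quality metrics, only PSNR is a closed-form computation; LPIPS relies on a learned network and NIQE on fitted natural-scene statistics, so both are typically taken from existing implementations (e.g., the `lpips` Python package for the former). A minimal PSNR reference for 8-bit images is:

```python
import numpy as np

def psnr(hr, sr, max_val=255.0):
    # PSNR = 10 * log10(MAX^2 / MSE), in dB; higher is better.
    mse = np.mean((hr.astype(np.float64) - sr.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(max_val ** 2 / mse)
```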
B. Implementation Details and Experimental Environment

For training, the input patches are cropped to a size of 128 × 128 pixels with a batch size of 16. Meanwhile, we use random rotation and horizontal flipping to augment the training samples. For optimization, we use the Adam optimizer with $\beta_1 = 0.9$, $\beta_2 = 0.99$, and $\epsilon = 10^{-8}$. We train the PSRM in IFE for 1600 epochs and fine-tune the whole CEEGAN for 400 epochs. The learning rate is initialized to $2 \times 10^{-4}$ and is halved every 400 epochs and every 100 epochs for the two stages, respectively. In our experiments, the raw images in each dataset are treated as real HR references, and the corresponding LR images are generated by Gaussian blur and bicubic interpolation to construct HR–LR pairs for training and evaluation.
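A minimal sketch of this HR-to-LR degradation for ×4 pairs is shown below; the Gaussian kernel size and sigma are our assumptions, since the exact blur parameters are not listed here.

```python
import torch.nn.functional as F
import torchvision.transforms.functional as TF

def degrade(hr, scale=4, kernel_size=7, sigma=1.5):
    # hr: (N, 3, H, W) tensor in [0, 1]; Gaussian blur, then bicubic downsampling.
    blurred = TF.gaussian_blur(hr, kernel_size=kernel_size, sigma=sigma)
    h, w = blurred.shape[-2:]
    return F.interpolate(blurred, size=(h // scale, w // scale),
                         mode="bicubic", align_corners=False)
```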
For the classification and semantic segmentation experiments, we first use the degraded images as the input of the different SR reconstruction methods and then evaluate the corresponding results with the same classification or semantic segmentation model. The proposed method is implemented in the PyTorch framework under Ubuntu 18.04 and CUDA 10.2. All experiments are run on NVIDIA 2080Ti GPUs.

C. Performance of Human Visual Perception and Machine Vision Applications

In this section, we compare the proposed method with state-of-the-art SR methods, including the PSNR-oriented approaches SRResNet [26], EDSR [27], RCAN [18], SwinIR [41], TransENet [42], and SR3 [43], and the GAN-based methods EEGAN [29], SRGAN [26], and ESRGAN [28]. EEGAN and TransENet were proposed for remote sensing images, and the other methods were proposed for natural images. For a fair comparison, we train these models using the same training datasets.

1) Quantitative Results: The different learning difficulties of different scenes tend to lead to large fluctuations in the indicators. To make a fair comparison, we calculate PSNR and LPIPS separately for each category and report the average scores. Tables I and II show the results of the different methods for ×4 upscaling on the UCMerced and AID datasets, respectively. From a macro perspective, the PSNR-oriented methods dominate in PSNR, while the GAN-based methods outperform them by a significant margin in terms of LPIPS. Because GAN-based methods add high-frequency information to the generated images in pursuit of realism, their PSNR may decrease. CEEGAN outperforms the other methods in terms of LPIPS on both datasets, which means it obtains results with better visual quality. This is because the EFEM and edge loss, designed for edge enhancement in CEEGAN, make the model pay more attention to edge details when restoring images. Meanwhile, the ERB can fuse MS features so that refined details are obtained for both large and small objects.
TABLE II
RESULTS OF COMPARISON WITH DIFFERENT METHODS ON THE AID DATASET
The clearer the edge details are, the more realistic the image appears. Although EEGAN also focuses on improving image quality through edge enhancement, our method further incorporates edge-domain constraints and context features to guide edge reconstruction, resulting in better performance in terms of LPIPS. It can also be observed that CEEGAN achieves the highest PSNR among the GAN-based methods on the UCMerced dataset.

2) Qualitative Comparison: To directly compare the reconstruction results obtained by different methods, Baseballdiamond38 and Building12 from the UCMerced dataset and Center253 and Viaduct242 from the AID dataset are selected to show the details of the SR images. As shown in Figs. 5 and 6, CEEGAN obtains more realistic reconstruction results at the edges of objects, such as roads and buildings. To intuitively demonstrate the effectiveness of our method on edge enhancement, we compare the images before and after edge enhancement, their corresponding edge maps, and the difference between the two edge maps, as shown in Fig. 7. From Fig. 7(j), it can be seen that the differences between $E^*$ and $L(I_P)$ are mainly concentrated at the object edges, which proves that the region of interest of CEEGAN is the edge. The addition of high-frequency information makes the image more realistic and comfortable for human vision.

3) Classification Results on UCMerced and NWPU-RESISC45 Datasets: To assess the application performance of CEEGAN in real scenarios, we conduct image classification experiments on the UCMerced and NWPU-RESISC45 datasets. First, we trained a ResNet-34 [44] image classification network on the original images of both datasets. Subsequently, we applied Gaussian blur and bicubic downsampling to the 256 × 256 HR images to generate 64 × 64 LR images. We then used the various SR methods to obtain the corresponding reconstructed images. Finally, we input the reconstructed images into the image classification network to obtain the classification results. As shown in Table III, the classification task performs best on the SR images obtained by CEEGAN.
Fig. 7. Comparison of images before and after edge enhancement is implemented. (a) Pre-SR image $I_P$. (b) Edge map of Pre-SR $L(I_P)$. (c) Enhanced edge map $E^*$. (d) SR image $I_{SR}$. (e) Edge map of SR $L(I_{SR})$. (f) GT image $I_{HR}$. (g) Edge map of GT $L(I_{HR})$. (h) Difference of (b) and (g), $|L(I_{HR}) - L(I_P)|$. (i) Difference of (c) and (g), $|E^* - L(I_{HR})|$. (j) Difference of (b) and (c), $|E^* - L(I_P)|$.
TABLE III
CLASSIFICATION RESULTS ON UCMERCED AND NWPU-RESISC45 DATASETS
TABLE IV
SEMANTIC SEGMENTATION RESULTS ON LOVEDA DATASET
Compared to SwinIR and SR3, which rank second in the respective evaluations, CEEGAN achieves significant improvements of 0.58% and 1.91% on the UCMerced and NWPU-RESISC45 datasets, respectively. Furthermore, it is worth noting that our method is capable of generating results that closely resemble HR images, indicating that it can effectively preserve essential image features and produce highly realistic images. These advantages highlight the effectiveness and potential of our proposed method in practical applications.

4) Semantic Segmentation Results on LoveDA Dataset: To further verify the practicality of our method, we design a comparative experiment for the semantic segmentation task. The experimental process is similar to that of the image classification task: the downsampled images are reconstructed by the different SR models and then input into the DeepLabv3 [57] model to compare the segmentation results. It should be noted that the semantic segmentation models are trained on the LoveDA dataset, while the SR models are trained on the UCMerced dataset. Table IV lists the semantic segmentation evaluation results of the SR images obtained by the different SR models on the LoveDA dataset, and Fig. 8 visualizes the segmentation results. Our method achieves the best performance not only in the classification task but also in semantic segmentation, with the highest values of OA, F1-score, and mIoU. This can be attributed to the ability of the edge loss to constrain the edge map, allowing the generation of accurate high-frequency information, which enables CEEGAN to achieve results far beyond EEGAN in the semantic segmentation experiments. This result proves that CEEGAN improves not only the model's understanding of the entire image but also its ability to understand images at the pixel scale.

5) Comparative Results of Real-World Scenario SR: In real-world scenarios, the degradation process of images may not align with the assumptions made in creating SR datasets. Therefore, we select the no-reference metric NIQE to evaluate the performance of the different models on a real-world image from the WorldView-3 satellite. Fig. 9 shows the SR images reconstructed by the different SR models along with their corresponding NIQE values.
Fig. 8. Visualization comparison of reconstruction results using different SR methods in semantic segmentation experiments.
Fig. 9. Comparison of SR reconstruction results on real-world scene images using different methods. Red indicates the model that achieved the lowest NIQE.
It can be observed that CEEGAN achieves the lowest NIQE value, indicating that it produces more natural results when faced with unknown degraded images.

6) Model Complexity Comparison: To compare the running efficiency of different models, we compare the parameters, floating point operations (FLOPs), runtimes, LPIPS, and classification accuracy on the UCMerced dataset. Although our primary objective is to optimize the performance of the SR model for machine vision applications in remote sensing images, the results presented in Table V indicate that our model also offers advantages in terms of parameter count, FLOPs, and runtime. In addition, it outperforms some recent models for remote sensing image SR, such as TransENet and EEGAN, in terms of runtime. Moreover, compared to SR3, the state-of-the-art diffusion model in the field of SR, CEEGAN demonstrates advantages in terms of parameters, FLOPs, and runtimes. Although CEEGAN may exhibit a gap in model efficiency compared to earlier algorithms such as ESRGAN, it achieves superior results in terms of LPIPS and practical applications by leveraging a larger model and more advanced architectural designs. The comparative experimental results on model complexity confirm that CEEGAN achieves a better balance between complexity and accuracy and has more potential for practical applications in real remote sensing scenarios.

D. Ablation Studies

In this section, we design a series of experiments on the UCMerced dataset to validate the effectiveness and necessity of
TABLE V
QUANTITATIVE COMPARISON OF PARAMETERS, FLOPS, RUNTIMES, LPIPS, AND CLASSIFICATION ACCURACY FOR DIFFERENT METHODS
TABLE VII
QUANTITATIVE COMPARISON OF DIFFERENT COMPONENTS IN ERB
TABLE IX
QUANTITATIVE COMPARISON OF DIFFERENT NUMBERS OF EFEMS

Fig. 10. Performance of LPIPS with the different numbers of EFEMs.

same number of EFEMs gradually decreases, while the increase in parameter count is fixed. Although the EFEM can improve model performance, its excessive use increases model complexity and may result in a loss of effectiveness, leading to overfitting and longer training times. Therefore, we limit the number of EFEMs used in practice to three, which strikes a balance between model performance and complexity.

V. CONCLUSION

In this work, a context aware edge-enhanced GAN (CEEGAN) is proposed for remote sensing image SR. To address the limitations of existing edge reconstruction methods, we propose a new edge enhancement method that simultaneously exploits contextual semantic features and edge features and maintains information exchange during the gradual enhancement process. To maintain the accuracy of the high-frequency information, the edge loss function is designed to constrain the generated edges in the edge domain.

We compare the SR results on the UCMerced and AID datasets with current advanced methods and achieve the best results in terms of LPIPS. The ablation experiments prove the effectiveness of each part of CEEGAN and of the edge loss function. To further demonstrate the suitability of our results for practical remote sensing applications, classification and segmentation experiments are conducted on the UCMerced and LoveDA datasets, respectively. CEEGAN achieves the best results on multiple evaluation metrics, which proves that our method is well suited for the practical application of remote sensing image SR.

REFERENCES

[1] G. Cheng, J. Han, P. Zhou, and D. Xu, "Learning rotation-invariant and fisher discriminative convolutional neural networks for object detection," IEEE Trans. Image Process., vol. 28, no. 1, pp. 265–278, Jan. 2019.
[2] R. Dian, A. Guo, and S. Li, "Zero-shot hyperspectral sharpening," IEEE Trans. Pattern Anal. Mach. Intell., vol. 45, no. 10, pp. 12650–12666, Oct. 2023.
[3] R. Dian, T. Shan, W. He, and H. Liu, "Spectral super-resolution via model-guided cross-fusion network," IEEE Trans. Neural Netw. Learn. Syst., to be published, doi: 10.1109/TNNLS.2023.3238506.
[4] L. He, W. Zhang, J. Shi, and F. Li, "Cross-domain association mining based generative adversarial network for pansharpening," IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 15, pp. 7770–7783, 2022.
[5] X. Guan, F. Li, X. Zhang, M. Ma, and S. Mei, "Assessing full-resolution pansharpening quality: A comparative study of methods and measurements," IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 16, pp. 6860–6875, Jul. 2023.
[6] S. Mei et al., "Lightweight multiresolution feature fusion network for spectral super-resolution," IEEE Trans. Geosci. Remote Sens., vol. 61, Jan. 2023, Art. no. 5501414.
[7] S. Mei, X. Li, X. Liu, H. Cai, and Q. Du, "Hyperspectral image classification using attention-based bidirectional long short-term memory network," IEEE Trans. Geosci. Remote Sens., vol. 60, Aug. 2021, Art. no. 5509612.
[8] S. Mei, C. Song, M. Ma, and F. Xu, "Hyperspectral image classification using group-aware hierarchical transformer," IEEE Trans. Geosci. Remote Sens., vol. 60, Sep. 2022, Art. no. 5539014.
[9] G.-S. Xia et al., "DOTA: A large-scale dataset for object detection in aerial images," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2018, pp. 3974–3983.
[10] J. Wang, F. Li, and H. Bi, "Gaussian focal loss: Learning distribution polarized angle prediction for rotated object detection in aerial images," IEEE Trans. Geosci. Remote Sens., vol. 60, May 2022, Art. no. 4707013.
[11] S. Ji, S. Wei, and M. Lu, "Fully convolutional networks for multisource building extraction from an open aerial and satellite imagery data set," IEEE Trans. Geosci. Remote Sens., vol. 57, no. 1, pp. 574–586, Jan. 2019.
[12] S. Saha, F. Bovolo, and L. Bruzzone, "Unsupervised deep change vector analysis for multiple-change detection in VHR images," IEEE Trans. Geosci. Remote Sens., vol. 57, no. 6, pp. 3677–3693, Jun. 2019.
[13] M. Liu, Q. Shi, A. Marinoni, D. He, X. Liu, and L. Zhang, "Super-resolution-based change detection network with stacked attention module for images with different resolutions," IEEE Trans. Geosci. Remote Sens., vol. 60, Jul. 2022, Art. no. 4403718.
[14] R. Keys, "Cubic convolution interpolation for digital image processing," IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-29, no. 6, pp. 1153–1160, Dec. 1981.
[15] R. Schultz and R. Stevenson, "A Bayesian approach to image expansion for improved definition," IEEE Trans. Image Process., vol. 3, no. 3, pp. 233–242, May 1994.
[16] C.-Y. Yang and M.-H. Yang, "Fast direct super-resolution by simple functions," in Proc. IEEE Int. Conf. Comput. Vis., 2013, pp. 561–568.
[17] C. Dong, C. C. Loy, K. He, and X. Tang, "Image super-resolution using deep convolutional networks," IEEE Trans. Pattern Anal. Mach. Intell., vol. 38, no. 2, pp. 295–307, Feb. 2016.
[18] Y. Zhang, K. Li, K. Li, L. Wang, B. Zhong, and Y. Fu, "Image super-resolution using very deep residual channel attention networks," in Proc. Eur. Conf. Comput. Vis., 2018, pp. 286–301.
[19] S. Lei, Z. Shi, and Z. Zou, "Super-resolution for remote sensing images via local–global combined network," IEEE Geosci. Remote Sens. Lett., vol. 14, no. 8, pp. 1243–1247, Aug. 2017.
[20] D. Zhang, J. Shao, X. Li, and H. T. Shen, "Remote sensing image super-resolution via mixed high-order attention network," IEEE Trans. Geosci. Remote Sens., vol. 59, no. 6, pp. 5183–5196, Jun. 2021.
[21] S. Zhang, Q. Yuan, J. Li, J. Sun, and X. Zhang, "Scene-adaptive remote sensing image super-resolution using a multiscale attention network," IEEE Trans. Geosci. Remote Sens., vol. 58, no. 7, pp. 4764–4779, Jul. 2020.
[22] X. Dong, L. Wang, X. Sun, X. Jia, L. Gao, and B. Zhang, "Remote sensing image super-resolution using second-order multi-scale networks," IEEE Trans. Geosci. Remote Sens., vol. 59, no. 4, pp. 3473–3485, Apr. 2021.
[23] S. Lei and Z. Shi, "Hybrid-scale self-similarity exploitation for remote sensing image super-resolution," IEEE Trans. Geosci. Remote Sens., vol. 60, Apr. 2022, Art. no. 5401410.
[24] Y. Xiao, Q. Yuan, K. Jiang, J. He, Y. Wang, and L. Zhang, "From degrade to upgrade: Learning a self-supervised degradation guided adaptive network for blind remote sensing image super-resolution," Inf. Fusion, vol. 96, pp. 297–311, Aug. 2023.
[25] J. Feng et al., "A deep multitask convolutional neural network for remote sensing image super-resolution and colorization," IEEE Trans. Geosci. Remote Sens., vol. 60, Feb. 2022, Art. no. 5407915.
[26] C. Ledig et al., "Photo-realistic single image super-resolution using a generative adversarial network," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2017, pp. 4681–4690.
[27] B. Lim, S. Son, H. Kim, S. Nah, and K. Mu Lee, "Enhanced deep residual networks for single image super-resolution," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. Workshops, 2017, pp. 136–144.
[28] X. Wang et al., "ESRGAN: Enhanced super-resolution generative adversarial networks," in Proc. Eur. Conf. Comput. Vis. Workshops, 2018, pp. 63–79.
[29] K. Jiang, Z. Wang, P. Yi, G. Wang, T. Lu, and J. Jiang, "Edge-enhanced GAN for remote sensing image superresolution," IEEE Trans. Geosci. Remote Sens., vol. 57, no. 8, pp. 5799–5812, Aug. 2019.
[30] S. Jia, Z. Wang, Q. Li, X. Jia, and M. Xu, "Multiattention generative adversarial network for remote sensing image super-resolution," IEEE Trans. Geosci. Remote Sens., vol. 60, Jun. 2022, Art. no. 5624715.
[31] R. Dong, L. Zhang, and H. Fu, "RRSGAN: Reference-based super-resolution for remote sensing image," IEEE Trans. Geosci. Remote Sens., vol. 60, Jan. 2022, Art. no. 5601117.
[32] M. S. Moustafa and S. A. Sayed, "Satellite imagery super-resolution using squeeze-and-excitation-based GAN," Int. J. Aeronautical Space Sci., vol. 22, no. 6, pp. 1481–1492, Dec. 2021.
[33] K. Jiang, Z. Wang, P. Yi, J. Jiang, J. Xiao, and Y. Yao, "Deep distillation recursive network for remote sensing imagery super-resolution," Remote Sens., vol. 10, no. 11, Nov. 2018, Art. no. 1700.
[34] Y. Xiao, X. Su, Q. Yuan, D. Liu, H. Shen, and L. Zhang, "Satellite video super-resolution via multiscale deformable convolution alignment and temporal grouping projection," IEEE Trans. Geosci. Remote Sens., vol. 60, Sep. 2022, Art. no. 5610819.
[35] P. Yi, Z. Wang, K. Jiang, J. Jiang, T. Lu, and J. Ma, "A progressive fusion generative adversarial network for realistic and consistent video super-resolution," IEEE Trans. Pattern Anal. Mach. Intell., vol. 44, no. 5, pp. 2264–2280, May 2022.
[36] Z. Wang, K. Jiang, P. Yi, Z. Han, and Z. He, "Ultra-dense GAN for satellite imagery super-resolution," Neurocomputing, vol. 398, pp. 328–337, Jul. 2020.
[37] I. J. Goodfellow et al., "Generative adversarial nets," in Proc. Int. Conf. Neural Inf. Process. Syst., 2014.
[38] A. Dosovitskiy et al., "An image is worth 16x16 words: Transformers for image recognition at scale," in Proc. 9th Int. Conf. Learn. Representations, 2021.
[39] N. Carion, F. Massa, G. Synnaeve, N. Usunier, A. Kirillov, and S. Zagoruyko, "End-to-end object detection with transformers," in Proc. Eur. Conf. Comput. Vis., 2020, pp. 213–229.
[40] Z. Liu et al., "Swin transformer: Hierarchical vision transformer using shifted windows," in Proc. IEEE/CVF Int. Conf. Comput. Vis., 2021, pp. 9992–10002.
[41] J. Liang, J. Cao, G. Sun, K. Zhang, L. Van Gool, and R. Timofte, "SwinIR: Image restoration using Swin transformer," in Proc. IEEE/CVF Int. Conf. Comput. Vis. Workshops, 2021, pp. 1833–1844.
[42] S. Lei, Z. Shi, and W. Mo, "Transformer-based multistage enhancement for remote sensing image super-resolution," IEEE Trans. Geosci. Remote Sens., vol. 60, Dec. 2022, Art. no. 5615611.
[43] C. Saharia, J. Ho, W. Chan, T. Salimans, D. J. Fleet, and M. Norouzi, "Image super-resolution via iterative refinement," IEEE Trans. Pattern Anal. Mach. Intell., vol. 45, no. 4, pp. 4713–4726, Apr. 2022.
[44] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2016, pp. 770–778.
[45] H. Chen et al., "Pre-trained image processing transformer," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2021, pp. 12299–12310.
[46] S. Lei, Z. Shi, and Z. Zou, "Coupled adversarial training for remote sensing image super-resolution," IEEE Trans. Geosci. Remote Sens., vol. 58, no. 5, pp. 3633–3643, May 2020.
[47] J. Canny, "A computational approach to edge detection," IEEE Trans. Pattern Anal. Mach. Intell., vol. PAMI-8, no. 6, pp. 679–698, Nov. 1986.
[48] S. Xie and Z. Tu, "Holistically-nested edge detection," in Proc. IEEE Int. Conf. Comput. Vis., 2015, pp. 1395–1403.
[49] M. Pu, Y. Huang, Y. Liu, Q. Guan, and H. Ling, "EDTER: Edge detection with transformer," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2022, pp. 1392–1402.
[50] Q. Hou, D. Zhou, and J. Feng, "Coordinate attention for efficient mobile network design," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2021, pp. 13713–13722.
[51] J. Johnson, A. Alahi, and L. Fei-Fei, "Perceptual losses for real-time style transfer and super-resolution," in Proc. Eur. Conf. Comput. Vis., 2016, pp. 694–711.
[52] Y. Yang and S. Newsam, "Bag-of-visual-words and spatial extensions for land-use classification," in Proc. 18th SIGSPATIAL Int. Conf. Adv. Geographic Inf. Syst., 2010, pp. 270–279.
[53] G.-S. Xia et al., "AID: A benchmark data set for performance evaluation of aerial scene classification," IEEE Trans. Geosci. Remote Sens., vol. 55, no. 7, pp. 3965–3981, Jul. 2017.
[54] G. Cheng, J. Han, and X. Lu, "Remote sensing image scene classification: Benchmark and state of the art," Proc. IEEE, vol. 105, no. 10, pp. 1865–1883, Oct. 2017.
[55] J. Wang, Z. Zheng, A. Ma, X. Lu, and Y. Zhong, "LoveDA: A remote sensing land-cover dataset for domain adaptive semantic segmentation," in Proc. Neural Inf. Process. Syst. Track Datasets Benchmarks, 2021.
[56] R. Zhang, P. Isola, A. A. Efros, E. Shechtman, and O. Wang, "The unreasonable effectiveness of deep features as a perceptual metric," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2018, pp. 586–595.
[57] L.-C. Chen, G. Papandreou, F. Schroff, and H. Adam, "Rethinking atrous convolution for semantic image segmentation," 2017, arXiv:1706.05587.

Zhihan Ren received the B.S. degree in information engineering from the College of Communication Engineering, Jilin University, Changchun, China, in 2023. He is currently working toward the M.S. degree in information and communication engineering with the School of Information and Communications Engineering, Xi'an Jiaotong University, Xi'an, China. His research interests include deep learning and remote sensing image understanding and processing.

Lijun He received the B.S. degree in information engineering and the Ph.D. degree in information and communications engineering from the School of Information and Communications Engineering, Xi'an Jiaotong University, Xi'an, China, in 2008 and 2016, respectively. She is currently an Associate Professor with the School of Information and Communications Engineering, Xi'an Jiaotong University. Her research interests include video communication and transmission, video analysis, processing, and compression techniques.

Jichuan Lu received the master's degree in communication and information system from the School of Communication and Information Engineering, Xidian University, Xi'an, China, in 2011. He is currently a Senior Expert with China Mobile Communications Group, Xi'an, China, mainly engaged in research in the fields of cloud computing and AI.