High Spatial Resolution Remote Sensing Scene Class
Abstract—The deep convolutional neural network (DeCNN) is considered one of the most promising techniques for classifying high spatial resolution remote sensing (HSRRS) scenes, owing to its powerful feature extraction capabilities. It is well known that large, high-quality labeled datasets are required to achieve good classification performance and to prevent over-fitting while training a DeCNN model. However, the lack of high-quality datasets often limits the application of DeCNN. To solve this problem, we propose an HSRRS image scene classification method using transfer learning and a DeCNN (TL-DeCNN) model for few shot HSRRS scene samples. Specifically, the convolutional-layer weights of three typical DeCNNs, VGG19, ResNet50 and InceptionV3, trained on ImageNet2015, are transferred to the corresponding TL-DeCNN. Then, TL-DeCNN only needs to fine-tune its classification module on the few shot HSRRS scene samples for a few epochs. Experimental results indicate that the proposed TL-DeCNN method clearly outperforms VGG19, ResNet50 and InceptionV3 trained directly on the few shot samples, without over-fitting.

Index Terms—Transfer learning, deep convolutional neural network (DeCNN), few shot, high spatial resolution remote sensing (HSRRS), scene classification.

Manuscript received April 19, 2019; revised August 26, 2019. This work was supported by the Project Funded by the National Science and Technology Major Project of China under Grant TC190A3WZ-2, the Natural Science Foundation of Jiangsu Province under Grant BK20191384, the China Postdoctoral Science Foundation under Grant 2019M661896, the National Natural Science Foundation of China under Grant 61671253, the Jiangsu Specially Appointed Professor under Grant RK002STP16001, the Innovation and Entrepreneurship of Jiangsu High-level Talent under Grant CZ0010617002, the Six Top Talents Program of Jiangsu under Grant XYDXX-010, the 1311 Talent Plan of Nanjing University of Posts and Telecommunications, Nanjing University of Posts and Telecommunications Science Foundation (NUPTSF Grant No. 218085). (Corresponding author: Guan Gui)
W. Li, J. Wu, and Y. Jia are with the School of Geographic and Biologic Information, Nanjing University of Posts and Telecommunications, Nanjing 210023, China (e-mails: [email protected], [email protected], [email protected])
Z. Wang, Y. Wang, J. Wang and G. Gui are with the College of Telecommunications and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing 210003, China (e-mails: {1019010409, 1018010407, 1219012920, guiguan}@njupt.edu.cn)

I. INTRODUCTION

WITH the development of satellite remote sensing and computer technology, the spatial resolution and texture information of remote sensing images have improved and the processing approaches have been updated. High spatial resolution remote sensing (HSRRS) images, with higher spatial resolution and abundant texture details, have performed well in object identification, classification and information extraction [1]–[3]. In recent years, a large number of HSRRS images have been acquired, and significant efforts have been made on land use and land cover (LULC) scene classification in the field of pattern recognition [4]–[8]. These approaches first extract features from training data and then build a classification model for testing other data. Most of the recognition methods are based on deep learning.

Deep learning has been successfully applied to the extraction of abstract and semantic features [9]–[15], and it performs well in target identification, object detection and classification. The convolutional neural network (CNN) is a typical deep learning algorithm, and many CNN based algorithms (e.g., ResNet, VGG, Inception) have been developed for computer vision, natural language processing, and medical and remote sensing image processing [16]. These practical applications indicate that the depth of a network is vital for the model: as layers are added, the network can extract more complex features. While a deeper model obtains better performance, training a CNN model, especially a deep CNN (DeCNN) model, often requires a lot of labeled data. However, it is hard to obtain a huge amount of labeled data to train a DeCNN model for HSRRS scene classification. In addition, labeling HSRRS data takes a lot of manpower and resources. When the amount of labeled data is not large enough, the trained DeCNN model easily over-fits. Several studies have shown that transfer learning achieves good performance in classification and recognition with small-scale training data [17].

In this paper, we propose a classification method based on transfer learning and a DeCNN model (TL-DeCNN) to reduce over-fitting and improve classification accuracy with limited labeled samples. Specifically, three typical deep CNN models, i.e., VGG19, ResNet50 and InceptionV3, are each combined with transfer learning, and the combined algorithms are called TLVGG19, TLResNet50 and TLInceptionV3. To assess the performance of TL-DeCNN for few shot HSRRS scene classification, the retraining and testing accuracy, loss, confusion matrix, overall accuracy (OA) and kappa coefficient (KC) are used. The main contributions of the paper are threefold:
• A DeCNN based HSRRS image scene classification method is presented for few shot samples. We train three CNN models, i.e., VGG19, ResNet50 and InceptionV3, on few shot samples and evaluate their accuracy. Experiment
IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, VOL. XX, NO. X, JANUARY 2020 2
results show that InceptionV3 is the best model among the three models.
• A TL-DeCNN based HSRRS image scene classification method is proposed for the case of limited labeled samples. The proposed TL-DeCNN model is trained on limited labeled HSRRS scene samples for a few epochs by fine-tuning.
• A DeCNN based scene classification method using a large amount of labeled HSRRS images is also considered as a benchmark.

The remainder of the paper is organized as follows. An overview of CNN based HSRRS image scene classification and transfer learning based applications is presented in Section II. The proposed architectures based on the DeCNN model and TL-DeCNN for HSRRS image scene classification with small and large amounts of labeled data are given in Section III. Section IV describes the HSRRS image preprocessing, the architectures based on VGG19, ResNet50 and InceptionV3 for scene classification, and the evaluation indexes of the classification model. The results of HSRRS scene classification with DeCNN and TL-DeCNN on few shot samples, together with quantitative indicators, are described in Section V, where the results of DeCNN trained on a large amount of labeled HSRRS images are also compared with those of TL-DeCNN with few shot. Finally, some concluding remarks are drawn in Section VI.

II. RELATED WORK

HSRRS image scene classification assigns extracted subregions to different semantic classes; it is a fundamental and significant task for remote sensing applications, such as urban planning, object detection, and natural resource management. Many recent works have demonstrated that CNN is the most successful and widely applied deep learning method, and it has been used for the HSRRS image scene classification task [18]. In particular, the DeCNN performs well in semantic feature extraction given many convolutional layers and a large training data set. However, it is difficult to train a DeCNN model with only a few samples.

HSRRS images have higher spatial resolution and fewer spectral channels than coarse or medium spatial resolution remote sensing data, which makes it more difficult to identify subtle differences among similar land cover types. Meanwhile, the phenomena of “same object with different spectrum” and “same spectrum with different objects” in HSRRS images cause failures in many classification tasks that demand high accuracy. Tremendous efforts have been made to develop robust and automatic image classification methods. Machine learning approaches (e.g., support vector machine, random forest, k-nearest neighbor and multilayer perceptron) have been widely used in HSRRS image classification, with many achievements [19]–[21].

Recently, deep learning has represented the state of the art in a variety of domains, and CNN, as a typical deep learning method, has obtained excellent results in computer vision [22], wireless communications [23], [24] and remote sensing image processing [18]. HSRRS image scene classification based on CNN has achieved excellent results recently. Penatti et al. evaluated the generalization power of CNN features from fully-connected layers and obtained a state-of-the-art result on a public HSRRS image data set [25]. Feature fusion strategies that integrate the multilayer features of CNNs have been proposed for HSRRS image scene classification [18], [26]–[29]. Gong et al. proposed a deep structural metric learning approach for HSRRS image scene classification [30]. Ji et al. proposed a model based on multilevel features and an attention model for remote sensing image scene classification [31]. Bi et al. proposed an attention pooling based convolutional network for aerial scene classification [32]. These earlier works achieved excellent results in HSRRS image scene classification with a fully trained CNN model. However, training a CNN model needs a considerable amount of labeled data, which is rather difficult to obtain for HSRRS images. Many efforts have been made to add training samples or improve the robustness of CNN, including data enhancement, detecting adversarial perturbations [33], increasing the depth of CNN, and transferring a pre-trained CNN model or its knowledge into the scene classification task [34].

Transfer learning is an important solution for improving the robustness of CNN based classification models. Zhang et al. searched for regions of interest based on the features of adjacent parallel lines and confirmed the final targets through transfer learning on AlexNet [36]. Li et al. proposed a best activation model (BAM) in an end-to-end process for LULC image classification [4]. Nogueira et al. proposed a method that transfers parameters from a pre-trained network and retrains the new network without parameter selection [37]. Zhao et al. combined the pre-trained AlexNet with a multilayer perceptron structure for classification [38]. Huang et al. constructed a semi-transfer DeCNN for image classification [39].

III. THE PROPOSED TL-DECNN BASED METHOD

Deep learning based HSRRS scene classification remains a challenge because of the limited number of labeled images. In this section, a robust classification method using TL-DeCNN is proposed. The architecture of the proposed TL-DeCNN based HSRRS image scene classification method is shown in Fig. 1. The architecture can be divided into three steps: the first step trains a classification model on ImageNet2015 and transfers the knowledge to the target classification task, the second step is fine-tuning with HSRRS images, and the third step computes the evaluation indicators for the model and results. The goal of the architecture is to transfer deep knowledge from ImageNet2015 to the limited training HSRRS image data for scene classification in urban built-up areas, and to improve the classification accuracy.

A. Transfer Learning

Transfer learning is a popular training strategy that overcomes the label-limited difficulty by initializing the training model with the parameters or knowledge learned from other large data sets. Through fine-tuning with a small
[…] by a model in application. The model can be expressed as

[…]

and $\theta$ can be divided into two parts, $\theta = (\theta_F, \theta_{\mathrm{CCE}})$: the former is the feature extraction (learning) part, and the latter corresponds to the classification cross-entropy (CCE) loss function, which is applied for multi-category classification or prediction. Therefore, the equation can be written as

$\tilde{T} = f_{\mathrm{CNN}}(\theta_F, \theta_{\mathrm{CCE}}; S)$  (4)

Fig. 1: The framework of our proposed TL-DeCNN based method. [Diagram labels: Deep CNN model training (VGG19/ResNet50/InceptionV3) with Global Average Pooling, Dense(1000) and Softmax; Target HSRRS image dataset; Fine-tuning convolution layers; Deep CNN model retraining (VGG19/ResNet50/InceptionV3) with Global Average Pooling, Dense(1024), Dense(10) and Softmax.]
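As a rough illustration of the parameter split θ = (θ_F, θ_CCE) in Eq. (4), the following pure-Python sketch keeps a stand-in feature extractor fixed and trains only a softmax classifier with the cross-entropy loss. The tiny feature map, the four toy samples and all function names are assumptions made for illustration; they are not the networks or data used in the paper. Freezing θ_F follows the abstract's statement that only the classification module is fine-tuned.

```python
import math

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def features(theta_f, x):
    """Frozen feature extractor: one ReLU layer standing in for the transferred conv base."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in theta_f]

def predict(theta_c, theta_f, x):
    """Return class probabilities and the feature vector they were computed from."""
    h = features(theta_f, x)
    logits = [sum(w * hi for w, hi in zip(row, h)) for row in theta_c]
    return softmax(logits), h

def cce_loss(theta_c, theta_f, data):
    """Mean categorical cross-entropy over (x, label) pairs."""
    total = sum(math.log(predict(theta_c, theta_f, x)[0][y] + 1e-12) for x, y in data)
    return -total / len(data)

def fine_tune_head(theta_c, theta_f, data, lr=0.001, epochs=50):
    """Stochastic gradient descent on the classifier weights only; theta_f is read, never written."""
    for _ in range(epochs):
        for x, y in data:
            p, h = predict(theta_c, theta_f, x)
            for k, row in enumerate(theta_c):
                grad_logit = p[k] - (1.0 if k == y else 0.0)  # d(loss)/d(logit_k)
                for j in range(len(row)):
                    row[j] -= lr * grad_logit * h[j]
    return theta_c

# Toy setup (hypothetical): 2-D inputs, 4 frozen features, 2 classes.
theta_f = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
theta_c = [[0.0] * 4 for _ in range(2)]
data = [([1.0, 0.2], 0), ([0.8, -0.1], 0), ([-1.0, 0.1], 1), ([-0.9, -0.3], 1)]

loss_before = cce_loss(theta_c, theta_f, data)
fine_tune_head(theta_c, theta_f, data, lr=0.1)  # larger than the paper's 0.001, chosen for the toy data
loss_after = cce_loss(theta_c, theta_f, data)
print(loss_before, loss_after)
```

In a real TL-DeCNN the role of `theta_f` would be played by the pre-trained convolutional weights and `theta_c` by the newly added dense layers; the sketch only shows which part of θ the update touches.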
4) InceptionV3: InceptionNet was proposed to increase the depth and width of the network and thereby improve the performance of the neural network. InceptionV3 is one of the most popular InceptionNet variants for classification [35]. It introduces the idea of factorization into small convolutions and uses branches not only in the inception module but also within the branches themselves, which can promote high dimensional representations.

5) Transfer learning based method: Fig. 2 is the schematic diagram of transfer learning. Given a source domain $D_S$ and learning task $T_S$, and a target domain $D_T$ and learning task $T_T$, transfer learning aims to improve the learning of the target predictive function $f_T(\cdot)$ in $D_T$ using the knowledge in $D_S$ and $T_S$, where $D_S \neq D_T$ or $T_S \neq T_T$. Note that each domain is a pair, $D_S = \{X_S, P(X)_S\}$ and $D_T = \{X_T, P(X)_T\}$. The condition $D_S \neq D_T$ therefore implies that either the feature spaces of the source and target are different, or the feature spaces are the same but the marginal probability distributions are different. The tasks are subject to an analogous requirement. The definition also implies that when there is some relationship (explicit or implicit) between the feature spaces of the two domains, the source and target domains are considered related, and transfer learning can be carried out between them.

Fig. 2: The schematic diagram of transfer learning. [Diagram labels: source domain/task, target domain/task, previous model, present model, knowledge/features.]

Transfer learning raises three questions: what to transfer, how to transfer and when to transfer. What to transfer asks which part of the knowledge can be transferred across domains or tasks. How to transfer concerns developing algorithms to transfer the knowledge, and when to transfer asks in which situations transfer learning should be done. In this paper, we aim to achieve good performance in the target HSRRS scene classification task by transferring knowledge from the source ImageNet2015 task. As labeled data are available in both the source and target domains, this belongs to the inductive transfer learning setting [43]. Meanwhile, the model pre-trained with DeCNN on ImageNet2015 also falls under deep transfer learning. Compared with non-deep approaches, deep transfer learning automatically extracts more expressive features and meets the end-to-end requirement of practical applications [43].

C. Fine-tuning for HSRRS Image Scene Classification Task

Fine-tuning initializes the network for the HSRRS scene classification task with the trained knowledge transferred from ImageNet2015. The model is then trained further with the labeled HSRRS images, and the parameters are adjusted in the same way as in training from scratch. This requires the layers of the initial network to be the same as those of the source network, including the layer names, types, parameter settings and so on. Fine-tuning is a vital process for HSRRS scene classification: it not only makes the network converge as quickly as possible but also makes generic features contribute to a specific task. Compared with the learning rate used in model training with ImageNet2015 (0.005), the fine-tuning learning rate is smaller (0.001); this setting can improve the accuracy of the HSRRS scene classification.

D. Accuracy Verification

The evaluation metrics include the confusion matrix, OA, KC and precision. The confusion matrix is the most commonly used indicator for evaluating performance. The OA is the proportion of samples that are correctly classified. The KC, calculated from the confusion matrix, is applied to check consistency and evaluate classification precision. It considers not only the overall accuracy but also the imbalance in the number of samples in each category. The precision measures the accuracy of each class: among the samples classified into a certain class, it is the proportion that actually belongs to that class.

IV. EXPERIMENTS

In this section, to check the performance of the proposed TL-DeCNN, experiments have been conducted on three aspects. The first is few shot HSRRS image scene classification based on VGG19, ResNet50, and InceptionV3, respectively. The second is limited labeled HSRRS image scene classification based on TL-DeCNN, which transfers the knowledge trained by VGG19, ResNet50, and InceptionV3 on ImageNet2015 to the target limited labeled HSRRS image data set for classification. The third is scene classification with a large amount of labeled HSRRS images based on VGG19, ResNet50, and InceptionV3, respectively.

A. Data Description

The HSRRS images collected in urban built-up areas are extracted from the UC Merced land use dataset [45] and the remote sensing image classification benchmark (RSI-CB) dataset [46]. There are 10 categories of objects to be classified in our experiments, and the training and testing sample sizes for the few shot, TL-DeCNN few shot and large amount labeled cases are shown in Tab. I. All of the testing sample sizes are the same, namely 100 samples for each
category. The few shot and TL-DeCNN few shot training samples are randomly selected from the large set of labeled samples. The training samples for TL-DeCNN few shot contain not only the few HSRRS image samples but also the knowledge transferred from ImageNet2015. Therefore, TL-DeCNN combines the prior knowledge with the target samples to make an identification. Note that effective data augmentation has been applied to the large set of labeled samples to enlarge the number of training samples and increase their diversity [18].

TABLE I: The sample size in our experiments.

Category        FS              TLDCNN-FS       LS
                Train   Test    Train   Test    Train   Test
Airport         10      100     K+10    100     578     100
Avenue          10      100     K+10    100     444     100
Bridge          10      100     K+10    100     369     100
Building        10      100     K+10    100     914     100
Roadside tree   10      100     K+10    100     321     100
Road            10      100     K+10    100     367     100
Marina          10      100     K+10    100     266     100
Parking lot     10      100     K+10    100     367     100
Residents       10      100     K+10    100     710     100
Storeroom       10      100     K+10    100     1207    100

All entries are sample sizes. FS: few shot learning; TLDCNN-FS: TLDCNN-few shot learning; LS: large amount labeled sample; K: knowledge.

B. HSRRS Image Scene Classification in a Few Shot

In this experiment, VGG19, ResNet50 and InceptionV3 are each applied to HSRRS image scene classification in the few shot case.

1) VGG19: There are 16 convolutional layers, mainly using 3 × 3 convolutional kernels, and 3 fully connected layers. The combination of convolutional, BN and ReLU layers constructs a convolutional block. A max pooling layer is applied after every two or three convolutional blocks. The convolutional blocks are followed by the dense layers, which are set to 4096, 4096 and 10 in our experiment. Finally, the softmax is applied to make a classification. The accuracy and loss in the training and testing stages are shown in Fig. 3(a). It is easy to see that the training accuracy is nearly 100% while the testing accuracy is lower than 40%. Meanwhile, the training loss is close to 0 and the testing loss fluctuates around 8, which means the VGG19 model is over-fitting in HSRRS image scene classification with limited labeled samples.

2) ResNet50: As illustrated in Fig. 1, the limited labeled HSRRS images are input into the ResNet50 model. The accuracy and loss in the training and testing phases are shown in Fig. 3(b). It can be seen that the training accuracy is nearly 100%, and the testing accuracy is about 75%, remaining below 80% after the training and testing processes have stabilized. Meanwhile, the training loss is nearly 0, and the testing loss is larger than 2 when the model is stable. Compared with the accuracy and loss of VGG19, ResNet50 obtains better performance, which reduces the over-fitting phenomenon to some extent. However, ResNet50 applied to HSRRS scene classification with few shot still demonstrates a certain over-fitting problem.

3) InceptionV3: To reduce the over-fitting problem further, InceptionV3 is applied to the limited labeled HSRRS scene classification task. As described in Section III, the idea of InceptionV3 is factorization, which promotes high dimensional representations. The accuracy and loss during the training and testing stages are shown in Fig. 3(c). After stabilization, the accuracy is 100% in training and 83.0% in testing, and the loss is 0 in training and about 1.8 in testing. Compared with the accuracies and losses of VGG19 and ResNet50, InceptionV3 is better at handling the over-fitting problem. However, the testing result is still much worse than the training result, so the InceptionV3 model still over-fits with few shot.

C. TL-DeCNN based HSRRS Image Scene Classification Method

The TL-DeCNN is proposed to solve the over-fitting problem with limited training HSRRS images. As in the few shot experiments, the TL-DeCNN experiment is carried out with limited labeled HSRRS images, together with the knowledge transferred from ImageNet2015. Three typical deep CNN models, VGG19, ResNet50 and InceptionV3, are considered in this experiment.

1) VGG19: The architecture of HSRRS scene classification based on transfer learning and the VGG19 (TLVGG19) model can be seen in Fig. 1. The knowledge trained by VGG19 with ImageNet2015 is transferred to the limited labeled HSRRS scene classification task. The accuracy and loss during training and testing are shown in Fig. 4(a). When the process has stabilized, the training accuracy is 100%, and the testing accuracy is 90.0%. Meanwhile, the training loss is 0, and the testing loss is close to 0.25. Compared with the case without transferred knowledge, the HSRRS scene classification task based on TLVGG19 performs better in accuracy and loss. The testing accuracy increases from about 40% to 90.0%, and the testing loss decreases from about 8 to 0.25. This demonstrates that the proposed approach can greatly reduce the effect of over-fitting with limited labeled HSRRS images.

2) ResNet50: For the few shot HSRRS image scene classification task, the architecture of transfer learning based on ResNet50 (TLResNet50) is also shown in Fig. 1. Similar to TLVGG19, the architecture transfers the knowledge trained with ImageNet2015 to the target HSRRS scene classification task, and the one added fully connected layer helps map the features to the result. The accuracy and loss during training and testing are shown in Fig. 4(b). When the processes are stable, the training accuracy is 100% and the testing accuracy is about 93.3%, while the loss is 0 in training and 0.65 in testing. Compared with the case without transferred knowledge, the testing accuracy increases by about 18 percentage points, and the loss decreases by about 74%. This result indicates that TLResNet50 handles the over-fitting problem well with limited labeled HSRRS images.
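The testing accuracies quoted in these experiments, and the OA, KC and precision described under Accuracy Verification, can all be derived from a confusion matrix. The sketch below shows one way to compute them in plain Python, assuming rows are true classes and columns are predicted classes; the 3-class matrix and the function names are hypothetical and are not taken from the paper's results.

```python
def overall_accuracy(cm):
    """Proportion of all samples that lie on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total

def kappa_coefficient(cm):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    p_o = sum(cm[i][i] for i in range(n)) / total
    row_sums = [sum(cm[i]) for i in range(n)]
    col_sums = [sum(cm[i][j] for i in range(n)) for j in range(n)]
    p_e = sum(r * c for r, c in zip(row_sums, col_sums)) / (total * total)
    return (p_o - p_e) / (1 - p_e)

def precision_per_class(cm):
    """For each class, the share of samples predicted as that class that truly belong to it."""
    n = len(cm)
    result = []
    for j in range(n):
        predicted = sum(cm[i][j] for i in range(n))
        result.append(cm[j][j] / predicted if predicted else 0.0)
    return result

# Hypothetical confusion matrix with 100 test samples per class.
cm = [
    [90, 5, 5],
    [10, 80, 10],
    [0, 20, 80],
]
oa = overall_accuracy(cm)
kc = kappa_coefficient(cm)
prec = precision_per_class(cm)
print(oa, kc, prec)
```

When every class has the same number of test samples, as with the 100 test images per category here, the chance agreement is 1/K for K classes. With K = 10 the KC therefore reduces to (OA - 0.1)/0.9, so an OA of 95.7% corresponds to a KC of about 0.952, in line with the value reported for TLResNet50.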
Fig. 3: The accuracy and loss during training and testing for (a) VGG19, (b) ResNet50, and (c) InceptionV3. [Plots of training and testing accuracy and loss versus epoch, 0 to 50.]

Fig. 4: The accuracy and loss during training and testing for (a) TLVGG19, (b) TLResNet50, and (c) TLInceptionV3. [Plots of training and testing accuracy and loss versus epoch, 0 to 50.]

3) InceptionV3: The architecture of transfer learning combined with InceptionV3 (TLInceptionV3) for limited labeled HSRRS images is also shown in Fig. 1. The accuracies and losses in the training and testing processes are shown in Fig. 4(c). After the process has stabilized, the testing accuracy and loss are about 93.3% and 0.26, respectively. Compared with InceptionV3 without transferred knowledge, the testing accuracy increases by 10.3 percentage points, and the testing loss decreases from 1.8 to 0.26, which indicates that the proposed approach is effective in reducing over-fitting with limited labeled HSRRS images.

D. HSRRS Image Scene Classification in a Large Number of Labeled Samples

From the above experiments IV-B and IV-C, it has been found that the TL-DeCNN architectures, including TLVGG19, TLResNet50 and TLInceptionV3, are efficient and effective in reducing over-fitting. However, it remains to be seen whether the accuracy and loss of TL-DeCNN are comparable with those of DeCNN trained on a large number of labeled samples. This experiment is carried out with augmented HSRRS images using VGG19, ResNet50 and InceptionV3, respectively.

1) VGG19: As described in IV-A, there are at least 1064 training samples in each category in the large amount of labeled data experiment (the smallest category has 266 samples, and geometric transformations have been applied for data augmentation). The accuracies and losses in training and testing are shown in Fig. 5(a). It can be seen that the testing accuracy is about 90% and the testing loss is about 0.38, which is similar to that of TLVGG19. Therefore, it indicates that compared with the VGG19 based HSRRS scene classification trained with a large number of labeled samples, TLVGG19 with few shot can obtain similar results and reduces the effect of over-fitting.

2) ResNet50: ResNet50 is suitable for scene classification with a large number of labeled samples. The accuracies and losses in training and testing are shown in Fig. 5(b). After about 10 epochs, the testing accuracy and loss are stable; the testing accuracy is close to 98% and the testing loss is nearly 0. Compared with the testing accuracy and loss of TLResNet50 with few shot, the ResNet50 architecture with a large number of labeled samples is better for the HSRRS scene classification task. This demonstrates that transfer learning contributes to the classification task, although its testing performance is inferior to that of ResNet50 trained with a large amount of labeled samples.

3) InceptionV3: InceptionV3 is a typical DeCNN for deep feature extraction. It is good at extracting deep features from a large number of labeled samples. The accuracies and losses of InceptionV3 in training and testing with a large amount of labeled HSRRS images are shown in Fig. 5(c). It can be seen that after about 15 epochs, the testing accuracy and loss are stable, the former around 99% and the latter around 0.1, which is better than for TLInceptionV3.

V. RESULTS AND DISCUSSIONS

First of all, we present the confusion matrix of each DeCNN classifier. Fig. 6 shows the confusion matrix of HSRRS image classification with VGG19, ResNet50 and InceptionV3 based on limited labeled samples. The OA of classification is 35.9%,
77.8%, and 87.0% for the VGG19, ResNet50 and InceptionV3 architectures, respectively. The KC is 0.287, 0.753 and 0.856 for the VGG19, ResNet50 and InceptionV3 architectures, respectively. Fig. 7 shows the confusion matrix of HSRRS image classification with TLVGG19, TLResNet50 and TLInceptionV3 based on limited labeled samples. The OA is 89.0%, 95.7% and 92.4% and the KC is 0.878, 0.952 and 0.916 for the TLVGG19, TLResNet50 and TLInceptionV3 architectures, respectively. Fig. 8(a) shows the OA and KC of DeCNN and TL-DeCNN with fine-tuning. From the figure, we can see that the transferred knowledge improves the OA and KC for the TL-DeCNN classification models. Transfer learning improves the OA of VGG19 most obviously (an increase of 53.1 percentage points), and has the least effect on the OA of InceptionV3 (an increase of 5.4 percentage points). Meanwhile, for few shot learning, InceptionV3 obtains the best OA and KC, and after adding the transferred knowledge, TLResNet50 gets the best performance in OA and KC. Fig. 8(b) shows the corresponding OA and KC without fine-tuning: the best OA and KC among the three TL-DeCNN models are 14.9% and 0.054, respectively, which may indicate that fine-tuning is a key step for ensuring forward transfer learning.

Fig. 5: The accuracy and loss during training and testing for (a) VGG19, (b) ResNet50, and (c) InceptionV3. [Plots of training and testing accuracy and loss versus epoch, 0 to 50.]

Fig. 8: The OA and KC of DeCNN and TL-DeCNN with (a) fine-tuning, and (b) without fine-tuning. [Bar charts of OA and KC for VGG19, ResNet50 and InceptionV3 under Deep CNN and TLDCNN.]

Then, the precision of each category with VGG19, ResNet50, InceptionV3, TLVGG19, TLResNet50 and TLInceptionV3 is listed in Tab. II. VGG19 obtains its lowest precision, 9.0%, for “road” identification, whereas ResNet50 and InceptionV3 obtain 99.0% and 91.0% precision for the same class. The same phenomenon appears in the “roadside tree” and “marina” classes, which may indicate that ResNet50 and InceptionV3 perform better in identifying these objects. When the transferred knowledge is considered, the precisions for each category obtained by TLVGG19 are greatly improved, and the category with the greatest growth is “avenue”, from 24.0% to 99.0%. Compared with VGG19, ResNet50 obtains better precision for all categories. Its lowest precision is 46.0% for “bridge”, and after the knowledge is transferred into the model, the precision increases to 96.0%. Similar to the situation of VGG19, when the transferred knowledge is considered, the precisions of all categories are improved. The lowest precision of the InceptionV3 model is 41% for “bridge” identification. After the knowledge is transferred into the architecture, the precision increases to 83%. Most of the precisions are improved, but the precision of the “airport” category decreases from 98% to 89%. This may be caused by the transferred knowledge, which is extracted from the large amount of airport information in ImageNet2015. The transferred knowledge contains intricate airport information whose features are not similar to those of the “airport” class in our task. In short, the transferred knowledge improves the precisions of most of the categories for DeCNN scene classification tasks.

Finally, to evaluate the performance gap between TL-DeCNN based on limited labeled samples and DeCNN based on a large amount of labeled HSRRS images, VGG19, ResNet50, and InceptionV3 are applied to HSRRS scene classification with a large amount of labeled samples. The OA is 96.1%, 97.1% and 99.4%, and the KC is 0.956, 0.968 and 0.993, for VGG19, ResNet50 and InceptionV3, respectively. The OA and KC are both larger than those obtained by TL-DeCNN. InceptionV3 obtains the best result for a large number of labeled samples, and for few shot samples TLResNet50 is the best architecture.

VI. CONCLUSION

In this paper, three TL-DeCNN models, i.e., TLVGG19, TLResNet50 and TLInceptionV3, are proposed for HSRRS scene classification in urban built-up areas. The main idea of our work is to solve the over-fitting and vanishing gradient problems with limited labeled HSRRS images. Three experiments have been carried out: the first is DeCNN based HSRRS scene classification with few shot, the second is TL-DeCNN based scene classification with the same few shot, and the third is DeCNN based HSRRS scene classification with a large number of labeled samples. The results show that for few shot HSRRS scene classification, all of the three architectures TLVGG19, TLResNet50 and
IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, VOL. XX, NO. X, JANUARY 2020 8
Fig. 6: The confusion matric of limited HSRRS image samples based on (a) VGG19, (b) ResNet50, and (c) InceptionV3.
Fig. 7: The confusion matric of limited HSRRS image samples based on (a) TLVGG19, (b) TLResNet50, and (c) TLInceptionV3.
TLInceptionV3 greatly improve the performance compared transfer learning and DeCNN with few shot.
with that without transferred knowledge. And the ResNet50 is
more suitable for transfer learning applications compared with R EFERENCES
VGG19 and InceptionV3, and InceptionV3 could reduce the
over fitting and gradient disappearance problems to a certain [1] Y. Zhong, X. Han, and L. Zhang, “Multi-class geospatial object detection
based on a position-sensitive balancing framework for high spatial
degree and it performs better with few shot. Meanwhile, DeC- resolution remote sensing imagery,” ISPRS Journal of Photogrammetry
NN based HSRRS scene classification with a large amount of and Remote Sensing, vol. 138, pp. 281–294, 2018.
labeled HSRRS images show that their performance are better [2] H. Liu, X. Y. Huang, et al., “Hybrid polarimetric GPR calibration
and elongated object orientation estimation,” IEEE Journal of Selected
compared with Tl-DeCNN with few shot. It indicates that there Topics in Applied Earth Observations and Remote Sensing, vol. 12, no.
is still space for improvement of classification performance for 07, pp. 2080-2087, 2019.
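The OA, KC and per-category precision values discussed in the results are all derived from a confusion matrix. The following is a minimal sketch of that computation, not the authors' evaluation code. It assumes that rows hold the true classes and columns the predicted classes, that KC is Cohen's kappa, and that precision for a class is its diagonal entry divided by its column sum. The function names and the 3-class matrix are hypothetical and are not data from the paper.

```python
from typing import List


def overall_accuracy(cm: List[List[int]]) -> float:
    """Fraction of all samples that lie on the diagonal (correct predictions)."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


def kappa_coefficient(cm: List[List[int]]) -> float:
    """Cohen's kappa, (p_o - p_e) / (1 - p_e); undefined when p_e equals 1."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    p_o = overall_accuracy(cm)  # observed agreement
    row_sums = [sum(cm[i]) for i in range(n)]
    col_sums = [sum(cm[i][j] for i in range(n)) for j in range(n)]
    # Agreement expected by chance from the row and column marginals.
    p_e = sum(r * c for r, c in zip(row_sums, col_sums)) / (total * total)
    return (p_o - p_e) / (1.0 - p_e)


def per_class_precision(cm: List[List[int]]) -> List[float]:
    """Diagonal entry divided by the column sum (assumes columns = predictions)."""
    n = len(cm)
    result = []
    for j in range(n):
        predicted = sum(cm[i][j] for i in range(n))
        result.append(cm[j][j] / predicted if predicted else 0.0)
    return result


# Hypothetical 3-class confusion matrix (illustrative only, not from the paper).
example_cm = [
    [50, 0, 0],
    [5, 40, 5],
    [0, 10, 40],
]
oa = overall_accuracy(example_cm)
kc = kappa_coefficient(example_cm)
precisions = per_class_precision(example_cm)
```

For this hypothetical matrix, 130 of 150 samples are on the diagonal, so the OA is about 0.867, and the chance agreement is 1/3, which gives a KC of 0.8. If the confusion matrices in Figs. 6 and 7 are row-normalized, the per-category values would instead be computed along rows.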
IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, VOL. XX, NO. X, JANUARY 2020 9
[3] H. Liu, H. Y. Xia, et al., “Reverse time migration of acoustic waves [25] O. A. Penatti, K. Nogueira, J. A. Dos Santos, “Do deep features
for imaging based defects detection for concrete and CFST structures,” generalize from everyday objects to remote sensing and aerial scenes
Mechanical System and Signal Processing, vol. 117, pp. 210-220, 2019. domains?” in Proceedings of the IEEE Conference on Computer Vision
[4] B. Li, W. Su, H. Wu, R. Li, W. Zhang, W. Qin, S. Zhang, and J. Wei, and Pattern Recognition Workshops, Boston, MA, USA, 12 June 2015,
“Further exploring convolutional neural networks potential for land-use pp. 44–51.
scene classification,” IEEE Geoscience and Remote Sensing Letters, doi: [26] E. Li, J. Xia, P. Du, C. Lin, and A. Samat, “Integrating multilayer
10.1109/LGRS.2019.2952660, [Online]. features of convolutional neural networks for remote sensing scene
[5] G. Cheng, C. Yang, X. Yao, L. Guo, and J. Han, “When deep learning classification,” IEEE Transactions on Geoscience and Remote Sensing,
meets metric learning: remote sensing image scene classification via vol. 55, no. 10, pp. 5653–5665, 2017.
learning discriminative CNNs,” IEEE Trans. Geoscience and Remote [27] S. Chaib, H. Liu, Y. Gu, and H. Yao, “Deep feature fusion for VHR
Sensing, vol. 56, no. 5, pp. 2811–2821, 2018. remote sensing scene classification,” IEEE Transactions on Geoscience
[6] G. Cheng, Z. Li, J. Han, X. Yao, and L. Guo, “Exploring hierarchi- and Remote Sensing, vol. 55, no. 8, pp. 4775–4784, 2017.
cal convolutional features for hyperspectral image cassification,” IEEE [28] C. Ma, X. Mu, and D. Sha, “Multi-layers feature fusion of convolutional
Transactions on Geoscience and Remote Sensing, vol. 56, no. 11 pp. neural network for scene classification of remote sensing,” IEEE Access,
6712–6722, 2018. vol. 7, no. 1, pp. 121685–121694, 2019.
[7] G. Cheng, J. Han, P. Zhou, and D. Xu, “Learning rotation-invariant [29] H. Sun, S. Li, X. Zheng, and X. Lu, “Remote sensing scene classification
and Fisher discriminative convolutional neural networks for object by gated bidirectional network,” IEEE Transactions on Geoscience and
detection,” IEEE Transactions on Image Processing, vol. 28, no, 1, pp. Remote Sensing, in press, doi: 10.1109/TGRS.2019.2931801.
265–278, 2019. [30] Z. Gong, P. Zhong, Y. Yu, and W. Hu, “Diversity promoting deep
[8] G. Cheng, Z. Li, X. Yao, and L. Guo, and Z. Wei, “Remote sensing structural metric learning for remote sensing scene classification,” IEEE
image scene classification using bag of convolutional features,” IEEE Transactions on Geoscience and Remote Sensing, vol. 56, no. 1, pp.
Geoscience and Remote Sensing Letters, vol. 14, no. 10, pp. 1735–1739, 371-390, 2018.
2017. [31] J. Ji, T. Zhang, L. Jiang, W. Zhong, and H. Xiong, “Combining
[9] H. Huang, J. Yang, Y. Song, H. Huang, and G. Gui, “Deep learning for multilevel features for remote sensing image scene classification with
super-resolution channel estimation and DOA estimation based massive attention model,” IEEE Geoscience and Remote Sensing Letters, in press,
MIMO system”, IEEE Transactions on Vehicluar Technology, vol. 67, doi: 10.1109/LGRS.2019.2949253.
no. 9, pp. 8549–8560, 2018. [32] Q. Bi, K. Qin, H. Zhang, J. Xie, Z. Li, and K. Xu, “APDC-Net:
[10] H. Huang, et al., “Fast beamforming design via deep learning,” IEEE attention pooling-based convolutional network for aerial scene classi-
Transactions on Vehicular Technology, vol. 69, no. 1, pp. 1065–1069, fication,” IEEE Geoscience and Remote Sensing Letters, in press, doi:
2020. 10.1109/LGRS.2019.2949930.
[11] G. Gui, et al., “Deep learning for an effective nonorthogonal multiple [33] W. Li, Z. Li, J. Sun, Y. Wang, H. Liu, J. Yang, and G. Gui, “Spear
access scheme,” IEEE Transactions on Vehicular Technology, vol. 67, and shield: attack and detection for CNN-based high spatial resolution
no. 9, pp. 8440–8450, 2018. remote sensing images identification,” IEEE Access, vol. 7, pp. 94583-
[12] F. Tang, et al., “An intelligent traffic load prediction-based adaptive 94592, 2019.
channel assignment algorithm in SDN-IoT: A deep learning approach,” [34] F. Hu, G. Xia, J. Hu, and L. Zhang, “Transferring deep convolutional
IEEE Internet of Things Jounral, vol. 5, no. 6, pp. 5141–5154, 2018. neural networks for the scene classification of high-resolution remote
sensing imagery,” Remote Sensing, vol. 7, no. 11, pp. 14680-14707,
[13] B. Mao, et al., “A novel non-supervised deep learning based network
2015.
traffic control method for software defined wireless networks,” IEEE
[35] C. Wang, D. Chen, L. Hao, X. Liu, Y. Zeng, J. Chen, and G. Zhang,
Wireless Communications Magazine, vol. 25, no. 4, pp. 74–81, 2018.
“Pulmonary image classification based on inception-v3 transfer learning
[14] Y. Wang, M. Liu, J. Yang and G. Gui, “Data-driven deep learning for au-
model,” IEEE Access, vol. 7, no. 1, pp. 146533–146541, 2019.
tomatic modulation recognition in cognitive radios,” IEEE Transactions
[36] P. Zhang, X. Niu, Y. Dou, and F. Xia, “Airport detection on optical satel-
on Vehicular Technology, vol. 68, no. 4, pp. 4074–4077, 2019.
lite images using deep convolutional neural networks,” IEEE Geoscience
[15] N. Kato, et al., “The deep learning vision for heterogeneous network and Remote Sensing Letters, vol. 14, no. 8, pp. 1183-1187, 2017.
traffic control: Proposal, challenges, and future perspective,” IEEE [37] K. Nogueira, Otavio A. B. Penatti, and J. A. Dos Santos, “Towards
Wireless Communications Magazine, vol. 24, no. 3, pp. 146–153, 2017. better exploiting convolutional neural networks for remote sensing scene
[16] Z. Shao, J. Cai, P. Fu, L. Hu, T. Liu, “Deep learning-based fusion of classification,” Pattern Recognition, vol. 61, pp. 539–556, Jan. 2017.
Landsat-8 and Sentinel-2 images for a harmonized surface reflectance [38] B. Zhao, B. Huang, and Y. Zhong, “Transfer learning with fully
product,” Remote Sensing of Environment, vol. 235, Dec. 2019, doi: pretrained deep convolution networks for land-use classification,” IEEE
10.1016/j.rse.2019.111425. Geoscience and Remote Sensing Letter, vol. 14, no. 9, pp. 1436–1440,
[17] J. Yosinski, J. Clune, Y. Bengio, and H. Lipson, “How transferable 2017.
are features in deep neural networks?” Advances in Neural Information [39] B. Huang, B. Zhao, and Y. Song, “Urban land-use mapping using a deep
Processing Systems, vol. 27, pp. 3320–3328, 2014. convolutional neural network with high spatial resolution multispectral
[18] W. Li, H. Liu, Y. Wang, Z. Li, Y. Jia, and G. Gui, “Deep learning- remote sensing imagery,” Remote Sensing Environment, vol. 214, pp.
based classification methods for remote sensing images in urban built-up 73–86, Sep. 2018.
areas,” IEEE Access, vol. 7, no. 1, pp. 36274–36284, 2019. [40] I. Goodfellow, Y. Bengio, and A. Courville, “Deep learning,” Cambridge,
[19] C. Zhang, T. Wang, P. M. Atkinson, X. Pan, and H. Li, “A novel multi- MA, USA: MIT Press, 2016.
parameter support vector machine for image classification,” Internation- [41] K. Simonyan, and A. Zisserman, “Very deep convolutional networks
al Journal of Remote Sensing, vol. 36, pp. 1890–1906, 2015. for large-scale image recognition,” in Proceedings of the International
[20] M. E. Mavroforakis, and S. Theodoridis, “A geometric approach to Conference on Learning Representations, San Diego, CA, USA, 7–9
Support Vector Machine (SVM) classification,” IEEE Transactions on May, 2015.
Neural Networks, vol. 17, no. 3, pp. 671–682, May 2006. [42] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image
[21] M. Belgiu, and L. Dragut, “Random forest in remote sensing: a review recognition,” in Proceedings of the IEEE Conference Computer Vision
of applications and future directions,” ISPRS Journal of Photogrammetry and Pattern Recognition, Jun. 2016, pp. 770–778.
and Remote Sensing, vol. 114, pp. 24–31, 2016. [43] S. J. Pan, and Q. Yang, “A survey on transfer learning,” IEEE Transac-
[22] C. Zhang, X. Pan, H. Li, A. Gardiner, I. Sargent, J. Hare, and P. M. tions on Knowledge and Data Engineering, vol. 22, no. 10, pp. 1345–
Atkinson, “A hybrid MLP-CNN classifier for very fine resolution re- 1359, 2010.
motely sensed image classification,” ISPRS Journal of Photogrammetry [44] A. Andreas, T. Evgeniou, and M. Pontil, “Multi-task feature learning,” in
and Remote Sensing, vol. 140, pp. 133–144, 2018. Proceedings of the 19th International Conference on Neural Information
[23] J. Sun, W. Shi, Z. Han, J. Yang, and G. Gui, “Behavioral modeling and Processing Systems, MIT Press, 2006.
linearization of wideband RF power amplifiers using BiLSTM networks [45] Y. Yang, and S. Newsam, “Bag-of-visual-words and spatial extensions
for 5G wireless systems,” IEEE Transactions on Vehicular Technology, for land-use classification,” in Proccedings of the 18th SIGSPATIAL In-
vol. 68, no. 11, pp. 10348–10356, Nov. 2019. ternational Conference on Advances in Geographic Information Systems,
[24] M. Liu, T. Song, G. Gui, J. Hu, and H. Sari, “Deep cognitive perspective: 2010, pp. 270–279.
resource allocation for NOMA based heterogeneous IoT with imperfect [46] H. Li, C. Tao, Z. Wu, J. Chen, J. Gong, and M. Deng, “RSI-CB:Alarge
SIC,” IEEE Internet of Things Journal, vol. 6, no. 2, pp. 2885–2894, scale remote sensing image classification benchmark via crowdsource
Apr. 2019. data,” [Online]. Available: https://arxiv.org/abs/1705.10450
Wenmei Li (M’18) received the M.S. and Ph.D. degrees from Nanjing University and the Chinese Academy of Forestry in 2010 and 2013, respectively. She is an associate professor with the School of Geographic and Biologic Information, Nanjing University of Posts and Telecommunications, where she has also been pursuing postdoctoral studies since 2018. Her research interests include deep learning, optimization, image reconstruction, and their applications in land remote sensing.

Yan Jia received the double M.S. degree in telecommunications engineering and computer application technology from Politecnico di Torino, Turin, Italy, and Henan Polytechnic University, in 2013. She received the Ph.D. degree in electronics engineering from Politecnico di Torino in 2017. She is now with Nanjing University of Posts and Telecommunications. Her research interests include microwave remote sensing, soil moisture retrieval, Global Navigation Satellite System Reflectometry (GNSS-R) applications to land remote sensing, and antenna design.