Object Detection Using Adaptive Mask RCNN
Object Detection Using Adaptive Mask RCNN in Optical Remote Sensing Images
1 National Authority for Remote Sensing and Space Science, Cairo, Egypt
2 Faculty of Computers and Information, Cairo University, Giza, Egypt
* Corresponding author’s Email: [email protected]
Abstract: Fast and automatic object detection in remote sensing images is a critical and challenging task for civilian
and military applications. Recently, deep learning approaches were introduced to overcome the limitations of traditional
object detection methods. In this paper, an adaptive mask region-based convolutional neural network (Mask R-CNN) is utilized
for multi-class object detection in remote sensing images. Transfer learning, data augmentation, and fine-tuning were
adopted to overcome object scale variability, small object size, object density, and the scarcity of annotated remote
sensing images. In addition, five optimization methods were investigated, namely: Adaptive Moment Estimation (Adam),
stochastic gradient descent (SGD), the adaptive learning rate method (Adadelta), Root Mean Square Propagation (RMSprop),
and hybrid optimization. In hybrid optimization, training begins with Adam and then switches to SGD when
appropriate, and vice versa. The behaviour of adaptive Mask R-CNN was also compared with baseline deep object
detection methods. Several experiments were conducted on the challenging NWPU VHR-10 dataset. The hybrid
method Adam_SGD achieved the highest average precision (AP), 95%. Experimental results showed a boost in detection
performance of up to 6% in terms of accuracy and intersection over union (IoU).
Keywords: Object detection, Deep learning, Mask R-CNN, Adam, SGD, RMSprop.
International Journal of Intelligent Engineering and Systems, Vol.13, No.1, 2020 DOI: 10.22266/ijies2020.0229.07
Received: August 3, 2019. Revised: October 22, 2019.
Recently, deep learning algorithms have shown their superiority in feature representation tasks in different
computer vision and remote sensing domains. The recent evolution of deep learning (DL) in detecting
complicated patterns in big remote sensing imagery exposes its high potential to address various
challenges, such as the complexity of satellite images, the lack of training datasets, multi-sensor data, complex
backgrounds, and atmospheric conditions. These are the primary challenges to achieving robust automatic
object detection using deep learning.

Region-based Convolutional Network (R-CNN) [18] achieved excellent object detection accuracy
by using a very deep CNN to classify object proposals. R-CNN has notable drawbacks, such as multi-stage
pipeline training, extensive training time and space, and slow detection. An enhancement was introduced
by Spatial Pyramid Pooling Networks (SPPnets) [19], which share convolutions across proposals to limit the
time cost of training. Fast R-CNN [20] operates in a single stage with a multi-task loss during the training
phase. This enhancement limits the storage space used and improves accuracy, but region proposal
computation is still considered the main bottleneck. To overcome this problem, Ren et al. introduced an
additional region proposal network (RPN) [13] that replaced selective search for region proposal
generation, thereby combining region proposal, classification, and localization regression. This improves
speed and accuracy but is still too slow to achieve real-time detection. Another approach to reducing the
time consumed in the region selection step was to directly predict confidences for both classification
and localization bounding boxes. YOLO [14] introduced real-time performance by computing a
single loss. YOLOv2 [15] is an enhancement that provided a smooth trade-off between speed and
accuracy. The SSD method [16] achieved significantly more accurate performance than YOLO by
adding feature maps at each scale. YOLO versions and SSD methods struggle with small objects within the
image due to the spatial constraints of the algorithm. R-FCN [17, 32] is considered a two-stage object
detector that applies position-sensitive RoI pooling to tackle the dilemma between translation
invariance in classification and translation variance in localization; however, it is less accurate than Faster R-CNN.

In [12], a deep neural network was utilized for the ship detection task in optical images. Various
augmentation methods, such as rotation, scaling, and illumination conditions, were adopted to enhance
the learning procedure. In [22], Pan et al. utilized a cascade convolutional neural network (CCNN)
framework based on transfer learning and geometric feature constraints (GFC) to improve the accuracy of
aircraft detection. The detection accuracy increased by an average of 3.66%. In [23], an enhancement of
Faster R-CNN was introduced to detect densely packed objects in satellite images. Extensive
experiments were conducted to evaluate the effectiveness of the proposed method in terms of
accuracy and IoU, and the results showed its effectiveness.

In [24], AlexNet was adopted to extract generic features for the ship detection task in very high-resolution
images. The proposed method outperforms You Only Look Once (YOLO) and SSD in terms of accuracy
and IoU. Moreover, Nie et al. [25] proposed a novel framework based on Mask R-CNN for the inshore
ship detection task. They adopted Soft Non-Maximum Suppression (Soft-NMS) to improve the
performance robustness and efficiency of the proposed method. In [26], Long et al. proposed a three-stage
framework for object detection. In the first stage, a sliding window technique is utilized to generate
candidate region proposals. Next, AlexNet and GoogleNet were chosen to extract generic image
features from each region proposal. Finally, an unsupervised score-based bounding box regression
(USB-BBR) algorithm was proposed to optimize the bounding box of the detected object. The results of the
proposed framework surpass other methods in terms of accuracy and IoU quality with complex
backgrounds. Inspired by Faster R-CNN, Li et al. [27] used a region proposal network to generate translation-invariant
and multi-scale candidate regions. Next, a local-contextual feature fusion network was used to
form a discriminative joint representation (local and contextual features) for each candidate region. Finally,
accurate classification and accurate object localization were implemented. In [28], Cheng et al.
presented a two-stage approach based on Faster R-CNN, namely the deep adaptive proposal network
(DAPNet). The input image is fed to the backbone network to generate a high-level feature
representation of the image; then the category prior network (CPN) sub-network and the fine-region proposal
network (F-RPN) use these high-level features to obtain the category prior information and the
candidate regions for each image, respectively. Both results are combined to achieve an adaptive region
proposal. Finally, the accuracy detection network sub-network is used for classification and
regression of each adaptive candidate box. Several experiments were carried out on the public
NWPU VHR-10 dataset to evaluate the performance of the proposed approach, and the results show its
superiority. Ammour et al. [29] proposed a car detection method in unmanned aerial vehicle images
(UAV). A mean-shift algorithm was used to segment the UAV input image into small homogeneous
regions. Then, a pre-trained VGG-16 was adopted to extract generic features for each segment. Finally, a
linear support vector machine (SVM) classifier was adopted to label each segment as either “car” or “no car”.
The proposed method outperformed state-of-the-art methods in terms of both accuracy and
computational time. To overcome the limited accuracy of traditional ship detection methods,
Yang et al. [30] proposed an approach called the Rotation Dense Feature Pyramid Network (R-DFPN).
The proposed method has two stages: a Dense Feature Pyramid Network (DFPN) for feature fusion and a
Rotation Region Detection Network (RDN) for prediction. Comprehensive evaluations on remote
sensing images extracted from Google Earth for ship detection demonstrated the superiority of the
proposed method. In [31], Cheng et al. proposed an effective approach to learning a rotation-invariant CNN
model. First, a new rotation-invariant layer was trained by optimizing a new objective function that
imposes a regularization constraint; then the whole CNN was fine-tuned to further boost the performance.
The proposed method was evaluated on the public NWPU VHR-10 dataset, and the results demonstrated
its effectiveness.

In this paper, we utilize Mask R-CNN to boost object detection accuracy in the remote sensing (RS) domain.
The main contribution of this paper is an adaptive Mask R-CNN framework to detect multi-scale objects in optical
remote sensing images. The proposed adaptive Mask R-CNN efficiently reduces the redundancy of detector
boxes and handles multi-scale targets in images with complex backgrounds. Transfer learning and fine-tuning
were adopted to overcome the scarcity and complexity of remote sensing images. The paper also
studies the behaviour of adaptive Mask R-CNN with the following optimization methods: Adam, SGD,
Adadelta, RMSprop, hybrid SGD_Adam, and hybrid Adam_SGD. In addition, the paper compares adaptive
Mask R-CNN with the baseline object detection methods Faster R-CNN (FRCN) [13], You Only Look Once (YOLO) [14],
YOLOv2 [15], Single Shot MultiBox Detector (SSD) [16], and Region-based Fully Convolutional Network (R-FCN) [17].
All experiments were conducted on the publicly available 10-class geospatial object detection dataset
NWPU VHR-10 [33].

The remainder of this paper is organized as follows. The proposed adaptive Mask R-CNN is described in
Section 2. Experimental results and discussion are presented in Section 3. Finally, Section 4 draws the conclusion.

2. Proposed method

In recent years, deep learning techniques have achieved state-of-the-art results for object detection
on standard benchmarks. Mask R-CNN outperformed other deep learning object detection models,
including the winners of the 2016 COCO challenge. However, Mask R-CNN rarely achieves comparable
results in the remote sensing domain due to the complex nature of satellite images, the lack of annotated
samples, and varied object scales. This work studies the behaviour of different optimization methods and of a
hybrid training strategy that starts with an adaptive method (Adam) and then switches to SGD (SWATS), and vice versa.

Mask R-CNN [33] was introduced by He et al. in 2017 as an extension of Faster R-CNN [13] to allow
accurate pixel-level segmentation. It consists of two main components, namely a Feature Pyramid Network
(FPN) and a Region Proposal Network (RPN), which together generate a varying number of proposals for the
regions of the input image that might contain an object. First, we utilized a standard convolutional neural
network to serve as a feature extractor. The state-of-the-art architectures AlexNet, VGGNet, and GoogLeNet
have 8, 19, and 22 layers, respectively. As a network gets deeper, it suffers from the vanishing gradient problem,
which results in performance saturation or even rapid degradation. Several attempts [34] have been
introduced to overcome the vanishing gradient problem. Based on the residual block, the ResNet50
architecture was introduced in [35]. A skip connection (shortcut) takes the activation of one layer and feeds
it to another layer about 2–3 hops away. ResNet50 has become a seminal architecture for
different computer vision applications. In this paper, we used an architecture pre-trained on the ImageNet
(1000-class) dataset. Generally, recent models are substantially smaller due to the use of global
average pooling rather than fully connected layers. We chose ResNet50 as the feature extractor network,
which encodes the input image into a 32x32x2048 feature map. The FPN extracts regions of interest from
features at different levels according to their size, and these features are fed as input to the next stage (RPN).

In the Region Proposal Network (RPN), regions are scanned individually to predict whether or not an
object is present. The RPN never scans the actual input image; instead, it scans the
feature map, which makes it much faster. Next, each of
the regions of interest proposed by the RPN is taken as input by a head that outputs a classification (softmax)
and a bounding box (regressor). Finally, Mask R-CNN adds a new branch that outputs a binary mask indicating
whether or not each pixel is part of an object. This added branch is a Fully Convolutional Network on top of
the backbone architecture. The proposed method consists of two main phases, training and testing,
as illustrated in Fig. 1.

Figure 1. The proposed object detection method for optical remote sensing images (satellite image dataset split into training (70%) and testing; data augmentation: input images resized to 1024x1024, rotation/flipping; output: bounding boxes)

2.1 Loss function

Mask R-CNN uses a multi-task loss function that combines the classification, localization, and
segmentation mask losses, as given in Eq. (1).

$$L = L_{cls} + L_{bbox} + L_{mask} \qquad (1)$$

where $L_{cls}$ and $L_{bbox}$ are the same as in Faster R-CNN [13]. The added mask loss $L_{mask}$, given in
Eq. (2), is the average binary cross-entropy, which only includes the $k$-th mask if the region is associated
with the ground-truth class $k$.

$$L_{mask} = -\frac{1}{m^{2}} \sum_{1 \le i,j \le m} \left[ y_{ij} \log \hat{y}^{k}_{ij} + (1 - y_{ij}) \log\left(1 - \hat{y}^{k}_{ij}\right) \right] \qquad (2)$$

where the mask branch generates a mask of dimension m x m for each RoI and each class, and $y_{ij}$ and
$\hat{y}^{k}_{ij}$ are the label of cell (i, j) in the true mask and the value predicted for class k, respectively.

2.2 Training phase

Mask R-CNN requires a large amount of annotated data for training to avoid overfitting. To
overcome the problem of limited annotated data in the remote sensing domain, we adopted transfer learning
by selecting the pre-trained network weights of the ResNet50 model, which was successfully trained on
the ImageNet dataset [36]. We utilized the pre-trained ResNet50 and fine-tuned the network weights on the
NWPU VHR-10 dataset. Due to limited memory, we fine-tuned the network in three stages. In the first stage,
we trained the head layers for 30 epochs with a learning rate of 0.1 while freezing the other layers. In the
second stage, the convolution layers (+5) and (+4) were trained for 30 epochs each, using learning rates of
0.01 and 0.001, respectively. Finally, the convolution layers (+3) were trained for 400 epochs with a learning
rate of 0.001. We used different augmentation methods, such as horizontal flip, vertical flip, image rotation,
and image translation, to enlarge the training data. This domain-specific fine-tuning allows good network
weights to be learned for a high-capacity CNN on the NWPU VHR-10 dataset.

2.3 Testing phase

The learned model is used directly to predict the class label, bounding box, and mask segment for each
image in the testing data. To evaluate the performance of the learned model, the predicted labels and
bounding boxes are matched with those in the dataset.

2.4 Optimization techniques

Neural network optimization plays an essential role in training deep neural networks. Generally,
there are two metrics to evaluate the efficiency of an optimizer: speed of convergence and generalization.
Stochastic gradient descent (SGD) [37] is commonly used for training deep neural networks. Compared
with SGD, adaptive optimization methods such as Adam [38], Adadelta [39], and RMSprop [40] perform well
in the initial stages of training but tend to generalize poorly. Inspired by the work of Keskar and Socher
[41], we introduced two hybrid training strategies: one starts with an adaptive method (Adam) and then switches
to SGD (SWATS), and the other does the reverse. We evaluated the performance of these hybrid approaches for
object detection in the remote sensing domain. We conducted several experiments to investigate the triggering
condition for switching between Adam and SGD. The triggering condition includes the number of epochs
and the value of the learning rate. The optimal triggering condition for object detection was a learning
rate of 0.001 or 400 training epochs.

… for multi-class in remote sensing images. This dataset was cropped from Google Earth and then manually
annotated by experts; it contains ten classes of objects, namely airplane, ship, storage tank, baseball
diamond, tennis court, basketball court, ground track field, harbor, bridge, and vehicle, as shown in Fig. 2.

In our work, the objects of each class in the NWPU VHR-10 dataset are divided into 70% for training and
30% for testing. Fig. 2 presents the statistics of the total number of objects in each class used for training
and testing. Overall, it can be seen that the 10 classes included in the NWPU VHR-10 dataset are not equally
distributed in terms of the number of images or objects.

Figure 2. Statistics of the total number of objects of each category used in training and testing in the NWPU VHR-10 dataset
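As a minimal sketch of the mask loss in Eq. (2) (Section 2.1), the pure-Python function below computes the loss for a single RoI. It assumes the mask branch has already produced one m x m grid of sigmoid probabilities per class, stored as nested lists; the names `mask_loss` and `total_loss` and the clipping constant `eps` are illustrative and do not come from the paper or from any particular Mask R-CNN library.

```python
import math
from typing import List

Mask = List[List[float]]


def mask_loss(y_true: Mask, y_pred_all: List[Mask], k: int, eps: float = 1e-7) -> float:
    """Average binary cross-entropy of Eq. (2) for one RoI.

    y_true     -- m x m ground-truth binary mask (entries 0 or 1)
    y_pred_all -- one m x m sigmoid-probability mask per class
    k          -- ground-truth class of the RoI; only mask k enters the loss
    """
    y_pred = y_pred_all[k]  # the predicted masks of all other classes are ignored
    m = len(y_true)
    total = 0.0
    for i in range(m):
        for j in range(m):
            # Clip probabilities so that log(0) is never evaluated.
            p = min(max(y_pred[i][j], eps), 1.0 - eps)
            y = y_true[i][j]
            total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / (m * m)


def total_loss(l_cls: float, l_bbox: float, l_mask: float) -> float:
    """Multi-task loss of Eq. (1): an unweighted sum of the three terms."""
    return l_cls + l_bbox + l_mask


if __name__ == "__main__":
    gt = [[1, 0], [0, 1]]
    preds = [
        [[0.9, 0.1], [0.1, 0.9]],  # class 0: confident and correct
        [[0.5, 0.5], [0.5, 0.5]],  # class 1: uninformative
    ]
    print(mask_loss(gt, preds, k=0))  # -log(0.9), about 0.105
    print(mask_loss(gt, preds, k=1))  # log(2), about 0.693
```

Selecting only mask k is what decouples mask prediction from classification: the mask branch is never penalized for the masks of classes the RoI does not belong to.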
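The staged fine-tuning described in Section 2.2 can be written down as an explicit schedule. The sketch below is a hypothetical, framework-agnostic representation: the stage labels ("heads", "+5", "+4", "+3") simply mirror the paper's notation, and `Stage`, `SCHEDULE`, and `stage_for_epoch` are illustrative names. How each layer group is actually frozen or unfrozen depends on the Mask R-CNN implementation, which is not specified here.

```python
from typing import List, NamedTuple, Tuple


class Stage(NamedTuple):
    layers: str           # hypothetical label for the trainable layer group
    epochs: int           # epochs spent in this stage
    learning_rate: float


# Stages as described in Section 2.2; labels follow the paper's (+5)/(+4)/(+3) notation.
SCHEDULE: List[Stage] = [
    Stage("heads", 30, 0.1),   # heads only, other layers frozen
    Stage("+5", 30, 0.01),
    Stage("+4", 30, 0.001),
    Stage("+3", 400, 0.001),
]


def total_epochs(schedule: List[Stage] = SCHEDULE) -> int:
    return sum(stage.epochs for stage in schedule)


def stage_for_epoch(epoch: int, schedule: List[Stage] = SCHEDULE) -> Tuple[str, float]:
    """Return (layers, learning_rate) for a 0-based global epoch index."""
    start = 0
    for stage in schedule:
        if epoch < start + stage.epochs:
            return stage.layers, stage.learning_rate
        start += stage.epochs
    raise ValueError(f"epoch {epoch} is beyond the {start}-epoch schedule")


if __name__ == "__main__":
    for epoch in (0, 30, 60, 90, total_epochs() - 1):
        print(epoch, stage_for_epoch(epoch))
```

Keeping the schedule as data rather than as hard-coded training calls makes the total budget (490 epochs under this reading) and the learning rate at any epoch easy to inspect.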
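For detection, geometric augmentation must transform the bounding boxes together with the pixels. The following pure-Python sketch shows horizontal flip, vertical flip, 90-degree clockwise rotation, and translation for an image stored as a list of rows, with boxes given as (x1, y1, x2, y2) pixel-edge coordinates. The function names, the zero fill value, and the restriction to 90-degree rotations are assumptions for illustration; the paper does not state its rotation angles or translation offsets.

```python
from typing import List, Tuple

Image = List[List[int]]
Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2); x2 and y2 are exclusive pixel edges


def hflip(img: Image, boxes: List[Box]) -> Tuple[Image, List[Box]]:
    w = len(img[0])
    return [row[::-1] for row in img], [(w - x2, y1, w - x1, y2) for x1, y1, x2, y2 in boxes]


def vflip(img: Image, boxes: List[Box]) -> Tuple[Image, List[Box]]:
    h = len(img)
    return img[::-1], [(x1, h - y2, x2, h - y1) for x1, y1, x2, y2 in boxes]


def rot90_cw(img: Image, boxes: List[Box]) -> Tuple[Image, List[Box]]:
    h = len(img)
    # Reverse the rows, then transpose: pixel (r, c) moves to (c, h - 1 - r).
    rotated = [list(col) for col in zip(*img[::-1])]
    return rotated, [(h - y2, x1, h - y1, x2) for x1, y1, x2, y2 in boxes]


def translate(img: Image, boxes: List[Box], dx: int, dy: int, fill: int = 0) -> Tuple[Image, List[Box]]:
    h, w = len(img), len(img[0])
    shifted = [
        [img[r - dy][c - dx] if 0 <= r - dy < h and 0 <= c - dx < w else fill for c in range(w)]
        for r in range(h)
    ]
    out = []
    for x1, y1, x2, y2 in boxes:
        # Shift, clip to the image, and drop boxes that leave the frame entirely.
        nx1, ny1 = max(0, x1 + dx), max(0, y1 + dy)
        nx2, ny2 = min(w, x2 + dx), min(h, y2 + dy)
        if nx2 > nx1 and ny2 > ny1:
            out.append((nx1, ny1, nx2, ny2))
    return shifted, out


if __name__ == "__main__":
    img = [[1, 2, 3], [4, 5, 6]]
    boxes = [(0, 0, 1, 1)]  # a box around the pixel holding 1
    for fn in (hflip, vflip, rot90_cw):
        print(fn.__name__, fn(img, boxes))
    print("translate", translate(img, boxes, 1, 1))
```

In each case the box still encloses the pixel holding the value 1, which is a quick sanity check when writing new transforms.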
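The four baseline optimizers compared in Section 2.4 differ mainly in how they scale the gradient. The sketch below implements their standard update rules in plain Python for a list of scalar parameters: SGD with optional momentum, RMSprop, Adam [38] with bias correction, and Adadelta [39], which needs no global learning rate. The default hyperparameters are common choices from those references and frameworks, not values reported in this paper, and the `minimize` helper on a toy quadratic is illustrative only.

```python
import math
from typing import List


class SGD:
    def __init__(self, lr: float = 0.01, momentum: float = 0.0):
        self.lr, self.momentum, self.v = lr, momentum, None

    def step(self, params: List[float], grads: List[float]) -> None:
        if self.v is None:
            self.v = [0.0] * len(params)
        for i, g in enumerate(grads):
            self.v[i] = self.momentum * self.v[i] + g
            params[i] -= self.lr * self.v[i]


class RMSprop:
    def __init__(self, lr: float = 0.001, rho: float = 0.9, eps: float = 1e-8):
        self.lr, self.rho, self.eps, self.s = lr, rho, eps, None

    def step(self, params: List[float], grads: List[float]) -> None:
        if self.s is None:
            self.s = [0.0] * len(params)
        for i, g in enumerate(grads):
            # Running average of squared gradients scales each step.
            self.s[i] = self.rho * self.s[i] + (1 - self.rho) * g * g
            params[i] -= self.lr * g / (math.sqrt(self.s[i]) + self.eps)


class Adam:
    def __init__(self, lr: float = 0.001, beta1: float = 0.9, beta2: float = 0.999, eps: float = 1e-8):
        self.lr, self.b1, self.b2, self.eps = lr, beta1, beta2, eps
        self.m = self.v = None
        self.t = 0

    def step(self, params: List[float], grads: List[float]) -> None:
        if self.m is None:
            self.m, self.v = [0.0] * len(params), [0.0] * len(params)
        self.t += 1
        for i, g in enumerate(grads):
            self.m[i] = self.b1 * self.m[i] + (1 - self.b1) * g      # first moment
            self.v[i] = self.b2 * self.v[i] + (1 - self.b2) * g * g  # second moment
            m_hat = self.m[i] / (1 - self.b1 ** self.t)              # bias correction
            v_hat = self.v[i] / (1 - self.b2 ** self.t)
            params[i] -= self.lr * m_hat / (math.sqrt(v_hat) + self.eps)


class Adadelta:
    def __init__(self, rho: float = 0.95, eps: float = 1e-6):
        self.rho, self.eps = rho, eps
        self.eg2 = self.edx2 = None

    def step(self, params: List[float], grads: List[float]) -> None:
        if self.eg2 is None:
            self.eg2, self.edx2 = [0.0] * len(params), [0.0] * len(params)
        for i, g in enumerate(grads):
            self.eg2[i] = self.rho * self.eg2[i] + (1 - self.rho) * g * g
            # Ratio of RMS(previous updates) to RMS(gradients) replaces the learning rate.
            dx = -math.sqrt(self.edx2[i] + self.eps) / math.sqrt(self.eg2[i] + self.eps) * g
            self.edx2[i] = self.rho * self.edx2[i] + (1 - self.rho) * dx * dx
            params[i] += dx


def minimize(opt, steps: int, x0: float = 0.0) -> float:
    """Run `steps` updates on f(x) = (x - 3)^2, whose gradient is 2(x - 3)."""
    x = [x0]
    for _ in range(steps):
        opt.step(x, [2 * (x[0] - 3.0)])
    return x[0]


if __name__ == "__main__":
    for opt in (SGD(lr=0.1), RMSprop(lr=0.01), Adam(lr=0.1), Adadelta()):
        print(type(opt).__name__, minimize(opt, 200))
```

Note how Adam's first step has magnitude close to the learning rate regardless of the gradient scale, whereas Adadelta starts with very small steps; such differences in early behaviour are what the hybrid strategies try to exploit.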
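The triggering rule for the hybrid strategies can be expressed as a small controller that decides, at the start of each epoch, which optimizer to use. The sketch below encodes the condition reported in Section 2.4 (400 epochs reached or a learning rate of 0.001); the class name `HybridSwitch`, the `<=` comparison on the learning rate, and the choice never to switch back within a run are assumptions. This epoch and learning-rate trigger is simpler than the criterion of Keskar and Socher [41], who switch once an estimate of the equivalent SGD learning rate stabilizes.

```python
from dataclasses import dataclass


@dataclass
class HybridSwitch:
    first: str = "adam"        # optimizer used before the trigger fires
    second: str = "sgd"        # optimizer used afterwards
    switch_epoch: int = 400    # trigger 1: number of epochs reached
    switch_lr: float = 0.001   # trigger 2: learning rate has fallen to this value
    switched: bool = False

    def optimizer_for(self, epoch: int, lr: float) -> str:
        """Return the name of the optimizer to use for this epoch."""
        if not self.switched and (epoch >= self.switch_epoch or lr <= self.switch_lr):
            self.switched = True  # the switch is one-way within a run
        return self.second if self.switched else self.first


if __name__ == "__main__":
    adam_sgd = HybridSwitch()                            # Adam_SGD
    sgd_adam = HybridSwitch(first="sgd", second="adam")  # SGD_Adam
    for epoch, lr in [(0, 0.1), (30, 0.01), (60, 0.001), (90, 0.001)]:
        print(epoch, lr, adam_sgd.optimizer_for(epoch, lr), sgd_adam.optimizer_for(epoch, lr))
```

Separating the decision from the optimizers themselves lets the same rule drive either hybrid direction simply by swapping `first` and `second`.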
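The class-level 70%/30% division described above can be sketched as a stratified split: each class is shuffled and divided separately, so rare classes keep the same ratio as common ones. The function below is illustrative; it splits a list of per-item class labels (whether an item is an object or an image is left open, as is the random seed) and uses integer arithmetic, so 70% of a class size is rounded down.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def split_per_class(labels: Sequence[str], train_percent: int = 70,
                    seed: int = 0) -> Tuple[List[int], List[int]]:
    """Return (train_indices, test_indices), splitting every class separately."""
    by_class: Dict[str, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)  # fixed seed for a reproducible split
    train: List[int] = []
    test: List[int] = []
    for label in sorted(by_class):
        idxs = by_class[label][:]
        rng.shuffle(idxs)
        n_train = len(idxs) * train_percent // 100  # floor of 70% of this class
        train.extend(idxs[:n_train])
        test.extend(idxs[n_train:])
    return sorted(train), sorted(test)


if __name__ == "__main__":
    labels = ["ship"] * 10 + ["bridge"] * 3
    train, test = split_per_class(labels)
    print(len(train), len(test))  # 9 4 (7 + 2 for training, 3 + 1 for testing)
```

Because the split is done per class, the imbalance visible in Fig. 2 is preserved in both the training and the testing sets rather than being distorted by chance.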
Table 1. Performance of YOLO, Faster R-CNN, SSD, R-FCN, and the proposed method on the NWPU VHR-10 dataset in terms of AP (%)
and average running time per image (s)
Class | FRCN [13] | YOLO1 [14] | YOLO2 [15] | SSD [16] | R-FCN [17] | Proposed method
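Both the IoU-based comparisons and the recall curves of Fig. 3 rest on the intersection over union between a predicted and a ground-truth box. The sketch below computes IoU for axis-aligned (x1, y1, x2, y2) boxes and a simplified recall-versus-threshold curve in which a ground-truth box counts as recalled if its best-overlapping detection reaches the threshold. This ignores one-to-one matching and class labels, so it is an illustration of the idea rather than the exact protocol behind the reported figures.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) with x2 > x1 and y2 > y1


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def recall_at_thresholds(gts: Sequence[Box], dets: Sequence[Box],
                         thresholds: Sequence[float]) -> List[float]:
    """Fraction of ground-truth boxes whose best detection reaches each IoU threshold."""
    if not gts:
        raise ValueError("recall is undefined without ground-truth boxes")
    best = [max((iou(g, d) for d in dets), default=0.0) for g in gts]
    return [sum(b >= t for b in best) / len(gts) for t in thresholds]


if __name__ == "__main__":
    gts = [(0, 0, 2, 2), (10, 10, 12, 12)]
    dets = [(0, 0, 2, 2), (10, 10, 12, 13)]  # second detection is one pixel too tall
    print(recall_at_thresholds(gts, dets, [0.5, 0.7, 0.9]))  # [1.0, 0.5, 0.5]
```

The example shows why recall curves decline as the IoU threshold rises: a detection that is only slightly misaligned (IoU of 2/3 here) is counted at 0.5 but rejected at 0.7.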
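AP, as reported in Table 1, is the area under the precision-recall curve and can be computed from a ranked list of detections. The sketch below uses the all-point interpolation popularized by PASCAL VOC, in which precision is first made non-increasing in recall. Whether the reported AP values used this or another interpolation scheme is not stated, so treat this as one reasonable definition; matching detections to ground truth (for example with an IoU threshold) is assumed to have been done already.

```python
from typing import Sequence


def average_precision(scores: Sequence[float], is_tp: Sequence[bool], num_gt: int) -> float:
    """All-point interpolated AP for one class.

    scores -- confidence of each detection
    is_tp  -- whether each detection was matched to a ground-truth box
    num_gt -- number of ground-truth boxes of the class
    """
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = fp = 0
    recalls, precisions = [], []
    for i in order:  # walk down the ranked list
        if is_tp[i]:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_gt)
        precisions.append(tp / (tp + fp))
    # Precision envelope: make precision non-increasing as recall grows.
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])
    # Sum rectangle areas under the envelope wherever recall increases.
    ap, prev_recall = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


if __name__ == "__main__":
    # Three ranked detections, two ground-truth objects; the second detection is a false positive.
    print(average_precision([0.9, 0.8, 0.7], [True, False, True], num_gt=2))  # about 0.833
```

A single false positive ranked between two true positives lowers AP from 1.0 to 5/6 here, which illustrates why confident false alarms on cluttered backgrounds are costly.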
Second, we conducted several experiments to evaluate six optimization methods in the remote sensing
object detection task. Four optimization techniques (Adam, Adadelta, RMSprop, and SGD) and two hybrid
techniques (Adam_SGD and SGD_Adam) were tested in terms of IoU. The recall rates of these optimization
techniques under different IoU thresholds are plotted in Fig. 3.

Figure 3. Recall vs. IoU overlap ratio on the NWPU VHR-10 dataset for the airplane, ship, storage tank, baseball diamond,
tennis court, basketball court, ground track field, harbor, bridge, and vehicle classes, respectively

Figure 4. Precision and recall on the NWPU VHR-10 dataset for the airplane, ship, storage tank, baseball diamond, tennis
court, basketball court, ground track field, harbor, bridge, and vehicle classes, respectively

It can be observed that: (1) The recall curves decline as the IoU threshold increases. In detail, the recall of the
Adadelta and SGD optimizers decreases more quickly than that of the other techniques, which demonstrates their
limited performance in object detection in the remote sensing domain. (2) For object classes such as basketball
court, ground track field, and harbor, the recall of the different optimization techniques is higher than for the
other object classes, because small objects with complex backgrounds are harder to detect. (3) The hybrid Adam_SGD
optimization technique shows superb performance for the airplane and baseball diamond categories, whereas
its results for the other eight object categories vary. This is because these two classes have relatively more
training samples and larger object sizes. The AP metric measures the area under the precision-recall curve (PRC):
the higher the AP value, the better the performance. The average precision (AP) values of the Adam, SGD, RMSprop,
Adadelta, hybrid SGD_Adam, and hybrid Adam_SGD optimization techniques were 90.8%, 87.7%, 87.3%, 48.6%, 91.2%,
and 95%, respectively. First, the proposed adaptive Mask R-CNN outperformed other deep learning methods and
achieved the highest accuracy in terms of IoU and PRC by switching between optimizers with SWATS (switching
from Adam to SGD) in the training phase, whereas the other methods use the default optimizer (SGD). Second, SWATS
achieved high accuracy while reducing the computation time and cost. In future work, we intend to implement an
ensemble of heterogeneous object detection approaches and to incorporate a multi-GPU configuration to further
reduce the computation time.

Table 2. Performance of six optimization techniques in terms of AP (%) and average running time per image

References

[1] G. Cheng and J. Han, “A survey on object detection in optical remote sensing images”, ISPRS Journal of Photogrammetry and Remote Sensing, Vol.117, pp. 11-28, 2016.
[2] T.R. Martha, N. Kerle, C.J. Westen, V. Jetten, and K.V. Kumar, “Segment optimization and data-driven thresholding for knowledge-based landslide detection by object-based image analysis”, IEEE Transactions on Geoscience and Remote Sensing, Vol.49, No.12, pp. 4928-4943, 2011.
[3] D. Chaudhuri, N.K. Kushwaha, and A. Samal, “Semi-automated road detection from high resolution satellite images by directional morphological enhancement and segmentation techniques”, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, Vol.5, No.5, pp. 1538-1544, 2012.
[4] A.O. Ok, “Automated detection of buildings from single VHR multispectral images using shadow information and graph cuts”, ISPRS Journal of Photogrammetry and Remote Sensing, Vol.86, pp. 21-40, 2013.
[5] T. Blaschke, G.J. Hay, M. Kelly, S. Lang, P. Hofmann, E. Addink, R.Q. Feitosa, F.V. Meer, H.V. Werff, F.V. Coillie, and D. Tiede, “Geographic object-based image analysis – towards a new paradigm”, ISPRS Journal of Photogrammetry and Remote Sensing, Vol.87, pp. 180-191, 2014.
[6] Y. Li, S. Wang, Q. Tian, and X. Ding, “Feature representation for statistical-learning-based object detection: A review”, Pattern Recognition, Vol.48, No.11, pp. 3542-3559, 2015.
[7] G. Cheng, J. Han, L. Guo, Z. Liu, S. Bu, and J. Ren, “Effective and efficient midlevel visual elements-oriented land-use classification using VHR remote sensing images”, IEEE Transactions on Geoscience and Remote Sensing, Vol.53, No.8, pp. 4238-4249, 2015.
[8] D. Zhang, J. Han, G. Cheng, Z. Liu, S. Bu, and L. Guo, “Weakly supervised learning for target detection in remote sensing images”, IEEE Geoscience and Remote Sensing Letters, Vol.12, No.4, pp. 701-705, 2015.
[9] N. Yokoya and A. Iwasaki, “Object detection based on sparse representation and Hough voting for optical remote sensing imagery”, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, Vol.8, No.5, pp. 2053-2062, 2015.
[10] G. Mountrakis, J. Im, and C. Ogole, “Support vector machines in remote sensing: A review”, ISPRS Journal of Photogrammetry and Remote Sensing, Vol.66, No.3, pp. 247-259, 2011.
[11] Z. Shi, X. Yu, Z. Jiang, and B. Li, “Ship detection in high-resolution optical imagery based on anomaly detector and local shape feature”, IEEE Transactions on Geoscience and Remote Sensing, Vol.52, No.8, pp. 4511-4523, 2014.
[12] J. Tang, C. Deng, G.B. Huang, and B. Zhao, “Compressed-domain ship detection on spaceborne optical image using deep neural network and extreme learning machine”, IEEE Transactions on Geoscience and Remote Sensing, Vol.53, No.3, pp. 1174-1185, 2015.
[13] S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: Towards real-time object detection with region proposal networks”, Advances in Neural Information Processing Systems, pp. 91-99, 2015.
[14] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, “You only look once: Unified, real-time object detection”, In: Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 779-788, 2016.
[15] J. Redmon and A. Farhadi, “YOLO9000: Better, faster, stronger”, In: Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7263-7271, 2017.
[16] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.Y. Fu, and A.C. Berg, “SSD: Single shot multibox detector”, In: Proc. of the European Conference on Computer Vision, pp. 21-37, 2016.
[17] J. Dai, Y. Li, K. He, and J. Sun, “R-FCN: Object detection via region-based fully convolutional networks”, Advances in Neural Information Processing Systems, pp. 379-387, 2016.
[18] R. Girshick, J. Donahue, T. Darrell, and J. Malik, “Rich feature hierarchies for accurate object detection and semantic segmentation”, In: Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 580-587, 2014.
[19] K. He, X. Zhang, S. Ren, and J. Sun, “Spatial pyramid pooling in deep convolutional networks for visual recognition”, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol.37, No.9, pp. 1904-1916, 2015.
[20] R. Girshick, “Fast R-CNN”, In: Proc. of the IEEE International Conference on Computer Vision, pp. 1440-1448, 2015.
[21] M.M.U. Rathore, A. Paul, A. Ahmad, B.W. Chen, B. Huang, and W. Ji, “Real-time big data analytical architecture for remote sensing application”, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, Vol.8, No.10, pp. 4610-4621, 2015.
[22] B. Pan, J. Tai, Q. Zheng, and S. Zhao, “Cascade convolutional neural network based on transfer-learning for aircraft detection on high-resolution remote sensing images”, Journal of Sensors, Vol.2017, 2017.
[23] Z. Deng, L. Lei, H. Sun, H. Zou, S. Zhou, and J. Zhao, “An enhanced deep convolutional neural network for densely packed objects detection in remote sensing images”, In: Proc. of the International Workshop on Remote Sensing with Intelligent Processing, pp. 1-4, 2017.
[24] T. Wang and Y. Gu, “CNN based renormalization method for ship detection in VHR remote sensing images”, In: Proc. of IGARSS IEEE International Geoscience and Remote Sensing Symposium, pp. 1252-1255, 2018.
[25] S. Nie, Z. Jiang, H. Zhang, B. Cai, and Y. Yao, “Inshore ship detection based on Mask R-CNN”, In: Proc. of IGARSS IEEE International Geoscience and Remote Sensing Symposium, pp. 693-696, 2018.
[26] Y. Long, Y. Gong, Z. Xiao, and Q. Liu, “Accurate object localization in remote sensing images based on convolutional neural networks”, IEEE Transactions on Geoscience and Remote Sensing, Vol.55, No.5, pp. 2486-2498, 2017.
[27] K. Li, G. Cheng, S. Bu, and X. You, “Rotation-insensitive and context-augmented object detection in remote sensing images”, IEEE Transactions on Geoscience and Remote Sensing, Vol.56, No.4, pp. 2337-2348, 2017.
[28] L. Cheng, X. Liu, L. Li, L. Jiao, and X. Tang, “Deep adaptive proposal network for object detection in optical remote sensing images”, arXiv preprint arXiv:1807.07327, 2018.
[29] N. Ammour, H. Alhichri, Y. Bazi, B. Benjdira, N. Alajlan, and M. Zuair, “Deep learning approach for car detection in UAV imagery”, Remote Sensing, Vol.9, No.4, pp. 31, 2017.
[30] X. Yang, H. Sun, K. Fu, J. Yang, X. Sun, M. Yan, and Z. Guo, “Automatic ship detection in remote sensing images from Google Earth of complex scenes based on multiscale rotation dense feature pyramid networks”, Remote Sensing, Vol.10, No.1, pp. 132, 2018.
[31] G. Cheng, P. Zhou, and J. Han, “Learning rotation-invariant convolutional neural networks for object detection in VHR optical remote sensing images”, IEEE Transactions on Geoscience and Remote Sensing, Vol.54, No.12, pp. 7405-7415, 2016.
[32] Z. Deng, H. Sun, S. Zhou, J. Zhao, L. Lei, and H. Zou, “Multi-scale object detection in remote sensing imagery with convolutional neural networks”, ISPRS Journal of Photogrammetry and Remote Sensing, Vol.145, pp. 3-22, 2018.
[33] K. Zhao, J. Kang, J. Jung, and G. Sohn, “Building extraction from satellite images using Mask R-CNN with building boundary regularization”, In: Proc. of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 247-251, 2018.
[34] J. Gu, Z. Wang, J. Kuen, L. Ma, A. Shahroudy, B. Shuai, T. Liu, X. Wang, G. Wang, J. Cai, and T. Chen, “Recent advances in convolutional neural networks”, Pattern Recognition, Vol.77, pp. 354-377, 2018.
[35] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition”, In: Proc. of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770-778, 2016.
[36] J. Wang, C. Luo, H. Huang, H. Zhao, and S. Wang, “Transferring pre-trained deep CNNs for remote scene classification with general features learned from linear PCA network”, Remote Sensing, Vol.9, No.3, pp. 225, 2017.
[37] H. Robbins and S. Monro, “A stochastic approximation method”, The Annals of Mathematical Statistics, pp. 400-407, 1951.
[38] D.P. Kingma and J. Ba, “Adam: A method for stochastic optimization”, arXiv preprint arXiv:1412.6980, 2014.
[39] M.D. Zeiler, “ADADELTA: An adaptive learning rate method”, arXiv preprint arXiv:1212.5701, 2012.
[40] T. Tieleman and G. Hinton, “Divide the gradient by a running average of its recent magnitude”, Coursera: Neural Networks for Machine Learning, Technical Report. Available online: https://zh.coursera.org/learn/neuralnetworks/lecture/YQHki/rmsprop-divide-the-gradient-by-a-running-average-of-its-recent-magnitude (Accessed on 21 April 2017).
[41] N.S. Keskar and R. Socher, “Improving generalization performance by switching from Adam to SGD”, arXiv preprint arXiv:1712.07628, 2017.
[42] K. Oksuz, B.C. Cam, E. Akbas, and S. Kalkan, “Localization recall precision (LRP): A new performance metric for object detection”, In: Proc. of the European Conference on Computer Vision (ECCV), pp. 504-519, 2018.
[43] C. Szegedy, S. Ioffe, V. Vanhoucke, and A.A. Alemi, “Inception-v4, Inception-ResNet and the impact of residual connections on learning”, In: Proc. of the Thirty-First AAAI Conference on Artificial Intelligence, 2017.