1 s2.0 S0925753520302095 Main
Safety Science
journal homepage: www.elsevier.com/locate/safety
ARTICLE INFO

Keywords: Autonomous ship; Small ship detection; Wasserstein generative adversarial network; Convolutional neural network; Ship safety

ABSTRACT

Small ship detection is an important topic in autonomous ship technology and plays an essential role in shipping safety. Since traditional object detection techniques based on the shipborne radar are not qualified for the task of near and small ship detection, deep learning-based image recognition methods based on video surveillance systems can be naturally utilized on autonomous vessels to effectively detect near and small ships. However, a limited number of real-world samples of small ships may fail to train a learning method that can accurately detect small ships in most cases. To address this, a novel hybrid deep learning method that combines a modified Generative Adversarial Network (GAN) and a Convolutional Neural Network (CNN)-based detection approach is proposed for small ship detection. Specifically, a Gaussian Mixture Wasserstein GAN with Gradient Penalty is utilized to first directly generate sufficient informative artificial samples of small ships based on the zero-sum game between a generator and a discriminator, and then an improved CNN-based real-time detection method is trained on both the original and the generated data for accurate small ship detection. Experimental results show that the proposed deep learning method (a) is competent to generate sufficient informative small ship samples and (b) can obtain significantly improved and robust results of small ship detection. The results also indicate that the proposed method can be effectively applied to ensuring autonomous ship safety.
⁎
Corresponding authors.
E-mail addresses: [email protected] (Z. Chen), [email protected] (D. Chen), [email protected] (Y. Zhang), [email protected] (X. Cheng),
mingyang.0.zhang@aalto.fi (M. Zhang), [email protected] (C. Wu).
https://doi.org/10.1016/j.ssci.2020.104812
Received 1 January 2020; Received in revised form 28 March 2020; Accepted 6 May 2020
Available online 18 June 2020
0925-7535/ © 2020 Elsevier Ltd. All rights reserved.
Z. Chen, et al. Safety Science 130 (2020) 104812
the smart and autonomous ships that work in complex voyage environments like inland rivers (e.g., the unmanned patrol speedboat launched by the Hefei public security bureau for drowning prevention on the 'Tiane' lake in Chaohu, Anhui, China¹) or near-shore ocean environments, efficiently detecting near and small ships around their channels is an essential task for keeping the voyage mission safe.

¹ http://language.chinadaily.com.cn/2017-07/10/content_30055692.htm

Surveillance video-based ship detection methods have thus been developed to address the aforementioned problem. Tran and Le (2016) propose a vision-based method to detect the outline of ships via maritime surveillance videos. This method can effectively detect ships in both maritime and non-maritime backgrounds. Yao et al. (2017) propose a hybrid ship detection method that integrates deep learning methods. Specifically, they utilize Deep Neural Networks (DNNs) and Region Proposal Networks (RPNs) to obtain a 2D bounding box of target ships. Similarly, Zhao et al. (2019) also propose a two-stage neural network for ship detection and recognition. Wijnhoven et al. (2010) present an object detection system based on the Histogram of Oriented Gradients (HOG) (Dalal and Triggs, 2005) for finding ships in maritime video. Matsumoto (2013) proposes a HOG-SVM (Support Vector Machine) method to detect ships in images from a ship-mounted camera. However, most of the above methods cannot feasibly be applied by autonomous ships (e.g., the methods proposed in (Zhao et al., 2019) and (Wijnhoven et al., 2010) are based on static cameras for port management and thus do not match the shipborne surveillance systems on moving autonomous ships). Furthermore, those methods are effective mainly in detecting medium or large-sized ships, while the task of accurately recognizing small or tiny ships in inland rivers still lacks an effective solution. For autonomous ships that sail on inland rivers or in near-shore ocean environments, small moving obstacles like small ships need to be paid extraordinary attention to ensure voyage safety. Unfortunately, it is hard to obtain sufficient training samples of small ship images (including different shapes and types of small ships) due to the collection and calibration difficulties, which is a fatal shortcoming for traditional machine learning methods for small ship detection.

The significant development of deep learning techniques in recent years provides great opportunities for proposing an effective autonomous ship-oriented small ship detection method. Deep learning, particularly deep neural networks, achieves increasingly strong performance in object detection tasks (Girshick et al., 2014; Chen et al., 2019; Ren et al., 2015; Redmon et al., 2016; Liu et al., 2016; Redmon and Farhadi, 2017). Regions with CNN (R-CNN) proposed by Girshick et al. (2014) and its variants like Fast R-CNN (Chen et al., 2019) and Faster R-CNN (Ren et al., 2015) are the typical deep learning-based object detection methods. For any input image, they generate multiple candidate windows that may contain objects by exploratory selection of sliding target windows. These windows are then identified, and the redundant windows (i.e., windows without target objects) are finally removed. The YOLO ("You Only Look Once") method proposed by Redmon et al. (2016) and its variants (e.g., YOLO v2 (Redmon and Farhadi, 2017)) are recent representative CNN-based real-time detection methods. The core idea of YOLO is integrating positioning (i.e., generating windows to overlap objects) and classification (i.e., detecting whether objects are in the windows) by using a regression approach. Currently, SSD (Single Shot multibox Detector) (Liu et al., 2016) and YOLO v2 (Redmon and Farhadi, 2017) generally perform best in both accuracy and speed in the majority of object detection tasks. More importantly, a recently developed deep learning method based on game theory, called Generative Adversarial Networks (GANs) (Goodfellow et al., 2014), can successfully generate informative and vivid images given a certain number of ground truth images as the training samples, based on the zero-sum game between a generator neural network and a discriminator neural network.

Inspired by this, a novel small ship detection method is proposed mainly based on GAN and YOLO v2. Specifically, a modified Wasserstein Generative Adversarial Network with Gradient Penalty (WGAN-GP) is first utilized in this study to generate sufficient informative small ship images as the additional training samples for training data enhancement, and an improved YOLO v2 algorithm is then applied to complete the task of small ship detection based on the augmented training samples (comprising original and generated samples).

The remainder of the paper is organized as follows. Section 2 mainly introduces the modified WGAN-GP. Section 3 mainly introduces the improved YOLO v2 algorithm. Section 4 provides the framework and the pseudo-code of the proposed method. Extensive experiments are conducted in Section 5 to validate the effectiveness of the proposed method. Conclusions and future work are presented in Section 6.

2. A modified generative adversarial network for data enhancement

2.1. Generative adversarial network

GAN is a generative model based on zero-sum game theory (Goodfellow et al., 2014). It aims to fit the distribution of the sample set and then to output high-quality generated samples. GAN is composed of a generator (G) and a discriminator (D). The generator first generates samples from random noise (z) and then feeds the generated samples together with the real samples to the discriminator, training the discriminator's ability to identify the fake (i.e., generated) samples. The discriminator in turn feeds back its identification result of the real samples and the generated samples to the generator, thereby training the generator to simulate the real samples. Through multiple iterations of this process, the samples generated by the generator get much closer to the real ones, and the discriminator's judgment becomes nearly the same for the real and the generated samples. Ideally, the generator and the discriminator finally reach a dynamic balance, at which point the generator can generate samples that the discriminator cannot identify as real or fake. The core objective function of GAN can be expressed as the following minimax formulation:

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))] \tag{1}$$

The loss function shown in Eq. (1) is composed of two parts: the expectations of the logarithmic distribution functions of the generator and the discriminator, where x represents the real sample, z is the random noise and is also the input of the generator, G(z) denotes the generated samples from the generator, p represents the distribution of data, and E represents the expected value over the distribution of samples. The purpose of the generator is to make D(G(z)) as large as possible, i.e., to make V(D, G) as small as possible. In contrast, the purpose of the discriminator is to make D(x) as large as possible and D(G(z)) as small as possible, i.e., to make V(D, G) as large as possible. The structure of GAN is shown in Fig. 1.

[Fig. 1: structure of GAN. Random noise is fed to the Generator (G), which generates samples; the Discriminator (D) receives the real samples and the generated samples and outputs the loss used for the update.]

2.2. Wasserstein GAN with the gradient penalty term

Although GAN has pioneered sample generation, it has a serious deficit: the generator gradient tends to vanish when the discriminator is trained well. This may cause a deadlock in network training and result in a lack of diversity or distortion in generated samples. To this end, Wasserstein GAN (WGAN) (Arjovsky et al., 2017) refines GAN by using the Wasserstein distance (i.e., the Earth-Mover distance (Chen et al., 2019)) instead of the JS (Jensen-Shannon) divergence to measure
the distance between the distributions of the real and the generated samples. The advantage of the Wasserstein (EM) distance is its ability to measure two distributions even when there is no overlap between them, making GAN able to tackle the problems of insufficient training samples and lack of diversity caused by collapse (Arjovsky et al., 2017).

Compared with GAN, Wasserstein GAN has the following characteristics: (a) The sigmoid function of the last layer of the discriminator is discarded. Recall that the discriminator in GAN uses the sigmoid function to achieve the real-or-fake binary classification. In contrast, the task of the discriminator of WGAN, i.e., approximately fitting the Wasserstein distance, is a regression task, and hence the sigmoid function in the last layer is no longer required; (b) A threshold is utilized as the upper bound of the absolute values of the parameters of the discriminator, and the loss functions of both the generator and the discriminator are not logarithmic; (c) WGAN chooses the Root Mean Square Prop (RMSProp) or the stochastic gradient descent (SGD) algorithm instead of a momentum-based optimization algorithm as its optimization method, and sets a lower learning rate for the optimization process. The objective function of WGAN is formulated as:

$$\min_G \max_{D \in D^*} V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}[D(x)] - \mathbb{E}_{z \sim p_z(z)}[D(G(z))] \tag{2}$$

where D* is the set of Lipschitz-continuous functions (1-Lipschitz in the standard formulation), which requires a constant k such that the L_p norm of the gradient of the discriminator D(x) is not bigger than k. This can be achieved by:

$$\|\nabla_x D(x)\|_p \leq k \tag{3}$$

where D* is then k-Lipschitz, ensuring that the updated parameters of the discriminator do not exceed the threshold. The Wasserstein distance can thus be fitted approximately by using such a modified discriminator.

However, the neural network of the discriminator of WGAN is similar to a binary network, and the parameters are easily polarized (i.e., close to the boundary). To this end, the Lipschitz constraint is embedded in WGAN by additionally setting a gradient penalty (GP) term (where k is set to 1), and the discriminator loss function of WGAN-GP is obtained by adding this weighted penalty to the original WGAN discriminator loss:

$$L_{WGAN\text{-}GP}(D) = \mathbb{E}_{z \sim p_z(z)}[D(G(z))] - \mathbb{E}_{x \sim p_{data}(x)}[D(x)] + \lambda \, \mathbb{E}_{t \sim p_t(t)}\left[(\|\nabla_t D(t)\|_p - 1)^2\right] \tag{4}$$

where t is a random interpolated sample formed as the convex combination of the real sample x and the generated sample G(z):

$$t = \varepsilon x + (1 - \varepsilon) G(z), \quad \varepsilon \sim \mathrm{Uniform}[0, 1] \tag{5}$$

Therefore, the overall structure of WGAN-GP consists of three parts: the generator, the discriminator, and the gradient penalty network. Specifically, the generator is a convolutional neural network with 8 convolutional layers, where the convolution kernel size of each convolutional layer is 3 × 3 pixels. The first seven hidden layers of the generator have 32 filters, and the last layer uses a 3 × 3 filter to generate a feature map; the activation function is ReLU (Rectified Linear Units). The discriminator has 6 convolutional layers, where the first two convolutional layers have 64 filters, the middle two convolutional layers have 128 filters, and the last two convolutional layers have 256 filters. As in the generator, the convolution kernel size of each convolutional layer is 3 × 3 pixels. After the convolutional layers, there are two fully connected layers: the first has 1024 outputs and the other has only one output. The main structure of the gradient penalty network is the Visual Geometry Group (VGG) network, specifically the VGG with 19 hidden layers (VGG-19) (Simonyan and Zisserman, 2014), from which the gradient penalty loss function is obtained.

3. An improved YOLO v2 algorithm for small ship detection

For the task of target detection, most of the feature extraction networks are constructed based on VGG-16 (Simonyan and Zisserman, 2014), because VGG-16 usually performs well on feature extraction and classification. However, the complexity of VGG-16 makes it very time-consuming, so it is not suited to tasks where real-time performance plays an essential role. To address this, we apply the YOLO v2 algorithm based on the Darknet-19 network (Redmon and Farhadi, 2017) to extract features in this study. As a basic network for feature extraction, Darknet-19 consists of 19 convolutional layers and 5 pooling layers. Referring to the structure of VGG-16, the convolution kernel size of Darknet-19 is only 3 × 3 pixels. The feature parameters are also dimensionally reduced by means of global average pooling, and the training process is made more stable and efficient by batch normalization (Deng et al., 2014).

The YOLO v2 algorithm partially adjusts the Darknet-19 network structure and obtains the fine-grained features of the targets by adding a pass-through layer to the detection network. It discards YOLO's method of predicting bounding box coordinates using the fully connected layer, and instead draws on the idea of the anchor boxes of Faster R-CNN, i.e., it uses the k-means method to cluster the target candidate frames and finally obtains the size and quantity of the adaptive anchor boxes. YOLO v2 still inherits the loss function from YOLO. A weighted method is used to balance the influence of the positioning error and the classification error on the stability of the model, to obtain relatively high overall performance.

The loss function formula of YOLO v2 is:

$$
\begin{aligned}
Loss ={}& \lambda_{coord} \sum_{i=0}^{s^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 \right] \\
&+ \lambda_{coord} \sum_{i=0}^{s^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} \left[ \left(\sqrt{w_i} - \sqrt{\hat{w}_i}\right)^2 + \left(\sqrt{h_i} - \sqrt{\hat{h}_i}\right)^2 \right] \\
&+ (1 + \lambda_{noobj}) \sum_{i=0}^{s^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} (C_i - \hat{C}_i)^2 + \sum_{i=0}^{s^2} \mathbb{1}_{i}^{obj} \sum_{c \in classes} (p_i(c) - \hat{p}_i(c))^2
\end{aligned} \tag{6}
$$
where $\lambda_{coord}$ and $\lambda_{noobj}$ represent the weighting coefficients of the positioning prediction error and the classification error, respectively, s denotes the number of rows and columns of the input image region division, B denotes the number of bounding boxes of the divided region cells, and $\mathbb{1}_{ij}^{obj}$ is used to determine whether the j-th bounding box of the i-th cell is the target. $x_i$ and $y_i$ represent the center position coordinates of the bounding box of the current grid, $w_i$ and $h_i$ denote the width and height of the bounding box, $\hat{x}_i$ and $\hat{y}_i$ represent the central coordinate values of the real labeling boxes of each target, and $\hat{w}_i$ and $\hat{h}_i$ are the width and height of the labeling boxes. $C_i$ and $\hat{C}_i$ are, respectively, the Intersection over Union (IOU) value of the real bounding box and the detection box, and the confidence of the target in the bounding box. $p_i(c)$ and $\hat{p}_i(c)$ represent the predicted and true probabilities of a certain type of target in the box, respectively.

Since YOLO v2 does not contain fully connected layers, it can dynamically adjust the size of the input image. The size of the input image can be fine-tuned during the training process, giving YOLO v2 good scale invariance. According to previous studies (Chen et al., 2019; Simonyan and Zisserman, 2014; Deng et al., 2014), the performance of current mainstream target detection frameworks is compared via two metrics, namely, mean Average Precision (mAP) and Frames per Second (FPS).

4. The proposed method

The Gaussian mixture model (GMM) uses m normal distributions to describe the overall diversity of the sample, which can accurately reflect the diversity characteristics of the sample while maintaining the similarity of features with the original sample. Therefore, GMM can be integrated into WGAN-GP to build a new Gaussian mixture WGAN-GP (GMWGAN-GP) model. The probability density function of each component of the Gaussian mixture model is:

$$N(x; \mu_i, \Sigma_i) = \frac{1}{(2\pi)^{n/2} |\Sigma_i|^{1/2}} \exp\left(-\frac{1}{2}(x - \mu_i)^T \Sigma_i^{-1} (x - \mu_i)\right) \tag{7}$$

The reparameterization trick can be used to generate a one-dimensional random noise vector z obeying the prior distribution, as shown below:

$$z = \mu_i + \sigma_i \delta \tag{8}$$

In Eq. (8), $\mu_i$ and $\sigma_i$ are the mean and standard deviation of the i-th Gaussian component, respectively, and $\delta \sim N(0, 1)$. Therefore, the modified generator loss function can be written as:

$$\min_G V_G(D, G) = \min_G \mathbb{E}_{z \sim p_z}[\log(1 - D(G(\mu_i + \sigma_i \delta)))] + \eta \sum_{i=1}^{N} \frac{(1 - \sigma_i)^2}{N} \tag{9}$$

YOLO v2 changes the candidate box selection strategy from the traditional manually customized multi-scale candidate frame selection to the k-means clustering approach. However, k-means may encounter the problems of irregular object recognition and hyperparameter selection. A density-based clustering algorithm, called Density-Based Spatial Clustering of Applications with Noise (DBSCAN), is used instead of k-means in YOLO v2 in the proposed method. DBSCAN can solve the problems of irregular object recognition and hyperparameter selection, saving a lot of time for manual tuning.

The proposed method in this study combines the aforementioned deep neural networks, i.e., GMWGAN-GP and YOLO v2 with DBSCAN (denoted as GWGY), which can effectively enlarge the small sample image dataset and improve the recognition accuracy of small sample targets based on augmented samples. The framework and pseudo-code of GWGY are shown in Fig. 2 and Algorithm 1, respectively.

Algorithm 1. Pseudo-code of the proposed method

GMWGAN-GP and YOLO v2 with DBSCAN (GWGY)

Input: X /* real sample images */, img /* images that need to be recognized */, annotation /* ground truth boxes and real class */
Output: cls_nm /* class name of target */, score /* confidence in category judgement */, coord /* coordinate position of target */

Initialize learning rate ← 0.0001, weight decay ← 0.0005, momentum ← 0.9
while θ has not converged do
    for t = 1,…,n do
        for i = 1,…,m do
            Sample real data x ~ p_r, variable z = μ_i + σ_i δ, a random number ε ∈ U[0,1]
            x̃ ← G_θ(z)
            x̂ ← εx + (1 − ε)x̃
            L(i) ← D_w(x̃) − D_w(x) + λ(‖∇_x̂ D_w(x̂)‖₂ − 1)²
        end for
        w ← Adam(∇_w (1/m) Σ_{i=1..m} L(i), w, α, β1, β2)
    end for
    Sample a batch of latent variables {z(i)}_{i=1..m} ~ p(z)
    θ ← Adam(∇_θ (1/m) Σ_{i=1..m} (1 − D_w(G_θ(z))), θ, α, β1, β2)
end while
for j = 1,…,n do
    for p = 1,…,m do
        L ← L_coord + L_obj + L_class
    end for
    w ← Adam(∇_w (1/m) Σ_{p=1..m} L, w, α)
end for
for img = 1,…,frame do
    img ← img(convolution, pooling)
    F ← F_convert + F_resize
    loss ← match(F)
end for
coord ← min Σ loss, class name ← max(score)
return coord, class name

5. Experimental results and discussion

5.1. Data description and experimental design

The small ship dataset used to evaluate the effectiveness of the proposed method is collected via two means. Firstly, we use an experimental ship (see Fig. 3) with a camera to capture images of real-world small ships (positive samples) and images without ships (negative samples) on the Yangtze River in Wuhan. The experimental ship (7.50 m (length) × 2.72 m (width) × 2.20 m (height)) has been converted into an unmanned ship that can drive automatically, equipped with a camera, millimeter-wave radar, Global Positioning System (GPS), and Inertial Navigation System. Secondly, since the number of qualified images of small ships captured by the experimental ship is far from enough, we use small ship pictures collected online to supplement the positive training samples. We set the above-collected samples as the basic dataset.

Examples of the collected samples (including the real-world photos of the ships and the ship pictures collected online) in the basic dataset are shown in Fig. 4. Detailed information on the small dataset is summarized in Table 1.

Table 1 reveals that the collected positive samples are still insufficient for training a classifier. To this end, the proposed GMWGAN-GP is applied in the experiments to generate simulated positive samples (i.e., fake images of small ships). The open-source software LabelImg is used to annotate the original samples and the simulated samples generated by GMWGAN-GP. The generated annotation labels contain the image name, the target category, and the coordinates and size of the target's circumscribed rectangle. Examples of the generated samples are shown in Fig. 5.

Theoretically, there is no limit to the number of generated images produced by a generative network, but if the number of generated
system is Ubuntu 16.04.

For GMWGAN-GP, the default hyperparameter settings of WGAN-GP are followed. Specifically, the learning rate of the generator network is set to 0.0001, the learning rate of the discriminator network is set to 0.04, and the batch sizes of the generator and the discriminator are set to 4 and 1, respectively. For the YOLO v2 algorithm, the learning rate is set to 0.0002, the number of steps is set to 5000, the batch size is set to 16, and the remaining parameters are set to their default values in TensorFlow.
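As a reading aid, the settings above can be collected into a small configuration, together with a sketch of the Gaussian-mixture latent sampling z = μ_i + σ_i δ from Eq. (8). This is a minimal illustration in plain Python rather than the authors' TensorFlow code: the names `GanConfig`, `DetectorConfig`, and `sample_latent` are hypothetical, and choosing the mixture component uniformly at random is an assumption the paper does not state.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class GanConfig:
    """GMWGAN-GP settings quoted in the text (field names are hypothetical)."""
    generator_lr: float = 0.0001
    discriminator_lr: float = 0.04
    generator_batch: int = 4
    discriminator_batch: int = 1


@dataclass(frozen=True)
class DetectorConfig:
    """YOLO v2 settings quoted in the text (field names are hypothetical)."""
    learning_rate: float = 0.0002
    steps: int = 5000
    batch_size: int = 16


def sample_latent(means, stds, batch, rng):
    """Draw `batch` latent vectors z = mu_i + sigma_i * delta, delta ~ N(0, 1).

    `means` and `stds` hold one vector per Gaussian component.
    Assumption: the component index i is drawn uniformly for each sample.
    """
    samples = []
    for _ in range(batch):
        i = rng.randrange(len(means))
        samples.append([m + s * rng.gauss(0.0, 1.0)
                        for m, s in zip(means[i], stds[i])])
    return samples


if __name__ == "__main__":
    cfg = GanConfig()
    z = sample_latent([[0.0, 0.0], [5.0, 5.0]], [[1.0, 1.0], [0.5, 0.5]],
                      cfg.generator_batch, random.Random(0))
    print(len(z), "latent vectors of dimension", len(z[0]))
```

Keeping the two configurations separate mirrors the two training stages of the method: the generative stage is trained first, and the detector is then trained on the augmented data.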
Table 1
General information of basic dataset.
Number of positive samples Number of negative samples Number of samples
Table 2
General information of dataset1 and the sub-datasets.
Type of samples Number of positive samples Number of negative samples Training set Validation set Test set
Table 4
Detection results of selected methods.
No. Method Accuracy (%) TPR (%) FPR (%)
boats.
• SSD: It extracts features of objects of different sizes to detect small objects more precisely. It is applied as the target detection algorithm in the experiments.
Table 5
Results of different detection methods combined with various generation methods.

No.  Method                  Accuracy (%)  TPR (%)  FPR (%)
1    Faster R-CNN-DCGAN      89.3          97.2     20.2
2    Faster R-CNN-WGAN-GP    90.2          96.6     16.3
3    SSD-DCGAN               94.3          96.2     8.5
4    SSD-WGAN-GP             94.3          94.3     6.2
5    YOLO v2-DCGAN           94.5          90.2     4.3
6    YOLO v2-WGAN-GP         95.0          94.3     4.3
7    The proposed method     97.2          98.3     3.5

the proposed method achieves the highest accuracy rate (97.2%), followed by YOLO v2-WGAN-GP (95%), YOLO v2-DCGAN (94.5%), SSD-WGAN-GP and SSD-DCGAN (94.3%), Faster R-CNN-WGAN-GP (90.2%), and Faster R-CNN-DCGAN (89.3%). According to the True Positive Rate (TPR) and False Positive Rate (FPR), it can be seen that the proposed method (TPR: 98.3% and FPR: 3.5%) outperforms the compared methods. The FPR of the proposed method is the lowest among the compared methods. Besides, the highest AUC value (0.960) is obtained by the proposed method. Therefore, it is concluded that the proposed method is more effective in small ship detection than the other compared methods (see Fig. 8).
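The text here does not spell out how Accuracy, TPR, and FPR are computed. Assuming the standard confusion-matrix definitions, a minimal sketch looks as follows; the function name `detection_rates` and the example counts are hypothetical and are not taken from the paper's experiments.

```python
def detection_rates(tp, fp, tn, fn):
    """Return (accuracy, TPR, FPR) from confusion-matrix counts.

    Assumed standard definitions:
      accuracy = (TP + TN) / (TP + FP + TN + FN)
      TPR      = TP / (TP + FN)   # share of real ships that are detected
      FPR      = FP / (FP + TN)   # share of ship-free images flagged as ships
    """
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("at least one sample is required")
    accuracy = (tp + tn) / total
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return accuracy, tpr, fpr


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    acc, tpr, fpr = detection_rates(tp=90, fp=5, tn=95, fn=10)
    print(f"accuracy={acc:.3f} TPR={tpr:.3f} FPR={fpr:.3f}")
```

Under these definitions a detector can reach a high TPR while still having a high FPR, as in the Faster R-CNN rows of Table 5, which is why the three figures are reported together.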
Intelligence 32 (9), 1627–1645.
Girshick, R., Donahue, J., Darrell, T., Malik, J., 2014. Rich feature hierarchies for accurate object detection and semantic segmentation, pp. 580–587.
Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y., 2014. Generative adversarial nets. Adv. Neural Inf. Process. Syst. 3, 2672–2680.
Henschel, M.D., Rey, M.T., Campbell, J., Petrovic, D., 1998. Comparison of probability statistics for automated ship detection in SAR imagery. Int. Soc. Opt. Photonics 3491, 986–991.
Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C., Berg, A.C., 2016. SSD: Single Shot Multibox Detector. Springer, Cham, pp. 21–37.
Matsumoto, Y., 2013. Ship image recognition using HOG. The Journal of Japan Institute of Navigation 129.
Redmon, J., Divvala, S., Girshick, R., Farhadi, A., 2016. You only look once: Unified, real-time object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788.
Redmon, J., Farhadi, A., 2017. YOLO9000: better, faster, stronger. IEEE Conference on Computer Vision and Pattern Recognition 6517–6525.
Ren, S., He, K., Girshick, R., Sun, J., 2015. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intelligence 39 (6), 1137–1149.
Sanchez-Lopez, J.L., Pestana, J., Saripalli, S., Campoy, P., 2014. An approach toward visual autonomous ship board landing of a VTOL UAV. J. Intelligent Robotic Syst. 74 (1–2), 113–127.
Simonyan, K., Zisserman, A., 2014. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.
Tran, T., Le, T., 2016. Vision based boat detection for maritime surveillance. International Conference on Electronics, Information, and Communications. IEEE, pp. 1–4.
Wackerman, C.C., Friedman, K.S., Pichel, W.G., Clemente-Colón, P., Li, X., 2001. Automatic detection of ships in RADARSAT-1 SAR imagery. Can. J. Remote Sens. 27 (5), 568–577.
Wijnhoven, R., van Rens, K., Jaspers, E.G., de With, P.H., 2010. Online learning for ship detection in maritime surveillance. In: Proceedings of the 31st Symposium on Information Theory in the Benelux, pp. 73–80.
Yang, X., Sun, H., Fu, K., Yang, J., Sun, X., Yan, M., Guo, Z., 2018. Automatic ship detection in remote sensing images from Google Earth of complex scenes based on multiscale rotation dense feature pyramid networks. Remote Sens. 10 (1), 132.
Yao, Y., Jiang, Z., Zhang, H., Zhao, D., Cai, B., 2017. Ship detection in optical remote sensing images based on deep convolutional neural networks. J. Appl. Remote Sens. 11 (4), 042611.
Zhao, H., Zhang, W., Sun, H., Xue, B., 2019. Embedded deep learning for ship detection and recognition. Future Internet 11 (2), 53.