
Safety Science 130 (2020) 104812

Contents lists available at ScienceDirect

Safety Science
journal homepage: www.elsevier.com/locate/safety

Deep learning for autonomous ship-oriented small ship detection


Zhijun Chen a,b, Depeng Chen a,b, Yishi Zhang c,⁎, Xiaozhao Cheng a,b, Mingyang Zhang d,⁎, Chaozhong Wu a

a Intelligent Transportation Systems Research Center, Wuhan University of Technology, Wuhan 430063, PR China
b School of Computer Science and Technology, Wuhan University of Technology, Wuhan 430063, PR China
c School of Management, Wuhan University of Technology, Wuhan 430063, PR China
d School of Engineering, Department of Mechanical Engineering, Marine Technology, Aalto University, Espoo, Finland

⁎ Corresponding authors.
E-mail addresses: [email protected] (Z. Chen), [email protected] (D. Chen), [email protected] (Y. Zhang), [email protected] (X. Cheng), mingyang.0.zhang@aalto.fi (M. Zhang), [email protected] (C. Wu).

https://doi.org/10.1016/j.ssci.2020.104812
Received 1 January 2020; Received in revised form 28 March 2020; Accepted 6 May 2020
Available online 18 June 2020
0925-7535/ © 2020 Elsevier Ltd. All rights reserved.

ARTICLE INFO

Keywords: Autonomous ship; Small ship detection; Wasserstein generative adversarial network; Convolutional neural network; Ship safety

ABSTRACT

Small ship detection is an important topic in autonomous ship technology and plays an essential role in shipping safety. Since traditional object detection techniques based on shipborne radar are not qualified for the task of near and small ship detection, deep learning-based image recognition methods built on video surveillance systems can be naturally utilized on autonomous vessels to effectively detect near and small ships. However, a limited number of real-world samples of small ships may fail to train a learning method that can accurately detect small ships in most cases. To address this, a novel hybrid deep learning method that combines a modified Generative Adversarial Network (GAN) and a Convolutional Neural Network (CNN)-based detection approach is proposed for small ship detection. Specifically, a Gaussian Mixture Wasserstein GAN with Gradient Penalty is first utilized to directly generate sufficient informative artificial samples of small ships based on the zero-sum game between a generator and a discriminator, and an improved CNN-based real-time detection method is then trained on both the original and the generated data for accurate small ship detection. Experimental results show that the proposed deep learning method (a) is competent to generate sufficient informative small ship samples and (b) obtains significantly improved and robust small ship detection results. The results also indicate that the proposed method can be effectively applied to ensuring autonomous ship safety.

1. Introduction

Autonomous ship technology has been developing rapidly and is gradually moving from the experimental stage to the practical stage. It could significantly reduce the challenges caused by unexpected errors of manual navigation and thus decrease manpower costs, increase navigation security, and improve the associated profit margins. Since it is generally accepted that autonomous ships will be best suited for ocean navigation, a large number of applications and studies of autonomous ships focus on large-ship recognition and classification in the marine environment. However, besides ocean navigation, inland river navigation is also an important part of water transportation. In river or near-shore ocean environments, many of the surface targets are small and compact, such as small fishing boats and bamboo rafts. These small targets often move elusively within relatively narrow channels and are difficult to discriminate automatically in the complex inland river environment. It is therefore imperative to focus on the application of autonomous ships in the specific environment of inland rivers.

The identification of objects, particularly ships in the channel, plays an essential role in navigation aids and safety control of autonomous ships. Ship detection via satellite remote sensing (Yao et al., 2017; Yang et al., 2018), Synthetic Aperture Radar (SAR) (Henschel et al., 1998; Wackerman et al., 2001), and surveillance video systems (Tran and Le, 2016; Sanchez-Lopez et al., 2014; Felzenszwalb et al., 2009) has attracted significant attention in the past decades, and the latter two can be directly applied in autonomous ship voyages. Henschel et al. (1998) propose a ship detection method using SAR, which estimates the probability distribution of ships in SAR imagery. Wackerman et al. (2001) propose a method for detecting ships whose length is over 35 m in low-resolution SAR imagery. However, there are two main shortcomings of SAR-based ship detection methods: (a) SAR is often limited to large-vessel detection and may not work well on identifying ships in complex contexts, and (b) shipborne radar systems usually work efficiently for remote sensing rather than near-side monitoring for ship recognition (Henschel et al., 1998).



For smart and autonomous ships that work in complex voyage environments like inland rivers (e.g., the unmanned patrol speedboat launched by the Hefei public security bureau for drowning prevention on the 'Tiane' lake in Chaohu, Anhui, China¹) or near-shore ocean environments, efficiently detecting near and small ships around their channels is thus an essential task for keeping the voyage mission safe.

¹ http://language.chinadaily.com.cn/2017-07/10/content_30055692.htm.

Surveillance video-based ship detection methods have thus been developed to address the aforementioned problem. Tran and Le (2016) propose a vision-based method to detect the outline of ships via maritime surveillance videos. This method can effectively detect ships in both maritime and non-maritime backgrounds. Yao et al. (2017) propose a hybrid ship detection method that integrates deep learning methods. Specifically, they utilize Deep Neural Networks (DNNs) and Region Proposal Networks (RPNs) to obtain a 2D bounding box of target ships. Similarly, Zhao et al. (2019) also propose a two-stage neural network for ship detection and recognition. Wijnhoven et al. (2010) present an object detection system based on Histograms of Oriented Gradients (HOG) (Dalal and Triggs, 2005) for finding ships in maritime video. Matsumoto (2013) proposes a HOG-SVM (Support Vector Machine) method to detect ships in images from a ship-mounted camera. However, most of the above methods are not feasible for autonomous ships (e.g., the methods proposed in Zhao et al. (2019) and Wijnhoven et al. (2010) are based on static cameras for port management and thus do not match the shipborne surveillance systems on moving autonomous ships). Furthermore, those methods show effectiveness in detecting medium- or large-sized ships, while the task of accurately recognizing small or tiny ships in inland rivers still lacks an effective solution. For autonomous ships that sail on inland rivers or in near-shore ocean environments, small moving obstacles such as small ships need to be paid extraordinary attention to ensure voyage safety. Unfortunately, it is hard to obtain sufficient training samples of small ship images (covering different shapes and types of small ships) due to collection and calibration difficulties, which is a fatal shortage for traditional machine learning methods for small ship detection.

The significant development of deep learning techniques in recent years provides great opportunities for proposing an effective autonomous ship-oriented small ship detection method. Deep learning, particularly deep neural networks, achieves increasing performance in object detection tasks (Girshick et al., 2014; Chen et al., 2019; Ren et al., 2015; Redmon et al., 2016; Liu et al., 2016; Redmon and Farhadi, 2017). Regions with CNN (R-CNN) proposed by Girshick et al. (2014) and its variants like Fast R-CNN (Chen et al., 2019) and Faster R-CNN (Ren et al., 2015) are typical deep learning-based object detection methods. For any input image, R-CNN generates multiple windows that may contain the detected objects by exploratively selecting sliding target windows; these windows are then identified, and the redundant windows (i.e., windows without target objects) are finally removed. The YOLO ("You Only Look Once") method proposed by Redmon et al. (2016) and its variants (e.g., YOLO v2 (Liu et al., 2016)) are recent representative CNN-based real-time detection methods. The core idea of YOLO is integrating positioning (i.e., generating windows to overlap objects) and classification (i.e., detecting whether objects are in the windows) in a single regression approach. Currently, SSD (Single Shot MultiBox Detector) (Redmon and Farhadi, 2017) and YOLO v2 (Liu et al., 2016) generally perform best in both accuracy and speed in the majority of object detection tasks. More importantly, the recently developed deep learning method based on game theory, called Generative Adversarial Networks (GANs) (Goodfellow et al., 2014), can successfully generate informative and vivid images given a certain number of ground-truth images as training samples, based on the zero-sum game between a generator neural network and a discriminator neural network.

Inspired by this, a novel small ship detection method is proposed, mainly based on GAN and YOLO v2. Specifically, a modified Wasserstein Generative Adversarial Network with Gradient Penalty (WGAN-GP) is first utilized in this study to generate sufficient informative small ship images as additional training samples for training data enhancement, and an improved YOLO v2 algorithm is then applied to complete the task of small ship detection based on the augmented training samples (comprising the original and generated samples).

The remainder of the paper is organized as follows. Section 2 introduces the modified WGAN-GP. Section 3 introduces the improved YOLO v2 algorithm. Section 4 provides the framework and the pseudo-code of the proposed method. Extensive experiments are conducted in Section 5 to validate the effectiveness of the proposed method. Conclusions and future work are presented in Section 6.

2. A modified generative adversarial network for data enhancement

2.1. Generative adversarial network

GAN is a generative model based on zero-sum game theory (Goodfellow et al., 2014). It aims to fit the distribution of the sample set and then to output highly qualified generated samples. GAN is composed of a generator (G) and a discriminator (D): the generator first generates samples according to the random noise (z) and then feeds the generated samples together with the real samples to the discriminator, training the discriminator's ability to identify the fake (i.e., generated) samples. The discriminator in turn feeds back its identification results on the real and the generated samples to the generator, thereby training the generator to simulate the real samples. Through multiple iterations of this process, the samples produced by the generator get much closer to the real ones, and the critic output of the discriminator becomes nearly the same for the real and the generated samples. Ideally, the generator and the discriminator finally reach a dynamic balance, at which point the generator can generate samples that the discriminator cannot identify as real or fake. The core objective function of GAN can be expressed as the following minimax formulation:

\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]   (1)

The loss function in Eq. (1) is composed of two parts, the expectations of the logarithmic terms for the discriminator and the generator, where x represents the real sample, z is the random noise and also the input of the generator, G(z) denotes the samples produced by the generator, p represents the distribution of the data, and E represents the expected value over the distribution of the samples. The purpose of the generator is to make D(G(z)) as large as possible, i.e., to make V(D, G) as small as possible. In contrast, the purpose of the discriminator is to make D(x) as large as possible while making D(G(z)) as small as possible, i.e., to make V(D, G) as large as possible. The structure of GAN is shown in Fig. 1.
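To make Eq. (1) concrete, the following is a minimal sketch in plain NumPy (the function name and the assumption of sigmoid discriminator outputs in (0, 1) are ours, not the authors'). It estimates the two expectations on mini-batches and returns the losses that the discriminator and the generator would respectively minimize:

```python
import numpy as np

def gan_minibatch_losses(d_real, d_fake, eps=1e-8):
    """Mini-batch estimate of the minimax objective in Eq. (1).

    d_real: discriminator outputs D(x) on real samples, values in (0, 1).
    d_fake: discriminator outputs D(G(z)) on generated samples.
    """
    # The discriminator maximizes V(D, G), i.e. minimizes its negation.
    d_loss = -np.mean(np.log(d_real + eps) + np.log(1.0 - d_fake + eps))
    # The generator minimizes log(1 - D(G(z))), pushing D(G(z)) toward 1.
    g_loss = np.mean(np.log(1.0 - d_fake + eps))
    return d_loss, g_loss
```

At the ideal equilibrium described above, D outputs 0.5 everywhere and both losses stop improving.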
2.2. Wasserstein GAN with the gradient penalty term

Although GAN pioneered sample generation, it has a serious deficit: the generator gradient tends to vanish once the discriminator is well trained. This may cause a deadlock in network training and result in a lack of diversity or distortion in the generated samples. To this end, Wasserstein GAN (WGAN) (Arjovsky et al., 2017) refines GAN by using the Wasserstein distance (i.e., the Earth-Mover (EM) distance (Chen et al., 2019)) instead of the JS (Jensen-Shannon) divergence to measure the distance between the distributions of the real and the generated samples.


Fig. 1. Structure of GAN (random noise feeds the generator (G); the generated samples, together with the real samples, are passed to the discriminator (D), whose loss is fed back to update both networks).

The advantage of the Wasserstein (EM) distance is its ability to measure two distributions even when there is no overlap between them, making GAN able to tackle the problems of insufficient training samples and the lack of diversity caused by mode collapse (Arjovsky et al., 2017).

Compared with GAN, Wasserstein GAN has the following characteristics: (a) The sigmoid function of the last layer of the discriminator is discarded. Recall that the discriminator in GAN uses the sigmoid function to achieve real-or-fake binary classification; in contrast, the task of the WGAN discriminator, i.e., approximately fitting the Wasserstein distance, is a regression task, and hence the sigmoid function in the last layer is no longer required. (b) A threshold is utilized as the upper bound of the absolute values of the parameters of the discriminator, and the loss functions of both the generator and the discriminator are not logarithmic. (c) WGAN chooses Root Mean Square Prop (RMSProp) or stochastic gradient descent (SGD) instead of momentum-based optimization algorithms as its optimization method, and sets a lower learning rate for the optimization process. Consistent with (b), the objective function of WGAN drops the logarithms and is formulated as:

\min_G \max_{D \in \mathcal{D}} V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}[D(x)] - \mathbb{E}_{z \sim p_z(z)}[D(G(z))]   (2)
where 𝒟 is the set of k-Lipschitz functions, i.e., functions for which there is a constant k such that the Lp norm of the gradient of the discriminator D(x) is not bigger than k:

\| \nabla_x D(x) \|_p \leq k   (3)

This constraint ensures that the updated parameters of the discriminator do not exceed the threshold. The Wasserstein distance can thus be fitted approximately by using such a modified discriminator.
However, the neural network of the WGAN discriminator is similar to a binary network, and its parameters are easily polarized (i.e., pushed close to the boundary). To this end, the Lipschitz constraint is embedded in WGAN by additionally setting a gradient penalty (GP) term (where k is set to 1), and the discriminator loss function of WGAN-GP is obtained by weighting the original WGAN discriminator loss:

L_{WGAN-GP}(D) = \mathbb{E}_{z \sim p_z(z)}[D(G(z))] - \mathbb{E}_{x \sim p_{data}(x)}[D(x)] + \lambda \, \mathbb{E}_{t \sim p_t(t)}[(\| \nabla_t D(t) \|_p - 1)^2]   (4)

where t is a random sample consisting of the convex combination of the real sample x and the generated sample G(z):

t = \varepsilon x + (1 - \varepsilon) G(z), \quad \varepsilon \sim \mathrm{Uniform}[0, 1]   (5)
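A minimal sketch of Eqs. (4) and (5), written in TensorFlow 2 style (the paper names TensorFlow as its platform; the version, the image-shaped inputs, and λ = 10, a common WGAN-GP default not stated in the paper, are our assumptions):

```python
import tensorflow as tf

def wgan_gp_critic_loss(D, G, x_real, z, lam=10.0):
    """Sketch of the WGAN-GP discriminator (critic) loss, Eqs. (4)-(5)."""
    x_fake = G(z)
    # Eq. (5): t = eps * x + (1 - eps) * G(z), one eps per sample.
    eps = tf.random.uniform([tf.shape(x_real)[0], 1, 1, 1], 0.0, 1.0)
    t = eps * x_real + (1.0 - eps) * x_fake
    with tf.GradientTape() as tape:
        tape.watch(t)
        d_t = D(t)
    grads = tape.gradient(d_t, t)                           # gradient of D at t
    grad_norm = tf.sqrt(
        tf.reduce_sum(tf.square(grads), axis=[1, 2, 3]) + 1e-12)
    penalty = tf.reduce_mean(tf.square(grad_norm - 1.0))    # (||grad D|| - 1)^2
    # Eq. (4): E[D(G(z))] - E[D(x)] + lambda * penalty.
    return (tf.reduce_mean(D(x_fake)) - tf.reduce_mean(D(x_real))
            + lam * penalty)
```

The penalty is evaluated on the interpolated points t rather than on real or fake samples alone, which is what enforces the Lipschitz constraint along the lines connecting the two distributions.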
Therefore, the overall structure of WGAN-GP consists of three parts: the generator, the discriminator, and the gradient penalty network. Specifically, the generator is a convolutional neural network with 8 convolutional layers, where the convolution kernel size of each convolutional layer is 3 × 3 pixels. The first seven hidden layers of the generator have 32 filters, and the last layer utilizes a 3 × 3 filter to generate the output feature map; the activation function is ReLU (Rectified Linear Units). The discriminator has 6 convolutional layers, where the first two convolutional layers have 64 filters, the middle two have 128 filters, and the last two have 256 filters. As in the generator, the convolution kernel size of each convolutional layer is 3 × 3 pixels. After the convolutional layers, there are two fully connected layers; the first has 1024 outputs and the other has only one output. The main structure of the gradient penalty network is the Visual Geometry Group (VGG) network, specifically the VGG network with 19 hidden layers (VGG-19) (Simonyan and Zisserman, 2014), from which the gradient penalty loss function is obtained.
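Under the layer counts just listed, the two networks could be sketched in Keras as follows. Strides, padding, the generator's output channel count, and the hidden activations are not specified in the paper, so those choices are illustrative assumptions:

```python
import tensorflow as tf
from tensorflow.keras import Sequential, layers

def build_generator(out_channels=3):          # output depth: our assumption
    """8 conv layers, 3x3 kernels: seven with 32 filters, one output layer."""
    net = Sequential([layers.Conv2D(32, 3, padding="same", activation="relu")
                      for _ in range(7)])
    net.add(layers.Conv2D(out_channels, 3, padding="same", activation="relu"))
    return net

def build_discriminator():
    """6 conv layers (64, 64, 128, 128, 256, 256 filters, 3x3 kernels),
    followed by fully connected layers with 1024 and 1 outputs."""
    net = Sequential()
    for filters in (64, 64, 128, 128, 256, 256):
        net.add(layers.Conv2D(filters, 3, padding="same", activation="relu"))
    net.add(layers.Flatten())
    net.add(layers.Dense(1024, activation="relu"))
    net.add(layers.Dense(1))  # no sigmoid: the WGAN critic output is unbounded
    return net
```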
3. An improved YOLO v2 algorithm for small ship detection

For the task of target detection, most feature extraction networks are constructed based on VGG-16 (Simonyan and Zisserman, 2014), in that VGG-16 usually performs superiorly on feature extraction and classification. However, the complexity of VGG-16 makes it very time-consuming, so it cannot be qualified for tasks where real-time performance plays an essential role. To address this, we apply the YOLO v2 algorithm based on the Darknet-19 network (Liu et al., 2016) to extract features in this study. As a basic network for feature extraction, Darknet-19 consists of 19 convolutional layers and 5 pooling layers. In contrast to the complicated structure of VGG-16, the size of the convolution kernels of Darknet-19 is only 3 × 3 pixels. The feature parameters are also dimensionally reduced by means of global average pooling, and the training process is made more stable and efficient by batch normalization (Deng et al., 2014).

The YOLO v2 algorithm partially adjusts the Darknet-19 network structure and obtains fine-grained features of the targets by adding a pass-through layer to the detection network. It discards YOLO's method of predicting bounding box coordinates using the fully connected layer and instead draws on the idea of the anchor boxes of Faster R-CNN, i.e., it uses the k-means method to cluster the target candidate frames and finally obtains the size and number of the adaptive anchor boxes. YOLO v2 still inherits the loss function from YOLO. A weighting scheme is used to balance the influence of the positioning error and the classification error on the stability of the model, to obtain relatively high comprehensive performance.

The loss function of YOLO v2 is:

\mathrm{Loss} = \lambda_{coord} \sum_{i=0}^{s^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} [(x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2]
 + \lambda_{coord} \sum_{i=0}^{s^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} [(w_i - \hat{w}_i)^2 + (h_i - \hat{h}_i)^2]
 + (1 + \lambda_{noobj}) \sum_{i=0}^{s^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} (C_i - \hat{C}_i)^2
 + \sum_{i=0}^{s^2} \mathbb{1}_{i}^{obj} \sum_{c \in classes} (p_i(c) - \hat{p}_i(c))^2   (6)


where λ_coord and λ_noobj represent the weighting coefficients of the positioning prediction error and the classification error, respectively; s denotes the number of rows and columns of the input image region division; B denotes the number of bounding boxes per divided region cell; and 𝟙_ij^obj indicates whether the j-th bounding box of the i-th cell is responsible for the target. x_i and y_i represent the center position coordinates of the bounding box of the current grid cell, and w_i and h_i denote the width and height of the bounding box; x̂_i and ŷ_i represent the central coordinate values of the real labeling box of each target, and ŵ_i and ĥ_i are the width and height of the labeling box. C_i and Ĉ_i are, respectively, the Intersection over Union (IOU) value between the real bounding box and the detection box, and the confidence of the target in the bounding box. p_i(c) and p̂_i(c) represent the predicted and true probabilities of a certain type of target in the box, respectively.
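The weighting in Eq. (6) can be traced in a small NumPy sketch. The split of predictions into box, confidence, and per-cell class arrays, and the weights λ_coord = 5 and λ_noobj = 0.5 (YOLO's customary values, not given in the paper), are our assumptions:

```python
import numpy as np

def yolo_v2_loss(box_p, box_t, conf_p, conf_t, cls_p, cls_t, obj,
                 lam_coord=5.0, lam_noobj=0.5):
    """Simplified sketch of Eq. (6) for an s*s grid with B boxes per cell.

    box_p, box_t: (s*s, B, 4) predicted/true (x, y, w, h);
    conf_p, conf_t: (s*s, B) confidences; cls_p, cls_t: (s*s, C) per-cell
    class probabilities; obj: (s*s, B) indicator that box j of cell i
    is responsible for a target.
    """
    # Positioning error on centers and widths/heights, weighted by lam_coord.
    loc = lam_coord * np.sum(obj[..., None] * (box_p - box_t) ** 2)
    # Confidence error with the (1 + lam_noobj) weighting of Eq. (6).
    conf = (1.0 + lam_noobj) * np.sum(obj * (conf_p - conf_t) ** 2)
    # Classification error over cells that contain a target.
    cls = np.sum(obj.max(axis=1)[:, None] * (cls_p - cls_t) ** 2)
    return loc + conf + cls
```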
Since YOLO v2 does not contain fully connected layers, it can dynamically adjust the size of the input image. The size of the input image can be fine-tuned during the training process, giving YOLO v2 good scale invariance. Following previous studies (Chen et al., 2019; Simonyan and Zisserman, 2014; Deng et al., 2014), the performance of current mainstream target detection frameworks is compared via two metrics, namely mean Average Precision (mAP) and Frames per Second (FPS).

4. The proposed method

The Gaussian mixture model (GMM) uses m normal distributions to describe the overall diversity of the sample, which can accurately reflect the diversity characteristics of the sample while maintaining the similarity of features with the original sample. Therefore, GMM can be integrated into WGAN-GP to build a new Gaussian mixture WGAN-GP (GMWGAN-GP) model. The probability density function of each Gaussian component is:

N(x; \mu_i, \Sigma_i) = \frac{1}{(2\pi)^{n/2} |\Sigma_i|^{1/2}} \, e^{-\frac{1}{2}(x - \mu_i)^T \Sigma_i^{-1} (x - \mu_i)}   (7)

The reparameterization trick can be used to generate a one-dimensional random noise vector z obeying the prior distribution, as shown below:

z = \mu_i + \sigma_i \delta   (8)

In Eq. (8), μ_i and σ_i are the mean and standard deviation of the i-th Gaussian component, respectively, and δ ∼ N(0, 1). Therefore, the modified generator loss function can be written as:

\min_G V_G(D, G) = \min_G \mathbb{E}_{z \sim p_z}[\log(1 - D(G(\mu_i + \sigma_i \delta)))] + \eta \sum_{i=1}^{N} (1 - \sigma_i)^2   (9)

where η weights the regularization term over the N Gaussian components.
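A minimal sketch of the sampling in Eq. (8) follows, assuming mixture components are picked uniformly (the paper does not state the component weights). The reparameterized form keeps the draw differentiable with respect to μ_i and σ_i:

```python
import numpy as np

def sample_latent(mu, sigma, batch_size, rng=None):
    """Draw latent vectors z = mu_i + sigma_i * delta as in Eq. (8).

    mu, sigma: (N, d) means and standard deviations of the N Gaussian
    components; delta ~ N(0, 1) carries all the randomness of the draw.
    """
    rng = rng or np.random.default_rng()
    idx = rng.integers(0, mu.shape[0], size=batch_size)  # uniform component pick
    delta = rng.standard_normal((batch_size, mu.shape[1]))
    return mu[idx] + sigma[idx] * delta                  # (batch_size, d)
```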
YOLO v2 changes the candidate box selection strategy from the traditional, manually customized multi-scale candidate frame selection to the k-means clustering approach. However, k-means may encounter the problems of irregular object recognition and hyper-parameter selection. In the proposed method, a density-based clustering algorithm, Density-Based Spatial Clustering of Applications with Noise (DBSCAN), is therefore used instead of k-means in YOLO v2. DBSCAN can alleviate the problems of irregular object recognition and hyper-parameter selection, saving a lot of time for manual tuning.
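A sketch of this anchor-derivation step with scikit-learn's DBSCAN is shown below; the eps and min_samples values are placeholders rather than settings reported in the paper, and each anchor is taken as the mean width/height of one discovered cluster:

```python
import numpy as np
from sklearn.cluster import DBSCAN

def dbscan_anchors(box_wh, eps=0.05, min_samples=5):
    """Cluster normalized (width, height) pairs of ground-truth boxes.

    Unlike k-means, DBSCAN does not need the number of anchors in
    advance: one anchor is produced per discovered cluster.
    """
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(box_wh)
    anchors = [box_wh[labels == k].mean(axis=0)
               for k in sorted(set(labels)) if k != -1]  # -1 marks noise
    return np.array(anchors)
```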
The proposed method in this study combines the aforementioned deep neural networks, i.e., GMWGAN-GP and YOLO v2 with DBSCAN (denoted as GWGY), which can effectively enlarge the small-sample image dataset and improve the recognition accuracy of small-sample targets based on the augmented samples. The framework and pseudo-code of GWGY are shown in Fig. 2 and Algorithm 1, respectively.

Algorithm 1. Pseudo-code of the proposed method: GMWGAN-GP and YOLO v2 with DBSCAN (GWGY)

Input: X /* real sample images */, img /* images that need to be recognized */, annotation /* ground-truth boxes and real classes */
Output: cls_nm /* class name of target */, score /* confidence in category judgement */, coord /* coordinate position of target */
1: Initialize learning rate ← 0.0001, weight decay ← 0.0005, momentum ← 0.9
2: while θ has not converged do
3:   for t = 1, …, n do
4:     for i = 1, …, m do
5:       Sample real data x ∼ p_r, latent variable z = μ_i + σ_i δ, and a random number ε ∈ U[0, 1]
6:       x̃ ← G_θ(z)
7:       x̂ ← εx + (1 − ε)x̃
8:       L(i) ← D_w(x̃) − D_w(x) + λ(‖∇_x̂ D_w(x̂)‖₂ − 1)²
9:     end for
10:    w ← Adam(∇_w (1/m) Σ_{i=1}^{m} L(i), w, α, β₁, β₂)
11:  end for
12:  Sample a batch of latent variables {z(i)}_{i=1}^{m} ∼ p(z)
13:  θ ← Adam(∇_θ (1/m) Σ_{i=1}^{m} (1 − D_w(G_θ(z(i)))), θ, α, β₁, β₂)
14: end while
15: for j = 1, …, n do
16:   for p = 1, …, m do
17:     L ← L_coord + L_obj + L_class
18:   end for
19:   w ← Adam(∇_w (1/m) Σ_{p=1}^{m} L, w, α)
20: end for
21: for img = 1, …, frame do
22:   img ← img(convolution, pooling)
23:   F ← F_convert + F_resize
24:   loss ← match(F)
25: end for
26: coord ← min Σ loss, cls_nm ← max(score)
27: return coord, cls_nm

5. Experimental results and discussion

5.1. Data description and experimental design

The small ship dataset used to evaluate the effectiveness of the proposed method is collected via two means. Firstly, we use an experimental ship (see Fig. 3) with a camera to capture images of real-world small ships (positive samples) and images without ships (negative samples) on the Yangtze River in Wuhan. The experimental ship (7.50 m (length) × 2.72 m (width) × 2.20 m (height)) has been converted into an unmanned ship that can drive automatically, equipped with a camera, millimeter-wave radar, a Global Positioning System (GPS), and an Inertial Navigation System. Secondly, since the number of qualified images of small ships captured by the experimental ship is far from enough, we use small ship pictures collected online as a supplement to the positive training samples. We set the above-collected samples as the basic dataset.

Examples of the collected samples (including the real-world photos of the ships and the ship pictures collected online) in the basic dataset are shown in Fig. 4. Detailed information on the small ship dataset is summarized in Table 1.

Table 1 reveals that the collected positive samples are still insufficient for training a classifier. To this end, the proposed GMWGAN-GP is applied in the experiments to generate simulated positive samples (i.e., fake images of small ships). The open-source software LabelImg is used to annotate the original samples and the simulated samples generated by GMWGAN-GP. The generated annotation labels contain the image name, the target category, and the coordinates and size of the target's circumscribed rectangle. Examples of the generated samples are shown in Fig. 5.


Fig. 2. The framework of GWGY.

Theoretically, there is no limit to the number of images that the generator network can produce, but if the number of generated samples is too large, it will be difficult to guarantee the quality of the generated samples, which will finally impair the performance of ship detection. Therefore, the collection of the basic dataset and the generated positive samples (denoted as dataset1) is split into several sub-datasets in the experiments, to investigate the relationship between the percentage of generated positive samples and the detection results. General information on dataset1 and the sub-datasets is shown in Table 2.

Fig. 3. Experimental ship (the onboard camera is labeled in the figure).

5.2. Experimental platform and parameter settings

The experimental platform applied in this study is the well-known deep learning platform TensorFlow². The experiments are conducted on a 64-bit personal computer with a 2.40 GHz CPU and 4 GB RAM (CPU: Intel Core i5-8300H; GPU: NVIDIA GTX 1050Ti). The operating system is Ubuntu 16.04.

² https://tensorflow.google.cn/

For GMWGAN-GP, the default hyperparameter setting of WGAN-GP is followed. Specifically, the learning rate of the generator network is set to 0.0001, the learning rate of the discriminator network is set to 0.04, and the batch sizes of the generator and the discriminator are set to 4 and 1, respectively. For the YOLO v2 algorithm, the learning rate is set to 0.0002, the number of steps is set to 5000, the batch size is set to 16, and the rest of the parameters are set to their default values in TensorFlow.
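Collected in one place, the settings above amount to a small configuration such as the following (a plain summary of the stated values; the key names are ours, not from the paper's code):

```python
# Hyperparameters as reported in Section 5.2 (key names are illustrative).
CONFIG = {
    "gmwgan_gp": {
        "generator_lr": 1e-4,
        "discriminator_lr": 0.04,
        "generator_batch_size": 4,
        "discriminator_batch_size": 1,
    },
    "yolo_v2": {
        "lr": 2e-4,
        "steps": 5000,
        "batch_size": 16,  # remaining parameters: TensorFlow defaults
    },
}
```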

5.3. Experimental results and discussion

The accuracy of the proposed method (and of the compared methods) is used as the evaluation metric. We also report the Intersection over Union (IOU) value on all the datasets. IOU is defined as the ratio of the Area of Overlap (the intersection of the ground-truth box and the bounding box of the regression result) to the Area of Union (the union of the ground-truth box and the bounding box of the regression result):

\mathrm{IOU} = \frac{\mathrm{Area\ of\ Overlap}}{\mathrm{Area\ of\ Union}}   (10)
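For axis-aligned boxes, Eq. (10) reduces to a few lines; a minimal helper, assuming a corner-coordinate convention:

```python
def iou(box_a, box_b):
    """IOU of two boxes given as (x1, y1, x2, y2) corners, Eq. (10)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    overlap = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)  # area of overlap
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - overlap                    # area of union
    return overlap / union if union > 0 else 0.0
```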
In addition, we report the Recall rate on all the datasets for a comprehensive comparison. The results for the above evaluation metrics are given in Table 3.

Table 3 shows that the detection results of the proposed method achieved on the augmented datasets are all higher than those on the basic dataset. Specifically, the proposed method obtains the highest accuracy (97.2%), the highest IOU (84.2%), and the highest Recall (92.3%) on augmented dataset_2. Therefore, augmented dataset_2 is used as the training and test dataset for the comparison experiments in the following section. Fig. 6 shows some examples of small ships detected using the proposed method.


Fig. 4. Examples of the collected positive samples.

Table 1
General information of the basic dataset.

                            Number of positive samples   Number of negative samples   Number of samples
Original training samples   205                          2000                         2205
Test samples                41                           81                           122

Fig. 5. Examples of the generated positive samples.

Table 2
General information of dataset1 and the sub-datasets.

Type of samples       Number of positive samples   Number of negative samples   Training set   Validation set   Test set
Basic dataset         205                          2000                         2205           81               41
Generated samples_1   60                           \                            \              \                \
Augmented dataset_1   265                          2000                         2260           103              52
Generated samples_2   150                          \                            \              \                \
Augmented dataset_2   355                          2000                         2355           141              71
Generated samples_3   400                          \                            \              \                \
Augmented dataset_3   605                          2000                         2605           241              121

Table 3
Results of the proposed method on dataset1.

Sample set            Accuracy (%)   IOU (%)   Recall (%)
Basic dataset         90.2           68.6      85.3
Augmented dataset_1   92.3           72.3      90.7
Augmented dataset_2   97.2           84.2      92.3
Augmented dataset_3   92.6           76.5      91.8

To validate the performance and effectiveness of the proposed method, the most representative and well-performing learning methods, including the Faster Region Convolutional Neural Network (Faster R-CNN), the Single Shot MultiBox Detector (SSD), and YOLO v2, are applied for a detailed comparison. Details of the compared methods (YOLO v2 has already been introduced) are as follows:

• Faster R-CNN: It merges RPN and Fast R-CNN into a single network by sharing their convolutional features. In our experiments, this method is applied to locate and classify the target small boats.
• SSD: It extracts features of targets at different sizes to detect small objects more precisely. It is applied as a target detection algorithm in the experiments.

Fig. 6. Examples of small ships detected by the proposed method.

Table 4
Detection results of the selected methods.

No.   Method                Accuracy (%)   TPR (%)   FPR (%)
1     Faster R-CNN          86.0           96.1      24.0
2     SSD                   93.5           94.5      8.5
3     YOLO v2               91.0           92.1      2.3
4     The proposed method   97.2           98.3      3.5


Fig. 7. AUC of the selected methods (the algorithm index corresponds to the No. shown in Table 4).

Table 4 shows the results of small ship detection using Faster R-CNN, SSD, YOLO v2, and the proposed method. According to the results, the proposed method obtains the highest classification accuracy (97.2%) among all methods, while the lowest classification accuracy (86.0%) is obtained by Faster R-CNN. Also, according to the True Positive Rate (TPR) and False Positive Rate (FPR), the proposed method achieves the best performance among the selected methods. Fig. 7 shows the Area Under the receiver operating characteristic Curve (AUC) of all the selected methods. It can be seen that the highest value (0.96) is achieved by the proposed method.
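The TPR and FPR reported in Tables 4 and 5 follow the standard confusion-matrix definitions; a minimal helper for computing them from raw counts (the function and argument names are ours):

```python
def detection_rates(tp, fp, tn, fn):
    """Accuracy, TPR and FPR from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    tpr = tp / (tp + fn) if (tp + fn) else 0.0  # true positive rate
    fpr = fp / (fp + tn) if (fp + tn) else 0.0  # false positive rate
    return accuracy, tpr, fpr
```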


Table 5
Results of different detection methods combined with various generation methods.

No.   Method                 Accuracy (%)   TPR (%)   FPR (%)
1     Faster R-CNN-DCGAN     89.3           97.2      20.2
2     Faster R-CNN-WGAN-GP   90.2           96.6      16.3
3     SSD-DCGAN              94.3           96.2      8.5
4     SSD-WGAN-GP            94.3           94.3      6.2
5     YOLO v2-DCGAN          94.5           90.2      4.3
6     YOLO v2-WGAN-GP        95.0           94.3      4.3
7     The proposed method    97.2           98.3      3.5
compared methods (see Fig. 8).

6. Conclusions and future work

In this study, a novel detection method based on deep learning is


proposed using a camera to detect small ships such as small fishing
boats and bamboo rafts in the river or near-shore ocean environments.
The proposed method is developed based on GMWGAN-GP and an
improved YOLO v2. Specifically, WGAN-GP of which the generator is
optimized through GMM, DBSCAN (instead of K-means), and the im-
proved GAN and YOLO are combined to form the proposed method
called GWGY. The multichannel small ship datasets are then collected
in order to empirically evaluate the performance and effectiveness of
the proposed method.
The most representative and well-performed learning methods, in-
cluding Faster Region Convolutional Neural Network (Faster R-CNN),
Single Shot Multi Box Detector (SSD), and YOLO v2, are applied to
make a detailed comparison. Also, different detection methods com-
bined with various generation methods are applied in the experiments.
The most representative methods including DCGAN and WGAN-GP are
Fig. 8. AUC results of different detection algorithms combined with generation applied to generate training samples and integrated with the compared
methods (algorithm index corresponds to the No. shown in Table 5). classification algorithms in the experiments. According to experimental
results, it is concluded that the proposed method is more effective than
other compared methods, indicating that the proposed method can be
performance among the selected methods. Fig. 7 shows the results of
applied in autonomous shipping for small target detection. Note that
Area Under the receiver operating characteristic Curve (AUC) of all the
the proposed method can be utilized not only for intelligent ships but
selected methods. It can be seen that the highest value (0.96) is
also for shore-based identification of various ships and obstacles on the
achieved by the proposed method.
water surface. In addition, the proposed method may also be extended
In addition, selected detection methods combined with generation
to, for example, the applications of automatic pilot navigation and
methods are compared in the experiments. The most representative
vehicle driving assistance. Future work about the proposed method
methods including DCGAN and WGAN-GP are applied to generate
includes detecting small targets under various environments including
training samples of which the number is equal to that of the training
dark scenarios in the nights and swing scenarios caused by the waves
samples in augmented dataset_2. We show the detailed information of
and other external factors, and the above-mentioned potential appli-
the detection methods based on the generation methods as follows:
cations of the proposed method.

6. Conclusions and future work

In this study, a novel detection method based on deep learning is proposed that uses a camera to detect small ships, such as small fishing boats and bamboo rafts, in river or near-shore ocean environments. The proposed method is developed based on GMWGAN-GP and an improved YOLO v2. Specifically, a WGAN-GP whose generator is optimized through GMM and an improved YOLO v2 in which DBSCAN replaces k-means are combined to form the proposed method, called GWGY. Multichannel small ship datasets are then collected in order to empirically evaluate the performance and effectiveness of the proposed method.

The most representative and well-performing learning methods, including the Faster Region Convolutional Neural Network (Faster R-CNN), the Single Shot MultiBox Detector (SSD), and YOLO v2, are applied for a detailed comparison. Also, different detection methods combined with various generation methods are applied in the experiments: the most representative generation methods, DCGAN and WGAN-GP, are used to generate training samples and are integrated with the compared classification algorithms. According to the experimental results, it is concluded that the proposed method is more effective than the other compared methods, indicating that it can be applied in autonomous shipping for small target detection. Note that the proposed method can be utilized not only on intelligent ships but also for shore-based identification of various ships and obstacles on the water surface. In addition, the proposed method may be extended to, for example, applications in automatic pilot navigation and vehicle driving assistance. Future work on the proposed method includes detecting small targets in various environments, including dark scenarios at night and swing scenarios caused by waves and other external factors, as well as the above-mentioned potential applications of the proposed method.

Acknowledgment

This work is supported in part by the National Key R&D Program of China under Grant 2018YFB1600600; in part by the National Natural Science Foundation of China under Grants 61703319, 71702066, 51775396, and U1764262; and in part by the Major Project of Technological Innovation of Hubei Province under Grant 2017CFA008.

References

Arjovsky, M., Chintala, S., Bottou, L., 2017. Wasserstein GAN. arXiv preprint arXiv:1701.07875.
Chen, Z., Cai, H., Zhang, Y., Wu, C., Mu, M., Li, Z., Sotelo, M.A., 2019. A novel sparse representation model for pedestrian abnormal trajectory understanding. Expert Syst. Appl. 138, 112753.
Chen, Z., Zhang, Y., Wu, C., Ran, B., 2019. Understanding individualization driving states via latent Dirichlet allocation model. IEEE Intell. Transp. Syst. Mag. 11, 41–53.
Dalal, N., Triggs, B., 2005. Histograms of oriented gradients for human detection. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition. IEEE, pp. 886–893.
Deng, J., Russakovsky, O., Krause, J., Bernstein, M.S., Berg, A., Fei-Fei, L., 2014. Scalable multi-label annotation. hci.stanford.edu, 3099–3102.
Felzenszwalb, P.F., Girshick, R.B., McAllester, D., Ramanan, D., 2009. Object detection with discriminatively trained part-based models. IEEE Trans. Pattern Anal. Machine Intelligence 32 (9), 1627–1645.
Girshick, R., Donahue, J., Darrell, T., Malik, J., 2014. Rich feature hierarchies for accurate object detection and semantic segmentation, pp. 580–587.
Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y., 2014. Generative adversarial nets. Adv. Neural Inf. Process. Syst. 3, 2672–2680.
Henschel, M.D., Rey, M.T., Campbell, J., Petrovic, D., 1998. Comparison of probability statistics for automated ship detection in SAR imagery. Int. Soc. Opt. Photonics 3491, 986–991.
Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C., Berg, A.C., 2016. SSD: Single Shot MultiBox Detector. Springer, Cham, pp. 21–37.
Matsumoto, Y., 2013. Ship image recognition using HOG. The Journal of Japan Institute of Navigation 129.
Redmon, J., Divvala, S., Girshick, R., Farhadi, A., 2016. You only look once: Unified, real-time object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788.
Redmon, J., Farhadi, A., 2017. YOLO9000: Better, faster, stronger. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 6517–6525.
Ren, S., He, K., Girshick, R., Sun, J., 2015. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intelligence 39 (6), 1137–1149.
Sanchez-Lopez, J.L., Pestana, J., Saripalli, S., Campoy, P., 2014. An approach toward visual autonomous ship board landing of a VTOL UAV. J. Intelligent Robotic Syst. 74 (1–2), 113–127.
Simonyan, K., Zisserman, A., 2014. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.
Tran, T., Le, T., 2016. Vision based boat detection for maritime surveillance. In: International Conference on Electronics, Information, and Communications. IEEE, pp. 1–4.
Wackerman, C.C., Friedman, K.S., Pichel, W.G., Clemente-Colón, P., Li, X., 2001. Automatic detection of ships in RADARSAT-1 SAR imagery. Can. J. Remote Sens. 27 (5), 568–577.
Wijnhoven, R., van Rens, K., Jaspers, E.G., de With, P.H., 2010. Online learning for ship detection in maritime surveillance. In: Proceedings of 31st Symposium on Information Theory in the Benelux, pp. 73–80.
Yang, X., Sun, H., Fu, K., Yang, J., Sun, X., Yan, M., Guo, Z., 2018. Automatic ship detection in remote sensing images from Google Earth of complex scenes based on multiscale rotation dense feature pyramid networks. Remote Sens. 10 (1), 132.
Yao, Y., Jiang, Z., Zhang, H., Zhao, D., Cai, B., 2017. Ship detection in optical remote sensing images based on deep convolutional neural networks. J. Appl. Remote Sens. 11 (4), 042611.
Zhao, H., Zhang, W., Sun, H., Xue, B., 2019. Embedded deep learning for ship detection and recognition. Future Internet 11 (2), 53.
