
Neurocomputing 337 (2019) 372–384

Contents lists available at ScienceDirect

Neurocomputing
journal homepage: www.elsevier.com/locate/neucom

Faster R-CNN for marine organisms detection and recognition using data augmentation

Hai Huang a, Hao Zhou a, Xu Yang b,*, Lu Zhang b, Lu Qi c, Ai-Yun Zang d

a National Key Laboratory of Science and Technology on Underwater Vehicle, Harbin Engineering University, Harbin 150001, PR China
b Institute of Automation, Chinese Academy of Sciences, Beijing 100000, PR China
c Computer Science and Engineering Department, The Chinese University of Hong Kong, Hong Kong 999077, PR China
d College of Engineering, Ocean University of China, Qingdao 266100, PR China

ARTICLE INFO

Article history:
Received 18 March 2018
Revised 22 November 2018
Accepted 18 January 2019
Available online 2 February 2019
Communicated by Dr Shenglan Liu

MSC: 00-01, 99-00

Keywords: Data augmentation; Underwater-imaging; Marine organisms; Detection and recognition; Faster R-CNN

ABSTRACT

Recently, the Faster Region-based Convolutional Neural Network (Faster R-CNN) has achieved marvelous accomplishments in object detection and recognition. In this paper, Faster R-CNN is applied to marine organisms detection and recognition. However, the training of Faster R-CNN requires a mass of labeled samples, which are difficult to obtain for marine organisms. Therefore, three data augmentation methods dedicated to underwater imaging are proposed. Specifically, the inverse process of underwater image restoration is used to simulate different marine turbulence environments. Perspective transformation is proposed to simulate different views of camera shooting. Illumination synthesis is used to simulate different marine uneven illuminating environments. The performance of each data augmentation method, together with previously frequently used data augmentation methods, is evaluated by Faster R-CNN on a real-world underwater dataset, which validates the effectiveness of the proposed methods for marine organisms detection and recognition.

© 2019 Elsevier B.V. All rights reserved.

* Corresponding author.
https://doi.org/10.1016/j.neucom.2019.01.084
0925-2312/© 2019 Elsevier B.V. All rights reserved.

1. Introduction

For underwater robotic capturing, marine organisms detection and recognition is becoming more and more important. Right now, seafood is mostly captured by human divers. Diver fishing not only creates severe bodily injury for divers but also results in low working efficiency, especially at water depths of more than 20 m. Robotic capturing, by using an underwater robot for seafood capturing, has been proposed to solve the existing problems of underwater diver fishing. It can not only reduce the bodily injury of divers but also lower the price of seafood.

Due to the numerous advantages of robotic capturing, fishery robotics has attracted great research effort [1–3]. Generally, robotic capturing is realized by manual remote aspiration through an umbilical cable. For example, Norway has developed a submarine harvesting Remotely Operated Vehicle (ROV) which has realized sea urchin harvesting through manual remote aspiration. However, underwater robot manipulation is a difficult problem; it needs an operator with rich experience and highly focused attention. To further cut costs and reduce the difficulties in operation, autonomous capturing is necessary. Object detection and recognition are indispensable procedures for autonomous capturing.

In object detection and recognition, traditional machine learning based algorithms were popular in the past few decades. For example, García et al. [4] realized object identification and segmentation through a generic segmentation process. Sun et al. [5] proposed an automatic recognition algorithm through color-based identification and shape-based identification. For traditional methods, shape and color features are the most frequently used for object detection and recognition. However, marine organisms show various shapes in different marine environments and their colors are similar to the seabed environment due to ecological reasons.

The popular deep learning based algorithms may improve the marine organism perception ability, as they provide a powerful framework for object detection and recognition. For instance, Sermanet et al. [6] proposed the OverFeat algorithm, using Convolutional Neural Networks (CNNs) and multi-scale sliding windows for object detection, recognition, and classification. Ren et al. [7] proposed the Faster Region-based Convolutional Neural Network (Faster R-CNN), using a Region Proposal Network (RPN) for region proposal generation and then a CNN for classification and bounding box regression.

Compared with traditional algorithms, deep learning based algorithms are more robust to varying environments such as illumination changes, motion blur, and perspective distortion. Among recent deep learning algorithms, Faster R-CNN has shown excellent performance in many respects. For example, Sa et al. [8] applied the Faster R-CNN model, together with transfer learning, to fruit detection with excellent performance. Hoang Ngan Le et al. [9] proposed a multiple-scale Faster R-CNN approach to automatically detect whether a driver is using a cell phone and whether the driver's hands are on the steering wheel. Zhang et al. [10] proposed a pedestrian detection method, using an RPN followed by boosted forests on shared, high-resolution convolutional feature maps. The above results validate the competitive detection accuracy and real-time performance of Faster R-CNN. Therefore, in this paper, we use the Faster R-CNN model to evaluate different marine organisms data augmentation methods.

Specifically, the excellent performance of Faster R-CNN is mainly attributed to its powerful feature extractor, i.e. the CNN component, which is used to extract image features. However, it should be pointed out that large amounts of training data are needed for training a Faster R-CNN model, and overfitting occurs when training a Faster R-CNN model with limited training data. Many methods can alleviate overfitting, such as transfer learning and Dropout [11], but generating more training data is a more direct and effective way. Sun et al. [12] found that a better model does not lead to substantial gains when the training set is insufficient, and that modest performance improvements are still possible with exponential increases of the data. Therefore, data augmentation methods, such as flipping, cropping, color casting, and blur, have been proposed by researchers to solve the problem of data insufficiency.

The traditional data augmentation methods are effective in most circumstances, but may lose efficacy on a marine organisms dataset due to the special underwater imaging environment. It is difficult to get enough marine organisms training data because of the complicated underwater environments, and labeling numerous underwater images is costly and time-consuming. Therefore, according to the underwater imaging environment, three data augmentation methods are proposed. First, marine turbulence can lead to underwater image degradation; the inverse process of underwater image restoration can add turbulence influence to an image and is used to simulate different levels of marine turbulence. Second, underwater robots approach the marine organisms from different directions, which implies different views of camera shooting. Therefore, perspective transformation is proposed to simulate different views of camera shooting. Third, the seafloor is usually too dark to shoot, so additional artificial light is needed for underwater imaging. However, the use of auxiliary light can result in an uneven illumination phenomenon in underwater images, thus illumination synthesis is used to simulate different marine artificial illuminating environments.

The remainder of the paper consists of the following parts. Section 2 discusses the related works. Section 3 introduces the Faster R-CNN model. Three data augmentation methods are introduced in Section 4. The experimental results are presented in Section 5. Conclusions are drawn in Section 6.

2. Related works

Adequate training data is needed for training a deep learning based object detection and recognition model. It is difficult to obtain sufficient marine organisms data in the underwater environment. Thus, data augmentation is a very important way to take full advantage of the dataset we have.

The frequently used data augmentation methods such as flipping, cropping, color casting, and blur are useful, but they only perform simple transformations and cannot meet the requirements of special situations. Therefore, different data augmentation methods have been proposed for different application environments. Lv et al. [13] proposed five data augmentation methods dedicated to face images, including landmark perturbation and four synthesis methods (hairstyles, glasses, poses, illuminations). To reduce the recognition error on plant leaves, Zhang et al. [14] adopted data augmentation methods (translation, scaling, sharpening, rotation) for reducing the degree of over-fitting. Cui et al. [15] proposed a label-preserving data augmentation approach based on stochastic feature mapping and investigated it along with vocal tract length perturbation for improving the representation of pattern variations in CNN acoustic modeling to deal with data sparsity. Charalambous and Bharath [16] introduced a simulation-based methodology and a subject-specific dataset which can be used for generating synthetic video frames and sequences for data augmentation. Rogez and Schmid [17] introduced an image-based synthesis engine that artificially augments a dataset of real images with 2D human pose annotations by using 3D Motion Capture data. Because of the special imaging environment, research on data augmentation dedicated to underwater images is relatively scarce.

According to the special underwater environment, we introduce three data augmentation methods for marine organisms images.

Simulate marine turbulence: Zhang et al. [18] proposed an underwater image degradation function based on the atmospheric turbulence model [19]. With that function, a frequency domain filter was adopted for underwater image restoration. Inspired by this article, instead of image restoration, the inverse process of the frequency domain filter is used to simulate marine turbulence under different turbulence coefficients.

Simulate different shooting angles: Different shooting angles of an underwater object show different visual effects. Wimmer et al. [20] used affine transformation to simulate different shooting angles of endoscopic imagery for the classification of celiac disease. Freifeld et al. [21] proposed a transformation method based on Continuous Piecewise-Affine (CPA) velocity fields; this method is used as a key component in a learned data augmentation scheme [22], improving the results of an image classifier. As we can see, affine transformation is used in [20,22] for data augmentation. Affine transformation is a linear transformation between 2-dimensional coordinates; it can implement translation, scaling, flipping, rotation, and shearing of a picture. The transformation between different shooting angles, however, is a 3-dimensional coordinate transformation. Thus, instead of affine transformation, perspective transformation is applied for simulating different shooting angles in this paper.

Simulate uneven illumination: Illumination characteristics have an important influence on object detection and recognition. In face recognition, due to changes in illumination conditions, the same face appears differently. Lv et al. [13] used a 3D face model reconstruction method to simulate different illuminations, which makes the face recognition model robust to different illuminations. Faraji and Qi [23] proposed to produce illumination-invariant representations of face images by using a logarithmic fractal dimension-based method and complete eight local directional patterns. Different from the facial environment, the underwater environment always suffers from insufficient illumination, and additional artificial illumination is needed for underwater imaging. The use of artificial illumination can result in uneven illumination and decrease the performance of object detection algorithms. Thus, an illumination synthesis method is used for simulating different underwater uneven illumination conditions.

Fig. 1. Architecture of marine organisms detection and recognition model.

3. Introduction and training of Faster R-CNN

Fig. 1 illustrates the architecture of the automatic marine organisms detection and recognition model, which is based on the Faster R-CNN model. A Faster R-CNN network consists of two subnetworks: the Region Proposal Network (RPN) and the Fast Region-based Convolutional Neural Network (Fast R-CNN) [24]. RPN and Fast R-CNN share the same input feature maps, which are extracted by the base convolutional network. VGG16 [25], pretrained on the ImageNet dataset, is used as the base network in this paper. RPN is used to generate proposals and Fast R-CNN is used to classify these proposals.

3.1. Region Proposal Network

The Region Proposal Network is used to generate proposals from the input feature maps. 512-dimensional feature maps are extracted by the base convolutional network (VGG16), which takes an entire image as input. The input feature maps are used as the input of a 3 × 3 spatial window; every sliding window is mapped to a 512-dimensional feature vector. This feature vector is fed into two sibling fully-connected layers: a box-regression layer and a box-classification layer. In [7], the novel concept of "anchors" was proposed. An anchor is centered at the 3 × 3 sliding window; each sliding window has nine anchors, formed by the combination of three scales [128^2, 256^2, 512^2] and three aspect ratios [1:1, 1:2, 2:1].

The box-classification layer checks whether an anchor is positive or not, and the box-regression layer outputs the coordinates of the bounding box. For training the RPN, a loss function is defined as follows:

L = \frac{1}{N_{cls}} \sum_i L_{cls}(p_i, p_i^*) + \lambda \frac{1}{N_{reg}} \sum_i p_i^* L_{reg}(t_i, t_i^*),   (1)

where i is the index of an anchor in a mini-batch, p is the probability of a proposal being an object, p^* is the true label of a proposal (when a proposal is an object, p^* = 1, otherwise p^* = 0), t and t^* are the coordinates of the predicted and ground-truth bounding boxes respectively, N_{cls} and N_{reg} are two normalization parameters, L_{cls} is the classification loss (log loss over two classes, object versus not object), and L_{reg} is the regression loss. The term p^* L_{reg} means the regression loss is activated only for positive anchors.

t_x = (x - x_a)/w_a,  t_y = (y - y_a)/h_a,  t_w = \log(w/w_a),  t_h = \log(h/h_a),
t_x^* = (x^* - x_a)/w_a,  t_y^* = (y^* - y_a)/h_a,  t_w^* = \log(w^*/w_a),  t_h^* = \log(h^*/h_a).   (2)

For bounding box regression, the parameterization of the four coordinates is shown in Eq. (2), where x and y represent the coordinates of the box center, w and h represent the width and height of the bounding box respectively, and x, x_a, and x^* are for the predicted box, anchor box, and ground-truth box respectively (likewise for y, w, and h). This can be thought of as bounding box regression from an anchor box to a nearby ground-truth box.
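To make the parameterization concrete, the following NumPy sketch (an illustration, not code from the paper; the corner-coordinate convention and the function name are assumptions) encodes a ground-truth box against an anchor box according to Eq. (2):

```python
import numpy as np

def encode_box_targets(anchor, gt):
    """Compute (tx*, ty*, tw*, th*) of Eq. (2) for one anchor/ground-truth pair.

    Both boxes are given as [x1, y1, x2, y2] corners (an assumed convention
    for this sketch); they are first converted to center/size form.
    """
    wa, ha = anchor[2] - anchor[0], anchor[3] - anchor[1]
    xa, ya = anchor[0] + 0.5 * wa, anchor[1] + 0.5 * ha
    w, h = gt[2] - gt[0], gt[3] - gt[1]
    x, y = gt[0] + 0.5 * w, gt[1] + 0.5 * h

    # Eq. (2): center offsets normalized by the anchor size, log-space scales.
    tx = (x - xa) / wa
    ty = (y - ya) / ha
    tw = np.log(w / wa)
    th = np.log(h / ha)
    return np.array([tx, ty, tw, th])

# Example: an anchor of scale 128^2 and aspect ratio 1:1 against a nearby box.
anchor = np.array([100.0, 100.0, 228.0, 228.0])
gt_box = np.array([110.0, 96.0, 250.0, 230.0])
print(encode_box_targets(anchor, gt_box))
```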
3.2. Fast R-CNN network

The Fast R-CNN network [24] is used for classifying the object proposals which are detected by the RPN. A Fast R-CNN network takes as input an entire image and a set of object proposals. The same as the RPN, the network first processes the whole image with the base convolutional network (VGG16) and produces a 512-d feature map. Then, each object proposal is mapped to a region of interest in the feature maps. After that, the RoI pooling layer uses max pooling to convert the features of each region of interest into a fixed spatial extent of 7 × 7. The 7 × 7 feature vector is fed into a sequence of fully connected layers that finally branch into two sibling output layers: one for the softmax results and another for the bounding box regression results.

The softmax results give probability estimates over K (K = 3 in this paper) object classes plus a background class. The bounding box regression results give the coordinates of the bounding boxes of the K object classes. For training of classification and bounding box regression, a multi-task loss L is defined as follows:

L(p, u, t^u, v) = L_{cls}(p, u) + \lambda [u \geq 1] L_{loc}(t^u, v),   (3)

where p = (p_0, p_1, ..., p_K) is a discrete probability distribution over K + 1 outputs, u is the true class number of the object, t^u is the coordinate vector of the predicted bounding box, v is the ground-truth bounding box, L_{cls} is the log loss for the true class u, L_{loc} is the bounding box regression loss, and λ is the hyper-parameter which controls the balance between the two tasks. This can be thought of as bounding-box regression and classification.
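As an illustration of Eq. (3), the following sketch evaluates the multi-task loss for a single proposal; the smooth L1 form of L_loc follows Fast R-CNN [24], and the function names and toy values are assumptions for this example only:

```python
import numpy as np

def smooth_l1(x):
    """Smooth L1 localization loss (choice taken from Fast R-CNN [24])."""
    ax = np.abs(x)
    return np.where(ax < 1.0, 0.5 * x ** 2, ax - 0.5).sum()

def multitask_loss(class_probs, u, t_u, v, lam=1.0):
    """Eq. (3): L = L_cls(p, u) + lambda * [u >= 1] * L_loc(t^u, v).

    class_probs: softmax probabilities over K+1 classes (index 0 = background);
    u: true class index; t_u: predicted box offsets for class u; v: targets.
    """
    l_cls = -np.log(class_probs[u] + 1e-12)                 # log loss for class u
    l_loc = smooth_l1(np.asarray(t_u) - np.asarray(v)) if u >= 1 else 0.0
    return l_cls + lam * l_loc

# Toy example with K = 3 object classes plus background.
probs = np.array([0.05, 0.80, 0.10, 0.05])
print(multitask_loss(probs, u=1, t_u=[0.1, -0.2, 0.05, 0.0], v=[0.0, 0.0, 0.0, 0.0]))
```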
3.3. Training Faster R-CNN

There are three training patterns for Faster R-CNN. The Four-Step Alternating Training method is used in this paper. This method can be summarized in four steps. First, the RPN is trained end-to-end by back-propagation and stochastic gradient descent (SGD); this network is initialized with an ImageNet pre-trained model (VGG16). Second, a separate detection network is trained by Fast R-CNN using the proposals generated by the step-1 RPN; this detection network is also initialized by VGG16. Third, the detection network's convolutional layers are shared with the RPN and only the layers unique to the RPN are fine-tuned. Fourth, the shared convolutional layers are kept fixed and the layers unique to Fast R-CNN are fine-tuned. Finally, both networks share the same convolutional layers and form a unified network.

4. Data augmentation

To promote the research of marine organisms object detection and recognition, the National Natural Science Foundation of China (NSFC) released a batch of annotated marine organisms data. The dataset contains only 1800 labeled images, which is insufficient for training a Faster R-CNN model. Therefore, data augmentation methods are used to enlarge the dataset. In this section, three data augmentation methods which are specific to underwater marine organisms images are introduced.

4.1. Simulate marine turbulence

The optical properties of water result in the bad quality of underwater images. Experiment results [18] indicate that even pure water can affect the spread of light. Scattering and absorption cause the attenuation of light. The absorption phenomenon of water can make the light intensity decay by 4% per meter. The scattering phenomenon can reduce the contrast of an underwater image and make the image blurry. In conclusion, severe degradation exists in underwater images. In reality, different marine environments may cause varying degrees of degradation in underwater images. Thus, we consider using the inverse process of underwater image restoration to simulate different degrees of degradation.

Inspired by the thought of Zhang et al. [18], who used the atmospheric turbulence model to restore underwater degraded images, we use the atmospheric turbulence model to simulate different levels of marine turbulence. The atmospheric turbulence model is defined as follows [19]:

T(u, v) = K_1 e^{-(K_2 u^2 + K_3 v^2)^{5/6}},   (4)

where u and v are the coordinates of a pixel, K_1 is a constant, and K_2 and K_3 are the scaling factors of atmospheric turbulence control. Generally, Eq. (4) can be simplified as follows:

T(u, v) = e^{-k(u^2 + v^2)^{5/6}},   (5)

where k is the turbulence coefficient. For atmospheric turbulence, k > 0.0025 means excessive turbulence, k ≈ 0.001 means intermediate turbulence, and k < 0.00025 means slight turbulence. However, in the underwater environment, marine turbulence has a larger turbulence coefficient and brings a much more serious degradation to underwater images.

For image restoration, we need to know the turbulence coefficient k of the dataset. A series of turbulence coefficients k from 0.001 to 0.002 are tried. Then, Wiener filtering with the different coefficients k is used to restore the images. The process of image restoration with Wiener filtering is described as follows:

\hat{G}(u, v) = \frac{1}{T(u, v)} \frac{|T(u, v)|^2}{|T(u, v)|^2 + Q} G(u, v),   (6)

where Q is a constant, G is the Fourier transform of the original image, T is the degradation model mentioned above, and \hat{G} is the Fourier transform of the restored image. The restored image can be obtained by applying the inverse Fourier transform to \hat{G}. The restoration results can be seen in Fig. 2. As illustrated in Fig. 2, when k = 0.001, the restoration results show little improvement. Then, with the increase of the coefficient k, the restoration results become better; in particular, when k = 0.0015, we get the best restoration results. After that, with a further increase of the coefficient k, the restoration results become overexposed. Thus, for this batch of underwater images, which were shot in the same underwater environment at the same time, the best image restoration result is obtained when k = 0.0015.

Once the best restored image has been obtained, we apply the inverse process of Wiener filtering with the degradation model of Eq. (4) to simulate different levels of turbulence interference. The inverse process is defined as follows:

F(u, v) = \hat{F}(u, v) \frac{T(u, v)\,(|T(u, v)|^2 + Q)}{|T(u, v)|^2},   (7)

where \hat{F}(u, v) is the Fourier transform of the best restored image and F(u, v) is the Fourier transform of the degraded image with turbulence coefficient k. In this paper, we choose k = {0.0005, 0.001, 0.0015, 0.002} respectively, and four augmented images are obtained per image, as shown in Fig. 3. Fig. 4 illustrates the framework of the proposed method to simulate marine turbulence for underwater images.
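The simulation pipeline of Eqs. (5)–(7) can be sketched as follows; this is a minimal single-channel NumPy illustration (the centered frequency grid, the value of Q, and the stand-in image are assumptions, not settings reported in the paper):

```python
import numpy as np

def turbulence_otf(shape, k):
    """Degradation model of Eq. (5): T(u, v) = exp(-k (u^2 + v^2)^(5/6))."""
    rows, cols = shape
    u = np.arange(rows) - rows / 2          # centered frequency coordinates
    v = np.arange(cols) - cols / 2          # (an assumed convention)
    U, V = np.meshgrid(u, v, indexing="ij")
    return np.exp(-k * (U ** 2 + V ** 2) ** (5.0 / 6.0))

def wiener_restore(img, k, Q=0.01):
    """Eq. (6): Wiener-filter restoration with turbulence coefficient k."""
    T = turbulence_otf(img.shape, k)
    G = np.fft.fftshift(np.fft.fft2(img))
    G_hat = (1.0 / T) * (np.abs(T) ** 2 / (np.abs(T) ** 2 + Q)) * G
    return np.real(np.fft.ifft2(np.fft.ifftshift(G_hat)))

def degrade(restored, k, Q=0.01):
    """Eq. (7): inverse process, re-applying turbulence of strength k."""
    T = turbulence_otf(restored.shape, k)
    F_hat = np.fft.fftshift(np.fft.fft2(restored))
    F = F_hat * T * (np.abs(T) ** 2 + Q) / (np.abs(T) ** 2)
    return np.real(np.fft.ifft2(np.fft.ifftshift(F)))

# Usage: restore once with the best coefficient, then synthesize four
# differently degraded copies as augmented samples.
img = np.random.rand(420, 705)              # stand-in for a grayscale frame
best = wiener_restore(img, k=0.0015)
augmented = [degrade(best, k) for k in (0.0005, 0.001, 0.0015, 0.002)]
```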

4.2. Simulate different shooting angles

Marine organisms like sea cucumbers, sea urchins, and scallops tend to cling to the seafloor because of their living habits. Therefore, most marine organisms images are shot from an overhead angle, and an object can be shot with different azimuth and depression angles. Affine transformation or perspective transformation can be used to simulate different shooting angles of an underwater image.

Affine transformation is a frequently used data augmentation method and an extremely powerful tool in image processing. It is a linear transformation between 2-dimensional coordinates; it can implement translation, scaling, flipping, rotation, and shearing of a picture. However, the transformation between different shooting angles is a transformation between 3-dimensional coordinates. Perspective transformation imagines the 2D projection as though the object were being viewed through a camera viewfinder; the camera's position, orientation, and field of view control the behavior of the perspective transformation. Fig. 5 shows underwater images transformed by affine transformation and perspective transformation respectively; the comparison between Fig. 5b and 5c indicates that perspective transformation has a better visual effect in simulating different shooting angles of an image. Therefore, unlike most papers that use affine transformation for data augmentation, perspective transformation is used for data augmentation in this paper.

Under homography, we can write the transformation of points [26] in 3-dimensional space between camera 1 and camera 2 as:

X_2 = H X_1,  X_1, X_2 \in \mathbb{R}^3.   (8)

Fig. 2. Restored images with different turbulence coefficients k.

Fig. 3. Augmented images with different turbulence coefficients k.

Fig. 4. Framework of the marine turbulence simulation based data augmentation method.

In the image planes, with homogeneous coordinates, we have:

\lambda_1 x_1 = X_1, \quad \lambda_2 x_2 = X_2.   (9)

Therefore:

\lambda_2 x_2 = H \lambda_1 x_1,   (10)

where \lambda_1 and \lambda_2 are the scale coefficients and H is the homography matrix. Ignoring the scales, we know that x_2 equals H x_1. Note that x_2 ∼ H x_1 is a direct mapping between points in the image planes. From the equation x_2 ∼ H x_1, we can get the constraint equation:

\begin{bmatrix} x_2 \\ y_2 \\ z_2 \end{bmatrix} = \begin{bmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & h_{33} \end{bmatrix} \begin{bmatrix} x_1 \\ y_1 \\ z_1 \end{bmatrix}.   (11)

Fig. 5. Comparison between affine transform and perspective transform. The affine transformation matrix H_at and the perspective transformation matrix H_pt are described in Appendix A.

From Eq. (11), taking x = x_2/z_2 and y = y_2/z_2, we get:

x = \frac{h_{11} x_1 + h_{12} y_1 + h_{13} z_1}{h_{31} x_1 + h_{32} y_1 + h_{33} z_1}, \quad y = \frac{h_{21} x_1 + h_{22} y_1 + h_{23} z_1}{h_{31} x_1 + h_{32} y_1 + h_{33} z_1}.   (12)

Without loss of generality, set z_1 = 1 and we get:

x = \frac{h_{11} x_1 + h_{12} y_1 + h_{13}}{h_{31} x_1 + h_{32} y_1 + h_{33}}, \quad y = \frac{h_{21} x_1 + h_{22} y_1 + h_{23}}{h_{31} x_1 + h_{32} y_1 + h_{33}}.   (13)

The method for solving the homography matrix H can be found in [26]. From Eq. (13) we know that after the homography matrix has been determined, we can use it to transform coordinates between different image planes. Fig. 6 shows images transformed by different perspective transformation matrices. Fig. 7 illustrates the framework of the proposed method to simulate different shooting angles for underwater images.
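A minimal OpenCV sketch of this augmentation step is given below; the matrix H1 is taken from Appendix B (Eq. (B.1)), while the helper name and the way bounding-box labels are re-computed from warped corners are illustrative assumptions, since the paper does not detail label handling:

```python
import cv2
import numpy as np

# H1 from Appendix B (Eq. (B.1)), one of the eleven matrices used in the paper.
H1 = np.array([[ 0.3446,   -0.3369,   114.7 ],
               [-0.1367,    0.3914,    62.67],
               [-0.0006160, -0.001123,  1.000]], dtype=np.float64)

def warp_image_and_boxes(image, boxes, H):
    """Apply a perspective (homography) transform to an image and its boxes.

    boxes: list of [x1, y1, x2, y2] ground-truth boxes. Each warped box is
    re-computed as the axis-aligned bounding box of its four warped corners,
    so the labels follow the augmented image.
    """
    h, w = image.shape[:2]
    warped = cv2.warpPerspective(image, H, (w, h))

    new_boxes = []
    for x1, y1, x2, y2 in boxes:
        corners = np.array([[[x1, y1]], [[x2, y1]], [[x2, y2]], [[x1, y2]]],
                           dtype=np.float32)
        tc = cv2.perspectiveTransform(corners, H).reshape(-1, 2)  # Eq. (13)
        new_boxes.append([tc[:, 0].min(), tc[:, 1].min(),
                          tc[:, 0].max(), tc[:, 1].max()])
    return warped, np.array(new_boxes)

# Example on a stand-in 420x705 image with one annotated object.
img = np.zeros((420, 705, 3), dtype=np.uint8)
aug_img, aug_boxes = warp_image_and_boxes(img, [[300, 200, 360, 260]], H1)
```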

4.3. Simulate uneven illumination

Illumination is an important factor influencing the robustness of a dataset; different types of illumination conditions may affect the results of object detection and recognition. In order to improve the robustness of the dataset to different illumination types, an illumination synthesis method is used for data augmentation [13,27,28]. The seafloor usually suffers from an illumination shortage for underwater camera shooting, so an external light source is needed. However, different types of external light have different impacts on the underwater image, thus illumination synthesis is used for underwater image data augmentation.

Different from the illumination synthesis methods used in face recognition, illumination synthesis for underwater images is more complicated because of the uneven illumination phenomenon. Uneven illumination means the tone, luminance, or contrast is different in different parts within an image frame [30]. A projected pattern [31] emanates from a single point, and it can be impossible to ensure even illumination for every point in the scene, whether close to or far from the projector. In addition, inhomogeneities in the water will result in different brightness values being captured for different pixels across the surface of the object. Hence, the uneven illumination phenomenon is inevitable when using an auxiliary light source in the underwater environment. Fig. 8 shows uneven illumination phenomena common in underwater images.

According to the characteristics of underwater uneven illumination, we first collect plenty of underwater images with uneven illumination, as shown in Fig. 8. Then, the light intensity distribution templet of each image is calculated with the rectangular averaging filter of the linear spatial filtering method [32]. The spatial filtering operation can be described as:

E = O \star X = \frac{1}{MN} \sum_i \sum_j O(x - i, y - j)\, X(i, j),   (14)

where O is the uneven illumination image and E is the light intensity distribution templet of O. X is the convolution mask, which is an identity (all-ones averaging) matrix; the height M and width N of X are set as 1/6 of the image height and width. The notation "\star" means the convolution operation. The light intensity distribution templets of the images in Fig. 8 are shown in Fig. 9.

For simulating the uneven illumination phenomenon in underwater images, the light intensity distribution templet E is first resized to the same size as the underwater image I, then its average gray value a is calculated. To further reflect the uneven characteristic, we calculate the difference between the templet E and its average value a as follows:

D(x, y) = E(x, y) - a, \quad x = 1, \ldots, m, \; y = 1, \ldots, n,   (15)

where m and n are the height and width of templet E, respectively, and D is the difference templet of templet E. After the difference templet D has been determined, we fuse it with the underwater image I as follows:

A = I \oplus D, \quad A(x, y) = I(x, y) + D(x, y),   (16)

where A is the augmented image and the notation "\oplus" means element-wise matrix addition. Fig. 10 shows illumination synthesis results for underwater images. Fig. 11 illustrates the framework of the proposed method to simulate uneven illumination for underwater images.
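A compact sketch of Eqs. (14)–(16) is given below, assuming single-channel images; the box-filter implementation, the clipping to [0, 255], and the file names are assumptions for illustration:

```python
import cv2
import numpy as np

def light_templet(uneven_img):
    """Eq. (14): light intensity distribution templet via rectangular averaging.

    The averaging-kernel size is 1/6 of the image width and height.
    """
    h, w = uneven_img.shape[:2]
    return cv2.blur(uneven_img.astype(np.float32), (max(w // 6, 1), max(h // 6, 1)))

def synthesize_uneven(image, uneven_reference):
    """Eqs. (15)-(16): transfer an uneven-illumination pattern onto `image`."""
    E = light_templet(uneven_reference)
    E = cv2.resize(E, (image.shape[1], image.shape[0]))   # match target size
    D = E - E.mean()                                      # Eq. (15): difference templet
    A = image.astype(np.float32) + D                      # Eq. (16): fusion
    return np.clip(A, 0, 255).astype(np.uint8)

# Example with grayscale stand-ins for a reference frame and a training image
# (hypothetical file names).
reference = cv2.imread("uneven_example.png", cv2.IMREAD_GRAYSCALE)
train_img = cv2.imread("organism.png", cv2.IMREAD_GRAYSCALE)
augmented = synthesize_uneven(train_img, reference)
```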
5. Experiment

5.1. Experimental setup

The marine organisms dataset, which contains 1800 labeled images (705 × 420), is provided by the NSFC for research purposes. The dataset was divided into two parts randomly: a training set (1200 labeled training images) and a test set (600 labeled test images). The three data augmentation methods are used to augment the original dataset (both training set and test set); detailed information about the augmented datasets is shown in Table 1. For simulating marine turbulence, four turbulence coefficients k = {0.0005, 0.001, 0.0015, 0.002} are used to augment the original dataset; then an augmented training set AD_1 containing 6000 labeled images and four test sets, each with 600 labeled test images, are obtained. For simulating different shooting angles, we choose 11 perspective transformation matrices to augment the original dataset, and we get a training set AD_2 containing 14,400 labeled images and 11 test sets. Four uneven illumination templets are collected to simulate different uneven illumination phenomena; then a training set AD_3 with 6000 labeled images and four test sets are obtained.

Fig. 6. Augmented images by perspective transformation. Perspective transformation matrices H1 to H15 are described in Appendix B.

Table 1. Details of the augmented datasets.

Methods | Training set | Test sets
Marine Turbulence | AD_1: 6000 = 1200 + 1200 × 4 | 4 × 600
Shooting Angle | AD_2: 14,400 = 1200 + 1200 × 11 | 11 × 600
Uneven Illumination | AD_3: 6000 = 1200 + 1200 × 4 | 4 × 600

Mean Average Precision (mAP) [33] scores and the Precision-Recall (PR) curve are used for result evaluation. As its name suggests, the mAP score is the mean of the average precision (AP) scores over the classes. The AP is defined as follows [34]:

AP = \int_0^1 p(r)\, dr,   (17)

where p is the precision rate and r is the recall rate. In this paper, mAP scores are reported using an Intersection-over-Union (IoU) threshold of 0.5. Usually, the bigger the mAP score, the better the detection results. The PR curve [35], which plots the true positive rate (TPR) against the positive predictive value (PPV) of a detector, is a visual representation of an algorithm's performance. A detector with higher PPV and higher TPR indicates a better verification and discrimination ability.
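For reference, a simplified computation of Eq. (17) from a ranked list of detections is sketched below; it integrates the raw precision-recall curve, whereas the PASCAL VOC protocol [33] additionally interpolates precision, so the exact numbers may differ slightly:

```python
import numpy as np

def average_precision(scores, is_true_positive, num_gt):
    """Eq. (17) in discrete form: area under the precision-recall curve.

    scores: detection confidences; is_true_positive: 1 if the detection matches
    a ground-truth box with IoU >= 0.5, else 0; num_gt: number of ground-truth
    objects of this class. mAP is the mean of this value over the classes.
    """
    order = np.argsort(-np.asarray(scores))
    tp = np.asarray(is_true_positive, dtype=float)[order]
    fp = 1.0 - tp
    cum_tp, cum_fp = np.cumsum(tp), np.cumsum(fp)
    recall = cum_tp / max(num_gt, 1)
    precision = cum_tp / np.maximum(cum_tp + cum_fp, 1e-12)
    # Integrate p(r) dr as a sum of precision * recall increments.
    return float(np.sum(precision * np.diff(np.concatenate(([0.0], recall)))))

# Toy example: five detections of one class against four ground-truth objects.
print(average_precision([0.9, 0.8, 0.7, 0.6, 0.5], [1, 0, 1, 1, 0], num_gt=4))
```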
Experiments are run on an NVIDIA GTX 1080 GPU with 8 GB of graphics memory. In SGD training, one image is randomly sampled in each mini-batch, and each iteration uses one mini-batch. We set the learning rates as 0.001 and 0.0001 in the first 2/3 and the last 1/3 of the iterations, respectively.

Fig. 7. Framework of the shooting angle simulation based data augmentation method.

Fig. 8. Underwater uneven illumination images, taken by Li and Ge [29] in the Liuqinghe Seas, East China Sea.

Fig. 9. The extracted light intensity distribution templets.

Fig. 10. The illumination synthesis results.

5.2. Robustness to marine turbulence

We verify the robustness of the Faster R-CNN model trained by training set AD_1 to turbulence coefficient variations. Training set AD_1 contains 1200 original images and 4800 images augmented by the marine turbulence simulation based augmentation method. Four test sets, augmented by the marine turbulence simulation based method with the four different turbulence coefficients k = {0.0005, 0.001, 0.0015, 0.002}, are tested in this experiment.

The results on marine organisms detection and recognition are shown in Table 2. The Baseline results are tested by the Faster R-CNN model trained on the original training set, and the Turbulence Sim results are tested by the Faster R-CNN model trained on training set AD_1. As illustrated in Table 2, the Baseline is sensitive to different marine turbulence; in particular, with the increase of the turbulence coefficient k, the detection results show a slight decline. For Turbulence Sim, however, the average mAP score is 51.80, which is much higher than the Baseline, meaning that the Faster R-CNN model trained by the augmented training set AD_1 is robust to marine turbulence variations.

Fig. 11. Framework of the uneven illumination simulation based data augmentation method.

Table 2. Performance on the test sets augmented by the marine turbulence simulation based method.

Methods | 0.0005 | 0.001 | 0.0015 | 0.002 | Ave.
Baseline | 36.74 | 33.63 | 32.77 | 32.63 | 33.94
Turbulence Sim | 52.77 | 52.60 | 52.77 | 49.30 | 51.80

5.3. Robustness to different shooting angles

The performance of the shooting angle simulation based data augmentation method is tested in this experiment. Two Faster R-CNN models are trained by training set AD_2 and the original training set, respectively. Training set AD_2, which is augmented by the shooting angle simulation based data augmentation method, contains 1200 original images and 13,200 augmented images (14,400 in total). Eleven test sets, augmented by the shooting angle simulation based data augmentation method with eleven different perspective transformation matrices, are tested in this experiment. The perspective transformation matrices H1 to H11 are those shown in Fig. 6 and are described in Appendix B.

The test results are listed in Table 3, where the Baseline results are tested by the Faster R-CNN model trained on the original training set, and the Angle Sim results are tested by the Faster R-CNN model trained on training set AD_2. From Table 3 we can see that the shooting angle simulation based data augmentation greatly improves the marine organisms detection and recognition performance of the Faster R-CNN model.

Table 3. Performance on the test sets augmented by the shooting angle simulation based method.

Methods | Baseline | Angle Sim
H1 | 35.77 | 51.03
H2 | 38.07 | 55.48
H3 | 37.34 | 54.80
H4 | 32.48 | 52.01
H5 | 38.13 | 51.61
H6 | 23.07 | 52.20
H7 | 24.72 | 52.29
H8 | 41.86 | 51.33
H9 | 23.83 | 50.33
H10 | 39.19 | 49.29
H11 | 33.17 | 52.35
Ave. | 33.57 | 52.25

5.4. Robustness to illumination model

We evaluate the robustness of our Faster R-CNN model trained by training set AD_3 to uneven illumination variations. Training set AD_3, which is augmented by the uneven illumination simulation based data augmentation method, contains 1200 original images and 4800 augmented images. Four test sets, augmented by the uneven illumination simulation based data augmentation method with four uneven illumination templets, are tested in this experiment. The four uneven illumination templets are shown in Fig. 9 and we name them Temp-1 to Temp-4.

The results of marine organisms detection and recognition with uneven illumination are shown in Table 4. The Baseline results are tested by the Faster R-CNN model trained on the original training set, and the Illumination Sim results are tested by the Faster R-CNN model trained on training set AD_3. Comparing the results of the Baseline with the Illumination Sim, we can see that the uneven illumination simulation based data augmentation method significantly increases the performance of the Faster R-CNN model on marine organisms detection and recognition.

Table 4. Performance on the test sets augmented by the uneven illumination simulation based method.

Methods | Temp-1 | Temp-2 | Temp-3 | Temp-4 | Ave.
Baseline | 32.51 | 34.61 | 28.81 | 28.84 | 31.19
Illumin Sim | 54.69 | 57.56 | 54.21 | 54.32 | 55.20

5.5. Comparison of data augmentation methods

We evaluate the performance of different data augmentation methods in this section. Faster R-CNN models are trained by the different augmented training sets listed below (O-1, O-2 and O-3 are frequently used data augmentation methods):

• Baseline: no data augmentation;
• O-1: flipping;
• O-2: color casting [36];
• O-3: noise;
• A: simulating different marine turbulences;
• B: simulating different shooting angles;
• C: simulating different uneven illuminations;
• Fusion: fusing augmented training sets A, B and C.

The original test set, which contains 600 labeled images, is used as the test set of this experiment.

Table 5 shows the marine organisms detection and recognition results obtained by using the proposed data augmentation methods, as well as frequently used data augmentation methods like flipping, color casting, noise, etc.

Fig. 12. The Baseline (left) vs. the Fusion (right) on marine organisms detection and recognition.

Compared with the Baseline, O-1, O-2, O-3, A, B, C and the Fusion improve the marine organisms detection results by 9.18%, −1.75%, 3.48%, 17.46%, 16.84%, 15.45% and 20.44%, respectively. All the data augmentation methods improve the performance of marine organisms detection and recognition except color casting. The reason why color casting declines the performance of the Faster R-CNN model is that the background of the marine organisms images is blue, as can be seen from Fig. 12; color casting changes the color of the background and therefore influences the performance of the Faster R-CNN model. Meanwhile, the underwater image directed data augmentation methods have better performance than the frequently used data augmentation methods: A, B and C achieve marine organisms detection performances of 58.45, 58.14 and 57.45 respectively, much higher than the performances of O-1, O-2 and O-3, which are 54.33, 48.89 and 51.49 respectively.

Fig. 13. PR curves of different data augmentation methods.

Table 5. Test results of various data augmentation methods on the original test set. Bold type indicates the best results.

Methods | Ave. | Sea urchin | Sea cucumber | Scallop
Baseline | 49.76 | 15.12 | 71.12 | 63.03
O-1 | 54.33 | 29.18 | 70.62 | 63.19
O-2 | 48.89 | 22.23 | 70.15 | 53.30
O-3 | 51.49 | 20.23 | 71.63 | 62.62
A | 58.45 | 40.89 | 71.08 | 63.39
B | 58.14 | 32.08 | 79.79 | 62.50
C | 57.45 | 39.26 | 70.16 | 62.91
Fusion | 59.93 | 46.82 | 69.79 | 63.19

This is expected and confirms that the underwater image directed data augmentation methods actually work in the underwater environment. The Fusion, a dataset concatenated from the data augmentation methods A, B and C, achieves the best detection and recognition result of 59.93, improving the performance by 20.44% compared with the Baseline. The test results of the Baseline and the Fusion are visualized in Fig. 12. As illustrated in Fig. 12, the Fusion achieves better results than the Baseline.

Since we use the same detector for all methods, the PR curve can illustrate the performance of the data augmentation methods on the different marine organisms categories. In Fig. 13, we plot the PR curves for the three categories of the marine organisms dataset. In conjunction with Fig. 13 and Table 5 we can see that the detection performances on sea urchin and scallop are analogous under different data augmentation methods; however, the detection performances on sea cucumber are quite different under different data augmentation methods. The reason why the detection performance differs between categories is that the sample sizes of the different categories are imbalanced in the marine organisms dataset. For example, in Fig. 12 we can see that the average sample sizes of sea urchin and scallop are far larger than that of sea cucumber, which means that the sample sizes of sea urchin and scallop are saturated, while the sample size of sea cucumber is insufficient. Therefore, the data augmentation methods bring a better improvement on sea cucumber detection than on sea urchin detection and scallop detection.

6. Conclusion

Three underwater image data augmentation methods have been proposed in this paper for improving the performance of marine organisms detection and recognition. Experiment results confirm that the proposed three data augmentation methods increase the robustness of Faster R-CNN to marine turbulence variations, shooting angle variations, and uneven illumination variations, respectively. When compared with frequently used data augmentation methods, our methods, which are dedicated to marine organisms images, are more effective in marine organisms detection and recognition. In particular, by fusing the datasets from the different data augmentation methods, Faster R-CNN achieves the best performance. In conclusion, the proposed methods show robustness to underwater environment variations and outperform the previous data augmentation methods on marine organisms detection and recognition.

Acknowledgments

Research supported by the National Natural Science Foundation of China under grants Nos. 61633009, 51570953, 51209050, 61503383, and the National Key Research and Development Plan of China under grant 2016YFC0300801.

Appendix A. Details of affine transformation matrix Hat and perspective transformation matrix Hpt

H_at = [0.1362, 0.1354, 0; −0.07246, 0.7157, 50.00],   (A.1)

H_pt = [0.5781, 0.1285, 0; −0.07246, 0.6744, 50.00; −0.0003982, 0, 1.000].   (A.2)

Appendix B. Details of perspective transformation matrices H1 to H15

H1 = [0.3446, −0.3369, 114.7; −0.1367, 0.3914, 62.67; −0.0006160, −0.001123, 1.000],   (B.1)
H2 = [0.4480, 0.1965, 73.08; −0.1415, 0.5370, 59.32; −0.0006680, −0.0006549, 1.000],   (B.2)
H3 = [0.4468, −0.1801, 73.42; −0.1437, 0.6061, 56.80; −0.0006926, −0.0004329, 1.000],   (B.3)
H4 = [0.7538, −0.1897, 53.33; −0.1128, 0.6872, 56.41; −0.0001382, −0.0003846, 1.000],   (B.4)
H5 = [0.5908, 0.1991, 44.73; −0.1540, 0.9177, 45.35; −0.0008039, 0.0005688, 1.000],   (B.5)
H6 = [0.5921, 0.05934, 49.73; −0.1499, 0.7935, 49.92; −0.0007596, 0.0001695, 1.000],   (B.6)
H7 = [0.6493, −0.1038, 51.50; −0.1403, 0.6927, 53.56; −0.0005814, −0.0002966, 1.000],   (B.7)
H8 = [0.7762, −0.1070, 42.37; −0.05289, 0.7903, 23.10; −0.0002935, −0.0003057, 1.000],   (B.8)
H9 = [2.0697, 0.1005, −84.32; 0.2607, 1.5784, −41.15; 0.002265, 0.001000, 1.000],   (B.9)
H10 = [1.6316, 0, −44.74; 0.2237, 1.184, −22.3684; 0.001842, 0, 1.000],   (B.10)
H11 = [1.795, 0.2756, −69.87; 0.08013, 1.676, −30.13; 0.001603, 0.001474, 1.000],   (B.11)
H12 = [0.8531, 0.1930, 5.429; −0.02154, 1.087, −2.025; −0.0004308, 0.0009393, 1.000],   (B.12)
H13 = [0.6723, 0.08150, 25.47; −0.01698, 0.8217, 8.998; −0.0003396, 0, 1.000],   (B.13)
H14 = [0.7252, −0.1298, 38.93; 0, 0.6794, 13.74; 0, −0.0009160, 1.000],   (B.14)
H15 = [0.7518, −0.1719, 39.79; 0.1009, 0.5674, 9.889; 0.0003364, −0.001332, 1.000].   (B.15)

References

[1] M. Takagi, H. Mori, A. Yimit, Y. Hagihara, T. Miyoshi, Development of a small size underwater robot for observing fisheries resources: underwater robot for assisting abalone fishing, J. Robot. Mechatron. 28 (3) (2016) 397–403.
[2] K. Koreitem, Y. Girdhar, W. Cho, H. Singh, J. Pineda, G. Dudek, Subsea fauna enumeration using vision-based marine robots, in: Proceedings of the IEEE Conference on Computer and Robot Vision, 2016, pp. 101–108.
[3] S. Swart, J. Zietsman, J. Coetzee, D. Goslett, A. Hoek, D. Needham, P. Monteiro, Ocean robotics in support of fisheries research and management, African J. Mar. Sci. 38 (4) (2016) 525–538.
[4] J. García, J. Fernández, P. Sanz, R. Marín, Increasing autonomy within underwater intervention scenarios: the user interface approach, in: Proceedings of the IEEE Systems Conference, 2010, pp. 71–75.
[5] F. Sun, J. Yu, S. Chen, D. Xu, Active visual tracking of free-swimming robotic fish based on automatic recognition, in: Proceedings of the IEEE Conference on Intelligent Control and Automation, 2014, pp. 2879–2884.
[6] P. Sermanet, D. Eigen, X. Zhang, M. Mathieu, R. Fergus, Y. LeCun, OverFeat: integrated recognition, localization and detection using convolutional networks, 2013, preprint arXiv:1312.6229.
[7] S. Ren, K. He, R. Girshick, J. Sun, Faster R-CNN: towards real-time object detection with region proposal networks, in: Proceedings of the Advances in Neural Information Processing Systems, 2015, pp. 91–99.
[8] I. Sa, Z. Ge, F. Dayoub, B. Upcroft, T. Perez, C. McCool, DeepFruits: a fruit detection system using deep neural networks, Sensors 16 (8) (2016) 1222.
[9] T. Hoang Ngan Le, Y. Zheng, C. Zhu, K. Luu, M. Savvides, Multiple scale Faster R-CNN approach to driver's cell-phone usage and hands on steering wheel detection, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2016, pp. 46–53.
[10] L. Zhang, L. Lin, X. Liang, K. He, Is Faster R-CNN doing well for pedestrian detection? in: Proceedings of the European Conference on Computer Vision, Springer, 2016, pp. 443–457.
[11] G.E. Hinton, N. Srivastava, A. Krizhevsky, I. Sutskever, R.R. Salakhutdinov, Improving neural networks by preventing co-adaptation of feature detectors, 2012, preprint arXiv:1207.0580.
[12] C. Sun, A. Shrivastava, S. Singh, A. Gupta, Revisiting unreasonable effectiveness of data in deep learning era, in: Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 843–852.
[13] J. Lv, X. Shao, J. Huang, X. Zhou, X. Zhou, Data augmentation for face recognition, Neurocomputing 230 (2017) 184–196.
[14] C. Zhang, P. Zhou, C. Li, L. Liu, A convolutional neural network for leaves recognition using data augmentation, in: Proceedings of the IEEE Conference on Computer and Information Technology; Ubiquitous Computing and Communications; Dependable, Autonomic and Secure Computing; Pervasive Intelligence and Computing, 2015, pp. 2143–2150.
[15] X. Cui, V. Goel, B. Kingsbury, Data augmentation for deep neural network acoustic modeling, IEEE/ACM Trans. Audio Speech Lang. Process. 23 (9) (2015) 1469–1477.
[16] C.C. Charalambous, A.A. Bharath, A data augmentation methodology for training machine/deep learning gait recognition algorithms, 2016, preprint arXiv:1610.07570.
[17] G. Rogez, C. Schmid, MoCap-guided data augmentation for 3D pose estimation in the wild, in: Proceedings of the Advances in Neural Information Processing Systems, 2016, pp. 3108–3116.

[18] H. Zhang, Y. Xu, L. Wan, X. Tang, H. Cai, Processing method for underwater degenerative image, J. Tianj. Univ. 43 (9) (2010) 827–834.
[19] R. Hufnagel, N. Stanley, Modulation transfer function associated with image transmission through turbulent media, J. Opt. Soc. Am. 54 (1) (1964) 52–61.
[20] G. Wimmer, A. Uhl, A. Vecsei, Evaluation of domain specific data augmentation techniques for the classification of celiac disease using endoscopic imagery, in: Proceedings of the IEEE International Workshop on Multimedia Signal Processing, 2017, pp. 1–6.
[21] O. Freifeld, S. Hauberg, K. Batmanghelich, J.W. Fisher, Transformations based on continuous piecewise-affine velocity fields, IEEE Trans. Pattern Anal. Mach. Intell. 39 (12) (2017) 2496–2509.
[22] S. Hauberg, O. Freifeld, A.B.L. Larsen, J. Fisher, L. Hansen, Dreaming more data: class-dependent distributions over diffeomorphisms for learned data augmentation, in: Proceedings of the Artificial Intelligence and Statistics, 2016, pp. 342–350.
[23] M.R. Faraji, X. Qi, Face recognition under varying illuminations using logarithmic fractal dimension-based complete eight local directional patterns, Neurocomputing 199 (2016) 16–30.
[24] R. Girshick, Fast R-CNN, in: Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 1440–1448.
[25] K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition, 2014, preprint arXiv:1409.1556.
[26] E. Dubrofsky, Homography Estimation, Master's thesis, University of British Columbia, Vancouver, 2009.
[27] Y.K. Park, S.K. Park, J.K. Kim, Retinex method based on adaptive smoothing for illumination invariant face recognition, Signal Process. 88 (8) (2008) 1929–1945.
[28] D. Jiang, Y. Hu, S. Yan, L. Zhang, H. Zhang, W. Gao, Efficient 3D reconstruction for face recognition, Pattern Recogn. 38 (6) (2005) 787–798.
[29] Q. Li, Z. Ge, A visibility improving algorithm based on underwater imaging model with non-uniform illumination, J. Optoelectron. Laser 12 (2011) 023.
[30] M. Wang, J. Pan, S. Chen, H. Li, A method of removing the uneven illumination phenomenon for optical remote sensing image, in: Proceedings of the IEEE International Geoscience and Remote Sensing Symposium, vol. 5, 2005, pp. 3243–3246.
[31] L. Ma, R.C. Staunton, A modified fuzzy c-means image segmentation algorithm for use with uneven illumination patterns, Pattern Recogn. 40 (11) (2007) 3005–3011.
[32] R.C. Gonzalez, R.E. Woods, S.L. Eddins, Digital Image Processing Using MATLAB, Prentice-Hall, Inc., 2003.
[33] M. Everingham, L. Van Gool, C.K. Williams, J. Winn, A. Zisserman, The PASCAL visual object classes (VOC) challenge, Int. J. Comput. Vis. 88 (2) (2010) 303–338.
[34] A. Turpin, F. Scholer, User performance versus precision measures for simple search tasks, in: Proceedings of the ACM International Conference on Research and Development in Information Retrieval, 2006, pp. 11–18.
[35] C.D. Manning, H. Schütze, Foundations of Statistical Natural Language Processing, MIT Press, 1999.
[36] R. Wu, S. Yan, Y. Shan, Q. Dang, G. Sun, Deep Image: scaling up image recognition, 2015, arXiv:1501.02876.

Hai Huang received the B.S. and Ph.D. degrees in mechanical engineering from Harbin Institute of Technology, Harbin, China, in 2001 and 2008, respectively. He is currently an Associate Professor and a Ph.D. Candidate Supervisor with the National Key Laboratory of Science and Technology on Underwater Vehicle, Harbin Engineering University, Harbin, China. His current research interests include underwater vehicles and autonomous operation.

Hao Zhou received the B.S. degree in Naval Architecture and Ocean Engineering from Harbin Engineering University in 2016. He is currently pursuing an M.S. degree in Computer Vision at the National Key Laboratory of Science and Technology on Underwater Vehicle, Harbin Engineering University. His research interests include object detection, binocular vision, and deep learning.

Xu Yang received the B.S. degree in Automation from Ocean University of China and the Ph.D. degree in Pattern Recognition and Intelligent Systems from the Institute of Automation, Chinese Academy of Sciences, Beijing, China, in 2009 and 2014, respectively. He is currently an Associate Professor with the State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China. His current research interests include underwater image processing and graph matching.

Lu Zhang received the B.S. degree in Automation from Shandong University in 2016. Since September 2016, she has been a Ph.D. candidate at the State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences (CASIA). Her research interests include computer vision and pattern recognition, especially image processing and object detection.

Lu Qi received the B.S. degree in Automation from Shandong University and the M.S. degree in Pattern Recognition and Intelligent Systems from the Institute of Automation, Chinese Academy of Sciences, Beijing, China, in 2014 and 2017. He is currently a Ph.D. student in the Computer Science and Engineering Department, The Chinese University of Hong Kong. His current research interests include object detection and instance segmentation.

Aiyun Zang received the Ph.D. degree in control theory and engineering from the Institute of Automation, Chinese Academy of Sciences, Beijing, China, in 2004. She is now working at the automation and control system department, Ocean University of China, Qingdao, China. Her current research interests include underwater image processing and graph matching.
