Neurocomputing 337 (2019) 372–384

H. Huang, H. Zhou, X. Yang, L. Zhang, L. Qi, A.-Y. Zang

Article history: Received 18 March 2018; Revised 22 November 2018; Accepted 18 January 2019; Available online 2 February 2019. Communicated by Dr Shenglan Liu.

Keywords: Data augmentation; Underwater-imaging; Marine organisms; Detection and recognition; Faster R-CNN

Abstract

Recently, the Faster Region-based Convolutional Neural Network (Faster R-CNN) has achieved marvelous accomplishments in object detection and recognition. In this paper, Faster R-CNN is applied to marine organisms detection and recognition. However, the training of Faster R-CNN requires a mass of labeled samples, which are difficult to obtain for marine organisms. Therefore, three data augmentation methods dedicated to underwater imaging are proposed. Specifically, the inverse process of underwater image restoration is used to simulate different marine turbulence environments. Perspective transformation is proposed to simulate different views of camera shooting. Illumination synthesis is used to simulate different marine uneven illuminating environments. The performance of each data augmentation method, together with previously frequently used data augmentation methods, is evaluated by Faster R-CNN on a real-world underwater dataset, which validates the effectiveness of the proposed methods for marine organisms detection and recognition.
1. Introduction

For underwater robotic capturing, marine organisms detection and recognition is becoming more and more important. Right now, seafood is mostly captured by human divers. Diver fishing not only causes severe bodily injury to divers but also results in low working efficiency, especially at water depths of more than 20 m. Robotic capturing, which uses underwater robots for seafood capturing, has been proposed to solve the existing problems of underwater diver fishing. It can not only reduce the bodily injury of divers but also lower the price of seafood.

Due to the numerous advantages of robotic capturing, fishery robotics has attracted great research efforts [1–3]. Generally, robotic capturing is realized by remote aspiration operated manually through an umbilical cable. For example, Norway has developed a submarine harvesting Remotely Operated Vehicle (ROV) which has realized sea urchin harvesting through manual remote aspiration. However, underwater robot manipulation is a difficult problem: it needs an operator with rich experience and highly focused attention. To further cut costs and reduce the difficulties in operation, autonomous capturing is necessary. Object detection and recognition are indispensable procedures for autonomous capturing.

In object detection and recognition, traditional machine learning based algorithms were popular in the past few decades. For example, García et al. [4] realized object identification and segmentation through a generic segmentation process. Sun et al. [5] proposed an automatic recognition algorithm through color-based identification and shape-based identification. For traditional methods, shape and color features are the most frequently used for object detection and recognition. However, marine organisms show various shapes in different marine environments and their colors are similar to the seabed environment due to ecological reasons.

The popular deep learning based algorithms may improve the marine organism perception ability, providing a powerful framework for object detection and recognition. For instance, Sermanet et al. [6] proposed the OverFeat algorithm, using Convolutional Neural Networks (CNNs) and multi-scale sliding windows for object detection, recognition, and classification. Ren et al. [7] proposed the Faster Region-based Convolutional Neural Network (Faster R-CNN), using a Region Proposal Network (RPN) for region proposal generation and then using a CNN for classification and bounding box regression.
Compared with the traditional algorithms, deep learning based algorithms are more robust to varying environments such as illumination changes, motion blur, and perspective distortion. Among recent deep learning algorithms, Faster R-CNN has shown excellent performance in many applications. For example, Sa et al. [8] applied the Faster R-CNN model, together with transfer learning, to fruit detection with excellent performance. Hoang et al. [9] proposed a multiple-scale Faster R-CNN approach to automatically detect whether a driver is using a cell-phone and whether the driver's hands are on the steering wheel. Zhang et al. [10] proposed a pedestrian detection method, using an RPN followed by boosted forests on shared, high-resolution convolutional feature maps. The above results validate the competitive detection accuracy and real-time performance of Faster R-CNN. Therefore, in this paper, we use the Faster R-CNN model to evaluate different marine organisms data augmentation methods.

Specifically, the excellent performance of Faster R-CNN is mainly attributed to its powerful feature extractor, i.e., the CNN component, which is used to extract image features. However, it should be pointed out that large amounts of training data are needed to train a Faster R-CNN model. Overfitting occurs when training a Faster R-CNN model with limited training data. Many methods can alleviate overfitting, such as transfer learning, Dropout [11], etc. However, generating more training data is a more direct and effective way. Sun et al. [12] found that a better model does not lead to substantial gains when the training set is insufficient, and that modest performance improvements are still possible with exponential increases of the data. Therefore, data augmentation methods, such as flipping, cropping, color casting, and blur, have been proposed by researchers to solve the problem of data insufficiency.

The traditional data augmentation methods are effective in most circumstances, but may lose efficacy for marine organisms datasets due to the special underwater imaging environment. It is difficult to get enough marine organisms training data because of the complicated underwater environments, and labeling numerous underwater images is costly and time-consuming. Therefore, according to the underwater imaging environment, three data augmentation methods are proposed. First, marine turbulence can lead to underwater image degradation; the inverse process of underwater image restoration can add turbulence influence to an image and is used to simulate different levels of marine turbulence. Second, underwater robots approach the marine organisms from different directions, which implies different views of camera shooting; therefore, perspective transformation is proposed to simulate different views of camera shooting. Third, the seafloor is always too dark to shoot, and additional interior light is needed for underwater imaging. However, the use of auxiliary light can result in an uneven illumination phenomenon in underwater images, thus illumination synthesis is used to simulate different marine artificial illuminating environments.

The remainder of the paper consists of the following parts. Section 2 discusses the related works. Section 3 introduces the Faster R-CNN model. Three data augmentation methods are introduced in Section 4. The experimental results are presented in Section 5. Conclusions are drawn in Section 6.

2. Related works

Adequate training data is needed for training a deep learning based object detection and recognition model. It is difficult to obtain sufficient marine organisms data in the underwater environment. Thus, data augmentation is a very important way to take full advantage of the dataset we have.

The frequently used data augmentation methods such as flipping, cropping, color casting, and blur are useful. However, these methods only perform simple transformations and cannot meet the requirements of special situations. Therefore, different data augmentation methods have been proposed according to different application environments. Lv et al. [13] proposed five data augmentation methods dedicated to face images, including landmark perturbation and four synthesis methods (hairstyles, glasses, poses, illuminations). To reduce the recognition error of plant leaves, Zhang et al. [14] adopted data augmentation methods (translation, scaling, sharpening, rotation) for reducing the degree of over-fitting. Cui et al. [15] proposed a label-preserving data augmentation approach based on stochastic feature mapping and investigated it along with vocal tract length perturbation for improving the representation of pattern variations in CNN acoustic modeling to deal with data sparsity. Charalambous and Bharath [16] introduced a simulation-based methodology and a subject-specific dataset which can be used for generating synthetic video frames and sequences for data augmentation. Rogez and Schmid [17] introduced an image-based synthesis engine that artificially augments a dataset of real images with 2D human pose annotations by using 3D Motion Capture data. Because of the special imaging environment, research on data augmentation dedicated to underwater images is relatively scarce.

According to the special underwater environment, we introduce three data augmentation methods for marine organisms images.

Simulate marine turbulence. Zhang et al. [18] proposed an underwater image degradation function based on the atmospheric turbulence model [19]. With that function, a frequency domain filter was adopted for underwater image restoration. Inspired by this article, instead of image restoration, the inverse process of the frequency domain filter is used for simulating marine turbulence under different turbulence coefficients.

Simulate different shooting angles. Different shooting angles of an underwater object show different visual effects. Wimmer et al. [20] used affine transformation to simulate different shooting angles of endoscopic imagery for the classification of celiac disease. Freifeld et al. [21] proposed a Continuous Piecewise-Affine (CPA) velocity fields based transformation method, which is used as a key component in a learned data augmentation scheme [22], improving the results of an image classifier. As we can see, affine transformation is used in [20,22] for data augmentation. Affine transformation is a linear transformation between 2-dimensional coordinates; it can implement translation, scaling, flipping, rotation, and shearing of the picture. The transformation between different shooting angles, however, is a 3-dimensional coordinate transformation. Thus, instead of affine transformation, perspective transformation is applied for simulating different shooting angles in this paper.

Simulate uneven illumination. Illumination characteristics have an important influence on object detection and recognition. In face recognition, due to the change in illumination conditions, the same face appears differently. Lv et al. [13] used a 3D face model reconstruction method to simulate different illuminations, which makes the face recognition model robust to different illuminations. Faraji and Qi [23] proposed to produce illumination-invariant representations for face images by using a logarithmic fractal dimension-based method and complete eight local directional patterns. Different from the facial environment, the underwater environment always has insufficient illumination, and additional artificial illumination is needed for underwater imaging. The use of artificial illumination can result in uneven illumination and decrease the performance of object detection algorithms. Thus, an illumination synthesis method is used for simulating different underwater uneven illumination conditions.
3. Introduction and training of Faster R-CNN

Faster R-CNN is trained with combined classification and bounding-box regression objectives; in the RPN, the regression loss is activated only for positive anchors. The detection network uses the multi-task loss of Fast R-CNN [24]:

    L(p, u, t^u, v) = L_cls(p, u) + λ [u ≥ 1] L_loc(t^u, v),

where p = (p_0, p_1, ..., p_K) is a discrete probability distribution over K + 1 outputs, u is the true class number of the object, t^u is the coordinate of the predicted bounding box, v is the ground-truth bounding box, L_cls is the log loss for the true class u, L_loc is the bounding-box regression loss, and λ is the hyper-parameter which controls the balance between the two tasks. This expresses the joint treatment of bounding-box regression and classification.

3.3. Training Faster R-CNN

There are three training patterns for Faster R-CNN. The Four-Step Alternating Training method is used in this paper. This method can be summarized in four steps. First, the RPN is trained end-to-end by back-propagation and stochastic gradient descent (SGD); this network is initialized with an ImageNet pre-trained model (VGG16). Second, a separate detection network is trained by Fast R-CNN using the proposals generated by the step-1 RPN; this detection network is also initialized with VGG16. Third, the detection network's convolutional layers are shared with the RPN and only the layers unique to the RPN are fine-tuned. Fourth, keeping the shared convolutional layers fixed, the layers unique to Fast R-CNN are fine-tuned. Finally, both networks share the same convolutional layers and form a unified network.

4. Data augmentation

The optical properties of water result in bad quality of underwater images. Experimental results [18] indicate that even pure water can affect the propagation of light. Scattering and absorption cause the attenuation of light. The absorption phenomenon of water can make the light intensity decay by 4% per meter. The scattering phenomenon can reduce the contrast of underwater images and make the image blurry. In conclusion, severe degradation exists in underwater images. In reality, different marine environments may cause varying degrees of degradation in underwater images. Thus, we consider using the inverse process of underwater image restoration to simulate different degrees of degradation.

Inspired by the thought of Zhang et al. [18], who used the atmospheric turbulence model to restore degraded underwater images, we use the atmospheric turbulence model to simulate different levels of marine turbulence. The atmospheric turbulence model is defined as follows [19]:

    T(u, v) = K_1 e^{-(K_2 u^2 + K_3 v^2)^{5/6}},    (4)

where u and v are the coordinates of a pixel, K_1 is a constant, and K_2 and K_3 are the scaling factors of atmospheric turbulence control. Generally, Eq. (4) can be simplified as follows:

    T(u, v) = e^{-k(u^2 + v^2)^{5/6}},    (5)

where k is the turbulence coefficient. For atmospheric turbulence, k > 0.0025 means excessive turbulence, k ≈ 0.001 means intermediate turbulence, and k < 0.00025 means slight turbulence. However, in the underwater environment, marine turbulence has a larger turbulence coefficient and brings much more serious degradation to underwater images.

Fig. 4. Framework of marine turbulence simulation based data augmentation method.

For image restoration, we need to know the turbulence coefficient k of the dataset. A series of turbulence coefficients k from 0.001 to 0.002 is evaluated. Then, Wiener filtering with the different coefficients k is used to restore the image. The process of image restoration with Wiener filtering is described as follows:

    Ĝ(u, v) = [1 / T(u, v)] · [|T(u, v)|^2 / (|T(u, v)|^2 + Q)] · G(u, v),    (6)

where Q is a constant, G is the Fourier transform of the original image, T is the degradation model mentioned above, and Ĝ is the Fourier transform of the restored image. The restored image can be obtained by applying the inverse Fourier transform to Ĝ. The restoration results can be seen in Fig. 2. As illustrated in Fig. 2, when k = 0.001 the restoration results show little improvement. Then, with the increase of the coefficient k, the restoration results become better; in particular, when k = 0.0015 we get the best restoration results. After that, with a further increase of the coefficient k, the restoration results become overexposed. Thus, for this set of underwater images, which were shot in the same underwater environment at the same time, the best image restoration result is obtained when k = 0.0015.

Once the best restored image has been obtained, we apply the inverse process of Wiener filtering with the degradation model of Eq. (4) to simulate different levels of turbulence interference, i.e., the spectrum of the restored image is multiplied by the degradation function T(u, v) evaluated with a chosen turbulence coefficient k.
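The following is a minimal NumPy sketch of this restoration-then-re-degradation pipeline. It assumes single-channel images normalized to [0, 1] and a frequency grid centered on the image; the constant Q, the normalization, and the color-channel handling are illustrative choices rather than the paper's exact implementation.

```python
import numpy as np

def turbulence_otf(shape, k):
    """Degradation function T(u, v) = exp(-k (u^2 + v^2)^(5/6)) of Eq. (5),
    evaluated on a grid centered so that (u, v) = (0, 0) lies in the middle."""
    rows, cols = shape
    u = np.arange(rows) - rows // 2
    v = np.arange(cols) - cols // 2
    U, V = np.meshgrid(u, v, indexing="ij")
    return np.exp(-k * (U ** 2 + V ** 2) ** (5.0 / 6.0))

def wiener_restore(image, k, Q=0.01):
    """Restore a grayscale float image with the Wiener-style filter of Eq. (6)."""
    T = turbulence_otf(image.shape, k)
    G = np.fft.fftshift(np.fft.fft2(image))           # spectrum of the captured image
    G_hat = (1.0 / T) * (np.abs(T) ** 2 / (np.abs(T) ** 2 + Q)) * G
    restored = np.real(np.fft.ifft2(np.fft.ifftshift(G_hat)))
    return np.clip(restored, 0.0, 1.0)

def simulate_turbulence(restored, k_new):
    """Inverse process: multiply the restored spectrum by T(u, v) with a new
    turbulence coefficient k_new to synthesize a re-degraded training image."""
    T = turbulence_otf(restored.shape, k_new)
    G_hat = np.fft.fftshift(np.fft.fft2(restored))
    degraded = np.real(np.fft.ifft2(np.fft.ifftshift(T * G_hat)))
    return np.clip(degraded, 0.0, 1.0)

if __name__ == "__main__":
    img = np.random.rand(480, 640)                    # stand-in for a grayscale underwater image
    best = wiener_restore(img, k=0.0015)              # best coefficient reported above
    augmented = [simulate_turbulence(best, k) for k in (0.0005, 0.001, 0.0015, 0.002)]
```

Varying k_new over a range of coefficients yields one augmented copy of each training image per simulated turbulence level.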
4.2. Simulate different shooting angles

Marine organisms such as sea cucumbers, sea urchins and scallops tend to cling to the seafloor because of their living habits. Therefore, most marine organisms images are shot from an overhead angle, and an object can be shot with different azimuth and depression angles. Affine transformation or perspective transformation can be used to simulate different shooting angles of underwater images.

Affine transformation is a frequently used data augmentation method and is an extremely powerful tool in image processing. Affine transformation is a linear transformation between 2-dimensional coordinates; it can implement translation, scaling, flipping, rotation, and shearing of the picture. However, the transformation between different shooting angles is a transformation between 3-dimensional coordinates. Perspective transformation, in contrast, imagines the 2D projection as though the object were being viewed through a camera viewfinder; the camera's position, orientation, and field of view control the behavior of the perspective transformation. Fig. 5 shows underwater images transformed by affine transformation and perspective transformation respectively; the comparison between Fig. 5b and 5c indicates that perspective transformation has a better visual effect in simulating different shooting angles of an image. Therefore, unlike most papers that use affine transformation for data augmentation, perspective transformation is used for data augmentation in this paper.

Fig. 5. Comparison between affine transform and perspective transform. Affine transformation matrix Hat and perspective transformation matrix Hpt are described in Appendix A.

Under homography, the transformation of points [26] in 3-dimensional space between camera 1 and camera 2 can be written as:

    X_2 = H X_1,    X_1, X_2 ∈ R^3.    (8)

Let x_1 and x_2 be the corresponding homogeneous image coordinates, with scale factors λ_1 and λ_2:

    λ_1 x_1 = X_1,    λ_2 x_2 = X_2.    (9)

Therefore:

    λ_2 x_2 = H λ_1 x_1.    (10)

Writing x_i = (x_i, y_i, z_i)^T and H = [h_11 h_12 h_13; h_21 h_22 h_23; h_31 h_32 h_33], Eq. (10) becomes

    λ_2 (x_2, y_2, z_2)^T = H λ_1 (x_1, y_1, z_1)^T.    (11)

From Eq. (11), taking x = x_2 / z_2 and y = y_2 / z_2, we can get:

    x = (h_11 x_1 + h_12 y_1 + h_13 z_1) / (h_31 x_1 + h_32 y_1 + h_33 z_1),
    y = (h_21 x_1 + h_22 y_1 + h_23 z_1) / (h_31 x_1 + h_32 y_1 + h_33 z_1).    (12)

Without loss of generality, set z_1 = 1 and we can get:

    x = (h_11 x_1 + h_12 y_1 + h_13) / (h_31 x_1 + h_32 y_1 + h_33),
    y = (h_21 x_1 + h_22 y_1 + h_23) / (h_31 x_1 + h_32 y_1 + h_33).    (13)

The method for solving the homography matrix H can be found in [26]. From Eq. (13) we know that once the homography matrix has been determined, it can be used to transform coordinates between different image planes. Fig. 6 shows images transformed by different perspective transformation matrices. Fig. 7 illustrates the framework of the proposed method to simulate different shooting angles for underwater images.

Fig. 6. Augmented images by perspective transformation. Perspective transformation matrices H1 to H15 are described in Appendix B.
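As a concrete illustration, the following is a minimal OpenCV sketch of this augmentation step. It reuses matrix H10 from Appendix B; the image path, the example box coordinates, and the four corner correspondences are hypothetical, and solving H from four hand-picked points is only one of the estimation options discussed in [26].

```python
import cv2
import numpy as np

# One of the paper's perspective transformation matrices (H10, Appendix B).
H10 = np.array([[1.6316,   0.0,    -44.74],
                [0.2237,   1.184,  -22.3684],
                [0.001842, 0.0,      1.000]], dtype=np.float64)

def augment_perspective(image, H):
    """Warp an image with a 3x3 homography, per Eqs. (8)-(13)."""
    h, w = image.shape[:2]
    return cv2.warpPerspective(image, H, (w, h), borderMode=cv2.BORDER_REPLICATE)

def homography_from_corners(src_pts, dst_pts):
    """Solve H from four point correspondences (see [26] for the general method)."""
    return cv2.getPerspectiveTransform(np.float32(src_pts), np.float32(dst_pts))

if __name__ == "__main__":
    img = cv2.imread("underwater.jpg")                 # hypothetical input image path
    warped = augment_perspective(img, H10)

    # Bounding-box corners must be mapped with the same H so labels stay aligned.
    box = np.array([[[100.0, 120.0]], [[260.0, 220.0]]], dtype=np.float32)
    warped_box = cv2.perspectiveTransform(box, H10)

    # Alternatively, estimate a homography from hand-picked corner correspondences.
    H_est = homography_from_corners([(0, 0), (640, 0), (640, 480), (0, 480)],
                                    [(30, 10), (610, 40), (600, 470), (20, 450)])
```

Applying the same homography to the annotated bounding boxes keeps the augmented labels consistent with the warped images.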
4.3. Simulate uneven illumination

The light intensity distribution of an uneven illumination image can be estimated with an averaging filter of the linear spatial filtering method [32]. The spatial filtering operation can be described as:

    E(x, y) = O ⊗ X = (1 / (MN)) Σ_i Σ_j O(x − i, y − j) X(i, j),    (14)

where O is the uneven illumination image and E is the light intensity distribution template of O. X is the convolution mask, which is an identity matrix; the height M and width N of X are set as 1/6 of the image height and width. The notation ⊗ means the convolution operation. The light intensity distribution templates of the images in Fig. 8 are shown in Fig. 9.

Fig. 8. Underwater uneven illumination images, taken by Li and Ge [29] in Liuqinghe Seas, East China Sea.

For simulating the uneven illumination phenomenon in an underwater image, the light intensity distribution template E is first resized to the same size as the underwater image I, and then its average gray value a is calculated. To further reflect the uneven characteristic, we calculate the difference between the template E and its average value a as follows:

    D(x, y) = E(x, y) − a,    a = (1 / (mn)) Σ_{x=1}^{m} Σ_{y=1}^{n} E(x, y).    (15)
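A minimal OpenCV sketch of this synthesis is given below. It assumes 8-bit grayscale images, uses a normalized box filter as the averaging mask of Eq. (14), and adds the difference map D of Eq. (15) to the target image with an adjustable strength; the file paths, the additive combination, and the strength parameter are illustrative assumptions rather than the paper's exact procedure.

```python
import cv2
import numpy as np

def light_template(image_gray):
    """Estimate the light intensity distribution template E of Eq. (14) with an
    averaging filter whose size is 1/6 of the image height and width."""
    h, w = image_gray.shape
    kh, kw = max(1, h // 6), max(1, w // 6)
    return cv2.blur(image_gray.astype(np.float32), (kw, kh))

def unevenness_map(template, target_shape):
    """Resize E to the target image size and subtract its mean, as in Eq. (15)."""
    th, tw = target_shape
    E = cv2.resize(template, (tw, th), interpolation=cv2.INTER_LINEAR)
    return E - float(E.mean())

def synthesize_uneven_illumination(image_gray, reference_uneven_gray, strength=1.0):
    """Add the unevenness map D, estimated from a real uneven-illumination image,
    to an evenly lit underwater image (additive model and strength are assumptions)."""
    D = unevenness_map(light_template(reference_uneven_gray), image_gray.shape)
    out = image_gray.astype(np.float32) + strength * D
    return np.clip(out, 0, 255).astype(np.uint8)

if __name__ == "__main__":
    target = cv2.imread("underwater.jpg", cv2.IMREAD_GRAYSCALE)          # hypothetical paths
    reference = cv2.imread("uneven_reference.jpg", cv2.IMREAD_GRAYSCALE)
    augmented = synthesize_uneven_illumination(target, reference, strength=0.8)
```

Different reference images and strengths yield a family of uneven-illumination variants of each training image.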
We set the learning rates as 0.001 and 0.0001 in the first 2/3 and the last 1/3 of the iterations, respectively.

5.2. Robustness to marine turbulence

We verify the robustness of the Faster R-CNN model trained on training set AD_1 to turbulence coefficient variations. Training set AD_1 contains 1200 original images and 4800 images augmented by the marine turbulence simulation based augmentation method. Four test sets augmented by the marine turbulence simulation based method with four different turbulence coefficients k = {0.0005, 0.001, 0.0015, 0.002} are tested in this experiment.

The results on marine organisms detection and recognition are shown in Table 2. Baseline denotes the results of the Faster R-CNN model trained on the original training set, and Turbulence Sim denotes the results of the Faster R-CNN model trained on training set AD_1. As illustrated in Table 2, the Baseline is sensitive to different marine turbulence; in particular, with the increase of the turbulence coefficient k, the detection results show a slight decline. However, for Turbulence Sim, the average mAP score is 51.80, which is much higher than the Baseline.
Table 3
Performance on the test sets augmented by shooting angle simulation based method.

Test set   Baseline   Angle Sim
H1         35.77      51.03
H2         38.07      55.48
H3         37.34      54.80
H4         32.48      52.01
H5         38.13      51.61
H6         23.07      52.20
H7         24.72      52.29
H8         41.86      51.33
H9         23.83      50.33
H10        39.19      49.29
H11        33.17      52.35
Ave.       33.57      52.25
Table 4
Performance on the test sets augmented by uneven illumination simulation based method.

Fig. 12. The Baseline (left) vs. the Fusion (right) on marine organisms detection and recognition.

Table 5
Test results of various data augmentation methods on the original test set. Bold type indicates the best results.

Methods    Ave.    Sea urchin   Sea cucumber   Scallop
Baseline   49.76   15.12        71.12          63.03
O-1        54.33   29.18        70.62          63.19
O-2        48.89   22.23        70.15          53.30
O-3        51.49   20.23        71.63          62.62
A          58.45   40.89        71.08          63.39
B          58.14   32.08        79.79          62.50
C          57.45   39.26        70.16          62.91
Fusion     59.93   46.82        69.79          63.19

As illustrated in Table 5, the data augmentation methods O-1, O-2, O-3, A, B, C and the Fusion improve the marine organisms detection results by 9.18%, −1.75%, 3.48%, 17.46%, 16.84%, 15.45% and 20.44%, respectively. All the data augmentation methods improve the performance of marine organisms detection and recognition except color casting. The reason why color casting degrades the performance of the Faster R-CNN model is that the background of marine organisms images is blue, as can be seen from Fig. 12; color casting changes the color of the background and therefore influences the performance of the Faster R-CNN model. Meanwhile, the underwater-image-directed data augmentation methods have better performance than the frequently used data augmentation methods: A, B and C achieve marine organisms detection performance of 58.45, 58.14 and 57.45 respectively, much higher than the performance of O-1, O-2 and O-3, which are 54.33, 48.89 and 51.49 respectively. This is expected and confirms that the underwater-image-directed data augmentation methods actually work in the underwater environment. The Fusion, a dataset concatenated from data augmentation methods A, B and C, achieves the best detection and recognition result of 59.93, improving the performance by 20.44% compared with the Baseline. The test results of the Baseline and the Fusion are visualized in Fig. 12. As illustrated in Fig. 12, the Fusion achieves better results than the Baseline.

Since we use the same detector for all methods, the PR curve can illustrate the performance of the data augmentation methods on different marine organisms categories. In Fig. 13, we plot PR curves for the three categories of the marine organisms dataset. In conjunction with Fig. 13 and Table 5, we can see that the detection performances on sea urchin and scallop are analogous under different data augmentation methods; however, the detection performances on sea cucumber are quite different under different data augmentation methods. The reason why the detection performance differs between categories is that the sample sizes of the different categories are imbalanced in the marine organisms dataset. For example, as we can see in Fig. 12, the average sample sizes of sea urchin and scallop are far larger than that of sea cucumber, which means that the sample sizes of sea urchin and scallop are saturated, while the sample size of sea cucumber is insufficient. Therefore, the data augmentation methods bring better improvement on sea cucumber detection than on sea urchin detection and scallop detection.
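For reference, the following is a minimal sketch of how a per-category PR curve and its average precision can be computed from scored detections; the all-point interpolation and the implied IoU-based matching follow common PASCAL VOC practice [33] and are assumptions here, since the paper does not spell out its evaluation code.

```python
import numpy as np

def average_precision(scores, is_true_positive, num_gt):
    """PR curve and AP for one category: detections are sorted by confidence,
    precision/recall are accumulated, and AP is the area under the interpolated curve."""
    order = np.argsort(-np.asarray(scores))
    tp = np.asarray(is_true_positive, dtype=np.float64)[order]
    fp = 1.0 - tp
    tp_cum, fp_cum = np.cumsum(tp), np.cumsum(fp)
    recall = tp_cum / max(num_gt, 1)
    precision = tp_cum / np.maximum(tp_cum + fp_cum, 1e-12)
    # All-point interpolation: make precision monotonically non-increasing.
    for i in range(len(precision) - 2, -1, -1):
        precision[i] = max(precision[i], precision[i + 1])
    # Integrate interpolated precision over recall increments.
    ap, prev_r = 0.0, 0.0
    for p, r in zip(precision, recall):
        ap += p * (r - prev_r)
        prev_r = r
    return recall, precision, ap

# mAP is then the mean of the per-category APs (sea urchin, sea cucumber, scallop).
```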
6. Conclusion

Three underwater image data augmentation methods have been proposed in this paper for improving the performance of marine organisms detection and recognition. The experimental results confirm that the proposed three data augmentation methods increase the robustness of Faster R-CNN to marine turbulence variations, shooting angle variations, and uneven illumination variations, respectively. When compared with frequently used data augmentation methods, our methods, which are dedicated to marine organisms images, are more effective for marine organisms detection and recognition. In particular, by fusing the datasets from the different data augmentation methods, Faster R-CNN achieves the best performance. In conclusion, the proposed methods show robustness to underwater environment variations and outperform the previous data augmentation methods on marine organisms detection and recognition.

Acknowledgments

Research supported by the National Science Foundation of China under the grants of Nos. 61633009, 51570953, 51209050, 61503383, and the National Key Research and Development Plan of China under the grant of 2016YFC0300801.

Appendix A. Details of affine transformation matrix Hat and perspective transformation matrix Hpt

    H_at = [  0.1362    0.1354    0
             -0.07246   0.7157   50.00 ],    (A.1)

    H_pt = [  0.5781      0.1285    0
             -0.07246     0.6744   50.00
             -0.0003982   0         1.000 ].    (A.2)

Appendix B. Details of perspective transformation matrices H1 to H15

    H1 = [  0.3446     -0.3369    114.7
           -0.1367      0.3914     62.67
           -0.0006160  -0.001123    1.000 ],    (B.1)

    H2 = [  0.4480      0.1965     73.08
           -0.1415      0.5370     59.32
           -0.0006680  -0.0006549   1.000 ],    (B.2)

    H3 = [  0.4468     -0.1801     73.42
           -0.1437      0.6061     56.80
           -0.0006926  -0.0004329   1.000 ],    (B.3)

    H4 = [  0.7538     -0.1897     53.33
           -0.1128      0.6872     56.41
           -0.0001382  -0.0003846   1.000 ],    (B.4)

    H5 = [  0.5908      0.1991     44.73
           -0.1540      0.9177     45.35
           -0.0008039   0.0005688   1.000 ],    (B.5)

    H6 = [  0.5921      0.05934    49.73
           -0.1499      0.7935     49.92
           -0.0007596   0.0001695   1.000 ],    (B.6)

    H7 = [  0.6493     -0.1038     51.50
           -0.1403      0.6927     53.56
           -0.0005814  -0.0002966   1.000 ],    (B.7)

    H8 = [  0.7762     -0.1070     42.37
           -0.05289     0.7903     23.10
           -0.0002935  -0.0003057   1.000 ],    (B.8)

    H9 = [  2.0697      0.1005    -84.32
            0.2607      1.5784    -41.15
            0.002265    0.001000    1.000 ],    (B.9)

    H10 = [ 1.6316      0         -44.74
            0.2237      1.184     -22.3684
            0.001842    0           1.000 ],    (B.10)

    H11 = [ 1.795       0.2756    -69.87
            0.08013     1.676     -30.13
            0.001603    0.001474    1.000 ],    (B.11)

    H12 = [ 0.8531      0.1930      5.429
           -0.02154     1.087      -2.025
           -0.0004308   0.0009393   1.000 ],    (B.12)

    H13 = [ 0.6723      0.08150    25.47
           -0.01698     0.8217      8.998
           -0.0003396   0           1.000 ],    (B.13)

    H14 = [ 0.7252     -0.1298     38.93
            0           0.6794     13.74
            0          -0.0009160   1.000 ],    (B.14)

    H15 = [ 0.7518     -0.1719     39.79
            0.1009      0.5674      9.889
            0.0003364  -0.001332    1.000 ].    (B.15)

References

[1] M. Takagi, H. Mori, A. Yimit, Y. Hagihara, T. Miyoshi, Development of a small size underwater robot for observing fisheries resources: underwater robot for assisting abalone fishing, J. Robot. Mechatron. 28 (3) (2016) 397–403.
[2] K. Koreitem, Y. Girdhar, W. Cho, H. Singh, J. Pineda, G. Dudek, Subsea fauna enumeration using vision-based marine robots, in: Proceedings of the IEEE Conference on Computer and Robot Vision, 2016, pp. 101–108.
[3] S. Swart, J. Zietsman, J. Coetzee, D. Goslett, A. Hoek, D. Needham, P. Monteiro, Ocean robotics in support of fisheries research and management, African J. Mar. Sci. 38 (4) (2016) 525–538.
[4] J. García, J. Fernández, P. Sanz, R. Marín, Increasing autonomy within underwater intervention scenarios: the user interface approach, in: Proceedings of the IEEE Systems Conference, 2010, pp. 71–75.
[5] F. Sun, J. Yu, S. Chen, D. Xu, Active visual tracking of free-swimming robotic fish based on automatic recognition, in: Proceedings of the IEEE Conference on Intelligent Control and Automation, 2014, pp. 2879–2884.
[6] P. Sermanet, D. Eigen, X. Zhang, M. Mathieu, R. Fergus, Y. LeCun, OverFeat: integrated recognition, localization and detection using convolutional networks, 2013, preprint arXiv:1312.6229.
[7] S. Ren, K. He, R. Girshick, J. Sun, Faster R-CNN: towards real-time object detection with region proposal networks, in: Proceedings of the Advances in Neural Information Processing Systems, 2015, pp. 91–99.
[8] I. Sa, Z. Ge, F. Dayoub, B. Upcroft, T. Perez, C. McCool, DeepFruits: a fruit detection system using deep neural networks, Sensors 16 (8) (2016) 1222.
[9] T. Hoang Ngan Le, Y. Zheng, C. Zhu, K. Luu, M. Savvides, Multiple scale Faster R-CNN approach to driver's cell-phone usage and hands on steering wheel detection, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2016, pp. 46–53.
[10] L. Zhang, L. Lin, X. Liang, K. He, Is Faster R-CNN doing well for pedestrian detection? in: Proceedings of the European Conference on Computer Vision, Springer, 2016, pp. 443–457.
[11] G.E. Hinton, N. Srivastava, A. Krizhevsky, I. Sutskever, R.R. Salakhutdinov, Improving neural networks by preventing co-adaptation of feature detectors, 2012, preprint arXiv:1207.0580.
[12] C. Sun, A. Shrivastava, S. Singh, A. Gupta, Revisiting unreasonable effectiveness of data in deep learning era, in: Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 843–852.
[13] J. Lv, X. Shao, J. Huang, X. Zhou, X. Zhou, Data augmentation for face recognition, Neurocomputing 230 (2017) 184–196.
[14] C. Zhang, P. Zhou, C. Li, L. Liu, A convolutional neural network for leaves recognition using data augmentation, in: Proceedings of the IEEE Conference on Computer and Information Technology; Ubiquitous Computing and Communications; Dependable, Autonomic and Secure Computing; Pervasive Intelligence and Computing, 2015, pp. 2143–2150.
[15] X. Cui, V. Goel, B. Kingsbury, Data augmentation for deep neural network acoustic modeling, IEEE/ACM Trans. Audio Speech Lang. Process. 23 (9) (2015) 1469–1477.
[16] C.C. Charalambous, A.A. Bharath, A data augmentation methodology for training machine/deep learning gait recognition algorithms, 2016, preprint arXiv:1610.07570.
[17] G. Rogez, C. Schmid, MoCap-guided data augmentation for 3D pose estimation in the wild, in: Proceedings of the Advances in Neural Information Processing Systems, 2016, pp. 3108–3116.
[18] H. Zhang, Y. Xu, L. Wan, X. Tang, H. Cai, Processing method for underwater degenerative image, J. Tianj. Univ. 43 (9) (2010) 827–834.
[19] R. Hufnagel, N. Stanley, Modulation transfer function associated with image transmission through turbulent media, J. Opt. Soc. Am. 54 (1) (1964) 52–61.
[20] G. Wimmer, A. Uhl, A. Vecsei, Evaluation of domain specific data augmentation techniques for the classification of celiac disease using endoscopic imagery, in: Proceedings of the IEEE International Workshop on Multimedia Signal Processing, 2017, pp. 1–6.
[21] O. Freifeld, S. Hauberg, K. Batmanghelich, J.W. Fisher, Transformations based on continuous piecewise-affine velocity fields, IEEE Trans. Pattern Anal. Mach. Intell. 39 (12) (2017) 2496–2509.
[22] S. Hauberg, O. Freifeld, A.B.L. Larsen, J. Fisher, L. Hansen, Dreaming more data: class-dependent distributions over diffeomorphisms for learned data augmentation, in: Proceedings of the Artificial Intelligence and Statistics, 2016, pp. 342–350.
[23] M.R. Faraji, X. Qi, Face recognition under varying illuminations using logarithmic fractal dimension-based complete eight local directional patterns, Neurocomputing 199 (2016) 16–30.
[24] R. Girshick, Fast R-CNN, in: Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 1440–1448.
[25] K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition, 2014, preprint arXiv:1409.1556.
[26] E. Dubrofsky, Homography Estimation, Master's thesis, University of British Columbia, Vancouver, 2009.
[27] Y.K. Park, S.K. Park, J.K. Kim, Retinex method based on adaptive smoothing for illumination invariant face recognition, Signal Process. 88 (8) (2008) 1929–1945.
[28] D. Jiang, Y. Hu, S. Yan, L. Zhang, H. Zhang, W. Gao, Efficient 3D reconstruction for face recognition, Pattern Recogn. 38 (6) (2005) 787–798.
[29] Q. Li, Z. Ge, A visibility improving algorithm based on underwater imaging model with non-uniform illumination, J. Optoelectron. Laser 12 (2011) 023.
[30] M. Wang, J. Pan, S. Chen, H. Li, A method of removing the uneven illumination phenomenon for optical remote sensing image, in: Proceedings of the IEEE International Geoscience and Remote Sensing Symposium, 5, 2005, pp. 3243–3246.
[31] L. Ma, R.C. Staunton, A modified fuzzy c-means image segmentation algorithm for use with uneven illumination patterns, Pattern Recogn. 40 (11) (2007) 3005–3011.
[32] R.C. Gonzalez, R.E. Woods, S.L. Eddins, Digital Image Processing Using MATLAB, Prentice-Hall, Inc., 2003.
[33] M. Everingham, L. Van Gool, C.K. Williams, J. Winn, A. Zisserman, The PASCAL visual object classes (VOC) challenge, Int. J. Comput. Vis. 88 (2) (2010) 303–338.
[34] A. Turpin, F. Scholer, User performance versus precision measures for simple search tasks, in: Proceedings of the ACM International Conference on Research and Development in Information Retrieval, 2006, pp. 11–18.
[35] C.D. Manning, H. Schütze, Foundations of Statistical Natural Language Processing, MIT Press, 1999.
[36] R. Wu, S. Yan, Y. Shan, Q. Dang, G. Sun, Deep Image: scaling up image recognition, 2015, arXiv:1501.02876.

Hai Huang received the B.S. and Ph.D. degrees in mechanical engineering from Harbin Institute of Technology, Harbin, China, in 2001 and 2008, respectively. He is currently an Associate Professor and a Ph.D. Candidate Supervisor with the National Key Laboratory of Science and Technology on Underwater Vehicle, Harbin Engineering University, Harbin, China. His current research interests include underwater vehicles and autonomous operation.

Hao Zhou received the B.S. degree in Naval Architecture and Ocean Engineering from Harbin Engineering University in 2016. He is currently pursuing an M.S. degree in Computer Vision at the National Key Laboratory of Science and Technology on Underwater Vehicle, Harbin Engineering University. His research interests include object detection, binocular vision, and deep learning.

Xu Yang received the B.S. degree in Automation from China Ocean University and the Ph.D. degree in Pattern Recognition and Intelligent System from the Institute of Automation, Chinese Academy of Sciences, Beijing, China, in 2009 and 2014, respectively. He is currently an Associate Professor with the State Key Laboratory of Management and Control for Complex System, Institute of Automation, Chinese Academy of Sciences, Beijing, China. His current research interests include underwater image processing and graph matching.

Lu Zhang received the B.S. degree in Automation from Shandong University in 2016. Since September 2016, she has been a Ph.D. candidate at the State Key Laboratory of Management and Control for Complex System, Institute of Automation, Chinese Academy of Sciences (CASIA). Her research interests include computer vision and pattern recognition, especially image processing and object detection.

Lu Qi received the B.S. degree in Automation from Shandong University and the M.S. degree in Pattern Recognition and Intelligent System from the Institute of Automation, Chinese Academy of Sciences, Beijing, China, in 2014 and 2017. He is currently a Ph.D. student in the Computer Science and Engineering Department, The Chinese University of Hong Kong. His current research interests include object detection and instance segmentation.

Aiyun Zang received the Ph.D. degree in control theory and engineering from the Institute of Automation, Chinese Academy of Sciences, Beijing, China, in 2004. She is now working in the automation and control system department, China Ocean University, Qingdao, China. Her current research interests include underwater image processing and graph matching.