A Deep Learning Approach To Detecting Objects in Underwater Images
https://doi.org/10.1080/01969722.2023.2166246
ABSTRACT
A deep learning approach, also known as deep machine learning or deep structured learning, has recently proved successful in categorizing digital images and detecting objects within them. Consequently, it has rapidly gained attention and a reputation in computer vision research. Aquatic ecosystems, especially seagrass beds, are increasingly observed using digital photographs. Automatic detection and classification now require deep neural network-based classifiers because of the growth in image data. The purpose of this paper is to present a systematic method for analyzing recent underwater pipeline imagery using deep learning. The analytical methods are organized logically by the recognized items, and the deep learning architectures employed are outlined. Deep neural network analysis of digital photographs of the seafloor has considerable potential for automation, particularly in the discovery and monitoring of underwater pipelines.

KEYWORDS
Deep learning; underwater pipeline image; object detection; aquatic ecosystem
1. Introduction
Underwater image recognition is extremely useful for a variety of applications, including pipeline maintenance, mining, marine life monitoring, and military operations. Light penetrates roughly 20 m in clear water and considerably less in turbid or coastal water (Veiga et al. 2022). Underwater visibility is therefore limited, and light intensity drops rapidly as it enters the water. Poor-quality underwater photos make it harder to recognize objects. Underwater photography is done at great depths and can be of poor quality (Mathias et al. 2021). Because underwater photography is frequently done in deep water, an autonomous underwater vehicle (AUV) is employed to capture and inspect the photographs.
The artificial light on the AUV increases the brightness of the shot, but it also generates haze and introduces noise as the vehicle moves through the water. Layers were added after starting with a very shallow six-level model, gradually increasing the number of layers.
To allow the network to assess the input from numerous perspectives while using the same feature extraction pipeline, a cyclic pooling method was used. The same stack of convolutional layers was applied to each orientation, and the resulting feature maps were then fed into a stack of dense layers and merged on top. Finally, the output feature maps from the separate directions are combined into a single large stack, and this combined input is used to train the next level, which has four times as many filters as before. This technique of integrating feature maps from multiple directions is known as "rolling." An initial module with convolutional layers was designed and built to avoid distortion and optimize visual information extraction, using the same dataset from the 2015 National Data Science Bowl; GoogleNet served as the inspiration. The network architecture is characterized by increased utilization of computational resources within the network. Data augmentation was used to introduce rotational and translational invariance, and rotational affine transformations were used to enhance the data.
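The rotate, extract, and "roll" scheme described above can be sketched in plain Python. The four-way rotation and the trivial row-sum extractor here are illustrative stand-ins, not the paper's actual layers:

```python
def rot90(grid):
    """Rotate a 2D grid 90 degrees counter-clockwise."""
    return [list(row) for row in zip(*grid)][::-1]

def cyclic_pool(image, extract):
    """Apply one shared feature extractor to all four rotations of the
    input, then 'roll' the per-direction feature maps into one stack."""
    features, view = [], image
    for _ in range(4):
        features.append(extract(view))  # same weights for every direction
        view = rot90(view)
    # Concatenate the four directional feature maps into a single stack.
    return [f for direction in features for f in direction]
```

A subsequent layer trained on this stack sees every input from four viewpoints, which is consistent with the next level needing roughly four times as many filters.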
The deep convolutional neural network was partitioned into two parts: a feature extractor and a classifier. However, if the dataset is too small, this form of classifier subdesign overfits, so the last two fully connected layers were replaced with convolutional layers using small kernels. The result was better than expected: the model outperformed state-of-the-art approaches at particular image sizes.
Lee and colleagues created a deep network technique for categorizing plankton using very large datasets. They used the WHOI plankton dataset (produced by the Woods Hole Oceanographic Institution), which contains 3.4 million expert-labeled images from 103 distinct classes. Their methods primarily concentrated on addressing the issue of class imbalance in large datasets.
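One common remedy, and one way to read their class-normalized pretraining, is inverse-frequency class weighting, which gives every class equal total influence during the first training pass. A minimal sketch (the function name is mine, not theirs):

```python
from collections import Counter

def class_weights(labels):
    """Inverse-frequency weights: each class ends up with the same
    total weight, cancelling the bias from class imbalance."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * m) for cls, m in counts.items()}
```

Samples from rare classes receive proportionally larger weights, so a loss weighted this way treats all classes as equally frequent.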
To eliminate the bias induced by class imbalance, they employed the CIFAR-10 CNN model as a classifier. The suggested design had three convolutional layers followed by two fully connected layers. Their classifier was pre-trained on class-normalized data and then retrained on the original data, which removed the bias caused by the class imbalance. Dai and colleagues developed a deep convolutional network specifically to classify zooplankton. For the dataset, the ZooScan system acquired 9,460 photomicrographs and grayscale pictures of zooplankton from 13 distinct classes. They proposed ZooplanktoNet, a new deep learning architecture for classifying zooplankton. After experimenting with various convolution sizes, they determined that ZooplanktoNet performed best at 11 layers. To support their claims, they ran comparative trials against other deep learning architectures, including AlexNet, CaffeNet, VGGNet, and GoogleNet, and found that ZooplanktoNet outperformed them with 93.7% accuracy.
6 KALAIARASI G ET AL.
taken by orbiting the WorldView-2 satellite. This technique was only useful for detecting coastline scars in flat coastal locations. A common approach to digital imaging, presently recommended by Australia's Commonwealth Scientific and Industrial Research Organisation (CSIRO) and Health, Safety and Environment (HSE) policies, is to capture an image from a digital camera every three seconds.
The camera is normally mounted on a frame and towed behind a boat cruising at 1.5-3 knots, ensuring that the photo is taken from around 2-3 meters away. The pictures are then evaluated using the PhotoGrid or TransectMeasure (SeaGIS) programmes. A standard 20-point grid is overlaid, and a human
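The overlay used for point-count scoring can be generated as the centers of a regular grid. A minimal sketch; the 5 x 4 arrangement is an assumption, since the text only fixes the total of 20 points:

```python
def grid_points(width, height, cols=5, rows=4):
    """Pixel coordinates of the cell centers of a cols x rows overlay
    grid (20 points by default) for point-count scoring of a photo."""
    return [((2 * i + 1) * width / (2 * cols),
             (2 * j + 1) * height / (2 * rows))
            for j in range(rows) for i in range(cols)]
```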
3. Proposed Methodology
Object recognition is the ability to properly recognize objects, calculate their position and dimensions, and conduct semantic or instance segmentation. Previous studies relied on algorithms based on shape, color, and contour matching to recognize objects, which are unsuitable for real-world object detection. Deep learning frameworks are categorized as region-proposal-based, classification-based, or both. Region-proposal methods, such as Region-based CNN (RCNN), Fast RCNN, Faster RCNN, and Mask RCNN, first indicate regions of interest and then attempt to identify the objects within them, rather than detecting objects directly. Classification-based algorithms, by contrast, use an integrated framework for direct object detection.
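Both families of detectors are typically scored by how well predicted boxes overlap ground truth, measured as intersection-over-union (IoU). A minimal sketch with corner-format boxes:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0
```

A prediction is conventionally counted as a true positive when its IoU with a ground-truth box exceeds a threshold such as 0.5.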
Figure 2 depicts a flowchart for predicting objects in underwater images. A convolutional neural network-based underwater object predictor is designed using deep learning. The network encodes images into numeric arrays and detects objects using additional numeric arrays in the model. Images are recorded and sent to the web application as the user enters the various characteristics of the forecast into a web form; the model was built with TensorFlow and scaled down from its original enormous size. The numerical values of the various objects are entered into the data collection by this model. To identify objects in photographs found online, the proposed technique leverages a CNN object recognition model. Image acquisition, image preprocessing, segmentation, feature extraction, and classification are the five primary stages of object identification. These stages cover tasks such as picture enhancement, segmenting a photograph into separate sections, locating target regions, and extracting features that aid in image classification.
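The five stages can be viewed as a chain of functions, each consuming the previous stage's output. A minimal sketch; the stage implementations below are placeholders for illustration only:

```python
def run_pipeline(raw, stages):
    """Pass an input through acquisition, preprocessing, segmentation,
    feature extraction, and classification in order."""
    out = raw
    for stage in stages:
        out = stage(out)
    return out

# Placeholder stages standing in for the real operations.
stages = [
    lambda path: [[0.2, 0.8], [0.5, 0.1]],          # acquire: load pixels
    lambda img: [[p * 2 for p in r] for r in img],  # preprocess: rescale
    lambda img: [r for r in img if max(r) > 1],     # segment: keep regions
    lambda seg: [sum(r) for r in seg],              # extract features
    lambda feats: "object" if feats else "none",    # classify
]
```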
3.1. Dataset
The fundamental data-gathering strategy employed in this investigation is depicted in Figure 3. The data were collected from Kaggle. Image acquisition refers to the process of capturing images underwater. A digital camera or scanner is used to capture images of objects or to collect data. The type and location of the digital camera have an impact on the quality of the
CYBERNETICS AND SYSTEMS: AN INTERNATIONAL JOURNAL 9
3.5. CNN
Because of their ability to perceive and comprehend patterns, CNNs have evolved substantially. Their output accuracy is quite high, making them the most efficient design for image categorization, retrieval, and recognition applications (Figure 5). Because a CNN accepts any photo as input, this quality makes it a strong prediction method. A CNN must satisfy crucial properties such as spatial invariance in order to learn to recognize and extract visual information from arbitrary points within an image. CNNs automatically extract and learn features from images and data. As a result, CNNs can deliver accurate deep learning outcomes.
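The spatial invariance mentioned above comes from sliding one shared kernel over the whole image. A minimal valid-mode 2D convolution (strictly, cross-correlation, as in most deep learning libraries):

```python
def conv2d(image, kernel):
    """Slide a shared kernel over the image (valid mode), producing one
    feature map; the same weights apply at every spatial position."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[y + dy][x + dx] * kernel[dy][dx]
                 for dy in range(kh) for dx in range(kw))
             for x in range(ow)]
            for y in range(oh)]
```

Because the kernel weights are reused at every position, a feature learned at one location is detected anywhere in the image.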
4. Simulation Results
The recommended technology will be used in a remotely operated underwater vehicle (ROV). A total of 33,000 pictures were labeled, manually and artificially. Deep learning employs 9,720 photos for training, 8,910 images for validation, and 14,370 images for testing. Accuracy, recall, and mean average precision are often used to assess object detection accuracy. Using a split training and testing strategy, the data are divided into a 70% training set and a 30% testing set.
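The 70/30 split can be done with a seeded shuffle so the partition is reproducible. A minimal sketch (the seed value is arbitrary, not from the paper):

```python
import random

def split_dataset(items, train_frac=0.7, seed=42):
    """Shuffle with a fixed seed, then cut into train/test partitions."""
    pool = list(items)
    random.Random(seed).shuffle(pool)
    cut = int(len(pool) * train_frac)
    return pool[:cut], pool[cut:]
```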
Accuracy: the proportion of all predictions that the best-fit model gets correct on the dataset.
Precision: predictions with positive outcomes divided by the total number of positive predictions.
Recall: the proportion of true positives that the model correctly identifies.
F1 score: the weighted (harmonic) average of the precision and recall measurements.
Each image in the test set is classified using the trained model, and the output of each image is predicted with its associated precision.
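The measures above reduce to counts of true positives, false positives, and false negatives. A minimal sketch for a single positive class:

```python
def scores(y_true, y_pred, positive=1):
    """Precision, recall, and F1 from paired labels and predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```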
As illustrated in Figure 6, this technique uses deep learning to produce results such as object recognition and categorization. The overall accuracy is 98.48%. As the model trains on the dataset, the training and validation accuracy are plotted on the y-axis and the epoch on the x-axis; the relationship between training accuracy and epochs is depicted in Figure 7.
Similarly, the training and validation losses are displayed on the y-axis and the epochs on the x-axis as the model trains on the dataset, as illustrated in Figures 8–10.
As illustrated in Figure 11, certain objects are missed because the dataset is too small, especially when the images in the dataset are very similar and the lighting and surroundings are uniform. As a result, if the trained model is employed for detection in other sea areas or under different climatic conditions, the detection accuracy will be diminished. The proposal therefore intends to capture additional underwater images in diverse marine locations and climatic conditions.
5. Conclusion
The primary goal of underwater object detection technology is to detect objects as quickly as possible. This proposal built and tested an autonomous underwater object detection system that can detect objects in challenging underwater images. The output of the suggested automatic underwater object detector is evaluated for accuracy in terms of reduced tracking error compared with earlier detection approaches. The proposed detection system could serve as an automated module for underwater object detection on an ocean explorer's high-end onboard computer. The proposed method is designed to discover hidden or camouflaged objects in underwater scenes. Given its superior detection accuracy compared with existing systems, the method has widespread applicability to object detection.
ORCID
Manoj Prabu M https://orcid.org/0000-0002-7316-4810
References
Chiang, J. Y., and Y.-C. Chen. 2012. Underwater image enhancement by wavelength compensation and dehazing. IEEE Transactions on Image Processing 21 (4):1756–69. doi:10.1109/TIP.2011.2179666.
Galdran, A., D. Pardo, A. Picón, and A. Alvarez-Gila. 2015. Automatic red-channel underwater image restoration. Journal of Visual Communication and Image Representation 26:132–45. doi:10.1016/j.jvcir.2014.11.006.
Jalal, A., A. Salman, A. Mian, M. Shortis, and F. Shafait. 2020. Fish detection and species classification in underwater environments using deep learning with temporal information. Ecological Informatics 57:101088. doi:10.1016/j.ecoinf.2020.101088.
Kumari, C. U., D. Samiappan, R. Kumar, and T. Sudhakar. 2019. Fiber optic sensors in ocean observation: A comprehensive review. Optik 179:351–60. doi:10.1016/j.ijleo.2018.10.186.
Lee, H., M. Park, and J. Kim. 2016. Plankton classification on imbalanced large scale database via convolutional neural networks with transfer learning. In 2016 IEEE International Conference on Image Processing (ICIP), 3713–7. IEEE.
Li, C.-Y., J.-C. Guo, R.-M. Cong, Y.-W. Pang, and B. Wang. 2016. Underwater image enhancement by dehazing with minimum information loss and histogram distribution