Sonar Imagery
Submitted in partial fulfilment of the requirements for the award of the degree of
BACHELOR OF TECHNOLOGY
in
(Branch no - 12)
Submitted by
G. Vivek (231FA07110)
N. Vishnu (231FA07111)
Vignan's Foundation for Science, Technology and Research (Deemed to be University) Vadlamudi,
Guntur, Andhra Pradesh-522213, India
September 2024
Abstract—
In recent years, seabed inspection has become one of the most sought-after tasks for Autonomous Underwater Vehicles (AUVs) in various applications. Forward-Looking Sonars (FLSs) are commonly favored over optical cameras, which are non-negligibly affected by environmental conditions, to carry out inspection and exploration tasks. Indeed, sonars are not influenced by illumination conditions and can provide high-range data, at the cost of lower-resolution images compared to optical sensors. However, due to the lack of features, sonar images are often hard to interpret using conventional image processing techniques, so human operators must analyze the thousands of images acquired during an AUV mission to identify the potential targets of interest. This paper reports the development of an Automatic Target Recognition (ATR) methodology to identify and localize potential targets in FLS imagery, which could help human operators in this challenging, demanding task. The Mask R-CNN (Region-based Convolutional Neural Network), which constitutes the core of the proposed solution, has been trained with a dataset acquired in May 2019 at the Naval Support and Experimentation Centre (Centro di Supporto e Sperimentazione Navale - CSSN) basin in La Spezia (Italy). The ATR strategy was then successfully validated online at the same site in October 2019, where the targets were replaced and relocated on the seabed.

Index Terms—Marine Robotics, Artificial Intelligence, Automatic Target Recognition, Autonomous Underwater Vehicles, Convolutional Neural Networks, Intelligent Robotics
I. INTRODUCTION
The recent advancement in autonomous vehicles aims to develop increasingly intelligent systems
capable of interacting with the surrounding environment and independently deciding the best
actions to fulfill specific tasks. In this context, modern robotics tries to integrate Artificial Intelligence
(AI) concepts and technologies. In particular, AI is currently used in robotic systems for different
purposes, such as autonomous decision-making, path planning, and extensive data processing, fields where excellent results are being achieved. Since the tasks demanded of Autonomous
Underwater Vehicles (AUVs) have become more and more challenging [1] [2], researchers and
scientists are investigating the use of AI technologies in the marine environment. Indeed,
autonomous inspection strategies [3] for underwater installations and exploration planning solutions
[4] have become essential tools to execute demanding and hazardous subsea operations in unknown scenarios. AUVs are commonly used for seabed inspections in a wide variety of
applications, ranging from geomorphological and
biological analyses to port supervision with a view to ensuring the safety of vessel traffic. These tasks are generally performed by exploiting optical sensors; however, optical cameras are affected by water turbidity and lighting conditions, and gathering satisfactory images is a non-trivial task, not feasible in several scenarios. As a consequence, acoustic sensors such as Side-Scan Sonars (SSSs) and Forward-Looking Sonars (FLSs) are commonly favored to carry out inspection and exploration tasks; in
fact, sonars are not influenced by illumination conditions and can provide high-range data. In
particular, FLSs can synthesize satisfactory resolution images, more detailed than SSSs, but at shorter
distances. Nevertheless, the high noise and the lack of features make sonar images hard to interpret
using conventional image processing techniques. In practice, a human operator is usually in charge
of analyzing the thousands of acquired images to identify the so-called Objects of Potential Interest
(OPIs). An Automatic Target Recognition (ATR) strategy that detects and localizes OPIs in FLS imagery
hence represents an important tool that could help human operators in this demanding task. In this
context, cutting-edge Deep Learning (DL) techniques, which have become the state of the art in the
classification and object detection tasks [5], are being investigated in marine ATR applications [6] [7].
In this work, an ATR strategy for FLS frames, relying on image-based state-of-the-art DL architectures,
has been designed and implemented to detect and geolocalize potential targets of interest placed
on the seabed. The research activity has focused on developing and evaluating the aforementioned
solution on FLS imagery; furthermore, the system feasibility has been verified during real-time tests.
First, the selected CNN (Convolutional Neural Network) models have been trained by exploiting a
custom-gathered dataset of heterogeneous images, acquired in May 2019 at the Naval Support and
Experimentation Centre (Centro di Supporto e Sperimentazione Navale - CSSN) basin, in La Spezia (Italy), and the open-source machine learning library TensorFlow. Then, the trained neural network
models have been incorporated in a custom ATR software, developed in the Robot Operating System
framework [8]. Finally, the ATR solution has been validated online with FeelHippo AUV [9] and an
autonomous moving buoy [10] during an experimental campaign performed within the activities of
the SEALab, the joint research laboratory between the CSSN of the Italian Navy and the
Interuniversity Center of Integrated Systems for the Marine Environment (ISME). This paper is
organized as follows: Section II describes underwater ATR systems with FLS and reviews the most
used CNN architectures in this field. Section III is dedicated to the description of the proposed
methodology by accurately outlining the DL model selection and training processes. Section IV
overviews the experimental scenario and the on-field results obtained by collecting data during a sea
mission. Finally, Section V summarizes the presented research by focusing on the major achieved
results; furthermore, a brief overview of future developments is provided.
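For illustration purposes only, the following minimal Python sketch shows how a trained detector could be wrapped in a ROS node such as the custom ATR software mentioned above. The topic names, the message format, and the detector callable are illustrative assumptions, not the actual implementation described in this paper.

import json
import rospy
from sensor_msgs.msg import Image
from std_msgs.msg import String
from cv_bridge import CvBridge

class ATRNode:
    def __init__(self, detector):
        # detector: any callable mapping a sonar frame to a list of detections
        self.detector = detector
        self.bridge = CvBridge()
        # Topic names below are placeholders, not those of the real system
        self.pub = rospy.Publisher('/atr/detections', String, queue_size=1)
        rospy.Subscriber('/fls/image_raw', Image, self.on_frame, queue_size=1)

    def on_frame(self, msg):
        # Convert the ROS image message to a NumPy array and run inference
        frame = self.bridge.imgmsg_to_cv2(msg, desired_encoding='mono8')
        detections = self.detector(frame)
        self.pub.publish(String(data=json.dumps(detections)))

if __name__ == '__main__':
    rospy.init_node('atr_node')
    ATRNode(lambda frame: [])  # replace the stub with the trained model's inference call
    rospy.spin()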
II. BACKGROUND
With the growing demand for intelligent systems capable of performing complex interactive tasks,
reacting to the environment while inspecting areas, and cooperating meaningfully with human
operators, object detection has become a fundamental feature of modern robots. Unmanned
Ground Vehicles (UGVs) and Unmanned Aerial Vehicles (UAVs) can rely on a large variety of sensors,
ranging from optical cameras to Light Detection and Ranging (LiDAR) devices, to detect objects. Due
to the wide use of modern cameras, several image-based target identification solutions have been developed. In particular, CNN-based approaches have shown outstanding results, becoming the state of the art in image classification and target recognition tasks [5]. Nevertheless, AUVs have limited
recognition capabilities due to the underwater domain. Water turbidity, low-light conditions, and
poor visibility degrade the quality of optical images (Fig. 1), making subsea object detection, also denoted as ATR, unachievable in many cases. Acoustic sensors, such as FLSs or SSSs, represent a
valid solution; indeed, these sensors provide high-range data, not affected by illumination conditions.
Moreover, even though recognizing object patterns in the high-noise acoustic sonar images can be
challenging, FLS may represent a functional device for
underwater ATR by providing reasonably good resolution images (an example is provided in Fig. 1) at high frame rates, without requiring the vehicle to move. Different Template Matching-based object
recognition approaches for FLS imagery have been developed and tested with different similarity
measures and feature-trained classifiers. Nonetheless, these techniques cannot generalize the
template patterns and recognize objects not considered in the dataset; additionally, their
performance degrades when handling multi-scale objects. Therefore, these limitations led many marine researchers to
investigate the use of CNN solutions, the state of the art for optical images, in acoustic imagery. In the literature, custom CNN architectures to classify FLS images have been evaluated. The reported performance
comparison with classical template matching solutions showed that CNNs could provide better
performance while keeping a low number of parameters. Nevertheless, developing a custom CNN
architecture is time expensive and requires plenty of images to train the network; besides, AUVs
usually have limited onboard computational power, and ATR shall be performed in real time. Thus, developing a CNN for onboard applications emerges as a real challenge.
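To make the parameter-count consideration concrete, the following Keras sketch defines a deliberately small classifier for single-channel sonar patches. The patch size and the number of classes are arbitrary assumptions, and the architecture is purely illustrative rather than one evaluated in the cited works.

import tensorflow as tf
from tensorflow.keras import layers, models

def small_sonar_classifier(num_classes, input_shape=(96, 96, 1)):
    return models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(16, 5, activation='relu'),   # few filters keep the
        layers.MaxPooling2D(2),                    # parameter count low for
        layers.Conv2D(32, 3, activation='relu'),   # embedded AUV hardware
        layers.MaxPooling2D(2),
        layers.Conv2D(32, 3, activation='relu'),
        layers.GlobalAveragePooling2D(),           # avoids large dense layers
        layers.Dense(num_classes, activation='softmax'),
    ])

model = small_sonar_classifier(num_classes=4)
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')
model.summary()  # roughly 1.4e4 trainable parameters in total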
Turning to a more detailed overview, these solutions follow a common approach: the first network
layers, called the backbone of the network, are in charge of extracting the dominant features, while
the last layers classify those features and localize objects in the image. Generally, the backbone
is tricky to train and requires a large dataset. Conversely, by using transfer learning, the last layers
can easily be trained on a custom dataset by fine-tuning higher-order feature representations,
speeding up the training phase. As regards the subsea environment, since gathering a large dataset of acoustic images is far from trivial, transfer learning represents a suitable strategy; an example can be found in [13], where the authors tested the You Only Look Once (YOLO) architecture [14], a fully-
convolutional end-to-end optimized network, to detect divers. The Single Shot Multibox Detector
(SSD) [15] is a convolutional network able to detect and classify objects at different scales and at a high frame rate (frames per second, fps). Its native version used the Visual Geometry Group (VGG) network [16] as a
backbone to extract the image features. Different feature extraction networks, such as Inception [17] and MobileNets [18], were also tested. Small convolutional filters are then applied to different-scale
feature maps in the final layers to detect and classify objects. The network training aimed to optimize
a multi-task loss that took into account both the classification error and the bounding box coordinate
error. This simple structure lets the SSD reach high-accuracy detections at high fps (up to 45). When
the detection accuracy shall be favored over the inference speed, Region-based architectures, such
as the Faster R-CNN (Region-based Convolutional Neural Network) [19], are the recommended
choice. The Faster R-CNN's backbone is composed of a feature extractor network and a Region Proposal Network (RPN) that produces the Regions Of Interest (ROIs) in the feature maps and predicts the bounding boxes. Two fully-connected sibling layers take each ROI as input, classify possible
objects and refine the bounding boxes. The loss function used to train the network was a trade-off
between
the classification and the localization tasks. Compared with the SSD, the Faster R-CNN is more
accurate but cannot reach the same exceptionally high inference speed.
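The multi-task objective shared by these detectors can be sketched as a weighted sum of a classification term and a bounding-box regression term. The snippet below is a simplified TensorFlow rendition under common conventions (cross-entropy for classification, a smooth-L1-like Huber term for localization, a balancing weight alpha); the exact formulations in [15] and [19] additionally involve anchor matching and hard-negative mining.

import tensorflow as tf

cls_loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()
loc_loss_fn = tf.keras.losses.Huber(delta=1.0)  # smooth-L1-like regression term

def detection_loss(cls_true, cls_pred, box_true, box_pred, alpha=1.0):
    # Classification error on the predicted class scores
    cls_loss = cls_loss_fn(cls_true, cls_pred)
    # Bounding-box coordinate error on the matched boxes
    loc_loss = loc_loss_fn(box_true, box_pred)
    # Trade-off between the classification and the localization tasks
    return cls_loss + alpha * loc_loss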
Mask R-CNN [20] extended the Faster R-CNN. Firstly, the backbone was improved through the
Feature Pyramid Network (FPN) that can better represent objects at multiple scales. Besides, the
authors added in the final layers a convolutional branch to generate a segmentation mask for the
selected ROIs. The training loss also considered the segmentation tasks, improving the network
performance. In fact, instance segmentation enables identifying object outlines at the pixel level,
enhancing the localization precision.
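As a concrete rendition of the transfer-learning recipe outlined above, the following Keras sketch freezes a pre-trained backbone and trains only a small task-specific head. It is a generic example with an assumed input size and class count, not the Mask R-CNN training pipeline used in this work; single-channel sonar frames would have to be replicated to three channels to match the pre-trained input.

import tensorflow as tf
from tensorflow.keras import layers, models

# Pre-trained feature extractor, kept frozen (transfer learning)
backbone = tf.keras.applications.ResNet50(
    include_top=False, weights='imagenet', input_shape=(224, 224, 3))
backbone.trainable = False

# Small trainable head, fine-tuned on the scarce sonar data
model = models.Sequential([
    backbone,
    layers.GlobalAveragePooling2D(),
    layers.Dense(128, activation='relu'),
    layers.Dense(2, activation='softmax'),  # e.g. target vs. background (assumed classes)
])
model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
              loss='sparse_categorical_crossentropy')
# model.fit(train_images, train_labels, ...)  # only the head weights are updated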
In this work, FLS frames are acquired at a low frame rate (3 Hz); thus, excessively high inference rates are not
required. Moreover, the ATR solution has to provide an additional geolocalization of possible seabed
objects; within this context, since the target 3D positions are estimated from the 2D DNN localization
in the FLS frame, small errors in the bounding boxes at the pixel level could lead to large errors in
meters in the 3D localization. Therefore, the network accuracy is of utmost importance and shall be favored over the inference speed as the model selection criterion.
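A back-of-the-envelope sketch illustrates how pixel-level errors scale to meters. It assumes a hypothetical FLS image in which rows map linearly to range and columns to bearing; the field of view, maximum range, and image size below are made-up values, not those of the sensor used in the experiments.

import numpy as np

FOV_DEG = 120.0      # assumed horizontal field of view (deg)
MAX_RANGE = 30.0     # assumed maximum range (m)
WIDTH, HEIGHT = 512, 512

def pixel_to_polar(u, v):
    # Map image pixel (u, v) to (range, bearing) in the sonar frame
    rng = MAX_RANGE * (HEIGHT - v) / HEIGHT            # row -> range (m)
    bearing = np.deg2rad((u / WIDTH - 0.5) * FOV_DEG)  # column -> bearing (rad)
    return rng, bearing

def polar_to_xy(rng, bearing):
    return rng * np.cos(bearing), rng * np.sin(bearing)

# A 5-pixel horizontal bounding-box error near the maximum range:
x1, y1 = polar_to_xy(*pixel_to_polar(256, 10))
x2, y2 = polar_to_xy(*pixel_to_polar(261, 10))
print(np.hypot(x2 - x1, y2 - y1))  # ~0.6 m displacement from only 5 pixels

Under these assumptions, a 5-pixel offset near the maximum range already corresponds to roughly 0.6 m on the seabed, which motivates favoring localization accuracy over inference speed.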