A Deep Learning Framework For Detecting Underwater Trash
Abstract— Growing environmental concern over underwater pollution and marine debris calls for effective methods of detecting and managing underwater trash. This paper presents a novel approach for underwater trash detection utilizing the YOLOv8 (You Only Look Once, version 8) object detection framework. YOLOv8 is known for its efficiency and accuracy in object detection tasks, making it a suitable choice for detecting submerged waste in aquatic environments. Data augmentation techniques and pre-processing steps are employed to enhance the model's robustness and adaptability to underwater environments. Our experimental results show that the proposed method recognizes and classifies underwater trash. This technology has the potential to aid marine conservation efforts by enabling the timely removal of waste materials from aquatic ecosystems and reducing the harm caused to marine life.
Keywords: YOLOv8, Instance segmentation, Machine learning, Deep learning

I. INTRODUCTION
The development of an advanced image segmentation model is important because of the urgent need to address underwater pollution, especially the widespread problem of submerged trash in aquatic environments. The problem matters not only for its ecological implications but also for its impact on economies and public health. Tin cans and glass bottles are among the many types of submerged debris that are damaging underwater ecosystems; thus, quick and effective action is required. The principal aim is to develop and implement an underwater waste detection system distinguished by its dependability, efficiency, and precision. By employing state-of-the-art technology, such as the YOLO-v8 (You Only Look Once, version 8) algorithm, the project aims to identify and locate submerged objects with a high level of precision. By applying cutting-edge techniques, the goal is to quickly lessen the harmful effects of aquatic litter. Underwater debris must be found as soon as possible to allow for immediate action and to make a major contribution to environmental conservation efforts. To facilitate efficient waste management, the project also aims to accurately detect and categorize underwater litter. The overall objective goes beyond simple detection; it includes promoting sustainable behaviors and reducing the environmental impact of human activity. By putting in place an advanced trash detection system, the project aims to transform underwater monitoring and open the door to active conservation efforts. To minimize harmful effects on aquatic ecosystems, the project aims to identify underwater litter in real time, classify it with high accuracy, and contribute to environmental conservation by making effective trash disposal possible.

II. RELATED WORKS
[1] Azzouni (2023): The use of deep learning methods for trash detection underwater is examined in this article. It probably examines a range of algorithms, their capabilities, and any difficulties that may arise in this field.
[2] Boorugu (2020): This paper surveys the application of natural language processing (NLP) for summarizing product reviews. It likely analyzes different NLP approaches, their effectiveness in extracting key points from reviews, and potential challenges in this context.
[3] Dilawari (2019): This work introduces ASOVS, an abstractive summarization method for video sequences. It might explain how ASOVS extracts and condenses key information from video content into concise summaries.
[4] Ji (2020): This paper proposes the use of attention-based encoder-decoder networks for video summarization. It likely details the network architecture, how it focuses on important video segments, and its effectiveness in generating summaries.
[5] Jeevitha (2020): This work focuses on generating natural language
Authorized licensed use limited to: Amrita School of Engineering. Downloaded on May 07,2025 at 14:23:08 UTC from IEEE Xplore. Restrictions apply.
descriptions for videos using NetVLAD and attentional LSTM techniques. It might explain how these techniques extract video features and translate them into coherent sentences describing the video content.
[6] Li (2019): This paper proposes a multimodal summarization approach for asynchronous text, image, audio, and video data. It likely details the methods for integrating information from different modalities and generating comprehensive summaries across various media types.
[7] Al Muksit (2022): This work introduces TC-YOLO, an underwater object detection method using attention mechanisms. It might explain how the attention mechanism improves the accuracy of object detection in challenging underwater environments.
[8] Zhao (2021): This paper proposes a method for underwater trash detection using spatial pyramid pooling and attention modules. It likely details how these techniques help identify and localize trash objects amidst underwater clutter.
[9] Hu (2020): This work explores the use of YOLOv3 with an attention mechanism for underwater trash detection. It might explain how the attention mechanism improves the accuracy of trash detection in underwater environments.
[10] Murugan and Emilyn (2021) present a study on monitoring and forecasting water quality and fish population using stacked LSTM-GRU in an IoT environment. The study may involve real-time fish count calculation using OpenCV libraries and a Pi camera module. The proposed methodology likely utilizes recurrent neural network algorithms for predicting fish populations and compares them with existing methods based on root mean squared error.
[11] Murugan, Jeba Emilyn, and Prabu M (2020) describe the implementation of a stacked autoencoder with RBM for predicting and monitoring aquatic biodiversity. This research likely explores the application of deep learning techniques, such as autoencoders and restricted Boltzmann machines, to predict and monitor changes in aquatic biodiversity. The study may focus on leveraging machine learning for environmental monitoring and conservation efforts in aquatic ecosystems.
[12] K. Saraswathi, V. Mohanraj, Y. Suresh, J. Senthil Kumar, "A Hybrid Multi-Feature Semantic Similarity-Based Online Social Recommendation System Using CNN," International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, December 2021. This paper presents a novel hybrid recommendation system for online social platforms, integrating semantic similarity and convolutional neural networks (CNN) to enhance recommendation accuracy and relevance.
[13] The paper investigates the improvement of classification accuracy in machine learning through hyperparameter optimization. Authors S. Senthil Pandi, V. R. Chiranjeevi, Kumaragurubaran. T, and Kumar P present their findings in the 2023 International Conference on Research Methodologies in Knowledge Management, Artificial Intelligence, and Telecommunication Engineering (RMKMATE) held in Chennai, India. The study explores methods to refine machine learning algorithms for better performance.
[14] S. SenthilPandi, B. Kalpana, V. K. S., and Kumar P. describe a project that focuses on accurately estimating and segmenting lung tumors using an Adaptive Multiple Resolution Contour Model. This involves utilizing advanced segmentation techniques to precisely delineate tumor boundaries within lung images, enabling volumetric estimation essential for diagnosis and treatment planning in oncology.

III. ARCHITECTURAL DESIGN
YOLO, short for 'You Only Look Once,' is a leading algorithm in computer vision renowned for its swift and effective object detection capabilities. Unlike traditional methods, YOLO treats object detection as a regression problem, directly predicting bounding boxes and class probabilities for detected objects in real-time scenarios. Leveraging Convolutional Neural Networks (CNNs) for rapid processing, YOLO requires only a single forward pass through the network to detect objects, eliminating the complexity of multiple stages and predictions. A recent iteration, YOLO-v8, introduces an anchor-free design, directly predicting object centers rather than offsets from anchor boxes. Its architecture comprises a backbone, rooted in a modified CSPDarknet53, and a head featuring convolutional layers responsible for predicting bounding boxes, objectness scores, and class probabilities. Apart from its efficiency and real-time capabilities, YOLO's architecture underscores end-to-end processing, enabling it to handle entire images and output bounding box coordinates directly. This streamlined approach minimizes computational overhead and facilitates faster inference times, catering to applications requiring rapid object detection across diverse scenarios. Additionally, YOLO's anchor-free design in version 8 enhances its adaptability to varying object sizes and orientations, supporting more accurate detections. Its modular architecture further allows seamless integration with other computer vision tasks, empowering developers to tailor and expand its functionality for specific use cases.

Fig. 1 YOLO-v8 architecture with head and backbone

IV. PROPOSED SYSTEM
The proposed system is an innovative solution for underwater trash detection, leveraging cutting-edge technologies and algorithms to achieve the objectives outlined in the introduction. Its modular architecture ensures scalability
and flexibility in operation, while the core of the system relies on the YOLO-v8 (You Only Look Once, version 8) algorithm, chosen for its real-time capabilities, accuracy, and versatility in detecting submerged debris. This algorithm processes underwater images and video frames in a single forward pass through the neural network, enabling simultaneous prediction of object classes and bounding box coordinates. Because YOLO-v8 is anchor-free and predicts at multiple feature-map scales, the system can detect objects of various sizes, including different-sized underwater trash items. The YOLO-v8 model is trained on a curated dataset of underwater trash images and annotations, enabling it to detect and classify a wide range of underwater debris, such as tin cans and glass bottles. Fine-tuning of the model involves adjusting hyperparameters and other configurations to optimize its accuracy and reliability in underwater environments. An essential component of the system is a dedicated inference engine that supports real-time or batch processing; it takes input images or video streams and produces detection results that include object classes and bounding boxes. Key technologies utilized in the system include the YOLO-v8 algorithm, OpenCV for image preprocessing and data augmentation, and Python as the primary programming language.

The system fulfills the project's primary objectives through its use of the YOLO-v8 algorithm, providing real-time detection for immediate responses to pollution incidents while achieving high accuracy in identifying various types and sizes of underwater debris through training and fine-tuning. This capability facilitates efficient removal, contributing to environmental conservation. Moreover, its modular architecture ensures scalability, making it suitable for deployment in diverse aquatic environments, while making efficient use of hardware resources, particularly the NVIDIA RTX 3050 GPU, for cost-effective and practical deployment, thus supporting both effective pollution management and resource efficiency.

V. DATA COLLECTION
The data collection phase involves acquiring underwater images and videos from sensors or cameras deployed in aquatic environments, which lays the groundwork for subsequent analysis and detection. The success of the entire system depends on obtaining high-quality and reliable data. The cornerstone of this data collection effort is the TrashCan dataset, a repository of 7,212 annotated images. The dataset contains observations of trash, remotely operated vehicles (ROVs), and a wide variety of undersea flora and fauna. The annotations in TrashCan take the form of instance segmentation annotations: bitmaps that mark the pixels belonging to each object in the underwater imagery. The TrashCan dataset draws its source material from the J-EDI (JAMSTEC E-Library of Deep-sea Images) dataset, curated by the Japan Agency for Marine-Earth Science and Technology (JAMSTEC). The imagery in TrashCan originates from videos captured by ROVs operated by JAMSTEC since 1982, predominantly in the Sea of Japan. The dataset comes in two versions, TrashCan-Material and TrashCan-Instance, each corresponding to a different configuration of object classes. The overarching aim is to develop effective trash detection methods suitable for onboard robot deployment in aquatic environments. TrashCan is notable as the first instance-segmentation annotated dataset dedicated solely to underwater trash, which marks a substantial advance for research on this complex problem. The release of the TrashCan dataset is expected to stimulate further research and innovative solutions within the marine robotics community, moving closer to solving the problem of autonomous trash detection and removal in underwater ecosystems.

Since Kaggle and other online resources did not initially provide us with good-quality images, we also created our own dataset based on our needs. Our platform of choice for manually annotating bounding boxes and generating datasets was Roboflow. The images in this dataset helped improve overall accuracy and helped the model cope with low-quality images and annotations.
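Instance-segmentation annotations of this kind have to be expressed in the label format the detector consumes. As a rough illustration only, the sketch below assumes COCO-style JSON records (flat pixel-space polygons and [x_min, y_min, width, height] boxes) and the plain-text YOLO label layout with coordinates normalized to the image size. The function names and the 480x270 image size are hypothetical and are not taken from the authors' code.

```python
from typing import List, Sequence


def polygon_to_yolo_seg(class_index: int,
                        polygon_px: Sequence[float],
                        image_width: int,
                        image_height: int) -> str:
    """Convert one flat pixel-space polygon [x1, y1, x2, y2, ...] into a
    YOLO segmentation label line: "<class> x1 y1 x2 y2 ..." with every
    coordinate normalized to [0, 1]."""
    if len(polygon_px) < 6 or len(polygon_px) % 2 != 0:
        raise ValueError("a polygon needs at least 3 (x, y) points")
    parts: List[str] = [str(class_index)]
    for i in range(0, len(polygon_px), 2):
        # Normalize by image size and clamp points that fall slightly outside.
        x = min(max(polygon_px[i] / image_width, 0.0), 1.0)
        y = min(max(polygon_px[i + 1] / image_height, 0.0), 1.0)
        parts.append(f"{x:.6f}")
        parts.append(f"{y:.6f}")
    return " ".join(parts)


def bbox_to_yolo_det(class_index: int,
                     bbox_xywh: Sequence[float],
                     image_width: int,
                     image_height: int) -> str:
    """Convert a COCO-style box [x_min, y_min, width, height] in pixels into
    a YOLO detection label line: "<class> x_center y_center width height",
    all normalized to [0, 1]."""
    x_min, y_min, width, height = bbox_xywh
    x_center = (x_min + width / 2) / image_width
    y_center = (y_min + height / 2) / image_height
    return (f"{class_index} {x_center:.6f} {y_center:.6f} "
            f"{width / image_width:.6f} {height / image_height:.6f}")


if __name__ == "__main__":
    # Hypothetical triangle and box on a 480x270 frame.
    print(polygon_to_yolo_seg(3, [48, 27, 96, 27, 96, 54], 480, 270))
    print(bbox_to_yolo_det(0, [120, 60, 240, 90], 480, 270))
```

Each image would then get one text file with one such line per annotated object; the class index is the position of the class in the dataset's label list.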
VI. DATA PREPROCESSING
During the data collection phase, it is important to ensure that the video and alignment data are well prepared and pre-processed. This module is the first step in our underwater trash detection system and the gateway to accurate object detection and classification. It focuses on collecting underwater images and videos and preparing them for use with the YOLO-v8 algorithm. The primary objective is to refine and optimize the data to improve the system's detection accuracy. Originally provided in JSON format, the annotations are converted into YOLO text file format for direct download and use. With 6,008 training images and 1,204 validation images, the dataset supports robust model training and evaluation. The dataset comprises 16 distinct classes, each linked to a distinct label index. This allows for the accurate identification and categorization of various kinds of underwater waste. For YOLOv8, several pre-processing steps are needed to prepare the raw data for training the models effectively:

• Image Enhancement: Techniques are applied to improve the quality and clarity of the collected data. These include contrast adjustment to improve visibility, noise reduction to eliminate unwanted artifacts, and image stabilization to counter the effects of turbulence or motion. The goal is to provide the YOLO-v8 algorithm with clear and usable data, minimizing potential hindrances to accurate detection.
• Data Augmentation: Various strategies are employed to bolster the robustness of the detection model by diversifying the dataset. This step is crucial to enhance the model's ability to generalize across varying environmental conditions.
• Data Labelling: This is the core of the module's functionality, ensuring precise and consistent annotation of underwater images and videos, primarily focusing on the objects of interest, namely underwater trash items.

VII. MODEL DEVELOPMENT AND EVALUATION
In the training phase, the YOLO-v8 model is trained to recognize and classify various underwater trash items. This phase relies on a curated dataset of underwater trash images, which provides the visual references for the model. Fine-tuning is applied during this phase to optimize the model's parameters and adapt it to the distinct and often challenging underwater environment. Fine-tuning involves adjusting hyperparameters and other model configurations to improve performance in underwater settings, thereby tailoring the YOLO-v8 model to the specific conditions and requirements of our project.

Once the YOLO-v8 model completes its training, it moves to the evaluation phase, where its performance is assessed using well-established metrics that determine its readiness for real-world application. This assessment includes mean average precision (mAP), recall, and precision. Precision assesses the model's ability to make accurate positive predictions, which underpins the dependability of its identifications. Recall measures the model's capacity to find all relevant objects in the underwater environment, reducing the chance of missing debris items. Mean average precision (mAP) provides a comprehensive measure of the model's overall object detection accuracy. The evaluation phase acts as a checkpoint that verifies the model's accuracy and dependability in identifying underwater trash before practical deployment.

• Precision: The proportion of the model's positive predictions that are correct, i.e., the ratio of true positives (correctly identified trash items) to the sum of true positives and false positives (incorrectly identified items).
• Recall: Recall, sometimes referred to as sensitivity, gauges how well the model can locate all relevant instances of trash items. It is the ratio of true positives to the sum of true positives and false negatives (missed items).
• mAP (Mean Average Precision): mAP is an all-inclusive measure that takes the trade-off between precision and recall into account at different thresholds. It offers an overview of how accurate the model is overall at finding trash underwater.

VIII. RESULTS AND DISCUSSIONS
Of the YOLO-v8 models trained (Nano, Small, and Medium), the Medium model achieves the highest box mAP so far, 45%. The segmentation mask mAP for the Medium model reaches 36.2%, indicating that it can delineate underwater trash objects. This shows the Medium model's superior performance compared to its counterparts. After training and fine-tuning, the Medium model shows promising capability in detecting and classifying underwater debris, in line with the project's objectives of precise and efficient trash detection in aquatic environments.

Fig. 4 Sample Segmentation (I/II)
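The precision, recall, and mAP definitions given in Section VII can be written out as a short calculation. The following is a minimal sketch, not the evaluation code used for the reported results. It assumes that detections have already been matched to ground-truth objects at a fixed IoU threshold and sorted by descending confidence, and all function names are hypothetical.

```python
from typing import List, Tuple


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def average_precision(is_true_positive: List[bool],
                      num_ground_truth: int) -> float:
    """Area under the precision-recall curve for one class.

    is_true_positive holds one flag per detection, ordered by descending
    confidence; num_ground_truth is the number of labelled objects."""
    if num_ground_truth == 0:
        return 0.0
    precisions: List[float] = []
    recalls: List[float] = []
    tp = 0
    for rank, hit in enumerate(is_true_positive, start=1):
        tp += int(hit)
        precisions.append(tp / rank)
        recalls.append(tp / num_ground_truth)
    # Replace each precision by the best precision at equal or higher
    # recall, so the curve is non-increasing before integration.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap = 0.0
    previous_recall = 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - previous_recall) * p
        previous_recall = r
    return ap


def mean_average_precision(per_class_ap: List[float]) -> float:
    """mAP is the mean of the per-class average precisions."""
    return sum(per_class_ap) / len(per_class_ap) if per_class_ap else 0.0


if __name__ == "__main__":
    print(precision_recall(tp=8, fp=2, fn=4))
    print(average_precision([True, False, True], num_ground_truth=2))
```

For example, 8 true positives, 2 false positives, and 4 missed items give a precision of 0.8 and a recall of about 0.67. Detection toolkits often also average mAP over several IoU thresholds; that step is left out of this sketch.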
Fig. 7 Segmentation mask mAP comparison.
Fig. 8 Bounding box mAP comparison.

IX. CONCLUSION AND FUTURE WORKS

REFERENCES
[1] Azzouni, A., & Rahmani, Y. (2023). Underwater Trash Detection Using Deep Learning Techniques: A Review. Sensors, 23(1), 124.
[2] Bassel, E. A., Refaat, M., Abdelhamed, M., Shorim, N., & AbdelRaouf, A. (2021). Automatic Video summarization with Timestamps using natural language processing text fusion. IEEE 11th Annual Computing and Communication Workshop and Conference (CCWC), 0060-0066.
[3] Boorugu, R., & Ramesh, G. (2020). A survey on NLP-based text summarization for summarizing product reviews. In Proceedings of the 2nd International Conference on Inventive Research in Computing Applications, ICIRCA 2020 (pp. 352-356).
[4] A. K, S. Y. Prathima, S. Ps, V. V and T. M, "Toxicity Detection in Soap using Deep Learning," 2023 International Conference on Innovative Computing, Intelligent Communication and Smart Electrical Systems (ICSES), Chennai, India, 2023, pp. 1-6, doi: 10.1109/ICSES60034.2023.10465402.
[5] Ananthajothi, K & Subramaniam, M, "CLDC: Efficient Classification of Medical Data Using Class Level Disease Convergence Divergence Measure", International Journal of Innovative Technology and Exploring Engineering (IJITEE), ISSN 2278-3075, Vol. 8, no. 10, pp. 2256-2262 (2019). doi:10.35940/ijitee.J1123.0881019.
[6] Jeevitha, V. K., & Hemalatha, M. (2020). Natural Language Description for Videos Using NetVLAD and Attentional LSTM. 2020 International Conference for Emerging Technology (INCET), 1-6.
[7] S. Saravanan, M. Sivabalakrishnan, N. Duraimurugan, D. Divya, "Artificial Intelligence Security Model For Privacy Renitence In Big Data Analytics", Applied Mathematics & Information Sciences, 2022, Vol-16 [6], pp. 919-927, doi:10.18576/amis/160608.
[8] Al Muksit, F. A., Hossain, A. B. M. S., & Rahman, M. S. (2022). Underwater object detection using TC-YOLO with attention mechanisms. Sensors, 22(5), 1786.
[9] D. Nagendiran and S.P. Chokkalingam, "Real Time Brain Tumor Prediction Using Adaptive Neuro Fuzzy Technique," Intell. Automat. Soft Comput., vol. 33, no. 2, pp. 983-996, 2022. https://doi.org/10.32604/iasc.2022.023982.
[10] Murugan, V., Emilyn, J.J., "Monitoring And Forecasting Of Water Quality And Fish Population Using Stacked LSTM-GRU In Iot Environment", Comptes Rendus De L'academie Bulgare Des Sciences, 2021, 74(10), pp. 1529–1536.
[11] Murugan, V., Jeba Emilyn, J., Prabu M, "Implementation of stacked autoencoder with rbm for predicting and monitoring aquatic biodiversity", International Journal on Emerging Technologies, 2020, 11(3), pp. 816–820.
[12] K. Saraswathi, V. Mohanraj, Y. Suresh, J. Senthil Kumar, "A Hybrid Multi-Feature Semantic Similarity-Based Online Social Recommendation System Using CNN," International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, December 2021.
[13] S. Senthil Pandi, V. R. Chiranjeevi, Kumaragurubaran. T and Kumar P, "Improvement of Classification Accuracy in Machine Learning Algorithm by Hyper-Parameter Optimization," 2023 International Conference on Research Methodologies in Knowledge Management, Artificial Intelligence and Telecommunication Engineering (RMKMATE), Chennai, India, 2023, pp. 1-5, doi: 10.1109/RMKMATE59243.2023.10369177
[14] S. SenthilPandi, B. Kalpana, V. K. S and Kumar P, "Lung Tumor Volumetric Estimation and Segmentation using Adaptive Multiple Resolution Contour Model," 2023 International Conference on Research Methodologies in Knowledge Management, Artificial Intelligence and Telecommunication Engineering (RMKMATE), Chennai, India, 2023, pp. 1-4, doi: 10.1109/RMKMATE59243.2023.10369853