Active Detection For Fish Species Recognition in Underwater Environments
Chiranjibi Shah^a, M M Nabi^b, Simegnew Yihunie Alaba^b, Ryan Caillouet^c, Jack Prior^{a,c}, Matthew Campbell^c, Matthew D. Grossi^c, Farron Wallace^d, John E. Ball^b, and Robert Moorhead^a

^a Northern Gulf Institute, Mississippi State University, Starkville, MS 39759, USA
^b Department of Electrical and Computer Engineering, James Worth Bagley College of Engineering, Mississippi State University, Starkville, MS 39762, USA
^c National Marine Fisheries Services, Southeast Fisheries Science Center, 3209 Frederic Street, Pascagoula, MS 39567, USA
^d NOAA Fisheries, 4700 Avenue U, Galveston, TX 77551, USA
ABSTRACT
Fish species must be identified for stock assessments, ecosystem monitoring, production management, and the
conservation of endangered species. Implementing algorithms for fish species detection in underwater settings like
the Gulf of Mexico poses a formidable challenge. Active learning, a method that efficiently identifies informative
samples for annotation while staying within a budget, has demonstrated its effectiveness in the context of object
detection in recent times. In this study, we present an active detection model designed for fish species recognition
in underwater environments. This model can be employed as an object detection system to effectively lower the
expense associated with manual annotation. It uses epistemic uncertainty with evidential deep learning (EDL)
and proposes a novel module denoted as model evidence head (MEH) for fish species detection in underwater
environments. It employs hierarchical uncertainty aggregation (HUA) to obtain the informativeness of an image.
We conducted experiments using a fine-grained and extensive dataset of reef fish collected from the Gulf of
Mexico, specifically the Southeast Area Monitoring and Assessment Program Dataset 2021 (SEAMAPD21).
The experimental results demonstrate that an active detection framework achieves better detection performance
on the SEAMAPD21 dataset demonstrating a favorable balance between performance and data efficiency for
underwater fish species recognition.
Keywords: active detection, informativeness of an image, uncertainty aggregation, underwater fish species recognition
1. INTRODUCTION
For over 30 years, NOAA's Southeast Fisheries Science Center has been carrying out video surveys within the Gulf of Mexico. These surveys provide valuable data that scientists use to assess the long-term population status of ecologically and economically important species, and they are essential for the management and sustainability of the snapper-grouper fisheries in the Gulf of Mexico. Although there is growing interest in video surveys within NOAA Fisheries and the Southeast Fisheries Science Center, processing these videos remains slow and costly, relying heavily on manual labor.
The identification and enumeration of fish species are crucial aspects of processing video-based fish surveys.
Accurately and efficiently recognizing fish species is essential for species identification, ecosystem monitoring,
and the development of effective management systems.1–3 The determination of population status relies on
precise survey information, and accurate identification of fish species becomes particularly vital when their
Further author information: (Send correspondence to John E. Ball)
Chiranjibi Shah: E-mail: [email protected]
John E. Ball: E-mail: [email protected]
3. PROPOSED METHOD
3.1 AD-FSR: Active Detection for Fish Species Recognition in Underwater
Environments
Active learning comprises multiple learning cycles. In the initial cycle, a large unlabeled dataset $U^s$ and a small labeled dataset $L^s$ are provided. After training an object detection model, the network chooses the most informative images $I^s$ from $U^s$. The process continues iteratively until the annotation budget is depleted.
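This cycle can be sketched as follows. The function names (`train`, `informativeness`, `annotate`) are hypothetical placeholders for illustration, not the authors' implementation:

```python
# Minimal sketch of the active learning loop: repeatedly train on the
# labeled pool L^s, score the unlabeled pool U^s, and move the most
# informative images I^s to the annotator until the budget is spent.
def active_learning_loop(labeled, unlabeled, budget, query_size,
                         train, informativeness, annotate):
    while budget > 0 and unlabeled:
        model = train(labeled)                        # train detector on L^s
        # rank the unlabeled pool by informativeness (e.g., an HUA score)
        ranked = sorted(unlabeled,
                        key=lambda x: informativeness(model, x),
                        reverse=True)
        k = min(query_size, budget, len(ranked))
        queried = ranked[:k]                          # most informative I^s
        labeled += [annotate(x) for x in queried]     # send to annotator
        unlabeled = ranked[k:]
        budget -= k
    return train(labeled)
```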
As shown in Figure 1, the process predicts the confidence scores for classes $\beta = [\beta_n]_{n=1}^{N}$ and model evidence $\lambda$ for obtaining the concentration parameter $\alpha = [\alpha_n]_{n=1}^{N}$. Then, it constructs the prior Dirichlet distribution^35 $\mathrm{Dir}(\theta|\alpha)$ for every bounding box.^25 Epistemic uncertainty is subsequently calculated using model ensembles $\theta \sim \mathrm{Dir}(\theta|\alpha)$ as the mutual information between predictions and the model posterior with the following relation:

$$I(y, \theta) = H\!\left(\mathbb{E}_{p(\theta|\alpha)}[P(y|\theta)]\right) - \mathbb{E}_{p(\theta|\alpha)}\!\left(H[P(y|\theta)]\right), \qquad (1)$$

where $I(y, \theta)$ represents the epistemic uncertainty, the first term on the right side of Eq. 1 denotes the total uncertainty, and the second term on the right side of Eq. 1 denotes the aleatoric uncertainty. The symbol $H$ represents Shannon entropy, and $\theta$ is the parameter to estimate the likelihood for probability functions in the Dirichlet and categorical distributions. Then, Evidential Deep Learning (EDL) is applied for fish species recognition.

Figure 2. Hierarchical Uncertainty Aggregation in Active Detection for Fish-species Recognition (AD-FSR)
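For a Dirichlet posterior, both terms of Eq. 1 have closed forms, so the decomposition can be computed directly from the concentration parameters. The sketch below is an illustrative implementation, not the authors' code:

```python
# Eq. (1) for a single bounding box: epistemic uncertainty as the mutual
# information I(y, theta) under Dir(theta|alpha).
import numpy as np
from scipy.special import digamma

def epistemic_uncertainty(alpha):
    """Return total uncertainty minus aleatoric uncertainty."""
    alpha = np.asarray(alpha, dtype=float)
    a0 = alpha.sum()
    p = alpha / a0                         # expected class probabilities
    total = -np.sum(p * np.log(p))         # H[E_{p(theta|alpha)} P(y|theta)]
    # closed form of E_{p(theta|alpha)} H[P(y|theta)] for a Dirichlet
    aleatoric = np.sum(p * (digamma(a0 + 1) - digamma(alpha + 1)))
    return total - aleatoric
```

A box with little evidence (all $\alpha_n$ near 1) yields high epistemic uncertainty, while a box backed by abundant evidence (large $\alpha_n$) yields a value near zero, even when the class distribution itself is uniform.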
The marginal likelihood for bounding box $y_n$ can be calculated with parameter $\theta$ as:

$$p(y_n|\alpha) = \frac{\Gamma\!\left(\sum_{t=1}^{T}\alpha_t\right)}{\prod_{t=1}^{T}\Gamma(\alpha_t)} \cdot \frac{\prod_{t\neq n}\Gamma(\alpha_t)\,\Gamma(\alpha_n+1)}{\Gamma\!\left(\sum_{t=1}^{T}\alpha_t+1\right)} = \frac{\alpha_n}{\sum_{t}\alpha_t}, \qquad (2)$$

where $\Gamma(\alpha_t)$ is the gamma function^36 such that $\Gamma(\alpha_t) = (\alpha_t - 1)!$ for an integer $\alpha_t$.
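The cancellation in Eq. 2 can be verified numerically by evaluating the gamma-function expression term by term and comparing it against $\alpha_n / \sum_t \alpha_t$; a small check:

```python
# Numerical check of Eq. (2): the gamma-function expression reduces to
# alpha_n / sum(alpha).
from math import gamma, prod, isclose

def marginal_likelihood(alpha, n):
    """Evaluate p(y_n | alpha) term by term as written in Eq. (2)."""
    s = sum(alpha)
    normalizer = gamma(s) / prod(gamma(a) for a in alpha)
    numerator = (prod(gamma(a) for t, a in enumerate(alpha) if t != n)
                 * gamma(alpha[n] + 1))
    return normalizer * numerator / gamma(s + 1)

alpha = [2.0, 3.0, 5.0]
assert isclose(marginal_likelihood(alpha, 0), alpha[0] / sum(alpha))  # 0.2
```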
To overcome the instability of EDL arising from its adversarial regularization loss and its use of the ReLU function, the model evidence head (MEH) is used. The model evidence $\lambda$ produced by MEH rescales the concentration parameters with the following relation:

$$\alpha_n = \lambda \, \frac{\exp(\beta_n)}{\sum_{cl} \exp(\beta_{cl})}, \qquad (3)$$

where $\beta_{cl}$ is the confidence for class $cl$. Epistemic uncertainty is then estimated based on the rescaled $\mathrm{Dir}(\theta|\alpha)$, obtained with $\lambda$ and $\alpha$.
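Eq. 3 amounts to a softmax over the class confidences scaled by the model evidence; a minimal sketch, assuming `beta` is the raw confidence vector and `lam` the scalar MEH output:

```python
# Eq. (3): rescale softmax(beta) by the model evidence lambda to obtain
# the Dirichlet concentrations alpha.
import numpy as np

def rescaled_alpha(beta, lam):
    """alpha_n = lam * exp(beta_n) / sum_cl exp(beta_cl)."""
    e = np.exp(beta - np.max(beta))   # numerically stable softmax
    return lam * e / e.sum()
```

By construction $\sum_n \alpha_n = \lambda$, so a larger model evidence concentrates $\mathrm{Dir}(\theta|\alpha)$ and lowers the epistemic uncertainty, consistent with the parenthetical note in Section 4.1.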
3.4 Dataset
The extensive SEAMAPD21^12 dataset of reef fish was utilized for the experiment. In its entirety, the dataset contains 130 unique fish species classes within underwater environments, comprising 28,319 images. Nevertheless, certain species have limited representation, and the model is biased toward species with higher sample frequency per class. Species with lower frequencies pose challenges in achieving the train-test ratio necessary for active learning methods. To mitigate interclass similarity among different fish species and to address the scarcity of samples for rare species, we reduced redundancy from similar species and prioritized species with higher occurrence counts. To achieve this goal, we chose 51 unique fish species classes from the SEAMAPD21 dataset. This fisheries-independent underwater dataset presents challenges for fish detection due to its low resolution, which makes it difficult to differentiate fish from the background. A split ratio of 70/15/15 is employed for the training, validation, and test sets, respectively.
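The class selection and split can be sketched as below; the `(image_id, species)` annotation format and function names are assumptions for illustration, not the SEAMAPD21 file layout:

```python
# Keep the most frequent species classes, then split images 70/15/15.
import random
from collections import Counter

def filter_and_split(annotations, num_classes=51, seed=0):
    # keep the num_classes most frequently occurring species
    counts = Counter(species for _, species in annotations)
    keep = {s for s, _ in counts.most_common(num_classes)}
    images = sorted({img for img, s in annotations if s in keep})
    random.Random(seed).shuffle(images)
    n = len(images)
    n_train, n_val = int(0.70 * n), int(0.15 * n)
    return (images[:n_train],                     # training set
            images[n_train:n_train + n_val],      # validation set
            images[n_train + n_val:])             # test set
```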
4. EXPERIMENTAL RESULTS
4.1 Experimental Settings
We employed the VGG-16^26 backbone network in conjunction with the SSD^27 detection head. Input resolutions of both 300×300 and 512×512 were tested to assess performance across different scales. For AD-FSR on the SEAMAPD21 dataset, 5% of the labeled samples were initially chosen from the training set. Subsequently, in each active learning cycle, 2.5% of the fish images were selected from the remaining unlabeled pool until the labeled pool reached 20% of the training set. Next, the detection results, in terms of mean Average Precision (mAP), were evaluated on the test set. We chose an Intersection over Union (IoU) threshold of 0.50 to calculate the mAP across the 51 fish species classes. To mitigate interclass similarity among various fish species and address the issue of limited samples for rare species, we aimed to minimize redundancy caused by similar species by selecting species with a higher frequency of occurrences. During each active learning cycle, a Stochastic Gradient Descent (SGD) optimizer with a learning rate of 0.001, a λ of 0.5 (epistemic uncertainty decreases for higher values), and a batch size of 64 was employed.
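The labeling schedule above (start at 5%, add 2.5% per cycle, stop at 20%) implies six selection rounds after the initial pool; a small check:

```python
# Enumerate the labeled-pool fractions produced by the active learning
# budget schedule described in the text.
def labeling_schedule(initial=0.05, step=0.025, budget=0.20):
    fractions = [initial]
    while fractions[-1] + step <= budget + 1e-9:
        fractions.append(round(fractions[-1] + step, 4))
    return fractions

print(labeling_schedule())
# [0.05, 0.075, 0.1, 0.125, 0.15, 0.175, 0.2]
```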
Table 2. mAP (%) for AD-FSR on public Pascal VOC for varying resolutions
Figure 3 is a graphical representation of the performance corresponding to the various input resolutions presented in Table 1. It can be observed that the VGG-16 backbone with the SSD head at an input resolution of 512×512 outperforms the VGG-16 with the SSD head at a 300×300 input resolution for all numbers of labeled training samples ranging from 5% to 20%.
Table 3 presents the parameter counts and inference times in frames per second (FPS) for various input resolutions. The detection results on the fish species dataset for varying input resolutions are demonstrated in the first and second columns of Figure 4. The detection maps align with the mean Average Precision (mAP) results provided in Table 1. The first column of Figure 4 shows the qualitative output for AD-FSR with the VGG-16 backbone and SSD300 detection head on SEAMAPD21. Similarly, the second column of Figure 4 demonstrates the qualitative output for AD-FSR with the VGG-16 backbone and SSD512 detection head on SEAMAPD21. The results indicate that the VGG-16 with SSD512 (second column of Figure 4) demonstrates superior detection performance for fish species in underwater environments when compared to the VGG-16 with SSD300 (first column of Figure 4).
Figure 4. Detection images of AD-FSR with the VGG-16 backbone network at varying resolutions: (left column) SSD300 detection head on SEAMAPD21; (right column) SSD512 detection head on SEAMAPD21.
5. CONCLUSIONS
In summary, we employed active detection for fish species recognition (AD-FSR) within the SEAMAPD21
dataset. Utilizing the evidential deep learning framework and model evidence head, this approach proficiently
estimates the epistemic uncertainty associated with a bounding box. Additionally, the hierarchical uncertainty
aggregation offers a novel method for calculating the informativeness of an image. The VGG-16 backbone network with the SSD512 detection head achieved the best detection performance, demonstrating a favorable balance between performance and data efficiency for underwater fish species recognition.
ACKNOWLEDGMENT
This research received support from grants NA16OAR4320199 and NA21OAR4320190 awarded to the Northern
Gulf Institute at Mississippi State University by NOAA’s Office of Oceanic and Atmospheric Research, U.S.
Department of Commerce. The authors acknowledge this funding source with gratitude.
REFERENCES
[1] Chang, C., Fang, W., Jao, R.-C., Shyu, C., and Liao, I.-C., "Development of an intelligent feeding controller for indoor intensive culturing of eel," Aquacultural Engineering 32(2), 343–353 (2005).
[2] Cabreira, A. G., Tripode, M., and Madirolas, A., “Artificial neural networks for fish-species identification,”
ICES Journal of Marine Science 66(6), 1119–1129 (2009).
[3] Prior, J. H., Alaba, S. Y., Wallace, F., Campbell, M. D., Shah, C., Nabi, M. M., Mickle, P. F., Moorhead,
R., and Ball, J. E., “Optimizing and gauging model performance with metrics to integrate with existing
video surveys,” in [OCEANS 2023 - MTS/IEEE U.S. Gulf Coast ], 1–6 (2023).
[4] Yang, X., Zhang, S., Liu, J., Gao, Q., Dong, S., and Zhou, C., “Deep learning for smart fish farming:
applications, opportunities and challenges,” Reviews in Aquaculture 13(1), 66–90 (2021).
[5] Alaba, S. Y., Nabi, M., Shah, C., Prior, J., Campbell, M. D., Wallace, F., Ball, J. E., and Moorhead, R.,
“Class-aware fish species recognition using deep learning for an imbalanced dataset,” Sensors 22(21), 8268
(2022).
[6] Boswell, K. M., Wilson, M. P., and Cowan Jr, J. H., “A semiautomated approach to estimating fish size,
abundance, and behavior from dual-frequency identification sonar (didson) data,” North American Journal
of Fisheries Management 28(3), 799–807 (2008).
[7] Churnside, J. H., Wells, R., Boswell, K. M., Quinlan, J. A., Marchbanks, R. D., McCarty, B. J., and Sutton,
T. T., “Surveying the distribution and abundance of flying fishes and other epipelagics in the northern gulf
of mexico using airborne lidar,” Bulletin of Marine Science 93(2), 591–609 (2017).
[8] Villon, S., Chaumont, M., Subsol, G., Villéger, S., Claverie, T., and Mouillot, D., “Coral reef fish detection
and recognition in underwater videos by supervised machine learning: Comparison between deep learning
and hog+ svm methods,” in [International Conference on Advanced Concepts for Intelligent Vision Systems ],
160–171, Springer (2016).
[9] Sirohey, S., Rosenfeld, A., and Duric, Z., “A method of detecting and tracking irises and eyelids in video,”
Pattern recognition 35(6), 1389–1401 (2002).
[10] Morshed, M., Nabi, M., and Monzur, N., “Frame by frame digital video denoising using multiplicative noise
model,” Int. J. Technol. Enhanc. Emerg. Eng. Res 2, 1–6 (2014).
[11] Gilby, B. L., Olds, A. D., Connolly, R. M., Yabsley, N. A., Maxwell, P. S., Tibbetts, I. R., Schoeman,
D. S., and Schlacher, T. A., “Umbrellas can work under water: Using threatened species as indicator
and management surrogates can improve coastal conservation,” Estuarine, Coastal and Shelf Science 199,
132–140 (2017).
[12] Boulais, O., Alaba, S. Y., Ball, J. E., Campbell, M., Iftekhar, A. T., Moorehead, R., Primrose, J., Prior,
J., Wallace, F., Yu, H., et al., “Seamapd21: a large-scale reef fish dataset for fine-grained categorization,”
The Eighth Workshop on Fine-Grained Visual Categorization (2021).
[13] Zhao, M., Chang, C. H., Xie, W., Xie, Z., and Hu, J., “Cloud shape classification system based on multi-
channel cnn and improved fdm,” IEEE Access 8, 44111–44124 (2020).