Active Detection For Fish Species Recognition in Underwater Environments
Chiranjibi Shah^a, M M Nabi^b, Simegnew Yihunie Alaba^b, Ryan Caillouet^c, Jack Prior^{a,c}, Matthew Campbell^c, Matthew D. Grossi^c, Farron Wallace^d, John E. Ball^b, and Robert Moorhead^a

^a Northern Gulf Institute, Mississippi State University, Starkville, MS 39759, USA
^b Department of Electrical and Computer Engineering, James Worth Bagley College of Engineering, Mississippi State University, Starkville, MS 39762, USA
^c National Marine Fisheries Services, Southeast Fisheries Science Center, 3209 Frederic Street, Pascagoula, MS 39567, USA
^d NOAA Fisheries, 4700 Avenue U, Galveston, TX 77551, USA
ABSTRACT
Fish species must be identified for stock assessments, ecosystem monitoring, production management, and the
conservation of endangered species. Implementing algorithms for fish species detection in underwater settings like
the Gulf of Mexico poses a formidable challenge. Active learning, a method that efficiently identifies informative
samples for annotation while staying within a budget, has demonstrated its effectiveness in the context of object
detection in recent times. In this study, we present an active detection model designed for fish species recognition
in underwater environments. This model can be employed as an object detection system to effectively lower the
expense associated with manual annotation. It uses epistemic uncertainty with evidential deep learning (EDL)
and proposes a novel module denoted as model evidence head (MEH) for fish species detection in underwater
environments. It employs hierarchical uncertainty aggregation (HUA) to obtain the informativeness of an image.
We conducted experiments using a fine-grained and extensive dataset of reef fish collected from the Gulf of
Mexico, specifically the Southeast Area Monitoring and Assessment Program Dataset 2021 (SEAMAPD21).
The experimental results demonstrate that an active detection framework achieves better detection performance
on the SEAMAPD21 dataset demonstrating a favorable balance between performance and data efficiency for
underwater fish species recognition.
Keywords: active detection, informativeness of an image, uncertainty aggregation, underwater fish species recognition
1. INTRODUCTION
For over 30 years, NOAA's Southeast Fisheries Science Center has been carrying out video surveys within the Gulf of Mexico. These surveys provide valuable data that scientists use to assess the long-term population status of ecologically and economically important species, and they are essential for the management and sustainability of the snapper-grouper fisheries in the Gulf of Mexico. Although there is growing interest in video surveys within NOAA Fisheries and the Southeast Fisheries Science Center, processing these videos remains slow and costly, relying heavily on manual labor.
The identification and enumeration of fish species are crucial aspects of processing video-based fish surveys.
Accurately and efficiently recognizing fish species is essential for species identification, ecosystem monitoring,
and the development of effective management systems.1–3 The determination of population status relies on
precise survey information, and accurate identification of fish species becomes particularly vital when their
Further author information: (Send correspondence to John E. Ball)
Chiranjibi Shah: E-mail: [email protected]
John E. Ball: E-mail: [email protected]
3. PROPOSED METHOD
3.1 AD-FSR: Active Detection for Fish Species Recognition in Underwater
Environments
Active learning comprises multiple learning cycles. In the initial cycle, a large unlabeled dataset $U^s$ and a small labeled dataset $L^s$ are provided. After training an object detection model, the network chooses the most informative images $I^s$ from $U^s$. The process continues iteratively until the annotation budget is depleted.
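This cycle can be sketched as follows. The function names (`train`, `informativeness`, `annotate`) are hypothetical placeholders for illustration, not the authors' implementation:

```python
# Minimal sketch of the active learning loop: repeatedly train on the
# labeled pool L^s, score the unlabeled pool U^s, and move the most
# informative images I^s to the annotator until the budget is spent.
def active_learning_loop(labeled, unlabeled, budget, query_size,
                         train, informativeness, annotate):
    while budget > 0 and unlabeled:
        model = train(labeled)                        # train detector on L^s
        # rank the unlabeled pool by informativeness (e.g., an HUA score)
        ranked = sorted(unlabeled,
                        key=lambda x: informativeness(model, x),
                        reverse=True)
        k = min(query_size, budget, len(ranked))
        queried = ranked[:k]                          # most informative I^s
        labeled += [annotate(x) for x in queried]     # send to annotator
        unlabeled = ranked[k:]
        budget -= k
    return train(labeled)
```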
As shown in Figure 1, the process predicts the confidence scores for classes $\beta = [\beta_n]_{n=1}^{N}$ and model evidence $\lambda$ for obtaining the concentration parameter $\alpha = [\alpha_n]_{n=1}^{N}$. Then, it constructs the prior Dirichlet distribution^35 $\mathrm{Dir}(\theta|\alpha)$ for every bounding box.^25 Epistemic uncertainty is subsequently calculated using model ensembles $\theta \sim \mathrm{Dir}(\theta|\alpha)$ as the mutual information between predictions and the model posterior with the following relation:

$$I(y, \theta) = H\!\left(\mathbb{E}_{p(\theta|\alpha)}[P(y|\theta)]\right) - \mathbb{E}_{p(\theta|\alpha)}\!\left(H[P(y|\theta)]\right), \qquad (1)$$

where $I(y, \theta)$ represents the epistemic uncertainty, the first term on the right side of Eq. 1 denotes the total uncertainty, and the second term on the right side of Eq. 1 denotes the aleatoric uncertainty. The symbol $H$ represents Shannon entropy, and $\theta$ is the parameter to estimate the likelihood for probability functions in the Dirichlet and categorical distributions. Then, Evidential Deep Learning (EDL) is applied for fish species recognition.

Figure 2. Hierarchical Uncertainty Aggregation in Active Detection for Fish-species Recognition (AD-FSR)
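For a Dirichlet posterior, both terms of Eq. 1 have closed forms, so the decomposition can be computed directly from the concentration parameters. The sketch below is an illustrative implementation, not the authors' code:

```python
# Eq. (1) for a single bounding box: epistemic uncertainty as the mutual
# information I(y, theta) under Dir(theta|alpha).
import numpy as np
from scipy.special import digamma

def epistemic_uncertainty(alpha):
    """Return total uncertainty minus aleatoric uncertainty."""
    alpha = np.asarray(alpha, dtype=float)
    a0 = alpha.sum()
    p = alpha / a0                         # expected class probabilities
    total = -np.sum(p * np.log(p))         # H[E_{p(theta|alpha)} P(y|theta)]
    # closed form of E_{p(theta|alpha)} H[P(y|theta)] for a Dirichlet
    aleatoric = np.sum(p * (digamma(a0 + 1) - digamma(alpha + 1)))
    return total - aleatoric
```

A box with little evidence (all $\alpha_n$ near 1) yields high epistemic uncertainty, while a box backed by abundant evidence (large $\alpha_n$) yields a value near zero, even when the class distribution itself is uniform.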
The marginal likelihood for bounding box $y_n$ can be calculated with parameter $\theta$ as:

$$p(y_n|\alpha) = \frac{\Gamma\!\left(\sum_{t=1}^{T}\alpha_t\right)}{\prod_{t=1}^{T}\Gamma(\alpha_t)} \cdot \frac{\prod_{t\neq n}\Gamma(\alpha_t)\,\Gamma(\alpha_n+1)}{\Gamma\!\left(\sum_{t=1}^{T}\alpha_t+1\right)} = \frac{\alpha_n}{\sum_{t}\alpha_t}, \qquad (2)$$

where $\Gamma(\alpha_t)$ is the gamma function^36 such that $\Gamma(\alpha_t) = (\alpha_t - 1)!$ for an integer $\alpha_t$.
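The cancellation in Eq. 2 can be verified numerically by evaluating the gamma-function expression term by term and comparing it against $\alpha_n / \sum_t \alpha_t$; a small check:

```python
# Numerical check of Eq. (2): the gamma-function expression reduces to
# alpha_n / sum(alpha).
from math import gamma, prod, isclose

def marginal_likelihood(alpha, n):
    """Evaluate p(y_n | alpha) term by term as written in Eq. (2)."""
    s = sum(alpha)
    normalizer = gamma(s) / prod(gamma(a) for a in alpha)
    numerator = (prod(gamma(a) for t, a in enumerate(alpha) if t != n)
                 * gamma(alpha[n] + 1))
    return normalizer * numerator / gamma(s + 1)

alpha = [2.0, 3.0, 5.0]
assert isclose(marginal_likelihood(alpha, 0), alpha[0] / sum(alpha))  # 0.2
```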
To overcome the instability of EDL arising from its adversarial regularization loss and its use of the ReLU function, the model evidence head (MEH) is used. The model evidence $\lambda$ produced by MEH rescales the concentration parameters with the following relation:

$$\alpha_n = \lambda \, \frac{\exp(\beta_n)}{\sum_{cl} \exp(\beta_{cl})}, \qquad (3)$$

where $\beta_{cl}$ is the confidence for class $cl$. Epistemic uncertainty is then estimated based on the rescaled $\mathrm{Dir}(\theta|\alpha)$, obtained with $\lambda$ and $\alpha$.
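Eq. 3 amounts to a softmax over the class confidences scaled by the model evidence; a minimal sketch, assuming `beta` is the raw confidence vector and `lam` the scalar MEH output:

```python
# Eq. (3): rescale softmax(beta) by the model evidence lambda to obtain
# the Dirichlet concentrations alpha.
import numpy as np

def rescaled_alpha(beta, lam):
    """alpha_n = lam * exp(beta_n) / sum_cl exp(beta_cl)."""
    e = np.exp(beta - np.max(beta))   # numerically stable softmax
    return lam * e / e.sum()
```

By construction $\sum_n \alpha_n = \lambda$, so a larger model evidence concentrates $\mathrm{Dir}(\theta|\alpha)$ and lowers the epistemic uncertainty, consistent with the parenthetical note in Section 4.1.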
3.4 Dataset
The extensive SEAMAPD21^12 dataset of reef fish was utilized for the experiment. In its entirety, the dataset contains 130 unique fish species classes within underwater environments, comprising 28,319 images. Nevertheless, certain species have limited representation, and the model is biased toward species with higher sample frequency per class. Species with lower frequencies pose challenges in achieving the train-test ratio necessary for active learning methods. To mitigate interclass similarity among different fish species and to address the scarcity of samples for rare species, we reduced redundancy from similar species and prioritized species with higher occurrence counts. To achieve this goal, we chose 51 unique fish species classes from the SEAMAPD21 dataset. This fisheries-independent underwater dataset presents challenges for fish detection due to its low resolution, which makes it difficult to differentiate fish from the background. A split ratio of 70/15/15 is employed for the training, validation, and test sets, respectively.
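The class selection and split can be sketched as below; the `(image_id, species)` annotation format and function names are assumptions for illustration, not the SEAMAPD21 file layout:

```python
# Keep the most frequent species classes, then split images 70/15/15.
import random
from collections import Counter

def filter_and_split(annotations, num_classes=51, seed=0):
    # keep the num_classes most frequently occurring species
    counts = Counter(species for _, species in annotations)
    keep = {s for s, _ in counts.most_common(num_classes)}
    images = sorted({img for img, s in annotations if s in keep})
    random.Random(seed).shuffle(images)
    n = len(images)
    n_train, n_val = int(0.70 * n), int(0.15 * n)
    return (images[:n_train],                     # training set
            images[n_train:n_train + n_val],      # validation set
            images[n_train + n_val:])             # test set
```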
4. EXPERIMENTAL RESULTS
4.1 Experimental Settings
We employed the VGG-16^26 backbone network in conjunction with the SSD^27 detection head. Input resolutions of both 300×300 and 512×512 were tested to assess performance across different scales. For AD-FSR on the SEAMAPD21 dataset, 5% of the labeled samples were initially chosen from the training set. Subsequently, in each active learning cycle, 2.5% of the fish images were selected from the remaining unlabeled pool until the labeled pool reached 20% of the training set. Next, the detection results, in terms of mean Average Precision (mAP), were evaluated on the test set. We chose an Intersection over Union (IoU) threshold of 0.50 to calculate the mAP across the 51 fish species classes. To mitigate interclass similarity among various fish species and address the issue of limited samples for rare species, we aimed to minimize redundancy caused by similar species by selecting species with a higher frequency of occurrences. During each active learning cycle, a Stochastic Gradient Descent (SGD) optimizer with a learning rate of 0.001, a λ of 0.5 (epistemic uncertainty decreases for higher values), and a batch size of 64 was employed.
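The labeling schedule above (start at 5%, add 2.5% per cycle, stop at 20%) implies six selection rounds after the initial pool; a small check:

```python
# Enumerate the labeled-pool fractions produced by the active learning
# budget schedule described in the text.
def labeling_schedule(initial=0.05, step=0.025, budget=0.20):
    fractions = [initial]
    while fractions[-1] + step <= budget + 1e-9:
        fractions.append(round(fractions[-1] + step, 4))
    return fractions

print(labeling_schedule())
# [0.05, 0.075, 0.1, 0.125, 0.15, 0.175, 0.2]
```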
Table 2. mAP (%) for AD-FSR on public Pascal VOC for varying resolutions
Figure 3 is a graphical representation of the performance corresponding to the various input resolutions presented in Table 1. It can be observed that the VGG-16 backbone with the SSD head at an input resolution of 512×512 outperforms the VGG-16 with the SSD head at a 300×300 input resolution for all numbers of labeled training samples ranging from 5% to 20%.
Table 3 presents the parameter counts and inference times in frames per second (FPS) for various input resolutions. The detection results on the fish species dataset for varying input resolutions are demonstrated in the first and second columns of Figure 4. The detection maps align with the mean Average Precision (mAP) results provided in Table 1. The first column of Figure 4 shows the qualitative output for AD-FSR with the VGG-16 backbone and SSD300 detection head on SEAMAPD21. Similarly, the second column of Figure 4 demonstrates the qualitative output for AD-FSR with the VGG-16 backbone and SSD512 detection head on SEAMAPD21. The results indicate that the VGG-16 with SSD512 (second column of Figure 4) demonstrates superior detection performance for fish species in underwater environments when compared to the VGG-16 with SSD300 (first column of Figure 4).
Figure 4. Detection images of AD-FSR with the VGG-16 backbone network at varying resolutions: (left column) SSD300 detection head on SEAMAPD21; (right column) SSD512 detection head on SEAMAPD21.
5. CONCLUSIONS
In summary, we employed active detection for fish species recognition (AD-FSR) within the SEAMAPD21
dataset. Utilizing the evidential deep learning framework and model evidence head, this approach proficiently
estimates the epistemic uncertainty associated with a bounding box. Additionally, the hierarchical uncertainty
aggregation offers a novel method for calculating the informativeness of an image. The VGG-16 backbone network with the SSD512 detection head achieved the best detection performance, demonstrating a favorable balance between performance and data efficiency for underwater fish species recognition.
ACKNOWLEDGMENT
This research received support from grants NA16OAR4320199 and NA21OAR4320190 awarded to the Northern
Gulf Institute at Mississippi State University by NOAA’s Office of Oceanic and Atmospheric Research, U.S.
Department of Commerce. The authors acknowledge this funding source with gratitude.
REFERENCES
[1] Chang, C., Fang, W., Jao, R.-C., Shyu, C., and Liao, I.-C., "Development of an intelligent feeding controller for indoor intensive culturing of eel," Aquacultural Engineering 32(2), 343–353 (2005).
[2] Cabreira, A. G., Tripode, M., and Madirolas, A., “Artificial neural networks for fish-species identification,”
ICES Journal of Marine Science 66(6), 1119–1129 (2009).
[3] Prior, J. H., Alaba, S. Y., Wallace, F., Campbell, M. D., Shah, C., Nabi, M. M., Mickle, P. F., Moorhead,
R., and Ball, J. E., “Optimizing and gauging model performance with metrics to integrate with existing
video surveys,” in [OCEANS 2023 - MTS/IEEE U.S. Gulf Coast ], 1–6 (2023).
[4] Yang, X., Zhang, S., Liu, J., Gao, Q., Dong, S., and Zhou, C., “Deep learning for smart fish farming:
applications, opportunities and challenges,” Reviews in Aquaculture 13(1), 66–90 (2021).
[5] Alaba, S. Y., Nabi, M., Shah, C., Prior, J., Campbell, M. D., Wallace, F., Ball, J. E., and Moorhead, R.,
“Class-aware fish species recognition using deep learning for an imbalanced dataset,” Sensors 22(21), 8268
(2022).
[6] Boswell, K. M., Wilson, M. P., and Cowan Jr, J. H., “A semiautomated approach to estimating fish size,
abundance, and behavior from dual-frequency identification sonar (didson) data,” North American Journal
of Fisheries Management 28(3), 799–807 (2008).
[7] Churnside, J. H., Wells, R., Boswell, K. M., Quinlan, J. A., Marchbanks, R. D., McCarty, B. J., and Sutton,
T. T., “Surveying the distribution and abundance of flying fishes and other epipelagics in the northern gulf
of mexico using airborne lidar,” Bulletin of Marine Science 93(2), 591–609 (2017).
[8] Villon, S., Chaumont, M., Subsol, G., Villéger, S., Claverie, T., and Mouillot, D., “Coral reef fish detection
and recognition in underwater videos by supervised machine learning: Comparison between deep learning
and hog+ svm methods,” in [International Conference on Advanced Concepts for Intelligent Vision Systems ],
160–171, Springer (2016).
[9] Sirohey, S., Rosenfeld, A., and Duric, Z., “A method of detecting and tracking irises and eyelids in video,”
Pattern recognition 35(6), 1389–1401 (2002).
[10] Morshed, M., Nabi, M., and Monzur, N., “Frame by frame digital video denoising using multiplicative noise
model,” Int. J. Technol. Enhanc. Emerg. Eng. Res 2, 1–6 (2014).
[11] Gilby, B. L., Olds, A. D., Connolly, R. M., Yabsley, N. A., Maxwell, P. S., Tibbetts, I. R., Schoeman,
D. S., and Schlacher, T. A., “Umbrellas can work under water: Using threatened species as indicator
and management surrogates can improve coastal conservation,” Estuarine, Coastal and Shelf Science 199,
132–140 (2017).
[12] Boulais, O., Alaba, S. Y., Ball, J. E., Campbell, M., Iftekhar, A. T., Moorehead, R., Primrose, J., Prior,
J., Wallace, F., Yu, H., et al., “Seamapd21: a large-scale reef fish dataset for fine-grained categorization,”
The Eighth Workshop on Fine-Grained Visual Categorization (2021).
[13] Zhao, M., Chang, C. H., Xie, W., Xie, Z., and Hu, J., “Cloud shape classification system based on multi-
channel cnn and improved fdm,” IEEE Access 8, 44111–44124 (2020).