Unsupervised Knowledge Transfer For Object Detection in Marine Environmental Monitoring and Exploration
Digital Object Identifier 10.1109/ACCESS.2020.3014441
ABSTRACT The volume of digital image data collected in the field of marine environmental monitoring and
exploration has been growing at a rapidly increasing rate in recent years. Computational support is essential
for the timely evaluation of the high volume of marine imaging data, but often modern techniques such as
deep learning cannot be applied due to the lack of training data. In this article, we present Unsupervised
Knowledge Transfer (UnKnoT), a new method to use the limited amount of training data more efficiently.
In order to avoid time-consuming annotation, it employs a technique we call ‘‘scale transfer’’ and enhanced
data augmentation to reuse existing training data for object detection of the same object classes in new image
datasets. We introduce four fully annotated marine image datasets acquired in the same geographical area
but with different gear and distance to the sea floor. We evaluate the new method on the four datasets and
show that it can greatly improve the object detection performance in the relevant cases compared to object
detection without knowledge transfer. We conclude with a recommendation for an image acquisition and
annotation scheme that ensures a good applicability of modern machine learning methods in the field of
marine environmental monitoring and exploration.
INDEX TERMS Object detection, knowledge transfer, deep learning, marine environmental monitoring,
image annotation.
Considering the lack of annotated training data and the high cost of time-consuming manual image annotation, it is desirable to have a computer vision system for automated or assisted image annotation that does not require extensive retraining for each new dataset. Such a computer vision system must be able to adapt to the changes between datasets as described above, where OOI of the same class may differ in their visual appearance. Assisted by such a system, marine scientists would only have to annotate one dataset, and the time required for object detection, which is the most time-consuming part of manual image annotation [6], would be greatly reduced for the remaining datasets of the same geographical area. In this way, the knowledge consisting of images and annotations that were collected previously can be transferred and is not lost.

Knowledge transfer in the context of marine environmental monitoring and exploration has previously been presented by Skaldebø et al. [29], who attempt to transfer the knowledge obtained in a simulated underwater environment to the real environment. First, artificial 3D-rendered images are created, showing scenes similar to the real images. Then CycleGAN [30] is used to make the artificial images look more realistic. Walker et al. [31] use physics-based color correction and scale normalization on underwater images to reduce the generalization error of a DeepLabV3+ model [32] for image segmentation. Similarly, Yamada et al. [33] use color correction and image rescaling to enhance their method for unsupervised feature learning of georeferenced sea floor images. All of these methods are applied to a single dataset and are not used for knowledge transfer to enable cross-dataset machine learning.

In this article we present Unsupervised Knowledge Transfer (UnKnoT), a new method for object detection in marine environmental monitoring and exploration. The method employs a technique we call "scale transfer" and enhanced data augmentation to adapt one image dataset to the visual properties of another image dataset and to reuse existing image annotations for object detection. To the best of our knowledge, UnKnoT is the first method that addresses the reuse of existing image annotations for cross-dataset machine learning in marine environmental monitoring and exploration. To evaluate the method, we introduce four fully annotated marine image datasets collected in the same geographical area but with varying gear and distance to the sea floor. Our experiments show that UnKnoT can greatly improve the object detection performance in the relevant cases compared to object detection without knowledge transfer. In combination with the existing MAIA method, UnKnoT can be used instead of novelty detection in Stage I of MAIA to generate more accurate suggestions for OOI if the images of the datasets show the same classes of OOI. Taking this into account, we conclude with a recommendation for an image acquisition and annotation scheme that ensures a good applicability of modern machine learning methods in the field of marine environmental monitoring and exploration.

In the following section, the UnKnoT method is presented in detail, describing the individual steps for scale transfer, data augmentation and object detection (see Section II). Code has been made available with this publication and can be accessed on GitHub (https://github.com/BiodataMiningGroup/unknot). The experimental setup that was used to evaluate the method is presented in Section III, including the four datasets, referred to as S083, S155, S171 and S233. The datasets have been made available with this publication [34]-[37] and can be visually explored in BIIGLE 2.0 (https://biigle.de/projects/237, login: [email protected], password: UnKnoTpaper). The experimental results are summarized in Section IV and discussed in Section V. The manuscript ends with a conclusion about the relevance of our results and the UnKnoT method for marine image annotation.

II. METHODS
In the UnKnoT approach, knowledge is represented by a source dataset D^s and annotations that were manually created by domain experts. The knowledge is transferred by transforming D^s and the annotations so that a deep learning model can be trained to perform object detection on a target dataset D^t which has not been annotated. The entire process consists of three consecutive steps which are described in detail in the following sections (see also Fig. 2). In the first step, scale transfer is applied to the images I^s of the annotated source dataset D^s. This transforms the visible OOI in I^s to a scale similar to that of the OOI in the images of the target dataset D^t. A set of annotation patches A^{s→t} is extracted from the scale-transferred images, where each annotation patch is a cropped image centered on an annotated OOI. In the second step, enhanced data augmentation is applied to increase the size and variety of the set of annotation patches A^{s→t}, resulting in the set of augmented annotation patches A^{s⇒t}. In the final step, the set A^{s⇒t} is used to train a Mask R-CNN model [38] which is subsequently applied for object detection on the target dataset D^t.

A. SCALE TRANSFER
On most deployments of an AUV or OFOS, the observation platform moves at a fixed distance to the sea floor. This ensures an almost stable scale and illumination of OOI in the images that are captured during the same deployment. The distance to the sea floor may vary between two deployments, though. An OFOS is usually operated much closer to the sea floor than an AUV, and even the same observation platform can be operated at different target distances on different deployments. This can result in highly different scales for the same classes of OOI in different image datasets (see Fig. 4). Fully convolutional neural networks for instance segmentation or object detection such as Mask R-CNN [38] are usually scale-invariant because they are trained on large image datasets in which many scales of objects of the same class occur. In this context, however, the scales of OOI of the same class and dataset have a very low variance owing to the fixed distance of the observation platform to the sea floor. In addition, the datasets usually have a much lower total number of annotations than in other scenarios.
FIGURE 2. The UnKnoT method. (1) Scale transfer: Images from an annotated source dataset D^s are transformed to the set of scale-transferred images I^{s→t} (a) and the set of annotation patches A^{s→t} is extracted (b). (2) Data augmentation: The size and variety of the annotation patches A^{s→t} is increased through data augmentation, resulting in the set of augmented annotation patches A^{s⇒t} (c). (3) Object detection: A Mask R-CNN model is trained on A^{s⇒t} and applied to the images of D^t to produce the final object detections (d).
To mitigate the scale shift between different datasets, scale transfer transforms the images I^s of an annotated source dataset D^s to make the OOI appear at a scale similar to that of the OOI in the images I^t of the target dataset D^t. The source dataset D^s = {(I_i^s, d_i^s)} and the target dataset D^t = {(I_i^t, d_i^t)} consist of tuples of an image I_i and the distance d_i of the observation platform to the sea floor when the image was captured. The average distance to the sea floor of the target dataset is denoted as \bar{d}^t:

\bar{d}^t = \frac{1}{|I^t|} \sum_{i=1}^{|I^t|} d_i^t \qquad (1)

Each image I_i^s ∈ I^s has a width of w_i and a height of h_i pixels. To apply scale transfer to an image I_i^s, the scale transfer factor d_i^{s→t} is calculated first as defined in (2). Next, each image I_i^s is scaled to the width w_i^{s→t} and height h_i^{s→t} as defined in (3) and (4), respectively, resulting in the set of scale-transferred images I^{s→t} (see Fig. 2a). A three-lobe Lanczos kernel is applied for downscaling (i.e. d_i^{s→t} < 1) and a cubic filter is applied for upscaling (i.e. d_i^{s→t} > 1), which are the recommended methods of the VIPS image processing library [39].

d_i^{s→t} = \frac{d_i^s}{\bar{d}^t} \qquad (2)

w_i^{s→t} = w_i \cdot d_i^{s→t} \qquad (3)

h_i^{s→t} = h_i \cdot d_i^{s→t} \qquad (4)

From each image in I^{s→t} the annotated OOI are extracted as 512 × 512 pixel crops which form the set of annotation patches A^{s→t} (see Fig. 2b). The annotation patches are passed to the next step for data augmentation.
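For illustration, the scale transfer step can be sketched in a few lines of Python. The listing below is a minimal sketch and not the reference implementation from the UnKnoT repository: Pillow is used as a stand-in for the VIPS library mentioned above (with Pillow's LANCZOS and BICUBIC filters in place of the three-lobe Lanczos and cubic kernels), and the file name, camera distances and annotation coordinates are example values.

```python
# Minimal sketch of scale transfer (Eqs. 1-4) and patch extraction.
from PIL import Image


def scale_transfer_factor(d_source, d_target_mean):
    # Eq. (2): d_i^{s->t} = d_i^s / mean(d^t)
    return d_source / d_target_mean


def apply_scale_transfer(image, factor):
    # Eqs. (3) and (4): rescale width and height by the scale transfer factor,
    # Lanczos for downscaling and a cubic filter for upscaling.
    w, h = image.size
    new_size = (max(1, round(w * factor)), max(1, round(h * factor)))
    resample = Image.LANCZOS if factor < 1 else Image.BICUBIC
    return image.resize(new_size, resample=resample)


def extract_patch(image, cx, cy, size=512):
    # Cut a size x size crop centered on a (rescaled) annotation position.
    half = size // 2
    return image.crop((cx - half, cy - half, cx + half, cy + half))


# Example values: source image captured at 1.7 m, target dataset with an
# average camera distance of 3.4 m (hypothetical numbers for illustration).
factor = scale_transfer_factor(d_source=1.7, d_target_mean=3.4)
image = Image.open("source_image.jpg")
scaled = apply_scale_transfer(image, factor)
# The annotation center must be rescaled by the same factor before cropping.
patch = extract_patch(scaled, cx=round(1520 * factor), cy=round(830 * factor))
```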
B. DATA AUGMENTATION
Data augmentation is often used to increase the size and variety of the data that is available to train a machine learning model. This can often improve the performance of the trained model [40], [41]. In the context of computer vision tasks such as object detection or classification, common augmentation methods include operations such as horizontal or vertical flipping, rotation or blurring of images. Viable augmentation operations highly depend on the visual domain of the image datasets (e.g. vertical flipping makes sense for the image of a football but not for a face).

In case of images of the sea floor captured with an AUV or OFOS, augmentation operations such as flipping, rotation or blurring can be applied. The OOI in the images are mostly living organisms with a symmetric shape, which makes the flipping operations viable. In addition, the OOI in the images are photographed from the top, so they can occur at any rotation angle. Different camera properties, motion of the observation platform or optical distortion by the water column can introduce varying degrees of blur. An object detection model that was trained partially with blurred images through data augmentation can be more robust in these cases.

In case of UnKnoT, the machine learning model is trained with images of one dataset and applied to images of another dataset. The images can be captured with different observation platforms and different cameras, and are usually available as JPEG files. Different camera and storage settings can produce JPEG files with a varying degree of compression, which can introduce characteristic compression artifacts in the images. We propose to use artificial JPEG compression as an augmentation operation to make an object detection model more robust for the application on different datasets.
In UnKnoT, data augmentation is applied to the annotation patches A^{s→t} at each step during training of the Mask R-CNN model (see the following section). For each step, a random selection of zero to all of the following augmentation operations is applied: horizontal flipping, vertical flipping, rotation by 90, 180 or 270 degrees, Gaussian blur with a random standard deviation σ ∈ [1.0, 2.0] and artificial JPEG compression with a random compression factor c ∈ [25, 50]. The set of augmented annotation patches is denoted as A^{s⇒t} (see Fig. 2c).
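The following listing sketches this augmentation policy. It is an illustration rather than the implementation used in the experiments; in particular, the mapping of the compression factor c to Pillow's JPEG quality setting is an assumption, and in Mask R-CNN training the geometric operations would also have to be applied to the corresponding instance masks.

```python
# Sketch of the enhanced augmentation: a random subset of flips, 90-degree
# rotations, Gaussian blur and artificial JPEG compression per training step.
import io
import random

from PIL import Image, ImageFilter


def jpeg_compress(patch, quality):
    # Re-encode the patch as JPEG in memory to introduce compression artifacts.
    buffer = io.BytesIO()
    patch.convert("RGB").save(buffer, format="JPEG", quality=quality)
    out = Image.open(buffer)
    out.load()
    return out


def augment_patch(patch):
    ops = []
    if random.random() < 0.5:                                       # horizontal flip
        ops.append(lambda im: im.transpose(Image.FLIP_LEFT_RIGHT))
    if random.random() < 0.5:                                        # vertical flip
        ops.append(lambda im: im.transpose(Image.FLIP_TOP_BOTTOM))
    if random.random() < 0.5:                                        # rotation
        angle = random.choice([90, 180, 270])
        ops.append(lambda im: im.rotate(angle))
    if random.random() < 0.5:                                        # Gaussian blur
        sigma = random.uniform(1.0, 2.0)
        ops.append(lambda im: im.filter(ImageFilter.GaussianBlur(sigma)))
    if random.random() < 0.5:                                        # JPEG compression
        c = random.randint(25, 50)
        # Assumed mapping of the compression factor c to a JPEG quality value.
        ops.append(lambda im: jpeg_compress(im, quality=100 - c))
    for op in ops:
        patch = op(patch)
    return patch
```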
C. OBJECT DETECTION
Object detection is performed in a similar way as in Stage III of the MAIA method [14], which has been shown to be effective in this particular context, with a few differences that are described in the following. In Stage III of MAIA, a Mask R-CNN model [42] is trained on an augmented set of training samples using pre-trained weights of the COCO dataset [16]. The trained model is applied to an image collection for the segmentation of "interesting" pixel regions in the images, which are subsequently converted into circle annotations. In UnKnoT, the Mask R-CNN model is trained using the set A^{s⇒t} of augmented annotation patches, as well as the pre-trained weights of the COCO dataset. The data augmentation used in Stage III of MAIA is replaced by the enhanced data augmentation described in the previous section. In contrast to the training configuration of MAIA and [42], a value of 0.85 is used for the RPN_NMS_THRESHOLD, which increases the number of region proposals during training. In this context, a higher number of region proposals during training is beneficial for the detection of very small objects in the presence of very large and salient objects in the same image. In addition, a stepped learning rate decay is used to improve the convergence of the object detection performance across experiment replicates. For the stepped learning rate decay, the heads layers are trained for 10 epochs each with a learning rate of 10^{-3}, 5·10^{-4} and 10^{-4}, and all layers for another 10 epochs each with a learning rate of 10^{-4}, 5·10^{-5} and 10^{-5}, resulting in a total of 60 training epochs compared to the 30 epochs of the training configuration of MAIA. One epoch consists of 400 steps and in each step, a batch of five images is processed. Training took about five hours per dataset on a single NVIDIA Tesla V100. Inference is performed on the images I^t of the target dataset in the same way as in Stage III of MAIA [14] (see Fig. 3). The final result is a set of circle annotations enclosing potential OOI in I^t (see Fig. 2d).
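A condensed sketch of this training configuration, based on the matterport Mask R-CNN implementation [42], is shown below. The dataset objects, the weights file name and the assumption of a single "interesting" foreground class (as in Stage III of MAIA) are illustrative and may differ from the exact UnKnoT code.

```python
# Sketch of the Mask R-CNN training configuration described above.
from mrcnn.config import Config
from mrcnn import model as modellib


class UnKnoTConfig(Config):
    NAME = "unknot"
    NUM_CLASSES = 1 + 1            # background + one "interesting" class (assumed)
    IMAGES_PER_GPU = 5             # batch of five images per step
    STEPS_PER_EPOCH = 400          # 400 steps per epoch
    RPN_NMS_THRESHOLD = 0.85       # more region proposals during training


# train_set and val_set are mrcnn.utils.Dataset objects prepared from the
# augmented annotation patches A^{s=>t} (preparation omitted here).
config = UnKnoTConfig()
model = modellib.MaskRCNN(mode="training", config=config, model_dir="logs")
model.load_weights("mask_rcnn_coco.h5", by_name=True,
                   exclude=["mrcnn_class_logits", "mrcnn_bbox_fc",
                            "mrcnn_bbox", "mrcnn_mask"])

# Stepped learning rate decay: matterport's train() interprets "epochs" as the
# absolute epoch to train up to, so this schedule yields 60 epochs in total.
schedule = [("heads", 1e-3, 10), ("heads", 5e-4, 20), ("heads", 1e-4, 30),
            ("all", 1e-4, 40), ("all", 5e-5, 50), ("all", 1e-5, 60)]
for layers, lr, until_epoch in schedule:
    model.train(train_set, val_set, learning_rate=lr,
                epochs=until_epoch, layers=layers)
```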
III. EXPERIMENTAL SETUP
Four fully annotated image datasets were created to evaluate the UnKnoT method. The datasets were captured in the same geographical area, showing the same classes of OOI, but with different observation platforms and distances to the sea floor. In addition, a new metric to measure the effectiveness of UnKnoT was created, which accounts for the desired properties of an object detection method for marine image annotation. The method was tested in comprehensive experiments on different combinations of datasets to evaluate the effectiveness of scale transfer and enhanced data augmentation for unsupervised knowledge transfer.

A. DATASETS
The four image datasets used to evaluate UnKnoT are referred to as S083, S155, S171 and S233. Each dataset consists of 550 randomly selected images from the image collections [22] (S083), [23] (S155), [24] (S171) and [25] (S233). The image collections were acquired during the 2015 cruises SO242/1 and SO242/2 of research vessel SONNE at the Peru Basin Disturbance and Colonization (DISCOL) area [43]. The images of the different datasets were captured using different observation platforms (OFOS and AUV) as well as different average distances to the sea floor (see Table 1).

TABLE 1. Properties of the four datasets that were used to evaluate UnKnoT with the observation platform, average distance and standard deviation of the camera to the sea floor, and the number of images and annotations in the train and test splits.

The image annotations are based on a subset of ten morphological classes of the fauna identification guide presented in [28] (see Fig. 4). The images were annotated in BIIGLE 2.0 [5] using MAIA [14] with an additional review using the Lawnmower tool to annotate OOI that were missed by MAIA. In total, the datasets contain 10,784 manual annotations on 2,200 images. Compared to datasets of other research areas such as the detection of everyday objects, the datasets presented here may seem rather small. However, the acquisition of annotations in marine images is much more costly, as it requires more training and background knowledge in marine biology. This makes it infeasible to generate datasets as large as e.g. COCO [16] to evaluate machine learning methods in this research area.

The datasets S083, S155, S171 and S233 have been made available with this publication [34]-[37]. Example images with annotations can be found in the supplementary material.

B. EVALUATION METRIC
A common metric to evaluate the performance of an object detection method is the mean average precision [44]. In this context, object detections are produced based on the segmentation output of Mask R-CNN as described in [14] (see Fig. 3). This allows only the calculation of the "recall" (i.e. the percentage of OOI that were detected) and the "precision" (i.e. the percentage of correct detections in the final result) but does not allow a ranking of the detections, so the mean average precision is not applicable. Another metric, which is the harmonic mean of the recall and precision, is the F_1-Score [45].
FIGURE 3. Example for inference with Mask R-CNN and the final object detection on a subsection of image TIMER_2015_09_04at09_13_31IMG_0864.jpg of the S155 dataset. The image (a) is processed by Mask R-CNN which returns a segmentation mask for "interesting" pixels (b). The regions of interesting pixels are converted to circle annotations for the final detections (c). The full image with manual annotations can be found in the supplementary material.
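The conversion from a segmentation mask to circle annotations, as shown in Fig. 3, can be sketched as follows. This illustrates the principle only; the use of scipy.ndimage for connected-component labeling and the choice of the radius as the distance from the region centroid to its farthest pixel are assumptions, not necessarily the exact procedure of MAIA or UnKnoT.

```python
# Sketch: convert a binary "interesting pixels" mask into circle annotations
# (center x, center y, radius), one circle per connected region.
import numpy as np
from scipy import ndimage


def mask_to_circles(mask):
    labeled, num_regions = ndimage.label(mask)
    circles = []
    for region_id in range(1, num_regions + 1):
        ys, xs = np.nonzero(labeled == region_id)
        cx, cy = xs.mean(), ys.mean()
        # Radius: distance from the centroid to the farthest pixel of the region.
        radius = np.sqrt((xs - cx) ** 2 + (ys - cy) ** 2).max()
        circles.append((cx, cy, radius))
    return circles
```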
A variant of the F_1-Score is the F_2-Score, which puts a higher weight on the recall and which has been used in a similar context to evaluate the object detection performance of the MAIA method [14]:

F_2(recall, precision) = \frac{5 \cdot precision \cdot recall}{(4 \cdot precision) + recall} \qquad (5)

In case of an object detection method for images in marine environmental monitoring and exploration, a minimum of 80% for the recall and 10% for the precision can be considered acceptable [14]. The F_2-Score does not take this into account. For example, it is possible to achieve a higher F_2-Score based on a precision of 20% and a recall of 70% than an F_2-Score based on a precision of 10% and a recall of 80%. In this context, the latter result would be more desirable and should yield a higher score in the evaluation.

As a consequence, we do not apply the F_2-Score as the primary metric to evaluate UnKnoT. Instead, we propose the "Logistic Score" (L-Score) as a new metric which is better suited to evaluate marine object detection with regard to a minimum recall of 80% and a minimum precision of 10%. The L-Score is the harmonic mean of the two logistic functions L_r to assess the recall and L_p to assess the precision (see (6), (7) and (8)). L_r is centered on the value of 80% recall with a growth rate that yields L_r(1) ≈ 1 (see Fig. 5a) and L_p is centered on the value of 10% precision with a growth rate that yields L_p(0) ≈ 0 (see Fig. 5b). The L-Score produces high scores for a recall close to or greater than 80% and a precision close to or greater than 10%, and low scores otherwise (see Fig. 5c).

L_r(recall) = \frac{1}{1 + e^{-0.25 \cdot (100 \cdot recall - 80)}} \qquad (6)

L_p(precision) = \frac{1}{1 + e^{-0.5 \cdot (100 \cdot precision - 10)}} \qquad (7)

L(recall, precision) = \frac{2 \cdot L_r(recall) \cdot L_p(precision)}{L_r(recall) + L_p(precision)} \qquad (8)
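A direct implementation of both metrics makes this difference concrete. The following sketch reproduces the example above; recall and precision are given as fractions in [0, 1].

```python
# Sketch of the F2-Score (Eq. 5) and the proposed L-Score (Eqs. 6-8).
import math


def f2_score(recall, precision):
    return 5 * precision * recall / (4 * precision + recall)


def l_score(recall, precision):
    l_r = 1 / (1 + math.exp(-0.25 * (100 * recall - 80)))      # Eq. (6)
    l_p = 1 / (1 + math.exp(-0.5 * (100 * precision - 10)))    # Eq. (7)
    return 2 * l_r * l_p / (l_r + l_p)                          # Eq. (8)


# The case from the text: the F2-Score prefers 70% recall at 20% precision ...
print(f2_score(0.7, 0.2), f2_score(0.8, 0.1))   # approx. 0.47 vs. 0.33
# ... while the L-Score prefers 80% recall at 10% precision.
print(l_score(0.7, 0.2), l_score(0.8, 0.1))     # approx. 0.14 vs. 0.50
```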
C. EXPERIMENTS
To evaluate the UnKnoT method, each of the four datasets was separated into train and test splits. The test splits consist of the images I_test that contain ≈ 10% of the annotations of the dataset (A_test). The train splits consist of the remaining images (I_train) and annotations (A_train) of the respective dataset (see Table 1). For evaluation, UnKnoT was applied to a given train split as source dataset D^s and a given test split as target dataset D^t.

All combinations of using two datasets as D^s and D^t were evaluated in experiments using the following methods for comparison: UnKnoT (E_{sc,tr,au}^{D^s→D^t}), UnKnoT without enhanced training compared to MAIA (E_{sc,au}^{D^s→D^t}), UnKnoT without enhanced image augmentation (E_{sc,tr}^{D^s→D^t}), UnKnoT only with scale transfer (E_{sc}^{D^s→D^t}) and the baseline configuration of the MAIA object detection stage without any knowledge transfer (E^{D^s→D^t}). The subscripts "sc", "tr" and "au" refer to scale transfer, enhanced training and enhanced image augmentation, respectively. For all experiments except E^{D^s→D^t}, the combinations D^s = D^t were not evaluated as this would mean knowledge transfer within the same dataset. Each experiment was repeated three times and the average L-Score was calculated as the final performance. We denote the average resulting L-Scores of the experiments E_{sc,tr,au}^{D^s→D^t} and E^{D^s→D^t} as L_{sc,tr,au}^{D^s→D^t} and L^{D^s→D^t}, respectively.

IV. RESULTS
The effect of scale transfer that is applied with UnKnoT can be seen in Fig. 6. In case of source dataset S083, scale transfer magnifies the OOI with a factor of d_i^{s→t} > 1 (see Fig. 6, first row). In contrast to that, the size of the OOI of the source datasets S155 and S171 is reduced with a factor of d_i^{s→t} < 1 during scale transfer, with the exception of A^{S155→S171} where the size is marginally increased (see Fig. 6, second and third row). In case of S233 as source dataset, the size of the OOI is increased with S155 and S171 as target datasets and decreased with S083 as target dataset (see Fig. 6, fourth row).

Table 2 shows the average resulting L-Scores of the experiments E^{D^s→D^t} without knowledge transfer and E_{sc,tr,au}^{D^s→D^t} with knowledge transfer. The experiments E^{D^s→D^t} without knowledge transfer show the highest scores for the cases D^s = D^t, where the images of source and target come from the same dataset. The experiments E^{S155→S171} and E^{S171→S155} show almost identical scores to the experiments with D^s = D^t of these datasets. The experiments E^{D^s→S083} with D^s ≠ S083 as well as E^{S083→S155} and E^{S083→S171} show a score close to 0.
FIGURE 5. The harmonic mean of the two logistic functions L_r (a) and L_p (b) forms the L-Score (c).

FIGURE 4. Examples for the ten classes of OOI (rows) of each of the four image datasets (columns) that were used to evaluate UnKnoT. Scales of OOI can vary drastically between different datasets.

FIGURE 6. Annotation patches of the "sea cucumber" class without scale transfer (dashed outline on the main diagonal) compared to scale-transferred annotation patches of A^{s→t}. The rows denote the source dataset and the columns denote the target dataset (e.g. the patch in the first row and second column is from A^{S083→S155}). Annotation patches produced with a scale transfer factor of d^{s→t} > 1 are marked with a + and patches produced with a scale transfer factor of d^{s→t} < 1 are marked with a −.

Eight of the twelve experiments E_{sc,tr,au}^{D^s→D^t} with knowledge transfer show higher scores than the experiments E^{D^s→D^t} with the same combination of datasets. The L-Scores are increased by an average of 0.32. However, further inspection of the output of Mask R-CNN reveals invalid segmentation results for the experiments E_{sc,tr,au}^{S083→S155} and E_{sc,tr,au}^{S083→S171}. In these experiments, the segmentations produced by Mask R-CNN show only crude region proposal boxes instead of the refined regions of a valid segmentation (see Fig. 7). Similarly, the segmentation results for the experiment E_{sc,tr,au}^{S083→S233} are not as refined as desired. The score of E_{sc,tr,au}^{S233→S155} is decreased when compared to object detection without knowledge transfer, whereas the score of E_{sc,tr,au}^{S233→S171} shows one of the highest valid increases.
TABLE 2. Average resulting L-Score of the experiments without knowledge transfer (L^{D^s→D^t}), with knowledge transfer (L_{sc,tr,au}^{D^s→D^t}) and the average increase of the L-Score through knowledge transfer. Experiments based on a scale transfer factor of d_i^{s→t} < 0.9 are highlighted.

TABLE 3. Standard deviation of the area of the circle annotations per class and dataset, and the average over all datasets, given as multiples of the average annotation area of the respective class. Rows with an average standard deviation of less than 1.5 are highlighted.

TABLE 5. Average resulting L-Score, recall and precision of all experiments with a scale transfer factor of d_i^{s→t} < 0.9.

When the detection is limited to the subset of OOI classes that have an average intra-class area standard deviation of less than 1.5 times their average annotation area ("Coral", "Crustacean", "Ipnops fish" and "Ophiuroid", see Table 3), the L-Scores of both experiments converge to 0.58 ± 0.10 (S233 → S155) and 0.86 ± 0.03 (S233 → S171) but are still not equal. All these experiments are exclusively the cases where scale transfer was applied with a factor of d_i^{s→t} > 1. Among the remaining experiments only E_{sc,tr,au}^{S171→S155} shows a slightly decreased L-Score compared to object detection without knowledge transfer. In this experiment, a scale transfer factor of 0.9 < d_i^{s→t} < 1 was applied. The average increase of the L-Scores of the remaining experiments, where a scale transfer factor of d_i^{s→t} < 0.9 was applied, is 0.58.

The detailed results of all experiments including L-Score, recall and precision are presented in Tables 4 and 5.

V. DISCUSSION
The UnKnoT method applies knowledge transfer from a source dataset D^s with existing annotations to a target dataset D^t for object detection. The knowledge transfer consists of scale transfer, which adapts the scales of OOI in the source dataset D^s to the scales of OOI in the target dataset D^t, and of enhanced data augmentation for typical images of the sea floor.
Fig. 6 shows that the scale transfer effectively transforms the scale of OOI of the source dataset to the scale of OOI of the target dataset. First we will review the results obtained for experiments with a scale transfer factor of d_i^{s→t} > 1 (see patches marked with + in Fig. 6). In this scenario, the images of the source dataset have been transformed by upscaling, as the OOI of the target dataset are shown larger and more detailed. In a real setting, the images of the target dataset would have been captured by an AUV or OFOS closer to the sea floor than in the previous dives. In case of S083 as source dataset, the OOI are transformed to a scale that matches the scale of the OOI in the target dataset. However, the scaling blurs the OOI and they do not appear as sharply in focus as the OOI in the target datasets. The results are similar but not as pronounced in case of S233 as target dataset. In the opposite scenario, where the images of the target dataset would have been captured further away from the sea floor than in the previous dives, the images of the source dataset are transformed by downscaling with a scale transfer factor of d_i^{s→t} < 1 (see patches marked with − in Fig. 6). In case of S083 as the target dataset, the scale of the transformed OOI matches the scale of the OOI of the target dataset and the OOI appear in focus. Considering only the visual appearance of the OOI, UnKnoT works more effectively if the source dataset was captured closer to the sea floor than the target dataset and the scale of the annotated OOI is reduced during knowledge transfer. This observation is confirmed by the experimental results.

The experiments E^{D^s→D^t} with D^s = D^t, where the images of source and target come from the same dataset, show the highest average L-Scores. This is to be expected, as Mask R-CNN is trained with OOI that appear most similar to the OOI that should be detected. These experiments can be seen as a baseline with the best possible object detection performance in this context. Notably, the experiments E^{S171→S155} and E^{S155→S171} show a score almost equal to E^{S155→S155} and E^{S171→S171}, respectively. Although these datasets differ in the distribution of annotations in the images (cf. |I_test| and |A_test| in Table 1), both datasets were captured at a similar distance to the sea floor with an OFOS. When Mask R-CNN is trained on one dataset and applied to the other, no knowledge transfer is required to achieve a very good object detection performance. Other notable results are given by the experiments E^{D^s→S083} with D^s ≠ S083, as well as E^{S083→S155} and E^{S083→S171}, which show a score close to 0. Such a low L-Score is produced if either the recall is bad (i.e. < 80%), the precision is bad (i.e. < 10%) or both. In case of the three experiments E^{D^s→S083} with D^s ≠ S083, the low average recall of 47% is the cause for the low L-Score (see Table 4). Trained with OOI at a much larger scale, Mask R-CNN is unable to achieve an adequate recall in these cases. For the experiments E^{S083→S155} and E^{S083→S171}, the low average precision of 5% causes the low L-Score (see Table 4). Again, the high difference in the scale of OOI is the cause for the bad object detection performance.

The experiments E_{sc,tr,au}^{D^s→D^t} can be separated into the same two scenarios as the annotation patches of Fig. 6 mentioned above, where the scale transfer is only effective in the cases where a scale transfer factor of d_i^{s→t} < 1 is applied.

The experiments where the source dataset D^s has a higher average distance to the sea floor than the target dataset D^t belong to the first scenario. Even though UnKnoT produces an improved object detection performance in some of these cases, the segmentation results of Mask R-CNN are invalid (i.e. not as refined as desired) or the object detection performance is highly affected by the intra-class area standard deviation of the annotations. An invalid segmentation (as can be seen in Fig. 7) can be the result of OOI that were highly distorted by a large scale transfer factor d_i^{s→t} >> 1, so the trained Mask R-CNN model cannot produce a meaningful segmentation for the target dataset. Although the datasets S155 and S171 are very similar in terms of the average distance of the camera to the sea floor, they show very different L-Scores in the experiments E_{sc,tr,au}^{S233→S155} and E_{sc,tr,au}^{S233→S171}. A closer look at the intra-class area standard deviations of the annotations reveals that the compositions of annotations of some classes differ between these datasets (see Table 3). A high intra-class area standard deviation can be amplified by scale transfer and can potentially result in unrealistically large OOI in the annotation patches of the source dataset. A limited amount of training samples per class and an equally high intra-class standard deviation in the target dataset can lead to highly different object detection performances, even if the source datasets were captured at a similar average distance to the sea floor. When limited only to classes that show an average intra-class area standard deviation of less than 1.5 times their average annotation area (see Table 3), the L-Scores produced by the experiments converge, but are still not equally high. This confirms the observation that UnKnoT is not well suited for cases where the source dataset was captured at a higher distance to the sea floor than the target dataset.

The experiments where the source dataset D^s has a lower average distance to the sea floor than the target dataset D^t belong to the second scenario. Among these cases only E_{sc,tr,au}^{S171→S155}, in which a scale transfer factor of 0.9 < d_i^{s→t} < 1 was applied, shows a slightly decreased L-Score compared to object detection without knowledge transfer. This highlights a drawback of the proposed L-Score, as only small changes in the precision and/or recall can cause high differences in the L-Score if the score is already high. In case of E_{sc,tr,au}^{S171→S155}, the lower L-Score is produced by a slightly lower precision of 12% compared to 16% of E^{S171→S155}, and an actually slightly higher recall of 90% compared to 88% of E^{S171→S155} (see Table 4). Still, even if UnKnoT does not have a negative impact on the object detection performance in this case, it does not improve the performance either. Hence, we only denote the experiments in which a scale transfer factor of d_i^{s→t} < 0.9 was applied as "relevant".
as ‘‘relevant’’. These are the cases with a sufficiently large on a large scale using observation platforms such as
difference in the average distance of the camera to the sea AUVs. Following Step 1, the preferred target distance
floor. On average, UnKnoT improves the object detection to the sea floor should be 3.4 m. At this distance,
performance by an L-Score of 0.58 (189%) compared to the images cover a larger area than the images of Step 1,
object detection without knowledge transfer in these cases. potentially containing more OOI (cf. |Atrain | in Table 1).
s
Where the experiments E D →S083 with Ds 6 = S083 produced 3) UnKnoT should be used for object detection with the
a bad average recall of 47%, UnKnoT improves the average annotated dataset of Step 1 as source dataset and each
recall to 86% (see Table 4). Notably, the improved object of the datasets acquired in Step 2 as target dataset.
detection performance is highest for S233 → S083 compared 4) MAIA [14] should be used for the final image annota-
to S155 → S083 and S171 → S083. Also, the object tion of each of the datasets acquired in Step 2, by using
detection performance is improved to a similarly high level the object detection results of Step 3 as training pro-
for S155 → S233 and S171 → S233, in case of Esc,tr,au S171→S233 posals. The object detection results of Step 3 replace
even surpassing the baseline average L-Score of E S233→S233 . the results of the novelty detection stage of MAIA and
These results indicate that UnKnoT produces a better object ensure a highly specialized Mask R-CNN model for
detection performance with a source dataset that was captured each individual dataset in the instance segmentation
at an average distance to the sea floor that is roughly half the stage.
average distance of the target dataset. This image acquisition and annotation scheme can be an
Considering only the relevant experiments, scale transfer efficient way to produce large volumes of high-quality image
accounts for most of the improvements in the object detection annotations in typical scenarios of the field of marine envi-
performance. The additional enhanced training configuration ronmental monitoring and exploration.
of Mask R-CNN and the data augmentation improve the In summary, we presented UnKnoT, a new method for
object detection performance even further (see Table 5). unsupervised knowledge transfer that allows the reuse of
existing knowledge in the form of image annotations for
VI. CONCLUSION object detection in new marine image datasets that show sim-
Based on the observations and experimental results we draw ilar OOI. In addition, we presented the L-Score, a metric that
the following conclusions: If the annotated source dataset and is better suited to evaluate the object detection performance
the target dataset are very similar in terms of average distance in this particular context. We evaluated the effectiveness of
to the sea floor and observation platform, no knowledge trans- UnKnoT with four fully annotated image datasets, compris-
fer is required to achieve a good object detection performance ing a total of 10,784 annotations on 2,200 images captured
with a machine learning model such as Mask R-CNN. If the in the same geographical area at different distances to the
annotated source dataset was captured at roughly half the sea floor. Our experimental results have shown that UnKnoT
distance to the sea floor than the target dataset, UnKnoT greatly improves the object detection performance compared
can be used to greatly improve the object detection perfor- to object detection without knowledge transfer in the relevant
mance in an unsupervised way. As the discrepancy in average cases. Based on these results, we conclude by recommend-
distances to the sea floor increases, the increase in object ing a four-step image acquisition and annotation scheme for
detection performance by UnKnoT decreases, but the final future studies, which can be an efficient way to produce large
object detection is still much better than if no knowledge volumes of high-quality image annotations in the field of
transfer is performed. marine environmental monitoring and exploration.
REFERENCES
[1] K. J. Morris, B. J. Bett, J. M. Durden, V. A. I. Huvenne, R. Milligan, D. O. B. Jones, S. McPhail, K. Robert, D. M. Bailey, and H. A. Ruhl, "A new method for ecological surveying of the abyss using autonomous underwater vehicle photography," Limnol. Oceanography, Methods, vol. 12, no. 11, pp. 795–809, Nov. 2014.
[2] T. Schoening, K. Köser, and J. Greinert, "An acquisition, curation and management workflow for sustainable, terabyte-scale marine image analysis," Sci. Data, vol. 5, no. 1, Dec. 2018, Art. no. 180181.
[3] R. Proctor, T. Langlois, A. Friedman, S. Mancini, X. Hoenner, and B. Davey, "Cloud-based national on-line services to annotate and analyse underwater imagery," in Proc. IMDIS Int. Conf. Mar. Data Inf. Syst., vol. 59, 2018, p. 49.
[4] B. Schlining and N. Stout, "MBARI's video annotation and reference system," in Proc. OCEANS, Sep. 2006, pp. 1–5.
[5] D. Langenkämper, M. Zurowietz, T. Schoening, and T. W. Nattkemper, "BIIGLE 2.0—Browsing and annotating large marine image collections," Frontiers Mar. Sci., vol. 4, p. 83, Mar. 2017.
[6] T. Schoening, J. Osterloff, and T. W. Nattkemper, "RecoMIA—Recommendations for marine image annotation: Lessons learned and future directions," Frontiers Mar. Sci., vol. 3, p. 59, Apr. 2016.
[7] J. Monk, N. S. Barrett, D. Peel, E. Lawrence, N. A. Hill, V. Lucieer, and K. R. Hayes, "An evaluation of the error and uncertainty in epibenthos cover estimates from AUV images collected with an efficient, spatially-balanced design," PLoS ONE, vol. 13, no. 9, Sep. 2018, Art. no. e0203827.
[8] J. M. Durden, B. J. Bett, T. Schoening, K. J. Morris, T. W. Nattkemper, and H. A. Ruhl, "Comparison of image annotation data generated by multiple investigators for benthic ecology," Mar. Ecol. Prog. Ser., vol. 552, pp. 61–70, Jun. 2016.
[9] T. Schoening, T. Kuhn, M. Bergmann, and T. W. Nattkemper, "DELPHI—Fast and adaptive computational laser point detection and visual footprint quantification for arbitrary underwater image collections," Frontiers Mar. Sci., vol. 2, p. 20, Apr. 2015.
[10] O. Beijbom, P. J. Edmunds, D. I. Kline, B. G. Mitchell, and D. Kriegman, "Automated annotation of coral reef survey images," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2012, pp. 1170–1177.
[11] X. Li, M. Shang, H. Qin, and L. Chen, "Fast accurate fish detection and recognition of underwater images with fast R-CNN," in Proc. OCEANS-MTS/IEEE Washington, Oct. 2015, pp. 1–5.
[12] T. Schoening, M. Bergmann, J. Ontrup, J. Taylor, J. Dannheim, J. Gutt, A. Purser, and T. W. Nattkemper, "Semi-automated image analysis for the assessment of megafaunal densities at the arctic deep-sea observatory HAUSGARTEN," PLoS ONE, vol. 7, no. 6, Jun. 2012, Art. no. e38179.
[13] M. Moniruzzaman, S. M. S. Islam, M. Bennamoun, and P. Lavery, "Deep learning on underwater marine object detection: A survey," in Proc. Int. Conf. Adv. Concepts Intell. Vis. Syst. Antwerp, Belgium: Springer, 2017, pp. 150–160.
[14] M. Zurowietz, D. Langenkämper, B. Hosking, H. A. Ruhl, and T. W. Nattkemper, "MAIA—A machine learning assisted image annotation method for environmental monitoring and exploration," PLoS ONE, vol. 13, no. 11, Nov. 2018, Art. no. e0207498.
[15] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, "ImageNet: A large-scale hierarchical image database," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2009, pp. 248–255.
[16] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick, "Microsoft COCO: Common objects in context," in Proc. ECCV. Zurich, Switzerland: Springer, 2014, pp. 740–755.
[17] P. Tang, X. Wang, S. Bai, W. Shen, X. Bai, W. Liu, and A. Yuille, "PCL: Proposal cluster learning for weakly supervised object detection," IEEE Trans. Pattern Anal. Mach. Intell., vol. 42, no. 1, pp. 176–191, Jan. 2020.
[18] D. Zhang, J. Han, L. Yang, and D. Xu, "SPFTN: A joint learning framework for localizing and segmenting objects in weakly labeled videos," IEEE Trans. Pattern Anal. Mach. Intell., vol. 42, no. 2, pp. 475–489, Feb. 2020.
[19] D. Zhang, J. Han, G. Guo, and L. Zhao, "Learning object detectors with semi-annotated weak labels," IEEE Trans. Circuits Syst. Video Technol., vol. 29, no. 12, pp. 3622–3635, Dec. 2019.
[20] S. J. Pan and Q. Yang, "A survey on transfer learning," IEEE Trans. Knowl. Data Eng., vol. 22, no. 10, pp. 1345–1359, Oct. 2010.
[21] E. C. Orenstein and O. Beijbom, "Transfer learning and deep feature extraction for planktonic image data sets," in Proc. IEEE Winter Conf. Appl. Comput. Vis. (WACV), Mar. 2017, pp. 1082–1088.
[22] J. Greinert, T. Schoening, K. Köser, and M. Rothenbeck, Seafloor Images and Raw Context Data Along AUV Track SO242/1_83-1_AUV10 (Abyss_196) During SONNE Cruise SO242/1. Bremerhaven, Germany: PANGAEA, 2017, doi: 10.1594/PANGAEA.881896.
[23] A. Purser et al., Seabed Photographs Taken Along OFOS Profile SO242/2_155-1 During SONNE Cruise SO242/2. Bremerhaven, Germany: PANGAEA, 2018, doi: 10.1594/PANGAEA.890617.
[24] A. Purser et al., Seabed Photographs Taken Along OFOS Profile SO242/2_171-1 During SONNE Cruise SO242/2. Bremerhaven, Germany: PANGAEA, 2018, doi: 10.1594/PANGAEA.890620.
[25] A. Purser et al., Seabed Photographs Taken Along OFOS Profile SO242/2_233-1 During SONNE Cruise SO242/2. Bremerhaven, Germany: PANGAEA, 2018, doi: 10.1594/PANGAEA.890633.
[26] A. Tsymbal, "The problem of concept drift: Definitions and related work," Comput. Sci. Dept., Trinity College Dublin, vol. 106, no. 2, p. 58, 2004.
[27] D. Langenkämper, R. van Kevelaer, A. Purser, and T. W. Nattkemper, "Gear-induced concept drift in marine images and its effect on deep learning classification," Frontiers Mar. Sci., vol. 7, p. 506, Jul. 2020.
[28] T. Schoening, A. Purser, D. Langenkämper, I. Suck, J. Taylor, D. Cuvelier, L. Lins, E. Simon-Lledó, Y. Marcon, D. O. B. Jones, T. Nattkemper, K. Köser, M. Zurowietz, J. Greinert, and J. Gomes-Pereira, "Megafauna community assessment of polymetallic-nodule fields with cameras: Platform and methodology comparison," Biogeosciences, vol. 17, no. 12, pp. 3115–3133, Jun. 2020. [Online]. Available: https://www.biogeosciences.net/17/3115/2020/
[29] M. Skaldebø, A. S. Muntadas, and I. Schjolberg, "Transfer learning in underwater operations," in Proc. OCEANS-Marseille, Jun. 2019, pp. 1–8.
[30] J.-Y. Zhu, T. Park, P. Isola, and A. A. Efros, "Unpaired image-to-image translation using cycle-consistent adversarial networks," in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Oct. 2017, pp. 2223–2232.
[31] J. Walker, T. Yamada, A. Prugel-Bennett, and B. Thornton, "The effect of physics-based corrections and data augmentation on transfer learning for segmentation of benthic imagery," in Proc. IEEE Underwater Technol. (UT), Apr. 2019, pp. 1–8.
[32] L.-C. Chen, Y. Zhu, G. Papandreou, F. Schroff, and H. Adam, "Encoder-decoder with atrous separable convolution for semantic image segmentation," in Proc. ECCV, Sep. 2018, pp. 801–818.
[33] T. Yamada, A. P. Bennett, and B. Thornton, "Learning features from georeferenced seafloor imagery with location guided autoencoders," J. Field Robot., pp. 1–16, May 28, 2020, doi: 10.1002/rob.21961.
[34] M. Zurowietz, S083, Jan. 2020, doi: 10.5281/zenodo.3600132.
[35] M. Zurowietz, S155, Jan. 2020, doi: 10.5281/zenodo.3603803.
[36] M. Zurowietz, S171, Jan. 2020, doi: 10.5281/zenodo.3603809.
[37] M. Zurowietz, S233, Jan. 2020, doi: 10.5281/zenodo.3603815.
[38] K. He, G. Gkioxari, P. Dollár, and R. Girshick, "Mask R-CNN," in Proc. IEEE Int. Conf. Comput. Vis., Oct. 2017, pp. 2961–2969.
[39] J. Cupitt and K. Martinez, "VIPS: An image processing system for large images," Proc. SPIE, vol. 2663, pp. 19–28, Feb. 1996.
[40] S. C. Wong, A. Gatt, V. Stamatescu, and M. D. McDonnell, "Understanding data augmentation for classification: When to warp?" in Proc. Int. Conf. Digit. Image Comput., Techn. Appl. (DICTA), Nov. 2016, pp. 1–6.
[41] L. Perez and J. Wang, "The effectiveness of data augmentation in image classification using deep learning," 2017, arXiv:1712.04621. [Online]. Available: http://arxiv.org/abs/1712.04621
[42] W. Abdulla. (2017). Mask R-CNN for Object Detection and Instance Segmentation on Keras and TensorFlow. [Online]. Available: https://github.com/matterport/Mask_RCNN
[43] E. J. Foell, H. Thiel, and G. Schriever, "DISCOL: A long-term, large-scale, disturbance-recolonization experiment in the abyssal eastern tropical South Pacific Ocean," in Proc. Offshore Technol. Conf., 1990. [Online]. Available: https://doi.org/10.4043/6328-MS
[44] M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman, "The PASCAL visual object classes (VOC) challenge," Int. J. Comput. Vis., vol. 88, no. 2, pp. 303–338, Jun. 2010.
[45] M. Sokolova and G. Lapalme, "A systematic analysis of performance measures for classification tasks," Inf. Process. Manage., vol. 45, no. 4, pp. 427–437, Jul. 2009.

MARTIN ZUROWIETZ received the B.Sc. degree in bioinformatics and the M.Sc. degree in informatics in the natural sciences from Bielefeld University, Bielefeld, Germany, in 2013 and 2016, respectively, where he is currently pursuing the Ph.D. degree with the Biodata Mining Group. His research interests include automatic object detection in marine imagery using deep learning methods, assistance systems for manual marine image annotation, and the development of large-scale web-based collaborative image annotation platforms.

TIM W. NATTKEMPER is currently a Professor of biodata mining with the Faculty of Technology, Bielefeld University, Germany. His research interests include the development of methods for the analysis of digital images and video (bioimaging, medical imaging, marine imaging, remote sensing). One particular focus of his research is the development of algorithmic approaches to harvest large marine image and sensor data collections for hidden regularities. Two very important aspects are the computational classification/quantification with machine learning and computer vision, and the integration of field expert knowledge through modern web-platforms and data-driven visualizations.