

Deep Learning Methods for Multi-Species Animal
Re-Identification and Tracking – a Survey

Prashanth C Ravoor∗, Sudarshan T S B


Dept. of CSE, PES University, Bengaluru

Abstract

Technology has an important part to play in wildlife and ecosystem conservation, and can vastly reduce the time and effort spent on the associated tasks. Deep learning methods for computer vision in particular show good performance on a variety of tasks; animal detection and classification using deep learning networks are widely used to assist ecological studies. A related challenge is tracking animal movement over multiple cameras. For effective animal movement tracking, it is necessary to distinguish between individuals of the same species in order to correctly identify an individual moving between two cameras. Such problems could potentially be solved through animal re-identification methods. In this paper, the applicability of existing animal re-identification techniques to fully automated individual animal tracking in a cross-camera setup is explored. Recent developments in animal re-identification in the context of open-set recognition of individuals, and the extension of these systems to multiple species, are examined. Some of the best-performing human re-identification and object tracking systems are also reviewed with a view to extending ideas within them to individual animal tracking. The survey concludes by presenting common trends in re-identification methods, listing key challenges in the domain, and recommending

∗ Corresponding author

© 2020. This manuscript version is made available under the CC-BY-NC-ND 4.0 license:
https://creativecommons.org/licenses/by-nc-nd/4.0/
Published article: https://doi.org/10.1016/j.cosrev.2020.100289

Email addresses: [email protected] (Prashanth C Ravoor), [email protected] (Sudarshan T S B)

Preprint submitted to Computer Science Review July 2020


Keywords: animal re-identification, cross-camera tracking, deep learning,
multi species re-identification, open-set re-identification

1. Introduction

The use of technological solutions for the conservation of wildlife is on the rise. Recent advances in low-powered, fast computation devices, parallel processing, and advanced, efficient learning algorithms, among others, make this feasible. Several methods are now available to ecologists and academia to ease their research and help build tools for the protection of endangered species. The focus of this paper is the use of technology to automatically monitor and track animals in their natural habitat.

1.1. Motivation

Animals straying into human settlements, primarily in search of food, cause conflicts resulting in injury to humans, animals, or both. A fully automated monitoring system that detects animal transgressions and alerts the concerned authorities can help reduce casualties. Computer vision is one choice of technology that can potentially solve most of the associated problems. The system referred to henceforth is a network of cameras running image processing software.
It is not sufficient to simply detect whether a wild animal is close by; it would also be highly beneficial to provide the closest estimate of the animal's current location. The problem faced here is thus that of tracking an individual animal (or a group of individual animals) in a cross-camera setup in real or near-real time. It is not necessary to remember the identity of the individual(s) for an extended duration; the identity only needs to be retained until the animal has moved completely out of range of the system (this could be of the order of minutes or hours).

1.2. Terminology

There are multiple references to re-identification, feature extraction and closed/open-set recognition throughout this paper. A brief introduction to each of these terms is presented below.
Re-identification refers to ranking a list of known individuals (the gallery set) with respect to a probe or query image. Within the ranked or ordered set of gallery images, the top few ranks (generally referred to as top-k) have a high probability of containing the individual in the probe image. If such a system is built for a controlled environment, such as a laboratory or an animal reserve, the problem is categorized as a closed-set recognition problem: all the individuals in the gallery are known beforehand. In most cases, however, it is likely that the individual being probed does not appear in the gallery set. This is referred to as an open-set recognition problem.
Feature extraction maps an image to a vector, typically of much smaller dimension than the original image. The resultant vector consists of the significant features of the original image. Such feature vectors are easily compared using well-known metrics, such as cosine similarity, Euclidean distance or even softmax classification. Classification convolutional neural networks (CNNs), such as AlexNet [1], Inception [2] and ResNet [3], which have been pre-trained over a large set of images such as the ImageNet database, are often used to extract features. Recently, training feature extraction networks using a Siamese architecture, inspired by [4], has gained prominence. In the Siamese architecture, two or more neural networks are trained together with shared weights to find the similarity (or dissimilarity) of pairs of images. The networks learn to discriminate between two images by auto-learning the most distinguishing features contained in them.
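
To make these terms concrete, the following sketch shows how a gallery might be ranked against a probe using the cosine similarity of extracted feature vectors. It is a minimal illustration rather than code from any surveyed system, and assumes PyTorch and torchvision with an ImageNet-pretrained ResNet-50 truncated at its pooling layer:

import torch
import torch.nn.functional as F
import torchvision.models as models
import torchvision.transforms as T

# Drop the final classification layer so the network outputs a 2048-d
# feature vector instead of ImageNet class scores.
backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
extractor = torch.nn.Sequential(*list(backbone.children())[:-1]).eval()

preprocess = T.Compose([
    T.Resize((224, 224)),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def embed(images):
    """Map a list of PIL images to L2-normalised feature vectors."""
    batch = torch.stack([preprocess(img) for img in images])
    feats = extractor(batch).flatten(1)   # shape (N, 2048)
    return F.normalize(feats, dim=1)      # unit length, so dot product = cosine

def rank_gallery(probe_feat, gallery_feats, k=5):
    """Return the indices of the top-k gallery entries for one probe vector."""
    sims = gallery_feats @ probe_feat     # cosine similarity to each gallery image
    return sims.topk(min(k, sims.numel())).indices

In a closed-set setting, the top-1 index directly names the individual; in an open-set setting, the similarity scores must additionally be judged against some notion of a previously unseen individual, as discussed later.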

1.3. Problem Statement

The challenge being addressed here is a variation of the re-identification problem. For the purposes of fully automated monitoring and tracking, each probe needs to generate a single result: either the identity of the probe individual or a new identity (in the case of open-set recognition). In re-identification terms, instead of the top-k matches, tracking systems always use only the top-1 match. It goes without saying that re-identification applies only to individuals of the same species. However, general applications of animal tracking require identifying multiple kinds of animals, and the system must be capable of identifying each of them separately.
In summary, this study focuses on the top-1, open-set re-identification of multiple species of animals in a cross-camera setup, with identities assigned on a non-permanent basis (short-term tracking).
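
As a minimal sketch of what top-1, open-set assignment implies (an illustration under assumptions, not the method of any surveyed system), a probe can be matched to its nearest gallery identity, with a new identity created whenever the best similarity falls below a threshold tuned on validation data:

import torch

def assign_identity(probe_feat, gallery_feats, gallery_ids, threshold=0.7):
    """Return an existing identity, or mint a new one for an unseen animal."""
    if gallery_ids:
        sims = gallery_feats @ probe_feat    # cosine similarities
        best = int(sims.argmax())
        if float(sims[best]) >= threshold:
            return gallery_ids[best]         # accept the top-1 match
    return max(gallery_ids, default=-1) + 1  # previously unseen individual

The threshold value here is purely illustrative; choosing it well is one of the open problems discussed later in this survey.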

1.4. Animal Re-Identification Challenges

The challenges of animal re-identification bear a strong resemblance to those of person re-identification or vehicle re-identification. There can be large variations in the images in terms of illumination, angle of capture, pose of the individual, lateral translations, minor or complete occlusions, differing weather conditions, etc. While re-identification of humans also has to deal with changes in clothing and accessories (which form nearly the whole image area of the human), this particular challenge does not exist in animal or vehicle re-identification.
However, the shape or form of an animal varies significantly more than that of a human. Whereas human movement involves smaller changes in pose owing to a primarily rigid style of movement, animals are considerably more fluid and undergo larger variations in form. Angle of capture also vastly affects re-identification of animals, since some identifying traits may not appear at certain angles; for example, tigers, which are identified by their stripe patterns, require side captures rather than frontal captures.
The contributions of this paper are:

1. Present a survey of recent studies in animal re-identification that use deep learning methods for feature extraction
2. Review these works in the context of multi-species re-identification
3. Provide an overview of such methods in the context of open-set recognition and their applicability to it
4. Explore the application of ideas used in state-of-the-art object tracking and human re-identification methods to animal re-identification
5. List common trends in animal re-identification, the challenges faced and possible solutions to these problems

2. Related Work

Accurate tracking of animals in their natural habitats allows ecologists to obtain their approximate locations in a non-invasive manner. Besides the monitoring of individual animals, tracking data can offer useful insights about the ecological status of the location under study, and forms a basis for crucial decisions. For example, Hays et al. [5] compile several successful case studies where tracking information of endangered animals has helped form policies for their conservation. With the wide use of camera traps, there is scope for automating tracking through computer vision and animal re-identification. There are, however, wide and varied methods currently available for animal re-identification, and choosing the right one requires a careful analysis of the model, the feasibility of the approach and the available resources.
To the best of the authors' knowledge, the work by Schneider et al. [6] is the only survey that specifically looks at animal re-identification. Schneider et al. [6] describe different techniques used to automate the identification of animals using computer vision, the earliest of which dates back to the 1990s, when Whitehead et al. [7] built a system to assist the identification of sperm whales. The use of handcrafted features or visual information from pictures to identify animals based on patterns, and other such feature engineering methods, requires expert knowledge of the species to allow extraction of a particular set of discriminating features. These methods are also non-transferable to other species.
This paper concurs with one of the conclusions drawn in [6], which states that the use of deep learning methods for re-identification results in increased accuracy and eases the development of completely autonomous re-identification systems. While the work of Schneider et al. is fairly comprehensive, it does not comment on re-identification in terms of a) open-set recognition, b) multi-species re-identification, or c) applicability to end-to-end automation for tasks such as tracking. The work done in this paper extends that survey, and contains a study of recent research in the area. Although there is some overlap between the papers presented here and in [6], the contexts of the presentations differ. This paper also looks at the problem of re-identification in a new light, namely as a tool for tracking individuals in a cross-camera setup.

3. Object Tracking

Since the problem described in this paper directly relates to tracking, some state-of-the-art object tracking methods are explored in this section, along with their advantages and shortcomings. The papers selected for review are some of the top performers in the Multi-Object Tracking (MOT) challenge [8], which is a widely used benchmark. Apart from the overall score (MOTA), two other key metrics have been taken into consideration when selecting the papers for review: the number of id switches (where the identity assigned to one of the tracklets changes erroneously) and fragmented tracks (discontinuations in the tracking sequence). Three publications are presented below, which are among the best performers in the MOT 2019 challenge and which had, at the time of writing, published articles describing the algorithms and methodology employed. A summary of the performance of all three papers is given in Table 1.
Bergmann et al. [9] present a novel way of using the object detection task itself to additionally track objects, by suitably modifying the detection network. Outputs are added that allow the system to decide, for each detection, whether
1 https://motchallenge.net/results/CVPR_2019_Tracking_Challenge/

Ref. (Year) | Feature Extraction | MOTA | Id Switches Ratio | Fragmented Tracks Ratio | Speed (FPS)
[9] (2019) | Siamese Network Architecture | 51.3 ± 18.7 | 47.2 | 88.2 | 2.7
[10] (2019) | Joint Inference Network | 47.6 ± 20.3 | 44.4 | 70.9 | 0.2
[11] (2018) | – | 46.7 ± 19.6 | 48.6 | 81.8 | 18.2

Table 1: Performance metrics of the surveyed object tracking algorithms. MOTA stands for Multi-Object Tracking Accuracy. The details of each metric can be found on the MOT 2019 tracking results page¹.

to continue an existing track or start a new one. Stale tracks are removed periodically. Camera motion compensation and a constant velocity assumption are used to smooth tracks in case of occlusions or failed detection frames. To preserve the identities of tracked objects, feature vectors are generated from a network trained using a Siamese architecture [4], and used to determine whether the identities of objects in two discontinuous tracks are the same.
Yoon et al. [10] describe the use of a combination of locality-based association using the Hungarian algorithm and appearance-based association using visual features to maintain track continuity. A joint inference network (JI-Net) is used to extract feature vectors and compare the similarity of two images. The JI-Net is trained using a Siamese architecture [4] with Triplet Loss [12] as the objective function. The output of the JI-Net is fed to a Long Short-Term Memory (LSTM) network, which is updated with changes to the appearance of the object over time.
Bochinski et al. [11] describe an Intersection-over-Union (IOU) based tracker, and show that augmenting it with visual information greatly reduces the number of id switches and fragmented tracks. In every frame, the detected object with the highest overlap (IOU) with a track's last known position is assigned to that tracklet. Any intermediate frames with a failed detection would break tracks; to avoid this, a visual tracker is initialized over the last known frame, and if the object is detected again in subsequent frames, the tracklets are merged. This eliminates the need for predictive tracking, where the last known frame and direction are used to predict the direction of movement. The visual cues used are only image-location based, and are not used to extract features. The authors report a good increase in overall tracking accuracy and a reduction in the number of id switches.
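
The core IOU association step is simple enough to state in a few lines. The following sketch is an illustration of the idea rather than the authors' implementation; the box format and the sigma_iou threshold are assumptions:

def iou(a, b):
    # Intersection-over-union of two boxes given as (x1, y1, x2, y2).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / (union + 1e-9)

def iou_step(tracks, detections, sigma_iou=0.5):
    # Greedily extend each track with its best-overlapping detection;
    # unmatched detections start new tracks.
    remaining = list(detections)
    for track in tracks:
        if not remaining:
            break
        best = max(remaining, key=lambda d: iou(track[-1], d))
        if iou(track[-1], best) >= sigma_iou:
            track.append(best)
            remaining.remove(best)
    tracks.extend([[d] for d in remaining])
    return tracks

The visual-tracker fallback described above then only needs to fire for tracks that found no detection above the threshold in a given frame.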
One of the challenges observed with tracking methods is cross-camera tracking; specifically, how visual features are used to track identities across different cameras in real time. Subtle changes in lighting and colouration can lead to non-matching of extracted feature vectors; the use of deep learning for feature extraction goes a long way towards addressing this problem. Current methods tend to use only spatial factors (the location of objects within the image) to determine identities within images from a single camera. Clearly, when the object moves between two cameras, its location with respect to the frame can vary significantly depending on the angle of capture and direction of movement, giving rise to the possibility of an id switch. Apart from having a good feature extraction method, it follows that temporal factors can also be introduced to build a better strategy for identity association and make the tracker more robust in a cross-camera setting.
Deep learning networks for feature extraction are commonly used to reduce id switches and fragmented tracks. However, there is no specific test case for tracking animals in the benchmark datasets; a majority of the test cases use people as the subjects of tracking. As noted in Section 1.4, there are some fundamental differences between identifying humans and identifying animals. Assuming that the rest of the tracking system remains unchanged, just the feature extraction module can potentially be replaced with one more suited to animals. The new module could use the same network architecture used to identify humans (with appropriate training) or a completely different network architecture. In view of the latter, a study of existing feature extraction methods for animals is presented in the next section.

4. Deep Learning Methods for Animal Re-Identification

Building on the conclusions of Schneider et al. in [6], the publications studied in this paper are only those that make use of deep learning methods for feature extraction. The publications surveyed here had various motivations for their research. Figure 1 categorizes them into one of four types, namely a) animal monitoring for individual welfare / protection, b) animal monitoring for behavioural studies, c) conducting animal censuses for wildlife conservation, and d) tools for ecological studies. The same information is also shown in Table 2. A majority of the research focuses on building systems that aid censuses of animals, so that informed decisions can be taken regarding their protection. A significant number also attempt to build tools to help ecologists automate their research studies. While animal re-identification in general has a wide range of applications, such as the ones listed above, the focus of this paper is on utilizing extracted feature vectors for effective animal tracking.
A tracking system should not be bound to a single animal species, and should generalise to different species. In the case of publications that target a particular type of animal, the system's performance on different species must be considered; for publications that include multiple species, the types of animals included in the study determine its robustness. In view of the above, this section is broadly categorized into publications that apply animal re-identification techniques to a single species and those that include multiple species.

4.1. Single Species Identification

This section covers the studies that have been conducted on the identification of a single species. A brief summary of all the publications is tabulated in Table 3. The publications have been sub-categorized by the mechanism used to identify each species, namely those that use a) localized body parts (such as the trunk or flank) and b) the face, head or full body of the animal. An illustration of these mechanisms is given in Figure 2.
Figure 1: Motivations for the animal re-identification studies surveyed in this paper. The Y-axis indicates the number of papers.

Motivation | Count | Publications
Monitoring for Individual Welfare / Protection | 3 | [13], [14], [15]
Monitoring for Behavioural Studies | 2 | [16], [17]
Census for Animal Conservation | 7 | [18], [19], [20], [21], [22], [23], [24]
Automated Tools for Ecological Studies | 5 | [25], [26], [27], [28], [29]

Table 2: The primary motivations for the surveyed animal re-identification publications

Figure 2 indicates the various parts of the animal that can be used for feature extraction and subsequent identification. The Amur Tiger Re-identification challenge is addressed separately, since those solutions are specific to a particular problem.

4.1.1. Localized Parts Based


The publications [16] (minke whales), [26] (dolphins) and [13] (cows) all segment characteristic features from the original image and use only the segmented portions for feature extraction. For [16] and [26], these are the fins (to identify minke whales and dolphins respectively), while [13] uses the patterns along the spine of the cow. [16] and [13] employ a fairly similar technique, using a deep learning network to extract features from the localized images and softmax classification to assign identities. [26], on the other hand, compares features using cosine similarity, and the identity assigned is that of the individual with the highest similarity.
Konovalov et al. [16] used a dataset consisting of underwater images of 76 different minke whales. An 8-layer Fully Convolutional Network (FCN-8), based on the VGG-16 [30] architecture, was used for both semantic segmentation and classification. The system achieved a validation accuracy of 92.3% and a test set accuracy of 93.5%.
Bouma et al. [26] use ResNet50 [3] as the feature extraction network, trained with Triplet Loss [12] as the objective function. The system is tested over a dataset of nearly 4750 images of around 185 individual dolphins, and obtains a top-1 accuracy of 90.6%. Accuracy of open-set recognition is factored in, but not explicitly reported.
Phyo et al. [13] use a custom 3-layer CNN for individual classification. A dataset of approximately 13600 images is used, pertaining to 60 different cows. The system achieves a top-1 closed-set accuracy of 96.3%.
[16] and [13] do not report open-set recognition accuracy; further, their use of a softmax classifier makes open-set recognition difficult to implement. While the approach described here is easily extendable beyond individual species, the object detector, which localizes the fin for example, needs to be replaced. If the detector can segment the distinguishing parts of the image for different species, feature extraction becomes simpler.
Ref. (Year) | # labels / # images | Id Mechanism | Feature Extractor | Identity Association | Closed-Set Acc. % | Open-Set Acc. %
[16] (2018) | 76/1320 | LBP | FCN-8 | Softmax | 93.50 | –
[26] (2018) | 185/3544 | LBP | ResNet50 | Euclidean Distance | 90.60 | 90.60
[13] (2018) | 60/13603 | LBP | Custom | Softmax | 96.30 | –
[18] (2019) | 51/2877 | HF | VGG16 | Cosine Similarity | 93.30 | –
[27] (2016) | 102/7187 | HF | AlexNet | SVM | 84.50 | –
[21] (2017) | 147/12765 | HF | AlexNet | SVM | 62.40 | –
[17] (2018) | 276/2076 | HF | ResNet50 | SVM | 56.00 | –
[14] (2018) | 439/17802 | HF | Custom | KNN Classifier | 81.70 | 55.80
[25] (2018) | 30/21600 | HF | InceptionV3 | Softmax | 92.00 | –
[22] (2019) | 107/3649 | LBP | ResNet50 | Softmax | 74.10 | 51.70
[20] (2019) | 107/3649 | LBP | ResNet101 | – | 93.30 | –
[19] (2019) | 107/1887 | LBP | DenseNet121 | Cosine Similarity | 89.10 | –

Table 3: Brief summary of all single-species animal re-identification papers surveyed. Abbreviations: LBP – Localized Body Parts, HF – Head / Face / Full Detection.

In the case of [26], it also follows that since feature vectors are extracted and clustered to assign identities, determining whether an individual has been seen before is also possible.
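
Several of the systems above train their extractors with Triplet Loss [12]. As a minimal sketch (an illustration under assumptions, not code from [26]), the objective pulls an anchor image's embedding toward an image of the same individual and pushes it away from an image of a different one:

import torch
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, margin=0.2):
    # anchor/positive/negative are (N, D) batches of embeddings.
    d_pos = F.pairwise_distance(anchor, positive)   # same individual
    d_neg = F.pairwise_distance(anchor, negative)   # different individual
    return F.relu(d_pos - d_neg + margin).mean()

# One training step, assuming `net` maps image batches to embeddings:
#   loss = triplet_loss(net(a), net(p), net(n))
#   loss.backward(); optimizer.step()

PyTorch also provides an equivalent built-in, torch.nn.TripletMarginLoss, so the hand-written version above is only for exposition.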

4.1.2. Face, Head and Full Detection


The studies described in [18] (red pandas), [27] (chimpanzees) and [21] (gorillas) use facial features for individual identification, while [17] (elephants) and [14] (cows) use features from the head of the animal. All of these systems operate under the same assumption: that facial features are sufficient to discriminate individuals. [25] (cows) uses the full detection of the individual during feature extraction. Each of these systems uses an object detector to localize the animal in the image, and uses the localized portion of the image for feature extraction.
He et al. [18] build a system to identify red pandas over a database of 2877 images of 51 different individuals. A VGG-16 [30] network is used for feature extraction, and achieves a top-1 closed-set identification accuracy of 93.3%.
Freytag et al. [27] introduce the C-Zoo and C-Tai datasets for chimpanzees, and describe a system to identify characteristics of these primates, including identity, gender, age, etc. AlexNet [1] is used for feature extraction. The publication argues for the importance of converting extracted feature vectors into Euclidean spaces before they can be compared, and introduces a Log-Euclidean framework that applies matrix logarithms to the feature vectors before comparing them in the Euclidean space. An accuracy of nearly 92% on the C-Zoo dataset and 75.66% on the C-Tai dataset is reported.
Brust et al. [21] outline a system to identify gorillas over a dataset of 2000 images of 147 different individuals. The identification pipeline is similar to that used by Freytag et al. in [27], but omits the matrix logarithm and bi-linear pooling. The publication reports an accuracy of 62.4%.
Korschens et al. [17] introduce a dataset for elephants consisting of 2076 images of 276 individuals [31]. The authors describe the use of a truncated ResNet50 network [3] for feature extraction, and report a top-1 accuracy of 56% when using a single image as input.
Bergamini et al. [14] create a system to identify cows over a dataset of 17802 images of 439 individuals. Two CNNs with a custom architecture are trained on multiple views of the face of the same cow for feature extraction, and the outputs of the two networks are combined to form a single feature vector. Histogram loss is used as the loss function. The system obtains 81.7% top-1 closed-set accuracy and 55.8% top-1 open-set accuracy.
Li et al. [25] build a system for the identification of individual cows, over 2160 images containing 30 individuals. The full detection of the cow is used for feature extraction. The authors build and train a de-noising network to pre-process images, and the de-noised image is then fed to a modified InceptionV3 [2] network for individual identification. A top-1 accuracy of 92% is reported.
All the publications here follow the same pipeline: object localization, then feature extraction, followed by classification. Although the classification method used may not be applicable to open-set recognition (the use of closed-set SVM classifiers, for instance), it is possible to extend the concept to unseen individuals, assuming the extracted features would be different for a new image.
Figure 2: An example of the possible parts of an image that can be used for feature extraction.
Base image source: one of the images published in the Amur Tiger dataset [22]

The system could also be extended to other species that have distinguishing facial characteristics. However, facial characteristics by themselves may not be sufficient to discriminate between individuals of certain species. Another drawback is the face being hidden, such as when the image is captured from the side rather than the front. On the other hand, no particular features are used for identification in [25]; i.e., the distinguishing features are auto-learned by the CNN using the entire image of the cow, which is desirable for generalizing to different species.

4.1.3. Amur Tiger Re-Identification challenge


One of the major challenges in deep learning is the availability of reliable, labelled data, and this applies to animal re-identification as well. Good-quality, publicly available datasets go a long way towards promoting research, while at the same time allowing a like-to-like comparison of the results of different methodologies. A few public datasets exist, such as C-Zoo and C-Tai [27] (chimpanzees) and [31] (elephants) to name a few, but more datasets would spur research in the area.
Li et al. [22] introduce a labelled dataset named Amur Tiger Re-identification in the Wild (ATRW), with the intention of aiding studies to obtain accurate population counts of the endangered Amur tiger.
Ref. (Year) | # Species | # labels / # images | Feature Extractor | Identity Association | Closed-Set Acc. % | Open-Set Acc. %
[29] (2018) | 3 | –/– | Custom | Sigmoid | 81.65 | –
[24] (2019) | 2 | 90/14845 | DenseNet121 | Cosine Similarity | 95.50 | 75.00
[28] (2019) | 5 | [20, 4251]/350K | DenseNet201 | Sigmoid | 85.00 | 80.00
[23] (2018) | 3 | 93/11637 | PrimNet | Cosine Similarity | 85.00 | 61.33
[15] (2017) | 3 | 50/1500 | AlexNet | SVM | 85.00 | –

Table 4: Brief summary of all multi-species animal re-identification papers surveyed. The dataset sizes and accuracies reported are averaged across all species.

The dataset contains photos of tigers in various poses, captured at multiple angles and in differing light conditions, in both single-camera and multi-camera setups. Each individual tiger is labelled, and pose-related key points are additionally annotated. The dataset contains 3649 images of tigers spread over 107 different individuals. Several publications on Amur tiger re-identification have since emerged, and a few of the more interesting results are described in this section.
Both [22] and [20] describe the use of both global and local image features for re-identification. Features are extracted in multiple streams (from the full image of the tiger as well as from multiple key points such as the limbs and hip). Both use variants of ResNet [3] as the backbone network for feature extraction, and train each network with Triplet Loss [12]. In the case of [20], three different streams are used at training time, and only the global feature stream is used during inference; the architecture is named Part-Pose Guided Network (PPGNet). [22] reports a top-1 accuracy of 89.4% in the single-camera case and 77.1% in the cross-camera case over the test set. [20] reports an astounding 97.7% single-camera top-1 accuracy and 93.6% cross-camera top-1 accuracy, and is the top submission for the ATRW challenge in the plain re-id track².

2 https://cvwc2019.github.io/leaderboard.html

In [19], Shukla et al. use deep learning networks for re-identification, and use SIFT descriptors [32] extracted from the image to re-rank the results obtained from the deep learning network. Their identification network is a DenseNet121 model [33], trained with a combination of cross-entropy loss and pair-wise KL-divergence loss. Over the test set, the system achieves 92.7% accuracy in the single-camera case and 84.5% in the cross-camera case.
While the authors of [22] have spent considerable effort labelling the dataset with pose and part based information, such metadata may not in general be available in datasets of other species. The use of pose-based information for re-identification may therefore actually hinder long-term research until more datasets emerge containing similar metadata. Another aspect that could improve the dataset is the addition of test cases for open-set recognition, which do not currently exist.
While the system in [20] performs admirably well for tiger re-identification, the network structure used is highly complex: eight different networks need to be trained (although only one is used during inference). In addition, as the authors themselves note, the network architectures are built solely to solve the problem of tiger re-identification, and as a result they might not generalise to other species. The system described in [19] should generalize well to different species, given that all features are auto-learned by the CNN and no additional pose-related information is used.
Systems built for a particular type of animal naturally tend to use species-specific characteristics to enhance performance, be it for re-ranking or during the object localization phase (such as building object detectors specifically to identify the flanks of tigers). As such, these mechanisms may not apply to different types of animals, and the removal of some of the species-specific modifications may result in reduced accuracy. In this respect, systems built for multi-species re-identification might be better suited to the problem addressed in this paper.

4.2. Multi-Species Identification
This section describes works that have applied their methods to different animal species. It is to be noted that the same topology and process is used for multiple species, but not simultaneously; each model is trained separately. Unlike in Section 4.1, the systems built for multi-species re-identification are far too different to group together effectively, and are therefore studied individually in their respective contexts. A summary of each publication presented in this section is given in Table 4.
Naiser et al. [29] describe a tool for tracking ants, snowbugs and zebrafish. The re-identification of each individual is performed using an 8-layer customized CNN trained through a Siamese architecture with Triplet loss. The reported tracking accuracy is 79.2% for ants (averaged), 80.14% for snowbugs and 88.14% for zebrafish. The system is focused on short-term, real-time tracking of multiple species and, being a lab setup, has no need for open-set identification. The visual features are auto-learned, and hence transferable to new species given sufficient training. The study included only small animals, however, and the feature extraction network's performance on larger animals is hard to predict.
Shukla et al. [24] present their work on the identification of two types of primates, namely rhesus macaques and chimpanzees. The identification network is DenseNet121 [33], trained with a combination of pairwise KL-divergence loss and cross-entropy loss. The system is tested on the C-Zoo and C-Tai (chimpanzee) datasets [27] and a custom-created macaques dataset, with a combined size of 14845 images and approximately 90 identities each. Top-1 closed-set accuracies of 91.87% and 97.36%, and top-1 open-set accuracies of 66.24% and 88%, are reported over the chimpanzee and macaques datasets respectively. Since the discriminating feature set is auto-learned, extending the network to newer species is possible. It is, however, worth noting that not all animals may be distinguishable by their faces, and the same network would need to be capable of extracting other discriminating features as well.

Schneider et al. [28] compare and contrast the performance of different deep learning networks over multiple species. In particular, five different networks, AlexNet [1], VGG-19 [30], DenseNet201 [33], MobileNetV2 [34] and InceptionV3 [2], are tested for their abilities in one-shot learning under the Siamese architecture. The five species under study are humans, chimpanzees, humpback whales, fruit flies and octopuses. Publicly available datasets are used for each species, with the exception of the octopus, for which a dataset was created using videos from the internet. Binary cross-entropy loss is used as the objective function during training. The publication concludes that, for the species included in the study, DenseNet201 is the optimal network for feature extraction, since it outperformed the other models on four of the five datasets, obtaining 92.2%, 89.7%, 79.3%, 75.5% and 61.4% accuracy on the octopus, human, fruit fly, chimpanzee and humpback whale datasets respectively. The authors also test the ability of the networks under open-set recognition, by using 10% of the individuals as a test set and including none of the images of these individuals during training. This study is an indicator of the suitability of networks trained using the Siamese architecture to the animal re-identification problem.
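
As a minimal sketch of the one-shot, pairwise setup used in [28] (an illustration under assumptions, not the authors' code), two images are embedded by the same shared-weight network and a small head predicts, with a sigmoid and binary cross-entropy, whether they depict the same individual:

import torch
import torch.nn as nn

class SiamesePair(nn.Module):
    def __init__(self, embedder, dim):
        super().__init__()
        self.embedder = embedder            # shared-weight backbone
        self.head = nn.Linear(dim, 1)       # same / different score

    def forward(self, img_a, img_b):
        fa, fb = self.embedder(img_a), self.embedder(img_b)
        return self.head(torch.abs(fa - fb)).squeeze(1)  # one logit per pair

# Training against labels 1 = same individual, 0 = different:
#   loss = nn.BCEWithLogitsLoss()(model(a, b), labels.float())

The absolute-difference fusion of the two embeddings follows the common one-shot Siamese recipe of [4]; other fusions (concatenation, element-wise products) are equally plausible.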
The study by Deb et al. [23] used images of lemurs, golden monkeys and chimpanzees for individual identification. The faces of the primates are used to identify individuals, and are first properly aligned before feature extraction; the images are manually annotated to localize landmarks, namely the eyes and mouth. The neural network introduced here, called PrimNet, is based on the SphereNet CNN, and the AM-Softmax function is used as the loss function to train it. Features extracted from the network are compared using cosine similarity, by performing pairwise comparisons with all known individuals in the gallery. The performance of the network is measured over the C-Zoo and C-Tai datasets [27] for chimpanzees, LemurFace (containing 3000 images of 129 individuals) for lemurs, and a custom-created dataset for golden monkeys containing 1450 images of 49 individuals. The top-1 closed-set accuracies reported are 93.76%, 90.36% and 75.82%, and the top-1 open-set accuracies 81.73%, 66.11% and 37.08%, for lemurs, golden monkeys and chimpanzees respectively. One of the drawbacks for fully automated identification is the use of facial landmarks (eyes and mouth) to align the image; this would be infeasible in real-time applications, and requires an automatic labelling system. It also follows that identification might fail if the face is not visible in the image, such as when the image is captured from the side.
Cheema et al. [15] describe a method that attempts to identify multiple species with distinct patterns on their skin; the study included tigers, zebras and jaguars. A Faster-RCNN [35] network is used to localize animals in the image, and the localized regions are run through a truncated, ImageNet-pretrained AlexNet model [1] to extract distinctive features. An SVM classifier [36] is used on the features for identity association. The dataset consists of 247 images of tigers (44 individuals), 821 images of zebras (83 individuals) and 112 images of jaguars (37 individuals). Top-1 accuracies of 80%, 93% and 78% are reported on tigers, zebras and jaguars respectively. Open-set recognition has not been explored. The use of a fixed layer for feature extraction, with no particular training, might benefit certain species but may not be as effective for others. Instead, the choice of layer could be made species-specific, keeping the rest of the procedure intact, to achieve better performance.
With the exception of [28], these publications use roughly homogeneous species, be it primate sub-species for [23] and [24] or animals with distinguishing coat patterns for [15]. Only [28] reports the performance of a system tested over a truly heterogeneous set of animals, and observes that accuracy drops to as low as 61% for whales. This implies that the same system is unlikely to be effective across multiple, very different species without suitable adaptation.
In summary, the general procedure used by animal re-identification systems is shown in Figure 3, together with a few examples of the possible choices of technique at each stage. Several of these choices are used by the different publications reviewed in this article. The figure underlines the challenges faced in model and procedure selection for animal re-identification systems.
Figure 3: Typical procedure used by animal re-identification systems. A non-exhaustive set
of alternative techniques used at each stage are illustrated.


5. Person Re-Identification

While there have been significant strides in animal re-identification, person re-identification is an older challenge that has been explored far more extensively, and methods for person re-identification have improved vastly with the application of deep learning. It is therefore possible that systems and methods designed for person re-id could prove highly effective for animal re-id as well.
In this section, a few of the state-of-the-art published methods for person re-identification are reviewed in terms of their extensibility to animal re-identification. The papers reviewed here had reported, at the time of access, state-of-the-art performance on two datasets: DukeMTMC [37] and Market-1501 [38]. While the DukeMTMC dataset has been discontinued, the methods that have been applied to these datasets are still relevant. The five papers performing best on both of these datasets, according to their publicly available leader-boards, are presented here.
Ref. (Year) | Market-1501 mAP (%) | Market-1501 Top-1 (%) | DukeMTMC mAP (%) | DukeMTMC Top-1 (%)
[39] (2019) | 95.50 | 98.00 | 92.70 | 94.50
[40] (2019) | 94.24 | 95.43 | 89.10 | 90.20
[41] (2019) | 94.20 | 95.40 | 89.20 | 91.40
[42] (2019) | 92.49 | 95.40 | 88.31 | 90.36
[43] (2018) | 91.70 | 94.70 | 85.90 | 89.00

Table 5: Performance metrics for person re-identification over the Market-1501³ and DukeMTMC⁴ datasets. mAP stands for Mean Average Precision.

A summary of each of the publications reviewed is tabulated in Table 5.
Wang et al. [39] augment visual information with spatio-temporal information streams, and classify images based on a joint metric of the cosine similarity of extracted feature vectors and probabilities calculated using the spatio-temporal stream. This is repeated for all pairs of images and the results are ranked accordingly. A ResNet50 backbone is used for feature extraction, with cross-entropy loss as the objective function. This method achieved top performance on both the Market-1501 [38] and DukeMTMC [37] datasets.
Luo et al. [40] collect several procedures from past research and combine them into a formidable re-identification system. These include data augmentation procedures such as random erasing and horizontal flipping, the use of Triplet Loss, and the use of label smoothing. Additionally, a batch normalization bottleneck is added to separate the embedding spaces of the two loss functions used (center loss and triplet loss). The combination of all of these produces a robust re-id system that nears state-of-the-art performance.

3 https://paperswithcode.com/sota/person-re-identification-on-market-1501
4 https://paperswithcode.com/sota/person-re-identification-on-dukemtmc-reid
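
Two of the procedures mentioned above have direct counterparts in modern frameworks, as the sketch below shows (standard PyTorch/torchvision calls; the hyper-parameter values are illustrative and not those of [40]):

import torch.nn as nn
import torchvision.transforms as T

# Random erasing augmentation: occlude a random rectangle of each training
# image so the network cannot rely on any single body region.
train_transform = T.Compose([
    T.Resize((256, 128)),
    T.RandomHorizontalFlip(),
    T.ToTensor(),
    T.RandomErasing(p=0.5),
])

# Label smoothing: soften the one-hot identity targets to reduce the
# over-confidence of the classification branch (available in PyTorch 1.10+).
id_loss = nn.CrossEntropyLoss(label_smoothing=0.1)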

Quan et al. [41] also employ a variant of the Parts Based Convolutional Baseline (PCB, discussed below) to extract features, and describe automatically generating a network architecture for person re-identification. Softmax classification is used to assign identities to the tracklets. The generated architecture may not extend well to animals, given that it has been built specifically for person re-identification; however, the same methodology could be used to generate a specific architecture for the animal re-identification task.
Zheng et al. [42] augment the training data by using Generative Adversarial Networks (GANs) to create auxiliary images from the dataset while training is in progress (online generation). A modified ResNet50 backbone is used for feature extraction, with KL-divergence loss as the objective function, and a softmax classifier is then used to associate identities. The generation of auxiliary images using GANs is highly innovative; the generated images are combinations of different persons or variations of the same person. While this is acceptable for human images, it may not apply to animals, since changing the appearance of an animal results in a different individual. A different approach is required here.
Wang et al. [43] use Spatial Attention Networks for re-identification, and use a variant of the Parts Based Convolutional Baseline (PCB) [44] during training, where the image of the person is split into multiple horizontal stripes and features are extracted separately from each stripe. ResNet50 is the backbone feature extraction network. A spatial attention layer is added between the global average pooling layer and the fully connected layer to prevent the loss of information caused by average pooling. Identity association is through computation of the Euclidean distance between each pair of images, the probe being assigned the identity of the closest gallery image.
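
As a minimal sketch of the PCB idea (an illustration under assumptions, not the code of [43] or [44]), the backbone's spatial feature map is split into horizontal stripes, each pooled into its own part descriptor:

import torch
import torch.nn.functional as F

def pcb_features(feature_map, num_parts=6):
    # feature_map: (N, C, H, W) activations from the backbone.
    stripes = feature_map.chunk(num_parts, dim=2)   # split along the height
    # Pool each stripe into a (N, C) part descriptor.
    return [F.adaptive_avg_pool2d(s, 1).flatten(1) for s in stripes]

For animals, whose posture is primarily horizontal (as noted below), splitting along the width (dim=3) may be the more natural analogue.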
While each publication has its unique contributions, some common methods can be identified. The PCB appears to be a popular choice for feature extraction, with variants of it used in [39], [41] and [43]. [39], [43], [40] and [42] all use a ResNet50 backbone [3], with suitable modifications, as the feature extraction network, while [41] uses a Neural Architecture Search to find the best model for re-identification.

The PCB may not be suitable for extracting features of animals, owing to the vertical posture of humans versus the primarily horizontal posture of animals. An observation from these publications is the wide use of the ResNet model [3] as the backbone, which encourages its use in animal re-identification tasks as well, as does the use of the Siamese architecture coupled with Triplet Loss as the objective function for training the network. A comprehensive survey on the use of deep learning models for person re-identification was conducted by Wu et al. [45], which additionally evaluates the effectiveness of multiple deep learning models using different loss functions. Wu et al. observe that the ResNet50 model coupled with Triplet Loss is one of the best-performing systems among those considered in their survey, which further encourages the use of this model and loss function combination.

6. Discussion

This paper has presented a study of several animal re-identification methods that use deep learning for feature extraction. A few state-of-the-art object tracking and person re-identification methods have also been investigated, to understand and extend ideas within them to the problem of animal tracking. A few of the challenges in building an animal tracking system are discussed in this section, along with the common approaches used to address them.

6.1. Feature Extraction

Use of the full detection, rather than localized parts of the animal (such as the face or head), is favourable for building a robust feature extractor that can perform well under differing viewpoints. Deep learning networks that learn discriminating features without catering to a particular set of species-specific features are highly desirable. Training networks to discriminate between individuals using examples of positive and negative images is popular; in particular, the use of the Siamese architecture during training, along with Triplet Loss or pairwise KL-divergence loss and their variants, could yield a system that characterizes individuals accurately. Such methods, which are known to give good results in the task of human re-identification, could perform well for animal identification as well.

6.2. Identity Association

One of the challenges in tracking animals is the ability to associate identities across cameras. Although re-identification is primarily a human-in-the-loop based task, improvements in its accuracy imply better feature extraction, and thus help cross-camera animal tracking. Current state-of-the-art object tracking systems extract visual features of objects and associate identities on the basis of the similarity of these feature vectors, but still suffer fairly large numbers of lost or fragmented tracks and id switches. Extending such systems to use re-identification networks, which have good feature extraction performance, will greatly improve cross-camera tracking.
Only a few of the publications reviewed in this paper formulate the problem as one of open-set recognition. Softmax classification or closed-set SVMs [36] may not be optimal in the re-identification setting. It is highly likely that individuals encountered in training will not be seen again once the systems are deployed in the field, unless they are built for a controlled ecosystem such as a zoo (and even then, site-specific model training would become necessary). Even excellent results over the validation and test sets may prove unrepresentative of the system's performance in wild environments. It is therefore necessary to design the networks to be able to infer that the individual under test is unknown or previously unseen; OpenMax classification [46] and open-set SVMs [47] are better alternatives. Clustering feature vectors using appropriate distance metrics, such as cosine similarity or Euclidean distance, also allows a centralized server to assign identities from these feature vectors, instead of the association being on-device or on-site only. This increases the scalability of the system.
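
As a minimal sketch of such a centralized association scheme (an illustration under assumptions, not a method from any surveyed paper), the server can keep one running-mean prototype per identity and match incoming feature vectors against them, minting a new identity when nothing is close enough:

import torch
import torch.nn.functional as F

class IdentityServer:
    def __init__(self, threshold=0.7):
        self.protos = []                    # one mean feature per identity
        self.counts = []
        self.threshold = threshold

    def associate(self, feat):
        """feat: L2-normalised feature vector reported by any camera."""
        if self.protos:
            sims = torch.stack(self.protos) @ feat
            best = int(sims.argmax())
            if float(sims[best]) >= self.threshold:
                # Fold the new observation into the (renormalised) running mean.
                self.counts[best] += 1
                self.protos[best] = F.normalize(
                    self.protos[best]
                    + (feat - self.protos[best]) / self.counts[best], dim=0)
                return best                 # existing identity
        self.protos.append(feat)            # previously unseen individual
        self.counts.append(1)
        return len(self.protos) - 1

Because the prototypes live on the server, any camera that can compute the feature vector can participate, which is what makes the scheme scale across sites; expiring stale prototypes would implement the short-term identity retention described in Section 1.1.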

6.3. Multi-Species Identification

There are several challenges in identifying multiple animal species with a single system. The discriminating features of individuals vary between species, and although some families of species may share similar feature characteristics, when the same system needs to identify heterogeneous species there can be a drop in performance. For example, facial recognition can be used to identify multiple primate sub-species, and flank patterns can be used to identify a few varieties of large cats; however, when one of these is combined with a heterogeneous species, such as an elephant or a lion, the extracted feature characteristics may no longer be reliable. It is therefore easier to have multiple individual models, each trained for a particular species, and to use the model corresponding to the species. The drawback of this approach is that the system size grows with the number of species; deep learning models require access to large primary memory, and having several models loaded simultaneously will cause a huge bloat in the system configuration, thereby affecting cost effectiveness. It might be prudent to deploy multiple models, each capable of identifying a group of species, particularly homogeneous species such as primates or wild cats. This would reduce the requirement of having an overly large system.

6.4. Speed of Inference

One of the key aspects of any good tracking system is operational speed: it is necessary for the system to function in real time (or at least near-real time). However, a majority of the publications surveyed here do not report the time consumed. This is understandable, since most are built for offline use. In order to adopt any of these systems for monitoring applications, however, the entire identity association needs to be completed in sub-second intervals, ideally a few hundred milliseconds at most; feature extraction time would otherwise become a bottleneck that could reduce the overall effectiveness of the system. This can be avoided with the use of optimized feature extraction networks, and reasonably fast GPUs or emerging processors such as Tensor Processing Units (TPUs).
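
When evaluating candidate extractors for such a deployment, a simple latency measurement of the kind sketched below (a generic PyTorch timing loop with an illustrative input size, not drawn from any surveyed paper) is often enough to reveal whether a network fits the sub-second budget:

import time
import torch

@torch.no_grad()
def mean_latency_ms(model, input_size=(1, 3, 224, 224), runs=50):
    """Average per-image inference time of `model` in milliseconds."""
    x = torch.randn(*input_size)
    model.eval()
    for _ in range(5):                  # warm-up iterations
        model(x)
    start = time.perf_counter()
    for _ in range(runs):
        model(x)
    return (time.perf_counter() - start) / runs * 1000.0

On a GPU, torch.cuda.synchronize() should additionally be called before reading the clock, since CUDA kernels execute asynchronously.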

7. Conclusion

This paper looks at the problem of re-identification in a new light, namely as a potential tool for improving the tracking of animals. Several recent object tracking methods are reviewed, and the importance of feature extraction in such systems is highlighted. Further, several animal re-identification networks that could potentially be used for feature extraction during animal tracking are investigated; these publications are examined extensively in view of their effectiveness, specifically for cross-camera, multi-species and open-set recognition. In addition, some of the best-performing person re-identification systems are studied to understand how feature extraction can be incorporated effectively into animal tracking systems. The challenges in multiple areas of the tracking pipeline are listed, along with recommendations on how they could be addressed. This complete end-to-end survey throws light on the current state of affairs in animal tracking, and hopes to provide a baseline for future research in the area.

References

[1] A. Krizhevsky, I. Sutskever, G. E. Hinton, ImageNet classification with deep convolutional neural networks, Commun. ACM 60 (6) (2017) 84–90. doi:10.1145/3065386.

[2] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, A. Rabinovich, Going deeper with convolutions, in: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015, pp. 1–9. doi:10.1109/CVPR.2015.7298594.

[3] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 770–778. doi:10.1109/CVPR.2016.90.

[4] G. Koch, Siamese neural networks for one-shot image recognition, in: ICML Deep Learning Workshop, Vol. 2, 2015.

[5] G. C. Hays, H. Bailey, S. J. Bograd, W. D. Bowen, C. Campagna, R. H. Carmichael, P. Casale, A. Chiaradia, D. P. Costa, E. Cuevas, P. Nico de Bruyn, M. P. Dias, C. M. Duarte, D. C. Dunn, P. H. Dutton, N. Esteban, A. Friedlaender, K. T. Goetz, B. J. Godley, P. N. Halpin, M. Hamann, N. Hammerschlag, R. Harcourt, A.-L. Harrison, E. L. Hazen, M. R. Heupel, E. Hoyt, N. E. Humphries, C. Y. Kot, J. S. Lea, H. Marsh, S. M. Maxwell, C. R. McMahon, G. Notarbartolo di Sciara, D. M. Palacios, R. A. Phillips, D. Righton, G. Schofield, J. A. Seminoff, C. A. Simpfendorfer, D. W. Sims, A. Takahashi, M. J. Tetley, M. Thums, P. N. Trathan, S. Villegas-Amtmann, R. S. Wells, S. D. Whiting, N. E. Wildermann, A. M. Sequeira, Translating marine animal tracking data into conservation policy and management, Trends in Ecology & Evolution 34 (5) (2019) 459–473. doi:10.1016/j.tree.2019.01.009.

[6] S. Schneider, G. W. Taylor, S. Linquist, S. C. Kremer, Past, present and future approaches using computer vision for animal re-identification from camera trap data, Methods in Ecology and Evolution 10 (4) (2019) 461–470. doi:10.1111/2041-210x.13133.

[7] H. Whitehead, Computer assisted individual identification of sperm whale flukes, Reports of the International Whaling Commission 12 (1990) 71–77.

[8] A. Milan, L. Leal-Taixé, I. Reid, S. Roth, K. Schindler, MOT16: A benchmark for multi-object tracking, arXiv preprint arXiv:1603.00831 (2016).

[9] P. Bergmann, T. Meinhardt, L. Leal-Taixé, Tracking without bells and whistles, in: 2019 IEEE/CVF International Conference on Computer Vision (ICCV), 2019, pp. 941–951. doi:10.1109/ICCV.2019.00103.

[10] Y.-C. Yoon, D. Y. Kim, K. Yoon, Y.-M. Song, M. Jeon, Online multiple pedestrian tracking using deep temporal appearance matching association, arXiv preprint arXiv:1907.00831 (2019).

[11] E. Bochinski, T. Senst, T. Sikora, Extending IOU based multi-object tracking by visual information, in: 2018 15th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), 2018, pp. 1–6. doi:10.1109/AVSS.2018.8639144.

[12] A. Hermans, L. Beyer, B. Leibe, In defense of the triplet loss for person re-identification, arXiv preprint arXiv:1703.07737 (2017).

[13] C. N. Phyo, T. T. Zin, H. Hama, I. Kobayashi, A hybrid rolling skew histogram-neural network approach to dairy cow identification system, in: 2018 International Conference on Image and Vision Computing New Zealand (IVCNZ), 2018, pp. 1–5. doi:10.1109/IVCNZ.2018.8634739.

[14] L. Bergamini, A. Porrello, A. C. Dondona, E. Del Negro, M. Mattioli, N. D'Alterio, S. Calderara, Multi-views embedding for cattle re-identification, in: 2018 14th International Conference on Signal-Image Technology and Internet-Based Systems (SITIS), 2018, pp. 184–191. doi:10.1109/sitis.2018.00036.

[15] G. S. Cheema, S. Anand, Automatic detection and recognition of individuals in patterned species, in: Machine Learning and Knowledge Discovery in Databases, Springer International Publishing, 2017, pp. 27–38. doi:10.1007/978-3-319-71273-4_3.

[16] D. A. Konovalov, S. Hillcoat, G. Williams, R. A. Birtles, N. Gardiner, M. I. Curnock, Individual minke whale recognition using deep learning convolutional neural networks, Journal of Geoscience and Environment Protection 6 (2018) 25–36. doi:10.4236/gep.2018.65003.

[17] M. Körschens, B. Barz, J. Denzler, Towards automatic identification of
elephants in the wild, arXiv preprint arXiv:1812.04418 0 (2018) 0–0.
arXiv:1812.04418.

[18] Q. He, Q. Zhao, N. Liu, P. Chen, Z. Zhang, R. Hou, Distinguishing indi-


vidual red pandas from their faces, in: Z. Lin, L. Wang, J. Yang, G. Shi,
T. Tan, N. Zheng, X. Chen, Y. Zhang (Eds.), Pattern Recognition and
Computer Vision, Springer International Publishing, Cham, 2019, pp. 714–
724. doi:https://doi.org/10.1007/978-3-030-31723-2_61.

[19] A. Shukla, C. Anderson, G. S. Cheema, P. Gao, S. Onda, D. Anshumaan,
S. Anand, R. Farrell, A hybrid approach to tiger re-identification, in: 2019
IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), 2019,
pp. 294–301. doi:10.1109/ICCVW.2019.00039.

[20] C. Liu, R. Zhang, L. Guo, Part-pose guided Amur tiger re-identification,
in: 2019 IEEE/CVF International Conference on Computer Vision Workshop
(ICCVW), 2019, pp. 315–322. doi:10.1109/ICCVW.2019.00042.

[21] C. Brust, T. Burghardt, M. Groenenberg, C. Käding, H. S. Kühl,
M. L. Manguette, J. Denzler, Towards automated visual monitoring of
individual gorillas in the wild, in: 2017 IEEE International Conference on
Computer Vision Workshops (ICCVW), 2017, pp. 2820–2830.
doi:10.1109/ICCVW.2017.333.

[22] S. Li, J. Li, W. Lin, H. Tang, Amur tiger re-identification in the wild,
arXiv preprint arXiv:1906.05586 (2019).

[23] D. Deb, S. Wiper, S. Gong, Y. Shi, C. Tymoszek, A. Fletcher, A. K. Jain,
Face recognition: Primates in the wild, in: 2018 IEEE 9th International
Conference on Biometrics Theory, Applications and Systems (BTAS), 2018,
pp. 1–10. doi:10.1109/BTAS.2018.8698538.

[24] A. Shukla, G. S. Cheema, S. Anand, Q. Qureshi, Y. Jhala, Primate face
identification in the wild, Lecture Notes in Computer Science (2019)
387–401. doi:10.1007/978-3-030-29894-4_32.

[25] Z. Li, C. Ge, S. Shen, X. Li, Cow individual identification based on
convolutional neural network, in: Proceedings of the 2018 International
Conference on Algorithms, Computing and Artificial Intelligence, ACAI 2018,
Association for Computing Machinery, New York, NY, USA, 2018, pp. 1–5.
doi:10.1145/3302425.3302460.

[26] S. Bouma, M. D. M. Pawley, K. Hupman, A. Gilman, Individual common
dolphin identification via metric embedding learning, in: 2018 International
Conference on Image and Vision Computing New Zealand (IVCNZ), 2018, pp. 1–6.
doi:10.1109/ivcnz.2018.8634778.

[27] A. Freytag, E. Rodner, M. Simon, A. Loos, H. S. Kühl, J. Denzler,
Chimpanzee faces in the wild: Log-Euclidean CNNs for predicting identities
and attributes of primates, in: B. Rosenhahn, B. Andres (Eds.), Pattern
Recognition, Springer International Publishing, Cham, 2016, pp. 51–63.
doi:10.1007/978-3-319-45886-1_5.

[28] S. Schneider, G. W. Taylor, S. C. Kremer, Similarity learning networks
for animal individual re-identification – beyond the capabilities of a human
observer, in: Proceedings of the IEEE Winter Conference on Applications of
Computer Vision Workshops, 2020, pp. 44–52.

[29] F. Naiser, M. Šmíd, J. Matas, Tracking and re-identification system for
multiple laboratory animals, in: Visual Observation and Analysis of
Vertebrate and Insect Behavior Workshop at the International Conference on
Pattern Recognition, 2018.

[30] K. Simonyan, A. Zisserman, Very deep convolutional networks for
large-scale image recognition, arXiv preprint arXiv:1409.1556 (2014).

[31] M. Körschens, J. Denzler, ELPephants: A fine-grained dataset for
elephant re-identification, in: 2019 IEEE/CVF International Conference on
Computer Vision Workshop (ICCVW), 2019, pp. 263–270.
doi:10.1109/ICCVW.2019.00035.

[32] D. G. Lowe, Distinctive image features from scale-invariant keypoints,
International Journal of Computer Vision 60 (2) (2004) 91–110.
doi:10.1023/B:VISI.0000029664.99615.94.

[33] G. Huang, Z. Liu, L. Van Der Maaten, K. Q. Weinberger, Densely connected
convolutional networks, in: 2017 IEEE Conference on Computer Vision and
Pattern Recognition (CVPR), 2017, pp. 2261–2269. doi:10.1109/cvpr.2017.243.

[34] M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, L. Chen, MobileNetV2:
Inverted residuals and linear bottlenecks, in: 2018 IEEE/CVF Conference on
Computer Vision and Pattern Recognition, 2018, pp. 4510–4520.
doi:10.1109/CVPR.2018.00474.

[35] S. Ren, K. He, R. Girshick, J. Sun, Faster R-CNN: Towards real-time
object detection with region proposal networks, IEEE Transactions on
Pattern Analysis and Machine Intelligence 39 (6) (2017) 1137–1149.
doi:10.1109/tpami.2016.2577031.

[36] C. Cortes, V. Vapnik, Support-vector networks, Machine Learning 20 (3)
(1995) 273–297. doi:10.1007/BF00994018.

[37] M. Gou, S. Karanam, W. Liu, O. Camps, R. J. Radke, DukeMTMC4ReID: A
large-scale multi-camera person re-identification dataset, in: 2017 IEEE
Conference on Computer Vision and Pattern Recognition Workshops (CVPRW),
2017, pp. 1425–1434. doi:10.1109/CVPRW.2017.185.

[38] L. Zheng, L. Shen, L. Tian, S. Wang, J. Wang, Q. Tian, Scalable person
re-identification: A benchmark, in: 2015 IEEE International Conference on
Computer Vision (ICCV), 2015, pp. 1116–1124. doi:10.1109/ICCV.2015.133.

[39] G. Wang, J. Lai, P. Huang, X. Xie, Spatial-temporal person
re-identification, Proceedings of the AAAI Conference on Artificial
Intelligence 33 (2019) 8933–8940. doi:10.1609/aaai.v33i01.33018933.

[40] H. Luo, Y. Gu, X. Liao, S. Lai, W. Jiang, Bag of tricks and a strong
baseline for deep person re-identification, in: The IEEE Conference on
Computer Vision and Pattern Recognition (CVPR) Workshops, 2019.

[41] R. Quan, X. Dong, Y. Wu, L. Zhu, Y. Yang, Auto-ReID: Searching for a
part-aware ConvNet for person re-identification, in: Proceedings of the
IEEE International Conference on Computer Vision, 2019, pp. 3750–3759.
doi:10.1109/ICCV.2019.00385.

[42] Z. Zheng, X. Yang, Z. Yu, L. Zheng, Y. Yang, J. Kautz, Joint
discriminative and generative learning for person re-identification, in:
2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR),
2019. doi:10.1109/CVPR.2019.00224.

[43] H. Wang, Y. Fan, Z. Wang, L. Jiao, B. Schiele, Parameter-free spatial
attention network for person re-identification, arXiv preprint
arXiv:1811.12150 (2018).

[44] Y. Sun, L. Zheng, Y. Yang, Q. Tian, S. Wang, Beyond part models: Person
retrieval with refined part pooling (and a strong convolutional baseline),
in: V. Ferrari, M. Hebert, C. Sminchisescu, Y. Weiss (Eds.), Computer Vision
– ECCV 2018, Springer International Publishing, Cham, 2018, pp. 501–518.
doi:10.1007/978-3-030-01225-0_30.

[45] D. Wu, S.-J. Zheng, X.-P. Zhang, C.-A. Yuan, F. Cheng, Y. Zhao,
Y.-J. Lin, Z.-Q. Zhao, Y.-L. Jiang, D.-S. Huang, Deep learning-based methods
for person re-identification: A comprehensive review, Neurocomputing 337
(2019) 354–371. doi:10.1016/j.neucom.2019.01.079.

[46] A. Bendale, T. E. Boult, Towards open set deep networks, in: 2016 IEEE
Conference on Computer Vision and Pattern Recognition (CVPR), 2016,
pp. 1563–1572. doi:10.1109/cvpr.2016.173.

[47] W. J. Scheirer, A. de Rezende Rocha, A. Sapkota, T. E. Boult, Toward
open set recognition, IEEE Transactions on Pattern Analysis and Machine
Intelligence 35 (7) (2013) 1757–1772. doi:10.1109/TPAMI.2012.256.
