Deep Learning Methods for Multi-Species Animal Re-Identification and Tracking: A Survey
Abstract
© 2020. This manuscript version is made available under the CC-BY-NC-ND 4.0 license
http://creativecommons.org/licenses/by-nc-nd/4.0/
Published article: https://doi.org/10.1016/j.cosrev.2020.100289
1. Introduction
1.1. Motivation
Animals straying into human settlements, primarily in search of food, cause
conflicts that result in injury to humans, animals, or both. A fully automated
monitoring system that detects animal transgressions and alerts the concerned
authorities can help reduce casualties. Computer vision is one technology that
can potentially solve most of the associated problems. The system referred to
henceforth is a network of cameras running image processing software.
It is not sufficient to simply detect that a wild animal is close by; it would
also be highly beneficial to provide the closest possible estimate of the animal's
current location. The problem faced here is thus that of tracking an individual
animal (or a group of individuals) in a cross-camera setup in real or near-real
time. The identity of the individual(s) need not be remembered for an extended
duration; it only needs to be retained until the animal has moved completely
out of range of the system (which could be on the order of minutes or hours).
1.2. Terminology
probe needs to generate a single result: either the identity of the probe individual
or a new identity (in the case of open-set recognition). In re-identification terms,
tracking systems always use only the top-1 match, rather than the top-k matches.
Re-identification naturally applies only to individuals of the same species;
however, general animal tracking applications would involve multiple kinds of
animals, and the system must be capable of identifying individuals of each
species separately.
In summary, this study focuses on the top-1, open-set re-identification of
multiple species of animals in a cross-camera setup with identities assigned on
a non-permanent basis (short-term tracking).
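To make the top-1 open-set decision concrete, the sketch below shows one common formulation (the function name, distance metric and threshold value here are illustrative assumptions, not taken from any surveyed system): a probe embedding is matched against a gallery of known identities, and a new identity is assigned when even the best match is too distant.

```python
import numpy as np

def top1_open_set_match(probe, gallery, labels, threshold=0.7):
    """Return the top-1 identity for a probe embedding, or None for 'unseen'.

    probe:     (d,) feature vector of the query animal
    gallery:   (n, d) feature vectors of known individuals
    labels:    (n,) identity label for each gallery vector
    threshold: maximum accepted distance (assumed value, tuned per system)
    """
    dists = np.linalg.norm(gallery - probe, axis=1)  # Euclidean distances
    best = np.argmin(dists)
    if dists[best] > threshold:
        return None          # open-set case: a previously unseen individual
    return labels[best]      # closed-set case: the top-1 match
```

A tracking system built on this decision would either attach the probe to the matched individual's track, or start a new track for the unseen identity.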
3. Provide an overview of such methods in the context of open-set recognition
and assess their applicability to it
4. Explore application of ideas used in state-of-the-art object tracking and
human re-identification methods to animal re-identification
5. List common trends in animal re-identification, the challenges faced and
possible solutions to such problems
2. Related Work
that the use of deep learning methods for re-identification results in increased
accuracy and eases the development of completely autonomous re-identification
systems. While the work of Schneider et al. is fairly comprehensive, it does
not comment on re-identification in terms of a) open-set recognition, b) multi-
species re-identification, or c) applicability to end-to-end automation for tasks
such as tracking. The work done in this paper extends that survey, and contains
a study of recent research in the area. Although there is some overlap between
the papers presented here and in [6], the contexts of the presentations differ.
This paper also looks at the problem of re-identification in a new light, namely
as a tool for tracking individuals in a cross-camera setup.
3. Object Tracking
Since the problem described in this paper directly relates to tracking, some
state-of-the-art object tracking methods are explored in this section, along with
their advantages and shortcomings. The papers selected for review are some of
the top performers from the Multi Object Tracking (MOT) challenge [8], which
is a widely used benchmark. Apart from the overall score (MOTA), two other
key metrics have been taken into consideration when selecting the papers for
review – the number of id switches (where the identity assigned to one of the
tracklets changes erroneously) and fragmented tracks (discontinuation in the
tracking sequence). Three publications are presented below, which are among
the best performers in the MOT 2019 challenge, and which had, at the time
of writing, published articles describing the algorithms used and methodology
employed. A summary of the performance of all three papers is illustrated in
Table 1.
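For reference, MOTA is defined in the CLEAR MOT metrics by aggregating false negatives (FN), false positives (FP) and identity switches (IDSW) over all frames t, relative to the total number of ground-truth objects (GT):

\[
\mathrm{MOTA} = 1 - \frac{\sum_t \left(\mathrm{FN}_t + \mathrm{FP}_t + \mathrm{IDSW}_t\right)}{\sum_t \mathrm{GT}_t}
\]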
Bergmann et al. [9] present a novel way of using the task of object detection
itself to additionally track objects, by suitably modifying the detection network.
Outputs are added to allow the system to decide, for each detection, whether
1 https://motchallenge.net/results/CVPR_2019_Tracking_Challenge/
Ref. (Year) | Feature Extraction | MOTA | Id Switches Ratio | Fragmented Tracks Ratio | Speed (FPS)
[9] (2019) | Siamese Network Architecture | 51.3 ± 18.7 | 47.2 | 88.2 | 2.7
[10] (2019) | Joint Inference Network | 47.6 ± 20.3 | 44.4 | 70.9 | 0.2
[11] (2018) | – | 46.7 ± 19.6 | 48.6 | 81.8 | 18.2

Table 1: Performance metrics of the surveyed object tracking algorithms. MOTA stands for
Multi-Object Tracking Accuracy. The details of each metric can be found on the MOT 2019
tracking results page1
to continue an existing track or start a new one. Stale tracks are removed
periodically. Camera motion compensation and a constant velocity assumption
are used to smooth tracks in the case of occlusions or frames with failed
detections. To preserve the identities of tracked objects, feature vectors are
generated from a network trained using a Siamese architecture [4], and used to
determine whether the identities of objects in two discontinuous tracks are the same.
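A minimal sketch of the constant velocity assumption mentioned above (the function and variable names are illustrative, and uniform frame spacing is assumed): the displacement between the last two observed bounding boxes is reused as the per-frame velocity, which lets the tracker bridge frames with failed detections.

```python
import numpy as np

def predict_next_box(prev_box, curr_box):
    """Constant-velocity extrapolation of a bounding box (x1, y1, x2, y2).

    Assumes uniform frame spacing: the displacement between the last two
    observations is taken as the estimated per-frame velocity.
    """
    prev_box = np.asarray(prev_box, dtype=float)
    curr_box = np.asarray(curr_box, dtype=float)
    velocity = curr_box - prev_box   # per-frame displacement
    return curr_box + velocity       # predicted box in the next frame
```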
Yoon et al. [10] describe the use of a combination of locality-based association
using the Hungarian algorithm and appearance-based association using visual
features to maintain track continuity. A Joint Inference Network (JI-Net) is used
to extract feature vectors and compare the similarity of two images. The JI-Net
is trained using a Siamese architecture [4] with Triplet Loss [12] as the objective
function. The output of the JI-Net is fed to a Long Short-Term Memory (LSTM)
network, which is updated as the appearance of the object changes over time.
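The Hungarian association step can be sketched as follows (a minimal illustration, not the authors' implementation; the cost weighting and function names are assumptions): pairwise costs between existing tracks and new detections combine appearance distance and spatial distance, and `scipy.optimize.linear_sum_assignment` finds the minimum-cost one-to-one matching.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(track_feats, det_feats, track_boxes, det_boxes, w=0.5):
    """Match tracks to detections by a combined appearance + location cost.

    track_feats: (t, d) features; det_feats: (m, d) features.
    track_boxes, det_boxes: (t, 4) and (m, 4) boxes as (x1, y1, x2, y2).
    """
    # Appearance cost: Euclidean distance between feature vectors.
    app = np.linalg.norm(track_feats[:, None, :] - det_feats[None, :, :], axis=2)
    # Location cost: distance between bounding-box centres.
    t_ctr = (track_boxes[:, :2] + track_boxes[:, 2:]) / 2.0
    d_ctr = (det_boxes[:, :2] + det_boxes[:, 2:]) / 2.0
    loc = np.linalg.norm(t_ctr[:, None, :] - d_ctr[None, :, :], axis=2)
    cost = w * app + (1 - w) * loc             # weighted combination (assumed)
    rows, cols = linear_sum_assignment(cost)   # Hungarian algorithm
    return list(zip(rows, cols))               # (track index, detection index)
```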
Bochinski et al. [11] describe an Intersection-over-Union (IOU) based tracker,
and show that augmenting it with visual information greatly reduces the number
of id switches and fragmented tracks. In every frame, the detected object
with the highest overlap (IOU) is assigned to the same tracklet. Any interme-
diate frames with a failed detection would break tracks; to avoid this, a visual
tracker is initialized over the last known frame, and if the object is detected
again in subsequent frames, the tracklets are merged. This eliminates the need
for predictive tracking, where the last known position and direction are used to
predict movement. The visual cues are used only for image localization, not for
feature extraction. The authors of [11] report a good increase in overall tracking
accuracy and a reduction in the number of id switches.
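The core of an IOU tracker can be sketched in a few lines (a greedy simplification of [11], with an assumed overlap threshold): each tracklet is extended by the current-frame detection that best overlaps its last box, and leftover detections start new tracklets.

```python
def iou(a, b):
    """Intersection-over-union of two boxes in (x1, y1, x2, y2) format."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def step(tracklets, detections, sigma_iou=0.5):
    """Extend tracklets with their best-overlapping detections (one frame)."""
    unmatched = list(detections)
    for track in tracklets:
        if not unmatched:
            break
        best = max(unmatched, key=lambda d: iou(track[-1], d))
        if iou(track[-1], best) >= sigma_iou:   # assumed threshold
            track.append(best)
            unmatched.remove(best)
    tracklets.extend([[d] for d in unmatched])  # new tracklets for leftovers
```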
One of the challenges observed with tracking methods is cross-camera track-
ing; specifically, how visual features are used to track identities across different
cameras in real time. Subtle changes in lighting and colouration can lead to
non-matching of extracted feature vectors; the use of deep learning for feature
extraction goes a long way towards addressing this problem. Current methods
tend to use only spatial factors (the location of objects within the image) to
determine identities within images from a single camera. Clearly, when the
object moves between two cameras, its location with respect to the frame can
vary significantly depending on the angle of capture and direction of movement,
giving rise to the possibility of an id switch. It follows that, apart from a good
feature extraction method, temporal factors can also be introduced to build a
better strategy for identity association and make the tracker more robust in a
cross-camera setting.
Deep learning networks are commonly used for feature extraction in order to
reduce id switches and fragmented tracks. However, there is no specific test case
for tracking animals in the datasets; a majority of the test cases use people as
tracking subjects. As noted in section 1.4, there are some fundamental differ-
ences between identifying humans and identifying animals. Assuming that the
rest of the tracking system remains unchanged, just the feature extraction module
can potentially be replaced with one better suited to animals. The new module
could use the same network architecture used to identify humans (with appro-
priate training) or a completely different architecture. In view of the latter, a
study of existing feature extraction methods for animals is presented in the
next section.
4. Deep Learning Methods for Animal Re-Identification
This section covers the studies that have been conducted on identification
of a single species. A brief summary of all the publications is tabulated in
Table 3. The publications have been sub-categorized by the mechanism used to
identify each species, namely those that use a) localized body parts (such as
the trunk or flank), and b) the face, head or full body of the animal. An
illustration of these mechanisms is found in Figure 2, which
Figure 1: Motivations for animal re-identification studies surveyed in this paper. The y-axis
indicates the number of papers.
Table 2: The primary motivations for the surveyed animal re-identification publications
indicates various parts of the animal that can be used for feature extraction and
subsequent identification. The Amur Tiger Re-Identification challenge has been
addressed separately, since these solutions are specific to a particular problem.
Ref. (Year) | # labels / # images | Id Mechanism | Feature Extractor | Identity Association | Closed Set Acc. % | Open Set Acc. %

Table 3: Brief summary of all single-species animal re-identification papers surveyed. Abbre-
viations: LBP – Localized Body Parts, H – Head / Face / Full Detection.
the detector can segment distinguishing parts of the image for different species,
feature extraction becomes simpler. In the case of [26], it also follows that, since
feature vectors are extracted and clustered to assign identities, determining
whether an individual has not been seen before is also possible.
Freytag et al. [27] work with chimpanzees, and describe a system to identify
characteristics of these primates, including identity, gender, age, etc. AlexNet [1]
is used for feature extraction. The publication argues for the importance of
converting extracted feature vectors into a Euclidean space before they can be
compared, and introduces a Log-Euclidean framework that applies matrix
logarithms to the feature vectors before comparing them in Euclidean space.
Accuracies of nearly 92% on the C-Zoo and 75.66% on the C-Tai datasets are
reported.
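The Log-Euclidean idea can be illustrated as follows (a minimal sketch of the matrix-logarithm step under assumed shapes, not Freytag et al.'s exact pipeline): a bilinear-pooled feature matrix is mapped through the matrix logarithm, after which ordinary Euclidean comparison becomes meaningful.

```python
import numpy as np
from scipy.linalg import logm

def log_euclidean_feature(feat_map):
    """Bilinear-pool a CNN feature map and map it into log-Euclidean space.

    feat_map: (c, n) matrix of c-dimensional descriptors at n spatial positions.
    """
    m = feat_map @ feat_map.T / feat_map.shape[1]  # bilinear (second-order) pooling
    m += 1e-6 * np.eye(m.shape[0])                 # keep the matrix positive definite
    return logm(m).real.ravel()                    # matrix logarithm, then flatten

# Two images can now be compared with a plain Euclidean distance:
# d = np.linalg.norm(log_euclidean_feature(f1) - log_euclidean_feature(f2))
```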
Brust et al. [21] outline a system to identify gorillas, over a dataset of 2000
images of 147 different individuals. The identification pipeline used is similar to
that used by Freytag et al. in [27], but omits the matrix logarithm and bi-linear
pooling. The publication reports an accuracy of 62.4%.
Körschens et al. [17] introduce a dataset for elephants, consisting of 2076
images of 276 individuals [31]. The authors describe the use of a truncated
ResNet50 network [3] for feature extraction and report a top-1 accuracy of 56%
when using a single image as input.
Bergamini et al. [14] create a system to identify cows over a dataset of
17802 images of 439 individuals. Two CNNs with a custom architecture are
trained on multiple views of the face of the same cow for feature extraction,
and the outputs of the two networks are combined to form a single feature
vector. Histogram loss is used as the loss function. The system obtains 81.7%
top-1 closed-set accuracy and 55.8% top-1 open-set accuracy.
Li et al. [25] build a system for identification of individual cows, over 2160
images of 30 individuals. The full detection of the cow is used for feature
extraction. The authors build and train a de-noising network to pre-process
images, and the de-noised image is then fed to a modified InceptionV3 [2]
network for individual identification. A top-1 accuracy of 92% is reported.
All the publications here follow the same pipeline – object localization, then
feature extraction followed by classification. Although the classification method
used may not be applicable to open-set recognition (use of closed-set SVM clas-
sifiers for instance), it is possible to extend the concept to unseen individuals
assuming extracted features would be different for the new image.

Figure 2: An example of the possible parts of an image that can be used for feature extraction.
Base image source: one of the images published in the Amur Tiger dataset [22]

The system could also be extended to other species which have distinguishing facial charac-
teristics. However, facial characteristics by themselves may not be sufficient to
discriminate between individuals of certain species. Another drawback is that
the face may be hidden, such as when the image is captured from the side rather
than the front. On the other hand, no particular features are used for identifica-
tion in [25]; the distinguishing features are auto-learned by the CNN using the
entire image of the cow, which is desirable for generalizing to different species.
Ref. (Year) | # Species | # labels / # images | Feature Extractor | Identity Association | Closed Set Acc. % | Open Set Acc. %

Table 4: Brief summary of all multi-species animal re-identification papers surveyed. The
dataset size and the accuracy reported are averaged across all species.
accurate population counts of the endangered Amur Tiger. The dataset contains
photos of tigers in various poses, captured at multiple angles and in differing
light conditions, in both single-camera and multi-camera setups. The dataset
labels each individual tiger and additionally annotates pose-related key-points.
It contains 3649 images of tigers spread over 107 different individuals. Several
publications on Amur Tiger re-identification have emerged, and a few of the
more interesting results are described in this section.
Both [22] and [20] describe the use of both global and local image features for
re-identification. Features are extracted in multiple streams (from the full image
of the tiger as well as from multiple key points such as the limbs and hip). Both
use variants of ResNet [3] as the backbone network for feature extraction, and
train each network with Triplet Loss [12]. In the case of [20], three different
streams are used at training time, and only the global feature stream is used
during inference; their architecture is named Part-Pose Guided Network (PPGNet).
[22] reports a top-1 accuracy of 89.4% in the single-camera case and 77.1% in the
cross-camera case over the test set. [20] reports a remarkable 97.7% single-camera
top-1 accuracy and 93.6% cross-camera top-1 accuracy, and is the top
submission for the ATRW challenge in the plain re-id track2.
2 https://cvwc2019.github.io/leaderboard.html
In [19], Shukla et al. use deep learning networks for re-identification, and
use SIFT descriptors [32] extracted from the image to re-rank the results ob-
tained from the deep learning network. Their identification network is a
DenseNet121 model [33], trained with a combination of cross-entropy loss and
pair-wise KL-divergence loss. Over the test set, the system achieves 92.7%
accuracy in the single-camera case and 84.5% in the cross-camera case.
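A hedged sketch of such SIFT-based re-ranking follows (the function names, shortlist size and ratio-test threshold are illustrative assumptions; [19]'s exact scoring may differ): deep features shortlist the top-k gallery candidates, which are then re-ordered by the number of good SIFT matches.

```python
import cv2
import numpy as np

sift = cv2.SIFT_create()
matcher = cv2.BFMatcher()

def sift_score(img_a, img_b):
    """Count ratio-test SIFT matches between two grayscale uint8 images."""
    _, des_a = sift.detectAndCompute(img_a, None)
    _, des_b = sift.detectAndCompute(img_b, None)
    if des_a is None or des_b is None:
        return 0
    matches = matcher.knnMatch(des_a, des_b, k=2)
    return sum(1 for pair in matches
               if len(pair) == 2 and pair[0].distance < 0.75 * pair[1].distance)

def rerank(probe_img, probe_feat, gallery, k=10):
    """gallery: list of (image, deep feature, identity) tuples."""
    by_deep = sorted(gallery, key=lambda g: np.linalg.norm(g[1] - probe_feat))
    top_k = by_deep[:k]                          # shortlist by CNN features
    return sorted(top_k, key=lambda g: -sift_score(probe_img, g[0]))
```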
While the authors of [22] have spent considerable effort to label the dataset
with pose and part based information, such metadata may not in general be
available in datasets of other species. The use of pose based information for
re-identification may therefore actually hinder long-term research until more
datasets containing similar metadata emerge. Another aspect that could
improve the dataset is the addition of test cases for open-set recognition, which
do not currently exist.
While the system in [20] performs admirably well for tiger re-identification,
the network structure used is highly complex: eight different networks need
to be trained (although only one is used during inference). In addition, as the
authors themselves acknowledge, the network architectures are built solely to
solve the problem of tiger re-identification, and as a result might not generalise
to other species. The system described in [19] should generalize well to different
species, given that all features are auto-learned by the CNN and no additional
pose-related information is used.
Systems built for a particular type of animal naturally tend to use species-
specific characteristics to enhance performance, be it for re-ranking or during
the object localization phase (such as building object detectors specifically to
identify the flanks of tigers). As such, these mechanisms may not apply to
different types of animals, and removal of some of the species-specific mod-
ifications may result in reduced accuracy. In this respect, systems built for
multi-species re-identification might be better suited to the problem addressed
in this paper.
4.2. Multi-Species Identification
This section describes works which have applied their methods over different
animal species. It is to be noted that the same topology and process is used
for multiple species, but not simultaneously; each model is trained separately.
Unlike in section 4.1, the systems built for multi-species re-identification are far
too different to group together effectively, and are studied individually in their
respective contexts. A summary of each publication presented in this section is
illustrated in Table 4.
Naiser et al. [29] describe a tool for tracking ants, snowbugs and zebrafish.
Re-identification of each individual is performed using an 8-layer customized
CNN trained through a Siamese architecture with Triplet loss. The reported
tracking accuracy is 79.2% for ants (averaged), 80.14% for snowbugs and 88.14%
for zebrafish. The system is focused on short-term, real-time tracking of multiple
species and, being a lab setup, has no need for open-set identification. The
visual features are auto-learned, and hence transferable to new species given
sufficient training. The study included only small animals, and the feature
extraction network's performance on larger animals is hard to predict.
Shukla et al. [24] present their work on identification of two types of pri-
mates, namely rhesus macaques and chimpanzees. The identification network
is DenseNet121 [33], trained with a combination of pairwise KL-divergence loss
and cross-entropy loss as the objective function. The system is tested on the
C-Zoo and C-Tai chimpanzee datasets [27] and a custom macaque dataset, with
a combined size of 14845 images and approximately 90 identities each. Top-1
closed-set accuracies of 91.87% and 97.36%, and top-1 open-set accuracies of
66.24% and 88%, are reported over the chimpanzee and macaque datasets
respectively. Since the discriminating feature set is auto-learned, extending the
network to newer species is possible. It is, however, worth noting that not all
animals may be distinguished by their faces, and the same network would need
to be capable of extracting other discriminating features as well.
Schneider et al. [28] compare and contrast the performance of different deep
learning networks over multiple species. In particular, five different networks
(AlexNet [1], VGG-19 [30], DenseNet201 [33], MobileNetV2 [34] and Incep-
tionV3 [2]) are tested for their abilities in one-shot learning under the Siamese
architecture. The five species under study are humans, chimpanzees, humpback
whales, fruit flies and octopuses. Publicly available datasets are used for each
species, with the exception of the octopus, for which a dataset was created using
videos from the internet. Binary cross-entropy loss is used as the objective
function during training. The publication concludes that, for the species
included in the study, DenseNet201 is the optimal network for feature extraction,
since it outperformed the other models on four of the five datasets, obtaining
92.2%, 89.7%, 79.3%, 75.5% and 61.4% accuracy on the octopus, human, fruit
fly, chimpanzee and humpback whale datasets respectively. The authors also
test the ability of the networks under open-set recognition, by using 10% of the
individuals as a test set and including none of the images of these individuals
during training. This study is an indicator of the suitability of networks trained
using the Siamese architecture to the animal re-identification problem.
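The pairwise setup evaluated by Schneider et al. can be sketched as follows (a minimal PyTorch illustration; `backbone` and `feat_dim` are assumptions standing in for any of the five trunks studied): two images pass through a shared-weight network, and a sigmoid over the absolute feature difference predicts "same individual or not" under binary cross-entropy.

```python
import torch
import torch.nn as nn

class SiamesePair(nn.Module):
    """Shared-weight feature extractor with a 'same / different' head."""
    def __init__(self, backbone, feat_dim):
        super().__init__()
        self.backbone = backbone             # e.g. a DenseNet/AlexNet trunk
        self.head = nn.Linear(feat_dim, 1)   # scores the feature difference

    def forward(self, x1, x2):
        f1, f2 = self.backbone(x1), self.backbone(x2)   # shared weights
        return torch.sigmoid(self.head(torch.abs(f1 - f2))).squeeze(1)

# One training step with binary cross-entropy (labels: 1 = same individual):
# model = SiamesePair(backbone, feat_dim=1024)   # feat_dim is an assumption
# loss = nn.BCELoss()(model(img_a, img_b), same_labels.float())
```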
The study by Deb et al. [23] uses images of lemurs, golden monkeys and
chimpanzees for individual identification. The faces of the primates are used to
identify individuals, and are first properly aligned before feature extraction;
the images are manually annotated to localize landmarks, namely the eyes and
mouth. The neural network introduced here, called PrimNet, is based on the
SphereNet CNN. The AM-Softmax function is used as the loss to train the
network. Features extracted from the network are compared using cosine
similarity, by performing pairwise comparison with all known individuals in
the gallery. The performance of the network is measured over the C-Zoo and
C-Tai datasets [27] for chimpanzees, LemurFace for lemurs (containing 3000
images of 129 individuals) and a custom-created dataset for golden monkeys
containing 1450 images of 49 individuals. The top-1 closed-set accuracies reported
are 93.76%, 90.36% and 75.82%, and the top-1 open-set accuracies are 81.73%,
66.11% and 37.08%, for lemurs, golden monkeys and chimpanzees respectively.
One of the drawbacks for fully automated identification is the reliance on
manually annotated facial landmarks (eyes and mouth) to align the image; this
would be infeasible in real-time applications and requires an automatic landmark
labelling system. It also follows that identification might fail if the face is not
visible in the image, such as when the image is captured from the side.
Cheema et al. [15] describe a method that attempts to identify multiple
species with distinct patterns on their skin. The study includes tigers, zebras
and jaguars. A Faster-RCNN [35] network is used to localize animals in the
image, and the localized regions are run through a truncated, ImageNet-pretrained
AlexNet model [1] to extract distinctive features. An SVM classifier [36] is used
on the features for identity association. Their dataset consists of 247 images of
tigers (44 individuals), 821 zebra images (83 individuals) and 112 jaguar images
(37 individuals). Top-1 accuracies of 80%, 93% and 78% are reported on tigers,
zebras and jaguars respectively. Open-set recognition has not been explored.
Using a fixed layer for feature extraction, with no particular training, might
benefit certain species but may not be as effective for others; instead, the choice
of layer could be made species-specific, keeping the rest of the procedure intact,
to achieve better performance.
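A sketch of this fixed-feature approach follows (the truncation point, pooling and function names are assumptions; Cheema et al.'s exact cut may differ): an ImageNet-pretrained AlexNet is cut off at an intermediate convolutional layer, and an SVM is fit on the pooled activations of localized animal crops.

```python
import torch
import torch.nn as nn
from torchvision import models
from sklearn.svm import SVC

alexnet = models.alexnet(weights="IMAGENET1K_V1")
# Keep only the first few convolutional blocks (assumed truncation point).
trunk = nn.Sequential(*list(alexnet.features.children())[:8])
trunk.eval()

def extract(batch):
    """batch: (n, 3, 224, 224) normalized crops of localized animals."""
    with torch.no_grad():
        fmap = trunk(batch)                    # (n, c, h, w) activations
        return fmap.mean(dim=(2, 3)).numpy()   # global average pooling

# Fit a closed-set SVM classifier on features of known individuals:
# clf = SVC(kernel="linear").fit(extract(train_crops), train_identities)
# pred = clf.predict(extract(probe_crops))
```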
With the exception of [28], the publications above use roughly homogeneous
species, be it primate sub-species for [23] and [24] or animals with distin-
guishing coat patterns for [15]. Only [28] reports performance over a truly
heterogeneous set of animals, observing that accuracy drops to as low as 61%
for whales. This implies that the same system is unlikely to be effective across
multiple, very different species without suitable adaptation.
In summary, the general procedure used by animal re-identification systems
is provided in Figure 3. A few examples of the various possible choices for
the technique used at each stage are illustrated. Several of these choices are
used by different publications reviewed in this article. The figure underlines the
challenges faced in model and procedure selection for animal re-identification
Figure 3: Typical procedure used by animal re-identification systems. A non-exhaustive set
of alternative techniques used at each stage is illustrated.
systems.
5. Person Re-Identification
Ref. (Year) | Market1501 mAP (%) | Market1501 Top-1 (%) | DukeMTMC mAP (%) | DukeMTMC Top-1 (%)

Table 5: Performance metrics for person re-identification over the Market15013 and
DukeMTMC4 datasets. mAP stands for Mean Average Precision
3 https://paperswithcode.com/sota/person-re-identification-on-market-1501
4 https://paperswithcode.com/sota/person-re-identification-on-dukemtmc-reid
Quan et al. [41] also employ a variant of the Parts Based Convolutional
Baseline (PCB) [44] to extract features, and describe automatically generating
a network architecture for person re-identification. Softmax classification is
used to assign identities to the tracklets. The generated architecture may not
extend well to animals, given that it has been built specifically for person
re-identification; however, the same methodology could be used to generate a
specific architecture for the animal re-identification task.
Zheng et al. [42] augment the training data by using Generative Adversar-
ial Networks (GANs) to create auxiliary images from the dataset while training
is in progress (online generation). A modified ResNet50 backbone is used for
feature extraction, with KL-divergence loss as the objective function; a softmax
classifier is then used to associate identities. Generation of auxiliary images
using GANs is highly innovative; the generated images are combinations of
different persons or variations of the same person. While this is acceptable for
human images, it may not apply to animals, since changing the appearance of
an animal would effectively produce a different individual. A different approach
is required here.
Wang et al. [43] use spatial attention networks for re-identification, and
use a variant of PCB [44] during training, where the image of the person is
split into multiple horizontal stripes and features are extracted separately from
each stripe. ResNet50 is the backbone feature extraction network. A spatial
attention layer is added between the global average pooling layer and the fully
connected layer to prevent the loss of information due to average pooling.
Identity association is performed by computing the Euclidean distance between
each pair of images and assigning the identity of the closest pair to the probe
image.
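The stripe idea behind PCB can be sketched as follows (a simplified illustration, not the exact model of [44]; the stripe count is an assumed hyper-parameter): the backbone's feature map is divided into p horizontal stripes, each average-pooled and classified independently.

```python
import torch
import torch.nn as nn

class PCBHead(nn.Module):
    """Split a (n, c, h, w) feature map into p horizontal stripes."""
    def __init__(self, channels, num_ids, p=6):
        super().__init__()
        self.p = p
        self.classifiers = nn.ModuleList(
            nn.Linear(channels, num_ids) for _ in range(p))

    def forward(self, fmap):
        n, c, h, w = fmap.shape             # assumes h is divisible by p
        stripes = fmap.reshape(n, c, self.p, h // self.p, w).mean(dim=(3, 4))
        # One identity prediction per stripe; the per-stripe losses are
        # typically summed during training.
        return [clf(stripes[:, :, i]) for i, clf in enumerate(self.classifiers)]
```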
While each publication has its unique contributions, some of the common
methods employed are listed below. PCB is a popular choice for feature
extraction, and variants of it are used in [39], [41] and [43]. [39], [43], [40]
and [42] all use a ResNet50 backbone [3], with suitable modifications, as the
feature extraction network. [41] uses a neural architecture search to find the
best model for re-identification.
PCB may not be suitable for extracting features of animals, owing to the
vertical posture of humans versus the primarily horizontal posture of animals.
An observation from these publications is the wide use of the ResNet model
[3] as the backbone, which encourages its use in animal re-identification tasks
as well, as does the use of the Siamese architecture coupled with Triplet Loss
as the objective function for training the network. A comprehensive survey of
deep learning models for person re-identification is provided by Wu et al. [45],
which additionally evaluates the effectiveness of multiple deep learning models
using different loss functions. Wu et al. observe that the ResNet50 model
coupled with Triplet Loss is one of the best performing systems among those
considered in their survey, which further encourages the use of this model and
loss function combination.
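A minimal sketch of this recommended combination (the embedding dimension and margin are assumed hyper-parameters, and the class name is illustrative): a ResNet50 backbone produces L2-normalized embeddings trained with the triplet margin loss, so that same-identity images sit closer together than different-identity ones.

```python
import torch
import torch.nn as nn
from torchvision import models

class EmbeddingNet(nn.Module):
    """ResNet50 backbone with its classifier replaced by an embedding layer."""
    def __init__(self, dim=128):
        super().__init__()
        self.backbone = models.resnet50(weights="IMAGENET1K_V1")
        self.backbone.fc = nn.Linear(self.backbone.fc.in_features, dim)

    def forward(self, x):
        return nn.functional.normalize(self.backbone(x), dim=1)

model = EmbeddingNet()
triplet = nn.TripletMarginLoss(margin=0.3)  # margin value is an assumption

# One training step over (anchor, positive, negative) image batches:
# loss = triplet(model(anchor), model(positive), model(negative))
# loss.backward()
```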
6. Discussion
Using the full detection, rather than localized parts of the animal (such as
the face or head), is favorable for building a robust feature extractor that can
perform well under differing viewpoints. Deep learning networks that learn
discriminating features without catering to a particular set of species-specific
cues are highly desirable. Training networks to discriminate between individuals
using examples of positive and negative images is popular; in particular, use of
the Siamese network architecture during training, along with Triplet Loss or
pairwise KL-divergence loss and their variants, could help build a system that
characterizes individuals accurately. Such methods, which are known to give
good results in human re-identification, could perform well for animal
identification as well.
6.3. Multi-Species Identification
One of the key aspects of any good tracking system is operational speed.
The system must function in real time (or at least near-real time). However,
a majority of the publications surveyed here do not report processing times.
This is understandable, since most systems are built for offline use. However,
in order to adopt any of these systems for monitoring applications, the entire
identity association needs to complete in sub-second time, ideally a few hundred
milliseconds at most. Feature extraction would otherwise become a bottleneck
that could reduce the overall effectiveness of the system. This can be avoided
with optimized feature extraction networks and reasonably fast GPUs, or with
emerging processors such as Tensor Processing Units (TPUs).
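When evaluating candidate networks for such a deployment, the feature-extraction budget can be checked directly; the sketch below is a simple CPU benchmarking routine (batch size, input resolution and iteration counts are assumptions):

```python
import time
import torch

def mean_latency_ms(model, runs=50, shape=(1, 3, 224, 224)):
    """Average single-image inference time of a feature extractor, in ms."""
    model.eval()
    x = torch.randn(shape)
    with torch.no_grad():
        for _ in range(5):       # warm-up iterations, excluded from timing
            model(x)
        start = time.perf_counter()
        for _ in range(runs):
            model(x)
    return (time.perf_counter() - start) / runs * 1000.0

# A candidate feature extractor should stay well under the
# few-hundred-millisecond budget discussed above.
```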
7. Conclusion
References
[4] G. Koch, Siamese neural networks for one-shot image recognition, in: ICML
deep learning workshop, Vol. 2, 2015.
[10] Y.-C. Yoon, D. Y. Kim, K. Yoon, Y.-m. Song, M. Jeon, Online multiple
pedestrian tracking using deep temporal appearance matching association,
arXiv preprint arXiv:1907.00831 (2019).
[12] A. Hermans, L. Beyer, B. Leibe, In defense of the triplet loss for person
re-identification, arXiv preprint arXiv:1703.07737 (2017).
[17] M. Körschens, B. Barz, J. Denzler, Towards automatic identification of
elephants in the wild, arXiv preprint arXiv:1812.04418 (2018).
[22] S. Li, J. Li, W. Lin, H. Tang, Amur tiger re-identification in the wild,
arXiv preprint arXiv:1906.05586 (2019).
[24] A. Shukla, G. S. Cheema, S. Anand, Q. Qureshi, Y. Jhala, Primate
face identification in the wild, Lecture Notes in Computer Science (2019)
387–401. doi:10.1007/978-3-030-29894-4_32.
[25] Z. Li, C. Ge, S. Shen, X. Li, Cow individual identification based on con-
volutional neural network, in: Proceedings of the 2018 International Con-
ference on Algorithms, Computing and Artificial Intelligence, ACAI 2018,
Association for Computing Machinery, New York, NY, USA, 2018, pp. 1–5.
doi:10.1145/3302425.3302460.
[31] M. Körschens, J. Denzler, ELPephants: A fine-grained dataset for
elephant re-identification, in: 2019 IEEE/CVF International Confer-
ence on Computer Vision Workshop (ICCVW), 2019, pp. 263–270.
doi:10.1109/ICCVW.2019.00035.
tional Conference on Computer Vision (ICCV), 2015, pp. 1116–1124.
doi:10.1109/ICCV.2015.133.
[40] H. Luo, Y. Gu, X. Liao, S. Lai, W. Jiang, Bag of tricks and a strong baseline
for deep person re-identification, in: The IEEE Conference on Computer
Vision and Pattern Recognition (CVPR) Workshops, 2019.
[44] Y. Sun, L. Zheng, Y. Yang, Q. Tian, S. Wang, Beyond part models: Person
retrieval with refined part pooling (and a strong convolutional baseline), in:
V. Ferrari, M. Hebert, C. Sminchisescu, Y. Weiss (Eds.), Computer Vision
– ECCV 2018, Springer International Publishing, Cham, 2018, pp. 501–518.
doi:10.1007/978-3-030-01225-0_30.
[45] D. Wu, S.-J. Zheng, X.-P. Zhang, C.-A. Yuan, F. Cheng, Y. Zhao, Y.-J. Lin,
Z.-Q. Zhao, Y.-L. Jiang, D.-S. Huang, Deep learning-based methods for per-
son re-identification: A comprehensive review, Neurocomputing 337 (2019)
354–371. doi:10.1016/j.neucom.2019.01.079.
[46] A. Bendale, T. E. Boult, Towards open set deep networks, in: 2016 IEEE
Conference on Computer Vision and Pattern Recognition (CVPR), 2016,
pp. 1563–1572. doi:10.1109/CVPR.2016.173.