00 (2020) Sun Chen - A Survey of Multiple Pedestrian Tracking Based On Tracking-By-Detection Framework
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TCSVT.2020.3009717, IEEE Transactions on Circuits and Systems for Video Technology
1051-8215 (c) 2020 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
MOT. As this survey was one of its kind, there was no scope to discuss the TBD framework or the deep learning-based MOT algorithms that gained significant attention in later years.
• In 2015, Qiu et al. [2] presented a survey on motion-based MOT algorithms. Their study focused on the radar-related research area; the discussion of visual information was missing. Moreover, from the TBD framework point of view, the authors discussed only the data association module, which is one of the four steps in the TBD framework. The other steps, such as object localization, feature extraction, and track management, cannot be ignored in the TBD framework.
• Later, Camplani et al. [3] provided a survey on MPT algorithms associated with RGB-D data. This survey filled the gaps in the visual information that was not fully covered in [2]. However, the discussion of the TBD framework was still missing.
• One of the most closely related surveys outlining MPT algorithms is presented by Zhou et al. [4]. The four stages of MPT algorithms are discussed. However, a major limitation is that no detailed performance analysis was performed on the existing MPT algorithms.
• Note that none of the previous surveys [1]–[4] cover the deep-learning methods on this topic. In 2019, Xu et al. [5] summarized and analyzed deep learning-based MOT algorithms. In particular, they focused on the application of deep learning techniques in MOT algorithms, and the TBD framework was not thoroughly investigated. Besides, Ciaparrone et al. [6] provided a review of deep learning-based MOT algorithms. They mainly focused on the application of deep learning techniques in four steps of the MOT algorithm. However, they ignored the track management module, which is an important module in TBD-based algorithms. Although deep-learning techniques have gained significant attention in recent years, traditional algorithms still play a major role in MOT.

From the above discussion, it is evident that TBD is becoming the main framework in MPT. However, none of the surveys [1]–[6] particularly focused on it. Therefore, it is necessary to summarize and analyze the existing TBD-based MPT algorithms to pave the way for further study of TBD-based methods for MPT. In this paper, our aim is to provide a survey that introduces the TBD framework in MPT in detail. The main contributions of this survey are summarized as follows:
• We illustrate a timeline to introduce milestones of existing TBD-based algorithms and discuss the main steps in a TBD framework. These can help researchers understand the development of existing TBD-based algorithms and the main models used in each step of a TBD framework.
• Furthermore, we present the experimental results of TBD-based algorithms on publicly available MOT Challenge datasets and analyze the characteristics of each tracker in detail. We also discuss the major factors that affect tracking performance in MPT.
• Finally, this survey outlines the open issues and future directions in TBD-based algorithms for MPT. We aim to provide meaningful insight for the development of new tracking methods in TBD-based algorithms for MPT.

The rest of the paper is organized as follows. Section II reviews the milestones of existing TBD-based methods with a timeline. The four main steps and two processing techniques in the procedure of the TBD framework are described in Section III. Section IV introduces the evaluation metrics and publicly available datasets, and analyzes the performance of existing TBD-based algorithms on these datasets. Existing issues and future research directions of TBD-based algorithms are discussed in Section V. Finally, conclusions are drawn in Section VI.

II. MILESTONES OF EXISTING TBD-BASED METHODS

Through the unremitting efforts of researchers, TBD-based approaches have achieved remarkable success in different aspects. We have reviewed the existing TBD-based algorithms of the past decade and found that researchers mainly focused on the following four aspects: (a) designing association methods, (b) joining other vision tasks, (c) applying deep learning to MPT, and (d) multi-modality-based MPT. For ease of understanding, we select the first proposed work in each aspect to serve as a milestone and describe the motivation and principle of the proposed work. We illustrate a timeline introducing these milestones of the existing TBD-based methods in Fig. 2.

A. Association methods

As discussed above, the core of the TBD framework is data association. Researchers have proposed some classical association methods, many of which are still used as basic algorithms. For example, the Hungarian method was used to find the optimal association for an assignment problem in 2008 [7]. It is a locally optimal method that assigns an identity label to each detection in every frame. Due to its fast processing, the Hungarian method was widely used in MPT over the past decade [29]–[31]. Although the Hungarian method is fast, its accuracy is not up to the mark due to its locally optimal nature. Some researchers have therefore tried to model data association as a globally optimal solution. In 2008, Zhang et al. [19] introduced a network flow (NF) method for MPT, which models the association as disjoint flow paths in a cost-flow network. This is one of the earliest works that applied NF to MPT. With a globally optimal solution, this approach achieved good performance. Inspired by [19], many researchers have proposed improved algorithms, such as Lagrangian relaxation-based NF [22], pair-wise cost-based NF [24], and bi-level optimization-based NF [32].

In the same year, 2008, Shafique et al. [33] used the maximum weight independent set (MWIS) for the data association problem. They formulated data association as a maximum weight problem and obtained pedestrian trajectories via a globally optimal solution. This was the first time MWIS was used for solving data association in the TBD framework. However, the availability of reliable tracklets cannot be guaranteed in [33]. Thereafter, many researchers have tried to solve this problem by introducing several approaches, such as MWIS
[Fig. 2 milestones: NF-based, Zhang et al. [19]; Joint SOT and MPT, Xing et al. [47]; CRF-based, Yang et al. [35]; GMCP-based, Zamir et al. [42]; Deep learning-based, Kim et al. [54]; Multi-modality-based, Zhang et al. [59]]
Fig. 2: Milestones of existing TBD-based MPT algorithms over the past decade. We organize the TBD-based algorithms into four parts: data association methods (dotted rectangle), joint other vision tasks (dotted rounded rectangle), deep learning-based (solid rounded rectangle), and multi-modality-based (solid rectangle).
based on rank-constrained continuous relaxation [33] and polynomial-time MWIS [34].

It is worthwhile to note that both the NF and MWIS methods share the assumption that associations are independent of each other. However, association dependencies cannot be totally ignored. In 2011, Yang et al. [35] proposed a conditional random field (CRF) model for tracking multiple pedestrians. They formulated data association as a CRF with explicit use of association dependencies. Extensive experimental results have demonstrated the importance of association dependencies. On the basis of this idea, many improved CRF algorithms have been proposed, for example, pair-wise model-based CRF [14], mixed discrete-continuous CRF [36], and deep continuous CRF [37]. Besides, Andriyenko [38] observed that the number of possible trajectories over time is large in the NF method. The proposed continuous energy minimization (CEM) method considers the limitation of the state space. Since the CEM-based method is a globally optimal solution, good performance is obtained. Based on CEM, several improvements are discussed in discrete CEM [39], pair-wise label cost CEM [40], and sparse representation CEM [41].

As most of the existing association methods address all of the objects simultaneously, the computational complexity is very high. To deal with this issue, in 2012, Zamir et al. [42] proposed a global framework to track multiple pedestrians by utilizing generalized minimum clique graphs (GMCP). On the basis of GMCP, Dehghan et al. [43] formulated data association as a generalized maximum multi-clique problem (GMMCP). In fact, NF, MWIS, CRF, and GMCP are graph-based association methods that represent the pedestrian trajectories in a graph. In contrast, Tang et al. [44] formulated the association as a minimum cost subgraph multicut problem (MCSM) that jointly links and clusters the multiple plausible person detections over time and space. The number and size of tracks are not specified as constraints; rather, they are obtained from the solution. Based on MCSM, several research works have been proposed, such as DeepMatching [45] and the minimum cost lifted multicut formulation [46].

B. Joint other vision tasks

Several researchers have leveraged other vision tasks to improve the tracking performance. For example, some researchers argued that MPT is a generalized single object tracking (SOT) problem [47] where the locations of targets are estimated from multiple SOT tracking models. In 2009, Xing et al. [47] used an SOT to generate initial tracklets in the local stage. In addition, many researchers have used SOT to predict the location of pedestrians in MPT to handle missing detections in raw detection results in crowded scenes (e.g., KCF [48], SiamFC [49], and Siamese-RPN [50]).

Another vision task that is widely used in MPT is segmentation. In brief, segmentation can predict the location at the pixel level; for example, a level-set framework has been used to track the contours of multiple pedestrians with mutual occlusions in real time. In fact, tracking and segmentation are closely related, and they can help each other. For example, object segmentation would separate a person from other targets and the background, which is useful for locating the target in every frame. Many researchers pay attention to the multi-task topic that combines tracking and segmentation [51]–[53].

C. Deep learning in MPT

The appearance feature is an important cue for calculating the similarity between two detection boxes. Note that low-level handcrafted features were widely used in MPT before 2015. Interestingly, convolutional neural networks (CNN) have been applied in several vision tasks and have outperformed hand-engineered features. In 2015, Kim et al. [54] utilized a CNN to extract a 4096-dimensional feature for each detection box. This work was the first time a CNN was used to extract a high-level feature in MPT. Since then, CNNs have been widely used in MPT. Several CNN models have been used to design more robust and distinct features, such as VGGNet [46], [55] and GoogleNet [50], [56].

Obviously, the many modules in MPT result in high complexity. Some researchers have tried to design an end-to-end model in a single CNN framework. In 2017, Milan et al. [57] proposed, for the first time, an end-to-end model for online MPT. They cast the classical Bayesian state estimation, data association, as well as track initiation and termination tasks as a recurrent neural net, allowing for full end-to-end learning of the model. Inspired by this article, many researchers have proposed good end-to-end models, such as the end-to-end tracklet association module [32] and the end-to-end transportation network [58].

D. Multi-modality-based MPT

Generally, a single type of data has been used as input for tracking in a traditional TBD-based algorithm. However, this
[Fig. 3 pipeline: Video sequences → Object localization → Feature extraction → Data association → Track management → Tracking results, with pre-processing and post-processing stages]
Fig. 3: The main procedure of the TBD framework, which consists of four core components and two processing techniques. The four core components are pedestrian localization, feature extraction, data association, and track management. The processing techniques are pre-processing and post-processing.
method cannot preserve the reliability. To address the problem, Zhang et al. [59] introduced a multi-modality MOT framework for tracking objects. To solve the MOT problem with multi-modality, the authors used an image and point cloud feature extractor in the feature extraction phase. Through extensive experiments, it has been observed that multi-modality-based algorithms can improve both reliability and accuracy. Inspired by this work, some researchers have paid attention to multi-modality-based algorithms. For example, Gautam et al. [60] introduced a practical and lightweight tracking system, termed SDVTracker, a multi-sensor tracker with both LiDAR and detections as asynchronous input. Later, Kuang et al. [61] proposed a general multi-modality cascaded fusion framework which combines detection and LiDAR information.

III. THE MAIN PROCEDURE OF THE TBD FRAMEWORK

The goal of MPT is to detect multiple pedestrians in each frame and maintain their identity information across frames. However, few discussions have focused on the main procedure of the TBD framework. Despite the considerable variety of TBD-based approaches discussed in the literature, the majority of TBD-based algorithms consider either a part or all of the following steps: object localization, feature extraction, data association, and track management, as shown in Fig. 3. In addition to these four main steps, many TBD-based algorithms may contain pre-processing and post-processing techniques. In the following, we discuss the basic models and approaches involved in each of the steps of a TBD framework.

A. Pedestrian localization

As the detection result is used as input to the data association, it has a significant impact on the association results as well as the final tracking result. Generally, the detection results are provided by the publicly available MPT datasets for a fair comparison. However, some state-of-the-art detectors often miss detections due to occlusion in crowded scenes. Hence, to better track the pedestrians, many researchers have used other object localization methods to recover the missing detections. These localization methods can assist the public detection results. On the whole, in the object localization step, detection results are mainly obtained in two ways: provided by MPT datasets, or obtained by other localization methods.

1) Detection results from MPT datasets: Earlier, the histograms of oriented gradients (HOG) detector [62], the deformable part-based model (DPM) detection method [63], and the background subtraction method [64] were widely used to detect pedestrians in previous tracking datasets, such as PETS2009 (http://www.cvg.reading.ac.uk/PETS2009/a.html), TUD (https://www.d2.mpi-inf.mpg.de/node/428), and ETHMS (http://www.vision.ee.ethz.ch/en/datasets/). Later, the aggregated channel features (ACF) [65] detection algorithm was used to detect pedestrians in the images of the MOT2015 dataset [66] (https://motchallenge.net/data/2D_MOT_2015/) released by MOT Challenge in 2015. Interestingly, the deformable parts model v5 (DPM) [67], which outperforms other detectors, has been used on the MOT2016 dataset (https://motchallenge.net/data/MOT16/). Later, faster R-CNN (FRCNN) [68], DPM, and the scale-dependent pooling detector (SDP) [69] were used on the MOT2017 dataset (https://motchallenge.net/data/MOT17/). Most recently, the pedestrian detection results were obtained using an improved FRCNN [70] with a ResNet101 backbone for the CVPR19 training sequences on the MOT2019 dataset.

2) Other localization methods: In addition to the detection results provided by the public MPT datasets, many researchers have used other localization methods to locate pedestrians in TBD-based algorithms. The main localization methods include: filter-based, motion model-based, other computer vision task-based, and deep learning-based. We summarize these methods in Table I.
• Firstly, filter-based methods can be used to predict the location of a pedestrian in the next frame, where the current object state depends only on the previous states. If a pedestrian detection is missing, filter-based approaches can predict the position at the next time step. The most common filters used in MPT include the Kalman filter (KF) [30], the extended Kalman filter (EKF) [38], and the particle filter (PF) [88].
• Secondly, using the kinetic characteristics of motion to predict the location is simple and fast, under the assumption that the pedestrian moves consistently over a short time interval. Moreover, assuming constant velocity, a constant
velocity model was used to predict the location in the next frame [81], [82].
• Thirdly, other vision tasks can be used in MPT to locate the missing detections. Joint segmentation and tracking is one of the popular methods [51]–[53]. Specifically, object segmentation separates a person from other targets and the background, which is useful for locating the person in every frame. During occlusion, the pixel labels in the visible part of the pedestrian guide a tracker to find the correct location of the pedestrian. Another method is to use SOT to locate the pedestrian in the next frame. It uses the appearance template of the pedestrian from the previous frame to search the next frame for the position with the highest probability. Note that SOT not only can handle inaccurate detection results, but also reduces identity switches in MPT [29], [47], [89].
• At last, deep learning techniques can be used to predict the location of the pedestrian in the next frame. Some researchers used long short-term memory (LSTM) to learn a complex dynamic model for predicting the pedestrian in the next frame [57], [85]. The generative adversarial network (GAN) architecture has also been used to predict pedestrian localization, which overcomes issues related to occlusion and noisy detection [90]. In addition, the recurrent neural network (RNN) [86] and deep reinforcement learning (DRL) [87] can be used to predict the pedestrian location.

B. Feature extraction

To find the similarity between two pedestrians in MPT, the appearance feature is extracted before the data association phase in the TBD-based algorithm. It is an essential cue for affinity computation in MPT. Appearance features are broadly categorized into low-level handcrafted and high-level features.

1) Handcrafted features: Before the popularity of deep learning in MOT, handcrafted features were the main features used to distinguish pedestrians. A handcrafted feature is based on a raw pixel template representation for simplicity. There are four popular handcrafted features used in MPT: the color histogram, gradient-based features, optical flow, and the local binary pattern.

The color histogram (CH) is the most widely used visual feature in image retrieval [91]–[93]. The main reason is that color is often related to the pedestrian or scene contained in the image. It is the most commonly used color feature representation, as shown in Fig. 4(a). It is robust to photometric changes, but it ignores the spatial distribution of pixel values. Generally, the histogram of oriented gradients (HOG) is a feature descriptor used for pedestrian detection in computer vision [23], [88], [94], [95]. It forms the feature from the histogram of the gradient directions of a local area. As shown in Fig. 4(b), it is invariant to geometric and photometric deformations, but it is hard to handle occlusion. Moreover, optical flow (OF) can be regarded as a local feature if we take the image pixel as the unit [94], [96], [97]. It is suitable for crowded scenarios, however, with a higher computational complexity. An example of OF is illustrated in Fig. 4(c). Furthermore, the local binary pattern (LBP) is used to describe the local texture of images [98], [99]. As shown in Fig. 4(d), it is invariant to grayscale changes and rotation.

2) High-level features extracted by CNN: With the upsurge of deep learning and the attractive performance of features extracted by CNNs in visual fields, high-level features extracted by CNNs have been widely used in MPT (see Fig. 4(e)). To the best of our knowledge, Kim et al. [54] introduced high-level feature extraction in MPT. Afterward, several high-level features have been extracted by CNNs in MPT.

At present, the high-level features used in MPT can be grouped into two categories: spatial and spatial-temporal features. Basically, a spatial feature is the feature of a bounding box extracted by various CNN models in a single image, such as SiameseNet [49], [100], VGGNet [46], [55], [59], AlexNet [31], ResNet [25], [30], [32], [101], and GoogleNet [56], [102]. Besides, as another kind of high-level feature, spatial-temporal features have also become popular in recent years. Compared to spatial features, spatial-temporal features depict the appearance characteristics of pedestrians over multiple frames. They combine information in the time and space domains and are more robust. Note that ResNet [103], LSTM [73], [85], [104], VGGNet [72], FaceNet [105], and GoogleNet [50] are widely used CNN models for extracting spatial-temporal features.

C. Data association

Data association is one of the core components in MPT. The goal of data association is to identify the correspondence between a collection of new detections and previously detected pedestrians. According to the processing mode, data association can be further categorized into online and offline methods. An online association approach associates the detections of the incoming frame immediately to the existing
Fig. 4: An illustration of various visual features. (a) CH feature, (b) HOG feature, (c) OF feature, (d) LBP feature, and (e)
CNN feature.
trajectories; therefore, this approach is more appropriate for real-time applications. Bipartite graph matching (such as the Hungarian algorithm) is widely used for online data association. In contrast, the offline association approach considers the detections from all the frames of an entire sequence. It is worthwhile to note that global optimization methods, such as NF [19] and the maximum weight independent set [33], are widely used in offline data association.

1) Online data association: Online association methods aim to obtain a locally optimal solution and focus on exploring which features to use for calculating the similarity between a detection and an existing track. Denote D_t = {D_t^i}, i = 1, ..., n, as the set of n detections at time t and T^{t-1} = {T_j^{(t-1)}}, j = 1, ..., m, as the set of m trajectories at frame (t-1). Each trajectory consists of detections T_j = {D_{j,t}}, t = t_{j,s}, ..., t_{j,e}, where D_{j,t} is the detection of the jth trajectory in the tth frame, and t_{j,s} and t_{j,e} are the starting and ending frames of T_j, respectively.

Firstly, the affinity between a detection and a track is constructed. This affinity not only can contain a single cue (i.e., motion) [106], [107], but also can contain multiple cues (i.e., appearance and motion features) [41], [108]–[110]. Therefore, the affinity is expressed as

Λ(i, j) = Λ_A(i, j) Λ_M(i, j) Λ_S(i, j),   (1)

where Λ_A(i, j), Λ_M(i, j), and Λ_S(i, j) represent the affinities of the appearance, motion, and shape models between the ith detection and the jth track, respectively.

Secondly, after calculating the affinity between detection and track, the cost C is obtained as

C(i, j) = 1 − Λ(i, j),   (2)

where C(i, j) is the cost between detection i and track j. If detection i is similar to track j, then the cost C(i, j) is small. Finally, the Hungarian algorithm is used to compute an assignment which minimizes the total cost.

2) Offline data association: Offline association methods can obtain a globally optimal solution. The graph model is the more popular approach, where vertex nodes represent detection results in each frame (or short tracklets) and the relevant edges denote the cost measuring the similarity between two detections (or short tracklets). These graph-based methods include network flow (NF) [19], conditional random field (CRF) [35], generalized minimum clique graphs (GMCP) [43], maximum weight independent set (MWIS) [33], and minimum cost subgraph multicut (MCSM) [44]. Another popular global optimization method used in TBD-based algorithms is continuous energy minimization (CEM) [38]. The goal of CEM is to fit a set of trajectories to the data while satisfying constraints mimicking tracking in real-world scenarios. We have discussed these six main offline data association methods in Section II. Moreover, we provide brief descriptions in Table III, where G is the graph consisting of vertex set V and edge set E, both l and σ indicate the connection between two edges, re is the regularization term, w represents the weight of an edge, d keeps the solution close to the detections, and dn, ex, and pe are the motion, physical, and pedestrian persistence constraints, respectively.

D. Track management

After data association, a rule needs to be designed to manage the association results. Generally, track management contains three essential steps: track update, track termination, and track initialization.
• Track update: After data association, we update the state of each track that successfully matches a detection. If a track is associated with one and only one detection, we assume that the pedestrian is isolated and tracked correctly (but not necessarily precisely). Then, the state of the pedestrian is updated with the detection.
• Track termination: When a track does not associate with any detections, we consider that the pedestrian is
To reduce the potential false-positive tracks or remove false-positive detections from a trajectory, the matched frames threshold (MAF) [28], [46], [119] is used as a post-processing technique,
TABLE V: Results of TBD-based algorithms on MOT2015. Red values denote the best. ↑ indicates that higher is better and ↓ the opposite.
Year and Author Tracker MOTA(↑) MOTP(↑) IDF1(↑) MT(↑) ML(↓) FP(↓) FN(↓) IDS (↓) Frag(↓) Hz(↑)
2019 Bergmann et al. TWBW [100] 44.1 75.0 46.7 18.0 26.2 6477 26577 1318 1790 0.9
2019 Chu et al. IAT [121] 38.9 70.6 44.5 16.6 31.5 7321 29501 720 1440 0.3
2017 Chen et al. MTCN [122] 38.5 72.6 47.1 8.7 37.4 4005 33203 586 1263 6.7
2019 Xu et al. STRN [103] 38.1 72.1 46.6 11.5 33.4 5451 31571 1033 2665 13.8
2017 Sadeghian et al. TTU [123] 37.6 71.7 46 15.8 26.8 7933 29397 1026 2024 1.9
2018 Keuper et al. MSA [83] 35.6 71.9 45.1 23.20 39.3 10580 28508 457 969 0.6
2018 Fang et al. RAN [86] 35.1 70.9 45.4 13.0 42.3 6771 32717 381 1523 5.4
2017 Yang et al. HAD [81] 35.0 72.6 47.7 11.4 42.2 8455 31140 358 1267 4.6
2019 Wu et al. IARL [124] 34.7 70.7 42.1 12.5 30.0 9855 29158 1112 2848 2.6
2017 Chu et al. STAM [125] 34.3 70.5 48.3 11.4 43.4 5154 34848 348 1463 0.5
2016 Wang et al. CDE [126] 34.3 71.7 44.1 14.0 39.4 7869 31908 618 959 6.5
2017 Son et al. QCNN [31] 33.8 73.4 40.4 12.9 36.9 7898 32061 703 1430 3.7
2018 Zhou et al. DCCRF [37] 33.6 70.9 39.1 10.4 37.6 5917 34002 866 1566 0.1
2016 Yang et al. TDAM [82] 33.0 72.8 46.1 13.3 39.1 10064 30617 464 1506 5.9
2018 Bae et al. CBDA [108] 32.8 70.7 38.8 9.7 42.2 4983 35690 614 1583 2.3
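The MOTP and IDF1 columns in these tables follow the definitions given in this section (Eq. (6) and the identification F-score); a minimal sketch of both computations, with toy input values of our own:

```python
# Sketch of the MOTP (Eq. (6)) and IDF1 definitions used in the tables.
# `matches_per_frame` holds the IOU of every matched ground-truth/hypothesis
# pair in each frame; the counts passed to idf1 are toy values.

def motp(matches_per_frame):
    """MOTP = 1 - (sum of IOUs of all matched pairs) / (total matches)."""
    ious = [iou for frame in matches_per_frame for iou in frame]
    return 1.0 - sum(ious) / len(ious)

def idf1(id_true_positives, num_gt, num_pred):
    """Correctly identified detections over the average number of
    ground-truth and computed detections."""
    return id_true_positives / (0.5 * (num_gt + num_pred))

score = motp([[0.8, 0.6], [0.7, 0.9]])  # 1 - (sum of four IOUs) / 4
f = idf1(80, 100, 100)
```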
TABLE VI: Characteristics of TBD-based algorithms tested on MOT2015. CMC: camera motion compensation; Motion: constant velocity model; MAF: matched frames threshold; IPL: interpolation; DPS: det-pruning-subnet.
Year and Author Tracker Other object localisation Appearance Data association Pre-processing Post-processing MOTA(↑) Hz(↑) Open source
2019 Bergmann et al. TWBW [100] CMC Deep On NMS No 44.1 0.9 Yes
2019 Chu et al. IAT [121] SOT Deep MCSM No No 38.9 0.3 No
2017 Chen et al. MTCN [122] PF Deep On No No 38.5 6.7 No
2019 Xu et al. STRN [103] No Deep On No MAF 38.1 13.8 No
2017 Sadeghian et al. TTU [123] No Deep On No No 37.6 1.9 No
2018 Keuper et al. MSA [83] Segmentation Handcrafted MCSM CF No 35.6 0.6 No
2018 Fang et al. RAN [86] RNN Deep On No No 35.1 5.4 No
2017 Yang et al. HAD [81] Motion Deep NF No No 35.0 4.6 No
2019 Wu et al. IARL [124] MBN Deep On DPS No 34.7 2.6 No
2017 Chu et al. STAM [125] SOT Deep On NMS No 34.3 0.5 No
2016 Wang et al. CDE [126] No Handcrafted NF No IPL 34.3 6.5 No
2017 Son et al. QCNN [31] No Deep NF No No 33.8 3.7 No
2018 Zhou et al. DCCRF [37] No Deep CRF No No 33.6 0.1 No
2016 Yang et al. TDAM [82] Motion Handcrafted On No No 33 5.9 No
2018 Bae et al. CBDA [108] No Deep NF No No 32.8 2.3 Yes
second of the tracker. For the completeness metrics, mostly tracked targets (MT), mostly lost targets (ML), and fragmentation (Frag) are used to indicate how completely the ground-truth trajectories are tracked. In addition, another metric, called identification F-score (IDF1), is defined as the ratio of correctly identified detections over the average number of ground-truth and computed detections [129].

MOTP = 1 − (Σ_t Σ_i IOU(GT_t^i, H_t^i)) / (Σ_t M_t),   (6)

where IOU(GT_t^i, H_t^i) represents the intersection over union (IOU) of ground-truth pedestrian i and its associated tracking result in frame t, and M_t denotes the number of matched pairs in frame t.

B. MOT benchmark datasets

The publicly available MOT benchmark is a unified evaluation platform for pedestrian tracking. So far, four pedestrian datasets have been released in the MOT benchmark, as summarized in Table IV. In 2015, Leal-Taixé et al. [66] released the first dataset, MOT2015, which contains a total of 22 sequences; half of them are for training and the remaining are left for testing. The ACF detector [65] was used to obtain the detection results of the MOT2015 dataset. Later, in 2016, the benchmark team released the MOT2016 and MOT2017 datasets. To briefly explain, the MOT2016 dataset contains 14 video sequences, including 7 training sequences and 7 test sequences. The deformable part-based model (DPM) v5 [67] was used
to detect the pedestrians on MOT2016. The MOT2017 dataset includes the same videos as MOT2016, but contains three sets of detections for each video, namely FRCNN [68], DPM [67], and SDP [69]. The density of each sequence can reach up to 25 pedestrians per frame on average in both MOT2016 and MOT2017. Recently, the CVPR19 challenge dataset, released in 2019, consists of 8 new sequences, out of which 3 are very crowded scenes. Note that the average density of a sequence can reach a value of 179 pedestrians per frame.

C. Performance of TBD-based algorithms on MOT datasets

In this section, we first present the results of TBD-based algorithms on several MOT datasets, which were obtained from the benchmark before December 1st, 2019 and used the publicly available detection results. We selected the top fifteen tracking algorithms on MOT2015, MOT2016, and MOT2017. Since MOT2019 has been released only recently, the relevant evaluation results are not available on the benchmark, so we do not discuss and analyze the performance of trackers on MOT2019. Second, we analyze the characteristics of each tracker used in the evaluation. Then, we analyze the overall tracking performance on the different MOT datasets. Finally, we discuss which model in each step has the highest impact on the tracking performance.

1) The overall tracking performance on the challenge datasets: From Tables V, VII, and IX, we see that the overall performance of the trackers on the MOT2015 dataset is lower than on the other two datasets. The overall accuracy of the trackers on MOT2015 is almost always less than 45%, whereas on the other two datasets it is over 45%. Although MOT2017 includes the same videos as MOT2016, the overall performance of the trackers on the MOT2017 dataset is higher than on MOT2016, as shown in Table IX. There are two main reasons. The first reason is the improvement in detection performance. For example, the ACF [65] detector used on MOT2015 was the most advanced detector in 2015. Subsequently, more advanced detectors such as DPM [67], SDP [69], and FRCNN [68] were applied in MOT2017. In addition, compared to MOT2016, multiple detectors can be selected in MOT2017, and SDP and FRCNN outperform DPM. The second reason is the use of different-level features. To explain, the features used in some trackers on MOT2015 were handcrafted features; however, the features used in the vast majority of trackers on the other datasets were high-level features extracted by various CNNs.

2) Tracking performance with different detection results: As the detection result acts as an input to the MPT algorithm, it has a great impact on the performance of the tracking algorithm. We test eight state-of-the-art detectors, namely mask R-CNN (MASK) [138], Yolo v3 (YOLO) [139], cascade mask R-CNN (CAS) [140], Hybrid Task Cascade (HAC) [26], SDP [69], DPM [67], GT, and FRCNN [68], on the MOT2017 train dataset to obtain different detection results, as shown
TABLE X: Characteristics of TBD-based algorithms tested on MOT2017. JD denotes joint other detection, such as head detection; DSA denotes detection-scene analysis.
Year and Author Tracker Other object localisation Appearance Data association Pre-processing Post-processing MOTA(↑) Hz(↑) Open source
2019 Feng et al. SAC [50] SOT Deep On NMS No 54.7 1.5 No
2019 Bergmann et al. TWBW [100] CMC Deep On NMS No 53.5 1.5 Yes
2019 Henschel et al. BJD [28] JD Deep NF No MAF 52.6 5.4 No
2019 Chu et al. FAMA [49] SOT Deep NF No No 52.0 0 No
2019 Wang et al. ETC [105] EG Deep MCSM No No 51.9 0.7 No
2019 Sheng et al. HAGF [135] SOT Deep MWIS No No 51.8 0.7 No
2018 Sheng et al. AFN [32] KF Deep NF NMS No 51.5 1.8 No
2018 Henschel et al. FHFB [136] JD Handcrafted CRF No No 51.3 0.2 No
2019 Chen et al. ATA [130] No Deep On No No 51.3 17.8 No
2018 Keuper et al. MSA [83] Segmentation Handcrafted MCSM No No 51.2 1.8 No
2019 Xu et al. STRN [103] No Deep On No MAF 50.9 13.8 No
2018 Long et al. MOTDT [56] KF Deep On NMS No 50.9 18.3 Yes
2018 Sheng et al. IMHT [113] No Deep MWIS No No 50.6 2.6 No
2019 Yoon et al. DTAMA [8] KF Deep On NMS No 50.3 1.5 No
2017 Chen et al. EDM [137] KF Deep MWIS DSA No 50.0 0.6 No
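Several trackers in Table X apply NMS as a pre-processing step on the raw detections. A minimal sketch of greedy IOU-based NMS follows; the box format and threshold are our own illustration, not any particular tracker's implementation:

```python
def iou(a, b):
    """IOU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, thresh=0.5):
    """Greedy non-maximum suppression: keep the highest-scoring box,
    drop boxes overlapping it by more than `thresh`, repeat."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= thresh]
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)  # the two heavily overlapping boxes collapse to one
```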
association and select four tracking algorithms (IOU [106], MOTDT [56], SST [141], and SORT [30]) for the testing purpose. Among these four trackers, IOU and SORT do not use the appearance feature, whereas both the appearance and the motion features are used in MOTDT and SST. For fair

[Figure: MODP (%) of the different detectors on the MOT2017 train set.]
[Figure: MOTA (in %) of the four trackers with the different detection results on the MOT2017 train set; panels (a)-(d).]
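Of these four, the IOU tracker [106] relies on box overlap alone. A minimal sketch of that kind of greedy frame-to-frame association follows; the box format, threshold, and function names are our own illustration, not the reference implementation:

```python
def iou(a, b):
    """IOU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def associate(prev_boxes, new_boxes, sigma=0.5):
    """Greedily match each existing track's last box to the unmatched new
    detection with the highest IOU, accepting matches above `sigma`."""
    matches, used = {}, set()
    for tid, pb in prev_boxes.items():
        best, best_iou = None, sigma
        for j, nb in enumerate(new_boxes):
            if j in used:
                continue
            overlap = iou(pb, nb)
            if overlap > best_iou:
                best, best_iou = j, overlap
        if best is not None:
            matches[tid] = best
            used.add(best)
    return matches

tracks = {0: (0, 0, 10, 10), 1: (100, 100, 120, 120)}
dets = [(101, 102, 121, 122), (1, 0, 11, 10)]
m = associate(tracks, dets)
```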
features used in these trackers are high-level features extracted by the CNN. As we discussed earlier, the features extracted by a CNN are more robust. As a result, these high-level features can achieve better performance than traditional handcrafted features.

5) Tracking performance with different data association types: The performance of offline trackers as a whole is better than that of online trackers on MOT2017, as shown in Tables IX and X (seven of the top ten tracking algorithms are offline trackers). Basically, the offline trackers consider MOT as a global optimization problem and leverage various optimization methods, such as minimum cost subgraph multicut (MCSM). Unlike online trackers, which only consider the current and past frame information, both past and future frame information is used in the offline trackers. In fact, the solutions of the online methods are local optima, while those of the offline methods are global optima. Therefore, the tracking performance of offline methods is higher than that of online methods.

V. EXISTING ISSUES AND FUTURE RESEARCH DIRECTIONS

In this section, we endeavor to present some of the existing issues and outline the future research directions of TBD-based algorithms.

A. Existing Issues

1) Limited open source code. Unlike other computer vision tasks, MPT has limited open source code. Only a few algorithms in Tables VI, VIII, and X provide the source code. This phenomenon limits further advancement due to the difficulty of reproducing the results. Since there are many steps in a TBD framework, code reproduction is hard for researchers, especially for beginners.

2) The tracking performance highly depends on object detection results. As discussed above, the first step in TBD is to obtain the detection results. Note that an identical algorithm would produce different tracking results with significant performance differences by using different detection results while fixing the other components [76], [86], [89], [108], [122], [126].

3) Tradeoff between accuracy and speed. As discussed above, the balance between accuracy and speed is very important in MPT. As shown in Tables VI, VIII, and X, some algorithms focus on accuracy and thus use offline methods and other manipulations in their trackers. For example, the MOTA of the SAC tracker [50] can reach up to 54.7%, but its speed is only 1.5 Hz. It is hard to meet the requirements of real-time applications, such as autonomous driving [142]. In contrast, some researchers focus more on speed and design online methods to improve it, but the accuracy is not as good. For example, the speed of the EAG tracker [133] can reach up to 197.3 Hz on MOT2017, but its accuracy is low (i.e., 47.4%) compared to the others.

4) Other challenges. Note that the appearance feature is one of the important cues for calculating the similarity between two pedestrians in data association. At the same time, occlusion occurs frequently in crowded scenes. Although many approaches have already been proposed to solve the similar appearance [50], [72], [105], [143] and occlusion problems [83], [104], [120], the tracking performance is still poor.

B. Future research directions

Although many TBD-based algorithms have been proposed in recent years, there are still research gaps in MPT. In the following, we outline some possible research directions.

1) MPT with end-to-end model. The TBD framework involves multiple individual data processing steps that are optimized separately from each other, which results in complex method design and extensive parameter tuning to adapt to different target categories and tracking scenarios. At present, a few researchers have started to design end-to-end models to track multiple pedestrians [32], [57], [112], [141], [144]–[146]. Milan et al. [57] designed an end-to-end model that contains the four steps of the TBD framework in a single network. Shen et al. [32] proposed an end-to-end tracklet association module to associate tracklets. Sun et al. [141] designed an end-to-end fashion for the association by jointly modeling pedestrian appearances and their affinities between different frames.

2) Joint task-based MPT. Missed detections often occur in MPT in crowded scenes. Other vision tasks, such as SOT and segmentation, can help MPT to localize pedestrians better. A joint task combining MPT and other vision tasks not only can reduce the number of missed pedestrians, but also can improve the performance of the other vision tasks. Basically, SOT utilizes an appearance template to search for the pedestrian location in the next frame, so it is suitable for short-term prediction in MPT in crowded scenes [48]–[50], [71]. Tracking and segmentation are closely related, and they can help each other. Object segmentation would separate
pedestrians from other targets and the background, which will be useful for locating the person in every frame [52], [53], [89].

3) Multiple 3D pedestrian tracking. The significant challenges in MPT include noise in pedestrian detection, appearance change, and identity switches caused by pedestrian occlusion and similar appearance between pedestrians in a group. Because 2D tracking cannot obtain the spatial coordinate information of the pedestrian, it does not support shape-related measurements, such as thickness and discriminative features. With the increasing demand for precision in many applications (e.g., autonomous navigation, robotics, and sport), 3D tracking has become more popular [143], [147]. Compared with 2D tracking, 3D tracking can offer more compelling information, such as depth information that helps predict the object movement, and the scale can be more reliable. Besides, 3D geometry information can also be leveraged in the formulation of data association in the TBD framework [143].

VI. CONCLUSION

This survey presents a comprehensive review of the TBD framework. We first introduced the development of TBD-based algorithms with a timeline. Afterward, the main procedures of the TBD framework are summarized. We also presented the main approaches in each step in detail. Moreover, the evaluation metrics and publicly available datasets are discussed. Besides, the performance and characteristics of existing TBD-based methods on these datasets are analyzed. Finally, this article outlines important research issues that need to be solved in the TBD-based algorithms for MPT.

REFERENCES

[1] W. Luo, J. Xing, A. Milan, X. Zhang, W. Liu, X. Zhao, and T.-K. Kim, “Multiple object tracking: A literature review,” arXiv preprint arXiv:1409.7618, 2014.
[2] C. Qiu, Z. Zhang, H. Lu, and H. Luo, “A survey of motion-based multitarget tracking methods,” Prog. Electromagn. Res. B, vol. 62, pp. 195–223, 2015.
[3] M. Camplani, A. Paiement, M. Mirmehdi, D. Damen, S. Hannuna, T. Burghardt, and L. Tao, “Multiple human tracking in RGB-depth data: A survey,” IET Comput. Vis., vol. 11, no. 4, pp. 265–285, Jun. 2017.
[4] S. Zhou, M. Ke, J. Qiu, and J. Wang, “A survey of multi-object video tracking algorithms,” in Proc. Adv. Intell. Sys. Comput., Jul. 2018, pp. 351–369.
[5] Y. Xu, X. Zhou, S. Chen, and F. Li, “Deep learning for multiple object tracking: A survey,” IET Comput. Vis., vol. 13, no. 4, pp. 355–368, Jan. 2019.
[6] G. Ciaparrone, F. L. Sánchez, S. Tabik, L. Troiano, R. Tagliaferri, and F. Herrera, “Deep learning in video multi-object tracking: A survey,” Neurocomputing, vol. 381, pp. 61–88, Mar. 2020.
[7] V. K. Singh, B. Wu, and R. Nevatia, “Pedestrian tracking by associating tracklets using detection residuals,” in IEEE Workshop Motion Video Comput., Jan. 2008, pp. 1–8.
[8] Y.-C. Yoon, D. Y. Kim, K. Yoon, Y.-m. Song, and M. Jeon, “Online multiple pedestrian tracking using deep temporal appearance matching association,” arXiv preprint arXiv:1907.00831, 2019.
[9] T. Kimura, M. Ohashi, R. Okada, and H. Ikeno, “A new approach for the simultaneous tracking of multiple honeybees for analysis of hive behavior,” Apidologie, vol. 42, no. 5, pp. 607–617, Sept. 2011.
[10] H. Rahmani, A. Mian, and M. Shah, “Learning a deep model for human action recognition from novel viewpoints,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 40, no. 3, pp. 667–681, Mar. 2017.
[11] W. Ruan, W. Liu, Q. Bao, J. Chen, Y. Cheng, and T. Mei, “Poinet: Pose-guided ovonic insight network for multi-person pose tracking,” in Proc. ACM Int. Conf. Multimed., Oct. 2019, pp. 284–292.
[12] W. Ruan, J. Chen, Y. Wu, J. Wang, C. Liang, R. Hu, and J. Jiang, “Multi-correlation filters with triangle-structure constraints for object tracking,” IEEE Trans. Multimedia, vol. 21, no. 5, pp. 1122–1134, May 2018.
[13] M. Ye, C. Liang, Y. Yu, Z. Wang, Q. Leng, C. Xiao, J. Chen, and R. Hu, “Person reidentification via ranking aggregation of similarity pulling and dissimilarity pushing,” IEEE Trans. Multimedia, vol. 18, no. 12, pp. 2553–2566, Dec. 2016.
[14] B. Yang and R. Nevatia, “Online learned discriminative part-based appearance models for multi-human tracking,” in Proc. Eur. Conf. Comput. Vis., Oct. 2012, pp. 484–498.
[15] L. Zhang and L. van der Maaten, “Preserving structure in model-free tracking,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 36, no. 4, pp. 756–769, Apr. 2013.
[16] H. Morimitsu, I. Bloch, and R. M. Cesar-Jr, “Exploring structure for long-term tracking of multiple objects in sports videos,” Comput. Vis. Image Underst., vol. 159, pp. 89–104, Jun. 2017.
[17] L. Zhang and L. van der Maaten, “Structure preserving object tracking,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2013, pp. 1838–1845.
[18] W. Hu, X. Li, W. Luo, X. Zhang, S. Maybank, and Z. Zhang, “Single and multiple object tracking using log-euclidean riemannian subspace and block-division appearance model,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 34, no. 12, pp. 2420–2440, Dec. 2012.
[19] L. Zhang, Y. Li, and R. Nevatia, “Global data association for multi-object tracking using network flows,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2008, pp. 1–8.
[20] H. Pirsiavash, D. Ramanan, and C. C. Fowlkes, “Globally-optimal greedy algorithms for tracking a variable number of objects,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2011, pp. 1201–1208.
[21] A. Dehghan, Y. Tian, P. H. S. Torr, and M. Shah, “Target identity-aware network flow for online multiple target tracking,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2015, pp. 1146–1154.
[22] A. A. Butt and R. T. Collins, “Multi-target tracking by lagrangian relaxation to min-cost network flow,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2013, pp. 1846–1853.
[23] W. Choi and S. Savarese, “A unified framework for multi-target tracking and collective activity recognition,” in Proc. Eur. Conf. Comput. Vis., Oct. 2012, pp. 215–230.
[24] V. Chari, S. Lacoste-Julien, I. Laptev, and J. Sivic, “On pairwise costs for network flow multi-object tracking,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2015, pp. 5537–5545.
[25] S. Schulter, P. Vernaza, W. Choi, and M. Chandraker, “Deep network flow for multi-object tracking,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jan. 2017, pp. 6951–6960.
[26] K. Chen, J. Pang, J. Wang, Y. Xiong, X. Li, S. Sun, W. Feng, Z. Liu, J. Shi, W. Ouyang et al., “Hybrid task cascade for instance segmentation,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2019, pp. 4974–4983.
[27] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick, “Microsoft COCO: Common objects in context,” in Proc. Eur. Conf. Comput. Vis., Sept. 2014, pp. 740–755.
[28] R. Henschel, Y. Zou, and B. Rosenhahn, “Multiple people tracking using body and joint detections,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2019, pp. 1–10.
[29] X. Yan, X. Wu, I. A. Kakadiaris, and S. K. Shah, “To track or to detect? An ensemble framework for optimal selection,” in Proc. Eur. Conf. Comput. Vis., Sept. 2012, pp. 594–607.
[30] N. Wojke, A. Bewley, and D. Paulus, “Simple online and realtime tracking with a deep association metric,” in Proc. Int. Conf. Image Process., Sept. 2017, pp. 3645–3649.
[31] J. Son, M. Baek, M. Cho, and B. Han, “Multi-object tracking with quadruplet convolutional neural networks,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jan. 2017, pp. 5620–5629.
[32] H. Shen, L. Huang, C. Huang, and W. Xu, “Tracklet association tracker: An end-to-end learning-based association approach for multi-object tracking,” arXiv preprint arXiv:1808.01562, 2018.
[33] K. Shafique, M. W. Lee, and N. Haering, “A rank constrained continuous formulation of multi-frame multi-target tracking problem,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2008, pp. 1–8.
[34] W. Brendel, M. Amer, and S. Todorovic, “Multiobject tracking as maximum weight independent set,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2011, pp. 1273–1280.
[35] B. Yang, C. Huang, and R. Nevatia, “Learning affinities and dependencies for multi-target tracking using a CRF model,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2011, pp. 1233–1240.
[36] A. Milan, K. Schindler, and S. Roth, “Detection- and trajectory-level exclusion in multiple object tracking,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2013, pp. 3682–3689.
[37] H. Zhou, W. Ouyang, J. Cheng, X. Wang, and H. Li, “Deep continuous conditional random fields with asymmetric inter-object constraints for online multi-object tracking,” IEEE Trans. Circuits Syst. Video Technol., vol. 29, no. 4, pp. 1011–1022, Apr. 2018.
[38] A. Andriyenko and K. Schindler, “Multi-target tracking by continuous energy minimization,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2011, pp. 1265–1272.
[39] A. Andriyenko, K. Schindler, and S. Roth, “Discrete-continuous optimization for multi-target tracking,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2012, pp. 1926–1933.
[40] A. Milan, K. Schindler, and S. Roth, “Multi-target tracking by discrete-continuous energy minimization,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 38, no. 10, pp. 2054–2068, Oct. 2016.
[41] M. Thoreau and N. Kottege, “Improving online multiple object tracking with deep metric learning,” arXiv preprint arXiv:1806.07592, 2018.
[42] A. R. Zamir, A. Dehghan, and M. Shah, “GMCP-tracker: Global multi-object tracking using generalized minimum clique graphs,” in Proc. Eur. Conf. Comput. Vis., Oct. 2012, pp. 343–356.
[43] A. Dehghan, S. Modiri Assari, and M. Shah, “GMMCP tracker: Globally optimal generalized maximum multi clique problem for multiple object tracking,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2015, pp. 4091–4099.
[44] S. Tang, B. Andres, M. Andriluka, and B. Schiele, “Subgraph decomposition for multi-target tracking,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2015, pp. 5033–5041.
[45] S. Tang, B. Andres, and M. Andriluka, “Multi-person tracking by multicut and deep matching,” in Proc. Eur. Conf. Comput. Vis., Oct. 2016, pp. 100–111.
[46] S. Tang, M. Andriluka, B. Andres, and B. Schiele, “Multiple people tracking by lifted multicut and person re-identification,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2017, pp. 3539–3548.
[47] J. Xing, H. Ai, and S. Lao, “Multi-object tracking through occlusions by local tracklets filtering and global tracklets association with detection responses,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2009, pp. 1200–1207.
[48] H. Wu and W. Li, “Robust online multi-object tracking based on KCF trackers and reassignment,” in IEEE Glob. Conf. Signal Inf. Process., Apr. 2016, pp. 124–128.
[49] P. Chu and H. Ling, “FAMNet: Joint learning of feature, affinity and multi-dimensional assignment for online multiple object tracking,” in Proc. IEEE Int. Conf. Comput. Vis., Oct. 2019, pp. 6172–6181.
[50] W. Feng, Z. Hu, W. Wu, J. Yan, and W. Ouyang, “Multi-object tracking with multiple cues and switcher-aware classification,” arXiv preprint arXiv:1901.06129, 2019.
[51] Q. Zhang and K. N. Ngan, “Segmentation and tracking multiple objects under occlusion from multiview video,” IEEE Trans. Image Process., vol. 20, no. 11, pp. 3308–3313, Nov. 2011.
[52] A. Milan, L. Leal-Taixé, K. Schindler, and I. Reid, “Joint tracking and segmentation of multiple targets,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2015, pp. 5397–5406.
[53] P. Voigtlaender, M. Krause, A. Osep, J. Luiten, B. B. G. Sekar, A. Geiger, and B. Leibe, “MOTS: Multi-object tracking and segmentation,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2019, pp. 7942–7951.
[54] C. Kim, F. Li, A. Ciptadi, and J. M. Rehg, “Multiple hypothesis tracking revisited,” in Proc. IEEE Int. Conf. Comput. Vis., Dec. 2015, pp. 4696–4704.
[55] Y. Xu, L. Qin, X. Liu, J. Xie, and S.-C. Zhu, “A causal and-or graph model for visibility fluent reasoning in tracking interacting objects,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2018, pp. 2178–2187.
[56] C. Long, A. Haizhou, Z. Zijie, and S. Chong, “Real-time multiple people tracking with deeply learned candidate selection and person re-identification,” in Proc. IEEE Int. Conf. Multimedia Expo., Jul. 2018, pp. 1–8.
[57] A. Milan, S. H. Rezatofighi, A. Dick, I. Reid, and K. Schindler, “Online multi-target tracking using recurrent neural networks,” in AAAI Conf. Artif. Intell., Feb. 2017, pp. 4225–4232.
[58] M. Ullah and F. A. Cheikh, “Deep feature based end-to-end transportation network for multi-target tracking,” in Proc. IEEE Int. Conf. Image Process., Oct. 2018, pp. 3738–3742.
[59] W. Zhang, H. Zhou, S. Sun, Z. Wang, J. Shi, and C. C. Loy, “Robust multi-modality multi-object tracking,” in Proc. IEEE Int. Conf. Comput. Vis., Oct. 2019, pp. 2365–2374.
[60] S. Gautam, G. P. Meyer, C. Vallespi-Gonzalez, and B. C. Becker, “SDVTracker: Real-time multi-sensor association and tracking for self-driving vehicles,” arXiv preprint arXiv:2003.04447, 2020.
[61] H. Kuang, X. Liu, J. Zhang, and Z. Fang, “Multi-modality cascaded fusion technology for autonomous driving,” arXiv preprint arXiv:2002.03138, 2020.
[62] N. Dalal and B. Triggs, “Histograms of oriented gradients for human detection,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2005, pp. 886–893.
[63] P. F. Felzenszwalb, R. B. Girshick, D. McAllester, and D. Ramanan, “Object detection with discriminatively trained part-based models,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 32, no. 9, pp. 1627–1645, Sept. 2010.
[64] M. Piccardi, “Background subtraction techniques: A review,” in Proc. IEEE Int. Conf. Syst. Man Cybern., Oct. 2004, pp. 3099–3104.
[65] B. Yang, J. Yan, Z. Lei, and S. Z. Li, “Aggregate channel features for multi-view face detection,” in Proc. IEEE Int. Jt. Conf. Biom., Dec. 2014, pp. 1–8.
[66] L. Leal-Taixé, A. Milan, I. Reid, S. Roth, and K. Schindler, “MOTChallenge2015: Towards a benchmark for multi-target tracking,” arXiv preprint arXiv:1504.01942, 2015.
[67] M. A. Sadeghi and D. Forsyth, “30Hz object detection with DPM V5,” in Proc. Eur. Conf. Comput. Vis., Sept. 2014, pp. 65–79.
[68] S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: Towards real-time object detection with region proposal networks,” in Proc. Adv. Neural Inf. Process. Syst., Jan. 2015, pp. 91–99.
[69] F. Yang, W. Choi, and Y. Lin, “Exploit all the layers: Fast and accurate CNN object detector with scale dependent pooling and cascaded rejection classifiers,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2016, pp. 2129–2137.
[70] P. Dendorfer, H. Rezatofighi, A. Milan, J. Shi, D. Cremers, I. Reid, S. Roth, K. Schindler, and L. Leal-Taixé, “CVPR19 tracking and detection challenge: How crowded can it get?” arXiv preprint arXiv:1906.04567, 2019.
[71] S. Pan, Z. Tong, Y. Zhao, Z. Zhao, F. Su, and B. Zhuang, “Multi-object tracking hierarchically in visual data taken from drones,” in Proc. IEEE Int. Conf. Comput. Vis., Oct. 2019, pp. 135–143.
[72] Y.-m. Song, K. Yoon, Y.-C. Yoon, K.-C. Yow, and M. Jeon, “Online multi-object tracking framework with the GMPHD filter and occlusion group management,” arXiv preprint arXiv:1907.13347, 2019.
[73] X. Wan, J. Wang, and S. Zhou, “An online and flexible multi-object tracking framework using long short-term memory,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2018, pp. 1230–1238.
[74] T. Badal, N. Nain, and M. Ahmed, “Online multi-object tracking: Multiple instance based target appearance model,” Multimed. Tools Appl., vol. 77, no. 19, pp. 25199–25221, Oct. 2018.
[75] J. Ju, D. Kim, B. Ku, D. K. Han, and H. Ko, “Online multi-object tracking based on hierarchical association framework,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2016, pp. 34–42.
[76] R. Sanchez-Matilla, F. Poiesi, and A. Cavallaro, “Online multi-target tracking with strong and weak detections,” in Proc. Eur. Conf. Comput. Vis., Oct. 2016, pp. 84–99.
[77] J.-W. Choi, D. Moon, and J.-H. Yoo, “Robust multi-person tracking for real-time intelligent video surveillance,” ETRI J., vol. 37, no. 3, pp. 551–561, Jun. 2015.
[78] S. Tian, F. Yuan, and G.-S. Xia, “Multi-object tracking with inter-feedback between detection and tracking,” Neurocomputing, vol. 171, pp. 768–780, Jan. 2016.
[79] C. M. Bukey, S. V. Kulkarni, and R. A. Chavan, “Multi-object tracking using kalman filter and particle filter,” in Proc. IEEE Int. Conf. Power, Control, Signals Instrum. Eng., Sept. 2017, pp. 1688–1692.
[80] A. Milan, S. Roth, and K. Schindler, “Continuous energy minimization for multitarget tracking,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 36, no. 1, pp. 58–72, Jan. 2014.
[81] M. Yang, Y. Wu, and Y. Jia, “A hybrid data association framework for robust online multi-object tracking,” IEEE Trans. Image Process., vol. 26, no. 12, pp. 5667–5679, Dec. 2017.
[82] M. Yang and Y. Jia, “Temporal dynamic appearance modeling for online multi-person tracking,” Comput. Vis. Image Underst., vol. 153, pp. 16–28, Dec. 2016.
[83] M. Keuper, S. Tang, B. Andres, T. Brox, and B. Schiele, “Motion segmentation & multiple object tracking by correlation co-clustering,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 42, no. 1, pp. 140–153, Jan. 2018.
[84] M. Keuper, S. Tang, Y. Zhongjie, B. Andres, T. Brox, and B. Schiele, “A multi-cut formulation for joint segmentation and tracking of multiple objects,” arXiv preprint arXiv:1607.06317, 2016.
[85] C. Kim, F. Li, and J. M. Rehg, “Multi-object tracking with neural gating using bilinear LSTM,” in Proc. Eur. Conf. Comput. Vis., Sept. 2018, pp. 200–215.
[86] K. Fang, Y. Xiang, X. Li, and S. Savarese, “Recurrent autoregressive networks for online multi-object tracking,” in Proc. IEEE Winter Conf. Appl. Comput. Vis., Mar. 2018, pp. 466–475.
[87] L. Ren, J. Lu, Z. Wang, Q. Tian, and J. Zhou, “Collaborative deep reinforcement learning for multi-object tracking,” in Proc. Eur. Conf. Comput. Vis., Sept. 2018, pp. 586–602.
[88] M. D. Breitenstein, F. Reichlin, B. Leibe, E. Koller-Meier, and L. Van Gool, “Robust tracking-by-detection using a detector confidence particle filter,” in Proc. IEEE Int. Conf. Comput. Vis., Oct. 2009, pp. 1515–1522.
[89] Y. Xiang, A. Alahi, and S. Savarese, “Learning to track: Online multi-object tracking by decision making,” in Proc. IEEE Int. Conf. Comput. Vis., Dec. 2015, pp. 4705–4713.
[90] T. Fernando, S. Denman, S. Sridharan, and C. Fookes, “Tracking by prediction: A deep generative model for multi-person localisation and tracking,” in Proc. IEEE Workshop Appl. Comput. Vis., Dec. 2018, pp. 1122–1132.
[91] B. Yang and R. Nevatia, “Multi-target tracking by online learning a CRF model of appearance and motion patterns,” Int. J. Comput. Vis., vol. 107, no. 2, pp. 203–217, Apr. 2014.
[92] L. Wang, N. T. Pham, T.-T. Ng, G. Wang, K. L. Chan, and K. Leman, “Learning deep features for multiple object tracking by using a multi-task learning strategy,” in Proc. Int. Conf. Image Process., Jan. 2014, pp. 838–842.
[93] X. Wang and Q. Wang, “Coupled data association and L1 minimization for multiple object tracking under occlusion,” in Proc. SPIE Int. Soc. Opt. Eng., Oct. 2014, pp. 1–22.
[94] H. Izadinia, I. Saleemi, W. Li, and M. Shah, “(MP)²T: Multiple people multiple parts tracker,” in Proc. Eur. Conf. Comput. Vis., Oct. 2012, pp. 100–114.
[95] F. Zhao, J. Wang, Y. Wu, and M. Tang, “Adversarial deep tracking,” IEEE Trans. Circuits Syst. Video Technol., vol. 29, no. 7, pp. 1998–2011, Jul. 2018.
[96] M. Rodriguez, S. Ali, and T. Kanade, “Tracking in unstructured crowded scenes,” in Proc. IEEE Int. Conf. Comput. Vis., Oct. 2009, pp. 1389–1396.
[97] S. Walk, N. Majer, K. Schindler, and B. Schiele, “New features and insights for pedestrian detection,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2010, pp. 1030–1037.
[98] C. Jia, Z. Wang, X. Wu, B. Cai, Z. Huang, G. Wang, T. Zhang, and D. Tong, “A tracking-learning-detection (TLD) method with local binary pattern improved,” in Proc. IEEE Int. Conf. Robotics Biomimetics, Dec. 2015, pp. 1625–1630.
[99] P. P. Dash, D. Patra, and S. K. Mishra, “Local binary pattern as a texture feature descriptor in object tracking algorithm,” in Proc. Adv. Intell. Sys. Comput., Jun. 2014, pp. 541–548.
[100] P. Bergmann, T. Meinhardt, and L. Leal-Taixé, “Tracking without bells and whistles,” in Proc. IEEE Int. Conf. Comput. Vis., Oct. 2019, pp. 941–951.
[101] W. Li, J. Mu, and G. Liu, “Multiple object tracking with motion and appearance cues,” in Proc. IEEE Int. Conf. Comput. Vis. Workshops, Oct. 2019, pp. 161–169.
[102] Y. Xu, X. Liu, L. Yang, and S. C. Zhu, “Multi-view people tracking via hierarchical trajectory composition,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2016, pp. 4256–4265.
[103] J. Xu, Y. Cao, Z. Zhang, and H. Hu, “Spatial-temporal relation networks for multi-object tracking,” in Proc. IEEE Int. Conf. Comput. Vis., Oct. 2019, pp. 3988–3998.
[104] X. Gao and T. Jiang, “OSMO: Online specific models for occlusion in multiple object tracking under surveillance scene,” in Proc. ACM Multimed. Conf., Oct. 2018, pp. 201–210.
[105] G. Wang, Y. Wang, H. Zhang, R. Gu, and J.-N. Hwang, “Exploit the connectivity: Multi-object tracking with TrackletNet,” in Proc. ACM Multimed. Conf., Oct. 2019, pp. 482–490.
[106] E. Bochinski, V. Eiselein, and T. Sikora, “High-speed tracking-by-detection without using image information,” in Proc. IEEE Int. Conf. Adv. Video Signal Based Surveill., Oct. 2017, pp. 1–6.
[107] A. Bewley, Z. Ge, L. Ott, F. Ramos, and B. Upcroft, “Simple online and realtime tracking,” in Proc. Int. Conf. Image Process., Aug. 2016, pp. 3464–3468.
[108] S. H. Bae and K. J. Yoon, “Confidence-based data association and discriminative deep appearance learning for robust online multi-object tracking,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 40, no. 3, pp. 595–610, Mar. 2018.
[109] H. Yu, Q. Lei, Q. Huang, and H. Yao, “Online multiple object tracking via exchanging object context,” Neurocomputing, vol. 292, no. 31, pp. 28–37, May 2018.
[110] H. Kieritz, W. Hubner, and M. Arens, “Joint detection and online multi-object tracking,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2018, pp. 1540–1548.
[111] B. Wang, G. Wang, K. L. Chan, and L. Wang, “Tracklet association with online target-specific metric learning,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2014, pp. 1234–1241.
[112] J. Xiang, M. Chao, G. Xu, and J. Hou, “End-to-end learning deep CRF models for multi-object tracking,” arXiv preprint arXiv:1907.12176, 2019.
[113] H. Sheng, J. Chen, Y. Zhang, W. Ke, Z. Xiong, and J. Yu, “Iterative multiple hypothesis tracking with tracklet-level association,” IEEE Trans. Circuits Syst. Video Technol., vol. 14, no. 8, pp. 1–13, Dec. 2019.
[114] L. Ma, S. Tang, M. J. Black, and L. Van Gool, “Customized multi-person tracker,” in Lect. Notes Comput. Sci., Dec. 2018, pp. 612–628.
[115] N. Bodla, B. Singh, R. Chellappa, and L. S. Davis, “Soft-NMS: Improving object detection with one line of code,” in Proc. IEEE Int. Conf. Comput. Vis., Oct. 2017, pp. 5561–5569.
[116] U. Iqbal, A. Milan, and J. Gall, “PoseTrack: Joint multi-person pose estimation and tracking,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2017, pp. 2011–2020.
[117] Y. Xu, Y. Ban, X. Alameda-Pineda, and R. Horaud, “DeepMOT: A differentiable framework for training multiple object trackers,” arXiv preprint arXiv:1906.06618, 2019.
[118] E. Insafutdinov, M. Andriluka, L. Pishchulin, S. Tang, E. Levinkov, B. Andres, and B. Schiele, “ArtTrack: Articulated multi-person tracking in the wild,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2017, pp. 6457–6465.
[119] A. Maksai, X. Wang, F. Fleuret, and P. Fua, “Non-Markovian globally consistent multi-object tracking,” in Proc. IEEE Int. Conf. Comput. Vis., Oct. 2017, pp. 2544–2554.
[120] L. Lan, X. Wang, S. Zhang, D. Tao, W. Gao, and T. S. Huang, “Interacting tracklets for multi-object tracking,” IEEE Trans. Image Process., vol. 27, no. 9, pp. 4585–4597, Sept. 2018.
[121] P. Chu, H. Fan, C. C. Tan, and H. Ling, “Online multi-object tracking with instance-aware tracker and dynamic model refreshment,” in Proc. IEEE Workshop Appl. Comput. Vis., Jan. 2019, pp. 161–170.
[122] L. Chen, H. Ai, C. Shang, Z. Zhuang, and B. Bai, “Online multi-object tracking with convolutional neural networks,” in Proc. Int. Conf. Image Process., Sept. 2017, pp. 645–649.
[123] A. Sadeghian, A. Alahi, and S. Savarese, “Tracking the untrackable: Learning to track multiple cues with long-term dependencies,” in Proc. IEEE Int. Conf. Comput. Vis., Oct. 2017, pp. 300–311.
[124] H. Wu, Y. Hu, K. Wang, H. Li, L. Nie, and H. Cheng, “Instance-aware representation learning and association for online multi-person tracking,” Pattern Recognit., vol. 94, pp. 25–34, Oct. 2019.
[125] Q. Chu, W. Ouyang, H. Li, X. Wang, B. Liu, and N. Yu, “Online multi-object tracking using CNN-based single object tracker with spatial-temporal attention mechanism,” in Proc. IEEE Int. Conf. Comput. Vis., Oct. 2017, pp. 4836–4845.
[126] B. Wang, G. Wang, K. L. Chan, and L. Wang, “Tracklet association by online target-specific metric learning and coherent dynamics estimation,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 39, no. 3, pp. 589–602, Feb. 2017.
[127] R. Kasturi, D. Goldgof, P. Soundararajan, V. Manohar, J. Garofolo, R. Bowers, M. Boonstra, V. Korzhova, and J. Zhang, “Framework for performance evaluation of face, text, and vehicle detection and tracking in video: Data, metrics, and protocol,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 31, no. 2, pp. 319–336, Feb. 2009.
[128] K. Bernardin and R. Stiefelhagen, “Evaluating multiple object tracking performance: The CLEAR MOT metrics,” EURASIP J. Image Video Process., vol. 2008, pp. 1–11, 2008.
[129] E. Ristani, F. Solera, R. Zou, R. Cucchiara, and C. Tomasi, “Performance measures and a data set for multi-target, multi-camera tracking,” in Proc. Eur. Conf. Comput. Vis., Oct. 2016, pp. 17–35.
[130] L. Chen, H. Ai, R. Chen, and Z. Zhuang, “Aggregate tracklet appearance features for multi-object tracking,” IEEE Signal Process. Lett., vol. 26, no. 11, pp. 1613–1617, Nov. 2019.
[131] C. Ma, C. Yang, F. Yang, Y. Zhuang, Z. Zhang, H. Jia, and X. Xie, “Trajectory factory: Tracklet cleaving and re-connection by deep siamese Bi-GRU for multiple object tracking,” in Proc. IEEE Int. Conf. Multimedia Expo., Jul. 2018, pp. 1–6.
[132] E. Levinkov, J. Uhrig, S. Tang, M. Omran, E. Insafutdinov, A. Kirillov, C. Rother, T. Brox, B. Schiele, and B. Andres, “Joint graph decomposition & node labeling: Problem, algorithms, applications,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2017, pp. 6012–6020.
[133] H. Sheng, X. Zhang, Y. Zhang, Y. Wu, J. Chen, and Z. Xiong, “Enhanced association with supervoxels in multiple hypothesis tracking,” IEEE Access, vol. 7, pp. 2107–2117, 2018.
[134] W. Tian, M. Lauer, and L. Chen, “Online multi-object tracking using joint domain information in traffic scenarios,” IEEE Trans. Intell. Transp. Syst., vol. 20, no. 2, pp. 1–11, Jan. 2019.
[135] H. Sheng, Y. Zhang, J. Chen, Z. Xiong, and J. Zhang, “Heterogeneous association graph fusion for target association in multiple object tracking,” IEEE Trans. Circuits Syst. Video Technol., vol. 29, no. 11, pp. 3269–3280, Nov. 2019.
[136] R. Henschel, L. Leal-Taixé, D. Cremers, and B. Rosenhahn, “Fusion of head and full-body detectors for multi-object tracking,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2018, pp. 1428–1437.
[137] J. Chen, H. Sheng, Y. Zhang, and Z. Xiong, “Enhancing detection model for multiple hypothesis tracking,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2017, pp. 18–27.
[138] K. He, G. Gkioxari, P. Dollár, and R. Girshick, “Mask R-CNN,” in Proc. IEEE Int. Conf. Comput. Vis., Oct. 2017, pp. 2961–2969.
[139] J. Redmon and A. Farhadi, “YOLOv3: An incremental improvement,” arXiv preprint arXiv:1804.02767, 2018.
[140] Z. Cai and N. Vasconcelos, “Cascade R-CNN: Delving into high quality object detection,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2018, pp. 6154–6162.
[141] S. Sun, N. Akhtar, H. Song, A. S. Mian, and M. Shah, “Deep affinity network for multiple object tracking,” IEEE Trans. Pattern Anal. Mach. Intell., 2019.
[142] A. Agarwal and S. Suryavanshi, “Real-time* multiple object tracking (MOT) for autonomous navigation,” Tech. Rep., 2017.
[143] Z. Tang and J.-N. Hwang, “MOANA: An online learned adaptive appearance model for robust multiple object tracking in 3D,” IEEE Access, vol. 7, pp. 31934–31945, 2019.
[144] T. Hu, L. Huang, and H. Shen, “Multi-object tracking via end-to-end tracklet searching and ranking,” arXiv preprint arXiv:2003.02795, 2020.
[145] S. Wang, Y. Sun, C. Liu, and M. Liu, “PointTrackNet: An end-to-end network for 3-D object detection and tracking from point clouds,” IEEE Robot. Autom. Lett., Apr. 2020.
[146] J. Xiang, G. Xu, C. Ma, and J. Hou, “End-to-end learning deep CRF models for multi-object tracking,” IEEE Trans. Circuits Syst. Video Technol., 2020.
[147] E. Baser, V. Balasubramanian, P. Bhattacharyya, and K. Czarnecki, “FANTrack: 3D multi-object tracking with feature association network,” arXiv preprint arXiv:1905.02843, 2019.

Chao Liang received the Ph.D. degree from the National Lab of Pattern Recognition (NLPR), Institute of Automation, Chinese Academy of Sciences (CASIA), Beijing, China, in 2012. He is currently working as an associate professor at the National Engineering Research Center for Multimedia Software (NERCMS), Computer School of Wuhan University, Wuhan, China. His research interests focus on multimedia content analysis and retrieval, computer vision, and pattern recognition, where he has published over 60 papers, including premier conferences such as CVPR, ACM MM, AAAI, and IJCAI and honorable journals like TNNLS, TMM, and TCSVT, and won the best paper award of PCM 2014.

Weijian Ruan received the B.E. degree from the Electronic Information School of Wuhan University in 2014. Since September 2014, he has been pursuing his Ph.D. degree in the School of Computer Science, Wuhan University. From March 2018 to September 2018, he served as an intern at the National Institute of Informatics, Tokyo, Japan. From January 2019 to June 2019, he was an intern at JD AI Research, China. His research interests focus on computer vision and multimedia analysis, where he has published over 10 papers including AAAI, ACM MM, TMM, TOMM, ICME, ICIP, etc. In addition, he is active in his research field and has served as a reviewer for related journals and conferences, such as TMM, TIP, AAAI 2020, ACM MM 2020, ICME 2020, and ICASSP 2020.