PREPRINT: IEEE TRANS. SIGNAL PROCESSING, VOL. 68, 2754–2769, 2020.

A Solution for Large-Scale Multi-Object Tracking


Michael Beard, Ba Tuong Vo, Ba-Ngu Vo

M. Beard, B. T. Vo and B.-N. Vo are with the School of Electrical Engineering, Computing and Mathematical Sciences, Curtin University, Bentley, WA 6102, Australia.
This work is supported by the Australian Research Council under discovery projects DP170104584 and DP160104662.
This manuscript has supplementary downloadable material made available via IEEE Xplore. This includes two illustrative videos of the tracking results.

Abstract: A large-scale multi-object tracker based on the generalised labeled multi-Bernoulli (GLMB) filter is proposed. The algorithm is capable of tracking a very large, unknown and time-varying number of objects simultaneously, in the presence of a high number of false alarms, as well as missed detections and measurement origin uncertainty due to closely spaced objects. The algorithm is demonstrated on a simulated tracking scenario, where the peak number of objects appearing simultaneously exceeds one million. Additionally, we introduce a new method of applying the optimal sub-pattern assignment (OSPA) metric to determine a meaningful distance between two sets of tracks. We also develop an efficient strategy for its exact computation in large-scale scenarios to evaluate the performance of the proposed tracker.

Index Terms: Random finite sets, generalised labeled multi-Bernoulli, multi-object tracking, large-scale tracking

I. INTRODUCTION

Multi-object tracking is a problem with a wide variety of applications across diverse disciplines, and numerous effective solutions have been developed in recent decades [1], [2], [3]. The common goal of multi-object tracking is to estimate the trajectories of an unknown and time-varying number of objects, using sensor measurements corrupted by phenomena including observation noise, false alarms, missed detections, and data association uncertainty. The combination of these effects gives rise to a highly demanding computational task, with complexity that grows exponentially as the number of objects/measurements increases. Tracking a very large number of objects simultaneously (in excess of hundreds of thousands) is thus a challenging problem, with important practical applications. A few notable examples are: (i) space situational awareness, which requires tracking thousands of satellites and millions of debris objects [4], [5], [6]; (ii) wide area surveillance (e.g. monitoring large urban environments), which requires tracking hundreds of thousands of objects over time, including vehicles and people in crowded environments [7], [8], [9]; (iii) cell biology, where tracking the motion of large numbers of cells is critical to understanding their behaviour in living tissues [10], [11]; (iv) wildlife biology, where tracking large animal populations is needed to study the behaviour of wildlife in their natural habitats [16].

The total number of objects in a scenario is often quoted as a major factor influencing the computational complexity of a multi-object tracking problem. This is partially true, since an increase in the number of objects leads to an increase in the number of possible events that a tracking algorithm must consider. However, this is not the only concern, and a more meaningful indication of complexity is the density of the objects and measurements in both space and time. For example, it is straightforward to track a large cumulative number of objects over time, when only a small number are present simultaneously at any given instant. Likewise, a large number of spatially well-separated objects is relatively easy to track, since they can be considered as statistically independent entities. In these cases, it is likely that running single-object filters in parallel would suffice. Difficulties begin to arise when objects come into close spatial proximity from the point of view of the sensor, and the ensuing increase in data association ambiguity leads to a combinatorial explosion in the number of statistically likely observation events. In dynamic multi-object scenarios, the problem is further compounded by the motion of the objects.

Various methods have been proposed to address the computational challenges of tracking a large number of objects simultaneously. One of the earliest works on large-scale multi-object tracking proposed the use of efficient spatial searching algorithms to associate measurements to tracks [12]. Although capable of processing a very large number of objects, the proposed method does not account for dependencies between closely spaced objects, and thus its performance is likely to degrade in such cases. In [13] it was suggested that the same techniques could be applied to help improve the efficiency of the joint probabilistic data association (JPDA) algorithm, but no numerical results were provided. An alternative approach was proposed in [14], known as linear multi-target integrated probabilistic data association (LMIPDA). This method reduces computation using an approximation that treats nearby objects as additional sources of clutter, and the resulting algorithm was demonstrated on a simulated 50-object scenario. An approach based on 2D assignment of measurements to tracks was proposed in [15], with an application to large-scale air traffic surveillance. The scenario had a total of about 800 tracks; however, the target/measurement density and the number of simultaneous objects were not provided.

Algorithms based on multiple hypothesis tracking (MHT) have also been proposed for tracking large numbers of objects. For example, applications to cell tracking [10] and wildlife tracking [16] have been demonstrated on thousands of objects in total, with several hundred objects simultaneously [16]. The labeled multi-Bernoulli (LMB) filter, a one-term approximation of the generalised labeled multi-Bernoulli (GLMB) filter, has also been demonstrated on a simulated scenario with over a thousand objects simultaneously [17], and in [18] it was used with spatial searching to track hundreds of sea-ice objects.

An alternative approach to improving the scalability of multi-target tracking systems is the concept of distributed estimation, which spreads the computational load across multiple sensor nodes, each of which only processes observations from its local region. Random finite set (RFS) based multi-sensor fusion algorithms have been proposed for the PHD/CPHD filters [19],

[20], [21], [22], multi-Bernoulli filter [23], [24], [25], [26], and hybrid Poisson multi-Bernoulli filter [27]. However, these approaches are not true multi-object trackers, since only the current states are estimated. Multi-sensor fusion for multi-object tracking filters was first examined in [28], where fusion rules for the LMB filter [29] and marginalised GLMB filter [30] were proposed and demonstrated for multi-sensor multi-object tracking. A variation for the LMB filter was subsequently proposed in [31], where sensor fusion is performed based on a Cauchy-Schwarz divergence. Further works on robustness of distributed multi-object estimation have been reported in [32], [33], and the latest works on computationally efficient implementations were proposed in [34].

To satisfy the competing demands of computational efficiency versus accuracy, a trade-off is necessary. In this paper, we present a multi-object tracking filter that accomplishes this trade-off by exploiting the properties of GLMB densities, and the standard multi-object likelihood through a principled approximation. The result is an algorithm that can be applied to large-scale multi-object tracking scenarios exhibiting commonly encountered structural properties, without suffering from an intractable increase in computational complexity. The algorithm is highly effective when the scenario consists of isolated groups of high object/measurement density, and is capable of adapting to changes in the structure of these groups over time.

Our proposed method is based on functional approximation of the multi-object density (equivalent to a probability density for a finite-set-valued random variable [35]), a key element of the random finite set (RFS) approach, which encapsulates all information on the current set of tracks in a single non-negative function. Processing the large number of combinations of events translates to recursive computation of the multi-object filtering density [36], [37], [38]. Tractability hinges on efficient functional approximation/computation of the so-called GLMB filtering recursion, under limited processing/memory resources. Conceptually, the key enablers in our proposed large-scale tracker are: (i) adaptive approximation of the GLMB filtering density, at each time, by a product of tractable and approximately independent GLMB densities; and (ii) efficient parallel computation of these GLMB densities by exploiting the conjugacy of the GLMB family. This strategy is distinct from the approach in [17], where the GLMB is approximated by a single term. In essence, our strategy efficiently identifies and processes significant combinations by exploiting structural properties and parallelisation, to make the most of the limited computing resources. Consequently, while the focus of this paper is on large-scale problems, our solution also provides significant efficiency gains when applied to smaller scale problems.

Our study would be incomplete without evaluating the tracking performance of the proposed multi-object tracker, which is a challenging task in itself [39], [40]. We require a measure of dissimilarity between two sets of tracks, which: (i) is physically meaningful; (ii) satisfies the properties of a metric for mathematical consistency; and (iii) is computable for scenarios involving millions of tracks. The optimal sub-pattern assignment (OSPA) metric [41], in its most commonly used form, measures the distance between two sets of states, and does not take into account phenomena such as track switching and track fragmentation. Nonetheless, by developing a meaningful base-distance between two tracks, we are able to use OSPA to construct a physically meaningful distance between two sets of tracks, which we call OSPA$^{(2)}$ to distinguish it from the standard use. To evaluate the performance of the proposed tracker on a scenario involving an unknown and time-varying number of objects with a peak in excess of one million, we developed a scalable procedure for exact computation of the OSPA$^{(2)}$ metric. Preliminary results on OSPA$^{(2)}$ were published in [42], [43]. This paper provides complete mathematical details.

The rest of the paper is structured as follows. Section II provides the necessary background on the GLMB filter. Section III presents some theoretical results regarding the decomposition of GLMB densities. In Section IV we apply these results to implement an efficient GLMB filter, capable of handling large-scale multi-object tracking problems. Sections V and VI present the OSPA$^{(2)}$ metric, and its use to evaluate the proposed large-scale tracker. Some concluding remarks are given in Section VII. Mathematical proofs are given in the Appendix.

II. BACKGROUND: GENERALISED LABELED MULTI-BERNOULLI TRACKER

In multi-object systems, tracking is distinct from filtering, in the sense that tracking involves the estimation of the trajectories of objects over time, as opposed to the multi-object state at each time instant. The generalised labeled multi-Bernoulli (GLMB) filter is an algorithm that is specifically designed to provide estimates of object trajectories by modeling the multi-object state as a labeled random finite set (RFS). In this section we briefly revisit the GLMB filter, and the interested reader is referred to [36], [37], [38] for more detailed treatments.

We begin by defining the notion of a labeled RFS. Let $\mathbb{X}$ be a single-object state space, $\mathbb{L}$ a discrete label space, and $\mathcal{L} : \mathbb{X} \times \mathbb{L} \to \mathbb{L}$ the projection defined by $\mathcal{L}((x,\ell)) = \ell$ for all points $(x,\ell) \in \mathbb{X} \times \mathbb{L}$. Denote by $\mathcal{F}(S)$ the collection of all finite subsets of some underlying space $S$. Now consider $\boldsymbol{X} \in \mathcal{F}(\mathbb{X} \times \mathbb{L})$ and its corresponding label set $\mathcal{L}(\boldsymbol{X}) = \{\mathcal{L}(\boldsymbol{x}) : \boldsymbol{x} \in \boldsymbol{X}\}$. Then the labels of the points in $\boldsymbol{X}$ are distinct if and only if $\boldsymbol{X}$ and its label set $\mathcal{L}(\boldsymbol{X})$ have equal cardinality. This is expressed mathematically by defining a distinct label indicator function

$$\Delta(\boldsymbol{X}) \triangleq \delta_{|\boldsymbol{X}|}\left[|\mathcal{L}(\boldsymbol{X})|\right],$$

which has value 1 if the labels in $\boldsymbol{X}$ are distinct and 0 otherwise. A labeled RFS is defined as a marked RFS with distinct marks [36]. More precisely, a labeled RFS with state space $\mathbb{X}$ and label space $\mathbb{L}$ is an RFS of $\mathbb{X} \times \mathbb{L}$, constructed by marking the elements of an RFS of $\mathbb{X}$ with distinct labels from $\mathbb{L}$, i.e. any realisation $\boldsymbol{X}$ must satisfy $\Delta(\boldsymbol{X}) = 1$.

A. Multi-Object Dynamic Model

Given the multi-object state $\boldsymbol{X}$ (at time $k$) with label space $\mathbb{L}$, each $(x,\ell) \in \boldsymbol{X}$ either survives with probability $P_S(x,\ell)$ and evolves to a new state $(x_+,\ell_+)$ (at time $k+1$) with probability density $f_+(x_+|x,\ell)\,\delta_\ell[\ell_+]$, or dies with probability $1 - P_S(x,\ell)$. The surviving label space $\mathbb{L} \triangleq \mathbb{L}_k$ is given by a disjoint union of birth label spaces $\mathbb{B}_t$ for all times $t = 0,\dots,k$, i.e. $\mathbb{L}_k = \biguplus_{t=0}^{k} \mathbb{B}_t$. To ensure that the birth label spaces are disjoint, each birth label is constructed as an ordered pair, consisting of the birth time and a unique index, i.e.

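$\mathbb{B}_t = \{(t,i)\}_{i=1}^{|\mathbb{B}_t|}$. The set $\boldsymbol{B}_+$ of new objects (born at time $k+1$) with birth label space $\mathbb{B}_+ \triangleq \mathbb{B}_{k+1}$ is distributed according to the labeled multi-Bernoulli (LMB) density

$$\boldsymbol{f}_{B,+}^{(\mathbb{B}_+)}(\boldsymbol{B}_+) = \Delta(\boldsymbol{B}_+)\left[1_{\mathbb{B}_+} r_{B,+}\right]^{\mathcal{L}(\boldsymbol{B}_+)}\left[1 - r_{B,+}\right]^{\mathbb{B}_+ - \mathcal{L}(\boldsymbol{B}_+)} p_{B,+}^{\boldsymbol{B}_+},$$

where $[h]^{X} \triangleq \prod_{x \in X} h(x)$ (with $[h]^{\emptyset} = 1$) is a multi-object exponential, $r_{B,+}(\ell)$ is the probability that a new object with label $\ell$ is born, and $p_{B,+}(\cdot,\ell)$ is the distribution of its kinematic state [36]. The multi-object state $\boldsymbol{X}_+$ (at time $k+1$) with label space $\mathbb{L}_+ \triangleq \mathbb{L} \cup \mathbb{B}_+$ is formed by the union of surviving objects and new born objects. Using the standard assumption that, conditional on $\boldsymbol{X}$, objects move, appear and die independently of each other, the expression for the multi-object transition density $\boldsymbol{f}_+$ is given by [36], [37], [38]

$$\boldsymbol{f}_+(\boldsymbol{X}_+|\boldsymbol{X}) = \boldsymbol{f}_{S,+}\left(\boldsymbol{X}_+ \cap (\mathbb{X} \times \mathbb{L})\,|\,\boldsymbol{X}\right)\,\boldsymbol{f}_{B,+}^{(\mathbb{B}_+)}\left(\boldsymbol{X}_+ - (\mathbb{X} \times \mathbb{L})\right),$$

where

$$\boldsymbol{f}_{S,+}(\boldsymbol{W}|\boldsymbol{X}) = \Delta(\boldsymbol{W})\,\Delta(\boldsymbol{X})\,1_{\mathcal{L}(\boldsymbol{X})}(\mathcal{L}(\boldsymbol{W}))\left[\Phi(\boldsymbol{W};\cdot)\right]^{\boldsymbol{X}},$$

$$\Phi(\boldsymbol{W};x,\ell) = \left(1 - 1_{\mathcal{L}(\boldsymbol{W})}(\ell)\right)\left(1 - P_S(x,\ell)\right) + \sum_{(x_+,\ell_+) \in \boldsymbol{W}} \delta_\ell[\ell_+]\,P_S(x,\ell)\,f_+(x_+|x,\ell).$$

B. Multi-Object Observation Model

For a given multi-object state $\boldsymbol{X}$, each $(x,\ell) \in \boldsymbol{X}$ is either detected with probability $P_D(x,\ell)$ and generates a detection $z$ with likelihood $g(z|x,\ell)$, or missed with probability $1 - P_D(x,\ell)$. The multi-object observation $Z$ is the union of the observations from detected objects and Poisson clutter with intensity $\kappa$. The multi-object likelihood function is given by [36], [37], [38]

$$g(Z|\boldsymbol{X}) \propto \sum_{\theta \in \Theta(\mathcal{L}(\boldsymbol{X}))}\ \prod_{(x,\ell) \in \boldsymbol{X}} \psi_Z^{(\theta(\ell))}(x,\ell), \tag{1}$$

where $\Theta$ is the set of positive 1-1 maps $\theta : \mathbb{L} \to \{0{:}|Z|\}$, i.e. maps such that no two distinct labels are assigned the same positive value, $\Theta(I)$ is the subset of $\Theta$ with domain $I$, and

$$\psi_{\{z_{1:|Z|}\}}^{(j)}(x,\ell) = \begin{cases} \dfrac{P_D(x,\ell)\,g(z_j|x,\ell)}{\kappa(z_j)}, & j \in \{1,\dots,|Z|\} \\[2mm] 1 - P_D(x,\ell), & j = 0 \end{cases}.$$

The map $\theta$ specifies that object $\ell$ generates detection $z_{\theta(\ell)} \in Z$, with $\theta(\ell) = 0$ if $\ell$ is undetected. The positive 1-1 property means that $\theta$ is 1-1 on $\{\ell : \theta(\ell) > 0\}$, and ensures that any detection in $Z$ is generated from at most one object.

C. Generalised Labeled Multi-Bernoulli Random Finite Sets

A generalised labeled multi-Bernoulli RFS is defined as a class of labeled RFS that is distributed according to a multi-object density with the form

$$\boldsymbol{\pi}(\boldsymbol{X}) = \Delta(\boldsymbol{X}) \sum_{(I,c) \in \mathcal{F}(\mathbb{L}) \times \mathbb{C}} w^{(I,c)}\,\delta_I(\mathcal{L}(\boldsymbol{X}))\left[p^{(c)}\right]^{\boldsymbol{X}}, \tag{2}$$

where $\mathcal{F}(\mathbb{L})$ is the space of all finite subsets of $\mathbb{L}$, $\mathbb{C}$ is some finite space, each $w^{(I,c)}$ is a non-negative weight such that $\sum_{I \in \mathcal{F}(\mathbb{L})} \sum_{c \in \mathbb{C}} w^{(I,c)} = 1$, and each $p^{(c)}(\cdot,\ell)$ is a probability density on $\mathbb{X}$. For convenience, we denote the space of all GLMB densities on $\mathcal{F}(\mathbb{X} \times \mathbb{L})$ by $\mathcal{G}_{\mathbb{L}}$, and adopt the following notation for a GLMB density in terms of its parameters

$$\boldsymbol{\pi} \triangleq \left\{ w^{(I,c)}, p^{(c)} \right\}_{(I,c) \in \mathcal{F}(\mathbb{L}) \times \mathbb{C}}. \tag{3}$$

If the multi-object filtering density at the current time step is a GLMB given by (3), then the multi-object filtering density at the next time step, given by the multi-object Bayes recursion

$$\boldsymbol{\pi}_+(\boldsymbol{X}_+|Z_+) \propto g(Z_+|\boldsymbol{X}_+) \int \boldsymbol{\pi}(\boldsymbol{X})\,\boldsymbol{f}_+(\boldsymbol{X}_+|\boldsymbol{X})\,\delta\boldsymbol{X}, \tag{4}$$

is also a GLMB with parameters [38]

$$\boldsymbol{\pi}_+ = \left\{ w_{Z_+}^{(I_+,c,\theta_+)}, p_{Z_+}^{(c,\theta_+)} \right\}_{(I_+,c,\theta_+) \in \mathcal{F}(\mathbb{L}_+) \times \mathbb{C} \times \Theta_+}, \tag{5}$$

where

$$w_{Z_+}^{(I_+,c,\theta_+)} \propto \sum_{I} w^{(I,c)}\,\omega_{Z_+}^{(I,c,I_+,\theta_+)}, \tag{6}$$

$$\omega_{Z_+}^{(I,c,I_+,\theta_+)} = 1_{\Theta_+(I_+)}(\theta_+)\left[1 - \bar{P}_S^{(c)}\right]^{I - I_+}\left[\bar{P}_S^{(c)}\right]^{I \cap I_+} \tag{7}$$

$$\qquad\qquad \times \left[1 - r_{B,+}\right]^{\mathbb{B}_+ - I_+}\, r_{B,+}^{\mathbb{B}_+ \cap I_+}\left[\bar{\psi}_{Z_+}^{(c,\theta_+)}\right]^{I_+}, \tag{8}$$

$$\bar{P}_S^{(c)}(\ell) = \left\langle p^{(c)}(\cdot,\ell),\ P_S(\cdot,\ell) \right\rangle, \tag{9}$$

$$\bar{\psi}_{Z_+}^{(c,\theta_+)}(\ell_+) = \left\langle \bar{p}_+^{(c)}(\cdot,\ell_+),\ \psi_{Z_+}^{(\theta_+(\ell_+))}(\cdot,\ell_+) \right\rangle, \tag{10}$$

$$\bar{p}_+^{(c)}(x_+,\ell_+) = 1_{\mathbb{L}}(\ell_+)\,\frac{\left\langle P_S(\cdot,\ell_+)\,f_+(x_+|\cdot,\ell_+),\ p^{(c)}(\cdot,\ell_+) \right\rangle}{\bar{P}_S^{(c)}(\ell_+)} + 1_{\mathbb{B}_+}(\ell_+)\,p_{B,+}(x_+,\ell_+), \tag{11}$$

$$p_{Z_+}^{(c,\theta_+)}(x_+,\ell_+) = \frac{\bar{p}_+^{(c)}(x_+,\ell_+)\,\psi_{Z_+}^{(\theta_+(\ell_+))}(x_+,\ell_+)}{\bar{\psi}_{Z_+}^{(c,\theta_+)}(\ell_+)}. \tag{12}$$

The recursive propagation of the current filtering density (3) to the next time is more compactly expressed by a GLMB joint prediction and update operator $\Omega : \mathcal{G}_{\mathbb{L}} \to \mathcal{G}_{\mathbb{L}_+}$ defined by $\Omega\left(\boldsymbol{\pi}; \boldsymbol{f}_{B,+}^{(\mathbb{B}_+)}, Z_+\right) = \boldsymbol{\pi}_+$ according to (5)-(12), where $\boldsymbol{f}_{B,+}^{(\mathbb{B}_+)}$ is the next birth density and $Z_+$ is the next measurement set.

III. LARGE-SCALE GLMB FILTERING

Due to practical limitations on computational resources, the original implementation of the GLMB filter proposed in [36], [37] cannot accommodate a very large number of objects simultaneously. The main computational bottleneck occurs in the measurement update, which involves processing each component of the predicted GLMB density using Murty's algorithm to find the $K$ most significant components according to their weights. If there are $N$ labels in a component and $M$ measurements in total, then the complexity of the update is $O\left(K(N+M)^3\right)$. Murty's algorithm can be replaced with Gibbs sampling to reduce the computational complexity of processing each component down to $O\left(KN^2M\right)$ [38]. However, the quadratic complexity in the number of objects will still render this algorithm infeasible for tracking a large number of objects simultaneously. Arguably, any feasible solution for large-scale multi-object tracking should have a maximum computational complexity of approximately $O\left(KNM \cdot \log(NM)\right)$.

In many practical multi-object scenarios, the objects are not uniformly distributed across the state space, but often in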

separate groups. This structure can be exploited to improve computational efficiency of the multi-object tracker. Rather than representing the entire multi-object density as one "large" GLMB, we can approximate it as a product of much "smaller" GLMBs, herein referred to as label-partitioned GLMBs. This is based on the premise that a large GLMB, with well-separated groups of estimated objects, is decomposable into a product of almost independent smaller GLMBs. Consequently, such an approximation results in a negligible loss of information, whilst providing significant gains in computational efficiency.

A. Label-Partitioned GLMB

A labeled RFS density on $\mathcal{F}(\mathbb{X} \times \mathbb{L})$ is said to be label-partitioned if it can be written as the following product

$$\boldsymbol{\pi}_{\mathfrak{L}}(\boldsymbol{X}) = \prod_{L \in \mathfrak{L}} \boldsymbol{\pi}_{\mathfrak{L}}^{(L)}\left(\boldsymbol{X} \cap (\mathbb{X} \times L)\right),$$

where $\mathfrak{L}$ is some partition of the label space $\mathbb{L}$, and each factor $\boldsymbol{\pi}_{\mathfrak{L}}^{(L)}$ is a labeled RFS density on $\mathcal{F}(\mathbb{X} \times L)$. Note that for all $\boldsymbol{X} \in \mathcal{F}(\mathbb{X} \times \mathbb{L})$, $\boldsymbol{X} = \biguplus_{L \in \mathfrak{L}} \boldsymbol{X} \cap (\mathbb{X} \times L)$,$^1$ and hence $\{\mathcal{F}(\mathbb{X} \times L)\}_{L \in \mathfrak{L}}$ is also a partition of $\mathcal{F}(\mathbb{X} \times \mathbb{L})$. We call $\boldsymbol{\pi}_{\mathfrak{L}}$, denoted by its factors $\{\boldsymbol{\pi}_{\mathfrak{L}}^{(L)}\}_{L \in \mathfrak{L}}$, an $\mathfrak{L}$-partitioned labeled RFS density. Further, if each factor $\boldsymbol{\pi}_{\mathfrak{L}}^{(L)}$ is a GLMB, then $\boldsymbol{\pi}_{\mathfrak{L}}$ is said to be an $\mathfrak{L}$-partitioned GLMB on $\mathcal{F}(\mathbb{X} \times \mathbb{L})$. We denote by $\mathcal{G}_{\mathbb{L}}(\mathfrak{L})$ the space of all $\mathfrak{L}$-partitioned GLMBs on $\mathcal{F}(\mathbb{X} \times \mathbb{L})$.

$^1$ The disjoint union notation ($\uplus$), in expressions involving unions over a partition, would be equivalent to the standard union. However, we use it to serve as a reminder that the constituent sets of the union are disjoint.

Suppose that the current filtering density $\boldsymbol{\pi}_{\mathfrak{L}}$ is an $\mathfrak{L}$-partitioned GLMB on $\mathcal{F}(\mathbb{X} \times \mathbb{L})$. Then, the prediction to the next time is also an $\mathfrak{L}$-partitioned GLMB. However, the resulting filtering density generally does not take on the same form. While the new filtering density $\boldsymbol{\pi}_+$ is still a GLMB that, in principle, can be computed [36], [37], [38], a direct computation via expansion of the product and subsequent update is not scalable in practice. Moreover, due to object births, deaths and transitions, the current partition $\mathfrak{L}$ of $\mathbb{L}$ is unsuitable as a basis for approximating the new filtering density.

Nonetheless, when estimated objects in the new filtering density also occur in separate groups, as do the sets of measurements which are associated with different groups of objects, it is possible to exploit this structure to improve computational efficiency. To maintain scalability and parallelisability, we approximate the new filtering density as a label-partitioned GLMB. This entails selection of the optimal partition of labels and measurements, and the optimal association of groups of labels to groups of measurements, in some statistical sense. Intuitively the selection of the partitions and associations should result in negligible statistical dependence between labels and measurements across different groups.

Let $\mathcal{P}(\mathbb{L})$, $\mathcal{P}(\mathbb{L}_+)$ and $\mathcal{P}(Z_+)$ denote the sets of all partitions of the current label space $\mathbb{L}$, the new label space $\mathbb{L}_+$ and the new measurements $Z_+$ respectively. Suppose that the current filtering density is an $\mathfrak{L}$-partitioned GLMB on $\mathcal{F}(\mathbb{X} \times \mathbb{L})$,

$$\boldsymbol{\pi}_{\mathfrak{L}} = \left\{ \boldsymbol{\pi}_{\mathfrak{L}}^{(L)} \right\}_{L \in \mathfrak{L}}. \tag{13}$$

Consider a new partition $\mathfrak{L}_+ \in \mathcal{P}(\mathbb{L}_+)$, where each set of labels $L \in \mathfrak{L}_+$ is associated with a set of gated measurements $Z_+^{(L)} \subset Z_+$, such that $Z_+^{(I)} \cap Z_+^{(J)} = \emptyset$ for $I \neq J$. We seek to approximate the new filtering density as an $\mathfrak{L}_+$-partitioned GLMB on $\mathcal{F}(\mathbb{X} \times \mathbb{L}_+)$.

For a particular choice of $\mathfrak{L}_+ \in \mathcal{P}(\mathbb{L}_+)$ and $\mathfrak{Z}_+ \in \mathcal{P}(Z_+)$, let $\mathcal{A}_+(\mathfrak{L}_+, \mathfrak{Z}_+)$ denote the space of all positive 1-1 mappings $A_+ : \mathfrak{L}_+ \to \{\emptyset\} \uplus \mathfrak{Z}_+$, where positive 1-1 means 1-1 on $\mathfrak{Z}_+$. In other words, a given $A_+ \in \mathcal{A}_+(\mathfrak{L}_+, \mathfrak{Z}_+)$ maps each group of labels in the partition $\mathfrak{L}_+$ to either $\{\emptyset\}$ or a unique group of measurements in the partition $\mathfrak{Z}_+$, i.e. $A_+(L)$ is the set of measurements corresponding to the set of labels $L$. Ideally, we seek the approximation

$$\boldsymbol{\pi}_{\mathfrak{L}_+,\mathfrak{Z}_+,A_+} = \left\{ \boldsymbol{\pi}_{\mathfrak{L}_+,\mathfrak{Z}_+,A_+}^{(L)} \right\}_{L \in \mathfrak{L}_+}, \tag{14}$$

$$\boldsymbol{\pi}_{\mathfrak{L}_+,\mathfrak{Z}_+,A_+}^{(L)} = \Omega\left( \boldsymbol{\pi}_{L};\ \boldsymbol{f}_{B,+}^{(L \cap \mathbb{B}_+)},\ A_+(L) \right), \tag{15}$$

where

$$(\mathfrak{L}_+, \mathfrak{Z}_+, A_+) = \mathop{\arg\min}_{\substack{\mathfrak{L}_+ \in \mathcal{P}(\mathbb{L}_+) \\ \mathfrak{Z}_+ \in \mathcal{P}(Z_+) \\ A_+ \in \mathcal{A}_+(\mathfrak{L}_+,\mathfrak{Z}_+)}} D_{KL}\left( \boldsymbol{\pi}_+,\ \boldsymbol{\pi}_{\mathfrak{L}_+,\mathfrak{Z}_+,A_+} \right), \tag{16}$$

$$\text{s.t. } |L| \le L_{\max},\ \forall L \in \mathfrak{L}_+, \tag{17}$$

and $D_{KL}(\cdot,\cdot)$ denotes the Kullback-Leibler divergence (KLD). Here $L_{\max}$ is a user parameter determined by the available computational resources. Higher values of $L_{\max}$ result in a more accurate approximation to the multi-object filtering density; however, more memory and faster processing will be required to achieve real-time performance. Smaller values yield a coarser approximation, but real-time performance is achievable with fewer computational resources. Choosing $L_{\max} = 1$ is equivalent to running parallel Bernoulli filters [3, Section 14.7].

The combinatorial optimization problem in (16) is intractable, since the space of partitions is prohibitively large. We propose an approximate two-step solution which is tractable for large-scale problems. The first step involves choosing a suboptimal partition of $\mathbb{L}_+$, subject to a constraint on the maximum group cardinality. The second step involves parallel computation of the factors in the new filtering density, directly as a product of factors from the old filtering density. This strategy avoids the explicit expansion of the product, and subsequent refactorisation after update, which would be intractable. These steps are described in the following two subsections.

B. Label Partitioning

We aim to find a partition of the label space such that any pair of labels that do not appear in the same group are approximately statistically independent. In the standard multi-object tracking model, as defined in Sections II-A and II-B, this statistical dependence arises solely via the uncertainty in the unknown association between measurements and objects. That is, if a particular detection could have originated from one of several objects, then there will be a statistical dependence between those objects in the multi-object filtering density.

Intuitively, tracks that are well-separated in the measurement space will have low statistical dependence, because the probability that these tracks give rise to closely spaced measurements is extremely low. In this context, "well-separated" means that the distance between tracks in the measurement space is large compared to the measurement noise and the uncertainty in

the objects' predicted location. This is the key property that we exploit in order to partition the label space in a way that minimises the amount of potential measurement sharing between objects in different groups.

In principle, this can be achieved by analysing the distributions of the predicted measurements corresponding to all objects represented in a GLMB. Suppose each factor has the form

$$\boldsymbol{\pi}_{\mathfrak{L}}^{(L)} = \left\{ w_{\mathfrak{L},L}^{(I,c)},\ p_{\mathfrak{L},L}^{(c)} \right\}_{(I,c) \in \mathcal{F}(L) \times \mathbb{C}^{(L)}}. \tag{18}$$

Then for each label $\ell \in \mathbb{L}_+$ we compute the distribution of the predicted measurement

$$\tilde{p}_+(z,\ell) = \left\langle g(z|\cdot,\ell),\ p_+(\cdot,\ell) \right\rangle \tag{19}$$

via the distribution of the predicted state

$$p_+(x,\ell) \propto 1_{\mathbb{B}_+}(\ell)\,r_{B,+}(\ell)\,p_{B,+}(x,\ell) + \sum_{L \in \mathfrak{L}}\ \sum_{(I^{(L)},c^{(L)}) \in \mathcal{F}(L) \times \mathbb{C}^{(L)}} 1_{L}(\ell)\,1_{I^{(L)}}(\ell)\ w_{\mathfrak{L},L}^{(I^{(L)},c^{(L)})} \left\langle p_{\mathfrak{L},L}^{(c^{(L)})}(\cdot,\ell),\ f_+(x|\cdot,\ell) \right\rangle. \tag{20}$$

The distribution $\tilde{p}_+(\cdot,\ell)$ can be used to construct a "measurement gating region", $B(\ell) \subseteq \mathbb{Z}$ for each label $\ell \in \mathbb{L}_+$, which contains the majority of the probability mass for the predicted measurement. These gating regions are the basis for partitioning the next label space $\mathbb{L}_+$. A partition $\mathfrak{L}_+ = \{L_1,\dots,L_{|\mathfrak{L}_+|}\}$ is formed by splitting $\mathbb{L}_+$ into groups of labels whose corresponding gating regions intersect (either directly, or indirectly via a sequence of labels in the same group), and where there is no intersection between gating regions for labels in different groups. Furthermore, for computational feasibility, the gating regions must be sufficiently small such that $\mathbb{L}_+$ can be partitioned into groups no larger than some predefined cardinality threshold $L_{\max}$. That is, $\mathfrak{L}_+$ must satisfy the following three conditions:

1) For all $L \in \mathfrak{L}_+$, and for any $\ell_i, \ell_j \in L$, either $B(\ell_i) \cap B(\ell_j) \neq \emptyset$ or there exists $\{\ell_1,\dots,\ell_n\} \subseteq L$ such that $B(\ell_i) \cap B(\ell_1) \neq \emptyset$, $B(\ell_1) \cap B(\ell_2) \neq \emptyset$, ..., $B(\ell_{n-1}) \cap B(\ell_n) \neq \emptyset$, $B(\ell_n) \cap B(\ell_j) \neq \emptyset$,
2) $\left[\bigcup_{\ell \in L_i} B(\ell)\right] \cap \left[\bigcup_{\ell \in L_j} B(\ell)\right] = \emptyset$, for all $i, j \in \{1,\dots,|\mathfrak{L}_+|\}$, $i \neq j$,
3) $|L| \le L_{\max}$ for all $L \in \mathfrak{L}_+$.

Under these conditions, the label space is partitioned such that there is no overlap between the regions of $\mathbb{Z}$ corresponding to the groups of labels represented by $\mathfrak{L}_+$, taking into account the prediction and likelihood. Consequently the multi-object filtering density at the next time should exhibit negligible statistical dependence between different groups of the partition $\{\boldsymbol{X}_+^{(L)} = \boldsymbol{X}_+ \cap (\mathbb{X} \times L) : L \in \mathfrak{L}_+\}$. Hence, we can assume that the multi-object likelihood can be well-approximated by the following separable form

$$\hat{g}\left( \biguplus_{L \in \mathfrak{L}_+} Z_+^{(L)}\ \Big|\ \biguplus_{L \in \mathfrak{L}_+} \boldsymbol{X}_+^{(L)} \right) \triangleq \prod_{L \in \mathfrak{L}_+} g\left( Z_+^{(L)}\,|\,\boldsymbol{X}_+^{(L)} \right), \tag{21}$$

which facilitates a fast parallel evaluation of the label-partitioned GLMB filtering density, as described in subsection III-C.

A naive approach to partitioning $\mathbb{L}_+$, subject to the conditions above, will have computational complexity $O\left(|\mathbb{L}_+|^2\right)$ since all possible label pairs must be examined to determine whether their gating regions intersect. This is clearly infeasible for large-scale tracking problems. Fortunately, techniques in computational geometry can be applied to dramatically reduce the computational complexity of the partitioning, thereby making it feasible for such problems. A method for efficient implementation is discussed in Section IV-A.

C. Computing Label-Partitioned GLMB Filtering Density

Suppose the current filtering density is an $\mathfrak{L}$-partitioned GLMB, and that we are given a new partition $\mathfrak{L}_+$ of $\mathbb{L}_+$. The goal is to approximate the filtering density at the next time as an $\mathfrak{L}_+$-partitioned GLMB. To achieve this in a way that is scalable to problems involving a very large number of objects, we consider approximating the current filtering density according to a modified partition structure $\mathfrak{S} = \{L \cap \mathbb{L} : L \in \mathfrak{L}_+\}$, i.e. we take each element of the new partition $\mathfrak{L}_+$, and intersect it with the current label set $\mathbb{L}$. The current filtering density can then be approximated as an $\mathfrak{S}$-partitioned GLMB

$$\boldsymbol{\pi}_{\mathfrak{S}} = \left\{ \boldsymbol{\pi}_{\mathfrak{S}}^{(S)} \right\}_{S \in \mathfrak{S}} \tag{22}$$

that minimises $D_{KL}(\boldsymbol{\pi}_{\mathfrak{L}}; \boldsymbol{\pi}_{\mathfrak{S}})$. This approximation is given explicitly in Proposition 1 (see Appendix A for proof).

Proposition 1. Given an $\mathfrak{L}$-partitioned GLMB $\boldsymbol{\pi}_{\mathfrak{L}} = \{\boldsymbol{\pi}_{\mathfrak{L}}^{(L)}\}_{L \in \mathfrak{L}}$ on $\mathcal{F}(\mathbb{X} \times \mathbb{L})$, and suppose that $\mathfrak{S}$ is another partition of $\mathbb{L}$. Then the $\mathfrak{S}$-partitioned labeled RFS density $\boldsymbol{\pi}_{\mathfrak{S}} = \{\boldsymbol{\pi}_{\mathfrak{S}}^{(S)}\}_{S \in \mathfrak{S}}$ that minimises $D_{KL}(\boldsymbol{\pi}_{\mathfrak{L}}; \boldsymbol{\pi}_{\mathfrak{S}})$ is an $\mathfrak{S}$-partitioned GLMB, with GLMB factors

$$\boldsymbol{\pi}_{\mathfrak{S}}^{(S)}\left(\boldsymbol{X}^{(S)}\right) = \prod_{L \in \mathfrak{L}} \boldsymbol{\pi}_{\mathfrak{L},\mathfrak{S}}^{(L,S)}\left(\boldsymbol{X}^{(S)}\right),$$

where

$$\boldsymbol{\pi}_{\mathfrak{L},\mathfrak{S}}^{(L,S)} = \left\{ w_{\mathfrak{L},\mathfrak{S},L,S}^{(H,c)},\ p_{\mathfrak{L},\mathfrak{S},L,S}^{(c)} \right\}_{(H,c) \in \mathcal{F}(L \cap S) \times \mathbb{C}^{(L)}},$$

$$w_{\mathfrak{L},\mathfrak{S},L,S}^{(H,c)} = \sum_{W \in \mathcal{F}(L - S)} w_{\mathfrak{L},L}^{(H \cup W,c)},$$

$$p_{\mathfrak{L},\mathfrak{S},L,S}^{(c)}(x,\ell) = 1_{L \cap S}(\ell)\,p_{\mathfrak{L},L}^{(c)}(x,\ell),$$

for each $(L,S) \in \mathfrak{L} \times \mathfrak{S}$ such that $L \cap S \neq \emptyset$.

Given the approximation (22) to the current filtering density, and the separable multi-object likelihood (21), the joint prediction and update can be applied to all factors of $\boldsymbol{\pi}_{\mathfrak{S}}$ in parallel, yielding the new $\mathfrak{L}_+$-partitioned GLMB filtering density,

$$\boldsymbol{\pi}_{\mathfrak{L}_+,+} = \left\{ \boldsymbol{\pi}_{\mathfrak{L}_+,+}^{(J)} \right\}_{J \in \mathfrak{L}_+}. \tag{23}$$

An explicit expression for (23) is given in Proposition 2 (see Appendix A for proof). Further details for efficient implementation are discussed in Sections IV-B and IV-C.

Proposition 2. Given a separable multi-object likelihood (21), where $\mathfrak{L}_+$ is a partition of $\mathbb{L}_+$, and suppose that the current multi-object filtering density is an $\mathfrak{S}$-partitioned GLMB of the form $\boldsymbol{\pi}_{\mathfrak{S}} = \{\boldsymbol{\pi}_{\mathfrak{S}}^{(J \cap \mathbb{L})}\}_{J \in \mathfrak{L}_+}$ where

$$\boldsymbol{\pi}_{\mathfrak{S}}^{(J \cap \mathbb{L})} = \left\{ w_{\mathfrak{S},J \cap \mathbb{L}}^{(I,c)},\ p_{\mathfrak{S},J \cap \mathbb{L}}^{(c)} \right\}_{(I,c) \in \mathcal{F}(J \cap \mathbb{L}) \times \mathbb{C}^{(J)}}.$$

Then the multi-object filtering density at the next time is the $\mathfrak{L}_+$-partitioned GLMB $\pi_{\mathfrak{L}_+,+} = \{\pi_{\mathfrak{L}_+,+}^{(J)}\}_{J\in\mathfrak{L}_+}$, where

$$\pi_{\mathfrak{L}_+,+}^{(J)} = \Omega\big(\pi_{\mathfrak{S}}^{(J\cap\mathbb{L})};\, f_B^{(J\cap\mathbb{B}_+)},\, Z_+^{(J)}\big),$$

and $\Omega(\cdot\,; f_B^{(B_+)}, Z_+) : \mathcal{G}_L \to \mathcal{G}_{L_+}$ is the joint prediction and update operator.

IV. IMPLEMENTATION
In this section we describe in more detail our implementation of a large-scale GLMB filter, based on the concepts introduced in the previous section. The algorithm is composed of several modules as shown in Figure 1. The details of each of these modules are discussed in the following subsections.

[Figure 1: flow diagram. Inputs: birth density (LMB), previous posterior density (label-partitioned GLMB), measurements. Modules: construct K-D tree (Section IV-C); compute new label space partition (Section IV-A); factorise LMB birth (Section IV-B); approximate previous posterior as label-partitioned GLMB over new partition (Section IV-B); compute new posterior label-partitioned GLMB by parallel joint prediction/updates (Section IV-C); pruning (Section IV-D); measurement driven birth (Section IV-E); extract estimates (Section IV-F).]
Fig. 1. High-level flow diagram of the large-scale GLMB filter.

A. Step 1: Label Space Partition Selection

For tractability, the distribution of the predicted measurement is then approximated as a uniform mixture

$$\tilde{p}_+(z,\ell) \approx \sum_{u=1}^{U(\ell)} \mathcal{U}\big(z; B^{(u)}(\ell)\big), \tag{27}$$

where $\mathcal{U}(\cdot\,; B^{(u)}(\ell))$ is a uniform distribution on an axis-aligned hyper-rectangle $B^{(u)}(\ell)$ that should correspond to the region where $\mathcal{N}(\cdot\,; \tilde{m}^{(u)}(\ell), \tilde{P}^{(u)}(\ell))$ has significant mass. An efficient method for computing this is to consider the ellipsoidal gate centered on $\tilde{m}^{(u)}(\ell)$ and shaped according to $\tilde{P}^{(u)}(\ell)$, which has probability $P_G$ of containing the received measurement. The axis-aligned hyper-rectangle $B^{(u)}(\ell)$ can be chosen as a tight bounding box on the ellipsoidal gate, which can be computed in simple closed form as a function of $\tilde{m}^{(u)}(\ell)$, $\tilde{P}^{(u)}(\ell)$ and $P_G$. For any $L \subseteq \mathbb{L}_+$, define the prediction gate

$$B(L) = \bigcup_{\ell\in L} \bigcup_{u=1}^{U(\ell)} B^{(u)}(\ell). \tag{28}$$

We seek a partition $\mathfrak{L}_+$ of $\mathbb{L}_+$ which maximises $|\mathfrak{L}_+|$ and satisfies

$$B(I) \cap B(J) = \emptyset, \quad \forall I, J \in \mathfrak{L}_+,\ I \neq J. \tag{29}$$

A solution can always be found via the following procedure with $O(R_T \log^d R_T + R_I + R_L)$ complexity, where $d$ is the dimension of the measurement space, $R_T$ is the total number of hyper-rectangles, $R_I$ is the number of intersecting hyper-rectangle pairs, and $R_L$ is the total number of labels at the current time. A segment-tree [44], [45], [46] is first constructed containing all hyper-rectangles, which is used to find all intersecting prediction gates, with complexity $O(R_T \log R_T + R_I)$. A graph is then constructed, consisting of one node for each label at the current time, and an edge for every pair of labels whose prediction gates intersect. The connected components of the graph are found via depth-first search, which has $O(R_L)$ complexity, and can be further accelerated via parallel processing. The connected components then give the desired partition of $\mathbb{L}_+$. However, if the additional constraint $|L| \leq L_{\max}$, $\forall L \in \mathfrak{L}_+$ is not satisfied, it is necessary to reduce the value for $P_G$ and repeat the procedure until a feasible solution is found.
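The label space partitioning procedure can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the names `gate_box_2d`, `boxes_intersect` and `partition_labels` are hypothetical, a brute-force pairwise test stands in for the segment tree, and the closed-form gate threshold $\gamma = -2\ln(1 - P_G)$ assumes a 2-D measurement space.

```python
import math
from collections import defaultdict


def gate_box_2d(mean, cov, p_gate):
    """Tight axis-aligned box around the 2-D ellipsoidal gate
    {z : (z - m)^T P^{-1} (z - m) <= gamma}. For two degrees of freedom the
    chi-square quantile has the closed form gamma = -2 ln(1 - p_gate)
    (assumption: 2-D measurements only)."""
    gamma = -2.0 * math.log(1.0 - p_gate)
    half = [math.sqrt(gamma * cov[k][k]) for k in range(2)]
    lo = tuple(mean[k] - half[k] for k in range(2))
    hi = tuple(mean[k] + half[k] for k in range(2))
    return lo, hi


def boxes_intersect(a, b):
    """True if two axis-aligned boxes (lo, hi) overlap in every dimension."""
    (a_lo, a_hi), (b_lo, b_hi) = a, b
    return all(a_lo[k] <= b_hi[k] and b_lo[k] <= a_hi[k]
               for k in range(len(a_lo)))


def partition_labels(gates):
    """gates maps each label to its list of boxes (one per mixture component).
    Returns the connected components of the gate-intersection graph."""
    labels = list(gates)
    adjacent = defaultdict(set)
    # Brute-force O(n^2) pair search; a stand-in for the segment tree.
    for i, a in enumerate(labels):
        for b in labels[i + 1:]:
            if any(boxes_intersect(box_a, box_b)
                   for box_a in gates[a] for box_b in gates[b]):
                adjacent[a].add(b)
                adjacent[b].add(a)
    # Iterative depth-first search over the graph of labels.
    seen, groups = set(), []
    for start in labels:
        if start in seen:
            continue
        seen.add(start)
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            group.add(node)
            for neighbour in adjacent[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        groups.append(group)
    return groups
```

Each returned group plays the role of one block $L$ of the partition. If any group were larger than $L_{\max}$, the boxes would be recomputed with a smaller $P_G$ and `partition_labels` called again, following the procedure described in the text.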
We proceed under the standard assumptions of linear-Gaussian transition and measurement models

$$f_+(x_+ | x, \ell) = \mathcal{N}(x_+; Fx, Q),$$
$$g(z | x, \ell) = \mathcal{N}(z; Hx, R).$$

If the prior distribution for track $\ell$ is a Gaussian mixture

$$p(x; \ell) = \sum_{u=1}^{U(\ell)} w^{(u)}(\ell)\, \mathcal{N}\big(x; m^{(u)}(\ell), P^{(u)}(\ell)\big),$$

then the distribution of the predicted measurement is also a Gaussian mixture

$$\tilde{p}_+(z; \ell) = \sum_{u=1}^{U(\ell)} w^{(u)}(\ell)\, \mathcal{N}\big(z; \tilde{m}^{(u)}(\ell), \tilde{P}^{(u)}(\ell)\big), \tag{24}$$
$$\tilde{m}^{(u)}(\ell) = HF m^{(u)}(\ell), \tag{25}$$
$$\tilde{P}^{(u)}(\ell) = R + H\big(Q + F P^{(u)}(\ell) F^T\big) H^T. \tag{26}$$

B. Step 2: Birth Factorisation and GLMB Repartitioning
Once an appropriate label space partition $\mathfrak{L}_+$ of $\mathbb{L}_+$ has been found, we proceed to factorise the LMB birth density $f_B^{(\mathbb{B}_+)}$ as a product over the partition $\mathfrak{B}_+ = \{L \cap \mathbb{B}_+ : L \in \mathfrak{L}_+\}$ of $\mathbb{B}_+$, and to approximate the current filtering density $\pi_{\mathfrak{L}}$ as a product over the partition $\mathfrak{S} = \{L \cap \mathbb{L} : L \in \mathfrak{L}_+\}$ of $\mathbb{L}$.
The LMB birth is exactly factorised as a $\mathfrak{B}_+$-partitioned LMB $f_B^{(\mathbb{B}_+)} = \{f_B^{(B_+)}\}_{B_+\in\mathfrak{B}_+}$. The minimum KLD approximation to $\pi_{\mathfrak{L}}$ as an $\mathfrak{S}$-partitioned GLMB, $\pi_{\mathfrak{S}} = \{\pi_{\mathfrak{S}}^{(S)}\}_{S\in\mathfrak{S}}$, is obtained via Proposition 1. Consider a given $S \in \mathfrak{S}$, then for each $L \in \mathfrak{L}$, we calculate the GLMB $\pi_{\mathfrak{L},\mathfrak{S}}^{(L,S)}$ for labels which are common to both $S$ and $L$. The expression for $\pi_{\mathfrak{S}}^{(S)} = \prod_{L\in\mathfrak{L}} \pi_{\mathfrak{L},\mathfrak{S}}^{(L,S)}$ is given by expanding the product over $L \in \mathfrak{L}$. In practice, this does not necessarily require enumerating all pairs $(L, S)$, because some combinations may satisfy $L \cap S = \emptyset$ resulting in a trivial expression for $\pi_{\mathfrak{L},\mathfrak{S}}^{(L,S)}$. It is possible to identify pairs for which

$L \cap S \neq \emptyset$ by querying an appropriate map structure used to represent the GLMB factors with logarithmic complexity. In addition, the computation of the final product for $\pi_{\mathfrak{S}}^{(S)}$ can be efficiently implemented using a k-shortest path algorithm or stochastic sampling strategy to truncate the result without explicit expansion. Note that this factorisation and repartitioning step is trivially parallelisable in each of the factors.

C. Step 3: Parallel Propagation of Label-Partitioned GLMB
Given the new partition $\mathfrak{L}_+$ of $\mathbb{L}_+$, in preparation for the update, it is necessary to compute for each $L \in \mathfrak{L}_+$, the non-intersecting measurement sets $Z_+^{(L)} = Z_+ \cap B(L)$ that fall inside the region $B(L)$ found in Step 1. This will automatically satisfy $Z_+^{(I)} \cap Z_+^{(J)} = \emptyset$ since $B(I) \cap B(J) = \emptyset$ for any distinct $I, J \in \mathfrak{L}_+$. A K-dimensional (K-D) tree data structure [47] can be used to find $Z_+^{(L)}$ efficiently with $O(R_U |Z_+|^{1-1/d} + R_Z)$ complexity, where $d$ is the dimension of the measurement space, $R_U = \sum_{\ell\in L} U(\ell)$ and $R_Z = \sum_{\ell\in L} |Z_+ \cap B(\ell)|$.
Since both the likelihood and prediction can be written as products over the same partition, the factors of the $\mathfrak{L}_+$-partitioned GLMB filtering density can be computed in parallel independently as shown in Proposition 2. This is a key feature that allows us to address large-scale multi-target tracking problems, by utilising modern multi-core architectures. For each $L \in \mathfrak{L}_+$, computation of the filtering density is given by Proposition 2, via the GLMB joint prediction and update operator $\Omega$ with inputs $\pi_{\mathfrak{S}}^{(L\cap\mathbb{L})}$, $f_B^{(L\cap\mathbb{B}_+)}$, $Z_+^{(L)}$. Our implementation follows [38], making use of a Gibbs sampler to efficiently generate GLMB components with high filtering weights, while also maintaining diversity across the generated samples.
Finally, an “association probability” is evaluated for each measurement in $Z_+$, which is used to generate the LMB birth density for the subsequent iteration of the filter. For a measurement $z \in Z_+$ that falls inside the bounding region $B(L)$, this probability can be computed after the update of the factor corresponding to $L$, by summing the weights of all GLMB components in which the measurement was assigned to an object. For measurements that do not fall inside any bounding region, the corresponding association probability is set to zero.

D. Step 4: Pruning Label-Partitioned GLMB Filtering Density
To improve computational efficiency, for each factor of the GLMB filtering density we prune its constituent components by removing those whose contributions are deemed to be insignificant. In previous implementations of the GLMB filter [36], [37], [38], a simple component pruning procedure was used, whereby a fixed number of highest weighted components were retained after each iteration, or components with weights below a certain threshold were deleted.
In addition, pruning is necessary for entire factors of the GLMB filtering density, also with the aim of improving the algorithm’s computational efficiency. The idea is to eliminate entire factors which have negligible contribution to the new filtering density. A simple criterion is the probability that a factor has zero cardinality. Factors for which this probability exceeds a certain threshold are simply deleted. An alternative is to retain only a fixed number of factors with the highest probability of having non-zero cardinality.

E. Step 5: Measurement Driven Birth
The association probabilities computed in Step 3 capture the likelihood that the given measurement originated from any one of the existing objects. These probabilities are now used to construct a labelled multi-Bernoulli distribution, which will serve as the birth density for the next iteration of the filter. Each measurement with association probability below some pre-defined threshold is used to generate a component of the LMB distribution. The measurement itself, along with prior distributions on the unobserved state components, are used to generate a birth density for each component. This so-called “measurement-driven” approach requires fewer prior assumptions regarding the initial state of newborn objects than static birth models.

F. Step 6: Estimate Extraction
A parallelisable strategy for extracting labelled estimates of the current object states is to process each factor independently using standard approaches. The estimates for a particular factor can be obtained by first finding the maximum a-posteriori (MAP) cardinality estimate for the number of objects, then finding the highest weighted component with the relevant cardinality, and finally selecting either the MAP or the EAP estimate for each label. The overall multi-object estimate is obtained by taking the union of the estimates from all factors. These are subsequently used to update the track estimates, by matching up the estimates with corresponding labels across different times.

V. MULTI-OBJECT TRACKING PERFORMANCE METRIC
In [41], the optimal sub-pattern assignment (OSPA) metric was proposed as a mathematically consistent and physically meaningful distance between two sets of points. This has found widespread use in the evaluation of multi-object tracking performance, where it has become a common practice to present a plot of the OSPA distance between the estimated and true multi-object states versus time. While this can provide an indication of the multi-object tracking performance, it does not fully account for errors between the estimated and true sets of tracks. Specifically, the OSPA distance between multi-object states does not penalise phenomena such as track switching and fragmentation in a consistent fashion.
In [39], a metric called OSPA for tracks (OSPA-T) was proposed, along with a method for its approximate computation. The disadvantage of this approach is that the approximation no longer satisfies the axioms of a metric, and thus it may not behave as one would expect. A number of drawbacks of the approximate OSPA-T were discussed in [40], in which the authors defined another metric that alleviates these drawbacks. However, this metric is computationally intractable for most practical problems. In [48], two metrics inspired by the CLEAR MOT heuristics for tracking performance evaluation in computer vision were proposed. The drawbacks of these metrics were pointed out in [49], and two other metrics were proposed. However, these metrics are not suitable for evaluating multi-object tracking performance in large-scale scenarios.
Since the OSPA distance between two sets is constructed from the base-distance between the elements of the sets, a simple way of using it to evaluate multi-object tracking performance is to choose the base-distance to be a distance between tracks. When

this base-distance is also constructed via OSPA, the result is the OSPA metric on an OSPA base-distance, which we call OSPA$^{(2)}$ to distinguish it from the standard use. A meaningful OSPA$^{(2)}$ depends on a meaningful base-distance between tracks.
This section presents the OSPA$^{(2)}$ metric together with a physically meaningful base-distance between tracks. We also develop a scalable algorithm for computing this OSPA$^{(2)}$ distance (exactly), capable of evaluating large-scale tracking performance.
We begin by defining the following notation:
• $\mathbb{T} = \{1, 2, \ldots, K\}$ is a finite space of time indices (representing times $\{t_1, t_2, \ldots, t_K\}$), which includes all time indices from the beginning to the end of the scenario.
• $\mathbb{X}$ is the single-object state space, and $\mathcal{F}(\mathbb{X})$ is the space of finite subsets of $\mathbb{X}$.
• $\mathbb{U}$ is the space of all functions mapping time indices in $\mathbb{T}$ to state vectors in $\mathbb{X}$, i.e. $\mathbb{U} = \{f : \mathbb{T} \mapsto \mathbb{X}\}$. We refer to each element of $\mathbb{U}$ as a track.
• For any $f \in \mathbb{U}$, its domain, denoted by $D_f \subseteq \mathbb{T}$, is the set of time instants at which the object exists.
Also, recall that for a function $d(\cdot,\cdot)$ to be called a “metric”, it must satisfy the following four properties:
P1. $d(x, y) \geq 0$ (non-negativity),
P2. $d(x, y) = 0 \iff x = y$ (identity),
P3. $d(x, y) = d(y, x)$ (symmetry),
P4. $d(x, y) \leq d(x, z) + d(z, y)$ (triangle inequality).
The OSPA distance $d_p^{(c)}(\phi, \psi)$ between $\phi, \psi \in \mathcal{F}(\mathbb{X})$ with order $p$ and cutoff $c$ is defined as follows [41]. For $\phi = \{\phi^{(1)}, \phi^{(2)}, \ldots, \phi^{(m)}\}$ and $\psi = \{\psi^{(1)}, \psi^{(2)}, \ldots, \psi^{(n)}\}$, $m \leq n$

$$d_p^{(c)}(\phi, \psi) \triangleq \left(\frac{1}{n}\left(\min_{\pi\in\Pi_n} \sum_{i=1}^{m} \bar{d}^{(c)}\big(\phi^{(i)}, \psi^{(\pi(i))}\big)^p + c^p (n - m)\right)\right)^{1/p}, \tag{30}$$

where $\bar{d}^{(c)}(\phi^{(i)}, \psi^{(i)}) \triangleq \min\big(c, d(\phi^{(i)}, \psi^{(i)})\big)$, in which $d(\cdot,\cdot)$ is a metric on the space $\mathbb{X}$. If $m > n$, then $d_p^{(c)}(\phi, \psi) \triangleq d_p^{(c)}(\psi, \phi)$. Additionally, $d_p^{(c)}(\emptyset, \phi) \triangleq c$, and $d_p^{(c)}(\emptyset, \emptyset) \triangleq 0$. Note that in (30), the factor of $1/n$, which normalises the distance by the number of objects, is crucial for the OSPA to have the intuitive interpretation as a per-object error.

A. Base-Distance Between Tracks
We now use the OSPA distance (30) to construct a base-distance as a metric on the space $\mathbb{U}$ of tracks. One simple base-distance is [42]:

$$\tilde{d}^{(c)}(x, y) = \frac{1}{T} \sum_{t=1}^{T} d^{(c)}\big(\{x(t)\}, \{y(t)\}\big), \tag{31}$$

where

$$d^{(c)}(\phi, \psi) = \begin{cases} 0, & |\phi| = |\psi| = 0 \\ c, & |\phi| \neq |\psi| \\ \min\big(c, d(\phi^{(1)}, \psi^{(1)})\big), & |\phi| = |\psi| = 1 \end{cases} \tag{32}$$

Note that in (31) $\{x(t)\}$ is a singleton if $t \in D_x$, or $\{x(t)\}$ is empty if $t \notin D_x$ (and likewise for $\{y(t)\}$). In this case, the OSPA distance defined in (30) reduces to (32). Hence, the order parameter $p$ becomes redundant, and is omitted.

[Figure 2: plot of state space versus time (10 to 100) showing two short tracks $x(t)$ and $y(t)$.]
Fig. 2. Hypothetical scenario with two well-separated tracks inside a relatively long time window

The base-distance (31) exhibits counter-intuitive behaviour [43], wherein a pair of well-separated tracks may have a smaller distance than expected. If there are two short tracks inside a much longer window, then averaging over this window may result in a relatively small distance, even if the tracks are separated by a distance of $c$. For example, consider the scenario illustrated in Figure 2. Inside the window of length $T = 100$, there are exactly two tracks, $x$ with domain $D_x = \{91, \ldots, 95\}$, and $y$ with domain $D_y = \{96, \ldots, 100\}$. If the base-distance is calculated according to (31), i.e. as an average over $t \in \{1, \ldots, T\}$, then $\tilde{d}^{(c)}(x, y) = c/10$. Furthermore, as the length of the window increases, the value of this base-distance decreases, which is counter-intuitive, as the two tracks do not overlap in time, and are indeed very far apart. Thus it would actually be expected that the base-distance assigns the maximum penalty $c$.
This issue can be resolved by constructing the distance $\tilde{d}^{(c)}(x, y)$ between two tracks $x, y \in \mathbb{U}$ as the mean OSPA distance between the set of states defined by $x$ and $y$, over all times $t \in D_x \cup D_y$, i.e.

$$\tilde{d}^{(c)}(x, y) = \begin{cases} \dfrac{\sum_{t\in D_x\cup D_y} d^{(c)}\big(\{x(t)\}, \{y(t)\}\big)}{|D_x \cup D_y|}, & D_x \cup D_y \neq \emptyset \\ 0, & D_x \cup D_y = \emptyset \end{cases} \tag{33}$$

Averaging over $D_x \cup D_y$, instead of $\{1, \ldots, T\}$ as per (31), results in a base-distance with intuitively consistent behaviour. For the example in Figure 2, this choice of base-distance gives $\tilde{d}^{(c)}(x, y) = c$. Notice that even if the window is expanded, the distance still evaluates to the cutoff value $c$, as we would expect. In order to use (33) as a base-distance between tracks, we need to establish that it defines a metric on $\mathbb{U}$. That is, it must satisfy the properties P1-P4 as previously mentioned.

Proposition 3. Let $d^{(c)}(\cdot,\cdot)$ be a metric on the finite subsets of $\mathbb{X}$, such that the distance between a singleton and an empty set assumes the maximum attainable value $c$. Then the distance between two tracks as defined by (33) is also a metric.

The proof of this proposition involves some rather lengthy algebraic manipulations, and as such, it has been relegated to Appendix B. This result establishes that (33) is indeed a metric on the space $\mathbb{U}$, when $d^{(c)}(\cdot,\cdot)$ is defined according to (32).
Before proceeding to define OSPA$^{(2)}$, we make two important observations regarding the properties of this base-distance.
• Since $d^{(c)}(\cdot,\cdot) \leq c$, the distance between tracks saturates at the value $c$, i.e. $\tilde{d}^{(c)}(\cdot,\cdot) \leq c$.

• For two tracks $x$ and $y$ such that $D_x = D_y$, (33) can be interpreted as a mean square error (MSE) between $x$ and $y$. Hence, this base-distance can be regarded as a generalisation of the MSE for tracks of different domains.

B. OSPA$^{(2)}$ for Tracks
The distance between two tracks as defined in Section V-A is both a metric on the space $\mathbb{U}$, and bounded by the value $c$. It is therefore suitable to serve as a base-distance for the OSPA metric on the space of finite sets of tracks $\mathcal{F}(\mathbb{U})$. Let $X = \{x^{(1)}, x^{(2)}, \ldots, x^{(m)}\} \in \mathcal{F}(\mathbb{U})$ and $Y = \{y^{(1)}, y^{(2)}, \ldots, y^{(n)}\} \in \mathcal{F}(\mathbb{U})$ be two sets of tracks, where $m \leq n$. We define the distance $\check{d}_p^{(c)}(X, Y)$ between $X$ and $Y$ as the OSPA with base-distance $\tilde{d}^{(c)}(\cdot,\cdot)$ (the time averaged OSPA given by equation (33)). That is,

$$\check{d}_p^{(c)}(X, Y) \triangleq \left(\frac{1}{n}\left(\min_{\pi\in\Pi_n} \sum_{i=1}^{m} \tilde{d}^{(c)}\big(x^{(i)}, y^{(\pi(i))}\big)^p + c^p (n - m)\right)\right)^{1/p}, \tag{34}$$

where $c$ is the cutoff and $p$ is the order parameter. If $m > n$, then $\check{d}_p^{(c)}(X, Y) \triangleq \check{d}_p^{(c)}(Y, X)$. Note also that $\check{d}_p^{(c)}(\emptyset, X) \triangleq c$, and $\check{d}_p^{(c)}(\emptyset, \emptyset) \triangleq 0$. We call this distance OSPA-on-OSPA or OSPA$^{(2)}$, which can be interpreted as the time-averaged per-track error.
1) Efficient Evaluation of OSPA$^{(2)}$: Evaluating (34) involves the following three steps:
1) Compute an $m \times n$ cost matrix $C$, with entries $C_{i,j} = \tilde{d}^{(c)}(x^{(i)}, y^{(j)})$, according to (33).
2) Solve a 2-D assignment problem with cost matrix $C$, to find the minimum cost 1-1 assignment of columns to rows.
3) Use the result of step 2 to evaluate $\check{d}_p^{(c)}(X, Y)$ via (34).
A basic implementation of this procedure would require computing the base-distance between all pairs of tracks in $X$ and $Y$, which has complexity $O(|w_k| mn)$. Step 2 then requires solving a dense optimal assignment problem with complexity $O((m + n)^3)$. This would preclude its use in large-scale tracking scenarios involving millions of objects, as the cost matrix would consume too much memory, and the assignment problem would be infeasible.

2) Visualisation of OSPA$^{(2)}$: In practice, it is desirable to examine the tracking performance as a function of time, so that trends in algorithm behaviour can be analysed in response to changing scenario conditions. This can be achieved by plotting

$$\alpha_k(X, Y; w_k) = \check{d}_p^{(c)}(X_{w_k}, Y_{w_k}) \tag{35}$$

as a function of $k$, where $w_k$ is a set of time indices that varies with $k$, and

$$X_{w_k} = \{x|_{w_k} : x \in X \text{ and } D_x \cap w_k \neq \emptyset\}, \tag{36}$$
$$Y_{w_k} = \{y|_{w_k} : y \in Y \text{ and } D_y \cap w_k \neq \emptyset\}, \tag{37}$$

where $f|_w$ denotes the restriction of $f$ to domain $w$.
Note that the sets $X_{w_k}$ and $Y_{w_k}$ only contain those tracks whose domain overlaps with $w_k$, i.e. any tracks whose domain lies completely outside the set $w_k$ are disregarded. Choosing different values for the set $w_k$ allows us to examine the performance of tracking algorithms over different time scales. For example, a straightforward approach is to set $w_k = \{k - N + 1, k - N + 2, \ldots, k\}$, so that at time $k$, the set $w_k$ consists of only the latest $N$ time steps. In this case, choosing a small value for $N$ will indicate the tracking performance over shorter time periods, while larger values will reveal the longer-term tracking performance. This choice is highly dependent on the application at hand. For example, in real-time surveillance, we may only be interested in tracking objects over a period of a few minutes, as older information may be considered irrelevant. In this case, a small value for $N$ would suffice to capture the important aspects of the tracking performance. However, in an off-line scenario analysis application, we might require accurate trajectory estimates over much longer time periods, in which case a larger value for $N$ would be more appropriate.
Furthermore, examining the same scenario using OSPA$^{(2)}$ with different window lengths could reveal important insights into the relationship between long-term and short-term performance. For example, the design of a multi-object tracking system often involves trade-offs between estimation accuracy and response time, and comparing OSPA$^{(2)}$ with long and short windows could help to characterise the nature of this trade-off.
Note that computing the OSPA$^{(2)}$ on a sliding window as described above, converges to the traditional OSPA (for sets of points) as $N$ becomes smaller. For $N = 1$ the OSPA$^{(2)}$ becomes identical to the traditional OSPA.
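As an illustration of (31) and (33) to (35), the following Python sketch evaluates the two base-distances and the OSPA$^{(2)}$ distance for small sets of tracks. It is a minimal sketch under stated assumptions, not the authors' implementation: tracks are represented as dictionaries from time index to a scalar state, the names `per_time_distance`, `base_distance`, `base_distance_fixed_window`, `restrict` and `ospa2` are hypothetical, and a brute-force search over assignments stands in for an optimal assignment solver, so it is only usable for a handful of tracks.

```python
from itertools import permutations


def abs_diff(a, b):
    """Assumed metric on the (scalar) single-object state space."""
    return abs(a - b)


def per_time_distance(x, y, t, c, d):
    """Distance (32) between {x(t)} and {y(t)} for tracks stored as dicts."""
    in_x, in_y = t in x, t in y
    if not in_x and not in_y:
        return 0.0                        # both sets empty
    if in_x != in_y:
        return float(c)                   # singleton versus empty set
    return min(float(c), d(x[t], y[t]))   # two singletons, cut off at c


def base_distance_fixed_window(x, y, c, T, d=abs_diff):
    """Base-distance (31): average over the whole window t = 1, ..., T."""
    return sum(per_time_distance(x, y, t, c, d) for t in range(1, T + 1)) / T


def base_distance(x, y, c, d=abs_diff):
    """Base-distance (33): average over the union of the track domains."""
    times = set(x) | set(y)
    if not times:
        return 0.0
    return sum(per_time_distance(x, y, t, c, d) for t in times) / len(times)


def restrict(track, window):
    """Restriction of a track to the time indices in `window`."""
    return {t: s for t, s in track.items() if t in window}


def ospa2(X, Y, c, p=1, window=None, d=abs_diff):
    """OSPA(2) distance (34) between two lists of tracks; with `window` set,
    tracks are first restricted as in (36) and (37)."""
    if window is not None:
        X = [r for r in (restrict(x, window) for x in X) if r]
        Y = [r for r in (restrict(y, window) for y in Y) if r]
    if len(X) > len(Y):
        X, Y = Y, X
    m, n = len(X), len(Y)
    if n == 0:
        return 0.0
    if m == 0:
        return float(c)
    cost = [[base_distance(x, y, c, d) ** p for y in Y] for x in X]
    # Brute-force assignment: every injective map of the m rows to n columns.
    best = min(sum(cost[i][cols[i]] for i in range(m))
               for cols in permutations(range(n), m))
    return ((best + c ** p * (n - m)) / n) ** (1.0 / p)


# Figure 2 scenario: two short tracks that never coexist, cutoff c = 50.
x = {t: 0.0 for t in range(91, 96)}
y = {t: 1000.0 for t in range(96, 101)}
print(base_distance_fixed_window(x, y, 50.0, 100))  # 5.0, i.e. c/10
print(base_distance(x, y, 50.0))                    # 50.0, i.e. c
```

Passing `window={k - N + 1, ..., k}` to `ospa2` gives one point $\alpha_k$ of the sliding-window curve in (35). In a large-scale setting the brute-force minimum would be replaced by a sparse assignment solver restricted to assignable pairs.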
Fortunately, in many practical applications, the base-distance between most pairs of tracks will saturate at the cutoff value $c$. This can be exploited to dramatically reduce the computational complexity, making it feasible for large-scale problems. Firstly, recall that the time averaging in the base-distance (33) is carried out only over the union of the track domains. Consequently, the base-distance between any pair of tracks that have no corresponding states closer than a distance of $c$, must saturate at $c$. Such pairs can be considered unassignable, and efficient spatial searching algorithms can be applied to extract only the assignable pairs of tracks in $O(|w_k| m \log n)$ time. Once these have been found, a sparse optimal assignment algorithm can be used to obtain the lowest-cost matching between the ground truth and estimated tracks. Such algorithms can solve sparse assignment problems with a much lower average complexity than is possible under the dense optimal assignment formulation.

It is important to understand that the OSPA$^{(2)}$ distance has a different interpretation to that of the traditional OSPA distance. Whereas the traditional OSPA distance captures the error between the true and estimated multi-target states at a single instant in time, the OSPA$^{(2)}$ distance captures the error between the true and estimated sets of tracks over a range of time instants, as determined by the choice of $w_k$. Therefore, careful consideration must be given to the design of $w_k$, and the user must be mindful of this when interpreting the results.
3) Behavior of the OSPA$^{(2)}$: The OSPA$^{(2)}$ metric exhibits intuitively consistent behavior when faced with various types of common tracking errors. For example, errors in localisation, cardinality, track fragmentation and track label switching, all yield expected increases in the OSPA$^{(2)}$ distance (illustrations are shown in [42], [43]). Some of the key points that distinguish the OSPA$^{(2)}$ from the traditional OSPA distance are as follows:

• A track that is dropped and later regained with the same identity, yields a smaller increase in the OSPA$^{(2)}$ than a track that is dropped and regained with a different identity.
• When visualised according to the method described in the previous section, a longer window effectively sustains the influence of cardinality and track labelling errors for a longer duration, i.e. the metric “remembers” mistakes that were committed by the tracker further into the past.
• A greater frequency of track fragmentation and labelling errors results in a higher OSPA$^{(2)}$ distance.
Illustrations of how the OSPA$^{(2)}$ metric responds to specific scenario events can be found in [42], [43]. Also included is a larger synthetic example, in which the frequency of track fragmentation and label switching is varied in order to observe its influence on the OSPA$^{(2)}$ distance.

VI. NUMERICAL RESULTS
In this section, we demonstrate the proposed GLMB filter implementation on a simulated large-scale multi-object tracking scenario in which the peak number of objects appearing simultaneously exceeds one million. The scenario runs for 1000 time steps, on a rectangular 64 km by 36 km surveillance region. The single-object state consists of 2D position and velocity, i.e. state vectors have the form $[x\ \dot{x}\ y\ \dot{y}]^T$, with units of metres and metres per second.
New objects are generated throughout the scenario by sampling from an LMB distribution at each time step. This distribution consists of 20,000 components, where the density of the $i$-th component is a Gaussian $\mathcal{N}(\cdot\,; m^{(i)}, P)$ with $m^{(i)} = [x^{(i)}\ 0\ y^{(i)}\ 0]^T$ and $P = \mathrm{diag}[50, 5] \otimes I_2$, where the positional elements of $m^{(i)}$ are sampled according to $x^{(i)} \sim \mathcal{U}(0, 64000)$ and $y^{(i)} \sim \mathcal{U}(0, 36000)$. At a given time step $k$, the existence probabilities of all components in the birth model are set to a common value $r_{B,+}$, but different values for $r_{B,+}$ are used within different time intervals according to

$$r_{B,+} = \begin{cases} 0.15, & k \in [1, 400] \cup [501, 700] \\ 0.01, & k \in [401, 500] \cup [701, 1000] \end{cases},$$

which simulates a variable rate of object birth. With 20,000 birth components, the value $r_{B,+} = 0.15$ equates to an average birth rate of 3000 objects per time step, and $r_{B,+} = 0.01$ gives an average birth rate of 200 objects per time step. It is imperative to note that the filter is not provided with any information about the locations of the birth components nor their probabilities. Instead, it uses a measurement-based approach to adaptively construct an LMB birth distribution after each iteration.
At each time step, objects that existed at the previous time survive with probability $P_S = 0.999$, i.e. one out of every 1000 objects spontaneously dies on average. This survival probability is known by the filter. An object that has state $x_k$ at time $k$ and survives to time $k + 1$, takes on a new state according to the discrete white noise acceleration model

$$x_{k+1} = F x_k + G w_k,$$
$$F = \begin{bmatrix} 1 & T \\ 0 & 1 \end{bmatrix} \otimes I_2, \qquad G = \begin{bmatrix} T^2/2 \\ T \end{bmatrix} \otimes I_2,$$

where $\otimes$ denotes the Kronecker product, and $w_k \sim \mathcal{N}(0, \sigma_w^2 I_2)$ is a $2 \times 1$ independent and identically distributed (i.i.d.) Gaussian process noise vector. In the current scenario, the sampling interval is $T = 1$ s, and the process noise standard deviation is $\sigma_w = 0.2$ m/s$^2$. The peak cardinality of approximately 1.2 million objects occurs at time 700, when the mean object density is around 520 per km$^2$.
We simulate data from a position sensor, corrupted by noise, missed detections and false alarms. An object that exists at time $k$ with state $x_k$ is detected with probability $P_D = 0.88$, in which case the object generates a measurement

$$z = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} x_k + v_k,$$

and $v_k \sim \mathcal{N}(0, \sigma_v^2 I_2)$ is a $2 \times 1$ i.i.d. Gaussian measurement noise vector, with $\sigma_v = 5$ m. Each set of measurements also contains false alarms, the number of which is Poisson distributed with a mean of 200 per km$^2$ (i.e. an overall mean of $200 \times 64 \times 36 = 460{,}800$ per scan), and a spatial distribution that is uniform across the surveillance region.
For the filter, the number of GLMB components generated during the update of the $i$-th factor in the filtering density is $\max\big(\min\big(|L^{(i)}|^3, 5000\big), 500\big)$, i.e. the size of the factor’s label space cubed, lower bounded at 500 and upper bounded at 5000. In the pruning step, this is reduced to $\max\big(\min\big(\tfrac{1}{5}|L^{(i)}|^3, 1000\big), 100\big)$. In the label partitioning step, the probability threshold for computing the size of the hyper-rectangles representing the bounding regions for each track is $P_G = 0.99$. The maximum size of the label space of any single factor is set to $L_{\max} = 20$. If the constraint imposed by $L_{\max}$ cannot be satisfied with the chosen value for $P_G$, the partitioning routine is repeated by progressively reducing $P_G$ by 20% until the constraint can be satisfied.
The tracking algorithm was coded in C++, making use of OpenMP to implement parallel processing wherever possible. We executed the algorithm on a machine with four 16-core AMD Opteron 6376 processors (for a total of 64 physical processor cores), and 256 GB of memory. On this hardware configuration, the peak time taken to process a frame of measurements was approximately 5 minutes when the cardinality was highest, but the algorithm ran considerably faster than this at times when there were fewer objects in the scene. The peak memory usage of the algorithm was approximately 50 GB. To evaluate tracking performance, the OSPA$^{(2)}$ metric was coded in Matlab, using the “parfor” construct to parallelise some aspects of the computation. The average time taken to evaluate each point in the OSPA$^{(2)}$ curve was approximately 1 minute.
The true and estimated cardinality is shown in Figure 3, and Figure 4 shows the OSPA$^{(2)}$ distance, as well as the traditional OSPA distance. For the OSPA calculations, the cutoff was set to $c = 50$ m, the order was $p = 1$, and for the OSPA$^{(2)}$, a sliding window over the latest 50 time steps was used to evaluate each point in the curve. From the cardinality plot, it can be seen that the estimated cardinality lags behind the true cardinality during the times when new targets are being born. This is to be expected, as the measurement-driven birth model needs to consider a very large number of potential birth tracks at each scan. To avoid initiating too many false tracks, the filter delays the initiation of tracks until there is more data to confirm the presence of an object. The OSPA$^{(2)}$ plot shows increased error

during the periods in which new targets are appearing, due to the delay in initiating new tracks. At other times, the error stabilises to approximately 2.5 m per object per unit time. As we would expect, the OSPA$^{(2)}$ error is consistently higher than the OSPA error, since it penalises incorrect labelling behaviour which is not captured by the OSPA distance. Notably, the difference between the two curves is greatest during the times when the true cardinality is increasing. This is to be expected, since track initiation is arguably one of the most challenging aspects of multi-object tracking, and we can thus expect the tracker to commit more labelling errors at times when there are many new objects appearing. Figure 5 shows a snapshot of the estimated tracks produced by the proposed large-scale GLMB tracker at time 700 when the peak cardinality of approximately 1.2 million objects is reached. Two illustrative videos are also provided in supplementary materials, one showing only the measurements in time, and the other showing the estimates in time.

[Figure 3: plot of cardinality (0 to 1.2M) versus time [s] (0 to 1000); curves: True, Estimated.]
Fig. 3. True and estimated cardinality for large-scale tracking scenario

[Figure 4: plot of OSPA / OSPA$^{(2)}$ [m] (0 to 25) versus time [s] (0 to 1000); curves: OSPA, OSPA$^{(2)}$.]
Fig. 4. OSPA and OSPA$^{(2)}$ distance for large-scale tracking scenario

VII. CONCLUSION
We have presented an efficient and scalable implementation of the generalised labeled multi-Bernoulli filter, that is capable of estimating the trajectories of a very large number of objects simultaneously, in the order of millions per frame. The proposed method makes efficient use of the available computational resources, by decomposing large-scale tracking problems into smaller independent sub-problems. The decomposition is carried out via marginalisation of high-dimensional multi-object densities, using a technique that is shown to be optimal in the sense that it minimises the Kullback-Leibler divergence for a given partition of the object label space. This allows the algorithm to fully exploit the potential of highly parallel processing, as afforded by modern multi-core computing architectures. Due to its relatively low processing time and memory requirements, simulations show that the proposed technique is capable of tracking in excess of a million objects simultaneously, running on standard off-the-shelf computing equipment. Additionally, we have introduced a new way of using the OSPA metric to capture multi-object tracking error (rather than filtering error), and applied it to evaluate the performance of our large-scale multi-object tracker.

VIII. APPENDIX
A. GLMB Approximation and Factorisation
The proofs of Propositions 1 and 2 draw on the preliminary results Lemma 4, Lemma 5 and Corollary 6 below.

Lemma 4. Consider a labeled RFS density $\pi$ on $\mathcal{F}(\mathbb{X} \times \mathbb{L})$, and a partition $\mathfrak{L}$ of $\mathbb{L}$. The $\mathfrak{L}$-partitioned labeled RFS density $\pi_{\mathfrak{L}}$ that minimises the Kullback-Leibler divergence $D_{KL}(\pi; \pi_{\mathfrak{L}})$ is given by $\pi_{\mathfrak{L}} = \{\pi_{\mathfrak{L}}^{(L)}\}_{L\in\mathfrak{L}}$, where

$$\pi_{\mathfrak{L}}^{(L)}(X^{(L)}) = \int \pi(X)\, \delta(X - X^{(L)}).$$

Proof: The proof follows that for the vector case [50], except that set integrals replace standard integrals. Since $\mathfrak{L}$ is a partition of $\mathbb{L}$, $\{X^{(L)} = X \cap (\mathbb{X} \times L)\}_{L\in\mathfrak{L}}$ is a partition of $X$, and $\delta(X - X^{(L)})$ can be replaced with $\prod_{\ell\in\mathbb{L}-L} \delta X^{(\{\ell\})}$. Expanding the Kullback-Leibler divergence between $\pi$ and an arbitrary $\mathfrak{L}$-partitioned labeled RFS density $\tilde{\pi}_{\mathfrak{L}}$ and regrouping yields

$$D_{KL}(\pi; \tilde{\pi}_{\mathfrak{L}}) = D_{KL}(\pi; \pi_{\mathfrak{L}}) + \sum_{L\in\mathfrak{L}} D_{KL}\big(\pi_{\mathfrak{L}}^{(L)}; \tilde{\pi}_{\mathfrak{L}}^{(L)}\big).$$

The result follows by noting that the marginals $\tilde{\pi}_{\mathfrak{L}}^{(L)} = \pi_{\mathfrak{L}}^{(L)}$ result in the minimum divergence.

Lemma 5. Consider a GLMB $\pi = \{(w^{(I,c)}, p^{(c)})\}_{(I,c)\in\mathcal{F}(\mathbb{L})\times\mathbb{C}}$ on $\mathcal{F}(\mathbb{X} \times \mathbb{L})$, and a partition $\mathfrak{L}$ of $\mathbb{L}$. The $\mathfrak{L}$-partitioned labeled RFS density $\pi_{\mathfrak{L}} = \{\pi_{\mathfrak{L}}^{(L)}\}_{L\in\mathfrak{L}}$ that minimises $D_{KL}(\pi; \pi_{\mathfrak{L}})$, is an $\mathfrak{L}$-partitioned GLMB, with GLMB factors

$$\pi_{\mathfrak{L}}^{(L)} = \Big\{\big(w_{\mathfrak{L},L}^{(J,c)},\, p_{\mathfrak{L},L}^{(c)}\big)\Big\}_{(J,c)\in\mathcal{F}(L)\times\mathbb{C}},$$
$$w_{\mathfrak{L},L}^{(J,c)} = \sum_{U\in\mathcal{F}(\mathbb{L}-L)} w^{(J\cup U,c)},$$
$$p_{\mathfrak{L},L}^{(c)}(x, \ell) = 1_L(\ell)\, p^{(c)}(x, \ell).$$

Proof: Using the delta form for GLMB [36],

$$\pi(X) = \Delta(X) \sum_{(I,c)\in\mathcal{F}(\mathbb{L})\times\mathbb{C}} w^{(I,c)}\, \delta_I(\mathcal{L}(X))\, \big[p^{(c)}\big]^X,$$

it follows from Lemma 4 that for any $X^{(L)} \in \mathcal{F}(\mathbb{X} \times L)$,

$$\pi_{\mathfrak{L}}^{(L)}(X^{(L)}) = \int \pi(X)\, \delta(X - X^{(L)}) = \int \pi(X^{(L)} \uplus X^{(\bar{L})})\, \delta X^{(\bar{L})},$$

where $\bar{L} = \mathbb{L} - L$, and $X^{(\bar{L})} = X - X^{(L)} = X \cap (\mathbb{X} \times \bar{L})$.

Fig. 5. Estimated tracks at time step 700 when there is a peak cardinality of 1.2 million objects. The inset is a magnified view of a 2km by 1km region.
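The weight marginalisation in Lemma 5 amounts to summing, for a given label group $L$, the weights of all hypotheses $(I,c)$ that agree on $I\cap L$. The following Python sketch illustrates this on a toy example. The dictionary layout of the GLMB weights and the name `marginalise_weights` are assumptions made for illustration only, not the data structures of the proposed implementation, and the single-object densities $p^{(c)}$ are left out.

```python
from collections import defaultdict


def marginalise_weights(weights, group):
    """Marginalise GLMB weights onto one label group L (cf. Lemma 5).

    weights: dict mapping (frozenset of labels I, component index c) -> w^(I,c).
    group:   iterable of labels forming the group L.
    Returns a dict mapping (J, c) -> sum over U of w^(J | U, c), where
    J = I & L and U = I - L. This layout is hypothetical, for illustration.
    """
    group = frozenset(group)
    marginal = defaultdict(float)
    for (labels, c), w in weights.items():
        # Every I splits uniquely into J (inside L) and U (outside L), so
        # accumulating on the key I & L realises the sum over U.
        marginal[(frozenset(labels) & group, c)] += w
    return dict(marginal)


# Toy GLMB on labels {1, 2} with two components "a" and "b".
weights = {
    (frozenset({1, 2}), "a"): 0.5,
    (frozenset({1}), "a"): 0.2,
    (frozenset(), "a"): 0.1,
    (frozenset({2}), "b"): 0.2,
}
w_group1 = marginalise_weights(weights, {1})
```

In this toy case the hypotheses $\{1,2\}$ and $\{1\}$ of component "a" collapse onto the same marginal hypothesis $J=\{1\}$, and the marginal weights still sum to one.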

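Substituting for $\boldsymbol{\pi}(\mathbf{X}^{(L)}\uplus\mathbf{X}^{(\bar{L})})$ gives

$$\begin{aligned}
\boldsymbol{\pi}_{\mathfrak{L}}^{(L)}(\mathbf{X}^{(L)})
&= \int \Delta(\mathbf{X}^{(L)}\uplus\mathbf{X}^{(\bar{L})}) \sum_{(I,c)\in\mathcal{F}(\mathbb{L})\times\mathbb{C}} w^{(I,c)}\,\delta_I\big(\mathcal{L}(\mathbf{X}^{(L)}\uplus\mathbf{X}^{(\bar{L})})\big)\,\big[p^{(c)}\big]^{\mathbf{X}^{(L)}\uplus\mathbf{X}^{(\bar{L})}}\,\delta\mathbf{X}^{(\bar{L})} \\
&= \int \Delta(\mathbf{X}^{(L)})\,\Delta(\mathbf{X}^{(\bar{L})}) \sum_{(I,c)\in\mathcal{F}(\mathbb{L})\times\mathbb{C}} w^{(I,c)}\,\delta_I\big(\mathcal{L}(\mathbf{X}^{(L)}\uplus\mathbf{X}^{(\bar{L})})\big)\,\big[p^{(c)}\big]^{\mathbf{X}^{(L)}}\big[p^{(c)}\big]^{\mathbf{X}^{(\bar{L})}}\,\delta\mathbf{X}^{(\bar{L})} \\
&= \Delta(\mathbf{X}^{(L)}) \sum_{(I,c)\in\mathcal{F}(\mathbb{L})\times\mathbb{C}} w^{(I,c)}\,\big[p^{(c)}\big]^{\mathbf{X}^{(L)}} \int \Delta(\mathbf{X}^{(\bar{L})})\,\delta_I\big(\mathcal{L}(\mathbf{X}^{(L)}\uplus\mathbf{X}^{(\bar{L})})\big)\,\big[p^{(c)}\big]^{\mathbf{X}^{(\bar{L})}}\,\delta\mathbf{X}^{(\bar{L})} \\
&= \Delta(\mathbf{X}^{(L)}) \sum_{J\in\mathcal{F}(L)}\sum_{U\in\mathcal{F}(\bar{L})}\sum_{c\in\mathbb{C}} w^{(J\cup U,c)}\,\delta_J\big(\mathcal{L}(\mathbf{X}^{(L)})\big)\,\big[p^{(c)}\big]^{\mathbf{X}^{(L)}} \int \Delta(\mathbf{X}^{(\bar{L})})\,\delta_U\big(\mathcal{L}(\mathbf{X}^{(\bar{L})})\big)\,\big[p^{(c)}\big]^{\mathbf{X}^{(\bar{L})}}\,\delta\mathbf{X}^{(\bar{L})},
\end{aligned}$$

where the last line follows from decomposing the sum over $\mathcal{F}(\mathbb{L})$ into sums over $\mathcal{F}(L)$ and its complement $\mathcal{F}(\bar{L})$. Using Lemma 3 of [36],

$$\int \Delta(\mathbf{X}^{(\bar{L})})\,\delta_U\big(\mathcal{L}(\mathbf{X}^{(\bar{L})})\big)\,\big[p^{(c)}\big]^{\mathbf{X}^{(\bar{L})}}\,\delta\mathbf{X}^{(\bar{L})} = \sum_{H\in\mathcal{F}(\bar{L})} \delta_U(H)\left[\int p^{(c)}(x,\cdot)\,dx\right]^{H} = \sum_{H\in\mathcal{F}(\bar{L})} \delta_U(H) = 1.$$

Moreover, noting that the argument of $p^{(c)}$ in $[p^{(c)}]^{\mathbf{X}^{(L)}}$ is restricted to $\mathbb{X}\times L$, we have $[p^{(c)}]^{\mathbf{X}^{(L)}} = [p_{\mathfrak{L},L}^{(c)}]^{\mathbf{X}^{(L)}}$, and hence the desired result.

Treating $L$ as $\mathbb{L}$, and $\mathbb{C}(L)$ as $\mathbb{C}$ in Lemma 5 yields:

Corollary 6. Given $L\in\mathcal{F}(\mathbb{L})$, a GLMB in $\mathcal{G}_L$ of the form

$$\boldsymbol{\pi}^{(L)} = \Big\{\big(w_L^{(I,c)},\, p_L^{(c)}\big)\Big\}_{(I,c)\in\mathcal{F}(L)\times\mathbb{C}(L)},$$

and any $S\in\mathcal{F}(\mathbb{L})$ such that $L\cap S\neq\emptyset$, the marginal

$$\boldsymbol{\pi}^{(L,S)}(\mathbf{X}^{(L\cap S)}) = \int \boldsymbol{\pi}^{(L)}(\mathbf{X}^{(L)})\,\delta(\mathbf{X}^{(L)}-\mathbf{X}^{(S)})$$

is a GLMB in $\mathcal{G}_{L\cap S}$ given by

$$\begin{aligned}
\boldsymbol{\pi}^{(L,S)} &= \Big\{\big(w_{L,S}^{(H,c)},\, p_{L,S}^{(c)}\big)\Big\}_{(H,c)\in\mathcal{F}(L\cap S)\times\mathbb{C}(L)}, \\
w_{L,S}^{(H,c)} &= \sum_{W\in\mathcal{F}(L-S)} w_L^{(H\cup W,c)}, \\
p_{L,S}^{(c)}(x,\ell) &= 1_{L\cap S}(\ell)\,p_L^{(c)}(x,\ell).
\end{aligned}$$

Proof of Proposition 1: Applying Lemma 4, the optimal approximation is given by the product of its marginals

$$\begin{aligned}
\boldsymbol{\pi}_{\mathfrak{S}}(\mathbf{X}) &= \prod_{S\in\mathfrak{S}} \boldsymbol{\pi}_{\mathfrak{S}}^{(S)}(\mathbf{X}^{(S)}), \\
\boldsymbol{\pi}_{\mathfrak{S}}^{(S)}(\mathbf{X}^{(S)}) &= \int \boldsymbol{\pi}_{\mathfrak{L}}(\mathbf{X})\,\delta(\mathbf{X}-\mathbf{X}^{(S)}).
\end{aligned}$$

Since $\{\mathbf{X}^{(L)}\}_{L\in\mathfrak{L}}$ is a partition of $\mathbf{X}$, observe that $\{(\mathbf{X}^{(L)}-\mathbf{X}^{(S)})\}_{L\in\mathfrak{L}}$ is a partition of $\mathbf{X}-\mathbf{X}^{(S)}$, hence the integration can be performed over

$$\delta(\mathbf{X}-\mathbf{X}^{(S)}) = \prod_{L\in\mathfrak{L}} \delta(\mathbf{X}^{(L)}-\mathbf{X}^{(S)}).$$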

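Substituting for $\boldsymbol{\pi}_{\mathfrak{L}}$ and regrouping yields

$$\begin{aligned}
\boldsymbol{\pi}_{\mathfrak{S}}^{(S)}(\mathbf{X}^{(S)}) &= \prod_{L\in\mathfrak{L}} \boldsymbol{\pi}_{\mathfrak{L},S}^{(L,S)}(\mathbf{X}^{(S)}), \\
\boldsymbol{\pi}_{\mathfrak{L},S}^{(L,S)}(\mathbf{X}^{(L\cap S)}) &= \int \boldsymbol{\pi}_{\mathfrak{L}}^{(L)}(\mathbf{X}^{(L)})\,\delta(\mathbf{X}^{(L)}-\mathbf{X}^{(S)}).
\end{aligned}$$

Applying Corollary 6 to evaluate the marginal $\boldsymbol{\pi}_{\mathfrak{L},S}^{(L,S)}$ yields the desired result. □

Proof of Proposition 2: Given a partition $\mathfrak{L}_+$ of $\mathbb{L}_+$, define the corresponding partitions $\mathfrak{S}$ of $\mathbb{L}$ and $\mathfrak{B}_+$ of $\mathbb{B}_+$

$$\begin{aligned}
\mathfrak{S} &= \{J\cap\mathbb{L} : J\in\mathfrak{L}_+\}, \\
\mathfrak{B}_+ &= \{J\cap\mathbb{B}_+ : J\in\mathfrak{L}_+\}.
\end{aligned}$$

Let $Z_+ \triangleq \biguplus_{J\in\mathfrak{L}_+} Z_+^{(J)}$, and $\mathbf{X}_+ \triangleq \biguplus_{J\in\mathfrak{L}_+} \mathbf{X}_+^{(J)}$. Then the next posterior is given by

$$\begin{aligned}
\boldsymbol{\pi}_{\mathfrak{L}_+,+}(\mathbf{X}_+|Z_+) &\propto g(Z_+|\mathbf{X}_+)\,\mathbf{f}_B(\mathbf{X}_{B,+}) \int \mathbf{f}_S(\mathbf{X}_{S,+}|\mathbf{X}_S)\,\boldsymbol{\pi}_{\mathfrak{S}}(\mathbf{X}_S)\,\delta\mathbf{X}_S, \\
\mathbf{X}_{S,+} &= \mathbf{X}_+\cap(\mathbb{X}\times\mathbb{L}), \\
\mathbf{X}_{B,+} &= \mathbf{X}_+\cap(\mathbb{X}\times\mathbb{B}_+).
\end{aligned}$$

Noting that $\mathfrak{S}$ is a partition of $\mathbb{L}$, it follows that the transition factors into

$$\mathbf{f}_S(\mathbf{X}_{S,+}|\mathbf{X}_S) = \prod_{J\in\mathfrak{L}_+} \mathbf{f}_S^{(J\cap\mathbb{L})}\big(\mathbf{X}_{S,+}^{(J\cap\mathbb{L})}\big|\mathbf{X}_S^{(J\cap\mathbb{L})}\big).$$

Similarly, noting that $\mathfrak{B}_+$ is a partition of $\mathbb{B}_+$, it follows that the birth factors into

$$\mathbf{f}_B^{(\mathbb{B}_+)}(\mathbf{X}_{B,+}) = \prod_{J\in\mathfrak{L}_+} \mathbf{f}_B^{(J\cap\mathbb{B}_+)}\big(\mathbf{X}_{B,+}^{(J\cap\mathbb{B}_+)}\big).$$

Substituting

$$\begin{aligned}
\boldsymbol{\pi}_{\mathfrak{S}}(\mathbf{X}_S) &= \prod_{J\in\mathfrak{L}_+} \boldsymbol{\pi}_{\mathfrak{S}}^{(J\cap\mathbb{L})}\big(\mathbf{X}_S^{(J\cap\mathbb{L})}\big), \\
g(Z_+|\mathbf{X}_+) &= \prod_{J\in\mathfrak{L}_+} g\big(Z_+^{(J)}\big|\mathbf{X}_+^{(J)}\big),
\end{aligned}$$

yields

$$\begin{aligned}
\boldsymbol{\pi}_{\mathfrak{L}_+,+}(\mathbf{X}_+|Z_+) &= \prod_{J\in\mathfrak{L}_+} \boldsymbol{\pi}_{\mathfrak{L}_+,+}^{(J)}\big(\mathbf{X}_+^{(J)}\big|Z_+^{(J)}\big), \\
\boldsymbol{\pi}_{\mathfrak{L}_+,+}^{(J)}\big(\mathbf{X}_+^{(J)}\big|Z_+^{(J)}\big) &= g\big(Z_+^{(J)}\big|\mathbf{X}_+^{(J)}\big)\,\mathbf{f}_B^{(J\cap\mathbb{B}_+)}\big(\mathbf{X}_{B,+}^{(J\cap\mathbb{B}_+)}\big) \int \mathbf{f}_S^{(J\cap\mathbb{L})}\big(\mathbf{X}_{S,+}^{(J\cap\mathbb{L})}\big|\mathbf{X}_S^{(J\cap\mathbb{L})}\big)\,\boldsymbol{\pi}_{\mathfrak{S}}^{(J\cap\mathbb{L})}\big(\mathbf{X}_S^{(J\cap\mathbb{L})}\big)\,\delta\mathbf{X}_S^{(J\cap\mathbb{L})}.
\end{aligned}$$

Applying Proposition 1 in [38] for each $J\in\mathfrak{L}_+$ gives the final result. □

B. Proof of Proposition 3 (Metric for Tracks)

Firstly, since $d^{(c)}(\cdot,\cdot)\geq 0$ and $|D_x\cup D_y|\geq 0$, all terms in the summation over $t\in D_x\cup D_y$ are clearly non-negative. Hence $\tilde{d}^{(c)}(\cdot,\cdot)$ satisfies metric property P1.

Second, $\tilde{d}^{(c)}(x,y)=0$ if and only if $x=y=\emptyset$, or $d^{(c)}(\{x(t)\},\{y(t)\})=0$ for all $t\in D_x\cup D_y$. Since $d^{(c)}(\cdot,\cdot)$ is a metric, $d^{(c)}(\{x(t)\},\{y(t)\})=0 \Leftrightarrow \{x(t)\}=\{y(t)\}$. Hence $\tilde{d}^{(c)}(x,y)=0 \Leftrightarrow x=y$, satisfying metric property P2.

Third, since $d^{(c)}(\cdot,\cdot)$ is a metric, and $D_x\cup D_y = D_y\cup D_x$, all terms in (35) are symmetric in their arguments. Hence $\tilde{d}^{(c)}(\cdot,\cdot)$ satisfies metric property P3.

It remains to verify metric property P4 (the triangle inequality), which is accomplished via induction. Since $d^{(c)}(\cdot,\cdot)$ is a metric, it is clear that the distance between the tracks at a single time index 1 (representing time $t_1$) satisfies the triangle inequality. Let us assume that the distance between the tracks over time indices $\{1,2,\ldots,k\}$ (representing $\{t_1,t_2,\ldots,t_k\}$) satisfies the triangle inequality. We now proceed to show that the distance between the tracks over time indices $\{1,2,\ldots,k,k+1\}$ also satisfies the triangle inequality.

When at least one of the sets $D_x\cup D_y$, $D_y\cup D_z$, $D_z\cup D_x$ is empty, the triangle inequality for tracks over time indices $1,2,\ldots,k,k+1$ can be easily verified, since $d^{(c)}(\cdot,\cdot)$ is a metric. Hence, we consider the case where $D_x\cup D_y$, $D_y\cup D_z$, $D_z\cup D_x$ are all non-empty.

Some notation is needed for compactness. Let us denote the cardinalities of the basic sets in $D_x\cup D_y\cup D_z$ by

\begin{align}
N &\triangleq |D_x\cap D_y\cap D_z|, \tag{38} \\
N_{\breve{x}} &\triangleq |D_x-D_y-D_z|, \quad N_p \triangleq |D_x\cap D_y-D_z|, \tag{39} \\
N_{\breve{y}} &\triangleq |D_y-D_z-D_x|, \quad N_q \triangleq |D_y\cap D_z-D_x|, \tag{40} \\
N_{\breve{z}} &\triangleq |D_z-D_x-D_y|, \quad N_r \triangleq |D_z\cap D_x-D_y|, \tag{41}
\end{align}

as illustrated in the diagram below.

[Diagram: the basic sets of $D_x\cup D_y\cup D_z$.]

Furthermore, we adopt the following abbreviations:

\begin{align}
S &\triangleq N+N_p+N_q+N_r, \tag{42} \\
T &\triangleq S+|D_x\cup D_y\cup D_z| = 2S+N_{\breve{x}}+N_{\breve{y}}+N_{\breve{z}}, \tag{43} \\
P &\triangleq |D_x\cup D_y| = S+N_{\breve{x}}+N_{\breve{y}}, \tag{44} \\
Q &\triangleq |D_y\cup D_z| = S+N_{\breve{y}}+N_{\breve{z}}, \tag{45} \\
R &\triangleq |D_z\cup D_x| = S+N_{\breve{x}}+N_{\breve{z}}, \tag{46} \\
p &\triangleq \sum_{t\in D_x\cup D_y} d^{(c)}(\{x(t)\},\{y(t)\}), \tag{47} \\
p' &\triangleq d^{(c)}(\{x(k+1)\},\{y(k+1)\}), \tag{48} \\
q &\triangleq \sum_{t\in D_y\cup D_z} d^{(c)}(\{y(t)\},\{z(t)\}), \tag{49} \\
q' &\triangleq d^{(c)}(\{y(k+1)\},\{z(k+1)\}), \tag{50} \\
r &\triangleq \sum_{t\in D_z\cup D_x} d^{(c)}(\{z(t)\},\{x(t)\}), \tag{51} \\
r' &\triangleq d^{(c)}(\{z(k+1)\},\{x(k+1)\}). \tag{52}
\end{align}

The sum $p$ over (non-empty) $D_x\cup D_y$ is decomposed into: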

• $\hat{p}_N$ consisting of $N$ terms on $(D_x\cap D_y\cap D_z)$;
• $\hat{p}_{N_p}$ consisting of $N_p$ terms on $(D_x\cap D_y-D_z)$; and
• $(N_q+N_r+N_{\breve{x}}+N_{\breve{y}})\,c$, since $d^{(c)}(\{x(k)\},\{y(k)\})=c$ on $(D_x\cup D_y)-(D_x\cap D_y)$.

Similar decompositions also apply to $q$ and $r$. Hence,

$$\begin{aligned}
p &\triangleq \hat{p}_N+\hat{p}_{N_p}+(N_q+N_r+N_{\breve{x}}+N_{\breve{y}})\,c, \\
q &\triangleq \hat{q}_N+\hat{q}_{N_q}+(N_r+N_p+N_{\breve{y}}+N_{\breve{z}})\,c, \\
r &\triangleq \hat{r}_N+\hat{r}_{N_r}+(N_p+N_q+N_{\breve{z}}+N_{\breve{x}})\,c.
\end{aligned}$$

The following bounds are required for the proof:

\begin{align}
\hat{p}_N &\leq \hat{q}_N+\hat{r}_N, \tag{53} \\
p &\leq \hat{p}_N+(P-N)\,c \leq Pc, \tag{54} \\
q &\geq \hat{q}_N+(N_r+N_p+N_{\breve{y}}+N_{\breve{z}})\,c \triangleq \hat{q}_N+\mathring{Q}c, \tag{55} \\
r &\geq \hat{r}_N+(N_p+N_q+N_{\breve{z}}+N_{\breve{x}})\,c \triangleq \hat{r}_N+\mathring{R}c. \tag{56}
\end{align}

Note that: the triangle inequality (53) holds because $\hat{p}_N$, $\hat{q}_N$, $\hat{r}_N$ are, respectively, sums of distances between $x$ and $y$, $y$ and $z$, $z$ and $x$, over all time indices in $D_x\cap D_y\cap D_z$; (54) holds because $\hat{p}_{N_p}\leq N_p c$, $N_p+N_q+N_r+N_{\breve{x}}+N_{\breve{y}}=P-N$, and $\hat{p}_N\leq Nc$; and (55), (56) hold because $\hat{q}_{N_q},\hat{r}_{N_r}\geq 0$.

The following identities (which follow directly from the definitions)

\begin{align}
P+Q &= T+N_{\breve{y}}, \tag{57} \\
&= R+S+2N_{\breve{y}}, \tag{58} \\
Q+R &= T+N_{\breve{z}}, \tag{59} \\
R+P &= T+N_{\breve{x}}, \tag{60} \\
N+\mathring{Q}+\mathring{R}-Q &= N_p+N_{\breve{x}}+N_{\breve{z}}, \tag{61} \\
N+\mathring{Q}+\mathring{R}-P &= N_p+2N_{\breve{z}}, \tag{62}
\end{align}

are also required for the proof. Note that so far, all of the variables we have defined are non-negative.

Adopting the above notation, the properties of $d^{(c)}(\cdot,\cdot)$ and the triangle inequality for $\tilde{d}(\cdot,\cdot)$ can be expressed as

\begin{align}
& c \geq p', q', r', \tag{63} \\
& r'+q'-p' \geq 0, \tag{64} \\
& \frac{r}{R}+\frac{q}{Q} \geq \frac{p}{P} \quad\text{or equivalently}\quad PQr+RPq-QRp \geq 0. \tag{65}
\end{align}

We need to prove that the triangle inequality holds for the following three cases (note that the result holds trivially when $\{x(k+1)\}=\{y(k+1)\}=\{z(k+1)\}=\emptyset$):

(i) $\{x(k+1)\}=\{y(k+1)\}=\emptyset$, and $\{z(k+1)\}\neq\emptyset$, i.e.
$$\frac{r+c}{R+1}+\frac{q+c}{Q+1} \geq \frac{p}{P}.$$

(ii) $\{z(k+1)\}=\emptyset$, and $\{x(k+1)\}\neq\emptyset$ or $\{y(k+1)\}\neq\emptyset$, i.e.
$$\frac{r}{R}+\frac{q+c}{Q+1} \geq \frac{p+c}{P+1} \quad\text{or}\quad \frac{r+c}{R+1}+\frac{q}{Q} \geq \frac{p+c}{P+1}.$$

(iii) At least two of $\{x(k+1)\}$, $\{y(k+1)\}$ and $\{z(k+1)\}$ are non-empty, i.e.
$$\frac{r+r'}{R+1}+\frac{q+q'}{Q+1} \geq \frac{p+p'}{P+1}.$$

For case (i)

$$\begin{aligned}
&\frac{r+c}{R+1}+\frac{q+c}{Q+1}-\frac{p}{P} \\
&= \frac{P(Q+1)r+P(Q+1)c}{P(Q+1)(R+1)}+\frac{(R+1)Pq+(R+1)Pc}{P(Q+1)(R+1)}-\frac{(Q+1)(R+1)p}{P(Q+1)(R+1)} \\
&= \frac{P(Q+1)r+(R+1)Pq}{P(Q+1)(R+1)}+\frac{Pc+(Q+R+1)Pc}{P(Q+1)(R+1)}-\frac{(QR+Q+R+1)p}{P(Q+1)(R+1)} \\
&= \frac{PQr+RPq-QRp+Pr+Pq+Pc}{P(Q+1)(R+1)}+\frac{(Q+R+1)Pc-(Q+R+1)p}{P(Q+1)(R+1)} \\
&\geq \frac{P(r+q+c)+(Q+R+1)(Pc-p)}{P(Q+1)(R+1)} \geq 0,
\end{aligned}$$

where we used the triangle inequality (65) and the bound $Pc\geq p$ from (54).

For case (ii)

$$\begin{aligned}
&\frac{r}{R}+\frac{q+c}{Q+1}-\frac{p+c}{P+1} \\
&= \frac{(P+1)(Q+1)r}{(P+1)(Q+1)R}+\frac{R(P+1)q+R(P+1)c-(Q+1)Rp-(Q+1)Rc}{(P+1)(Q+1)R} \\
&= \frac{(PQ+P+Q+1)r}{(P+1)(Q+1)R}+\frac{R(P+1)q-(Q+1)Rp+R(P+1)c-(Q+1)Rc}{(P+1)(Q+1)R} \\
&= \frac{(PQ+P+Q+1)r+RPq+Rq-QRp}{(P+1)(Q+1)R}+\frac{(P-Q)Rc-Rp}{(P+1)(Q+1)R} \\
&= \frac{PQr+RPq-QRp+(P+Q+1)r+Rq}{(P+1)(Q+1)R}+\frac{(P-Q)Rc-Rp}{(P+1)(Q+1)R} \\
&\geq \frac{(P+Q+1)r+Rq-Rp+(P-Q)Rc}{(P+1)(Q+1)R},
\end{aligned}$$

where the last line follows from the triangle inequality (65). Using the bounds (54), (55), (56) for $p$, $q$, and $r$, and the identity $P+Q=R+S+2N_{\breve{y}}$ from (58), we have

$$\begin{aligned}
&(P+Q+1)r+Rq-Rp+(P-Q)Rc \\
&\geq (R+S+2N_{\breve{y}}+1)\big[\hat{r}_N+\mathring{R}c\big]+R\big[\hat{q}_N+\mathring{Q}c\big]-R\big[\hat{p}_N+(P-N)c\big]+(P-Q)Rc \\
&= R\hat{r}_N+(S+2N_{\breve{y}}+1)\hat{r}_N+\mathring{R}(R+S+2N_{\breve{y}}+1)c \\
&\quad +R\hat{q}_N+\mathring{Q}Rc \\
&\quad -R\hat{p}_N-(P-N)Rc+(P-Q)Rc \\
&\geq 0+0+\mathring{R}Rc+\mathring{Q}Rc+(N-Q)Rc \\
&= \big(\mathring{R}+\mathring{Q}+N-Q\big)Rc \\
&= \big[N_p+N_{\breve{x}}+N_{\breve{z}}\big]Rc \geq 0,
\end{aligned}$$

where we used $\hat{r}_N+\hat{q}_N-\hat{p}_N\geq 0$ from (53) and $\mathring{R}+\mathring{Q}+N-Q=N_p+N_{\breve{x}}+N_{\breve{z}}$ from (61).
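As a numerical sanity check on the bookkeeping above, the following Python sketch builds random track triples, verifies the counting identities (57)-(62) from the region cardinalities (38)-(41), and tests the triangle inequality for the time-averaged track distance. Several ingredients are assumptions made for illustration: tracks are dictionaries from time index to a scalar position, the base distance between singleton sets is $\min(c,|a-b|)$ with value $c$ when exactly one set is empty, and the track distance is the average of the base distance over $D_x\cup D_y$ (the form implied by the proof, since the defining equation is not restated here). All function names are hypothetical.

```python
import random


def region_counts(Dx, Dy, Dz):
    """Cardinalities (38)-(41) of the basic regions of Dx | Dy | Dz."""
    return {
        "N": len(Dx & Dy & Dz),
        "Nx": len(Dx - Dy - Dz), "Np": len((Dx & Dy) - Dz),
        "Ny": len(Dy - Dz - Dx), "Nq": len((Dy & Dz) - Dx),
        "Nz": len(Dz - Dx - Dy), "Nr": len((Dz & Dx) - Dy),
    }


def identities_hold(Dx, Dy, Dz):
    """Check the identities (57)-(62) for three track domains."""
    n = region_counts(Dx, Dy, Dz)
    S = n["N"] + n["Np"] + n["Nq"] + n["Nr"]            # (42)
    T = 2 * S + n["Nx"] + n["Ny"] + n["Nz"]             # (43)
    P, Q, R = len(Dx | Dy), len(Dy | Dz), len(Dz | Dx)  # (44)-(46)
    Q_ring = n["Nr"] + n["Np"] + n["Ny"] + n["Nz"]      # coefficient in (55)
    R_ring = n["Np"] + n["Nq"] + n["Nz"] + n["Nx"]      # coefficient in (56)
    return all([
        T == S + len(Dx | Dy | Dz),
        P + Q == T + n["Ny"],                                       # (57)
        P + Q == R + S + 2 * n["Ny"],                               # (58)
        Q + R == T + n["Nz"],                                       # (59)
        R + P == T + n["Nx"],                                       # (60)
        n["N"] + Q_ring + R_ring - Q == n["Np"] + n["Nx"] + n["Nz"],  # (61)
        n["N"] + Q_ring + R_ring - P == n["Np"] + 2 * n["Nz"],        # (62)
    ])


def base_distance(a, b, c):
    """Cut-off distance between two sets of at most one point (None = empty)."""
    if a is None and b is None:
        return 0.0
    if a is None or b is None:
        return c
    return min(c, abs(a - b))


def track_distance(x, y, c):
    """Assumed time-averaged distance over Dx | Dy; tracks map time -> position."""
    times = set(x) | set(y)
    if not times:
        return 0.0
    return sum(base_distance(x.get(t), y.get(t), c) for t in times) / len(times)


def random_track(rng, horizon=6):
    """Random track that exists at each time index with probability 0.6."""
    return {t: rng.uniform(0.0, 10.0) for t in range(horizon) if rng.random() < 0.6}


rng = random.Random(0)
c = 3.0
all_ok = True
for _ in range(2000):
    x, y, z = (random_track(rng) for _ in range(3))
    ok_ids = identities_hold(set(x), set(y), set(z))
    ok_tri = (track_distance(x, y, c)
              <= track_distance(x, z, c) + track_distance(z, y, c) + 1e-12)
    all_ok = all_ok and ok_ids and ok_tri
```

A random search of this kind cannot replace the induction argument, but it is a cheap way to catch transcription errors in the identities before they are used in cases (ii) and (iii).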

For case (iii)

\begin{align}
&\frac{r+r'}{R+1}+\frac{q+q'}{Q+1}-\frac{p+p'}{P+1} \notag \\
&= \frac{(P+1)(Q+1)(r+r')}{(P+1)(Q+1)(R+1)}+\frac{(R+1)(P+1)(q+q')}{(P+1)(Q+1)(R+1)}-\frac{(Q+1)(R+1)(p+p')}{(P+1)(Q+1)(R+1)} \notag \\
&= \frac{(PQ+P+Q+1)r}{(P+1)(Q+1)(R+1)}+\frac{(RP+R+P+1)q}{(P+1)(Q+1)(R+1)}-\frac{(QR+Q+R+1)p}{(P+1)(Q+1)(R+1)} \notag \\
&\quad +\frac{(PQ+P+Q+1)r'}{(P+1)(Q+1)(R+1)}+\frac{(RP+R+P+1)q'}{(P+1)(Q+1)(R+1)}-\frac{(QR+Q+R+1)p'}{(P+1)(Q+1)(R+1)} \notag \\
&= \frac{(PQr+RPq-QRp)+(r'+q'-p')}{(P+1)(Q+1)(R+1)} \tag{66} \\
&\quad +\frac{(P+Q+1)r+(R+P+1)q-(Q+R+1)p}{(P+1)(Q+1)(R+1)} \tag{67} \\
&\quad +\frac{(PQ+P+Q)r'+(RP+R+P)q'-(QR+Q+R)p'}{(P+1)(Q+1)(R+1)}. \tag{68}
\end{align}

Note that (66) $\geq 0$ from the triangle inequalities (65) and (64). It remains to be shown that (67) + (68) $\geq 0$.

Into (67), we substitute the three identities $P+Q=T+N_{\breve{y}}$, $Q+R=T+N_{\breve{z}}$ and $R+P=T+N_{\breve{x}}$, from (57), (59) and (60) respectively. We also use the upper/lower bounds on $p$, $q$, and $r$, from (54), (55) and (56) respectively. This yields the following expression

$$\begin{aligned}
&(P+Q+1)r+(R+P+1)q-(Q+R+1)p \\
&\geq (T+N_{\breve{y}}+1)\big[\hat{r}_N+\mathring{R}c\big]+(T+N_{\breve{x}}+1)\big[\hat{q}_N+\mathring{Q}c\big]-(T+N_{\breve{z}}+1)\big[\hat{p}_N+(P-N)c\big] \\
&= (T+1)\hat{r}_N+N_{\breve{y}}\hat{r}_N+(T+1)\mathring{R}c+N_{\breve{y}}\mathring{R}c \\
&\quad +(T+1)\hat{q}_N+N_{\breve{x}}\hat{q}_N+(T+1)\mathring{Q}c+N_{\breve{x}}\mathring{Q}c \\
&\quad -(T+1)\hat{p}_N-N_{\breve{z}}\hat{p}_N-(T+1)(P-N)c-N_{\breve{z}}(P-N)c \\
&\geq 0-N_{\breve{z}}Nc+(T+1)(N_p+2N_{\breve{z}})c-N_{\breve{z}}Pc+N_{\breve{z}}Nc \\
&= \big[(T+1)(N_p+2N_{\breve{z}})-N_{\breve{z}}P\big]c \\
&\geq \big[(T+1)N_{\breve{z}}-N_{\breve{z}}P\big]c \\
&= (S+N_{\breve{z}}+1)N_{\breve{z}}\,c,
\end{aligned}$$

where we used $\hat{r}_N+\hat{q}_N-\hat{p}_N\geq 0$ from (53), $N_{\breve{y}}\hat{r}_N\geq 0$, $N_{\breve{x}}\hat{q}_N\geq 0$, $-\hat{p}_N\geq -Nc$ from (54), $\mathring{Q}+\mathring{R}-P+N=N_p+2N_{\breve{z}}$ from (62), $N_{\breve{y}}\mathring{R}c\geq 0$, $N_{\breve{x}}\mathring{Q}c\geq 0$, $N_p+2N_{\breve{z}}\geq N_{\breve{z}}$, and $T-P=S+N_{\breve{z}}$ from (43) and (44).

For (68), note from (57), (59) and (60) that $P+Q=T+N_{\breve{y}}$, $Q+R=T+N_{\breve{z}}$ and $R+P=T+N_{\breve{x}}$. Additionally,

$$\begin{aligned}
PQ &= (S+N_{\breve{x}}+N_{\breve{y}})(S+N_{\breve{y}}+N_{\breve{z}}) \\
&= SS+SN_{\breve{x}}+2SN_{\breve{y}}+SN_{\breve{z}}+N_{\breve{x}}N_{\breve{y}}+N_{\breve{y}}N_{\breve{z}}+N_{\breve{z}}N_{\breve{x}}+N_{\breve{y}}N_{\breve{y}}, \\
QR &= (S+N_{\breve{y}}+N_{\breve{z}})(S+N_{\breve{x}}+N_{\breve{z}}) \\
&= SS+SN_{\breve{x}}+SN_{\breve{y}}+2SN_{\breve{z}}+N_{\breve{x}}N_{\breve{y}}+N_{\breve{y}}N_{\breve{z}}+N_{\breve{z}}N_{\breve{x}}+N_{\breve{z}}N_{\breve{z}}, \\
RP &= (S+N_{\breve{x}}+N_{\breve{z}})(S+N_{\breve{x}}+N_{\breve{y}}) \\
&= SS+2SN_{\breve{x}}+SN_{\breve{y}}+SN_{\breve{z}}+N_{\breve{x}}N_{\breve{y}}+N_{\breve{y}}N_{\breve{z}}+N_{\breve{z}}N_{\breve{x}}+N_{\breve{x}}N_{\breve{x}}.
\end{aligned}$$

Hence, expanding (68) results in (69), where we used the triangle inequality (64) for $p'$, $q'$, $r'$. Finally, since $c\geq p'$, it follows that (67) + (68) $\geq 0$. □

It is possible to extend the base-distance (33) to the case where the terms of the sum in (33) are raised to the power of $q$, and the $q$th root of the resulting sum is taken.

REFERENCES

[1] S. Blackman, R. Popoli, Design and Analysis of Modern Tracking Systems, Norwood, MA: Artech House, 1999.
[2] Y. Bar-Shalom, X. Rong Li, Multitarget-Multisensor Tracking: Principles and Techniques, YBS Publishing, 1995.
[3] R. P. S. Mahler, Statistical Multisource-Multitarget Information Fusion, Artech House, 2007.
[4] B. A. Jones, D. S. Bryant, B.-T. Vo, and B.-N. Vo, “Challenges of multi-target tracking for space situational awareness,” in Proc. 18th Int. Conf. Information Fusion, pp. 1278-1285, Washington D.C., July 2015.
[5] H. Klinkrad, “Space debris: models and risk analysis,” Springer Science & Business Media, 2006.
[6] Committee for the Assessment of the U.S. Air Force’s Astrodynamic Standards; Aeronautics and Space Engineering Board; Division on Engineering and Physical Sciences; National Research Council, Continuing Kepler’s Quest: Assessing Air Force Space Command’s Astrodynamics Standards, The National Academies Press, 2012.
[7] V. Reilly, H. Idrees, and M. Shah, “Detection and tracking of large number of targets in wide area surveillance,” in Proc. European Conf. Computer Vision, pp. 186-199, Springer, Berlin, Heidelberg, September 2010.
[8] T. Pollard and M. Antone, “Detecting and tracking all moving objects in wide-area aerial video,” in 2012 IEEE Computer Society Conf. Computer Vision & Pattern Recognition Workshops, pp. 15–22, June 2012.
[9] J. Prokaj, X. Zhao, and G. Medioni, “Tracking many vehicles in wide area aerial surveillance,” in 2012 IEEE Computer Society Conf. Computer Vision & Pattern Recognition Workshops, pp. 37–43, June 2012.
[10] N. Chenouard, I. Bloch, and J.-C. Olivo-Marin, “Multiple hypothesis tracking for cluttered biological image sequences,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 35, no. 11, pp. 2736-3750, November 2013.
[11] E. Meijering, O. Dzyubachyk and I. Smal, “Methods for cell and particle tracking,” in Methods in Enzymology, vol. 504, Academic Press, 2012, pp. 183–200.
[12] J. K. Uhlmann, “Algorithms for multiple-target tracking,” American Scientist, vol. 80, no. 2, pp. 128–141, 1992.
[13] J. B. Collins and J. K. Uhlmann, “Efficient gating in data association with multivariate Gaussian distributed states,” IEEE Trans. Aerosp. Electron. Syst., vol. 28, no. 3, pp. 909-916, July 1992.
[14] D. Musicki, B. F. La Scala, and R. J. Evans, “Multi-target tracking in clutter without measurement assignment,” in Proc. 43rd IEEE Conf. on Decision and Control, pp. 716-721, 2004.
[15] H. Wang, T. Kirubarajan, and Y. Bar-Shalom, “Precision large scale air traffic surveillance using IMM/assignment estimators,” IEEE Trans. Aerosp. Electron. Syst., vol. 35, no. 1, pp. 255-266, January 1999.
[16] M. Betke, D.E. Hirsh, A. Bagchi, N.I. Hristov, N.C. Makris, and T.H. Kunz, “Tracking large variable numbers of objects in clutter,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2007.
[17] B.-N. Vo, B.-T. Vo, S. Reuter, Q. Lam, and K. Dietmayer, “Towards large scale multi-target tracking,” in Proc. Int. Society for Optics & Photonics: Sensors & Systems for Space Applications VII, vol. 9085, pp. 90850W, 2014.
[18] J. Olofsson, C. Veibäck, and G. Hendeby, “Sea ice tracking with a spatially indexed labeled multi-Bernoulli filter,” in Proc. 20th Int. Conf. Information Fusion, Xi’an, China, July 2017.
[19] D. E. Clark, S. J. Julier, R. Mahler, and B. Ristic, “Robust multi-object sensor fusion with unknown correlations,” in Proc. Sens. Signal Process. Def., pp. 1–5, Sep. 2010.
[20] M. Uney, D. E. Clark, and S. J. Julier, “Distributed fusion of PHD filters via exponential mixture densities,” IEEE J. Sel. Topics Signal Process., vol. 7, no. 3, pp. 521–531, Jun. 2013.
[21] G. Battistelli, L. Chisci, C. Fantacci, A. Farina, and A. Graziano, “Consensus CPHD filter for distributed multitarget tracking,” IEEE J. Sel. Topics Signal Process., vol. 7, no. 3, pp. 508–520, Jun. 2013.
[22] G. Battistelli, L. Chisci, C. Fantacci, A. Farina, and R. Mahler, “Distributed fusion of multitarget densities and consensus PHD/CPHD filters,” in Proc. SPIE Def., Security Sens., Baltimore, MD, USA, vol. 9474, 2015.
[23] M. B. Guldogan, “Consensus Bernoulli filter for distributed detection and tracking using multi-static Doppler shifts,” IEEE Signal Process. Lett., vol. 24, no. 6, pp. 672–676, Jun. 2014.

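\begin{align}
&(PQ+P+Q)r'+(RP+R+P)q'-(QR+Q+R)p' \notag \\
&= SSr'+SN_{\breve{x}}r'+2SN_{\breve{y}}r'+SN_{\breve{z}}r'+N_{\breve{x}}N_{\breve{y}}r'+N_{\breve{y}}N_{\breve{z}}r'+N_{\breve{z}}N_{\breve{x}}r'+N_{\breve{y}}N_{\breve{y}}r'+Tr'+N_{\breve{y}}r' \notag \\
&\quad +SSq'+2SN_{\breve{x}}q'+SN_{\breve{y}}q'+SN_{\breve{z}}q'+N_{\breve{x}}N_{\breve{y}}q'+N_{\breve{y}}N_{\breve{z}}q'+N_{\breve{z}}N_{\breve{x}}q'+N_{\breve{x}}N_{\breve{x}}q'+Tq'+N_{\breve{x}}q' \notag \\
&\quad -SSp'-SN_{\breve{x}}p'-SN_{\breve{y}}p'-2SN_{\breve{z}}p'-N_{\breve{x}}N_{\breve{y}}p'-N_{\breve{y}}N_{\breve{z}}p'-N_{\breve{z}}N_{\breve{x}}p'-N_{\breve{z}}N_{\breve{z}}p'-Tp'-N_{\breve{z}}p' \notag \\
&\geq 0+0+0-SN_{\breve{z}}p'+0+0+0-N_{\breve{z}}N_{\breve{z}}p'+0-N_{\breve{z}}p' \notag \\
&= -(S+N_{\breve{z}}+1)N_{\breve{z}}\,p'. \tag{69}
\end{align}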

[24] W. Yi, M. Jiang, R. Hoseinnezhad, and B. Wang, “Distributed multisensor fusion using generalised multi-Bernoulli densities,” IET Radar, Sonar Navig., vol. 11, no. 3, pp. 434–443, Mar. 2016.
[25] B. L. Wang, W. Yi, R. Hoseinnezhad, S. Q. Li, L. J. Kong, and X. B. Yang, “Distributed fusion with multi-Bernoulli filter based on generalized covariance intersection,” IEEE Trans. Signal Process., vol. 65, no. 1, pp. 242–255, Jan. 2017.
[26] T. C. Li, J.M. Corchado, and S. D. Sun, “On generalized covariance intersection for distributed PHD filtering and a simple but better alternative,” in Proc. IEEE Int. Fusion Conf., pp. 1–8, 2017.
[27] F. Meyer et al., “Message passing algorithms for scalable multitarget tracking,” Proceedings of the IEEE, vol. 106, no. 2, pp. 221-259, Feb. 2018.
[28] C. Fantacci, B.-N. Vo, B.-T. Vo, G. Battistelli, and L. Chisci, “Robust fusion for multisensor multiobject tracking,” IEEE Signal Process. Lett., vol. 25, no. 5, pp. 640–644, 2018.
[29] S. Reuter, B.-T. Vo, B.-N. Vo, and K. Dietmayer, “The labeled multi-Bernoulli filter,” IEEE Trans. Signal Process., vol. 62, no. 12, pp. 3246-3260, 2014.
[30] C. Fantacci and F. Papi, “Scalable multisensor multitarget tracking using the marginalized-GLMB density,” IEEE Signal Process. Lett., vol. 23, no. 6, pp. 863-867, 2016.
[31] X. Wang, A.K. Gostar, T. Rathnayake, B. Xu, A. Bab-Hadiashar, “Centralized multiple-view sensor fusion using labeled multi-Bernoulli filters,” Signal Processing, vol. 150, pp. 75–84, 2018.
[32] S. Li, W. Yi, R. Hoseinnezhad, G. Battistelli, B. Wang, and L. Kong, “Robust distributed fusion with labeled random finite sets,” IEEE Trans. Signal Process., vol. 66, no. 2, pp. 278-293, 2018.
[33] M. Üney, J. Houssineau, E. Delande, S. J. Julier, and D. E. Clark, “Fusion of finite-set distributions: Pointwise consistency and global cardinality,” IEEE Trans. Aerosp. Electron. Syst., vol. 55, no. 6, pp. 2759-2773, 2019.
[34] S. Li, G. Battistelli, L. Chisci, W. Yi, B. Wang, and L. Kong, “Computationally efficient multi-agent multi-object tracking with labeled random finite sets,” IEEE Trans. Signal Process., vol. 67, no. 1, pp. 260-275, 2019.
[35] B.-N. Vo, S. Singh, A. Doucet, “Sequential Monte Carlo methods for multitarget filtering with random finite sets,” IEEE Trans. Aerosp. Electron. Syst., vol. 41, no. 4, pp. 1224-1245, 2005.
[36] B.-T. Vo and B.-N. Vo, “Labeled random finite sets and multi-object conjugate priors,” IEEE Trans. Signal Process., vol. 61, no. 13, pp. 3460-3475, 2013.
[37] B.-N. Vo, B.-T. Vo, and D. Phung, “Labeled random finite sets and the Bayes multi-target tracking filter,” IEEE Trans. Signal Process., vol. 62, no. 24, pp. 6554-6567, 2014.
[38] B.-N. Vo, B.-T. Vo, and H. G. Hoang, “An efficient implementation of the generalized labeled multi-Bernoulli filter,” IEEE Trans. Signal Process., vol. 65, no. 8, pp. 1975-1987, 2017.
[39] B. Ristic, B.-N. Vo, D. Clark, and B.-T. Vo, “A metric for performance evaluation of multi-target tracking algorithms,” IEEE Trans. Signal Process., vol. 59, no. 7, pp. 3452–3457, 2011.
[40] T. Vu and R. Evans, “A new performance metric for multiple target tracking based on optimal subpattern assignment,” in Proc. 17th Int. Conf. Information Fusion, Salamanca, Spain, July 2014.
[41] D. Schuhmacher, B.-T. Vo, and B.-N. Vo, “A consistent metric for performance evaluation of multi-object filters,” IEEE Trans. Signal Process., vol. 56, no. 8, pp. 3447-3457, 2008.
[42] M. Beard, B.-T. Vo, B.-N. Vo, “OSPA$^{(2)}$: Using the OSPA metric to evaluate multi-target tracking performance,” in Proc. Int. Conf. Control, Automation & Information Sciences, Chang Mai, Thailand, October 2017.
[43] M. Beard, B.-T. Vo and B.-N. Vo, “Performance evaluation for large-scale multi-target tracking algorithms,” in Proc. 21st Int. Conf. Information Fusion, Cambridge, UK, July 2018.
[44] A. Guttman, “R-trees: A dynamic index structure for spatial searching,” in Proc. 1984 ACM SIGMOD Int. Conf. Management of Data, vol. 14, no. 2, pp. 47-57, June 1984.
[45] Y. Manolopoulos, A. Nanopoulos, Y. Theodoridis, R-Trees: Theory and Applications, Springer Science & Business Media, 2010.
[46] A. Zomorodian and H. Edelsbrunner, “Fast software for box intersections,” in Proc. 16th Annual Symp. Computational Geometry, ACM, 2000.
[47] J. Bentley, “Multidimensional binary search trees used for associative searching,” Communications of the ACM, vol. 18, no. 9, pp. 509-517, Sept. 1975.
[48] J. Bento, “A metric for sets of trajectories that is practical and mathematically consistent,” arXiv preprint arXiv:1601.03094, 2016.
[49] A. S. Rahmathullah, Á. F. García-Fernández, and L. Svensson, “A metric on the space of finite sets of trajectories for evaluation of multi-target tracking algorithms,” arXiv preprint arXiv:1605.01177, 2016.
[50] J. Cardoso, “Dependence, correlation and Gaussianity in independent component analysis,” J. Mach. Learn. Res., vol. 4, pp. 1177–1203, 2003.
