Data Fusion of Radar and Image Measurements for Multi-Object Tracking via Kalman Filtering

Du Yong Kim and Moongu Jeon

Moongu Jeon is with the School of Information and Communications, Gwangju Institute of Science and Technology, Gwangju, South Korea.
Abstract— Data fusion is an important issue for object tracking in autonomous systems such as robotics and surveillance. In this paper, we present a multiple-object tracking system whose design is based on multiple Kalman filters dealing with observations from two different kinds of physical sensors. A hardware integration that combines a low-cost radar module with a CCD camera has been developed, and a data fusion method is proposed to process measurements from these modules for multi-object tracking. Because of the limited bearing-angle resolution of the low-cost radar module, CCD measurements are used to compensate for the low angular resolution; conversely, the radar module provides radial distance information that cannot easily be measured by the CCD camera. The proposed data fusion enables the tracker to efficiently utilize the radial measurements of objects from the low-cost radar module and the 2D location measurements of objects in the image space of the CCD camera. To achieve multi-object tracking, we combine the proposed data fusion method with the integrated probabilistic data association (IPDA) technique within the multiple-Kalman-filter framework. The proposed complementary system based on the radar and CCD camera is experimentally evaluated in a multi-person tracking scenario. The experimental results demonstrate that the implemented system with fused observations considerably enhances tracking performance over a single-sensor system.
I. INTRODUCTION
Since the seminal paper of Kalman in 1960 [16], the Kalman filter has been the workhorse of various disciplines due to its optimality in linear Gaussian systems and its ease of implementation for real-world estimation problems. The literature shows that further research has been conducted to relax the strict assumptions of the Kalman filter (e.g., prior knowledge of the system, Gaussian noises) that inhibit its applicability to real-world applications. Variants of the Kalman filter have been rigorously investigated to overcome these practical issues; however, several issues remain for efficient implementation and deployment. In the early days, Kalman filtering was mainly applied to aerospace science [25], navigation systems [29, 33, 36], and mechanical system control [14]. Recent advances in theory and hardware have extended its applicability to more diverse areas such as process control in chemical engineering [15], bio-medical applications [4, 20], mobile robotics [6, 7, 17, 18], image processing [10], and computer vision.
Among these, we focus on a very intuitive application of Kalman filtering, namely object tracking, which has been implemented in many application areas. In this work, given the two specific physical sensors involved, we limit our discussion of related work to vehicle applications [19] and surveillance [34].
Conventional object tracking in aerospace science utilizes large radar systems that return point-wise measurements. Recent electronics technology, however, has made radar systems small and inexpensive, enabling their use in commercial vehicle navigation systems [19, 30]. In those systems, the radar measures relatively short ranges, which can be utilized for detecting pedestrians and adjacent vehicles to avoid collisions. One shortcoming of such a radar system is its low resolution in bearing angle due to the wide beam width. In [30], the authors proposed a combined system of a millimeter-wave radar and a CCD camera with a calibration method via homography. However, since their system simply combines detections obtained from the two sensors using the calibrated homography without considering any noise model, it cannot track objects successfully when noise is present. In addition, spurious measurements from false targets are not considered, so many false tracks can be generated from background reflections.
There have been several studies on single-camera tracking in surveillance that use a special camera configuration to relate the object size to its radial distance [34]. Multiple-camera systems with overlapping or non-overlapping views [21] have also been developed to reconstruct the 3D trajectories of objects. One of the main tasks of multiple-camera tracking in surveillance is the calibration of cameras to reconstruct the 3D position of the object from 2D image planes [3]. Recent works have tried to calculate normal and inclination vectors to reconstruct the 3D position of pedestrians from calibrated multiple cameras; however, the critical restriction of these systems is that they can be used only in a fixed-location set-up. Therefore, the multiple-camera approach is seldom utilized for autonomous agents operating in unconstrained environments.
The main purpose of this paper is to design a robust and cost-effective multi-object tracking system for autonomous agents that can be deployed in unconstrained environments. To this end, adopting the sensor geometry of [30] to combine a visual sensor with radar, we propose to employ Kalman filtering and a data fusion technique underlying a Gaussian mixture form. As mentioned earlier, the main difference between [30] and our work is that the detection procedure of [30] cannot circumvent sensor noise and false positive detections, whereas the proposed multi-object tracking system handles both.
Since a linear Gaussian state-space model (2) is considered in this paper, the Kalman filter is naturally used to guarantee the best linear minimum mean square error performance [1], [16], resulting in a reduction of sensor noise. Additionally, we utilize a data fusion technique embedded in the Kalman filtering framework to reduce the uncertainty originating from different sensor models and different types of sensor noise, which are not taken into account in [30]. In tracking problems with multisensory observations, data fusion is important because different types of uncertainties originate from the individual sensors and thus affect the tracking performance differently. Here, by data fusion we mean a mathematical methodology for combining data collected from different sources, together with their uncertainty models, in order to provide more reliable information; this is discussed in detail in Section III.B. In addition, false positive detections (i.e., non-target-originated measurements) are substantially suppressed by incorporating the IPDA technique [23].
From a theoretical standpoint, we extend the mixture-of-Kalman-filters algorithm to use multisensory measurements for multi-object tracking, implemented on the developed hardware module. The new tracker is called a mixture of fusion Kalman filters because it fuses two independent observations from two physical sensors (i.e., radar and optical sensor) to construct the complementary system and tracks multiple objects simultaneously.
The remainder of this paper is organized as follows. In Section II, the Bayesian filtering problem is introduced and the Kalman filtering equations are given as its solution; the radar and CCD sensors of the developed measurement system are also explained there. In Section III, the proposed algorithm implemented in the system is explained in detail, covering homography learning, the data fusion technique, and multi-object tracking. The proposed system is then evaluated in a real-world implementation of multi-person tracking in Section IV, and Section V concludes the paper.

II. KALMAN FILTERING AND MEASUREMENT SYSTEM
The fundamental formulation of the Kalman filter can be understood within the Bayesian filtering framework. Here, we introduce the basic Bayesian filtering framework and define the problem to be solved. Let $x_t \in \mathbb{R}^{n_x}$ denote the state of an object (e.g., position and velocity) at time $t$, and let the given sensor collect an observation $y_t \in \mathbb{R}^{n_y}$ of the state $x_t$. In sequential Bayesian filtering, we want to construct $p(x_t \mid y_{1:t})$, the probability density of the state conditioned on all observations up to time $t$. In practice, however, it is impossible to obtain the complete pdf analytically because of intractable integrals and arbitrary pdf forms. In 1960, Kalman proposed the optimal estimator using the minimum mean square error criterion under the linear Gaussian assumption [16], which can be understood as the solution of the linear Gaussian Bayesian filtering problem. As a brief review, we introduce the governing equations of the Kalman filter.
$$x_t = F_t x_{t-1} + \nu_t, \qquad (1)$$
$$y_t = C_t x_t + \omega_t, \qquad (2)$$

where $F_t \in \mathbb{R}^{n_x \times n_x}$ is the state transition matrix and $C_t \in \mathbb{R}^{n_y \times n_x}$ is the observation matrix of the linear dynamic system; $\nu_t$ and $\omega_t$ are zero-mean Gaussian system and observation noise vectors, i.e., $\nu_t \sim \mathcal{N}(\nu_t; 0, Q_t)$ and $\omega_t \sim \mathcal{N}(\omega_t; 0, R_t)$, where $Q_t$ and $R_t$ are the respective noise covariances. The initial state $x_0$ is also Gaussian, i.e., $x_0 \sim \mathcal{N}(x_0; \bar{x}_0, P_0)$, and is uncorrelated with the noise sequences; the system and observation noises are also mutually uncorrelated. The notation for the Gaussian pdf of a random process $x_t$ is $\mathcal{N}(x_t; \bar{x}_t, P_t)$, where $\bar{x}_t$ is the mean vector and $P_t$ is its covariance matrix. Then the following equations describe the recursion for the Kalman estimate $\hat{x}_t$ and its covariance $P_t$:

$$\hat{x}_t^- = F_t \hat{x}_{t-1}, \qquad P_t^- = F_t P_{t-1} F_t^T + Q_t, \qquad (3)$$
$$S_t = C_t P_t^- C_t^T + R_t, \quad K_t = P_t^- C_t^T S_t^{-1}, \quad \hat{x}_t = \hat{x}_t^- + K_t \left( y_t - C_t \hat{x}_t^- \right), \quad P_t = \left( I_{n_x} - K_t C_t \right) P_t^-, \qquad (4)$$

where $P_t^-$ is the predicted error covariance, $S_t$ is the residual covariance, $K_t$ is the Kalman gain, and $I_{n_x}$ is the $n_x \times n_x$ identity matrix.
Note that this formulation is for single-object tracking. In Section III, we introduce the information form of the Kalman filter, which is mathematically equivalent to the standard Kalman filter (3)-(4), in order to propose the data fusion scheme and extend it to the multi-object, multi-sensor problem. The system parameters $F_t$, $C_t$, $Q_t$, and $R_t$ are assumed to be known a priori.
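To make the recursion concrete, the following minimal sketch implements the prediction (3) and update (4) steps with NumPy. It illustrates the standard equations only; the function names and interface are ours, not the system's implementation.

```python
import numpy as np

def kf_predict(x, P, F, Q):
    """Time update, Eq. (3): propagate the state mean and covariance."""
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    return x_pred, P_pred

def kf_update(x_pred, P_pred, y, C, R):
    """Observation update, Eq. (4): correct the prediction with measurement y."""
    S = C @ P_pred @ C.T + R                 # residual covariance S_t
    K = P_pred @ C.T @ np.linalg.inv(S)      # Kalman gain K_t
    x = x_pred + K @ (y - C @ x_pred)
    P = (np.eye(len(x)) - K @ C) @ P_pred
    return x, P
```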
In this work, we developed a physical system combining a radar and a CCD camera to obtain observations and detect moving objects more accurately and economically. Fig. 1 shows the radar module of the system, which mainly measures the radial distance of objects. The Sivers IMA RS 3400 is a frequency-modulated continuous wave (FMCW) radar module [28]; its center and sweep frequencies are 24.7 GHz and 1.5 GHz, respectively. In the experiments, we set the start frequency of the sweep to 24 GHz and the stop frequency to 25.5 GHz. Because the Sivers IMA RS 3400 module is operated as a stepped-frequency continuous wave (SFCW) radar, the frequency separation between two contiguous frequency points of the sweep is set to 1.6 MHz.
In the FMCW radar, the sensor output corresponds to the cosine of the phase difference between the echo RF signal and the radiated signal:

$$s = \cos \phi, \qquad (5)$$

where $s$ is the output signal from the sensor and $\phi = 2\pi \cdot 2d / \lambda$ is the phase difference between the echo RF signal and the radiated signal. Here, $2d$ is the round-trip distance to the reflecting object and $\lambda$ is the electrical wavelength of the RF signal. Then $\phi$ can be represented in terms of the frequency $f_{RF}$ of the RF signal as

$$\phi = 2\pi \frac{2d}{c} f_{RF}, \qquad (6)$$

where $\lambda = c / f_{RF}$ and $c$ is the speed of light. From the FMCW property, a small value of $d$ will create a slowly varying detection signal, while an echo from a relatively long distance will return a quickly varying detection signal.

Fig. 2. Overall system and set-up for outdoor experiments. Fig. 3. Sensor geometry.

We linearly increase the frequency from 24 GHz to 25.5 GHz so that the bandwidth $BW$ equals 1500 MHz. The output signal can then be represented in a more specific form as

$$s(n) = \cos\left( \phi_0 + 2\pi \frac{2d}{c} BW \frac{n}{N} \right), \qquad (7)$$

where $n = 0, 1, \ldots, N-1$ is the frequency point index, $N$ is the number of frequency points in the observation sequence, and $\phi_0$ is the phase value at the starting RF frequency of the sweep. We then extract the distance information, i.e., the radial distance of the object, from a simple fast Fourier transform (FFT):

$$S(f) = \mathrm{fft}(s(n)) = \frac{1}{2}\left( \Delta\left( f - \frac{2d \cdot BW}{c} \right) + \Delta\left( f + \frac{2d \cdot BW}{c} \right) \right), \qquad (8)$$

where $f = 0, 1, \ldots, N-1$ denotes the normalized index in the transformed (distance or frequency) domain and $\Delta(\cdot)$ is the Dirac delta function. By excluding the negative-frequency component, we finally detect the range as $d = f c / (2\,BW)$. Note that the range resolution depends only on the signal bandwidth, not on the specific frequency. Based on the specifications, the range resolution of the system is about 0.1 m.
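As a sanity check of (7)-(8), the sketch below synthesizes the stepped-frequency output for a reflector at a known distance and recovers the range from the FFT peak. The bandwidth matches the module settings described above, but $N$ and the target distance are arbitrary illustrative values.

```python
import numpy as np

c = 3e8        # speed of light (m/s)
BW = 1.5e9     # sweep bandwidth: 24.0-25.5 GHz
N = 1024       # number of frequency points in the sweep (illustrative)
d_true = 12.3  # distance to the reflecting object (m), illustrative

# Stepped-frequency output signal, Eq. (7), with phi_0 = 0 for simplicity
n = np.arange(N)
s = np.cos(2 * np.pi * (2 * d_true * BW / c) * n / N)

# FFT-based range detection, Eq. (8): keep the positive-frequency peak
S = np.abs(np.fft.rfft(s))
f_peak = np.argmax(S[1:]) + 1          # skip the DC bin
d_est = f_peak * c / (2 * BW)          # d = f * c / (2 * BW)

print(f"estimated range: {d_est:.2f} m")           # ~12.3 m
print(f"range resolution: {c / (2 * BW):.2f} m")   # ~0.1 m
```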
In our system, the radar module is mainly used to collect range information, while bearing information is not relied upon because the low-cost radar sensor does not have good bearing-angle resolution. Thus, when reflected signals from objects are contiguous within a certain bearing-angle range, the radar produces ambiguous measurements. In contrast, the CCD sensor can resolve close targets in the image observation. To compensate for the radar's low bearing-angle resolution, we utilize the CCD camera to provide the x- and z-positions on the image plane. Here, we use a CCD with a VGA sensor (640 × 480 pixels) supporting 30 fps. On the image plane, we represent the position of an object as a bounding box, and the center of the bounding box is regarded as the point-wise measurement of the object.
To detect an object as a bounding box in the CCD camera image, we first utilize the bearing-angle measurement of the radar. When an object is roughly detected from the radar measurements, a trained blob detector [5] is employed to obtain the bounding box as the representation of the object. From the bounding box detection in the image, we can resolve the ambiguity in the radar's bearing-angle measurement. In this case, however, the two sensors must be calibrated so that they can be used complementarily, because they do not share the same geometric space.
III. PROPOSED ALGORITHM

A. Homography learning
Motivated by [30], we developed a multiple-object tracking system based on the combined observations (radar and CCD camera). We adopt the calibration method proposed in [30] to acquire the homography between the radar observation space and the CCD camera image space, as described in Fig. 3. Denote the radar and camera coordinates by $(x_r, y_r, z_r)$ and $(x_c, y_c, z_c)$, respectively, and the coordinates in the image plane by $(u, v)$. If we fix the radar plane at $y_r = 0$, then the relation between a point on the radar plane and its image is the homogeneous projection

$$s \, (u, v, 1)^T = H \, (x_r, z_r, 1)^T, \qquad (9)$$

where $H = [h_{i,j}]_{i,j=1,2,3}$ is the $3 \times 3$ homography representing the coordinate relation between the radar plane and the CCD image plane, and $s$ is a scale factor, so that

$$u = \frac{h_{11} x_r + h_{12} z_r + h_{13}}{h_{31} x_r + h_{32} z_r + h_{33}}, \qquad v = \frac{h_{21} x_r + h_{22} z_r + h_{23}}{h_{31} x_r + h_{32} z_r + h_{33}}. \qquad (10)$$

Here, we assume an affine homography, which means $h_{31} = h_{32} = 0$, $h_{33} = 1$. Then each point correspondence yields two linear constraints on the homography parameters,

$$a_x^T h = 0, \qquad a_z^T h = 0, \qquad (11)$$

where

$$h = (h_{11}, h_{12}, h_{13}, h_{21}, h_{22}, h_{23}, 0, 0, 1)^T, \quad a_x = (x_r, z_r, 1, 0, 0, 0, -u x_r, -u z_r, -u)^T, \quad a_z = (0, 0, 0, x_r, z_r, 1, -v x_r, -v z_r, -v)^T. \qquad (12)$$

Stacking these constraints over all calibration correspondences yields a homogeneous linear least-squares problem, which can be solved using the Singular Value Decomposition (SVD).
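A minimal sketch of the direct linear transform implied by (11)-(12): correspondences between the radar plane and the image plane are stacked into a constraint matrix, and $h$ is taken as the right singular vector associated with the smallest singular value. The correspondences would come from the calibration procedure of [30]; the function names here are only illustrative.

```python
import numpy as np

def estimate_homography(radar_pts, image_pts):
    """DLT: solve A h = 0 for the 3x3 homography via SVD.

    radar_pts: (M, 2) points (x_r, z_r) on the radar plane
    image_pts: (M, 2) corresponding pixels (u, v)
    """
    A = []
    for (xr, zr), (u, v) in zip(radar_pts, image_pts):
        # One pair of constraint rows per correspondence, cf. Eq. (12)
        A.append([xr, zr, 1, 0, 0, 0, -u * xr, -u * zr, -u])
        A.append([0, 0, 0, xr, zr, 1, -v * xr, -v * zr, -v])
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = Vt[-1].reshape(3, 3)   # right singular vector, smallest singular value
    return H / H[2, 2]         # normalize so that h33 = 1

def map_to_image(H, xr, zr):
    """Project a radar-plane point through H into pixel coordinates, Eq. (9)."""
    p = H @ np.array([xr, zr, 1.0])
    return p[:2] / p[2]
```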
Once we obtain the homography matrix as explained in (9)-(12), we need to consider two kinds of uncertainties: 1) sensor measurement error (i.e., signal noise), and 2) measurement conversion error (from polar to Cartesian coordinates). If these two uncertainties are significant, the estimated homography matrix can be unreliable, which in the worst case leads to tracking failure. Beyond the work of [30], which only learns the homography, the novelty of the proposed algorithm is to consider a mathematical model of the uncertainty (e.g., sensor noise) as given in (13), to reduce it by Kalman filtering, and to fuse the two sources of measurements using the data fusion technique explained in Section III.B. In the experiments, we will show that if the sensor noise variance $R_t^i$ of each sensor is appropriately chosen, reliable tracking performance can be achieved. When the parameter $R_t^i$ is not consistent, one can use a self-tuning technique to tune the parameters online [31], which is not considered in this paper.
After calibrating the two sensors via the homography, the observations collected from the radar and the CCD camera are converted onto the common plane using the learned sensor geometry. The resultant mathematical model is

$$y_t^i = C_t^i x_t + \omega_t^i, \quad i = 1, 2, \qquad (13)$$

where $i = 1$ denotes the radar sensor and $i = 2$ the CCD camera. The observation matrices of the two sensors are identical and select the x- and z-position components of the state; note that $C_t^1$ applies to the radar measurement after its conversion from the polar coordinate system to the Cartesian coordinate plane.
In our system, the y-coordinate, representing the vertical sensor position, is fixed, and the measurement noises are modeled as additive terms representing the aforementioned measurement uncertainties, i.e., 1) signal noise and 2) measurement conversion error. The fixed y-coordinate location is assumed to be high enough to avoid severe ground clutter. If sparse ground clutter is nevertheless present, it can be effectively reduced using the PDA-based clutter rejection algorithm detailed in subsection C. However, if the developed sensor system (i.e., radar + CCD camera) moves in the y-coordinate direction, an additional block is needed to estimate the attitude of the sensor system, and this effect should be taken into account when calculating the homography. This problem, known as attitude estimation, can be solved by employing another Kalman filter that estimates the attitude of the sensor system based on attitude measurements from an inertial measurement unit (IMU); it is well discussed in previous work [32].
B. Data fusion
As mentioned above, the object position in the x-z plane is measured by two sensors. After calibration, we obtain two sets of measurements on the common plane. Accounting for the sensor resolution by modeling the sensor noise and the errors in homography learning, we propose a data fusion scheme based on the Kalman filtering technique.
The main purpose of the data fusion technique is to combine measurements from multiple sensors that monitor common objects while considering the uncertainty of each individual sensor. It is a rule of thumb that collecting as much data as possible gives more reliable performance. However, simply collecting data from different sources without an efficient fusion rule can seriously degrade the quality of an estimate, even with a large amount of data.
Therefore, we employ one of the well-known data fusion methods, called information fusion, which is based on the information form of the Kalman filter. We chose the information fusion method for its simplicity of use and its optimality [26]. In what follows, we first explain centralized information fusion and then decentralized information fusion. In addition, the functional equivalence between the two data fusion schemes is established with a simple proof.
In centralized fusion, the measurements of the two sensors are stacked into the augmented observation model

$$Y_t = \mathbf{C}_t x_t + \boldsymbol{\omega}_t, \qquad (14)$$

where the augmented measurement $Y_t$, measurement matrix $\mathbf{C}_t$, and noise $\boldsymbol{\omega}_t \sim \mathcal{N}(\boldsymbol{\omega}_t; 0, \mathbf{R}_t)$ are specified by

$$\mathbf{C}_t = \left[ (C_t^1)^T, (C_t^2)^T \right]^T, \qquad \mathbf{R}_t = \mathrm{diag}(R_t^1, R_t^2), \qquad Y_t = \left[ (y_t^1)^T, (y_t^2)^T \right]^T, \qquad \boldsymbol{\omega}_t = \left[ (\omega_t^1)^T, (\omega_t^2)^T \right]^T. \qquad (15)$$
Then the information form of the Kalman filter provides the natural centralized data fusion estimate given by (16)-(17).

Time update:
$$\hat{x}_t^- = F_t \hat{x}_{t-1}, \qquad P_t^- = F_t M_{t-1} F_t^T + Q_t. \qquad (16)$$

Observation update:
$$D_t = \mathbf{C}_t^T \mathbf{R}_t^{-1} \mathbf{C}_t, \qquad B_t = \mathbf{C}_t^T \mathbf{R}_t^{-1} Y_t, \qquad M_t = \left[ (P_t^-)^{-1} + D_t \right]^{-1}, \qquad \hat{x}_t = \hat{x}_t^- + M_t \left( B_t - D_t \hat{x}_t^- \right), \qquad (17)$$

where $D_t$ and $B_t$ are the contribution terms of the information matrix and the information state, respectively, and $M_t$ is the updated error covariance.
By parallelizing the contribution terms across sensors, a mathematically equivalent fusion filter is obtained in which each sensor's contribution is computed locally and then summed:

$$D_t = \sum_{i=1}^{2} (C_t^i)^T (R_t^i)^{-1} C_t^i, \qquad (18)$$

$$B_t = \sum_{i=1}^{2} (C_t^i)^T (R_t^i)^{-1} y_t^i. \qquad (19)$$

From this mathematical equivalence, the optimality of the decentralized fusion algorithm (16)-(19) is guaranteed [8].
Another data fusion method, called decentralized fusion, can be described in the weighted-sum observation form as follows:

$$\bar{C}_t = \left( \sum_{i=1}^{2} (R_t^i)^{-1} \right)^{-1} \sum_{i=1}^{2} (R_t^i)^{-1} C_t^i, \qquad \bar{R}_t = \left( \sum_{i=1}^{2} (R_t^i)^{-1} \right)^{-1}, \qquad \bar{Y}_t = \left( \sum_{i=1}^{2} (R_t^i)^{-1} \right)^{-1} \sum_{i=1}^{2} (R_t^i)^{-1} y_t^i. \qquad (20)$$

In the proposed algorithm, the information fusion filter (16)-(19) is used as the basic framework, and the equivalence with the weighted-sum form (20) is established below.
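A sketch of the weighted-sum observation form (20), assuming each sensor reports a tuple (y_i, C_i, R_i); the helper name is ours, not the paper's.

```python
import numpy as np

def weighted_sum_fusion(measurements):
    """Weighted-sum observation form, Eq. (20): collapse per-sensor
    measurements into a single fused tuple (Y_bar, C_bar, R_bar).

    measurements: list of per-sensor tuples (y_i, C_i, R_i).
    """
    R_inv = [np.linalg.inv(R) for _, _, R in measurements]
    R_bar = np.linalg.inv(sum(R_inv))
    C_bar = R_bar @ sum(Ri @ C for Ri, (_, C, _) in zip(R_inv, measurements))
    Y_bar = R_bar @ sum(Ri @ y for Ri, (y, _, _) in zip(R_inv, measurements))
    return Y_bar, C_bar, R_bar
```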
Note that if the observation matrices of the sensors are identical, the centralized fusion scheme and the decentralized fusion scheme are functionally equivalent, as shown in the following simple proof.

Proof. Equations (18) and (19) represent the contribution-term calculation in information Kalman filtering. This implies that the augmented measurement form (15) can be transformed into a summation of contributions from the individual sensors. Thus, if (18) and (19) evaluated with the weighted-sum quantities (20) are functionally equivalent to (18) and (19) evaluated with the augmented quantities (15), then centralized and decentralized fusion are equivalent. This reasoning is captured by the following equalities, which hold when the observation matrices are identical, i.e., $C_t^1 = C_t^2$:

$$\bar{C}_t^T \bar{R}_t^{-1} \bar{C}_t = \sum_{i=1}^{2} (C_t^i)^T (R_t^i)^{-1} C_t^i, \qquad \bar{C}_t^T \bar{R}_t^{-1} \bar{Y}_t = \sum_{i=1}^{2} (C_t^i)^T (R_t^i)^{-1} y_t^i.$$

Indeed, with $C_t^1 = C_t^2 = C_t$, (20) gives $\bar{C}_t = C_t$ and $\bar{R}_t^{-1} = \sum_{i=1}^{2} (R_t^i)^{-1}$, from which both equalities follow directly. ▄
In summary, the data fusion procedure in this paper can be described in the following steps (a code sketch of steps 5 and 6 follows the list):
1) Collect the raw measurements from the two physical sensors (radar and CCD camera).
2) Detect object candidates from the raw measurements (range detection for the radar, blob detection for the CCD camera).
3) Learn the homography between the radar and CCD camera coordinates.
4) Convert the measurements onto the common observation plane using the homography.
5) Compute the contribution terms of each sensor using (18)-(19).
6) Calculate the estimate and covariance using (16)-(17) with the contribution terms.
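The following sketch condenses steps 5 and 6: the time update (16), the per-sensor contribution sums (18)-(19), and the observation update (17). Sensor models are passed in generically; the interface is illustrative, not the system's code.

```python
import numpy as np

def info_fusion_step(x, M, F, Q, measurements):
    """One cycle of the decentralized information fusion filter.

    measurements: list of per-sensor tuples (y_i, C_i, R_i).
    Returns the fused estimate x and its covariance M.
    """
    # Time update, Eq. (16)
    x_pred = F @ x
    P_pred = F @ M @ F.T + Q

    # Per-sensor contribution terms, Eqs. (18)-(19)
    D = sum(C.T @ np.linalg.inv(R) @ C for _, C, R in measurements)
    B = sum(C.T @ np.linalg.inv(R) @ y for y, C, R in measurements)

    # Observation update, Eq. (17)
    M_new = np.linalg.inv(np.linalg.inv(P_pred) + D)
    x_new = x_pred + M_new @ (B - D @ x_pred)
    return x_new, M_new
```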
C. Multi-object tracking

The main task of the proposed tracking system is to detect the positions of multiple objects and track their trajectories with the multi-sensory measurements. Intuitively, one might think that the remaining task is simply to assign an information fusion filter (16)-(20) to each individual track. However, it is not that straightforward in practice, because we have to deal with false positive detections (i.e., observations that do not originate from any object) and with the unknown association between measurements and tracks.
In this work, we adopt the multi-object integrated probabilistic data association (MIPDA) technique to resolve these two problems in multiple-object tracking [22]. It is the multiple-object extension of IPDA [23], which is known as one of the most efficient data association algorithms in cluttered environments, and the automatic management of tracks (object appearance and disappearance) is handled through the object existence probability.
To make the paper self-contained, we summarize the MIPDA algorithm for our system. First, we introduce the notation for track management and data association for each track $\tau$ as follows.

$\chi_t^\tau$ : the discrete event of object existence for track $\tau$ at time $t$, where $\chi_t^\tau = 1$ denotes target existence and $\chi_t^\tau = 2$ denotes nonexistence.
$\theta_{t,j}^\tau$ : the event that the $j$-th gated observation is object-originated and all others are clutter-originated at time $t$.
$\theta_{t,0}^\tau$ : the event that the object is not detected, i.e., all gated observations are clutter-originated at time $t$.
$\hat{Y}^{\tau,t} = \{\hat{Y}_1^\tau, \hat{Y}_2^\tau, \ldots, \hat{Y}_t^\tau\}$ : the set of composite observations inside the gate of track $\tau$ up to time $t$, where

$$\hat{Y}_t^\tau = \left\{ \left( \hat{y}_t^{\tau,1,j}, \hat{y}_t^{\tau,2,j} \right) \right\}_{j=1,\ldots,m_t^\tau}, \qquad \left( \hat{y}_t^{\tau,i} - C_t^i \hat{x}_t^- \right)^T (S_t^i)^{-1} \left( \hat{y}_t^{\tau,i} - C_t^i \hat{x}_t^- \right) < \gamma^{\tau,i}, \quad i = 1, 2. \qquad (21)$$

In (21), $S_t^i$ is the residual covariance defined in (4); $\gamma^{\tau,i}$ is the gating threshold for track $\tau$; and $m_t^\tau$ is the number of composite gated measurement sets for track $\tau$, each containing a pair of radar and CCD camera measurements.
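The gate test in (21) is a per-sensor Mahalanobis-distance check; a minimal sketch (the threshold gamma is a design parameter, cf. [1]):

```python
import numpy as np

def in_gate(y, y_pred, S, gamma):
    """Eq. (21): accept measurement y for a track if its squared
    Mahalanobis distance to the predicted measurement is below gamma."""
    r = y - y_pred
    return float(r @ np.linalg.inv(S) @ r) < gamma
```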
Then the joint posterior density $p(x_t, \chi_t^\tau \mid \hat{Y}^{\tau,t})$ can be decomposed into two conditional probabilities by the product rule as

$$p(x_t, \chi_t^\tau \mid \hat{Y}^{\tau,t}) = P(\chi_t^\tau \mid \hat{Y}^{\tau,t}) \, p(x_t \mid \chi_t^\tau, \hat{Y}^{\tau,t}), \qquad (22)$$

where $P(\chi_t^\tau \mid \hat{Y}^{\tau,t})$ is the object existence probability, calculated via prediction with a Markov chain transition followed by a measurement update:

Prediction:
$$P(\chi_t^\tau \mid \hat{Y}^{\tau,t-1}) = \gamma_{1,1} P(\chi_{t-1}^\tau \mid \hat{Y}^{\tau,t-1}) + \gamma_{2,1} \left( 1 - P(\chi_{t-1}^\tau \mid \hat{Y}^{\tau,t-1}) \right). \qquad (23)$$

Update:
$$P(\chi_t^\tau \mid \hat{Y}^{\tau,t}) = \frac{\left( 1 - \delta_t^\tau \right) P(\chi_t^\tau \mid \hat{Y}^{\tau,t-1})}{1 - \delta_t^\tau P(\chi_t^\tau \mid \hat{Y}^{\tau,t-1})}, \qquad (24)$$

where $\gamma_{1,1} = P(\chi_t^\tau = 1 \mid \chi_{t-1}^\tau = 1)$ and $\gamma_{2,1} = P(\chi_t^\tau = 1 \mid \chi_{t-1}^\tau = 2)$ are the predefined transition probabilities between the binary existence states, and $\delta_t^\tau$ is the data association factor given by

$$\delta_t^\tau = P_D P_G \left( 1 - \sum_{j=1}^{m_t^\tau} \frac{\Lambda_{t,j}^\tau}{\rho_{t,j}^\tau} \right), \qquad (25)$$

where $P_D$ and $P_G$ are the detection probability and the gating probability, $\rho_{t,j}^\tau$ is the given clutter density, and

$$\Lambda_{t,j}^\tau = p\left( \hat{Y}_{t,j}^\tau \mid \chi_t^\tau, \hat{Y}^{\tau,t-1} \right). \qquad (26)$$

We approximate the predicted measurement density (26) by the product of the Kalman filter innovation densities of the two sensors,

$$\Lambda_{t,j}^\tau = \mathcal{N}\left( y_{t,j}^1; \, \hat{y}_{t,j}^1 = C_t^1 \hat{x}_t^-, \, S_t^1 \right) \mathcal{N}\left( y_{t,j}^2; \, \hat{y}_{t,j}^2 = C_t^2 \hat{x}_t^-, \, S_t^2 \right). \qquad (27)$$
The object existence probability provides the track management capability that is not inherently available in the classical PDA framework. In the implementation, a track is created using a known initialization algorithm, namely two-step initialization (TSI) [2]. Once a track is created, the object existence probability is calculated recursively as described in (23)-(24), and track deletion is decided when this value falls below a predefined threshold.
Then IPDA approximates the object state $\hat{x}_t^\tau$ and its covariance $P_t^\tau$ by the weighted-sum representation

$$\hat{x}_t^\tau = \sum_{j=0}^{m_t^\tau} \beta_{t,j}^\tau \hat{x}_{t,j}^\tau, \qquad P_t^\tau = \sum_{j=0}^{m_t^\tau} \beta_{t,j}^\tau \left[ P_{t,j}^\tau + \left( \hat{x}_{t,j}^\tau - \hat{x}_t^\tau \right) \left( \hat{x}_{t,j}^\tau - \hat{x}_t^\tau \right)^T \right], \qquad (29)$$

where $\hat{x}_{t,j}^\tau$ and $P_{t,j}^\tau$ are obtained using the $j$-th gated measurement set via the data fusion technique given in the previous section, and $\beta_{t,j}^\tau$ is the association weight of the gated measurement set defined by (30). By utilizing the object existence variable $\chi_t^\tau$, we integrate the track management process into the data association:

$$\beta_{t,j}^\tau = P\left( \chi_t^\tau, \theta_{t,j}^\tau \mid \hat{Y}^{\tau,t} \right) = \begin{cases} \dfrac{1 - P_D P_G}{1 - \delta_t^\tau}, & j = 0, \\[3mm] \dfrac{P_D P_G \, \Lambda_{t,j}^\tau / \rho_{t,j}^\tau}{1 - \delta_t^\tau}, & j \neq 0. \end{cases} \qquad (30)$$
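A compact sketch of the existence recursion (23)-(25), the association weights (30), and the moment-matched mixture estimate (29). The transition probabilities are illustrative placeholders; the PD and PG defaults follow the values used later in Section IV; and the likelihoods and clutter densities are assumed precomputed via (27).

```python
import numpy as np

def ipda_existence_update(p_exist, lam, rho, PD=0.9, PG=0.865,
                          g11=0.98, g21=0.02):
    """Track-quality recursion of IPDA, Eqs. (23)-(25) and (30).

    p_exist : prior existence probability P(chi | Y^{t-1})
    lam     : likelihoods Lambda_{t,j} of the gated measurements, Eq. (27)
    rho     : clutter densities rho_{t,j} at the gated measurements
    """
    lam = np.asarray(lam, dtype=float)
    rho = np.asarray(rho, dtype=float)

    # Markov-chain prediction of existence, Eq. (23)
    p_pred = g11 * p_exist + g21 * (1.0 - p_exist)

    # Data-association factor, Eq. (25); an empty gate gives delta = PD*PG
    delta = PD * PG * (1.0 - np.sum(lam / rho))

    # Existence update, Eq. (24)
    p_post = (1.0 - delta) * p_pred / (1.0 - delta * p_pred)

    # Association weights, Eq. (30); index 0 is the "no detection" event
    w = np.concatenate(([1.0 - PD * PG], PD * PG * lam / rho)) / (1.0 - delta)
    return p_post, w / w.sum()   # renormalize to guard against rounding

def mixture_estimate(w, states, covs):
    """Moment-matched Gaussian mixture estimate, Eq. (29)."""
    x = sum(wj * xj for wj, xj in zip(w, states))
    P = sum(wj * (Pj + np.outer(xj - x, xj - x))
            for wj, xj, Pj in zip(w, states, covs))
    return x, P
```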
So far we have discussed the data fusion and multiple-object tracking framework of the proposed algorithm. Fig. 4 describes the whole process of the proposed work. Specifically, the flow of the overall system is as follows:
1) Using a set of reference data (radar and camera), calculate the homography matrix in (9) with the least-squares solution of (12).
2) Using the homography, map the radar and camera measurements onto the common observation plane.
3) The multi-object tracker block initializes or terminates tracks and excludes clutter based on the object existence probability.
4) For each track, data fusion and track estimation are performed via the information Kalman prediction and update (16)-(17).
5) A Gaussian mixture of Kalman estimates is finally formed to represent the multi-object state estimate.
The proposed algorithm is based on an individual IPDA filter for each track, combined with the specified track management block and the data fusion technique embedded in the information Kalman filtering. Compared to other multi-object trackers such as the JPDA filter [2] or the PHD filter [25], IPDA reduces computation time thanks to the track quality measure, i.e., the object existence probability. In the experimental results, we include a computation time comparison with the PHD filter, which is regarded as a state-of-the-art multi-object tracker.
IV. EXPERIMENTAL RESULTS

Our proposed system is applied to tracking three objects that appear and disappear independently. The ground truth trajectories are displayed as red lines in Fig. 6. Our experiments are conducted in a controlled environment where the objects move according to a predetermined scenario; the ground truth is therefore obtained from the simulated positions based on the exactly known scenario. The scenario, however, is not known to the tracker.
Observations are collected from the proposed system as described in Section III. Note that we observe and estimate the positions of objects in 2D space, assuming that the y-coordinate is fixed. In the experiments, we focus on two performance comparisons: one compares the proposed system with the radar-only system, and the other compares the tracking algorithms embedded in the proposed system. Before discussing the experimental results, we describe the mathematical model of the dynamic system and the related parameters.
$$F_t = \mathrm{diag}(F_1, I_2, F_1), \qquad F_1 = \begin{bmatrix} 1 & T \\ 0 & 1 \end{bmatrix}, \qquad (31)$$

$$Q_t = \mathrm{diag}(Q_1, 0_2, Q_1), \qquad Q_1 = q^2 \begin{bmatrix} T^3/3 & T^2/2 \\ T^2/2 & T \end{bmatrix}, \qquad (32)$$

where $T$ is the sampling interval and the system process noise parameter $q$ is set to 0.1. $I_2$ and $0_2$ are the $2 \times 2$ identity and zero matrices, respectively. $F_t$ and $Q_t$ are the block matrices of the dynamic system and the noise covariance, respectively. The model (31)-(32) is conventionally used in the tracking literature and is called the Ground Moving Target Indicator (GMTI) model [13, 29]. Note that the GMTI model describes the position and velocity of a moving object for each coordinate. Here, the velocity state is not measured but is driven by the noise terms given in (32); because the sensors do not measure velocity, the velocity components are estimated only through the dynamic model.
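The block matrices (31)-(32) can be assembled directly; a sketch with the paper's process noise parameter q = 0.1 (the sampling interval T here is a placeholder value):

```python
import numpy as np
from scipy.linalg import block_diag

def gmti_model(T=0.1, q=0.1):
    """Assemble F_t and Q_t of Eqs. (31)-(32) for the GMTI model."""
    F1 = np.array([[1.0, T],
                   [0.0, 1.0]])
    Q1 = q**2 * np.array([[T**3 / 3, T**2 / 2],
                          [T**2 / 2, T]])
    F = block_diag(F1, np.eye(2), F1)        # x-, y-, z-coordinate blocks
    Q = block_diag(Q1, np.zeros((2, 2)), Q1)  # fixed y-coordinate: zero noise
    return F, Q
```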
Fig. 5. Observations from the two sensors and their display on the common plane, including clutter.

Fig. 6. Trajectory comparison (red: ground truth, blue: estimated trajectory, black star: observations).
Radar observations are originally in polar coordinates; the measurement matrix of the radar sensor, $C_t^1$, is therefore obtained after converting the measurements from polar to Cartesian coordinates. In the experiments, the observation noise parameters for the two sensors are set to $R_t^1 = \mathrm{diag}(5, 0.1)$ and $R_t^2 = \mathrm{diag}(0.5, 0.5)$, and $P_D$ and $P_G$ are set to 0.9 and 0.865, respectively.
For the proposed multi-object tracker, we assume that the dynamic system and the noise model are given a priori, as mentioned before. The system noise is assumed to be Gaussian and encodes information about the maximum speed of the object. The measurement noise covariances are obtained by experiments, and the gating threshold is set to the value suggested in the literature [1]. The track deletion threshold is set empirically.
Once the parameters are decided, reliable performance is ensured, because the sensors are stable and the maximum velocity (speed) of the object is predictable given the sampling interval (scanning time) of the system. However, the performance of the tracker can be degraded when severe clutter with a non-uniform distribution is involved. In such situations, an adaptive clutter intensity estimation technique can be employed.
Fig. 5 illustrates the range signals from the radar, the detections from the CCD, and the combined sets of these two types of signals mapped onto the common plane. The detections displayed in Fig. 5 can be considered the result of [30], which motivated our complementary observation system. Note, however, that they are not sufficient for reliable tracking performance, because the detection results contain uncertainty from sensor noise and clutter.
As can be seen in Fig. 5 a), the range measurements of the radar are corrupted by sensor noise and clutter. The clutter originates from multi-path effects from fences and the ground. The detected CCD measurements are displayed in Fig. 5 b) with the bounding box representation and its centroid. In this case, it is not possible to perceive exactly how far away the objects are if the CCD is not calibrated. Fig. 5 c) displays the combined measurements using the estimated homography. From these inputs, the designed filter rejects clutter and associates detections using the IPDA technique, as suggested in Section III.C. The blob detector used in our system may produce false positive detections when the feature values do not have sufficient discriminating power; however, a certain amount of false positive detections can be rejected by the IPDA technique.
For the quantitative comparison, we use a metric specific to multi-object tracking, the Optimal Sub-Pattern Assignment (OSPA) distance [27], instead of the mean square error (MSE). The OSPA distance is an overall performance index that simultaneously captures the localization error (i.e., position error) and the cardinality error (object number error). The reason for using the OSPA distance is that the MSE is not well defined when the number of objects is time-varying or incorrectly estimated.
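For reference, a sketch of the OSPA distance of [27] for 2D point estimates: an optimal assignment between the estimated and true sets plus a cardinality penalty. The cutoff c and order p are the usual OSPA parameters; the default values are illustrative, not those used in our evaluation.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def ospa(X, Y, c=3.0, p=2):
    """OSPA distance between point sets X (m, d) and Y (n, d), per [27]."""
    m, n = len(X), len(Y)
    if m == 0 and n == 0:
        return 0.0
    if m == 0 or n == 0:
        return float(c)                          # pure cardinality error
    X, Y = np.asarray(X, float), np.asarray(Y, float)
    if m > n:                                    # OSPA convention: m <= n
        X, Y, m, n = Y, X, n, m
    # Pairwise Euclidean distances, cut off at c
    D = np.minimum(np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=2), c)
    row, col = linear_sum_assignment(D ** p)     # optimal sub-pattern assignment
    cost = (D[row, col] ** p).sum() + (c ** p) * (n - m)  # + cardinality penalty
    return (cost / n) ** (1.0 / p)
```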
To clearly show the advantages of the proposed system, we compare its performance with that of the radar-only system. As can be seen in Fig. 6, the comparison of estimated trajectories shows that the proposed system outperforms the radar-only system thanks to the complementary observations from the radar and the CCD camera.
We display the comparison of overall performance with respect to the OSPA distance in Fig. 7. In Fig. 7 a), we see that the proposed system is superior to the radar-only system. Note that the OSPA distance is an important performance metric for multiple-object tracking because it simultaneously accounts for the localization error and the cardinality error; thus, track losses are reflected in the error metric.
The resultant OSPA distance of the proposed system is significantly lower (more accurate) and more stable than that of the radar-only system. For clarity, the localization distance and the cardinality distance are also displayed in Fig. 7 b) and Fig. 7 c), respectively; they likewise confirm the advantage of the proposed system.
The proposed complementary observation with MIPDA gives an accurate estimate of the number of objects due to correct data association and clutter rejection. Thus, the correct track management and the efficient clutter rejection significantly enhance the tracking performance of the proposed system. It also clearly outperforms a vision-only system, because a vision-only system cannot inherently measure the radial distance.
Fig. 7. Comparison between the radar-only system and the proposed system using the OSPA distance, localization distance, and cardinality distance.
Fig. 8. Comparison between the GMPHD filter and the proposed system using the OSPA distance, localization distance, and cardinality distance.
Finally, we compare performance with a recently proposed multi-object tracker based on the Gaussian Mixture PHD (GMPHD) filter [25]. GMPHD is known as a fully Bayesian multi-object tracker that does not require data association; it regards the multi-object state as a single meta-state over the whole state space. The experimental results for the same scenario with respect to the OSPA distance are illustrated in Fig. 8 a). The results confirm that the proposed system is superior to GMPHD not only in accuracy but also in computation time per scan (GMPHD: 0.1 s; proposed MIPDA: 0.002 s). This result stems from the gating technique used in the MIPDA algorithm. GMPHD can also be improved by using a gating technique, as suggested in [35]; however, track identity information is not inherently provided by the GMPHD filter. The measured computation time can be regarded as the execution time of the global system, and the elapsed time shows that our system runs in real time. Note that our system is implemented in C++.
Compared to the results of the radar-only system in Fig. 7, GMPHD achieves little performance improvement because its cardinality estimate is unreliable, as can be seen in Fig. 8 b). The unreliable cardinality information degrades the overall performance, as reflected in the OSPA distance.
V. CONCLUSION
In this paper, we developed an integrated hardware system combining radar and CCD sensor measurements for multiple-object tracking. The complementary observation system is designed using homography and data fusion techniques, whereby the radar's low bearing-angle resolution is compensated by the CCD observations. The multiple-object states are then accurately estimated via a Gaussian mixture of IPDA filters, called multi-object IPDA (MIPDA). The performance of the proposed system is evaluated in real-world experiments, and the results verify that the system is viable for real-world implementation.
REFERENCES
[1] Bar-Shalom, Y., and Li, X. R., Multitarget-Multisensor Tracking: Principles and Techniques, YBS Publishing, Storrs, CT, 1995.
[2] Bar-Shalom, Y., and Watson, G. A., "Automatic track formation in clutter with a recursive algorithm," in Proceedings of the 28th IEEE Conference on Decision and Control, 1989.
[3] Bucher, T., Curio, C., Edelbrunner, J., Igel, C., Kastrup, D., Leefken, I., Lorenz, G., Steinhage, A., and Seelen, W. V., "Image processing and behavior planning for intelligent vehicles," IEEE Transactions on Industrial Electronics, vol. 50, no. 1, pp. 62-75, 2003.
[4] Butala, M. D., Frazin, R. A., Chen, Y., and Kamalabadi, F., "Tomographic imaging of dynamic objects with the ensemble Kalman filter," IEEE Transactions on Image Processing, vol. 18, no. 7, pp. 1573-1587, July 2009.
[5] Chang, F., Chen, C.-J., and Lu, C.-J., "A linear-time component-labeling algorithm using contour tracing technique," Computer Vision and Image Understanding, vol. 93, no. 2, pp. 206-220, February 2004.
[6] Chen, S. Y., "Kalman filter for robot vision: A survey," IEEE Transactions on Industrial Electronics, vol. 59, no. 11, 2012.
[7] Cho, H., and Kim, S. W., "Mobile robot localization using biased chirp spread spectrum ranging," IEEE Transactions on Industrial Electronics.
[8] Chong, C., Chang, K., and Mori, S., "Distributed tracking in distributed sensor networks," in Proceedings of the American Control Conference, 1986.
[9] De Laet, T., Bruyninckx, H., and De Schutter, J., "Shape-based online multitarget tracking and detection for targets causing multiple measurements: Variational Bayesian clustering and lossless data association," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, no. 12, pp. 2477-2491, 2011.
[10] Dedrick, E., and Lau, D., "A Kalman-filtering approach to high dynamic range imaging for measurement applications," IEEE Transactions on Image Processing, vol. 21, no. 2, pp. 527-536, 2012.
[11] Deng, Z., Zhang, P., Qi, W., Liu, J., and Gao, Y., "Sequential covariance intersection fusion Kalman filter," Information Sciences, vol. 189, pp. 293-309, 2012.
[12] Fleuret, F., Berclaz, J., Lengagne, R., and Fua, P., "Multi-camera people tracking with a probabilistic occupancy map," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 30, no. 2, pp. 267-282, 2008.
[13] Gelb, A., Ed., Applied Optimal Estimation, The MIT Press, 1974.
[14] Goodwin, G. C., Haimovich, H., Quevedo, D. E., and Welsh, J. S., "A moving horizon approach to networked control system design," IEEE Transactions on Automatic Control, vol. 49, no. 9, pp. 1427-1455, September 2004.
[15] Haseltine, E. L., and Rawlings, J. B., "Critical evaluation of extended Kalman filtering and moving-horizon estimation," Industrial & Engineering Chemistry Research, vol. 44, no. 8, pp. 2451-2460, 2005.
[16] Kalman, R. E., "A new approach to linear filtering and prediction problems," ASME Journal of Basic Engineering, vol. 82, pp. 35-45, 1960.
[17] Kim, C., Sakthivel, R., and Chung, W. K., "Unscented FastSLAM: A robust and efficient solution to the SLAM problem," IEEE Transactions on Robotics, vol. 24, no. 4, pp. 808-820, 2008.
[18] Lee, J. M., Son, K., Lee, M. C., Choi, J. W., Han, S. H., and Lee, M. H., "Localization of a mobile robot using the image of a moving object," IEEE Transactions on Industrial Electronics, vol. 50, no. 3, pp. 612-619, June 2003.
[19] Lee, M.-S., and Kim, Y.-H., "An efficient multitarget tracking algorithm for car applications," IEEE Transactions on Industrial Electronics.
[20] Lee, S. J., Motai, Y., and Murphy, M., "Respiratory motion estimation with hybrid implementation of extended Kalman filter," IEEE Transactions on Industrial Electronics, vol. 59, no. 11, 2012.
[21] Li, Y., Wu, B., and Nevatia, R., "Human detection by searching in 3D space using camera and scene knowledge," in Proceedings of the International Conference on Pattern Recognition, 2008.
[22] Musicki, D., and Evans, R., "Joint integrated probabilistic data association - JIPDA," in Proceedings of the 5th International Conference on Information Fusion, 2002.
[23] Musicki, D., Evans, R., and Stankovic, S., "Integrated probabilistic data association," IEEE Transactions on Automatic Control, vol. 39, no. 6, pp. 1237-1241, 1994.
[24] Musicki, D., and La Scala, B., "Multi-target tracking in clutter without measurement assignment," in Proceedings of the 43rd IEEE Conference on Decision and Control, 2004.
[25] Pasha, S. A., Vo, B.-N., Tuan, H. D., and Ma, W.-K., "A Gaussian mixture PHD filter for jump Markov system models," IEEE Transactions on Aerospace and Electronic Systems, vol. 45, no. 3, pp. 919-936, July 2009.
[26] Rao, B. S. Y., Durrant-Whyte, H. F., and Sheen, J. A., "A fully decentralized multi-sensor system for tracking and surveillance," International Journal of Robotics Research, vol. 12, no. 1, pp. 20-44, February 1993.
[27] Schuhmacher, D., Vo, B.-T., and Vo, B.-N., "A consistent metric for performance evaluation of multi-object filters," IEEE Transactions on Signal Processing, vol. 56, no. 8, pp. 3447-3457, 2008.
[28] Sivers IMA, RS 3400 radar sensor module, product documentation.
[29] Song, H., Shin, V., and Jeon, M., "Mobile node localization using fusion prediction-based interacting multiple model in cricket sensor network," IEEE Transactions on Industrial Electronics, vol. 59, no. 11, 2012.
[30] Sugimoto, S., Tateda, H., Takahashi, H., and Okutomi, M., "Obstacle detection using millimeter-wave radar and its visualization on image sequence," in Proceedings of the 17th International Conference on Pattern Recognition, UK, 2004.
[31] Sun, S., "Optimal and self-tuning information fusion Kalman multi-step predictor," IEEE Transactions on Aerospace and Electronic Systems.
[32] Won, S.-H. P., Melek, W. W., and Golnaraghi, F., "A Kalman/particle filter-based position and orientation estimation method using a position sensor/inertial measurement unit (IMU) hybrid system," IEEE Transactions on Industrial Electronics, vol. 57, no. 5, 2010.
[33] Xe, X., Le, Y., and Xiao, W., "MEMS IMU and two-antenna GPS integration navigation system using internal adaptive Kalman filter," IEEE Aerospace and Electronic Systems Magazine, vol. 28, no. 10, pp. 22-28, 2013.
[34] Xu, X., and Li, B., "Adaptive Rao-Blackwellized particle filter and its evaluation for tracking in surveillance," IEEE Transactions on Image Processing, vol. 16, no. 3, pp. 838-849, 2007.
[35] Zhang, H., Jing, Z., and Hu, S., "Gaussian mixture CPHD filter with gating technique," Signal Processing, vol. 89, no. 8, pp. 1521-1530, 2009.
[36] Zhou, Z., Li, Y., Liu, J., and Li, G., "Equality constrained robust measurement fusion for adaptive Kalman-filter-based heterogeneous multi-sensor navigation," IEEE Transactions on Aerospace and Electronic Systems, vol. 49, no. 4, 2013.