

2018-01-1608 Published 07 Aug 2018

Camera-Radar Data Fusion for Target Detection via Kalman Filter and Bayesian Estimation

Zhexiang Yu, Jie Bai, Sihan Chen, Libo Huang, and Xin Bi, Tongji Univ.

Citation: Yu, Z., Bai, J., Chen, S., Huang, L. et al., “Camera-Radar Data Fusion for Target Detection via Kalman Filter and Bayesian
Estimation,” SAE Technical Paper 2018-01-1608, 2018, doi:10.4271/2018-01-1608.

Abstract

Target detection is essential to the advanced driving assistance system (ADAS) and automatic driving, and the data fusion of millimeter wave radar and camera can provide more accurate and complete target information and enhance environmental perception performance. In this paper, a method of vehicle and pedestrian detection based on the data fusion of millimeter wave radar and camera is proposed to improve the target distance estimation accuracy. The first step is target data acquisition. A deep learning model called Single Shot MultiBox Detector (SSD) is utilized for target detection in consecutive video frames captured by the camera and is further optimized for high real-time performance and accuracy. Secondly, the coordinate systems of camera and radar are unified by a coordinate transformation matrix. Then, parallel Kalman filters are used to track the targets detected by radar and camera respectively; since the target data provided by the camera and the radar are different, different Kalman filters are designed for the tracking process. Finally, the target data are fused based on Bayesian estimation. Several simulation experiments were first designed to test and optimize the proposed method, and real data were then used for further verification. The experiments show that the measurement noise can be considerably reduced by the Kalman filter and that the fusion algorithm improves the estimation accuracy.

Introduction

To develop advanced driving assistance systems (ADAS) and automatic driving, real-time and robust on-road target detection is one of the key modules of vehicle environmental perception. Because on-road driving circumstances are complex and unpredictable, vehicles need to be equipped with different types of sensors to handle environmental perception and recognition well. Multi-sensor fusion can take advantage of different types of sensors, such as camera, radar, and laser radar, to acquire accurate and complete target information.

Although millimeter wave radar can provide relatively high range and velocity resolution in bad weather conditions, it suffers from a limited field of view (FOV), low lateral resolution and the inability to recognize target type. On the contrary, the camera can provide the target type but only low accuracy in obstacle range estimation. The fusion of camera and radar can make up for the defects of the two sensors.

Due to the poor real-time performance of vision-based target detection algorithms, some camera-radar fusion algorithms [1, 2, 3, 4] work as follows: Firstly, through coordinate transformation, the radar targets are used to determine regions of interest (ROIs) on the image. Then, the target detection algorithm is applied only in these ROIs to judge the existence of targets, reducing the run time. Finally, the detected targets' boundaries are refined.

However, the performance of this kind of algorithm is limited by the capability of the millimeter wave radar. Once a target is missed by the radar, it cannot be detected by the following detection algorithm.

This paper proposes a new fusion method of camera and radar data for target detection via Kalman filter and Bayesian estimation. Firstly, the SSD algorithm is used to detect targets in images captured by the camera, and the classification and boundary of the targets are obtained. Secondly, the coordinate systems of camera and radar are unified by a coordinate transformation matrix. Then, parallel Kalman filters are used to track the targets detected by radar and camera respectively and to reduce the noise. Finally, a fusion method based on Bayesian estimation is used to fuse the tracking results of the two sensors.

The paper is organized as follows: Section II introduces the vision-based target detection algorithm, SSD. Section III introduces the coordinate transformation matrix. Section IV introduces the fusion algorithm. Sections V and VI present the results of the simulation experiments and the real experiments respectively. Finally, the conclusions are presented in Section VII.


Vision-Based Target Detection

Traditional object detection algorithms based on machine learning utilize handcrafted features of the target and a well-trained classifier. However, these methods usually have poor real-time performance.

With the development of deep learning, some object detection algorithms based on convolutional neural networks (CNN), such as You Only Look Once (YOLO) or SSD, achieve competitive accuracy and real-time performance. In this paper, SSD [5] is implemented to detect targets at a frame rate of 11 fps on a TX2. The final detection results are the classification and boundary of valid targets.

Brief Introduction of SSD

The SSD approach is based on a feed-forward convolutional network that produces a set of bounding boxes of different aspect ratios and scales, together with scores for the presence of object class instances in those boxes.

Model Structure

The model structure of SSD is shown in Figure 1.

Base network. A truncated standard network for high quality image classification is used as the early layers of the SSD model; it is called the base network. In this paper, the VGG-16 network is used as the base network.

Multi-scale feature maps for detection. Some extra convolutional feature layers are added to the end of the base network to perform detection. These layers decrease in size progressively to detect objects at multiple scales.

Convolutional predictors for detection. In each feature layer, a set of convolutional filters is used to produce a set of detection predictions. For an m × n feature layer with p channels, a 3 × 3 × p convolutional kernel is applied at each of the m × n locations to produce either a score for a category or a shape offset relative to the default box coordinates.

Default boxes and aspect ratios. At each feature map cell, a set of default boxes of different aspect ratios is generated at each location in several feature maps with different scales. For each of the k default boxes at a given cell of an m × n feature layer, the c class scores and the 4 offsets relative to the original default box shape are computed. This results in a total of (c + 4) × k filters that are applied around each location in the feature map, yielding (c + 4)kmn outputs.

Train

Default Boxes Generation. Suppose m feature maps are used for prediction. The scale of the default boxes for each feature map is computed as:

s_k = s_{\min} + \frac{s_{\max} - s_{\min}}{m - 1}(k - 1), \quad k \in [1, m]  (1)

where s_{\min} is 0.2 and s_{\max} is 0.9, meaning the lowest layer has a scale of 0.2 and the highest layer has a scale of 0.9. The different aspect ratios of the default boxes are denoted as a_r \in \{1, 2, 3, 1/2, 1/3\}. The width and height of each default box are computed as w_k^a = s_k \sqrt{a_r} and h_k^a = s_k / \sqrt{a_r} respectively. For the aspect ratio of 1, an extra default box whose scale is s'_k = \sqrt{s_k s_{k+1}} is added, resulting in 6 default boxes per feature map location.

Loss Function. The overall objective loss function is a weighted sum of the localization loss (loc) and the confidence loss (conf):

L(x, c, l, g) = \frac{1}{N}\left( L_{conf}(x, c) + \alpha L_{loc}(x, l, g) \right)  (2)

where N is the number of matched default boxes, the localization loss is the Smooth L1 loss between the predicted box (l) and the ground truth box (g), the confidence loss is the softmax loss over multiple class confidences (c), and the weight term α is set to 1 by cross validation.

Confidence loss:

L_{conf}(x, c) = -\sum_{i \in Pos}^{N} x_{ij}^{p} \log\left(\hat{c}_i^{p}\right) - \sum_{i \in Neg}^{N} \log\left(\hat{c}_i^{0}\right), \quad \text{where } \hat{c}_i^{p} = \frac{\exp\left(c_i^{p}\right)}{\sum_{p}\exp\left(c_i^{p}\right)}  (3)

Localization loss (loc):

L_{loc}(x, l, g) = \sum_{i \in Pos}^{N} \sum_{m \in \{cx, cy, w, h\}} x_{ij}^{k}\, \mathrm{smooth}_{L1}\left( l_i^{m} - \hat{g}_j^{m} \right)  (4)

\hat{g}_j^{cx} = \left( g_j^{cx} - d_i^{cx} \right) / d_i^{w}, \quad \hat{g}_j^{cy} = \left( g_j^{cy} - d_i^{cy} \right) / d_i^{h}  (5)

\hat{g}_j^{w} = \log\left( \frac{g_j^{w}}{d_i^{w}} \right), \quad \hat{g}_j^{h} = \log\left( \frac{g_j^{h}}{d_i^{h}} \right)  (6)
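As an illustration of the default box generation in Eq. (1), the minimal Python sketch below (not the authors' code) computes the per-layer scales and the (width, height) pairs for each aspect ratio, assuming s_min = 0.2, s_max = 0.9, six feature maps, and a top-up scale of 1.0 for the last layer's extra ratio-1 box.

```python
import math

def default_box_sizes(m=6, s_min=0.2, s_max=0.9,
                      aspect_ratios=(1.0, 2.0, 3.0, 1/2, 1/3)):
    """Compute SSD default box scales and (width, height) pairs per Eq. (1)."""
    # Per-layer scales, linearly spaced between s_min and s_max (Eq. 1).
    scales = [s_min + (s_max - s_min) * (k - 1) / (m - 1) for k in range(1, m + 1)]
    # Assumed extra scale so the last layer's ratio-1 box can use s'_k = sqrt(s_k * s_{k+1}).
    scales.append(1.0)

    boxes_per_layer = []
    for k in range(m):
        s_k = scales[k]
        boxes = []
        for a_r in aspect_ratios:
            # w = s_k * sqrt(a_r), h = s_k / sqrt(a_r)
            boxes.append((s_k * math.sqrt(a_r), s_k / math.sqrt(a_r)))
        # Extra aspect-ratio-1 box, giving 6 default boxes per feature map location.
        s_prime = math.sqrt(s_k * scales[k + 1])
        boxes.append((s_prime, s_prime))
        boxes_per_layer.append(boxes)
    return scales[:-1], boxes_per_layer

if __name__ == "__main__":
    scales, boxes = default_box_sizes()
    print("layer scales:", [round(s, 3) for s in scales])
    print("layer 0 boxes (w, h):", [(round(w, 3), round(h, 3)) for w, h in boxes[0]])
```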

FIGURE 1 Model structure of SSD


Training Sample

The selection of samples has a great influence on the performance of the SSD model.

The vehicle detection training set consists of samples from KITTI [6], the GTI Vehicle Image Database [7] and samples made by ourselves. The pedestrian detection training set consists of samples from the CVC pedestrian database [8] and samples made by ourselves.
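For completeness, the localization targets of Eqs. (5) and (6) above can be computed as in the following sketch (an illustrative encoding, not the authors' implementation); boxes are given as center-size tuples (cx, cy, w, h) in the same normalized coordinates.

```python
import math

def encode_box(gt, default):
    """Encode a ground-truth box relative to a matched default box, per Eqs. (5)-(6)."""
    g_cx, g_cy, g_w, g_h = gt
    d_cx, d_cy, d_w, d_h = default
    return (
        (g_cx - d_cx) / d_w,   # g_hat_cx, Eq. (5)
        (g_cy - d_cy) / d_h,   # g_hat_cy, Eq. (5)
        math.log(g_w / d_w),   # g_hat_w, Eq. (6)
        math.log(g_h / d_h),   # g_hat_h, Eq. (6)
    )

# Example: a ground-truth box slightly offset from and larger than its matched default box.
print(encode_box(gt=(0.52, 0.48, 0.30, 0.40), default=(0.50, 0.50, 0.25, 0.35)))
```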

FIGURE 2 Coordinate transformational relation of the coordinate systems

Conversion of Coordinates

Radar and camera are fixed at different places on the vehicle, and the coordinate systems of the two sensors are different. Therefore, unifying the coordinate systems of the two sensors is the basis of target information fusion. The related coordinate systems are the world coordinate system, the radar coordinate system, the camera coordinate system and the image pixel coordinate system. The transformational relation of these coordinate systems is shown in Figure 2.

In this paper, the origin of the world coordinate system coincides with that of the radar coordinate system. The transformational relation between the camera coordinate system and the world coordinate system is written as:

\begin{bmatrix} X_c \\ Y_c \\ Z_c \\ 1 \end{bmatrix} = \begin{bmatrix} R & T \\ 0^T & 1 \end{bmatrix} \begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix} = M_1 \begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix}  (7)

R is a third-order rotation matrix, determined by the rotation relation between the two coordinate systems, and T is the three-dimensional translation vector, determined by the origin positions of the two coordinate systems. M_1 is the external parameter matrix of the camera.

The transformation relation between the camera coordinate system and the image pixel coordinate system is:

Z_c \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} 1/d_x & 0 & u_0 \\ 0 & 1/d_y & v_0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} X_c \\ Y_c \\ Z_c \\ 1 \end{bmatrix} = M_2 \begin{bmatrix} X_c \\ Y_c \\ Z_c \\ 1 \end{bmatrix}  (8)

d_x and d_y represent the physical length of each pixel along the X axis and the Y axis of the image pixel coordinate system, the pixel coordinates (u_0, v_0) are the intersection point of the optical axis Z_c and the image plane, and f is the focal length of the camera. M_2 is the internal parameter matrix of the camera.

The transformation relation between the world coordinate system and the image pixel coordinate system can then be obtained from formulas (7) and (8):

Z_c \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = M_2 M_1 \begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix}  (9)
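The composition in Eq. (9) can be illustrated with a short NumPy sketch (illustrative only; the extrinsic and intrinsic values below are made-up placeholders, not calibration results from this paper).

```python
import numpy as np

def world_to_pixel(p_world, R, T, f, dx, dy, u0, v0):
    """Project a world-coordinate point (meters) to pixel coordinates via Eqs. (7)-(9)."""
    # External parameter matrix M1 (4x4): world -> camera, Eq. (7).
    M1 = np.eye(4)
    M1[:3, :3] = R
    M1[:3, 3] = T
    # Internal parameter matrix M2 (3x4): camera -> pixel, Eq. (8).
    K = np.array([[1 / dx, 0, u0],
                  [0, 1 / dy, v0],
                  [0, 0, 1]])
    P = np.array([[f, 0, 0, 0],
                  [0, f, 0, 0],
                  [0, 0, 1, 0]], dtype=float)
    M2 = K @ P
    # Eq. (9): Z_c * [u, v, 1]^T = M2 * M1 * [X_w, Y_w, Z_w, 1]^T
    uvw = M2 @ M1 @ np.append(p_world, 1.0)
    return uvw[:2] / uvw[2]  # divide out Z_c

# Placeholder calibration: camera axes aligned with the world (radar) axes,
# shifted by an assumed translation; focal length and pixel size are also assumed.
R = np.eye(3)
T = np.array([0.0, -0.1, 1.5])
print(world_to_pixel(np.array([2.0, 0.0, 20.0]), R, T,
                     f=0.004, dx=3e-6, dy=3e-6, u0=640, v0=360))
```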

Fusion Algorithm

In the fusion algorithm proposed in this paper, the Kalman filter is used to track the targets detected by radar and camera to reduce the measurement noise. After that, the fusion weight is calculated based on Bayesian estimation. Finally, the target data are fused according to the fusion weight and the tracking results.

Bayesian Estimation

Bayesian estimation is a data fusion algorithm that estimates the unknown state vector X from the known measurement vector Z. Bayes' theorem is given by:

p(X = x \mid Z = z) = \frac{p(Z = z \mid X = x)\, p(X = x)}{p(Z = z)}  (10)

An estimate of X can be made by maximizing this posterior distribution, i.e., by maximizing p(X = x | Z = z), which is called the maximum a posteriori (MAP) estimate [9]. Since the denominator in formula (10) is a normalization factor, the problem is equivalent to maximizing the numerator:

\hat{x}_{MAP} = \arg\max_x p(X = x \mid Z = z) \propto p(Z = z \mid X = x)\, p(X = x)  (11)

In the case of a two-sensor model, formula (11) can be extended as:

p(X = x \mid Z = z_1, z_2) = \frac{p(Z = z_1 \mid X = x)\, p(Z = z_2 \mid X = x)\, p(X = x)}{p(Z = z_1, z_2)}  (12)

Suppose the measurement uncertainties of the two sensors can be represented by Gaussian distributions:

p(Z = z_j \mid X = x) = \frac{1}{\sigma_j \sqrt{2\pi}} \exp\left\{ \frac{-(x - z_j)^2}{2\sigma_j^2} \right\}, \quad j = 1, 2  (13)

The fused MAP estimate is given by:

\hat{x}_{MAP} = \arg\max_x \left[ p(Z = z_1 \mid X = x)\, p(Z = z_2 \mid X = x) \right]  (14)

\hat{x}_{MAP} = \arg\max_x \left[ \frac{1}{\sigma_1\sqrt{2\pi}} \exp\left\{ \frac{-(x - z_1)^2}{2\sigma_1^2} \right\} \times \frac{1}{\sigma_2\sqrt{2\pi}} \exp\left\{ \frac{-(x - z_2)^2}{2\sigma_2^2} \right\} \right]  (15)

And the fusion result is given by:

x_f = \hat{x}_{MAP} = \frac{\sigma_2^2}{\sigma_1^2 + \sigma_2^2} z_1 + \frac{\sigma_1^2}{\sigma_1^2 + \sigma_2^2} z_2  (16)

where σ_1 and σ_2 are the measurement standard deviations of the two sensors.

Kalman Filter

The target dynamic model is given by:

\hat{x}_k = \phi_{k-1}\hat{x}_{k-1} + \beta_{k-1}u_{k-1} + w_{k-1}, \quad w_k = N(0, Q_k)  (17)

x_k is the target state vector, \hat{x}_k = [x\; v_x\; y\; v_y]^T, where x, v_x, y, v_y are the target's longitudinal distance, longitudinal velocity, lateral distance and lateral velocity respectively. ϕ_{k−1} is the state transition matrix, u_{k−1} is the controlled input, w_{k−1} is the process noise and Q_k is the process noise covariance.

The measurement model is given by:

z_k = H_k x_k + v_k, \quad v_k = N(0, R_k)  (18)

z_k is the target measurement vector, H_k is the measurement matrix, v_k is the measurement noise and R_k is the measurement noise covariance. Since the radar can measure target distance and velocity while the camera can only measure target distance, H_k and v_k are different for radar and camera. Suppose sensor 1 represents the radar and sensor 2 represents the camera:

z_{k,1} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} x_k + v_{k,1}, \quad v_{k,1} = N(0, R_{k,1})  (19)

z_{k,2} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} x_k + v_{k,2}, \quad v_{k,2} = N(0, R_{k,2})  (20)

Since the measurement models of radar and camera are different, parallel Kalman filters are used to track the targets detected by radar and camera respectively. The Kalman filter involves five basic equations, where j = 1, 2 represents radar and camera respectively.

Predictor equation:

\hat{x}_{k|k-1,j} = \phi\, \hat{x}_{k-1|k-1,j} + \beta_{k-1}u_{k-1}  (21)

which predicts the state vector \hat{x}_{k|k-1} at time k based on the estimated target state \hat{x}_{k-1|k-1} and the state transition matrix ϕ.

Error covariance equation:

P_{k|k-1,j} = \phi P_{k-1|k-1,j}\phi^T + Q  (22)

which calculates the covariance matrix of the predicted state vector.

Weight equation:

K_{k,j} = P_{k|k-1,j}H_j^T \left( H_j P_{k|k-1,j} H_j^T + R_j \right)^{-1}  (23)

The weight is calculated based on the covariance matrix of the state vector P_{k|k−1} and the measurement noise covariance R.

Filtering equation:

\hat{x}_{k|k,j} = \hat{x}_{k|k-1,j} + K_{k,j}\left( \hat{z}_{k,j} - H_j \hat{x}_{k|k-1,j} \right)  (24)

The state vector \hat{x}_{k|k} at time k is calculated as a weighted sum of the predicted state vector \hat{x}_{k|k-1} and the measurement vector \hat{z}_k; the weight is calculated by formula (23).

Update error covariance equation:

P_{k|k,j} = \left( I - K_{k,j}H_j \right) P_{k|k-1,j}  (25)

According to formula (16), the final fusion result is given by:

\hat{x}_{k|k,f} = \frac{\sigma_2^2}{\sigma_1^2 + \sigma_2^2}\hat{x}_{k|k,1} + \frac{\sigma_1^2}{\sigma_1^2 + \sigma_2^2}\hat{x}_{k|k,2}  (26)

σ_1 and σ_2 are obtained from the error covariance matrices of the state vectors \hat{x}_{k|k}.
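A minimal NumPy sketch of one cycle of the parallel Kalman filters and the variance-weighted fusion of Eq. (26) is given below (illustrative only; the process and measurement noise values and the synthetic measurements are assumed placeholders, not the tuned values used in the experiments). A 100 ms sampling period is assumed, matching the simulation setup that follows.

```python
import numpy as np

DT = 0.1  # assumed sampling period (100 ms)

# Constant-velocity state transition for x = [x, vx, y, vy]^T
PHI = np.array([[1, DT, 0, 0],
                [0, 1,  0, 0],
                [0, 0,  1, DT],
                [0, 0,  0, 1]], dtype=float)

H_RADAR = np.eye(4)                               # radar measures distances and velocities, Eq. (19)
H_CAMERA = np.array([[1, 0, 0, 0],
                     [0, 0, 1, 0]], dtype=float)  # camera measures distances only, Eq. (20)

def kf_step(x_est, P, z, H, Q, R):
    """One predict/update cycle, Eqs. (21)-(25), with no controlled input (u = 0)."""
    x_pred = PHI @ x_est                                        # Eq. (21)
    P_pred = PHI @ P @ PHI.T + Q                                # Eq. (22)
    K = P_pred @ H.T @ np.linalg.inv(H @ P_pred @ H.T + R)      # Eq. (23)
    x_new = x_pred + K @ (z - H @ x_pred)                       # Eq. (24)
    P_new = (np.eye(4) - K @ H) @ P_pred                        # Eq. (25)
    return x_new, P_new

def fuse_positions(x_radar, P_radar, x_camera, P_camera):
    """Variance-weighted fusion of the tracked longitudinal/lateral distances, Eq. (26)."""
    fused = {}
    for name, i in (("longitudinal", 0), ("lateral", 2)):
        s1, s2 = P_radar[i, i], P_camera[i, i]   # sigma^2 taken from the error covariances
        fused[name] = (s2 / (s1 + s2)) * x_radar[i] + (s1 / (s1 + s2)) * x_camera[i]
    return fused

# Assumed noise levels (placeholders) and one fusion cycle on synthetic measurements.
Q = 0.01 * np.eye(4)
R_radar, R_camera = np.diag([0.1, 0.2, 0.4, 0.4]), np.diag([1.0, 1.0])
x_r = np.array([10.0, 1.4, 0.0, 0.0])            # 10 m ahead, about 5 km/h away
x_c = x_r.copy()
P_r, P_c = np.eye(4), np.eye(4)
x_r, P_r = kf_step(x_r, P_r, np.array([10.15, 1.4, 0.05, 0.0]), H_RADAR, Q, R_radar)
x_c, P_c = kf_step(x_c, P_c, np.array([10.3, 0.1]), H_CAMERA, Q, R_camera)
print("fused distances:", fuse_positions(x_r, P_r, x_c, P_c))
```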

Simulation Experiment

In the simulation experiments, the target types are pedestrian and vehicle, and the target's motion model is a constant-velocity (CV) model. Since the radar can measure target distance and velocity while the camera can only measure target distance, the target's distances in the longitudinal and lateral directions are the fused data.

In the simulation experiments, the radar model is a 77 GHz radar with an 18° FOV, a 150 m measurement distance and a 100 ms sampling period. The camera model is a monocular camera with a 40° FOV, a 120 m measurement distance and a 100 ms sampling period. The measurement noise of the sensors follows a Gaussian distribution.

The measurement standard deviations of the sensors are designed according to the sensor characteristics of radar and camera. Since the lateral resolution of the radar is relatively low, its measurement of longitudinal distance is more accurate than that of lateral distance, so its standard deviation of longitudinal distance is smaller than that of lateral distance. For the camera, the measurement accuracy is lower than the radar's but on the same level, and the camera measurement accuracies in the longitudinal and lateral directions are almost equal. The relationship between the sensors' measurement standard deviations in the simulation experiments is set as:

\sigma_{camera\_longitudinal} = \sigma_{camera\_lateral} > \sigma_{radar\_lateral} > \sigma_{radar\_longitudinal}  (27)

According to the type of target, the measurement distance accuracy for a vehicle target is higher than that for a pedestrian, so the standard deviation for vehicle targets is smaller than for pedestrian targets. Three experiments were designed to test and evaluate the fusion algorithm.

FIGURE 3 The error of the raw measurement results, tracking results and fusion results in the longitudinal direction.

Experiment 1: Pedestrian Moves in Longitudinal Direction

The target type was pedestrian. The target's initial longitudinal distance, lateral distance, longitudinal velocity and lateral velocity were set to 10 meters, 0 meters, 5 km/h and 0 km/h respectively. The duration of the experiment was 10 seconds. The experiment result is shown in Figure 3.

Figure 3 shows that the Kalman filter can reduce the measurement noise. Consequently, the tracked sensor data were used for fusion.

The pedestrian's longitudinal velocity was then increased from 5 km/h to 10 km/h progressively. The residual sum of squares (RSS), which represents the sum of squared errors between the ideal and estimated values over all sampling points, is used for evaluation. The formula of the RSS is:

RSS = \sum_{i=1}^{n} \left( x_{ideal,i} - x_{estimated,i} \right)^2  (28)

FIGURE 4 The RSS in the longitudinal direction at different velocities.

FIGURE 5 The RSS in the lateral direction at different velocities.

TABLE 1 The mean value and standard deviation of the RSS in experiment 1. The x- represents the longitudinal direction; the y- represents the lateral direction.

                             radar     camera    fusion-mean   ours
x-mean (m²)                  1.9540    4.1123    1.5894        1.3877
x-standard deviation (m²)    0.3841    0.4981    0.2476        0.2522
y-mean (m²)                  2.9877    4.1123    1.7619        1.7439
y-standard deviation (m²)    0.5785    0.4970    0.2784        0.3045

According to Figure 4, Figure 5 and Table 1:

The RSS mean value and standard deviation of the two fusion algorithms are better than the solo-sensor tracking results in both the longitudinal and lateral directions, so the data fusion algorithm improves the estimation accuracy effectively. Since the radar measurement accuracy in the longitudinal direction is better, the RSS mean value of the radar tracking result is far less than the camera's. Thus, the proposed algorithm takes advantage of the radar and assigns more fusion weight to it. Its RSS mean value is 12.7% lower than that of the equal weight fusion algorithm, and their RSS standard deviations are almost equal. Consequently, it outperforms the equal weight fusion algorithm in the longitudinal direction. In the lateral direction, the RSS mean value of the radar tracking result increased but is still better than the camera's. Thus, compared with the longitudinal direction, the radar is assigned less fusion weight, and the RSS mean value of the proposed algorithm is almost equal to that of the equal weight fusion algorithm. Although its RSS standard deviation is 8.6% higher than that of the equal weight fusion algorithm, the base is small and this indicator is secondary compared to the mean value. Consequently, the performance of the proposed algorithm is similar to the equal weight fusion algorithm in the lateral direction.

Experiment 2: Vehicle Moves in Longitudinal Direction

The target type was vehicle. The target's initial longitudinal distance, lateral distance, longitudinal velocity and lateral velocity were set to 10 meters, 0 meters, 20 km/h and 0 km/h respectively. The vehicle's longitudinal velocity was increased from 20 km/h to 45 km/h progressively.

According to Figure 6, Figure 7 and Table 2:

The overall RSS mean value and standard deviation decrease compared to experiment 1, since the measurement accuracy for a vehicle target is higher than for a pedestrian target. In detail, in the longitudinal direction, the RSS mean value of the proposed algorithm is 11.0% less than that of the equal weight fusion algorithm, which confirms its better performance. In the lateral direction, their performances are similar.
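For reference, the RSS metric of Eq. (28) used to score all the experiments can be computed as in this small sketch (the numbers are made-up examples, not data from the experiments).

```python
def residual_sum_of_squares(ideal, estimated):
    """RSS per Eq. (28): sum of squared errors over all sampling points."""
    if len(ideal) != len(estimated):
        raise ValueError("sequences must have the same number of sampling points")
    return sum((x_i - x_e) ** 2 for x_i, x_e in zip(ideal, estimated))

# Example: ideal longitudinal distances vs. noisy estimates at four sampling points.
print(residual_sum_of_squares([10.0, 10.14, 10.28, 10.42],
                              [10.2, 10.05, 10.31, 10.50]))
```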

FIGURE 6 The RSS in the longitudinal direction at different velocities.

FIGURE 7 The RSS in the lateral direction at different velocities.

TABLE 2 The mean value and standard deviation of the RSS in experiment 2. The x- represents the longitudinal direction; the y- represents the lateral direction.

                             radar     camera    fusion-mean   ours
x-mean (m²)                  1.0842    2.1630    0.8372        0.7461
x-standard deviation (m²)    0.1674    0.2299    0.1048        0.1080
y-mean (m²)                  1.6645    2.1630    0.9263        0.9129
y-standard deviation (m²)    0.2962    0.2299    0.1287        0.1395

Experiment 3: Pedestrian Moves in Lateral Direction

The target type was pedestrian. The target's initial longitudinal distance, lateral distance, longitudinal velocity and lateral velocity were set to 10 meters, −5 meters, 0 km/h and 5 km/h respectively. The pedestrian's lateral velocity was then increased from 5 km/h to 10 km/h progressively.

According to Figure 8, Figure 9 and Table 3, in the longitudinal direction the RSS mean value of the proposed algorithm is 12.8% less than that of the equal weight fusion algorithm; in the lateral direction, their performances are similar. The conclusions about the proposed fusion algorithm are consistent with the previous experiments.

FIGURE 8 The RSS in the longitudinal direction at different velocities.

FIGURE 9 The RSS in the lateral direction at different velocities.

TABLE 3 The mean value and standard deviation of the RSS in experiment 3. The x- represents the longitudinal direction; the y- represents the lateral direction.

                             radar     camera    fusion-mean   ours
x-mean (m²)                  1.9381    4.0730    1.5757        1.3762
x-standard deviation (m²)    0.3803    0.5005    0.2382        0.2446
y-mean (m²)                  2.9428    4.0730    1.7344        1.7144
y-standard deviation (m²)    0.5370    0.5005    0.2633        0.2873

In summary, the proposed algorithm is the best estimation algorithm and is tolerant to the velocity variation.

Real Experiment

In the real experiment, the radar was a 77 GHz Continental ARS 408 radar with an 18° FOV, a 250 m measuring range, a 0.2 m range resolution and an 80 ms sampling period. The camera was a monocular camera with a 40° FOV and a 1280×720 resolution. The object detection algorithm, SSD, runs on a TX2 at about 11 fps. Since the radar's sampling period is different from the camera's, a spline-based fitting interpolation method is used for time synchronization.


Experiment 4: Pedestrian Moves in Longitudinal Direction

The pedestrian's initial longitudinal distance and lateral distance were 2 meters and 0 meters, and the pedestrian moved away with a longitudinal velocity of about 5 km/h.

According to Figure 10, Figure 11 and Table 4:

In the longitudinal direction, the RSS of the proposed fusion algorithm is worse than the radar tracking results, which differs from the conclusions of the simulation experiments. The proposed fusion algorithm works better when the measurement accuracies of the different sensors are on the same level. However, in the real experiment, the camera measurement accuracy in the longitudinal direction is not as good as the radar's. Consequently, adding the camera tracking results, which have a large error, reduces the fusion performance instead. But the proposed algorithm assigns more fusion weight to the radar, so the performance degradation is acceptable.

In the lateral direction, the radar measurement accuracy is similar to the camera's. Consequently, the conclusions about the proposed fusion algorithm are consistent with the simulation experiments. Both fusion algorithms improve the accuracy and are better than the solo-sensor tracking results, and the proposed algorithm is slightly better than the equal weight fusion algorithm.

FIGURE 10 The tracking results and fusion results in the longitudinal direction. Supposing the target moved at constant velocity in the experiment, the black line represents the ideal target longitudinal distance.

FIGURE 11 The tracking results and fusion results in the lateral direction.

TABLE 4 The RSS in experiment 4. The x- represents the longitudinal direction; the y- represents the lateral direction.

             radar      camera      fusion-mean   ours
x-RSS (m²)   36.5372    843.6917    289.2163      62.9294
y-RSS (m²)   5.9289     8.4778      4.8071        4.6807

Experiment 5: Vehicle Moves in Longitudinal Direction

The vehicle's initial longitudinal distance and lateral distance were 25 meters and 0 meters, and the vehicle moved away with a longitudinal velocity of about 10 km/h.

According to Figure 12, Figure 13 and Table 5, the overall RSS values decrease compared to experiment 4, since the measurement accuracy for a vehicle target is higher than for a pedestrian target. In the longitudinal direction, the proposed algorithm takes advantage of the radar and achieves satisfactory accuracy. In the lateral direction, the proposed algorithm improves the estimation accuracy and is better than the equal weight fusion algorithm.

FIGURE 12 The tracking results and fusion results in the longitudinal direction. Supposing the target moved at constant velocity in the experiment, the black line represents the ideal target longitudinal distance.

FIGURE 13 The tracking results and fusion results in the lateral direction.

TABLE 5 The RSS in experiment 5. The x- represents the longitudinal direction; the y- represents the lateral direction.

             radar      camera      fusion-mean   ours
x-RSS (m²)   17.7746    443.0017    123.6233      34.6460
y-RSS (m²)   5.1947     6.0827      4.1266        4.0983

Conclusions

In this paper, a method of vehicle and pedestrian detection based on the data fusion of millimeter wave radar and camera is proposed. A deep learning model called Single Shot MultiBox Detector (SSD) is utilized for vision-based target detection. Parallel Kalman filters are used to track the targets detected by radar and camera. Finally, the target data after tracking are fused based on Bayesian estimation.

Simulation and real experiments show that the proposed algorithm improves the estimation accuracy effectively and outperforms both single sensor tracking and the equal weight fusion algorithm.

References

1. Wang, X., Xu, L., Sun, H. et al., "Bionic Vision Inspired On-Road Obstacle Detection and Tracking Using Radar and Visual Information," IEEE International Conference on Intelligent Transportation Systems, 2014, 39-44. IEEE.
2. Han, S., Wang, X., Xu, L. et al., "Frontal Object Perception for Intelligent Vehicles Based on Radar and Camera Fusion," Control Conference, 2016, 4003-4008. IEEE.
3. Alencar, F.A.R., Rosero, L.A., Massera Filho, C. et al., "Fast Metric Tracking by Detection System: Radar Blob and Camera Fusion," Robotics Symposium (LARS) and 2015 3rd Brazilian Symposium on Robotics (LARS-SBR), 2015 12th Latin American, 2015, 120-125. IEEE.
4. Bi, X., Tan, B., Xu, Z. et al., "A New Method of Target Detection Based on Autonomous Radar and Camera Data Fusion," Intelligent and Connected Vehicles Symposium, 2017.
5. Liu, W., Anguelov, D., Erhan, D. et al., "SSD: Single Shot MultiBox Detector," European Conference on Computer Vision, 2016, 21-37. Springer, Cham.
6. Geiger, A., Lenz, P., Urtasun, R., "Are We Ready for Autonomous Driving? The KITTI Vision Benchmark Suite," Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on, 2012, 3354-3361. IEEE.
7. Arróspide, J., Salgado, L., and Nieto, M., "Video Analysis Based Vehicle Detection and Tracking Using an MCMC Sampling Framework," EURASIP Journal on Advances in Signal Processing, 2012, Article ID 2012:2, Jan. 2012, doi:10.1186/1687-6180-2012-2.
8. González, A., Fang, Z., Socarras, Y., Serrat, J., Vázquez, D., Xu, J., and López, A., "Pedestrian Detection at Day/Night Time with Visible and FIR Cameras: A Comparison," Sensors, in press, 2016.
9. Kumar, M., Garg, D.P., Zachery, R., "A Generalized Approach for Inconsistency Detection in Data Fusion from Multiple Sensors," American Control Conference, 2006, 6 pp. IEEE Xplore.

Contact Information

Zhexiang Yu
Tongji University, No. 4800 Cao'An Road, Jiading District, Shanghai, China
[email protected]

Acknowledgments

Subsidized by the project of standardization and new model for intelligent manufacture: Research and Test Platform of System and Communication Standardization for Intelligent and Connected Vehicle, project ID: 2016ZXFB06002.

Definitions/Abbreviations

ADAS - Advanced Driving Assistance System
SSD - Single Shot MultiBox Detector
FOV - Field of View
ROI - Region of Interest
CNN - Convolutional Neural Network
YOLO - You Only Look Once
MAP - Maximum A Posteriori
RSS - Residual Sum of Squares

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical,
photocopying, recording, or otherwise, without the prior written permission of the copyright holder.

Positions and opinions advanced in this paper are those of the author(s) and not necessarily those of SAE International. The author is solely responsible for the
content of the paper.

ISSN 0148-7191
