Camera-Radar Data Fusion for Target Detection via Kalman Filter and Bayesian Estimation

Citation: Yu, Z., Bai, J., Chen, S., Huang, L. et al., "Camera-Radar Data Fusion for Target Detection via Kalman Filter and Bayesian Estimation," SAE Technical Paper 2018-01-1608, 2018, doi:10.4271/2018-01-1608.
Abstract

Target detection is essential to advanced driving assistance systems (ADAS) and automatic driving, and the data fusion of millimeter-wave radar and camera can provide more accurate and complete target information and enhance environmental perception performance. In this paper, a method of vehicle and pedestrian detection based on the data fusion of millimeter-wave radar and camera is proposed to improve the target distance estimation accuracy. The first step is target data acquisition: a deep learning model called Single Shot MultiBox Detector (SSD) is used to detect targets in consecutive video frames captured by the camera and is further optimized for real-time performance and accuracy. Secondly, the coordinate systems of the camera and radar are unified by a coordinate transformation matrix. Then, parallel Kalman filters are used to track the targets detected by the radar and the camera respectively; since the target data provided by the two sensors differ, a different Kalman filter is designed for each tracking process. Finally, the target data are fused based on Bayesian Estimation. Several simulation experiments were first designed to test and optimize the proposed method, and real data were then used for further validation. The experiments show that the Kalman filters considerably reduce the measurement noise and that the fusion algorithm improves the estimation accuracy.
Introduction

To develop advanced driving assistance systems (ADAS) and automatic driving, real-time and robust on-road target detection is one of the key modules of vehicle environmental perception. Because on-road driving circumstances are complex and unpredictable, vehicles need to be equipped with different types of sensors to deal with the issues of environmental perception and recognition. Multi-sensor fusion can take advantage of different types of sensors, such as camera, radar, and laser radar, to acquire accurate and complete target information.

Although millimeter-wave radar can provide relatively high range and velocity resolution even in bad weather conditions, it suffers from a limited field of view (FOV), low lateral resolution, and an inability to recognize the target type. In contrast, a camera can provide the target type but only low accuracy in obstacle range estimation. The fusion of camera and radar can make up for the defects of the two sensors.

Due to the poor real-time performance of vision-based target detection algorithms, some camera-radar fusion algorithms [1, 2, 3, 4] work in the following steps: Firstly, through coordinate transformation, the radar targets are used to determine regions of interest (ROIs) on the image. Then, the target detection algorithm is run only in these ROIs to judge the existence of targets, which reduces the run time. Finally, the boundaries of the detected targets are refined. However, the performance of this kind of algorithm is limited by the capability of the millimeter-wave radar: once a target is missed by the radar, it cannot be detected by the subsequent detection algorithm.

This paper proposes a new fusion method of camera and radar data for target detection via Kalman filter and Bayesian Estimation. Firstly, the SSD algorithm is used to detect targets in images captured by the camera, yielding the classification and boundary of each target. Secondly, the coordinate systems of the camera and radar are unified by a coordinate transformation matrix. Then, parallel Kalman filters are used to track the targets detected by the radar and the camera respectively and to reduce the noise. Finally, a fusion method based on Bayesian Estimation is used to fuse the tracking results of the two sensors.

The paper is organized as follows: Section II introduces the vision-based target detection algorithm, SSD. Section III introduces the coordinate transformation matrix. Section IV introduces the fusion algorithm. Sections V and VI present the results of the simulation experiments and the real experiments, respectively. Finally, the conclusions are presented in Section VII.
Model Structure

The model structure of SSD is shown in Figure 1.

Base network. A truncated standard classification network is used as the early layers of the SSD model for high-quality image classification; this is called the base network. In this paper, the VGG-16 network is used as the base network.

Multi-scale feature maps for detection. Some extra convolutional feature layers are added to the end of the truncated base network. These layers decrease in size progressively, which allows detections to be predicted at multiple scales [5].

The overall objective loss function is a weighted sum of the localization loss and the confidence loss [5]:

\[
L(x, c, l, g) = \frac{1}{N}\left(L_{conf}(x, c) + \alpha L_{loc}(x, l, g)\right)
\]

where N is the number of matched default boxes, and the localization loss is the Smooth L1 loss between the predicted box (l) and the ground truth box (g). The confidence loss is the softmax loss over multiple classes' confidences (c), and the weight term α is set to 1 by cross validation. The confidence loss is:

\[
L_{conf}(x, c) = -\sum_{i \in Pos}^{N} x_{ij}^{p} \log\left(\hat{c}_{i}^{p}\right) - \sum_{i \in Neg} \log\left(\hat{c}_{i}^{0}\right), \qquad \hat{c}_{i}^{p} = \frac{\exp\left(c_{i}^{p}\right)}{\sum_{p} \exp\left(c_{i}^{p}\right)}
\]

Training dataset. The vehicle detection training set consists of samples from KITTI [6], the GTI Vehicle Image Database [7], and samples made by ourselves. The pedestrian detection training set consists of samples from the CVC pedestrian database [8] and samples made by ourselves.
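To make the composition of the objective concrete, the following is a minimal numpy sketch of how the two terms combine; it is an illustration only, with assumed array shapes and helper names, and it omits SSD training details such as hard negative mining and restricting the localization term to positive boxes.

```python
import numpy as np

def smooth_l1(pred, gt):
    """Smooth L1 loss between predicted and ground-truth box offsets."""
    d = np.abs(pred - gt)
    return np.sum(np.where(d < 1.0, 0.5 * d ** 2, d - 0.5))

def softmax_conf_loss(logits, labels):
    """Softmax (cross-entropy) loss over class confidences, summed over boxes."""
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs = e / e.sum(axis=1, keepdims=True)
    return -np.sum(np.log(probs[np.arange(len(labels)), labels]))

def ssd_loss(loc_pred, loc_gt, conf_logits, labels, alpha=1.0):
    """Overall objective: (1/N) * (L_conf + alpha * L_loc), N = matched boxes."""
    n = len(labels)          # number of matched default boxes
    if n == 0:
        return 0.0
    return (softmax_conf_loss(conf_logits, labels)
            + alpha * smooth_l1(loc_pred, loc_gt)) / n
```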
FIGURE 2 Coordinate transformational relation of the coordinate systems.

Conversion of Coordinates

The radar and the camera are fixed at different places on the vehicle, and the coordinate systems of the two sensors are different. Therefore, unifying the coordinate systems of the two sensors is the basis of target information fusion. The related coordinate systems include the world coordinate system, the radar coordinate system, the camera coordinate system, and the image pixel coordinate system. The transformational relation of these coordinate systems is shown in Figure 2.

In this paper, the origin of the world coordinate system coincides with that of the radar coordinate system. The transformational relation between the camera coordinate system and the world coordinate system is written as:

\[
\begin{bmatrix} X_c \\ Y_c \\ Z_c \\ 1 \end{bmatrix}
= \begin{bmatrix} R & T \\ 0^{T} & 1 \end{bmatrix}
\begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix}
= M_1 \begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix}
\tag{7}
\]

R is a third-order rotation matrix, determined by the rotation relation between the two coordinate systems, and T is the three-dimensional translation vector, determined by the origin positions of the two coordinate systems. M1 is the external parameter matrix of the camera.

The transformation relation between the camera coordinate system and the image pixel coordinate system is:

\[
Z_c \begin{bmatrix} u \\ v \\ 1 \end{bmatrix}
= \begin{bmatrix} \frac{1}{d_x} & 0 & u_0 \\ 0 & \frac{1}{d_y} & v_0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}
\begin{bmatrix} X_c \\ Y_c \\ Z_c \\ 1 \end{bmatrix}
= M_2 \begin{bmatrix} X_c \\ Y_c \\ Z_c \\ 1 \end{bmatrix}
\tag{8}
\]

Here dx and dy represent the physical length of each pixel along the X axis and the Y axis of the image pixel coordinate system, the pixel coordinates (u0, v0) are the intersection point of the optical axis Zc with the image plane, and f is the focal length of the camera. M2 is the internal parameter matrix of the camera.

The transformation relation between the world coordinate system and the image pixel coordinate system is then obtained from formulas (7) and (8):

\[
Z_c \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = M_2 M_1 \begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix}
\tag{9}
\]
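As an illustration of formulas (7)-(9), the following Python sketch builds M1 and M2 and projects a world point to pixel coordinates. The function names and example parameters are ours, not the paper's; a real system would use calibrated values of R, T, f, dx, dy, u0, and v0.

```python
import numpy as np

def extrinsic(R, T):
    """External parameter matrix M1 (4x4), as in formula (7)."""
    M1 = np.eye(4)
    M1[:3, :3] = R       # rotation between world and camera frames
    M1[:3, 3] = T        # translation between the two origins
    return M1

def intrinsic(f, dx, dy, u0, v0):
    """Internal parameter matrix M2 (3x4), as in formula (8)."""
    K = np.array([[1 / dx, 0, u0],
                  [0, 1 / dy, v0],
                  [0, 0, 1.0]])
    P = np.array([[f, 0, 0, 0],
                  [0, f, 0, 0],
                  [0, 0, 1.0, 0]])
    return K @ P

def world_to_pixel(M1, M2, Xw):
    """Formula (9): project a world point to pixel coordinates (u, v)."""
    p = M2 @ M1 @ np.append(Xw, 1.0)   # p = Zc * [u, v, 1]
    return p[:2] / p[2]                # divide out the depth Zc

# Illustrative usage with identity rotation and made-up calibration values:
M1 = extrinsic(np.eye(3), np.array([0.0, 0.0, 0.5]))
M2 = intrinsic(f=0.006, dx=5e-6, dy=5e-6, u0=640, v0=360)
u, v = world_to_pixel(M1, M2, np.array([1.0, 0.5, 20.0]))
```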
Fusion Algorithm

In the fusion algorithm proposed in this paper, Kalman filters are used to track the targets detected by the radar and the camera in order to reduce the measurement noise. After that, a fusion weight is calculated based on Bayesian Estimation. Finally, the target data are fused according to the fusion weight and the tracking results.
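The paper's specific Kalman filter designs for the radar and camera tracks are not recoverable from this copy. Purely to illustrate the tracking stage, here is a minimal one-dimensional constant-velocity Kalman filter; the state model and noise parameters are assumptions, not the authors' design. One such filter per tracked coordinate and per sensor would run in parallel, as the paper describes.

```python
import numpy as np

class ConstantVelocityKF:
    """Minimal 1-D Kalman filter with state [position, velocity]."""

    def __init__(self, dt, q, r):
        self.F = np.array([[1.0, dt], [0.0, 1.0]])  # state transition
        self.H = np.array([[1.0, 0.0]])             # only position is measured
        self.Q = q * np.eye(2)                      # process noise covariance
        self.R = np.array([[r]])                    # measurement noise covariance
        self.x = np.zeros(2)                        # state estimate
        self.P = np.eye(2)                          # estimate covariance

    def step(self, z):
        """One predict-update cycle for a scalar position measurement z."""
        # Predict
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        # Update
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ (np.array([z]) - self.H @ self.x)
        self.P = (np.eye(2) - K @ self.H) @ self.P
        return self.x[0]   # filtered position estimate
```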
Bayesian Estimation

Bayesian Estimation is a data fusion algorithm that estimates the unknown state vector X from the known measurement vector Z. Bayes' theorem is given by:

\[
p(X = x \mid Z = z) = \frac{p(Z = z \mid X = x)\, p(X = x)}{p(Z = z)}
\tag{10}
\]

An estimate of X can be made by maximizing this posterior distribution, i.e., by maximizing p(X = x | Z = z); this is called the maximum a posteriori (MAP) estimate [9]. Since the denominator in formula (10) is a normalization factor, the problem is equivalent to maximizing the numerator:

\[
\hat{x}_{MAP} = \arg\max_{x}\, p(X = x \mid Z = z) \propto p(Z = z \mid X = x)\, p(X = x)
\tag{11}
\]

In the case of a two-sensor model, formula (11) can be extended as:

\[
p(X = x \mid Z = z_1, z_2) = \frac{p(Z = z_1 \mid X = x)\, p(Z = z_2 \mid X = x)\, p(X = x)}{p(Z = z_1, z_2)}
\tag{12}
\]

Suppose the measurement uncertainties of the two sensors can be represented by Gaussian distributions:

\[
p(Z = z_j \mid X = x) = \frac{1}{\sigma_j \sqrt{2\pi}} \exp\left\{ \frac{-(x - z_j)^2}{2\sigma_j^2} \right\}, \quad j = 1, 2
\tag{13}
\]

Then the fused MAP estimate is given by:

\[
\hat{x}_{MAP} = \arg\max_{x} \left[\, p(Z = z_1 \mid X = x)\, p(Z = z_2 \mid X = x) \,\right]
\tag{14}
\]
Substituting formula (13) into formula (14) gives:

\[
\hat{x}_{MAP} = \arg\max_{x} \left[ \frac{1}{\sigma_1 \sqrt{2\pi}} \exp\left\{ \frac{-(x - z_1)^2}{2\sigma_1^2} \right\} \cdot \frac{1}{\sigma_2 \sqrt{2\pi}} \exp\left\{ \frac{-(x - z_2)^2}{2\sigma_2^2} \right\} \right]
\]

Weight Equation: taking the logarithm, differentiating with respect to x, and setting the derivative to zero yields the fused estimate as a weighted sum of the two measurements,

\[
\hat{x} = w_1 z_1 + w_2 z_2, \qquad w_1 = \frac{\sigma_2^2}{\sigma_1^2 + \sigma_2^2}, \quad w_2 = \frac{\sigma_1^2}{\sigma_1^2 + \sigma_2^2}
\]

so the sensor with the smaller measurement variance receives the larger fusion weight.
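A sketch of the resulting fusion step, with variances supplied per sensor and per axis (variable names are illustrative):

```python
def fuse(z1, var1, z2, var2):
    """Weight equation: fuse two Gaussian measurements of the same quantity.

    The sensor with the smaller variance receives the larger weight.
    """
    w1 = var2 / (var1 + var2)
    w2 = var1 / (var1 + var2)
    return w1 * z1 + w2 * z2

# e.g., fusing radar and camera longitudinal-distance tracks at one time step:
# x_fused = fuse(x_radar, var_radar_x, x_camera, var_camera_x)
```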
Simulation Experiments

Experiment 1: Pedestrian

FIGURE 3 The error of the raw measurement results, tracking results, and fusion results in the longitudinal direction.

FIGURE 5 The RSS in the lateral direction at different velocities.

TABLE 1 The mean value and standard deviation of the RSS in experiment 1.

FIGURE 6 The RSS in the longitudinal direction at different velocities.

FIGURE 7 The RSS in the lateral direction at different velocities.

TABLE 2 The mean value and standard deviation of the RSS in experiment 2. The x- represents the longitudinal direction; the y- represents the lateral direction.

                               radar     camera    fusion-mean   ours
  x-mean (m²)                  1.0842    2.1630    0.8372        0.7461
  x-standard deviation (m²)    0.1674    0.2299    0.1048        0.1080
  y-mean (m²)                  1.6645    2.1630    0.9263        0.9129
  y-standard deviation (m²)    0.2962    0.2299    0.1287        0.1395

FIGURE 8 The RSS in the longitudinal direction at different velocities.

FIGURE 9 The RSS in the lateral direction at different velocities.

TABLE 3 The mean value and standard deviation of the RSS in experiment 3. The x- represents the longitudinal direction; the y- represents the lateral direction.

                               radar     camera    fusion-mean   ours
  x-mean (m²)                  1.9381    4.0730    1.5757        1.3762
  x-standard deviation (m²)    0.3803    0.5005    0.2382        0.2446
  y-mean (m²)                  2.9428    4.0730    1.7344        1.7144
  y-standard deviation (m²)    0.5370    0.5005    0.2633        0.2873
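The figures and tables report the RSS (residual sum of squares) of each estimate per axis. A sketch under the standard definition, assuming the reference trajectory (e.g., the ideal constant-velocity path used in the real experiments) is known:

```python
import numpy as np

def rss(estimates, reference):
    """Residual sum of squares between estimated and reference positions (one axis)."""
    e = np.asarray(estimates) - np.asarray(reference)
    return float(np.sum(e ** 2))
```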
Real Experiments

Experiment 4: Pedestrian Moves in Longitudinal Direction

The pedestrian's initial longitudinal and lateral distances were 2 meters and 0 meters, and the pedestrian moved away with a longitudinal velocity of about 5 km/h.

FIGURE 10 The tracking results and fusion results in the longitudinal direction. The target is supposed to move at constant velocity in the experiment; the black line represents the ideal target longitudinal distance.

FIGURE 11 The tracking results and fusion results in the lateral direction.

According to Figure 10, Figure 11 and Table 4: In the longitudinal direction, the RSS of the proposed fusion algorithm is worse than that of the radar tracking results, which differs from the conclusions of the simulation experiments. The proposed fusion algorithm works better when the measurement accuracies of the different sensors are at the same level. In the real experiment, however, the camera's measurement accuracy in the longitudinal direction is not as good as the radar's; consequently, adding camera tracking results with large errors reduces the fusion algorithm's performance instead. But the proposed algorithm assigns more fusion weight to the radar, and the performance degradation is acceptable.

In the lateral direction, the radar's measurement accuracy is similar to the camera's. Consequently, the conclusions about the proposed fusion algorithm are consistent with the simulation experiments: both fusion algorithms improve the accuracy and are better than the single-sensor tracking results, and the proposed algorithm is slightly better than the equal-weight fusion algorithm.

Experiment 5: Vehicle Moves in Longitudinal Direction

The vehicle's initial longitudinal and lateral distances were 25 meters and 0 meters, and the vehicle moved away with a longitudinal velocity of about 10 km/h.

According to Figure 12, Figure 13 and Table 5, the overall RSS values decrease compared to experiment 4, since the measurement accuracy for the vehicle target is higher than for the pedestrian target. In the longitudinal direction, the proposed algorithm takes advantage of the radar and achieves satisfactory accuracy. In the lateral direction, the proposed algorithm improves the estimation accuracy and is better than the equal-weight fusion algorithm.
TABLE 5 The RSS in experiment 5. The x- represents the longitudinal direction; the y- represents the lateral direction.

               radar      camera      fusion-mean   ours
  x-RSS (m²)   17.7746    443.0017    123.6233      34.6460
  y-RSS (m²)   5.1947     6.0827      4.1266        4.0983

Conclusions

In this paper, a method of vehicle and pedestrian detection based on the data fusion of millimeter-wave radar and camera is proposed. A deep learning model called Single Shot MultiBox Detector (SSD) is used for vision-based target detection. Parallel Kalman filters are used to track the targets detected by the radar and the camera. Finally, the tracked target data are fused based on Bayesian Estimation.

The simulation and real experiments show that the proposed algorithm improves the estimation accuracy effectively and outperforms both the single-sensor tracking results and the equal-weight fusion algorithm.

References

1. Wang, X., Xu, L., Sun, H. et al., "Bionic Vision Inspired On-Road Obstacle Detection and Tracking Using Radar and Visual Information," IEEE International Conference on Intelligent Transportation Systems, 2014, 39-44. IEEE.
2. Han, S., Wang, X., Xu, L. et al., "Frontal Object Perception for Intelligent Vehicles Based on Radar and Camera Fusion," Control Conference, 2016, 4003-4008. IEEE.
3. Alencar, F.A.R., Rosero, L.A., Massera Filho, C. et al., "Fast Metric Tracking by Detection System: Radar Blob and Camera Fusion," 2015 12th Latin American Robotics Symposium (LARS) and 2015 3rd Brazilian Symposium on Robotics (LARS-SBR), 2015, 120-125. IEEE.
4. Bi, X., Tan, B., Xu, Z. et al., "A New Method of Target Detection Based on Autonomous Radar and Camera Data Fusion," Intelligent and Connected Vehicles Symposium, 2017.
5. Liu, W., Anguelov, D., Erhan, D. et al., "SSD: Single Shot MultiBox Detector," European Conference on Computer Vision, 2016, 21-37. Springer, Cham.
6. Geiger, A., Lenz, P., Urtasun, R., "Are We Ready for Autonomous Driving? The KITTI Vision Benchmark Suite," Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on, 2012, 3354-3361. IEEE.
7. Arróspide, J., Salgado, L., and Nieto, M., "Video Analysis-Based Vehicle Detection and Tracking Using an MCMC Sampling Framework," EURASIP Journal on Advances in Signal Processing, Article ID 2012:2, Jan. 2012, doi:10.1186/1687-6180-2012-2.
8. Gonzalez, A., Fang, Z., Socarras, Y., Serrat, J., Vazquez, D., Xu, J., and Lopez, A., "Pedestrian Detection at Day/Night Time with Visible and FIR Cameras: A Comparison," Sensors, in press, 2016.
9. Kumar, M., Garg, D.P., Zachery, R., "A Generalized Approach for Inconsistency Detection in Data Fusion from Multiple Sensors," American Control Conference, 2006. IEEE.

Contact Information

Zhexiang Yu
Tongji University, No. 4800 Cao'An Road, Jiading District, Shanghai
People's Republic of China
[email protected]

Acknowledgments

Subsidized by the project of standardization and new model for intelligent manufacture: Research and Test Platform of System and Communication Standardization for Intelligent and Connected Vehicle, project ID: 2016ZXFB06002.

Definitions/Abbreviations

ADAS - Advanced Driving Assistance System
SSD - Single Shot MultiBox Detector
FOV - Field Of View
ROI - Region Of Interest
CNN - Convolutional Neural Network
YOLO - You Only Look Once
MAP - Maximum A Posteriori
RSS - Residual Sum of Squares