Design and Implementation of a Driverless Perception System Based on CPU+FPGA

Abstract—Nowadays, autonomous driving is one of the most popular technologies and is in a stage of rapid development. While operating, autonomous vehicles need to localize, perceive, predict and plan. The enormous amount of computation results in high energy consumption. In response to the characteristics of unmanned driving, a high-performance, low-power and highly flexible unmanned visual perception system was developed on a heterogeneous CPU+FPGA platform, which offers the advantages of parallel processing and field programmability. The system improves the YOLOv3 (You Only Look Once) deep neural network for target detection: the Soft-NMS (non-maximum suppression) algorithm is added to the original YOLOv3 framework, the false detection rate and missed detection rate are then minimized through a bounding-box regression operation, and finally the network is trained and tested. The experimental results show that the system can quickly and accurately identify targets in different complex scenes with low power consumption.

Keywords-autonomous driving; FPGA; CPU; YOLOv3

I. INTRODUCTION

The unmanned driving system mainly consists of four parts: perception, self-localization, prediction and decision-making. Among these, perception is the foundation of unmanned driving technology, because the subsequent predictions and decisions all depend on it. Perception begins with acquiring relevant information outside the vehicle. In this paper, the driverless perception system mainly detects targets outside the vehicle, such as pedestrians and other vehicles, and also detects and tracks lane lines in real time using the Hough transform together with a Kalman filter.

Current target detection algorithms in AI fall into two main categories: (1) the RCNN [1]-[3] series of algorithms (RCNN, Fast RCNN, Faster RCNN) based on region proposals, which use heuristics (selective search) or a CNN [4] network (RPN) to generate region proposals and then perform classification and regression on each proposal; (2) one-stage algorithms such as YOLO and SSD, which use a single CNN network to predict the different target categories directly. The one-stage algorithms are faster, but their detection accuracy is poorer.

Due to the complexity of actual road conditions, lane line detection is easily affected by lighting, pedestrian occlusion and other factors. Traditional lane line detection methods are based on features or models. For example, Borkar et al. [5] applied an inverse perspective transform to the original road image, then used the Hough transform to detect the lane lines and extracted features in sections. Liu Bin et al. [6] performed pruning and convolution optimization on the ENet network and used the improved ENet to perform pixel-level semantic segmentation of lane lines. However, although current lane detection algorithms can basically meet real-time requirements, their robustness is poor in complex lane scenes, and their tracking of lane lines is also weak.

For image data, most deployed automatic-driving perception systems still run on a CPU, whether the algorithm is two-stage or one-stage; however, CPU deployment cannot meet the real-time detection requirements of unmanned driving. GPU deployment is used instead, but its power consumption is in turn too high for installation on automated vehicles. To address the demanding requirements of unmanned visual perception, this paper designs an unmanned visual perception system based on the YOLOv3 [7] algorithm using a CPU+FPGA architecture, trained and evaluated on the BDD100K dataset. The main contributions are threefold:

(1) A CPU+FPGA architecture is used to deploy the target detection algorithm, improving its real-time performance to meet the speed and low-power requirements of unmanned vehicles.

(2) Although one-stage algorithms are faster, they remain worse than two-stage algorithms in detection accuracy, and although YOLOv3 detects slightly better than other one-stage algorithms, it still cannot rule out missed detections. Therefore, the Soft-NMS algorithm is added to the original YOLOv3 framework, and a bounding-box regression operation is then applied to minimize the false detection rate and the missed detection rate.

(3) To address the shortcomings of traditional lane line detection in real-time performance and detection accuracy, an improved Hough transform algorithm is used to detect lane lines and a Kalman filter is applied to track them, improving both the accuracy and the real-time performance of lane line detection.
In the NMS algorithm, the threshold is difficult to choose precisely; when step (2) compares a low-scoring box with high IOU against a high-scoring box with low IOU, the latter is preserved, resulting in a non-optimal detection. This causes missed or false detections, so the detection box may not cover the detected target well.

Therefore, in this paper Soft-NMS is used to replace the NMS algorithm: when the calculated IOU value is greater than N_t, the box is not deleted directly, but its score is reduced according to the following formulas.

S_i = \begin{cases} S_i, & \text{iou}(M, a_i) < N_t \\ S_i (1 - \text{iou}(M, a_i)), & \text{iou}(M, a_i) \ge N_t \end{cases}   (8)

S_i = S_i e^{-\text{iou}(M, a_i)^2 / \sigma}   (9)

Soft-NMS uses linear weighting (8) or Gaussian weighting (9) to reduce the scores, lowering the probability of missed and false detections and improving the success rate of target detection.
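To make the two decay rules concrete, here is a minimal NumPy sketch of the Soft-NMS rescoring loop; the linear branch implements formula (8) and the Gaussian branch formula (9). The function names, the IoU helper and the score threshold for discarding fully decayed boxes are illustrative assumptions, not the paper's code.

```python
import numpy as np

def iou(box, boxes):
    # box: [x1, y1, x2, y2]; boxes: (N, 4). Intersection over union.
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (box[2] - box[0]) * (box[3] - box[1])
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_a + area_b - inter + 1e-9)

def soft_nms(boxes, scores, nt=0.3, sigma=0.5, method="gaussian", score_thresh=0.001):
    scores = scores.copy()
    keep = []
    idxs = np.arange(len(scores))
    while len(idxs) > 0:
        m = idxs[np.argmax(scores[idxs])]   # box M with the current highest score
        keep.append(m)
        idxs = idxs[idxs != m]
        if len(idxs) == 0:
            break
        overlaps = iou(boxes[m], boxes[idxs])
        if method == "linear":              # formula (8): decay only above Nt
            decay = np.where(overlaps >= nt, 1.0 - overlaps, 1.0)
        else:                               # formula (9): Gaussian decay
            decay = np.exp(-(overlaps ** 2) / sigma)
        scores[idxs] *= decay
        idxs = idxs[scores[idxs] > score_thresh]  # drop boxes whose score decayed away
    return keep
```

Unlike hard NMS, overlapping boxes are never deleted outright here; their scores only shrink, so a genuinely separate object that happens to overlap M can survive.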
After detection is completed, the target box contains the target, but its localization is not very accurate. In this paper, we draw on the idea of IoU-Net [13] and add a bounding-box regression operation, mainly to correct the translation and scaling of the candidate region's box and adjust its position and size so that it lies closer to the real box of the target. Here (x, y) denotes the center of a box and (w, h) its width and height; (P_x, P_y, P_w, P_h) is the measured box and (G_x, G_y, G_w, G_h) is the actual box. The process is as follows:

(1) Shift the box:

G_x = P_w d_x(P) + P_x   (10)
G_y = P_h d_y(P) + P_y   (11)

(2) Scale the box:

G_w = P_w e^{d_w(P)}   (12)
G_h = P_h e^{d_h(P)}   (13)

d_x(P), d_y(P), d_w(P) and d_h(P) are linear functions of the image features of the box P.
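Formulas (10)-(13) decode the regressed deltas into a corrected box. A minimal sketch of that decoding step, assuming center-width-height box coordinates and an illustrative function name:

```python
import numpy as np

def apply_box_deltas(p, d):
    # p: measured box (px, py, pw, ph) - center, width, height.
    # d: regressed deltas (dx, dy, dw, dh) predicted from the box features.
    px, py, pw, ph = p
    dx, dy, dw, dh = d
    gx = pw * dx + px      # formula (10): shift the center in x
    gy = ph * dy + py      # formula (11): shift the center in y
    gw = pw * np.exp(dw)   # formula (12): scale the width
    gh = ph * np.exp(dh)   # formula (13): scale the height
    return gx, gy, gw, gh
```

Exponentiating d_w and d_h keeps the predicted width and height positive regardless of the regressor's raw output.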
Figure 3. Optimization of the detection box based on IOU.

After applying the Soft-NMS and bounding-box regression improvements, it can clearly be seen from Fig.3 that the improved detection box covers the target we want to detect much better.

IV. LANE LINE DETECTION ALGORITHM AND IMPROVEMENT

For the original image, we mainly extract the color and edge features to complete lane line detection. For the edge features, noise is inevitably introduced by lighting and electronic interference during image acquisition. Therefore, the image is first processed by Gaussian smoothing filtering; the Gaussian function is:

f(x) = \frac{1}{\sqrt{2\pi}\,\delta} e^{-(x-\mu)^2 / 2\delta^2}   (14)

where \mu is the mean of x and \delta is the variance of x.

The result after Gaussian filtering also needs to be converted into a binary image, which is done by threshold binarization:

I(i, j) = \begin{cases} 1, & I(i, j) \ge t \\ 0, & I(i, j) < t \end{cases}   (15)

where i is the row index of image I, j is the column index of image I, and t is the set image threshold.

After Gaussian filtering and threshold binarization, we can extract edge features with step-like or roof-like changes in pixel gray level and eliminate irrelevant information in the image for the subsequent straight-line extraction. In this paper, the widely used Sobel operator is applied for edge detection. In formula (16), from left to right, are the vertical, horizontal, 45° and 135° Sobel operators:

\begin{bmatrix} -1 & -1 & -1 \\ 0 & 0 & 0 \\ 1 & 1 & 1 \end{bmatrix},\ \begin{bmatrix} -1 & 0 & 1 \\ -1 & 0 & 1 \\ -1 & 0 & 1 \end{bmatrix},\ \begin{bmatrix} -1 & -1 & 0 \\ -1 & 0 & 1 \\ 0 & 1 & 1 \end{bmatrix},\ \begin{bmatrix} 0 & 1 & 1 \\ -1 & 0 & 1 \\ -1 & -1 & 0 \end{bmatrix}   (16)

For the lane line edge features obtained after preprocessing, this paper uses the Hough transform to detect lane lines. The specific formula is:

\rho = x \cos\theta + y \sin\theta   (17)

where \rho is the distance from the line to the origin of the image coordinate system and \theta is the angle between the line and the x-axis. Because the standard Hough transform has poor real-time performance and is easily disturbed, this paper adopts the following improvements in lane line detection (a combined sketch follows the list):

(1) During edge extraction, all four Sobel operators are used, achieving omnidirectional Sobel edge detection for the best effect. Many experiments show that omnidirectional Sobel edge detection outperforms any single-direction Sobel operator.

(2) Because each detection in the Hough transform requires traversing \theta, which hurts real-time performance, the direction and detection angle of the lane lines are restricted: when selecting \theta, the left lane line is constrained to the angle range [-60°, -10°] and the right lane line to [10°, 60°].
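As referenced above, the following is a minimal OpenCV sketch of the whole preprocessing and detection chain: Gaussian smoothing (14), threshold binarization (15), omnidirectional Sobel filtering (16) and the Hough transform (17) with improvement (2)'s angle windows. Kernel sizes, thresholds and Hough parameters are illustrative guesses, not the paper's tuned values.

```python
import cv2
import numpy as np

def detect_lane_segments(gray, t=120):
    # Formula (14): Gaussian smoothing to suppress acquisition noise.
    blurred = cv2.GaussianBlur(gray, (5, 5), sigmaX=1.0)
    # Formula (15): threshold binarization with threshold t.
    _, binary = cv2.threshold(blurred, t, 255, cv2.THRESH_BINARY)
    # Formula (16): omnidirectional Sobel - combine the four directional kernels.
    kernels = [
        np.array([[-1, -1, -1], [0, 0, 0], [1, 1, 1]], np.float32),   # vertical
        np.array([[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]], np.float32),   # horizontal
        np.array([[-1, -1, 0], [-1, 0, 1], [0, 1, 1]], np.float32),   # 45 degrees
        np.array([[0, 1, 1], [-1, 0, 1], [-1, -1, 0]], np.float32),   # 135 degrees
    ]
    edges = np.zeros_like(binary, np.float32)
    for k in kernels:
        edges = np.maximum(edges, np.abs(cv2.filter2D(binary.astype(np.float32), -1, k)))
    edges = np.uint8(np.clip(edges, 0, 255))
    # Formula (17): probabilistic Hough transform, then keep only segments whose
    # angle falls in the left [-60, -10] or right [10, 60] degree windows.
    segments = cv2.HoughLinesP(edges, 1, np.pi / 180, threshold=40,
                               minLineLength=30, maxLineGap=20)
    left, right = [], []
    for x1, y1, x2, y2 in (segments.reshape(-1, 4) if segments is not None else []):
        angle = np.degrees(np.arctan2(y2 - y1, x2 - x1))
        if -60 <= angle <= -10:
            left.append((x1, y1, x2, y2))
        elif 10 <= angle <= 60:
            right.append((x1, y1, x2, y2))
    return left, right
```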
Box sliding filtering is then carried out on the lane line, and the collected coordinates are fitted with two quadratic functions. The fitted quadratic functions are the mathematical expressions of the left and right lane lines of the current lane. Finally, the pixels between the two lane line expressions are filled and transformed back to the original perspective to obtain the vehicle's current lane of travel. After lane line detection is completed, lane lines may still be missed due to pedestrians or other factors; at this point, the Kalman filter can be used to track the lane lines.
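A minimal sketch of the quadratic fitting step, assuming the sliding boxes have already collected the pixel coordinates of one lane line; np.polyfit stands in for whatever fitting routine the authors actually used.

```python
import numpy as np

def fit_lane(ys, xs):
    # Fit x = a*y^2 + b*y + c through the lane pixels gathered by the
    # sliding boxes; fitting x as a function of y handles near-vertical lanes.
    a, b, c = np.polyfit(ys, xs, deg=2)
    return a, b, c

def lane_x(coeffs, y):
    a, b, c = coeffs
    return a * y ** 2 + b * y + c  # evaluate the fitted lane line at row y
```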
The Kalman filter computes the estimate of the current state from the state estimate at the previous moment and the observation of the current state. The specific formulas are as follows.

Equation of state:

x_k = A_k x_{k-1} + B_k \mu_k + w_k   (18)

Measurement equation:

Z_k = H_k x_k + v_k   (19)

where x_k is the state vector, Z_k the measurement vector, A_k the state transition matrix, \mu_k the control vector, B_k the control matrix, w_k the system noise, H_k the measurement matrix and v_k the measurement noise; w_k and v_k are both Gaussian noise.

The Kalman filter is used to track the lane lines obtained from the Hough transform. Substituting \rho and \theta, the corresponding Z_k and x_k are:

Z_k = \begin{bmatrix} \rho \\ \theta \end{bmatrix}_k   (20)

x_k = \begin{bmatrix} \rho \\ \theta \\ v_\rho \\ v_\theta \end{bmatrix}_k   (21)

where v_\rho and v_\theta represent the rates of change of \rho and \theta of the target image during driving.

A_k = \begin{bmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}   (22)

H_k = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix}   (23)
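With A_k and H_k fixed as in (22) and (23) and no control input (B_k \mu_k = 0), one predict/update cycle of the (\rho, \theta, v_\rho, v_\theta) tracker can be sketched as follows; the noise covariances Q and R are illustrative placeholders, since the paper does not state them.

```python
import numpy as np

A = np.array([[1, 0, 1, 0],      # formula (22): constant-velocity model
              [0, 1, 0, 1],      # for rho and theta
              [0, 0, 1, 0],
              [0, 0, 0, 1]], float)
H = np.array([[1, 0, 0, 0],      # formula (23): only rho and theta are measured
              [0, 1, 0, 0]], float)
Q = np.eye(4) * 1e-4             # assumed system noise covariance (w_k)
R = np.eye(2) * 1e-2             # assumed measurement noise covariance (v_k)

def kalman_step(x, P, z):
    # Predict: formula (18) with B_k * mu_k = 0 (no control input).
    x_pred = A @ x
    P_pred = A @ P @ A.T + Q
    # Update against the Hough measurement z = [rho, theta], formula (19).
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)          # Kalman gain
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = (np.eye(4) - K @ H) @ P_pred
    return x_new, P_new
```

When a frame yields no usable Hough measurement, the prediction x_pred alone can carry the lane line forward, which is what makes the tracker robust to brief occlusions.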
Figure 4. Result of lane detection.

By improving the Hough transform algorithm and combining it with the Kalman filter, lane lines can be detected accurately in real time, and the Kalman filter's prediction and tracking improve the robustness of lane line detection. Fig.4 shows the lane line detection results.

V. ALGORITHM DEPLOYMENT

In this paper, TensorFlow, with good speed and flexibility, is selected as the model-building framework. The experimental hardware platform is the heterogeneous, scalable, open HERO platform, which mainly includes an Arria 10 FPGA and an Intel Core i5 CPU. The software stack is based on the OpenVINO toolkit: the trained model is optimized by the OpenVINO Model Optimizer, which saves the network topology in a .xml file and the weights in a .bin file.
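A minimal sketch of loading the optimizer's .xml/.bin pair and running heterogeneous inference, assuming the 2020-era OpenVINO Python API (IECore) and the Intel FPGA plugin; the file names and the HETERO device string follow OpenVINO's documented conventions and are assumptions, not the paper's code.

```python
from openvino.inference_engine import IECore
import numpy as np

ie = IECore()
# Load the network topology (.xml) and weights (.bin) produced by the
# OpenVINO Model Optimizer from the trained TensorFlow model.
net = ie.read_network(model="yolov3.xml", weights="yolov3.bin")
# HETERO runs supported layers on the FPGA and falls back to the CPU.
exec_net = ie.load_network(network=net, device_name="HETERO:FPGA,CPU")

input_name = next(iter(net.input_info))
# Dummy frame shaped to the network input (NCHW); a real pipeline would
# resize each 720P camera frame to this shape before inference.
n, c, h, w = net.input_info[input_name].input_data.shape
frame = np.zeros((n, c, h, w), dtype=np.float32)
result = exec_net.infer({input_name: frame})
```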
VI. EXPERIMENTAL RESULTS AND ANALYSIS

During training of the entire model, the loss function curve is shown in Fig.5. The figure shows that late in training the loss value has essentially converged and the model has essentially finished training, yielding a good target detection effect.

Figure 5. Trend of loss function.

Since the model and optimization algorithm used are the same, the detection accuracy on different devices is comparable; the differences lie in inference speed and device power consumption. At an input stream resolution of 720P (1280×720), this paper compares power consumption and inference speed across different devices; the results are shown in Table I and Table II.

TABLE I. DETECTION ACCURACY OF DIFFERENT DEVICES

Target name   CPU1 accuracy   CPU2+GPU accuracy   CPU1+FPGA accuracy
Bus           0.876           0.885               0.877
Light         0.734           0.774               0.745
Sign          0.615           0.628               0.626
Person        0.950           0.953               0.951
Car           0.934           0.949               0.946
Rider         0.745           0.756               0.748
mAP           0.766           0.781               0.772
TABLE II. ENERGY EFFICIENCY RATIO OF DIFFERENT DEVICES

Device name                      Power consumption (W)   Reasoning speed (fps)   Energy efficiency ratio (fps/W)
CPU1: i5 7260U                   48                      18                      0.375
CPU2+GPU: i5 7500 + 1080 Ti      287                     96                      0.334
CPU1+FPGA: i5 7260U + Arria 10   67                      75                      1.119
The experimental data in Table I show that the CPU+FPGA detection accuracy is slightly higher than the CPU-only accuracy and lower than that of CPU+GPU; in mAP, CPU+FPGA differs from the other two by only 0.006-0.009, because the same model and optimization algorithm run on all devices. The differences appear in inference speed and power consumption, shown in Table II for an input stream resolution of 720P (1280×720). It can be seen that the FPGA is 50-60 fps faster than the CPU in CNN inference acceleration, while its power consumption increase is only 10-20 W. Fig.6 shows the detection results using the CPU alone, and Fig.7 shows the detection results using CPU+FPGA.
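As a quick consistency check on Table II, the tabulated energy efficiency ratio is inference speed divided by power consumption, i.e. frames per second per watt: 18/48 ≈ 0.375, 96/287 ≈ 0.334 and 75/67 ≈ 1.119 fps/W. This is why the CPU+FPGA configuration scores highest despite the GPU's higher raw frame rate.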
Figure 6. Test results using CPU.

Figure 7. Test results using CPU+FPGA.

VII. CONCLUSION AND FUTURE WORK

The visual perception technology of unmanned driving is an important area for bringing artificial intelligence technology to the ground; it must address the low real-time performance and high power consumption of visual perception inference in current automated driving deployments. This paper designs and implements an unmanned driving detection system based on CPU+FPGA, deployed on the FPGA after dataset processing, model training and optimization. The experimental results show that the optimized, FPGA-accelerated model loses little accuracy compared with the original model, while the inference speed of the whole model is greatly improved over the original CPU deployment, which largely solves the energy efficiency problem of unmanned visual perception.

However, because distance information is lost when a monocular camera projects the three-dimensional scene onto two dimensions, the algorithm in this work cannot be applied directly to real scenes, and the deployment is based on YOLOv3. Future work can therefore proceed along two lines:

(1) The YOLOv4 algorithm or other better algorithms can be used for deployment to improve detection accuracy.

(2) A binocular or RGB-D camera can be used for image acquisition and distance calculation to obtain image data with distance information.

REFERENCES

[1] R. Girshick, J. Donahue, T. Darrell, and J. Malik, "Rich feature hierarchies for accurate object detection and semantic segmentation," in Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2014, pp. 580–587.
[2] R. Girshick, "Fast R-CNN," in Proc. IEEE Int. Conf. Computer Vision, 2015, pp. 1440–1448.
[3] S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: Towards real-time object detection with region proposal networks," in Advances in Neural Information Processing Systems, 2015, pp. 91–99.
[4] Q. Dou, C. Ouyang, C. Chen, et al., "Unsupervised cross-modality domain adaptation of ConvNets for biomedical image segmentations with adversarial loss," 2018.
[5] A. Borkar, M. Hayes, and M. T. Smith, "A novel lane detection system with efficient ground truth generation," IEEE Transactions on Intelligent Transportation Systems, vol. 13, no. 1, pp. 365–374, 2012.
[6] B. Liu and H. Liu, "Lane line detection algorithm based on improved ENet network," Computer Science, vol. 47, no. 4, pp. 142–149, 2020.
[7] J. Redmon and A. Farhadi, "YOLOv3: An incremental improvement," [Online]. Available: https://arxiv.org/abs/1804.02767, December 25, 2019.
[8] A. Geiger, P. Lenz, and R. Urtasun, "Are we ready for autonomous driving? The KITTI vision benchmark suite," in Proc. IEEE CVPR, 2012.
[9] M. Cordts, M. Omran, S. Ramos, T. Rehfeld, M. Enzweiler, R. Benenson, U. Franke, S. Roth, and B. Schiele, "The Cityscapes dataset for semantic urban scene understanding," in Proc. IEEE CVPR, 2016, pp. 3213–3223.
[10] X. Huang, X. Cheng, Q. Geng, B. Cao, D. Zhou, P. Wang, Y. Lin, and R. Yang, "The ApolloScape dataset for autonomous driving," in Proc. IEEE CVPR, 2018.
[11] G. Neuhold, T. Ollmann, S. Rota Bulò, and P. Kontschieder, "The Mapillary Vistas dataset for semantic understanding of street scenes," in Proc. IEEE ICCV, 2017.
[12] R. Katz, J. Nieto, E. Nebot, et al., "Track-based self-supervised classification of dynamic obstacles," Autonomous Robots, vol. 29, no. 2, pp. 219–233, 2010.
[13] B. Jiang, R. Luo, J. Mao, et al., "Acquisition of localization confidence for accurate object detection," in Proc. European Conf. Computer Vision, 2018.