[Figure: the saturation and value channels pass through threshold comparators ("? 1 : 0") whose outputs are combined by an AND gate to produce the binary pixel.]
A. Pre-filtering
The input to our system is in RGB format. Each pixel is represented by 3 bytes, with each byte representing one color channel. Due to changing luminance and varied weather conditions on the road, a pixel that appears green to human eyes does not necessarily have a large absolute value in the green channel, because its appearance also depends on the values of the other two channels. The disadvantage of the RGB color space is that it cannot reflect the relations among red, green, and blue. As the first step in pre-filtering, we therefore convert RGB to the HSV color space. HSV is a cylindrical-coordinate representation of color pixels that captures the relationships between the color channels [9]. HSV stands for hue, saturation, and value. In the HSV domain, green and red colors can be easily picked out by setting proper thresholds.

The equations below show the pixel format conversion from RGB to HSV [10]:

    H = 60 × (G − B)/(MAX − MIN) + 0      if MAX = R
    H = 60 × (B − R)/(MAX − MIN) + 120    if MAX = G      (1)
    H = 60 × (R − G)/(MAX − MIN) + 240    if MAX = B

    S = MAX − MIN      (2)

    V = MAX      (3)

As the last step in pre-filtering, each pixel is binarized: the output value is 1 if the pixel is considered green, otherwise 0, as indicated in Fig. 2. The same process is repeated for red pixels in parallel.
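Purely as an illustration, a minimal C++ sketch of the conversion in (1)-(3) and of the binarization step follows (the hardware itself uses comparators, as in Fig. 2). The green thresholds in isGreenPixel are hypothetical placeholder values, not the trained thresholds of the paper.

```cpp
#include <algorithm>
#include <cstdint>

struct HSV { float h, s, v; };  // h in degrees [0, 360); s, v in [0, 255]

// RGB to HSV following equations (1)-(3): S = MAX - MIN, V = MAX.
HSV rgbToHsv(uint8_t r, uint8_t g, uint8_t b) {
    float mx = std::max({r, g, b});
    float mn = std::min({r, g, b});
    float d  = mx - mn;
    float h  = 0.0f;
    if (d > 0.0f) {                              // hue undefined when MAX == MIN
        if (r >= g && r >= b)  h = 60.0f * (g - b) / d;            // MAX = R
        else if (g >= b)       h = 60.0f * (b - r) / d + 120.0f;   // MAX = G
        else                   h = 60.0f * (r - g) / d + 240.0f;   // MAX = B
        if (h < 0.0f) h += 360.0f;               // wrap negative hues into [0, 360)
    }
    return {h, d, mx};                           // eq. (2): S = d; eq. (3): V = mx
}

// Pre-filtering binarization: 1 when the pixel lies inside the green
// thresholds, else 0. These threshold values are illustrative placeholders.
bool isGreenPixel(const HSV& p) {
    return p.h > 90.0f && p.h < 150.0f && p.s > 80.0f && p.v > 80.0f;
}
```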
B. One-Pass Blob Detection

Blob detection collects connected pixels from the pre-filtering step. The principal idea is to assign different labels to the different clusters of pixels in the image. We use 4-connectivity to determine whether pixels are connected: for a center pixel, only the 4 pixels to its north, east, west, and south (N, E, W, S) are considered neighbors. For high efficiency, one-pass labeling is utilized, so all the potential blobs are output after scanning through the entire image only once, as sketched below. More details on the one-pass implementation are explained in Section 3.2.
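A software analogue of one-pass 4-connectivity labeling, assuming a binary input image stored row-major. The parent array stands in for the hardware connection table; a production version would also fold together the bounding boxes of merged labels, which is only noted in a comment here. This is an illustrative sketch, not the paper's RTL.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

struct Blob { int minX, minY, maxX, maxY; };

// One-pass labeling: for each foreground pixel, inspect only the already
// visited W and N neighbors (4-connectivity), extend the blob's bounding
// box, and record a label equivalence whenever N and W disagree.
std::vector<Blob> labelBlobs(const std::vector<uint8_t>& bin, int w, int h) {
    std::vector<int> label(w * h, 0);
    std::vector<int> parent{0};                  // label equivalence table
    std::vector<Blob> blobs{{0, 0, 0, 0}};       // slot 0 unused
    auto rootOf = [&](int l) { while (parent[l] != l) l = parent[l]; return l; };

    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            if (!bin[y * w + x]) continue;
            int west  = x > 0 ? label[y * w + x - 1] : 0;
            int north = y > 0 ? label[(y - 1) * w + x] : 0;
            int l;
            if (!west && !north) {               // new blob: allocate next label
                l = static_cast<int>(parent.size());
                parent.push_back(l);
                blobs.push_back({x, y, x, y});
            } else {
                l = west ? rootOf(west) : rootOf(north);
                if (west && north)               // N and W touch: merge labels
                    parent[rootOf(north)] = l;
            }
            label[y * w + x] = l;
            blobs[l].minX = std::min(blobs[l].minX, x);
            blobs[l].maxX = std::max(blobs[l].maxX, x);
            blobs[l].minY = std::min(blobs[l].minY, y);
            blobs[l].maxY = std::max(blobs[l].maxY, y);
        }
    // A complete implementation would now union the boxes of merged labels.
    return blobs;
}
```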
C. HOG Algorithm

Standard traffic lights have a fixed length-width ratio between 1/4 and 4, as indicated in Fig. 3, so blobs with too large or too small a length-width ratio can be eliminated. Prior to computing HOG, each remaining blob is resized to 32-by-32 pixels. Each input image is then divided into blocks: a block is 16 × 16 pixels and contains 4 cells of 8 × 8 pixels each. The block slides horizontally and then vertically with a step size of 8 pixels, which results in a total of 9 blocks on a 32-by-32 image.

The HOG computation typically consists of three steps: weighted magnitude and bin class calculation, block histogram generation, and normalization, as illustrated by Fig. 4.

Figure 4. Diagram of HOG computation procedure

As the first step, the gradients of each pixel in the x and y directions are computed, where M(x, y) denotes the pixel value at (x, y):

    Gx(x, y) = |M(x + 1, y) − M(x − 1, y)|      (4)

    Gy(x, y) = |M(x, y + 1) − M(x, y − 1)|      (5)

Then, the gradient magnitude and the gradient angle are calculated:

    G(x, y) = √(Gx(x, y)² + Gy(x, y)²)      (6)

    θ = arctan(Gy(x, y) / Gx(x, y))      (7)

The gradient magnitude is then assigned to one of 9 bin classes: with angle values ranging over 0-180 degrees, every 20 degrees represents one bin class. For each cell, a histogram is generated by summing the weighted magnitudes into the corresponding bin classes, yielding 9 feature descriptors per cell; for the whole image, there are 324 feature descriptors in total.

The last step, normalization, makes the algorithm more robust to varied illumination:

    b_norm = √(b / sum(b))      (8)
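A compact sketch of equations (4)-(8) on one 32-by-32 blob. One interpretive assumption: (4) and (5) take absolute values, which would confine θ to 0-90 degrees, so this sketch uses signed differences to cover the stated 0-180 degree range; border pixels are skipped for brevity.

```cpp
#include <cmath>
#include <cstdint>
#include <vector>

// HOG for one 32x32 blob: per-pixel gradients, 9 bin classes over 0-180
// degrees, 8x8-cell histograms, and per-block normalization per eq. (8).
std::vector<float> hog32x32(const uint8_t img[32][32]) {
    float cellHist[4][4][9] = {};                     // 4x4 cells of 8x8 pixels
    for (int y = 1; y < 31; ++y)
        for (int x = 1; x < 31; ++x) {
            float gx  = float(img[y][x + 1]) - float(img[y][x - 1]);    // eq. (4)
            float gy  = float(img[y + 1][x]) - float(img[y - 1][x]);    // eq. (5)
            float mag = std::sqrt(gx * gx + gy * gy);                   // eq. (6)
            float ang = std::atan2(gy, gx) * 57.29578f;                 // eq. (7), degrees
            if (ang < 0.0f) ang += 180.0f;            // fold into [0, 180)
            int bin = std::min(int(ang / 20.0f), 8);  // 9 bin classes
            cellHist[y / 8][x / 8][bin] += mag;       // weighted magnitude sum
        }
    // 16x16 blocks (2x2 cells) sliding by 8 pixels: 3x3 = 9 blocks,
    // 9 blocks x 4 cells x 9 bins = 324 descriptors in total.
    std::vector<float> desc;
    for (int by = 0; by < 3; ++by)
        for (int bx = 0; bx < 3; ++bx) {
            float block[36];
            float sum = 1e-6f;                        // guard against divide-by-zero
            int k = 0;
            for (int cy = 0; cy < 2; ++cy)
                for (int cx = 0; cx < 2; ++cx)
                    for (int b = 0; b < 9; ++b) {
                        block[k] = cellHist[by + cy][bx + cx][b];
                        sum += block[k++];
                    }
            for (float v : block)
                desc.push_back(std::sqrt(v / sum));   // eq. (8)
        }
    return desc;
}
```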
D. Linear SVM

A linear SVM maps the input non-linear descriptors to a higher-dimensional feature space in which a linear decision surface can be constructed [11]. The linear SVM is expressed in (9):

    Y = α yᵀ + γ      (9)

where α is the support vector, y is the HOG feature descriptor vector, and γ is the SVM offset. In our work, the support vector α and the offset γ are pre-trained using labeled traffic light samples. The result of (9) indicates whether a target blob contains a traffic light or not.
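At run time, equation (9) reduces to a dot product plus an offset; a minimal sketch follows. The acceptance rule Y > 0 is an assumed sign convention, since the paper only states that Y indicates the decision, and the trained values of α and γ are not given here.

```cpp
#include <vector>

// Evaluate eq. (9): Y = alpha . y + gamma, with alpha and gamma obtained
// from offline training on labeled traffic light samples.
float svmScore(const std::vector<float>& alpha,  // trained weights (324 values)
               const std::vector<float>& y,      // HOG descriptor vector
               float gamma) {                    // trained SVM offset
    float score = gamma;
    for (size_t i = 0; i < y.size(); ++i)
        score += alpha[i] * y[i];                // dot product alpha . y
    return score;                                // assumed: > 0 means traffic light
}
```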
B. Pipeline Structure for Detection

As described in Section 2.1, the detection part consists of color conversion and blob detection. Fig. 6 shows the hardware architecture of the detection part on the FPGA fabric. Since two types of traffic lights are to be recognized simultaneously, two blob detection blocks are used. We also implement one-pass blob detection in order to achieve a frame rate of 60 fps.

For the implementation of the blob detection algorithm on the FPGA, a blob position table is required, which records the position of each detected blob. As shown in Fig. 7, a label counter keeps track of the current label number: each time a new blob is detected, the label counter increments its value by 1. The blob position table is made up of 4 memory blocks, recording the 4 vertices of every blob. For a specific blob with label number n, its position information is stored in the nth slot of each of these 4 memory blocks. A software model of this table is sketched below.

[Figure 7: connection table and label logic addressing the memory slots of the blob position table.]
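A software model of the blob position table, assuming the four recorded vertices are the corners of the blob's bounding box; the table depth MAX_BLOBS is an illustrative choice, not a value from the paper.

```cpp
#include <algorithm>
#include <cstdint>

constexpr int MAX_BLOBS = 256;        // illustrative table depth

struct Vertex { uint16_t x, y; };

struct BlobPositionTable {
    int labelCounter = 0;             // current label number
    Vertex mem[4][MAX_BLOBS];         // 4 memory blocks; slot n belongs to blob n

    // A new blob is detected: increment the counter and seed all 4 vertices.
    int newBlob(uint16_t x, uint16_t y) {
        int n = ++labelCounter;       // assumes n stays below MAX_BLOBS
        for (int v = 0; v < 4; ++v) mem[v][n] = {x, y};
        return n;
    }

    // Grow blob n's vertices so its bounding box covers pixel (x, y).
    void extend(int n, uint16_t x, uint16_t y) {
        mem[0][n] = {std::min(mem[0][n].x, x), std::min(mem[0][n].y, y)};  // top-left
        mem[1][n] = {std::max(mem[1][n].x, x), std::min(mem[1][n].y, y)};  // top-right
        mem[2][n] = {std::min(mem[2][n].x, x), std::max(mem[2][n].y, y)};  // bottom-left
        mem[3][n] = {std::max(mem[3][n].x, x), std::max(mem[3][n].y, y)};  // bottom-right
    }
};
```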
Figure 9. Merging information in the blob position table

Figure 10. A green traffic light detected by the proposed system in real time
C. Blob to AXI4-Stream Interface

This interface is employed to transfer the blob position information onto the AXI4-Stream bus and, together with a video DMA, it realizes high-speed transfer from the FPGA to frame buffers in DDR memory. Subsequently, the on-chip embedded ARM processor can access image frames from DDR at a very high pixel rate.
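The paper gives no processor-side code; the following is only a generic illustration of how a Linux application on the ARM core might map a DMA-written frame buffer from physical memory. FRAME_BASE and the 32-bit pixel format are hypothetical placeholders; in the real system the buffer address comes from the video DMA configuration.

```cpp
#include <cstdint>
#include <cstdio>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

constexpr off_t  FRAME_BASE  = 0x10000000;       // hypothetical DDR address
constexpr size_t FRAME_BYTES = 1024 * 768 * 4;   // one XGA frame, 32-bit pixels

int main() {
    int fd = open("/dev/mem", O_RDONLY | O_SYNC);
    if (fd < 0) { perror("open /dev/mem"); return 1; }
    auto* frame = static_cast<const uint32_t*>(
        mmap(nullptr, FRAME_BYTES, PROT_READ, MAP_SHARED, fd, FRAME_BASE));
    if (frame == MAP_FAILED) { perror("mmap"); close(fd); return 1; }

    printf("pixel(0,0) = 0x%08x\n", frame[0]);   // read directly from DDR

    munmap(const_cast<uint32_t*>(frame), FRAME_BYTES);
    close(fd);
    return 0;
}
```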
IV. ONBOARD IMPLEMENTATION RESULT

We implement the entire system on a Xilinx Zynq ZC-702 board. The input video resolution is 1024×768 (XGA). The highest frequency reached by the FPGA implementation is 147.34 MHz, so higher resolutions can also be supported. Overall FPGA utilization is shown in Table I. Since the input data streaming rate is 60 fps, our one-pass blob detection is fast enough to sustain that rate for object detection. We also measure the time taken by the processor to classify all the traffic light candidates in a single frame: it varies from 1.96 ms to 9.66 ms, i.e., it stays below 10 ms at all times, so the embedded ARM processor achieves over 100 fps.

Table I
FPGA RESOURCE UTILIZATION

                   Used    Available   Utilization
  Slice Registers  47656   106400      44.78%
  Slice LUTs       49183   53200       92.44%
  DSP48E1s         4       220         1.81%
  Block RAM        25      140         17.85%
We also test our system on 10 video clips recorded under different road and weather conditions. A sample image is shown in Fig. 10. Table II shows that we achieve high recall and precision rates, computed as:

    recall = true positives / (true positives + false negatives)      (10)

    precision = true positives / (true positives + false positives)      (11)

Table II
DETECTION ACCURACY

                        Recall    Precision
  Red traffic lights    92.11%    99.29%
  Green traffic lights  94.44%    98.27%
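Equations (10) and (11) transcribe directly to code; a trivial sketch from raw detection counts:

```cpp
// Recall and precision per equations (10) and (11).
struct Metrics { double recall, precision; };

Metrics score(int truePositives, int falseNegatives, int falsePositives) {
    return { double(truePositives) / (truePositives + falseNegatives),   // eq. (10)
             double(truePositives) / (truePositives + falsePositives) }; // eq. (11)
}
```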
V. CONCLUSION

In this paper, we present an FPGA-based SoC design for real-time traffic light recognition. We successfully implement the entire system on a Xilinx Zynq board, achieving a real-time processing rate of 60 fps and beyond. With the advent of deep learning networks, it is likely that a higher detection rate can be obtained by replacing the HOG algorithm with stronger feature extractors.

REFERENCES

[1] R. Okuda, Y. Kajiwara, and K. Terashima, "A survey of technical trend of ADAS and autonomous driving," in VLSI Technology, Systems and Application (VLSI-TSA), Proceedings of Technical Program - 2014 International Symposium on. IEEE, 2014, pp. 1-4.
[2] A. Møgelmose, M. M. Trivedi, and T. B. Moeslund, "Vision-based traffic sign detection and analysis for intelligent driver assistance systems: Perspectives and survey," IEEE Transactions on Intelligent Transportation Systems, vol. 13, no. 4, pp. 1484-1497, 2012.
[3] Y.-C. Chung, J.-M. Wang, and S.-W. Chen, "A vision-based traffic light detection system at intersections," Journal of Taiwan Normal University: Mathematics, Science and Technology, vol. 47, no. 1, pp. 67-86, 2002.
[4] M. Omachi and S. Omachi, "Traffic light detection with color and edge information," in Computer Science and Information Technology, 2009. ICCSIT 2009. 2nd IEEE International Conference on. IEEE, 2009, pp. 284-287.
[5] Y. Shen, U. Ozguner, K. Redmill, and J. Liu, "A robust video based traffic light detection algorithm for intelligent vehicles," in Intelligent Vehicles Symposium, 2009 IEEE. IEEE, 2009, pp. 521-526.
[6] Y. Zhou, Z. Chen, and X. Huang, "A pipeline architecture for traffic sign classification on an FPGA," in Circuits and Systems (ISCAS), 2015 IEEE International Symposium on, May 2015, pp. 950-953.
[7] J. V. Gomes, P. R. Inácio, M. Pereira, M. M. Freire, and P. P. Monteiro, "Detection and classification of peer-to-peer traffic: A survey," ACM Computing Surveys (CSUR), vol. 45, no. 3, p. 30, 2013.
[8] "Reaction time statistics." [Online]. Available: http://www.humanbenchmark.com/tests/reactiontime/statistics
[9] "HSL and HSV." [Online]. Available: https://en.wikipedia.org/wiki/HSL_and_HSV
[10] T. Hamachi, H. Tanabe, and A. Yamawaki, "Development of a generic RGB to HSV hardware," in The 1st International Conference on Industrial Application Engineering 2013 (ICIAE2013), 2013.
[11] C. Cortes and V. Vapnik, "Support-vector networks," Machine Learning, vol. 20, no. 3, pp. 273-297, 1995.