A Method for Traffic Sign Recognition with CNN using GPU

Alexander Shustanov and Pavel Yakimov


Samara National Research University, Moskovskoye shosse 34, Samara, Russia

Keywords: TensorFlow, Convolutional Neural Networks, Traffic Sign Recognition, Image Processing, Computer Vision,
Mobile GPU.

Abstract: In recent years, deep learning methods for solving classification problems have become extremely popular. Owing to their high recognition rate and fast execution, convolutional neural networks have improved most computer vision tasks, both existing and new ones. In this article, we propose an implementation of a traffic sign recognition algorithm using a convolutional neural network. Training of the neural network is implemented using the TensorFlow library and CUDA, a massively parallel architecture for multithreaded programming. The entire procedure for traffic sign detection and recognition is executed in real time on a mobile GPU. The experimental results confirmed the high efficiency of the developed computer vision system.

Shustanov, A. and Yakimov, P.
A Method for Traffic Sign Recognition with CNN using GPU.
DOI: 10.5220/0006436100420047
In Proceedings of the 14th International Joint Conference on e-Business and Telecommunications (ICETE 2017) - Volume 5: SIGMAP, pages 42-47
SIGMAP 2017 - 14th International Conference on Signal Processing and Multimedia Applications
ISBN: 978-989-758-260-8
Copyright © 2017 by SCITEPRESS – Science and Technology Publications, Lda. All rights reserved

1 INTRODUCTION

Advances in modern mobile processors have enabled many vehicle manufacturers to install computer vision systems in consumer cars. These systems significantly improve safety and are an important step on the way to autonomous driving. Among the tasks solved with computer vision, traffic sign recognition (TSR) is one of the best known and most widely discussed. However, the main problems of such systems are low detection accuracy and a high demand for hardware computational performance, as well as the inability of some systems to classify traffic signs from different countries.

Recognition of traffic signs is usually solved in two steps: localization and subsequent classification. There are many different localization methods (Nikonorov et al., 2013), (Ruta et al., 2009), (Belaroussi et al., 2010). In papers (Fursov et al., 2013) and (Yakimov, 2015), the authors proposed effective implementations of the image preprocessing and traffic sign localization algorithms, which run in real time. Using a modified Generalized Hough Transform (GHT) algorithm, the solution determines the exact coordinates of a traffic sign in the acquired image. Because the localization was this precise, a simple template matching algorithm was used in the classification stage. Combined with the precise localization stage, this algorithm achieved a final traffic sign recognition accuracy of 97.3%. The GTSDB dataset (Houben et al., 2013) was used for training and testing the developed algorithms. Figure 1 shows the images for training the traffic sign recognition algorithm and testing the localization algorithm.

Figure 1: Images from GTSDB (Houben et al., 2013).

When the developed technology for detecting and classifying traffic signs was tested in real conditions, i.e. on videos from cameras installed on a windshield, the end-to-end technology showed a significant decrease in efficiency. Studies have shown that this decrease arose from strong variations in the illumination, contrast, and angle of rotation in images of localized traffic signs. Thus, a simple classification algorithm like template matching was not able to achieve high-quality recognition because of its limited set of predefined templates. To improve the system performance, the localization algorithm that has shown good results can be combined with recognition using convolutional neural networks, which have been so widely applied in recent years (Zhu et al., 2016), (LeCun and Sermanet, 2011).

In this paper, we describe a revised end-to-end technology for detecting and recognizing traffic signs in real time. The developed system uses the speed received from the vehicle. This makes it possible to predict not only the presence of the object, but also its scale and exact coordinates in the neighboring frame. Thus, the accuracy of detection increases, while the computational complexity remains the same. The classification of localized objects is implemented using convolutional neural networks (CNNs). The use of the GPU allows real-time processing of the frames in the video sequence.

2 TRAFFIC SIGN LOCALIZATION AND TRACKING

The developed technology for traffic sign recognition consists of three steps: image preprocessing, localization and classification.

During image preprocessing, the HSV color space is used to extract red and blue pixels from an image. Due to errors in image acquisition and the presence of small colored objects, some point-like noise remains in the images after applying a threshold filter. To address this point-like noise we apply the algorithm described in (Fursov et al., 2013). Paper (Yakimov, 2013) shows an effective implementation of the noise removal algorithm using CUDA. With GPUs, the speed-up reaches 60-80 times compared with conventional execution on a CPU. The frame size is 1920x1080 pixels. Using the CUDA-enabled mobile GPU NVIDIA Jetson TK1 makes it possible to preprocess one video frame within 7-10 ms, which satisfies the requirements of real-time video processing.

Paper (Yakimov, 2015) addresses the algorithms for detecting and tracking traffic signs. The localization method, which is a modification of the generalized Hough transform, has been developed considering the constraints on the time for processing a single frame. The algorithm shows effective results and functions well with the preprocessed images. Tracking using the current vehicle speed has improved the performance of the system, as the search area in the adjacent frames can be significantly reduced. In addition, the presence of a sign in the sequence of adjacent frames in predicted areas significantly increases the confidence of correct recognition. Classification, which is the final step, ensures that the entire procedure has been executed successfully.

3 TRAFFIC SIGN CLASSIFICATION

3.1 Convolutional Neural Networks

Classification with artificial neural networks is a very popular approach to solving pattern recognition problems. A neural network is a mathematical model based on interconnected neural units (artificial neurons), similar to biological neural networks. Typically, neurons are organized in layers, and connections are established only between neurons in adjacent layers. The low-level input feature vector is fed into the first layer and, moving from layer to layer, is transformed into a high-level feature vector. The number of neurons in the output layer equals the number of classes. Thus, the output vector is a vector of probabilities, each showing how likely the input vector is to belong to the corresponding class.

An artificial neuron implements a weighted adder, whose output is described as follows:

$a_j^i = \sigma\left(\sum_k w_{jk}^i \, a_k^{i-1}\right)$, (1)

where $a_j^i$ is the jth neuron in the ith layer, $w_{jk}^i$ stands for the weight of the synapse that connects the jth neuron in the ith layer with the kth neuron in the layer i-1, and $\sigma$ is the activation function. Widely used in regression, the logistic function is applied as the activation function. It is worth noting that a single artificial neuron with this activation performs logistic regression.

Training consists of minimizing a cost function with methods based on gradient descent, with the gradients computed by backpropagation. In classification problems, the most commonly used cost function is the cross entropy:

$H(p, q) = -\sum_x p(x) \log q(x)$, (2)

where $p$ is the true class distribution and $q$ is the distribution produced by the network.

Training networks with a large number of layers, also called deep networks, with sigmoid activation is difficult due to the vanishing gradient problem. To overcome this problem, the ReLU function is used as the activation function:

$f(x) = \begin{cases} 0, & x < 0 \\ x, & \text{else} \end{cases}$ (3)

Today, classification with convolutional neural networks is the state-of-the-art pattern recognition
method in computer vision. Unlike traditional neural networks, which work with one-dimensional feature vectors, a convolutional neural network takes a two-dimensional image and processes it sequentially with convolutional layers.

Each convolutional layer consists of a set of trainable filters and computes dot products between these filters and the layer input to obtain an activation map. These filters are also known as kernels and allow the same features to be detected in different locations. For example, Figure 2 shows the result of applying convolution to an image with 4 kernels.

Figure 2: Input image convolution.

3.2 Proposed Implementation

To solve the traffic sign recognition task, we used the deep learning library TensorFlow (Abadi et al., 2016). Training and testing were implemented using the GTSRB dataset (Stallkamp et al., 2012). The developed method can classify the 16 most popular traffic sign types.

Table 1 describes the developed network architecture. It consists of several convolutional layers, fully connected layers and one softmax layer. Some convolutional layers have the stride parameter equal to 2. This parameter determines the step of the convolution sliding window, so layers with a stride greater than 1 also take over the downsampling role of a pooling operation. The softmax layer normalizes the previous layer output so that its output contains the probabilities that the original input image belongs to each of the recognizable classes.

TensorFlow contains a set of tools to visualize models at different abstraction levels down to low-level mathematical operations. The common name of these tools is TensorBoard. The presented model can be divided into two stacked blocks: the convolutional block and the fully connected block.

Table 1: Neural network architecture.

Layer
Convolutional, stride 2, kernel 7x7x4
Convolutional, stride 2, kernel 5x5x8
Convolutional, stride 2, kernel 3x3x16
Convolutional, stride 2, kernel 3x3x32
Convolutional, stride 1, kernel 2x2x16
Convolutional, stride 1, kernel 2x2x8
Convolutional, stride 1, kernel 2x2x4
Fully connected-64
Fully connected-16
Softmax

To train and evaluate the model, the initial dataset was divided into train and test datasets in an 80/20 ratio, respectively. At the training stage, the network processed a batch of 50 images from the train dataset per iteration. Every 100 iterations, the intermediate accuracy was computed on a batch of 50 images from the test dataset. After training, the accuracy was computed using all images from the test dataset. Figure 3 shows the classification accuracy growing with the number of training iterations. The graph shows that, starting with the 2000th iteration, the network reaches a classification accuracy above 0.9.

Figure 3: Classification accuracy changing with training iterations.

4 EXPERIMENTAL RESULTS

As the paper emphasizes an end-to-end solution to real-time traffic sign localization and recognition, it is necessary to evaluate preprocessing, localization and classification performance. Paper (Yakimov, 2015) shows an effective implementation of localization with preprocessing algorithms that executes in 20 ms.

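Before turning to timings, a rough size estimate of the network in Table 1 helps put them in context. The sketch below traces activation shapes and counts trainable parameters. It is not the authors' code and rests on several assumptions that the paper does not state: "7x7x4" is read as a 7x7 kernel with 4 output filters, the input is a 64x64 image with 3 color channels (the 64x64 size is taken from Table 2), zero padding keeps ceil(size / stride) outputs per layer, and every filter and dense unit has one bias. The names `CONV_LAYERS`, `DENSE_LAYERS` and `summarize` are illustrative.

```python
import math

# Layers from Table 1 as (kernel_size, filters, stride).
# Reading "7x7x4" as a 7x7 kernel with 4 output filters is an assumption.
CONV_LAYERS = [
    (7, 4, 2),
    (5, 8, 2),
    (3, 16, 2),
    (3, 32, 2),
    (2, 16, 1),
    (2, 8, 1),
    (2, 4, 1),
]
DENSE_LAYERS = [64, 16]  # "Fully connected-64" and "Fully connected-16"


def summarize(height=64, width=64, channels=3):
    """Trace activation shapes and count parameters.

    Assumes an RGB input, zero padding that yields ceil(size / stride)
    outputs, and one bias per filter or dense unit. None of these
    choices is stated in the paper.
    """
    rows = []
    total = 0
    for kernel, filters, stride in CONV_LAYERS:
        # Weights: kernel * kernel * input channels * filters, plus biases.
        params = kernel * kernel * channels * filters + filters
        height = math.ceil(height / stride)
        width = math.ceil(width / stride)
        channels = filters
        total += params
        rows.append(((height, width, channels), params))
    features = height * width * channels  # flattened convolutional output
    for units in DENSE_LAYERS:
        params = features * units + units
        features = units
        total += params
        rows.append(((units,), params))
    return rows, total


if __name__ == "__main__":
    rows, total = summarize()
    for shape, params in rows:
        print(shape, params)
    print("total parameters:", total)
```

Under these assumptions the convolutional block ends with a 4x4x4 activation, i.e. 64 values, which happens to match the width of the first fully connected layer, and the whole network has roughly 15 thousand parameters. A different padding scheme or input depth would change both numbers.
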
To evaluate the classification execution time, we used the Nvidia GeForce GTX 650 and Nvidia GeForce GT 650M GPUs and the Intel Core i7 5500U CPU. Table 2 shows the results.
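
The per-image classification that is being timed is assembled from the operations introduced in Section 3.1: the weighted adder of equation (1), the ReLU of equation (3), a softmax output and the cross entropy of equation (2) during training. The following plain-Python sketch shows these building blocks on scalars and lists. It is meant only as an illustration of the formulas, not as the TensorFlow implementation used in the paper, and all function names are hypothetical.

```python
import math


def logistic(z):
    """Logistic (sigmoid) activation."""
    return 1.0 / (1.0 + math.exp(-z))


def relu(z):
    """ReLU from equation (3): 0 for negative inputs, identity otherwise."""
    return z if z > 0 else 0.0


def neuron(inputs, weights, activation=logistic):
    """Weighted adder of equation (1) followed by an activation."""
    return activation(sum(w * x for w, x in zip(weights, inputs)))


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    shift = max(scores)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target, predicted, eps=1e-12):
    """Cross entropy of equation (2) between two discrete distributions."""
    # eps guards against log(0) when a predicted probability is exactly zero.
    return -sum(t * math.log(p + eps) for t, p in zip(target, predicted))


if __name__ == "__main__":
    probs = softmax([2.0, 1.0, 0.1])
    print(probs)
    print(cross_entropy([1.0, 0.0, 0.0], probs))
```

With a one-hot target, the cross entropy reduces to the negative log of the probability assigned to the correct class, so it shrinks as the network becomes more confident in the right answer.
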


Table 2: CNN training and classifying execution time.

Hardware                  Training   Classifying an image (64x64)
Nvidia GeForce GTX 650    7 min      0.05 ms
Nvidia GeForce GT 650M    12 min     0.14 ms
Intel Core i7             16 min     0.37 ms

To evaluate the accuracy of the localization and recognition algorithms, we used the German Traffic Sign Detection Benchmark (GTSDB) (Houben et al., 2013) and the German Traffic Sign Recognition Benchmark (GTSRB) (Stallkamp et al., 2012). They contain more than 50,000 images with traffic signs registered in various conditions. To assess the quality of the sign localization, we counted the number of images with correctly recognized traffic signs. When testing the developed algorithms, we used only 9,987 images containing traffic signs of the required shape and with red contours. The experiments showed that 99.94% of the images with prohibitory and danger traffic signs were correctly localized and detected.

Table 3 shows the resulting accuracy and performance of the detection algorithms from (Stallkamp et al., 2012), (Yakimov, 2015), (Aghdam et al., 2016) and the method described in this paper.

Table 3: Accuracy and performance of TSR methods.

Method                                                Accuracy   FPS
Sliding window + SVM                                  100 %      1
Modified GHT with preprocessing + CNN (this paper)    99.94 %    50
ConvNet                                               99.55 %    38
Modified GHT with preprocessing                       97.3 %     43
Modified GHT without preprocessing                    89.3 %     25
Viola-Jones                                           90.81 %    15
HOG                                                   70.33 %    20

The accuracy of all methods shown in the table was obtained using the GTSDB dataset. The sliding window method (Mathias et al., 2013) shows the best result with 100% accuracy. However, the modified GHT + CNN method described in this paper achieves the highest frame rate.

One of the most efficient methods for TSR using GTSDB and GTSRB is the method using ConvNet for both localizing and classifying traffic signs (Aghdam et al., 2016). The authors report a precision of 99.89% when detecting a sign and 99.55% when classifying it. Also, the method can process 37.72 high-resolution images per second. The method described in this paper shows slightly better results in both precision and performance, but it is difficult to compare FPS as there is no description of the hardware used for the experiments.

Figure 4 shows images of traffic signs that were successfully recognized by the CNN implementation proposed in this paper. The picture shows that the applied method gives good recognition results even with traffic sign images that are not easy for a human to recognize.

Figure 4: Successful classification.

However, the accuracy doesn't reach 100 %. Figure 5 shows the images of traffic signs that were recognized incorrectly.

Figure 5: Unsuccessful classification.

As seen in Figure 5, the quality of input images strongly influences the recognition rate. It means that such high classification quality will not always be obtainable when using the developed algorithms in the real world. However, all the algorithms mentioned in Table 3 will suffer from poor input quality.

The developed algorithm was also tested on video frames obtained in the streets using an Android device, the Nvidia Shield Tablet, built into a car. Figure 6 shows fragments of the original images with road signs marked on them.
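
As a rough cross-check of the reported figures, the sketch below combines the 20 ms quoted for preprocessing and localization with the per-image classification times from Table 2, and converts the 99.94% accuracy on 9,987 images into a count of failures. The paper does not say how the 50 FPS in Table 3 was measured, so treating the stages as strictly sequential, classifying one sign per frame, and mixing timings that may come from different hardware are all assumptions made here for illustration. The function names are hypothetical.

```python
def frames_per_second(localization_ms, classification_ms, signs_per_frame=1):
    """Frame rate if the stages run one after another on every frame."""
    frame_ms = localization_ms + classification_ms * signs_per_frame
    return 1000.0 / frame_ms


def expected_failures(image_count, accuracy):
    """Number of images implied to be handled incorrectly."""
    return round(image_count * (1.0 - accuracy))


if __name__ == "__main__":
    # 20 ms for preprocessing + localization, per-image times from Table 2.
    timings = [("GTX 650", 0.05), ("GT 650M", 0.14), ("Core i7", 0.37)]
    for name, classify_ms in timings:
        print(name, round(frames_per_second(20.0, classify_ms), 1))
    print(expected_failures(9987, 0.9994))
```

Under these assumptions the frame time is dominated by localization, so the classifier adds well under a millisecond per sign and the estimate stays close to 50 frames per second on all three devices. The accuracy figure corresponds to about 6 images out of 9,987.
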


Figure 6: Localized and recognized traffic signs.

5 CONCLUSIONS

This paper considers an implementation of the classification algorithm for the traffic sign recognition task. Combined with preprocessing and localization steps from previous works, the proposed method for traffic sign classification shows very good results: 99.94 % of correctly classified images. The proposed classification solution is implemented using the TensorFlow framework.

The use of our TSR algorithms allows high-resolution video streams to be processed in real time, and therefore at greater distances and with better quality than similar TSR systems. FullHD resolution makes it possible to detect and recognize a traffic sign at a distance of up to 50 m.

The developed method was implemented on a device with an Nvidia Tegra K1 processor. CUDA was used to accelerate the described methods. In future research, we plan to train the CNN to consider more traffic sign classes and possible bad weather conditions. In the current version, we considered only daylight and good visibility.

ACKNOWLEDGEMENTS

This work was supported by the Russian Foundation for Basic Research - Project # 16-37-60106 mol_a_dk.

REFERENCES

Shneier, M., 2005. Road sign detection and recognition. Proc. IEEE Computer Society Int. Conf. on Computer Vision and Pattern Recognition, pp. 215-222.
Nikonorov, A., Yakimov, P., Petrov, M., 2013. Traffic sign detection on GPU using color shape regular expressions. VISIGRAPP IMTA-4, Paper Nr 8.
Belaroussi, R., Foucher, P., Tarel, J. P., Soheilian, B., Charbonnier, P., Paparoditis, N., 2010. Road Sign Detection in Images: A Case Study. 20th International Conference on Pattern Recognition (ICPR), pp. 484-488.
Ruta, A., Porikli, F., Li, Y., Watanabe, S., Kage, H., Sumi, K., 2009. A New Approach for In-Vehicle Camera Traffic Sign Detection and Recognition. IAPR Conference on Machine Vision Applications (MVA), Session 15: Machine Vision for Transportation.
Stallkamp, J., Schlipsing, M., Salmen, J., Igel, C., 2012. Man vs. computer: Benchmarking machine learning algorithms for traffic sign recognition. Neural Networks, vol. 32, pp. 323-332.
Houben, S., Stallkamp, J., Salmen, J., Schlipsing, M., Igel, C., 2013. Detection of Traffic Signs in Real-World Images: The German Traffic Sign Detection Benchmark. Proc. International Joint Conference on Neural Networks.
Fursov, V., Bibikov, S., Yakimov, P., 2013. Localization of objects contours with different scales in images using Hough transform [in Russian]. Computer optics, vol. 37(4), pp. 502-508.
Yakimov, P., 2015. Tracking traffic signs in video sequences based on a vehicle velocity [in Russian]. Computer optics, vol. 39(5), pp. 795-800.
Zhu, Z., Liang, D., Zhang, S., Huang, X., Li, B., Hu, S., 2016. Traffic-Sign Detection and Classification in the Wild. Proceedings of CVPR, pp. 2110-2118.
LeCun, Y., Sermanet, P., 2011. Traffic Sign Recognition with Multi-Scale Convolutional Networks. Proceedings of International Joint Conference on Neural Networks (IJCNN'11).
Yakimov, P., 2013. Preprocessing of digital images in systems of location and recognition of road signs [in Russian]. Computer optics, vol. 37(3), pp. 401-405.
Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., Corrado, G. S., Davis, A., Dean, J., Devin, M., Ghemawat, S., Goodfellow, I. J., Harp, A., Irving, G., Isard, M., Jia, Y., Jozefowicz, R., Kaiser, L., Kudlur, M., Levenberg, J., Mané, D., Monga, R., Moore, S., Murray, D. G., Olah, C., Schuster, M., Shlens, J., Steiner, B., Sutskever, I., Talwar, K., Tucker, P. A., Vanhoucke, V., Vasudevan, V., Viégas, F. B., Vinyals, O., Warden, P., Wattenberg, M., Wicke, M., Yu, Y., Zheng, X., 2016. TensorFlow: Large-scale machine learning on heterogeneous distributed systems. arXiv preprint 1603.04467, arxiv.org/abs/1603.04467. Software available from tensorflow.org.
Mathias, M., Timofte, R., Benenson, R., Van Gool, L., 2013. Traffic sign recognition - how far are we from the solution? Proceedings of IEEE International Joint Conference on Neural Networks, pp. 1-8.
Aghdam, H., Heravi, E., Puig, D., 2016. A practical approach for detection and classification of traffic signs using Convolutional Neural Networks. Robotics and Autonomous Systems, vol. 84, pp. 97-112.
