Zhang 2018
Abstract—Traffic sign recognition (TSR) is an indispensable component of the vision systems of self-driving cars. Promising results have been achieved, benefiting especially from the recent rapid development of deep neural networks. However, few works focus on how algorithms perform under different complex conditions, such as weather and viewpoint variations. In this paper, we propose a new real-world TSR dataset in which several fine-grained conditions are labeled in detail, covering weather, light condition, occlusion, distance, color fading, and camera angle. Detailed and unbiased comparison results are reported for several state-of-the-art methods on our proposed dataset and five public TSR datasets. Experimental results demonstrate that current TSR methods are still far from satisfactory, especially in complex real-world cases.

Index Terms—Traffic sign recognition (TSR), traffic sign dataset (TSD)

I. INTRODUCTION

Recently, great progress has been observed in the development of automatic driving and driver assistance systems. TSR plays a critical role in car vision systems, providing drivers with safe and efficient navigation. TSR datasets are equally critical for developing TSR algorithms at both the training and testing stages. Therefore, multiple public traffic sign datasets (TSDs) have been collected by the community over the years, such as the UAH Dataset[6], BelgiumTS Dataset (BTSD)[2], CVL Dataset[7], German Traffic Sign Detection Benchmark (GTSDB)[1], LISA[3], Russian Traffic Sign Dataset (RTSD)[4], and Tsinghua-Tencent 100K (TT100K)[5].

Although traffic sign datasets like GTSDB, BTSD, LISA, RTSD, and TT100K have detailed annotations of bounding boxes and sign types, they do not explicitly classify and annotate image conditions such as weather and camera angle, so they are not specially designed for evaluating TSR algorithms under fine-grained conditions. The UAH dataset distinguishes conditions like occlusion, rotation, and color, but does not provide explicit annotations for them, which makes it difficult to evaluate special cases such as performance in rain. Thus, a more complex real-world dataset is needed to evaluate the performance of different algorithms. It should be well organized and take both the complexity of real-world conditions and the diversity of traffic signs into consideration.

The problem of TSR is usually formulated in two stages, detection and classification, where the poor quality of traffic sign images, within-class variability, and between-class similarity always pose challenges. Traditional methods rely on hand-crafted features such as the Scale-Invariant Feature Transform (SIFT), Histogram of Oriented Gradients (HOG)[8], and Local Binary Patterns (LBP), which can be combined with machine learning methods such as the Support Vector Machine (SVM) and boosting for classification. The modern history of object recognition goes along with the development of Convolutional Neural Networks (ConvNets), and deep-learning-based methods have gradually become mainstream. Researchers have proposed many methods for object recognition, such as Region-based Convolutional Neural Networks (R-CNN)[9], OverFeat[10], SPPNet, MultiBox, Faster R-CNN[11], SSD[12], YOLOv2[13], and so on. These methods shed light on recognizing traffic signs with deep-learning-based approaches[14].

All in all, pinpointing a traffic sign's location and type in real-world images is a challenging computer vision task[5] of high industrial relevance for constructing advanced driver assistance systems. Although plenty of algorithms have been proposed, there is no clear consensus on which one deals best with various complex real-world images. This can be attributed to a lack of comprehensive comparisons among these methods for recognizing traffic signs under different external conditions. In this paper, we provide experimental comparisons of several fundamental methodologies used for TSR on our proposed dataset and five public TSDs. The main contributions can be summarized as follows:

(i) We introduce a new real-world dataset for TSR, called CTSD. Its images cover large variations in weather, illuminance, occlusion, distance, color fading/stains, and viewpoint. To the best of our knowledge, no dataset with so many different conditions finely classified and labeled has been proposed before.

(ii) We provide a survey of publicly available datasets for TSR and compare their scales, categories, and limitations. In particular, since these datasets are used to train and test many kinds of models, we try to unlock the value of the data and discover how well they reflect real-world road situations.

(iii) We investigate a wide range of object detection technologies. Both traditional methods and deep-learning-based methods are benchmarked.

II. DATASET

We begin with a survey of some popular TSDs. Then a new dataset, CTSD, is proposed to better evaluate numerous advanced techniques for TSR under different conditions.
TABLE I
COMPARISON OF TRAFFIC SIGN DATASETS. CONDITIONS ANNOTATED: W (WEATHER), L (LIGHT), O (OCCLUSION), D (DISTANCE), C (COLOR FADING), V (VIEWPOINT).

Name         Region   Year  Scale   Types  Image Size             Sign Size           Conditions annotated? (W L O D C V)
GTSDB[1]     Germany  2013  900     43     1360×800               16×16 ∼ 128×128     —
BTSD[2]      Belgium  2010  9006    62     1628×1236              5×11 ∼ 983×653      —
LISA[3]      America  2012  6610    47     640×480 ∼ 1024×522     6×6 ∼ 167×168       —
RTSD[4]      Russia   2016  104359  156    1280×720 ∼ 1920×1280   16×16 ∼ 307×288     —
TT100K[5]    China    2016  10000   182    2048×2048              6×6 ∼ 438×492       —
Our Dataset  China    2018  2205    153    600×600 ∼ 2048×2048    8×10 ∼ 1186×1000    √ √ √ √ √ √

TABLE III
RESULTS OF DEEP-LEARNING-BASED METHODS ON DIFFERENT CONDITIONS (P: PRECISION (%), R: RECALL (%)).
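Given such per-sign condition flags, condition-specific evaluation splits can be built directly from the annotations. The sketch below is purely hypothetical: the dataset's actual annotation format is not specified here, and the `SignAnnotation` structure, the use of flag letters as set members, and the sample data are all our assumptions.

```python
# Hypothetical sketch: filtering condition-annotated signs (W: weather,
# L: light, O: occlusion, D: distance, C: color fading, V: viewpoint)
# into condition-specific test splits. Annotation format is assumed.
from dataclasses import dataclass

@dataclass
class SignAnnotation:
    image: str
    box: tuple        # (x1, y1, x2, y2)
    sign_type: str
    conditions: set   # subset of {"W", "L", "O", "D", "C", "V"}

def split_by_condition(annotations, flag):
    """Partition annotations by whether a given condition flag is set."""
    with_flag = [a for a in annotations if flag in a.conditions]
    without = [a for a in annotations if flag not in a.conditions]
    return with_flag, without

anns = [
    SignAnnotation("img1.jpg", (10, 10, 60, 60), "stop", {"W", "O"}),
    SignAnnotation("img2.jpg", (5, 5, 40, 40), "yield", {"V"}),
]
rainy, clear = split_by_condition(anns, "W")
print(len(rainy), len(clear))  # 1 1
```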
approach can be efficiently implemented within a ConvNet. Zhu et al. have modified this framework for the detection case[5].

Faster R-CNN: To overcome the region proposal bottleneck in Fast R-CNN, Ren et al. propose Faster R-CNN. Its region proposal network, which computes proposals with a deep net, lets the object proposal generator share full-image convolutional features with the detection network, allowing the detection system to achieve a fast frame rate.

YOLOv2: The YOLO design unifies separate components into a single neural network, using features from the entire image to predict bounding boxes and classes simultaneously. This enables end-to-end training and real-time speeds. YOLOv2 takes additional measures, such as batch normalization and a high-resolution classifier, to improve performance.

SSD: The key feature of SSD is its use of multi-scale convolutional bounding box outputs attached to multiple feature maps at the top of the network. SSD completely eliminates proposal generation and the subsequent pixel or feature resampling stages, encapsulating all computation in a single network.

IV. EXPERIMENT

A. Setup

Traditional methods are implemented in Python with the scikit-learn package. To mimic the region proposal step, Selective Search (SS) is used to generate candidate object locations, which are then resized to 50×50 pixels. The Haar-like feature descriptors use "type-4" features in a 10×10-pixel window. We use 1000 decision trees of depth 2 as the weak classifiers in the AdaBoost algorithm. For the HOG descriptor, we apply 24×24-pixel blocks of nine 8×8-pixel cells with 8 orientation bins. A non-linear SVM classifier with an RBF kernel is then applied.

In the deep-learning-based methods, we use stochastic gradient descent (SGD) with a momentum of 0.9 and a weight decay of 0.0005. We tune the ZF net in Faster R-CNN and apply the SSD300 model for SSD. To give an objective comparison without prejudice and to observe the algorithms' original performance in recognizing traffic signs, we have not spent much time tuning parameters. Besides, for most datasets we split the data into training and testing sets at a ratio of about 9:1 to give the algorithms abundant training samples (or a 6:3:1 ratio for training:validation:testing). The experiments are done on an NVIDIA GTX 1080Ti 12GB GPU with the NVIDIA Deep Learning GPU Training System, using the Caffe deep learning framework from the Berkeley Vision and Learning Center.

B. Results and Analysis

We use precision (P) and recall (R) to evaluate the results. The Intersection over Union (IoU) threshold is set to 0.5 for all cases. In the deep-learning-based methods, we compute P = Ntp/Nrt and R = Ntp/Ngt, where Ntp is the number of traffic signs whose location and class are both accurately predicted, and Nrt and Ngt represent the numbers of predicted and ground-truth traffic signs, respectively. In the traditional methods, we define P = Ntp/Nrt and R = Ntb/Ngt, where Ntb is the number of boxes localized in the right position (IoU ≥ 0.5) obtained through the SS method.

Table II shows the results of different methods on the various datasets, which can be summarized in three points: 1) The traditional method of an SVM classifier on HOG descriptors greatly outperforms the combination of Haar-like features and AdaBoost on most datasets, because HOG computes gradients and fully utilizes the image information revealed by abrupt edges, while Haar-like features blur it. The precision of "HOG+SVM" is even 9 times and 4 times higher than "Haar+AdaBoost" on TT100K and our dataset, respectively. AdaBoost performs badly on the imbalanced dataset (TT100K), while SVM does not handle the large-scale dataset (RTSD) well and its training consumes too much time. SS cannot find enough positive proposals, which leads to low recall values of about 10%∼20%.
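To make the traditional pipeline concrete, here is a minimal sketch in the spirit of the "HOG+SVM" setup above (8×8-pixel cells, 8 orientation bins, RBF-kernel SVM). It is illustrative only: it omits the 24×24-pixel block normalization, trains on random 50×50 stand-in patches rather than real Selective Search proposals, and `hog_like_descriptor` is our simplified helper, not code from the experiments.

```python
# Simplified "HOG+SVM" sketch: per-cell weighted orientation histograms
# (no block normalization) followed by a non-linear RBF-kernel SVM.
import numpy as np
from sklearn.svm import SVC

def hog_like_descriptor(patch, cell=8, bins=8):
    """HOG-like descriptor: per 8x8 cell, a magnitude-weighted
    histogram of unsigned gradient orientations, L2-normalized."""
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)  # unsigned orientation
    h, w = patch.shape
    feats = []
    for i in range(0, h - cell + 1, cell):
        for j in range(0, w - cell + 1, cell):
            a = ang[i:i + cell, j:j + cell].ravel()
            m = mag[i:i + cell, j:j + cell].ravel()
            hist, _ = np.histogram(a, bins=bins, range=(0, np.pi), weights=m)
            feats.append(hist)
    f = np.concatenate(feats)
    return f / (np.linalg.norm(f) + 1e-8)

rng = np.random.default_rng(0)
patches = rng.random((40, 50, 50))      # stand-in 50x50 proposals
labels = rng.integers(0, 2, size=40)    # 1 = sign, 0 = background

X = np.array([hog_like_descriptor(p) for p in patches])
clf = SVC(kernel="rbf").fit(X, labels)  # non-linear RBF-kernel SVM
print(X.shape)                          # (40, 288): 36 cells x 8 bins
```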
Fig. 3. Examples of traffic sign recognition results. Green boxes indicate exactly correct localization and classification. Red boxes indicate wrong detections. Orange boxes represent traffic signs that were not successfully detected.
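The evaluation protocol above (a prediction counts as a true positive only when IoU ≥ 0.5 and the class matches; P = Ntp/Nrt, R = Ntp/Ngt) can be sketched as follows. The (x1, y1, x2, y2) box format and the greedy one-to-one matching are our assumptions, not details from the experiments.

```python
def iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def precision_recall(preds, gts, thr=0.5):
    """P = Ntp/Nrt, R = Ntp/Ngt; a prediction is a true positive when
    its class matches a not-yet-matched ground truth with IoU >= thr."""
    matched, ntp = set(), 0
    for pbox, pcls in preds:
        for k, (gbox, gcls) in enumerate(gts):
            if k not in matched and pcls == gcls and iou(pbox, gbox) >= thr:
                matched.add(k)
                ntp += 1
                break
    return ntp / max(len(preds), 1), ntp / max(len(gts), 1)

# Toy example: one accurate detection, one false positive, one miss.
gts = [((0, 0, 100, 100), 1), ((200, 200, 260, 260), 2)]
preds = [((5, 5, 100, 100), 1), ((0, 0, 30, 30), 2)]
p, r = precision_recall(preds, gts)
print(p, r)  # 0.5 0.5
```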
2) Deep-learning-based methods perform much better than traditional methods, especially on large-scale datasets like RTSD and BTSD, which provide abundant training samples; the development of ConvNets has brought large gains in object detection. However, for small objects such as the traffic signs in TT100K, "HOG+SVM" achieves comparable performance. 3) Among the deep-learning-based methods, Faster R-CNN and YOLOv2 achieve higher precision on large-scale datasets like RTSD and BTSD. Without a follow-up feature resampling step, classifying small objects is hard for SSD, which has low recall values. YOLOv2 applies measures such as batch normalization, a high-resolution classifier, and convolutions with anchor boxes, which make its recall values comparable to those of region-proposal-based methods like Faster R-CNN and OverFeat.

From the dataset perspective, we observe that the more complex the images are, the more difficult it is to recognize the traffic signs inside. For example, although TT100K has a great number of pictures, its traffic signs are quite small within high-resolution pictures and the distribution of sign classes is severely imbalanced, which leads to lower precision and recall compared with the performance of the same methods on other datasets. Moreover, the images in our dataset have been labeled with different conditions, which provides an opportunity to observe the performance of algorithms under various complex real-world conditions. We show how the deep-learning-based methods perform under different conditions in Table III, and Fig. 3 gives some examples of recognition results. Signs that are small or affected by color fading or occlusion are challenging to detect. Oblique camera angles and bad weather such as rain also make recognition difficult.

V. CONCLUSION

In this paper, we introduce a new real-world dataset for TSR. Compared with other public TSDs, the traffic signs in our dataset cover various conditions of weather, light, occlusion, color fading, distance, viewpoint, etc. The experimental results on our proposed dataset illustrate that recognizing traffic signs is still very challenging, especially under complex real-world conditions like bad weather or oblique viewpoints, and the proposed dataset provides a benchmark for these special cases. However, its scale is not large yet; in future work, we will collect more data to enrich our dataset.

ACKNOWLEDGMENT

This work is supported by the National Natural Science Foundation of China (61601042, 61671078, 61701031), the 111 Project of China (B08004, B17007), and the Center for Data Science of Beijing University of Posts and Telecommunications. We gratefully acknowledge the support of NVIDIA Corporation with the donation of the GPUs used for this research.

REFERENCES

[1] S. Houben et al., "Detection of traffic signs in real-world images: The German traffic sign detection benchmark," in IJCNN. IEEE, 2013, pp. 1-8.
[2] R. Timofte et al., "Multi-view traffic sign detection, recognition, and 3D localisation," Machine Vision and Applications, vol. 25, no. 3, pp. 633-647, 2014.
[3] A. Mogelmose et al., "Vision-based traffic sign detection and analysis for intelligent driver assistance systems: Perspectives and survey," IEEE Transactions on Intelligent Transportation Systems, vol. 13, no. 4, pp. 1484-1497, 2012.
[4] V. I. Shakhuro et al., "Russian traffic sign images dataset," Computer Optics, vol. 40, no. 2, pp. 294-300, 2016.
[5] Z. Zhu et al., "Traffic-sign detection and classification in the wild," in CVPR, 2016, pp. 2110-2118.
[6] S. Maldonado-Bascón et al., "Road-sign detection and recognition based on support vector machines," IEEE Transactions on Intelligent Transportation Systems, vol. 8, no. 2, pp. 264-278, 2007.
[7] F. Larsson and M. Felsberg, "Using Fourier descriptors and spatial models for traffic sign recognition," in Scandinavian Conference on Image Analysis. Springer, 2011, pp. 238-249.
[8] N. Dalal and B. Triggs, "Histograms of oriented gradients for human detection," in CVPR, vol. 1. IEEE, 2005, pp. 886-893.
[9] R. Girshick et al., "Rich feature hierarchies for accurate object detection and semantic segmentation," in CVPR, 2014, pp. 580-587.
[10] P. Sermanet et al., "OverFeat: Integrated recognition, localization and detection using convolutional networks," arXiv preprint arXiv:1312.6229, 2013.
[11] S. Ren et al., "Faster R-CNN: Towards real-time object detection with region proposal networks," in Advances in Neural Information Processing Systems, 2015, pp. 91-99.
[12] W. Liu et al., "SSD: Single shot multibox detector," in ECCV. Springer, 2016, pp. 21-37.
[13] J. Redmon and A. Farhadi, "YOLO9000: Better, faster, stronger," arXiv preprint, 2017.
[14] Á. Arcos-García et al., "Deep neural network for traffic sign recognition systems: An analysis of spatial transformers and stochastic optimisation methods," Neural Networks, vol. 99, pp. 158-165, 2018.
[15] P. Viola and M. Jones, "Rapid object detection using a boosted cascade of simple features," in CVPR, vol. 1. IEEE, 2001, pp. I-I.