JJSB 2018
http://www.scirp.org/journal/jcc
ISSN Online: 2327-5227
ISSN Print: 2327-5219
A Vehicle Detection Method for Aerial Image Based on YOLO

Junyan Lu1*, Chi Ma2, Li Li2, Xiaoyan Xing2, Yong Zhang2, Zhigang Wang2, Jiuwei Xu2

How to cite this paper: Lu, J.Y., Ma, C., Li, L., Xing, X.Y., Zhang, Y., Wang, Z.G. and Xu, J.W. (2018) A Vehicle Detection Method for Aerial Image Based on YOLO. Journal of Computer and Communications, 6, 98-107. https://doi.org/10.4236/jcc.2018.611009

Received: August 29, 2018
Accepted: November 12, 2018
Published: November 19, 2018

Abstract

With the application of UAVs in intelligent transportation systems, vehicle detection for aerial images has become a key engineering technology and also carries academic research significance. In this paper, a vehicle detection method for aerial images based on the YOLO deep learning algorithm is presented. The method integrates an aerial image dataset suitable for YOLO training by processing three public aerial image datasets. Experiments show that the trained model performs well on unknown aerial images, especially for small objects, rotating objects, and compact, dense objects, while meeting real-time requirements.
Keywords
Vehicle Detection, Aerial Image, YOLO, VEDAI, COWC, DOTA
1. Introduction
In recent years, with the rapid development of information technology, intelligent transportation systems have become an important means of modern traffic management and an inevitable trend. As a key technology of intelligent transportation systems, vehicle detection is the basis for realizing many important functions [1], such as the measurement and statistics of traffic parameters (e.g., traffic flow and density), vehicle location and tracking, and traffic data mining.
At the same time, as UAVs (Unmanned Aerial Vehicles), which are lightweight, flexible, and cheap, mature technologically and gain market popularity, UAV aerial photography shows great advantages in application scenarios such as traffic information collection and traffic emergency response.
In summary, vehicle detection for aerial images plays an important role in engineering applications. In addition, the technology draws on machine vision, artificial intelligence, image processing, and other disciplines, and is a typical application of interdisciplinary research; therefore, it also has significant academic research value.
Based on the YOLO deep learning algorithm and three public aerial image datasets, this paper presents a vehicle detection method for aerial images.
2. Related Work
The commonly used vehicle detection methods proposed by scholars at home and abroad fall mainly into three categories: based on motion information, based on features, and based on template matching. Cheng et al. use background subtraction and registration to detect moving vehicles [2], and Azevedo et al. detect vehicles in aerial images with a median background difference method [3]. These two methods achieve the detection of moving objects; however, because aerial video involves complex scenes and diverse objects, they cannot achieve accurate vehicle detection, and false and missed detections are serious. Sivaraman et al. combined Haar features and AdaBoost to detect vehicles on highways [4], and Tehrani et al. proposed a vehicle detection method based on HOG features and an SVM for urban roads [5]. These two methods improve detection accuracy, but since traditional machine learning methods only support training on small amounts of data, they still fall short in handling vehicle diversity.
In recent years, with advances in computer hardware, especially GPU technology, deep learning algorithms have developed rapidly for problems in pattern recognition and image processing, and are more efficient and precise than traditional algorithms. Therefore, this paper uses a deep learning algorithm, YOLO, to achieve vehicle detection.
3.1. YOLO v1
1) Basic idea
YOLO divides the input image into S × S grid cells. If the center of an object's GT (Ground Truth) box falls into a cell, that cell is responsible for detecting the object. YOLO's innovation is that it reforms the region-proposal detection framework: the R-CNN series must first generate region proposals and then complete classification and regression within them, and since the proposals overlap, much work is repeated. YOLO instead predicts the bboxes (bounding boxes) of the objects contained in all cells, their location confidences, and the class probability vectors in a single pass, thus solving the problem in one shot.
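The grid responsibility rule can be sketched in a few lines of Python (an illustrative snippet, not the paper's code; the function name and the 448 × 448 input size of YOLO v1 are assumptions):

```python
# Illustrative sketch of YOLO v1's grid assignment: the cell containing
# the GT center is responsible for detecting the object.

def responsible_cell(cx, cy, img_w, img_h, S=7):
    """Return (row, col) of the S x S grid cell containing the GT center."""
    col = min(int(cx / img_w * S), S - 1)  # clamp centers on the right edge
    row = min(int(cy / img_h * S), S - 1)  # clamp centers on the bottom edge
    return row, col

# A car centered at (300, 120) in a 448 x 448 input falls in cell (1, 4).
```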
2) Network structure
The YOLO network borrows from GoogLeNet; the difference is that YOLO simply uses a 1 × 1 convolutional layer (for cross-channel information integration) followed by a 3 × 3 convolutional layer instead of the Inception module. The YOLO v1 network consists of 24 convolutional layers and 2 fully connected layers, as shown in Figure 1.
3.2. YOLO v2
Compared with region-proposal-based methods such as Fast R-CNN, YOLO v1 has a larger localization error and a lower recall rate. Therefore, the main improvements of YOLO v2 aim to enhance recall and localization ability, and include:
1) BN (Batch Normalization)
BN has been a popular training technique since 2015. By adding a BN layer after each layer, each batch of data is normalized to a space with mean 0 and variance 1, which prevents both vanishing and exploding gradients and makes the network converge faster.
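The normalization step can be sketched as follows (a minimal scalar version for illustration; the learnable scale and shift parameters of a real BN layer are omitted):

```python
# Minimal sketch of batch normalization: shift a batch of activations to
# mean 0 and scale it to variance 1 (eps guards against division by zero).

def batch_norm(batch, eps=1e-5):
    n = len(batch)
    mean = sum(batch) / n
    var = sum((x - mean) ** 2 for x in batch) / n
    return [(x - mean) / (var + eps) ** 0.5 for x in batch]
```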
2) Anchor boxes
In YOLO v1, a fully connected layer after the convolutional layers predicts the bbox coordinates directly. Following the idea of Faster R-CNN, YOLO v2 removes the fully connected layer and adds anchor boxes, which effectively improves the recall rate.
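Concretely, with anchor boxes the network predicts offsets relative to a prior box rather than raw coordinates. A rough sketch following the decoding formulas of the YOLO9000 paper [7] (function names are assumptions, not the authors' code):

```python
import math

# Decode a raw prediction (tx, ty, tw, th) against an anchor prior
# (pw, ph) at grid cell (cx, cy), as in YOLO v2: the sigmoid keeps the
# predicted center inside its cell, and width/height scale the prior.

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def decode_box(tx, ty, tw, th, cx, cy, pw, ph):
    bx = cx + sigmoid(tx)
    by = cy + sigmoid(ty)
    bw = pw * math.exp(tw)
    bh = ph * math.exp(th)
    return bx, by, bw, bh
```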
3) Multi-scale training
The input image size of the YOLO v1 training network is fixed, whereas YOLO v2 randomly adjusts the input image size every 10 batches during training, so that the model detects well on multi-scale input images at test time.
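A sketch of this multi-scale schedule, assuming the size range of 320 to 608 pixels in steps of 32 (the network's downsampling stride) used by the YOLO9000 paper [7]:

```python
import random

# Multi-scale training sketch: periodically redraw the network input size
# from multiples of 32 between 320 and 608.

def pick_input_size(rng=random):
    sizes = list(range(320, 609, 32))  # 320, 352, ..., 608
    return rng.choice(sizes)
```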
3.3. YOLO v3
The YOLO v3 model is much more complex than YOLO v2, and its detection of small objects, as well as compact, dense, or highly overlapping objects, is excellent. The main improvements include:
1) Loss
YOLO v3 replaces the softmax loss of YOLO v2 with a logistic loss. When the predicted object classes are complex, especially when the dataset contains many overlapping labels, logistic regression is more efficient.
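The difference between the two class activations can be illustrated with a small snippet (illustrative only; the scores are made up):

```python
import math

# Softmax forces classes to compete (scores sum to 1), while independent
# logistic (sigmoid) outputs let one box carry several overlapping labels.

def softmax(scores):
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def independent_logistic(scores):
    return [1.0 / (1.0 + math.exp(-s)) for s in scores]

scores = [3.0, 2.5]  # e.g. an overlapping pair of labels for one box
competing = softmax(scores)              # sums to 1: labels are exclusive
coexisting = independent_logistic(scores)  # both can be near 1 at once
```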
2) Anchor
YOLO v3 uses nine anchors instead of the five of YOLO v2, which improves the IoU.
3) Detection
YOLO v2 detects at only one scale, while YOLO v3 detects at three, which greatly improves the detection of small objects.
4) Backbone
YOLO v3 replaces the darknet-19 network of YOLO v2 with darknet-53, which improves object detection accuracy by deepening the network.
This paper uses the latest YOLO v3 model to achieve the vehicle detection for
aerial image.
4.3. DOTA
DOTA (Dataset for Object detection in Aerial images) is an aerial image dataset.

Table 1. The basic information of the three public aerial image datasets (columns: Dataset, Images, Classes, Image size, Image format, Annotations).
2) COWC
a) Delete the grayscale images;
b) Delete the annotations of negative samples, leaving only the positive sample "car";
c) Split the images of COWC into 416 × 416 tiles and convert them to JPEG format. When splitting, the coordinate of each sample's center point is converted accordingly to ensure its position in the new image is correct. The remaining images smaller than 416 × 416 are padded with black.
d) According to the GSD of COWC, the size of a vehicle in the image is assumed to be a uniform 48 × 48 pixels; therefore,

w = h = 48/416 ≈ 0.1154 (2)
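Steps c) and d) can be sketched as follows (an illustrative snippet, not the authors' preprocessing code; tiles are assumed to lie on a regular 416-pixel grid with origin at the top-left of the source image):

```python
# Map a sample's center point (in full-image pixels) to its 416 x 416
# tile and a normalized YOLO label using the fixed 48-pixel vehicle size
# from Equation (2).

TILE = 416
VEHICLE_PX = 48

def to_yolo_label(cx, cy):
    """Return ((tile_row, tile_col), (x, y, w, h)) with x, y, w, h in [0, 1]."""
    tile_col, tile_row = int(cx // TILE), int(cy // TILE)
    x = (cx - tile_col * TILE) / TILE  # center relative to the tile
    y = (cy - tile_row * TILE) / TILE
    w = h = VEHICLE_PX / TILE          # = 48/416, as in Equation (2)
    return (tile_row, tile_col), (x, y, w, h)
```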
3) DOTA
a) Except for "large vehicle" and "small vehicle", delete the annotations of the other 13 classes from the labels; "large vehicle" and "small vehicle" are unified as "car";
b) Split the images of DOTA into 1024 × 1024 tiles. When splitting, the coordinates of each GT's 4 corners are converted accordingly to ensure their positions in the new images are correct. Abandon the remaining images smaller than 1024 × 1024.
c) Center point coordinate:

x = (xmax + xmin)/2, y = (ymax + ymin)/2 (3)
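Step c) with Equation (3) amounts to the following (illustrative only, not the authors' code):

```python
# Collapse a DOTA ground truth's four corner points into the axis-aligned
# extremes, then take the midpoints as the center, per Equation (3).

def center_from_corners(corners):
    """corners: four (x, y) points of one DOTA ground-truth box."""
    xs = [p[0] for p in corners]
    ys = [p[1] for p in corners]
    x = (max(xs) + min(xs)) / 2
    y = (max(ys) + min(ys)) / 2
    return x, y
```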
After processing, the information of the new datasets is shown in Table 2.
2) Number of iterations
The dataset contains a total of 20,542 images, so one epoch needs to iterate 20542/64 ≈ 320 times.
YOLO training defaults to 160 epochs, so the total number of iterations is 160 × 320 = 51,200.
3) Learning rate
The initial learning rate is 0.001; after 60 epochs it is divided by 10, and after 90 epochs it is divided by 10 once again.
4) Number of filters in the last layer of the network
filters = (class + 5) × 3 = (6 + 5) × 3 = 33
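The iteration arithmetic and learning-rate schedule above can be checked with a short script (an illustrative sketch; the batch size of 64 is implied by the division, and the step boundaries at epochs 60 and 90 follow the text):

```python
# Training-schedule arithmetic: 20,542 images, batch size 64, 160 epochs,
# and a step learning-rate decay at epochs 60 and 90.

IMAGES, BATCH, EPOCHS = 20542, 64, 160

iters_per_epoch = IMAGES // BATCH        # ~320 iterations per epoch
total_iters = EPOCHS * iters_per_epoch   # 160 x 320 iterations in total

def learning_rate(epoch, base=0.001):
    """Divide the base rate by 10 after epoch 60 and again after epoch 90."""
    if epoch >= 90:
        return base / 100
    if epoch >= 60:
        return base / 10
    return base
```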
6. Experimental Results
In this paper, we use an NVIDIA TITAN X graphics card for training. The training takes about 60 hours. The test results of the trained model are shown in Table 3.
The detection effect of the trained model on unknown images is shown in Figure 2 (the original images are from the Internet; please inform us of any infringement).
Figure 2 (left) shows that the trained model detects small objects well. The vehicles in Figure 2 (middle) are mostly rotated rather than horizontal or vertical; the test result shows that the model performs well on rotating objects. In particular, the leftmost vehicle in the image is very close to the background, so manual inspection might miss it, yet the model detects it correctly. Figure 2 (right) indicates that the model is outstanding at detecting compact and dense objects: more than 95% of the vehicles are correctly detected, except for those in the shadow at the far left.
7. Conclusion
In this paper, a vehicle detection method for aerial images based on the YOLO deep learning algorithm is presented. The method integrates an aerial image dataset suitable for YOLO training by processing three public datasets. The trained model yields good test results, especially for small objects, rotating objects, and compact, dense objects, and meets real-time requirements. Next, we will integrate more public aerial image datasets to increase the number and diversity of training samples and, at the same time, optimize the YOLO algorithm to further improve detection accuracy.
Conflicts of Interest
The authors declare no conflicts of interest regarding the publication of this pa-
per.
References
[1] Qiu, Y. (2014) Video-Based Vehicle Detection in Intelligent Transportation System.
Master Thesis, Jilin University, China.
[2] Cheng, P., Zhou, G. and Zheng, Z. (2009) Detecting and Counting Vehicles from
Small Low-Cost UAV Images. Proceedings of ASPRS 2009 Annual Conference, Bal-
timore, 1-7.
[3] Azevedo, C.L., Cardoso, J.L., Ben-Akiva, M., Costeira, J.P. and Marques, M. (2014)
Automatic Vehicle Trajectory Extraction by Aerial Remote Sensing. Procedia-Social
and Behavioral Sciences (S1877-0428), 111, 849-858.
https://doi.org/10.1016/j.sbspro.2014.01.119
[4] Sivaraman, S. and Trivedi, M.M. (2010) A General Active-Learning Framework for On-Road Vehicle Recognition and Tracking. IEEE Transactions on Intelligent Transportation Systems, 2, 267-276. https://doi.org/10.1109/TITS.2010.2040177
[5] Tehrani, H., Akihiro, T., Mita, S. and Mcallester, D.A. (2012) On-Road Multivehicle
Tracking Using Deformable Object Model and Particle Filter with Improved Like-
lihood Estimation. IEEE Transactions on Intelligent Transportation Systems, 2, 748-758.
https://doi.org/10.1109/TITS.2012.2187894
[6] Redmon, J., Divvala, S., Girshick, R. and Farhadi, A. (2016) You Only Look Once:
Unified, Real-Time Object Detection. Proceedings of the IEEE Conference on
Computer Vision and Pattern Recognition, Las Vegas, 27-30 June 2016, 779-788.
https://doi.org/10.1109/CVPR.2016.91
[7] Redmon, J. and Farhadi, A. (2017) YOLO9000: Better, Faster, Stronger. Proceedings
of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu,
21-26 July 2017, 6517-6525. https://doi.org/10.1109/CVPR.2017.690
[8] Redmon, J. and Farhadi, A. (2018) YOLOv3: An Incremental Improvement. arXiv:1804.02767v1 [cs.CV], Unpublished.
[9] Razakarivony, S. and Jurie, F. (2015) Vehicle Detection in Aerial Imagery: A Small
Target Detection Benchmark. Journal of Visual Communication & Image Repre-
sentation, 34, 187-203. https://doi.org/10.1016/j.jvcir.2015.11.002
[10] Mundhenk, T.N., Konjevod, G., Sakla, W.A. and Boakye, K. (2016) A Large Con-
textual Dataset for Classification, Detection and Counting of Cars with Deep
Learning. Proceedings of European Conference on Computer Vision, Springer,
2016, 785-800.
[11] Xia, G.S., Bai, X., Ding, J., Zhu, Z., Belongie, S., Luo, J.L., et al. (2018) DOTA: A
Large-Scale Dataset for Object Detection in Aerial Images. arXiv: 1711.10398v2
[cs.CV], Unpublished.