Tracking and Counting of Vehicles For Flow Analysis From Urban Traffic Videos




Igor S. Farias, Bruno J. T. Fernandes, Edison Q. Albuquerque and Byron L. D. Leite
Polytechnic School of Pernambuco - POLI
University of Pernambuco - UPE
Recife, Brazil
{isf, bjtf, edison, byronleite}@ecomp.poli.br

Abstract—Traffic congestion is among the major problems faced by urban centers. It stems from the growing number of vehicles on the streets and has already become the subject of several studies seeking solutions. One mechanism that allows congestion to be reduced is traffic control, which requires metrics that enable traffic analysis in real time. The most widely used way to determine the flow of vehicles is to count the vehicles passing on a street, which is usually done with sensors (e.g., magnetic or thermal). However, these approaches are rather difficult to install and maintain. The objective of this paper is therefore to present a mechanism capable of counting vehicles from video images. To accomplish this task, image processing resources that do not require large computational power are used, which allows the mechanism to be easily coupled to common transit systems. The result obtained has an accuracy of more than 90% in videos from urban traffic cameras.

Key words—Vehicle detection, vehicle count, image processing.

978-1-5386-3734-0/17/$31.00 ©2017 IEEE

I. INTRODUCTION

Transportation has always been an extremely relevant aspect of society: much of humanity's advance is due to mobility, which is also treated as synonymous with progress. Mobility problems, however, have increased considerably, and traffic congestion has become a major concern for urban centers [1].

Due to the huge fleet of vehicles we currently have, there is an unavoidable interest in reducing the effects of congestion. In the major urban centers, it is well known that the demand for services, commerce, and personal travel grows year by year, placing unprecedented challenges on the area of transportation.

The mechanism that allows this massive problem to be mitigated is traffic control. This process maximizes the vehicle flow on the roads, allowing a shorter travel time, which implies a reduction of congestion. Vehicle flow refers to the number of vehicles passing a particular route over a period [2]. According to DENATRAN (Brazilian National Department of Transit), the leading cause of congestion is the intersection of streets. The traffic light is the main mechanism used to control traffic at these intersections [3].

Urban traffic control has been evolving over the years, so we are in a generation where real-time traffic data collection is essential [4]. Based on the density of vehicles in the street, it becomes possible to develop a mechanism that effectively reduces congestion. From the number of vehicles that travel on a particular route, it becomes possible to predict traffic jams, which makes it possible to take actions such as a better adjustment of a traffic light's timing.

To obtain information about the flow of vehicles in real time, it is necessary to use substantial computational power together with images obtained by cameras. The use of simpler sensors, such as magnetic sensors, has become unfeasible these days, mainly due to the difficulty inherent in their installation and maintenance [5]. Video cameras are easier to handle and significantly increase the amount of information that can be obtained, but efficient systems are then needed for the processing to be useful when executed in real time [6].

The essential information for the process of traffic control is the flow of vehicles [7]. This paper therefore presents an approach, to be executed in real time, that makes it possible to count the occurrences of vehicles. This information, when processed, has great value for traffic control.

II. RELATED WORK

When attempting to mitigate traffic jam problems, the first point to be addressed is the density of vehicles. A mechanism is presented in [7] that uses magnetic sensors to count the number of cars that pass in a street. In addition to counting, the vehicles are classified as bus, car, motorcycle or bicycle based on the magnetic field obtained by the sensors.

Based on images obtained through a UAV (Unmanned Aerial Vehicle), [4] counts vehicles in a scene. The process begins with a screening of the paved areas to restrict the search space. The next step is a feature extraction process based on scalar transformations. The information obtained is passed to an SVM (Support Vector Machine) that reports the number of vehicles in the scene.

A video-based approach is taken in [8], where the video is sliced into frames and a mask is applied to extract parts of each frame. Each slice obtained is used as input to a CNN (Convolutional Neural Network) so that it can identify and classify the vehicles. Such an approach is interesting because it has proved quite robust for identifying vehicles in a scene. Also using videos, [5] presents a method of detecting and counting vehicles based on particle clustering, so that each set of particles in a video represents a vehicle. Despite achieving good results, this method requires a great deal of computational effort, a scarce resource for traffic monitoring devices.

In [9], a real-time vehicle counting technique uses a GMM (Gaussian Mixture Model) to better adapt the algorithm to the brightness of the scene.

III. VEHICLE COUNTING SYSTEM

Vehicle flow analysis is only a starting point for traffic control, and because it serves as the basis for that control, this initial process must be carried out efficiently and effectively. In the development of the present work, it was necessary to implement an algorithm that can be easily used in embedded systems, since the system responsible for flow control should be as non-invasive to traffic as possible, i.e., the use of sensors or other mechanisms that impede flow should be avoided. On the other hand, the use of computational systems that can obtain the necessary information without intervening in the flow should be encouraged. Another requirement that must be evaluated is the computational cost of this mechanism, since it is to be run on embedded systems that have limited processing capacity.

Sophisticated systems with very high precision usually have a high computational cost, according to [10]. Their use in traffic control assistance in large cities is thus hampered because the cost associated with implementation is very high. The process discussed below is aimed at counting vehicles from a video while reducing the need for great processing power.

The process is divided into two stages: first, vehicles are detected in the video, and then the vehicles present in the recording are counted. The flowchart in Figure 1 presents the steps of the algorithm, which repeat until the video ends.

A. Detection

Figure 1. Flowchart of the presented algorithm.

In this first stage of the process, the video is cut into several frames. This cut must be done in such a way that a sequence of images is obtained that allows the motion of the objects contained in it to be detected.

Initially, the region of interest is selected, and information such as trees and sidewalks is ignored. The sequence of steps starts with the selection of a frame to be used as background. By default, the first frame of the video is used; for greater accuracy, however, this image can be selected manually. Background elimination takes place in three steps: first, the difference between two frames, f and f+1, is calculated; then the difference is compared; and finally, the pixels that have the same value in the difference calculation are eliminated. The image resulting from the background removal is then converted to grayscale. At that moment, the vehicle is represented by a set of pixels that move coherently, that is, a region whose color is lighter than a darker background, or vice versa. Figure 2 presents a vehicle after the background removal operation.

In the following steps, only the foreground is used, since the relevant information at that point is the objects obtained after the background removal. The next step is the application of morphological operations in the object segmentation process. Closing is applied first, so that small breaks in the contours of the vehicles are corrected; then Opening is applied, so that the contours of the objects are smoothed; and finally Dilation is carried out to smooth curves and correct imperfections [11].

After the segmentation of the objects, the contours are identified in the image, and a minimum rectangle is drawn around each object contained in the scene, so that each rectangle obtained represents a vehicle. Equation 1 is then evaluated for each selected object.
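As a rough illustration of the background-removal and minimum-rectangle steps described above, the following pure-Python sketch differences two toy grayscale frames, keeps only the pixels that changed, and encloses them in a rectangle. It is not the implementation used in this work, which relies on OpenCV. The function names, the `threshold` parameter, the assumption that the frames are already grayscale, and the treatment of all changed pixels as a single object are illustrative assumptions.

```python
def frame_difference_mask(frame_a, frame_b, threshold=0):
    """Mark with 1 every pixel whose gray level differs by more than `threshold`."""
    return [
        [1 if abs(a - b) > threshold else 0 for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(frame_a, frame_b)
    ]


def minimum_rectangle(mask):
    """Return (x, y, width, height) enclosing all foreground pixels, or None."""
    points = [(x, y) for y, row in enumerate(mask) for x, value in enumerate(row) if value]
    if not points:
        return None  # nothing moved between the two frames
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs) - min(xs) + 1, max(ys) - min(ys) + 1)


# Toy 5x6 frames (assumed data): a uniform background, then a bright 3x3 "vehicle".
frame_f = [[10] * 6 for _ in range(5)]
frame_f1 = [row[:] for row in frame_f]
for y in range(1, 4):
    for x in range(2, 5):
        frame_f1[y][x] = 200

mask = frame_difference_mask(frame_f, frame_f1)
rectangle = minimum_rectangle(mask)
print(rectangle)  # (2, 1, 3, 3)
```

In a real scene several objects move at once, so a connected-component or contour step would be needed before one rectangle per object can be drawn; the sketch skips that step on purpose.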
Figure 2. Frame after morphological operations.

An object can be interpreted as a vehicle according to the result of Equation 1: if the result is equal to 1, the object is considered a vehicle; if the result is 0, the object is disregarded. A minimum rectangle size is defined so that valid objects can be identified in the image; that is, even if it is enclosed in a rectangle, an object is only considered a vehicle if the rectangle containing it has height and width greater than or equal to the threshold. The value set for the threshold must be carefully checked and tested, because the distance from the camera to the object directly affects its height and width, often causing vehicle identification errors.

$$
\begin{cases}
1, & \text{objectSize} \ge \text{minimumRectangle} \\
0, & \text{objectSize} < \text{minimumRectangle}
\end{cases}
\tag{1}
$$

If the minimum rectangle is set to a very small value, a single vehicle may be identified in several parts, each enclosed in its own rectangle, so that one vehicle is erroneously counted as many. On the other hand, if it is set to a very large size, vehicles such as motorcycles may be ignored, which also causes counting errors. The size of the minimum rectangle is so relevant that it could be used to classify the vehicle as a bus, car or motorcycle, for example. However, this procedure is not addressed in this paper. Figure 3 shows the result obtained with the detection steps described.

Figure 3. Result of the detection process.

B. Counting

The counting process takes as input the objects found in the detection phase. Valid rectangles are taken as vehicles in the scene, and the centroid of each object is then calculated. This centroid represents a vehicle in the scene.

In the first scene, each of the centroids found is assumed to be a vehicle and is stored in a list. From the second scene onwards, the vehicles stored in the list must be compared with those found in the current frame, so that the same vehicle is linked across different frames. Because it is a video, the same vehicle appears in several scenes, and this must be handled so that the same vehicle is not counted several times.

The comparison of the vehicles stored in the list with those identified in the new scene is based on the position of the centroids in frames f and f-1. The comparison takes into account the distance between the two points and the angle formed between them. The distance is the canonical Euclidean distance $\sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$, and the angle is set between $-180^\circ$ and $180^\circ$, increasing clockwise. Including these two parameters is of paramount importance so that vehicles that occasionally change lanes or pass at high speed are not lost.

After obtaining the distance between the centroids and the angle formed between them, a threshold is needed to check whether the two centroids compared represent the same vehicle or different vehicles. The value of this threshold should be defined cautiously, because the distance of the camera from the scene and the angle at which the image is obtained can influence the parameters. Based on tests, the threshold γ used in this work is given by Equation 2. The values used gave good results with the videos in the database tested, which is composed of videos where the camera is positioned diagonally with respect to the street.

$$
\gamma = -0.008 \cdot \mathrm{angle}^2 + 0.4 \cdot \mathrm{angle} + 25 \tag{2}
$$

After γ is set, the calculated distance between the points is compared with it. If the distance at time t is less than this limit, the new point refers to the same vehicle as the centroid of the previous scene, and the position of the vehicle is updated with the new values. Otherwise, the new centroid is defined as a new vehicle and stored in the list.

To count the vehicles, a counting point mark is inserted into the frame. This mark determines a linear position in the frame, and each vehicle that crosses it must be counted. The count is determined from the position of the centroid in frame f relative to the defined point.
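The size filter of Equation 1 and the centroid comparison of Equation 2 can be sketched in a few lines. This is a minimal illustration, not the implementation used in this work. The names `is_vehicle`, `same_vehicle` and `track_and_count`, the values of `min_side` and `line_x`, the left-to-right direction of travel, the vertical counting line, the first-match association, and the use of `atan2` on image coordinates (where y grows downward, so angles increase clockwise on screen) are all assumptions.

```python
import math


def is_vehicle(width, height, min_side):
    """Equation 1: 1 if both sides reach the minimum rectangle size, else 0."""
    return 1 if width >= min_side and height >= min_side else 0


def centroid(rect):
    """Center of an (x, y, width, height) rectangle."""
    x, y, w, h = rect
    return (x + w / 2.0, y + h / 2.0)


def gamma(angle_deg):
    """Equation 2: distance threshold for a given angle in degrees."""
    return -0.008 * angle_deg ** 2 + 0.4 * angle_deg + 25


def same_vehicle(previous, current):
    """True if the two centroids are closer than the threshold gamma."""
    dx = current[0] - previous[0]
    dy = current[1] - previous[1]
    distance = math.hypot(dx, dy)  # Euclidean distance
    angle = math.degrees(math.atan2(dy, dx))  # in [-180, 180]
    return distance < gamma(angle)


def track_and_count(frames, min_side, line_x):
    """Count vehicles whose centroid crosses the vertical line x = line_x."""
    tracked = []  # last known centroid of each vehicle seen so far
    total = 0
    for rectangles in frames:
        for rect in rectangles:
            if not is_vehicle(rect[2], rect[3], min_side):
                continue  # too small: disregarded (Equation 1)
            point = centroid(rect)
            for index, previous in enumerate(tracked):
                if same_vehicle(previous, point):
                    if previous[0] < line_x <= point[0]:
                        total += 1  # the centroid crossed the counting mark
                    tracked[index] = point  # update the stored position
                    break
            else:
                tracked.append(point)  # no match: stored as a new vehicle
    return total


# Assumed data: one 30x30 vehicle moving right by 20 px per frame, plus a 5x5 noise blob.
frames = [
    [(50, 40, 30, 30), (0, 0, 5, 5)],
    [(70, 40, 30, 30)],
    [(90, 40, 30, 30), (0, 0, 5, 5)],
]
print(track_and_count(frames, min_side=20, line_x=100))  # 1
```

Matching each detection to the first stored centroid within γ keeps the sketch short; a nearest-neighbor choice would be a natural refinement when several vehicles are close together.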
Figure 4. Counting point marking.

Depending on the direction of the vehicle, the position of the centroid becomes greater or less than the counting point. When the centroid crosses the mark, the vehicle is added to the count. The vertical line in Figure 4 shows the counting point for the scene.

The region defined as a counting point must be perpendicular to the flow of vehicles so that all vehicles crossing the region can be recognized. The mark should preferably be placed at the end of the image (in the direction of vehicle flow) so that all vehicles have already been identified when they reach the counting point. After crossing this point, a vehicle is ignored because it has already been counted.

IV. EXPERIMENTS AND RESULTS

The images used in the experiments were obtained through the Camerite website (www.camerite.com), which gives access to several cameras around the world. The videos used come from cameras located in the city of Recife, Brazil. All selected videos were recorded during the day with good light intensity. The selected cameras are positioned diagonally, a factor that hinders the detection process since objects can be occluded. The best positioning is a vertical view of the street, which makes occlusion less likely.

The process described in the previous section was implemented in Python with the OpenCV (Open Source Computer Vision Library) library.

Initially, the video was sliced into several frames, with one capture approximately every 200 milliseconds. This value may vary according to the experiment, but it is important to note that this parameter directly influences the total execution time of the counting process and the computational cost it requires.

For the experiment presented, the algorithm counts all lanes of the street, as can be seen in Figure 5. However, for an intelligent traffic control process that uses the flow of vehicles as input data, it becomes interesting to perform the count taking into account the number of lanes on the street. This normalization by lanes is important because on very wide streets with many lanes, the flow of vehicles that can pass without causing congestion is greater than on a street with few lanes. That is, a street with more lanes will carry a larger number of vehicles, but this value does not necessarily imply congestion.

To perform the count by lane, the algorithm only needs to change the region of interest so that it is centered on a single lane and add a loop that counts all of them.

Figure 5. Source image without region of interest selection.

The region of interest was defined focusing on the central part of the route, so as to remove unnecessary information such as part of the traffic light structure, sky, sidewalks, and trees. The area must be selected very carefully to avoid losing information where the vehicles travel. Figure 6 shows the selection made in one of the experiments, relative to the whole scene captured in Figure 5.

Figure 6. Background according to selection of the region of interest.

Figure 7. Frame after morphological operations.

In the detection process, the functions used to remove the background and perform the morphological operations mentioned in the previous section were those contained in the OpenCV library for Python. The result is shown in Figure 7.
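The frame sampling and region-of-interest selection described in this section can be illustrated as follows. This is a minimal sketch under stated assumptions: `sampled_frame_indices` and `crop_region_of_interest` are hypothetical helper names, the 30 frames-per-second rate and the rectangle coordinates are example values not taken from the experiments, and frames are represented as plain nested lists rather than OpenCV images.

```python
def sampled_frame_indices(total_frames, fps, interval_s=0.2):
    """Indices of the frames kept when one frame is captured every `interval_s` seconds."""
    step = max(1, round(fps * interval_s))  # number of frames between two captures
    return list(range(0, total_frames, step))


def crop_region_of_interest(frame, x, y, width, height):
    """Keep only the rectangle of the frame where vehicles travel."""
    return [row[x:x + width] for row in frame[y:y + height]]


# One second of a hypothetical 30 fps video sampled every 200 ms.
indices = sampled_frame_indices(total_frames=30, fps=30)
print(indices)  # [0, 6, 12, 18, 24]

# Toy 4x5 frame whose pixel value encodes its position (row * 10 + column).
frame = [[r * 10 + c for c in range(5)] for r in range(4)]
roi = crop_region_of_interest(frame, x=1, y=1, width=3, height=2)
print(roi)  # [[11, 12, 13], [21, 22, 23]]
```

Counting by lane would then amount to calling the crop once per lane rectangle and running the detection and counting stages on each cropped sequence.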
Figure 8. Identification of vehicles after detection and counting phase.

Figure 8 presents the final result of the detection and counting, where each vehicle is contained in a rectangle and all its centroids over the course of the frames are marked. This prevents the same object from being counted more than once. However, due to occlusion, some objects may be mislabeled. The error is caused by the undue grouping of two or more objects or, in the opposite case, by the separation of a single object into several. The experimental results obtained were compared with the results obtained by [12] and [13], works that have similar objectives.

The process developed in [12] presents a good accuracy, according to its author. However, its approach to vehicle tracking may make counting objects more difficult. This problem was mitigated by the centroid mechanism discussed in this paper. In turn, the algorithm developed in [13] has an approach quite similar to the one presented in this work, since it uses centroids to track the vehicles, but it has little efficiency. Even when using videos with good lighting recorded in the morning, the accuracy in some experiments is below 50%. With such a low hit rate, it is unfeasible to use such a procedure to aid in the flow control of vehicles.

To demonstrate the operation of the algorithm, ten experiments were selected using a camera at Engenheiro Domingos Ferreira Avenue with videos lasting between thirty and sixty seconds, all in AVI format. The results obtained are presented in Table I.

Table I
EXPERIMENTS DONE WITH VIDEO OF THE ENGENHEIRO DOMINGOS FERREIRA AVENUE

Experiment   Actual amount   Amount detected
1            06              06
2            09              11
3            12              10
4            17              18
5            18              16
6            22              22
7            24              21
8            28              31
9            28              32
10           12              13

After these executions, the Wilcoxon signed-rank test was applied with a confidence level of 95% to validate whether the count performed by the algorithm is statistically equal to the number of vehicles present in the video. The result was positive: the number detected is statistically equal to the real number.

The graph in Figure 9 presents the result as a direct comparison between the values found in the count performed by the algorithm and the actual number of vehicles in the video.

Figure 9. Comparison of the quantity obtained with the process described and the actual quantity.

To demonstrate the functioning of the algorithm, more experiments were also carried out with cameras from other parts of the same city (Governador Agamenon Magalhães and Moreira e Silva avenues). All experiments were run three times to validate the count performed. The results obtained are presented in Table II. Based on the experiments performed on videos from various locations, the algorithm obtained an accuracy of well over 90%. For videos with good illumination and without occlusion of vehicles, the accuracy was equal to 100%.

Table II
RESULT FOR COUNTING WITH DIFFERENT FLOWS

Location                       Actual amount   Amount detected
Gov. Agamenon Magalhães Av.    09              10
Gov. Agamenon Magalhães Av.    14              14
Gov. Agamenon Magalhães Av.    27              24
Moreira e Silva Av.            15              15
Moreira e Silva Av.            21              21
Moreira e Silva Av.            25              21

In real-time applications, a very important factor is the computational cost. It is of the utmost importance that the application responds promptly, without affecting or delaying the performance of the system/application as a whole. To present the computational cost, the time in seconds for the experiments is listed in Table III. The experiments were run on a laptop with the Ubuntu operating system version 16.04, an Intel Core i5 processor, 8GB of RAM and an NVIDIA GeForce 820M graphics card with 2GB.

Table III presents the average computational cost obtained for the processing of each frame. Observing the table, it is possible to verify that the highest computational cost is for the morphological operations. The processor usage averaged 12.9% with a standard deviation of 1.41. Another relevant piece of information is the total execution time required to process a video.
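As a back-of-the-envelope reading of the per-frame cost reported in Table III, the sketch below derives the frame rate that cost would allow and the fraction of each 200-millisecond capture interval it would occupy. The helper names are illustrative, and the calculation assumes that the frame-processing cost in Table III is the only work done per captured frame, which is a simplification.

```python
# Values copied from Table III (seconds per frame).
FRAME_COST_S = 1.59e-2
STAGE_COSTS_S = {
    "morphological operations": 8.41e-4,
    "vehicle detection": 4.38e-4,
    "vehicle count": 2.74e-4,
}
CAPTURE_INTERVAL_S = 0.2  # one frame roughly every 200 ms (Section IV)


def max_frames_per_second(frame_cost_s):
    """Upper bound on the frame rate if each frame costs `frame_cost_s` seconds."""
    return 1.0 / frame_cost_s


def busy_fraction(frame_cost_s, capture_interval_s):
    """Share of each capture interval spent processing one frame."""
    return frame_cost_s / capture_interval_s


most_expensive_stage = max(STAGE_COSTS_S, key=STAGE_COSTS_S.get)
print(most_expensive_stage)                               # morphological operations
print(round(max_frames_per_second(FRAME_COST_S), 1))      # 62.9
print(busy_fraction(FRAME_COST_S, CAPTURE_INTERVAL_S) < 1)  # True
```

Under these assumptions the per-frame cost is well below the capture interval, which is consistent with the real-time claim made for the algorithm.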
Table III
COMPUTATIONAL COST

Stage                       Cost
Morphological Operations    8.41×10⁻⁴ s
Vehicle Detection           4.38×10⁻⁴ s
Vehicle Count               2.74×10⁻⁴ s
Frame Processing            1.59×10⁻² s

For a 60-second video, this total execution time was on average 17.1 seconds when the 200-millisecond interval between frame captures was not used. The algorithm is therefore computationally efficient enough to be used in real time.

Figure 10. Vehicle occlusion.

The task of counting vehicles is not trivial, since several factors hamper image processing, among them noise, luminosity variation, climate and, mainly, occlusion. The latter is quite complicated to deal with, as Figure 10 exemplifies. In this case, the presence of a truck hides other cars that are in the street, so the algorithm is not able to identify them. Some papers propose alternatives to mitigate this problem: [14] presents a mechanism that combines windshield detection and the grouping of points, and [15] presents a geometric process based on the vertices that form the vehicle. However, the computational cost of these techniques is greater than that of the approach presented in this paper.

The problems mentioned generate the counting errors presented in Tables I and II; for example, in the last experiment presented for Moreira e Silva Avenue, four vehicles were not detected.

V. CONCLUSION

This paper presented a mechanism to provide the input data for a traffic control system: the density of vehicles in a street. Once in possession of this information, it becomes possible to evaluate the flow and to effectively control the opening time of the traffic lights, avoiding traffic jams. The implemented algorithm counts vehicles in a simplified way and without requiring much computational power.

During the implementation process, some limitations were found in the presented methodology. Among them, the ones with the greatest impact were related to the light density and the stabilization of the camera. These two characteristics directly compromise the detection of the vehicle, as they affect the operations performed in the video processing. An example of this is the removal of the background in each frame.

As an improvement to the presented process, counting must be made to work under light variation and camera instability, since it is important to count vehicles regardless of the brightness of the scene and climate variation. This paper represents the initial step towards the development of effective traffic control systems that use image processing mechanisms to obtain traffic density.

REFERENCES
[1] M. Heinen, C. Sá, F. Silveira, C. Cesconetto, and G. Sohn, “Controle inteligente de semáforos utilizando redes neurais artificiais com funções de base radial,” Encontro Anual de Tecnologia da Informação e Semana Acadêmica de Tecnologia da Informação. Frederico Westphalen/RS, pp. 38–45, 2013.
[2] C. N. de Trânsito, Manual Brasileiro de Sinalização de Trânsito, 2007.
[3] D. N. de Trânsito, Manual de Semáforos, 2nd ed., 1984.
[4] T. Moranduzzo and F. Melgani, “Automatic car counting method for unmanned aerial vehicle images,” IEEE Transactions on Geoscience and Remote Sensing, vol. 52, no. 3, pp. 1635–1647, 2014.
[5] P. R. M. Barcellos, “Detecção e contagem de veículos em vídeos de tráfego urbano,” 2014.
[6] E. Mitsakis, J. M. Salanova, and G. Giannopoulos, “Combined dynamic traffic assignment and urban traffic control models,” Procedia-Social and Behavioral Sciences, vol. 20, pp. 427–436, 2011.
[7] J. M. Rizwan, P. N. Krishnan, R. Karthikeyan, and S. R. Kumar, “Multi layer perception type artificial neural network based traffic control,” Indian Journal of Science and Technology, vol. 9, no. 5, 2016.
[8] C. M. Bautista, C. A. Dy, M. I. Mañalac, R. A. Orbe, and M. Cordel, “Convolutional neural network for vehicle detection in low resolution traffic videos,” in Region 10 Symposium (TENSYMP), 2016 IEEE. IEEE, 2016, pp. 277–281.
[9] B. Setiyono, D. S. Ratna, I. Mukhlash, and R. J. Augusta, “A new approach algorithm for counting of vehicles moving based on image processing,” International Journal of Computer Science and Information Security, vol. 14, no. 10, p. 366, 2016.
[10] G. Buttazzo, Hard real-time computing systems: predictable scheduling algorithms and applications. Springer Science & Business Media, 2011, vol. 24.
[11] J. Y. Gil and R. Kimmel, “Efficient dilation, erosion, opening, and closing algorithms,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 12, pp. 1606–1617, 2002.
[12] M. Daigavane and P. Bajaj, “Real time vehicle detection and counting method for unsupervised traffic video on highways,” IJCSNS, vol. 10, no. 8, p. 112, 2010.
[13] M. B. Subaweh and E. P. Wibowo, “Implementation of pixel based adaptive segmenter method for tracking and counting vehicles in visual surveillance,” in Informatics and Computing (ICIC), International Conference on. IEEE, 2016, pp. 1–5.
[14] J. Yang, Y. Wang, A. Sowmya, Z. Li, B. Zhang, and J. Xu, “Feature fusion for vehicle detection and tracking with low-angle cameras,” in Applications of Computer Vision (WACV), 2011 IEEE Workshop on. IEEE, 2011, pp. 382–388.
[15] C. C. C. Pang, W. W. L. Lam, and N. H. C. Yung, “A method for vehicle count in the presence of multiple-vehicle occlusions in traffic images,” IEEE Transactions on Intelligent Transportation Systems, vol. 8, no. 3, pp. 441–459, 2007.
