Tracking and Counting of Vehicles For Flow Analysis From Urban Traffic Videos
Abstract—Among the major problems faced by urban centers is traffic congestion. This problem stems from the growing number of vehicles on the streets and has already become the subject of several studies seeking solutions to it. Among the mechanisms that allow congestion reduction is traffic control, which requires metrics that enable traffic analysis in real time. To determine the flow of vehicles, the most widely used mechanism is counting the occurrences of vehicles on a street, which is usually performed with sensors (e.g., magnetic or thermal). However, these approaches have a rather high installation and maintenance difficulty. Thus, the objective of this paper is to present a mechanism capable of counting from video images. To accomplish this task, image processing resources that do not require large computational power are used, allowing the mechanism to be easily coupled to common transit systems. The result obtained has an accuracy of more than 90% in videos from urban traffic cameras.

Key words—Vehicle detection, vehicle count, image processing.

I. INTRODUCTION

Transportation has always been an extremely relevant aspect for society; much of humanity's progress is due to mobility, which is often treated as a synonym for progress itself. Mobility problems, however, have increased considerably, and problems related to traffic congestion have become a major concern for urban centers [1].

Due to the huge fleet of vehicles we currently have, there is an unavoidable interest in reducing the effects of congestion. In the major urban centers, it is evident that the demand for services, commerce, and personal travel grows year by year, placing unprecedented challenges on the area of transportation.

The mechanism that allows a mitigation of this massive problem is traffic control. This process maximizes the vehicle flow on the roads, allowing shorter travel times, which implies a reduction of congestion. Vehicle flow refers to the number of vehicles passing a particular route over a period [2]. According to DENATRAN (Brazilian National Department of Transit), the leading cause of congestion is the intersection of streets. The traffic light is the main mechanism used to control traffic at these intersections [3].

Urban traffic control has been evolving over the years, and we are now in a generation where real-time traffic data collection is essential [4]. Based on the density of vehicles in the street, it becomes possible to develop a mechanism that effectively reduces congestion. From the number of vehicles that travel on a particular route, it becomes possible to obtain predictions of traffic jams, which makes it possible to take actions such as a better adjustment of a traffic light's timing.

To obtain information about the flow of vehicles in real time, it is necessary to use substantial computational power together with images obtained by cameras. The use of simpler sensors, such as magnetic sensors, has become unfeasible these days, mainly due to the difficulty inherent in their installation and maintenance [5]. Video cameras, which are much easier to handle, significantly increase the amount of information that can be obtained, but this requires efficient systems so that the processing remains useful when executed in real time [6].

The essential information for the process of traffic control is the flow of vehicles [7]. In this way, this paper presents an approach to be executed in real time that makes it possible to obtain the count of vehicle occurrences. This information, when processed, has great value for traffic control.

II. RELATED WORK

When attempting to mitigate traffic jam problems, the first point to be addressed is the density of vehicles. A mechanism is presented in [7] in which, through the use of magnetic sensors, it becomes possible to count the quantity of cars that pass on a street. In addition to counting, based on the magnetic field obtained by the sensors, the vehicles are classified as bus, car, motorcycle, or bicycle.

Based on images obtained through a UAV (Unmanned Aerial Vehicle), [4] counts vehicles in a scene. The process begins with a screening of the paved areas to restrict the search space. The next step is a feature extraction process based on scalar transformations. The information obtained is passed to an SVM (Support Vector Machine) that can inform the number of vehicles in the scene.

A video-based approach is presented in [8], where the video is sliced into frames and a mask is applied to extract parts of each frame. Each slice obtained is used as input to a CNN (Convolutional Neural Network) so that it can identify and classify the vehicles. Such an approach is interesting because it has proved quite robust for identifying vehicles in a scene. Also using videos, [5] presents a method for detecting and counting vehicles where the process is based on particle clustering, so that each set of particles in a video represents a vehicle. Despite achieving good results, this method requires a great deal of computational effort, a scarce resource for traffic monitoring devices.

In [9], a real-time vehicle counting technique using a GMM (Gaussian Mixture Model) is presented to better adapt the algorithm to the brightness of the scene.
In this first stage of the process, the video is cut into several frames. This cut must be done in such a way that a sequence of images is obtained that allows the motion detection of the objects contained therein.

Initially, the region of interest is selected, and information such as trees and sidewalks is ignored. The sequence of steps starts with the selection of a frame to be used as the background; by default, the first frame obtained from the video is used, but, for greater accuracy, this image can be selected manually. Background elimination takes place in three stages. In the first step, the difference between two frames, f and f+1, is calculated; then the difference is compared; and finally, the pixels that have the same value in the difference calculation are eliminated. The image resulting from the background removal is then converted to grayscale. At that moment, the vehicle is represented by a set of pixels that move coherently, that is, a region that has a color different from the darker background or vice versa. Figure 2 presents a vehicle on which the background removal operation was performed.

In the following steps, only the foreground is used, since the relevant information at that point consists of the objects obtained after the background removal. The next step is the application of morphological operations in the object segmentation process. The first applied function is Closing, so that small breaks in the contours of the vehicles are corrected; then Opening is applied, so that the contours of the objects are smoothed; and finally the Dilation operation is carried out to smooth curves and correct imperfections [11].

Figure 2. Frame after morphological operations.
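As an illustration of the steps above, the background removal and morphological clean-up can be sketched with OpenCV in Python. The function name, the binarization threshold, and the 5×5 structuring element are assumptions made for the example, not parameters taken from this work.

```python
import cv2

def segment_foreground(background, frame, diff_threshold=30):
    """Rough sketch of the pre-processing described above: frame
    differencing against a reference background, grayscale conversion,
    binarization, and morphological clean-up (Closing, Opening,
    Dilation). Parameter values are illustrative."""
    # Difference between the background frame and the current frame;
    # pixels with (near) identical values are removed by thresholding.
    diff = cv2.absdiff(background, frame)
    gray = cv2.cvtColor(diff, cv2.COLOR_BGR2GRAY)
    _, mask = cv2.threshold(gray, diff_threshold, 255, cv2.THRESH_BINARY)

    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (5, 5))
    # Closing corrects small breaks in the vehicle contours.
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)
    # Opening smooths the object contours.
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
    # Dilation smooths curves and corrects remaining imperfections.
    mask = cv2.dilate(mask, kernel, iterations=1)
    return mask
```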
After the segmentation of the objects, the contours are identified in the image, and for each object contained in the scene a minimum rectangle is inserted; in this way, each rectangle obtained is the representation of a vehicle. Equation 1 checks whether the selection of a given object can be interpreted as a vehicle. If the result is equal to 1, the object is considered a vehicle; if the result is 0, the object is disregarded. A minimum rectangle size is defined so that it is possible to identify valid objects in the image; that is, even if an object is enclosed in a rectangle, it is only considered a vehicle if the rectangle containing it has height and width greater than or equal to the threshold. The value set for this threshold must be carefully checked and tested, because the distance from the camera to the object directly influences its height and width in the image, often causing vehicle identification errors.

1, if objectSize ≥ minimumRectangle
0, if objectSize < minimumRectangle     (1)

If the minimum rectangle is set to a very small size, a single vehicle may be identified in several parts, each inserted into its own rectangle, causing a single vehicle to be counted as many. On the other hand, if it is set to a very large size, vehicles such as motorcycles can be ignored, also causing a counting error. The size of the minimum rectangle is relevant enough that it could be used to classify the vehicle as a bus, car, or motorcycle, for example; however, this procedure will not be addressed in this paper. Figure 3 shows the result obtained with the mentioned steps for detection.

Figure 3. Result of the detection process.
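A minimal sketch of the rectangle filtering expressed by Equation 1, again assuming OpenCV; the contour extraction call follows the OpenCV 4 signature, and the 40-pixel minimum size is only illustrative.

```python
import cv2

def detect_vehicles(mask, min_width=40, min_height=40):
    """Sketch of the filtering from Equation 1: every contour gets a
    bounding rectangle, and only rectangles whose width and height reach
    the minimum size are kept as vehicles. The 40-pixel minimum is an
    illustrative value; in the paper it is tuned to the camera distance."""
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    vehicles = []
    for contour in contours:
        x, y, w, h = cv2.boundingRect(contour)
        if w >= min_width and h >= min_height:  # Equation 1 evaluates to 1
            # Centroid of the rectangle, used later by the counting step.
            vehicles.append((x + w // 2, y + h // 2))
    return vehicles
```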
B. Counting

The counting process takes as input the objects found in the detection phase. Valid rectangles are taken as vehicles in the scene, and from that point the centroid of each object is calculated. This centroid will represent the vehicle in the scene.

In the first scene, each of the centroids found is assumed to be a vehicle and is stored in a list. From the second scene onwards, it is necessary to compare the vehicles stored in the list with those found in the current frame; this comparison is necessary so that the same vehicle is linked across different frames. Because it is a video, the same vehicle appears in several scenes, and this fact must be treated so that the same vehicle is not counted several times.

The comparison of the vehicles stored in the list with those identified in the new scene is performed based on the positions of the centroids in frames f and f-1. The comparison takes into account the distance between the two points and the angle formed between them. The distance is obtained with the canonical form of the Euclidean distance, √((x1 − x2)² + (y1 − y2)²), and the angle is defined between -180° and 180°, increasing clockwise. The inclusion of these two parameters is of paramount importance so that vehicles that occasionally change tracks or pass at high speed are not lost.

γ = −0.008 ∗ angle² + 0.4 ∗ angle + 25     (2)

After obtaining the distance between the centroids and the angle formed between them, it is necessary to use a threshold to check whether the two compared centroids represent the same vehicle or different vehicles. The value of this threshold should be cautiously defined, because the distance of the camera to the scene and the angle at which the image is obtained can influence these parameters. Based on tests, the threshold γ used in this work is given by Equation 2. The values used presented good results with the videos that compose the tested database, which, in turn, consists of videos where the camera is positioned diagonally with respect to the street.

After computing γ, it is evaluated whether the calculated distance between the points is less than the value obtained. If the distance at time t is less than this limit, the new point refers to the same vehicle as the centroid of the previous scene, and the position of the vehicle is updated with the new values. Otherwise, the new centroid is defined as a new vehicle and stored in the list.

To perform the counting of vehicles, a counting point mark is inserted into the frame. This point determines a linear position in the frame, and each vehicle that crosses that demarcated region must be counted. The count is based on the position of the centroid in frame f relative to the defined point: the centroid being greater or less than the mark indicates on which side of it the vehicle is, and the vehicle is counted when this relation changes between consecutive frames.
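The matching and counting logic described in this subsection can be summarized with the sketch below. The dictionary of tracked centroids, the greedy nearest-match strategy, and the horizontal counting line are assumptions made for the sketch, not structures prescribed by the paper; the γ expression itself follows Equation 2.

```python
import math

def gamma_threshold(angle_deg):
    """Threshold from Equation 2; angle_deg is assumed to be given in
    degrees, in the -180..180 range used by the paper."""
    return -0.008 * angle_deg ** 2 + 0.4 * angle_deg + 25

def update_tracks(tracked, detections, count_line_y, counted):
    """Minimal sketch of the matching and counting step. `tracked` maps a
    vehicle id to its last centroid, `detections` holds the centroids of
    the current frame, `count_line_y` is the vertical position of the
    counting mark, and `counted` is the running total."""
    next_id = max(tracked, default=0) + 1
    for (cx, cy) in detections:
        best_id, best_dist = None, float("inf")
        for vid, (px, py) in tracked.items():
            dist = math.hypot(cx - px, cy - py)  # Euclidean distance
            # Angle between the two centroids, in degrees (sign convention
            # is an assumption of this sketch).
            angle = math.degrees(math.atan2(cy - py, cx - px))
            if dist < gamma_threshold(angle) and dist < best_dist:
                best_id, best_dist = vid, dist
        if best_id is None:
            tracked[next_id] = (cx, cy)  # below no threshold: new vehicle
            next_id += 1
        else:
            px, py = tracked[best_id]
            # Count the vehicle when its centroid crosses the counting mark.
            if (py < count_line_y) != (cy < count_line_y):
                counted += 1
            tracked[best_id] = (cx, cy)  # same vehicle: update position
    return counted
```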
On streets with many tracks, the flow of vehicles that can pass without resulting in congestion is greater than on a street with few tracks. That is, a street with more tracks will have a larger number of vehicles, but this value does not necessarily imply congestion.

The algorithm developed to perform the count per track simply changes the region of interest so that it covers a single track at a time and adds a loop that counts over all of them.
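To prototype this per-track variant, the detections could be bucketed by a per-track region of interest before the counting logic is run independently on each bucket, as in the hypothetical sketch below; the (x_min, x_max) limits per track are an assumption of the example.

```python
def split_by_track(detections, track_bounds):
    """Illustrative sketch for the per-track count: each detected centroid
    is assigned to the track whose horizontal region of interest contains
    it, so the counting step can then run once per track.
    `track_bounds` is an assumed list of (x_min, x_max) pairs."""
    per_track = [[] for _ in track_bounds]
    for (cx, cy) in detections:
        for i, (x_min, x_max) in enumerate(track_bounds):
            if x_min <= cx < x_max:
                per_track[i].append((cx, cy))
                break
    return per_track
```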