
JST: Smart Systems and Devices

Volume 33, Issue 1, January 2023, 043-053

Deep Learning Based Vehicle Speed Estimation on Highways


Nguyen Thi Thu Hien1, Tran Thi Hien2, Le Dinh Chung3, Tien Dzung Nguyen1*
1 School of Electrical and Electronic Engineering, Hanoi University of Science and Technology, Ha Noi, Vietnam
2 Nam Dinh University of Technology Education, Nam Dinh, Vietnam
3 Department of Military Medicine, General Department of Logistics, Ministry of National Defence, Ha Noi, Vietnam
* Corresponding author email: [email protected]

Abstract
Traffic management is always a matter requiring the attention of highway system managers in terms of vehicle monitoring and speed estimation. This paper proposes an efficient deep learning-based vehicle speed estimation method for highway lanes in the Vietnamese transport system. The input videos are recorded by fixed surveillance cameras. An optimized single shot multibox detector network, called SSD, is utilized for vehicle license plate detection (LPD). The deep SORT (simple online and real-time tracking) model is then applied to track vehicles in the video, operating only within the detected license plate area. This tracking process measures the traveling distance of a vehicle in order to estimate its speed. In this study, the dataset has been normalized to improve the efficiency of vehicle localization and tracking, and thereby to reduce the time needed to estimate the distance traveled by vehicles on highways. The results show that the proposed system achieves good accuracy, with speed errors ranging within [-1.5, +1.1] km/h, equivalent to 98% of the error limit set by the regulation in Viet Nam.
Keywords: Vehicle detection, speed estimation, feature extraction, LPD-SSD network, SORT model

1. Introduction

The rapid development of intelligent transportation systems (ITS) has played an important role in intelligent traffic monitoring, management, dynamic information, and vehicle control services. Various methods have been introduced for monitoring vehicles on the road in domestic and international studies. The proposed methods of measuring vehicle speed using improved image processing technologies have significant speed errors for vehicles moving at high speeds.

Video-based vehicle speed measurement has recently been receiving more and more research interest because it is suitable for stealth speed measurement and has a low cost. It can also provide vehicle speed determination and identification information in the same video at the same time. Among video-based vehicle speed estimation methods, one family is feature-based, identifying vehicles in video frames by their visual characteristics; another is motion-based, locating vehicles by optical flow. The common point of these methods is to calculate the traveling distance of a vehicle detected in the surveillance videos, and thus estimate the average speed of the vehicle.

The speed can also be estimated through 3D reconstruction and visualization for transportation detection. One such method solves the speed detection of simultaneously moving vehicles with the lowest root mean square error (RMSE). However, it uses perspective projection and requires vehicles to move in a straight line, at a constant speed, and parallel to the lane under testing. Vehicle speed determination investigates the captured stereo video frames, where features are extracted to identify the vehicle for estimation of the depth map, and then the vehicle speed is measured. The superior advantage of this method is that it handles speed estimation when vehicles move in a curvilinear structured environment. However, stereo image tracking is accomplished through particle filtering, which relies on manual setting of the vehicle's features. A one-time manual setup is required per feature; meanwhile, various vehicles may have different features, leading to more complex cases. Therefore, multiple manual settings are needed if several vehicles appear in the given image frames for speed measurement. Several methods focus on motion-based speed detection that subtracts a static background from a video frame for vehicle detection, depth calculation, and speed measurement. However, these approaches face difficulty in distinguishing vehicles from other moving objects detected in a video sequence. The study in [1] proposes an intelligent system with integrated vehicle tracking based on stereo motion estimation to achieve accurate speed determination. However, this tracking uses speeded-up robust features (SURF) [2] to perform license plate tracking in consecutive frames. In fact, up to this point the deep SORT [3] tracking model demonstrates higher efficiency than SURF in terms of speed detection.

ISSN: 2734-9373
https://doi.org/10.51316/jst.163.ssad.2023.33.1.6
Received: June 29, 2022; accepted: September 28, 2022


Fig. 1. Overall system of the proposed deep learning-based vehicle speed estimation

The study in this paper proposes a speed estimation approach utilizing a deep learning algorithm, as demonstrated in Fig. 1. The input video is captured by a surveillance camera. The LPD-SSD license plate detection network is recommended for synchronous extraction of typical features from the localized license plate, and the vehicle tracking in the video using the SORT network is performed only in the license plate area.

2. Related Works

Most recently, A. S. Gunawan et al. measured vehicles' speed using a single camera [4]. In their proposed algorithm the road image was mapped using the direct linear transformation (DLT) method. A vehicle passed a straight line limited by four points, and the camera measured the movement of the vehicle. The positions of the four points were used to map their data from the image plane to the road plane. Finally, after calculating the vehicle displacement on the ground plane and the time interval between the frames, they calculated the vehicle speed. The reported accuracy is 96.14%. J. Sochor et al. also used a single camera for speed measurement [5]. Vehicle tracking was performed using optical flow. They considered two reference start and end lines on the road plane and counted the number of video frames in which the vehicle was present between these reference lines. The measured speed has an average error of 0.63 km/h with a standard deviation of 4.5 km/h. M. G. Moazzam et al. measured the speed of vehicles by calculating the number of frames it takes for the vehicle to travel the distance between two specified lines on the road plane [6]. The average calculation error is 10%. A new method was presented by A. Tourani et al. based on measuring the vehicle displacement vector on the image plane in terms of pixels per frame [7]. They converted the velocity vector into km/h with two distinct pixels-to-meters and frames-to-seconds transfer functions. Their reported speed measurement accuracy is 94.8%.

Commonly, vehicle speed estimation methods are classified into two main categories: binocular stereo camera-based and monocular single camera-based.

Accurate vehicle feature detection in video frames is a prerequisite for video-based vehicle speed measurement. The license plate has a regular appearance, uniform contour, and relatively rich texture details, and can thus be easily detected. The license plate is also unique and suitable for the synchronization of speed measurement and information identification in the future. Many existing traffic surveillance systems record license plates of vehicles in traffic violations, and the infrastructure is already available. Although the license plate is not attached to the ground plane, the 3D spatial position of the license plate center can be directly calculated using our binocular stereovision system. The relative 3D displacements can be obtained. Accordingly, accurate vehicle speed can be derived. Therefore, we choose the license plate as the object to be detected in our system. Traditional video object detection methods include background modeling, frame difference, and optical flow. The two former methods are based on video processing. They are suitable for detecting moving objects against a fixed background. However, a license plate is always a part of a moving vehicle and cannot be detected separately. Meanwhile, the latter method can track objects against a fixed or moving background. It can also track a license plate as a separate part, but it cannot detect the license plate


without manual help. Morphological methods are widely used in image object detection by relying on color, edge, shape, and texture attributes extracted with the Canny detector, Sobel operator, template matching, conditional random fields, or wavelets. An edge operator is used to extract the vertical edges of the license plate, and edge density information is utilized to detect it. A rectangular sliding window is adopted to detect image regions of high gradient density, such as license plates. All these morphological methods are cumbersome, time consuming, and unsuitable for LPD against a complex background.

Modern object detection methods have developed with the progress of convolutional neural networks (CNNs) [8]. Since AlexNet won ILSVRC 2012 by a notable margin, object detection methods with CNNs have received considerable attention. CNN-based object detection methods can detect various objects accurately and intelligently with the trained networks and models. The R-CNN series (e.g., R-CNN, Fast R-CNN, and Faster R-CNN) algorithms are typical examples of two-stage algorithms. They have high accuracy but need long computation time. You-only-look-once (YOLO) and SSD are typical examples of one-stage algorithms that perform faster than the two-stage ones. For example, the detection speed of SSD [9] on the VOC2007 dataset can reach 59 frames per second (FPS), whereas that of Faster R-CNN can only reach 7 FPS. All these CNN-based object detection methods are suitable for LPD: a CNN can be trained and fine-tuned to detect license plates, a CNN-based MD-YOLO framework has been proposed for multidirectional LPD, and region-based convolutional neural networks have been trained to detect license plates.

The next part of the recommended system is vehicle tracking. Deep SORT was developed by Nicolai Wojke and Alex Bewley [3] shortly after SORT [10] to address the shortcomings associated with a high number of ID switches. The solution proposed by deep SORT is based on using deep learning to extract features of objects to increase accuracy in the data association process. In addition, a linking strategy called Matching Cascade was also built to help re-link objects more effectively after they disappear for a while.

3. Proposed Tachometer System

In the proposed vehicle tachometer system, the input monitoring video source is obtained from the industrial surveillance camera system XNP-6550RH/VAP located on the Phap Van - Cau Gie highway with the following parameters: Sony ICX445 CCD camera, 1/3 inch, 1288x964 maximum resolution, effective 1.25 Mpixels, 32 MB onboard buffer, and 512 KB data flash memory for data acquisition. The local server system is equipped with a 6-core Intel E5-2620 v3 CPU @ 2.40 GHz, 32 GB RAM, an Nvidia GeForce GTX 1080 8 GB standalone graphics card, and a 1 TB solid-state drive for data storage, plus a laptop with a Core i7 CPU, 8 GB RAM, and an Nvidia GeForce 830M 2 GB discrete graphics card.

3.1. Vehicle Feature Extraction

The proposed system is implemented by utilization of the algorithms mentioned above, i.e., vehicle feature detection and vehicle tracking, in combination with vehicle speed estimation. In vehicle feature detection, an LPD-SSD model is trained based on an optimized SSD network. The trained model is then used to detect all license plates in the captured video. For vehicle tracking, the detected license plate areas in consecutive frames are matched to investigate tracking independency between the license plates under consideration. Finally, the license plates in the frame pairs are matched to extract all matching-point pairs, which are utilized for estimation of the vehicle speed. In the tachometer, the ratio between the distance in meters and the distance in pixels has been pre-determined. The points closest to the center of the license plate are chosen as the exact coordinates in the current frame pair, the distance the vehicle has traveled in a given period (i.e., a frame interval or several frame intervals) is determined, and then the vehicle speed is estimated from the detected distance over time.

Fig. 2 describes the processing steps in vehicle license plate detection using LPD-SSD.

Fig. 2. Flowchart of vehicle license plate detection.
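The detect-then-track-then-estimate flow of Section 3.1 can be summarized in code. Below is a minimal sketch with stand-in stubs: `detect_plate` and the frame format are hypothetical placeholders, not the paper's actual LPD-SSD interface.

```python
# Minimal sketch of the detect -> track -> estimate pipeline described above.
# detect_plate is a hypothetical stand-in for LPD-SSD, not the real model.

def detect_plate(frame):
    """Stand-in for LPD-SSD: returns the plate bounding box (x, y, w, h)."""
    return frame["plate_bb"]  # assumed pre-computed for this sketch

def bb_center(bb):
    x, y, w, h = bb
    return (x + w / 2.0, y + h / 2.0)

def estimate_speed(frames, meters_per_pixel, fps):
    """Average speed (km/h) from plate-center displacement across frames."""
    centers = [bb_center(detect_plate(f)) for f in frames]
    dist_px = sum(
        ((x2 - x1) ** 2 + (y2 - y1) ** 2) ** 0.5
        for (x1, y1), (x2, y2) in zip(centers, centers[1:])
    )
    dt = (len(frames) - 1) / fps          # elapsed time in seconds
    return dist_px * meters_per_pixel / dt * 3.6  # m/s -> km/h

# Example: plate moves 60 px/frame over 3 frames at 30 fps, 0.01 m/px
frames = [{"plate_bb": (100 + 60 * k, 200, 40, 12)} for k in range(3)]
print(round(estimate_speed(frames, 0.01, 30.0), 2))  # 64.8
```

In the real system the pixels-to-meters ratio comes from the camera calibration of Section 3.4; here it is a fixed constant for illustration.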


Fig. 3. Dataset before and after data augmentation

3.1.1. Data augmentation

This is a technique applied in the dataset setup and normalization processes to enhance data quality. Currently, in deep learning, data augmentation plays a very important role: when the data set is small or lacks diversity, it is difficult to train a good model for future prediction. As illustrated in Fig. 3, a data augmentation technique is adopted to diversify and enrich the datasets for improvement in prediction efficiency through the trained model. Data augmentation helps fix the "not enough data" problem, prevents overfitting, and moreover makes the model perform better on never-before-seen samples. In addition, no additional effort is required in data collection or data labeling, which may otherwise incur higher cost or be impractical.

The following image and signal processing algorithms are applied to increase dataset diversity:
• Image cropping, rotation, and flipping;
• Blur and sharpen filtering;
• Affine transformation: preserves parallel lines in an image;
• Noise addition or removal: such as salt-and-pepper noise, Gaussian noise;
• Color shifting: lighting or contrast adjustment.

Fig. 4 illustrates the diversifying process mentioned above to set up the datasets, where the cropped images containing the license plate area have been resized to a given original size, followed by a brightness adjustment stage, and then rotated to the horizontal direction.

Fig. 4. Example of calibration results of the augmented data. a) Original image. b) Resized image. c) Cropped image. d) Brightness adjusted image. e) Rotated image

3.1.2. Prepared data set

Fig. 5 illustrates examples of the dataset, which includes:
- 2500 images (4032x2268 resolution) with high resolution and image quality, captured by a Nikon D3200 SLR in parking lots and on streets at 5 - 30 m distance;
- 3000 images (1600x1200 resolution) taken from the traffic data set supported by the Voice of Vietnam Radio for research students;
- 2500 images (1920x1080 resolution) extracted from the campus security monitoring system of apartment buildings in Hanoi city, in which the vehicles can be far away from the camera and the number plate size is very small.

After cropping and resizing random images picked from the total of 8000 images above in the augmentation, the dataset size has been expanded by a factor of three, i.e., the augmented dataset finally includes 24000 images.

Fig. 5. Examples of the setup dataset
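The augmentation operations listed above can be sketched as simple array transforms. Below is a minimal sketch using NumPy (crop, flip, brightness shift, Gaussian noise); it is illustrative only and not the exact pipeline used by the authors.

```python
import numpy as np

rng = np.random.default_rng(0)

def crop(img, top, left, h, w):
    """Crop an h-by-w window starting at (top, left)."""
    return img[top:top + h, left:left + w]

def hflip(img):
    """Horizontal flip (mirror) of the image."""
    return img[:, ::-1]

def brightness(img, delta):
    """Shift brightness by delta, clipped to the 8-bit range."""
    return np.clip(img.astype(np.int16) + delta, 0, 255).astype(np.uint8)

def gaussian_noise(img, sigma):
    """Add zero-mean Gaussian noise with standard deviation sigma."""
    noisy = img.astype(np.float64) + rng.normal(0.0, sigma, img.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)

def augment(img):
    """Produce three variants per source image (x3 expansion, as in the text)."""
    h, w = img.shape[:2]
    return [
        crop(img, h // 4, w // 4, h // 2, w // 2),
        hflip(brightness(img, 30)),
        gaussian_noise(img, 10.0),
    ]

img = rng.integers(0, 256, size=(64, 96), dtype=np.uint8)  # dummy grayscale frame
variants = augment(img)
print(len(variants), variants[0].shape)  # 3 variants; the crop is 32x48
```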


3.2. Application of the LPD-SSD Based Model

To detect vehicles from the surveillance video frames, vehicle features are first extracted using LPD-SSD as the network structure [1], as shown in Fig. 6.

Fig. 6. LPD-SSD network architecture

The LPD-SSD based model is trained with half of the augmented dataset, which consists of 24000 images captured from different angles. The other half is used to test the trained model, and its performance is recorded and assessed afterward. Another 600 license plate images that are independent of the augmented dataset are used to validate the final trained model, as demonstrated in Fig. 7.

After 20000 training iterations, the performance of the trained model has been recorded and is shown in Table 1, where the network trained on the augmented dataset demonstrates a superior accuracy of 98.2% compared to the first step of LPD-SSD studied in [1] (LPD-SSD1st) without augmentation. The real-time processing speed of the two models is equal, at 5.4 FPS. As the processing speed increases, the power consumption of the CPU and GPU in this system also increases correspondingly.

Table 1. The accuracy of the LPD-SSD network applied to the original and the augmented datasets respectively

Network                                Accuracy   FPS
LPD-SSD1st [1] without augmentation    97.8%      5.4
LPD-SSD with augmented dataset         98.2%      5.4

Fig. 7. License plates from the surveillance video stream on the Phap Van - Cau Gie highway detected by the LPD-SSD model (green) on the augmented datasets, captured at ranges between 5 m and 30 m in two-way lanes: a) license plate detected at 30 m distance; b) license plate detected at 5 m distance

Since the location of the mounted surveillance camera and its distance to the vehicles moving on highways are incredibly important parameters, the proper setup of the LPD-SSD network topology may help adapt this system to any transportation environment with an allowed speed under 80 km/h, which is typical in Vietnamese transportation.

3.3. Vehicle Tracking

Fig. 8 depicts the sequential steps in the vehicle tracking process using Deep SORT [3], and Fig. 9 introduces its state model accordingly.

Fig. 8. Vehicle tracking using Deep SORT


Fig. 9. State model of the Deep SORT

These steps are described in detail below:
- Step 1: Use LPD-SSD to detect objects in the current frame.
- Step 2: Deep SORT utilizes the Kalman filter to predict new track states based on past tracks. These states are initially marked as tentative. If a track keeps being matched over the next 3 frames, its state changes from tentative to confirmed, and the track is then kept for up to the next 30 frames. Since the frame rate of the surveillance camera applied in this problem is 30 fps, with speeds ranging from 50 km/h - 90 km/h (13.88 m/s - 25 m/s) and a possible detection range of about 25-30 m, the number of frames containing a vehicle can still guarantee no vehicle distortion. Conversely, if the tracking is lost in fewer than 3 frames, the track is removed from the tracker.
- Step 3: Using the confirmed tracks, introduce a matching cascade to associate all candidate detections based on distance and feature metrics such as the shape or the combined background colors of a given license plate.
- Step 4: Unlinked tracks and detections are passed to the next filter layer. Use the Hungarian algorithm to resolve the assignment problem with the IOU cost matrix for the second association.
- Step 5: Classify the candidate detections and tracks.
- Step 6: Use the Kalman filter to recalibrate the track values from the detections associated with each track and initialize new tracks.

Fig. 10. License plates tracked by Deep SORT

Fig. 10 demonstrates the tracked license plates from the video stream, showing both the LPD-SSD model bounding box (BB, light blue) and the Deep SORT model (dark blue) in combination.

3.4. Speed Estimation

3.4.1. Geometric model of the camera in the distance estimation problem

The essence of a camera is its lens system [11]. Fig. 10 demonstrates the position of an object according to the incident ray passing through O, which is the center of the lens. The image obtained on the screen reflects the object thanks to the optical phenomena in the lens, and is real and inverted with respect to the object. The closer the object is to the viewfinder, the larger its image. This viewfinder receives the lighting
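The tentative/confirmed lifecycle in Step 2 can be expressed as a small state machine. Below is a minimal sketch: the thresholds of 3 hits and 30 frames follow the text above, while the `Track` class itself is a simplification, not deep SORT's actual implementation.

```python
# Minimal track-lifecycle sketch for the Deep SORT states described in Step 2:
# a track starts as "tentative", becomes "confirmed" after 3 consecutive hits,
# and is "deleted" if lost while tentative or unmatched for over 30 frames.

N_INIT = 3    # consecutive hits needed to confirm a track
MAX_AGE = 30  # frames a confirmed track may stay unmatched

class Track:
    def __init__(self):
        self.state = "tentative"
        self.hits = 1
        self.misses = 0

    def update(self, matched):
        if matched:
            self.hits += 1
            self.misses = 0
            if self.state == "tentative" and self.hits >= N_INIT:
                self.state = "confirmed"
        else:
            self.misses += 1
            if self.state == "tentative" or self.misses > MAX_AGE:
                self.state = "deleted"

t = Track()
t.update(True)   # 2nd hit: still tentative
t.update(True)   # 3rd hit: confirmed
print(t.state)   # confirmed
t2 = Track()
t2.update(False) # lost while tentative: deleted
print(t2.state)  # deleted
```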


source passing through the lens system of the camera, and then an image reflecting the object is created on the screen at the focal length f. The pixel position (i, j) of a point reflected from the object onto the image plane can be evaluated from the object's coordinates in 3D space as in [12]:

    i = f·X/Z  and  j = f·Y/Z                                  (1)

where X, Y, and Z are the coordinates of the point in the OXYZ space, and f is the focal length of the camera lens system. The camera must be set up above the surface of the road with its optical axis inclined downward toward the roadway to cover the road plane. Since the proposed solution focuses on speed detection of vehicles on both highways and inner-city streets with a mixed traffic flow of motorcycles, cars, vans, and other means, each type of vehicle needs to be distinguished from the others in multiple road lanes. Therefore, this paper utilizes the direction angle (DA) of the first primary axis (FPA) for each coming vehicle detected in the video sequence captured by the surveillance camera mounted on the road [13]. Background subtraction is first performed on the captured frame sequence, then each vehicle is located and the DA of the FPA is evaluated for decision making in the identification task [10]. The study in [13] modelled the location of the surveillance camera installation on the road as shown in Fig. 11 below.

Fig. 11. A model of a surveillance camera calibration mounted on roads

Camera calibration is one of the important aspects of the study. The vehicle's location in video images is 2D, while vehicles in the real world are 3D; however, since vehicles cannot leave the road surface, vehicle motion is also 2D, which allows the vehicle's coordinate transformation to be formulated as a 2D-to-2D mapping. In this section, the mapping function between vehicle coordinates in the image and real-world coordinates is derived. How the video camera is installed when the video images are captured from the road traffic, and which characteristics are involved, must be determined. As shown in Fig. 11, the camera is set at a height h above the road surface with its optical axis sloped at an angle δ from the road. The relation between the camera lens angle and the view domain covered by the camera can be determined using basic geometrical equations. In short, the camera calibration parameters are set as follows:

1) The camera elevation angle range is α towards the vertical axis, and α + δ is the maximum adjustable elevation angle of this camera;

2) An image I is obtained at a resolution of m × n pixels;

3) The camera is located at the height h evaluated from the road surface.

The camera's viewing angle along the Ox-axis direction intersects the road surface at point C, which is always set up as the center of the captured image, since point C corresponds to the center of the camera's image sensor. Point L is the closest position on the road that the camera can capture. Considering those parameters in this model setup, the distance from the camera to the object can be evaluated.

If O' is assumed to be the center of the camera's image sensor, the angle ∠O'OA between OO' and a given pixel A(i, j) on the image plane I of size m × n may be determined as:

    ∠O'OA = arctan( √((i − m/2)² + (j − n/2)²) / f )           (2)

Utilizing the properties of circles, rectangles, and trigonometry, all pixels in the image plane I can be mapped to the angle range scanned by the camera.

Since this problem focuses only on the evaluation of vehicle speed, a vehicle is assumed to move in a straight direction, and therefore only the pixels along the horizontal frame boundary are considered to determine the movement distance of the vehicle, followed by estimation of the vehicle speed. Let Δp denote the size of one pixel on the image sensor; it can be written as:

    Δp = f·tan(δ) / (m/2)                                      (3)

From the known angles α and δ, the ground distance of pixel A(i, j) on the resulting image I can be found as follows. If i ≥ m/2, then:

    d = h·tan( α + (δ − arctan((i − m/2)·Δp / f)) )            (4)

And if i < m/2, then:

    d = h·tan( α + (δ + arctan((m/2 − i)·Δp / f)) )            (5)

Vehicle position in each frame is first determined by the coordinates of the center of each detected BB.
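Formulas (2)-(5) can be checked numerically. Below is a minimal sketch; the calibration values h, α, δ, f, and m are made-up placeholders for illustration, not the paper's camera parameters.

```python
import math

def pixel_to_distance(i, h, alpha, delta, f, dp, m):
    """Ground distance d for pixel row i, following formulas (4) and (5):
    the row offset from the image center is converted to an angular offset
    arctan(|i - m/2| * dp / f) and combined with the mounting angles."""
    offset = math.atan(abs(i - m / 2) * dp / f)
    if i >= m / 2:
        return h * math.tan(alpha + (delta - offset))
    return h * math.tan(alpha + (delta + offset))

# Made-up calibration: 6 m mast, alpha = 50 deg, delta = 15 deg,
# f = 4 mm, 1288-pixel-wide sensor; dp follows formula (3).
h, alpha, delta, f, m = 6.0, math.radians(50), math.radians(15), 4e-3, 1288
dp = f * math.tan(delta) / (m / 2)          # formula (3)

d_center = pixel_to_distance(m / 2, h, alpha, delta, f, dp, m)
d_top = pixel_to_distance(0, h, alpha, delta, f, dp, m)
d_bottom = pixel_to_distance(m - 1, h, alpha, delta, f, dp, m)
print(d_bottom < d_center < d_top)  # rows lower in the image are closer
```

Note how the sign convention of (4)-(5) makes rows below the image center (larger i) map to shorter ground distances, which matches a camera looking down the road.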


3.4.2. Frames in the video

From the frame map shown in Fig. 12, the time between two consecutive frames is 1/30 second. When vehicles move toward and get closer to the camera, or move away from the camera into the traffic flow, a pair of consecutive or non-contiguous frames may be utilized, with the frame displacement in time denoted by Δt. Therefore, the average velocity v along the direction of travel is simply expressed by the following formula:

    v = Δs / Δt                                                (6)

where

    Δs = |d1 − d2|                                             (7)

Fig. 12. Representation of the number of frames per second (at 30 fps)

Fig. 13 demonstrates the estimated speed of all vehicles that appeared in the surveillance camera on the Phap Van - Cau Gie highway.

Fig. 13. Demonstration of detected vehicle speed on highways

4. Experimental Results

The original dataset is set up with 24000 image frames extracted from the traffic surveillance videos that are monitored by the camera system mounted on the Phap Van - Cau Gie highway in Vietnam and authorized by the Ha Noi City Public Transport Management Center under the Ha Noi Department of Transport.

4.1. Distance Measure and Estimation

4.1.1. Evaluation metrics

License plate detection: recall (Re), precision (Pr), and the Jaccard Index coefficient (JI) [11], defined in formulas (8), (9), and (10), are used to evaluate the license plate detection performance:

    Re = tp / (tp + fn)                                        (8)

    Pr = tp / (tp + fp)                                        (9)

    JI = area(Bp ∩ Bgt) / area(Bp ∪ Bgt)                       (10)

where a detection is counted as a true positive (tp) if JI ≥ 0.5; the JI coefficient is calculated as the ratio between the intersection and the union of the rectangle Bp detected by the algorithm and the manually annotated rectangle Bgt containing the object; a detection is a false positive (fp) if JI < 0.5; and fn counts the plates that are not detected. Table 2 shows the precision and recall of vehicle license plate detection at different distances with the proposed method.

Table 2. Precision and recall for distances with the proposed method

Distance (m)    Recall (%)    Precision (%)
5               99.7          99.1
15              98.9          98.6
30              98.3          98.2

4.1.2. Measuring the distance estimation error

To evaluate the distance estimation error, the actual distance from the camera to the vehicle is determined manually by marking positions on the road surface, and then compared with the estimated distance. The vehicle distance estimation is assessed using the root mean square error (RMSE):

    RMSE = √( (1/n) · Σ_{i=1..n} (θi − θ̄i)² )                 (11)

where θ̄ is the field-measured distance to the vehicle and θ is the distance predicted according to the camera model. Table 3 shows the evaluation results of the vehicle distance estimation error. Based on the absolute value of the RMSE, the points with large deviations should be eliminated to improve the accuracy of the running-track estimation. In this study, the points with an absolute value of RMSE greater than 1 are removed. The ground coordinates of the point closest to the center of the detected license plate are selected as the exact spatial location of the vehicle of interest in the current video frame pair.
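Formulas (6) and (7) reduce to a one-line computation once two distances and a frame gap are known. Below is a minimal sketch; the distances and frame gap are illustrative values, not measured data.

```python
FPS = 30.0  # camera frame rate stated in the text

def estimate_speed_kmh(d1, d2, n_frames):
    """Average speed from formulas (6)-(7): delta_s = |d1 - d2| in meters,
    delta_t = n_frames / FPS in seconds, result converted from m/s to km/h."""
    delta_s = abs(d1 - d2)          # formula (7)
    delta_t = n_frames / FPS
    return delta_s / delta_t * 3.6  # formula (6), m/s -> km/h

# A vehicle whose plate is 30.0 m away in one frame and 22.5 m away
# 10 frames (1/3 s) later has covered 7.5 m:
print(estimate_speed_kmh(30.0, 22.5, 10))  # 81.0 km/h
```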


Table 3. Evaluation results of vehicle distance estimation error

Point    Δs (m)    RMSE (m)
1        0.712     0.43
2        0.715     0.52
3        0.725     0.61
4        0.740     0.38
5        0.746     0.49
6        0.753     0.60
7        0.773     0.48
8        0.783     0.55
9        0.794     0.58
10       0.813     0.41

4.2. Results and Performance Evaluation

To evaluate the performance of the proposed system, the vehicle speeds respectively estimated by the proposed system, measured by a GPS system mounted on the vehicle, and read from the odometer are recorded and compared. The GPS data is collected using a speedometer application for iPhone. This speedometer software is integrated with a GPS speedometer using data from the GPS + GLONASS dual satellite navigation system, installed on an iPhone 12 Pro Max running iOS 15.3.1, to measure speed and display the results on the phone screen in real time.

On cars in particular, the built-in speedometer measures the vehicle's speed with a speed sensor located in the transmission. As a rule, the speedometer of a car never displays a speed lower than the actual speed, but may display up to 10% higher. The satellite navigation system calculates the vehicle's speed by measuring the distance travelled and the elapsed time according to the information obtained from the GPS (in fact, the satellite speed always has an error of less than 2% compared to the actual speed). Typically, satellite speedometers monitor the actual vehicle speed much more accurately than the odometer.

The system setup in the real scene to record vehicle speed is shown in Fig. 14, and the performance is introduced in Table 4 and the corresponding charts in Fig. 15 in terms of speed data and the corresponding errors.

Table 4. Vehicle speed data collected by different methods and error comparison

Time  Odometer  GPS     Proposed  Error: odo.   Error: GPS    Error rate:    Error rate:
step  (km/h)    (km/h)  (km/h)    vs proposed   vs proposed   odo. vs prop.  GPS vs prop.
                                  (km/h)        (km/h)        (%)            (%)
1     83.4      76.1    76.9      6.5           0.8           7.79           1.05
2     84.5      76.1    77.2      7.3           1.1           8.64           1.45
3     85.0      79.2    78.3      6.7           -0.9          7.88           -1.14
4     84.7      79.2    79.9      4.8           0.7           5.67           0.88
5     86.0      80.0    80.6      5.4           0.6           6.28           0.75
6     87.4      82.8    81.3      6.1           -1.5          6.98           -1.81
7     90.6      83.1    83.5      7.1           0.4           7.84           0.48
8     91.4      85.2    84.6      6.8           -0.6          7.44           -0.70
9     91.3      84.7    85.7      5.6           1.0           6.13           1.18
10    94.0      87.3    87.8      6.2           0.5           6.60           0.57
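The error and error-rate columns of Table 4 can be recomputed directly from the three speed readings; the sign conventions in the sketch below are inferred from the reported values (odometer error taken as odometer minus proposed, GPS error as proposed minus GPS, each rate relative to its reference).

```python
def table4_row(v_odo, v_gps, v_prop):
    """Recompute the error columns of one Table 4 row: the odometer error is
    v_odo - v_prop, the GPS error is v_prop - v_gps, and each rate is the
    error relative to the respective reference speed (in percent)."""
    err_odo = v_odo - v_prop
    err_gps = v_prop - v_gps
    rate_odo = err_odo / v_odo * 100.0
    rate_gps = err_gps / v_gps * 100.0
    return (round(err_odo, 1), round(err_gps, 1),
            round(rate_odo, 2), round(rate_gps, 2))

# Time step 1: odometer 83.4 km/h, GPS 76.1 km/h, proposed 76.9 km/h
print(table4_row(83.4, 76.1, 76.9))  # (6.5, 0.8, 7.79, 1.05)
```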
Table 5. Speed detected and errors with the LPD-SSD1st method

Time  Odometer  GPS     Proposed  Error: odo.   Error: GPS    Error rate:    Error rate:
step  (km/h)    (km/h)  (km/h)    vs proposed   vs proposed   odo. vs prop.  GPS vs prop.
                                  (km/h)        (km/h)        (%)            (%)
1     NA        43.6    44.6      NA            1.0           NA             2.29
2     NA        46.8    45.2      NA            -1.6          NA             -3.42
3     NA        50.2    49.4      NA            -0.8          NA             -1.59
4     NA        49.7    50.5      NA            0.8           NA             1.61
5     NA        48.1    49.0      NA            0.9           NA             1.87
6     NA        46.5    47.1      NA            0.6           NA             1.29
7     NA        45.4    45.5      NA            0.1           NA             0.22
8     NA        44.3    45.1      NA            0.8           NA             1.81
9     NA        42.1    43.2      NA            1.1           NA             2.51
10    NA        42.3    43.0      NA            0.7           NA             1.65

JST: Smart Systems and Devices
Volume 33, Issue 1, January 2023, 043-053

Table 6. Error comparison for vehicle speed measurement

| Network | Max speed error, odometer vs. method (km/h) | Max speed error, GPS vs. method (km/h) | Detectable distance (m) |
|---------|---------------------------------------------|----------------------------------------|--------------------------|
| LPD-SSD1st [1]  | NA           | [-1.6, +1.1] | 1-15 |
| Proposed system | [5.67, 8.64] | [-1.5, +1.1] | 5-30 |

Fig. 14. Visual depiction of the setup system used to record vehicle speed.
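The bracketed intervals in Table 6 are simply the minimum and maximum of the corresponding per-time-step error columns, and the worst-case absolute error rate bounds the achievable accuracy. A sketch reproducing the proposed-system row from the Table 4 data (variable names are illustrative):

```python
# Per-time-step errors of the proposed system relative to GPS (Table 4)
err_gps_kmh = [0.8, 1.1, -0.9, 0.7, 0.6, -1.5, 0.4, -0.6, 1.0, 0.5]
rate_gps_pct = [1.05, 1.45, -1.14, 0.88, 0.75, -1.81, 0.48, -0.70, 1.18, 0.57]

# Bracketed range as reported in Table 6: [-1.5, +1.1] km/h
err_range = (min(err_gps_kmh), max(err_gps_kmh))

# Worst-case accuracy quoted in the text: 100 - max|rate| = 98.19% > 98%
accuracy = 100.0 - max(abs(r) for r in rate_gps_pct)

print(err_range, round(accuracy, 2))  # -> (-1.5, 1.1) 98.19
```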

Fig. 15. Comparison of the speed detected by the proposed system with the odometer and GPS readings.

Experiments are performed to compare the results with the LPD-SSD1st method [1]. Table 5 lists the specific values measured and calculated by LPD-SSD1st; this data set is used for the comparison because it covers the highest speed range that LPD-SSD has determined. The maximum error and error rate of LPD-SSD1st, which occur at time step 2 in Table 5, are -1.6 km/h and -3.42%, respectively: the GPS-measured speed is 46.8 km/h, whereas the speed measured by their system is 45.2 km/h. The maximum error and error rate of our proposed system, which occur at time step 6 in Table 4, are -1.5 km/h and -1.81%, respectively: the GPS-measured speed is 82.8 km/h, whereas the speed measured by our system is 81.3 km/h.

In addition, as Fig. 16 shows, the measurement range of our proposed system is 5-30 m, while that of the LPD-SSD system is 1-15 m. The results imply that, for speeds between 50 and 90 km/h, the proposed system keeps the detected-speed error within [-1.5, +1.1] km/h over a detectable distance of up to 30 m, as given in Table 6. The detected speed thus has an accuracy greater than 98% and can satisfy the vehicle speed monitoring regulations in Vietnam.

Fig. 16. Comparison of the speed and distance determined by the proposed method (blue) and the LPD-SSD1st method (red).

Fig. 17. Comparison of the error (km/h) of the proposed method (blue) and the LPD-SSD1st method (red).

5. Conclusion

In this study, data augmentation has been adopted for data normalization in the speed detection problem. The speed detection errors, within [5.67, 8.64] relative to the odometer and [-1.5, +1.1] km/h relative to GPS, together with a working distance of up to 30 m, show that the proposed system achieves relatively high efficiency for the transportation environment on Vietnam's highways.


However, the range of detectable speed should be improved to cover highways that allow maximum speeds of up to 120 km/h. Future work will aim to address this issue in video-based speed detection.

References

[1] Lei Yang, Menglong Li, Xiaowei Song, Zhi Xiang Xiong, Chunping Hou, Boyang Que, Vehicle speed measurement based on binocular stereovision system, IEEE Access, vol. 7, pp. 106628-106641, July 2019.
https://doi.org/10.1109/ACCESS.2019.2932120

[2] Herbert Bay, Andreas Ess, Tinne Tuytelaars, Luc Van Gool, Speeded-up robust features (SURF), Computer Vision and Image Understanding, 110(3): 346-359, 2008.
https://doi.org/10.1016/j.cviu.2007.09.014

[3] Nicolai Wojke, Alex Bewley, Dietrich Paulus, Simple online and realtime tracking with a deep association metric, 2017 IEEE International Conference on Image Processing (ICIP), September 2017.
https://doi.org/10.1109/ICIP.2017.8296962

[4] A. S. Gunawan, D. A. Tanjung, F. E. Gunawan, Detection of vehicle position and speed using camera calibration and image projection methods, 4th International Conference on Computer Science and Computational Intelligence 2019, pp. 255-265, September 2019.

[5] J. Sochor et al., Comprehensive dataset for automatic single camera visual speed measurement, IEEE Transactions on Intelligent Transportation Systems, pp. 1-11, May 2018.

[6] M. G. Moazzam, M. R. Haque, M. S. Uddin, Image-based vehicle speed estimation, Journal of Computer and Communications, vol. 7, pp. 1-5, 2019.

[7] A. Tourani et al., Motion-based vehicle speed measurement for intelligent transportation systems, International Journal of Image, Graphics and Signal Processing, vol. 4, pp. 42-55, 2019.

[8] Igor Ševo, Aleksej Avramović, Convolutional neural network-based automatic object detection on aerial images, IEEE Geoscience and Remote Sensing Letters, 13(5): 740-744, 2016.
https://doi.org/10.1109/LGRS.2016.2542358

[9] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, Alexander C. Berg, SSD: Single shot multibox detector, European Conference on Computer Vision, pp. 21-37, December 2016. arXiv:1512.02325v5 [cs.CV].
https://doi.org/10.1007/978-3-319-46448-0_2

[10] Alex Bewley, ZongYuan Ge, Lionel Ott, Fabio Ramos, Ben Upcroft, Simple online and realtime tracking, 2016 IEEE International Conference on Image Processing (ICIP), September 2016.
https://doi.org/10.1109/ICIP.2016.7533003

[11] Chris Stauffer, W. E. L. Grimson, Adaptive background mixture models for real-time tracking, 1999 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 246-252, June 1999.

[12] Nguyen Viet Hung, Nguyen Thi Thao, Do Huy Khoi, Nguyen Tien Dung, Modeling method of vehicle speed detection based on image processing, Journal of Science and Technology, Thai Nguyen University, ISSN 1859-2171, Issue 169, 9/2017, pp. 4-39.

[13] David A. Forsyth, Jean Ponce, Computer Vision: A Modern Approach, 2nd ed., 2012.

[14] Massimo Piccardi, Background subtraction techniques: a review, IEEE International Conference on Systems, Man and Cybernetics, pp. 3099-3104, 2004.

[15] M. Everingham, L. Van Gool, C. K. Williams, J. Winn, A. Zisserman, The PASCAL visual object classes (VOC) challenge, International Journal of Computer Vision, vol. 88, no. 2, pp. 303-338, 2010.
https://doi.org/10.1007/s11263-009-0275-4