Deep Learning Based Vehicle Speed Estimation On Highways
Abstract
Traffic management, in terms of vehicle monitoring and speed estimation, is a constant concern for highway system managers. This paper proposes an efficient deep learning-based vehicle speed estimation method for highway lanes in the Vietnam transport system. The input videos are recorded by fixed surveillance cameras. An optimized single shot multibox detector network, called SSD, is utilized for vehicle license plate detection (LPD). The deep SORT (simple online and real-time tracking) model is applied for vehicle tracking in the video and operates only on the detected license plate area. This tracking process measures the distance traveled by a vehicle in order to estimate its speed. In this study, the dataset has been normalized to improve the efficiency of vehicle localization and tracking and to reduce the time required to estimate the distance traveled by vehicles on highways. The results show that the proposed system achieves good accuracy, with speed errors ranging between [-1.5, +1.1] km/h, equivalent to 98% of the error limit set by the regulation in Viet Nam.
Keywords: Vehicle detection, speed estimation, feature extraction, LPD-SSD network, SORT model
ISSN: 2734-9373
https://doi.org/10.51316/jst.163.ssad.2023.33.1.6
Received: June 29, 2022; accepted: September 28, 2022
JST: Smart Systems and Devices
Volume 33, Issue 1, January 2023, 043-053
Fig. 1. Overall system of the proposed deep learning-based vehicle speed estimation
The study in this paper proposes a speed estimation approach utilizing a deep learning algorithm, as demonstrated in Fig. 1. The input video is captured by a surveillance camera. The LPD-SSD license plate detection network is recommended for synchronous extraction of typical features from the localized license plate, and vehicle tracking in the video using the SORT network is performed only in the license plate area.

2. Related Works

Most recently, A. S. Gunawan et al. measured vehicles' speed using a single camera [4]. In their proposed algorithm, the road image was mapped using the direct linear transformation (DLT) method. A vehicle passed a straight line bounded by four points, and the camera measured the movement of the vehicle. The positions of the four points were used to map their data from the image plane to the road plane. Finally, after calculating the vehicle displacement on the ground plane and the time interval between frames, they calculated the vehicle speed. The reported accuracy is 96.14%. J. Sochor et al. also used a single camera for speed measurement [5]. Vehicle tracking was performed using optical flow. They considered two reference start and end lines on the road plane and counted the number of video frames in which the vehicle was present between these reference lines. The measured speed has an average error of 0.63 km/h with a standard deviation of 4.5 km/h. M. G. Moazzam et al. measured the speed of vehicles by calculating the number of frames it takes a vehicle to travel the distance between two specified lines on the road plane [6]. The average calculation error is 10%. A new method was presented by A. Tourani et al. based on measuring the vehicle displacement vector on the image plane in terms of pixels per frame [7]. They converted the velocity vector into km/h with two distinct pixels-to-meters and frames-to-seconds transfer functions. Their reported speed measurement accuracy is 94.8%.

Commonly, vehicle speed estimation methods are classified into two main categories: binocular stereo camera-based and monocular single camera-based.

Accurate vehicle feature detection in video frames is a prerequisite for video-based vehicle speed measurement. A license plate has a regular appearance, uniform contour, and relatively rich texture details and can thus be easily detected. The license plate is also unique and suitable for synchronizing speed measurement with vehicle identification in the future. Many existing traffic surveillance systems record the license plates of vehicles involved in traffic violations, so the infrastructure is already available. Although the license plate is not attached to the ground plane, the 3D spatial position of the license plate center can be directly calculated using our binocular stereovision system, the relative 3D displacements can be obtained, and accordingly an accurate vehicle speed can be derived. Therefore, we choose the license plate as the object to be detected in our system. Traditional video object detection methods include background modeling, frame difference, and optical flow. The two former methods are based on video processing and are suitable for detecting moving objects against a fixed background; however, a license plate is always part of a moving vehicle and cannot be detected separately. Meanwhile, the latter method can track objects against a fixed or moving background. It can also track the license plate as a separate part but cannot detect the license plate
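To make the overall flow of Fig. 1 concrete, the sketch below strings the stages together: a per-frame plate detector, a tracker that assigns persistent IDs, and a displacement-over-time conversion to km/h. The interfaces here (`detect_plates`, `tracker.update`, `pixel_to_meters`, `StubTracker`) are hypothetical placeholders, not the paper's actual code; only the distance/time arithmetic follows the described method.

```python
def estimate_speed_kmh(frames, fps, detect_plates, tracker, pixel_to_meters):
    """Track detected license plates across frames and convert each track's
    accumulated ground displacement into a speed in km/h."""
    tracks = {}  # track_id -> list of ground positions (metres)
    for frame in frames:
        plate_boxes = detect_plates(frame)                  # SSD-style plate detector
        for track_id, box in tracker.update(plate_boxes):   # SORT-style tracker
            tracks.setdefault(track_id, []).append(pixel_to_meters(box))
    speeds = {}
    for track_id, positions in tracks.items():
        if len(positions) < 2:
            continue  # need at least two observations to measure displacement
        distance = sum(abs(b - a) for a, b in zip(positions, positions[1:]))
        elapsed = (len(positions) - 1) / fps                # seconds elapsed
        speeds[track_id] = 3.6 * distance / elapsed         # m/s -> km/h
    return speeds

# Tiny smoke test with stub components: a single plate advancing 1 m per
# frame at 25 fps corresponds to 25 m/s, i.e. 90 km/h.
class StubTracker:
    def update(self, boxes):
        return [(1, b) for b in boxes]

speeds = estimate_speed_kmh(frames=[0, 1, 2], fps=25,
                            detect_plates=lambda frame: [frame],
                            tracker=StubTracker(),
                            pixel_to_meters=lambda box: float(box))
assert abs(speeds[1] - 90.0) < 1e-9
```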
source passing through the lens system of the camera, and then an image reflecting the object is created on the screen at the focal length f. The pixel position (i, j) of a point reflected from the object onto the image plane can be evaluated from the object's coordinates in 3D space as in [12]:

i = f·X/Z and j = f·Y/Z   (1)

where X, Y, and Z are the coordinates of the point in the OXYZ space and f is the focal length of the camera lens system. The camera must be set up above the road surface with its optical axis inclined downward toward the roadway so as to cover the road plane. Since the proposed solution focuses on speed detection of vehicles on both highways and inner-city streets with a mixed traffic flow of motorcycles, cars, vans, and other vehicles, each vehicle type needs to be distinguished from the others across multiple road lanes. Therefore, this paper utilizes the direction angle (DA) of the first primary axis (FPA) of each approaching vehicle detected in the video sequence captured by the surveillance camera mounted on the road [13]. Background subtraction is first performed on the captured frame sequence; each vehicle is then located, and the DA of its FPA is evaluated for decision making in the identification task [10]. The study in [13] modelled the location of the surveillance camera installation on the road as shown in Fig. 11 below.

Fig. 11. A model of a surveillance camera calibration mounted on roads

Camera calibration is one of the important aspects of the study. The vehicle's location in video images is 2D, whereas vehicles in the real world are 3D; however, since vehicles cannot leave the road surface, vehicle motion is effectively 2D, which allows the vehicle's coordinate transformation to be formulated as a 2D-to-2D mapping. In this section, the mapping function between vehicle coordinates in the image and real-world coordinates is calculated.

The road area covered by a camera can be determined using standard geometrical equations. In short, the camera calibration parameters are set as follows:

1) The camera elevation angle is α towards the vertical axis, and α + δ is the maximum adjustable elevation angle of the camera;

2) The obtained image I has a resolution of m × n pixels;

3) The camera is located at height h above the road surface.

The camera's viewing angle along the Ox-axis direction intersects the road surface at point C, which is always set up as the center of the captured image, since point C corresponds to the center of the camera's image sensor. Point L is the closest position on the road that the camera can capture. Considering those parameters in this model setup, the distance from the camera to the object can be evaluated.

If O' is assumed to be the center of the camera's image sensor, the angle ∠O'OA between OO' and a given pixel A(i, j) on the image plane I of size m × n can be determined as:

∠O'OA = arctan( √((i − m/2)² + (j − n/2)²) / f )   (2)

Utilizing the properties of circles, rectangles, and trigonometry, all pixels in the image plane I can be mapped to the angle range scanned by the camera.

Since this problem focuses only on the evaluation of vehicle speed, a vehicle is assumed to move in a straight direction; therefore, only the pixels along the horizontal frame boundary need to be considered to determine the distance moved by the vehicle and, from it, the vehicle speed. Let Δp denote the size of one pixel on the image sensor; it can be written as:

Δp = f·tan(δ) / (m/2)   (3)

From the known angles α and δ, the distance from the camera corresponding to a pixel A(i, j) on the resulting image I can be found as follows. If pixel A(i, j) satisfies i ≥ m/2, then:

d = h·tan(α + (δ − arctan((i − m/2)·Δp / f)))   (4)
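The calibration relations (3) and (4) translate directly into code. The sketch below uses illustrative parameter values (camera height 6 m, α = 45°, δ = 15°, f = 8 mm, m = 1920 columns) that are assumptions for demonstration, not values from the paper, and checks two limiting cases of eq. (4).

```python
import math

def pixel_size(f, delta, m):
    """Eq. (3): sensor pixel size, delta_p = f*tan(delta) / (m/2)."""
    return f * math.tan(delta) / (m / 2)

def ground_distance(i, f, h, alpha, delta, m):
    """Eq. (4): ground distance of the road point imaged at column i >= m/2."""
    dp = pixel_size(f, delta, m)
    return h * math.tan(alpha + (delta - math.atan((i - m / 2) * dp / f)))

# Illustrative (assumed) setup: h = 6 m, alpha = 45 deg, delta = 15 deg,
# f = 8 mm, m = 1920 columns.
h, alpha, delta, f, m = 6.0, math.radians(45), math.radians(15), 0.008, 1920

# At the centre column (i = m/2) the arctan term vanishes and eq. (4)
# reduces to h*tan(alpha + delta) ...
assert abs(ground_distance(m / 2, f, h, alpha, delta, m)
           - h * math.tan(alpha + delta)) < 1e-9
# ... and at the edge column (i = m) the arctan term equals delta exactly,
# giving h*tan(alpha).
assert abs(ground_distance(m, f, h, alpha, delta, m)
           - h * math.tan(alpha)) < 1e-9
```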
Fig. 13 demonstrates the estimated speed of all vehicles that appeared in the surveillance camera view on the Phap Van - Cau Gie highway.

4. Experimental Results

The original dataset consists of 24000 image frames extracted from the traffic surveillance videos monitored by the camera system mounted on the Phap Van - Cau Gie highway in Vietnam, authorized by the Ha Noi City Public Transport Management Center under the Ha Noi Department of Transport.

where θ̂ is the field-measured distance to the vehicle and θ is the distance predicted by the camera model.

Table 3 shows the evaluation results of the vehicle distance estimation error. Based on the absolute value of the RMSE, the points with large deviations should be eliminated to improve the accuracy of the running-track estimation. In this study, the points with an absolute RMSE value greater than 1 are removed. The ground coordinates of the point closest to the center of the detected license plate are selected as the exact spatial location of the vehicle of interest in the current video frame pair.
Table 3. Evaluation results of vehicle distance estimation error

Point   Δs (m)   RMSE (m)
1       0.712    0.43
2       0.715    0.52
3       0.725    0.61
4       0.740    0.38
5       0.746    0.49
6       0.753    0.60
7       0.773    0.48
8       0.783    0.55
9       0.794    0.58
10      0.813    0.41

4.2. Results and Performance Evaluation

To evaluate the performance of the proposed system, the vehicle speed data estimated by the proposed system, measured by the GPS system mounted on the vehicle, and read from the odometer are recorded and compared. The GPS data are collected using a speedometer application for iPhone. This speedometer software is integrated with a GPS speedometer using data from the GPS + GLONASS dual satellite navigation system, installed on an iPhone 12 Pro Max running iOS 15.3.1, to measure speed and display the results on the phone screen in real time.

On motor vehicles, and cars in particular, the tachometer system measures the vehicle's revolution speed with a transit speed sensor located in the tailgate. As a rule, the speedometer of a car never deviates from the actual speed by more than 10%. The satellite navigation system calculates the vehicle's speed from the distance travelled and the elapsed time according to the information obtained from the GPS (in fact, the satellite-measured speed always has an error of less than 2% compared to the actual speed). Typically, satellite speedometers can therefore monitor the actual vehicle speed much more accurately than the odometer.

The system setup in the real scene to record vehicle speed is shown in Fig. 14, and the performance is reported in Table 4 and the corresponding charts in Fig. 15 in terms of speed data and the corresponding errors.
Table 4. Vehicle speed data collected by different methods and error comparison

Time step | Odometer (km/h) | GPS (km/h) | Proposed method (km/h) | Error: odometer vs. proposed (km/h) | Error: GPS vs. proposed (km/h) | Error rate: odometer vs. proposed (%) | Error rate: GPS vs. proposed (%)
1  | 83.4 | 76.1 | 76.9 | 6.5 |  0.8 | 7.79 |  1.05
2  | 84.5 | 76.1 | 77.2 | 7.3 |  1.1 | 8.64 |  1.45
3  | 85.0 | 79.2 | 78.3 | 6.7 | -0.9 | 7.88 | -1.14
4  | 84.7 | 79.2 | 79.9 | 4.8 |  0.7 | 5.67 |  0.88
5  | 86.0 | 80.0 | 80.6 | 5.4 |  0.6 | 6.28 |  0.75
6  | 87.4 | 82.8 | 81.3 | 6.1 | -1.5 | 6.98 | -1.81
7  | 90.6 | 83.1 | 83.5 | 7.1 |  0.4 | 7.84 |  0.48
8  | 91.4 | 85.2 | 84.6 | 6.8 | -0.6 | 7.44 | -0.70
9  | 91.3 | 84.7 | 85.7 | 5.6 |  1.0 | 6.13 |  1.18
10 | 94.0 | 87.3 | 87.8 | 6.2 |  0.5 | 6.60 |  0.57
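The GPS error columns of Table 4 can be reproduced with straightforward arithmetic: the signed error is the proposed estimate minus the GPS reading, and the error rate normalizes by the GPS reading. A minimal sketch, checked against time step 1 of the table:

```python
def speed_error(reference_kmh, estimated_kmh):
    """Signed error (km/h) and error rate (%) of an estimate against a reference."""
    error = estimated_kmh - reference_kmh
    rate = 100.0 * error / reference_kmh
    return error, rate

# Time step 1 of Table 4: GPS 76.1 km/h, proposed method 76.9 km/h.
err, rate = speed_error(76.1, 76.9)
assert round(err, 1) == 0.8     # matches the 0.8 km/h error column
assert round(rate, 2) == 1.05   # matches the 1.05 % error-rate column
```

Note that the odometer columns of Table 4 report the reference minus the estimate (the odometer consistently overreads), so their sign convention is the reverse of the GPS columns reproduced here.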
Table 5. Speed detected and errors with the LPD-SSD1st method

Time step | Odometer (km/h) | GPS (km/h) | Proposed method (km/h) | Error: odometer vs. proposed (km/h) | Error: GPS vs. proposed (km/h) | Error rate: odometer vs. proposed (%) | Error rate: GPS vs. proposed (%)
1  | NA | 43.6 | 44.6 | NA |  1.0 | NA |  2.29
2  | NA | 46.8 | 45.2 | NA | -1.6 | NA | -3.42
3  | NA | 50.2 | 49.4 | NA | -0.8 | NA | -1.59
4  | NA | 49.7 | 50.5 | NA |  0.8 | NA |  1.61
5  | NA | 48.1 | 49.0 | NA |  0.9 | NA |  1.87
6  | NA | 46.5 | 47.1 | NA |  0.6 | NA |  1.29
7  | NA | 45.4 | 45.5 | NA |  0.1 | NA |  0.22
8  | NA | 44.3 | 45.1 | NA |  0.8 | NA |  1.81
9  | NA | 42.1 | 43.2 | NA |  1.1 | NA |  2.51
10 | NA | 42.3 | 43.0 | NA |  0.7 | NA |  1.65
However, the range of detectable speeds should be extended so that the system works on other highways that allow maximum speeds of up to 120 km/h. Future work on this study will aim to address this issue in video-based speed detection problems.

References

[1] Lei Yang, Menglong Li, Xiaowei Song, Zhi Xiang Xiong, Chunping Hou, Boyang Que, Vehicle speed measurement based on binocular stereovision system, IEEE Access, pp. 106628-106641, July 2019. https://doi.org/10.1109/ACCESS.2019.2932120

[2] Herbert Bay, Andreas Ess, Tinne Tuytelaars, Luc Van Gool, Speeded-up robust features (SURF), Computer Vision and Image Understanding, 110(3): 346-359, 2008. https://doi.org/10.1016/j.cviu.2007.09.014

[3] Nicolai Wojke, Alex Bewley, Dietrich Paulus, Simple online and realtime tracking with a deep association metric, 2017 IEEE International Conference on Image Processing (ICIP), September 2017. https://doi.org/10.1109/ICIP.2017.8296962

[4] A. S. Gunawan, D. A. Tanjung, F. E. Gunawan, Detection of vehicle position and speed using camera calibration and image projection methods, 4th International Conference on Computer Science and Computational Intelligence 2019, pp. 255-265, September 2019.

[5] J. Sochor et al., Comprehensive dataset for automatic single camera visual speed measurement, IEEE Transactions on Intelligent Transportation Systems, pp. 1-11, May 2018.

[6] M. G. Moazzam, M. R. Haque, M. S. Uddin, Image-based vehicle speed estimation, Journal of Computer and Communications, vol. 7, pp. 1-5, 2019.

[7] A. Tourani et al., Motion-based vehicle speed measurement for intelligent transportation systems, International Journal of Image, Graphics and Signal Processing, vol. 4, pp. 42-55, 2019.

[8] Igor Ševo, Aleksej Avramović, Convolutional neural network-based automatic object detection on aerial images, IEEE Geoscience and Remote Sensing Letters, 13(5): 740-744, 2016. https://doi.org/10.1109/LGRS.2016.2542358

[9] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, Alexander C. Berg, SSD: Single shot multibox detector, European Conference on Computer Vision, pp. 21-37, December 2016, arXiv:1512.02325v5 [cs.CV]. https://doi.org/10.1007/978-3-319-46448-0_2

[10] Alex Bewley, ZongYuan Ge, Lionel Ott, Fabio Ramos, Ben Upcroft, Simple online and realtime tracking, 2016 IEEE International Conference on Image Processing (ICIP), September 2016. https://doi.org/10.1109/ICIP.2016.7533003

[11] Chris Stauffer, W. E. L. Grimson, Adaptive background mixture models for real-time tracking, 1999 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 246-252, June 1999.

[12] Nguyen Viet Hung, Nguyen Thi Thao, Do Huy Khoi, Nguyen Tien Dung, Modeling method of vehicle speed detection based on image processing, Journal of Science and Technology, Thai Nguyen University, ISSN 1859-2171, Issue 169, 9/2017, pp. 4-39.

[13] David A. Forsyth, Jean Ponce, Computer Vision: A Modern Approach, second edition, 2012.

[14] Massimo Piccardi, Background subtraction techniques: a review, IEEE International Conference on Systems, Man and Cybernetics, pp. 3099-3104, 2004.

[15] M. Everingham, L. Van Gool, C. K. Williams, J. Winn, A. Zisserman, The PASCAL visual object classes (VOC) challenge, International Journal of Computer Vision, vol. 88, no. 2, pp. 303-338, 2010. https://doi.org/10.1007/s11263-009-0275-4