Deep Learning Based Traffic Surveillance System For Missing and Suspicious Car Detection
Abstract Vehicle theft is arguably one of the fastest-growing types of crime in India. In some urban areas, vehicle theft cases are believed to number around 100 each day. Identifying stolen vehicles in such precarious scenarios is not feasible using traditional methods such as manual checking and radio frequency identification (RFID) based technologies. This paper presents a deep learning based automatic traffic surveillance system for the detection of stolen/suspicious cars from closed-circuit television (CCTV) camera footage. It mainly comprises four parts: Select-Detector, Image Quality Enhancer, Image Transformer, and Smart Recognizer. The Select-Detector extracts the frames containing vehicles and detects the license plates efficiently with minimal time complexity. The quality of the license plates is then enhanced using the Image Quality Enhancer, which uses a pix2pix generative adversarial network (GAN) to enhance license plates affected by temporal changes such as low light and shadow. The Image Transformer tackles the inefficient recognition of license plates that are not horizontal (i.e., at an angle) by transforming the license plate to different levels of rotation and cropping. The Smart Recognizer recognizes the license plate number using Tesseract optical character recognition (OCR) and corrects wrongly recognized characters using the Error-Detector. The effectiveness of the proposed approach is tested on the government's CCTV camera footage, which resulted in identifying the stolen/suspicious cars with an accuracy of 87%.

Keywords Deep Learning · Vehicle Detection · Pix2pix generative adversarial network · Tesseract-OCR · Image quality enhancer

*Corresponding author: K.V. Kadambari
Assistant Professor, Dept. of Computer Science and Engg, National Institute of Technology, Warangal, Telangana, India
E-mail: [email protected]

Vishnu Vardhan Nimmalapudi
B.Tech., Dept. of Electronics and Communication Engg, National Institute of Technology, Warangal, Telangana, India
E-mail: [email protected]

1 Introduction

The number of vehicles has increased considerably in the past few decades, and the number of stolen vehicles and their use in criminal activities is on the rise too. According to the police department in Delhi, India, vehicle theft is one of the least-solvable offenses. In 2018 alone, over 44,000 vehicles were stolen but less than 20% of them were recovered. Traditional methods of finding these cars, such as halting traffic to inspect vehicles, have become obsolete. It is important to make use of technology to identify suspicious/stolen vehicles. It is difficult to manually search for each and every car in CCTV footage. So there is a need for a system that can automatically track the location and time of a suspicious car within a city once provided with the license plate number.

In most developed countries, Red Light Cameras [1] are used and license plates are strictly maintained. But in India, surveillance is confined to CCTV cameras and the license plates lack standardization. The problem with CCTV cameras is their low resolution, which results in poor recordings. The attributes to consider for standardization are the size, font, and color of the plate, the distance between two characters, etc. Recognizing a license plate becomes even harder in India as
the license plates often come with regional fonts. Hence, the system needs to be trained for each region.

Automating the tracking of suspicious cars is highly desirable as it significantly reduces the human resources needed. In recent years, numerous technologies with innovative methods to extract the vehicle number plate have emerged, but it still proves to be a difficult task. The major challenges are low-resolution CCTV footage and temporal changes in conditions such as illumination, shadows, low light, and bad weather.

A lot of research has been done on license plate recognition recently [2, 3, 4, 5, 6, 7, 8]. Saini et al. [3] designed a deep learning based system in which the K-Nearest Neighbors algorithm and a convolutional neural network classifier are used to identify stolen/suspicious vehicles without human intervention. In the first stage, Google's TensorFlow Object Detection API is used to detect the vehicle, with sub-modules to identify the registration number and color from a real-time video stream. In the second stage, the extracted characteristics, i.e., the vehicle registration number and color, are compared with the Regional Transport Office (RTO) record. Their model is tested on a database containing real-time road traffic videos captured with a camera. This design is not able to tackle challenges such as identifying license plates that are not horizontal (i.e., at an angle) or that are affected by low light, shadow, blurriness, etc.

Seungwon et al. [2] proposed an integrated video-based automobile tracking system (IVATS), based on Kafka and HBase. This system consists of a Frame Distributor for distributing the video frames from video sources such as car dashboard cameras, drone-mounted cameras, and CCTV cameras. It uses a Feature Extractor for extracting principal vehicle features such as location, plate number, and time from each frame, and an Information Manager for storing all features in a database and retrieving them for query processing. Tesseract-OCR is used for license plate recognition. This approach may work satisfactorily for tracking some vehicles. However, their Feature Extractor cannot be taken as an ultimate solution for identifying license plates, for the same reasons mentioned above, such as plates at an angle and low light. The failure of Tesseract-OCR in recognizing such license plates, and the solution to this problem, are addressed in the later sections.

Shrestha [9] proposed a tracking-by-detection framework to track moving vehicles on the road. The You Only Look Once (YOLO) v3 object detection system was used to detect the vehicles, and the concepts of the Deep Simple Online and Realtime Tracking (Deep SORT) algorithm were applied for tracking. In tracking by detection, each frame uses an object detector to find possible instances of objects and then matches those detections with the corresponding objects in the preceding frame. This tracking is based purely on the detection of a particular car over a period of time. The major drawback of this system is that it cannot track vehicles based on their license plates. It needs an image of the car in order to track it in the given set of videos.

Sarfraz et al. [10] proposed a framework for real-time automatic license plate recognition, comprising detection/localization, tracking, and recognition of license plates in CCTV surveillance videos. First, the license plate is localized in the incoming frames using background learning and then a histogram of oriented gradients (HOG) on the region of selection. The located plate is then tracked in each frame by continuously updating the background and finding the new location of the license plate. On the detected plate(s), the character recognition procedure is applied to recognize the characters of the license plate in each frame using a nearest neighbor classifier. The framework is evaluated on a set of CCTV road surveillance videos obtained for general purpose and manual inspection. The disadvantage of this method is that it fails to recognize plates at an angle.

A full-fledged working system should be able to tackle the following challenges. It should recognize license plates that are at an angle: as most CCTV cameras are not fixed at the center of the road, the videos may contain vehicles that are not exactly parallel or horizontal to the camera view. There is also a need to enhance the license plate quality, as the videos may be taken in different lighting conditions. The complexity of the frame extraction process can also be decreased by using a suitable algorithm that extracts only those frames that contain vehicles, instead of extracting every frame from the CCTV videos. The currently available systems for license plate detection fail to tackle the above-mentioned challenges. The novelty of the proposed framework lies in its ability to tackle these challenges and to detect suspicious cars with better accuracy using their license plates.

The layout of the rest of the paper is as follows. Section 2 delineates the proposed framework of the paper. Section 3 contains the analysis of our experiments and results. Section 4 presents the conclusion of the experiment and discusses future areas of exploration.

2 Proposed Framework

This section describes the proposed framework for the suspicious/missing car identification using the traffic
Fig. 4 Algorithm for selecting only the required frames from videos to reduce the time complexity. f_no represents the frame number, loc represents the location coordinates of the license plate predicted by YOLO, prev_loc is a variable that stores the previous location, and fps denotes the frames per second.
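As a rough illustration of why not every frame needs to be processed: if a vehicle is assumed to remain visible for at least half a second, then at 25 fps it spans at least ceil(12.5) = 13 consecutive frames, and any run of 13 consecutive frame numbers contains a multiple of 13. The sketch below is not the algorithm of Fig. 4; it only shows the sampling arithmetic, and the function names and the half-second default are illustrative assumptions.

```python
import math


def sampling_stride(fps: float, min_visible_seconds: float = 0.5) -> int:
    """Frames a vehicle is assumed to stay visible (hypothetical helper)."""
    return max(1, math.ceil(fps * min_visible_seconds))


def frames_to_check(total_frames: int, fps: float) -> range:
    """Frame numbers (f_no) on which a detector would be run.

    Assumption: checking one frame per stride is enough, because every
    vehicle is visible for at least `stride` consecutive frames.
    """
    return range(0, total_frames, sampling_stride(fps))


stride = sampling_stride(25)               # ceil(25 * 0.5)
sampled = list(frames_to_check(60, 25))    # frame numbers that get inspected
```

With these assumptions, a 60-frame clip at 25 fps requires the detector on only 5 frames instead of 60.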
next half a second and is detected by the object detector. Considering a frame rate of 25 frames per second (fps), every vehicle should appear in at least 13 consecutive frames.

2.2 Image Quality Enhancer

To address problems such as noise, low contrast, and shadow in the license plate image, an image quality enhancement technique is applied. Pix2pix GAN [12] is used for this purpose. It is a model designed for general-purpose image-to-image translation. Pix2pix learns a function that maps a low-quality input image to a high-quality output image using a conditional generative adversarial network (cGAN). The network consists of two main parts, the Generator and the Discriminator. The Generator transforms the low-quality input image into a high-quality output image. The Discriminator estimates the similarity of the input image to an unknown image (either a target image from the dataset or an output image from the generator) and tries to guess whether the unknown image was produced by the generator. The Generator comprises an encoder-decoder structure with skip connections, giving it a U-Net shaped architecture, as shown in Figure 5.

The Discriminator takes in two images, an input image and an unknown image (which is either a target image or an output image from the generator), and tries to decide whether the unknown image was produced by the generator. The discriminator model consists of a sequence of standard Convolution, Batch Normalization, and Rectified Linear Unit (ReLU) blocks, similar to deep convolutional neural networks, as shown in Figure 6. All the ReLUs in the encoder of the generator and in the discriminator are leaky, whereas the ReLUs in the decoder of the generator are not leaky. The pix2pix GAN is trained to generate a new high-quality license plate image when given a low-quality license plate image with low resolution, low contrast, shadow, noise, etc.

the horizontal level and that can give better recognition results compared to the input image, which was not horizontal. The recognizer is then applied to each of these sets of images and the results are noted. During testing, the distance is calculated between the suspicious car's license number and the recognition result of every image obtained from the Image Transformer. Based on a threshold on this distance, the input image is classified as matching the suspicious car's number or not. Rotation is performed in the range from 12 degrees clockwise to 12 degrees anti-clockwise, while cropping is done by reducing the border by 0 to 25 pixels. Rotation and cropping are done at 20 different levels to get 20 different sub-images for each image, in the hope that one of the sub-images is approximately horizontal.
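The rotation and cropping levels used by the Image Transformer can be enumerated with a few lines of code. The following is a minimal sketch under stated assumptions: the text gives only the ranges (12 degrees either way, 0 to 25 pixels of border) and the count of 20 levels, so the even spacing, the pairing of each angle with one crop value, the sign convention, and the name transform_levels are all illustrative choices rather than the paper's procedure.

```python
from typing import List, Tuple


def transform_levels(
    n_levels: int = 20,
    max_angle_deg: float = 12.0,
    max_crop_px: int = 25,
) -> List[Tuple[float, int]]:
    """Return one (rotation in degrees, border crop in pixels) pair per level.

    Negative angles mean clockwise and positive mean anti-clockwise
    (a convention chosen here, not taken from the text).
    """
    if n_levels < 2:
        raise ValueError("need at least two levels to span the range")
    levels = []
    for i in range(n_levels):
        t = i / (n_levels - 1)  # 0.0 at the first level, 1.0 at the last
        angle = -max_angle_deg + 2 * max_angle_deg * t
        crop = round(max_crop_px * t)
        levels.append((angle, crop))
    return levels


levels = transform_levels()  # 20 candidate transforms for one plate image
```

Each pair would then be applied to the plate image (for example with an image library's rotate and crop routines) before recognition; that step is omitted here because it depends on the imaging library in use.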
The comparator takes in two license numbers and computes the distance between them. One of the two numbers is the suspicious car's license number and the other is the predicted number generated by the Smart Recognizer. The distance here is defined as the number of characters at which the two license numbers mismatch. Based on a threshold value for the distance, the license number identified by the framework is classified as matched or mismatched. If the detected car's license number matches the required suspicious car's license number, the time and the location of the video in which the car is detected are recorded. For the proposed framework, the threshold value for the mismatch distance is taken as 2.

Fig. 9 Error-Detector

The proposed framework is evaluated on a set of Government CCTV road surveillance videos obtained from the Urban Police District, Rajamahendravaram, Andhra Pradesh, India. The cameras used to capture the footage were not specifically selected or set up for automatic license plate detection and recognition. The dataset contains 5 videos of varying duration obtained from CCTV cameras set up by the Government in different areas of the city. The resolution of each video is 1920 x 1080. Due to improper functioning of the cameras, the videos had paused abruptly for a few seconds. This was resolved by removing the paused frames from the video. For example, if we consider a video of 1 minute in which the system paused during the 20-30 second interval, this portion is removed and the remaining recorded video is joined to the preceding portion, i.e., after the 19th second. The Re-Identification (ReID) license plates dataset provided by Špaňhel et al. [14] is used for training the Image Quality Enhancer. It contains 182,336 color license plate images of different lengths, image blur and slight occlusion.

3.2 Image Quality Enhancer

A total of 326 high-quality images are selected from the ReID dataset and are made noisy, blurred, and low-contrast using Python's OpenCV. As the images were of different sizes, all of them were resized to 256x256. Random Gaussian noise was added to the images. The images were blurred using Gaussian Blur, Median Blur, or Bilateral Blur, chosen randomly. The dataset thus prepared contains 306 original/low-quality image pairs for training and 20 image pairs for validation. The training was done for 40 epochs with a batch size of 64. The training process took less than an hour to complete on an NVIDIA Tesla K80 GPU. The pix2pix network and the weights obtained after training are saved and used as the image quality enhancer for testing. The results produced by this trained model on our dataset are presented in Figure 10.

The performance of the proposed system is evaluated on the acquired dataset. The license numbers of the cars that are clearly visible in the dataset are manually collected as a test set for evaluation. The test set contains 53 license numbers. The bar graph in Figure 11 shows the percentage of cars that are cor-
Fig. 12 Results predicted by the proposed framework. The scatter points give the predicted time and location at which the car was last seen.
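The matching rule behind such records, as described for the comparator (distance equal to the number of mismatched characters, threshold of 2), can be sketched as follows. This is an illustrative sketch only: the function names and the sample plate strings are hypothetical, and three details are assumptions because the text does not settle them: whether the threshold is inclusive, how strings of unequal length are compared, and whether case and spaces are normalised.

```python
def mismatch_distance(target: str, predicted: str) -> int:
    """Count the character positions at which two license numbers differ.

    Assumptions: spaces and letter case are ignored, and extra characters
    in the longer string each count as one mismatch.
    """
    a = target.replace(" ", "").upper()
    b = predicted.replace(" ", "").upper()
    differing = sum(1 for x, y in zip(a, b) if x != y)
    return differing + abs(len(a) - len(b))


def is_match(target: str, predicted: str, threshold: int = 2) -> bool:
    """Treat the plate as matched when the distance is within the threshold
    (inclusive comparison is an assumption)."""
    return mismatch_distance(target, predicted) <= threshold


# Hypothetical plate strings, used only to exercise the functions.
suspect = "AP05BK1234"
close_read = "AP05BK1284"   # one character misread
far_read = "AP05BX7284"     # three characters differ
```

Under these assumptions, is_match(suspect, close_read) holds while is_match(suspect, far_read) does not, so only the first reading would cause the time and location of the video to be recorded.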
6. Mayan, J. Albert, Kumar Akash Deep, Mukesh Kumar, Livingston Alvin, and Siva Prasad Reddy. "Number plate recognition using template comparison for various fonts in MATLAB." In 2016 IEEE International Conference on Computational Intelligence and Computing Research (ICCIC), pp. 1-6. IEEE, 2016.
7. Khaparde, Devesh, Heet Detroja, Jainam Shah, Rushikesh Dikey, and Bhushan Thakare. "Automatic Number Plate Recognition System." International Journal of Computer Applications 975: 8887.
8. Qadri, Muhammad Tahir, and Muhammad Asif. "Automatic number plate recognition system for vehicle identification using optical character recognition." In 2009 International Conference on Education Technology and Computer, pp. 335-338. IEEE, 2009.
9. Shrestha, Sandesh. "Vehicle Tracking Using Video Surveillance." In Intelligent System and Computing. IntechOpen, 2019.
10. Sarfraz, M. Saquib, Azeem Shahzad, Muhammad A. Elahi, Muhammad Fraz, Iffat Zafar, and Eran A. Edirisinghe. "Real-time automatic license plate recognition for CCTV forensic applications." Journal of real-time image processing 8, no. 3 (2013): 285-295.
11. Redmon, Joseph, and Ali Farhadi. "Yolov3: An incremental improvement." arXiv preprint arXiv:1804.02767 (2018).
12. Isola, Phillip, Jun-Yan Zhu, Tinghui Zhou, and Alexei A. Efros. "Image-to-image translation with conditional adversarial networks." In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1125-1134. 2017.
13. Smith, Ray. "An overview of the Tesseract OCR engine." In Ninth International Conference on Document Analysis and Recognition (ICDAR 2007), vol. 2, pp. 629-633. IEEE, 2007.
14. Špaňhel, Jakub, Jakub Sochor, Roman Juránek, Adam Herout, Lukáš Maršík, and Pavel Zemčík. "Holistic recognition of low quality license plates by CNN using track annotated data." In 2017 14th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), pp. 1-6. IEEE, 2017.