Video Surveillance Image Enhancement Using Deep Learning
2019
VIDEO SURVEILLANCE IMAGE
ENHANCEMENT USING DEEP LEARNING
by
March 2019
ACKNOWLEDGEMENT
First of all, I would like to thank Universiti Sains Malaysia and the School of Electrical and Electronic Engineering for giving me the chance to pursue my Master's degree. I would like to express my gratitude to my supervisor, Assoc. Prof. Dr. Hj. Shahrel Azmin, for his kindness and inspiration in guiding me through my Master's journey. His dedication and effort helped me keep on track within my research scope. This thesis would never have been completed without his continuous supervision.
I would also like to thank the Malaysian Ministry of Higher Education for sponsor-
I would like to thank my family for the moral support they gave me to pursue my Master's degree. Their love and encouragement helped me keep going with my study. Thousands of thanks to all my friends who helped me and shared their experience of research skills and knowledge. Thank you for all the support and guidance. I also want to thank all the lecturers, staff, and technicians of the School of Electrical and Electronic Engineering who helped me. I would also like to thank the staff of the Institute of Postgraduate Studies for their kind assistance with problems regarding candidature and thesis format.
TABLE OF CONTENTS
ACKNOWLEDGEMENT
LIST OF TABLES
LIST OF FIGURES
ABSTRAK
ABSTRACT
1.1 Overview
2.1 Overview
2.2 Background
2.5 Machine Learning
2.6.1 Autoencoder
2.6.5 CUDA
2.8 Summary
3.1 Introduction
3.5 Summary
4.1 Introduction
REFERENCES
APPENDICES
LIST OF TABLES
Table 3.2 Summary of the frameworks used to run the deep learning networks in this research.
Table 4.2 Average PSNR and SSIM values for each camera for 130 subjects without image enhancement.
Table 4.3 Average PSNR and SSIM values for each camera for 130 subjects after image enhancement using the trained DLB1.
Table 4.4 Average increase in PSNR and SSIM values for each camera for 130 subjects after image enhancement using the trained DLB1.
Table 4.5 Percentage increase in image quality (PSNR and SSIM) for each camera for 130 subjects after image enhancement using the trained DLB1.
Table 4.6 Results of Deep Learning Block 2 (DLB2) training using surveillance camera images and mugshot images.
Table 4.8 Average PSNR and SSIM values for HE, CLAHE, gamma adjustment, and DLB2 for distance 1.
Table 4.9 Average PSNR and SSIM values for HE, CLAHE, gamma adjustment, and DLB2 for distance 2.
Table 4.10 Average PSNR and SSIM values for HE, CLAHE, gamma adjustment, and DLB2 for distance 3.
Table 4.11 Results of weighted image fusion with different weight schemes for distance 1.
Table 4.12 Results of weighted image fusion with different weight schemes for distance 2.
Table 4.13 Results of weighted image fusion with different weight schemes for distance 3.
Table 4.14 Comparison results of wavelet image fusion using a) Symlet coefficient level 2 and b) Symlet coefficient level 3.
Table 4.15 Comparison results of wavelet image fusion using a) Symlet coefficient level 4 and b) Symlet coefficient level 5.
Table 4.16 Wavelet image fusion results for Symlet level 2 with different fusion levels for distance 1. The top and bottom values refer to PSNR and SSIM, respectively.
Table 4.17 Wavelet image fusion results for Symlet level 3 with different fusion levels for distance 1. The top and bottom values refer to PSNR and SSIM, respectively.
Table 4.18 Wavelet image fusion results for Symlet level 4 with different fusion levels for distance 1. The top and bottom values refer to PSNR and SSIM, respectively.
Table 4.19 Wavelet image fusion results for Symlet level 2 with different fusion levels for distance 2. The top and bottom values refer to PSNR and SSIM, respectively.
Table 4.20 Wavelet image fusion results for Symlet level 3 with different fusion levels for distance 2. The top and bottom values refer to PSNR and SSIM, respectively.
Table 4.21 Wavelet image fusion results for Symlet level 4 with different fusion levels for distance 2. The top and bottom values refer to PSNR and SSIM, respectively.
Table 4.22 Wavelet image fusion results for Symlet level 2 with different fusion levels for distance 3. The top and bottom values refer to PSNR and SSIM, respectively.
Table 4.23 Wavelet image fusion results for Symlet level 3 with different fusion levels for distance 3. The top and bottom values refer to PSNR and SSIM, respectively.
Table 4.24 Wavelet image fusion results for Symlet level 4 with different fusion levels for distance 3. The top and bottom values refer to PSNR and SSIM, respectively.
Table 4.25 Overall performance of the DLIE model for DLB1, DLB2, and combinations of their outputs ((DLB1+DLB2), Weighted Fusion, Wavelet Fusion) for Distance 1.
Table 4.26 Overall performance of the DLIE model for DLB1, DLB2, and combinations of their outputs ((DLB1+DLB2), Weighted Fusion, Wavelet Fusion) for Distance 2.
Table 4.27 Overall performance of the DLIE model for DLB1, DLB2, and combinations of their outputs ((DLB1+DLB2), Weighted Fusion, Wavelet Fusion) for Distance 3.
LIST OF FIGURES
Figure 2.3 Types of noise that usually occur in digital images: a) original image, b) Poisson noise, c) speckle noise, d) salt-and-pepper noise, and e) Gaussian noise.
Figure 3.4 Camera distance.
Figure 3.6 Pooling operation applied only along the width and height, performed independently on each depth slice.
Figure 3.7 List of activation functions commonly used in deep neural networks (Lin and Shen, 2018).
Figure 3.10 Super resolution network of DLB1. This block consists of convolutional layers and rectified linear units configured together to enhance the low-resolution image.
Figure 3.11 Deep Learning Block 2 (DLB2), which consists of three denoising autoencoders that perform contrast enhancement on the low-quality image.
Figure 3.13 The image fusion process for two images using the discrete wavelet transform.
Figure 4.1 Graph showing the number of iterations that improved the SSIM value.
Figure 4.2 Differences in image quality between image sizes. The top row shows the original images at different sizes, while the bottom row shows the enhanced images at their respective sizes, enlarged to the same size.
Figure 4.3 Graph showing how the depth of the network affects the SR network in terms of PSNR value.
Figure 4.4 Differences in image quality between the original and enhanced images: a) original image, b) bicubic interpolation, and c) proposed DLB1.
Figure 4.6 Some examples of images enhanced using HE, CLAHE, gamma adjustment, and DLB2 that did not achieve better image quality.
Figure 4.7 Merging DLB1 and DLB2 by directly connecting the output of DLB1 to DLB2.
Figure 4.8 Results for images fused using weighted image fusion with different weight schemes (50 DLB1 and 50 DLB2, 80 DLB1 and 20 DLB2, and 20 DLB1 and 80 DLB2), using PSNR and SSIM as image quality assessment.
Figure A.1 Images of the first candidate at a distance of 4.2 m from the camera.
Figure A.2 Images of the first candidate at a distance of 2.6 m from the camera.
Figure A.3 Images of the first candidate at a distance of 1.0 m from the camera.
Figure A.4 Infrared mugshot image of the first subject, which is not used in the research.
LIST OF ABBREVIATIONS
AI Artificial Intelligence
HE Histogram Equalization
HR High Resolution
LR Low Resolution
ML Machine Learning
NN Neural Network
SR Super Resolution
VIDEO SURVEILLANCE IMAGE ENHANCEMENT USING DEEP
LEARNING
ABSTRAK
security because of their usefulness in recording videos or images for use in
overall image quality. Image quality plays an important role in extracting the important information contained in an image. In a face recognition system, low-quality images will degrade the performance of the system.
the training and testing process will address this problem. Low-resolution images, low light levels, and noise are among the problems that frequently
image, contrast, and noise without changing any parameters. To achieve
There are two deep learning blocks (DLB1 and DLB2) and an image fusion technique in the proposed DLIE model. Both the proposed "Deep Learning Block 1" (DLB1) and "Deep Learning Block 2" (DLB2) are intended to solve the problems of low resolution, contrast, and noise in ca-
to combine DLB1 and DLB2 as one system. DLB1 uses a "Convolutional Neural Network" (CNN) to increase image resolution using the "Super Resolution" method. Super Resolution is one of the algorithms that improve image quality by reconstructing a
image before reconstructing that image. Therefore, dark and noisy images are improved into better images. The outputs of both trained networks (DLB1 and DLB2) are combined using the Wavelet image fusion technique to ensure that the system obtains the best image quality.
image quality of between 0.946 and 8 percent, while DLB2 shows that it is able to improve contrast and reduce noise in images better than conventional techniques for improving image quality. The images that
with the dark and noisy images. An improvement with an average of
VIDEO SURVEILLANCE IMAGE ENHANCEMENT USING DEEP
LEARNING
ABSTRACT
Surveillance cameras have become common in improving security because of their usefulness in capturing video and images for analysis. Variations in surveillance camera models and specifications affect the overall image quality. Image quality plays a signifi-
system, a bad-quality image will affect the performance of the system. Thus, enhancing the image during preprocessing, before training and testing, would deal with this problem. Low resolution, low exposure, and noise are several problems that
image resolution, enhancing the contrast, and reducing the noise of the image without overexposing it. In conventional image enhancement, each approach can only solve one problem at a time, and the parameters need to be changed for each problem. This
work, an image enhancement approach using deep learning is proposed. It utilizes deep learning networks that can automatically improve the resolution and contrast and reduce the noise of the images without changing any parameters. To achieve this goal, Deep Learning Image Enhancement (DLIE) is proposed. The proposed DLIE model consists of two deep learning blocks, Deep Learning Block 1 and Deep Learning Block 2 (DLB1 and DLB2), and image fusion. DLB1 and DLB2 are each designed to solve their respective problems, which are low-
merge DLB1 and DLB2 outputs into one system. DLB1 utilizes a convolutional neural network to enhance the low-resolution image using the Super Resolution method. Super resolution is one of the algorithms that can improve image resolution by reconstructing a low-resolution image into a high-resolution image. On the other hand, DLB2 utilizes a denoising autoencoder to obtain contrast enhancement and noise reduction before reconstructing the input image into a good-quality image. As a result, dark and noisy images can be improved into cleaner images. The outputs of both deep learning techniques (DLB1 and DLB2) are then fused using Wavelet image fusion to get the best image quality while maintaining the capability of both techniques. The enhanced images are evaluated using image quality assessment metrics such as the peak signal-to-noise ratio (PSNR) and the structural similarity index (SSIM). DLB1 shows an improvement in image quality ranging from 0.946 to 8 percent, whereas DLB2 is shown to enhance image contrast and reduce noise better than conventional image enhancement methods. The enhanced images from DLIE show improvement in terms of PSNR compared to the dark and noisy images, with a minimum average of 13.3625 dB to 12.8398 dB.
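For reference, the two image quality metrics named in the abstract can be computed roughly as sketched below. This is a minimal NumPy illustration, not the evaluation code used in this work: `psnr` assumes 8-bit images with a peak value of 255, and `global_ssim` is a simplified SSIM computed over the whole image as a single window. Standard SSIM averages the index over local (typically Gaussian-weighted) windows, so its values will generally differ from those reported in Chapter 4. Both function names are hypothetical.

```python
import numpy as np


def psnr(reference: np.ndarray, test: np.ndarray, data_range: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two images of the same shape."""
    ref = reference.astype(np.float64)
    tst = test.astype(np.float64)
    mse = np.mean((ref - tst) ** 2)
    if mse == 0:
        return float("inf")  # identical images: no error at all
    return float(10.0 * np.log10(data_range ** 2 / mse))


def global_ssim(reference: np.ndarray, test: np.ndarray, data_range: float = 255.0) -> float:
    """Simplified SSIM: one window covering the whole image (assumption)."""
    x = reference.astype(np.float64)
    y = test.astype(np.float64)
    # Usual stabilising constants with K1 = 0.01 and K2 = 0.03.
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = np.mean((x - mu_x) * (y - mu_y))
    # Luminance term times contrast/structure term.
    num = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return float(num / den)
```

For example, comparing an all-zero 8-bit image with one whose pixels all equal 10 gives an MSE of 100 and a PSNR of 10 log10(255²/100) ≈ 28.13 dB. A higher PSNR and an SSIM closer to 1 both indicate that the enhanced image is closer to the reference.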
CHAPTER ONE
INTRODUCTION
1.1 Overview
ing and security. Video surveillance helps people know what is happening without being there and can monitor several places at the same time. Surveillance cameras have many applications, such as traffic monitoring, video surveillance, criminal recognition, and crowd monitoring. Nowadays, the Closed-Circuit Televisions (CCTVs) that are commonly installed are usually digital rather than analog (Cermeño et al., 2018). This technological evolution has enabled further video processing and analysis that analog CCTV could not support. The improvement of
security by implementing it into security surveillance camera systems. There are a few existing cameras with embedded deep learning technology that are used for people counting, heat mapping, and queue detection in retail stores (Technology, 2018). There are several concerns about implementing deep learning in real-time applications. One of them is the processing power required to pass all the necessary information through the network. This places a heavy load on systems that have only a small processing unit, but developments such as the vision processing unit (VPU) have made embedded deep learning systems possible (Strom, 2018). The VPU was developed by Intel’s Movidius group; it is a dedicated hardware accelerator for deep neural networks that can be embedded in a camera for deep learning. Another common hardware solution to the processing power problem is the Graphics Processing Unit (GPU), whose parallel computing capability allows huge datasets to be processed efficiently. With this kind of growth in deep neural networks, deep learning may soon become a staple of CCTV, applied according to its usage.
The quality of visual data produced by CCTV varies depending on its hardware, such as sensors and lenses. The quality of visual data is important for further analysis that extracts important information from the images. Due to cost limitations, many low-end CCTVs are installed, and this type of CCTV produces low-quality visual data. Low-quality images from CCTV affect the analysis, especially recognition, because of the lack of information in the images. Low resolution, poor exposure, low contrast, and noise are frequent problems that degrade image quality (Loza et al., 2013). These problems degrade the performance of detection and recognition tasks such as face recognition, because the image is either too dark or too pixelated for the person in it to be identified. Thus, image enhancement becomes important to cope with these problems by enhancing the image in the preprocessing stage before further processing.
and many others are the common approaches to enhancing the image for specific cases and problems.
Histogram Equalization (HE) improves the contrast of an image by redistributing its histogram more evenly across the gray levels (Cheng and Shi, 2004; Hall, 1974). This gives the image a wider dynamic range of gray levels and therefore better contrast. The downside of this process is that the noise in the image becomes more visible, which degrades image quality. This downside is addressed by contrast limited adaptive histogram equalization (CLAHE), proposed by Pisano et al. (1998). CLAHE enhances contrast locally, region by region, while limiting the amplification of noise within the image. These are among the common conventional methods used to improve image contrast (Meena et al., 2017).
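HE and the clip-limit idea behind CLAHE can be sketched in a few lines of NumPy. This is a minimal illustration, not the implementation used in this work. The second function performs only CLAHE's histogram-clipping step on the whole image and omits its tiling and bilinear interpolation between tiles, so it shows how clipping caps the contrast stretch (and thus noise amplification) rather than reproducing full CLAHE. The function names and the `clip_limit` default of 2.0 are assumptions; OpenCV's `cv2.createCLAHE` provides a complete CLAHE.

```python
import numpy as np


def histogram_equalization(img: np.ndarray) -> np.ndarray:
    """Global histogram equalization (HE) of an 8-bit grayscale image."""
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = np.cumsum(hist)
    cdf_min = cdf[np.nonzero(cdf)[0][0]]  # cumulative count at the darkest occupied level
    if cdf_min == img.size:  # constant image: nothing to spread out
        return img.copy()
    # Map every gray level through the normalised cumulative histogram.
    lut = np.round((cdf - cdf_min) / (img.size - cdf_min) * 255.0)
    return np.clip(lut, 0, 255).astype(np.uint8)[img]


def clipped_histogram_equalization(img: np.ndarray, clip_limit: float = 2.0) -> np.ndarray:
    """Global contrast-limited HE: CLAHE's clipping step without tiling (sketch)."""
    hist = np.bincount(img.ravel(), minlength=256).astype(np.float64)
    limit = clip_limit * img.size / 256.0  # clip height relative to a flat histogram
    excess = np.maximum(hist - limit, 0.0).sum()
    hist = np.minimum(hist, limit) + excess / 256.0  # spread clipped counts uniformly
    cdf = np.cumsum(hist)
    span = cdf[-1] - cdf[0]
    if span == 0:
        return img.copy()
    lut = np.round((cdf - cdf[0]) / span * 255.0)
    return np.clip(lut, 0, 255).astype(np.uint8)[img]
```

On a dark image whose pixels occupy only a narrow band of gray levels, `histogram_equalization` stretches that band to the full 0 to 255 range, while the clipped version produces a smaller stretch. This reflects the trade-off described above: less contrast gain in exchange for less amplified noise.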
Recently, deep neural networks have been used in image enhancement and have achieved
has been used to learn important features from the data and filter out useless data (Fan et al., 2017). The features learned by the trained network are then used to reconstruct the image. Convolutional networks are also utilized for image enhancement by training the network with reference images produced either by an algorithm or by human adjustment (Gharbi et al., 2017a). Other image enhancement applications of deep learning include colorization (Iizuka et al., 2016), demosaicking and denoising (Gharbi et al., 2016), portrait matting (Shen et al., 2016), and super resolution (Dong et al., 2016a).
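The "learn features, discard the rest, reconstruct" idea can be illustrated with the simplest possible autoencoder: a single linear layer with tied encoder and decoder weights. Under squared reconstruction error, the optimal code space of such a network is the top-k principal subspace of the training data, so this sketch obtains the weights from an SVD instead of gradient descent. It is a hypothetical toy, not a deep network and not the architecture used in this work; the function names, the code size `k`, and the synthetic data in the usage example are assumptions.

```python
import numpy as np


def fit_linear_autoencoder(x_train: np.ndarray, k: int):
    """Return (mean, weights) of a k-unit linear autoencoder with tied weights.

    For squared reconstruction error the optimal code space is the top-k
    principal subspace, so the weights come from an SVD of the centred data.
    """
    mean = x_train.mean(axis=0)
    _, _, vt = np.linalg.svd(x_train - mean, full_matrices=False)
    weights = vt[:k].T  # shape (d, k), orthonormal columns
    return mean, weights


def encode(x: np.ndarray, mean: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Project inputs onto the k learned features (the 'code')."""
    return (x - mean) @ weights


def decode(code: np.ndarray, mean: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Map codes back to the original input space."""
    return code @ weights.T + mean
```

For instance, if clean 16-dimensional samples lie on a 2-dimensional subspace, encoding and decoding a noisy copy with `k=2` keeps the signal but discards the noise components outside the learned subspace, so the reconstruction is closer to the clean data than the noisy input is. A denoising autoencoder such as the one in DLB2 pursues the same goal with nonlinear layers trained directly on noisy-to-clean pairs.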
Super resolution (SR) is one of the image enhancement methods that could help
resolution image is usually very poor. Blockiness becomes visible when the image is zoomed in further. Similarity-based (Yoo et al., 2016), dictionary-based (Li et al., 2016), and learning-based (Dong et al., 2016c) methods are some of the SR approaches to enhancing image resolution. SR works by reconstructing a low-resolution (LR) image into a high-resolution (HR) image on high-resolution planes by smoothing and
On the other hand, image fusion is a process that can help improve image quality when two or more images need to be combined to obtain better image quality. This process preserves the important information in each image and combines it into one image that contains all of that information (Nikolov et al., 2001). There are various image fusion methods, such as wavelet image fusion, Laplacian pyramid, simple pixel averaging, principal component analysis, and intensity-hue-saturation.
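Two of the fusion methods listed above, simple weighted pixel averaging and wavelet image fusion, can be sketched in NumPy. This is a minimal illustration under stated assumptions, not necessarily the fusion used in this work. It uses a single-level Haar wavelet, written out by hand, instead of the Symlet wavelets used elsewhere in this thesis. It also applies a common but assumed fusion rule: average the coarse approximation coefficients and keep whichever detail coefficient has the larger magnitude. All function names are hypothetical.

```python
import numpy as np


def weighted_fusion(a: np.ndarray, b: np.ndarray, w_a: float = 0.5) -> np.ndarray:
    """Pixel-wise weighted average; w_a = 0.5 is simple averaging."""
    return w_a * a.astype(np.float64) + (1.0 - w_a) * b.astype(np.float64)


def haar_dwt2(x: np.ndarray):
    """One level of a 2-D Haar transform (x must have even height and width)."""
    lo = (x[0::2, :] + x[1::2, :]) / 2.0  # vertical averages
    hi = (x[0::2, :] - x[1::2, :]) / 2.0  # vertical differences
    ll = (lo[:, 0::2] + lo[:, 1::2]) / 2.0  # coarse approximation
    lh = (lo[:, 0::2] - lo[:, 1::2]) / 2.0  # three detail sub-bands
    hl = (hi[:, 0::2] + hi[:, 1::2]) / 2.0
    hh = (hi[:, 0::2] - hi[:, 1::2]) / 2.0
    return ll, (lh, hl, hh)


def haar_idwt2(ll: np.ndarray, details) -> np.ndarray:
    """Exact inverse of haar_dwt2."""
    lh, hl, hh = details
    h, w = ll.shape
    lo = np.empty((h, 2 * w))
    hi = np.empty((h, 2 * w))
    lo[:, 0::2], lo[:, 1::2] = ll + lh, ll - lh
    hi[:, 0::2], hi[:, 1::2] = hl + hh, hl - hh
    x = np.empty((2 * h, 2 * w))
    x[0::2, :], x[1::2, :] = lo + hi, lo - hi
    return x


def wavelet_fusion(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Fuse two same-sized images in the Haar wavelet domain."""
    ll_a, det_a = haar_dwt2(a.astype(np.float64))
    ll_b, det_b = haar_dwt2(b.astype(np.float64))
    ll = (ll_a + ll_b) / 2.0  # average the overall brightness/structure
    # Keep the stronger edge or texture response from either image.
    det = tuple(np.where(np.abs(da) >= np.abs(db), da, db) for da, db in zip(det_a, det_b))
    return haar_idwt2(ll, det)
```

With `weighted_fusion`, a weight scheme such as "80 DLB1 and 20 DLB2" corresponds to `w_a=0.8` with the DLB1 output as `a`. The wavelet version differs in that it can take sharp detail from one image and overall brightness from both, which a single global weight cannot do.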
trast, and noise are known to affect the performance of image analysis in surveillance
for each different problem. Most existing research on image quality enhancement relies on complex algorithms that require extensive parameter tuning, which makes it hard to find a universal set of parameters to enhance surveillance images from CCTVs of various qualities. Deep learning addresses this problem by learning the required parameters through training with an appropriate architecture. With its learning capability, deep learning can capture more complex image structures than conventional image enhancement methods. After training, the network can be used as is, without further parameter tuning, to enhance images for the problem it was trained on.
1.2 Problem Statement
There are several common problems within the video surveillance system that mak-
1. The quality of surveillance cameras varies depending on their hardware, such as sensors and lenses. Manufacturers produce surveillance cameras with different specifications at varying prices. This variation causes differences in image quality and can interfere with further image analysis. Problems such as low resolution in images from low-end cameras greatly reduce image quality. Therefore, an image enhancement technique that could improve the image quality regardless
2. The position of a surveillance camera can affect the image quality. If the
age will contain noise, which also affects the contrast of the image. Without enough light, the camera sensor has difficulty capturing a good-quality image. Thus, an image enhancement that could improve the brightness of the image and re-
3. Conventional image enhancement methods can only solve one problem at a time. This makes the system more complex, and it could not tackle more than
1.3 Research Objectives
1. This research uses only still grayscale images from surveillance cameras to reduce training time and the complexity of the deep learning algorithm. This also reduces other problems, such as the limited GPU memory available for training.
2. The architectures used in this research focus only on autoencoders and convolutional neural networks, as they fit the criteria for developing image enhancement
3. The database used in this research is the SCface database because it mimics real-world surveillance conditions. This is the only database that uses real surveillance cameras for capturing the images. The database contains five different