Video Surveillance Image Enhancement Using Deep Learning

The document is a master's thesis by Muhamad Faris Che Aminudin from Universiti Sains Malaysia, focusing on video surveillance image enhancement using deep learning techniques. It includes an introduction to the problem, literature review, development of a deep learning image enhancement model, results, and discussions on the effectiveness of the proposed methods. The thesis also acknowledges support from the university, supervisor, and funding sources.

VIDEO SURVEILLANCE IMAGE

ENHANCEMENT USING DEEP LEARNING

MUHAMAD FARIS CHE AMINUDIN

UNIVERSITI SAINS MALAYSIA

2019
VIDEO SURVEILLANCE IMAGE
ENHANCEMENT USING DEEP LEARNING

by

MUHAMAD FARIS CHE AMINUDIN

Thesis submitted in fulfilment of the
requirements for the degree of
Master of Science

March 2019
ACKNOWLEDGEMENT

First of all, I would like to thank Universiti Sains Malaysia and the School of Electrical and Electronic Engineering for giving me the chance to further my studies at the Master's level. I would like to express my gratitude to my supervisor, Assoc. Prof. Dr. Hj. Shahrel Azmin, for his kindness and inspiration in guiding me through my Master's journey. His dedication and effort helped me to keep on track within my research scope. This thesis would never have been completed without his continuous supervision.

I would also like to thank the Malaysian Ministry of Higher Education for sponsoring the Fundamental Research Grant Scheme (FRGS) grant No. 203/PELECT/6091294, which helped me to carry out my studies while serving as a Graduate Assistant.

I would like to thank my family for giving me the moral support to pursue my Master's degree. Their love and encouragement helped me to keep going with my studies. A thousand thanks to all my friends who helped and shared their research skills and knowledge. Thank you for all the support and guidance. I also want to thank all the lecturers, staff and technicians of the School of Electrical and Electronic Engineering who helped me. I would also like to thank the staff of the Institute of Postgraduate Studies for their kind assistance whenever there were problems regarding candidature and thesis format.

TABLE OF CONTENTS

ACKNOWLEDGEMENT
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
LIST OF ABBREVIATIONS
ABSTRAK
ABSTRACT

CHAPTER ONE: INTRODUCTION
1.1 Overview
1.2 Problem Statement
1.3 Research Objectives
1.4 Scope of Thesis
1.5 Thesis outline

CHAPTER TWO: LITERATURE REVIEW
2.1 Overview
2.2 Background
2.3 Surveillance Camera
2.4 Image Enhancement
2.4.1 Conventional Image Enhancement
2.4.2 Deep Learning based Image Enhancement
2.5 Machine Learning
2.5.1 Supervised Learning
2.5.2 Unsupervised Learning
2.6 Deep Learning
2.6.1 Autoencoder
2.6.2 Restricted Boltzmann Machine
2.6.3 Convolutional neural network
2.6.4 Deep learning framework
2.6.5 CUDA
2.7 Image fusion
2.8 Summary

CHAPTER THREE: DEEP LEARNING IMAGE ENHANCEMENT MODEL
3.1 Introduction
3.2 Deep Learning environment setup
3.2.1 Data preparation
3.2.2 Deep learning framework setup
3.3 Deep Learning Image Enhancement Model
3.3.1 Deep Learning Block 1
3.3.2 Deep Learning Block 2
3.3.3 Image fusion
3.4 Performance Evaluation Metrics
3.4.1 Peak-Signal to Noise Ratio
3.4.2 Structural Similarity Index
3.5 Summary

CHAPTER FOUR: RESULTS AND DISCUSSIONS
4.1 Introduction
4.2 Experimental Results for Deep Learning Block 1
4.2.1 Depth vs Performance
4.2.2 Comparison Experiments with other Image Enhancement Technique
4.3 Experimental Results for Deep Learning Block 2
4.4 Deep Learning Image Enhancement Model
4.4.1 Weighted Image Fusion
4.4.2 Wavelet Image Fusion
4.4.3 Image Fusion Comparison
4.5 Summary

CHAPTER FIVE: CONCLUSION AND FUTURE WORK
5.1 Conclusion
5.2 Future Works

REFERENCES

APPENDICES
Appendix A: Examples of ScFace Database

LIST OF TABLES

Table 2.1 Summary of widely used deep learning frameworks.

Table 3.1 Surveillance camera specification (Grgic et al., 2011)

Table 3.2 Summary of framework used for executing the deep learning network in this research.

Table 3.3 Summary of required dependencies for deep learning framework.

Table 3.4 Parameters for Deep Learning Block 1 (DLB1) training.

Table 3.5 Parameters for training Deep Learning Block 2.

Table 4.1 Time taken to complete 100 iterations.

Table 4.2 Average PSNR and SSIM value for each camera for 130 subjects without image enhancement.

Table 4.3 Average PSNR and SSIM value for each camera for 130 subjects after image enhancement using trained DLB1.

Table 4.4 The increment average of PSNR and SSIM value for each camera for 130 subjects after image enhancement using trained DLB1.

Table 4.5 The increment in image quality of PSNR and SSIM value in percentage for each camera for 130 subjects after image enhancement using trained DLB1.

Table 4.6 Result for Deep Learning Block 2 (DLB2) training using surveillance camera images and mugshot images.

Table 4.7 List of parameters used in CLAHE.

Table 4.8 Average PSNR and SSIM value for HE, CLAHE, gamma adjustment, and DLB2 for distance 1.

Table 4.9 Average PSNR and SSIM value for HE, CLAHE, gamma adjustment, and DLB2 for distance 2.

Table 4.10 Average PSNR and SSIM value for HE, CLAHE, gamma adjustment, and DLB2 for distance 3.

Table 4.11 Results using weighted image fusion with different weight schemes for distance 1.

Table 4.12 Results using weighted image fusion with different weight schemes for distance 2.

Table 4.13 Results using weighted image fusion with different weight schemes for distance 3.

Table 4.14 The comparison results using wavelet image fusion using a) Symlet coefficient level 2 and b) Symlet coefficient level 3.

Table 4.15 The comparison results using wavelet image fusion using a) Symlet coefficient level 4 and b) Symlet coefficient level 5.

Table 4.16 Wavelet image fusion result for symlet level 2 with different fusion level for distance 1. The top and bottom values refer to PSNR and SSIM, respectively.

Table 4.17 Wavelet image fusion result for symlet level 3 with different fusion level for distance 1. The top and bottom values refer to PSNR and SSIM, respectively.

Table 4.18 Wavelet image fusion result for symlet level 4 with different fusion level for distance 1. The top and bottom values refer to PSNR and SSIM, respectively.

Table 4.19 Wavelet image fusion result for symlet level 2 with different fusion level for distance 2. The top and bottom values refer to PSNR and SSIM, respectively.

Table 4.20 Wavelet image fusion result for symlet level 3 with different fusion level for distance 2. The top and bottom values refer to PSNR and SSIM, respectively.

Table 4.21 Wavelet image fusion result for symlet level 4 with different fusion level for distance 2. The top and bottom values refer to PSNR and SSIM, respectively.

Table 4.22 Wavelet image fusion result for symlet level 2 with different fusion level for distance 3. The top and bottom values refer to PSNR and SSIM, respectively.

Table 4.23 Wavelet image fusion result for symlet level 3 with different fusion level for distance 3. The top and bottom values refer to PSNR and SSIM, respectively.

Table 4.24 Wavelet image fusion result for symlet level 4 with different fusion level for distance 3. The top and bottom values refer to PSNR and SSIM, respectively.

Table 4.25 Results of overall performance of DLIE model for DLB1, DLB2, and combining the output using image fusion ((DLB1+DLB2), Weighted Fusion, Wavelet Fusion) for Distance 1.

Table 4.26 Results of overall performance of DLIE model for DLB1, DLB2, and combining the output using image fusion ((DLB1+DLB2), Weighted Fusion, Wavelet Fusion) for Distance 2.

Table 4.27 Results of overall performance of DLIE model for DLB1, DLB2, and combining the output using image fusion ((DLB1+DLB2), Weighted Fusion, Wavelet Fusion) for Distance 3.

LIST OF FIGURES

Figure 2.1 Histogram Equalization

Figure 2.2 Comparison of CLAHE with Histogram equalization where a) Original image with its histogram b) Image after applying CLAHE and its histogram and c) Image after applying Histogram equalization.

Figure 2.3 Type of noise that usually occurs in digital image a) original image b) Poisson noise c) Speckle noise d) Salt and Pepper noise and e) Gaussian noise

Figure 2.4 Super Resolution Convolution Neural Network (SRCNN) architecture.

Figure 2.5 Example of simple decision tree model (Xhemali et al., 2009).

Figure 2.6 Illustration of deep learning extracting information.

Figure 2.7 Autoencoder network architecture

Figure 2.8 RBM network architecture.

Figure 2.9 CNN network architecture (Ian Goodfellow and Courville, 2016).

Figure 2.10 TensorFlow Process Flow for distributing Computational Process (Abadi et al., 2015).

Figure 2.11 CUDA architecture (Nickolls et al., 2008)

Figure 2.12 CUDA process flow

Figure 2.13 Image fusion in frequency domain block diagram.

Figure 3.1 Deep learning image enhancement model that consists of 2 deep learning blocks to enhance low resolution, low contrast, and reduce noise.

Figure 3.2 Overall camera setup.

Figure 3.3 Camera position.

Figure 3.4 Camera distance.

Figure 3.5 All the image size in SCFace database

Figure 3.6 Pooling operation only applied on width and height that are done independently on each layer depth.

Figure 3.7 List of activation functions that are usually used in deep neural network (Lin and Shen, 2018).

Figure 3.8 Overall block diagram representing the proposed Deep Learning Image Enhancement Model.

Figure 3.9 Overall block diagram process for DLB1.

Figure 3.10 Super resolution network from DLB1. This block consists of convolutional layers and rectified linear unit that are configured together to enhance the low resolution image.

Figure 3.11 Deep Learning Block 2 (DLB2) that consists of three denoising autoencoder to perform contrast enhancement to the low quality image.

Figure 3.12 Overall block diagram process for DLB2.

Figure 3.13 The image fusion process for two images using discrete wavelet transform.

Figure 4.1 Graph shows the number of iteration that improved the SSIM value.

Figure 4.2 The different image quality between different image size. The top images are the original images with different size, while the bottom row images are the enhanced images with their respective size that have been enlarged to the same size.

Figure 4.3 Graph shows the depth of the network affects the SR network in terms of PSNR value.

Figure 4.4 The different image quality between original and after enhancing the image using a) Original Image b) Bicubic interpolation, and c) Proposed DLB1.

Figure 4.5 Some examples of enhanced images using HE, CLAHE, gamma adjustment and DLB2 that achieved better image quality.

Figure 4.6 Some examples of enhanced images using HE, CLAHE, gamma adjustment and DLB2 that did not achieve better image quality.

Figure 4.7 Merging both DLB1 and DLB2 by directly connecting the output of DLB1 to DLB2.

Figure 4.8 Result for fused images using Weighted image fusion with different weight schemes (50 DLB1 and 50 DLB2, 80 DLB1 and 20 DLB2, and 20 DLB1 and 80 DLB2) using PSNR and SSIM as image quality assessment.

Figure 4.9 Some examples of images to demonstrate the differences of image quality after enhancement using different fusion methods to merge DLB1 and DLB2.

Figure A.1 Images of first candidate at distance 4.2 m from camera.

Figure A.2 Images of first candidate at distance 2.6 m from camera.

Figure A.3 Images of first candidate at distance 1.0 m from camera.

Figure A.4 Infrared mugshot image for the first subject that are not used in the research.

LIST OF ABBREVIATIONS

AI Artificial Intelligence

ANN Artificial Neural Network

API Application Programming Interface

BPNN Back Propagation Neural Network

CCTV Closed-Circuit Television

CLAHE Contrast Limited Adaptive Histogram Equalization

CMPNN Complementary Neural Network

CNN Convolutional Neural Network

CPU Central Processing Unit

DBN Deep Belief Network

DLB Deep Learning Block

DLIE Deep Learning Image Enhancement

DNN Deep Neural Network

FSRCNN Fast Super Resolution Convolutional Neural Network

GPU Graphics Processing Unit

HE Histogram Equalization

HR High Resolution

k-NN K Nearest Neighbor

LR Low Resolution

ML Machine Learning

MLP Multilayer Perceptron

NN Neural Network

PNN Probabilistic Neural Network

PSNR Peak Signal to Noise Ratio

RAM Random Access Memory

RBM Restricted Boltzmann Machine

SR Super Resolution

SRCNN Super Resolution Convolutional Neural Network

SSIM Structural Similarity Index Metric

SVM Support Vector Machine

VAE Variational Autoencoder

VCR Video Cassette Recorder

VPU Vision Processing Unit

VRAM Video Random Access Memory

VIDEO SURVEILLANCE IMAGE ENHANCEMENT USING DEEP LEARNING

ABSTRAK

Surveillance cameras have become commonplace for improving security because of their usefulness in recording video or images for analysis. The variety of surveillance camera models and specifications affects the overall image quality. Image quality plays an important role in extracting the important information contained in an image. In a face recognition system, low-quality images degrade system performance. Therefore, improving image quality during image preprocessing, before training and testing, addresses this problem. Low resolution, low light levels, and noise are among the problems that frequently occur in surveillance cameras. To solve these problems, image enhancement using deep learning is proposed, training a deep learning network to improve image resolution and contrast and to reduce noise without changing any parameter. To achieve this goal, the Deep Learning Image Enhancement (DLIE) model is proposed. The proposed DLIE model contains two deep learning blocks (DLB1 and DLB2) and an image fusion technique. Deep Learning Block 1 (DLB1) and Deep Learning Block 2 (DLB2) are proposed to solve the problems of low resolution, contrast, and noise in surveillance camera images. Image fusion, meanwhile, is used to combine DLB1 and DLB2 into one system. DLB1 uses a Convolutional Neural Network (CNN) to increase image resolution using the Super Resolution method. Super Resolution is one of the algorithms that improve image quality by reconstructing a low-resolution image into a high-resolution image. DLB2, in turn, uses a Denoising Autoencoder for contrast enhancement and noise reduction before reconstructing the image. As a result, dark and noisy images are improved into better images. The outputs of the two trained networks (DLB1 and DLB2) are combined using Wavelet image fusion to ensure the system obtains the best image quality. The restored images are evaluated using the Peak Signal-to-Noise Ratio (PSNR) and the Structural Similarity Index (SSIM). DLB1 shows an improvement in image quality of between 0.946 and 8 percent, while DLB2 shows that it can improve contrast and reduce noise in images better than conventional image enhancement techniques. Images enhanced by the DLIE model show an improvement compared with the dark and noisy images. The enhanced images average from a minimum of 13.3625 dB up to 22.7728 dB, compared with an average of 9.3940 dB to 12.8398 dB before enhancement.

VIDEO SURVEILLANCE IMAGE ENHANCEMENT USING DEEP

LEARNING

ABSTRACT

Surveillance cameras have become common in improving security because of their usefulness in capturing video and images for analysis. The variation in surveillance camera models and specifications affects the overall image quality. Image quality plays a significant role in extracting the prominent information from an image. In a face-recognition system, a poor-quality image degrades the performance of the system. Enhancing the image during preprocessing, before training and testing, addresses this problem. Low resolution, low exposure, and noise are among the problems that occur in surveillance cameras. These problems can be addressed by improving the image resolution, enhancing the contrast, and reducing the noise of the image without overexposing it. In conventional image enhancement, each approach can solve only one problem at a time, and the parameters need to be changed for each problem. This makes it difficult to develop an automated system. Therefore, this research proposes image enhancement using a deep learning approach, in which a trained deep learning network automatically improves the resolution and contrast and reduces the noise of images without changing any parameter. To achieve this goal, the Deep Learning Image Enhancement (DLIE) model is proposed. The proposed DLIE model consists of two deep learning blocks, Deep Learning Block 1 (DLB1) and Deep Learning Block 2 (DLB2), and image fusion. DLB1 and DLB2 are proposed to solve their respective problems: low resolution for DLB1, and low contrast and noise for DLB2. Image fusion is used to merge the DLB1 and DLB2 outputs into one system. DLB1 uses a convolutional neural network to enhance the low-resolution image using the super resolution method. Super resolution is one of the algorithms that can improve image resolution by reconstructing a low-resolution image into a high-resolution image. DLB2 uses a denoising autoencoder to obtain contrast enhancement and noise reduction before reconstructing the input image into a good-quality image. As a result, dark and noisy images can be improved into cleaner ones. The outputs of both deep learning blocks (DLB1 and DLB2) are then fused using wavelet image fusion to get the best image quality while maintaining the capability of both techniques. The enhanced images are evaluated using image quality assessment metrics, namely the peak signal-to-noise ratio (PSNR) and the structural similarity index (SSIM). DLB1 shows an improvement ranging from 0.946 to 8 percent, whereas DLB2 is shown to enhance image contrast and reduce noise better than conventional image enhancement methods. The enhanced images from the DLIE model show an improvement in PSNR over the dark and noisy images, with averages ranging from a minimum of 13.3625 dB up to 22.7728 dB, compared with averages of 9.3940 dB up to 12.8398 dB before enhancement.

CHAPTER ONE

INTRODUCTION

1.1 Overview

Surveillance cameras have become a common technology for monitoring and security. Video surveillance lets people know what is happening at a location without being there, and lets them monitor several places at the same time. Surveillance cameras have many applications, such as traffic monitoring, video surveillance, criminal recognition, and crowd monitoring. Nowadays, the Closed-Circuit Television (CCTV) systems that are commonly installed are usually digital rather than analog (Cermeño et al., 2018). This evolution enables further video processing and analysis that the limitations of analog CCTV did not allow. The improvement of machine learning in this ever-expanding age of artificial intelligence has strengthened security through its integration into surveillance camera systems. A few existing cameras with embedded deep learning technology are used for people counting, heat mapping, and queue detection in retail stores (Technology, 2018). There are several concerns about implementing deep learning in real-time applications. One of them is the processing power required to pass all the required information through the network, which would overload a system that has only a small processing unit. However, developments such as the vision processing unit (VPU) have made embedded deep learning systems possible (Strom, 2018). The VPU was developed by Intel's Movidius group; it is a dedicated hardware accelerator for deep neural networks that can be embedded in a camera for deep learning. Another common way to address the processing power problem is to use a Graphics Processing Unit (GPU), whose capability for parallel computing allows huge datasets to be processed easily. With this kind of growth in deep neural networks, deep learning will soon become a staple for CCTV, applied according to its usage.

The quality of the visual data produced by CCTV varies depending on its hardware, such as sensors and lenses. This quality is important for further analysis, which must capture the important information in the images. Because of cost limitations, many low-end CCTVs are installed, and this type of CCTV produces low-quality visual data. Low-quality images affect the analysis, especially recognition, because of the lack of information in the images. Low resolution, poor exposure, low contrast, and noise are the frequent problems that lower image quality (Loza et al., 2013). These problems degrade the performance of detection and recognition, such as face recognition, because the image is either too dark or too pixelated for the person in it to be identified. Image enhancement therefore becomes important: it copes with these problems by enhancing the image in the preprocessing stage, before further processing. Techniques such as histogram equalization, Gaussian filtering, and noise removal are common approaches to enhancing an image for a specific case and problem.

Histogram Equalization (HE) improves the contrast of an image by distributing its histogram evenly (Cheng and Shi, 2004; Hall, 1974), which gives the image gray levels a wider dynamic range. The downside of this process is that the noise in the image becomes more visible and affects the image quality. This downside is addressed by contrast limited adaptive histogram equalization (CLAHE), as applied by Pisano et al. (1998). CLAHE improves the contrast in the region of interest without amplifying the noise within the image. These are among the common conventional methods used to improve image contrast (Meena et al., 2017).
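As an illustration only, global histogram equalization can be sketched in a few lines of Python. This is a minimal, dependency-free sketch and not the implementation used in this research: the function name `equalize_histogram`, the representation of an image as a list of rows of integer gray levels, and the choice of 256 levels are assumptions made for this example. CLAHE, which additionally works on local tiles and clips the histogram, is not shown.

```python
def equalize_histogram(image, levels=256):
    """Histogram-equalize a grayscale image given as a list of rows of ints."""
    pixels = [p for row in image for p in row]

    # Count how many pixels fall on each gray level.
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    # Cumulative distribution function (CDF) of the gray levels.
    cdf = []
    running = 0
    for count in hist:
        running += count
        cdf.append(running)

    # Smallest non-zero CDF value, so that the darkest pixel maps to 0.
    cdf_min = next(c for c in cdf if c > 0)
    total = len(pixels)
    if total == cdf_min:
        # Constant image: there is no range to stretch.
        return [row[:] for row in image]

    # Look-up table that maps each gray level through the normalized CDF.
    lut = [
        round(max(c - cdf_min, 0) * (levels - 1) / (total - cdf_min))
        for c in cdf
    ]
    return [[lut[p] for p in row] for row in image]


# Four nearly identical gray levels are spread over the full 0..255 range.
low_contrast = [[100, 101], [102, 103]]
print(equalize_histogram(low_contrast))  # [[0, 85], [170, 255]]
```

The example input occupies only four adjacent gray levels, and the output spreads them across the whole range, which is the wider dynamic range described above. The same stretching applies to noise, which is why noise becomes more visible after HE.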

Recently, deep neural networks have been used in image enhancement and have achieved significant performance in image-processing tasks. Stacked denoising autoencoders have been used to learn important features from the data and filter out the useless data (Fan et al., 2017). The features learned by the trained network are then used to reconstruct the image. Convolutional networks are also used for image enhancement by training the network on reference images produced either by an algorithm or by human adjustment (Gharbi et al., 2017a). Other deep learning image enhancement methods include colorization (Iizuka et al., 2016), demosaicking and denoising (Gharbi et al., 2016), portrait matting (Shen et al., 2016), and super resolution (Dong et al., 2016a).

Super resolution (SR) is an image enhancement method for enhancing a low-resolution image. A low-resolution image usually contains very little important information, and blockiness becomes visible when the image is zoomed in. SR approaches for enhancing image resolution include similarity-based (Yoo et al., 2016), dictionary-based (Li et al., 2016), and learning-based (Dong et al., 2016c) methods. SR works by reconstructing a low-resolution (LR) image into a high-resolution (HR) image on a high-resolution plane by smoothing and upsampling the LR image. As a result, the HR image is visually improved, with less noise.
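To make the idea of smoothing and upsampling concrete, the sketch below enlarges an LR image with plain bilinear interpolation. It is only an interpolation baseline of the kind that learning-based SR is usually compared against, not the SR network proposed in this thesis. The function name `upsample_bilinear`, the integer `scale` factor, and the list-of-rows image representation are assumptions made for this example.

```python
def upsample_bilinear(image, scale):
    """Enlarge a grayscale image (list of rows of numbers) by an integer factor."""
    src_h, src_w = len(image), len(image[0])
    dst_h, dst_w = src_h * scale, src_w * scale

    out = []
    for y in range(dst_h):
        # Map the centre of the output pixel back into source coordinates,
        # clamping at the image border.
        sy = min(max((y + 0.5) / scale - 0.5, 0.0), src_h - 1)
        y0 = int(sy)
        y1 = min(y0 + 1, src_h - 1)
        wy = sy - y0

        row = []
        for x in range(dst_w):
            sx = min(max((x + 0.5) / scale - 0.5, 0.0), src_w - 1)
            x0 = int(sx)
            x1 = min(x0 + 1, src_w - 1)
            wx = sx - x0

            # Blend the four neighbouring source pixels.
            top = image[y0][x0] * (1 - wx) + image[y0][x1] * wx
            bottom = image[y1][x0] * (1 - wx) + image[y1][x1] * wx
            row.append(top * (1 - wy) + bottom * wy)
        out.append(row)
    return out


# A 1x2 image becomes 2x4, with a smooth ramp instead of a hard edge.
print(upsample_bilinear([[0, 100]], 2))
# [[0.0, 25.0, 75.0, 100.0], [0.0, 25.0, 75.0, 100.0]]
```

Interpolation of this kind can only blend existing pixels, so it smooths blockiness without recovering lost detail. A learning-based SR method instead learns the LR-to-HR mapping from example image pairs.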

Image fusion, on the other hand, can improve image quality when two or more images need to be combined to obtain a better image. The process preserves the important information in each image and combines it into one image that contains all the information (Nikolov et al., 2001). Image fusion methods include wavelet image fusion, the Laplacian pyramid, simple pixel averaging, principal component analysis, and intensity-hue-saturation.
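The simplest of these methods, pixel averaging, can be sketched as a per-pixel weighted sum of two images of the same size. The sketch below is illustrative only: the function name `weighted_fusion`, the `weight_a` parameter, and the list-of-rows image representation are assumptions made for this example, and wavelet fusion is not shown.

```python
def weighted_fusion(image_a, image_b, weight_a=0.5):
    """Fuse two same-sized grayscale images by per-pixel weighted averaging."""
    same_rows = len(image_a) == len(image_b)
    if not same_rows or any(len(ra) != len(rb) for ra, rb in zip(image_a, image_b)):
        raise ValueError("images must have the same size")

    # The two weights sum to one, so the result stays in the input range.
    weight_b = 1.0 - weight_a
    return [
        [weight_a * pa + weight_b * pb for pa, pb in zip(ra, rb)]
        for ra, rb in zip(image_a, image_b)
    ]


a = [[100, 200]]
b = [[200, 0]]
print(weighted_fusion(a, b))        # equal weights: [[150.0, 100.0]]
print(weighted_fusion(a, b, 0.25))  # favours image b: [[175.0, 50.0]]
```

With `weight_a=0.5` this reduces to simple pixel averaging. Other values favour one source image over the other.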

Image quality problems in surveillance systems, such as low resolution, low contrast, and noise, are known to affect the performance of image analysis. Conventional image enhancement requires its parameters to be changed for each different problem. Most existing research on image quality enhancement uses complex algorithms and requires a lot of parameter tinkering, which makes it hard to find a universal set of parameters for enhancing surveillance images from CCTVs of varying quality. Deep learning addresses this problem by training a network with the required parameters and an appropriate architecture. Using its learning capability, deep learning can capture more complex image structures than conventional image enhancement methods. After the network is trained, it can be used as it is, without any further parameter tinkering, to address the problem it was trained on.

1.2 Problem Statement

Several common problems within video surveillance systems make further analysis more complicated. The problems are:

1. The quality of a surveillance camera varies depending on its hardware, such as sensors and lenses. Manufacturers produce surveillance cameras with different specifications at different prices. This variation causes differences in image quality and interferes with further image analysis. Problems such as the low resolution of images from low-end cameras greatly reduce image quality. Therefore, an image enhancement technique that can improve the image to a certain quality regardless of the camera quality is required.

2. The position of a surveillance camera can affect the image quality. If the camera is installed in a place without sufficient lighting, the image will contain noise and its contrast will suffer. Without enough light, the camera sensor has trouble capturing a good-quality image. Thus, an image enhancement technique that can improve the brightness of the image and remove the visible noise is required.

3. A conventional image enhancement method can solve only one problem at a time. This makes a system that must handle several problems more complex. Therefore, an intelligent image enhancement model that can solve several problems at once would help in this situation.

1.3 Research Objectives

The main objectives of this research are as follows:

1. To design and develop a deep learning-based image enhancement technique to improve low-resolution video surveillance images.

2. To develop a deep learning-based image enhancement technique to improve low contrast and reduce noise in video surveillance images.

3. To solve the three problems (low resolution, low contrast, and noise) simultaneously in one system.

1.4 Scope of Thesis

This thesis covers the following scope:

1. This research uses only still grayscale images from surveillance cameras, to reduce the training time and complexity of the deep learning algorithm. This also eases other constraints, such as the amount of GPU memory available for training.

2. The architectures used in this research are limited to the autoencoder and the convolutional neural network, as they fit the criteria for developing an image enhancement algorithm in a deep learning framework.

3. The database used in this research is the SCFace database, because it mimics real-world surveillance conditions. This is the only database that uses real surveillance cameras for capturing the images. The database contains images from five different cameras at three different distances for each subject.
