
AUTOMATIC DAMAGE RECOVERY OF OLD PHOTOS

INPAINTING AND RESTORATION BASED ON

CONVOLUTIONAL NEURAL NETWORK

PHASE I REPORT

Submitted by

P ANUSHIYA
212223310002

in partial fulfillment for the award of the degree of

MASTER OF ENGINEERING
in
APPLIED ELECTRONICS

SAVEETHA ENGINEERING COLLEGE (AUTONOMOUS)


AFFILIATED TO ANNA UNIVERSITY
THANDALAM, CHENNAI - 602 105

DECEMBER 2024
BONAFIDE CERTIFICATE

Certified that this project report “AUTOMATIC DAMAGE RECOVERY OF OLD PHOTOS INPAINTING AND RESTORATION BASED ON CONVOLUTIONAL NEURAL NETWORK” is the bona fide work of “P ANUSHIYA (212223310002)”, who carried out the work under my supervision. Certified further that, to the best of my knowledge, the work reported herein does not form part of any other thesis or dissertation on the basis of which a degree or award was conferred on an earlier occasion on this or any other candidate.

SIGNATURE SIGNATURE
DR. SRIGITHA S NATH DR. S PRAVEEN KUMAR
HEAD OF DEPARTMENT SUPERVISOR
Saveetha Engineering College Saveetha Engineering College
(Autonomous) (Autonomous)
Chennai – 602 105 Chennai – 602 105

Submitted for the project viva voce examination held on ___________

INTERNAL EXAMINER EXTERNAL EXAMINER


ACKNOWLEDGEMENT

We convey our sincere thanks to DR. N. M. VEERAIYAN, Founder President and Chancellor, SIMATS, Saveetha Group of Institutions, DR. S. RAJESH, Director, Saveetha Engineering College, and DR. V. SAVEETHA RAJESH, Director, NMV University, for providing us with the facilities for the completion of our project. We are grateful to our Principal, Dr. V. VIJAYA CHAMUNDEESWARI, M.Tech., Ph.D., for her continuous support and encouragement in carrying out our project work. We are deeply indebted to our beloved Head of the Department, DR. SRIGITHA S NATH, M.E., Ph.D., Department of Electronics and Communication, for giving us the opportunity to display our professional skills through this project. We are greatly thankful to our Supervisor, DR. S PRAVEEN KUMAR, M.E., Ph.D., Professor, Electronics and Communication Engineering, for his valuable guidance and motivation, which helped us to complete our project on time. We thank all the teaching and non-teaching faculty members of the Department of Electronics and Communication for their passionate support, for helping us identify our mistakes, and for the appreciation they gave us. We heartily thank our library staff and the management for their extensive support in providing the resources and information that helped us to complete the project successfully. Also, we would like to record our deepest gratitude to our parents for their constant encouragement and support, which motivated us greatly to complete our project.
ABSTRACT

In recent years, motion detection has attracted great interest from computer vision researchers due to its promising applications in many areas, such as video surveillance. In motion detection, the task is to detect a region of interest embodied in a region of awareness, where the region of awareness, or in terms of the camera geometry, the field of view, is defined as the portion of the environment being monitored. The region of interest is, in the present case, the portion of the environment with activity. For the sake of simplicity and generality, recognition-based detection is not assumed. A region of interest can therefore be a person, an animal, or an artifact, circumscribed by the term "moving objects". The motion detection algorithm is based on background change detection, i.e. the difference method by background subtraction. This assumes that the background model for the expected image sequence is known in advance and that it does not change over time. Such conditions are rarely given for an indoor scene, e.g. when illumination changes occur and objects are moved around. Hence a confident adaption method is required. An adaption region must be specified that discerns between foreground and background objects. By definition, the background change detection itself yields that distinction, but it is inappropriate because undetected foreground objects will be falsely adapted into the modelled background image. Therefore the discrimination must be done by a different method. In mechanical devices, if a moving object steps into the field of view, a simple sound-like device will alert the user. Finally, the alert will also be sent by e-mail. Therefore, this paper proposes a two-stage convolutional network to automatically repair damaged old photos. The first stage detects the damaged areas of the photos, and the second stage repairs these areas. The experimental results demonstrate that our method can successfully detect and repair the damage in the photos.
TABLE OF CONTENTS

CHAPTER NO.  TITLE

             ABSTRACT
             LIST OF FIGURES
             LIST OF ABBREVIATIONS
I            INTRODUCTION
             1.1 Camera video module
             1.2 Motion detection
             1.3 Objectives
II           SYSTEM ANALYSIS
             2.1 Existing system
                 2.1.1 Limitations
             2.2 Proposed system
                 2.2.1 Advantages of proposed system
III          SYSTEM SPECIFICATION
             3.1 Hardware requirement
             3.2 Software requirement
             3.3 Software specification
                 3.3.1 C#
                 3.3.2 Implementation
                 3.3.3 .NET framework
IV           LITERATURE REVIEW
             4.1 Autonomous real-time surveillance system with distributed IP cameras
             4.2 Integrated motion detection and tracking for visual surveillance
             4.3 Smart webcam motion detection surveillance
             4.4 A robust and computationally efficient motion detection algorithm based on background estimation
V            MODULES
             5.1 List of modules
             5.2 Module description
VI           SYSTEM DESIGN
             6.1 DFD – Level 0
             6.2 DFD – Level 1
VII          TESTING
             7.1 System testing
             7.2 Unit testing
             7.3 Functional testing
             7.4 White box testing
             7.5 Black box testing
VIII         SYSTEM DEVELOPMENT
             8.1 System design
             8.2 Code design
             8.3 Digital image processing
IX           CONCLUSION
             9.1 Project conclusion
             9.2 Future enhancement
X            APPENDIX
XI           REFERENCES
LIST OF FIGURES

FIGURE NO.  TITLE

a           Language interoperability is a key feature of the .NET Framework
8.1         Image acquisition
8.2         Steps involved in image processing
LIST OF ABBREVIATIONS

S.NO  ABBREVIATION  EXPANSION

1     C#            C Sharp
2     CLR           Common Language Runtime
3     CLI           Common Language Infrastructure
4     IP            Internet Protocol
5     ABORAT        Algorithm Based Object Recognition and Tracking
CHAPTER 1

INTRODUCTION

1.1 CAMERA VIDEO MODULE

A camera-related operations package is used to handle the camera-related methods. Here we use the camera class to capture the camera feed, from either the local camera or a remote IP camera, and then display it in a window. This is visible to the user along with the other security controls.

1.2 MOTION DETECTION

The motion detection module consists of the motion detection algorithm, which helps us analyze the camera feed and detect and signal any motion-related triggers. It also comes with a motion sensitivity panel where the user can adjust the level of motion sensitivity required.
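A minimal sketch of this background-subtraction and selective-adaption logic, assuming OpenCV and NumPy are available; the threshold, trigger fraction, and adaption rate below are illustrative values, not settings from the application:

```python
import cv2
import numpy as np

def detect_motion(frame, background, threshold=25, adapt_rate=0.05):
    """Flag motion by background subtraction, then slowly adapt the model.

    `background` is a float32 grayscale array; threshold and adapt_rate are
    illustrative assumptions, not tuned settings.
    """
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY).astype(np.float32)
    diff = cv2.absdiff(gray, background)
    mask = (diff > threshold).astype(np.uint8)   # foreground pixels
    motion = mask.mean() > 0.01                  # trigger if >1% of pixels changed
    # Adapt the model only where no foreground was detected, so undetected
    # foreground objects are not falsely absorbed into the background image.
    idx = mask == 0
    background[idx] = (1 - adapt_rate) * background[idx] + adapt_rate * gray[idx]
    return motion, mask
```

Adapting only outside the detected foreground is one simple answer to the false-adaption problem raised in the abstract; variable sensitivity corresponds to varying the threshold.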

1.3 OBJECTIVES

The primary objective of this project is to develop an object detection system using CNNs with Python. Specifically, the project aims to achieve the following goals:
i. Implement a CNN-based object detection model capable of accurately localizing and classifying objects within images.
ii. Train the model on a diverse dataset of annotated images to learn representative features for various object categories.
iii. Explore and analyze the strengths and limitations of different CNN architectures.

The alarm system covers the remaining part of the project: alerting the user regarding the security threat. This is done by detecting any motion and setting off an alarm sound. This alarm alerts the user about the security issue that might be present. It also records the video on triggering the alarm.

1.2.1 Applications

1. Remote monitoring
2. Secured alarm system

1.2.2 Features

1. Round the clock security


2. Available for both local and remote cameras
3. Effective motion detection algorithm

CHAPTER 2

SYSTEM ANALYSIS

2.1 EXISTING SYSTEM

The currently existing systems provide a very large variety of unwanted and heavyweight components, which makes the applications hard to use and unfit for their actual purpose. They also limit user connectivity through licensing, which prevents the root purpose of the software from being utilized with ease. Existing systems make it difficult for users to understand the design and working, leading them to believe the software is much harder to use than some other, less efficient methodology. Also, local camera capability is limited, and only remote camera monitoring is enabled.

2.1.1 LIMITATIONS:

 Instability: The existing systems have significant stability issues, which are very disruptive.
 Locality support: The local camera is not supported.
 Inefficiency: The existing systems do not have an efficient motion detection algorithm, due to which people do not want to use them in most cases.

2.2 PROPOSED SYSTEM

The proposed system is efficient software that can be used to detect and monitor security threats with ease. Its simple architecture also makes it user friendly. It achieves this by using an efficient motion detection algorithm with variable sensitivity, which makes it vital for security purposes.

2.2.1 ADVANTAGES OF PROPOSED SYSTEM

 ROBUST: It is a very simple and efficient system that remains robust even under heavy usage.
 STABLE: Does not crash even under heavy load.
 EFFICIENT: Works efficiently in detecting movements.
 SIMPLE: Very simple design and execution for better understandability.

CHAPTER 3

SYSTEM SPECIFICATION

3.1 HARDWARE REQUIREMENT

i. Hard disk : 40 GB
ii. RAM : 128 MB
iii. Processor: Pentium IV

3.2 SOFTWARE REQUIREMENT

i. Windows XP or higher
ii. .NET Framework 3.5
iii. DirectX 2010
iv. MJPEG-supporting web browser

3.3 SOFTWARE SPECIFICATION


3.3.1. C#
C Sharp (C#) is a very powerful language and a common tool for building Windows tools. For our project we have selected this language since it is the most effective choice: the most commonly used operating systems nowadays are Windows, and C# runs on the .NET Framework with great stability. It is very simple and does not need any additional frameworks to be installed before we work on it, unlike Java, which needs the JDK and JRE to be installed before we can work on Java applications.
During the development of the .NET Framework, the class libraries were originally written using a managed code compiler system called Simple Managed C (SMC). In January 1999, Anders Hejlsberg formed a team to build a new language, at the time called Cool, which stood for "C-like Object Oriented Language". Microsoft had considered keeping the name "Cool" as the final name of the language, but chose not to do so for trademark reasons. By the time the .NET project was publicly announced at the July 2000 Professional Developers Conference, the language had been renamed C#, and the class libraries and ASP.NET runtime had been ported to C#.
C# is an elegant and type-safe object-oriented language that
enables developers to build a variety of secure and robust applications
that run on the .NET Framework. You can use C# to create traditional
Windows client applications, XML Web services, distributed
components, client-server applications, database applications, and much,
much more. Visual C# provides an advanced code editor, convenient user
interface designers, integrated debugger, and many other tools to make it
easier to develop applications based on version 4.0 of the C# language
and version 4.0 of the .NET Framework.
3.3.2. Implementations
The reference C# compiler is Microsoft Visual C#, which is
closed-source. Other C# compilers exist, often including an
implementation of the Common Language Infrastructure and the .NET
class libraries up to .NET 2.0:
 The Mono project provides an open source C# compiler, a complete open source implementation of the Common Language Infrastructure including the required framework libraries as they appear in the ECMA specification, and a nearly complete implementation of Microsoft's proprietary .NET class libraries up to .NET 3.5. As of Mono 2.6, no plans exist to implement WPF; WF is planned for a later release; and there are only partial implementations of LINQ to SQL and WCF.
 The DotGNU project also provides an open source C# compiler, a nearly complete implementation of the Common Language Infrastructure including the required framework libraries as they appear in the ECMA specification, and a subset of the remaining Microsoft proprietary .NET class libraries up to .NET 2.0 (those not documented or included in the ECMA specification, but included in Microsoft's standard .NET Framework distribution).
 Microsoft's Rotor project (currently called Shared Source Common
Language Infrastructure) (licensed for educational and research use
only) provides a shared source implementation of the CLR runtime
and a C# compiler, and a subset of the required Common Language
Infrastructure framework libraries in the ECMA specification (up
to C# 2.0, and supported on Windows XP only).

3.3.3. .NET Framework


C# programs run on the .NET Framework, an integral component
of Windows that includes a virtual execution system called the common
language runtime (CLR) and a unified set of class libraries. The CLR is
the commercial implementation by Microsoft of the common language
infrastructure (CLI), an international standard that is the basis for creating
execution and development environments in which languages and
libraries work together seamlessly.
When a C# program is executed, the assembly is loaded into the CLR, which may take various actions based on the information in the manifest. Then, if the security requirements are met, the CLR performs just-in-time (JIT) compilation to convert the IL code to native machine instructions. The CLR also provides other services related to automatic garbage collection, exception handling, and resource management. Code that is executed by the CLR is sometimes referred to as "managed code", in contrast to "unmanaged code", which is compiled into native machine language that targets a specific system. The following figure illustrates the compile-time and run-time relationships of C# source code files, the .NET Framework class libraries, assemblies, and the CLR.
In addition to the runtime services, the .NET Framework also includes an extensive library of over 4,000 classes organized into namespaces that provide a wide variety of useful functionality for everything from file input and output to string manipulation to XML parsing to Windows Forms controls. The typical C# application uses the .NET Framework class library extensively to handle common "plumbing" chores.

Fig. a. Language interoperability is a key feature of the .NET Framework

CHAPTER 4

LITERATURE REVIEW

In this section, we review the concept of remote and local camera motion monitoring and survey related work on motion detection and visual surveillance as it has developed over the years.

4.1. Autonomous Real-time Surveillance System with Distributed IP


Cameras

Authors: Kofi Appiah, Andrew Hunter and Jonathan Owens

An autonomous Internet Protocol (IP) camera based object tracking and behaviour identification system, capable of running in real-time on an embedded system with limited memory and processing power, is presented in this paper. The main contribution of this work is the integration of processor-intensive image processing algorithms on an embedded platform capable of running in real-time for monitoring the behaviour of pedestrians. The Algorithm Based Object Recognition and Tracking (ABORAT) system architecture presented here was developed on an Intel PXA270-based development board clocked at 520 MHz. The platform was connected to a commercial stationary IP-based camera in a remote monitoring station for intelligent image processing. The system is capable of detecting moving objects and their shadows in a complex environment with varying lighting intensity and moving foliage. Objects moving close to each other are also detected to extract their trajectories, which are then fed into an unsupervised neural network for autonomous classification. The novel intelligent video system presented is also capable of performing simple analytic functions such as tracking and generating alerts when objects enter/leave regions or cross tripwires superimposed on the live video by the operator.

4.2. Integrated Motion Detection and Tracking for Visual Surveillance

Authors: Mohamed F. Abdelkader, Rama Chellappa, Qinfen Zheng

Visual surveillance systems have gained a lot of interest in the last few
years. In this paper, we present a visual surveillance system that is based on the
integration of motion detection and visual tracking to achieve better
performance. Motion detection is achieved using an algorithm that combines
temporal variance with background modeling methods. The tracking algorithm
combines motion and appearance information into an appearance model and
uses a particle filter framework for tracking the object in subsequent frames.
The system was tested on a large ground-truthed dataset containing hundreds
of color and FLIR image sequences. A performance evaluation for the system
was performed and the average evaluation results are reported in this paper.

4.3. Smart Web Cam Motion Detection Surveillance System

Authors: Cynthia Tuscano, Blossom Lopes, Stephina Machado, Pradnya Rane

The basic idea behind the "Smart Web Cam Motion Detection Surveillance System" is to stop the intruder from getting into a place where high-end security is required. This paper proposes a method for detecting the motion of a particular object being observed. Motion tracking surveillance has gained a lot of interest over the past few years. This system is brought into effect to provide relief to the normal video surveillance system, which involves a time-consuming reviewing process. Through the study and evaluation of products, we propose a motion tracking surveillance system consisting of its own method for motion detection and its own graphical user interface. Various methods are used in the motion detection of a particular object of interest. Each algorithm is found efficient in one way, but there exist some limitations in each of them. In our proposed system those disadvantages are omitted, and by combining the best methods we create a new motion detection algorithm for our proposed motion tracking surveillance system. The proposed system in this paper is not limited to office use alone; it also offers more convenient, effective and efficient usage wherever high-end security comes into the picture.

4.4. A robust and computationally efficient motion detection algorithm based on background estimation

Authors: A. Manzanera, J. C. Richefeu

This paper presents a new algorithm to detect moving objects within a scene acquired by a stationary camera. A simple recursive nonlinear operator, the Σ-Δ filter, is used to estimate two orders of temporal statistics for every pixel of the image. The output data provide a scene characterization allowing a simple and efficient pixel-level change detection framework. For a more suitable detection, exploiting spatial correlation in these data is necessary. We use them as a multiple observation field in a Markov model, leading to a spatiotemporal regularization of the pixel-level solution. This method yields a good trade-off in terms of robustness and accuracy, with a minimal cost in memory and a low computational complexity.

CHAPTER 5

MODULES

5.1 LIST OF MODULES:

 Camera video module


 Motion detection
 Alarm system

5.2 MODULE DESCRIPTION

1. Camera video module

The camera video module is the part where we capture the video stream from the camera and then display it within the boundaries of a window in which the user may see the captured video. A camera-related operations package is used to handle the camera-related methods. Here we use the camera class to capture the camera feed, from either the local camera or a remote IP camera, and then display it in a window. This is visible to the user along with the other security controls.

2. Motion detection

The motion detection module consists of the motion detection algorithm, which helps us analyze the camera feed and detect and signal any motion-related triggers. It also comes with a motion sensitivity panel where the user can adjust the level of motion sensitivity required.

A timer is run that waits for the camera to focus in; then the control panels are activated. Settings like the keycode and option control states are stored in the application's default properties. When first run, or if the keycode property is cleared, the Arm/Disarm button acts as the keycode set button. The sounds used in the application are instances of the System.Media.SoundPlayer class, instanced and loaded with a sound file when the app initializes.

The camera's motion trigger-state is polled periodically using a custom timer class; if motion has occurred, an alarm timer is fired, which will loop until the alarm is deactivated or the maximum alarm timeout occurs.
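The application itself implements this with C# timers and System.Media.SoundPlayer; the following minimal Python sketch restates the same polling logic for illustration only, with `camera` and `play_alarm` as hypothetical stand-ins for the application's camera class and sound player, and the interval and timeout as assumed values:

```python
import time

POLL_INTERVAL = 0.5      # seconds between trigger-state polls (assumed value)
MAX_ALARM_SECONDS = 60   # maximum time the alarm loops before timing out (assumed)

def monitor(camera, play_alarm):
    """Poll the motion trigger-state; loop the alarm until the system is
    disarmed or the timeout expires."""
    while camera.armed:
        if camera.motion_triggered():
            started = time.time()
            # Alarm loop: runs until disarm or the max alarm timeout occurs.
            while camera.armed and time.time() - started < MAX_ALARM_SECONDS:
                play_alarm()          # analogous to SoundPlayer.Play() in C#
                time.sleep(POLL_INTERVAL)
        time.sleep(POLL_INTERVAL)
```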

3. Alarm system

The alarm system covers the remaining part of the project: alerting the user regarding the security threat. This is done by detecting any motion and setting off an alarm sound. This alarm alerts the user about the security issue that might be present. It also records the video on triggering the alarm.

When the alarm is in Armed mode, all but the keypad and disarm switch are disabled. This is done by disabling the Group Box that houses the controls. The main form close button also needs to be deactivated. This is done both by cancelling the form's exit in the Form Closing event and by disabling the Close button using the GetSystemMenu/EnableMenuItem API.

CHAPTER 6

SYSTEM DESIGN
6.1 DFD – LEVEL 0

6.2 DFD – LEVEL 1

CHAPTER 7

TESTING

7.1. SYSTEM TESTING

System testing of the software shows that it works well under the given hardware and software specifications. To test stability, other software was run simultaneously; this did not result in any change in the output, which showed that the system is very stable under load and in worst cases.

7.2. UNIT TESTING

Unit testing of the product showed that the three separate modules work efficiently and individually well. Each unit of the product was tested to ensure maximum security and stability.

7.3. FUNCTIONAL TESTING

Functional testing showed that the expected output functions were delivered without any faults or mismatches. Some expected and unexpected inputs were tested, which showed that the software is provided with the logic to handle those inputs. Unexpected errors are not likely to happen under normal circumstances.

7.4. WHITE BOX TESTING

White box testing of the program code of the functions shows that the modules are stable under normal conditions and are more efficient than the existing systems in their stability and efficiency, particularly in the motion detection module, which is weaker in existing systems.

7.5. BLACK BOX TESTING

Black box testing shows that every module of the software works as expected and produces the expected outputs. The functionality of each of the three modules was sampled and tested individually.

CHAPTER 8

SYSTEM DEVELOPMENT

8.1 SYSTEM DESIGN

Systems design is the process of defining the architecture, components, modules, interfaces, and data for a system to satisfy specified requirements. The software in this paper has a stable architecture, built in such a way as to work correctly even under conditions of very limited resources. This is because it does not concentrate on look and feel or additional payloads, but is very simple to build and produces the expected outcome.

8.2 CODE DESIGN

A code design is a document that sets rules for the design of a new development. The code of this software is written as a step-by-step process to show the execution at every step and to keep it simple, rather than making it more complex to execute and understand. For example, the camera video module has a code design in which the camera feed is first captured and then displayed to the user for further processing.

Test case 1: uploading an image and detecting a single object at a time.

A convolutional neural network (CNN) is a regularized type of feed-forward neural network that learns features by itself via filter (or kernel) optimization. This type of deep learning network has been applied to process and make predictions from many different types of data, including text, images and audio. Convolution-based networks are the de facto standard in deep learning based approaches to computer vision and image processing, and have only recently been replaced, in some cases, by newer deep learning architectures such as the transformer. Vanishing gradients and exploding gradients, seen during back-propagation in earlier neural networks, are prevented by using regularized weights over fewer connections. For example, for each neuron in the fully connected layer, 10,000 weights would be required for processing an image sized 100 × 100 pixels. However, applying cascaded convolution (or cross-correlation) kernels, only 25 neurons are required to process 5×5-sized tiles. Higher-layer features are extracted from wider context windows, compared to lower-layer features. Some applications of CNNs include:

• image and video recognition,

• recommender systems,

• image classification,

• image segmentation,

• medical image analysis,

• natural language processing,

• brain–computer interfaces, and

• financial time series.

CNNs are also known as shift invariant or space invariant artificial neural networks, based on the shared-weight architecture of the convolution kernels or filters that slide along input features and provide translation-equivariant responses known as feature maps. Counter-intuitively, most convolutional neural networks are not invariant to translation, due to the downsampling operation they apply to the input.

Feed-forward neural networks are usually fully connected networks; that is, each neuron in one layer is connected to all neurons in the next layer. The "full connectivity" of these networks makes them prone to overfitting data. Typical ways of regularization, or preventing overfitting, include penalizing parameters during training (such as weight decay) or trimming connectivity (skipped connections, dropout, etc.). Robust datasets also increase the probability that CNNs will learn the generalized principles that characterize a given dataset rather than the biases of a poorly populated set.
Convolutional networks were inspired by biological processes, in that the connectivity pattern between neurons resembles the organization of the animal visual cortex. Individual cortical neurons respond to stimuli only in a restricted region of the visual field known as the receptive field. The receptive fields of different neurons partially overlap such that they cover the entire visual field. CNNs use relatively little pre-processing compared to other image classification algorithms. This means that the network learns to optimize the filters (or kernels) through automated learning, whereas in traditional algorithms these filters are hand-engineered. This independence from prior knowledge and human intervention in feature extraction is a major advantage. Comparing the LeNet and AlexNet convolution, pooling and dense layers: the AlexNet image size should be 227×227×3, instead of 224×224×3, so that the math comes out right. The original paper gave different numbers, but Andrej Karpathy, the head of computer vision at Tesla, said it should be 227×227×3 (he said Alex did not describe why he put 224×224×3). The next convolution should be 11×11 with stride 4: 55×55×96 (instead of 54×54×96). It would be calculated, for example, as: [(input width 227 − kernel width 11) / stride 4] + 1 = [(227 − 11) / 4] + 1 = 55. Since the kernel output is the same length as its width, its area is 55×55.
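This arithmetic can be captured in a small helper; a minimal sketch:

```python
def conv_output_size(input_size, kernel_size, stride=1, padding=0):
    """Output size of a convolution along one spatial dimension."""
    return (input_size + 2 * padding - kernel_size) // stride + 1

# AlexNet's first layer, matching the calculation above:
assert conv_output_size(227, kernel_size=11, stride=4) == 55   # 55x55 spatial map
```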

A convolutional neural network consists of an input layer, hidden layers and an output layer. In a convolutional neural network, the hidden layers include one or more layers that perform convolutions. Typically this includes a layer that performs a dot product of the convolution kernel with the layer's input matrix. This product is usually the Frobenius inner product, and its activation function is commonly ReLU. As the convolution kernel slides along the input matrix for the layer, the convolution operation generates a feature map, which in turn contributes to the input of the next layer. This is followed by other layers such as pooling layers, fully connected layers, and normalization layers. Here it should be noted how close a convolutional neural network is to a matched filter.

Convolutional layers

In a CNN, the input is a tensor with shape:

(number of inputs) × (input height) × (input width) × (input channels)

After passing through a convolutional layer, the image becomes abstracted to a feature map, also called an activation map, with shape:

(number of inputs) × (feature map height) × (feature map width) × (feature map channels)
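A minimal PyTorch sketch of these shapes (note that PyTorch orders dimensions as inputs × channels × height × width, a channels-first variant of the convention above; the sizes are illustrative):

```python
import torch
import torch.nn as nn

x = torch.randn(8, 3, 64, 64)   # (number of inputs, channels, height, width)
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=5, padding=2)
feature_map = conv(x)           # the activation map produced by the layer
print(feature_map.shape)        # torch.Size([8, 16, 64, 64])
```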

Convolutional layers convolve the input and pass the result to the next layer. This is similar to the response of a neuron in the visual cortex to a specific stimulus. Each convolutional neuron processes data only for its receptive field. Although fully connected feed-forward neural networks can be used to learn features and classify data, this architecture is generally impractical for larger inputs (e.g., high-resolution images), which would require massive numbers of neurons because each pixel is a relevant input feature. A fully connected layer for an image of size 100 × 100 has 10,000 weights for each neuron in the second layer. Convolution reduces the number of free parameters, allowing the network to be deeper. For example, using a 5 × 5 tiling region, each with the same shared weights, requires only 25 neurons. Using regularized weights over fewer parameters avoids the vanishing gradients and exploding gradients problems seen during back-propagation in earlier neural networks.

To speed processing, standard convolutional layers can be replaced by depthwise separable convolutional layers, which are based on a depthwise convolution followed by a pointwise convolution. The depthwise convolution is a spatial convolution applied independently over each channel of the input tensor, while the pointwise convolution is a standard convolution restricted to a 1×1 kernel.
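A minimal PyTorch sketch of the depthwise separable layer just described; the channel counts are illustrative assumptions:

```python
import torch
import torch.nn as nn

separable = nn.Sequential(
    # Depthwise: one 3x3 spatial filter per channel (groups = in_channels).
    nn.Conv2d(32, 32, kernel_size=3, padding=1, groups=32),
    # Pointwise: a standard convolution restricted to a 1x1 kernel.
    nn.Conv2d(32, 64, kernel_size=1),
)
y = separable(torch.randn(1, 32, 28, 28))
print(y.shape)   # torch.Size([1, 64, 28, 28])
```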
Pooling layers

Convolutional networks may include local and/or global pooling layers along with traditional convolutional layers. Pooling layers reduce the dimensions of data by combining the outputs of neuron clusters at one layer into a single neuron in the next layer. Local pooling combines small clusters; tiling sizes such as 2 × 2 are commonly used. Global pooling acts on all the neurons of the feature map. There are two common types of pooling in popular use: max and average. Max pooling uses the maximum value of each local cluster of neurons in the feature map, while average pooling takes the average value.
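A minimal PyTorch sketch of the pooling variants described above (tensor sizes are illustrative):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 16, 32, 32)
max_pool = nn.MaxPool2d(kernel_size=2)    # maximum of each local 2x2 cluster
avg_pool = nn.AvgPool2d(kernel_size=2)    # average of each local 2x2 cluster
global_pool = nn.AdaptiveAvgPool2d(1)     # global pooling over the whole map
print(max_pool(x).shape, global_pool(x).shape)
# torch.Size([1, 16, 16, 16]) torch.Size([1, 16, 1, 1])
```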

Fully connected layers


Fully connected layers connect every neuron in one layer to every neuron in another layer. This is the same as in a traditional multilayer perceptron (MLP) neural network. The flattened matrix goes through a fully connected layer to classify the images.
Receptive field

In neural networks, each neuron receives input from some number of locations in the previous layer. In a convolutional layer, each neuron receives input from only a restricted area of the previous layer called the neuron's receptive field. Typically the area is a square (e.g., 5 by 5 neurons). In contrast, in a fully connected layer, the receptive field is the entire previous layer. Thus, in each convolutional layer, each neuron takes input from a larger area in the input than previous layers. This is due to applying the convolution over and over, which takes the value of a pixel into account, as well as its surrounding pixels. When using dilated layers, the number of pixels in the receptive field remains constant, but the field is more sparsely populated as its dimensions grow when combining the effect of several layers.

To manipulate the receptive field size as desired, there are some alternatives to the standard convolutional layer. For example, atrous or dilated convolution expands the receptive field size without increasing the number of parameters by interleaving visible and blind regions. Moreover, a single dilated convolutional layer can comprise filters with multiple dilation ratios, thus having a variable receptive field size.
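A small sketch of this effect, under the assumption of stride-1 layers, where each layer adds (kernel size − 1) × dilation to the receptive field:

```python
def receptive_field(layers):
    """Receptive field of a stack of stride-1 convolutions.

    `layers` is a list of (kernel_size, dilation) pairs.
    """
    rf = 1
    for kernel_size, dilation in layers:
        rf += (kernel_size - 1) * dilation
    return rf

# Three 3x3 layers: plain vs. dilation rates 1, 2, 4 (parameter count unchanged).
print(receptive_field([(3, 1), (3, 1), (3, 1)]))   # 7
print(receptive_field([(3, 1), (3, 2), (3, 4)]))   # 15
```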

Weights

Each neuron in a neural network computes an output value by applying a specific function to the input values received from the receptive field in the previous layer. The function that is applied to the input values is determined by a vector of weights and a bias (typically real numbers). Learning consists of iteratively adjusting these biases and weights.

The vectors of weights and biases are called filters and represent particular features of the input (e.g., a particular shape). A distinguishing feature of CNNs is that many neurons can share the same filter. This reduces the memory footprint because a single bias and a single vector of weights are used across all receptive fields that share that filter, as opposed to each receptive field having its own bias and vector of weights.
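The saving can be checked by counting parameters; a minimal PyTorch sketch reproducing the 10,000-weight versus 25-weight comparison from the text (the extra unit in each count is the bias):

```python
import torch.nn as nn

dense = nn.Linear(100 * 100, 1)        # one dense neuron: 10,000 weights + 1 bias
conv = nn.Conv2d(1, 1, kernel_size=5)  # one shared 5x5 filter: 25 weights + 1 bias
count = lambda module: sum(p.numel() for p in module.parameters())
print(count(dense), count(conv))       # 10001 26
```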
Deconvolutional

A deconvolutional neural network is essentially the reverse of a CNN. It consists of deconvolutional layers and unpooling layers. A deconvolutional layer is the transpose of a convolutional layer. Specifically, a convolutional layer can be written as a multiplication with a matrix, and a deconvolutional layer is a multiplication with the transpose of that matrix. The artifacts this operation can produce can be fixed by an upscale-then-convolve approach.
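A minimal PyTorch sketch of a transposed (deconvolutional) layer upsampling a feature map; the sizes are illustrative:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 16, 14, 14)
deconv = nn.ConvTranspose2d(16, 8, kernel_size=2, stride=2)  # transpose of a conv
print(deconv(x).shape)   # torch.Size([1, 8, 28, 28]) -- spatial size doubled
```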

8.3 DIGITAL IMAGE PROCESSING


Digital image processing is the use of computer algorithms to perform image
processing on digital images. As a subcategory or field of digital signal
processing, digital image processing has many advantages over analog image
processing. It allows a much wider range of algorithms to be applied to the
input data and can avoid problems such as the build-up of noise and signal
distortion during processing. Considerable advances have been made over the
past 30 years, resulting in the routine application of image processing to problems in medicine, manufacturing, entertainment, law enforcement, and many others.


Examples include mapping internal organs in medicine using various scanning
technologies (image reconstruction from projections), automatic fingerprint
recognition (pattern recognition and image coding), and HDTV (video coding).

The discipline of image processing covers a vast area of scientific and


engineering knowledge. It is built on a foundation of one- and two-dimensional
signal processing theory and overlaps with such disciplines as artificial
intelligence (scene understanding), information theory (image coding),
statistical pattern recognition (image classification), communication theory
(image coding and transmission), and microelectronics (image sensors, image
processing hardware). Broadly, image processing may be subdivided into the
following categories: enhancement, restoration, coding, and understanding. The
goal in the first three categories is to improve the pictorial information either in
quality (for purposes of human interpretation) or in transmission efficiency. In
the last category, the objective is to obtain a symbolic description of the scene.
In electrical engineering and computer science, image processing is any form of
signal processing for which the input is an image, such as a photograph or video
frame; the output of image processing may be either an image or a set of
characteristics related to the image. Image Processing and Analysis can be
defined as the "act of examining images for the purpose of identifying objects
and judging their significance". A major attraction of digital imaging is the
ability to manipulate image and video information with the computer. Digital
image processing is now a very important component in many industrial and
commercial applications and a core component of computer vision applications.
Image processing techniques also provide the basic functional support for
document image analysis and many other medical applications. The field of
digital image processing is continually evolving. Transform theory plays a key role in image processing. Image and signal compression is one of the most important applications of wavelets. A key idea of wavelets is the concept of scale. The discrete wavelet transform decomposes an image into approximation and detail components.

Image processing deals with the processing and display of images of real objects. The emphasis is on the modification of the image, which takes in a digital image and produces some other information, a decision, etc. Communication in digital imaging primarily involves local communication between image processing systems and remote communication from one point to another, typically in connection with the transmission of image data. Communication across vast distances presents a more serious challenge if the intent is to communicate image data rather than abstracted results. Monochrome and color TV monitors are the principal display devices used in modern digital processing systems. Printed image display devices are useful primarily for low-resolution image processing work.
Digital image processing applications include the following:

 Restorations and enhancements.


 Image transmission and coding.
 Color processing.
 Remote sensing.
 Robot vision.
 Hybrid technique
 Pattern recognition.
 High-quality color representation.
 Super-high-definition image processing.
 Impact of standardization on image processing.

 In Agriculture
 Air pollution and environment survey.
 Geology.
 In Industry
 Nondestructive testing & quality inspection.
 Remote sensing.
 Security industry.
 Robotics.
 Print industry.
 Entertainment & Ad – industry.
 In Health Care
 Medical imaging.
 Digital & computer assisted radiology.
8.3.1 Output Image

The term image refers to a two-dimensional light intensity function f(x, y), where x and y denote spatial coordinates and the value at any point (x, y) is proportional to the brightness of the image at that point.

8.3.2 Digital Image

A digital image can be considered as a matrix whose row and column indices identify a point in the image and whose corresponding matrix element value identifies the gray level at that point. In the most generalized way, a digital image is an array of numbers depicting the spatial distribution of a certain field of parameters. A digital image consists of discrete picture elements called pixels.
Based on the way that image data is saved, images can be split into 3 different
types. They are

 Bitmap
 Vector

 Metafile
8.3.4 Bitmap
Bitmap images are exactly what their name says they are: a collection of bits that form an image. The image consists of a matrix of individual dots (or pixels) that each have their own color, described using bits. Bitmap graphics are also called raster images. A picture saved using the Paint program is likely to have the .bmp file extension, for bitmap. The data in .bmp files is not compressed; therefore bitmap files tend to be very large. Bitmap graphics can be saved in any of these formats: GIF, JPEG, TIFF, BMP, PICT, PNG and PCX.
8.3.5 Vector

In vector graphics, the co-ordinates of images (lines and curves) are saved as
mathematical data. You can imagine the co-ordinates as being all the points
through which lines and curves pass. It's a little like drawing a square on a piece
of graph paper and describing it, using the co-ordinates of all 4 corners.
Computer Aided Design (CAD) is based on vector graphics. Images produced
using vector graphics are ideal for many purposes because they're so much
smaller than bitmaps - it is not necessary to store information about every pixel,
just about the lines and curves, their co-ordinates, width and color. The format
of your vector graphic could be one of many, depending on the software used. Examples of commercial software that use vector graphics are CorelDRAW, Macromedia Flash and Adobe Illustrator. Scalable Vector Graphics, or SVG, is a newer graphics format that allows Web designers to include very realistic interactive vector graphics and animation in Web pages using only plain-text commands based on XML (Extensible Markup Language).

Metafile graphics are simply 2D graphics that are made up of both vector and bitmap data. If you drew a shape using vector graphics and then filled it with a bitmap pattern, you would have a metafile. The vector object still retains the

property of scalability without any loss of resolution. (A shape created as a vector graphic and then saved as a .gif for inclusion on a page is, unfortunately, changed to a bitmap, with a subsequent loss of scalability.) Clip Art images for use with desktop publishing are usually supplied as metafiles. If you're a sub-editor or a desktop publishing user, you would want to be able to rescale or stretch graphics to fill the space you have, whilst retaining resolution, rather than create them again from scratch. Metafile graphics suit this purpose admirably. The formats that you're likely to meet are WMF (Windows Metafile), EMF (Enhanced Metafile), and CGM (Computer Graphics Metafile). CGM graphics have many applications because the image size is independent of the file size, which means that you can enlarge the original graphic without increasing the file size. This makes it ideal for many electronic document applications, maps (think of being able to zoom in without waiting ages for the image to load), technical drawings, and icons.
8.3.6 Bitmap Graphics Formats

Bitmap graphics format is the specific format in which an image file is saved. The format is identified by the three-letter extension at the end of the file name. Every format has its own characteristics, advantages and disadvantages. By identifying the file format it may be possible to determine the number of bits per pixel. The bitmap graphics formats are listed below:

 GIF Format
 JPEG Format
 TIFF Format
 BMP Format
 PICT Format
 PNG Format
 PCX Format

A digital image is an array of real or complex numbers representing two-dimensional data. The elements of a general-purpose system capable of performing the image processing operations are:
1. Image acquisition
2. Image storage
3. Processing the image
4. Communication
5. Display
8.3.7 Image Acquisition

Image acquisition is the process of acquiring digital images using a physical device and a digitizer. The most commonly used image acquisition devices are scanners and video cameras. To acquire an image we require an imaging sensor and the capability to digitize the signal produced by the sensor.

Fig. 8.1 Image acquisition: analog input → sensor → analog image → ADC → digital output

8.3.8. Image Storage

Storage in digital image processing falls into the following three categories. They are:

1. Short-term storage - used during processing
2. Online storage - for relatively fast recall
3. Archival storage - characterized by infrequent access

Processing of a digital image involves procedures that are usually expressed in algorithmic form. With the exception of image acquisition and display, most image processing functions are implemented in software.

8.3.9 Steps In Image Processing

There are seven steps involved in digital image processing. Image processing and analysis can be defined as the "act of examining images for the purpose of identifying objects and judging their significance". A major attraction of digital imaging is the ability to manipulate image and video information with the computer. Digital image processing is now a very important component in many industrial and commercial applications and a core component of computer vision applications.

Fig. 8.2 Steps involved in image processing: image acquisition, preprocessing, image enhancement, image restoration, segmentation, representation and description, and recognition, supported by a knowledge base and by morphological processing, multiresolution/wavelet transformation, and comparison stages

The key function of preprocessing is to improve the image in ways that increase the chances of success of the subsequent processes. These processes are image enhancement and image rectification and restoration.
(i) Image enhancement
1) To provide a more effective display of data for visual interpretation (the human eye can distinguish up to 40 grey shades)
2) To increase the visual distinction between features in a scene
3) "Digital darkroom" techniques
(ii) Image rectification and restoration
1) Correction of geometric distortions
2) Calibration of data
3) Elimination of noise

Segmentation partitions the input image into its constituent parts or objects. In general, autonomous segmentation is one of the most difficult tasks in digital image processing.

Description, also called feature selection, deals with extracting features that
result in some quantitative information of interest or features that are basic for
differentiating one class of objects from another.

Recognition is the process that assigns a label to an object based on the


information provided by its descriptors. Interpretation involves assigning
meaning to an ensemble of recognized objects. Knowledge about a problem
domain is coded into an image processing system in the form of a knowledge
database. This knowledge may be as simple as detailing regions of an image
where the information of interest is known to be located, thus limiting the
search that has to be conducted in seeking the information.

The knowledge base can be quite complex such as an interrelated list of all
major possible defects in a materials inspection problem or an image database
containing high-resolution satellite images of a region in connection with
change-detection applications.

The increasing number of digital photos on both personal devices and the Internet has posed a big challenge for storage. With the fast development and prevailing use of handheld cameras, the cost as well as the required photography skills are much lower than before. Users today have gotten used to taking and posting photos profligately with mobile phones, digital cameras, and other portable devices to record daily life, share experiences, and promote businesses. According to recent reports, Instagram users have been posting an average of 55 million photos every day, and Facebook users are uploading 350 million photos each day. How to store, back up, and maintain this enormous number of photos in an efficient way has become an urgent problem. The most popular way to reduce the storage size of photos is via JPEG compression, which is designed to reduce the size of photos taken in realistic scenes with smooth variations of tone and color. Though several superior formats have since been proposed, JPEG is supported by almost all imaging devices such as digital cameras and smartphones. Consequently, the overwhelming majority of images stored on both personal devices and the Internet are in JPEG format. The lossy coded baseline JPEG image is referred to as the JPEG coded image.
The JPEG baseline nevertheless leaves plenty of room for further compression. It reduces inter-block redundancy among DC coefficients via differential coding and exploits the statistical correlation inside each block through a table-based Huffman coding. The performance of image compression can be enhanced by introducing both advanced intra prediction methods, such as pixel-wise and block-wise intra predictors, and high-efficiency entropy coding methods like arithmetic coders.
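A minimal sketch of the differential (DPCM) coding of DC coefficients described above; the coefficient values are invented for the example:

```python
def dc_differential(dc_coefficients):
    """Differential coding of per-block DC coefficients, as in baseline JPEG:
    each DC value is coded as the difference from the previous block's DC."""
    previous = 0
    residues = []
    for dc in dc_coefficients:
        residues.append(dc - previous)
        previous = dc
    return residues

print(dc_differential([120, 123, 121, 130]))   # [120, 3, -2, 9]
```

Because DC values of neighbouring blocks are strongly correlated, the residues are small and cheap to entropy-code.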

We notice that all the aforementioned intra prediction methods operate on pixels in the raw images (i.e. original captured images without any compression). To further compress JPEG coded images, these methods have to first decode the JPEG coded images to the spatial domain and then perform intra or inter predictions. When lossless compression is mandatory, as for photo archiving, backup, and sync, the prediction residues may need more bits for lossless compression compared with the corresponding original JPEG coded image. To address this problem, some lossless compression methods have been proposed to reduce the spatial redundancy in the frequency domain or to design advanced arithmetic entropy coding methods. Besides, commercial and open-source archivers such as WinZip, PackJPG, and StuffIt have employed dedicated algorithms to reduce the file size of individual JPEG coded images.

On the other hand, when dealing with a group of correlated images, the inter-image redundancy can be exploited by organizing the images as a pseudo sequence and compressing the sequence like a video, or by subtracting a representative signal (e.g., an average image) from each image and coding the residues using image coding methods. Recently, inter-image redundancy has also been investigated for image compression using a predefined 3D model, or similar images retrieved from clouds or videos. However, all these compression schemes are designed for coding pixels in raw images. To the best of our knowledge, no lossless compression scheme for existing JPEG coded image sets has been presented before.

We therefore propose a novel compression scheme to further compress a set of JPEG coded correlated images without loss. Given a JPEG coded image set, we propose to remove both inter and intra redundancies by a hybrid prediction in the feature, spatial, and frequency domains. We first evaluate the pair-wise correlation between images by introducing a feature-based measurement, so as to determine the prediction structure, which is robust to scale, rotation, and illumination. The disparity between images is then compensated by both global (geometric and photometric) alignments and local motion estimation in the spatial domain. Furthermore, we reduce both the inter and intra redundancies via frequency-domain prediction and context-adaptive entropy coding. Compared with our preliminary work, we not only provide more details and discussion of our scheme here, but, more importantly, we further improve the coding performance by introducing both an intra-frame lossless compression algorithm and advanced entropy coding methods. This reduces the cost of storage and transmission of JPEG-coded image collections (e.g. geotagged images and personal albums) transparently for personal and cloud applications.

Here, in the above figure, we have tested by uploading a single image and detecting a single object from that image; the test case passed successfully.

Test case 2: uploading an image and detecting multiple objects in that image at a time.

Here, in the above figure, we have tested by uploading a single image containing multiple objects and detecting the multiple objects from that image; the test cases passed successfully.

Sample output images



CHAPTER 9

CONCLUSION

9.1 PROJECT CONCLUSION

Although remote camera monitoring is not a new area of development, the novelty of the current system is the utilization of the local camera along with the remote IP camera, which makes the system ubiquitous. The application APIs have been designed in such a way that they can be used as an interface for a wide range of people and clients. This aspect can be used as a generic platform for many other security monitoring applications.

9.2 FUTURE ENHANCEMENT

We propose a two-stage network architecture to detect and repair damaged old photos automatically. Our network uses the same model architecture in each stage (except the output layer) but different parameter sets for damage detection and repair. The concepts of parallel structure and channel attention are integrated in our network to enhance performance. Experiments show that our method can accurately identify the distorted areas and successfully repair them.
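As a rough illustration only, a minimal PyTorch sketch of the two-stage idea: the layer widths, depths, and masking step are our own illustrative assumptions, not the report's actual configuration, and the parallel structure and channel attention mentioned above are omitted for brevity:

```python
import torch
import torch.nn as nn

def stage(out_channels):
    """One stage; the report states both stages share the same architecture
    except for the output layer. Layer sizes here are illustrative."""
    return nn.Sequential(
        nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
        nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
        nn.Conv2d(32, out_channels, 3, padding=1),
    )

detector = stage(out_channels=1)   # stage 1: per-pixel damage mask
repairer = stage(out_channels=3)   # stage 2: repaired RGB image

photo = torch.randn(1, 3, 256, 256)
mask = torch.sigmoid(detector(photo))       # detected damaged areas
restored = repairer(photo * (1 - mask))     # repair the masked regions
```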

CHAPTER 10

APPENDIX

10.1 SCREEN SHOTS

 MAIN FORM DESIGN
 VIDEO CAPTURE DEVICE FORM
 VIEWER FORM

REFERENCES

[1] Echoboomer: www.worldwidewords.org/turnsofphrase/tpech1.htm.

[2] Patrick Seeling and Martin Reisslein, "Evaluating multimedia networking mechanisms using video traces," IEEE Potentials, October/November 2005.

[3] J.-S. Hu and T.-M. Su, "Robust Environmental Change Detection Using PTZ Camera via Spatial-Temporal Probabilistic Modeling," IEEE/ASME Transactions on Mechatronics, vol. 12, no. 3, pp. 339-344, 2007.

[4] M. Shah, O. Javed, and K. Shafique, "Automated visual surveillance in realistic scenarios," vol. 14, no. 1, Los Alamitos, CA, USA: IEEE Computer Society, 2007, pp. 30-39.

[5] A. J. Lipton, "Keynote: intelligent video as a force multiplier for crime detection and prevention," The IEE International Symposium on Imaging for Crime Detection and Prevention, 2005, pp. 151-156.

[6] M. Quaritsch, M. Kreuzthaler, B. Rinner, H. Bischof, and B. Strobl, "Autonomous multicamera tracking on embedded smart cameras," EURASIP Journal on Embedded Systems, vol. 2007. [Online]. Available: http://www.hindawi.com/GetArticle.aspx?doi=10.1155/2007/92827

[7] A. Hampapur, L. Brown, J. Connell, A. Ekin, N. Haas, M. Lu, H. Merkl, and S. Pankanti, "Smart video surveillance: exploring the concept of multiscale spatiotemporal tracking," IEEE Signal Processing Magazine, vol. 22, no. 2, pp. 38-51, March 2005.

[8] ActiveEye, "Active alert," 2005. [Online]. Available: http://www.activeye.com/press rel forensics.html
