
CHAPTER-1

INTRODUCTION
Visual perception is the ability to interpret the surrounding environment by processing
information that is contained in visible light. The resulting perception is also known as eyesight,
sight, or vision. The various physiological components involved in vision are referred to
collectively as the visual system.
Human vision is a complex process that is not yet completely understood, despite hundreds of
years of study and research. The complex physical process of visualizing something involves the
nearly simultaneous interaction of the eyes and the brain through a network of neurons, receptors,
and other specialized cells. The act of seeing starts when the cornea and then the lens of the eye
focuses an image of its surroundings onto a light-sensitive membrane in the back of the eye,
called the retina. The retina is actually part of the brain that is isolated to serve as a transducer for
the conversion of patterns of light into neuronal signals. The lens of the eye focuses light on the
photoreceptive cells of the retina, also known as the rods and cones, which detect the photons of
light and respond by producing neural impulses. These signals are processed in a hierarchical
fashion by different parts of the brain, from the retina upstream to central ganglia in the brain.
Vision plays an important role in intelligence because the brain processes what we see and makes us act
accordingly. Implementing the same concept in a robot, with a camera as the eye and a powerful
processor as the brain, leads to artificial intelligence. Robotic vision continues to be studied
through different methods for processing, analyzing, and understanding images. All these methods
produce information that is translated into decisions for robots. The different processes or
algorithms used in robotic vision are object tracking and detection based on colour, movement,
shape etc. But all of these have restrictions and drawbacks, since they depend heavily on camera
lighting, which varies from morning to night. So replicating the brain's methods of object
identification and understanding is the ideal solution for robotic vision and its intelligence.
Colour, shape, size and other properties are the main parameters that define an object, and by providing all
of these to a learning process called an Artificial Neural Network (ANN) the goal can be
achieved. Artificial Neural Networks are relatively crude electronic models based on the neural
structure of the brain. The brain basically learns from experience. This brain modeling also
promises a less technical way to develop machine solutions. If the robot learns by itself, without
human teaching, it exhibits artificial intelligence, and that is the most complicated goal of all.

CHAPTER-2

HUMAN VISUAL SYSTEM (HVS)


The human visual system gives humans the ability to see the physical environment. The system
requires communication between its major sensory organ (the eye) and the core of the central
nervous system (the brain) to interpret external stimuli (light waves) as images. Humans are
highly visual creatures compared to many other animals, which rely more on smell or hearing, and
over our evolutionary history we have developed an incredibly complex sight system.
The visual system is the part of the central nervous system which gives organisms the ability to
process visual detail, as well as enabling the formation of several non-image photo response
functions. It detects and interprets information from visible light to build a representation of the
surrounding environment. The visual system carries out a number of complex tasks, including the
reception of light and the formation of monocular representations; the buildup of a
binocular perception from a pair of two-dimensional projections; the identification and
categorization of visual objects; assessing distances to and between objects; and guiding body
movements in relation to the objects seen. The psychological processing of visual information is
known as visual perception, a lack of which is called blindness. Non-image forming visual
functions, independent of visual perception, include the pupillary light reflex (PLR) and circadian
photo entrainment.
The visual cortex is the largest system in the human brain and is responsible for processing the
visual image. It lies at the rear of the brain (highlighted in the image), above the cerebellum. The
region that receives information directly from the LGN (The Lateral Geniculate Nucleus is a
sensory relay nucleus in the thalamus of the brain.) is called the primary visual cortex, (also called
V1 and striate cortex). Visual information then flows through a cortical hierarchy. These areas
include V2, V3, V4 and area V5/MT (the exact connectivity depends on the species of the
animal). These secondary visual areas (collectively termed the extra striate visual cortex) process
a wide variety of visual primitives. Neurons in V1 and V2 respond selectively to bars of specific
orientations, or combinations of bars. These are believed to support edge and corner detection.
Similarly, basic information about color and motion is processed here.
Color vision is a critical component of human vision and plays an important role in both
perception and communication. Color sensors are found within cones, which respond to relatively
broad color bands in the three basic regions of red, green, and blue (RGB). Any colors in between
these three are perceived as different linear combinations of RGB. The eye is much more sensitive
to overall light and color intensity than to changes in the color itself. Colors have three attributes:
brightness, based on luminance and reflectivity; saturation, based on the amount of white present;
and hue, based on color combinations. Sophisticated combinations of these receptor signals are
transduced into chemical and electrical signals, which are sent to the brain for the dynamic
process of color perception.

Fig 2.1 Formation of image in the brain

Depth perception refers to our ability to see the world in three dimensions. With this ability, we
can interact with the physical world by accurately gauging the distance to a given object. While
depth perception is often attributed to binocular vision (vision from two eyes), it also relies
heavily on monocular cues (cues from only one eye) to function properly. These cues range from
the convergence of our eyes and accommodation of the lens to optical flow and motion.


CHAPTER-3

DIGITIZING HUMAN VISION


Digitizing is the method of implementing the human vision system in a robot so that it can sense
vision. Human vision is one of the most complex processes, and with the processors and algorithms
available today, it is very difficult to implement on a robotic system.

3.1 Visual Learning


Learning is the technique used by the brain to do everything: vision, understanding, thinking etc.
The visual learning process helps humans to understand, identify, detect and analyse anything seen
through the eyes. From birth to death, the information in the brain keeps expanding. A newborn baby
doesn't have the ability to understand the objects contained in the visual frame. But it can detect or
differentiate objects because, along with the three colour channels, a fourth channel called depth is
also captured, which makes the visuals three dimensional.
The idea behind teaching students from kindergarten using texts that contain visual
representations of fruits, vegetables, things etc. is that for every object there is a name
and a classification, so that the brain can easily process the image and understand it. For example, when a
teacher shows an image of an apple and says "this thing is called an apple", what happens inside the student's
brain is learning. The 2D image of the apple gets stored in the brain under the name "apple". When the
teacher adds that what is seen in the text is 2D, that it has a back side of the same shape, and that it is a fruit,
the student's brain adds these specifications to the stored image so that a 3D apple can be understood
easily without any doubt. If the student sees an apple that is beyond the stored specifications or features, what
happens is that he or she asks "That is an apple, isn't it?", and the brain finalises that it is an
apple. This process is called visual learning.

3.2 Programming Language Approach


This is a way of explaining the activities happening inside the brain with the help of a
programming language. Take the C language as an example: there, functions are used for processing
different things and creating outputs. Almost all functions have input parameters and a
return result. Based on these results the brain takes decisions and the human acts accordingly.
Sometimes these functions' outputs get modified based on what has been newly learned over the previous
data. Input parameters to a function come from all the sense organs, so that the most efficient result is produced.


The block diagram below (Fig 3.1) shows the outline of brain functioning when it sees two things, one
known and the other unknown.

Fig 3.1 Block Diagram (panels: observing a known thing; observing an unknown thing)

When the brain observes a new thing, a new memory allocation for the object actually happens,
just as in childhood. Along with the image, the brain saves information such as shape, size,
features and specifications with the object. All this information is stored so that it can be reused as
parameters when the observer sees the same or a similar thing in the future.
The scenario is different for processing a known thing: there the brain uses the stored information to
take the right decision, and along with that it modifies or adds new information. All the processing
taking place related to visuals is image processing, and it happens in real time. There are also several
duplicate objects that resemble the original one in shape, size and colour; here the brain can get
confused. Confusion situations are handled by another function, which waits
for a further parameter (obtained by asking doubts) and confirms whether the object is new or not.
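
A toy Python sketch of this known-versus-unknown flow; the memory dictionary and the feature names are purely illustrative assumptions, not a claim about how the brain actually stores objects:

memory = {}   # object name -> stored features (shape, size, colour, ...)

def observe(name, features):
    """Store a new object, or refine the stored information for a known one."""
    if name not in memory:
        # New thing: allocate memory and save its features, as in childhood
        memory[name] = dict(features)
        return "learned new object"
    # Known thing: reuse the stored information and update it with new details
    memory[name].update(features)
    return "recognised, information refined"

print(observe("apple", {"shape": "round", "colour": "red"}))   # first encounter
print(observe("apple", {"size": "small"}))                     # later encounter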

3.3 Image Processing


In imaging science, image processing is the processing of images using mathematical operations,
applying any form of signal processing for which the input is an image, a series of images, or a
video, such as a photograph or video frame; the output of image processing may be either an
image or a set of characteristics or parameters related to the image. Most image-processing
techniques involve treating the image as a two-dimensional signal and applying standard
signal-processing techniques to it. Images are also processed as three-dimensional signals, with the
third dimension being time or the z-axis.
Closely related to image processing are computer graphics and computer vision. In computer
graphics, images are manually made from physical models of objects, environments, and lighting,
instead of being acquired (via imaging devices such as cameras) from natural scenes, as in most
animated movies. Computer vision, on the other hand, is often considered high-level image
processing, in which a machine/computer/software intends to decipher the physical contents of
an image or a sequence of images (e.g., videos or 3D full-body magnetic resonance scans). In the
entire image processing chain, from the point at which the image is captured to the point at which a
decision is taken, single or several frames are considered for deep processing.

3.4 Frame Rate


In video (both analog and high definition), just as in film, images are displayed as frames.
However, there are differences in the way the frames are displayed on a television screen. In terms
of traditional video content, in NTSC-based countries there are 30 separate frames displayed
every second (1 complete frame every 1/30th of a second), while in PAL-based countries there
are 25 separate frames displayed every second (1 complete frame every 1/25th of a
second). These frames are displayed using either the interlaced scan method or the progressive
scan method. The human eye's frame rate can be defined as the number of frames that it captures or
processes in a second. In theory about 1000 frames can be interpreted, but research and
experiments show that analyzing and interpreting only about 220 frames is possible, even with deep
training.
According to research that determined how many light flashes per second the human brain can
discern as separate before they look like a steady beam, scientists have found that for us, life is a
movie running at around 60 frames per second. When a car's wheel is spinning fast enough to
simulate that frame rate, it can look like the wheel spins backwards. What's happening is that the
wheel's position is only slightly behind where it was when the last frame was stitched together by
your brain.
If you get a car's wheel spinning in sync with the frame rate, it can even look as though the car is
gliding rather than rolling.
The motion picture, the scanning of an image for television, and the sequential reproduction of the
flickering visual images they produce work, in part, because of an optical phenomenon called the
persistence of vision and its psychological partner, the phi phenomenon: the mental bridge that the
mind forms to conceptually complete the gaps between the frames or pictures. Persistence of
vision also plays a role in keeping the world from going pitch black every time we blink our eyes.
Whenever light strikes the retina, the brain retains the impression of that light for about a tenth of
a second (depending on the brightness of the image) after the source of that light is removed from
the eye. This is due to a prolonged chemical reaction. As a result, the eye cannot clearly
distinguish changes in light that occur faster than this retention period. The changes either go
unnoticed or they appear to be one continuous picture to the human observer. This fundamental
fact of the way we see has been used to our advantage.


Fig 3.2 Frame rate

CHAPTER-4

OBJECT DETECTING TECHNIQUES


Object detection is a computer technology related to computer vision and image processing that
deals with detecting instances of semantic objects of a certain class (such as humans, buildings, or
cars) in digital images and videos. Well-researched domains of object detection include face
detection and pedestrian detection. Object detection has applications in many areas of computer
vision, including image retrieval and video surveillance. The goal of object detection is to detect
all instances of objects from a known class, such as people, cars or faces, in an image. Typically
only a small number of instances of the object are present in the image, but there is a very large
number of possible locations and scales at which they can occur, and these need to somehow be
explored.

4.1 Region of Interest (ROI)


A region of interest (often abbreviated ROI) is a selected subset of samples within a dataset
identified for a particular purpose. In the figure given below the ROI is converted to a binary image
while everything else remains in normal RGB colour. Detection of the ROI is the first step, on which
the further processing to identify or understand the object is based.

Fig 4.1 ROI Detection



The different methods for ROI detection include the background subtraction method and the
colour detection method. Further processing and learning on the ROI is done with the help
of an Artificial Neural Network (ANN). In machine learning and cognitive science, an artificial neural
network (ANN) is a network inspired by biological neural networks (the central nervous
systems of animals, in particular the brain) which is used to estimate or approximate functions
that can depend on a large number of inputs that are generally unknown.

4.2 Background Subtraction


Background subtraction, also known as foreground detection, is a technique in the fields of
image processing and computer vision wherein an image's foreground is extracted for further
processing, such as object recognition. Generally an image's regions of interest are objects (humans,
cars, text etc.) in its foreground. After the image preprocessing stage, object localisation is
required, which may make use of this technique. Background subtraction is a widely used
approach for detecting moving objects in videos from static cameras. The rationale in the
approach is that of detecting the moving objects from the difference between the current frame
and a reference frame, often called the background image or background model. Background
subtraction is mostly done when the image in question is part of a video stream. Background
subtraction provides important cues for numerous applications in computer vision, for example
surveillance tracking or human pose estimation. However, background subtraction is generally
based on a static background hypothesis, which is often not applicable in real environments. With
indoor scenes, reflections or animated images on screens lead to background changes. In the same
way, due to wind, rain or illumination changes brought by weather, static background methods
have difficulties with outdoor scenes.


Fig 4.2 Background Subtraction

A simple algorithm for background subtraction is as follows. After turning on the camera, the initial
two or three frames are captured and their average is stored as a reference. All frames captured after this
are processed by finding the difference with the reference frame. By applying a proper threshold level
the object can be easily detected. The threshold is important because noise due to wind, camera
shake, the minor presence of smaller objects etc. can then be easily neglected.
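
A minimal sketch of this algorithm, assuming OpenCV (cv2) and NumPy are installed; the three-frame reference average, the threshold of 30 and the camera index are illustrative choices:

import cv2
import numpy as np

cap = cv2.VideoCapture(0)                 # open the default camera

# Average the first few frames as the stored reference background
frames = []
for _ in range(3):
    ok, frame = cap.read()
    if ok:
        frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY).astype(np.float32))
background = (sum(frames) / len(frames)).astype(np.uint8)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    diff = cv2.absdiff(gray, background)  # difference with the reference frame
    # The threshold suppresses noise from wind, camera shake or tiny objects
    _, mask = cv2.threshold(diff, 30, 255, cv2.THRESH_BINARY)
    cv2.imshow("foreground", mask)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()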

4.3 Colour Detection


Object detection and segmentation is the most important and challenging fundamental task of
computer vision. It is a critical part of many applications such as image search, scene
understanding, etc. However, it is still an open problem due to the variety and complexity of
object classes and backgrounds. The easiest way to detect and segment an object from an image is
with colour based methods. The object and the background should have a significant colour
difference in order to successfully segment objects using colour based methods.

4.4 Colour Spaces


A range of colours can be created from the primary colours of pigment, and these colours then define
a specific colour space. A colour space, also known as a colour model (or colour system), is an
abstract mathematical model which describes the range of colours as tuples of numbers,
typically as 3 or 4 values or colour components (e.g. RGB). Basically speaking, a colour space is an
elaboration of the coordinate system and sub-space. Each colour in the system is represented by a
single dot. A colour space is a useful method for users to understand the colour capabilities of a
particular digital device or file. It represents what a camera can see, a monitor can display or a
printer can print. There are a variety of colour spaces, such as RGB, HSV, YCbCr and HSI.
4.4.1 RGB
RGB (Red, Green, Blue) describes what kind of light needs to be emitted to produce a given
colour. Light is added together to create form from darkness. RGB stores individual values for
red, green and blue. An RGB colour space can be simply interpreted as "all possible colours"
which can be made from the three colours red, green and blue. In this conception, each pixel of
an image is assigned intensity values in the range 0 to 255 for each of the RGB components. That is to say,
using only these three colours, there can be 16,777,216 colours on the screen through different mixing
ratios. Strictly speaking, RGB is a colour model rather than a colour space; it becomes a specific
colour space only once the exact primaries and transfer characteristics are defined.
4.4.2 HSV

HSV (hue, saturation, value), also known as HSB (hue, saturation, brightness), is often used by
artists because it is often more natural to think about a colour in terms of hue and saturation than
in terms of additive or subtractive colour components. HSV is a transformation of an RGB colour
space, and its components and colorimetry are relative to the RGB colour space from which it
was derived.

Fig 4.3 RGB and HSV colour channel

Fig 4.4 HSV Colour Space

4.4.3 YCbCr
YCbCr, Y′CbCr, or Y Pb/Cb Pr/Cr, also written as YCBCR or Y′CBCR, is a family of colour
spaces used as a part of the colour image pipeline in video and digital photography systems. Y′ is
the luma component and CB and CR are the blue-difference and red-difference chroma
components. Y′ (with prime) is distinguished from Y, which is luminance, meaning that light
intensity is nonlinearly encoded based on gamma corrected RGB primaries. YCbCr colour spaces
are defined by a mathematical coordinate transformation from an associated RGB colour space. If
the underlying RGB colour space is absolute, the YCbCr colour space is an absolute colour space
as well; conversely, if the RGB space is ill-defined, so is YCbCr.

Fig 4.5 YCbCr Colour Space

4.5 Algorithm
The colour detection method to find and track a particular object in a frame is simple, but the output will
be less efficient. A simple algorithm to detect a blue coloured object is as follows:

1. Capture the video frame by frame.
2. Blur the image to reduce noise; a Gaussian blur works best.
3. Apply threshold values separately for each colour channel.
4. As a blue object is the target, the threshold values for the red and green channels will be zero.
5. A threshold is also applied to the blue channel to isolate the targeted object.
6. As output, the frame will consist of all the objects which lie within the threshold values.

This method with RGB is less efficient because the light that reaches the camera changes subtly
all the time. Morning, noon, evening and night provide different blue shades for the same
object. So real time continuous tracking in the RGB colour space is difficult unless the threshold
is adjusted manually. One might think that the BGR colour space is more suitable for colour
based segmentation, but the HSV colour space is the most suitable colour space for colour based
image segmentation. The HSV colour space also consists of three channels: HUE, SATURATION and
VALUE. HUE represents the colour, SATURATION represents the amount to which that
colour is mixed with white, and VALUE represents the amount to which that
colour is mixed with black. The HSV colour space is quite similar to the way in which humans
perceive colour. The other models, except for HSL, define colour in relation to the primary
colours. The colours used in HSV can be clearly defined by human perception, which is not
always the case with RGB or CMYK. So by using threshold values in the three channels we can track
a particular object without considering the camera lighting, up to a certain limit. HSV colour detection is
more efficient than RGB.
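
A sketch of HSV-based blue detection, assuming OpenCV (cv2) and NumPy; the input filename and the hue/saturation/value bounds are illustrative (OpenCV stores hue in the range 0-179):

import cv2
import numpy as np

frame = cv2.imread("scene.jpg")                 # hypothetical input image
blurred = cv2.GaussianBlur(frame, (5, 5), 0)    # reduce noise before thresholding
hsv = cv2.cvtColor(blurred, cv2.COLOR_BGR2HSV)

# Threshold band for blue hues; more tolerant of lighting changes than RGB
lower_blue = np.array([100, 100, 50])
upper_blue = np.array([130, 255, 255])
mask = cv2.inRange(hsv, lower_blue, upper_blue)

# Keep only the pixels that fall inside the blue threshold band
result = cv2.bitwise_and(frame, frame, mask=mask)
cv2.imwrite("blue_objects.png", result)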

4.6 Drawbacks
Background subtraction is the more efficient algorithm for detecting and tracking objects, but it
is only applicable when there is a stable background and the object to be tracked is moving.
Unfortunately, any other object which shows considerable movement in the frame will also get
detected. Camera shake may also lead to faulty detection and erroneous output.
Colour detection methods don't give any real meaning to object tracking, because they detect
and track all the objects in the frame, whether they are the targeted object or not, since colour is the only
factor used. As the intensity of light reaching the earth is different at every instant, and the
colour of light reflected by each object varies accordingly, detecting an object by considering only
a single factor such as colour is less efficient. The frame containing the object will show different
shades of the same colour from morning to night, so detection becomes difficult in real time
video.
Another major factor in object tracking using colour detection is the presence of
other objects which have the same colour. Since shape or any other factor is not specified, the
output frame will contain all the different objects of the same colour, each detected.


Fig 4.6 Noise detection

In the above image all the objects having red colour are detected. In order to detect a specific object,
for example a rectangular sheet, its shape and geometry must be specified along with its colour to get the most
efficient output.

CHAPTER-5

ARTIFICIAL NEURAL NETWORKING (ANN)


Artificial Neural Networks are relatively crude electronic models based on the neural structure of
the brain. The brain basically learns from experience. It is natural proof that some problems that
are beyond the scope of current computers are indeed solvable by small energy efficient packages.
This brain modeling also promises a less technical way to develop machine solutions. This new
approach to computing also provides a more graceful degradation during system overload than its
more traditional counterparts.
These biologically inspired methods of computing are thought to be the next major advancement
in the computing industry. Even simple animal brains are capable of functions that are currently
impossible for computers. Computers do rote things well, like keeping ledgers or performing
complex math. But computers have trouble recognizing even simple patterns much less
generalizing those patterns of the past into actions of the future.

5.1 Biological Neural Network


Now, advances in biological research promise an initial understanding of the natural thinking
mechanism. This research shows that brains store information as patterns. Some of these patterns
are very complicated and allow us the ability to recognize individual faces from many different
angles. This process of storing information as patterns, utilizing those patterns, and then solving
problems encompasses a new field in computing. This field, as mentioned before, does not utilize
traditional programming but involves the creation of massively parallel networks and the training
of those networks to solve specific problems. This field also utilizes words very different from
traditional computing, words like behave, react, self-organize, learn, generalize, and forget.
The exact workings of the human brain are still a mystery. Yet, some aspects of this amazing
processor are known. In particular, the most basic element of the human brain is a specific type of
cell which, unlike the rest of the body, doesn't appear to regenerate. Because this type of cell is the
only part of the body that isn't slowly replaced, it is assumed that these cells are what provides us
with our abilities to remember, think, and apply previous experiences to our every action. These
cells, all 100 billion of them, are known as neurons. Each of these neurons can connect with up to
200,000 other neurons, although 1,000 to 10,000 is typical.
The power of the human mind comes from the sheer numbers of these basic components and the
multiple connections between them. It also comes from genetic programming and learning. The
individual neurons are complicated. They have a myriad of parts, sub-systems, and control
mechanisms. They convey information via a host of electrochemical pathways. There are over one
hundred different classes of neurons, depending on the classification method used. Together these
neurons and their connections form a process which is not binary, not stable, and not synchronous.
In short, it is nothing like the currently available electronic computers, or even artificial neural
networks. These artificial neural networks try to replicate only the most basic elements of this
complicated, versatile, and powerful organism. They do it in a primitive way. But for the software
engineer who is trying to solve problems, neural computing was never about replicating human
brains. It is about machines and a new way to solve problems.
The fundamental processing element of a neural network is a neuron. This building block of
human awareness encompasses a few general capabilities. Basically, a biological neuron receives
inputs from other sources, combines them in some way, performs a generally nonlinear operation
on the result, and then outputs the final result. Figure 5.1 shows the relationship of these four
parts.

Figure 5.1 A Simple Neuron.

Within humans there are many variations on this basic type of neuron, further complicating man's
attempts at electrically replicating the process of thinking. Yet, all natural neurons have the same
four basic components. These components are known by their biological names - dendrites, soma,
axon, and synapses. Dendrites are hair-like extensions of the soma which act like input channels.
These input channels receive their input through the synapses of other neurons. The soma then
processes these incoming signals over time. The soma then turns that processed value into an
output which is sent out to other neurons through the axon and the synapses.
Recent experimental data has provided further evidence that biological neurons are structurally
more complex than the simplistic explanation above. They are significantly more complex than
the existing artificial neurons that are built into today's artificial neural networks. As biology
provides a better understanding of neurons, and as technology advances, network designers can
continue to improve their systems by building upon man's understanding of the biological brain.
But currently, the goal of artificial neural networks is not the grandiose recreation of the brain. On
the contrary, neural network researchers are seeking an understanding of nature's capabilities for
which people can engineer solutions to problems that have not been solved by traditional
computing.

5.2 ANN Structure


In machine learning and cognitive science, an artificial neural network (ANN) is a network
inspired by biological neural networks (the central nervous systems of animals, in particular the
brain) which are used to estimate or approximate functions that can depend on a large number of
inputs that are generally unknown. Artificial neural networks are typically specified using three
things:
5.2.1 Architecture
Specifies what variables are involved in the network and their topological relationships; for
example, the variables involved in a neural network might be the weights of the connections
between the neurons, along with the activities of the neurons.
5.2.2 Activity Rule
Most neural network models have short time-scale dynamics: local rules define how the activities
of the neurons change in response to each other. Typically the activity rule depends on the weights
(the parameters) in the network.

5.2.3 Learning Rule


The learning rule specifies the way in which the neural network's weights change with time. This
learning is usually viewed as taking place on a longer time scale than the time scale of the
dynamics under the activity rule. Usually the learning rule will depend on the activities of the
neurons. It may also depend on the values of the target values supplied by a teacher and on the
current value of the weights.
There is no single formal definition of what an artificial neural network is. However, a class of
statistical models may commonly be called "neural" if it possesses the following characteristics:
1. Contains sets of adaptive weights, i.e. numerical parameters that are tuned by a learning
algorithm.
2. Is capable of approximating non-linear functions of its inputs.
The adaptive weights can be thought of as connection strengths between neurons, which are
activated during training and prediction.
Artificial neural networks are similar to biological neural networks in that functions are performed
collectively and in parallel by their units, rather than through a clear delineation of subtasks to which
individual units are assigned. The term "neural network" usually refers to models employed in
statistics, cognitive psychology and artificial intelligence. Neural network models which
emulate the central nervous system and the rest of the brain are part of theoretical neuroscience
and computational neuroscience.


Figure 5.2 Artificial Neural Network

5.3 An Artificial Neuron


The artificial neuron simulates four basic functions of a biological neuron. Fig. 5.3 shows the basic
representation of an artificial neuron, in which the various inputs to the network are represented by
the mathematical symbol x(n). Each of these inputs is multiplied by a connection weight; the
weights are represented by w(n). In the simplest case, these products are summed, fed to a
transfer function (activation function) to generate a result, and this result is sent as output. This is
also possible with other network structures, which utilize different summing functions as well as
different transfer functions. Some applications, like recognition of text, identification of speech, and
image deciphering of scenes, require binary answers. These applications may utilize the
binary properties of ORing and ANDing of inputs along with summing operations. Such functions
can be built into the summation and transfer functions of a network.
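
As a minimal sketch of this summation-plus-transfer structure (the input values, the weights and the tanh transfer function are illustrative choices):

import numpy as np

def artificial_neuron(inputs, weights, transfer=np.tanh):
    # Summation function: the dot product of inputs x(n) and weights w(n)
    total = np.dot(inputs, weights)
    # Transfer (activation) function shapes the summed value into the output
    return transfer(total)

x = np.array([0.5, -1.0, 2.0])   # inputs x(n)
w = np.array([0.4, 0.6, -0.1])   # connection weights w(n)
print(artificial_neuron(x, w))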


Figure 5.3 Artificial Neuron functioning.

Seven major components make up an artificial neuron. These components are valid whether the
neuron is used for input, output, or is in the hidden layers.

5.3.1 Weighting Factors


A neuron usually receives many simultaneous inputs. Each input has its own relative weight,
which gives the input the impact that it needs on the processing element's summation function.
Some inputs are made more important than others to have a greater effect on the processing
element as they combine to produce a neural response. Weights are adaptive coefficients that
determine the intensity of the input signal as registered by the artificial neuron. They are a
measure of an input's connection strength. These strengths can be modified in response to various
training sets and according to a network's specific topology or its learning rules.
5.3.2 Summation Function
The inputs and corresponding weights are vectors which can be represented as (i1, i2 . . . in) and
(w1, w2 . . . wn). The total input signal is the dot product of these two vectors: the result,
(i1 * w1) + (i2 * w2) + . . . + (in * wn), is a single number. The summation function can be more
complex than just the weighted sum of products. The input and weighting coefficients can be combined
in many different ways before passing on to the transfer function. In addition to summing, the
summation function can select the minimum, maximum, majority, product or several normalizing
algorithms. The specific algorithm for combining neural inputs is determined by the chosen
network architecture and paradigm. Some summation functions have an additional activation
function applied to the result before it is passed on to the transfer function for the purpose of
allowing the summation output to vary with respect to time.
5.3.3 Transfer Function
The result of the summation function is transformed to a working output through an algorithmic
process known as the transfer function. In the transfer function the summation can be compared
with some threshold to determine the neural output. If the sum is greater than the threshold value,
the processing element generates a signal and if it is less than the threshold, no signal (or some
inhibitory signal) is generated. Both types of response are significant. The threshold, or transfer
function, is generally non-linear.
Linear functions are limited because the output is simply proportional to the input. The step type
of transfer function outputs zero and one, one and minus one, or other numeric
combinations. Another type, the threshold or ramping function, can mirror the input within a
given range and still act as a step function outside that range. It is a linear function that is clipped
to minimum and maximum values, making it non-linear. Another option is an S curve, which
approaches a minimum and maximum value at the asymptotes. It is called a sigmoid when it
ranges between 0 and 1, and a hyperbolic tangent when it ranges between -1 and 1. Both the
function and its derivatives are continuous.
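
A short sketch of the transfer function shapes just described; the ramp's clipping range is an illustrative choice:

import numpy as np

def step(x):                    # outputs 0 or 1 around a threshold of 0
    return np.where(x > 0, 1.0, 0.0)

def ramp(x, lo=-1.0, hi=1.0):   # linear inside [lo, hi], clipped outside
    return np.clip(x, lo, hi)

def sigmoid(x):                 # S curve ranging between 0 and 1
    return 1.0 / (1.0 + np.exp(-x))

xs = np.array([-2.0, 0.5, 3.0])
for f in (step, ramp, sigmoid, np.tanh):   # np.tanh ranges between -1 and 1
    print(f.__name__, f(xs))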
5.3.4 Scaling and Limiting
After the transfer function, the result can pass through additional processes, which scale and
limit. This scaling simply multiplies the transfer value by a scale factor and then adds an offset.
Limiting is the mechanism which ensures that the scaled result does not exceed an upper or lower
bound. This limiting is in addition to the hard limits that the original transfer function may have
performed.
5.3.5 Output Function (Competition)
Each processing element is allowed one output signal, which it may give to hundreds of other
neurons. Normally, the output is directly equivalent to the transfer function's result. Some network
topologies modify the transfer result to incorporate competition among neighboring processing
elements. Neurons are allowed to compete with each other, inhibiting processing elements unless
they have great strength. Competition can occur at one or both of two levels. First, competition
determines which artificial neuron will be active or provide an output. Second, competitive
inputs help determine which processing element will participate in the learning or adaptation
process.
5.3.6 Error Function and Back-Propagated Value
In most learning networks the difference between the current output and the desired output is
calculated as an error which is then transformed by the error function to match a particular
network architecture. Most basic architectures use this error directly, but some square the error
while retaining its sign, some cube the error, and other paradigms modify the error to fit their specific
purposes.
The error is propagated backwards to a previous layer. This back-propagated value can be either
the error, the error scaled in some manner (often by the derivative of the transfer function) or
some other desired output depending on the network type. Normally, this back-propagated value,
after being scaled by the learning function, is multiplied against each of the incoming connection
weights to modify them before the next learning cycle.
5.3.7 Learning Function
Its purpose is to modify the weights on the inputs of each processing element according to some
neural based algorithm.

5.4 Training of Artificial Neural Networks


Once a network has been structured for a particular application, it is ready for training. At the
beginning, the initial weights are chosen randomly and then the training or learning begins. There
are two approaches to training: supervised and unsupervised.
5.4.1 Supervised Training
In supervised training, both the inputs and the outputs are provided. The network then processes
the inputs and compares its resulting outputs against the desired outputs. Errors are then
propagated back through the system, causing the system to adjust the weights, which control the
network. This process occurs over and over as the weights are continually tweaked. The set of
data, which enables the training, is called the "training set."

During the training of a network, the same set of data is processed many times, as the connection
weights are ever refined. Sometimes a network may never learn. This could be because the input
data does not contain the specific information from which the desired output is derived. Networks
also don't converge if there is not enough data to enable complete learning. Ideally, there should
be enough data so that part of the data can be held back as a test.
Many layered networks with multiple nodes are capable of memorizing data. To monitor the
network to determine if the system is simply memorizing its data in some non-significant way,
supervised training needs to hold back a set of data to be used to test the system after it has
undergone its training. If a network simply can't solve the problem, the designer then has to
review the input and outputs, the number of layers, the number of elements per layer, the
connections between the layers, the summation, transfer, and training functions, and even the
initial weights themselves. Another part of the designer's creativity governs the rules of training.
There are many laws (algorithms) used to implement the adaptive feedback required to adjust the
weights during training.
The most common technique is known as back-propagation. The training is not just a technique,
but a conscious analysis, to ensure that the network is not over-trained. Initially, an artificial neural
network configures itself with the general statistical trends of the data. Later, it continues to
learn about other aspects of the data, which may be spurious from a general viewpoint. When
finally the system has been correctly trained and no further learning is needed, the weights can, if
desired, be frozen. In some systems, this finalized network is then turned into hardware so that it
can be fast. Other systems don't lock themselves in but continue to learn while in production use.

5.4.2 Unsupervised or Adaptive Training


The other type is the unsupervised training (learning). In this type, the network is provided with
inputs but not with desired outputs. The system itself must then decide what features it will use to
group the input data. This is often referred to as self-organization or adaptation. These networks use
no external influences to adjust their weights. Instead, they internally monitor their performance.
These networks look for regularities or trends in the input signals, and make adaptations
according to the function of the network. Even without being told whether it's right or wrong, the
network still must have some information about how to organize itself. This information is built
into the network topology and learning rules.


An unsupervised learning algorithm might emphasize cooperation among clusters of processing
elements. In such a scheme, the clusters would work together. If some external input activated any
node in the cluster, the cluster's activity as a whole could be increased. Likewise, if external input
to nodes in the cluster was decreased, that could have an inhibitory effect on the entire cluster.
Competition between processing elements could also form a basis for learning.
Training of competitive clusters could amplify the responses of specific groups to specific stimuli.
As such, it would associate those groups with each other and with a specific appropriate response.
Normally, when competition for learning is in effect, only the weights belonging to the winning
processing element will be updated. Presently, the unsupervised learning is not well understood
and there continues to be a lot of research in this aspect.
5.4.3 Learning Rates
The rate at which ANNs learn depends upon several controllable factors. A slower rate means
more time spent in producing an adequately trained system. With faster learning rates,
however, the network may not be able to make the fine discriminations that are possible with a
system that learns slowly. Most learning functions have some provision for a learning rate (learning
constant). Usually this term is positive and between 0 and 1.
If the learning rate is greater than 1, it is easy for the learning algorithm to overshoot in
correcting the weights, and the network will oscillate. Small values of the learning rate will not
correct the current error as quickly, but if small steps are taken in correcting errors, there is a good
chance of arriving at the best minimum convergence.
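
A toy demonstration of this behaviour, using gradient descent on the simple quadratic error E(w) = w^2; the rates and step count are illustrative:

def descend(lr, w=1.0, steps=5):
    history = [w]
    for _ in range(steps):
        w = w - lr * 2 * w      # the gradient of w**2 is 2*w
        history.append(round(w, 3))
    return history

print(descend(0.1))   # small rate: slow, steady approach to the minimum
print(descend(0.9))   # larger rate: oscillates around the minimum, still converging
print(descend(1.1))   # effective step too large: oscillates and diverges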

5.5 Types of Artificial Neural Networks


5.5.1 Single Layer Feed Forward Network
A neural network in which the input layer of source nodes projects onto an output layer of neurons,
but not vice-versa, is known as a single-layer feed-forward or acyclic network. Here "single
layer" refers to the output layer of computation nodes, as shown in Fig. 5.4.


Figure 5.4 Single Layer Feed Forward Network.

5.5.2 Multilayer Feed Forward Network


This type of network (Fig. 5.5) consists of one or more hidden layers, whose computation nodes
are called hidden neurons or hidden units. The function of hidden neurons is to intervene between
the external input and the network output in some useful manner and to extract higher order statistics.
The source nodes in the input layer of the network supply the input signal to the neurons in the second layer
(the 1st hidden layer). The output signals of the 2nd layer are used as inputs to the third layer, and so on.
The set of output signals of the neurons in the output layer of the network constitutes the overall
response of the network to the activation pattern supplied by the source nodes in the input layer.
Short characterization of feed forward networks:
1. Typically, activation is fed forward from input to output through hidden layers, though many
other architectures exist.
2. Mathematically, they implement static input-output mappings.
3. Most popular supervised training algorithm: the back propagation algorithm.
4. Have proven useful in many practical applications as approximators of nonlinear functions and
as pattern classifiers (a minimal forward-pass sketch follows the figure below).


Figure 5.5 Multilayer Feed Forward Network.
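
A minimal forward-pass sketch for such a layered network; the layer sizes, random weights and tanh activation are illustrative assumptions:

import numpy as np

def forward(x, layers, g=np.tanh):
    # Each layer applies a weighted sum plus bias, then the activation
    for W, b in layers:
        x = g(W @ x + b)
    return x

rng = np.random.default_rng(0)
layers = [
    (rng.standard_normal((4, 3)), np.zeros(4)),   # 3 inputs -> 4 hidden neurons
    (rng.standard_normal((2, 4)), np.zeros(2)),   # 4 hidden -> 2 outputs
]
print(forward(np.array([0.2, -0.5, 1.0]), layers))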

5.5.3 Recurrent Network


A feed forward neural network having one or more hidden layers with at least one feedback loop
is known as a recurrent network, as shown in Fig. 5.6. The feedback may be self-feedback, i.e.,
where the output of a neuron is fed back to its own input. Sometimes, feedback loops involve the use of
unit delay elements, which results in nonlinear dynamic behavior, assuming that the neural network
contains nonlinear units. There are various other types of networks, such as delta-bar-delta, Hopfield,
vector quantization, counter propagation, probabilistic, Hamming, Boltzmann, bidirectional
associative memory, spatio-temporal pattern, adaptive resonance, self-organizing map,
recirculation etc.

Figure 5.6 Recurrent Network.


A recurrent neural network has (at least one) cyclic path of synaptic connections. Basic
characteristics:
1. All biological neural networks are recurrent.
2. Mathematically, they implement dynamical systems.
3. Several types of training algorithms are known, with no clear winner.
4. Theoretical and practical difficulties have, by and large, prevented practical applications so far.

5.6 Training an ANN


When training an ANN with a set of input and output data, the weights in the ANN need to be
adjusted to make the ANN give the same outputs as seen in the training data. On the other hand, we
do not want to make the ANN too specific, giving precise results for the training data but
incorrect results for all other data. When this happens, we say that the ANN has been over-fitted.
The training process can be seen as an optimization problem, where the mean square error
over the entire set of training data is minimized. This problem can be solved in many different ways, ranging from
standard optimization heuristics like simulated annealing, through more special optimization
techniques like genetic algorithms, to specialized gradient descent algorithms like back
propagation.
The most used algorithm is the back propagation algorithm, but this algorithm has some
limitations concerning the extent of adjustment to the weights in each iteration. This problem has
been solved in more advanced algorithms like RPROP [Riedmiller and Braun, 1993] and
quickprop [Fahlman, 1988].

5.6.1 The Backpropagation Algorithm


The back propagation algorithm works in much the same way as the name suggests: after
propagating an input through the network, the error is calculated and then propagated back
through the network while the weights are adjusted in order to make the error smaller. When I
explain this algorithm, I will only explain it for fully connected ANNs, but the theory is the same
for sparsely connected ANNs.
Although we want to minimize the mean square error for all the training data, the most efficient
way of doing this with the back propagation algorithm is to train on the data sequentially, one input at
a time, instead of training on the combined data. However, this means that the order in which the data is
given is of importance, but it also provides a very efficient way of avoiding getting stuck in a
local minimum.
First the input is propagated through the ANN to the output. After this, the error $e_k$ on a single
output neuron $k$ can be calculated as:

$e_k = d_k - y_k$

where $y_k$ is the calculated output and $d_k$ is the desired output of neuron $k$. This error value is
used to calculate a $\delta_k$ value, which is again used for adjusting the weights. The $\delta_k$ value is
calculated by:

$\delta_k = e_k \, g'(y_k)$

where $g'$ is the derived activation function; the need to calculate the derived activation function
is why a differentiable activation function is required.

When the $\delta_k$ value is calculated, we can calculate the $\delta_j$ values for the preceding layers. The
$\delta_j$ values of the previous layer are calculated from the $\delta_k$ values of this layer, by the following
equation:

$\delta_j = \eta \, g'(y_j) \sum_{k=0}^{K} \delta_k w_{jk}$

where $K$ is the number of neurons in this layer and $\eta$ is the learning rate parameter, which
determines how much the weight should be adjusted. The more advanced gradient descent
algorithms do not use a learning rate, but a set of more advanced parameters that make a more
qualified guess at how much the weight should be adjusted.

Using these $\delta$ values, the $\Delta w$ values that the weights should be adjusted by can be calculated
as:

$\Delta w_{jk} = \delta_k \, y_j$

The $\Delta w_{jk}$ value is used to adjust the weight $w_{jk}$ by $w_{jk} \leftarrow w_{jk} + \Delta w_{jk}$, and the back
propagation algorithm moves on to the next input, adjusting the weights according to that output.
This process goes on until a certain stop criterion is reached. The stop criterion is typically
determined by measuring the mean square error of the training data while training with the data;
when this mean square error reaches a certain limit, the training is stopped. More advanced
stopping criteria involving both training and testing data are also used.
In this section I have briefly discussed the mathematics of the backpropagation algorithm, but
since this report is mainly concerned with the implementation of ANN algorithms, I have left out
details unnecessary for implementing the algorithm. I will refer to [Hassoun, 1995] and [Hertz et
al., 1991] for more detailed explanation of the theory behind and the mathematics of this
algorithm.
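
A compact sketch of this procedure for a small fully connected network, assuming a tanh activation (whose derivative can be written in terms of its own output); the layer sizes, learning rate and training pair are illustrative, and here the learning rate is applied in the weight update rather than folded into the delta values:

import numpy as np

def g(x):
    return np.tanh(x)

def g_prime(y):
    return 1.0 - y ** 2          # derivative of tanh, expressed via its output

def backprop_step(x, d, W1, W2, lr=0.1):
    # Forward pass
    y1 = g(W1 @ x)               # hidden layer output
    y2 = g(W2 @ y1)              # network output
    # Backward pass: delta_k = e_k * g'(y_k), then propagate to the hidden layer
    delta2 = (d - y2) * g_prime(y2)
    delta1 = g_prime(y1) * (W2.T @ delta2)
    # Adjust each weight by the delta of the downstream neuron times the upstream output
    W2 += lr * np.outer(delta2, y1)
    W1 += lr * np.outer(delta1, x)
    return W1, W2, np.mean((d - y2) ** 2)

rng = np.random.default_rng(1)
W1 = rng.standard_normal((3, 2))          # 2 inputs -> 3 hidden neurons
W2 = rng.standard_normal((1, 3))          # 3 hidden -> 1 output
for _ in range(1000):                     # stop criterion: a fixed iteration budget here
    W1, W2, mse = backprop_step(np.array([0.5, -0.2]), np.array([0.3]), W1, W2)
print(mse)                                # mean square error after training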
5.6.1.1 Running Time of Back propagation
The back propagation algorithm starts by executing the network, involving the amount of work
described earlier, in addition to the actual back propagation.
If the ANN is fully connected, the running time of algorithms on the ANN is dominated by the
operations executed for each connection.
The backpropagation is dominated by the calculation of the $\delta$ values and the adjustment of the
$\Delta w$ values, since these are the only calculations that are executed for each connection. The
calculations executed for each connection when calculating $\delta$ amount to one multiplication and
one addition; when adjusting $\Delta w$ it is also one multiplication and one addition. This means that the
total running time is dominated by two multiplications and two additions (three if you also count
the addition and multiplication used in the forward propagation) per connection. This is only a
small amount of work for each connection, which gives a clue to how important it is for the data
needed in these operations to be easily accessible.

5.7 Applications
5.7.1 General Applications
Many of the networks being designed presently are statistically quite accurate (up to 85% to 90%
accuracy). Currently, neural networks are not yet the user interface that translates spoken words
into instructions for a machine, but some day they will be. VCRs, home security systems,
CD players, and word processors will simply be activated by voice.
Touch screens and voice editing will replace the word processors of today while bringing
spreadsheets and databases to a new level of usability. Neural network design is also progressing in
other, more promising application areas.

5.7.2 Hand Written English Alphabet Recognition System


Two phases are involved in the overall processing of the proposed scheme: the pre-processing
task and the neural network based recognition task. The pre-processing steps handle the
manipulations necessary to prepare the characters for feeding as input to the neural
network system. First, the required character, or part of a character, needs to be extracted from the
pictorial representation. This involves splitting the alphabets into 25-segment grids, scaling the segments so
split to a standard size, and thinning the resultant character segments to obtain skeletal patterns.
The following pre-processing steps may also be required to support the recognition process:
I. The alphabets can be thinned and their skeletons obtained using well-known image processing
techniques, before extracting their binary forms.
II. The scanned documents can be cleaned and smoothed with the help of image processing
techniques for better performance.

Figure 5.7 Hand Written English Alphabet Recognition

Further, this step involves the digitization of the segment grids, because neural networks need their
inputs to be in binary form (0s and 1s). The next phase is to recognize the segments of the
character. This is carried out by neural networks having different network parameters. Each
digitized segment out of the 25-segment grid is then provided as input to a node of the neural
network designed specially for the training of those segments. Once the networks are trained for these
segments, they are able to recognize them. Characters which are similar looking but distinct are also
distinguished at this stage. Results obtained were satisfactory, especially when the input characters
were very close to printed letters.
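
A sketch of this digitization step, assuming a grayscale character bitmap as input; the 5x5 grid, the image size and the ink threshold are illustrative:

import numpy as np

def digitize(char_img, grid=5, threshold=128):
    # Split the character into grid x grid segments and binarize each one,
    # yielding the 25 binary inputs described above
    h, w = char_img.shape
    bits = []
    for i in range(grid):
        for j in range(grid):
            seg = char_img[i * h // grid:(i + 1) * h // grid,
                           j * w // grid:(j + 1) * w // grid]
            bits.append(1 if seg.mean() > threshold else 0)   # ink present?
    return np.array(bits)

img = np.zeros((20, 20))
img[2:18, 8:12] = 255        # a crude vertical stroke as a stand-in character
print(digitize(img))         # 25 binary values, fed as inputs to the network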
5.7.3 Language Processing
These applications include text-to-speech conversion, auditory input for machines, automatic
language translation, secure voice keyed locks, automatic transcription, aids for the deaf, aids for
the physically disabled which respond to voice commands and natural language processing.
5.7.4 Image (data) Compression
Neural networks can do real-time compression and decompression of data. These networks can
reduce eight bits of data to three and then reverse that process upon restructuring to eight bits
again.
5.7.5 Pattern Recognition
Many pattern recognition applications are in use like, a system that can detect bombs in luggage
at airports by identifying from small variances and patterns from within specialized sensor's
outputs, a back-propagation neural network which can discriminate between a true and a false
heart attack, a network which can scan and also read the PAP smears etc. Many automated quality
control applications are now in use, which are based on pattern recognition.
5.7.6 Signal Processing
Neural networks have proven capable of filtering out electronic noise. Another application is a
system that can detect engine misfire simply from the engine sound.
5.7.7 Financial
Banks, credit card companies and lending institutions deal with many decisions that are not
clear-cut. They involve learning and statistical trends. Neural networks are now trained on the data
from past decisions and are being used in decision-making.
5.7.8 Servo Control
A neural system known as Martingale's Parametric Avalanche, a spatio-temporal pattern
recognition network, is being designed to control the shuttle during in-flight maneuvers. Another
application is ALVINN (Autonomous Land Vehicle In a Neural Network).

5.8 Advantages

1. Embedded systems and other devices nowadays perform only a specific task: the program
embedded in such a device knows only how to perform that specific work, and the complete
system becomes faulty if it is used for another task. The scenario is different for ANN
compatible devices. An ANN helps the device to perform almost any task without
reprogramming the entire device; the ANN algorithm helps the device to learn new moves
and actions and repeat them.
2. The algorithm uses a large number of nodes to take accurate decisions. If any node, or any input
from any sensor, fails, the weights get adjusted automatically to reproduce the previous answer.
Thus failure of any part will not affect devices embedded with an ANN algorithm.
3. Even if the system is developed and working in a particular environment, the entire
thing can be relocated anywhere, even where the environment parameters are different. The
system will automatically adapt to the new environment.

5.9 Limitations

1. A device integrated with an ANN is like a newborn baby: it doesn't know anything. The
user or the manufacturer needs to train the device to perform according to the customer's
needs, and whenever a new custom need is found, training must be provided from the beginning.
This is very time consuming.
2. Even though an ANN is adaptive in nature, a complex application requires a highly
complex ANN structure containing a large number of nodes. The teaching and learning process for
such systems requires a large number of training examples so that effective output is obtained.
3. The microcontrollers and microprocessors available today do not support the ANN algorithm
and structure well because of their sequential nature. Only DSP processors with parallel
execution are recommended for an ANN to reach its maximum performance.


CHAPTER-6

CONCLUSION
Porting the human brain's algorithms to a robot for fully effective vision is not possible with the
technology and algorithms available nowadays; only implementation at a basic level is
possible. Colour detection, object detection and background subtraction methods are among them. All of these have
serious drawbacks, as they are not adaptive and they are task specific.
Artificial intelligence is the most modern technology currently in development, and its major component is
the ANN. An ANN helps a device to learn and understand the relation between input and output, so that the same output can
be obtained adaptively in any environment. Back propagation and deep learning algorithms are
the well-known algorithms used for implementing artificial neural networks. Due to the immense
applications of ANNs, their usage and need are nowadays increasing exponentially.


CHAPTER-7

REFERENCE
[1] Christos Stergiou, "Neural Networks", 1996. [Online]. Available:
https://www.doc.ic.ac.uk/~nd/surprise_96/journal/vol4/cs11/report.html
[2] M. Mohan Prasad, "Intelligent Robot Used in the Field of Practical Application of
Artificial Neural Network & Machine Vision", International Journal of Lean Thinking,
Vol. 3, Issue 2, December 2012.
[3] Jitender Singh et al., "Artificial Neural Network", International Journal of Scientific
Research and Education, Vol. 1, Issue 6, pp. 108-118, 2013.
[4] Pijush Chakraborty, "An Approach to Handwriting Recognition using Back-Propagation
Neural Network", International Journal of Computer Applications, Vol. 68, No. 13,
April 2013.

