CHAPTER-1
INTRODUCTION
Visual perception is the ability to interpret the surrounding environment by processing
information that is contained in visible light. The resulting perception is also known as eyesight,
sight, or vision. The various physiological components involved in vision are referred to
collectively as the visual system.
Human vision is a complex process that is not yet completely understood, despite hundreds of
years of study and research. The complex physical process of visualizing something involves the
nearly simultaneous interaction of the eyes and the brain through a network of neurons, receptors,
and other specialized cells. The act of seeing starts when the cornea and then the lens of the eye
focuses an image of its surroundings onto a light-sensitive membrane in the back of the eye,
called the retina. The retina is actually part of the brain that is isolated to serve as a transducer for
the conversion of patterns of light into neuronal signals. The lens of the eye focuses light on the
photoreceptive cells of the retina, also known as the rods and cones, which detect the photons of
light and respond by producing neural impulses. These signals are processed in a hierarchical
fashion by different parts of the brain, from the retina upstream to central ganglia in the brain.
Vision plays an important role in intelligence because the brain processes what we see and makes us act accordingly. Implementing the same concept in a robot, with a camera as the eye and a powerful processor as the brain, leads to artificial intelligence. Robotic vision continues to be developed through different methods for processing, analyzing, and understanding images. All these methods produce information that is translated into decisions for robots. The different processes or algorithms used in robotic vision include object tracking and detection based on colour, movement, shape, etc. But all of these have restrictions and drawbacks, since they depend heavily on camera lighting, which varies from morning to night. So replicating the way the brain identifies and understands objects is the ideal solution for robotic vision and its intelligence. Colour, shape, size, and other properties are the main parameters that define an object, and by providing all of these to a learning process called an Artificial Neural Network (ANN) the goal can be achieved. Artificial Neural Networks are relatively crude electronic models based on the neural structure of the brain. The brain basically learns from experience. This brain modeling also promises a less technical way to develop machine solutions. If the robot learns by itself without human teaching, it gains artificial intelligence, and that is the most challenging part of all.
CHAPTER-2
to overall light and color intensity than changes in the color itself. Colors have three attributes:
brightness, based on luminance and reflectivity; saturation, based on the amount of white present;
and hue, based on color combinations. Sophisticated combinations of these receptor signals are
transduced into chemical and electrical signals, which are sent to the brain for the dynamic
process of color perception.
Depth perception refers to our ability to see the world in three dimensions. With this ability, we
can interact with the physical world by accurately gauging the distance to a given object. While
depth perception is often attributed to binocular vision (vision from two eyes), it also relies
heavily on monocular cues (cues from only one eye) to function properly. These cues range from
the convergence of our eyes and accommodation of the lens to optical flow and motion.
CHAPTER-3
The block diagram below shows the outline of brain functioning when it sees two things, one of which is known and the other unknown.
When the brain observes a new thing, new memory is allocated for the object, just as happened in childhood. Along with the image, the brain saves information such as shape, size, features, and specifications with the object. All this information is stored so that it can be reused as parameters when the observer sees the same thing, or a similar thing, in the future.
The scenario is different when processing a known thing: there the brain uses the stored information to take the right decision, and along with that it modifies or adds new information. All the processing related to visuals is image processing, and it happens in real time. There are also duplicate objects that resemble the original one in shape, size and color, and here the brain can get confused. Confusing situations are handled by another function, which waits for an additional parameter, for example by asking a clarifying question, and confirms whether the object is new or not.
simulate that frame rate, it can look like the wheels spin backwards. What's happening is that the wheel's position is only slightly behind where it was when the last frame was stitched together by your brain.
If you get a car's wheel spinning in sync with the frame rate, it can even look as though the car is gliding rather than rolling.
The motion picture, the scanning of an image for television, and the sequential reproduction of the flickering visual images they produce work, in part, because of an optical phenomenon called the persistence of vision and its psychological partner, the phi phenomenon: the mental bridge that the mind forms to conceptually complete the gaps between the frames or pictures. Persistence of
vision also plays a role in keeping the world from going pitch black every time we blink our eyes.
Whenever light strikes the retina, the brain retains the impression of that light for about a tenth of a second (depending on the brightness of the image) after the source of that light is removed from the eye. This is due to a prolonged chemical reaction. As a result, the eye cannot clearly
distinguish fast changes in light that occur faster than this retention period. The changes either go
unnoticed or they appear to be one continuous picture to the human observer. This fundamental
fact of the way we see has been used to our advantage.
CHAPTER-4
The different methods for ROI detection are the background subtraction method and the colour detection method. Further processing and learning on the ROI region is done with the help of an Artificial Neural Network (ANN). In machine learning and cognitive science, artificial neural networks (ANNs) are networks inspired by biological neural networks (the central nervous systems of animals, in particular the brain) which are used to estimate or approximate functions that can depend on a large number of inputs that are generally unknown.
A simple algorithm for background subtraction is as follows. After turning on the camera, the first two or three frames are captured and their average is stored as the reference. All the frames captured after this are processed by finding their difference with the reference frame. By applying a proper threshold level the object can be easily detected. The threshold is important because noise due to wind, shaking of the camera, the minor presence of smaller objects, etc. can then be easily neglected.
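As an illustration, here is a minimal sketch of this frame-differencing idea in Python, assuming OpenCV (cv2) and NumPy are available; the camera index, number of reference frames and threshold value are illustrative choices rather than values specified in this report.

    import cv2
    import numpy as np

    cap = cv2.VideoCapture(0)                      # open the default camera

    # Average the first few frames to build the reference background
    frames = []
    for _ in range(3):
        ok, frame = cap.read()
        if ok:
            frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY).astype(np.float32))
    reference = (sum(frames) / len(frames)).astype(np.uint8)

    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        diff = cv2.absdiff(gray, reference)        # difference with the reference frame
        _, mask = cv2.threshold(diff, 30, 255, cv2.THRESH_BINARY)   # threshold suppresses minor noise
        cv2.imshow("foreground", mask)
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break

    cap.release()
    cv2.destroyAllWindows()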
understanding, etc. However, it is still an open problem due to the variety and complexity of object classes and backgrounds. The easiest way to detect and segment an object from an image is to use colour-based methods. The object and the background should have a significant colour difference in order to successfully segment objects using colour-based methods.
HSV (hue, saturation, value), also known as HSB (hue, saturation, brightness), is often used by
artists because it is often more natural to think about a colour in terms of hue and saturation than
in terms of additive or subtractive colour components. HSV is a transformation of an RGB colour
space, and its components and colourimetry are relative to the RGB colour space from which it was derived.
4.4.3 YCbCr
YCbCr, Y′CbCr, or Y Pb/Cb Pr/Cr, also written as YCBCR or Y′CBCR, is a family of colour spaces used as a part of the colour image pipeline in video and digital photography systems. Y′ is the luma component and CB and CR are the blue-difference and red-difference chroma components. Y′ (with prime) is distinguished from Y, which is luminance, meaning that light intensity is nonlinearly encoded based on gamma-corrected RGB primaries. YCbCr colour spaces
are defined by a mathematical coordinate transformation from an associated RGB colour space. If
the underlying RGB colour space is absolute, the YCbCr colour space is an absolute colour space
as well; conversely, if the RGB space is ill-defined, so is YCbCr.
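As an illustration, one commonly used form of this transformation (the full-range BT.601 variant used in JPEG) maps gamma-corrected R'G'B' values in the range 0-255 to YCbCr as:

    Y  = 0.299 R' + 0.587 G' + 0.114 B'
    Cb = 128 - 0.168736 R' - 0.331264 G' + 0.5 B'
    Cr = 128 + 0.5 R' - 0.418688 G' - 0.081312 B'

Other variants, such as the limited-range forms used in BT.601 and BT.709 video, use different scale factors and offsets.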
4.5 Algorithm
A colour detection method to find and track a particular object in a frame is simple, but the output will be less efficient. A simple algorithm to detect a blue-coloured object is as follows.
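One possible minimal sketch of such a colour-threshold step, again assuming OpenCV, is shown below; the file name and the channel bounds are illustrative assumptions only.

    import cv2
    import numpy as np

    frame = cv2.imread("scene.jpg")                 # an illustrative input frame
    # OpenCV stores pixels in B, G, R order; keep pixels whose blue channel dominates
    lower = np.array([100, 0, 0])                   # minimum B, G, R values
    upper = np.array([255, 80, 80])                 # maximum B, G, R values
    mask = cv2.inRange(frame, lower, upper)         # 255 where a pixel lies inside the range
    result = cv2.bitwise_and(frame, frame, mask=mask)
    cv2.imwrite("blue_only.jpg", result)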
This method with RGB is less efficient because the light that reaches the camera changes slightly every second. Morning, noon, evening and night provide different blue shades for the same object. So real-time continuous tracking in the RGB colour channels is difficult unless the threshold is adjusted manually. One might think that the BGR colour space is more suitable for colour-based segmentation, but the HSV colour space is the most suitable colour space for colour-based image segmentation. The HSV colour space also consists of 3 channels: HUE, SATURATION and VALUE. HUE represents the colour, SATURATION represents the amount to which that colour is mixed with white, and VALUE represents the amount to which that colour is mixed with black. The HSV colour space is quite similar to the way in which humans perceive colour. The other models, except for HSL, define colour in relation to the primary colours. The colours used in HSV can be clearly defined by human perception, which is not always the case with RGB or CMYK. So by using threshold values in the three channels we can track a particular object without considering the camera lighting, up to a certain limit. HSV colour detection is more efficient than RGB.
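For comparison, a minimal HSV-based sketch of the same idea is given below, again assuming OpenCV; the hue range chosen for blue is an illustrative choice on OpenCV's 0-179 hue scale.

    import cv2
    import numpy as np

    frame = cv2.imread("scene.jpg")                  # an illustrative input frame
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)     # convert from BGR to HSV
    lower = np.array([100, 80, 50])                  # lower bound on hue, saturation, value
    upper = np.array([130, 255, 255])                # upper bound
    mask = cv2.inRange(hsv, lower, upper)            # blue pixels, largely independent of lighting shade
    cv2.imwrite("blue_mask.jpg", mask)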
4.6 Drawbacks
Background subtraction is a more efficient algorithm to detect and track an object, but it is only applicable when there is a stable background and the object to be tracked is moving. Unfortunately, any other object that shows considerable movement in the frame will also get detected. Camera shake may also lead to faulty detection and erroneous output.
Colour detection methods do not give object tracking any real meaning, because they detect and track every object in the frame whether or not it is the targeted object, since colour is the only factor used. As the intensity of light reaching the earth is different at every instant, and the colour of light reflected by each object varies accordingly, detecting an object by considering only a single factor such as colour is less efficient. The frame containing the object will show different shades of the same colour from morning to night, so detection becomes difficult in real time from video.
Another major factor when considering object tracking using colour detection is the presence of other objects which have the same colour. Since shape or any other factor is not specified, the output frame will contain every object of the same colour.
13 | P a g e
In the above image all the objects having red colour are detected. In order to detect a specific object, for example a rectangular sheet, its shape and geometry must be specified along with its colour to get the most efficient output.
CHAPTER-5
with our abilities to remember, think, and apply previous experiences to our every action. These
cells, all 100 billion of them, are known as neurons. Each of these neurons can connect with up to
200,000 other neurons, although 1,000 to 10,000 is typical.
The power of the human mind comes from the sheer numbers of these basic components and the
multiple connections between them. It also comes from genetic programming and learning. The
individual neurons are complicated. They have a myriad of parts, sub-systems, and control
mechanisms. They convey information via a host of electrochemical pathways. There are over one
hundred different classes of neurons, depending on the classification method used. Together these
neurons and their connections form a process which is not binary, not stable, and not synchronous.
In short, it is nothing like the currently available electronic computers, or even artificial neural
networks. These artificial neural networks try to replicate only the most basic elements of this
complicated, versatile, and powerful organism. They do it in a primitive way. But for the software
engineer who is trying to solve problems, neural computing was never about replicating human
brains. It is about machines and a new way to solve problems.
The fundamental processing element of a neural network is a neuron. This building block of
human awareness encompasses a few general capabilities. Basically, a biological neuron receives
inputs from other sources, combines them in some way, performs a generally nonlinear operation
on the result, and then outputs the final result. Figure 2.2.1 shows the relationship of these four
parts.
Within humans there are many variations on this basic type of neuron, further complicating man's
attempts at electrically replicating the process of thinking. Yet, all natural neurons have the same
four basic components. These components are known by their biological names - dendrites, soma,
axon, and synapses. Dendrites are hair-like extensions of the soma which act like input channels.
These input channels receive their input through the synapses of other neurons. The soma then
processes these incoming signals over time. The soma then turns that processed value into an
output which is sent out to other neurons through the axon and the synapses.
Recent experimental data has provided further evidence that biological neurons are structurally
more complex than the simplistic explanation above. They are significantly more complex than
the existing artificial neurons that are built into today's artificial neural networks. As biology
provides a better understanding of neurons, and as technology advances, network designers can
continue to improve their systems by building upon man's understanding of the biological brain.
But currently, the goal of artificial neural networks is not the grandiose recreation of the brain. On
the contrary, neural network researchers are seeking an understanding of nature's capabilities for
which people can engineer solutions to problems that have not been solved by traditional
computing.
- Contains sets of adaptive weights, i.e. numerical parameters that are tuned by a learning algorithm.
- Is capable of approximating non-linear functions of their inputs.
The adaptive weights can be thought of as connection strengths between neurons, which are activated during training and prediction.
Artificial neural networks are similar to biological neural networks in that their units perform functions collectively and in parallel, rather than through a clear delineation of subtasks to which individual units are assigned. The term "neural network" usually refers to models employed in statistics, cognitive psychology and artificial intelligence. Neural network models which model the central nervous system and the rest of the brain are part of theoretical neuroscience and computational neuroscience.
Seven major components make up an artificial neuron. These components are valid whether the
neuron is used for input, output, or is in the hidden layers.
5.3.2 Summation Function
The result, (i1 * w1) + (i2 * w2) + ... + (in * wn), is a single number. The summation function can be more complex than just the weighted sum of products. The input and weighting coefficients can be combined
in many different ways before passing on to the transfer function. In addition to summing, the
summation function can select the minimum, maximum, majority, product or several normalizing
algorithms. The specific algorithm for combining neural inputs is determined by the chosen
network architecture and paradigm. Some summation functions have an additional activation
function applied to the result before it is passed on to the transfer function for the purpose of
allowing the summation output to vary with respect to time.
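A minimal sketch of this weighted-sum combination (the function name and values below are illustrative, not taken from this report):

    def weighted_sum(inputs, weights):
        # (i1 * w1) + (i2 * w2) + ... + (in * wn)
        return sum(i * w for i, w in zip(inputs, weights))

    # e.g. weighted_sum([0.5, 1.0, 0.2], [0.4, -0.3, 0.9]) returns 0.08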
5.3.3 Transfer Function
The result of the summation function is transformed to a working output through an algorithmic
process known as the transfer function. In the transfer function the summation can be compared
with some threshold to determine the neural output. If the sum is greater than the threshold value,
the processing element generates a signal and if it is less than the threshold, no signal (or some
inhibitory signal) is generated. Both types of response are significant. The threshold, or transfer
function, is generally non-linear.
Linear functions are limited because the output is simply proportional to the input. The step type
of transfer function would output zero and one, one and minus one, or other numeric
combinations. Another type, the threshold or ramping function, can mirror the input within a
given range and still act as a step function outside that range. It is a linear function that is clipped
to minimum and maximum values, making it non-linear. Another option is an S curve, which approaches a minimum and maximum value at the asymptotes. It is called a sigmoid when it ranges between 0 and 1, and a hyperbolic tangent when it ranges between -1 and 1. Both the function and its derivatives are continuous.
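Illustrative sketches of the transfer functions mentioned above (the names and default ranges are assumptions for the example):

    import math

    def step(x, threshold=0.0):
        return 1.0 if x > threshold else 0.0          # outputs zero or one

    def ramp(x, lo=-1.0, hi=1.0):
        return max(lo, min(hi, x))                    # linear inside the range, clipped outside

    def sigmoid(x):
        return 1.0 / (1.0 + math.exp(-x))             # S curve between 0 and 1

    def tanh(x):
        return math.tanh(x)                           # S curve between -1 and 1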
5.3.4 Scaling and Limiting
After the transfer function, the result can pass through additional processes, which scale and
limit. This scaling simply multiplies a scale factor times the transfer value and then adds an offset.
Limiting is the mechanism which ensures that the scaled result does not exceed an upper or lower
bound. This limiting is in addition to the hard limits that the original transfer function may have
performed.
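A sketch of this scaling-and-limiting step (parameter names and defaults are illustrative):

    def scale_and_limit(x, scale=1.0, offset=0.0, lower=0.0, upper=1.0):
        y = scale * x + offset                  # multiply by a scale factor, then add an offset
        return min(upper, max(lower, y))        # clamp the result to the allowed bounds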
5.3.5 Output Function (Competition)
Each processing element is allowed one output signal, which it may give to hundreds of other
neurons. Normally, the output is directly equivalent to the transfer function's result. Some network
topologies modify the transfer result to incorporate competition among neighboring processing
elements. Neurons are allowed to compete with each other, inhibiting processing elements unless they have great strength. Competition can occur at one or both levels. First, competition
determines which artificial neuron will be active or provides an output. Second, competitive
inputs help determine which processing element will participate in the learning or adaptation
process.
5.3.6 Error Function and Back-Propagated Value
In most learning networks the difference between the current output and the desired output is
calculated as an error which is then transformed by the error function to match a particular
network architecture. Most basic architectures use this error directly, but some square the error while retaining its sign, some cube the error, and other paradigms modify the error to fit their specific purposes.
The error is propagated backwards to a previous layer. This back-propagated value can be either
the error, the error scaled in some manner (often by the derivative of the transfer function) or
some other desired output depending on the network type. Normally, this back-propagated value,
after being scaled by the learning function, is multiplied against each of the incoming connection
weights to modify them before the next learning cycle.
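For instance, a sketch of the "square the error while retaining its sign" variant mentioned above (illustrative only):

    import math

    def signed_square_error(desired, actual):
        e = desired - actual
        return math.copysign(e * e, e)      # square the magnitude but keep the sign of the error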
5.3.7 Learning Function
Its purpose is to modify the weights on the inputs of each processing element according to some
neural based algorithm.
During the training of a network, the same set of data is processed many times, as the connection
weights are ever refined. Sometimes a network may never learn. This could be because the input
data does not contain the specific information from which the desired output is derived. Networks
also don't converge if there is not enough data to enable complete learning. Ideally, there should
be enough data so that part of the data can be held back as a test.
Many layered networks with multiple nodes are capable of memorizing data. To monitor the
network to determine if the system is simply memorizing its data in some non-significant way,
supervised training needs to hold back a set of data to be used to test the system after it has
undergone its training. If a network simply can't solve the problem, the designer then has to
review the input and outputs, the number of layers, the number of elements per layer, the
connections between the layers, the summation, transfer, and training functions, and even the
initial weights themselves. Another part of the designer's creativity governs the rules of training.
There are many laws (algorithms) used to implement the adaptive feedback required to adjust the
weights during training.
The most common technique is known as back-propagation. The training is not just a technique,
but a conscious analysis, to ensure that the network is not over-trained. Initially, an artificial neural
network configures itself with the general statistical trends of the data. Later, it continues to
learn about other aspects of the data, which may be spurious from a general viewpoint. When
finally the system has been correctly trained and no further learning is needed, the weights can, if
desired, be frozen. In some systems, this finalized network is then turned into hardware so that it
can be fast. Other systems don't lock themselves in but continue to learn while in production use.
given in is of importance, but it also provides a very efficient way of avoiding getting stuck in a
local minima.
First the input is propagated through the ANN to the output. After this the error e_k on a single output neuron k can be calculated as:

e_k = d_k - y_k

where y_k is the calculated output and d_k is the desired output of neuron k. This error value is used to calculate a δ_k value, which is in turn used for adjusting the weights. The δ_k value is calculated by:

δ_k = e_k * g'(y_k)

where g' is the derived activation function (the need for calculating the derived activation function is why a differentiable activation function is required). When the δ_k value has been calculated, the δ_j values of the preceding layer can be calculated from the δ_k values of this layer by the following equation:

δ_j = η * g'(y_j) * Σ(k = 0..K) δ_k * w_jk

where K is the number of neurons in this layer and η is the learning rate parameter, which determines how much the weight should be adjusted. The more advanced gradient descent algorithms do not use a learning rate, but a set of more advanced parameters that make a more qualified guess as to how much the weight should be adjusted.

Using these δ values, the Δw values that the weights should be adjusted by can be calculated by:

Δw_jk = δ_j * y_k

The Δw_jk value is used to adjust the weight w_jk, by w_jk = w_jk + Δw_jk, and the back-propagation algorithm moves on to the next input and adjusts the weights according to the output.
This process goes on until a certain stopping criterion is reached. The stopping criterion is typically determined by measuring the mean square error of the training data while training with the data;
when this mean square error reaches a certain limit, the training is stopped. More advanced
stopping criteria involving both training and testing data are also used.
In this section I have briefly discussed the mathematics of the backpropagation algorithm, but
since this report is mainly concerned with the implementation of ANN algorithms, I have left out
details unnecessary for implementing the algorithm. I will refer to [Hassoun, 1995] and [Hertz et
al., 1991] for more detailed explanation of the theory behind and the mathematics of this
algorithm.
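To make these update rules concrete, the following is a simplified, self-contained sketch of one back-propagation step for a tiny 2-2-1 network, written in the standard textbook form; the network size, the learning rate value and the sigmoid activation are illustrative assumptions rather than details taken from this report.

    import math
    import random

    def g(x):                                    # sigmoid activation function
        return 1.0 / (1.0 + math.exp(-x))

    def g_prime(y):                              # its derivative, written in terms of the output y = g(x)
        return y * (1.0 - y)

    eta = 0.7                                                               # learning rate
    w_ih = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(2)]    # input -> hidden weights
    w_ho = [random.uniform(-1, 1) for _ in range(2)]                        # hidden -> output weights

    def train_step(x, d):
        # forward propagation
        h = [g(sum(x[i] * w_ih[i][j] for i in range(2))) for j in range(2)]
        y = g(sum(h[j] * w_ho[j] for j in range(2)))
        # error and delta on the output neuron
        delta_o = (d - y) * g_prime(y)
        # deltas for the hidden layer, propagated back through the hidden -> output weights
        delta_h = [g_prime(h[j]) * delta_o * w_ho[j] for j in range(2)]
        # adjust each weight by learning rate * downstream delta * upstream output
        for j in range(2):
            w_ho[j] += eta * delta_o * h[j]
            for i in range(2):
                w_ih[i][j] += eta * delta_h[j] * x[i]
        return y

    # one illustrative update on a single training example; repeating such steps over a
    # data set until the mean square error is small enough constitutes training
    output_before_update = train_step([0.0, 1.0], 1.0)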
5.6.1.1 Running Time of Back propagation
The back propagation algorithm starts by executing the network, involving the amount of work
described in addition to the actual back propagation.
If the ANN is fully connected, the running time of algorithms on the ANN is dominated by the
operations executed for each connection. The back-propagation is dominated by the calculation of the δ_j values and the adjustment of the w_jk weights, since these are the only calculations that are executed for each connection. The calculations executed for each connection when calculating δ_j are one multiplication and one addition, and when adjusting w_jk it is also one multiplication and one addition. This means that the
total running time is dominated by two multiplications and two additions (three if you also count
the addition and multiplication used in the forward propagation) per connection. This is only a
small amount of work for each connection, which gives a clue to how important it is, for the data
needed in these operations to be easily accessible.
5.7 Applications
5.7.1 General Applications
Many of the networks being designed at present are statistically quite accurate (up to 85% to 90% accuracy). Currently, neural networks are not yet the user interface that translates spoken words into instructions for a machine, but some day this will be achieved. VCRs, home security systems, CD players, and word processors will then simply be activated by voice.
Touch screens and voice editing will replace the word processors of today, while bringing spreadsheets and databases to a new level of usability. Neural network design is progressing in other
more promising application areas.
5.7.2 Character Recognition
A further step involves the digitization of the segment grids, because the neural networks need their inputs in binary form (0s and 1s). The next phase is to recognize the segments of the character. This is carried out by neural networks having different network parameters. Each digitized segment out of the 25-segment grid is then provided as input to a node of the neural network designed specially for training on that segment. Once the networks are trained for these segments, they are able to recognize them. Characters which are similar looking but distinct are also distinguished at this time. The results obtained were satisfactory, especially when the input characters were very close to printed letters.
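A minimal sketch of the digitization step described above (the 5x5 grid size and the threshold are illustrative assumptions):

    def digitize(grid, threshold=128):
        # grid: 5x5 list of grayscale values (0-255); returns a flat list of 25 binary inputs
        return [1 if cell >= threshold else 0 for row in grid for cell in row]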
5.7.3 Language Processing
These applications include text-to-speech conversion, auditory input for machines, automatic
language translation, secure voice keyed locks, automatic transcription, aids for the deaf, aids for
the physically disabled which respond to voice commands and natural language processing.
5.7.4 Image (data) Compression
Neural networks can do real-time compression and decompression of data. These networks can
reduce eight bits of data to three and then reverse that process upon restructuring to eight bits
again.
5.7.5 Pattern Recognition
Many pattern recognition applications are in use, like a system that can detect bombs in luggage at airports by identifying small variances and patterns from within specialized sensors'
outputs, a back-propagation neural network which can discriminate between a true and a false
heart attack, a network which can scan and also read the PAP smears etc. Many automated quality
control applications are now in use, which are based on pattern recognition.
5.7.6 Signal Processing
Neural networks have proven capable of filtering out electronic noise. Another application is a
system that can detect engine misfire simply from the engine sound.
5.7.7 Financial
Banks, credit card companies and lending institutions deal with many decisions that are not clear-cut. They involve learning and statistical trends. Neural networks are now trained on the data
from past decisions and being used in decision-making.
5.7.8 Servo Control
A neural system known as Martingale's Parametric Avalanche, a spatio-temporal pattern recognition network, is being designed to control the shuttle during in-flight maneuvers. Another application is ALVINN (Autonomous Land Vehicle In a Neural Network).
5.8 Advantages
Embedded systems and other devices nowadays perform only a specific task. The program embedded in such a device knows only how to perform that specific work, and the complete system fails if it is used for another purpose. The scenario is different for ANN-compatible devices. An ANN helps the device to perform almost any task without reprogramming the entire device; the ANN algorithm helps the device to learn new behaviour. Thus the failure of any single part will not affect devices embedded with an ANN algorithm.
Even if the system is developed and working in a particular environment, the entire thing can be relocated anywhere, even where the environment parameters are different; the system will automatically adapt to them.
5.9 Limitations
A device that is integrated with an ANN is like a newborn baby: it does not know anything. The user or the manufacturer needs to train the device to perform according to the customer's needs. Whenever a new custom need is found, training has to be provided from the beginning, and such systems require a large amount of training before effective output is obtained.
The microcontrollers and microprocessors available today do not support the ANN algorithm and structure well because of their sequential nature. Only DSP processors with parallel execution are recommended for an ANN to reach its maximum performance.
CHAPTER-6
CONCLUSION
Porting the human brain's algorithms to a robot for effective vision is not possible with the technology and algorithms available today. Nevertheless, implementation at a basic level is possible; colour detection, object detection and background subtraction methods are among them. All of these have serious drawbacks, as they are not adaptive and they are task-specific.
Artificial intelligence is the most modern technology currently under development, and its major component is the ANN. An ANN helps a device to learn and understand its inputs and outputs, so that the same output can be obtained adaptively in any environment. Back-propagation and deep learning algorithms are the well-known algorithms used for implementing artificial neural networks. Due to the immense applications of ANNs, their usage and the need for them are increasing exponentially.
CHAPTER-7
REFERENCES
[1]
[2]
[3]
[4]
Pijush Chakraborty, "An Approach to Handwriting Recognition using Back-Propagation Neural Network," International Journal of Computer Applications, vol. 68, no. 13, April 2013.