AI Introduction

Artificial intelligence (AI) is a branch of computer science that aims to build machines that can think and act intelligently like humans. Some key applications of AI include game playing, speech recognition, natural language processing, computer vision, expert systems, and heuristic classification. Early work in AI involved programming computers to solve complex problems and demonstrate intelligent behavior.


Overview of Artificial Intelligence

What is AI?

Artificial Intelligence (AI) is a branch of science which deals with helping machines
find solutions to complex problems in a more human-like fashion.
This generally involves borrowing characteristics from human intelligence, and
applying them as algorithms in a computer-friendly way.
A more or less flexible or efficient approach can be taken depending on the
requirements, which influences how artificial the intelligent behaviour appears.
Artificial intelligence can be viewed from a variety of perspectives.
From the perspective of intelligence
artificial intelligence is making machines "intelligent" -- acting as we would
expect people to act.
o A computer passes the Turing test if its responses cannot be distinguished
from human responses.
o Intelligence requires knowledge
o Expert problem solving - restricting the domain so that significant
relevant knowledge can be included
From a business perspective AI is a set of very powerful tools, and
methodologies for using those tools to solve business problems.
From a programming perspective, AI includes the study of symbolic
programming, problem solving, and search.
o Typically AI programs focus on symbols rather than numeric
processing.
o Problem solving - achieve goals.
o Search - seldom access a solution directly. Search may include a
variety of techniques.
o AI programming languages include:
– LISP, developed in the 1950s, is the earliest programming language
strongly associated with AI. LISP is a functional programming language with
procedural extensions. LISP (LISt Processor) was specifically designed for
processing heterogeneous lists -- typically lists of symbols. Features of LISP
are run-time type checking, higher-order functions (functions that take other
functions as parameters), automatic memory management (garbage collection)
and an interactive environment.
– The second language strongly associated with AI is PROLOG.
PROLOG was developed in the 1970s. PROLOG is based on first order logic.
PROLOG is declarative in nature and has facilities for explicitly limiting the
search space.
– Object-oriented languages are a class of languages more recently used
for AI programming. Important features of object-oriented languages include:
the concepts of objects and messages; objects bundle data and methods for
manipulating the data; the sender specifies what is to be done while the
receiver decides how to do it; and inheritance (an object hierarchy where
objects inherit the attributes of the more general class of objects). Examples
of object-oriented languages are Smalltalk, Objective-C and C++. Object-oriented
extensions to LISP (CLOS - the Common LISP Object System) and PROLOG
(L&O - Logic & Objects) are also used.
Other common characterisations of AI:
Artificial Intelligence is a new kind of electronic machine that stores large amounts
of information and processes it at very high speed.
The computer is interrogated by a human via a teletype; it passes if the human cannot
tell whether there is a computer or a human at the other end.
The ability to solve problems.
It is the science and engineering of making intelligent machines, especially intelligent
computer programs. It is related to the similar task of using computers to understand
human intelligence.
Importance of AI
Game Playing
You can buy machines that can play master level chess for a few hundred dollars.
There is some AI in them, but they play well against people mainly through brute
force computation--looking at hundreds of thousands of positions. To beat a world
champion by brute force and known reliable heuristics requires being able to look at
200 million positions per second.

Speech Recognition
In the 1990s, computer speech recognition reached a practical level for limited
purposes. Thus United Airlines has replaced its keyboard tree for flight information
by a system using speech recognition of flight numbers and city names. It is quite
convenient. On the other hand, while it is possible to instruct some computers using
speech, most users have gone back to the keyboard and the mouse as still more
convenient.

Understanding Natural Language


Just getting a sequence of words into a computer is not enough. Parsing sentences is
not enough either. The computer has to be provided with an understanding of the
domain the text is about, and this is presently possible only for very limited domains.

Computer Vision
The world is composed of three-dimensional objects, but the inputs to the human eye
and computers' TV cameras are two dimensional. Some useful programs can work
solely in two dimensions, but full computer vision requires partial three-dimensional
information that is not just a set of two-dimensional views. At present there are only
limited ways of representing three-dimensional information directly, and they are not
as good as what humans evidently use.

Expert Systems
A ``knowledge engineer'' interviews experts in a certain domain and tries to embody
their knowledge in a computer program for carrying out some task. How well this
works depends on whether the intellectual mechanisms required for the task are
within the present state of AI. When this turned out not to be so, there were many
disappointing results. One of the first expert systems was MYCIN in 1974, which
diagnosed bacterial infections of the blood and suggested treatments. It did better than
medical students or practicing doctors, provided its limitations were observed.
Namely, its ontology included bacteria, symptoms, and treatments and did not include
patients, doctors, hospitals, death, recovery, and events occurring in time. Its
interactions depended on a single patient being considered. Since the experts
consulted by the knowledge engineers knew about patients, doctors, death, recovery,
etc., it is clear that the knowledge engineers forced what the experts told them into a
predetermined framework. The usefulness of current expert systems depends on their
users having common sense.
Heuristic Classification
One of the most feasible kinds of expert system given the present knowledge of AI is
to put some information in one of a fixed set of categories using several sources of
information. An example is advising whether to accept a proposed credit card
purchase. Information is available about the owner of the credit card, his record of
payment and also about the item he is buying and about the establishment from which
he is buying it (e.g., about whether there have been previous credit card frauds at this
establishment).
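The credit-card example above can be sketched as a small rule-based classifier that combines several sources of information into a score. The features, weights and threshold below are invented purely for illustration; a real system would learn them from data.

```python
# A hedged sketch of heuristic classification: several information sources
# vote a case into one of a fixed set of categories. All values are invented.
def classify_purchase(owner_good_record, amount_typical, merchant_fraud_history):
    score = 0
    score += 2 if owner_good_record else -2      # cardholder's payment record
    score += 1 if amount_typical else -1          # plausibility of the item/amount
    score -= 3 if merchant_fraud_history else 0   # establishment's fraud history
    return "accept" if score > 0 else "reject"

print(classify_purchase(True, True, False))   # → accept
print(classify_purchase(True, True, True))    # → reject
```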

The applications of AI are shown in Fig 1.1:


Consumer Marketing
o Have you ever used any kind of credit/ATM/store card while shopping?
o If so, you have very likely been “input” to an AI algorithm
o All of this information is recorded digitally
o Companies like Nielsen gather this information weekly and search for
patterns
– general changes in consumer behavior
– tracking responses to new products
– identifying customer segments: targeted marketing, e.g., they find
out that consumers with sports cars who buy textbooks respond
well to offers of new credit cards.
o Algorithms (“data mining”) search data for patterns based on mathematical
theories of learning
Identification Technologies
o ID cards, e.g., ATM cards
o can be a nuisance and security risk: cards can be lost, stolen, passwords
forgotten, etc.
o Biometric Identification, walk up to a locked door
– Camera
– Fingerprint device
– Microphone
– Computer uses biometric signature for identification
– Face, eyes, fingerprints, voice pattern
– This works by comparing data from person at door with stored
library
– Learning algorithms can learn the matching process by analyzing a
large library database off-line, can improve its performance.
Intrusion Detection
o Computer security - we each have specific patterns of computer use times
of day, lengths of sessions, command used, sequence of commands, etc
– would like to learn the “signature” of each authorized user
– can identify non-authorized users
o How can the program automatically identify users?
– record user’s commands and time intervals
– characterize the patterns for each user
– model the variability in these patterns
– classify (online) any new user by similarity to stored patterns
Machine Translation
o Language problems in international business
– e.g., at a meeting of Japanese, Korean, Vietnamese and Swedish
investors, no common language
– If you are shipping your software manuals to 127 countries, the
solution is to hire translators to translate them
– it would be much cheaper if a machine could do this!
o How hard is automated translation?
– very difficult!
– e.g., English to Russian
– not only must the words be translated, but their meaning also!
Fig 1.1: Application areas of AI

Early work in AI

“Artificial Intelligence (AI) is the part of computer science concerned with designing
intelligent computer systems, that is, systems that exhibit characteristics we associate
with intelligence in human behaviour – understanding language, learning, reasoning,
solving problems, and so on.”
Scientific Goal To determine which ideas about knowledge representation, learning,
rule systems, search, and so on, explain various sorts of real intelligence.
Engineering Goal To solve real world problems using AI techniques such as
knowledge representation, learning, rule systems, search, and so on.
Traditionally, computer scientists and engineers have been more interested in the
engineering goal, while psychologists, philosophers and cognitive scientists have been
more interested in the scientific goal.
The Roots - Artificial Intelligence has identifiable roots in a number of older
disciplines, particularly:
Philosophy
Logic/Mathematics
Computation
Psychology/Cognitive Science
Biology/Neuroscience
Evolution
There is inevitably much overlap, e.g. between philosophy and logic, or between
mathematics and computation. By looking at each of these in turn, we can gain a
better understanding of their role in AI, and how these underlying disciplines have
developed to play that role.
Philosophy
~400 BC Socrates asks for an algorithm to distinguish piety from non-piety.
~350 BC Aristotle formulated different styles of deductive reasoning, which
could mechanically generate conclusions from initial premises, e.g. Modus Ponens:

If A ⇒ B and A, then B
(if A implies B, and A is true, then B is true). For example: "when it's raining
you get wet" and "it's raining", therefore "you get wet".
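The mechanical generation of conclusions from premises can be sketched as forward chaining with Modus Ponens. The rule and fact names below are hypothetical, chosen to match the rain example:

```python
# A minimal sketch of forward chaining with Modus Ponens:
# whenever we know (A -> B) and A, conclude B; repeat until nothing changes.
def forward_chain(facts, rules):
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in rules:
            if premise in facts and conclusion not in facts:
                facts.add(conclusion)   # Modus Ponens step
                changed = True
    return facts

rules = [("raining", "wet"), ("wet", "need_towel")]
print(sorted(forward_chain({"raining"}, rules)))  # → ['need_towel', 'raining', 'wet']
```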

1596 – 1650 Rene Descartes idea of mind-body dualism – part of the mind is
exempt from physical laws.
1646 – 1716 Wilhelm Leibnitz was one of the first to take the materialist position
which holds that the mind operates by ordinary physical processes – this has the
implication that mental processes can potentially be carried out by machines.
Logic/Mathematics
Earl Stanhope’s Logic Demonstrator was a machine that was able to solve
syllogisms, numerical problems in a logical form, and elementary questions of
probability.
1815 – 1864 George Boole introduced his formal language for making logical
inference in 1847 – Boolean algebra.
1848 – 1925 Gottlob Frege produced a logic that is essentially the first-order
logic that today forms the most basic knowledge representation system.
1906 – 1978 Kurt Gödel showed in 1931 that there are limits to what logic can
do. His Incompleteness Theorem showed that in any formal logic powerful
enough to describe the properties of natural numbers, there are true statements
whose truth cannot be established by any algorithm.
1995 Roger Penrose tries to prove the human mind has non-computable
capabilities.
Computation
1869 William Jevons’ Logic Machine could handle Boolean Algebra and Venn
Diagrams, and was able to solve logical problems faster than human beings.
1912 – 1954 Alan Turing tried to characterise exactly which functions are
capable of being computed. Unfortunately it is difficult to give the notion of
computation a formal definition. However, the Church-Turing thesis, which states
that a Turing machine is capable of computing any computable function, is
generally accepted as providing a sufficient definition. Turing also showed that
there were some functions which no Turing machine can compute (e.g. Halting
Problem).
1903 – 1957 John von Neumann proposed the von Neuman architecture which
allows a description of computation that is independent of the particular
realisation of the computer.
1960s Two important concepts emerged: Intractability (when solution time
grows at least exponentially) and Reduction (to ‘easier’ problems).
Psychology / Cognitive Science
Modern Psychology / Cognitive Psychology / Cognitive Science is the science
which studies how the mind operates, how we behave, and how our brains process
information.
Language is an important part of human intelligence. Much of the early work on
knowledge representation was tied to language and informed by research into
linguistics.
It is natural for us to try to use our understanding of how human (and other
animal) brains lead to intelligent behavior in our quest to build artificial intelligent
systems. Conversely, it makes sense to explore the properties of artificial systems
(computer models/simulations) to test our hypotheses concerning human systems.
Many sub-fields of AI are simultaneously building models of how the human
system operates, and artificial systems for solving real world problems, and are
allowing useful ideas to transfer between them.
Biology / Neuroscience
Our brains (which give rise to our intelligence) are made up of tens of billions of
neurons, each connected to hundreds or thousands of other neurons.
Each neuron is a simple processing device (e.g. just firing or not firing depending
on the total amount of activity feeding into it). However, large networks of
neurons are extremely powerful computational devices that can learn how best to
operate.
The field of Connectionism or Neural Networks attempts to build artificial
systems based on simplified networks of simplified artificial neurons.
The aim is to build powerful AI systems, as well as models of various human
abilities.
Neural networks work at a sub-symbolic level, whereas much of conscious human
reasoning appears to operate at a symbolic level.
Artificial neural networks perform well at many simple tasks, and provide good
models of many human abilities. However, there are many tasks that they are not
so good at, and other approaches seem more promising in those areas.
Evolution
One advantage humans have over current machines/computers is that they have a
long evolutionary history.
Charles Darwin (1809 – 1882) is famous for his work on evolution by natural
selection. The idea is that fitter individuals will naturally tend to live longer and
produce more children, and hence after many generations a population will
automatically emerge with good innate properties.
This has resulted in brains that have much structure, or even knowledge, built in at
birth.
This gives them an advantage over simple artificial neural network systems
that have to learn everything.
Computers are finally becoming powerful enough that we can simulate evolution
and evolve good AI systems.
We can now even evolve systems (e.g. neural networks) so that they are good at
learning.
A related field called genetic programming has had some success in evolving
programs, rather than programming them by hand.
Sub-fields of Artificial Intelligence
Neural Networks – e.g. brain modelling, time series prediction, classification
Evolutionary Computation – e.g. genetic algorithms, genetic programming
Vision – e.g. object recognition, image understanding
Robotics – e.g. intelligent control, autonomous exploration
Expert Systems – e.g. decision support systems, teaching systems
Speech Processing – e.g. speech recognition and production
Natural Language Processing – e.g. machine translation
Planning – e.g. scheduling, game playing
Machine Learning – e.g. decision tree learning, version space learning
Speech Processing
As well as trying to understand human systems, there are also numerous real
world applications: speech recognition for dictation systems and voice activated
control; speech production for automated announcements and computer interfaces.
How do we get from sound waves to text streams and vice-versa?

Natural Language Processing


For example, machine understanding and translation of simple sentences.
Planning
Planning refers to the process of choosing/computing the correct sequence of steps
to solve a given problem.
To do this we need some convenient representation of the problem domain. We
can define states in some formal language, such as a subset of predicate logic, or a
series of rules.
A plan can then be seen as a sequence of operations that transform the initial state
into the goal state, i.e. the problem solution. Typically we will use some kind of
search algorithm to find a good plan.
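The planning-as-search idea above can be sketched with a breadth-first search for a sequence of operators that transforms the initial state into the goal state. The toy domain (a counter with "inc" and "double" operators) is invented for illustration:

```python
# A tiny sketch of planning as state-space search: a plan is the sequence
# of operator names found by breadth-first search from initial to goal state.
from collections import deque

def plan(initial, goal, operators):
    frontier = deque([(initial, [])])
    visited = {initial}
    while frontier:
        state, steps = frontier.popleft()
        if state == goal:          # goal state reached: steps form the plan
            return steps
        for name, op in operators.items():
            nxt = op(state)
            if nxt not in visited:
                visited.add(nxt)
                frontier.append((nxt, steps + [name]))
    return None                    # no plan exists

ops = {"inc": lambda s: s + 1, "double": lambda s: s * 2}
print(plan(0, 4, ops))  # → ['inc', 'inc', 'double']
```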
Common Techniques
Even apparently radically different AI systems (such as rule based expert systems
and neural networks) have many common techniques.
Four important ones are:
o Knowledge Representation: Knowledge needs to be represented
somehow – perhaps as a series of if-then rules, as a frame based system, as
a semantic network, or in the connection weights of an artificial neural
network.
o Learning: Automatically building up knowledge from the environment –
such as acquiring the rules for a rule based expert system, or determining
the appropriate connection weights in an artificial neural network.
o Rule Systems: These could be explicitly built into an expert system by a
knowledge engineer, or implicit in the connection weights learnt by a
neural network.
o Search: This can take many forms – perhaps searching for a sequence of
states that leads quickly to a problem solution, or searching for a good set
of connection weights for a neural network by minimizing a fitness
function.

AI and related fields

Logical AI
What a program knows about the world in general, the facts of the specific situation in
which it must act, and its goals are all represented by sentences of some mathematical
logical language. The program decides what to do by inferring that certain actions are
appropriate for achieving its goals.

Search
AI programs often examine large numbers of possibilities, e.g. moves in a chess game
or inferences by a theorem proving program. Discoveries are continually made about
how to do this more efficiently in various domains.

Pattern Recognition
When a program makes observations of some kind, it is often programmed to
compare what it sees with a pattern. For example, a vision program may try to match
a pattern of eyes and a nose in a scene in order to find a face. More complex patterns,
e.g. in a natural language text, in a chess position, or in the history of some event are
also studied.
Representation
Facts about the world have to be represented in some way. Usually languages of
mathematical logic are used.

Inference
From some facts, others can be inferred. Mathematical logical deduction is adequate
for some purposes, but new methods of non-monotonic inference have been added to
logic since the 1970s. The simplest kind of non-monotonic reasoning is default
reasoning in which a conclusion is to be inferred by default, but the conclusion can be
withdrawn if there is evidence to the contrary. For example, when we hear of a bird,
we may infer that it can fly, but this conclusion can be reversed when we hear that it
is a penguin. It is the possibility that a conclusion may have to be withdrawn that
constitutes the non-monotonic character of the reasoning. Ordinary logical reasoning
is monotonic in that the set of conclusions that can be drawn from a set of premises is
a monotonically increasing function of the premises.
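The bird/penguin default can be sketched directly: a conclusion drawn by default is withdrawn once contrary evidence is added to the fact set. The predicate names are illustrative only:

```python
# A minimal sketch of default (non-monotonic) reasoning: birds fly by
# default, but the conclusion is withdrawn when we learn it is a penguin.
def can_fly(facts):
    if "bird" in facts and "penguin" not in facts:
        return True    # inferred by default
    return False       # default blocked by contrary evidence (or not a bird)

print(can_fly({"bird"}))             # → True  (default conclusion)
print(can_fly({"bird", "penguin"}))  # → False (conclusion withdrawn)
```

Note how adding a fact shrinks the conclusion set, which is exactly what monotonic logic forbids.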

Common sense knowledge and reasoning


This is the area in which AI is farthest from human-level, in spite of the fact that it has
been an active research area since the 1950s. While there has been considerable
progress, e.g. in developing systems of non-monotonic reasoning and theories of
action, yet more new ideas are needed.

Learning from experience


Programs can learn from experience. The approaches to AI based on connectionism
and neural nets specialize in this. There is also learning of laws expressed in logic.
Programs can only learn what facts or behaviors their formalisms can represent, and
unfortunately learning systems are almost all based on very limited abilities to
represent information.

Planning
Planning programs start with general facts about the world (especially facts about the
effects of actions), facts about the particular situation and a statement of a goal. From
these, they generate a strategy for achieving the goal. In the most common cases, the
strategy is just a sequence of actions.

Epistemology
This is a study of the kinds of knowledge that are required for solving problems in the
world.

Ontology
Ontology is the study of the kinds of things that exist. In AI, the programs and
sentences deal with various kinds of objects, and we study what these kinds are and
what their basic properties are. Emphasis on ontology begins in the 1990s.

Heuristics
A heuristic is a way of trying to discover something, or an idea embedded in a
program. The term is used variously in AI. Heuristic functions are used in some
approaches to search to measure how far a node in a search tree seems to be from a
goal. Heuristic predicates that compare two nodes in a search tree to see if one is
better than the other, i.e. constitutes an advance toward the goal, may be more useful.

Genetic Programming
Genetic programming is a technique for getting programs to solve a task by mating
random Lisp programs and selecting the fittest over millions of generations.

Search and Control Strategies:

Problem solving is an important aspect of Artificial Intelligence. A problem can be
considered to consist of a goal and a set of actions that can be taken to lead to the goal. At
any given time, we consider the state of the search space to represent where we have reached
as a result of the actions we have applied so far. For example, consider the problem of
looking for a contact lens on a football field. The initial state is how we start out, which is to
say we know that the lens is somewhere on the field, but we don’t know where. If we use the
representation where we examine the field in units of one square foot, then our first action
might be to examine the square in the top-left corner of the field. If we do not find the lens
there, we could consider the state now to be that we have examined the top-left square and
have not found the lens. After a number of actions, the state might be that we have examined
500 squares, and we have now just found the lens in the last square we examined. This is a
goal state because it satisfies the goal that we had of finding a contact lens.

Search is a method that can be used by computers to examine a problem space like
this in order to find a goal. Often, we want to find the goal as quickly as possible or without
using too many resources. A problem space can also be considered to be a search space
because in order to solve the problem, we will search the space for a goal state. We will
continue to use the term search space to describe this concept. In this chapter, we will look at
a number of methods for examining a search space. These methods are called search
methods.
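The contact-lens example above can be sketched as the simplest possible search method: an exhaustive scan of the grid of states, stopping when the goal test succeeds. The field size and lens position are invented for illustration:

```python
# A sketch of the contact-lens problem as exhaustive search: each square of
# the field is a state, and the goal test checks whether the lens is there.
def find_lens(field_width, field_height, lens_at):
    examined = 0
    for row in range(field_height):
        for col in range(field_width):
            examined += 1
            if (row, col) == lens_at:   # goal test on the current state
                return examined          # squares examined to reach the goal
    return None                          # search space exhausted, no goal

print(find_lens(10, 10, lens_at=(4, 9)))  # → 50
```

Smarter search methods, covered later, aim to reach a goal state while examining far fewer states than this brute-force scan.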

The Importance of Search in AI


It has already become clear that many of the tasks underlying AI can be
phrased in terms of a search for the solution to the problem at hand.
Many goal based agents are essentially problem solving agents which must
decide what to do by searching for a sequence of actions that lead to their
solutions.
For production systems, we have seen the need to search for a sequence of rule
applications that lead to the required fact or action.
For neural network systems, we need to search for the set of connection
weights that will result in the required input to output mapping.
Which search algorithm one should use will generally depend on the problem
domain. There are four important factors to consider:
Completeness – Is a solution guaranteed to be found if at least one solution
exists?
Optimality – Is the solution found guaranteed to be the best (or lowest cost)
solution if there exists more than one solution?
Time Complexity – The upper bound on the time required to find a solution,
as a function of the complexity of the problem.
Space Complexity – The upper bound on the storage space (memory) required
at any point during the search, as a function of the complexity of the problem.
Preliminary concepts

Two varieties of space-for-time algorithms:


Input enhancement — preprocess the input (or its part) to store some info to
be used later in solving the problem
o Counting for sorting
o String searching algorithms
Prestructuring — preprocess the input to make accessing its elements easier
o Hashing
UNIT I FUNDAMENTALS OF ANN

Fundamentals of ANN – Biological Neurons and Their Artificial Models – Types of ANN
– Properties – Different Learning Rules – Types of Activation Functions – Training of
ANN – Perceptron Model (Both Single & Multi-Layer) – Training Algorithm – Problems
Solving Using Learning Rules and Algorithms – Linear Separability Limitation and Its
Overcoming
1. FUNDAMENTALS OF ANN

Neural computing is an information processing paradigm, inspired by biological
nervous systems, composed of a large number of highly interconnected processing elements
(neurons) working in unison to solve specific problems.

Artificial neural networks (ANNs), like people, learn by example. An ANN is
configured for a specific application, such as pattern recognition or data classification,
through a learning process. Learning in biological systems involves adjustments to the
synaptic connections that exist between the neurons. This is true of ANNs as well.

1.1 THE BIOLOGICAL NEURON

The human brain consists of a large number, more than a billion, of neural cells that
process information. Each cell works like a simple processor. Only the massive interaction
between all cells and their parallel processing makes the brain’s abilities possible. Figure 1
represents a human biological nervous unit. The various parts of the biological neural
network (BNN) are marked in Figure 1.

Figure 1: Biological Neural Network

Dendrites are branching fibres that extend from the cell body or soma.

Soma or cell body of a neuron contains the nucleus and other structures, support
chemical processing and production of neurotransmitters.

Axon is a single fibre that carries information away from the soma to the synaptic sites
of other neurons (dendrites and somas), muscles, or glands.

Axon hillock is the site of summation for incoming information. At any moment, the
collective influence of all neurons that conduct impulses to a given neuron will determine
whether or not an action potential will be initiated at the axon hillock and propagated along
the axon.

Myelin sheath consists of fat-containing cells that insulate the axon from electrical
activity. This insulation acts to increase the rate of transmission of signals. A gap exists
between each myelin sheath cell along the axon. Since fat inhibits the propagation of electricity,
the signals jump from one gap to the next.

Nodes of Ranvier are the gaps (about 1 μm) between myelin sheath cells. Since fat
serves as a good insulator, the myelin sheaths speed the rate of transmission of an electrical
impulse along the axon.

Synapse is the point of connection between two neurons or a neuron and a muscle or a
gland. Electrochemical communication between neurons take place at these junctions.

Terminal buttons of a neuron are the small knobs at the end of an axon that release
chemicals called neurotransmitters.

Information flow in a neural cell

The input/output and the propagation of information are shown below.

1.2 ARTIFICIAL NEURON MODEL

An artificial neuron is a mathematical function conceived as a simple model of a real


(biological) neuron.

• The McCulloch-Pitts Neuron

This is a simplified model of real neurons, known as a Threshold Logic Unit.
• A set of input connections brings in activations from other neurons.
• A processing unit sums the inputs, and then applies a non-linear activation function
(i.e. squashing/transfer/threshold function).
• An output line transmits the result to other neurons.
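The threshold logic unit described above can be sketched in a few lines. The weights and threshold below are illustrative; with weights (1, 1) and threshold 2 the unit happens to compute logical AND:

```python
# A minimal McCulloch-Pitts threshold logic unit: sum the weighted inputs,
# then apply a hard threshold activation.
def tlu(inputs, weights, threshold):
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total >= threshold else 0

# With weights (1, 1) and threshold 2 the unit computes logical AND.
print(tlu([1, 1], [1, 1], 2))  # → 1 (fires)
print(tlu([1, 0], [1, 1], 2))  # → 0 (does not fire)
```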

1.2.1 Basic Elements of ANN

A neuron consists of three basic components: weights, thresholds and a single activation
function. An artificial neural network (ANN) model based on biological neural systems is
shown in Figure 2.

Figure 2: Basic Elements of Artificial Neural Network

1.3 DIFFERENT LEARNING RULES

A brief classification of Different Learning algorithms is depicted in figure 3.

• Training: It is the process in which the network is taught to change its
weights and biases.
• Learning: It is the internal process of training by which the artificial neural
system learns to update/adapt the weights and biases.

The different training/learning procedures available in ANN are:

• Supervised learning
• Unsupervised learning
• Reinforced learning
• Hebbian learning
• Gradient descent learning
• Competitive learning
• Stochastic learning

1.3.1 Requirements of Learning Laws

• A learning law should lead to convergence of weights.
• Learning or training time should be small for capturing the information from
the training pairs.
• Learning should use local information.
• The learning process should be able to capture the complex non-linear mapping
between the input & output pairs.
• Learning should be able to capture as many patterns as possible.
• Storage of the pattern information gathered at the time of learning should be
high for the given network.

Figure 3: Different Training methods of Artificial Neural Network

1.3.1.1 Supervised learning

Every input pattern that is used to train the network is associated with an output pattern,
which is the target or the desired pattern.

A teacher is assumed to be present during the training process, when a comparison is
made between the network’s computed output and the correct expected output, to determine
the error. The error can then be used to change network parameters, which results in an
improvement in performance.

1.3.1.2 Unsupervised learning

In this learning method the target output is not presented to the network. It is as if there
is no teacher to present the desired patterns and hence the system learns on its own by
discovering and adapting to structural features in the input patterns.

1.3.1.3 Reinforced learning

In this method, a teacher, though available, does not present the expected answer but
only indicates whether the computed output is correct or incorrect. The information
provided helps the network in the learning process.

1.3.1.4 Hebbian learning

This rule was proposed by Hebb and is based on correlative weight adjustment. It is the oldest learning mechanism inspired by biology. In this rule, the input-output pattern pairs (𝑥𝑖 , 𝑦𝑖 ) are associated by the weight matrix W, known as the correlation matrix.

It is computed as

W = ∑𝑛𝑖=1 𝑥𝑖 𝑦𝑖 𝑇 ----------- eq (1)

Here 𝑦𝑖 𝑇 is the transpose of the associated output vector 𝑦𝑖 . Numerous variants of the rule have been proposed.
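Eq (1) can be sketched as a sum of outer products over the training pairs. The bipolar pattern pairs below are made up for illustration:

```python
# Hebbian (correlation) learning: W = sum over i of x_i * y_i^T, as in eq (1).
# The bipolar training pairs here are hypothetical example data.

def hebbian_weights(pairs):
    """Accumulate the outer products x * y^T over all training pairs."""
    n_in = len(pairs[0][0])
    n_out = len(pairs[0][1])
    W = [[0.0] * n_out for _ in range(n_in)]
    for x, y in pairs:
        for i in range(n_in):
            for j in range(n_out):
                W[i][j] += x[i] * y[j]   # outer-product accumulation
    return W

pairs = [([1, -1, 1], [1, -1]),
         ([-1, 1, 1], [-1, 1])]
W = hebbian_weights(pairs)
print(W)   # -> [[2.0, -2.0], [-2.0, 2.0], [0.0, 0.0]]
```

Each entry W[i][j] effectively counts how often input component i and output component j agree, minus how often they disagree, across the training pairs.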

1.3.1.5 Gradient descent learning

This is based on the minimization of the error E, defined in terms of the weights and activation function of the network. It is also required that the activation function employed by the network be differentiable, as the weight update depends on the gradient of the error E.

Thus if ∆𝑤𝑖𝑗 is the weight update of the link connecting the 𝑖𝑡ℎ and 𝑗𝑡ℎ neurons of the two neighbouring layers, then ∆𝑤𝑖𝑗 is defined as

∆𝑤𝑖𝑗 = −ɳ (𝜕𝐸/𝜕𝑤𝑖𝑗) ----------- eq (2)

where ɳ is the learning rate parameter and 𝜕𝐸/𝜕𝑤𝑖𝑗 is the error gradient with reference to the weight 𝑤𝑖𝑗 . The negative sign moves the weights against the gradient, i.e. downhill on the error surface.
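A minimal sketch of eq (2) for a single logistic neuron with squared error E = ½(t − y)², assuming a slope parameter a = 1; the training data and learning rate below are made up for illustration:

```python
import math

# Gradient-descent weight update for one sigmoid neuron with
# squared error E = (t - y)^2 / 2. Data and eta are hypothetical.

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train_step(w, b, x, t, eta=0.5):
    """Apply delta_w = -eta * dE/dw to each weight (and the bias)."""
    net = sum(wi * xi for wi, xi in zip(w, x)) + b
    y = sigmoid(net)
    # For the logistic function, dE/dw_i = -(t - y) * y * (1 - y) * x_i,
    # so -eta * dE/dw_i = eta * (t - y) * y * (1 - y) * x_i.
    delta = (t - y) * y * (1.0 - y)
    w = [wi + eta * delta * xi for wi, xi in zip(w, x)]
    b = b + eta * delta
    return w, b, 0.5 * (t - y) ** 2

w, b = [0.2, -0.1], 0.0
for _ in range(200):
    w, b, err = train_step(w, b, x=[1.0, 0.5], t=1.0)
print(round(err, 4))   # the error shrinks toward 0
```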

1.3.1.6 Competitive learning

In this method, only those neurons which respond strongly to the input stimuli have their weights updated.

When an input pattern is presented, all neurons in the layer compete and the winning neuron undergoes weight adjustment. Hence it is a winner-takes-all strategy.
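The winner-takes-all update can be sketched as follows; the input pattern, initial weight vectors and learning rate are hypothetical:

```python
# Competitive (winner-takes-all) learning sketch: only the neuron whose
# weight vector is closest to the input has its weights moved toward it.
# All numeric values here are made up for illustration.

def competitive_step(weights, x, eta=0.3):
    # Pick the winning neuron: smallest squared distance to the input.
    dists = [sum((wi - xi) ** 2 for wi, xi in zip(w, x)) for w in weights]
    winner = dists.index(min(dists))
    # Only the winner's weights are adjusted, toward the input pattern.
    weights[winner] = [wi + eta * (xi - wi)
                       for wi, xi in zip(weights[winner], x)]
    return winner

weights = [[0.0, 0.0], [1.0, 1.0]]
win = competitive_step(weights, [0.9, 0.8])
print(win, weights)   # neuron 1 wins; only its weights move
```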

1.3.1.7 Stochastic learning

In this method, weights are adjusted in a probabilistic fashion. An example is simulated annealing, the learning mechanism employed by Boltzmann and Cauchy machines, which are a kind of neural network system.
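A sketch of stochastic learning in the simulated-annealing style: a random weight change is always kept when it lowers the error, and kept with probability exp(−ΔE/T) otherwise, with the temperature T lowered over time. The one-dimensional error surface and the schedule below are purely illustrative:

```python
import math
import random

# Simulated-annealing-style stochastic weight adjustment on a toy
# 1-D error surface with its minimum at w = 2 (made-up example).

def error(w):
    return (w - 2.0) ** 2

random.seed(0)
w, T = -5.0, 2.0
for _ in range(500):
    candidate = w + random.uniform(-0.5, 0.5)   # random weight perturbation
    dE = error(candidate) - error(w)
    if dE < 0 or random.random() < math.exp(-dE / T):
        w = candidate            # accept the move (possibly uphill)
    T = max(0.01, T * 0.99)      # cooling schedule
print(round(w, 1))               # settles near the minimum at 2
```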

1.3.2 Different Learning Rules

1. Hebb’s Learning Law
2. Perceptron Learning Law
3. Delta Learning Law
4. Widrow and Hoff LMS Learning Law
5. Correlation Learning Law
6. Instar (Winner-take-all) Learning Law
7. Outstar Learning Law

The different learning laws or rules, with their features, are given in Table 1 below.

Table 1: Different learning laws with their weight details and learning type

1.4 TYPES OF ACTIVATION FUNCTIONS

Common activation functions used in ANN are listed below

1.4.1 Identity Function

f(x) = x, for all x ----------- eq (3)

Figure 4: Identity function

Linear functions are the simplest form of activation function; refer to figure 4. Here f(x) is just an identity function, usually used in simple networks. It collects the input and produces an output proportional to the given input. It is better than a step function because it gives a range of output values, not just True or False.

1.4.2. Binary Step Function (with threshold ) (aka Heaviside Function or Threshold
Function)

f(x) = 1 if x ≥ θ; 0 if x < θ ----------- eq (4)

Figure 4: Binary step function

The binary step function is shown in figure 4. It is also called the Heaviside function; in some literature it is known as the threshold function. Equation 4 gives the output of this function.

1.4.3. Binary Sigmoid

This is also known as the logistic function. Its graphical representation is provided in figure 5, and equation 5 gives the output values for this function.

F(x) = 1/(1 + e−ax) ----------- eq (5)

Figure 5: Binary sigmoidal function

1.4.4. Bipolar Sigmoid

This function is closely related to the hyperbolic tangent (tanh) function. It is a bounded function whose values lie in the range (-1, +1). It is a shifted version of the binary sigmoid function, and it is non-linear. Equation 6 represents this type of function, and its pictorial representation is given in figure 6.

F(x) = (1 − e−ax)/(1 + e−ax) ----------- eq (6)

It is preferred over the binary sigmoid function because its output is zero-centred.

Figure 6: Bipolar sigmoidal function
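The four activation functions above can be written directly from eqs (3)-(6); the slope parameter a = 1 is assumed where unspecified:

```python
import math

# The four activation functions of section 1.4, eqs (3)-(6).

def identity(x):                    # eq (3): f(x) = x
    return x

def binary_step(x, theta=0.0):      # eq (4): 1 if x >= theta else 0
    return 1 if x >= theta else 0

def binary_sigmoid(x, a=1.0):       # eq (5): 1/(1 + e^(-ax)), range (0, 1)
    return 1.0 / (1.0 + math.exp(-a * x))

def bipolar_sigmoid(x, a=1.0):      # eq (6): (1 - e^(-ax))/(1 + e^(-ax)), range (-1, 1)
    return (1.0 - math.exp(-a * x)) / (1.0 + math.exp(-a * x))

print(identity(0.5), binary_step(0.5))     # -> 0.5 1
print(binary_sigmoid(0.0))                 # -> 0.5
print(bipolar_sigmoid(0.0))                # -> 0.0
```

Note that the bipolar sigmoid with slope a equals tanh(ax/2), which is why it is often discussed alongside tanh.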

1.5 PERCEPTRON MODEL
1.5.1 Simple Perceptron for Pattern Classification

A perceptron network is capable of performing pattern classification into two or more categories. The perceptron is trained using the perceptron learning rule. We will first consider classification into two categories, and the general multiclass classification later. For classification into only two categories, all we need is a single output neuron; here we will use bipolar neurons. The simplest architecture that can do the job consists of a layer of N input neurons, an output layer with a single output neuron, and no hidden layers. This is the same architecture as we saw before for Hebb learning. However, we will use a different transfer function for the output neuron, as given below in eq (7). Figure 7 represents a single-layer perceptron network.

----------- eq (7)

Figure 7: Single Layer Perceptron

Equation 7 gives the bipolar activation function, which is the most common function used in perceptron networks. The inputs arising from the problem space are collected by the sensors and fed to the association units. Association units are responsible for associating the inputs based on their similarities; this unit groups the similar inputs, hence the name association unit.

A single input from each group is given to the summing unit. Weights are randomly fixed initially and assigned to these inputs. The net value is calculated using the expression

x = Σ wiai − θ ----------- eq (8)

This value is given to the activation function unit to get the final output response. The actual output is compared with the target (desired) output. If they are the same, training can stop; otherwise there is an error and the weights have to be updated. The error is given as δ = b − s, where b is the desired/target output and s is the actual outcome of the machine. The weights are then updated based on the perceptron learning law as given in equation 9.

The weight change is given as Δw = η δ ai, so the new weight is given as

Wi (new) = Wi (old) + change in weight vector (Δw) ----------- eq (9)

1.5.2 Perceptron Algorithm

• Step 1: Initialize weights and bias. For simplicity, set the weights and bias to zero. Set the learning rate in the range of zero to one.
• Step 2: While the stopping condition is false, do steps 3-7
• Step 3: For each training pair s:t, do steps 4-7
• Step 4: Set activations of the input units: xi = ai
• Step 5: Calculate the summing part value: Net = Σ aiwi − θ
• Step 6: Compute the response of the output unit based on the activation function
• Step 7: Update the weights and bias if an error occurred for this pattern (if y is not equal to t):
wi(new) = wi(old) + αtxi, and b(new) = b(old) + αt
Else wi(new) = wi(old) and b(new) = b(old)
• Step 8: Test the stopping condition
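The steps above can be sketched for the two-input AND gate with bipolar inputs and targets; θ is folded into the bias, and the learning rate is taken as 1:

```python
# Perceptron training for the bipolar AND gate, following steps 1-8.
# Theta is absorbed into the bias b; learning rate eta = 1 assumed.

def step(net):
    return 1 if net >= 0 else -1   # bipolar activation, eq (7)

# Bipolar AND truth table: output is 1 only when both inputs are 1.
data = [([1, 1], 1), ([1, -1], -1), ([-1, 1], -1), ([-1, -1], -1)]
w, b, eta = [0.0, 0.0], 0.0, 1.0

changed = True
while changed:                      # stop when a full pass makes no update
    changed = False
    for x, t in data:
        y = step(sum(wi * xi for wi, xi in zip(w, x)) + b)
        if y != t:                  # error: w_new = w_old + eta * t * x
            w = [wi + eta * t * xi for wi, xi in zip(w, x)]
            b = b + eta * t
            changed = True

print(w, b)   # -> [1.0, 1.0] -1.0, a line separating (1,1) from the rest
```

The learned decision boundary x1 + x2 − 1 = 0 is exactly one of the separating lines the text describes for linearly separable problems.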

1.5.3 Limitations of Single Layer Perceptrons

• Uses only a binary activation function
• Can be used only for linear problems
• Since it uses supervised learning, an optimal solution is provided
• Training time is more
• Cannot solve linearly inseparable problems

1.5.4 Multi-Layer Perceptron Model
Figure 8 is the general representation of a multi-layer perceptron network. In between the input and output layers there are one or more additional layers, known as hidden layers.

Figure 8: Multi-Layer Perceptron

1.5.5 Multi-Layer Perceptron Algorithm


1. Initialize the weights (Wi) and bias (b0) to small random values near zero.
2. Set the learning rate η (or α) in the range of “0” to “1”.
3. Check the stopping condition. While the stopping condition is false, do steps 4 to 8.
4. For each training pair, do steps 5 to 8.
5. Set activations of the input units: xi = si for i = 1 to N.
6. Calculate the output response:
yin = b0 + Σ xiwi
7. Apply the activation function (bipolar sigmoid or bipolar step function).
For multi-layer networks, steps 6 and 7 are repeated for each layer, based on the number of layers.
8. If the target (ti) is not equal to the actual output (y), update the weights and bias based on the perceptron learning law:
Wi (new) = Wi (old) + change in weight vector
Change in weight vector = ηtixi
Where η = learning rate
ti = target output of the ith unit
xi = ith input vector
b0(new) = b0 (old) + change in bias
Change in bias = ηti
Else Wi (new) = Wi (old)
b0(new) = b0 (old)
9. Test the stopping condition.
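The layer-by-layer evaluation of steps 6-7 can be sketched as a forward pass through one hidden layer with bipolar sigmoid activations; all weight and bias values below are made up for illustration:

```python
import math

# Forward pass of a small multi-layer perceptron: one hidden layer,
# bipolar sigmoid activations, hypothetical weights and biases.

def bipolar_sigmoid(x, a=1.0):
    return (1.0 - math.exp(-a * x)) / (1.0 + math.exp(-a * x))

def layer(inputs, weights, biases):
    # weights[j][i] connects input i to neuron j of this layer
    return [bipolar_sigmoid(b + sum(w * xi for w, xi in zip(ws, inputs)))
            for ws, b in zip(weights, biases)]

x = [1.0, -1.0]
hidden = layer(x, weights=[[0.5, -0.5], [0.3, 0.3]], biases=[0.0, 0.1])
out = layer(hidden, weights=[[1.0, -1.0]], biases=[0.2])
print([round(h, 3) for h in hidden], round(out[0], 3))
```

Steps 6-7 are applied once per layer: the hidden activations become the inputs to the output layer, which is exactly the repetition the algorithm describes.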

1.6 LINEARLY SEPARABLE & LINEARLY INSEPARABLE TASKS

Figure 9: Representation of linearly separable & linearly inseparable tasks

Perceptrons are successful only on problems with a linearly separable solution space. Figure 9 represents both a linearly separable and a linearly inseparable problem. Perceptrons cannot handle, in particular, tasks which are not linearly separable (known as the linear inseparability problem). Sets of points in two-dimensional space are linearly separable if the sets can be separated by a straight line. Generalizing, a set of points in n-dimensional space is linearly separable if it can be separated by a hyperplane, as represented in Figure 9.

A single-layer perceptron can be used for linear separation, for example the AND gate. But it cannot be used for non-linear, inseparable problems such as the XOR gate. Consider figure 10.

Figure 10: XOR representation (Linear-in separable Task)

Here a single decision line cannot separate the zeros and ones; at least two lines are required, as shown in Figure 10. Hence single-layer networks cannot be used to solve inseparable problems. To overcome this we create convex regions.

Convex regions can be created by the multiple decision lines arising from multi-layer networks. Hence we go for a multi-layer network, thereby creating convex regions which solve the inseparable problem.
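The two decision lines of Figure 10 can be realized by two hidden units whose outputs are combined by a third unit, so the network fires only inside the convex region between the lines. This is the standard hand-crafted XOR construction with fixed weights, not a trained network:

```python
# XOR via a hand-crafted two-layer network: the hidden units implement
# the two decision lines, and the output unit fires only between them.
# Weights and thresholds are a textbook construction, not learned.

def step(net):
    return 1 if net >= 0 else 0

def xor_net(x1, x2):
    h1 = step(x1 + x2 - 0.5)     # first decision line  (x1 OR x2)
    h2 = step(-x1 - x2 + 1.5)    # second decision line (x1 NAND x2)
    return step(h1 + h2 - 1.5)   # fire only inside the convex region

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", xor_net(x1, x2))   # 0 0->0, 0 1->1, 1 0->1, 1 1->0
```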

1.6.1 Convex Region

Select any two points in a region and draw a straight line between them. If, for every such pair, both the points and the line joining them lie inside the region, then that region is known as a convex region.

1.6.2 Types of convex regions

(a) Open Convex region (b) Closed Convex region

Figure 11: Open convex region

Figure 12A: Circle - closed convex region
Figure 12B: Triangle - closed convex region

Figure 5. General Regression Neural Network

2.6. APPLICATIONS OF ANN

The major applications are (1) character recognition systems and (2) texture recognition systems, etc.

2.6.1. Character Recognition System

Consider the characters given in figure 6. The objective is to recognise a particular alphabet, say ‘A’ in this example. Using image analysis models, the particular alphabet is segmented and converted into intensity, gray-scale or pixel values. The general work flow is shown in Figure 7. The first procedure is segmentation: the process of subdividing the image into sub-blocks. The alphabet “A” is isolated using appropriate segmentation procedures such as thresholding, region growing or edge-detector-based algorithms.

Figure 6. Input to Character recognition system

Figure 7. Work flow diagram for character recognition system

After implementing the segmentation procedure, we obtain an output as shown in figure 8a. This image pattern then has to be converted into binary values. The pattern is divided into different rows and columns as per the system resolution, as shown in figure 8b. For each square box a value in the range of zero to 255 is provided if gray scale is used. These values represent whether part of the required object is present inside the square box or not. These binary or gray-scale values are taken as input for further processing.
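The conversion described above can be sketched by thresholding a grid of gray values into binary cell indicators; the grid values and the threshold below are made up for illustration:

```python
# Pattern-to-binary conversion sketch: each cell of the character grid
# holds a gray value in 0..255, and thresholding marks whether part of
# the character falls inside that cell. Grid and threshold are made up.

gray = [
    [  0, 200,   0],
    [180,   0, 190],
    [210, 220, 230],
    [240,   0, 250],
]

THRESHOLD = 128   # assumed cut-off between background and character

binary = [[1 if v >= THRESHOLD else 0 for v in row] for row in gray]
features = [v for row in binary for v in row]   # flattened exemplar vector
print(features)   # -> [0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0, 1]
```

The flattened vector is the exemplar that is fed to the neural network, as described below.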

Figure 8a. Character pattern values
Figure 8b. Character pattern conversion into intensity

Figures 6, 7 and 8 are adapted from Praveen Kumar et al. (2012), “Character Recognition using Neural Network”, IJST, vol. 3, issue 2, pp. 978-981.

From figure 8b, texture features, shape features and/or boundary features can be extracted. These feature values are known as exemplars, which form the actual input to the neural network. Consider any neural network: the input is the feature table created as explained in the above process, which is shown in Figure 9. This table is provided as input to the neural system.

Figure 9: Character ‘a’ is segmented and binary values extracted from it

Figure 9 is adopted from Yusuf Perwej et al. (2011), “Neural Networks for Handwritten English Alphabet Recognition”, International Journal of Computer Applications (0975-8887), Vol. 20, No. 7, April 2011.

Figure 10 shows the full implementation using a multi-layer neural network.

Figure 10. ANN implementation of character recognition system

Figure 10 is adopted from Anita Pal et al. (2010), “Handwritten English Character Recognition Using Neural Network”, International Journal of Computer Science & Communication, Vol. 1, No. 2, July-December 2010, pp. 141-144.

If the feature sets match between the trained and current input features, the output produces “1”, denoting that the particular alphabet is recognised; else it produces “0”, not recognised.

Note: A similar procedure is used for the texture classification application.

REFERENCE BOOKS

1. B. Yegnanarayana, “Artificial Neural Networks”, Prentice Hall.

2. Simon Haykin, “Neural Networks: A Comprehensive Foundation”, Second Edition, Pearson Education.

3. Laurene Fausett, “Fundamentals of Neural Networks: Architectures, Algorithms and Applications”, Prentice Hall.

4. James A. Freeman & David M. Skapura, “Neural Networks: Algorithms, Applications, and Programming Techniques”, Pearson Education.

************************** ALL THE BEST ******************************

