AI Introduction
What is AI?
Artificial Intelligence (AI) is a branch of science which deals with helping machines
find solutions to complex problems in a more human-like fashion.
This generally involves borrowing characteristics from human intelligence and
applying them as algorithms in a computer-friendly way.
A more or less flexible or efficient approach can be taken depending on the
requirements established, which influences how artificial the intelligent behaviour
appears.
Artificial intelligence can be viewed from a variety of perspectives.
From the perspective of intelligence
artificial intelligence is making machines "intelligent" -- acting as we would
expect people to act.
o The inability to distinguish computer responses from human responses
is called the Turing test.
o Intelligence requires knowledge
o Expert problem solving - restricting the domain to allow the inclusion of
significant relevant knowledge
From a business perspective AI is a set of very powerful tools, and
methodologies for using those tools to solve business problems.
From a programming perspective, AI includes the study of symbolic
programming, problem solving, and search.
o Typically AI programs focus on symbolic rather than numeric
processing.
o Problem solving - achieve goals.
o Search - a solution is seldom accessed directly; search may include a
variety of techniques.
o AI programming languages include:
– LISP, developed in the 1950s, is the early programming language
strongly associated with AI. LISP is a functional programming language with
procedural extensions. LISP (LISt Processor) was specifically designed for
processing heterogeneous lists -- typically a list of symbols. Features of LISP
are run-time type checking, higher-order functions (functions that take other
functions as parameters), automatic memory management (garbage collection),
and an interactive environment.
– The second language strongly associated with AI is PROLOG.
PROLOG was developed in the 1970s. PROLOG is based on first order logic.
PROLOG is declarative in nature and has facilities for explicitly limiting the
search space.
– Object-oriented languages are a class of languages more recently used
for AI programming. Important features of object-oriented languages include:
the concepts of objects and messages; objects bundle data and the methods for
manipulating that data; the sender specifies what is to be done, the receiver
decides how to do it; inheritance (an object hierarchy where objects inherit the
attributes of the more general class of objects). Examples of object-oriented
languages are Smalltalk, Objective-C, and C++. Object-oriented extensions to
LISP (CLOS - Common LISP Object System) and PROLOG (L&O - Logic & Objects) are
also used.
Artificial Intelligence is a new electronic machine that stores large amounts of
information and processes it at very high speed.
The computer is interrogated by a human via a teletype. It passes if the human cannot
tell whether there is a computer or a human at the other end.
The ability to solve problems
It is the science and engineering of making intelligent machines, especially intelligent
computer programs. It is related to the similar task of using computers to understand
human intelligence.
Importance of AI
Game Playing
You can buy machines that can play master level chess for a few hundred dollars.
There is some AI in them, but they play well against people mainly through brute
force computation--looking at hundreds of thousands of positions. To beat a world
champion by brute force and known reliable heuristics requires being able to look at
200 million positions per second.
Speech Recognition
In the 1990s, computer speech recognition reached a practical level for limited
purposes. Thus United Airlines has replaced its keyboard tree for flight information
by a system using speech recognition of flight numbers and city names. It is quite
convenient. On the other hand, while it is possible to instruct some computers using
speech, most users have gone back to the keyboard and the mouse as still more
convenient.
Computer Vision
The world is composed of three-dimensional objects, but the inputs to the human eye
and computers' TV cameras are two dimensional. Some useful programs can work
solely in two dimensions, but full computer vision requires partial three-dimensional
information that is not just a set of two-dimensional views. At present there are only
limited ways of representing three-dimensional information directly, and they are not
as good as what humans evidently use.
Expert Systems
A ``knowledge engineer'' interviews experts in a certain domain and tries to embody
their knowledge in a computer program for carrying out some task. How well this
works depends on whether the intellectual mechanisms required for the task are
within the present state of AI. When this turned out not to be so, there were many
disappointing results. One of the first expert systems was MYCIN in 1974, which
diagnosed bacterial infections of the blood and suggested treatments. It did better than
medical students or practicing doctors, provided its limitations were observed.
Namely, its ontology included bacteria, symptoms, and treatments and did not include
patients, doctors, hospitals, death, recovery, and events occurring in time. Its
interactions depended on a single patient being considered. Since the experts
consulted by the knowledge engineers knew about patients, doctors, death, recovery,
etc., it is clear that the knowledge engineers forced what the experts told them into a
predetermined framework. The usefulness of current expert systems depends on their
users having common sense.
Heuristic Classification
One of the most feasible kinds of expert system given the present knowledge of AI is
to put some information in one of a fixed set of categories using several sources of
information. An example is advising whether to accept a proposed credit card
purchase. Information is available about the owner of the credit card, his record of
payment and also about the item he is buying and about the establishment from which
he is buying it (e.g., about whether there have been previous credit card frauds at this
establishment).
Early work in AI
“Artificial Intelligence (AI) is the part of computer science concerned with designing
intelligent computer systems, that is, systems that exhibit characteristics we associate
with intelligence in human behaviour – understanding language, learning, reasoning,
solving problems, and so on.”
Scientific Goal: To determine which ideas about knowledge representation, learning,
rule systems, search, and so on, explain various sorts of real intelligence.
Engineering Goal: To solve real world problems using AI techniques such as
knowledge representation, learning, rule systems, search, and so on.
Traditionally, computer scientists and engineers have been more interested in the
engineering goal, while psychologists, philosophers and cognitive scientists have been
more interested in the scientific goal.
The Roots - Artificial Intelligence has identifiable roots in a number of older
disciplines, particularly:
Philosophy
Logic/Mathematics
Computation
Psychology/Cognitive Science
Biology/Neuroscience
Evolution
There is inevitably much overlap, e.g. between philosophy and logic, or between
mathematics and computation. By looking at each of these in turn, we can gain a
better understanding of their role in AI, and how these underlying disciplines have
developed to play that role.
Philosophy
~400 BC Socrates asks for an algorithm to distinguish piety from non-piety.
~350 BC Aristotle formulated different styles of deductive reasoning, which
could mechanically generate conclusions from initial premises, e.g. Modus Ponens
If A ⇒ B and A, then B
i.e. if A implies B and A is true, then B is true (e.g. from "when it is raining the
ground gets wet" and "it is raining", conclude "the ground is wet").
1596 – 1650 Rene Descartes idea of mind-body dualism – part of the mind is
exempt from physical laws.
1646 – 1716 Gottfried Wilhelm Leibniz was one of the first to take the materialist position
which holds that the mind operates by ordinary physical processes – this has the
implication that mental processes can potentially be carried out by machines.
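The modus ponens rule above is mechanical enough to automate. As an illustrative sketch (the facts and rules below are invented for the example, not from the text), a tiny forward-chaining loop in Python:

```python
# A minimal forward-chaining sketch of modus ponens: from known facts
# and rules of the form "A implies B", derive new facts until a fixed
# point is reached.

def modus_ponens(facts, rules):
    """Repeatedly apply 'if A then B' rules until no new facts appear."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for antecedent, consequent in rules:
            if antecedent in derived and consequent not in derived:
                derived.add(consequent)
                changed = True
    return derived

facts = {"it is raining"}
rules = [("it is raining", "the ground is wet")]
print(modus_ponens(facts, rules))
```

The loop is the simplest possible inference engine: it halts once no rule can add anything new.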
Logic/Mathematics
Earl Stanhope’s Logic Demonstrator was a machine that was able to solve
syllogisms, numerical problems in a logical form, and elementary questions of
probability.
1815 – 1864 George Boole introduced his formal language for making logical
inference in 1847 – Boolean algebra.
1848 – 1925 Gottlob Frege produced a logic that is essentially the first-order
logic that today forms the most basic knowledge representation system.
1906 – 1978 Kurt Gödel showed in 1931 that there are limits to what logic can
do. His Incompleteness Theorem showed that in any formal logic powerful
enough to describe the properties of natural numbers, there are true statements
whose truth cannot be established by any algorithm.
1995 Roger Penrose tries to prove the human mind has non-computable
capabilities.
Computation
1869 William Jevons' Logic Machine could handle Boolean algebra and Venn
diagrams, and was able to solve logical problems faster than human beings.
1912 – 1954 Alan Turing tried to characterise exactly which functions are
capable of being computed. Unfortunately it is difficult to give the notion of
computation a formal definition. However, the Church-Turing thesis, which states
that a Turing machine is capable of computing any computable function, is
generally accepted as providing a sufficient definition. Turing also showed that
there were some functions which no Turing machine can compute (e.g. Halting
Problem).
1903 – 1957 John von Neumann proposed the von Neumann architecture which
allows a description of computation that is independent of the particular
realisation of the computer.
1960s Two important concepts emerged: Intractability (when solution time
grows at least exponentially) and Reduction (to ‘easier’ problems).
Psychology / Cognitive Science
Modern Psychology / Cognitive Psychology / Cognitive Science is the science
which studies how the mind operates, how we behave, and how our brains process
information.
Language is an important part of human intelligence. Much of the early work on
knowledge representation was tied to language and informed by research into
linguistics.
It is natural for us to try to use our understanding of how human (and other
animal) brains lead to intelligent behavior in our quest to build artificial intelligent
systems. Conversely, it makes sense to explore the properties of artificial systems
(computer models/simulations) to test our hypotheses concerning human systems.
Many sub-fields of AI are simultaneously building models of how the human
system operates, and artificial systems for solving real world problems, and are
allowing useful ideas to transfer between them.
Biology / Neuroscience
Our brains (which give rise to our intelligence) are made up of tens of billions of
neurons, each connected to hundreds or thousands of other neurons.
Each neuron is a simple processing device (e.g. just firing or not firing depending
on the total amount of activity feeding into it). However, large networks of
neurons are extremely powerful computational devices that can learn how best to
operate.
The field of Connectionism or Neural Networks attempts to build artificial
systems based on simplified networks of simplified artificial neurons.
The aim is to build powerful AI systems, as well as models of various human
abilities.
Neural networks work at a sub-symbolic level, whereas much of conscious human
reasoning appears to operate at a symbolic level.
Artificial neural networks perform well at many simple tasks, and provide good
models of many human abilities. However, there are many tasks that they are not
so good at, and other approaches seem more promising in those areas.
Evolution
One advantage humans have over current machines/computers is that they have a
long evolutionary history.
Charles Darwin (1809 – 1882) is famous for his work on evolution by natural
selection. The idea is that fitter individuals will naturally tend to live longer and
produce more children, and hence after many generations a population will
automatically emerge with good innate properties.
This has resulted in brains that have much structure, or even knowledge, built in at
birth.
This gives them an advantage over simple artificial neural network systems
that have to learn everything.
Computers are finally becoming powerful enough that we can simulate evolution
and evolve good AI systems.
We can now even evolve systems (e.g. neural networks) so that they are good at
learning.
A related field called genetic programming has had some success in evolving
programs, rather than programming them by hand.
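As an illustrative sketch of evolution by selection (the bit-string encoding, fitness function, and all parameters are invented for the example), a toy genetic algorithm that evolves a population toward higher fitness:

```python
import random

# A toy genetic algorithm: evolve a bit-string toward all ones.
# Fitter individuals survive and reproduce; children are made by
# one-point crossover plus a single-bit mutation.

def evolve(length=10, pop_size=20, generations=50, seed=0):
    rng = random.Random(seed)
    fitness = lambda ind: sum(ind)  # fitness = count of 1-bits
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[:pop_size // 2]            # selection: keep the fitter half
        children = []
        for _ in range(pop_size - len(parents)):
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, length)        # one-point crossover
            child = a[:cut] + b[cut:]
            i = rng.randrange(length)             # point mutation: flip one bit
            child[i] ^= 1
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)

best = evolve()
print(best, sum(best))
```

Because the fitter half is always carried over unchanged, the best fitness in the population never decreases across generations.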
Sub-fields of Artificial Intelligence
Neural Networks – e.g. brain modelling, time series prediction, classification
Evolutionary Computation – e.g. genetic algorithms, genetic programming
Vision – e.g. object recognition, image understanding
Robotics – e.g. intelligent control, autonomous exploration
Expert Systems – e.g. decision support systems, teaching systems
Speech Processing– e.g. speech recognition and production
Natural Language Processing – e.g. machine translation
Planning – e.g. scheduling, game playing
Machine Learning – e.g. decision tree learning, version space learning
Speech Processing
As well as trying to understand human systems, there are also numerous real
world applications: speech recognition for dictation systems and voice activated
control; speech production for automated announcements and computer interfaces.
How do we get from sound waves to text streams and vice-versa?
Logical AI
What a program knows about the world in general, the facts of the specific situation in
which it must act, and its goals are all represented by sentences of some mathematical
logical language. The program decides what to do by inferring that certain actions are
appropriate for achieving its goals.
Search
AI programs often examine large numbers of possibilities, e.g. moves in a chess game
or inferences by a theorem proving program. Discoveries are continually made about
how to do this more efficiently in various domains.
Pattern Recognition
When a program makes observations of some kind, it is often programmed to
compare what it sees with a pattern. For example, a vision program may try to match
a pattern of eyes and a nose in a scene in order to find a face. More complex patterns,
e.g. in a natural language text, in a chess position, or in the history of some event are
also studied.
Representation
Facts about the world have to be represented in some way. Usually languages of
mathematical logic are used.
Inference
From some facts, others can be inferred. Mathematical logical deduction is adequate
for some purposes, but new methods of non-monotonic inference have been added to
logic since the 1970s. The simplest kind of non-monotonic reasoning is default
reasoning in which a conclusion is to be inferred by default, but the conclusion can be
withdrawn if there is evidence to the contrary. For example, when we hear of a bird,
we may infer that it can fly, but this conclusion can be reversed when we hear that it
is a penguin. It is the possibility that a conclusion may have to be withdrawn that
constitutes the non-monotonic character of the reasoning. Ordinary logical reasoning
is monotonic in that the set of conclusions that can be drawn from a set of premises is
a monotonic increasing function of the premises.
Planning
Planning programs start with general facts about the world (especially facts about the
effects of actions), facts about the particular situation and a statement of a goal. From
these, they generate a strategy for achieving the goal. In the most common cases, the
strategy is just a sequence of actions.
Epistemology
This is a study of the kinds of knowledge that are required for solving problems in the
world.
Ontology
Ontology is the study of the kinds of things that exist. In AI, the programs and
sentences deal with various kinds of objects, and we study what these kinds are and
what their basic properties are. Emphasis on ontology begins in the 1990s.
Heuristics
A heuristic is a way of trying to discover something, or an idea embedded in a
program. The term is used variously in AI. Heuristic functions are used in some
approaches to search to measure how far a node in a search tree seems to be from a
goal. Heuristic predicates that compare two nodes in a search tree to see if one is
better than the other, i.e. constitutes an advance toward the goal, may be more useful.
Genetic Programming
Genetic programming is a technique for getting programs to solve a task by mating
random Lisp programs and selecting the fittest over millions of generations.
Search is a method that can be used by computers to examine a problem space in
order to find a goal. Often, we want to find the goal as quickly as possible or without
using too many resources. A problem space can also be considered to be a search space,
because in order to solve the problem we will search the space for a goal state. We will
continue to use the term search space to describe this concept. In this chapter, we will look at
a number of methods for examining a search space. These methods are called search
methods.
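One of the simplest such search methods can be sketched as a breadth-first search over an explicit search space (the graph below is an invented toy example, not from the text):

```python
from collections import deque

# Breadth-first search: explore the search space level by level,
# so the first path that reaches the goal is a shortest one.

def bfs(graph, start, goal):
    """Return a shortest path from start to goal, or None if unreachable."""
    frontier = deque([[start]])   # queue of partial paths
    visited = {start}
    while frontier:
        path = frontier.popleft()
        state = path[-1]
        if state == goal:
            return path
        for nxt in graph.get(state, []):
            if nxt not in visited:
                visited.add(nxt)
                frontier.append(path + [nxt])
    return None

graph = {"S": ["A", "B"], "A": ["G"], "B": ["A"], "G": []}
print(bfs(graph, "S", "G"))  # ['S', 'A', 'G']
```

The `visited` set is what keeps the search from re-examining states, which matters because real search spaces contain many paths to the same state.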
Fundamentals of ANN – Biological Neurons and Their Artificial Models – Types of ANN
– Properties – Different Learning Rules – Types of Activation Functions – Training of
ANN – Perceptron Model (Both Single & Multi-Layer) – Training Algorithm – Problem
Solving Using Learning Rules and Algorithms – Linear Separability Limitation and Its
Overcoming
1. FUNDAMENTALS OF ANN
The human brain consists of a large number, more than a billion, of neural cells that
process information. Each cell works like a simple processor. Only the massive interaction
between all cells and their parallel processing makes the brain’s abilities possible. Figure 1
represents a human biological nervous unit. The various parts of the biological neural
network (BNN) are marked in Figure 1.
Dendrites are branching fibres that extend from the cell body or soma.
Soma or cell body of a neuron contains the nucleus and other structures, support
chemical processing and production of neurotransmitters.
Axon is a single fibre that carries information away from the soma to the synaptic sites of
other neurons (dendrites and somas), muscles, or glands.
Axon hillock is the site of summation for incoming information. At any moment, the
collective influence of all neurons that conduct impulses to a given neuron will determine
whether or not an action potential will be initiated at the axon hillock and propagated along
the axon.
Myelin sheath consists of fat-containing cells that insulate the axon from electrical
activity. This insulation acts to increase the rate of transmission of signals. A gap exists
between each myelin sheath cell along the axon. Since fat inhibits the propagation of electricity,
the signals jump from one gap to the next.
Nodes of Ranvier are the gaps (about 1 μm) between myelin sheath cells. Since fat
serves as a good insulator, the myelin sheaths speed the rate of transmission of an electrical
impulse along the axon.
Synapse is the point of connection between two neurons, or a neuron and a muscle or a
gland. Electrochemical communication between neurons takes place at these junctions.
Terminal buttons of a neuron are the small knobs at the end of an axon that release
chemicals called neurotransmitters.
A processing unit sums the inputs, and then applies a non-linear activation function
(i.e. squashing/transfer/threshold function).
An output line transmits the result to other neurons.
A neuron consists of three basic components: weights, thresholds, and a single activation
function. An artificial neural network (ANN) model based on the biological neural systems is
shown in Figure 2.
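A minimal sketch of such an artificial neuron: a weighted sum of the inputs plus a bias, passed through a sigmoid squashing function (the numeric weight and bias values below are illustrative, not from the text):

```python
import math

# A single artificial neuron: net = sum of weighted inputs + bias,
# squashed into (0, 1) by a sigmoid activation function.

def neuron(inputs, weights, bias):
    net = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-net))  # sigmoid activation

out = neuron([1.0, 0.5], [0.4, -0.2], bias=0.1)
print(round(out, 3))
```

Here net = 1.0·0.4 + 0.5·(−0.2) + 0.1 = 0.4, and the sigmoid of 0.4 is roughly 0.599.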
Supervised learning
Unsupervised learning
Reinforced learning
Hebbian learning
Gradient descent learning
Competitive learning
Stochastic learning
1.3.1.1 Supervised learning
Every input pattern that is used to train the network is associated with an output pattern,
which is the target or the desired pattern.
1.3.1.2 Unsupervised learning
In this learning method the target output is not presented to the network. It is as if there
is no teacher to present the desired patterns, and hence the system learns on its own by
discovering and adapting to structural features in the input patterns.
1.3.1.3 Reinforced learning
In this method a teacher, though available, does not present the expected answer but
only indicates whether the computed output is correct or incorrect. The information
provided helps the network in the learning process.
This rule was proposed by Hebb and is based on correlative weight adjustment. This is
the oldest learning mechanism inspired by biology. In this, the input-output pattern pairs
(x_i, y_i) are associated by the weight matrix W, known as the correlation matrix.
It is computed as
W = Σ_i x_i y_i^T ----------- eq (1)
Here y_i^T is the transpose of the associated output vector y_i. Numerous variants of the
rule have been proposed.
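A minimal sketch of eq (1) as code, using plain Python lists instead of a matrix library (the bipolar training pairs are invented for the example):

```python
# Hebbian/correlation rule: the weight matrix W accumulates the outer
# products x_i * y_i^T over all training pairs, so W[j][k] is the sum
# of x[j] * y[k] across pairs.

def hebbian_weights(pairs):
    """pairs: list of (x, y) vectors; returns W with W[j][k] = sum of x[j]*y[k]."""
    n, m = len(pairs[0][0]), len(pairs[0][1])
    W = [[0.0] * m for _ in range(n)]
    for x, y in pairs:
        for j in range(n):
            for k in range(m):
                W[j][k] += x[j] * y[k]  # correlative weight adjustment
    return W

W = hebbian_weights([([1, -1], [1, 1]), ([1, 1], [-1, 1])])
print(W)
```

Each training pair simply adds its outer product into W, which is why the result is called the correlation matrix.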
This is based on the minimization of the error E defined in terms of the weights and
activation function of the network. It is also required that the activation function employed
by the network be differentiable, as the weight update depends on the gradient of the error E.
Thus if Δw_ij is the weight update of the link connecting the i-th and j-th neuron of the
two neighbouring layers, then Δw_ij is defined as
Δw_ij = η (∂E/∂w_ij) ----------- eq (2)
where η is the learning rate parameter and ∂E/∂w_ij is the error gradient with reference to
the weight w_ij.
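A minimal sketch of gradient descent on a single weight, for the one-example squared error E = (t − y)². The input, target, and learning rate are illustrative; the update moves against the gradient of E (some texts absorb the minus sign into the rule itself):

```python
# Gradient descent on one weight w for E = (t - w*x)^2:
# repeatedly step w in the direction that decreases E.

def gradient_descent(x, t, w=0.0, lr=0.1, steps=50):
    for _ in range(steps):
        y = w * x
        grad = -2.0 * (t - y) * x   # dE/dw for E = (t - y)^2
        w -= lr * grad              # move against the gradient (downhill)
    return w

w = gradient_descent(x=1.0, t=0.5)
print(round(w, 4))  # w converges toward 0.5, where E = 0
```

Because E is a simple parabola in w, each step shrinks the remaining error by a constant factor, so w approaches the minimizer t/x.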
In this method, those neurons which respond strongly to input stimuli have their weights
updated.
When an input pattern is presented, all neurons in the layer compete and the winning
neuron undergoes weight adjustment. Hence it is a winner-takes-all strategy.
The different learning laws or rules with their features are given in Table 1 below.
Table 1: Different learning laws with their weight details and learning type
1.4 TYPES OF ACTIVATION FUNCTIONS
1.4.1. Linear Function
Linear functions are the simplest form of activation function. Refer to figure 4; f(x) is
just the identity function
f(x) = x ----------- eq (3)
It is usually used in simple networks. It collects the input and produces an output which
is proportionate to the given input. This is better than the step function because it gives
multiple outputs, not just True or False.
1.4.2. Binary Step Function (with threshold θ) (aka Heaviside Function or Threshold
Function)
f(x) = 1 if x ≥ θ
f(x) = 0 if x < θ ----------- eq (4)
The binary step function is shown in figure 4. It is also called the Heaviside function;
in some literature it is also known as the threshold function. Equation 4 gives the output
for this function.
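The activation functions discussed in this section can be sketched as follows (the threshold and steepness values are illustrative assumptions, not from the text):

```python
import math

# Sketches of three activation functions: linear (identity),
# binary step with threshold theta, and binary sigmoid.

def linear(x):
    return x  # identity: output proportional to the input

def binary_step(x, theta=0.0):
    return 1 if x >= theta else 0  # fires only at or above the threshold

def binary_sigmoid(x, lam=1.0):
    return 1.0 / (1.0 + math.exp(-lam * x))  # smooth squashing into (0, 1)

print(linear(0.7), binary_step(0.7), round(binary_sigmoid(0.7), 3))
```

The sigmoid is the differentiable counterpart of the step function, which is what makes it usable with the gradient descent rule of eq (2).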
1.4.3. Binary Sigmoid
The binary sigmoid squashes its input smoothly into the range (0, 1):
f(x) = 1 / (1 + e^(-λx)), where λ > 0 is the steepness parameter.
1.5 PERCEPTRON MODEL
1.5.1 Simple Perceptron for Pattern Classification
f(x) = 1 if x > θ
f(x) = 0 if -θ ≤ x ≤ θ
f(x) = -1 if x < -θ ----------- eq (7)
Equation 7 gives the bipolar activation function, which is the most common function
used in perceptron networks. Figure 7 represents a single layer perceptron network. The
inputs arising from the problem space are collected by the sensors and fed to the
association units. Association units are the units responsible for associating the inputs
based on their similarities; this unit groups similar inputs, hence the name association unit.
A single input from each group is given to the summing unit. Weights are randomly fixed
initially and assigned to these inputs. The net value is calculated using the expression
y_in = bias + Σ_i x_i w_i ----------- eq (8)
This value is given to the activation function unit to get the final output response. The
actual output is compared with the target or desired output. If they are the same, training
can stop; otherwise the weights have to be updated, which means there is an error. The
error is given as δ = b - s, where b is the desired/target output and s is the actual outcome
of the machine. Here the weights are updated based on the perceptron learning law as
given in equation 9:
w_i(new) = w_i(old) + η δ x_i ----------- eq (9)
Step 1: Initialize weights and bias. For simplicity, set weights and bias to zero. Set the
learning rate in the range of zero to one.
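The training loop described above can be sketched for the AND gate with bipolar inputs and targets (the learning rate and epoch count are illustrative choices):

```python
# Perceptron training sketch: weights and bias start at zero and are
# updated by w_new = w_old + lr * delta * x whenever the bipolar
# output disagrees with the target.

def train_perceptron(samples, lr=1.0, epochs=10):
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for x, target in samples:
            net = b + sum(xi * wi for xi, wi in zip(x, w))
            s = 1 if net > 0 else -1           # bipolar activation
            delta = target - s                  # the error term (delta)
            if delta != 0:
                w = [wi + lr * delta * xi for wi, xi in zip(w, x)]
                b += lr * delta
    return w, b

# AND gate in bipolar form: output is 1 only when both inputs are 1
samples = [([1, 1], 1), ([1, -1], -1), ([-1, 1], -1), ([-1, -1], -1)]
w, b = train_perceptron(samples)
print(w, b)
```

Because AND is linearly separable, the loop converges after a couple of epochs and the remaining passes make no further updates.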
1.5.3 Multi-Layer Perceptron Model
Figure 8 is the general representation of a multi-layer perceptron network. In between the
input and output layers there will be some more layers, also known as hidden layers.
1.6 LINEARLY SEPARABLE & LINEARLY INSEPARABLE TASKS
Perceptrons are successful only on problems with a linearly separable solution space.
Figure 9 represents both linearly separable and linearly inseparable problems. Perceptrons
cannot handle, in particular, tasks which are not linearly separable (known as linearly
inseparable problems). Sets of points in two-dimensional space are linearly separable if
the sets can be separated by a straight line. Generalizing, a set of points in n-dimensional
space is called linearly separable if the sets can be separated by a hyperplane, as
represented in Figure 9.
A single layer perceptron can be used for linear separation, for example the AND gate.
But it cannot be used for nonlinear, inseparable problems (for example the XOR gate).
Consider figure 10.
Here a single decision line cannot separate the zeros and ones linearly; at least two
lines are required to separate the zeros and ones, as shown in Figure 10. Hence single
layer networks cannot be used to solve inseparable problems. To overcome this problem
we go for the creation of convex regions.
Convex regions can be created by the multiple decision lines arising from multi-layer
networks. Hence we go for a multilayer network, thereby creating convex regions which
solve the inseparable problem.
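How hidden units create the required decision lines can be sketched with hand-chosen (not learned) weights for XOR:

```python
# A two-layer network solving the linearly inseparable XOR problem.
# Each hidden threshold unit implements one decision line; the output
# unit combines them (XOR = OR and not AND).

def step(x):
    return 1 if x >= 0 else 0

def xor_mlp(x1, x2):
    h1 = step(x1 + x2 - 0.5)     # fires when at least one input is 1 (OR line)
    h2 = step(x1 + x2 - 1.5)     # fires only when both inputs are 1 (AND line)
    return step(h1 - h2 - 0.5)   # OR but not AND == XOR

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_mlp(a, b))
```

The two hidden units correspond exactly to the two decision lines of Figure 10; no single-layer network with one line can do this.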
Select any two points in a region and draw a straight line between them. If the
selected points and the line joining them both lie inside the region, then that region is
known as a convex region.
Figure 5. General Regression Neural Network
Consider the characters given in figure 6. The objective is to recognise a particular
alphabet, say ‘A’ in this example. Using image analysis models, the particular alphabet is
segmented and converted into intensity or grey scale or pixel values. The general workflow
is shown in Figure 7. The first procedure is segmentation. Segmentation is the process of
subdividing the image into sub-blocks. So the alphabet “A” is isolated by using appropriate
segmentation procedures like thresholding, region growing, or edge-detector-based
algorithms.
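Thresholding, the simplest of the segmentation procedures mentioned above, can be sketched as follows (the tiny 3x3 "image" and the threshold value are invented for illustration):

```python
# Threshold-based segmentation: map each grayscale pixel to 1 (ink)
# or 0 (background) depending on whether it is darker than threshold t.

def threshold_segment(image, t=128):
    """Binarize a grayscale image: 1 where intensity is below t (dark ink)."""
    return [[1 if pixel < t else 0 for pixel in row] for row in image]

image = [
    [250,  30, 250],
    [ 40,  35,  45],
    [ 20, 240,  25],
]
print(threshold_segment(image))
```

The resulting binary matrix is the kind of pattern-value grid shown in Figure 8, from which features can then be extracted.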
Figure 6. Input to Character recognition system
Figure 8a. Character pattern values
Figure 8b. Character pattern conversion into intensity
Figures 6, 7 and 8 are adapted from Praveen Kumar et al. (2012), “Character Recognition
using Neural Network”, IJST, vol. 3, issue 2, pp. 978-981.
For figure 8b, texture features, shape features and/or boundary features etc. can be
extracted. These feature values are known as exemplars, which form the actual input to
the neural network. Consider any neural network. The input is the feature table created as
explained in the above process, which is shown in Figure 9. This table is provided as input
to the neural system.
Figure 9 adapted from Yusuf Perwej et al. (2011), “Neural Networks for Handwritten
English Alphabet Recognition”, International Journal of Computer Applications
(0975-8887), Volume 20, No. 7, April 2011.
Figure 10. ANN implementation of character recognition system
Figure 10 adapted from Anita Pal et al. (2010), “Handwritten English Character
Recognition Using Neural Network”, International Journal of Computer Science &
Communication, Vol. 1, No. 2, July-December 2010, pp. 141-144.
If the feature sets match between the trained and the current input features, the output
produces “1”, which denotes that the particular alphabet is recognised; else it produces
“0”, not recognised.