Artificial Intelligence Notes
Artificial Intelligence Notes
Artificial Intelligence Notes
com
Module 1 12Hrs
What is Artificial Intelligence? AI Technique, Level of the Model,Problem Spaces, and Search: Defining
the Problem as a State Space Search, Production Systems, Problem Characteristics, Production System
Characteristics, Issues in the Design of Search Programs. Heuristic Search Techniques: Generate-and-
Test, Hill Climbing, Best-first Search, Problem Reduction, Constraint Satisfaction, Means-ends
Module 2 10Hrs
Game Playing: The Minimax Search Procedure, Adding Alpha-beta Cutoffs, Iterative Deepening.Planning:
The Blocks World, Components of a Planning System, Goal Stack Planning, Nonlinear Planning Using
Constraint Posting, Hierarchical PlanningOther Planning Techniques.Understanding: What is
Understanding, What Makes Understanding Hard?, Understanding as Constraint Satisfaction.Natural
Language Processing: Introduction, Syntactic Processing, Semantic Analysis, Discourse and Pragmatic
Processing, Statistical Natural Language Processing, Spell Checking.
Module 3 8Hrs
Learning: Rote Learning, learning by Taking Advice, Learning in Problem-solving, Learning from
Examples: Induction, Explanation-based Learning, Discovery, Analogy, Formal Learning Theory, Neural
Net Learning and Genetic Learning. Expert Systems: Representing and Using Domain Knowledge, Expert
System Shells, Explanation, Knowledge Acquisition.
Module 1
ARTIFICIAL INTELLIGENCE
What is Artificial Intelligence?
It is a branch of Computer Science that pursues creating the computers or machines as intelligent
as human beings.
It is the science and engineering of making intelligent machines, especially intelligent computer
programs.
It is related to the similar task of using computers to understand human intelligence, but AI does
not have to confine itself to methods that are biologically observable
Definition: Artificial Intelligence is the study of how to make computers do things, which, at the
moment, people do better.
According to the father of Artificial Intelligence, John McCarthy, it is “The science and
engineering of making intelligent machines, especially intelligent computer programs”.
Artificial Intelligence is a way of making a computer, a computer-controlled robot, or a
software think intelligently, in the similar manner the intelligent humans think.
AI is accomplished by studying how human brain thinks and how humans learn, decide, and
work while trying to solve a problem, and then using the outcomes of this study as a basis of
developing intelligent software and systems.
It has gained prominence recently due, in part, to big data, or the increase in speed, size and
variety of data businesses are now collecting. AI can perform tasks such as identifying patterns
in the data more efficiently than humans, enabling businesses to gain more insight out of
their data.
From a business perspective AI is a set of very powerful tools, and methodologies for using
those tools to solve business problems.
From a programming perspective, AI includes the study of symbolic programming, problem
solving, and search.
AI Vocabulary
Intelligence relates to tasks involving higher mental processes, e.g. creativity, solving problems,
pattern recognition, classification, learning, induction, deduction, building analogies,
optimization, language processing, knowledge and many more. Intelligence is the computational
part of the ability to achieve goals.
Intelligent behaviour is depicted by perceiving one’s environment, acting in complex
environments, learning and understanding from experience, reasoning to solve problems and
discover hidden knowledge, applying knowledge successfully in new situations, thinking
abstractly, using analogies, communicating with others and more.
Engineering based goals of AI relate to developing concepts, theory and practice of building
intelligent machines. The emphasis is on system building.
AI Techniques depict how we represent, manipulate and reason with knowledge in order to
solve problems. Knowledge is a collection of ‘facts’. To manipulate these facts by a program, a
suitable representation is required. A good representation facilitates problem solving.
Learning means that programs learn from what facts or behaviour can represent. Learning
denotes changes in the systems that are adaptive in other words, it enables the system to do the
same task(s) more efficiently next time.
Applications of AI refers to problem solving, search and control strategies, speech recognition,
natural language understanding, computer vision, expert systems, etc.
Problems of AI:
Intelligence does not imply perfect understanding; every intelligent being has limited perception,
memory and computation. Many points on the spectrum of intelligence versus cost are viable,
from insects to humans. AI seeks to understand the computations required from intelligent
behaviour and to produce computer systems that exhibit intelligence. Aspects of intelligence
studied by AI include perception, communicational using human languages, reasoning, planning,
learning and memory.
Branches of AI:
A list of branches of AI is given below. However some branches are surely missing, because no
one has identified them yet. Some of these may be regarded as concepts or topics rather than full
branches.
Logical AI — In general the facts of the specific situation in which it must act, and its goals are
all represented by sentences of some mathematical logical language. The program decides what
to do by inferring that certain actions are appropriate for achieving its goals.
Search — Artificial Intelligence programs often examine large numbers of possibilities – for
example, moves in a chess game and inferences by a theorem proving program. Discoveries are
frequently made about how to do this more efficiently in various domains.
Pattern Recognition — When a program makes observations of some kind, it is often planned
to compare what it sees with a pattern. For example, a vision program may try to match a pattern
of eyes and a nose in a scene in order to find a face. More complex patterns are like a natural
language text, a chess position or in the history of some event. These more complex patterns
require quite different methods than do the simple patterns that have been studied the most.
Representation — Usually languages of mathematical logic are used to represent the facts about
the world.
Inference — Others can be inferred from some facts. Mathematical logical deduction is
sufficient for some purposes, but new methods of non-monotonic inference have been added to
the logic since the 1970s. The simplest kind of non-monotonic reasoning is default reasoning in
which a conclusion is to be inferred by default. But the conclusion can be withdrawn if there is
evidence to the divergent. For example, when we hear of a bird, we infer that it can fly, but this
conclusion can be reversed when we hear that it is a penguin. It is the possibility that a
conclusion may have to be withdrawn that constitutes the non-monotonic character of the
reasoning. Normal logical reasoning is monotonic, in that the set of conclusions can be drawn
from a set of premises, i.e. monotonic increasing function of the premises. Circumscription is
another form of non-monotonic reasoning.
Common sense knowledge and Reasoning — This is the area in which AI is farthest from the
human level, in spite of the fact that it has been an active research area since the 1950s. While
there has been considerable progress in developing systems of non-monotonic reasoning and
theories of action, yet more new ideas are needed.
Learning from experience — There are some rules expressed in logic for learning. Programs
can only learn what facts or behaviour their formalisms can represent, and unfortunately learning
systems are almost all based on very limited abilities to represent information.
Planning — Planning starts with general facts about the world (especially facts about the effects
of actions), facts about the particular situation and a statement of a goal. From these, planning
programs generate a strategy for achieving the goal. In the most common cases, the strategy is
just a sequence of actions.
Epistemology — This is a study of the kinds of knowledge that are required for solving
problems in the world.
Ontology — Ontology is the study of the kinds of things that exist. In AI the programs and
sentences deal with various kinds of objects and we study what these kinds are and what their
basic properties are. Ontology assumed importance from the 1990s.
Applications of AI
AI has applications in all fields of human study, such as finance and economics, environmental
engineering, chemistry, computer science, and so on. Some of the applications of AI are listed
below:
Perception
■ Machine vision
■ Speech understanding
■ Touch ( tactile or haptic) sensation
Robotics
Natural Language Processing
■ Natural Language Understanding
■ Speech Understanding
■ Language Generation
■ Machine Translation
Planning
Expert Systems
Machine Learning
Theorem Proving
Symbolic Mathematics
Game Playing
AI Technique:
Artificial Intelligence research during the last three decades has concluded that Intelligence
requires knowledge. To compensate overwhelming quality, knowledge possesses less desirable
properties.
A. It is huge.
B. It is difficult to characterize correctly.
C. It is constantly varying.
D. It differs from data by being organized in a way that corresponds to its application.
E. It is complicated.
It can be understood by people who must provide it—even though for many
programs bulk of the data comes automatically from readings.
In many AI domains, how the people understand the same people must supply the
knowledge to a program.
It can be easily modified to correct errors and reflect changes in real conditions.
It can be used to help overcome its own sheer bulk by helping to narrow the range
of possibilities that must be usually considered.
In order to characterize an AI technique let us consider initially OXO or tic-tac-toe and use a
series of different approaches to play the game.
The programs increase in complexity, their use of generalizations, the clarity of their
knowledge and the extensibility of their approach. In this way they move towards being
representations of AI techniques.
Example-1: Tic-Tac-Toe
The Tic-Tac-Toe game consists of a nine element vector called BOARD; it represents the
numbers 1 to 9 in three rows.
An element contains the value 0 for blank, 1 for X and 2 for O. A MOVETABLE vector consists
of 19,683 elements (39) and is needed where each element is a nine element vector. The contents
of the vector are especially chosen to help the algorithm.
The algorithm makes moves by pursuing the following:
1. View the vector as a ternary number. Convert it to a decimal number.
2. Use the decimal number as an index in MOVETABLE and access the vector.
3. Set BOARD to this vector indicating how the board looks after the move. This approach is
capable in time but it has several disadvantages. It takes more space and requires stunning
effort to calculate the decimal numbers. This method is specific to this game and cannot be
completed.
The structure of the data is as before but we use 2 for a blank, 3 for an X and 5 for an O.
A variable called TURN indicates 1 for the first move and 9 for the last. The algorithm consists
of three actions:
MAKE2 which returns 5 if the centre square is blank; otherwise it returns any blank non-
corner square, i.e. 2, 4, 6 or 8. POSSWIN (p) returns 0 if player p cannot win on the next move
and otherwise returns the number of the square that gives a winning move.
It checks each line using products 3*3*2 = 18 gives a win for X, 5*5*2=50 gives a win
for O, and the winning move is the holder of the blank. GO (n) makes a move to square n setting
BOARD[n] to 3 or 5.
This algorithm is more involved and takes longer but it is more efficient in storage which
compensates for its longer time. It depends on the programmer’s skill.
The structure of the data consists of BOARD which contains a nine element vector, a list of
board positions that could result from the next move and a number representing an estimation of how
the board position leads to an ultimate win for the player to move.
This algorithm looks ahead to make a decision on the next move by deciding which the most
promising move or the most suitable move at any stage would be and selects the same.
Consider all possible moves and replies that the program can make. Continue this process for
as long as time permits until a winner emerges, and then choose the move that leads to the computer
program winning, if possible in the shortest time.
Actually this is most difficult to program by a good limit but it is as far that the technique can
be extended to in any game. This method makes relatively fewer loads on the programmer in terms
of the game technique but the overall game strategy must be known to the adviser.
Let us consider Question Answering systems that accept input in English and provide
answers also in English. This problem is harder than the previous one as it is more difficult to
specify the problem properly. Another area of difficulty concerns deciding whether the answer
obtained is correct, or not, and further what is meant by ‘correct’. For example, consider the
following situation:
2.1 Text
Rani went shopping for a new Coat. She found a red one she really liked.
When she got home, she found that it went perfectly with her favourite dress.
2.2 Question
Method 1
A set of templates that match common questions and produce patterns used to match
against inputs. Templates and patterns are used so that a template that matches a given question
is associated with the corresponding pattern to find the answer in the input text. For example, the
template who did x y generates x y z if a match occurs and z is the answer to the question. The
given text and the question are both stored as strings.
2.4 Algorithm
Compare the template against the questions and store all successful matches to produce a
set of text patterns.
Pass these text patterns through a substitution process to change the person or voice and
produce an expanded set of text patterns.
Apply each of these patterns to the text; collect all the answers and then print the
answers.
2.5 Example
In question 1 we use the template WHAT DID X Y which generates Rani go shopping for z and
after substitution we get Rani goes shopping for z and Rani went shopping for z giving z
[equivalence] a new coat
In question 2 we need a very large number of templates and also a scheme to allow the insertion
of ‘find’ before ‘that she liked’; the insertion of ‘really’ in the text; and the substitution of ‘she’
for ‘Rani’ gives the answer ‘a red one’.
2.6 Comments
This is a very primitive approach basically not matching the criteria we set for
intelligence and worse than that, used in the game. Surprisingly this type of technique was
actually used in ELIZA which will be considered later in the course.
Method 2
2.7 Data Structures
A structure called English consists of a dictionary, grammar and some semantics about
the vocabulary we are likely to come across. This data structure provides the knowledge to
convert English text into a storable internal form and also to convert the response back into
English. The structured representation of the text is a processed form and defines the context of
the input text by making explicit all references such as pronouns. There are three types of such
knowledge representation systems: production rules of the form ‘if x then y’, slot and filler
systems and statements in mathematical logic. The system used here will be the slot and filler
system.
Event2 Event2
instance: finding instance: liking
tense: past tense: past
agent: Rani modifier: much
object: Thing1 object: Thing1
Thing1
instance: coat
colour: red
The question is stored in two forms: as input and in the above form.
2.8 Algorithm
Convert the question to a structured form using English know how, then use a marker to
indicate the substring (like ‘who’ or ‘what’) of the structure, that should be returned as an
answer. If a slot and filler system is used a special marker can be placed in more than one
slot.
The answer appears by matching this structured form against the structured text.
The structured form is matched against the text and the requested segments of the
question are returned.
2.9 Examples
Both questions 1 and 2 generate answers via a new coat and a red coat respectively.
Question 3 cannot be answered, because there is no direct response.
2.10 Comments
This approach is more meaningful than the previous one and so is more effective. The
extra power given must be paid for by additional search time in the knowledge bases. A warning
10
must be given here: that is – to generate unambiguous English knowledge base is a complex task
and must be left until later in the course. The problems of handling pronouns are difficult.
For example:
Rani walked up to the salesperson: she asked where the toy department was.
Rani walked up to the salesperson: she asked her if she needed any help.
Whereas in the original text the linkage of ‘she’ to ‘Rani’ is easy, linkage of ‘she’ in each of the
above sentences to Rani and to the salesperson requires additional knowledge about the context
via the people in a shop.
Method 3
World model contains knowledge about objects, actions and situations that are described
in the input text. This structure is used to create integrated text from input text. The diagram
shows how the system’s knowledge of shopping might be represented and stored. This
information is known as a script and in this case is a shopping script. (See figure 1.1 next page )
1.8.2.12 Algorithm
Convert the question to a structured form using both the knowledge contained in Method
2 and the World model, generating even more possible structures, since even more knowledge is
being used. Sometimes filters are introduced to prune the possible answers.
To answer a question, the scheme followed is: Convert the question to a structured form
as before but use the world model to resolve any ambiguities that may occur. The structured
form is matched against the text and the requested segments of the question are returned.
2.13 Example
Both questions 1 and 2 generate answers, as in the previous program. Question 3 can now
be answered. The shopping script is instantiated and from the last sentence the path through step
14 is the one used to form the representation. ‘M’ is bound to the red coat-got home. ‘Rani buys
a red coat’ comes from step 10 and the integrated text generates that she bought a red coat.
2.14 Comments
This program is more powerful than both the previous programs because it has more
knowledge. Thus, like the last game program it is exploiting AI techniques. However, we are not
yet in a position to handle any English question. The major omission is that of a general
reasoning mechanism known as inference to be used when the required answer is not explicitly
given in the input text. But this approach can handle, with some modifications, questions of the
following form with the answer—Saturday morning Rani went shopping. Her brother tried to
call her but she did not answer.
11
This answer is derived because we have supplied an additional fact that a person cannot
be in two places at once. This patch is not sufficiently general so as to work in all cases and does
not provide the type of solution we are really looking for.
12
Are we trying to produce programs that do the tasks the same way that people do?
OR
Are we trying to produce programs that simply do the tasks the easiest way that is
possible?
Programs in the first class attempt to solve problems that a computer can easily solve and
do not usually use AI techniques. AI techniques usually include a search, as no direct method is
available, the use of knowledge about the objects involved in the problem area and abstraction on
which allows an element of pruning to occur, and to enable a solution to be found in real time;
otherwise, the data could explode in size. Examples of these trivial problems in the first class,
which are now of interest only to psychologists are EPAM (Elementary Perceiver and
Memorizer) which memorized garbage syllables.
The second class of problems attempts to solve problems that are non-trivial for a computer and
use AI techniques. We wish to model human performance on these:
13
To solve the problem of building a system you should take the following steps:
1. Define the problem accurately including detailed specifications and what constitutes a
suitable solution.
2. Scrutinize the problem carefully, for some features may have a central affect on the
chosen method of solution.
3. Segregate and represent the background knowledge needed in the solution of the
problem.
4. Choose the best solving techniques for the problem to solve a solution.
Problem solving has been the key area of concern for Artificial Intelligence.
Problem solving is a process of generating solutions from observed or given data. It is however
not always possible to use direct methods (i.e. go directly from data to solution). Instead,
problem solving often needs to use indirect or modelbased methods.
General Problem Solver (GPS) was a computer program created in 1957 by Simon and Newell
to build a universal problem solver machine. GPS was based on Simon and Newell’s theoretical
work on logic machines. GPS in principle can solve any formalized symbolic problem, such as
theorems proof and geometric problems and chess playing. GPS solved many simple problems,
such as the Towers of Hanoi, that could be sufficiently formalized, but GPS could not solve any
real-world problems.
14
Analyze the problem – find few important features that may have impact on the
appropriateness of various possible techniques for solving the problem
Isolate and represent task knowledge necessary to solve the problem
Choose the best problem-solving technique(s) and apply to the particular problem
Problem definitions
A problem is defined by its ‘elements’ and their ‘relations’. To provide a formal description of a
problem, we need to do the following:
a. Define a state space that contains all the possible configurations of the relevant objects,
including some impossible ones.
b. Specify one or more states that describe possible situations, from which the problem-
solving process may start. These states are called initial states.
c. Specify one or more states that would be acceptable solution to the problem.
The problem can then be solved by using the rules, in combination with an appropriate control
strategy, to move through the problem space until a path from an initial state to a goal state is
found. This process is known as ‘search’. Thus:
A problem space is represented by a directed graph, where nodes represent search state and paths
represent the operators applied to change the state.
A tree is a graph in which any two vertices are connected by exactly one path. Alternatively, any
connected graph with no cycles is a tree.
15
• Problem solving: The term, Problem Solving relates to analysis in AI. Problem solving may be
characterized as a systematic search through a range of possible actions to reach some predefined
goal or solution. Problem-solving methods are categorized as special purpose and general
purpose.
• A special-purpose method is tailor-made for a particular problem, often exploits very specific
features of the situation in which the problem is embedded.
16
To solve the problem of playing a game, we require the rules of the game and targets for winning
as well as representing positions in the game. The opening position can be defined as the initial
state and a winning position as a goal state. Moves from initial state to other states leading to the
goal state follow legally. However, the rules are far too abundant in most games— especially in
chess, where they exceed the number of particles in the universe. Thus, the rules cannot be
supplied accurately and computer programs cannot handle easily. The storage also presents
another problem but searching can be achieved by hashing.
The number of rules that are used must be minimized and the set can be created by expressing
each rule in a form as possible. The representation of games leads to a state space representation
and it is common for well-organized games with some structure. This representation allows for
the formal definition of a problem that needs the movement from a set of initial positions to one
of a set of target positions. It means that the solution involves using known techniques and a
systematic search. This is quite a common method in Artificial Intelligence.
In this problem, we use two jugs called four and three; four holds a maximum of four gallons of
water and three a maximum of three gallons of water. How can we get two gallons of water in
the four jug?
The state space is a set of prearranged pairs giving the number of gallons of water in the pair of
jugs at any time, i.e., (four, three) where four = 0, 1, 2, 3 or 4 and three = 0, 1, 2 or 3.
The start state is (0, 0) and the goal state is (2, n) where n may be any but it is limited to three
holding from 0 to 3 gallons of water or empty. Three and four shows the name and numerical
number shows the amount of water in jugs for solving the water jug problem. The major
production rules for solving this problem are shown below:
17
The problem solved by using the production rules in combination with an appropriate control
strategy, moving through the problem space until a path from an initial state to a goal state is
found. In this problem solving process, search is the fundamental concept. For simple problems
it is easier to achieve this goal by hand but there will be cases where this is far too difficult.
Production systems provide appropriate structures for performing and describing search
processes. A production system has four basic components as enumerated below.
A set of rules each consisting of a left side that determines the applicability of the
rule and a right side that describes the operation to be performed if the rule is
applied.
A database of current facts established during the process of inference.
18
A control strategy that specifies the order in which the rules will be compared
with facts in the database and also specifies how to resolve conflicts in selection
of several rules or selection of more facts.
A rule firing module.
The production rules operate on the knowledge database. Each rule has a precondition—that is,
either satisfied or not by the knowledge database. If the precondition is satisfied, the rule can be
applied. Application of the rule changes the knowledge database. The control system chooses
which applicable rule should be applied and ceases computation when a termination condition on
the knowledge database is satisfied.
The 8-puzzle is a 3 × 3 array containing eight square pieces, numbered 1 through 8, and
one empty space. A piece can be moved horizontally or vertically into the empty space, in effect
exchanging the positions of the piece and the empty space. There are four possible moves, UP
(move the blank space up), DOWN, LEFT and RIGHT. The aim of the game is to make a
sequence of moves that will convert the board from the start state into the goal state:
This example can be solved by the operator sequence UP, RIGHT, UP, LEFT, DOWN.
The Missionaries and Cannibals problem illustrates the use of state space search for planning
under constraints:
Three missionaries and three cannibals wish to cross a river using a two person boat. If
at any time the cannibals outnumber the missionaries on either side of the river, they will eat the
missionaries. How can a sequence of boat trips be performed that will get everyone to the other
side of the river without any missionaries being eaten?
State representation:
Operators:
19
20
The word ‘search’ refers to the search for a solution in a problem space.
• Search proceeds with different types of ‘search control strategies’.
• A strategy is defined by picking the order in which the nodes expand.
The Search strategies are evaluated along the following dimensions: Completeness, Time
complexity, Space complexity, Optimality (the search- related terms are first explained, and then
the search algorithms and control strategies are illustrated next).
Search-related terms
• Algorithm’s performance and complexity
Ideally we want a common measure so that we can compare approaches in order to select
the most appropriate algorithm for a given situation.
Performance of an algorithm depends on internal and external factors.
• ‘Big - O’ notation
The Big-O, theoretical measure of the execution of an algorithm, usually indicates the time or the
memory needed, given the problem size n, which is usually the number of items.
• Big-O notation
The Big-O notation is used to give an approximation to the run-time- efficiency of an algorithm;
the letter ‘O’ is for order of magnitude of operations or space at run-time.
• The Big-O of an Algorithm A
If an algorithm A requires time proportional to f(n), then algorithm A is said to be
of order f(n), and it is denoted as O(f(n)).
If algorithm A requires time proportional to n2, then the order of the algorithm is
said to be O(n2).
If algorithm A requires time proportional to n, then the order of the algorithm is
said to be O(n).
21
The function f(n) is called the algorithm’s growth-rate function. In other words, if an algorithm
has performance complexity O(n), this means that the run-time t should be directly proportional
to n, ie t • n or t = k n where k is constant of proportionality.
Similarly, for algorithms having performance complexity O(log2(n)), O(log N), O(N log N),
O(2N) and so on.
Example 1:
Determine the Big-O of an algorithm:
Thus, the polynomial (2n + 3) is dominated by the 1st term as n while the number of elements in
the array becomes very large.
• In determining the Big-O, ignore constants such as 2 and 3. So the algorithm is of order n.
• So the Big-O of the algorithm is O(n).
• In other words the run-time of this algorithm increases roughly as the size of the input data n,
e.g., an array of size n.
Tree structure
Tree is a way of organizing objects, related in a hierarchical fashion.
• Tree is a type of data structure in which each element is attached to one or more
elements directly beneath it.
• The connections between elements are called branches.
• Tree is often called inverted trees because it is drawn with the root at the top.
• The elements that have no elements below them are called leaves.
• A binary tree is a special type: each element has only two branches below it.
Properties
• Tree is a special case of a graph.
• The topmost node in a tree is called the root node.
• At root node all operations on the tree begin.
• A node has at most one parent.
• The topmost node (root node) has no parents.
• Each node has zero or more child nodes, which are below it .
• The nodes at the bottommost level of the tree are called leaf nodes.
Since leaf nodes are at the bottom most level, they do not have children.
• A node that has a child is called the child’s parent node.
• The depth of a node n is the length of the path from the root to the node.
• The root node is at depth zero.
22
The Stacks and Queues are data structures that maintain the order of last-in, first-out and first-in,
first-out respectively. Both stacks and queues are often implemented as linked lists, but that is
not the only possible implementation.
Stack - Last In First Out (LIFO) lists
An ordered list; a sequence of items, piled one on top of the other.
The insertions and deletions are made at one end only, called Top.
If Stack S = (a[1], a[2], . . . . a[n]) then a[1] is bottom most element
Any intermediate element (a[i]) is on top of element a[i-1], 1 < i <= n.
In Stack all operation take place on Top.
• An ordered list; a sequence of items; there are restrictions about how items can be added to and
removed from the list. A queue has two ends.
• All insertions (enqueue ) take place at one end, called Rear or Back
• All deletions (dequeue) take place at other end, called Front.
• If Queue has a[n] as rear element then a[i+1] is behind a[i] , 1 < i <= n.
• All operation takes place at one end of queue or the other.
Search is the systematic examination of states to find path from the start / root state to the goal
state.
• Search deals with finding nodes having certain properties in a graph that represents search
space.
• Search methods explore the search space ‘intelligently’, evaluating possibilities without
investigating every single possibility.
Examples:
• For a Robot this might consist of PICKUP, PUTDOWN, MOVEFORWARD, MOVEBACK,
MOVELEFT, and MOVERIGHT—until the goal is reached.
• Puzzles and Games have explicit rules: e.g., the ‘Tower of Hanoi’ puzzle
23
This puzzle involves a set of rings of different sizes that can be placed on three different pegs.
• The puzzle starts with the rings arranged as shown in Figure 2.4(a)
• The goal of this puzzle is to move them all as to Figure 2.4(b)
• Condition: Only the top ring on a peg can be moved, and it may only be placed on a smaller
ring, or on an empty peg.
• State space
A state space is the set of all states reachable from the initial state.
A state space forms a graph (or map) in which the nodes are states and the arcs between
nodes are actions.
In a state space, a path is a sequence of states connected by a sequence of actions.
The solution of a problem is part of the map formed by the state space.
24
• Problem description
A problem consists of the description of:
The current state of the world,
The actions that can transform one state of the world into another,
The desired state of the world.
The following action one taken to describe the problem:
State space is defined explicitly or implicitly
A state space should describe everything that is needed to solve a problem and nothing that is
not needed to solve the problem.
Initial state is start state
Goal state is the conditions it has to fulfill.
The description by a desired state may be complete or partial.
Operators are to change state
Operators do actions that can transform one state into another;
Operators consist of: Preconditions and Instructions;
Preconditions provide partial description of the state of the world that must be true in order to
perform the action, and
Instructions tell the user how to create the next state.
Operators should be as general as possible, so as to reduce their number.
Elements of the domain has relevance to the problem
Knowledge of the starting point.
Problem solving is finding a solution
Find an ordered sequence of operators that transform the current (start) state
into a goal state.
25
PROBLEM CHARACTERISTICS
Heuristics cannot be generalized, as they are domain specific. Production systems provide ideal
techniques for representing such heuristics in the form of IF-THEN rules. Most problems
requiring simulation of intelligence use heuristic search extensively. Some heuristics are used to
define the control structure that guides the search process, as seen in the example described
above. But heuristics can also be encoded in the rules to represent the domain knowledge. Since
most AI problems make use of knowledge and guided search through the knowledge, AI can be
described as the study of techniques for solving exponentially hard problems in polynomial time
by exploiting knowledge about problem domain.
To use the heuristic search for problem solving, we suggest analysis of the problem for the
following considerations:
Decomposability of the problem into a set of independent smaller subproblems
Possibility of undoing solution steps, if they are found to be unwise
Predictability of the problem universe
Possibility of obtaining an obvious solution to a problem without comparison of all other
possible solutions
Type of the solution: whether it is a state or a path to the goal state
Role of knowledge in problem solving
Nature of solution process: with or without interacting with the user
Problem Decomposition
26
This problem can be solved by breaking it into smaller problems, each of which we can solve by
using a small collection of specific rules. Using this technique of problem decomposition, we can
solve very large problems very easily. This can be considered as an intelligent behaviour.
Suppose we are trying to prove a mathematical theorem: first we proceed considering that
proving a lemma will be useful. Later we realize that it is not at all useful. We start with another
one to prove the theorem. Here we simply ignore the first method.
Consider the 8-puzzle problem to solve: we make a wrong move and realize that mistake. But
here, the control strategy must keep track of all the moves, so that we can backtrack to the initial
state and start with some new move.
Consider the problem of playing chess. Here, once we make a move we never recover from that
step. These problems are illustrated in the three important classes of problems mentioned below:
1. Ignorable, in which solution steps can be ignored. Eg: Theorem Proving
2. Recoverable, in which solution steps can be undone. Eg: 8-Puzzle
3. Irrecoverable, in which solution steps cannot be undone. Eg: Chess
Consider the 8-Puzzle problem. Every time we make a move, we know exactly what will happen.
This means that it is possible to plan an entire sequence of moves and be confident what the
resulting state will be. We can backtrack to earlier moves if they prove unwise.
Suppose we want to play Bridge. We need to plan before the first play, but we cannot play with
certainty. So, the outcome of this game is very uncertain. In case of 8-Puzzle, the outcome is
very certain. To solve uncertain outcome problems, we follow the process of plan revision as the
plan is carried out and the necessary feedback is provided. The disadvantage is that the planning
in this case is often very expensive.
27
We are interested to answer the question; it does not matter which path we follow. If we follow
one path successfully to the correct answer, then there is no reason to go back and check another
path to lead the solution.
28
Is there any relationship between classes of production systems and classes of problems?
For any solvable problems, there exist an infinite number of production systems that show how
to find solutions. Any problem that can be solved by any production system can be solved by a
commutative one, but the commutative one is practically useless. It may use individual states to
represent entire sequences of applications of rules of a simpler, non-commutative system. In the
formal sense, there is no relationship between kinds of problems and kinds of production systems
Since all problems can be solved by all kinds of systems. But in the practical sense, there is
definitely such a relationship between the kinds of problems and the kinds of systems that lend
themselves to describing those problems.
Partially commutative, monotonic productions systems are useful for solving ignorable
problems. These are important from an implementation point of view without the ability to
backtrack to previous states when it is discovered that an incorrect path has been followed. Both
types of partially commutative production systems are significant from an implementation point;
they tend to lead to many duplications of individual states during the search process. Production
systems that are not partially commutative are useful for many problems in which permanent
changes occur.
Finally in terms of data structures, considering the water jug as a typical problem do we use a
graph or tree? The breadth-first structure does take note of all nodes generated but the depth-first
one can be modified.
29
1. Observe all nodes that are already generated, if a new node is present.
2. If it exists add it to the graph.
3. If it already exists, then
a. Set the node that is being expanded to the point to the already existing node
corresponding to its successor rather than to the new one. The new one can be thrown
away.
b. If the best or shortest path is being determined, check to see if this path is better or
worse than the old one. If worse, do nothing.
Better save the new path and work the change in length through the chain of successor nodes if
necessary.
Example: Tic-Tac-Toe
State spaces are good representations for board games such as Tic-Tac-Toe. The position of a
game can be explained by the contents of the board and the player whose turn is next. The board
can be represented as an array of 9 cells, each of which may contain an X or O or be empty.
• State:
Player to move next: X or O.
Board configuration:
Search Tree
The sequence of states formed by possible moves is called a search tree. Each level of the tree is
called a ply.
Since the same state may be reachable by different sequences of moves, the state space may in
general be a graph. It may be treated as a tree for simplicity, at the cost of duplicating states.
30
31
• Heuristic algorithms are not really intelligent; they appear to be intelligent because they
achieve better performance.
• Heuristic algorithms are more efficient because they take advantage of feedback from the data
to direct the search path.
• Uninformed search algorithms or Brute-force algorithms, search through the search space all
possible candidates for the solution checking whether each candidate satisfies the problem’s
statement.
• Informed search algorithms use heuristic functions that are specific to the problem, apply
them to guide the search through the search space to try to reduce the amount of time spent in
searching.
A good heuristic will make an informed search dramatically outperform any uninformed search:
for example, the Traveling Salesman Problem (TSP), where the goal is to find is a good solution
instead of finding the best solution.
In such problems, the search proceeds using current information about the problem to predict
which path is closer to the goal and follow it, although it does not always guarantee to find the
best possible solution. Such techniques help in finding a solution within reasonable time and
space (memory). Some prominent intelligent search algorithms are stated below:
1. Generate and Test Search
2. Best-first Search
3. Greedy Search
4. A* Search
5. Constraint Search
6. Means-ends analysis
There are some more algorithms. They are either improvements or combinations of these.
• Hierarchical Representation of Search Algorithms: A Hierarchical representation of most
search algorithms is illustrated below. The representation begins with two types of search:
• Uninformed Search: Also called blind, exhaustive or brute-force search, it uses no
information about the problem to guide the search and therefore may not be very efficient.
• Informed Search: Also called heuristic or intelligent search, this uses information about the
problem to guide the search—usually guesses the distance to a goal state and is therefore
efficient, but the search may not be always possible.
32
The first requirement is that it causes motion, in a game playing program, it moves on the board
and in the water jug problem, filling water is used to fill jugs. It means the control strategies
without the motion will never lead to the solution.
The second requirement is that it is systematic, that is, it corresponds to the need for global
motion as well as for local motion. This is a clear condition that neither would it be rational to
fill a jug and empty it repeatedly, nor it would be worthwhile to move a piece round and round
on the board in a cyclic way in a game. We shall initially consider two systematic approaches for
searching. Searches can be classified by the order in which operators are tried: depth-first,
breadth-first, bounded depth-first.
33
Breadth-first search
A Search strategy, in which the highest layer of a decision tree is searched completely before
proceeding to the next layer is called Breadth-first search (BFS).
• In this strategy, no viable solutions are omitted and therefore it is guaranteed that an optimal
solution is found.
• This strategy is often not feasible when the search space is large.
Algorithm
1. Create a variable called LIST and set it to be the starting state.
2. Loop until a goal state is found or LIST is empty, Do
a. Remove the first element from the LIST and call it E. If the LIST is empty, quit.
b. For every path each rule can match the state E, Do
(i) Apply the rule to generate a new state.
(ii) If the new state is a goal state, quit and return this state.
(iii) Otherwise, add the new state to the end of LIST.
34
Advantages
1. Guaranteed to find an optimal solution (in terms of shortest number of steps
to reach the goal).
2. Can always find a goal node if one exists (complete).
Disadvantages
1. High storage requirement: exponential with tree depth.
Depth-first search
A search strategy that extends the current path as far as possible before backtracking to the last
choice point and trying the next alternative path is called Depth-first search (DFS).
• This strategy does not guarantee that the optimal solution has been found.
• In this strategy, search reaches a satisfactory solution more rapidly than breadth first, an
advantage when the search space is large.
Algorithm
Depth-first search applies operators to each newly generated state, trying to drive directly toward
the goal.
1. If the starting state is a goal state, quit and return success.
2. Otherwise, do the following until success or failure is signalled:
a. Generate a successor E to the starting state. If there are no more successors, then signal failure.
b. Call Depth-first Search with E as the starting state.
c. If success is returned signal success; otherwise, continue in the loop.
Advantages
1. Low storage requirement: linear with tree depth.
2. Easily programmed: function call stack does most of the work of maintaining state of the
search.
Disadvantages
1. May find a sub-optimal solution (one that is deeper or more costly than the best solution).
2. Incomplete: without a depth bound, may not find a solution even if one exists.
2.4.2.3 Bounded depth-first search
Depth-first search can spend much time (perhaps infinite time) exploring a very deep path that
does not contain a solution, when a shallow solution exists. An easy way to solve this problem is
to put a maximum depth bound on the search. Beyond the depth bound, a failure is generated
automatically without exploring any deeper.
Problems:
1. It’s hard to guess how deep the solution lies.
2. If the estimated depth is too deep (even by 1) the computer time used is dramatically
increased, by a factor of bextra.
3. If the estimated depth is too shallow, the search fails to find a solution; all that computer time
is wasted.
Heuristics
A heuristic is a method that improves the efficiency of the search process. These are like tour
guides. There are good to the level that they may neglect the points in general interesting
directions; they are bad to the level that they may neglect points of interest to particular
individuals. Some heuristics help in the search process without sacrificing any claims to entirety
that the process might previously had. Others may occasionally cause an excellent path to be
overlooked. By sacrificing entirety it increases efficiency. Heuristics may not find the best
35
solution every time but guarantee that they find a good solution in a reasonable time. These are
particularly useful in solving tough and complex problems, solutions of which would require
infinite time, i.e. far longer than a lifetime for the problems which are not solved in any other
way.
Heuristic search
To find a solution in proper time rather than a complete solution in unlimited time we use
heuristics. ‘A heuristic function is a function that maps from problem state descriptions to
measures of desirability, usually represented as numbers’. Heuristic search methods use
knowledge about the problem domain and choose promising operators first. These heuristic
search methods use heuristic functions to evaluate the next state towards the goal state. For
finding a solution, by using the heuristic technique, one should carry out the following steps:
1. Add domain—specific information to select what is the best path to continue searching along.
2. Define a heuristic function h(n) that estimates the ‘goodness’ of a node n.
Specifically, h(n) = estimated cost(or distance) of minimal cost path from n to a goal state.
3. The term, heuristic means ‘serving to aid discovery’ and is an estimate, based on domain
specific information that is computable from the current state description of how close we are to
a goal.
Finding a route from one city to another city is an example of a search problem in which
different search orders and the use of heuristic knowledge are easily understood.
1. State: The current city in which the traveller is located.
2. Operators: Roads linking the current city to other cities.
3. Cost Metric: The cost of taking a given road between cities.
4. Heuristic information: The search could be guided by the direction of the goal city from the
current city, or we could use airline distance as an estimate of the distance to the goal.
Heuristic search techniques
For complex problems, the traditional algorithms, presented above, are unable to find the
solution within some practical time and space limits. Consequently, many special techniques are
developed, using heuristic functions.
• Blind search is not always possible, because it requires too much time or Space (memory).
36
Comparison of Algorithms
No knowledge about how far a node Guides search process toward goal
node from goal state
Prefers states (nodes) that lead
close to and not away from goal
state
Suppose there are N cities, then a solution would be to take N! possible combinations to find the
shortest distance to decide the required route. This is not efficient as with N=10 there are
36,28,800 possible routes. This is an example of combinatorial explosion.
There are better methods for the solution of such problems: one is called branch and bound.
First, generate all the complete paths and find the distance of the first complete path. If the next
path is shorter, then save it and proceed this way avoiding the path when its length exceeds the
saved shortest path length, although it is better than the previous method.
Generate-And-Test Algorithm
Generate-and-test search algorithm is a very simple algorithm that guarantees to find a solution if
done systematically and there exists a solution.
Algorithm: Generate-And-Test
1.Generate a possible solution.
2.Test to see if this is the expected solution.
3.If the solution has been found quit else go to step 1.
Potential solutions that need to be generated vary depending on the kinds of problems. For some
problems the possible solutions may be particular points in the problem space and for some
problems, paths from the start state.
37
38
Hill Climbing
Hill Climbing is heuristic search used for mathematical optimization problems in the field of
Artificial Intelligence .
Given a large set of inputs and a good heuristic function, it tries to find a sufficiently good
solution to the problem. This solution may not be the global optimal maximum.
In the above definition, mathematical optimization problems implies that hill climbing
solves the problems where we need to maximize or minimize a given real function by
choosing values from the given inputs. Example-Travelling salesman problem where we
need to minimize the distance traveled by salesman.
‘Heuristic search’ means that this search algorithm may not find the optimal solution to
the problem. However, it will give a good solution in reasonable time.
A heuristic function is a function that will rank all the possible alternatives at any
branching step in search algorithm based on the available information. It helps the
algorithm to select the best route out of possible routes.
Features of Hill Climbing
1. Variant of generate and test algorithm : It is a variant of generate and test algorithm. The
generate and test algorithm is as follows :
1. Simple Hill climbing : It examines the neighboring nodes one by one and selects the first
neighboring node which optimizes the current cost as next node.
Algorithm for Simple Hill climbing :
Step 1 : Evaluate the initial state. If it is a goal state then stop and return success. Otherwise,
make initial state as current state.
Step 2 : Loop until the solution state is found or there are no new operators present which can be
applied to current state.
a) Select a state that has not been yet applied to the current state and apply it to produce a new
state.
b) Perform these to evaluate new state
i. If the current state is a goal state, then stop and return success.
ii. If it is better than the current state, then make it current state and proceed further.
iii. If it is not better than the current state, then continue in the loop until a solution is found.
Step 3 : Exit.
39
2. Steepest-Ascent Hill climbing : It first examines all the neighboring nodes and then
selects the node closest to the solution state as next node.
Step 1 : Evaluate the initial state. If it is goal state then exit else make the current state as initial
state
Step 2 : Repeat these steps until a solution is found or current state does not change
i. Let ‘target’ be a state such that any successor of the current state will be better than it;
ii. for each operator that applies to the current state
a. apply the new operator and create a new state
b. evaluate the new state
c. if this state is goal state then quit else compare with ‘target’
d. if this state is better than ‘target’, set this state as ‘target’
e. if target is better than current state set current state to Target
Step 3 : Exit
3. Stochastic hill climbing : It does not examine all the neighboring nodes before deciding
which node to select .It just selects a neighboring node at random, and decides (based on
the amount of improvement in that neighbor) whether to move to that neighbor or to
examine another.
State Space diagram for Hill Climbing
State space diagram is a graphical representation of the set of states our search algorithm can
reach vs the value of our objective function(the function which we wish to maximize).
X-axis : denotes the state space ie states or configuration our algorithm may reach.
Y-axis : denotes the values of objective function corresponding to to a particular state.
The best solution will be that state space where objective function has maximum value(global
maximum).
40
2. Global maximum : It is the best possible state in the state space diagram. This because at
this state, objective function has highest value.
3. Plateua/flat local maximum : It is a flat region of state space where neighboring states
have the same value.
4. Ridge : It is region which is higher than its neighbours but itself has a slope. It is a special
kind of local maximum.
5. Current state : The region of state space diagram where we are currently present during
the search.
6. Shoulder : It is a plateau that has an uphill edge.
Problems in different regions in Hill climbing
Hill climbing cannot reach the optimal/best state(global maximum) if it enters any of the
following regions :
1. Local maximum : At a local maximum all neighboring states have a values which is
worse than than the current state. Since hill climbing uses greedy approach, it will not
move to the worse state and terminate itself. The process will end even though a better
solution may exist.
To overcome local maximum problem : Utilize backtracking technique. Maintain a list of
visited states. If the search reaches an undesirable state, it can backtrack to the previous
configuration and explore a new path.
2. Plateau : On plateau all neighbors have same value . Hence, it is not possible to select the
best direction.
To overcome plateaus : Make a big jump. Randomly select a state far away from current state.
Chances are that we will land at a non-plateau region
3. Ridge : Any point on a ridge can look like peak because movement in all possible
directions is downward. Hence the algorithm stops when it reaches this state.
To overcome Ridge : In this kind of obstacle, use two or more rules before testing. It
implies moving in several directions at once.
In BFS and DFS, when we are at a node, we can consider any of the adjacent as next
node. So both BFS and DFS blindly explore paths without considering any cost function. The
idea of Best First Search is to use an evaluation function to decide which adjacent is most
promising and then explore. Best First Search falls under the category of Heuristic Search or
Informed Search.
We use a priority queue to store costs of nodes. So the implementation is a variation of BFS, we
just need to change Queue to PriorityQueue.
Algorithm:
Best-First-Search(Grah g, Node start)
1) Create an empty PriorityQueue
PriorityQueue pq;
2) Insert "start" in pq.
pq.insert(start)
3) Until PriorityQueue is empty
u = PriorityQueue.DeleteMin
41
If u is the goal
Exit
Else
Foreach neighbor v of u
If v "Unvisited"
Mark v "Visited"
pq.insert(v)
Mark v "Examined"
End procedure
Let us consider below example.
pq initially contains S
We remove s from and process unvisited
neighbors of S to pq.
pq now contains {A, C, B} (C is put
before B because C has lesser cost)
42
A* Search Algorithm
A* is a type of search algorithm. Some problems can be solved by representing the world in the
initial state, and then for each action we can perform on the world we generate states for what the
world would be like if we did so. If you do this until the world is in the state that we specified as
a solution, then the route from the start to this goal state is the solution to your problem.
In this tutorial I will look at the use of state space search to find the shortest path between two
points (pathfinding), and also to solve a simple sliding tile puzzle (the 8-puzzle). Let's look at
some of the terms used in Artificial Intelligence when describing this state space search.
Some terminology
A node is a state that the problem's world can be in. In pathfinding a node would be just a 2d
coordinate of where we are at the present time. In the 8-puzzle it is the positions of all the tiles.
Next all the nodes are arranged in a graph where links between nodes represent valid steps in
solving the problem. These links are known as edges. In the 8-puzzle diagram the edges are
shown as blue lines. See figure 1 below.
State space search, then, is solving a problem by beginning with the start state, and then for each
node we expand all the nodes beneath it in the graph by applying all the possible moves that can
be made at each point.
At this point we introduce an important concept, the heuristic. This is like an algorithm, but with
a key difference. An algorithm is a set of steps which you can follow to solve a problem, which
always works for valid input. For example you could probably write an algorithm yourself for
43
multiplying two numbers together on paper. A heuristic is not guaranteed to work but is useful in
that it may solve a problem for which there is no algorithm.
We need a heuristic to help us cut down on this huge search problem. What we need is to use our
heuristic at each node to make an estimate of how far we are from the goal. In pathfinding we
know exactly how far we are, because we know how far we can move each step, and we can
calculate the exact distance to the goal.
But the 8-puzzle is more difficult. There is no known algorithm for calculating from a given
position how many moves it will take to get to the goal state. So various heuristics have been
devised. The best one that I know of is known as the Nilsson score which leads fairly directly to
the goal most of the time, as we shall see.
Cost
When looking at each node in the graph, we now have an idea of a heuristic, which can estimate
how close the state is to the goal. Another important consideration is the cost of getting to where
we are. In the case of pathfinding we often assign a movement cost to each square. The cost is
the same then the cost of each square is one. If we wanted to differentiate between terrain types
we may give higher costs to grass and mud than to newly made road. When looking at a node we
want to add up the cost of what it took to get here, and this is simply the sum of the cost of this
node and all those that are above it in the graph.
8 Puzzle
Let's look at the 8 puzzle in more detail. This is a simple sliding tile puzzle on a 3*3 grid where
one tile is missing and you can move the other tiles into the gap until you get the puzzle into the
goal position. See figure 1.
There are 362,880 different states that the puzzle can be in, and to find a solution the search has
to find a route through them. From most positions of the search the number of edges (that's the
44
blue lines) is two. That means that the number of nodes you have in each level of the search is
2^d where d is the depth. If the number of steps to solve a particular state is 18, then that�s
262,144 nodes just at that level.
The 8 puzzle game state is as simple as representing a list of the 9 squares and what's in them.
Here are two states for example; the last one is the GOAL state, at which point we've found the
solution. The first is a jumbled up example that you may start from.
Pathfinding
In a video game, or some other pathfinding scenario, you want to search a state space and find
out how to get from somewhere you are to somewhere you want to be, without bumping into
walls or going too far. For reasons we will see later, the A* algorithm will not only find a path, if
there is one, but it will find the shortest path. A state in pathfinding is simply a position in the
world. In the example of a maze game like Pacman you can represent where everything is using
a simple 2d grid. The start state for a ghost say, would be the 2d coordinate of where the ghost is
at the start of the search. The goal state would be where pacman is so we can go and eat him.
There is also example code to do pathfinding on the github site.
45
Implementing A*
We are now ready to look at the operation of the A* algorithm. What we need to do is start with
the goal state and then generate the graph downwards from there. Let's take the 8-puzzle in
figure 1. We ask how many moves can we make from the start state? The answer is 2, there are
two directions we can move the blank tile, and so our graph expands.
If we were just to continue blindly generating successors to each node, we could potentially fill
the computer's memory before we found the goal node. Obviously we need to remember the best
nodes and search those first. We also need to remember the nodes that we have expanded
already, so that we don't expand the same state repeatedly.
Let's start with the OPEN list. This is where we will remember which nodes we haven't yet
expanded. When the algorithm begins the start state is placed on the open list, it is the only state
we know about and we have not expanded it. So we will expand the nodes from the start and put
those on the OPEN list too. Now we are done with the start node and we will put that on the
CLOSED list. The CLOSED list is a list of nodes that we have expanded.
f=g+h
Using the OPEN and CLOSED list lets us be more selective about what we look at next in the
search. We want to look at the best nodes first. We will give each node a score on how good we
think it is. This score should be thought of as the cost of getting from the node to the goal plus
the cost of getting to where we are. Traditionally this has been represented by the letters f, g and
h. 'g' is the sum of all the costs it took to get here, 'h' is our heuristic function, the estimate of
what it will take to get to the goal. 'f' is the sum of these two. We will store each of these in our
nodes.
Using the f, g and h values the A* algorithm will be directed, subject to conditions we will look
at further on, towards the goal and will find it in the shortest route possible.
So far we have looked at the components of the A*, let's see how they all fit together to make the
algorithm :
Pseudocode
Hopefully the ideas we looked at in the preceding paragraphs will now click into place as we
look at the A* algorithm pseudocode. You may find it helpful to print this out or leave the
window open while we discuss it.
To help make the operation of the algorithm clear we will look again at the 8-puzzle problem in
figure 1 above. Figure 3 below shows the f,g and h scores for each of the tiles.
46
First of all look at the g score for each node. This is the cost of what it took to get from the start
to that node. So in the picture the center number is g. As you can see it increases by one at each
level. In some problems the cost may vary for different state changes. For example in
pathfinding there is sometimes a type of terrain that costs more than other types.
Next look at the last number in each triple. This is h, the heuristic score. As I mentioned above I
am using a heuristic known as Nilsson's Sequence, which converges quickly to a correct solution
in many cases. Here is how you calculate this score for a given 8-puzzle state :
Advantages:
It is the best one from other techniques. It is used to solve very complex problems.
It is optimally efficient, i.e. there is no other optimal algorithm guaranteed to expand fewer nodes
than A*.
Disadvantages:
This algorithm is complete if the branching factor is finite and every action has fixed cost.
The speed execution of A* search is highly dependant on the accuracy of the heuristic algorithm
that is used to compute h (n).
47
The Depth first search and Breadth first search given earlier for OR trees or graphs can be easily
adopted by AND-OR graph. The main difference lies in the way termination conditions are
determined, since all goals following an AND nodes must be realized; where as a single goal
node following an OR node will do. So for this purpose we are using AO* algorithm.
Like A* algorithm here we will use two arrays and one heuristic function.
OPEN:
It contains the nodes that has been traversed but yet not been marked solvable or unsolvable.
CLOSE:
Algorithm:
Step 3: Select a node n that is both on OPEN and a member of T0. Remove it from OPEN and
place it in
CLOSE
Step 4: If n is the terminal goal node then leveled n as solved and leveled all the ancestors of n
as solved. If the starting node is marked as solved then success and exit.
Step 5: If n is not a solvable node, then mark n as unsolvable. If starting node is marked as
unsolvable, then return failure and exit.
Step 6: Expand n. Find all its successors and find their h (n) value, push them into OPEN.
Step 8: Exit.
48
Implementation:
Step 1:
In the above graph, the solvable nodes are A, B, C, D, E, F and the unsolvable nodes are G, H.
Take A as the starting node. So place A into OPEN.
49
50
51
Advantages:
It is an optimal algorithm.
If traverse according to the ordering of nodes. It can be used for both OR and AND graph.
Disadvantages:
Sometimes for unsolvable nodes, it can’t find the optimal path. Its complexity is than other
algorithms.
PROBLEM REDUCTION
When a problem can be divided into a set of sub problems, where each sub problem can be
solved separately and a combination of these will be a solution, AND-OR graphs or AND - OR
trees are used for representing the solution. The decomposition of the problem or problem
reduction generates AND arcs. One AND are may point to any number of successor nodes. All
52
these must be solved so that the arc will rise to many arcs, indicating several possible solutions.
Hence the graph is known as AND - OR instead of AND. Figure shows an AND - OR graph.
An algorithm to find a solution in an AND - OR graph must handle AND area appropriately. A*
algorithm can not search AND - OR graphs efficiently. This can be understand from the give
figure.
In figure (a) the top node A has been expanded producing two area one leading to B and leading
to C-D . the numbers at each node represent the value of f ' at that node (cost of getting to the
goal state from current state). For simplicity, it is assumed that every operation(i.e. applying a
rule) has unit cost, i.e., each are with single successor will have a cost of 1 and each of its
components. With the available information till now , it appears that C is the most promising
node to expand since its f ' = 3 , the lowest but going through B would be better since to use C
we must also use D' and the cost would be 9(3+4+1+1). Through B it would be 6(5+1).
Thus the choice of the next node to expand depends not only n a value but also on whether that
node is part of the current best path form the initial mode. Figure (b) makes this clearer. In figure
the node G appears to be the most promising node, with the least f ' value. But G is not on the
current beat path, since to use G we must use GH with a cost of 9 and again this demands that
arcs be used (with a cost of 27). The path from A through B, E-F is better with a total cost of
(17+1=18). Thus we can see that to search an AND-OR graph, the following three things must
be done.
1. traverse the graph starting at the initial node and following the current best path, and
accumulate the set of nodes that are on the path and have not yet been expanded.
2. Pick one of these unexpanded nodes and expand it. Add its successors to the graph and
computer f ' (cost of the remaining distance) for each of them.
53
3. Change the f ' estimate of the newly expanded node to reflect the new information produced
by its successors. Propagate this change backward through the graph. Decide which of the
current best path.
The propagation of revised cost estimation backward is in the tree is not necessary in A*
algorithm. This is because in AO* algorithm expanded nodes are re-examined so that the current
best path can be selected. The working of AO* algorithm is illustrated in figure as follows:
Referring the figure. The initial node is expanded and D is Marked initially as promising node. D
is expanded producing an AND arc E-F. f ' value of D is updated to 10. Going backwards we can
see that the AND arc B-C is better . it is now marked as current best path. B and C have to be
expanded next. This process continues until a solution is found or all paths have led to dead ends,
indicating that there is no solution. An A* algorithm the path from one node to the other is
always that of the lowest cost and it is independent of the paths through other nodes.
The algorithm for performing a heuristic search of an AND - OR graph is given below. Unlike
A* algorithm which used two lists OPEN and CLOSED, the AO* algorithm uses a single
structure G. G represents the part of the search graph generated so far. Each node in G points
down to its immediate successors and up to its immediate predecessors, and also has with it the
value of h' cost of a path from itself to a set of solution nodes. The cost of getting from the start
nodes to the current node "g" is not stored as in the A* algorithm. This is because it is not
possible to compute a single such value since there may be many paths to the same state. In AO*
algorithm serves as the estimate of goodness of a node. Also a there should value called
FUTILITY is used. The estimated cost of a solution is greater than FUTILITY then the search is
abandoned as too expansive to be practical.
For representing above graphs AO* algorithm is as follows
AO* ALGORITHM:
1. Let G consists only to the node representing the initial state call this node INTT. Compute
h' (INIT).
2. Until INIT is labeled SOLVED or hi (INIT) becomes greater than FUTILITY, repeat the
following procedure.
54
(I) Trace the marked arcs from INIT and select an unbounded node NODE.
(II) Generate the successors of NODE . if there are no successors then assign FUTILITY as
h' (NODE). This means that NODE is not solvable. If there are successors then for each
one
called SUCCESSOR, that is not also an ancester of NODE do the following
(b) if successor is not a terminal node, mark it solved and assign zero to its h ' value.
(III) propagate the newly discovered information up the graph by doing the following . let S be a
set of nodes that have been marked SOLVED. Initialize S to NODE. Until S is empty
repeat
the following procedure;
(b) compute h' of each of the arcs emerging from CURRENT , Assign minimum h' to
CURRENT.
(c) Mark the minimum cost path a s the best out of CURRENT.
(d) Mark CURRENT SOLVED if all of the nodes connected to it through the new marked
are have been labeled SOLVED.
(e) If CURRENT has been marked SOLVED or its h ' has just changed, its new status
must
be propagate backwards up the graph . hence all the ancestors of CURRENT are added
to S.
(Refered From Artificial Intelligence TMH)
2. Using the search tree, compute the most promising solution tree TP .
3. Select node n that is both on open and a part of tp, remove n from open and place it no closed.
4. If n is a goal node, label n as solved. If the start node is solved, exit with success where tp is
the solution tree, remove all nodes from open with a solved ancestor.
55
5. If n is not solvable node, label n as unsolvable. If the start node is labeled as unsolvable, exit
with failure. Remove all nodes from open ,with unsolvable ancestors.
6. Otherwise, expand node n generating all of its successor compute the cost of for each newly
generated node and place all such nodes on open.
7. Go back to step(2)
CONSTRAINT SATISFACTION:-
Many problems in AI can be considered as problems of constraint satisfaction, in which the goal
state satisfies a given set of constraint. constraint satisfaction problems can be solved by using
any of the search strategies. The general form of the constraint satisfaction procedure is as
follows:
Until a complete solution is found or until all paths have led to lead ends, do
2. Apply the constraint inference rules to the selected node to generate all possible new
constraints.
3. If the set of constraints contains a contradiction, then report that this path is a dead end.
5. If neither a constraint nor a complete solution has been found then apply the rules to generate
new partial solutions. Insert these partial solutions into the search graph.
SEND
+ MORE
----------
MONEY
----------
Assign decimal digit to each of the letters in such a way that the answer to the problem is correct
to the same letter occurs more than once , it must be assign the same digit each time . no two
different letters may be assigned the same digit. Consider the crypt arithmetic problem.
56
SEND
+ MORE
-----------
MONEY
-----------
CONSTRAINTS:-
2. Assumption can be made at various levels such that they do not contradict each other.
3. The problem can be decomposed into secured constraints. A constraint satisfaction approach
may be used.
Goal State: the digits to the letters must be assigned in such a manner so that the sum is satisfied.
Solution Process:
1. initial guess m=1 because the sum of two single digits can generate at most a carry '1'.
57
2. When n=1 o=0 or 1 because the largest single digit number added to m=1 can generate the
sum of either 0 or 1 depend on the carry received from the carry sum. By this we conclude that
o=0 because m is already 1 hence we cannot assign same digit another letter(rule no.)
3. We have m=1 and o=0 to get o=0 we have s=8 or 9, again depending on the carry received
from the earlier sum.
The same process can be repeated further. The problem has to be composed into various
constraints. And each constraints is to be satisfied by guessing the possible digits that the letters
can be assumed that the initial guess has been already made . rest of the process is being shown
in the form of a tree, using depth-first search for the clear understandability of the solution
process.
58
D>6(Controduction)
59
60
Most of the search strategies either reason forward of backward however, often a mixture o the
two directions is appropriate. Such mixed strategy would make it possible to solve the major
parts of problem first and solve the smaller problems the arise when combining them together.
Such a technique is called "Means - Ends Analysis".
The means -ends analysis process centers around finding the difference between current state and
goal state. The problem space of means - ends analysis has an initial state and one or more goal
state, a set of operate with a set of preconditions their application and difference functions that
computes the difference between two state a(i) and s(j). A problem is solved using means - ends
analysis by
1. Computing the current state s1 to a goal state s2 and computing their difference D12.
2. Satisfy the preconditions for some recommended operator op is selected, then to reduce the
difference D12.
3. The operator OP is applied if possible. If not the current state is solved a goal is created and
means- ends analysis is applied recursively to reduce the sub goal.
4. If the sub goal is solved state is restored and work resumed on the original problem.
( the first AI program to use means - ends analysis was the GPS General problem solver)
means- ends analysis I useful for many human planning activities. Consider the example of
planing for an office worker. Suppose we have a different table of three rules:
1. If in out current state we are hungry , and in our goal state we are not hungry , then either the
"visit hotel" or "visit Canteen " operator is recommended.
2. If our current state we do not have money , and if in your goal state we have money, then the
"Visit our bank" operator or the "Visit secretary" operator is recommended.
3. If our current state we do not know where something is , need in our goal state we do know,
then either the "visit office enquiry" , "visit secretary" or "visit co worker " operator is
recommended.
61
KNOWLEDGE REPRESENTATION
KNOWLEDGE REPRESENTATION:-
For the purpose of solving complex problems c\encountered in AI, we need both a large amount
of knowledge and some mechanism for manipulating that knowledge to create solutions to new
problems. A variety of ways of representing knowledge (facts) have been exploited in AI
programs. In all variety of knowledge representations , we deal with two kinds of entities.
A. Facts: Truths in some relevant world. These are the things we want to represent.
One way to think of structuring these entities is at two levels : (a) the knowledge level, at which
facts are described, and (b) the symbol level, at which representations of objects at the
knowledge level are defined in terms of symbols that can be manipulated by programs.
The facts and representations are linked with two-way mappings. This link is called
representation mappings. The forward representation mapping maps from facts to
representations. The backward representation mapping goes the other way, from representations
to facts.
62
63
64
Given the facts, it is not possible to answer a simple question such as: “Who is the
heaviest player?”
Also, But if a procedure for finding the heaviest player is provided, then these facts will
enable that procedure to compute an answer.
Moreover, We can ask things like who “bats – left” and “throws – right”.
Inheritable Knowledge
Here the knowledge elements inherit attributes from their parents.
The knowledge embodied in the design hierarchies found in the functional, physical and
process domains.
Within the hierarchy, elements inherit attributes from their parents, but in many cases, not
all attributes of the parent elements prescribed to the child elements.
Also, The inheritance is a powerful form of inference, but not adequate.
Moreover, The basic KR (Knowledge Representation) needs to augment with inference
mechanism.
Property inheritance: The objects or elements of specific classes inherit attributes and
values from more general classes.
So, The classes organized in a generalized hierarchy.
65
66
Procedural Knowledge
A representation in which the control information, to use the knowledge, embedded in the
knowledge itself. For example, computer programs, directions, and recipes; these indicate
specific use or implementation;
Moreover, Knowledge encoded in some procedures, small programs that know how to do
specific things, how to proceed.
Advantages:
Heuristic or domain-specific knowledge can represent.
Moreover, Extended logical inferences, such as default reasoning facilitated.
Also, Side effects of actions may model. Some rules may become false in time.
Keeping track of this in large systems may be tricky.
Disadvantages:
Completeness — not all cases may represent.
Consistency — not all deductions may be correct. e.g If we know that Fred is a
bird we might deduce that Fred can fly. Later we might discover that Fred is an
emu.
Modularity sacrificed. Changes in knowledge base might have far-reaching
effects.
Cumbersome control information.
67
Functions, which are a subset of relations where there is only one “value” for any given
“input”
First-order Predicate logic (FOPL) provides
Constants: a, b, dog33. Name a specific object.
Variables: X, Y. Refer to an object without naming it.
Functions: Mapping from objects to objects.
Terms: Refer to objects
Atomic Sentences: in(dad-of(X), food6) Can be true or false, Correspond to propositional
symbols P, Q.
A well-formed formula (wff) is a sentence containing no “free” variables. So, That is, all
variables are “bound” by universal or existential quantifiers.
(∀x)P(x, y) has x bound as a universally quantified variable, but y is free.
Quantifiers
Universal quantification
(∀x)P(x) means that P holds for all values of x in the domain associated with that variable
E.g., (∀x) dolphin(x) → mammal(x)
Existential quantification
(∃ x)P(x) means that P holds for some value of x in the domain associated with that
variable
E.g., (∃ x) mammal(x) ∧ lays-eggs(x)
Also, Consider the following example that shows the use of predicate logic as a way of
representing knowledge.
1. Marcus was a man.
2. Marcus was a Pompeian.
3. All Pompeians were Romans.
4. Caesar was a ruler.
5. Also, All Pompeians were either loyal to Caesar or hated him.
6. Everyone is loyal to someone.
7. People only try to assassinate rulers they are not loyal to.
8. Marcus tried to assassinate Caesar.
The facts described by these sentences can be represented as a set of well-formed formulas (wffs)
as follows:
1. Marcus was a man.
man(Marcus)
2. Marcus was a Pompeian.
Pompeian(Marcus)
3. All Pompeians were Romans.
∀x: Pompeian(x) → Roman(x)
4. Caesar was a ruler.
ruler(Caesar)
5. All Pompeians were either loyal to Caesar or hated him.
inclusive-or
∀x: Roman(x) → loyalto(x, Caesar) ∨ hate(x, Caesar)
exclusive-or
∀x: Roman(x) → (loyalto(x, Caesar) ∧¬ hate(x, Caesar)) ∨
(¬loyalto(x, Caesar) ∧ hate(x, Caesar))
68
69
Specific attributes instance and isa play an important role particularly in a useful form of
reasoning called property inheritance.
The predicates instance and isa explicitly captured the relationships they used to express,
namely class membership and class inclusion.
4.2 shows the first five sentences of the last section represented in logic in three different
ways.
The first part of the figure contains the representations we have already discussed. In
these representations, class membership represented with unary predicates (such as
Roman), each of which corresponds to a class.
Asserting that P(x) is true is equivalent to asserting that x is an instance (or element) of P.
The second part of the figure contains representations that use the instance predicate
explicitly.
70
71
Resolution
Propositional Resolution
1. Convert all the propositions of F to clause form.
2. Negate P and convert the result to clause form. Add it to the set of clauses obtained in
step 1.
3. Repeat until either a contradiction is found or no progress can be made:
1. Select two clauses. Call these the parent clauses.
2. Resolve them together. The resulting clause, called the resolvent, will be the
disjunction of all of the literals of both of the parent clauses with the following
exception: If there are any pairs of literals L and ¬ L such that one of the parent
clauses contains L and the other contains ¬L, then select one such pair and
eliminate both L and ¬ L from the resolvent.
3. If the resolvent is the empty clause, then a contradiction has been found. If it is
not, then add it to the set of classes available to the procedure.
The Unification Algorithm
In propositional logic, it is easy to determine that two literals cannot both be true at the
same time.
Simply look for L and ¬L in predicate logic, this matching process is more complicated
since the arguments of the predicates must be considered.
For example, man(John) and ¬man(John) is a contradiction, while the man(John) and
¬man(Spot) is not.
Thus, in order to determine contradictions, we need a matching procedure that compares
two literals and discovers whether there exists a set of substitutions that makes them
identical.
There is a straightforward recursive procedure, called the unification algorithm, that does
it.
Algorithm: Unify(L1, L2)
1. If L1 or L2 are both variables or constants, then:
1. If L1 and L2 are identical, then return NIL.
2. Else if L1 is a variable, then if L1 occurs in L2 then return {FAIL}, else return
(L2/L1).
3. Also, Else if L2 is a variable, then if L2 occurs in L1 then return {FAIL}, else
return (L1/L2). d. Else return {FAIL}.
2. If the initial predicate symbols in L1 and L2 are not identical, then return {FAIL}.
3. If LI and L2 have a different number of arguments, then return {FAIL}.
4. Set SUBST to NIL. (At the end of this procedure, SUBST will contain all the
substitutions used to unify L1 and L2.)
5. For I ← 1 to the number of arguments in L1 :
1. Call Unify with the ith argument of L1 and the ith argument of L2, putting the
result in S.
2. If S contains FAIL then return {FAIL}.
3. If S is not equal to NIL then:
2. Apply S to the remainder of both L1 and L2.
3. SUBST: = APPEND(S, SUBST).
6. Return SUBST.
72
Resolution Procedure
Resolution is a procedure, which gains its efficiency from the fact that it operates on
statements that have been converted to a very convenient standard form.
Resolution produces proofs by refutation.
In other words, to prove a statement (i.e., to show that it is valid), resolution attempts to
show that the negation of the statement produces a contradiction with the known
statements (i.e., that it is unsatisfiable).
The resolution procedure is a simple iterative process: at each step, two clauses, called
the parent clauses, are compared (resolved), resulting in a new clause that has inferred
from them. The new clause represents ways that the two parent clauses interact with each
other. Suppose that there are two clauses in the system:
winter V summer
¬ winter V cold
Now we observe that precisely one of winter and ¬ winter will be true at any point.
If winter is true, then cold must be true to guarantee the truth of the second clause. If ¬
winter is true, then summer must be true to guarantee the truth of the first clause.
Thus we see that from these two clauses we can deduce summer V cold
This is the deduction that the resolution procedure will make.
Resolution operates by taking two clauses that each contains the same literal, in this
example, winter.
Moreover, The literal must occur in the positive form in one clause and in negative form
in the other. The resolvent obtained by combining all of the literals of the two parent
clauses except the ones that cancel.
If the clause that produced is the empty clause, then a contradiction has found.
For example, the two clauses
winter
73
¬ winter
will produce the empty clause.
Rule for T
The simplest introduction rule is the one for T. It is called "unit". Because it has no premises, this
rule is an axiom: something that can start a proof.
(unit)
T
Rules for Implication
In natural deduction, to prove an implication of the form P ⇒ Q, we assume P, then reason under
that assumption to try to derive Q. If we are successful, then we can conclude that P ⇒ Q.
In a proof, we are always allowed to introduce a new assumption P, then reason under that
assumption. We must give the assumption a name; we have used the name x in the example
below. Each distinct assumption must have a different name.
(assum)
[x : P]
74
Because it has no premises, this rule can also start a proof. It can be used as if the proposition P
were proved. The name of the assumption is also indicated here.
However, you do not get to make assumptions for free! To get a complete proof, all assumptions
must be eventually discharged. This is done in the implication introduction rule. This rule
introduces an implication P ⇒ Q by discharging a prior assumption [x : P]. Intuitively, if Q can
be proved under the assumption P, then the implication P ⇒ Q holds without any assumptions.
We write x in the rule name to show which assumption is discharged. This rule and modus
ponens are the introduction and elimination rules for implications.
[x : P]
⋮ P P⇒Q
(⇒-elim, modus ponens)
Q Q
P ⇒ Q (⇒-intro/x)
A proof is valid only if every assumption is eventually discharged. This must happen in the proof
tree below the assumption. The same assumption can be used more than once.
Rules for Disjunction
P (∨-intro- Q (∨-intro- P∨Q P⇒R Q ⇒ R (∨-
P ∨ Q left) P ∨ Q right) R elim)
75
A proposition that has a complete proof in a deductive system is called a theorem of that system.
Soundness and Completeness
A measure of a deductive system's power is whether it is powerful enough to prove all true
statements. A deductive system is said to be complete if all true statements are theorems (have
proofs in the system). For propositional logic and natural deduction, this means that all
tautologies must have natural deduction proofs. Conversely, a deductive system is
called sound if all theorems are true. The proof rules we have given above are in fact sound and
complete for propositional logic: every theorem is a tautology, and every tautology is a theorem.
Finding a proof for a given tautology can be difficult. But once the proof is found, checking that
it is indeed a proof is completely mechanical, requiring no intelligence or insight whatsoever. It
is therefore a very strong argument that the thing proved is in fact true.
We can also make writing proofs less tedious by adding more rules that provide reasoning
shortcuts. These rules are sound if there is a way to convert a proof using them into a proof using
the original rules. Such added rules are called admissible.
76
A representation in which the control information that is necessary to use the knowledge
is embedded in the knowledge itself for e.g. computer programs, directions, and recipes;
these indicate specific use or implementation;
The real difference between declarative and procedural views of knowledge lies in where
control information reside.
For example, consider the following
Man (Marcus)
Man (Caesar)
Person (Cleopatra)
∀x: Man(x) → Person(x)
Now, try to answer the question. ?Person(y)
The knowledge base justifies any of the following answers.
Y=Marcus
Y=Caesar
Y=Cleopatra
We get more than one value that satisfies the predicate.
If only one value needed, then the answer to the question will depend on the order in
which the assertions examined during the search for a response.
If the assertions declarative then they do not themselves say anything about how they will
be examined. In case of procedural representation, they say how they will examine.
Declarative Knowledge
A statement in which knowledge specified, but the use to which that knowledge is to be
put is not given.
For example, laws, people’s name; these are the facts which can stand alone, not
dependent on other knowledge;
So to use declarative representation, we must have a program that explains what is to do
with the knowledge and how.
For example, a set of logical assertions can combine with a resolution theorem prover to
give a complete program for solving problems but in some cases, the logical assertions
can view as a program rather than data to a program.
Hence the implication statements define the legitimate reasoning paths and automatic
assertions provide the starting points of those paths.
These paths define the execution paths which is similar to the ‘if then else “in traditional
programming.
So logical assertions can view as a procedural representation of knowledge.
77
Also, The first two differences are the fact that PROLOG programs are actually sets of
Horn clause that have been transformed as follows:-
1. If the Horn Clause contains no negative literal then leave it as it is.
2. Also, Otherwise rewrite the Horn clauses as an implication, combining all of the
negative literals into the antecedent of the implications and the single positive
literal into the consequent.
Moreover, This procedure causes a clause which originally consisted of a disjunction of
literals (one of them was positive) to be transformed into a single implication whose
antecedent is a conjunction universally quantified.
But when we apply this transformation, any variables that occurred in negative literals
and so now occur in the antecedent become existentially quantified, while the variables in
the consequent are still universally quantified.
For example the PROLOG clause P(x): – Q(x, y) is equal to logical expression ∀x: ∃y: Q (x,
y) → P(x).
The difference between the logic and PROLOG representation is that the PROLOG
interpretation has a fixed control strategy. And so, the assertions in the PROLOG
program define a particular search path to answer any question.
But, the logical assertions define only the set of answers but not about how to choose
among those answers if there is more than one.
Consider the following example:
1. Logical representation
∀x : pet(x) ۸ small (x) → apartmentpet(x)
∀x : cat(x) ۸ dog(x) → pet(x)
∀x : poodle (x) → dog (x) ۸ small (x)
poodle (fluffy)
2. Prolog representation
apartmentpet (x) : pet(x), small (x)
pet (x): cat (x)
pet (x): dog(x)
dog(x): poodle (x)
small (x): poodle(x)
poodle (fluffy)
78
79
The third factor is nothing but deciding whether the reasoning process can justify its
reasoning. If it justifies then it can apply. For example, doctors are usually unwilling to
accept any advice from diagnostics process because it cannot explain its reasoning.
Example 4 of Forward versus Backward Reasoning
Prolog is an example of backward chaining rule system. In Prolog rules restricted to Horn
clauses. This allows for rapid indexing because all the rules for deducing a given fact
share the same rule head. Rules matched with unification procedure. Unification tries to
find a set of bindings for variables to equate a sub-goal with the head of some rule. Rules
in the Prolog program matched in the order in which they appear.
Combining Forward and Backward Reasoning
Instead of searching either forward or backward, you can search both simultaneously.
Also, That is, start forward from a starting state and backward from a goal state
simultaneously until the paths meet.
This strategy called Bi-directional search. The following figure shows the reason for a
Bidirectional search to be ineffective.
80
Many types of Reasoning have been identified and recognized, but many questions
regarding their logical and computational properties still remain controversial.
The popular methods of Reasoning include abduction, induction, model-based,
explanation and confirmation. All of them are intimately related to problems of belief
revision and theory development, knowledge absorption, discovery, and learning.
Logical Reasoning
Logic is a language for reasoning. It is a collection of rules called Logic arguments, we
use when doing logical reasoning.
The logic reasoning is the process of drawing conclusions from premises using rules of
inference.
The study of logic divided into formal and informal logic. The formal logic is sometimes
called symbolic logic.
Symbolic logic is the study of symbolic abstractions (construct) that capture the formal
features of logical inference by a formal system.
The formal system consists of two components, a formal language plus a set of inference
rules.
The formal system has axioms. Axiom is a sentence that is always true within the system.
Sentences derived using the system’s axioms and rules of derivation called theorems.
The Logical Reasoning is of our concern in AI.
Approaches to Reasoning
There are three different approaches to reasoning under uncertainties.
1. Symbolic reasoning
2. Statistical reasoning
3. Fuzzy logic reasoning
Symbolic Reasoning
The basis for intelligent mathematical software is the integration of the “power of
symbolic mathematical tools” with the suitable “proof technology”.
Mathematical reasoning enjoys a property called monotonicity, that says, “If a conclusion
follows from given premises A, B, C… then it also follows from any larger set of
premises, as long as the original premises A, B, C.. included.”
Moreover, Human reasoning is not monotonic.
People arrive at conclusions only tentatively; based on partial or incomplete information,
reserve the right to retract those conclusions while they learn new facts. Such reasoning
non-monotonic, precisely because the set of accepted conclusions have become smaller
when the set of premises expanded.
Formal Logic
Moreover, The Formal logic is the study of inference with purely formal content, i.e. where
content made explicit.
Examples – Propositional logic and Predicate logic.
Here the logical arguments are a set of rules for manipulating symbols. The rules are of
two types,
1. Syntax rules: say how to build meaningful expressions.
2. Inference rules: say how to obtain true formulas from other true formulas.
Moreover, Logic also needs semantics, which says how to assign meaning to expressions.
Uncertainty in Reasoning
81
The world is an uncertain place; often the Knowledge is imperfect which causes
uncertainty.
So, Therefore reasoning must be able to operate under uncertainty.
Also, AI systems must have the ability to reason under conditions of uncertainty.
Monotonic Reasoning
A reasoning process that moves in one direction only.
Moreover, The number of facts in the knowledge base is always increasing.
The conclusions derived are valid deductions and they remain so.
A monotonic logic cannot handle
1. Reasoning by default: because consequences may derive only because of lack of evidence
to the contrary.
2. Abductive reasoning: because consequences only deduced as most likely explanations.
3. Belief revision: because new knowledge may contradict old beliefs.
82
Implementation Issues
Evolution Frames
As seen in the previous example, there are certain problems which are difficult to solve
with Semantic Nets.
Although there is no clear distinction between a semantic net and frame system, more
structured the system is, more likely it is to be termed as a frame system.
A frame is a collection of attributes (called slots) and associated values that describe
some entities in the world. Sometimes a frame describes an entity in some absolute sense;
Sometimes it represents the entity from a particular point of view only.
A single frame taken alone is rarely useful; we build frame systems out of collections of
frames that connected to each other by virtue of the fact that the value of an attribute of
one frame may be another frame.
Frames as Sets and Instances
The set theory is a good basis for understanding frame systems.
Each frame represents either a class (a set) or an instance (an element of class)
Both isa and instance relations have inverse attributes, which we call subclasses & all
instances.
As a class represents a set, there are 2 kinds of attributes that can be associated with it.
1. Its own attributes &
2. Attributes that are to be inherited by each element of the set.
83
84
These frames will organize into an isa hierarchy, just as any other frames, and that
hierarchy can then used to support inheritance of values for attributes of slots.
Now let us formalize what is a slot. A slot here is a relation.
It maps from elements of its domain (the classes for which it makes sense) to elements of
its range (its possible values).
A relation is a set of ordered pairs.
Thus it makes sense to say that relation R1 is a subset of another relation R2.
In that case, R1 is a specialization of R2. Since a slot is a set, the set of all slots, which
we will call SLOT, is a meta-class.
Its instances are slots, which may have sub-slots.
Frame Example
In this example, the frames Person, Adult-Male, ML-Baseball-Player (corresponding to major
league baseball players), Pitcher, and ML-Baseball-Team (for major league baseball team) are all
classes.
85
Both the isa and instance relations have inverse attributes, which we call subclasses and
all instances.
Because a class represents a set, there are two kinds of attributes that can associate with
it.
Some attributes are about the set itself, and some attributes are to inherited by each
element of the set.
We indicate the difference between these two by prefixing the latter with an asterisk (*).
For example, consider the class ML-Baseball-Player, we have shown only two properties
of it as a set: It a subset of the set of adult males. And it has cardinality 624.
We have listed five properties that all major league baseball players have (height, bats,
batting average, team, and uniform-color), and we have specified default values for the
first three of them.
By providing both kinds of slots, we allow both classes to define a set of objects and to
describe a prototypical object of the set.
Frames are useful for representing objects that are typical of stereotypical situations.
The situation like the structure of complex physical objects, visual scenes, etc.
A commonsense knowledge can represent using default values if no other value exists.
Commonsense is generally used in the absence of specific knowledge.
Semantic Nets
Inheritance property can represent using isa and instance
Monotonic Inheritance can perform substantially more efficiently with such structures
than with pure logic, and non-monotonic inheritance is also easily supported.
The reason that makes Inheritance easy is that the knowledge in slot and filler systems is
structured as a set of entities and their attributes.
These structures turn out to be useful as,
It indexes assertions by the entities they describe. As a result, retrieving the value for an
attribute of an entity is fast.
Moreover, It makes easy to describe properties of relations. To do this in a purely logical
system requires higher-order mechanisms.
It is a form of object-oriented programming and has the advantages that such systems
normally include modularity and ease of viewing by people.
Here we would describe two views of this kind of structure – Semantic Nets & Frames.
Semantic Nets
There are different approaches to knowledge representation include semantic net, frames,
and script.
The semantic net describes both objects and events.
In a semantic net, information represented as a set of nodes connected to each other by a
set of labeled arcs, which represents relationships among the nodes.
It is a directed graph consisting of vertices which represent concepts and edges which
represent semantic relations between the concepts.
It is also known as associative net due to the association of one node with other.
The main idea is that the meaning of the concept comes from the ways in which it
connected to other concepts.
We can use inheritance to derive additional relations.
86
Conceptual Dependency
Introduction to Strong Slot and Filler Structures
The main problem with semantic networks and frames is that they lack formality; there is
no specific guideline on how to use the representations.
In frame when things change, we need to modify all frames that are relevant – this can be
time-consuming.
Strong slot and filler structures typically represent links between objects according to
more rigid rules, specific notions of what types of object and relations between them are
provided and represent knowledge about common situations.
Moreover, We have types of strong slot and filler structures:
1. Conceptual Dependency (CD)
2. Scripts
3. Cyc
Conceptual Dependency (CD)
Conceptual Dependency originally developed to represent knowledge acquired from natural
language input.
The goals of this theory are:
To help in the drawing of the inference from sentences.
To be independent of the words used in the original input.
That is to say: For any 2 (or more) sentences that are identical in meaning there should be
only one representation of that meaning.
Moreover, It has used by many programs that portend to understand English (MARGIE, SAM,
PAM).
87
88
Scripts
Scripts Strong Slot
A script is a structure that prescribes a set of circumstances which could be expected to
follow on from one another.
It is similar to a thought sequence or a chain of situations which could be anticipated.
It could be considered to consist of a number of slots or frames but with more specialized
roles.
Scripts are beneficial because:
Events tend to occur in known runs or patterns.
Causal relationships between events exist.
Entry conditions exist which allow an event to take place
Prerequisites exist for events taking place. E.g. when a student progresses through a
degree scheme or when a purchaser buys a house.
Script Components
Each script contains the following main components.
Entry Conditions: Must be satisfied before events in the script can occur.
Results: Conditions that will be true after events in script occur.
Props: Slots representing objects involved in the events.
Roles: Persons involved in the events.
Track: the Specific variation on the more general pattern in the script. Different tracks
may share many components of the same script but not all.
89
Scenes: The sequence of events that occur. Events represented in conceptual dependency
form.
Advantages and Disadvantages of Script
Advantages
Capable of predicting implicit events
Single coherent interpretation may be build up from a collection of observations.
Disadvantage
More specific (inflexible) and less general than frames.
Not suitable to represent all kinds of knowledge.
To deal with inflexibility, smaller modules called memory organization packets (MOP)
can combine in a way that appropriates for the situation.
Script Example
90
Getting significance from the story is not straightforward. However, some heuristics can apply to
get the value.
CYC
What is CYC?
An ambitious attempt to form a very large knowledge base aimed at capturing
commonsense reasoning.
Initial goals to capture knowledge from a hundred randomly selected articles in the
Encyclopedia Britannica.
Also, Both Implicit and Explicit knowledge encoded.
Moreover, Emphasis on study of underlying information (assumed by the authors but not
needed to tell to the readers.
Example: Suppose we read that Wellington learned of Napoleon’s death Then we (humans)
can conclude Napoleon never new that Wellington had died.
How do we do this?
So, We require special implicit knowledge or commonsense such as:
We only die once.
You stay dead.
Moreover, You cannot learn anything when dead.
Time cannot go backward.
Why build large knowledge bases:
1. Brittleness
Specialised knowledge bases are brittle. Hard to encode new situations and non-
graceful degradation in performance. Commonsense based knowledge bases
should have a firmer foundation.
2. Form and Content
Moreover, Knowledge representation may not be suitable for AI. Commonsense
strategies could point out where difficulties in content may affect the form.
3. Shared Knowledge
Also, Should allow greater communication among systems with common bases
and assumptions.
How is CYC coded?
By hand.
Special CYCL language:
LISP-like.
Frame-based
Multiple inheritances
Slots are fully fledged objects.
Generalized inheritance — any link not just isa and instance.
Module 2
Game Playing:
91
Game Playing
Charles Babbage, the nineteenth-century computer architect thought about programming
his analytical engine to play chess and later of building a machine to play tic-tac-toe.
There are two reasons that games appeared to be a good domain.
1. They provide a structured task in which it is very easy to measure success or
failure.
2. They are easily solvable by straightforward search from the starting state to a
winning position.
The first is true is for all games bust the second is not true for all, except simplest games.
For example, consider chess.
The average branching factor is around 35. In an average game, each player might make
50.
So in order to examine the complete game tree, we would have to examine 35100
Thus it is clear that a simple search is not able to select even its first move during the
lifetime of its opponent.
It is clear that to improve the effectiveness of a search based problem-solving program
two things can do.
1. Improve the generate procedure so that only good moves generated.
2. Improve the test procedure so that the best move will recognize and explored first.
If we use legal-move generator then the test procedure will have to look at each of them
because the test procedure must look at so many possibilities, it must be fast.
Instead of the legal-move generator, we can use plausible-move generator in which only
some small numbers of promising moves generated.
As the number of lawyers available moves increases, it becomes increasingly important
in applying heuristics to select only those moves that seem more promising.
The performance of the overall system can improve by adding heuristic knowledge into
both the generator and the tester.
In game playing, a goal state is one in which we win but the game like chess. It is not
possible. Even we have good plausible move generator.
The depth of the resulting tree or graph and its branching factor is too great.
It is possible to search tree only ten or twenty moves deep then in order to choose the best
move. The resulting board positions must compare to discover which is most
advantageous.
This is done using static evolution function, which uses whatever information it has to
evaluate individual board position by estimating how likely they are to lead eventually to
a win.
Its function is similar to that of the heuristic function h’ in the A* algorithm: in the
absence of complete information, choose the most promising position.
92
Now we can apply the static evolution function to those positions and simply choose the
best one.
After doing so, we can back that value up to the starting position to represent our
evolution of it.
Here we assume that static evolution function returns larger values to indicate good
situations for us.
So our goal is to maximize the value of the static evaluation function of the next board
position.
The opponents’ goal is to minimize the value of the static evaluation function.
The alternation of maximizing and minimizing at alternate ply when evaluations are
to be pushed back up corresponds to the opposing strategies of the two players is
called MINIMAX.
It is the recursive procedure that depends on two procedures
MOVEGEN(position, player)— The plausible-move generator, which returns a
list of nodes representing the moves that can make by Player in Position.
STATIC(position, player)– static evaluation function, which returns a number
representing the goodness of Position from the standpoint of Player.
With any recursive program, we need to decide when recursive procedure should stop.
There are the variety of factors that may influence the decision they are,
Has one side won?
How many plies have we already explored? Or how much time is left?
How stable is the configuration?
We use DEEP-ENOUGH which assumed to evaluate all of these factors and to return
TRUE if the search should be stopped at the current level and FALSE otherwise.
It takes two parameters, position, and depth, it will ignore its position parameter and
simply return TRUE if its depth parameter exceeds a constant cut off value.
One problem that arises in defining MINIMAX as a recursive procedure is that it needs to
return not one but two results.
The backed-up value of the path it chooses.
The path itself. We return the entire path even though probably only the first
element, representing the best move from the current position, actually needed.
We assume that MINIMAX returns a structure containing both results and we have two
functions, VALUE and PATH that extract the separate components.
Initially, It takes three parameters, a board position, the current depth of the search, and
the player to move,
MINIMAX(current,0,player-one) If player –one is to move
MINIMAX(current,0,player-two) If player –two is to move
93
It requires maintaining of two threshold values, one representing a lower bound on that a
maximizing node may ultimately assign (we call this alpha).
And another representing an upper bound on the value that a minimizing node may assign
(this we call beta).
Each level must receive both the values, one to use and one to pass down to the next level
to use.
The MINIMAX procedure as it stands does not need to treat maximizing and minimizing
levels differently. Since it simply negates evaluation each time it changes levels.
Instead of referring to alpha and beta, MINIMAX uses two values, USE-THRESH and
PASSTHRESH.
USE-THRESH used to compute cutoffs. PASS-THRESH passed to next level as its
USETHRESH.
USE-THRESH must also pass to the next level, but it will pass as PASS-THRESH so that
it can be passed to the third level down as USE-THRESH again, and so forth.
Just as values had to negate each time they passed across levels.
Still, there is no difference between the code required at maximizing levels and that
required at minimizing levels.
PASS-THRESH should always the maximum of the value it inherits from above and the
best move found at its level.
If PASS-THRESH updated the new value should propagate both down to lower levels.
And back up to higher ones so that it always reflects the best move found anywhere in the
tree.
The MINIMAX-A-B requires five arguments, position, depth, player, Use-thresh, and
passThresh.
MINIMAX-A-B(current,0,player-one,maximum value static can compute, minimum
value static can compute).
Iterative Deepening Search(IDS) or Iterative Deepening Depth First Search(IDDFS)
There are two common ways to traverse a graph, BFS and DFS. Considering a Tree (or Graph)
of huge height and width, both BFS and DFS are not very efficient due to following reasons.
1. DFS first traverses nodes going through one adjacent of root, then next adjacent. The
problem with this approach is, if there is a node close to root, but not in first few subtrees
explored by DFS, then DFS reaches that node very late. Also, DFS may not find shortest
path to a node (in terms of number of edges).
94
2. BFS goes level by level, but requires more space. The space required by DFS is O(d)
where d is depth of tree, but space required by BFS is O(n) where n is number of nodes in
tree (Why? Note that the last level of tree can have around n/2 nodes and second last
level n/4 nodes and in BFS we need to have every level one by one in queue).
IDDFS combines depth-first search’s space-efficiency and breadth-first search’s fast search (for
nodes closer to root).
How does IDDFS work?
IDDFS calls DFS for different depths starting from an initial value. In every call, DFS is
restricted from going beyond given depth. So basically we do DFS in a BFS fashion.
Algorithm:
95
return false
An important thing to note is, we visit top level nodes multiple times. The last (or max depth)
level is visited once, second last level is visited twice, and so on. It may seem expensive, but it
turns out to be not so costly, since in a tree most of the nodes are in the bottom level. So it does
not matter much if the upper levels are visited multiple times.
Planning
Blocks World Problem
In order to compare the variety of methods of planning, we should find it useful to look at all of
them in a single domain that is complex enough that the need for each of the mechanisms is
apparent yet simple enough that easy-to-follow examples can be found.
There is a flat surface on which blocks can be placed.
There are a number of square blocks, all the same size.
They can be stacked one upon the other.
There is robot arm that can manipulate the blocks.
Actions of the robot arm
1. UNSTACK(A, B): Pick up block A from its current position on block B.
2. STACK(A, B): Place block A on block B.
3. PICKUP(A): Pick up block A from the table and hold it.
4. PUTDOWN(A): Put block A down on the table.
Notice that the robot arm can hold only one block at a time.
Predicates
In order to specify both the conditions under which an operation may be performed and
the results of performing it, we need the following predicates:
1. ON(A, B): Block A is on Block B.
2. ONTABLES(A): Block A is on the table.
3. CLEAR(A): There is nothing on the top of Block A.
4. HOLDING(A): The arm is holding Block A.
5. ARMEMPTY: The arm is holding nothing.
Robot problem-solving systems (STRIPS)
List of new predicates that the operator causes to become true is ADD List
Moreover, List of old predicates that the operator causes to become false is DELETE List
PRECONDITIONS list contains those predicates that must be true for the operator to be
applied.
STRIPS style operators for BLOCKs World
STACK(x, y)
P: CLEAR(y)^HOLDING(x)
D: CLEAR(y)^HOLDING(x)
A: ARMEMPTY^ON(x, y)
UNSTACK(x, y)
PICKUP(x)
P: CLEAR(x) ^ ONTABLE(x) ^ARMEMPTY
D: ONTABLE(x) ^ ARMEMPTY
96
A: HOLDING(x)
PUTDOWN(x)
Goal Stack Planning
To start with goal stack is simply:
ON(C,A)^ON(B,D)^ONTABLE(A)^ONTABLE(D)
This problem is separate into four sub-problems, one for each component of the goal.
Two of the sub-problems ONTABLE(A) and ONTABLE(D) are already true in the initial state.
97
CLEAR(A)^HOLDING(C)
STACK(C,A)
ON(B,D)
ON(C,A)^ON(B,D)^OTAD
We see that we can pop predicates on the stack till we reach HOLDING(C) for which we
need to find a suitable operator.
Moreover, The operators that might make HOLDING(C) true: PICKUP(C) and
UNSTACK(C, x). Without looking ahead, since we cannot tell which of these operators
is appropriate. Also, we create two branches of the search tree corresponding to the
following goal stacks:
Complete plan
1. UNSTACK(C, A)
2. PUTDOWN(C )
3. PICKUP(A)
4. STACK(A, B)
5. UNSTACK(A, B)
6. PUTDOWN(A)
7. PICKUP(B)
8. STACK(B, C)
9. PICKUP(A)
10. STACK(A,B)
Planning Components
Methods which focus on ways of decomposing the original problem into appropriate
subparts and on ways of recording and handling interactions among the subparts as they
are detected during the problem-solving process are often called as planning.
Planning refers to the process of computing several steps of a problem-solving procedure
before executing any of them.
Components of a planning system
Choose the best rule to apply next, based on the best available heuristic information.
The most widely used technique for selecting appropriate rules to apply is first to isolate
a set of differences between desired goal state and then to identify those rules that are
relevant to reduce those differences.
If there are several rules, a variety of other heuristic information can be exploited to
choose among them.
Apply the chosen rule to compute the new problem state that arises from its application.
98
In simple systems, applying rules is easy. Each rule simply specifies the problem state
that would result from its application.
In complex systems, we must be able to deal with rules that specify only a small part of
the complete problem state.
One way is to describe, for each action, each of the changes it makes to the state
description.
Detect when a solution has found.
A planning system has succeeded in finding a solution to a problem when it has found a
sequence of operators that transform the initial problem state into the goal state.
How will it know when this has done?
In simple problem-solving systems, this question is easily answered by a straightforward
match of the state descriptions.
One of the representative systems for planning systems is predicate logic. Suppose that as
a part of our goal, we have the predicate P(x).
To see whether P(x) satisfied in some state, we ask whether we can prove P(x) given the
assertions that describe that state and the axioms that define the world model.
Detect dead ends so that they can abandon and the system’s effort directed in more fruitful
directions.
As a planning system is searching for a sequence of operators to solve a particular
problem, it must be able to detect when it is exploring a path that can never lead to a
solution.
The same reasoning mechanisms that can use to detect a solution can often use for
detecting a dead end.
If the search process is reasoning forward from the initial state. It can prune any path that
leads to a state from which the goal state cannot reach.
If search process reasoning backward from the goal state, it can also terminate a path
either because it is sure that the initial state cannot reach or because little progress made.
Detect when an almost correct solution has found and employ special techniques to make it
totally correct.
The kinds of techniques discussed are often useful in solving nearly decomposable
problems.
One good way of solving such problems is to assume that they are completely
decomposable, proceed to solve the sub-problems separately. And then check that when
the sub-solutions combined. They do in fact give a solution to the original problem.
99
The problem solver also relies on a database that describes the current situation and a set
of operators described as PRECONDITION, ADD and DELETE lists.
The goal stack planning method attacks problems involving conjoined goals by solving
the goals one at a time, in order.
A plan generated by this method contains a sequence of operators for attaining the first
goal, followed by a complete sequence for the second goal etc.
At each succeeding step of the problem-solving process, the top goal on the stack will
pursue.
When a sequence of operators that satisfies it, found, that sequence applied to the state
description, yielding new description.
Next, the goal that then at the top of the stack explored. And an attempt made to satisfy it,
starting from the situation that produced as a result of satisfying the first goal.
This process continues until the goal stack is empty.
Then as one last check, the original goal compared to the final state derived from the
application of the chosen operators.
If any components of the goal not satisfied in that state. Then those unsolved parts of the
goal reinserted onto the stack and the process resumed.
100
Hierarchical Planning
In order to solve hard problems, a problem solver may have to generate long plans.
It is important to be able to eliminate some of the details of the problem until a solution
that addresses the main issues is found.
Then an attempt can make to fill in the appropriate details.
Early attempts to do this involved the use of macro operators, in which larger operators
were built from smaller ones.
In this approach, no details eliminated from actual descriptions of the operators.
ABSTRIPS
A better approach developed in ABSTRIPS systems which actually planned in a hierarchy of
abstraction spaces, in each of which preconditions at a lower level of abstraction ignored.
ABSTRIPS approach is as follows:
First solve the problem completely, considering only preconditions whose criticality
value is the highest possible.
These values reflect the expected difficulty of satisfying the precondition.
To do this, do exactly what STRIPS did, but simply ignore the preconditions of lower
than peak criticality.
Once this done, use the constructed plan as the outline of a complete plan and consider
preconditions at the next-lowest criticality level.
Augment the plan with operators that satisfy those preconditions.
Because this approach explores entire plans at one level of detail before it looks at the
lower-level details of any one of them, it has called length-first approach.
The assignment of appropriate criticality value is crucial to the success of this hierarchical
planning method.
Those preconditions that no operator can satisfy are clearly the most critical.
Example, solving a problem of moving the robot, for applying an operator, PUSH-THROUGH
DOOR, the precondition that there exist a door big enough for the robot to get through is of high
criticality since there is nothing we can do about it if it is not true.
The idea of reactive systems is to avoid planning altogether, and instead, use the
observable situation as a clue to which one can simply react.
A reactive system must have access to a knowledge base of some sort that describes what
actions should be taken under what circumstances.
A reactive system is very different from the other kinds of planning systems we have
discussed. Because it chooses actions one at a time.
It does not anticipate and select an entire action sequence before it does the first thing.
The example is a Thermostat. The job of the thermostat is to keep the temperature
constant inside a room.
Reactive systems are capable of surprisingly complex behaviors.
The main advantage reactive systems have over traditional planners is that they operate
robustly in domains that are difficult to model completely and accurately.
Reactive systems dispense with modeling altogether and base their actions directly on
their perception of the world.
Another advantage of reactive systems is that they are extremely responsive since they
avoid the combinatorial explosion involved in deliberative planning.
This makes them attractive for real-time tasks such as driving and walking.
Other Planning Techniques
Triangle tables
Provides a way of recording the goals that each operator expected to satisfy as well as the
goals that must be true for it to execute correctly.
Meta-planning
A technique for reasoning not just about the problem solved but also about the planning
process itself.
Macro-operators
Allow a planner to build new operators that represent commonly used sequences of
operators.
Case-based planning:
Re-uses old plans to make new ones.
UNDERSTANDING
Understanding is the simplest procedure of all human beings. Understanding means ability to
determine some new knowledge from a given knowledge. For each action of a problem, the
mapping of some new actions is very necessary. Mapping the knowledge means transferring the
knowledge from one representation to another representation. For example, if you will say “I
need to go to New Delhi” for which you will book the tickets. The system will have
“understood” if it finds the first available plane to New Delhi. But if you will say the same thing
to you friends, who knows that your family lives in “New Delhi”, he/she will have “understood”
if he/she realizes that there may be a problem or occasion in your family. For people,
understanding applies to inputs from all the senses. Computer understanding has so far been
applied primarily to images, speech and typed languages. It is important to keep in mind that the
success or failure of an “understanding” problem can rarely be measured in an absolute sense but
must instead be measured with respect to a particular task to be performed. There are some
factors that contribute to the difficulty of an understanding problem.
(a) If the target representation is very complex for which you cannot map from the original
representation.
102
(b) There are different types of mapping factors may arise like one-to-one, one-to-many and
many to many.
(c) Some noise or disturbing factors are also there.
(d) The level of interaction of the source components may be complex one.
(e) The problem solver might be unknown about some more complex problems.
(f) The intermediary actions may also be unavailable.
Consider an example of an English sentence which is being used for communication with a
keyword based data retrieval system. Suppose I want to know all about the temples in India. So I
would need to be translated into a representation such as The above sentence is a simple sentence
for which the corresponding representation may be easy to implement. But what for the complex
queries?
This type of complex queries can be modeled with the conceptual dependency representation
which is more complex than that of simple representation. Constructing these queries is very
difficult since more informationare to be extracted. Extracting more information will require
some more knowledge. Also the type of mapping process is not quite easy to the problem solver.
Understanding is the process of mapping an input from its original form to a more useful one.
The simplest kind of mapping is “one-toone”.
In one-to-one mapping each different problems would lead to only one solution. But there are
very few inputs which are one-to-one. Other mappings are quite difficult to implement. Many-to-
one mappings are frequent is that free variation is often allowed, either because of the physical
limitations of that produces the inputs or because such variation simply makes the task of
generating the inputs.
Many to one mapping require that the understanding system know about all the ways that a target
representation can be expressed in the source language. One-to-many mapping requires a great
deal of domain knowledge in order to make the correct choice among the available target
representation.
The mapping process is simplest if each component can be mapped without concern for the other
components of the statement. If the number of interactions increases, then the complexity of the
problem will increase. In many understanding situations the input to which meaning should be
assigned is not always the input that is presented to the under stander.
Because of the complex environment in which understanding usually occurs, other things often
interfere with the basic input before it reaches the under stander. Hence the understanding will be
more complex if there will be some sort of noise on the inputs.
104
Pull apart the word “Bill’s” into proper noun “Bill” and the possessive suffix “’s”
Recognize the sequence “.init” as a file extension that is functioning as an adjective in the
sentence.
This process will usually assign syntactic categories to all the words in the sentence.
Syntactic Analysis
A syntactic analysis must exploit the results of the morphological analysis to build a
structural description of the sentence.
The goal of this process, called parsing, is to convert the flat list of words that form the
sentence into a structure that defines the units that represented by that flat list.
The important thing here is that a flat sentence has been converted into a hierarchical
structure. And that the structure corresponds to meaning units when a semantic analysis
performed.
Reference markers (set of entities) shown in the parenthesis in the parse tree.
Each one corresponds to some entity that has mentioned in the sentence.
These reference markers are useful later since they provide a place in which to
accumulate information about the entities as we get it.
Semantic Analysis
The semantic analysis must do two important things:
1. It must map individual words into appropriate objects in the knowledge base or
database.
2. It must create the correct structures to correspond to the way the meanings of the
individual words combine with each other.
Discourse Integration
Specifically, we do not know whom the pronoun “I” or the proper noun “Bill” refers to.
To pin down these references requires an appeal to a model of the current discourse
context, from which we can learn that the current user is USER068 and that the only
person named “Bill” about whom we could be talking is USER073.
Once the correct referent for Bill known, we can also determine exactly which file
referred to.
Pragmatic Analysis
The final step toward effective understanding is to decide what to do as a result.
105
One possible thing to do to record what was said as a fact and done with it.
For some sentences, a whose intended effect is clearly declarative, that is the precisely
correct thing to do.
But for other sentences, including this one, the intended effect is different.
We can discover this intended effect by applying a set of rules that characterize
cooperative dialogues.
The final step in pragmatic processing to translate, from the knowledge-based
representation to a command to be executed by the system.
Syntactic Processing
Syntactic Processing is the step in which a flat input sentence converted into a
hierarchical structure that corresponds to the units of meaning in the sentence. This
process called parsing.
It plays an important role in natural language understanding systems for two reasons:
1. Semantic processing must operate on sentence constituents. If there is no syntactic
parsing step, then the semantics system must decide on its own constituents. If
parsing is done, on the other hand, it constrains the number of constituents that
semantics can consider.
2. Syntactic parsing is computationally less expensive than is semantic processing.
Thus it can play a significant role in reducing overall system complexity.
Although it is often possible to extract the meaning of a sentence without using
grammatical facts, it is not always possible to do so.
Almost all the systems that are actually used have two main components:
1. A declarative representation, called a grammar, of the syntactic facts about the
language.
2. A procedure, called parser that compares the grammar against input sentences to
produce parsed structures.
Grammars and Parsers
The most common way to represent grammars is a set of production rules.
The first rule can read as “A sentence composed of a noun phrase followed by Verb
Phrase”; the Vertical bar is OR; ε represents the empty string.
Symbols that further expanded by rules called non-terminal symbols.
Symbols that correspond directly to strings that must found in an input sentence called
terminal symbols.
Grammar formalism such as this one underlies many linguistic theories, which in turn
provide the basis for many natural language understanding systems.
Pure context-free grammars are not effective for describing natural languages.
NLPs have less in common with computer language processing systems such as
compilers.
Parsing process takes the rules of the grammar and compares them against the input
sentence.
The simplest structure to build is a Parse Tree, which simply records the rules and how
they matched.
Every node of the parse tree corresponds either to an input word or to a non-terminal in
our grammar.
Each level in the parse tree corresponds to the application of one grammar rule.
106
107
Procedures that build structures that will form part of the final parse.
Procedures that perform arbitrary tests on current input and sentence components that
have identified.
Semantic Analysis
The structures created by the syntactic analyzer assigned meanings.
A mapping made between the syntactic structures and objects in the task domain.
Structures for which no such mapping is possible may rejected.
The semantic analysis must do two important things:
It must map individual words into appropriate objects in the knowledge base or
database.
It must create the correct structures to correspond to the way the meanings of the
individual words combine with each other. Semantic Analysis AI
Producing a syntactic parse of a sentence is only the first step toward understanding it.
We must produce a representation of the meaning of the sentence.
Because understanding is a mapping process, we must first define the language into
which we are trying to map.
There is no single definitive language in which all sentence meaning can describe.
The choice of a target language for any particular natural language understanding
program must depend on what is to do with the meanings once they constructed.
Choice of the target language in Semantic Analysis AI
There are two broad families of target languages that used in NL systems,
depending on the role that the natural language system playing in a larger system:
When natural language considered as a phenomenon on its own, as for example
when one builds a program whose goal is to read the text and then answer
questions about it. A target language can design specifically to support language
processing.
When natural language used as an interface language to another program (such as
a db query system or an expert system), then the target language must legal input
to that other program. Thus the design of the target language driven by the
backend program.
108
109
The learning procedures used during machine learning automatically focus on the most
common cases, whereas when writing rules by hand it is often not at all obvious where
the effort should be directed.
Automatic learning procedures can make use of statistical inference algorithms to
produce models that are robust to unfamiliar input (e.g. containing words or structures
that have not been seen before) and to erroneous input (e.g. with misspelled words or
words accidentally omitted). Generally, handling such input gracefully with hand-written
rules—or more generally, creating systems of hand-written rules that make soft
decisions—is extremely difficult, error-prone and time-consuming.
Systems based on automatically learning the rules can be made more accurate simply by
supplying more input data. However, systems based on hand-written rules can only be
made more accurate by increasing the complexity of the rules, which is a much more
difficult task. In particular, there is a limit to the complexity of systems based on hand-
crafted rules, beyond which the systems become more and more unmanageable.
However, creating more data to input to machine-learning systems simply requires a
corresponding increase in the number of man-hours worked, generally without significant
increases in the complexity of the annotation process.
Spell Checking
Spell checking is one of the applications of natural language processing that impacts billions of
users daily. A good introduction to spell checking can be found on Peter Norvig’s webpage. The
article introduces a simple 21-line spell checker implementation in Python combining simple
language and error models to predict the word a user intended to type. The language model
estimates how likely a given word `c` is in the language for which the spell checker is
designed, this can be written as `P(C)`. The error model estimates the probability `P(w|c)`
of typing the misspelled version `w` conditionally to the intention of typing the correctly
spelled word `c`.The spell checker then returns word `c` corresponding to the highest value of
`P(w|c)P(c)` among all possible words in the language.
Module 3
LEARNING
Learning is the improvement of performance with experience over time.
Learning element is the portion of a learning AI system that decides how to modify the
performance element and implements those modifications.
We all learn new knowledge through different methods, depending on the type of material to be
learned, the amount of relevant knowledge we already possess, and the environment in which the
learning takes place. There are five methods of learning . They are,
1. Memorization (rote learning)
2. Direct instruction (by being told)
3. Analogy
4. Induction
110
5. Deduction
Learning by memorizations is the simplest from of le4arning. It requires the least amount of
inference and is accomplished by simply copying the knowledge in the same form that it will be
used directly into the knowledge base.
Example:- Memorizing multiplication tables, formulate , etc.
Direct instruction is a complex form of learning. This type of learning requires more inference
than role learning since the knowledge must be transformed into an operational form before
learning when a teacher presents a number of facts directly to us in a well organized manner.
Analogical learning is the process of learning a new concept or solution through the use of
similar known concepts or solutions. We use this type of learning when solving problems on an
exam where previously learned examples serve as a guide or when make frequent use of
analogical learning. This form of learning requires still more inferring than either of the previous
forms. Since difficult transformations must be made between the known and unknown situations.
Learning by induction is also one that is used frequently by humans . it is a powerful form of
learning like analogical learning which also require s more inferring than the first two methods.
This learning re quires the use of inductive inference, a form of invalid but useful inference. We
use inductive learning ofinstances of examples of the concept. For example we learn the
concepts of color or sweet taste after experiencing the sensations associated with several
examples of colored objects or sweet foods.
Deductive learning is accomplished through a sequence of deductive inference steps using
known facts. From the known facts, new facts or relationships are logically derived. Deductive
learning usually requires more inference than the other methods.
Review Questions:-
1. what is perception ?
2. How do we overcome the Perceptual Problems?
3. Explain in detail the constraint satisfaction waltz algorithm?
4. What is learning ?
5. What is Learning element ?
6. List and explain the methods of learning?
Types of learning:- Classification or taxonomy of learning types serves as a guide in studying or
comparing a differences among them. One can develop learning taxonomies based on the type of
knowledge representation used (predicate calculus , rules, frames), the type of knowledge
learned (concepts, game playing, problem solving), or by the area of application(medical
diagnosis , scheduling , prediction and so on).
The classification is intuitively more appealing and is one which has become popular among
machine learning researchers . it is independent of the knowledge domain and the representation
scheme is used. It is based on the type of inference strategy employed or the methods used in the
learning process. The five different learning methods under this taxonomy are:
Memorization (rote learning)
Direct instruction(by being told)
Analogy
Induction
Deduction
Learning by memorization is the simplest form of learning. It requires the least5 amount of
inference and is accomplished by simply copying the knowledge in the same form that it will be
111
used directly into the knowledge base. We use this type of learning when we memorize
multiplication tables ,
for example.
A slightly more complex form of learning is by direct instruction. This type of learning requires
more understanding and inference than role learning since the knowledge must be transformed
into an operational form before being integrated into the knowledge base. We use this type of
learning when a teacher presents a number of facts directly to us in a well organized manner.
The third type listed, analogical learning, is the process of learning an ew concept or solution
through the use of similar known concepts or solutions. We use this type of learning when
solving problems on an examination where previously learned examples serve as a guide or
when we learn to drive a truck using our knowledge of car driving. We make frewuence use of
analogical learning. This form of learning requires still more inferring than either of the previous
forms, since difficult transformations must be made between the known and unknown situations.
This is a kind of application of knowledge in a new situation.
The fourth type of learning is also one that is used frequency by humans. It is a powerful form of
learning which, like analogical learning, also requires more inferring than the first two methods.
This form of learning requires the use of inductive inference, a form of invalid but useful
inference. We use inductive learning when wed formulate a general concept after seeing a
number of instance or examples of the concept. For example, we learn the concepts of color
sweet taste after experiencing the sensation associated with several examples of colored objects
or sweet foods.
The final type of acquisition is deductive learning. It is accomplished through a sequence of
deductive inference steps using known facts. From the known facts, new facts or relationships
are logically derived. Deductive learning usually requires more inference than the other methods.
The inference method used is, of course , a deductive type, which is a valid from of inference.
In addition to the above classification, we will sometimes refer to learning methods as wither
methods or knowledge-rich methods. Weak methods are general purpose methods in which little
or no initial knowledge is available. These methods are more mechanical than the classical AI
knowledge – rich methods. They often rely on a form of heuristics search in the learning process.
Rote Learning
Rote learning is the basic learning activity. Rote learning is a memorization technique based
on repetition. It is also called memorization because the knowledge, without any modification is,
simply copied into the knowledge base. As computed values are stored, this technique can save a
significant amount of time.
Rote learning technique can also be used in complex learning systems provided sophisticated
techniques are employed to use the stored values faster and there is a generalization to keep the
number of stored information down to a manageable level. Checkers-playing program, for ex
The idea is that one will be able to quickly recall the meaning of the material the more one
repeats it. Some of the alternatives to rote learning include meaningful learning, associative
learning, and active learning. ample, uses this technique to learn the board positions it evaluates
in its look-ahead search.
112
This is a simple form of learning. Suppose a programmer writes a set of instructions to instruct
the computer what to do, the programmer is a teacher and the computer is a student. Once
learned (i.e. programmed), the system will be in a position to do new things.
The advice may come from many sources: human experts, internet to name a few. This type of
learning requires more inference than rote learning. The knowledge must be transformed into an
operational form before stored in the knowledge base. Moreover the reliability of the source of
knowledge should be considered.
The system should ensure that the new knowledge is conflicting with the existing knowledge.
FOO (First Operational Operationaliser), for example, is a learning system which is used to learn
the game of Hearts. It converts the advice which is in the form of principles, problems, and
methods into effective executable (LISP) procedures (or knowledge). Now this knowledge is
ready to use.
general learning model is depicted in figure 4.1 where the environment has been included as a
part of the overall learner system. The environment may be regarded as either a form of nature
which produces random stimuli or as a more organized training source such as a teacher which
provides carefully selected training examples for the learner component. The actual form of
environment used will depend on the particular learning paradigm. In any case, some
representation language must be assumed for communication between the environment and the
learner. The language may be the same representation scheme as that used in the knowledge base
(such as a form of predicate calculus). When they are hosen to be the same, we say the single
representation trick is being used. This usually results in a simpler implementation since it is not
necessary to transform between two or more different representations.
113
For some systems the environment may be a user working at a keyboard . Other systems will use
program modules to simulate a particular environment. In even more realistic cases the system
will have real physical sensors which interface with some world environment.
Inputs to the learner component may be physical stimuli of some type or descriptive , symbolic
training examples. The information conveyed to the learner component is used to create and
modify knowledge structures in the knowledge base. This same knowledge is used by the
performance component to carry out some tasks, such as solving a problem playing a game, or
classifying instances of some concept.
given a task, the performance component produces a response describing its action in
performing the task. The critic module then evaluates this response relative to an optimal
response.
Feedback , indicating whether or not the performance was acceptable , is then sent by the critic
module to the learner component for its subsequent use in modifying the structures in the
knowledge base. If proper learning was accomplished, the system’s performance will have
improved with the changes made to the knowledge base.
The cycle described above may be repeated a number of times until the performance of the
system has reached some acceptable level, until a known learning goal has been reached, or until
changes ceases to occur in the knowledge base after some chosen number of training examples
have been observed.
There are several important factors which influence a system’s ability to learn in addition to the
form of representation used. They include the types of training provided, the form and extent of
any initial background knowledge , the type of feedback provided, and the learning algorithms
used.
The type of training used in a system can have a strong effect on performance, much the same as
it does for humans. Training may consist of randomly selected instance or examples that have
been carefully selected and ordered for presentation. The instances may be positive examples of
some concept or task a being learned, they may be negative, or they may be mixture of both
positive and negative. The instances may be well focused using only relevant information, or
they may contain a variety of facts and details including irrelevant data.
There are Many forms of learning can be characterized as a search through a space of possible
hypotheses or solutions. To make learning more efficient. It is necessary to constrain this search
process or reduce the search space. One method of achieving this is through the use of
background knowledge which can be used to constrain the search space or exercise control
operations which limit the search process.
Feedback is essential to the learner component since otherwise it would never know if the
knowledge structures in the knowledge base were improving or if they were adequate for the
performance of the given tasks. The feedback may be a simple yes or no type of evaluation, or it
114
may contain more useful information describing why a particular action was good or bad. Also ,
the feedback may be completely reliable, providing an accurate assessment of the performance or
it may contain noise, that is the feedback may actually be incorrect some of the time. Intuitively ,
the feedback must be accurate more than 50% of the time; otherwise the system carries useful
information, the learner should also to build up a useful corpus of knowledge quickly. On the
other hand, if the feedback is noisy or unreliable, the learning process may be very slow and the
resultant knowledge incorrect.
115
116
Training sets used in artificial intelligence, machine learning, genetic programming, intelligent
systems, and statistics.
In all these fields, a training set has much the same role and often used in conjunction with a test
set.
Testing set
A test set is a set of data used in various areas of information science to assess the strength and
utility of a predictive relationship.
Moreover, Test sets are used in artificial intelligence, machine learning, genetic programming,
and statistics. In all these fields, a test set has much the same role.
Accuracy of classifier: Supervised learning
In the fields of science, engineering, industry, and statistics. The accuracy of a measurement
system is the degree of closeness of measurements of a quantity to that quantity’s actual (true)
value.
Sensitivity analysis: Supervised learning
Similarly, Local Sensitivity as correlation coefficients and partial derivatives can only use, if the
correlation between input and output is linear.
Regression: Supervised learning
In statistics, regression analysis is a statistical process for estimating the relationships among
variables.
Moreover, It includes many techniques for modeling and analyzing several variables. When the
focus on the relationship between a dependent variable and one or more independent variables.
More specifically, regression analysis helps one understand how the typical value of the
dependent variable (or ‘criterion variable’) changes when any one of the independent variables
varied. Moreover, While the other independent variables held fixed.
Expert systems:
Expert system = knowledge + problem-solving methods. ... A knowledge base that captures
the domain-specific knowledge and an inference engine that consists of algorithms for
manipulating the knowledge represented in the knowledge base to solve a problem presented to
the system.
Expert systems (ES) are one of the prominent research domains of AI. It is introduced by the
researchers at Stanford University, Computer Science Department.
The expert systems are the computer applications developed to solve complex problems in a
particular domain, at the level of extra-ordinary human intelligence and expertise.
117
Knowledge Base
It contains domain-specific and high-quality knowledge. Knowledge is required to exhibit
intelligence. The success of any ES majorly depends upon the collection of highly accurate and
precise knowledge.
118
What is Knowledge?
The data is collection of facts. The information is organized as data and facts about the task
domain. Data, information, and past experience combined together are termed as knowledge.
Components of Knowledge Base
The knowledge base of an ES is a store of both, factual and heuristic knowledge.
Factual Knowledge − It is the information widely accepted by the Knowledge Engineers
and scholars in the task domain.
Heuristic Knowledge − It is about practice, accurate judgement, one’s ability of
evaluation, and guessing.
Knowledge representation
It is the method used to organize and formalize the knowledge in the knowledge base. It is in the
form of IF-THEN-ELSE rules.
Knowledge Acquisition
The success of any expert system majorly depends on the quality, completeness, and accuracy of
the information stored in the knowledge base.
The knowledge base is formed by readings from various experts, scholars, and the Knowledge
Engineers. The knowledge engineer is a person with the qualities of empathy, quick learning,
and case analyzing skills.
He acquires information from subject expert by recording, interviewing, and observing him at
work, etc. He then categorizes and organizes the information in a meaningful way, in the form of
IF-THEN-ELSE rules, to be used by interference machine. The knowledge engineer also
monitors the development of the ES.
Inference Engine
Use of efficient procedures and rules by the Inference Engine is essential in deducting a correct,
flawless solution.
In case of knowledge-based ES, the Inference Engine acquires and manipulates the knowledge
from the knowledge base to arrive at a particular solution.
In case of rule based ES, it −
Applies rules repeatedly to the facts, which are obtained from earlier rule application.
Adds new knowledge into the knowledge base if required.
Resolves rules conflict when multiple rules are applicable to a particular case.
To recommend a solution, the Inference Engine uses the following strategies −
Forward Chaining
Backward Chaining
Forward Chaining
It is a strategy of an expert system to answer the question, “What can happen next?”
Here, the Inference Engine follows the chain of conditions and derivations and finally deduces
the outcome. It considers all the facts and rules, and sorts them before concluding to a solution.
This strategy is followed for working on conclusion, result, or effect. For example, prediction of
share market status as an effect of changes in interest rates.
119
Backward Chaining
With this strategy, an expert system finds out the answer to the question, “Why this happened?”
On the basis of what has already happened, the Inference Engine tries to find out which
conditions could have happened in the past for this result. This strategy is followed for finding
out cause or reason. For example, diagnosis of blood cancer in humans.
User Interface
User interface provides interaction between user of the ES and the ES itself. It is generally
Natural Language Processing so as to be used by the user who is well-versed in the task domain.
The user of the ES need not be necessarily an expert in Artificial Intelligence.
It explains how the ES has arrived at a particular recommendation. The explanation may appear
in the following forms −
Natural language displayed on screen.
Verbal narrations in natural language.
Listing of rule numbers displayed on the screen.
The user interface makes it easy to trace the credibility of the deductions.
Requirements of Efficient ES User Interface
It should help users to accomplish their goals in shortest possible way.
It should be designed to work for user’s existing or desired work practices.
Its technology should be adaptable to user’s requirements; not the other way round.
It should make efficient use of user input.
Expert Systems Limitations
No technology can offer easy and complete solution. Large systems are costly, require
significant development time, and computer resources. ESs have their limitations which include
−
Limitations of the technology
120
121
Expert System.
DEFINITION - An expert system is a computer program that simulates the judgement and
behavior of a human or an organization that has expert knowledge and experience in a particular
field. Typically, such a system contains a knowledge base containing accumulated experience
and a set of rules for applying the knowledge base to each particular situation that is described to
the program. Sophisticated expert systems can be enhanced with additions to the knowledge base
or to the set of rules.
Among the best-known expert systems have been those that play chess and that assist in medical
diagnosis.
122
subfield of artificial intelligence (AI). A wide variety of methods can be used to simulate the
performance of the expert; however, common to most or all are: 1) the creation of a knowledge
base which uses some knowledge representation structure to capture the knowledge of
the Subject Matter Expert (SME); 2) a process of gathering that knowledge from the SME and
codifying it according to the structure, which is called knowledge engineering; and 3) once the
system is developed, it is placed in the same real world problem solving situation as the human
SME, typically as an aid to human workers or as a supplement to some information system.
Expert systems may or may not have learning components.
factors
The MYCIN rule-based expert system introduced a quasi-probabilistic approach called certainty
factors, whose rationale is explained below.
A human, when reasoning, does not always make statements with 100% confidence: he might
venture, "If Fritz is green, then he is probably a frog" (after all, he might be a chameleon). This
type of reasoning can be imitated using numeric values called confidences. For example, if it is
known that Fritz is green, it might be concluded with 0.85 confidence that he is a frog; or, if it is
known that he is a frog, it might be concluded with 0.95 confidence that he hops. These certainty
factor (CF) numbers quantify uncertainty in the degree to which the available evidence supports
a hypothesis. They represent a degree of confirmation, and are not probabilities in a Bayesian
sense. The CF calculus, developed by Shortliffe & Buchanan, increases or decreases the CF
associated with a hypothesis as each new piece of evidence becomes available. It can be mapped
to a probability update, although degrees of confirmation are not expected to obey the laws of
probability. It is important to note, for example, that evidence for hypothesis H may have nothing
to contribute to the degree to which Not_h is confirmed or disconfirmed (e.g., although a fever
lends some support to a diagnosis of infection, fever does not disconfirm alternative hypotheses)
and that the sum of CFs of many competing hypotheses may be greater than one (i.e., many
hypotheses may be well confirmed based on available evidence).
The CF approach to a rule-based expert system design does not have a widespread following, in
part because of the difficulty of meaningfully assigning CFs a priori. (The above example of
green creatures being likely to be frogs is excessively naive.) Alternative approaches to quasi-
probabilistic reasoning in expert systems involve fuzzy logic, which has a firmer mathematical
foundation. Also, rule-engine shells such as Drools and Jess do not support probability
manipulation: they use an alternative mechanism called salience, which is used to prioritize the
order of evaluation of activated rules.
In certain areas, as in the tax-advice scenarios discussed below, probabilistic approaches are not
acceptable. For instance, a 95% probability of being correct means a 5% probability of being
wrong. The rules that are defined in such systems have no exceptions: they are only a means of
achieving software flexibility when external circumstances change frequently. Because rules are
stored as data, the core software does not need to be rebuilt each time changes to federal and
state tax codes are announced.
123
Chaining
Two methods of reasoning when using inference rules are forward chaining and backward
chaining.
Forward chaining starts with the data available and uses the inference rules to extract more data
until a desired goal is reached. An inference engine using forward chaining searches the
inference rules until it finds one in which the if clause is known to be true. It then concludes the
then clause and adds this information to its data. It continues to do this until a goal is reached.
Because the data available determines which inference rules are used, this method is also
classified as data driven.
Backward chaining starts with a list of goals and works backwards to see if there is data which
will allow it to conclude any of these goals. An inference engine using backward chaining would
search the inference rules until it finds one which has a then clause that matches a desired goal. If
the if clause of that inference rule is not known to be true, then it is added to the list of goals.
SW Architecture.
The following general points about expert systems and their architecture have been outlined:
1. The sequence of steps taken to reach a conclusion is dynamically synthesized with each new
case. The sequence is not explicitly programmed at the time that the system is built.
2. Expert systems can process multiple values for any problem parameter. This permits more
than one line of reasoning to be pursued and the results of incomplete (not fully determined)
reasoning to be presented.
End user
There are two styles of user-interface design followed by expert systems. In the original style of
user interaction, the software takes the end-user through an interactive dialog. In the following
example, a backward-chaining system seeks to determine a set of restaurants to recommend:
A. No
124
A. No
A. No
A. Yes
A. Yes
Participants
There are generally three individuals having an interaction in an expert system. Primary among
these is the end-user, the individual who uses the system for its problem solving assistance. In
the construction and maintenance of the system there are two other roles: the problem domain
expert who builds the system and supplies the knowledge base, and a knowledge engineer who
assists the experts in determining the representation of their knowledge, enters this knowledge
into an explanation module and who defines the inference technique required to solve the
problem. Usually the knowledge engineer will represent the problem solving activity in the form
of rules. When these rules are created from domain expertise, the knowledge base stores the rules
of the expert system.
Inference rule
The function of the procedure node interface is to receive information from the procedures
coordinator and create the appropriate procedure call. The ability to call a procedure and receive
information from that procedure can be viewed as simply a generalization of input from the
external world. In some earlier expert systems external information could only be obtained in a
125
predetermined manner, which only allowed certain information to be acquired. Through the
knowledge base, this expert system disclosed in the cross-referenced application can invoke any
procedure allowed on its host system. This makes the expert system useful in a much wider class
of knowledge domains than if it had no external access or only limited external access.
In the area of machine diagnostics using expert systems, particularly self-diagnostic applications,
it is not possible to conclude the current state of "health" of a machine without some information.
The best source of information is the machine itself, for it contains much detailed information
that could not reasonably be provided by the operator.
The knowledge that is represented in the system appears in the rulebase. In the rulebase
described in the cross-referenced applications, there are basically four different types of objects,
with the associated information:
2. Parameters: Place holders for character strings which may be variables that can be inserted
into a class question at the point in the question where the parameter is positioned.
3. Rule nodes: Inferences in the system are made by a tree structure which indicates the
rules or logic mimicking human reasoning. The nodes of these trees are called rule nodes.
There are several different types of rule nodes.
Expert Systems/Shells. The E.S shell simplifies the process of creating a knowledge base. It is
the shell that actually processes the information entered by a user relates it to the concepts
contained in the knowledge base and provides an assessment or solution for a particular problem.
Knowledge Acquisition
Knowledge acquisition is the process used to define the rules and ontologies required for
a knowledge-based system. The phrase was first used in conjunction with expert systems to
describe the initial tasks associated with developing an expert system, namely finding and
interviewing domain experts and capturing their knowledge via rules, objects, and frame-
based ontologies.
126
Expert systems were one of the first successful applications of artificial intelligence technology
to real world business problems. Researchers at Stanford and other AI laboratories worked with
doctors and other highly skilled experts to develop systems that could automate complex tasks
such as medical diagnosis. Until this point computers had mostly been used to automate highly
data intensive tasks but not for complex reasoning. Technologies such as inference
engines allowed developers for the first time to tackle more complex problems.
As expert systems scaled up from demonstration prototypes to industrial strength applications it
was soon realized that the acquisition of domain expert knowledge was one of if not the most
critical task in the knowledge engineering process. This knowledge acquisition process became
an intense area of research on its own. One of the earlier works on the topic used Batesonian
theories of learning to guide the process.
One approach to knowledge acquisition investigated was to use natural language parsing and
generation to facilitate knowledge acquisition. Natural language parsing could be performed on
manuals and other expert documents and an initial first pass at the rules and objects could be
developed automatically. Text generation was also extremely useful in generating explanations
for system behavior. This greatly facilitated the development and maintenance of expert systems.
A more recent approach to knowledge acquisition is a re-use based approach. Knowledge can be
developed in ontologies that conform to standards such as the Web Ontology Language
(OWL). In this way knowledge can be standardized and shared across a broad community of
knowledge workers. One example domain where this approach has been successful
is bioinformatics.
127