Engineering Colleges: Minimum Study Material (MSM) Department of Computer Science and Engineering
Prepared by
Sl. No. | Name of the Faculty | Designation | Affiliating College
1       | Mr. Balaji          | Prof.       | SCADCET
4       | Mr. K. Sivakumar    | AP          | FXEC
Verified by DLI, CLI and Approved by the Centralised Monitoring Team dated 18.12.2015
Copyright © SCAD
1. Kevin Knight, Elaine Rich and B. Nair, "Artificial Intelligence (SIE)", McGraw Hill, 2008. (Units 1, 2, 4, 5)
2. Dan W. Patterson, "Introduction to AI and ES", Pearson Education, 2007. (Unit III)
REFERENCES:
1. Peter Jackson, "Introduction to Expert Systems", 3rd Edition, Pearson Education, 2007.
2. Stuart Russell and Peter Norvig, "AI – A Modern Approach", 2nd Edition, Pearson Education, 2007.
3. Deepak Khemani, "Artificial Intelligence", Tata McGraw Hill Education, 2013.
4. Nils J. Nilsson, "Principles of Artificial Intelligence", Springer-Verlag, 1982.
TABLE OF CONTENTS

S.NO  TITLE                                                    PAGE NO.
a     Aim and Objective of the subject                         1
b     Detailed Lesson Plan                                     2
c     UNIT I - INTRODUCTION TO AI PART A                       5
d     UNIT I - INTRODUCTION TO AI PART B                       8
1     Informed search strategies                               8
2     Uninformed search strategies                             18
3     Problem characteristics                                  23
4     Types of production system                               29
5     Constraint satisfaction problem                          31
6     Hill climbing                                            33
7     Problem solving methods                                  36
8     Means end analysis                                       38
e     UNIT II REPRESENTATION OF KNOWLEDGE PART A               41
f     UNIT II REPRESENTATION OF KNOWLEDGE PART B               44
9     Predicate logic, resolution, unification                 44
11    Alpha beta cutoff & minmax algorithm                     53
12    Conversion of predicate to causal form                   56
13    Approaches to knowledge representation                   58
g     UNIT III KNOWLEDGE INFERENCE PART A                      60
h     UNIT III KNOWLEDGE INFERENCE PART B                      63
14    Forward and backward chaining                            63
15    Production and frame based system                        67
16    Fuzzy set and fuzzy logic                                73
17    Bayesian network                                         77
18    Rule based approach & Dempster-Shafer theory             79
i     UNIT IV PLANNING AND MACHINE LEARNING PART A             82
j     UNIT IV PLANNING AND MACHINE LEARNING PART B             85
19    Advanced plan generation system                          85
20    Components of planning system                            88
21    Machine learning methods                                 90
22    Adaptive learning                                        98
23    Goal stack planning                                      101
24    STRIPS problem solving                                   107
25    Steps in design of learning system                       108
k     UNIT V EXPERT SYSTEMS PART A                             111
l     UNIT V EXPERT SYSTEMS PART B                             111
26    Expert systems components                                115
27    DART                                                     119
28    MYCIN                                                    122
29    Pitfalls in expert system                                125
30    XCON                                                     126
31    Expert system shells                                     128
32    Knowledge acquisition                                    131
33    Characteristics, role and advantages of expert system    132
m     Industrial / Practical Connectivity of the subject       137
AIM AND OBJECTIVE OF THE SUBJECT
DETAILED LESSON PLAN
Text Book
1. Kevin Knight, Elaine Rich and B. Nair, "Artificial Intelligence (SIE)", McGraw Hill, 2008. (Units 1, 2, 4, 5) (Copies available in Library: Yes)
2. Dan W. Patterson, "Introduction to AI and ES", Pearson Education, 2007. (Unit III) (Copies available in Library: Yes)
REFERENCES:
1. Peter Jackson, "Introduction to Expert Systems", 3rd Edition, Pearson Education, 2007. (Copies available in Library: Yes)
2. Stuart Russell and Peter Norvig, "AI – A Modern Approach", 2nd Edition, Pearson Education, 2007. (Copies available in Library: Yes)
3. Deepak Khemani, "Artificial Intelligence", Tata McGraw Hill Education, 2013. (Copies available in Library: No)
4. Nils J. Nilsson, "Principles of Artificial Intelligence", Springer-Verlag, 1982. (Copies available in Library: Yes)
5. www.nptel.ac.in
Sl. No | Unit | Topic / Portions to be Covered | Hours Required/Planned | Cumulative Hrs | Books Referred
11     | 2    | Game playing                   | 1 | 11 | T1, R3
12     | 2    | Knowledge representation       | 1 | 12 | T1, T2
21     | 3    | Knowledge representation        | 1 | 21 | T1
22     | 3    | Production based system         | 1 | 22 | T1
23     | 3    | Frame based system              | 1 | 23 | T1, T2
24     | 3    | Inference - Backward chaining   | 1 | 24 | T1
25     | 3    | Forward chaining                | 1 | 25 | R3
26     | 3    | Rule value approach, Fuzzy reasoning | 1 | 26 | T1, R3
27     | 3    | Certainty factors               | 1 | 27 | T1, T2
28     | 3    | Bayesian Theory                 | 1 | 28 | T1, T2
29     | 3    | Bayesian Network                | 1 | 29 | T1, T2
30     | 3    | Dempster-Shafer theory          | 1 | 30 | T1, T2
39     | 5    | Expert systems                          | 1 | 39 | T1
40     | 5    | Architecture of expert systems          | 1 | 40 | R2
41     | 5    | Roles of expert systems                 | 1 | 41 | T1
42     | 5    | Knowledge Acquisition, Meta knowledge   | 1 | 42 | T1
43     | 5    | Heuristics                              | 1 | 43 | R3
44     | 5    | Typical expert systems - MYCIN          | 1 | 44 | R3
45     | 5    | DART                                    | 1 | 45 | R3
46     | 5    | XCON                                    | 1 | 46 | R3
47     | 5    | Expert systems shells                   | 1 | 47 | T1
UNIT I INTRODUCTION TO AI AND PRODUCTION SYSTEMS
Artificial Intelligence (AI) is a branch of science concerned with helping machines find solutions to complex problems in a more human-like fashion. This involves borrowing characteristics from human intelligence and applying them as algorithms in a computer-friendly way. The term was coined by McCarthy in 1956.
A heuristic function is a function that maps problem state descriptions to measures of desirability, usually represented as numeric weights. The value of a heuristic function at a given node in the search process gives a good estimate of whether that node lies on the desired path to a solution. Heuristics are used to increase the efficiency of search algorithms such as Uniform Cost Search.
1. Best first search
2. Branch and Bound Search
3. A* Search
4. AO* Search
5. Hill Climbing
6. Constraint satisfaction
7. Means end analysis
10. What are the various problem solving methods?
The various methods used are
Problem graph
Matching
Indexing
Heuristic function
11. Define Production system.
A production system is one that consists of
A set of rules.
One or more knowledge bases/databases.
A control strategy.
A rule applier.
A production system also incorporates a family of general production system interpreters, which include
c. Isolate and represent the task knowledge that is necessary to solve the
problem.
d. Choose the best problem solving technique and apply it to the particular
problem.
14. What are the requirements of a good control strategy?
A good control strategy should satisfy the following requirements:
The first requirement is that it causes motion. In a game-playing program the pieces move on the board, and in the water jug problem water is used to fill the jugs. Control strategies that do not cause motion will never lead to a solution.
The second requirement is that it is systematic: it would not be sensible to fill a jug and empty it repeatedly, nor would it be advisable in a game to move a piece round and round the board in a cyclic way.
The third requirement is that it is efficient: it finds a good answer to the problem.
15. What is an AI technique? (June 2013)
An AI technique is a method that exploits knowledge that should be represented in
such a way that
PART- B
presuming that such a function is efficient. A heuristic function is a function that maps problem state descriptions to a measure of desirability, usually represented as a number. The purpose of a heuristic function is to guide the search process in the most profitable direction by suggesting which path to follow first when more than one is available. Generally a heuristic incorporates domain knowledge to improve efficiency over blind search. In AI, heuristic has both a general meaning and a more specialized technical meaning. Generally the term heuristic is used for any advice that is often effective but is not guaranteed to work in every case. For example, in the Travelling Salesman Problem (TSP) we use a heuristic that always picks the nearest neighbour. A heuristic is a method that provides a better guess about the correct choice to make at any junction than would be achieved by random guessing. This technique is useful in solving tough problems which could not be solved in any other way, where exact solutions would take an infinite time to compute.
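As an illustration of the nearest-neighbour heuristic mentioned above, here is a minimal Python sketch; the city coordinates are hypothetical, not from the text:

```python
import math

def nearest_neighbour_tour(cities, start=0):
    """Greedy nearest-neighbour heuristic for TSP: from the current city,
    always visit the closest unvisited city next."""
    unvisited = set(range(len(cities))) - {start}
    tour = [start]
    while unvisited:
        last = cities[tour[-1]]
        nxt = min(unvisited, key=lambda i: math.dist(last, cities[i]))
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour

# Four cities on a unit square; the greedy tour visits each city once.
cities = [(0, 0), (0, 1), (1, 1), (1, 0)]
print(nearest_neighbour_tour(cities))
```

The tour is cheap to build but, like any heuristic, is not guaranteed optimal.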
Classifications of heuristic search

Best First Search
Best first search is an instance of a graph search algorithm in which a node is selected for expansion based on an evaluation function f(n). Traditionally, the node with the lowest evaluation is selected for expansion, because the evaluation measures the distance to the goal. Best first search can be implemented within the general search framework via a priority queue, a data structure that maintains the fringe in ascending order of f values. This search algorithm serves as a combination of the depth first and breadth first search algorithms. Best first search is often referred to as a greedy algorithm, because it quickly attacks the most desirable path as soon as its heuristic weight becomes the most desirable.
Concept:
Step 2: Traverse any neighbour of the root node that maintains the least distance from the root node, and insert the neighbours in ascending order into the queue.
Step 3: Traverse any neighbour of a neighbour of the root node that maintains the least distance from the root node, and insert them in ascending order into the queue.
Step 4: This process continues until the goal node is reached.
Algorithm:
Step 1: Place the starting node or root node into the queue.
Step 2: If the queue is empty, then stop and return failure.
Step 3: If the first element of the queue is our goal node, then stop and return success.
Step 4: Else, remove the first element from the queue. Expand it and compute the estimated goal distance for each child. Place the children in the queue in ascending order of goal distance.
Step 5: Go to step-3
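The algorithm above can be sketched in Python using a priority queue (heapq); the graph and heuristic values below are hypothetical, not the figure from the text:

```python
import heapq

def best_first_search(graph, h, start, goal):
    """Greedy best first search: always expand the node with the lowest
    heuristic estimate h(n), keeping the fringe in a priority queue."""
    queue = [(h[start], start, [start])]
    visited = set()
    while queue:
        _, node, path = heapq.heappop(queue)
        if node == goal:
            return path
        if node in visited:
            continue
        visited.add(node)
        for neighbour in graph.get(node, []):
            if neighbour not in visited:
                heapq.heappush(queue, (h[neighbour], neighbour, path + [neighbour]))
    return None

# Hypothetical graph with heuristic values (lower = closer to goal F).
graph = {'A': ['B', 'C'], 'B': ['D', 'E'], 'C': ['G'], 'E': ['F']}
h = {'A': 10, 'B': 4, 'C': 6, 'D': 5, 'E': 2, 'F': 0, 'G': 3}
print(best_first_search(graph, h, 'A', 'F'))  # ['A', 'B', 'E', 'F']
```

Note that the search is guided purely by h(n); it ignores path cost, which is why it is greedy.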
Implementation:
Step 1: Consider node A as our root node. The first element of the queue is A, which is not our goal node, so remove it from the queue and find its neighbours, to be inserted in ascending order. (Queue: A)
Step 2: The neighbours of A are B and C. They are inserted into the queue in ascending order. (Queue: B, C)
Step 3: Now B is at the FRONT end of the queue. Calculate the neighbours of B that maintain the least distance from the root. (Queue: F, E, D, C)
Step 4: Now node F is at the FRONT end of the queue. As it has no further children, remove it from the queue and proceed further. (Queue: E, D, C)
Step 5: Now E is at the FRONT end. The children of E are J and K; insert them into the queue.
Step 6: Now K is at the FRONT end and, as it has no further children, remove it and proceed further. (Queue: J, D, C)
Step 8: Now D is at the FRONT end; calculate the children of D and put them into the queue. (Queue: I, C)
Step 9: Now I is the FRONT node and it has no children, so proceed further after removing this node from the queue. (Queue: C)
Step 10: Now C is the FRONT node, so calculate the neighbours of C, to be inserted in ascending order into the queue. (Queue: G, H)
Step 11: Now remove G from the queue and calculate its neighbours, to be inserted in ascending order into the queue. (Queue: M, L, H)
Step 12: Now M is the FRONT node of the queue, which is our goal node. So stop here and exit.
Advantages:
The time complexity of best first search is much less than that of breadth first search.
Best first search allows us to switch between paths, gaining the benefits of both breadth first and depth first search: depth first search is good because a solution can be found without computing all nodes, and breadth first search is good because it does not get trapped in dead ends.
Disadvantages:

Branch and Bound Search
Branch and Bound is an algorithmic technique which finds the optimal solution by keeping track of the best solution found so far. If a partial solution cannot improve on the best, it is abandoned; by this method the number of nodes which are explored can be reduced. It also deals with optimization problems over a search space that can be presented as the leaves of a search tree. The usual technique for eliminating subtrees from the search tree is called pruning. For the Branch and Bound algorithm we will use a stack data structure.
Concept:
Step 2: Traverse any neighbour of the root node that maintains the least distance from the root node.
Step 3: Traverse any neighbour of the neighbour of the root node that maintains the least distance from the root node.
Step 4: This process continues until the goal node is reached.
Algorithm:
Step 1: PUSH the root node into the stack.
Step 2: If the stack is empty, then stop and return failure.
Step 3: If the top node of the stack is a goal node, then stop and return success.
Step 4: Else POP the node from the stack. Process it and find all its successors. Find
out the path containing all its successors as well as predecessors and then PUSH the
successors which are belonging to the minimum or shortest path.
Step 5: Go to step 3.
Step 6: Exit.
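A minimal Python sketch of the branch and bound idea described above: stack-based exploration of partial paths, pruning any path whose cost cannot beat the best complete solution found so far. The weighted graph is hypothetical, loosely echoing the distances used in the worked example:

```python
def branch_and_bound(graph, start, goal):
    """Branch and bound over a weighted graph: explore partial paths with a
    stack and prune (bound) any partial path whose cost already equals or
    exceeds the best complete solution found so far."""
    best_cost, best_path = float('inf'), None
    stack = [(0, [start])]
    while stack:
        cost, path = stack.pop()
        if cost >= best_cost:          # bound: this branch cannot improve
            continue
        node = path[-1]
        if node == goal:
            best_cost, best_path = cost, path
            continue
        for neighbour, weight in graph.get(node, {}).items():
            if neighbour not in path:  # avoid cycles along the current path
                stack.append((cost + weight, path + [neighbour]))
    return best_path, best_cost

# Hypothetical weighted graph (A-B = 5, B-D = 4, D-F = 3, etc.).
graph = {'A': {'B': 5, 'C': 6}, 'B': {'D': 4, 'E': 6},
         'D': {'F': 3, 'C': 8}, 'C': {'F': 9}}
print(branch_and_bound(graph, 'A', 'F'))  # (['A', 'B', 'D', 'F'], 12)
```

The cost 0 + 5 + 4 + 3 = 12 matches the kind of running totals computed in the steps below.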
Implementation:
Let us take the following example for implementing the Branch and Bound algorithm.
Step 1: Consider node A as our root node. Find its successors, i.e. B, C, F. Calculate the distance from the root and PUSH them according to least distance.
Step 2: Expand the node nearest the root. The distances of B's successors from the root are:
D: 0 + 5 + 4 = 9
E: 0 + 5 + 6 = 11
Step 3: Expand D. The distances of its successors from the root are:
C: 0 + 5 + 4 + 8 = 17
F: 0 + 5 + 4 + 3 = 12
The least-distance successor of D is F, and it is our goal node. So stop and return success.
Step 4: Exit.
Advantages:
As it finds the minimum path instead of finding the minimum successor, there should not be any repetition.
Disadvantages:
The load balancing aspects of the Branch and Bound algorithm make its parallelization difficult.
The Branch and Bound algorithm is limited to small networks. For large networks, where the solution search space grows exponentially with the scale of the network, the approach becomes computationally prohibitive.
A* SEARCH
A* is a cornerstone of many AI systems and has been used since it was developed in 1968 by Peter Hart, Nils Nilsson and Bertram Raphael. It is a combination of Dijkstra's algorithm and best first search, and it can be used to solve many kinds of problems. A* search finds the shortest path through a search space to the goal state using a heuristic function; the * denotes optimality. The A* algorithm finds the lowest cost path between the start and goal states, where changing from one state to another incurs some cost. A* requires a heuristic function to evaluate the cost of a path that passes through a particular state. The algorithm is complete if the branching factor is finite and every action has a fixed cost. The evaluation function is defined by the formula

f(n) = g(n) + h(n)

where
g(n): the actual cost of the path from the start state to the current state.
h(n): the estimated cost of the path from the current state to the goal state.
f(n): the estimated cost of the cheapest path from the start state to the goal state through node n.
For the implementation of the A* algorithm we will use two arrays, namely OPEN and CLOSE.
OPEN: An array which contains the nodes that have been generated but have not yet been examined.
CLOSE: An array which contains the nodes that have already been examined.
Algorithm:
Step 1: Place the starting node into OPEN and find its f (n) value.
Step 2: Remove the node from OPEN, having smallest f (n) value. If it is a goal node
then stop and return success.
Step 3: Else remove the node from OPEN, find all its successors.
Step 4: Find the f (n) value of all successors; place them into OPEN and place the
removed node into CLOSE.
Step 5: Go to Step-2.
Step 6: Exit.
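The OPEN/CLOSE scheme above can be sketched in Python with a priority queue ordered by f(n) = g(n) + h(n); the graph and heuristic values are hypothetical, chosen so that h never overestimates the remaining cost:

```python
import heapq

def a_star(graph, h, start, goal):
    """A* search: f(n) = g(n) + h(n), where g is the cost so far and h is a
    heuristic estimate of the remaining cost. OPEN is a priority queue of
    nodes to examine; CLOSE is the set of nodes already examined."""
    open_list = [(h[start], 0, start, [start])]   # (f, g, node, path)
    close = set()
    while open_list:
        f, g, node, path = heapq.heappop(open_list)
        if node == goal:
            return path, g
        if node in close:
            continue
        close.add(node)
        for neighbour, cost in graph.get(node, {}).items():
            if neighbour not in close:
                g2 = g + cost
                heapq.heappush(open_list,
                               (g2 + h[neighbour], g2, neighbour, path + [neighbour]))
    return None, float('inf')

# Hypothetical graph with an admissible heuristic (h never overestimates).
graph = {'S': {'A': 1, 'B': 4}, 'A': {'B': 2, 'G': 12}, 'B': {'G': 5}}
h = {'S': 7, 'A': 6, 'B': 4, 'G': 0}
print(a_star(graph, h, 'S', 'G'))  # (['S', 'A', 'B', 'G'], 8)
```

With an admissible heuristic the path returned (cost 1 + 2 + 5 = 8) is the cheapest one.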
Implementation:
Advantages:
It performs better than other techniques and can be used to solve very complex problems.
Disadvantages:
The algorithm is complete only if the branching factor is finite and every action has a fixed cost.
The execution speed of A* search is highly dependent on the accuracy of the heuristic used to compute h(n), and it has complexity problems.
AO* Search
The depth first search and breadth first search given earlier for OR trees or graphs can be easily adapted to AND-OR graphs. The main difference lies in the way termination conditions are determined, since all goals following an AND node must be realized, whereas a single goal node following an OR node will do. For this purpose we use the AO* algorithm. Like the A* algorithm, here we will use two arrays and one heuristic function.
OPEN: It contains the nodes that have been traversed but have not yet been marked solvable or unsolvable.
CLOSE: It contains the nodes that have already been processed.
Algorithm:
Step 3: Select a node n that is both on OPEN and a member of T0. Remove it from OPEN and place it in CLOSE.
Step 4: If n is a terminal goal node, then label n as solved and label all the ancestors of n as solved. If the starting node is marked as solved, then return success and exit.
Step 6: Expand n. Find all its successors, find their h(n) values, and push them into OPEN.
Step 8: Exit.
Implementation:
Step 1: In the graph, the solvable nodes are A, B, C, D, E, F and the unsolvable nodes are G, H. Take A as the starting node, so place A into OPEN.
i.e. OPEN = {A}, CLOSE = (NULL)
Step 2: The children of A are B and C, which are solvable. So place them into OPEN and place A into CLOSE.
i.e. OPEN = {B, C}, CLOSE = {A}
Step 3: Now process the nodes B and C. The children of B and C are to be placed into OPEN. Also remove B and C from OPEN and place them into CLOSE.
So OPEN = {G, D, E}, CLOSE = {A, B, C}
Step 4: As the nodes G and H are unsolvable, place them into CLOSE directly and process the nodes D and E.
OPEN = {D, E}, CLOSE = {A, B, C, G, H}
Step 5: Now we have reached our goal state, so place F into CLOSE.
CLOSE = {A, B, C, G, D, E, H}
Step 6: Success and exit.
AO* Graph: (figure showing the resulting AND-OR graph over nodes A, B, C, D, E)
Advantages:
It is an optimal algorithm. It traverses according to the ordering of nodes, and it can be used for both OR and AND graphs.
Disadvantages:
Sometimes, for unsolvable nodes, it cannot find the optimal path. Its complexity is higher than that of other algorithms.
Breadth First Search
Breadth first search is a general technique of traversing a graph. Breadth first search may use more memory but will always find the shortest path first. In this type of search the state space is represented in the form of a tree, and the solution is obtained by traversing the tree. The nodes of the tree represent the start state, various intermediate states and the final state. In this search a queue data structure is used, and traversal proceeds level by level. Breadth first search expands nodes in order of their distance from the root. It is a path-finding algorithm that is capable of always finding a solution if one exists, and the solution found is always the optimal solution. This is accomplished in a very memory-intensive manner: each node in the search tree is expanded breadth-wise at each level.
Concept:
Step 4: This process continues until the goal node is reached.
Algorithm:
Step 1: Place the root node inside the queue.
Step 2: If the queue is empty, then stop and return failure.
Step 3: If the FRONT node of the queue is a goal node then stop and return success.
Step 4: Remove the FRONT node from the queue. Process it and find all its
neighbours that are in ready state then place them inside the queue in any order.
Step 5: Go to Step 3.
Step 6: Exit.
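The steps above can be sketched in Python with a FIFO queue; the graph below is hypothetical (A is the start node and F the goal, as in the worked example):

```python
from collections import deque

def bfs(graph, start, goal):
    """Breadth first search: expand nodes level by level using a FIFO queue,
    returning the first (shallowest) path found to the goal."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbour in graph.get(node, []):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(path + [neighbour])
    return None

graph = {'A': ['B', 'C'], 'B': ['D'], 'C': ['E'], 'D': ['F'], 'E': ['F']}
print(bfs(graph, 'A', 'F'))  # ['A', 'B', 'D', 'F']
```

Because nodes are expanded in order of distance from the root, the first path returned is a shortest one.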
Implementation:
Let us implement the above algorithm of BFS by taking the following suitable example. Consider the graph in which A is the starting node and F is the goal node (*).
Step 2: Now the queue is not empty and the FRONT node, i.e. A, is not our goal node. So move to step 3.
Step 3: Remove the FRONT node from the queue, i.e. A, and find the neighbours of A, i.e. B and C. (Queue: B, C)
Step 4: Now B is the FRONT node of the queue. Process B and find the neighbours of B, i.e. D. (Queue: C, D)
Step 5: Now C is the FRONT node. Find the neighbours of C, i.e. E. (Queue: D, E)
Step 6: Next find the neighbours of D, as D is the FRONT node of the queue. (Queue: E, F)
Step 7: Now E is the FRONT node of the queue. The neighbour of E is F, which is our goal node. (Queue: F)
Step 8: Finally F, which is the FRONT of the queue, is our goal node. So exit.
Advantages:

Depth First Search
DFS is another important type of uninformed search. DFS visits all the vertices in the graph. This type of algorithm always chooses to go deeper into the graph. After DFS has visited all the reachable vertices from a particular source vertex, it chooses one of the remaining undiscovered vertices and continues the search. DFS remedies the space limitation of breadth first search by always generating next a child of the deepest unexpanded node. The stack, or last-in-first-out (LIFO), data structure is used for DFS. One interesting property of DFS is that the discovery and finish times of each vertex form a parenthesis structure: if we use an open parenthesis when a vertex is discovered and a close parenthesis when it is finished, the result is a properly nested set of parentheses.
Concept:
Step 4: This process continues until the goal node is reached.
Algorithm:
Step 1: PUSH the root node into the stack.
Step 2: If the stack is empty, then stop and return failure.
Step 3: If the top node of the stack is the goal node, then stop and return success.
Step 4: Else POP the top node from the stack and process it. Find all its neighbours
that are in ready state and PUSH them into the stack in any order.
Step 5: Go to step 3.
Step 6: Exit.
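The stack-based procedure above can be sketched in Python; the graph mirrors the worked example (A is the root, L the goal), though the exact figure is not reproduced here:

```python
def dfs(graph, start, goal):
    """Depth first search: always expand the deepest unexpanded node,
    using a LIFO stack of partial paths."""
    stack = [[start]]
    while stack:
        path = stack.pop()
        node = path[-1]
        if node == goal:
            return path
        for neighbour in graph.get(node, []):
            if neighbour not in path:   # avoid revisiting along this path
                stack.append(path + [neighbour])
    return None

# A's children are B and C; C leads to F and G; G to M; F to K and L.
graph = {'A': ['B', 'C'], 'C': ['F', 'G'], 'G': ['M'], 'F': ['K', 'L']}
print(dfs(graph, 'A', 'L'))  # ['A', 'C', 'F', 'L']
```

Note the path found is not necessarily the shortest one; DFS only guarantees it finds some path if one exists in a finite graph.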
Implementation:
Examples of DFS
Consider A as the root node and L as the goal node in the graph figure.
Step 2: Now the stack is not empty and A is not our goal node. Hence move to the next step.
Step 3: POP the top node from the stack, i.e. A, and find the neighbours of A, i.e. B and C. (Stack: B, C)
Step 4: Now C is the top node of the stack. Find its neighbours, i.e. F and G. (Stack: B, F, G)
Step 5: Now G is the top node of the stack. Find its neighbour, i.e. M. (Stack: B, F, M)
Step 6: Now M is the top node; find its neighbours, but M has no neighbours in the graph. (Stack: B, F)
Step 7: Now F is the top node and its neighbours are K and L, so PUSH them onto the stack. (Stack: B, K, L)
Step 8: Now L is the top node of the stack, which is our goal node.
Advantages:

BFS
It uses the queue data structure. BFS is complete because it finds the solution if one exists. BFS takes more space, equivalent to O(b^d), where b is the maximum breadth and d is the maximum depth of the search tree. In the case of several goals, it finds the best one.

DFS
It uses the stack data structure. It is not complete, because it may enter an infinite loop before reaching the goal node. Its space complexity is O(d). In the case of several goals, it will terminate with a solution in any order.
Greedy Search
This algorithm uses an approach quite similar to the best first search algorithm. It is a simple best first search which reduces the estimated cost of reaching the goal: basically it takes the node that appears to be closest to the goal. The search starts with the initial state, makes every single possible change, and then looks at the change each made to the score, applying the change with the greatest improvement. The search continues until no further improvement can be made. The greedy search never makes a lateral move. It uses the minimal estimated cost h(n) to the goal state as its measure, which decreases the search time, but the algorithm is neither complete nor optimal. The main advantage of this search is that it is simple and finds a solution quickly. The disadvantages are that it is not optimal and is susceptible to false starts.
Figure Greedy Search
Is the problem decomposable into a set of independent smaller or easier sub-problems?
Can solution steps be ignored or undone?
Is the problem universe predictable?
Is a good solution to the problem obvious without comparison to all possible solutions?
Is the knowledge base to be used for solving the problem internally consistent?
Is a large amount of knowledge absolutely required to solve the problem?
Will the solution of the problem require interaction between the computer and the person?
The above are called the seven problem characteristics, under which the solution must take place.
Consider, for example, decomposing the integral
∫ (x² + 3x + sin²x·cos²x) dx
into three smaller problems: ∫ x² dx, ∫ 3x dx and ∫ sin²x·cos²x dx. Using sin²x = 1 − cos²x, the last integral becomes
∫ (1 − cos²x) cos²x dx = ∫ cos²x dx − ∫ cos⁴x dx,
and each piece can be solved independently.
Blocks World
START: C is on A; B is on the table.
GOAL: ON(B, C) and ON(A, B), i.e. the single stack with A on B on C.
Applying the technique of problem decomposition to this simple blocks world problem leads to the solution of the problem, via sub-goals such as ON(B, C) and CLEAR(A) ∧ ON(A, B).
Decomposable problems can be solved by the Divide and Conquer technique of problem
decomposition.
Consider a mathematical theorem. First we take a lemma and prove it, thinking that it is useful, but later we realize that the lemma is of no help. Here there is no problem: the rules that could have been applied at the outset can still be applied, and we can simply proceed as before.
Example :
The 8 Puzzle. It is a square tray in which 8 square tiles are placed; the remaining 9th square is uncovered. Each tile has a number on it. A tile that is adjacent to the blank space can be slid into that space. A game consists of a starting position and a specified goal position, and the goal is to transform the starting position into the goal position by sliding the tiles around.
Start state
2 8 3
1 6 4
7 5
Goal state.
1 2 3
8 4
7 6 5
The control mechanism must keep track of the order in which operations are performed, so that the operations can be undone one at a time if necessary. The control structure for a theorem prover does not need to record all that information. Consider another problem, playing chess: suppose a chess-playing program realizes it made a stupid move some moves ago; it cannot back up and restart the game from that point. It can only try to make the best of the current situation and go on from there. These problems - theorem proving, the 8 puzzle and chess - illustrate the difference between three important classes of problems. Ignorable problems can be solved using a simple control structure that never backtracks.
A different formulation of the same problem would lead to the problem being characterized
differently.
The recoverability of a problem plays an important role in determining the complexity of the control structure necessary for its solution. Ignorable problems can be solved using a simple control structure that never backtracks; such a control structure is easy to implement. Recoverable problems can be solved by a simple technique called backtracking, which can be implemented using a push-down stack in which decisions are recorded in case they later need to be undone. Irrecoverable problems must be solved by a system that expends a great deal of effort making each decision, since each decision must be final. Some irrecoverable problems can be solved by recoverable-style methods used in a planning process, in which an entire sequence of steps is analysed in advance to discover where it will lead before the first step is actually taken.
Generally,
For certain-outcome problems, planning can be used to generate a sequence of operators that is guaranteed to lead to a solution.
For uncertain-outcome problems, a sequence of generated operators can only have a good probability of leading to a solution. Plan revision is made as the plan is carried out and the necessary feedback is provided.
Is a good solution absolute or relative?
In theorem proving, different reasoning paths lead to the answer, and it does not matter which path we follow; a justification step such as "All men are mortal (Axiom 4)" can come from any valid derivation. By contrast, in the travelling salesman problem the goal is to find the shortest route that visits each city exactly once, so solutions must be compared.
Generally,
Any-path problems can be solved in a reasonable amount of time using heuristics that suggest good paths to explore. If the heuristics are not perfect, the search for a solution may not be as direct as possible.
"The bank president ate a dish of pasta salad with the fork."
Does "bank" refer to a financial institution or to the side of a river?
Was the "dish" or the "pasta salad" eaten?
Does "pasta salad" contain pasta, in the way that "dog food" does not contain dog?
Which part of the sentence does "with the fork" modify? What if it were "with vegetables"?
For such an understanding problem, no record of the processing is necessary.
The Water Jug Problem: here the path that leads to the goal must be reported.
A path-solution problem can be reformulated as a state-solution problem by describing a state as a partial path to a solution. The question is whether that is natural or not.
Playing chess; reading a newspaper.
It is useful for computers to be programmed to solve problems in ways that the majority of people would not be able to understand. This is possible when human-computer interaction is good.
Chess
Problem characteristic                      | Satisfied | Reason
Is the problem decomposable?                | No        | One game has a single solution.
Is the problem universe predictable?        | No        | The problem universe is not predictable, as we are not sure about the move of the other (second) player.
Is a good solution absolute or relative?    | Absolute  | Relative solution: once you get one solution you have to find another possible solution to check which solution is best (i.e. lowest cost).
What is the role of knowledge?              |           | A lot of knowledge helps to constrain the search for a solution.
Does the task require human interaction?    | Conversational |
A production system is a mechanism that describes and performs the search process. It consists of
1. A set of rules.
2. One or more knowledge bases/databases.
3. A control strategy.
4. A rule applier.
Consider the water jug problem. Suppose we implemented the simple control strategy of starting each time at the top of the list of rules and choosing the first applicable one.
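The water jug production system can be sketched in Python. This is an illustrative sketch, not the text's implementation: it uses the standard 4-gallon/3-gallon formulation, and a systematic BFS control strategy is used instead of "first applicable rule" so that the search is guaranteed to make progress:

```python
from collections import deque

def water_jug_bfs(cap_x=4, cap_y=3, target=2):
    """Water jug problem as a production system: states are (x, y) water
    amounts, the rules generate successor states, and a systematic (BFS)
    control strategy decides which applicable rule to fire."""

    def rules(x, y):
        # Each production rule maps the current state to a new state.
        return [
            (cap_x, y),   # fill the 4-gallon jug
            (x, cap_y),   # fill the 3-gallon jug
            (0, y),       # empty the 4-gallon jug
            (x, 0),       # empty the 3-gallon jug
            (min(cap_x, x + y), max(0, y - (cap_x - x))),  # pour 3 -> 4
            (max(0, x - (cap_y - y)), min(cap_y, x + y)),  # pour 4 -> 3
        ]

    queue = deque([[(0, 0)]])
    seen = {(0, 0)}
    while queue:
        path = queue.popleft()
        x, y = path[-1]
        if x == target:          # goal: exactly 2 gallons in the big jug
            return path
        for state in rules(x, y):
            if state not in seen:
                seen.add(state)
                queue.append(path + [state])
    return None

print(water_jug_bfs())
```

The `seen` set is what makes the control strategy systematic: no state is ever revisited, so the search cannot fill and empty the same jug forever.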
2. The second requirement is that it be systematic. The requirement that a control strategy be systematic corresponds to the need for global motion as well as for local motion.
We have argued that production systems are a good way to describe the operations that can
be performed in a search for a solution to a problem.
1. Can production systems, like problems, be described by a set of characteristics that shed
some light on how they can easily be implemented?
2. If so, what relationships are there between problem types and the types of production
system best suited to solving the problems?
                           | Monotonic          | Nonmonotonic
Partially commutative      | Theorem proving    | Blocks world, 8 puzzle
Not partially commutative  | Chemical synthesis | Bridge
The significance of these categories of production systems lies in the relationship between the
categories and approximate implementation strategies.
For any problem, there exist an infinite number of production systems that describe ways to find solutions. Some will be more natural or more efficient than others. Any problem that can be solved by any production system can be solved by a commutative one.
There is a relationship between kinds of problems and the kinds of systems that lend
themselves naturally to describe the problem.
Nonmonotonic partially commutative systems are useful for problems in which changes
occur but can be reversed and in which order of operations is not critical.
A constraint search does not refer to any specific search algorithm but to a layer of complexity added to existing algorithms that limits the possible solution set. Heuristic and acquired knowledge can be combined to produce the desired result. A constraint satisfaction problem is a special kind of search problem in which states are defined by the values of a set of variables and the goal state specifies a set of constraints that the values must obey. There are many problems in AI in which the goal state is not specified in the problem and must be discovered according to some specific constraint. Examples of constraint satisfaction search include design problems, labelling graphs, robot path planning and cryptarithmetic problems.
31
Algorithm:
Cryptarithmetic puzzles (e.g. TWO + TWO = FOUR)
• Variables: F, T, U, W, R, O, X1, X2, X3
• Domains: {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
• Constraints:
  – Alldiff(F, T, U, W, R, O)
  – O + O = R + 10 · X1
  – X1 + W + W = U + 10 · X2
  – X2 + T + T = O + 10 · X3
  – X3 = F, T ≠ 0, F ≠ 0
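The constraints above can be checked with a brute-force Python sketch (a naive generate-and-test solver, not an efficient CSP algorithm): try assignments of distinct digits to the letters and keep those satisfying the arithmetic and the leading-digit rules.

```python
from itertools import permutations

def solve_two_two_four():
    """Brute-force constraint satisfaction for TWO + TWO = FOUR: enumerate
    assignments of distinct digits to F, T, U, W, R, O and keep those that
    satisfy the arithmetic and leading-digit constraints."""
    solutions = []
    for f, t, u, w, r, o in permutations(range(10), 6):
        if f == 0 or t == 0:               # leading digits cannot be zero
            continue
        two = 100 * t + 10 * w + o
        four = 1000 * f + 100 * o + 10 * u + r
        if two + two == four:              # the core constraint
            solutions.append({'F': f, 'T': t, 'U': u, 'W': w, 'R': r, 'O': o})
    return solutions

sols = solve_two_two_four()
print(len(sols), sols[0])
```

The carry variables X1, X2, X3 are implicit here: checking TWO + TWO = FOUR digit-for-digit is equivalent to the column constraints listed above.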
6. What are the problems encountered during hill climbing, and what ways are available to deal with these problems?
The hill climbing search algorithm is simply a loop that continuously moves in the direction of increasing value. It stops when it reaches a "peak" where no neighbour has a higher value. This algorithm is considered one of the simplest procedures for implementing heuristic search. The name comes from the idea that if you are trying to find the top of a hill, you go in the uphill direction from wherever you are. This heuristic combines the advantages of both depth first and breadth first searches into a single method.
The name hill climbing is derived from simulating the situation of a person climbing a hill. The person tries to move forward in the direction of the top of the hill, and movement stops on reaching the peak, where no neighbouring state has a higher value of the heuristic function. Hill climbing uses knowledge about the local terrain, providing a very useful and effective heuristic for eliminating much of the unproductive search space; the search is guided by a local evaluation function. Hill climbing is a variant of generate and test in which feedback from the test procedure is used to decide in which direction the search should proceed: at each point in the search path, a successor node that appears to lead closer to the goal is selected for exploration.
Algorithm:
Step 1: Evaluate the starting state. If it is a goal state, then stop and return success.
Step 2: Else, continue with the starting state as the current state.
Step 3: Repeat Step 4 until a solution is found or there are no new operators left to apply
to the current state.
Step 4: Select an operator that has not yet been applied to the current state and apply it to
produce a new state. Evaluate the new state:
- If the new state is a goal state, then stop and return success.
- If it is better than the current state, then make it the current state and proceed further.
- If it is not better than the current state, then continue in the loop.
Step 5: Exit.
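The steps above can be sketched in Python. In this sketch, `value` and `neighbours` are assumed problem-specific helpers (not from the text), illustrated on a toy one-dimensional problem of maximizing -(x - 7)²:

```python
# A minimal sketch of simple hill climbing following the steps above:
# move to the first neighbour that improves the current state, and stop
# when no neighbour is better (a "peak").
def value(x):
    return -(x - 7) ** 2          # toy objective, global maximum at x = 7

def neighbours(x):
    return [x - 1, x + 1]         # operators: step left or right

def simple_hill_climb(start):
    current = start
    while True:
        improved = False
        for n in neighbours(current):        # Step 4: try unapplied moves
            if value(n) > value(current):    # better than the current state?
                current = n                  # make it the current state
                improved = True
                break
        if not improved:                     # no neighbour is better: peak
            return current

print(simple_hill_climb(0))   # prints 7
```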
Advantages:
The hill climbing technique is useful in job-shop scheduling, automatic programming, circuit
design, vehicle routing and portfolio management. It is also helpful in solving pure
optimization problems, where the objective is to find the best state according to the objective
function. It requires far fewer conditions than other search techniques.
Disadvantages:
The algorithm doesn’t maintain a search tree, so the current node data structure need only
record the state and its objective function value. It assumes that local improvement will lead
to global improvement. There are several situations in which hill climbing often gets stuck,
as described below.
Local Maxima:
A local maximum is a state that is better than each of its neighbouring states, but not better
than some other states further away. Generally this state is lower than the global maximum.
At this point, one cannot easily decide in which direction to move. This difficulty can be
overcome by backtracking: go back to an earlier node and try a different direction. To
implement this strategy, maintain a list of paths almost taken; if the path taken leads to a
dead end, go back to one of them.
Figure Local Maxima
Ridges:
A ridge is a special type of local maximum. It is an area of the search space that results in a
sequence of local maxima; the ridge itself has a slope which is difficult to traverse. In this
type of situation, apply two or more rules before doing the test. This corresponds to moving
in several directions at once.
Figure Ridges
Plateau:
It is a flat area of the search space in which the neighbouring states have the same value, so it
is very difficult to determine the best direction in which to move. To get out of this situation,
make a big jump in some direction to move to a new area of the search space; this is the best
way to handle a plateau.
Figure Plateau
Backtrack to some earlier node and try a different direction. This is a good way of
dealing with local maxima.
Make a big jump in some direction to a new area in the search space. This can be done by
applying two or more rules, or the same rule several times, before testing. This is a
good strategy for dealing with plateaux and ridges. Hill climbing becomes inefficient in
large problem spaces, and when combinatorial explosion occurs. But it is useful
when combined with other methods.
Steepest-Ascent Hill Climbing
This is a variation of simple hill climbing which considers all the moves from the current
state and selects the best one as the next state.
Also known as gradient search.
Algorithm
1. Evaluate the initial state. If it is also a goal state, then return it and quit. Otherwise,
continue with the initial state as the current state.
2. Loop until a solution is found or until a complete iteration produces no change to the
current state:
a. Let SUCC be a state such that any possible successor of the current state will be better
than SUCC.
b. For each operator that applies to the current state:
- Apply the operator and generate a new state.
- Evaluate the new state. If it is a goal state, then return it and quit. If not, compare it
to SUCC. If it is better, then set SUCC to this state; if it is not better, leave SUCC alone.
c. If SUCC is better than the current state, then set the current state to SUCC.
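The algorithm above can be sketched in Python. Here `value` and `neighbours` are assumed problem-specific helpers (not from the text), illustrated on maximizing -(x² + y²) over integer grid moves, whose peak is at (0, 0):

```python
# A minimal sketch of steepest-ascent hill climbing (gradient search):
# examine ALL successors of the current state, move to the best one,
# and stop when a complete iteration produces no improvement.
def value(state):
    x, y = state
    return -(x * x + y * y)       # toy objective, peak at (0, 0)

def neighbours(state):
    x, y = state
    return [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]

def steepest_ascent(start):
    current = start
    while True:
        succ = max(neighbours(current), key=value)  # best successor (SUCC)
        if value(succ) > value(current):
            current = succ        # best successor becomes the current state
        else:
            return current        # complete iteration produced no change

print(steepest_ascent((5, -3)))   # prints (0, 0)
```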
7. Explain the various problem solving methods. (December 2011)
A problem in AI is defined as a task to be completed.
Most of the problems are complex and cannot be solved by direct techniques.
They can be solved by search strategies.
Every search process can be viewed as a traversal of a directed graph in which
each node represents a problem state.
Arcs represent the relationships between the states represented by the nodes they
connect.
The search process starts with an initial state and ends in a final state, tracing out
a path.
Five important issues have to be considered:
1. The direction in which to conduct the search.
2. The topology of the search process.
3. How each node of the search process will be represented.
4. Selecting applicable rules.
5. Using a heuristic function to guide the search.
The various methods used to solve problems are:
1. Problem graph
2. Matching
3. Heuristic functions
1. Problem Graph
Each node of the tree is expanded by the production rules to generate a set of successor
nodes, each of which can in turn be expanded, continuing until a node representing a solution
is found.
Example: Water Jug Problem- two levels of Breadth first search Tree.
Implementing this procedure is simple and requires little bookkeeping.
Disadvantage
The same path may be generated many times, so processing takes place more than once.
This wastage can be avoided by bookkeeping and elimination of redundant nodes:
1. Examine the set of nodes that have been created so far to see if the new node already
exists.
2. If it doesn’t, add it to the graph just as for a tree.
3. If it is already present, then:
a. Set the node that is being expanded to point to the already existing node
corresponding to its successor, rather than to the new one, and discard the new node.
b. If we keep track of the best path to each node, check whether the new path is
better or worse than the old one:
i. If worse, do nothing.
ii. If better, record the new path as the correct path and propagate the change in
cost to all its successor nodes.
2. Matching
Matching is the process of comparing the current state with the preconditions of the
rules to determine which rules are applicable. Matching can be done by:
1. Indexing
2. Matching with variables.
3. Complex matching
4. Filtering.
1. Indexing
Use the current state as an index into the rules and select the matching ones
immediately.
There are two problems:
i. It requires the use of a large number of rules.
ii. It is not always obvious whether a rule’s preconditions are satisfied by a
particular state.
Instead of searching through the rules, use the current state as an index and select
the matching rules directly.
In a game of chess, an index can be assigned to each board position. This
scheme can be used while searching.
Filtering
The result of matching is a list of rules whose left sides have matched the current state
description, along with the variable bindings generated by the matching process.
Heuristic Function
A heuristic function is a function that maps problem state
descriptions to measures of desirability, usually represented as numbers.
A heuristic is a technique that improves the efficiency of a search process,
possibly by sacrificing claims of completeness.
The purpose of a heuristic function is to guide the search process in the most
profitable direction by suggesting which path to follow first when more than
one is available.
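A short concrete illustration may help. The "misplaced tiles" heuristic for the 8-puzzle is a standard example of a heuristic function; it is an assumed illustration, not an example from the text above:

```python
# A heuristic function maps a problem state to a number measuring its
# desirability. For the 8-puzzle, the "misplaced tiles" heuristic counts
# the tiles that are not in their goal position (the blank is not counted).
GOAL = (1, 2, 3,
        4, 5, 6,
        7, 8, 0)    # 0 denotes the blank

def misplaced_tiles(state):
    """Count tiles out of place relative to GOAL."""
    return sum(1 for tile, goal in zip(state, GOAL)
               if tile != 0 and tile != goal)

state = (1, 2, 3,
         4, 0, 6,
         7, 5, 8)
print(misplaced_tiles(state))   # prints 2: tiles 5 and 8 are misplaced
```

A search guided by such a function expands states with smaller heuristic values first, since they appear closer to the goal.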
8. Explain Means-Ends Analysis with an example. (December 2011)
Most of the search strategies reason either forward or backward; however, often a
mixture of the two directions is appropriate.
Such a mixed strategy makes it possible to solve the major parts of a problem
first and then solve the smaller problems that arise when combining them together.
Such a technique is called "Means-Ends Analysis".
The means-ends analysis process centres around finding the difference between the
current state and the goal state.
The problem space of means-ends analysis has an initial state and one or more
goal states, a set of operators with a set of preconditions for their application, and a
difference function that computes the difference between two states s(i) and s(j).
A problem is solved using means-ends analysis by:
1. Comparing the current state s1 to a goal state s2 and computing their difference
D12.
2. Selecting an operator OP that is relevant to reducing the difference D12.
3. Applying the operator OP if possible. If not, a sub-goal is created to reach a state
in which OP can be applied, and means-ends analysis is applied recursively to solve
the sub-goal.
4. When the sub-goal is solved, the state is restored and work resumed on the original
problem.
(The first AI program to use means-ends analysis was GPS, the General Problem
Solver.)
EXAMPLE
Problem for household robot: moving desk with 2 things on it from one room to
another.
Main difference between start and goal state is location.
Choose PUSH and CARRY
Move desk with 2 things on it to new room
Difference table:

Difference          Push  Carry  Walk  Pickup  Putdown  Place
Move object          *     *
Move robot                        *
Clear object                             *
Be holding object                        *
THE ROBOT OPERATORS
Operator          Preconditions                   Results
PUSH (obj, loc)   at(robot, obj) & large(obj)     at(obj, loc) & at(robot, loc)
                  & clear(obj) & armempty
CARRY (obj, loc)  at(robot, obj) & small(obj)     at(obj, loc) & at(robot, loc)
UNIT II REPRESENTATION OF KNOWLEDGE
PART A
5. What is meant by alpha-beta pruning? What is the difference between declarative
knowledge and procedural knowledge? (May 2014)
Dec-11
10. What are the limitations in using propositional logic to represent a knowledge
base?
May-11
1) It has limited expressive power
2) It cannot directly represent properties of individuals or relations between
individuals.
3) Generalizations, patterns and regularities cannot easily be represented.
4) Many rules have to be written to allow inference.
11. Specify the complexity of expectiminimax.
The time complexity is O(b^m · n^m), where:
- b is the branching factor,
- n is the number of chance nodes for each MAX or MIN node, and
- m is the maximum depth.
This extra cost can make games of chance very difficult to solve.
12. What is unification algorithm?
In propositional logic it is easy to determine that two literals cannot both be true at the
same time: simply look for L and ~L. In predicate logic, this matching process is more
complicated, since the bindings of variables must be considered.
For example, man(John) and ~man(John) is a contradiction, while man(John) and
~man(Himalayas) is not. Thus, in order to determine contradictions, we need a matching
procedure that compares two literals and discovers whether there exists a set of substitutions
that makes them identical. There is a recursive procedure that does this matching; it is called
the unification algorithm.
In the unification algorithm each literal is represented as a list, where the first element is the
name of a predicate and the remaining elements are arguments. An argument may be a single
element (atom) or may be another list.
13. How can you represent the resolution in predicate logic?
1. Convert all the statements of S to clausal form.
2. Negate P and convert the result to clausal form. Add it to the set of clauses
obtained in step 1.
3. Repeat resolving pairs of clauses until a contradiction (the empty clause) is found.
14. List the canonical forms of resolution.
eliminate implications
move negations down to the atomic formulas
purge existential quantifiers
rename variables if necessary
move the universal quantifiers to the left
move the disjunctions down to the literals
eliminate the conjunctions
rename the variables if necessary
purge the universal quantifiers
15. State the use of unification.
Examples: “red” represents the colour red, “car1” represents my car, and "red(car1)"
represents the fact that my car is red.
Assumptions about KR :
Intelligent Behavior can be achieved by manipulation of symbol structures.
KR languages are designed to facilitate operations over symbol structures and have a
precise syntax and semantics. Syntax tells us which expressions are legal,
e.g., red1(car1), red1 car1, car1(red1), red1(car1 & car2); semantics tells us
what an expression means.
Logic
Logic is concerned with the truth of statements about the world. Generally each
statement is either TRUE or FALSE. Logic includes: Syntax, Semantics and Inference
Procedure.
1. Syntax:
Specifies the symbols in the language and how they can be combined to form
sentences. The facts about the world are represented as sentences in logic.
2. Semantics:
Specifies how to assign a truth value to a sentence based on its meaning in the world.
It specifies what facts a sentence refers to. A fact is a claim about the world, and it may be
TRUE or FALSE.
3. Inference Procedure:
Specifies methods for computing new sentences from the existing sentences.
Logic as a KR Language
Logic is a language for reasoning, a collection of rules used while doing logical
reasoning. Logic is studied as KR languages in artificial intelligence. Logic is a formal
system in which the formulas or sentences have true or false values. To serve as a KR
language, a logic must be:
(a) expressive enough to represent the important objects and relations in a problem domain, and
(b) efficient enough in reasoning and answering questions about implicit information in a
reasonable amount of time.
Logics are of different types: propositional logic, predicate logic, temporal logic, modal
logic, description logic, etc.
Propositional logic and Predicate logic are fundamental to all logic.
Logic Representation
The facts are claims about the world that are TRUE or FALSE.
Logic defines ways of putting symbols together so that a user can define legal
sentences in the language that represent TRUE facts.
Sentences that are either TRUE or FALSE, but not both, are called propositions.
For example, the declarative sentence "snow is white" expresses that snow is white;
further, "snow is white" expresses that "snow is white" is TRUE.
In propositional logic it is easy to determine that two literals cannot both be true at the
same time. Simply look for L and ~L . In predicate logic, this matching process is more
complicated, since bindings of variables must be considered.
For example, man(John) and ~man(John) is a contradiction, while man(John) and
~man(Himalayas) is not. Thus, in order to determine contradictions, we need a matching
procedure that compares two literals and discovers whether there exists a set of substitutions
that makes them identical. There is a recursive procedure that does this matching; it is called
the unification algorithm.
In the unification algorithm each literal is represented as a list, where the first element is the
name of a predicate and the remaining elements are arguments. An argument may be a single
element (atom) or may be another list. For example, we can have literals such as
To unify two literals, first check whether their first elements are the same. If so, proceed;
otherwise they cannot be unified. For example, the literals
(try assassinate Marcus Caesar)
Cannot be unified. The unification algorithm recursively matches pairs of elements, one pair
at a time. The matching rules are :
i) Different constants, functions or predicates cannot match, whereas identical ones can.
ii) A variable can match another variable, any constant, or a function or predicate expression,
subject to the condition that the function or predicate expression must not contain any
instance of the variable being matched (otherwise it will lead to infinite recursion).
iii) The substitution must be consistent. Substituting y for x now and then z for x later is
inconsistent. (A substitution of y for x is written y/x.)
The Unification algorithm is listed below as a procedure UNIFY (L1, L2). It returns a list
representing the composition of the substitutions that were performed during the match. An
empty list NIL indicates that a match was found without any substitutions. If the list contains
a single value F, it indicates that the unification procedure failed.
else return F.
(At the end of this procedure, SUBST will contain all the substitutions used to unify L1 and
L2.)
i) Call UNIFY with the i-th element of L1 and the i-th element of L2, putting the result in S.
(A) apply S to the remainder of both L1 and L2
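The matching rules above can be sketched in Python. The representation below is an assumption for illustration (literals as Python lists whose first element is the predicate name, variables as strings starting with `?`); only the recursive matching and the occurs check come from the description above:

```python
# A minimal sketch of the unification algorithm described above.
def is_variable(x):
    return isinstance(x, str) and x.startswith("?")

def unify(l1, l2, subst=None):
    """Return a substitution dict unifying l1 and l2, or None on failure."""
    if subst is None:
        subst = {}
    if l1 == l2:
        return subst
    if is_variable(l1):
        return unify_var(l1, l2, subst)
    if is_variable(l2):
        return unify_var(l2, l1, subst)
    if isinstance(l1, list) and isinstance(l2, list) and len(l1) == len(l2):
        for a, b in zip(l1, l2):   # match pairs of elements, one at a time
            subst = unify(a, b, subst)
            if subst is None:
                return None
        return subst
    return None                    # different constants/predicates cannot match

def unify_var(var, x, subst):
    if var in subst:
        return unify(subst[var], x, subst)
    if occurs_in(var, x, subst):
        return None                # occurs check: avoid infinite recursion
    new = dict(subst)              # keep the substitution consistent
    new[var] = x
    return new

def occurs_in(var, x, subst):
    if var == x:
        return True
    if is_variable(x) and x in subst:
        return occurs_in(var, subst[x], subst)
    if isinstance(x, list):
        return any(occurs_in(var, e, subst) for e in x)
    return False

print(unify(["man", "?x"], ["man", "John"]))         # {'?x': 'John'}
print(unify(["man", "John"], ["man", "Himalayas"]))  # None: cannot be unified
```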
Resolution yields a complete inference algorithm when coupled with any complete search
algorithm. Resolution makes use of the inference rules. Resolution performs deductive
inference. Resolution uses proof by contradiction. One can perform Resolution from a
Knowledge Base. A Knowledge Base is a collection of facts or one can even call it a database
with all facts.
Resolution basically works by using the principle of proof by contradiction. To find the
conclusion, we negate the conclusion and apply the resolution rule to the resulting clauses.
Each pair of clauses that contains complementary literals is resolved to produce a new
clause, which is added to the set of facts (if it is not already present). This process continues
until one of two things happens: there are no new clauses that can be added, or an application
of the resolution rule derives the empty clause. An empty clause shows that the negation of
the conclusion is a complete contradiction; hence the negation of the conclusion is invalid or
false, and the assertion is completely valid or true.
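The refutation loop described above can be sketched in Python for propositional logic. The clause representation (frozensets of literal strings, with a leading `~` for negation) is an assumption for illustration:

```python
from itertools import combinations

# A minimal sketch of propositional resolution by refutation: add the
# negated conclusion to the knowledge base and resolve clauses on
# complementary literals until the empty clause appears.
def negate(lit):
    return lit[1:] if lit.startswith("~") else "~" + lit

def resolve(c1, c2):
    """Yield all resolvents of two clauses on complementary literals."""
    for lit in c1:
        if negate(lit) in c2:
            yield frozenset((c1 - {lit}) | (c2 - {negate(lit)}))

def entails(kb, conclusion):
    """Does the KB entail the (single-literal) conclusion?"""
    clauses = set(kb) | {frozenset([negate(conclusion)])}
    while True:
        new = set()
        for c1, c2 in combinations(clauses, 2):
            for r in resolve(c1, c2):
                if not r:            # empty clause: contradiction found
                    return True
                new.add(r)
        if new <= clauses:           # no new clauses can be added
            return False
        clauses |= new

# KB: p, and p -> q (written in clausal form as ~p v q). Does it entail q?
kb = [frozenset(["p"]), frozenset(["~p", "q"])]
print(entails(kb, "q"))    # True: resolving derives the empty clause
print(entails(kb, "~q"))   # False: no contradiction can be derived
```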
Steps
2. a → b = ~a v b
3. ~(a ^ b) = ~a v ~b  (De Morgan's Law)
4. ~(a v b) = ~a ^ ~b  (De Morgan's Law)
5. ~(~a) = a
Eliminate Existential Quantifier ‘∃’
Example: ∀x : ∃y : father_of(x, y)
When ‘y’ is independent of any universal quantifier, we can replace ‘y’ by any fresh name
(say, George Bush). Here, however, ‘y’ lies within the scope of ∀x, so it must be replaced by
a Skolem function of x.
Eliminate Universal Quantifier ‘∀’
To eliminate the universal quantifier, put the sentence in PRENEX NORMAL FORM and
just drop the ∀ prefix.
Eliminate ‘^’
a ^ b splits the entire clause into two separate clauses, i.e. a and b.
To eliminate ‘^’, break the clause into two; if you cannot break the clause directly, distribute
the OR ‘v’ over the ‘^’ and then break the clause.
Problem Statement:
i. ∀x : food(x) → likes (Ravi, x)
The issues that arise while using KR techniques are many. Some of these are:
1. Important attributes: Are there any attributes of objects so basic that they occur in
almost every problem domain?
2. Relationships among attributes: Are there any important relationships that exist
among object attributes?
3. Choosing granularity: At what level of detail should the knowledge be represented?
4. Sets of objects: How should sets of objects be represented?
5. Finding the right structure: Given a large amount of knowledge stored, how can
relevant parts be accessed?
1. Important Attributes:
There are two attributes "instance" and "isa", that are of general importance. These
attributes are important because they support property inheritance.
The relationship between the attributes of an object, independent of specific knowledge they
encode, may hold properties like:
Inverses, existence in an isa hierarchy, techniques for reasoning about values and single
valued attributes.
1. Inverses: This is about checking consistency when a value is added to one attribute.
Second, use attributes that focus on a single entity but use them in pairs, one the inverse of
the other; for e.g., one, team = Brooklyn-Dodgers, and the other, team-members =
Pee-Wee-Reese, . . .
This second approach is followed in semantic net and frame-based systems, accompanied by
a knowledge acquisition tool that guarantees the consistency of inverse slots by checking,
each time a value is added to one attribute, that the corresponding value is added to the
inverse.
Techniques for reasoning about values: This is about reasoning about values of attributes
not given explicitly. Several kinds of information can be used in such reasoning; for
example, a height must be in a unit of length, and the age of a person cannot be greater than
the age of the person's parents.
Single-valued attributes:
Example: A baseball player can at any time have only a single height and be a member of
only one team. KR systems take different approaches to providing support for single-valued
attributes.
3. Choosing Granularity
At what level should the knowledge be represented, and what are the primitives?
Should there be a small number of high-level facts or a large number of low-level
primitives?
High-level facts may not be adequate for inference, while low-level primitives may
require a lot of storage.
Example of Granularity :
John spotted Sue. This could be represented as Spotted (agent(John), object (Sue))
4. Set of Objects
Certain properties of objects are true of them as members of a set but not as individuals.
In a hierarchical structure where nodes represent sets, inheritance propagates set-level
assertions down to the individuals.
The exact implementation of alpha-beta keeps track of the best move for each side as it
moves throughout the tree.
We proceed in the same (preorder) way as for the minimax algorithm. For the MIN
nodes, the score computed starts with +infinity and decreases with time.
For MAX nodes, the score computed starts with –infinity and increases with time.
The efficiency of the Alpha-Beta procedure depends on the order in which successors of a
node are examined. If we were lucky, at a MIN node we would always consider the nodes in
order from low to high score and at a MAX node the nodes in order from high to low score.
In general it can be shown that in the most favorable circumstances the alpha-beta search
opens as many leaves as minimax on a game tree with double its depth.
Alpha-Beta algorithm: The algorithm maintains two values, alpha and beta, which
represent the minimum score that the maximizing player is assured of and the maximum
score that the minimizing player is assured of, respectively. Initially alpha is negative infinity
and beta is positive infinity. As the recursion progresses, the "window" becomes smaller.
When beta becomes less than alpha, it means that the current position cannot be the result of
best play by both players and hence need not be explored further.
MAX-VALUE(node, alpha, beta):
    if node is a leaf, return its evaluation
    for each successor s of node:
        alpha <- max(alpha, MIN-VALUE(s, alpha, beta))
        if alpha >= beta, return beta      (cutoff: prune remaining successors)
    return alpha
MIN-VALUE(node, alpha, beta):
    if node is a leaf, return its evaluation
    for each successor s of node:
        beta <- min(beta, MAX-VALUE(s, alpha, beta))
        if beta <= alpha, return alpha     (cutoff)
    return beta
The Min-Max algorithm is applied in two player games, such as tic-tac-toe, checkers,
chess, go, and so on.
There are two players involved, MAX and MIN. A search tree is generated, depth-first,
starting with the current game position upto the end game position. Then, the final game
position is evaluated from MAX’s point of view, as shown in Figure 1. Afterwards, the inner
node values of the tree are filled bottom-up with the evaluated values. The nodes that belong
to the MAX player receive the maximum value of their children. The nodes for the MIN
player select the minimum value of their children.
The values represent how good a game move is. So the MAX player will try to select the
move with highest value in the end. But the MIN player also has something to say about it
and he will try to select the moves that are better to him, thus minimizing MAX’s outcome.
Algorithm
MinMax(game, player) {
    if (GameEnded(game)) {
        return EvalGameState(game);
    }
    else {
        best_move <- {};
        ForEach move in moves {
            value <- MinMax(Apply(game, move), Opponent(player));
            if (value is better for player than the value of best_move)
                best_move <- move;
        }
        return best_move;
    }
}
Optimization
Searching the whole game tree is usually infeasible, so, at some price in accuracy, we can:
1. Limit the depth of the tree.
2. Speed up the algorithm.
This all means that sometimes the search can be aborted because we find out that the
search subtree won’t lead us to any viable answer. This optimization is known as alpha-beta
cutoffs.
The algorithm Have two values passed around the tree nodes:
The alpha value which holds the best MAX value found;
The beta value which holds the best MIN value found.
At a MAX level, before evaluating each child path, compare the value returned by the
previous path with the beta value. If the value is greater than beta, abort the search for the
current node.
At a MIN level, before evaluating each child path, compare the value returned by the
previous path with the alpha value. If the value is less than alpha, abort the search for the
current node.
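The cutoff rules above can be sketched in Python. The nested-list game tree (an `int` is a leaf score, a list holds child positions) is an assumed toy representation for illustration:

```python
import math

# A minimal sketch of minimax with alpha-beta cutoffs as described above.
def alphabeta(node, alpha, beta, maximizing):
    if isinstance(node, int):        # leaf: return its evaluation
        return node
    if maximizing:
        best = -math.inf
        for child in node:
            best = max(best, alphabeta(child, alpha, beta, False))
            alpha = max(alpha, best)  # best MAX value found so far
            if beta <= alpha:         # MIN already has a better option:
                break                 # prune the remaining children
        return best
    else:
        best = math.inf
        for child in node:
            best = min(best, alphabeta(child, alpha, beta, True))
            beta = min(beta, best)    # best MIN value found so far
            if beta <= alpha:
                break
        return best

# Classic 3-ply example: the root is a MAX node over three MIN nodes.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(alphabeta(tree, -math.inf, math.inf, True))  # prints 3
```

While exploring the second MIN node, the first leaf (2) drives beta below alpha (3), so its remaining children are never evaluated; that is exactly the cutoff described above.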
1. Eliminate implications
2. Move negations down to the atomic formulas
3. Purge existential quantifiers
4. Rename variables if necessary
5. Move the universal quantifiers to the left
6. Move the disjunctions down to the literals
7. Eliminate the conjunctions
8. Rename the variables if necessary
9. Purge the universal quantifiers
Example
“All music lovers who enjoy Bach either dislike Wagner or think that anyone who dislikes
any composer is a philistine''.
"x[musiclover(x) enjoy(x,Bach)
dislike(x,Wagner) ( "y[$z[dislike(y,z)] =>think-philistine(x,y)])]
Conversion
( P) <=> P;
( a b ) <=> a b;
( a b ) <=> a b;
"x P(x) <=> $x P(x); and
$x P(x) <=>"x P(x).
Step 4: Step 3 allows us to move all the quantifiers to the left in Step 4.
This is called the prenex normal form.
Step 5: Eliminate existential quantifiers. If existential quantifiers exist within the scope of
universal quantifiers, we can't replace the variable merely by an object, but rather by a
function that returns an object; the function (a Skolem function) depends on the
universally quantified variable. For example,
∀x ∃y tutor-of(y, x)
gets replaced by
∀x tutor-of(S2(x), x).
Step 6: Any variable left must be universally quantified out on the left.
Step 7: Convert the matrix into a conjunction of disjuncts (conjunctive normal form).
Step 8: Call each conjunct a separate clause. In order for the entire wff to be true, each clause
must be true separately.
Step 9: Standardize apart the variables in the set of clauses generated in steps 7 and 8. This
requires renaming the variables so that no two clauses make reference to the same variable.
All variables are implicitly universally quantified to the left.
After application to a set of wffs, we end up with a set of clauses each of which is a
disjunction of literals.
A good knowledge representation enables fast and accurate access to knowledge and
understanding of the content.
1. Representational Adequacy: The ability to represent all kinds of knowledge that are
needed in that domain.
2. Inferential Adequacy: The ability to manipulate the representational structures to
derive new structure corresponding to new knowledge inferred from old.
3. Inferential Efficiency: The ability to incorporate additional information into the
knowledge structure that can be used to focus the attention of the inference
mechanisms in the most promising direction.
4. Acquisitional Efficiency: The ability to acquire new knowledge using automatic
methods wherever possible rather than reliance on human intervention.
Knowledge Representation Schemes
1. Relational Knowledge:
A statement in which knowledge is specified, but the use to which that knowledge is
to be put is not given,
e.g. laws and people's names; these are facts which can stand alone, not dependent
on other knowledge.
Procedural Knowledge
UNIT III KNOWLEDGE INFERENCE
PART A
This system preserves the property that, at any instant of time, a statement is either
believed to be true or false.
1. Non-monotonic logic
2. Probabilistic reasoning
3. Fuzzy logic
4. Truth values.
10. State the use of inference engine. (May 2014)
An inference engine interprets and evaluates the facts in the knowledge base in
order to provide an answer
The inference engine enables the expert system to draw deductions from the rules
in the KB
11. Define fuzzy logic.
FL is a problem-solving control system methodology that lends itself to
implementation in systems ranging from simple, small, embedded micro-controllers to large,
networked, multi-channel PC or workstation-based data acquisition and control systems. It
can be implemented in hardware, software, or a combination of both. FL provides a simple
way to arrive at a definite conclusion based upon vague, ambiguous, imprecise, noisy, or
missing input information. FL's approach to control problems mimics how a person would
make decisions, only much faster.
i. MB(h, e) measures the extent to which the evidence e supports the
hypothesis h; it is zero if the evidence fails to support the hypothesis.
ii. MD(h, e) measures the extent to which the evidence supports the negation of
the hypothesis; it is zero if the evidence fails to support the negation.
iii. These two factors define the certainty factor as:
CF(h, e) = MB(h, e) - MD(h, e)
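The formula can be illustrated numerically; the MB and MD values below are assumed purely for illustration:

```python
# A small numeric illustration of the certainty factor formula above.
MB = 0.8   # measure of belief: degree to which evidence supports h
MD = 0.2   # measure of disbelief: degree to which evidence supports ~h
CF = MB - MD   # certainty factor of hypothesis h given evidence e
print(CF)
```

A CF near +1 indicates strong belief in the hypothesis, near -1 strong disbelief, and near 0 that the evidence is inconclusive.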
PART B
1. Explain about forward chaining and backward chaining with proper example.
(June 2013, November 2013)
FORWARD CHAINING
Forward chaining works from the facts to a conclusion; it is sometimes called the data-driven
approach. To chain forward, match data in working memory against the ‘conditions’ of rules
in the rule base. It starts with the facts and sees what rules apply (and hence what should be
done) given the facts.
WORKING
Facts are held in a working memory. Condition-action rules represent actions to take when
specified facts occur in working memory. Typically the actions involve adding or deleting
facts from working memory.
STEPS
•To chain forward, match data in working memory against 'conditions' of rules in the
rule base.
Example
•Here are two rules:
Repeat: fire an applicable rule (e.g. rule R2), performing its sequence of actions
(e.g. DELETE dry, by R5), until no more rules apply.
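The data-driven cycle can be sketched in Python. The rules below are an assumed illustration in the spirit of the sprinkler example (not taken verbatim from the text): each rule pairs a set of conditions with a fact to add:

```python
# A minimal sketch of forward chaining on condition-action rules:
# repeatedly match rule conditions against working memory and add
# conclusions until nothing new can be derived.
RULES = [
    ({"hot", "smoky"}, "fire"),
    ({"alarm_beeps"}, "smoky"),
    ({"fire"}, "switch_on_sprinklers"),
]

def forward_chain(facts):
    facts = set(facts)            # working memory
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in RULES:
            if conditions <= facts and conclusion not in facts:
                facts.add(conclusion)   # fire the rule
                changed = True
    return facts

# Start with the data and see what conclusions follow.
print(forward_chain({"alarm_beeps", "hot"}))
# derives: smoky, then fire, then switch_on_sprinklers
```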
BACKWARD CHAINING
Backward chaining works from the conclusion to the facts; it is sometimes called the
goal-driven approach. It starts with something to find out, and looks for rules that will help
in answering it.
Steps in BC
•To chain backward, match a goal in working memory against 'conclusions' of rules in
the rule-base.
Example
•Same rules:
Prove goal G
Encoding of rules
Goal : Should I switch sprinklers on?
Example 1
Example 2
Fact F2 : hot [Given]
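The goal-driven procedure can be sketched in Python over the same assumed sprinkler-style rules (an illustration, not taken verbatim from the text): to prove a goal, match it against rule conclusions and recursively prove their conditions:

```python
# A minimal sketch of backward chaining: a goal holds if it is a known
# fact, or if some rule concludes it and all of that rule's conditions
# can themselves be proved.
RULES = [
    ({"hot", "smoky"}, "fire"),
    ({"alarm_beeps"}, "smoky"),
    ({"fire"}, "switch_on_sprinklers"),
]

def backward_chain(goal, facts):
    if goal in facts:
        return True
    for conditions, conclusion in RULES:
        if conclusion == goal and all(
                backward_chain(c, facts) for c in conditions):
            return True
    return False

# Goal: should I switch the sprinklers on, given these facts?
print(backward_chain("switch_on_sprinklers", {"alarm_beeps", "hot"}))  # True
print(backward_chain("switch_on_sprinklers", {"alarm_beeps"}))         # False
```

Notice the contrast with forward chaining: here only the rules relevant to the goal are ever examined.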
A production system consists of:
- a global database,
- a set of production rules, and
- a control system.
The global database is the central data structure used by an AI production system. The
production rules operate on the global database. Each rule has a precondition that is
either satisfied or not by the database. If the precondition is satisfied, the rule can be
applied. Application of the rule changes the database. The control system chooses which
applicable rule should be applied and ceases computation when a termination condition
on the database is satisfied. If several rules are to fire at the same time, the control
system resolves the conflicts.
One important disadvantage is that it may be very difficult to analyse the
flow of control within a production system, because the individual rules don’t call
each other.
Production systems describe the operations that can be performed in a search for a
solution to the problem.
They can be classified as follows.
Monotonic production system :-
A system in which the application of a rule never prevents the later application of
another rule, that could have also been applied at the time the first rule was
selected.
Partially commutative production system:-
A production system with the property that if the application of a particular sequence of
rules transforms state x into state y, then any permutation of those rules that is also
allowable transforms state x into state y.
Production systems that are not partially commutative are useful for many problems
in which irreversible changes occur, such as chemical analysis.
Each frame has its own name and a set of attributes associated with it. Name, weight, height
and age are slots in the frame Person.
Model, processor, memory and price are slots in the frame Computer. Each attribute or slot
has a value attached to it.
Frames provide a natural way for the structured and concise representation of knowledge.
A frame provides a means of organising knowledge in slots to describe various attributes and
characteristics of the object.
EXAMPLE
An object combines both data structure and its behaviour in a single entity.
When an object is created in an object-oriented programming language, we first assign a
name to the object, and then determine a set of attributes to describe the object’s
characteristics, and at last write procedures to specify the object’s behaviour.
A knowledge engineer refers to an object as a frame (the term has become AI jargon).
Frames as a knowledge representation technique:
Slots are used to store values. A slot may contain a default value or a pointer to another
frame, a set of rules or a procedure by which the slot value is obtained.
The frame IBM Aptiva S35 might be a member of the class Computer, which in turn
might belong to the class Hardware.
Slot value.
A slot value can be symbolic, numeric or Boolean. For example, the slot Name has
symbolic values, and the slot Age numeric values. Slot values can be assigned when the
frame is created or during a session with the expert system.
Default slot value. The default value is taken to be true when no evidence to the contrary
has been found. For example, a car frame might have four wheels and a chair frame four legs
as default values in the corresponding slots.
Range of the slot value. The range of the slot value determines whether a particular object
complies with the stereotype requirements defined by the frame. For example, the cost of a
computer might be specified between $750 and $1500.
Procedural information. A slot can have a procedure attached to it, which is executed if the
slot value is changed or needed.
The word frame often has a vague meaning. The frame may refer to a particular
object, for example the computer IBM Aptiva S35, or to a group of similar objects. To be
more precise, we will use the instance-frame when referring to a particular object, and the
class-frame when referring to a group of similar objects.
A class-frame describes a group of objects with common attributes. Animal, person, car and
computer are all class-frames.
Each frame “knows” its class.
INSTANCES OF A CLASS
The fundamental idea of inheritance is that attributes of the class-frame represent things that
are typically true for all objects in the class. However, slots in the instance-frames can be
filled with actual data uniquely specified for each instance.
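The slot mechanism and inheritance described above can be sketched in Python. This is an illustrative sketch only: the `Frame` class, and the slot names and values chosen for the IBM Aptiva S35 instance, are assumptions for the example, not part of the text.

```python
# Minimal frame system: each frame has a name, an optional parent frame
# (its class-frame), and a dict of slots holding attribute values.
class Frame:
    def __init__(self, name, parent=None, **slots):
        self.name = name
        self.parent = parent
        self.slots = slots

    def get(self, slot):
        """Look up a slot value; fall back to the parent frame (inheritance)."""
        if slot in self.slots:
            return self.slots[slot]
        if self.parent is not None:
            return self.parent.get(slot)
        raise KeyError(slot)

# Class-frame with a default slot value shared by all computers
computer = Frame("Computer", warranty="1 year")

# Instance-frame: inherits the default, and carries its own unique data
aptiva = Frame("IBM Aptiva S35", parent=computer,
               model="Aptiva S35", processor="Pentium", price=1099)

print(aptiva.get("price"))     # instance data: 1099
print(aptiva.get("warranty"))  # default inherited from the class-frame
```

The lookup in `get` is exactly the inheritance idea above: an instance-frame answers from its own slots first, and otherwise "knows" its class and asks it.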
1. Generalisation denotes an a-kind-of or is-a relationship between a superclass and
its subclasses. For example, a car is a vehicle.
2. Aggregation is a-part-of or part-whole relationship in which several subclasses
representing components are associated with a superclass representing a whole. For
example, an engine is a part of a car.
3. Association describes some semantic relationship between different classes which are
unrelated otherwise. For example, Mr Black owns a house, a car and a computer.
Such classes as House, Car and Computer are mutually independent, but they are
linked with the frame Mr Black through the semantic association.
Expert systems are required not only to store the knowledge but also to validate and
manipulate this knowledge. To add actions to our frames, we need methods and demons.
We write a method for a specific attribute to determine the attribute’s value or execute a
series of actions when the attribute’s value changes.
3. Explain the need for fuzzy sets and fuzzy logic with an example. (May-12)
The word "fuzzy" means "vagueness". Fuzziness occurs when the boundary of a piece of
information is not clear-cut.
Fuzzy sets have been introduced by Lotfi A. Zadeh (1965) as an extension of the classical
notion of set.
Classical set theory allows the membership of the elements in the set in binary terms, a
bivalent condition - an element either belongs or does not belong to the set.
Fuzzy set theory permits the gradual assessment of the membership of elements in a set,
described with the aid of a membership function valued in the real unit interval [0, 1].
Fuzzy Set Theory
Fuzzy set theory is an extension of classical set theory where elements have varying
degrees of membership. A logic based on the two truth values, True and False, is sometimes
inadequate when describing human reasoning. Fuzzy logic uses the whole interval between
0 (false) and 1 (true) to describe human reasoning.
Fuzzy logic is derived from fuzzy set theory dealing with reasoning that is approximate
rather than precisely deduced from classical predicate logic.
Fuzzy logic is capable of handling inherently imprecise concepts.
Fuzzy set theory defines Fuzzy Operators on Fuzzy Sets.
Crisp and Non-Crisp Sets
A crisp set is one whose characteristic function μA(x) takes only the two values
0 ('false') and 1 ('true').
For a non-crisp set, the characteristic function μA(x) is generalised: it may take
any value in the interval [0, 1].
Example:
if students 1.8 m tall are to be qualified as 'tall', should we exclude a student who is
1/10" shorter? Or should we exclude a student who is 1" shorter?
Non-Crisp Representation to represent the notion of a tall person
A student of height 1.79m would belong to both tall and not tall sets with a particular
degree of membership.
As the height increases the membership grade within the tall set would increase whilst
the membership grade within the not-tall set would decrease.
Capturing Uncertainty
Instead of avoiding or ignoring uncertainty, Lotfi Zadeh introduced Fuzzy Set theory that
captures uncertainty.
Each membership function maps elements of a given universal base set X , which is
itself a crisp set, into real numbers in [0, 1] .
Example:
Fig. 2 Membership function of a Crisp set C and Fuzzy set F
In the case of crisp sets a member is either out of the set, with membership of
degree 0, or in the set, with membership of degree 1.
Therefore, Crisp Sets ⊂ Fuzzy Sets.
In other words, crisp sets are special cases of fuzzy sets.
Fuzzy Set
A fuzzy set is any set that allows its members to have different degrees of
membership, given by a membership function with values in the interval [0, 1].
The value A(x) is the membership grade of the element x in a fuzzy set A.
Set SMALL = {{1, 1 }, {2, 1 }, {3, 0.9}, {4, 0.6}, {5, 0.4}, {6, 0.3}, {7, 0.2},
{8, 0.1}, {9, 0 }, {10, 0 }, {11, 0}, {12, 0}}
Note that a fuzzy set can be defined precisely by associating with each x , its grade of
membership in SMALL.
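The set SMALL above can be written down directly as a mapping from each element to its membership grade, together with the standard fuzzy operators (complement, union, intersection). The operator definitions are the usual min/max ones; they are a sketch, not taken from the text.

```python
# The fuzzy set SMALL from the text: each element of the universal space
# X = {1, ..., 12} mapped to its membership grade.
SMALL = {1: 1, 2: 1, 3: 0.9, 4: 0.6, 5: 0.4, 6: 0.3,
         7: 0.2, 8: 0.1, 9: 0, 10: 0, 11: 0, 12: 0}

def complement(A):
    """NOT A: membership 1 - A(x)."""
    return {x: 1 - m for x, m in A.items()}

def union(A, B):
    """A OR B: pointwise maximum (a standard fuzzy operator)."""
    return {x: max(A[x], B[x]) for x in A}

def intersection(A, B):
    """A AND B: pointwise minimum."""
    return {x: min(A[x], B[x]) for x in A}

print(SMALL[4])               # 0.6: the number 4 is "fairly small"
print(complement(SMALL)[4])   # 0.4: and "fairly not-small" at the same time
```

Note how the element 4 belongs to both SMALL and NOT-SMALL with different degrees, just as a 1.79 m student belongs to both the tall and not-tall sets.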
• Definition of Universal Space
Originally the universal space for fuzzy sets in fuzzy logic was defined only on the
integers. Now, the universal space for fuzzy sets and fuzzy relations is defined with
three numbers.
The first two numbers specify the start and end of the universal space, and the third
argument specifies the increment between elements.
This gives the user more flexibility in choosing the universal space.
Fuzzy Membership
The value A(x) is the degree of membership of the element x in a fuzzy set A. The
Graphic Interpretation of fuzzy membership for the fuzzy sets :
The fuzzy set SMALL of small numbers, defined in the universal space
The Set SMALL in set X is : SMALL = FuzzySet {{1, 1 }, {2, 1 }, {3, 0.9}, {4, 0.6}, {5, 0.4},
{6, 0.3}, {7, 0.2}, {8, 0.1}, {9, 0 }, {10, 0 }, {11, 0}, {12, 0}}
Graphic Interpretation of Fuzzy Sets EMPTY
An empty set is a set that contains only elements with a grade of membership equal to
0.
Example: Let EMPTY be a set of people, in Minnesota, older than 120.
Knowledge structure:
4. Explain Bayesian networks with an example.
A Bayesian network is a representation of the joint distribution over all the variables
represented by nodes in the graph. Let the variables be X(1), ..., X(n).
Let parents(A) be the parents of the node A. Then the joint distribution for X(1)
through X(n) is represented as the product of the probability distributions
P(Xi|Parents(Xi)) for i = 1 to n. If X has no parents, its probability distribution is
said to be unconditional, otherwise it is conditional.
By the chain rule of probability, the joint probability of all the nodes in the graph above
is:
P(C, S, R, W) = P(C) * P(S|C) * P(R|C) * P(W|S,R)
Example: P(W∩-R∩S∩C)
= P(W|S,-R)*P(-R|C)*P(S|C)*P(C)
= 0.9*0.2*0.1*0.5 = 0.009
Here P(S) = P(S|C) * P(C) + P(S|-C) * P(-C)
P(W)= 0.5985
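The joint-probability computation above can be reproduced in a short sketch. The text states P(C)=0.5, P(S|C)=0.1, P(¬R|C)=0.2 and P(W|S,¬R)=0.9; the remaining CPT entries below (such as P(S|¬C) and the other P(W|S,R) values) are assumed standard sprinkler-network values, not given in the text.

```python
# Sprinkler-style network from the text: Cloudy (C) -> Sprinkler (S),
# Cloudy -> Rain (R), and S, R -> WetGrass (W).
P_C = 0.5                                   # P(C), from the text
P_S = {True: 0.1, False: 0.5}               # P(S|C) from the text; P(S|-C) assumed
P_R = {True: 0.8, False: 0.2}               # P(-R|C)=0.2 in the text, so P(R|C)=0.8
P_W = {(True, True): 0.99, (True, False): 0.9,   # P(W|S,R): only P(W|S,-R)=0.9
       (False, True): 0.9, (False, False): 0.0}  # appears in the text

def joint(c, s, r, w):
    """P(C,S,R,W) = P(C) * P(S|C) * P(R|C) * P(W|S,R)."""
    p = P_C if c else 1 - P_C
    p *= P_S[c] if s else 1 - P_S[c]
    p *= P_R[c] if r else 1 - P_R[c]
    p *= P_W[(s, r)] if w else 1 - P_W[(s, r)]
    return p

# The worked example: P(W, -R, S, C) = 0.9 * 0.2 * 0.1 * 0.5 = 0.009
print(joint(c=True, s=True, r=False, w=True))
```

Each node contributes one factor P(Xi | Parents(Xi)), which is exactly the product form stated above.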
5. Explain in detail the rule-based approach and Dempster-Shafer theory.
A rule-based system architecture consists of a set of rules, a set of facts, and an inference
engine.
Types of Rules
Three types of rules are mostly used in rule-based production systems: knowledge rules,
inference rules, and meta-rules.
Knowledge rules, or declarative rules, state all the facts and relationships about a problem.
Inference rules, or procedural rules, advise on how to solve a problem given that certain
facts are known.
Meta rules
These are rules for making rules. Meta-rules reason about which rules should be
considered for firing.
Example :
IF there are rules which do not mention the current goal in their premise,
AND there are rules which do mention the current goal in their premise,
THEN the latter should be used in preference to the former.
Meta-rules specify which rules should be considered and in which order they should be
invoked.
FACTS
Inference Engine
The inference engine uses one of several available forms of inferencing. Inferencing
means the method used in a knowledge-based system to process the stored knowledge
and supplied data to produce correct conclusions.
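A minimal forward-chaining inference engine can be sketched as follows. The rules and facts are assumed examples for illustration; the loop repeatedly fires any rule whose premises are satisfied until no new facts can be derived.

```python
# Rules are (premises, conclusion) pairs; facts is the working fact base.
rules = [
    ({"raining"}, "ground_wet"),
    ({"ground_wet", "freezing"}, "ground_icy"),
]
facts = {"raining", "freezing"}

# Forward chaining: fire every applicable rule until a fixed point is reached.
changed = True
while changed:
    changed = False
    for premises, conclusion in rules:
        if premises <= facts and conclusion not in facts:
            facts.add(conclusion)        # the rule "fires": add its conclusion
            changed = True

print(facts)   # now also contains "ground_wet" and "ground_icy"
```

Note the chaining: the second rule only becomes applicable after the first one has fired and added `ground_wet` to the fact base.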
The Dempster-Shafer theory is based on two ideas: the idea of obtaining degrees of
belief for one question from subjective probabilities for a related question, and
Dempster's rule for combining such degrees of belief when they are based on
independent items of evidence.
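Dempster's rule for combining two independent items of evidence can be sketched as follows. The two mass functions and the fault-diagnosis framing are assumed examples, not from the text; the rule itself multiplies masses of intersecting subsets and renormalises away the mass assigned to conflicting (disjoint) pairs.

```python
from itertools import product

def combine(m1, m2):
    """Dempster's rule of combination for two basic probability assignments."""
    combined, conflict = {}, 0.0
    for (a, x), (b, y) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:                                 # agreeing evidence
            combined[inter] = combined.get(inter, 0.0) + x * y
        else:                                     # conflicting evidence
            conflict += x * y
    k = 1.0 - conflict                            # normalisation factor
    return {s: v / k for s, v in combined.items()}

# Hypothetical evidence about a fault lying in component A or B:
A, B = frozenset({"A"}), frozenset({"B"})
theta = A | B                          # the whole frame of discernment
m1 = {A: 0.6, theta: 0.4}              # item 1: degree of belief 0.6 in A
m2 = {B: 0.3, theta: 0.7}              # item 2: degree of belief 0.3 in B
print(combine(m1, m2))                 # A gets ~0.51, B ~0.15, theta ~0.34
```

The mass left on `theta` represents belief committed to neither answer, which is exactly what distinguishes Dempster-Shafer from an ordinary probability assignment.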
UNIT IV PLANNING AND MACHINE LEARNING
Basic plan generation systems - Strips -Advanced plan generation systems – K strips
-Strategic explanations -Why, Why not and how explanations. Learning- Machine
learning, adaptive Learning.
PART A
1. Triangle tables – provides a way to record the goals that each operator is
expected to satisfy as well as the goals that must be true for it to execute
correctly. If something unexpected happens during the execution of a plan, the
table provides information required to patch the plan.
2. Meta planning – a technique for reasoning not just about the problem being
solved but also about the planning process itself.
3. Macro operators – allow a planner to build new operators that represent
commonly used sequences of operators.
4. Case based planning – Reuse old plans to make new ones.
2. Define plan. (June 2013)
A plan is a way of decomposing a problem into parts so that various
techniques can be applied to handle the interactions among them as they are
detected during the problem solving process.
3. What do you mean by goal stack planning? (June 2013)
i. One of the earliest techniques is planning using a goal stack.
ii. The problem solver uses a single stack that contains both sub goals and
operators. Sub goals are solved linearly and then finally the conjoined
sub goal is solved.
iii. Plans generated by this method contain the complete sequence of
operations for solving one goal followed by the complete sequence of
operations for the next, and so on.
iv. The problem solver also relies on
1. A database that describes the current situation.
2. A set of operators with precondition, add and delete lists.
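The goal-stack discipline above can be sketched as a toy Python planner. The single `pickup` operator, its literals, and the blocks-world framing are assumptions for illustration; the point is the stack mechanics: unsatisfied goals push an operator and its preconditions, and popping an operator applies its add and delete lists.

```python
from collections import namedtuple

# Toy goal-stack planner: the stack holds goal sets and operators; each
# operator carries a precondition set plus add and delete lists.
Op = namedtuple("Op", "name precond add delete")

ops = [Op("pickup(A)", {"clear(A)", "handempty"}, {"holding(A)"},
          {"clear(A)", "handempty"})]

state = {"clear(A)", "handempty"}   # database describing the current situation
stack = [{"holding(A)"}]            # the (conjoined) goal
plan = []

while stack:
    top = stack.pop()
    if isinstance(top, Op):                      # operator on top: apply it
        state = (state - top.delete) | top.add
        plan.append(top.name)
    elif not top <= state:                       # unsatisfied goal set
        goal = next(iter(top - state))
        op = next(o for o in ops if goal in o.add)
        stack.append(op)                         # achieve it via an operator,
        stack.append(op.precond)                 # preconditions solved first
    # goal sets already satisfied by the state are simply popped

print(plan)   # ['pickup(A)']
```

Because preconditions are pushed above the operator, they are popped (and solved) first, which is exactly the linear sub-goal ordering described in point ii.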
4. Define machine learning. (June 2013, November 2013)
Machine learning is a type of artificial intelligence (AI) that provides
computers with the ability to learn without being explicitly programmed.
Machine learning focuses on the development of computer programs that can
teach themselves to grow and change when exposed to new data.
5. What is planning? (November 2013)
Planning refers to the use of methods that focus on ways of decomposing
the original problem into appropriate subparts and on ways of recording and
handling interaction among the subparts as they are detected during the problem
solving process.
The general rule follows logically from the background knowledge of the Cavemen’s
usual cooking process.
10. List the various issues that affect the design of a learning element.
There are several factors affecting the performance. They are,
The planner should be able to represent the states, goals, and actions
The planner should be able to add new actions at any time.
The planner should be able to use divide and conquer method for solving very
big problems
12. What is Q learning?
Q-learning uses temporal differences to estimate the value of Q*(s,a).
In Q-learning, the agent maintains a table of Q[S,A], where S is the set of states
and A is the set of actions. Q[s,a] represents its current estimate of Q*(s,a).
An experience s,a,r,s' provides one data point for the value of Q(s,a). The data point is
that the agent received the future value of r+ γV(s'), where V(s') =maxa' Q(s',a'); this
is the actual current reward plus the discounted estimated future value. This new data
point is called a return. The agent can use the temporal difference equation to update
its estimate for Q(s,a):
Q[s,a] ← Q[s,a] + α (r + γ maxa' Q[s',a'] − Q[s,a])
or, equivalently,
Q[s,a] ← (1 − α) Q[s,a] + α (r + γ maxa' Q[s',a'])
This assumes that α is fixed; if α is varying, there will be a different count for each
state-action pair and the algorithm would also have to keep track of this count.
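A minimal tabular version of this update can be sketched as follows. The states, actions, α and γ are chosen for illustration; the update line is the temporal difference equation given above.

```python
# Tabular Q-learning update for one experience (s, a, r, s'):
# Q[s,a] <- Q[s,a] + alpha * (r + gamma * max_a' Q[s',a'] - Q[s,a])
alpha, gamma = 0.5, 0.9
states, actions = ["s0", "s1"], ["left", "right"]
Q = {(s, a): 0.0 for s in states for a in actions}

def update(s, a, r, s_next):
    best_next = max(Q[(s_next, a2)] for a2 in actions)   # V(s') = max_a' Q(s',a')
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

# One experience: taking "right" in s0 yields reward 1 and lands in s1.
update("s0", "right", 1.0, "s1")
print(Q[("s0", "right")])   # 0.5 * (1 + 0.9*0 - 0) = 0.5
```

Repeated experiences move each Q[s,a] entry toward the return r + γV(s'), which is how the table converges on Q*.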
Data
The experiences that are used to improve performance in the task
Measure of improvement
How the improvement is measured - for example, new skills that
were not present initially, increasing accuracy in prediction, or improved
speed
In supervised learning the environment is fully observable, whereas in
unsupervised learning the environment is partially observable.
PART B
Abstraction of Situations - ABSTRIPS (Abstraction-Based STRIPS)
Main idea: introduce weights for each literal and consider only the most important ones in the
first loop, then refine by also considering literals with the second-highest weight.
Property Weight
On 4
Clear 3
Holds 2
Ontable 2
Handempty 1
Higher weights indicate more important properties. Concretely, this means: first consider only
the property On, in the second loop the properties On and Clear, and so on. Use the abstract
plan for a refinement of the more detailed plans. In the last loop all properties have to be
considered.
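The weight-based abstraction can be sketched as a filter over goal literals. The literal syntax and the example goal are assumed for illustration; only the weight table comes from the text.

```python
# ABSTRIPS-style abstraction: at level k, only literals whose property
# weight is at least k are considered; lower levels add the less
# important literals back in.
weights = {"On": 4, "Clear": 3, "Holds": 2, "Ontable": 2, "Handempty": 1}

def abstract_goal(literals, level):
    """Keep only the literals important enough for this abstraction level."""
    return [l for l in literals if weights[l.split("(")[0]] >= level]

goal = ["On(A,B)", "Clear(C)", "Handempty()"]
for level in (4, 3, 1):
    print(level, abstract_goal(goal, level))
# Level 4 keeps only On(A,B); level 1 considers every property.
```

A planner would first solve the level-4 abstraction and then refine that plan at each lower level, as described above.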
Algorithm
Assume non-linear planning algorithm NLP with input: partial nonlinear plan & output: total
well-formed conflict-free non-linear plan.
NLPAS (Non-Linear Planning with Abstraction of Situations) calls NLP initially considering
only the most important properties. This plan is refined by taking into account less important
properties.
NLPAS(P, Index):
begin
if Index = 1
then return(NLP(P))
else P’ := NLP(P, considering only literals with weight ≥ Index)
return(NLPAS(P’, Index - 1))
endif
end
Abstraction certainly plays an important role in human problem solving, but a useful
ordering of the abstraction levels cannot always be selected automatically.
Abstraction of Operators
begin
if N unsolved
then
else
end if
resolve interactions
criticise plan
end do
return(P)
end
Planning refers to the use of methods that focus on ways of decomposing the original
problem into appropriate subparts and on ways of recording and handling interaction among
the subparts as they are detected during the problem solving process.
1. Choosing the best rule to apply next based on the best available heuristic
information.
2. Applying the chosen rule to compute the new problem state that arises from its
application.
3. Detecting when a solution has been found.
4. Detecting dead ends so that they can be abandoned and the system’s effort directed in
more fruitful directions.
5. Repairing an almost correct solution.
1. Choosing rules to apply:
First isolate a set of differences between the desired goal state and the current state.
Detect rules that are relevant to reduce the differences.
If several rules are found, a variety of heuristic information can be exploited to choose
among them.
2. Applying rules:
Each rule specifies the problem state that would result from its application.
We must be able to deal with rules that specify only a small part of the complete problem
state.
1. Describe, for each action, each of the changes it makes to the state description.
A state was described by a set of predicates representing the facts that were true in that state.
Each state is represented as a predicate.
The manipulation of the state description is done using a resolution theorem prover.
3. Detecting a solution
A planning system has succeeded in finding a solution to a problem when it has found
a sequence of operators that transforms the initial problem state into the goal state.
Any of the corresponding reasoning mechanisms could be used to discover when a solution
has been found.
If the search process is reasoning forward from the initial state, it can prune any path
that leads to a state from which the goal state cannot be reached.
If the search process is reasoning backward from the goal state, it can also terminate a
path either because it is sure that the initial state cannot be reached or little progress is being
made.
In reasoning backward, each goal is decomposed into sub goals. Each of them may
lead to a set of additional sub goals. Sometimes it is easy to detect that there is no way that
all the sub goals in a given set can be satisfied at once. Other paths can be pruned because
they lead nowhere.
One way is to solve the sub problems separately and then combine the solutions to
yield a correct solution, but this can lead to wasted effort.
The other way is to look at the situations that result when the sequence of operations
corresponding to the proposed solution is executed and to compare that situation to the
desired goal. The difference between the two is usually small. The problem solver can
then be called again and asked to find a way of eliminating this new difference. The first
solution can then be combined with the second one to form a solution to the original problem.
It can be applied in a variety of ways.
1. Rote learning
2. Learning by taking advice.
3. Learning by parameter adjustment.
4. Learning with macro-operators.
5. Chunking
6. Explanation based learning
7. Clustering
8. Correcting mistakes
9. Recording cases.
10. Managing multiple models.
11. Back propagation.
12. Reinforcement learning.
13. Genetic algorithms
1. Rote learning
Rote learning is the basic learning activity. It is also called memorization because
the knowledge, without any modification, is simply copied into the knowledge base. As
computed values are stored, this technique can save a significant amount of time.
Rote learning technique can also be used in complex learning systems provided
sophisticated techniques are employed to use the stored values faster and there is a
generalization to keep the number of stored information down to a manageable level.
Checkers-playing program, for example, uses this technique to learn the board positions it
evaluates in its look-ahead search.
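Rote learning amounts to caching computed values, which can be sketched as follows. The evaluation function here is a stand-in for illustration, not the checkers program's real evaluator.

```python
# Rote learning as caching: once a value is computed it is stored verbatim
# and later looked up instead of being recomputed, just as the checkers
# program stored the board positions it had already evaluated.
cache = {}

def evaluate(position):
    """A stand-in for an expensive board evaluation, with rote learning."""
    if position in cache:
        return cache[position]                # recall the memorised value
    score = sum(ord(ch) for ch in position)   # placeholder computation
    cache[position] = score                   # "learn" it by storing it
    return score

evaluate("board-42")
print("board-42" in cache)   # True: the value has been rote-learned
```

The second call for the same position is a pure lookup, which is where the time saving mentioned above comes from.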
2. Learning by taking advice
The advice may come from many sources: human experts, internet to name a few.
This type of learning requires more inference than rote learning. The knowledge must be
transformed into an operational form before stored in the knowledge base. Moreover the
reliability of the source of knowledge should be considered.
The system should ensure that the new knowledge does not conflict with the existing knowledge.
FOO (First Operational Operationaliser), for example, is a learning system which is used to
learn the game of Hearts. It converts the advice which is in the form of principles, problems,
and methods into effective executable (LISP) procedures (or knowledge). Now this
knowledge is ready to use.
3. Learning by parameter adjustment
Here the learning system relies on an evaluation procedure that combines information
from several sources into a single summary statistic. For example, factors such as demand
and production capacity may be combined into a single score to indicate the chance for
increase of production. But it is difficult to know a priori how much weight should be
attached to each factor.
The correct weights can be found by taking some estimate of the correct settings and then
allowing the program to modify its settings based on its experience. This type of learning system
is useful when little knowledge is available. In game programs, for example, the factors such
as piece advantage and mobility are combined into a single score to decide whether a
particular board position is desirable. This single score is nothing but a knowledge which the
program gathered by means of calculation.
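Parameter adjustment can be sketched as follows. The factor names, initial weights, and the simple proportional update rule are assumptions for illustration, not a specific game program's method.

```python
# Learning by parameter adjustment: an evaluation function combines several
# factors into one score; after feedback, each weight is nudged in
# proportion to its factor's contribution to the score.
weights = {"piece_advantage": 0.5, "mobility": 0.5}

def score(features):
    return sum(weights[f] * v for f, v in features.items())

def adjust(features, error, rate=0.1):
    """Shift each weight toward reducing the observed error."""
    for f, v in features.items():
        weights[f] += rate * error * v

features = {"piece_advantage": 1.0, "mobility": 0.2}
error = 1.0 - score(features)      # desired outcome 1.0, predicted 0.6
adjust(features, error)
print(weights)   # piece_advantage ≈ 0.54, mobility ≈ 0.508
```

The factor that contributed more to the position (piece advantage) receives the larger correction, which is how the program learns which factors to trust.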
4. Learning with Macro-operators
Sequence of actions that can be treated as a whole are called macro-operators. Once a
problem is solved, the learning component takes the computed plan and stores it as a macro-
operator. The preconditions are the initial conditions of the problem just solved, and its post
conditions correspond to the goal just achieved.
The problem solver efficiently uses the knowledge base it gained from its previous
experiences. By generalizing macro-operators the problem solver can even solve different
problems. Generalization is done by replacing all the constants in the macro-operators with
variables. STRIPS, for example, is a planning algorithm that employed macro-operators
in its learning phase. It builds a macro-operator MACROP, which contains preconditions,
post-conditions and the sequence of actions. The macro operator will be used in the future
operation.
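Building and generalising a MACROP can be sketched as follows. The plan, the literal syntax, and the `?x0`-style variable naming are assumed for illustration.

```python
# Building a MACROP: a solved plan becomes one macro-operator whose
# preconditions are the initial conditions and whose postconditions are
# the goal achieved; constants become variables to generalise it.
def generalise(literals, constants):
    """Replace each constant with a variable ?x0, ?x1, ..."""
    out = []
    for lit in literals:
        for i, c in enumerate(constants):
            lit = lit.replace(c, f"?x{i}")
        out.append(lit)
    return out

macrop = {
    "precond":  generalise(["clear(A)", "clear(B)"], ["A", "B"]),
    "actions":  generalise(["pickup(A)", "stack(A,B)"], ["A", "B"]),
    "postcond": generalise(["on(A,B)"], ["A", "B"]),
}
print(macrop["actions"])   # ['pickup(?x0)', 'stack(?x0,?x1)']
```

With the constants replaced by variables, the same macro-operator can later be applied to any pair of blocks, not just A and B.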
5. Chunking
Chunking is similar to learning with macro-operators. Generally, it is used by
problem solver systems that make use of production systems.
A production system consists of a set of rules that are in if-then form. That is given a
particular situation, what are the actions to be performed. For example, if it is raining then
take umbrella.
Production system also contains knowledge base, control strategy and a rule applier. To solve
a problem, a system will compare the present situation with the left hand side of the rules. If
there is a match then the system will perform the actions described in the right hand side of
the corresponding rule.
Problem solvers solve problems by applying the rules. Some of these rules may be more
useful than others and the results are stored as a chunk. Chunking can be used to learn general
search control knowledge. Several chunks may encode a single macro-operator and one
chunk may participate in a number of macro sequences. Chunks learned in the beginning of
problem solving, may be used in the later stage. The system keeps the chunk to use it in
solving other problems.
Soar is a general cognitive architecture for developing intelligent systems. Soar
requires knowledge to solve various problems. It acquires knowledge using chunking
mechanism. The system learns reflexively when impasses have been resolved. An impasse
arises when the system does not have sufficient knowledge. Consequently, Soar chooses a
new problem space (set of states and the operators that manipulate the states) in a bid to
resolve the impasse. While resolving the impasse, the individual steps of the task plan are
grouped into larger steps known as chunks. The chunks decrease the problem space search
and so increase the efficiency of performing the task.
In Soar, the knowledge is stored in long-term memory. Soar uses the chunking mechanism to
create productions that are stored in long-term memory. A chunk is nothing but a large
production that does the work of an entire sequence of smaller ones. The productions have a
set of conditions or patterns to be matched to working memory which consists of current
goals, problem spaces, states and operators and a set of actions to perform when the
production fires. Chunks are generalized before storing. When the same impasse occurs
again, the chunks so collected can be used to resolve it.
6. Explanation based learning
An Explanation-based Learning (EBL) system accepts an example (i.e. a training
example) and explains what it learns from the example. The EBL system takes only the
relevant aspects of the training. This explanation is translated into particular form that a
problem solving program can understand. The explanation is generalized so that it can be
used to solve other problems.
PRODIGY is a system that integrates problem solving, planning, and learning methods in a
single architecture. It was originally conceived by Jaime Carbonell and Steven Minton, as an
AI system to test and develop ideas on the role that machine learning plays in planning and
problem solving. PRODIGY uses the EBL to acquire control rules.
The EBL module uses the results from the problem-solving trace (i.e. the steps in solving
problems) that were generated by the central problem solver (a search engine that searches
over a problem space). It constructs explanations using an axiomatized theory that describes
both the domain and the architecture of the problem solver. The results are then translated as
control rules and added to the knowledge base. The control knowledge that contains control
rules is used to guide the search process effectively.
7. Clustering
Discovery is a restricted form of learning. The knowledge acquisition is done without
getting any assistance from a teacher. Discovery Learning is an inquiry-based learning
method.
In discovery learning, the learner uses his own experience and prior knowledge to discover
the truths that are to be learned. The learner constructs his own knowledge by experimenting
with a domain, and inferring rules from the results of these experiments. In addition to
domain information, the learner needs some support in choosing and interpreting the
information to build his knowledge base.
A cluster is a collection of objects which are similar in some way. Clustering groups data
items into similarity classes. The properties of these classes can then be used to understand
problem characteristics or to find similar groups of data items. Clustering can be defined as
the process of reducing a large set of unlabeled data to manageable piles consisting of similar
items. The similarity measures depend on the assumptions and desired usage one brings to
the data.
Clustering begins by doing feature extraction on data items and measuring the values of the
chosen feature set. Then the clustering model selects and compares two sets of data items and
outputs the similarity measure between them. Clustering algorithms that use particular
similarity measures as subroutines are employed to produce clusters.
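These steps can be illustrated with a minimal exclusive-clustering sketch: a one-dimensional k-means-style loop in which the similarity measure is simple distance. The data points and initial centres are assumed for the example.

```python
# One-dimensional exclusive clustering, k-means style: assign each point
# to the nearest of two centres, re-estimate the centres, and repeat.
data = [1.0, 1.2, 0.8, 9.0, 9.5, 10.1]
centres = [0.0, 5.0]                      # initial centres (chosen arbitrarily)

for _ in range(5):                        # a few refinement passes
    clusters = [[], []]
    for x in data:
        i = 0 if abs(x - centres[0]) <= abs(x - centres[1]) else 1
        clusters[i].append(x)             # exclusive: one cluster per item
    centres = [sum(c) / len(c) for c in clusters]

print(centres)   # roughly [1.0, 9.53]: two similarity classes emerge
```

The final centres summarise the two similarity classes, and their properties (here, the means) can then be used to characterise the groups, as described above.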
The clustering algorithms are generally classified as Exclusive Clustering, Overlapping
Clustering, Hierarchical Clustering and Probabilistic Clustering. The selection of clustering
algorithms depends on various criteria such as time and space complexity. The results are
checked to see if they meet the standard otherwise some or all of the above steps have to be
repeated.
Some of the applications of clustering are data compression, hypothesis generation and
hypothesis testing. The conceptual clustering system accepts a set of object descriptions in
the form of events, observations, facts and then produces a classification scheme over the
observations.
COBWEB is an incremental conceptual clustering system. It incrementally adds the objects
into a classification tree. The attractive feature of incremental systems is that the knowledge
is updated with each new observation. In COBWEB system, learning is incremental and the
knowledge it learned in the form of classification trees increase the inference abilities.
8. Correcting mistakes
While learning new things, there is a possibility that the learning system may make
mistakes. Like human beings, learning system can correct itself by identifying reasons for its
failure, isolate it, explain how the particular assumption causes failure, and modifies its
knowledge base. For example, while playing chess a learning system may make a wrong
move and ends up with failure. Now the learning system thinks of the reasons for the failure
and corrects its knowledge base. Therefore when it plays again it will not repeat the same
mistake.
In his work, Active Learning with Multiple Views, Ion Muslea has used this technique to
label the data. He develops a technique known as Co-EMT which is a combination of two
techniques: Co-testing and Co-EM. The Co-testing method interacts with the user to label the
data. If it does any mistake in labeling, it learns from the mistakes and improves. After
learning, the system labels the unlabelled data extracted from a source efficiently. The
labeled data constitutes what is called knowledge.
9. Recording cases.
A program that learns by recording cases generally makes use of consistency
heuristic. According to consistency heuristic, a property of something can be guessed by
finding the most similar cases from a given set of cases. For example, a computer is given the
images of different types of insects, birds, and animals. If the computer is asked to identify a
living thing which is not in the recorded list, it will compare the given image with already
recorded ones, and will at least tell whether the given image is of an insect, bird or animal.
The learning by recording cases technique is mainly used in natural language learning tasks.
During the training phase, a set of cases that describe ambiguity resolution episodes for a
particular problem in text analysis is collected. Each case contains a set of features or
attribute-value pairs that encode the context in which the ambiguity was encountered.
Moreover, each case is annotated with solution features that explain how the ambiguity was
resolved in the current example. The cases which are created are then stored in a case base.
Once the training is over, the system can use the case base to resolve ambiguities in new
sentences. This way, the system acquires the linguistic knowledge.
10. Managing multiple models
A version space description consists of two trees which are complementary to each other: one
represents general models and the other represents specific models. Both positive and negative
examples are used to get the two set of models converge on one just-right model. That is,
positive examples generalize specific models and negative examples specialize general
models. Ultimately, a correct model that matches only the observed positive examples is
obtained. Query By Committee (QBC) is an algorithm which implements this technique in
order to acquire knowledge.
11. Back propagation
Initially, a weight is assigned at random to each link in order to determine the strength
of one node’s influence on the other. When the sum of input values reaches a threshold value,
the node will produce the output 1, or 0 otherwise. By adjusting the weights the desired output
can be obtained. This training process makes the network learn. The network, in other words,
acquires knowledge in much the same way human brains acquire it, namely by learning from
experience. Backpropagation is one of the powerful artificial neural network techniques
used to acquire knowledge automatically.
Backpropagation method is the basis for training a supervised neural network. The output is a
real value which lies between 0 and 1 based on the sigmoid function. The formula for the
output is,
Output = 1 / (1 + e^(-sum)).
As the sum increases, the output approaches 1. As the sum decreases, the output approaches
0.
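The sigmoid output unit described above can be checked with a short sketch:

```python
import math

# The sigmoid output unit: Output = 1 / (1 + e^(-sum)).
def output(total):
    return 1.0 / (1.0 + math.exp(-total))

print(output(0.0))    # 0.5: exactly halfway between the two extremes
print(output(10.0))   # close to 1 as the sum grows large
print(output(-10.0))  # close to 0 as the sum grows very negative
```

Because the output is a smooth function of the weighted sum, its derivative exists everywhere, which is what allows the backward pass to compute weight gradients.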
A Multilayer Network
As the name implies, there is a backward pass of error to each internal node within the
network, which is then used to calculate weight gradients for that node. The network learns
by alternately propagating forward the activations and propagating backward the errors as
and when they occur.
Back propagation network can deal with various types of data. It also has the ability to model
a complex decision system. If the problem domain involves large amount of data, the network
will require having more input units or hidden units. Consequently this will increase the
complexity of the model and also increase its computational complexity. Moreover it takes
more time in solving complex problems. In order to overcome this problem multi-back
propagation network is used.
In the beginning of the training process, weights are assigned to the connections at random.
The training process is iterative. The training cases are given to the network iteratively. The
actual outputs are compared with desired outputs and the errors are calculated. The errors are
then propagated back to the network in order to adjust the weights. The training process
repeats till the desired outputs are obtained or the error reaches an acceptable level.
APPLICATION OF BACKPROPAGATION
The symbol grounding problem is nothing but connecting meaning to the symbols or images
of objects received from input stimuli. The model uses two learning algorithms: Kohonen
Self-Organizing Feature Map and backpropagation algorithm.
To perform the tasks, it has two modules and a retina for input. The first module
uses Kohonen Self-Organizing Feature Map and categorizes the images projected on the
retina. It expresses the intrinsic order of the stimulus set in a bi-dimensional matrix known as
activation matrix.
The second module relates visual input, symbolic input and output stimuli. Now it
uses backpropagation algorithm for learning. The error in the output is computed with respect
to the training set and is sent back to input unit and the weight distribution is corrected in
order to get the correct output. In this process, the symbols which are grounded constitute the
knowledge. The knowledge so acquired is then used to generate and describe new meanings
for the symbols.
12. Reinforcement learning
In reinforcement learning, an action that works well is rewarded with a positive value, while
a poor action is punished with a
negative value (i.e. punishment). The learning system which gets the punishment has to
improve itself. Thus it is a trial and error process. The reinforcement learning
algorithms selectively retain the outputs that maximize the received reward over time. To
accumulate a lot of rewards, the learning system must prefer the best experienced actions;
however, it has to try new actions in order to discover better action selections for the future.
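The trade-off just described, preferring the best experienced actions while still trying new ones, can be shown with a two-armed bandit and an epsilon-greedy rule (a standard reinforcement-learning device, not one named in the text); the arm names, reward probabilities and epsilon below are invented.

```python
import random

random.seed(1)
true_reward = {'A': 0.3, 'B': 0.8}   # hidden reward probabilities
totals = {'A': 0.0, 'B': 0.0}        # reward accumulated per action
counts = {'A': 0, 'B': 0}            # times each action was tried

def pick(eps=0.1):
    """Mostly exploit the best experienced action, sometimes explore."""
    if random.random() < eps or not any(counts.values()):
        return random.choice(['A', 'B'])     # try a new action (explore)
    # retain the action with the best average received reward (exploit)
    return max(totals, key=lambda a: totals[a] / counts[a] if counts[a] else 0.0)

for _ in range(2000):
    a = pick()
    reward = 1.0 if random.random() < true_reward[a] else 0.0
    counts[a] += 1
    totals[a] += reward
```

Over many trials the learner settles on the action with the higher reward while the occasional exploratory trials guard against locking onto a poor early estimate.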
Genetic algorithms are powerful and can be highly efficient when implemented well.
Applications include learning robot behaviour, molecular structure optimization,
automated design of mechatronic systems, and electronic circuit design.
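The selective-retention idea in its genetic-algorithm form can be sketched as follows: a population of bit-strings is scored, the fittest are retained, and mutated copies fill each next generation. The problem (maximizing the count of 1-bits) and all parameters are invented for illustration.

```python
import random

random.seed(2)
fitness = sum                                    # fitness = count of 1-bits

# Random initial population of 20 individuals, 12 bits each
pop = [[random.randint(0, 1) for _ in range(12)] for _ in range(20)]
start = max(map(fitness, pop))                   # best initial fitness

for _ in range(40):
    pop.sort(key=fitness, reverse=True)
    parents = pop[:5]                            # retain the fittest
    pop = [p[:] for p in parents]
    while len(pop) < 20:                         # fill with mutated offspring
        child = random.choice(parents)[:]
        child[random.randrange(12)] ^= 1         # flip one bit (mutation)
        pop.append(child)

best = max(pop, key=fitness)
```

Because the fittest individuals are always carried over (elitism), the best fitness can never decrease from generation to generation.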
intelligent diagnosis system. The knowledge so stored is known as diagnosis knowledge, as it
is used to detect what went wrong and to decide the course of action needed to restore the
robot to working order.
Adaptive learning
Adaptive learning has been partially driven by a realization that tailored learning
cannot be achieved on a large scale using traditional, non-adaptive approaches. Adaptive
learning systems endeavor to transform the learner from passive receptor of information to
collaborator in the educational process. Adaptive learning systems' primary application is in
education, but another popular application is business training. They have been designed as
desktop computer applications, web applications, and are now being introduced into overall
curricula.
Adaptive learning has been implemented in several kinds of educational systems such
as adaptive educational hypermedia, intelligent tutoring systems, Computerized adaptive
testing, and computer-based pedagogical agents, among others.
Adaptive learning systems have traditionally been divided into separate components
or 'models'. While different model groups have been presented, most systems include some
or all of the following models (occasionally with different names):
Expert model - The model with the information which is to be taught
Student model - The model which tracks and learns about the student
Instructional model - The model which actually conveys the information
Instructional environment - The user interface for interacting with the system
Expert model
The expert model stores information about the material which is being taught. This
can be as simple as the solutions for the question set but it can also include lessons and
tutorials and, in more sophisticated systems, even expert methodologies to illustrate
approaches to the questions.
Adaptive learning systems which do not include an expert model will typically incorporate
these functions in the instructional model.
Student model
Student model algorithms have been a rich research area over the past twenty years.
The simplest means of determining a student's skill level is the method employed in CAT
(Computerized adaptive testing). In CAT, the subject is presented with questions that are
selected based on their level of difficulty in relation to the presumed skill level of the subject.
As the test proceeds, the computer adjusts the subject's score based on their answers,
continuously fine-tuning the score by selecting questions from a narrower range of difficulty.
An algorithm for a CAT-style assessment is simple to implement. A large pool of
questions is amassed and rated according to difficulty, through expert analysis,
experimentation, or a combination of the two. The computer then performs what is essentially
a binary search, always giving the subject a question which is halfway between what the
computer has already determined to be the subject's maximum and minimum possible skill
levels. These levels are then adjusted to the level of the difficulty of the question, reassigning
the minimum if the subject answered correctly, and the maximum if the subject answered
incorrectly. Obviously, a certain margin for error has to be built in to allow for scenarios
where the subject's answer is not indicative of their true skill level but simply coincidental.
Asking multiple questions from one level of difficulty greatly reduces the probability of a
misleading answer, and allowing the range to grow beyond the assumed skill level can
compensate for possible misevaluations.
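The binary-search procedure above can be sketched as follows; the difficulty range, round count and the simulated subject are invented, and the margin-for-error refinements the text mentions are omitted.

```python
def cat_estimate(answers_correctly, lo=0, hi=100, rounds=7):
    """Narrow [lo, hi] around the subject's skill by halving each round."""
    for _ in range(rounds):
        q = (lo + hi) // 2          # question halfway between the bounds
        if answers_correctly(q):
            lo = q + 1              # reassign the minimum possible skill
        else:
            hi = q - 1              # reassign the maximum possible skill
    return (lo + hi) // 2

# Simulated subject: answers correctly iff difficulty <= their true skill
true_skill = 63
estimate = cat_estimate(lambda q: q <= true_skill)   # -> 63
```

Seven questions suffice to pin down one of 101 difficulty levels, which is the binary-search efficiency the passage alludes to.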
Richer student model algorithms look to determine causality and provide a more
extensive diagnosis of student's weaknesses by linking 'concepts' to questions and defining
strengths and weaknesses in terms of concepts rather than simple 'levels' of ability. Because
multiple concepts can influence a single question, questions have to be linked to all relevant
concepts. For example, a matrix can list binary values (or even scores) for the intersection of
every concept and every question. Then, conditional probability values have to be calculated
to reflect the likelihood that a student who is weak in a particular concept will fail to correctly
answer a particular question. When a student takes a test, the probabilities of weakness in all
concepts conditional on incorrect answers in all questions can be calculated using Bayes'
law (these adaptive learning methods are often called Bayesian algorithms).
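For a single concept-question pair, the Bayes'-law step above looks like this; all probabilities are invented for illustration.

```python
def p_weak_given_wrong(prior, p_wrong_if_weak, p_wrong_if_strong):
    """P(weak | wrong answer) from the conditional probabilities in the text."""
    # Total probability of a wrong answer (weak or not weak)
    evidence = p_wrong_if_weak * prior + p_wrong_if_strong * (1 - prior)
    return p_wrong_if_weak * prior / evidence   # Bayes' law

posterior = p_weak_given_wrong(prior=0.2,
                               p_wrong_if_weak=0.9,
                               p_wrong_if_strong=0.2)
# posterior ≈ 0.529: one wrong answer raises P(weak) from 0.20 to about 0.53
```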
A further extension of identifying weaknesses in terms of concepts is to program the student
model to analyze incorrect answers. This is especially applicable for multiple choice
questions. Consider the following example:
Q. Simplify: x^2 + x^2
a) Can't be simplified
b) x^4
c) ...
d) ...
Clearly, a student who answers (b) is adding the exponents and failing to grasp the concept of
like terms. In this case, the incorrect answer provides additional insight beyond the simple
fact that it is incorrect.
Instructional model
The instructional model generally looks to incorporate the best educational tools that
technology has to offer (such as multimedia presentations) with expert teacher advice for
presentation methods. The level of sophistication of the instructional model depends greatly
on the level of sophistication of the student model. In a CAT-style student model, the
instructional model will simply rank lessons in correspondence with the ranks for the
question pool. When the student's level has been satisfactorily determined, the instructional
model provides the appropriate lesson. The more advanced student models which assess
based on concepts need an instructional model which organizes its lessons by concept as
well. The instructional model can be designed to analyze the collection of weaknesses and
tailor a lesson plan accordingly.
When the incorrect answers are being evaluated by the student model, some systems
look to provide feedback to the actual questions in the form of 'hints'. As the student makes
mistakes, useful suggestions pop up such as "look carefully at the sign of the number". This
too can fall in the domain of the instructional model, with generic concept-based hints being
offered based on concept weaknesses, or the hints can be question-specific in which case the
student, instructional, and expert models all overlap.
Implementation
Classroom Implementation
Adaptive learning that is implemented in the classroom environment using
information technology is often referred to as an Intelligent Tutoring System or an Adaptive
Learning System. Intelligent Tutoring Systems operate on three basic principles:
Systems need to be able to dynamically adapt to the skills and abilities of a student.
Environments utilize cognitive modeling to provide feedback to the student while assessing
student abilities and adapting the curriculum based upon past student performance.
Inductive logic programming (ILP) is a way to bring together inductive learning and logic
programming to an Adaptive Learning System. Systems using ILP are able to create
hypotheses from examples demonstrated to them by the programmer or educator, and then
use those experiences to develop new knowledge to guide the student down paths to correct
answers.
Systems must be flexible and allow new content to be added easily.
School districts have specific curricula that the system needs to utilize to be effective for
the district. Algorithms and cognitive models should be broad enough to teach mathematics,
science, and language.
Many educators and domain experts are not skilled in programming or simply do not have
enough time to demonstrate complex examples to the system so it should adapt to the abilities
of educators.
5. Solve the following by goal stack planning. (June 2013)
[Figure: initial and goal block configurations for blocks A, B, C and D, and a goal
stack holding sub-goals G1, G2, …, Gn]
At each step of problem solving process, the top goal on the stack is pursued.
Find an operator that satisfies sub goal G1 (makes it true) and replace G1 by the
operator.
o If more than one operator satisfies the sub goal then apply some heuristic to
choose one.
In order to execute the top most operation, its preconditions are added onto the stack.
o Once preconditions of an operator are satisfied, then we are guaranteed that
operator can be applied to produce a new state.
o New state is obtained by using ADD and DELETE lists of an operator to the
existing database.
Problem solver keeps track of operators applied.
o This process is continued till the goal stack is empty and problem solver
returns the plan of the problem.
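The steps above can be sketched as a tiny goal-stack planner for the blocks world. Operator names follow the abbreviations used in this unit (PU, PD, S, US); the tie-breaking heuristic for choosing among operators (fewest unmet preconditions, then name) is our own illustrative choice, standing in for the heuristic the text leaves open.

```python
def ops(state):
    """Enumerate grounded PU/PD/S/US operators as (name, pre, add, delete)."""
    blocks = ({t[1] for t in state if len(t) > 1} |
              {t[2] for t in state if len(t) > 2})
    for x in blocks:
        yield (f'PU({x})', {('ONT', x), ('CL', x), ('AE',)},
               {('HOLD', x)}, {('ONT', x), ('CL', x), ('AE',)})
        yield (f'PD({x})', {('HOLD', x)},
               {('ONT', x), ('CL', x), ('AE',)}, {('HOLD', x)})
        for y in blocks - {x}:
            yield (f'S({x},{y})', {('HOLD', x), ('CL', y)},
                   {('ON', x, y), ('CL', x), ('AE',)},
                   {('HOLD', x), ('CL', y)})
            yield (f'US({x},{y})', {('ON', x, y), ('CL', x), ('AE',)},
                   {('HOLD', x), ('CL', y)},
                   {('ON', x, y), ('CL', x), ('AE',)})

def choose(goal, state):
    """Pick an operator whose ADD list satisfies the goal."""
    cands = [op for op in ops(state) if goal in op[2]]
    return min(cands, key=lambda op: (len(op[1] - state), op[0]))

def plan(initial, goals):
    state, steps = set(initial), []
    stack = [('AND', frozenset(goals))] + [('GOAL', g) for g in goals]
    while stack:
        kind, item = stack.pop()
        if kind == 'GOAL':
            if item in state:
                continue                       # sub-goal already true: pop it
            op = choose(item, state)           # replace sub-goal by operator
            stack.append(('APPLY', op))
            stack.append(('AND', frozenset(op[1])))
            stack.extend(('GOAL', p) for p in op[1])
        elif kind == 'AND':                    # compound goal: re-check parts
            unmet = [g for g in item if g not in state]
            if unmet:
                stack.append(('AND', item))
                stack.extend(('GOAL', g) for g in unmet)
        else:                                  # APPLY: preconditions now hold
            name, pre, add, delete = item
            state = (state - delete) | add     # use ADD and DELETE lists
            steps.append(name)                 # record operator in the plan
    return steps

# C on A, A on the table; goal: A on C
initial = {('ON', 'C', 'A'), ('ONT', 'A'), ('CL', 'C'), ('AE',)}
steps = plan(initial, [('ON', 'A', 'C')])
# steps == ['US(C,A)', 'PD(C)', 'PU(A)', 'S(A,C)']
```

The planner keeps pursuing the top of the stack, pushing operator preconditions, and applies an operator only once the compound precondition just above it has been re-verified.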
Example
Goal Stack:
ON(C, A)
ON(B, D)
ON(C, A) ∧ ON(B, D) (compound goal)

Replacing the sub-goal ON(C, A) by the operator S(C, A):
S(C, A)
ON(B, D)
ON(C, A) ∧ ON(B, D)
S(C, A) can be applied if its preconditions are true. So add its preconditions on the
stack.
Goal Stack:
CL(A)
HOLD(C)
CL(A) ∧ HOLD(C) (preconditions of STACK)
S(C, A) (operator)
ON(B, D)

To achieve CL(A), replace it by US(B, A) and push its preconditions:
ON(B, A)
CL(B)
AE
ON(B, A) ∧ CL(B) ∧ AE
US(B, A) (operator)
HOLD(C)
CL(A) ∧ HOLD(C)
S(C, A) (operator)
ON(B, D)
ON(B, A), CL(B) and AE are all true in the initial state, so pop these along with
their compound goal.
Next pop the top operator US(B, A) and produce a new state by using its ADD and
DELETE lists.
Add US(B, A) to a queue of the sequence of operators.
SQUEUE = US (B, A)
State_1:
Goal Stack:
HOLD(C)
CL(A) ∧ HOLD(C)
S(C, A) (operator)
ON(B, D)
To satisfy the goal HOLD(C), two operators can be used, e.g., PU(C) or US(C,
X), where X could be any block. Let us choose PU(C) and proceed further.
Repeat the process. Change in states is shown below.
[Figures: State_1 (SQUEUE = US(B, A)) through State_4, showing the block
configuration after each operator is applied]
The goal stack method is not efficient for difficult problems such as the Sussman
anomaly.
It can fail to find a good solution.
Consider the Sussman anomaly problem
[Figure: Sussman anomaly initial and goal states]
Solving ON(A, B) first gives the operators US(C, A), PD(C), PU(A) and S(A, B).
The complete plan is:
1. US(C, A)
2. PD(C)
3. PU(A)
4. S(A, B)
5. US(A, B)
6. PD(A)
7. PU(B)
8. S(B, C)
9. PU(A)
10. S(A, B)
Although this plan will achieve the desired goal, it is not efficient.
To get an efficient plan, either repair this plan or use some other method.
Repairing is done by looking at places where operations are done and undone
immediately, such as S(A, B) and US(A, B).
By removing them, we get
1. US(C, A)
2. PD (C)
3. PU(B)
4. S(B, C)
5. PU(A)
6. S(A, B)
6. Solve the blocks world problem using strips. How does it act as a planning system?
( May 2014)
Block World Problem
UNSTACK (X, Y): [US (X, Y)]
o Pick up X from its current position on block Y. The arm must be empty and X
must have no block on top of it.
STACK (X, Y): [S (X, Y)]
o Place block X on block Y. The arm must be holding X and the top of Y must be clear.
PICKUP (X): [PU (X) ]
o Pick up X from the table and hold it. Initially the arm must be empty and top
of X is clear.
PUTDOWN (X): [PD (X)]
o Put block X down on the table. The arm must have been holding block X.
Predicates used to describe the state
o ON(X, Y) - Block X on block Y.
o ONT(X) - Block X on the table.
o CL(X) - Top of X clear.
o HOLD(X) - Robot-Arm holding X.
o AE - Robot-arm empty.
Logical statements true in this block world.
o Holding X means the arm is not empty
(∃X) HOLD(X) → ¬AE
o X being on the table means that X is not on top of any block
(∀X) ONT(X) → ¬(∃Y) ON(X, Y)
o Any block with no block on it has a clear top
(∀X) (¬(∃Y) ON(Y, X)) → CL(X)
Effect of Unstack operation
The effect of US(X, Y) is described by the following axiom
[CL(X, State) ∧ ON(X, Y, State)] →
[HOLD(X, DO(US(X, Y), State)) ∧ CL(Y, DO(US(X, Y), State))]
o DO is a function that generates a new state as a result of given action and a
state.
For each operator, set of rules (called frame axioms) are defined where the
components of the state are
o affected by an operator
If US(A, B) is executed in state S0, then we can infer that HOLD(A,
S1) ∧ CLEAR(B, S1) holds true, where S1 is the new state after the Unstack
operation is executed.
o not affected by an operator
If US(A, B) is executed in state S0, B in S1 is still on the table, but we can't derive it.
So a frame rule stating this fact is defined as ONT(Z, S) → ONT(Z, DO(US(A, B), S))
Advantage of this approach is that
o simple mechanism of resolution can perform all the operations that are
required on the state descriptions.
Disadvantage is that
o the number of frame axioms becomes very large for complex problems, because
every unaffected attribute needs its own rule; for example, the COLOR of a
block also does not change.
o So we have to specify a rule for each attribute:
COLOR(X, red, S) → COLOR(X, red, DO(US(Y, Z), S))
To handle complex problem domains, there is a need for a mechanism that does not
require a large number of explicit frame axioms.
The figure shown above is a typical learning system model. It consists of the following
components.
1. Learning element.
2. Knowledge base.
3. Performance element.
4. Feedback element.
5. Standard system.
1. Learning element
It receives and processes the input obtained from a person (i.e. a teacher), from
reference material like magazines, journals, etc., or from the environment at large.
2. Knowledge base
This is somewhat similar to a database. Initially it may contain some basic
knowledge. Thereafter it receives more knowledge, which may be added as it is or
may replace the existing knowledge.
3. Performance element
It uses the updated knowledge base to perform some tasks or solve some problems
and produces the corresponding output.
4. Feedback element
It receives two inputs, one from the performance element and one from the standard (or
idealized) system, and identifies the differences between the two. The feedback is
used to determine what should be done in order to produce the correct output.
5. Standard system
It is a trained person or a computer program that is able to produce the correct
output. In order to check whether the machine learning system has learned well, the same
input is given to the standard system. The outputs of standard system and that of performance
element are given as inputs to the feedback element for the comparison. Standard system is
also called idealized system.
The sequence of operations described above may be repeated until the system reaches the
desired level of performance.
There are several factors affecting the performance.
They are,
Types of training provided.
The form and extent of any initial background knowledge.
The type of feedback provided.
The learning algorithms used.
Training is the process of making the system able to learn. It may consist of randomly
selected examples that include a variety of facts and details including irrelevant data. The
learning techniques can be characterized as a search through a space of possible hypotheses
or solutions. Background knowledge can be used to make learning more efficient by reducing
the search space. The feedback may be a simple yes or no type of evaluation or it may
contain useful information describing why a particular action was good or bad. If the
feedback is always reliable and carries useful information, the learning process will be faster
and the resultant knowledge will be correct.
The success of machine learning system also depends on the algorithms. These
algorithms control the search to find and build the knowledge structures. The algorithms
should extract useful information from training examples.
UNIT V EXPERT SYSTEMS
PART A
1. What are expert systems?
An expert system is an interactive computer-based decision tool that uses both facts
and heuristics to solve difficult decision making problems, based on knowledge acquired
from an expert.
An expert system is a model and associated procedure that exhibits, within a specific
domain, a degree of expertise in problem solving that is comparable to that of a human
expert.
The first expert system, called DENDRAL, was developed in the mid-1960s at Stanford
University.
2. Differentiate a conventional program from an expert system.

Conventional program | Expert system
Knowledge and processing are combined in one programme. | Knowledge base and processing mechanism (inference engine) are two different components.
Programme does not make errors (only programming errors). | The ES programme may make a mistake.
Usually it will not explain why the data needs to be input or how the decision is achieved. | Explanation is part of an ES component.
System is operational only when fully developed. | An ES can operate with a small number of rules.
Step by step execution according to fixed algorithms is necessary. | Execution is done logically and heuristically.
Needs complete and full information. | Can operate with sufficient or insufficient information.
Manipulates a large and effective database. | Manipulates a large and effective knowledge base.
Easily operated with quantitative data. | Easily operated with qualitative data.
3. Define heuristics
Heuristic is a rule of thumb that probably leads to a solution. Heuristics play a major
role in search strategies because of the exponential nature of most problems. Heuristics
help to reduce the number of alternatives from an exponential number to a polynomial number.
4. Define Meta Knowledge and its use. (December 2011, November 2013)
Meta knowledge includes the ability to evaluate the knowledge available, the
additional knowledge required, and the systematic behaviour implied by the present rules.
Jacques Pitrat definition:
Anyone can be called an expert as long as that person has vast knowledge of a
particular field and practical experience in that domain. However, the person is
restricted to his or her own domain. For example, being an IT expert does not mean that the
person is an expert in all IT domains; he or she may be an expert in intelligent systems or
only in the development of an intelligent agent.
LISP
All ES developed in the early days used LISP, or tools written using the LISP
language.
PROLOG
Ongoing research in artificial intelligence gave birth to the programming
language PROLOG. PROLOG is an acronym for 'Programming in Logic'. A programme
written in PROLOG can be viewed as a knowledge database that stores facts and rules.
9. Name some expert systems developed for voice recognition and their uses.
Hearsay was an early attempt at solving voice recognition through an expert systems
approach. Hearsay and all interpretation systems are essentially pattern recognition
systems, looking for patterns in noisy data; in the case of Hearsay, phonemes in an audio
stream. Other early examples analyzed sonar data to detect Russian submarines. These
kinds of systems proved much more amenable to a neural network AI solution than a rule-
based approach.
was one of the most successful areas for early expert systems applied to business domains,
such as salespeople configuring DEC VAX computers and mortgage loan application
development.
SMH.PAL is an expert system for the assessment of students with multiple disabilities.
12. How is expert system used to monitor dam safety and landslides?
Mistral is an expert system for the monitoring of dam safety developed in the 90's by
Ismes (Italy). It gets data from an automatic monitoring system and performs a diagnosis of
the state of the dam. Its first copy, installed in 1992 on the Ridracoli Dam (Italy), is still
operational 24/7/365. It has been installed on several dams in Italy and abroad (e.g. Itaipu
Damin Brazil), as well as on landslides under the name of Eydenet, and on monuments under
the name of Kaleidos. Mistral is a registered trade mark of CESI.
Ignorance is not knowing something that is knowable. Uncertainty is where something
is not knowable: it is inherent in the situation. Ignorance can come from (a) the limited
knowledge of the human expert; (b) inexact data; or (c) incomplete data (which forces a
premature decision).
15. Name any three universities and mention the expert system tools developed there?
PART B
1. Draw the schematic of expert system and explain. (December 2011, November
2012, June 2013)
Expert systems have a number of major system components and interface with
individuals who interact with the system in various roles.
1. Knowledge base :
The knowledge acquired from the human expert must be encoded in such a
way that it remains a faithful representation of what the expert knows, and it can be
manipulated by a computer.
Three common methods of knowledge representation evolved over the years are
1. IF-THEN rules.
2. Semantic networks.
3. Frames.
1. IF-THEN rules
If a1, a2, . . . , an then b1, b2, . . . , bn, where the ai are the conditions (antecedents)
and the bi are the conclusions (consequents).
2.Semantic Networks
The Fig. below shows a car IS-A vehicle; a vehicle HAS wheels. This kind of
relationship establishes an inheritance hierarchy in the network, with the objects lower down
in the network inheriting properties from the objects higher up.
3. Frames
In this technique, knowledge is decomposed into highly modular pieces called frames,
which are generalized record structures. Knowledge consists of concepts, situations,
attributes of concepts, relationships between concepts, and procedures to handle
relationships as well as attribute values.
The attributes, the relationships between concepts, and the procedures are allotted to slots in a
frame.
The contents of a slot may be of any data type - numbers, strings, functions or procedures
and so on.
The frames may be linked to other frames, providing the same kind of inheritance as that
provided by a semantic network.
Two frames, their slots and the slots filled with data type are shown.
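As a toy sketch of the idea, frames can be modelled as record-like structures whose slots hold data or links to other frames, with slot values inherited through IS-A links exactly as in the semantic network's car/vehicle example; the slot names and values below are invented.

```python
# Each frame is a record of slots; 'is_a' links it to a more general frame.
frames = {
    'vehicle': {'is_a': None, 'has': 'wheels'},
    'car':     {'is_a': 'vehicle', 'seats': 4},
}

def get_slot(name, slot):
    """Look the slot up locally, then follow the is_a chain upwards."""
    while name is not None:
        frame = frames[name]
        if slot in frame and slot != 'is_a':
            return frame[slot]
        name = frame['is_a']          # inherit from the parent frame
    return None

value = get_slot('car', 'has')        # inherited from 'vehicle': 'wheels'
```

The lookup shows the same inheritance a semantic network provides: `car` has no `has` slot of its own, so the value comes from `vehicle` higher up.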
2. Working storage
Working memory refers to task-specific data for a problem. The contents of the
working memory, changes with each problem situation. Consequently, it is the most dynamic
component of an expert system, assuming that it is kept current.
o Every problem in a domain has some unique data associated with it.
o Data may consist of the set of conditions leading to the problem, its parameters and so
on.
o Data specific to the problem needs to be input by the user at the time of use, i.e.,
when consulting the expert system.
The Fig. below shows how working memory is closely related to the user interface of the
expert system.
3. Inference engine :
This is the code at the core of the system which derives recommendations from the
knowledge base and the problem-specific data in working storage.
The inference engine is a generic control mechanism for navigating through and
manipulating knowledge and deducing results in an organized manner. It applies the
axiomatic (self-evident) knowledge present in the knowledge base to the task-specific
data to arrive at a conclusion.
The Forward chaining, Backward chaining and Tree searches are some of the techniques used
for drawing inferences from the knowledge base.
4. User interface :
The code that controls the dialog between the user and the system.
1. Domain expert : The individuals who are currently expert at solving the
problems the system is intended to solve.
3. User : The individual who will consult the system to get advice which
would otherwise have been provided by the expert.
Many expert systems are built with products called expert system shells. A shell is a
piece of software which contains the user interface, a format for declarative knowledge in the
knowledge base, and an inference engine. The knowledge and system engineers use these
shells in making expert systems.
1. Knowledge engineer: uses the shell to build a system for a particular problem
domain.
2. System engineer: builds the user interface, designs the declarative format of the
knowledge base, and implements the inference engine.
Depending on the size of the system, the knowledge engineer and the system engineer might
be the same person.
2. Write short notes on DART and MYCIN (May 2012, November 2012, June
2013, November 2013, May 2014)
i) DART
It is an artificial intelligence based expert system used for computer system fault
diagnosis. This system is an automated consultant that advises IBM field service personnel on
the diagnosis of faults occurring in computer installations.
The consultant identifies specific system components (both hardware and software) likely to
be responsible for an observed fault and offers a brief explanation of the major factors and
evidence supporting these indictments.
The consultant, called DART, was constructed using EMYCIN, and is part of a larger
research effort investigating automated diagnosis of machine faults.
Motivation and Scope of Effort
A typical, large-scale computer installation is composed of numerous subsystems including
CPUs, primary and secondary storage, peripherals, and supervisory software. Each of these
subsystems, in turn, consists of a richly connected set of both hardware and software
components such as disk drives, controllers, CPUs, memory modules, and access methods.
Generally, each individual component has an associated set of diagnostic aids designed to test
its own specific integrity. However, very few maintenance tools or established diagnostic
strategies aim at identifying faults at the system or subsystem level. As a result,
identification of single or multiple faults from systemic manifestations remains a difficult
task. The non-specialist field service engineer is trained to use the existing component-
specific tools and, as a result, is often unable to attack the failure at the systemic level. Expert
assistance is then required, increasing both the time and cost required to determine and repair
the fault. The design of DART reflects the expert's ability to take a systemic viewpoint on
problems and to use that viewpoint to indict specific components, thus making more effective
use of the existing maintenance capabilities.
For our initial design, the concentration was on problems occurring within the
teleprocessing (TP) subsystems for the IBM 370-class computers. This subsystem includes
various network controllers, terminals, remote-job entry facilities, modems, and several
software access methods. In addition to these well-defined components there are numerous
available test points the program can use during diagnosis. The focus was on handling two of
the most frequent TP problems, (1) when a user is unable to log on to the system from a
remote terminal, and (2) when the system operator is unable to initialize the TP network
itself. In a new system configuration, these two problems constitute a significant percentage
of service calls received.
Interviews with field-service experts made it apparent that much of their problem-solving
expertise is derived from their knowledge of several well-defined communications protocols.
Often composed of simple request-acknowledge sequences, these protocols represent the
transactions between components that are required to perform various TP tasks. Although
based on information found in reference manuals it is significant that these protocols are not
explicitly detailed anywhere in the standard maintenance documentation. Knowledge of the
basic contents of these protocols and their common sequence forms the basis of a diagnostic
strategy: use the available tracing facilities to capture the actual communications occurring in
the network, and analyze this data to determine which link in the protocol chain has broken.
This procedure is sufficient to identify specific faulty components in the network.
Each subsystem serves as a focal point for tests and findings associated with that segment of
the diagnostic activity. These subsystems currently correspond to various input/output
facilities (e.g., DISK, TAPE, TP) or the CPU-complex itself. For each selected subsystem,
the user is asked to identify one or more logical pathways which might be involved in the
situation. Each of these logical pathways corresponds to a line of communication between a
peripheral and an application program. On the basis of this information and details of the
basic composition of the network, the appropriate communications protocol can be selected.
The user is also asked to indicate which diagnostic tools (e.g., traces, dumps, logic probes)
are available for examining each logical pathway.
Once the logical pathway and protocol have been determined, descriptions are gathered of
the often multiple physical pathways that actually implement the logical pathway. It is on
this level that diagnostic test results are presented and actual component indictments occur.
For DART to be useful at this level, the field engineer must be familiar with the diagnostic
equipment and software testing and tracing facilities which can be requested, and, of course,
must also have access to information about the specific system hardware and software
configuration of the installation. Finally, at the end of the consultation session, DART
summarizes its findings and recommends additional tests and procedures to follow. The
figure below depicts the major steps of the diagnostic process outlined above.
After DART has indicated the components most likely to be at fault, the responsibility
for performing a detailed determination and repair of the actual component failures (i.e.,
microcode bugs, integrated circuit failures, etc.) shifts to the appropriate
maintenance groups for those components.
Result
The current DART knowledge base consists of 300 EMYCIN parameters and 190
production rules and was constructed over a period of 8 months. During this period 5
specialists were interviewed about different aspects of the diagnostic process and the
knowledge base reflects their composite expertise. Much of the requested diagnostic data is
already in a machine-readable form on the subject computer system. However, as the
transcript shows, this information must currently be entered by the user. This interaction
forms a substantial fraction of the user's input. Indeed, we estimate 30 to 60 percent of the
current interaction between program and user will be eliminated when this on-line data is
exploited.
It is clear that the communications protocols form the crux of the expertise, both for
the human specialist and for DART. Although the experts were able to easily articulate these
protocols, their translation into the production rule formalism was tedious. A protocol
represents an expected sequence of transactions between components. However, in order to
identify specific faulty components, the production rules capture only the possible deviations
from this expected sequence. Thus each protocol yields a substantial number of rules which
only indirectly reflect the original sequence and tend to produce rather opaque explanations
of the diagnostic reasoning. Furthermore, for any lengthy protocol, ensuring the completeness
of the resulting rule set becomes a significant problem. In a collateral effort, we are
investigating the use of explicit representations of these protocols with general diagnostic
rules which will hypothesize deviations directly from the protocols.
ii. MYCIN.
MYCIN was developed at Stanford university by Shortliffe (1970’s). It was an
expert system for diagnosing blood diseases. It was a precursor to today’s expert
systems and acts as an ideal case study.
Features
A typical rule:
IF the organism is gram-negative
AND is anaerobic
THEN there is suggestive evidence that its identity is bacteroides
The rules were actually stored as LISP expressions.
Inexact reasoning was employed using certainty factors. A certainty factor is a number on
the scale -1 to 1, with -1 being definitely false and +1 definitely true. MYCIN used
meta-rules to redirect search at stages, e.g. a meta-rule concluding:
THEN evidence suggests using rules for E before rules for gram-positive rods
The above guides the search by saying which rules to try first.
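When two rules bear on the same hypothesis, MYCIN combines their certainty factors on the -1..1 scale with its standard combination formula; a sketch (the sample values are invented):

```python
def combine(cf1, cf2):
    """MYCIN's parallel combination of two certainty factors in [-1, 1]."""
    if cf1 >= 0 and cf2 >= 0:
        return cf1 + cf2 * (1 - cf1)          # both supporting
    if cf1 < 0 and cf2 < 0:
        return cf1 + cf2 * (1 + cf1)          # both opposing
    return (cf1 + cf2) / (1 - min(abs(cf1), abs(cf2)))  # conflicting

cf = combine(0.6, 0.4)   # two supporting pieces of evidence -> 0.76
```

Note that the combined belief (0.76) exceeds either factor alone but never reaches 1, matching the intuition that corroborating evidence strengthens, without proving, a conclusion.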
To make the system acceptable to its users (doctors), MYCIN incorporated natural language:
users entered information in English and were answered in English. A spelling checker made
intelligent corrections to the input. WHY, WHY NOT and HOW explanation facilities
were provided. Several suggestions were often offered, listed with some priority, allowing
the doctor some freedom of selection.
MYCIN has two phases in its approach, a diagnosis and a prescription phase.
In the first phase the nature of the infection and the organism causing the infection are
determined (hopefully). The prescription phase then indicates the drugs for the treatment
taking into account any side effects they may have on the patient.
MYCIN consists of about 500 rules. It is a backward chaining rule based system using
a depth-first search strategy.
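A backward-chaining, depth-first strategy of this kind can be sketched in a few lines. The rules and facts below are illustrative only, not MYCIN's actual knowledge base.

```python
# Minimal backward chainer: each goal maps to alternative premise lists.
RULES = {
    "bacteroides": [["gram_negative", "anaerobic"]],
    "anaerobic":   [["grows_without_oxygen"]],
}
FACTS = {"gram_negative", "grows_without_oxygen"}

def prove(goal):
    """Depth-first backward chaining: try to establish `goal`
    from known facts, recursing on rule premises."""
    if goal in FACTS:
        return True
    for premises in RULES.get(goal, []):
        if all(prove(p) for p in premises):
            return True
    return False

print(prove("bacteroides"))   # True
```

In a real consultation system, a goal with no rules and no known fact would trigger a question to the user rather than simply failing.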
Evaluation
In order to assess the performance of MYCIN, a set of ten case histories of meningitis was
selected. The prescriptions of MYCIN were compared with those of medics of various degrees
of expertise from Stanford. All were rated by experts outside of Stanford (the evaluators),
who classified the prescriptions according to the categories:
Prescriber    Acceptable (%)    Failures
MYCIN         70                0
Prior Rx      70                0
Faculty 1     50                1
Faculty 2     50                1
Fellow        50                1
Faculty 4     50                0
Faculty 3     40                0
Resident      30                1
Faculty 5     30                0
Student       10                3
1. Identification
2. Conceptualization
3. Formalization
4. Implementation
5. Testing and Evaluation
6. Maintenance
1. Identification: This phase is used to identify the problems, data, goals, company,
people…
• This was expressed in production rules:
IF c1, c2, c3 THEN a1, a2, a3
• Configuration stage was explicitly represented as data: current goal or context
• Changing contexts moved configuration process through all stages
4. Implementation:
Build the system in executable form
• Language: OPS5 (similar to CLIPS)
• Conflict Resolution: MEA (extends Lex / Specificity)
• Means-Ends Analysis: order by recency of first condition
IF c1, c2 THEN … is now different from IF c2, c1 THEN …
• Contexts are treated as special by putting them first
• End-task is unspecific, thus executed last
• Use MEA + Spec to concentrate on subtasks:
• IF g1, x, y THEN assert barify // Signal necessity of subtask
• IF barify, a THEN p, q // Two rules perform the task
• IF barify, b THEN r, s // of barification per se
• IF barify THEN retract barify // Termination when ready
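The barify rules above can be simulated with a tiny forward chainer. This is an illustrative sketch in Python, not OPS5; it approximates MEA-style conflict resolution with specificity alone (most conditions first) plus refractoriness (each rule fires at most once).

```python
# Working memory is a set of symbols; each rule is
# (conditions, facts-to-add, facts-to-retract).
rules = [
    (["g1", "x", "y"], ["barify"], []),          # signal necessity of subtask
    (["barify", "a"],  ["p", "q"], []),          # two rules perform the task
    (["barify", "b"],  ["r", "s"], []),          # of barification per se
    (["barify"],       [],         ["barify"]),  # termination when ready
]

wm = {"g1", "x", "y", "a", "b"}
fired = set()            # refractoriness: fire each rule at most once

while True:
    matches = [(i, r) for i, r in enumerate(rules)
               if set(r[0]) <= wm and i not in fired]
    if not matches:
        break
    # Specificity: prefer the rule with the most conditions, so the
    # one-condition termination rule runs only after the others.
    i, (conds, add, rem) = max(matches, key=lambda m: len(m[1][0]))
    fired.add(i)
    wm |= set(add)
    wm -= set(rem)

print(sorted(wm))  # p, q, r, s have been added; barify has been retracted
```

The termination rule is deliberately the least specific, so it loses every conflict-resolution round until the subtask's work rules have fired, exactly as the bullets above describe.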
5. Testing and Evaluation: Does it do what we wanted?
• Field test after 1 year, production after another year
• Accuracy over 95%: no more pre-assembly was necessary!
• The installed configurations are optimized: buyers are happier because they see
better products
• Less retraining of staff on product changes: quicker change of production
• Net return to Digital is estimated at $40M per year.
6. Maintenance
Adapt to changing environment or requirements
1. Users of XCON
• Sales: Use for quotations and ensure technical validity of orders (XSEL)
• Manufacturing: Check installability of order, guide assembly instructions and
diagnostics
• Field service: Easy assembly at customer’s site (XFL)
• Development: Anticipate integration problems for new products
The system found more uses than it was designed for.
2. Key Roles
• Experts: Provide domain knowledge
• Users: Feedback on fit in business process
The generic components of a shell are the knowledge acquisition subsystem, the knowledge
base, the reasoning engine, the explanation subsystem and the user interface, described
below. The knowledge base and reasoning engine are the core components.
1. Knowledge Base
A store of factual and heuristic knowledge. An expert system tool provides one or
more knowledge representation schemes for expressing knowledge about the
application domain. Some tools use both frames (objects) and IF-THEN rules; in
PROLOG the knowledge is represented as logical statements.
2. Reasoning Engine
Inference mechanisms for manipulating the symbolic information and knowledge in
the knowledge base to form a line of reasoning in solving a problem. The inference
mechanism can range from simple modus ponens backward chaining of IF-THEN
rules to case-based reasoning.
4. Explanation subsystem
A subsystem that explains the system's actions. The explanation can range from how
the final or intermediate solutions were arrived at to justifying the need for additional
data.
5. User Interface
A means of communication with the user. The user interface is generally not a
part of the expert system technology. It was not given much attention in the past.
However, the user interface can make a critical difference in the perceived utility of
an Expert system.
6. Explanation
Most expert systems have explanation facilities that allow the user to ask
questions - why and how it reached some conclusion.
The questions are answered by referring to the system goals, the rules being used, and
the current state of problem solving. The rules typically reflect empirical, or "compiled",
knowledge: they are codes of an expert's rules of thumb, not the expert's deeper
understanding.
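A HOW explanation of this kind can be sketched by recording, for every conclusion, the rule that produced it. The rule names and facts below are hypothetical, loosely based on the flooded-car example; this is not the mechanism of any particular shell.

```python
# Each rule: (name, premises, conclusion). The facts and rule names
# are invented for illustration.
RULES = [
    ("R1", ["car_cranks", "smell_of_petrol"], "car_is_flooded"),
    ("R2", ["car_is_flooded"], "wait_ten_minutes_then_retry"),
]
facts = {"car_cranks", "smell_of_petrol"}
support = {}  # conclusion -> (rule name, premises) that established it

changed = True
while changed:                      # simple forward chaining to a fixpoint
    changed = False
    for name, prem, concl in RULES:
        if concl not in facts and all(p in facts for p in prem):
            facts.add(concl)
            support[concl] = (name, prem)
            changed = True

def explain_how(fact):
    """Answer HOW a conclusion was reached by tracing rule support."""
    if fact not in support:
        return fact + ": given by the user"
    name, prem = support[fact]
    lines = [fact + ": concluded by rule " + name +
             " from " + ", ".join(prem)]
    lines += ["  " + explain_how(p) for p in prem]
    return "\n".join(lines)

print(explain_how("wait_ten_minutes_then_retry"))
```

A WHY facility works the other way around: instead of tracing the support of a conclusion backwards, it reports the rule the system is currently trying to establish when it asks a question.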
Example: a dialogue for a car that will not start, in which the system asks diagnostic
questions, the user answers No, Yes, Yes, and then asks Why?, and the system justifies
its question by citing the rule it is currently trying to apply.
Note: The rule gives the correct advice for a flooded car, and knows the questions to
be asked to determine if the car is flooded, but it does not contain the knowledge of
what a flooded car is and why waiting will help.
Types of Explanation
6. What is knowledge acquisition? Explain in detail. (November 2013)
Knowledge acquisition includes the elicitation, collection, analysis, modeling and
validation of knowledge.
1. Protocol-generation techniques.
2. Protocol analysis techniques.
3. Hierarchy generation techniques.
4. Matrix-based techniques.
5. Sorting techniques.
6. Limited-information and constrained-processing tasks.
7. Diagram-based techniques.
■ Protocol-generation techniques
Include various types of interviews (unstructured, semi-structured and structured),
reporting techniques (such as self-report) and observational techniques.
■ Hierarchy-generation techniques
Involve creation, reviewing and modification of hierarchical knowledge. Hierarchy-
generation techniques, such as laddering, are used to build taxonomies or other hierarchical
structures such as goal trees and decision networks. Ladders take various forms, such as
concept ladders, attribute ladders and composition ladders.
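A concept ladder elicited this way is simply a taxonomy tree. A minimal sketch, with invented domain concepts:

```python
# A toy concept ladder as nested dicts; None marks a leaf concept.
ladder = {
    "organism": {
        "bacterium": {
            "gram_positive": None,
            "gram_negative": {"bacteroides": None},
        },
        "virus": None,
    },
}

def leaves(tree):
    """Collect the most specific concepts at the bottom of the ladder."""
    out = []
    for name, sub in tree.items():
        out.extend([name] if sub is None else leaves(sub))
    return out

print(sorted(leaves(ladder)))  # ['bacteroides', 'gram_positive', 'virus']
```

During laddering the knowledge engineer repeatedly asks "what kinds of X are there?", which corresponds to adding children under a node of this tree.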
■ Matrix-based techniques
Involve the construction and filling-in of a 2-D matrix (grid, table), indicating
relationships such as those between concepts and properties (attributes and values),
between problems and solutions, or between tasks and resources. The elements within
the matrix can contain symbols (ticks, crosses, question marks), colours, numbers or text.
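Once filled in, such a matrix can be held directly as a data structure and queried. A minimal sketch with an invented car-diagnosis grid (the symptoms and faults are assumptions, not from any real elicitation session):

```python
# Hypothetical problems-vs-solutions grid: True marks that the
# symptom (row) is consistent with the fault (column).
FAULTS = ["flooded", "flat_battery"]
GRID = {
    "engine_cranks":   {"flooded": True,  "flat_battery": False},
    "smell_of_petrol": {"flooded": True,  "flat_battery": False},
    "lights_dim":      {"flooded": False, "flat_battery": True},
}

def candidate_faults(observed):
    """Return the faults consistent with every observed symptom."""
    return [f for f in FAULTS
            if all(GRID[s][f] for s in observed)]

print(candidate_faults(["engine_cranks", "smell_of_petrol"]))  # ['flooded']
```

Empty cells in a real elicitation grid are just as useful as filled ones: they show the knowledge engineer exactly which expert judgements are still missing.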
■ Sorting techniques
Used for capturing the way people compare and order concepts; it may reveal
knowledge about classes, properties and priorities.
■ Limited-information and constrained-processing tasks
Techniques that limit the time and/or information available to the expert when
performing tasks. For example, a twenty-questions technique provides an efficient way of
accessing the key information in a domain in a prioritized order.
■ Diagram-based techniques
Include generation and use of concept maps, state transition networks, event diagrams
and process maps. These are particularly important in capturing the "what, how, when, who
and why" of tasks and events.
7. Explain the characteristics, role and advantages of an expert system. (NOV 2013)
An expert system operates as an interactive system that responds to questions, asks for
clarifications, makes recommendations and generally aids the decision-making process.
Responds to questions
Asks for clarifications
Makes recommendations
Aids the decision-making process.
2. Tools have the ability to sift (filter) knowledge
3. Make logical inferences based on the stored knowledge
o Best suited for those dealing with expert heuristics for solving problems.
o Not a suitable choice for those problems that can be solved using purely numerical
techniques.
8. Cost-Effective alternative to Human Expert
The Expert systems have found their way into most areas of knowledge work. The
applications of expert systems technology have widely proliferated to industrial and
commercial problems, and even helping NASA to plan the maintenance of a space shuttle for
its next flight. The main applications are
1. Diagnosis and Troubleshooting
Medical diagnosis was one of the first knowledge areas to which Expert system technology
was applied in 1976. However, the diagnosis of engineering systems quickly surpassed
medical diagnosis.
2. Planning and Scheduling
The Expert system's commercial potential in planning and scheduling has been recognized as
very large. Examples are airlines scheduling their flights, personnel and gates, and
manufacturing process planning and job scheduling.
3. Configuration of Manufactured Objects
In configuration problems a solution is synthesized from a given set of elements related by a
set of constraints. Expert systems have been very useful in finding such solutions, for
example in modular home building and in manufacturing involving complex engineering
design.
4. Financial Decision Making
The financial services industry is a vigorous user of expert system techniques. Advisory
programs have been created to assist bankers in determining whether to make loans to
businesses and individuals. Insurance companies use expert systems to assess the risk
presented by a customer and to determine a price for the insurance. Expert systems are also
used in financial markets, for example in foreign exchange trading.
5. Knowledge Publishing
This is a relatively new but potentially explosive area. Here the primary function of the
Expert system is to deliver knowledge that is relevant to the user's problem. The two most
widely known Expert systems are an advisor on appropriate grammatical usage in a text,
and a tax advisor on tax strategy, tactics, and individual tax policy.
6. Process Monitoring and Control
Here the Expert system does analysis of real-time data from physical devices, looking for
anomalies, predicting trends, controlling optimality and failure correction. Examples of real-
time systems that actively monitor processes are found in the steel making and oil refining
industries.
7. Design and Manufacturing
Here the Expert systems assist in the design of physical devices and processes, ranging from
high-level conceptual design of abstract entities all the way to factory floor configuration of
manufacturing processes.
(a) Consistency
One of the advantages of an ES is that the results given are consistent. This might be due to
the fact that there are no elements such as exhaustion and emotions as experienced by
humans.
(b) Availability
A very difficult problem encountered by an organisation, if not taken seriously, can cause an
adverse impact such as losses or cancellation of a business deal. Sometimes, the problems
need to be attended to quickly. The problems can become more complicated when individuals
or experts involved in solving them are absent or cannot be contacted. Thus, an ES serves as
an alternative to experts.
(c) Combined Expertise
One of the important components in an ES is the knowledge base. This component contains
the accumulated knowledge and expertise acquired or transferred from many experts. Thus,
an ES is sometimes superior to a single expert because its knowledge and expertise have
come from many sources.
(d) Training
An ES can be used by trainees to learn about the knowledge-based system. A trainee who
uses an ES would be able to observe how an expert solves a problem.
ES is not widely used in business firms or organisations. Due to this limited usage, firms are
still in doubt about the capability of an ES and wary of the high cost involved in investing in
one.
Using an ES is very difficult, and learning and mastering it requires a long time. This
discourages managers from using an ES. Developing an ES that is user-friendly is therefore
the biggest challenge for an ES developer.
This is the most obvious weakness in an ES; its scope is limited to its own field. During
development, an ES performs best in the narrow domain for which it was built, where it can
achieve high accuracy. However, in use, decision makers face constantly changing problems
which involve different fields that are interrelated.
The main source of the knowledge is experts. Humans make mistakes. If the experts input
wrong information into the Expert system, this will have a negative impact on the results
produced.
The information in ES must be constantly updated to solve new problems. Every new
problem that occurs needs new knowledge and expertise. This means that there must be an
on-going relationship between the domain experts and the ES developer. This situation
requires that the domain experts update the source of knowledge and expertise to suit the
current situation.
Consulting a group of experts is expensive, and the cost is higher still if the ES is built
traditionally, without the use of an Expert System shell. Moreover, programming cost is high
because the artificial intelligence techniques are difficult to master and need a very skilful
programmer.
We must be responsible for our actions and decisions. An expert has to take responsibility
for the information he or she provides. The difficult question here is who should shoulder the
responsibility if a decision suggested by an ES results in a negative outcome.
Industry/ Practical Connectivity
Industry Connectivity
The student can work as a software engineer in industry, for companies like Amazon,
building shopping-list recommendation engines, or Facebook, analyzing and processing
big data.
The students can also work as hardware engineers developing electronic parking
assistants or home assistant robots.
Latest developments in AI
Anna University
B.E./B.Tech. DEGREE EXAMINATION, APRIL/MAY 2011
Sixth Semester
Computer Science and Engineering
CS 2351 — ARTIFICIAL INTELLIGENCE
(Regulation 2008)
PART A — (10 × 2 = 20 marks)
1. List down the characteristics of intelligent agent.
2. What do you mean by local maxima with respect to search technique?
3. What factors determine the selection of forward or backward reasoning
approach for an AI problem?
4. What are the limitations in using propositional logic to represent the
knowledge base?
5. Define partial order planner?
6. What are the differences and similarities between problem solving and
planning?
7. List down two applications of temporal probabilistic models.
8. Define Dempster-Shafer theory.
9. Explain the concept of learning from example.
10. How statistical learning method differs from reinforcement learning method?
PART B — (5 × 16 = 80 marks)
11. (a) Explain in detail on the characteristics and applications of learning agents.
Or
(b) Explain AO* algorithm with an example.
12. (a) Explain unification algorithm used for reasoning under predicate logic with an
example.
Or
(b) Describe in detail the steps involved in the knowledge Engineering process.
13. (a) Explain the concept of planning with state space search using suitable examples.
Or
(b) Explain the use of planning graphs in providing better heuristic estimate with suitable
examples.
14. (a) Explain the method of handling approximate inference in Bayesian Networks.
Or
(b) Explain the use of Hidden Markov Models in Speech Recognition.
15. (a) Explain the concept of learning using decision trees and neural network
approach.
Or
(b) Write short notes on :
(i) Statistical learning. (8)
(ii) Explanation based learning. (8)
Anna University
B.E./B.Tech. DEGREE EXAMINATION, NOVEMBER/DECEMBER 2011.
Sixth Semester
Computer Science and Engineering
CS 2351 — ARTIFICIAL INTELLIGENCE
(Common to Seventh Semester – Electronics and Instrumentation Engineering)
(Regulation 2008)
ensure a valid solution. Draw a diagram of the complete state space.
(ii) Design appropriate search algorithm for it.
13. (a) Explain the concept of planning with state space search. How is it different from
partial order planning?
Or
(b) What are planning graphs? Explain the methods of planning and acting in the real
world.
14. (a) Explain the concept of Bayesian network in representing knowledge in an
uncertain domain.
Or
(b) Write short notes on : (i) Temporal models
(ii) Probabilistic Reasoning.
15. (a) Explain in detail learning from observation and explanation based learning.
Or
(b) Explain in detail statistical learning methods and reinforcement learning.
B.E/B.TECH DEGREE EXAMINATION,APRIL/MAY 2010
3. State the reasons why hill climbing often gets stuck.
PART B – (5 x 16 = 80 marks)
(b) Describe the Min-Max algorithm and Alpha-Beta pruning.