Artificial Intelligence Notes
[Figure: successive steps (steps 4 and 5) of best-first search expanding a tree rooted at A, with children B, C, D and descendants E through J]
The actual operation of the algorithm is very simple.
It proceeds in steps, expanding one node at each step, until it
generates a node that corresponds to a goal state. At each step, it
picks the most promising of the nodes that have so far been
generated but not expanded. It generates the successors of the
chosen node, applies the heuristic function to them, and adds
them to the list of open nodes, after checking to see if any of
them have been generated before. By doing this check, we can
guarantee that each node only appears once in this graph,
although many nodes may point to it as a successor. Then the
next step begins.
The process can be summarized as follows.
Algorithm: Best-First Search
1. Start with OPEN containing just the initial state.
2. Until a goal is found or there are no nodes left on OPEN
do:
(a) Pick the best node on OPEN.
(b) Generate its successors.
(c) For each successor do:
i. If it has not been generated before, evaluate it, add it to
OPEN, and record its parent.
ii. If it has been generated before, change the parent if this
new path is better than the previous one. In that case,
update the cost of getting to this node and to any successors
that this node may already have.
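The steps above can be sketched in Python. OPEN is kept as a priority queue ordered by the heuristic value, and the parent map doubles as the check for nodes generated before. The example graph and heuristic values at the end are illustrative assumptions, not part of the notes.

```python
import heapq

def best_first_search(start, goal, successors, h):
    """Greedy best-first search: repeatedly expand the open node with
    the lowest heuristic value h(n). Returns a path to the goal, or None."""
    open_list = [(h(start), start)]      # priority queue of (h, node)
    parent = {start: None}               # also records which nodes exist
    while open_list:
        _, node = heapq.heappop(open_list)
        if node == goal:                 # goal generated: rebuild the path
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for succ in successors(node):
            if succ not in parent:       # skip nodes generated before
                parent[succ] = node
                heapq.heappush(open_list, (h(succ), succ))
    return None

# Illustrative graph and heuristic (assumed for this sketch):
graph = {'A': ['B', 'C'], 'B': ['D'], 'C': ['D'], 'D': []}
h_val = {'A': 3, 'B': 2, 'C': 1, 'D': 0}
```

Calling `best_first_search('A', 'D', lambda n: graph[n], lambda n: h_val[n])` expands C before B because C looks more promising under h.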
A* Search: The best-first search algorithm that was just
presented is a simplification of an algorithm called A*, which
was first presented by Hart et al. [1968; 1972]. This algorithm
uses the same f’, g, and h’ functions, as well as the lists OPEN
and CLOSED.
The form of the heuristic estimation function for A* is
f*(n) = g*(n) + h*(n)
where the two components g*(n) and h*(n) are estimates of
the cost (or distance) from the start node to node n and the
cost from node n to a goal node, respectively.
Nodes on the open list are nodes that have been
generated but not yet expanded, while nodes on the closed list
are nodes that have been expanded and whose children are,
therefore, available to the search program. The A* algorithm
proceeds as follows.
Algorithm: A*
1. Place the starting node s on open.
2. If open is empty, stop and return failure.
3. Remove from open the node n that has the smallest value
of f*(n). If the node is a goal node, return success and
stop. Otherwise,
4. Expand n, generating all of its successors n', and place n
on closed. For every successor n', if n' is not already on
open or closed, attach a back-pointer to n, compute f*(n'),
and place it on open.
5. Each n' that is already on open or closed should have its
back-pointer redirected, if necessary, to reflect the lowest
g*(n') path. If n' was on closed and its pointer was changed,
remove it and place it on open.
6. Return to step 2.
It has been shown that the A* algorithm is both complete and
admissible. Thus, A* will always find an optimal path if one
exists. The efficiency of an A* algorithm depends on how
closely h* approximates h and on the cost of computing f*.
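The six steps above can be sketched as runnable Python. The stale-entry check stands in for explicitly moving nodes back from closed to open; the weighted graph and heuristic at the end are made-up values for illustration.

```python
import heapq

def a_star(start, goal, successors, h):
    """A* search. successors(n) yields (neighbor, step_cost) pairs;
    h(n) estimates the cost from n to the goal.
    Returns (path, cost), or (None, inf) on failure."""
    g = {start: 0}                      # best known cost from start: g*
    parent = {start: None}              # back-pointers
    open_heap = [(h(start), start)]     # ordered by f* = g* + h*
    closed = set()
    while open_heap:
        f, n = heapq.heappop(open_heap)
        if n == goal:                   # goal removed from open: success
            path = []
            while n is not None:
                path.append(n)
                n = parent[n]
            return path[::-1], g[goal]
        if n in closed:
            continue                    # stale queue entry, already expanded
        closed.add(n)
        for n2, cost in successors(n):
            g2 = g[n] + cost
            if g2 < g.get(n2, float('inf')):
                g[n2] = g2              # better path: redirect back-pointer
                parent[n2] = n
                heapq.heappush(open_heap, (g2 + h(n2), n2))
    return None, float('inf')

# Illustrative weighted graph and admissible heuristic (assumed):
graph = {'S': [('A', 1), ('B', 4)], 'A': [('B', 2), ('G', 6)],
         'B': [('G', 2)], 'G': []}
h_val = {'S': 4, 'A': 3, 'B': 2, 'G': 0}
```

On this graph A* returns the optimal path S, A, B, G with cost 5 rather than the direct but costlier move through A to G.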
AO* algorithm:
1. Place the starting node s on open.
2. Using the search tree constructed thus far, compute the
most promising solution tree T0.
3. Select a node n that is both on open and a part of T0.
Remove n from open and place it on closed.
4. If n is a terminal goal node, label n as solved. If the
solution of n results in any of n's ancestors being solved,
label all those ancestors as solved. If the start node s is
solved, exit with success, where T0 is the solution tree.
Remove from open all nodes with a solved ancestor.
5. If n is not a solvable node (operators cannot be applied),
label n as unsolvable. If the start node is labeled as
unsolvable, exit with failure. If any of n's ancestors
become unsolvable because n is, label them unsolvable as
well. Remove from open all nodes with unsolvable
ancestors.
6. Otherwise, expand node n, generating all of its successors.
For each successor node that represents more than one
subproblem, generate its successors corresponding to the
individual subproblems. Attach to each newly generated node a
back-pointer to its predecessor. Compute the cost estimate h* for
each newly generated node, and place all such nodes that do
not yet have descendants on open. Next, recompute the
values of h* at n and at each ancestor of n.
7. Return to step 2.
Hill Climbing Search: Hill climbing gets its name
from the way nodes are selected for expansion. At each
point in the search path, a successor node that appears to lead
most quickly to the top of the hill (the goal) is selected for
exploration. This method requires that some information be
available with which to evaluate and order the most promising
choices.
Hill climbing is like depth-first searching where the
most promising child is selected for expansion. When the
children have been generated, alternative choices are
evaluated using some type of heuristic function. The path that
appears most promising is then chosen and no further
reference to the parent or other children is retained. This
process continues from node-to-node with previously
expanded nodes being discarded.
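The node-to-node process just described can be sketched as a short Python function; the successor function and evaluation function in the test below are illustrative assumptions.

```python
def hill_climb(start, successors, value):
    """Simple hill climbing: move to the most promising child,
    discarding the parent and all other children; stop when no child
    improves on the current node (which may be only a local maximum)."""
    current = start
    while True:
        children = successors(current)
        if not children:
            return current
        best = max(children, key=value)      # most promising child
        if value(best) <= value(current):
            return current                   # no improvement: stop here
        current = best                       # previous node is discarded
```

Maximizing an assumed score of -(x - 3)**2 over the integers, with successors x - 1 and x + 1, the climb from 0 proceeds 0, 1, 2, 3 and stops at the peak.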
Hill climbing can produce substantial savings over blind
searches when an informative, reliable function is available to
guide the search to a global goal. It suffers from some serious
drawbacks when this is not the case. Potential problem types
named after certain terrestrial anomalies are the foothill,
ridge, and plateau traps.
The foothill trap results when local maxima or peaks
are found. In this case the children all have less promising
goal distances than the parent node. The search is essentially
trapped at the local node with no indication of goal direction.
The only ways to remedy this problem are to move in
some arbitrary direction for a few generations, in the hope that the
real goal direction will become evident, or to backtrack to an
ancestor node and try a secondary path choice.
A second potential problem occurs when several
adjoining nodes all have higher values than the surrounding nodes.
This is the equivalent of a ridge. Finally, the search may encounter a
plateau type of structure, that is, an area in which all
neighboring nodes have the same values. Once again, one of
the methods noted above must be tried to escape the trap.
Breadth-First Search: Breadth-first searches are
performed by exploring all nodes at a given depth before
proceeding to the next level. This means that all immediate
children of nodes are explored before any of the children's
children are considered. Breadth-first tree search is illustrated
in fig. It has the obvious advantage of always finding a
minimal path length solution where one exists. A great many
nodes may need to be explored before a solution is found,
especially if the tree is very full. It uses a queue structure to
hold all generated but still unexplored nodes. The breadth-first
algorithm proceeds as follows.
BREADTH-FIRST SEARCH
1. Place the starting node s on the queue.
2. If the queue is empty, return failure and stop.
3. If the first element on the queue is a goal node g,
return success and stop. Otherwise,
4. Remove and expand the first element from the queue
and place all the children at the end of the queue in
any order.
5. Return to step 2.
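The queue-based procedure above translates directly into Python; the example graph at the end is an illustrative assumption.

```python
from collections import deque

def breadth_first_search(start, goal, successors):
    """Breadth-first search using a FIFO queue. Returns a minimal-length
    path from start to goal, or None if no path exists."""
    queue = deque([start])
    parent = {start: None}
    while queue:
        node = queue.popleft()           # first element on the queue
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for child in successors(node):   # children go to the end
            if child not in parent:
                parent[child] = node
                queue.append(child)
    return None

# Illustrative graph (assumed):
graph = {'A': ['B', 'C'], 'B': ['D'], 'C': ['D'], 'D': []}
```

Because whole levels are explored in order, the path returned always has minimal length.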
[Figure: breadth-first search of a tree, beginning at the Start node]
the heuristic function, we choose to move to B. Backing B's
value up to A, we can conclude that A's value is 8, since we
know we can move to a position with a value of 8.
[Figure: a two-level tree rooted at A with children B, C, D; B's backed-up value of 8 is propagated to A]
Heuristic function
A heuristic function, or simply a heuristic, is a function that
ranks alternatives in various search algorithms at each branching
step based on the available information (heuristically) in order to
make a decision about which branch to follow during a search.
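As a concrete example, the Manhattan-distance heuristic ranks 8-puzzle states by how far each tile is from its goal position; it is a classic admissible heuristic for that puzzle. The tuple encoding of states is an assumption made for this sketch.

```python
def manhattan_distance(state, goal):
    """Sum of the horizontal and vertical distances of each tile from
    its goal position. States are tuples of 9 entries read row by row,
    with 0 standing for the blank."""
    dist = 0
    for tile in range(1, 9):                 # the blank is not counted
        i, j = divmod(state.index(tile), 3)  # current row, column
        gi, gj = divmod(goal.index(tile), 3) # goal row, column
        dist += abs(i - gi) + abs(j - gj)
    return dist

GOAL = (1, 2, 3, 4, 5, 6, 7, 8, 0)
```

A search algorithm would follow the branch whose state has the smallest distance value.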
Shortest paths
Using these techniques, a program called ABSOLVER was
written in 1993 by A.E. Prieditis for automatically generating
heuristics for a given problem. ABSOLVER generated a new
heuristic for the 8-puzzle better than any pre-existing heuristic
and found the first useful heuristic for solving the Rubik's Cube.
Consistency and Admissibility
The benefit of alpha-beta pruning lies in the fact that branches of
the search tree can be eliminated. The search time can in this
way be limited to the 'more promising' sub tree, and a deeper
search can be performed in the same time. Like its predecessor,
it belongs to the branch and bound class of algorithms. The
optimization reduces the effective depth to slightly more than
half that of simple minimax if the nodes are evaluated in an
optimal or near-optimal order (best choice for the side on move
ordered first at each node).
With an (average or constant) branching factor of b, and a search
depth of d plies, the maximum number of leaf node positions
evaluated (when the move ordering is pessimal) is O(b*b*...*b)
= O(b^d), the same as a simple minimax search. If the move
ordering for the search is optimal (meaning the best moves are
always searched first), the number of leaf node positions
evaluated is about O(b*1*b*1*...*b) for odd depth and
O(b*1*b*1*...*1) for even depth, or O(b^(d/2)). In the
latter case, where the ply of a search is even, the effective
branching factor is reduced to its square root, or, equivalently,
the search can go twice as deep with the same amount of
computation. The explanation of b*1*b*1*... is that all the first
player's moves must be studied to find the best one, but for each,
only the best second player's move is needed to refute all but the
first (and best) first player move – alpha-beta ensures no other
second player moves need be considered. If b = 40 (as in chess),
and the search depth is 12 plies, the ratio between optimal and
pessimal sorting is a factor of nearly 40^6, or about 4 billion times.
Normally during alpha-beta, the sub trees are temporarily
dominated by either a first player advantage (when many first
player moves are good, and at each search depth the first move
checked by the first player is adequate, but all second player
responses are required to try and find a refutation), or vice versa.
This advantage can switch sides many times during the search if
the move ordering is incorrect, each time leading to inefficiency.
As the number of positions searched decreases exponentially
each move nearer the current position, it is worth spending
considerable effort on sorting early moves. An improved sort at
any depth will exponentially reduce the total number of
positions searched, but sorting all positions at depths near the
root node is relatively cheap as there are so few of them. In
practice, the move ordering is often determined by the results of
earlier, smaller searches, such as through iterative deepening.
The algorithm maintains two values, alpha and beta, which
represent the minimum score that the maximizing player is
assured of and the maximum score that the minimizing player is
assured of, respectively. Initially alpha is negative infinity and
beta is positive infinity. As the recursion progresses the
"window" becomes smaller. When beta becomes less than alpha,
it means that the current position cannot be the result of best
play by both players and hence need not be explored further.
Additionally, this algorithm can be trivially modified to return
an entire principal variation in addition to the score. Some more
aggressive algorithms such as MTD(f) do not easily permit such
a modification.
Pseudo code
function alphabeta(node, depth, α, β, player)
    if depth = 0 or node is a terminal node
        return the heuristic value of node
    if player = MaxPlayer
        for each child of node
            α := max(α, alphabeta(child, depth-1, α, β, not(player)))
            if β ≤ α
                break            (* beta cut-off *)
        return α
    else
        for each child of node
            β := min(β, alphabeta(child, depth-1, α, β, not(player)))
            if β ≤ α
                break            (* alpha cut-off *)
        return β

(* Initial call *)
alphabeta(origin, depth, -infinity, +infinity, MaxPlayer)
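The pseudo code above translates directly into runnable Python. The explicit tree and leaf values below are a made-up two-ply example; the minimax value of the root is the maximum over the minima of each subtree.

```python
import math

def alphabeta(node, depth, alpha, beta, maximizing, children, value):
    """Minimax with alpha-beta pruning over an explicit game tree.
    children(node) returns the successors of a node; value(node) is the
    static evaluation used at leaves or at the depth limit."""
    kids = children(node)
    if depth == 0 or not kids:
        return value(node)
    if maximizing:
        for child in kids:
            alpha = max(alpha, alphabeta(child, depth - 1, alpha, beta,
                                         False, children, value))
            if beta <= alpha:
                break                      # beta cut-off
        return alpha
    else:
        for child in kids:
            beta = min(beta, alphabeta(child, depth - 1, alpha, beta,
                                       True, children, value))
            if beta <= alpha:
                break                      # alpha cut-off
        return beta

# Illustrative two-ply tree (assumed): min(3, 5) = 3 vs min(2, 9) = 2
tree = {'r': ['a', 'b'], 'a': ['a1', 'a2'], 'b': ['b1', 'b2']}
leaf = {'a1': 3, 'a2': 5, 'b1': 2, 'b2': 9}
```

Once the first subtree establishes alpha = 3, the second subtree is cut off after its leaf of value 2 is seen, since the minimizer can already force a score below alpha.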
Heuristic improvements
problems are typically identified with problems based on
constraints on a finite domain. Such problems are usually solved
via search, in particular a form of backtracking or local search.
Constraint propagation is another method used on such
problems; most such methods are incomplete in general, that is, they
may solve the problem or prove it unsatisfiable, but not always.
Constraint propagation methods are also used in conjunction
with search to make a given problem simpler to solve. Other
kinds of constraints considered are those on real or rational numbers;
solving problems on these constraints is done via variable
elimination or the simplex algorithm.
Constraint satisfaction originated in the field of artificial
intelligence in the 1970s (see for example (Laurière 1978)).
During the 1980s and 1990s, embeddings of constraints into
programming languages were developed. Languages often used
for constraint programming are Prolog and C++.
Constraint satisfaction problem
Constraints enumerate the possible values a set of variables
may take. Informally, a finite domain is a finite set of arbitrary
elements. A constraint satisfaction problem on such domain
contains a set of variables whose values can only be taken from
the domain, and a set of constraints, each constraint specifying
the allowed values for a group of variables. A solution to this
problem is an evaluation of the variables that satisfies all
constraints. In other words, a solution is a way for assigning a
value to each variable in such a way that all constraints are
satisfied by these values.
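The definition above can be sketched as a minimal backtracking solver. The (scope, predicate) encoding of constraints and the two-variable example are illustrative choices, not a standard API.

```python
def backtrack(variables, domains, constraints, assignment=None):
    """Minimal backtracking search for a finite-domain CSP.
    constraints is a list of (scope, predicate) pairs; a predicate is
    checked only once every variable in its scope has a value.
    Returns a satisfying assignment as a dict, or None."""
    if assignment is None:
        assignment = {}
    if len(assignment) == len(variables):
        return dict(assignment)               # every variable assigned
    var = next(v for v in variables if v not in assignment)
    for val in domains[var]:
        assignment[var] = val
        ok = all(pred(*(assignment[v] for v in scope))
                 for scope, pred in constraints
                 if all(v in assignment for v in scope))
        if ok:
            result = backtrack(variables, domains, constraints, assignment)
            if result is not None:
                return result
        del assignment[var]                   # undo and try the next value
    return None
```

For example, two variables X and Y over the domain {1, 2} with the single constraint X ≠ Y are satisfied by the first consistent evaluation found.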
problem with problems in other areas such as finite model
theory.
Constraint programming
Constraint programming is the use of constraints as a
programming language to encode and solve problems. This is
often done by embedding constraints into a programming
language, which is called the host language. Constraint
programming originated from a formalization of equalities of
terms in Prolog II, leading to a general framework for
embedding constraints into a logic programming language. The
most common host languages are Prolog, C++, and Java, but
other languages have been used as well.
Constraint logic programming
Gecode, an open source portable toolkit written in C++
developed as a production-quality and highly efficient
implementation of a complete theoretical background.
JaCoP, an open source Java constraint solver.
Koalog, a commercial Java-based constraint solver.
logilab-constraint an open source constraint solver written
in pure Python with constraint propagation algorithms.
MINION an open-source constraint solver written in C++,
with a small language for the purpose of specifying
models/problems.
ZDC is an open source program developed in the
Computer-Aided Constraint Satisfaction Project for
modeling and solving constraint satisfaction problems.
Other constraint programming languages
Constraint toolkits are a way for embedding constraints into an
imperative programming language. However, they are only used
as external libraries for encoding and solving problems. An
approach in which constraints are integrated into an imperative
programming language is taken in the Kaleidoscope
programming language.
Constraints have also been embedded into functional
programming languages.
The diagram shows the first two levels, or ply, in the game tree
for tic-tac-toe. We consider all the rotations and reflections of
positions as being equivalent, so the first player has three
choices of move: in the center, at the edge, or in the corner. The
second player has two choices for the reply if the first player
played in the center, otherwise five choices. And so on.
The number of leaf nodes in the complete game tree is the
number of possible different ways the game can be played. For
example, the game tree for tic-tac-toe has 26,830 leaf nodes.
Game trees are important in artificial intelligence because one
way to pick the best move in a game is to search the game tree
using the minimax algorithm or its variants. The game tree for
tic-tac-toe is easily searchable, but the complete game trees for
larger games like chess are much too large to search. Instead, a
chess-playing program searches a partial game tree: typically
as many ply from the current position as it can search in the time
available. Except for the case of "pathological" game trees [1]
(which seem to be quite rare in practice), increasing the search
depth (i.e., the number of ply searched) generally improves the
chance of picking the best move.
Two-person games can also be represented as and-or trees. For
the first player to win a game there must exist a winning move
for all moves of the second player. This is represented in the
and-or tree by using disjunction to represent the first player's
alternative moves and using conjunction to represent all of the
second player's moves.
Solving Game Trees
Game of chance: A game of chance is a game whose
outcome is strongly influenced by some randomizing device,
and upon which contestants may or may not wager money or
anything of monetary value. Common devices used include dice,
spinning tops, playing cards, roulette wheels or numbered balls
drawn from a container.
Any game of chance that involves anything of monetary value is
gambling.
Gambling is known in nearly all human societies, even though
many have passed laws restricting it. Early people used the
knucklebones of sheep as dice. Some people develop a
psychological addiction to gambling, and will risk even food and
shelter to continue.
Some games of chance may also involve a certain degree of
skill. This is especially true where the player or players have
decisions to make based upon previous or incomplete
knowledge, such as poker and blackjack. In other games like
roulette and baccarat the player may only choose the amount of
bet and the thing he/she wants to bet on, the rest is up to chance,
therefore these games are still considered games of chance with
a small amount of skill required [1]. The distinction between
'chance' and 'skill' is relevant, as in some countries chance games
are illegal or at least regulated, whereas skill games are not.
Unit-2
Knowledge Representation
Introduction to Knowledge Representation (KR)
We argue that the notion can best be understood in terms of
five distinct roles it plays, each crucial to the task at hand:
A knowledge representation (KR) is most fundamentally a
surrogate, a substitute for the thing itself, used to enable
an entity to determine consequences by thinking rather
than acting, i.e., by reasoning about the world rather than
taking action in it.
It is a set of ontological commitments, i.e., an answer to
the question: In what terms should I think about the
world?
It is a fragmentary theory of intelligent reasoning,
expressed in terms of three components: (i) the
representation’s fundamental conception of intelligent
reasoning; (ii) the set of inferences the representation
sanctions; and (iii) the set of inferences it recommends.
It is a medium for pragmatically efficient computation,
i.e., the computational environment in which thinking is
accomplished. One contribution to this pragmatic
efficiency is supplied by the guidance a representation
provides for organizing information so as to facilitate
making the recommended inferences.
It is a medium of human expression, i.e., a language in
which we say things about the world.
Knowledge representation is needed for library classification
and for processing concepts in an information system. In the
field of artificial intelligence, problem solving can be simplified
by an appropriate choice of knowledge representation.
Representing the knowledge in one way may make the solution
simple, while an unfortunate choice of representation may make
the solution difficult or obscure; the analogy is to make
computations in Hindu-Arabic numerals or in Roman numerals;
long division is simpler in one and harder in the other. Likewise,
there is no representation that can serve all purposes or make
every problem equally approachable.
Properties for Knowledge Representation Systems
The following properties should be possessed by a knowledge
representation system.
Representational Adequacy
– the ability to represent the required knowledge;
Inferential Adequacy
- the ability to manipulate the knowledge represented to produce
new knowledge corresponding to that inferred from the original;
Inferential Efficiency
- the ability to direct the inferential mechanisms into the most
productive directions by storing appropriate guides;
Acquisition Efficiency
- the ability to acquire new knowledge using automatic methods
wherever possible rather than reliance on human intervention.
Constants are objects: john, apples
Predicates are properties and relations:
– likes(john, apples)
Functions transform objects:
– likes(john, fruit of(apple tree))
Variables represent any object: likes(X, apples)
Quantifiers qualify values of variables
– True for all objects (Universal): ∀X. likes(X, apples)
– True for at least one object (Existential): ∃X. likes(X, apples)
Example: FOL Sentence
For all X
– if (X is a rose)
– then there exists Y
– (X has Y) and (Y is a thorn)
In symbols: ∀X (rose(X) → ∃Y (has(X, Y) ∧ thorn(Y)))
Higher Order Logic
More expressive than first order
Functions and predicates are also objects
– Described by predicates: binary(addition)
– Transformed by functions: differentiate(square)
– Can quantify over both
E.g. define red functions as having zero at 17
Much harder to reason with
Forward Chaining: In forward chaining the rules are
examined one after the other in a certain order. The order might
be the sequence in which the rules were entered into the rule set
or some other sequence as specified by the user. As each rule is
examined, the expert system attempts to evaluate whether the
condition is true or false.
Rule evaluation: When the condition is true, the rule is fired
and the next rule is examined. When the condition is false, the
rule is not fired and the next rule is examined.
It is possible that a rule cannot be evaluated as true or
false. Perhaps the condition includes one or more variables with
unknown values. In that case the rule condition is unknown.
When a rule condition is unknown, the rule is not fired and the
next rule is examined.
The iterative reasoning process: The process of examining
one rule after the other continues until a complete pass has been
made through the entire rule set. More than one pass usually is
necessary in order to assign a value to the goal variable. Perhaps
the information needed to evaluate one rule is produced by
another rule that is examined subsequently. After the second rule
is fired, the first rule can be evaluated on the next pass.
The passes continue as long as it is possible to fire rules.
When no more rules can be fired, the reasoning process ceases.
Example of forward reasoning: Letters are used for the
conditions and actions to keep the illustration simple. In rule1,
for example, if condition A exists then action B is taken.
Condition A might be
THIS.YEAR.SALES>LAST.YEAR.SALES
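The pass-based evaluation described above can be sketched in Python. Rules are (antecedents, consequent) pairs over simple condition labels like the A and B used in the text; passes repeat until no rule can fire.

```python
def forward_chain(rules, facts):
    """Data-driven forward chaining. Each rule is (antecedents,
    consequent); repeated passes fire every rule whose antecedents are
    all known facts, until a pass adds nothing new."""
    facts = set(facts)
    changed = True
    while changed:                       # one pass through the rule set
        changed = False
        for antecedents, consequent in rules:
            if consequent not in facts and all(a in facts for a in antecedents):
                facts.add(consequent)    # condition true: the rule fires
                changed = True
    return facts

# Illustrative rule set (assumed): rule 1 is "if A then B",
# rule 2 is "if B and C then D".
rules = [(['A'], 'B'), (['B', 'C'], 'D')]
```

Starting from the facts A and C, rule 1 fires to produce B, which then allows rule 2 to fire and produce D; the process stops when a full pass fires nothing.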
resolution. Both rules are based on the modus ponens inference
rule.
first two rules are selected, because their consequents (Then X
is a frog, Then X is a canary) match the new goals that were just
added to the list. The antecedent (If Fritz croaks and eats flies) is
known to be true and therefore it can be concluded that Fritz is a
frog, and not a canary. The goal of determining Fritz's color is
now achieved (Fritz is green if he is a frog, and yellow if he is a
canary, but he is a frog since he croaks and eats flies; therefore,
Fritz is green).
Note that the goals always match the affirmed versions of the
consequents of implications (and not the negated versions as in
modus tollens) and even then, their antecedents are then
considered as the new goals (and not the conclusions as in
affirming the consequent) which ultimately must match known
facts (usually defined as consequents whose antecedents are
always true); thus, the inference rule which is used is modus
ponens.
Because the list of goals determines which rules are selected and
used, this method is called goal-driven, in contrast to data-driven
forward-chaining inference. The backward chaining approach is
often employed by expert systems.
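The goal-driven process can be sketched in Python using the Fritz example above; the rule encoding is an illustrative assumption, and the rule set is assumed to be acyclic so the recursion terminates.

```python
def backward_chain(goal, rules, facts):
    """Goal-driven backward chaining: a goal is proved if it is a known
    fact, or if some rule's consequent matches it and every antecedent
    of that rule can itself be proved (modus ponens, applied backwards).
    Assumes an acyclic rule set."""
    if goal in facts:
        return True
    for antecedents, consequent in rules:
        if consequent == goal and all(
                backward_chain(a, rules, facts) for a in antecedents):
            return True                  # antecedents become sub-goals
    return False

# Fritz example: consequents are matched against the goal list.
rules = [
    (["croaks", "eats flies"], "frog"),
    (["sings", "chirps"], "canary"),
    (["frog"], "green"),
    (["canary"], "yellow"),
]
facts = {"croaks", "eats flies"}
```

Asking for the goal "green" selects the frog rules, whose antecedents match the known facts; the goal "yellow" fails because "canary" cannot be proved.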
To construct computer programs that can understand
natural language.
To make inferences from the statements and also to
identify conditions in which two sentences can have
similar meaning.
Every action takes place at some location, and locations serve
as the sources and destinations of actions.
4. Ts: Times
An action can take place at a particular location at a given
specified time. The time can be represented on an absolute scale
or relative scale.
5. AAs: Action aiders
These serve as modifiers of actions; the action PROPEL has a
speed factor associated with it, which is an action aider.
6. PAs: Picture Aiders
These serve as aiders of picture producers. Every object that serves
as a PP has certain characteristics by which it is defined;
PAs serve PPs by defining those characteristics.
There are certain rules by which the conceptual categories of
types of objects discussed can be combined.
CD models provide the following advantages for representing
knowledge.
The ACT primitives help in representing a wide range of knowledge
in a succinct way. To illustrate this, consider the following
verbs, all of which correspond to the transfer of mental
information.
-see
-learn
-hear
-inform
-remember
In CD representation all of these are represented using a single
ACT primitive, MTRANS. They are not represented
individually as given. Similarly, different verbs that indicate
various activities are clubbed under unique ACT primitives,
thereby reducing the number of inference rules.
The main goal of CD representation is to make explicit
what is implicit. That is why every statement that is made
has not only the actors and objects but also the time and
location, source and destination.
The following set of conceptual tenses makes the usage of CD
more precise.
O-Object case relationship
R-recipient case relationship
P-past
F-future
T-transition
Ts-start transition
Tf-finished transition
K-continuing
?-interrogative
/-negative
Nil-present
Delta-timeless
C-conditional
CD brought forward the notion of language independence
because all ACT’s are language-independent primitive.
Semantic Nets: The main idea behind semantic nets is that the
meaning of a concept comes from the ways in which it is
connected to other concepts. In a semantic net, information is
represented as a set of nodes connected to each other by a set of
labeled arcs, which represent relationships among the nodes. A
fragment of a typical semantic net is shown in fig.
[Figure: a semantic net in which Person isa Mammal, Person has-part Nose, Pee-Wee-Reese is an instance of Person, Pee-Wee-Reese's team is Brooklyn-Dodgers, and his uniform-color is Blue]
Instance(Pee-Wee-Reese, Person)
Team(Pee-Wee-Reese, Brooklyn-Dodgers)
Uniform-color(Pee-Wee-Reese, Blue)
But the knowledge expressed by predicates of other arities can
also be expressed in semantic nets. We have already seen
examples of unary predicates, such as Isa and Instance. So, for
example,
Man(Marcus)
could be rewritten as
Instance(Marcus, Man)
thereby making it easy to represent in a semantic net.
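A semantic net of this kind can be sketched as a set of labeled triples. The `SemanticNet` class and its `isa_chain` helper are illustrative assumptions; the arcs mirror the Instance and Isa links in the figure.

```python
class SemanticNet:
    """A semantic net stored as labeled (node, relation, node) triples."""
    def __init__(self):
        self.triples = set()

    def add(self, node, relation, other):
        self.triples.add((node, relation, other))

    def isa_chain(self, node):
        """Follow Instance and Isa arcs upward to collect every class the
        node belongs to, directly or through inheritance."""
        classes, frontier = set(), {node}
        while frontier:
            n = frontier.pop()
            for a, rel, b in self.triples:
                if a == n and rel in ("Isa", "Instance") and b not in classes:
                    classes.add(b)
                    frontier.add(b)
        return classes

net = SemanticNet()
net.add("Pee-Wee-Reese", "Instance", "Person")
net.add("Person", "Isa", "Mammal")
net.add("Pee-Wee-Reese", "Team", "Brooklyn-Dodgers")
```

Querying the class chain of Pee-Wee-Reese follows the Instance arc to Person and then the Isa arc to Mammal.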
3. Partitioned Semantic Nets: Suppose we want to represent
simple quantified expressions in semantic nets. One way to do
this is to partition the semantic net into a hierarchical set of
spaces, each of which corresponds to the scope of one or more
variables. To see how this works, consider first the simple net
shown in fig. This net corresponds to the statement
The dog bit the mail carrier.
The nodes Dogs, Bite, and Mail-Carrier represent the classes of
dogs, biting, and mail carriers, respectively, while the nodes d, b,
and m represent a particular dog, a particular biting, and a
particular mail carrier. This fact can easily be represented by
a single net with no partitioning.
But now suppose that we want to represent the fact
Every dog has bitten a mail carrier.
Or, in logic:
∀x: Dog(x) → ∃y: Mail-carrier(y) ∧ Bite(x, y)
To represent this fact, it is necessary to encode the scope of the
universally quantified variable x.
[Figure: a partitioned semantic net with nodes Dogs, Bite, and Mail-carrier and arcs labeled assailant and victim]
Fig: using partitioned semantic nets
Frame: A frame is a collection of attributes (usually called
slots) and associated values (and possibly constraints on values)
that describes some entity in the world. Sometimes a frame
describes an entity in some absolute sense; sometimes it
represents the entity from a particular point of view. A single
frame taken alone is rarely useful. Instead, we build frame
systems out of collections of frames that are connected to each
other.
Set theory provides a good basis for understanding frame
systems. Although not all frame systems are defined this way,
we do so here. In this view, each frame represents either a class
(a set) or an instance (an element of a class). To see how this
works, consider a frame system shown in fig. In this example,
the frames person, adult-male, ML-baseball-player, pitcher, and
ML-baseball team are all classes. The frames pee-wee-Reese
and Brooklyn-Dodgers are instances.
Person
Isa: Mammal
Cardinality: 6,000,000,000
*handed: Right
Adult-Male
Isa: person
Cardinality: 2,000,000,000
*height: 5-10
ML-Baseball-Player
Isa: adult-male
Cardinality: 624
*height: 6-1
*bats: equal to handed
*batting-average: .252
*team:
*uniform-color:
Fielder
Isa: ML-baseball-player
Cardinality: 376
*batting-average: .262
Pee-Wee-Reese
Instance: fielder
Height: 5-10
Bats: right
Batting-average: .309
Team: Brooklyn-Dodgers
Uniform-color: Blue
ML-Baseball-Team
Isa: Team
Cardinality: 26
*team-size: 24
*manager:
Brooklyn-Dodgers
Instance: ML-Baseball-Team
Team-size: 24
Manager: Leo-Durocher
Players: (Pee-Wee-Reese)
Fig: A simplified frame system
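The inheritance behavior of the frame system above can be sketched as a small Python class: slot lookup falls back along the Isa/Instance link, so starred defaults like *height are inherited unless a frame overrides them locally. The class itself is an illustrative assumption; the slot values come from the figure.

```python
class Frame:
    """A frame: a named collection of slots. Slot lookup falls back to
    the parent frame (the Isa or Instance link), so inherited defaults
    apply unless overridden locally."""
    def __init__(self, name, parent=None, **slots):
        self.name, self.parent, self.slots = name, parent, slots

    def get(self, slot):
        if slot in self.slots:
            return self.slots[slot]          # local value wins
        if self.parent is not None:
            return self.parent.get(slot)     # inherit along the Isa chain
        return None

# Frames from the figure, chained by their Isa/Instance links:
adult_male = Frame("Adult-Male", height="5-10")
ml_player = Frame("ML-Baseball-Player", parent=adult_male,
                  batting_average=0.252, height="6-1")
fielder = Frame("Fielder", parent=ml_player, batting_average=0.262)
reese = Frame("Pee-Wee-Reese", parent=fielder, batting_average=0.309,
              team="Brooklyn-Dodgers")
```

Asking Pee-Wee-Reese for his batting average returns his own slot value, while his height is inherited from ML-Baseball-Player, which overrides the Adult-Male default.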
3. Each atomic formula (i.e. a specific predicate with
variables) is a wff.
4. If A, B, and C are wffs, then so are ¬A, (A ∧ B), (A ∨ B),
(A → B), and (A ↔ B).
5. If x is a variable (representing objects of the universe of
discourse), and A is a wff, then so are ∀x A and ∃x A.
6. For example, "The capital of Virginia is Richmond." is a
specific proposition. Hence it is a wff by Rule 2.
Let B be a predicate name representing "being blue" and let
x be a variable. Then B(x) is an atomic formula meaning "x
is blue". Thus it is a wff by Rule 3 above. By applying
Rule 5 to B(x), ∀x B(x) is a wff and so is ∃x B(x). Then by
applying Rule 4 to them, ∀x B(x) → ∃x B(x) is seen to be a
wff. Similarly, if R is a predicate name representing "being
round", then R(x) is an atomic formula. Hence it is a wff.
By applying Rule 4 to B(x) and R(x), a wff B(x) ∧ R(x) is
obtained.
In this manner, larger and more complex wffs can be
constructed following the rules given above.
Note, however, that strings that cannot be constructed by
using those rules are not wffs. For example, ∃x B(x) R(x) and
B(∃x) are NOT wffs, nor are B(R(x)) and B(∃x R(x)).
More examples: To express the fact that Tom is taller than
John, we can use the atomic formula taller(Tom, John),
which is a wff. This wff can also be part of some
compound statement such as taller(Tom, John) ∧
¬taller(John, Tom), which is also a wff.
Unit-3
Handling Uncertainty and Learning
Fuzzy Logic: In the techniques discussed so far, we have not
modified the mathematical underpinnings provided by set theory
and logic. We have instead augmented those ideas with additional
constructs provided by probability theory. We now take a different
approach and briefly consider what happens if we make
fundamental changes to our idea of set membership and
corresponding changes to our definitions of logical operations.
The motivation for fuzzy sets is provided by the need
to represent such propositions as:
John is very tall.
Mary is slightly ill.
Sue and Linda are close friends.
Exceptions to the rule are nearly impossible.
Most Frenchmen are not very tall.
While traditional set theory defines set membership as a
Boolean predicate, fuzzy set theory allows us to represent set
membership as a possibility distribution.
Once set membership has been redefined in this way, it is possible to define a reasoning system based on techniques for combining distributions. Such reasoners have been applied in control systems for devices as diverse as trains and washing machines.
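The contrast between a Boolean predicate and a possibility distribution can be sketched for the proposition "John is very tall"; the predicate name and the breakpoints (160 cm and 190 cm) are illustrative assumptions, not values from the text.

```python
def tall_crisp(height_cm):
    # Traditional set theory: membership is a Boolean predicate.
    return height_cm >= 180

def tall_fuzzy(height_cm):
    # Fuzzy set theory: membership is a degree in [0, 1],
    # here rising linearly between 160 cm and 190 cm.
    if height_cm <= 160:
        return 0.0
    if height_cm >= 190:
        return 1.0
    return (height_cm - 160) / 30.0

for h in (150, 175, 185, 195):
    print(h, tall_crisp(h), round(tall_fuzzy(h), 2))
```

A 175 cm person is simply "not tall" to the crisp predicate, but tall to degree 0.5 in the fuzzy set, which is the kind of graded membership the propositions above require.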
Dempster-Shafer Theory: This theory was developed by Dempster [1968] and Shafer [1976]. The approach considers sets of propositions and assigns to each of them an interval
[Belief, Plausibility]
in which the degree of belief must lie. Belief (usually denoted Bel) measures the strength of the evidence in favor of a set of propositions. It ranges from 0 (indicating no evidence) to 1 (denoting certainty).
A belief function, Bel, corresponding to a specific m for
the set A, is defined as the sum of beliefs committed to every
subset of A by m. That is, Bel (A) is a measure of the total
support or belief committed to the set A and sets a minimum
value for its likelihood. It is defined in terms of all belief
assigned to A as well as to all proper subsets of A. Thus,
Bel(A) = ∑ m(B), where the sum is taken over all subsets B of A.
For example, if U contains the mutually exclusive subsets A, B,
C, and D then
Bel({A,C,D}) = m({A,C,D}) + m({A,C}) + m({A,D}) + m({C,D}) + m({A}) + m({C}) + m({D}).
In Dempster-Shafer theory, a belief interval can also be defined for a subset A. It is represented as the subinterval [Bel(A), Pl(A)] of [0, 1]. Bel(A) is also called the support of A, and Pl(A) = 1 − Bel(¬A) the plausibility of A.
We define Bel(∅) = 0 to signify that no belief should be assigned to the empty set, and Bel(U) = 1 to show that the truth is contained within U. The subsets A of U are called the focal elements of the support function Bel when m(A) > 0.
Since Bel(A) only partially describes the beliefs about proposition A, it is useful to also have a measure of the extent to which one disbelieves A, that is, the doubt regarding A. For this, we define the doubt of A as D(A) = Bel(¬A). From this definition it will be seen that the upper bound of the belief interval noted above, Pl(A), can be expressed as Pl(A) = 1 − D(A) = 1 − Bel(¬A). Pl(A) represents an upper belief limit on the proposition A. The belief interval [Bel(A), Pl(A)] is also sometimes referred to as the confidence in A, while the quantity Pl(A) − Bel(A) is referred to as the uncertainty in A. It can be shown that
Pl(∅) = 0, Pl(U) = 1
For all A,
Pl(A) ≥ Bel(A),
Bel(A) + Bel(¬A) ≤ 1,
Pl(A) + Pl(¬A) ≥ 1, and
For A ⊆ B,
Bel(A) ≤ Bel(B), Pl(A) ≤ Pl(B)
As an example of the above concepts, recall once again the problem of identifying which of the terrorist organizations A, B, C, and D could have been responsible for an attack. The possible subsets of U in this case form a lattice of sixteen subsets (fig).
{A,B,C,D}
{A,B,C} {A,B,D} {A,C,D} {B,C,D}
{A,B} {A,C} {A,D} {B,C} {B,D} {C,D}
{A} {B} {C} {D}
∅
Fig Lattice of subsets of the universe U.
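The definitions of Bel and Pl above can be sketched in code. The mass assignment m below is an invented example over the four organizations, not data from the text:

```python
U = frozenset("ABCD")                     # the four suspect organizations

m = {                                     # an assumed basic probability assignment
    frozenset("A"):  0.3,
    frozenset("AC"): 0.2,
    frozenset("BD"): 0.1,
    U:               0.4,                 # mass left on the whole frame U
}

def bel(A):
    # Bel(A) = sum of m(B) over all subsets B of A
    return sum(v for B, v in m.items() if B <= A)

def pl(A):
    # Pl(A) = 1 - Bel(complement of A)
    return 1 - bel(U - A)

A = frozenset("AC")
print(bel(A), pl(A))
```

For {A, C} this yields the belief interval [0.5, 0.9]: the support 0.5 comes from the mass on {A} and {A, C}, while the plausibility 0.9 excludes only the 0.1 committed to {B, D}.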
P(Hi) = the prior probability that hypothesis i is true in the absence of any specific evidence. These probabilities are called prior probabilities, or priors.
k = the number of possible hypotheses.
Bayes' theorem then states that
P(Hi/E) = P(E/Hi) · P(Hi) / ∑n P(E/Hn) · P(Hn)
Specifically, when we say P (A/B), we are describing
the conditional probability of A given that the only evidence we
have is B. If there is also other relevant evidence, then it too
must be considered. Suppose, for example, that we are solving a
medical diagnosis problem. Consider the following assertions:
S: patient has spots
M: patient has measles
F: patient has high fever
Without any additional evidence, the presence of spots serves
as evidence in favor of measles. It also serves as evidence of
fever since measles would cause fever. But, since spots and
fever are not independent events, we cannot just sum their
effects; instead, we need to represent explicitly the conditional
probability that arises from their conjunction. In general, given a prior body of evidence e and some new observation E, we need to compute
P(H/E, e) = P(H/E) · P(e/E, H) / P(e/E)
Unfortunately, in an arbitrarily complex world, the size of the set of joint probabilities that we require in order to compute this function grows as 2^n if there are n different propositions being considered. This makes using Bayes' theorem intractable for several reasons:
The knowledge acquisition problem is insurmountable; too many probabilities have to be provided.
The space that would be required to store all the probabilities is too large.
The time required to compute the probabilities is too large.
Despite these problems, though, Bayesian statistics provides an attractive basis for an uncertain reasoning system. As a result, several mechanisms for exploiting its power while at the same time making it tractable have been developed.
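Bayes' theorem over a set of hypotheses can be sketched as follows, in the flavor of the spots/measles example; all priors and likelihoods below are invented for illustration:

```python
priors = {"measles": 0.01, "flu": 0.05, "healthy": 0.94}          # P(Hi)
likelihood_spots = {"measles": 0.9, "flu": 0.1, "healthy": 0.01}  # P(E/Hi), E = spots

def posterior(h):
    # P(Hi/E) = P(E/Hi) * P(Hi) / sum over n of P(E/Hn) * P(Hn)
    num = likelihood_spots[h] * priors[h]
    den = sum(likelihood_spots[k] * priors[k] for k in priors)
    return num / den

for h in priors:
    print(h, round(posterior(h), 3))
```

Observing spots raises the probability of measles from a 0.01 prior to a substantially larger posterior, at the cost of summing over every hypothesis in the denominator, which is exactly the term that blows up as the number of propositions grows.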
Learning: One of the most often heard criticisms of AI is that
machines cannot be called intelligent until they are able to learn
to do new things and to adapt to new situations, rather than
simply doing as they are told to do. There can be little question
that the ability to adapt to new surroundings and to solve new
problems is an important characteristic of intelligent entities. An intelligent program should likewise be able to interpret its inputs in such a way that its performance gradually improves.
Learning denotes changes in the systems that are
adaptive in the sense that they enable the system to do the same
task or tasks drawn from the same population more efficiently
and more effectively the next time.
Learning covers a wide range of phenomena.
1. At one end of the spectrum is skill refinement. People get
better at many tasks simply by practicing. The more you
ride a bicycle or play tennis, the better you get.
2. At the other end of this spectrum lies knowledge
acquisition. Knowledge is generally acquired through
experience
3. Many AI programs are able to improve their performance
substantially through rote- learning techniques.
4. Another way we learn is through taking advice from others. Advice taking is similar to rote learning, but high-level advice may not be in a form simple enough for a program to use directly in problem solving.
5. People also learn through their own problem solving
experience. After solving a complex problem, we
remember the structure of the problem and the methods we
used to solve it. The next time we see the problem, we can
solve it more efficiently. Moreover, we can generalize from
our experience to solve related problems more easily. In the same way, a program can remember its experiences and generalize from them. In large problem spaces, however, efficiency gains
are critical. Learning can mean the difference between
solving a problem rapidly and not solving it at all. In
addition, programs that learn through problem-solving
experience may be able to come up with qualitatively better
solutions in the future.
6. Another form of learning that does involve stimuli from the
outside is learning from examples. Learning from examples
usually involves a teacher who helps us classify things by
correcting us when we are wrong. Sometimes, however, a
program can discover things without the aid of a teacher.
Learning is itself a problem-solving process.
Learning Model: Learning can be accomplished using a
number of different methods. For example, we can learn by
memorizing facts, by being told, or by studying examples like
problem solutions. Learning requires that new knowledge
structures be created from some form of input stimulus. This
new knowledge must then be assimilated into a knowledge
base and be tested in some way for its utility. Testing means
that the knowledge should be used in the performance of
some task from which meaningful feedback can be obtained,
where the feedback provides some measure of the accuracy
and usefulness of the newly acquired knowledge.
The learning model is depicted in fig., where the
environment has been included as part of the overall learner
system. The environment may be regarded as either a form of
nature which produces random stimuli or as a more organized
training source such as a teacher which provides carefully
selected training examples for the learner component.
Fig. Learning Model: the environment (or teacher) supplies stimuli, examples, and feedback; the learner component creates and modifies the knowledge base; the performance component uses that knowledge to carry out tasks and produce responses; and a critic/performance evaluator passes feedback on those responses back to the learner.
The actual form of environment used will depend on the
particular learning paradigm. In any case, some representation
language must be assumed for communication between the
environment and the learner. The language may be the same
representation scheme as that used in the knowledge base (such
as a form of predicate calculus). When they are chosen to be the
same we say the single representation trick is being used. This
usually results in a simpler implementation since it is not
necessary to transform between two or more different
representations.
Inputs to the learner component may be physical stimuli
of some type or descriptive, symbolic training examples. The
information conveyed to the learner component is used to create
and modify knowledge structures in the knowledge base.
When given a task, the performance component produces a
response describing actions in performing the task. The critic
module then evaluates this response relative to an optimal
response.
The cycle described above may be repeated a number of times until the performance of the system has reached some acceptable level, until a known learning goal has been reached, or until changes cease to occur in the knowledge base after some chosen number of training examples have been observed.
There are several important factors which influence a
system’s ability to learn in addition to the form of representation
used. They include the types of training provided, the form and
extent of any initial background knowledge, the type of
feedback provided, and the learning algorithms used (fig).
Fig. Factors affecting learning: the training scenario, background knowledge, feedback, learning algorithms, and representation scheme together determine the resultant learning.
In general, systems that can extract useful information from training examples and take advantage of any background knowledge outperform those that do not.
Factors to consider
Generalizations of supervised learning
Structured prediction: When the desired output value is a complex object, such as a parse tree or a labeled graph, then standard methods must be extended.
Learning to rank: When the input is a set of objects and the desired output is a ranking of those objects, then again the standard methods must be extended.
Dog       1 0 0 0 0 0
Cat       1 0 0 0 0 0
Bat       1 0 0 1 0 0
Whale     1 0 0 0 1 0
Canary    0 0 1 1 0 1
Robin     0 0 1 1 0 1
Ostrich   0 0 1 1 0 1
Snake     0 1 0 0 0 1
Lizard    0 1 0 0 0 1
Alligator 0 1 0 0 1 1
Fig. Data for unsupervised learning
This form of learning is called unsupervised learning
because no teacher is required. Given a set of input data, the
network is allowed to play with it to try to discover regularities
and relationships between the different parts of the input.
Learning is often made possible through some notion of which
features in the input sets are important. But often we do not
know in advance which features are important, and asking a
learning system to deal with raw input data can be
computationally expensive. Unsupervised learning can be used
as a “feature discovery” module that precedes supervised
learning.
Consider the data in fig. The group of ten animals, each described by its own set of features, breaks down naturally into three groups: mammals, reptiles, and birds. We would like to
build a network that can learn which group a particular animal
belongs to, and to generalize so that it can identify animals it has
not yet seen. We can easily accomplish this with a six-input,
three-output back propagation network. We simply present the
network with an input, observe its output, and update its weights
based on the errors it makes. Without a teacher, however, the
error cannot be computed, so we must seek other methods.
Our first problem is to ensure that only one of the three output
units become active for any given input. One solution to this
problem is to let the network settle, find the output unit with the
highest level of activation, and set that unit to 1 and all other
output units to 0. In other words, the output unit with the highest
activation is the only one we consider to be active. A more
neural-like solution is to have the output units fight among
themselves for control of an input vector.
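The first of those two solutions, the winner-take-all step of letting the network settle and keeping only the most active output unit, can be sketched as:

```python
def winner_take_all(activations):
    # Find the index of the most active output unit...
    winner = max(range(len(activations)), key=lambda i: activations[i])
    # ...set that unit to 1 and all other output units to 0.
    return [1 if i == winner else 0 for i in range(len(activations))]

# Assumed activation levels for the three output units
# (say mammal, reptile, bird) after the network settles.
print(winner_take_all([0.7, 0.2, 0.4]))   # → [1, 0, 0]
```

The more neural-like alternative mentioned next, where the units inhibit one another, converges to the same one-active-unit outcome without an external arbiter.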
developing some general conclusions or theories. (Thanks to
William M.K. Trochim for these definitions).
To translate this into an approach to learning a skill, deductive
learning is someone TELLING you what to do, while inductive
learning is someone SHOWING you what to do. Remember the
saying "a picture is worth a thousand words"? That means, in a
given amount of time, a person can be SHOWN a thousand
times more information than they could be TOLD in the same
amount of time. I can access a picture or pattern much more
quickly than the equivalent description of that picture or pattern
in words.
Athletes often practice "visualization" before they undertake an
action. But in order to visualize something, you need to have a
picture in your head to visualize. How do you get those pictures
in your head? By WATCHING. Who do you watch?
Professionals. This is the key. Pay attention here. When you
want to learn a skill:
A decision tree is a predictive model that maps observations about an item to conclusions about the item's target value. More descriptive names for such tree models are
classification trees or regression trees. In these tree structures,
leaves represent classifications and branches represent
conjunctions of features that lead to those classifications.
In decision analysis, a decision tree can be used to visually and
explicitly represent decisions and decision making. In data
mining, a decision tree describes data but not decisions; rather
the resulting classification tree can be an input for decision
making.
Decision tree advantages
Large amounts of data can be analyzed in a time short enough to enable stakeholders to take decisions based on the analysis.
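A classification tree of the kind described, with attribute tests at internal nodes, branches for attribute values, and classifications at the leaves, can be sketched as nested structures. The attributes and classifications below (a toy car-diagnosis example) are invented for illustration:

```python
# Internal nodes test an attribute; branches are attribute values;
# leaves carry a classification.
tree = ("battery_ok", {
    False: "dead battery",
    True: ("fuel_ok", {
        False: "out of fuel",
        True: "call a mechanic",
    }),
})

def classify(node, observation):
    if isinstance(node, str):          # leaf: return its classification
        return node
    attribute, branches = node         # internal node: test the attribute
    return classify(branches[observation[attribute]], observation)

print(classify(tree, {"battery_ok": True, "fuel_ok": False}))
```

Each observation follows exactly one root-to-leaf path, so the branch tests taken along the way form the conjunction of features that leads to the final classification.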
Dependency-directed Backtracking: Dependency-directed backtracking is a problem-solving (q.v.) technique for efficiently avoiding contradictions. It is invoked when the problem solver discovers that its current state is inconsistent. The goal is, in a single operation, to change the problem solver's current state to one that contains neither the contradiction just uncovered nor any contradiction encountered previously. This is achieved by consulting records of the inferences the problem solver has performed and records of previous contradictions, which dependency-directed backtracking has constructed in response to those contradictions.
Contrast to backtracking: Dependency directed backtracking
was developed to avoid the deficiencies of chronological
backtracking. Consider the application of chronological
backtracking to the following task (see fig): First do one of A or
B, then one of C or D, and then one of E or F. Assume that each
step requires significant problem solving effort and that A and C
together or B and E together produce a contradiction that is only
uncovered after significant effort. Fig illustrates the sequence of
problem-solving states that chronological backtracking goes through to find all solutions (6, 7, 11, and 14).
Backtracking to an Appropriate Choice: The first deficiency
of chronological backtracking is illustrated by the unnecessary
state 4. The contradiction discovered in state 3 depends on
choices A and C, and not E. Therefore, replacing choice E with F and working on state 4 is futile, as this change does not
remove the contradiction. Unlike chronological backtracking,
which replaces the most recent choice, dependency directed
backtracking replaces a choice that caused the contradiction.
The discovery that state 3 is inconsistent causes immediate
backtracking to state 5. To be able to determine which choices
underlie the contradiction requires that the problem solver store
dependency records with every datum that it infers.
Avoiding Rediscovering contradiction. The second deficiency
of chronological backtracking is illustrated by un- necessary
state 13. The contradiction discovered in state 10 depends on B
and E. As E is the most recent choice, chronological and
dependency directed backtracking are indistinguishable, both
backtracking to state 11. However, as B and E are known to be inconsistent with each other, there is no point in rediscovering this contradiction by working on state 13.
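The A/B, C/D, E/F example above can be sketched with a no-good database: once {A, C} and {B, E} have been recorded as contradictory, any candidate state containing either set is rejected without re-deriving the contradiction. (The state numbering in the figure is not reproduced; the sketch simply enumerates candidate choice combinations.)

```python
from itertools import product

# Recorded contradictions ("no-goods"): A with C, and B with E.
nogoods = [frozenset("AC"), frozenset("BE")]

def consistent(state):
    # A state is rejected if it contains any known no-good set.
    s = frozenset(state)
    return not any(ng <= s for ng in nogoods)

# First choose A or B, then C or D, then E or F.
solutions = [state for state in product("AB", "CD", "EF")
             if consistent(state)]
print(solutions)
```

Of the eight possible choice combinations, four survive, matching the four solutions the text says the search ultimately finds.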
Disadvantages:
1. Dependency-directed backtracking incurs a significant time and space overhead, as it requires the maintenance of dependency records and an additional no-good database. Thus the effort required to maintain the dependencies may be more than the problem-solving effort saved.
2. If the problem solver is logically complete and finishes all work on a state before considering the next, the problem of backtracking to an inappropriate choice cannot occur.
3. In such cases much of the advantage of dependency-directed backtracking is irrelevant. However, most practical problem solvers are neither logically complete nor finish all possible work on a state before considering another.
Fuzzy functions: The membership function is one of the basic fuzzy functions; it is used to determine the degree to which a value belongs to a fuzzy set, and fuzzy logic depends on membership functions.
Unit-4
Natural Language processing and planning
Backward chaining: Backward chaining (or backward
reasoning) is an inference method used in automated theorem
provers, proof assistants and other artificial intelligence
applications. It is one of the two most commonly used methods
of reasoning with inference rules and logical implications – the
other is forward chaining. Backward chaining is implemented in
logic programming by SLD resolution. Both methods are based on the modus ponens inference rule.
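A minimal backward-chaining sketch over propositional rules, working from a goal back through rules that conclude it, might look like this; the rule base and facts are invented for illustration:

```python
# Rules of the form (premises, conclusion).
rules = [
    (["has_fur", "gives_milk"], "mammal"),
    (["mammal", "eats_meat"], "carnivore"),
]
facts = {"has_fur", "gives_milk", "eats_meat"}

def prove(goal):
    if goal in facts:                         # the goal is a known fact
        return True
    for premises, conclusion in rules:        # find a rule concluding the goal
        if conclusion == goal and all(prove(p) for p in premises):
            return True                       # all premises proved recursively
    return False

print(prove("carnivore"))   # → True
```

Note the direction of reasoning: instead of firing rules forward from the facts, the search starts at the goal ("carnivore") and recursively reduces it to subgoals until only known facts remain.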
Machine translation: One architecture for statistical machine translation (SMT) revolves around multitrees. Figure 2 shows how to build and use a rudimentary machine translation system, starting from some multitext and one or more monolingual treebanks.
The recipe follows:
T1. Induce a word-to-word translation model.
T2. Induce PCFGs from the relative frequencies of productions in the monolingual treebanks.
T3. Synchronize some multitext.
T4. Induce an initial PMTG from the relative frequencies of productions in the multitreebank.
T5. Re-estimate the PMTG parameters, using a synchronous parser with the expectation semiring.
A1. Use the PMTG to infer the most probable multitree covering new input text.
A2. Linearize the output dimensions of the multitree.
Steps T2, T4, and A2 are trivial. Steps T1, T3, T5, and A1 are instances of the generalized parsers.
Figure 2 is only an architecture: computational complexity and generalization error stand in the way of its practical implementation. Nevertheless, it is satisfying to note that all the non-trivial algorithms in Figure 2 are special cases of Translator CT. It is therefore possible to implement an MTSMT system using just one inference algorithm, parameterized by a grammar, a semiring, and a search strategy. An advantage of building an MT system in this manner is that improvements invented for ordinary parsing algorithms can often be applied to all the main components of the system. For example, Melamed (2003) showed how to reduce the computational complexity of a synchronous parser just by changing the parsing logic.
The same optimization can be applied to the inference algorithms. With proper software design, such optimizations need never be implemented more than once. For simplicity, the algorithms here are based on CKY logic. However, the architecture in Figure 2 can also be implemented using generalizations of more sophisticated parsing logics, such as those inherent in Earley or head-driven parsers.
1. We can take advantage of past and future research on making parsers more accurate and more efficient.
2. Therefore, we can concentrate our efforts on better models, without worrying about MT-specific search algorithms.
3. More generally and most importantly, this approach
encourages MT research to be less specialized and more
transparently related to the rest of computational
linguistics.
The following shows how a state, called S0, of a simple blocks world problem (block A on block B, with B on the table) could be represented:
ON(A, B, S0) ^ ONTABLE(B, S0) ^ CLEAR(A, S0)
PRECONDITION and DELETE lists must be specified
separately.
STACK(X, Y)
P: CLEAR(Y) ^ HOLDING(X)
D: CLEAR(Y) ^ HOLDING(X)
A: ARMEMPTY ^ ON(X, Y)
UNSTACK(X, Y)
P: ON(X, Y) ^ CLEAR(X) ^ ARMEMPTY
D: ON(X, Y) ^ ARMEMPTY
A: HOLDING(X) ^ CLEAR(Y)
PICKUP(X)
P: CLEAR(X) ^ ONTABLE(X) ^ ARMEMPTY
D: ONTABLE(X) ^ ARMEMPTY
A: HOLDING(X)
PUTDOWN(X)
P: HOLDING(X)
D: HOLDING(X)
A: ONTABLE(X) ^ ARMEMPTY
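Applying an operator of this kind can be sketched as set manipulation on the current state: the precondition list P must hold, then the delete list D is removed and the add list A inserted. The PICKUP(A) instance below follows the operator schemas above, with the standard precondition list CLEAR(x) ^ ONTABLE(x) ^ ARMEMPTY assumed:

```python
def apply_op(state, P, D, A):
    if not P <= state:
        return None                     # preconditions not satisfied
    return (state - D) | A              # delete the D list, add the A list

state = {"ONTABLE(A)", "CLEAR(A)", "ARMEMPTY"}
new_state = apply_op(
    state,
    P={"ONTABLE(A)", "CLEAR(A)", "ARMEMPTY"},   # preconditions of PICKUP(A)
    D={"ONTABLE(A)", "ARMEMPTY"},               # delete list
    A={"HOLDING(A)"},                           # add list
)
print(sorted(new_state))
```

After PICKUP(A), the state records HOLDING(A) and CLEAR(A) only; facts not mentioned in the D or A lists carry over unchanged, which is exactly the STRIPS assumption.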
Detecting a solution. A planning system has succeeded in
finding a solution to a problem when it has found a sequence of
operators that transforms the initial problem state into the goal
state.
Detecting Dead ends. As a planning system is searching for a
sequence of operators to solve a particular problem, it must be
able to detect when it is exploring a path that can never lead to a
solution. The same reasoning mechanisms that can be used to
detect a solution can often be used for detecting a dead end.
Goal stack planning: The technique to be developed for solving compound goals that may interact is the use of a goal stack. This was the approach used by STRIPS. In this method,
the problem solver makes use of a single stack that contains both
goals and operators that have been proposed to satisfy those
goals. The problem solver also relies on a database that
describes the current situation and a set of operators described in
PRECONDITION, ADD, and DELETE lists. To see how this
method works, let us carry it through for the simple example
shown in fig.
fig. A simple blocks world problem: in the start state, B is on A, with C and D on the table; in the goal state, C is on A and B is on D.
Start: ON(B, A) ^ ONTABLE(A) ^ ONTABLE(C) ^ ONTABLE(D) ^ ARMEMPTY
Goal: ON(C, A) ^ ON(B, D) ^ ONTABLE(A) ^ ONTABLE(D)
Depending on which part of the compound goal is tackled first, two goal stacks are possible:
[1] ON(C, A)
ON(B, D)
ON(C, A) ^ ON(B, D) ^ ONTABLE(A) ^ ONTABLE(D)
[2] ON(B, D)
ON(C, A)
ON(C, A) ^ ON(B, D) ^ ONTABLE(A) ^ ONTABLE(D)
not very interesting. Exploring alternative 1, we first check to
see whether ON (C, A) is true in the current state.
function solution?(plan)
if causal-links-establishing-all-preconditions-of-all-steps(plan)
and all-threats-resolved(plan)
and all-temporal-ordering-constraints-consistent(plan)
and all-variable-bindings-consistent(plan)
then return true;
else return false;
end
function select-subgoal(plan)
pick a plan step S-need from steps(plan) with a precondition c
that has not been achieved;
return (S-need, c);
end
procedure resolve-threats(plan)
foreach S-threat that threatens link "Si --->c Sj" in Links(plan)
begin // "declobber" threat
choose either
Demotion: add "S-threat < Si" to Orderings(plan)
or Promotion: add "Sj < S-threat" to Orderings(plan);
if not(consistent(plan)) then return fail;
end
end
Unit-5
Expert System and AI languages
Introduction: An expert system is a set of programs that
manipulate encoded knowledge to solve problems in a
specialized domain that normally requires human expertise. An
expert system’s knowledge is obtained from expert sources and coded in a form suitable for the system to use in its inference or reasoning processes. The expert knowledge must be obtained
from specialists or other sources of expertise, such as texts,
journal articles, and data bases. This type of knowledge usually
requires much training and experience in some specialized field
such as medicine, geology, system configuration, or engineering
design.
Characteristic Features of Expert systems: Expert
systems differ from conventional computer systems in several
important ways.
1. Expert systems use knowledge rather than data to control
the solution process. Much of the knowledge used is
heuristic in nature rather than algorithmic.
2. The knowledge is encoded and maintained as an entity
separate from the control program. As such, it is not
compiled together with the control program itself. This
permits the incremental addition and modification
(refinement) of the knowledge base without recompilation
of the control programs. Furthermore, it is possible in
some cases to use different knowledge bases with the same
control programs to produce different types of expert
systems.
3. Expert systems are capable of explaining how a particular
conclusion was reached, and why requested information is
needed during a consultation. This is important as it gives
the user a chance to assess and understand the system’s
reasoning ability, thereby improving the user’s confidence
in the system.
4. Expert systems use symbolic representations for
knowledge (rules, networks, or frames) and perform their inference through symbolic computations that closely resemble manipulations of natural language.
Expert systems often reason with metaknowledge; that is, they reason with knowledge about themselves, and their own knowledge limits and capabilities.
Applications:
Different types of medical diagnoses (internal medicine, pulmonary diseases, infectious blood diseases, and so on).
Diagnosis of complex electronic and electrochemical
systems.
Diagnosis of diesel electric locomotion systems.
Diagnosis of software development projects.
Forecasting crop damage.
Location of faults in computer and communication systems.
Fig. Expert system components: a development engine and an inference engine operating on the knowledge base, with a user interface connecting the user to the problem domain.
IF: Condition-1 and Condition-2 and Condition-3
THEN: Take Action-4
IF: The temperature is greater than 200 degrees, and
the water level is low
THEN: Open the safety valve.
A & B & C & D → E & F
Each rule represents a small chunk of knowledge relating to the given domain of expertise, leading from some initially known facts to some useful conclusions. When the condition part of a rule is satisfied by the current facts, the action part of the rule is then accepted as known (or at least known with some degree of certainty).
Inference in production systems is
accomplished by a process of chaining through the rules
recursively, either in a forward or backward direction, until a
conclusion is reached or until failure occurs. The selection of
rules used in the chaining process is determined by matching
current facts against the domain knowledge or variables in rules
and choosing among a candidate set of rules the ones that meet
some given criteria, such as specificity. The inference process is
typically carried out in an interactive mode with the user
providing input parameters needed to complete the chaining
process.
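Forward chaining through IF/THEN rules of this kind can be sketched as repeated matching of rule conditions against the current facts until nothing new can be concluded; the rules below extend the safety-valve example with an invented second rule:

```python
# Rules of the form (conditions, action).
rules = [
    ({"temperature_high", "water_level_low"}, "open_safety_valve"),
    ({"open_safety_valve"}, "sound_alarm"),   # an assumed follow-on rule
]
facts = {"temperature_high", "water_level_low"}

changed = True
while changed:                       # chain until no rule adds anything new
    changed = False
    for conditions, action in rules:
        if conditions <= facts and action not in facts:
            facts.add(action)        # the action part is accepted as known
            changed = True

print(sorted(facts))
```

The first pass fires the safety-valve rule, and its conclusion then satisfies the alarm rule on the next pass, illustrating how chaining proceeds recursively from facts to conclusions.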
Fig. An expert system with explanation and learning modules mediating between the system and the user, producing output for the user.
ENCOURAGED(student) → MOTIVATED(student)
If the user does not agree with the reasoning steps presented, they may be changed using the editor.
To respond to a why query, the explanation module
must be able to explain why certain information is needed by the
inference engine to complete a step in the reasoning process
before it can proceed. For example, in diagnosing a car that will
not start, a system might be asked why it needs to know the
status of the distributor spark.
Building a Knowledge Base: The editor is used by developers to create new rules for addition to the knowledge base, to delete outmoded rules, or to modify existing rules in some way. Some editors perform consistency tests for newly created rules, and such systems also prompt the user for missing information.
The I/O Interface: The input-output interface permits the user
to communicate with the system in a more natural way by
permitting the use of simple selection menus or the use of a
restricted language which is close to a natural language. This
means that the system must have special prompts or a
specialized vocabulary which encompasses the terminology of
the given domain of expertise.
Nonproduction System Architectures: Instead of rules,
these systems employ more structured representation schemes
like associative or semantic networks, frame and rule structures,
and decision trees, or even specialized networks like neural
networks.
Associative or Semantic Network Architectures: We know
that an associative network is a network made up of nodes
connected by directed arcs. The nodes represent objects,
attributes, concepts, or other basic entities, and the arcs, which
are labeled, describe the relationship between the two nodes they
connect. Special network links include the ISA and HASPART
links which designate an object as being a certain type of object
(belonging to a class of objects) and as being a subpart of
another object, respectively.
Associative network representations are especially useful
in depicting hierarchical knowledge structures, where property
inheritance is common. More often, these network
representations are used in natural language or computer vision
systems or in conjunction with some other form of
representation.
Frame Architectures: Frames are structured sets of closely related knowledge, such as an object or concept name, the object’s main attributes and their corresponding values, and possibly some attached procedures (if-needed, if-added, if-removed procedures). The attributes, values, and procedures are stored in specified slots and facets of the frame. Individual frames are usually linked together much like the nodes in an associative network.
Decision Tree Architectures: Knowledge for expert systems
may be stored in the form of a decision tree when the knowledge
can be structured in a top-to-bottom manner. For example, the
identification of objects (equipment, faults, physical objects,
diseases) can be made through a decision tree structure. Initial
and intermediate nodes in the tree correspond to object
attributes, and terminal nodes correspond to the identities of
objects. Attribute values for an object determine a path to a leaf
node in the tree which contains the object’s identification. Each object attribute corresponds to a nonterminal node in the tree, and each
branch of the decision tree corresponds to an attribute value or
set of values.
Blackboard system Architectures: Blackboard architectures
refer to a special type of knowledge-based system which uses a
form of opportunistic reasoning. This differs from pure forward
or pure backward chaining in production systems in that either
direction may be chosen dynamically at each stage in the
problem solution process.
Blackboard systems are composed of three functional
components as depicted in fig
1. There are a number of knowledge sources which are
separate and independent sets of coded knowledge. Each
knowledge source may be thought of as a specialist in
some limited area needed to solve a given subset of
problems; the sources may contain knowledge in the form
of procedures, rules, or other schemes.
2. A globally accessible data base structure, called a
blackboard, contains the current problem state and
information needed by the knowledge sources (input data,
partial solutions, control data, alternatives, and final
solutions). The knowledge sources make changes to the
blackboard data that incrementally lead to a solution.
Communication and interaction between the knowledge
sources takes place solely through the blackboard.
3. Control information may be contained within the sources,
on the blackboard, or possibly in a separate module. The
control knowledge monitors the changes to the blackboard
and determines what the immediate focus of attention
should be in solving the problem.
Analogical Reasoning Architectures: Expert systems based
on analogical architectures solve new problems the way humans
often do: by finding a similar problem whose solution is known
and applying the known solution to the new problem, possibly
with some modifications. For example, if we know a method of
proving that the product of two even integers is even, we can
prove that the product of two odd integers is odd through much
the same proof steps. Expert systems using analogical
architectures require a large knowledge base containing
numerous problem solutions and other previously encountered
situations or episodes.
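The even/odd analogy can be made explicit; the second proof mirrors the first step for step:

```latex
% Known proof: the product of two even integers is even.
(2m)(2n) = 2(2mn), \text{ a multiple of } 2, \text{ hence even.}
% Analogous proof: the product of two odd integers is odd.
(2m+1)(2n+1) = 4mn + 2m + 2n + 1 = 2(2mn + m + n) + 1, \text{ hence odd.}
```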
Neural Network Architectures: Neural networks are large
networks of simple processing elements or nodes which process
information dynamically in response to external inputs. The
nodes are simplified models of neurons. The knowledge in a
neural network is distributed throughout the network in the form
of internode connections and weighted links which form the
inputs to the nodes. The link weights serve to enhance or inhibit
the input stimuli values, which are then summed at the
nodes. If the sum of all the inputs to a node exceeds some
threshold value T, the node fires and produces an output
which is passed on to other nodes or is used to produce some
output response.

[Fig. Components of blackboard systems: knowledge sources, a shared blackboard, and control information.]
Neural networks were originally conceived as models of the
human nervous system, though they are greatly simplified
models, to be sure.
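A single node of the kind described can be sketched as follows; the weights, inputs, and threshold are illustrative values:

```python
# One threshold node: weighted inputs are summed and compared against a
# threshold T, as described above. A sketch, not a trained network.

def threshold_node(inputs, weights, T):
    """Fire (return 1) if the weighted sum of the inputs exceeds T."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total > T else 0

# Positive weights enhance the input stimuli; negative weights inhibit them.
print(threshold_node([1, 1, 0], [0.5, 0.7, -0.3], T=1.0))  # 1.2 > 1.0 -> 1
print(threshold_node([1, 0, 1], [0.5, 0.7, -0.3], T=1.0))  # 0.2 <= 1.0 -> 0
```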
Knowledge acquisition: Knowledge for expert systems must
be derived from expert sources such as specialists in the given
field, journal articles, texts, reports, databases, and so on.
Elicitation of the right knowledge can take several person-years
and cost hundreds of thousands of dollars. This process is now
recognized as one of the main bottlenecks in building expert and
other knowledge-based systems. Consequently, much effort has
been devoted to developing more effective methods of
knowledge acquisition and coding.
Pulling together and correctly interpreting the right
knowledge to solve a set of complex tasks is an onerous job.
Typically, experts do not know what specific knowledge is
being applied, nor just how it is applied, in the solution of
a given problem. Even if they do know, it is likely they are
unable to articulate the problem-solving process well enough to
capture the low-level knowledge used and the inference
processes applied. This difficulty has led to the use of AI
experts (called knowledge engineers) who serve as
intermediaries between the domain expert and the system. The
knowledge engineer elicits information from the experts and
codes this knowledge into a form suitable for use in the expert
system.
The knowledge elicitation process is depicted in the figure
below. To elicit the requisite knowledge, a knowledge engineer
conducts extensive interviews with domain experts. During the
interviews, the expert is asked to solve typical problems in the
domain of interest and to explain his or her solutions.
[Fig. The knowledge elicitation process: the domain expert supplies knowledge to the knowledge engineer, who uses an editor to build the expert system's knowledge base.]
parentheses. A string is a group of characters enclosed in double
quotation marks.
Lisp programs run either on an interpreter or as
compiled code. The interpreter examines source programs in a
repeated loop, called the read-evaluate-print loop. This loop
reads the program code, evaluates it, and prints the values
returned by the program. The interpreter signals its readiness to
accept code for execution by printing a prompt such as the ->
symbol.
Examples
Here are examples of Common Lisp code.
The basic "Hello world" program:
(print "Hello world")
As the reader may have noticed from the above discussion, Lisp
syntax lends itself naturally to recursion. Mathematical problems
such as the enumeration of recursively defined sets are simple to
express in this notation.
Evaluate a number's factorial:
(defun factorial (n)
  (if (<= n 1)
      1
      (* n (factorial (- n 1)))))
An alternative implementation, often faster than the previous
version if the Lisp system has tail recursion optimization:
(defun factorial (n &optional (acc 1))
(if (<= n 1)
acc
(factorial (- n 1) (* acc n))))
Contrast with an iterative version which uses Common Lisp's
loop macro:
(defun factorial (n)
(loop for i from 1 to n
for fac = 1 then (* fac i)
finally (return fac)))
The following function reverses a list. (Lisp's built-in reverse
function does the same thing.)
(defun -reverse (list)
(let ((return-value '()))
(dolist (e list) (push e return-value))
return-value))
Prolog: Prolog's single data type is the term. Terms are either atoms,
numbers, variables or compound terms.
An atom is a general-purpose name with no inherent
meaning. Examples of atoms include x, blue, 'Taco', and
'some atom'.
Numbers can be floats or integers.
Variables are denoted by a string consisting of letters,
numbers and underscore characters, and beginning with an
upper-case letter or underscore. Variables closely resemble
variables in logic in that they are placeholders for arbitrary
terms.
A compound term is composed of an atom called a
"functor" and a number of "arguments", which are again
terms. Compound terms are ordinarily written as a functor
followed by a comma-separated list of argument terms,
which is contained in parentheses. The number of
arguments is called the term's arity. An atom can be
regarded as a compound term with arity zero. Examples of
compound terms are truck_year('Mazda', 1986) and
'Person_Friends'(zelda,[tom,jim]).
Special cases of compound terms:
A List is an ordered collection of terms. It is denoted by
square brackets with the terms separated by commas or in
the case of the empty list, []. For example [1,2,3] or
[red,green,blue].
Strings: A sequence of characters surrounded by quotes is
equivalent to a list of (numeric) character codes, generally
in the local character encoding, or Unicode if the system
supports Unicode. For example, "to be, or not to be".
Examples
An example of a query:
?- write('Hello world!'), nl.
Hello world!
true.
?-
Compiler optimization
Any computation can be expressed declaratively as a sequence
of state transitions. As an example, an optimizing compiler with
three optimization passes could be implemented as a relation
between an initial program and its optimized form:
program_optimized(Prog0, Prog) :-
optimization_pass_1(Prog0, Prog1),
optimization_pass_2(Prog1, Prog2),
optimization_pass_3(Prog2, Prog).
or equivalently using DCG notation:
program_optimized --> optimization_pass_1,
optimization_pass_2, optimization_pass_3.
Longest common subsequence
The following clauses compute the longest common subsequence of
two lists. They rely on a memoization predicate memo/1 (assumed
to be defined elsewhere, e.g. via tabling) so that overlapping
subproblems are not recomputed:
lcs([], _, []) :- !.
lcs(_, [], []) :- !.
lcs([X|Xs], [X|Ys], [X|Ls]) :- !, memo(lcs(Xs, Ys, Ls)).
lcs([X|Xs], [Y|Ys], Ls) :-
memo(lcs([X|Xs], Ys, Ls1)), memo(lcs(Xs, [Y|Ys], Ls2)),
length(Ls1, L1), length(Ls2, L2),
( L1 >= L2 -> Ls = Ls1 ; Ls = Ls2 ).
Example query:
?- lcs([x,m,j,y,a,u,z], [m,z,j,a,w,x,u], Ls).
Ls = [m, j, a, u]
Design patterns
This can be used to enumerate perfect numbers, and also to
check whether a number is perfect.
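The Prolog clause the text refers to is not reproduced in these notes, but the check it describes rests on a simple definition: a number is perfect when it equals the sum of its proper divisors. A sketch of the same idea in Python:

```python
# A number is perfect when it equals the sum of its proper divisors
# (illustrative sketch of the check described in the text).

def is_perfect(n):
    return n > 1 and sum(d for d in range(1, n // 2 + 1) if n % d == 0) == n

print([n for n in range(1, 500) if is_perfect(n)])  # -> [6, 28, 496]
```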
Despite MYCIN's success, it sparked debate about the use of its
ad hoc, but principled, uncertainty framework known as
"certainty factors". The developers performed studies showing
that MYCIN's performance was minimally affected by
perturbations in the uncertainty metrics associated with
individual rules, suggesting that the power in the system was
related more to its knowledge representation and reasoning
scheme than to the details of its numerical uncertainty model.
Some observers felt that it should have been possible to use
classical Bayesian statistics. MYCIN's developers argued that
this would either require unrealistic assumptions of probabilistic
independence, or require the experts to provide estimates for an
unfeasibly large number of conditional probabilities.[1][2]
Subsequent studies showed that the certainty factor model
could indeed be interpreted in a probabilistic sense, and
highlighted problems with the implied assumptions of such a
model. However, the modular structure of the system would
prove very successful, leading to the development of graphical
models such as Bayesian networks.
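For the common case of two rules lending positive support to the same hypothesis, the certainty-factor calculus combines their factors as CF = CF1 + CF2(1 - CF1), so accumulated evidence approaches but never exceeds 1:

```python
# Combining two certainty factors that both support the same hypothesis
# (the positive-evidence case of the certainty-factor calculus).

def cf_combine(a, b):
    """Combine two positive certainty factors in [0, 1]."""
    return a + b * (1 - a)

print(cf_combine(0.6, 0.4))  # -> 0.76
```

This insensitivity to small perturbations in individual rule CFs is consistent with the studies mentioned above.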
Results
Research conducted at the Stanford Medical School found
MYCIN to propose an acceptable therapy in about 69% of cases,
which was better than the performance of infectious disease
experts who were judged using the same criteria. This study is
often cited as showing the potential for disagreement about
therapeutic decisions, even among experts, when there is no
"gold standard" for correct treatment.
Practical use
MYCIN was never actually used in practice. This wasn't
because of any weakness in its performance. As mentioned, in
tests it outperformed members of the Stanford medical school
faculty. Some observers raised ethical and legal issues related to
the use of computers in medicine — if a program gives the
wrong diagnosis or recommends the wrong therapy, who should
be held responsible? However, the greatest problem, and the
reason that MYCIN was not used in routine practice, was the
state of technologies for system integration, especially at the
time it was developed. MYCIN was a stand-alone system that
required a user to enter all relevant information about a patient
by typing in response to questions that MYCIN would pose. The
program ran on a large time-shared system, available over the
early Internet (Arpanet), before personal computers were
developed. In the modern era, such a system would be integrated
with medical record systems, would extract answers to questions
from patient databases, and would be much less dependent on
physician entry of information. In the 1970s, a session with
MYCIN could easily consume 30 minutes or more—an
unrealistic time commitment for a busy clinician.
A difficulty that rose to prominence during the development of
MYCIN and subsequent complex expert systems has been the
extraction of the knowledge the inference engine needs from
human experts in the relevant fields into the rule
base (so-called knowledge engineering).