Artificial Intelligence
Definition of AI

What is AI? Artificial Intelligence is concerned with the design of intelligence in an artificial device. The term was coined by McCarthy in 1956. There are two ideas in the definition: (1) intelligence and (2) an artificial device.

What is intelligence? Is it that which characterizes humans, or is there an absolute standard of judgment? Accordingly there are two possibilities:
A system with intelligence is expected to behave as intelligently as a human.
A system with intelligence is expected to behave in the best possible manner.
Secondly, what type of behavior are we talking about? Are we looking at the thought process or reasoning ability of the system, or are we only interested in the final manifestations of the system in terms of its actions? Given this scenario, different interpretations have been used by different researchers in defining the scope and view of Artificial Intelligence.
1. One view is that artificial intelligence is about designing systems that are as intelligent as humans. This view involves trying to understand human thought and an effort to build machines that emulate the human thought process. This is the cognitive science approach to AI.

2. The second approach is best embodied by the concept of the Turing Test. Turing held that in the future computers could be programmed to acquire abilities rivaling human intelligence. As part of his argument Turing put forward the idea of an 'imitation game', in which a human being and a computer would be interrogated under conditions where the interrogator would not know which was which, the communication being entirely by textual messages. Turing argued that if the interrogator could not distinguish them by questioning, then it would be unreasonable not to call the computer intelligent. Turing's 'imitation game' is now usually called 'the Turing test' for intelligence.

3. Logic and laws of thought deals with studies of ideal or rational thought process and inference. The emphasis in this case is on the inference mechanism and its properties. That is, how the system arrives at a conclusion, and the reasoning behind its selection of actions, is very important in this point of view. The soundness and completeness of the inference mechanisms are important here.

4. The fourth view of AI is that it is the study of rational agents. This view deals with building machines that act rationally. The focus is on how the system acts and performs, and not so much on the reasoning process. A rational agent is one that acts rationally, that is, in the best possible manner.
Problem Solving:
Strong AI aims to build machines that can truly reason and solve problems. These machines should be self-aware, and their overall intellectual ability should be indistinguishable from that of a human being. Excessive optimism in the 1950s and 1960s concerning strong AI has given way to an appreciation of the extreme difficulty of the problem. Strong AI maintains that suitably programmed machines are capable of cognitive mental states.

Weak AI deals with the creation of some form of computer-based artificial intelligence that cannot truly reason and solve problems, but can act as if it were intelligent. Weak AI holds that suitably programmed machines can simulate human cognition.

Applied AI aims to produce commercially viable "smart" systems such as, for example, a security system that is able to recognize the faces of people who are permitted to enter a particular building. Applied AI has already enjoyed considerable success.

Cognitive AI: computers are used to test theories about how the human mind works -- for example, theories about how we recognize faces and other objects, or about how we solve abstract problems.

Best First Search: Best-first search is a way of combining the advantages of both depth-first and breadth-first search into a single method. One way of combining the two is to follow a single path at a time, but switch paths whenever some competing path looks more promising than the current one does.
At each step of the best-first search process, we select the most promising of the nodes we have generated so far. This is done by applying an appropriate heuristic to each of them. We then expand the chosen node by using the rules to generate its successors. If one of them is a solution, we can quit. If not, all those new nodes are added to the set of nodes generated so far. Again the most promising node is selected and the process continues.

The figure shows the beginning of a best-first search procedure. Initially, there is only one node, so it will be expanded. Doing so generates three new nodes. The heuristic function, which, in this example, is an estimate of the cost of getting to a solution from a given node, is applied to each of these new nodes. Since node D is the most promising, it is expanded next, producing two successor nodes, E and F. The heuristic function is then applied to them. Now another path, the one going through node B, looks more promising, so it is pursued, generating nodes G and H. But when these new nodes are evaluated they look less promising than another path, so attention is returned to the path through D to E. E is then expanded, yielding nodes I and J. At the next step, J will be expanded, since it is the most promising. This process can continue until a solution is found.
Fig: A best-first search.

The actual operation of the algorithm is very simple. It proceeds in steps, expanding one node at each step, until it
generates a node that corresponds to a goal state. At each step, it picks the most promising of the nodes that have so far been generated but not expanded. It generates the successors of the chosen node, applies the heuristic function to them, and adds them to the list of open nodes, after checking to see if any of them have been generated before. By doing this check, we can guarantee that each node appears only once in the graph, although many nodes may point to it as a successor. Then the next step begins. The process can be summarized as follows.

Algorithm: Best-First Search
1. Start with OPEN containing just the initial state.
2. Until a goal is found or there are no nodes left on OPEN do:
(a) Pick the best node on OPEN.
(b) Generate its successors.
(c) For each successor do:
i. If it has not been generated before, evaluate it, add it to OPEN, and record its parent.
ii. If it has been generated before, change the parent if this new path is better than the previous one. In that case,
update the cost of getting to this node and to any successors that this node may already have.
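To make the bookkeeping concrete, here is a minimal Python sketch of best-first search using a priority queue for OPEN. The toy graph and heuristic values are hypothetical, and the re-parenting step (2(c)ii) is omitted for brevity.

import heapq

def best_first_search(start, goal, successors, h):
    """Greedy best-first search: always expand the OPEN node
    with the lowest heuristic value h(n)."""
    open_list = [(h(start), start)]          # priority queue keyed on h
    parents = {start: None}                  # also serves as the 'generated' set
    while open_list:
        _, node = heapq.heappop(open_list)   # pick the best node on OPEN
        if node == goal:
            path = []
            while node is not None:          # walk back-pointers to recover path
                path.append(node)
                node = parents[node]
            return path[::-1]
        for succ in successors(node):
            if succ not in parents:          # evaluate only newly generated nodes
                parents[succ] = node
                heapq.heappush(open_list, (h(succ), succ))
    return None                              # OPEN exhausted: no solution

# Hypothetical toy graph and heuristic values, for illustration only.
graph = {'A': ['B', 'C', 'D'], 'B': ['G', 'H'], 'D': ['E', 'F'], 'E': ['I', 'J'],
         'C': [], 'F': [], 'G': [], 'H': [], 'I': [], 'J': []}
h_values = {'A': 6, 'B': 5, 'C': 8, 'D': 3, 'E': 4, 'F': 6, 'G': 6, 'H': 5, 'I': 2, 'J': 0}
print(best_first_search('A', 'J', lambda n: graph[n], lambda n: h_values[n]))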
3. Remove from OPEN the node n that has the smallest value of f*(n). If this node is a goal node, return success and stop. Otherwise,
4. Expand n, generating all of its successors, and place n on CLOSED. For every successor n', if n' is not already on OPEN or CLOSED, attach a back-pointer to n, compute f*(n'), and place it on OPEN.
5. Each n' that is already on OPEN or CLOSED should have its back-pointer redirected to reflect the lowest-cost g*(n') path. If such an n' was on CLOSED and its pointer was changed, remove it and place it on OPEN.
6. Return to step 2.

It has been shown that the A* algorithm is both complete and admissible. Thus, A* will always find an optimal path if one exists. The efficiency of an A* algorithm depends on how closely h* approximates h and on the cost of computing f*.
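A compact Python rendering of these steps follows, under the usual reading f*(n) = g*(n) + h*(n); the weighted graph and heuristic estimates are illustrative, not from the original text.

import heapq

def a_star(start, goal, successors, h):
    """A* search: expand the OPEN node with the smallest
    f*(n) = g*(n) + h*(n), re-pointing when a cheaper path appears."""
    g = {start: 0}
    parents = {start: None}
    open_heap = [(h(start), start)]
    closed = set()
    while open_heap:
        _, n = heapq.heappop(open_heap)          # node with smallest f*(n)
        if n == goal:
            path = []
            while n is not None:
                path.append(n)
                n = parents[n]
            return g[goal], path[::-1]
        if n in closed:
            continue                             # stale queue entry
        closed.add(n)
        for succ, cost in successors(n):
            tentative = g[n] + cost
            if tentative < g.get(succ, float('inf')):
                g[succ] = tentative              # lower-cost path found: re-point
                parents[succ] = n
                heapq.heappush(open_heap, (tentative + h(succ), succ))
                closed.discard(succ)             # reopen if it was on CLOSED

# Hypothetical weighted graph, for illustration only.
edges = {'S': [('A', 1), ('B', 4)], 'A': [('B', 2), ('G', 12)],
         'B': [('G', 5)], 'G': []}
h_est = {'S': 7, 'A': 6, 'B': 2, 'G': 0}
print(a_star('S', 'G', lambda n: edges[n], lambda n: h_est[n]))   # (8, ['S','A','B','G'])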
AO* algorithm:
1. Place the starting node s on open.
2. Using the search tree constructed thus far, compute the most promising solution tree To. Select a node n that is both on open and a part of To. Remove n from open and place it on closed.
3. If n is a terminal goal node, label n as solved. If the solution of n results in any of n's ancestors being solved, label all such ancestors as solved. If the start node s is solved, exit with success, where To is the solution tree. Remove from open all nodes with a solved ancestor.
4. If n is not a solvable node (operators cannot be applied), label n as unsolvable. If the start node is labeled as unsolvable, exit with failure. If any of n's ancestors become unsolvable because n is, label them unsolvable as well. Remove from open all nodes with unsolvable ancestors.
5. Otherwise, expand node n, generating all of its successors. For each successor that represents more than one sub-problem, generate successors corresponding to the individual sub-problems. Attach to each newly generated node a back-pointer to its predecessor. Compute the cost estimate h* for each newly generated node, and place all such nodes that do not yet have descendants on open. Finally, recompute the values of h* at n and at each ancestor of n.
6. Return to step 2.
Hill Climbing: This method requires that some information be available with which to evaluate and order the most promising choices. Hill climbing is like depth-first searching where the most promising child is selected for expansion. When the children have been generated, alternative choices are evaluated using some type of heuristic function. The path that appears most promising is then chosen, and no further reference to the parent or other children is retained. This process continues from node to node, with previously expanded nodes being discarded.

Hill climbing can produce substantial savings over blind searches when an informative, reliable function is available to guide the search to a global goal. It suffers from some serious drawbacks when this is not the case. Potential problem types, named after certain terrestrial anomalies, are the foothill, ridge, and plateau traps.

The foothill trap results when local maxima or peaks are found. In this case the children all have less promising goal distances than the parent node. The search is essentially trapped at the local node with no indication of goal direction. The only remedies are to try moving in some arbitrary direction for a few generations in the hope that the
real goal direction will become evident, or to backtrack to an ancestor node and try a secondary path choice. A second potential problem occurs when several adjoining nodes have higher values than surrounding nodes. This is the equivalent of a ridge. The search may also encounter a plateau type of structure, that is, an area in which all neighboring nodes have the same values. Once again, one of the methods noted above must be tried to escape the trap.
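A minimal Python sketch of this greedy climb follows; the one-dimensional landscape at the end is a made-up illustration with a local maximum at x = 2 (a foothill trap) and the global maximum at x = 8.

def hill_climb(start, neighbours, value):
    """Simple hill climbing: move to the best-valued neighbour,
    discarding everything else; stops at a peak (or plateau)."""
    current = start
    while True:
        candidates = neighbours(current)
        if not candidates:
            return current
        best = max(candidates, key=value)
        if value(best) > value(current):
            current = best                 # keep climbing
        else:
            return current                 # foothill/plateau/ridge: trapped

# Hypothetical landscape: local maximum at x=2, global maximum at x=8.
def f(x):
    return -(x - 2) ** 2 + 3 if x < 5 else -(x - 8) ** 2 + 10

print(hill_climb(0, lambda x: [x - 1, x + 1], f))   # -> 2, a foothill trap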
3. If the first element on the queue is a goal node g, return success and stop. Otherwise,
4. Remove and expand the first element from the queue and place all the children at the end of the queue in any order.
5. Return to step 2.
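These steps are the core of a breadth-first, queue-based search; a minimal Python sketch under that reading follows. The visited set is an addition, not in the steps above, used to keep already-generated nodes from being re-queued.

from collections import deque

def breadth_first_search(start, is_goal, successors):
    """Queue-based search matching the steps above: test the front
    of the queue, expand it, and append its children at the end."""
    queue = deque([start])
    visited = {start}
    while queue:
        node = queue[0]
        if is_goal(node):              # step 3: goal test on the first element
            return node
        queue.popleft()                # step 4: remove and expand
        for child in successors(node):
            if child not in visited:   # avoid re-queueing duplicates
                visited.add(child)
                queue.append(child)
    return None                        # queue exhausted: failure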
Mini-max search: The minimax search procedure is a depth-limited search procedure. The idea is to start at the current position and use the plausible-move generator to generate the set of possible successor positions. We can then apply the static evaluation function to those positions and simply choose the best one.
The starting position is exactly as good for us as the position generated by the best move we can make next. Here we assume that the static evaluation function returns large values to indicate good situations for us, so our goal is to maximize the value of the static evaluation function of the next board position. An example of this operation is shown in fig 1. It assumes a static evaluation function that returns values ranging from -10 to 10, with 10 indicating a win for us, -10 a win for the opponent, and 0 an even match. Since our goal is to maximize the value of the heuristic function, we choose to move to B. Backing B's value up to A, we can conclude that A's value is 8, since we know we can move to a position with a value of 8.
Fig 1: One-ply search and two-ply search.

But since we know that the static evaluation function is not completely accurate, we would like to carry the search farther
ahead than one ply. This could be very important, for example, in a chess game in which we are in the middle of a piece exchange. After our move, the situation would appear to be very good, but if we look one move ahead, we will see that one of our pieces also gets captured, and so the situation is not as good as it seemed.

Once the values from the second ply are backed up, it becomes clear that the correct move for us to make at the first level, given the information we have available, is C, since there is nothing the opponent can do from there to produce a value worse than -2. This process can be repeated for as many ply as time allows, and the more accurate evaluations that are produced can be used to choose the correct move at the top level. The alternation of maximizing and minimizing at alternate ply as evaluations are backed up corresponds to the opposing strategies of the two players and gives this method the name minimax.

Having described informally the operation of the minimax procedure, we now describe it precisely. It is a straightforward recursive procedure that relies on two auxiliary procedures that are specific to the game being played:
1. MOVEGEN(position, player) - The plausible-move generator, which returns a list of nodes representing the moves that can be made by player in position. We call the two players PLAYER-ONE and PLAYER-TWO; in a chess program, we might use the names BLACK and WHITE instead.
2. STATIC(position, player) - The static evaluation function, which returns a number representing the goodness of position from the standpoint of player.
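A small Python sketch of depth-limited minimax in terms of these two auxiliary procedures follows. It uses the negamax simplification, which assumes STATIC scores a position from the standpoint of the player passed in (here encoded as +1 and -1); the tiny game at the end is purely illustrative.

def minimax(position, depth, player, movegen, static):
    """Depth-limited minimax built on the two game-specific
    procedures described above: MOVEGEN and STATIC."""
    moves = movegen(position, player)
    if depth == 0 or not moves:
        return static(position, player), None
    best_value, best_move = None, None
    for move in moves:
        # Score the move from the opponent's standpoint, then negate,
        # so each level simply maximizes (the negamax formulation).
        value, _ = minimax(move, depth - 1, -player, movegen, static)
        value = -value
        if best_value is None or value > best_value:
            best_value, best_move = value, move
    return best_value, best_move

# Tiny hypothetical game: positions are numbers, a move adds 1 or 2,
# and STATIC just likes even numbers. For illustration only.
value, move = minimax(0, 2, +1,
                      lambda pos, p: [pos + 1, pos + 2],
                      lambda pos, p: p * (1 if pos % 2 == 0 else -1))
print(value, move)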
Heuristic function
A heuristic function, or simply a heuristic, is a function that ranks alternatives in various search algorithms at each branching step, based on the available information, in order to decide which branch to follow during a search.

Shortest paths: For example, for shortest-path problems, a heuristic is a function h(n), defined on the nodes of a search tree, which serves as an estimate of the cost of the cheapest path from that node to the goal node. Heuristics are used by informed search algorithms such as greedy best-first search and A* to choose the best node to explore. Greedy best-first search will choose the node that has the lowest value of the heuristic function. A* search will expand nodes that have the lowest value of g(n) + h(n), where g(n) is the (exact) cost of the path from the initial state to the current node. If h(n) is admissible, that is, if h(n) never overestimates the cost of reaching the goal, then A* will always find an optimal solution.

The classical problem involving heuristics is the n-puzzle. Commonly used heuristics for this problem include counting the number of misplaced tiles and finding the sum of the Manhattan distances between each block and its position in the goal configuration. Note that both are admissible.
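Both heuristics are easy to state in code. The sketch below assumes the standard 3x3 (8-puzzle) goal layout with 0 standing for the blank; the names are illustrative.

# Two admissible 8-puzzle heuristics, as described above; a state is a
# tuple of 9 entries, 0 standing for the blank.
GOAL = (1, 2, 3, 4, 5, 6, 7, 8, 0)

def misplaced_tiles(state):
    """h1: count the tiles (not the blank) that are out of place."""
    return sum(1 for s, g in zip(state, GOAL) if s != 0 and s != g)

def manhattan(state):
    """h2: sum of horizontal plus vertical distances of each tile
    from its goal square."""
    total = 0
    for idx, tile in enumerate(state):
        if tile == 0:
            continue
        goal_idx = GOAL.index(tile)
        total += abs(idx // 3 - goal_idx // 3) + abs(idx % 3 - goal_idx % 3)
    return total

state = (1, 2, 3, 4, 0, 6, 7, 5, 8)
print(misplaced_tiles(state), manhattan(state))   # in general h2 >= h1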
Effect of heuristics on computational performance: In any searching problem where there are b choices at each node and the goal lies at depth d, a naive searching algorithm would potentially have to search around b^d nodes before finding a solution. Heuristics improve the efficiency of search algorithms by reducing the effective branching factor from b to a lower constant b', using a cutoff mechanism. The branching factor can be used for defining a partial order on heuristics, such that h1(n) < h2(n) if h1(n) yields a lower branching factor than h2(n) for a given node n of the search tree. Heuristics giving lower branching factors at every node in the search tree are preferred for the resolution of a particular problem, as they are more computationally efficient.

Finding heuristics: The problem of finding an admissible heuristic with a low branching factor for common search tasks has been extensively researched in the artificial intelligence community. Several common techniques are used:
Solution costs of sub-problems often serve as useful estimates of the overall solution cost. These are always admissible. For example, a heuristic for a 10-puzzle might be the cost of moving tiles 1-5 into their correct places. A common idea is to use a pattern database that stores the exact solution cost of every sub-problem instance.
The solution of a relaxed problem often serves as a useful admissible estimate of the original. For example, Manhattan distance solves a relaxed version of the n-puzzle problem, in which we assume we can move each tile to its position independently of moving the other tiles. Given a set of admissible heuristic functions h1(n), h2(n), ..., hi(n), the function h(n) = max{h1(n), h2(n), ..., hi(n)} is an admissible heuristic that dominates all of them.
Using these techniques, a program called ABSOLVER was written (1993) by A. E. Prieditis for automatically generating heuristics for a given problem. ABSOLVER generated a new heuristic for the 8-puzzle better than any pre-existing heuristic and found the first useful heuristic for solving the Rubik's Cube.

Consistency and Admissibility: If a heuristic function never overestimates the cost of reaching the goal, it is called an admissible heuristic function. If h(n) is consistent, then the values of f(n) = g(n) + h(n) are non-decreasing along any path to the goal node.
Alpha-beta pruning: Alpha-beta pruning stops evaluating a move as soon as one possibility has been found that proves the move to be worse than a previously examined move. Such moves need not be evaluated further. Alpha-beta pruning is a sound optimization in that it does not change the score of the result of the algorithm it optimizes.

History: Allen Newell and Herbert Simon, who used what John McCarthy calls an "approximation"[1] in 1958, wrote that alpha-beta "appears to have been reinvented a number of times".[2] Arthur Samuel had an early version, and Richards, Hart, Levine and/or Edwards found alpha-beta independently in the United States.[3] McCarthy proposed similar ideas during the Dartmouth Conference in 1956 and suggested them to a group of his students including Alan Kotok at MIT in 1961.[4] Alexander Brudno independently discovered the alpha-beta algorithm, publishing his results in 1963.[5] Donald Knuth and Ronald W. Moore refined the algorithm in 1975,[6][7] and it has continued to be advanced.

Improvements over naive minimax: In an illustration of alpha-beta pruning, the grayed-out sub-trees need not be explored (when moves are evaluated from left to right), since we know the group of sub-trees as a whole yields the value of an equivalent sub-tree or worse, and as such cannot influence the final result. The max and min levels represent the turn of the player and the adversary, respectively.
The benefit of alpha-beta pruning lies in the fact that branches of the search tree can be eliminated. The search time can in this way be limited to the 'more promising' sub-tree, and a deeper search can be performed in the same time. Like its predecessor, it belongs to the branch-and-bound class of algorithms. The optimization reduces the effective depth to slightly more than half that of simple minimax if the nodes are evaluated in an optimal or near-optimal order (best choice for the side on move ordered first at each node).

With an (average or constant) branching factor of b and a search depth of d plies, the maximum number of leaf node positions evaluated (when the move ordering is pessimal) is O(b*b*...*b) = O(b^d), the same as a simple minimax search. If the move ordering for the search is optimal (meaning the best moves are always searched first), the number of leaf node positions evaluated is about O(b*1*b*1*...*b) for odd depth and
O(b*1*b*1*...*1) for even depth, that is, O(b^(d/2)) = O(sqrt(b^d)). In the latter case, where the ply of a search is even, the effective branching factor is reduced to its square root, or, equivalently, the search can go twice as deep with the same amount of computation. The explanation of b*1*b*1*... is that all the first player's moves must be studied to find the best one, but for each, only the best second player's move is needed to refute all but the first (and best) first player move: alpha-beta ensures no other second player moves need be considered. If b = 40 (as in chess), and the search depth is 12 plies, the ratio between optimal and pessimal sorting is a factor of nearly 40^6, or about 4 billion times.

Normally during alpha-beta, the sub-trees are temporarily dominated by either a first player advantage (when many first player moves are good, and at each search depth the first move checked by the first player is adequate, but all second player responses are required to try to find a refutation), or vice versa. This advantage can switch sides many times during the search if the move ordering is incorrect, each time leading to inefficiency. As the number of positions searched decreases exponentially with each move nearer the current position, it is worth spending considerable effort on sorting early moves. An improved sort at any depth will exponentially reduce the total number of positions searched, and sorting all positions at depths near the root node is relatively cheap, as there are so few of them. In
practice, the move ordering is often determined by the results of earlier, smaller searches, such as through iterative deepening.

The algorithm maintains two values, alpha and beta, which represent the minimum score that the maximizing player is assured of and the maximum score that the minimizing player is assured of, respectively. Initially alpha is negative infinity and beta is positive infinity. As the recursion progresses, the "window" becomes smaller. When beta becomes less than alpha, it means that the current position cannot be the result of best play by both players and hence need not be explored further. Additionally, this algorithm can be trivially modified to return an entire principal variation in addition to the score. Some more aggressive algorithms such as MTD(f) do not easily permit such a modification.

Pseudo code:

function alphabeta(node, depth, alpha, beta, Player)
    if depth = 0 or node is a terminal node
        return the heuristic value of node
    if Player = MaxPlayer
        for each child of node
            alpha := max(alpha, alphabeta(child, depth-1, alpha, beta, not(Player)))
            if beta <= alpha
                break               (* Beta cut-off *)
        return alpha
    else
        for each child of node
            beta := min(beta, alphabeta(child, depth-1, alpha, beta, not(Player)))
            if beta <= alpha
                break               (* Alpha cut-off *)
        return beta

(* Initial call *)
alphabeta(origin, depth, -infinity, +infinity, MaxPlayer)

Heuristic improvements: Alpha-beta search can be made even faster by considering only a narrow search window (generally determined by guesswork based on experience). This is known as aspiration search. In the extreme case, the search is performed with alpha and beta equal, a technique known as zero-window search, null-window search, or scout search. This is particularly useful for win/loss searches near the end of a game, where the extra depth gained from the narrow window and a simple win/loss evaluation function may lead to a conclusive result. If an aspiration search fails, it is straightforward to detect whether it failed high (the high edge of the window was too low) or low (the low edge of the window was too high). This gives information about what window values might be useful in a re-search of the position.
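For comparison with the pseudocode, here is a runnable Python version on a hypothetical two-ply tree stored as nested lists; the children and heuristic callbacks stand in for the game-specific parts.

import math

def alphabeta(node, depth, alpha, beta, max_player, children, heuristic):
    """Python rendering of the pseudocode above."""
    kids = children(node)
    if depth == 0 or not kids:
        return heuristic(node)
    if max_player:
        for child in kids:
            alpha = max(alpha, alphabeta(child, depth - 1, alpha, beta,
                                         False, children, heuristic))
            if beta <= alpha:
                break                     # beta cut-off
        return alpha
    else:
        for child in kids:
            beta = min(beta, alphabeta(child, depth - 1, alpha, beta,
                                       True, children, heuristic))
            if beta <= alpha:
                break                     # alpha cut-off
        return beta

# Hypothetical tree as nested lists: inner lists are internal (min) nodes,
# numbers are leaf evaluations.
tree = [[3, 5], [6, 9], [1, 2]]
value = alphabeta(tree, 2, -math.inf, math.inf, True,
                  lambda n: n if isinstance(n, list) else [],
                  lambda n: n)
print(value)   # 6: the leaf 2 is never evaluated thanks to an alpha cut-off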
Constraint Satisfaction: A constraint satisfaction problem requires the values of a set of variables to satisfy constraints on a finite domain. Such problems are usually solved via search, in particular a form of backtracking or local search. Constraint propagation methods are also used on such problems; most of them are incomplete in general, that is, they may solve the problem or prove it unsatisfiable, but not always. Constraint propagation methods are also used in conjunction with search to make a given problem simpler to solve. Other kinds of constraints considered are on real or rational numbers; solving problems on these constraints is done via variable elimination or the simplex algorithm. Constraint satisfaction originated in the field of artificial intelligence in the 1970s (see for example Laurière 1978). During the 1980s and 1990s, embeddings of constraints into programming languages were developed. Languages often used for constraint programming are Prolog and C++.
Constraint satisfaction problem: Constraints enumerate the possible values a set of variables may take. Informally, a finite domain is a finite set of arbitrary elements. A constraint satisfaction problem on such a domain contains a set of variables whose values can only be taken from the domain, and a set of constraints, each constraint specifying the allowed values for a group of variables. A solution to this problem is an evaluation of the variables that satisfies all constraints. In other words, a solution is a way of assigning a value to each variable such that all constraints are satisfied by these values.

In practice, constraints are often expressed in compact form, rather than by enumerating all values of the variables that would satisfy the constraint. One of the most used constraints is the one establishing that the values of the affected variables must be all different. Problems that can be expressed as constraint satisfaction problems are the eight queens puzzle, the Sudoku solving problem, the Boolean satisfiability problem, scheduling problems, and various problems on graphs such as the graph coloring problem.

While usually not included in the above definition of a constraint satisfaction problem, arithmetic equations and inequalities bound the values of the variables they contain and can therefore be considered a form of constraints. Their domain
is the set of numbers (either integer, rational, or real), which is infinite; therefore, the relations of these constraints may be infinite as well. For example, X = Y + 1 has an infinite number of pairs of satisfying values. Arithmetic equations and inequalities are often not considered within the definition of a "constraint satisfaction problem", which is limited to finite domains. They are, however, often used in constraint programming.

Solving: Constraint satisfaction problems on finite domains are typically solved using a form of search. The most used techniques are variants of backtracking, constraint propagation, and local search. These techniques are also used on problems with nonlinear constraints. Variable elimination and the simplex algorithm are used for solving linear and polynomial equations and inequalities, and problems containing variables with infinite domains. These are typically solved as optimization problems in which the optimized function is the number of violated constraints.

Complexity: Solving a constraint satisfaction problem on a finite domain is an NP-complete problem. Research has shown a number of tractable sub-cases, some limiting the allowed constraint relations, some requiring the scopes of constraints to form a tree, possibly in a reformulated version of the problem. Research has also established relationships of the constraint satisfaction problem with problems in other areas such as finite model theory.
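As a concrete instance of backtracking on a finite domain, here is a minimal Python sketch; the three-variable graph-coloring problem and the consistency check are illustrative stand-ins, not from the original text.

def backtrack(assignment, variables, domains, consistent):
    """Plain backtracking search for a finite-domain CSP; `consistent`
    is a function deciding whether a partial assignment is acceptable."""
    if len(assignment) == len(variables):
        return assignment                      # every variable assigned
    var = next(v for v in variables if v not in assignment)
    for value in domains[var]:
        assignment[var] = value
        if consistent(assignment):             # constraint check prunes early
            result = backtrack(assignment, variables, domains, consistent)
            if result is not None:
                return result
        del assignment[var]                    # undo and try the next value
    return None

# Hypothetical 3-node graph-coloring instance, for illustration only.
variables = ['WA', 'NT', 'SA']
domains = {v: ['red', 'green', 'blue'] for v in variables}
edges = [('WA', 'NT'), ('WA', 'SA'), ('NT', 'SA')]

def all_different_neighbours(assignment):
    return all(assignment[a] != assignment[b]
               for a, b in edges if a in assignment and b in assignment)

print(backtrack({}, variables, domains, all_different_neighbours))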
Constraint programming: Constraint programming is the use of constraints as a programming language to encode and solve problems. This is often done by embedding constraints into a programming language, which is called the host language. Constraint programming originated from a formalization of equalities of terms in Prolog II, leading to a general framework for embedding constraints into a logic programming language. The most common host languages are Prolog, C++, and Java, but other languages have been used as well.

Constraint logic programming: A constraint logic program is a logic program that contains constraints in the bodies of clauses. As an example, the clause A(X) :- X>0, B(X) is a clause containing the constraint X>0 in the body. Constraints can also be present in the goal. The constraints in the goal and in the clauses used to prove the goal are accumulated into a set called the constraint store. This set contains the constraints the interpreter has assumed satisfiable in order to proceed in the evaluation. As a result, if this set is detected to be unsatisfiable, the interpreter backtracks. Equations of terms, as used in logic programming, are considered a particular form of
constraints which can be simplified using unification. As a result, the constraint store can be considered an extension of the concept of substitution that is used in regular logic programming. The most common kinds of constraints used in constraint logic programming are constraints over integers/rationals/reals and constraints over finite domains.

Concurrent constraint logic programming languages have also been developed. They significantly differ from non-concurrent constraint logic programming in that they are aimed at programming concurrent processes that may not terminate. Constraint handling rules can be seen as a form of concurrent constraint logic programming, but are also sometimes used within a non-concurrent constraint logic programming language. They allow for rewriting constraints or inferring new ones based on the truth of conditions.

Constraint satisfaction toolkits: Constraint satisfaction toolkits are software libraries for imperative programming languages that are used to encode and solve a constraint satisfaction problem.
Cassowary constraint solver, an open source project for constraint satisfaction (accessible from C, Java, Python and other languages).
Comet, a commercial programming language and toolkit.
Gecode, an open source portable toolkit written in C++, developed as a production-quality and highly efficient implementation of a complete theoretical background.
JaCoP, an open source Java constraint solver.
Koalog, a commercial Java-based constraint solver.
logilab-constraint, an open source constraint solver written in pure Python with constraint propagation algorithms.
MINION, an open-source constraint solver written in C++, with a small language for the purpose of specifying models/problems.
ZDC, an open source program developed in the Computer-Aided Constraint Satisfaction Project for modeling and solving constraint satisfaction problems.
Other constraint programming languages: Constraint toolkits are a way of embedding constraints into an imperative programming language. However, they are only used as external libraries for encoding and solving problems. An approach in which constraints are integrated into an imperative programming language is taken in the Kaleidoscope programming language. Constraints have also been embedded into functional programming languages.
Evaluation functions: An evaluation function is used by game-playing programs based on minimax and related algorithms. The evaluation function is typically designed to prioritize speed over accuracy; the function looks only at the current position and does not explore possible moves.

In chess: One popular strategy for constructing evaluation functions is as a weighted sum of various factors that are thought to influence the value of a position. For instance, an evaluation function for chess might take the form

c1 * material + c2 * mobility + c3 * king safety + c4 * center control + ...

Chess beginners, as well as the simplest of chess programs, evaluate the position taking only "material" into account, i.e. they assign a numerical score to each piece (with pieces of opposite color having scores of opposite sign) and sum up the scores over all the pieces on the board. On the whole, computer evaluation functions of even advanced programs tend to be more materialistic than human evaluations. This is compensated for by the increased speed of evaluation, which allows more plies to be examined. As a result, some chess programs may rely too much on tactics at the expense of strategy.
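A minimal sketch of such a weighted-sum evaluator in Python follows; the piece values, weights, and position encoding are illustrative guesses, not a prescribed design.

# A material-plus-mobility evaluator in the weighted-sum style above.
PIECE_VALUES = {'P': 1, 'N': 3, 'B': 3, 'R': 5, 'Q': 9}   # king excluded

def evaluate(position, c1=1.0, c2=0.1):
    """position: dict with 'pieces' (list of (piece, colour) tuples,
    colour +1 for us, -1 for the opponent) and 'mobility' (our legal
    move count minus the opponent's)."""
    material = sum(colour * PIECE_VALUES[piece]
                   for piece, colour in position['pieces'])
    return c1 * material + c2 * position['mobility']

position = {'pieces': [('Q', +1), ('R', -1), ('P', -1)], 'mobility': -4}
print(evaluate(position))   # 9 - 5 - 1 = 3 material, minus 0.4 for mobility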
Fig: The first two ply of the game tree for tic-tac-toe.

The diagram shows the first two levels, or ply, in the game tree for tic-tac-toe. We consider all rotations and reflections of positions as being equivalent, so the first player has three choices of move: in the center, at the edge, or in the corner. The second player has two choices for the reply if the first player played in the center, and otherwise five choices. And so on.

The number of leaf nodes in the complete game tree is the number of possible different ways the game can be played. For example, the game tree for tic-tac-toe has 26,830 leaf nodes.

Game trees are important in artificial intelligence because one way to pick the best move in a game is to search the game tree using the minimax algorithm or its variants. The game tree for tic-tac-toe is easily searchable, but the complete game trees for larger games like chess are much too large to search. Instead, a chess-playing program searches a partial game tree: typically as many ply from the current position as it can search in the time
available. Except for the case of "pathological" game trees[1] (which seem to be quite rare in practice), increasing the search depth (i.e., the number of ply searched) generally improves the chance of picking the best move.

Two-person games can also be represented as and-or trees. For the first player to win a game, there must exist a winning move for all moves of the second player. This is represented in the and-or tree by using disjunction to represent the first player's alternative moves and using conjunction to represent all of the second player's moves.

Solving Game Trees
Fig: An arbitrary game tree that has been fully colored.

With a complete game tree, it is possible to "solve" the game, that is to say, find a sequence of moves that either the first or
second player can follow that will guarantee either a win or a tie. The algorithm can be described recursively as follows.
1. Color the final ply of the game tree so that all wins for player 1 are colored one way, all wins for player 2 are colored another way, and all ties are colored a third way.
2. Look at the next ply up. If the player to move has a child node colored for that player, color this node for that player as well. If all immediately lower nodes are colored for the other player, color this node for the other player. Otherwise, color this node a tie.
3. Repeat for each ply, moving upwards, until all nodes are colored. The color of the root node will determine the nature of the game.

The diagram shows a game tree for an arbitrary game, colored using the above algorithm. It is usually possible to solve a game (in this technical sense of "solve") using only a subset of the game tree, since in many games a move need not be analyzed if there is another move that is better for the same player (for example, alpha-beta pruning can be used in many deterministic games). Any sub-tree that can be used to solve the game is known as a decision tree, and the sizes of decision trees of various shapes are used as measures of game complexity.
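The coloring algorithm can be phrased as a short recursive function. The sketch below uses the standard reading of step 2 (a node is colored for the player to move if some child is colored for that player); the tree encoding is hypothetical.

def solve(node):
    """Color a game tree bottom-up, per the algorithm above. A leaf is
    '1', '2', or 'tie'; an internal node is (player_to_move, children)."""
    if isinstance(node, str):
        return node                       # final ply: already colored
    to_move, children = node
    colours = [solve(child) for child in children]
    if str(to_move) in colours:
        return str(to_move)               # a winning reply exists for the mover
    if all(c == colours[0] for c in colours):
        return colours[0]                 # all lower nodes agree
    return 'tie'

# Hypothetical tiny tree: player 1 moves first, then player 2.
tree = (1, [(2, ['1', 'tie']), (2, ['2', '2'])])
print(solve(tree))   # 'tie': player 2 avoids the '1' win on the left branch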
Games of chance: A game of chance is a game whose outcome is strongly influenced by some randomizing device, and upon which contestants may or may not wager money or anything of monetary value. Common devices used include dice, spinning tops, playing cards, roulette wheels, and numbered balls drawn from a container. Any game of chance that involves anything of monetary value is gambling. Gambling is known in nearly all human societies, even though many have passed laws restricting it. Early people used the knucklebones of sheep as dice. Some people develop a psychological addiction to gambling, and will risk even food and shelter to continue.

Some games of chance may also involve a certain degree of skill. This is especially true where the player or players have decisions to make based upon previous or incomplete knowledge, such as poker and blackjack. In other games, like roulette and baccarat, the player may only choose the amount of the bet and the thing he/she wants to bet on; the rest is up to chance. These games are therefore still considered games of chance, with a small amount of skill required.[1] The distinction between 'chance' and 'skill' is relevant, as in some countries chance games are illegal, or at least regulated, where skill games are not.
Knowledge representation: A good knowledge representation supports efficient computation; this efficiency is supplied by the guidance a representation provides for organizing information so as to facilitate making the recommended inferences. It is also a medium of human expression, i.e., a language in which we say things about the world.

Knowledge representation is needed for library classification and for processing concepts in an information system. In the field of artificial intelligence, problem solving can be simplified by an appropriate choice of knowledge representation. Representing the knowledge in one way may make the solution simple, while an unfortunate choice of representation may make the solution difficult or obscure; the analogy is to making computations in Hindu-Arabic numerals or in Roman numerals: long division is simpler in one and harder in the other. Likewise, there is no representation that can serve all purposes or make every problem equally approachable.

Properties for Knowledge Representation Systems: The following properties should be possessed by a knowledge representation system.
Representational Adequacy - the ability to represent the required knowledge;
Inferential Adequacy - the ability to manipulate the knowledge represented to produce new knowledge corresponding to that inferred from the original;
Inferential Efficiency - the ability to direct the inferential mechanisms into the most productive directions by storing appropriate guides;
Acquisition Efficiency - the ability to acquire new knowledge using automatic methods wherever possible rather than reliance on human intervention.
Predicate logic: Predicates allow us to talk about properties, e.g. is_wet(today), and relations, e.g. likes(john, apples); applied to its arguments, a predicate is either true or false. Predicate logics include first-order logic and higher-order logic. In first-order logic, variables represent arbitrary objects, e.g. likes(X, apples), and quantifiers qualify the values of variables: the universal quantifier ("for all X": ∀X. likes(X, apples)) and the existential quantifier ("there exists at least one X": ∃X. likes(X, apples)). Higher-order logic is more expressive than first-order logic: functions and predicates can themselves be described by predicates, e.g. binary(addition), or transformed by functions, e.g. differentiate(square), and one can quantify over both functions and predicates, e.g. a function having a zero at 17.
Forward reasoning: A rule can be fired only when the values of all its condition variables are known; if any of them have unknown values, the rule condition is unknown. When a rule condition is unknown, the rule is not fired and the next rule is examined.

The iterative reasoning process: The process of examining one rule after the other continues until a complete pass has been made through the entire rule set. More than one pass usually is necessary in order to assign a value to the goal variable. Perhaps the information needed to evaluate one rule is produced by another rule that is examined subsequently. After the second rule is fired, the first rule can be evaluated on the next pass. The passes continue as long as it is possible to fire rules. When no more rules can be fired, the reasoning process ceases.

Example of forward reasoning: Letters are used for the conditions and actions to keep the illustration simple. In rule 1, for example, if condition A exists then action B is taken. Condition A might be THIS.YEAR.SALES > LAST.YEAR.SALES.
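A minimal Python sketch of this pass-based forward reasoning follows, using letter rules in the style of the example; the rule encoding is illustrative.

def forward_chain(rules, facts):
    """Repeated passes over the rule set, as described above: fire any
    rule whose conditions all hold, until a pass fires nothing new."""
    facts = set(facts)
    fired = True
    while fired:                          # keep making passes
        fired = False
        for conditions, action in rules:
            if action not in facts and all(c in facts for c in conditions):
                facts.add(action)         # rule fires, producing a new fact
                fired = True
    return facts

# Hypothetical letter rules in the style of the example above.
rules = [(['A'], 'B'),        # Rule 1: if A then B
         (['B', 'C'], 'D'),   # Rule 2: if B and C then D
         (['D'], 'E')]        # Rule 3: if D then E
print(forward_chain(rules, ['A', 'C']))   # {'A','B','C','D','E'}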
Backward chaining: Backward chaining is implemented in logic programming by SLD resolution. Both forward and backward chaining are based on the modus ponens inference rule. Backward chaining starts with a list of goals (or a hypothesis) and works backwards from the consequent to the antecedent to see if there is data available that will support any of these consequents. An inference engine using backward chaining would search the inference rules until it finds one which has a consequent (Then clause) that matches a desired goal. If the antecedent (If clause) of that rule is not known to be true, then it is added to the list of goals (in order for one's goal to be confirmed, one must also provide data that confirms this new rule). For example, suppose that the goal is to conclude the color of my pet Fritz, given that he croaks and eats flies, and that the rule base contains the following four rules:
1. If X croaks and eats flies Then X is a frog
2. If X chirps and sings Then X is a canary
3. If X is a frog Then X is green
4. If X is a canary Then X is yellow
This rule base would be searched and the third and fourth rules would be selected, because their consequents (Then Fritz is green, Then Fritz is yellow) match the goal (to determine Fritz's color). It is not yet known that Fritz is a frog, so both the antecedents (If Fritz is a frog, If Fritz is a canary) are added to the goal list. The rule base is again searched and this time the
first two rules are selected, because their consequents (Then X is a frog, Then X is a canary) match the new goals that were just added to the list. The antecedent (If Fritz croaks and eats flies) is known to be true and therefore it can be concluded that Fritz is a frog, and not a canary. The goal of determining Fritz's color is now achieved (Fritz is green if he is a frog, and yellow if he is a canary, but he is a frog since he croaks and eats flies; therefore, Fritz is green). Note that the goals always match the affirmed versions of the consequents of implications (and not the negated versions as in modus tollens) and even then, their antecedents are then considered as the new goals (and not the conclusions as in affirming the consequent) which ultimately must match known facts (usually defined as consequents whose antecedents are always true); thus, the inference rule which is used is modus ponens. Because the list of goals determines which rules are selected and used, this method is called goal-driven, in contrast to data-driven forward-chaining inference. The backward chaining approach is often employed by expert systems.
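The Fritz example can be traced with a few lines of Python. The sketch below pre-instantiates the rules for X = Fritz rather than implementing full unification, so it is a simplification of real backward chaining.

def backward_chain(goal, rules, facts):
    """Goal-driven search in the spirit of the Fritz example: a goal
    holds if it is a known fact, or if some rule concludes it and all
    of that rule's antecedents can themselves be proved."""
    if goal in facts:
        return True
    for antecedents, consequent in rules:
        if consequent == goal:
            # The antecedents become new sub-goals.
            if all(backward_chain(a, rules, facts) for a in antecedents):
                return True
    return False

# The four rules, instantiated for X = Fritz.
rules = [(['Fritz croaks', 'Fritz eats flies'], 'Fritz is a frog'),
         (['Fritz chirps', 'Fritz sings'], 'Fritz is a canary'),
         (['Fritz is a frog'], 'Fritz is green'),
         (['Fritz is a canary'], 'Fritz is yellow')]
facts = {'Fritz croaks', 'Fritz eats flies'}
print(backward_chain('Fritz is green', rules, facts))    # True
print(backward_chain('Fritz is yellow', rules, facts))   # False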
Conceptual Dependency (CD): The goals of CD are:
To construct computer programs that can understand natural language.
To make inferences from the statements and also to identify conditions in which two sentences can have similar meaning.
To provide facilities for the system to take part in dialogues and answer questions.
To provide a means of representation which is language independent.

Knowledge is represented in CD by elements called conceptual structures. The basis of CD representation is that for two sentences which have identical meaning there must be only one representation, and implicitly packed information must be explicitly stated. In order that knowledge be represented in CD form, certain primitive actions have been developed. The table below gives the primitive CD actions. Apart from the primitive CD actions, one has to make use of the following six categories of types of objects.
1. PPs (picture producers): Only physical objects are picture producers.
2. ACTs: Actions are done by an actor to an object. The table gives the major ACTs.

Table: Primitive CD forms
1. ATRANS - Transfer of an abstract relationship (e.g. give)
2. PTRANS - Transfer of the physical location of an object (e.g. go)
3. PROPEL - Application of physical force to an object (e.g. push)
4. MOVE - Movement of a body part of an animal by that animal (e.g. kick)
5. GRASP - Grasping of an object by an actor (e.g. clutch)
6. INGEST - Taking of an object by an animal into the inside of that animal (e.g. drink, eat)
7. EXPEL - Expulsion of an object from inside the body of an animal into the world (e.g. spit)
8. MTRANS - Transfer of mental information between animals or within an animal (e.g. tell)
9. MBUILD - Construction of new information from old information (e.g. decide)
10. SPEAK - Production of sounds (e.g. say)
3. LOCs (locations): Every action takes place at some location, which serves as source and destination.
4. Ts (times): An action can take place at a particular location at a given specified time. The time can be represented on an absolute scale or a relative scale.
5. AAs (action aiders): These serve as modifiers of actions; for example, the ACT PROPEL has a speed factor associated with it, which is an action aider.
6. PAs (picture aiders): These serve as aiders of picture producers. Every object that serves as a PP has certain characteristics by which it is defined; PAs serve PPs by defining these characteristics.

There are certain rules by which the conceptual categories of types of objects discussed can be combined. CD models provide the following advantages for representing knowledge. The ACT primitives help in representing wide knowledge in a succinct way. To
illustrate this, consider the following verbs, all of which correspond to a transfer of mental information:
- see
- learn
- hear
- inform
- remember
In CD representation all of these are represented using the single ACT primitive MTRANS; they are not represented individually. Similarly, different verbs that indicate various activities are clubbed under unique ACT primitives, thereby reducing the number of inference rules.
The main goal of CD representation is to make explicit what is implicit. That is why every statement that is made has not only the actors and objects but also time and location, source and destination.
The following set of conceptual tenses makes usage of CD more precise:
O - object case relationship
R - recipient case relationship
P - past
F - future
T - transition
Ts - start transition
Tf - finished transition
K - continuing
? - interrogative
/ - negative
Nil - present
Delta - timeless
C - conditional
CD brought forward the notion of language independence, because all ACTs are language-independent primitives.
Semantic Nets: The main idea behind semantic nets is that the meaning of a concept comes from the ways in which it is connected to other concepts. In a semantic net, information is represented as a set of nodes connected to each other by a set of labeled arcs, which represent relationship among the nodes. A fragment of a typical semantic net is shown in fig.
Fig: A semantic network.

This network contains an example of both the isa and instance relations, as well as some other, more domain-specific relations like team and uniform-color. In this network, we could use inheritance to derive the additional relation has-part(Pee-Wee-Reese, Nose).

1. Intersection search. One of the early ways that semantic nets were used was to find relationships among objects by spreading activation out from each of two nodes and seeing where the activation met. This process is called intersection search. Using this process, it is possible to use the network of the figure to answer questions such as: what is the connection between the Brooklyn Dodgers and blue?

2. Representing non-binary predicates. Semantic nets are a natural way to represent relationships that would appear as ground instances of binary predicates in predicate logic. For
example, some of the arcs from the figure could be represented in logic as
isa(Person, Mammal)
instance(Pee-Wee-Reese, Person)
team(Pee-Wee-Reese, Brooklyn-Dodgers)
uniform-color(Pee-Wee-Reese, Blue)
But knowledge expressed by predicates of other arities can also be expressed in semantic nets. We have already seen that many unary predicates, such as Man(Marcus), can be rewritten as binary ones using isa and instance; for example, Man(Marcus) could be rewritten as instance(Marcus, Man), thereby making it easy to represent in a semantic net.

3. Partitioned semantic nets. Suppose we want to represent a simple quantified expression in semantic nets. One way to do this is to partition the semantic net into a hierarchical set of spaces, each of which corresponds to the scope of one or more variables. To see how this works, consider first the simple net shown in the figure; this net corresponds to the statement: The dog bit the mail carrier.
The nodes Dogs, Bite, and Mail-Carrier represent the classes of dogs, bitings, and mail carriers, respectively, while the nodes d, b, and m represent a particular dog, a particular biting, and a particular mail carrier. This fact can easily be represented by a single net with no partitioning. But now suppose that we want to represent the fact: Every dog has bitten a mail carrier. Or, in logic: ∀x: dog(x) → ∃y: Mail-carrier(y) ∧ bite(x, y)
To represent this fact, it is necessary to encode the scope of the universally quantified variable x.
Fig: Using partitioned semantic nets.

Frame: A frame is a collection of attributes (usually called slots) and associated values (and possibly constraints on values) that describes some entity in the world. Sometimes a frame
describes an entity in some absolute sense; sometimes it represents the entity from a particular point of view. A single frame taken alone is rarely useful. Instead, we build frame systems out of collections of frames that are connected to each other. Set theory provides a good basis for understanding frame systems. Although not all frame systems are defined this way, we do so here. In this view, each frame represents either a class (a set) or an instance (an element of a class). To see how this works, consider the frame system shown in the figure. In this example, the frames Person, Adult-Male, ML-Baseball-Player, Pitcher, and ML-Baseball-Team are all classes. The frames Pee-Wee-Reese and Brooklyn-Dodgers are instances.
Person
  Isa: Mammal
  Cardinality: 6,000,000,000
  *handed: Right

Adult-Male
  Isa: Person
  Cardinality: 2,000,000,000
  *height: 5-10

ML-Baseball-Player
  Isa: Adult-Male

ML-Baseball-Team
  Isa: Team
  Cardinality: 26
  *team-size: 24
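A frame system of this shape is easy to prototype as dictionaries with inheritance along instance and isa links. The sketch below is illustrative; slot names and values follow the example above, and the lookup ignores the *-prefix convention for inheritable slots.

# A minimal frame system with isa/instance inheritance.
frames = {
    'Mammal':             {},
    'Person':             {'isa': 'Mammal', 'handed': 'Right'},
    'Adult-Male':         {'isa': 'Person', 'height': '5-10'},
    'ML-Baseball-Player': {'isa': 'Adult-Male', 'height': '6-1'},
    'Pee-Wee-Reese':      {'instance': 'ML-Baseball-Player',
                           'team': 'Brooklyn-Dodgers'},
}

def get_slot(frame, slot):
    """Look for the slot locally, then climb instance/isa links."""
    while frame is not None:
        if slot in frames[frame]:
            return frames[frame][slot]
        frame = frames[frame].get('instance') or frames[frame].get('isa')
    return None

print(get_slot('Pee-Wee-Reese', 'height'))   # '6-1', from ML-Baseball-Player
print(get_slot('Pee-Wee-Reese', 'handed'))   # 'Right', inherited from Person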
2. Each propositional constant (i.e. specific proposition) and each propositional variable (i.e. a variable representing propositions) are wffs.
3. Each atomic formula (i.e. a specific predicate with variables) is a wff.
4. If A and B are wffs, then so are ¬A, (A ∧ B), (A ∨ B), (A → B), and (A ↔ B).
5. If x is a variable (representing objects of the universe of discourse) and A is a wff, then so are ∀x A and ∃x A.

For example, "The capital of Virginia is Richmond." is a specific proposition; hence it is a wff by Rule 2. Let B be a predicate name representing "being blue" and let x be a variable. Then B(x) is an atomic formula meaning "x is blue"; thus it is a wff by Rule 3 above. By applying Rule 5 to B(x), ∀x B(x) is a wff, and so is ∃x B(x). Then by applying Rule 4 to them, ∀x B(x) ∧ ∃x B(x) is seen to be a wff. Similarly, if R is a predicate name representing "being round", then R(x) is an atomic formula; hence it is a wff. By applying Rule 4 to B(x) and R(x), a wff B(x) ∧ R(x) is obtained.

In this manner, larger and more complex wffs can be constructed following the rules given above. Note, however, that strings that cannot be constructed by using those rules are not wffs. For example, ∀x B(x)R(x) and B(∀x) are NOT wffs, nor are B(R(x)) and B(∃x R(x)). One way to check whether or not an expression is a wff is to try to state it in English. If you can translate it into a
correct English sentence, then it is a wff.

More examples: To express the fact that Tom is taller than John, we can use the atomic formula taller(Tom, John), which is a wff. This wff can also be part of some compound statement such as taller(Tom, John) ∨ taller(John, Tom), which is also a wff. If x is a variable representing people in the world, then taller(x, Tom), ∀x taller(x, Tom), ∃x taller(x, Tom), and ∃x ∀y taller(x, y) are all wffs, among others. However, taller(∀x, John) and taller(Tom Mary, Jim), for example, are NOT wffs.
Unit 3: Handling Uncertainty and Learning

Fuzzy Logic: In the techniques discussed so far, we have not modified the mathematical underpinnings provided by set theory and logic. We have instead augmented those ideas with additional constructs provided by probability theory. We now take a different approach and briefly consider what happens if we make
fundamental changes to our idea of set membership and corresponding changes to our definitions of logical operations. The motivation for fuzzy sets is provided by the need to represent such propositions as:
John is very tall.
Mary is slightly ill.
Sue and Linda are close friends.
Exceptions to the rule are nearly impossible.
Most Frenchmen are not very tall.
While traditional set theory defines set membership as a Boolean predicate, fuzzy set theory allows us to represent set membership as a possibility distribution. Once set membership has been redefined in this way, it is possible to define a reasoning system based on techniques for combining distributions. Such reasoners have been applied in control systems for devices as diverse as trains and washing machines.
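As a tiny illustration of membership as a possibility distribution, here is a sketch of a 'tall' membership function in Python; the breakpoints and the squaring hedge for 'very' are conventional choices, not from the original text.

def tall(height_cm):
    """Membership in the fuzzy set 'tall': 0 below 160 cm, 1 above
    190 cm, linear in between."""
    if height_cm <= 160:
        return 0.0
    if height_cm >= 190:
        return 1.0
    return (height_cm - 160) / 30.0

def very(mu):
    return mu ** 2        # a common hedge: 'very' squares the membership

john = 185
print(tall(john))         # ~0.83: John is tall to degree 0.83
print(very(tall(john)))   # ~0.69: 'very tall' to a lesser degree
# Fuzzy logic operators over memberships a and b are typically
# AND = min(a, b), OR = max(a, b), NOT = 1 - a.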
Dempster-Shafer theory: The theory attaches to each proposition an interval [Bel, Pl] in which the degree of belief must lie. Belief (usually denoted Bel) measures the strength of the evidence in favor of a set of propositions. It ranges from 0 (indicating no evidence) to 1 (denoting certainty).

A belief function, Bel, corresponding to a specific mass assignment m for the set A, is defined as the sum of the beliefs committed to every subset of A by m. That is, Bel(A) is a measure of the total support or belief committed to the set A and sets a minimum value for its likelihood. It is defined in terms of all the belief assigned to A as well as to all proper subsets of A. Thus,

Bel(A) = the sum of m(B) over all subsets B of A

For example, if U contains the mutually exclusive subsets A, B, C, and D, then

Bel({A,C,D}) = m({A,C,D}) + m({A,C}) + m({A,D}) + m({C,D}) + m({A}) + m({C}) + m({D}).

In Dempster-Shafer theory, a belief interval can also be defined for a subset A. It is represented as the subinterval [Bel(A), Pl(A)] of [0, 1]. Bel(A) is also called the support of A, and Pl(A) = 1 - Bel(¬A) the plausibility of A. We define Bel(∅) = 0 to signify that no belief should be assigned to the empty set, and Bel(U) = 1 to show that the truth is contained within U. The subsets A of U are called the focal elements of the support function Bel when m(A) > 0.
Since Bel(A) only partially describes the beliefs about proposition A, it is useful to also have a measure of the extent to which one believes in ¬A, that is, the doubt regarding A. For this, we define the doubt of A as D(A) = Bel(¬A). From this definition it will be seen that the upper bound of the belief interval noted above, Pl(A), can be expressed as Pl(A) = 1 - D(A) = 1 - Bel(¬A). Pl(A) represents an upper belief limit on the proposition A. The belief interval [Bel(A), Pl(A)] is also sometimes referred to as the confidence in A, while the quantity Pl(A) - Bel(A) is referred to as the uncertainty in A. It can be shown that
Pl(∅) = 0, Pl(U) = 1
For all A: Pl(A) ≥ Bel(A)
Bel(A) + Bel(¬A) ≤ 1, Pl(A) + Pl(¬A) ≥ 1
For A ⊆ B: Bel(A) ≤ Bel(B), Pl(A) ≤ Pl(B)

As an example of the above concepts, recall once again the problem of identifying which of the terrorist organizations A, B, C, and D could have been responsible for the attack. The possible subsets of U in this case form a lattice of sixteen subsets (see figure).
Fig: The lattice of the sixteen subsets of U = {A, B, C, D}, from the empty set up through the four singletons {A}, {B}, {C}, {D}, the six pairs, the four triples, and U itself.
Assume one piece of evidence supports the belief that groups A and C were responsible to a degree of m1({A, C}) = 0.6, and another source of evidence disproves the belief that C was involved (and therefore supports the belief that the three organizations A, B, and D were responsible), that is, m2({A, B, D}) = 0.7. To obtain the pooled evidence, we compute the following quantities:
m1 ⊕ m2({A}) = (0.6)*(0.7) = 0.42
m1 ⊕ m2({A, C}) = (0.6)*(0.3) = 0.18
m1 ⊕ m2({A, B, D}) = (0.4)*(0.7) = 0.28
m1 ⊕ m2(U) = (0.4)*(0.3) = 0.12
m1 ⊕ m2 = 0 for all other subsets of U
Bel1({A, C}) = m({A, C}) + m({A}) + m({C})
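The pooled values above follow from Dempster's rule of combination, which the following Python sketch reproduces; the set names mirror the example, and normalization by the conflicting mass is included even though no conflict arises here.

from itertools import product

def combine(m1, m2):
    """Dempster's rule: the combined mass of A is the sum of
    m1(B) * m2(C) over all B, C with B intersect C = A, normalized
    by the non-conflicting mass."""
    combined, conflict = {}, 0.0
    for (b, w1), (c, w2) in product(m1.items(), m2.items()):
        a = b & c
        if a:
            combined[a] = combined.get(a, 0.0) + w1 * w2
        else:
            conflict += w1 * w2            # mass committed to the empty set
    return {a: w / (1 - conflict) for a, w in combined.items()}

U = frozenset('ABCD')
m1 = {frozenset('AC'): 0.6, U: 0.4}        # evidence for {A, C}
m2 = {frozenset('ABD'): 0.7, U: 0.3}       # evidence against C
for a, w in combine(m1, m2).items():
    print(sorted(a), round(w, 2))          # {A}:0.42 {A,C}:0.18 {A,B,D}:0.28 U:0.12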
Bayes Theorem: An important goal for many problem-solving systems is to collect evidence as the system goes along and to modify its behavior on the basis of the evidence. To model this behavior, we need a statistical theory of evidence. Bayesian statistics is such a theory. The fundamental notion of Bayesian statistics is that of conditional probability:

P(H/E)

Read this expression as the probability of hypothesis H given that we have observed evidence E. To compute this, we need to take into account the prior probability of H and the extent to which E provides evidence of H. To do this, we need to define a universe that contains an exhaustive, mutually exclusive set of Hi's, among which we are trying to discriminate. Then let
P(Hi/E) = the probability that hypothesis Hi is true given evidence E
P(E/Hi) = the probability that we will observe evidence E given that hypothesis Hi is true
P(Hi) = the a priori probability that hypothesis Hi is true in the absence of any specific evidence. These probabilities are called prior probabilities, or priors.
k = the number of possible hypotheses
Bayes' theorem then states that

P(Hi/E) = P(E/Hi) * P(Hi) / [ the sum over n = 1..k of P(E/Hn) * P(Hn) ]

Specifically, when we say P(A/B), we are describing the conditional probability of A given that the only evidence we have is B. If there is also other relevant evidence, then it too must be considered. Suppose, for example, that we are solving a medical diagnosis problem. Consider the following assertions:
S: patient has spots
M: patient has measles
F: patient has high fever
Without any additional evidence, the presence of spots serves as evidence in favor of measles. It also serves as evidence of fever, since measles would cause fever. But since spots and fever are not independent events, we cannot just sum their effects; instead, we need to represent explicitly the conditional probability that arises from their conjunction. In general, given a
In general, given a prior body of evidence e and some new observation E, we need to compute

P(H/E, e) = P(H/E) · P(e/E, H) / P(e/E)

Unfortunately, in an arbitrarily complex world, the size of the set of joint probabilities that we require in order to compute this function grows as 2^n if there are n different propositions being considered. This makes using Bayes' theorem intractable for several reasons:
The knowledge acquisition problem is insurmountable: too many probabilities have to be provided.
The space that would be required to store all the probabilities is too large.
The time required to compute the probabilities is too large.
Despite these problems, though, Bayesian statistics provides an attractive basis for an uncertain reasoning system. As a result, several mechanisms for exploiting its power while at the same time making it tractable have been developed.
that the ability to adapt to new surroundings and to solve new problems is an important characteristic of intelligent entities. A learning system must be able to interpret its inputs in such a way that its performance gradually improves. Learning denotes changes in a system that are adaptive in the sense that they enable the system to do the same task, or tasks drawn from the same population, more efficiently and more effectively the next time. Learning covers a wide range of phenomena:
1. At one end of the spectrum is skill refinement. People get better at many tasks simply by practicing. The more you ride a bicycle or play tennis, the better you get.
2. At the other end of this spectrum lies knowledge acquisition. Knowledge is generally acquired through experience.
3. Many AI programs are able to improve their performance substantially through rote-learning techniques.
4. Another way we learn is through taking advice from others. Advice taking is similar to rote learning, but high-level advice may not be in a form simple enough for a program to use directly in problem solving.
5. People also learn through their own problem-solving experience. After solving a complex problem, we
remember the structure of the problem and the methods we used to solve it. The next time we see the problem, we can solve it more efficiently. Moreover, we can generalize from our experience to solve related problems more easily. A program can similarly remember its experiences and generalize from them. In large problem spaces, however, efficiency gains are critical. Learning can mean the difference between solving a problem rapidly and not solving it at all. In addition, programs that learn through problem-solving experience may be able to come up with qualitatively better solutions in the future.
6. Another form of learning that does involve stimuli from the outside is learning from examples. Learning from examples usually involves a teacher who helps us classify things by correcting us when we are wrong. Sometimes, however, a program can discover things without the aid of a teacher. Learning is itself a problem-solving process.
The learner must be given some task from which meaningful feedback can be obtained, where the feedback provides some measure of the accuracy and usefulness of the newly acquired knowledge. The learning model is depicted in the figure, where the environment has been included as part of the overall learner system. The environment may be regarded either as a form of nature which produces random stimuli, or as a more organized training source, such as a teacher, which provides carefully selected training examples for the learner component.
[Fig. General learning model: the environment or teacher supplies stimuli, training examples, and feedback to the learner component, which creates and modifies the knowledge base; the performance component uses that knowledge to produce responses to tasks, and a critic/evaluator assesses the performance and feeds the result back to the learner.]
The actual form of environment used will depend on the particular learning paradigm. In any case, some representation language must be assumed for communication between the environment and the learner. The language may be the same representation scheme as that used in the knowledge base (such as a form of predicate calculus). When they are chosen to be the same, we say the single representation trick is being used. This usually results in a simpler implementation, since it is not necessary to transform between two or more different representations. Inputs to the learner component may be physical stimuli of some type or descriptive, symbolic training examples. The information conveyed to the learner component is used to create and modify knowledge structures in the knowledge base. When given a task, the performance component produces a response describing its actions in performing the task. The critic module then evaluates this response relative to an optimal response. The cycle described above may be repeated a number of times until the performance of the system has reached some acceptable level, until a known learning goal has been reached, or until changes cease to occur in the knowledge base after some chosen number of training examples have been observed.
There are several important factors which influence a system's ability to learn, in addition to the form of representation used. They include the types of training provided, the form and extent of any initial background knowledge, the type of feedback provided, and the learning algorithms used (fig).
[Fig. Factors affecting learning performance: the training provided, background knowledge, representation scheme, feedback, and learning algorithms together determine the resultant performance.]
Finally, the learning algorithms themselves determine to a large extent how successful a learning system will be. The algorithms control the search to find and build the knowledge structures. We then expect that the algorithms that extract much of the
useful information from training examples and take advantage of any background knowledge will outperform those that do not.
2. Gather a training set. The training set needs to be representative of the real-world use of the function. Thus, a set of input objects is gathered and corresponding outputs are also gathered, either from human experts or from measurements. 3. Determine the input feature representation of the learned function. The accuracy of the learned function depends strongly on how the input object is represented. Typically, the input object is transformed into a feature vector, which contains a number of features that are descriptive of the object. The number of features should not be too large, because of the curse of dimensionality; but should contain enough information to accurately predict the output. 4. Determine the structure of the learned function and corresponding learning algorithm. For example, the engineer may choose to use support vector machines or decision trees. 5. Complete the design. Run the learning algorithm on the gathered training set. Some supervised learning algorithms require the user to determine certain control parameters. These parameters may be adjusted by optimizing performance on a subset (called a validation set) of the training set, or via cross-validation. 6. Evaluate the accuracy of the learned function. After parameter adjustment and learning, the performance of the resulting function should be measured on a test set that is separate from the training set.
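A minimal sketch of steps 2 through 6 in Python, using scikit-learn and a synthetic dataset (the dataset, the choice of a decision tree, and the depth values tried are illustrative assumptions, not prescribed by the text):

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Steps 2-3: gather a training set and represent each input as a feature vector.
X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.25, random_state=0)

# Step 4: choose a structure for the learned function (here, a decision tree).
# Step 5: adjust a control parameter by optimizing performance on the validation set.
best_depth, best_acc = None, 0.0
for depth in (2, 4, 8):
    model = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_train, y_train)
    acc = accuracy_score(y_val, model.predict(X_val))
    if acc > best_acc:
        best_depth, best_acc = depth, acc

# Step 6: measure accuracy on a test set kept separate from all training and tuning.
final = DecisionTreeClassifier(max_depth=best_depth, random_state=0).fit(X_train, y_train)
print("test accuracy:", accuracy_score(y_test, final.predict(X_test)))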
Factors to consider: When choosing and applying a learning algorithm, consider the following:
1. Heterogeneity of the data. If the feature vectors include features of many different kinds (discrete, discrete ordered, counts, continuous values), some algorithms are easier to apply than others. Many algorithms, including support vector machines, linear regression, logistic regression, neural networks, and nearest neighbor methods, require that the input features be numerical and scaled to similar ranges (e.g., to the [-1, 1] interval). Methods that employ a distance function, such as nearest neighbor methods and support vector machines with Gaussian kernels, are particularly sensitive to this. An advantage of decision trees is that they easily handle heterogeneous data.
2. Redundancy in the data. If the input features contain redundant information (e.g., highly correlated features), some learning algorithms (e.g., linear regression, logistic regression, and distance-based methods) will perform poorly because of numerical instabilities. These problems can often be solved by imposing some form of regularization.
3. Presence of interactions and non-linearities. If each of the features makes an independent contribution to the output, then algorithms based on linear functions (e.g., linear regression, logistic regression, support vector machines, naive Bayes) and distance functions (e.g., nearest neighbor methods, support vector machines with Gaussian kernels)
generally perform well. However, if there are complex interactions among features, then algorithms such as decision trees and neural networks work better, because they are specifically designed to discover these interactions. Linear methods can also be applied, but the engineer must manually specify the interactions when using them.
How supervised learning algorithms work: Given a set of training examples of the form {(x1, y1), ..., (xN, yN)}, a learning algorithm seeks a function g: X → Y, where X is the input space and Y is the output space. The function g is an element of some space of possible functions G, usually called the hypothesis space. It is sometimes convenient to represent g using a scoring function f: X × Y → R, such that g is defined as returning the y value that gives the highest score: g(x) = arg max_y f(x, y). Let F denote the space of scoring functions. Although G and F can be any space of functions, many learning algorithms are probabilistic models where g takes the form of a conditional probability model g(x) = P(y | x), or f takes the form of a joint probability model f(x, y) = P(x, y). For example, naive Bayes and linear discriminant analysis are joint probability models, whereas logistic regression is a conditional probability model. There are two basic approaches to choosing f or g: empirical risk minimization and structural risk minimization.[3] Empirical risk minimization seeks the function that best fits the training data.
Structural risk minimization includes a penalty function that controls the bias/variance tradeoff. In both cases, it is assumed that the training set consists of a sample of independent and identically distributed pairs (xi, yi). In order to measure how well a function fits the training data, a loss function L is defined. For training example (xi, yi), the loss of predicting the value ŷ is L(yi, ŷ). The risk R(g) of function g is defined as the expected loss of g. This can be estimated from the training data as the empirical risk R_emp(g) = (1/N) Σi L(yi, g(xi)).
Generalizations of supervised learning: There are several ways in which the standard supervised learning problem can be generalized:
Semi-supervised learning: In this setting, the desired output values are provided only for a subset of the training data. The remaining data is unlabeled.
Active learning: Instead of assuming that all of the training examples are given at the start, active learning algorithms interactively collect new examples, typically by making queries to a human user. Often, the queries are based on unlabeled data, which is a scenario that combines semi-supervised learning with active learning.
Structured prediction: When the desired output value is a complex object, such as a parse tree or a labeled graph, then standard methods must be extended.
Learning to rank: When the input is a set of objects and the desired output is a ranking of those objects, then again the standard methods must be extended.
Fig. Data for unsupervised learning (a table of binary feature vectors describing ten animals). This form of learning is called unsupervised learning because no teacher is required. Given a set of input data, the network is allowed to play with it to try to discover regularities and relationships between the different parts of the input. Learning is often made possible through some notion of which features in the input sets are important. But often we do not know in advance which features are important, and asking a learning system to deal with raw input data can be computationally expensive. Unsupervised learning can be used as a feature discovery module that precedes supervised learning. Consider the data in the figure: the group of ten animals, each described by its own set of features, breaks down naturally into three groups: mammals, reptiles, and birds. We would like to build a network that can learn which group a particular animal belongs to, and to generalize so that it can identify animals it has not yet seen. We could easily accomplish this with a six-input, three-output back-propagation network: we simply present the network with an input, observe its output, and update its weights based on the errors it makes. Without a teacher, however, the error cannot be computed, so we must seek other methods. Our first problem is to ensure that only one of the three output units becomes active for any given input. One solution to this problem is to let the network settle, find the output unit with the
highest level of activation, and set that unit to 1 and all other output units to 0. In other words, the output unit with the highest activation is the only one we consider to be active. A more neural-like solution is to have the output units fight among themselves for control of an input vector.
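A minimal Python sketch of this winner-take-all step (the activation values are invented for illustration):

import numpy as np

def winner_take_all(activations):
    # Set the most active output unit to 1 and all the others to 0.
    out = np.zeros_like(activations)
    out[np.argmax(activations)] = 1.0
    return out

print(winner_take_all(np.array([0.2, 0.9, 0.4])))  # [0. 1. 0.]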
developing some general conclusions or theories. (Thanks to William M.K. Trochim for these definitions.) To translate this into an approach to learning a skill: deductive learning is someone TELLING you what to do, while inductive learning is someone SHOWING you what to do. Remember the saying "a picture is worth a thousand words"? That means, in a given amount of time, a person can be SHOWN a thousand times more information than they could be TOLD in the same amount of time. I can access a picture or pattern much more quickly than the equivalent description of that picture or pattern in words. Athletes often practice "visualization" before they undertake an action. But in order to visualize something, you need to have a picture in your head to visualize. How do you get those pictures in your head? By WATCHING. Who do you watch? Professionals. This is the key. Pay attention here. When you want to learn a skill:
WATCH PROFESSIONALS DO IT BEFORE YOU DO IT. DO NOT DO IT YOURSELF FIRST. Going out and doing a sport without having seen AND STUDIED professionals doing that sport is THE NUMBER ONE MISTAKE people make. They force themselves to play, their brain says "what do we do now?", another part of the brain looks for examples (pictures) of what to do, and, finding none, says "just do anything". So they try to generate behavior to
accomplish something within the rules of the sport. If they "keep score" and try to "win" and avoid "losing", the negative impact is multiplied tenfold. Yet this is EXACTLY what most people do and what most ARE TOLD to do! "Interested in tennis? Grab a racquet, join a league, get out there and have fun!" Then what happens? They have no training, they try to do what it takes to "win", and to do so, they manufacture awful strokes just TO BE ABLE to play (remember, they joined a league, so they have to keep score and win!). These awful strokes get ingrained by repetition, they produce terrible results, and they are very difficult to unlearn, so progress, despite lessons (mostly in the useless form of words), is slow or nonexistent. Then they quit. When you finally pick up a racquet and go out to play, and your brain says "what do we do now?", your head will be filled with pictures of professionals perfectly doing what you are trying to do. You will not know how to do it incorrectly, because you have never seen it done incorrectly. You will try to do what they do, and you will almost immediately proceed to an advanced intermediate level. You will be a beginner for a short period of time, if at all, and improvement will be a matter of adding to and refining what you are doing, not stripping down and unlearning bad patterns. And since you are not keeping score, you focus purely on technique. If you hit one into the net, just pull another ball out of your pocket and do it again. No big deal, no drama, no guilt. Just hit another. When you feel you can hit all of your shots somewhat professionally, maybe you can actually play someone and keep score. You will love the
positive feedback of beating players who have been playing much longer than you have. You will wonder how they could have played for so long and still "play like that". Don't they know it's done "this way"? What professional does it "that way"? Don't they watch tennis on TV? Who does that? I just started and I know that's wrong. All these thoughts will make you feel like a genius. So how does all of this relate to chess? Simply put, play over the games of professional players and see how they play before you play anybody. Try to imitate them instead of trying to reinvent the wheel. Play over the games of lots of different players and then decide which one or two you like. The ones you like are the ones where you say, after playing over one of their games, "I would love to play a game like that!" Then just concentrate on those one or two players. Study and play the openings they play. Get books where they comment on their own games. Maybe they will say what they were thinking during the game. Try to play like them. During your games, think "What would he do in this position?" Personally, I like Morphy for his rapid development and attacks, Alekhine for his creativeness in all positions, and Spassky for his ability to play all types of positions and create attacks in calm positions.
value. More descriptive names for such tree models are classification trees or regression trees. In these tree structures, leaves represent classifications and branches represent conjunctions of features that lead to those classifications. In decision analysis, a decision tree can be used to visually and explicitly represent decisions and decision making. In data mining, a decision tree describes data but not decisions; rather, the resulting classification tree can be an input for decision making.
General: Decision tree learning is a common method used in data mining. The goal is to create a model that predicts the value of a target variable based on several input variables. Each interior node corresponds to one of the input variables; there are edges to children for each of the possible values of that input variable. Each leaf represents a value of the target variable given the values of the input variables represented by the path from the root to the leaf. A tree can be "learned" by splitting the source set into subsets based on an attribute value test. This process is repeated on each derived subset in a recursive manner called recursive partitioning. The recursion is completed when the subset at a node all has the same value of the target variable, or when splitting no longer adds value to the predictions. Data comes in records of the form (x, Y) = (x1, x2, x3, ..., xk, Y).
The dependent variable, Y, is the target variable that we are trying to understand, classify, or generalize. The vector x is composed of the input variables x1, x2, x3, etc., that are used for that task.
Types: Classification tree analysis is when the predicted outcome is the class to which the data belongs. Regression tree analysis is when the predicted outcome can be considered a real number (e.g., the price of a house, or a patient's length of stay in a hospital). Classification And Regression Tree (CART) analysis is used to refer to both of the above procedures, first introduced by Breiman et al. Chi-squared Automatic Interaction Detector (CHAID) performs multi-level splits when computing classification trees.[2] A Random Forest classifier uses a number of decision trees in order to improve the classification rate. Boosted Trees can be used for regression-type and classification-type problems.
Advantages of decision trees include the following:
Simple to understand and interpret. People are able to understand decision tree models after a brief explanation.
Requires little data preparation. Other techniques often require data normalization, dummy variables to be created, and blank values to be removed.
Able to handle both numerical and categorical data. Other techniques are usually specialized for datasets that have only one type of variable. For example, relation rules can be used only with nominal variables, while neural networks can be used only with numerical variables.
Uses a white box model. If a given situation is observable in a model, the explanation for the condition is easily given by Boolean logic (see the sketch following this list). An example of a black box model is an artificial neural network, since the explanation for the results is difficult to understand.
Possible to validate a model using statistical tests. That makes it possible to account for the reliability of the model.
Robust. Performs well even if its assumptions are somewhat violated by the true model from which the data were generated.
Performs well with large data in a short time. Large amounts of data can be analyzed using personal computers in a time short enough to enable stakeholders to make decisions based on the analysis.
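As a sketch of the white-box property mentioned above, the following Python fragment trains a tiny tree with scikit-learn and prints it as human-readable rules (the toy animals and feature names are invented for illustration):

from sklearn.tree import DecisionTreeClassifier, export_text

X = [[1, 1], [1, 1], [0, 0], [0, 1]]   # features: [has_feathers, lays_eggs]
y = ["bird", "bird", "mammal", "reptile"]

tree = DecisionTreeClassifier().fit(X, y)
print(export_text(tree, feature_names=["has_feathers", "lays_eggs"]))

The printed output is a nested set of if/else tests on the two features, which a person can read and check directly, unlike the weight matrix of a neural network.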
Each node of the network represents an entry in the KB (it may be a premise, an antecedent, an inference rule, etc.), and each arc of the network represents the inference steps from which the node was derived.
Premise: A premise is a fundamental belief which is assumed to be always true. Premises do not need justifications; they form the base from which justifications for all other nodes are constructed. There are two types of justification for each node:
1. Support List (SL)
2. Conditional Proof (CP)
Many kinds of truth maintenance systems exist. Two major types are single-context and multi-context truth maintenance. In single-context systems, consistency is maintained among all facts in memory (the database). Multi-context systems allow consistency to be relevant to a subset of facts in memory (a context) according to the history of logical inference. This is achieved by tagging each fact or deduction with its logical history. Multi-agent truth maintenance systems perform truth maintenance across multiple memories, often located on different machines. De Kleer's ATMS (1986) was utilized in systems based upon KEE on the Lisp Machine. The first multi-agent TMS was created by Mason and Johnson; it was a multi-context system. Bridgeland and Huhns created the first single-context multi-agent system.
The discovery that state 3 is inconsistent causes immediate backtracking to state 5. To be able to determine which choices underlie the contradiction requires that the problem solver store dependency records with every datum that it infers.
Avoiding rediscovering contradictions. The second deficiency of chronological backtracking is illustrated by unnecessary state 13. The contradiction discovered in state 10 depends on B and E. As E is the most recent choice, chronological and dependency-directed backtracking are indistinguishable here, both backtracking to state 11. However, as B and E are known to be inconsistent with each other, there is no point in rediscovering this contradiction by working in state 13.
Disadvantages:
1. Dependency-directed backtracking incurs a significant time and space overhead, as it requires the maintenance of dependency records and an additional nogood database. Thus the effort required to maintain the dependencies may be more than the problem-solving effort saved.
2. If the problem solver is logically complete and finishes all work on a state before considering the next, the problem of backtracking to an inappropriate choice cannot occur.
3. In such cases much of the advantage of dependency-directed backtracking is irrelevant. However, most practical problem solvers are neither logically complete nor finish all possible work on one state before considering another.
Fuzzy function:
The membership function is one of the basic fuzzy functions; it is used to assign the degrees of membership that define a fuzzy set's values. Fuzzy logic depends on membership functions.
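A common concrete choice is the triangular membership function. A minimal Python sketch (the fuzzy set "warm" and its breakpoints are invented for illustration):

def triangular(x, a, b, c):
    # Membership rises linearly from a to the peak at b, then falls to zero at c.
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

# Fuzzy set "warm", peaking at 25 degrees.
for t in (10, 20, 25, 30, 40):
    print(t, triangular(t, 15, 25, 35))
# 10 -> 0.0, 20 -> 0.5, 25 -> 1.0, 30 -> 0.5, 40 -> 0.0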
If X croaks and eats flies Then X is a frog
If X chirps and sings Then X is a canary
If X is a frog Then X is green
If X is a canary Then X is yellow
This rule base would be searched, and the third and fourth rules would be selected, because their consequents (Then Fritz is green, Then Fritz is yellow) match the goal (to determine Fritz's color). It is not yet known that Fritz is a frog, so both the antecedents (If Fritz is a frog, If Fritz is a canary) are added to the goal list. The rule base is again searched, and this time the first two rules are selected, because their consequents (Then X is a frog, Then X is a canary) match the new goals that were just added to the list. The antecedent (If Fritz croaks and eats flies) is known to be true, and therefore it can be concluded that Fritz is a frog, and not a canary. The goal of determining Fritz's color is now achieved (Fritz is green if he is a frog, and yellow if he is a canary; since he croaks and eats flies, he is a frog, and therefore Fritz is green). Note that the goals always match the affirmed versions of the consequents of implications (and not the negated versions as in modus tollens), and even then, their antecedents are then considered as the new goals (and not the conclusions as in affirming the consequent), which ultimately must match known facts (usually defined as consequents whose antecedents are always true); thus, the inference rule used is modus ponens. Because the list of goals determines which rules are selected and used, this method is called goal-driven, in contrast to data-driven forward-chaining inference. The backward chaining approach is often employed by expert systems. Programming languages such as Prolog, Knowledge Machine and ECLiPSe support backward chaining within their inference engines.
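The goal-driven search just described can be captured in a few lines. A minimal Python sketch of backward chaining over the Fritz rule base (the rule encoding and function name are illustrative assumptions):

# Rules map a consequent to one or more lists of antecedents.
rules = {
    "frog":   [["croaks", "eats flies"]],
    "canary": [["chirps", "sings"]],
    "green":  [["frog"]],
    "yellow": [["canary"]],
}
facts = {"croaks", "eats flies"}   # what is known about Fritz

def prove(goal):
    # A goal holds if it is a known fact, or if all antecedents of
    # some rule concluding it can themselves be proved.
    if goal in facts:
        return True
    return any(all(prove(a) for a in ants) for ants in rules.get(goal, []))

print(prove("green"))   # True: Fritz croaks and eats flies, so he is a frog
print(prove("yellow"))  # False: nothing establishes that Fritz is a canary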
T4. Induce an initial PMTG from the relative frequencies of productions in the multitreebank.
T5. Re-estimate the PMTG parameters, using a synchronous parser with the expectation semiring.
A1. Use the PMTG to infer the most probable multitree covering new input text.
A2. Linearize the output dimensions of the multitree.
Steps T2, T4 and A2 are trivial. Steps T1, T3, T5, and A1 are instances of the generalized parsers. Figure 2 is only an architecture: computational complexity and generalization error stand in the way of its practical implementation. Nevertheless, it is satisfying to note that all the non-trivial algorithms in Figure 2 are special cases of Translator CT. It is therefore possible to implement an MTSMT system using just one inference algorithm, parameterized by a grammar, a semiring, and a search strategy. An advantage of building an MT system in this manner is that improvements invented for ordinary parsing algorithms can often be applied to all the main components of the system. For example, Melamed (2003) showed how to reduce the computational complexity of a synchronous parser just by changing the logic. The same optimization can be applied to the inference algorithms. With proper software design, such optimizations need never be implemented more than once. For simplicity, the algorithms in this section are based on CKY logic. However, the architecture in Figure 2 can also be implemented using
generalizations of more sophisticated parsing logics, such as those inherent in Earley or head-driven parsers.
Benefits of Machine Translation: There are three research benefits of using generalized parsers to build MT systems:
1. We can take advantage of past and future research on making parsers more accurate and more efficient.
2. We can concentrate our efforts on better models, without worrying about MT-specific search algorithms.
3. More generally and most importantly, this approach encourages MT research to be less specialized and more transparently related to the rest of computational linguistics.
PUTDOWN(A): put block A down on the table. The arm must have been holding block A.
Notice that in the world we have described, the robot arm can hold only one block at a time. Also, since all blocks are the same size, each block can have at most one other block directly on top of it. In order to specify both the conditions under which an operation may be performed and the results of performing it, we need to use the following predicates:
ON(A, B): block A is on block B.
ONTABLE(A): block A is on the table.
CLEAR(A): there is nothing on top of block A.
HOLDING(A): the arm is holding block A.
ARMEMPTY: the arm is holding nothing.
Various logical statements are true in this blocks world. For example:
[∃x: HOLDING(x)] → ¬ARMEMPTY
∀x: ONTABLE(x) → ¬∃y: ON(x, y)
∀x: [¬∃y: ON(y, x)] → CLEAR(x)
The first of these statements says simply that if the arm is holding anything, then it is not empty. The second says that if a block is on the table, then it is not also on another block. The third says that any block with no blocks on it is clear.
Components of a planning system: The components of planning systems are as follows:
1. Choose the best rule to apply next, based on the best available information.
2. Apply the chosen rule to compute the new problem state that arises from the application.
3. Detect when a solution has been found.
4. Detect dead ends so that they can be abandoned and the system's effort directed in more fruitful directions.
5. Detect when an almost correct solution has been found and employ special techniques to make it totally correct.
Choosing rules to apply. The most widely used technique for selecting appropriate rules to apply is first to isolate a set of differences between the desired goal state and the current state, and then to identify those rules that are relevant to reducing those differences. If several rules are found, a variety of other heuristic information can be exploited to choose among them.
Applying rules. In simple systems, each rule specified the complete problem state that would result from its application. Now, however, we must be able to deal with rules that specify only a small part of the complete problem state. One way is to describe, for each action, each of the changes it makes to the state description. In addition, some statement that everything else remains unchanged is also necessary. The figure shows how a state, called S0, of a simple blocks world problem could be represented.
[Fig. The initial state S0: ON(A, B) ^ ONTABLE(B) ^ CLEAR(A) ^ ARMEMPTY, i.e. block A sits on block B, which is on the table.]
After applying the operator UNSTACK (A, B), our description of the world would be
ONTABLE (B) ^CLEAR (A) ^CLEAR (B) ^HOLDING (A)
STRIPS-style operators that correspond to the blocks world operations we have discussed are shown in fig 2. Notice that for simple rules such as these, the PRECONDITION list is often identical to the DELETE list. In order to pick up a block, the robot arm must be empty; as soon as it picks up a block, it is no longer empty. But preconditions are not always deleted. For example, in order for the arm to pick up a block, the block must have no other blocks on top of it; after it is picked up, it still has no blocks on top of it. This is the reason that the PRECONDITION and DELETE lists must be specified separately.
STACK(X, Y)
P: CLEAR(Y) ^ HOLDING(X)
D: CLEAR(Y) ^ HOLDING(X)
A: ARMEMPTY ^ ON(X, Y)

UNSTACK(X, Y)
P: ON(X, Y) ^ CLEAR(X) ^ ARMEMPTY
D: ON(X, Y) ^ ARMEMPTY
A: HOLDING(X) ^ CLEAR(Y)

PICKUP(X)
P: CLEAR(X) ^ ONTABLE(X) ^ ARMEMPTY
D: ONTABLE(X) ^ ARMEMPTY
A: HOLDING(X)

PUTDOWN(X)
P: HOLDING(X)
D: HOLDING(X)
A: ONTABLE(X) ^ ARMEMPTY
Fig: STRIPS-style operators for the blocks world
Detecting a solution. A planning system has succeeded in finding a solution to a problem when it has found a sequence of operators that transforms the initial problem state into the goal state.
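Applying such an operator is mechanical: check the PRECONDITION list against the state, then remove the DELETE list and add the ADD list. A minimal Python sketch (representing states as sets of literal strings is an illustrative assumption):

def apply_op(state, pre, delete, add):
    # Returns the successor state, or None if the preconditions fail.
    if not pre <= state:
        return None
    return (state - delete) | add

s0 = {"ON(A,B)", "ONTABLE(B)", "CLEAR(A)", "ARMEMPTY"}
# UNSTACK(A, B), using the P, D, and A lists given above:
s1 = apply_op(s0,
              pre={"ON(A,B)", "CLEAR(A)", "ARMEMPTY"},
              delete={"ON(A,B)", "ARMEMPTY"},
              add={"HOLDING(A)", "CLEAR(B)"})
print(s1)  # {'ONTABLE(B)', 'CLEAR(A)', 'CLEAR(B)', 'HOLDING(A)'}

The result matches the hand-derived description of the world after UNSTACK(A, B).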
Detecting dead ends. As a planning system is searching for a sequence of operators to solve a particular problem, it must be able to detect when it is exploring a path that can never lead to a solution. The same reasoning mechanisms that can be used to detect a solution can often be used for detecting a dead end.
Goal stack planning: The technique to be developed for solving compound goals that may interact is the use of a goal stack. This was the approach used by STRIPS. In this method, the problem solver makes use of a single stack that contains both goals and operators that have been proposed to satisfy those goals. The problem solver also relies on a database that describes the current situation and a set of operators described by PRECONDITION, ADD, and DELETE lists. To see how this method works, let us carry it through for the simple example shown in the figure.
[Fig. A simple blocks-world problem whose goal is ON(C, A) ^ ON(B, D) ^ ONTABLE(A) ^ ONTABLE(D).]
But we want to separate this problem into four subproblems, one for each component of the original goal. Two of the subproblems, ONTABLE(A) and ONTABLE(D), are already true in the initial state. Depending on the order in which we want to tackle the subproblems, there are two goal stacks that could be created as our first step, where each line represents one goal on the stack and OTAD is an abbreviation for ONTABLE(A) ^ ONTABLE(D):
To continue with the example we started above, let us assume that we choose first to explore alternative 1. Alternative 2 will also lead to a solution; in fact, it finds one so trivially that it is not very interesting. Exploring alternative 1, we first check to see whether ON(C, A) is true in the current state.
If we only represent partial-order constraints on steps, then we have a partial-order planner, which is also called a non-linear planner. In this case, we specify a set of temporal constraints between pairs of steps of the form S1 < S2, meaning that step S1 comes before, but not necessarily immediately before, step S2. We also show this temporal constraint in graph form as S1 +++++++++> S2. STRIPS is a total-order planner, as are situation-space progression and regression planners.
Principle of Least Commitment: The principle of least commitment is the idea of never making a choice unless required to do so. In other words, only do something if it is necessary. The advantage of using this principle is that we try to avoid doing work that might have to be undone later, hence avoiding wasted work. In planning, one application of this principle is never to order plan steps unless it is necessary for some reason. Partial-order planners exhibit this property because constraints ordering steps are inserted only when necessary. On the other hand, situation-space progression planners make commitments about the order of steps as they try to find a solution, and therefore may make mistakes from poor guesses about the right order of steps.
Representing a Partial-Order Plan: A partial-order plan will be represented as a graph that describes the temporal constraints between the plan steps selected so far. That is, each node will represent a single step in the plan (i.e., an
instance of one of the operators), and an arc will designate a temporal constraint between the two steps it connects. For example, a graph with arcs S1 ++++> S2, S1 ++++> S3, S1 ++++> S4, S2 ++++> S5, S3 ++++> S4, and S4 ++++> S5 graphically represents the temporal constraints S1 < S2, S1 < S3, S1 < S4, S2 < S5, S3 < S4, and S4 < S5. This partial-order plan implicitly represents the following three total-order plans, each of which is consistent with all of the given constraints: [S1,S2,S3,S4,S5], [S1,S3,S2,S4,S5], and [S1,S3,S4,S2,S5].
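The claim that exactly these three total orders are consistent can be checked by brute force. A minimal Python sketch:

from itertools import permutations

steps = ["S1", "S2", "S3", "S4", "S5"]
constraints = [("S1", "S2"), ("S1", "S3"), ("S1", "S4"),
               ("S2", "S5"), ("S3", "S4"), ("S4", "S5")]

def consistent(order):
    pos = {s: i for i, s in enumerate(order)}
    return all(pos[a] < pos[b] for a, b in constraints)

for order in permutations(steps):
    if consistent(order):
        print(order)
# Prints exactly the three linearizations listed above.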
Partial-Order Planner (POP) Algorithm

function pop(initial-state, conjunctive-goal, operators)
    // non-deterministic algorithm
    plan = make-initial-plan(initial-state, conjunctive-goal);
    loop:
        if solution?(plan) then return plan;
        (S-need, c) = select-subgoal(plan);          // choose an unsolved goal
        choose-operator(plan, operators, S-need, c); // select an operator to solve that goal and revise plan
        resolve-threats(plan);                       // fix any threats created
    end
end

function solution?(plan)
    if causal-links-establishing-all-preconditions-of-all-steps(plan)
       and all-threats-resolved(plan)
       and all-temporal-ordering-constraints-consistent(plan)
       and all-variable-bindings-consistent(plan)
    then return true;
    else return false;
end

function select-subgoal(plan)
    pick a plan step S-need from steps(plan) with a precondition c
        that has not been achieved;
    return (S-need, c);
end

procedure choose-operator(plan, operators, S-need, c)
    // solve the "open precondition" of some step
    choose a step S-add by either
        Step Addition: adding a new step from operators that has c in its Add-list, or
        Simple Establishment: picking an existing step in Steps(plan) that has c in its Add-list;
    if no such step then return fail;
    add causal link "S-add --->c S-need" to Links(plan);
    add temporal ordering constraint "S-add < S-need" to Orderings(plan);
    if S-add is a newly added step then
        add S-add to Steps(plan);
        add "Start < S-add" and "S-add < Finish" to Orderings(plan);
end

procedure resolve-threats(plan)
    foreach S-threat that threatens link "Si --->c Sj" in Links(plan)
        // "declobber" the threat
        choose either
            Demotion: add "S-threat < Si" to Orderings(plan), or
            Promotion: add "Sj < S-threat" to Orderings(plan);
        if not(consistent(plan)) then return fail;
    end
end
Unit-5 Expert System and AI Languages
Introduction: An expert system is a set of programs that
manipulate encoded knowledge to solve problems in a specialized domain that normally requires human expertise. An expert system's knowledge is obtained from expert sources and
coded in a form suitable for the system to use in its inference or reasoning processes. The expert knowledge must be obtained from specialists or other sources of expertise, such as texts, journal articles, and databases. This type of knowledge usually requires much training and experience in some specialized field such as medicine, geology, system configuration, or engineering design.
Expert systems differ from conventional programs in several ways:
1. Expert systems use knowledge rather than data to control the solution process. Much of the knowledge used is heuristic in nature rather than algorithmic.
2. The knowledge is encoded and maintained as an entity separate from the control program. As such, it is not compiled together with the control program itself. This permits the incremental addition and modification (refinement) of the knowledge base without recompilation of the control programs. Furthermore, it is possible in some cases to use different knowledge bases with the same control programs to produce different types of expert systems.
3. Expert systems are capable of explaining how a particular conclusion was reached, and why requested information is needed during a consultation. This is important as it gives the user a chance to assess and understand the system's reasoning ability, thereby improving the user's confidence in the system.
4. Expert systems use symbolic representations for knowledge (rules, networks, or frames) and perform their inference through symbolic computations that closely resemble manipulations of natural language.
5. Expert systems often reason with meta-knowledge; that is, they reason with knowledge about themselves and their own knowledge limits and capabilities.
Applications:
Different types of medical diagnoses (internal medicine, pulmonary diseases, infectious blood diseases, and so on)
Diagnosis of complex electronic and electrochemical systems
Diagnosis of diesel-electric locomotive systems
Diagnosis of software development projects
Forecasting crop damage
Location of faults in computer and communication systems
[Fig. Expert system structure: a development engine is used to build the knowledge base, an inference engine applies it, and a user interface connects the system to the user.]
IF: the temperature is greater than 200 degrees, and the water level is low
THEN: open the safety valve.
Abstractly, such a rule has the form A & B & C & D → E & F, a conjunction of antecedent conditions implying one or more conclusions.
Each rule represents a small chunk of knowledge relating to the given domain of expertise, leading from some initially known facts to some useful conclusions. When the conditions of a rule are satisfied, the conclusion or action part of the rule is then accepted as known (or at least known with some degree of certainty). Inference in production systems is accomplished by a process of chaining through the rules recursively, either in a forward or backward direction, until a conclusion is reached or until failure occurs. The selection of rules used in the chaining process is determined by matching current facts against the domain knowledge or variables in rules, and choosing among a candidate set of rules the ones that meet some given criteria, such as specificity. The inference process is typically carried out in an interactive mode, with the user providing input parameters needed to complete the chaining process.
[Fig. Components of a typical expert system: the user interacts through input/output; the system comprises an explanation module, an editor, a knowledge base, a working memory, and a learning module.]
The Knowledge Base: The knowledge base contains facts and rules about some specialized knowledge domain.
The Inference Process: The inference engine accepts user input queries and responses to questions through the I/O interface, and uses this dynamic information together with the static knowledge (the rules and facts) stored in the knowledge base. The knowledge in the knowledge base is used to derive conclusions about the current case or situation as presented by the user's input. During the match stage, the contents of working memory are compared to facts and rules contained in the knowledge base. When consistent matches are found, the corresponding rules are placed in a conflict set. To find an appropriate and consistent match, substitutions (instantiations) may be required. Once all the matched rules have been added to the conflict set during a given cycle, one of the rules is selected for execution. When the left side of a sequence of rules is instantiated first and the rules are executed from left to right, the process is called forward chaining. This is also known as data-driven inference, since input data are used to guide the direction of the inference process. For example, we can chain forward to show that when a student is encouraged, is healthy, and has goals, the student will succeed:
ENCOURAGED(student) → MOTIVATED(student)
MOTIVATED(student) & HEALTHY(student) → WORKHARD(student)
WORKHARD(student) & HASGOALS(student) → EXCELL(student)
EXCELL(student) → SUCCEED(student)
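A minimal Python sketch of this data-driven chaining over the student rules above (the encoding is an illustrative assumption): starting from the known facts, rules are fired repeatedly until nothing new can be concluded.

rules = [
    ({"ENCOURAGED"}, "MOTIVATED"),
    ({"MOTIVATED", "HEALTHY"}, "WORKHARD"),
    ({"WORKHARD", "HASGOALS"}, "EXCELL"),
    ({"EXCELL"}, "SUCCEED"),
]
facts = {"ENCOURAGED", "HEALTHY", "HASGOALS"}

changed = True
while changed:
    changed = False
    for antecedents, conclusion in rules:
        if antecedents <= facts and conclusion not in facts:
            facts.add(conclusion)      # fire the rule
            changed = True
print("SUCCEED" in facts)  # True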
On the other hand, when the right side of the rules is instantiated first, the left-hand conditions become subgoals. These subgoals may in turn cause sub-subgoals to be established, and so on, until facts are found to match the lowest subgoals' conditions. When this form of inference takes place, we say that backward chaining is performed. This form of inference is also known as goal-driven inference, since an initial goal establishes the backward direction of the inferring.
Explanation Module: The explanation module provides the user with an explanation of the reasoning process when requested. This is done in response to a how query or a why query. To respond to a how query, the explanation module traces the chain of rules fired during a consultation with the user. The sequence of rules that led to the conclusion is then printed for the user in an easy-to-understand, human-language style. This permits the user to actually see the reasoning process followed by the system in arriving at the conclusion. If the user does not agree with the reasoning steps presented, they may be changed using the editor.
To respond to a why query, the explanation module must be able to explain why certain information is needed by the inference engine to complete a step in the reasoning process before it can proceed. For example, in diagnosing a car that will not start, a system might be asked why it needs to know the status of the distributor spark.
Building a Knowledge Base: The editor is used by developers to create new rules for addition to the knowledge base, to delete outmoded rules, or to modify existing rules in some way. The editor may also apply consistency tests to newly created rules, and such systems often prompt the user for missing information.
The I/O Interface: The input-output interface permits the user to communicate with the system in a more natural way, by permitting the use of simple selection menus or the use of a restricted language which is close to a natural language. This means that the system must have special prompts or a specialized vocabulary which encompasses the terminology of the given domain of expertise.
Associative or Semantic Network Architectures: We know that an associative network is a network made up of nodes connected by directed arcs. The nodes represent objects, attributes, concepts, or other basic entities, and the arcs, which are labeled, describe the relationship between the two nodes they connect. Special network links include the ISA and HASPART links, which designate an object as being a certain type of object (belonging to a class of objects) or as being a subpart of another object, respectively. Associative network representations are especially useful in depicting hierarchical knowledge structures, where property inheritance is common. More often, these network representations are used in natural language or computer vision systems, or in conjunction with some other form of representation.
Frame Architectures: Frames are structured sets of closely related knowledge, such as an object or concept name, the object's main attributes and their corresponding values, and possibly some attached procedures (if-needed, if-added, if-removed procedures). The attributes, values, and procedures are stored in specified slots and facets of the frame. Individual frames are usually linked together much like the nodes in an associative network.
Decision Tree Architectures: Knowledge for expert systems may be stored in the form of a decision tree when the knowledge can be structured in a top-to-bottom manner. For example, the identification of objects (equipment, faults, physical objects, diseases) can be made through a decision tree structure. Initial and intermediate nodes in the tree correspond to object attributes, and terminal nodes correspond to the identities of objects. Attribute values for an object determine a path to a leaf node in the tree which contains the object's identification. Each object attribute corresponds to a nonterminal node in the tree, and each branch of the decision tree corresponds to an attribute value or set of values.
Blackboard System Architectures: Blackboard architectures refer to a special type of knowledge-based system which uses a form of opportunistic reasoning. This differs from pure forward or pure backward chaining in production systems in that either direction may be chosen dynamically at each stage in the problem solution process. Blackboard systems are composed of three functional components, as depicted in the figure:
1. There are a number of knowledge sources, which are separate and independent sets of coded knowledge. Each knowledge source may be thought of as a specialist in some limited area needed to solve a given subset of
problems; the sources may contain knowledge in the form of procedures, rules, or other schemes.
2. A globally accessible database structure, called a blackboard, contains the current problem state and information needed by the knowledge sources (input data, partial solutions, control data, alternatives, and final solutions). The knowledge sources make changes to the blackboard data that incrementally lead to a solution. Communication and interaction between the knowledge sources take place solely through the blackboard.
3. Control information may be contained within the sources, on the blackboard, or possibly in a separate module. The control knowledge monitors the changes to the blackboard and determines what the immediate focus of attention should be in solving the problem.
[Fig. Components of blackboard systems: the blackboard, the knowledge sources, and the control information.]
Analogical Reasoning Architectures: Expert systems based on analogical architectures solve new problems as humans do, by finding a similar problem solution that is known and applying the known solution to the new problem, possibly with some modifications. For example, if we know a method of proving that the product of two even integers is even, we can successfully prove that the product of two odd integers is odd through much the same proof steps. Expert systems using analogical architectures will require a large knowledge base having
numerous problem solutions and other previously encountered situations or episodes.
Neural Network Architectures: Neural networks are large networks of simple processing elements or nodes which process information dynamically in response to external inputs. The nodes are simplified models of neurons. The knowledge in a neural network is distributed throughout the network in the form of internode connections and weighted links which form the inputs to the nodes. The link weights serve to enhance or inhibit the input stimuli values, which are then added together at the nodes. If the sum of all the inputs to a node exceeds some threshold value T, the node fires and produces an output
which is passed on to other nodes or is used to produce some output response. Neural networks were originally inspired as being models of the human nervous system. They are greatly simplified models, to be sure.
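A minimal Python sketch of the node model just described (the weights and threshold are invented for illustration):

def neuron(inputs, weights, threshold):
    # Fire (output 1) if the weighted sum of the inputs exceeds the threshold T.
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total > threshold else 0

print(neuron([1, 0, 1], [0.5, -0.3, 0.8], threshold=1.0))  # 0.5 + 0.8 = 1.3 > 1.0 -> 1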
capture the low-level knowledge used and the inferring processes applied. This difficulty has led to the use of AI experts (called knowledge engineers) who serve as intermediaries between the domain expert and the system. The knowledge engineer elicits information from the experts and codes this knowledge into a form suitable for use in the expert system. The knowledge elicitation process is depicted in the figure. To elicit the requisite knowledge, a knowledge engineer conducts extensive interviews with domain experts. During the interviews, the expert is asked to solve typical problems in the domain of interest and to explain his or her solutions.
[Fig. The knowledge acquisition process: the domain expert is interviewed by the knowledge engineer, who encodes the elicited knowledge through a system editor into the knowledge base.]
Using the knowledge gained from the experts and other sources, the knowledge engineer codes the knowledge in the form of rules or some other representation scheme. This
knowledge is then used to solve sample problems for review. Errors and omissions are uncovered and corrected, and additional knowledge is added as needed. The process is repeated until a sufficient body of knowledge has been collected to solve a large class of problems in the chosen domain. The whole process may take as many as tens of person-years.
parentheses. A string is a group of characters enclosed in double quotation marks. Lisp programs run either on an interpreter or as compiled code. The interpreter examines source programs in a repeated loop, called the read-evaluate-print loop. This loop reads the program code, evaluates it, and prints the values returned by the program. The interpreter signals its readiness to accept code for execution by printing a prompt such as the -> symbol.
Examples
Here are examples of Common Lisp code. The basic "Hello world" program:
(print "Hello world")
As the reader may have noticed from the above discussion, Lisp syntax lends itself naturally to recursion. Mathematical problems such as the enumeration of recursively defined sets are simple to express in this notation. Evaluate a number's factorial:
(defun factorial (n)
  (if (<= n 1)
      1
      (* n (factorial (- n 1)))))
An alternative implementation, often faster than the previous version if the Lisp system has tail recursion optimization:
(defun factorial (n &optional (acc 1))
  (if (<= n 1)
      acc
      (factorial (- n 1) (* acc n))))
Contrast with an iterative version which uses Common Lisp's loop macro:
(defun factorial (n)
  (loop for i from 1 to n
        for fac = 1 then (* fac i)
        finally (return fac)))
The following function reverses a list. (Lisp's built-in reverse function does the same thing.)
(defun -reverse (list)
  (let ((return-value '()))
    (dolist (e list)
      (push e return-value))
    return-value))
logic is expressed in terms of relations, represented as facts and rules. A computation is initiated by running a query over these relations.[4] The language was first conceived by a group around Alain Colmerauer in Marseille, France, in the early 1970s and the first Prolog system was developed in 1972 by Colmerauer with Philippe Roussel.[5][6] Prolog was one of the first logic programming languages,[7] and remains among the most popular such languages today, with many free and commercial implementations available. While initially aimed at natural language processing, the language has since then stretched far into other areas like theorem proving,[8] expert systems,[9] games, automated answering systems, ontologies and sophisticated control systems. Modern Prolog environments support creating graphical user interfaces, as well as administrative and networked applications. Syntax and semantics. In Prolog, program logic is expressed in terms of relations, and a computation is initiated by running a query over these relations. Relations and queries are constructed using Prolog's single data type, the term. Relations are defined by clauses. Given a query, the Prolog engine attempts to find a resolution refutation of the negated query. If the negated query can be refuted, i.e., an instantiation for all free variables is found that makes the union of clauses and the singleton set consisting of the negated query false, it follows that the original query, with the found instantiation applied, is a logical consequence of the program. This makes Prolog (and other logic programming
Page 121
languages) particularly useful for database, symbolic mathematics, and language parsing applications. Because Prolog allows impure predicates, checking the truth value of certain special predicates may have some deliberate side effect, such as printing a value to the screen. Because of this, the programmer is permitted to use some amount of conventional imperative programming when the logical paradigm is inconvenient. Prolog has a purely logical subset, called "pure Prolog", as well as a number of extra-logical features.
Data types
Prolog's single data type is the term. Terms are either atoms, numbers, variables or compound terms.
An atom is a general-purpose name with no inherent meaning. Examples of atoms include x, blue, 'Taco', and 'some atom'. Numbers can be floats or integers. Variables are denoted by a string consisting of letters, numbers and underscore characters, and beginning with an upper-case letter or underscore. Variables closely resemble variables in logic in that they are placeholders for arbitrary terms. A compound term is composed of an atom called a "functor" and a number of "arguments", which are again terms. Compound terms are ordinarily written as a functor followed by a comma-separated list of argument terms, which is contained in parentheses. The number of arguments is called the term's arity. An atom can be
regarded as a compound term with arity zero. Examples of compound terms are truck_year('Mazda', 1986) and 'Person_Friends'(zelda,[tom,jim]). Special cases of compound terms:
A List is an ordered collection of terms. It is denoted by square brackets with the terms separated by commas or in the case of the empty list, []. For example [1,2,3] or [red,green,blue]. Strings: A sequence of characters surrounded by quotes is equivalent to a list of (numeric) character codes, generally in the local character encoding, or Unicode if the system supports Unicode. For example, "to be, or not to be".
Examples
Here follow some example programs written in Prolog.
Hello world
An example of a query:
?- write('Hello world!'), nl.
Hello world!
true.
?-
Compiler optimization
Any computation can be expressed declaratively as a sequence of state transitions. As an example, an optimizing compiler with
three optimization passes could be implemented as a relation between an initial program and its optimized form:

program_optimized(Prog0, Prog) :-
    optimization_pass_1(Prog0, Prog1),
    optimization_pass_2(Prog1, Prog2),
    optimization_pass_3(Prog2, Prog).

or equivalently using DCG notation:

program_optimized --> optimization_pass_1, optimization_pass_2, optimization_pass_3.

Quicksort
The quicksort sorting algorithm, relating a list to its sorted version:

partition([], _, [], []).
partition([X|Xs], Pivot, Smalls, Bigs) :-
    (   X @< Pivot ->
        Smalls = [X|Rest],
        partition(Xs, Pivot, Rest, Bigs)
    ;   Bigs = [X|Rest],
        partition(Xs, Pivot, Smalls, Rest)
    ).

quicksort([]) --> [].
quicksort([X|Xs]) -->
    { partition(Xs, X, Smaller, Bigger) },
    quicksort(Smaller), [X], quicksort(Bigger).
Dynamic programming
The following Prolog program uses dynamic programming to find the longest common subsequence of two lists in polynomial time. The clause database is used for memoization:

:- dynamic(stored/1).

memo(Goal) :-
    (   stored(Goal) -> true
    ;   Goal, assertz(stored(Goal))
    ).

lcs([], _, []) :- !.
lcs(_, [], []) :- !.
lcs([X|Xs], [X|Ys], [X|Ls]) :- !,
    memo(lcs(Xs, Ys, Ls)).
lcs([X|Xs], [Y|Ys], Ls) :-
    memo(lcs([X|Xs], Ys, Ls1)),
    memo(lcs(Xs, [Y|Ys], Ls2)),
    length(Ls1, L1), length(Ls2, L2),
    (   L1 >= L2 -> Ls = Ls1
    ;   Ls = Ls2
    ).

Example query:
?- lcs([x,m,j,y,a,u,z], [m,z,j,a,w,x,u], Ls).
Ls = [m, j, a, u]

Design patterns
A design pattern is a general reusable solution to a commonly occurring problem in software design. In Prolog, design patterns go under various names: skeletons and techniques, clichés, program schemata, and logic description schemata. An alternative to design patterns is higher-order programming.
Higher-order programming
Main articles: Higher-order logic and Higher-order programming
By definition, first-order logic does not allow quantification over predicates. A higher-order predicate is a predicate that takes one or more other predicates as arguments. Prolog already has some built-in higher-order predicates such as call/1, findall/3, setof/3, and bagof/3.[16] Furthermore, since arbitrary Prolog goals can be constructed and evaluated at run-time, it is easy to write higher-order predicates like maplist/2, which applies an arbitrary predicate to each member of a given list, and sublist/3, which filters elements that satisfy a given predicate, also allowing for currying.[15]
To convert solutions from temporal representation (answer substitutions on backtracking) to spatial representation (terms), Prolog has various all-solutions predicates that collect all answer substitutions of a given query in a list. This can be used for list comprehension. For example, perfect numbers equal the sum of their proper divisors:
perfect(N) :-
    between(1, inf, N),
    U is N // 2,
    findall(D, (between(1,U,D), N mod D =:= 0), Ds),
    sumlist(Ds, N).
This can be used to enumerate perfect numbers, and also to check whether a given number is perfect.
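A sample enumeration query (note that sumlist/2 is named sum_list/2 in some systems, such as recent SWI-Prolog):
?- perfect(N).
N = 6 ;
N = 28 ;
N = 496 .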
MYCIN
MYCIN was an early expert system, developed at Stanford University in the 1970s, that used a rule base with certainty factors to identify bacteria causing severe infections and to recommend antibiotic therapy. Research showed that MYCIN's performance was minimally affected by perturbations in the uncertainty metrics associated with individual rules, suggesting that the power of the system was related more to its knowledge representation and reasoning scheme than to the details of its numerical uncertainty model. Some observers felt that it should have been possible to use classical Bayesian statistics. MYCIN's developers argued that this would either require unrealistic assumptions of probabilistic independence, or require the experts to provide estimates for an unfeasibly large number of conditional probabilities.[1][2] Subsequent studies showed that the certainty-factor model could indeed be interpreted in a probabilistic sense, and highlighted problems with the implied assumptions of such a model. However, the modular structure of the system proved very successful, leading to the development of graphical models such as Bayesian networks.
Results
Research conducted at the Stanford Medical School found MYCIN to propose an acceptable therapy in about 69% of cases, which was better than the performance of infectious disease experts who were judged using the same criteria. This study is often cited as showing the potential for disagreement about therapeutic decisions, even among experts, when there is no "gold standard" for correct treatment.
Practical use
MYCIN was never actually used in practice. This was not because of any weakness in its performance; as mentioned, in tests it outperformed members of the Stanford medical school faculty. Some observers raised ethical and legal issues related to the use of computers in medicine: if a program gives the wrong diagnosis or recommends the wrong therapy, who should be held responsible? However, the greatest problem, and the reason that MYCIN was not used in routine practice, was the state of technologies for system integration at the time it was developed. MYCIN was a stand-alone system that required a user to enter all relevant information about a patient by typing responses to questions that MYCIN would pose. The program ran on a large time-shared system, available over the early Internet (ARPANET), before personal computers were developed. In the modern era, such a system would be integrated with medical record systems, would extract answers to questions from patient databases, and would be much less dependent on physician entry of information. In the 1970s, a session with MYCIN could easily consume 30 minutes or more, an unrealistic time commitment for a busy clinician.
A difficulty that rose to prominence during the development of MYCIN and subsequent complex expert systems has been the extraction of the necessary knowledge from human experts in the relevant fields into the rule base for the inference engine to use (so-called knowledge engineering).