AI Algo and Difference
Best-first search is a simplified A*.
1. Start with OPEN holding the initial nodes.
2. Pick the BEST node on OPEN such that f = g + h' is minimal.
3. If BEST is a goal node, quit and return the path from the initial node to BEST.
4. Otherwise, remove BEST from OPEN and add all of BEST's children to OPEN, labelling each with its path from the initial node. Return to 2.

Proof of Admissibility of A*
We will show that A* is admissible if it uses a monotone heuristic. A monotone heuristic is one for which, along any path, the f-cost never decreases. If this property does not hold for a given heuristic function, we can make the f value monotone by using the following trick (m is a child of n):
f(m) = max(f(n), g(m) + h(m))
o Let G be an optimal goal state.
o C* is the optimal path cost.
o G2 is a suboptimal goal state: g(G2) > C*.
Suppose A* has selected G2 from OPEN for expansion. Consider a node n on OPEN on an optimal path to G; thus C* ≥ f(n). Since n is not chosen for expansion over G2, f(n) ≥ f(G2). G2 is a goal state, so f(G2) = g(G2). Hence C* ≥ g(G2). This contradicts g(G2) > C*. Thus A* could not have selected G2 for expansion before reaching the goal by an optimal path.

Proof of Completeness of A*
Let G be an optimal goal state. A* can fail to reach a goal state only if there are infinitely many nodes with f(n) ≤ C*. This can only happen if one of the following holds:
o There is a node with an infinite branching factor. We assume every node has a finite branching factor, which rules this out.
o There is a path with finite cost but infinitely many nodes. But we assumed that every arc in the graph has a cost greater than some ε > 0, so a path with infinitely many nodes would have infinite cost.
Lemma: A* expands nodes in increasing order of their f values.
A* is thus complete and optimal, assuming an admissible and consistent heuristic function (or using the pathmax equation to simulate consistency). A* is also optimally efficient, meaning that it expands only the minimal number of nodes needed to ensure optimality and completeness.

Performance Analysis of A*
Model the search space by a uniform b-ary tree with a unique start state s and a goal state g at a distance N from s. The number of nodes expanded by A* is exponential in N unless the heuristic estimate is logarithmically accurate:
|h(n) – h*(n)| ≤ O(log h*(n))
In practice most heuristics have proportional error. It often becomes difficult to use A* because the OPEN queue grows very large; a solution is to use algorithms that work with less memory.

Blind Search: Depth-First Search
1. Set L to be a list of the initial nodes in the problem.
2. If L is empty, fail; otherwise pick the first node n from L.
3. If n is a goal state, quit and return the path from the initial node.
4. Otherwise remove n from L and add to the front of L all of n's children. Label each child with its path from the initial node. Return to 2.
Note: All numbers in Fig 1 refer to the order visited in the search.

Breadth-First Search
1. Set L to be a list of the initial nodes in the problem.
2. If L is empty, fail; otherwise pick the first node n from L.
3. If n is a goal state, quit and return the path from the initial node.
4. Otherwise remove n from L and add to the end of L all of n's children. Label each child with its path from the initial node. Return to 2.

A search method or heuristic is informed if it uses additional information about nodes that have not yet been explored to decide which nodes to examine next. If a method is not informed, it is uninformed, or blind. In other words, search methods that use heuristics are informed, and those that do not are blind. Best-first search is an example of informed search, whereas breadth-first and depth-first search are uninformed or blind. A heuristic h is said to be more informed than another heuristic j if h(node) ≥ j(node) for all nodes in the search space. (In fact, for h to be more informed than j, there must also be some node where h(node) > j(node); otherwise they are equally informed.) The more informed a search method is, the more efficiently it searches.

Forward reasoning. Given a goal graph and an initial assignment of values to some goals, called input goals from now on (typically leaf goals), forward reasoning focuses on the forward propagation of these initial values to all other goals of the graph according to the rules described in Section 3. Initial values represent the evidence available about the satisfaction and the denial of a specific goal, namely evidence about the state of the goal. Usually such evidence corresponds to qualitative values of satisfaction or denial of a goal, mainly because the evidence is usually provided only vaguely by the stakeholders during interviews with the analyst, or elaborated from documents or other available sources of information.
For each goal we consider three values representing the current evidence of satisfiability and deniability of the goal: F (full), P (partial), N (none). We also admit conflicting situations in which we have evidence for both the satisfaction and the denial of a goal. For instance, for a goal G we may have full (F) evidence for satisfaction and at the same time partial (P) evidence for denial. This could represent a situation in which two different sources of information provide conflicting evidence, or multiple decompositions of goal G, where some decompositions suggest satisfaction of G while others suggest denial.
After the forward propagation of the initial values, the user can inspect the final values of the goals of interest, called target goals from now on (typically root goals), and reveal possible conflicts. In other words, the user observes the effects of the initial values on the goals of interest.

Backward reasoning. Backward reasoning focuses on the backward search for the possible input values leading to some desired final value, under desired constraints. We set the desired final values of the target goals, and we want to find possible initial assignments to the input goals which would cause the desired final values of the target goals by forward propagation. We may also add desired constraints, and decide to avoid strong/medium/weak conflicts.

The Alpha-Beta Procedure
Alpha-beta pruning is a procedure to reduce the amount of computation and searching during minimax. Minimax is a two-pass search: one pass is used to assign heuristic values to the nodes at the ply depth, and the second is used to propagate the values up the tree.
Alpha-beta search proceeds in a depth-first fashion. An alpha value is an initial or temporary value associated with a MAX node. Because MAX nodes are given the maximum value among their children, an alpha value can never decrease; it can only go up. A beta value is an initial or temporary value associated with a MIN node. Because MIN nodes are given the minimum value among their children, a beta value can never increase; it can only go down.
For example, suppose a MAX node's alpha = 6. Then the search needn't consider any branches emanating from a MIN descendant that has a beta value less than or equal to 6. So if you know that a MAX node has an alpha of 6, and you know that one of its MIN descendants has a beta that is less than or equal to 6, you needn't search any further below that MIN node. This is called alpha pruning. The reason is that no matter what happens below that MIN node, it cannot take on a value greater than 6, so its value cannot be propagated up to its MAX (alpha) parent.
Similarly, if a MIN node's beta value = 6, you needn't search any further below a descendant MAX that has acquired an alpha value of 6 or more. This is called beta pruning. The reason again is that no matter what happens below that MAX node, it cannot take on a value less than 6, so its value cannot be propagated up to its MIN (beta) parent.
Rules for Alpha-beta Pruning
Alpha Pruning: Search can be stopped below any MIN node having a beta value less than or equal to the alpha value of any of its MAX ancestors.
Beta Pruning: Search can be stopped below any MAX node having an alpha value greater than or equal to the beta value of any of its MIN ancestors.
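The alpha-beta rules above can be sketched as a short recursive procedure. This is a minimal illustration, not code from the original notes: the game tree is assumed to be nested Python lists whose numeric leaves are the heuristic values assigned at the ply depth.

```python
def alphabeta(node, alpha, beta, maximizing):
    """Minimax with alpha-beta pruning over a tree of nested lists.
    Numeric leaves are the heuristic values at the ply depth."""
    if isinstance(node, (int, float)):          # leaf: return its heuristic value
        return node
    if maximizing:                              # MAX node: alpha can only go up
        value = float("-inf")
        for child in node:
            value = max(value, alphabeta(child, alpha, beta, False))
            alpha = max(alpha, value)
            if alpha >= beta:                   # stop below a MIN ancestor: beta <= alpha
                break
        return value
    else:                                       # MIN node: beta can only go down
        value = float("inf")
        for child in node:
            value = min(value, alphabeta(child, alpha, beta, True))
            beta = min(beta, value)
            if beta <= alpha:                   # stop below a MAX ancestor: alpha >= beta
                break
        return value

# A 2-ply example: MAX root over three MIN nodes with leaf values.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(alphabeta(tree, float("-inf"), float("inf"), True))  # 3
```

In the second MIN node the search is cut off after seeing the leaf 2, because its beta (2) is already less than or equal to the root's alpha (3).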
Best-first search is a combination of depth-first and breadth-first search. Depth-first is good because a solution can be found without computing all nodes, and breadth-first is good because it does not get trapped in dead ends. Best-first search allows us to switch between paths, thus gaining the benefit of both approaches.
Best First: the Best First algorithm is a simplified form of the A* algorithm. From A* we note that f' = g + h', where g is a measure of the time taken to go from the initial node to the current node and h' is an estimate of the time taken to reach the solution from the current node. Thus f' is an estimate of how long it takes to go from the initial node to the solution. As an aid we take the time to go from one node to the next to be a constant of 1.

Hill climbing
Here the generate-and-test method is augmented by a heuristic function which measures the closeness of the current state to the goal state.
1. Evaluate the initial state; if it is a goal state, quit. Otherwise the current state is the initial state.
2. Select a new operator for this state and generate a new state.
3. Evaluate the new state:
o if it is closer to the goal state than the current state, make it the current state
o if it is no better, ignore it
4. If the current state is a goal state or no new operators are available, quit. Otherwise repeat from 2.
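The hill-climbing loop can be sketched as follows. The integer state space, the neighbour function, and the scoring function here are illustrative assumptions, not part of the original notes; the loop itself follows the steps above (move only to a strictly better successor, stop at a goal or local maximum).

```python
def hill_climb(initial, neighbors, score):
    """Hill climbing: repeatedly move to a better successor state.
    `neighbors(state)` generates successor states; `score(state)` is the
    heuristic (higher means closer to the goal). Stops when no successor
    improves on the current state."""
    current = initial
    while True:
        best = max(neighbors(current), key=score, default=None)
        if best is None or score(best) <= score(current):
            return current          # goal or local maximum reached
        current = best

# Toy example: climb towards x = 7 by +/-1 moves, scoring by -(x - 7)^2.
result = hill_climb(0, lambda x: [x - 1, x + 1], lambda x: -(x - 7) ** 2)
print(result)  # 7
```

Note that on a harder landscape this loop can stop at a local maximum rather than the goal, which is exactly the weakness the heuristic-quality discussion below is about.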
In the case of the four cubes a suitable heuristic is the sum of the number of different colours on each of the four sides, and the goal state is 16: four on each side. The set of rules is simply: choose a cube and rotate it through 90 degrees. The starting arrangement can either be specified or chosen at random.

Best First Search Algorithm:
1. Start with OPEN holding the initial state.
2. Pick the best node on OPEN.
3. Generate its successors.
4. For each successor do:
o If it has not been generated before, evaluate it, add it to OPEN, and record its parent.
o If it has been generated before, change the parent if this new path is better and, in that case, update the cost of getting to any successor nodes.
5. If a goal is found or no more nodes are left in OPEN, quit; else return to 2.

Pruning
Pruning is the process of removing leaves and branches to improve the performance of the decision tree when it moves from the training data (where the classification is known) to real-world applications (where the classification is unknown -- it is what you are trying to predict). The tree-building algorithm makes the best split at the root node, where there is the largest number of records and, hence, a lot of information. Each subsequent split has a smaller and less representative population with which to work. Towards the end, idiosyncrasies of training records at a particular node display patterns that are peculiar only to those records. These patterns can become meaningless and sometimes harmful for prediction if you try to extend rules based on them to larger populations.
For example, say the classification tree is trying to predict height and it comes to a node containing one tall person named X and several other shorter people. It can decrease diversity at that node by a new rule saying "people named X are tall" and thus classify the training data. In a wider universe this rule can become less than useless. (Note that, in practice, we do not include irrelevant fields like "name"; this is just an illustration.) Pruning methods solve this problem -- they let the tree grow to maximum size, then remove smaller branches that fail to generalize.

Monotonicity
A logical system is described as monotonic if a valid proof in the system cannot be made invalid by adding additional premises or assumptions. In other words, if we find that we can prove a conclusion C by applying rules of deduction to a premise B with assumptions A, then adding additional assumptions A′ and B′ will not stop us from being able to deduce C. Both propositional logic and FOPL are monotonic. Elsewhere in this book, we learn about probability theory, which is not a monotonic system. Monotonicity of a logical system can be expressed as follows:
If we can prove {A, B} ⊢ C, then we can also prove {A, B, A′, B′} ⊢ C.
Note that A′ and B′ can be anything, including ¬A and ¬B. In other words, even adding contradictory assumptions does not stop us from making the proof in a monotonic system. In fact, it turns out that adding contradictory assumptions allows us to prove anything, including invalid conclusions. This makes sense if we recall the line in the truth table for →, which shows that false → true is true. By adding a contradictory assumption, we make our assumptions false and can thus prove any conclusion.

Neural Networks
Artificial neural networks are among the most powerful learning models. They have the versatility to approximate a wide range of complex functions representing multi-dimensional input-output maps. Neural networks also have inherent adaptability and can perform robustly even in noisy environments.
An Artificial Neural Network (ANN) is an information-processing paradigm inspired by the way biological nervous systems, such as the brain, process information. The key element of this paradigm is the novel structure of the information-processing system. It is composed of a large number of highly interconnected simple processing elements (neurons) working in unison to solve specific problems. ANNs, like people, learn by example. An ANN is configured for a specific application, such as pattern recognition or data classification, through a learning process. Learning in biological systems involves adjustments to the synaptic connections that exist between the neurons. This is true of ANNs as well. ANNs can process information at great speed owing to their massive parallelism.

Bayes Theorem
Let X be the data record (case) whose class label is unknown. Let H be some hypothesis, such as "data record X belongs to a specified class C." For classification, we want to determine P(H|X) -- the probability that the hypothesis H holds, given the observed data record X.
P(H|X) is the posterior probability of H conditioned on X: for example, the probability that a fruit is an apple, given that it is red and round. In contrast, P(H) is the prior probability, or a priori probability, of H. In this example P(H) is the probability that any given data record is an apple, regardless of how the data record looks. The posterior probability, P(H|X), is based on more information (such as background knowledge) than the prior probability, P(H), which is independent of X.
Similarly, P(X|H) is the posterior probability of X conditioned on H: that is, the probability that X is red and round given that we know X is an apple. P(X) is the prior probability of X, i.e., the probability that a data record from our set of fruits is red and round. Bayes theorem is useful in that it provides a way of calculating the posterior probability, P(H|X), from P(H), P(X), and P(X|H). Bayes theorem is
P(H|X) = P(X|H) P(H) / P(X)
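The Bayes rule calculation is a one-liner; the numbers below for the fruit example are made up purely for illustration (20% of records are apples, 90% of apples are red and round, 30% of all records are red and round).

```python
def bayes(p_x_given_h, p_h, p_x):
    """Posterior P(H|X) via Bayes theorem: P(H|X) = P(X|H) * P(H) / P(X)."""
    return p_x_given_h * p_h / p_x

# Hypothetical fruit data: P(X|H)=0.9, P(H)=0.2, P(X)=0.3.
posterior = bayes(p_x_given_h=0.9, p_h=0.2, p_x=0.3)
print(round(posterior, 3))  # 0.6
```

So under these assumed numbers, seeing a red, round record raises the probability of "apple" from the prior 0.2 to the posterior 0.6.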
Skolem function
A function that is used to replace an existentially quantified variable that comes after a universal quantification when skolemizing an expression. For example, ∀x ∃y (x ∧ y)→b would be skolemized as ∀x (x ∧ f(x))→b, where f(x) is the Skolem function.

Predicate Logic
Predicate logic involves using standard forms of logical symbolism which have been familiar to philosophers and mathematicians for many decades. Most simple sentences, for example, "Peter is generous" or "Jane gives a painting to Sam," can be represented in terms of logical formulae in which a predicate is applied to one or more arguments.

State space search
• Formulate a problem as a state space search by showing the legal problem states, the legal operators, and the initial and goal states.
• A state is defined by the specification of the values of all attributes of interest in the world.
• An operator changes one state into another; it has a precondition, which is the value of certain attributes prior to the application of the operator, and a set of effects, which are the attributes altered by the operator.
• The initial state is where you start.
• The goal state is the partial description of the solution.

State Space Search Notations
Let us begin by introducing certain terms.
An initial state is the description of the starting configuration of the agent.
An action or an operator takes the agent from one state to another state, which is called a successor state. A state can have a number of successor states.
A plan is a sequence of actions. The cost of a plan is referred to as the path cost. The path cost is a positive number, and a common path cost may be the sum of the costs of the steps in the path.
Now let us look at the concept of a search problem.
Problem formulation means choosing a relevant set of states to consider, and a feasible set of operators for moving from one state to another.
Search is the process of considering various possible sequences of operators applied to the initial state, and finding a sequence which culminates in a goal state.

2.2.3 Search Problem
We are now ready to formally describe a search problem. A search problem consists of the following:
• S: the full set of states
• s0: the initial state
• A: S → S, a set of operators
• G: the set of final states. Note that G ⊆ S.
The search problem is to find a sequence of actions which transforms the agent from the initial state to a goal state g ∈ G. A search problem is represented by a 4-tuple {S, s0, A, G}:
S: set of states
s0 ∈ S: initial state
A: S → S, operators/actions that transform one state to another state
G: goal, a set of states, G ⊆ S
This sequence of actions is called a solution plan. It is a path from the initial state to a goal state. A plan P is a sequence of actions P = {a0, a1, …, aN} which leads to traversing a number of states {s0, s1, …, sN+1 ∈ G}. A sequence of states is called a path. The cost of a path is a positive number. In many cases the path cost is computed by taking the sum of the costs of each action.

Representation of search problems
A search problem is represented using a directed graph.
• The states are represented as nodes.
• The allowed actions are represented as arcs.
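The directed-graph representation can be made concrete as follows. This is an illustrative sketch, not part of the original notes: the graph, the state names A–E, and the goal set are invented, and breadth-first search (from the earlier section) is used to recover a solution plan as a path of states.

```python
from collections import deque

def solve(graph, start, goals):
    """Search problem as a directed graph: states are nodes, actions are
    arcs. Returns a solution plan (a path of states from the initial state
    to a goal state) found by breadth-first search, or None."""
    frontier = deque([[start]])     # queue of partial paths
    visited = {start}
    while frontier:
        path = frontier.popleft()
        state = path[-1]
        if state in goals:
            return path             # reached a goal state g in G
        for succ in graph.get(state, []):
            if succ not in visited:
                visited.add(succ)
                frontier.append(path + [succ])
    return None                     # no goal state reachable

# Hypothetical state space: S = {A..E}, initial state A, goal set G = {E}.
graph = {"A": ["B", "C"], "B": ["D"], "C": ["D", "E"], "D": ["E"]}
print(solve(graph, "A", {"E"}))  # ['A', 'C', 'E']
```

With unit action costs, the breadth-first plan is also a minimum-cost path, matching the "sum of step costs" definition of path cost above.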
Inheritable Knowledge
− is obtained from associated objects.
− it prescribes a structure in which new objects are created which may inherit all or a subset of attributes from existing objects.
◊ Inferential Knowledge
− is inferred from objects through relations among objects.
− e.g., a word alone is simple syntax, but with the help of other words in a phrase the reader may infer more from the word; this inference within linguistics is called semantics.

Procedural Knowledge vs Declarative Knowledge
Procedural Knowledge | Declarative Knowledge
• Hard to debug | • Easy to validate
• Black box | • White box
• Obscure | • Explicit
• Process oriented | • Data oriented
• Extension may affect stability | • Extension is easy
• Fast, direct execution | • Slow (requires interpretation)
• Simple data types can be used | • May require high-level data types
• Representations in the form of sets of rules, organized into routines and subroutines | • Representations in the form of a production system, the entire set of rules for executing the task

Resolution
Resolution is a procedure used in proving that arguments which are expressible in predicate logic are correct. It is a procedure that produces proofs by refutation or contradiction, and it leads to a refutation theorem-proving technique for sentences in propositional logic and first-order logic.
− Resolution is a rule of inference.
− Resolution is the basis of computerized theorem provers.
− Resolution is so far only defined here for propositional logic. The strategy is that the resolution techniques of propositional logic be adopted in predicate logic.
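A minimal sketch of propositional resolution by refutation follows; it is illustrative, not code from the notes. Clauses are assumed to be sets of literal strings, with "~p" for the negation of "p", and the example refutes {p → q, p, ¬q} to prove q.

```python
def resolve(ci, cj):
    """Return all resolvents of two clauses (sets of literals; '~p' negates p)."""
    resolvents = []
    for lit in ci:
        comp = lit[1:] if lit.startswith("~") else "~" + lit
        if comp in cj:                       # complementary pair found
            resolvents.append((ci - {lit}) | (cj - {comp}))
    return resolvents

def refutes(clauses):
    """Proof by refutation: saturate the clause set under resolution.
    Deriving the empty clause means the set is unsatisfiable, so the
    negated goal is refuted and the original argument is correct."""
    clauses = set(map(frozenset, clauses))
    while True:
        new = set()
        for ci in clauses:
            for cj in clauses:
                if ci == cj:
                    continue
                for r in resolve(ci, cj):
                    if not r:
                        return True          # empty clause: contradiction
                    new.add(frozenset(r))
        if new <= clauses:
            return False                     # nothing new: satisfiable
        clauses |= new

# Prove q from {p -> q, p}: refute the clause set {~p | q, p, ~q}.
print(refutes([{"~p", "q"}, {"p"}, {"~q"}]))  # True
```

Resolving {~p, q} with {~q} yields {~p}, which resolves with {p} to the empty clause, so the negated conclusion is contradictory and q follows.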