Unit 2 Informed Search and Exploration
Let us see how this works for route-finding problems in Romania, using the
straight-line distance heuristic, which we will call hSLD. If the goal is Bucharest,
we will need to know the straight-line distances to Bucharest, which are shown in
the figure below. For example, hSLD(In(Arad)) = 366.
Figure: values of hSLD, the straight-line distances to Bucharest.
Arad        366    Mehadia          241
Bucharest     0    Neamt            234
Craiova     160    Oradea           380
Drobeta     242    Pitesti          100
Eforie      161    Rimnicu Vilcea   193
Fagaras     176    Sibiu            253
Giurgiu      77    Timisoara        329
Hirsova     151    Urziceni          80
Iasi        226    Vaslui           199
Lugoj       244    Zerind           374
[Figure: stages in a greedy best-first tree search for Bucharest using hSLD; nodes are labeled with their hSLD values, e.g. Sibiu 253 and Bucharest 0.]
The figure above shows the progress of a greedy best-first search using hSLD to find a path
from Arad to Bucharest. The first node to be expanded from Arad will be Sibiu,
because it is closer to Bucharest than either Zerind or Timisoara. The next node to
be expanded will be Fagaras, because it is closest. Fagaras in turn generates
Bucharest, which is the goal. The solution is not optimal, however: the path via Sibiu and
Fagaras to Bucharest is 32 kilometers longer than the path through Rimnicu Vilcea
and Pitesti. This shows why the algorithm is called "greedy": at each step it tries to
get as close to the goal as it can.
[Figure: stages in an A* search for Bucharest; nodes are labeled with f = g + h, e.g. Sibiu 393 = 140 + 253 and Timisoara 447 = 118 + 329.]
From this example, we can extract a general proof that A* using TREE-SEARCH
is optimal if h(n) is admissible. Suppose a suboptimal goal node G2 appears on the
fringe, and let the cost of the optimal solution be C*.
Then, because G2 is suboptimal and because h(G2) = 0 (true for any goal node), we
know

f(G2) = g(G2) + h(G2) = g(G2) > C*.

Now consider a fringe node n that is on an optimal solution path, for example,
Pitesti in the example of the preceding paragraph. (There must always be such a
node if a solution exists.)
If h(n) does not overestimate the cost of completing the solution path, then we
know that

f(n) = g(n) + h(n) <= C*.

Now we have shown that f(n) <= C* < f(G2), so G2 will not be expanded and A*
must return an optimal solution.
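To make the algorithm concrete, here is a minimal A* sketch in Python. It assumes a successors(state) function yielding (next_state, step_cost) pairs and an admissible heuristic h(state); these interfaces are illustrative choices, not part of any particular library.

```python
import heapq
import itertools

def a_star(start, goal_test, successors, h):
    """Minimal A* (graph search): expand nodes in order of f = g + h."""
    counter = itertools.count()      # tie-breaker so the heap never compares states
    frontier = [(h(start), next(counter), 0, start, [start])]
    best_g = {start: 0}              # cheapest known cost to reach each state
    while frontier:
        f, _, g, state, path = heapq.heappop(frontier)
        if goal_test(state):
            return path, g           # first goal popped; optimal for admissible h
        for nxt, cost in successors(state):
            g2 = g + cost
            if g2 < best_g.get(nxt, float("inf")):
                best_g[nxt] = g2
                heapq.heappush(frontier,
                               (g2 + h(nxt), next(counter), g2, nxt, path + [nxt]))
    return None, float("inf")
```

On the Romania problem with hSLD, such a search returns the optimal path through Rimnicu Vilcea and Pitesti rather than the greedy route via Fagaras.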
A heuristic h(n) is consistent if, for every node n and every successor n' of
n generated by any action a, the estimated cost of reaching the goal from n is no
greater than the step cost of getting to n' plus the estimated cost of reaching the
goal from n':

h(n) <= c(n, a, n') + h(n').

This is a form of the general triangle inequality, which stipulates that each side of
a triangle cannot be longer than the sum of the other two sides. Here, the triangle
is formed by n, n', and the goal closest to n. It is fairly easy to show that every
consistent heuristic is also admissible. The most important consequence of
consistency is the following: A* using GRAPH-SEARCH is optimal if h(n) is
consistent.
Another important consequence of consistency is the following: if h(n) is
consistent, then the values of f(n) along any path are nondecreasing. The proof
follows directly from the definition of consistency. Suppose n' is a successor of n;
then g(n') = g(n) + c(n, a, n') for some action a, and we have

f(n') = g(n') + h(n') = g(n) + c(n, a, n') + h(n') >= g(n) + h(n) = f(n).

It follows that the sequence of nodes expanded by A* using GRAPH-SEARCH is
in nondecreasing order of f(n). Hence, the first goal node selected for expansion
must be an optimal solution, since all later nodes will be at least as expensive.
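The consistency condition and the zero-at-goal property are easy to verify mechanically when the graph is small enough to enumerate. The sketch below uses an illustrative (n, n_prime, cost) edge format; both the function name and the format are assumptions for this example.

```python
def is_consistent(h, edges, goals):
    """Check h(n) <= c(n, a, n') + h(n') on every edge and h(goal) = 0.

    edges: iterable of (n, n_prime, cost) triples (illustrative format).
    A heuristic that passes makes f = g + h nondecreasing along any path.
    """
    if any(h(g) != 0 for g in goals):
        return False
    return all(h(n) <= cost + h(np) for n, np, cost in edges)
```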
The fact that f-costs are nondecreasing along any path also means that we
can draw contours in the state space, just like the contours in a topographic map.
Inside a contour labeled 400, for example, all nodes have f(n) less
than or equal to 400, and so on. Then, because A* expands the fringe node of
lowest f-cost, we can see that an A* search fans out from the start node, adding
nodes in concentric bands of increasing f-cost.
With uniform-cost search (A* search using h(n) = 0), the bands will be "circular"
around the start state. With more accurate heuristics, the bands will stretch toward
the goal state and become more narrowly focused around the optimal path. If C* is
the cost of the optimal solution path, then we can say the following:
A* expands all nodes with f(n) < C*.
A* might then expand some of the nodes right on the "goal contour"
(where f(n) = C*) before selecting a goal node.
Notice that A* expands no nodes with f(n) > C*; for example, Timisoara is not
expanded even though it is a child of the root. We say that the subtree
below Timisoara is pruned; because hSLD is admissible, the algorithm can safely
ignore this subtree while still guaranteeing optimality. The concept of pruning,
eliminating possibilities from consideration without having to examine them, is
important for many areas of AI.
Computation time is not, however, A*'s main drawback. Because it keeps all
generated nodes in memory (as do all GRAPH-SEARCH algorithms), A*
usually runs out of space long before it runs out of time. For this reason, A* is
not practical for many large-scale problems. Recently developed algorithms have
overcome the space problem without sacrificing optimality or completeness, at a
small cost in execution time. These are discussed next.
IDA* and RBFS suffer from using too little memory. Between iterations, IDA*
retains only a single number: the current f-cost limit. RBFS retains more
information in memory, but it uses only linear space: even if more memory were
available, RBFS has no way to make use of it.
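For reference, here is a compact IDA* sketch: a depth-first search bounded by the current f-cost limit, where the only information carried between iterations is the next limit. The successors(state) and h(state) interfaces are the same illustrative assumptions as in the A* sketch above.

```python
def ida_star(start, goal_test, successors, h):
    """IDA*: iterative deepening on f = g + h; memory use is linear in depth."""
    def dfs(path, g, bound):
        node = path[-1]
        f = g + h(node)
        if f > bound:
            return f, None                   # report the f-value that broke the bound
        if goal_test(node):
            return f, list(path)
        smallest = float("inf")              # smallest f seen above the bound
        for nxt, cost in successors(node):
            if nxt in path:                  # avoid cycles on the current path
                continue
            path.append(nxt)
            t, solution = dfs(path, g + cost, bound)
            path.pop()
            if solution is not None:
                return t, solution
            smallest = min(smallest, t)
        return smallest, None

    bound = h(start)                         # between iterations we keep only this number
    while True:
        t, solution = dfs([start], 0, bound)
        if solution is not None:
            return solution
        if t == float("inf"):
            return None                      # search space exhausted, no solution
        bound = t
```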
It seems sensible, therefore, to use all available memory. Two algorithms that do
this are MA* (memory-bounded A*) and SMA* (simplified MA*).
SMA* proceeds just like A*, expanding the best leaf until memory is full.
At this point, it cannot add a new node to the search tree without dropping an old
one. SMA* always drops the worst leaf node, the one with the highest f-value. Like
RBFS, SMA* then backs up the value of the forgotten node to its parent. In this
way, the ancestor of a forgotten subtree knows the quality of the best path in that
subtree. With this information, SMA* regenerates the subtree only when all
other paths have been shown to look worse than the path it has forgotten.
What if all the leaf nodes have the same f-value? Then the algorithm might select
the same node for deletion and expansion. SMA* solves this problem by
expanding the newest best leaf and deleting the oldest worst leaf. These can be the
same node only if there is only one leaf; in that case, the current search tree must
be a single path from root to leaf that fills all of memory. If the leaf is not a goal
node, then even if it is on an optimal solution path, that solution is not reachable
with the available memory. Therefore, the node can be discarded exactly as if
it had no successors.
SMA* is complete if there is any reachable solution, that is, if d, the depth of the
shallowest goal node, is less than the memory size (expressed in nodes). It is
optimal if any optimal solution is reachable; otherwise it returns the best reachable
solution. In practical terms, SMA* might well be the best general-purpose
algorithm for finding optimal solutions, particularly when the state space is a graph,
step costs are not uniform, and node generation is expensive compared to the
additional overhead of maintaining the open and closed lists.
On very hard problems, however, it will often be the case that SMA* is forced to
switch back and forth continually between a set of candidate solution paths, only a
small subset of which can fit in memory. (This resembles the problem of
thrashing in disk paging systems.)
HEURISTIC FUNCTIONS
In this section, we will look at heuristics for the 8-puzzle, in order to shed light on
the nature of heuristics in general.
[Figure: a typical instance of the 8-puzzle, showing the start and goal states.]
If we want to find the shortest solutions by using A*, we need a heuristic function
that never overestimates the number of steps to the goal. There is a long history of
such heuristics for the 15-puzzle; here are two commonly used candidates:

h1 = the number of misplaced tiles. In the figure, all eight tiles are
out of position, so the start state has h1 = 8. h1 is an admissible
heuristic, because any tile that is out of place must be moved
at least once.

h2 = the sum of the distances of the tiles from their goal positions. Because
tiles cannot move along diagonals, the distance we count is the sum of
the horizontal and vertical distances. This is sometimes called the city-block
distance or Manhattan distance. h2 is also admissible, because all any
move can do is move one tile one step closer to the goal. Tiles 1 to 8 in the
start state give a Manhattan distance of

h2 = 3 + 1 + 2 + 2 + 2 + 3 + 3 + 2 = 18.
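Both heuristics are a few lines of code. The sketch below assumes states are 9-tuples listing the tiles in row-major order, with 0 for the blank; that representation is one convenient choice, not the only one.

```python
def misplaced_tiles(state, goal):
    """h1: count tiles (not the blank) that are not on their goal square."""
    return sum(1 for s, g in zip(state, goal) if s != 0 and s != g)

def manhattan(state, goal):
    """h2: sum of horizontal plus vertical distances of each tile to its goal."""
    where = {tile: divmod(i, 3) for i, tile in enumerate(goal)}
    total = 0
    for i, tile in enumerate(state):
        if tile != 0:
            r, c = divmod(i, 3)
            gr, gc = where[tile]
            total += abs(r - gr) + abs(c - gc)
    return total
```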
One might ask whether h2 is always better than h1. The answer is yes.
Proof: every node with f(n) < C* will surely be expanded. This is the same as
saying that every node with h(n) < C* - g(n) will surely be expanded. But because
h2 is at least as big as h1 for every node, every node that is surely expanded by A*
search with h2 will also surely be expanded with h1, and h1 might cause other
nodes to be expanded as well. Hence, it is always better to use a heuristic
function with higher values, provided it does not overestimate and the
computation time for the heuristic is not too large.
We have seen that both h1 (misplaced tiles) and h2 (Manhattan distance) are fairly
good heuristics for the 8-puzzle and that h2 is better. Is it possible for a computer
to invent such a heuristic mechanically?
1. Relaxed problem
A problem with fewer restrictions on the actions is called a relaxed problem. The
cost of an optimal solution to a relaxed problem is an admissible heuristic for the
original problem. The heuristic is admissible because the optimal solution in the
original problem is, by definition, also a solution in the relaxed problem and
therefore must be at least as expensive as the optimal solution in the relaxed
problem.
For example, if the 8-puzzle actions are described as

A tile can move from square A to square B if
A is horizontally or vertically adjacent to B and B is blank,
we can generate three relaxed problems by removing one or both of the conditions:
(a) A tile can move from square A to square B if A is adjacent to B.
(b) A tile can move from square A to square B if B is blank.
(c) A tile can move from square A to square B.
From (a), we can derive h2 (Manhattan distance). The reasoning is that h2 would
be the proper score if we moved each tile in turn to its destination. The heuristic
derived from (b) is discussed in later chapters.
From (c), we can derive h1 (misplaced tiles), because it would be
the proper score if tiles could move to their intended destinations in one step.
If the relaxed problem is hard to solve, then the values of the corresponding
heuristic will be expensive to obtain. If a collection of admissible heuristics
h1, ..., hm is available for a problem, and none of them dominates any of the others,
which should we choose? As it turns out, we need not make a choice. We can have
the best of all worlds, by defining

h(n) = max{h1(n), ..., hm(n)}.

This composite heuristic uses whichever function is most accurate on the node in
question.
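In code, the composite heuristic is a one-liner; the sketch below assumes each component heuristic takes a state and returns a nonnegative estimate.

```python
def composite(*heuristics):
    """max of admissible heuristics: still admissible, dominates each component."""
    return lambda state: max(h(state) for h in heuristics)

# For example, combining the two 8-puzzle heuristics defined earlier:
# h = composite(lambda s: misplaced_tiles(s, goal),
#               lambda s: manhattan(s, goal))
```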
2) Subproblem
Admissible heuristics can also be derived from the solution cost of a subproblem
of a given problem. For example, consider the subproblem of an 8-puzzle instance
that involves getting tiles 1, 2, 3, and 4 into their correct positions. Clearly, the cost
of the optimal solution of this subproblem is a lower bound on the cost of the
complete problem. It turns out to be substantially more accurate than Manhattan
distance in some cases.
2.a) Pattern databases
The idea behind pattern databases is to store these exact solution costs for every
possible subproblem instance-in our example, every possible configuration of the
four tiles and the blank. (Notice that the locations of the other four tiles are
irrelevant for the purposes of solving the subproblem, but moves of those tiles do
count towards the cost.) Then, we compute an admissible heuristic hDB for each
complete state encountered during a search simply by looking up the
corresponding subproblem configuration in the database. The database
itself is constructed by searching backwards from the goal state and recording the
cost of each new pattern encountered; the expense of this search is amortized over
many subsequent problem instances.
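Here is one possible sketch of the construction for the tiles-1-to-4 pattern of the 8-puzzle: a backward breadth-first search from the goal in "pattern space", where tiles outside the pattern are indistinguishable (written as -1) but moving them still costs one step, exactly as described above. The representation (9-tuples, blank = 0) matches the earlier sketches and is only an illustrative choice.

```python
from collections import deque

def build_pattern_db(goal, pattern_tiles):
    """Map each pattern (positions of pattern tiles and blank) to its exact cost."""
    def abstract(state):
        return tuple(t if t in pattern_tiles or t == 0 else -1 for t in state)

    start = abstract(goal)
    db = {start: 0}
    frontier = deque([start])
    while frontier:
        state = frontier.popleft()
        b = state.index(0)                       # blank position
        r, c = divmod(b, 3)
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < 3 and 0 <= nc < 3:
                j = nr * 3 + nc
                s = list(state)
                s[b], s[j] = s[j], s[b]          # slide a neighboring tile into the blank
                nxt = tuple(s)
                if nxt not in db:                # BFS: first visit is the cheapest
                    db[nxt] = db[state] + 1
                    frontier.append(nxt)
    return db

# hDB(state) is then a dictionary lookup, e.g.:
# db = build_pattern_db(goal, {1, 2, 3, 4})
# h = db[tuple(t if t in {1, 2, 3, 4} or t == 0 else -1 for t in state)]
```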
"Experience" here means solving lots of 8-puzzles, for instance. Each optimal
solution to an 8-puzzle problem provides examples from which h(n) can be
learned. Each example consists of a state from the solution
path and the actual cost of the solution from that point. From these examples, an
inductive learning algorithm can be used to construct a function h(n) that can
(with luck) predict solution costs for other states that arise during search.
Inductive learning methods work best when supplied with features of a state that
are relevant to its evaluation, rather than with just the raw state description. For
example, the feature "number of misplaced tiles" might be helpful in predicting the
actual distance of a state from the goal.
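As a toy illustration of the inductive approach, the sketch below fits a linear combination of hand-chosen features to (state, remaining-cost) examples by gradient descent. The feature functions, the learning rate, and the linear model itself are all assumptions made for illustration; note also that a learned h is not guaranteed to be admissible.

```python
def fit_linear_h(examples, features, lr=0.01, epochs=500):
    """Least-squares fit of h(n) ~ w . x(n), where x(n) = feature values of n.

    examples: list of (state, true_remaining_cost) pairs from solved instances.
    Returns a heuristic function; admissibility is NOT guaranteed.
    """
    w = [0.0] * len(features)
    for _ in range(epochs):
        for state, y in examples:
            x = [f(state) for f in features]
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return lambda state: max(0.0, sum(wi * f(state) for wi, f in zip(w, features)))

# e.g. using the two features mentioned in the text:
# h = fit_linear_h(examples, [lambda s: misplaced_tiles(s, goal),
#                             lambda s: manhattan(s, goal)])
```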
LOCAL SEARCH ALGORITHMS AND OPTIMIZATION PROBLEMS
In addition to finding goals, local search algorithms are useful for solving pure
optimization problems, in which the aim is to find the best state according to an
objective function.
To understand local search, we will find it very useful to consider the state-space
landscape. A landscape has both "location" (defined by the state) and
"elevation" (defined by the value of the heuristic cost function or objective
function). If elevation corresponds to cost, then the aim is to find the lowest valley
(a global minimum); if elevation corresponds to an objective function, then the aim
is to find the highest peak (a global maximum). A complete local search algorithm
always finds a goal if one exists; an optimal algorithm always finds a global
minimum/maximum.
[Figure: a one-dimensional state-space landscape; elevation is the objective function, and the aim is to find the global maximum.]
Hill-climbing search
It is simply a loop that continually moves in the direction of increasing value, that
is, uphill. It terminates when it reaches a "peak" where no neighbor has a higher
value. The algorithm does not maintain a search tree, so the current node data
structure need only record the state and its objective function value.
Ridges: ridges result in a sequence of local maxima that is very difficult for
greedy algorithms to navigate.
Plateaux: a plateau is an area of the state-space landscape where the evaluation
function is flat. It can be a flat local maximum, from which no uphill exit exists, or
a shoulder, from which it is possible to make progress.
If we always allow sideways moves when there are no uphill moves, an infinite
loop will occur whenever the algorithm reaches a flat local maximum that is not a
shoulder. One common solution is to put a limit on the number of consecutive
sideways moves allowed, as in the sketch below.
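A minimal sketch of hill climbing with a sideways-move cap; neighbors(state) and value(state) are problem-supplied functions, and the cap of 100 is an arbitrary illustrative choice.

```python
def hill_climb(state, neighbors, value, max_sideways=100):
    """Steepest-ascent hill climbing, allowing a bounded run of sideways moves."""
    sideways = 0
    while True:
        candidates = list(neighbors(state))
        if not candidates:
            return state
        best = max(candidates, key=value)
        if value(best) < value(state):
            return state                     # peak: every neighbor is lower
        if value(best) == value(state):
            sideways += 1
            if sideways > max_sideways:
                return state                 # stuck on a flat local maximum
        else:
            sideways = 0                     # real uphill progress resets the count
        state = best
```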
Genetic algorithms
Like beam search, GAs begin with a set of k randomly generated states, called the
population. Each state, or individual, is represented as a string over a finite
alphabet. The production of the next generation of states is shown in the figure
below. In (b), each state is rated by the evaluation function, or fitness function.
[Figure: the genetic algorithm: (a) initial population, (b) fitness function, (c) selection, (d) crossover, (e) mutation.]
A fitness function should return higher values for better states, so for the 8-queens
problem we use the number of nonattacking pairs of queens, which has a value of
28 for a solution. The values of the four states are 24, 23, 20, and 11.
In (c), a random choice of two pairs is selected for reproduction, in accordance
with the probabilities in (b). Notice that one individual is selected twice and one
not at all. Then a crossover point is randomly chosen from the positions in the
string. In the figure, the crossover points are after the third digit in the first pair and
after the fifth digit in the second pair.
In (d), the offspring themselves are created by crossing over the parent strings at
the crossover point. For example, the first child of the first pair gets the first three
digits from the first parent and the remaining digits from the second parent,
whereas the second child gets the first three digits from the second parent and the
rest from the first parent.
Finally, in (e), each location is subject to random mutation with a small
independent probability. One digit was mutated in the first, third, and fourth
offspring.
The theory of genetic algorithms explains how this works using the idea of a
schema, which is a substring in which some of the positions can be left unspecified.
For example, the schema 246***** describes all 8-queens states in which the first
three queens are in positions 2, 4, and 6, respectively. Strings that match the schema
(such as 24613578) are called instances of the schema. It can be shown that, if the
average fitness of the instances of a schema is above the mean, then the number of
instances of the schema within the population will grow over time.
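The whole cycle (fitness-proportionate selection, single-point crossover, per-position mutation) fits in a short sketch. Individuals are 8-character strings giving the row of the queen in each column; the population size, mutation rate, and generation limit are illustrative choices.

```python
import random

def nonattacking_pairs(ind):
    """Fitness for 8-queens: 28 minus the number of attacking pairs."""
    rows = [int(ch) for ch in ind]
    n = len(rows)
    attacks = sum(1 for i in range(n) for j in range(i + 1, n)
                  if rows[i] == rows[j] or abs(rows[i] - rows[j]) == j - i)
    return n * (n - 1) // 2 - attacks

def genetic_algorithm(population, fitness, p_mutate=0.05, generations=1000):
    alphabet = "12345678"
    for _ in range(generations):
        weights = [fitness(ind) + 1 for ind in population]   # +1 avoids all-zero weights
        nxt = []
        for _ in range(len(population)):
            x, y = random.choices(population, weights=weights, k=2)  # selection
            c = random.randrange(1, len(x))                          # crossover point
            child = x[:c] + y[c:]                                    # crossover
            child = "".join(ch if random.random() > p_mutate
                            else random.choice(alphabet) for ch in child)  # mutation
            nxt.append(child)
        population = nxt
        best = max(population, key=fitness)
        if fitness(best) == 28:              # all 28 pairs nonattacking: a solution
            return best
    return max(population, key=fitness)
```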
ONLINE SEARCH AGENTS AND UNKNOWN ENVIRONMENTS
Offline search algorithms compute a complete solution before setting foot in the
real world and then execute the solution without recourse to their percepts. In
contrast, an online search agent operates by interleaving computation and action:
first it takes an action, then it observes the environment and computes the next
action.
Online search is a necessary idea for an exploration problem, where the
states and actions are unknown to the agent. An agent in this state of ignorance
must use its actions as experiments to determine what to do next, and hence must
interleave computation and action. Consider a newborn baby: it has many possible
actions, but knows the outcomes of none of them, and it has experienced only a
few of the possible states that it can reach. The baby's gradual discovery of how the
world works is, in part, an online search process.
Online search problems
An online search problem can be solved only by an agent executing actions, rather
than by a purely computational process. We will assume that the agent knows just
the following:
1. ACTIONS(s), which returns a list of actions allowed in state s;
2. the step-cost function c(s, a, s'); note that this cannot be used until the
agent knows that s' is the outcome;
3. GOAL-TEST(s).
Typically, the agent's objective is to reach a goal state while minimizing cost. The
cost is the total path cost of the path that the agent actually travels. It is common to
compare this cost with the path cost of the path the agent would follow if it knew
the search space in advance, that is, the actual shortest path. In the language of
online algorithms, this is called the competitive ratio; we would like it to be as
small as possible.
No algorithm can avoid dead ends in all state spaces. Consider the two dead-end
state spaces in the figure. To an online search algorithm that has visited states S and
A, the two state spaces look identical, so it must make the same decision in both.
Therefore, it will fail in one of them. This is an example of an adversary
argument: we can imagine an adversary that constructs the state space while the
agent explores it and can put the goals and dead ends wherever it likes.
To make progress, we will simply assume that the state space is safely explorable,
that is, some goal state is reachable from every reachable state. State spaces with
reversible actions, such as mazes and 8-puzzles, can be viewed as undirected
graphs and are clearly safely explorable.
[Figure: two state spaces that might lead an online search agent into a dead end; S is the start state and G a goal.]
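A sketch in the spirit of an online depth-first agent for a safely explorable space with reversible actions: step(a) actually executes action a in the world and returns the observed state, and reverse(a) is the action that undoes a. All of these interfaces are assumptions made for illustration.

```python
def online_dfs(start, actions, reverse, step, goal_test):
    """Online DFS with physical backtracking; assumes deterministic,
    reversible actions and a safely explorable state space."""
    untried = {start: list(actions(start))}  # actions not yet tried in each known state
    back = []                                # actions that retrace the path to the root
    s = start
    while not goal_test(s):
        if untried[s]:
            a = untried[s].pop()
            s2 = step(a)                     # act first; only then observe the outcome
            if s2 not in untried:
                untried[s2] = list(actions(s2))
                back.append(reverse(a))      # remember how to walk back later
                s = s2
            else:
                s = step(reverse(a))         # already explored: physically undo the move
        elif back:
            s = step(back.pop())             # dead end here: backtrack one step
        else:
            return s                         # explored everything reachable, no goal
    return s
```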