
UNIT 2

INFORMED SEARCH AND EXPLORATION

Informed Search Strategies


This section shows how an informed search strategy, one that uses problem-specific knowledge beyond the definition of the problem itself, can find solutions more efficiently than an uninformed strategy. The general approach we will consider is called best-first search. Best-first search is an instance of the general TREE-SEARCH or GRAPH-SEARCH algorithm in which a node is selected for expansion based on an evaluation function, f(n). Traditionally, the node with the lowest evaluation is selected for expansion, because the evaluation measures distance to the goal.
All we can do is choose the node that appears to be best according to the evaluation function.
There is a whole family of BEST-FIRST-SEARCH algorithms with different
evaluation functions. A key component of these algorithms is a heuristic function
denoted h(n):
h(n) = estimated cost of the cheapest path from node n to a goal node.
For example, in Romania, one might estimate the cost of the cheapest path from
Arad to Bucharest via the straight-line distance from Arad to Bucharest.
Heuristic functions are the most common form in which additional knowledge of the problem is imparted to the search algorithm.

Greedy best-first search

Greedy best-first search tries to expand the node that is closest to the goal, on the grounds that this is likely to lead to a solution quickly. Thus, it evaluates nodes by using just the heuristic function: f(n) = h(n).

Let us see how this works for route-finding problems in Romania, using the straight-line distance heuristic, which we will call hSLD. If the goal is Bucharest, we will need to know the straight-line distances to Bucharest, which are shown in the figure below. For example, hSLD(In(Arad)) = 366.

Figure: Straight-line distances to Bucharest
Arad 366            Mehadia 241
Bucharest 0         Neamt 234
Craiova 160         Oradea 380
Drobeta 242         Pitesti 100
Eforie 161          Rimnicu Vilcea 193
Fagaras 176         Sibiu 253
Giurgiu 77          Timisoara 329
Hirsova 151         Urziceni 80
Iasi 226            Vaslui 199
Lugoj 244           Zerind 374

Figure: Stages in a greedy best-first search for Bucharest using hSLD; nodes are labelled with their h-values.
(a) The initial state: Arad. Expanding Arad yields Sibiu 253, Timisoara 329, and Zerind 374.
(c) After expanding Sibiu and then Fagaras: Sibiu's successors are Arad 366, Fagaras 176, Oradea 380, and Rimnicu Vilcea 193; Fagaras's successors are Sibiu 253 and Bucharest 0.
The figure shows the progress of a greedy best-first search using hSLD to find a path from Arad to Bucharest. The first node to be expanded from Arad will be Sibiu, because it is closer to Bucharest than either Zerind or Timisoara. The next node to be expanded will be Fagaras, because it is closest. Fagaras in turn generates Bucharest, which is the goal. It is not optimal, however: the path via Sibiu and Fagaras to Bucharest is 32 kilometers longer than the path through Rimnicu Vilcea and Pitesti. This shows why the algorithm is called "greedy": at each step it tries to get as close to the goal as it can.

A* search: Minimizing the total estimated solution cost


The most widely known form of best-first search is called A* search (pronounced "A-star search"). It evaluates nodes by combining g(n), the cost to reach the node, and h(n), the cost to get from the node to the goal:
f(n) = g(n) + h(n)
Since g(n) gives the path cost from the start node to node n, and h(n) is the estimated cost of the cheapest path from n to the goal, we have f(n) = estimated cost of the cheapest solution through n. Thus, if we are trying to find the cheapest solution, a reasonable thing to try first is the node with the lowest value of g(n) + h(n). It turns out that this strategy is more than just reasonable: provided that the heuristic function h(n) satisfies certain conditions, A* search is both complete and optimal.
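
To make the evaluation function concrete, the following is a minimal Python sketch of A* graph search over an explicit weighted graph. The dictionaries encode the fragment of the Romania map used in the figures; the function name astar and the graph representation are illustrative choices made here, not something prescribed by the text.

import heapq

def astar(graph, h, start, goal):
    """A* search on a dict graph {node: [(neighbor, step_cost), ...]}.
    h is a dict of heuristic estimates; returns (path, cost) or None."""
    frontier = [(h[start], 0, start, [start])]   # entries are (f, g, node, path)
    best_g = {start: 0}
    while frontier:
        f, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return path, g
        for nbr, step in graph[node]:
            g2 = g + step
            if g2 < best_g.get(nbr, float("inf")):   # keep only the cheapest known route to nbr
                best_g[nbr] = g2
                heapq.heappush(frontier, (g2 + h[nbr], g2, nbr, path + [nbr]))
    return None

# Fragment of the Romania map (road distances and hSLD values from the figures above).
graph = {
    "Arad": [("Sibiu", 140), ("Timisoara", 118), ("Zerind", 75)],
    "Sibiu": [("Fagaras", 99), ("Rimnicu Vilcea", 80), ("Arad", 140), ("Oradea", 151)],
    "Fagaras": [("Bucharest", 211), ("Sibiu", 99)],
    "Rimnicu Vilcea": [("Pitesti", 97), ("Sibiu", 80), ("Craiova", 146)],
    "Pitesti": [("Bucharest", 101), ("Rimnicu Vilcea", 97), ("Craiova", 138)],
    "Timisoara": [("Arad", 118)], "Zerind": [("Arad", 75)],
    "Oradea": [("Sibiu", 151)], "Craiova": [("Pitesti", 138)], "Bucharest": [],
}
h = {"Arad": 366, "Sibiu": 253, "Fagaras": 176, "Rimnicu Vilcea": 193, "Pitesti": 100,
     "Timisoara": 329, "Zerind": 374, "Oradea": 380, "Craiova": 160, "Bucharest": 0}
print(astar(graph, h, "Arad", "Bucharest"))  # expects the route through Rimnicu Vilcea and Pitesti, cost 418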

The optimality of A* is straightforward to analyze if it is used with TREE-SEARCH. In this case, A* is optimal if h(n) is an admissible heuristic, that is, provided that h(n) never overestimates the cost to reach the goal. Admissible heuristics are by nature optimistic, because they think the cost of solving the problem is less than it actually is. Since g(n) is the exact cost to reach n, we have as an immediate consequence that f(n) never overestimates the true cost of a solution through n.
An obvious example of an admissible heuristic is the straight-line distance hSLD that we used in getting to Bucharest. Straight-line distance is admissible because the shortest path between any two points is a straight line, so the straight line cannot be an overestimate.

Figure: Stages in an A* search for Bucharest; nodes are labelled with f = g + h, where h is the straight-line distance to Bucharest.
(a) The initial state: Arad.
(b) After expanding Arad: Sibiu 393 = 140 + 253, Timisoara 447 = 118 + 329, Zerind 449 = 75 + 374.
(c) After expanding Sibiu: Arad 646 = 280 + 366, Fagaras 415 = 239 + 176, Oradea 671, Rimnicu Vilcea 413.
(d) After expanding Rimnicu Vilcea: Craiova 526 = 366 + 160, Pitesti 417, Sibiu 553.
(e) After expanding Fagaras: its successors are Sibiu 591 and Bucharest 450; Craiova 526, Pitesti 417, and Sibiu 553 remain on the fringe.
(f) After expanding Pitesti: Bucharest 418, Craiova 615, Rimnicu Vilcea 607.

From this example, we can extract a general proof that A* using TREE-SEARCH is optimal if h(n) is admissible. Suppose a suboptimal goal node G2 appears on the fringe, and let the cost of the optimal solution be C*. Then, because G2 is suboptimal and because h(G2) = 0 (true for any goal node), we know
f(G2) = g(G2) + h(G2) = g(G2) > C*.
Now consider a fringe node n that is on an optimal solution path, for example, Pitesti in the example of the preceding paragraph. (There must always be such a node if a solution exists.)
If h(n) does not overestimate the cost of completing the solution path, then we know that
f(n) = g(n) + h(n) <= C*.
Now we have shown that f(n) <= C* < f(G2), so G2 will not be expanded and A* must return an optimal solution.

A heuristic h(n) is consistent if, for every node n and every successor n' of n generated by any action a, the estimated cost of reaching the goal from n is no greater than the step cost of getting to n' plus the estimated cost of reaching the goal from n':
h(n) <= c(n, a, n') + h(n').
This is a form of the general triangle inequality, which stipulates that each side of a triangle cannot be longer than the sum of the other two sides. Here, the triangle is formed by n, n', and the goal closest to n. It is fairly easy to show that every consistent heuristic is also admissible. The most important consequence of consistency is the following: A* using GRAPH-SEARCH is optimal if h(n) is consistent.
Another important consequence of consistency is the following: if h(n) is consistent, then the values of f(n) along any path are nondecreasing. The proof follows directly from the definition of consistency. Suppose n' is a successor of n; then g(n') = g(n) + c(n, a, n') for some action a, and we have
f(n') = g(n') + h(n') = g(n) + c(n, a, n') + h(n') >= g(n) + h(n) = f(n).
It follows that the sequence of nodes expanded by A* using GRAPH-SEARCH is in nondecreasing order of f(n). Hence, the first goal node selected for expansion must be an optimal solution, since all later nodes will be at least as expensive.
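
For intuition, consistency can be checked mechanically on an explicit graph. The short Python sketch below assumes the same dictionary-of-edges representation used in the A* sketch earlier; the function name check_consistent is an illustrative choice.

def check_consistent(graph, h):
    """Return True if h(n) <= c(n, n') + h(n') holds for every edge in the graph."""
    for n, edges in graph.items():
        for n2, cost in edges:
            if h[n] > cost + h[n2]:
                return False   # triangle inequality violated on edge (n, n2)
    return True

# On the Romania fragment above, check_consistent(graph, h) returns True,
# so hSLD is consistent there and A* with GRAPH-SEARCH remains optimal.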
The fact that f-costs are nondecreasing along any path also means that we can draw contours in the state space, just like the contours in a topographic map. The figure shows an example. Inside the contour labeled 400, all nodes have f(n) less than or equal to 400, and so on. Then, because A* expands the fringe node of lowest f-cost, we can see that an A* search fans out from the start node, adding nodes in concentric bands of increasing f-cost.

With uniform-cost search (A* search using h(n) = 0), the bands will be "circular" around the start state. With more accurate heuristics, the bands will stretch toward the goal state and become more narrowly focused around the optimal path. If C* is the cost of the optimal solution path, then we can say the following:
A* expands all nodes with f(n) < C*.
A* might then expand some of the nodes right on the "goal contour" (where f(n) = C*) before selecting a goal node.
Notice that A* expands no nodes with f(n) > C*; for example, Timisoara is not expanded in the figure even though it is a child of the root. We say that the subtree below Timisoara is pruned; because hSLD is admissible, the algorithm can safely ignore this subtree while still guaranteeing optimality. The concept of pruning (eliminating possibilities from consideration without having to examine them) is important for many areas of AI.

Computation time is not, however, A*'s main drawback. Because it keeps all
generated nodes in memory (as do all GRAPH-SEARCH algorithms), A*
usually runs out of space long before it runs out of time. For this reason, A* is
not practical for many large-scale problems. Recently developed algorithms have
overcome the space problem without sacrificing optimality or completeness, at a
small cost in execution time. These are discussed next.

Memory-bounded heuristic search


The simplest way to reduce memory requirements for A* is to adapt the idea of iterative deepening to the heuristic search context, resulting in the iterative-deepening A* (IDA*) algorithm. The main difference between IDA* and standard iterative deepening is that the cutoff used is the f-cost (g + h) rather than the depth; at each iteration, the cutoff value is the smallest f-cost of any node that exceeded the cutoff on the previous iteration. IDA* is practical for many problems with unit step costs and avoids the substantial overhead associated with keeping a sorted queue of nodes.
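
A compact Python sketch of this idea is given below, reusing the graph and heuristic dictionaries from the A* example; the recursive helper is one possible way to organize the cutoff search and is only a sketch, not the book's pseudocode.

def ida_star(graph, h, start, goal):
    """Iterative-deepening A*: repeated depth-first searches with an increasing f-cost cutoff."""
    def dfs(node, g, bound, path):
        f = g + h[node]
        if f > bound:
            return f, None                # report the smallest f that exceeded the cutoff
        if node == goal:
            return f, path
        smallest = float("inf")
        for nbr, step in graph[node]:
            if nbr in path:               # avoid cycling on the current path
                continue
            t, found = dfs(nbr, g + step, bound, path + [nbr])
            if found is not None:
                return t, found
            smallest = min(smallest, t)
        return smallest, None

    bound = h[start]
    while True:
        bound, found = dfs(start, 0, bound, [start])
        if found is not None or bound == float("inf"):
            return found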
Recursive best-first search (RBFS)

Its structure is similar to that of a recursive depth-first search, but rather than continuing indefinitely down the current path, it keeps track of the f-value of the best alternative path available from any ancestor of the current node. If the current node exceeds this limit, the recursion unwinds back to the alternative path. As the recursion unwinds, RBFS replaces the f-value of each node along the path with the best f-value of its children. In this way, RBFS remembers the f-value of the best leaf in the forgotten subtree and can therefore decide whether it is worth reexpanding the subtree at some later time.

function RECURSIVE-BEST-FIRST-SEARCH(problem) returns a solution, or failure
  return RBFS(problem, MAKE-NODE(INITIAL-STATE[problem]), ∞)

function RBFS(problem, node, f-limit) returns a solution, or failure and a new f-cost limit
  if GOAL-TEST[problem](STATE[node]) then return node
  successors <- EXPAND(node, problem)
  if successors is empty then return failure, ∞
  for each s in successors do
    f[s] <- max(g(s) + h(s), f[node])
  repeat
    best <- the lowest f-value node in successors
    if f[best] > f-limit then return failure, f[best]
    alternative <- the second-lowest f-value among successors
    result, f[best] <- RBFS(problem, best, min(f-limit, alternative))
    if result ≠ failure then return result
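
For readers who want something executable, here is a rough Python rendering of the pseudocode above, again assuming the dictionary graph and heuristic table from the A* sketch; the list-based bookkeeping of successor f-values is an implementation choice made here purely for illustration.

INF = float("inf")

def recursive_best_first_search(graph, h, start, goal):
    """RBFS over a dict graph; returns (path, cost) or (None, None)."""
    def rbfs(node, g, f_node, path, f_limit):
        if node == goal:
            return path, g, f_node
        succs = []
        for nbr, step in graph[node]:
            if nbr in path:                      # skip nodes already on this path
                continue
            # a child's f is at least as large as its parent's backed-up f
            succs.append([max(g + step + h[nbr], f_node), nbr, g + step])
        if not succs:
            return None, None, INF
        while True:
            succs.sort()
            best_f, best, best_g = succs[0]
            if best_f > f_limit:
                return None, None, best_f        # unwind and report the best forgotten f
            alternative = succs[1][0] if len(succs) > 1 else INF
            result, cost, backed_up = rbfs(best, best_g, best_f, path + [best],
                                           min(f_limit, alternative))
            succs[0][0] = backed_up              # remember the best f seen below this child
            if result is not None:
                return result, cost, backed_up

    path, cost, _ = rbfs(start, 0, h[start], [start], INF)
    return path, cost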

Figure: Stages in an RBFS search for the shortest route to Bucharest.
(a) After expanding Arad, Sibiu, and Rimnicu Vilcea: Arad's successors are Sibiu 393, Timisoara 447, and Zerind 449.
(b) After unwinding back to Sibiu and expanding Fagaras.
(c) After switching back to Rimnicu Vilcea and expanding Pitesti; the f-limit at this point is 447.

IDA* and RBFS suffer from using too little memory. Between iterations, IDA*
retains only a single number: the current f -cost limit. RBFS retains more
information in memory, but it uses only linear space: even if more memory were
available, RBFS has no way to make use of it.

It seems sensible, therefore, to use all available memory. Two algorithms that do
this are MA* (memory-bounded A*) and SMA* (simplified MA*).

SMA* (simplified MA*).

SMA* proceeds just like A*, expanding the best leaf until memory is full. At this point, it cannot add a new node to the search tree without dropping an old one. SMA* always drops the worst leaf node, the one with the highest f-value. Like RBFS, SMA* then backs up the value of the forgotten node to its parent. In this way, the ancestor of a forgotten subtree knows the quality of the best path in that subtree. With this information, SMA* regenerates the subtree only when all other paths have been shown to look worse than the path it has forgotten.
What if all the leaf nodes have the same f -value? Then the algorithm might select
the same node for deletion and expansion. SMA* solves this problem by
expanding the newest best leaf and deleting the oldest worst leaf. These can be the
same node only if there is only one leaf; in that case, the current search tree must
be a single path from root to leaf that fills all of memory. If the leaf is not a goal
node, then even if it is on an optimal solution path, that solution is not reachable
with the available memory. Therefore, the node can be discarded exactly as if
it had no successors.

SMA* is complete if there is any reachable solution, that is, if d, the depth of the shallowest goal node, is less than the memory size (expressed in nodes). It is optimal if any optimal solution is reachable; otherwise it returns the best reachable solution. In practical terms, SMA* might well be the best general-purpose algorithm for finding optimal solutions, particularly when the state space is a graph, step costs are not uniform, and node generation is expensive compared to the additional overhead of maintaining the open and closed lists.

On very hard problems, however, it will often be the case that SMA* is forced to
switch back and forth continually between a set of candidate solution paths, only a
small subset of which can fit in memory. (This resembles the problem of
thrashing in disk paging systems.)

HEURISTIC FUNCTIONS
In this section, we will look at heuristics for the 8-puzzle, in order to shed light on
the nature of heuristics in general.

Figure: A typical instance of the 8-puzzle, showing the start state and the goal state.

If we want to find the shortest solutions by using A*, we need a heuristic function that never overestimates the number of steps to the goal. There is a long history of such heuristics for the 15-puzzle; here are two commonly used candidates:

h1 = the number of misplaced tiles. For the figure, all of the eight tiles are out of position, so the start state would have h1 = 8. h1 is an admissible heuristic, because it is clear that any tile that is out of place must be moved at least once.

h2 = the sum of the distances of the tiles from their goal positions. Because tiles cannot move along diagonals, the distance we will count is the sum of the horizontal and vertical distances. This is sometimes called the city block distance or Manhattan distance. h2 is also admissible, because all any move can do is move one tile one step closer to the goal. Tiles 1 to 8 in the start state give a Manhattan distance of
h2 = 3 + 1 + 2 + 2 + 2 + 3 + 3 + 2 = 18.
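
Both heuristics are easy to state in code. The sketch below assumes a state is a tuple of nine entries in row-major order with 0 for the blank and a goal of (1, 2, ..., 8, 0); this representation and the function names are assumptions made here for illustration.

GOAL = (1, 2, 3, 4, 5, 6, 7, 8, 0)   # 0 marks the blank

def h1(state, goal=GOAL):
    """Number of misplaced tiles (the blank does not count)."""
    return sum(1 for s, g in zip(state, goal) if s != 0 and s != g)

def h2(state, goal=GOAL):
    """Sum of Manhattan distances of each tile from its goal square."""
    total = 0
    for idx, tile in enumerate(state):
        if tile == 0:
            continue
        goal_idx = goal.index(tile)
        total += abs(idx // 3 - goal_idx // 3) + abs(idx % 3 - goal_idx % 3)
    return total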

The effect of heuristic accuracy on performance

As we would hope, neither of these overestimates the true solution cost.


One way to characterize the quality of a heuristic is the effective branching factor b*. If the total number of nodes generated by A* for a particular problem is N, and the solution depth is d, then b* is the branching factor that a uniform tree of depth d would have to have in order to contain N + 1 nodes. Thus,
N + 1 = 1 + b* + (b*)^2 + ... + (b*)^d.
To test the heuristic functions h1 and h2, we generated 1200 random problems with solution lengths from 2 to 24 and solved them with iterative deepening search and with A* tree search using both h1 and h2.
The results suggest that h2 is better than h1, and is far better than using iterative deepening search.
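
As a worked illustration, b* can be recovered numerically from N and d by solving the polynomial above; the bisection sketch below is one way to do it and is not part of the text.

def effective_branching_factor(n_generated, depth, tol=1e-6):
    """Solve N + 1 = 1 + b* + (b*)^2 + ... + (b*)^d for b* by bisection (assumes b* > 1)."""
    def total(b):
        return sum(b ** i for i in range(depth + 1))   # 1 + b + ... + b^d
    lo, hi = 1.0, float(n_generated + 1)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if total(mid) < n_generated + 1:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Example: if A* generates 52 nodes on a problem of depth 5, b* is about 1.92.
print(round(effective_branching_factor(52, 5), 2))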

One might ask whether h2 is always better than h1. The answer is yes.

Proof: every node with f(n) < C* will surely be expanded. This is the same as saying that every node with h(n) < C* - g(n) will surely be expanded. But because h2 is at least as big as h1 for all nodes, every node that is surely expanded by A* search with h2 will also surely be expanded with h1, and h1 might also cause other nodes to be expanded as well. Hence, it is always better to use a heuristic function with higher values, provided it does not overestimate and that the computation time for the heuristic is not too large.

Inventing admissible heuristic functions

We have seen that both h1 (misplaced tiles) and h2 (Manhattan distance) are fairly good heuristics for the 8-puzzle and that h2 is better. Is it possible for a computer to invent such a heuristic mechanically?

There are different methods for generating heuristics:

1. Relaxed problems
A problem with fewer restrictions on the actions is called a relaxed problem. The cost of an optimal solution to a relaxed problem is an admissible heuristic for the original problem. The heuristic is admissible because the optimal solution in the original problem is, by definition, also a solution in the relaxed problem and therefore must be at least as expensive as the optimal solution in the relaxed problem.
For example, if the 8-puzzle actions are described as
  A tile can move from square A to square B if A is horizontally or vertically adjacent to B and B is blank,
we can generate three relaxed problems by removing one or both of the conditions:
(a) A tile can move from square A to square B if A is adjacent to B.
(b) A tile can move from square A to square B if B is blank.
(c) A tile can move from square A to square B.
From (a), we can derive h2 (Manhattan distance). The reasoning is that h2 would be the proper score if we moved each tile in turn to its destination. The heuristic derived from (b) is discussed in later chapters.
From (c), we can derive h1 (misplaced tiles), because it would be the proper score if tiles could move to their intended destination in one step.

If the relaxed problem is hard to solve, then the values of the corresponding heuristic will be expensive to obtain. If a collection of admissible heuristics h1, ..., hm is available for a problem, and none of them dominates any of the others, which should we choose? As it turns out, we need not make a choice. We can have the best of all worlds, by defining
h(n) = max{h1(n), ..., hm(n)}.
This composite heuristic uses whichever function is most accurate on the node in question.
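
In Python, this composite can be expressed in one line, reusing the illustrative h1 and h2 sketches from above:

def h_composite(state, heuristics=(h1, h2)):
    """Composite heuristic: the largest of several admissible estimates is still admissible."""
    return max(h(state) for h in heuristics)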
2) Subproblems
Admissible heuristics can also be derived from the solution cost of a subproblem of a given problem. For example, the figure below shows a subproblem of the 8-puzzle instance. The subproblem involves getting tiles 1, 2, 3, and 4 into their correct positions. Clearly, the cost of the optimal solution of this subproblem is a lower bound on the cost of the complete problem. It turns out to be substantially more accurate than Manhattan distance in some cases.

2.a) Pattern databases
The idea behind pattern databases is to store these exact solution costs for every possible subproblem instance; in our example, every possible configuration of the four tiles and the blank. (Notice that the locations of the other four tiles are irrelevant for the purposes of solving the subproblem, but moves of those tiles do count towards the cost.) Then, we compute an admissible heuristic hDB for each complete state encountered during a search simply by looking up the corresponding subproblem configuration in the database. The database itself is constructed by searching backwards from the goal state and recording the cost of each new pattern encountered; the expense of this search is amortized over many subsequent problem instances.

2.b) Disjoint pattern databases


The choice of 1-2-3-4 is fairly arbitrary; we could also construct databases for 5-6-7-8, for 2-4-6-8, and so on. One might wonder whether the heuristics obtained from the 1-2-3-4 database and the 5-6-7-8 database could be added, since the two subproblems seem not to overlap. Would this still give an admissible heuristic? The answer is no, because the solutions of the 1-2-3-4 subproblem and the 5-6-7-8 subproblem for a given state will almost certainly share some moves; it is unlikely that 1-2-3-4 can be moved into place without touching 5-6-7-8, and vice versa. But what if we don't count those moves? That is, we record not the total cost of solving the 1-2-3-4 subproblem, but just the number of moves involving 1-2-3-4. Then it is easy to see that the sum of the two costs is still a lower bound on the cost of solving the entire problem. This is the idea behind disjoint pattern databases. Using such databases, it is possible to solve random 15-puzzles in a few milliseconds; the number of nodes generated is reduced by a factor of 10,000 compared with using Manhattan distance. For 24-puzzles, a speedup of roughly a million can be obtained.

3) Learning heuristics from experience

"Experience" here means solving lots of 8-puzzles, for instance. Each optimal
solution to an 8-puzzle problem provides examples from which h(n) can be
learned. Each example consists of a state from the solution
path and the actual cost of the solution from that point. From these examples, an
inductive learning algorithm can be used to construct a function h(n) that can
(with luck) predict solution costs for other states that arise during search.

Inductive learning methods work best when supplied with features of a state that
are relevant to its evaluation, rather than with just the raw state description. For
example, the feature "number of misplaced tiles" might be helpful in predicting the
actual distance of a state from the goal.

LOCAL SEARCH ALGORITHMS AND OPTIMIZATION PROBLEMS
If the path to the goal does not matter, we might consider a different class of algorithms, ones that do not worry about paths at all. Local search algorithms operate using a single current state (rather than multiple paths) and generally move only to neighbors of that state. Typically, the paths followed by the search are not retained. Although local search algorithms are not systematic, they have two key advantages:
(1) they use very little memory, usually a constant amount; and
(2) they can often find reasonable solutions in large or infinite (continuous) state spaces for which systematic algorithms are unsuitable.

In addition to finding goals, local search algorithms are useful for solving pure optimization problems, in which the aim is to find the best state according to an objective function.
To understand local search, we will find it very useful to consider the state space landscape. A landscape has both "location" (defined by the state) and "elevation" (defined by the value of the heuristic cost function or objective function). If elevation corresponds to cost, then the aim is to find the lowest valley, a global minimum; if elevation corresponds to an objective function, then the aim is to find the highest peak, a global maximum. A complete local search algorithm always finds a goal if one exists; an optimal algorithm always finds a global minimum/maximum.

Figure: A one-dimensional state-space landscape, with the state space on the horizontal axis and the objective function on the vertical axis; the aim is to find the global maximum.
Hill-climbing search
It is simply a loop that continually moves in the direction of increasing value-that
is, uphill. It terminates when it reaches a "peak" where no neighbor has a higher
value. The algorithm does not maintain a search tree, so the current node data
structure need only record the state and its objective function value.

function HILL-CLIMBING(problem) returns a state that is a local maximum
  inputs: problem, a problem
  local variables: current, a node
                   neighbor, a node

  current <- MAKE-NODE(INITIAL-STATE[problem])
  loop do
    neighbor <- a highest-valued successor of current
    if VALUE[neighbor] <= VALUE[current] then return STATE[current]
    current <- neighbor

To illustrate hill climbing, we will use the 8-queens problem. Local search algorithms typically use a complete-state formulation, where each state has 8 queens on the board, one per column. The successor function returns all possible states generated by moving a single queen to another square in the same column (so each state has 8 x 7 = 56 successors). The heuristic cost function h is the number of pairs of queens that are attacking each other, either directly or indirectly. The global minimum of this function is zero, which occurs only at perfect solutions.
Hill climbing is sometimes called greedy local search because it grabs a good neighbor state without thinking ahead about where to go next.
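
For instance, the cost function and the successor set can be written as follows in Python; representing a state as a list of row indices, one per column, is an assumption made here for illustration.

def attacking_pairs(state):
    """h for 8-queens: state[c] is the row of the queen in column c.
    Counts pairs of queens that attack each other along a row or a diagonal."""
    h = 0
    n = len(state)
    for c1 in range(n):
        for c2 in range(c1 + 1, n):
            same_row = state[c1] == state[c2]
            same_diag = abs(state[c1] - state[c2]) == c2 - c1
            if same_row or same_diag:
                h += 1
    return h

def successors(state):
    """All states reachable by moving one queen within its column (8 x 7 = 56 of them)."""
    result = []
    for col in range(len(state)):
        for row in range(len(state)):
            if row != state[col]:
                result.append(state[:col] + [row] + state[col + 1:])
    return result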

Hill climbing often gets stuck for the following reasons:


Local maxima: a local maximum is a peak that is higher than each of its neighboring states, but lower than the global maximum. The figure illustrates the problem schematically. More concretely, the 8-queens state in part (b) of the figure below is in fact a local maximum; every move of a single queen makes the situation worse.

Ridges: Ridges result in a sequence of local maxima that is very difficult for
greedy algorithms to navigate.
Plateaux: a plateau is an area of the state space landscape where the evaluation function is flat. It can be a flat local maximum, from which no uphill exit exists, or a shoulder, from which it is possible to make progress.
If we always allow sideways moves when there are no uphill moves, an infinite loop will occur whenever the algorithm reaches a flat local maximum that is not a shoulder. One common solution is to put a limit on the number of consecutive sideways moves allowed.

Figure: (a) an 8-queens initial state; (b) a local maximum after 5 steps.

Many variants of hill-climbing have been invented.

Stochastic hill climbing


chooses at random from among the uphill moves; the probability of selection can
vary with the steepness of the uphill move.
First-choice hill climbing
implements stochastic hill climbing by generating successors randomly until one is
generated that is better than the current state. This is a good strategy when a state
has many successors.
Random-restart hill climbing
adopts the well-known adage, "If at first you don't succeed, try, try again." It conducts a series of hill-climbing searches from randomly generated initial states, stopping when a goal is found. It is complete with probability approaching 1, for
the trivial reason that it will eventually generate a goal state as the initial state. If
each hill-climbing search has a probability p of success, then the expected number
of restarts required is 1/ p .

Simulated annealing search


In metallurgy, annealing is the process used to temper or harden metals and glass by heating them to a high temperature and then gradually cooling them, thus allowing the material to coalesce into a low-energy crystalline state. The probability of a change from a low-energy state to a higher-energy state is
P = e^(-ΔE/kT),
where ΔE is the (positive) change in energy, T is the temperature, and k is the Boltzmann constant.
In the early stages of the algorithm (when the temperature is high), some bad moves are accepted; the probability of acceptance then gradually decreases, and it depends on the change in value ΔE and on the temperature T.

function SIMULATED-ANNEALING(problem, schedule) returns a solution state
  inputs: problem, a problem
          schedule, a mapping from time to "temperature"
  local variables: current, a node
                   next, a node
                   T, a "temperature" controlling the probability of downward steps

  current <- MAKE-NODE(INITIAL-STATE[problem])
  for t <- 1 to ∞ do
    T <- schedule[t]
    if T = 0 then return current
    next <- a randomly selected successor of current
    ΔE <- VALUE[next] - VALUE[current]
    if ΔE > 0 then current <- next
    else current <- next only with probability e^(ΔE/T)

The innermost loop of the simulated-annealing algorithm is quite similar to hill climbing. Instead of picking the best move, however, it picks a random move. If the move improves the situation, it is always accepted. Otherwise, the algorithm accepts the move with some probability less than 1. The probability decreases exponentially with the "badness" of the move, the amount ΔE by which the evaluation is worsened. The probability also decreases as the "temperature" T goes down: bad moves are more likely to be allowed at the start when the temperature is high, and they become more unlikely as T decreases. One can prove that if the schedule lowers T slowly enough, the algorithm will find a global optimum with probability approaching 1.
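
A minimal Python sketch of this loop is shown below, with an exponential cooling schedule chosen purely for illustration (the schedule, parameter values, and function names are assumptions, not from the text). For the 8-queens problem one could, for example, pass value = lambda s: -attacking_pairs(s) together with the successors function sketched earlier.

import math
import random

def simulated_annealing(initial, successors, value, t0=1.0, cooling=0.995, steps=10000):
    """Generic simulated annealing that tries to maximize value(state)."""
    current = initial
    T = t0
    for _ in range(steps):
        T *= cooling                       # exponential cooling schedule (illustrative)
        if T < 1e-9:
            break
        nxt = random.choice(successors(current))
        delta_e = value(nxt) - value(current)
        # always accept improvements; accept bad moves with probability e^(ΔE/T)
        if delta_e > 0 or random.random() < math.exp(delta_e / T):
            current = nxt
    return current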
Local beam search
The local beam search algorithm keeps track of k states rather than
just one. It begins with k randomly generated states. At each step, all the successors
of all k states are generated. If any one is a goal, the algorithm halts. Otherwise, it
selects the k best successors from the complete list and repeats.
In a random-restart search, each search process runs independently of
the others. In a local beam search, useful information is passed among the k
parallel search threads. For example, if one state generates several good
successors and the other k - 1 states all generate bad successors, then the effect is
that the first state says to the others, "Come over here, the grass is greener!" The
algorithm quickly abandons unfruitful searches and moves its resources to where
the most progress is being made.

Stochastic beam search

Instead of choosing the best k from the pool of candidate successors, stochastic beam search chooses k successors at random, with the probability of choosing a given successor being an increasing function of its value.
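
One way to implement the value-weighted random choice, assuming only Python's standard library (the helper name is illustrative):

import random

def stochastic_beam_select(candidates, value, k):
    """Pick k successors at random, each with probability proportional to its value."""
    weights = [max(value(s), 1e-9) for s in candidates]    # keep weights strictly positive
    return random.choices(candidates, weights=weights, k=k)

Note that random.choices samples with replacement; a weighted sample without replacement would serve equally well here.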

Genetic algorithms
Like beam search, GAs begin with a set of k randomly generated states, called the population. Each state, or individual, is represented as a string over a finite alphabet. The production of the next generation of states is shown in the figure. In (b), each state is rated by the evaluation function or the fitness function.

Figure: the genetic algorithm, showing (a) the initial population, (b) the fitness function, (c) selection, (d) crossover, and (e) mutation.
A fitness function should return higher values for better states, so for the 8-queens problem we use the number of nonattacking pairs of queens, which has a value of 28 for a solution. The values of the four states are 24, 23, 20, and 11.
In (c), a random choice of two pairs is selected for reproduction, in accordance with the probabilities in (b). Notice that one individual is selected twice and one not at all. Then a crossover point is randomly chosen from the positions in the string. In the figure, the crossover points are after the third digit in the first pair and after the fifth digit in the second pair.
In (d), the offspring themselves are created by crossing over the parent strings at
the crossover point. For example, the first child of the first pair gets the first three
digits from the first parent and the remaining digits from the second parent,
whereas the second child gets the first three digits from the second parent and the
rest from the first parent.
Finally, in (e), each location is subject to random mutation with a small
independent probability. One digit was mutated in the first, third, and fourth
offspring.
The theory of genetic algorithms explains how this works using the idea of a schema, which is a substring in which some of the positions can be left unspecified. For example, the schema 246***** describes all 8-queens states in which the first three queens are in positions 2, 4, and 6, respectively. Strings that match the schema (such as 24613578) are called instances of the schema. It can be shown that, if the average fitness of the instances of a schema is above the mean, then the number of instances of the schema within the population will grow over time.

function GENETIC-ALGORITHM(population, FITNESS-FN) returns an individual
  inputs: population, a set of individuals
          FITNESS-FN, a function that measures the fitness of an individual
  repeat
    new-population <- empty set
    loop for i from 1 to SIZE(population) do
      x <- RANDOM-SELECTION(population, FITNESS-FN)
      y <- RANDOM-SELECTION(population, FITNESS-FN)
      child <- REPRODUCE(x, y)
      if (small random probability) then child <- MUTATE(child)
      add child to new-population
    population <- new-population
  until some individual is fit enough, or enough time has elapsed
  return the best individual in population, according to FITNESS-FN

function REPRODUCE(x, y) returns an individual
  inputs: x, y, parent individuals
  n <- LENGTH(x)
  c <- a random number from 1 to n
  return APPEND(SUBSTRING(x, 1, c), SUBSTRING(y, c + 1, n))
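
A rough Python rendering of REPRODUCE and MUTATE for string individuals follows; the digit-string encoding of 8-queens states and the function names mirror the description above but are illustrative assumptions.

import random

def reproduce(x, y):
    """Single-point crossover: a prefix of x followed by the matching suffix of y."""
    c = random.randint(1, len(x) - 1)       # crossover point
    return x[:c] + y[c:]

def mutate(child, alphabet="12345678"):
    """Randomly change one position of the string to a random symbol."""
    i = random.randrange(len(child))
    return child[:i] + random.choice(alphabet) + child[i + 1:]

# Example: crossing "24748552" with "32752411" after the third digit gives "24752411".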

ONLINE SEARCH AGENTS AND UNKNOWN ENVIRONMENTS

Offline search algorithms compute a complete solution before setting foot in the
real world and then execute the solution without recourse to their percepts. In
contrast, an online search agent operates by interleaving computation and action:
first it takes an action, then it observes the environment and computes the next
action.
Online search is a necessary idea for an exploration problem, where the
states and actions are unknown to the agent. An agent in this state of ignorance
must use its actions as experiments to determine what to do next, and hence must
interleave computation and action. Consider a newborn baby: it has many possible
actions, but knows the outcomes of none of them, and it has experienced only a
few of the possible states that it can reach. The baby's gradual discovery of how the
world works is, in part, an online search process.
Online search problems
An online search problem can be solved only by an agent executing actions, rather
than by a purely computational process. We will assume that the agent knows just
the following:
1. ACTIONS(s), which returns a list of actions allowed in state s;
2. The step-cost function c(s, a, s'); note that this cannot be used until the agent knows that s' is the outcome;
3. GOAL-TEST(s).
Typically, the agent's objective is to reach a goal state while minimizing cost. The cost is the total path cost of the path that the agent actually travels. It is common to compare this cost with the path cost of the path the agent would follow if it knew the search space in advance, that is, the actual shortest path. In the language of online algorithms, this is called the competitive ratio; we would like it to be as small as possible.
No algorithm can avoid dead ends in all state spaces. Consider the two dead-end
state spaces in the figure. To an online search algorithm that has visited states S and
A, the two state spaces look identical, so it must make the same decision in both.
Therefore, it will fail in one of them. This is an example of an adversary argument: we can imagine an adversary that constructs the state space while the agent explores it and can put the goals and dead ends wherever it likes.

To make progress, we will simply assume that the state space is safely explorable, that is, some goal state is reachable from every reachable state. State spaces with reversible actions, such as mazes and 8-puzzles, can be viewed as undirected graphs and are clearly safely explorable.

Online search agents


After each action, an online agent receives a percept telling it what state it has
reached; from this information, it can augment its map of the environment. The
current map is used to decide where to go next. This interleaving of planning and
action means that online search algorithms are quite different from the offline
search algorithms we have seen previously.
For example, offline algorithms such as A* have the ability to expand a node in
one part of the space and then immediately expand a node in another part of the
space, because node expansion involves simulated rather than real actions. An
online algorithm, on the other hand, can expand only a node that it physically
occupies. To avoid traveling all the way across the tree to expand the next node, it
seems better to expand nodes in a local order. Depth-first search has exactly this
property, because (except when backtracking) the next node expanded
is a child of the previous node expanded.
An online depth-first search agent is shown below. This agent stores
its map in a table, result[a, s], that records the state resulting from executing action
a in state s.

function ONLINE-DFS-AGENT(s') returns an action
  inputs: s', a percept that identifies the current state
  static: result, a table, indexed by action and state, initially empty
          unexplored, a table that lists, for each visited state, the actions not yet tried
          unbacktracked, a table that lists, for each visited state, the backtracks not yet tried
          s, a, the previous state and action, initially null

  if GOAL-TEST(s') then return stop
  if s' is a new state then unexplored[s'] <- ACTIONS(s')
  if s is not null then do
    result[a, s] <- s'
    add s to the front of unbacktracked[s']
  if unexplored[s'] is empty then
    if unbacktracked[s'] is empty then return stop
    else a <- an action b such that result[b, s'] = POP(unbacktracked[s'])
  else a <- POP(unexplored[s'])
  s <- s'
  return a

Online local search


Like depth-first search, hill-climbing search has the property of locality in its
node expansions. In fact, because it keeps just one current state in memory, hill-
climbing search is already an online search algorithm! Unfortunately, it is not very
useful in its simplest form because it leaves the agent sitting at local maxima with
nowhere to go. Moreover, random restarts cannot be used, because the agent
cannot transport itself to a new state.
Instead of random restarts, one might consider using a random walk to explore the
environment. A random walk simply selects at random one of the available actions
from the current state; preference can be given to actions that have not yet been
tried.
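
Such a step is simple to express in code; below is a minimal sketch, assuming the environment exposes the set of actions available in the current state (the function and argument names are illustrative).

import random

def random_walk_step(state, available_actions, tried):
    """Pick an untried action from the current state if possible, otherwise any action at random.
    tried is a set of (state, action) pairs the agent has already attempted."""
    actions = list(available_actions(state))
    untried = [a for a in actions if (state, a) not in tried]
    return random.choice(untried if untried else actions)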

Figure: An environment in which a random walk will take exponentially many steps to find the goal (S is the start state and G the goal), because, at each step, backward progress is twice as likely as forward progress.

The process can therefore be very slow, as the figure above shows.
The basic idea is to store a "current best estimate" H(s) of the cost to reach the goal from each state that has been visited. H(s) starts out being just the heuristic estimate h(s) and is updated as the agent gains experience in the state space. The figure below shows a simple example in a one-dimensional state space. In (a), the agent seems to be stuck in a flat local minimum at the shaded state. Rather than staying where it is, the agent should follow what seems to be the best path to the goal based on the current cost estimates for its neighbors. The estimated cost to reach the goal through a neighbor s' is the cost to get to s' plus the estimated cost to get to a goal from there, that is, c(s, a, s') + H(s').
An agent implementing this scheme, which is called learning real-time A* (LRTA*), is shown below.
function LRTA*-AGENT(s') returns an action
  inputs: s', a percept that identifies the current state
  static: result, a table, indexed by action and state, initially empty
          H, a table of cost estimates indexed by state, initially empty
          s, a, the previous state and action, initially null

  if GOAL-TEST(s') then return stop
  if s' is a new state (not in H) then H[s'] <- h(s')
  unless s is null
    result[a, s] <- s'
    H[s] <- min over b in ACTIONS(s) of LRTA*-COST(s, b, result[b, s], H)
  a <- an action b in ACTIONS(s') that minimizes LRTA*-COST(s', b, result[b, s'], H)
  s <- s'
  return a

function LRTA*-COST(s, a, s', H) returns a cost estimate
  if s' is undefined then return h(s)
  else return c(s, a, s') + H[s']
