Module 2: Informed Search Strategies (10 Jan 2024)
Informed (Heuristic) Search Strategies
An informed search strategy uses problem-specific knowledge beyond the definition of the problem itself, and can therefore find solutions more efficiently than an uninformed strategy.
The general approach we consider is called best-first search.
Best-first search is an instance of the general Tree-search or Graph-search
algorithm in which a node is selected for expansion based on an evaluation
function, f(n).
The evaluation function is construed as a cost estimate, and the node with
the lowest evaluation is expanded first.
The implementation of best-first graph search is identical to uniform-cost
search except for the use of f instead of g to order the priority queue.
The choice of f determines the search strategy.
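To make this concrete, here is a minimal Python sketch of best-first graph search. The function name and the interface, in which expand(state) yields (successor, step_cost) pairs and f(state, g) is the evaluation function, are assumptions made for illustration, not code from the text.

```python
import heapq

def best_first_search(start, goal_test, expand, f):
    """Best-first graph search: always expand the frontier node with the
    lowest value of the evaluation function f.  expand(state) yields
    (successor, step_cost) pairs; f(state, g) maps a state and its path
    cost g to an evaluation.  With f = g this is uniform-cost search."""
    frontier = [(f(start, 0), 0, start)]      # entries are (f, g, state)
    reached = {start: 0}                      # cheapest g found per state
    while frontier:
        _, g, state = heapq.heappop(frontier)
        if goal_test(state):
            return g                          # cost of the solution found
        for succ, step_cost in expand(state):
            g2 = g + step_cost
            if succ not in reached or g2 < reached[succ]:
                reached[succ] = g2
                heapq.heappush(frontier, (f(succ, g2), g2, succ))
    return None                               # frontier empty: no solution
```

The f argument is all that distinguishes the strategies discussed below.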
Most best-first algorithms include a heuristic function h(n) as a component of f:
h(n) = estimated cost of the cheapest path from the state at node n to a goal state.
Note that h(n) takes a node as input, but, unlike g(n), it depends only on the state at that node.
For example, in Romania, one might estimate the cost of the cheapest
path from Arad to Bucharest via the straight-line distance from Arad to
Bucharest.
Heuristic functions are the most common form in which additional
knowledge about the problem is imparted to the search algorithm.
We consider them to be arbitrary, nonnegative, problem-specific functions, with one constraint: if n is a goal node, then h(n) = 0.
Greedy Best-First Search
Greedy best-first search tries to expand the node that is closest to the goal, on the grounds that this is likely to lead to a solution quickly.
It evaluates nodes using just the heuristic function: f(n) = h(n).
For route-finding problems in Romania, we use the straight-line distance
heuristic, hSLD.
If the goal is Bucharest, we need to know the straight-line distances to
Bucharest which are shown in Figure 3.22.
For example, hSLD(In(Arad)) = 366.
Notice that the values of hSLD cannot be computed from the problem
description itself.
Moreover, it takes a certain amount of experience to know that hSLD is correlated with actual road distances and is therefore a useful heuristic.
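The sketch below runs greedy best-first search on a small fragment of the Romania map, reusing the best_first_search function from the earlier sketch; the road distances are taken from Figure 3.2 and the straight-line distances from Figure 3.22.

```python
# Fragment of the Romania road map (step costs from Figure 3.2).
roads = {
    'Arad':           [('Zerind', 75), ('Timisoara', 118), ('Sibiu', 140)],
    'Sibiu':          [('Fagaras', 99), ('Rimnicu Vilcea', 80)],
    'Fagaras':        [('Bucharest', 211)],
    'Rimnicu Vilcea': [('Pitesti', 97)],
    'Pitesti':        [('Bucharest', 101)],
    'Zerind': [], 'Timisoara': [], 'Bucharest': [],
}

# Straight-line distances to Bucharest (excerpt from Figure 3.22).
h_sld = {'Arad': 366, 'Zerind': 374, 'Timisoara': 329, 'Sibiu': 253,
         'Fagaras': 176, 'Rimnicu Vilcea': 193, 'Pitesti': 100,
         'Bucharest': 0}

# Greedy best-first search: f(n) = h(n), so the path cost g is ignored.
cost = best_first_search('Arad',
                         goal_test=lambda s: s == 'Bucharest',
                         expand=lambda s: roads[s],
                         f=lambda s, g: h_sld[s])
print(cost)  # 450: the Arad-Sibiu-Fagaras route, not the optimal 418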
[Figure 3.22: values of hSLD, the straight-line distances to Bucharest.]
Figure 3.23 shows the progress of a greedy best-first search using hSLD to
find a path from Arad to Bucharest.
The first node to be expanded from Arad will be Sibiu because it is closer
to Bucharest than either Zerind or Timisoara.
The next node to be expanded will be Fagaras because it is closest.
Fagaras in turn generates Bucharest, which is the goal.
For this particular problem, greedy best-first search using hSLD finds a
solution without ever expanding a node that is not on the solution path,
and hence its search cost is minimal.
It is not optimal, however: the path via Sibiu and Fagaras to Bucharest is 32 kilometers longer than the path through Rimnicu Vilcea and Pitesti.
This shows why the algorithm is called greedy: at each step it tries to get as close to the goal as it can.
[Figure 3.23: stages in a greedy best-first tree search for Bucharest using the straight-line distance heuristic hSLD.]
Greedy best-first tree search is also incomplete even in a finite state space,
much like depth-first search.
Consider the problem of getting from Iasi to Fagaras.
The heuristic suggests that Neamt be expanded first because it is closest to
Fagaras, but it is a dead end.
The solution is to go first to Vaslui, a step that is actually farther from the
goal according to the heuristic, and then to continue to Urziceni, Bucharest,
and Fagaras.
The tree-search version will never find this solution, because expanding Neamt puts Iasi back into the frontier; Iasi is closer to Fagaras than Vaslui is, and so Iasi will be expanded again, leading to an infinite loop.
The graph search version is complete in finite spaces but not in infinite
ones.
The worst-case time and space complexity for the tree version is O(b^m), where m is the maximum depth of the search space.
• With a good heuristic function the time and space complexity can be
reduced substantially.
The amount of the reduction depends on the particular problem, and on
the quality of the heuristic.
A* search: Minimizing the total estimated solution cost
The most widely known form of best-first search is called A∗ search.
It evaluates nodes by combining g(n), the cost to reach the node, and h(n),
the cost to get from the node to the goal:
f(n) = g(n) + h(n).
Since g(n) gives the path cost from the start node to node n, and h(n) is the
estimated cost of the cheapest path from n to the goal, we have
f(n) = estimated cost of the cheapest solution through n.
Thus, if we are trying to find the cheapest solution, it is better to try first the
node with the lowest value of g(n) + h(n).
This strategy is more than just reasonable provided that the heuristic
function h(n) satisfies certain conditions, and hence A∗ search is both
complete and optimal.
The algorithm is identical to uniform-cost search except that A∗ uses g + h instead of g.
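Under the same assumed interface, A∗ is the earlier best_first_search sketch with f = g + h; on the Romania fragment defined above it returns the optimal cost.

```python
# A* search: f(n) = g(n) + h(n), using the map and heuristic defined above.
cost = best_first_search('Arad',
                         goal_test=lambda s: s == 'Bucharest',
                         expand=lambda s: roads[s],
                         f=lambda s, g: g + h_sld[s])
print(cost)  # 418: Arad-Sibiu-Rimnicu Vilcea-Pitesti-Bucharest
```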
Conditions for Optimality: Admissibility and Consistency
The first condition we require for optimality is that h(n) be an admissible
heuristic.
An admissible heuristic is one that never overestimates the cost to reach
the goal.
Because g(n) is the actual cost to reach n along the current path, and
f(n) = g(n) + h(n), we have an immediate consequence that f(n) never
overestimates the true cost of a solution along the current path through n.
Admissible heuristics are optimistic because they think the cost of solving
the problem is less than the actual cost.
The straight-line distance hSLD that we used in getting to Bucharest from Arad is an example of an admissible heuristic.
Straight-line distance is admissible because the shortest path between any
two points is a straight line, so the straight line cannot be an overestimate.
The progress of an A∗ tree search for Bucharest is shown in Figure 3.24.
The values of g are computed from the step costs in Figure 3.2, and the values of
hSLD are given in Figure 3.22.
Notice that Bucharest first appears on the frontier at step (e), but it is not selected for expansion because its f-cost (450) is higher than that of Pitesti (417).
Since there might be another solution through Pitesti whose cost is as low as
417, the algorithm will not settle for a solution that costs 450.
A second condition, called consistency (or monotonicity), is required only for applications of A∗ to graph search.
A heuristic h(n) is consistent if, for every node n and every successor n′ of n generated by any action a, the estimated cost of reaching the goal from n is no greater than the step cost of getting to n′ plus the estimated cost of reaching the goal from n′:
h(n) ≤ c(n, a, n′) + h(n′).
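Consistency is easy to check mechanically. A small sketch, assuming the roads and h_sld tables from the Romania fragment defined earlier, verifies the inequality over every edge:

```python
def is_consistent(h, graph):
    """True if h(n) <= c(n, a, n') + h(n') holds for every edge."""
    return all(h[n] <= step_cost + h[succ]
               for n, successors in graph.items()
               for succ, step_cost in successors)

print(is_consistent(h_sld, roads))  # True: hSLD never violates the inequality
```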
[Figure 3.24: stages in an A∗ tree search for Bucharest; the h values are the straight-line distances to Bucharest from Figure 3.22.]
Optimality of A∗:
The tree-search version of A∗ is optimal if h(n) is admissible, while the graph-search version is optimal if h(n) is consistent (i.e., h(n) ≤ c(n, a, n′) + h(n′)).
We show the second of these two claims since it is more useful.
The argument essentially mirrors the argument for the optimality of uniform-
cost search with g replaced by f just as in the A∗ algorithm itself.
The first step is to establish the following:
If h(n) is consistent, then the values of f(n) along any path are nondecreasing.
Using the definition of consistency, we can prove this as shown below.
Suppose n′ is a successor of n; then g(n′) = g(n) + c(n, a, n′) for some action a, and we have
f(n′) = g(n′) + h(n′) = g(n) + c(n, a, n′) + h(n′) ≥ g(n) + h(n) = f(n).
The next step is to prove that “whenever A∗ selects a node n for expansion, the
optimal path to that node has been found”.
If this were not true, there would have to be another frontier node n′ on the optimal path from the start node to n.
By the graph separation property of Figure 3.9, and because f is nondecreasing along any path, n′ would have a lower f-cost than n and would have been selected first.
From these observations, it follows that the sequence of nodes expanded by A∗
using graph-search is in nondecreasing order of f(n).
Hence, the first goal node selected for expansion must be an optimal solution
because f is the true cost for goal nodes (which have h=0) and all later goal
nodes will be at least as expensive.
Since f-costs are nondecreasing on any path, we can draw contours in the state
space as shown below just like the contours in a topographic map.
[Figure: map of Romania showing contours of equal f-cost.]
Inside the contour labeled 400, all nodes have f(n) less than or equal to 400; inside the contour labeled 420, all nodes have f(n) less than or equal to 420.
Because A∗ expands the frontier node of lowest f-cost, an A∗ search fans out from the start node, adding nodes in concentric bands of increasing f-cost.
With better heuristics, the bands will stretch towards the goal state and become
more narrowly focused around the optimal path.
If C∗ is the cost of the optimal solution path, then we can say that:
A∗ expands all nodes with f(n) < C∗.
A∗ might then expand some of the nodes on the goal contour (where f(n) = C∗) before
selecting a goal node.
Completeness requires that there be only finitely many nodes with cost less than or equal to C∗, a condition that is true if all step costs exceed some finite ε and if b is finite.
A∗ is optimally efficient for any given consistent heuristic.
That is, no other optimal algorithm is guaranteed to expand fewer nodes
than A∗.
This is because any algorithm that does not expand all nodes with f(n) < C∗
runs the risk of missing the optimal solution.
A∗ search is complete, optimal, and optimally efficient among all such
algorithms.
This does not mean, however, that A∗ is practical for all search problems.
For most problems, the number of states within the goal contour search
space is still exponential in the length of the solution.
For the problems with constant step costs, the growth in run time as a
function of the optimal solution depth d is analyzed in terms of the absolute
error or the relative error of the heuristic.
The absolute error is defined as Δ ≡ h∗ − h, where h∗ is the actual cost of getting from the root to the goal, and the relative error is defined as ε ≡ (h∗ − h)/h∗.
• The complexity results depend very strongly on the assumptions made about
the state space.
• The simplest model studied is a state space that has a single goal and is
essentially a tree with reversible actions.
• In this case, the time complexity of A∗ is exponential in the maximum absolute error, that is, O(b^Δ).
• For constant step costs, we can write this as O(b^(εd)), where d is the solution depth.
• For almost all heuristics in practical use, the absolute error is at least proportional to the path cost h∗, so ε is constant or growing and the time complexity is exponential in d.
We can also see the effect of a more accurate heuristic: O(b^(εd)) = O((b^ε)^d), so the effective branching factor is b^ε.
When the state space has many goal states, the search can be led astray from the optimal path, and there is an extra cost proportional to the number of goals whose cost is within a factor ε of the optimal cost.
In the general case of a graph search, the situation is even worse.
There can be exponentially many states with f(n) < C∗ even if the absolute
error is bounded by a constant.
For example, consider a version of the vacuum world where the agent can
clean up any square for unit cost without even having to visit it.
In that case, squares can be cleaned in any order.
With N initially dirty squares, there are 2^N states in which some subset of the squares has been cleaned; all of them are on an optimal solution path, and hence satisfy f(n) < C∗, even if the heuristic has an error of 1.
The complexity of A∗ often makes it impractical to insist on finding an optimal solution.
We can use variants of A∗ that find suboptimal solutions quickly, or we can design heuristics that are more accurate but not strictly admissible.
The use of a good heuristic provides enormous savings compared to the use of
an uninformed search.
Because A∗ keeps all generated nodes in memory, it usually runs out of space long before it runs out of time; for this reason, A∗ is not practical for many large-scale problems.
There are, however, algorithms that overcome the space problem without sacrificing optimality or completeness, at a small cost in execution time.
Heuristic Functions
To understand the nature of heuristics in general, we will consider the
heuristics for the 8-puzzle problem.
The 8-puzzle was one of the earliest heuristic search problems.
We know that the objective of the puzzle is to slide the tiles horizontally or vertically into the empty space until the configuration matches the goal configuration, as shown in Figure 3.28 below.
[Figure 3.28: a typical instance of the 8-puzzle: a start state and the goal state.]
The average solution cost for a randomly generated 8-puzzle instance is
about 22 steps.
The branching factor is about 3. (When the empty tile is in the middle, four moves are possible; when it is in a corner, two; and when it is along an edge, three.)
This means that an exhaustive tree search to depth 22 would examine about 3^22 ≈ 3.1 × 10^10 states.
A graph search would cut this down by a factor of about 170,000, because only 9!/2 = 181,440 distinct states are reachable.
This is a manageable number, but the corresponding number for the 15-puzzle is roughly 10^13, so we need to find a good heuristic function.
If we want to find the shortest solutions by using A∗, then we need a heuristic
function that never overestimates the number of steps to the goal.
There is a long history of such heuristics for the 15-puzzle.
The two commonly used heuristics are:
h1 = the number of misplaced tiles.
For Figure 3.28, all of the eight tiles are out of position, so the start state would have
h1 = 8.
h1 is an admissible heuristic because it is clear that any tile that is out of place must be
moved at least once.
h2 = the sum of the distances of the tiles from their goal positions.
Since tiles cannot move along diagonals, the distance we count is the sum of the
horizontal and vertical distances.
This distance is called as the city block distance or Manhattan distance.
h2 is also admissible because all any move can do is, move one tile one step closer to
the goal.
Tiles 1 to 8 in the start state has a Manhattan distance of h2 = 3+1+2+2+2+3+3+2 = 18.
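Both heuristics are straightforward to compute. The sketch below assumes the start state shown in Figure 3.28 (7 2 4 / 5 blank 6 / 8 3 1, with 0 standing for the blank) and reproduces h1 = 8 and h2 = 18:

```python
# Start state of Figure 3.28 and the goal state; 0 marks the blank.
start = (7, 2, 4,
         5, 0, 6,
         8, 3, 1)
goal  = (0, 1, 2,
         3, 4, 5,
         6, 7, 8)

def h1(state):
    """Number of misplaced tiles (the blank is not counted)."""
    return sum(1 for s, g in zip(state, goal) if s != 0 and s != g)

def h2(state):
    """Sum of the Manhattan (city-block) distances of the tiles
    from their goal positions."""
    total = 0
    for i, tile in enumerate(state):
        if tile == 0:
            continue                          # skip the blank
        j = goal.index(tile)                  # where this tile belongs
        total += abs(i // 3 - j // 3) + abs(i % 3 - j % 3)
    return total

print(h1(start), h2(start))  # 8 18
```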
The Effect of Heuristic Accuracy on Performance
• The quality of a heuristic can be characterized by the effective branching factor b∗.
• If the total number of nodes generated by A∗ for a particular problem is N and the solution depth is d, then b∗ is the branching factor that a uniform tree of depth d would have to have in order to contain N + 1 nodes.
• Thus, N + 1 = 1 + b∗ + (b∗)^2 + ··· + (b∗)^d.
• For example, if A∗ finds a solution at depth 5 using 52 nodes, then the effective branching factor is 1.92 (see the sketch after this list).
• The effective branching factor can vary across problem instances, but usually it is
constant for sufficiently hard problems.
• Therefore, experimental measurements of b∗ on a small set of problems can serve as a good guide to the overall usefulness of a heuristic.
• A good heuristic has a value of b∗ close to 1, allowing large problems to be solved at reasonable computational cost.
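Since the defining equation has no closed form for b∗, it is usually solved numerically. A minimal sketch, with a function name and bisection approach of our choosing, reproduces the depth-5, 52-node example from the list above:

```python
def effective_branching_factor(N, d):
    """Solve N + 1 = 1 + b* + (b*)**2 + ... + (b*)**d for b* by bisection."""
    target = N + 1
    lo, hi = 1.0, float(N)                   # b* must lie in (1, N]
    for _ in range(100):                     # 100 halvings: ample precision
        mid = (lo + hi) / 2
        total = sum(mid ** i for i in range(d + 1))
        lo, hi = (mid, hi) if total < target else (lo, mid)
    return lo

print(round(effective_branching_factor(52, 5), 2))  # 1.92
```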
To test the heuristic functions h1 and h2, we generated 1200 random problems
with solution lengths from 2 to 24 (100 problems for each even number) and
solved them with iterative deepening search and with A∗ tree search using both
h1 and h2.
Figure 3.29 gives the average number of nodes generated by each strategy
and the effective branching factor.
The results show that h2 is better than h1, and that both are far better than iterative deepening search.
Even for small problems with d = 12, A∗ with h2 is 50,000 times more efficient
than uninformed iterative deepening search.
From the definitions of the two heuristics, we can see that for any node n,
h2(n) ≥ h1(n).
Hence, we say that h2 dominates h1: A∗ using h2 will never expand more nodes than A∗ using h1.
[Figure 3.29: search costs (average nodes generated) and effective branching factors for iterative deepening search and for A∗ with h1 and h2.]
Generating Admissible Heuristics from Relaxed Problems
We have seen that both h1 (misplaced tiles) and h2 (Manhattan distance) are
good heuristics for the 8-puzzle and we also know that h2 is better than h1.
How might one have come up with h2?
Is it possible for a computer to invent such a heuristic mechanically?
h1 and h2 are estimates of the remaining path length for the 8-puzzle, but they
are also perfectly accurate path lengths for simplified versions of the puzzle.
If the rules of the puzzle were changed so that a tile could move anywhere
instead of just to the adjacent empty square, then h1 would give the exact
number of steps in the shortest solution.
Similarly, if a tile could move one square in any direction, even onto an
occupied square, then h2 would give the exact number of steps in the shortest
solution.
A problem with fewer restrictions on the actions is called a relaxed problem.
The state-space graph of the relaxed problem is a supergraph of the original
state space because the removal of restrictions creates additional edges in the
graph.
Because the relaxed problem adds edges to the state space, any optimal solution of the original problem is also a solution of the relaxed problem; but the relaxed problem may have better solutions if the added edges provide shortcuts.
Hence, the cost of an optimal solution to a relaxed problem is an admissible
heuristic for the original problem.
Moreover, because the derived heuristic is an exact cost for the relaxed problem, it is also consistent.
If a problem definition is written in a formal language, then it is possible to
construct relaxed problems automatically.
For example, if the 8-puzzle actions are described as given below:
A tile can move from square A to square B if A is horizontally or vertically
adjacent to B and B is blank.
we can generate three relaxed problems by removing one or both of the
conditions:
a) A tile can move from square A to square B if A is adjacent to B.
b) A tile can move from square A to square B if B is blank.
c) A tile can move from square A to square B.
From (a), a tile can reach its goal square in exactly its Manhattan distance, so the relaxed solution cost is h2; from (c), each misplaced tile needs exactly one move, so the relaxed solution cost is h1.
If a collection of admissible heuristics h1, . . . , hm is available for some problem and none of them dominates any of the others, which should we choose?
We need not make a choice.
We can have the best of all by defining:
h(n) = max{h1(n), h2(n), . . . , hm(n)} .
This composite heuristic uses whichever function is most accurate on the
given node.
Because the component heuristics are admissible, h is also admissible.
It can also be proved that h is consistent.
Also, h dominates all the component heuristics.
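In code, the composite is a one-liner. The sketch below reuses the 8-puzzle heuristics h1 and h2 and the start state from the earlier sketch:

```python
def h_max(*components):
    """Pointwise maximum of admissible heuristics: itself admissible,
    and consistent whenever every component is."""
    return lambda state: max(h(state) for h in components)

h = h_max(h1, h2)
print(h(start))  # 18: h2 dominates h1 on this state, so the max selects it
```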
Generating Admissible Heuristics from Subproblems: Pattern databases
Admissible heuristics can also be derived from the solution cost of a
subproblem of a given problem.
For example, Figure 3.30 shows a subproblem of the 8-puzzle instance of
Figure 3.28.
The subproblem involves getting tiles 1, 2, 3, 4 into their correct positions.
Clearly, the cost of the optimal solution of this subproblem is a lower bound
on the cost of the complete problem.
It turns out to be more accurate than Manhattan distance in some cases.
The idea behind pattern databases is to store these exact solution costs for every possible subproblem instance: in our example, every possible configuration of the four tiles and the blank. (The locations of the other four tiles are irrelevant for the purposes of solving the subproblem, but moves of those tiles do count toward the cost.)
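One way to build such a database is a breadth-first search backwards from the goal in the abstracted state space. The sketch below is a simplified illustration under our own naming, not an optimized implementation; it tracks tiles 1-4 and the blank, hides the other tiles, and counts every move, as the text specifies.

```python
from collections import deque

PATTERN = {1, 2, 3, 4}                 # tiles the database distinguishes
GOAL = (0, 1, 2, 3, 4, 5, 6, 7, 8)     # goal state, 0 = blank

def abstract(state):
    """Hide the non-pattern tiles: they all become the same symbol 'x'."""
    return tuple(t if t == 0 or t in PATTERN else 'x' for t in state)

def neighbours(state):
    """States reachable by sliding one tile into the blank square."""
    i = state.index(0)
    r, c = divmod(i, 3)
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        if 0 <= r + dr < 3 and 0 <= c + dc < 3:
            j = (r + dr) * 3 + (c + dc)
            s = list(state)
            s[i], s[j] = s[j], s[i]
            yield tuple(s)

def build_pattern_db():
    """Breadth-first search back from the abstracted goal, recording the
    exact cost of every pattern configuration (every move costs 1, so
    moves of the hidden tiles count toward the cost too)."""
    root = abstract(GOAL)
    db = {root: 0}
    queue = deque([root])
    while queue:
        s = queue.popleft()
        for s2 in neighbours(s):
            if s2 not in db:
                db[s2] = db[s] + 1
                queue.append(s2)
    return db

db = build_pattern_db()
h_pattern = lambda state: db[abstract(state)]  # admissible lookup heuristic
print(h_pattern((7, 2, 4, 5, 0, 6, 8, 3, 1)))  # exact subproblem cost for
                                               # the Figure 3.28 start state
```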
[Figure 3.30: a subproblem of the 8-puzzle instance of Figure 3.28: getting tiles 1, 2, 3, and 4 into their correct positions.]