HEURISTIC SEARCH
4.0 Introduction
4.1 An Algorithm for Heuristic Search
4.2 Admissibility, Monotonicity, and Informedness
4.3 Using Heuristics in Games
4.4 Complexity Issues
4.5 Epilogue and References
4.6 Exercises
George F. Luger
ARTIFICIAL INTELLIGENCE, 5th edition
Structures and Strategies for Complex Problem Solving
Luger: Artificial Intelligence, 5th edition. © Pearson Education Limited, 2005
More Formally: Why Do Heuristic Functions Work?
In any search problem with at most b choices at each node and a goal at
depth d, a naive search algorithm must, in the worst case, examine around
O(b^d) nodes before finding a solution (exponential time complexity).
Heuristics improve the efficiency of search algorithms by reducing the
effective branching factor from b to (ideally) a low constant b* such that
1 ≤ b* << b
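The effective branching factor b* can be recovered numerically from a search trace: if a search expanded N nodes to find a solution at depth d, b* is the branching factor a uniform tree of depth d would need to contain N+1 nodes. A minimal sketch (the helper name and bisection approach are my own, not from the slides):

```python
# Solve for the effective branching factor b* given N expanded nodes and
# solution depth d, using N + 1 = 1 + b* + b*^2 + ... + b*^d.

def effective_branching_factor(n_nodes, depth, tol=1e-6):
    """Find b* by bisection; the series sum is increasing in b."""
    def total(b):
        # 1 + b + b^2 + ... + b^depth
        return sum(b ** i for i in range(depth + 1))
    lo, hi = 1.0, float(n_nodes)
    target = n_nodes + 1
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if total(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Using the A*(h1) figures quoted later (539 nodes at depth 14):
print(round(effective_branching_factor(539, 14), 2))   # → 1.44
```

So even a modest heuristic pulls the branching factor from b ≈ 3 (8-puzzle) down toward a small constant.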
Heuristic Functions
A heuristic function f(n) estimates the "cost" of getting from node n to the
goal state, so that the node with the least cost among all possible choices
can be selected for expansion first.
Three approaches to defining f:
f measures the value of the current state (its "goodness")
f measures the estimated cost of getting to the goal from the current state:
f(n) = h(n), where h(n) is an estimate of the cost to get from n to a goal
f measures the estimated cost of getting to the goal state from the current
state plus the cost of the existing path to it. Often, in this case, we
decompose f:
f(n) = g(n) + h(n), where g(n) is the cost to get to n (from the initial state)
Approach 1: f Measures the Value of the Current State
Usually the case when solving optimization problems
Finding a state such that the value of the metric f is optimized
Often, in these cases, f could be a weighted sum of a set of component values:
N-Queens
Example: the number of queens under attack ...
Data mining
Example: the "predictiveness" (a.k.a. accuracy) of a discovered rule
Approach 2: f Measures the Cost to the Goal
A state X is better than a state Y if the estimated cost of getting from X to
the goal is lower than that of Y, because X would be closer to the goal than Y
• 8-Puzzle
h1: the number of misplaced tiles (squares with a number)
h2: the sum of the distances of the tiles from their goal positions
h2 = 3+1+2+2+2+3+3+2 = 18
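Both 8-puzzle heuristics are easy to state in code. A sketch with an assumed state encoding (a tuple of 9 entries read row by row, 0 for the blank; the goal layout below is the standard one and matches the Koehn slides later in this deck):

```python
# h1 counts misplaced tiles; h2 sums Manhattan distances to goal positions.

GOAL = (1, 2, 3, 4, 5, 6, 7, 8, 0)  # assumed goal layout

def h1(state, goal=GOAL):
    """Number of misplaced tiles (the blank is not counted)."""
    return sum(1 for s, g in zip(state, goal) if s != 0 and s != g)

def h2(state, goal=GOAL):
    """Total Manhattan distance of every tile from its goal square."""
    pos = {tile: (i // 3, i % 3) for i, tile in enumerate(goal)}
    total = 0
    for i, tile in enumerate(state):
        if tile == 0:
            continue
        r, c = i // 3, i % 3
        gr, gc = pos[tile]
        total += abs(r - gr) + abs(c - gc)
    return total

start = (7, 2, 4, 5, 0, 6, 8, 3, 1)   # the scrambled state used on the slides
print(h1(start), h2(start))           # → 6 14
```

These are exactly the h1(S) = 6 and h2(S) = 14 values computed on the "Admissible Heuristics" slides below.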
Approach 3: f Measures the Total Cost of the Solution Path (Admissible
Heuristic Functions)
A heuristic function f(n) = g(n) + h(n) is admissible if h(n) never
overestimates the cost to reach the goal.
Admissible heuristics are "optimistic": "the cost is not that much ..."
Since g(n) is the exact cost to reach node n from the initial state, f(n)
never overestimates the true cost to reach the goal state through node n.
Theorem: A* search is optimal if h(n) is admissible, i.e., the search using
h(n) returns an optimal solution.
Given h2(n) ≥ h1(n) for all n, it is always more efficient to use h2(n):
h2 is more realistic than h1 (more informed), though both are optimistic.
Traditional Informed Search Strategies
Greedy best-first search
"Always chooses the successor node with the best f value", where f(n) = h(n)
We choose the node that is nearest to the final state among all possible
choices
A* search
Best-first search using an "admissible" heuristic function f that takes into
account the current cost g
Always returns the optimal solution path
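The A* strategy just described can be sketched compactly with a priority queue. A minimal version, assuming a dict-based graph interface of my own (each node maps to a list of (neighbor, step_cost) pairs) and an admissible h:

```python
import heapq

def a_star(start, goal, graph, h):
    """Return (optimal path, cost) using f(n) = g(n) + h(n)."""
    frontier = [(h(start), 0, start, [start])]   # (f, g, node, path)
    best_g = {start: 0}
    while frontier:
        f, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return path, g
        for nbr, cost in graph.get(node, []):
            g2 = g + cost
            if g2 < best_g.get(nbr, float("inf")):   # found a cheaper route
                best_g[nbr] = g2
                heapq.heappush(frontier, (g2 + h(nbr), g2, nbr, path + [nbr]))
    return None, float("inf")

# Toy graph with an (admissible) heuristic table.
graph = {"A": [("B", 1), ("C", 4)], "B": [("C", 2), ("D", 5)], "C": [("D", 1)]}
h = {"A": 4, "B": 3, "C": 1, "D": 0}.get
path, cost = a_star("A", "D", graph, h)
print(path, cost)   # → ['A', 'B', 'C', 'D'] 4
```

Note how the queue is ordered by f = g + h; the greedy variant would order by h alone and lose the optimality guarantee.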
Admissible Heuristics
● E.g., for the 8-puzzle
Philipp Koehn, Artificial Intelligence: Informed Search, 15 February 2024
Admissible Heuristics
● E.g., for the 8-puzzle
– h1(n) = number of misplaced tiles
– h2(n) = total Manhattan distance
(i.e., no. of squares from desired location of each tile)
● h1(S) = ?
● h2(S) = ?
Admissible Heuristics
● E.g., for the 8-puzzle
– h1(n) = number of misplaced tiles
– h2(n) = total Manhattan distance
(i.e., no. of squares from desired location of each tile)
● h1(S) = 6
● h2(S) = 4+0+3+3+1+0+2+1 = 14
Dominance
● If h2(n) ≥ h1(n) for all n (both admissible), then h2 dominates h1 and is
better for search
● Typical search costs (d = depth of solution for the 8-puzzle):
d = 14: IDS = 3,473,941 nodes; A*(h1) = 539 nodes; A*(h2) = 113 nodes
d = 24: IDS ≈ 54,000,000,000 nodes; A*(h1) = 39,135 nodes; A*(h2) = 1,641 nodes
● Given any admissible heuristics ha, hb, h(n) = max(ha(n), hb(n)) is also
admissible and dominates ha and hb
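The max-combination rule above is a one-liner in practice. A tiny sketch (the `combine` helper and toy heuristics are mine, for illustration only):

```python
def combine(*heuristics):
    """Pointwise max of admissible heuristics: admissible, dominates each."""
    return lambda n: max(h(n) for h in heuristics)

ha = lambda n: n // 2                   # toy admissible estimates on integers
hb = lambda n: n - 3 if n > 3 else 0
h = combine(ha, hb)
print(h(10))   # max(5, 7) → 7
```

Because each component never overestimates the true cost, neither does their max, and the max is by construction at least as informed as either alone.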
The successive stages of open and closed that generate this graph are:
Fig 4.16 State space generated in heuristic search of the 8-puzzle graph.
Fig 4.17 Open and closed as they appear after the 3rd iteration of heuristic
search
Fig 4.18 Comparison of state space searched using heuristic search with space searched by
breadth-first search. The proportion of the graph searched heuristically is shaded.
The optimal search selection is in bold. Heuristic used is f(n) = g(n) + h(n) where
h(n) is tiles out of place.
Fig 4.22 Heuristic measuring conflict applied to states of tic-tac-toe.
Local Search for Optimization
• Note that for many types of problems, the path to a goal is irrelevant (we
simply want the solution; consider the 8-queens).
• If the path to the goal does not matter, we might consider a class of
algorithms that don't concern themselves with paths at all.
• Local search algorithms operate using a single current node (rather than
multiple paths) and generally move only to neighbors of that node.
• Typically, the paths followed by the search are not retained (the benefit
being a potentially substantial memory saving).
Local Search for Optimization
• Local search
– Keep track of a single current state
– Move only to neighboring states
– Ignore paths
• Advantages:
– Uses very little memory
– Can often find reasonable solutions in large or infinite (continuous)
state spaces
• "Pure optimization" problems
– All states have an objective function
– Goal is to find the state with maximum (or minimum) objective value
– Does not quite fit into the path-cost/goal-state formulation
Local Search for
Optimization
• In addition to finding goals, local search algorithms are
useful for solving pure optimization problems, in which
we aim to find the best state according to an objective
function.
• Nature, for example, provides an objective function –
reproductive fitness.
• To understand local search, we consider state-space
landscapes, as shown next.
Optimization
• So what is optimization?
• Find the minimum or maximum of an objective function (usually given a set
of constraints):
Local Search for Optimization
State-Space Landscape
• If elevation corresponds to cost, then the aim is to find the lowest
valley (a global minimum); if elevation corresponds to an objective
function, then the aim is to find the highest peak (a global maximum).
Hill-Climbing (Greedy Local Search)
function HILL-CLIMBING(problem) returns a state that is a local maximum
  inputs: problem, a problem
  local variables: current, a node
                   neighbor, a node
  current ← MAKE-NODE(INITIAL-STATE[problem])
  loop do
    neighbor ← a highest-valued successor of current
    if VALUE[neighbor] ≤ VALUE[current] then return STATE[current]
    current ← neighbor
The minimization version reverses the inequality and looks for the
lowest-valued successor.
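The HILL-CLIMBING pseudocode above translates almost line for line into Python. A sketch with an assumed interface (`neighbors(state)` yields successors, `value(state)` is the objective to maximize):

```python
def hill_climbing(start, neighbors, value):
    """Greedy local search: climb until no successor improves on current."""
    current = start
    while True:
        succs = list(neighbors(current))
        if not succs:
            return current
        best = max(succs, key=value)
        if value(best) <= value(current):   # no uphill move: local maximum
            return current
        current = best

# Toy landscape: maximize -(x - 3)^2 over the integers, stepping by ±1.
result = hill_climbing(0,
                       neighbors=lambda x: [x - 1, x + 1],
                       value=lambda x: -(x - 3) ** 2)
print(result)   # → 3
```

On this single-peak landscape the climb always reaches the global maximum; the drawbacks listed later (local maxima, plateaus, ridges) appear only on less friendly landscapes.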
Hill-Climbing
• "A loop that continuously moves towards increasing value"
– terminates when a peak is reached
– a.k.a. greedy local search
• Value can be either
– an objective function value
– a heuristic function value (minimized)
• Hill climbing does not look ahead beyond the immediate neighbors
• Can randomly choose among the set of best successors
– if multiple successors have the best value
• "Climbing Mount Everest in thick fog with amnesia"
Hill-Climbing (8-Queens)
• Need to convert to an optimization problem!
Hill-Climbing (8-Queens)
• State
– all 8 queens on the board in some configuration
• Successor function
– move a single queen to another square in the same column
• Example of a heuristic function h(n):
– the number of pairs of queens that are attacking each other
– (so we want to minimize this)
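The attacking-pairs heuristic is easy to compute. A sketch with an assumed encoding of my own, `board[c]` = row of the queen in column c (which also bakes in the one-queen-per-column successor function):

```python
from itertools import combinations

def attacking_pairs(board):
    """Number of queen pairs attacking each other (same row or diagonal)."""
    h = 0
    for (c1, r1), (c2, r2) in combinations(enumerate(board), 2):
        if r1 == r2 or abs(r1 - r2) == abs(c1 - c2):
            h += 1
    return h

print(attacking_pairs([0, 1, 2, 3, 4, 5, 6, 7]))   # all on one diagonal → 28
print(attacking_pairs([4, 6, 0, 2, 7, 5, 3, 1]))   # a valid solution → 0
```

Hill climbing on this h moves one queen within its column to whichever square most reduces the count, stopping when no move improves it.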
Hill-Climbing (8-Queens)
• h = number of pairs of queens that are attacking each other
• h = 17 for the above state
Hill-Climbing (8-Queens)
• Randomly generated 8-queens starting states...
• 14% of the time it solves the problem
• 86% of the time it gets stuck at a local minimum
• However...
– it takes only 4 steps on average when it succeeds
– and 3 on average when it gets stuck
Hill-Climbing Drawbacks
• Local maxima
• Plateaus
• Diagonal ridges
Escaping Shoulders: Sideways Moves
• If no downhill (uphill) moves are available, allow sideways moves in the
hope that the algorithm can escape
– need to place a limit on the possible number of sideways moves to avoid
infinite loops
• For 8-queens
– now allow sideways moves with a limit of 100
– raises the percentage of problem instances solved from 14% to 94%
– however...
• 21 steps for every successful solution
• 64 for each failure
Tabu Search
• Prevent returning quickly to the same state
• Keep a fixed-length queue (the "tabu list")
• Add the most recent state to the queue; drop the oldest
• Never make a step that is currently tabu'ed
• Properties:
– as the size of the tabu list grows, hill-climbing asymptotically becomes
"non-redundant" (won't look at the same state twice)
– in practice, a reasonably sized tabu list (say, 100 or so) improves the
performance of hill climbing on many problems
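The fixed-length queue maps directly onto `collections.deque` with `maxlen`, which drops the oldest entry automatically. A sketch of tabu-restricted hill climbing (interface and step budget are my own assumptions):

```python
from collections import deque

def tabu_hill_climbing(start, neighbors, value, tabu_size=100, max_steps=1000):
    """Hill climbing that never revisits the last `tabu_size` states."""
    current, best = start, start
    tabu = deque(maxlen=tabu_size)         # the "tabu list"
    for _ in range(max_steps):
        tabu.append(current)
        candidates = [s for s in neighbors(current) if s not in tabu]
        if not candidates:
            break
        current = max(candidates, key=value)   # best non-tabu move, even if worse
        if value(current) > value(best):
            best = current                     # remember the best state seen
    return best

# Toy run on an integer landscape with a single peak at x = 3.
print(tabu_hill_climbing(0, lambda x: [x - 1, x + 1],
                         lambda x: -(x - 3) ** 2))   # → 3
```

Because the walk may be forced past the peak by the tabu restriction, the function returns the best state *seen* rather than the final current state.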
Escaping Shoulders/Local Optima: Enforced Hill Climbing
• Perform breadth-first search from a local optimum
– to find the next state with a better h value
• Typically,
– prolonged periods of exhaustive search
– bridged by relatively quick periods of hill-climbing
• A middle ground between local and systematic search
Hill-Climbing: Stochastic Variations
• Stochastic hill-climbing
– random selection among the uphill moves
– the selection probability can vary with the steepness of the uphill move
• To avoid getting stuck in local minima
– random-walk hill-climbing
– random-restart hill-climbing
– hill-climbing with both
Hill-Climbing: Stochastic Variations
When the state-space landscape has local minima, any search that moves only
in the greedy direction cannot be complete
Random walk, on the other hand, is asymptotically complete
Idea: put random walk into greedy hill-climbing
Hill-Climbing with Random Restarts
• If at first you don't succeed, try, try again!
• Different variations
– for each restart: run until termination vs. run for a fixed time
– run a fixed number of restarts or run indefinitely
• Analysis
– say each search has probability p of success
• e.g., for 8-queens, p = 0.14 with no sideways moves
– expected number of restarts?
– expected number of steps taken?
• If you want to pick one local search algorithm, learn this one
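The two analysis questions have short answers via the geometric distribution: with success probability p per run, the expected number of runs until the first success is 1/p. A quick check using the 8-queens figures quoted earlier (p = 0.14, about 4 steps per success, 3 per failure):

```python
p = 0.14
expected_runs = 1 / p                       # total runs until first success
expected_failures = (1 - p) / p             # failed runs before the success
expected_steps = expected_failures * 3 + 4  # failures cost ~3 steps, success ~4

print(round(expected_runs, 1), round(expected_steps, 1))   # → 7.1 22.4
```

So random-restart hill climbing solves a random 8-queens instance in roughly 7 runs and about 22 steps overall, which is remarkably cheap.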
Hill-Climbing with Random Walk
• At each step do one of the two
– Greedy: with probability p move to the neighbor with the largest value
– Random: with probability 1-p move to a random neighbor
Hill-Climbing with Both
• At each step do one of the three
– Greedy: move to the neighbor with the largest value
– Random walk: move to a random neighbor
– Random restart: resample a new current state
Simulated Annealing
• Simulated annealing = a physics-inspired twist on random walk
• Basic ideas:
– like hill-climbing, identify the quality of the local improvements
– instead of picking the best move, pick one randomly
– say the change in the objective function is d
– if d is positive, then move to that state
– otherwise:
• move to that state with probability e^(d/T), for the current "temperature" T
• thus worse moves (very large negative d) are executed less often
– however, there is always a chance of escaping from local maxima
– over time, make it less likely to accept locally bad moves
– (can also make the size of the move random, i.e., allow steps of
varying size)
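The acceptance rule and cooling loop can be sketched in a few lines. This follows the standard Metropolis criterion; the exponential cooling schedule and all parameter values below are my own illustrative assumptions, not from the slides:

```python
import math
import random

def simulated_annealing(start, neighbors, value, t0=10.0, cooling=0.95,
                        t_min=1e-3, rng=random):
    """Random-walk search that accepts worse moves with prob e^(d/T)."""
    current, t = start, t0
    while t > t_min:
        nxt = rng.choice(neighbors(current))
        d = value(nxt) - value(current)
        # Always accept improvements; accept worse moves with probability
        # e^(d/T), which shrinks as d gets more negative and as T cools.
        if d > 0 or rng.random() < math.exp(d / t):
            current = nxt
        t *= cooling
    return current

random.seed(0)
best = simulated_annealing(0, lambda x: [x - 1, x + 1],
                           lambda x: -(x - 3) ** 2)
print(best)
```

Early on (large T) the walk is nearly random; as T decays the acceptance probability for bad moves vanishes and the loop degenerates into plain hill climbing.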
Physical Interpretation of Simulated Annealing
• A physical analogy:
• imagine letting a ball roll downhill on the function surface
– this is like hill-climbing (for minimization)
• now imagine shaking the surface while the ball rolls, gradually reducing
the amount of shaking
– this is like simulated annealing
• Annealing = the physical process of cooling a liquid or metal until its
particles settle into a frozen crystal state
• Simulated annealing:
– free variables are like particles
– seek a "low energy" (high quality) configuration
– slowly reduce the temperature T while the particles move around randomly
Simulated Annealing
Best-First Search
● Idea: use an evaluation function for each node
– an estimate of "desirability"
⇒ expand the most desirable unexpanded node
● Implementation:
the fringe is a queue sorted in decreasing order of desirability
● Special cases
– greedy search
– A* search
Romania
[figures: map of Romania with cities and roads]
Greedy Search
● State evaluation function h(n) (heuristic)
= estimate of cost from n to the closest goal
● E.g., hSLD(n) = straight-line distance from n to Bucharest
● Greedy search expands the node that appears to be closest to the goal
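Greedy search is best-first search ordered by h alone. A minimal sketch on an assumed graph interface of my own (a successor function plus a heuristic table; the fragment below uses a handful of standard straight-line-distance estimates to Bucharest for illustration):

```python
import heapq

def greedy_best_first(start, goal, neighbors, h):
    """Expand whichever frontier node looks closest to the goal."""
    frontier = [(h(start), start, [start])]
    visited = {start}                       # repeated-state check
    while frontier:
        _, node, path = heapq.heappop(frontier)
        if node == goal:
            return path
        for nbr in neighbors(node):
            if nbr not in visited:
                visited.add(nbr)
                heapq.heappush(frontier, (h(nbr), nbr, path + [nbr]))
    return None

edges = {"Arad": ["Sibiu", "Timisoara"], "Sibiu": ["Fagaras", "Rimnicu"],
         "Fagaras": ["Bucharest"], "Rimnicu": ["Pitesti"],
         "Pitesti": ["Bucharest"], "Timisoara": []}
h = {"Arad": 366, "Sibiu": 253, "Timisoara": 329, "Fagaras": 176,
     "Rimnicu": 193, "Pitesti": 100, "Bucharest": 0}.get
print(greedy_best_first("Arad", "Bucharest", lambda n: edges.get(n, []), h))
# → ['Arad', 'Sibiu', 'Fagaras', 'Bucharest']
```

Note that this path is not the cheapest one in the full map; greedy search follows whatever looks closest and sacrifices optimality, which is exactly the weakness A* repairs.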
Romania with Step Costs in km
Greedy Search Example
[figures: successive expansion steps of greedy search on the Romania map]
Properties of Greedy Search
● Complete? No: it can get stuck in loops, e.g., with Oradea as the goal,
Iasi → Neamt → Iasi → Neamt → ...
It is complete in finite spaces with repeated-state checking
● Time? O(b^m), but a good heuristic can give dramatic improvement
● Space? O(b^m): keeps all nodes in memory
● Optimal? No
A* Search
A* Search
● Idea: avoid expanding paths that are already expensive
● State evaluation function f(n) = g(n) + h(n)
– g(n) = cost so far to reach n
– h(n) = estimated cost to the goal from n
– f(n) = estimated total cost of the path through n to the goal
● A* search uses an admissible heuristic
– i.e., h(n) ≤ h*(n), where h*(n) is the true cost from n
– we also require h(n) ≥ 0, so h(G) = 0 for any goal G
● E.g., hSLD(n) never overestimates the actual road distance
● Theorem: A* search is optimal
A* Search Example
[figures: successive expansion steps of A* search on the Romania map]
Optimality of A*
● Suppose some suboptimal goal G2 has been generated and is in the queue
● Let n be an unexpanded node on a shortest path to an optimal goal G
f(G2) = g(G2)   since h(G2) = 0
      > g(G)    since G2 is suboptimal
      ≥ f(n)    since h is admissible
● Since f(G2) > f(n), A* will never terminate at G2
Properties of A*
● Complete? Yes, unless there are infinitely many nodes with f ≤ f(G)
● Time? Exponential in [relative error in h × length of solution]
● Space? Keeps all nodes in memory
● Optimal? Yes: it cannot expand nodes of cost f(i+1) until every node of
cost f(i) is finished
Where C* is the optimal solution cost:
A* expands all nodes with f(n) < C*
A* expands some nodes with f(n) = C*
A* expands no nodes with f(n) > C*