Week-6 - Informed Search and Local Search

Dr. Ahmet Esad TOP
[email protected]
oGreedy best-first search: expand the node that seems closest to the goal…

oWhat can go wrong?


[Figure: Romania map – at each step, greedy search expands the city closest to the goal by straight-line distance]

oIs it complete?
o No (can get stuck in loops)
o From Iasi to Fagaras it heads for Neamt, a dead end: Iasi → Neamt → Iasi → Neamt → …
o Graph search version, however, is complete (for finite spaces)
oComplete?
o No – can get stuck in loops, e.g., Iasi → Neamt → Iasi → Neamt → …
oTime?
o O(b^m) in the worst case
o m = maximum depth of the search space
o but a good heuristic can give dramatic improvement
oSpace?
o O(b^m) – keeps all nodes in memory
oOptimal?
o No (not guaranteed to find the lowest-cost solution)
oA* expands the fringe node with lowest f value where
o f(n) = g(n) + h(n)
o g(n) is the cost to reach n
o h(n) is an optimistic estimate of the least cost from n to a goal node:
o 0 ≤ h(n) ≤ h*(n)
oA* tree search is optimal
oIts performance depends heavily on the heuristic h
o Intuition: expand the node that seems closest to the goal and cheapest to reach so far

[Figure: example search with heuristic values h(x)]
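
To make the f = g + h bookkeeping concrete, here is a minimal Python sketch of A* graph search (the names astar, neighbors, goal_test are illustrative, not from the slides; with h(n) = 0 for all n it reduces to uniform-cost search):

import heapq
import itertools

def astar(start, goal_test, neighbors, h):
    # A* graph search: always expand the fringe node with the lowest
    # f(n) = g(n) + h(n).  neighbors(n) yields (successor, step_cost) pairs;
    # h(n) is the heuristic estimate of the cost from n to a goal.
    tie = itertools.count()                      # tie-breaker: never compare states
    frontier = [(h(start), next(tie), 0, start, [start])]
    best_g = {start: 0}                          # cheapest cost-to-reach found so far
    while frontier:
        f, _, g, state, path = heapq.heappop(frontier)
        if goal_test(state):
            return path, g
        if g > best_g.get(state, float("inf")):  # stale queue entry, skip it
            continue
        for succ, cost in neighbors(state):
            g2 = g + cost
            if g2 < best_g.get(succ, float("inf")):
                best_g[succ] = g2
                heapq.heappush(frontier, (g2 + h(succ), next(tie), g2, succ, path + [succ]))
    return None, float("inf")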
oComplete?
o Yes (unless there are infinitely many nodes with f ≤ f(G))
oTime?
o Exponential in path length: O(b^m)
oSpace?
o Keeps all generated nodes in memory, so also exponential: O(b^m)
o Space, not time, is the major problem
oOptimal?
o Yes (provided that h is admissible [for tree] or consistent [for graph]).
o Every consistent heuristic is also admissible
oOptimally efficient? (something even stronger: no search algorithm could do better!)
o Yes
o No algorithm with the same heuristic is guaranteed to expand fewer nodes
oA* keeps the entire explored region in memory
o => will run out of space before you get bored waiting for the answer
oThere are variants that use less memory
o IDA* works like iterative deepening, except it uses an f-limit instead of a depth limit
o On each iteration, remember the smallest f-value that exceeds the current limit, use as new
limit
o Very inefficient when f is real-valued and each node has a unique value
o RBFS is a recursive depth-first search that uses an f-limit = the f-value of the best
alternative path available from any ancestor of the current node
o When the limit is exceeded, the recursion unwinds but remembers the best reachable f-value on
that branch
o SMA* uses all available memory for the queue, minimizing thrashing
o When full, drop worst node on the queue but remember its value in the parent
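
A rough Python sketch of the IDA* f-limit idea described above (the names ida_star, neighbors, goal_test are illustrative):

def ida_star(start, goal_test, neighbors, h):
    # Iterative deepening on f = g + h: each pass is a depth-first search
    # cut off at the current f-limit; the next limit becomes the smallest
    # f-value that exceeded the current one.
    def dfs(path, g, limit):
        node = path[-1]
        f = g + h(node)
        if f > limit:
            return None, f                    # report the overflowing f-value
        if goal_test(node):
            return path, f
        smallest = float("inf")               # smallest f seen above the limit
        for succ, cost in neighbors(node):
            if succ in path:                  # avoid cycles along the current path
                continue
            found, t = dfs(path + [succ], g + cost, limit)
            if found is not None:
                return found, t
            smallest = min(smallest, t)
        return None, smallest

    limit = h(start)
    while limit < float("inf"):
        found, limit = dfs([start], 0, limit)
        if found is not None:
            return found
    return None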
oOften, admissible heuristics are exact solution costs of relaxed problems, in which additional actions are available

[Figure: relaxed-problem heuristics – straight-line distance (e.g., 366) for route finding and Manhattan distance (e.g., 15) for the 8-puzzle]

oProblem P2 is a relaxed version of P1 if A2(s) ⊇ A1(s) for every s


oTheorem: h2*(s) ≤ h1*(s) for every s, so h2*(s) is admissible for P1
o Dominance: h1 ≥ h2 if ∀n: h1(n) ≥ h2(n)
o Roughly speaking, larger is better as long as both are admissible
o The zero heuristic is pretty bad (what does A* do with h=0? [UCS])
o The exact heuristic is pretty good, but usually too expensive!
o What if we have two heuristics, neither dominates the other?
o Form a new heuristic by taking the max of both:
h(n) = max( h1(n), h2(n))
o Max of admissible heuristics is admissible and dominates both!
o It is all about constructing a superior heuristic from two (or more)
o Minimum number of knight’s moves to get from A to B?
o h1 = (Manhattan distance)/3 [each L-shaped move covers at most 3 Manhattan steps]
o Another thing that can be considered is color
o A knight changes square color every time it moves
o so the number of moves is even if A and B are the same color, odd otherwise
o h1’ = h1 rounded up to the correct parity
o e.g., if MH/3 gives 2.7 and A, B are the same color, round up to the next even integer: 4 moves

o h2 = (Euclidean distance)/√5 (rounded up to correct parity) [each knight move has length √5]


o h3 = (max of x shift or y shift)/2 (rounded up to correct parity)
o A knight moves at most 2 squares in any single direction
o e.g., if b is 9 squares away from a in the y direction, at least ⌈9/2⌉ = 5 moves are needed

o h(n) = max( h1’(n), h2(n), h3(n)) is admissible!
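
A Python sketch of this max-of-heuristics construction (the knight_h helper and the (x, y) square encoding are assumptions for illustration):

import math

def knight_h(a, b):
    # Admissible heuristic for the minimum number of knight moves from
    # square a to square b, squares encoded as (x, y) integer pairs.
    # The max of three admissible heuristics is admissible and dominates each.
    dx, dy = abs(a[0] - b[0]), abs(a[1] - b[1])
    if dx == 0 and dy == 0:
        return 0
    same_color = (a[0] + a[1]) % 2 == (b[0] + b[1]) % 2

    def round_up_to_parity(v):
        # Round up to the nearest integer of the right parity: an even
        # move count if a and b share a color, odd otherwise.
        m = math.ceil(v)
        if (m % 2 == 0) != same_color:
            m += 1
        return m

    h1 = round_up_to_parity((dx + dy) / 3)                      # <= 3 Manhattan steps per move
    h2 = round_up_to_parity(math.hypot(dx, dy) / math.sqrt(5))  # each move has length sqrt(5)
    h3 = round_up_to_parity(max(dx, dy) / 2)                    # <= 2 squares per move on any axis
    return max(h1, h2, h3)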


oInformed search methods use a heuristic function h(n) that estimates the
cost of a solution from n
oGreedy best-first search expands nodes with minimal h(n)
o It is not optimal, but is often efficient.
oA* search expands nodes with minimal f(n)=g(n)+h(n)
o It is complete and optimal
o provided that h(n) is admissible [for tree] or consistent [for graph]
o The space complexity of A* is still excessive
oThe performance of heuristic search algorithms depends on the quality of
the h(n) function
oOne can sometimes construct good heuristics by relaxing the problem
definition
o Tree search keeps unexplored alternatives on the fringe (ensures completeness)
o BFS, DFS, UCS, A* are complete, but in the worst case you try everything (slow)

o Local search: improve a single option until you can’t make it better (no fringe!)
o Abandoning the fringe (the idea of exploring everything) → losing the completeness guarantee

o New successor function: local changes (to existing state)

o Generally much faster and more memory efficient (but incomplete and suboptimal)
oIn many optimization problems, path is irrelevant; the goal state is the solution
oThen state space = set of “complete” configurations;
find configuration satisfying constraints, e.g., n-queens problem; or, find optimal
configuration, e.g., travelling salesperson problem

oIn such cases, can use iterative improvement algorithms (aka local search): keep a
single “current” state, try to improve it
oMore or less unavoidable if the “state” is yourself (i.e., learning)
oLocal search may be good when

o You don’t know a goal, but can recognize one when you see it

o You only want to find a goal and don’t need to keep track of the sequence of actions
that reached it

o You don’t care about finding the shortest solution

o You do care about finding a solution quickly


oSimple, general idea:
o Start wherever
o Repeat: move to the best neighboring state
o If no neighbors better than current, quit

oWhat’s bad about this approach?


o Complete?
o Optimal?
oWhat’s good about it?
o Finds a decent solution
o Easy to apply
o Fast
oGoal: n queens on board with no conflicts, i.e., no queen attacking another
oStates: n queens on board, one per column
oActions: move a queen in its column
oHeuristic value function: number of conflicts
function HILL-CLIMBING(problem) returns a state
    current ← make-node(problem.initial-state)
    loop do
        neighbor ← a highest-valued successor of current
        if neighbor.value ≤ current.value then return current.state
        current ← neighbor

“Like climbing Everest in dense fog with amnesia (memory loss) and no map”
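
A runnable Python sketch of this pseudocode for n-queens with the number-of-conflicts heuristic (function names are illustrative; we minimize conflicts, which is the same as maximizing −conflicts; no sideways moves, so it can stop at a local optimum):

import random

def conflicts(state):
    # Number of attacking queen pairs; state[c] = row of the queen in column c.
    n = len(state)
    return sum(1 for i in range(n) for j in range(i + 1, n)
               if state[i] == state[j] or abs(state[i] - state[j]) == j - i)

def hill_climb(n=8):
    # Steepest-ascent hill climbing.  Returns (state, conflicts); 0 conflicts
    # means success, otherwise we are stuck at a local optimum.
    state = [random.randrange(n) for _ in range(n)]
    while True:
        best, best_c = state, conflicts(state)
        for col in range(n):                       # move one queen within its column
            for row in range(n):
                if row == state[col]:
                    continue
                cand = state[:col] + [row] + state[col + 1:]
                c = conflicts(cand)
                if c < best_c:
                    best, best_c = cand, c
        if best is state:                          # no strictly better neighbor
            return state, best_c
        state = best

def random_restart(n=8):
    # Restart from a fresh random state until hill climbing succeeds.
    while True:
        state, c = hill_climb(n)
        if c == 0:
            return state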
oRandom restarts
o eventually find the global optimum
o true, but not all that helpful by itself
oRandom sideways moves
o escape from shoulders
o but may loop forever on flat local maxima
oLocal maxima: peaks that are not the highest point in the space (not the global max)
oPlateaus: broad flat regions that give the search no guidance (use a random walk); flat local maxima and shoulders are examples
oRidges: flat like plateaus, but with drop-offs to the sides; steps to the North, East, South, and West may all go down, while a step to the NW may go up

[Figure: state-space landscape with a local maximum, a plateau, and a ridge]
o Starting from X, where do you end up ?
o Starting from Y, where do you end up ?
o Starting from Z, where do you end up ?
[Figure: 8-puzzle hill-climbing trace from a start state (value −4) through successors (values −4 and −3) to the goal state (value 0); higher is better]
oNo sideways moves – number of conflicts as the heuristic:
o Succeeds w/ prob. p = 0.14 (14%)
o 86% of runs get stuck at a local maximum
o So allow random restarts (on average 1/p ≈ 7 trials)
o Average number of moves per trial:
o 4 when succeeding, 3 when getting stuck
o Expected total number of moves needed:
o 3(1−p)/p + 4 ≈ 22 moves
oAllowing 100 sideways moves:
o Succeeds w/ prob. p=0.94 (94%)
o Average number of moves per trial:
o 21 when succeeding, 65 when getting stuck
o Expected total number of moves needed:
o 65(1−p)/p + 21 ≈ 25 moves
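
Where these formulas come from, assuming independent trials with success probability p: the expected number of trials is 1/p, i.e., (1 − p)/p failures before the one success. With c_fail moves per failed trial and c_succ moves for the successful one:

    Expected moves = c_fail · (1 − p)/p + c_succ

No sideways moves: 3 · (0.86/0.14) + 4 ≈ 22.4. With 100 sideways moves: 65 · (0.06/0.94) + 21 ≈ 25.1.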
oIn metallurgy, annealing is a technique
involving heating & controlled cooling of a material to
increase size of its crystals & reduce defects
oHeat causes atoms to become unstuck from initial
positions (local minima of internal energy) and wander
randomly through states of higher energy
oSlow cooling gives them more chances of finding
configurations with lower internal energy than initial
one
oThe name recalls the annealing process used to cool metals slowly into an ordered (low-energy) state
oBasic idea:
o Allow “bad” moves occasionally, depending on “temperature”
o Temperature defines how much you are bouncing around
o High temperature => more bad moves allowed, shake the system out of its local
minimum (or maximum)
o Gradually reduce temperature according to some schedule
o Low temperature => mostly good moves allowed, settling into the global optimum
o Sounds pretty weird, doesn’t it?
• Idea: Escape local maxima by allowing downhill moves
• But make them rarer as time goes on

function SIMULATED-ANNEALING(problem, schedule) returns a state
    current ← problem.initial-state
    for t = 1 to ∞ do
        T ← schedule(t)
        if T = 0 then return current
        next ← a randomly selected successor of current
        ∆E ← next.value − current.value
        if ∆E > 0 then current ← next
        else current ← next only with probability e^(∆E/T)
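
A direct Python transcription of the pseudocode above (a sketch: value, random_successor, and the cooling schedule are placeholders to be supplied per problem):

import math
import random

def simulated_annealing(initial, value, random_successor, schedule):
    # Accept every uphill move; accept a downhill move (dE < 0) only with
    # probability e^(dE/T), so bad moves become rarer as T drops.
    current = initial
    t = 1
    while True:
        T = schedule(t)
        if T <= 0:
            return current
        nxt = random_successor(current)
        dE = value(nxt) - value(current)
        if dE > 0 or random.random() < math.exp(dE / T):
            current = nxt
        t += 1

# Example cooling schedule (illustrative parameters): exponential decay,
# cut to zero after a fixed number of steps.
schedule = lambda t: 0 if t > 10_000 else 100 * 0.99 ** t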
oTheoretical guarantee:
o If T decreased slowly enough, will converge to optimal state!
oIs this an interesting guarantee?
o Sounds like magic, but in reality:
o The more downhill steps you need to escape a local optimum, the
less likely you are to ever make them all in a row
o “Slowly enough” may mean exponentially slowly
o Random restart hill climbing also converges to optimal state…
oPractical?
o No – there is no guarantee from any finite amount of effort
o The time required to ensure a significant probability of success will usually exceed the time of a complete search
oBasic idea:
oInstead of 1 local search algorithm → K copies of this algorithm
o all running simultaneously
o K copies of a local search algorithm, initialized randomly
oFor each iteration
o Generate ALL successors from K current states
o Choose best K of these to be the new current states

Or, K chosen randomly with


a bias towards good ones
(Stochastic beam search)
K*b

Best K out of K*b

• Why is this different from K local searches run in parallel? (e.g., running hill climbing with K random restarts in parallel rather than in sequence)
The searches communicate! “Come over here, the grass is greener!” (Without communication, a search may hit a dead end and stay stuck in a local maximum.)
• What is the problem?
Concentration in a small region after some iterations → stochastic beam search is a solution (choose the K successors at random, with probability an increasing function of their objective value)
[Figure: beam search trace with 4 initial states (K = 4) and branching factor b = 2; higher values are better, and X marks successors pruned when only the best K are kept]
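
A sketch of both variants in Python (the names and the non-negative-value assumption for the stochastic weights are illustrative):

import heapq
import random

def beam_search(initial_states, value, successors, iterations, k):
    # Local beam search: from the K current states generate ALL successors,
    # then keep the best K of the K*b as the new current states.
    states = list(initial_states)
    for _ in range(iterations):
        pool = [s for st in states for s in successors(st)]
        if not pool:
            break
        states = heapq.nlargest(k, pool, key=value)    # best K out of K*b
    return max(states, key=value)

def stochastic_beam_search(initial_states, value, successors, iterations, k):
    # Stochastic variant: sample the K survivors with probability increasing
    # in their value, so the population does not collapse into one region.
    states = list(initial_states)
    for _ in range(iterations):
        pool = [s for st in states for s in successors(st)]
        if not pool:
            break
        weights = [max(value(s), 1e-9) for s in pool]  # assumes values >= 0
        states = random.choices(pool, weights=weights, k=k)
    return max(states, key=value)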
oStates need to be encoded as strings

[Figure: genetic algorithm pipeline – initial population, fitness evaluation, selection, pairwise crossover, mutation]

oGenetic algorithms use a natural selection metaphor


o Keep best K individuals at each step (selection) based on a fitness function
o Also have pairwise crossover operators, with optional mutation to give variety
oPossibly the most misunderstood, misapplied (and even maligned)
technique around
o Represent state by a string of 8 digits in {1..8}
o S1 = ‘32752411’
o Why does crossover make sense here?
o The left and right parts of the string each place one queen per column, so crossover combines meaningful sub-boards from the two parents
o What would mutation be?
o Change 1 digit (1 queen location)
o What would a good fitness function be?
o Fitness function = # of non-attacking pairs
o F(Ssolution) = 8*7/2 = 28
o F(S1) = 24
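
A compact Python sketch of a GA for 8-queens using this encoding and fitness function (population size, mutation rate, and the keep-best-half selection are illustrative choices, not from the slides):

import random

def fitness(s):
    # Number of non-attacking queen pairs; s is a string of 8 digits in 1..8,
    # one queen per column.  A solution scores 8*7/2 = 28.
    rows = [int(c) for c in s]
    n = len(rows)
    attacking = sum(1 for i in range(n) for j in range(i + 1, n)
                    if rows[i] == rows[j] or abs(rows[i] - rows[j]) == j - i)
    return n * (n - 1) // 2 - attacking

def crossover(a, b):
    # Single-point crossover: prefix of one parent, suffix of the other.
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:]

def mutate(s, rate=0.1):
    # With probability `rate`, move one queen to a random row.
    if random.random() < rate:
        i = random.randrange(len(s))
        s = s[:i] + str(random.randint(1, 8)) + s[i + 1:]
    return s

def genetic_algorithm(pop_size=20, generations=1000):
    pop = [''.join(str(random.randint(1, 8)) for _ in range(8))
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        if fitness(pop[0]) == 28:
            return pop[0]                     # conflict-free board found
        keep = pop[:pop_size // 2]            # selection: keep the fittest half
        children = [mutate(crossover(*random.sample(keep, 2)))
                    for _ in range(pop_size - len(keep))]
        pop = keep + children
    return max(pop, key=fitness)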
oMany configuration and optimization problems can be formulated as local
search
oGeneral families of algorithms:
o Hill-climbing, continuous optimization
o Simulated annealing (and other stochastic methods)
o Local beam search: multiple interacting searches
o Genetic algorithms: break and recombine states

Many machine learning algorithms are local searches


Thanks for your attention!
