Search Local

Local Search/Stochastic Search
Today’s Class of Search Problems

• Given:
– A set of states (or configurations) S = {X1..XM}
– A function that evaluates each configuration:
Eval(X)
• Solve:
– Find global extremum: Find X* such that Eval(X*) is
greater than or equal to Eval(Xi) for all possible values of Xi
[Figure: plot of Eval(X) with the global maximum X* marked]
Real-World Examples
[Figure: VLSI layout steps: placement, floorplanning, channel routing, compaction]
• VLSI layout:
– X = placement of components + routing of
interconnections
– Eval = Distance between components + %
unused + routing length
Real-World Examples
[Figure: chart of jobs assigned to machines over time]
• Scheduling: Given m machines, n jobs

• X = assignment of jobs to machines
• Eval = completion time of the n jobs (minimize)
• Others: Vehicle routing, design, treatment sequencing, ...
What makes this challenging?
• Problems of particular interest:
– Set of configurations too large to be enumerated
explicitly
– Computation of Eval(.) may be expensive
– There is no algorithm for finding the maximum of
Eval(.) efficiently
– Solutions with similar values of Eval(.) are
considered equivalent for the problem at hand
– We do not care how we get to X*, we care only
about the description of the configuration X* (this is
a key difference with the earlier search problems)
Example: TSP (Traveling Salesperson Problem)
[Figure: two tours through the same 7 nodes]
X1 = {1 2 5 3 6 7 4}
X2 = {1 2 5 4 7 6 3}
Eval(X1) > Eval(X2)
• Find a tour of minimum length passing through
each point once
• Configuration X = tour through nodes {1,..,N}
• Eval = Length of path defined by a permutation of
{1,..,N}
• Find X* that realizes the minimum of Eval(X)
• Size of search space = order (N-1)!/2
• Note: Solutions have been found for N = hundreds of thousands
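The evaluation function for the TSP can be sketched in a few lines. This is a minimal illustration, assuming nodes are points in the plane and distances are Euclidean; the `coords` dictionary and the name `tour_length` are hypothetical and not part of the slides.

```python
import math

def tour_length(tour, coords):
    """Eval(X) for the TSP: length of the closed tour that visits
    the nodes in the order given by the permutation `tour`."""
    total = 0.0
    for i in range(len(tour)):
        a = coords[tour[i]]
        b = coords[tour[(i + 1) % len(tour)]]  # wrap around to close the tour
        total += math.dist(a, b)
    return total

# Hypothetical instance: the four corners of a unit square.
coords = {1: (0, 0), 2: (1, 0), 3: (1, 1), 4: (0, 1)}
print(tour_length([1, 2, 3, 4], coords))  # tour along the perimeter
print(tour_length([1, 3, 2, 4], coords))  # tour using both diagonals, longer
```

Since the goal is to minimize Eval here, a maximizing search routine can be applied to the negated tour length.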
Example: SAT (SATisfiability)
A ∨ ¬B ∨ C
¬A ∨ C ∨ D
B ∨ D ∨ ¬E
¬C ∨ ¬D ∨ ¬E
¬A ∨ ¬C ∨ E
...

     A     B     C     D     E     Eval
X1   true  true  false true  false 5
X2   true  true  true  true  true  4
• Configuration X = Vector of assignments of N Boolean
variables
• Eval(X) = Number of clauses that are satisfied given
the assignments in X
• Find X* that realizes the maximum of Eval(X)
• Size of search space = 2^N
• Note: Solutions have been found for 1000s of variables and clauses
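As a concrete check of the Eval values in the table, the clause count can be computed as follows. This is a sketch under an assumed encoding (each clause is a list of `(variable, is_positive)` pairs); the names `CLAUSES` and `sat_eval` are illustrative only.

```python
# The five clauses listed on the slide, one literal per (variable, is_positive) pair.
CLAUSES = [
    [("A", True), ("B", False), ("C", True)],    # A ∨ ¬B ∨ C
    [("A", False), ("C", True), ("D", True)],    # ¬A ∨ C ∨ D
    [("B", True), ("D", True), ("E", False)],    # B ∨ D ∨ ¬E
    [("C", False), ("D", False), ("E", False)],  # ¬C ∨ ¬D ∨ ¬E
    [("A", False), ("C", False), ("E", True)],   # ¬A ∨ ¬C ∨ E
]

def sat_eval(assignment, clauses=CLAUSES):
    """Eval(X): number of clauses with at least one satisfied literal."""
    return sum(
        any(assignment[var] == positive for var, positive in clause)
        for clause in clauses
    )

X1 = dict(zip("ABCDE", [True, True, False, True, False]))
X2 = dict(zip("ABCDE", [True, True, True, True, True]))
print(sat_eval(X1), sat_eval(X2))  # 5 4, matching the table
```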
Example: N-Queens
[Figure: three board configurations with Eval(X) = 5, Eval(X) = 2 and Eval(X) = 0]
Find a configuration in which no queen can attack any other queen
• Configuration X = Position of the N queens in N
columns
• Eval(X) = Number of pairs of queens that are attacking
each other
• Find X* that realizes the minimum: Eval(X*) = 0
• Size of search space: order N^N
• Note: Solutions have been found for N = millions
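The N-Queens evaluation function can be sketched as below, assuming a configuration is stored as a list `rows` where `rows[c]` is the row of the queen in column c (one queen per column, as on the slide). The function name is hypothetical.

```python
from itertools import combinations

def attacking_pairs(rows):
    """Eval(X): number of pairs of queens that attack each other.
    Queens are in distinct columns, so only rows and diagonals are checked."""
    count = 0
    for (c1, r1), (c2, r2) in combinations(enumerate(rows), 2):
        same_row = r1 == r2
        same_diagonal = abs(r1 - r2) == abs(c1 - c2)
        if same_row or same_diagonal:
            count += 1
    return count

print(attacking_pairs([1, 3, 0, 2]))  # a 4-queens solution: 0
print(attacking_pairs([0, 1, 2, 3]))  # all queens on one diagonal: 6
```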
Local Search
• Assume that for each configuration X, we
define a neighborhood (or “moveset”)
Neighbors(X) that contains the set of
configurations that can be reached from X in
one “move”.
1. X0 ← Initial state
2. Repeat until we are “satisfied” with the
current configuration:
3. Evaluate some of the neighbors in
Neighbors(Xi)
4. Select one of the neighbors Xi+1
5. Move to Xi+1
• The definition of the neighborhoods is not obvious or unique in
general. The performance of the search algorithm depends critically
on the definition of the neighborhood, which is not straightforward
in general.
• Ingredient 1. Selection strategy: How to decide which neighbor to accept
• Ingredient 2. Stopping condition
Simplest Example
S = {1,..,100}
Neighbors(X) = {X-1,X+1}
[Figure: plot of Eval(X) over S with a local optimum and the global optimum marked]
• Local optimum: Eval(X*) >= Eval(X) for all X in Neighbors(X*)
• Global optimum: Eval(X*) >= Eval(X) for all X
• We are interested in the global maximum, but we

may have to be satisfied with a local maximum
• In fact, at each iteration, we can check only for
local optimality
• The challenge: Try to achieve global optimality
through a sequence of local moves
Most Basic Algorithm: Hill-Climbing (Greedy Local Search)
• X ← Initial configuration
• Iterate:
1. E ← Eval(X)
2. N ← Neighbors(X)
3. For each Xi in N
      Ei ← Eval(Xi)
4. If all Ei’s are lower than E
      Return X
   Else
      i* = argmax_i (Ei);  X ← Xi*;  E ← Ei*
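The hill-climbing loop can be sketched on the simplest example, S = {1,..,100} with Neighbors(X) = {X-1, X+1}. This is a minimal sketch, not a reference implementation: the two-peak `eval_fn` is made up for illustration, and the loop stops as soon as no neighbor is strictly better, which is a small deviation from the slide chosen so that the search also terminates on plateaus.

```python
def hill_climb(x, eval_fn, neighbors):
    """Greedy local search: move to the best neighbor until none improves."""
    e = eval_fn(x)
    while True:
        candidates = neighbors(x)
        if not candidates:
            return x
        best = max(candidates, key=eval_fn)
        e_best = eval_fn(best)
        if e_best <= e:       # no strictly better neighbor: local maximum
            return x
        x, e = best, e_best   # X <- Xi*, E <- Ei*

def neighbors(x):
    """Neighbors(X) = {X-1, X+1}, restricted to S = {1,..,100}."""
    return [n for n in (x - 1, x + 1) if 1 <= n <= 100]

def eval_fn(x):
    """Hypothetical Eval: local maximum at 20, global maximum at 80."""
    return max(10 - abs(x - 20), 30 - abs(x - 80))

print(hill_climb(25, eval_fn, neighbors))  # stops at the local maximum 20
print(hill_climb(60, eval_fn, neighbors))  # reaches the global maximum 80
```

The two runs show that the result depends entirely on the starting configuration.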
More Interesting Examples
• How can we define Neighbors(X)?
[Figure: the TSP, SAT and N-Queens examples]
Issues
• Multiple “poor” local maxima
• Plateau = constant region of Eval(.)
• Ridge = Impossible to reach X* from Xstart using uphill moves only
[Figure: Eval(X) landscape with Xstart and X* marked]
Issues
• Constant memory usage
• All we can hope is to find the local maximum
“closest” to the initial configuration → Can we do
better than that?
• Ridges and plateaux will plague all local search
algorithms
• Design of neighborhood is critical (as important
as design of search algorithm)
• Trade-off on size of neighborhood
– larger neighborhood = better chance of finding a
good maximum but may require evaluating an
enormous number of moves
– smaller neighborhood = smaller number of
evaluations but may get stuck in poor local
maxima
Stochastic Search: Randomized Hill-Climbing
• Iterate: (Until when?)
1. E ← Eval(X)
2. X' ← one configuration randomly selected in Neighbors(X)
3. E' ← Eval(X')
4. If E' > E
      X ← X'
      E ← E'
• Critical change: We no longer select the best move in the entire
neighborhood
TSP Moves
• “2-change”: O(N^2) neighborhood
  1. Select 2 edges
  2. Invert the order of the corresponding vertices
• “3-change”: O(N^3) neighborhood
  1. Select 3 edges
• ... k-change
[Figure: example tours before and after a 2-change and a 3-change]
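A 2-change move can be sketched as a segment reversal on the permutation. The sketch below assumes tours are Python lists and that position 0 is kept fixed; the function names are illustrative. Applied to X1 from the earlier TSP slide, reversing positions 3 to 6 gives exactly X2.

```python
def two_change(tour, i, j):
    """Remove the edges entering position i and leaving position j,
    then reconnect by reversing the segment tour[i..j]."""
    return tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]

def two_change_neighbors(tour):
    """All tours reachable by one 2-change: O(N^2) of them.
    Note: reversing the whole tail gives the same cycle traversed backwards."""
    n = len(tour)
    for i in range(1, n - 1):
        for j in range(i + 1, n):
            yield two_change(tour, i, j)

X1 = [1, 2, 5, 3, 6, 7, 4]
print(two_change(X1, 3, 6))                  # [1, 2, 5, 4, 7, 6, 3], i.e. X2
print(len(list(two_change_neighbors(X1))))   # 15 neighbors for N = 7
```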
Hill-Climbing: TSP Example
• k-opt = Hill-climbing with k-change neighborhood
• Some results:
– 3-opt better than 2-opt
– 4-opt not substantially better given increase in computation
time
– Use random restart to increase probability of success
– Better measure: % away from (estimated) minimum cost
                       % error from min cost     Running time
                       N=100      N=1000         N=100    N=1000
2-Opt                  4.5%       4.9%           1        11
2-Opt (Best of 1000)   1.9%       3.6%
3-Opt                  2.5%       3.1%           1.2      13.7
3-Opt (Best of 1000)   1.0%       2.1%
Data from: Aarts & Lenstra, “Local Search in Combinatorial Optimization”,
Wiley Interscience Publisher
Hill-Climbing: N-Queens
• Basic hill-climbing is not very effective
• Exhibits plateau problem because many configurations have
the same cost
• Multiple random restarts are the standard solution to boost
performance
N=8                    % Success   Average number of moves
Direct hill climbing   14%         4
With sideways moves    94%         21 (success) / 64 (failure)
[Figure: three board configurations with E=5, E=2 and E=0]
Data from Russell & Norvig
Hill-Climbing: SAT
Example clauses: A ∨ ¬B ∨ C,  ¬A ∨ C ∨ D,  ...,  ¬C ∨ ¬D ∨ ¬E,  ¬A ∨ ¬C ∨ E
• State X = assignment of N boolean variables
• Initialize the variables (x1,..,xN) randomly to
true/false
• Iterate until all clauses are satisfied or max
iterations:
1. Select an unsatisfied clause
2. With probability p (random walk part):
      Select a variable xi at random
3. With probability 1-p (greedy part):
      Select the variable xi such that changing xi will unsatisfy the least
      number of clauses (Max of Eval(X))
4. Change the assignment of the selected variable xi
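The loop above can be sketched as follows. This is an illustrative sketch, not the WALKSAT reference code: it assumes clauses are lists of `(variable, is_positive)` pairs, picks the variable among those of the selected unsatisfied clause (an assumption, the slide does not say where the variable comes from), and implements the greedy part as "flip the variable that maximizes Eval after the flip".

```python
import random

def num_satisfied(clauses, assign):
    """Eval(X): number of clauses with at least one true literal."""
    return sum(any(assign[v] == pos for v, pos in c) for c in clauses)

def walksat(clauses, variables, p=0.5, max_iters=10_000, rng=None):
    rng = rng or random.Random()
    assign = {v: rng.choice([True, False]) for v in variables}
    for _ in range(max_iters):
        unsat = [c for c in clauses
                 if not any(assign[v] == pos for v, pos in c)]
        if not unsat:
            return assign                      # all clauses satisfied
        clause = rng.choice(unsat)             # 1. pick an unsatisfied clause
        if rng.random() < p:                   # 2. random walk part
            var = rng.choice(clause)[0]
        else:                                  # 3. greedy part
            def score(v):
                assign[v] = not assign[v]      # try the flip
                s = num_satisfied(clauses, assign)
                assign[v] = not assign[v]      # undo it
                return s
            var = max((v for v, _ in clause), key=score)
        assign[var] = not assign[var]          # 4. change the assignment
    return None                                # gave up: says nothing about unsatisfiability

CLAUSES = [
    [("A", True), ("B", False), ("C", True)],
    [("A", False), ("C", True), ("D", True)],
    [("B", True), ("D", True), ("E", False)],
    [("C", False), ("D", False), ("E", False)],
    [("A", False), ("C", False), ("E", True)],
]
solution = walksat(CLAUSES, "ABCDE", rng=random.Random(0))
print(solution)
```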
Hill-Climbing: SAT
• WALKSAT algorithm still one of the most
effective for SAT
• Combines the two ingredients: random
walk and greedy hill-climbing
• Incomplete search: Can never find out if
the clauses are not satisfiable
For more details and useful examples/code: http://www.cs.washington.edu/homes/kautz/walksat/
Simulated Annealing
1. E ← Eval(X)
2. X' ← one configuration randomly selected in Neighbors(X)
3. E' ← Eval(X')
4. If E' >= E
      X ← X'
      E ← E'
   Else accept the move to X' with some probability p:
      X ← X'
      E ← E'
• Critical change: We no longer always move uphill. Next
question: How to choose p?
How to set p?
• If p constant: We don’t know how to set p; it should depend on
the shape of the Eval function
• Decrease p as the iterations progress → We accept fewer downhill
moves as we approach the global maximum
• Decrease p as E-E' increases → Lower probability to move
downhill if slope is high
How to set p? Intuition
(E = E(X), E' = E(X'))
• E - E' is large: It is more likely that we are moving toward a
(promising) sharp maximum, so we don’t want to move downhill
too much
• E - E' is small: It is likely that we are moving toward a shallow
maximum that is likely to be a (uninteresting) local maximum,
so we like to move downhill to explore other parts of the
landscape
Choosing p: Simulated Annealing
• If E' >= E accept the move
• Else accept the move with probability:
      p = e^{-(E - E')/T}
• Start with high temperature T and decrease T gradually as
iterations increase (“cooling schedule”)
[Figure: p as a function of |∆E|, plotted for increasing T]
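To make the effect of T and of E - E' concrete, the acceptance rule can be written as a small function. This is a sketch; the function name and the sample values are illustrative only.

```python
import math

def acceptance_probability(e, e_new, t):
    """Probability of accepting a move from energy e to e_new at temperature t.
    Uphill (or flat) moves are always accepted; downhill moves are accepted
    with probability exp(-(E - E')/T)."""
    if e_new >= e:
        return 1.0
    return math.exp(-(e - e_new) / t)

# Same downhill step (E - E' = 2) at decreasing temperatures.
for t in (10.0, 2.0, 0.5):
    print(t, round(acceptance_probability(10, 8, t), 4))
```

For a fixed downhill step, p shrinks as T decreases; for a fixed T, p shrinks as E - E' grows.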
Simulated Annealing
• T ← Initial high temperature
• Iterate:
1. Do K times (iterate a number of times keeping the temperature fixed):
   1.1 E ← Eval(X)
   1.2 X' ← one configuration randomly selected in Neighbors(X)
   1.3 E' ← Eval(X')
   1.4 If E' >= E
          X ← X'; E ← E';
       Else accept the move with probability p = e^{-(E - E')/T}
       (use the previous definition of the probability):
          X ← X'; E ← E';
2. T ← α T
• Progressively decrease the temperature using an exponential
cooling schedule: T(n) = α^n T with α < 1
• T = 0 → Greedy hill climbing
• T = ∞ → Random walk
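The full loop can be sketched as below. Several choices are assumptions rather than part of the slide: the stopping rule (`t_min`), the default values of `alpha` and `k`, remembering the best configuration seen, and the two-peak `eval_fn` on S = {1,..,100} used as a toy problem.

```python
import math
import random

def simulated_annealing(x, eval_fn, random_neighbor, t_start,
                        alpha=0.95, k=100, t_min=1e-3, rng=None):
    rng = rng or random.Random()
    e = eval_fn(x)
    best_x, best_e = x, e            # bookkeeping only, not in the slide
    t = t_start
    while t > t_min:                 # assumed stopping condition
        for _ in range(k):           # 1. K moves at fixed temperature
            x_new = random_neighbor(x, rng)
            e_new = eval_fn(x_new)
            # Accept uphill moves always, downhill with prob. exp(-(E - E')/T).
            if e_new >= e or rng.random() < math.exp(-(e - e_new) / t):
                x, e = x_new, e_new
                if e > best_e:
                    best_x, best_e = x, e
        t *= alpha                   # 2. T <- alpha * T
    return best_x, best_e

def eval_fn(x):
    """Hypothetical Eval: local maximum at 20, global maximum at 80."""
    return max(10 - abs(x - 20), 30 - abs(x - 80))

def random_neighbor(x, rng):
    """One of X-1, X+1, clamped to S = {1,..,100}."""
    return min(100, max(1, x + rng.choice([-1, 1])))

best_x, best_e = simulated_annealing(20, eval_fn, random_neighbor,
                                     t_start=10.0, rng=random.Random(0))
print(best_x, best_e)
```

Starting from the local maximum at 20, plain hill-climbing would stop immediately; here downhill moves at high temperature give the search a chance to cross the valley.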
Basic Example
[Figure: search on a one-dimensional Eval landscape at successive temperatures]
• Starting point: We move most of the time uphill
• Iteration 150: Random downhill moves allow us to escape the local
extremum
• Iteration 180: Random downhill moves have pushed us past the local
extremum
• Iteration 800: As T decreases, fewer downhill moves are allowed and
we stay at the maximum
[Figure: E and temperature plotted against iterations]
• Note that larger deviations from uphill search are allowed at high
temperature
Where does this come from?
• If the temperature of a solid is T, the probability of moving
between two states of energy is:
      e^{-∆Energy/kT}
• If the temperature T of a solid is decreased slowly, it will
reach an equilibrium at which the probability of the solid being
in a particular state is:
      Probability(State) proportional to e^{-Energy(State)/kT}
• Boltzmann distribution → States of low energy relative to T
are more likely
• Analogy:
– State of solid ↔ Configurations X
– Energy ↔ Evaluation function Eval(.)
• N. Metropolis, A.W. Rosenbluth, M.N. Rosenbluth, A.H. Teller
and E. Teller, Journal Chem. Phys. 21 (1953) 1087-1092
A TSP Example
• N = 13 nodes (in a circle)
• K = 100N
• E = 25
• Starting configuration: E(X) = 55
• Note: Boring but it has an obvious solution
[Figure: temperature and tour cost plotted against iterations, with the
initial configuration and the final configuration after convergence]
• Note that larger deviations from downhill search are allowed at high
temperature
• Note that intermediate states can be much worse than the initial state.
Another Example
• N = 13 nodes
• K = 100N
[Figure: initial state; temperature and cost plotted against iterations;
final configuration after convergence]
What can we say about convergence?
• In theory:
      \lim_{T \to 0} \lim_{K \to \infty} \Pr( X(T,K) \in S^* ) = 1
In words: The probability that the state reached after K
iterations at temperature T is a global optimum (an element of S*)
tends to 1
• In practice:
– Perform a large enough number of iterations (K
“large enough”)
– Decrease temperature slowly enough (α “close
enough” to 1)
– But, if not careful, we may have to perform an
enormous number of evaluations
Simulated Annealing
• Many parameters of the algorithm need to be tweaked: the initial
temperature T, the number of iterations K at each temperature, and
the cooling factor α
SA Discussion
• Design of neighborhood is critical

• How to choose K? Typically related to size of
neighborhood
• How to choose α? Critical to avoid large number
of useless evaluations. Especially a problem
close to convergence (empirically, most of the
time spent close to the optimum)
• How to choose starting temperature? Typically
related to the distribution of anticipated values of
∆E (e.g., Tstart = max{∆E over a large sample of
pairs of neighbors})
• What if we choose a really bad starting X?
Multiple random restart.
• How to avoid repeated evaluation? Use a bit
more memory by remembering the previous
moves that were tried (“Tabu search”)
• Use (faster) approximate evaluation if possible
(How?)
• Often better than hill-climbing. Successful
algorithm in many applications
• Many parameters to tweak. If not careful,
may require very large number of
evaluations
• Semi-infinite number of variations for
improving performance depending on
applications
Genetic Algorithms
• View optimization by analogy with evolutionary
theory → Simulation of natural selection
• View configurations as individuals in a
population
• View Eval as a measure of fitness
• Let the least-fit individuals die off without
reproducing
• Allow individuals to reproduce with the best-fit
ones selected more often
• Each generation should be overall better fit
(higher value of Eval) than the previous one
• If we wait long enough the population should
evolve toward individuals with high fitness
(i.e., maximum of Eval)
Genetic Algorithms: Implementation
• Configurations represented by strings:
X= 1 0 0 1 1 0 0 1
• Analogy:
– The string is the chromosome representing the individual
– String made up of genes
– Configurations of genes are passed on to offspring
– Configurations of genes that contribute to high fitness tend to
survive in the population
• Start with a random population of P configurations and
apply two operations
– Reproduction: Choose 2 “parents” and produce 2 “offspring”
– Mutation: Choose a random entry in one (randomly selected)
configuration and change it
Genetic Algorithms: Reproduction

Parents:
      1 0 0 1 1 0 0 1
      1 0 1 1 0 0 0 1
Select random crossover point (here after the 4th gene):
      1 0 0 1 | 1 0 0 1
      1 0 1 1 | 0 0 0 1
Offspring:
      1 0 0 1 0 0 0 1
      1 0 1 1 1 0 0 1
• An offspring receives part of the genes from
each of the parents
• Implemented by crossover operation
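Single-point crossover on bit strings can be sketched as follows, assuming chromosomes are Python lists of equal length. The function name and the optional `point` argument are illustrative; with the crossover point after the 4th gene it reproduces the offspring shown on the slide.

```python
import random

def crossover(parent1, parent2, point=None, rng=random):
    """Single-point crossover: swap the tails of the two parents.
    If no point is given, pick one at random strictly inside the string."""
    if point is None:
        point = rng.randrange(1, len(parent1))
    child1 = parent1[:point] + parent2[point:]
    child2 = parent2[:point] + parent1[point:]
    return child1, child2

p1 = [1, 0, 0, 1, 1, 0, 0, 1]
p2 = [1, 0, 1, 1, 0, 0, 0, 1]
print(crossover(p1, p2, point=4))
# ([1, 0, 0, 1, 0, 0, 0, 1], [1, 0, 1, 1, 1, 0, 0, 1])
```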
Genetic Algorithms: Mutation
• Random change of one element in one configuration
→ Implements random deviations from inherited traits
→ Corresponds loosely to “random walk”: Introduce random moves
to avoid small local extrema
Example, with population {1 0 0 1 1 0 0 1, 1 0 0 1 0 0 0 1, 1 1 1 1 1 0 0 1}:
1. Select a random individual: 1 1 1 1 1 0 0 1
2. Select a random entry: the third one
3. Change that entry: 1 1 0 1 1 0 0 1
Basic GA Outline
• Create initial population X = {X1,..,XP}
• Iterate:
1. Select K random pairs of parents (X,X')
2. For each pair of parents (X,X'):
   2.1 Generate offspring (Y1,Y2) using crossover
       operation
   2.2 For each offspring Yi:
       Replace randomly selected element of the
       population by Yi
       With probability µ:
          Apply a random mutation to Yi
• Return the best individual in the population
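The outline can be sketched for bit-string chromosomes as below. This is a sketch under assumptions not stated on the slide: a fixed number of generations as the stopping condition, single-point crossover, and a bit flip as the mutation. The function and parameter names are hypothetical. Parents are picked uniformly at random here, exactly as in the basic outline, so there is no selection pressure yet.

```python
import random

def basic_ga(population, eval_fn, k, mu, generations, rng=None):
    rng = rng or random.Random()
    pop = [list(ind) for ind in population]      # work on a copy
    for _ in range(generations):
        # 1. Select K random pairs of parents.
        pairs = [(rng.choice(pop), rng.choice(pop)) for _ in range(k)]
        # 2. For each pair, generate two offspring by crossover.
        for x, x2 in pairs:
            point = rng.randrange(1, len(x))
            for y in (x[:point] + x2[point:], x2[:point] + x[point:]):
                if rng.random() < mu:            # mutate with probability mu
                    i = rng.randrange(len(y))
                    y[i] = 1 - y[i]
                # Replace a randomly selected element of the population.
                pop[rng.randrange(len(pop))] = y
    return max(pop, key=eval_fn)                 # best individual

rng = random.Random(1)
initial = [[rng.randint(0, 1) for _ in range(8)] for _ in range(10)]
best = basic_ga(initial, sum, k=3, mu=0.05, generations=20, rng=rng)
print(best, sum(best))
```

Here `sum` plays the role of Eval (number of ones in the string), chosen only to keep the example short.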
Basic GA Outline
• Stopping condition is not obvious
• Possible strategy: Select the best rP individuals (r < 1) for
reproduction and discard the rest → Implements selection of
the fittest
• Variation: Generate only one offspring
Genetic Algorithms: Selection

• Discard the least-fit individuals through threshold on
Eval or fixed percentage of population
• Select best-fit (larger Eval) parents in priority
• Example: Random selection of individual based on
the probability distribution
      Pr(individual X selected) = Eval(X) / \sum_{Y \in population} Eval(Y)
• Example (tournament): Select a random small subset
of the population and select the best-fit individual as
a parent
• Implements “survival of the fittest”
• Corresponds loosely to the greedy part of hill-climbing
(we try to move uphill)
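Both selection examples can be sketched with the standard `random` module. The sketch assumes Eval is non-negative with a positive total over the population (required for the probabilities to be well defined); the function names and the tournament size are illustrative.

```python
import random

def selection_probabilities(population, eval_fn):
    """Pr(X selected) = Eval(X) / sum of Eval(Y) over the population."""
    total = sum(eval_fn(y) for y in population)
    return [eval_fn(x) / total for x in population]

def roulette_select(population, eval_fn, rng=random):
    """Fitness-proportional selection of one parent."""
    weights = [eval_fn(x) for x in population]
    return rng.choices(population, weights=weights, k=1)[0]

def tournament_select(population, eval_fn, size=3, rng=random):
    """Draw a random subset of `size` individuals and keep the fittest."""
    return max(rng.sample(population, size), key=eval_fn)

population = [[1, 0, 0, 1], [1, 1, 1, 1], [0, 0, 0, 1], [0, 0, 0, 0]]
print(selection_probabilities(population, sum))  # [0.2857..., 0.5714..., 0.1428..., 0.0]
print(roulette_select(population, sum, rng=random.Random(0)))
print(tournament_select(population, sum, size=2, rng=random.Random(0)))
```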
GA and Hill Climbing
• Hill-climbing component: Try to move uphill as much as possible
• Random walk component: Move randomly to escape shallow local
maxima
How would you set up these problems to use GA search?
[Figure: the TSP, SAT and N-Queens examples]
TSP Example
• N = 13
• P = 100 elements in population
• µ = 4% mutation rate
• r = 50% reproduction rate
• Optimal solution reached at generation 35
[Figure: average cost in population and minimum cost plotted against generation]
[Figure: initial population, showing the best (lowest cost) element in the
population and the best rP elements in the population, candidates for
reproduction]
[Figure: population at generation 15]
Another TSP Example
[Figure: average cost in population and minimum cost plotted against generation]
• Converges and remains stable after generation 23
• 0.4% difference: GA = 11.801, SA = 11.751
• But: Number of operations (number of cost evaluations) much
smaller (approx. 2500)
Even more radical ideas..
• Individual = program
• X = parse tree representing a program, e.g. (ifte (X > Y) X Y)
[Figure: crossover of parse trees. Parents: (ifte (X > Y) X Y) and
(X + (2 * Y)). Offspring: (ifte (X > Y) X (2 * Y))]
• Use genetic algorithms as before with this definition of crossover
• Example applications: robot controller, signal processing, circuit design
• Intriguing, but alternative solutions exist for most of these applications;
this is not the first approach to consider!!!
• Koza. Genetic programming: On the programming of computers by means of
natural selection. MIT Press. 1992
• http://www.genetic-programming.org/
GA Discussion
• Many parameters to tweak: µ, P, r
• Many variations on basic scheme. Examples:
– Multiple-point crossover
– Dynamic encoding
– Selection based on rank or relative fitness to least fit
individual
– Multiple fitness functions
– Combine with a local optimizer (for example, local hill-climbing)
→ Deviates from “pure” evolutionary view
• In many problems, assuming correct choice of
parameters, can be surprisingly effective
• Why does it work at all?
• Limited theoretical results (informally!):
– Suppose that there exists a partial assignment of genes s
such that:
      Average of Eval(X) over all X containing s ≥ Average of Eval(Y) over all Y ∈ Population
– Then the number of individuals containing s will increase
in the next generation
• Key consequence: The design of the
representation (the chromosomes) is critical to the
performance of the GA. It is probably more important
than the choice of parameters, of selection strategy,
etc.
Summary
• Hill Climbing
• Stochastic Search
• Simulated Annealing
• Genetic Algorithms
• Class of algorithms applicable to many practical

problems
• Not useful if more direct search methods can be used
• The algorithms are general black-boxes. What makes
them work is the correct engineering of the problem
representation
– State representation
– Neighborhoods
– Evaluation function
– Additional knowledge and heuristics
(Some) References
• Russell & Norvig, Chap. 4
• Aarts & Lenstra. Local Search in Combinatorial
Optimization. Wiley-InterScience. 1997.
• Spall. Introduction to Stochastic Search and
Optimization. Wiley-InterScience. 2003.
• Numerical Recipes (http://www.nr.com/).
• Haupt&Haupt. Practical Genetic Algorithms. Wiley-
InterScience. 2004.
• Mitchell. An Introduction to Genetic Algorithms (Complex
Adaptive Systems). MIT Press. 2003.
• http://www.cs.washington.edu/homes/kautz/walksat/
