Optimal Searching
Best-First Search: Review
• Advantages
– Takes advantage of domain information to guide the search
– Greedy advance toward the goal
• Disadvantages
– Considers only the estimated cost to the goal from the current state, not the cost already incurred
– A path can continue to look good according to the heuristic function even after it has become more expensive than an alternative
[Figure: edge-weighted search tree; greedy best-first keeps extending a path that looks good by the heuristic even though, at this point, the path is more costly than the alternate path.]
Branch & Bound
• Use the current (past) cost to a node
• Pick the best (lowest) cost
• If f is our evaluation function for node n, then f(n) = g(n), where g is the cost "gone so far" (g ≥ 0)
• B&B: sort the queue in order of lowest f, and make sure not to pursue identical paths with higher costs than known costs (a sketch follows below)
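A minimal Python sketch of B&B under these assumptions: states are hashable, and a successors(state) function (an illustrative name, not from the slides) yields (next_state, step_cost) pairs.

import heapq, itertools

def branch_and_bound(start, is_goal, successors):
    # Queue ordered by f(n) = g(n); the counter breaks ties between equal costs.
    counter = itertools.count()
    frontier = [(0, next(counter), start, [start])]
    best_g = {start: 0}                     # cheapest known cost to each state
    while frontier:
        g, _, state, path = heapq.heappop(frontier)
        if is_goal(state):
            return path, g
        for nxt, cost in successors(state):
            g2 = g + cost
            # do not pursue an identical path with a higher cost than a known one
            if nxt not in best_g or g2 < best_g[nxt]:
                best_g[nxt] = g2
                heapq.heappush(frontier, (g2, next(counter), nxt, path + [nxt]))
    return None, float("inf")

Because f(n) = g(n), the first goal popped from the queue has been reached by a cheapest path.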
The A* Algorithm: combining past with future
• Now consider the overall cost of the solution: f(n) = g(n) + h(n), the cost so far plus the estimated cost to the goal
[Figure: two search trees with edge costs, comparing the paths chosen when both past cost g and estimated remaining cost h are taken into account.]
UCS, BFS, Best-First, and A*
• f = g + h ⇒ A* search (see the sketch below)
• h = 0 ⇒ uniform-cost search
• g = 1 for every step, h = 0 ⇒ breadth-first search
• g = 0 ⇒ best-first search
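A sketch of A* under the same assumptions as the B&B sketch above, plus a heuristic h(state). Passing h = lambda s: 0 turns it into uniform-cost search, matching the table above.

import heapq, itertools

def a_star(start, is_goal, successors, h):
    counter = itertools.count()
    # Queue ordered by f(n) = g(n) + h(n): past cost plus estimated future cost.
    frontier = [(h(start), 0, next(counter), start, [start])]
    best_g = {start: 0}
    while frontier:
        f, g, _, state, path = heapq.heappop(frontier)
        if is_goal(state):
            return path, g
        for nxt, cost in successors(state):
            g2 = g + cost
            if nxt not in best_g or g2 < best_g[nxt]:
                best_g[nxt] = g2
                heapq.heappush(frontier,
                               (g2 + h(nxt), g2, next(counter), nxt, path + [nxt]))
    return None, float("inf")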
Admissible Heuristics
• This is not quite enough; we also require that h be admissible:
– a heuristic h is admissible if h(n) ≤ h*(n) for all nodes n,
– where h*(n) is the actual cost of the optimal path from n to the goal
• Examples:
– travel distance: the straight-line distance must be no longer than the actual travel path
– tiles out of place: each move can fix the position of at most one tile
– sum of each out-of-place tile's distance from its correct place: each move moves a tile at most one square toward its correct position
Heuristic Functions
• Tic-tac-toe
[Figure: the first levels of a tic-tac-toe game tree, as a setting for heuristic evaluation.]
8 Puzzle
• Exhaustive search: 3^20 ≈ 3 × 10^9 states
• Removing repeated states: 9! = 362,880
• Use of heuristics (sketched below)
– h1: number of tiles that are in the wrong position
– h2: sum of the Manhattan distances of the tiles from their goal positions
[Figure: an example start state and the goal state; here h1 = 3 misplaced tiles and h2 = 1 + 2 + 2 = 5.]
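Sketches of the two heuristics, assuming a state is a 9-tuple of ints in row-major order with 0 for the blank (an encoding chosen here for illustration):

def h1(state, goal):
    # Number of tiles (ignoring the blank, 0) that are in the wrong position.
    return sum(1 for s, g in zip(state, goal) if s != 0 and s != g)

def h2(state, goal):
    # Sum of Manhattan distances of each tile from its goal square.
    total = 0
    for tile in range(1, 9):
        i, j = state.index(tile), goal.index(tile)
        total += abs(i // 3 - j // 3) + abs(i % 3 - j % 3)
    return total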
Heuristics: 8 Puzzle
[Figure: an 8-puzzle state and the successor states reached by sliding a tile into the blank.]
Tic-tac-toe
• Most-wins heuristic (sketched below)
[Figure: winning lines through each opening square: center = 4 wins, corner = 3 wins, edge = 2 wins.]
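A sketch of the most-wins idea; the board encoding ('x'/'o'/None in a 9-element list) is an assumption for illustration:

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
         (0, 4, 8), (2, 4, 6)]              # diagonals

def wins_through(square):
    # Winning lines passing through a square: center = 4, corner = 3, edge = 2.
    return sum(square in line for line in LINES)

def open_lines(board, player, opponent):
    # Lines the player could still complete (no opposing mark anywhere in them).
    return sum(all(board[i] != opponent for i in line) for line in LINES)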
Road Map Problem
[Figure: road map from start s through nodes n and n' toward the goal, annotated with the past cost g(n') and the heuristic estimates h(s), h(n), h(n').]
Effect of Heuristic Accuracy on Performance
• A well-designed heuristic has an effective branching factor close to 1
• h2 dominates h1 iff h2(n) ≥ h1(n), ∀n
• It is always better to use a heuristic function with higher values, as long as it does not overestimate
• Inventing heuristic functions
– the cost of an exact solution to a relaxed problem is a good heuristic for the original problem
– a collection of admissible heuristics can be combined by taking a pointwise max (see below):
h(n) = max(h1(n), h2(n), …, hk(n))
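As a one-line sketch, combining a collection of admissible heuristics:

def combine(heuristics):
    # The pointwise max of admissible heuristics is itself admissible
    # and dominates each component.
    return lambda n: max(h(n) for h in heuristics)

# e.g., for the 8-puzzle sketches above:
# h = combine([lambda s: h1(s, GOAL), lambda s: h2(s, GOAL)])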
Optimality of A*
• Let us assume that f is non-decreasing along each path
– if not, simply use the parent's f-value
– in that case, we can think of A* as expanding f-contours toward the goal; better heuristics make these contours more "eccentric" (stretched toward the goal)
• Let G be an optimal goal state with path cost f*
• Let G2 be a suboptimal goal state with path cost g(G2) > f*
– suppose A* picks G2 before G (i.e., A* is not optimal)
– let n be a frontier node on the path to G when G2 is chosen
– if h is admissible, then f* ≥ f(n)
– since n was not chosen, it must be the case that f(n) ≥ f(G2)
– therefore f* ≥ f(G2); since G2 is a goal state, h(G2) = 0, so f(G2) = g(G2) and f* ≥ g(G2)
– but this contradicts g(G2) > f*
– thus our supposition is false, and A* is optimal
Completeness of A*
• Suppose there is a goal state G with path cost f*
– intuitively: since A* expands nodes in order of increasing f, it must eventually expand the goal G
• Case 1: A* stops and fails
– prove by contradiction that this is impossible
– there exists a path from the initial state to the goal state
– let n be the last node expanded along that solution path
– n has at least one child, and that child must be on the open list
– A* does not stop until the open list is empty (unless it finds a goal state); contradiction
• Case 2: A* runs forever down an infinite path
– recall that every operator costs at least δ > 0: cost(s1, s2) ≥ δ
– let n be the next unexpanded node along the solution path
– after at most f(n)/δ steps down the infinite path, its cumulative cost exceeds f(n), so A* will expand n instead; contradiction
Properties of A*
• Suppose C* is the cost of the optimal solution path
– A* expands all nodes with f(n) < C*
– A* may expand some of the nodes with f(n) = C* on the "goal contour"
– A* expands no nodes with f(n) > C*; they are pruned!
– Pruning: eliminating possibilities from consideration without examination
• A* is complete
A* summary
• Completeness
– provided a finite branching factor and a finite cost per operator
• Optimality
– provided we use an admissible heuristic
• Time complexity
– the worst case is still O(b^d); in some special cases we can do better for a given heuristic
• Space complexity
– the worst case is still O(b^d)
Iterative Deepening A*
• Goals
– a storage-efficient algorithm that we can use in practice
– still complete and optimal
• Modification of A* (sketched below)
– use an f-cost limit as the depth bound
– increase the threshold to the minimum f-value that exceeded it in the previous cycle
• Each iteration expands all nodes inside the contour for the current f-cost
• Nodes are expanded in the same order as in A*
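A compact IDA* sketch under the same illustrative interface as the A* sketch earlier (successors yields (next_state, step_cost) pairs):

def ida_star(start, is_goal, successors, h):
    def search(state, g, bound, path):
        f = g + h(state)
        if f > bound:
            return f, None                  # f exceeded the limit: report it
        if is_goal(state):
            return f, path
        smallest_over = float("inf")        # minimum f that exceeded the bound
        for nxt, cost in successors(state):
            if nxt in path:                 # avoid cycling along the current path
                continue
            t, found = search(nxt, g + cost, bound, path + [nxt])
            if found is not None:
                return t, found
            smallest_over = min(smallest_over, t)
        return smallest_over, None

    bound = h(start)                        # first contour: f(start)
    while True:
        t, found = search(start, 0, bound, [start])
        if found is not None:
            return found
        if t == float("inf"):
            return None                     # space exhausted: no solution
        bound = t                           # next threshold: min f of last cycle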
Games
• Why games?
– Games provide an environment of pure competition with
objective goals between agents.
– Game playing is considered an intelligent human activity.
– The environment is deterministic and accessible.
– The set of operators is small and defined.
– Large state space
– Fun!
Games
• Consider games that are:
– two-player
– perfect information: not involving chance or hidden information (so not backgammon or poker)
– zero-sum: games where our gain is our opponent's loss
– examples: tic-tac-toe, checkers, chess, go
• Games of perfect information are really just search problems
– initial state
– operators to generate new states
– goal test
– utility function (win/lose/draw)
Game Trees
• Tic-tac-toe
[Figure: the first two plies of the tic-tac-toe game tree; 1 ply = 1 move by one player.]
Game Trees Example
[Figure: a tic-tac-toe endgame tree expanded to terminal states, each labeled win, lose, or draw.]
Perfect decisions in 2-person games
• Let's name the two agents (players) MAX and MIN
• MAX searches for the highest-utility state, so when it is MAX's move she maximizes the payoff
• High utility for MAX is low utility for MIN, since it's a zero-sum game
• When it is MIN's move, she minimizes the payoff
• The winning strategy is to maximize over MIN's minimum-payoff moves
Minimax Algorithm
For the MAX player (a Python sketch follows):
1. Generate the game tree down to the terminal states
2. Apply the utility function to the terminal states
3. Back up the values
• at a MIN ply, assign the minimum payoff move
• at a MAX ply, assign the maximum payoff move
4. At the root, MAX chooses the operator that led to the highest payoff
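A sketch of the procedure in Python; terminal, utility, moves, result, and other are assumed to be supplied by the game definition (illustrative names, with payoffs from MAX's point of view):

def minimax(state, to_move):
    # Return (backed-up value, best move) for the player to move.
    if terminal(state):
        return utility(state), None
    best_val, best_move = None, None
    for move in moves(state, to_move):
        val, _ = minimax(result(state, move), other(to_move))
        if best_val is None or \
           (to_move == "MAX" and val > best_val) or \
           (to_move == "MIN" and val < best_val):
            best_val, best_move = val, move
    return best_val, best_move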
Minimax Example
[Figure: a two-ply game. MAX's moves A1, A2, A3 lead to MIN nodes, whose replies A11…A33 reach leaves valued 3, 12, 8, 2, 4, 6, 14, 5, 2. MIN backs up the minima 3, 2, 2, and MAX chooses A1, giving the root the value 3.]
Minimax
• Perfect play for deterministic, perfect-information games
• Totally impractical, since it generates the whole tree
– time complexity is O(b^d)!
– space complexity is O(bd)
The Complexity of Minimax
• For a given game with branching factor b, searching to depth d requires O(b^d) computation
– chess has a branching factor of around 35
• a 1-move (2-ply) search tree for chess has 35^2 = 1225 leaves
• say a typical chess game has 100 moves; then the number of leaves in the tree is about 35^100 ≈ 10^154
• assuming a modern computer can process 1,000,000 board positions a second, it would take about 10^140 years to search the entire tree
– go has a branching factor of 360 or more
Minimax Cutoff
Minimax with the "# ways to win" heuristic
[Figure: a two-ply tic-tac-toe tree evaluated with the "number of possible winning lines" heuristic. The leaves score 2, 2, 3, 2, 2, 2, 2, 2, 1, 3; MIN backs up the minima 1, 2, 2, and MAX picks the best of them, giving the root the value 2.]
Revised Minimax Algorithm
For the MAX player (the changed cutoff step is sketched below):
1. Generate the game tree as deep as time permits
2. Apply the evaluation function to the leaf states
3. Back up the values
• at a MIN ply, assign the minimum payoff move
• at a MAX ply, assign the maximum payoff move
4. At the root, MAX chooses the operator that led to the highest payoff
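The revision changes only the stopping test. A sketch, reusing the illustrative game interface from the minimax sketch above plus an assumed eval_fn:

def minimax_cutoff(state, to_move, depth):
    # Identical to minimax above, except the recursion stops at `depth`
    # and applies eval_fn (a heuristic estimate) instead of the true utility.
    if terminal(state):
        return utility(state), None
    if depth == 0:
        return eval_fn(state), None
    best_val, best_move = None, None
    for move in moves(state, to_move):
        val, _ = minimax_cutoff(result(state, move), other(to_move), depth - 1)
        if best_val is None or \
           (to_move == "MAX" and val > best_val) or \
           (to_move == "MIN" and val < best_val):
            best_val, best_move = val, move
    return best_val, best_move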
Cutting Off Search
• Because the evaluation function is only an approximation, it can misguide us
– example: white appears to have the advantage, but black captures the queen on the next move; we need to search one more ply
• Often it makes sense to decide the search depth dynamically
• Quiescence search: keep searching until things seem stable
– example: in chess, don't stop in positions where capture moves are imminent
[Figure: a nonquiescent chess position.]
Alpha-beta Pruning
• An efficiency hack on top of minimax: it gets the same result with fewer evaluations
• Basic idea: keep track of the value of your best move so far while performing minimax search
• For MAX, that value is called alpha
– when MIN is examining its moves and gets one back that is less than alpha (i.e., worse for MAX), its parent MAX would never make that move, because the move that gave alpha is better; so MIN can abandon this node right away, before examining any more moves from it
• Ditto for MIN, but the best value so far is called beta (MIN wants to make beta as small as possible)
– when MAX is expanding its moves, if any is greater than beta (i.e., worse for MIN), then it can stop early
• Start with the worst possible alpha (negative infinity) and beta (positive infinity), as in the call below
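In code form (using minimax_alpha_beta, the Python rendering of this deck's pseudocode given in the optional material at the end), the top-level call starts with those worst-case bounds:

value = minimax_alpha_beta(board, depth, "max", float("-inf"), float("inf"))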
Alpha-Beta Pruning: Two-ply game Example
[Figure: step-by-step alpha-beta trace on the two-ply example. The first MIN node evaluates leaves 3, 12, 8 and backs up 3, so the root MAX has α = 3 (root value ≥ 3). The second MIN node returns 2 on its first leaf (value ≤ 2 < α), so its remaining leaves are pruned. The third MIN node evaluates 14 (≤ 14), then 5, then 2 (≤ 2), and is also discarded. The root value is 3, the same answer full minimax would give.]
Alpha-beta pruning
• Pruning does not affect the final result
• Good move ordering improves the effectiveness of pruning
• Asymptotic time complexity
– O((b / log b)^d)
• With "perfect ordering," time complexity
– O(b^(d/2))
– this means we go from an effective branching factor of b to sqrt(b) (e.g., 35 → 6)
Complexity of Alpha-Beta Pruning
• Order the nodes so that the best moves for each player are investigated first
– this tends to drive alpha and beta to their optimal values faster
– so we get more pruning
• If a decent heuristic for ordering moves can be found
– the time complexity approaches O(b^(d/2))
• If moves are randomly ordered, it is around O(b^(3d/4))
• But both results assume randomly distributed utilities
– empirical work with real games is needed
• The space complexity is O(bd)
– the same as other depth-first searches
From here on is optional material
α−β Procedure pseudo-code (rendered here as runnable Python; eval_fn and successors are assumed to be supplied by the game implementation)

def minimax_alpha_beta(board, depth, node_type, alpha, beta):
    if depth == 0:
        return eval_fn(board)
    if node_type == "max":
        cur_max = float("-inf")
        for b in successors(board):
            b_val = minimax_alpha_beta(b, depth - 1, "min", alpha, beta)
            cur_max = max(b_val, cur_max)
            alpha = max(cur_max, alpha)
            if cur_max >= beta:   # beta cutoff: MIN above will never allow this
                break
        return cur_max
    else:  # node_type == "min"
        cur_min = float("inf")
        for b in successors(board):
            b_val = minimax_alpha_beta(b, depth - 1, "max", alpha, beta)
            cur_min = min(b_val, cur_min)
            beta = min(cur_min, beta)
            if cur_min <= alpha:  # alpha cutoff: MAX above already has better
                break
        return cur_min
Game tree for backgammon
• An example of a game with chance: backgammon
[Figure: a backgammon game tree, with chance nodes for the dice rolls between the MAX and MIN levels.]
Position evaluation in games with chance nodes
• For minimax, any order-preserving transformation of the leaf values does not affect the choice of move
• With chance nodes, this is no longer true: the evaluation function must be a positive linear transformation of the expected utility, or the best move can change
Complexity of expectiminimax
• Problems
– the extra cost compared to minimax is very high (see the sketch below)
– alpha-beta pruning is more difficult to apply
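A sketch of expectiminimax; terminal, utility, successors, and chance_outcomes are assumed to be supplied by the game (illustrative names), with turn order MAX → chance → MIN → chance → …:

def expectiminimax(state, node):
    # `node` is "max", "min", or a chance node before one of the players;
    # chance_outcomes(state) yields (probability, next_state) pairs.
    if terminal(state):
        return utility(state)
    if node == "max":
        return max(expectiminimax(s, "chance/min")
                   for s in successors(state, "max"))
    if node == "min":
        return min(expectiminimax(s, "chance/max")
                   for s in successors(state, "min"))
    mover = "min" if node == "chance/min" else "max"
    # Chance node: weight each dice outcome by its probability.
    return sum(p * expectiminimax(s, mover) for p, s in chance_outcomes(state))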
State-of-the-Art for Chess Programs
• Chess basics
– 8x8 board, 16 pieces per side, average branching factor of about 35
– rating system based on competition
• 500 --- beginner who knows the legal moves
• 1200 --- good weekend warrior
• 2000 --- expert level
• 2500+ --- grandmaster
– time-limited moves
– opening and endgame books available
– important aspects: position, material
Chess Ratings
[Figure: the chess rating scale.]
Sketch of Chess History
• First discussed by Shannon, Scientific American, 1950
• Initially, two approaches
– human-like
– brute-force search
• 1966: MacHack --- rated 1100 --- average tournament player
• 1970s
– discovery that 1 ply ≈ 200 rating points
– hash tables
– quiescence search
• Chess 4.x reaches 2000 (expert level), 1979
• Belle reaches 2200, 1983
– special-purpose hardware
• 1986 --- Cray Blitz and Hitech search 100,000 to 120,000 positions/sec using special-purpose hardware
IBM checks in
• Deep Thought:
– 250 chips (2M positions/sec, later 6-7M positions/sec)
– evaluation hardware
• piece placement
• pawn placement
• passed-pawn evaluation
• file configurations
• 120 parameters to tune
– tuning done against masters' games
• hill climbing and linear fits
– 1989 --- rating of 2480; Kasparov beats Deep Thought
Chess Programs Today
• Deep Blue dismantled --- leaves a void in the world of chess programs
• Deep Junior
• Deep Fritz
– a commercial product
– runs on dual-processor 933 MHz Pentium III computers
– analyzes about 6 million moves per second
– as strong as Deep Blue
State-of-the-art for Checkers Programs
• Checkers
– Arthur Samuel's learning program (1952)
State-of-the-art for Backgammon Programs
• Use a temporal-difference algorithm to train a neural network
• Strongest programs: TD-GAMMON by Gerald Tesauro of IBM, and Jellyfish
• They achieve expert-level play
State-of-the-art for Othello Programs
• Programs are stronger than human players
• Programs use learning techniques to fine-tune the evaluation function, the opening book, and even the search algorithm
• Strongest programs: Logistello, Hannibal
State-of-the-art for GO Programs
• The branching factor of GO is about 360
• Humans lead by a huge margin
• Many, many programs
– from a recent Go Ladder competition: Go4++, Many Faces of Go, Ego 1, NeuroGo II, Explorer, Indigo, Golois, Gnu Go, Gobble, gottaGo, Poka, Viking, GoLife I, The Turtle, Gogo, GL7
State-of-the-art for Poker Programs
• Poki (University of Alberta) is probably the strongest poker program
• It is not close to world-class level