Unit II
Problem formulation is the process of deciding what actions and states to consider, given a goal. It takes into account the knowledge available to the agent:
• the current state
• the outcome of its actions
8-Puzzle Problem
States: A state description specifies the location of each of the eight tiles and the blank in one of the
nine squares.
Initial state: Any state can be designated as the initial state. Note that any given goal can be
reached from exactly half of the possible initial states.
Actions: The simplest formulation defines the actions as movements of the blank space Left, Right,
Up, or Down. Different subsets of these are possible depending on where the blank is.
Transition model: Given a state and action, this returns the resulting state; for example, if we apply
Left to the start state in Figure 3.4, the resulting state has the 5 and the blank switched.
Goal test: This checks whether the state matches the goal configuration shown in Figure.
Path cost: Each step costs 1, so the path cost is the number of steps in the path.
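To make this formulation concrete, here is a minimal Python sketch of the 8-puzzle as a search problem. The state encoding (a tuple of nine values with 0 for the blank) and the function names are illustrative assumptions, not a prescribed implementation.

```python
# Minimal 8-puzzle formulation sketch (state = tuple of 9 entries, 0 = blank).
GOAL = (0, 1, 2, 3, 4, 5, 6, 7, 8)  # example goal configuration

def actions(state):
    """Legal blank moves depend on where the blank currently is."""
    row, col = divmod(state.index(0), 3)
    moves = []
    if col > 0: moves.append("Left")
    if col < 2: moves.append("Right")
    if row > 0: moves.append("Up")
    if row < 2: moves.append("Down")
    return moves

def result(state, action):
    """Transition model: slide the blank in the given direction."""
    i = state.index(0)
    j = i + {"Left": -1, "Right": +1, "Up": -3, "Down": +3}[action]
    s = list(state)
    s[i], s[j] = s[j], s[i]
    return tuple(s)

def goal_test(state):
    return state == GOAL

def step_cost(state, action, next_state):
    return 1  # each move costs 1, so path cost = number of moves
```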
Different Search Algorithms
Breadth-First Search
Breadth-first search is a simple strategy in which the root node is expanded first, then all
the successors of the root node are expanded next, then their successors, and so on. In general, all
the nodes are expanded at a given depth in the search tree before any nodes at the next level are
expanded.
Breadth-first search is an instance of the general graph-search algorithm in which the
shallowest unexpanded node is chosen for expansion. This is achieved very simply by using a FIFO
queue for the frontier. Thus, new nodes (which are always deeper than their parents) go to the back
of the queue, and old nodes, which are shallower than the new nodes, get expanded first. Thus,
breadth-first search always has the shallowest path to every node on the frontier.
All nodes in depth ‘d’ expanded before the nodes at depth ‘d+1’
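A minimal Python sketch of breadth-first graph search using a FIFO queue for the frontier; the `actions`, `result`, and `goal_test` callables are assumed to come from a problem formulation like the one sketched earlier.

```python
from collections import deque

def breadth_first_search(start, actions, result, goal_test):
    """Expand the shallowest unexpanded node first (FIFO frontier)."""
    if goal_test(start):
        return [start]
    frontier = deque([start])          # FIFO queue
    parents = {start: None}            # also serves as the explored set
    while frontier:
        state = frontier.popleft()     # shallowest node
        for a in actions(state):
            child = result(state, a)
            if child not in parents:   # neither explored nor on the frontier
                parents[child] = state
                if goal_test(child):   # goal test applied at generation time
                    path = [child]
                    while parents[path[-1]] is not None:
                        path.append(parents[path[-1]])
                    return list(reversed(path))
                frontier.append(child)
    return None  # failure
```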
Depth-Limited Search
Depth-limited search is depth-first search with a predetermined depth limit, so that nodes at the limit are treated as if they have no successors. It can be implemented as a simple modification to the general tree- or graph-search algorithm. Alternatively, it can be implemented as a simple recursive algorithm as shown in Figure. Notice that depth-limited search can terminate with two kinds of failure: the standard failure value indicates no solution; the cutoff value indicates no solution within the depth limit.
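A recursive depth-limited search sketch in Python that distinguishes the standard failure value (None) from the cutoff value, as described above; the return conventions are an illustrative choice.

```python
def depth_limited_search(state, actions, result, goal_test, limit):
    """Return a solution path, the string 'cutoff', or None (standard failure)."""
    if goal_test(state):
        return [state]
    if limit == 0:
        return "cutoff"                    # no solution within the depth limit
    cutoff_occurred = False
    for a in actions(state):
        outcome = depth_limited_search(result(state, a), actions, result,
                                       goal_test, limit - 1)
        if outcome == "cutoff":
            cutoff_occurred = True
        elif outcome is not None:
            return [state] + outcome       # prepend current state to the path
    return "cutoff" if cutoff_occurred else None
```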
Iterative-Deepening Search
Iterative deepening search (or iterative deepening depth-first search) is a general strategy, often used
in combination with depth-first tree search, that finds the best depth limit. It does this by gradually
increasing the limit—first 0, then 1, then 2, and so on until a goal is found. This will occur when the
depth limit reaches d, the depth of the shallowest goal node. The algorithm is shown in Figure.
Iterative deepening combines the benefits of depth-first and breadth-first search. Like depth-
first search, its memory requirements are modest: O(bd) to be precise. Like breadth-first search, it is
complete when the branching factor is finite and optimal when the path cost is a nondecreasing
function of the depth of the node. The expansion of states is like BFS except some states are
expanded multiple times. IDS is preferred when the search space is large and the depth of the solution
is unknown.
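Iterative deepening then simply wraps the depth-limited routine sketched above, raising the limit until a solution is found; this is a sketch under the same assumptions.

```python
import itertools

def iterative_deepening_search(start, actions, result, goal_test):
    """Increase the depth limit 0, 1, 2, ... until a solution is found."""
    for depth in itertools.count():
        outcome = depth_limited_search(start, actions, result, goal_test, depth)
        if outcome != "cutoff":
            return outcome   # either a solution path or None (no solution)
```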
Bidirectional Search
Bidirectional search runs two simultaneous searches, one forward from the initial state and one backward from the goal, in the hope that the two searches meet in the middle. For example, if a problem has solution depth d=6, and each direction runs breadth-first
search one node at a time, then in the worst case the two searches meet when they have generated
all of the nodes at depth 3. For b=10, this means a total of 2,220 node generations, compared with
1,111,110 for a standard breadth-first search. Thus, the time complexity of bidirectional search
using breadth-first searches in both directions is O(b^(d/2)). The space complexity is also O(b^(d/2)).
Greedy Best-First Search
Greedy best-first search tries to expand the node that appears closest to the goal, evaluating nodes using just the heuristic function, f(n) = h(n). Let us see how this works for route-finding problems in Romania; we use the straight-line distance
heuristic, which we will call hSLD. If the goal is Bucharest, we need to know the straight-line
distances to Bucharest, which are shown in Figure. For example, hSLD(In(Arad))=366.
The following Figure shows the progress of a greedy best-first search using hSLD to find a
path from Arad to Bucharest. The first node to be expanded from Arad will be Sibiu because it is closer to Bucharest than either Zerind or Fagaras.
Performance Analysis
• Time and space complexity – O(b^m), where m is the maximum depth of the search space
• Optimality – no
• Completeness - no
A* Search
A* evaluates nodes by combining g(n), the cost to reach the node, and h(n), the estimated cost to the goal: f(n) = g(n) + h(n). For the Romania route-finding problem it expands nodes along the path Arad – Sibiu – Rimnicu Vilcea – Pitesti to reach Bucharest.
Performance Analysis
• Time complexity – depends on heuristic function and admissible heuristic value
• Space complexity – O(b^m)
• Optimality – yes (locally finite graphs)
• Completeness – yes (locally finite graphs)
HEURISTIC FUNCTIONS
The 8-puzzle was one of the earliest heuristic search problems. The object of the puzzle is to slide
the tiles horizontally or vertically into the empty space until the configuration matches the goal
configuration. The average solution cost for a randomly generated 8-puzzle instance is about 22
steps. The branching factor is about 3.
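Two heuristics commonly used for the 8-puzzle, the misplaced-tiles count and the Manhattan distance, can be sketched as follows. They are standard examples (not stated in the excerpt above), and both are admissible because neither overestimates the number of moves needed; the state/goal encoding matches the earlier 8-puzzle sketch.

```python
def misplaced_tiles(state, goal):
    """h1: number of tiles (excluding the blank) not in their goal square."""
    return sum(1 for s, g in zip(state, goal) if s != 0 and s != g)

def manhattan_distance(state, goal):
    """h2: sum over tiles of horizontal + vertical distance to the goal square."""
    total = 0
    for i, tile in enumerate(state):
        if tile == 0:
            continue
        j = goal.index(tile)
        total += abs(i // 3 - j // 3) + abs(i % 3 - j % 3)
    return total
```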
LOCAL SEARCH ALGORITHMS
To understand local search, we find it useful to consider the state-space landscape (as in Figure). A
landscape has both “location” (defined by the state) and “elevation” (defined by the value of the
heuristic cost function or objective function). If elevation corresponds to cost, then the aim is to find
the lowest valley—a global minimum; if elevation corresponds to an objective function, then the
aim is to find the highest peak—a global maximum. Local search algorithms explore this
landscape. A complete local search algorithm always finds a goal if one exists; an optimal
algorithm always finds a global minimum/maximum.
Hill Climbing
The hill-climbing search algorithm (steepest-ascent version) is shown in Figure. It is
simply a loop that continually moves in the direction of increasing value ie, uphill. It terminates
when it reaches a “peak” where no neighbor has a higher value. The algorithm does not maintain a
search tree, so the data structure for the current node need only record the state and the value of the
objective function. Hill climbing does not look ahead beyond the immediate neighbors of the current state.
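The figure with the algorithm is not reproduced here; the following is a minimal steepest-ascent hill-climbing sketch in Python. The `value` and `neighbors` callables are assumed problem-specific helpers (for 8-queens, value could be the negated number of attacking pairs).

```python
def hill_climbing(start, neighbors, value):
    """Move to the best neighbor until no neighbor is better (a peak)."""
    current = start
    while True:
        best = max(neighbors(current), key=value, default=None)
        if best is None or value(best) <= value(current):
            return current       # local maximum (or edge of a plateau)
        current = best
```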
To illustrate hill climbing, we will use the 8-queens problem. Local search algorithms typically use
a complete-state formulation, where each state has 8 queens on the board, one per column. The
successors of a state are all possible states generated by moving a single queen to another square in
the same column.
The heuristic cost function h is the number of pairs of queens that are attacking each other,
either directly or indirectly. The global minimum of this function is zero, which occurs only at
perfect solutions. The following figure shows a state with h=17. The figure also shows the values
of all its successors, with the best successors having h=12. Hill-climbing algorithms typically
choose randomly among the set of best successors if there is more than one. Hill climbing is
sometimes called greedy local search because it grabs a good neighbor state without thinking
ahead about where to go next.
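The heuristic just described can be written directly. In this sketch a state is a tuple giving the row of the queen in each column, h counts pairs of queens that attack each other on the same row or diagonal, and the successor function moves a single queen within its column; the representation is an illustrative assumption.

```python
from itertools import combinations

def h_attacking_pairs(state):
    """Number of pairs of queens attacking each other (same row or diagonal)."""
    return sum(1 for (c1, r1), (c2, r2) in combinations(enumerate(state), 2)
               if r1 == r2 or abs(r1 - r2) == abs(c1 - c2))

def successors(state):
    """All states obtained by moving one queen to another square in its column."""
    n = len(state)
    return [state[:c] + (r,) + state[c + 1:]
            for c in range(n) for r in range(n) if r != state[c]]
```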
For example, from the state in Figure (a), it takes just five steps to reach the state in Figure (b),
which has h=1 and is very nearly a solution. Unfortunately, hill climbing often gets stuck for the
following reasons:
Local maxima: a local maximum is a peak that is higher than each of its neighboring states but
lower than the global maximum.
Ridges: a ridge is shown in Figure. Ridges result in a sequence of local maxima that is very difficult
for greedy algorithms to navigate.
Plateaux: a plateau is a flat area of the state-space landscape. It can be a flat local maximum, from
which no uphill exit exists, or a shoulder, from which progress is possible.
Many variants of hill climbing have been invented. Stochastic hill climbing chooses at
random from among the uphill moves; the probability of selection can vary with the steepness of the
uphill move. First-choice hill climbing implements stochastic hill climbing by generating successors randomly until one is generated that is better than the current state.
Simulated Annealing
A hill-climbing algorithm that never makes “downhill” moves toward states with lower
value (or higher cost) is guaranteed to be incomplete, because it can get stuck on a local maximum.
In contrast, a purely random walk—that is, moving to a successor chosen uniformly at random from
the set of successors—is complete but extremely inefficient. Therefore, it seems reasonable to try to
combine hill climbing with a random walk in some way that yields both efficiency and
completeness. Simulated annealing is such an algorithm. To explain simulated annealing, we
switch our point of view from hill climbing to gradient descent (i.e., minimizing cost) and imagine
the task of getting a ping-pong ball into the deepest crevice in a bumpy surface
The innermost loop of the simulated-annealing algorithm is quite similar to hill climbing.
Instead of picking the best move, however, it picks a random move. If the move improves the
situation, it is always accepted. Otherwise, the algorithm accepts the move with some probability
less than 1. The probability decreases exponentially with the “badness” of the move—the amount
ΔE by which the evaluation is worsened. The probability also decreases as the “temperature” T goes
down: “bad” moves are more likely to be allowed at the start when T is high, and they become more
unlikely as T decreases. If the schedule lowers T slowly enough, the algorithm will find a global
optimum with probability approaching 1.
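A simulated-annealing sketch in Python following the description above; the geometric cooling schedule, its parameter values, and the `value`/`neighbors` callables are assumptions for illustration (here higher `value` is better, so downhill moves have negative ΔE).

```python
import math
import random

def simulated_annealing(start, neighbors, value, t0=1.0, cooling=0.995, t_min=1e-4):
    """Always accept uphill moves; accept downhill moves with probability e^(dE/T)."""
    current, t = start, t0
    while t > t_min:
        nxt = random.choice(neighbors(current))
        delta_e = value(nxt) - value(current)   # > 0 means an improvement
        if delta_e > 0 or random.random() < math.exp(delta_e / t):
            current = nxt
        t *= cooling                             # schedule: lower T gradually
    return current
```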
Local Beam Search
The local beam search algorithm keeps track of k states rather than just one. In a random-restart search, each search process runs independently of the others. In a local beam
search, useful information is passed among the parallel search threads. In effect, the states that
generate the best successors say to the others, “Come over here, the grass is greener!” The algorithm
quickly abandons unfruitful searches and moves its resources to where the most progress is being
made.
In its simplest form, local beam search can suffer from a lack of diversity among the k
states—they can quickly become concentrated in a small region of the state space, making the
search little more than an expensive version of hill climbing.
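A minimal local beam search sketch: keep k states, pool all their successors, and keep the k best; the callables and the iteration cap are the same kind of illustrative assumptions as before.

```python
import heapq

def local_beam_search(k, random_state, neighbors, value, goal_test, max_iters=1000):
    """Track k states; at each step keep the k best of all their successors."""
    states = [random_state() for _ in range(k)]
    for _ in range(max_iters):
        pool = [s for st in states for s in neighbors(st)]
        for s in pool:
            if goal_test(s):
                return s
        if not pool:
            break
        states = heapq.nlargest(k, pool, key=value)  # best k successors overall
    return max(states, key=value)
```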
Genetic Algorithms
A genetic algorithm (or GA) is a variant of stochastic beam search in which successor states are
generated by combining two parent states rather than by modifying a single state.
Like beam searches, GAs begin with a set of k randomly generated states, called the
population. Each state, or individual, is represented as a string over a finite alphabet—most
commonly, a string of 0s and 1s. For example, an 8-queens state must specify the positions of 8
queens, each in a column of 8 squares, and so requires 8 × log2 8 = 24 bits. Alternatively, the state
could be represented as 8 digits, each in the range from 1 to 8.
Figure (a) shows a population of four 8-digit strings representing 8-queens states. The
production of the next generation of states is shown in Figure (b)–(e). In (b), each state is rated by
the objective function, or (in GA terminology) the fitness function. A fitness function should return
higher values for better states, so, for the 8-queens problem we use the number of nonattacking
pairs of queens, which has a value of 28 for a solution. The values of the four states are 24, 23, 20,
and 11.
In (c), two pairs are selected at random for reproduction, in accordance with the probabilities
in (b). Notice that one individual is selected twice and one not at all. For each pair to be mated, a
crossover point is chosen randomly from the positions in the string. In Figure, the crossover points
are after the third digit in the first pair and after the fifth digit in the second pair. In (d), the offspring
themselves are created by crossing over the parent strings at the crossover point.
The 8-queens states involved in this reproduction step are shown in Figure. The example
shows that when two parent states are quite different, the crossover operation can produce a state
that is a long way from either parent state. It is often the case that the population is quite diverse
early on in the process, so crossover (like simulated annealing) frequently takes large steps in the
state space early in the search process and smaller steps later on when most individuals are quite
similar.
Finally, in (e), each location is subject to random mutation with a small independent
probability. One digit was mutated in the first, third, and fourth offspring. In the 8-queens problem,
this corresponds to choosing a queen at random and moving it to a random square in its column. An
algorithm that implements all these steps is given below.
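The figure referenced above is not reproduced here; the following is a rough Python sketch of the steps just described (fitness-proportional selection, single-point crossover, and small-probability mutation) using the 8-digit representation for 8-queens. Every parameter value and helper name is an illustrative assumption.

```python
import random

def fitness(state):
    """Number of non-attacking pairs of queens (28 for an 8-queens solution)."""
    n = len(state)
    attacking = sum(1 for i in range(n) for j in range(i + 1, n)
                    if state[i] == state[j] or abs(state[i] - state[j]) == j - i)
    return n * (n - 1) // 2 - attacking

def genetic_algorithm(pop_size=4, n=8, mutation_rate=0.1, generations=1000):
    population = [[random.randint(1, n) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        fits = [fitness(ind) for ind in population]
        if max(fits) == n * (n - 1) // 2:            # a solution has appeared
            break
        weights = [f + 1 for f in fits]              # avoid all-zero selection weights
        new_population = []
        for _ in range(pop_size):
            # selection in proportion to fitness, then single-point crossover
            x, y = random.choices(population, weights=weights, k=2)
            c = random.randrange(1, n)
            child = x[:c] + y[c:]
            if random.random() < mutation_rate:      # random mutation of one digit
                child[random.randrange(n)] = random.randint(1, n)
            new_population.append(child)
        population = new_population
    return max(population, key=fitness)
```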
Searching with Partial Observations
For example, suppose the sensorless vacuum agent executes the action Right from its initial belief state; it will then be in one of the states {2, 4, 6, 8}—the agent now has more information! Furthermore, the action sequence [Right,Suck] will always end up in one of the states {4, 8}. Finally, the sequence [Right,Suck,Left,Suck] is guaranteed to reach the goal state no matter what the start state is.
It is instructive to see how the belief-state search problem is constructed. Suppose the underlying
physical problem P is defined by ACTIONSP, RESULTP, GOAL-TESTP, and STEP-COSTP . Then
we can define the corresponding sensorless problem as follows:
• Belief states: The entire belief-state space contains every possible set of physical states.
If P has N states, then the sensorless problem has up to 2^N states, although many may
be unreachable from the initial state.
• Initial state: Typically the set of all states in P, although in some cases the agent will
have more knowledge than this.
• Actions: This is slightly tricky. Suppose the agent is in belief state b = {s1, s2}, but
ACTIONS_P(s1) ≠ ACTIONS_P(s2); then the agent is unsure of which actions are legal.
If we assume that illegal actions have no effect on the environment, then it is safe to take the
union of all the actions in any of the physical states in the current belief state b: ACTIONS(b) = ⋃ s∈b ACTIONS_P(s).
On the other hand, if an illegal action might be the end of the world, it is safer to allow only the
intersection, that is, the set of actions legal in all the states. For the vacuum world, every state
has the same legal actions, so both methods give the same result.
• Transition model: The agent doesn’t know which state in the belief state is the right one; so
as far as it knows, it might get to any of the states resulting from applying the action to one
of the physical states in the belief state. For deterministic actions, the set of states that might
be reached is
b' = RESULT(b, a) = {s' : s' = RESULT_P(s, a) and s ∈ b}.
The process of generating the new belief state after the action is called the prediction step;
the notation b̂ = PREDICT_P(b, a) will come in handy.
• Goal test: The agent wants a plan that is sure to work, which means that a belief state
satisfies the goal only if all the physical states in it satisfy GOAL-TESTP . The agent may
accidentally achieve the goal earlier, but it won’t know that it has done so.
• Path cost: This is also tricky. If the same action can have different costs in different states,
then the cost of taking an action in a given belief state could be one of several values.
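A rough sketch of how the belief-state problem wraps the physical problem, following the bullets above. Belief states are represented as frozensets of physical states, and the union-of-actions convention (illegal actions have no effect) is assumed; the wrapped callables mirror the earlier problem formulation and are illustrative names.

```python
def belief_actions(b, actions_p):
    """Union of the actions legal in any physical state of the belief state."""
    acts = set()
    for s in b:
        acts |= set(actions_p(s))
    return acts

def predict(b, a, result_p):
    """Prediction step: every physical state the action could lead to."""
    return frozenset(result_p(s, a) for s in b)

def belief_goal_test(b, goal_test_p):
    """The belief state is a goal only if *all* its physical states are goals."""
    return all(goal_test_p(s) for s in b)
```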
• The prediction stage is the same as for sensorless problems: given the action a in belief state b, the predicted belief state is b̂ = PREDICT_P(b, a).
• The observation prediction stage determines the set of percepts o that could be observed
in the predicted belief state:
POSSIBLE-PERCEPTS(b̂) = {o : o = PERCEPT(s) and s ∈ b̂}.
• The update stage determines, for each possible percept, the belief state that would result
from the percept. The new belief state b_o is just the set of states in b̂ that could have
produced the percept:
b_o = UPDATE(b̂, o) = {s : o = PERCEPT(s) and s ∈ b̂}.
Putting these three stages together, we obtain the possible belief states resulting from a given
action and the subsequent possible percepts:
RESULTS(b, a) = {b_o : b_o = UPDATE(PREDICT_P(b, a), o) and o ∈ POSSIBLE-PERCEPTS(PREDICT_P(b, a))}.
Figure 4.16 shows part of the search tree for the local-sensing vacuum world, assuming an initial
percept [A, Dirty]. The solution is the conditional plan
[Suck, Right, if Bstate ={6} then Suck else [ ]]
Because we supplied a belief-state problem to the AND–OR search algorithm, it returned a
conditional plan that tests the belief state rather than the actual state. This is as it should be: in a
partially observable environment the agent won’t be able to execute a solution that requires testing
the actual state.
As in the case of standard search algorithms applied to sensorless problems, the AND– OR
search algorithm treats belief states as black boxes, just like any other states. One can improve on
this by checking for previously generated belief states that are subsets or supersets of the current
state, just as for sensorless problems. One can also derive incremental search algorithms, analogous
to those described for sensorless problems, that provide substantial speedups over the black-box
approach.
Here we will show an example in a discrete environment with deterministic sensors and
nondeterministic actions. The example concerns a robot with the task of localization: working out
where it is, given a map of the world and a sequence of percepts and actions. Our robot is placed in
the maze-like environment of Figure 4.18. The robot is equipped with four sonar sensors that tell
whether there is an obstacle—the outer wall or a black square in the figure—in each of the four
compass directions. We assume that the sensors give perfectly correct data, and that the robot has a
correct map of the environment. But unfortunately the robot’s navigational system is broken, so
when it executes a Move action, it moves randomly to one of the adjacent
squares. The robot’s task is to determine its current location.
Suppose the robot has just been switched on, so it does not know where it is. Thus its initial belief
state b consists of the set of all locations. The robot then receives the percept NSW, meaning there are
obstacles to the north, south, and west, and does an update using the equation b_o = UPDATE(b, NSW), yielding the set of locations consistent with that percept.
Constraint Propagation : Inference in CSP
In CSPs there is a choice: an algorithm can search (choose a new variable assignment from several
possibilities) or do a specific type of inference called constraint propagation: using the
constraints to reduce the number of legal values for a variable, which in turn can reduce the legal
values for another variable, and so on.
The key idea is local consistency. If we treat each variable as a node in a graph (see Figure 6.1(b))
and each binary constraint as an arc, then the process of enforcing local consistency in each part of
the graph causes inconsistent values to be eliminated throughout the graph. There are different
types of local consistency, which are as follows.
Node consistency
A single variable (corresponding to a node in the CSP network) is node-consistent if all the values
in the variable’s domain satisfy the variable’s unary constraints. For example, in the variant of the
Australia map-coloring problem (Figure 6.1) where South Australians dislike green, the variable SA
starts with domain {red , green, blue}, and we can make it node consistent by eliminating green,
leaving SA with the reduced domain {red , blue}. We say that a network is node-consistent if every
variable in the network is node-consistent. It is always possible to eliminate all the unary constraints
in a CSP by running node
consistency.
Arc consistency
A variable in a CSP is arc-consistent if every value in its domain satisfies the variable’s binary
constraints. More formally, Xi is arc-consistent with respect to another variable Xj if for every value
in the current domain Di there is some value in the domain Dj that satisfies the binary constraint on
the arc (Xi,Xj). A network is arc-consistent if every variable is arc consistent with every other
variable. For example, consider the constraint Y = X² where the domain of both X and Y is the set
of digits. We can write this constraint explicitly as ⟨(X, Y), {(0, 0), (1, 1), (2, 4), (3, 9)}⟩.
To make X arc-consistent with respect to Y , we reduce X’s domain to {0, 1, 2, 3}. If we also make
Y arc-consistent with respect to X, then Y ’s domain becomes {0, 1, 4, 9} and the whole CSP is arc-
consistent.
On the other hand, arc consistency can do nothing for the Australia map-coloring problem. Consider
the following inequality constraint on (SA,WA):
{(red , green), (red , blue), (green, red ), (green, blue), (blue, red ), (blue, green)} .
No matter what value you choose for SA (or for WA), there is a valid value for the other variable.
So applying arc consistency has no effect on the domains of either variable.
The most popular algorithm for arc consistency is called AC-3 (see Figure 6.3). To make every
variable arc-consistent, the AC-3 algorithm maintains a queue of arcs to consider. Initially, the
queue contains all the arcs in the CSP. AC-3 then pops off an arbitrary arc (Xi,Xj) from the queue
and makes Xi arc-consistent with respect to Xj . If this leaves Di unchanged, the algorithm just
moves on to the next arc. But if this revises Di (makes the domain smaller), then we add to the
queue all arcs (Xk,Xi) where Xk is a neighbor of Xi. We need to do that because the change in Di
might enable further reductions in the domain Dk, even if we have previously considered Xk. If
Di is revised down to nothing, then we know the whole CSP has no consistent solution, and AC-3
can immediately return failure. Otherwise, we keep checking, trying to remove values from the
domains of variables until no more arcs are in the queue. At that point, we are left with a CSP that is
equivalent to the original CSP—they both have the same solutions—but the arc-consistent CSP will
in most cases be faster to search because its variables have smaller domains
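A compact AC-3 sketch in Python following the description above. Here `domains` maps each variable to a set of values, `neighbors` maps each variable to the variables it shares a binary constraint with, and `constraint(Xi, x, Xj, y)` is assumed to return True when the pair of values is allowed; these names are illustrative assumptions. For the Y = X² example, the constraint check for the arc (X, Y) would simply be `y == x * x`.

```python
from collections import deque

def revise(domains, xi, xj, constraint):
    """Remove values from Di that have no supporting value in Dj."""
    removed = False
    for x in set(domains[xi]):
        if not any(constraint(xi, x, xj, y) for y in domains[xj]):
            domains[xi].discard(x)
            removed = True
    return removed

def ac3(domains, neighbors, constraint):
    """Return False if some domain becomes empty (no solution), else True."""
    queue = deque((xi, xj) for xi in domains for xj in neighbors[xi])
    while queue:
        xi, xj = queue.popleft()
        if revise(domains, xi, xj, constraint):
            if not domains[xi]:
                return False                   # inconsistency detected
            for xk in neighbors[xi]:
                if xk != xj:
                    queue.append((xk, xi))     # re-check arcs into Xi
    return True
```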
Path Consistency
Path consistency tightens the binary constraints by using implicit constraints that are inferred by
looking at triples of variables.
A two-variable set {Xi,Xj} is path-consistent with respect to a third variable Xm if, for every
assignment {Xi = a,Xj = b} consistent with the constraints on {Xi,Xj}, there is an assignment to
Xm that satisfies the constraints on {Xi,Xm} and {Xm,Xj}. This is called path consistency because
one can think of it as looking at a path from Xi to Xj with Xm in the middle.
Let’s see how path consistency fares in coloring the Australia map with two colors. We will make
the set {WA,SA} path consistent with respect to NT. We start by enumerating the consistent
assignments to the set. In this case, there are only two: {WA = red, SA = blue} and {WA = blue, SA = red}. We can see that with both of these assignments NT can be neither red nor blue (because it
would conflict with either WA or SA). Because there is no valid choice for NT, we eliminate both
assignments, leaving no valid assignments for the set {WA, SA}; therefore the map cannot be colored with only two colors.
Variable and Value Ordering
The backtracking algorithm contains the line var ← SELECT-UNASSIGNED-VARIABLE(csp).
The simplest strategy for SELECT-UNASSIGNED-VARIABLE is to choose the next unassigned
variable in order, {X1,X2, . . .}. This static variable ordering seldom results in the most efficient
search. For example, after the assignments for WA=red and NT =green in Figure 6.6, there is only
one possible value for SA, so it makes sense to assign SA=blue next rather than assigning Q. In fact,
after SA is assigned, the choices for Q, NSW, and V are all forced. This intuitive idea—choosing
the variable with the fewest “legal” values—is called the minimum remaining-values (MRV)
heuristic. It also has been called the “most constrained variable” or “fail-first” heuristic, the latter
because it picks a variable that is most likely to cause a failure soon, thereby pruning the search tree.
The MRV heuristic does not help in choosing the first variable to assign; for that, the degree heuristic is useful: it selects the variable involved in the largest number of constraints on other unassigned variables. In Figure 6.1, SA is the variable with highest degree, 5; the other variables have degree 2 or 3,
except for T, which has degree 0. In fact, once SA is chosen, applying the degree heuristic solves
the problem without any false steps—you can choose any consistent color at each choice point and
still arrive at a solution with no backtracking.
Once a variable has been selected, the algorithm must decide on the order in which to examine its
values. For this, the least-constraining-value heuristic can be effective in some cases. It prefers
the value that rules out the fewest choices for the neighboring variables in the constraint graph. For
example, suppose that in Figure 6.1 we have generated the partial assignment with WA=red and NT
=green and that our next choice is for Q. Blue would be a bad choice because it eliminates the last
legal value left for Q’s neighbor, SA. The least-constraining-value heuristic therefore prefers red to
blue. In general, the heuristic is trying to leave the maximum flexibility for subsequent variable
assignments. Of course, if we are trying to find all the solutions to a problem, not just the first one,
then the ordering does not matter because we have to consider every value anyway. The same holds
if there are no solutions to the problem.
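A backtracking sketch with the MRV variable ordering and the least-constraining-value ordering described above; the CSP interface (`domains`, `neighbors`, `constraint`) matches the AC-3 sketch and is an illustrative assumption rather than a fixed API. Call it with an empty dictionary as the initial assignment.

```python
def backtracking_search(assignment, domains, neighbors, constraint):
    """Backtracking with MRV variable ordering and least-constraining-value ordering."""
    if len(assignment) == len(domains):
        return assignment

    def legal_values(v):
        """Values of v consistent with the current partial assignment."""
        return [x for x in domains[v]
                if all(constraint(v, x, n, assignment[n])
                       for n in neighbors[v] if n in assignment)]

    # MRV: choose the unassigned variable with the fewest legal values left
    var = min((v for v in domains if v not in assignment),
              key=lambda v: len(legal_values(v)))

    def ruled_out(value):
        """How many neighbour values this choice would eliminate."""
        return sum(1 for n in neighbors[var] if n not in assignment
                   for y in domains[n] if not constraint(var, value, n, y))

    # Least-constraining value: try the least restrictive values first
    for value in sorted(legal_values(var), key=ruled_out):
        assignment[var] = value
        result = backtracking_search(assignment, domains, neighbors, constraint)
        if result is not None:
            return result
        del assignment[var]
    return None
```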
GAME PLAYING
Early work on AI focused on formal tasks such as game playing and theorem proving. In
game playing to select the next state, search technique is used. There were two reasons that
games appeared to be a good domain in which to explore machine intelligence.
(i) Games provide a structured task in which it is easy to measure success or failure
(ii) They do not require large amounts of knowledge. They apply a straightforward
search and provide a solution from the starting state to a winning state.
In the generate-and-test procedure, the generator generates proposed solutions and the
tester then evaluates them. To improve the effectiveness of this procedure, two things can be
done:
Improve the generate procedure so that only good moves are generated.
Improve the test procedure, so that the best moves will be recognized and explored first.
Two types of generator
(i) Legal move generator – All the possible moves will be generated
(ii) Plausible move generator – Some smaller number of promising moves are generated.
A game can be formally defined as a kind of search problem with the following elements:
• S0: The initial state, which specifies how the game is set up at the start.
• PLAYER(s): Defines which player has the move in a state.
• ACTIONS(s): Returns the set of legal moves in a state.
• RESULT(s, a): The transition model, which defines the result of a move.
• TERMINAL-TEST(s): A terminal test, which is true when the game is over and false
otherwise. States where the game has ended are called terminal states.
• UTILITY(s, p): A utility function (also called an objective function or payoff function) that defines
the final numeric value for a game that ends in terminal state s for a player p. In chess, the outcome
is a win, loss, or draw, with values +1, 0, or ½ respectively. Some games have a wider variety of possible
outcomes; the payoffs in backgammon range from 0 to +192.
Figure 5.1 shows part of the game tree for tic-tac-toe (noughts and crosses). From the initial state,
MAX has
nine possible moves. Play alternates between MAX’s placing an X and MIN’s placing an O until we
reach leaf nodes corresponding to terminal states such that one player has three in a row or all the
squares are filled. The number on each leaf node indicates the utility value of the terminal state from
the point of view of MAX; high values are assumed to be good for MAX and bad for MIN (which is
how the players get their names).
Given a game tree, the optimal strategy can be determined from the minimax value of each node,
which we write as MINIMAX(n). The minimax value of a node is the utility (for MAX) of being in
the corresponding state, assuming that both players play optimally from there to the end of the
game. Obviously, the minimax value of a terminal state is just its utility. Furthermore, given a
choice, MAX prefers to move to a state of maximum value, whereas MIN prefers a state of
minimum value. So we have the following:
MINIMAX(s) = UTILITY(s) if TERMINAL-TEST(s)
MINIMAX(s) = max over a in ACTIONS(s) of MINIMAX(RESULT(s, a)) if PLAYER(s) = MAX
MINIMAX(s) = min over a in ACTIONS(s) of MINIMAX(RESULT(s, a)) if PLAYER(s) = MIN
Let us apply these definitions to the game tree in Figure 5.2. The terminal nodes on the bottom level
get their utility values from the game’s UTILITY function. The first MIN node, labeled B, has three
successor states with values 3, 12, and 8, so its minimax value is 3. Similarly, the other two MIN
nodes have minimax value 2. The root node is a MAX node; its successor states have minimax
values 3, 2, and 2; so it has a minimax value of 3. We can also identify the minimax decision at the
root: action a1 is the optimal choice for MAX because it leads to the state with the highest minimax
value.
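A direct transcription of the minimax definition into Python; the game interface (`terminal_test`, `utility`, `player`, `actions`, `result`) mirrors the formal elements listed earlier, but the exact method names are an assumed convention.

```python
def minimax_value(state, game):
    """Utility (for MAX) of a state assuming both players play optimally."""
    if game.terminal_test(state):
        return game.utility(state, "MAX")
    values = [minimax_value(game.result(state, a), game)
              for a in game.actions(state)]
    return max(values) if game.player(state) == "MAX" else min(values)

def minimax_decision(state, game):
    """Choose the action leading to the successor with the highest minimax value."""
    return max(game.actions(state),
               key=lambda a: minimax_value(game.result(state, a), game))
```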
Let us examine how to extend the minimax idea to multiplayer games. We need to replace the single
value for each node with a vector of values. For example, in a three-player game with players A, B,
and C, a vector ⟨vA, vB, vC⟩ is associated with each node. For terminal states, this vector gives the
utility of the state from each player’s viewpoint. (In two-player, zero-sum games, the two-element
vector can be reduced to a single value because the values are always opposite.) The simplest way
to implement this is to have the UTILITY function return a vector of utilities.
Consider the node marked X in the game tree shown in Figure 5.4. In that state, player C chooses
what to do. The two choices lead to terminal states with utility vectors ⟨vA = 1, vB = 2, vC = 6⟩ and
⟨vA = 4, vB = 2, vC = 3⟩. Since 6 is bigger than 3, C should choose the first move. This means that if
state X is reached, subsequent play will lead to a terminal state with utilities ⟨vA = 1, vB = 2, vC = 6⟩. Hence, the backed-up value of X is this vector. The backed-up value of a node n is always the
utility vector of the successor state with the highest value for the player choosing at n. Anyone who
plays multiplayer games, such as Diplomacy, quickly becomes aware that much more is going on
than in two-player games. Multiplayer games usually involve alliances, whether formal or informal,
among the players.
Alpha–Beta Pruning
Alpha–beta pruning, when applied to a standard minimax tree, returns the same move as
minimax would, but prunes away branches that cannot possibly influence the final decision.
Let the two unevaluated successors of node C in Figure 5.5 have values x and y. Then the value of
the root node is given by
MINIMAX(root ) = max(min(3, 12, 8), min(2, x, y), min(14, 5, 2))
= max(3, min(2, x, y), 2)
= max(3, z, 2) where z = min(2, x, y) ≤ 2
= 3.
In other words, the value of the root and hence the minimax decision are independent of the
values of the pruned leaves x and y. Alpha–beta pruning can be applied to trees of any depth, and it
is often possible to prune entire subtrees rather than just leaves. The general principle is this:
consider a node n somewhere in the tree (see Figure 5.6), such that Player has a choice of moving to
that node. If Player has a better choice m either at the parent node of n or at any choice point further
up, then n will never be reached in actual play.
Alpha–beta pruning gets its name from the following two parameters that describe bounds on the
backed-up values that appear anywhere along the path.
α = the value of the best (i.e., highest-value) choice we have found so far at any choice point
along the path for MAX.
β = the value of the best (i.e., lowest-value) choice we have found so far at any choice point
along the path for MIN.
Alpha–beta search updates the values of α and β as it goes along and prunes the remaining branches
at a node (i.e., terminates the recursive call) as soon as the value of the current node is known to be
worse than the current α or β value for MAX or MIN, respectively. The complete algorithm is given
in Figure 5.7.
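The referenced figure is not reproduced here; the following alpha-beta sketch in Python maintains α and β as described above and returns the same value as plain minimax while pruning branches that cannot matter. The game interface is the same assumed one as in the minimax sketch.

```python
import math

def alpha_beta_value(state, game, alpha=-math.inf, beta=math.inf):
    """Minimax value with alpha-beta pruning."""
    if game.terminal_test(state):
        return game.utility(state, "MAX")
    if game.player(state) == "MAX":
        value = -math.inf
        for a in game.actions(state):
            value = max(value, alpha_beta_value(game.result(state, a), game, alpha, beta))
            if value >= beta:        # MIN above will never allow this branch
                return value
            alpha = max(alpha, value)
        return value
    else:
        value = math.inf
        for a in game.actions(state):
            value = min(value, alpha_beta_value(game.result(state, a), game, alpha, beta))
            if value <= alpha:       # MAX above will never allow this branch
                return value
            beta = min(beta, value)
        return value
```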
Stochastic Games
Stochastic games, such as backgammon, combine skill with an element of chance from dice rolls. Although White knows what his or her own legal moves are, White does not know what Black is
going to roll and thus does not know what Black’s legal moves will be. That means White cannot
construct a standard game tree of the sort we saw in chess and tic-tac-toe. A game tree in
backgammon must include chance nodes in addition to MAX and MIN nodes. Chance nodes are
shown as circles in Figure 5.11
Terminal nodes and MAX and MIN nodes (for which the dice roll is known) work exactly the same
way as before. For chance nodes we compute the expected value, which is the sum of the value over
all outcomes, weighted by the probability of each chance action
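For chance nodes the backed-up value is an expectation; a sketch of this idea (often called expectiminimax) is given below. It assumes the game interface additionally exposes `chance_outcomes(state)` returning (probability, resulting state) pairs, which is an illustrative extension of the earlier assumed interface.

```python
def expectiminimax(state, game):
    """Minimax extended with chance nodes: expected value over chance outcomes."""
    if game.terminal_test(state):
        return game.utility(state, "MAX")
    player = game.player(state)
    if player == "CHANCE":
        # weighted sum over the possible dice rolls (probability, resulting state)
        return sum(prob * expectiminimax(s, game)
                   for prob, s in game.chance_outcomes(state))
    values = [expectiminimax(game.result(state, a), game) for a in game.actions(state)]
    return max(values) if player == "MAX" else min(values)
```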