UNIT3 Aies Reg 2023
CONSTRAINT SATISFACTION PROBLEMS
Consistency-based algorithms use information from the constraints to reduce the search space as early in the search as possible. Constraint satisfaction problems require a good deal of reasoning, and their time complexity is generally higher than that of comparable unconstrained problems. They can also be solved by evolutionary approaches using mutation operations. Such a problem depends on a set of constraints that are a necessary part of the problem; various complex real-world tasks, such as map coloring and scheduling, can be formulated this way.
Example
Figure 2.15(a): Principal states and territories of Australia. Coloring this map can be viewed as a constraint satisfaction problem. The goal is to assign colors to each region so that no neighboring regions have the same color.
Initial state: the empty assignment {}, in which all variables are unassigned.
Successor function: a value can be assigned to any unassigned variable, provided that it does not conflict with previously assigned variables.
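This formulation can be written down directly. The following Python sketch (illustrative only; the variable, domain, and neighbor structures are our own naming, not from any library) encodes the Australia map-coloring CSP and its binary "not-equal" constraints:

```python
# Illustrative sketch (our own naming) of the Australia map-coloring CSP:
# variables, domains, and neighbor relations.
variables = ["WA", "NT", "SA", "Q", "NSW", "V", "T"]
domains = {v: ["red", "green", "blue"] for v in variables}
neighbors = {
    "WA": ["NT", "SA"], "NT": ["WA", "SA", "Q"],
    "SA": ["WA", "NT", "Q", "NSW", "V"], "Q": ["NT", "SA", "NSW"],
    "NSW": ["Q", "SA", "V"], "V": ["SA", "NSW"], "T": [],
}

def consistent(var, value, assignment):
    """A value is allowed if no already-assigned neighbor has the same color."""
    return all(assignment.get(n) != value for n in neighbors[var])
```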
Varieties of constraints:
(i) Unary constraints involve a single variable. Example: SA ≠ green.
(ii) Binary constraints involve pairs of variables. Example: SA ≠ WA.
(iii) Higher-order constraints involve three or more variables. Example: cryptarithmetic puzzles.
(iv) Absolute constraints rule out a potential solution when they are violated.
(v) Preference constraints indicate which solutions are preferred.
Forward checking
One way to make better use of constraints during search is called forward
checking.
Whenever a variable X is assigned, the forward checking process looks at each unassigned variable Y that is connected to X by a constraint and deletes from Y's domain any value that is inconsistent with the value chosen for X.
The following figure shows the progress of a map-coloring search with
forward checking.
Although forward checking detects many inconsistencies, it does not detect all of
them.
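As a rough sketch of the idea (assuming the map-coloring structures from the earlier example, and only the "neighboring regions differ" constraint), forward checking might look like this:

```python
def forward_check(var, value, assignment, domains, neighbors):
    """After assigning 'value' to 'var', delete values inconsistent with it
    from the domains of unassigned neighbors (here, equal colors).
    Returns False as soon as some neighbor's domain becomes empty."""
    for other in neighbors[var]:
        if other not in assignment:
            if value in domains[other]:
                domains[other].remove(value)
            if not domains[other]:
                return False   # dead end detected without further search
    return True
```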
Constraint propagation is the general term for propagating the implications of a constraint on one variable onto other variables.
Node consistency
A single variable (corresponding to a node in the CSP network) is node-consistent
if all the values in the variable’s domain satisfy the variable’s unary constraints.
For example, in the variant of the Australia map-coloring problem where South Australians dislike green, the variable SA starts with domain {red, green, blue}, and we can make it node-consistent by eliminating green, leaving SA with the reduced domain {red, blue}. We say that a network is node-consistent if every variable in the network is node-consistent.
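A minimal sketch of enforcing node consistency; the unary_ok predicate is an assumed placeholder for the variable's unary constraints:

```python
def make_node_consistent(var, domains, unary_ok):
    """Keep only the values of var's domain that satisfy its unary constraints.
    unary_ok(var, value) -> bool is an assumed placeholder for those constraints."""
    domains[var] = [v for v in domains[var] if unary_ok(var, v)]

# South Australians dislike green:
# make_node_consistent("SA", domains, lambda var, v: not (var == "SA" and v == "green"))
```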
K-Consistency
The key idea of CSP constraint propagation is local consistency. If we treat each variable as a node in a graph and each binary constraint as an arc, then the process of enforcing local consistency in each part of the graph causes inconsistent values to be eliminated throughout the graph. Local consistency comes in increasingly strong forms: a CSP is k-consistent if, for any set of k - 1 variables and for any consistent assignment to those variables, a consistent value can always be assigned to any k-th variable. Node consistency is 1-consistency, and arc consistency is 2-consistency.
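The simplest such propagation beyond node consistency is arc consistency (2-consistency). A sketch of the basic "revise" step, assuming a placeholder satisfies(vx, vy) predicate for the binary constraint between x and y:

```python
def revise(x, y, domains, satisfies):
    """One step of enforcing arc consistency on the arc (x, y): remove from x's
    domain every value with no supporting value in y's domain.
    satisfies(vx, vy) -> bool is an assumed placeholder for the binary constraint."""
    removed = False
    for vx in list(domains[x]):
        if not any(satisfies(vx, vy) for vy in domains[y]):
            domains[x].remove(vx)
            removed = True
    return removed
```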
Tree-Structured CSPs
A CSP is tree-structured if its constraint graph forms a tree, that is, any two variables are connected by at most one path. Tree-structured CSPs can be solved without backtracking: pick a variable as the root, make the graph directed arc-consistent working from the leaves toward the root, and then assign values to the variables in order from the root outward. The whole procedure runs in time linear in the number of variables.
Global constraints
Global constraint is one involving an arbitrary number of variables (but not
necessarily all variables). Global constraints occur frequently in real problems and
can be handled by special-purpose algorithms that are more efficient than the
general-purpose methods described so far. For example, the Alldiff constraint says
that all the variables involved must have distinct values (as in the cryptarithmetic
problem above and Sudoku puzzles below). One simple form of inconsistency
detection for Alldiff constraints works as follows: if m variables are involved in the constraint, and they have n possible distinct values altogether, and m > n, then the constraint cannot be satisfied.
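A sketch of this simple test in Python (function and parameter names are illustrative):

```python
def alldiff_inconsistent(constraint_vars, domains):
    """Return True if the Alldiff constraint over constraint_vars cannot be
    satisfied: m variables but fewer than m distinct values remain."""
    values = set()
    for v in constraint_vars:
        values.update(domains[v])
    return len(values) < len(constraint_vars)
```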
Resource constraint
Another important higher-order constraint is the resource constraint, sometimes
called the Atmost constraint. For example, in a scheduling problem, let P1, P2, P3, and P4 denote the numbers of personnel assigned to each of four tasks. The constraint that no more than 10 personnel are assigned in total is written as Atmost(10, P1, P2, P3, P4).
For large resource-limited problems with integer values, it is not practical to represent each domain as an explicit set of integers. Instead, domains are represented by upper and lower bounds and are managed by bounds propagation. For example, in an airline-scheduling problem, let's suppose there are two flights, F1 and F2, for which the planes have capacities 165 and 385, respectively. The initial domains for the numbers of passengers on each flight are then D1 = [0, 165] and D2 = [0, 385]. Now suppose we have the additional constraint that the two flights together must carry 420 people: F1 + F2 = 420. Propagating bounds constraints, we reduce the domains to D1 = [35, 165] and D2 = [255, 385].
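A sketch of bounds propagation for a sum constraint of this kind, with the airline example as a usage comment (names are illustrative):

```python
def propagate_sum_bounds(d1, d2, total):
    """Bounds propagation for the constraint X1 + X2 = total, where each domain
    is a (lo, hi) interval. Returns the tightened intervals."""
    lo1, hi1 = d1
    lo2, hi2 = d2
    lo1 = max(lo1, total - hi2); hi1 = min(hi1, total - lo2)
    lo2 = max(lo2, total - hi1); hi2 = min(hi2, total - lo1)
    return (lo1, hi1), (lo2, hi2)

# propagate_sum_bounds((0, 165), (0, 385), 420)  ->  ((35, 165), (255, 385))
```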
Sudoku example
The popular Sudoku puzzle has introduced millions of people to constraint
satisfaction problems, although they may not recognize it. A Sudoku board
consists of 81 squares, some of which are initially
filled with digits from 1 to 9. The puzzle is to fill in all the remaining squares such
that no digit appears twice in any row, column, or 3 ×3 box (see Figure 6.4). A
row, column, or box is called a unit.
The Sudoku puzzles that are printed in newspapers and puzzle books have the
property that there is exactly one solution. Although some can be tricky to solve
by hand, taking tens of minutes, even the hardest Sudoku problems yield to a
CSP solver in less than 0.1 second. Sudoku puzzle can be considered a CSP
with 81 variables, one for each square. We use the variable names A1 through
A9 for the top row (left to right), down to I1 through I9 for the bottom row. The
empty squares have the domain {1, 2, 3, 4, 5, 6, 7, 8, 9} and the pre-filled squares have a domain consisting of a single value.
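An illustrative sketch of setting up the Sudoku variables, units, and domains in Python (the naming is ours):

```python
# 81 variables named A1..I9, and the 27 units (rows, columns, 3x3 boxes),
# each of which carries an Alldiff constraint.
rows, cols = "ABCDEFGHI", "123456789"
squares = [r + c for r in rows for c in cols]
unit_rows = [[r + c for c in cols] for r in rows]
unit_cols = [[r + c for r in rows] for c in cols]
unit_boxes = [[r + c for r in rs for c in cs]
              for rs in ("ABC", "DEF", "GHI") for cs in ("123", "456", "789")]
units = unit_rows + unit_cols + unit_boxes

def initial_domains(givens):
    """givens maps pre-filled squares to their digit, e.g. {'A1': '5'}."""
    return {s: [givens[s]] if s in givens else list("123456789") for s in squares}
```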
3. BACKTRACKING SEARCH
The term backtracking search is used for a depth-first search that chooses values for one variable at a time and backtracks when a variable has no legal values left to assign. It repeatedly chooses an unassigned variable, and then tries all values in the domain of that variable in turn, trying to find a solution. If an inconsistency is detected, then BACKTRACK returns failure, causing the previous call to try another value.
Part of the search tree for the Australia problem is shown in Figure 6.6, where we have assigned variables in the order WA, NT, Q. Because the representation of CSPs is standardized, there is no need to supply BACKTRACKING-SEARCH with a domain-specific initial state, action function, transition model, or goal test.
It turns out that we can solve CSPs efficiently without such domain-specific
knowledge. Instead, we can add some sophistication to the unspecified
functions, using them to address the following questions:
1. Which variable should be assigned next (SELECT-UNASSIGNED-
VARIABLE), and in what order should its values be tried (ORDER-
DOMAIN-VALUES)?
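A minimal sketch of backtracking search in Python, reusing the consistent() check from the earlier map-coloring sketch; it uses naive variable and value ordering, which is exactly what the questions above are about improving:

```python
def backtracking_search(assignment, variables, domains, consistent):
    """Depth-first backtracking for a CSP, with naive variable and value ordering.
    consistent(var, value, assignment) is the constraint check from the
    map-coloring sketch above."""
    if len(assignment) == len(variables):
        return assignment                              # every variable assigned
    var = next(v for v in variables if v not in assignment)
    for value in domains[var]:
        if consistent(var, value, assignment):
            assignment[var] = value
            result = backtracking_search(assignment, variables, domains, consistent)
            if result is not None:
                return result
            del assignment[var]                        # no solution below: backtrack
    return None                                        # no legal value left: failure

# e.g. backtracking_search({}, variables, domains, consistent)
```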
GAME THEORY
Games appeared to be a good domain in which to explore machine intelligence because they provide a structured task in which success or failure is easy to measure, and they did not obviously require large amounts of knowledge. To improve the effectiveness of a search-based game-playing program, two things can be done:
1. Improve the generate procedure so that only good moves (or paths) are generated.
2. Improve the test procedure so that the best moves (or paths) will be recognized and explored first.
As the number of legal moves available increases, it becomes increasingly important to
apply heuristics to select only those that have some kind of promise.
The ideal way to use a search procedure to find a solution to a problem is to generate
moves through the problem space until a goal state is reached.
A very simple static evaluation function for chess based on piece advantage was proposed
by Turing
—simply add the values of black's pieces (B), the values of white's pieces (W), and then
compute the quotient W/B.
A more sophisticated approach was that taken in Samuel's checkers program, in which
the static evaluation function was a linear combination of several simple functions. Thus
the complete evaluation function had the form:
c1 × piece advantage + c2 × advancement + c3 × center control
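A small illustrative sketch of such a linear evaluation function (the weights and feature values below are made-up placeholders, not Samuel's actual coefficients):

```python
def static_eval(features, weights=(1.0, 0.5, 0.3)):
    """Linear static evaluation in the style of Samuel's checkers program:
    a weighted sum of simple position features."""
    piece_advantage, advancement, center_control = features
    c1, c2, c3 = weights
    return c1 * piece_advantage + c2 * advancement + c3 * center_control

# e.g. a position that is +2 pieces, slightly advanced, weak in the center:
# static_eval((2, 1, -1))  ->  2.2
```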
Unfortunately, deciding which moves have contributed to wins and which to losses is not
always easy. Suppose we make a very bad move, but then, because the opponent makes a
mistake, we ultimately win the game. We would not like to give credit for winning to
our mistake.
Both the move generator and the static evaluation function must incorporate a great deal of knowledge about the particular game being played. But unless these functions are perfect, we also need a search procedure that makes it possible to look ahead as many moves as possible to see what may occur.
For a simple one-person game or puzzle, the A* algorithm can be used. It can be applied
to reason forward from the current state as far as possible in the time allowed. The
heuristic function h' can be applied at terminal nodes and used to propagate values back
up the search graph so that the best next move can be chosen. But because of their adversarial nature, this procedure is inadequate for two-person games such as chess.
As values are passed back up, different assumptions must be made at levels where the
program chooses the move and at the alternating levels where the opponent chooses.
There are several ways that this can be done. The most commonly used method is the
minimax procedure.
An alternative approach is the B* algorithm which works on both standard problem-
solving trees and on game trees.
Figure: A two-ply game tree. The △ nodes are “MAX nodes,” in which it is MAX’s turn to move, and the ▽ nodes are “MIN nodes.” The terminal nodes show the utility values for MAX; the other nodes are labeled with their minimax values. MAX’s best move at the root is a1, because it leads to the state with the highest minimax value, and MIN’s best reply is b1, because it leads to the state with the lowest minimax value.
Minimax Search
A static evaluation function is applied to the candidate next positions, and we simply choose the best one. After doing so, the chosen value is backed up to the starting position to represent our evaluation of it.
The starting position is exactly as good for us as the position generated by the best move
we can make next. Here we assume that the static evaluation function returns large
values to indicate good situations for us, so our goal is to maximize the value of the
static evaluation function of the next board position. An example of this operation is
shown in figure 2.1. It assumes a static evaluation function that returns values
ranging from -10 to 10, with 10 indicating a win for us, -10 a win for the opponent, and 0
an even match. Since our goal is to maximize the value of the heuristic function, we
choose to move to B. Backing B’s value up to A, we can conclude that A’s value is 8,
since we know we can move to a position with a value of 8.
But now we must take into account that the opponent gets to choose which successor
moves to make and thus which terminal value should be backed up to the next level.
Suppose we made move B. Then the opponent must choose among moves E,F, and G.
The opponent’s goal is to minimize the value of the evaluation function, so he or she
can be expected to choose move F. This means that if we make move B, the actual
position in which we will end up one move later is very bad for us. This is true even
though a possible configuration is that represented by node E, which is very good for us.
But since at this level we are not the ones to move, we will not get to choose it. Figure
2.3 shows the result of propagating the new values up the tree.
At the level representing the opponent’s choice, the minimum value was
chosen and backed up. At the level representing our choice, the maximum
value was chosen.
Once the values from the second ply are backed up, it becomes clear that the correct
move for us to make at the first level, given the information we have available, is C,
since there is nothing the opponent can do from there to produce a value worse than -2.
This process can be repeated for as many ply as time allows, and the more accurate
evaluations that are produced can be used to choose the correct move at the top level.
Types of games:
Games can be classified along two dimensions: whether the environment is deterministic or involves chance, and whether the state is fully or only partially observable.
- Deterministic, perfect information: chess, checkers, Go, Othello.
- Chance, perfect information: backgammon, Monopoly.
- Deterministic, imperfect information: battleships.
- Chance, imperfect information: bridge, poker, Scrabble.
As with any recursive program, a critical issue in the design of the MINIMAX procedure
is when to stop the recursion and simply call the static evaluation function. There are a
variety of factors that may influence this decision. They include:
Has one side won?
How many ply have we already explored?
How promising is this path?
How much time is left?
How stable is the configuration?
We use a function DEEP-ENOUGH, which is assumed to evaluate all of these factors and to return TRUE if the search should be stopped at the current level and FALSE otherwise. Our simple implementation of DEEP-ENOUGH will take two parameters, Position and Depth. It will ignore its Position parameter and simply return TRUE if its Depth parameter exceeds a constant cutoff value.
One problem that arises in defining MINIMAX as a recursive procedure is that it needs
to return not one but two results:
The backed-up value of the path it chooses.
The path itself. We return the entire path even though probably only the first
element, representing the best move from the current position, is actually
needed.
We assume that MINIMAX returns a structure containing both results and that we have
two functions, VALUE and PATH, that extract the separate components.
Since we define the MINIMAX procedure as a recursive function, we must also specify how it is to be called initially. It takes three parameters: a board position, the current depth of the search, and the player to move. So the initial call to compute the best move from the position CURRENT should be
MINIMAX(CURRENT, 0, PLAYER-ONE) if PLAYER-ONE is to move, or
MINIMAX(CURRENT, 0, PLAYER-TWO) if PLAYER-TWO is to move.
1. If DEEP-ENOUGH(Position, Depth), then return the structure VALUE = STATIC(Position, Player); PATH = nil, since there is no path from this position.
2. Otherwise, generate one more ply of the tree by setting SUCCESSORS to the list of legal moves from Position for Player.
3. If SUCCESSORS is empty, then there are no moves to be made, so return the same structure that would have been returned if DEEP-ENOUGH had returned TRUE.
4. If SUCCESSORS is not empty, then examine each element in turn and keep track of the best one. This is done as follows.
Initialize BEST-SCORE to the minimum value that STATIC can return. It will be
updated to reflect the best score that can be achieved by an element of SUCCESSORS.
For each element SUCC of SUCCESSORS, do the following:
(a) Set RESULT-SUCC to MINIMAX(SUCC, Depth + 1, OPPOSITE(Player)).
This recursive call to MINIMAX will actually carry out the exploration of SUCC.
(b) Set NEW-VALUE to -VALUE(RESULT-SUCC). This will cause it to reflect the merits of the position from the opposite perspective from that of the next lower level.
(c) If NEW-VALUE > BEST-SCORE, then we have found a successor that is better than any examined so far. Record this as follows:
i. Set BEST-SCORE to NEW-VALUE.
ii. The best known path is now from CURRENT to SUCC and then on to the appropriate path down from SUCC as determined by the recursive call to MINIMAX. So set BEST-PATH to the result of attaching SUCC to the front of PATH(RESULT-SUCC).
5. Now that all the successors have been examined, we know the value of Position as well as which path to take from it. So return the structure
VALUE = BEST-SCORE
PATH = BEST-PATH
When the initial call to MINIMAX returns, the best move from CURRENT is the first
element of PATH.
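A sketch of this procedure in Python, in the same negamax style (the game object bundling DEEP-ENOUGH, STATIC, the successor generator, and OPPOSITE is an assumed placeholder for the game-specific parts):

```python
def minimax(position, depth, player, game):
    """Sketch of the MINIMAX procedure described above, in negamax form.
    'game' is assumed to provide deep_enough(pos, depth), static(pos, player),
    successors(pos, player) and opposite(player)."""
    if game.deep_enough(position, depth) or not game.successors(position, player):
        return {"VALUE": game.static(position, player), "PATH": []}
    best_score = float("-inf")            # minimum value STATIC could return
    best_path = []
    for succ in game.successors(position, player):
        result_succ = minimax(succ, depth + 1, game.opposite(player), game)
        new_value = -result_succ["VALUE"]  # opponent's best is our worst
        if new_value > best_score:
            best_score = new_value
            best_path = [succ] + result_succ["PATH"]
    return {"VALUE": best_score, "PATH": best_path}

# Initial call, as in the text:  minimax(CURRENT, 0, PLAYER_ONE, game)
```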
Optimal decisions in multiplayer games:
Many popular games allow more than two players.
Let us examine how to extend the minimax idea to multiplayer games.
First, we need to replace the single value for each node with a vector of values.
For terminal states, this vector gives the utility of the state from each player's viewpoint. (
In two-player, zero-sum games, the two-element vector can be reduced to a single value
because the values are always opposite.)
Figure: The first three plies of a game tree with three players (A, B, C). Each node is labeled with values from the viewpoint of each player. The best move is marked at the root.
The backed-up value of a node n is the utility vector of the successor state with the highest value for the player choosing at n. Anyone who plays multiplayer games, such as Diplomacy, quickly becomes aware that much more is going on than in two-player games. Multiplayer games usually involve alliances, whether formal or informal, among the players. Are alliances a natural consequence of optimal strategies for each player in a multiplayer game? They can be: for example, if two players are in weak positions relative to a third, stronger player, then it is optimal for each of them to attack the stronger player rather than each other. In this way, collaboration emerges from purely selfish behavior.
If the game is not zero-sum, then collaboration can also occur with just two players.
ALPHA-BETA PRUNING
After examining node F, we know that the opponent is guaranteed a score of -5 or less at C (since the opponent is the minimizing player). But we also know that we are guaranteed a score of 3 or greater at node A, which we can achieve if we move to B.
Any other move that produces a score of less than 3 is worse than the move to B, and we
can ignore it. After examining only F we are sure that a move to C is worse (it will be
<=-5) regardless of the score of node G. Thus we need not bother to explore node G at
all. Of course, cutting out one node may not appear to justify the expense of keeping
track of the limits and checking them, but if we were exploring this tree to 6 ply, then we
would have eliminated not a single node but an entire tree 3 ply deep.
To see how the two thresholds, alpha and beta, can both be used, consider the example shown in figure 2.5. In searching this tree, the entire subtree headed by B is searched, and we discover that at A we can expect a score of at least 3.
When this alpha value is passed down to F, it will enable us to skip the exploration of
L. After K is examined, we see that I is guaranteed a maximum score of 0, which means
that F is guaranteed a minimum of 0. But this is less than alpha's value of 3, so no more
branches of I need to be considered. The maximizing player already knows not to choose
to move to C and then to I since, if that move is made, the resulting score will be no
better than 0 and a score of 3 can be achieved by moving to B instead.
Now let's see how the value of beta can be used. After cutting off further exploration of I,
J is examined, yielding a value of 5, which is assigned as the value of F (since it is the
maximum of 5 and 0). This value becomes the value of beta at node C. It indicates that C is guaranteed to get a score of 5 or less. Now we must expand G. First M is examined and it has a
value of 7, which is passed back to G as its tentative value. But now 7 is compared to
beta (5). It is greater, and the player whose turn it is at node C is trying to minimize. So
this player will not choose G, which would lead to a score of at least 7, since there is an
alternative move to F, which will lead to a score of 5. Thus it is not necessary to explore
any of the other branches of G.
In other words, the value of the root and hence the minimax decision are independent of the values of the pruned leaves x and y.
- Alpha-beta pruning can be applied to trees of any depth, and it is often possible
to prune entire subtrees rather than just leaves.
- The general principle is this: consider a node n somewhere in the tree such that
Player has a choice of moving to that node.
- If Player has a better choice m either at the parent node of n or at any choice
point further up, then n will never be reached in actual play. So once we have
found out enough about n to reach this conclusion, we can prune it.
- Alpha-beta pruning gets its name from the following two parameters that
describe bounds on the backed-up values that appear anywhere along the path:
α = the value of the best (i.e., highest-value) choice we have found so far at any choice point along the path for MAX.
β = the value of the best (i.e., lowest-value) choice we have found so far at any choice point along the path for MIN.
- Alpha-beta search updates the values of α and β as it goes along and prunes the
remaining branches at a node as soon as the value of the current node is known
to be worse than the current α or β value for MAX or MIN respectively.
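A sketch of minimax with alpha-beta pruning in Python (the game object with is_terminal, evaluate, and successors is an assumed placeholder, not a specific library API):

```python
def alphabeta(state, depth, alpha, beta, maximizing, game):
    """Minimax with alpha-beta pruning. 'game' is assumed to provide
    is_terminal(state), evaluate(state) and successors(state)."""
    if depth == 0 or game.is_terminal(state):
        return game.evaluate(state)
    if maximizing:
        value = float("-inf")
        for child in game.successors(state):
            value = max(value, alphabeta(child, depth - 1, alpha, beta, False, game))
            alpha = max(alpha, value)
            if alpha >= beta:
                break          # beta cutoff: MIN already has a better option above
        return value
    value = float("inf")
    for child in game.successors(state):
        value = min(value, alphabeta(child, depth - 1, alpha, beta, True, game))
        beta = min(beta, value)
        if alpha >= beta:
            break              # alpha cutoff: MAX already has a better option above
    return value

# Initial call: alphabeta(root, max_depth, float("-inf"), float("inf"), True, game)
```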
Monte Carlo Tree Search (MCTS) is a heuristic search algorithm used in decision-
making processes, particularly in game-playing artificial intelligence. It is a combination
of tree search and random sampling methods, making it useful for games with complex
decision trees and limited computation resources.
MCTS works by simulating random game moves starting from the current game state,
and then selecting the best move based on the outcomes of those simulations. It builds
a tree of game states and their associated action values, and each iteration of the
algorithm expands the tree by selecting and simulating a promising unexplored node,
using a selection policy such as Upper Confidence Bound (UCB), and updating the
statistics of the nodes on the path.
One example of the application of MCTS is in the game of Go, which has an extremely
large branching factor, making it difficult to explore the entire decision tree. In 2016, a
computer program called AlphaGo used MCTS to defeat the world champion of Go in
a five-game match.
Another example is in the game of poker, where MCTS can be used to determine the
optimal action to take given the current state of the game and the probability distribution
of the opponents' hands. The program Pluribus, developed by researchers at Carnegie
Mellon University, used MCTS to defeat professional poker players in a six-player no-
limit Texas hold'em game.
The selection phase is the most important phase in MCTS. The goal of the selection
phase is to select the node that is most likely to lead to a good outcome. There are a
number of different heuristics that can be used for selection, such as:
Upper confidence bound (UCB): UCB is a heuristic that selects the node with the
highest upper confidence bound. The upper confidence bound is an estimate of the
expected value of the node, plus a bonus that is proportional to the uncertainty of the
estimate.
Prior probability: Prior probability is a heuristic that selects the node with the highest
prior probability. The prior probability is an estimate of the probability that the node
will lead to a good outcome.
The expansion phase is simply the process of creating new child nodes for the selected node. The number of child nodes that can be created depends on the game. For example, in 19×19 Go a node can have up to 361 child nodes, one for each empty intersection.
The simulation phase is the process of running a game from the expanded node to a
terminal state. The simulation is run using a random policy, which means that each move
is chosen randomly.
The backpropagation phase is the process of updating the values of the nodes in the
tree based on the results of the simulation. The values of the nodes are updated
using a formula that takes into account the outcome of the simulation and the prior
probability of the node.
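A compact sketch of all four phases in Python (the state interface legal_moves(), play(), is_terminal(), result() is an assumed placeholder; win/loss perspective handling is simplified for brevity):

```python
import math, random

class Node:
    """Tree node holding MCTS statistics; 'state' is assumed to offer
    legal_moves(), play(move) -> new state, is_terminal() and result()."""
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children, self.untried = [], list(state.legal_moves())
        self.visits, self.wins = 0, 0.0

def ucb1(node, c=1.4):
    """Upper confidence bound used in the selection phase."""
    if node.visits == 0:
        return float("inf")
    return (node.wins / node.visits
            + c * math.sqrt(math.log(node.parent.visits) / node.visits))

def mcts(root, iterations=1000):
    for _ in range(iterations):
        node = root
        # 1. Selection: follow UCB1 down to a node that still has untried moves.
        while not node.untried and node.children:
            node = max(node.children, key=ucb1)
        # 2. Expansion: create one child for an untried move.
        if node.untried:
            move = node.untried.pop()
            child = Node(node.state.play(move), parent=node)
            node.children.append(child)
            node = child
        # 3. Simulation: play out randomly to a terminal state.
        state = node.state
        while not state.is_terminal():
            state = state.play(random.choice(list(state.legal_moves())))
        outcome = state.result()            # e.g. 1 for a win, 0 for a loss
        # 4. Backpropagation: update statistics on the path back to the root.
        while node is not None:
            node.visits += 1
            node.wins += outcome
            node = node.parent
    # Play the most-visited child of the root.
    return max(root.children, key=lambda n: n.visits)
```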
MCTS is a powerful algorithm that has been used to achieve state-of-the-art results in a
variety of games. However, it can be computationally expensive. There are a number
of techniques that can be used to reduce the computational cost of MCTS, such as:
Pruning: Pruning is a technique that removes nodes from the tree that are unlikely to be
relevant.
Robustness: MCTS can be made more robust to noise by using techniques such as
averaging the results of multiple simulations.
Parallelization: MCTS can be parallelized by running multiple simulations in parallel.
MCTS is a powerful tool that can be used to solve a wide variety of problems. It is particularly well-suited for problems that can be modeled as games.
STOCHASTIC GAMES
In real life, there are many unpredictable external events that put us into unforeseen
situations.
Many games mirror this unpredictability by including a random element, such as
the throwing of dice.
In this way, they take us a step nearer reality, and it is worthwhile to see how
this affects the decision- making process.
Backgammon is a typical game that combines luck and skill. Dice are rolled at
the beginning of a player’s turn to determine the legal moves. In the
backgammon position of figure, for example, white has rolled a 6-5, and has
four possible moves.
Although White knows what his or her own legal moves are, White does not
know what Black is going to roll and thus does not know what Black’s legal
moves will be.
That means White cannot construct a standard game tree of the sort we saw in chess
and tic-tac-toe.
A game tree in backgammon must include chance nodes in addition to MAX and MIN nodes. Chance nodes are shown as circles in the figure.
The branches leading from each chance node denote the possible dice rolls, and
each is labeled with the roll and the chance that it will occur.
Figure: A typical backgammon position. The goal of the game is to move all one’s pieces off the board.
There are 36 ways to roll two dice, each equally likely; but because a 6-5 is the same as a 5-6, there are only 21 distinct rolls. The six doubles (1-1 through 6-6) each have a 1/36 chance of coming up; the other 15 distinct rolls each have a 1/18 chance.
The next step is to understand how to make correct decisions. Obviously, we still want to pick the move that leads to the best position. However, the resulting positions do not have definite minimax values. Instead, we can only calculate the expected value, where the expectation is taken over all the possible dice rolls that could occur. This leads us to generalize the minimax value for deterministic games to an expectiminimax value for games with chance nodes.
- Terminal nodes and MAX and MIN nodes work exactly the same way as before;
chance nodes are evaluated by taking the weighted average of the values
resulting from all possible dice rolls, that is,
EXPECTIMINIMAX(n) =
UTILITY(n), if n is a terminal node
max s∊Successors(n) EXPECTIMINIMAX(s), if n is a MAX node
min s∊Successors(n) EXPECTIMINIMAX(s), if n is a MIN node
∑ s∊Successors(n) P(s) · EXPECTIMINIMAX(s), if n is a chance node
Where the successor function for a chance node n simply augments the
state of n with each possible dice roll to produce each successor s and
P(s) is the probability that that dice roll occurs. These equations can be
backed up recursively all the way to the root of the tree, just as in
minimax.
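A direct transcription of this definition as a Python sketch (the game object providing node types, successors, probabilities, and utilities is an assumed placeholder):

```python
def expectiminimax(node, game):
    """Direct transcription of the definition above. 'game' is assumed to
    provide is_terminal, utility, node_type ('max' | 'min' | 'chance'),
    successors, and probability(successor) for chance nodes."""
    if game.is_terminal(node):
        return game.utility(node)
    kind = game.node_type(node)
    succs = game.successors(node)
    if kind == "chance":
        # Weighted average over all possible dice rolls.
        return sum(game.probability(s) * expectiminimax(s, game) for s in succs)
    values = [expectiminimax(s, game) for s in succs]
    return max(values) if kind == "max" else min(values)
```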
- One might think that evaluation functions for games such as backgammon
should be just like evaluation functions for chess – they just need to give
higher scores to better positions.
- But in fact, the presence of chance nodes means that one has to be more careful
about what the evaluation values mean.
Figure: An order preserving transformation on leaf values changes the best move.
- Figure shows what happens: with an evaluation function that assigns values
[1,2,3,4] to the leaves, move A1 is best; with values [1,20,30,400], move A2 is
best. Hence, the program behaves totally differently if we make a change in the
scale of some evaluation values.
- It turns out that, to avoid this sensitivity, the evaluation function must be a
positive linear transformation of the probability of winning from a position
- If the program knew in advance all the dice rolls that would occur for the rest
of the game, solving a game with dice would be just like solving a game
without dice, which minimax does in O(b^m) time. Because expectiminimax is also considering all the possible dice-roll sequences, it will take O(b^m n^m), where n is the number of distinct rolls.
- Even if the search depth is limited to some small depth d , the extra cost
compared with that of minimax makes it unrealistic to consider looking ahead
very far in most games of chance.
- In backgammon n is 21 and b is usually around 20, but in some situations can
be as high as 4000 for dice rolls that are doubles. Three plies is probably all we
could manage.
- Another way to think about the problem is this: the advantage of alpha-beta is
that it ignores future developments that just are not going to happen, given best
play. Thus it concentrates on likely occurrences.
- In games with dice, there are no likely sequences of moves, because for those
moves to take place, the dice would first have to come out the right way to
make them legal.
- This is a general problem whenever uncertainty enters the picture the
possibilities are multiplied enormously, and forming detailed plans of action
becomes pointless, because the world probably will not play along.
- No doubt it will have occurred to the reader that perhaps something like alpha-
beta pruning could be applied to game trees with chance nodes. It turns out that
it can.
- The analysis for MIN and MAX nodes is unchanged, but we can also prune
chance nodes, using a bit of ingenuity.
- Consider the chance node C in figure and what happens to its value as we
examine and evaluate its children.
- Is it possible to find an upper bound on the value of C before we have looked at all
its children?
- At first sight, it might seem impossible, because the value of C is the average
of its children’s values. Until we have looked at all the dice rolls, this average
could be anything, because the unexamined children might have any value at
all.
- But if we put bounds on the possible values of the utility function, then we can
arrive at bounds for the average. For example, if we say that all utility values
are between +3 and -3, then the value of leaf nodes is bounded, and in turn we
can place an upper bound on the value of the chance node without looking at
all its children.
Figure: Part of a guaranteed checkmate in the KRK endgame, shown on a reduced board. In the initial belief state, Black’s king is in one of three possible locations. By a combination of probing moves, the strategy narrows this down to one. Completion of the checkmate is left as an exercise.
Card Game.
Card games provide many examples of stochastic partial observability, where the
missing information is generated randomly. For example, in many games, cards are dealt
randomly at the beginning of the game, with each player receiving a hand that is not visible
to the other players. Such games include bridge, whist, hearts, and some forms of poker.
At first sight, it might seem that these card games are just like dice games: the
cards are dealt randomly and determine the moves available to each player, but all the
“dice” are rolled at the beginning!
Even though this analogy turns out to be incorrect, it suggests an effective
algorithm: consider all possible deals of the invisible cards; solve each one as if it were a
fully observable game; and then choose the move that has the best outcome averaged over
all the deals. Suppose that each deal s occurs with probability P(s); then the move we want is
argmax over moves a of ∑s P(s) · MINIMAX(RESULT(s, a)).
The value of an action in the current belief state can only be calculated given an optimal randomized strategy; in turn, computing that strategy seems to require knowing the probabilities of the various states the board might be in. An equilibrium specifies an optimal randomized strategy for each player.