Unit III AI
Game Theory, Optimal Decisions in Games, Heuristic Alpha–Beta Tree Search, Monte Carlo Tree
Search, Stochastic Games, Partially Observable Games, Limitations of Game Search Algorithms,
Constraint Satisfaction Problems (CSP), Constraint Propagation: Inference in CSPs, Backtracking
Search for CSPs.
Unit III
Adversarial Search and Games
Adversarial Search
Adversarial search is search in competitive environments, where the agents' goals are in conflict.
Games are the classic examples of adversarial search:
States are easy to represent (unlike many other real-world problems).
Agents are restricted to a small number of actions.
Outcomes of agent actions are defined by precise rules.
Games are usually hard to solve.
Game Theory
Multiagent environments are those in which each agent needs to consider the actions of other agents and how they affect its own welfare.
The unpredictability of these other agents can introduce contingencies into the agent's problem-solving process.
In environments where the agents' goals are in conflict, we get adversarial search problems, often known as games.
Mathematical game theory, a branch of economics, views any multiagent environment as a game, provided that the impact of each agent on the others is "significant," regardless of whether the agents are cooperative or competitive.
Types of Games
1. Perfect Information  2. Imperfect Information  3. Deterministic
4. Non-Deterministic  5. Zero-sum  6. Constant sum
1. Perfect information
The agent can see the complete board.
The agent has all information about the game.
Each agent can also see the other agents' moves.
Example: Chess, Checkers, etc.
2. Imperfect information
The agent does not have all information about the game.
The agent is not fully aware of what is going on.
Example: Battleship, Bridge, Poker, etc.
3. Deterministic
The game follows a strict pattern and set of rules.
There is no randomness associated with the game.
Example: Chess, Checkers, Tic-Tac-Toe, etc.
4. Non-deterministic
The game has various unpredictable events and a factor of luck or chance.
The luck factor is introduced by dice or cards.
The response to each action is not fixed.
Such games are also called stochastic games.
Example: Poker, Backgammon, etc.
Figure shows part of the game tree for tic-tac-toe (noughts and crosses).
From the initial state, MAX has nine possible moves.
Play alternates between MAX's placing an X and MIN's placing an O until we reach leaf nodes corresponding to terminal states such that one player has three in a row or all the squares are filled.
The number on each leaf node indicates the utility value of the terminal state from the point of view of MAX; high values are assumed to be good for MAX and bad for MIN.
For tic-tac-toe the game tree is relatively small—fewer than 9! = 362,880 terminal nodes.
Minimax value
The optimal strategy can be determined from the minimax value of each node, written MINIMAX(s), which defines the numeric value at every node, terminal and non-terminal alike:

MINIMAX(s) =
  UTILITY(s)                           if TERMINAL-TEST(s)
  max_a MINIMAX(RESULT(s, a))          if PLAYER(s) = MAX
  min_a MINIMAX(RESULT(s, a))          if PLAYER(s) = MIN
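As a concrete illustration, here is a minimal minimax sketch in Python over a toy two-ply tree; the leaf values are hypothetical, chosen only to exercise the recursion:

```python
# Minimax over a toy game tree given as nested lists: an inner list is a
# node whose children alternate between MAX and MIN levels; a number is a
# terminal utility from MAX's point of view.
def minimax(node, maximizing=True):
    if isinstance(node, (int, float)):  # terminal state: return its utility
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

# MAX chooses among three MIN nodes; the minimax value here is 3.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(minimax(tree))  # -> 3
```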
Heuristic Alpha–Beta Tree Search
The problem with minimax search is that the number of game states it has to examine is exponential in the depth of the tree.
Unfortunately, we can't eliminate the exponent, but it turns out we can effectively cut it in half.
The trick is that it is possible to compute the correct minimax decision without looking at every node in the game tree.
That is, we can borrow the idea of pruning to eliminate large parts of the tree from consideration.
The particular technique is called alpha–beta pruning.
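A sketch of the same toy search with alpha–beta pruning added: alpha tracks the best value MAX can already guarantee, beta the best value MIN can guarantee, and a branch is abandoned as soon as alpha ≥ beta:

```python
# Alpha-beta pruning over the same nested-list toy tree.
def alphabeta(node, alpha=float("-inf"), beta=float("inf"), maximizing=True):
    if isinstance(node, (int, float)):
        return node
    if maximizing:
        value = float("-inf")
        for child in node:
            value = max(value, alphabeta(child, alpha, beta, False))
            alpha = max(alpha, value)
            if alpha >= beta:        # remaining children cannot change the result
                break
        return value
    else:
        value = float("inf")
        for child in node:
            value = min(value, alphabeta(child, alpha, beta, True))
            beta = min(beta, value)
            if alpha >= beta:
                break
        return value

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(alphabeta(tree))  # -> 3, while examining fewer leaves than plain minimax
```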
Monte Carlo Tree Search (MCTS)
MCTS builds a search tree by repeating four steps: selection, expansion, simulation, and backpropagation.
1. Selection:
Starting at the root, the tree is descended by repeatedly choosing the child with the highest UCB (upper confidence bound) score:

Si = xi + C √(ln N / ni)

where:
Si = value of node i
xi = empirical mean (average reward) of node i
C = a constant (the exploration parameter)
N = total number of simulations of the parent node
ni = number of simulations of node i

When traversing the tree during the selection process, the child node that returns the greatest value from the above equation is the one that gets selected. During traversal, once a child node is found which is also a leaf node, MCTS jumps to the expansion step.
2. Expansion:
In this step, a new child node is added to the tree, attached to the node that was optimally reached during the selection process.
3. Simulation:
In this process, a simulation is performed by choosing moves or strategies until a result or predefined
state is achieved.
4. Backpropagation:
After determining the value of the newly added node, the remaining tree must be updated. The backpropagation process walks from the new node back to the root node. Along the way, the number of simulations stored in each node is incremented, and if the new node's simulation resulted in a win, the number of wins is also incremented.
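Putting the four steps together, here is a compact MCTS sketch in Python. It plays the simple game of Nim (take 1-3 stones; whoever takes the last stone wins), which is an assumed stand-in game, not one discussed in these notes; the exploration constant C = 1.4 is a conventional choice:

```python
import math, random

def legal_moves(stones):
    return [m for m in (1, 2, 3) if m <= stones]

class Node:
    def __init__(self, stones, player, parent=None):
        self.stones, self.player = stones, player   # player = side to move
        self.parent, self.children = parent, []
        self.wins, self.visits = 0, 0
        self.untried = legal_moves(stones)          # moves not yet expanded

    def uct_child(self, c=1.4):
        # Selection rule: maximize x_i + C * sqrt(ln N / n_i)
        return max(self.children,
                   key=lambda ch: ch.wins / ch.visits
                   + c * math.sqrt(math.log(self.visits) / ch.visits))

def mcts(root_stones, iterations=3000):
    root = Node(root_stones, player=0)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend through fully expanded nodes via UCT.
        while not node.untried and node.children:
            node = node.uct_child()
        # 2. Expansion: add one child for an untried move.
        if node.untried:
            move = node.untried.pop()
            child = Node(node.stones - move, 1 - node.player, parent=node)
            node.children.append(child)
            node = child
        # 3. Simulation: random playout from the new node to a terminal state.
        stones, player = node.stones, node.player
        while stones > 0:
            stones -= random.choice(legal_moves(stones))
            player = 1 - player
        winner = 1 - player  # whoever moved last took the final stone
        # 4. Backpropagation: update visit/win counts back up to the root.
        while node is not None:
            node.visits += 1
            if winner != node.player:  # win for the player who moved INTO node
                node.wins += 1
            node = node.parent
    best = max(root.children, key=lambda ch: ch.visits)
    return root.stones - best.stones   # number of stones to take

print(mcts(10))  # with 10 stones, taking 2 (leaving a multiple of 4) is optimal
```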
Stochastic Games
Many games mirror the unpredictability by including a random element, such as the
throwing of dice.
We call these stochastic games.
Backgammon is a typical game that combines luck and skill. Dice are rolled at the beginning of a player's turn to determine the legal moves.
In the backgammon position of Figure 5.6,
for example, White has rolled a 6–5 and has four possible moves.
The branches leading from each chance node denote the possible dice rolls; each branch
is labeled with the roll and its probability.
There are 36 ways to roll two dice, each equally likely; but because a 6–5 is the same as
a 5–6, there are only 21 distinct rolls.
The six doubles (1–1 through 6–6) each have a probability of 1/36, so we say P(1–1) =
1/36. The other 15 distinct rolls each have a 1/18 probability.
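These counts are easy to verify by enumerating all 36 ordered rolls, for example in Python:

```python
from fractions import Fraction
from itertools import product
from collections import Counter

# Group the 36 equally likely ordered rolls into distinct unordered rolls.
counts = Counter(tuple(sorted(roll)) for roll in product(range(1, 7), repeat=2))
print(len(counts))                   # 21 distinct rolls
print(Fraction(counts[(1, 1)], 36))  # a double such as 1-1: 1/36
print(Fraction(counts[(5, 6)], 36))  # a non-double such as 6-5: 1/18
```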
The next step is to understand how to make correct decisions. Obviously, we still want
to pick the move that leads to the best position.
However, positions do not have definite minimax values. Instead, we can only calculate
the expected value of a position: the average over all possible outcomes of the chance nodes.
This leads us to generalize the minimax value for deterministic games to an expectiminimax value for games with chance nodes. Terminal nodes and MAX and MIN nodes (for which the dice roll is known) work exactly the same way as before.
For chance nodes we compute the expected value, which is the sum of the value over
all outcomes, weighted by the probability of each chance action:
EXPECTIMINIMAX(s) =
  UTILITY(s)                                 if TERMINAL-TEST(s)
  max_a EXPECTIMINIMAX(RESULT(s, a))         if PLAYER(s) = MAX
  min_a EXPECTIMINIMAX(RESULT(s, a))         if PLAYER(s) = MIN
  Σ_r P(r) EXPECTIMINIMAX(RESULT(s, r))      if PLAYER(s) = CHANCE
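A minimal expectiminimax sketch over a hypothetical toy tree, where chance nodes carry explicit probabilities:

```python
# A node is either a number (terminal utility), ("max", children),
# ("min", children), or ("chance", [(probability, child), ...]).
def expectiminimax(node):
    if isinstance(node, (int, float)):
        return node
    kind, children = node
    if kind == "max":
        return max(expectiminimax(c) for c in children)
    if kind == "min":
        return min(expectiminimax(c) for c in children)
    # Chance node: probability-weighted average over the outcomes.
    return sum(p * expectiminimax(c) for p, c in children)

# MAX chooses between a safe move worth 2 and a coin flip worth 0 or min(10, 4).
tree = ("max", [2, ("chance", [(0.5, 0), (0.5, ("min", [10, 4]))])])
print(expectiminimax(tree))  # -> max(2, 0.5*0 + 0.5*4) = 2
```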
Partially Observable Games
Chess has often been described as war in miniature, but it lacks at least one major
characteristic of real wars, namely, partial observability.
In the "fog of war," the existence and disposition of enemy units is often unknown until revealed by direct contact.
As a result, warfare includes the use of scouts and spies to gather information and the
use of concealment and bluff to confuse the enemy.
Kriegspiel, a partially observable variant of chess, makes this concrete: each player sees a board containing only his own pieces, and a referee, who can see everything, adjudicates each attempted move. If Black is checkmated or stalemated, the referee says so; otherwise, it is Black's turn to move.
Initially, White's belief state is a singleton because Black's pieces haven't moved yet.
After White makes a move and Black responds, White's belief state contains 20 positions, because Black has 20 replies to any White move.
Keeping track of the belief state as the game progresses is exactly the problem of state estimation.
We can map Kriegspiel state estimation directly onto the partially observable,
nondeterministic framework, if we consider the opponent as the source of nondeterminism;
that is, the RESULTS of White's move are composed from the (predictable) outcome of White's own move and the unpredictable outcome given by Black's reply.
Given a current belief state, White may ask, “Can I win the game?”
For a partially observable game, the notion of a strategy is altered; instead of
specifying a move to make for each possible move the opponent might make, we need a move
for every possible percept sequence that might be received.
For Kriegspiel, a winning strategy, or guaranteed checkmate, is one that, for each
possible percept sequence, leads to an actual checkmate for every possible board state in the
current belief state, regardless of how the opponent moves.
With this definition, the opponent‘s belief state is irrelevant—the strategy has to work
even if the opponent can see all the pieces.
This greatly simplifies the computation. Figure 5.13 shows part of a guaranteed
checkmate for the KRK (king and rook against king) endgame.
In this case, Black has just one piece (the king), so a belief state for White can be
shown in a single board by marking each possible position of the Black king.
The general AND-OR search algorithm can be applied to the belief-state space to find
guaranteed checkmates.
An incremental belief-state algorithm often finds midgame checkmates up to depth 9—probably well beyond the abilities of human players.
In addition to guaranteed checkmates, Kriegspiel admits an entirely new concept that
makes no sense in fully observable games: probabilistic checkmate.
Such checkmates are still required to work in every board state in the belief state; they
are probabilistic with respect to randomization of the winning player‘s moves.
To get the basic idea, consider the problem of finding a lone black king using just the
white king.
Simply by moving randomly, the white king will eventually bump into the black king even if the latter tries to avoid this fate, since Black cannot keep guessing the right evasive moves indefinitely. In the terminology of probability theory, detection occurs with probability 1.
The KBNK endgame—king, bishop and knight against king—is won in this sense;
White presents Black with an infinite random sequence of choices, for one of which Black
will guess incorrectly and reveal his position, leading to checkmate.
The KBBK endgame, on the other hand, is won with probability 1 − ε.
White can force a win only by leaving one of his bishops unprotected for one move.
If Black happens to be in the right place and captures the bishop (a move that would lose if the bishops were protected), the game is drawn.
White can choose to make the risky move at some randomly chosen point in the middle of a very long sequence, thus reducing ε to an arbitrarily small constant, but cannot reduce ε to zero.
It is quite rare that a guaranteed or probabilistic checkmate can be found within any
reasonable depth, except in the endgame.
Sometimes a checkmate strategy works for some of the board states in the current belief
state but not others.
Trying such a strategy may succeed, leading to an accidental checkmate—accidental
in the sense that White could not know that it would be checkmate—if Black‘s pieces happen
to be in the right places.
(Most checkmates in games between humans are of this accidental nature.)
This idea leads naturally to the question of how likely it is that a given strategy will win,
which leads in turn to the question of how likely it is that each board state in the current belief
state is the true board state.
One‘s first inclination might be to propose that all board states in the current belief state
are equally likely—but this can‘t be right. Consider, for example, White‘s belief state after
Black‘s first move of the game.
By definition (assuming that Black plays optimally), Black must have played an
optimal move, so all board states resulting from suboptimal moves ought to be assigned zero
probability.
This argument is not quite right either, because each player’s goal is not just to move
pieces to the right squares but also to minimize the information that the opponent has about
their location.
Playing any predictable "optimal" strategy provides the opponent with information.
Hence, optimal play in partially observable games requires a willingness to play somewhat
randomly. (This is why restaurant hygiene inspectors do random inspection visits.)
This means occasionally selecting moves that may seem "intrinsically" weak—but they gain strength from their very unpredictability, because the opponent is unlikely to have prepared any defense against them.
From these considerations, it seems that the probabilities associated with the board
states in the current belief state can only be calculated given an optimal randomized
strategy; in turn, computing that strategy seems to require knowing the probabilities of the
various states the board might be in.
This conundrum can be resolved by adopting the game-theoretic notion of an equilibrium solution.
An equilibrium specifies an optimal randomized strategy for each player.
Computing equilibria is prohibitively expensive, however, even for small games, and is
out of the question for Kriegspiel.
At present, the design of effective algorithms for general Kriegspiel play is an open research topic. Most systems perform bounded-depth lookahead in their own belief-state space, ignoring the opponent's belief state. Evaluation functions resemble those for the observable game but include a component for the size of the belief state—smaller is better!
Card games
Card games provide many examples of stochastic partial observability, where the missing
information is generated randomly.
For example, in many games, cards are dealt randomly at the beginning of the game,
with each player receiving a hand that is not visible to the other
players.
Such games include bridge, whist, hearts, and some forms of poker.
At first sight, it might seem that these card games are just like dice games: the cards are dealt randomly and determine the moves available to each player, but all the "dice" are rolled at the beginning!
Even though this analogy turns out to be incorrect, it suggests an effective algorithm: consider all possible deals of the invisible cards; solve each one as if it were a fully observable game; and then choose the move that has the best outcome averaged over all the deals.
Suppose that each deal s occurs with probability P(s); then the move we want is

argmax_a Σ_s P(s) MINIMAX(RESULT(s, a)).    (5.1)

Solving even one deal is quite difficult, so solving ten million is out of the question.
Instead, we resort to a Monte Carlo approximation: instead of adding up all the deals, we take a random sample of N deals, where the probability of deal s appearing in the sample is proportional to P(s):

argmax_a (1/N) Σ_{i=1}^{N} MINIMAX(RESULT(s_i, a)).    (5.2)
(Notice that P(s) does not appear explicitly in the summation, because the samples are
already drawn according to P(s).)
As N grows large, the sum over the random sample tends to the exact value, but even
for fairly small N—say, 100 to 1,000—the method gives a good
approximation.
It can also be applied to deterministic games such as Kriegspiel, given some reasonable
estimate of P(s).
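As a sketch of this sampling idea in Python: `sample_deal` and `solve` below are hypothetical stand-ins for a game-specific deal generator (drawing deals with probability P(s)) and a fully observable solver such as minimax:

```python
import random

# Sketch of Equation 5.2: sample N deals, solve each as a fully observable
# game, and pick the move with the best average value.
def monte_carlo_move(moves, sample_deal, solve, n=100):
    totals = {m: 0.0 for m in moves}
    for _ in range(n):
        deal = sample_deal()          # drawn with probability P(s)
        for m in moves:
            totals[m] += solve(deal, m)
    return max(moves, key=lambda m: totals[m] / n)

# Toy usage: move "a" is worth the hidden value, move "b" a flat 0.4.
best = monte_carlo_move(["a", "b"], lambda: random.random(),
                        lambda deal, m: deal if m == "a" else 0.4)
print(best)  # usually "a", since the hidden value averages 0.5
```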
For games like whist and hearts, where there is no bidding or betting phase before play
commences, each deal will be equally likely and so the values of P(s) are all equal.
For bridge, play is preceded by a bidding phase in which each team indicates how many
tricks it expects to win.
Since players bid based on the cards they hold, the other players learn more about the probability of each deal.
Taking this into account in deciding how to play the hand is tricky, for the reasons
mentioned in our description of Kriegspiel: players may bid in such
a way as to minimize the information conveyed to their opponents.
The strategy described in Equations 5.1 and 5.2 is sometimes called averaging over
clairvoyance because it assumes that the game will become observable to both players
immediately after the first move.
Consider the following story:
Day 1: Road A leads to a heap of gold; Road B leads to a fork. Take the left fork and
you‘ll find a bigger heap of gold, but take the right fork and you‘ll be run over by a bus.
Day 2: Road A leads to a heap of gold; Road B leads to a fork. Take the right fork and
you‘ll find a bigger heap of gold, but take the left fork and you‘ll be run over by a bus.
Day 3: Road A leads to a heap of gold; Road B leads to a fork. One branch of the
fork leads to a bigger heap of gold, but take the wrong fork and you‘ll be hit by a bus.
Unfortunately you don‘t know which fork is which.
Averaging over clairvoyance leads to the following reasoning: on Day 1, B is the right
choice; on Day 2, B is the right choice; on Day 3, the situation is the same as either Day 1 or
Day 2, so B must still be the right choice. Now we can see how averaging over clairvoyance
fails: it does not consider the belief state that the agent will be in after acting.
A belief state of total ignorance is not desirable, especially when one possibility is certain death. Because averaging over clairvoyance assumes that every future state will automatically be one of perfect knowledge, the approach never selects actions that gather information (like the first move in Figure 5.13); nor will it choose actions that hide information from the opponent or provide information to a partner, because it assumes that they already know the information; and it will never bluff in poker, because it assumes the opponent can see its cards.
Limitations of Game Search Algorithms
2. Computational cost:
Game playing can be computationally expensive, especially for complex games such as chess or Go, and may require powerful computers to achieve real-time performance.
3. Due to the huge branching factor, the process of reaching the goal is slower.
4. Evaluation and search of all possible nodes and branches degrades the performance and efficiency of the engine.
5. Both the players have too many choices to decide from.
6. If there is a restriction of time and space, it is not possible to explore the entire tree.
Constraint Satisfaction Problems (CSP)
A constraint satisfaction problem consists of a set of variables, a domain of values for each variable, and a set of constraints specifying allowable combinations of values.
A complete assignment is one in which every variable is assigned, and a solution to a CSP is a consistent, complete assignment.
A partial assignment is one that assigns values to only some of the variables.
Example: job-shop scheduling. Consider the problem of scheduling the assembly of a car. Constraints can assert that one task must occur before another—for example, a wheel must be installed before the hubcap is put on—and that only so many tasks can go on at once.
Constraints can also specify that a task takes a certain amount of time to complete.
We consider a small part of the car assembly, consisting of 15 tasks: install axles (front and back),
affix all four wheels (right and left, front and back), tighten nuts for each wheel, affix hubcaps, and inspect
the final assembly. We can represent the tasks with 15 variables:
X = {AxleF, AxleB, WheelRF, WheelLF, WheelRB, WheelLB, NutsRF, NutsLF, NutsRB, NutsLB, CapRF, CapLF, CapRB, CapLB, Inspect}.

The value of each variable is the time that the task starts. Next we represent precedence constraints between individual tasks. Whenever a task T1 must occur before task T2, and task T1 takes duration d1 to complete, we add an arithmetic constraint of the form

T1 + d1 ≤ T2.

In our example, the axles have to be in place before the wheels are put on, and it takes 10 minutes to install an axle, so we write

AxleF + 10 ≤ WheelRF;   AxleF + 10 ≤ WheelLF;
AxleB + 10 ≤ WheelRB;   AxleB + 10 ≤ WheelLB.

Next, for each wheel, we must affix the wheel (which takes 1 minute), then tighten the nuts (2 minutes), and finally attach the hubcap (1 minute, but not represented yet):

WheelRF + 1 ≤ NutsRF;   NutsRF + 2 ≤ CapRF;
WheelLF + 1 ≤ NutsLF;   NutsLF + 2 ≤ CapLF;
WheelRB + 1 ≤ NutsRB;   NutsRB + 2 ≤ CapRB;
WheelLB + 1 ≤ NutsLB;   NutsLB + 2 ≤ CapLB.

Suppose we have four workers to install wheels, but they have to share one tool that helps put the axle in place. We need a disjunctive constraint to say that AxleF and AxleB must not overlap in time; either one comes first or the other does:

(AxleF + 10 ≤ AxleB)  or  (AxleB + 10 ≤ AxleF).

This looks like a more complicated constraint, combining arithmetic and logic. But it still reduces to a set of pairs of values that AxleF and AxleB can take on.

We also need to assert that the inspection comes last and takes 3 minutes. For every variable except Inspect we add a constraint of the form X + dX ≤ Inspect. Finally, suppose there is a requirement to get the whole assembly done in 30 minutes. We can achieve that by limiting the domain of all variables: Di = {1, 2, . . . , 27}.
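A small sketch of how a few of these scheduling constraints might be encoded in Python. Only four of the fifteen tasks are shown, and representing constraints as predicates over an assignment is one possible encoding, not the only one:

```python
# Variables map task names to start times; constraints are predicates.
durations = {"AxleF": 10, "AxleB": 10, "WheelRF": 1, "NutsRF": 2}

def precedes(t1, t2):
    # Precedence constraint T1 + d1 <= T2: t1 finishes before t2 starts.
    return lambda a: a[t1] + durations[t1] <= a[t2]

constraints = [
    precedes("AxleF", "WheelRF"),
    precedes("WheelRF", "NutsRF"),
    # Disjunctive constraint: the two axle jobs must not overlap.
    lambda a: a["AxleF"] + 10 <= a["AxleB"] or a["AxleB"] + 10 <= a["AxleF"],
]

# Check one hand-built schedule against the constraints.
assignment = {"AxleF": 1, "AxleB": 11, "WheelRF": 11, "NutsRF": 12}
print(all(c(assignment) for c in constraints))  # -> True
```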
The simplest kind of CSP involves variables that have discrete, finite domains. Map-coloring problems and scheduling with time limits are both of this kind. The 8-queens problem can also be viewed as a finite-domain CSP, where the variables Q1, . . . , Q8 are the positions of each queen in columns 1, . . . , 8 and each variable has the domain Di = {1, 2, 3, 4, 5, 6, 7, 8}.
A discrete domain can be infinite, such as the set of integers or strings.
(If we didn‘t put a deadline on the job-scheduling problem, there would be an infinite number of start
times
for each variable.)
With infinite domains, it is no longer possible to describe constraints by enumerating all allowed combinations of values.
Instead, a constraint language must be used that understands constraints such as T1 + d1 ≤ T2 directly, without enumerating the set of pairs of allowable values for (T1, T2).
In a cryptarithmetic puzzle, for example, each letter variable has the domain {0, 1, 2, 3, 4, 5, 6, 7, 8, 9} and each auxiliary carry variable has the domain {0, 1}.
Global constraint
A global constraint involves an arbitrary number of variables; the most common is Alldiff, which says that all of the variables involved must have distinct values.
Constraint Propagation: Inference in CSPs
1. Node consistency
The simplest type of constraint is the unary constraint, which restricts the value of a single variable. For example, in the map-coloring problem it could be the case that South Australians won't tolerate the color green; we can express that with the unary constraint ⟨(SA), SA ≠ green⟩.
A binary constraint relates two variables. For example, SA ≠ NSW is a binary constraint.
A single variable (corresponding to a node in the CSP network) is node-consistent if all the values in the variable's domain satisfy the variable's unary constraints.
For example, in the variant of the Australia map-coloring problem (Figure 6.1) where South Australians dislike green, the variable SA starts with domain {red, green, blue}, and we can make it node-consistent by eliminating green, leaving SA with the reduced domain {red, blue}.
2. Arc consistency
A variable in a CSP is arc-consistent if every value in its domain satisfies the variable's binary constraints.
A network is arc-consistent if every variable is arc-consistent with every other variable.
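A minimal AC-3 sketch in Python, using the usual queue-of-arcs formulation; the example applies it to the WA/SA/NT fragment of the Australia map with WA fixed to red:

```python
from collections import deque

# `domains` maps each variable to a set of values; `constraints` maps each
# directed arc (x, y) to a predicate over (vx, vy).
def ac3(domains, constraints):
    queue = deque(constraints)  # start with every arc
    while queue:
        x, y = queue.popleft()
        # Revise: remove values of x that have no supporting value in y.
        removed = {vx for vx in domains[x]
                   if not any(constraints[(x, y)](vx, vy) for vy in domains[y])}
        if removed:
            domains[x] -= removed
            if not domains[x]:
                return False  # a domain was wiped out: inconsistency detected
            # x's domain shrank, so every arc pointing at x must be rechecked.
            queue.extend(arc for arc in constraints if arc[1] == x)
    return True

# With WA fixed to red, two colors are not enough for WA, SA, and NT.
neq = lambda a, b: a != b
domains = {"WA": {"red"}, "SA": {"red", "blue"}, "NT": {"red", "blue"}}
pairs = [("WA", "SA"), ("WA", "NT"), ("SA", "NT")]
constraints = {(x, y): neq for x, y in pairs}
constraints.update({(y, x): neq for x, y in pairs})
print(ac3(domains, constraints))  # -> False
```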
3. Path consistency
A two-variable set {Xi, Xj} is path-consistent with respect to a third variable Xm if, for every assignment {Xi = a, Xj = b}
consistent with the constraints on {Xi, Xj},
there is an assignment to Xm that satisfies the constraints on {Xi, Xm} and {Xm, Xj}.
This is called path consistency.
Let‘s see how path consistency fares in coloring the Australia map with two colors.
We will make the set {WA, SA} path consistent with respect to NT.
We start by enumerating the consistent assignments to the set.
In this case, there are only two:
{WA = red, SA = blue} and {WA = blue, SA = red}.
With both of these assignments, NT can be neither red nor blue (because it would conflict with either WA or SA). Because there is no valid choice for NT, we eliminate both assignments, and we end up with no valid assignments for {WA, SA}. Therefore, there can be no solution to this problem with two colors.
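The same elimination can be checked mechanically with a short enumeration (a sketch, with the colors restricted to two as in the example):

```python
from itertools import product

# Path consistency of {WA, SA} with respect to NT: every consistent
# (WA, SA) pair must have some NT value consistent with both.
colors = ["red", "blue"]
pairs = [(wa, sa) for wa, sa in product(colors, colors) if wa != sa]
surviving = [(wa, sa) for wa, sa in pairs
             if any(nt != wa and nt != sa for nt in colors)]
print(pairs)      # [('red', 'blue'), ('blue', 'red')]
print(surviving)  # [] -> no pair survives, so two colors cannot work
```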
4. K-consistency
Stronger forms of propagation can be defined with the notion of k-consistency.
A CSP is k-consistent if, for any set of k − 1 variables and for any consistent
assignment to those variables, a consistent value can always be assigned to any kth variable.
1-consistency says that, given the empty set, we can make any set of one variable
consistent: this is what we called node consistency.
2-consistency is the same as arc consistency. For binary constraint networks, 3-consistency is the same as path consistency.
Backtracking Search for CSPs
1. Variable ordering
The backtracking algorithm contains the line
var ←SELECT-UNASSIGNED-VARIABLE(csp)
The simplest strategy for SELECT-UNASSIGNED-VARIABLE is to choose the next
unassigned variable in order, {X1,X2, . . .}.
This static variable ordering seldom results in the most efficient search.
For example, after the assignments WA = red and NT = green in Figure 6.6, there is only one possible value for SA, so it makes sense to assign SA = blue next rather than assigning Q.
In fact, after SA is assigned, the choices for Q, NSW, and V are all forced.
This intuitive idea—choosing the variable with the fewest "legal" values—is called the minimum-remaining-values (MRV) heuristic.
It also has been called the ―most constrained variable‖ or ―fail-first‖ heuristic, the latter
because it picks a variable that is most likely to cause a failure soon, thereby pruning the
search tree.
2. Degree heuristic
The degree heuristic attempts to reduce the branching factor on future choices by selecting the variable that is involved in the largest number of constraints on other unassigned variables.
In Figure 6.1(b), SA is the variable with highest degree, 5; the other variables have degree 2 or 3, except for T, which has degree 0.
In fact, once SA is chosen, applying the degree heuristic solves the problem without any false steps: you can choose any consistent color at each choice point and still arrive at a solution with no backtracking.
The minimum-remaining-values heuristic is usually a more powerful guide, but the degree heuristic can be useful as a tie-breaker.
Least-constraining-value heuristic
It prefers the value that rules out the fewest choices for the neighboring variables in the
constraint graph.
For example, suppose that in Figure 6.1 we have generated the partial assignment with
WA=red and NT =green and that our next choice is for Q.
Blue would be a bad choice because it eliminates the last legal value left for Q's neighbor, SA.
The least-constraining-value heuristic therefore prefers red to blue.
In general, the heuristic is trying to leave the maximum flexibility for subsequent
variable assignments.
Forward checking
One of the simplest forms of inference is called forward checking. Whenever a variable X is assigned, the forward-checking process establishes arc consistency for it: for each unassigned variable Y that is connected to X by a constraint, delete from Y's domain any value that is inconsistent with the value chosen for X.
Because forward checking only does arc-consistency inferences, there is no reason to do forward checking if we have already done arc consistency as a preprocessing step.
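Combining these pieces, here is a compact sketch of backtracking search with the MRV heuristic and forward checking on the Australia map-coloring CSP. The adjacency table follows Figure 6.1; this is an illustrative sketch, not the textbook's pseudocode:

```python
neighbors = {
    "WA": ["NT", "SA"], "NT": ["WA", "SA", "Q"],
    "SA": ["WA", "NT", "Q", "NSW", "V"], "Q": ["NT", "SA", "NSW"],
    "NSW": ["Q", "SA", "V"], "V": ["SA", "NSW"], "T": [],
}

def backtrack(assignment, domains):
    if len(assignment) == len(neighbors):
        return assignment
    # MRV: choose the unassigned variable with the fewest remaining values.
    var = min((v for v in neighbors if v not in assignment),
              key=lambda v: len(domains[v]))
    for value in list(domains[var]):
        # Forward checking: remove `value` from each unassigned neighbor.
        pruned = {n: domains[n] - {value} for n in neighbors[var]
                  if n not in assignment}
        if all(pruned.values()):  # no neighbor's domain was wiped out
            result = backtrack({**assignment, var: value},
                               {**domains, var: {value}, **pruned})
            if result:
                return result
    return None  # failure here triggers backtracking in the caller

domains = {v: {"red", "green", "blue"} for v in neighbors}
print(backtrack({}, domains))  # -> a valid 3-coloring of the map
```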
When a branch of the search fails, the standard backtracking algorithm backs up to the preceding decision point and tries a different value for it. This is called chronological backtracking because the most recent decision point is revisited.
Consider what happens when we apply simple backtracking in Figure 6.1 with a fixed
variable ordering Q, NSW, V , T, SA, WA, NT.
Suppose we have generated the partial assignment {Q=red, NSW =green, V =blue, T
=red}.
When we try the next variable, SA, we see that every value violates a constraint. We back up to T and try a new color for Tasmania! Obviously this is silly: repainting Tasmania cannot possibly resolve the problem with South Australia. A more intelligent approach is to backtrack to a variable that might fix the problem.
To do this, we will keep track of a set of assignments that are in conflict with some
value for SA.
The set (in this case {Q = red, NSW = green, V = blue}) is called the conflict set for SA.
The backjumping method backtracks to the most recent assignment in the conflict set; in this case, backjumping would jump over Tasmania and try a new value for V.
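As a small illustration, SA's conflict set can be computed directly from the partial assignment. The adjacency facts follow the Australia map; this is a toy fragment, not a full backjumping implementation:

```python
# Partial assignment under the ordering Q, NSW, V, T; T does not touch SA,
# so only the assignments to SA's actual neighbors enter the conflict set.
partial = {"Q": "red", "NSW": "green", "V": "blue", "T": "red"}
adjacent_to_SA = {"Q", "NSW", "V"}

conflict_set = {v: c for v, c in partial.items() if v in adjacent_to_SA}
print(conflict_set)  # {'Q': 'red', 'NSW': 'green', 'V': 'blue'}
# All three colors appear, so SA has no legal value; backjumping returns to
# the most recent member of this set (V), skipping over Tasmania.
```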