UNIT3 Aies Reg 2023
CONSTRAINT SATISFACTION PROBLEMS
Consistency-based algorithms use information from the constraints to reduce the search space as early in the search as possible. Constraint satisfaction problems require a good deal of reasoning, and their time complexity is generally higher than that of comparable unconstrained problems. They can also be solved by evolutionary approaches using mutation operations. Such a problem depends on a set of constraints that are a necessary part of the problem; various complex real-world tasks, such as map coloring and scheduling, can be formulated this way.
Example
Figure 2.15(a): Principal states and territories of Australia. Coloring this map can be viewed as a constraint satisfaction problem. The goal is to assign colors to each region so that no neighboring regions have the same color.
Initial state: the empty assignment {}, in which all variables are unassigned.
Successor function: a value can be assigned to any unassigned variable, provided that it does not conflict with previously assigned variables.
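This formulation can be written down directly. The following Python sketch (illustrative only; the variable, domain, and neighbor structures are our own naming, not from any library) encodes the Australia map-coloring CSP and its binary "not-equal" constraints:

```python
# Illustrative sketch (our own naming) of the Australia map-coloring CSP:
# variables, domains, and neighbor relations.
variables = ["WA", "NT", "SA", "Q", "NSW", "V", "T"]
domains = {v: ["red", "green", "blue"] for v in variables}
neighbors = {
    "WA": ["NT", "SA"], "NT": ["WA", "SA", "Q"],
    "SA": ["WA", "NT", "Q", "NSW", "V"], "Q": ["NT", "SA", "NSW"],
    "NSW": ["Q", "SA", "V"], "V": ["SA", "NSW"], "T": [],
}

def consistent(var, value, assignment):
    """A value is allowed if no already-assigned neighbor has the same color."""
    return all(assignment.get(n) != value for n in neighbors[var])
```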
Varieties of constraints:
(i) Unary constraints involve a single variable. Example: SA ≠ green.
(ii) Binary constraints involve pairs of variables. Example: SA ≠ WA.
(iii) Higher-order constraints involve three or more variables. Example: cryptarithmetic puzzles.
(iv) Absolute constraints rule out a potential solution when they are violated.
(v) Preference constraints indicate which solutions are preferred.
Forward checking
One way to make better use of constraints during search is called forward
checking.
Whenever a variable X is assigned, the forward checking process looks at each unassigned variable Y that is connected to X by a constraint and deletes from Y's domain any value that is inconsistent with the value chosen for X.
The following figure shows the progress of a map-coloring search with
forward checking.
Although forward checking detects many inconsistencies, it does not detect all of
them.
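As a rough sketch of the idea (assuming the map-coloring structures from the earlier example, and only the "neighboring regions differ" constraint), forward checking might look like this:

```python
def forward_check(var, value, assignment, domains, neighbors):
    """After assigning 'value' to 'var', delete values inconsistent with it
    from the domains of unassigned neighbors (here, equal colors).
    Returns False as soon as some neighbor's domain becomes empty."""
    for other in neighbors[var]:
        if other not in assignment:
            if value in domains[other]:
                domains[other].remove(value)
            if not domains[other]:
                return False   # dead end detected without further search
    return True
```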
Constraint propagation is the general term for propagating the implications of a constraint on one variable onto other variables.
Node consistency
A single variable (corresponding to a node in the CSP network) is node-consistent
if all the values in the variable’s domain satisfy the variable’s unary constraints.
For example, in the variant of the Australia map-coloring problem where South Australians dislike green, the variable SA starts with domain {red, green, blue}, and we can make it node-consistent by eliminating green, leaving SA with the reduced domain {red, blue}. We say that a network is node-consistent if every variable in the network is node-consistent.
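A minimal sketch of enforcing node consistency; the unary_ok predicate is an assumed placeholder for the variable's unary constraints:

```python
def make_node_consistent(var, domains, unary_ok):
    """Keep only the values of var's domain that satisfy its unary constraints.
    unary_ok(var, value) -> bool is an assumed placeholder for those constraints."""
    domains[var] = [v for v in domains[var] if unary_ok(var, v)]

# South Australians dislike green:
# make_node_consistent("SA", domains, lambda var, v: not (var == "SA" and v == "green"))
```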
K-Consistency
The key idea of CSP constraint propagation is local consistency. If we treat each variable as a node in a graph and each binary constraint as an arc, then the process of enforcing local consistency in each part of the graph causes inconsistent values to be eliminated throughout the graph. Local consistency comes in increasingly strong forms: a CSP is k-consistent if, for any set of k - 1 variables and for any consistent assignment to those variables, a consistent value can always be assigned to any k-th variable. Node consistency is 1-consistency, and arc consistency is 2-consistency.
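The simplest such propagation beyond node consistency is arc consistency (2-consistency). A sketch of the basic "revise" step, assuming a placeholder satisfies(vx, vy) predicate for the binary constraint between x and y:

```python
def revise(x, y, domains, satisfies):
    """One step of enforcing arc consistency on the arc (x, y): remove from x's
    domain every value with no supporting value in y's domain.
    satisfies(vx, vy) -> bool is an assumed placeholder for the binary constraint."""
    removed = False
    for vx in list(domains[x]):
        if not any(satisfies(vx, vy) for vy in domains[y]):
            domains[x].remove(vx)
            removed = True
    return removed
```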
Tree-Structured CSPs
A CSP is tree-structured if its constraint graph forms a tree, that is, any two variables are connected by at most one path. Tree-structured CSPs can be solved without backtracking: pick a variable as the root, make the graph directed arc-consistent working from the leaves toward the root, and then assign values to the variables in order from the root outward. The whole procedure runs in time linear in the number of variables.
Global constraints
Global constraint is one involving an arbitrary number of variables (but not
necessarily all variables). Global constraints occur frequently in real problems and
can be handled by special-purpose algorithms that are more efficient than the
general-purpose methods described so far. For example, the Alldiff constraint says
that all the variables involved must have distinct values (as in the cryptarithmetic
problem above and Sudoku puzzles below). One simple form of inconsistency
detection for Alldiff constraints works as follows: if m variables are involved in the constraint, and they have n possible distinct values altogether, and m > n, then the constraint cannot be satisfied.
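A sketch of this simple test in Python (function and parameter names are illustrative):

```python
def alldiff_inconsistent(constraint_vars, domains):
    """Return True if the Alldiff constraint over constraint_vars cannot be
    satisfied: m variables but fewer than m distinct values remain."""
    values = set()
    for v in constraint_vars:
        values.update(domains[v])
    return len(values) < len(constraint_vars)
```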
Resource constraint
Another important higher-order constraint is the resource constraint, sometimes
called the Atmost constraint. For example, in a scheduling problem, let P1, P2, P3, and P4 denote the numbers of personnel assigned to each of four tasks. The constraint that no more than 10 personnel are assigned in total is written as Atmost(10, P1, P2, P3, P4).
For large resource-limited problems with integer values, it is not practical to represent each domain as an explicit set of integers. Instead, domains are represented by upper and lower bounds and are managed by bounds propagation. For example, in an airline-scheduling problem, let's suppose there are two flights, F1 and F2, for which the planes have capacities 165 and 385, respectively. The initial domains for the numbers of passengers on each flight are then D1 = [0, 165] and D2 = [0, 385]. Now suppose we have the additional constraint that the two flights together must carry 420 people: F1 + F2 = 420. Propagating bounds constraints, we reduce the domains to D1 = [35, 165] and D2 = [255, 385].
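A sketch of bounds propagation for a sum constraint of this kind, with the airline example as a usage comment (names are illustrative):

```python
def propagate_sum_bounds(d1, d2, total):
    """Bounds propagation for the constraint X1 + X2 = total, where each domain
    is a (lo, hi) interval. Returns the tightened intervals."""
    lo1, hi1 = d1
    lo2, hi2 = d2
    lo1 = max(lo1, total - hi2); hi1 = min(hi1, total - lo2)
    lo2 = max(lo2, total - hi1); hi2 = min(hi2, total - lo1)
    return (lo1, hi1), (lo2, hi2)

# propagate_sum_bounds((0, 165), (0, 385), 420)  ->  ((35, 165), (255, 385))
```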
Sudoku example
The popular Sudoku puzzle has introduced millions of people to constraint
satisfaction problems, although they may not recognize it. A Sudoku board
consists of 81 squares, some of which are initially
filled with digits from 1 to 9. The puzzle is to fill in all the remaining squares such
that no digit appears twice in any row, column, or 3 ×3 box (see Figure 6.4). A
row, column, or box is called a unit.
The Sudoku puzzles that are printed in newspapers and puzzle books have the
property that there is exactly one solution. Although some can be tricky to solve
by hand, taking tens of minutes, even the hardest Sudoku problems yield to a
CSP solver in less than 0.1 second. Sudoku puzzle can be considered a CSP
with 81 variables, one for each square. We use the variable names A1 through
A9 for the top row (left to right), down to I1 through I9 for the bottom row. The
empty squares have the domain {1, 2, 3, 4, 5, 6, 7, 8, 9} and the pre-filled squares have a domain consisting of a single value.
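An illustrative sketch of setting up the Sudoku variables, units, and domains in Python (the naming is ours):

```python
# 81 variables named A1..I9, and the 27 units (rows, columns, 3x3 boxes),
# each of which carries an Alldiff constraint.
rows, cols = "ABCDEFGHI", "123456789"
squares = [r + c for r in rows for c in cols]
unit_rows = [[r + c for c in cols] for r in rows]
unit_cols = [[r + c for r in rows] for c in cols]
unit_boxes = [[r + c for r in rs for c in cs]
              for rs in ("ABC", "DEF", "GHI") for cs in ("123", "456", "789")]
units = unit_rows + unit_cols + unit_boxes

def initial_domains(givens):
    """givens maps pre-filled squares to their digit, e.g. {'A1': '5'}."""
    return {s: [givens[s]] if s in givens else list("123456789") for s in squares}
```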
3. BACKTRACKING SEARCH
The term backtracking search is used for a depth-first search that chooses values for one variable at a time and backtracks when a variable has no legal values left to assign. It repeatedly chooses an unassigned variable, and then tries all values in the domain of that variable in turn, trying to find a solution. If an inconsistency is detected, then BACKTRACK returns failure, causing the previous call to try another value.
Part of the search tree for the Australia problem is shown in Figure 6.6, where we have assigned variables in the order WA, NT, Q. Because the representation of CSPs is standardized, there is no need to supply BACKTRACKING-SEARCH with a domain-specific initial state, action function, transition model, or goal test.
It turns out that we can solve CSPs efficiently without such domain-specific
knowledge. Instead, we can add some sophistication to the unspecified
functions, using them to address the following questions:
1. Which variable should be assigned next (SELECT-UNASSIGNED-
VARIABLE), and in what order should its values be tried (ORDER-
DOMAIN-VALUES)?
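A minimal sketch of backtracking search in Python, reusing the consistent() check from the earlier map-coloring sketch; it uses naive variable and value ordering, which is exactly what the questions above are about improving:

```python
def backtracking_search(assignment, variables, domains, consistent):
    """Depth-first backtracking for a CSP, with naive variable and value ordering.
    consistent(var, value, assignment) is the constraint check from the
    map-coloring sketch above."""
    if len(assignment) == len(variables):
        return assignment                              # every variable assigned
    var = next(v for v in variables if v not in assignment)
    for value in domains[var]:
        if consistent(var, value, assignment):
            assignment[var] = value
            result = backtracking_search(assignment, variables, domains, consistent)
            if result is not None:
                return result
            del assignment[var]                        # no solution below: backtrack
    return None                                        # no legal value left: failure

# e.g. backtracking_search({}, variables, domains, consistent)
```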
GAME THEORY
Games appeared to be a good domain in which to explore machine intelligence because they provide a structured task in which success or failure is easy to measure, and they did not obviously require large amounts of knowledge. To improve the effectiveness of a search-based game-playing program, two things can be done:
1. Improve the generate procedure so that only good moves (or paths) are generated.
2. Improve the test procedure so that the best moves (or paths) will be recognized and explored first.
As the number of legal moves available increases, it becomes increasingly important to
apply heuristics to select only those that have some kind of promise.
The ideal way to use a search procedure to find a solution to a problem is to generate
moves through the problem space until a goal state is reached.
A very simple static evaluation function for chess based on piece advantage was proposed
by Turing
—simply add the values of black's pieces (B), the values of white's pieces (W), and then
compute the quotient W/B.
A more sophisticated approach was that taken in Samuel's checkers program, in which
the static evaluation function was a linear combination of several simple functions. Thus
the complete evaluation function had the form:
c1 × piece advantage + c2 × advancement + c3 × center control
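A small illustrative sketch of such a linear evaluation function (the weights and feature values below are made-up placeholders, not Samuel's actual coefficients):

```python
def static_eval(features, weights=(1.0, 0.5, 0.3)):
    """Linear static evaluation in the style of Samuel's checkers program:
    a weighted sum of simple position features."""
    piece_advantage, advancement, center_control = features
    c1, c2, c3 = weights
    return c1 * piece_advantage + c2 * advancement + c3 * center_control

# e.g. a position that is +2 pieces, slightly advanced, weak in the center:
# static_eval((2, 1, -1))  ->  2.2
```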
Unfortunately, deciding which moves have contributed to wins and which to losses is not
always easy. Suppose we make a very bad move, but then, because the opponent makes a
mistake, we ultimately win the game. We would not like to give credit for winning to
our mistake.
Both the move generator and the static evaluation function must incorporate a great deal of knowledge about the particular game being played. But unless these functions are perfect, we also need a search procedure that makes it possible to look ahead as many moves as possible to see what may occur.
For a simple one-person game or puzzle, the A* algorithm can be used. It can be applied
to reason forward from the current state as far as possible in the time allowed. The
heuristic function h' can be applied at terminal nodes and used to propagate values back
up the search graph so that the best next move can be chosen. But because of their adversarial nature, this procedure is inadequate for two-person games such as chess.
As values are passed back up, different assumptions must be made at levels where the
program chooses the move and at the alternating levels where the opponent chooses.
There are several ways that this can be done. The most commonly used method is the
minimax procedure.
An alternative approach is the B* algorithm which works on both standard problem-
solving trees and on game trees.
Figure: A two-ply game tree. The △ nodes are “MAX nodes,” in which it is MAX’s turn to move, and the ▽ nodes are “MIN nodes.” The terminal nodes show the utility values for MAX; the other nodes are labeled with their minimax values. MAX’s best move at the root is a1, because it leads to the state with the highest minimax value, and MIN’s best reply is b1, because it leads to the state with the lowest minimax value.
Minimax Search
A static evaluation function is applied to the candidate next positions, and we simply choose the best one. After doing so, the chosen value is backed up to the starting position to represent our evaluation of it.
The starting position is exactly as good for us as the position generated by the best move
we can make next. Here we assume that the static evaluation function returns large
values to indicate good situations for us, so our goal is to maximize the value of the
static evaluation function of the next board position. An example of this operation is
shown in figure 2.1. It assumes a static evaluation function that returns values
ranging from -10 to 10, with 10 indicating a win for us, -10 a win for the opponent, and 0
an even match. Since our goal is to maximize the value of the heuristic function, we
choose to move to B. Backing B’s value up to A, we can conclude that A’s value is 8,
since we know we can move to a position with a value of 8.
But now we must take into account that the opponent gets to choose which successor
moves to make and thus which terminal value should be backed up to the next level.
Suppose we made move B. Then the opponent must choose among moves E,F, and G.
The opponent’s goal is to minimize the value of the evaluation function, so he or she
can be expected to choose move F. This means that if we make move B, the actual
position in which we will end up one move later is very bad for us. This is true even
though a possible configuration is that represented by node E, which is very good for us.
But since at this level we are not the ones to move, we will not get to choose it. Figure
2.3 shows the result of propagating the new values up the tree.
At the level representing the opponent’s choice, the minimum value was
chosen and backed up. At the level representing our choice, the maximum
value was chosen.
Once the values from the second ply are backed up, it becomes clear that the correct
move for us to make at the first level, given the information we have available, is C,
since there is nothing the opponent can do from there to produce a value worse than -2.
This process can be repeated for as many ply as time allows, and the more accurate
evaluations that are produced can be used to choose the correct move at the top level.
Types of games:
Games can be classified along two dimensions: whether the environment is deterministic or involves chance, and whether the state is fully or only partially observable.
- Deterministic, perfect information: chess, checkers, Go, Othello.
- Chance, perfect information: backgammon, Monopoly.
- Deterministic, imperfect information: battleships.
- Chance, imperfect information: bridge, poker, Scrabble.
As with any recursive program, a critical issue in the design of the MINIMAX procedure
is when to stop the recursion and simply call the static evaluation function. There are a
variety of factors that may influence this decision. They include:
Has one side won?
How many ply have we already explored?
How promising is this path?
How much time is left?
How stable is the configuration?
We use a function DEEP-ENOUGH, which is assumed to evaluate all of these factors and to return TRUE if the search should be stopped at the current level and FALSE otherwise. Our simple implementation of DEEP-ENOUGH will take two parameters, Position and Depth. It will ignore its Position parameter and simply return TRUE if its Depth parameter exceeds a constant cutoff value.
One problem that arises in defining MINIMAX as a recursive procedure is that it needs
to return not one but two results:
The backed-up value of the path it chooses.
The path itself. We return the entire path even though probably only the first
element, representing the best move from the current position, is actually
needed.
We assume that MINIMAX returns a structure containing both results and that we have
two functions, VALUE and PATH, that extract the separate components.
Since we define the MINIMAX procedure as a recursive function, we must also specify how it is to be called initially. It takes three parameters: a board position, the current depth of the search, and the player to move. So the initial call to compute the best move from the position CURRENT should be
MINIMAX(CURRENT, 0, PLAYER-ONE) if PLAYER-ONE is to move, or
MINIMAX(CURRENT, 0, PLAYER-TWO) if PLAYER-TWO is to move.
1. If DEEP-ENOUGH(Position, Depth), then return the structure VALUE = STATIC(Position, Player); PATH = nil, since there is no path from this position.
2. Otherwise, generate one more ply of the tree by setting SUCCESSORS to the list of legal moves from Position for Player.
3. If SUCCESSORS is empty, then there are no moves to be made, so return the same structure that would have been returned if DEEP-ENOUGH had returned TRUE.
4. If SUCCESSORS is not empty, then examine each element in turn and keep track of the best one. This is done as follows.
Initialize BEST-SCORE to the minimum value that STATIC can return. It will be
updated to reflect the best score that can be achieved by an element of SUCCESSORS.
For each element SUCC of SUCCESSORS, do the following:
(a) Set RESULT-SUCC to MINIMAX(SUCC, Depth + 1, OPPOSITE(Player)).
This recursive call to MINIMAX will actually carry out the exploration of SUCC.
(b) Set NEW-VALUE to -VALUE(RESULT-SUCC). This will cause it to reflect the merits of the position from the opposite perspective from that of the next lower level.
(c) If NEW-VALUE > BEST-SCORE, then we have found a successor that is better than any examined so far. Record this as follows:
i. Set BEST-SCORE to NEW-VALUE.
ii. The best known path is now from CURRENT to SUCC and then on to the appropriate path down from SUCC as determined by the recursive call to MINIMAX. So set BEST-PATH to the result of attaching SUCC to the front of PATH(RESULT-SUCC).
5. Now that all the successors have been examined, we know the value of Position as well as which path to take from it. So return the structure
VALUE = BEST-SCORE
PATH = BEST-PATH
When the initial call to MINIMAX returns, the best move from CURRENT is the first
element of PATH.
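A sketch of this procedure in Python, in the same negamax style (the game object bundling DEEP-ENOUGH, STATIC, the successor generator, and OPPOSITE is an assumed placeholder for the game-specific parts):

```python
def minimax(position, depth, player, game):
    """Sketch of the MINIMAX procedure described above, in negamax form.
    'game' is assumed to provide deep_enough(pos, depth), static(pos, player),
    successors(pos, player) and opposite(player)."""
    if game.deep_enough(position, depth) or not game.successors(position, player):
        return {"VALUE": game.static(position, player), "PATH": []}
    best_score = float("-inf")            # minimum value STATIC could return
    best_path = []
    for succ in game.successors(position, player):
        result_succ = minimax(succ, depth + 1, game.opposite(player), game)
        new_value = -result_succ["VALUE"]  # opponent's best is our worst
        if new_value > best_score:
            best_score = new_value
            best_path = [succ] + result_succ["PATH"]
    return {"VALUE": best_score, "PATH": best_path}

# Initial call, as in the text:  minimax(CURRENT, 0, PLAYER_ONE, game)
```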
Optimal decisions in multiplayer games:
Many popular games allow more than two players.
Let us examine how to extend the minimax idea to multiplayer games.
First, we need to replace the single value for each node with a vector of values.
For terminal states, this vector gives the utility of the state from each player's viewpoint. (
In two-player, zero-sum games, the two-element vector can be reduced to a single value
because the values are always opposite.)
Figure: The first three plies of a game tree with three players (A, B, C). Each node is labeled with values from the viewpoint of each player. The best move is marked at the root.
The backed-up value of a node n is the utility vector of the successor state with the highest value for the player choosing at n. Anyone who plays multiplayer games, such as Diplomacy, quickly becomes aware that much more is going on than in two-player games. Multiplayer games usually involve alliances, whether formal or informal, among the players. Are alliances a natural consequence of optimal strategies for each player in a multiplayer game? They can be: for example, if two players are in weak positions relative to a third, stronger player, then it is optimal for each of them to attack the stronger player rather than each other. In this way, collaboration emerges from purely selfish behavior.
If the game is not zero-sum, then collaboration can also occur with just two players.
ALPHA-BETA PRUNING
After examining node F, we know that the opponent is guaranteed a score of -5 or less at C (since the opponent is the minimizing player). But we also know that we are guaranteed a score of 3 or greater at node A, which we can achieve if we move to B.
Any other move that produces a score of less than 3 is worse than the move to B, and we
can ignore it. After examining only F we are sure that a move to C is worse (it will be
<=-5) regardless of the score of node G. Thus we need not bother to explore node G at
all. Of course, cutting out one node may not appear to justify the expense of keeping
track of the limits and checking them, but if we were exploring this tree to 6 ply, then we
would have eliminated not a single node but an entire tree 3 ply deep.
To see how the two thresholds, alpha and beta, can both be used, consider the example shown in figure 2.5. In searching this tree, the entire subtree headed by B is searched, and we discover that at A we can expect a score of at least 3.
When this alpha value is passed down to F, it will enable us to skip the exploration of
L. After K is examined, we see that I is guaranteed a maximum score of 0, which means
that F is guaranteed a minimum of 0. But this is less than alpha's value of 3, so no more
branches of I need to be considered. The maximizing player already knows not to choose
to move to C and then to I since, if that move is made, the resulting score will be no
better than 0 and a score of 3 can be achieved by moving to B instead.
Now let's see how the value of beta can be used. After cutting off further exploration of I,
J is examined, yielding a value of 5, which is assigned as the value of F (since it is the
maximum of 5 and 0). This value becomes the value of beta at node C. It indicates that C is guaranteed to get a score of 5 or less. Now we must expand G. First M is examined and it has a
value of 7, which is passed back to G as its tentative value. But now 7 is compared to
beta (5). It is greater, and the player whose turn it is at node C is trying to minimize. So
this player will not choose G, which would lead to a score of at least 7, since there is an
alternative move to F, which will lead to a score of 5. Thus it is not necessary to explore
any of the other branches of G.
In other words, the value of the root and hence the minimax decision are independent of the values of the pruned leaves x and y.
- Alpha-beta pruning can be applied to trees of any depth, and it is often possible
to prune entire subtrees rather than just leaves.
- The general principle is this: consider a node n somewhere in the tree such that
Player has a choice of moving to that node.
- If Player has a better choice m either at the parent node of n or at any choice
point further up, then n will never be reached in actual play. So once we have
found out enough about n to reach this conclusion, we can prune it.
- Alpha-beta pruning gets its name from the following two parameters that
describe bounds on the backed-up values that appear anywhere along the path:
α = the value of the best (i.e., highest-value) choice we have found so far at any choice point along the path for MAX.
β = the value of the best (i.e., lowest-value) choice we have found so far at any choice point along the path for MIN.
- Alpha-beta search updates the values of α and β as it goes along and prunes the
remaining branches at a node as soon as the value of the current node is known
to be worse than the current α or β value for MAX or MIN respectively.
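A sketch of minimax with alpha-beta pruning in Python (the game object with is_terminal, evaluate, and successors is an assumed placeholder, not a specific library API):

```python
def alphabeta(state, depth, alpha, beta, maximizing, game):
    """Minimax with alpha-beta pruning. 'game' is assumed to provide
    is_terminal(state), evaluate(state) and successors(state)."""
    if depth == 0 or game.is_terminal(state):
        return game.evaluate(state)
    if maximizing:
        value = float("-inf")
        for child in game.successors(state):
            value = max(value, alphabeta(child, depth - 1, alpha, beta, False, game))
            alpha = max(alpha, value)
            if alpha >= beta:
                break          # beta cutoff: MIN already has a better option above
        return value
    value = float("inf")
    for child in game.successors(state):
        value = min(value, alphabeta(child, depth - 1, alpha, beta, True, game))
        beta = min(beta, value)
        if alpha >= beta:
            break              # alpha cutoff: MAX already has a better option above
    return value

# Initial call: alphabeta(root, max_depth, float("-inf"), float("inf"), True, game)
```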
Monte Carlo Tree Search (MCTS) is a heuristic search algorithm used in decision-
making processes, particularly in game-playing artificial intelligence. It is a combination
of tree search and random sampling methods, making it useful for games with complex
decision trees and limited computation resources.
MCTS works by simulating random game moves starting from the current game state,
and then selecting the best move based on the outcomes of those simulations. It builds
a tree of game states and their associated action values, and each iteration of the
algorithm expands the tree by selecting and simulating a promising unexplored node,
using a selection policy such as Upper Confidence Bound (UCB), and updating the
statistics of the nodes on the path.
One example of the application of MCTS is in the game of Go, which has an extremely
large branching factor, making it difficult to explore the entire decision tree. In 2016, a
computer program called AlphaGo used MCTS to defeat the world champion of Go in
a five-game match.
Another example is in the game of poker, where MCTS can be used to determine the
optimal action to take given the current state of the game and the probability distribution
of the opponents' hands. The program Pluribus, developed by researchers at Carnegie
Mellon University, used MCTS to defeat professional poker players in a six-player no-
limit Texas hold'em game.
The selection phase is the most important phase in MCTS. The goal of the selection
phase is to select the node that is most likely to lead to a good outcome. There are a
number of different heuristics that can be used for selection, such as:
Upper confidence bound (UCB): UCB is a heuristic that selects the node with the
highest upper confidence bound. The upper confidence bound is an estimate of the
expected value of the node, plus a bonus that is proportional to the uncertainty of the
estimate.
Prior probability: Prior probability is a heuristic that selects the node with the highest
prior probability. The prior probability is an estimate of the probability that the node
will lead to a good outcome.
The expansion phase is simply the process of creating new child nodes for the selected node. The number of child nodes that can be created depends on the game. For example, in 19×19 Go a node can have up to 361 child nodes, one for each empty intersection.
The simulation phase is the process of running a game from the expanded node to a
terminal state. The simulation is run using a random policy, which means that each move
is chosen randomly.
The backpropagation phase is the process of updating the values of the nodes in the
tree based on the results of the simulation. The values of the nodes are updated
using a formula that takes into account the outcome of the simulation and the prior
probability of the node.
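A compact sketch of all four phases in Python (the state interface legal_moves(), play(), is_terminal(), result() is an assumed placeholder; win/loss perspective handling is simplified for brevity):

```python
import math, random

class Node:
    """Tree node holding MCTS statistics; 'state' is assumed to offer
    legal_moves(), play(move) -> new state, is_terminal() and result()."""
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children, self.untried = [], list(state.legal_moves())
        self.visits, self.wins = 0, 0.0

def ucb1(node, c=1.4):
    """Upper confidence bound used in the selection phase."""
    if node.visits == 0:
        return float("inf")
    return (node.wins / node.visits
            + c * math.sqrt(math.log(node.parent.visits) / node.visits))

def mcts(root, iterations=1000):
    for _ in range(iterations):
        node = root
        # 1. Selection: follow UCB1 down to a node that still has untried moves.
        while not node.untried and node.children:
            node = max(node.children, key=ucb1)
        # 2. Expansion: create one child for an untried move.
        if node.untried:
            move = node.untried.pop()
            child = Node(node.state.play(move), parent=node)
            node.children.append(child)
            node = child
        # 3. Simulation: play out randomly to a terminal state.
        state = node.state
        while not state.is_terminal():
            state = state.play(random.choice(list(state.legal_moves())))
        outcome = state.result()            # e.g. 1 for a win, 0 for a loss
        # 4. Backpropagation: update statistics on the path back to the root.
        while node is not None:
            node.visits += 1
            node.wins += outcome
            node = node.parent
    # Play the most-visited child of the root.
    return max(root.children, key=lambda n: n.visits)
```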
MCTS is a powerful algorithm that has been used to achieve state-of-the-art results in a
variety of games. However, it can be computationally expensive. There are a number
of techniques that can be used to reduce the computational cost of MCTS, such as:
Pruning: Pruning is a technique that removes nodes from the tree that are unlikely to be
relevant.
Robustness: MCTS can be made more robust to noise by using techniques such as
averaging the results of multiple simulations.
Parallelization: MCTS can be parallelized by running multiple simulations in parallel.
MCTS is a powerful tool that can be used to solve a wide variety of problems. It is particularly well-suited for problems that can be modeled as games.
STOCHASTIC GAMES
In real life, there are many unpredictable external events that put us into unforeseen
situations.
Many games mirror this unpredictability by including a random element, such as
the throwing of dice.
In this way, they take us a step nearer reality, and it is worthwhile to see how
this affects the decision- making process.
Backgammon is a typical game that combines luck and skill. Dice are rolled at
the beginning of a player’s turn to determine the legal moves. In the
backgammon position of figure, for example, white has rolled a 6-5, and has
four possible moves.
Although White knows what his or her own legal moves are, White does not
know what Black is going to roll and thus does not know what Black’s legal
moves will be.
That means White cannot construct a standard game tree of the sort we saw in chess
and tic-tac-toe.
A game tree in backgammon must include chance nodes in addition to MAX and MIN nodes. Chance nodes are shown as circles in the figure.
The branches leading from each chance node denote the possible dice rolls, and
each is labeled with the roll and the chance that it will occur.
Figure: A typical backgammon position. The goal of the game is to move all one’s pieces off the board.
There are 36 ways to roll two dice, each equally likely; but because a 6-5 is the same as a 5-6, there are only 21 distinct rolls. The six doubles (1-1 through 6-6) each have a 1/36 chance of coming up; the other 15 distinct rolls each have a 1/18 chance.
The next step is to understand how to make correct decisions. Obviously, we still want to pick the move that leads to the best position. However, the resulting positions do not have definite minimax values. Instead, we can only calculate the expected value, where the expectation is taken over all the possible dice rolls that could occur. This leads us to generalize the minimax value for deterministic games to an expectiminimax value for games with chance nodes.
- Terminal nodes and MAX and MIN nodes work exactly the same way as before;
chance nodes are evaluated by taking the weighted average of the values
resulting from all possible dice rolls, that is,
EXPECTIMINIMAX(n) =
UTILITY(n), if n is a terminal node
max s∊Successors(n) EXPECTIMINIMAX(s), if n is a MAX node
min s∊Successors(n) EXPECTIMINIMAX(s), if n is a MIN node
∑ s∊Successors(n) P(s) · EXPECTIMINIMAX(s), if n is a chance node
Where the successor function for a chance node n simply augments the
state of n with each possible dice roll to produce each successor s and
P(s) is the probability that that dice roll occurs. These equations can be
backed up recursively all the way to the root of the tree, just as in
minimax.
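A direct transcription of this definition as a Python sketch (the game object providing node types, successors, probabilities, and utilities is an assumed placeholder):

```python
def expectiminimax(node, game):
    """Direct transcription of the definition above. 'game' is assumed to
    provide is_terminal, utility, node_type ('max' | 'min' | 'chance'),
    successors, and probability(successor) for chance nodes."""
    if game.is_terminal(node):
        return game.utility(node)
    kind = game.node_type(node)
    succs = game.successors(node)
    if kind == "chance":
        # Weighted average over all possible dice rolls.
        return sum(game.probability(s) * expectiminimax(s, game) for s in succs)
    values = [expectiminimax(s, game) for s in succs]
    return max(values) if kind == "max" else min(values)
```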
- One might think that evaluation functions for games such as backgammon
should be just like evaluation functions for chess – they just need to give
higher scores to better positions.
- But in fact, the presence of chance nodes means that one has to be more careful
about what the evaluation values mean.
Figure: An order preserving transformation on leaf values changes the best move.
- Figure shows what happens: with an evaluation function that assigns values
[1,2,3,4] to the leaves, move A1 is best; with values [1,20,30,400], move A2 is
best. Hence, the program behaves totally differently if we make a change in the
scale of some evaluation values.
- It turns out that, to avoid this sensitivity, the evaluation function must be a
positive linear transformation of the probability of winning from a position
- If the program knew in advance all the dice rolls that would occur for the rest
of the game, solving a game with dice would be just like solving a game
without dice, which minimax does in O(b^m) time. Because expectiminimax is also considering all the possible dice-roll sequences, it will take O(b^m n^m), where n is the number of distinct rolls.
- Even if the search depth is limited to some small depth d , the extra cost
compared with that of minimax makes it unrealistic to consider looking ahead
very far in most games of chance.
- In backgammon n is 21 and b is usually around 20, but in some situations can
be as high as 4000 for dice rolls that are doubles. Three plies is probably all we
could manage.
- Another way to think about the problem is this: the advantage of alpha-beta is
that it ignores future developments that just are not going to happen, given best
play. Thus it concentrates on likely occurrences.
- In games with dice, there are no likely sequences of moves, because for those
moves to take place, the dice would first have to come out the right way to
make them legal.
- This is a general problem whenever uncertainty enters the picture the
possibilities are multiplied enormously, and forming detailed plans of action
becomes pointless, because the world probably will not play along.
- No doubt it will have occurred to the reader that perhaps something like alpha-
beta pruning could be applied to game trees with chance nodes. It turns out that
it can.
- The analysis for MIN and MAX nodes is unchanged, but we can also prune
chance nodes, using a bit of ingenuity.
- Consider the chance node C in figure and what happens to its value as we
examine and evaluate its children.
- Is it possible to find an upper bound on the value of C before we have looked at all
its children?
- At first sight, it might seem impossible, because the value of C is the average
of its children’s values. Until we have looked at all the dice rolls, this average
could be anything, because the unexamined children might have any value at
all.
- But if we put bounds on the possible values of the utility function, then we can
arrive at bounds for the average. For example, if we say that all utility values
are between +3 and -3, then the value of leaf nodes is bounded, and in turn we
can place an upper bound on the value of the chance node without looking at
all its children.
Figure: Part of a guaranteed checkmate in the KRK endgame, shown on a reduced board. In the initial belief state, Black’s king is in one of three possible locations. By a combination of probing moves, the strategy narrows this down to one. Completion of the checkmate is left as an exercise.
Card Game.
Card games provide many examples of stochastic partial observability, where the
missing information is generated randomly. For example, in many games, cards are dealt
randomly at the beginning of the game, with each player receiving a hand that is not visible
to the other players. Such games include bridge, whist, hearts, and some forms of poker.
At first sight, it might seem that these card games are just like dice games: the
cards are dealt randomly and determine the moves available to each player, but all the
“dice” are rolled at the beginning!
Even though this analogy turns out to be incorrect, it suggests an effective
algorithm: consider all possible deals of the invisible cards; solve each one as if it were a
fully observable game; and then choose the move that has the best outcome averaged over
all the deals. Suppose that each deal s occurs with probability P(s); then the move we want is
argmax over moves a of ∑s P(s) · MINIMAX(RESULT(s, a)).
The value of an action in the current belief state can only be calculated given an optimal randomized strategy; in turn, computing that strategy seems to require knowing the probabilities of the various states the board might be in. An equilibrium specifies an optimal randomized strategy for each player.