
Unit – 5 Decision Making

Basics of utility theory:


• MEU: The principle of maximum expected utility (MEU) says that a rational
agent should choose the action that maximizes the agent’s expected utility.
• The principle of Maximum Expected Utility (MEU) seems like a reasonable way
to make decisions, but it is by no means obvious that it is the only rational way.
• We shall consider the following questions: After all, why should maximizing
the average utility be so special?
• What’s wrong with an agent that maximizes the weighted sum of the cubes of
the possible utilities, or tries to minimize the worst possible loss?
• Could an agent act rationally just by expressing preferences between states,
without giving them numeric values?
• Finally, why should a utility function with the required properties exist at all?
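
• Before turning to those questions, it helps to see what the MEU principle looks
like operationally. Below is a minimal Python sketch; the actions, probabilities,
and utility values are assumptions made up for illustration:

def expected_utility(action, outcomes):
    # Expected utility = sum over outcomes of probability * utility.
    return sum(p * u for p, u in outcomes[action])

# Hypothetical outcome model: action -> list of (probability, utility) pairs.
outcomes = {
    "pasta":   [(0.7, 8), (0.3, 2)],   # delicious vs. congealed
    "chicken": [(0.5, 9), (0.5, 1)],   # juicy vs. overcooked
}

# MEU: choose the action that maximizes expected utility.
best = max(outcomes, key=lambda a: expected_utility(a, outcomes))
print(best, expected_utility(best, outcomes))   # pasta 6.2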

Constraints on Rational Preferences:


• These questions can be answered by writing down some constraints on the
preferences that a rational agent should have and then showing that the MEU
principle can be derived from the constraints.
• We use the following notation to describe an agent’s preferences:
A ≻ B the agent prefers A over B
A ∼ B the agent is indifferent between A and B
A ≳ B the agent prefers A over B or is indifferent between them
• Now, what sorts of things are A and B?


• They could be states of the world, but more often than not there is uncertainty
about what is really being offered.

• For example, an airline passenger who is offered “the pasta dish or the chicken”
does not know what lurks beneath the tinfoil cover.
• The pasta could be delicious or congealed, the chicken juicy or overcooked
beyond recognition.
• We can think of the set of outcomes for each action as a lottery—think of each
action as a ticket.
• A lottery L with possible outcomes S1,...,Sn that occur with probabilities p1,...,pn
is written
L = [p1, S1; p2, S2; ... ; pn, Sn]
• Each outcome Si of a lottery can be either an atomic state or another lottery.
• The primary issue for utility theory is to understand how preferences between
complex lotteries are related to preferences between the underlying states in
those lotteries.
• To address this issue, we list six constraints that we require any reasonable
preference relation to obey:

1. Orderability: Given any two lotteries, a rational agent must either prefer one to the
other or else rate the two as equally preferable. That is, the agent cannot avoid deciding.

2. Transitivity: Given any three lotteries, if an agent prefers A to B and prefers B to C,
then the agent must prefer A to C.

3. Continuity: If some lottery B is between A and C in preference, then there is some
probability p for which the rational agent will be indifferent between getting B for sure
and the lottery that yields A with probability p and C with probability 1 − p.

4. Substitutability: If an agent is indifferent between two lotteries A and B, then the
agent is indifferent between two more complex lotteries that are the same except that B
is substituted for A in one of them. This holds regardless of the probabilities and the
other outcome(s) in the lotteries.
5. Monotonicity: Suppose two lotteries have the same two possible outcomes, A and
B. If an agent prefers A to B, then the agent must prefer the lottery that has a higher
probability for A (and vice versa).

6. Decomposability: Compound lotteries can be reduced to simpler ones using the laws
of probability. This has been called the “no fun in gambling” rule because it says that
two consecutive lotteries can be compressed into a single equivalent lottery, as shown
in Figure 5.1 (b)
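
• Decomposability can be illustrated directly in code. A minimal sketch, assuming a
lottery is represented as a list of (probability, outcome) pairs, where an outcome is
either an atomic state or another such list:

def flatten(lottery):
    # Reduce a compound lottery to a simple lottery over atomic outcomes
    # using the laws of probability ("no fun in gambling").
    probs = {}
    for p, outcome in lottery:
        if isinstance(outcome, list):              # outcome is itself a lottery
            for q, atom in flatten(outcome):
                probs[atom] = probs.get(atom, 0.0) + p * q
        else:                                      # outcome is an atomic state
            probs[outcome] = probs.get(outcome, 0.0) + p
    return [(p, atom) for atom, p in probs.items()]

# [0.5, A; 0.5, [0.6, B; 0.4, C]] compresses to [0.5, A; 0.3, B; 0.2, C]
print(flatten([(0.5, "A"), (0.5, [(0.6, "B"), (0.4, "C")])]))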

• These constraints are known as the axioms of utility theory.


• Each axiom can be motivated by showing that an agent that violates it will
exhibit patently irrational behaviour in some situations.
• For example, we can motivate transitivity by making an agent with non-transitive
preferences give us all its money.
• Suppose that the agent has the non-transitive preferences A > B > C > A, where
A, B, and C are goods that can be freely exchanged.
• If the agent currently has A, then we could offer to trade C for A plus one cent.
• The agent prefers C, and so would be willing to make this trade.
• We could then offer to trade B for C, extracting another cent, and finally trade A
for B.
• This brings us back to where we started, except that the agent has given us
three cents, as shown in Figure 5.1 (a).
• We can keep going around the cycle until the agent has no money at all. Clearly,
the agent has acted irrationally in this case.
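
• A tiny simulation of this money pump, under the assumed cyclic preferences,
shows the agent paying one cent per trade around the cycle:

def prefers(x, y):
    # The non-transitive preferences A > B, B > C, C > A (as in the example).
    return (x, y) in {("A", "B"), ("B", "C"), ("C", "A")}

holding, cents = "A", 3
for offered in ["C", "B", "A"]:               # one full cycle of trades
    if prefers(offered, holding):             # the agent prefers what is offered...
        holding, cents = offered, cents - 1   # ...so it pays a cent to trade
print(holding, cents)                         # back to "A", three cents poorer: A 0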
Figure 5.1 (a) A cycle of exchanges showing that the nontransitive preferences A
> B > C > A result in irrational behavior. (b) The decomposability axiom.

Preferences lead to Utility:

• Notice that the axioms of utility theory are really axioms about preferences—
they say nothing about a utility function.
• But in fact from the axioms of utility we can derive the following consequences
(for the proof, see von Neumann and Morgenstern, 1944):
1. Existence of Utility Function: If an agent’s preferences obey the axioms of
utility, then there exists a function U such that U(A) > U(B) if and only if A is
preferred to B, and U(A) = U(B) if and only if the agent is indifferent between A
and B.

2. Expected Utility of a Lottery: The utility of a lottery is the sum of the probability
of each outcome times the utility of that outcome:
U([p1, S1; ... ; pn, Sn]) = p1 U(S1) + ... + pn U(Sn)
• In other words, once the probabilities and utilities of the possible outcome states
are specified, the utility of a compound lottery involving those states is
completely determined.
• Because the outcome of a nondeterministic action is a lottery, it follows that an
agent can act rationally— that is, consistently with its preferences—only by
choosing an action that maximizes expected utility.
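
• Taken together, the two consequences let us evaluate any lottery recursively. A
minimal sketch, assuming the same (probability, outcome) representation as above
and an assumed utility table for the atomic states:

def utility(x, U):
    # Expected utility: U[x] for an atomic state, else the probability-weighted
    # sum of the utilities of the lottery's outcomes.
    if isinstance(x, list):                    # x is a lottery [(p, outcome), ...]
        return sum(p * utility(outcome, U) for p, outcome in x)
    return U[x]                                # x is an atomic state

U = {"A": 1.0, "B": 0.4, "C": 0.0}             # assumed utilities
L = [(0.5, "A"), (0.5, [(0.6, "B"), (0.4, "C")])]
print(utility(L, U))                           # 0.5*1.0 + 0.5*(0.6*0.4) = 0.62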

Decision Theory:
Types of Decision Making:
• Under Certainty: the decision maker has complete knowledge of the outcomes.
• Under Risk: the state of nature is unknown, but the probability of each state
occurring is known.
• Under Uncertainty: the decision maker has hardly any knowledge of the states of
nature and no probabilities of occurrence are available.
• Under Conflict: the states of nature are neither completely known nor
completely uncertain. E.g. branding and advertisement.
Definition: Decision theory deals with methods for determining the optimal course of
action when a number of alternatives are available and their consequences cannot be
forecast with certainty.
Components:
• Decision Maker: an individual or a group.
• Course of Action: the set of all possible actions.
• State of Nature: the possible future events, called states of nature, that might
occur.
• Payoff: what we get in return (profit or loss) from each combination of action
and state of nature.
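
• As a small illustration of decision making under risk, the sketch below combines
these components: an assumed payoff table (courses of action × states of nature),
assumed state probabilities, and a decision maker who picks the action with the
best expected payoff:

# Assumed payoff table: course of action -> payoff under each state of nature.
payoffs = {
    "expand": {"good_market": 100, "bad_market": -40},
    "hold":   {"good_market":  40, "bad_market":  10},
}
prob = {"good_market": 0.6, "bad_market": 0.4}   # assumed probabilities

def expected_payoff(action):
    # Weight each state's payoff by its probability of occurrence.
    return sum(prob[s] * payoffs[action][s] for s in prob)

best = max(payoffs, key=expected_payoff)
print(best, expected_payoff(best))               # expand 44.0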

Decision Tree:
• A decision tree is a map of the possible outcomes of a series of related choices.
• It allows an individual or organization to weigh possible actions against one
another based on their costs, probabilities, and benefits.
• They can be used either to drive informal discussion or to map out an
algorithm that predicts the best choice mathematically.
• Decision trees are commonly used in operations research, specifically in decision
analysis, to help identify a strategy most likely to reach a goal, but are also a
popular tool in machine learning.
• A decision tree is a flowchart-like structure in which each internal node
represents a "test" on an attribute, each branch represents the outcome of the test,
and each leaf node represents a class label (decision taken after computing all
attributes).
• The paths from root to leaf represent classification rules.
A decision tree consists of three types of nodes:
1. Decision nodes – typically represented by squares, show a decision to be made.
2. Chance nodes – typically represented by circles, show the probabilities of
certain results.
3. End nodes – typically represented by triangles, show the final outcome of a
decision path.
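
• A minimal sketch of how such a tree is evaluated by rolling back from the leaves:
chance nodes average their children by probability, and decision nodes take the best
child. The tree below is an assumed example, not from the text:

def evaluate(node):
    # Roll back: expectation at chance nodes, maximum at decision nodes.
    kind = node[0]
    if kind == "end":                          # ("end", payoff)
        return node[1]
    if kind == "chance":                       # ("chance", [(p, child), ...])
        return sum(p * evaluate(child) for p, child in node[1])
    return max(evaluate(child) for child in node[1])   # ("decision", [...])

tree = ("decision", [
    ("chance", [(0.5, ("end", 120)), (0.5, ("end", -30))]),   # risky option
    ("end", 30),                                              # safe option
])
print(evaluate(tree))   # max(0.5*120 + 0.5*(-30), 30) = 45.0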

Advantages
• Easy to understand.
• They can be useful with or without hard data, and any data they use requires
minimal preparation.
• New options can be added to existing trees.
• They pick out the best of several options.
• They combine easily with other decision-making tools.
Disadvantages
• No convergence path.
• A very large tree can become excessively complex.
Sequential Decision Problem:

• Here, we address the computational issues involved in making decisions in a
stochastic environment.
• We are concerned with sequential decision problems, in which the agent’s
utility depends on a sequence of decisions.
• Sequential decision problems incorporate utilities, uncertainty, and sensing,
and include search and planning problems as special cases.
• Definition: In a sequential decision problem, the agent’s utility does not depend
on a single decision (that is, on the state the agent would reach as the result of
that one decision), but rather on the whole sequence of the agent’s actions.
• Let’s understand with an example.
• Suppose that an agent is situated in the 4 × 3 environment shown in Figure
5.2(a).
• Beginning in the start state, it must choose an action at each time step.
• The interaction with the environment terminates when the agent reaches one of
the goal states, marked +1 or –1.
• Just as for search problems, the actions available to the agent in each state are
given by ACTIONS(s), sometimes abbreviated to A(s); in the 4 × 3
environment, the actions in every state are Up, Down, Left, and Right.
• We assume for now that the environment is fully observable, so that the agent
always knows where it is.
Example:
• An agent starts in the square marked Start and can move between adjacent
squares in any direction. Its run ends when it reaches one of the squares (4,2) or
(4,3), with the result marked in those squares.
Figure 5.2 (a) A simple 4 ×3 environment that presents the agent with a sequential
decision problem. (b) Illustration of the transition model of the environment: the
“intended” outcome occurs with probability 0.8, but with probability 0.2 the agent
moves at right angles to the intended direction. A collision with a wall results in no
movement. The two terminal states have reward +1 and –1, respectively, and all other
states have a reward of –0.04.
• If the environment were deterministic, a solution would be easy: [Up, Up, Right,
Right, Right].
• Unfortunately, the environment won’t always go along with this solution,
because the actions are unreliable.
• The particular model of stochastic motion that we adopt is illustrated in Figure
5.2(b).
• Each action achieves the intended effect with probability 0.8, but the rest of the
time, the action moves the agent at right angles to the intended direction.
Furthermore, if the agent bumps into a wall, it stays in the same square.
• In such an environment, the sequence [Up, Up, Right, Right, Right] goes up
around the barrier and reaches the goal state at (4,3) with probability 0.8^5 =
0.32768.
• There is also a small chance of accidentally reaching the goal by going the other
way around, with probability 0.1^4 × 0.8 = 0.00008, for a grand total of 0.32776
(0.32768 + 0.00008).
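
• A small Monte Carlo check of this calculation, simulating the transition model of
Figure 5.2(b) for the fixed action sequence. The grid encoding is an assumption:
(2,2) is the wall, and (4,3) and (4,2) are the terminal states:

import random

WALL, TERMINALS = {(2, 2)}, {(4, 3), (4, 2)}
MOVES = {"Up": (0, 1), "Down": (0, -1), "Left": (-1, 0), "Right": (1, 0)}
SLIPS = {"Up": ("Left", "Right"), "Down": ("Left", "Right"),
         "Left": ("Up", "Down"), "Right": ("Up", "Down")}

def step(state, action, rng):
    # Intended move with probability 0.8, else a right-angle slip (0.1 each);
    # bumping into the wall or the grid edge results in no movement.
    actual = rng.choices([action, *SLIPS[action]], weights=[0.8, 0.1, 0.1])[0]
    x, y = state[0] + MOVES[actual][0], state[1] + MOVES[actual][1]
    return state if (x, y) in WALL or not (1 <= x <= 4 and 1 <= y <= 3) else (x, y)

rng, wins, trials = random.Random(0), 0, 100_000
for _ in range(trials):
    s = (1, 1)
    for a in ["Up", "Up", "Right", "Right", "Right"]:
        s = step(s, a, rng)
        if s in TERMINALS:
            break
    wins += (s == (4, 3))
print(wins / trials)   # close to the exact value 0.32776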
• The transition model (or just “model,” whenever no confusion can arise)
describes the outcome of each action in each state.
• Here, the outcome is stochastic, so we write P(s′ | s, a) to denote the probability
of reaching state s′ if action a is done in state s.
• We will assume that transitions are Markovian, that is, the probability of
reaching s′ from s depends only on s and not on the history of earlier states.
• For now, you can think of P(s′ | s, a) as a big three-dimensional table containing
probabilities.
• To complete the definition of the task environment, we must specify the utility
function for the agent. Because the decision problem is sequential, the utility
function will depend on a sequence of states—an environment history—rather
than on a single state.
• Utility functions can be specified in general; for now, we simply stipulate that in
each state s, the agent receives a reward R(s), which may be positive or negative,
but must be bounded.
• For our particular example, the reward is −0.04 in all states except the terminal
states (which have rewards +1 and –1).
• The utility of an environment history is just (for now) the sum of the rewards
received. For example, if the agent reaches the +1 state after 10 steps, its total
utility will be 1 + 10 × (−0.04) = 0.6. The negative reward of −0.04 gives the agent
an incentive to reach (4,3).
• To sum up: a sequential decision problem for a fully observable, stochastic
environment with a Markovian transition model and additive rewards is called a
Markov decision process (MDP) and consists of a set of states (with an initial
state s0); a set ACTIONS(s) of actions in each state; a transition model P(s′ | s,
a); and a reward function R(s).
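
• These components can be captured directly as a data structure. A minimal sketch
for the 4 × 3 world; the encoding is an assumption, and the transition model is
left as a stub:

from dataclasses import dataclass
from typing import Callable, List, Tuple

State = Tuple[int, int]
Action = str

@dataclass
class MDP:
    # A fully observable, stochastic environment with a Markovian transition
    # model and additive rewards.
    s0: State                                    # initial state
    actions: Callable[[State], List[Action]]     # ACTIONS(s)
    P: object                                    # transition model P(s' | s, a)
    R: Callable[[State], float]                  # bounded reward R(s)

grid = MDP(
    s0=(1, 1),
    actions=lambda s: [] if s in ((4, 3), (4, 2)) else ["Up", "Down", "Left", "Right"],
    P=None,   # the stochastic motion model of Figure 5.2(b) would go here
    R=lambda s: {(4, 3): 1.0, (4, 2): -1.0}.get(s, -0.04),
)
print(grid.R((1, 1)), grid.actions((1, 1)))   # -0.04 ['Up', 'Down', 'Left', 'Right']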
Game Theory:
• Game Theory is a branch of mathematics used to model the strategic interaction
between different players in a context with predefined rules and outcomes.
• Game Theory can be applied in different areas of Artificial Intelligence:
• Multi-agent AI systems.
• Imitation and Reinforcement Learning.
• Adversary training in Generative Adversarial Networks (GANs).
• Game Theory can also be used to describe many situations in our daily life and
in Machine Learning models, as shown in the figure below.

Fig: Game Theory Applications


Game Theory can be divided into 5 main types of games:
• Cooperative vs Non-Cooperative Games: In cooperative games, participants can
establish alliances in order to maximise their chances to win the game (eg.
negotiations). In non-cooperative games, participants can’t form alliances (eg.
wars).
• Symmetric vs Asymmetric Games: In a symmetric game, all the participants have
the same goals, and only the strategies they implement to achieve them
determine who wins the game (eg. chess). In asymmetric games, instead, the
participants have different or conflicting goals.
• Perfect vs Imperfect Information Games: In Perfect Information games all the
players can see the other players moves (eg. chess). Instead, in Imperfect
Information games, the other players' moves are hidden (eg. card games).
• Simultaneous vs Sequential Games: In Simultaneous games, the different players
can take actions concurrently. Instead in Sequential games, each player is aware
of the other players' previous actions (eg. board games).
• Zero-Sum vs Non-Zero-Sum Games: In zero-sum games, a player’s gain causes
an equal loss to the other players. In non-zero-sum games, instead, multiple
players can benefit from the gains of another player.

Decision with Multiple Agents: Game Theory:

Game theory can be used in at least two ways:
1. Agent Design:
• Game theory can analyze the agent’s decisions and compute the expected
utility for each decision.
• For example, in the game two-finger Morra, two players, O and E,
simultaneously display one or two fingers.
• Let the total number of fingers be f. If f is odd, O collects f dollars from
E; and if f is even, E collects f dollars from O.
• Game theory can determine the best strategy against a rational player and
the expected return for each player.
2. Mechanism Design:
● When an environment is inhabited by many agents, it might be possible to define
the rules of the environment (i.e., the game that the agents must play) so that the
collective good of all agents is maximized when each agent adopts the game-
theoretic solution that maximizes its own utility.
● For example, game theory can help design the protocols for a collection of
Internet traffic routers so that each router has an incentive to act in such a way
that global throughput is maximized.
● Mechanism design can also be used to construct intelligent multiagent systems
that solve complex problems in a distributed fashion.

The single-move game:


● Here we discuss a restricted set of games where all players take actions
simultaneously and the result of the game is based on this single set of actions.
● The restriction to a single move might make this seem trivial, but in fact, game
theory is serious business.
● It is used in decision-making situations including the auctioning of oil drilling
rights and wireless frequency spectrum rights, bankruptcy proceedings, product
development and pricing decisions, and national defense—situations involving
billions of dollars and hundreds of thousands of lives.
● The single-move game is defined by three components:
● Players or agents who will be making decisions. Two-player games have
received the most attention, although n-player games for n > 2 are also
common. We give players capitalized names, like Alice and Bob or O
and E.
● Actions that the players can choose. We will give actions lowercase
names, like one or testify. The players may or may not have the same set
of actions available.
● Payoff Function that gives the utility to each player for each
combination of actions by all the players. For single-move games the
payoff function can be represented by a matrix, a representation known
as the strategic form (also called normal form).
• The payoff matrix for two-finger Morra (strategic form; each cell shows the
payoffs to E and O) is as follows:

             O: one           O: two
E: one   E = +2, O = −2   E = −3, O = +3
E: two   E = −3, O = +3   E = +4, O = −4

• For example, the lower-right corner shows that when player O chooses action
two and E also chooses two, the payoff is +4 for E and −4 for O.
• Each player in a game must adopt and then execute a strategy (which is the
name used in game theory for a policy).
• A pure strategy is a deterministic policy; for a single-move game, a pure
strategy is just a single action.
• For many games an agent can do better with a mixed strategy, which is a
randomized policy that selects actions according to a probability distribution.
• The mixed strategy that chooses action a with probability p and action b
otherwise is written [p: a; (1 − p): b].
• For example, a mixed strategy for two-finger Morra might be [0.5: one;
0.5:two].
• A strategy profile is an assignment of a strategy to each player; given the
strategy profile, the game’s outcome is a numeric value for each player.
• A solution to a game is a strategy profile in which each player adopts a rational
strategy.
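
• As a small illustration, the sketch below computes E’s expected payoff in
two-finger Morra for a strategy profile of two mixed strategies (the game is
zero-sum, so O’s payoff is the negative):

# Payoff to E for each (E_action, O_action) pair, from the Morra payoff matrix.
payoff_E = {("one", "one"): +2, ("one", "two"): -3,
            ("two", "one"): -3, ("two", "two"): +4}

def expected_payoff_E(strategy_E, strategy_O):
    # Mixed strategies are action -> probability maps; the expected payoff is
    # the sum over joint actions of their probability times the payoff.
    return sum(pe * po * payoff_E[(ae, ao)]
               for ae, pe in strategy_E.items()
               for ao, po in strategy_O.items())

# The mixed strategy [0.5: one; 0.5: two] for both players:
uniform = {"one": 0.5, "two": 0.5}
print(expected_payoff_E(uniform, uniform))   # 0.25 * (2 - 3 - 3 + 4) = 0.0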
