FAI Unit 4 Notes
UNIT IV-DECISION-MAKING
Utility theory forms the mathematical foundation for decision-making under uncertainty. It enables an
agent to quantify preferences, evaluate outcomes, and make rational decisions based on the principle
of maximizing expected utility. The framework is widely used in artificial intelligence, economics,
game theory, and decision sciences.
Utility theory provides a rigorous framework for modeling rational decision-making under
uncertainty. By satisfying fundamental axioms and leveraging utility functions, agents can evaluate
actions systematically and optimize their decisions. The Von Neumann-Morgenstern theorem ensures
that rational preferences can be consistently represented through a utility function, making utility
theory indispensable in AI and decision sciences.
Utility theory is grounded in an agent's preferences, which must satisfy specific rationality axioms.
These axioms ensure consistency in decision-making:
1. Completeness: For any two outcomes A and B, an agent must prefer one over the other or
consider them equally preferable:
A≻B, B≻A, or A∼B.
2. Transitivity: If the agent prefers A to B and B to C, it must also prefer A to C:
A≻B and B≻C ⟹ A≻C.
3. Continuity: If A≻B≻C, there exists a probability p such that the agent is indifferent between
B and a lottery that offers A with probability p and C with probability (1−p):
B ∼ pA+(1−p)C.
4. Independence: If A≻B, then for any outcome C and probability p, the preference is preserved
when both are mixed with C:
pA+(1−p)C ≻ pB+(1−p)C.
A utility function U(x) maps outcomes to real numbers, where higher values represent more desirable
outcomes:
U: Outcomes → R.
A utility function is unique only up to a positive affine transformation: any function
U′(x) = aU(x) + b, with a > 0,
represents the same preferences.
Example: If an agent prefers A over B and B over C, then the utility values might be assigned as
U(A)=100, U(B)=50 and U(C)=0.
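A short Python sketch (the numbers are the example values above; the transformation constants are arbitrary) showing that a positive affine transformation leaves the ranking unchanged:

U = {"A": 100, "B": 50, "C": 0}                        # utilities from the example
U_prime = {x: 2.0 * u + 10.0 for x, u in U.items()}    # U'(x) = aU(x) + b with a = 2, b = 10

# Both functions rank the outcomes the same way: A > B > C
print(sorted(U, key=U.get, reverse=True))              # ['A', 'B', 'C']
print(sorted(U_prime, key=U_prime.get, reverse=True))  # ['A', 'B', 'C']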
Expected utility theory is used when outcomes are uncertain. The expected utility of an action A is
computed as the weighted sum of utilities of all possible outcomes:
EU(A) = ∑i P(Oi∣A) ⋅ U(Oi),
where P(Oi∣A) is the probability of outcome Oi given action A and U(Oi) is the utility of that outcome.
A rational agent selects the action A* that maximizes the expected utility:
A* = argmax_A EU(A).
Consider an agent deciding whether to buy insurance to mitigate financial losses from an accident.
Outcomes:
o O1: No accident occurs, U(O1)=100.
o O2: Accident occurs, U(O2)=−500.
Probabilities:
o P(O1)=0.9,
o P(O2)=0.1.
Expected Utility of Buying Insurance: If insurance reduces the loss to −100, then:
EU(Buy Insurance) = (0.9 ⋅ 100) + (0.1 ⋅ −100) = 90 − 10 = 80.
Expected Utility of Not Buying Insurance:
EU(No Insurance) = (0.9 ⋅ 100) + (0.1 ⋅ −500) = 90 − 50 = 40.
Since EU(Buy Insurance) > EU(No Insurance), the rational agent buys the insurance.
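The same calculation in a short Python sketch (using the probabilities and utilities given above):

P = {"no_accident": 0.9, "accident": 0.1}
U = {"Buy Insurance": {"no_accident": 100, "accident": -100},
     "No Insurance":  {"no_accident": 100, "accident": -500}}

def expected_utility(action):
    # EU(A) = sum over outcomes of P(outcome) * U(outcome under A)
    return sum(p * U[action][o] for o, p in P.items())

for a in U:
    print(a, expected_utility(a))        # Buy Insurance 80.0, No Insurance 40.0
print(max(U, key=expected_utility))      # the MEU choice: Buy Insurance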
The Von Neumann-Morgenstern theorem formalizes the connection between rational preferences and
utility functions. It states that if preferences satisfy the axioms of completeness, transitivity,
continuity, and independence, then there exists a utility function U such that:
A≻B ⟺ U(A)>U(B).
Implications: an agent with rational preferences behaves as if it were maximizing the expected value
of some utility function, so maximizing expected utility is equivalent to acting rationally.
Agents can exhibit different attitudes toward risk, represented by the shape of their utility functions:
1. Risk-Neutral:
o Evaluates outcomes based solely on expected value.
o Linear utility function: U(x)=x.
2. Risk-Averse:
o Prefers certain outcomes over risky ones with the same expected value.
o Concave utility function, e.g., U(x)=√x or U(x)=log(x).
3. Risk-Seeking:
o Prefers risky outcomes over certain ones with the same expected value.
o Convex utility function, e.g., U(x)=x².
For many goods, such as wealth, the utility increases at a decreasing rate, capturing the idea of
diminishing returns:
U(x)=log(x).
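A small Python sketch (using a hypothetical 50/50 gamble over wealth levels 1 and 100) showing how the curvature of U determines the attitude toward risk:

import math

p_win, low, high = 0.5, 1.0, 100.0          # a 50/50 gamble, values chosen for illustration
expected_wealth = p_win * high + (1 - p_win) * low

def compare(U, name):
    # Risk-averse: U(E[x]) > E[U(x)]; risk-neutral: equal; risk-seeking: reversed
    eu_gamble = p_win * U(high) + (1 - p_win) * U(low)
    print(name, "U(E[x]) =", round(U(expected_wealth), 2), " E[U(x)] =", round(eu_gamble, 2))

compare(lambda x: x, "risk-neutral (linear):")
compare(lambda x: math.log(x), "risk-averse (concave):")
compare(lambda x: x ** 2, "risk-seeking (convex):")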
4.1.5. Applications of Utility Theory
1. Artificial Intelligence:
o Decision-making in autonomous agents.
o Game-playing and strategy optimization.
2. Economics:
o Consumer choice modeling.
o Cost-benefit analysis.
3. Healthcare:
o Treatment planning under uncertainty.
4. Robotics:
o Motion and path planning under uncertainty.
Preferences, as expressed by utilities, are combined with probabilities in the general theory of
rational decisions called decision theory:
Decision theory = probability theory + utility theory.
The fundamental idea of decision theory is that an agent is rational if and only if it chooses the
action that yields the highest expected utility, averaged over all the possible outcomes of the
action. This is called the principle of maximum expected utility (MEU). Here, “expected” means
the “average,” or “statistical mean,” of the outcome utilities, weighted by the probability of each
outcome.
Decision theory, in its simplest form, deals with choosing among actions based on the
desirability of their immediate outcomes; that is, the environment is assumed to be episodic. The
agent’s preferences are captured by a utility function, U(s), which assigns a single number to
express the desirability of a state. The expected utility of an action given the evidence, EU(a), is
the average utility value of the outcomes, weighted by the probability that each outcome occurs:
EU(a) = ∑s′ P(RESULT(a)=s′) ⋅ U(s′).
The principle of maximum expected utility (MEU) says that a rational agent should choose the
action that maximizes the agent’s expected utility:
action = argmax_a EU(a).
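A minimal Python sketch of MEU action selection (outcome_probs and utility are placeholders to be supplied by the caller; this is an illustration, not a full agent):

def expected_utility(action, outcome_probs, utility):
    # outcome_probs(action) returns {s_prime: P(RESULT(action) = s_prime)}
    return sum(p * utility(s) for s, p in outcome_probs(action).items())

def meu_action(actions, outcome_probs, utility):
    # The MEU principle: pick the action with the highest expected utility
    return max(actions, key=lambda a: expected_utility(a, outcome_probs, utility))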
In a sense, the MEU principle could be seen as a prescription for intelligent behavior. All an
intelligent agent has to do is calculate the various quantities, maximize utility over its actions,
and away it goes. But this does not mean that the AI problem is solved by the definition. The
MEU principle formalizes the general notion that an intelligent agent should “do the right thing,”
but does not operationalize that advice. Estimating the probability distribution P(s) over possible
states of the world, which folds into P(RESULT(a)=s′), requires perception, learning, knowledge
representation, and inference. Computing P(RESULT(a)=s′) itself requires a causal model of
the world. There may be many actions to consider, and computing the outcome utilities U(s′)
may itself require further searching or planning because an agent may not know how good a state
is until it knows where it can get to from that state. An AI system acting on behalf of a human
may not know the human’s true utility function, so there may be uncertainty about U. In
summary, decision theory is not a panacea that solves the AI problem, but it does provide the
beginnings of a basic mathematical framework that is general enough to define the AI problem.
Decision Theory combines probability theory (reasoning about uncertainty) with utility theory
(reasoning about preferences) to determine the best action an agent can take under uncertain
conditions.
Goal: Help rational agents choose actions that maximize their expected utility, even when
outcomes are uncertain.
Widely applied in:
o Artificial intelligence (AI),
o Economics,
o Engineering,
o Healthcare, and more.
Actions
An agent's choices are represented as actions. Each action may result in multiple possible outcomes
depending on the uncertainty in the environment.
Outcomes
An outcome is the result of performing an action. It depends on both the agent's action and the state
of the environment.
Probabilities
Probabilities quantify the likelihood of an outcome. The probability of outcome Oi given action
A is represented as:
P(Oi∣A).
Utilities
Utilities measure how desirable an outcome is to the agent. They allow outcomes to be ranked
numerically based on preferences:
U: Outcomes → R.
4.2.3. Expected Utility
Expected utility is the central concept in decision theory, providing a quantitative basis for choosing
between actions.
For an action A with possible outcomes O1, O2, …, On, the expected utility is:
EU(A) = ∑i P(Oi∣A) ⋅ U(Oi),
where P(Oi∣A) is the probability of outcome Oi given action A and U(Oi) is the utility of that outcome.
Decision Rule
A rational agent chooses the action A* with the highest expected utility: A* = argmax_A EU(A).
Graphical Components
1. Chance Nodes: Represent random variables and uncertainties (e.g., weather conditions).
2. Decision Nodes: Represent the agent’s choices.
3. Utility Nodes: Represent the agent’s utility function.
Example structure:
[Chance Node: Weather] ---> [Decision Node: Carry Umbrella?] ---> [Utility Node: Comfort]
Here, the weather (a chance node) is uncertain, the agent decides whether to carry an umbrella (a
decision node), and the resulting comfort is evaluated at the utility node.
An agent needs to decide whether to carry an umbrella based on the probability of rain.
Given Data:
Probabilities:
o P(Rain)=0.3,
o P(No Rain)=0.7.
Utilities:
o U(Carry Umbrella, Rain)=10,
o U(Carry Umbrella, No Rain)=5,
o U(No Umbrella, Rain)=−20,
o U(No Umbrella, No Rain)=15.
EU(Carry Umbrella) = (0.3 ⋅ 10) + (0.7 ⋅ 5) = 3 + 3.5 = 6.5.
EU(No Umbrella) = (0.3 ⋅ −20) + (0.7 ⋅ 15) = −6 + 10.5 = 4.5.
Since EU(Carry Umbrella) > EU(No Umbrella), the agent should carry the umbrella.
Risk-Neutral Agent
Evaluates outcomes based solely on expected value.
Linear utility function: U(x)=x.
Risk-Averse Agent
Prefers certain outcomes to risky ones with the same expected value.
Concave utility function, e.g., U(x)=√x or U(x)=log(x).
Risk-Seeking Agent
Prefers risky outcomes to certain ones with the same expected value.
Convex utility function: U(x)=x2.
A graph can represent these risk preferences, showing linear, concave, and convex utility
functions.
Applications of Decision Theory
1. Artificial Intelligence:
o Rational agents in uncertain environments.
o Game-playing algorithms.
2. Economics:
o Cost-benefit analysis.
o Consumer behavior modeling.
3. Robotics:
o Path planning with uncertain obstacles.
4. Healthcare:
o Treatment planning under probabilistic outcomes.
Challenges of Decision Theory
1. Complexity:
o Real-world decision problems can involve large state spaces and many variables.
2. Uncertainty in Probabilities:
o Estimating probabilities accurately is often challenging.
3. Subjective Utilities:
o Assigning utility values may involve bias or inconsistency.
Problem Setup
The agent navigates a 4×3 grid, starting from (1,1) and aiming for goal states (+1 or -1).
The environment is fully observable, meaning the agent knows its location at all times.
The agent has four actions: Up, Down, Left, Right.
Stochastic Environment
The probability of reaching a state s' from state s using action a is given by P(s' | s, a)
(Markovian property).
The agent receives a reward (R(s, a, s’)) after each transition:
o +1 for reaching the goal at (4,3).
o -1 for reaching the negative goal.
o -0.04 for all other transitions, encouraging the agent to reach the goal quickly.
A policy π specifies the action π(s) to take in each state.
An optimal policy π* maximizes the expected utility over time.
Since outcomes are stochastic, different executions of the same policy can lead to different
sequences of states.
Figure: (a) A simple, stochastic 4×3 environment that presents the agent with a sequential decision
problem. (b) Illustration of the transition model of the environment: the “intended” outcome occurs
with probability 0.8, but with probability 0.2 the agent moves at right angles to the intended direction.
A collision with a wall results in no movement. Transitions into the two terminal states have reward
+1 and –1, respectively, and all other transitions have a reward of –0.04.
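A minimal Python sketch of this transition model (the wall square is assumed at (2,2) as in the textbook's 4×3 world; the helper names are ours):

# Intended direction succeeds with probability 0.8; each perpendicular direction occurs with 0.1
MOVES = {"Up": (0, 1), "Down": (0, -1), "Left": (-1, 0), "Right": (1, 0)}
PERPENDICULAR = {"Up": ("Left", "Right"), "Down": ("Left", "Right"),
                 "Left": ("Up", "Down"), "Right": ("Up", "Down")}
WALLS = {(2, 2)}   # blocked square (assumed position)

def step(state, direction):
    # A collision with a wall or the grid boundary results in no movement
    x, y = state
    dx, dy = MOVES[direction]
    nxt = (x + dx, y + dy)
    return state if nxt in WALLS or not (1 <= nxt[0] <= 4 and 1 <= nxt[1] <= 3) else nxt

def transition_model(state, action):
    # Returns {s_prime: P(s_prime | state, action)} for the stochastic 4x3 environment
    probs = {}
    for direction, p in [(action, 0.8)] + [(d, 0.1) for d in PERPENDICULAR[action]]:
        s2 = step(state, direction)
        probs[s2] = probs.get(s2, 0.0) + p
    return probs

print(transition_model((1, 1), "Up"))   # {(1, 2): 0.8, (1, 1): 0.1, (2, 1): 0.1}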
In MDPs, agent performance is measured by the sum of rewards obtained through state transitions.
The utility function Uh defines how rewards are aggregated over time. With additive discounted rewards,
Uh([s0, a0, s1, a1, s2, …]) = R(s0, a0, s1) + γ R(s1, a1, s2) + γ² R(s2, a2, s3) + …,
where 0 < γ ≤ 1 is the discount factor. Discounted rewards are preferred because they ensure:
o Stationary Preference: The ranking of future sequences remains the same if they are shifted
in time.
o Mathematical Simplicity: Ensures finite utility values and prevents infinite rewards
from improper policies.
Alternative Reward Models:
o Proper policies: Ensure reaching a terminal state, allowing undiscounted rewards.
o Average reward per step: Useful but complex to analyze.
Additive discounted rewards are preferred for MDP analysis due to their mathematical advantages
and real-world applicability.
Figure: (a) The optimal policies for the stochastic environment with r= − 0.04 for transitions between
nonterminal states. There are two policies because in state (3,1) both Left and Up are optimal. (b)
Optimal policies for four different ranges of r.
2. Bellman Equation: the utility of a state is the expected immediate reward plus the discounted
utility of the best successor:
U(s) = max_a ∑s′ P(s′∣s,a) [ R(s,a,s′) + γ U(s′) ].
3. Q-Function Q(s,a):
o Represents the expected utility of taking action a in state s.
o Defined as:
Q(s,a) = ∑s′ P(s′∣s,a) [ R(s,a,s′) + γ max_a′ Q(s′,a′) ].
Applications:
These concepts are used in Reinforcement Learning (RL) to compute optimal policies in
decision-making problems.
The Bellman equation is the foundation of Value Iteration and Policy Iteration.
The Q-function is central to Q-learning, a model-free RL algorithm.
Figure: The utilities of the states in the 4×3 world with γ =1 and r= −0.04 for transitions to
nonterminal states.
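These utilities can be reproduced with value iteration; a compact sketch that reuses the transition_model helper from the earlier 4×3-world sketch (γ = 1, r = −0.04, and the −1 terminal is assumed to be at (4,2)):

GAMMA, R_STEP = 1.0, -0.04
TERMINALS = {(4, 3): +1.0, (4, 2): -1.0}
STATES = [(x, y) for x in range(1, 5) for y in range(1, 4) if (x, y) not in WALLS]

U = {s: 0.0 for s in STATES}
for _ in range(100):                       # apply the Bellman update repeatedly until it stabilizes
    U = {s: TERMINALS[s] if s in TERMINALS else
            max(sum(p * (R_STEP + GAMMA * U[s2])
                    for s2, p in transition_model(s, a).items())
                for a in MOVES)
         for s in STATES}
print(round(U[(1, 1)], 3))                 # ≈ 0.705, matching the figure above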
Scaling and shifting the utility function U(s) or the reward function R(s,a,s′) by constants m
and b does not change the optimal policy:
U′(s) = m U(s) + b, with m > 0.
This follows from the definition of utility as the sum of discounted rewards.
Shaping Theorem: the reward function can be modified with any potential function Φ over states,
R′(s,a,s′) = R(s,a,s′) + γΦ(s′) − Φ(s).
This transformation preserves the optimal policy but can speed up learning by making
rewards more informative.
The term γΦ(s′)−Φ(s) acts like a potential gradient, guiding the agent uphill towards higher-
utility states.
If we set Φ(s)=U(s), the modified reward function makes greedy action selection optimal.
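A tiny Python sketch of this shaping transformation (R, Phi, and gamma are placeholders supplied by the caller):

def shaped_reward(R, Phi, gamma):
    # Builds R'(s, a, s') = R(s, a, s') + gamma * Phi(s') - Phi(s)
    def R_prime(s, a, s_next):
        return R(s, a, s_next) + gamma * Phi(s_next) - Phi(s)
    return R_prime
# Choosing Phi(s) = U(s) makes greedy action selection under R' optimal, as noted above.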
Practical Implications:
For large-scale MDPs, traditional table-based representations of rewards and transitions are
computationally expensive. Using Dynamic Decision Networks (DDNs) provides a more efficient
way to model complex systems by factoring the problem into smaller parts. This approach is useful in
domains like robotics and games such as Tetris, where state and action spaces are large, but the
network structure enables more efficient computations and policy evaluation.
Figure: A dynamic decision network for a mobile robot with state variables for battery level,
charging status, location, and velocity, and action variables for the left and right wheel motors
and for charging.
4. Tetris Example:
o The state variables for Tetris include CurrentPiece, NextPiece, and a bit-vector
Filled representing the board’s state.
o The DDN for Tetris models the game over time and shows how the board’s state
evolves based on the action taken (i.e., placing pieces).
o Tetris is a well-studied MDP where every policy eventually leads to a terminal state
(the board filling up).
Figure: (a) The game of Tetris. The T-shaped piece at the top center can be dropped in any
orientation and in any horizontal position. If a row is completed, that row disappears and the
rows above it move down, and the agent receives one point. The next piece (here, the L-
shaped piece at top right) becomes the current piece, and a new next piece appears, chosen at
random from the seven piece types. The game ends if the board fills up to the top. (b) The
DDN for the Tetris MDP.
Game theory studies interactions between multiple decision-makers (agents), each of whom may have
competing or cooperative goals. It provides a mathematical framework to model strategic interactions,
and it plays a crucial role in the study of multi-agent systems.
Game: A game is defined by a set of players, actions (strategies), and payoffs (rewards or
costs).
Players: The participants in the game who make decisions based on the game's rules.
Actions: The choices available to each player.
Payoffs: The outcomes for each player resulting from a particular combination of strategies
chosen by all players.
Strategies: A strategy for a player is a complete plan of actions for every possible situation in
the game.
Outcome: The result of a particular combination of strategies.
Games can be classified as:
o Cooperative vs. Non-cooperative: In cooperative games, players can form binding
agreements, whereas in non-cooperative games, players cannot make enforceable
agreements.
o Zero-sum vs. Non-zero-sum: A zero-sum game is one where the total payoff is
constant, so one player’s gain is the other’s loss. Non-zero-sum games involve mutual
gains or losses.
o Simultaneous vs. Sequential: In simultaneous games, players choose their actions at
the same time, whereas in sequential games, players take turns making decisions.
Rationality: Players are assumed to act rationally, meaning they choose strategies that
maximize their expected utility.
Nash Equilibrium (NE): A set of strategies where no player can improve their payoff by
unilaterally changing their strategy, assuming the other players’ strategies remain unchanged.
o Pure Strategy Nash Equilibrium: Each player chooses a single strategy.
o Mixed Strategy Nash Equilibrium: Players randomize over strategies, and the
equilibrium occurs when each player is indifferent to their choices.
The Prisoner’s Dilemma is a well-known example of game theory where the dominant
strategy (defecting) leads to a worse outcome for both players than if they had cooperated.
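A small Python sketch of the Prisoner's Dilemma with a typical payoff matrix (the numbers are illustrative) and a brute-force check for pure-strategy Nash equilibria:

# Payoffs (row player, column player) for each pair of actions
ACTIONS = ["Cooperate", "Defect"]
PAYOFFS = {
    ("Cooperate", "Cooperate"): (-1, -1),
    ("Cooperate", "Defect"):    (-10, 0),
    ("Defect",    "Cooperate"): (0, -10),
    ("Defect",    "Defect"):    (-5, -5),
}

def is_nash(a1, a2):
    # Neither player can improve by unilaterally deviating
    best1 = all(PAYOFFS[(a1, a2)][0] >= PAYOFFS[(alt, a2)][0] for alt in ACTIONS)
    best2 = all(PAYOFFS[(a1, a2)][1] >= PAYOFFS[(a1, alt)][1] for alt in ACTIONS)
    return best1 and best2

print([(a1, a2) for a1 in ACTIONS for a2 in ACTIONS if is_nash(a1, a2)])
# [('Defect', 'Defect')] -- both defect, even though mutual cooperation pays better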
Extensive Form: A more detailed representation of a game, showing the sequence of moves,
players’ choices, and possible outcomes. It is typically represented using a game tree.
o Game Trees: Nodes represent decision points, and branches represent possible actions.
o Backward Induction: A method of solving extensive form games by starting from the
end of the game (terminal nodes) and working backward to determine the optimal
strategy for each player at each decision point (see the sketch after this list).
Subgame Perfect Equilibrium: An extension of Nash equilibrium for extensive-form games.
A strategy profile is a subgame perfect equilibrium if it represents a Nash equilibrium in every
subgame of the original game.
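The backward-induction idea referenced above, as a minimal Python sketch (the two-player game tree and payoffs are hypothetical):

# A leaf is a payoff tuple (payoff to player 0, payoff to player 1);
# an internal node is (player_to_move, {action: child_node}).
GAME = (0, {
    "L": (1, {"l": (3, 1), "r": (0, 0)}),
    "R": (1, {"l": (1, 2), "r": (2, 4)}),
})

def backward_induction(node):
    # Solve from the terminal nodes upward: each player picks the subgame value
    # that maximizes their own payoff.
    if isinstance(node[1], dict):
        player, children = node
        return max((backward_induction(child) for child in children.values()),
                   key=lambda payoff: payoff[player])
    return node

print(backward_induction(GAME))   # (3, 1): player 0 chooses L, then player 1 chooses l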
4. Mixed Strategies
Mixed Strategy: A strategy where a player chooses between actions according to some
probability distribution. This is used when there is no dominant pure strategy.
o Expected Utility: Players calculate the expected value of the payoff from each action
and choose the one that maximizes their expected utility.
Repeated Games: A game that is played multiple times by the same players. Strategies in
repeated games can involve cooperation or retaliation.
o Grim Trigger Strategy: A strategy where a player cooperates until the other player
defects, and then they defect forever after.
o Discounting: In repeated games, players may discount future payoffs, which affects
the strategy choices and outcomes.
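A short Python sketch of the grim trigger strategy in a repeated Prisoner's Dilemma with discounting (the stage payoffs and discount factor δ are illustrative):

# Stage payoffs to the row player; the game is symmetric
PD = {("Cooperate", "Cooperate"): 3, ("Cooperate", "Defect"): 0,
      ("Defect", "Cooperate"): 5, ("Defect", "Defect"): 1}

def grim_trigger(opponent_history):
    # Cooperate until the opponent defects once; then defect forever after
    return "Defect" if "Defect" in opponent_history else "Cooperate"

def discounted_sum(payoffs, delta):
    # Future payoffs are discounted by delta per round
    return sum((delta ** t) * r for t, r in enumerate(payoffs))

rounds, delta = 10, 0.9
hist1, hist2, payoffs1 = [], [], []
for _ in range(rounds):
    a1, a2 = grim_trigger(hist2), grim_trigger(hist1)   # both players use grim trigger
    payoffs1.append(PD[(a1, a2)])
    hist1.append(a1)
    hist2.append(a2)
print(round(discounted_sum(payoffs1, delta), 2))   # 19.54: ten rounds of mutual cooperation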
Bayesian Games: These are games where players have incomplete information about each
other (e.g., their payoffs, strategies, or types). Players form beliefs about the unknown
information and update these beliefs using Bayes' theorem.
Beliefs and Types: In a Bayesian game, players have private information called types, and
they form beliefs about the types of other players.
Bayes-Nash Equilibrium: An equilibrium concept for Bayesian games where players
maximize their expected utility given their beliefs about other players' strategies.
Evolutionary Game Theory studies strategies that evolve over time, particularly in the
context of biological and social systems. It focuses on how strategies spread in populations,
where the payoff depends on the frequency of different strategies within the population.
o Evolutionarily Stable Strategy (ESS): A strategy that, if adopted by most of the
population, cannot be invaded by any alternative strategy.
Applications in AI
Game theory plays a critical role in multi-agent systems and agent-based modeling, where different
agents interact, make decisions, and potentially cooperate or compete with each other. It is widely
applied in areas such as:
Mechanism Design: Designing systems where agents interact, such as auctions or matching
markets.
Multi-Agent Coordination: Coordination problems where multiple agents need to work
together efficiently.
Adversarial Games: Examples like chess or Go, where two players compete with perfect
information.