
21IT306 – Fundamentals of Artificial Intelligence VCET

Regulation 2021

VELAMMAL COLLEGE OF ENGINEERING AND TECHNOLOGY

(AUTONOMOUS)

DEPARTMENT OF INFORMATION TECHNOLOGY

Fundamentals of Artificial Intelligence

UNIT IV-DECISION-MAKING

Basics of utility theory - Decision theory - Sequential decision problems - Elementary game theory

CO4: Infer the basics of decision making.

4.1 The Basis of Utility Theory:

Utility theory forms the mathematical foundation for decision-making under uncertainty. It enables an
agent to quantify preferences, evaluate outcomes, and make rational decisions based on the principle
of maximizing expected utility. The framework is widely used in artificial intelligence, economics,
game theory, and decision sciences.

Utility theory provides a rigorous framework for modeling rational decision-making under
uncertainty. By satisfying fundamental axioms and leveraging utility functions, agents can evaluate
actions systematically and optimize their decisions. The Von Neumann-Morgenstern theorem ensures
that rational preferences can be consistently represented through a utility function, making utility
theory indispensable in AI and decision sciences.

4.1.1. Core Concepts of Utility Theory


Preferences and Rationality

Utility theory is grounded in an agent's preferences, which must satisfy specific rationality axioms.
These axioms ensure consistency in decision-making:

1. Completeness: For any two outcomes A and B, an agent must prefer one over the other or
consider them equally preferable:

A≻B (A is preferred to B), B≻A (B is preferred to A), or A∼B (A and B are equally preferred).

2. Transitivity: Preferences must be logically consistent:

A≻B and B≻C ⟹ A≻C.


3. Continuity: If A≻B≻C, there exists a probability p such that the agent is indifferent between
B and a lottery that offers A with probability p and C with probability (1−p):

B∼pA+(1−p)C.

4. Independence: If A≻B, then for any outcome C and probability p, the preference remains
consistent when combined with C:

pA+(1−p)C≻pB+(1−p)C.

Utility as a Numerical Representation of Preferences

A utility function U(x) maps outcomes to real numbers, where higher values represent more desirable
outcomes:

U:Outcomes→R.

 Utility functions allow an agent to rank and compare outcomes quantitatively.


 They are unique up to a positive linear transformation:

U′(x)=aU(x)+b, a>0.

Example: If an agent prefers A over B and B over C, then the utility values might be assigned as
U(A)=100, U(B)=50 and U(C)=0.
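As a quick illustration (not part of the original notes), the short Python sketch below checks that the ranking A≻B≻C is unchanged by a positive linear transformation U′(x)=aU(x)+b with a>0; the utility values are the ones from the example above, and the constants a=2, b=10 are arbitrary.

# Utility values from the example above; a and b are arbitrary illustrative constants.
utilities = {"A": 100, "B": 50, "C": 0}

def transform(u, a=2.0, b=10.0):
    # Positive linear (affine) transformation U'(x) = a*U(x) + b, with a > 0.
    return a * u + b

original_ranking = sorted(utilities, key=utilities.get, reverse=True)
transformed_ranking = sorted(utilities, key=lambda x: transform(utilities[x]), reverse=True)

# The ranking A > B > C is preserved under the transformation.
assert original_ranking == transformed_ranking
print(original_ranking)   # ['A', 'B', 'C']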

4.1.2. Expected Utility Theory


Decision-Making Under Uncertainty

Expected utility theory is used when outcomes are uncertain. The expected utility of an action A is
computed as the weighted sum of utilities of all possible outcomes:

EU(A)=∑iP(Oi∣A)⋅U(Oi),

where:

 EU(A): Expected utility of action A,


 P(Oi∣A): Probability of outcome Oi given action A,
 U(Oi): Utility of outcome Oi.

Principle of Maximum Expected Utility

A rational agent selects the action A∗ that maximizes the expected utility:

A∗ = argmax_A EU(A).


Example: Buying Insurance

Consider an agent deciding whether to buy insurance to mitigate financial losses from an accident.

 Outcomes:
o O1: No accident occurs, U(O1)=100.
o O2: Accident occurs, U(O2)=−500.

 Probabilities:
o P(O1)=0.9,
o P(O2)=0.1.

 Expected Utility of Buying Insurance: If insurance reduces the loss to −100, then:

EU(Buy Insurance)=(0.9⋅100)+(0.1⋅−100)=90−10=80.

 Expected Utility of Not Buying Insurance:

EU(No Insurance)=(0.9⋅100)+(0.1⋅−500)=90−50=40.

Since EU(Buy Insurance)>EU(No Insurance), the rational choice is to buy insurance.
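The same calculation can be scripted with a small expected-utility helper. This is a minimal Python sketch using the probabilities and utilities listed above; the function and variable names are illustrative only.

def expected_utility(outcomes):
    # EU(A) = sum over outcomes of P(O|A) * U(O)
    return sum(p * u for p, u in outcomes)

# (probability, utility) pairs for each action, taken from the example above.
actions = {
    "Buy Insurance": [(0.9, 100), (0.1, -100)],
    "No Insurance":  [(0.9, 100), (0.1, -500)],
}

eu = {name: expected_utility(outcomes) for name, outcomes in actions.items()}
best = max(eu, key=eu.get)   # principle of maximum expected utility
print(eu)                    # Buy Insurance ≈ 80, No Insurance ≈ 40
print(best)                  # Buy Insurance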

4.1.3. Von Neumann-Morgenstern Utility Theorem


Theorem Statement

The Von Neumann-Morgenstern theorem formalizes the connection between rational preferences and
utility functions. It states that if preferences satisfy the axioms of completeness, transitivity,
continuity, and independence, then there exists a utility function U such that:

A≻B ⟺ U(A)>U(B).

The agent’s preferences can be represented as the maximization of expected utility.

Implications

 Preferences that satisfy these axioms are called rational preferences.


 The theorem justifies the use of utility functions for modeling decision-making under
uncertainty.


4.1.4. Risk Attitudes and Utility Functions


Risk Preferences

Agents can exhibit different attitudes toward risk, represented by the shape of their utility functions:

1. Risk-Neutral:
o Evaluates outcomes based solely on expected value.
o Linear utility function: U(x)=x.

2. Risk-Averse:
o Prefers certain outcomes over risky ones with the same expected value.
o Concave utility function (e.g., U(x)=√x).

3. Risk-Seeking:
o Prefers risky outcomes over certain ones with the same expected value.
o Convex utility function (e.g., U(x)=x²).

Diminishing Marginal Utility

For many goods, such as wealth, the utility increases at a decreasing rate, capturing the idea of
diminishing returns:

U(x)=log(x).
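To see how a concave utility such as U(x)=log(x) produces risk-averse behavior, the sketch below compares a fair 50/50 gamble with receiving its expected monetary value for certain; the wealth amounts (50 and 150) are illustrative assumptions, not figures from the notes.

import math

def u(x):
    # Concave utility with diminishing marginal utility: U(x) = log(x)
    return math.log(x)

# A fair gamble: 50% chance of wealth 50, 50% chance of wealth 150.
gamble = [(0.5, 50.0), (0.5, 150.0)]
expected_money = sum(p * x for p, x in gamble)      # 100.0
eu_gamble = sum(p * u(x) for p, x in gamble)        # 0.5*log(50) + 0.5*log(150) ≈ 4.46
u_certain = u(expected_money)                        # log(100) ≈ 4.61

# For a concave U, U(E[x]) > E[U(x)]: the agent prefers the sure amount (risk aversion).
assert u_certain > eu_gamble
print(eu_gamble, u_certain)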
4.1.5. Applications of Utility Theory

Utility theory is applied across various domains, including:

1. Artificial Intelligence:
o Decision-making in autonomous agents.
o Game-playing and strategy optimization.
2. Economics:
o Consumer choice modeling.
o Cost-benefit analysis.
3. Healthcare:
o Treatment planning under uncertainty.
4. Robotics:


o Balancing competing objectives in task execution.

4.2 Decision theory:

Preferences, as expressed by utilities, are combined with probabilities in the general theory of
rational decisions called decision theory:

Decision theory = probability theory + utility theory.

The fundamental idea of decision theory is that an agent is rational if and only if it chooses the action that yields the highest expected utility, averaged over all the possible outcomes of the action. This is called the principle of maximum expected utility (MEU). Here, "expected" means the "average," or "statistical mean," of the outcome utilities, weighted by the probability of the outcome.

Decision theory, in its simplest form, deals with choosing among actions based on the
desirability of their immediate outcomes; that is, the environment is assumed to be episodic. The
agent’s preferences are captured by a utility function, U(s), which assigns a single number to
express the desirability of a state. The expected utility of an action a given the evidence, EU(a),
is the average utility value of the outcomes, weighted by the probability that each outcome occurs:

EU(a) = ∑s′ P(RESULT(a)=s′) U(s′).

The principle of maximum expected utility (MEU) says that a rational agent should choose the action that maximizes the agent’s expected utility:

action = argmaxa EU(a).

In a sense, the MEU principle could be seen as a prescription for intelligent behavior. All an
intelligent agent has to do is calculate the various quantities, maximize utility over its actions,
and away it goes. But this does not mean that the AI problem is solved by the definition. The
MEU principle formalizes the general notion that an intelligent agent should “do the right thing,”
but does not operationalize that advice. Estimating the probability distribution P(s) over possible
states of the world, which folds into P(RESULT(a)=s′), requires perception, learning, knowledge
representation, and inference. Computing P(RESULT(a)=s′) itself requires a causal model of
the world. There may be many actions to consider, and computing the outcome utilities U(s′)
may itself require further searching or planning, because an agent may not know how good a state
is until it knows where it can get to from that state. An AI system acting on behalf of a human
may not know the human’s true utility function, so there may be uncertainty about U. In
summary, decision theory is not a panacea that solves the AI problem, but it does provide the
beginnings of a basic mathematical framework that is general enough to define the AI problem.


4.2.1. Introduction to Decision Theory

Decision Theory combines probability theory (reasoning about uncertainty) with utility theory
(reasoning about preferences) to determine the best action an agent can take under uncertain
conditions.

 Goal: Help rational agents choose actions that maximize their expected utility, even when
outcomes are uncertain.
 Widely applied in:
o Artificial intelligence (AI),
o Economics,
o Engineering,
o Healthcare, and more.

4.2.2. Key Components of Decision Theory


Actions

An agent's choices are represented as actions. Each action may result in multiple possible outcomes
depending on the uncertainty in the environment.

Outcomes

An outcome is the result of performing an action. It depends on both the agent's action and the state
of the environment.

Probabilities

Probabilities quantify the likelihood of an outcome. The probability of outcome Oi given action A is represented as:

P(Oi∣A).
Utilities

Utilities measure how desirable an outcome is to the agent. They allow outcomes to be ranked
numerically based on preferences:

U: Outcomes→R.
4.2.3. Expected Utility

Expected utility is the central concept in decision theory, providing a quantitative basis for choosing
between actions.


Expected Utility Formula

For an action A with possible outcomes O1, O2, …, On, the expected utility is:

EU(A)=∑i P(Oi∣A)⋅U(Oi),

where:

 P(Oi∣A): Probability of outcome Oi given A,


 U(Oi): Utility of outcome Oi.

Decision Rule

The agent chooses the action A∗ that maximizes expected utility:

A∗ = argmax_A EU(A).

4.2.4. Types of Decisions

Decision theory categorizes decisions based on the level of certainty in outcomes:

Decisions Under Certainty

 Outcomes of actions are known in advance.


 Agent chooses the action with the highest (known) utility:

A∗ = argmax_A U(Outcome(A)).

Decisions Under Risk

 Outcomes depend on probabilities, which are known or can be estimated.


 Agent evaluates the expected utility of each action and chooses the one that maximizes
EU(A).

Decisions Under Uncertainty

 Probabilities of outcomes are unknown.


 Often modeled using Bayesian methods to estimate probabilities or using heuristics to simplify
decision-making.

4.2.5. Decision Networks

A Decision Network (or Influence Diagram) is a graphical representation of a decision-making process that extends Bayesian networks by adding:

1. Chance Nodes: Represent random variables and uncertainties (e.g., weather conditions).
2. Decision Nodes: Represent the agent’s choices.
3. Utility Nodes: Represent the agent’s utility function.

Structure of a Decision Network

Example structure:

[Chance Node: Weather] ---> [Decision Node: Carry Umbrella?] ---> [Utility Node: Comfort]

Here:

 The Weather influences the decision to carry an umbrella.


 The utility (comfort) depends on both the weather and the decision.

Graphical Components

 Arrows between nodes show dependencies:


o From chance nodes to decision nodes (influence of uncertainty on decisions),
o From decision nodes to utility nodes (impact of decisions on utility).

4.2.6. Example of Expected Utility Calculation


Scenario: Carrying an Umbrella

An agent needs to decide whether to carry an umbrella based on the probability of rain.

Given Data:

 Probabilities:
o P(Rain)=0.3,
o P(No Rain)=0.7.
 Utilities:
o U(Carry Umbrella, Rain)=10,
o U(Carry Umbrella, No Rain)=5,
o U(No Umbrella, Rain)=−20,
o U(No Umbrella, No Rain)=15.


Expected Utility Calculations:

1. Action 1: Carry Umbrella

EU(Carry Umbrella)=P(Rain)⋅U(Carry Umbrella, Rain)+P(No Rain)⋅U(Carry Umbrella, No Rain).

EU(Carry Umbrella)=(0.3⋅10)+(0.7⋅5)=3+3.5=6.5.

2. Action 2: Do Not Carry Umbrella

EU(No Umbrella)=P(Rain)⋅U(No Umbrella, Rain)+P(No Rain)⋅U(No Umbrella, No Rain).


EU(No Umbrella)=(0.3⋅−20)+(0.7⋅15)=−6+10.5=4.5.
Decision:

Since EU(Carry Umbrella)>EU(No Umbrella), the agent should carry the umbrella.

Graphical Representation:

A simple decision tree illustrating the scenario:

[Decision: Carry Umbrella?]
├── Carry Umbrella
│   ├── Rain (30%): Utility = +10
│   └── No Rain (70%): Utility = +5
└── No Umbrella
    ├── Rain (30%): Utility = −20
    └── No Rain (70%): Utility = +15
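The same numbers can be verified programmatically. A minimal Python sketch assuming the probability and utility table given above; nothing here is library-specific.

P = {"Rain": 0.3, "No Rain": 0.7}

# Utility table U(action, weather) from the example above.
U = {
    ("Carry Umbrella", "Rain"): 10,
    ("Carry Umbrella", "No Rain"): 5,
    ("No Umbrella", "Rain"): -20,
    ("No Umbrella", "No Rain"): 15,
}

def expected_utility(action):
    # EU(action) = sum over weather of P(weather) * U(action, weather)
    return sum(P[w] * U[(action, w)] for w in P)

for action in ("Carry Umbrella", "No Umbrella"):
    print(action, expected_utility(action))          # 6.5 and 4.5 (up to rounding)

best = max(("Carry Umbrella", "No Umbrella"), key=expected_utility)
print("Best action:", best)                          # Carry Umbrella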

4.2.7. Risk Preferences

Utility functions reflect the agent’s attitude toward risk:

Risk-Neutral Agent

 Values outcomes based on their expected value.


 Linear utility function: U(x)=x.

Risk-Averse Agent

 Prefers certain outcomes to risky ones with the same expected value.
 Concave utility function (e.g., U(x)=√x).


Risk-Seeking Agent

 Prefers risky outcomes to certain ones with the same expected value.
 Convex utility function: U(x)=x².

Graph: Utility vs. Value

 A graph can represent these risk preferences, showing linear, concave, and convex utility
functions.

4.2.8. Applications of Decision Theory

1. Artificial Intelligence:
o Rational agents in uncertain environments.
o Game-playing algorithms.
2. Economics:
o Cost-benefit analysis.
o Consumer behavior modeling.
3. Robotics:
o Path planning with uncertain obstacles.
4. Healthcare:
o Treatment planning under probabilistic outcomes.

4.2.9. Challenges and Limitations

1. Complexity:
o Real-world decision problems can involve large state spaces and many variables.
2. Uncertainty in Probabilities:
o Estimating probabilities accurately is often challenging.
3. Subjective Utilities:
o Assigning utility values may involve bias or inconsistency.

4.3 Sequential Decision Problems:

Problem Setup

 The agent navigates a 4×3 grid, starting from (1,1); the episode ends when the agent reaches one of two terminal states, which carry payoffs +1 and −1.
 The environment is fully observable, meaning the agent knows its location at all times.
 The agent has four actions: Up, Down, Left, Right.

Stochastic Environment

 Actions succeed with 0.8 probability.


 With probability 0.1 each, the agent moves at right angles (to the left or right) of the intended direction.

 Bumping into a wall results in no movement.


 Example: Moving Up from (1,1) has:
o 80% chance of reaching (1,2).
o 10% chance of moving Right to (2,1).
o 10% chance of staying at (1,1) (hitting the wall).
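One compact way to encode this transition model in Python is sketched below. It assumes the layout of the standard 4×3 example (columns 1–4, rows 1–3, an interior wall at (2,2)) and the probabilities stated above: 0.8 for the intended move and 0.1 for each perpendicular slip, with blocked moves leaving the agent in place.

# Sketch of the stochastic 4x3 grid-world transition model (assumed layout).
WALL = {(2, 2)}                     # interior obstacle in the standard example
COLS, ROWS = 4, 3
MOVES = {"Up": (0, 1), "Down": (0, -1), "Left": (-1, 0), "Right": (1, 0)}
PERP = {"Up": ("Left", "Right"), "Down": ("Left", "Right"),
        "Left": ("Up", "Down"), "Right": ("Up", "Down")}

def attempt(state, direction):
    # Deterministic move; bumping into a wall or the grid edge means no movement.
    dx, dy = MOVES[direction]
    nxt = (state[0] + dx, state[1] + dy)
    if nxt in WALL or not (1 <= nxt[0] <= COLS and 1 <= nxt[1] <= ROWS):
        return state
    return nxt

def transition(state, action):
    # Returns {s': P(s' | s, a)} combining the intended move and the two slips.
    probs = {}
    for direction, p in [(action, 0.8), (PERP[action][0], 0.1), (PERP[action][1], 0.1)]:
        s2 = attempt(state, direction)
        probs[s2] = probs.get(s2, 0.0) + p
    return probs

print(transition((1, 1), "Up"))     # {(1, 2): 0.8, (1, 1): 0.1, (2, 1): 0.1}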

Transition Model & Rewards

 The probability of reaching a state s' from state s using action a is given by P(s' | s, a)
(Markovian property).
 The agent receives a reward (R(s, a, s’)) after each transition:
o +1 for reaching the goal at (4,3).
o -1 for reaching the negative terminal state.
o -0.04 for all other transitions, encouraging the agent to reach the goal quickly.

Markov Decision Processes (MDPs)

 A sequential decision problem in a fully observable, stochastic environment is called an


MDP.
 An MDP consists of:
o A set of states (with an initial state).
o A set of actions per state.
o A transition model P(s' | s, a).
o A reward function R(s, a, s’).
 Solution Approach: MDPs are often solved using dynamic programming, which
recursively breaks the problem into smaller parts.

Policies & Optimal Decision Making

 A policy (π) specifies an action π(s) to take in each state.
 The optimal policy (π∗) maximizes the expected utility over time.
 Since outcomes are stochastic, different executions of the same policy can lead to different
sequences of states.

Balancing Risk & Reward

 Different values of reward (r) affect policy decisions:


o Highly negative r: The agent rushes to the nearest exit, even if it’s the –1 state.
o Moderately negative r: The agent takes the shortest route to +1 and is willing to risk falling into the –1 state if the detour is too costly.
o Slightly negative r: The agent plays safe and avoids the –1 state, even if it means
bumping into walls.
o Positive r: The agent avoids terminal states altogether, enjoying infinite reward.


Real-World Relevance

 Unlike deterministic search problems, MDPs model real-world decision-making under uncertainty.
 They are studied in AI, operations research, economics, and control theory, with various solution methods available.

Figure: (a) A simple, stochastic 4×3 environment that presents the agent with a sequential decision
problem. (b) Illustration of the transition model of the environment: the “intended” outcome occurs
with probability 0.8, but with probability 0.2 the agent moves at right angles to the intended direction.
A collision with a wall results in no movement. Transitions into the two terminal states have reward
+1 and –1, respectively, and all other transitions have a reward of –0.04.

4.3.1 Utilities over time

In MDPs, agent performance is measured by the sum of rewards obtained through state transitions.
The utility function Uh defines how rewards are aggregated over time.

1. Finite vs. Infinite Horizon:


o Finite horizon: Decisions depend on remaining time; policies are nonstationary.
o Infinite horizon: No fixed time limit; policies are stationary.
2. Additive Discounted Rewards:
o Utility is computed as Uh = R(s0,a0,s1) + γR(s1,a1,s2) + γ²R(s2,a2,s3) + … (a worked numeric sketch appears at the end of this subsection).
o The discount factor γ (0 ≤ γ ≤ 1) prioritizes near-term rewards over distant ones.
3. Justifications for Discounting:
o Empirical: Humans and animals favor immediate rewards.
o Economic: Earlier rewards can be invested.
o Uncertainty: Future rewards may not arrive.


o Stationary Preference: The ranking of future sequences remains the same if they shift in time.
o Mathematical Simplicity: Ensures finite utility values and prevents infinite rewards
from improper policies.
4. Alternative Reward Models:
o Proper policies: Ensure reaching a terminal state, allowing undiscounted rewards.
o Average reward per step: Useful but complex to analyze.

Additive discounted rewards are preferred for MDP analysis due to their mathematical advantages
and real-world applicability.
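As the worked numeric sketch promised above, the function below sums a finite reward sequence weighted by powers of γ; the reward values are illustrative (a few −0.04 step costs followed by the +1 goal reward), not data from the notes.

def discounted_return(rewards, gamma=0.9):
    # U_h = R_0 + gamma*R_1 + gamma^2*R_2 + ... for a finite reward sequence.
    return sum((gamma ** t) * r for t, r in enumerate(rewards))

# Illustrative sequence: four -0.04 step costs, then +1 for reaching the goal.
rewards = [-0.04, -0.04, -0.04, -0.04, 1.0]
print(discounted_return(rewards, gamma=1.0))    # undiscounted sum: 0.84
print(discounted_return(rewards, gamma=0.9))    # discounting shrinks the distant +1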

Figure: (a) The optimal policies for the stochastic environment with r= − 0.04 for transitions between
nonterminal states. There are two policies because in state (3,1) both Left and Up are optimal. (b)
Optimal policies for four different ranges of r.

4.3.2 Optimal policies and the utilities of states

1. Utility of a Policy Uπ(s):

o Defined as the expected sum of discounted rewards obtained when following policy π from state s.
o Given by: Uπ(s) = E[ ∑t γᵗ R(St, π(St), St+1) ], where the expectation is over the state sequences generated by executing π starting in s.
o The optimal policy π∗ maximizes this expected utility.

2. Bellman Equation:

o Defines the utility of a state recursively based on neighboring states.


o Given by: U(s) = maxa ∑s′ P(s′∣s,a)[R(s,a,s′) + γU(s′)].
o Helps compute optimal state values iteratively (a value-iteration sketch follows the figure below).

3. Q-Function Q(s,a):
o Represents the expected utility of taking action a in state s.
o Defined as: Q(s,a) = ∑s′ P(s′∣s,a)[R(s,a,s′) + γ maxa′ Q(s′,a′)].
o The optimal policy is derived from the Q-values: π∗(s) = argmaxa Q(s,a).
Applications:

 These concepts are used in Reinforcement Learning (RL) to compute optimal policies in
decision-making problems.
 The Bellman equation is the foundation of Value Iteration and Policy Iteration.
 The Q-function is central to Q-learning, a model-free RL algorithm.

Figure: The utilities of the states in the 4×3 world with γ =1 and r= −0.04 for transitions to
nonterminal states.
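Putting the Bellman equation to work, the sketch below runs value iteration on the 4×3 world with γ = 1 and a −0.04 step cost. It is an illustrative implementation, not code from the notes: it assumes the standard layout (wall at (2,2), terminals at (4,3) and (4,2)) and treats the ±1 payoffs as fixed terminal utilities while charging the step cost on every transition, which reproduces the utilities shown in the figure (U(1,1) ≈ 0.705).

# Value iteration for the assumed 4x3 world (sketch).
WALL = {(2, 2)}
TERMINALS = {(4, 3): 1.0, (4, 2): -1.0}
STATES = [(c, r) for c in range(1, 5) for r in range(1, 4) if (c, r) not in WALL]
ACTIONS = ["Up", "Down", "Left", "Right"]
MOVES = {"Up": (0, 1), "Down": (0, -1), "Left": (-1, 0), "Right": (1, 0)}
PERP = {"Up": ("Left", "Right"), "Down": ("Left", "Right"),
        "Left": ("Up", "Down"), "Right": ("Up", "Down")}
GAMMA, R_STEP = 1.0, -0.04

def attempt(s, d):
    # Deterministic move attempt; walls and edges leave the state unchanged.
    nxt = (s[0] + MOVES[d][0], s[1] + MOVES[d][1])
    if nxt in WALL or not (1 <= nxt[0] <= 4 and 1 <= nxt[1] <= 3):
        return s
    return nxt

def transition(s, a):
    # {s': P(s' | s, a)}: 0.8 intended, 0.1 for each perpendicular slip.
    probs = {}
    for d, p in [(a, 0.8), (PERP[a][0], 0.1), (PERP[a][1], 0.1)]:
        s2 = attempt(s, d)
        probs[s2] = probs.get(s2, 0.0) + p
    return probs

def value_iteration(eps=1e-6):
    U = {s: 0.0 for s in STATES}
    while True:
        U_new, delta = {}, 0.0
        for s in STATES:
            if s in TERMINALS:
                U_new[s] = TERMINALS[s]
            else:
                # Bellman update: U(s) = max_a sum_s' P(s'|s,a) [R + gamma*U(s')]
                U_new[s] = max(sum(p * (R_STEP + GAMMA * U[s2])
                                   for s2, p in transition(s, a).items())
                               for a in ACTIONS)
            delta = max(delta, abs(U_new[s] - U[s]))
        U = U_new
        if delta < eps:
            return U

U = value_iteration()
print(round(U[(1, 1)], 3))   # roughly 0.705 for the standard setup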


4.3.3 Reward scales

Affine Transformations of Utilities & Rewards:

 Scaling and shifting the utility function U(s) or the reward function R(s,a,s′) by constants m > 0 and b does not change the optimal policy:

U′(s) = m U(s) + b (and, equivalently, R′(s,a,s′) = m R(s,a,s′) + b).
 This follows from the definition of utility as the sum of discounted rewards.

Shaping Theorem:

 The reward function can be modified using a potential function Φ(s):

R′(s,a,s′) = R(s,a,s′) + γΦ(s′) − Φ(s).
 This transformation preserves the optimal policy but can speed up learning by making
rewards more informative.

Proof Using the Bellman Equation:

 The transformation shifts the Q-function by a term that depends only on the state: Q′(s,a) = Q(s,a) − Φ(s).
 Since the optimal policy is determined by π∗(s) = argmaxa Q(s,a), and Φ(s) is independent of a, the policy remains unchanged.

Intuition Behind Reward Shaping:

 The term γΦ(s′)−Φ(s) acts like a potential gradient, guiding the agent uphill towards higher-
utility states.
 If we set Φ(s)=U(s), the modified reward function makes greedy action selection optimal.
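A minimal sketch of potential-based shaping is given below: it assumes a simple chain of states 0–4 with an illustrative potential Φ that grows toward the goal, and only shows how the shaped reward R′ = R + γΦ(s′) − Φ(s) is computed, not a full learning loop.

GAMMA = 0.9

# Illustrative potential function: higher potential for states closer to goal state 4.
PHI = {0: 0.0, 1: 0.25, 2: 0.5, 3: 0.75, 4: 1.0}

def shaped_reward(r, s, s_next, gamma=GAMMA):
    # Potential-based shaping: R'(s,a,s') = R(s,a,s') + gamma*Phi(s') - Phi(s)
    return r + gamma * PHI[s_next] - PHI[s]

# A step toward the goal earns a shaping bonus, a step away a penalty,
# but by the shaping theorem the optimal policy is unchanged.
print(shaped_reward(-0.04, 2, 3))   # -0.04 + 0.9*0.75 - 0.5 ≈ 0.135
print(shaped_reward(-0.04, 2, 1))   # -0.04 + 0.9*0.25 - 0.5 ≈ -0.315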


Practical Implications:

 Reward shaping can accelerate convergence in Reinforcement Learning.


 This approach is used in animal training, where intermediate rewards help the agent (or
animal) learn the desired sequence of actions.

4.3.4 Representing MDPs

For large-scale MDPs, traditional table-based representations of rewards and transitions are
computationally expensive. Using Dynamic Decision Networks (DDNs) provides a more efficient
way to model complex systems by factoring the problem into smaller parts. This approach is useful in
domains like robotics and games such as Tetris, where state and action spaces are large, but the
network structure enables more efficient computations and policy evaluation.

1. Representing Transition Probabilities and Rewards:


o In small MDPs, transition probabilities P(s′∣s,a) and rewards R(s,a,s′) can be
represented as large, three-dimensional tables, but these grow exponentially with state
and action space, making them impractical for large problems.
o For sparse MDPs, these tables can be reduced, but even then, the size is still
problematic for larger problems.

2. Dynamic Decision Networks (DDNs):


o Instead of using large tables, MDPs can be represented as DDNs, which extend
Dynamic Bayesian Networks (DBNs) with decision, reward, and utility nodes.
o DDNs decompose complex problems into smaller, more manageable parts, allowing
for factored representations with exponential complexity advantages over traditional
atomic representations.
o These networks can model real-world problems more efficiently.

3. Mobile Robot Example (DDN for Robotics):


o The state St of a robot can be decomposed into variables like location (Xt), velocity
(X˙t), battery level (Batteryt), and whether it's charging (Charging_t).
o Actions At include decisions like Plug/Unplug, LeftWheel, and RightWheel.
o The reward function here depends on factors like the location and charging status. It
doesn't depend directly on actions or state transitions.


Figure: A dynamic decision network for a mobile robot with state variables for battery level,
charging status, location, and velocity, and action variables for the left and right wheel motors
and for charging.

4. Tetris Example:
o The state variables for Tetris include CurrentPiece, NextPiece, and a bit-vector
Filled representing the board’s state.
o The DDN for Tetris models the game over time and shows how the board’s state
evolves based on the action taken (i.e., placing pieces).
o Tetris is a well-studied MDP where every policy eventually leads to a terminal state
(the board filling up).


Figure: (a) The game of Tetris. The T-shaped piece at the top center can be dropped in any
orientation and in any horizontal position. If a row is completed, that row disappears and the
rows above it move down, and the agent receives one point. The next piece (here, the L-
shaped piece at top right) becomes the current piece, and a new next piece appears, chosen at
random from the seven piece types. The game ends if the board fills up to the top. (b) The
DDN for the Tetris MDP.

5. Utility and Rewards in DDNs:


o In DDNs, the utility function can represent the expected future rewards. For instance,
a heuristic approximation to the utility can be included, which can speed up decision-
making by focusing on a bounded-depth search instead of fully expanding all future
states.

4.4 Elementary game theory:

Game theory studies interactions between multiple decision-makers (agents), each of whom may have
competing or cooperative goals. It provides a mathematical framework to model strategic interactions,
and it plays a crucial role in the study of multi-agent systems.

1. Basics of Game Theory

 Game: A game is defined by a set of players, actions (strategies), and payoffs (rewards or
costs).
 Players: The participants in the game who make decisions based on the game's rules.
 Actions: The choices available to each player.


 Payoffs: The outcomes for each player resulting from a particular combination of strategies
chosen by all players.
 Strategies: A strategy for a player is a complete plan of actions for every possible situation in
the game.
 Outcome: The result of a particular combination of strategies.
 Games can be classified as:
o Cooperative vs. Non-cooperative: In cooperative games, players can form binding
agreements, whereas in non-cooperative games, players cannot make enforceable
agreements.
o Zero-sum vs. Non-zero-sum: A zero-sum game is one where the total payoff is
constant, so one player’s gain is the other’s loss. Non-zero-sum games involve mutual
gains or losses.
o Simultaneous vs. Sequential: In simultaneous games, players choose their actions at
the same time, whereas in sequential games, players take turns making decisions.

2. Rationality and Nash Equilibrium

 Rationality: Players are assumed to act rationally, meaning they choose strategies that
maximize their expected utility.
 Nash Equilibrium (NE): A set of strategies where no player can improve their payoff by
unilaterally changing their strategy, assuming the other players’ strategies remain unchanged.
o Pure Strategy Nash Equilibrium: Each player chooses a single strategy.
o Mixed Strategy Nash Equilibrium: Players randomize over strategies, and the
equilibrium occurs when each player is indifferent to their choices.
 The Prisoner’s Dilemma is a well-known example of game theory where the dominant
strategy (defecting) leads to a worse outcome for both players than if they had cooperated.
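The Prisoner’s Dilemma can be checked mechanically. The sketch below enumerates the strategy profiles of a two-player normal-form game and reports the pure-strategy Nash equilibria; the payoff numbers (prison years as negative utility) are a common illustrative choice, not values given in these notes.

from itertools import product

# payoffs[(row_action, col_action)] = (payoff to row player, payoff to column player)
# "C" = cooperate (stay silent), "D" = defect (confess). Illustrative numbers.
ACTIONS = ["C", "D"]
payoffs = {
    ("C", "C"): (-1, -1),
    ("C", "D"): (-3,  0),
    ("D", "C"): ( 0, -3),
    ("D", "D"): (-2, -2),
}

def is_nash(profile):
    # True if neither player can improve by unilaterally deviating.
    a1, a2 = profile
    u1, u2 = payoffs[profile]
    row_best = all(payoffs[(d, a2)][0] <= u1 for d in ACTIONS)
    col_best = all(payoffs[(a1, d)][1] <= u2 for d in ACTIONS)
    return row_best and col_best

equilibria = [p for p in product(ACTIONS, ACTIONS) if is_nash(p)]
print(equilibria)   # [('D', 'D')] -- mutual defection, worse for both than (C, C)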

3. Extensive Form Games

 Extensive Form: A more detailed representation of a game, showing the sequence of moves,
players’ choices, and possible outcomes. It is typically represented using a game tree.
o Game Trees: Nodes represent decision points, and branches represent possible actions.
o Backward Induction: A method of solving extensive form games by starting from the
end of the game (terminal nodes) and working backward to determine the optimal
strategy for each player at each decision point.
 Subgame Perfect Equilibrium: An extension of Nash equilibrium for extensive-form games.
A strategy profile is a subgame perfect equilibrium if it represents a Nash equilibrium in every
subgame of the original game.
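Backward induction can be written as a short recursion over a game tree. The tree below is a hypothetical two-stage sequential game used purely for illustration; the node structure and payoffs are assumptions, not an example from the notes.

# A node is either a terminal payoff tuple (u1, u2) or a pair (player, {action: subtree}).
tree = ("P1", {
    "Left":  ("P2", {"accept": (3, 1), "reject": (0, 0)}),
    "Right": ("P2", {"accept": (1, 2), "reject": (2, 4)}),
})

def backward_induction(node, path="root"):
    # Solve the subtree rooted at `node`; return (payoffs, chosen actions along the path).
    if not isinstance(node[1], dict):          # terminal node: just the payoff tuple
        return node, {}
    player, branches = node
    idx = 0 if player == "P1" else 1           # which payoff this player maximizes
    best = None
    for action, subtree in branches.items():
        payoff, plan = backward_induction(subtree, path + "/" + action)
        if best is None or payoff[idx] > best[0][idx]:
            best = (payoff, {**plan, path: action})
    return best

payoffs, plan = backward_induction(tree)
print(payoffs)   # payoffs of the backward-induction (subgame-perfect) outcome: (3, 1)
print(plan)      # chosen action at the root and along the solved path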

4. Mixed Strategies

 Mixed Strategy: A strategy where a player chooses between actions according to some
probability distribution. This is used when there is no dominant pure strategy.


o Expected Utility: Players calculate the expected value of the payoff from each action
and choose the one that maximizes their expected utility.
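For a concrete feel, the sketch below evaluates mixed strategies in a hypothetical 2×2 zero-sum game (a matching-pennies style payoff matrix chosen for illustration). When the column player mixes 50/50, the row player’s expected payoff is the same for every mixture, which is exactly the indifference condition that characterizes a mixed-strategy equilibrium.

# Row player's payoff in an assumed 2x2 zero-sum game: +1 if both pick the same
# side, -1 otherwise. The column player's payoff is the negative of this.
A = {("H", "H"): 1, ("H", "T"): -1, ("T", "H"): -1, ("T", "T"): 1}

def expected_payoff(p_row, q_col):
    # Expected payoff to the row player when row plays H with probability p_row
    # and column plays H with probability q_col.
    row = {"H": p_row, "T": 1 - p_row}
    col = {"H": q_col, "T": 1 - q_col}
    return sum(row[r] * col[c] * A[(r, c)] for r in "HT" for c in "HT")

# Against q = 0.5 the row player's payoff is 0 for every p: indifference,
# so (p, q) = (0.5, 0.5) is a mixed-strategy Nash equilibrium.
for p in (0.0, 0.25, 0.5, 1.0):
    print(p, expected_payoff(p, 0.5))   # all values are 0.0 (up to rounding)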

5. Repeated Games and Discounting

 Repeated Games: A game that is played multiple times by the same players. Strategies in
repeated games can involve cooperation or retaliation.
o Grim Trigger Strategy: A strategy where a player cooperates until the other player
defects, and then they defect forever after.
o Discounting: In repeated games, players may discount future payoffs, which affects
the strategy choices and outcomes.

6. Games with Incomplete Information

 Bayesian Games: These are games where players have incomplete information about each
other (e.g., their payoffs, strategies, or types). Players form beliefs about the unknown
information and update these beliefs using Bayes' theorem.
 Beliefs and Types: In a Bayesian game, players have private information called types, and
they form beliefs about the types of other players.
 Bayes-Nash Equilibrium: An equilibrium concept for Bayesian games where players
maximize their expected utility given their beliefs about other players' strategies.

7. Evolutionary Game Theory

 Evolutionary Game Theory studies strategies that evolve over time, particularly in the
context of biological and social systems. It focuses on how strategies spread in populations,
where the payoff depends on the frequency of different strategies within the population.
o Evolutionarily Stable Strategy (ESS): A strategy that, if adopted by most of the
population, cannot be invaded by any alternative strategy.

Applications in AI
Game theory plays a critical role in multi-agent systems and agent-based modeling, where different
agents interact, make decisions, and potentially cooperate or compete with each other. It is widely
applied in areas such as:

 Mechanism Design: Designing systems where agents interact, such as auctions or matching
markets.
 Multi-Agent Coordination: Coordination problems where multiple agents need to work
together efficiently.
 Adversarial Games: Examples like chess or Go, where two players compete with perfect
information.


Course Incharge Course Coordinator Module Coordinator HoD/IT


Mrs. R. Nancy Deborah    Mr. A. Srinivasan    Dr. S. Kamalesh    Dr. R. Kavitha
