4th Unit Imp Topics

Utility Theory is an economic concept that models human preferences and choices by measuring satisfaction or usefulness from different options. It is applied in AI for decision-making in various contexts such as game-playing agents and autonomous vehicles. The document also discusses the Pirate Ship Problem and other decision-making frameworks like POMDP and Policy Iteration, illustrating how agents make rational choices under uncertainty.


Utility Theory

What is Utility Theory?


• Utility Theory is a concept from economics and decision theory.
• It is used to understand and model human preferences and choices.
• In simple words, it helps us decide "What is the best option?" among many, based on how useful or
satisfying it is.

What is Utility?
• Utility means the satisfaction, happiness, or usefulness that a person or an agent gets from a particular
choice or outcome.
• It is expressed as a numerical value.
• The higher the utility, the more preferred the outcome.

Example 1: Choosing Between Snacks


Suppose you're hungry and have 3 options:

Option     Utility (out of 10)
Samosa     9
Vada Pav   7
Banana     5

Here, Samosa gives you the highest utility (9), so according to Utility Theory, you would choose Samosa.

Example 2: Risk and Utility


You are offered two choices:
• Option A: ₹100 for sure
• Option B: 50% chance to win ₹200, 50% chance to win ₹0
If you are risk-averse (don't like uncertainty), you might choose Option A (a certain ₹100) even though both
options have the same expected value of ₹100.
That's because, for a risk-averse person, the utility of a certain ₹100 is higher than the expected utility of the risky gamble.
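To see this numerically, here is a minimal Python sketch assuming a concave utility function u(x) = √x. The square-root utility is a standard textbook choice, not something stated in these notes; any concave function shows the same effect.

```python
import math

# A minimal sketch of risk aversion, assuming the concave utility
# u(x) = sqrt(x). This is an illustrative choice, not from the notes.

def utility(money):
    return math.sqrt(money)

# Option A: a certain ₹100
eu_a = utility(100)                            # = 10.0

# Option B: 50% chance of ₹200, 50% chance of ₹0
eu_b = 0.5 * utility(200) + 0.5 * utility(0)   # ≈ 7.07

print(eu_a, eu_b)  # A has higher expected utility,
                   # even though both have expected value ₹100
```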

Expected Utility
Utility Theory helps in making decisions under uncertainty using Expected Utility:
Expected Utility of an action = Σ (probability of each outcome × utility of that outcome)
A rational agent chooses the action with the highest expected utility.

Use in AI & Decision Making


• In AI, Utility Theory is used to help agents choose the best actions based on outcomes.
• It is applied in:
o Game-playing agents
o Autonomous vehicles
o Recommendation systems
o Multi-agent systems

Example 3: AI Robot Cleaning Rooms


A robot has to choose between cleaning:

Room       Reward (Utility)
Kitchen    5
Bedroom    10
Bathroom   3

The robot will choose Bedroom, as it gives the highest utility.
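In code, this choice is just a maximisation over the utility table. A tiny Python sketch using the values from the table above:

```python
# Room utilities, taken from the table above.
rewards = {"Kitchen": 5, "Bedroom": 10, "Bathroom": 3}

# A rational agent simply picks the option with the highest utility.
best_room = max(rewards, key=rewards.get)
print(best_room)   # -> Bedroom
```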

Summary

Concept            Meaning
Utility            Measure of satisfaction or usefulness
Utility Function   A formula that gives utility to each possible option
Expected Utility   Weighted average utility based on probabilities
Use in AI          Helps agents make smart decisions

Final Thoughts
Utility Theory is a powerful tool that helps in:
• Making rational decisions
• Choosing the most beneficial option
• Understanding preferences under risk
It is like a mathematical way to model “what we want” and helps both humans and machines make better
choices.

Pirate Ship Problem – Utility Theory


Story of the Pirate Ship Problem:
Imagine there are five greedy pirates (Pirate A, B, C, D, and E), and they have found 100 gold coins.
But pirates are very smart and selfish, and they follow a certain rule to divide the coins:

Rules of the Game:


1. Highest-ranking pirate (A) proposes a plan to divide the coins.
2. All pirates (including the proposer) vote on the plan.
3. If half or more agree, the coins are divided as per the plan.
4. If the plan is rejected, the proposer is thrown overboard, and the next-highest-ranking pirate makes a
new plan.
5. Pirates are:
o Smart (they want to survive),
o Greedy (want maximum coins),
o Strategic (they know how others will think).

Goal (according to Utility Theory):


Each pirate tries to maximize their utility, where:
• Utility = Coins received.
• If survival is at risk, then survival > gold (i.e., utility of staying alive is more important than getting
gold).

Applying Utility Theory:


Let’s solve it backward, starting from the smallest group and then adding more pirates:
Case 1: Only Pirate E
• He gets all 100 coins (no one else to share).
Utility(E) = 100

Case 2: Pirates D and E


• Pirate D proposes a plan.
• Needs 1 vote (including himself) → plan always passes.
• D keeps all 100 coins.
Utility(D) = 100, E = 0

Case 3: Pirates C, D, E
• Pirate C proposes. Needs 2 out of 3 votes.
• If C is thrown out, D gets all (from above case).
• So, C bribes E (who gets nothing otherwise) with 1 coin.
• Final split: C = 99, D = 0, E = 1
Utility(C) = 99 (keeps most and survives)

Case 4: Pirates B, C, D, E
• B needs 2 out of 4 votes (50%).
• If rejected, C’s plan gives D = 0, E = 1.
• So, B can bribe D with 1 coin (better than 0).
• Final split: B = 99, C = 0, D = 1, E = 0

Case 5: Pirates A, B, C, D, E
• A needs 3 out of 5 votes.
• If A is thrown out, B's plan gives C = 0, D = 1, E = 0.
• A bribes C and E with 1 coin each (better than the 0 they would get under B's plan).
• Final split: A = 98, B = 0, C = 1, D = 0, E = 1
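The whole backward-induction argument can be written as a short recursive program. Here is a minimal Python sketch under the assumptions used above: ties pass ("half or more"), pirates vote yes only for a strictly better offer, and the proposer buys the cheapest votes with 1 extra coin each.

```python
# Backward-induction sketch of the pirate game described above.

def pirate_split(n, coins=100):
    """Return the coin split for n pirates, most senior (proposer) first."""
    if n == 1:
        return [coins]                     # last pirate keeps everything
    future = pirate_split(n - 1, coins)    # split if proposer is thrown out
    votes_needed = (n + 1) // 2            # "half or more", proposer included
    bribes_needed = votes_needed - 1       # proposer votes for own plan
    # Buy the cheapest votes: offer each bribed pirate (future payoff + 1).
    cheapest = sorted(range(n - 1), key=lambda i: future[i])[:bribes_needed]
    split = [0] * (n - 1)
    for i in cheapest:
        split[i] = future[i] + 1
    proposer = coins - sum(split)
    return [proposer] + split

print(pirate_split(5))   # -> [98, 0, 1, 0, 1], i.e. A=98, B=0, C=1, D=0, E=1
```

Running it reproduces each case above: pirate_split(2) gives [100, 0], pirate_split(3) gives [99, 0, 1], and pirate_split(4) gives [99, 0, 1, 0].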

What Does This Teach Us in Utility Theory?


• Each pirate makes rational decisions to maximize their utility.
• They use strategic thinking, based on what others will prefer.
• Utility is not just about money – survival has higher value.
• The pirates consider future outcomes and act accordingly.

Summary:

Pirate   Utility (Coins)   Strategy
A        98                Bribed C and E to survive
B        0                 Not bribed (A didn't need B's vote)
C        1                 Accepted A's plan (1 coin beats 0)
D        0                 Didn't gain from A's plan
E        1                 Accepted A's plan (1 coin beats 0)

Conclusion:
The Pirate Ship Problem is a classic way to understand how Utility Theory works in decision-making,
especially when people (or agents) are:
• Selfish
• Strategic
• Want to maximize their outcomes under rules and risks
This is very similar to how AI agents make decisions in multi-agent systems, auctions, negotiations, etc.

What is Preference Elicitation?


• Preference Elicitation means finding out what someone prefers.
• In AI or decision-making systems, it is the process of gathering information about the user’s likes
and dislikes, so that the system can make better choices for them.
• It can be done through surveys, questions, observing behaviour, or learning over time.

Example:
• Online shopping sites recommend products by learning your preferences over time based on your
clicks and purchases.

What is Expected Monetary Value (EMV)?


• Expected Monetary Value is a concept used to make decisions under uncertainty.
• It calculates the average outcome when there are multiple possible outcomes, each with a certain
probability.

Formula:
EMV = Σ (Probability of an outcome × Monetary value of that outcome)

Example:
If there's a 50% chance of getting ₹100 and a 50% chance of getting ₹0:
EMV = (0.5 × ₹100) + (0.5 × ₹0) = ₹50
So the expected monetary value is ₹50.
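The same calculation as a tiny Python sketch, where each outcome is a (probability, value) pair:

```python
# EMV = probability-weighted sum of monetary outcomes.

def emv(outcomes):
    return sum(p * value for p, value in outcomes)

# The example above: 50% chance of ₹100, 50% chance of ₹0.
print(emv([(0.5, 100), (0.5, 0)]))   # -> 50.0
```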

What are Multi-Attribute Utility Functions?


• Many real-life decisions depend on more than one factor (attribute), like cost, quality, and speed.
• A Multi-Attribute Utility Function helps in evaluating options that involve multiple criteria.
• It combines the utilities of different attributes into a single value to help in decision-making.

Example:
Choosing a laptop based on:
• Price (low is better)
• Battery life (high is better)
• Weight (low is better)
Each attribute gets a score, and the final decision is made using a combined utility function.
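One common (but not the only) form of combined utility is an additive weighted sum of attribute scores. A Python sketch for the laptop example; the scores (out of 10) and the weights are made-up illustrations, not values from the notes:

```python
# Additive multi-attribute utility: a weighted sum of attribute scores.

def multi_attribute_utility(scores, weights):
    return sum(weights[attr] * scores[attr] for attr in scores)

weights = {"price": 0.5, "battery": 0.3, "weight": 0.2}   # sums to 1

laptop_1 = {"price": 8, "battery": 6, "weight": 7}   # cheap, average battery
laptop_2 = {"price": 5, "battery": 9, "weight": 9}   # costlier, great battery

for name, scores in [("Laptop 1", laptop_1), ("Laptop 2", laptop_2)]:
    print(name, multi_attribute_utility(scores, weights))
# Laptop 1 -> 7.2, Laptop 2 -> 7.0, so Laptop 1 is (narrowly) preferred here.
```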

POMDP and Its Significance


Full Form:
POMDP = Partially Observable Markov Decision Process

Significance:
• In many situations, an agent cannot see the full state of the environment. It has to make decisions
based on partial information.
• POMDP helps in making decisions under uncertainty when the agent doesn’t have complete
knowledge.
• It uses:
o A belief state (probabilistic idea of what the true state might be),
o Rewards,
o Actions,
o Observations, and
o Transition probabilities.

Example:
• A robot in a smoky room where visibility is poor—it must act based on sensor readings (which are not
always accurate).


Policy Iteration Algorithm

What is Policy Iteration?


• Policy Iteration is a method used in Reinforcement Learning and Markov Decision Processes
(MDP) to find the best policy (i.e., the best way to act in every state).
• It repeatedly improves a policy until it becomes optimal.
• It works in two main steps: Policy Evaluation and Policy Improvement.

Key Concepts
• Policy: A rule that tells the agent which action to take in each state.
• Value Function (V): Tells how good a state is, under a given policy.

Steps of the Policy Iteration Algorithm


Step 1: Start with any policy
• Randomly choose some actions for each state.
• This is your initial policy π.
Step 2: Policy Evaluation
• Calculate the value of each state under the current policy.
• Use the Bellman equation to find V(s) (can be done exactly or approximately):
V(s) = R(s, π(s)) + γ × Σ over s' of T(s' | s, π(s)) × V(s')

Repeat this step until the value function becomes stable (i.e., doesn't change much).
Step 3: Policy Improvement
• For each state, check if there is a better action than the one in the current policy.
• Choose the action that gives the highest expected value.

• If the policy doesn’t change after this step, the current policy is optimal, and we stop.
• Otherwise, update the policy and go back to Step 2.

Repeat Until Convergence


Keep doing Policy Evaluation and Policy Improvement until the policy stops changing.

Final Result:
• You will get the optimal policy – the best action to take from every state.
• Also, you’ll have the optimal value function for all states.

Example (Very Simple):


Let’s say an agent can be in 3 states: A, B, and C.
For each state, it can choose to go Left or Right.
• Start with a random policy, e.g.,
A → Left, B → Right, C → Left
• Use Policy Evaluation to calculate the value of A, B, and C.
• Then do Policy Improvement to check if switching Left/Right gives a better value.
• Update the policy if needed, and repeat!
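Here is a runnable Python sketch of policy iteration on this 3-state example. The transition and reward numbers are made-up illustrations (the notes give none); the evaluate-then-improve loop is the point.

```python
# Policy iteration on a toy 3-state, 2-action MDP (illustrative numbers).

GAMMA = 0.9
STATES = ["A", "B", "C"]
ACTIONS = ["Left", "Right"]

# T[(state, action)] = list of (probability, next_state) pairs
T = {
    ("A", "Left"): [(1.0, "A")], ("A", "Right"): [(1.0, "B")],
    ("B", "Left"): [(1.0, "A")], ("B", "Right"): [(1.0, "C")],
    ("C", "Left"): [(1.0, "B")], ("C", "Right"): [(1.0, "C")],
}
# R[(state, action)] = immediate reward
R = {("A", "Left"): 0, ("A", "Right"): 0,
     ("B", "Left"): 0, ("B", "Right"): 1,
     ("C", "Left"): 0, ("C", "Right"): 2}

def q_value(s, a, V):
    """Expected value of taking action a in state s, then following V."""
    return R[(s, a)] + GAMMA * sum(p * V[s2] for p, s2 in T[(s, a)])

def evaluate(policy, theta=1e-6):
    """Step 2: sweep the Bellman equation until V stops changing."""
    V = {s: 0.0 for s in STATES}
    while True:
        delta = 0.0
        for s in STATES:
            new_v = q_value(s, policy[s], V)
            delta = max(delta, abs(new_v - V[s]))
            V[s] = new_v
        if delta < theta:
            return V

def improve(V):
    """Step 3: pick the greedy (highest expected value) action everywhere."""
    return {s: max(ACTIONS, key=lambda a: q_value(s, a, V)) for s in STATES}

policy = {"A": "Left", "B": "Right", "C": "Left"}   # random initial policy
while True:
    V = evaluate(policy)
    new_policy = improve(V)
    if new_policy == policy:   # no change -> policy is optimal, stop
        break
    policy = new_policy

print(policy)   # -> {'A': 'Right', 'B': 'Right', 'C': 'Right'}
```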

Summary Table
Step                 What Happens
Policy Evaluation    Compute value of current policy
Policy Improvement   Check for better actions, update policy
Repeat               Until policy becomes stable (optimal)

Why Is It Important?
• It is a fundamental algorithm in dynamic programming.
• Helps AI agents learn the best behaviour.
• Works in problems where the model (transition and reward) is known.

What is a POMDP?
POMDP stands for Partially Observable Markov Decision Process.
It is used when an agent:
• Cannot fully observe the current state of the environment,
• Has to take decisions based on partial and noisy observations.
In simple words, the agent doesn't "see" the full picture but must still make smart decisions using probabilities.

Elements in a POMDP Decision Diagram


A POMDP involves the following components:

Element                 Description
States (S)              Possible situations in the environment (which the agent cannot fully see)
Actions (A)             Possible things the agent can do
Observations (O)        What the agent sees or senses (not always accurate)
Transition Model (T)    Probability of moving from one state to another when an action is taken
Observation Model (Z)   Probability of getting a certain observation from a state
Rewards (R)             The benefit or penalty for taking an action in a state
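As a sketch (not any library's API), these six elements can be bundled into a single Python structure like this:

```python
from dataclasses import dataclass
from typing import Callable, List

# A minimal container for the six POMDP elements listed above.
# T, Z and R are functions returning probabilities and rewards.

@dataclass
class POMDP:
    states: List[str]                     # S
    actions: List[str]                    # A
    observations: List[str]               # O
    T: Callable[[str, str, str], float]   # T(s, a, s') = P(s' | s, a)
    Z: Callable[[str, str, str], float]   # Z(s', a, o) = P(o | s', a)
    R: Callable[[str, str], float]        # R(s, a) = immediate reward
```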

[Figure: POMDP decision diagram]

Agent Tiger Problem – POMDP Example

Story Setup:
An agent is standing in front of two closed doors:

• Behind one door, there is a tiger.

• Behind the other door, there is a treasure.


The agent’s goal is to find the treasure without being eaten by the tiger.

But Here’s the Twist:


• The agent doesn’t know which door has the tiger.
• The agent can choose actions like:
o Open Left Door
o Open Right Door
o Listen (to get a clue where the tiger might be)
Listening gives a noisy observation – it might help, but it’s not 100% reliable.
Agent's Choices (Actions):

Action       Effect
Open Left    If tiger is there → penalty; if treasure → reward
Open Right   Same as above, but for the right door
Listen       Pay a small cost, and get a clue (e.g., hear growl from left or right)

What Makes It a POMDP?


Because the agent:
• Cannot fully observe the state (doesn’t know where the tiger is),
• Has to make decisions based on beliefs (probability of tiger being behind a door),
• Observations (like growling) are inaccurate (may mislead).

Agent Strategy:
1. Start with a belief (e.g., 50% tiger behind left, 50% behind right).
2. Listen once or more → update belief based on what it hears.
3. When belief becomes strong enough (e.g., 90% sure tiger is behind left), open the right door.
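Step 2 (updating the belief after listening) is just Bayes' rule. A Python sketch, assuming the agent hears the growl from the correct side 85% of the time; the 0.85 figure is the classic textbook value for this problem, not something stated in these notes:

```python
# Bayes-rule belief update for the tiger problem (0.85 accuracy assumed).

P_HEAR_CORRECTLY = 0.85

def update_belief(b_left, heard_left):
    """b_left = current P(tiger behind LEFT door); heard_left = growl from left?"""
    if heard_left:
        like_left, like_right = P_HEAR_CORRECTLY, 1 - P_HEAR_CORRECTLY
    else:
        like_left, like_right = 1 - P_HEAR_CORRECTLY, P_HEAR_CORRECTLY
    numerator = like_left * b_left
    return numerator / (numerator + like_right * (1 - b_left))

b = 0.5                        # start: 50-50 belief
b = update_belief(b, True)     # hear a growl from the left -> b = 0.85
b = update_belief(b, True)     # hear it again              -> b ≈ 0.97
print(b)                       # confident enough to open the RIGHT door
```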

Rewards and Penalties:

Action              Outcome                     Reward
Open correct door   Get treasure                +10
Open tiger door     Get eaten                   -100
Listen              Get a clue (costs energy)   -1

Real-World Use Case:


This problem is a toy example, but it reflects many real-life AI decisions, like:
• A robot searching in a risky environment,
• Medical diagnosis (guessing a disease based on symptoms),
• Autonomous vehicles avoiding hidden dangers.

Summary

Concept           Description
Hidden State      Tiger's position is unknown
Observations      Sounds or signals from listening
Belief State      Probability of tiger being behind a certain door
Decision Making   Based on updated beliefs and reward expectations


Wumpus World Problem

What is Wumpus World?


The Wumpus World is a grid-based world (usually 4x4) where an AI agent has to explore and find gold
while avoiding dangers like the Wumpus (a monster) and bottomless pits.
This problem is used to test:
• Knowledge-based agents
• Reasoning with uncertainty
• Logical decision-making

What’s Inside the Grid?


• Agent starts at square (1,1)
• Gold is placed in some cell → Goal is to find and grab it
• Wumpus: A monster that will kill the agent if they enter its cell
• Pits: If the agent falls into a pit → It dies
• Safe cells: Empty and safe to walk on
Sensory Perceptions of the Agent:
When the agent enters a cell, it perceives the following clues:

Percept   Meaning
Stench    Wumpus is in a neighboring cell
Breeze    There is a pit in a neighboring cell
Glitter   Gold is in the same cell
Bump      Agent has hit a wall
Scream    Wumpus has been killed

These clues help the agent guess what might be around.

Agent's Actions:

Action            Description
Move Forward      Moves one cell in the facing direction
Turn Left/Right   Changes facing direction
Grab              Picks up gold
Shoot             Fires an arrow to kill the Wumpus
Climb             Used to exit the cave

The agent gets only one arrow, so must use it wisely!

Rewards & Penalties:

Situation        Reward/Penalty
Finding gold     +100
Falling in pit   -1000
Getting eaten    -1000
Using arrow      -10
Every move       -1

How Does the Agent Decide?
The agent:
• Uses logic (like propositional logic) to infer safe and dangerous cells.
• Maintains a knowledge base (KB).
• Updates KB using percepts from the environment.
• Chooses actions that maximize expected reward and avoid risk.

Example Inference:
If cell (1,2) has a breeze, the agent can infer:
“One of the neighboring cells might contain a pit”
It will then mark them as 'possibly dangerous' and avoid them until more information is collected.
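A toy Python sketch of this kind of inference; the grid coordinates and helper names are illustrative, not from the notes:

```python
# Mark every unvisited neighbour of a breezy cell as 'possibly dangerous'.

def neighbours(cell, size=4):
    x, y = cell
    steps = [(1, 0), (-1, 0), (0, 1), (0, -1)]
    return [(x + dx, y + dy) for dx, dy in steps
            if 1 <= x + dx <= size and 1 <= y + dy <= size]

visited = {(1, 1), (1, 2)}   # cells already explored and known safe
breeze_at = (1, 2)           # percept: breeze felt here

possibly_pit = {c for c in neighbours(breeze_at) if c not in visited}
print(possibly_pit)   # -> {(2, 2), (1, 3)}: avoid until more info arrives
```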

Goal of the Agent:


1. Grab the gold
2. Avoid dying
3. Exit the cave safely
4. Try to optimize score while exploring logically

Summary Table:

Concept          Description
Environment      4x4 grid with gold, pits, Wumpus
Agent's goal     Grab gold, stay alive, exit
Senses           Breeze, Stench, Glitter, Bump, Scream
Reasoning type   Logical inference using propositional logic

Why is Wumpus World Important?


• It teaches how an agent can work in uncertain, dangerous environments.
• It’s a great example of logical reasoning, percept-based decisions, and knowledge representation in
AI.
