4th Unit Imp Topics

Utility Theory is an economic concept that models human preferences and choices by measuring satisfaction or usefulness from different options. It is applied in AI for decision-making in various contexts such as game-playing agents and autonomous vehicles. The document also discusses the Pirate Ship Problem and other decision-making frameworks like POMDP and Policy Iteration, illustrating how agents make rational choices under uncertainty.


Utility Theory

What is Utility Theory?


• Utility Theory is a concept from economics and decision theory.
• It is used to understand and model human preferences and choices.
• In simple words, it helps us decide "What is the best option?" among many, based on how useful or
satisfying it is.

What is Utility?
• Utility means the satisfaction, happiness, or usefulness that a person or an agent gets from a particular
choice or outcome.
• It is expressed as a numerical value.
• The higher the utility, the more preferred the outcome.

Example 1: Choosing Between Snacks


Suppose you're hungry and have 3 options:

Option     Utility (out of 10)
Samosa     9
Vada Pav   7
Banana     5

Here, Samosa gives you the highest utility (9), so according to Utility Theory, you would choose Samosa.

Example 2: Risk and Utility


You are offered two choices:
• Option A: ₹100 for sure
• Option B: 50% chance to win ₹200, 50% chance to win ₹0
If you are risk-averse (don't like uncertainty), you might choose Option A (a certain ₹100) even though both
options have the same expected value of ₹100.
That's because, for a risk-averse person, the utility of a certain ₹100 is higher than the expected utility of the risky gamble.
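To see this numerically, here is a minimal Python sketch assuming a concave utility function u(x) = √x. The square-root utility is a standard textbook choice, not something stated in these notes; any concave function shows the same effect.

```python
import math

# A minimal sketch of risk aversion, assuming the concave utility
# u(x) = sqrt(x). This is an illustrative choice, not from the notes.

def utility(money):
    return math.sqrt(money)

# Option A: a certain ₹100
eu_a = utility(100)                            # = 10.0

# Option B: 50% chance of ₹200, 50% chance of ₹0
eu_b = 0.5 * utility(200) + 0.5 * utility(0)   # ≈ 7.07

print(eu_a, eu_b)  # A has higher expected utility,
                   # even though both have expected value ₹100
```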

Expected Utility
Utility Theory helps in making decisions under uncertainty using Expected Utility:
Expected Utility of an action = Σ (probability of each outcome × utility of that outcome)
A rational agent chooses the action with the highest expected utility.

Use in AI & Decision Making


• In AI, Utility Theory is used to help agents choose the best actions based on outcomes.
• It is applied in:
o Game-playing agents
o Autonomous vehicles
o Recommendation systems
o Multi-agent systems

Example 3: AI Robot Cleaning Rooms


A robot has to choose between cleaning:

Room       Reward (Utility)
Kitchen    5
Bedroom    10
Bathroom   3

The robot will choose Bedroom, as it gives the highest utility.
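In code, this choice is just a maximisation over the utility table. A tiny Python sketch using the values from the table above:

```python
# Room utilities, taken from the table above.
rewards = {"Kitchen": 5, "Bedroom": 10, "Bathroom": 3}

# A rational agent simply picks the option with the highest utility.
best_room = max(rewards, key=rewards.get)
print(best_room)   # -> Bedroom
```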

Summary

Concept            Meaning
Utility            Measure of satisfaction or usefulness
Utility Function   A formula that gives utility to each possible option
Expected Utility   Weighted average utility based on probabilities
Use in AI          Helps agents make smart decisions

Final Thoughts
Utility Theory is a powerful tool that helps in:
• Making rational decisions
• Choosing the most beneficial option
• Understanding preferences under risk
It is like a mathematical way to model “what we want” and helps both humans and machines make better
choices.

Pirate Ship Problem – Utility Theory


Story of the Pirate Ship Problem:
Imagine there are five greedy pirates (Pirate A, B, C, D, and E), and they have found 100 gold coins.
But pirates are very smart and selfish, and they follow a certain rule to divide the coins:

Rules of the Game:


1. Highest-ranking pirate (A) proposes a plan to divide the coins.
2. All pirates (including the proposer) vote on the plan.
3. If half or more agree, the coins are divided as per the plan.
4. If the plan is rejected, the proposer is thrown overboard, and the next-highest-ranking pirate makes a
new plan.
5. Pirates are:
o Smart (they want to survive),
o Greedy (want maximum coins),
o Strategic (they know how others will think).

Goal (according to Utility Theory):


Each pirate tries to maximize their utility, where:
• Utility = Coins received.
• If survival is at risk, then survival > gold (i.e., utility of staying alive is more important than getting
gold).

Applying Utility Theory:


Let’s solve it backward, starting from the smallest group and then adding more pirates:
Case 1: Only Pirate E
• He gets all 100 coins (no one else to share).
Utility(E) = 100

Case 2: Pirates D and E


• Pirate D proposes a plan.
• Needs 1 vote (including himself) → plan always passes.
• D keeps all 100 coins.
Utility(D) = 100, E = 0

Case 3: Pirates C, D, E
• Pirate C proposes. Needs 2 out of 3 votes.
• If C is thrown out, D gets all (from above case).
• So, C bribes E (who gets nothing otherwise) with 1 coin.
• Final split: C = 99, D = 0, E = 1
Utility(C) = 99 (keeps most and survives)

Case 4: Pirates B, C, D, E
• B needs 2 out of 4 votes (50%).
• If rejected, C’s plan gives D = 0, E = 1.
• So, B can bribe D with 1 coin (better than 0).
• Final split: B = 99, C = 0, D = 1, E = 0

Case 5: Pirates A, B, C, D, E
• A needs 3 out of 5 votes.
• If A is thrown out, B's plan gives C = 0, D = 1, E = 0.
• A bribes C and E with 1 coin each (better than the 0 they would get under B's plan).
• Final split: A = 98, B = 0, C = 1, D = 0, E = 1
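The whole backward-induction argument can be written as a short recursive program. Here is a minimal Python sketch under the assumptions used above: ties pass ("half or more"), pirates vote yes only for a strictly better offer, and the proposer buys the cheapest votes with 1 extra coin each.

```python
# Backward-induction sketch of the pirate game described above.

def pirate_split(n, coins=100):
    """Return the coin split for n pirates, most senior (proposer) first."""
    if n == 1:
        return [coins]                     # last pirate keeps everything
    future = pirate_split(n - 1, coins)    # split if proposer is thrown out
    votes_needed = (n + 1) // 2            # "half or more", proposer included
    bribes_needed = votes_needed - 1       # proposer votes for own plan
    # Buy the cheapest votes: offer each bribed pirate (future payoff + 1).
    cheapest = sorted(range(n - 1), key=lambda i: future[i])[:bribes_needed]
    split = [0] * (n - 1)
    for i in cheapest:
        split[i] = future[i] + 1
    proposer = coins - sum(split)
    return [proposer] + split

print(pirate_split(5))   # -> [98, 0, 1, 0, 1], i.e. A=98, B=0, C=1, D=0, E=1
```

Running it reproduces each case above: pirate_split(2) gives [100, 0], pirate_split(3) gives [99, 0, 1], and pirate_split(4) gives [99, 0, 1, 0].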

What Does This Teach Us in Utility Theory?


• Each pirate makes rational decisions to maximize their utility.
• They use strategic thinking, based on what others will prefer.
• Utility is not just about money – survival has higher value.
• The pirates consider future outcomes and act accordingly.

Summary:

Pirate   Utility (Coins)   Strategy
A        98                Bribed C and E to survive
B        0                 Not bribed (A didn't need B's vote)
C        1                 Accepted A's plan (1 coin beats 0)
D        0                 Didn't gain from A's plan
E        1                 Accepted A's plan (1 coin beats 0)

Conclusion:
The Pirate Ship Problem is a classic way to understand how Utility Theory works in decision-making,
especially when people (or agents) are:
• Selfish
• Strategic
• Want to maximize their outcomes under rules and risks
This is very similar to how AI agents make decisions in multi-agent systems, auctions, negotiations, etc.

What is Preference Elicitation?


• Preference Elicitation means finding out what someone prefers.
• In AI or decision-making systems, it is the process of gathering information about the user’s likes
and dislikes, so that the system can make better choices for them.
• It can be done through surveys, questions, observing behaviour, or learning over time.

Example:
• Online shopping sites recommend products by learning your preferences over time based on your
clicks and purchases.

What is Expected Monetary Value (EMV)?


• Expected Monetary Value is a concept used to make decisions under uncertainty.
• It calculates the average outcome when there are multiple possible outcomes, each with a certain
probability.

Formula:
EMV = Σ (Probability of an outcome × Monetary value of that outcome)

Example:
If there's a 50% chance of getting ₹100 and a 50% chance of getting ₹0:
EMV = (0.5 × ₹100) + (0.5 × ₹0) = ₹50
So the expected monetary value is ₹50.
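The same calculation as a tiny Python sketch, where each outcome is a (probability, value) pair:

```python
# EMV = probability-weighted sum of monetary outcomes.

def emv(outcomes):
    return sum(p * value for p, value in outcomes)

# The example above: 50% chance of ₹100, 50% chance of ₹0.
print(emv([(0.5, 100), (0.5, 0)]))   # -> 50.0
```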

What are Multi-Attribute Utility Functions?


• Many real-life decisions depend on more than one factor (attribute), like cost, quality, and speed.
• A Multi-Attribute Utility Function helps in evaluating options that involve multiple criteria.
• It combines the utilities of different attributes into a single value to help in decision-making.

Example:
Choosing a laptop based on:
• Price (low is better)
• Battery life (high is better)
• Weight (low is better)
Each attribute gets a score, and the final decision is made using a combined utility function.
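One common (but not the only) form of combined utility is an additive weighted sum of attribute scores. A Python sketch for the laptop example; the scores (out of 10) and the weights are made-up illustrations, not values from the notes:

```python
# Additive multi-attribute utility: a weighted sum of attribute scores.

def multi_attribute_utility(scores, weights):
    return sum(weights[attr] * scores[attr] for attr in scores)

weights = {"price": 0.5, "battery": 0.3, "weight": 0.2}   # sums to 1

laptop_1 = {"price": 8, "battery": 6, "weight": 7}   # cheap, average battery
laptop_2 = {"price": 5, "battery": 9, "weight": 9}   # costlier, great battery

for name, scores in [("Laptop 1", laptop_1), ("Laptop 2", laptop_2)]:
    print(name, multi_attribute_utility(scores, weights))
# Laptop 1 -> 7.2, Laptop 2 -> 7.0, so Laptop 1 is (narrowly) preferred here.
```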

POMDP and Its Significance


Full Form:
POMDP = Partially Observable Markov Decision Process

Significance:
• In many situations, an agent cannot see the full state of the environment. It has to make decisions
based on partial information.
• POMDP helps in making decisions under uncertainty when the agent doesn’t have complete
knowledge.
• It uses:
o A belief state (probabilistic idea of what the true state might be),
o Rewards,
o Actions,
o Observations, and
o Transition probabilities.

Example:
• A robot in a smoky room where visibility is poor—it must act based on sensor readings (which are not
always accurate).


Policy Iteration Algorithm

What is Policy Iteration?


• Policy Iteration is a method used in Reinforcement Learning and Markov Decision Processes
(MDP) to find the best policy (i.e., the best way to act in every state).
• It repeatedly improves a policy until it becomes optimal.
• It works in two main steps: Policy Evaluation and Policy Improvement.

Key Concepts
• Policy: A rule that tells the agent which action to take in each state.
• Value Function (V): Tells how good a state is, under a given policy.

Steps of the Policy Iteration Algorithm


Step 1: Start with any policy
• Randomly choose some actions for each state.
• This is your initial policy π.
Step 2: Policy Evaluation
• Calculate the value of each state under the current policy.
• Use the Bellman equation to find V(s) (can be done exactly or approximately):
V(s) = R(s, π(s)) + γ × Σ over s' of T(s' | s, π(s)) × V(s')

Repeat this step until the value function becomes stable (i.e., doesn't change much).
Step 3: Policy Improvement
• For each state, check if there is a better action than the one in the current policy.
• Choose the action that gives the highest expected value.

• If the policy doesn’t change after this step, the current policy is optimal, and we stop.
• Otherwise, update the policy and go back to Step 2.

Repeat Until Convergence


Keep doing Policy Evaluation and Policy Improvement until the policy stops changing.

Final Result:
• You will get the optimal policy – the best action to take from every state.
• Also, you’ll have the optimal value function for all states.

Example (Very Simple):


Let’s say an agent can be in 3 states: A, B, and C.
For each state, it can choose to go Left or Right.
• Start with a random policy, e.g.,
A → Left, B → Right, C → Left
• Use Policy Evaluation to calculate the value of A, B, and C.
• Then do Policy Improvement to check if switching Left/Right gives a better value.
• Update the policy if needed, and repeat!
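Here is a runnable Python sketch of policy iteration on this 3-state example. The transition and reward numbers are made-up illustrations (the notes give none); the evaluate-then-improve loop is the point.

```python
# Policy iteration on a toy 3-state, 2-action MDP (illustrative numbers).

GAMMA = 0.9
STATES = ["A", "B", "C"]
ACTIONS = ["Left", "Right"]

# T[(state, action)] = list of (probability, next_state) pairs
T = {
    ("A", "Left"): [(1.0, "A")], ("A", "Right"): [(1.0, "B")],
    ("B", "Left"): [(1.0, "A")], ("B", "Right"): [(1.0, "C")],
    ("C", "Left"): [(1.0, "B")], ("C", "Right"): [(1.0, "C")],
}
# R[(state, action)] = immediate reward
R = {("A", "Left"): 0, ("A", "Right"): 0,
     ("B", "Left"): 0, ("B", "Right"): 1,
     ("C", "Left"): 0, ("C", "Right"): 2}

def q_value(s, a, V):
    """Expected value of taking action a in state s, then following V."""
    return R[(s, a)] + GAMMA * sum(p * V[s2] for p, s2 in T[(s, a)])

def evaluate(policy, theta=1e-6):
    """Step 2: sweep the Bellman equation until V stops changing."""
    V = {s: 0.0 for s in STATES}
    while True:
        delta = 0.0
        for s in STATES:
            new_v = q_value(s, policy[s], V)
            delta = max(delta, abs(new_v - V[s]))
            V[s] = new_v
        if delta < theta:
            return V

def improve(V):
    """Step 3: pick the greedy (highest expected value) action everywhere."""
    return {s: max(ACTIONS, key=lambda a: q_value(s, a, V)) for s in STATES}

policy = {"A": "Left", "B": "Right", "C": "Left"}   # random initial policy
while True:
    V = evaluate(policy)
    new_policy = improve(V)
    if new_policy == policy:   # no change -> policy is optimal, stop
        break
    policy = new_policy

print(policy)   # -> {'A': 'Right', 'B': 'Right', 'C': 'Right'}
```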

Summary Table
Step                 What Happens
Policy Evaluation    Compute value of current policy
Policy Improvement   Check for better actions, update policy
Repeat               Until policy becomes stable (optimal)

Why Is It Important?
• It is a fundamental algorithm in dynamic programming.
• Helps AI agents learn the best behaviour.
• Works in problems where the model (transition and reward) is known.

What is a POMDP?
POMDP stands for Partially Observable Markov Decision Process.
It is used when an agent:
• Cannot fully observe the current state of the environment,
• Has to take decisions based on partial and noisy observations.
In simple words, the agent doesn't "see" the full picture but must still make smart decisions using probabilities.

Elements in a POMDP Decision Diagram


A POMDP involves the following components:

Element                 Description
States (S)              Possible situations in the environment (which the agent cannot fully see)
Actions (A)             Possible things the agent can do
Observations (O)        What the agent sees or senses (not always accurate)
Transition Model (T)    Probability of moving from one state to another when an action is taken
Observation Model (Z)   Probability of getting a certain observation from a state
Rewards (R)             The benefit or penalty for taking an action in a state
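As a sketch (not any library's API), these six elements can be bundled into a single Python structure like this:

```python
from dataclasses import dataclass
from typing import Callable, List

# A minimal container for the six POMDP elements listed above.
# T, Z and R are functions returning probabilities and rewards.

@dataclass
class POMDP:
    states: List[str]                     # S
    actions: List[str]                    # A
    observations: List[str]               # O
    T: Callable[[str, str, str], float]   # T(s, a, s') = P(s' | s, a)
    Z: Callable[[str, str, str], float]   # Z(s', a, o) = P(o | s', a)
    R: Callable[[str, str], float]        # R(s, a) = immediate reward
```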

[Figure: POMDP decision diagram]

Agent Tiger Problem – POMDP Example

Story Setup:
An agent is standing in front of two closed doors:

• Behind one door, there is a tiger.

• Behind the other door, there is a treasure.


The agent’s goal is to find the treasure without being eaten by the tiger.

But Here’s the Twist:


• The agent doesn’t know which door has the tiger.
• The agent can choose actions like:
o Open Left Door
o Open Right Door
o Listen (to get a clue where the tiger might be)
Listening gives a noisy observation – it might help, but it’s not 100% reliable.
Agent's Choices (Actions):

Action       Effect
Open Left    If tiger is there → penalty; if treasure → reward
Open Right   Same as above, but for the right door
Listen       Pay a small cost, and get a clue (e.g., hear growl from left or right)

What Makes It a POMDP?


Because the agent:
• Cannot fully observe the state (doesn’t know where the tiger is),
• Has to make decisions based on beliefs (probability of tiger being behind a door),
• Observations (like growling) are inaccurate (may mislead).

Agent Strategy:
1. Start with a belief (e.g., 50% tiger behind left, 50% behind right).
2. Listen once or more → update belief based on what it hears.
3. When belief becomes strong enough (e.g., 90% sure tiger is behind left), open the right door.
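Step 2 (updating the belief after listening) is just Bayes' rule. A Python sketch, assuming the agent hears the growl from the correct side 85% of the time; the 0.85 figure is the classic textbook value for this problem, not something stated in these notes:

```python
# Bayes-rule belief update for the tiger problem (0.85 accuracy assumed).

P_HEAR_CORRECTLY = 0.85

def update_belief(b_left, heard_left):
    """b_left = current P(tiger behind LEFT door); heard_left = growl from left?"""
    if heard_left:
        like_left, like_right = P_HEAR_CORRECTLY, 1 - P_HEAR_CORRECTLY
    else:
        like_left, like_right = 1 - P_HEAR_CORRECTLY, P_HEAR_CORRECTLY
    numerator = like_left * b_left
    return numerator / (numerator + like_right * (1 - b_left))

b = 0.5                        # start: 50-50 belief
b = update_belief(b, True)     # hear a growl from the left -> b = 0.85
b = update_belief(b, True)     # hear it again              -> b ≈ 0.97
print(b)                       # confident enough to open the RIGHT door
```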

Rewards and Penalties:

Action              Outcome                     Reward
Open correct door   Get treasure                +10
Open tiger door     Get eaten                   -100
Listen              Get a clue (costs energy)   -1

Real-World Use Case:


This problem is a toy example, but it reflects many real-life AI decisions, like:
• A robot searching in a risky environment,
• Medical diagnosis (guessing a disease based on symptoms),
• Autonomous vehicles avoiding hidden dangers.

Summary

Concept           Description
Hidden State      Tiger's position is unknown
Observations      Sounds or signals from listening
Belief State      Probability of tiger being behind a certain door
Decision Making   Based on updated beliefs and reward expectations


Wumpus World Problem

What is Wumpus World?


The Wumpus World is a grid-based world (usually 4x4) where an AI agent has to explore and find gold
while avoiding dangers like the Wumpus (a monster) and bottomless pits.
This problem is used to test:
• Knowledge-based agents
• Reasoning with uncertainty
• Logical decision-making

What’s Inside the Grid?


• Agent starts at square (1,1)
• Gold is placed in some cell → Goal is to find and grab it
• Wumpus: A monster that will kill the agent if they enter its cell
• Pits: If the agent falls into a pit → It dies
• Safe cells: Empty and safe to walk on
Sensory Perceptions of the Agent:
When the agent enters a cell, it perceives the following clues:

Percept   Meaning
Stench    Wumpus is in a neighboring cell
Breeze    There is a pit in a neighboring cell
Glitter   Gold is in the same cell
Bump      Agent has hit a wall
Scream    Wumpus has been killed

These clues help the agent guess what might be around.

Agent's Actions:

Action            Description
Move Forward      Moves one cell in the facing direction
Turn Left/Right   Changes facing direction
Grab              Picks up gold
Shoot             Fires an arrow to kill the Wumpus
Climb             Used to exit the cave

The agent gets only one arrow, so must use it wisely!

Rewards & Penalties:

Situation        Reward/Penalty
Finding gold     +100
Falling in pit   -1000
Getting eaten    -1000
Using arrow      -10
Every move       -1

How Does the Agent Decide?
The agent:
• Uses logic (like propositional logic) to infer safe and dangerous cells.
• Maintains a knowledge base (KB).
• Updates KB using percepts from the environment.
• Chooses actions that maximize expected reward and avoid risk.

Example Inference:
If cell (1,2) has a breeze, the agent can infer:
“One of the neighboring cells might contain a pit”
It will then mark them as 'possibly dangerous' and avoid them until more information is collected.
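A toy Python sketch of this kind of inference; the grid coordinates and helper names are illustrative, not from the notes:

```python
# Mark every unvisited neighbour of a breezy cell as 'possibly dangerous'.

def neighbours(cell, size=4):
    x, y = cell
    steps = [(1, 0), (-1, 0), (0, 1), (0, -1)]
    return [(x + dx, y + dy) for dx, dy in steps
            if 1 <= x + dx <= size and 1 <= y + dy <= size]

visited = {(1, 1), (1, 2)}   # cells already explored and known safe
breeze_at = (1, 2)           # percept: breeze felt here

possibly_pit = {c for c in neighbours(breeze_at) if c not in visited}
print(possibly_pit)   # -> {(2, 2), (1, 3)}: avoid until more info arrives
```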

Goal of the Agent:


1. Grab the gold
2. Avoid dying
3. Exit the cave safely
4. Try to optimize score while exploring logically

Summary Table:

Concept          Description
Environment      4x4 grid with gold, pits, Wumpus
Agent's goal     Grab gold, stay alive, exit
Senses           Breeze, Stench, Glitter, Bump, Scream
Reasoning type   Logical inference using propositional logic

Why is Wumpus World Important?


• It teaches how an agent can work in uncertain, dangerous environments.
• It’s a great example of logical reasoning, percept-based decisions, and knowledge representation in
AI.
