02 Agents
Artificial Intelligence
Intelligent Agents
AIMA Chapter 2

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Image: "Robot at the British Library Science Fiction Exhibition" by BadgerGravling
Outline
• What is an intelligent agent?
• Rationality
• PEAS (Performance measure, Environment, Actuators, Sensors)
• Environment types
• Agent types
What is an Agent?
• An agent is anything that can be viewed as perceiving its environment through sensors and acting upon that environment through actuators.
• The agent function maps the percept sequence to an action: f : P* → A, i.e., a = f(p).
• An agent is characterized by its:
  • Sensors
  • Memory
  • Computational power
Example: Vacuum-cleaner World
• Percepts: location and status, e.g., [A, Dirty]
• Actions: Left, Right, Suck, NoOp
The agent acts based on the most recent percept p.
Rational Agents: What is Good Behavior?
Foundation:
• Consequentialism: Evaluate behavior by its consequences.
• Utilitarianism: Maximize happiness and well-being.
a = argmax_{a ∈ A} E[U(a)]
Rational Agents
Rule: Pick the action that maximizes the expected utility:
a = argmax_{a ∈ A} E[U(a)]
This means:
• It is rational to explore and learn, i.e., use percepts to supplement prior knowledge and become autonomous.
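The rule above can be sketched in a few lines of Python. The actions, outcome probabilities, and utilities below are made-up numbers purely for illustration; a real agent would get them from a model of its environment.

```python
# Hypothetical outcome model: each action leads to possible outcomes,
# each a (probability, utility) pair. All numbers are illustrative.
outcomes = {
    "explore": [(0.5, 10.0), (0.5, -2.0)],
    "stay":    [(1.0, 3.0)],
}

def expected_utility(action):
    """E[U(a)]: sum of P(outcome) * U(outcome) over all outcomes of a."""
    return sum(p * u for p, u in outcomes[action])

# Rational choice: a = argmax_{a in A} E[U(a)]
best_action = max(outcomes, key=expected_utility)
print(best_action)  # explore (E[U] = 4.0 beats 3.0)
```

Note that the rational choice here is "explore" even though it can yield a negative outcome; rationality is about expected, not guaranteed, utility.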
Problem Specification: PEAS
(Performance measure, Environment, Actuators, Sensors)

Example: Automated Taxi
• Performance measure: safe, fast, legal, comfortable trip, maximize profits
• Environment: roads, other traffic, pedestrians, customers
• Actuators: steering wheel, accelerator, brake, signal, horn
• Sensors: cameras, sonar, speedometer, GPS, odometer, engine sensors, keyboard
Example: Spam Filter
• Performance measure: accuracy, i.e., minimizing false positives and false negatives
• Environment: a user's email account, the email server
• Actuators: mark as spam, delete, etc.
• Sensors: incoming messages, other information about the user's account
Environment Types
• Fully observable: The agent's sensors give it access to the complete state of the environment; the agent can "see" the whole environment.
  vs. Partially observable: The agent cannot see all aspects of the environment, e.g., it cannot see through walls.
• Known: The agent knows the rules of the environment and can predict the outcomes of actions.
  vs. Unknown: The agent cannot predict the outcomes of actions.
Environment Types
• Static: The environment does not change while the agent is deliberating.
  vs. Dynamic: The environment changes while the agent is deliberating.
  Semidynamic: The environment is static, but the agent's performance score depends on how fast it acts.
• Discrete: The environment provides a fixed number of distinct percepts, actions, and environment states. Time can also evolve in a discrete or continuous fashion.
  vs. Continuous: Percepts, actions, state variables, or time are continuous, leading to an infinite state, percept, or action space.
• Single agent: An agent operating by itself in an environment.
  vs. Multi-agent: Agents cooperate or compete in the same environment.
Examples of Different Environments
* Can be modeled as a single-agent problem with the other agent(s) as part of the environment.
Designing a Rational Agent
Remember the definition of a rational agent:
"For each possible percept sequence, a rational agent should select an action that maximizes its expected performance measure, given the evidence provided by the percept sequence and the agent's built-in knowledge."

The Agent Function
The agent function f receives a percept and returns an action. To do so, it must:
• Assess the performance measure
• Remember the percept sequence
• Use built-in knowledge
Note: Everything outside the agent function can be seen as the environment.
Hierarchy of Agent Types
• Utility-based agents
• Goal-based agents
• Model-based reflex agents
• Simple reflex agents

Simple Reflex Agent
a = f(p)
The interaction is a sequence: p0, a0, p1, a1, p2, a2, …, pt, at, …
Example: A simple vacuum cleaner that uses rules based on its current sensor input.
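A minimal sketch of such a reflex vacuum agent for the two-location world described earlier: the action depends only on the current percept p = (location, status), so a = f(p).

```python
# Simple reflex agent for the vacuum-cleaner world.
# The percept is (location, status), e.g., ("A", "Dirty");
# the actions are Suck, Left, Right (NoOp is never chosen here).
def reflex_vacuum_agent(percept):
    location, status = percept
    if status == "Dirty":
        return "Suck"
    if location == "A":
        return "Right"  # square A is clean, move on
    return "Left"       # square B is clean, move back

print(reflex_vacuum_agent(("A", "Dirty")))  # Suck
print(reflex_vacuum_agent(("A", "Clean")))  # Right
```

Note the agent has no memory: it cannot know whether the other square is already clean, so it may shuttle back and forth forever.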
Model-based Reflex Agent
• Maintains a state variable to keep track of aspects of the environment that cannot currently be observed. I.e., it has memory and knows how the environment reacts to actions (via the so-called transition function).
• The state is updated using the percept.
• There is now more information for the rules to make better decisions.
s' = T(s, a)
a = f(p, s)
The interaction is a sequence: s0, a0, p1, s1, a1, p2, s2, a2, p3, …, pt, st, at, …
[Figure: state diagram with the states "Light is off" and "Light is on", connected by switch actions such as "switch off".]
• States change because of:
a. System dynamics of the environment.
b. The actions of the agent.
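The light-switch figure can be turned into a small model-based agent sketch. The domain, percepts, and rules below are made up to match the figure; the structure (a = f(p, s) with state update s' = T(s, a)) is the point.

```python
# Model-based reflex agent sketch for the light-switch example.
def transition(state, action):
    """T(s, a): how the agent believes the light responds to actions."""
    if action == "switch on":
        return "on"
    if action == "switch off":
        return "off"
    return state  # no-op leaves the state unchanged

def agent_rule(percept, state):
    """f(p, s): turn the light on when the room is dark and the light is off."""
    if percept == "dark" and state == "off":
        return "switch on"
    return "no-op"

state = "off"                        # internal state s (memory)
action = agent_rule("dark", state)   # a = f(p, s)
state = transition(state, action)    # update internal state: s' = T(s, a)
print(action, state)  # switch on, then state is "on"
```

Unlike the simple reflex agent, this agent remembers whether the light is on even when it cannot currently observe it.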
We often construct atomic labels from factored information. E.g.: If the agent’s state is
the coordinate x = 7 and y = 3, then the atomic state label could be the string “(7, 3)”.
With the atomic representation, we can only compare if two labels are the same. With
the factored state representation, we can reason more and calculate the distance
between states!
The set of all possible states is called the state space 𝑺𝑺. This set is typically very large!
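The difference between the two representations is easy to see in code, using the "(7, 3)" example from above:

```python
# Atomic labels only support equality comparison.
atomic_a, atomic_b = "(7, 3)", "(2, 5)"
print(atomic_a == atomic_b)  # False -- that is all we can say

# Factored states support reasoning, e.g., a Manhattan distance.
factored_a, factored_b = (7, 3), (2, 5)
distance = abs(factored_a[0] - factored_b[0]) + abs(factored_a[1] - factored_b[1])
print(distance)  # 7
```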
Old-school vs. Smart Thermostat
Goal-based Agent
The interaction is a sequence: s0, a0, p1, s1, a1, p2, s2, a2, …, s_goal, where each action incurs a cost.
Example: Solving a puzzle. What action gets me closer to the solution?
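A toy sketch of the idea: the agent picks the action whose successor state is closest to the goal. The one-dimensional "puzzle", its actions, and the cost function are invented for illustration, and the greedy rule here stands in for the full search/planning algorithms covered later.

```python
# Greedy goal-based agent on a toy number-line puzzle: reach state 10.
GOAL = 10
ACTIONS = ["increment", "decrement"]

def successor(state, action):
    """Predicted next state for each action."""
    return state + (1 if action == "increment" else -1)

def cost_to_goal(state):
    """Estimated cost: distance from the goal state."""
    return abs(GOAL - state)

def goal_based_agent(state):
    """Pick the action whose successor is closest to the goal."""
    return min(ACTIONS, key=lambda a: cost_to_goal(successor(state, a)))

print(goal_based_agent(4))   # increment
print(goal_based_agent(12))  # decrement
```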
Utility-based Agent
• The agent uses a utility function to evaluate the desirability of each possible state. This is typically expressed as the reward of being in a state, R(s).
• Choose actions to stay in desirable states.
• Performance measure: The discounted sum of expected utility over time.
a = argmax_{a0 ∈ A} E[ Σ_{t=0}^∞ γ^t r_t ]
Utility is the expected future discounted reward.
The interaction is a sequence: s0, a0, p1, s1, a1, p2, s2, a2, …, with a reward r at each step.
Example: An autonomous Mars rover prefers states where its battery is not critically low.
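The discounted sum in the formula above is easy to compute for a known reward sequence. The rewards below are made-up numbers; γ = 0.9 is a typical choice of discount factor.

```python
# Utility as future discounted reward: U = sum over t of gamma^t * r_t.
gamma = 0.9
rewards = [1.0, 0.0, 5.0, 2.0]  # r_0, r_1, r_2, r_3 (illustrative)

utility = sum(gamma**t * r for t, r in enumerate(rewards))
print(round(utility, 3))  # 1 + 0 + 0.81*5 + 0.729*2 = 6.508
```

The discount factor γ < 1 makes near-term rewards count more than distant ones and keeps the infinite sum finite.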
Agents that Learn
• Learning requires exploration of the environment.
Example: Smart Thermostat
Old-school thermostat rule: change the temperature when you are too cold/warm.

Smart thermostat:
Percepts:
• Temperature (deg. F)
• Outside temperature
• Weather report
• Energy curtailment
• Someone walking by
• Someone changes the temperature
• Day & time
• …
Factored states:
• Estimated time to cool the house
• Someone home?
• How long till someone is coming home?
• A/C: on, off
Example: Modern Vacuum Robot
Features are:
• Control via App
• Cleaning Modes
• Navigation
• Mapping
• Boundary blockers
Source: https://www.techhive.com/article/3269782/best-robot-vacuum-cleaners.html
PEAS Description of a Modern Robot Vacuum
Performance measure | Environment | Actuators | Sensors
What Type of Intelligent Agent is a Modern Robot Vacuum?
How does ChatGPT work?

What Type of Intelligent Agent is ChatGPT?
• Utility-based agent? Does it collect utility over time? How would the utility for each state be defined?
• Is it learning?
Environment Types
• Deterministic: Percepts are 100% reliable, and changes in the environment are completely determined by the current state of the environment and the agent's action.
  vs. Stochastic:
  • Percepts are unreliable (noise distribution, sensor failure probability, etc.). This is called a stochastic sensor model.
  • The transition function is stochastic, leading to transition probabilities and a Markov process.
• Known: The agent knows the transition function.
  vs. Unknown: The agent needs to learn the transition function by trying actions.
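A stochastic transition function can be sketched as a table of probabilities. The vacuum-world states and the 0.9/0.1 probabilities below are invented for illustration of the idea that T(s, a) yields a distribution over next states, not a single state.

```python
import random

# Stochastic transition model: T(s, a) is a distribution over next states.
# Here, "suck" fails to clean a dirty square 10% of the time (illustrative).
transition_probs = {
    ("dirty", "suck"): [("clean", 0.9), ("dirty", 0.1)],
    ("clean", "suck"): [("clean", 1.0)],
}

def sample_next_state(state, action, rng):
    """Draw a next state according to the transition probabilities."""
    states, probs = zip(*transition_probs[(state, action)])
    return rng.choices(states, weights=probs, k=1)[0]

rng = random.Random(42)  # fixed seed for reproducibility
samples = [sample_next_state("dirty", "suck", rng) for _ in range(1000)]
print(samples.count("clean"))  # roughly 900 of 1000
```

An agent in an unknown environment could estimate exactly such a table from the frequencies of observed (s, a, s') triples.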
We will spend the whole semester discussing algorithms that can deal with environments that have different combinations of these properties.
Conclusion