
Introduction to

Multi-Agent Systems
Michal Pechoucek, Branislav Bošanský & Michal Jakob
General Information
Lecturers: Prof. Michal Pěchouček, Dr. Branislav Bošanský, Dr. Michal Jakob
Tutorials: Branislav Bošanský and Karel Horak
14 lectures and 14 tutorials
Course web page:
https://cw.fel.cvut.cz/wiki/courses/be4m36mas/start
Recommended reading:
– J. M. Vidal: Multiagent Systems: with NetLogo Examples (on-line)
– Y. Shoham and K. Leyton-Brown: Multiagent Systems: Algorithmic, Game-Theoretic, and Logical Foundations (on-line)
– Russell and Norvig: Artificial Intelligence: A Modern Approach

Selected illustrations taken from Russell and Norvig – Artificial Intelligence: A Modern Approach
Selected slides have been prepared by Terry Payne, University of Liverpool, who kindly provided them for OI/MAS
Outline of Lecture 1
1. Motivational Introduction
2. Defining Agency
3. Specifying Agents
4. Agent Architectures

!3
Introduction to Multiagent Systems

Motivational Introduction
Autonomous Agents and Multiagent Systems

A multiagent system is a collection of multiple autonomous agents, each acting towards its own objectives while all interacting in a shared environment, being able to communicate and possibly coordinate their actions.

Autonomous agent ~ intelligent agent (see later).

!5
Why Intelligent Agents?
1992: computers everywhere
– lots of computerised data
– computer driven manufacturing, production planning, diagnostics

!6
Why Intelligent Agents?
1992: computers everywhere
– lots of computerised data
– computer driven manufacturing, production planning, diagnostics

– AI: expert systems, automated planning, machine learning

!7
Why Intelligent Agents?
1992: computers everywhere
Y2K: internet everywhere
– data provisioning via the internet, search (Google from 1998, 3 billion documents by 2001)

– an explosion of internet shopping (Amazon from 1995, eBay from 1996)

!8
Why Intelligent Agents?
1992: computers everywhere
Y2K: internet everywhere
– data provisioning via the internet, search (Google from 1998, 3 billion documents by 2001)

– an explosion of internet shopping (Amazon from 1995, eBay from 1996)

– parallel computing (map-reduce)


– statistical data analysis and machine learning
– networking, servers

!9
Why Intelligent Agents?
1992: computers everywhere
Y2K: internet everywhere
NOW: internet of everything
– mobile computing
– cloud computing
– wireless enabled devices

!10
Why Intelligent Agents?
1992: computers everywhere
Y2K: internet everywhere
NOW: internet of everything
– mobile computing
– cloud computing
– wireless enabled devices

– Intelligent Agents and Multiagent systems

!11
Why Intelligent Agents?
1992: computers everywhere
Y2K: internet everywhere
NOW: internet of everything
– mobile computing
– cloud computing
– wireless enabled devices

– Intelligent Agents and Multiagent systems

Latest trends in computing:
Ubiquity: Cost of processing power decreases dramatically (e.g. Moore's Law); computers are used everywhere
Interconnection: Formerly only user-computer interaction, nowadays distributed/networked machine-to-machine interactions (e.g. Web APIs)
Complexity: Elaboration of tasks carried out by computers has grown
Delegation: Giving control to computers even in safety-critical tasks (e.g. aircraft or nuclear plant control)
Human-orientation: Increasing use of metaphors that better reflect human intuition from everyday life (e.g. GUIs, speech recognition, object orientation)

!12
Agents briefly
multi-agent system is a decentralized multi-actor (software) system, often geographically distributed, whose behavior is defined and implemented by means of complex, peer-to-peer interaction among autonomous, rational and deliberative entities.
autonomous agent is a special kind of intelligent software program that is capable of highly autonomous rational action, aimed at achieving the private objective of the agent – it can exist on its own but often is a component of a multi-agent system – the agent is autonomous, reactive, proactive and social
agent researchers study problems of integration, communication, reasoning and knowledge representation, competition (games) and cooperation (robotics), agent-oriented software engineering, …

!13
Agents briefly
agent technology is software technology supporting the development of autonomous agents and multi-agent systems
agent-based computing is a special research domain, a subfield of computer science and artificial intelligence, that studies the concepts of autonomous agents
multi-agent application is a software system whose functionality is given by interaction among autonomous software/hardware/human components
- but also a monolithic software application that is autonomously operating within a community of autonomously acting software applications, hardware systems or human individuals

!14
Key properties of Intelligent Agent
Autonomy: Agent is fully accountable for its given state. Agent accepts requests from other agents or the environment but decides individually about its actions.
Reactivity: Agent is capable of near-real-time decisions with respect to changes in the environment or events in its social neighbourhood.
Intentionality: Agent maintains long-term intentions; the agent meets the designer's objectives. It knows its purpose and pursues it even if not requested.

!15
Key properties of Intelligent Agent
Rationality: Agent is capable of intelligent rational decision making. Agent can analyze future courses of action and choose an action which maximizes its utility.
Social capability: Agent is aware of either:
(i) the existence,
(ii) the communication protocols,
(iii) the capabilities and services provided by the other agents.
Agent can reason about other agents.

!16
Reactivity
• If a program’s environment is guaranteed to be fixed, the
program need never worry about its own success or
failure

• Program just executes blindly.


• Example of fixed environment: compiler.

• The real world is not like that: most environments are


dynamic and information is incomplete.
18 Copyright: M. J. Wooldridge, S.Parsons and T.R.Payne, Spring 2013. Updated 2018

!17
Reactivity

• Software is hard to build for dynamic domains: program


must take into account possibility of failure
• ask itself whether it is worth executing!

• A reactive system is one that maintains an ongoing


interaction with its environment, and responds to changes
that occur in it (in time for the response to be useful).

19 Copyright: M. J. Wooldridge, S.Parsons and T.R.Payne, Spring 2013. Updated 2018

!18
Proactiveness
Proactiveness
• Reacting to an environment is easy
• e.g., stimulus → response rules

• But we generally want agents to do things for us.


• Hence goal directed behaviour.

• Pro-activeness = generating and attempting to achieve


goals; not driven solely by events; taking the initiative.
• Also: recognising opportunities.
20 Copyright: M. J. Wooldridge, S.Parsons and T.R.Payne, Spring 2013. Updated 2018

!19
Social Ability
• The real world is a multi-agent environment: we cannot go
around attempting to achieve goals without taking others into
account.
• Some goals can only be achieved by interacting with others.
• Similarly for many computer environments: witness the INTERNET.

• Social ability in agents is the ability to interact with other


agents (and possibly humans) via cooperation, coordination,
and negotiation.
• At the very least, it means the ability to communicate. . .

21 Copyright: M. J. Wooldridge, S.Parsons and T.R.Payne, Spring 2013. Updated 2018

!20
Social Ability: Cooperation

• Cooperation is working together as


a team to achieve a shared goal.

• Often prompted either by the fact that


no one agent can achieve the goal
alone, or that cooperation will obtain
a better result (e.g., get result faster).

22 Copyright: M. J. Wooldridge, S.Parsons and T.R.Payne, Spring 2013. Updated 2018

!21
Social Ability: Coordination
• Coordination is managing the
interdependencies between
activities.

• For example, if there is a non-


sharable resource that you want to
use and I want to use, then we need
to coordinate.

23 Copyright: M. J. Wooldridge, S.Parsons and T.R.Payne, Spring 2013. Updated 2018

!22
Social Ability: Negotiation
• Negotiation is the ability to reach
agreements on matters of common
interest.

• For example:
• You have one TV in your house; you want to watch a
movie, your housemate wants to watch football.
• A possible deal: watch football tonight, and a movie
tomorrow.

• Typically involves offer and counter-offer,


with compromises made by participants.
24 Copyright: M. J. Wooldridge, S.Parsons and T.R.Payne, Spring 2013. Updated 2018

!23
Some Other Properties
• Mobility
  • The ability of an agent to move. For software agents this movement is around an electronic network.
• Veracity
  • Whether an agent will knowingly communicate false information.
• Benevolence
  • Whether agents have conflicting goals, and thus whether they are inherently helpful.
• Rationality
  • Whether an agent will act in order to achieve its goals, and will not deliberately act so as to prevent its goals being achieved.
• Learning/adaptation
  • Whether agents improve performance over time.

25 Copyright: M. J. Wooldridge, S.Parsons and T.R.Payne, Spring 2013. Updated 2018

!24
Agents vs. Objects
agent's behaviour is unpredictable as observed from the outside, the agent is situated in the environment, the communication model is asynchronous, the agent is autonomous, …

!25
Agents vs. Objects
agent's behaviour is unpredictable as observed from the outside, the agent is situated in the environment, the communication model is asynchronous, the agent is autonomous, …

agents are programs and they are built out of objects

→ while objects often consist of other objects, and objects together make up an object, agents never contain other agents; agents together build a multiagent system

!26
Multiagent Systems Engineering &
Agent Oriented Software Engineering

Novel paradigm for building robust, scalable and extensible


control, planning and decision-making systems
– socially-inspired computing
– self-organized teamwork systems
– distributed (collective) artificial intelligence

MAS become increasingly relevant as the connectivity, intelligence


and autonomy of devices grows!

Software engineering methodology for designing MAS


!27
Multiagent Design Problem
Traditional design problem: How can I build a system that
produces the correct output given some input?
– Each system is more or less isolated, built from scratch

!29
Multiagent Design Problem
Traditional design problem: How can I build a system that
produces the correct output given some input?
– Each system is more or less isolated, built from scratch

Multiagent design problem: How can I build a system that can


operate independently on my behalf in a networked, distributed,
large-scale environment in which it will need to interact with
different other components pertaining to other users?
– Each system is built into an existing, persistent but constantly evolving
computing ecosystem – it should be robust with respect to changes
– No single owner and/or central authority

!30
Types of Agent Systems

– single-agent
– multi-agent
  – cooperative: single shared utility
  – competitive: multiple different utilities

!31
Micro vs. Macro MAS Engineering

1. The agent design problem (micro perspective):


How should agents act to carry out their tasks?
2. The society design problem (macro perspective):
How should agents interact to carry out their tasks?

!32
Opportunities for MAS Deployment
Agent-based computing has been used as:
1. Design paradigm – the concept of decentralized, interacting, socially
aware, autonomous entities as underlying software paradigm (often deployed
only in parts, where it suits the application)
2. Source of technologies – algorithms, models, techniques, architectures,
protocols, but also software packages that facilitate the development of
multi-agent systems
3. Simulation concept – a specialized software technology that allows
simulation of natural multi-agent systems, based on (1) and (2).

!37
Opportunities for MAS Deployment
Agent-based computing has been used as:
1. Design paradigm – the concept of decentralized, interacting, socially
aware, autonomous entities as underlying software paradigm (often deployed
only in parts, where it suits the application)
2. Source of technologies – algorithms, models, techniques, architectures,
protocols, but also software packages that facilitate the development of
multi-agent systems
3. Simulation concept – a specialized software technology that allows
simulation of natural multi-agent systems, based on (1) and (2).

Agent Oriented Software Engineering – provides designers and developers with a way of structuring an application around autonomous, communicative elements, and leads to the construction of software tools and infrastructures to support this metaphor

!38
Opportunities for MAS Deployment
Agent-based computing has been used as:
1. Design paradigm – the concept of decentralized, interacting, socially
aware, autonomous entities as underlying software paradigm (often deployed
only in parts, where it suits the application)
2. Source of technologies – algorithms, models, techniques, architectures,
protocols, but also software packages that facilitate the development of
multi-agent systems
3. Simulation concept – a specialized software technology that allows
simulation of natural multi-agent systems, based on (1) and (2).

Multi-Agent Techniques – provide a selection of specific computational techniques and algorithms for dealing with collectives of computational processes and the complexity of interactions in dynamic and open environments.

!39
Opportunities for MAS Deployment
Agent-based computing has been used as:
1. Design paradigm – the concept of decentralized, interacting, socially
aware, autonomous entities as underlying software paradigm (often deployed
only in parts, where it suits the application)
2. Source of technologies – algorithms, models, techniques, architectures,
protocols, but also software packages that facilitate the development of
multi-agent systems
3. Simulation concept – a specialized software technology that allows
simulation of natural multi-agent systems, based on (1) and (2).

Multi-Agent Simulation – provides expressive models for representing complex and dynamic real-world environments, with the emphasis on capturing the interaction-related properties of such systems
!40
Intelligent Agents Applications
– Manufacturing and production
– Air traffic and space
– Traffic and logistics
– Security applications
– Robotics, autonomous systems
– Energy and smart grids

!41
Course Content
– Agent architectures
– Non-cooperative game theory
– Coalition game theory
– Mechanism design
– Auctions
– Social choice
– Distributed constraint reasoning
– Agent based simulation

!42
Introduction to Multi-Agent Systems

Defining Agency

!43
What is an Agent?

Definition (Russell & Norvig): An agent is anything that can


perceive its environment (through its sensors) and act upon
that environment (through its effectors)

Focus on situatedness in the environment (embodiment)


The agent can only influence the environment but not fully control it
(sensor/effector failure, non-determinism)
What is an Agent?

Definition (Wooldridge & Jennings): An agent is a computer


system that is situated in some environment, and that is
capable of autonomous action in this environment in order to
meet its design objectives/delegated goals.

Adds a second dimension to agent definition: the relationship


between agent and designer/user
– agent is capable of independent action
– agent action is purposeful

Autonomy is a central, distinguishing property of agents

!45
Rational Behaviour
Definition (Russell & Norvig): Rational agent chooses
whichever action maximizes the expected value of the
performance measure given the percept sequence to date and
whatever built-in knowledge the agent has.

Rationality is relative and depends on four aspects:


1. performance measure for the degree of success
2. percept sequence (complete perceptual history)
3. agent’s knowledge about the environment
4. actions available to the agent

!46
Agent Environments

(figures characterising environments along the dimensions used in the table below: observable, deterministic, episodic, static, discrete, single-agent)
Properties of Environments

• Real time
• A real time interaction is one in which time plays a part in evaluating an agent's performance
• Such interactions include those in which:
• A decision must be made about some action within a given time bound
• Some state of affairs must occur as quickly as possible
• An agent has to repeat some task, with the objective to repeat the task as often as possible

16 Copyright: M. J. Wooldridge, S.Parsons and T.R.Payne, Spring 2013. Updated 2018

!53
Example Environments

                 Solitaire   Backgammon   Shopping                Taxi
Observable       No          Yes          No                      No
Deterministic    Yes         No           Partly                  No
Episodic         No          No           No                      No
Static           Yes         Semi         Semi                    No
Discrete         Yes         Yes          Yes                     No
Single-agent     Yes         No           Yes (except auctions)   No

!54
Rational Behaviour
Definition (Russell & Norvig): Rational agent chooses
whichever action maximizes the expected value of the
performance measure given the percept sequence to date and
whatever built-in knowledge the agent has.

Rationality is relative and depends on four aspects:


1. performance measure for the degree of success
2. percept sequence (complete perceptual history)
3. agent’s knowledge about the environment
4. actions available to the agent

!55
Abstract Architectures for Agents
• Assume the world may be in any of a finite set E of discrete, instantaneous states:

  E = {e, e′, …}

• Agents are assumed to have a repertoire of possible actions, Ac, available to them, which transform the state of the world:

  Ac = {α, α′, …}

• Actions can be non-deterministic, but only one state ever results from an action.

• A run, r, of an agent in an environment is a sequence of interleaved world states and actions:

  r : e₀ →^{α₀} e₁ →^{α₁} e₂ →^{α₂} e₃ → ⋯ →^{α_{u−1}} e_u
Abstract Architectures for Agents (1)
• When actions are deterministic each state has only
one possible successor.
• A run would look something like the following:

(figure: example run taking actions North, North)
Abstract Architectures for Agents (2)
• When actions are deterministic each state has only
one possible successor.
• A run would look something like the following:

(figure: example run taking actions East, North)
Abstract Architectures for Agents

(figure: the example runs drawn as a graph of states and actions)

We could
illustrate this
as a graph...
Abstract Architectures for Agents

(figure: the set of possible runs under non-deterministic actions)

When actions are


non-deterministic a
run (or trajectory) is
the same, but the set
of possible runs is
more complex.
Runs

• In fact it is more complex still, because all of the runs we


pictured start from the same state.

• Let: R be the set of all such possible finite sequences (over E and Ac);
R^{Ac} be the subset of these that end with an action; and
R^{E} be the subset of these that end with a state.

• We will use r,r′,... to stand for the members of R


• These sets of runs contain all runs from all starting states.
Environments
• A state transformer function represents the behaviour of the environment:

  τ : R^{Ac} → 2^{E}

• Note that environments are...
  • history dependent: the next state depends not only on the current action of the agent; an earlier action may be significant
  • non-deterministic: there is some uncertainty about the result

• If τ(r) = ∅ there are no possible successor states to r, so we say the run has ended. ("Game over.")

• An environment Env is then a triple Env = ⟨E, e₀, τ⟩ where E is the set of states, e₀ ∈ E is the initial state, and τ is the state transformer function.
Agents

• We can think of an agent as being a function which


maps runs to actions:  Ag : R^{E} → Ac

• Thus an agent makes a decision about what action


to perform
• based on the history of the system that it has witnessed to date.

• Let Ag be the set of all agents.


System

• A system is a pair containing an agent and an


environment.

• Any system will have associated with it a set of possible


runs
• We denote the set of runs of agent Ag in environment Env by:

R(Ag, Env)

• Assume that this only contains runs that have ended.


Systems

Formally, a sequence

  (e₀, α₀, e₁, α₁, e₂, …)

represents a run of an agent Ag in environment Env = ⟨E, e₀, τ⟩ if:

1. e₀ is the initial state of Env;
2. α₀ = Ag(e₀); and
3. for u > 0,
   e_u ∈ τ((e₀, α₀, …, α_{u−1})) and
   α_u = Ag((e₀, α₀, …, e_u))
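
A minimal Python sketch of these definitions (the function names and the toy environment below are illustrative assumptions, not taken from the slides): the agent is a function from state-ending runs to actions, τ maps action-ending runs to sets of possible successor states, and a run is built by interleaving the two.

import random

def generate_run(e0, tau, agent, max_steps=10):
    """Generate one run (e0, a0, e1, a1, ...) of `agent` in the environment (E, e0, tau)."""
    run = [e0]
    for _ in range(max_steps):
        action = agent(tuple(run))            # alpha_u = Ag((e0, a0, ..., e_u)); run currently ends in a state
        run.append(action)
        successors = tau(tuple(run))          # tau maps an action-ending run to the set of possible next states
        if not successors:                    # tau(r) = empty set: the run has ended
            break
        run.append(random.choice(list(successors)))  # non-determinism: one successor state actually occurs
    return tuple(run)

# Toy example (illustrative): alpha0 from e0 may lead to e1 or e2, after which the run ends.
tau = lambda r: {"e1", "e2"} if r == ("e0", "a0") else set()
agent = lambda r: "a0"
print(generate_run("e0", tau, agent))  # e.g. ('e0', 'a0', 'e1', 'a0') - the final action has no successors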
Why the notation?

• Well, it allows us to get a precise handle on some ideas about


agents.
• For example, we can tell when two agents are the same.

• Of course, there are different meanings for “same”. Here is one


specific one.

Two agents are said to be behaviorally equivalent with respect to Env iff R(Ag₁, Env) = R(Ag₂, Env).

• We won’t be able to tell two such agents apart by watching what they
do.
Deliberative Agents

(figure: grid-world runs reaching the same state by different routes)

Potentially the agent will reach a different decision when it reaches the same state by different routes.
Purely Reactive Agents

• Some agents decide what to do without reference to their history


• they base their decision making entirely on the present,
with no reference at all to the past.

• We call such agents purely reactive:

  action : E → Ac

• A thermostat is a purely reactive agent:

  action(e) = off,  if e = temperature OK
              on,   otherwise
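
The thermostat rule above translates directly into a few lines of Python (an illustrative sketch; the state strings are assumptions):

def thermostat_action(e: str) -> str:
    # Purely reactive: the action depends only on the current environment state e.
    return "off" if e == "temperature OK" else "on"

print(thermostat_action("temperature OK"))  # off
print(thermostat_action("too cold"))        # on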
Reactive Agents

(figure: grid-world runs)

A reactive agent will always do the same thing in the same state.
Purely Reactive Robots

• A simple reactive program for a robot might be:


– Drive forward until you bump into something. Then, turn to the
right. Repeat.
Agents with State

(figure: an agent containing see, next and action components, perceiving and acting on its environment)
Perception

• The see function is the agent’s ability to observe its environment,


whereas the action function represents the agent’s decision making
process.

• Output of the see function is a percept:


see : E → Per
• ...which maps environment states to percepts.

• The agent has some internal data structure, which is typically used to
record information about the environment state and history.

• Let I be the set of all internal states of the agent.


Actions and Next State Functions

• The action-selection function action is now defined as a


mapping from internal states to actions:
action : I → Ac
• An additional function next is introduced, which maps an
internal state and percept to an internal state:

next : I × Per → I
• This says how the agent updates its view of the world
when it gets a new percept.
Agent Control Loop

1. Agent starts in some initial internal state i0 .


2. Observes its environment state e, and generates a percept see(e).
3. Internal state of the agent is then updated via next function, becoming
next(i0 , see(e)).

4. The action selected by the agent is action(next(i0 , see(e))).


This action is then performed.
5. Goto (2).
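
A compact Python sketch of this control loop (illustrative names; step() performs points 2-4 above, and calling it repeatedly gives point 5):

class StatefulAgent:
    """Sketch of the see / next / action architecture."""
    def __init__(self, see, next_fn, action, i0):
        self.see = see            # see : E -> Per
        self.next_fn = next_fn    # next : I x Per -> I
        self.action = action      # action : I -> Ac
        self.i = i0               # internal state, initially i0 (step 1)

    def step(self, e):
        percept = self.see(e)                    # step 2: observe e, generate percept see(e)
        self.i = self.next_fn(self.i, percept)   # step 3: update internal state via next
        return self.action(self.i)               # step 4: select action(next(i, see(e)))

# Example: an agent that remembers the highest temperature perceived so far.
agent = StatefulAgent(see=lambda e: e,
                      next_fn=lambda i, p: max(i, p),
                      action=lambda i: "cool" if i > 25 else "idle",
                      i0=0)
print(agent.step(20), agent.step(30))  # idle cool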
Tasks for Agents
• We build agents in order to carry out tasks for us.
– The task must be specified by us. . .

• But we want to tell agents what to do without telling


them how to do it.
– How can we make this happen???
Utility functions
• One idea:
– associate rewards with states that we want agents to bring
about.
– We associate utilities with individual states
• the task of the agent is then to bring about states that maximise
utility.

• A task specification is then a function which associates


a real number with every environment state:

u : E → ℝ
Local Utility Functions
• But what is the value of a run...
– minimum utility of state on run?
– maximum utility of state on run?
– sum of utilities of states on run?
– average?

• Disadvantage:
– difficult to specify a long term view when assigning utilities to
individual states.

• One possibility:
– a discount for states later on. This is what we do in reinforcement
learning.
Example of local utility function

(figure: 4×3 grid world; default reward r = -0.04 unless stated otherwise, one goal cell with r = +1, one penalty cell with r = -1 👎)
• Goal is to select actions to
maximise future rewards
– Each action results in moving to a state with some assigned reward
– Allocation of that reward may be immediate or delayed (e.g. until the
end of the run)
– It may be better to sacrifice immediate reward to gain more long-
term reward

• We can illustrate with a simple 4x3 environment


– What actions maximise the reward?
Example of local utility function

(figure: the same 4×3 grid world; default reward r = -0.04, goal cell r = +1 👍, penalty cell r = -1 👎)

Assume the environment is deterministic: the agent is guaranteed to end up in the intended cell (i.e. probability = 1.0).

• Optimal solution is: [Up, Up, Right, Right, Right]

• Additive reward is:
  • r = (-0.04 × 4) + 1.0
  • r = 1.0 - 0.16 = 0.84

• i.e. the utility gained is the sum of the rewards


received
• The negative (-0.04) reward incentivises the agent to reach its
goal asap.
Sequential Decision Making

(figure: the same 4×3 grid world; default reward r = -0.04, goal cell r = +1 👍, penalty cell r = -1 👎)

When the environment is non-deterministic, the agent may fail to reach its intended cell: the probability of moving as intended is 0.8, but it may move sideways with p = 0.1 in each direction.

• Probability of reaching the goal if every move succeeds:
  – p = 0.8^5 = 0.32768
• Could also reach the goal accidentally by going the wrong way round:
  – p = 0.1^4 × 0.8 = 0.0001 × 0.8 = 0.00008
• Final probability of reaching the goal:
  – p = 0.32776
• Utility gained depends on the route taken
– Reinforcement Learning builds upon this type of model
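
A quick numeric check of these figures (assuming the 0.8 / 0.1 transition model described above), sketched in Python:

p_intended = 0.8 ** 5            # five intended moves along [Up, Up, Right, Right, Right] = 0.32768
p_wrong_way = 0.1 ** 4 * 0.8     # drifting sideways four times, then one intended move = 0.00008
print(p_intended + p_wrong_way)  # ~0.32776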
Utilities over Runs

• Another possibility: assigns a utility not to individual states,


but to runs themselves:

u : R → ℝ

• Such an approach takes an inherently long term view.


• Other variations:
– incorporate probabilities of different states emerging.

• To see where utilities might come from, let’s look at an


example.
Utility in the Tileworld

• Simulated two-dimensional grid environment on which there are agents, tiles, obstacles, and holes.

• An agent can move in four directions: up, down, left, or right.
• If it is located next to a tile, it can push it.
• Holes have to be filled up with tiles.
• An agent scores points by filling holes with tiles, with the aim to fill as many holes as possible.
• TILEWORLD changes with the random appearance and disappearance of holes.

(figure: the agent starts to push a tile towards the hole; but then the hole disappears; later, a much more convenient hole appears (bottom right))
Utilities in the Tileworld

• Utilities are associated over runs, so that more holes


filled is a higher utility.
• Utility function defined as follows:

  u(r) = (number of holes filled in r) / (number of holes that appeared in r)

• Thus:

• if agent fills all holes, utility = 1.


• if agent fills no holes, utility = 0.

• TILEWORLD captures the need for reactivity and for


the advantages of exploiting opportunities.
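
As a tiny illustration in Python (the zero-holes case is an added assumption, not stated on the slide):

def tileworld_utility(holes_filled: int, holes_appeared: int) -> float:
    # u(r) = holes filled in r / holes that appeared in r
    return holes_filled / holes_appeared if holes_appeared else 0.0

print(tileworld_utility(3, 3))  # 1.0 - all holes filled
print(tileworld_utility(0, 3))  # 0.0 - no holes filled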
Expected Utility

• To denote probability that run r occurs when agent


Ag is placed in environment Env, we can write:
P (r | Ag, Env)
• In a non-deterministic environment, for example, this
can be computed from the probability of each step.

For a run r = (e₀, α₀, e₁, α₁, e₂, …):

  P(r | Ag, Env) = P(e₁ | e₀, α₀) · P(e₂ | e₁, α₁) · …

and clearly:

  Σ_{r ∈ R(Ag,Env)} P(r | Ag, Env) = 1.
Expected Utility

• The expected utility (EU) of agent Ag in environment Env


(given P, u), is then:
  EU(Ag, Env) = Σ_{r ∈ R(Ag,Env)} u(r) · P(r | Ag, Env)

• That is, for each run we compute the utility and multiply it by
the probability of the run.

• The expected utility is then the sum of all of these.


Expected Utility
• The probability of a run can be determined from individual
actions within a run
– Using the decomposability axiom from Utility Theory

"... Compound lotteries can be reduced to simpler ones using the laws of probability." Known as the "no fun in gambling" axiom.
Optimal Agents

• The optimal agent Agopt in an environment Env is the


one that maximizes expected utility:

  Ag_opt = arg max_{Ag ∈ AG} EU(Ag, Env)

• Of course, the fact that an agent is optimal does not


mean that it will be best; only that on average, we
can expect it to do best.
Example 1

Consider the environment Env₁ = ⟨E, e₀, τ⟩ defined as follows:

  E = {e₀, e₁, e₂, e₃, e₄, e₅}
  τ(e₀ →^{α₀}) = {e₁, e₂}
  τ(e₀ →^{α₁}) = {e₃, e₄, e₅}

There are two agents possible with respect to this environment:

  Ag₁(e₀) = α₀
  Ag₂(e₀) = α₁

The probabilities of the various runs are as follows:

  P(e₀ →^{α₀} e₁ | Ag₁, Env₁) = 0.4
  P(e₀ →^{α₀} e₂ | Ag₁, Env₁) = 0.6
  P(e₀ →^{α₁} e₃ | Ag₂, Env₁) = 0.1
  P(e₀ →^{α₁} e₄ | Ag₂, Env₁) = 0.2
  P(e₀ →^{α₁} e₅ | Ag₂, Env₁) = 0.7

Assume the utility function u₁ is defined as follows:

  u₁(e₀ →^{α₀} e₁) = 8
  u₁(e₀ →^{α₀} e₂) = 11
  u₁(e₀ →^{α₁} e₃) = 70
  u₁(e₀ →^{α₁} e₄) = 9
  u₁(e₀ →^{α₁} e₅) = 10

What are the expected utilities of the agents for this utility function?
Example 1 Solution

Given the environment in the question, the two transition functions are τ(e₀ →^{α₀}) = {e₁, e₂} and τ(e₀ →^{α₁}) = {e₃, e₄, e₅}. The probabilities of the various runs (two for the first agent and three for the second) are given in the question. Given the definition of the utility function u₁, the expected utilities of agents Ag₁ and Ag₂ in environment Env₁ can be calculated using:

  EU(Ag, Env) = Σ_{r ∈ R(Ag,Env)} u(r) · P(r | Ag, Env)

This is equivalent to calculating, for each agent, the sum of the products of the utility of each run with the probability of performing that run; i.e.

• Expected utility of Ag₁ = (0.4 × 8) + (0.6 × 11) = 9.8

• Expected utility of Ag₂ = (0.1 × 70) + (0.2 × 9) + (0.7 × 10) = 15.8

Therefore agent Ag₂ is optimal.
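
A short Python check that reproduces these numbers (the (probability, utility) pairs per run are taken from the example; the data structure itself is illustrative):

# Each run is recorded as (probability, utility) for the given agent in Env1.
runs = {
    "Ag1": [(0.4, 8), (0.6, 11)],
    "Ag2": [(0.1, 70), (0.2, 9), (0.7, 10)],
}

def expected_utility(agent_runs):
    # EU(Ag, Env) = sum over runs of u(r) * P(r | Ag, Env)
    return sum(p * u for p, u in agent_runs)

for name, agent_runs in runs.items():
    print(name, expected_utility(agent_runs))  # Ag1 -> 9.8, Ag2 -> 15.8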


Example 2

Consider the environment Env₁ = ⟨E, e₀, τ⟩ defined as follows:

  E = {e₀, e₁, e₂, e₃, e₄, e₅}
  τ(e₀ →^{α₀}) = {e₁, e₂}
  τ(e₁ →^{α₁}) = {e₃}
  τ(e₂ →^{α₂}) = {e₄, e₅}

There are two agents, Ag₁ and Ag₂, with respect to this environment:

  Ag₁(e₀) = α₀    Ag₂(e₀) = α₀
  Ag₁(e₁) = α₁    Ag₂(e₂) = α₂

The probabilities of the various runs are as follows:

  P(e₀ →^{α₀} e₁ | Ag₁, Env₁) = 0.5
  P(e₀ →^{α₀} e₂ | Ag₁, Env₁) = 0.5
  P(e₁ →^{α₁} e₃ | Ag₁, Env₁) = 1.0
  P(e₀ →^{α₀} e₁ | Ag₂, Env₁) = 0.1
  P(e₀ →^{α₀} e₂ | Ag₂, Env₁) = 0.9
  P(e₂ →^{α₂} e₄ | Ag₂, Env₁) = 0.4
  P(e₂ →^{α₂} e₅ | Ag₂, Env₁) = 0.6

Assume the utility function u₁ is defined as follows:

  u₁(e₀ →^{α₀} e₁) = 4
  u₁(e₀ →^{α₀} e₂) = 3
  u₁(e₁ →^{α₁} e₃) = 7
  u₁(e₂ →^{α₂} e₄) = 3
  u₁(e₂ →^{α₂} e₅) = 2

What are the expected utilities of the agents for this utility function?
Example 2 solution

(figures: run trees for the two agents, annotated step by step with transition probabilities and per-step utilities; summing the utilities along each run gives that run's utility)

Agent 1:
  e₀ →^{α₀} e₁ (p = 0.5, u = 4) →^{α₁} e₃ (p = 1.0, u = 7)   run probability 0.5 × 1.0 = 0.5, run utility 4 + 7 = 11
  e₀ →^{α₀} e₂ (p = 0.5, u = 3)                              run probability 0.5, run utility 3

Agent 2:
  e₀ →^{α₀} e₁ (p = 0.1, u = 4)                              run probability 0.1, run utility 4
  e₀ →^{α₀} e₂ (p = 0.9, u = 3) →^{α₂} e₄ (p = 0.4, u = 3)   run probability 0.9 × 0.4 = 0.36, run utility 3 + 3 = 6
  e₀ →^{α₀} e₂ (p = 0.9, u = 3) →^{α₂} e₅ (p = 0.6, u = 2)   run probability 0.9 × 0.6 = 0.54, run utility 3 + 2 = 5
Example 2 solution

            Run              Utility   Probability
  Agent 1   e₀ → e₁ → e₃     u = 11    p = 0.5
            e₀ → e₂          u = 3     p = 0.5
  Agent 2   e₀ → e₁          u = 4     p = 0.1
            e₀ → e₂ → e₄     u = 6     p = 0.36
            e₀ → e₂ → e₅     u = 5     p = 0.54

  EU(Ag₁) = (11 × 0.5) + (3 × 0.5) = 5.5 + 1.5 = 7
  EU(Ag₂) = (4 × 0.1) + (6 × 0.36) + (5 × 0.54) = 0.4 + 2.16 + 2.7 = 5.26

Therefore agent Ag₁ is optimal.
Bounded Optimal Agents
• Some agents cannot be implemented on some computers
• The number of actions possible in an environment (and consequently the number of states) may be so large that implementing the agent would need more than the available memory.

• We can therefore constrain our agent set to include only


those agents that can be implemented on machine m:

AG_m = {Ag | Ag ∈ AG and Ag can be implemented on m}.


• The bounded optimal agent, Agbopt, with respect to m is
then. . .
  Ag_bopt = arg max_{Ag ∈ AG_m} EU(Ag, Env)
Predicate Task Specifications

• A special case of assigning utilities to histories is to assign


0 (false) or 1 (true) to a run.
• If a run is assigned 1, then the agent succeeds on that run,
otherwise it fails.

• Call these predicate task specifications.


• Denote predicate task specification by Ψ:

  Ψ : R → {0, 1}
Task Environments
• A task environment is a pair <Env, Ψ>, where Env is an
environment, and the task specification Ψ is defined by:

  Ψ : R → {0, 1}

• Let the set of all task environments be defined by TE.

• A task environment specifies:


• the properties of the system the agent will inhabit;
• the criteria by which an agent will be judged to have either failed or
succeeded.
Task Environments

• To denote the set of all runs of the agent Ag in environment Env that satisfy Ψ, we write:

  R_Ψ(Ag, Env) = {r | r ∈ R(Ag, Env) and Ψ(r) = 1}.

• We then say that an agent Ag succeeds in task environment <Env, Ψ> if

  R_Ψ(Ag, Env) = R(Ag, Env)

• In other words, an agent succeeds if every one of its runs satisfies the specification.

We could also write this as:
  ∀r ∈ R(Ag, Env), we have Ψ(r) = 1
However, this is a bit pessimistic: if a single run fails to satisfy Ψ, the agent is deemed unsuccessful.

A more optimistic idea of success is:
  ∃r ∈ R(Ag, Env), we have Ψ(r) = 1
which counts an agent as successful as soon as at least one of its runs satisfies Ψ.

The Probability of Success

• If the environment is non-deterministic, then τ returns a set of possible states.
• We can define a probability distribution across the set of
states.
• Let P(r | Ag, Env) denote probability that run r occurs if
agent Ag is placed in environment Env.
• Then the probability P(Ψ | Ag, Env) that Ψ is satisfied by
Ag in Env would then simply be:

  P(Ψ | Ag, Env) = Σ_{r ∈ R_Ψ(Ag,Env)} P(r | Ag, Env)
Achievement and Maintenance Tasks

• The idea of a predicate task specification is


admittedly abstract.

• It generalises two common types of tasks,


achievement tasks and maintenance tasks:
1. Achievement tasks: Are those of the form “achieve state
of affairs φ”.
2. Maintenance tasks: Are those of the form “maintain
state of affairs ψ”.
Achievement and Maintenance Tasks
An achievement task is specified by a set G of “good” or “goal”
states: G ⊆ E.
– The agent succeeds if it is guaranteed to bring about at least one of these
states (we don’t care which, as all are considered good).

– The agent succeeds in an achievement task if it can force the environment into one of the goal states g ∈ G.

A maintenance goal is specified by a set B of “bad” states: B ⊆ E.


– The agent succeeds in a particular environment if it manages to avoid all states
in B — if it never performs actions which result in any state in B occurring.

– In terms of games, the agent succeeds in a maintenance task if it ensures that


it is never forced into one of the fail states b ∈ B.
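
A minimal Python sketch of these two special cases as predicate task specifications (state names are illustrative; a run is represented here simply by the sequence of states it visits):

def achievement_spec(goal_states):
    # Psi(r) = 1 iff the run visits at least one goal state g in G
    return lambda run: int(any(s in goal_states for s in run))

def maintenance_spec(bad_states):
    # Psi(r) = 1 iff the run never visits a bad state b in B
    return lambda run: int(all(s not in bad_states for s in run))

psi_achieve = achievement_spec({"goal"})
psi_maintain = maintenance_spec({"crash"})
print(psi_achieve(["e0", "e1", "goal"]), psi_achieve(["e0", "e1"]))   # 1 0
print(psi_maintain(["e0", "e1"]), psi_maintain(["e0", "crash"]))      # 1 0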

!101
Rationality
The agent's rationality is given by the choice of actions based on the expected utility of the outcome of the action. The rational agent selects an action a that provides the maximal expected outcome:
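
One standard way of writing this choice (an assumed formalization: outcomes o of action a occur with probability P(o | a)):

  a* = arg max_{a ∈ Ac} Σ_o P(o | a) · u(o)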

Bounded Rationality: the capability of the agent to perform a rational decision (to choose the lottery providing the maximal expected outcome) given bounds on computational resources:
– bounds on time complexity
– bounds on memory requirements
Calculative Rationality: the capability to perform a rational choice faster than the fastest change in the environment can occur.
!102
Rationality

Self-interested rational agent:
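– for instance, it chooses a* = arg max_{a ∈ Ac} EU_i(a), maximizing only its own expected utility EU_i (an assumed formalization)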

Cooperative rational agent:
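– for instance, it chooses a* = arg max_{a ∈ Ac} Σ_j EU_j(a), maximizing a joint/social utility over all agents j (an assumed formalization)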

!103
