Chapter 2 Handout
Intelligent agent
Perceive: to become aware of something.
Figure 2.1 Agents interact with environments through sensors and actuators.
Mathematically speaking, we say that an agent's behavior is described by the agent function
that maps any given percept sequence to an action, f : P* → A.
The agent program runs on the physical architecture to produce f.
The agent function is an abstract mathematical description (it exists theoretically); the agent
program is a concrete implementation (it exists physically), running on the agent architecture.
To illustrate the percept sequence and other related ideas, we will use a very simple
example: the vacuum-cleaner world shown in Figure 2.2.
This particular world has just two locations: squares A and B.
The vacuum agent perceives which square it is in and whether there is dirt in the
square. It can choose to move left, move right, suck up the dirt, or do nothing.
One very simple agent function is the following: if the current square is dirty, then suck;
otherwise, move to the other square.
Looking at Figure 2.3, we see that various vacuum-world agents can be defined simply by
filling in the right-hand column in various ways. The obvious question, then, is this: what
makes an agent good or bad, intelligent or stupid? We answer this question in the next
section.
Before closing this section, we will remark that the notion of an agent is meant to be a tool
for analyzing systems, not an absolute characterization that divides the world into agents
and non-agents. One could view a hand-held calculator as an agent that chooses the action of
displaying "4" when given the percept sequence "2 + 2 =," but such an analysis would hardly
aid our understanding of the calculator.
Figure 2.3 Partial tabulation of a simple agent function for the vacuum-cleaner world shown
in Figure 2.2.

    Percept sequence                     Action
    [A, Clean]                           Right
    [A, Dirty]                           Suck
    [B, Clean]                           Left
    [B, Dirty]                           Suck
    [A, Clean], [A, Clean]               Right
    [A, Clean], [A, Dirty]               Suck
    ...                                  ...
Obviously, there is no single performance measure that is suitable for all agents.
As a general rule, it is better to design performance measures according to what one
actually wants in the environment, rather than according to how one thinks the agent
should behave.
Rationality
What is rational at any given time depends on four things:
The performance measure that defines the criterion of success.
The agent's prior knowledge of the environment.
The actions that the agent can perform.
The agent's percept sequence to date.
This leads to a definition of a rational agent:
For each possible percept sequence, a rational agent should select an action that is
expected to maximize its performance measure, given the evidence provided by the
percept sequence and whatever built-in knowledge the agent has.
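One informal way to write this definition down (the notation is added here and is not part of
the text): if e is the percept sequence to date and A is the set of actions the agent can
perform, a rational agent chooses

    a* = argmax over a in A of  E[ performance measure | e, prior knowledge, action a ]

that is, the action whose expected value of the performance measure is highest, given the
evidence of the percepts so far and the agent's built-in knowledge.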
Omniscience, learning, and autonomy
Omniscience
We need to be careful to distinguish between rationality and omniscience. An omniscient
agent knows the actual outcome of its actions and can act accordingly; but omniscience is
impossible in reality.
Rationality is not the same as perfection. Rationality maximizes expected performance,
while perfection maximizes actual performance.
Doing actions in order to modify future percepts, sometimes called information gathering, is
an important part of rationality.
Learning
Our definition requires a rational agent not only to gather information, but also to learn as
much as possible from what it perceives. The agent's initial configuration could reflect some
prior knowledge of the environment, but as the agent gains experience this may be modified
and augmented.
Autonomy
If an agent relies on the prior knowledge of its designer rather than on its own percepts, we
say that the agent lacks autonomy. A rational agent should be autonomous: it should learn
what it can to compensate for partial or incorrect prior knowledge.
After sufficient experience of its environment, the behavior of a rational agent can become
effectively independent of its prior knowledge.
Actuators
The actuators will include control over the engine through the accelerator, and control over
steering and braking. In addition, the vehicle will need output to a display screen or voice
synthesizer to talk back to the passengers, and perhaps some way to communicate with other
vehicles, politely or otherwise.
Sensors
The sensors will include one or more controllable TV cameras, the speedometer, and the
odometer. To
control the vehicle properly, especially on curves, it should have an accelerometer; it will
also need to know the mechanical state of the vehicle, so it will need the usual array of engine
and electrical system sensors. It might have instruments that are not available to the average
human driver: a satellite global positioning system (GPS) to give it accurate position
information with respect to an electronic map, and infrared or sonar sensors to detect
distances to other cars and obstacles. Finally, it will need a keyboard or microphone for the
passenger to request a destination.
Deterministic vs. stochastic
If the next state of the environment is completely determined by the current state and the
action executed by the agent, then we say the environment is deterministic; otherwise, it is
stochastic.
Taxi driving is clearly stochastic in this sense, because one can never predict the
behaviour of traffic exactly; moreover, one's tires blow out and one's engine seizes up
without warning.
If the environment is deterministic except for the actions of other agents, we say that the
environment is strategic.
Episodic vs. sequential
In an episodic task environment, the agent's experience is divided into atomic episodes.
Each episode consists of the agent perceiving and then performing a single action.
Crucially, the next episode does not depend on the actions taken in previous episodes.
The choice of action in each episode depends only on the episode itself. For example, an
agent that has to spot defective parts on an assembly line bases each decision on the
current part, regardless of previous decisions; moreover, the current decision doesn't affect
whether the next part is defective.
In sequential environments, on the other hand, the current decision could affect all future
decisions. Chess and taxi driving are sequential: in both cases, short-term actions can have
long-term consequences.
Episodic environments are much simpler than sequential environments because the agent
does not need to think ahead.
Static vs. dynamic
If the environment can change while an agent is deliberating, then we say the environment is
dynamic for that agent; otherwise, it is static. Dynamic environments are continuously
asking the agent what it wants to do; if it hasn't decided yet, that counts as deciding to do
nothing.
If the environment itself does not change with the passage of time but the agent's
performance score does, then we say the environment is semidynamic.
Taxi driving is clearly dynamic: the other cars and the taxi itself keep moving while the
driving algorithm deliberates about what to do next.
Single-agent vs. multi-agent
The distinction between single-agent and multi-agent environments may seem simple
enough. For example, an agent solving a crossword puzzle by itself is clearly in a single-
agent environment, whereas an agent playing chess is in a two-agent environment. Chess is a
competitive multiagent environment. In the taxi-driving environment, on the other hand,
avoiding collisions maximizes the performance measure of all agents, so it is a partially
cooperative multiagent environment.
The structure of agents
So far we have described agents from the outside, by their behavior: the action that is
performed after any given sequence of percepts. We now turn to how agents work on the
inside.
The job of AI is to design the agent program that implements the agent function mapping
percepts to actions.
This program will run on some sort of computing device with physical sensors and
actuators; we call this the architecture: agent = architecture + program.
The program we choose has to be one that is appropriate for the architecture.
Agent programs
The agent programs that we will design take the current percept as input from the sensors and
return an action to the actuator. Notice the difference between the agent program, which takes
the current percept as input, and the agent function, which takes the entire percept history.
The agent program takes just the current percept as input because nothing more is available
from the environment; if the agent's actions depend on the entire percept sequence, the agent
will have to remember the percepts.
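To make this distinction concrete, here is a minimal Python sketch (ours, not from the text)
of an agent program that implements an agent function given as an explicit lookup table from
percept sequences to actions. The program receives only the current percept, so it keeps the
history itself; the example table uses the vacuum-world percepts and actions from earlier,
and "NoOp" is an illustrative default for sequences missing from the table.

    def make_table_driven_agent(table):
        percepts = []                                   # remembered percept sequence

        def program(percept):
            percepts.append(percept)                    # extend the history
            return table.get(tuple(percepts), "NoOp")   # look up the whole sequence

        return program

    # Example use with a partial vacuum-world table:
    table = {
        (("A", "Dirty"),): "Suck",
        (("A", "Clean"),): "Right",
        (("A", "Dirty"), ("A", "Clean")): "Right",
    }
    agent = make_table_driven_agent(table)
    print(agent(("A", "Dirty")))   # -> Suck
    print(agent(("A", "Clean")))   # -> Right

The sketch also shows why the table-driven approach does not scale: the table needs one
entry for every possible percept sequence.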
In the remainder of this section, we outline four basic kinds of agent program that embody
the principles underlying almost all intelligent systems:
Simple reflex agents;
Model-based reflex agents;
Goal-based agents; and
Utility-based agents.
We then explain in general terms how to convert all these into learning agents.
Simple reflex agents
These agents select actions on the basis of the current percept, ignoring the rest of the
percept history. For example, the vacuum agent is a simple reflex agent, because its decision
is based only on the current location and on whether that location contains dirt. An agent program for
this agent is shown in Figure 2.8.
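Figure 2.8 is not reproduced in this handout; the following Python sketch shows what such
an agent program might look like, assuming the percept is a (location, status) pair such as
("A", "Dirty"):

    def reflex_vacuum_agent(percept):
        location, status = percept
        if status == "Dirty":
            return "Suck"
        elif location == "A":
            return "Right"
        else:
            return "Left"

    print(reflex_vacuum_agent(("A", "Dirty")))   # -> Suck
    print(reflex_vacuum_agent(("B", "Clean")))   # -> Left

Note that the program inspects only the current percept; no percept history is stored anywhere.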
Model-based reflex agents
The most effective way to handle partial observability is for the agent to keep track of the
part of the world it can't see now. That is, the agent should maintain some sort of internal
state that depends on the percept history and thereby reflects at least some of the
unobserved aspects of the current state. For driving tasks such as changing lanes, the agent
needs to keep track of where the other cars are if it can't see them all at once.
Updating this internal state information as time goes by requires two kinds of knowledge to
be encoded in the agent program. First, we need some information about how the world
evolves independently of the agent. Second, we need some information about how the
agent's own actions affect the world.
This knowledge about "how the world works", whether implemented in simple Boolean
circuits or in complete scientific theories, is called a model of the world. An agent that uses
such a model is called a model-based agent.
Figure 2.12 A model-based reflex agent. It keeps track of the current state of the world
using an internal model. It then chooses an action in the same way as the reflex agent.
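Figure 2.12 itself is not reproduced here; the sketch below shows the same structure in
Python under our own naming assumptions. Here update_state stands in for the model ("how
the world evolves" and "what my actions do"), and rules is a list of (condition, action) pairs,
just as in a simple reflex agent:

    def make_model_based_reflex_agent(update_state, rules, initial_state):
        memory = {"state": initial_state, "last_action": None}

        def program(percept):
            # Fold the new percept into the internal state using the model.
            memory["state"] = update_state(memory["state"], memory["last_action"], percept)
            # Then choose an action by rule matching, as a reflex agent would.
            for condition, action in rules:
                if condition(memory["state"]):
                    memory["last_action"] = action
                    return action
            memory["last_action"] = "NoOp"
            return "NoOp"

        return program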
Goal-based agents
As well as a current state description, the agent needs some sort of goal information that
describes situations that are desirable: for example, being at the passenger's destination.
Figure 2.13
A model-based, goal-based agent. It keeps track of the world state as well as a set of goals it
is trying to achieve, and chooses an action that will (eventually) lead to the achievement of its
goals.
Search and planning are the subfields of AI devoted to finding action sequences that
achieve the agent's goals.
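As a rough illustration (ours, not from the text) of how a goal-based agent can use its model
to look ahead, the function below searches breadth-first for a sequence of actions that reaches
a goal state. Here actions(s) lists the actions available in state s and result(s, a) is the model's
predicted next state; both are assumed helper functions, and states are assumed hashable:

    from collections import deque

    def plan(start, goal_test, actions, result):
        """Breadth-first search for an action sequence satisfying goal_test."""
        frontier = deque([(start, [])])
        visited = {start}
        while frontier:
            state, path = frontier.popleft()
            if goal_test(state):
                return path                      # possibly the empty plan
            for action in actions(state):
                nxt = result(state, action)
                if nxt not in visited:
                    visited.add(nxt)
                    frontier.append((nxt, path + [action]))
        return None                              # no action sequence reaches the goal

A goal-based agent would execute the first action of the returned plan, replanning if the world
changes in the meantime.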
Utility-based agents
Goals alone are not really enough to generate high-quality behavior in most environments.
For example, there are many action sequences that will get the taxi to its destination (thereby
achieving the goal) but some are quicker, safer, more reliable, or cheaper than
others. Goals just provide a crude binary distinction between "happy" and "unhappy" states,
whereas a more general performance measure should allow a comparison of different world
states according to exactly how happy they would make the agent if they could be achieved.
Because "happy" does not sound very scientific, the customary terminology is to say that if
one world state is preferred to another, then it has higher utility for the agent.
A utility function maps a state (or a sequence of states) onto a real number, which
describes the associated degree of happiness.
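A small sketch of utility-based action selection, under our own naming assumptions:
outcomes(state, action) yields (next_state, probability) pairs predicted by the agent's model,
and utility(state) returns a real number. The agent picks the action whose expected utility is
highest:

    def choose_action(state, actions, outcomes, utility):
        def expected_utility(action):
            return sum(p * utility(s) for s, p in outcomes(state, action))
        return max(actions, key=expected_utility)

This is exactly the computation described in the caption of Figure 2.14 below: average the
utility over the possible outcome states, weighted by their probabilities.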
Learning agents
We have described agent programs with various methods for selecting actions. We have
not, so far, explained how the agent programs come into being. In his famous early paper,
Turing (1950) considers the idea of actually programming his intelligent machines by hand.
He estimates how much work this might take and concludes, "Some more expeditious method
seems desirable." The method Turing (1950) proposes is to build learning machines and
then to teach them. In many areas of AI, this is now the preferred method for creating state-
of-the-art systems. Learning has another advantage, as we noted earlier: it allows the agent
to operate in initially unknown environments and to become more competent than its
initial knowledge alone might allow.
A learning agent can be divided into four conceptual components, as shown in Figure 2.15.
The learning element, which is responsible for making improvements.
The performance element, which is responsible for selecting external actions. The
performance element is what we have previously considered to be the entire agent: it
takes in percepts and decides on actions.
The learning element uses feedback from the critic on how the agent is doing and
determines how the performance element should be modified to do better in the
future.
The critic tells the learning element how well the agent is doing with respect to a
fixed performance standard.
The problem generator is responsible for suggesting actions that will lead to new
and informative experiences. It might identify certain areas of behavior in need of
improvement and suggest experiments, such as trying out the brakes on different road
surfaces under different conditions; the performance element is then modified by installing
whatever new rules are learned from these experiments.
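The four components can be pictured in code as well. The class below is a structural sketch
only; the method names are ours and are chosen purely for illustration:

    class LearningAgent:
        def __init__(self, performance_element, learning_element, critic, problem_generator):
            self.performance_element = performance_element   # selects external actions
            self.learning_element = learning_element         # improves the performance element
            self.critic = critic                             # feedback vs. a fixed performance standard
            self.problem_generator = problem_generator       # suggests informative experiments

        def step(self, percept):
            feedback = self.critic(percept)
            self.learning_element(feedback, self.performance_element)
            suggestion = self.problem_generator(percept)     # may return None
            if suggestion is not None:
                return suggestion
            return self.performance_element(percept)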
Figure 2.14
A model-based, utility-based agent. It uses a model of the world, along with a utility function
that measures its preferences among states of the world. Then it chooses the action that leads
to the best expected utility, where expected utility is computed by averaging over all possible
outcome states, weighted by the probability of the outcome.
Summary
An agent is something that perceives and acts in an environment. The agent function
for an agent specifies the action taken by the agent in response to any percept sequence.
The performance measure evaluates the behavior of the agent in an environment. A
rational agent acts so as to maximize the expected value of the performance measure,
given the percept sequence it has seen so far.
A task environment specification includes the performance measure, the external
environment, the actuators, and the sensors. In designing an agent, the first step must
always be to specify the task environment as fully as possible.
Task environments vary along several significant dimensions. They can be fully or partially
observable, deterministic or stochastic, episodic or sequential, static or dynamic, discrete or
continuous, and single agent or multiagent.
The agent program implements the agent function. There exists a variety of basic agent
program designs, reflecting the kind of information made explicit and used in the decision
process. The designs vary in efficiency, compactness, and flexibility. The appropriate design
of the agent program depends on the nature of the environment.
Simple reflex agents respond directly to percepts, whereas model-based reflex agents
maintain internal state to track aspects of the world that are not evident in the current percept.
Goal-based agents act to achieve their goals, and utility-based agents try to maximize their
own expected "happiness."
All agents can improve their performance through learning.