AIES Notes: Intelligent Agents (Lecture 2) : Topics of Lecture One
In which we discuss the nature of agents, perfect or otherwise, the diversity of environments, and the
resulting menagerie of agent types. We will see that the concept of rationality can be applied to a wide
variety of agents operating in any imaginable environment. We will also develop a small set of design
principles for building successful agents, that is, systems that can reasonably be called intelligent.
Prerequisites
Students are expected to know how to write simple functions that take input through parameters,
use conditional statements, and return output from a function. They should also be able to differentiate
among the four approaches to AI and relate the “thinking rationally” and “acting rationally” approaches.
In general, an agent’s choice of action at any given instant can depend on the entire percept sequence observed
to date, but not on anything it has not perceived.
We can imagine tabulating the agent function that describes any given agent; for most agents, this
would be a very large table—infinite, in fact, unless we place a bound on the
length of percept sequences we want to consider. Given an agent to experiment with, we can, in
principle, construct this table by trying out all possible percept sequences and recording which actions
the agent does in response.
The table is, of course, an external characterization of the agent. Internally, the agent function for an
artificial agent will be implemented by an agent program. The agent function is an abstract
mathematical description; the agent program is a concrete implementation, running within some
physical system.
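As a concrete illustration, a fragment of such a table for the two-square vacuum world (the example behind Figure 2.3) might look like the following Python sketch; the percept format and action names are assumptions for illustration only.

    # Partial tabulation of an agent function: percept sequences -> actions.
    # Percepts are (location, status) pairs; this is only an illustrative fragment.
    agent_function_table = {
        (("A", "Clean"),): "Right",
        (("A", "Dirty"),): "Suck",
        (("B", "Clean"),): "Left",
        (("B", "Dirty"),): "Suck",
        (("A", "Clean"), ("A", "Clean")): "Right",
        (("A", "Clean"), ("A", "Dirty")): "Suck",
        # ... one entry for every possible percept sequence, without bound
    }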
Rationality
What is rational at any given time depends on four things:
1. The performance measure that defines the criterion of success.
2. The agent’s prior knowledge of the environment.
3. The actions that the agent can perform.
4. The agent’s percept sequence to date.
For each possible percept sequence, a rational agent should select an action that is expected to maximize its
performance measure, given the evidence provided by the percept sequence and whatever built-in knowledge the
agent has.
An omniscient agent knows the actual outcome of its actions and can act accordingly; but omniscience
is impossible in reality.
This example shows that rationality is not the same as perfection. Rationality maximizes expected
performance, while perfection maximizes actual performance. Rationality does not require
omniscience, then, because the rational choice depends only on the percept sequence to date.
Our definition requires a rational agent not only to gather information but also to learn as much as
possible from what it perceives. The agent’s initial configuration could reflect some prior knowledge
of the environment, but as the agent gains experience this may be modified and augmented. There are
extreme cases in which the environment is completely known a priori. In such cases, the agent need
not perceive or learn; it simply acts correctly.
To the extent that an agent relies on the prior knowledge of its designer rather than on its own percepts,
we say that the agent lacks autonomy. A rational agent should be autonomous—it should learn what
it can to compensate for partial or incorrect prior knowledge.
PEAS (Performance, Environment, Actuators, Sensors)
In designing an agent, the first step must always be to specify the task environment as fully as possible.
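For example, a PEAS description of the familiar automated-taxi task environment could be sketched roughly as follows; the entries are illustrative, not exhaustive.

    # Rough PEAS sketch for an automated taxi (illustrative entries only).
    peas_taxi = {
        "Performance measure": "safe, fast, legal, comfortable trip; maximize profits",
        "Environment": "roads, other traffic, pedestrians, customers",
        "Actuators": "steering, accelerator, brake, signal, horn, display",
        "Sensors": "cameras, sonar, speedometer, GPS, odometer, keyboard",
    }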
Structure of Agents
The job of AI is to design an agent program that implements the agent function—the mapping from
percepts to actions. We assume this program will run on some sort of computing device with physical
sensors and actuators—we call this the architecture.
Agent Program
The agent program takes the current percept as input, whereas the agent function takes the
entire percept history. If the agent’s actions need to depend on the entire percept sequence, the agent
will have to remember the percepts. Figure 2.7 shows a rather trivial agent program that keeps track
of the percept sequence and then uses it to index into a table of actions to decide what to do. The table—
an example of which is given for the vacuum world in Figure 2.3—represents explicitly the agent
function that the agent program embodies.
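A minimal Python sketch of this table-driven agent program, assuming the table maps complete percept sequences (as tuples) to actions, might look like this.

    # Sketch of a table-driven agent program: remember every percept and
    # index into a (hypothetical) table keyed by the full percept sequence.
    percepts = []  # percept sequence observed so far

    def table_driven_agent(percept, table):
        percepts.append(percept)
        return table.get(tuple(percepts))  # look up the action for the whole history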
It is instructive to consider why the table-driven approach to agent construction is doomed to failure.
Let P be the set of possible percepts and let T be the lifetime of the agent (the total number of percepts
it will receive). The lookup table will contain $\sum_{t=1}^{T} |P|^{t}$ entries. The lookup table for chess, a tiny,
well-behaved fragment of the real world, would have at least $10^{150}$ entries.
The daunting size of these tables (the number of atoms in the observable universe is less than $10^{80}$)
means that (a) no physical agent in this universe will have the space to store the table, (b) the designer
would not have time to create the table, (c) no agent could ever learn all the right table entries from its
experience, and (d) even if the environment is simple enough to yield a feasible table size, the designer
still has no guidance about how to fill in the table entries.
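As a quick sanity check on the scale (numbers chosen only for illustration): even with just $|P| = 2$ possible percepts and a lifetime of $T = 100$ percepts, the table would need $\sum_{t=1}^{100} 2^{t} = 2^{101} - 2 \approx 2.5 \times 10^{30}$ entries.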
Basic Kinds of Agent Programs
There are four basic kinds of agent programs that embody the principles underlying almost all
intelligent systems: simple reflex agents, model-based reflex agents, goal-based agents, and
utility-based agents.
Simple Reflex Agents
These agents select actions on the basis of the current percept, ignoring the rest of the percept history.
We use rectangles to denote the current internal state of the agent’s decision process, and ovals to
represent the background information used in the process. The agent program, which is also very
simple, is shown in Figure 2.10. One function generates an abstracted description of the current state
from the percept, and another returns the first rule in the set of rules that matches the given state
description. Note that the description in terms of “rules” and “matching” is purely conceptual; actual
implementations can be as simple as a collection of logic gates implementing a Boolean circuit.
Simple reflex agents have the admirable property of being simple, but they turn out to be of limited
intelligence. The agent in Figure 2.10 will work only if the correct decision can be made on the basis of
only the current percept—that is, only if the environment is fully observable. Even a little bit of
unobservability can cause serious trouble.
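A minimal sketch of such an agent program in Python, assuming rules are (condition, action) pairs and interpret_input builds the abstract state description, could look like this.

    # Simple reflex agent: choose an action from the current percept only.
    # The rule format and interpret_input are assumptions for illustration.
    def simple_reflex_agent(percept, rules, interpret_input):
        state = interpret_input(percept)  # abstracted description of the current state
        for condition, action in rules:   # return the first matching condition-action rule
            if condition(state):
                return action
        return None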
Model-Based Reflex Agents
The most effective way to handle partial observability is for the agent to keep track of the part of the
world it can’t see now. That is, the agent should maintain some sort of internal state that depends on
the percept history and thereby reflects at least some of the unobserved aspects of the current state.
Updating this internal state information as time goes by requires two kinds of knowledge to be encoded
in the agent program. First, we need some information about how the world evolves independently
of the agent—for example, that an overtaking car generally will be closer behind than it was a moment
ago. Second, we need some information about how the agent’s own actions affect the world—for
example, that when the agent turns the steering wheel clockwise, the car turns to the right.
This knowledge about “how the world works”—whether implemented in simple Boolean circuits or
in complete scientific theories—is called a model of the world. An agent that uses such a model is
called a model-based agent.
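A rough Python sketch of a model-based reflex agent, where the assumed update_state function stands in for both kinds of knowledge (how the world evolves and what the agent’s own actions do), is shown below.

    # Model-based reflex agent: keep internal state across percepts.
    # update_state(state, last_action, percept) is an assumed stand-in for the world model.
    def make_model_based_reflex_agent(update_state, rules):
        state, last_action = None, None

        def program(percept):
            nonlocal state, last_action
            state = update_state(state, last_action, percept)  # track unobserved aspects
            action = next((a for cond, a in rules if cond(state)), None)
            last_action = action
            return action

        return program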
Goal-Based Agents
Knowing something about the current state of the environment is not always enough to decide what
to do. For example, at a road junction, the taxi can turn left, turn right, or go straight on. The correct
decision depends on where the taxi is trying to get to. In other words, as well as a current state
description, the agent needs some sort of goal information that describes situations that are
desirable—for example, being at the passenger’s destination.
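A toy one-step sketch of goal-based action selection, assuming a result(state, action) model and a goal_test predicate, is given below; real goal-based agents typically search or plan over whole action sequences.

    # Goal-based selection (one-step toy version): pick any action whose
    # predicted result satisfies the goal. result() and goal_test() are assumptions.
    def goal_based_action(state, actions, result, goal_test):
        for action in actions:
            if goal_test(result(state, action)):
                return action
        return None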
Utility-Based Agents
Goals alone are not enough to generate high-quality behavior in most environments. For example,
many action sequences will get the taxi to its destination (thereby achieving the goal) but some are
quicker, safer, more reliable, or cheaper than others. Goals just provide a crude binary distinction
between “happy” and “unhappy” states. A more general performance measure should allow a
comparison of different world states according to exactly how happy they would make the agent.
Because “happy” does not sound very scientific, economists and computer scientists use the term
utility instead.
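In the same toy style, a utility-based agent compares predicted outcomes by their utility rather than by a binary goal test; with uncertain outcomes it would maximize expected utility instead. The result() and utility() functions below are assumptions for illustration.

    # Utility-based selection (deterministic toy version): choose the action
    # whose predicted outcome has the highest utility.
    def utility_based_action(state, actions, result, utility):
        return max(actions, key=lambda a: utility(result(state, a)), default=None)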
Learning Agents
The method Alan Turing proposed is to build learning machines and then to teach them. In many areas
of AI, this is now the preferred method for creating state-of-the-art systems. Learning has another
advantage, as we noted earlier: it allows the agent to operate in initially unknown environments and
to become more competent than its initial knowledge alone might allow.
A learning agent can be divided into four conceptual components, as shown in Figure 2.15. The most
important distinction is between the learning element, which is responsible for making
improvements, and the performance element, which is responsible for selecting external actions.
The performance element is what we have previously considered to be the entire agent: it takes in
percepts and decides on actions. The learning element uses feedback from the critic on how the agent
is doing and determines how the performance element should be modified to do better in the future.
The design of the learning element depends very much on the design of the performance element.
When trying to design an agent that learns a certain capability, the first question is not “How am I
going to get it to learn this?” but “What kind of performance element will my agent need to do this
once it has learned how?” Given an agent design, learning mechanisms can be constructed to improve
every part of the agent.
The critic tells the learning element how well the agent is doing with respect to a fixed performance
standard. The critic is necessary because the percepts themselves provide no indication of the agent’s
success.
The last component of the learning agent is the problem generator. It is responsible for suggesting
actions that will lead to new and informative experiences. The point is that if the performance element
had its way, it would keep doing the actions that are best, given what it knows. But if the agent is
willing to explore a little and do some perhaps suboptimal actions in the short run, it might discover
much better actions for the long run. The problem generator’s job is to suggest these exploratory
actions.
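The four components can be wired together roughly as in the following sketch; all of the component interfaces here are assumptions made for illustration.

    # Sketch of a learning agent: critic -> learning element -> performance element,
    # with a problem generator occasionally proposing exploratory actions.
    class LearningAgent:
        def __init__(self, performance_element, learning_element, critic, problem_generator):
            self.performance_element = performance_element  # selects external actions
            self.learning_element = learning_element        # improves the performance element
            self.critic = critic                            # compares behavior to a fixed standard
            self.problem_generator = problem_generator      # suggests informative experiments

        def step(self, percept):
            feedback = self.critic(percept)
            self.learning_element(feedback, self.performance_element)
            exploratory = self.problem_generator(percept)
            return exploratory if exploratory is not None else self.performance_element(percept)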
The learning element can make changes to any of the “knowledge” components shown in the agent
diagrams. The simplest cases involve learning directly from the percept sequence. Observation of pairs
of successive states of the environment can allow the agent to learn “How the world evolves,” and
observation of the results of its actions can allow the agent to learn “What my actions do.”
The situation is slightly more complex for a utility-based agent that wishes to learn utility information.
For example, suppose the taxi-driving agent receives no tips from passengers who have been
thoroughly shaken up during the trip. The external performance standard must inform the agent that
the loss of tips is a negative contribution to its overall performance; then the agent might be able to
learn that violent maneuvers do not contribute to its own utility. In a sense, the performance standard
distinguishes part of the incoming percept as a reward (or penalty) that provides direct feedback on
the quality of the agent’s behavior.
How the Components of Agent Programs Work
A question for a student of AI is, “How on earth do these components work?” Roughly speaking, we
can place the representations along an axis of increasing complexity and expressive power—atomic,
factored, and structured.
In an atomic representation each state of the world is indivisible—it has no internal structure. The
algorithms underlying search and game-playing (Chapters 3–5), hidden Markov models (Chapter 15),
and Markov decision processes (Chapter 17) all work with atomic representations—or, at least, they
treat representations as if they were atomic.
A factored representation splits up each state into a fixed set of variables or attributes, each of which
can have a value. While two different atomic states have nothing in common—they are just different
black boxes—two different factored states can share some attributes (such as being at some particular
GPS location) and not others (such as having lots of gas or having no gas); this makes it much easier to
work out how to turn one state into another. With factored representations, we can also represent
uncertainty—for example, ignorance about the amount of gas in the tank can be represented by leaving
that attribute blank. Many important areas of AI are based on factored representations, including
constraint satisfaction algorithms (Chapter 6), propositional logic (Chapter 7), planning (Chapters 10
and 11), Bayesian networks (Chapters 13–16), and the machine learning algorithms in Chapters 18,
20, and 21.
In a structured representation, objects such as cows and trucks and their various and varying
relationships can be described explicitly. Structured representations underlie relational databases and
first-order logic (Chapters 8, 9, and 12), first-order probability models (Chapter 14), knowledge-
based learning (Chapter 19) and much of natural language understanding (Chapters 22 and 23). In
fact, almost everything that humans express in natural language concerns objects and their
relationships.
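The contrast between the three kinds of representation can be made concrete with a small, entirely illustrative example.

    # Illustrative toy example: the same situation in three representations.
    atomic_state = "S42"  # an opaque, indivisible label with no internal structure
    factored_state = {"location": "Depot", "gas": None}  # fixed attributes; None marks an unknown value
    structured_state = [("At", "Truck1", "Depot"), ("LoadedOn", "Cow3", "Truck1")]  # explicit objects and relations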
SUMMARY
The major points to recall are as follows:
• An agent is something that perceives and acts in an environment. The agent function for an agent
specifies the action taken by the agent in response to any percept sequence.
• The performance measure evaluates the behavior of the agent in an environment. A rational agent
acts so as to maximize the expected value of the performance measure, given the percept sequence it
has seen so far.
• A task environment specification includes the performance measure, the external environment, the
actuators, and the sensors. In designing an agent, the first step must always be to specify the task
environment as fully as possible.
• Task environments vary along several significant dimensions. They can be fully or partially
observable, single-agent or multiagent, deterministic or stochastic, episodic or sequential, static or
dynamic, discrete or continuous, and known or unknown.
• The agent program implements the agent function. There exists a variety of basic agent-program
designs reflecting the kind of information made explicit and used in the decision process. The designs
vary in efficiency, compactness, and flexibility. The appropriate design of the agent program depends
on the nature of the environment.
• Simple reflex agents respond directly to percepts, whereas model-based reflex agents maintain
internal state to track aspects of the world that are not evident in the current percept. Goal-based
agents act to achieve their goals, and utility-based agents try to maximize their own expected
“happiness.”