
Learning: Chapter 17: Rich & Knight

The document discusses different types of machine learning, including:
- Rote learning, which involves simple storage of information
- Learning by taking advice from others, such as a programmer
- Learning through problem solving, by adjusting parameters or by creating macro-operators and chunks of operations
- Learning from examples, using induction to classify inputs and evolve class definitions

It provides examples of different learning mechanisms used in artificial intelligence programs.

Uploaded by Rupinder Aulakh
Copyright © Attribution Non-Commercial (BY-NC)

Learning

Chapter 17: Rich & Knight


Learning
• What is Learning?
• Rote learning
• Learning by taking advice
• Learning in problem solving
• Learning from examples
• Induction
• Explanation-based learning
• Discovery
• Analogy
• Formal learning theory
• Neural net learning and genetic learning
What is Learning?
• One of the most often heard criticisms of AI is that machines cannot
be called intelligent until they are able to learn to do new
things and adapt to new situations, rather than simply
doing as they are told to do.
• Some critics of AI have been saying that computers
cannot learn!
• Definition of learning: changes in the system that are
adaptive in the sense that they enable the system to do
the same task, or tasks drawn from the same population,
more efficiently and more effectively the next time.
• Learning covers a wide range of phenomena:
– Skill refinement: practice makes skills improve. The more you play
tennis, the better you get.
– Knowledge acquisition: knowledge is generally acquired through
experience.
Various learning mechanisms
• Simple storing of computed information, or rote learning,
is the most basic learning activity.
• Many computer programs, e.g., database systems, can be
said to learn in this sense, although most people would
not call such simple storage learning.
• Another way we learn is through taking advice from
others. Advice taking is similar to rote learning, but high-level
advice may not be in a form simple enough for a
program to use directly in problem solving.
• People also learn through their own problem-solving
experience.
• Learning from examples : we often learn to classify
things in the world without being given explicit rules.
• Learning from examples usually involves a teacher who
helps us classify things by correcting us when we are
wrong.
Rote Learning
• When a computer stores a piece of data, it is performing a
rudimentary form of learning.
• In the case of data caching, we store computed values so that we do
not have to recompute them later.
• When computation is more expensive than recall, this strategy can
save a significant amount of time.
• Caching has been used in AI programs to produce some surprising
performance improvements.
• Such caching is known as rote learning.
• Rote learning does not involve any sophisticated problem-solving
capabilities.
• It does, however, show the need for some capabilities required of
complex learning systems, such as:
– Organized storage of information
– Generalization
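A minimal sketch of rote learning as caching, in Python. The function `expensive_evaluation` and the position keys are hypothetical stand-ins for any computation that costs more than a table lookup; the dictionary plays the role of the stored (learned) values.

```python
from typing import Callable, Dict, Hashable


class RoteLearner:
    """Stores computed values so they can be recalled instead of recomputed."""

    def __init__(self, compute: Callable[[Hashable], int]) -> None:
        self.compute = compute
        self.memory: Dict[Hashable, int] = {}  # organized storage: key -> value
        self.computations = 0                  # how often we actually computed

    def lookup(self, key: Hashable) -> int:
        if key not in self.memory:             # unseen: compute once and store
            self.computations += 1
            self.memory[key] = self.compute(key)
        return self.memory[key]                # seen: simple recall


def expensive_evaluation(position: str) -> int:
    # Hypothetical stand-in for a costly computation such as a deep search.
    return sum(ord(ch) for ch in position) % 100


learner = RoteLearner(expensive_evaluation)
first = learner.lookup("board-A")
second = learner.lookup("board-A")  # recalled from memory, not recomputed
```

Python's standard `functools.lru_cache` decorator gives the same behavior for pure functions. Note that the sketch does not generalize: a position that differs from "board-A" in one square is an unrelated key, which is exactly the limitation the slide points out.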
Learning by taking Advice
• A computer can do very little without a program for it to run.
• When a programmer writes a series of instructions into a computer,
a rudimentary kind of learning is taking place: The programmer is
sort of a teacher and the computer is a sort of student.
• After being programmed, the computer is now able to do something
it previously could not.
• Executing a program may not be such a simple matter.
• Suppose the program is written in a high-level language such as
Prolog; some interpreter or compiler must intervene to change the
teacher’s instructions into code that the machine can execute
directly.
• People process advice in an analogous way.
• In chess, the advice “fight for control of the center of the board” is
useless unless the player can translate the advice into concrete
moves and plans. A computer program might make use of the
advice by adjusting its static evaluation function to include a factor
based on the number of center squares attacked by its own pieces.
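A minimal sketch of how the chess advice might be operationalized as an extra evaluation term. The square names, the set of attacked squares, the material term, and `CENTER_WEIGHT` are all hypothetical; a real program would compute attacked squares from its own board representation.

```python
from typing import Iterable, Set

CENTER_SQUARES: Set[str] = {"d4", "e4", "d5", "e5"}
CENTER_WEIGHT = 0.5  # hypothetical weight for the advice-derived term


def center_control(attacked_squares: Iterable[str]) -> int:
    """Number of center squares attacked by our own pieces."""
    return len(CENTER_SQUARES & set(attacked_squares))


def evaluate(material_balance: float, attacked_squares: Iterable[str]) -> float:
    """Static evaluation: existing material term plus the operationalized advice."""
    return material_balance + CENTER_WEIGHT * center_control(attacked_squares)
```

For example, a position with a material balance of 1.0 whose pieces attack e4, d5, and a1 scores 1.0 + 0.5 × 2 = 2.0.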
Learning by advice
• FOO is a program that accepts advice for
playing hearts, a card game. A human user first
translates the advice from English into a
representation that FOO can understand.
• A human can watch FOO play, detect new
mistakes, and correct them through yet more
advice, such as “play high cards when it is safe
to do so”.
• The ability to operationalize knowledge is critical
for systems that learn from a teacher’s advice.
Learning In Problem solving
• Can a program get better without the aid of
a teacher?
• It can, by generalizing from its own
experiences.
Learning by parameter adjustment
• Many programs rely on an evaluation procedure that combines information
from several sources into a single summary statistic.
• Game playing programs do this in their static evaluation functions in which a
variety of factors such as piece advantage and mobility are combined into a
single score reflecting the desirability of a particular board position.
• Pattern classification programs often combine several features to determine
the correct category into which a given stimulus should be placed.
• In designing such programs, it is often difficult to know a priori how much
weight should be attached to each feature being used.
• One way of finding the correct weights is to begin with some estimate of the
correct settings and then to let the program modify the settings on the basis
of its experience.
• Features that appear to be good predictors of overall success will have their
weights increased, while those that do not will have their weights
decreased.
• Samuel’s checkers program uses a static evaluation function in the form
of the polynomial $c_1 t_1 + c_2 t_2 + \dots + c_{16} t_{16}$.
• The t terms are the values of the sixteen features that contribute to the
evaluation.
• The c terms are the coefficients that are attached to each of these values.
As learning progresses, the c values will change.
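A minimal sketch of parameter adjustment for a linear evaluation function of the form above. The update is a simple error-correction (least-mean-squares style) rule chosen for illustration; it is not claimed to be Samuel's exact procedure. The `target` value, the learning `rate`, and the feature values are hypothetical.

```python
from typing import List, Sequence


def evaluate(coeffs: Sequence[float], features: Sequence[float]) -> float:
    """Linear static evaluation: c1*t1 + c2*t2 + ... + cn*tn."""
    return sum(c * t for c, t in zip(coeffs, features))


def adjust(coeffs: Sequence[float], features: Sequence[float],
           target: float, rate: float = 0.1) -> List[float]:
    """Move each coefficient in proportion to its feature's share of the error."""
    error = target - evaluate(coeffs, features)
    return [c + rate * error * t for c, t in zip(coeffs, features)]


coeffs = [0.0, 0.0, 0.0]
features = [1.0, 2.0, 0.0]  # the third feature is absent in this position
new_coeffs = adjust(coeffs, features, target=5.0)
```

Features that contributed to an underestimate get larger weights, while a feature with value 0 keeps its weight unchanged; repeated over many positions, good predictors accumulate weight and poor ones lose it.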
Learning by Macro-operators
• Sequences of actions that can be treated as a whole are
called macro-operators.
• Example: suppose you are faced with the problem of
getting to the downtown post office. Your solution may
involve getting in your car, starting it, and driving along a
certain route. Substantial planning may go into choosing
the appropriate route, but you need not plan how to
start the car. You are free to treat START-CAR
as an atomic action, even though it really consists
of several actions: sitting down, adjusting the mirror,
inserting the key, and turning the key.
• Macro-operators were used in the early problem-solving
system STRIPS. After each problem solving episode, the
learning component takes the computed plan and stores
it away as a macro-operator, or MACROP.
• MACROP is just like a regular operator, except that it
consists of a sequence of actions, not just a single one.
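A minimal sketch of a macro-operator built from the START-CAR example. The state dictionary and the four primitive operators are hypothetical; the sketch also omits the generalization of the stored plan that a system like STRIPS performs.

```python
from functools import reduce
from typing import Callable, Dict, List

State = Dict[str, bool]
Operator = Callable[[State], State]


def sit_down(s: State) -> State:
    return {**s, "seated": True}


def adjust_mirror(s: State) -> State:
    return {**s, "mirror_ok": True}


def insert_key(s: State) -> State:
    return {**s, "key_in": True}


def turn_key(s: State) -> State:
    # The engine starts only if the key has been inserted.
    return {**s, "engine_running": s.get("key_in", False)}


def make_macrop(steps: List[Operator]) -> Operator:
    """Package a stored plan (a sequence of operators) as one operator."""
    def macrop(s: State) -> State:
        return reduce(lambda state, op: op(state), steps, s)
    return macrop


START_CAR = make_macrop([sit_down, adjust_mirror, insert_key, turn_key])
final = START_CAR({})
```

A planner can now treat `START_CAR` as a single atomic action, just like any primitive operator.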
Learning by Chunking
• Chunking is a process similar in flavor to macro-operators.
• The idea of chunking comes from the psychological literature
on memory and problem solving. Its computational basis is in
production systems.
• When a system detects a useful sequence of production firings,
it creates a chunk, which is essentially a large production that
does the work of an entire sequence of smaller ones.
• SOAR is an example of a production system that uses chunking.
• Chunks learned during the initial stages of solving a problem
are applicable in the later stages of the same problem-solving
episode.
• After a solution is found, the chunks remain in memory, ready
for use in the next problem.
• At present, chunking is inadequate for duplicating the contents
of large directly-computed macro-operator tables.
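A minimal sketch of chunking with propositional productions. The `Production` type and the facts "a" to "d" are hypothetical, and this is not SOAR's actual mechanism; it only shows how a sequence of firings can be collapsed into one production whose conditions are the facts the sequence needed from outside and whose actions are everything it added.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Set


@dataclass(frozen=True)
class Production:
    conditions: FrozenSet[str]  # facts that must be in working memory
    additions: FrozenSet[str]   # facts added when the production fires


def run(productions: Iterable[Production], memory: Set[str]) -> Set[str]:
    """Fire each production in order if its conditions hold."""
    wm = set(memory)
    for p in productions:
        if p.conditions <= wm:
            wm |= p.additions
    return wm


def make_chunk(sequence: List[Production]) -> Production:
    """Build one production that does the work of a whole firing sequence."""
    produced: Set[str] = set()
    needed: Set[str] = set()
    for p in sequence:
        needed |= p.conditions - produced  # only facts not supplied internally
        produced |= p.additions
    return Production(frozenset(needed), frozenset(produced))


p1 = Production(frozenset({"a"}), frozenset({"b"}))
p2 = Production(frozenset({"b", "c"}), frozenset({"d"}))
chunk = make_chunk([p1, p2])
```

Here the chunk requires {a, c} and adds {b, d}, so a single firing replaces the two-step sequence.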
The utility problem
• While new search control knowledge can be of great benefit in solving
future problems efficiently, there are also some drawbacks.
• The learned control rules can take up large amounts of memory and
the search program must take the time to consider each rule at each
step during problem solving.
• Considering a control rule amounts to seeing if its postconditions are
desirable and seeing if its preconditions are satisfied.
• This is a time consuming process.
• While learned rules may reduce problem-solving time by directing the
search more carefully, they may also increase problem-solving time by
forcing the problem solver to consider them.
• If we only want to minimize the number of node expansions in the
search space, then the more control rules we learn, the better.
• But if we want to minimize the total CPU time required to solve a
problem, we must consider this trade-off.
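A toy cost model can make the trade-off concrete. Everything here is a hypothetical assumption: each node expansion costs 1 unit, and testing one control rule at a node costs 0.1 units.

```python
def total_cost(expansions: int, rules: int,
               expand_cost: float = 1.0, match_cost: float = 0.1) -> float:
    """Time = expansions * (cost to expand + cost to test every control rule)."""
    return expansions * (expand_cost + rules * match_cost)


without_rules = total_cost(expansions=1000, rules=0)   # 1000 units
few_rules = total_cost(expansions=200, rules=5)        # 200 * 1.5 = 300 units
many_rules = total_cost(expansions=200, rules=50)      # 200 * 6.0 = 1200 units
```

In this made-up setting, learned rules that cut expansions from 1000 to 200 pay off when there are 5 of them, but 50 rules make the search slower overall even though it expands the same number of nodes.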
Learning from Examples: Induction
• Classification is the process of assigning, to a particular input, the
name of a class to which it belongs.
• The classes from which the classification procedure can choose can
be described in a variety of ways.
• Their definition will depend on the use to which they are put.
• Classification is an important component of many problem solving
tasks.
• Before classification can be done, the classes it will use must be
defined:
– Isolate a set of features that are relevant to the task domain. Define each
class by a weighted sum of values of these features. Ex: if the task is weather
prediction, the parameters can be measurements such as rainfall,
location of cold fronts, etc.
– Isolate a set of features that are relevant to the task domain. Define
each class as a structure composed of these features. Ex: when classifying
animals, the features can be such things as color, length of neck, etc.
• The idea of producing a classification program that can evolve its
own class definitions is called concept learning or induction.
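A minimal sketch of the first (weighted-sum) way of defining classes, using the weather example. The class names, feature names, and weights are hypothetical; a learning program would set the weights itself, for example by parameter adjustment.

```python
from typing import Dict

# Hypothetical class definitions: one weight per feature, plus a bias term.
CLASS_WEIGHTS: Dict[str, Dict[str, float]] = {
    "rain":    {"bias": -3.0, "rainfall_mm": 0.5, "cold_front_nearby": 2.0},
    "no_rain": {"bias": 0.0,  "rainfall_mm": 0.0, "cold_front_nearby": 0.0},
}


def class_score(weights: Dict[str, float], features: Dict[str, float]) -> float:
    """Weighted sum of feature values; a constant 1.0 feature carries the bias."""
    values = {"bias": 1.0, **features}
    return sum(weights.get(name, 0.0) * value for name, value in values.items())


def classify(features: Dict[str, float]) -> str:
    """Assign the input to the class with the largest weighted sum."""
    return max(CLASS_WEIGHTS, key=lambda c: class_score(CLASS_WEIGHTS[c], features))
```

With 4 mm of rainfall and a nearby cold front, "rain" scores -3 + 2 + 2 = 1.0 against 0.0 for "no_rain".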
Winston’s Learning Program
• An early structural concept learning program.
• This program operates in a simple blocks world
domain.
• Its goal was to construct representations of the
definitions of concepts in the blocks domain.
• For example, it learned the concepts House,
Tent and Arch.
• A near miss is an object that is not an instance
of the concept in question but that is very similar
to such instances.
Basic approach of Winston’s Program
1. Begin with a structural description of one
known instance of the concept. Call that
description the concept definition.
2. Examine descriptions of other known
instances of the concept. Generalize the
definition to include them.
3. Examine the descriptions of near misses
of the concept. Restrict the definition to
exclude these.
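A much-simplified sketch of the three steps, with descriptions as sets of (object, relation, value) triples. The arch descriptions are hypothetical. In this sketch, generalization just drops relations that a new instance lacks; a fuller program could instead climb a class hierarchy (for example, brick and wedge both being blocks). A near miss is assumed to differ by extra relations, which become forbidden.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, Set, Tuple

Relation = Tuple[str, str, str]  # (object, relation, object-or-class)
Description = FrozenSet[Relation]


@dataclass
class Concept:
    must: Set[Relation]  # relations every instance must have
    seen: Set[Relation]  # every relation met in some positive instance
    must_not: Set[Relation] = field(default_factory=set)  # relations that rule an object out

    def matches(self, d: Description) -> bool:
        return self.must <= d and not (self.must_not & d)


def start(instance: Description) -> Concept:
    # Step 1: the first known instance is the initial concept definition.
    return Concept(must=set(instance), seen=set(instance))


def generalize(concept: Concept, instance: Description) -> None:
    # Step 2: drop required relations that another positive instance lacks.
    concept.must &= instance
    concept.seen |= instance


def restrict(concept: Concept, near_miss: Description) -> None:
    # Step 3: relations that appear only in the near miss become forbidden.
    concept.must_not |= near_miss - concept.seen


ARCH_BRICK_TOP: Description = frozenset({
    ("left", "supports", "top"), ("right", "supports", "top"),
    ("left", "isa", "brick"), ("right", "isa", "brick"), ("top", "isa", "brick"),
})
ARCH_WEDGE_TOP = (ARCH_BRICK_TOP - {("top", "isa", "brick")}) | {("top", "isa", "wedge")}
NEAR_MISS_TOUCHING = ARCH_BRICK_TOP | {("left", "touches", "right")}

arch = start(ARCH_BRICK_TOP)
generalize(arch, ARCH_WEDGE_TOP)
restrict(arch, NEAR_MISS_TOUCHING)
```

After these steps the top may be either a brick or a wedge, while two posts that touch rule an object out as an arch.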
Version spaces
• The goal of version spaces is to produce a description that is
consistent with all positive examples and no negative examples in
the training set.
• This is another approach to concept learning.
• Version spaces work by maintaining a set of possible descriptions
and evolving that set as new examples and near misses are
presented.
• The version space is simply a set of descriptions, so an initial idea is
to keep an explicit list of those descriptions.
• Version space consists of two subsets of the concept space.
• One subset, called G, contains the most general descriptions consistent
with the training examples. The other subset, called S, contains the most
specific descriptions consistent with the training examples.
• The algorithm for narrowing the version space is called the
Candidate elimination algorithm.
Algorithm: Candidate Elimination
• Given: A representation language and a set of positive
and negative examples expressed in that language.
• Compute : A concept description that is consistent with all
the positive examples and none of the negative examples.
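1. Initialize G to contain one element: the most general description
(every feature is a variable, so it covers every example).
2. Initialize S to contain one element: the first positive
example.
3. Accept a new training example. If it is a positive example,
first remove from G any descriptions that do not cover the
example. Then update the set S to contain the most specific
set of descriptions in the version space that cover the
example and the current elements of the S set. If it is a negative
example, do the inverse: first remove from S any descriptions that
cover the example, then update G to contain the most general set of
descriptions in the version space that do not cover the example.
4. If S and G are both singleton sets, then if they are
identical, output their values and halt; if they are different,
the training examples were inconsistent, so output that result and halt.
Otherwise, go to step 3.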
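A sketch of candidate elimination for a simple conjunctive language: each description is a tuple of attribute values in which "?" means "any value". The attribute domains and the car examples are hypothetical, chosen to learn a concept like the Japanese economy car in the decision-tree slide.

```python
from typing import List, Sequence, Tuple

ANY = "?"  # matches any value of an attribute
Hypothesis = Tuple[str, ...]


def covers(h: Hypothesis, x: Sequence[str]) -> bool:
    return all(hv == ANY or hv == xv for hv, xv in zip(h, x))


def more_general_or_equal(h1: Hypothesis, h2: Hypothesis) -> bool:
    return all(a == ANY or a == b for a, b in zip(h1, h2))


def candidate_elimination(examples, domains):
    first_positive = next(x for x, positive in examples if positive)
    G: List[Hypothesis] = [tuple(ANY for _ in domains)]  # step 1
    S: List[Hypothesis] = [tuple(first_positive)]        # step 2
    for x, positive in examples:                         # step 3
        x = tuple(x)
        if positive:
            G = [g for g in G if covers(g, x)]
            # Minimal generalization of each s so that it also covers x.
            S = [tuple(sv if sv == xv else ANY for sv, xv in zip(s, x)) for s in S]
            S = [s for s in S if any(more_general_or_equal(g, s) for g in G)]
        else:
            S = [s for s in S if not covers(s, x)]
            new_G: List[Hypothesis] = []
            for g in G:
                if not covers(g, x):
                    new_G.append(g)
                    continue
                # Minimal specializations of g that exclude x but still cover S.
                for i, value in enumerate(g):
                    if value != ANY:
                        continue
                    for v in domains[i]:
                        if v == x[i]:
                            continue
                        h = g[:i] + (v,) + g[i + 1:]
                        if any(more_general_or_equal(h, s) for s in S):
                            new_G.append(h)
            unique = list(dict.fromkeys(new_G))
            # Keep only the maximally general members of G.
            G = [g for g in unique
                 if not any(o != g and more_general_or_equal(o, g) for o in unique)]
        if len(S) == 1 and S == G:                        # step 4: converged
            break
    return S, G


DOMAINS = [
    ("Japan", "USA"),                   # origin
    ("Honda", "Toyota", "Chrysler"),    # manufacturer
    ("Blue", "Green", "Red", "White"),  # color
    ("1970", "1980", "1990"),           # decade
    ("Economy", "Sports"),              # type
]
EXAMPLES = [
    (("Japan", "Honda", "Blue", "1980", "Economy"), True),
    (("Japan", "Toyota", "Green", "1970", "Sports"), False),
    (("Japan", "Toyota", "Blue", "1990", "Economy"), True),
    (("USA", "Chrysler", "Red", "1980", "Economy"), False),
    (("Japan", "Honda", "White", "1980", "Economy"), True),
]
S, G = candidate_elimination(EXAMPLES, DOMAINS)
```

On these five examples S and G converge to the single description (Japan, ?, ?, ?, Economy). The sketch does not report the inconsistent case of step 4 separately; an empty S or G signals it.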
Decision Trees
• This is a third approach to concept learning.
• To classify a particular input, we start at the top of the
tree and answer questions until we reach a leaf, where
the classification is stored.
• ID3 is an example of a program that builds decision trees.
• ID3 uses an iterative method to build up decision trees,
preferring simple trees over complex ones, on the theory
that simple trees are more accurate classifiers of future
inputs.
• It begins by choosing a random subset of the training
examples.
• This subset is called the window.
• The algorithm builds a decision tree that correctly
classifies all examples in the window.
• The tree is then tested on the training examples outside the window.
If all of them are classified correctly, the algorithm halts; otherwise,
some of the misclassified examples are added to the window and the
process repeats.
Decision tree for “Japanese economy car”
Origin?
– India → (-)
– USA → (-)
– UK → (-)
– Aus → (-)
– Japan → Type?
   – Sports → (-)
   – Economy → (+)
   – Luxury → (-)
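A sketch of ID3-style tree building, choosing at each node the attribute with the highest information gain (reduction in entropy). The training data below are hypothetical, and the windowing loop is omitted: the sketch builds one tree from all the examples it is given.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple, Union

Example = Tuple[Dict[str, str], str]              # (attribute values, class label)
Tree = Union[str, Tuple[str, Dict[str, "Tree"]]]  # leaf label or (attribute, branches)


def entropy(labels: List[str]) -> float:
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(examples: List[Example], attribute: str) -> float:
    labels = [label for _, label in examples]
    remainder = 0.0
    for value in {x[attribute] for x, _ in examples}:
        subset = [label for x, label in examples if x[attribute] == value]
        remainder += len(subset) / len(examples) * entropy(subset)
    return entropy(labels) - remainder


def id3(examples: List[Example], attributes: List[str]) -> Tree:
    labels = [label for _, label in examples]
    if len(set(labels)) == 1:                      # pure node: make a leaf
        return labels[0]
    if not attributes:                             # no tests left: majority leaf
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: information_gain(examples, a))
    rest = [a for a in attributes if a != best]
    branches: Dict[str, Tree] = {}
    for value in sorted({x[best] for x, _ in examples}):
        subset = [(x, label) for x, label in examples if x[best] == value]
        branches[value] = id3(subset, rest)
    return (best, branches)


def classify(tree: Tree, x: Dict[str, str]) -> str:
    while isinstance(tree, tuple):                 # answer questions down to a leaf
        attribute, branches = tree
        tree = branches[x[attribute]]              # KeyError for unseen values
    return tree


CARS: List[Example] = [
    ({"origin": "Japan", "type": "Economy"}, "+"),
    ({"origin": "Japan", "type": "Sports"}, "-"),
    ({"origin": "Japan", "type": "Luxury"}, "-"),
    ({"origin": "USA", "type": "Economy"}, "-"),
    ({"origin": "UK", "type": "Economy"}, "-"),
    ({"origin": "India", "type": "Economy"}, "-"),
    ({"origin": "Aus", "type": "Sports"}, "-"),
]
tree = id3(CARS, ["origin", "type"])
```

On this data, splitting on origin leaves less uncertainty than splitting on type, so the learned tree has the same shape as the one drawn above.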
Explanation-Based Learning
• Learning complex concepts using induction procedures typically
requires a substantial number of training instances.
• But people seem to be able to learn quite a bit from single
examples.
• We don’t need to see dozens of positive and negative examples of
fork positions in chess in order to learn to avoid this trap in the future
and perhaps use it to our advantage.
• What makes such single-example learning possible? The answer is
knowledge.
• Much of the recent work in machine learning has moved away from
the empirical, data intensive approach described in the last section
toward this more analytical knowledge intensive approach.
• A number of independent studies led to the characterization of this
approach as explanation-based learning (EBL).
• An EBL system attempts to learn from a single example x by
explaining why x is an example of the target concept.
• The explanation is then generalized, and the system’s
performance is improved through the availability of this knowledge.
EBL
• We can think of EBL programs as accepting the following as input:
– A training example
– A goal concept: a high-level description of what the program is supposed
to learn
– An operationality criterion: a description of which concepts are usable.
– A domain theory: A set of rules that describe relationships between
objects and actions in a domain.
• From this EBL computes a generalization of the training example
that is sufficient to describe the goal concept, and also satisfies the
operationality criterion.
• Explanation-based generalization (EBG) is an algorithm for EBL and
has two steps: (1) explain, (2) generalize
• During the explanation step, the domain theory is used to prune
away all the unimportant aspects of the training example with
respect to the goal concept. What is left is an explanation of why the
training example is an instance of the goal concept. This explanation
is expressed in terms that satisfy the operationality criterion.
• The next step is to generalize the explanation as far as possible
while still describing the goal concept.
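A sketch of the explain step over a hypothetical, propositional domain theory loosely modeled on the familiar "cup" example. The rule names, features, and operationality criterion are all invented for illustration. Proving the goal picks out just the operational facts the explanation needs, so irrelevant features such as color or owner are pruned; because this theory has no variables, the generalize step (replacing constants with variables) is trivial and not shown.

```python
from typing import Dict, List, Optional, Set

# Hypothetical domain theory: each concept maps to alternative rule bodies.
DOMAIN_THEORY: Dict[str, List[List[str]]] = {
    "cup": [["stable", "liftable", "open_vessel"]],
    "stable": [["has_flat_bottom"]],
    "liftable": [["is_light", "has_handle"]],
    "open_vessel": [["has_concavity", "concavity_points_up"]],
}

# Operationality criterion: features that can be observed directly.
OPERATIONAL: Set[str] = {
    "has_flat_bottom", "is_light", "has_handle",
    "has_concavity", "concavity_points_up", "is_red", "owned_by_bob",
}


def explain(goal: str, example: Set[str],
            theory: Dict[str, List[List[str]]] = DOMAIN_THEORY,
            operational: Set[str] = OPERATIONAL) -> Optional[List[str]]:
    """Return the operational facts that prove `goal` for `example`, or None."""
    if goal in operational:
        return [goal] if goal in example else None
    for body in theory.get(goal, []):
        leaves: List[str] = []
        for subgoal in body:
            sub = explain(subgoal, example, theory, operational)
            if sub is None:
                break          # this rule fails; try the next alternative
            leaves.extend(sub)
        else:
            return leaves      # every subgoal was proved
    return None


training_example = {"has_flat_bottom", "is_light", "has_handle", "has_concavity",
                    "concavity_points_up", "is_red", "owned_by_bob"}
learned_rule_body = explain("cup", training_example)
```

The result can be read as a learned rule: anything with a flat bottom, light weight, a handle, and an upward-pointing concavity is a cup, learned from a single example.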
Discovery
• Learning is the process by which one entity
acquires knowledge. Usually that knowledge is
already possessed by some number of other
entities who may serve as teachers.
• Discovery is a restricted form of learning in
which one entity acquires knowledge without the
help of a teacher.
– Theory-Driven Discovery
– Data-Driven Discovery
– Clustering
AM: Theory-driven Discovery
• Discovery is certainly learning. But it is also, perhaps more clearly
than other kinds of learning, problem solving.
• Suppose that we want to build a program to discover things in
mathematics; such a program would have to rely heavily on
problem-solving techniques.
• AM, written by Lenat, worked from a few basic concepts of
set theory to discover a good deal of standard number theory.
• AM exploited a variety of general-purpose AI techniques. It used a
frame system to represent mathematical concepts. One of the major
activities of AM is to create new concepts and fill in their slots.
• AM uses heuristic search, guided by a set of 250 heuristic rules
representing hints about activities that are likely to lead to
“interesting” discoveries.
• In one run AM discovered the concept of prime numbers. How did it
do it?
– Having stumbled onto the natural numbers, AM explored operations
such as addition, multiplication and their inverses. It created the concept
of divisibility and noticed that some numbers had very few divisors.
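A hypothetical illustration (not AM's frame-based implementation) of following up the observation about divisors: group numbers by how many divisors they have and look at the extreme cases. The group with exactly two divisors is the concept of prime numbers.

```python
from typing import Dict, List


def divisors(n: int) -> List[int]:
    return [d for d in range(1, n + 1) if n % d == 0]


def numbers_by_divisor_count(limit: int) -> Dict[int, List[int]]:
    """Map each divisor count to the numbers in 1..limit that have it."""
    groups: Dict[int, List[int]] = {}
    for n in range(1, limit + 1):
        groups.setdefault(len(divisors(n)), []).append(n)
    return groups


groups = numbers_by_divisor_count(30)
fewest = groups[1]        # numbers with one divisor
two_divisors = groups[2]  # numbers with exactly two divisors: the primes
```

No number has zero divisors, only 1 has exactly one, and the numbers with exactly two divisors up to 30 are 2, 3, 5, 7, 11, 13, 17, 19, 23, and 29.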
Bacon: Data Driven Discovery
• AM showed how discovery might occur in a theoretical setting.
• Scientific discovery has inspired several computer models.
• Langley et al. presented a model of data-driven scientific discovery that has been
implemented as a program called BACON (named after Sir Francis Bacon, a
philosopher of science).
• BACON begins with a set of variables for a problem.
• For example in the study of the behavior of gases, some variables are p, the pressure
on the gas, V, the volume of the gas, n, the amount of gas in moles, and T the
temperature of the gas.
• Physicists have long known a law, called the ideal gas law, that relates these variables.
• BACON is able to derive this law on its own.
• First, BACON holds the variables n and T constant, performing experiments at different
pressures p1, p2, and p3.
• BACON notices that as the pressure increases, the volume V decreases.
• Eventually BACON finds that for all values of n, p, V, and T, pV/nT = 8.32, which is the
ideal gas law as derived by BACON (8.32 approximates the gas constant R, about 8.314 J/(mol·K)).
• BACON has been used to discover a wide variety of scientific laws, such as Kepler’s third
law, Ohm’s law, the conservation of momentum, and Joule’s law.
• BACON’s discovery procedure is state-space search.
• A better understanding of the science of scientific discovery may lead one day to
programs that display true creativity.
• Much more work must be done in areas of science that BACON does not model.
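A much-simplified sketch of a BACON-style heuristic: given two columns of measurements, test whether their product or their ratio is constant and, if so, record it as a new term. The "measurements" are synthetic values generated from the ideal gas law itself (n = 1, R = 8.314), so this only illustrates the search step, not a real discovery; BACON's actual heuristics are richer than this single test.

```python
from typing import List, Optional, Sequence, Tuple


def nearly_constant(values: List[float], tol: float = 1e-9) -> bool:
    mean = sum(values) / len(values)
    return all(abs(v - mean) <= tol * abs(mean) for v in values)


def find_invariant(xs: Sequence[float], ys: Sequence[float],
                   tol: float = 1e-9) -> Optional[Tuple[str, float]]:
    """Try x*y, then x/y; report whichever is (nearly) constant."""
    products = [x * y for x, y in zip(xs, ys)]
    if nearly_constant(products, tol):
        return ("product", products[0])
    ratios = [x / y for x, y in zip(xs, ys)]
    if nearly_constant(ratios, tol):
        return ("ratio", ratios[0])
    return None


R = 8.314  # used only to generate the synthetic data

# Experiment 1: hold n = 1 and T = 300 fixed, vary the pressure.
pressures = [100_000.0, 200_000.0, 300_000.0]
volumes = [1 * R * 300 / p for p in pressures]
step1 = find_invariant(pressures, volumes)  # pV turns out to be constant

# Experiment 2: hold n = 1 and p fixed, vary T; treat pV as a new term.
p_fixed = 100_000.0
temps = [250.0, 300.0, 350.0]
pv_terms = [p_fixed * (1 * R * t / p_fixed) for t in temps]
step2 = find_invariant(pv_terms, temps)     # pV/T turns out to be constant
```

Each discovered invariant becomes a new variable for the next round of experiments, which is how the search builds up pV, then pV/T, and finally pV/nT.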
Clustering
• Clustering is very similar to induction. In inductive learning, a
program learns to classify objects based on the labels provided
by a teacher.
• In clustering, no class labels are provided.
• The program must discover for itself the natural classes that exist
for the objects, in addition to a method for classifying instances.
• AUTOCLASS is one program that accepts a number of training
cases and hypothesizes a set of classes.
• For any given case, the program provides a set of probabilities that
predict into which classes the case is likely to fall.
• In one application, AUTOCLASS found meaningful new classes of
stars from their infrared spectral data.
• This was an instance of true discovery by computer, since the facts
it discovered were previously unknown to astronomy.
• AUTOCLASS uses statistical (Bayesian) reasoning.
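A small sketch of probabilistic clustering, in the spirit of the class probabilities described above. It fits a two-class, one-dimensional Gaussian mixture with the expectation-maximization (EM) algorithm. AUTOCLASS itself is considerably more elaborate (for instance, it also searches over the number of classes), so this is a stand-in technique, and the data values are hypothetical.

```python
import math
from typing import List, Tuple


def gaussian(x: float, mean: float, var: float) -> float:
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_two_classes(data: List[float],
                   iterations: int = 50) -> Tuple[List[float], List[List[float]]]:
    """Fit a two-class Gaussian mixture; return class means and per-case class probabilities."""
    means = [min(data), max(data)]  # crude initial guesses
    variances = [1.0, 1.0]
    weights = [0.5, 0.5]
    probs: List[List[float]] = []
    for _ in range(iterations):
        # E-step: probability that each case belongs to each class.
        probs = []
        for x in data:
            p = [w * gaussian(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(p)
            probs.append([pk / total for pk in p])
        # M-step: re-estimate each class from the cases, weighted by those probabilities.
        for k in range(2):
            r = [pr[k] for pr in probs]
            nk = sum(r)
            weights[k] = nk / len(data)
            means[k] = sum(rk * x for rk, x in zip(r, data)) / nk
            var_k = sum(rk * (x - means[k]) ** 2 for rk, x in zip(r, data)) / nk
            variances[k] = max(var_k, 1e-6)  # avoid a zero variance
    return means, probs


data = [0.8, 0.9, 1.0, 1.1, 1.2, 4.8, 4.9, 5.0, 5.1, 5.2]
means, probs = em_two_classes(data)
```

No labels are given: the program finds two classes centered near 1.0 and 5.0 and, for each case, a probability of belonging to each class.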
Analogy
• Analogy is a powerful inference tool.
• Our language and reasoning are laden with analogies.
– Last month, the stock market was a roller coaster.
– Bill is like a fire engine.
– Problems in electromagnetism are just like problems in fluid flow.
• Underlying each of these examples is a complicated mapping between what
appear to be dissimilar concepts.
• For example, to understand the first sentence above, it is necessary to do two
things:
1. Pick out one key property of a roller coaster, namely that it travels up and down
rapidly
2. Realize that physical travel is itself an analogy for numerical fluctuations.
• This is no easy trick.
• The space of possible analogies is very large.
• An AI program that is unable to grasp analogy will be difficult to talk to and
consequently difficult to teach.
• Thus analogical reasoning is an important factor in learning by advice taking.
• Humans often solve problems by making analogies to things they already
understand how to do.
Formal Learning Theory
• Learning has attracted the attention of mathematicians
and theoretical computer scientists.
• Inductive learning in particular has received considerable
attention.
• Formally, a device learns a concept if it can, given
positive and negative examples, produce an algorithm
that will classify future examples correctly with probability
1/h.
• The complexity of learning a concept is a function of
three factors: the error tolerance (h), the number of
binary features present in the examples (t) and the size
of the rule necessary to make the discrimination (f).
• If the number of training examples required is polynomial
in h, t, and f, then the concept is said to be learnable.
Formal Learning Theory
• For example, given positive and negative examples of
strings in some regular language, can we efficiently
induce the finite automaton that produces all and only
the strings in the language? The answer is no; an
exponential number of computational steps is required.
• It is difficult to tell how such mathematical studies of
learning will affect the ways in which we solve AI
problems in practice.
• After all, people are able to solve many exponentially
hard problems by using knowledge to constrain the
space of possible solutions.
• Perhaps mathematical theory will one day be used to
quantify the use of such knowledge but this prospect
seems far off.
Neural Net Learning and Genetic
Learning
• In early work on machine learning, collections of idealized neurons
were presented with stimuli and prodded into changing their behaviour
via forms of reward and punishment.
• Researchers hoped that by imitating the learning mechanisms of
animals, they might build learning machines from very simple parts.
Such hopes proved elusive.
• However, the field of neural network learning has seen a resurgence
in recent years, partly as a result of the discovery of powerful new
learning algorithms.
• While neural network models are based on a computational “brain
metaphor”, a number of other learning techniques make use of a
metaphor based on evolution.
• In this work, learning occurs through a selection process that begins
with a large population of random programs.
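Two tiny sketches of the learning styles named on this slide. The first is a single perceptron trained by error correction (weights move only when an output is wrong), learning the logical AND function. The second is a minimal genetic algorithm: a random population evolves by selection, crossover, and mutation. It evolves bit strings toward a hypothetical goal (as many 1s as possible) rather than evolving actual programs, and all parameter values are illustrative.

```python
import random
from typing import List, Sequence, Tuple


# --- Neural-net style learning: a single perceptron trained by error correction ---
def predict(w: Sequence[float], b: float, x: Sequence[int]) -> int:
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0


def train_perceptron(samples, epochs: int = 100) -> Tuple[List[float], float]:
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, target in samples:
            error = target - predict(w, b, x)  # 0 when correct, +1 or -1 when wrong
            if error:
                mistakes += 1
                w = [wi + error * xi for wi, xi in zip(w, x)]
                b += error
        if mistakes == 0:                      # every sample classified correctly
            break
    return w, b


AND_SAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train_perceptron(AND_SAMPLES)


# --- Evolutionary learning: a tiny genetic algorithm over bit strings ---
def fitness(bits: List[int]) -> int:
    return sum(bits)  # hypothetical goal: as many 1s as possible


def evolve(pop_size: int = 20, length: int = 16, generations: int = 30,
           mutation_rate: float = 0.02, seed: int = 0) -> Tuple[int, int]:
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    initial_best = max(fitness(ind) for ind in population)
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]  # selection: the fitter half reproduces
        children = [population[0][:]]          # elitism: the best individual survives
        while len(children) < pop_size:
            mom, dad = rng.sample(parents, 2)
            cut = rng.randrange(1, length)
            child = mom[:cut] + dad[cut:]      # one-point crossover
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        population = children
    return initial_best, max(fitness(ind) for ind in population)


initial_best, final_best = evolve()
```

Because the best individual is always carried over, the best fitness in the population can never decrease from one generation to the next.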
Summary
• The most important thing to conclude from our
study of automated learning is that learning itself
is a problem-solving process.
– Learning by taking advice
– Learning from examples
– Learning in problem solving
– Discovery
• A learning machine is the dream system of AI
