Analytical Learning


05-04-2018 Dr. Vijaya Sri Kompalli
ANALYTICAL LEARNING
INTRODUCTION

• Inductive learning (supervised learning): generalize from observed training examples labeled positive or negative.
• Examples: neural networks, decision tree learning, Inductive Logic Programming, genetic algorithms.
• These methods perform poorly when training data is insufficient.
• There are fundamental bounds on the accuracy that can be achieved when learning purely inductively.
• Solution: be willing to reconsider the formulation of the learning problem, and develop learning algorithms that accept explicit prior knowledge as an input, in addition to the training data.
• Explanation-based learning is one such approach.
EXPLANATION-BASED LEARNING (EBL)
• EBL uses prior knowledge to analyze, or explain, each training example in order to infer which example features are relevant to the target function and which are irrelevant.
• It uses prior knowledge to reduce the complexity of the hypothesis space to be searched.
EXAMPLE
• Target concept: chess positions in which black will lose its queen within two moves.
• A description such as "board positions in which the black king and queen are simultaneously attacked" requires heavy explanation, or analyzing ability:
  ◦ "Because white's knight is attacking both the king and queen, black must move out of check, thereby allowing the knight to capture the queen."
• The learner can rationally generalize: once the explanation matches, the position is included in the general hypothesis.
• Such reasoning rests on the principle of optimal moves, and reaches instances left uncovered by the specific training examples.
• There are learning algorithms that learn from such explanations. Example: PROLOG-EBG.
INDUCTIVE LEARNING VS. ANALYTICAL LEARNING
INDUCTIVE LEARNING
• In inductive learning, the learner is given:
  ◦ a hypothesis space H from which it must select an output hypothesis, and
  ◦ a set of training examples D = {(x1, f(x1)), ..., (xn, f(xn))}, where f(xi) is the target value for the instance xi.
• The desired output of the learner is a hypothesis h from H that is consistent with these training examples.
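The consistency requirement can be sketched in code. The encoding below (hypotheses as Python functions, D as a list of pairs) is an illustrative assumption, not something from the slides:

```python
# Consistency of a hypothesis with training data: h is consistent
# with D iff h(x) == f(x) for every pair (x, f(x)) in D.

def is_consistent(h, D):
    """True iff the hypothesis agrees with every training example."""
    return all(h(x) == fx for x, fx in D)

# Toy target: positive iff the instance value exceeds a threshold.
D = [(3, True), (7, True), (1, False)]
h = lambda x: x > 2

print(is_consistent(h, D))                # True
print(is_consistent(lambda x: x > 5, D))  # False (misclassifies x = 3)
```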
INDUCTIVE LEARNING SPACE
[Figure: the learner takes the hypothesis space H and the training examples D = {(x1, f1), ..., (xn, fn)} as input, and produces an output hypothesis h.]
EXAMPLE: CHESS GAME
• Target concept: "chessboard positions in which black will lose its queen within two moves."
• Each xi describes a particular chess position.
• f(xi) = True: xi is a position in which black will lose its queen within two moves.
• f(xi) = False: xi is a position in which black will not lose its queen within two moves.
ANALYTICAL LEARNING
• In analytical learning, the input to the learner includes the same hypothesis space H and training examples D as for inductive learning.
• In addition, the learner is provided a further input: a domain theory B consisting of background knowledge that can be used to explain observed training examples.
• The desired output of the learner is a hypothesis h from H that is consistent with both the training examples D and the domain theory B.
ANALYTICAL LEARNING SPACE
[Figure: the learner takes the hypothesis space H, the training examples D = {(x1, f1), ..., (xn, fn)}, and the domain theory B as input, and produces an output hypothesis h.]
• B (observed results from experts): for a given value of x1 the target is T1, as it is observed from the existing samples that the sampling maps 70% of the values.
EXAMPLE: CHESS GAME
• Target concept: "chessboard positions in which black will lose its queen within two moves."
• Domain theory B: the predefined, well-defined legal moves of chess.
• Each xi describes a particular chess position.
• f(xi) = True: xi is a position in which black will lose its queen within two moves.
• f(xi) = False: xi is a position in which black will not lose its queen within two moves.
• Note: B does not entail the negation of h.
ANALYTICAL EXAMPLE: ROBOT SORTING
VARIOUS PHYSICAL OBJECTS
• Consider an instance space X in which each instance is a pair of physical objects.
• Each of the two physical objects in the instance is described by the predicates Color, Volume, Owner, Material, Type, and Density.
• The relationship between the two objects is described by the predicate On.
• Given this instance space, the task is to learn the target concept "pairs of physical objects, such that one can be stacked safely on the other," denoted by the predicate SafeToStack(x, y).
• Learning this target concept might be useful, for example, to a robot system that has the task of storing various physical objects within a limited workspace.
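As a rough illustration, a single training instance for SafeToStack can be represented as a set of ground facts over the predicates above. The encoding as tuples, and most of the attribute values, are illustrative assumptions (Volume 2, Density 0.3, and Type Endtable follow the rule shown later in these slides):

```python
# A training instance for SafeToStack(Obj1, Obj2), represented as
# a set of ground facts (predicate, arguments).
instance = {
    ("On", "Obj1", "Obj2"),
    ("Type", "Obj1", "Box"),
    ("Type", "Obj2", "Endtable"),
    ("Color", "Obj1", "Red"),
    ("Color", "Obj2", "Blue"),
    ("Volume", "Obj1", 2),
    ("Density", "Obj1", 0.3),
    ("Material", "Obj1", "Cardboard"),
    ("Material", "Obj2", "Wood"),
    ("Owner", "Obj1", "Fred"),
    ("Owner", "Obj2", "Louise"),
}

# The training example pairs the instance with its classification.
training_example = (instance, True)   # SafeToStack(Obj1, Obj2) holds

def facts_about(obj, facts):
    """Return all facts mentioning a given object."""
    return {f for f in facts if obj in f[1:]}
```

Only a handful of these facts will turn out to be relevant to the target concept; identifying which is exactly the job of the explanation step.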
LEARNING WITH PERFECT
DOMAIN THEORIES: PROLOG-EBG
• A domain theory is said to be:
  ◦ correct if each of its assertions is a truthful statement about the world;
  ◦ complete, with respect to a given target concept and instance space, if the domain theory covers every positive example in the instance space.
• After all, if the learner had a perfect domain theory, why would it need to learn? There are two responses to this question.
REASONS
• Although it is quite easy to write down the legal moves of chess that constitute this domain theory, it is extremely difficult to write down the optimal chess-playing strategy.
• It is difficult to write a perfectly correct and complete theory even for our relatively simple SafeToStack problem.
• A more realistic assumption is that plausible explanations based on imperfect domain theories must be used, rather than exact proofs based on perfect knowledge.
EXPLANATION-BASED LEARNING
ALGORITHMS : PROLOG-EBG
• PROLOG-EBG (Kedar-Cabelli and McCarty, 1987) is a sequential covering algorithm.
• It operates by learning a single Horn clause rule, removing the positive training examples covered by this rule, then iterating this process on the remaining positive examples until no further positive examples remain uncovered.
• When given a complete and correct domain theory, PROLOG-EBG is guaranteed to output a hypothesis (a set of rules) that is itself correct and that covers the observed positive training examples.
ILLUSTRATIVE TRACE
• The PROLOG-EBG algorithm is a sequential covering algorithm that considers the training data incrementally.
• For each new positive training example that is not yet covered by a learned Horn clause, it forms a new Horn clause by:
  ◦ (1) explaining the new positive training example,
  ◦ (2) analyzing this explanation to determine an appropriate generalization, and
  ◦ (3) refining the current hypothesis by adding a new Horn clause rule to cover this positive example, as well as other similar instances.
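The covering loop with these three steps can be sketched as follows. The helper functions `explain` and `analyze`, and the toy product-threshold rule, are hypothetical stand-ins, not PROLOG-EBG's actual theorem-proving machinery:

```python
# A minimal sketch of the PROLOG-EBG sequential covering loop.

def covers(rule, example):
    """Coverage test: does the learned rule fire on the example?"""
    return rule(example)

def prolog_ebg(positive_examples, explain, analyze):
    """Learn one Horn-clause-like rule per pass until every
    positive example is covered."""
    learned_rules = []
    uncovered = list(positive_examples)
    while uncovered:
        x = uncovered[0]                 # an uncovered positive example
        explanation = explain(x)         # step 1: explain it via the domain theory
        rule = analyze(explanation)      # step 2: generalize (weakest preimage)
        learned_rules.append(rule)       # step 3: refine the current hypothesis
        uncovered = [e for e in uncovered if not covers(rule, e)]
    return learned_rules

# Toy usage: "explain" extracts the key quantity behind the example,
# "analyze" generalizes it into a constraint on that quantity.
examples = [{"volume": 2, "density": 0.3}, {"volume": 10, "density": 0.1}]
rules = prolog_ebg(
    examples,
    explain=lambda x: x["volume"] * x["density"],
    analyze=lambda w: (lambda x: x["volume"] * x["density"] < 5),
)
```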
1. EXPLAIN THE TRAINING
EXAMPLE
• When the domain theory is correct and complete, this explanation constitutes a proof that the training example satisfies the target concept.
• When dealing with imperfect prior knowledge, the notion of explanation must be extended to allow for plausible, approximate arguments rather than perfect proofs.
2. ANALYZE THE EXPLANATION
• The key question: "Of the many features that happen to be true of the current training example, which ones are generally relevant to the target concept?"
• By collecting just the features mentioned in the leaf nodes of the explanation and substituting variables x and y for Obj1 and Obj2, we can form a general rule that is justified by the domain theory:
  ◦ SafeToStack(x, y) <- Volume(x, 2) ^ Density(x, 0.3) ^ Type(y, Endtable)
• PROLOG-EBG computes the most general rule that can be justified by the explanation, by computing the weakest preimage of the explanation.
• Definition: the weakest preimage of a conclusion C with respect to a proof P is the most general set of initial assertions A such that A entails C according to P.
• This more general rule does not require the specific values for Volume and Density that were required by the first rule.
• Instead, it states a more general constraint on the values of these attributes.
• PROLOG-EBG computes the weakest preimage of the target concept with respect to the explanation, using a general procedure called regression.
• The regression procedure operates on a domain theory represented by an arbitrary set of Horn clauses.
• It works iteratively backward through the explanation:
  ◦ first computing the weakest preimage of the target concept with respect to the final proof step in the explanation,
  ◦ then computing the weakest preimage of the resulting expressions with respect to the preceding step, and so on.
• The procedure terminates when it has iterated over all steps in the explanation, yielding the weakest precondition of the target concept with respect to the literals at the leaf nodes of the explanation.
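One backward regression step can be sketched as unifying a goal literal with a clause head and replacing it by the clause body under the unifying substitution. The literal encoding, the toy unifier (which handles only this simple case, not full first-order unification), and the `Lighter` clause are all illustrative assumptions:

```python
# One backward regression step over Horn clauses. Literals are
# tuples like ("Volume", "?x", "?vx"); strings starting with "?"
# are variables.

def unify(pattern, fact, subst):
    """Unify two literals, extending the substitution, or return None."""
    if len(pattern) != len(fact) or pattern[0] != fact[0]:
        return None
    subst = dict(subst)
    for p, f in zip(pattern[1:], fact[1:]):
        p = subst.get(p, p)              # resolve an already-bound variable
        if isinstance(p, str) and p.startswith("?"):
            subst[p] = f
        elif p != f:
            return None
    return subst

def substitute(literal, subst):
    return (literal[0],) + tuple(subst.get(t, t) for t in literal[1:])

def regress(goals, rule_head, rule_body):
    """Replace the first goal that unifies with rule_head by the rule
    body, under the unifying substitution: one regression step."""
    for i, g in enumerate(goals):
        s = unify(rule_head, g, {})
        if s is not None:
            new = goals[:i] + [substitute(b, s) for b in rule_body] + goals[i + 1:]
            return [substitute(g2, s) for g2 in new]
    return goals

# Regress SafeToStack(?x, ?y) through a hypothetical domain-theory
# clause  SafeToStack(?x, ?y) <- Lighter(?x, ?y):
goals = [("SafeToStack", "?x", "?y")]
goals = regress(goals, ("SafeToStack", "?x", "?y"), [("Lighter", "?x", "?y")])
```

Iterating this step backward over every clause used in the proof, until only leaf-node literals remain, yields the weakest preimage described above.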
REGRESSION
3. REFINE THE CURRENT
HYPOTHESIS
• The current hypothesis at each stage consists of the set of Horn clauses learned thus far.
• At each stage, the sequential covering algorithm picks a new positive example that is not yet covered by the current Horn clauses, explains this new example, and formulates a new rule.
• Only positive examples are covered in the algorithm as we have defined it, and the learned set of Horn clause rules predicts only positive examples.
• A new instance is classified as negative if the current rules fail to predict that it is positive.
• This is in keeping with the standard negation-as-failure approach used in Horn clause inference systems such as PROLOG.
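Negation-as-failure classification with a learned rule set can be sketched as follows; the rules are represented as plain Python predicates, and the sample rule is hypothetical:

```python
# An instance is classified positive iff some learned rule fires;
# otherwise it defaults to negative (negation as failure).

def classify(rules, instance):
    """Positive if any rule covers the instance, else negative."""
    return any(rule(instance) for rule in rules)

# Hypothetical learned rule for SafeToStack-like data:
rules = [
    lambda x: x["volume"] * x["density"] < 5 and x["support"] == "endtable",
]

print(classify(rules, {"volume": 2, "density": 0.3, "support": "endtable"}))  # True
print(classify(rules, {"volume": 20, "density": 1.0, "support": "floor"}))    # False
```

Note that an empty rule set classifies everything as negative, which matches the default behavior described above.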
REMARKS ON EXPLANATION-
BASED LEARNING
• Unlike inductive methods, PROLOG-EBG produces justified general hypotheses by using prior knowledge to analyze individual examples.
• The explanation of how the example satisfies the target concept determines which example attributes are relevant: those mentioned by the explanation.
• The further analysis of the explanation (regressing the target concept to determine its weakest preimage with respect to the explanation) allows deriving more general constraints on the values of the relevant features.
• Each learned Horn clause corresponds to a sufficient condition for satisfying the target concept. The set of learned Horn clauses covers the positive training examples encountered by the learner, as well as other instances that share the same explanations.
• The generality of the learned Horn clauses will depend on the formulation of the domain theory and on the sequence in which training examples are considered.
• PROLOG-EBG implicitly assumes that the domain theory is correct and complete. If the domain theory is incorrect or incomplete, the resulting learned concept may also be incorrect.
CAPABILITIES AND LIMITATIONS
• EBL as theory-guided generalization of examples: EBL uses its given domain theory to generalize rationally from examples, based on relevance.
• EBL as example-guided reformulation of theories: the PROLOG-EBG algorithm can be viewed as a method for reformulating the domain theory into a more operational form, by forming rules that classify instances directly.
• EBL as "just" restating what the learner already "knows": if the learner's initial domain theory is sufficient to explain any observed training example, then it is also sufficient to predict the example's classification in advance.
KNOWLEDGE COMPILATION
• In its pure form, EBL involves reformulating the domain theory to produce general rules that classify examples in a single inference step.
• This kind of knowledge reformulation is sometimes referred to as knowledge compilation, indicating that the transformation is an efficiency-improving one that does not alter the correctness of the system's knowledge.
1. DISCOVERING NEW FEATURES
• One interesting capability of PROLOG-EBG is its ability to formulate new features that are not explicit in the description of the training examples, but that are needed to describe the general rule underlying the training example.
• In particular, the learned rule asserts that the essential constraint on the Volume and Density of x is that their product is less than 5.
• In fact, the training examples contain no description of such a product, or of the value it should take on. Instead, this constraint is formulated automatically by the learner.
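The derived product feature can be made concrete as follows. The function names are illustrative; the values 2 and 0.3 are the Volume and Density from the rule shown earlier in these slides:

```python
# The feature Volume * Density is not present in the raw instance
# description; it emerges from regressing the target concept
# through the domain theory.

def derived_weight(instance):
    """The product feature introduced by the explanation."""
    return instance["volume"] * instance["density"]

def rule_fires(instance, threshold=5):
    """Learned constraint: Volume(x) * Density(x) < 5."""
    return derived_weight(instance) < threshold

example = {"volume": 2, "density": 0.3}
print(derived_weight(example))  # 0.6
print(rule_fires(example))      # True
```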
COMPARISON WITH NN
EBG:
• The feature is one of a very large set of potential features that can be computed from the available instance attributes.
• PROLOG-EBG automatically formulates such features in its attempt to fit the training data.
• PROLOG-EBG employs an analytical process to derive new features based on the analysis of single training examples.

NN:
• The "feature" is similar in kind to the types of features represented by the hidden units of neural networks.
• Backpropagation fits the features based on training data.
• A statistical process derives hidden unit features in neural networks from many training examples.
2. DEDUCTIVE LEARNING
• PROLOG-EBG is a deductive, rather than inductive, learning process.
• That is, by calculating the weakest preimage of the explanation, it produces a hypothesis h that follows deductively from the domain theory B, while covering the training data D.
• To be more precise, PROLOG-EBG outputs a hypothesis h that satisfies two constraints: h is consistent with the training examples, and h follows deductively from the training data and the domain theory.
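In logical notation (following the standard textbook formulation of PROLOG-EBG; the slide's original equations did not survive extraction, so this reconstruction is an assumption), the two constraints can be written:

```latex
% h is consistent with the training examples:
(\forall \langle x_i, f(x_i)\rangle \in D)\;\; (h \wedge x_i) \vdash f(x_i)
% h follows deductively from the training data and domain theory:
D \wedge B \vdash h
```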
• The earlier equation states the type of knowledge that is required by PROLOG-EBG for its domain theory.
• In particular, PROLOG-EBG assumes the domain theory B entails the classifications of the instances in the training data: for every training example <xi, f(xi)> in D, B together with xi entails f(xi).
• This constraint on the domain theory B assures that an explanation can be constructed for each positive example.
ILP VS EBG
Inductive Logic Programming:
• It is an inductive learning task.
• Background knowledge B' is provided to the learner.
• The background knowledge does not typically satisfy the constraint that it entail the training classifications.
• ILP uses its background knowledge B' to enlarge the set of hypotheses to be considered.
• ILP systems output a hypothesis h that satisfies the constraint that B', h, and xi together entail f(xi) for every training example.

PROLOG-EBG (explanation-based learning):
• EBL is a deductive learning task.
• A domain theory B is provided to the learner.
• The domain theory satisfies the classification constraint.
• PROLOG-EBG uses its domain theory B to reduce the set of acceptable hypotheses.
• PROLOG-EBG outputs a hypothesis h that satisfies two constraints: consistency with the training data, and deductive entailment by the training data and the domain theory.
3.INDUCTIVE BIAS IN EXPLANATION-
BASED LEARNING
• The inductive bias of a learning algorithm is a set of assertions that, together with the training examples, deductively entail subsequent predictions made by the learner.
• The importance of inductive bias is that it characterizes how the learner generalizes beyond the observed training examples.
• In PROLOG-EBG, the output hypothesis h follows deductively from D ^ B.
• Therefore, the domain theory B is a set of assertions which, together with the training examples, entail the output hypothesis.
• PROLOG-EBG employs a sequential covering algorithm that continues to formulate additional Horn clauses until all positive training examples have been covered.
• Approximate inductive bias of PROLOG-EBG: the domain theory B, plus a preference for small sets of maximally general Horn clauses.
• In most learners, the inductive bias is a fixed property of the learning algorithm, typically determined by the syntax of its hypothesis representation.
• Therefore, any attempt to develop a general-purpose learning method must at minimum allow the inductive bias to vary with the learning problem at hand.
• On a more practical level, in many tasks it is quite natural to input domain-specific knowledge (e.g., the knowledge about Weight in the SafeToStack example) to influence how the learner will generalize beyond the training data.
KNOWLEDGE LEVEL LEARNING (HYPOTHESES
THAT ARE NOT ENTAILED BY THE DOMAIN
THEORY ALONE)
• LEMMA-ENUMERATOR is an algorithm that simply enumerates all proof trees that conclude the target concept based on assertions in the domain theory B.
• For each such proof tree, LEMMA-ENUMERATOR calculates the weakest preimage and constructs a Horn clause, in the same fashion as PROLOG-EBG.
• The only difference between LEMMA-ENUMERATOR and PROLOG-EBG is that LEMMA-ENUMERATOR ignores the training data and enumerates all proof trees.
• Example: "If Ross likes to play tennis when the humidity is x, then he will also like to play tennis when the humidity is lower than x."
• This domain theory does not entail any conclusions regarding which instances are positive or negative instances of PlayTennis.
• The phrase knowledge-level learning is sometimes used to refer to this type of learning, in which the learned hypothesis entails predictions that go beyond those entailed by the domain theory.
• The set of all predictions entailed by a set of assertions Y is often called the deductive closure of Y.
• The key distinction here is that in knowledge-level learning the deductive closure of B is a proper subset of the deductive closure of B + h.
• A second example of knowledge-level analytical learning is provided by considering a type of assertion known as a determination.
• Determinations assert that some attribute of the instance is fully determined by certain other attributes, without specifying the exact nature of the dependence.
• Consider the target concept "people who speak Portuguese," and imagine we are given as a domain theory the single determination assertion "the language spoken by a person is determined by their nationality."
• Taken alone, this domain theory does not enable us to classify any instances as positive or negative.
• However, if we observe that "Joe, a 23-year-old left-handed Brazilian, speaks Portuguese," then we can conclude from this positive example and the domain theory that "all Brazilians speak Portuguese."
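Determination-based generalization can be sketched as follows: a single observed example, plus the assertion that one attribute determines another, licenses a rule for every instance sharing the determining value. The function names and data encoding are illustrative assumptions:

```python
# If attribute `determiner` fully determines attribute `determined`,
# one observed example licenses a general rule.

def generalize_from_determination(example, determiner, determined):
    """Return a rule mapping the example's determiner value to its
    determined value, justified by the determination assertion."""
    key, value = example[determiner], example[determined]
    def rule(instance):
        if instance.get(determiner) == key:
            return value
        return None  # the determination says nothing about other values
    return rule

joe = {"name": "Joe", "age": 23, "handedness": "left",
       "nationality": "Brazilian", "language": "Portuguese"}
speaks = generalize_from_determination(joe, "nationality", "language")

print(speaks({"name": "Maria", "nationality": "Brazilian"}))  # Portuguese
print(speaks({"name": "Yves", "nationality": "French"}))      # None
```

Note that Joe's age and handedness play no role in the learned rule: the determination tells the learner in advance that only nationality is relevant.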
EXPLANATION-BASED LEARNING OF
SEARCH CONTROL KNOWLEDGE
• Exactly how should we formulate the problem of learning search control so that we can apply explanation-based learning?
• One system that employs explanation-based learning to improve its search is PRODIGY (Carbonell et al., 1990).
• PRODIGY is a domain-independent planning system that accepts the definition of a problem domain in terms of the state space S and operators O.
• It then solves problems of the form "find a sequence of operators that leads from initial state si to a state that satisfies goal predicate G."
• PRODIGY uses a means-ends planner that decomposes problems into subgoals, solves them, then combines their solutions into a solution for the full problem.
• Thus, during its search for problem solutions, PRODIGY repeatedly faces questions such as "Which subgoal should be solved next?" and "Which operator should be considered for solving this subgoal?"
• Minton (1988) describes the integration of explanation-based learning into PRODIGY by defining a set of target concepts appropriate for the kinds of control decisions it repeatedly confronts.
• For example, one target concept is "the set of states in which subgoal A should be solved before subgoal B."
• SOAR supports a broad variety of problem-solving strategies that subsumes PRODIGY's means-ends planning strategy.
• SOAR learns by explaining situations in which its current search strategy leads to inefficiencies.
• When it encounters a search choice for which it does not have a definite answer (e.g., which operator to apply next), SOAR reflects on this search impasse, using weak methods such as generate-and-test to determine the correct course of action.
• The reasoning used to resolve this impasse can be interpreted as an explanation for how to resolve similar impasses in the future.
• SOAR uses a variant of explanation-based learning called chunking to extract the general conditions under which the same explanation applies.
• SOAR has been applied in a great number of problem domains and has also been proposed as a psychologically plausible model of human learning processes.