Machine Learning UNIT-5

Outline

• Two formulations for learning: Inductive and


Analytical
• Perfect domain theories and Prolog-EBG

Copyright Tom Mitchell, 27 October 1994


A Positive Example

[Chess diagram: a board position given as a positive training example]



The Inductive Generalization Problem

Given:
• Instances
• Hypotheses
• Target Concept
• Training examples of target concept

Determine:
• Hypotheses consistent with the training examples



The Analytical Generalization Problem

Given:
• Instances
• Hypotheses
• Target Concept
• Training examples of target concept
• Domain theory for explaining examples
Determine:
• Hypotheses consistent with the training examples
and the domain theory



An Analytical Generalization Problem

Given:
• Instances: pairs of objects
• Hypotheses: sets of Horn clause rules
• Target Concept: Safe-to-stack(x,y)
• Training Example: Safe-to-stack(OBJ1,OBJ2)
On(OBJ1,OBJ2)
Isa(OBJ1,BOX)
Isa(OBJ2,ENDTABLE)
Color(OBJ1,RED)
Color(OBJ2,BLUE)
Volume(OBJ1,.1)
Density(OBJ1,.1)
...
• Domain Theory:
Safe-To-Stack(x,y) :- Not(Fragile(y))
Safe-To-Stack(x,y) :- Lighter(x,y)
Lighter(x,y) :- Weight(x,wx), Weight(y,wy),
Less(wx,wy)
Weight(x,w) :- Volume(x,v), Density(x,d),
Equal(w, v*d)
Weight(x,5) :- Isa(x, ENDTABLE)
...
Determine:
• Hypotheses consistent with training examples and
domain theory



Learning from Perfect Domain Theories

Assumes domain theory is correct (error-free)


• Prolog-EBG is an algorithm that works under this
assumption
• This assumption holds in chess and other search
problems
• Allows us to assume explanation = proof
• Later we’ll discuss methods that assume
approximate domain theories



Prolog EBG

Initialize hypothesis = {}

For each positive training example not covered by hypothesis:


1. Explain how training example satisfies target
concept, in terms of domain theory
2. Analyze the explanation to determine the most
general conditions under which this explanation
(proof) holds
3. Refine the hypothesis by adding a new rule, whose
preconditions are the above conditions, and whose
consequent asserts the target concept
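
The covering loop can be sketched in Python; covers, explain, and weakest_preimage are hypothetical helpers standing in for the three steps above, not code from the original slides:

    # Sketch of the Prolog-EBG covering loop. The helpers covers(),
    # explain(), and weakest_preimage() are assumed, not defined here.
    def prolog_ebg(target_concept, positive_examples, domain_theory):
        hypothesis = []                          # learned Horn clause rules
        for ex in positive_examples:
            if any(covers(rule, ex) for rule in hypothesis):
                continue                         # already covered; skip
            # 1. Explain (prove) the example from the domain theory
            proof = explain(target_concept, ex, domain_theory)
            # 2. Find the most general conditions under which the proof holds
            preconditions = weakest_preimage(proof, target_concept)
            # 3. New rule: head asserts the target, body is the preconditions
            hypothesis.append((target_concept, preconditions))
        return hypothesis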



Explanation of a Training Example

Explanation:

Safe-to-Stack(OBJ1,OBJ2)
  ← Lighter(OBJ1,OBJ2)
      ← Weight(OBJ1,0.6), Less-Than(0.6,5), Weight(OBJ2,5)
          Weight(OBJ1,0.6) ← Volume(OBJ1,2), Density(OBJ1,0.3), Equal(0.6, 2*0.3)
          Weight(OBJ2,5) ← Type(OBJ2,ENDTABLE)

Training Example:

[Semantic network figure: OBJ1 (Type: Box, Material: Cardboard, Color: Red, Owner: Fred, Volume: 2, Density: 0.3) is On OBJ2 (Type: EndTable, Material: Wood, Color: Blue, Owner: Louise, Fragile: Yes)]



Computing the Weakest Preimage of Explanation

Each step shows the instance-level literals from the explanation, followed by the corresponding general frontier (the weakest preimage so far) obtained by regression:

Safe-to-Stack(OBJ1,OBJ2)
    general: Safe-to-Stack(x,y)

Lighter(OBJ1,OBJ2)
    general: Lighter(x,y)

Weight(OBJ1,0.6), Less-Than(0.6,5), Weight(OBJ2,5)
    general: Weight(x,wx), Less-Than(wx,wy), Weight(y,wy)

Volume(OBJ1,2), Density(OBJ1,0.3), Equal(0.6,2*0.3)
    general: Volume(x,vx), Density(x,dx), Equal(wx,vx*dx), Less-Than(wx,wy), Weight(y,wy)

Type(OBJ2,ENDTABLE)
    general: Volume(x,vx), Density(x,dx), Equal(wx,vx*dx), Less-Than(wx,5), Type(y,ENDTABLE)
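
The final general frontier is the weakest preimage of the explanation; by step 3 of Prolog-EBG it becomes the body of the learned rule, written in the same style as the domain theory:

    Safe-To-Stack(x,y) :- Volume(x,vx), Density(x,dx),
                          Equal(wx,vx*dx), Less-Than(wx,5),
                          Type(y,ENDTABLE)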



Regression Algorithm

Regress(Frontier, Rule, Expression, U_IR)

Frontier: the set of expressions to be regressed through Rule
Rule: a Horn clause
Expression: the member of Frontier that is inferred by Rule in the explanation
U_IR: the substitution that unifies Rule with the training example in the explanation

Returns the list of expressions forming the weakest preimage of Frontier with respect to Rule:

let Consequent ← the consequent of Rule
let Antecedents ← the antecedents of Rule
1. U_ER ← the most general unifier of Expression with Consequent
   such that there exists a substitution S for which
   S(U_ER(Consequent)) = U_IR(Consequent)
2. Return U_ER(Frontier - Expression + Antecedents)

Example:
Regress({Volume(x,vx), Density(x,dx), Equal(wx,vx*dx),
         Less-Than(wx,wy), Weight(y,wy)},          (Frontier)
        Weight(z,5) :- Type(z,ENDTABLE),           (Rule)
        Weight(y,wy),                              (Expression)
        {OBJ2/z})                                  (U_IR)

Consequent ← Weight(z,5)
Antecedents ← Type(z,ENDTABLE)
U_ER ← {y/z, 5/wy}  (with S = {OBJ2/y})

Result ← {Volume(x,vx), Density(x,dx), Equal(wx,vx*dx),
          Less-Than(wx,5), Type(y,ENDTABLE)}
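
A runnable sketch of this step in Python, under simplifying assumptions not in the slides: literals are tuples of strings, lowercase names are variables, and the unifier binds rule variables onto frontier variables:

    # Minimal regression step. Literals are tuples such as
    # ('Weight', 'y', 'wy'); lowercase terms are variables.
    def is_var(t):
        return t[0].islower()

    def walk(t, subst):
        while is_var(t) and t in subst:
            t = subst[t]
        return t

    def unify(expr, consequent):
        """Most general unifier of two literals, or None on failure."""
        subst = {}
        if len(expr) != len(consequent):
            return None
        for s, t in zip(expr, consequent):
            s, t = walk(s, subst), walk(t, subst)
            if s == t:
                continue
            if is_var(t):          # bind the rule's variable first,
                subst[t] = s       # so rule vars map onto frontier vars
            elif is_var(s):
                subst[s] = t
            else:
                return None        # two distinct constants clash
        return subst

    def regress(frontier, consequent, antecedents, expression):
        """Return U_ER(Frontier - Expression + Antecedents)."""
        u = unify(expression, consequent)
        sub = lambda lit: tuple(walk(t, u) for t in lit)
        return [sub(l) for l in frontier if l != expression] + \
               [sub(a) for a in antecedents]

    # The example above: regress through Weight(z,5) :- Type(z,ENDTABLE)
    frontier = [('Volume','x','vx'), ('Density','x','dx'),
                ('Equal','wx','vx*dx'), ('Less-Than','wx','wy'),
                ('Weight','y','wy')]
    print(regress(frontier, ('Weight','z','5'),
                  [('Type','z','ENDTABLE')], ('Weight','y','wy')))
    # -> [..., ('Less-Than','wx','5'), ('Type','y','ENDTABLE')]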



Lessons from Safe-to-Stack Example

• Justified generalization from single example


• Explanation determines feature relevance
• Regression determines needed feature constraints
• Generality of result depends on domain theory
• Still requires multiple examples



Perspectives on Prolog-EBG

• Theory-guided generalization from examples


• Example-guided operationalization of theories
• "Just" restating what learner already "knows"

Is it learning?
• Are you learning when you get better over time at
chess?
  • Even though, once you know the rules of the game,
  you already know everything in principle...
• Are you learning when you sit in a mathematics
class?
  • Even though those theorems follow
  deductively from the axioms you’ve already
  learned...



Combining Inductive and Analytical
Learning

[Read Ch. 12]
[Suggested exercises: 12.1, 12.2, 12.6, 12.7, 12.8]

• Why combine inductive and analytical learning?
• KBANN: Prior knowledge to initialize the hypothesis
• TangentProp, EBNN: Prior knowledge alters search objective
• FOCL: Prior knowledge alters search operators
Lecture slides for textbook Machine Learning, © T. Mitchell, McGraw-Hill, 1997
Inductive and Analytical Learning

Inductive learning                  Analytical learning
------------------------------------------------------------------
Hypothesis fits data                Hypothesis fits domain theory
Statistical inference               Deductive inference
Requires little prior knowledge     Learns from scarce data
Syntactic inductive bias            Bias is domain theory

What We Would Like

Inductive learning                  Analytical learning
------------------------------------------------------------------
Plentiful data                      Perfect prior knowledge
No prior knowledge                  Scarce data

General purpose learning method:

• No domain theory → learn as well as inductive methods
• Perfect domain theory → learn as well as Prolog-EBG
• Accommodate arbitrary and unknown errors in domain theory
• Accommodate arbitrary and unknown errors in training data

Domain theory:
  Cup ← Stable, Liftable, OpenVessel
  Stable ← BottomIsFlat
  Liftable ← Graspable, Light
  Graspable ← HasHandle
  OpenVessel ← HasConcavity, ConcavityPointsUp

Training examples:

[Table: cup and non-cup instances, each checked for the features it exhibits: BottomIsFlat, ConcavityPointsUp, Expensive, Fragile, HandleOnTop, HandleOnSide, HasConcavity, HasHandle, Light, MadeOfCeramic, MadeOfPaper, MadeOfStyrofoam]
KBANN

KBANN(data D, domain theory B):

1. Create a feedforward network h equivalent to B
2. Use Backprop to tune h to fit D
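
As a minimal sketch in Python, with the cup theory above as the domain theory; rules_to_network and backprop are hypothetical helpers (the construction behind rules_to_network is detailed two slides below):

    # Hypothetical KBANN driver; helper names are illustrative only.
    rules = [
        ('Cup',        ['Stable', 'Liftable', 'OpenVessel']),
        ('Stable',     ['BottomIsFlat']),
        ('Liftable',   ['Graspable', 'Light']),
        ('Graspable',  ['HasHandle']),
        ('OpenVessel', ['HasConcavity', 'ConcavityPointsUp']),
    ]

    def kbann(data, rules):
        h = rules_to_network(rules)  # step 1: network equivalent to B
        backprop(h, data)            # step 2: tune weights to fit D
        return h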

Neural Net Equivalent to Domain Theory

[Figure: feedforward network with inputs Expensive, BottomIsFlat, MadeOfCeramic, MadeOfStyrofoam, MadeOfPaper, HasHandle, HandleOnTop, HandleOnSide, Light, HasConcavity, ConcavityPointsUp, and Fragile; hidden units Stable, Graspable, Liftable, and OpenVessel; and output unit Cup, connected according to the domain theory]
Creating Network Equivalent to Domain Theory

• Create one unit per Horn clause rule (i.e., an AND unit)
• Connect unit inputs to the corresponding clause antecedents
• For each non-negated antecedent, set the input weight w = W, where W is some constant
• For each negated antecedent, set the input weight w = −W
• Set the threshold weight w0 = −(n − 0.5)W, where n is the number of non-negated antecedents
• Finally, add many additional connections with near-zero weights

Example: Liftable ← Graspable, ¬Heavy
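
A sketch of one such AND unit in Python; the sigmoid activation and the choice W = 4.0 are assumptions for illustration:

    import math

    W = 4.0  # some positive constant

    def make_and_unit(antecedents):
        """Sigmoid unit behaving as the AND of the antecedents.
        antecedents: list of (feature_name, negated) pairs."""
        weights = {name: (-W if neg else W) for name, neg in antecedents}
        n = sum(1 for _, neg in antecedents if not neg)
        w0 = -(n - 0.5) * W              # threshold weight
        def unit(inputs):                # inputs: dict of 0/1 feature values
            net = w0 + sum(w * inputs[name] for name, w in weights.items())
            return 1 / (1 + math.exp(-net))
        return unit

    # Liftable <- Graspable, not Heavy
    liftable = make_and_unit([('Graspable', False), ('Heavy', True)])
    liftable({'Graspable': 1, 'Heavy': 0})   # ~0.88, the unit fires
    liftable({'Graspable': 1, 'Heavy': 1})   # ~0.12, the unit stays off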

Result of refining the network

[Figure: the same cup network after Backprop refinement; links are drawn as large positive weights, large negative weights, or negligible weights]
KBANN Results

Classifying promoter regions in DNA (leave-one-out testing):

• Backpropagation: error rate 8/106
• KBANN: error rate 4/106

Similar improvements on other classification and control tasks.

Hypothesis space search in KBANN

[Figure: hypothesis space, showing the region of hypotheses that fit the training data equally well; the initial hypothesis for KBANN, derived from the domain theory, lies near this region, while the initial hypothesis for BACKPROPAGATION lies elsewhere in the space]

EBNN

Key idea:
• Previously learned approximate domain theory
• Domain theory represented by collection of neural networks
• Learn target function as another neural network

Explanation of training example in terms of domain theory:

[Figure: the domain-theory networks (Stable, Graspable, Liftable, OpenVessel, Cup) applied to the example BottomIsFlat=T, ConcavityPointsUp=T, Expensive=T, Fragile=T, HandleOnTop=F, HandleOnSide=T, HasConcavity=T, HasHandle=T, Light=T, MadeOfCeramic=T, MadeOfPaper=F, MadeOfStyrofoam=F, concluding Cup=T, with intermediate activations such as 0.8 and 0.2; the explanation yields training derivatives]

Target network:

[Figure: a single network mapping the same features directly to Cup, trained on the target values and on the training derivatives extracted from the explanation]
Modified Objective for Gradient Descent

    E = \sum_i \left[ \left( f(x_i) - \hat{f}(x_i) \right)^2
        + \mu_i \sum_j \left( \left. \frac{\partial A(x)}{\partial x^j}
        - \frac{\partial \hat{f}(x)}{\partial x^j} \right|_{x = x_i} \right)^2 \right]

where

    \mu_i \equiv 1 - \frac{ \left| A(x_i) - f(x_i) \right| }{ c }

• f(x) is the target function
• \hat{f}(x) is the neural net approximation to f(x)
• A(x) is the domain theory approximation to f(x)

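A sketch of this objective in Python with NumPy, computing input derivatives by finite differences; f_hat and A are assumed callables, and the whole block is illustrative rather than code from the slides:

    import numpy as np

    def grad(fn, x, eps=1e-4):
        """Finite-difference gradient of scalar-valued fn at point x."""
        g = np.zeros_like(x)
        for j in range(len(x)):
            e = np.zeros_like(x)
            e[j] = eps
            g[j] = (fn(x + e) - fn(x - e)) / (2 * eps)
        return g

    def ebnn_objective(f_hat, A, xs, ys, c=1.0):
        """Value error plus mu-weighted slope error, summed over examples."""
        E = 0.0
        for x_i, y_i in zip(xs, ys):
            value_err = (y_i - f_hat(x_i)) ** 2
            # mu_i downweights the slopes where the domain theory mispredicts
            mu_i = 1.0 - abs(A(x_i) - y_i) / c
            slope_err = np.sum((grad(A, x_i) - grad(f_hat, x_i)) ** 2)
            E += value_err + mu_i * slope_err
        return E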
[Figure: a target function f(x) with training values f(x1), f(x2), f(x3); alternative hypotheses g and h fit the three values equally well, but fitting the slopes as well singles out f]
Hypothesis Space Search in EBNN

[Figure: hypothesis space; TANGENTPROP search moves toward hypotheses that maximize fit to the data and the prior knowledge, while BACKPROPAGATION search moves toward hypotheses that maximize fit to the data alone]

Search in FOCL

[Search tree: candidate specializations of the rule for Cup, each annotated with the positive and negative examples it covers]

Candidate specializations generated by FOIL's operators:
• Cup ← HasHandle  [2+,3-]
• Cup ← Fragile  [2+,4-]
• ...

Candidate generated by operationalizing the domain theory:
• Cup ← BottomIsFlat, Light, HasConcavity, ConcavityPointsUp  [4+,2-]

Its further specializations:
• Cup ← BottomIsFlat, Light, HasConcavity, ConcavityPointsUp, HandleOnTop  [0+,2-]
• Cup ← BottomIsFlat, Light, HasConcavity, ConcavityPointsUp, ¬HandleOnTop  [4+,0-]
• Cup ← BottomIsFlat, Light, HasConcavity, ConcavityPointsUp, HandleOnSide  [2+,0-]
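A sketch of how these candidates might be scored in Python using FOIL's information gain; the coverage counts come from the tree above, while the totals for the empty rule (5 cups, 5 non-cups) are an assumption for illustration:

    import math

    def foil_gain(p0, n0, p1, n1):
        """FOIL information gain: p1 * (I_after - I_before)."""
        if p1 == 0:
            return float('-inf')
        before = math.log2(p0 / (p0 + n0))
        after = math.log2(p1 / (p1 + n1))
        return p1 * (after - before)

    # Candidate bodies and their [p+, n-] coverage, as in the search tree
    candidates = {
        ('HasHandle',): (2, 3),
        ('Fragile',): (2, 4),
        # single macro-step contributed by operationalizing the domain theory
        ('BottomIsFlat', 'Light', 'HasConcavity', 'ConcavityPointsUp'): (4, 2),
    }
    p0, n0 = 5, 5  # assumed coverage of the empty rule
    best = max(candidates, key=lambda b: foil_gain(p0, n0, *candidates[b]))
    # best -> the domain-theory candidate, covering [4+,2-]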
FOCL Results

Recognizing legal chess endgame positions:

• 30 positive, 30 negative examples
• FOIL: 86%
• FOCL: 94% (using domain theory with 76% accuracy)

NYNEX telephone network diagnosis:

• 500 training examples
• FOIL: 90%
• FOCL: 98% (using domain theory with 95% accuracy)

