ML Unit V

Inductive Learning:

• Inductive learning requires a certain number of training examples to achieve a given level of generalization accuracy.
• It generalizes from observed training examples by identifying features that empirically distinguish positive from negative training examples.
Examples: Decision Tree Learning, Inductive Logic Programming (ILP) or FOIL, Neural Network Learning, Genetic Algorithms, etc.
Disadvantage: these methods perform poorly when insufficient data is available.
Inductive Learning
Given:
• Instance space X
• Hypothesis space H
• Training examples D of some target function f:
  D = {⟨x1, f(x1)⟩, ..., ⟨xn, f(xn)⟩}
Determine: a hypothesis from H consistent with the training examples D.
Analytical Learning
Given:
• Instance space X
• Hypothesis space H
• Training examples D of some target function f:
  D = {⟨x1, f(x1)⟩, ..., ⟨xn, f(xn)⟩}
• Domain theory B for explaining the training examples
Determine: a hypothesis from H consistent with both the training examples D and the domain theory B.
We say B "explains" ⟨x, f(x)⟩ if x ∧ B ⊢ f(x), and B is "consistent with" h if B ⊬ ¬h.
              | Inductive Learning              | Analytical Learning
--------------+---------------------------------+------------------------------
Goal          | Hypothesis fits data            | Hypothesis fits domain theory
Justification | Statistical inference           | Deductive inference
Advantages    | Requires little prior knowledge | Learns from scarce data
Pitfalls      | Scarce data, incorrect bias     | Imperfect domain theory

• They seem very complementary.
• We want something in between!
• Example: if you had to review a medical database and learn "symptoms for which drug X is more effective than Y", you would first look at relevant attributes (temperature, not insurance), then refine using the data.
• Some domain theory (background knowledge) does squeeze into inductive methods, e.g., when choosing an encoding.
Explanation-Based Learning

Explanation-based learning has the ability to learn from a single training instance. Instead of taking more examples, explanation-based learning emphasizes learning from a single, specific example. For example, consider the game of Ludo. In a Ludo game there are generally four colors of buttons, and for each color there are four different squares. Suppose the colors are red, green, blue and yellow; then a maximum of four players is possible for this game. Two players are considered to be on one side (say green and red) and the other two on the opposing side (say blue and yellow), so each player has an opponent to play against. A small square box marked with the symbols one to six (a die) is circulated among the four players; one is the lowest number and six is the highest, and all moves are governed by it. At any instance of play, a player from the 1st side will try to attack a player from the 2nd side and vice versa. In this way the buttons may be attacked and removed one by one, and finally one side wins the game. Since at any time the players of one side can attack the players of the other side, a single move can affect the whole game for a specific player. Hence we can say that explanation-based learning always concentrates on inputs such as a simple learning program, an idea of the goal state, an idea of the usable concepts, and a set of rules that describes the relationships between the objects and the actions.
Explanation-based generalization (EBG) is an algorithm for explanation-based learning, described in Mitchell et al. (1986). It has two steps: first, the explain step, and second, the generalize step. During the first step, the domain theory is used to prune away all the unimportant aspects of the training example with respect to the goal concept. The second step is to generalize the explanation as far as possible while still describing the goal concept. Consider the problem of learning the concept "bucket". We want to generalize from a single example of a bucket. First, collect the following information:

3. Goal: Bucket
   B is a bucket if B is liftable, stable and an open vessel.

4. Description of the concept: expressed in purely structural terms such as deep, flat, rounded, etc.

Given a training example and a functional description, we want to build a general structural description of a bucket. In practice, there are two reasons why explanation-based learning is important.

Initializing the Hypothesis Using Prior Knowledge


KBANN
• The Knowledge-Based Artificial Neural Network (KBANN) algorithm uses prior knowledge to derive a hypothesis from which to begin the search.
• It first constructs an ANN that classifies every instance exactly as the domain theory would.
• So, if B is correct, then we are done!
• Otherwise, we use Backpropagation to train the network.

KBANN Algorithm
KBANN(domainTheory, trainingExamples)

1. For each instance attribute, create a network input.
2. For each Horn clause in domainTheory, create a network unit:
   1. Connect its inputs to the attributes tested by the clause's antecedents.
   2. Give each non-negated antecedent a weight W.
   3. Give each negated antecedent a weight -W.
   4. Set the threshold (bias) weight to -(n - 0.5)W, where n is the number of non-negated antecedents.
3. Make all other connections between layers, giving these very low weights.
4. Apply Backpropagation using trainingExamples.
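To make steps 1-3 concrete, here is a minimal sketch in Python/NumPy of how a propositional domain theory (the Cup theory from the example below) could be compiled into sigmoid units. All names and the choice W = 8 are illustrative assumptions, not part of any KBANN library:

import numpy as np

# A sketch of KBANN's initialization (illustrative names; W = 8 assumed).
# Each Horn clause becomes one sigmoid unit whose weights encode the clause.

W = 8.0
ATTRIBUTES = ["BottomIsFlat", "HasHandle", "Light",
              "HasConcavity", "ConcavityPointsUp"]
CLAUSES = {  # head -> (non-negated antecedents, negated antecedents)
    "Stable":     (["BottomIsFlat"], []),
    "Graspable":  (["HasHandle"], []),
    "Liftable":   (["Graspable", "Light"], []),
    "OpenVessel": (["HasConcavity", "ConcavityPointsUp"], []),
    "Cup":        (["Stable", "Liftable", "OpenVessel"], []),
}

def build_unit(pos, neg, inputs):
    """Step 2: weights and threshold for one clause unit."""
    w = np.full(len(inputs), 0.01)       # step 3: near-zero 'other' links
    for a in pos:
        w[inputs.index(a)] = W           # non-negated antecedent: +W
    for a in neg:
        w[inputs.index(a)] = -W          # negated antecedent: -W
    bias = -(len(pos) - 0.5) * W         # unit fires only when all
    return w, bias                       # antecedents are satisfied

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(instance):
    """Evaluate the clause units in order, deepest clauses first."""
    values = dict(instance)
    for head in ["Stable", "Graspable", "Liftable", "OpenVessel", "Cup"]:
        pos, neg = CLAUSES[head]
        names = sorted(values)           # inputs available to this unit
        w, b = build_unit(pos, neg, names)
        x = np.array([values[n] for n in names])
        values[head] = sigmoid(w @ x + b)
    return values["Cup"]

cup = {a: 1.0 for a in ATTRIBUTES}
print("cup:      ", round(forward(cup), 3))                       # near 1: 'yes'
print("no handle:", round(forward(dict(cup, HasHandle=0.0)), 3))  # near 0: 'no'

Step 4 would then refine these initial weights with Backpropagation on the training examples.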

KBANN Example

• Domain theory

Cup ← Stable, Liftable, OpenVessel


Stable ← BottomIsFlat
Liftable ← Graspable, Light
Graspable ← HasHandle
OpenVessel ← HasConcavity, ConcavityPointsUp

• Training Examples

Attribute | Cups and Non-Cups (an X marks each training instance having that attribute)
BottomIsFlat X X X X X X X X
ConcavityPointsUp X X X X X X X
Expensive X X X X
Fragile X X X X X X
HandleOnTop X X
HandleOnSide X X X
HasConcavity X X X X X X X X X
HasHandle X X X X X
Light X X X X X X X X
MadeOfCeramic X X X X
MadeOfPaper X X
MadeOfStyrofoam X X X X
[Figure: KBANN example network constructed from the domain theory]

[Figure: The network after training]

KBANN Results
• In classifying promoter regions in DNA, Backpropagation had an 8/106 error rate; KBANN had 4/106.
• KBANN typically generalizes more accurately than Backpropagation.

Hypothesis Space Search

• We can view KBANN as a search in H.
• KBANN starts at a better spot.
• As such, it is likely to converge to a hypothesis that generalizes beyond the data in a way similar to the domain theory's predictions.
• On the negative side, KBANN can only deal with propositional domain theories.

TangentProp
• The TangentProp algorithm incorporates prior knowledge into the error criterion minimized by gradient descent.
• Specifically, the prior knowledge takes the form of known derivatives of the target function.

TangentProp Example

• $X$ are images of single handwritten characters.
• The task is to correctly classify these characters.
• B is "the target function is invariant to small rotations of the character in the image". How do we express this mathematically?
• Define a transformation $s(\alpha, x)$ which rotates $x$ by $\alpha$ degrees. Then assert \[ \frac{\partial f(s(\alpha,x_i))}{\partial \alpha} = 0 \]
• Then incorporate these derivatives into the error function. How? Add an additional term to the error function, as shown below.
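One way to write the augmented criterion (reconstructed here in the spirit of Mitchell's presentation, so treat the exact indexing as an assumption): for each known transformation $s_j$ and a designer-chosen constant $\mu$,

\[
E = \sum_i \left[ \big( f(x_i) - \hat{f}(x_i) \big)^2 + \mu \sum_j \left( \left. \frac{\partial f(s_j(\alpha, x_i))}{\partial \alpha} - \frac{\partial \hat{f}(s_j(\alpha, x_i))}{\partial \alpha} \right|_{\alpha = 0} \right)^2 \right]
\]

For the rotation-invariance assertion above, the known derivative $\partial f / \partial \alpha$ is simply 0, so the second term penalizes any sensitivity of the learned $\hat{f}$ to small rotations.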

TangentProp Search

• The value of $\mu$ must be chosen carefully by the designer.
• TangentProp is not robust to errors in the prior knowledge (they throw off Backpropagation).
• TangentProp searches for a (possibly) different hypothesis than plain Backpropagation does; it searches along a different path.

EBNN
• The Explanation-Based Neural Network (EBNN) algorithm extends TangentProp.
• It computes the training derivatives itself.
• The value of $\mu$ is chosen independently for each example.
• It represents the domain theory as a collection of neural networks.
• It then learns the target function as another network.

EBNN Example

• There is one network for each of the Horn clauses in the domain theory.
• EBNN uses the top network to calculate the partial derivative of its prediction with respect to each feature of the instance (i.e., how much does the output change as I tweak BottomIsFlat?); see the sketch after this list.
• These derivatives are given to the bottom network, which is trained with a variation of TangentProp.
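As a rough illustration of the derivative-extraction step (all names assumed; a real EBNN would differentiate the theory network analytically rather than numerically), the sketch below queries a stand-in domain-theory network for the slope of its prediction with respect to each input feature:

import numpy as np

# Illustrative sketch (assumed names). A stand-in 'domain theory' network
# predicts Cup from the instance features; we query the slope of its
# prediction with respect to each feature by central finite differences.
# These slopes are the TangentProp-style derivative targets EBNN gives
# to the target network.

rng = np.random.default_rng(0)
FEATURES = ["BottomIsFlat", "HasHandle", "Light",
            "HasConcavity", "ConcavityPointsUp"]

W1, b1 = rng.normal(size=(4, 5)), rng.normal(size=4)   # fixed stand-in
W2 = rng.normal(size=4)                                #   theory network

def theory_net(x):
    h = np.tanh(W1 @ x + b1)
    return 1.0 / (1.0 + np.exp(-(W2 @ h)))

def feature_slopes(x, eps=1e-4):
    """d(prediction)/d(feature_i), one feature at a time."""
    slopes = np.empty_like(x)
    for i in range(len(x)):
        up, dn = x.copy(), x.copy()
        up[i] += eps
        dn[i] -= eps
        slopes[i] = (theory_net(up) - theory_net(dn)) / (2 * eps)
    return slopes

x = np.ones(5)                       # one training instance
for name, s in zip(FEATURES, feature_slopes(x)):
    print(f"d prediction / d {name}: {s:+.4f}")

In EBNN the weight $\mu$ given to these derivative targets for each example depends on how accurately the theory network predicted that example's actual value $f(x)$.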

EBNN Summary

• EBNN has been shown to generalize more accurately than Backpropagation, especially when training data is scarce.
• It has been used to learn to control a simulated mobile robot.
• EBNN, like Prolog-EBG, constructs explanations, but they are based on a domain theory consisting of neural networks rather than Horn clauses.
• EBNN accommodates imperfect domain theories.
• EBNN learns a fixed-size network, so it might be unable to represent complex functions.

Explanation-Based Learning (EBL)

In simple terms, EBL is the ability to gain basic problem-solving techniques by observing and analyzing solutions to specific problems. In machine-learning terms, it is an algorithm that aims to understand why an example is a member of a particular concept, in order to make generalizations or form concepts from training examples. For example, EBL can use a domain theory to create a program that learns to play chess.
The objective of EBL is to understand the essential properties of a particular concept, so we need to find out what makes an example a member of that concept. Unlike the FOIL algorithm, here we focus on one example instead of collecting multiple examples.
The set of rules that makes it possible to explain single examples is known as the "domain theory".
An EBL system accepts 4 kinds of input:
i) A training example: what the learning model sees in the world.
ii) A goal concept: a high-level description of what the model is supposed to learn.
iii) An operationality criterion: states which terms may appear in the generalized result.
iv) A domain theory: a set of rules that describe relationships between objects and actions in a domain.
From these 4 inputs, EBL uses the domain theory to explain how the training example satisfies the goal concept, while abiding by the operationality criterion and keeping the justification as general as possible.
EBL involves 2 steps:
1. Explanation: the domain theory is used to prune away the unimportant aspects of the training example, retaining the important ones that best describe the goal concept.
2. Generalization: the explanation of the goal concept is made as general and widely applicable as possible, so that all cases are covered, not just certain specific ones. A minimal sketch of both steps appears below.
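As a small propositional sketch of these two steps (illustrative names only; real EBL systems such as Prolog-EBG work with first-order clauses, unification and goal regression), consider the Cup domain theory from the KBANN example:

# A propositional sketch of EBL's explain/generalize steps
# (illustrative names, not a library implementation).

DOMAIN_THEORY = {  # head -> antecedents (Horn clauses)
    "Cup":        ["Stable", "Liftable", "OpenVessel"],
    "Stable":     ["BottomIsFlat"],
    "Liftable":   ["Graspable", "Light"],
    "Graspable":  ["HasHandle"],
    "OpenVessel": ["HasConcavity", "ConcavityPointsUp"],
}

def explain(goal, example):
    """Step 1: prove `goal` from the example's attributes.
    Returns a proof tree, or None if the theory cannot explain it."""
    if goal in example:                      # operational leaf
        return goal
    if goal not in DOMAIN_THEORY:
        return None
    subproofs = [explain(p, example) for p in DOMAIN_THEORY[goal]]
    if any(p is None for p in subproofs):
        return None
    return (goal, subproofs)

def generalize(proof):
    """Step 2: collect the operational leaves of the proof; these
    become the preconditions of the learned rule."""
    if isinstance(proof, str):
        return [proof]
    _, subproofs = proof
    return [leaf for sub in subproofs for leaf in generalize(sub)]

example = {"BottomIsFlat", "HasHandle", "Light", "HasConcavity",
           "ConcavityPointsUp", "MadeOfCeramic"}   # one training instance
proof = explain("Cup", example)
print("Cup <-", ", ".join(generalize(proof)))
# Cup <- BottomIsFlat, HasHandle, Light, HasConcavity, ConcavityPointsUp

The explanation step proves the goal from the single example and, in doing so, discards attributes the proof never touches (MadeOfCeramic); the generalization step keeps exactly the operational leaves of the proof as the learned rule's preconditions.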
EBL Architecture:
• EBL model during training
  During training, the model generalizes the training example in such a way that all scenarios lead to the goal concept, not just specific cases. (As shown in Fig 1)
• EBL model after training
  After training, the EBL model tends to reach the hypothesis space involving the goal concept directly. (As shown in Fig 2)
FOCL Algorithm

The First Order Combined Learner (FOCL) algorithm is an extension of the purely inductive FOIL algorithm.
The goal of FOCL, like FOIL, is to create a rule, in terms of the extensionally defined predicates, that covers all of the positive examples and none of the negative examples. Unlike FOIL, FOCL integrates background knowledge and EBL methods, which leads to a much more efficient search of the hypothesis space that fits the training data. (As shown in Fig 3)

FOCL: Intuition
Like FOIL, FOCL performs an iterative process of learning a set of best rules to cover the training examples, and then removing all the training examples covered by each best rule (using a sequential covering algorithm).
However, what makes the FOCL algorithm more powerful is the approach it adopts while searching for that best rule.
A literal is called operational if it is allowed to appear in the output hypothesis, i.e., it describes the training examples in terms of their primitive attributes. In contrast, literals that occur only as intermediate features in the domain theory, but not as primitive attributes of the instances, are considered non-operational. Non-operational predicates are evaluated in the same manner as operational predicates in FOCL.

Algorithm Involved:

// Inputs:  a Literal to be operationalized,
//          a list of positive examples,
//          a list of negative examples
// Output:  the Literal in operational form

Operationalize(Literal, PositiveExamples, NegativeExamples):
    If Literal is operational:
        Return Literal
    Initialize Operational_Literals to the empty set
    For each clause in the definition of Literal:
        Compute the information gain of the clause over
            PositiveExamples and NegativeExamples
    For the clause with the maximum gain:
        For each literal L in the clause:
            Operational_Literals <-- Operational_Literals ∪
                Operationalize(L, PositiveExamples, NegativeExamples)
    Return Operational_Literals
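The gain computation referenced above is FOIL's information-gain measure; a small sketch of it follows (names assumed; in FOIL proper, t counts positive bindings covered by both the old and new rule, which this propositional simplification takes to be p1):

from math import log2

# FOIL's information gain (sketch; names assumed). p0/n0 are the positive/
# negative examples covered before adding the candidate literals, p1/n1
# after; t is the number of positives that remain covered (here p1).

def foil_gain(p0, n0, p1, n1):
    if p1 == 0:
        return 0.0
    t = p1
    return t * (log2(p1 / (p1 + n1)) - log2(p0 / (p0 + n0)))

# A candidate that narrows coverage from 30+/30- to 25+/5- scores:
print(round(foil_gain(30, 30, 25, 5), 2))   # about 18.42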

Working of the Algorithm

Step 1 – Use the same method as in FOIL: add a single literal for each operational literal not yet part of the hypothesis, so as to create candidates for the best rule.
(solid arrows in Fig. 4 denote specializations of bottle)
Step 2 – Create an operational literal that logically suffices to explain the goal concept according to the domain theory.
(dashed arrows in Fig. 4 denote domain-theory-based specializations of bottle)
Step 3 – Add this set of literals to the current preconditions of the hypothesis.
Step 4 – Remove all those preconditions of the hypothesis that are unnecessary according to the training data.
Let us consider the example shown in Fig 4.
• First, FOCL creates all the candidate literals that could become part of the best rule (all denoted by solid arrows), just as we have already seen in the FOIL algorithm. In addition, it creates several logically relevant candidate literals of its own, using the domain theory.
• Then, it selects one of the literals from the domain theory whose precondition matches the goal concept. If several such literals are present, it selects the one that gives the most information related to the goal concept.
  For example, if the bottle (goal concept) is made of steel (while satisfying the other domain-theory preconditions), then the algorithm will select that literal, as it carries the most relevant information about the goal concept, i.e. the bottle.
• Next, each of these literals is removed unless doing so would affect the classification accuracy over the training examples. This is done so that the domain theory does not over-specialize the result by adding irrelevant literals. The resulting set of literals is then added to the preconditions of the current hypothesis.
• Finally, the one candidate literal that provides the maximum information gain is selected from the two specialization methods (FOIL and the domain theory).
FOCL is a powerful machine learning algorithm that uses EBL and domain-theory techniques to reach a good hypothesis quickly and efficiently. It has shown more accurate results than the purely inductive FOIL algorithm. A study on "Legal Chessboard Positions" showed that on 60 training examples describing 30 legal and 30 illegal endgame board positions, FOIL's accuracy was about 86% while FOCL's was about 94%.
