
LEARNING SETS OF RULES
• One of the most expressive and human-readable representations for learned hypotheses is a set of if-then rules. This chapter explores several algorithms for learning such sets of rules. One important special case involves learning sets of rules containing variables, called first-order Horn clauses. Because sets of first-order Horn clauses can be interpreted as programs in the logic programming language PROLOG, learning them is often called inductive logic programming (ILP).
INTRODUCTION

• It is useful to learn the target function represented as a set of if-then rules that jointly define the function.
• One way to learn sets of rules is to first learn a decision tree, then translate the tree into an equivalent set of rules: one rule for each leaf node in the tree.
• Another is to use a genetic algorithm that encodes each rule set as a bit string and uses genetic search operators to explore this hypothesis space.
• Here we explore a variety of algorithms that directly learn rule sets.
• First-order logic (FOL), also known as predicate logic or
first-order predicate calculus, is a powerful framework
used in various fields such as mathematics, philosophy,
linguistics, and computer science. In artificial intelligence
(AI), FOL plays a crucial role in knowledge representation,
automated reasoning, and natural language processing.
• First-order logic extends propositional logic by incorporating quantifiers and predicates, allowing for more expressive statements about the world.
The key components of FOL
• Constants: Constants represent specific objects within the domain of discourse. For
example, in a given domain, Alice, 2, and NewYork could be constants.
• Variables: Variables stand for unspecified objects in the domain. Commonly used symbols
for variables include x, y, and z.
• Predicates: Predicates are functions that return true or false, representing properties of
objects or relationships between them. For example, Likes(Alice, Bob) indicates that Alice
likes Bob, and GreaterThan(x, 2) means that x is greater than 2.
• Functions: Functions map objects to other objects. For instance, MotherOf(x) might
denote the mother of x.
• Quantifiers: Quantifiers specify the scope of variables. The two main quantifiers are:
• Universal Quantifier (∀): Indicates that a predicate applies to all elements in the domain.
• For example, ∀x (Person(x) → Mortal(x)) means "All persons are mortal."
• Existential Quantifier (∃): Indicates that there is at least one element in the domain for which the
predicate holds.
• For example, ∃x (Person(x) ∧ Likes(x, IceCream)) means "There exists a person who likes ice cream."

• Logical Connectives: Logical connectives include conjunction (∧), disjunction (∨), implication (→), biconditional (↔), and negation (¬). These connectives are used to form complex logical statements.
As an example of first-order rule sets, consider the following two rules that jointly describe the target concept Ancestor. Here we use the predicate Parent(x, y) to indicate that y is the mother or father of x, and the predicate Ancestor(x, y) to indicate that y is an ancestor of x related by an arbitrary number of family generations.
• IF Parent(x, y) THEN Ancestor(x, y)
• IF Parent(x, z) ∧ Ancestor(z, y) THEN Ancestor(x, y)
SEQUENTIAL COVERING ALGORITHMS
• The basic idea: learn one rule, remove the data it covers, then iterate this process.
• Such algorithms are called sequential covering algorithms.
• We have a subroutine LEARN-ONE-RULE that accepts a set of positive and negative training examples as input, then outputs a single rule that covers many of the positive examples and few of the negative examples.
• We require that this output rule have high accuracy, but not necessarily high coverage.
• One obvious approach to learning a set of rules is to invoke LEARN-ONE-RULE on all the available training examples, remove any positive examples covered by the rule it learns, then invoke it again to learn a second rule based on the remaining training examples.
Contd..
• This procedure can be iterated as many times as desired
to learn a disjunctive set of rules.
• This is called a sequential covering algorithm because it
sequentially learns a set of rules that together cover the
full set of positive examples.
• The final set of rules can then be sorted so that more
accurate rules will be considered first when a new
instance must be classified.
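A minimal sketch of this loop in Python. The learn_one_rule subroutine, the dict-based example encoding, the "Yes" label, and the threshold are illustrative assumptions, not part of the original algorithm statement:

```python
def covers(preconditions, example):
    """True if the example satisfies every attribute test in the rule."""
    return all(example.get(attr) == value
               for attr, value in preconditions.items())

def sequential_covering(examples, target, learn_one_rule, threshold=0.0):
    """Sketch of SEQUENTIAL-COVERING: repeatedly call LEARN-ONE-RULE,
    remove the positive examples the new rule covers, and iterate.
    learn_one_rule(examples, target) is assumed to return a pair
    (preconditions, performance), where preconditions is a dict
    mapping attribute -> required value."""
    learned_rules = []
    while examples:
        preconditions, performance = learn_one_rule(examples, target)
        if performance <= threshold:
            break
        learned_rules.append(preconditions)
        # Remove the positive examples covered by the newly learned rule.
        examples = [e for e in examples
                    if not (covers(preconditions, e) and e[target] == "Yes")]
    return learned_rules
```

As the slide notes, the returned rules can afterwards be sorted by accuracy so that more accurate rules are tried first when classifying a new instance.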
Contd..
• It reduces the problem of learning a disjunctive set of
rules to a sequence of simpler problems.
• Because it performs a greedy search, formulating a sequence of rules without backtracking, it is not guaranteed to find the smallest or best set of rules that covers the training examples.
• How shall we design LEARN-ONE-RULE to meet the
needs of the sequential covering algorithm?
General to Specific Beam Search

• One effective approach to implementing LEARN-ONE-RULE is to organize the hypothesis space search in the same general fashion as the ID3 algorithm.
• The search begins by considering the most general rule precondition possible (the empty test that matches every instance).
• It then greedily adds the attribute test that most improves rule performance measured over the training examples.
• Once this test has been added, the process is repeated by greedily adding a second attribute test, and so on.
Contd..
• This process grows the hypothesis by greedily adding new attribute tests until the hypothesis reaches an acceptable level of performance.
• It follows only a single descendant at each search step: the attribute-value pair yielding the best performance.
• As with any greedy search, there is a danger that a suboptimal choice will be made at some step.
• To reduce this risk, we can extend the algorithm to perform a beam search.
Day Outlook Temperature Humidity Wind Play Tennis
1 Sunny Hot High Weak No
2 Sunny Hot High Strong No
3 Overcast Hot High Weak Yes
4 Rain Mild High Weak Yes
5 Rain Cool Normal Weak Yes
6 Rain Cool Normal Strong No
7 Overcast Cool Normal Strong Yes
8 Sunny Mild High Weak No
9 Sunny Cool Normal Weak Yes
10 Rain Mild Normal Weak Yes
11 Sunny Mild Normal Strong Yes
12 Overcast Mild High Strong Yes
13 Overcast Hot Normal Weak Yes
14 Rain Mild High Strong No
Contd..

• The algorithm maintains a list of the k best candidates at each step, rather than a single best candidate.
• On each search step, descendants are generated for each of these k best candidates, and the resulting set is again reduced to the k most promising members.
• Beam search keeps track of the most promising alternatives to the current top-rated hypothesis.
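A compact Python sketch of this beam search, using the (negated) entropy of the covered examples as the PERFORMANCE measure discussed in the remarks below. The function and parameter names and the dict-based example encoding are assumptions made for illustration:

```python
import math

def entropy(examples, target):
    """Entropy of the target attribute over a set of covered examples."""
    n = len(examples)
    counts = {}
    for e in examples:
        counts[e[target]] = counts.get(e[target], 0) + 1
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def learn_one_rule(examples, target, attribute_values, k=5):
    """General-to-specific beam search sketch of LEARN-ONE-RULE.
    Preconditions are dicts of attribute -> value tests; the search starts
    from the empty (most general) precondition and keeps the k best
    candidates at every step.  attribute_values maps each attribute name
    to the list of values it can take."""
    def covered(preconds):
        return [e for e in examples
                if all(e[a] == v for a, v in preconds.items())]

    def performance(preconds):
        cov = covered(preconds)
        return -entropy(cov, target) if cov else float("-inf")

    best, beam = {}, [{}]               # start with the most general rule
    while beam:
        candidates = []
        for preconds in beam:
            for attr, values in attribute_values.items():
                if attr in preconds:
                    continue            # each attribute is tested at most once
                for v in values:
                    candidates.append({**preconds, attr: v})
        candidates = [c for c in candidates if covered(c)]
        candidates.sort(key=performance, reverse=True)
        beam = candidates[:k]           # keep only the k most promising members
        if beam and performance(beam[0]) > performance(best):
            best = beam[0]
    return best
```

The returned preconditions would then be paired with the most frequent target value among the examples they cover to form the complete if-then rule.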
Contd..
• Remarks:
• Each hypothesis considered in the main loop of the algorithm is a conjunction of attribute-
value constraints.
• Each of these conjunctive hypotheses corresponds to a candidate set of preconditions for the
rule to be learned and is evaluated by the entropy of the examples it covers.
• The search considers increasingly specific candidate hypotheses until it reaches a maximally
specific hypothesis that contains all available attributes.
• The rule that is output by the algorithm is the rule encountered during the search whose
PERFORMANCE is greatest.
• Some common evaluation functions:
• Relative frequency: Let n denote the number of examples the rule matches and let nc denote the number of these that it classifies correctly.
• m-estimate of accuracy: It is often preferred when data is scarce and the rule must be evaluated based on few examples.
Contd..
• Entropy: Entropy measures the uniformity of the target function values for the set of examples the rule covers.
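The formulas behind these three measures did not survive the slide export; restated here in their standard form, with $p$ the prior probability of the class the rule predicts, $m$ a weight on that prior, $S$ the set of examples matching the rule preconditions, $c$ the number of distinct target values, and $p_i$ the fraction of examples in $S$ taking the $i$-th value:

$$\text{relative frequency} = \frac{n_c}{n}, \qquad \text{m-estimate} = \frac{n_c + m\,p}{n + m}, \qquad -\mathrm{Entropy}(S) = \sum_{i=1}^{c} p_i \log_2 p_i .$$

The entropy is used negated so that, like the other two measures, larger values indicate better rules.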
LEARNING FIRST-ORDER RULES
• We consider here learning rules that contain variables, in particular learning first-order Horn theories.
• Our motivation for considering such rules is that they are much more expressive than propositional rules.
• Inductive learning of first-order rules or theories is often referred to as inductive logic programming (ILP).
First-Order Horn Clauses
• Consider the task of learning the simple target concept Daughter(x, y), defined over pairs of people x and y.
• The value of Daughter(x, y) is True when x is the
daughter of y, and False otherwise.
• Each person in the data is described by the attributes
Name, Mother, Father, Male, Female.
• Each training example will consist of the description of
two people in terms of these attributes, along with the
value of the target attribute Daughter.
• The following is a positive example in which Sharon is the daughter: the two people are described by their attribute values, and the target attribute Daughter1,2 is True.
• Suppose we were to collect a number of such training examples for the target concept Daughter1,2 and provide them to a propositional rule learner such as CN2 or C4.5.
Contd..
• The result will be as follows:
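As an illustration of the kind of rule such a learner produces (the specific attribute tests below are a plausible reconstruction of the slide's example, not a quotation of it):

IF (Father1 = Bob) ∧ (Name2 = Bob) ∧ (Female1 = True) THEN Daughter1,2 = True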

• This rule is so specific that it will rarely, if ever, be useful in classifying future pairs of people.
• The problem is that propositional representations offer no general way to describe the essential relations among the values of the attributes.
Contd..
• A program using first-order representations could learn the following general rule:
• IF Father(x, y) ∧ Female(x) THEN Daughter(x, y)
• Here x and y are variables that can be bound to any person.
• First-order Horn clauses may also refer to variables in the preconditions that do not occur in the postconditions.
Contd..
• Ex: IF Father(y, z) ∧ Father(z, x) ∧ Female(y) THEN GrandDaughter(x, y), where the variable z occurs only in the preconditions.
• Such variables, which occur only in the preconditions, are assumed to be existentially quantified.
• We may also use the same predicates in the rule postconditions and preconditions, enabling the description of recursive rules.
Terminology
• All expressions are composed of constants (e.g., Bob,
Louise), variables (e.g., x, y), predicate symbols (e.g.,
Married, Greater-Than), and function symbols (e.g.,
age).
• We will use lowercase symbols for variables and
capitalized symbols for constants.
• A term is any constant, any variable, or any function
applied to any term (e.g., Bob, x, age(Bob)).
• A literal is any predicate or its negation applied to any term (e.g., Married(Bob, Louise), ¬Greater-Than(age(Sue), 20)).
• A clause is any disjunction of literals, where all variables are assumed to be universally quantified.
• A Horn clause is a clause containing at most one positive literal, such as
H ∨ ¬L1 ∨ ¬L2 ∨ ... ∨ ¬Ln
• The above Horn clause can alternatively be written in the form
H ← (L1 ∧ L2 ∧ ... ∧ Ln)
• This is equivalent to the following, using our earlier rule notation:
IF L1 ∧ L2 ∧ ... ∧ Ln, THEN H
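A minimal sketch of how this terminology could be represented in code; the class names and the example rule at the end are illustrative choices, not definitions from the slides:

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class Term:
    """A constant (Bob), a variable (x), or a function applied to terms (age(Bob))."""
    name: str
    args: Tuple["Term", ...] = ()   # non-empty only for function terms

@dataclass(frozen=True)
class Literal:
    """A predicate (or its negation) applied to terms."""
    predicate: str
    args: Tuple[Term, ...]
    negated: bool = False

@dataclass(frozen=True)
class HornClause:
    """At most one positive literal (the head); read as IF body THEN head."""
    head: Literal
    body: Tuple[Literal, ...]

# Example: IF Father(x, y) ∧ Female(x) THEN Daughter(x, y)
x, y = Term("x"), Term("y")
daughter_rule = HornClause(
    head=Literal("Daughter", (x, y)),
    body=(Literal("Father", (x, y)), Literal("Female", (x,))),
)
```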
FOIL
• It employs an approach very similar to the SEQUENTIAL-COVERING and LEARN-ONE-RULE algorithms of the previous section.
• The hypotheses learned by FOIL are sets of first-order rules, where each rule is similar to a Horn clause, with two exceptions.
• First, the rules learned by FOIL are more restricted than general Horn clauses: their literals are not permitted to contain function symbols.
• Second, FOIL rules are more expressive than Horn clauses, because the literals appearing in the body of a rule may be negated.
• FOIL has been applied to a variety of problem domains.
Contd..
• The differences are:
• In its general-to-specific search to learn each new rule, FOIL employs different detailed steps to generate candidate specializations of the rule.
• FOIL employs a PERFORMANCE measure, Foil-Gain, that differs from the entropy measure shown for LEARN-ONE-RULE.
Contd..
• Generating Candidate Specializations in FOIL
• FOIL generates a variety of new literals, each of which may be individually added to the rule preconditions.
• Suppose the current rule is P(x1, x2, ..., xk) ← L1 ∧ ... ∧ Ln. FOIL generates candidate specializations of this rule by considering new literals Ln+1 that fit one of the following forms:
• Q(v1, ..., vr), where Q is any predicate name and the vi are either new variables or variables already present in the rule. At least one of the vi in the created literal must already exist as a variable in the rule.
Contd..
• Equal(xj, xk), where xj and xk are variables already present in the rule.
• The negation of either of the above forms of literals.
• Ex: consider learning rules to predict the target literal GrandDaughter(x, y), where the other predicates used to describe examples are Father and Female.
Contd..
• The search begins with the most general rule, GrandDaughter(x, y) ← , whose preconditions are empty.
• To specialize this initial rule, the above procedure generates the following literals as candidate additions to the rule preconditions: Equal(x, y), Female(x), Female(y), Father(x, y), Father(y, x), Father(x, z), Father(z, x), Father(y, z), Father(z, y), and the negations of each of these literals (e.g., ¬Equal(x, y)).
• Among the above literals, FOIL greedily selects Father(y, z) as the most promising, leading to the more specific rule
GrandDaughter(x, y) ← Father(y, z)
Contd..
• FOIL will now consider all of the literals mentioned in the previous step, plus the additional literals Female(z), Equal(z, x), Equal(z, y), Father(z, w), Father(w, z), and their negations.
• If FOIL at this point were to select the literal Father(z, x) and on the next iteration select the literal Female(y), this would lead to the following rule:
GrandDaughter(x, y) ← Father(y, z) ∧ Father(z, x) ∧ Female(y)
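A rough Python sketch of this candidate-generation step. The function name, the single fresh variable "z", and the arity map are illustrative assumptions; real FOIL also prunes some redundant candidates that this sketch over-generates, such as Father(x, x):

```python
from itertools import product

def candidate_literals(rule_vars, predicates, new_vars=("z",)):
    """Propose literals Q(v1, ..., vr) built from the known predicates,
    requiring at least one argument to be a variable already in the rule,
    plus Equal(xj, xk) over existing variables and the negation of every
    candidate.  predicates maps predicate name -> arity."""
    pool = list(rule_vars) + list(new_vars)
    candidates = []
    for pred, arity in predicates.items():
        for args in product(pool, repeat=arity):
            if any(a in rule_vars for a in args):   # must reuse an existing variable
                candidates.append((pred, args, False))
    for a, b in product(rule_vars, repeat=2):        # Equal over existing variables
        if a < b:
            candidates.append(("Equal", (a, b), False))
    candidates += [(p, args, True) for (p, args, _) in candidates]  # negated forms
    return candidates

# candidate_literals(["x", "y"], {"Father": 2, "Female": 1}) includes
# ("Female", ("x",), False), ("Father", ("y", "z"), False),
# ("Equal", ("x", "y"), False), ... and a negated copy of each.
```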
Guiding the Search in FOIL
• To select the most promising literal from the candidates
generated at each step, FOIL considers the performance
of the rule over the training data.
• To illustrate this process, consider again the example in
which we seek to learn a set of rules for the target literal
GrandDaughter(x, y).
• Assume the training data includes the following simple set of assertions:
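The assertions themselves did not survive the export; the set used in the standard version of this example (an assumption here) is GrandDaughter(Victor, Sharon), Father(Sharon, Bob), Father(Tom, Bob), Female(Sharon), Father(Bob, Victor), together with the closed-world assumption that any literal involving these predicates and constants that is not listed is false.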
Contd..
• To select the best specialization of the current rule, FOIL considers each distinct way in which the rule variables can bind to constants in the training examples.
• The rule variables x and y are not constrained by any preconditions and may therefore bind in any combination to the four constants Victor, Sharon, Bob, and Tom.
Contd..
• We will use the notation {x/Bob, y/Sharon} to denote a particular variable binding; that is, a substitution mapping each variable to a constant.
• Given the four possible constants, there are 16 possible variable bindings for this initial rule.
• The binding {x/Victor, y/Sharon} corresponds to a positive example binding, because the training data includes the assertion GrandDaughter(Victor, Sharon).
• The other 15 bindings allowed by the rule (e.g., the binding {x/Bob, y/Tom}) constitute negative evidence.
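A small Python sketch of this binding enumeration for the initial rule GrandDaughter(x, y) ← ; the fact set is the assumed one from the note above:

```python
from itertools import product

constants = ["Victor", "Sharon", "Bob", "Tom"]
facts = {("GrandDaughter", ("Victor", "Sharon"))}   # the only positive assertion needed here

# All variable bindings for the initial rule GrandDaughter(x, y) <- .
bindings = [{"x": x, "y": y} for x, y in product(constants, repeat=2)]

positive = [b for b in bindings
            if ("GrandDaughter", (b["x"], b["y"])) in facts]
negative = [b for b in bindings if b not in positive]

print(len(bindings), len(positive), len(negative))   # 16 bindings: 1 positive, 15 negative
```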
Contd..
• At each stage, the rule is evaluated based on these sets of positive and negative variable bindings, with preference given to rules that possess more positive bindings and fewer negative bindings.
• As new literals are added to the rule, the sets of bindings will change.
• If a literal is added that introduces a new variable, then the bindings for the rule will grow in length (e.g., if Father(y, z) is added to the above rule, then the original binding {x/Victor, y/Sharon} is extended to {x/Victor, y/Sharon, z/Bob}).
Contd..
• The evaluation function used by FOIL to estimate the
utility of adding a new literal is based on the numbers of
positive and negative bindings covered before and after
adding the new literal.
• More precisely, consider some rule R, and a candidate
literal L that might be added to the body of R. Let R' be
the rule created by adding literal L to rule R.
• The value Foil-Gain(L, R) of adding L to R is defined as
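The defining equation did not render in this export; restated in its standard form, with the quantities defined on the next slide:

$$\mathrm{Foil\text{-}Gain}(L, R) \equiv t \left( \log_2 \frac{p_1}{p_1 + n_1} - \log_2 \frac{p_0}{p_0 + n_0} \right)$$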
Contd..

• p0 is the number of positive bindings of rule R,
• n0 is the number of negative bindings of R,
• p1 is the number of positive bindings of rule R', and
• n1 is the number of negative bindings of R'.
• Finally, t is the number of positive bindings of rule R that are still covered after adding literal L to R.
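A one-function sketch of this computation in Python; the example numbers in the comment are illustrative only, not taken from the GrandDaughter data:

```python
import math

def foil_gain(p0, n0, p1, n1, t):
    """Foil-Gain(L, R) as defined above: t positive bindings of R are
    still covered after adding literal L, giving rule R'."""
    return t * (math.log2(p1 / (p1 + n1)) - math.log2(p0 / (p0 + n0)))

# e.g. foil_gain(p0=1, n0=15, p1=1, n1=2, t=1) is about 2.415
```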
Learning Recursive Rule Sets
• If we include the target predicate in the input list of Predicates, then FOIL will consider it as well when generating candidate literals.
• This allows it to form recursive rules: rules that use the same predicate in both the body and the head of the rule.
• IF Parent(x, y) THEN Ancestor(x, y)
• IF Parent(x, z) ∧ Ancestor(z, y) THEN Ancestor(x, y)
Contd..
• The second rule is among the rules that are potentially within reach of FOIL's search, provided Ancestor is included in the list Predicates that determines which predicates may be considered when generating new literals.
• Of course, whether this particular rule would be learned depends on whether these particular literals outscore competing candidates during FOIL's greedy search for increasingly specific rules.
