The document discusses analytical learning in machine learning, focusing on inductive and deductive learning methods, particularly emphasizing explanation-based learning (EBL) and the PROLOG-EBG algorithm. It outlines the importance of domain theory in improving learning performance and details various algorithms like KBANN, TANGENTPROP, and EBNN that integrate prior knowledge with training data. The document also compares inductive and analytical approaches, highlighting their goals, justifications, and the challenges faced in combining both methods for effective learning.
MACHINE LEARNING
UNIT - V
ANALYTICAL LEARNING
• Inductive learning:
– Based on patterns in the data (missing values are assumed/estimated during classification).
– Examples: decision tree learning, neural networks.
– Only training examples are given as input for learning.
• Deductive learning:
– Is not based on patterns in the data.
ANALYTICAL LEARNING (continued)
• Uses prior knowledge and deductive reasoning: reasoning over facts drawn from past data / past experience.
• Also called explanation-based learning: along with the training examples, the learner is given an explanation.
• More effective than inductive learning when data is missing, because of the explanation.
• Example: the game of chess.
– Target concept: "chessboard positions in which black will lose its queen within two moves."
– Prior knowledge: the legal moves of chess.
• In analytical learning, the input to the learner includes the same hypothesis space H and training examples D as for inductive learning, plus a domain theory B consisting of background knowledge that can be used to explain the observed training examples.
• The desired output of the learner is a hypothesis h from H that is consistent with both the training examples D and the domain theory B.

LEARNING WITH PERFECT DOMAIN THEORIES: PROLOG-EBG
• Analytical learning is explanation-based learning; it relies on a domain theory (DT), i.e., knowledge in a specific field (e.g., mathematics or science as domains).
• The domain theory is assumed to be correct and complete:
– Correct: each of its assertions is a truthful statement about the world.
– Complete (with respect to a given target concept and instance space): the domain theory covers every positive example in the instance space.
• Need for a domain theory:
1. Improved performance.
2. Note, however, that a perfect domain theory is difficult to achieve in practice.

PROLOG-EBG
• PROLOG-EBG (Programming in Logic, Explanation-Based Generalization) is a sequential covering algorithm over Horn clauses that considers the training data incrementally.
• For each new positive training example not yet covered by a learned Horn clause, it forms a new Horn clause by:
(1) Explaining the training example.
(2) Analyzing this explanation to determine an appropriate generalization (is the explanation correct, and does it suit our conditions?).
(3) Refining the current hypothesis by adding a new Horn clause rule that covers this positive example as well as other similar instances.
• Illustrative example: SafeToStack.
• PROLOG-EBG computes the most general rule by computing the weakest preimage of the explanation.
• The weakest preimage of a conclusion C with respect to a proof P is the most general set of initial assertions A such that A entails C according to P.
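To make the weakest-preimage computation concrete, here is a minimal Python sketch of one regression step through a Horn clause, assuming function-free literals encoded as tuples, with lowercase strings as variables and capitalized strings as constants; the helper names (is_var, substitute, unify, regress_step) and the SafeToStack rule shown are illustrative conventions of this sketch, not a standard library API.

```python
def is_var(t):
    # Sketch convention: lowercase strings are variables,
    # capitalized strings are constants.
    return isinstance(t, str) and t[:1].islower()

def substitute(literal, theta):
    # Apply substitution theta to a literal's arguments.
    pred, *args = literal
    return (pred, *[theta.get(a, a) for a in args])

def unify(lit1, lit2):
    # Unify two function-free literals; return a substitution or None.
    (p1, *a1), (p2, *a2) = lit1, lit2
    if p1 != p2 or len(a1) != len(a2):
        return None
    theta = {}
    for a, b in zip(a1, a2):
        a, b = theta.get(a, a), theta.get(b, b)
        if a == b:
            continue
        if is_var(a):
            theta[a] = b
        elif is_var(b):
            theta[b] = a
        else:
            return None
    return theta

def regress_step(frontier, head, body):
    # Regress a frontier of literals through one Horn clause (head <- body):
    # the literal that unifies with the head is replaced by the clause body,
    # and the unifying substitution is applied to all remaining literals.
    for lit in frontier:
        theta = unify(lit, head)
        if theta is not None:
            rest = [substitute(l, theta) for l in frontier if l is not lit]
            return rest + [substitute(l, theta) for l in body]
    return frontier  # no rule applies; the frontier is already regressed

# Usage with one rule from the SafeToStack domain theory:
#   SafeToStack(x, y) <- Lighter(x, y)
head = ('SafeToStack', 'x', 'y')
body = [('Lighter', 'x', 'y')]
print(regress_step([('SafeToStack', 'u', 'v')], head, body))
# -> [('Lighter', 'x', 'y')], i.e., Lighter(x, y) up to variable renaming
```

Iterating this step backward over every step of the explanation yields the weakest preconditions of the target concept, as described next.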
• PROLOG-EBG computes the weakest preimage of the target concept with respect to the explanation, using a general procedure called regression.
• The regression procedure operates on a domain theory represented by an arbitrary set of Horn clauses.
• Regression works iteratively backward through the explanation: it first computes the weakest preimage of the target concept with respect to the final proof step in the explanation, then computes the weakest preimage of the resulting expressions with respect to the preceding step, and so on.
• The procedure terminates when it has iterated over all steps in the explanation, yielding the weakest precondition of the target concept with respect to the literals at the leaf nodes of the explanation.
• The heart of the regression procedure is the algorithm that, at each step, regresses the current frontier of expressions through a single Horn clause from the domain theory.

Remarks on EB Learning
• EBL as theory-guided generalization of examples: explanations are used to distinguish relevant from irrelevant features.
• EBL as example-guided reformulation of theories: examples are used to focus on which reformulations to make in order to produce operational concepts.
• EBL as knowledge compilation: explanations that are particularly useful for explaining the training examples are compiled out to improve efficiency.

EBL of Search Control Knowledge
• To find moves toward the goal state, the definitions of the legal search operators provide a correct and complete domain theory for learning search control knowledge.
• A suitable target concept must be chosen; it depends on the internal structure of the problem solver.
• PRODIGY, a domain-independent planning system, accepts the definition of a problem domain in terms of a state space S and operators O. A typical learned control rule:
– If one subgoal to be solved is On(x, y), and one subgoal to be solved is On(y, z),
– Then solve the subgoal On(y, z) before On(x, y).

COMBINING INDUCTIVE AND ANALYTICAL LEARNING
• Inductive methods, such as decision tree induction and neural network BACKPROPAGATION, seek general hypotheses that fit the observed training data.
• Analytical methods, such as PROLOG-EBG, seek general hypotheses that fit prior knowledge while covering the observed data.
• Inductive methods offer statistical justification; analytical methods offer logical justification.
             Inductive Learning                Analytical Learning
Goal         Hypothesis fits data              Hypothesis fits domain theory
Advantages   Requires little prior knowledge   Learns from scarce data
Pitfalls     Scarce data, incorrect bias       Imperfect domain theory

INDUCTIVE-ANALYTICAL APPROACHES TO LEARNING
• The learning problem:
Given:
– A set of training examples D, possibly containing errors.
– A domain theory B, possibly containing errors.
– A space of candidate hypotheses H.
Determine:
– A hypothesis that best fits the training examples and the domain theory.
• There are two approaches to defining "best fits".
• Approach 1: minimize a combined measure of errorD(h) and errorB(h), where:
– errorD(h) is defined to be the proportion of examples from D that are misclassified by h.
– errorB(h) is the probability that h will disagree with B on the classification of a randomly drawn instance.
– We could require the hypothesis that minimizes some combined measure of these errors, e.g., k_D · errorD(h) + k_B · errorB(h).
• However, it is not clear what values to assign to k_B and k_D to specify the relative importance of fitting the data versus fitting the theory:
– Given a poor theory and a great deal of reliable data, it is best to weight errorD(h) more heavily.
– Given a strong theory and a small sample of very noisy data, the best results are obtained by weighting errorB(h) more heavily.
– Note: since the learner does not know in advance the quality of the domain theory or the training data, it is unclear how it should weight these two error components.

Bayes theorem perspective
• Approach 2: Bayes theorem computes the posterior probability of a hypothesis based on the observed data D together with prior knowledge in the form of P(h), P(D), and P(D|h).
• The Bayesian view is that one should simply choose the hypothesis whose posterior probability is greatest, and that Bayes theorem provides the proper method for weighting the contribution of this prior knowledge and the observed data.
• When these quantities are only imperfectly known, Bayes theorem alone does not prescribe how to combine them with the observed data.
• The learning problem is therefore to minimize some combined measure of the error of the hypothesis over the data and the domain theory.
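As a toy illustration of Approach 1, here is a minimal Python sketch of the combined error measure, assuming a hypothesis is any callable classifier, assuming errorB(h) is estimated by Monte-Carlo sampling against the domain theory's classifications, and with illustrative weights k_d and k_b standing in for the k_D and k_B above:

```python
def error_d(h, D):
    # Proportion of labeled examples (x, y) in D misclassified by h.
    return sum(1 for x, y in D if h(x) != y) / len(D)

def error_b(h, domain_theory, sample_instance, n=1000):
    # Monte-Carlo estimate of the probability that h disagrees with the
    # domain theory B on a randomly drawn instance.
    count = 0
    for _ in range(n):
        x = sample_instance()
        if h(x) != domain_theory(x):
            count += 1
    return count / n

def combined_error(h, D, domain_theory, sample_instance, k_d=0.7, k_b=0.3):
    # Weighted combination k_D * errorD(h) + k_B * errorB(h); choosing
    # these weights is precisely the open problem noted above.
    return (k_d * error_d(h, D)
            + k_b * error_b(h, domain_theory, sample_instance))
```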
Hypothesis Space Search
• Prior knowledge can alter the search through the hypothesis space in three ways:
i) Use prior knowledge to derive an initial hypothesis from which to begin the search. Here the domain theory B is used to construct an initial hypothesis h0 that is consistent with B. Example: KBANN uses prior knowledge to design the interconnections and weights of an initial network, so that this initial network is perfectly consistent with the given domain theory. This initial network hypothesis is then refined inductively using the BACKPROPAGATION algorithm and the available data. Starting from a hypothesis consistent with the domain theory makes it more likely that the final output hypothesis will also fit this theory. (A sketch of this rules-to-network translation follows below.)
ii) Use prior knowledge to alter the objective of the hypothesis space search (TANGENTPROP, EBNN).
iii) Use prior knowledge to alter the available search steps (FOCL).
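Here is a minimal sketch of the KBANN-style translation of one propositional Horn clause into initial network weights, under the common convention (an assumption of this sketch) that each positive antecedent gets some large weight W, each negated antecedent gets -W, and the unit's bias is set so the unit activates only when all antecedents are satisfied:

```python
W = 4.0  # assumed "large" weight for antecedent connections

def clause_to_unit(positive_antecedents, negated_antecedents):
    # Build the weights and bias of one sigmoid unit encoding the clause
    #   head <- pos_1 ^ ... ^ pos_n ^ ~neg_1 ^ ... ^ ~neg_m
    weights = {a: W for a in positive_antecedents}
    weights.update({a: -W for a in negated_antecedents})
    # Threshold just below n*W, so the unit fires iff every positive
    # antecedent is true (1) and every negated antecedent is false (0).
    bias = -(len(positive_antecedents) - 0.5) * W
    return weights, bias

# Example clause from the Cup task: Cup <- Stable ^ Liftable ^ OpenVessel
weights, bias = clause_to_unit(['Stable', 'Liftable', 'OpenVessel'], [])
print(weights, bias)  # each antecedent weight 4.0, bias -10.0
```

With all three antecedents active the net input is 12 - 10 = 2 (unit on); with any antecedent missing it is at most 8 - 10 = -2 (unit off), so the initial network reproduces the clause before any inductive refinement.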
KBANN Algorithm
• Learning task: identify Cup.

Limitations of KBANN
• Accommodates only propositional domain theories (collections of variable-free Horn clauses).
• Can be misled by highly inaccurate domain theories: generalization accuracy can deteriorate below the level of the BACKPROPAGATION algorithm.

The TangentProp Algorithm
• TANGENTPROP accommodates domain knowledge expressed as derivatives of the target function with respect to transformations of its inputs.
• The prior knowledge is incorporated into the error criterion minimized by gradient descent, so that the network must fit a combined function of the training data and the domain theory. Example: handwritten character recognition.
• TANGENTPROP trains a neural network to fit both training values and training derivatives:
– Each training example consists of a pair (xi, f(xi)) [instance, training value].
– TANGENTPROP assumes various training derivatives of the target function are also provided. If each instance xi is described by a single real value, each training example becomes a triple (xi, f(xi), ∂f/∂x at xi).
• Comparing the approaches on such data:
1. Plot the instances x against their corresponding training values.
2. The BACKPROPAGATION algorithm produces a smooth interpolation of the training values alone.
3. TANGENTPROP (which uses the 3-tuple training data) also fits the slopes, giving more accurate results than the BACKPROPAGATION algorithm.
• The BACKPROPAGATION algorithm performs gradient descent to attempt to minimize the sum of squared errors

$$E = \sum_i \left( f(x_i) - \hat{f}(x_i) \right)^2$$

where $f(x_i)$ is the true target function value and $\hat{f}$ is the function represented by the learned neural network.
• The modified error function in TANGENTPROP is

$$E = \sum_i \left[ \left( f(x_i) - \hat{f}(x_i) \right)^2 + \mu \sum_j \left( \frac{\partial f(s_j(\alpha, x_i))}{\partial \alpha} - \frac{\partial \hat{f}(s_j(\alpha, x_i))}{\partial \alpha} \right)^2_{\alpha = 0} \right]$$

where each $s_j$ is a transformation of the input (e.g., a rotation or translation) with $s_j(0, x) = x$.
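A minimal numerical sketch of this objective, assuming a scalar-output model f_hat, a single transformation s(alpha, x) with s(0, x) = x, and an asserted invariance (desired derivative 0); the derivative of the learned network is estimated here by central finite differences rather than the analytic propagation TANGENTPROP actually uses, and all names are illustrative:

```python
import numpy as np

def tangentprop_loss(f_hat, s, xs, ys, mu=0.1, eps=1e-4):
    # Combined objective: squared error on training values plus a
    # penalty on the network's derivative along the transformation.
    loss = 0.0
    for x, y in zip(xs, ys):
        fit = (y - f_hat(x)) ** 2
        # Central finite-difference estimate of
        # d f_hat(s(alpha, x)) / d alpha at alpha = 0.
        d_fhat = (f_hat(s(eps, x)) - f_hat(s(-eps, x))) / (2 * eps)
        # Invariance prior: the desired derivative of the true f is 0,
        # so any nonzero network derivative is penalized.
        loss += fit + mu * d_fhat ** 2
    return loss

# Toy usage: 2-D inputs, rotation as the transformation s.
def rotate(alpha, x):
    c, s_ = np.cos(alpha), np.sin(alpha)
    return np.array([c * x[0] - s_ * x[1], s_ * x[0] + c * x[1]])

f_hat = lambda x: float(x @ np.array([0.5, -0.2]))  # stand-in "network"
xs = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
ys = [1.0, 0.0]
print(tangentprop_loss(f_hat, rotate, xs, ys))
```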
• In the handwritten-character task, the transformations $s_j$ can be chosen to assert both rotational invariance and translational invariance of the character identity. In each derivative difference, the first term is the training (desired) derivative, the second is the network's actual derivative, and $\alpha$ is the continuous transformation parameter.
• TANGENTPROP thus uses prior knowledge in the form of desired derivatives of the target function with respect to transformations of its inputs.
• It combines this prior knowledge with the observed training data by minimizing an objective function that measures both the network's error with respect to the training example values (fitting the data) and its error with respect to the desired derivatives (fitting the prior knowledge).
• The value of μ determines the degree to which the network will fit one or the other of these two components in the total error.

EBNN (Explanation-Based Neural Network learning) Algorithm
• EBNN builds on the TANGENTPROP algorithm in two significant ways:
– First, instead of relying on the user to provide training derivatives, EBNN computes training derivatives itself for each observed training example.
– Second, EBNN addresses the issue of how to weight the relative importance of the inductive and analytical components of learning: the value of μ is chosen independently for each training example, based on a heuristic that considers how accurately the domain theory predicts the training value for that particular example.
• (Figure not reproduced.) The top portion of the figure depicts an EBNN domain theory for the target function Cup, with each rectangular block representing a distinct neural network in the domain theory.
• Some networks take the outputs of other networks as their inputs (e.g., the rightmost network, labeled Cup, takes its inputs from the outputs of the Stable, Liftable, and OpenVessel networks).
• Thus the networks that make up the domain theory can be chained together to infer the target function value for the input instance, just as Horn clauses might be chained together for this purpose.
• In general, these domain theory networks may be provided to the learner by some external source, or they may be the result of previous learning by the same system. EBNN uses these domain theory networks to learn the new target function; it does not alter them during this process.
• EBNN calculates the partial derivative of the domain theory's prediction with respect to each instance feature, yielding the set of derivatives
$$\left( \frac{\partial A(x)}{\partial x^1}, \ldots, \frac{\partial A(x)}{\partial x^n} \right)_{x = x_i}$$

• This set of derivatives is the gradient of the domain theory prediction function $A$ with respect to the input instance.
• EBNN uses a minor variant of the TANGENTPROP algorithm to train the target network to fit the following error function:

$$E = \sum_i \left[ \left( f(x_i) - \hat{f}(x_i) \right)^2 + \mu_i \sum_j \left( \frac{\partial A(x)}{\partial x^j} - \frac{\partial \hat{f}(x)}{\partial x^j} \right)^2_{x = x_i} \right]
\quad \text{where } \mu_i \equiv 1 - \frac{|A(x_i) - f(x_i)|}{c}$$

and $c$ is a normalizing constant keeping $0 \le \mu_i \le 1$.
• Here $A(x_i)$ is the domain theory's prediction for instance $x_i$, and $x^j$ denotes the $j$-th component of the input vector $x$; the second term asks the learned network's gradient to match the gradient of the domain theory's prediction.
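A minimal sketch of EBNN's example-specific weighting and loss, assuming scalar predictions, gradients of the domain theory prediction A estimated by finite differences, and an assumed normalizing constant c; all names are illustrative:

```python
import numpy as np

def ebnn_weight(a_pred, y, c=1.0):
    # mu_i = 1 - |A(x_i) - f(x_i)| / c: trust the analytical component
    # more when the domain theory predicts this example accurately.
    return max(0.0, 1.0 - abs(a_pred - y) / c)

def grad_fd(g, x, eps=1e-4):
    # Central finite-difference gradient of a scalar function g at x.
    grad = np.zeros_like(x)
    for j in range(len(x)):
        e = np.zeros_like(x)
        e[j] = eps
        grad[j] = (g(x + e) - g(x - e)) / (2 * eps)
    return grad

def ebnn_loss(f_hat, A, xs, ys, c=1.0):
    loss = 0.0
    for x, y in zip(xs, ys):
        mu = ebnn_weight(A(x), y, c)
        # Fit the training value and, weighted by mu, fit the slope of
        # the domain theory's prediction in every input direction.
        loss += (y - f_hat(x)) ** 2
        loss += mu * np.sum((grad_fd(A, x) - grad_fd(f_hat, x)) ** 2)
    return loss
```

Note the design choice this encodes: when the domain theory predicts a training value perfectly, mu_i = 1 and the analytical (slope-fitting) term is fully weighted; when it predicts badly, mu_i shrinks toward 0 and learning falls back to purely inductive fitting.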
FOCL Algorithm
• FOCL is an extension of the purely inductive FOIL algorithm.
• FOIL and FOCL both learn a set of first-order Horn clauses to cover the observed training examples.
• Both employ a sequential covering algorithm that learns a single Horn clause, removes the positive examples covered by this new Horn clause, and then iterates this procedure over the remaining training examples. (A sketch of this outer loop follows below.)
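A minimal Python sketch of that sequential covering outer loop, assuming a hypothetical learn_one_clause(positives, negatives) routine (in FOIL/FOCL this is the general-to-specific search described next) and a covers(clause, example) test:

```python
def sequential_covering(positives, negatives, learn_one_clause, covers):
    # Learn clauses one at a time; after each clause, drop the positive
    # examples it covers and repeat on the remainder.
    learned = []
    remaining = list(positives)
    while remaining:
        clause = learn_one_clause(remaining, negatives)
        covered = [p for p in remaining if covers(clause, p)]
        if not covered:  # no progress: stop to avoid looping forever
            break
        learned.append(clause)
        remaining = [p for p in remaining if p not in covered]
    return learned
```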
• In FOIL and FOCL, each new Horn clause is created by performing a general-to-specific search, beginning with the most general possible Horn clause (a clause whose body is empty, i.e., always true, so it covers every instance).
• Several candidate specializations of the current clause are then generated, and the specialization with the greatest information gain relative to the training examples is chosen.
• FOIL generates each candidate specialization by adding a single new literal to the clause preconditions.
• FOCL uses this same method for producing candidate specializations, but also generates additional specializations based on the domain theory.
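For reference, a small sketch of the information-gain criterion FOIL uses to score a candidate specialization; p0/n0 are the positives/negatives covered before adding the literal, p1/n1 after, and t is the number of positive bindings still covered, approximated here by p1 (a simplifying assumption of this sketch):

```python
import math

def foil_gain(p0, n0, p1, n1):
    # Foil_Gain = t * (log2(p1 / (p1 + n1)) - log2(p0 / (p0 + n0)))
    # t = positive bindings covered both before and after specialization;
    # this sketch approximates t by p1.
    if p1 == 0:
        return 0.0
    t = p1
    return t * (math.log2(p1 / (p1 + n1)) - math.log2(p0 / (p0 + n0)))

# Example in the spirit of the Cup search notes below: a clause covering
# 2 positives and 3 negatives, specialized to cover 2 positives and
# 0 negatives, yields a positive gain.
print(foil_gain(p0=2, n0=3, p1=2, n1=0))  # = 2 * (0 - log2(0.4)) ≈ 2.64
```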
• FOIL considers only the training data; FOCL considers both the training data and the domain theory.
• In the Cup example, the search is constructed top-down from the most general clause for Cup.
• A purely inductive candidate specialization such as adding HasHandle covers 2 positive and 3 negative examples (2+, 3-).
• Using the domain theory, FOCL instead adds a specialization derived from the theory: the non-operational rule "Cup is stable, liftable, and an open vessel" is unfolded, replacing the general (non-operational) literals with operational ones (e.g., "bottom is flat").