Machine Learning by Iresh A. Dhotre

This book is published by Technical Publications.
TECHNICAL PUBLICATIONS - An Up-Thrust for Knowledge

As per Revised Syllabus of Visvesvaraya Technological University

Machine Learning
Semester - VII (CSE / ISE)

I. A. Dhotre
M.E. (Information Technology)
Ex-Faculty, Sinhgad College of Engineering, Pune

Technical Publications - An Up-Thrust for Knowledge
https://www.facebook.com/technicalpublications

Machine Learning, Semester - VII (CSE / ISE)
First Edition : August 2018
© Copyright with Author. All publishing rights (printed and ebook version) reserved with Technical Publications. No part of this book should be reproduced in any form, electronic, mechanical, photocopy or any information storage and retrieval system, without prior permission in writing from Technical Publications, Pune.
Published by : Technical Publications, Pune

PREFACE

The importance of Machine Learning is well known in various engineering fields. Overwhelming response to my books on various subjects inspired me to write this book. The book is structured to cover the key aspects of the subject Machine Learning.

The book uses plain, lucid language to explain the fundamentals of this subject. It provides a logical method of explaining various complicated concepts and stepwise methods to explain the important topics. Each chapter is well supported with necessary illustrations, practical examples and solved problems. All chapters in this book are arranged in a proper sequence that permits each topic to build upon earlier studies. All care has been taken to make students comfortable in understanding the basic concepts of this subject.

The book not only covers the entire scope of the subject but also explains the philosophy of the subject. This makes the understanding of this subject clearer and more interesting. The book will be very useful not only to the students but also to the subject teachers. The students have to omit nothing and possibly have to cover nothing more.

I wish to express my profound thanks to all those who helped in making this book a reality. Much needed moral support and encouragement was provided on numerous occasions by my whole family. I wish to thank the Publisher and the entire team of Technical Publications who have taken immense pains to get this book in time with quality printing.

Any suggestion for the improvement of the book will be acknowledged and well appreciated.

I. A. Dhotre

Dedicated to God

SYLLABUS

Machine Learning [15CS73]

Module - 1
Introduction : Well posed learning problems, Designing a learning system, Perspectives and issues in machine learning.
Concept Learning : Concept learning task, Concept learning as search, Find-S algorithm, Version space, Candidate Elimination algorithm, Inductive bias. (Chapter - 1)

Module - 2
Decision Tree Learning : Decision tree representation, Appropriate problems for decision tree learning, Basic decision tree learning algorithm, Hypothesis space search in decision tree learning, Inductive bias in decision tree learning, Issues in decision tree learning. (Chapter - 2)

Module - 3
Artificial Neural Networks : Introduction, Neural network representation, Appropriate problems, Perceptrons, Backpropagation algorithm. (Chapter - 3)

Module - 4
Bayesian Learning : Introduction, Bayes theorem,
Bayes theorem and concept learning, ML and LS error hypothesis, ML for predicting probabilities, MDL principle, Naive Bayes classifier, Bayesian belief networks, EM algorithm. (Chapter - 4)

Module - 5
Evaluating Hypothesis : Motivation, Estimating hypothesis accuracy, Basics of sampling theorem, General approach for deriving confidence intervals, Difference in error of two hypothesis, Comparing learning algorithms.
Instance Based Learning : Introduction, k-nearest neighbor learning, locally weighted regression, radial basis function, case-based reasoning.
Reinforcement Learning : Introduction, Learning task, Q learning. (Chapter - 5)

TABLE OF CONTENTS

Chapter - 1  Introduction and Concept Learning  (1 - 1) to (1 - 14)
1.1 Well Posed Learning Problem
    1.1.1 Why Machine Learning
    1.1.2 Application of ML
        1.1.2.1 Differences between Machine Learning and Data Mining
1.2 Designing a Learning System
    1.2.1 Training Experience
    1.2.2 Choosing and Representing the Target Function
    1.2.3 Estimating Training Values
    1.2.4 Adjusting the Weights
1.3 Perspective and Issues in Machine Learning
1.4 Concept Learning : Concept Learning Task
1.5 Concept Learning as Search
1.6 Find-S Algorithm
1.7 Version Space and Candidate Elimination Algorithm
1.8 Inductive Bias

Chapter - 2  Decision Tree Learning  (2 - 1) to (2 - 12)
2.1 Introduction
    2.1.1 Decision Tree Representation
2.2 Appropriate Problems for Decision Tree Learning
2.3 Basic Decision Tree Learning Algorithm
    2.3.1 Which Attribute is "Best" ?
    2.3.2 Information Gain
    2.3.3 The ID3 Algorithm
2.4 Hypothesis Space Search in Decision Tree Learning
2.5 Inductive Bias in Decision Tree Learning
2.6 Issues in Decision Tree Learning
    2.6.1 Avoiding Overfitting the Data

Chapter - 3  Artificial Neural Networks  (3 - 1) to (3 - 16)
3.1 Introduction
    3.1.1 Biological Motivation
3.2 Neural Network Representation
    3.2.1 NN Architecture : Single Layer Network
    3.2.2 Multilayer Feed Forward NN
        3.2.2.1 Delta Learning Rule for Multiperceptron Layer
3.3 Appropriate Problems
3.4 Perceptron
    3.4.1 Single Layer Perceptron
    3.4.2 Multilayer Perceptron
    3.4.3 Limitation of Linear Perceptron : Linear Separability
3.5 Backpropagation Algorithm
    3.5.1 Gradient Descent Algorithm
    3.5.2 Implementation of AND, OR, XOR Function
    3.5.3 Performance Issue in Error Back Propagation (EBP)

Chapter - 4  Bayesian Learning  (4 - 1) to (4 - 12)
4.1 Introduction
    4.1.1 Bayes Theorem
4.2 Bayes Theorem and Concept Learning
    4.2.1 Brute-Force Bayes Concept Learning
    4.2.2 MAP Hypotheses
4.3 ML and LS Error Hypothesis
    4.3.1 Least Square Method
    4.3.2 Maximum Likelihood
4.4 ML for Predicting Probabilities
    4.4.1 Gradient Search to Maximize Likelihood in a Neural Net
4.5 Minimum Description Length (MDL) Principle
4.6 Naive Bayes Classifier
4.7 Bayesian Belief Networks
4.8 EM Algorithm

Chapter - 5  Evaluating Hypothesis  (5 - 1) to (5 - 20)
5.1 Motivation
5.2 Estimating Hypothesis Accuracy
    5.2.1 Sample Error and True Error
    5.2.2 Confidence Intervals for Discrete-Valued Hypotheses
5.3 Basics of Sampling Theorem
    5.3.1 Error Estimation and Estimating Binomial Proportions
    5.3.2 Binomial Distribution
    5.3.3 Mean and Variance
    5.3.4 Estimators, Bias, and Variance
    5.3.5 Confidence Intervals
5.4 General Approach for Deriving Confidence Intervals
    5.4.1 Central Limit Theorem
5.5 Difference in Error of Two Hypothesis
    5.5.1 Hypothesis Testing
5.6 Comparing Learning Algorithms
    5.6.1 Type - I and Type - II Errors
    5.6.2 Paired t Test
5.7 Instance Based Learning : Introduction
5.8 K-nearest Neighbor Learning
5.9 Locally Weighted Regression
    5.9.1 Locally Weighted Linear Regression
5.10 Radial Basis Function
5.11 Case-based Reasoning
5.12 Reinforcement Learning : Introduction
5.13 Learning Task
5.14 Q Learning

Chapter - 1
Introduction and Concept Learning

Contents
1.1 Well Posed Learning Problem
1.2 Designing a Learning System
1.3 Perspective and Issues in Machine Learning
1.4 Concept Learning : Concept Learning Task
1.5 Concept Learning as Search
1.6 Find-S Algorithm
1.7 Version Space and Candidate Elimination Algorithm
1.8 Inductive Bias

Syllabus
Introduction : Well posed learning problems, Designing a learning system, Perspective and issues in machine learning.
Concept Learning : Concept learning task, Concept learning as search, Find-S algorithm, Version space, Candidate Elimination algorithm, Inductive bias.

1.1 Well Posed Learning Problem

• Definition : A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.
• A (machine learning) problem is well-posed if a solution to it exists, if that solution is unique, and if that solution depends on the data / experience but is not sensitive to (reasonably small) changes in the data / experience.
• Identify three features as follows :
  1. Class of tasks
  2. Measure of performance to be improved
  3. Source of experience
• What are T, P, E ? How do we formulate a machine learning problem ?
• A robot driving learning problem :
  1. Task T : Driving on a public, 4-lane highway using vision sensors.
  2. Performance measure P : Average distance travelled before an error (as judged by a human overseer).
  3. Training experience E : A sequence of images and steering commands recorded while observing a human driver.
• A handwriting recognition learning problem :
  1. Task T : Recognizing and classifying handwritten words within images.
  2. Performance measure P : Percent of words correctly classified.
  3. Training experience E : A database of handwritten words with given classifications.
• A text categorization problem :
  1. Task T : Assign a document to its content category.
  2. Performance measure P : Precision and recall.
  3. Training experience E : Example pre-classified documents.
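To make the (T, P, E) formulation concrete, here is a minimal Python sketch (not from the book; the class and field names are illustrative assumptions) that records the three example problems above as task / performance-measure / experience triples.

```python
from dataclasses import dataclass

@dataclass
class LearningProblem:
    """A well-posed learning problem stated as a (T, P, E) triple."""
    task: str                 # T : the class of tasks
    performance_measure: str  # P : how improvement is measured
    training_experience: str  # E : the source of experience

problems = [
    LearningProblem("Driving on a public 4-lane highway using vision sensors",
                    "Average distance travelled before an error",
                    "Images and steering commands recorded from a human driver"),
    LearningProblem("Recognizing and classifying handwritten words within images",
                    "Percent of words correctly classified",
                    "A database of handwritten words with given classifications"),
    LearningProblem("Assign a document to its content category",
                    "Precision and recall",
                    "Example pre-classified documents"),
]

for p in problems:
    print(f"T : {p.task}\nP : {p.performance_measure}\nE : {p.training_experience}\n")
```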
1.1.1 Why Machine Learning

• Machine Learning (ML) is a sub-field of Artificial Intelligence (AI) which is concerned with developing computational theories of learning and building learning machines.
• Learning is a phenomenon and a process which has manifestations of various aspects. The learning process includes gaining new symbolic knowledge and developing cognitive skills through instruction and practice. It is also the discovery of new facts and theories through observation and experiment.
• Machine Learning Definition : A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.
• Machine learning is programming computers to optimize a performance criterion using example data or past experience.
• Application of machine learning methods to large databases is called data mining.
• It is very hard to write programs that solve problems like recognizing a human face. We do not know what program to write because we don't know how our brain does it. Instead of writing a program by hand, it is possible to collect lots of examples that specify the correct output for a given input.
• A machine learning algorithm then takes these examples and produces a program that does the job. The program produced by the learning algorithm may look very different from a typical hand-written program. It may contain millions of numbers. If we do it right, the program works for new cases as well as the ones we trained it on.
• The main goal of machine learning is to devise learning algorithms that do the learning automatically without human intervention or assistance. The machine learning paradigm can be viewed as "programming by example". Another goal is to develop computational models of the human learning process and perform computer simulations.
• The goal of machine learning is to build computer systems that can adapt and learn from their experience.
• An algorithm is used to solve a problem on a computer. An algorithm is a sequence of instructions that is carried out to transform the input to the output. For example, the addition of four numbers is carried out by giving the four numbers as input to the algorithm, and the output is the sum of all four numbers. For the same task there may be various algorithms, and it is of interest to find the most efficient one, requiring the least number of instructions or memory or both.
• For some tasks, however, we do not have an algorithm.

Why is Machine Learning Important ?

• Machine learning algorithms can figure out how to perform important tasks by generalizing from examples.
• Machine learning provides business insight and intelligence. Decision makers are provided with greater insights into their organizations. This adaptive technology is being used by global enterprises to gain a competitive edge.
• Machine learning algorithms discover the relationships between the variables of a system (input, output and hidden) from direct samples of the system.
• Following are some of the reasons :
  1. Some tasks cannot be defined well, except by examples. For example : recognizing people.
  2. Relationships and correlations can be hidden within large amounts of data. To solve these problems, machine learning and data mining may be able to find these relationships.
  3. Human designers often produce machines that do not work as well as desired in the environments in which they are used.
  4. The amount of knowledge available about certain tasks might be too large for explicit encoding by humans.
  5. Environments change from time to time.
  6. New knowledge about tasks is constantly being discovered by humans.
• Machine learning also helps us find solutions to many problems in computer vision, speech recognition and robotics.
• Machine learning uses the theory of statistics in building mathematical models, because the core task is making inference from a sample.

How Machines Learn ?

• Machine learning typically follows three phases :
  1. Training : A training set of examples of correct behavior is analyzed and some representation of the newly learnt knowledge is stored.
     This is often some form of rules.
  2. Validation : The rules are checked and, if necessary, additional training is given. Sometimes additional test data are used, but instead a human expert may validate the rules, or some other automatic knowledge-based component may be used. The role of the tester is often called the opponent.
  3. Application : The rules are used in responding to some new situation.

1.1.2 Application of ML

• Examples of successful applications of machine learning :
  1. Learning to recognize spoken words.
  2. Learning to drive an autonomous vehicle.
  3. Learning to classify new astronomical structures.
  4. Learning to play world-class backgammon.
  5. Spoken language understanding : within the context of a limited domain, determine the meaning of something uttered by a speaker to the extent that it can be classified into one of a fixed set of categories.

Face Recognition

• The face recognition task is performed effortlessly : every day we recognize our friends, relatives and family members. We also recognize them by looking at photographs, in which they appear in different poses, hair styles, background lighting, and with or without makeup.
• We do it subconsciously and cannot explain how we do it. Because we cannot explain how we do it, we cannot write an algorithm for it.
• A face has some structure. It is not a random collection of pixels; it is a symmetric structure and contains predefined components like nose, mouth, eyes and ears. Every person's face is a pattern composed of a particular combination of these features. By analyzing sample face images of a person, a learning program captures the pattern specific to that person and uses it to recognize whether a new real face or a new image belongs to this specific person or not.
• A machine learning algorithm creates an optimized model of the concept being learned based on data or past experience.

1.1.2.1 Differences between Machine Learning and Data Mining

Machine Learning | Data Mining
In machine learning the main goal is to learn a model which can be used to predict future events. | In data mining the main goal is to discover new, interesting information which describes the data set.
Machine learning uses relatively complex and global models. | Data mining uses simple and local models.
Machine learning typically uses data sets of only hundreds of examples, and the learned model is used to predict future events. | Data mining uses large data sets, even containing millions of rows, to find the interesting patterns which describe the data set.

1.2 Designing a Learning System

• We discuss basic design issues and approaches to machine learning.
• Goal : Design a system to learn how to play checkers and enter it into the world checkers tournament.
  1) Choose the training experience
  2) Choose the target function
  3) Choose a representation for the target function
  4) Choose a function approximation algorithm

1.2.1 Training Experience

• How does training experience influence the performance goal ?
  1. Type of feedback : Direct vs. indirect.
  2. Learning strategy : Have a teacher or not ? Exploration vs. exploitation ?
  3. Diversity of training : Is the training data representative of the task ? How many peers should we play with ? How many tactics should we try when playing with self ?
• Let us decide that our program will learn by playing with itself, and formulate the learning problem.
• Choosing the training experience :
  1. Direct or indirect feedback
  2. Degree of learner's control
  3. Representative distribution of examples
• The learning goal is to : define precisely a class of problems that forms interesting forms of learning, explore algorithms to solve such problems, and understand the fundamental structure of learning problems and processes.
• Design choice 1 : The problem of selecting the type of training experience from which our system will learn. Direct training examples : just a bunch of board states together with the correct move.
• Design choice 2 : Indirect training. A bunch of recorded games, where the correctness of the moves is inferred from the result of the game.
• Learning is most reliable when the training examples follow a distribution similar to that of future test examples.

1.2.2 Choosing and Representing the Target Function

• It determines exactly what type of knowledge will be learned and how this will be used by the performance program.
• Choosing a representation for the target function :
  1. An expressive representation for a close function approximation
  2. A simple representation for simple training data and learning algorithms
• Consider the checkers program. A linear function of predefined board features can be used :
  V̂(b) = w0 + w1·x1 + w2·x2 + w3·x3 + w4·x4 + w5·x5 + w6·x6
  x1, x2 : Number of black / red pieces on the board
  x3, x4 : Number of black / red kings on the board
  x5, x6 : Number of black / red pieces threatened (can be captured on red's / black's next turn)
• ChooseMove : B → M, where B is any legal board state and M is a legal move (hopefully the "best" legal move).
• Alternatively, a function V : B → ℝ which maps from B to some real value, where higher scores are assigned to better board states.
• Now use the legal moves to generate every subsequent board state and use V to choose the best one, and therefore the best legal move.
  1. V(b) = 100, if b is a final board state that is won
  2. V(b) = -100, if b is a final board state that is lost
  3. V(b) = 0, if b is a final board state that is a draw
  4. V(b) = V(b'), if b is not a final state, where b' is the best final board state reachable starting from b, assuming both players play optimally
• While this recursive definition specifies a value of V(b) for every board state b, this definition is not usable by our checkers player because it is not efficiently computable.
• For representation :
  1. Use a large table with an entry specifying a value for each distinct board state.
  2. A collection of rules that match against features of the board state.
  3. A quadratic polynomial function of predefined board features.

1.2.3 Estimating Training Values

• We need to assign specific scores to intermediate board states.
• Approximate the training value of an intermediate board state b using the learner's current approximation of the value of the board state that follows b :
  Vtrain(b) ← V̂(Successor(b))
• This is a simple and successful approach.
• It is more accurate for states closer to end states.

1.2.4 Adjusting the Weights

• Choose the weights wi to best fit the set of training examples.
• Minimize the squared error E between the training values and the values predicted by the hypothesis :
  E ≡ Σ ⟨b, Vtrain(b)⟩ ∈ training examples (Vtrain(b) − V̂(b))²
• We require an algorithm that will incrementally refine the weights as new training examples become available, and that will be robust to errors in these estimated training values.
• Least Mean Squares (LMS) is one such algorithm.
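As an illustration of the LMS idea (a sketch in Python, not code from the book), the update below nudges the weights of the linear evaluation function V̂(b) = w0 + w1·x1 + ... + w6·x6 toward a given training value. The learning rate eta and the representation of a board as a plain list of six feature counts are assumptions made for this example.

```python
def v_hat(weights, features):
    """Linear evaluation function: w0 + w1*x1 + ... + w6*x6."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], features))

def lms_update(weights, features, v_train, eta=0.01):
    """One LMS step: w_i <- w_i + eta * (Vtrain(b) - Vhat(b)) * x_i."""
    error = v_train - v_hat(weights, features)
    weights[0] += eta * error                    # bias weight (its feature is always 1)
    for i, x in enumerate(features, start=1):
        weights[i] += eta * error * x
    return weights

# Hypothetical board: 12 black pieces, 12 red pieces, no kings,
# 2 black pieces threatened, 1 red piece threatened.
weights = [0.0] * 7
features = [12, 12, 0, 0, 2, 1]
weights = lms_update(weights, features, v_train=100.0)   # e.g. the successor state looks winning
print(weights)
```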
1.3 Perspective and Issues in Machine Learning

Issues in Machine Learning :
• What learning algorithms are to be used ?
• How much training data is sufficient ?
• When and how can prior knowledge guide the learning process ?
• What is the best strategy for choosing the next training experience ?
• What is the best way to reduce the learning task to one or more function approximation problems ?
• How can the learner automatically alter its representation to improve its learning ability ?

1.4 Concept Learning : Concept Learning Task

• Inducing general functions from specific training examples is a main issue of machine learning.
• Concept learning : acquiring the definition of a general category from given sample positive and negative training examples of the category.
• Concept learning can be seen as a problem of searching through a predefined space of potential hypotheses for the hypothesis that best fits the training examples.
• The hypothesis space has a general-to-specific ordering of hypotheses, and the search can be efficiently organized by taking advantage of a naturally occurring structure over the hypothesis space.
• Formal definition of concept learning : inferring a boolean-valued function from training examples of its input and output.
• An example of concept learning is learning the concept "bird" from the given examples of birds (positive examples) and non-birds (negative examples).
• We are trying to learn the definition of a concept from given examples.
• Concept learning involves determining a mapping from a set of input variables to a Boolean value. Such methods are known as inductive learning methods.
• If a function can be found which maps training data to correct classifications, then it will also work well for unseen data. This process is known as generalization.
• Example : Learn the "days on which my friend enjoys his favorite water sport".
• We have a set of example days, each described by six attributes. The task is to learn to predict the value of EnjoySport for an arbitrary day, based on the values of its attributes.
• The inductive learning hypothesis : Any hypothesis found to approximate the target function well over a sufficiently large set of training examples will also approximate the target function well over other unobserved examples.
• Although the learning task is to determine a hypothesis (h) identical to the target concept c over the entire set of instances (X), the only information available about c is its value over the training examples.
• Inductive learning algorithms can at best guarantee that the output hypothesis fits the target concept over the training data.
• Lacking any further information, our assumption is that the best hypothesis regarding unseen instances is the hypothesis that best fits the observed training data. This is the fundamental assumption of inductive learning.
• Hypothesis representation (constraints on instance attributes) :
  1. Any value is acceptable : represented by "?"
  2. No value is acceptable : represented by "∅"

1.5 Concept Learning as Search

• Concept learning can be viewed as the task of searching through a large space of hypotheses implicitly defined by the hypothesis representation.
• The goal of this search is to find the hypothesis that best fits the training examples.
• By selecting a hypothesis representation, the designer of the learning algorithm implicitly defines the space of all hypotheses that the program can ever represent and therefore can ever learn.
• A hypothesis is a vector of constraints for each attribute :
  1. Indicate by "?" that any value is acceptable for this attribute
  2. Specify a single required value for the attribute
  3. Indicate by "∅" that no value is acceptable
• If some instance x satisfies all the constraints of hypothesis h, then h classifies x as a positive example (h(x) = 1).
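The constraint check described above can be stated in a few lines of Python (a sketch, not from the book; the string "EMPTY" is used here to stand for the ∅ constraint, and the example day is made up for illustration).

```python
def satisfies(x, h):
    """True if instance x meets every attribute constraint of hypothesis h.
    A constraint is '?' (any value), a required value, or 'EMPTY' (no value)."""
    return all(c != "EMPTY" and (c == "?" or c == v) for c, v in zip(h, x))

# Attributes: (Sky, AirTemp, Humidity, Wind, Water, Forecast)
day = ("Sunny", "Warm", "High", "Strong", "Cool", "Change")
h = ("Sunny", "Warm", "?", "Strong", "?", "?")
h_reject_all = ("EMPTY",) * 6        # most specific hypothesis: classifies no day as positive

print(satisfies(day, h))             # True  -> h(x) = 1
print(satisfies(day, h_reject_all))  # False -> h(x) = 0
```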
• Example : hypotheses for the EnjoySport concept learning task.
• Given :
  Instances X : Possible days, each described by the attributes Sky (with possible values Sunny, Cloudy and Rainy), AirTemp (with values Warm and Cold), Humidity (with values Normal and High), Wind (with values Strong and Weak), Water (with values Warm and Cool) and Forecast (with values Same and Change).
  Hypotheses H : Each hypothesis is described by a conjunction of constraints on the attributes. The constraints may be "?", "∅", or a specific value.
  Target concept c : EnjoySport : X → {0, 1}
  Training examples D : Positive or negative examples of the target function.
• Determine : A hypothesis h in H such that h(x) = c(x) for all x in X.
• Search through a large space of hypotheses implicitly defined by the hypothesis representation. Find the hypothesis that best fits the training examples.
• How big is the hypothesis space ? In EnjoySport there are six attributes : Sky has 3 values, and the rest have 2. How many distinct instances ? How many hypotheses ?
  Instances : 3 · 2 · 2 · 2 · 2 · 2 = 96
  Syntactically distinct hypotheses : 5 · 4 · 4 · 4 · 4 · 4 = 5120
  Semantically distinct hypotheses : 1 + 4 · 3 · 3 · 3 · 3 · 3 = 973
• By selecting a hypothesis representation, the designer of the learning algorithm implicitly defines the space of all hypotheses the program can ever represent and therefore can ever learn.
• This is a very simple learning task. Most practical learning tasks involve much larger, sometimes infinite, hypothesis spaces.

General-to-Specific Ordering of Hypotheses

• Many algorithms for concept learning organize the search through the hypothesis space by relying on a general-to-specific ordering of hypotheses.
• By taking advantage of this naturally occurring structure over the hypothesis space, we can design learning algorithms that exhaustively search even infinite hypothesis spaces without explicitly enumerating every hypothesis.
• Consider two hypotheses :
  h1 = ⟨Sunny, ?, ?, Strong, ?, ?⟩
  h2 = ⟨Sunny, ?, ?, ?, ?, ?⟩
• Now consider the sets of instances that are classified positive by h1 and by h2. Because h2 imposes fewer constraints on the instance, it classifies more instances as positive.
• In fact, any instance classified positive by h1 will also be classified positive by h2. Therefore, we say that h2 is more general than h1.
• One learning method is to determine the most specific hypothesis that matches all the training data.
• More-general-than-or-equal relation : Let h1 and h2 be two boolean-valued functions defined over X. Then h1 is more-general-than-or-equal-to h2 (written h1 ≥ h2) if and only if any instance that satisfies h2 also satisfies h1 :
  h1 ≥ h2 if and only if (∀x ∈ X) [(h2(x) = 1) → (h1(x) = 1)]
• h1 is more-general-than h2 (h1 > h2) if and only if h1 ≥ h2 is true and h2 ≥ h1 is false. We also say that h2 is more-specific-than h1.
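For conjunctive hypotheses written with "?" wildcards and specific values (ignoring the ∅ constraint), the more-general-than-or-equal relation reduces to an attribute-by-attribute check. A small Python sketch (not from the book) using the two hypotheses above:

```python
def more_general_or_equal(h1, h2):
    """h1 >= h2 : every constraint of h1 is '?' or equals the corresponding
    constraint of h2, so any instance satisfying h2 also satisfies h1."""
    return all(a == "?" or a == b for a, b in zip(h1, h2))

h1 = ("Sunny", "?", "?", "Strong", "?", "?")
h2 = ("Sunny", "?", "?", "?", "?", "?")

print(more_general_or_equal(h2, h1))   # True  : h2 is more general than (or equal to) h1
print(more_general_or_equal(h1, h2))   # False : h1 is strictly more specific than h2
```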
1.6 Find-S Algorithm

• The Find-S algorithm starts from the most specific hypothesis and generalizes it by considering only positive examples.
• This algorithm ignores negative examples. As long as the hypothesis space contains a hypothesis that describes the true target concept, and the training data contain no errors, ignoring negative examples does not cause any problem.
• Find-S finds the most specific hypothesis within H that is consistent with the positive training examples.
• The final hypothesis will also be consistent with the negative examples if the correct target concept is in H and the training examples are correct.

Algorithm :
• Initialize h to the most specific hypothesis in H : h = ⟨∅, ∅, ∅, ∅, ∅, ∅⟩
• For each positive training instance x :
    For each attribute constraint ai in h :
      If the constraint ai is satisfied by x, do nothing;
      else replace ai by the next more general constraint that is satisfied by x.
• Output hypothesis h.

Training examples (EnjoySport) :

Example | Sky | AirTemp | Humidity | Wind | Water | Forecast | EnjoySport
1 | Sunny | Warm | Normal | Strong | Warm | Same | Yes
2 | Sunny | Warm | High | Strong | Warm | Same | Yes
3 | Rainy | Cold | High | Strong | Warm | Change | No
4 | Sunny | Warm | High | Strong | Cool | Change | Yes

Trace of h :
  h = ⟨∅, ∅, ∅, ∅, ∅, ∅⟩
  h = ⟨Sunny, Warm, Normal, Strong, Warm, Same⟩
  h = ⟨Sunny, Warm, ?, Strong, Warm, Same⟩   (the negative example is ignored)
  h = ⟨Sunny, Warm, ?, Strong, ?, ?⟩

• The output hypothesis is the most specific one that satisfies all the positive training examples :
  h = ⟨Sunny, Warm, ?, Strong, ?, ?⟩
• The result is consistent with the positive training examples.
• Is the result consistent with the negative training examples ? The result is consistent with the negative training examples if the target concept is contained in H (and the training examples are correct).
• Sizes of the space :
  Size of the instance space : |X| = 3 · 2 · 2 · 2 · 2 · 2 = 96
  Size of the concept space : |C| = 2^|X| = 2^96
  Size of the hypothesis space : |H| = 4 · 3 · 3 · 3 · 3 · 3 + 1 = 973 << 2^96
  The target concept (in C) may not be contained in H.
• Questions :
  1. Has the learner converged to the target concept, as there can be several consistent hypotheses (with both positive and negative training examples) ?
  2. Why is the most specific hypothesis preferred ?
  3. What if there are several maximally specific consistent hypotheses ?
  4. What if the training examples are not correct ?
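A compact Python sketch of Find-S on the EnjoySport data above (not code from the book; representing the initial all-∅ hypothesis by None is an implementation shortcut):

```python
# EnjoySport training data from the table above: (instance, EnjoySport?)
train = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), True),
    (("Sunny", "Warm", "High",   "Strong", "Warm", "Same"), True),
    (("Rainy", "Cold", "High",   "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High",   "Strong", "Cool", "Change"), True),
]

def find_s(examples):
    h = None                          # stands for the most specific hypothesis (all constraints empty)
    for x, positive in examples:
        if not positive:
            continue                  # Find-S ignores negative examples
        if h is None:
            h = list(x)               # first positive example: adopt its attribute values verbatim
        else:
            # generalize: keep constraints the example satisfies, relax the rest to '?'
            h = [c if c == v else "?" for c, v in zip(h, x)]
    return h

print(find_s(train))                  # ['Sunny', 'Warm', '?', 'Strong', '?', '?']
```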
1.7 Version Space and Candidate Elimination Algorithm

• Version space : the set of all hypotheses that are consistent with the training examples.
• The version space, denoted VS_{H,D}, with respect to hypothesis space H and training examples D, is the subset of hypotheses from H consistent with the training examples in D.
• A version space is a hierarchical representation of knowledge that enables you to keep track of all the useful information supplied by a sequence of learning examples without remembering any of the examples.
• The version space method is a concept learning process accomplished by managing multiple models within a version space.

Version Space Characteristics
1. Tentative heuristics are represented using version spaces.
2. A version space represents all the alternative plausible descriptions of a heuristic.
3. A plausible description is one that is applicable to all known positive examples and no known negative example.
4. A version space description consists of two complementary trees :
   i. one that contains nodes connected to overly general models, and
   ii. one that contains nodes connected to overly specific models.
5. Node values / attributes are discrete.

Fundamental Assumptions
• The data is correct; there are no erroneous instances.
• A correct description is a conjunction of some of the attributes with values.

Diagrammatical Guidelines
• There is a generalization tree and a specialization tree. Each node is connected to a model.
• Nodes in the generalization tree are connected to a model that matches everything in its subtree.
• Nodes in the specialization tree are connected to a model that matches only one thing in its subtree.
• Links between nodes and their models denote generalization relations in a generalization tree, and specialization relations in a specialization tree.

A Compact Representation for Version Spaces
• Instead of enumerating all the hypotheses consistent with a training set, we can represent its most specific and most general boundaries. The hypotheses included in between these two boundaries can be generated as needed.
• Definition : The general boundary (G), with respect to hypothesis space H and training data D, is the set of maximally general members of H consistent with D.
• Definition : The specific boundary (S), with respect to hypothesis space H and training data D, is the set of minimally general (i.e., maximally specific) members of H consistent with D.
• Every generalization must be a generalization of some specific concept description, and no specialization can be a specialization of another general concept description.

Fig. 1.7.2 : Boundary set with hypotheses

Advantages of the version space method :
1. Can describe all the possible hypotheses in the language consistent with the data.
2. Fast (close to linear).

Disadvantages of the version space method :
1. Inconsistent data (noise) may cause the target concept to be pruned.
2. Learning disjunctive concepts is challenging.

List-Then-Eliminate Algorithm
• The List-Then-Eliminate algorithm initializes the version space to contain all hypotheses in H, and then eliminates any hypothesis found inconsistent with any training example.
• The version space of candidate hypotheses thus shrinks as more examples are observed, until ideally just one hypothesis remains that is consistent with all the observed examples.
• If insufficient data is available to narrow the version space to a single hypothesis, then the algorithm can output the entire set of hypotheses consistent with the observed data.
• The List-Then-Eliminate algorithm can be applied whenever the hypothesis space H is finite. It has many advantages, including the fact that it is guaranteed to output all hypotheses consistent with the training data.
• Unfortunately, it requires exhaustively enumerating all hypotheses in H - an unrealistic requirement for all but the most trivial hypothesis spaces.
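Because the EnjoySport hypothesis space is small enough to enumerate, List-Then-Eliminate can be sketched directly in Python (not code from the book; the all-∅ hypothesis is skipped since it can never cover a positive example):

```python
from itertools import product

# EnjoySport attribute domains; "?" means any value is acceptable.
DOMAINS = [
    ["Sunny", "Cloudy", "Rainy"],   # Sky
    ["Warm", "Cold"],               # AirTemp
    ["Normal", "High"],             # Humidity
    ["Strong", "Weak"],             # Wind
    ["Warm", "Cool"],               # Water
    ["Same", "Change"],             # Forecast
]

train = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), True),
    (("Sunny", "Warm", "High",   "Strong", "Warm", "Same"), True),
    (("Rainy", "Cold", "High",   "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High",   "Strong", "Cool", "Change"), True),
]

def matches(h, x):
    return all(c == "?" or c == v for c, v in zip(h, x))

def consistent(h, data):
    return all(matches(h, x) == label for x, label in data)

all_hypotheses = product(*[dom + ["?"] for dom in DOMAINS])   # 4*3*3*3*3*3 = 972 hypotheses
version_space = [h for h in all_hypotheses if consistent(h, train)]
for h in version_space:                                       # the hypotheses between S and G
    print(h)
print(len(version_space), "consistent hypotheses")
```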
Candidate Elimination Algorithm

• The Candidate-Elimination algorithm computes the version space containing all (and only those) hypotheses from H that are consistent with an observed sequence of training examples.

Algorithm for Candidate-Elimination :

Given :
• A representation language.
• A set of positive and negative examples expressed in that language.

Compute : A concept description that is consistent with all the positive examples and none of the negative examples.

Method :
• Initialize G, the set of maximally general hypotheses, to contain one element : the null description (all features are variables).
• Initialize S, the set of maximally specific hypotheses, to contain one element : the first positive example.
• Accept a new training example.
• If the example is positive :
  1. Generalize all the specific models to match the positive example, but ensure the following :
     - The new specific models involve minimal changes.
     - Each new specific model is a specialization of some general model.
     - No new specific model is a generalization of some other specific model.
  2. Prune away all the general models that fail to match the positive example.
• If the example is negative :
  1. Specialize all general models to prevent a match with the negative example, but ensure the following :
     - The new general models involve minimal changes.
     - Each new general model is a generalization of some specific model.
     - No new general model is a specialization of some other general model.
  2. Prune away all the specific models that match the negative example.
• If S and G are both singleton sets, then :
  - if they are identical, output their value and halt;
  - if they are different, the training cases were inconsistent; output this result and halt;
  - else continue accepting new training examples.
• The algorithm stops when :
  1. It runs out of data.
  2. The number of hypotheses remaining is :
     0 - no consistent description for the data exists in the language;
     1 - the answer (the version space converges);
     2 or more - several consistent descriptions remain in the version space.

Example : Learning the concept of "Japanese Economy Car"
• Features : Country of Origin, Manufacturer, Color, Decade, Type.

• Positive Example 1 : (Japan, Honda, Blue, 1980, Economy)
  Initialize G to a singleton set that includes everything :
    G = { (?, ?, ?, ?, ?) }
  Initialize S to a singleton set that includes the first positive example :
    S = { (Japan, Honda, Blue, 1980, Economy) }
• Negative Example 2 : (Japan, Toyota, Green, 1970, Sports)
  Specialize G to exclude the negative example :
    G = { (?, Honda, ?, ?, ?), (?, ?, Blue, ?, ?), (?, ?, ?, 1980, ?), (?, ?, ?, ?, Economy) }
    S = { (Japan, Honda, Blue, 1980, Economy) }
• Positive Example 3 : (Japan, Toyota, Blue, 1990, Economy)
  Prune G to exclude descriptions inconsistent with the positive example :
    G = { (?, ?, Blue, ?, ?), (?, ?, ?, ?, Economy) }
  Generalize S to include the positive example :
    S = { (Japan, ?, Blue, ?, Economy) }
• Negative Example 4 : (USA, Chrysler, Red, 1980, Economy)
  Specialize G to exclude the negative example (but stay consistent with S) :
    G = { (?, ?, Blue, ?, ?), (Japan, ?, ?, ?, Economy) }
    S = { (Japan, ?, Blue, ?, Economy) }
• Negative Example 5 : (Japan, Honda, Red, 1990, Economy)
  The example is inconsistent with the version space. G cannot be specialized and S cannot be generalized.
  The version space collapses.
• Conclusion : No conjunctive hypothesis is consistent with the data set.
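A simplified Python sketch of Candidate-Elimination for conjunctive hypotheses (not code from the book; the attribute domains and helper names are assumptions chosen for this example). Replaying the first four examples of the worked trace above reproduces the S and G boundary sets shown at each step.

```python
ATTRS = ["Origin", "Manufacturer", "Color", "Decade", "Type"]
DOMAINS = {
    "Origin": ["Japan", "USA"],
    "Manufacturer": ["Honda", "Toyota", "Chrysler"],
    "Color": ["Blue", "Green", "Red"],
    "Decade": ["1970", "1980", "1990"],
    "Type": ["Economy", "Sports"],
}

def matches(h, x):
    """Hypothesis h covers instance x; constraints are '?' or a required value."""
    return all(c == "?" or c == v for c, v in zip(h, x))

def more_general(h1, h2):
    """h1 is more general than or equal to h2."""
    return all(a == "?" or a == b for a, b in zip(h1, h2))

def candidate_elimination(examples):
    S = {next(x for x, label in examples if label)}   # specific boundary: first positive example
    G = {("?",) * len(ATTRS)}                         # general boundary: the null description
    for x, positive in examples:
        if positive:
            G = {g for g in G if matches(g, x)}       # prune general models that miss the positive
            S = {tuple(c if c == v else "?" for c, v in zip(s, x)) for s in S}
            S = {s for s in S if any(more_general(g, s) for g in G)}
        else:
            S = {s for s in S if not matches(s, x)}   # prune specific models that cover the negative
            new_G = set()
            for g in G:
                if not matches(g, x):
                    new_G.add(g)
                    continue
                # minimal specializations: fix one '?' to a value the negative example lacks,
                # keeping only those still more general than some member of S
                for i, (c, v) in enumerate(zip(g, x)):
                    if c == "?":
                        for val in DOMAINS[ATTRS[i]]:
                            if val != v:
                                spec = g[:i] + (val,) + g[i + 1:]
                                if any(more_general(spec, s) for s in S):
                                    new_G.add(spec)
            # keep only the maximally general members of G
            G = {g for g in new_G if not any(h != g and more_general(h, g) for h in new_G)}
        print("S =", S)
        print("G =", G)
    return S, G

examples = [
    (("Japan", "Honda", "Blue", "1980", "Economy"), True),
    (("Japan", "Toyota", "Green", "1970", "Sports"), False),
    (("Japan", "Toyota", "Blue", "1990", "Economy"), True),
    (("USA", "Chrysler", "Red", "1980", "Economy"), False),
]
candidate_elimination(examples)
```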
