Data Mining-Unit-3

Lecture 1
What is classification and prediction?

Data classification is a two-step process. In the first step, a model is built describing a predetermined set of data classes or concepts. The model is constructed by analyzing database tuples described by attributes. Each tuple is assumed to belong to a predefined class, as determined by one of the attributes, called the class label attribute. In the context of classification, data tuples are also referred to as samples, examples, or objects. The data tuples analyzed to build the model collectively form the training data set. The individual tuples making up the training set are referred to as training samples and are randomly selected from the sample population. Since the class label of each training sample is provided, this step is also known as supervised learning (i.e., the learning of the model is "supervised" in that it is told to which class each training sample belongs). It contrasts with unsupervised learning (or clustering), in which the class labels of the training samples are not known, and the number or set of classes to be learned may not be known in advance.

Typically, the learned model is represented in the form of classification rules, decision trees, or mathematical formulae. For example, given a database of customer credit information, classification rules can be learned to identify customers as having either excellent or fair credit ratings (see Figure 7.1 below). The rules can be used to categorize future data samples, as well as provide a better understanding of the database contents.

In the second step, shown in the figure, the model is used for classification. First, the predictive accuracy of the model is estimated. The holdout method is a simple technique that uses a test set of class-labeled samples. These samples are randomly selected and are independent of the training samples.

Classification process (1): model construction

NAME      RANK            YEARS   TENURED
Mike      Assistant Prof  3       no
Mary      Assistant Prof  7       yes
Bill      Professor       2       yes
Jim       Associate Prof  7       yes
Dave      Assistant Prof  6       no
Anne      Associate Prof  3       no

Learned rule: IF rank = 'professor' OR years > 6 THEN tenured = 'yes'

Classification process (2): use the model in prediction

NAME      RANK            YEARS   TENURED
Tom       Assistant Prof  2       no
Merlisa   Associate Prof  7       no
George    Professor       5       yes
Joseph    Assistant Prof  7       yes

Figure 7.1: The data classification process. (a) Learning: training data are analyzed by a classification algorithm; here the class label attribute is credit_rating, and the learned model or classifier is represented in the form of classification rules. (b) Classification: test data are used to estimate the accuracy of the classification rules; if the accuracy is considered acceptable, the rules can be applied to the classification of new data tuples.

The accuracy of a model on a given test set is the percentage of test set samples that are correctly classified by the model. For each test sample, the known class label is compared with the learned model's class prediction for that sample. Note that if the accuracy of the model were estimated based on the training data set, this estimate could be optimistic, since the learned model tends to overfit the data (that is, it may have incorporated some particular anomalies of the training data which are not present in the overall sample population). Therefore, a test set is used. If the accuracy of the model is considered acceptable, the model can be used to classify future data tuples or objects for which the class label is not known. (Such data are also referred to in the machine learning literature as "unknown" or "previously unseen" data.)
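For illustration, here is a minimal Python sketch of the two-step process. The choice of scikit-learn and the numeric rank encoding are assumptions, not part of the notes; the toy data simply mirror the tenure tables above. Step 1 builds a classifier from class-labeled training samples; step 2 estimates accuracy on an independent test set (the holdout method) and then classifies a previously unseen tuple.

# Sketch of the two-step classification process (assumes scikit-learn is installed).
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Step 1: model construction from class-labeled training samples (supervised learning).
# Rank encoding (illustrative): 0 = Assistant Prof, 1 = Associate Prof, 2 = Professor.
X_train = [[0, 3], [0, 7], [2, 2], [1, 7], [0, 6], [1, 3]]   # (rank, years)
y_train = ["no", "yes", "yes", "yes", "no", "no"]            # class label: tenured
model = DecisionTreeClassifier().fit(X_train, y_train)

# Step 2: estimate predictive accuracy on an independent, class-labeled test set
# (the holdout method), then classify future tuples whose class label is unknown.
X_test = [[0, 2], [1, 7], [2, 5], [0, 7]]
y_test = ["no", "no", "yes", "yes"]
print("holdout accuracy:", accuracy_score(y_test, model.predict(X_test)))
print("unseen tuple ->", model.predict([[2, 1]])[0])         # e.g. a new professor with 1 year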
For example, the classification rules learned in Figure 7.1(a) from the analysis of data from existing customers can be used to predict the credit rating of new or future (i.e., previously unseen) customers.

"How is prediction different from classification?" Prediction can be viewed as the construction and use of a model to assess the class of an unlabeled object, or to assess the value or value ranges of an attribute that a given object is likely to have. In this view, classification and regression are the two major types of prediction problems, where classification is used to predict discrete or nominal values, while regression is used to predict continuous or ordered values. In our view, however, we refer to the use of prediction to predict class labels as classification, and the use of prediction to predict continuous values (e.g., using regression techniques) as prediction. This view is commonly accepted in data mining.

Lecture 2
Issues regarding classification and prediction

Preparing the data for classification and prediction. The following preprocessing steps may be applied to the data in order to help improve the accuracy, efficiency, and scalability of the classification or prediction process.

1. Data cleaning. This refers to the preprocessing of data in order to remove or reduce noise (by applying smoothing techniques, for example) and the treatment of missing values (e.g., by replacing a missing value with the most commonly occurring value for that attribute, or with the most probable value based on statistics). Although most classification algorithms have some mechanisms for handling noisy or missing data, this step can help reduce confusion during learning.

2. Relevance analysis. Many of the attributes in the data may be irrelevant to the classification or prediction task. For example, data recording the day of the week on which a bank loan application was filed is unlikely to be relevant to the success of the application. Furthermore, other attributes may be redundant. Hence, relevance analysis may be performed on the data with the aim of removing any irrelevant or redundant attributes from the learning process. In machine learning, this step is known as feature selection. Including such attributes may otherwise slow down, and possibly mislead, the learning step. Ideally, the time spent on relevance analysis, when added to the time spent on learning from the resulting "reduced" feature subset, should be less than the time that would have been spent on learning from the original set of features. Hence, such analysis can help improve classification efficiency and scalability.

3. Data transformation. The data can be generalized to higher-level concepts. Concept hierarchies may be used for this purpose. This is particularly useful for continuous-valued attributes. For example, numeric values for the attribute income may be generalized to discrete ranges such as low, medium, and high. Similarly, nominal-valued attributes, like street, can be generalized to higher-level concepts, like city. Since generalization compresses the original training data, fewer input/output operations may be involved during learning. The data may also be normalized, particularly when neural networks or methods involving distance measurements are used in the learning step. Normalization involves scaling all values for a given attribute so that they fall within a small specified range, such as -1.0 to 1.0, or 0 to 1.0. In methods that use distance measurements, this prevents attributes with initially large ranges (such as income) from outweighing attributes with initially smaller ranges (such as binary attributes).
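For illustration, a minimal sketch of min-max normalization as described in step 3. The target range of 0 to 1 and the helper name are illustrative assumptions; the notes only state that values are rescaled to a small specified range.

# Rescale each value of an attribute from [old_min, old_max] to [new_min, new_max].
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    old_min, old_max = min(values), max(values)
    span = old_max - old_min            # assumes old_max > old_min
    return [new_min + (v - old_min) / span * (new_max - new_min) for v in values]

incomes = [12000, 35000, 58000, 98000]
print(min_max_normalize(incomes))       # [0.0, 0.267..., 0.534..., 1.0]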
Comparing classification methods. Classification and prediction methods can be compared and evaluated according to the following criteria:
1. Predictive accuracy. This refers to the ability of the model to correctly predict the class label of new or previously unseen data.
2. Speed. This refers to the computation costs involved in generating and using the model.
3. Robustness. This is the ability of the model to make correct predictions given noisy data or data with missing values.
4. Scalability. This refers to the ability of the learned model to perform efficiently on large amounts of data.
5. Interpretability. This refers to the level of understanding and insight that is provided by the learned model.

Lecture 3
Classification by Decision Tree Induction

A decision tree is a flow-chart-like tree structure, where each internal node denotes a test on an attribute, each branch represents an outcome of the test, and leaf nodes represent classes or class distributions. The topmost node in a tree is the root node. A typical decision tree is shown in Figure 7.2. It represents the concept buys_computer, that is, it predicts whether or not a customer at AllElectronics is likely to purchase a computer. Internal nodes are denoted by rectangles, and leaf nodes are denoted by ovals.

Algorithm for decision tree induction (a basic greedy algorithm):
1. The tree is constructed in a top-down recursive divide-and-conquer manner.
2. At the start, all the training examples are at the root.
3. Attributes are categorical (if continuous-valued, they are discretized in advance).
4. Examples are partitioned recursively based on selected attributes.
5. Test attributes are selected on the basis of a heuristic or statistical measure (e.g., information gain).

Conditions for stopping partitioning:
1. All samples for a given node belong to the same class.
2. There are no remaining attributes for further partitioning; majority voting is employed for classifying the leaf.
3. There are no samples left.

Attribute selection measures (a sketch computing both measures follows this section):

1. Information gain (ID3/C4.5)
- All attributes are assumed to be categorical.
- Can be modified for continuous-valued attributes.
- Select the attribute with the highest information gain.
- Assume there are two classes, P and N, and let the set of examples S contain p elements of class P and n elements of class N. The amount of information needed to decide if an arbitrary example in S belongs to P or N is defined as
  I(p, n) = -(p/(p+n)) log2(p/(p+n)) - (n/(p+n)) log2(n/(p+n))

2. Gini index (IBM Intelligent Miner)
- All attributes are assumed continuous-valued.
- Assume there exist several possible split values for each attribute.
- May need other tools, such as clustering, to get the possible split values.
- Can be modified for categorical attributes.
- If a data set T contains examples from n classes, the gini index gini(T) is defined as
  gini(T) = 1 - sum_j (p_j)^2
  where p_j is the relative frequency of class j in T.
- If a data set T is split into two subsets T1 and T2 with sizes N1 and N2 respectively (N = N1 + N2), the gini index of the split data is defined as
  gini_split(T) = (N1/N) gini(T1) + (N2/N) gini(T2)
- The attribute that provides the smallest gini_split(T) is chosen to split the node (all possible splitting points must be enumerated for each attribute).
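A minimal sketch of the two attribute selection measures defined above. It assumes two-class counts p and n for I(p, n) and a list of class counts for the gini index; the function names and example counts are illustrative.

from math import log2

def info(p, n):
    # I(p, n): expected information needed to classify an example drawn from a
    # set containing p examples of class P and n examples of class N.
    total = p + n
    result = 0.0
    for count in (p, n):
        if count:
            f = count / total
            result -= f * log2(f)
    return result

def gini(class_counts):
    # gini(T) = 1 - sum_j p_j^2, where p_j is the relative frequency of class j in T.
    total = sum(class_counts)
    return 1.0 - sum((c / total) ** 2 for c in class_counts)

print(info(9, 5))      # ~0.940 bits for a 9/5 class split
print(gini([9, 5]))    # ~0.459 for the same split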
Lecture 4
Bayesian Classification

"What are Bayesian classifiers?" Bayesian classifiers are statistical classifiers. They can predict class membership probabilities, such as the probability that a given sample belongs to a particular class. Bayesian classification is based on Bayes' theorem, described below. Studies comparing classification algorithms have found a simple Bayesian classifier, known as the naive Bayesian classifier, to be comparable in performance with decision tree and neural network classifiers. Bayesian classifiers have also exhibited high accuracy and speed when applied to large databases.

Naive Bayesian classifiers assume that the effect of an attribute value on a given class is independent of the values of the other attributes. This assumption is called class conditional independence. It is made to simplify the computations involved and, in this sense, is considered "naive". Bayesian belief networks are graphical models which, unlike naive Bayesian classifiers, allow the representation of dependencies among subsets of attributes. Bayesian belief networks can also be used for classification.

Bayes' theorem. Given training data D, the posterior probability of a hypothesis h, P(h|D), follows Bayes' theorem:
  P(h|D) = P(D|h) P(h) / P(D)
The MAP (maximum a posteriori) hypothesis is
  h_MAP = argmax_{h in H} P(h|D) = argmax_{h in H} P(D|h) P(h)
Practical difficulty: this requires initial knowledge of many probabilities and involves significant computational cost.

Naive Bayes classifier. Simplifying assumption: the attributes are conditionally independent given the class:
  P(x1, ..., xk | C) = P(x1|C) · ... · P(xk|C)
This greatly reduces the computation cost, since only the class distribution and per-attribute frequencies need to be counted. Given a training set, we can compute the probabilities; for the play-tennis example (classes P and N):

Outlook      P    N        Humidity   P    N
sunny        2/9  3/5      high       3/9  4/5
overcast     4/9  0        normal     6/9  1/5
rain         3/9  2/5

Temperature  P    N        Windy      P    N
hot          2/9  2/5      true       3/9  3/5
mild         4/9  2/5      false      6/9  2/5
cool         3/9  1/5

Estimating a-posteriori probabilities. P(C|X) is the probability that the sample tuple X = <x1, ..., xk> is of class C, e.g., P(class = N | outlook = sunny, windy = true, ...). The idea is to assign to sample X the class label C such that P(C|X) is maximal. By Bayes' theorem,
  P(C|X) = P(X|C) P(C) / P(X)
P(X) is constant for all classes, and P(C) is the relative frequency of class C samples, so we choose the class C such that P(X|C) P(C) is maximum. The problem is that computing P(X|C) directly is unfeasible.

Naive Bayesian classification. The naive assumption of attribute independence gives
  P(x1, ..., xk | C) = P(x1|C) · ... · P(xk|C)
If the i-th attribute is categorical, P(xi|C) is estimated as the relative frequency of samples having value xi as the i-th attribute in class C. If the i-th attribute is continuous, P(xi|C) is estimated through a Gaussian density function. The computation is easy in both cases.
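A minimal sketch of a naive Bayesian classifier for categorical attributes, using the relative-frequency estimates described above. The three-attribute toy data set is illustrative, and zero counts are not smoothed (a real implementation would typically add Laplace correction).

from collections import Counter, defaultdict

def train_naive_bayes(samples, labels):
    class_counts = Counter(labels)      # counts used for P(C)
    cond_counts = defaultdict(int)      # counts used for P(xi|C)
    for x, c in zip(samples, labels):
        for i, v in enumerate(x):
            cond_counts[(i, v, c)] += 1
    return class_counts, cond_counts

def classify(x, class_counts, cond_counts):
    # Pick the class C maximizing P(C) * product_i P(xi|C).
    total = sum(class_counts.values())
    best_class, best_score = None, -1.0
    for c, cc in class_counts.items():
        score = cc / total
        for i, v in enumerate(x):
            score *= cond_counts[(i, v, c)] / cc
        if score > best_score:
            best_class, best_score = c, score
    return best_class

# Attributes: (outlook, humidity, windy); classes: P = play, N = don't play.
X = [("sunny", "high", "false"), ("sunny", "high", "true"),
     ("overcast", "high", "false"), ("rain", "normal", "false"),
     ("rain", "normal", "true"), ("overcast", "normal", "true")]
y = ["N", "N", "P", "P", "N", "P"]
model = train_naive_bayes(X, y)
print(classify(("rain", "normal", "false"), *model))    # -> 'P'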
Lecture 5
Classification by Backpropagation

A multilayer feed-forward neural network. The backpropagation algorithm performs learning on a multilayer feed-forward neural network; an example of such a network is shown in the figure below. The inputs correspond to the attributes measured for each training sample. The inputs are fed simultaneously into a layer of units making up the input layer. The weighted outputs of these units are, in turn, fed simultaneously to a second layer of "neuron-like" units, known as a hidden layer. The hidden layer's weighted outputs can be input to another hidden layer, and so on. The number of hidden layers is arbitrary, although in practice usually only one is used. The weighted outputs of the last hidden layer are input to units making up the output layer, which emits the network's prediction for given samples. The units in the hidden layers and output layer are sometimes referred to as neurodes, due to their symbolic biological basis, or as output units. The multilayer neural network shown in Figure 7.9 has two layers of output units; therefore, we say that it is a two-layer neural network. Similarly, a network containing two hidden layers is called a three-layer neural network, and so on. The network is feed-forward in that none of the weights cycle back to an input unit or to an output unit of a previous layer. It is fully connected in that each unit provides input to each unit in the next forward layer. Multilayer feed-forward networks of linear threshold functions, given enough hidden units, can closely approximate any function.

Neural networks:
1. Advantages: prediction accuracy is generally high; robust, works when training examples contain errors; the output may be discrete, real-valued, or a vector of several discrete or real-valued attributes; fast evaluation of the learned target function.
2. Criticism: long training time; the learned function (weights) is difficult to understand; it is not easy to incorporate domain knowledge.

[Figure: a neuron computes the weighted sum of its input vector x and weight vector w, then applies an activation function to produce the output y.]
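A minimal sketch of the computation one such unit performs, and of a forward pass through a single hidden layer. The logistic activation and the tiny weight values are illustrative assumptions; backpropagation training itself (weight updates from errors) is not shown.

from math import exp

def sigmoid(s):
    # Logistic activation applied to the weighted sum.
    return 1.0 / (1.0 + exp(-s))

def unit_output(x, w, bias):
    # One neuron: weighted sum of input vector x with weight vector w, plus bias,
    # passed through the activation function.
    return sigmoid(sum(xi * wi for xi, wi in zip(x, w)) + bias)

def forward(x, hidden_weights, hidden_biases, output_weights, output_bias):
    # Feed-forward pass: inputs -> hidden layer -> output unit (fully connected).
    hidden = [unit_output(x, w, b) for w, b in zip(hidden_weights, hidden_biases)]
    return unit_output(hidden, output_weights, output_bias)

x = [0.2, 0.9]                                  # one training sample's attributes
hw = [[0.5, -0.3], [0.8, 0.1]]                  # weights of two hidden units
hb = [0.0, -0.2]
print(forward(x, hw, hb, [0.4, -0.6], 0.1))     # network's prediction for x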
Lecture 6
Classification Based on Concepts from Association Rule Mining

Several methods exist for association-based classification:
- ARCS: quantitative association mining and clustering of association rules (Lent et al., 1997). It beats C4.5 mainly in scalability, and also in accuracy.
- Associative classification (Liu et al., 1998): mines high-support, high-confidence rules of the form "cond_set => y", where y is a class label.
- CAEP, classification by aggregating emerging patterns (Dong et al., 1999): emerging patterns (EPs) are item sets whose support increases significantly from one class to another; EPs are mined based on minimum support and growth rate.

Lecture 7
Other Classification Methods

1. k-nearest neighbor classifier. All instances correspond to points in the n-dimensional space, and the nearest neighbors are defined in terms of Euclidean distance. The target function may be discrete- or real-valued. For discrete-valued targets, the k-NN classifier returns the most common value among the k training examples nearest to the query point xq. The Voronoi diagram is the decision surface induced by 1-NN for a typical set of training examples. (A small distance-based sketch follows this list.)

2. Case-based reasoning. This also uses lazy evaluation and analyzes similar instances. The difference is that the instances are not "points in a Euclidean space". An example is the water faucet problem in CADET (Sycara et al., 1992). Methodology: instances are represented by rich symbolic descriptions (e.g., function graphs); multiple retrieved cases may be combined; there is tight coupling between case retrieval, knowledge-based reasoning, and problem solving; indexing is based on a syntactic similarity measure and, on failure, backtracking and adapting to additional cases.

3. Genetic algorithms. GAs are based on an analogy to biological evolution. Each rule is represented by a string of bits, and an initial population is created consisting of randomly generated rules; e.g., "IF A1 and not A2 then C2" can be encoded as 100. Based on the notion of survival of the fittest, a new population is formed to consist of the fittest rules and their offspring. The fitness of a rule is represented by its classification accuracy on a set of training examples. Offspring are generated by crossover and mutation.

4. Rough set approach. Rough sets are used to approximately, or "roughly", define equivalence classes. A rough set for a given class C is approximated by two sets: a lower approximation (certain to be in C) and an upper approximation (cannot be described as not belonging to C). Finding the minimal subsets (reducts) of attributes for feature reduction is NP-hard, but a discernibility matrix can be used to reduce the computational intensity.

5. Fuzzy set approaches. Fuzzy logic uses truth values between 0.0 and 1.0 to represent the degree of membership (for example, using a fuzzy membership graph). Attribute values are converted to fuzzy values; e.g., income is mapped into the discrete categories {low, medium, high}, with fuzzy values calculated. For a given new sample, more than one fuzzy value may apply. Each applicable rule contributes a vote for membership in the categories, and typically the truth values for each predicted category are summed.
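A minimal sketch of the k-nearest neighbor classifier from item 1 above: Euclidean distance to the query point and a majority vote among the k nearest training examples. The toy points are illustrative; math.dist requires Python 3.8 or later.

from collections import Counter
from math import dist

def knn_classify(xq, training_points, labels, k=3):
    # Sort training examples by Euclidean distance to the query point xq and
    # return the most common class label among the k nearest.
    nearest = sorted(zip(training_points, labels), key=lambda pl: dist(pl[0], xq))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.8), (4.9, 5.1)]
y = ["A", "A", "A", "B", "B", "B"]
print(knn_classify((1.1, 0.9), X, y, k=3))   # -> 'A'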
Lecture 8
Prediction and Classifier Accuracy

The prediction of continuous values can be modeled by statistical techniques of regression. For example, we may like to develop a model to predict the salary of college graduates with 10 years of work experience, or the potential sales of a new product given its price. Many problems can be solved by linear regression, and even more can be tackled by applying transformations to the variables so that a nonlinear problem can be converted to a linear one.

Regression analysis and log-linear models in prediction:
- Linear regression: Y = α + βX. The two parameters, α and β, specify the line and are to be estimated from the data at hand, using the least squares criterion on the known values of Y1, Y2, ..., X1, X2, ....
- Multiple regression: Y = b0 + b1 X1 + b2 X2. Many nonlinear functions can be transformed into this form.
- Log-linear models: the multi-way table of joint probabilities is approximated by a product of lower-order tables, e.g.
  p(a, b, c, d) = α_ab · β_ac · γ_ad · δ_bcd

[Figures: example screenshots titled "Prediction: Numerical Data" and "Prediction: Categorical Data"; the chart details are not legible in this copy.]
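A minimal sketch of fitting the simple linear regression Y = α + βX by least squares, using the standard closed-form estimates; the toy experience/salary values are illustrative.

def fit_linear_regression(xs, ys):
    # Least-squares estimates of Y = alpha + beta * X:
    #   beta  = sum((x - x_mean)(y - y_mean)) / sum((x - x_mean)^2)
    #   alpha = y_mean - beta * x_mean
    n = len(xs)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    beta = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / \
           sum((x - x_mean) ** 2 for x in xs)
    alpha = y_mean - beta * x_mean
    return alpha, beta

years = [3, 8, 9, 13, 3, 6, 11, 21, 1, 16]           # years of work experience
salary = [30, 57, 64, 72, 36, 43, 59, 90, 20, 83]    # salary in thousands
alpha, beta = fit_linear_regression(years, salary)
print(f"Y = {alpha:.1f} + {beta:.1f} X")             # predicted salary for given years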
Estimating error rates:
1. Partition: training-and-testing. Use two independent data sets, e.g., a training set (2/3 of the data) and a test set (1/3); this is used for data sets with a large number of samples.
2. Cross-validation. Divide the data set into k subsamples; use k-1 subsamples as training data and one subsample as test data. This is k-fold cross-validation, used for data sets of moderate size.
3. Bootstrapping (leave-one-out), used for small data sets.
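A minimal sketch of k-fold cross-validation for estimating an error rate, as described in item 2 above. The train and error_rate callables are assumed placeholders for whichever learning scheme is being evaluated.

def k_fold_error(samples, labels, train, error_rate, k=10):
    # Split the data into k folds; train on the other k-1 folds, test on the
    # held-out fold, and average the k error rates.
    n = len(samples)
    fold_size = n // k
    errors = []
    for i in range(k):
        start = i * fold_size
        stop = (i + 1) * fold_size if i < k - 1 else n
        test_X, test_y = samples[start:stop], labels[start:stop]
        train_X = samples[:start] + samples[stop:]
        train_y = labels[:start] + labels[stop:]
        model = train(train_X, train_y)
        errors.append(error_rate(model, test_X, test_y))
    return sum(errors) / k

Setting k equal to the number of samples gives the leave-one-out variant mentioned above.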