Unit-III Classification
3.1 Classification
Classification and prediction are two forms of data analysis that can be used to extract models
describing important data classes or to predict future data trends.
For example, we can build a classification model to categorize bank loan applications as either
safe or risky, or a prediction model to predict the expenditures of potential customers on
computer equipment given their income and occupation.
Regression analysis is a statistical methodology that is most often used for numeric
prediction.
Many classification and prediction methods have been proposed by researchers in machine
learning, pattern recognition, and statistics.
Most algorithms are memory resident, typically assuming a small data size. Recent data
mining research has built on such work, developing scalable classification and prediction
techniques capable of handling large disk-resident data.
The following preprocessing steps may be applied to the data to help improve the accuracy,
efficiency, and scalability of the classification or prediction process.
(i) Data cleaning:
This refers to the preprocessing of data in order to remove or reduce noise (by applying smoothing techniques) and to treat missing values (e.g., by replacing a missing value with the most commonly occurring value for that attribute, or with the most probable value based on statistics).
Although most classification algorithms have some mechanism for handling noisy or missing data, this step can help reduce confusion during learning.
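As a small illustration of this step, the sketch below fills missing values with the most commonly occurring value of the attribute; the use of pandas, the toy DataFrame, and the column names are assumptions made only for the example.

```python
import pandas as pd

# Hypothetical toy data: the 'occupation' attribute has a missing value.
df = pd.DataFrame({
    "income":     [30000, 45000, 52000, 61000, 28000],
    "occupation": ["clerk", "engineer", None, "engineer", "clerk"],
})

# Replace each missing value with the most commonly occurring value
# (the mode) of that attribute.
most_common = df["occupation"].mode()[0]
df["occupation"] = df["occupation"].fillna(most_common)
print(df)
```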
(ii) Relevance analysis:
Many of the attributes in the data may be redundant.
Correlation analysis can be used to identify whether any two given attributes are statistically related.
For example, a strong correlation between attributes A1 and A2 would suggest that one of the two could be removed from further analysis.
A database may also contain irrelevant attributes. Attribute subset selection can be used in these cases to find a reduced set of attributes such that the resulting probability distribution of the data classes is as close as possible to the original distribution obtained using all attributes.
Hence, relevance analysis, in the form of correlation analysis and attribute subset selection, can be used to detect attributes that do not contribute to the classification or prediction task.
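To make the idea concrete, the sketch below computes pairwise correlations and flags strongly correlated attribute pairs as candidates for removal; the use of pandas, the attribute names, and the 0.95 threshold are illustrative assumptions.

```python
import pandas as pd

# Hypothetical numeric attributes; A1 and A2 are nearly redundant.
df = pd.DataFrame({
    "A1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "A2": [2.1, 3.9, 6.2, 8.1, 9.8],   # roughly 2 * A1
    "A3": [5.0, 1.0, 4.0, 2.0, 3.0],
})

# Pairwise (Pearson) correlation matrix.
corr = df.corr()

# Flag attribute pairs whose absolute correlation exceeds a threshold;
# one attribute of each such pair is a candidate for removal.
threshold = 0.95
for i, a in enumerate(df.columns):
    for b in df.columns[i + 1:]:
        if abs(corr.loc[a, b]) > threshold:
            print(f"{a} and {b} are strongly correlated "
                  f"(r = {corr.loc[a, b]:.2f}); consider dropping one.")
```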
(iii) Data transformation and reduction:
Normalization involves scaling all values for a given attribute so that they fall within a small specified range, such as -1 to +1 or 0 to 1.
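A minimal sketch of such min-max normalization, assuming plain Python lists of numeric attribute values (the function name and the sample incomes are hypothetical):

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Scale numeric values into [new_min, new_max] (assumes not all values are equal)."""
    old_min, old_max = min(values), max(values)
    span = old_max - old_min
    return [new_min + (v - old_min) / span * (new_max - new_min)
            for v in values]

incomes = [28000, 30000, 45000, 52000, 61000]
print(min_max_normalize(incomes))          # values in [0, 1]
print(min_max_normalize(incomes, -1, 1))   # values in [-1, +1]
```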
Data can also be reduced by applying many other methods, ranging from wavelet transformation and principal components analysis to discretization techniques, such as binning, histogram analysis, and clustering.
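As one example of such a discretization technique, the sketch below performs simple equal-width binning; the function name, the number of bins, and the sample ages are illustrative assumptions.

```python
def equal_width_bins(values, k):
    """Discretize numeric values into k equal-width bins labeled 0..k-1."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    # The maximum value is placed in the last bin.
    return [min(int((v - lo) // width), k - 1) for v in values]

ages = [23, 25, 31, 38, 42, 47, 55, 63, 70]
print(equal_width_bins(ages, 3))   # [0, 0, 0, 0, 1, 1, 2, 2, 2]
```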
3.1.2 Comparing Classification and Prediction Methods:
Accuracy:
The accuracy of a classifier refers to the ability of a given classifier to correctly predict the class label of new or previously unseen data (i.e., tuples without class label information).
The accuracy of a predictor refers to how well a given predictor can guess the value of the
predicted attribute for new or previously unseen data.
Speed:
This refers to the computational costs involved in generating and using the given classifier or predictor.
Robustness:
This is the ability of the classifier or predictor to make correct predictions given noisy data or data with missing values.
Scalability:
This refers to the ability to construct the classifier or predictor efficiently given large amounts of data.
Interpretability:
This refers to the level of understanding and insight that is provided by the classifier or predictor.
3.2 Decision Tree Induction:
The construction of decision tree classifiers does not require any domain knowledge or parameter setting, and is therefore appropriate for exploratory knowledge discovery.
The learning and classification steps of decision tree induction are simple and fast.
Decision tree induction algorithms have been used for classification in many application areas, such as medicine, manufacturing and production, financial analysis, astronomy, and molecular biology.
Attribute list is the set of candidate attributes describing the tuples in the data partition D.
Attribute selection method specifies a heuristic procedure for selecting the attribute that "best" discriminates the given tuples according to class.
If the tuples in D are all of the same class, then node N becomes a leaf and is labeled with that class.
All of the terminating conditions are explained at the end of the algorithm.
Otherwise, the algorithm calls Attribute selection method to determine the splitting
criterion.
The splitting criterion tells us which attribute to test at node N by determining the "best" way to separate or partition the tuples in D into individual classes.
There are three possible scenarios. Let A be the splitting attribute. A has v distinct values, {a1, a2, ..., av}, based on the training data.
1 A is discrete-valued:
In this case, the outcomes of the test at node N correspond directly to the known values of A.
A branch is created for each known value, aj, of A and labeled with that value.
2 A is continuous-valued:
In this case, the test at node N has two possible outcomes, corresponding to the conditions A <= split point and A > split point, respectively, where split point is the split-point returned by Attribute selection method as part of the splitting criterion.
3 A is discrete-valued and a binary tree must be produced:
The test at node N is of the form "A ∈ SA?". SA is the splitting subset for A, returned by Attribute selection method as part of the splitting criterion. It is a subset of the known values of A.
Figure: the three partitioning scenarios: (a) A is discrete-valued; (b) A is continuous-valued; (c) A is discrete-valued and a binary tree must be produced.
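The following sketch illustrates decision tree induction in practice using scikit-learn (a library choice assumed here, not prescribed by these notes); the continuous attributes are split with binary "<= split point" tests, as in scenario 2, and the toy tuples and attribute names are made up for the example.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical training tuples described by two continuous attributes.
X = [[25, 30000], [35, 45000], [45, 52000],
     [52, 61000], [23, 28000], [40, 70000]]   # [age, income]
y = ["no", "yes", "yes", "yes", "no", "yes"]  # class: buys_computer

# Attribute selection by information gain (entropy), with a small depth limit.
tree = DecisionTreeClassifier(criterion="entropy", max_depth=2)
tree.fit(X, y)

# Each internal node tests one attribute against a split point.
print(export_text(tree, feature_names=["age", "income"]))
print(tree.predict([[30, 40000]]))  # classify a previously unseen tuple
```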
3.3 Bayesian Classification:
Bayesian classifiers are statistical classifiers.
They can predict class membership probabilities, such as the probability that a given tuple belongs to a particular class.
Let H be some hypothesis, such as that the data tuple X belongs to a specified class C.
For classification problems, we want to determine P(H|X), the probability that the hypothesis H holds given the "evidence" or observed data tuple X.
Bayes' theorem is useful in that it provides a way of calculating the posterior probability, P(H|X), from P(H), P(X|H), and P(X): P(H|X) = P(X|H) P(H) / P(X).
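As a worked illustration of this calculation (the probabilities below are made-up numbers, not taken from any data set):

```python
# Hypothetical values for a hypothesis H ("tuple X belongs to class C").
P_H = 0.4          # prior probability P(H)
P_X_given_H = 0.3  # likelihood P(X|H)
P_X = 0.2          # evidence P(X)

# Bayes' theorem: P(H|X) = P(X|H) * P(H) / P(X)
P_H_given_X = P_X_given_H * P_H / P_X
print(P_H_given_X)   # 0.6
```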
The naïve Bayesian classifier, or simple Bayesian classifier, works as follows:
1. Let D be a training set of tuples and their associated class labels. As usual, each tuple is represented by an n-dimensional attribute vector, X = (x1, x2, ..., xn), depicting n measurements made on the tuple from n attributes, respectively, A1, A2, ..., An.
2. Suppose that there are m classes, C1, C2, ..., Cm. Given a tuple, X, the classifier will predict that X belongs to the class having the highest posterior probability, conditioned on X. That is, the naïve Bayesian classifier predicts that tuple X belongs to the class Ci if and only if
P(Ci|X) > P(Cj|X) for 1 <= j <= m, j != i.
Thus we maximize P(Ci|X). The class Ci for which P(Ci|X) is maximized is called the maximum posteriori hypothesis. By Bayes' theorem,
P(Ci|X) = P(X|Ci) P(Ci) / P(X).
3. As P(X) is constant for all classes, only P(X|Ci)P(Ci) need be maximized. If the class prior
probabilities are not known, then it is commonly assumed that the classes are equally likely,
that is, P(C1) = P(C2) = …= P(Cm), and we would therefore maximize P(X|Ci). Otherwise,
we maximize P(X|Ci)P(Ci).
4. To compute P(X|Ci), the naïve assumption of class-conditional independence is made, so that P(X|Ci) = P(x1|Ci) × P(x2|Ci) × ... × P(xn|Ci), where each P(xk|Ci) is estimated from the training tuples of class Ci. If Ak is continuous-valued, then we need to do a bit more work, but the calculation is pretty straightforward: Ak is typically assumed to follow a Gaussian (normal) distribution whose mean and standard deviation are estimated from the tuples of class Ci (see the sketch after this list).
5. In order to predict the class label of X, P(X|Ci)P(Ci) is evaluated for each class Ci. The classifier predicts that the class label of tuple X is the class Ci if and only if
P(X|Ci)P(Ci) > P(X|Cj)P(Cj) for 1 <= j <= m, j != i.
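The sketch below puts steps 1 to 5 together for a toy data set; the attribute names, the tuples, and the Gaussian model for the continuous attribute are assumptions made only for illustration, and no Laplacian correction is applied to zero counts.

```python
import math
from collections import Counter, defaultdict

# Hypothetical training tuples: (age_group, income); class label: buys_computer.
train = [
    (("youth",  30000), "no"),
    (("youth",  45000), "no"),
    (("middle", 52000), "yes"),
    (("senior", 40000), "yes"),
    (("senior", 61000), "yes"),
    (("middle", 48000), "yes"),
]

labels = [label for _, label in train]
priors = {c: n / len(train) for c, n in Counter(labels).items()}   # P(Ci)

# Counts for the categorical attribute (index 0), used for P(xk|Ci).
cat_counts = defaultdict(Counter)
for (age_group, _), label in train:
    cat_counts[label][age_group] += 1

def gaussian(x, mean, std):
    """P(x|Ci) under a Gaussian model of the continuous attribute."""
    return math.exp(-((x - mean) ** 2) / (2 * std ** 2)) / (math.sqrt(2 * math.pi) * std)

# Per-class mean and standard deviation of the continuous attribute (index 1).
stats = {}
for c in priors:
    vals = [income for (_, income), label in train if label == c]
    mean = sum(vals) / len(vals)
    std = math.sqrt(sum((v - mean) ** 2 for v in vals) / len(vals))
    stats[c] = (mean, std)

def predict(age_group, income):
    """Return the class Ci maximizing P(X|Ci) * P(Ci)."""
    best, best_score = None, -1.0
    for c, prior in priors.items():
        n_c = sum(1 for l in labels if l == c)
        p_cat = cat_counts[c][age_group] / n_c   # P(age_group | Ci)
        p_num = gaussian(income, *stats[c])      # P(income | Ci)
        score = p_cat * p_num * prior
        if score > best_score:
            best, best_score = c, score
    return best

print(predict("youth", 35000))   # classify a previously unseen tuple
```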
Classifier Accuracy:
The accuracy of a classifier on a given test set is the percentage of test set tuples that are
correctly classified by the classifier.
In the pattern recognition literature, this is also referred to as the overall recognition rate of
the classifier, that is, it reflects how well the classifier recognizes tuples of the various
classes.
True positives are the positive tuples that were correctly labeled by the classifier, and true negatives are the negative tuples that were correctly labeled.
False positives are the negative tuples that were incorrectly labeled as positive, and false negatives are the positive tuples that were incorrectly labeled as negative.
To assess how well the classifier can recognize the tuples of each class, the sensitivity (true positive rate) and specificity (true negative rate) measures can be used.
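A short sketch of these measures (the helper name, the choice of positive label, and the toy label lists are assumptions for illustration):

```python
def evaluate(y_true, y_pred, positive="yes"):
    """Accuracy, sensitivity, and specificity from true vs. predicted labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    accuracy    = (tp + tn) / len(y_true)
    sensitivity = tp / (tp + fn)       # true positive (recognition) rate
    specificity = tn / (tn + fp)       # true negative (recognition) rate
    return accuracy, sensitivity, specificity

# Hypothetical test-set labels and classifier predictions.
y_true = ["yes", "yes", "no", "no", "yes", "no"]
y_pred = ["yes", "no",  "no", "yes", "yes", "no"]
print(evaluate(y_true, y_pred))   # (0.666..., 0.666..., 0.666...)
```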