Mod 3a

Chapter 8 of 'Data Mining 3Ed' discusses classification as a data analysis method that predicts categorical class labels, with applications in various fields such as fraud detection and medical diagnosis. It outlines a two-step process involving learning (training) and classification (testing), and highlights decision tree induction as a popular method for classification due to its intuitive representation and good accuracy. The chapter also notes that while decision trees are effective, not all data can be classified using this approach.

Uploaded by

M Nandini
Copyright
© All Rights Reserved

Data Mining 3Ed.

Han, Kamber & Pei


Chapter 8

MODULE-3
CLASSIFICATION
Classification
• A form of data analysis that extracts models describing important data classes.
• Classifiers predict categorical (discrete, unordered) class labels.
• Ex: a classification model that categorizes bank loan applications as either safe or risky.
• Applications: fraud detection, target marketing, performance prediction, manufacturing, and medical diagnosis.
• Classifiers and class labels
Loan application: safe / risky
Marketing: yes / no
Medical: treatment A / treatment B / treatment C
• Predictors
Predict continuous-valued functions or ordered values
Ex: a customer's spending capacity
Method: regression analysis
• Classification and numeric prediction are the two major types of prediction problems.
Classification – 2 step process

1. Learning (Training): a classification model is constructed (y = f(X)).
• The model is represented as classification rules, decision trees, or mathematical formulae.
2. Classification (Testing): the model is used to predict class labels.
• The predictive accuracy of the classifier is estimated first (the percentage of test set tuples that are correctly classified).
• The test set is kept independent of the training tuples. Accuracy measured on the training set would be optimistic, because the model may overfit that data.
Ex: bank loan classifier
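The two steps above can be sketched in a few lines of Python. This is a minimal illustration, not the book's method: the loan tuples, the single `income` attribute, and the helper names `train` and `accuracy` are all hypothetical, and the "model" is just a lookup table from attribute value to majority class.

```python
from collections import Counter, defaultdict

def train(tuples):
    """Step 1 (learning): build a model y = f(X) from labeled training tuples.

    The model here is a lookup table: attribute value -> majority class label.
    """
    counts = defaultdict(Counter)
    for x, y in tuples:
        counts[x][y] += 1
    # Fall back to the overall majority class for unseen attribute values.
    default = Counter(y for _, y in tuples).most_common(1)[0][0]
    table = {x: c.most_common(1)[0][0] for x, c in counts.items()}
    return lambda x: table.get(x, default)

def accuracy(model, test_tuples):
    """Step 2 (testing): fraction of test tuples that are correctly classified."""
    correct = sum(1 for x, y in test_tuples if model(x) == y)
    return correct / len(test_tuples)

# Hypothetical loan data: (income level, class label).
training_set = [
    ("low", "risky"), ("low", "risky"), ("low", "safe"),
    ("high", "safe"), ("high", "safe"),
    ("medium", "safe"), ("medium", "risky"), ("medium", "safe"),
]
# The test set is kept separate from the training set.
test_set = [("low", "risky"), ("high", "safe"), ("medium", "risky"), ("high", "safe")]

model = train(training_set)
test_accuracy = accuracy(model, test_set)
print(f"accuracy on the test set: {test_accuracy:.2f}")
```

Evaluating on `test_set` rather than `training_set` is the point of the sketch: the test tuples were never seen while the lookup table was built.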
Decision Tree Induction
• Internal node – a test on an attribute.
• Branch – an outcome of the test.
• Leaf node – a class label.
• Decision trees are easily converted to classification rules.
Ex: [Figure: decision tree for "Will a customer buy a computer?"; visible branch labels: excellent, fair]
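The conversion from a tree to classification rules can be sketched as follows: each path from the root to a leaf becomes one IF-THEN rule. The tree below is a hypothetical one for the "Will a customer buy a computer?" question. Only the branch labels excellent and fair come from the slide; the attributes `age`, `student`, and `credit_rating` and the leaf labels are assumptions made for illustration.

```python
def tree_to_rules(node, conditions=()):
    """Return one IF-THEN rule per root-to-leaf path of a nested-dict tree."""
    if not isinstance(node, dict):
        # Leaf node: the accumulated tests form the rule antecedent.
        antecedent = " AND ".join(f"{a} = {v}" for a, v in conditions) or "TRUE"
        return [f"IF {antecedent} THEN buys_computer = {node}"]
    rules = []
    attribute = node["attribute"]
    for outcome, child in node["branches"].items():
        # Each branch adds one (attribute, outcome) test to the path.
        rules.extend(tree_to_rules(child, conditions + ((attribute, outcome),)))
    return rules

# Hypothetical tree: internal nodes test an attribute, leaves hold a class.
tree = {
    "attribute": "age",
    "branches": {
        "youth": {"attribute": "student", "branches": {"no": "no", "yes": "yes"}},
        "middle_aged": "yes",
        "senior": {
            "attribute": "credit_rating",
            "branches": {"excellent": "no", "fair": "yes"},
        },
    },
}

rules = tree_to_rules(tree)
for rule in rules:
    print(rule)
```

A tree with five leaves yields five rules, and the rules are mutually exclusive because each tuple follows exactly one path.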
Advantages
• Does not require domain knowledge or parameter setting, and is therefore appropriate for exploratory knowledge discovery.
• Can handle multidimensional data.
• Representation of acquired knowledge in tree form is intuitive and generally easy for humans to understand.
• Learning and classification steps of decision tree induction are simple and fast.
• Good accuracy

(Note: not all data can be classified using decision trees)

• Applications: medicine, manufacturing and production, financial analysis, astronomy, and molecular biology.
Basic Algorithm
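Splitting criterion: three cases, depending on the splitting attribute
• Discrete-valued: one branch per known value of the attribute.
• Continuous-valued: two branches, A <= split_point and A > split_point.
• Discrete-valued with a binary tree required: two branches, testing whether A belongs to a splitting subset.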
Illustrate ID3 algorithm on the dataset
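ID3 chooses the splitting attribute with the highest information gain, i.e. the largest reduction in entropy of the class labels. The slide's dataset is not reproduced here, so the sketch below uses a small hypothetical table (`rows`) with made-up attributes `student` and `credit_rating`; it shows only the attribute-selection step, not the full recursive tree construction.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(rows, attribute, target):
    """Entropy of the target minus the weighted entropy after splitting on attribute."""
    labels = [row[target] for row in rows]
    partitions = {}
    for row in rows:
        partitions.setdefault(row[attribute], []).append(row[target])
    remainder = sum(len(p) / len(rows) * entropy(p) for p in partitions.values())
    return entropy(labels) - remainder

def best_attribute(rows, attributes, target):
    """ID3 selection step: pick the attribute with the highest information gain."""
    return max(attributes, key=lambda a: information_gain(rows, a, target))

# Hypothetical training tuples, not the dataset from the slides.
rows = [
    {"student": "yes", "credit_rating": "fair", "buys": "yes"},
    {"student": "yes", "credit_rating": "excellent", "buys": "yes"},
    {"student": "no", "credit_rating": "fair", "buys": "no"},
    {"student": "no", "credit_rating": "excellent", "buys": "no"},
]

chosen = best_attribute(rows, ["student", "credit_rating"], "buys")
print("split on:", chosen)
```

In this toy table `student` separates the classes perfectly, so its partitions have zero entropy and it is selected; `credit_rating` leaves both partitions evenly mixed and gains nothing. A full ID3 run would repeat the selection on each partition until the partitions are pure or no attributes remain.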
