Data Mining - Classification - Lecture04
Madava Viranjan
What is Classification?
• Classification is a form of data analysis that extracts models (classifiers)
describing important data classes
E.g.:
• A bank officer needs to classify loan applications as safe or not safe
• A computer shop owner wants to know whether a customer with a given
profile will buy a new computer
• A medical researcher wants to analyze breast cancer data to predict which
of three treatments a patient should receive
Classification vs. Numeric Prediction
• Classification predicts discrete (categorical) class labels.
• Numeric prediction models a continuous-valued function.
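The contrast can be illustrated with a minimal sketch. Both functions, their names, and the numeric thresholds below are hypothetical and chosen only for illustration; they are not taken from the lecture.

```python
def classify_loan(income, debt):
    """Classification: the output is one of a fixed set of labels."""
    # Hypothetical rule: the 20,000 margin is an arbitrary assumption.
    return "safe" if income - debt >= 20_000 else "not safe"


def predict_spend(income):
    """Numeric prediction: the output is a continuous value."""
    # Hypothetical linear function; the coefficients are assumptions.
    return 0.05 * income + 100.0


label = classify_loan(income=50_000, debt=10_000)   # a discrete class label
amount = predict_spend(income=40_000)               # a real number
```

The first function can only ever return one of two labels, while the second can return any real value.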
Steps in Classification
1. Learning Step
– Constructs the classification model from class-labeled training data
2. Classification Step
– The model is used to predict class labels for new data
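The two steps can be sketched with a deliberately simple model. This is only an illustration: the "model" here is a majority-vote lookup on one attribute, not a technique prescribed by the lecture, and the names `learn`, `classify`, and the toy loan data are assumptions.

```python
from collections import Counter, defaultdict


def learn(training_tuples, attribute):
    """Learning step: build a model from class-labeled training tuples."""
    by_value = defaultdict(Counter)
    overall = Counter()
    for features, label in training_tuples:
        by_value[features[attribute]][label] += 1
        overall[label] += 1
    # For each attribute value, remember the most frequent class.
    rules = {value: counts.most_common(1)[0][0]
             for value, counts in by_value.items()}
    default = overall.most_common(1)[0][0]  # fallback for unseen values
    return attribute, rules, default


def classify(model, features):
    """Classification step: use the model to predict a class label."""
    attribute, rules, default = model
    return rules.get(features[attribute], default)


# Hypothetical training data: (features, class label).
training = [
    ({"income": "high"}, "safe"),
    ({"income": "high"}, "safe"),
    ({"income": "low"}, "not safe"),
    ({"income": "low"}, "safe"),
    ({"income": "low"}, "not safe"),
]

model = learn(training, "income")             # step 1
prediction = classify(model, {"income": "low"})  # step 2
```

Keeping `learn` and `classify` separate mirrors the slide: the model is built once from labeled data and then reused on tuples whose labels are unknown.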
Decision Tree Induction
• Decision tree induction is the learning of
decision trees from class-labeled training
tuples
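How to Build a Decision Tree?
Inputs:
• D – data partition (set of training tuples and their class labels)
• attribute_list – the set of candidate attributes
• attribute_selection_method – procedure for choosing the attribute that
best splits the tuples into classes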
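A minimal sketch of how these three inputs could fit together in a recursive tree builder is given below. It assumes categorical attributes, represents D as a list of (row, label) pairs, and uses information gain as one possible attribute_selection_method; the function names, tree representation, and toy data are assumptions for illustration.

```python
import math
from collections import Counter


def entropy(D):
    """Entropy of the class labels in partition D."""
    counts = Counter(label for _, label in D)
    return -sum((n / len(D)) * math.log2(n / len(D)) for n in counts.values())


def split(D, attribute):
    """Group the tuples of D by their value for `attribute`."""
    parts = {}
    for row, label in D:
        parts.setdefault(row[attribute], []).append((row, label))
    return parts


def information_gain_selection(D, attribute_list):
    """One possible attribute_selection_method (an assumption here).

    Minimising the expected entropy after the split is equivalent to
    maximising information gain, because entropy(D) is constant.
    """
    def expected_info(attribute):
        return sum(len(p) / len(D) * entropy(p)
                   for p in split(D, attribute).values())
    return min(attribute_list, key=expected_info)


def generate_decision_tree(D, attribute_list, attribute_selection_method):
    labels = [label for _, label in D]
    majority = Counter(labels).most_common(1)[0][0]
    # Leaf: all tuples share one class, or no attributes remain.
    if len(set(labels)) == 1 or not attribute_list:
        return majority
    best = attribute_selection_method(D, attribute_list)
    remaining = [a for a in attribute_list if a != best]
    return {
        "attribute": best,
        "default": majority,  # used for attribute values unseen in training
        "branches": {
            value: generate_decision_tree(Dj, remaining,
                                          attribute_selection_method)
            for value, Dj in split(D, best).items()
        },
    }


def predict(tree, row):
    """Follow branches until a leaf (a class label) is reached."""
    while isinstance(tree, dict):
        tree = tree["branches"].get(row[tree["attribute"]], tree["default"])
    return tree


# Hypothetical training partition.
D = [
    ({"student": "yes", "income": "high"}, "buys"),
    ({"student": "yes", "income": "low"}, "buys"),
    ({"student": "no", "income": "high"}, "no"),
    ({"student": "no", "income": "low"}, "no"),
]
tree = generate_decision_tree(D, ["student", "income"],
                              information_gain_selection)
```

On this toy data the `student` attribute separates the classes perfectly, so it is chosen at the root and both branches become leaves.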
• Prepruning
– The tree is pruned by halting its construction early
• Postpruning
– Subtrees are removed after the full tree has been constructed
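Prepruning can be sketched as a halting test that a tree builder would call before each split. The criteria and threshold values below (`max_depth`, `min_tuples`) are assumptions; the lecture only states that construction is halted. Postpruning is not shown.

```python
from collections import Counter


def should_halt(labels, depth, max_depth=3, min_tuples=5):
    """Prepruning test: return True if the node should become a leaf."""
    return (
        depth >= max_depth           # tree is already deep enough
        or len(labels) < min_tuples  # too few tuples to justify a split
        or len(set(labels)) <= 1     # node is already pure
    )


def leaf_label(labels):
    """A halted node is labeled with its most frequent class."""
    return Counter(labels).most_common(1)[0][0]
```

For example, `should_halt(["a", "b"], depth=0)` is true because only two tuples reach the node, so the builder would create a leaf labeled by `leaf_label` instead of splitting further.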
Problems in Decision Trees
• Repetition and replication of tree branches
lead to large trees
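Evaluating Classifier Performance
• Accuracy
Percentage of tuples that are correctly classified
• Error Rate
Percentage of tuples that are misclassified (1 − accuracy)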
• Sensitivity
True positive (recognition) rate
• Specificity
True negative rate
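These measures, together with accuracy and error rate, can be computed from the four confusion-matrix counts. The sketch below assumes a two-class problem; the function name `evaluate` and the example counts are made up for illustration.

```python
def evaluate(tp, tn, fp, fn):
    """Compute evaluation measures from confusion-matrix counts.

    tp/tn: true positives/negatives, fp/fn: false positives/negatives.
    """
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,
        "error_rate": (fp + fn) / total,   # equals 1 - accuracy
        "sensitivity": tp / (tp + fn),     # true positive rate
        "specificity": tn / (tn + fp),     # true negative rate
    }


# Hypothetical counts: 100 positive and 100 negative test tuples.
metrics = evaluate(tp=90, tn=80, fp=20, fn=10)
```

With these counts, 170 of 200 tuples are classified correctly, 90 of the 100 positives are recognized, and 80 of the 100 negatives are recognized.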