Data Mining - Classification - Lecture04

Data Mining: Classification

Madava Viranjan
What is Classification?
• Classification is a form of data analysis that extracts models describing important
data classes.
E.g.:
• A bank officer needs to classify loan applications as safe or not safe.
• A computer shop owner wants to know whether a customer with a given profile will buy a
new computer.
• A medical researcher wants to analyze breast cancer data to predict which of three
treatments a patient should receive.
Classification Vs. Numeric Prediction
• Classification predicts discrete (categorical) class labels.
• Numeric prediction models a continuous-valued function.
Steps in Classification
1. Learning Step
– Constructs the classification model from class-labeled training tuples
2. Classification Step
– The model is used to predict class labels for new tuples
Decision Tree Induction
• Decision tree induction is the learning of
decision trees from class-labeled training
tuples
How to build up a Decision Tree?
Inputs:
– D: data partition (set of training tuples)
– attribute_list
– attribute_selection_method

Step 1: The tree starts with a single node N.
Step 2: If all the tuples in D are in the same class, then N is a leaf labeled with that class.
Step 3: Otherwise, attribute_selection_method determines the splitting criterion.
Step 4: The tuples in D are partitioned according to the splitting criterion. The splitting attribute may be:
– Discrete-valued
– Continuous-valued
– Discrete-valued, with a binary tree required
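The steps above can be sketched in Python as follows. This is a minimal sketch, not the lecture's own code: it assumes information gain as the attribute_selection_method (the slides do not name one), handles only discrete-valued attributes with one branch per value, and borrows the buys_computer training tuples that appear later in this lecture. The names build_tree, info_gain, entropy, and partition are illustrative.

```python
import math
from collections import Counter

ATTRIBUTES = ["age", "income", "student", "credit_rating"]

# Training tuples: (age, income, student, credit_rating, buys_computer)
D = [
    ("Youth", "High", "No", "Fair", "No"),
    ("Youth", "High", "No", "Excellent", "No"),
    ("Middle", "High", "No", "Fair", "Yes"),
    ("Senior", "Medium", "No", "Fair", "Yes"),
    ("Senior", "Low", "Yes", "Fair", "Yes"),
    ("Senior", "Low", "Yes", "Excellent", "No"),
    ("Middle", "Low", "Yes", "Excellent", "Yes"),
    ("Youth", "Medium", "No", "Fair", "No"),
    ("Youth", "Low", "Yes", "Fair", "Yes"),
    ("Senior", "Medium", "Yes", "Fair", "Yes"),
    ("Youth", "Medium", "Yes", "Excellent", "Yes"),
    ("Middle", "Medium", "No", "Excellent", "Yes"),
    ("Middle", "High", "Yes", "Fair", "Yes"),
    ("Senior", "Medium", "No", "Excellent", "No"),
]


def entropy(rows):
    """Expected information (in bits) needed to classify a tuple in rows."""
    counts = Counter(row[-1] for row in rows)
    return -sum(c / len(rows) * math.log2(c / len(rows)) for c in counts.values())


def partition(rows, idx):
    """Group rows by their value for the attribute at position idx."""
    groups = {}
    for row in rows:
        groups.setdefault(row[idx], []).append(row)
    return groups


def info_gain(rows, idx):
    """Assumed attribute_selection_method: entropy reduction after the split."""
    groups = partition(rows, idx).values()
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups)
    return entropy(rows) - remainder


def build_tree(rows, attribute_idxs):
    labels = [row[-1] for row in rows]
    # Steps 1-2: create a node; if all tuples share one class, it is a leaf.
    if len(set(labels)) == 1:
        return labels[0]
    # No attributes left to split on: fall back to the majority class.
    if not attribute_idxs:
        return Counter(labels).most_common(1)[0][0]
    # Step 3: choose the splitting attribute.
    best = max(attribute_idxs, key=lambda i: info_gain(rows, i))
    remaining = [i for i in attribute_idxs if i != best]
    # Step 4: partition the tuples and grow one subtree per attribute value.
    return {
        "attribute": ATTRIBUTES[best],
        "branches": {
            value: build_tree(group, remaining)
            for value, group in partition(rows, best).items()
        },
    }


tree = build_tree(D, list(range(len(ATTRIBUTES))))
print(tree)
```

On these tuples the sketch splits on age at the root, because age has the highest information gain of the four attributes.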
Tree Pruning
• Removes branches that reflect anomalies in the training data.

• Prepruning
– The tree is pruned by halting its construction early

• Postpruning
– Branches are pruned after the full tree has been constructed
Problems in Decision Trees
• Repetition and replication of tree branches lead to large trees

• The entire training data set must be loaded into memory

Bayes Classification
• Statistical classification
• Computes the probability that a given tuple
belongs to a particular class
• Based on Bayes' theorem
Naïve Bayesian Classification
• Simple Bayesian classifier
• Predicts that tuple X belongs to class Ci if
and only if
P(Ci|X) > P(Cj|X) for 1 ≤ j ≤ m, j ≠ i

• So how do we maximize P(Ci|X)?
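One way to answer this is through Bayes' theorem combined with the naive assumption that attribute values are conditionally independent given the class. The derivation below is a sketch in standard notation; the symbols x_k (the value of the k-th attribute of X) and n (the number of attributes) are introduced here and do not appear on the slides.

```latex
\begin{align}
P(C_i \mid X) &= \frac{P(X \mid C_i)\, P(C_i)}{P(X)} \\
P(X \mid C_i) &= \prod_{k=1}^{n} P(x_k \mid C_i) \\
\text{predicted class} &= \arg\max_{C_i} \; P(C_i) \prod_{k=1}^{n} P(x_k \mid C_i)
\end{align}
```

Because P(X) is the same for every class, only the numerator P(X|Ci) P(Ci) needs to be maximized.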


RID  Age     Income  Student  Credit_rating  Class: buys_computer
1    Youth   High    No       Fair           No
2    Youth   High    No       Excellent      No
3    Middle  High    No       Fair           Yes
4    Senior  Medium  No       Fair           Yes
5    Senior  Low     Yes      Fair           Yes
6    Senior  Low     Yes      Excellent      No
7    Middle  Low     Yes      Excellent      Yes
8    Youth   Medium  No       Fair           No
9    Youth   Low     Yes      Fair           Yes
10   Senior  Medium  Yes      Fair           Yes
11   Youth   Medium  Yes      Excellent      Yes
12   Middle  Medium  No       Excellent      Yes
13   Middle  High    Yes      Fair           Yes
14   Senior  Medium  No       Excellent      No
Tuples to classify:
X = (Age = Youth, Income = Medium, Student = Yes, Credit_rating = Fair)
X = (Age = Senior, Income = High, Student = No, Credit_rating = Excellent)

What if one probability becomes 0?
Use the Laplacian correction to avoid zero probabilities.
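The two tuples can be scored against the 14 training tuples with a short Python sketch. It is illustrative only: the function name class_scores and the alpha parameter are assumptions, and the Laplacian correction is implemented in one common form (add alpha to each count and alpha times the number of distinct attribute values to the denominator), which the slides do not spell out.

```python
from collections import Counter

# Training tuples: (age, income, student, credit_rating, buys_computer)
ROWS = [
    ("Youth", "High", "No", "Fair", "No"),
    ("Youth", "High", "No", "Excellent", "No"),
    ("Middle", "High", "No", "Fair", "Yes"),
    ("Senior", "Medium", "No", "Fair", "Yes"),
    ("Senior", "Low", "Yes", "Fair", "Yes"),
    ("Senior", "Low", "Yes", "Excellent", "No"),
    ("Middle", "Low", "Yes", "Excellent", "Yes"),
    ("Youth", "Medium", "No", "Fair", "No"),
    ("Youth", "Low", "Yes", "Fair", "Yes"),
    ("Senior", "Medium", "Yes", "Fair", "Yes"),
    ("Youth", "Medium", "Yes", "Excellent", "Yes"),
    ("Middle", "Medium", "No", "Excellent", "Yes"),
    ("Middle", "High", "Yes", "Fair", "Yes"),
    ("Senior", "Medium", "No", "Excellent", "No"),
]


def class_scores(x, alpha=0):
    """Return {class: P(X|C) * P(C)} for tuple x.

    alpha=0 uses raw relative frequencies; alpha=1 applies the assumed
    form of the Laplacian correction to the conditional probabilities.
    """
    class_counts = Counter(row[-1] for row in ROWS)
    scores = {}
    for label, n_label in class_counts.items():
        score = n_label / len(ROWS)  # prior P(C)
        for k, value in enumerate(x):
            matches = sum(1 for r in ROWS if r[-1] == label and r[k] == value)
            n_values = len({r[k] for r in ROWS})  # distinct values of attribute k
            score *= (matches + alpha) / (n_label + alpha * n_values)
        scores[label] = score
    return scores


x1 = ("Youth", "Medium", "Yes", "Fair")
x2 = ("Senior", "High", "No", "Excellent")
for x in (x1, x2):
    scores = class_scores(x)
    print(x, {c: round(s, 4) for c, s in scores.items()},
          "->", max(scores, key=scores.get))

# A tuple with age = Middle never occurs with class No, so one factor is 0.
x3 = ("Middle", "Medium", "Yes", "Fair")
print(class_scores(x3)["No"])           # zero without the correction
print(class_scores(x3, alpha=1)["No"])  # small but non-zero with it
```

Under these assumptions the first tuple is assigned buys_computer = Yes (about 0.028 against 0.007) and the second buys_computer = No (about 0.005 against 0.027).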
Text                              Tag
“A great game”                    Sports
“The election was over”           Not sports
“Very clean match”                Sports
“A clean but forgettable game”    Sports
“It was a close election”         Not sports

“A very close game”               Sports or Not sports?
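The question can be worked through with a naive Bayes sketch over word counts. The details below are assumptions made for illustration, not taken from the slides: text is lowercased and split on whitespace, each word is treated as an independent feature, and add-one (Laplacian) smoothing is applied over the training vocabulary so that the unseen pair (close, Sports) does not zero out the product. The names tokenize and tag_scores are hypothetical.

```python
from collections import Counter

TRAIN = [
    ("A great game", "Sports"),
    ("The election was over", "Not sports"),
    ("Very clean match", "Sports"),
    ("A clean but forgettable game", "Sports"),
    ("It was a close election", "Not sports"),
]


def tokenize(text):
    """Assumed tokenization: lowercase, split on whitespace."""
    return text.lower().split()


def tag_scores(text, alpha=1):
    """Return {tag: P(tag) * product of smoothed P(word | tag)}."""
    vocab = {w for sentence, _ in TRAIN for w in tokenize(sentence)}
    tag_counts = Counter(tag for _, tag in TRAIN)
    scores = {}
    for tag, n_docs in tag_counts.items():
        # Word frequencies across all training sentences with this tag.
        words = Counter(
            w for sentence, t in TRAIN if t == tag for w in tokenize(sentence)
        )
        total = sum(words.values())
        score = n_docs / len(TRAIN)  # prior P(tag)
        for w in tokenize(text):
            score *= (words[w] + alpha) / (total + alpha * len(vocab))
        scores[tag] = score
    return scores


scores = tag_scores("A very close game")
print(scores)
print("Predicted tag:", max(scores, key=scores.get))
```

With these assumptions the Sports score is roughly five times the Not sports score, so the sentence is tagged Sports.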


Rule Based Classification

• Use IF-THEN rules for classification

R1: IF age = youth AND student = yes THEN buys_computer = yes

• Coverage and Accuracy

• What are the coverage and accuracy of R1?
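Coverage is the fraction of all tuples that satisfy the rule's IF part, and accuracy is the fraction of those covered tuples that the rule labels correctly. The sketch below evaluates both, assuming R1 refers to the IF-THEN rule shown above and that it is evaluated on the 14 buys_computer tuples from the Naïve Bayesian Classification table. The function name r1_covers is illustrative.

```python
# Training tuples: (age, income, student, credit_rating, buys_computer)
ROWS = [
    ("Youth", "High", "No", "Fair", "No"),
    ("Youth", "High", "No", "Excellent", "No"),
    ("Middle", "High", "No", "Fair", "Yes"),
    ("Senior", "Medium", "No", "Fair", "Yes"),
    ("Senior", "Low", "Yes", "Fair", "Yes"),
    ("Senior", "Low", "Yes", "Excellent", "No"),
    ("Middle", "Low", "Yes", "Excellent", "Yes"),
    ("Youth", "Medium", "No", "Fair", "No"),
    ("Youth", "Low", "Yes", "Fair", "Yes"),
    ("Senior", "Medium", "Yes", "Fair", "Yes"),
    ("Youth", "Medium", "Yes", "Excellent", "Yes"),
    ("Middle", "Medium", "No", "Excellent", "Yes"),
    ("Middle", "High", "Yes", "Fair", "Yes"),
    ("Senior", "Medium", "No", "Excellent", "No"),
]


def r1_covers(row):
    """IF part of R1: age = youth AND student = yes."""
    age, _income, student, _credit, _label = row
    return age == "Youth" and student == "Yes"


covered = [row for row in ROWS if r1_covers(row)]
correct = [row for row in covered if row[-1] == "Yes"]  # THEN part of R1

coverage = len(covered) / len(ROWS)
accuracy = len(correct) / len(covered)

print(f"coverage = {len(covered)}/{len(ROWS)} = {coverage:.4f}")
print(f"accuracy = {len(correct)}/{len(covered)} = {accuracy:.4f}")
```

On this data R1 covers 2 of the 14 tuples (about 14.29%) and classifies both correctly (100% accuracy).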


Evaluating Classifier Performance
• True Positive (TP)
– Positive tuples that were correctly labeled by the classifier

• True Negative (TN)
– Negative tuples that were correctly labeled by the classifier

• False Positive (FP)
– Negative tuples that were incorrectly labeled as positive by the classifier

• False Negative (FN)
– Positive tuples that were incorrectly labeled as negative by the classifier
• Accuracy

• Error Rate
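In terms of the four counts defined earlier, the two measures can be written as below. The symbols P and N (the total numbers of positive and negative tuples, so P = TP + FN and N = FP + TN) are introduced here for brevity and do not appear on the slides.

```latex
\begin{align}
\text{accuracy}   &= \frac{TP + TN}{P + N} \\
\text{error rate} &= \frac{FP + FN}{P + N} = 1 - \text{accuracy}
\end{align}
```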
Classes               Buys_computer = Yes   Buys_computer = No   Total
Buys_computer = Yes   6954                  46                   7000
Buys_computer = No    412                   2588                 3000
Total                 7366                  2634                 10000
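Accuracy and error rate for this table can be checked with a few lines of Python. The sketch assumes that rows are actual classes and columns are predicted classes, which the slides do not state; the two measures computed here come out the same under either reading, because they depend only on the diagonal and off-diagonal totals.

```python
# Confusion matrix counts, assuming rows = actual class, columns = predicted.
tp, fn = 6954, 46    # actual buys_computer = Yes
fp, tn = 412, 2588   # actual buys_computer = No

total = tp + fn + fp + tn
accuracy = (tp + tn) / total      # fraction of tuples labeled correctly
error_rate = (fp + fn) / total    # fraction of tuples labeled incorrectly

print(f"total      = {total}")
print(f"accuracy   = {accuracy:.4f}")
print(f"error rate = {error_rate:.4f}")
```

This gives an accuracy of 95.42% and an error rate of 4.58%.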
• Sensitivity
– True positive (recognition) rate

• Specificity
– True negative rate
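Written as formulas, with P and N denoting the total numbers of positive and negative tuples (symbols introduced here, not taken from the slides):

```latex
\begin{align}
\text{sensitivity} &= \frac{TP}{P} = \frac{TP}{TP + FN} \\
\text{specificity} &= \frac{TN}{N} = \frac{TN}{TN + FP}
\end{align}
```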
• What are the sensitivity and specificity of the classifier below?

Classes        Cancer = Yes   Cancer = No   Total
Cancer = Yes   90             210           300
Cancer = No    140            9560          9700
Total          230            9770          10000
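A minimal Python sketch of the calculation, assuming that rows are actual classes and columns are predicted classes (the slides do not say which is which, and the answer depends on it):

```python
# Confusion matrix counts, assuming rows = actual class, columns = predicted.
tp, fn = 90, 210     # actual cancer = Yes
fp, tn = 140, 9560   # actual cancer = No

sensitivity = tp / (tp + fn)   # true positive (recognition) rate
specificity = tn / (tn + fp)   # true negative rate
accuracy = (tp + tn) / (tp + fn + fp + tn)

print(f"sensitivity = {sensitivity:.4f}")
print(f"specificity = {specificity:.4f}")
print(f"accuracy    = {accuracy:.4f}")
```

Under that assumption the classifier has 30% sensitivity and about 98.56% specificity. Its overall accuracy is 96.5%, which shows how accuracy alone can hide poor recognition of a rare positive class.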
