08 Classification
Classification Model
• A classification model classifies objects into a set of pre-specified classes (or categories) based on the values of relevant object attributes (features) and the objects' class labels.
[Figure: a classification model maps objects O1–O6, each containing relevant attribute values, into the classes X, Y, and Z; the classes and class labels are pre-determined.]
Benefit of Classification
• Identifying the class from a single attribute or a small number of data attributes (e.g., gender, age) is manageable for human decision makers, but not when the number of attributes or the number of instances is large.
• Estimating or predicting the class or category of an action's recipient supports timely and cost-effective decision making.
Motivating Business Questions, Costs and Benefits
• How do we identify mobile phone service customers who are likely to churn (switch to another carrier)?
• Churn or not: classes of customers
• Which customers: identified by customers’ attribute-value information –
e.g., age, income, gender, services subscribed, service utilization, etc.
• So what?
• Potential actions – increase or decrease customer service for customers
likely to churn
• Costs – increased service cost or loss of loyal customers
• Benefits - reduced churn rate or service cost
Motivating Business Questions, Costs and Benefits
• How do we find customers who are likely to repurchase?
• Repurchase or not: classes of customers
• Which customers: identified by customers' attribute-value information – e.g., recency, frequency, and monetary amount of prior purchases (the RFM approach)
• So what?
• Potential actions - Target emailing potential repurchase customers; more
customer services; offer coupons/incentives
• Costs – Cost of targeting and cost of losing re-purchase transactions
• Benefits – Increased repurchase amounts
Process
• Model training/learning/building: Data Collection → Pre-processing → Training Data → Classification algorithm → Rules (Patterns or Models)
• Model testing: Evaluate
• Model application: Class Prediction
Evaluation
• Approaches
• Splitting (holdout) method – divide the data into training and testing sets (e.g., 70%/30% or 2/3 to 1/3)
• Cross-validation (e.g., 5- or 10-fold)
  • Data are partitioned into k mutually exclusive folds of equal size.
  • Training and testing are done k times, with k-1 folds used for training and 1 fold for testing (e.g., D2, D3, …, Dk in training and D1 in testing).
• Random sub-sampling
  • A variation of the holdout method in which the holdout is repeated k times with different training and testing sets.
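As a concrete sketch of the holdout split and k-fold cross-validation, here is a minimal example using scikit-learn; the iris dataset and the decision tree classifier are placeholder choices, not prescribed by the slides.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)  # placeholder dataset

# Splitting (holdout) method: 70% training / 30% testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=1)
model = DecisionTreeClassifier(random_state=1).fit(X_train, y_train)
print("Holdout accuracy:", model.score(X_test, y_test))

# 10-fold cross-validation: each of the 10 folds serves once as the test set
scores = cross_val_score(DecisionTreeClassifier(random_state=1), X, y, cv=10)
print("10-fold CV mean accuracy:", scores.mean())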
Classifier Evaluation Measures
• Confusion matrix

                 Classified or Predicted
                     a        b
  Actual    a       aa       ab
            b       ba       bb

• D = aa + bb + ab + ba
• Actual a = (aa + ab), also Actual non-b
• Actual b = (ba + bb), also Actual non-a
• Classified a = (aa + ba), Classified b = (ab + bb)
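A confusion matrix in this form can be computed from actual and predicted labels with scikit-learn; the toy label lists below are made up for illustration.

from sklearn.metrics import confusion_matrix

actual    = ['a', 'a', 'a', 'b', 'b', 'b', 'b', 'a']
predicted = ['a', 'a', 'b', 'b', 'b', 'a', 'b', 'a']

# Rows are actual classes, columns are predicted classes: [[aa, ab], [ba, bb]]
print(confusion_matrix(actual, predicted, labels=['a', 'b']))
# [[3 1]
#  [1 3]]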
Evaluation Metrics
• Accuracy is the overall correctness of the model and is calculated as
the sum of correct classifications divided by the total number of
classifications.
• Accuracy = (aa+bb) / D
• True Positive Rate (a) = aa / Actual a
• True Positive Rate (b) = bb / Actual b
• False Positive Rate (a) = ba / Actual b
• False Positive Rate (b) = ab / Actual a
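As a worked illustration of these formulas in the slide's cell notation (the counts aa, ab, ba, bb below are made-up numbers, not from the slides):

# Confusion-matrix cells in the slide's notation (illustrative counts)
aa, ab = 40, 10   # actual class a: classified as a / classified as b
ba, bb = 5, 45    # actual class b: classified as a / classified as b
D = aa + bb + ab + ba

accuracy = (aa + bb) / D       # overall correctness
tpr_a = aa / (aa + ab)         # True Positive Rate (a) = aa / Actual a
tpr_b = bb / (ba + bb)         # True Positive Rate (b) = bb / Actual b
fpr_a = ba / (ba + bb)         # False Positive Rate (a) = ba / Actual b
fpr_b = ab / (aa + ab)         # False Positive Rate (b) = ab / Actual a
print(accuracy, tpr_a, fpr_a)  # 0.85 0.8 0.1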
Precision
• Measure of accuracy for a specific class.
• Precision (a) = aa / Classified a
• Precision (b) = bb / Classified b
Recall
• Recall is a measure of the ability of a classification model to select
instances of a certain class from a data set. It is commonly also called
sensitivity.
• Equivalent to TP rate.
• Recall (a) = aa / Actual a
• Recall (b) = bb / Actual b
F-measure
• The F-measure is the harmonic mean of precision and recall.
• It can be used as a single measure of performance of the test.
• F = ( 2 x Precision x Recall ) / ( Precision + Recall )
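Continuing the made-up counts from the earlier sketch, precision, recall, and the F-measure for class a can be computed directly:

aa, ab, ba, bb = 40, 10, 5, 45  # same illustrative counts as above

precision_a = aa / (aa + ba)    # Precision (a) = aa / Classified a
recall_a = aa / (aa + ab)       # Recall (a) = aa / Actual a
f_a = (2 * precision_a * recall_a) / (precision_a + recall_a)
print(round(precision_a, 3), round(recall_a, 3), round(f_a, 3))  # 0.889 0.8 0.842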
Decision Tree Classifier
Decision Tree Induction
• The learning of decision trees from class-labeled training data.
• An internal node is a test on an attribute.
• A branch is an outcome of the test.
• A leaf node holds a class label.
• Most trees are binary, but some are non-binary.

[Figure: example decision tree for "Play Tennis?"]
Example

  Age   Income   Churn?
  70    20,000   Yes
  60    18,000   Yes
  75    36,000   Yes
  67    33,000   Yes
  60    36,000   Yes
  60    50,000   No
  50    12,000   Yes
  40    12,000   Yes
  30    12,000   No
  50    30,000   No
  40    16,000   Yes
  35    20,000   Yes
  48    36,000   No
  30    37,000   No
  22    50,000   No
  21    51,000   No

[Figure: scatter plot of the samples in the Age–Income plane, with Churn and Not Churn points marked.]
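A decision tree can be induced from this exact table; below is a minimal sketch using scikit-learn (the library choice and the entropy criterion are assumptions, not specified by the slides).

from sklearn.tree import DecisionTreeClassifier, export_text

# The Age/Income/Churn table from this slide
ages    = [70, 60, 75, 67, 60, 60, 50, 40, 30, 50, 40, 35, 48, 30, 22, 21]
incomes = [20000, 18000, 36000, 33000, 36000, 50000, 12000, 12000,
           12000, 30000, 16000, 20000, 36000, 37000, 50000, 51000]
churn   = ['Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'No', 'Yes', 'Yes',
           'No', 'No', 'Yes', 'Yes', 'No', 'No', 'No', 'No']

X = list(zip(ages, incomes))
tree = DecisionTreeClassifier(criterion='entropy', random_state=0).fit(X, churn)
print(export_text(tree, feature_names=['Age', 'Income']))  # prints the learned splits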
Notations
• Prediction objects (samples): the rows of the Age/Income/Churn? table on the previous slide.
• Classification attributes: Age and Income; together they span the problem space.
• Class label attribute: Churn?
• Class labels: Churn (Yes) and Not Churn (No).
Decision Tree
• Mapping principle: recursively partition the data set so that the subsets contain "pure" data.

[Figure: the Age–Income scatter plot recursively partitioned into regions of Churn and Not Churn points.]
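"Purity" is usually quantified with an impurity measure such as Gini or entropy: a subset is pure when the measure is 0. Below is a small sketch; these helper functions are illustrative, not from the slides.

from collections import Counter
from math import log2

def gini(labels):
    # Gini impurity: 0 for a pure subset, larger when classes are mixed
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    # Shannon entropy in bits: also 0 for a pure subset
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

print(gini(['Yes'] * 9 + ['No'] * 7))     # whole churn table (9 Yes, 7 No): ~0.492
print(entropy(['Yes'] * 9 + ['No'] * 7))  # ~0.989 bits
print(gini(['Yes'] * 5))                  # a pure subset after a good split: 0.0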
Decision Tree for Intrusion Detection
Ref: http://www.cs.iastate.edu/~dkkang/IDS_Bag/
Bayesian Classifiers
• Build on causal relations between variables in a domain, using probability theory to reason under uncertainty.

[Figure: Bayesian network with nodes age, income, and churn; the class with the maximum posterior probability is predicted.]

Conditional Probability Table (CPT) for the class prior:
P(churn = yes) = 9/16, P(churn = no) = 7/16

P(X | churn = yes)
= P(age < 35, income >= 36K | churn = yes)
= P(age < 35 | churn = yes) × P(income >= 36K | churn = yes)   (conditional independence assumption)
= 0 × 2/9 = 0

P(X | churn = no)
= P(age < 35, income >= 36K | churn = no)
= P(age < 35 | churn = no) × P(income >= 36K | churn = no)   (conditional independence assumption)
= 4/7 × 5/7 ≈ 0.408
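The counts above can be reproduced directly from the example table; here is a minimal sketch, where the thresholds age < 35 and income >= 36K follow the slide's query X.

# (age, income, churn) rows from the example table
rows = [(70, 20000, 'Yes'), (60, 18000, 'Yes'), (75, 36000, 'Yes'),
        (67, 33000, 'Yes'), (60, 36000, 'Yes'), (60, 50000, 'No'),
        (50, 12000, 'Yes'), (40, 12000, 'Yes'), (30, 12000, 'No'),
        (50, 30000, 'No'), (40, 16000, 'Yes'), (35, 20000, 'Yes'),
        (48, 36000, 'No'), (30, 37000, 'No'), (22, 50000, 'No'),
        (21, 51000, 'No')]

def cond_prob(pred, churn_value):
    # P(pred | churn = churn_value), estimated by counting table rows
    matching = [r for r in rows if r[2] == churn_value]
    return sum(pred(r) for r in matching) / len(matching)

for c in ('Yes', 'No'):
    p_age = cond_prob(lambda r: r[0] < 35, c)      # P(age < 35 | churn = c)
    p_inc = cond_prob(lambda r: r[1] >= 36000, c)  # P(income >= 36K | churn = c)
    # Naive (conditional independence) assumption: multiply the factors
    print(c, p_age, p_inc, p_age * p_inc)
# Yes: 0.0 * 2/9 = 0.0;  No: 4/7 * 5/7 ≈ 0.408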
Naïve Bayesian Classification (cont.)