08 Classification

Classification

Classification Model
• Classifies objects into a set of pre-specified object classes (or
categories) based on the values of relevant object attributes
(features) and the objects’ class labels.

[Diagram: objects O1–O6, each containing its relevant attribute
values, pass through the classification model and are assigned to
classes X, Y, and Z. The classes X, Y, and Z and the class labels are
pre-determined.]
Benefit of Classification
• Identifying the class by a single or a small number of data
attributes (e.g., gender, age) is manageable by human
decision makers, but not when the number of attributes or
the number of instances is large.
• Estimating/predicting the class or category of an action recipient
supports time- and cost-effective decision making.

The class will be or is “X” → The class will be or is likely to be “X”

Motivating Business Questions,
Costs and Benefits
• How do we identify mobile phone service customers who are likely
to churn (switch to another carrier)?
• Churn or not: classes of customers
• Which customers: identified by customers’ attribute-value information –
e.g., age, income, gender, services subscribed, service utilization, etc.
• So what?
• Potential actions – increase or decrease customer service for customers
likely to churn
• Costs – increased service cost or loss of loyal customers
• Benefits - reduced churn rate or service cost

Motivating Business Questions,
Costs and Benefits
• How do we find customers who are likely to repurchase?
• Repurchase or not: classes of customers
• Which customers: identified by customers’ attributes-value information – e.g.,
recency, frequency and monetary amount of prior purchases (RFM approach)
• So what?
• Potential actions - Target emailing potential repurchase customers; more
customer services; offer coupons/incentives
• Costs – Cost of targeting and cost of losing re-purchase transactions
• Benefits – Increased repurchase amounts

Process
• Model training/learning/building:
Data Collection → Pre-processing → Training Data → Classification
algorithm → Rules (Patterns or Models)
• Model testing:
Test Data → learned rules → Evaluate
• Model application:
New Data → learned rules → Class Prediction
Evaluation
• Approaches
• Splitting method - divide into training and testing sets (e.g.
70%/30% or 2/3 to 1/3)
• Cross-validation (e.g. 5 or 10 fold)
• Data are partitioned into k mutually exclusive folds of equal size.
• Training and testing are done k times, each time with k−1 folds for
training and 1 fold for testing (e.g., D2, D3, …, Dk for training and
D1 for testing).
• Random sub-sampling
• A variation of the holdout where the holdout is repeated k times
with different training and testing sets.
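The splitting and cross-validation approaches above can be sketched in plain Python (a sketch with illustrative function names, not code from the slides):

```python
import random

def holdout_split(data, test_fraction=0.3, seed=42):
    """Shuffle the data, then hold out a fraction for testing (e.g. 70%/30%)."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

def k_folds(data, k=5):
    """Partition the data into k mutually exclusive folds of (near-)equal size.
    Each fold serves once as the test set while the other k-1 folds train."""
    folds = [data[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test
```

Random sub-sampling amounts to calling `holdout_split` k times with different seeds.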

Classifier Evaluation Measures
• Confusion matrix

                 Classified or Predicted
                     a        b
    Actual   a      aa       ab
             b      ba       bb

• D = aa + bb + ab + ba
• Actual a = (aa + ab), also Actual non-b
• Actual b = (ba + bb), also Actual non-a
• Classified a = (aa + ba), Classified b = (ab + bb)
Evaluation Metrics
• Accuracy is the overall correctness of the model and is calculated as
the sum of correct classifications divided by the total number of
classifications.
• Accuracy = (aa+bb) / D
• True Positive Rate (a) = aa / Actual a
• True Positive Rate (b) = bb / Actual b
• False Positive Rate (a) = ba / Actual b
• False Positive Rate (b) = ab / Actual a
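Using the confusion-matrix cells defined above, these metrics can be computed directly (a minimal sketch; the function name and example counts are illustrative):

```python
def basic_metrics(aa, ab, ba, bb):
    """Accuracy and per-class TP/FP rates from a 2x2 confusion matrix
    (rows = actual class, columns = classified/predicted class)."""
    D = aa + ab + ba + bb          # total number of classifications
    actual_a = aa + ab
    actual_b = ba + bb
    return {
        "accuracy": (aa + bb) / D,
        "tpr_a": aa / actual_a,    # actual a's correctly classified as a
        "tpr_b": bb / actual_b,    # actual b's correctly classified as b
        "fpr_a": ba / actual_b,    # actual b's misclassified as a
        "fpr_b": ab / actual_a,    # actual a's misclassified as b
    }
```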

Precision
• A measure of accuracy for a specific class: of the instances
classified as that class, the fraction that actually belong to it.
• Precision (a) = aa/ Classified a
• Precision (b) = bb/ Classified b

Recall
• Recall is a measure of the ability of a classification model to select
instances of a certain class from a data set. It is also commonly
called sensitivity.
• Equivalent to the TP rate.
• Recall (a) = aa / Actual a
• Recall (b) = bb/ Actual b

F-measure
• The F-measure is the harmonic mean of precision and recall.
• It can be used as a single measure of performance of the test.
• F = ( 2 x Precision x Recall ) / ( Precision + Recall )
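Precision, recall, and the F-measure for a class follow directly from the confusion-matrix cells (a sketch; the function name is illustrative):

```python
def precision_recall_f(aa, ab, ba, bb):
    """Precision, recall, and F-measure for class a from a 2x2 confusion
    matrix (aa = actual a classified a, ba = actual b classified a,
    ab = actual a classified b)."""
    precision = aa / (aa + ba)   # correct a's among all classified as a
    recall = aa / (aa + ab)      # correct a's among all actual a
    f = (2 * precision * recall) / (precision + recall)
    return precision, recall, f
```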

Decision Tree Classifier
Decision Tree Induction
• The learning of decision trees from class-labeled training data.
• An internal node is a test on an attribute.
• A branch is an outcome of the test.
• A leaf node holds a class label.
• Most trees are binary, but some are non-binary.

[Figure: example decision tree for “Play Tennis?”]
Example

Age   Income   Churn?
 70   20,000   Yes
 60   18,000   Yes
 75   36,000   Yes
 67   33,000   Yes
 60   36,000   Yes
 60   50,000   No
 50   12,000   Yes
 40   12,000   Yes
 30   12,000   No
 50   30,000   No
 40   16,000   Yes
 35   20,000   Yes
 48   36,000   No
 30   37,000   No
 22   50,000   No
 21   51,000   No

[Scatter plot: Age vs. Income, points marked Churn / Not Churn]
Notations
• Classification attributes: Age, Income
• Class label attribute: Churn?
• Classification samples: the labeled rows of the example table
• Prediction object: a new instance whose class is to be predicted
• Problem space: the attribute space spanned by Age and Income
• Class labels: Churn, Not Churn
Decision Tree
• Mapping principle: recursively partition the data set so that the
subsets contain “pure” data.

[Scatter plot: Age vs. Income partitioned into rectangular regions
containing only Churn or only Not Churn points.]
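The core step of recursive partitioning can be sketched as a best-split search. This is a sketch, not the slides’ algorithm: it assumes Gini impurity as the purity measure (the slides do not specify one), and the function names are illustrative.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0 when the set is pure)."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(rows, labels):
    """Try every (attribute, threshold) pair and keep the split with the
    lowest weighted impurity of the two resulting subsets; a decision-tree
    learner applies this recursively until the subsets are pure."""
    best = None
    for attr in range(len(rows[0])):
        for threshold in {r[attr] for r in rows}:
            left = [l for r, l in zip(rows, labels) if r[attr] < threshold]
            right = [l for r, l in zip(rows, labels) if r[attr] >= threshold]
            if not left or not right:
                continue  # degenerate split: everything on one side
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, attr, threshold)
    return best  # (weighted impurity, attribute index, threshold)
```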
Decision Tree for Intrusion Detection
Ref: http://www.cs.iastate.edu/~dkkang/IDS_Bag/

Bayesian Classifiers
• Build on causal relations between variables in a domain, using
probability theory to reason with uncertainty.

• Learned based on Bayes’ theorem.

• Predict an instance’s class and its class membership probabilities.

General learning algorithm
• Learn the structure that maximizes a scoring function on the given
dataset.
• Learn a Conditional Probability Table (CPT) based on the obtained
structure and the given dataset.
Naïve Bayesian Classification
• Theoretical foundation
• Given an instance X, the classifier predicts that X belongs to the
class with the highest posterior probability, i.e., the class Cj that
maximizes P(Cj|X).
• By Bayes’ theorem, P(Cj|X) = P(X|Cj) * P(Cj) / P(X).
• Since P(X) is constant for all Cj, maximizing P(Cj|X) is equivalent
to maximizing P(X|Cj) * P(Cj), i.e., P(X and Cj).
An Example of Naïve Bayes
Network structure: churn is the parent of both age and income.

Conditional Probability Tables (CPTs):

P(churn):    churn = yes: 9/16    churn = no: 7/16

P(age | churn):
    churn    age < 35    age >= 35
    yes        0/9          9/9
    no         4/7          3/7

P(income | churn):
    churn    income < 36k    income >= 36k
    yes          9/9              0/9
    no           2/7              5/7
Naïve Bayesian Classification
• Instance
• X = (age = 30, income = 38k)

• What is the prediction: will X churn or not churn?

• Assumption made by naïve Bayesian classification
• Conditional independence
• The values of the attributes in X are conditionally independent of
one another, given the class.
Naïve Bayesian Classification (cont.)

P(X | churn = yes)
= P(age < 35, income >= 36k | churn = yes)
= P(age < 35 | churn = yes) * P(income >= 36k | churn = yes)
  (conditional independence assumption)
= 0 * 0 = 0

P(X | churn = no)
= P(age < 35, income >= 36k | churn = no)
= P(age < 35 | churn = no) * P(income >= 36k | churn = no)
  (conditional independence assumption)
= 4/7 * 5/7 ≈ 0.408
Naïve Bayesian Classification (cont.)

• Prediction (customer X – age 30, income 38k)

P(X | churn = yes) * P(churn = yes) = 0 * 9/16 = 0
P(X | churn = no) * P(churn = no) = 0.408 * 7/16 ≈ 0.179

Therefore, the naïve Bayesian classifier predicts that customer X will
not churn.
Code Walkthrough
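The naïve Bayes example above can be reproduced in a few lines of Python. This is a sketch, not the deck’s original walkthrough code: the probabilities are copied from the CPT tables, and the variable and function names are illustrative.

```python
# Priors and CPT entries copied from the slides' tables.
p_churn = {"yes": 9/16, "no": 7/16}
p_age_lt35 = {"yes": 0/9, "no": 4/7}       # P(age < 35 | churn)
p_income_ge36k = {"yes": 0/9, "no": 5/7}   # P(income >= 36k | churn)

def score(churn):
    """P(X | churn) * P(churn) for X = (age 30, income 38k), using the
    conditional-independence assumption to multiply the per-attribute
    conditional probabilities."""
    return p_age_lt35[churn] * p_income_ge36k[churn] * p_churn[churn]

scores = {c: score(c) for c in ("yes", "no")}
prediction = max(scores, key=scores.get)   # class with the highest score
```

Running this gives `scores["yes"] = 0` and `scores["no"] ≈ 0.179`, so `prediction` is `"no"`: customer X is predicted not to churn, matching the hand calculation above.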
