08 Classification
Classification Model
• A classification model classifies objects into a set of pre-specified classes (or categories) based on the values of relevant object attributes (features) and the objects' class labels.
[Figure: a classification model maps objects O1–O6, each containing relevant attribute values, into the classes X, Y, and Z; the classes and class labels are pre-determined.]
Benefit of Classification
• Identifying the class from a single attribute or a small number of data attributes (e.g., gender, age) is manageable for human decision makers, but not when the number of attributes or the number of instances is large.
• Estimating or predicting the class or category of an action's recipient supports timely and cost-effective decision making.
Motivating Business Questions, Costs and Benefits
• How do we identify mobile phone service customers who are likely to churn (switch to another carrier)?
• Churn or not: classes of customers
• Which customers: identified by customers’ attribute-value information –
e.g., age, income, gender, services subscribed, service utilization, etc.
• So what?
• Potential actions – increase or decrease customer service for customers
likely to churn
• Costs – increased service cost or loss of loyal customers
• Benefits - reduced churn rate or service cost
Motivating Business Questions, Costs and Benefits
• How do we find customers who are likely to repurchase?
• Repurchase or not: classes of customers
• Which customers: identified by customers' attribute-value information – e.g., recency, frequency, and monetary amount of prior purchases (the RFM approach)
• So what?
• Potential actions - Target emailing potential repurchase customers; more
customer services; offer coupons/incentives
• Costs – Cost of targeting and cost of losing re-purchase transactions
• Benefits – Increased repurchase amounts
Process
• Model training/learning/building: Data Collection → Pre-processing → Training Data → Classification algorithm → Rules (Patterns or Models)
• Model testing: Evaluate
• Model application: Class Prediction
Evaluation
• Approaches
• Splitting (holdout) method – divide the data into training and testing sets (e.g., 70%/30% or 2/3 to 1/3)
• Cross-validation (e.g., 5- or 10-fold)
  • Data are partitioned into k mutually exclusive folds of equal size.
  • Training and testing are done k times, with k-1 folds used for training and 1 fold for testing (e.g., D2, D3, …, Dk in training and D1 in testing).
• Random sub-sampling
  • A variation of the holdout method in which the holdout is repeated k times with different training and testing sets.
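As a concrete sketch of the holdout split and k-fold cross-validation, here is a minimal example using scikit-learn; the iris dataset and the decision tree classifier are placeholder choices, not prescribed by the slides.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)  # placeholder dataset

# Splitting (holdout) method: 70% training / 30% testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=1)
model = DecisionTreeClassifier(random_state=1).fit(X_train, y_train)
print("Holdout accuracy:", model.score(X_test, y_test))

# 10-fold cross-validation: each of the 10 folds serves once as the test set
scores = cross_val_score(DecisionTreeClassifier(random_state=1), X, y, cv=10)
print("10-fold CV mean accuracy:", scores.mean())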
Classifier Evaluation Measures
• Confusion matrix

                 Classified or Predicted
                     a        b
  Actual    a       aa       ab
            b       ba       bb

• D = aa + bb + ab + ba
• Actual a = (aa + ab), also Actual non-b
• Actual b = (ba + bb), also Actual non-a
• Classified a = (aa + ba), Classified b = (ab + bb)
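A confusion matrix in this form can be computed from actual and predicted labels with scikit-learn; the toy label lists below are made up for illustration.

from sklearn.metrics import confusion_matrix

actual    = ['a', 'a', 'a', 'b', 'b', 'b', 'b', 'a']
predicted = ['a', 'a', 'b', 'b', 'b', 'a', 'b', 'a']

# Rows are actual classes, columns are predicted classes: [[aa, ab], [ba, bb]]
print(confusion_matrix(actual, predicted, labels=['a', 'b']))
# [[3 1]
#  [1 3]]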
Evaluation Metrics
• Accuracy is the overall correctness of the model and is calculated as
the sum of correct classifications divided by the total number of
classifications.
• Accuracy = (aa+bb) / D
• True Positive Rate (a) = aa / Actual a
• True Positive Rate (b) = bb / Actual b
• False Positive Rate (a) = ba / Actual b
• False Positive Rate (b) = ab / Actual a
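As a worked illustration of these formulas in the slide's cell notation (the counts aa, ab, ba, bb below are made-up numbers, not from the slides):

# Confusion-matrix cells in the slide's notation (illustrative counts)
aa, ab = 40, 10   # actual class a: classified as a / classified as b
ba, bb = 5, 45    # actual class b: classified as a / classified as b
D = aa + bb + ab + ba

accuracy = (aa + bb) / D       # overall correctness
tpr_a = aa / (aa + ab)         # True Positive Rate (a) = aa / Actual a
tpr_b = bb / (ba + bb)         # True Positive Rate (b) = bb / Actual b
fpr_a = ba / (ba + bb)         # False Positive Rate (a) = ba / Actual b
fpr_b = ab / (aa + ab)         # False Positive Rate (b) = ab / Actual a
print(accuracy, tpr_a, fpr_a)  # 0.85 0.8 0.1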
Precision
• Measure of accuracy for a specific class.
• Precision (a) = aa / Classified a
• Precision (b) = bb / Classified b
Recall
• Recall is a measure of the ability of a classification model to select
instances of a certain class from a data set. It is commonly also called
sensitivity.
• Equivalent to TP rate.
• Recall (a) = aa / Actual a
• Recall (b) = bb / Actual b
F-measure
• The F-measure is the harmonic mean of precision and recall.
• It can be used as a single measure of performance of the test.
• F = ( 2 x Precision x Recall ) / ( Precision + Recall )
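Continuing the made-up counts from the earlier sketch, precision, recall, and the F-measure for class a can be computed directly:

aa, ab, ba, bb = 40, 10, 5, 45  # same illustrative counts as above

precision_a = aa / (aa + ba)    # Precision (a) = aa / Classified a
recall_a = aa / (aa + ab)       # Recall (a) = aa / Actual a
f_a = (2 * precision_a * recall_a) / (precision_a + recall_a)
print(round(precision_a, 3), round(recall_a, 3), round(f_a, 3))  # 0.889 0.8 0.842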
Decision Tree Classifier
Decision Tree Induction
• The learning of decision trees from class-labeled training data.
• An internal node is a test on an attribute.
• A branch is an outcome of the test.
• A leaf node holds a class label.
• Most trees are binary, but some are non-binary.

[Figure: example decision tree for "Play Tennis?"]
Example

  Age   Income   Churn?
  70    20,000   Yes
  60    18,000   Yes
  75    36,000   Yes
  67    33,000   Yes
  60    36,000   Yes
  60    50,000   No
  50    12,000   Yes
  40    12,000   Yes
  30    12,000   No
  50    30,000   No
  40    16,000   Yes
  35    20,000   Yes
  48    36,000   No
  30    37,000   No
  22    50,000   No
  21    51,000   No

[Figure: scatter plot of the samples in the Age–Income plane, with Churn and Not Churn points marked.]
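A decision tree can be induced from this exact table; below is a minimal sketch using scikit-learn (the library choice and the entropy criterion are assumptions, not specified by the slides).

from sklearn.tree import DecisionTreeClassifier, export_text

# The Age/Income/Churn table from this slide
ages    = [70, 60, 75, 67, 60, 60, 50, 40, 30, 50, 40, 35, 48, 30, 22, 21]
incomes = [20000, 18000, 36000, 33000, 36000, 50000, 12000, 12000,
           12000, 30000, 16000, 20000, 36000, 37000, 50000, 51000]
churn   = ['Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'No', 'Yes', 'Yes',
           'No', 'No', 'Yes', 'Yes', 'No', 'No', 'No', 'No']

X = list(zip(ages, incomes))
tree = DecisionTreeClassifier(criterion='entropy', random_state=0).fit(X, churn)
print(export_text(tree, feature_names=['Age', 'Income']))  # prints the learned splits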
Notations
• Prediction objects (samples): the rows of the Age/Income/Churn? table on the previous slide.
• Classification attributes: Age and Income; together they span the problem space.
• Class label attribute: Churn?
• Class labels: Churn (Yes) and Not Churn (No).
Decision Tree
• Mapping principle: recursively partition the data set so that the subsets contain "pure" data.

[Figure: the Age–Income scatter plot recursively partitioned into regions of Churn and Not Churn points.]
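"Purity" is usually quantified with an impurity measure such as Gini or entropy: a subset is pure when the measure is 0. Below is a small sketch; these helper functions are illustrative, not from the slides.

from collections import Counter
from math import log2

def gini(labels):
    # Gini impurity: 0 for a pure subset, larger when classes are mixed
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    # Shannon entropy in bits: also 0 for a pure subset
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

print(gini(['Yes'] * 9 + ['No'] * 7))     # whole churn table (9 Yes, 7 No): ~0.492
print(entropy(['Yes'] * 9 + ['No'] * 7))  # ~0.989 bits
print(gini(['Yes'] * 5))                  # a pure subset after a good split: 0.0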
Decision Tree for Intrusion Detection
Ref: http://www.cs.iastate.edu/~dkkang/IDS_Bag/
Bayesian Classifiers
• Build on causal relations between variables in a domain, using probability theory to reason under uncertainty.

[Figure: Bayesian network with nodes age, income, and churn; the class with the maximum posterior probability is predicted.]

Conditional Probability Table (CPT) for the class prior:
P(churn = yes) = 9/16, P(churn = no) = 7/16

P(X | churn = yes)
= P(age < 35, income >= 36K | churn = yes)
= P(age < 35 | churn = yes) × P(income >= 36K | churn = yes)   (conditional independence assumption)
= 0 × 2/9 = 0

P(X | churn = no)
= P(age < 35, income >= 36K | churn = no)
= P(age < 35 | churn = no) × P(income >= 36K | churn = no)   (conditional independence assumption)
= 4/7 × 5/7 ≈ 0.408
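The counts above can be reproduced directly from the example table; here is a minimal sketch, where the thresholds age < 35 and income >= 36K follow the slide's query X.

# (age, income, churn) rows from the example table
rows = [(70, 20000, 'Yes'), (60, 18000, 'Yes'), (75, 36000, 'Yes'),
        (67, 33000, 'Yes'), (60, 36000, 'Yes'), (60, 50000, 'No'),
        (50, 12000, 'Yes'), (40, 12000, 'Yes'), (30, 12000, 'No'),
        (50, 30000, 'No'), (40, 16000, 'Yes'), (35, 20000, 'Yes'),
        (48, 36000, 'No'), (30, 37000, 'No'), (22, 50000, 'No'),
        (21, 51000, 'No')]

def cond_prob(pred, churn_value):
    # P(pred | churn = churn_value), estimated by counting table rows
    matching = [r for r in rows if r[2] == churn_value]
    return sum(pred(r) for r in matching) / len(matching)

for c in ('Yes', 'No'):
    p_age = cond_prob(lambda r: r[0] < 35, c)      # P(age < 35 | churn = c)
    p_inc = cond_prob(lambda r: r[1] >= 36000, c)  # P(income >= 36K | churn = c)
    # Naive (conditional independence) assumption: multiply the factors
    print(c, p_age, p_inc, p_age * p_inc)
# Yes: 0.0 * 2/9 = 0.0;  No: 4/7 * 5/7 ≈ 0.408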
Naïve Bayesian Classification (cont.)