Supervised Learning: Classification

Classification: Definition

 Given a collection of records (the training set)
  – Each record contains a set of attributes; one of the attributes is the class.

 Find a model for the class attribute as a function of the values of the other attributes.

 Goal: previously unseen records should be assigned a class as accurately as possible.
  – A test set is used to determine the accuracy of the model. Usually, the given data set is divided into training and test sets: the training set is used to build the model and the test set to validate it.
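The holdout procedure described above can be sketched as a small Python helper; the 30% test fraction and the fixed shuffling seed are illustrative choices, not part of the slides.

```python
import random

def train_test_split(records, test_fraction=0.3, seed=42):
    """Shuffle the records and hold out a fraction as the test set."""
    rng = random.Random(seed)
    shuffled = records[:]           # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]   # (training set, test set)

records = list(range(100))          # stand-in for 100 labeled records
train, test = train_test_split(records)
print(len(train), len(test))        # 70 30
```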

© Vipin Kumar CSci 5980 Spring 2004 1


Illustrating Classification Task

[Figure: a learning algorithm induces a model from the training set; the model is then applied (deduction) to label the test set.]

Training Set:

Tid  Attrib1  Attrib2  Attrib3  Class
 1   Yes      Large    125K     No
 2   No       Medium   100K     No
 3   No       Small     70K     No
 4   Yes      Medium   120K     No
 5   No       Large     95K     Yes
 6   No       Medium    60K     No
 7   Yes      Large    220K     No
 8   No       Small     85K     Yes
 9   No       Medium    75K     No
10   No       Small     90K     Yes

Test Set:

Tid  Attrib1  Attrib2  Attrib3  Class
11   No       Small     55K     ?
12   Yes      Medium    80K     ?
13   Yes      Large    110K     ?
14   No       Small     95K     ?
15   No       Large     67K     ?



Examples of Classification Task

 Predicting tumor cells as benign or malignant

 Classifying credit card transactions as legitimate or fraudulent

 Classifying secondary structures of proteins as alpha-helix, beta-sheet, or random coil

 Categorizing news stories as finance, weather, entertainment, sports, etc.
Classification Techniques

 Decision Tree-based Methods
 Rule-based Methods
 Memory-based Reasoning
 Neural Networks
 Naïve Bayes and Bayesian Belief Networks
 Support Vector Machines



Example of a Decision Tree

Attribute types: Refund (categorical), Marital Status (categorical), Taxable Income (continuous), Cheat (class).

Training Data:

Tid  Refund  Marital Status  Taxable Income  Cheat
 1   Yes     Single          125K            No
 2   No      Married         100K            No
 3   No      Single           70K            No
 4   Yes     Married         120K            No
 5   No      Divorced         95K            Yes
 6   No      Married          60K            No
 7   Yes     Divorced        220K            No
 8   No      Single           85K            Yes
 9   No      Married          75K            No
10   No      Single           90K            Yes

Model (decision tree; Refund, MarSt, and TaxInc are the splitting attributes):

Refund?
├─ Yes → NO
└─ No → MarSt?
        ├─ Married → NO
        └─ Single, Divorced → TaxInc?
                              ├─ < 80K → NO
                              └─ > 80K → YES
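One hypothetical way to encode the tree on this slide is as nested conditionals; the function name and argument order here are invented for illustration.

```python
def classify(refund, marital_status, taxable_income):
    """Apply the slide's decision tree: Refund -> MarSt -> TaxInc."""
    if refund == "Yes":
        return "No"                 # Refund = Yes leaf: NO
    if marital_status == "Married":
        return "No"                 # MarSt = Married leaf: NO
    # Single or Divorced: split on taxable income at 80K
    return "No" if taxable_income < 80_000 else "Yes"

# Record 5 from the training data: No, Divorced, 95K -> Cheat = Yes
print(classify("No", "Divorced", 95_000))   # Yes
```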



Another Example of Decision Tree

Training data: the same ten records as on the previous slide.

Model (a different decision tree that fits the same data):

MarSt?
├─ Married → NO
└─ Single, Divorced → Refund?
                      ├─ Yes → NO
                      └─ No → TaxInc?
                              ├─ < 80K → NO
                              └─ > 80K → YES

There could be more than one tree that fits the same data!



Practical Issues of Classification

 Underfitting and Overfitting

 Missing Values

 Costs of Classification



Underfitting and Overfitting
(Example)

 500 circular and 500 triangular data points.

Circular points:
0.5 ≤ sqrt(x1² + x2²) ≤ 1

Triangular points:
sqrt(x1² + x2²) > 1 or
sqrt(x1² + x2²) < 0.5
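A sketch of how such a data set might be generated. The slides do not specify the sampling scheme, so this version draws points uniformly from a square around the origin and labels each point by region, rather than drawing exactly 500 of each class.

```python
import math
import random

rng = random.Random(0)

def sample_point():
    """Draw a point from the square [-1.5, 1.5]^2 and label it by region."""
    x1, x2 = rng.uniform(-1.5, 1.5), rng.uniform(-1.5, 1.5)
    r = math.sqrt(x1**2 + x2**2)
    label = "circle" if 0.5 <= r <= 1 else "triangle"   # annulus vs. the rest
    return x1, x2, label

points = [sample_point() for _ in range(1000)]
n_circles = sum(1 for p in points if p[2] == "circle")
print(n_circles, len(points) - n_circles)
```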



Underfitting and Overfitting

[Figure: training and test error plotted against model complexity; the region where test error rises again is labeled Overfitting.]

Underfitting: when the model is too simple, both training and test errors are large.



Overfitting due to Noise

The decision boundary is distorted by a noise point.



Overfitting due to Insufficient
Examples

The lack of data points in the lower half of the diagram makes it difficult to correctly predict the class labels in that region.
– An insufficient number of training records in the region causes the decision tree to predict the test examples using other training records that are irrelevant to the classification task.
Decision Boundary
[Figure: a 2-D data set partitioned by a decision tree that tests x < 0.43 at the root and y < 0.33 / y < 0.47 at the next level; each of the four leaf regions contains points of only one class.]

• The border between two neighboring regions of different classes is known as the decision boundary.
• The decision boundary is parallel to the axes because each test condition involves a single attribute at a time.



Oblique Decision Trees

[Figure: a single oblique split x + y < 1 separates class + from class –.]

• The test condition may involve multiple attributes
• More expressive representation
• Finding an optimal test condition is computationally expensive
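A minimal sketch of an oblique test, assuming (arbitrarily, since the slide does not say which side is which) that class + lies below the line x + y = 1:

```python
def oblique_classify(x, y):
    """One test on a linear combination of attributes: x + y < 1."""
    return "+" if x + y < 1 else "-"

print(oblique_classify(0.2, 0.3))   # +
print(oblique_classify(0.8, 0.9))   # -
```

An axis-parallel tree would need a staircase of single-attribute tests to approximate this one slanted boundary, which is why oblique trees are called more expressive.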



Metrics for Performance
Evaluation

 Focus on the predictive capability of a model
  – rather than on how fast it classifies or builds models, scalability, etc.
 Confusion Matrix:

                  PREDICTED CLASS
                  Class=Yes   Class=No
ACTUAL Class=Yes      a           b
CLASS  Class=No       c           d

a: TP (true positive)
b: FN (false negative)
c: FP (false positive)
d: TN (true negative)
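The four cells can be counted directly from parallel lists of actual and predicted labels; this helper and its label encoding are illustrative:

```python
def confusion_matrix(actual, predicted, positive="Yes"):
    """Return (a, b, c, d) = (TP, FN, FP, TN) from parallel label lists."""
    a = b = c = d = 0
    for act, pred in zip(actual, predicted):
        if act == positive:
            if pred == positive: a += 1   # true positive
            else:                b += 1   # false negative
        else:
            if pred == positive: c += 1   # false positive
            else:                d += 1   # true negative
    return a, b, c, d

actual    = ["Yes", "Yes", "No", "No",  "Yes"]
predicted = ["Yes", "No",  "No", "Yes", "Yes"]
print(confusion_matrix(actual, predicted))   # (2, 1, 1, 1)
```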



Metrics for Performance
Evaluation…

                  PREDICTED CLASS
                  Class=Yes   Class=No
ACTUAL Class=Yes    a (TP)      b (FN)
CLASS  Class=No     c (FP)      d (TN)

 Most widely-used metric:

Accuracy = (a + d) / (a + b + c + d) = (TP + TN) / (TP + TN + FP + FN)



Limitation of Accuracy

 Consider a 2-class problem
  – Number of Class 0 examples = 9990
  – Number of Class 1 examples = 10

 If the model predicts everything to be class 0, accuracy is 9990/10000 = 99.9%
  – Accuracy is misleading because the model does not detect any class 1 example
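The imbalance argument above can be checked numerically; the 0/1 label encoding is an assumption for illustration:

```python
# 9990 class-0 examples, 10 class-1 examples; the model predicts all 0.
actual = [0] * 9990 + [1] * 10
predicted = [0] * 10000

accuracy = sum(a == p for a, p in zip(actual, predicted)) / len(actual)
detected_class_1 = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)

print(accuracy)           # 0.999
print(detected_class_1)   # 0 -- not a single class-1 example is found
```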



Cost Matrix

                  PREDICTED CLASS
C(i|j)            Class=Yes     Class=No
ACTUAL Class=Yes  C(Yes|Yes)    C(No|Yes)
CLASS  Class=No   C(Yes|No)     C(No|No)

C(i|j): Cost of misclassifying class j example as class i



Computing Cost of Classification

Cost Matrix:

              PREDICTED CLASS
C(i|j)          +      -
ACTUAL  +      -1    100
CLASS   -       1      0

Model M1:                       Model M2:

          PREDICTED CLASS                 PREDICTED CLASS
            +      -                        +      -
ACTUAL +  150     40            ACTUAL +  250     45
CLASS  -   60    250            CLASS  -    5    200

Accuracy = 80%                  Accuracy = 90%
Cost = 3910                     Cost = 4255
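Both costs on this slide can be reproduced by summing count × cost over matching cells; the nested-list layout (rows = actual, columns = predicted) is an illustrative encoding:

```python
def total_cost(confusion, cost):
    """Sum count * cost over the matching cells of two 2x2 matrices."""
    return sum(confusion[i][j] * cost[i][j] for i in range(2) for j in range(2))

# Rows: actual (+, -); columns: predicted (+, -)
cost = [[-1, 100],
        [ 1,   0]]
m1 = [[150,  40],
      [ 60, 250]]
m2 = [[250,  45],
      [  5, 200]]

print(total_cost(m1, cost))   # 3910
print(total_cost(m2, cost))   # 4255
```

Note that M2 has the higher accuracy (90% vs. 80%) yet also the higher cost, which is exactly the point of cost-sensitive evaluation.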
Cost vs Accuracy

Count:
                  PREDICTED CLASS
                  Class=Yes   Class=No
ACTUAL Class=Yes      a           b
CLASS  Class=No       c           d

Accuracy is proportional to cost if
1. C(Yes|No) = C(No|Yes) = q
2. C(Yes|Yes) = C(No|No) = p

N = a + b + c + d
Accuracy = (a + d) / N

Cost:
                  PREDICTED CLASS
                  Class=Yes   Class=No
ACTUAL Class=Yes      p           q
CLASS  Class=No       q           p

Cost = p(a + d) + q(b + c)
     = p(a + d) + q(N - a - d)
     = qN - (q - p)(a + d)
     = N[q - (q - p) × Accuracy]
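The identity can be spot-checked numerically; the confusion-matrix counts and the values p = 1, q = 4 below are arbitrary choices for illustration:

```python
# Check Cost = N * (q - (q - p) * Accuracy) on one confusion matrix.
a, b, c, d = 50, 15, 10, 25   # TP, FN, FP, TN (counts are illustrative)
p, q = 1, 4                    # C(Yes|Yes) = C(No|No) = p; both errors cost q

N = a + b + c + d
accuracy = (a + d) / N
cost_direct = p * (a + d) + q * (b + c)
cost_identity = N * (q - (q - p) * accuracy)

print(cost_direct, cost_identity)   # 175 175.0
```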



Cost-Sensitive Measures
Precision (p) = a / (a + c)

Recall (r) = a / (a + b)

F-measure (F) = 2rp / (r + p) = 2a / (2a + b + c)

 Precision is biased towards C(Yes|Yes) & C(Yes|No)
 Recall is biased towards C(Yes|Yes) & C(No|Yes)
 F-measure is biased towards all except C(No|No)

Weighted Accuracy = (w1·a + w4·d) / (w1·a + w2·b + w3·c + w4·d)
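A quick sketch of these three measures, using the same a, b, c cell names as above (the example counts are invented):

```python
def precision_recall_f(a, b, c):
    """Precision, recall, and F-measure from TP = a, FN = b, FP = c."""
    p = a / (a + c)
    r = a / (a + b)
    f = 2 * r * p / (r + p)        # algebraically equals 2a / (2a + b + c)
    return p, r, f

p, r, f = precision_recall_f(a=8, b=2, c=2)
print(round(p, 3), round(r, 3), round(f, 3))   # 0.8 0.8 0.8
```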

