
Classification

Lecture Notes for Chapters 4 & 5


Classification: Definition
 Given a collection of records (training set)
 Each record contains a set of attributes; one of the attributes is the class label
 Find a model for the class label as a function of the values of the other attributes
 Goal: previously unseen records should be assigned a class as accurately as possible
 A test set is used to estimate the accuracy of the model
 Usually, the given data set is divided into training and test sets: the training set is used to build the model and the test set is used to validate it
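To make the setup concrete, here is a minimal sketch (not part of the original notes) using scikit-learn; the iris data set simply stands in for "a collection of records with attributes and a class label", and the library and its DecisionTreeClassifier are assumed to be available:

    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.metrics import accuracy_score

    X, y = load_iris(return_X_y=True)                  # records: attribute values + class label
    # Divide the data set into training and test sets.
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

    model = DecisionTreeClassifier()                   # model for the class label as a function of the attributes
    model.fit(X_train, y_train)                        # build the model from the training set
    print(accuracy_score(y_test, model.predict(X_test)))   # validate on previously unseen records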
Illustrating Classification Task

Training Set (Tid, Attrib1, Attrib2, Attrib3, Class):
 1   Yes  Large   125K  No
 2   No   Medium  100K  No
 3   No   Small    70K  No
 4   Yes  Medium  120K  No
 5   No   Large    95K  Yes
 6   No   Medium   60K  No
 7   Yes  Large   220K  No
 8   No   Small    85K  Yes
 9   No   Medium   75K  No
 10  No   Small    90K  Yes

Test Set (Tid, Attrib1, Attrib2, Attrib3, Class):
 11  No   Small    55K  ?
 12  Yes  Medium   80K  ?
 13  Yes  Large   110K  ?
 14  No   Small    95K  ?
 15  No   Large    67K  ?

Flow: the Training Set is fed to a Learning algorithm (Induction: Learn Model), which produces a Model; the Model is then applied to the Test Set to predict the unknown class labels (Deduction).
Examples of Classification
 Predicting tumor cells as benign or malignant
 Classifying secondary structures of protein as alpha-helix, beta-sheet, or random coil
 Classifying credit card transactions as legitimate or fraudulent
 Categorizing news stories as finance, weather, entertainment, sports, etc. (e.g., Yahoo! news categories)
Classification Techniques
 “Decision Tree”-based Methods
 k Nearest Neighbors
 Rule-based Methods
 Case-based Reasoning
 Neural Networks
 Naïve Bayes and Bayesian Belief Networks
 Support Vector Machines
Example of a Decision Tree

Training Data (Tid, Refund, Marital Status, Taxable Income, Cheat):
 1   Yes  Single    125K  No
 2   No   Married   100K  No
 3   No   Single     70K  No
 4   Yes  Married   120K  No
 5   No   Divorced   95K  Yes
 6   No   Married    60K  No
 7   Yes  Divorced  220K  No
 8   No   Single     85K  Yes
 9   No   Married    75K  No
 10  No   Single     90K  Yes

Model: Decision Tree (splitting attributes: Refund, MarSt, TaxInc)
 Refund?
   Yes → NO
   No  → MarSt?
           Single, Divorced → TaxInc?
                                < 80K → NO
                                > 80K → YES
           Married → NO
Another Example (built from the same training data)
 MarSt?
   Married → NO
   Single, Divorced → Refund?
                        Yes → NO
                        No  → TaxInc?
                                < 80K → NO
                                > 80K → YES

There could be more than one tree that fits the same data!
Decision Tree Classification

The same framework as before, with a decision tree as the model: the Training Set (Tids 1-10) is fed to a Tree Induction algorithm (Induction: Learn Model), which produces a Decision Tree; the tree is then applied to the Test Set (Tids 11-15) to predict the missing class labels (Deduction).
Apply Model to Test Data

Test record: Refund = No, Marital Status = Married, Taxable Income = 80K, Cheat = ?

Using the tree from the earlier example (Refund → MarSt → TaxInc), start from the root of the tree and, at each internal node, follow the branch that matches the record:
 Refund? The record has Refund = No, so take the No branch to the MarSt node.
 MarSt? The record has Marital Status = Married, so take the Married branch.
 That branch ends in a leaf labeled NO, so assign Cheat to "No".
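A small Python sketch of this traversal (an illustration, not the notes' code); the nested-dictionary tree encoding and the classify() helper are assumptions made here:

    # Decision tree from the example as nested dicts: internal nodes name the
    # attribute tested and map branch outcomes to subtrees; leaves are class labels.
    tree = {
        "attr": "Refund",
        "branches": {
            "Yes": "No",                                  # leaf: Cheat = No
            "No": {
                "attr": "MarSt",
                "branches": {
                    "Married": "No",                      # leaf: Cheat = No
                    "Single": {"attr": "TaxInc", "branches": {"< 80K": "No", ">= 80K": "Yes"}},
                    "Divorced": {"attr": "TaxInc", "branches": {"< 80K": "No", ">= 80K": "Yes"}},
                },
            },
        },
    }

    def classify(record, node):
        """Follow branches until a leaf (a plain class-label string) is reached."""
        while isinstance(node, dict):
            attr = node["attr"]
            if attr == "TaxInc":                          # continuous attribute: threshold test
                key = "< 80K" if record["TaxInc"] < 80 else ">= 80K"
            else:                                         # categorical attribute: match the value
                key = record[attr]
            node = node["branches"][key]
        return node

    test_record = {"Refund": "No", "MarSt": "Married", "TaxInc": 80}   # Taxable Income in K
    print(classify(test_record, tree))                                 # -> No (assign Cheat = "No")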
Decision Tree Classification (recap)

Induction learns the Decision Tree model from the Training Set (Tids 1-10); Deduction applies that tree to the Test Set (Tids 11-15) to fill in the unknown class labels.
Decision Tree Induction
 Many Algorithms:
 Hunt’s Algorithm (one of the earliest)
 CART
 ID3, C4.5
 SLIQ, SPRINT
General Structure of Hunt's Algorithm

Let Dt be the set of training records that reach a node t.

 General Procedure:
 If Dt contains only records that belong to the same class yt, then t is a leaf node labeled as yt
 If Dt contains records that belong to more than one class, use an attribute test to split the data into smaller subsets
 Recursively apply the procedure to each subset

(Illustrated on the Refund / Marital Status / Taxable Income / Cheat training records, Tids 1-10.)
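A minimal recursive sketch of this procedure (an illustrative assumption, not the notes' pseudocode); choose_split stands for whatever attribute-test selection criterion is used, and must return None when no useful test remains, in which case a majority-class leaf is created:

    def hunt(records, choose_split):
        """Grow a decision tree with Hunt's procedure.
        records: list of (attributes_dict, class_label) pairs.
        choose_split: assumed helper returning (test_name, branch_fn) or None,
        where branch_fn maps a record's attributes to a branch key."""
        labels = [label for _, label in records]
        if len(set(labels)) == 1:                 # Dt contains a single class -> leaf labeled yt
            return labels[0]
        split = choose_split(records)
        if split is None:                         # nothing useful left to split on -> majority-class leaf
            return max(set(labels), key=labels.count)
        name, branch_fn = split
        subsets = {}
        for attrs, label in records:              # the attribute test partitions Dt into smaller subsets
            subsets.setdefault(branch_fn(attrs), []).append((attrs, label))
        # Recursively apply the procedure to each subset.
        return {name: {branch: hunt(subset, choose_split)
                       for branch, subset in subsets.items()}}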
Hunt's Algorithm (applied to the Cheat data)

 Step 1: all ten records are at the root; the majority class gives the initial prediction Don't Cheat.
 Step 2: split on Refund: Yes → Don't Cheat (pure leaf); No → still mixed (?), so split further.
 Step 3: under Refund = No, split on Marital Status: Married → Don't Cheat; Single, Divorced → still mixed (?).
 Step 4: under Single, Divorced, split on Taxable Income: < 80K → Don't Cheat; >= 80K → Cheat.
Tree Induction
 Greedy strategy
 Split the records based on an attribute test that optimizes a certain criterion
 Greedy algorithms work in phases
 At each phase, a decision is made that looks best for the current state, without regard for future consequences
 They can therefore get stuck in local optima (e.g., a step-by-step greedy choice need not find the globally shortest path)

 Issues
 Determine how to split the records
 How to specify the attribute test condition?
 How to determine the best split (e.g., which attribute)?
 Determine when to stop splitting
How to Specify Test Condition?
 Depends on attribute types
 Categorical
 Continuous

 Depends on number of ways to split
 2-way split
 Multi-way split
Splitting on Nominal Attributes
 Multi-way split: use as many partitions as there are distinct values
   CarType: {Family} | {Sports} | {Luxury}

 Binary split: divides the values into two subsets; need to find the optimal partitioning
   CarType: {Sports, Luxury} vs. {Family}   OR   {Family, Luxury} vs. {Sports}
Splitting on Ordinal Attributes
 Multi-way split: use as many partitions as there are distinct values
   Size: {Small} | {Medium} | {Large}

 Binary split: divides the values into two subsets
   Size: {Small, Medium} vs. {Large}   OR   {Medium, Large} vs. {Small}

 What about this split?  Size: {Small, Large} vs. {Medium} – it violates the order of the values, so it is not a valid split for an ordinal attribute
Splitting on Continuous Attributes
 Different ways of handling
 Binary Decision: (A < v) or (A >= v)
 consider all possible splits and find the best cut
 can be more computationally intensive
   Example (i), binary split: Taxable Income > 80K?  Yes / No


Splitting on Continuous Attributes
 Different ways of handling
 Discretization to form an ordinal categorical attribute
 Static – discretize once at the beginning
 Dynamic
   Example (ii), multi-way split: Taxable Income?  < 10K | [10K,25K) | [25K,50K) | [50K,80K) | > 80K
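For illustration, a tiny sketch (assumed code, not from the notes) that statically discretizes Taxable Income into the ordinal bins above; the cut points are just the slide's example boundaries:

    import bisect

    boundaries = [10_000, 25_000, 50_000, 80_000]                     # cut points in dollars
    bins = ["< 10K", "[10K,25K)", "[25K,50K)", "[50K,80K)", "> 80K"]  # ordinal categories

    def discretize(income):
        """Map a continuous income to its ordinal bin."""
        return bins[bisect.bisect_right(boundaries, income)]

    print(discretize(67_000))   # -> [50K,80K)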


Tree Induction
 Greedy strategy
 Split the records based on an attribute test that optimizes a certain criterion

 Issues
 Determine how to split the records
 How to specify the attribute test condition?
 How to determine the best split?
 Determine when to stop splitting
How to Determine the Best Split

Before splitting: 10 records of class C0, 10 records of class C1.

Candidate test conditions:
 Own Car?    Yes: C0 = 6, C1 = 4   No: C0 = 4, C1 = 6
 Car Type?   Family: C0 = 1, C1 = 3   Sports: C0 = 8, C1 = 0   Luxury: C0 = 1, C1 = 7
 Student ID? c1: C0 = 1, C1 = 0 ... c10: C0 = 1, C1 = 0   c11: C0 = 0, C1 = 1 ... c20: C0 = 0, C1 = 1

Which test condition is the best?
How to Determine the Best Split
 Greedy approach:
 Nodes with a homogeneous class distribution are preferred
 Need a measure of node impurity:
   C0 = 5, C1 = 5: non-homogeneous, high degree of impurity
   C0 = 9, C1 = 1: homogeneous, low degree of impurity
Measures of Node Impurity
 Gini Index

 Entropy

 Misclassification error
How to Find the Best Split

Before splitting, the parent node has class counts C0 = X, C1 = Y and impurity M0.

 Splitting on A? yields nodes N1 (counts X_A, Y_A) and N2 (counts X_!A, Y_!A) with impurities M1 and M2, combined (weighted by size) into M12.
 Splitting on B? yields nodes N3 (counts X_B, Y_B) and N4 (counts X_!B, Y_!B) with impurities M3 and M4, combined into M34.

Gain = M0 - M12 vs. M0 - M34: choose the test with the larger gain, i.e., the larger drop in impurity.
Measure of Impurity: GINI
 Gini Index for a given node t:

   GINI(t) = 1 - Σ_j [ p(j|t) ]^2

 (NOTE: p(j|t) is the relative frequency of class j at node t.)

 GINI measures impurity, so we want to minimize it
 Minimum (0.0) when all records belong to one class, implying the most interesting information
 Maximum when records are equally distributed among all classes, implying the least interesting information

Examples for Computing GINI

GINI(t) = 1 - Σ_j [ p(j|t) ]^2

 C1 = 0, C2 = 6:  P(C1) = 0/6 = 0, P(C2) = 6/6 = 1
   Gini = 1 - P(C1)^2 - P(C2)^2 = 1 - 0 - 1 = 0
 C1 = 1, C2 = 5:  P(C1) = 1/6, P(C2) = 5/6
   Gini = 1 - (1/6)^2 - (5/6)^2 = 0.278
 C1 = 2, C2 = 4:  P(C1) = 2/6, P(C2) = 4/6
   Gini = 1 - (2/6)^2 - (4/6)^2 = 0.444
 C1 = 3, C2 = 3:  P(C1) = 3/6, P(C2) = 3/6
   Gini = 1 - (3/6)^2 - (3/6)^2 = 0.5
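A short sketch of this computation (illustrative, not the notes' code) that reproduces the four examples above:

    def gini(counts):
        """Gini index of a node from its class counts: 1 minus the sum of squared relative frequencies."""
        n = sum(counts)
        return 1.0 - sum((c / n) ** 2 for c in counts)

    for counts in ([0, 6], [1, 5], [2, 4], [3, 3]):
        print(counts, round(gini(counts), 3))   # 0.0, 0.278, 0.444, 0.5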
Splitting Based on GINI
 When a node p is split into k partitions (children), the quality of the split is computed as

   GINI_split = Σ_{i=1..k} (n_i / n) · GINI(i)

 where n_i = number of records at child i, and n = number of records at node p.
Binary Attributes: Computing GINI Index
 Splits into two partitions
 Effect of weighting partitions: larger and purer partitions are sought

 Parent: C1 = 6, C2 = 6, Gini = 0.500
 Split on B?  →  Node N1 (Yes): C1 = 5, C2 = 2   Node N2 (No): C1 = 1, C2 = 4

   Gini(N1) = 1 - (5/7)^2 - (2/7)^2 = 0.408
   Gini(N2) = 1 - (1/5)^2 - (4/5)^2 = 0.320
   Gini(Children) = 7/12 × 0.408 + 5/12 × 0.320 = 0.371
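Continuing the same sketch (again an assumption, not the notes' code), the weighted Gini of the children for the B? split above:

    def gini(counts):
        n = sum(counts)
        return 1.0 - sum((c / n) ** 2 for c in counts)

    def gini_split(children):
        """GINI_split = sum over children of (n_i / n) * GINI(child i)."""
        n = sum(sum(child) for child in children)
        return sum(sum(child) / n * gini(child) for child in children)

    # N1: C1 = 5, C2 = 2    N2: C1 = 1, C2 = 4
    print(round(gini_split([(5, 2), (1, 4)]), 3))   # 0.371, down from the parent's 0.5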
Continuous Attributes: Computing Gini Index
 Use binary decisions based on one value (e.g., Taxable Income > 80K?  Yes / No)
 Several choices for the splitting value
 Number of possible splitting values = number of distinct values
 Each splitting value v has a count matrix associated with it: class counts in each of the two partitions, A < v and A >= v

(Illustrated on the Refund / Marital Status / Taxable Income / Cheat training records, Tids 1-10.)
Continuous Attributes: Computing Gini Index...
 For each continuous attribute:
 Sort the attribute values
 Choose a split position midway between any two adjacent values
 Linearly scan these values, each time updating the count matrix and computing the Gini index
 Choose the split position that has the least Gini index

Taxable Income example (records sorted by income):

 Cheat            No    No    No    Yes   Yes   Yes   No    No    No    No
 Sorted values    60    70    75    85    90    95    100   120   125   220
 Split positions     55    65    72    80    87    92    97    110   122   172   230
 Yes (<= | >)        0|3   0|3   0|3   0|3   1|2   2|1   3|0   3|0   3|0   3|0   3|0
 No  (<= | >)        0|7   1|6   2|5   3|4   3|4   3|4   3|4   4|3   5|2   6|1   7|0
 Gini                0.420 0.400 0.375 0.343 0.417 0.400 0.300 0.343 0.375 0.400 0.420

The smallest Gini (0.300) is obtained at split position 97, i.e., Taxable Income <= 97 vs. > 97.
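A sketch of this scan (assumed code, not from the notes) applied to the Taxable Income column; for clarity it re-partitions the records at every candidate threshold, whereas the slides' linear scan would instead update a running count matrix:

    incomes = [125, 100, 70, 120, 95, 60, 220, 85, 75, 90]                    # Taxable Income (in K), Tids 1-10
    cheats  = ["No", "No", "No", "No", "Yes", "No", "No", "Yes", "No", "Yes"]

    def gini(counts):
        total = sum(counts)
        return 1.0 - sum((c / total) ** 2 for c in counts)

    def best_split(values, labels):
        pairs = sorted(zip(values, labels))                    # sort the attribute values
        classes = sorted(set(labels))
        best = (float("inf"), None)                            # (weighted Gini, threshold)
        for i in range(len(pairs) - 1):
            v = (pairs[i][0] + pairs[i + 1][0]) / 2            # candidate: midway between adjacent values
            left  = [c for x, c in pairs if x <= v]
            right = [c for x, c in pairs if x > v]
            weighted = (len(left) * gini([left.count(c) for c in classes]) +
                        len(right) * gini([right.count(c) for c in classes])) / len(pairs)
            best = min(best, (weighted, v))
        return best

    print(best_split(incomes, cheats))   # -> (0.3, 97.5); the slides round the threshold to 97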
Another Example

 #   Outlook    Temperature  Humidity  Windy  Play
 1   sunny      100          high      no     N
 2   sunny      110          high      yes    N
 3   overcast   110          high      no     Y
 4   rainy      75           high      no     Y
 5   rainy      40           normal    no     Y
 6   rainy      40           normal    yes    N
 7   overcast   45           normal    yes    Y
 8   sunny      70           high      no     N
 9   sunny      40           normal    no     Y
 10  rainy      70           normal    no     Y
 11  sunny      70           normal    yes    Y
 12  overcast   70           high      yes    Y
 13  overcast   95           normal    no     Y
 14  rainy      65           high      yes    N
Decision Tree Induction


 Many Algorithms:
 Hunt’s Algorithm (one of the earliest)
 CART
 ID3, C4.5
 SLIQ, SPRINT
Decision Tree Induction: Training Dataset

This follows an example of Quinlan's ID3 (Playing Tennis):

 age    income  student  credit_rating  buys_computer
 <=30   high    no       fair           no
 <=30   high    no       excellent      no
 31…40  high    no       fair           yes
 >40    medium  no       fair           yes
 >40    low     yes      fair           yes
 >40    low     yes      excellent      no
 31…40  low     yes      excellent      yes
 <=30   medium  no       fair           no
 <=30   low     yes      fair           yes
 >40    medium  yes      fair           yes
 <=30   medium  yes      excellent      yes
 31…40  medium  no       excellent      yes
 31…40  high    yes      fair           yes
 >40    medium  no       excellent      no
Output: A Decision Tree for "buys_computer"

 age?
   <=30   → student?
              no  → no
              yes → yes
   31..40 → yes
   >40    → credit rating?
              excellent → no
              fair      → yes
Algorithm for Decision Tree Induction
 Basic algorithm (a greedy algorithm)
 Tree is constructed in a top-down, recursive, divide-and-conquer manner
 At the start, all the training examples are at the root
 Attributes are categorical (if continuous-valued, they are discretized in advance)
 Examples are partitioned recursively based on selected attributes
 Test attributes are selected on the basis of a heuristic or statistical measure (e.g., information gain)
 Conditions for stopping partitioning
 All samples for a given node belong to the same class
 There are no remaining attributes for further partitioning – majority voting is employed for classifying the leaf
 There are no samples left
Attribute Selection Measure: Information Gain (ID3/C4.5)
 Select the attribute with the highest information gain
 Let p_i be the probability that an arbitrary tuple in D belongs to class C_i, estimated by |C_i,D| / |D|
 Expected information (entropy) needed to classify a tuple in D:

   Info(D) = - Σ_{i=1..m} p_i log2(p_i)

 Information needed (after using A to split D into v partitions) to classify D:

   Info_A(D) = Σ_{j=1..v} (|D_j| / |D|) × Info(D_j)

 Information gained by branching on attribute A:

   Gain(A) = Info(D) - Info_A(D)
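These three formulas are easy to express directly; the following is an illustrative sketch (not from the notes):

    import math
    from collections import Counter

    def info(labels):
        """Info(D) = -sum of p_i * log2(p_i) over the class distribution of D."""
        n = len(labels)
        return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

    def info_after_split(partitions):
        """Info_A(D): size-weighted entropy of the partitions produced by attribute A."""
        n = sum(len(p) for p in partitions)
        return sum(len(p) / n * info(p) for p in partitions)

    def gain(labels, partitions):
        """Gain(A) = Info(D) - Info_A(D)."""
        return info(labels) - info_after_split(partitions)

    labels = ["yes"] * 9 + ["no"] * 5
    print(round(info(labels), 3))   # 0.94, i.e. Info(D) = I(9,5) from the next slide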
Attribute Selection: Information Gain
 Class P: buys_computer = "yes" (9 tuples); Class N: buys_computer = "no" (5 tuples)

   Info(D) = I(9,5) = -(9/14) log2(9/14) - (5/14) log2(5/14) = 0.940

 Splitting on age gives three partitions:

   age     p_i  n_i  I(p_i, n_i)
   <=30    2    3    0.971
   31…40   4    0    0
   >40     3    2    0.971

   Info_age(D) = (5/14) I(2,3) + (4/14) I(4,0) + (5/14) I(3,2) = 0.694

 Here (5/14) I(2,3) means that "age <= 30" covers 5 of the 14 samples, with 2 yes'es and 3 no's. Hence

   Gain(age) = Info(D) - Info_age(D) = 0.246

 Similarly,
   Gain(income) = 0.029
   Gain(student) = 0.151
   Gain(credit_rating) = 0.048

 so age, having the highest gain, is chosen as the splitting attribute at the root.
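For readers who want to check the arithmetic, a short assumed sketch (not from the notes) recomputes these gains from the buys_computer table; small differences in the last digit come from the slides rounding intermediate values:

    import math
    from collections import Counter, defaultdict

    rows = [  # (age, income, student, credit_rating, buys_computer), transcribed from the training table
        ("<=30", "high", "no", "fair", "no"),      ("<=30", "high", "no", "excellent", "no"),
        ("31..40", "high", "no", "fair", "yes"),   (">40", "medium", "no", "fair", "yes"),
        (">40", "low", "yes", "fair", "yes"),      (">40", "low", "yes", "excellent", "no"),
        ("31..40", "low", "yes", "excellent", "yes"), ("<=30", "medium", "no", "fair", "no"),
        ("<=30", "low", "yes", "fair", "yes"),     (">40", "medium", "yes", "fair", "yes"),
        ("<=30", "medium", "yes", "excellent", "yes"), ("31..40", "medium", "no", "excellent", "yes"),
        ("31..40", "high", "yes", "fair", "yes"),  (">40", "medium", "no", "excellent", "no"),
    ]
    attrs = ["age", "income", "student", "credit_rating"]

    def info(labels):
        n = len(labels)
        return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

    labels = [r[-1] for r in rows]
    for i, name in enumerate(attrs):
        parts = defaultdict(list)
        for r in rows:
            parts[r[i]].append(r[-1])              # partition class labels by this attribute's values
        g = info(labels) - sum(len(p) / len(rows) * info(p) for p in parts.values())
        print(name, round(g, 3))   # age 0.247, income 0.029, student 0.152, credit_rating 0.048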
