Decision Tree Part 1


Classification

BY

Mohd Vaseem
M.Tech, IIT Delhi
Pursuing PhD, IIT Kanpur
(Assistant Professor, NIFT Panchkula)
Classification
• A form of data analysis that extracts a model (classifier) to
predict class labels
– class labels are categorical (discrete or nominal)
– builds the classifier from a training set and the values of a
class-label attribute, then uses it to classify new data
• Numeric Prediction
– models continuous-valued functions, i.e., predicts
unknown or missing values
• Typical applications
– Credit/loan approval: loan application is “safe” or “risky”
– Medical diagnosis: tumor is “cancerous” or “benign”
– Fraud detection: transaction is “fraudulent”
Supervised vs. Unsupervised Learning
• Supervised learning (classification)
– Supervision: Training data is accompanied by labels
indicating the class of the observations
– New data is classified based on the training set

• Unsupervised learning (clustering)
– Class labels of training data are unknown
– Given a set of observations, the aim is to establish existence
of classes or clusters in the data
Classification— Two-Step Process
• Model construction: Describe a set of predetermined classes
– Each tuple is assumed to belong to a predefined class, as determined by
the class label attribute
– The model is represented as classification rules, decision trees, or
mathematical formulae

• Model usage: Classify future or unknown objects
– Estimate accuracy of the model
• The known label of each test sample is compared with the
classified result from the model
• Accuracy = percentage of test set samples that are correctly
classified by the model
• Test set is independent of training set (otherwise overfitting)
– If the accuracy is acceptable, use the model to classify new data
Phase 1: Model Construction
Training data is fed to a classification algorithm, which outputs the
classifier (model).

Training Data:

NAME  RANK            YEARS  TENURED
Mike  Assistant Prof  3      no
Mary  Assistant Prof  7      yes
Bill  Professor       2      yes
Jim   Associate Prof  7      yes
Dave  Assistant Prof  6      no
Anne  Associate Prof  3      no

Learned classifier (model):

IF rank = ‘professor’ OR years > 6
THEN tenured = ‘yes’
Phase 2: Model Usage
The learned classifier

IF rank = ‘professor’ OR years > 6
THEN tenured = ‘yes’

is first evaluated on testing data, then applied to unseen data.

Testing Data:

NAME     RANK            YEARS  TENURED
Tom      Assistant Prof  2      no
Merlisa  Associate Prof  7      no
George   Professor       5      yes
Joseph   Assistant Prof  7      yes

Unseen Data: (Jeff, Professor, 4) → Tenured?
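The two phases can be sketched in a few lines of Python. This is an illustrative toy, not the lecture's own code: the rule and the test tuples are transcribed from the slides above, and accuracy is computed as the percentage of test samples the rule classifies correctly.

```python
# Toy sketch of Phase 2: apply the learned rule to the testing data
# and estimate accuracy. Rule and data are taken from the slides.

def tenured_rule(rank, years):
    """IF rank = 'professor' OR years > 6 THEN tenured = 'yes'."""
    return "yes" if rank == "professor" or years > 6 else "no"

# Testing data: (rank, years, known true label)
test_set = [
    ("assistant prof", 2, "no"),
    ("associate prof", 7, "no"),   # the rule misclassifies Merlisa
    ("professor", 5, "yes"),
    ("assistant prof", 7, "yes"),
]

correct = sum(tenured_rule(r, y) == label for r, y, label in test_set)
accuracy = correct / len(test_set)
print(f"accuracy = {accuracy:.0%}")   # prints "accuracy = 75%"

# Once accuracy is acceptable, classify unseen data:
print(tenured_rule("professor", 4))   # Jeff -> "yes"
```

Note that the rule, built only from the training table, gets Merlisa wrong (7 years but not tenured), which is exactly why accuracy must be measured on a test set independent of the training set.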
Classification
• Data may be linearly separable or not linearly separable
Decision Trees
• A decision tree is a flowchart-like tree structure
• It divides the feature space by axis-aligned decision boundaries;
each rectangular region is labeled with one label
• Internal nodes test an attribute; branches are outcomes of the
test; leaf nodes hold class labels
• Decision trees can handle data that is not linearly separable

Example tree:

1. Width > 6.5 cm?
   Yes → 2. Height > 9.5 cm?
   No  → 3. Height > 6.0 cm?
Decision Trees
• Each root-to-leaf path is an If-Then rule:
– If Width > 6.5 cm AND Height > 9.5 cm THEN Lemon
– If Width > 6.5 cm AND Height ≤ 9.5 cm THEN Orange
– If Width ≤ 6.5 cm AND Height > 6.0 cm THEN Lemon
– If Width ≤ 6.5 cm AND Height ≤ 6.0 cm THEN Orange
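The four rules above transcribe directly into code. A minimal sketch (thresholds taken from the slide; the function name is our own):

```python
def classify_fruit(width, height):
    """Lemon-vs-orange decision tree with the slide's thresholds (cm)."""
    if width > 6.5:                    # root test: Width > 6.5 cm?
        return "Lemon" if height > 9.5 else "Orange"
    else:                              # left subtree: Width <= 6.5 cm
        return "Lemon" if height > 6.0 else "Orange"

print(classify_fruit(7.0, 10.0))  # Lemon
print(classify_fruit(7.0, 8.0))   # Orange
print(classify_fruit(6.0, 7.0))   # Lemon
print(classify_fruit(6.0, 5.0))   # Orange
```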
Example
• Whether a customer will wait for a table at a restaurant?
• Attributes:
1. Alternate: is there an alternative restaurant nearby?
2. Bar: is there a comfortable bar area to wait in?
3. Fri/Sat: is today Friday or Saturday?
4. Hungry: are we hungry?
5. Patrons: number of people in the restaurant (None, Some, Full)
6. Price: price range ($, $$, $$$)
7. Raining: is it raining outside?
8. Reservation: have we made a reservation?
9. Type: kind of restaurant (French, Italian, Thai, Burger)
10. Wait Estimate: estimated waiting time (0-10 min, 10-30, 30-60, >60)
Example
[Figure: candidate decision trees for the restaurant data, deciding
whether to wait (T) or not (F)]
Which Tree is Better?
What Makes a Good Tree?
• Not too big:
– computational efficiency (avoid redundant, spurious attributes)
– avoid overfitting training examples
– generalise well to new/unseen observations
– easy to understand and interpret
• Not too small:
– need to handle important but possibly subtle distinctions in data
• Occam's Razor: "the simplest explanation is most likely
the right one"
– find the simplest hypothesis (smallest tree) that fits the observations
Learning Decision Trees
• Learning the simplest (smallest) decision tree is an
NP-complete problem (Hyafil & Rivest, 1976)
• Resort to a greedy heuristic:
– Start from an empty decision tree
– Split on next best attribute
– Recurse
• What is best attribute?
• We use information theory to guide us
– ID3 (Iterative Dichotomiser) – Information Gain
– C4.5 – Gain Ratio
– Classification and Regression Trees (CART) – Gini index
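The three impurity measures listed above can be sketched in a few lines. This is an illustrative implementation from the standard formulas, not code from the lecture; the example labels are made up.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a label list (basis of ID3's information gain)."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini index of a label list (used by CART)."""
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

def information_gain(parent, groups):
    """Parent entropy minus size-weighted entropy of the child groups."""
    n = len(parent)
    return entropy(parent) - sum(len(g) / n * entropy(g) for g in groups)

# A 50/50 parent node split into two children of mixed purity:
parent = ["yes"] * 6 + ["no"] * 6
split = [["yes"] * 4 + ["no"], ["yes"] * 2 + ["no"] * 5]
print(round(information_gain(parent, split), 3))  # 0.196
```

C4.5's gain ratio additionally divides the gain by the entropy of the split itself, which penalizes attributes with many values.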
Decision Tree Learning Algorithm
• Simple, greedy, recursive approach, builds up tree
node-by-node
1. pick an attribute to split at a non-terminal node
2. split examples into groups based on attribute
value
3. for each group:
– if no examples - return majority from parent
– else if all examples in same class - return class
– else loop to Step 1
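Steps 1-3 above can be sketched as a short recursive function. A hedged toy version: for brevity it "picks" the first remaining attribute rather than scoring candidates, whereas a real learner would choose the best attribute by information gain or the Gini index; the data is invented.

```python
from collections import Counter

def majority(labels):
    """Most common class label in a list."""
    return Counter(labels).most_common(1)[0][0]

def build_tree(rows, attrs, parent_labels=None):
    """Greedy recursive tree builder. rows: list of (features_dict, label)."""
    labels = [lab for _, lab in rows]
    if not rows:                  # no examples: return majority from parent
        return majority(parent_labels)
    if len(set(labels)) == 1:     # all examples in same class: return it
        return labels[0]
    if not attrs:                 # no attributes left: majority vote
        return majority(labels)
    attr = attrs[0]               # step 1: pick an attribute to split on
    tree = {attr: {}}
    for v in {feats[attr] for feats, _ in rows}:   # step 2: split by value
        subset = [(f, l) for f, l in rows if f[attr] == v]
        tree[attr][v] = build_tree(subset, attrs[1:], labels)  # step 3: recurse
    return tree

data = [({"patrons": "some"}, "wait"),
        ({"patrons": "none"}, "leave"),
        ({"patrons": "some"}, "wait")]
print(build_tree(data, ["patrons"]))
```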
Choosing a Good Attribute
• Which attribute is better to split on, X1 or X2?

[Figure: candidate splits on X1 and X2; a split that yields a pure
node is preferred]

Idea:
1. use counts at leaves to define probability distributions, so we
can measure uncertainty
2. a good attribute splits the examples into subsets that are
(ideally) pure
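The idea of turning leaf counts into a probability distribution, and of a pure node, can be shown concretely. A small sketch with invented labels:

```python
from collections import Counter

def leaf_distribution(labels):
    """Turn the label counts at a leaf into a probability distribution."""
    n = len(labels)
    return {cls: cnt / n for cls, cnt in Counter(labels).items()}

def is_pure(labels):
    """A node is pure when every example carries the same class label."""
    return len(set(labels)) == 1

mixed = ["orange", "lemon", "orange", "orange"]
print(leaf_distribution(mixed))       # {'orange': 0.75, 'lemon': 0.25}
print(is_pure(["lemon", "lemon"]))    # True: zero uncertainty
```

A pure node has all probability mass on one class, so its uncertainty (entropy) is zero; mixed leaves like the one above still leave the class in doubt.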
