Module 1: AI Fundamentals
Lecture 6: Decision Trees
Courtesy: Dr. Ahsan Sadaat, Dr. Faisal Shafait and Dr. Adnan ul Hasan
Supervised Learning
- Regression
- Classification
Classification
• Supervised learning: classification is supervised learning from examples.
• Supervision: the data (observations, measurements, etc.) are labeled with pre-defined classes.
• Test data are classified into these same classes.
• Goal: learn a classification model from the data that can be used to predict the classes of new (future, or test) cases/instances.
Supervised learning process: two steps
Learning (training): learn a model using the training data.
Testing: test the model using unseen test data to assess the model accuracy.
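As an illustration of these two steps, here is a minimal sketch using scikit-learn's decision tree classifier (the iris dataset and the 70/30 split are illustrative choices, not from the lecture):

    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.metrics import accuracy_score

    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

    # Step 1 (learning): fit a model on the training data
    model = DecisionTreeClassifier().fit(X_train, y_train)

    # Step 2 (testing): assess accuracy on unseen test data
    print(accuracy_score(y_test, model.predict(X_test)))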
An example
• Data: Loan application data
• Task: Predict whether a loan should be approved or not.
• Performance measure: accuracy.
[Figure: a model is learned from a labeled training set (Tid, Attrib1, Attrib2, Attrib3, Class) and then applied to a test set whose class labels are unknown]
Examples of Classification Task
• Predicting tumor cells as benign or malignant
Techniques used for classification include:
• Neural Networks
• Rule-based Methods
Decision Tree Learning
• Decision tree learning is one of the most widely used techniques for classification: its accuracy is competitive with other methods, and it is very efficient.

Training data:

Tid  Refund  Marital Status  Taxable Income  Cheat
 1   Yes     Single          125K            No
 2   No      Married         100K            No
 3   No      Single          70K             No
 4   Yes     Married         120K            No
 5   No      Divorced        95K             Yes
 6   No      Married         60K             No
 7   Yes     Divorced        220K            No
 8   No      Single          85K             Yes
 9   No      Married         75K             No
10   No      Single          90K             Yes

One decision tree that fits this data:

Refund?
  Yes -> NO
  No  -> MarSt?
           Single, Divorced -> TaxInc?
                                 < 80K -> NO
                                 > 80K -> YES
           Married -> NO

There could be more than one tree that fits the same data (for example, one that tests MarSt at the root instead).
Apply Model to Test Data

Test record:

Refund  Marital Status  Taxable Income  Cheat
No      Married         80K             ?

Start from the root of the tree and follow the branches that match the record:
• Refund = No, so take the No branch to the MarSt node.
• Marital Status = Married, so take the Married branch, which leads to the leaf NO.
• Assign Cheat to "No".
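A minimal sketch of this walk in Python, with the tree above hard-coded as nested conditionals (the field names and dict-based record format are illustrative):

    def classify(record):
        # Root node: Refund
        if record["Refund"] == "Yes":
            return "No"                    # leaf: NO
        # Refund = No: test marital status
        if record["MaritalStatus"] == "Married":
            return "No"                    # leaf: NO
        # Single or Divorced: test taxable income
        return "Yes" if record["TaxableIncome"] > 80 else "No"

    test = {"Refund": "No", "MaritalStatus": "Married", "TaxableIncome": 80}
    print(classify(test))  # "No" -> assign Cheat to "No"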
Decision Tree Classification Task
[Figure: a decision tree is induced from the training set (Tid, Attrib1, Attrib2, Attrib3, Class) and then applied to the test set to predict its unknown class labels]
Example: Weather Dataset
Which attribute to select?
Criterion for attribute/feature selection
• Heuristic: choose the attribute that produces the "purest" child nodes; a popular impurity measure for this is information gain, defined below.
Final decision tree
[Figure: the final decision tree learned from the weather dataset]
Computing Entropy
Entropy measures the impurity of a set S of examples:

    Entropy(S) = - sum_i p_i * log2(p_i)

where p_i is the proportion of examples in S belonging to class i. Entropy is 0 for a pure set and maximal (1 bit, for two equally frequent classes) for a uniform class distribution.
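A minimal sketch of this computation in Python (the class counts 9 "yes" / 5 "no" used in the example are the standard play-tennis figures, assumed here for illustration):

    from math import log2

    def entropy(counts):
        """Entropy of a class distribution given as a list of class counts."""
        total = sum(counts)
        return -sum((c / total) * log2(c / total) for c in counts if c > 0)

    print(entropy([9, 5]))  # ~0.940 bits for a 9-vs-5 class split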
Information Gain
• The information gain is based on the decrease in entropy after a dataset is split on an attribute:

    Gain(S, A) = Entropy(S) - sum_v (|S_v| / |S|) * Entropy(S_v)

where S_v is the subset of S for which attribute A takes value v.
Example: Weather dataset
• Step 1: Calculate the entropy of the target, i.e., of the class distribution of the whole dataset before any split.
Example: Weather dataset
• Step 2: The dataset is then split on each of the different attributes, and the entropy of each branch is calculated.
• The branch entropies are then added proportionally (weighted by branch size) to get the total entropy for the split.
• The resulting entropy is subtracted from the entropy before the split.
• The result is the information gain, or decrease in entropy (see the sketch below).
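A sketch of Steps 2 and 3 in Python; rows are dicts mapping attribute names to values, and the helper names are illustrative:

    from math import log2
    from collections import Counter

    def entropy(labels):
        n = len(labels)
        return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

    def information_gain(rows, labels, attribute):
        """Entropy before the split minus the weighted entropy after it."""
        after = 0.0
        for v in set(row[attribute] for row in rows):
            branch = [lab for row, lab in zip(rows, labels) if row[attribute] == v]
            after += (len(branch) / len(labels)) * entropy(branch)
        return entropy(labels) - after

Step 3 then amounts to picking the attribute that maximizes this quantity, e.g. max(attributes, key=lambda a: information_gain(rows, labels, a)).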
Example: Weather dataset
• Step 3: Choose the attribute with the largest information gain as the decision node.
Example: Weather dataset
• Step 4a: A branch with entropy 0 is a leaf node.
Example: Weather dataset
• Step 4b: A branch with entropy greater than 0 needs further splitting.
Example: Weather dataset
• Step 5: The ID3 algorithm is run recursively on the non-leaf branches until all data is classified (a compact sketch follows).
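A compact sketch of the recursive ID3 procedure (Steps 1-5) in Python. Here rows is a list of dicts mapping attribute names to values, labels the corresponding class labels, and attrs the attributes still available; all names are illustrative:

    from math import log2
    from collections import Counter

    def entropy(labels):
        n = len(labels)
        return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

    def gain(rows, labels, attr):
        after = 0.0
        for v in set(r[attr] for r in rows):
            branch = [lab for r, lab in zip(rows, labels) if r[attr] == v]
            after += (len(branch) / len(labels)) * entropy(branch)
        return entropy(labels) - after

    def id3(rows, labels, attrs):
        # Step 4a: a pure (zero-entropy) set becomes a leaf;
        # with no attributes left, fall back to the majority class.
        if len(set(labels)) == 1 or not attrs:
            return Counter(labels).most_common(1)[0][0]
        # Steps 2-3: split on the attribute with the largest information gain
        best = max(attrs, key=lambda a: gain(rows, labels, a))
        tree = {best: {}}
        for v in set(r[best] for r in rows):
            idx = [i for i, r in enumerate(rows) if r[best] == v]
            # Steps 4b and 5: recurse on branches with entropy > 0
            tree[best][v] = id3([rows[i] for i in idx],
                                [labels[i] for i in idx],
                                [a for a in attrs if a != best])
        return tree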
Note: the information gain is the difference between two entropies (before and after the split), and the best split is the one that maximizes this difference.

Happy Learning!