
High Impact Skills Development Program

in Artificial Intelligence, Data Science, and Blockchain

Module 1: AI Fundamentals
Lecture 6: Decision Trees

Instructor: Dr Syed Imran Ali


Assistant Professor, SEECS, NUST

Courtesy: Dr. Ahsan Sadaat, Dr. Faisal Shafait and Dr. Adnan ul Hasan
Supervised Learning
- Regression
- Classification

Classification

• Humans learn from past experiences.

• A computer does not have "experiences".
• A computer system learns from data, which represent some "past
experiences" of an application domain.
• Our focus: learn a target function that can be used to predict the
values of a discrete class attribute, e.g., approved or not-approved,
and high-risk or low-risk.
• The task is commonly called supervised learning, classification, or
inductive learning.

Classification
• Supervised learning: classification is seen as supervised learning
from examples.
• Supervision: The data (observations, measurements, etc.) are
labeled with pre-defined classes.
• Test data are classified into these classes too.
• Goal: To learn a classification model from the data that can be used
to predict the classes of new (future, or test) cases/instances
Supervised learning process: two steps
 Learning (training): Learn a model using the training data
 Testing: Test the model using unseen test data to assess the
model accuracy

Accuracy = Number of correct classifications / Total number of test cases
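As a quick illustration (not from the slides), this accuracy measure can be computed directly from a model's predictions; the labels below are made-up placeholders:

# Minimal sketch: accuracy = correct classifications / total test cases.
def accuracy(predictions, true_labels):
    correct = sum(p == t for p, t in zip(predictions, true_labels))
    return correct / len(true_labels)

# Hypothetical predictions and ground-truth labels for five test cases.
preds = ["Yes", "No", "Yes", "Yes", "No"]
truth = ["Yes", "No", "No",  "Yes", "No"]
print(accuracy(preds, truth))  # 0.8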
An example: data (loan application)
(Loan-application data table shown on the slide; the class attribute is "Approved or not".)
An example
• Data: Loan application data
• Task: Predict whether a loan should be approved or not.
• Performance measure: accuracy.

• No learning: classify all future applications (test data) to the majority class (i.e., Yes): Accuracy = 9/15 = 60%.
• We can do better than 60% with learning.
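A tiny sketch of this no-learning baseline; the 15 labels below simply mirror the 9-Yes / 6-No split quoted above, not the actual table:

# Majority-class baseline: predict "Yes" for every application.
labels = ["Yes"] * 9 + ["No"] * 6               # 9 approved, 6 not approved
majority = max(set(labels), key=labels.count)   # "Yes"
baseline_accuracy = labels.count(majority) / len(labels)
print(majority, baseline_accuracy)              # Yes 0.6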
Classification: Definition
• Given a collection of records (the training set).
• Each record contains a set of attributes; one of the attributes is the
class.
• Find a model for class attribute as a function of the values of other
attributes.
• Goal: previously unseen records should be assigned a class as accurately
as possible.
• A test set is used to determine the accuracy of the model. Usually, the
given data set is divided into training and test sets, with training set
used to build the model and test set used to validate it.
Illustrating Classification Task
Training Set:

Tid  Attrib1  Attrib2  Attrib3  Class
1    Yes      Large    125K     No
2    No       Medium   100K     No
3    No       Small    70K      No
4    Yes      Medium   120K     No
5    No       Large    95K      Yes
6    No       Medium   60K      No
7    Yes      Large    220K     No
8    No       Small    85K      Yes
9    No       Medium   75K      No
10   No       Small    90K      Yes

Induction: the training set is fed to a learning algorithm, which learns a model.

Test Set:

Tid  Attrib1  Attrib2  Attrib3  Class
11   No       Small    55K      ?
12   Yes      Medium   80K      ?
13   Yes      Large    110K     ?
14   No       Small    95K      ?
15   No       Large    67K      ?

Deduction: the learned model is applied to the test set to predict the unknown class labels.
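A minimal sketch of this induction/deduction workflow in Python, assuming we encode the categorical attributes numerically and use scikit-learn's DecisionTreeClassifier (the library and encoding are our choices, not the slides'):

from sklearn.tree import DecisionTreeClassifier

# Training set: Attrib1 (Yes=1/No=0), Attrib2 (Small=0/Medium=1/Large=2), Attrib3 (in K).
X_train = [[1, 2, 125], [0, 1, 100], [0, 0, 70], [1, 1, 120], [0, 2, 95],
           [0, 1, 60], [1, 2, 220], [0, 0, 85], [0, 1, 75], [0, 0, 90]]
y_train = ["No", "No", "No", "No", "Yes", "No", "No", "Yes", "No", "Yes"]

# Induction: learn the model from the training set.
model = DecisionTreeClassifier(criterion="entropy").fit(X_train, y_train)

# Deduction: apply the model to the (unlabeled) test set.
X_test = [[0, 0, 55], [1, 1, 80], [1, 2, 110], [0, 0, 95], [0, 2, 67]]
print(model.predict(X_test))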
Examples of Classification Task
• Predicting tumor cells as benign or malignant

• Classifying credit card transactions as legitimate or fraudulent

• Classifying secondary structures of protein as alpha-helix, beta-sheet, or random coil

• Categorizing news stories as finance, weather, entertainment, sports, etc.
Classification Techniques
• Decision Tree based Methods

• Naïve Bayes and Bayesian Belief Networks

• Memory based reasoning

• Neural Networks

• Support Vector Machines

• Rule-based Methods
Decision Tree Learning

• Decision tree learning is one of the most widely used techniques for classification.

• Its classification accuracy is competitive with other methods, and it is very efficient.

• The classification model is a tree, called a decision tree.
Example of a Decision Tree

Training Data:

Tid  Refund  Marital Status  Taxable Income  Cheat
1    Yes     Single          125K            No
2    No      Married         100K            No
3    No      Single          70K             No
4    Yes     Married         120K            No
5    No      Divorced        95K             Yes
6    No      Married         60K             No
7    Yes     Divorced        220K            No
8    No      Single          85K             Yes
9    No      Married         75K             No
10   No      Single          90K             Yes

Model: Decision Tree (splitting attributes: Refund, MarSt, TaxInc)

Refund?
  Yes -> NO
  No  -> MarSt?
           Single, Divorced -> TaxInc?
                                 < 80K -> NO
                                 > 80K -> YES
           Married          -> NO
Another Example of Decision Tree

MarSt?
  Married          -> NO
  Single, Divorced -> Refund?
                        Yes -> NO
                        No  -> TaxInc?
                                 < 80K -> NO
                                 > 80K -> YES

(Same training data as on the previous slide.)

There could be more than one tree that fits the same data!
Decision Tree Classification Task

Apply Model to Test Data

Test Data:

Refund  Marital Status  Taxable Income  Cheat
No      Married         80K             ?

Start from the root of the tree and follow the branch that matches the test record at each node:

1. Root node: Refund? The record has Refund = No, so take the "No" branch to MarSt.
2. Node MarSt: the record has Marital Status = Married, so take the "Married" branch.
3. The "Married" branch ends in the leaf NO.

Assign Cheat = "No" to the test record.
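The same traversal can be written as a small function; this is just an illustrative sketch of the tree above, not code from the lecture:

# Encode the example decision tree as nested conditionals.
def classify(refund, marital_status, taxable_income):
    if refund == "Yes":
        return "No"                     # leaf: NO
    if marital_status == "Married":
        return "No"                     # leaf: NO
    # Single or Divorced: decide on taxable income (in K).
    return "Yes" if taxable_income > 80 else "No"

# Test record: Refund = No, Marital Status = Married, Taxable Income = 80K.
print(classify("No", "Married", 80))    # -> "No"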
Decision Tree Classification Task

(Same workflow as before: the training set is fed to a tree-induction algorithm, which learns a decision-tree model; the model is then applied to the test set to predict the unknown class labels.)
Example: Weather Dataset

Which attribute to select?

Criterion for attribute/ feature selection

 Which is the best attribute?
 Want to get the smallest tree.
 Heuristic: choose the attribute that produces the "purest" nodes.
Final decision tree

Splitting stops when data can’t be split any further


Entropy
• A decision tree is built top-down from a root node and involves partitioning the data into subsets that contain instances with similar values (homogeneous).

• The ID3 algorithm uses entropy to calculate the homogeneity of a sample.

• If the sample is completely homogeneous, the entropy is zero; if the sample is equally divided between the classes, the entropy is one.
Computing Entropy
• To build a decision tree, we need to calculate two types of entropy values using frequency tables, as follows:
• Entropy using the frequency table of one attribute (the entropy of the target/class).
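A small sketch of this first entropy computation, assuming the standard 14-instance weather (play-tennis) dataset used in this lecture, whose class column contains 9 "yes" and 5 "no" values:

from math import log2

# Entropy of the target attribute from its class frequency counts.
def entropy(counts):
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

# Class frequencies for Play: 9 x "yes", 5 x "no" (standard weather data).
print(entropy([9, 5]))   # ~0.940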
Computing Entropy

• Entropy using the frequency table of two attributes.
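A sketch of this second computation, again assuming the standard weather data: the frequency table below holds the Play counts (yes, no) for each Outlook value, and the result is the weighted entropy of the split.

from math import log2

def entropy(counts):
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

# Frequency table of two attributes: Play counts (yes, no) per Outlook value.
freq = {"Sunny": (2, 3), "Overcast": (4, 0), "Rainy": (3, 2)}

total = sum(sum(c) for c in freq.values())                      # 14 instances
split_entropy = sum(sum(c) / total * entropy(c) for c in freq.values())
print(split_entropy)   # ~0.693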
Information Gain
• The information gain is based on the decrease in entropy after a dataset is split on an attribute.

• Constructing a decision tree is all about finding the attribute that returns the highest information gain (i.e., the most homogeneous branches).
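Putting the two entropy values together (a sketch under the same weather-data assumption as above): the gain for an attribute is the class entropy minus the weighted entropy after splitting on that attribute.

from math import log2

def entropy(counts):
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

def info_gain(class_counts, freq_table):
    total = sum(class_counts)
    split = sum(sum(c) / total * entropy(c) for c in freq_table.values())
    return entropy(class_counts) - split

# Gain(Play, Outlook) = 0.940 - 0.693 ~ 0.247 on the standard weather data.
print(info_gain([9, 5], {"Sunny": (2, 3), "Overcast": (4, 0), "Rainy": (3, 2)}))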
Example: Weather dataset

Example: Weather dataset
• Step 2: The dataset is then split on the different attributes, and the entropy for each branch is calculated.
• The branch entropies are added proportionally (weighted by branch size) to get the total entropy of the split.
• The resulting entropy is subtracted from the entropy before the split.
• The result is the information gain, i.e., the decrease in entropy.
Example: Weather dataset

Example: Weather dataset
• Step 3: Choose the attribute with the largest information gain as the decision node.
Example: Weather dataset
• Step 4a: A branch with entropy of 0 is a leaf node

Example: Weather Dataset

Example: Weather dataset
• Step 4b: A branch with entropy of more than 0 needs
further splitting

Example: Weather dataset
• Step 5: The ID3 algorithm is run recursively on the
non-leaf branches, until all data is classified.

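For completeness, a compact and simplified ID3-style sketch of the procedure described in Steps 2-5; it handles only categorical attributes, and the dataset at the bottom is a made-up fragment (not the full weather table) just to run the sketch:

from math import log2
from collections import Counter

def entropy(labels):
    total = len(labels)
    return -sum(n / total * log2(n / total) for n in Counter(labels).values())

def info_gain(rows, labels, attr):
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    split = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - split

def id3(rows, labels, attrs):
    if len(set(labels)) == 1:            # pure branch -> leaf node
        return labels[0]
    if not attrs:                        # no attributes left -> majority leaf
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: info_gain(rows, labels, a))
    tree = {best: {}}
    for value in {row[best] for row in rows}:
        sub = [(r, l) for r, l in zip(rows, labels) if r[best] == value]
        sub_rows, sub_labels = zip(*sub)
        tree[best][value] = id3(list(sub_rows), list(sub_labels),
                                [a for a in attrs if a != best])
    return tree

# Hypothetical mini-dataset (attribute dicts plus class labels).
rows = [{"Outlook": "Sunny", "Windy": "False"}, {"Outlook": "Sunny", "Windy": "True"},
        {"Outlook": "Overcast", "Windy": "False"}, {"Outlook": "Rainy", "Windy": "True"}]
labels = ["no", "no", "yes", "no"]
print(id3(rows, labels, ["Outlook", "Windy"]))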
Note: the information gain is the difference between two entropies (the entropy before the split and the weighted entropy after the split), and we choose the attribute for which this difference is maximum.

Happy Learning!