Chapter#03 Supervised Learning and Its Algorithms - III

The document discusses the ID3 decision tree algorithm and the process for building a decision tree classifier. It explains that ID3 iteratively selects the attribute that best splits the data using information gain. The attribute with the highest information gain is chosen at each node to partition the data. Entropy and information gain are used to measure the impurity of a split and determine the attribute to select as the decision node. Examples are provided to demonstrate calculating information gain and entropy for different attributes to determine the optimal attribute to use at each split in the decision tree.


CHAPTER#03

SUPERVISED LEARNING & ITS ALGORITHMS


PART # 03

COURSE INSTRUCTORS:
 DR. MUHAMMAD NASEEM
 ENGR. FARHEEN QAZI
 MS. RUQQIYA AZIZ

DEPARTMENT OF SOFTWARE ENGINEERING


SIR SYED UNIVERSITY OF ENGINEERING & TECHNOLOGY
AGENDA

 Decision Tree Algorithm


 Algorithms used in Decision Tree
  ID3
 History
 Process of ID3
 Entropy and Information Gain
 Summary
DECISION TREE ALGORITHM
 The Decision Tree algorithm belongs to the family of supervised learning algorithms.
 Unlike many other supervised learning algorithms, the decision tree algorithm can be used for solving both regression and classification problems.
 The goal of using a Decision Tree is to create a training model that can be used to predict the class or value of the target variable by learning simple decision rules inferred from prior data (training data).
 In Decision Trees, to predict a class label for a record we start from the root of the tree and compare the value of the root attribute with the record's attribute.
 On the basis of this comparison, we follow the branch corresponding to that value and jump to the next node.
ALGORITHMS USED IN DECISION TREE

Let us look at some algorithms used in Decision Trees:


 ID3 → (Iterative Dichotomiser 3)
 C4.5 → (successor of ID3)
 CART → (Classification And Regression Trees)
 CHAID → (Chi-square Automatic Interaction Detection; performs multi-level splits when computing classification trees)
 MARS → (Multivariate Adaptive Regression Splines)
ID3
(ITERATIVE DICHOTOMISER 3)
WHAT IS THE ID3 ALGORITHM?

 ID3 is one of the most common decision tree algorithms.

 ID3 stands for Iterative Dichotomiser 3 (dichotomisation means dividing into two completely opposite things).
 The algorithm iteratively divides the attributes into two groups, the most dominant attribute and the rest, in order to construct the tree.
 It is used to generate a decision tree from a dataset.
 ID3 is a precursor to the C4.5 Algorithm.
HISTORY

 The ID3 algorithm was invented by Ross Quinlan.


 Quinlan was a computer science researcher in data mining and decision theory.
 He received his doctorate in computer science from the University of Washington in 1968.
DECISION TREE

 Classifies data using the attributes


 The tree consists of decision nodes and leaf nodes.
 Nodes can have two or more branches, which represent the values of the attribute tested.
 Leaf nodes produce a homogeneous result (a single class).
THE PROCESS (BUILDING THE TREE)

 At each node, we need to find the attribute that best divides the data into the target classes (e.g. Yes and No).
 Take all unused attributes and calculate their entropies.
 Choose the attribute for which entropy is minimum or, equivalently, for which information gain is maximum.
 The attribute with the highest information gain is selected at each node (see the sketch below).
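
The slides do not include code, but a minimal Python sketch of this selection step could look as follows. The helper names (entropy, information_gain, best_attribute) and the list-of-dictionaries data layout are illustrative assumptions, not part of the lecture material.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def information_gain(rows, attribute, target):
    """Entropy of the whole set minus the weighted entropy of the
    subsets produced by splitting on `attribute`."""
    total = len(rows)
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        remainder += (len(subset) / total) * entropy(subset)
    return base - remainder

def best_attribute(rows, attributes, target):
    """Attribute with the highest information gain (lowest remaining entropy)."""
    return max(attributes, key=lambda a: information_gain(rows, a, target))
```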
ENTROPY & INFORMATION GAIN

 In ID3, entropy is used to measure how informative a node is.

 It is observed that splitting on any attribute has the property that the average entropy of the resulting training subsets will be less than or equal to that of the previous training set.

 The ID3 algorithm defines a measure of a split, called Information Gain, to determine the goodness of the split.

 The attribute with the largest value of information gain is chosen as the splitting attribute, and

 the training set is partitioned into a number of smaller training sets based on the distinct values of the attribute under split.
Example-1
Question#01: Using the Decision Tree algorithm and the given table, classify the given tuple (T); also draw the tree, displaying the parent node and child nodes.

ID   AGE   COMPETITION   TYPE       PROFIT
1    Old   Y             Software   Down
2    Old   N             Software   Down
3    Old   N             Hardware   Down
4    Mid   Y             Software   Down
5    Mid   Y             Hardware   Down
6    Mid   N             Hardware   Up
7    Mid   N             Software   Up
8    New   Y             Software   Up
9    New   N             Hardware   Up
10   New   N             Software   Up
CONTD…..
 Step-1: Determine the Entropy for the positive class (P) and the negative class (N). Note: there may be more than two classes, depending on the class attribute; the entropy then sums over all of them.

Entropy(Class) = -\frac{P}{N+P}\log_2\frac{P}{N+P} - \frac{N}{N+P}\log_2\frac{N}{N+P}   --- (1)

 According to the table, P = 5 and N = 5, because there are 5 Up and 5 Down records.

Entropy(Class) = -\frac{5}{5+5}\log_2\frac{5}{5+5} - \frac{5}{5+5}\log_2\frac{5}{5+5}

Entropy(Class) = 1
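
As a quick standalone check of Step-1 (an illustrative sketch, not taken from the slides), the class entropy for P = 5 and N = 5 can be computed directly:

```python
import math

P, N = 5, 5  # 5 "Up" and 5 "Down" records in the table
entropy_class = -(P / (P + N)) * math.log2(P / (P + N)) \
                - (N / (P + N)) * math.log2(N / (P + N))
print(entropy_class)  # 1.0 -> maximum impurity: both classes are equally likely
```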
CONTD…..
Step-2: For each feature attribute (Age, Competition and Type), determine I(P_i, N_i), the entropy of the subset of records having each attribute value:

I(P_i, N_i) = -\frac{P_i}{P_i+N_i}\log_2\frac{P_i}{P_i+N_i} - \frac{N_i}{P_i+N_i}\log_2\frac{N_i}{P_i+N_i}   --- (2)

Table-1: (Age)

Value   P_i   N_i   I(P_i, N_i)
Old     0     3     0
Mid     2     2     1
New     3     0     0

I(P_Old, N_Old) = -\frac{0}{0+3}\log_2\frac{0}{0+3} - \frac{3}{0+3}\log_2\frac{3}{0+3} = 0

I(P_Mid, N_Mid) = -\frac{2}{2+2}\log_2\frac{2}{2+2} - \frac{2}{2+2}\log_2\frac{2}{2+2} = 1

I(P_New, N_New) = -\frac{3}{3+0}\log_2\frac{3}{3+0} - \frac{0}{3+0}\log_2\frac{0}{3+0} = 0

(The 0 \log_2 0 terms are taken as 0.)
CONTD…..

Step-3: Now determine the Entropy of the feature Age:

Entropy(Age) = \sum_i \frac{P_i + N_i}{P + N} \, I(P_i, N_i)   --- (3)

From Table-1,

Entropy(Age) = \frac{0+3}{5+5}(0) + \frac{2+2}{5+5}(1) + \frac{3+0}{5+5}(0)

Entropy(Age) = 0.4


CONTD….

Step-4: Now determine the Gain:

Gain(Age) = Entropy(Class) - Entropy(Age)

Gain(Age) = 1 - 0.4

Gain(Age) = 0.6
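
Steps 2 to 4 for the Age attribute can be reproduced in a few lines of Python. This is an illustrative sketch: the subset_entropy helper is an assumption, and the 0·log 0 terms are treated as 0, exactly as in the hand calculation.

```python
import math

def subset_entropy(p, n):
    """I(P_i, N_i): entropy of one attribute-value subset, with 0*log2(0) = 0."""
    total = p + n
    result = 0.0
    for count in (p, n):
        if count:
            result -= (count / total) * math.log2(count / total)
    return result

# (P_i, N_i) for Age = Old, Mid, New, taken from Table-1
age_counts = {"Old": (0, 3), "Mid": (2, 2), "New": (3, 0)}
total = 10

entropy_age = sum(((p + n) / total) * subset_entropy(p, n)
                  for p, n in age_counts.values())
gain_age = 1.0 - entropy_age   # Entropy(Class) = 1 from Step-1
print(entropy_age, gain_age)   # 0.4 0.6
```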
CONTD….

Applying steps 2, 3 and 4 to Competition:

Table-2: (Competition)

Value   P_i   N_i   I(P_i, N_i)
Y       1     3     0.81127
N       4     2     0.918295

I(P_Y, N_Y) = -\frac{1}{1+3}\log_2\frac{1}{1+3} - \frac{3}{1+3}\log_2\frac{3}{1+3} = 0.81127

I(P_N, N_N) = -\frac{4}{4+2}\log_2\frac{4}{4+2} - \frac{2}{4+2}\log_2\frac{2}{4+2} = 0.918295
CONTD….
From Table-2,

Entropy(Competition) = \frac{1+3}{5+5}(0.81127) + \frac{4+2}{5+5}(0.918295)

Entropy(Competition) = 0.8754

Now determine the Gain:

Gain(Competition) = Entropy(Class) - Entropy(Competition)

Gain(Competition) = 1 - 0.8754

Gain(Competition) = 0.1245
CONTD….

 Applying steps 2, 3 and 4 to Type:

Table-3: (Type)

Value      P_i   N_i   I(P_i, N_i)
Software   3     3     1
Hardware   2     2     1

I(P_Software, N_Software) = -\frac{3}{3+3}\log_2\frac{3}{3+3} - \frac{3}{3+3}\log_2\frac{3}{3+3} = 1

I(P_Hardware, N_Hardware) = -\frac{2}{2+2}\log_2\frac{2}{2+2} - \frac{2}{2+2}\log_2\frac{2}{2+2} = 1
CONTD….
From Table-3,

Entropy(Type) = \frac{3+3}{5+5}(1) + \frac{2+2}{5+5}(1)

Entropy(Type) = 1

Now determine the Gain:

Gain(Type) = Entropy(Class) - Entropy(Type)

Gain(Type) = 1 - 1

Gain(Type) = 0

Table-4: Information Gain

Attribute     Gain
Age           0.6
Competition   0.1245
Type          0
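
The full comparison in Table-4 can be verified in one pass over the dataset. The sketch below is illustrative: the gain helper and the record layout are assumptions, and the tuples simply transcribe the Example-1 table.

```python
import math
from collections import Counter

def entropy(labels):
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def gain(rows, attribute):
    """Information gain of splitting `rows` on `attribute` (target: Profit)."""
    base = entropy([r["Profit"] for r in rows])
    rem = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r["Profit"] for r in rows if r[attribute] == value]
        rem += (len(subset) / len(rows)) * entropy(subset)
    return base - rem

data = [
    ("Old", "Y", "Software", "Down"), ("Old", "N", "Software", "Down"),
    ("Old", "N", "Hardware", "Down"), ("Mid", "Y", "Software", "Down"),
    ("Mid", "Y", "Hardware", "Down"), ("Mid", "N", "Hardware", "Up"),
    ("Mid", "N", "Software", "Up"),  ("New", "Y", "Software", "Up"),
    ("New", "N", "Hardware", "Up"),  ("New", "N", "Software", "Up"),
]
rows = [dict(zip(("Age", "Competition", "Type", "Profit"), r)) for r in data]

for attr in ("Age", "Competition", "Type"):
    print(attr, round(gain(rows, attr), 4))
# Age 0.6, Competition 0.1245, Type 0.0 -> Age is chosen as the root node
```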
CONTD….

 The information gain of Age is greater than that of the other two attributes, so Age becomes the root (parent) node.

              Age
            /  |  \
         Old  Mid  New
          |    |    |
        Down   ?    Up

Note:
In the table, every record with Age = Old has Profit = Down, so we place the leaf Down under that branch; likewise, every record with Age = New has Profit = Up, so we place the leaf Up under the other branch. The Mid branch still contains both classes and must be split further.
CONTD….

Now we need to resolve the Age = Mid branch, so we build a reduced table containing only those records.

Table-5:

ID   AGE   COMPETITION   TYPE       PROFIT
1    Mid   Y             Software   Down
2    Mid   Y             Hardware   Down
3    Mid   N             Hardware   Up
4    Mid   N             Software   Up

Here, P = 2 and N = 2, because there are 2 Up and 2 Down records.

CONTD….

Since both classes are equally represented,

Entropy(Class) = 1

Table-6: Competition (reduced)

Value   P_i   N_i   I(P_i, N_i)
Y       0     2     0
N       2     0     0
CONTD….

From Table-6,

Entropy(Competition Reduced) = \frac{0+2}{2+2}(0) + \frac{2+0}{2+2}(0)

Entropy(Competition Reduced) = 0

Gain(Competition Reduced) = Entropy(Class) - Entropy(Competition Reduced)

Gain(Competition Reduced) = 1 - 0

Gain(Competition Reduced) = 1
CONTD….
Table-7: Type (reduced)

Value      P_i   N_i   I(P_i, N_i)
Software   1     1     1
Hardware   1     1     1

From Table-7,

Entropy(Type Reduced) = \frac{1+1}{2+2}(1) + \frac{1+1}{2+2}(1)

Entropy(Type Reduced) = 1

Gain(Type Reduced) = Entropy(Class) - Entropy(Type Reduced)

Gain(Type Reduced) = 1 - 1

Gain(Type Reduced) = 0

Since Gain(Competition) is greater than Gain(Type) on this reduced set, Competition becomes the child node under the Mid branch of Age (as the sketch below confirms).
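
The same calculation restricted to the Age = Mid records confirms this choice (an illustrative sketch; the subset_entropy helper is an assumption, and the hard-coded counts come from Tables 6 and 7):

```python
import math

def subset_entropy(p, n):
    """Entropy of a subset with p "Up" and n "Down" records (0*log2(0) = 0)."""
    total = p + n
    return -sum((c / total) * math.log2(c / total) for c in (p, n) if c)

# Age = Mid subset (Table-5): Entropy(Class) = 1 since P = N = 2.
# Competition: Y -> (0 Up, 2 Down), N -> (2 Up, 0 Down)
entropy_competition = (2/4) * subset_entropy(0, 2) + (2/4) * subset_entropy(2, 0)
# Type: Software -> (1 Up, 1 Down), Hardware -> (1 Up, 1 Down)
entropy_type = (2/4) * subset_entropy(1, 1) + (2/4) * subset_entropy(1, 1)

print(1 - entropy_competition)  # 1.0 -> Competition perfectly splits the subset
print(1 - entropy_type)         # 0.0 -> Type tells us nothing here
```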
FINAL DECISION TREE

              Age
            /  |  \
         Old  Mid  New
          |    |    |
        Down   |    Up
               |
         Competition
            /     \
           Y       N
           |       |
         Down     Up

Test Dataset:

AGE   COMPETITION   TYPE       PROFIT
Old   N             Hardware   ?
Mid   N             Software   ?

Predict the Profit class using the decision tree:

AGE   COMPETITION   TYPE       PROFIT
Old   N             Hardware   Down
Mid   N             Software   Up
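
Expressed as a small nested structure, the final tree classifies the two test records directly. This is an illustrative sketch; the dictionary layout and the classify helper are assumptions, not part of the slides.

```python
# Final tree: Age is the root; the Mid branch is split further on Competition.
tree = {
    "attribute": "Age",
    "branches": {
        "Old": "Down",
        "New": "Up",
        "Mid": {
            "attribute": "Competition",
            "branches": {"Y": "Down", "N": "Up"},
        },
    },
}

def classify(node, record):
    """Walk the tree until a leaf (a plain class label string) is reached."""
    while isinstance(node, dict):
        node = node["branches"][record[node["attribute"]]]
    return node

tests = [
    {"Age": "Old", "Competition": "N", "Type": "Hardware"},
    {"Age": "Mid", "Competition": "N", "Type": "Software"},
]
for t in tests:
    print(t, "->", classify(tree, t))
# (Old, N, Hardware) -> Down ; (Mid, N, Software) -> Up
```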
EXAMPLE-II

Question#02: Using the Decision Tree algorithm and the given table, classify the given tuple (T); also draw the tree, displaying the parent node and child nodes.
ADVANTAGES OF ID3

 Understandable prediction rules are created from the training data.

 Builds the fastest tree.
 Builds a short tree.
 Only needs to test enough attributes until all data is classified.
 Finding leaf nodes enables test data to be pruned, reducing the number of tests.
DISADVANTAGES OF ID3

 Data may be over-fitted or over-classified if only a small sample is used.

 Only one attribute at a time is tested for making a decision.
 Classifying continuous data may be computationally expensive, as many trees must be generated to see where to break the continuum.
SUMMARY

 Decision trees can be used to help predict future outcomes.

 The trees are easy to understand.
 Decision trees work more efficiently with discrete attributes.
 The trees may suffer from error propagation.
