Learning Decision Trees

Uploaded by

cllgapp1


• Decision tree algorithms fall under the category of supervised learning. They can
be used to solve both regression and classification problems.
• A decision tree uses a tree representation to solve the problem: each internal
node tests an attribute, and each leaf node corresponds to a class label.
• Any Boolean function on discrete attributes can be represented by a decision
tree.

Below are some assumptions that we make when using a decision tree:
• At the beginning, the whole training set is considered as the root.

• Feature values are preferred to be categorical. If the values are continuous, they
are discretized before the model is built.

• Records are distributed recursively on the basis of attribute values.

• Statistical methods are used to decide the order in which attributes are placed at
the root or at the internal nodes.
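[Figure: decision tree predicting the use of computers in people's daily lives]
From the diagram above it can be observed that a decision tree works in Sum of
Products form, also known as Disjunctive Normal Form: each root-to-leaf path is a
conjunction (product) of attribute tests, and a class is described by the disjunction
(sum) of all the paths that end in that class.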
The major challenge in a decision tree is identifying which attribute to place at the
root node and at each subsequent level. This process is known as attribute selection.
There are two popular attribute selection measures:
1. Information Gain
2. Gini Index
1. Information Gain
When we use a node in a decision tree to partition the training instances into smaller
subsets, the entropy changes. Information gain is a measure of this change in entropy.
Definition: Suppose S is a set of instances, A is an attribute, S_v is the subset of S with
A = v, and Values(A) is the set of all possible values of A. Then

$$\mathrm{Gain}(S, A) = \mathrm{Entropy}(S) - \sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|}\,\mathrm{Entropy}(S_v)$$

The attribute with the maximum information gain is selected for the parent node,
and the data is then split successively on that node.
Entropy
Entropy is a measure of the uncertainty of a random variable; it characterizes the
impurity of an arbitrary collection of examples. The higher the entropy, the higher the
information content.
Definition: Entropy measures how mixed (non-homogeneous) the class labels in the data
are. For two classes its value ranges from 0 to 1; in general, with c classes, it ranges
from 0 to log2 c. It is 0 if all the examples belong to the same class, and for two classes
it is 1 if the data is split equally between the classes. The formula for entropy is:

$$\mathrm{Entropy}(S) = -\sum_{i=1}^{c} p_i \log_2 p_i$$

Here p_i is the proportion of the data that belongs to the i-th class, and c is the
number of classes.
Example:
For the set X = {a,a,a,b,b,b,b,b}
Total instances: 8
Instances of b: 5
Instances of a: 3

Entropy(X) = -[(3/8) log2 (3/8) + (5/8) log2 (5/8)]
= -[0.375 * (-1.415) + 0.625 * (-0.678)]
= -(-0.53 - 0.424)
= 0.954
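The arithmetic in this example can be checked with a few lines of Python. This is a minimal sketch that uses only the standard library; the function name `entropy` is chosen here for illustration and is not taken from the original notes.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy, in bits, of a sequence of class labels."""
    total = len(labels)
    counts = Counter(labels)
    # Only classes that actually occur are counted, so log2(0) is never evaluated.
    return sum(-(n / total) * log2(n / total) for n in counts.values())


# The set X from the example: three instances of "a" and five of "b".
X = ["a"] * 3 + ["b"] * 5
print(round(entropy(X), 3))  # 0.954
```

A set in which every instance has the same label gives an entropy of 0, and a set split evenly between two labels gives 1.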

Building Decision Tree Using Information Gain

The essentials:
• Start with all training instances associated with the root node.
• Use information gain to choose which attribute to label each node with.
• Note: no root-to-leaf path should contain the same discrete attribute twice.
• Recursively construct each subtree on the subset of training instances that would
be classified down that path in the tree.
The border cases:
• If all positive or all negative training instances remain, label that node “yes” or “no”
accordingly.
• If no attributes remain, label the node with a majority vote of the training instances
left at that node.
• If no instances remain, label the node with a majority vote of the parent’s training
instances.
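The essentials and border cases above can be sketched as a short recursive procedure. This is an illustrative sketch, not a reference implementation: the function names, the dictionary-based row format, and the small weather-style dataset are all hypothetical and chosen only to exercise each case.

```python
from collections import Counter
from math import log2


def entropy(labels):
    total = len(labels)
    return sum(-(n / total) * log2(n / total) for n in Counter(labels).values())


def info_gain(rows, labels, attr):
    """Entropy reduction obtained by splitting on attribute `attr`."""
    subsets = {}
    for row, label in zip(rows, labels):
        subsets.setdefault(row[attr], []).append(label)
    remainder = sum(len(s) / len(labels) * entropy(s) for s in subsets.values())
    return entropy(labels) - remainder


def majority(labels):
    return Counter(labels).most_common(1)[0][0]


def build_tree(rows, labels, attrs, domains):
    """Return a class label (leaf) or a pair (attribute, {value: subtree})."""
    if len(set(labels)) == 1:          # border case: only one class remains
        return labels[0]
    if not attrs:                      # border case: no attributes remain
        return majority(labels)
    best = max(attrs, key=lambda a: info_gain(rows, labels, a))
    remaining = [a for a in attrs if a != best]  # no attribute twice on a path
    children = {}
    for value in domains[best]:
        pairs = [(r, l) for r, l in zip(rows, labels) if r[best] == value]
        if not pairs:                  # border case: no instances remain
            children[value] = majority(labels)  # vote of the parent's instances
        else:
            sub_rows, sub_labels = zip(*pairs)
            children[value] = build_tree(
                list(sub_rows), list(sub_labels), remaining, domains
            )
    return (best, children)


# Hypothetical toy data, invented for this sketch.
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rainy", "windy": "no"},
    {"outlook": "rainy", "windy": "yes"},
    {"outlook": "rainy", "windy": "yes"},
]
labels = ["yes", "yes", "yes", "yes", "no", "no"]
domains = {"outlook": ["sunny", "rainy", "overcast"], "windy": ["no", "yes"]}

tree = build_tree(rows, labels, ["outlook", "windy"], domains)
print(tree)
```

On this toy data the root is `outlook`: the `sunny` branch is pure, the `rainy` branch is split again on `windy`, and the `overcast` branch, which has no training instances, receives the majority label of its parent.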

Example:
Now, let's draw a decision tree for the following data using information gain.
Training set: 3 features and 2 classes

X   Y   Z   C
1   1   1   I
1   1   0   I
0   0   1   II
1   0   0   II

Here, we have 3 features and 2 output classes.

To build a decision tree using information gain, we take each feature in turn and
calculate the information gain obtained by splitting on it.
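The parent node contains 2 instances of class I and 2 of class II:
Eparent = -(2/4) log2 (2/4) - (2/4) log2 (2/4) = 1
In the calculations below, 0 log2 0 is taken to be 0.

Split on feature X
Echild1 (X = 1) = -(2/3) log2 (2/3) - (1/3) log2 (1/3) = 0.918
Echild2 (X = 0) = -(0/1) log2 (0/1) - (1/1) log2 (1/1) = 0
Information gain = 1 - [(3/4) * 0.918 + (1/4) * 0] = 0.311

Split on feature Y
Echild1 (Y = 1) = -(2/2) log2 (2/2) - (0/2) log2 (0/2) = 0
Echild2 (Y = 0) = -(0/2) log2 (0/2) - (2/2) log2 (2/2) = 0
Information gain = 1 - [(2/4) * 0 + (2/4) * 0] = 1

Split on feature Z
Echild1 (Z = 1) = -(1/2) log2 (1/2) - (1/2) log2 (1/2) = 1
Echild2 (Z = 0) = -(1/2) log2 (1/2) - (1/2) log2 (1/2) = 1
Information gain = 1 - [(2/4) * 1 + (2/4) * 1] = 0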
Comparing the three splits, the information gain is maximum when we split on
feature Y, so Y is the best feature for the root node. Splitting the dataset on
feature Y gives children that each contain a pure subset of the target variable, so
no further splitting is needed. The final tree for the above data set is:

Root: Y
Y = 1 -> class I
Y = 0 -> class II
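The choice of Y as the root can be reproduced with a short script. This is a sketch that uses only the standard library; the names `DATA`, `FEATURES` and `info_gain` are introduced here for illustration, and the rows are the four training instances from the table in this example.

```python
from collections import Counter
from math import log2

# Training set from the example: columns X, Y, Z and the class C.
DATA = [
    (1, 1, 1, "I"),
    (1, 1, 0, "I"),
    (0, 0, 1, "II"),
    (1, 0, 0, "II"),
]
FEATURES = {"X": 0, "Y": 1, "Z": 2}


def entropy(labels):
    total = len(labels)
    return sum(-(n / total) * log2(n / total) for n in Counter(labels).values())


def info_gain(data, column):
    """Parent entropy minus the weighted entropy of the children."""
    labels = [row[-1] for row in data]
    groups = {}
    for row in data:
        groups.setdefault(row[column], []).append(row[-1])
    remainder = sum(len(g) / len(data) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


gains = {name: info_gain(DATA, col) for name, col in FEATURES.items()}
for name, gain in gains.items():
    print(f"Gain({name}) = {gain:.3f}")
print("Root attribute:", max(gains, key=gains.get))
```

The script reports a gain of 0.311 for X, 1.000 for Y and 0.000 for Z, so Y is selected as the root attribute.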

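2. Gini Index
• The Gini index measures how often a randomly chosen element would be
incorrectly classified if it were labelled at random according to the class
distribution in the node.
• An attribute with a lower Gini index should therefore be preferred.
• Scikit-learn supports the “gini” criterion for the Gini index; it is the default
value of the criterion parameter of its decision tree classifier.
• The formula for the Gini index is given below.

$$\mathrm{Gini} = 1 - \sum_{i=1}^{c} p_i^2$$

Here p_i is the proportion of instances that belong to class i, and c is the number
of classes.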
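As a quick illustration, the Gini index of a single node (one minus the sum of the squared class proportions) can be computed from its per-class counts. This is a minimal sketch; the function name `gini` and the choice to treat an empty node as pure are assumptions made here.

```python
def gini(counts):
    """Gini index of a node, given the number of instances in each class."""
    total = sum(counts)
    if total == 0:
        return 0.0  # assumption: an empty node is treated as pure
    return 1.0 - sum((n / total) ** 2 for n in counts)


print(gini([8, 0]))  # 0.0, a pure node
print(gini([4, 4]))  # 0.5, the maximum for two classes
```

A pure node has a Gini index of 0, and for two classes the maximum of 0.5 is reached when the classes are equally represented.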

Example:
Let's consider the dataset below and draw a decision tree using the Gini index.

INDEX  A    B    C    D    E
1      4.8  3.4  1.9  0.2  positive
2      5    3    1.6  1.2  positive
3      5    3.4  1.6  0.2  positive
4      5.2  3.5  1.5  0.2  positive
5      5.2  3.4  1.4  0.2  positive
6      4.7  3.2  1.6  0.2  positive
7      4.8  3.1  1.6  0.2  positive
8      5.4  3.4  1.5  0.4  positive
9      7    3.2  4.7  1.4  negative
10     6.4  3.2  4.7  1.5  negative
11     6.9  3.1  4.9  1.5  negative
12     5.5  2.3  4    1.3  negative
13     6.5  2.8  4.6  1.5  negative
14     5.7  2.8  4.5  1.3  negative
15     6.3  3.3  4.7  1.6  negative
16     4.9  2.4  3.3  1    negative

The dataset above has 5 attributes, of which E is the target attribute to be
predicted. E contains 2 classes (positive and negative), in equal proportion
(8 instances each).
To compute the Gini index, each continuous attribute must first be made categorical
by choosing a threshold value. The thresholds used for this dataset, chosen
arbitrarily for illustration, are:

A       B        C        D
>= 5    >= 3.0   >= 4.2   >= 1.4
< 5     < 3.0    < 4.2    < 1.4

Calculating Gini Index for Var A:

Value >= 5: 12
Attribute A >= 5 & class = positive: 5/12
Attribute A >= 5 & class = negative: 7/12
Gini(5, 7) = 1 - [(5/12)^2 + (7/12)^2] = 0.4861

Value < 5: 4
Attribute A < 5 & class = positive: 3/4
Attribute A < 5 & class = negative: 1/4
Gini(3, 1) = 1 - [(3/4)^2 + (1/4)^2] = 0.375

Weighting each Gini index by the fraction of instances in its branch and summing:

Gini (Target, A) = (12/16) * (0.4861) + (4/16) * (0.375) = 0.4583

Calculating Gini Index for Var B:

Value >= 3: 12
Attribute B >= 3 & class = positive: 8/12
Attribute B >= 3 & class = negative: 4/12
Gini(8, 4) = 1 - [(8/12)^2 + (4/12)^2] = 0.4444

Value < 3: 4
Attribute B < 3 & class = positive: 0/4
Attribute B < 3 & class = negative: 4/4
Gini(0, 4) = 1 - [(0/4)^2 + (4/4)^2] = 0

Weighting each Gini index by the fraction of instances in its branch and summing:

Gini (Target, B) = (12/16) * (0.4444) + (4/16) * (0) = 0.3333

Using the same approach we can calculate the Gini index for the C and D attributes.

                  Positive  Negative
For A | >= 5.0    5         7
      | < 5.0     3         1
Gini Index of A = 0.4583

                  Positive  Negative
For B | >= 3.0    8         4
      | < 3.0     0         4
Gini Index of B = 0.3333

                  Positive  Negative
For C | >= 4.2    0         6
      | < 4.2     8         2
Gini Index of C = 0.2

                  Positive  Negative
For D | >= 1.4    0         5
      | < 1.4     8         3
Gini Index of D = 0.273

Attribute C has the lowest Gini index, so it is the preferred attribute for the
first split.
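The weighted Gini index of all four attributes can be recomputed directly from the 16-row dataset and the thresholds listed above. This is a sketch under those assumptions; the names `ROWS`, `THRESHOLDS`, `COLUMN` and `split_gini` are introduced here for illustration.

```python
# Rows of the dataset as (A, B, C, D, E), in the order of the INDEX column.
ROWS = [
    (4.8, 3.4, 1.9, 0.2, "positive"),
    (5.0, 3.0, 1.6, 1.2, "positive"),
    (5.0, 3.4, 1.6, 0.2, "positive"),
    (5.2, 3.5, 1.5, 0.2, "positive"),
    (5.2, 3.4, 1.4, 0.2, "positive"),
    (4.7, 3.2, 1.6, 0.2, "positive"),
    (4.8, 3.1, 1.6, 0.2, "positive"),
    (5.4, 3.4, 1.5, 0.4, "positive"),
    (7.0, 3.2, 4.7, 1.4, "negative"),
    (6.4, 3.2, 4.7, 1.5, "negative"),
    (6.9, 3.1, 4.9, 1.5, "negative"),
    (5.5, 2.3, 4.0, 1.3, "negative"),
    (6.5, 2.8, 4.6, 1.5, "negative"),
    (5.7, 2.8, 4.5, 1.3, "negative"),
    (6.3, 3.3, 4.7, 1.6, "negative"),
    (4.9, 2.4, 3.3, 1.0, "negative"),
]
THRESHOLDS = {"A": 5.0, "B": 3.0, "C": 4.2, "D": 1.4}
COLUMN = {"A": 0, "B": 1, "C": 2, "D": 3}


def gini(labels):
    """Gini index of a list of class labels (an empty list counts as pure)."""
    total = len(labels)
    if total == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / total) ** 2 for c in set(labels))


def split_gini(rows, column, threshold):
    """Weighted Gini index of the binary split value >= threshold / value < threshold."""
    high = [r[-1] for r in rows if r[column] >= threshold]
    low = [r[-1] for r in rows if r[column] < threshold]
    n = len(rows)
    return len(high) / n * gini(high) + len(low) / n * gini(low)


scores = {name: split_gini(ROWS, COLUMN[name], t) for name, t in THRESHOLDS.items()}
for name, score in scores.items():
    print(f"Gini({name}) = {score:.4f}")
print("Best split:", min(scores, key=scores.get))
```

Run as written, the script gives approximately 0.4583 for A, 0.3333 for B, 0.2000 for C and 0.2727 for D, and reports C as the attribute with the lowest Gini index.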
The most notable types of decision tree algorithms are:
1. Iterative Dichotomiser 3 (ID3): This algorithm uses information gain to decide
which attribute is used to classify the current subset of the data. At each level of the
tree, information gain is calculated recursively for the remaining data.
2. C4.5: This algorithm is the successor of ID3. It uses either information gain or
gain ratio to choose the classifying attribute. It is a direct improvement on ID3, as it
can handle both continuous and missing attribute values.
3. Classification and Regression Tree (CART): This algorithm can produce either a
classification tree or a regression tree, depending on whether the dependent variable
is categorical or numeric.
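The gain ratio mentioned for C4.5 is not defined in these notes. As a hedged sketch, it is usually computed as the information gain divided by the split information, where the split information is the entropy of the attribute's own value distribution. The example below applies that definition to the X, Y, Z training set from the information gain example; the names `DATA`, `FEATURES` and `gain_ratio`, and the choice to return 0 when the split information is 0, are assumptions made here.

```python
from collections import Counter
from math import log2

# Training set from the information gain example: columns X, Y, Z and class C.
DATA = [
    (1, 1, 1, "I"),
    (1, 1, 0, "I"),
    (0, 0, 1, "II"),
    (1, 0, 0, "II"),
]
FEATURES = {"X": 0, "Y": 1, "Z": 2}


def entropy(items):
    total = len(items)
    return sum(-(n / total) * log2(n / total) for n in Counter(items).values())


def gain_ratio(data, column):
    """Information gain divided by the split information of the attribute."""
    labels = [row[-1] for row in data]
    values = [row[column] for row in data]
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / len(data) * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    split_info = entropy(values)  # entropy of the attribute's value distribution
    # Assumption: an attribute with a single value carries no information.
    return gain / split_info if split_info > 0 else 0.0


ratios = {name: gain_ratio(DATA, col) for name, col in FEATURES.items()}
for name, ratio in ratios.items():
    print(f"GainRatio({name}) = {ratio:.3f}")
```

On this data the gain ratio is about 0.384 for X, 1.000 for Y and 0.000 for Z, so Y remains the preferred attribute under this measure as well.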
