Decision Tree

A Decision Tree is a supervised learning algorithm used primarily for classification problems, capable of handling both categorical and continuous variables. It works by splitting the dataset into homogeneous subsets based on the most significant input variables, using various algorithms like Gini Index, Chi-Square, and Information Gain to determine the best splits. Decision Trees can be categorized into those for categorical and continuous target variables, and they have advantages such as ease of understanding and less data cleaning, but can suffer from issues like overfitting.


What is a Decision Tree? How does it work?

 Decision tree is a type of supervised learning algorithm (having a pre-defined target variable) that is mostly used in classification problems.
 It works for both categorical and continuous input and output variables.
 In this technique, we split the population or sample into two or more homogeneous sets (or sub-populations) based on the most significant splitter / differentiator among the input variables.

Example:
 Let's say we have a sample of 30 students with three variables: Gender (Boy/Girl), Class (IX/X) and Height (5 to 6 ft).
 15 out of these 30 play cricket in their leisure time.
 Now, I want to create a model to predict who will play cricket during leisure time.
 In this problem, we need to segregate students who play cricket in their leisure time based on the most significant input variable among all three.
 This is where a decision tree helps: it will segregate the students based on all values of the three variables and identify the variable which creates the best homogeneous sets of students (which are heterogeneous to each other).
 In the snapshot below, you can see that the variable Gender is able to identify the best homogeneous sets compared to the other two variables.

 As mentioned above, the decision tree identifies the most significant variable and the value of that variable which gives the best homogeneous sets of the population.
 Now the question that arises is: how does it identify the variable and the split?
 To do this, a decision tree uses various algorithms, which we shall discuss in the following sections. A minimal code sketch of fitting such a model is shown below.
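
Before going into those algorithms, here is a minimal sketch, assuming scikit-learn is available and using a hypothetical students table with the three variables described above; the rows are purely illustrative, not the actual 30-student sample.

import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Hypothetical data with the three input variables described above
# (values are illustrative only, not the actual 30-student sample).
students = pd.DataFrame({
    "Gender": ["Boy", "Girl", "Boy", "Girl", "Boy", "Girl"],
    "Class":  ["IX", "X", "X", "IX", "X", "IX"],
    "Height": [5.5, 5.1, 5.8, 5.3, 5.9, 5.2],
    "PlaysCricket": [1, 0, 1, 0, 1, 1],
})

X = pd.get_dummies(students[["Gender", "Class", "Height"]])  # one-hot encode the categorical columns
y = students["PlaysCricket"]

model = DecisionTreeClassifier(criterion="gini", max_depth=3)
model.fit(X, y)
print(model.predict(X))  # predicted play-cricket labels for the same students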

Types of Decision Trees:


The type of decision tree is based on the type of target variable we have. It can be of two types:

1. Categorical Variable Decision Tree: A decision tree which has a categorical target variable is called a categorical variable decision tree.
Example:- In the above student problem, the target variable was "Student will play cricket or not", i.e. YES or NO.

2. Continuous Variable Decision Tree: A decision tree which has a continuous target variable is called a continuous variable decision tree.
Example:-
 Let's say we have a problem of predicting whether a customer will pay his renewal premium with an insurance company (yes/no).
 Here we know that the income of the customer is a significant variable, but the insurance company does not have income details for all customers.
 Now, since we know this is an important variable, we can build a decision tree to predict customer income based on occupation, product and various other variables.
 In this case, we are predicting values of a continuous variable (a small sketch of such a regression tree follows).
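
A minimal sketch of such a continuous-variable (regression) tree with scikit-learn, assuming hypothetical column names (occupation, product, income); the rows are invented purely for illustration.

import pandas as pd
from sklearn.tree import DecisionTreeRegressor

# Hypothetical customer records where income is known (illustrative values only).
customers = pd.DataFrame({
    "occupation": ["salaried", "self-employed", "salaried", "student"],
    "product":    ["term", "ulip", "term", "term"],
    "income":     [52000, 81000, 47000, 12000],
})

X = pd.get_dummies(customers[["occupation", "product"]])  # encode the categorical inputs
y = customers["income"]

reg = DecisionTreeRegressor(max_depth=2)
reg.fit(X, y)
print(reg.predict(X))  # each prediction is the mean income of the matching leaf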

Important Terminology related to Decision Trees:


Let's look at the basic terminology used with decision trees:

1. Root Node: It represents the entire population or sample, and this further gets divided into two or more homogeneous sets.

2. Splitting: It is the process of dividing a node into two or more sub-nodes.

3. Decision Node: When a sub-node splits into further sub-nodes, it is called a decision node.

4. Leaf / Terminal Node: Nodes that do not split are called leaf or terminal nodes.

5. Pruning: When we remove sub-nodes of a decision node, the process is called pruning. You can think of it as the opposite of splitting.

6. Branch / Sub-Tree: A sub-section of the entire tree is called a branch or sub-tree.

7. Parent and Child Node: A node which is divided into sub-nodes is called the parent node of those sub-nodes, whereas the sub-nodes are the children of the parent node.

These are the terms commonly used for decision trees. As with every algorithm, there are advantages and disadvantages; below are the important ones to know.

Advantages:

1. Easy to Understand: Decision tree output is very easy to understand, even for people from a non-analytical background. It does not require any statistical knowledge to read and interpret. Its graphical representation is very intuitive and users can easily relate it to their hypotheses.

2. Useful in Data Exploration: A decision tree is one of the fastest ways to identify the most significant variables and the relation between two or more variables. With the help of decision trees, we can create new variables / features that have better power to predict the target variable. You can refer to the article "Trick to enhance power of regression model" for one such trick. It can also be used in the data exploration stage. For example, if we are working on a problem where information is available in hundreds of variables, a decision tree will help to identify the most significant ones.

3. Less data cleaning required: It requires less data cleaning compared to some other modeling techniques. It is fairly robust to outliers and missing values.

4. Data type is not a constraint: It can handle both numerical and categorical variables.

5. Non-Parametric Method: A decision tree is considered to be a non-parametric method. This means that decision trees make no assumptions about the space distribution or the classifier structure.

Disadvantages:

1. Overfitting: Overfitting is one of the most practical difficulties for decision tree models. This problem is addressed by setting constraints on the model parameters and by pruning.

2. Not fit for continuous variables: While working with continuous numerical variables, a decision tree loses information when it categorizes the variables into different buckets.

Regression Trees vs Classification Trees:
We all know that the terminal nodes (or leaves) lie at the bottom of the decision tree. This means that decision trees are typically drawn upside down, such that the leaves are at the bottom and the root is at the top (shown below).

Both types of tree work in almost the same way; let's look at the primary differences and similarities between classification and regression trees:

1. Regression trees are used when the dependent variable is continuous. Classification trees are used when the dependent variable is categorical.

2. In the case of a regression tree, the value obtained at a terminal node in the training data is the mean response of the observations falling in that region. Thus, if an unseen data observation falls in that region, we'll make its prediction with the mean value.

3. In the case of a classification tree, the value (class) obtained at a terminal node in the training data is the mode of the observations falling in that region. Thus, if an unseen data observation falls in that region, we'll make its prediction with the mode value.

4. Both trees divide the predictor space (independent variables) into distinct and non-overlapping regions. For the sake of simplicity, you can think of these regions as high-dimensional boxes.

5. Both trees follow a top-down greedy approach known as recursive binary splitting. We call it "top-down" because it begins at the top of the tree, when all the observations are available in a single region, and successively splits the predictor space into two new branches down the tree. It is known as "greedy" because the algorithm cares only about the current split (it looks for the best variable available), and not about future splits which might lead to a better tree.

6. This splitting process is continued until a user-defined stopping criterion is reached. For example, we can tell the algorithm to stop once the number of observations per node becomes less than 50.

7. In both cases, the splitting process results in fully grown trees until the stopping criterion is reached. But the fully grown tree is likely to overfit the data, leading to poor accuracy on unseen data. This brings in "pruning". Pruning is one of the techniques used to tackle overfitting. A short code sketch of both the stopping criterion and pruning follows.
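
Here is a minimal sketch of both ideas in scikit-learn: min_samples_split corresponds to the "fewer than 50 observations" stopping rule mentioned above, while ccp_alpha applies cost-complexity pruning (the value 0.01 and the synthetic data are assumptions for illustration only).

from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic data, only to make the example self-contained.
X, y = make_classification(n_samples=500, random_state=0)

clf = DecisionTreeClassifier(
    min_samples_split=50,  # stopping rule: do not split nodes with fewer than 50 observations
    ccp_alpha=0.01,        # pruning strength (illustrative value; tune on validation data)
)
clf.fit(X, y)
print(clf.get_depth(), clf.get_n_leaves())  # the tree stays small due to stopping + pruning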
How does a tree decide where to split?
The decision of where to make strategic splits heavily affects a tree's accuracy. The decision criteria are different for classification and regression trees.

Decision trees use multiple algorithms to decide how to split a node into two or more sub-nodes. The creation of sub-nodes increases the homogeneity of the resultant sub-nodes. In other words, we can say that the purity of the node increases with respect to the target variable. The decision tree splits the nodes on all available variables and then selects the split which results in the most homogeneous sub-nodes.

The algorithm selection also depends on the type of target variable. Let's look at the four most commonly used algorithms in decision trees:

Gini Index:
The Gini index says: if we select two items from a population at random, then they must be of the same class, and the probability of this is 1 if the population is pure.

1. It works with a categorical target variable ("Success" or "Failure").
2. It performs only binary splits.
3. The higher the value of Gini, the higher the homogeneity.
4. CART (Classification and Regression Tree) uses the Gini method to create binary splits.

Steps to Calculate Gini for a split:


1. Calculate the Gini for each sub-node, using the formula "sum of squares of the probabilities of success and failure" (p^2 + q^2).
2. Calculate the Gini for the split using the weighted Gini score of each node of that split.

Example: Here we want to segregate the students based on the target variable (playing cricket or not).
In the snapshot below, we split the population using two input variables, Gender and Class. Now, I want to identify which split produces more homogeneous sub-nodes using the Gini index.

Split on Gender:
1. Gini for sub-node Female = (0.2)*(0.2) + (0.8)*(0.8) = 0.68
2. Gini for sub-node Male = (0.65)*(0.65) + (0.35)*(0.35) = 0.55
3. Weighted Gini for split on Gender = (10/30)*0.68 + (20/30)*0.55 = 0.59

Similarly for the split on Class:
1. Gini for sub-node Class IX = (0.43)*(0.43) + (0.57)*(0.57) = 0.51
2. Gini for sub-node Class X = (0.56)*(0.56) + (0.44)*(0.44) = 0.51
3. Weighted Gini for split on Class = (14/30)*0.51 + (16/30)*0.51 = 0.51

Above, you can see that the Gini score for the split on Gender is higher than for the split on Class; hence, the node split will take place on Gender.
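
A short sketch, not part of the original write-up, that reproduces the two weighted Gini scores above from the node sizes and success probabilities:

def gini(p):
    # sum of squared class probabilities for a node (p = probability of success)
    return p ** 2 + (1 - p) ** 2

def weighted_gini(nodes, total):
    # nodes: list of (node_size, probability_of_success) pairs for one split
    return sum(size / total * gini(p) for size, p in nodes)

print(weighted_gini([(10, 2 / 10), (20, 13 / 20)], 30))  # Gender split -> ~0.59
print(weighted_gini([(14, 6 / 14), (16, 9 / 16)], 30))   # Class split  -> ~0.51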

Chi-Square
It is an algorithm to find the statistical significance of the differences between sub-nodes and the parent node. We measure it by the sum of squares of the standardized differences between the observed and expected frequencies of the target variable.

1. It works with a categorical target variable ("Success" or "Failure").
2. It can perform two or more splits.
3. The higher the value of Chi-Square, the higher the statistical significance of the differences between a sub-node and the parent node.
4. The Chi-Square of each node is calculated using the formula: Chi-square = ((Actual – Expected)^2 / Expected)^(1/2)
5. It generates a tree called CHAID (Chi-square Automatic Interaction Detector).

Steps to Calculate Chi-square for a split:


1. Calculate the Chi-square for each individual node by computing the deviation for both Success and Failure.
2. Calculate the Chi-square of the split as the sum of all Chi-square values for Success and Failure of each node of the split.

Example: Let's work with the same example that we used to calculate Gini.

Split on Gender:
1. First, populate the node Female: the actual values for "Play Cricket" and "Not Play Cricket" are 2 and 8 respectively.
2. Calculate the expected values for "Play Cricket" and "Not Play Cricket": here it would be 5 for both, because the parent node has a probability of 50% and we apply the same probability to the Female count (10).
3. Calculate the deviations using the formula Actual – Expected. For "Play Cricket" it is (2 – 5 = -3) and for "Not Play Cricket" it is (8 – 5 = 3).
4. Calculate the Chi-square of the node for "Play Cricket" and "Not Play Cricket" using the formula ((Actual – Expected)^2 / Expected)^(1/2). You can refer to the table below for the calculation.
5. Follow similar steps to calculate the Chi-square value for the Male node.
6. Now add all Chi-square values to calculate the Chi-square for the split on Gender.

Split on Class:
Perform similar calculation steps for the split on Class and you will come up with the table below.

Above, you can see that Chi-square also identifies the Gender split as more significant compared to Class. A short numeric sketch of this calculation follows.
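
A short sketch (an addition, following the article's node formula) for the Female node of the Gender split; the Male node and the Class split are handled the same way:

def chi_value(actual, expected):
    # node formula from above: ((Actual - Expected)^2 / Expected)^(1/2)
    return ((actual - expected) ** 2 / expected) ** 0.5

# Female node: 10 students, parent probability of playing cricket is 50%.
expected = 10 * 0.5                    # expected count for each outcome = 5
chi_play = chi_value(2, expected)      # actual "Play Cricket" = 2 -> ~1.34
chi_not_play = chi_value(8, expected)  # actual "Not Play Cricket" = 8 -> ~1.34
print(chi_play + chi_not_play)         # Chi-square contribution of the Female node

# Repeat for the Male node and sum all node values to get the Chi-square of the split.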
Information Gain:
 Look at the image below and think about which node can be described most easily.
 I am sure your answer is C, because it requires less information, as all its values are similar.
 On the other hand, B requires more information to describe it, and A requires the maximum information.
 In other words, we can say that C is a pure node, B is less impure and A is more impure.

 Now, we can draw the conclusion that a less impure node requires less information to describe it, and a more impure node requires more information.
 Information theory provides a measure for this degree of disorganization in a system, known as entropy.
 If the sample is completely homogeneous, then the entropy is zero, and if the sample is equally divided (50% – 50%), it has an entropy of one.

 Entropy can be calculated using the formula: Entropy = -p log2(p) - q log2(q)
 Here p and q are the probabilities of success and failure respectively in that node.
 Entropy is also used with a categorical target variable.
 It chooses the split which has the lowest entropy compared to the parent node and other splits.
 The lower the entropy, the better it is.

Steps to calculate entropy for a split:


1. Calculate the entropy of the parent node.
2. Calculate the entropy of each individual node of the split and compute the weighted average over all sub-nodes in the split.

Example: Let's use this method to identify the best split for the student example.

1. Entropy for the parent node = -(15/30) log2(15/30) – (15/30) log2(15/30) = 1. Here 1 shows that it is an impure node.

2. Entropy for the Female node = -(2/10) log2(2/10) – (8/10) log2(8/10) = 0.72, and for the Male node = -(13/20) log2(13/20) – (7/20) log2(7/20) = 0.93.

3. Entropy for the split on Gender = weighted entropy of sub-nodes = (10/30)*0.72 + (20/30)*0.93 = 0.86

4. Entropy for the Class IX node = -(6/14) log2(6/14) – (8/14) log2(8/14) = 0.99, and for the Class X node = -(9/16) log2(9/16) – (7/16) log2(7/16) = 0.99.

5. Entropy for the split on Class = (14/30)*0.99 + (16/30)*0.99 = 0.99

Above, you can see that the entropy for the split on Gender is the lowest of all, so the tree will split on Gender. We can derive the information gain from entropy as 1 – Entropy, since the parent entropy here is 1. A short numeric sketch follows.
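
A short sketch that reproduces the entropy numbers above and derives the information gain of each split as the parent entropy minus the weighted child entropy:

from math import log2

def entropy(p):
    # p = probability of success in the node; entropy is 0 for a pure node
    return 0.0 if p in (0, 1) else -p * log2(p) - (1 - p) * log2(1 - p)

parent = entropy(15 / 30)                                             # 1.0
gender = (10 / 30) * entropy(2 / 10) + (20 / 30) * entropy(13 / 20)   # ~0.86
klass = (14 / 30) * entropy(6 / 14) + (16 / 30) * entropy(9 / 16)     # ~0.99

print(parent - gender)  # information gain of the Gender split ~0.14
print(parent - klass)   # information gain of the Class split  ~0.01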

Reduction in Variance
 Till now, we have discussed the algorithms for a categorical target variable.
 Reduction in variance is an algorithm used for continuous target variables (regression problems).
 This algorithm uses the standard formula of variance to choose the best split.
 The split with the lower variance is selected as the criterion to split the population:

Variance = Σ(X – X̄)^2 / n

Above, X̄ (X-bar) is the mean of the values, X is the actual value and n is the number of values.


Steps to calculate Variance:
1. Calculate the variance for each node.
2. Calculate the variance for each split as the weighted average of each node's variance.

Example:- Let's assign the numerical value 1 for playing cricket and 0 for not playing cricket. Now follow the steps to identify the right split:

1. Variance for the Root node: here the mean value is (15*1 + 15*0)/30 = 0.5 and we have 15 ones and 15 zeros. The variance would be ((1-0.5)^2 + (1-0.5)^2 + ... 15 times + (0-0.5)^2 + (0-0.5)^2 + ... 15 times) / 30, which can be written as (15*(1-0.5)^2 + 15*(0-0.5)^2) / 30 = 0.25

2. Mean of the Female node = (2*1 + 8*0)/10 = 0.2 and Variance = (2*(1-0.2)^2 + 8*(0-0.2)^2) / 10 = 0.16

3. Mean of the Male node = (13*1 + 7*0)/20 = 0.65 and Variance = (13*(1-0.65)^2 + 7*(0-0.65)^2) / 20 = 0.23

4. Variance for the split on Gender = weighted variance of sub-nodes = (10/30)*0.16 + (20/30)*0.23 = 0.21

5. Mean of the Class IX node = (6*1 + 8*0)/14 = 0.43 and Variance = (6*(1-0.43)^2 + 8*(0-0.43)^2) / 14 = 0.24

6. Mean of the Class X node = (9*1 + 7*0)/16 = 0.56 and Variance = (9*(1-0.56)^2 + 7*(0-0.56)^2) / 16 = 0.25

7. Variance for the split on Class = (14/30)*0.24 + (16/30)*0.25 = 0.25

Above, you can see that the Gender split has a lower variance compared to the parent node and to the Class split, so the split would take place on the Gender variable. A short numeric sketch follows.
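
A short sketch reproducing the variance numbers above, with playing cricket coded as 1 and not playing coded as 0:

def node_variance(ones, zeros):
    # variance of a node whose observations are `ones` 1s and `zeros` 0s
    n = ones + zeros
    mean = ones / n
    return (ones * (1 - mean) ** 2 + zeros * (0 - mean) ** 2) / n

root = node_variance(15, 15)                                                  # 0.25
gender = (10 / 30) * node_variance(2, 8) + (20 / 30) * node_variance(13, 7)   # ~0.21
klass = (14 / 30) * node_variance(6, 8) + (16 / 30) * node_variance(9, 7)     # ~0.25

print(root, gender, klass)  # the Gender split reduces variance the most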

Up to this point, we have learnt about the basics of decision trees and the decision-making process involved in choosing the best splits while building a tree model.

As mentioned, decision trees can be applied to both regression and classification problems.
