Decision Tree

Decision trees are a machine learning technique that allows predictions to be made by organizing data into sequential "if-then" decision rules. An example decision tree is presented to predict whether Mike will go grocery shopping based on factors like supply levels, weather, and work schedule. Decision trees are created through a process of splitting, or induction, where the optimal attributes to divide the data are selected. The splits are chosen to maximize a metric like information gain. The final step is pruning less important splits from the tree to prevent overfitting.

Uploaded by

Sreshta Tric

Introduction

A decision tree is a machine learning model that lets us make predictions based on previous data. It
works like a series of sequential “if … then” statements: you feed new data in at the top and follow
the matching branches to get a result.

To demonstrate decision trees, let’s take a look at an example. Imagine we want to predict whether
Mike is going to go grocery shopping on any given day. We can look at previous factors that led Mike
to go to the store:

Figure 1. An example dataset

Here we can see the amount of grocery supplies Mike had, the weather, and whether Mike worked
each day. Green rows are days he went to the store, and red rows are days he didn’t. The goal of a
decision tree is to try to understand why Mike goes to the store, and apply that to new data later on.

Let’s divide the first attribute up into a tree. Mike can either have a low, medium, or high amount of
supplies:

Figure 2. Our first split

Here we can see that Mike never goes to the store if he has a high amount of supplies. This is called
a pure subset, a subset with only positive or only negative examples. With decision trees, there is no
need to break a pure subset down further.
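
As a rough illustration, the purity check can be sketched in Python. The pairs below are the Supplies and Shopped? columns of the example dataset tabulated in the Cost of Splitting section; the names `observations`, `subsets`, and `is_pure` are chosen here for illustration only.

```python
from collections import defaultdict

# (Supplies, Shopped?) for days D1..D12 of the example dataset.
observations = [
    ("Low", "Yes"), ("High", "No"), ("Med", "No"), ("Low", "No"),
    ("Low", "Yes"), ("High", "No"), ("High", "No"), ("Med", "No"),
    ("Low", "No"), ("Low", "Yes"), ("Med", "Yes"), ("High", "No"),
]

# Group the outcomes by supply level (this is the first split).
subsets = defaultdict(list)
for supplies, shopped in observations:
    subsets[supplies].append(shopped)


def is_pure(labels):
    """A subset is pure when every example in it has the same outcome."""
    return len(set(labels)) == 1


purity = {level: is_pure(labels) for level, labels in subsets.items()}
print(purity)
```

Only the High subset comes out pure, so Low and Med are the branches that still need splitting.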

Let’s break the Med Supplies category into whether Mike worked that day:

Figure 3. Our second split

Here we can see we have two more pure subsets, so this branch is complete. We can replace any pure
subsets with their respective answer - in this case, yes or no.

Finally, let’s split the Low Supplies category by the Weather attribute:

Figure 4. Our third split

Now that we have all pure subsets, we can create our final decision tree:

Figure 5. The final decision tree

Motivation

Decision trees are easily created, visualized, and interpreted. Because of this, they are often the
first method used to model a dataset. The hierarchical structure of a decision tree, together with its
simple categorical tests, makes it highly intuitive to implement. When a tree is reasonably balanced,
its depth grows roughly logarithmically with the number of data points, so larger datasets tend to
slow the tree creation process less than they do for many other classifiers. For the same reason,
classifying a new data point is fast: it only requires following a single path from the root to a leaf,
and the length of that path is roughly logarithmic in the size of the training data.

Classification and Regression Trees

Decision tree algorithms are often referred to collectively as CART, or Classification and Regression
Trees. A Classification Tree, like the one shown above, predicts a result from a discrete set of
possible values. A Regression Tree is a decision tree whose result is a continuous value, such as the
price of a car.
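
The difference shows up at the leaves. The sketch below assumes the common convention that a classification leaf answers with the most frequent label among its training examples, while a regression leaf answers with their mean. The function names and the car prices are hypothetical and are not taken from the example dataset.

```python
from collections import Counter
from statistics import mean


def classification_leaf(labels):
    """Predict the most common category among the examples in the leaf."""
    return Counter(labels).most_common(1)[0][0]


def regression_leaf(values):
    """Predict the average of the continuous values in the leaf."""
    return mean(values)


# Outcomes for the three Med-supplies days in the example dataset.
shopped = ["No", "No", "Yes"]
# Hypothetical car prices, invented purely for illustration.
car_prices = [18000.0, 20000.0, 22000.0]

print(classification_leaf(shopped))
print(regression_leaf(car_prices))
```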

Splitting (Induction)

Decision trees are created through a process of splitting called induction, but how do we know when
and where to split? We need a recursive algorithm that determines the best attribute to split on at
each node. One such approach is a greedy algorithm:

1. Starting from the root, create a candidate split for each attribute.

2. For each candidate split, calculate the cost of the split.

3. Choose the split that costs the least.

4. Recurse into the sub-trees and continue from step 1.

This process is repeated until every node contains only examples with the same target result (a pure
subset), or until splitting adds no value to a prediction. Because the cheapest split is always chosen
first, the root node ends up testing the attribute that best separates the data.
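
The steps above can be sketched as a short recursive function. This is a minimal sketch under stated assumptions, not a definitive implementation: the cost of a split is taken to be the weighted average Gini Impurity of its children, ties are broken by attribute order, and all function names and the nested-dictionary tree layout are chosen here for illustration. Because of the tie-breaking, the tree it builds need not match the figures exactly.

```python
from collections import Counter

ATTRIBUTES = ["Supplies", "Weather", "Worked"]
TARGET = "Shopped"
# Rows D1..D12 of the example dataset: Supplies, Weather, Worked?, Shopped?
ROWS = [
    ("Low", "Sunny", "Yes", "Yes"), ("High", "Sunny", "Yes", "No"),
    ("Med", "Cloudy", "Yes", "No"), ("Low", "Raining", "Yes", "No"),
    ("Low", "Cloudy", "No", "Yes"), ("High", "Sunny", "No", "No"),
    ("High", "Raining", "No", "No"), ("Med", "Cloudy", "Yes", "No"),
    ("Low", "Raining", "Yes", "No"), ("Low", "Raining", "No", "Yes"),
    ("Med", "Sunny", "No", "Yes"), ("High", "Sunny", "Yes", "No"),
]
DATA = [dict(zip(ATTRIBUTES + [TARGET], row)) for row in ROWS]


def gini(rows):
    """One minus the sum of squared outcome probabilities."""
    counts = Counter(r[TARGET] for r in rows)
    return 1.0 - sum((c / len(rows)) ** 2 for c in counts.values())


def partition(rows, attribute):
    """Group rows by their value for the given attribute."""
    groups = {}
    for r in rows:
        groups.setdefault(r[attribute], []).append(r)
    return groups


def split_cost(rows, attribute):
    """Assumed cost: weighted average impurity of the children."""
    groups = partition(rows, attribute).values()
    return sum(len(g) / len(rows) * gini(g) for g in groups)


def build_tree(rows, attributes):
    labels = Counter(r[TARGET] for r in rows)
    majority = labels.most_common(1)[0][0]
    # Stop on a pure subset or when there is nothing left to split on.
    if len(labels) == 1 or not attributes:
        return majority
    # Steps 1-3: cost every candidate split and keep the cheapest.
    best = min(attributes, key=lambda a: split_cost(rows, a))
    if split_cost(rows, best) >= gini(rows):
        return majority  # splitting adds no value
    # Step 4: recurse into each sub-tree.
    remaining = [a for a in attributes if a != best]
    children = {value: build_tree(group, remaining)
                for value, group in partition(rows, best).items()}
    return {"attribute": best, "children": children, "default": majority}


def predict(tree, row):
    """Follow the branches until a leaf (a plain label) is reached."""
    while isinstance(tree, dict):
        tree = tree["children"].get(row[tree["attribute"]], tree["default"])
    return tree


tree = build_tree(DATA, ATTRIBUTES)
print(tree["attribute"])
```

On this dataset the root split is Supplies, and the High branch becomes a "No" leaf immediately, matching the first split described earlier.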

Cost of Splitting

The cost of a split is determined by a cost function. The goal of using a cost function is to split the
data using a measure that is easy to compute and that provides the most information gain.

For classification trees, those that predict a category rather than a continuous value, we can compute
information gain using Gini Impurity:

Equation 1. The Gini Impurity Function
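
In words, the Gini Impurity of a node is one minus the sum of the squared probabilities of each outcome. A sketch of the formula follows, using notation assumed here: $D$ is the set of examples at a node, $c$ is the number of possible outcomes, and $p_i$ is the fraction of examples in $D$ with outcome $i$.

```latex
I_G(D) = 1 - \sum_{i=1}^{c} p_i^{2}
```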

Ref: https://sebastianraschka.com/faq/docs/decision-tree-binary.html

Equation 2. The Gini Information Gain Formula
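
In words, the information gain of a split is the parent's impurity minus the weighted average of each child's impurity. A sketch of the formula follows, using notation assumed here: $D_p$ is the parent node, $a$ is the attribute being split on, $D_j$ is the $j$-th of $m$ children, $N_p$ and $N_j$ are the number of examples in the parent and in child $j$, and $I_G$ is the Gini Impurity.

```latex
IG(D_p, a) = I_G(D_p) - \sum_{j=1}^{m} \frac{N_j}{N_p} \, I_G(D_j)
```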

Ref: https://sebastianraschka.com/faq/docs/decision-tree-binary.html

To calculate information gain, we start by computing the Gini Impurity of our root node. Let’s
take a look at the data we used earlier:

Day   Supplies   Weather   Worked?   Shopped?
D1    Low        Sunny     Yes       Yes
D2    High       Sunny     Yes       No
D3    Med        Cloudy    Yes       No
D4    Low        Raining   Yes       No
D5    Low        Cloudy    No        Yes
D6    High       Sunny     No        No
D7    High       Raining   No        No
D8    Med        Cloudy    Yes       No
D9    Low        Raining   Yes       No
D10   Low        Raining   No        Yes
D11   Med        Sunny     No        Yes
D12   High       Sunny     Yes       No

Our root node holds the full dataset, labeled by the target variable: whether Mike is going to go
shopping. To calculate its Gini Impurity, we sum the squared probabilities of each outcome and
subtract this result from one:
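
Worked through for the table above, Mike shopped on 4 of the 12 days and did not shop on the other 8. The subscript label is notation assumed here.

```latex
I_G(D_{\text{root}}) = 1 - \left(\tfrac{4}{12}\right)^{2} - \left(\tfrac{8}{12}\right)^{2}
                     = 1 - \tfrac{1}{9} - \tfrac{4}{9}
                     = \tfrac{4}{9} \approx 0.444
```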

Let’s calculate the Gini Information Gain if we split on the first attribute, Supplies. We have three
different categories we can split by - Low, Med, and High. For each of these, we calculate its Gini
Impurity:
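
Counting rows in the table, Low supplies has 3 shopping days out of 5, Med has 1 out of 3, and High has 0 out of 4. Under those counts, and with subscript labels assumed here, the three impurities work out as:

```latex
\begin{aligned}
I_G(D_{\text{Low}})  &= 1 - \left(\tfrac{3}{5}\right)^{2} - \left(\tfrac{2}{5}\right)^{2} = \tfrac{12}{25} = 0.48 \\
I_G(D_{\text{Med}})  &= 1 - \left(\tfrac{1}{3}\right)^{2} - \left(\tfrac{2}{3}\right)^{2} = \tfrac{4}{9} \approx 0.444 \\
I_G(D_{\text{High}}) &= 1 - \left(\tfrac{0}{4}\right)^{2} - \left(\tfrac{4}{4}\right)^{2} = 0
\end{aligned}
```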

As you can see, the impurity for High supplies is 0. This means that if we split on Supplies and receive
High input, we immediately know what the outcome will be. To determine the Gini Information Gain
for this split, we compute the root’s impurity minus the weighted average of each child’s impurity:
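
Using the row counts from the table (5 Low, 3 Med, and 4 High days out of 12), a root impurity of 4/9, and child impurities of 12/25, 4/9, and 0 respectively, the gain for the Supplies split can be worked out as follows. The notation is assumed here.

```latex
\begin{aligned}
IG(D_{\text{root}}, \text{Supplies})
  &= \tfrac{4}{9} - \left( \tfrac{5}{12} \cdot \tfrac{12}{25} + \tfrac{3}{12} \cdot \tfrac{4}{9} + \tfrac{4}{12} \cdot 0 \right) \\
  &= \tfrac{4}{9} - \left( \tfrac{1}{5} + \tfrac{1}{9} \right)
   = \tfrac{2}{15} \approx 0.133
\end{aligned}
```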

We continue this pattern for every possible split, then choose the split that gives us the highest
information gain value. Maximizing information gain leaves us with the purest splits possible,
lowering the probability that new input is incorrectly classified.
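
Repeating the calculation for every attribute is easy to automate. The following sketch computes the Gini Information Gain of each candidate split on the example dataset and picks the largest; the function and variable names are chosen here for illustration.

```python
from collections import Counter

# Rows D1..D12 of the example dataset: Supplies, Weather, Worked?, Shopped?
ROWS = [
    ("Low", "Sunny", "Yes", "Yes"), ("High", "Sunny", "Yes", "No"),
    ("Med", "Cloudy", "Yes", "No"), ("Low", "Raining", "Yes", "No"),
    ("Low", "Cloudy", "No", "Yes"), ("High", "Sunny", "No", "No"),
    ("High", "Raining", "No", "No"), ("Med", "Cloudy", "Yes", "No"),
    ("Low", "Raining", "Yes", "No"), ("Low", "Raining", "No", "Yes"),
    ("Med", "Sunny", "No", "Yes"), ("High", "Sunny", "Yes", "No"),
]
COLUMNS = {"Supplies": 0, "Weather": 1, "Worked": 2}
TARGET = 3  # index of the Shopped? column


def gini(labels):
    """One minus the sum of squared outcome probabilities."""
    counts = Counter(labels)
    return 1.0 - sum((c / len(labels)) ** 2 for c in counts.values())


def information_gain(rows, column):
    """Parent impurity minus the weighted average child impurity."""
    parent = gini([r[TARGET] for r in rows])
    groups = {}
    for r in rows:
        groups.setdefault(r[column], []).append(r[TARGET])
    weighted = sum(len(g) / len(rows) * gini(g) for g in groups.values())
    return parent - weighted


gains = {name: information_gain(ROWS, col) for name, col in COLUMNS.items()}
best = max(gains, key=gains.get)

for name, gain in gains.items():
    print(f"{name}: {gain:.3f}")
print("Best split:", best)
```

On this data Supplies has the largest gain (about 0.133, versus about 0.102 for Worked and 0.008 for Weather), so it is chosen for the first split.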

Pruning

A decision tree created from a sufficiently large dataset may end up with an excessive number of
splits, each with decreasing usefulness. A highly detailed decision tree can even lead to overfitting,
discussed in the previous module. Because of this, it’s beneficial to prune less important splits of a
decision tree away. Pruning involves calculating the information gain of each ending sub-tree (the
leaf nodes and their parent node), then removing the sub-tree with the least information gain:

Ref: http://www.cs.cmu.edu/~bhiksha/courses/10-601/decisiontrees/

As you can see, the sub-tree is replaced with its more prominent result, becoming a new leaf. This
process can be repeated until you reach a desired complexity level, tree height, or information gain
amount. Information gain can also be tracked and stored as the tree is built to save time when
pruning. The pruning strategy should be chosen to suit the needs of each model.
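
One pruning step can be sketched as follows. This is only an illustration under stated assumptions: the tree is a hypothetical nested dictionary whose leaves store outcome counts, the counts are taken from the example dataset arranged along the splits described earlier (Supplies, then Worked for Med and Weather for Low), and every function name is invented here.

```python
from collections import Counter


def gini(counts):
    """Gini Impurity from a mapping of outcome -> number of examples."""
    n = sum(counts.values())
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def is_leaf(node):
    return "children" not in node


def counts_of(node):
    """Total outcome counts for all examples below a node."""
    if is_leaf(node):
        return Counter(node["counts"])
    total = Counter()
    for child in node["children"].values():
        total += counts_of(child)
    return total


def ending_subtrees(node):
    """Nodes whose children are all leaves (candidates for pruning)."""
    if is_leaf(node):
        return []
    children = list(node["children"].values())
    if all(is_leaf(c) for c in children):
        return [node]
    return [s for c in children for s in ending_subtrees(c)]


def gain(node):
    """Information gain of the split at an ending sub-tree."""
    parent = counts_of(node)
    n = sum(parent.values())
    weighted = sum(sum(c["counts"].values()) / n * gini(c["counts"])
                   for c in node["children"].values())
    return gini(parent) - weighted


def prune_once(tree):
    """Collapse the ending sub-tree with the least information gain."""
    candidates = ending_subtrees(tree)
    if candidates:
        weakest = min(candidates, key=gain)
        merged = dict(counts_of(weakest))
        weakest.pop("children")
        weakest.pop("attribute", None)
        weakest["counts"] = merged  # the new leaf predicts the majority
    return tree


tree = {"attribute": "Supplies", "children": {
    "High": {"counts": {"No": 4}},
    "Med": {"attribute": "Worked", "children": {
        "Yes": {"counts": {"No": 2}}, "No": {"counts": {"Yes": 1}}}},
    "Low": {"attribute": "Weather", "children": {
        "Sunny": {"counts": {"Yes": 1}}, "Cloudy": {"counts": {"Yes": 1}},
        "Raining": {"counts": {"Yes": 1, "No": 2}}}},
}}

prune_once(tree)
print(tree["children"]["Low"])
```

Here the Weather split under Low supplies has the smaller gain of the two ending sub-trees, so it is the one collapsed into a single leaf.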

Conclusion

Decision trees allow you to quickly and efficiently classify data. Because they shape data into a
hierarchy of decisions, they are highly understandable, even by non-experts. Decision trees are
created and refined in a two-step process - induction and pruning. Induction involves picking the best
attribute to split on, while pruning removes splits that add little value. Because decision trees
are so simple to create and understand, they are typically the first approach used to model and
predict outcomes of a dataset.
