Scikit-Learn Decision Trees Explained


Training, Visualizing, and Making Predictions with Decision Trees

Frank Ceballos Feb 23, 2019 · 9 min read


Photo by Lukasz Szmigiel on Unsplash

Decision trees are the most important elements of a Random Forest. They are capable of fitting complex data sets while allowing the user to see how a decision was taken. While searching the web I was unable to find one clear article that could easily describe them, so here I am writing about what I have learned so far. It’s important to note that a single decision tree is not a very good predictor; however, by creating an ensemble of them (a forest) and collecting their predictions, one of the most powerful machine learning tools can be obtained: the so-called Random Forest.

Make sure you have installed pandas and scikit-learn on your machine. If you haven't, you can learn how to do so here.

A Scikit-Learn Decision Tree


Let’s start by creating a decision tree using the iris flower data set. The iris data set contains four features, three classes of flowers, and 150 samples.

Features: sepal length (cm), sepal width (cm), petal length (cm), petal width
(cm)

Classes: setosa, versicolor, virginica

Numerically, setosa flowers are identified by zero, versicolor by one, and virginica by two.

For simplicity, we will train our decision tree using all features and setting the
depth to two.
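A minimal sketch of this training step, assuming scikit-learn is installed (the names clf and iris, and the random_state value, are my own choices rather than the article's original code):

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()                      # 150 samples, 4 features, 3 classes
X, y = iris.data, iris.target

# Limit the depth to two, as described above; random_state is my own
# addition so the result is reproducible.
clf = DecisionTreeClassifier(max_depth=2, random_state=0)
clf.fit(X, y)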

Visualizing The Decision Tree


Of course we still do not know how this tree classifies samples, so let’s visualize this tree by first creating a dot file with Scikit-Learn’s export_graphviz function and then processing it with graphviz.
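A minimal sketch of that export step, assuming clf and iris come from the training sketch above:

from sklearn.tree import export_graphviz

export_graphviz(
    clf,
    out_file="tree.dot",                 # produces the tree.dot file mentioned below
    feature_names=iris.feature_names,
    class_names=iris.target_names,
    filled=True,                         # color nodes by their majority class
    rounded=True,
)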
This will create a file named tree.dot that needs to be processed with graphviz.
Here is a YouTube tutorial that shows you how to process such a file with
graphviz. The end result should be similar to the one shown in Figure-1;
however, a different tree might sprout even if the training data is the same!

Figure-1) Our decision tree: In this case, nodes are colored in white, while leaves are colored in orange, green,
and purple. More about leaves and nodes later.

A single decision tree is the classic example of a type of classifier known as a


white box. The predictions made by a white box classifier can easily be
understood. Here is an excellent article about black and white box classifiers.

Understanding the Contents of a Node


In Figure-1, you can see that each box contains several characteristics. Let’s
start by describing the content of the topmost node, most commonly referred
to as the root node. The root node is at a depth of zero, see Figure-2. A node is a
point along the decision tree where a question is asked. This action divides the
data into smaller subsets.

Figure-2) The depth of the tree: The light colored boxes illustrate the depth of the tree. The root node is
located at a depth of zero.

petal length (cm) <= 2.45: The first question the decision tree asks is whether the petal length is less than or equal to 2.45 cm. Based on the answer, it follows either the true or the false path.

gini = 0.667: The gini score is a metric that quantifies the impurity of the node/leaf (more about leaves in a bit). A gini score greater than zero implies that the samples contained within that node belong to different classes. A gini score of zero means that the node is pure: only a single class of samples exists within it. You can find out more about impurity measures here. Notice that we have a gini score greater than zero; therefore, we know that the samples contained within the root node belong to different classes.
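For reference, the gini score is one minus the sum of the squared class proportions. A minimal sketch that reproduces the root node's 0.667:

def gini(counts):
    # Gini impurity: 1 minus the sum of squared class proportions.
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

print(round(gini([50, 50, 50]), 3))   # 0.667, the root node with three equal classes
print(gini([50, 0, 0]))               # 0.0, a pure node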

samples = 150: Since the iris flower data set contains 150 samples, this value is
set to 150.

value = [50, 50, 50]: The value list tells you how many samples at the given
node fall into each category. The first element of the list shows the number of
samples that belong to the setosa class, the second element of the list shows the
number of samples that belong to the versicolor class, and the third element in
the list shows the number of samples that belong to the virginica class. Notice
how this node is not a pure one since different types of classes are contained
within the same node. We knew this already from the gini score, but it’s nice to
actually see it.

class = setosa: The class value shows the prediction a given node will make
and it can be determined from the value list. Whichever class occurs the most
within the node will be selected as the class value. If the decision tree were to
end at the root node, it would predict that all 150 samples belonged to the
setosa class. Of course this makes no sense, since there is an equal number of
samples for each class. When there is a tie like this, the decision tree simply falls back to the first class in the value list, which is why setosa is shown here.
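If you would rather inspect these node contents as text instead of a figure, recent versions of scikit-learn provide export_text. A minimal sketch, assuming clf and iris from the training sketch above:

from sklearn.tree import export_text

# Plain-text dump of the trained tree; show_weights should also print the
# per-class sample counts (the value list) at each leaf.
print(export_text(clf, feature_names=list(iris.feature_names), show_weights=True))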

Understanding How a Tree Makes a Split


To determine which feature to use to make the first split (that is, the root node), the algorithm chooses a feature and makes a split. It then looks at the resulting subsets and measures their impurity using the gini score. It does this for multiple thresholds and determines that the best split for the given feature is the one that produces the purest subsets, i.e. the lowest sample-weighted gini score. This is repeated for all the features in the training set. Ultimately, the root node is determined by the feature and threshold that produce the split with the purest subsets. Once the root node is decided, the tree is grown to a depth of one. The same process is repeated for the other nodes in the tree.
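As an illustration of this greedy search (a simplified sketch, not scikit-learn's actual implementation, assuming iris from the training sketch above):

import numpy as np

def gini(y):
    # Gini impurity of a label array.
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split(x, y):
    # Greedy search over thresholds of a single feature: pick the threshold
    # whose two subsets have the lowest sample-weighted gini score.
    best_t, best_score = None, np.inf
    for t in np.unique(x):
        left, right = y[x <= t], y[x > t]
        if len(left) == 0 or len(right) == 0:
            continue
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score

# Search the petal length column of the iris data.
t, s = best_split(iris.data[:, 2], iris.target)
print(t, round(float(s), 3))   # expect 1.9 and 0.333; scikit-learn places its threshold
                               # midway between 1.9 and 3.0, i.e. the 2.45 seen in Figure-1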

Understanding How a Tree Will Make a Prediction


Suppose we have a flower with petal_length = 1 and petal_width = 3. If we follow the logic of the decision tree shown in Figure-1, we will end up in the orange box. In Figure-1, if the question a node asks turns out to
be true (false), we will move to the left (right). The orange box is at a depth of
one, see Figure-2. Since there is nothing growing out of this box, we will refer
to it as a leaf node. Notice the resemblance this has to an actual tree, see Figure-
3. Moreover, note that the gini score is zero — which makes it a pure leaf. The
total number of samples is 50. Out of the 50 samples that end up on the orange
leaf node, we can see that all of them belong to the setosa class, see the value

list for this leaf. Therefore, the tree will predict that the sample is a setosa
flower.

Figure-3) Real tree vs Decision Tree Similarity: The tree on the left is inverted to illustrate how a tree grows
from its root and ends at its leaves. Seeing the decision tree on the right should make this analogy more clear.

Let us pick a more interesting sample. For instance, petal_length = 2.60 and
petal_width = 1.2. We start at the root node, which asks whether the petal
length is less than or equal to 2.45. This is false; therefore, we move to the internal node on
the right, where the gini score is 0.5 and the total number of samples is 100.
This internal node at a depth of one will ask the question “Is the petal width less
than 1.75?” In our case, this is true, so we move to the left and end up in the
green colored leaf node which is at a depth of 2. The decision tree will predict
that this sample is a versicolor flower. You can see that this is most likely the
case because 49 out of the 54 samples that end up in the green leaf node were
versicolor flowers, see the value list for this leaf.

Making a Prediction on New Samples Using a Trained Tree


Now that we know how our decision tree works, let us make predictions. The input should be a list ordered as [sepal length, sepal width, petal length, petal width], where the sepal length and sepal width won’t affect the predictions made by the decision tree shown in Figure-1; therefore, we can assign them an arbitrary value.
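A minimal sketch of this prediction step, assuming clf and iris from the training sketch above (the sepal values are arbitrary placeholders of my own):

# Samples ordered as [sepal length, sepal width, petal length, petal width];
# the sepal values are arbitrary, since the tree in Figure-1 never uses them.
samples = [
    [5.0, 3.0, 1.0, 3.0],   # first example:  petal_length = 1,   petal_width = 3
    [5.0, 3.0, 2.6, 1.2],   # second example: petal_length = 2.6, petal_width = 1.2
]

print(clf.predict(samples))                     # numeric class labels
print(iris.target_names[clf.predict(samples)])  # the corresponding flower names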

The output should be class 0 (setosa) for the first sample and class 1 (versicolor) for the second. This is exactly what we predicted by following the decision tree logic.

Scikit-Learn Decision Tree Parameters


If you take a look at the parameters the DecisionTreeClassifier can take, you might be surprised, so let’s look at some of them.

criterion : This parameter determines how the impurity of a split will be


measured. The default value is “gini” but you can also use “entropy” as a metric
for impurity.

splitter: This is how the decision tree searches the features for a split. The
default value is set to “best”. That is, for each node, the algorithm considers all
the features and chooses the best split. If you decide to set the splitter
parameter to “random,” then a random subset of features will be considered.
The split will then be made by the best feature within the random subset. The
size of the random subset is determined by the max_features parameter. This is
partly where a Random Forest gets its name.

max_depth: This determines the maximum depth of the tree. In our case, we use a depth of two to make our decision tree. The default value is None, which lets nodes expand until every leaf is pure (or holds fewer than min_samples_split samples). This will often result in over-fitted decision trees. The depth parameter is one of the ways in which we can regularize the tree, or limit the way it grows to prevent over-fitting. In Figure-4, you can see what happens if you don’t set the depth of the tree: pure madness!

Figure-4) A fully grown Decision Tree: In the tree shown above, none of the parameters were set. The tree grows fully to a depth of five. There are eight nodes and nine leaves. Not limiting the growth of a decision tree may lead to over-fitting.

min_samples_split: The minimum number of samples a node must contain in


order to consider splitting. The default value is two. You can use this parameter
to regularize your tree.

min_samples_leaf: The minimum number of samples needed to be considered


a leaf node. The default value is set to one. Use this parameter to limit the
growth of the tree.

max_features: The number of features to consider when looking for the best
split. If this value is not set, the decision tree will consider all features available
to make the best split. Depending on your application, it’s often a good idea to
tune this parameter. Here is an article that recommends how to set
max_features.

For syntax purposes, let’s set some of these parameters:
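A minimal sketch with illustrative values of my own choosing, not necessarily the article's original ones:

from sklearn.tree import DecisionTreeClassifier

clf = DecisionTreeClassifier(
    criterion="gini",       # impurity measure; "entropy" is the alternative
    splitter="best",        # consider every candidate split at each node
    max_depth=2,            # limit how deep the tree can grow
    min_samples_split=2,    # a node needs at least this many samples to split
    min_samples_leaf=1,     # every leaf must keep at least this many samples
    max_features=None,      # None means all features are considered at each split
)
clf.fit(X, y)               # X, y from the iris example above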

Closing Remarks
Now you know how to create a decision tree using Scikit-learn. More importantly, you should be able to visualize it and understand how it classifies samples. It’s important to note that one needs to limit the liberty of a decision tree. There are several parameters that can regularize a tree. By default, max_depth is set to None. Therefore, a tree will grow fully, which often results in over-fitting.
Moreover, a single decision tree is not a very powerful predictor.

The real power of decision trees unfolds when you cultivate many of them, while limiting the way they grow, and collect their individual predictions to form a final conclusion. In other words, you grow a forest, and if your forest is random in nature, using the concept of bagging and splitter = "random", we call this a Random Forest. Many of the parameters used in the Scikit-Learn Random Forest are the same ones explained in this article. So it’s a good idea to understand what a single decision tree is and how it works before moving on to using the big guns.

You can find me on LinkedIn or visit my personal blog.
