Scikit-Learn Decision Trees Explained - by Frank Ceballos - Towards Data Science
Decision trees are the most important elements of a Random Forest. They
are capable of fitting complex data sets while allowing the user to see
how a decision was taken. While searching the web I was unable to find one
clear article that could easily describe them, so here I am writing about what I
have learned so far. It's important to note that a single decision tree is not a very
good predictor; however, by creating an ensemble of them (a forest) and
collecting their predictions, one of the most powerful machine learning tools
can be obtained: the so-called Random Forest.
Make sure you have installed pandas and scikit-learn on your machine. We will
work with the iris flower data set, which has the following features and classes:
Features: sepal length (cm), sepal width (cm), petal length (cm), petal width
(cm)
Classes: setosa, versicolor, virginica
For simplicity, we will train our decision tree using all four features and a
maximum depth of two.
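As a minimal sketch, assuming scikit-learn's load_iris and plot_tree helpers (the figures in this article may have been produced with a graphviz export instead), such a depth-two tree could be trained and drawn like this:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, plot_tree
import matplotlib.pyplot as plt

# Load the iris data set: 150 samples, 4 features, 3 classes
iris = load_iris()
X, y = iris.data, iris.target

# Train a decision tree on all four features with a maximum depth of two
classifier = DecisionTreeClassifier(max_depth=2, random_state=42)
classifier.fit(X, y)

# Draw the fitted tree (similar to Figure-1)
plt.figure(figsize=(10, 6))
plot_tree(classifier, feature_names=iris.feature_names,
          class_names=iris.target_names, filled=True)
plt.show()
```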
Figure-1) Our decision tree: In this case, nodes are colored in white, while leaves are colored in orange, green,
and purple. More about leaves and nodes later.
Figure-2) The depth of the tree: The light colored boxes illustrate the depth of the tree. The root node is
located at a depth of zero.
petal length (cm) <= 2.45: The first question the decision tree asks is whether the
petal length is less than or equal to 2.45 cm. Based on the result, it either follows
the true or the false path.
gini = 0.667: The gini score is a metric that quantifies the purity of the node/leaf
(more about leaves in a bit). A gini score greater than zero implies that samples
contained within that node belong to different classes. A gini score of zero
means that the node is pure, i.e., only a single class of samples exists within
that node. You can find out more about impurity measures here. Notice that we have
a gini score greater than zero; therefore, we know that the samples contained
within the root node belong to different classes.
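For reference, the Gini impurity of a node is one minus the sum of the squared class proportions within that node. A quick sketch of the calculation reproduces the 0.667 shown for the root node in Figure-1:

```python
# Gini impurity: 1 - sum of squared class proportions within the node
def gini(class_counts):
    total = sum(class_counts)
    return 1.0 - sum((count / total) ** 2 for count in class_counts)

print(gini([50, 50, 50]))  # 0.666... -> the root node in Figure-1
print(gini([50, 0, 0]))    # 0.0 -> a pure node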
samples = 150: Since the iris flower data set contains 150 samples, this value is
set to 150.
value = [50, 50, 50]: The value list tells you how many samples at the given
node fall into each category. The first element of the list shows the number of
samples that belong to the setosa class, the second element of the list shows the
number of samples that belong to the versicolor class, and the third element in
the list shows the number of samples that belong to the virginica class. Notice
how this node is not a pure one since different types of classes are contained
within the same node. We knew this already from the gini score, but it’s nice to
actually see it.
class = setosa: The class value shows the prediction a given node will make
and it can be determined from the value list. Whichever class occurs the most
within the node will be selected as the class value. If the decision tree were to
end at the root node, it would predict that all 150 samples belonged to the
setosa class. Of course this makes no sense, since there is an equal number of
samples for each class. It seems to me that the decision tree is programmed to
choose the first class on the list if there is an equal number of samples for each
class.
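If you prefer to read these numbers programmatically rather than from the rendered figure, the fitted estimator exposes them through its tree_ attribute. A small sketch, assuming the classifier trained above (note that depending on your scikit-learn version, tree_.value holds either raw class counts or class fractions per node; either way, the largest entry marks the predicted class):

```python
import numpy as np

# Node 0 is the root of the tree
root_value = classifier.tree_.value[0]
print(classifier.tree_.n_node_samples[0])        # 150 samples reach the root
print(root_value)                                # e.g. [[50. 50. 50.]]
print(iris.target_names[np.argmax(root_value)])  # 'setosa' (ties go to the first class)
```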
If a sample has a petal length of 2.45 cm or less, it follows the true branch and
ends up in the leaf on the left, where all 50 samples belong to the setosa class,
as shown in the value list for this leaf. Therefore, the tree will predict that the
sample is a setosa flower.
Figure-3) Real tree vs Decision Tree Similarity: The tree on the left is inverted to illustrate how a tree grows
from its root and ends at its leaves. Seeing the decision tree on the right should make this analogy more clear.
Let us pick a more interesting sample. For instance, petal_length = 2.60 and
petal_width = 1.2 . We start at the root node which asks whether the petal
length is less than 2.45. This is false; therefore we move to the internal node on
the right, where the gini score is 0.5 and the total number of samples is 100.
This internal node at a depth of one will ask the question “Is the petal width less
than 1.75?” In our case, this is true, so we move to the left and end up in the
green-colored leaf node, which is at a depth of two. The decision tree will predict
that this sample is a versicolor flower. You can see that this is most likely the
case because 49 out of the 54 samples that end up in the green leaf node were
versicolor flowers (see the value list for this leaf).
Note that the classifier expects a sample in the form [sepal length, sepal width,
petal length, petal width], where the sepal length and sepal width won't affect the
predictions made by the decision tree shown in Figure-1; therefore, we can assign
them an arbitrary value.
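As a sketch of how this prediction could be checked in code, assuming the classifier fitted earlier (the sepal values below are arbitrary, since this tree never splits on them):

```python
# Feature order: [sepal length, sepal width, petal length, petal width].
# The sepal values (5.0 and 3.0 here) are arbitrary because this tree
# never splits on them.
prediction = classifier.predict([[5.0, 3.0, 2.60, 1.2]])
print(iris.target_names[prediction])  # expected: ['versicolor']
```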
splitter: This is how the decision tree searches the features for a split. The
default value is set to “best”. That is, for each node, the algorithm considers all
the features and chooses the best split. If you decide to set the splitter
parameter to “random,” then a random subset of features will be considered.
The split will then be made by the best feature within the random subset. The
size of the random subset is determined by the max_features parameter. This is
partly where a Random Forest gets its name.
max_depth: This determines the maximum depth of the tree. In our case, we
use a depth of two to make our decision tree. The default value is set to None.
This will often result in over-fitted decision trees. The depth parameter is one of
the ways in which we can regularize the tree, or limit the way it grows to
prevent over-fitting. In Figure-4, you can see what happens if you don’t set the
depth of the tree — pure madness!
Figure-4) A fully grown Decision Tree: In the tree shown above, none of the parameters were set. The tree
grows fully to a depth of five. There are eight nodes and nine leaves. Not limiting the growth of a decision
tree may lead to over-fitting.
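To see this over-growth for yourself, you could fit a second tree without any limits and compare its size to the depth-two tree. This is a sketch using the get_depth and get_n_leaves helpers available in recent scikit-learn versions; the exact numbers can vary slightly with the random seed:

```python
# Fit a tree with no growth limits and inspect how large it becomes
full_tree = DecisionTreeClassifier(random_state=42)
full_tree.fit(X, y)

print(full_tree.get_depth(), full_tree.get_n_leaves())    # roughly 5 and 9 on iris
print(classifier.get_depth(), classifier.get_n_leaves())  # 2 and 3 for the depth-two tree
```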
max_features: The number of features to consider when looking for the best
split. If this value is not set, the decision tree will consider all features available
to make the best split. Depending on your application, it’s often a good idea to
tune this parameter. Here is an article that recommends how to set
max_features.
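Putting the parameters discussed above together, here is a hedged sketch of how a regularized, randomized tree could be constructed; the particular values are illustrative, not recommendations:

```python
# splitter="random" plus max_features injects the randomness that a Random
# Forest relies on, while max_depth keeps the tree from over-fitting.
regularized_tree = DecisionTreeClassifier(
    splitter="random",
    max_depth=3,
    max_features=2,
    random_state=42,
)
regularized_tree.fit(X, y)
```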
Closing Remarks
Now you know how to create a decision tree using Scikit-learn. More importantly,
you should be able to visualize it and understand how it classifies samples. It’s
important to note that one needs to limit the liberty of a decision tree. There
are several parameters that can regularize a tree. By default, max_depth is
set to None. Therefore, a tree will grow fully, which often results in over-fitting.
Moreover, a single decision tree is not a very powerful predictor.
The real power of decision trees unfolds when you cultivate many of
them, while limiting the way they grow, and collect their individual
predictions to form a final conclusion. In other words, you grow a forest, and if
your forest is random in nature, using the concept of bagging and with splitter
= "random", we call this a Random Forest. Many of the parameters used in Scikit-
Learn Random Forest are the same ones explained in this article. So it’s a good
idea to understand what a single decision tree is and how it works, before
moving on to using the big guns.