Scikit-Learn Decision Trees Explained - by Frank Ceballos - Towards Data Science
Decision trees are the most important elements of a Random Forest. They
are capable of fitting complex data sets while allowing the user to see
how a decision was taken. While searching the web I was unable to find one
clear article that could easily describe them, so here I am writing about what I
have learned so far. It's important to note that a single decision tree is not a very
good predictor; however, by creating an ensemble of them (a forest) and
collecting their predictions, one of the most powerful machine learning tools
can be obtained: the so-called Random Forest.
Make sure you have installed pandas and scikit-learn on your machine. We will
work with the iris flower data set, which has the following features and classes:
Features: sepal length (cm), sepal width (cm), petal length (cm), petal width
(cm)
Classes: setosa, versicolor, virginica
For simplicity, we will train our decision tree using all four features and a
maximum depth of two.
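As a minimal sketch, assuming scikit-learn's load_iris and plot_tree helpers (the figures in this article may have been produced with a graphviz export instead), such a depth-two tree could be trained and drawn like this:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, plot_tree
import matplotlib.pyplot as plt

# Load the iris data set: 150 samples, 4 features, 3 classes
iris = load_iris()
X, y = iris.data, iris.target

# Train a decision tree on all four features with a maximum depth of two
classifier = DecisionTreeClassifier(max_depth=2, random_state=42)
classifier.fit(X, y)

# Draw the fitted tree (similar to Figure-1)
plt.figure(figsize=(10, 6))
plot_tree(classifier, feature_names=iris.feature_names,
          class_names=iris.target_names, filled=True)
plt.show()
```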
Figure-1) Our decision tree: In this case, nodes are colored in white, while leaves are colored in orange, green,
and purple. More about leaves and nodes later.
Figure-2) The depth of the tree: The light colored boxes illustrate the depth of the tree. The root node is
located at a depth of zero.
petal length (cm) <= 2.45: The first question the decision tree asks is whether the
petal length is less than or equal to 2.45 cm. Based on the result, it either follows
the true or the false path.
gini = 0.667: The gini score is a metric that quantifies the purity of the node/leaf
(more about leaves in a bit). A gini score greater than zero implies that samples
contained within that node belong to different classes. A gini score of zero
means that the node is pure, i.e., only a single class of samples exists within
that node. You can find out more about impurity measures here. Notice that we have
a gini score greater than zero; therefore, we know that the samples contained
within the root node belong to different classes.
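For reference, the Gini impurity of a node is one minus the sum of the squared class proportions within that node. A quick sketch of the calculation reproduces the 0.667 shown for the root node in Figure-1:

```python
# Gini impurity: 1 - sum of squared class proportions within the node
def gini(class_counts):
    total = sum(class_counts)
    return 1.0 - sum((count / total) ** 2 for count in class_counts)

print(gini([50, 50, 50]))  # 0.666... -> the root node in Figure-1
print(gini([50, 0, 0]))    # 0.0 -> a pure node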
samples = 150: Since the iris flower data set contains 150 samples, this value is
set to 150.
value = [50, 50, 50]: The value list tells you how many samples at the given
node fall into each category. The first element of the list shows the number of
samples that belong to the setosa class, the second element of the list shows the
number of samples that belong to the versicolor class, and the third element in
the list shows the number of samples that belong to the virginica class. Notice
how this node is not a pure one since different types of classes are contained
within the same node. We knew this already from the gini score, but it’s nice to
actually see it.
class = setosa: The class value shows the prediction a given node will make
and it can be determined from the value list. Whichever class occurs the most
within the node will be selected as the class value. If the decision tree were to
end at the root node, it would predict that all 150 samples belonged to the
setosa class. Of course this makes no sense, since there is an equal number of
samples for each class. It seems to me that the decision tree is programmed to
choose the first class on the list if there is an equal number of samples for each
class.
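If you prefer to read these numbers programmatically rather than from the rendered figure, the fitted estimator exposes them through its tree_ attribute. A small sketch, assuming the classifier trained above (note that depending on your scikit-learn version, tree_.value holds either raw class counts or class fractions per node; either way, the largest entry marks the predicted class):

```python
import numpy as np

# Node 0 is the root of the tree
root_value = classifier.tree_.value[0]
print(classifier.tree_.n_node_samples[0])        # 150 samples reach the root
print(root_value)                                # e.g. [[50. 50. 50.]]
print(iris.target_names[np.argmax(root_value)])  # 'setosa' (ties go to the first class)
```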
If a sample has a petal length of 2.45 cm or less, it follows the true branch and
ends up in the leaf on the left, where all 50 samples belong to the setosa class,
as shown in the value list for this leaf. Therefore, the tree will predict that the
sample is a setosa flower.
Figure-3) Real tree vs Decision Tree Similarity: The tree on the left is inverted to illustrate how a tree grows
from its root and ends at its leaves. Seeing the decision tree on the right should make this analogy more clear.
Let us pick a more interesting sample. For instance, petal_length = 2.60 and
petal_width = 1.2 . We start at the root node which asks whether the petal
length is less than 2.45. This is false; therefore we move to the internal node on
the right, where the gini score is 0.5 and the total number of samples is 100.
This internal node at a depth of one will ask the question “Is the petal width less
than 1.75?” In our case, this is true, so we move to the left and end up in the
green-colored leaf node, which is at a depth of two. The decision tree will predict
that this sample is a versicolor flower. You can see that this is most likely the
case because 49 out of the 54 samples that end up in the green leaf node were
versicolor flowers (see the value list for this leaf).
Note that the classifier expects a sample in the form [sepal length, sepal width,
petal length, petal width], where the sepal length and sepal width won't affect the
predictions made by the decision tree shown in Figure-1; therefore, we can assign
them an arbitrary value.
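As a sketch of how this prediction could be checked in code, assuming the classifier fitted earlier (the sepal values below are arbitrary, since this tree never splits on them):

```python
# Feature order: [sepal length, sepal width, petal length, petal width].
# The sepal values (5.0 and 3.0 here) are arbitrary because this tree
# never splits on them.
prediction = classifier.predict([[5.0, 3.0, 2.60, 1.2]])
print(iris.target_names[prediction])  # expected: ['versicolor']
```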
splitter: This is how the decision tree searches the features for a split. The
default value is set to “best”. That is, for each node, the algorithm considers all
the features and chooses the best split. If you decide to set the splitter
parameter to “random,” then a random subset of features will be considered.
The split will then be made by the best feature within the random subset. The
size of the random subset is determined by the max_features parameter. This is
partly where a Random Forest gets its name.
max_depth: This determines the maximum depth of the tree. In our case, we
use a depth of two to make our decision tree. The default value is set to None.
This will often result in over-fitted decision trees. The depth parameter is one of
the ways in which we can regularize the tree, or limit the way it grows to
prevent over-fitting. In Figure-4, you can see what happens if you don’t set the
depth of the tree — pure madness!
Figure-4) A fully grown Decision Tree: In the tree shown above, none of the parameters were set. The tree
grows fully to a depth of five. There are eight nodes and nine leaves. Not limiting the growth of a decision
tree may lead to over-fitting.
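To see this over-growth for yourself, you could fit a second tree without any limits and compare its size to the depth-two tree. This is a sketch using the get_depth and get_n_leaves helpers available in recent scikit-learn versions; the exact numbers can vary slightly with the random seed:

```python
# Fit a tree with no growth limits and inspect how large it becomes
full_tree = DecisionTreeClassifier(random_state=42)
full_tree.fit(X, y)

print(full_tree.get_depth(), full_tree.get_n_leaves())    # roughly 5 and 9 on iris
print(classifier.get_depth(), classifier.get_n_leaves())  # 2 and 3 for the depth-two tree
```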
max_features: The number of features to consider when looking for the best
split. If this value is not set, the decision tree will consider all features available
to make the best split. Depending on your application, it’s often a good idea to
tune this parameter. Here is an article that recommends how to set
max_features.
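Putting the parameters discussed above together, here is a hedged sketch of how a regularized, randomized tree could be constructed; the particular values are illustrative, not recommendations:

```python
# splitter="random" plus max_features injects the randomness that a Random
# Forest relies on, while max_depth keeps the tree from over-fitting.
regularized_tree = DecisionTreeClassifier(
    splitter="random",
    max_depth=3,
    max_features=2,
    random_state=42,
)
regularized_tree.fit(X, y)
```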
Closing Remarks
Now you know how to create a decision tree using Scikit-learn. More importantly,
you should be able to visualize it and understand how it classifies samples. It’s
important to note that one needs to limit the liberty of a decision tree. There
are several parameters that can regularize a tree. By default, max_depth is
set to None. Therefore, a tree will grow fully, which often results in over-fitting.
Moreover, a single decision tree is not a very powerful predictor.
The real power of decision trees unfolds when you cultivate many of
them, while limiting the way they grow, and collect their individual
predictions to form a final conclusion. In other words, you grow a forest, and if
your forest is random in nature, using the concept of bagging and with splitter
= "random", we call this a Random Forest. Many of the parameters used in Scikit-
Learn Random Forest are the same ones explained in this article. So it’s a good
idea to understand what a single decision tree is and how it works, before
moving on to using the big guns.