Decision Trees in Machine Learning - by Prashant Gupta - Towards Data Science
A decision tree is drawn upside down, with its root at the top. In the image on the left, the bold text in black represents a condition/internal node, based on which the tree splits into branches/edges. The end of a branch that doesn't split anymore is the decision/leaf; in this case, whether the passenger died or survived, represented as red and green text respectively.
A real dataset will have many more features, and the tree above would be just one branch in a much bigger one, but you can't ignore the simplicity of this algorithm. Feature importance is clear and relations can be viewed easily. This methodology is more commonly known as learning a decision tree from data, and the tree above is called a classification tree, as the target is to classify a passenger as survived or died. Regression trees are represented in the same manner, except that they predict continuous values like the price of a house. In general, decision tree algorithms are referred to as CART, or Classification and Regression Trees.
So, what is actually going on in the background? Growing a tree involves deciding which features to choose and what conditions to use for splitting, along with knowing when to stop. And since a tree generally grows arbitrarily, you will need to trim it down for it to look beautiful. Let's start with a common technique used for splitting.
Recursive Binary Splitting
In this procedure, all the features are considered and different split points are tried and tested using a cost function. The split with the best (lowest) cost is selected. Consider the earlier example of the tree learned from the Titanic dataset. At the first split, or the root, all attributes/features are considered and the training data is divided into groups based on the split. With 3 features, we have 3 candidate splits. Using a cost function, we calculate how much accuracy each split will cost us. The split that costs least is chosen, which in our example is the sex of the passenger. The algorithm is recursive in nature, as each group formed can be sub-divided using the same strategy. It is also known as a greedy algorithm, because at every step it picks the split with the lowest immediate cost without looking ahead. This makes the root node the best predictor/classifier.
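To make the search concrete, here is a minimal sketch of greedy split selection in Python (the function name and the generic cost argument are mine, for illustration, not from any particular library):

```python
def best_split(X, y, cost):
    """Greedy search: try every feature and threshold, keep the cheapest split.

    X is a list of feature rows, y the matching targets, and cost a function
    that scores a (left, right) partition of the targets (lower is better).
    """
    best = None  # (cost, feature_index, threshold)
    for j in range(len(X[0])):
        for threshold in sorted({row[j] for row in X}):
            left = [t for row, t in zip(X, y) if row[j] < threshold]
            right = [t for row, t in zip(X, y) if row[j] >= threshold]
            if not left or not right:
                continue  # a real split must produce two non-empty groups
            c = cost(left, right)
            if best is None or c < best[0]:
                best = (c, j, threshold)
    return best  # apply recursively to each group to grow the tree
```

For classification, cost would measure how mixed the classes in the two groups are, for example a size-weighted Gini score as defined below.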
Cost of a split
Let's take a closer look at the cost functions used for classification and regression. In both cases the cost function tries to find the most homogeneous branches, i.e. branches whose groups have similar responses. This makes sense: the more homogeneous the groups, the more sure we can be of the response for a test input that follows a certain path.
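Concretely, the standard CART cost functions are the sum of squared errors for regression and the Gini score for classification:

$$\text{Regression:}\quad \sum_{i \in \text{group}} (y_i - \hat{y})^2 \qquad\qquad \text{Classification:}\quad G = \sum_{k} p_k\,(1 - p_k)$$

where $\hat{y}$ is the mean response of a group and $p_k$ is the proportion of class $k$ inputs in a group.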
A Gini score gives an idea of how good a split is by how mixed the response classes are in the groups created by the split. Here, pk is the proportion of inputs of a given class present in a particular group. Perfect class purity occurs when a group contains only inputs from the same class, in which case each pk is either 1 or 0 and G = 0, whereas a node with a 50-50 split of classes in a group has the worst purity: for a binary classification it will have pk = 0.5 and G = 0.5.
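As a quick check of those numbers, here is a small Gini helper (a sketch; the function name is illustrative):

```python
from collections import Counter

def gini(group):
    """Gini score G = sum(pk * (1 - pk)) over the classes present in a group."""
    n = len(group)
    return sum((c / n) * (1 - c / n) for c in Counter(group).values())

print(gini(["died", "died", "survived", "survived"]))  # worst purity: 0.5
print(gini(["survived"] * 4))                          # perfect purity: 0.0
```

To score a whole split, weight each group's Gini score by its share of the inputs and add the two together.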
Pruning
The performance of a tree can be further increased by pruning. It involves removing the branches that make use of features having low importance. This way, we reduce the complexity of the tree, and thus increase its predictive power by reducing overfitting.
Pruning can start at either the root or the leaves. The simplest method starts at the leaves and replaces each node with its most popular class; the change is kept if it doesn't deteriorate accuracy. This is also called reduced error pruning. More sophisticated methods, such as cost complexity pruning, use a learning parameter (alpha) to weigh whether nodes can be removed based on the size of the sub-tree. This is also known as weakest link pruning.
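Scikit-Learn (mentioned again at the end of this article) exposes cost complexity pruning directly through the ccp_alpha parameter; a minimal sketch, with an arbitrary dataset standing in for your own:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Candidate alphas along the pruning path; larger alpha prunes more.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_train, y_train)

for alpha in path.ccp_alphas[::10]:
    tree = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha).fit(X_train, y_train)
    print(f"alpha={alpha:.4f}  leaves={tree.get_n_leaves()}  "
          f"test accuracy={tree.score(X_test, y_test):.3f}")
```

A moderate alpha typically yields a much smaller tree with the same or better test accuracy, which is exactly the reduction in overfitting described above.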
Advantages of CART
Simple to understand, interpret, and visualize.
Can handle both numerical and categorical data. Can also handle multi-output
problems.
Decision trees require relatively little effort from users for data preparation.
Disadvantages of CART
Decision-tree learners can create over-complex trees that do not generalize the data
well. This is called overfitting.
Decision trees can be unstable, because small variations in the data might result in a completely different tree being generated. This is called variance, which can be lowered by methods like bagging and boosting (see the sketch after this list).
Greedy algorithms cannot guarantee to return the globally optimal decision tree.
This can be mitigated by training multiple trees, where the features and samples
are randomly sampled with replacement.
Decision tree learners create biased trees if some classes dominate. It is therefore recommended to balance the dataset prior to fitting the decision tree.
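As a sketch of the bagging idea from the list above: Scikit-Learn's RandomForestClassifier trains many trees on bootstrap samples with random feature subsets and averages their votes, which lowers variance (the dataset and parameters here are arbitrary):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# One high-variance tree vs. an ensemble of 100 bagged trees.
single = DecisionTreeClassifier(random_state=0)
forest = RandomForestClassifier(n_estimators=100, random_state=0)

print("single tree:", cross_val_score(single, X, y, cv=5).mean())
print("forest:     ", cross_val_score(forest, X, y, cv=5).mean())
```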
That covers the basics, enough to get you up to speed with decision tree learning. An improvement over plain decision tree learning is made using the technique of boosting. A popular library for implementing these algorithms is Scikit-Learn. It has a wonderful API that can get your model up and running with just a few lines of code in Python.
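For instance, fitting a classification tree really does take only a few lines (a minimal sketch; the Iris dataset stands in for your own data):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# criterion="gini" is the default: splits are scored with the Gini function above.
model = DecisionTreeClassifier(criterion="gini", max_depth=3).fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))
```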
If you liked this article, be sure to click ❤ below to recommend it, and if you have any questions, leave a comment and I will do my best to answer.
To stay more aware of the world of machine learning, follow me. It's the best way to find out when I write more articles like this.
You can also follow me on Twitter, email me directly, or find me on LinkedIn. I'd love to hear from you.