Decision Trees
● Decision trees are non-parametric models used for both regression and
classification tasks
● This notebook demonstrates how to use decision trees for classification
● Decision trees are constructed from only two elements - nodes and branches
● Decision trees are based on recursion
○ A function that calls itself until some exit condition is met
○ The algorithm is built by recursively evaluating different features
and using the best split feature and criteria at each node
○ You'll learn how the best split feature and criteria are calculated in
the math section
● There are multiple types of nodes a decision tree can have, described below:
● Root node - node at the top of the tree, contains a feature that best splits the data
(a single feature that alone classifies the target variable most accurately)
● Decision nodes - nodes where the variables are evaluated. These nodes have
arrows pointing to them and away from them
● Leaf nodes - final nodes at which the prediction is made
● Check how every input feature classifies the target variable independently
● If none of them is 100% accurate, we can consider them impure
● The Entropy metric can be used to calculate impurity
○ Formula discussed later
○ Values range from 0 (best, a pure split) to 1 (worst for a two-class problem)
● The variable with the lowest entropy (impurity) is used as a root node
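● As a small illustration (not from the original notebook - the toy data and helper names are made up for this sketch), the root feature can be chosen by trying every feature and threshold on its own and keeping the split with the lowest weighted entropy:
import numpy as np

def entropy(labels):
    # Impurity of a set of integer class labels: -sum(p * log2(p))
    probs = np.bincount(labels) / len(labels)
    probs = probs[probs > 0]
    return -np.sum(probs * np.log2(probs))

# Toy data: feature 0 separates the classes cleanly, feature 1 does not
X = np.array([[1, 7], [2, 6], [3, 8], [8, 7], [9, 6]])
y = np.array([0, 0, 0, 1, 1])

for feature in range(X.shape[1]):
    # Weighted entropy of the best single split on this feature alone
    scores = []
    for threshold in np.unique(X[:, feature])[:-1]:  # the largest value would leave one side empty
        mask = X[:, feature] <= threshold
        score = mask.mean() * entropy(y[mask]) + (~mask).mean() * entropy(y[~mask])
        scores.append(score)
    print(f'Feature {feature}: lowest weighted entropy = {min(scores):.3f}')
● Feature 0 scores 0.000 here (a perfect split), so it would be used as the root node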
Training process
● Determine the root node (discussed earlier)
● Calculate the Information gain for a single split
○ Formula discussed later
○ The higher the gain the better the split
● Do a greedy search (sketched in code after this list)
○ Go over all input features and their unique values (thresholds)
○ Calculate information gain for every feature/threshold combination
○ Save the best split feature and best split threshold for every node
○ Build the tree recursively
○ Some stopping criteria should be applied when doing so
■ Think of it as an exit condition of a recursive
function
■ This could be maximum depth, minimum samples
at node...
○ If at the leaf node, return the prediction (most common value)
■ You'll know you're at a leaf node if a stopping
criteria has been met or if the split is pure
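● The whole training loop can be sketched in a few lines; this is a simplified illustration only (nested dicts instead of the Node class used later, and build_tree, max_depth, min_samples are names invented for this sketch), not the implementation built in the notebook:
import numpy as np
from collections import Counter

def entropy(y):
    # Impurity of an array of integer class labels
    probs = np.bincount(y) / len(y)
    probs = probs[probs > 0]
    return -np.sum(probs * np.log2(probs))

def build_tree(X, y, depth=0, max_depth=3, min_samples=2):
    '''Grow a tree of nested dicts by greedy, recursive splitting (illustrative sketch).'''
    # Stopping criteria (the "exit condition" of the recursion):
    # pure node, maximum depth reached, or too few samples left
    if len(np.unique(y)) == 1 or depth >= max_depth or len(y) < min_samples:
        return {'leaf': True, 'prediction': Counter(y).most_common(1)[0][0]}

    # Greedy search: try every feature / unique value (threshold) combination
    best_gain, best_feature, best_threshold, best_mask = -1.0, None, None, None
    parent_entropy = entropy(y)
    for feature in range(X.shape[1]):
        for threshold in np.unique(X[:, feature]):
            mask = X[:, feature] <= threshold
            if mask.all() or not mask.any():
                continue  # skip splits that leave one side empty
            # Information gain = parent entropy - weighted entropy of the children
            n_left, n_right = mask.sum(), (~mask).sum()
            gain = parent_entropy - (n_left * entropy(y[mask]) + n_right * entropy(y[~mask])) / len(y)
            if gain > best_gain:
                best_gain, best_feature, best_threshold, best_mask = gain, feature, threshold, mask

    if best_gain <= 0:  # no split improves purity - make a leaf
        return {'leaf': True, 'prediction': Counter(y).most_common(1)[0][0]}

    # Recurse into both halves and return a decision node
    return {'leaf': False, 'feature': best_feature, 'threshold': best_threshold,
            'left': build_tree(X[best_mask], y[best_mask], depth + 1, max_depth, min_samples),
            'right': build_tree(X[~best_mask], y[~best_mask], depth + 1, max_depth, min_samples)}
● Storing nodes as dicts keeps the sketch short; the notebook's actual implementation uses a dedicated Node class and a DecisionTree wrapper, shown further below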
Prediction process
● Traverse the tree recursively from the root - at each decision node, compare the sample's feature value to the node's threshold and move left or right, until a leaf node is reached and its stored prediction (the most common class) is returned
Math behind
● Entropy measures the impurity of a set of labels:
○ Entropy(S) = -Σ p_i * log2(p_i), where p_i is the proportion of samples in S belonging to class i
● Example in Python:
In [1]:
import numpy as np
from collections import Counter
In [2]:
def entropy(s):
    counts = np.bincount(s)
    percentages = counts / len(s)

    entropy = 0
    for pct in percentages:
        if pct > 0:
            entropy += pct * np.log2(pct)
    return -entropy
In [3]:
s = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
print(f'Entropy: {np.round(entropy(s), 5)}')
Entropy: 0.88129
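● For intuition about the 0-1 range, a perfectly pure set has entropy 0 while an evenly mixed two-class set hits the maximum of 1; two quick extra calls (not part of the original notebook) confirm this:
entropy([0, 0, 0, 0])   # pure set   -> 0.0 (best)
entropy([0, 0, 1, 1])   # 50/50 mix  -> 1.0 (worst for two classes)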
● Information gain:
○ The reduction in impurity from a split - the parent's entropy minus the weighted average entropy of the child nodes
○ Gain = Entropy(parent) - (n_left/n * Entropy(left) + n_right/n * Entropy(right))
○ The higher the information gain, the better the decision split is
● Example in Python:
In [4]:
def information_gain(parent, left_child, right_child):
    num_left = len(left_child) / len(parent)
    num_right = len(right_child) / len(parent)
    # Gain = parent entropy minus the weighted entropy of the two children
    gain = entropy(parent) - (num_left * entropy(left_child) + num_right * entropy(right_child))
    return gain
In [5]:
parent = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1]
left_child = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
right_child = [0, 0, 0, 0, 1, 1, 1, 1]
print(f'Information gain: {np.round(information_gain(parent, left_child, right_child), 5)}')
Information gain: 0.18094
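● In the full from-scratch implementation these helpers are wrapped into a DecisionTree class built around a small Node container. The complete class isn't reproduced here; the skeleton below is an assumed outline (attribute and parameter names are inferred from the methods that follow, and min_samples_split / max_depth defaults are illustrative):
class Node:
    '''Container for a single tree node (assumed structure, based on the attributes used below).'''
    def __init__(self, feature=None, threshold=None, data_left=None, data_right=None, value=None):
        self.feature = feature        # index of the feature this node splits on
        self.threshold = threshold    # threshold value for the split
        self.data_left = data_left    # left subtree (feature value <= threshold)
        self.data_right = data_right  # right subtree (feature value > threshold)
        self.value = value            # predicted class - set only for leaf nodes


class DecisionTree:
    '''Skeleton of the from-scratch classifier used in the cells below (sketch, not the full code).'''
    def __init__(self, min_samples_split=2, max_depth=5):
        self.min_samples_split = min_samples_split
        self.max_depth = max_depth
        self.root = None

    def fit(self, X, y):
        # Recursively builds the tree by greedy best-split search (not shown here)
        ...

    def predict(self, X):
        # Applies the recursive _predict helper (shown below) to every row of X
        ...
● Inside such a class the entropy helper appears as a static method, and prediction is handled by a recursive _predict method: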
@staticmethod
def _entropy(s):
    '''
    Helper function, calculates entropy from an array of integer values.

    :param s: list
    :return: float, entropy value
    '''
    # Convert to integers to avoid runtime errors
    counts = np.bincount(np.array(s, dtype=np.int64))
    # Probabilities of each class label
    percentages = counts / len(s)

    # Calculate entropy
    entropy = 0
    for pct in percentages:
        if pct > 0:
            entropy += pct * np.log2(pct)
    return -entropy
def _predict(self, x, tree):
    '''Recursively traverses the tree to classify a single sample x.'''
    # Leaf node - return the stored prediction (leaves are assumed to keep it in tree.value)
    if tree.value is not None:
        return tree.value
    feature_value = x[tree.feature]
    # Go to the left
    if feature_value <= tree.threshold:
        return self._predict(x=x, tree=tree.data_left)
    # Go to the right
    if feature_value > tree.threshold:
        return self._predict(x=x, tree=tree.data_right)
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()
X = iris['data']
y = iris['target']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)  # split ratio is illustrative
● You can now initialize and train the model, and make the predictions afterwards:
In [10]:
model = DecisionTree()
model.fit(X_train, y_train)
preds = model.predict(X_test)
In [11]:
np.array(preds, dtype=np.int64)
In [12]:
y_test
In [13]:
from sklearn.metrics import accuracy_score
accuracy_score(y_test, preds)
● We already know our model works well, but let's compare it to the
DecisionTreeClassifier from Scikit-Learn
In [14]:
from sklearn.tree import DecisionTreeClassifier
sk_model = DecisionTreeClassifier()
sk_model.fit(X_train, y_train)
sk_preds = sk_model.predict(X_test)
In [15]:
accuracy_score(y_test, sk_preds)