
5. Decision Trees

● Decision trees are a non-parametric model used for both regression and
classification tasks
● This notebook demonstrates how to use decision trees for classification
● Decision trees are constructed from only two elements - nodes and branches
● Decision trees are based on recursion
○ A function that calls itself until some exit condition is met
○ The algorithm is built by recursively evaluating different features
and using the best split feature and criteria at each node
○ You'll learn how the best split feature and criteria are calculated in
the math section
● There are multiple types of nodes a decision tree can have, described below:

● Root node - the node at the top of the tree; it contains the feature that best splits the data
(the single feature that alone classifies the target variable most accurately)
● Decision nodes - nodes where features are evaluated; they have branches both entering and
leaving them
● Leaf nodes - the final nodes, at which predictions are made

How to determine the root node

● Check how every input feature classifies the target variable independently
● If no single feature classifies the target 100% correctly, the resulting splits are considered impure
● The Entropy metric can be used to calculate impurity
○ Formula discussed later
○ For a binary target, values range from 0 (best) to 1 (worst)
● The variable with the lowest entropy (impurity) is used as the root node (a small comparison
example follows this list)
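
● A minimal sketch of that comparison, with made-up feature values and labels; the entropy
helper here is a standalone version of the one implemented later in this notebook:

import numpy as np

def entropy(labels):
    # Shannon entropy of a label vector, in bits
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

y = np.array([0, 0, 0, 1, 1, 1])           # target variable (made up)
feature_a = np.array([0, 0, 0, 1, 1, 1])   # splits the target perfectly
feature_b = np.array([0, 1, 0, 1, 0, 1])   # splits the target poorly

for name, feature in [('feature_a', feature_a), ('feature_b', feature_b)]:
    # Weighted entropy of the groups produced by splitting on this feature
    weighted = sum(
        (feature == v).mean() * entropy(y[feature == v])
        for v in np.unique(feature)
    )
    print(f'{name}: weighted entropy after split = {weighted:.3f}')

# feature_a produces pure groups (weighted entropy 0), so it would be picked as the root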

Training process
● Determine the root node (discussed earlier)
● Calculate the information gain for a single split
○ Formula discussed later
○ The higher the gain, the better the split
● Do a greedy search (a condensed sketch follows this list)
○ Go over all input features and their unique values (thresholds)
○ Calculate the information gain for every feature/threshold combination
○ Save the best split feature and best split threshold for every node
○ Build the tree recursively
○ Some stopping criterion should be applied when doing so
■ Think of it as the exit condition of a recursive function
■ This could be the maximum depth, the minimum number of samples at a node...
○ If at a leaf node, return the prediction (the most common value)
■ You'll know you're at a leaf node if a stopping criterion has been met or if the split is pure
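
● A condensed, self-contained sketch of that greedy recursive loop, using plain dicts for nodes
and a single made-up feature column; the full class-based implementation appears later in this
notebook:

import numpy as np
from collections import Counter

def entropy(y):
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def build(x, y, depth=0, max_depth=3, min_samples=2):
    # Exit conditions: pure node, too few samples, or maximum depth reached
    if len(set(y)) == 1 or len(y) < min_samples or depth >= max_depth:
        return {'leaf': Counter(y).most_common(1)[0][0]}

    # Greedy search: try every unique value as a threshold, keep the best gain
    best = None
    for threshold in np.unique(x):
        left, right = y[x <= threshold], y[x > threshold]
        if len(left) == 0 or len(right) == 0:
            continue
        gain = entropy(y) - (len(left) / len(y) * entropy(left)
                             + len(right) / len(y) * entropy(right))
        if best is None or gain > best['gain']:
            best = {'gain': gain, 'threshold': threshold}

    if best is None or best['gain'] == 0:
        return {'leaf': Counter(y).most_common(1)[0][0]}

    # Recurse into the two subsets created by the best threshold
    mask = x <= best['threshold']
    return {
        'threshold': best['threshold'],
        'left': build(x[mask], y[mask], depth + 1, max_depth, min_samples),
        'right': build(x[~mask], y[~mask], depth + 1, max_depth, min_samples),
    }

x = np.array([1.0, 1.5, 2.0, 3.0, 3.5, 4.0])
y = np.array([0, 0, 0, 1, 1, 1])
print(build(x, y))  # a nested dict with a single split at threshold 2.0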

Prediction process

● Recursively traverse the tree


● At each node, check the direction of traversal (left or right) based on the input data and the
node's split threshold (a short traversal sketch follows this list)
● When the leaf node is reached, the most common value is returned
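
● A minimal traversal sketch over a small hand-built tree (dict-based and made up here for
illustration; the class-based version appears in the Implementation section):

tree = {'threshold': 2.0, 'left': {'leaf': 0}, 'right': {'leaf': 1}}

def predict_one(node, value):
    # Leaf node: return the stored (most common) value
    if 'leaf' in node:
        return node['leaf']
    # Otherwise follow the branch chosen by the threshold comparison
    branch = node['left'] if value <= node['threshold'] else node['right']
    return predict_one(branch, value)

print([predict_one(tree, v) for v in [1.2, 3.7]])   # [0, 1]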

Math behind

● Essentially, you only need to implement two formulas


○ Entropy
○ Information gain
● Entropy
○ Measures the impurity of the labels at a node
○ Calculated at the node level
○ For a binary target, ranges between 0 (pure) and 1 (maximally impure)
○ The formula itself is written out after the Python example below
● Example in Python:
In [1]:
import numpy as np
from collections import Counter

In [2]:
def entropy(s):
    counts = np.bincount(s)
    percentages = counts / len(s)

    entropy = 0
    for pct in percentages:
        if pct > 0:
            entropy += pct * np.log2(pct)
    return -entropy

In [3]:
s = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
print(f'Entropy: {np.round(entropy(s), 5)}')

Entropy: 0.88129
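
● For reference, the formula the code above implements is the standard Shannon entropy over the
class proportions $p_c$ (with $0 \log_2 0$ treated as 0):

$E(s) = -\sum_{c} p_c \log_2 p_c$

● For the vector above that is $-(0.7 \log_2 0.7 + 0.3 \log_2 0.3) \approx 0.88129$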

● Information gain:
○ The parent node's entropy minus the weighted average entropy of the child nodes produced by
a split (the formula is written out after the Python example below)
○ The higher the information gain, the better the decision split is

● Example in Python:
In [4]:
def information_gain(parent, left_child, right_child):
    num_left = len(left_child) / len(parent)
    num_right = len(right_child) / len(parent)

    gain = entropy(parent) - (num_left * entropy(left_child) + num_right * entropy(right_child))
    return gain

In [5]:
parent = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1]
left_child = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
right_child = [0, 0, 0, 0, 1, 1, 1, 1]

print(f'Information gain: {np.round(information_gain(parent, left_child, right_child), 5)}')

Information gain: 0.18094
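
● For reference, the formula the code above implements is the parent's entropy minus the
size-weighted entropies of the children:

$IG = E(\text{parent}) - \left( \frac{n_{\text{left}}}{n_{\text{parent}}} E(\text{left}) + \frac{n_{\text{right}}}{n_{\text{parent}}} E(\text{right}) \right)$

● For the vectors above that is roughly $0.97095 - (0.6 \cdot 0.65002 + 0.4 \cdot 1.0) \approx 0.18094$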

Recursion Crash Course

● A lot of the decision tree implementation boils down to recursion


● Put simply, a recursive function is a function that calls itself
● Some "exit condition" is required if a function will call itself multiple times
○ It's common to write it at the top of the function
● Let's take a look at the simplest example - a recursive function that returns a
factorial of an integer
In [16]:
def factorial(x):
    # Exit condition
    if x == 1:
        return 1
    return x * factorial(x - 1)

print(f'Factorial of 5 is {factorial(5)}')

Factorial of 5 is 120
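
● For intuition, here is how that call unfolds before the results multiply back up:

# factorial(5)
# = 5 * factorial(4)
# = 5 * 4 * factorial(3)
# = 5 * 4 * 3 * factorial(2)
# = 5 * 4 * 3 * 2 * factorial(1)   <- exit condition reached
# = 5 * 4 * 3 * 2 * 1 = 120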

● Recursion is needed in decision tree classifiers to build additional nodes of a tree until some
exit condition is met

Implementation

● We'll need two classes


○ Node - implements a single node of a decision tree
○ DecisionTree - implements the algorithm
● The Node class is here to store the data about the feature, threshold, data going left
and right, information gain, and the leaf node value
○ All are initially set to None
○ The leaf node value is available only for leaf nodes
In [6]:
class Node:
    '''
    Helper class which implements a single tree node.
    '''
    def __init__(self, feature=None, threshold=None, data_left=None, data_right=None, gain=None, value=None):
        self.feature = feature
        self.threshold = threshold
        self.data_left = data_left
        self.data_right = data_right
        self.gain = gain
        self.value = value
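
● A quick illustrative instantiation of the class above; the feature index, threshold, gain, and
values below are arbitrary:

# Two leaf nodes: only the predicted value is set, everything else stays None
left_leaf = Node(value=0)
right_leaf = Node(value=1)

# A decision node: stores the split feature index, threshold, its two children,
# and the information gain of the split
decision_node = Node(feature=2, threshold=1.9, data_left=left_leaf,
                     data_right=right_leaf, gain=0.42)

print(decision_node.threshold, decision_node.data_left.value)   # 1.9 0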

● The DecisionTree class contains several methods
● The constructor holds values for min_samples_split and max_depth. These are
hyperparameters. The first one is used to specify a minimum number of samples
required to split a node, and the second one specifies a maximum depth of a tree.
Both are used in recursive functions as exit conditions
● The _entropy(s) function calculates the impurity of an input vector s
● The _information_gain(parent, left_child, right_child) function calculates the information gain
of a split between a parent and its two children
● The _best_split(X, y) function calculates the best splitting parameters for input
features X and a target variable y
○ It does so by iterating over every column in X and every threshold
value in every column to find the optimal split using information
gain
● The _build(X, y, depth) function recursively builds a decision tree until the stopping
criteria are met (the hyperparameters in the constructor)
● The fit(X, y) function calls the _build() function and stores the built tree to the
constructor
● The _predict(x) function traverses the tree to classify a single instance
● The predict(X) function applies the _predict() function to every instance in matrix X
In [7]:
class DecisionTree:
    '''
    Class which implements a decision tree classifier algorithm.
    '''
    def __init__(self, min_samples_split=2, max_depth=5):
        self.min_samples_split = min_samples_split
        self.max_depth = max_depth
        self.root = None

    @staticmethod
    def _entropy(s):
        '''
        Helper function, calculates entropy from an array of integer values.

        :param s: list
        :return: float, entropy value
        '''
        # Convert to integers to avoid runtime errors
        counts = np.bincount(np.array(s, dtype=np.int64))
        # Probabilities of each class label
        percentages = counts / len(s)

        # Calculate entropy
        entropy = 0
        for pct in percentages:
            if pct > 0:
                entropy += pct * np.log2(pct)
        return -entropy

    def _information_gain(self, parent, left_child, right_child):
        '''
        Helper function, calculates information gain from a parent and two child nodes.

        :param parent: list, the parent node
        :param left_child: list, left child of a parent
        :param right_child: list, right child of a parent
        :return: float, information gain
        '''
        num_left = len(left_child) / len(parent)
        num_right = len(right_child) / len(parent)

        # One-liner which implements the previously discussed formula
        return self._entropy(parent) - (num_left * self._entropy(left_child) +
                                        num_right * self._entropy(right_child))

    def _best_split(self, X, y):
        '''
        Helper function, calculates the best split for given features and target.

        :param X: np.array, features
        :param y: np.array or list, target
        :return: dict
        '''
        best_split = {}
        best_info_gain = -1
        n_rows, n_cols = X.shape

        # For every dataset feature
        for f_idx in range(n_cols):
            X_curr = X[:, f_idx]
            # For every unique value of that feature
            for threshold in np.unique(X_curr):
                # Construct a dataset and split it to the left and right parts
                # Left part includes records lower or equal to the threshold
                # Right part includes records higher than the threshold
                df = np.concatenate((X, y.reshape(1, -1).T), axis=1)
                df_left = np.array([row for row in df if row[f_idx] <= threshold])
                df_right = np.array([row for row in df if row[f_idx] > threshold])

                # Do the calculation only if there's data in both subsets
                if len(df_left) > 0 and len(df_right) > 0:
                    # Obtain the value of the target variable for subsets
                    y = df[:, -1]
                    y_left = df_left[:, -1]
                    y_right = df_right[:, -1]

                    # Calculate the information gain and save the split parameters
                    # if the current split is better than the previous best
                    gain = self._information_gain(y, y_left, y_right)
                    if gain > best_info_gain:
                        best_split = {
                            'feature_index': f_idx,
                            'threshold': threshold,
                            'df_left': df_left,
                            'df_right': df_right,
                            'gain': gain
                        }
                        best_info_gain = gain
        return best_split

    def _build(self, X, y, depth=0):
        '''
        Helper recursive function, used to build a decision tree from the input data.

        :param X: np.array, features
        :param y: np.array or list, target
        :param depth: current depth of a tree, used as a stopping criterion
        :return: Node
        '''
        n_rows, n_cols = X.shape

        # Check to see if a node should be a leaf node
        if n_rows >= self.min_samples_split and depth <= self.max_depth:
            # Get the best split
            best = self._best_split(X, y)
            # If a valid split was found and it isn't pure
            if best.get('gain', 0) > 0:
                # Build a tree on the left
                left = self._build(
                    X=best['df_left'][:, :-1],
                    y=best['df_left'][:, -1],
                    depth=depth + 1
                )
                # Build a tree on the right
                right = self._build(
                    X=best['df_right'][:, :-1],
                    y=best['df_right'][:, -1],
                    depth=depth + 1
                )
                return Node(
                    feature=best['feature_index'],
                    threshold=best['threshold'],
                    data_left=left,
                    data_right=right,
                    gain=best['gain']
                )
        # Leaf node - value is the most common target value
        return Node(
            value=Counter(y).most_common(1)[0][0]
        )

    def fit(self, X, y):
        '''
        Function used to train a decision tree classifier model.

        :param X: np.array, features
        :param y: np.array or list, target
        :return: None
        '''
        # Call a recursive function to build the tree
        self.root = self._build(X, y)

    def _predict(self, x, tree):
        '''
        Helper recursive function, used to predict a single instance (tree traversal).

        :param x: single observation
        :param tree: built tree
        :return: float, predicted class
        '''
        # Leaf node
        if tree.value is not None:
            return tree.value
        feature_value = x[tree.feature]

        # Go to the left
        if feature_value <= tree.threshold:
            return self._predict(x=x, tree=tree.data_left)

        # Go to the right
        if feature_value > tree.threshold:
            return self._predict(x=x, tree=tree.data_right)

    def predict(self, X):
        '''
        Function used to classify new instances.

        :param X: np.array, features
        :return: np.array, predicted classes
        '''
        # Call the _predict() function for every observation
        return [self._predict(x, self.root) for x in X]
Testing

● We'll use the Iris dataset from Scikit-Learn


In [8]:
from sklearn.datasets import load_iris

iris = load_iris()

X = iris['data']
y = iris['target']

● The code below applies a train/test split to the dataset:


In [9]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

● You can now initialize and train the model, and make the predictions afterwards:
In [10]:
model = DecisionTree()
model.fit(X_train, y_train)
preds = model.predict(X_test)

In [11]:
np.array(preds, dtype=np.int64)

In [12]:
y_test

● As you can see, the arrays are identical


● Let's calculate the accuracy to confirm this:
In [13]:
from sklearn.metrics import accuracy_score

accuracy_score(y_test, preds)

● As expected, a perfect score was obtained on the test set


Comparison with Scikit-Learn

● We already know our model works well, but let's compare it to the DecisionTreeClassifier
from Scikit-Learn
In [14]:
from sklearn.tree import DecisionTreeClassifier

sk_model = DecisionTreeClassifier()
sk_model.fit(X_train, y_train)
sk_preds = sk_model.predict(X_test)
In [15]:
accuracy_score(y_test, sk_preds)

● Both perform the same, at least accuracy-wise
