Decision Tree Classifier Project
Course Code: CSE 323
Course Name: PROGRAMMING WITH DATA STRUCTURES
Examination Committee
Prof Hossam Fahmy
Dr. Islam El-Maddah
Plagiarism Statement
I certify that this assignment / report is my own work, based on my personal study and/or research, and that I have acknowledged all material and sources used in its preparation, whether they are books, articles, reports, lecture notes, or any other kind of document, electronic or personal communication. I also certify that this assignment / report has not previously been submitted for assessment for another course. I certify that I have not copied in part or whole or otherwise plagiarized the work of other students and/or persons.
Submission Contents
01: Background
02: Implementation Details
03: Complexity of Operations
04: References
01: Background
Figure 2. A simple decision tree (DT).
02: Implementation Details
1. Overall implementation
Our implementation is divided into two parts: the decision tree algorithm and a GUI that makes it easy to use the program without being involved in the Python script.
1.1. The decision tree implementation
1.1.1. Presumptions
In our implementation of the decision tree algorithm, we assumed that the input variables take only two values, 0 and 1. We also assumed that we are doing binary classification (i.e., the output labels take two values, 0 and 1).
1.1.2. Information gain and entropy
The implementation of the decision tree is based on splitting the data to achieve the highest possible information gain. So, what is information gain? It is a statistical property that measures how well a given attribute separates the training examples according to their target classification [1]. The higher the information gain, the more separable the data is into groups that are readily distinguishable from each other. To calculate the information gain, we first need a statistical quantity from information theory: entropy. Entropy is a measure of the degree of impurity in a group of examples. For our case of binary classification, it has a simple mathematical form:

$\text{Entropy}(set) = -p_+ \log_2 p_+ - p_- \log_2 p_-$ (1)

where $p_+$ is the portion of the data belonging to the positive class and $p_-$ is the portion belonging to the negative class. The logarithm terms are negated so that entropy describes the impurity of the set with a non-negative number. Figure 3 visualizes this property.
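For concreteness, here is a minimal sketch of how equation (1) can be computed for a 0/1 label array. The function name, the NumPy usage, and the handling of empty or pure sets are illustrative assumptions rather than the exact code in Dt.py.

```python
import numpy as np

def entropy(labels):
    """Entropy of a 0/1 label array, per equation (1). Illustrative sketch."""
    if len(labels) == 0:
        return 0.0                      # an empty set carries no impurity (assumption)
    p_pos = np.mean(labels)             # portion of the positive class
    p_neg = 1.0 - p_pos                 # portion of the negative class
    if p_pos == 0.0 or p_neg == 0.0:
        return 0.0                      # a pure set has zero entropy
    return -p_pos * np.log2(p_pos) - p_neg * np.log2(p_neg)
```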
Now we can calculate the information gain as the parent's entropy minus a weighted average of the entropy of each of the children:

$I.G. = \text{entropy}(parent) - p_{left} \cdot \text{entropy}(left) - p_{right} \cdot \text{entropy}(right)$ (2)
where parent is the table available before splitting, left is the table of rows that satisfy the left node condition, right is the table of rows that satisfy the right node condition, $p_{left}$ is the ratio of the labels belonging to the left child to the total labels, and $p_{right}$ is the ratio of the labels belonging to the right child to the total labels.
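As a quick illustrative example (with arbitrarily chosen numbers): suppose the parent table has labels $\{0, 1, 1, 1\}$ and is split into a left child with labels $\{0, 1\}$ and a right child with labels $\{1, 1\}$. Then $\text{entropy}(parent) = -\tfrac{1}{4}\log_2\tfrac{1}{4} - \tfrac{3}{4}\log_2\tfrac{3}{4} \approx 0.811$, $\text{entropy}(left) = 1$, $\text{entropy}(right) = 0$, and $p_{left} = p_{right} = \tfrac{1}{2}$, so $I.G. \approx 0.811 - 0.5 \cdot 1 - 0.5 \cdot 0 \approx 0.311$.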
a decision for new data. This objective is achieved by finding the best split that maximizes the information gain at each splitting step. Figure 5 visualizes the complete decision tree produced by the train function when it is trained on an OR function. In a nutshell, the train function calls the “information_gain” and “split_column” functions and uses them to choose the best column to split on.
The base cases are:
1. The labels have only one class.
2. The labels have only one element.
3. The specified maximum depth is reached.
4. All splits give tables that have maximum entropy (impurity), i.e., for all splits the maximum information gain equals zero.
In any other (general) case, the train function simply calls itself recursively on the right child and the left child until a base case is reached.
So when do we reach a leaf node? This happens when one of the base cases above is reached; at this node we save the prediction in the “prediction” attribute.
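To make the recursion concrete, here is a simplified sketch of how such a train method could look under the report's assumptions (binary 0/1 features and labels). The attribute names follow the report; the exact signature, the default maximum depth, the rule that 0-valued features go to the left child, and the use of the split_column helper (sketched in section 1.1.3.3.3) are illustrative assumptions, not necessarily the code in Dt.py.

```python
import numpy as np

class TreeNode:
    """Illustrative node carrying the attributes described in the report."""
    def __init__(self):
        self.left = self.right = None        # child nodes
        self.column = self.split = None      # chosen feature column and split position
        self.prediction = None               # stored label value / mean of labels

    def train(self, data, labels, depth=0, max_depth=10):
        self.prediction = labels.mean()      # kept at every node so leaves can predict
        # Base cases 1-3: only one class, only one example, or maximum depth reached.
        if len(np.unique(labels)) == 1 or len(labels) == 1 or depth >= max_depth:
            return
        best_gain, best_col, best_split = 0.0, None, None
        for col in range(data.shape[1]):
            split, gain = split_column(data, labels, col)   # helper sketched below
            if gain > best_gain:
                best_gain, best_col, best_split = gain, col, split
        if best_gain == 0.0:                 # base case 4: no split improves purity
            return
        self.column, self.split = best_col, best_split
        mask = data[:, best_col] == 0        # assumption: 0-valued features go left
        self.left, self.right = TreeNode(), TreeNode()
        self.left.train(data[mask], labels[mask], depth + 1, max_depth)
        self.right.train(data[~mask], labels[~mask], depth + 1, max_depth)
```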
1.1.3.3.2. The “information_gain” function
This function takes as inputs a column that represents the training examples on one feature, a column that represents the labels, and the index at which the table will be split. The goal of the function is to calculate the information gain of the table after splitting. Here, as in the entropy function, we make use of a base case: when there is only one class in the labels, or one of the children has no elements, the information gain is obviously zero. In the general case, we use equation (2).
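A possible sketch of this function is shown below. It assumes the labels have already been ordered by the feature column, so only the label column and the split index are needed (the report's function also receives the feature column itself); the signature is illustrative.

```python
def information_gain(labels, split_index):
    """Information gain of splitting a label column at split_index, per equation (2)."""
    left, right = labels[:split_index], labels[split_index:]
    # Base case: a pure parent or an empty child yields zero gain.
    if len(left) == 0 or len(right) == 0 or len(np.unique(labels)) == 1:
        return 0.0
    p_left = len(left) / len(labels)
    p_right = len(right) / len(labels)
    return entropy(labels) - p_left * entropy(left) - p_right * entropy(right)
```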
1.1.3.3.3. The “split_column” function
The inputs to this function are the matrix of all training examples over all attributes, the labels, and the index of the column at which we want to split. The function finds the best split index and returns this index together with the maximum information gain obtained from the split. It does this by first sorting the values of the feature column so that all zeros are next to each other and all ones are next to each other, which makes splitting possible. The function then finds the boundaries at which the labels change from zero to one and runs through these boundaries to find the index that gives the best split.
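A simplified sketch is shown below. Because the features are assumed binary, after a stable sort the only candidate boundary checked here is where the sorted feature value changes from 0 to 1; the report describes scanning the boundaries where the labels change, so this is a simplification, and the signature and return values are illustrative.

```python
def split_column(data, labels, col):
    """Best split index and its information gain for feature `col` (illustrative sketch)."""
    order = np.argsort(data[:, col], kind="stable")   # put all 0s before all 1s
    sorted_values = data[order, col]
    sorted_labels = labels[order]
    best_split, best_gain = None, 0.0
    # Candidate boundaries: positions where the sorted feature value changes.
    for i in range(1, len(sorted_values)):
        if sorted_values[i] != sorted_values[i - 1]:
            gain = information_gain(sorted_labels, i)
            if gain > best_gain:
                best_split, best_gain = i, gain
    return best_split, best_gain
```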
1.1.3.3.4. The “predict_element” function
This function aims at predicting the label of one training example. It is recursive in nature, so we need a base case from which to start giving a prediction and build upon it. The base case is reached when there is no column stored in the “column” attribute or no split index stored in the “split” attribute. At this base case, the prediction equals the value stored in the “prediction” attribute (this attribute stores the value of the labels, or the mean of the labels when there is only one class or only one label, or the value derived from the left and right children's predictions after finding the best split). In the general case, the function tries to find the nearest location of the entered training example by comparing the index of the best splitting column with the index stored in the “split” attribute: if this column's index is less than the “split” attribute we turn left, and vice versa. We continue traversing recursively until we reach a leaf node (a node that has a NULL left or right child).
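A minimal sketch of this traversal, continuing the TreeNode sketch above, is shown below. For simplicity it routes examples with the 0-goes-left / 1-goes-right convention assumed in the train sketch rather than reproducing the exact comparison against the “split” attribute described in the report, so it should be read as an illustration only.

```python
def predict_element(self, example):
    """Predict the label of a single example by walking down the tree (illustrative)."""
    # Base case: a leaf stores no column/split, so return its saved prediction.
    if self.column is None or self.split is None:
        return self.prediction
    # Assumption: a 0 value in the chosen column goes left, a 1 value goes right.
    if example[self.column] == 0:
        return self.left.predict_element(example)
    return self.right.predict_element(example)
```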
1.1.3.3.5. The “predict” function
Predict function simply calls the “predict_element” function to predict a value for
each training example and returns an array of predictions (prediction for each
example).
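Continuing the same sketch, this wrapper can be as short as:

```python
def predict(self, data):
    """One prediction per example, computed by predict_element (illustrative)."""
    return np.array([self.predict_element(row) for row in data])
```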
1.1.3.4. The “DecisionTreeClassifier” class
1.1.3.4.1. The “fit” function
We called it “fit” so that it has the same name as the scikit-learn decision tree model and is therefore convenient to use. It simply creates a root node as a “TreeNode” object and calls the train function on the root to build the whole decision tree.
1.1.3.4.2. The “predict” function
Calls the predict function of the “TreeNode” class on an instance (the root).
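Putting the pieces together, here is a minimal sketch of the wrapper class, building on the TreeNode sketch above; the max_depth parameter and the OR-function check at the bottom are illustrative assumptions.

```python
import numpy as np

class DecisionTreeClassifier:
    """Thin wrapper mirroring the scikit-learn interface (illustrative sketch)."""
    def __init__(self, max_depth=10):
        self.max_depth = max_depth
        self.root = None

    def fit(self, data, labels):
        # Create the root node and grow the whole tree from it.
        self.root = TreeNode()
        self.root.train(data, labels, depth=0, max_depth=self.max_depth)
        return self

    def predict(self, data):
        # Delegate to the root node, which walks the tree once per example.
        return self.root.predict(data)

if __name__ == "__main__":
    # Quick check on the OR function mentioned in the report.
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
    y = np.array([0, 1, 1, 1])
    model = DecisionTreeClassifier().fit(X, y)
    print(model.predict(X))   # expected: [0. 1. 1. 1.]
```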
1.1.3.4.3. The “accuracy” function
Calculates a chosen accuracy metric of the model on the given dataset: the usual accuracy, the F1 score, or the Matthews correlation coefficient (MCC, a more representative metric for classification). Here are some definitions to make the idea clear.
True positives (TP): positive-class examples correctly predicted as positive.
False positives (FP): negative-class examples incorrectly predicted as positive.
True negatives (TN): negative-class examples correctly predicted as negative.
False negatives (FN): positive-class examples incorrectly predicted as negative.
Precision: TP / (TP + FP).
Recall: TP / (TP + FN).
F1 score: the harmonic mean of precision and recall.
$\text{Accuracy} = \dfrac{\#\text{ of correct predictions}}{\text{total }\#\text{ of labels}}$ (3)

$F_1 = \dfrac{2}{\text{precision}^{-1} + \text{recall}^{-1}}$ (4)

$MCC = \dfrac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}}$ (5)
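To illustrate how these metrics can be computed from the model's 0/1 predictions, here is a small sketch; the function name and signature are assumptions and not necessarily the report's accuracy method.

```python
import numpy as np

def accuracy_metrics(labels, predictions):
    """Accuracy, F1 score, and MCC for 0/1 labels and predictions (illustrative)."""
    labels, predictions = np.asarray(labels), np.asarray(predictions)
    tp = np.sum((labels == 1) & (predictions == 1))
    fp = np.sum((labels == 0) & (predictions == 1))
    tn = np.sum((labels == 0) & (predictions == 0))
    fn = np.sum((labels == 1) & (predictions == 0))
    accuracy = (tp + tn) / len(labels)                                       # equation (3)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 / (1 / precision + 1 / recall) if precision and recall else 0.0   # equation (4)
    denom = np.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0                      # equation (5)
    return accuracy, f1, mcc
```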
1.2. The GUI implementation
This GUI is created to make it easy for the user to use the decision tree algorithm without being involved in the Python script. Some implementation details and a guide to running this GUI are presented in this video: https://www.youtube.com/watch?v=g9s85DdLTNw.
The whole code of the project can also be found here: https://github.com/omar-ashinawy/DS-DT/blob/master/Dt.py
2. It Works!
In this section, we demonstrate that the project gives reasonable accuracy, measured by the usual accuracy defined by eq. (3) and the F1 score defined by eq. (4), even when the tree is not pruned.
We also tested the classifier on the given test file and saved the test results in:
https://drive.google.com/open?id=1YYTMkutEiLqC6yr-Zqy5V_bfN4dPvTz8
03: Complexity of Operations
04: References