Islamic University of Gaza
Computer Engineering Department
Artificial Intelligence ECOM 5038
LAB (1)
Decision Tree
Eng. Mohammed W. Awwad
Eng. Lina Y. Al-Aloul
March 2020
Objectives
1- To be familiar with classification decision trees.
2- To be able to build a classification decision tree using Python.
Introduction
A decision tree is a map of the possible outcomes of a series of related choices. It allows an
individual or organization to weigh possible actions against one another based on their
costs, probabilities, and benefits. Decision trees can also be used to map out an algorithm
that mathematically predicts the best choice.
A decision tree typically starts with a single node, which branches into possible outcomes.
Each of those outcomes leads to additional nodes, which branch off into other possibilities.
Types Of Trees
Classification and Regression Trees (CART) is a term introduced by Leo Breiman to refer to the
decision tree algorithm, which can be learned for classification or regression predictive
modeling problems.
Classification predictive modeling is the task of approximating a mapping function (f) from
input variables (X) to discrete output variables (y).
The output variables are often called labels or categories. The mapping function predicts the
class or category for a given observation.
Regression predictive modeling is the task of approximating a mapping function (f) from
input variables (X) to a continuous output variable (y).
A continuous output variable is a real-valued quantity, such as an integer or floating-point value.
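To make the distinction concrete, here is a minimal sketch (our own illustration, not part of the lab's code) using scikit-learn's two CART implementations on a tiny made-up dataset:

    from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

    X = [[0, 0], [1, 1], [2, 2]]

    # Classification: X maps to discrete labels y.
    y_labels = [0, 1, 1]
    clf = DecisionTreeClassifier(random_state=0)
    clf.fit(X, y_labels)
    print(clf.predict([[1.5, 1.5]]))   # predicts a class label

    # Regression: X maps to a continuous value y.
    y_values = [0.0, 0.8, 2.1]
    reg = DecisionTreeRegressor(random_state=0)
    reg.fit(X, y_values)
    print(reg.predict([[1.5, 1.5]]))   # predicts a real value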
Figure 2(a): A partial tree that splits the data into two branches. Figure 2(b): The vertical line A as a splitter at 2.45.
Figure 3(a): A tree that splits the data into two branches based on 4.95. Figure 3(b): The vertical line B as a splitter at 4.95.
In the image in Figure 3(a), the tree has a maximum depth of 2. Tree depth is a measure of
how many splits a tree can make before coming to a prediction. Splitting could be continued
until the tree is as pure as possible, but repeating the process many times can lead to a very
deep classification tree with many nodes. Luckily, most classification tree implementations
let you cap the maximum depth of a tree, which reduces overfitting. In other words, you can
set the maximum depth to stop the growth of the decision tree past a certain depth. For a
visual understanding of maximum depth, you can look at the image in Figure 4.
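In scikit-learn, which this lab uses, the cap is a single constructor argument; a minimal sketch:

    from sklearn.tree import DecisionTreeClassifier

    # Any split that would take the tree past depth 2 is simply not made.
    clf = DecisionTreeClassifier(max_depth=2)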
Selection Criterion
The decision tree algorithm uses information gain (IG) to decide where to split a node; Gini
impurity or entropy is the criterion used to calculate it:
IG = impurity before splitting (parent) - weighted impurity after splitting (children)
Gini impurity is a measure of how often a randomly chosen element from the set would be
incorrectly labeled if it were labeled randomly according to the distribution of labels in the
subset.
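To make both measures concrete, here is a small sketch (our own illustration, working on plain Python lists of class labels) that computes Gini impurity, entropy, and the information gain of a split:

    import math

    def gini(labels):
        """Gini impurity: chance of mislabeling a random element."""
        n = len(labels)
        probs = [labels.count(c) / n for c in set(labels)]
        return 1.0 - sum(p * p for p in probs)

    def entropy(labels):
        """Shannon entropy of the label distribution."""
        n = len(labels)
        probs = [labels.count(c) / n for c in set(labels)]
        return -sum(p * math.log2(p) for p in probs)

    def information_gain(parent, children, impurity=gini):
        """IG = impurity(parent) - weighted impurity of the children."""
        n = len(parent)
        weighted = sum(len(c) / n * impurity(c) for c in children)
        return impurity(parent) - weighted

    parent = [0, 0, 0, 1, 1, 1]
    left, right = [0, 0, 0], [1, 1, 1]              # a perfect split
    print(information_gain(parent, [left, right]))  # 0.5 with Gini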
Tree Parameters
One of the benefits of decision tree training is that you can stop training based on several
thresholds. (The parameter names below are those of R's rpart package; the scikit-learn
equivalents are shown in the sketch below.)
The minbucket option gives the smallest number of observations allowed in a terminal
node. If a split would produce a node with fewer observations than minbucket, the split is
rejected.
The minsplit parameter is the smallest number of observations a parent node must contain
to be split further. The default is 20, so a parent node with fewer than 20 records is labeled
as a terminal node.
Finally, the maxdepth parameter prevents the tree from growing past a certain depth/height.
The default is 30. You can use the maxdepth option to create single-rule trees.
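A minimal sketch of the scikit-learn equivalents (the values here are illustrative, not recommendations):

    from sklearn.tree import DecisionTreeClassifier

    clf = DecisionTreeClassifier(
        min_samples_leaf=5,    # ~ minbucket: smallest allowed terminal node
        min_samples_split=20,  # ~ minsplit: smallest node that may be split
        max_depth=30,          # ~ maxdepth: hard limit on tree depth
    )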
Disadvantages:
1. A small change in the data can cause a large change in the structure of the decision tree,
causing instability.
2. The calculations for a decision tree can sometimes become far more complex than for other
algorithms.
3. Decision trees often take more time to train than other models.
4. Decision tree training is therefore relatively expensive in complexity and time.
5. The decision tree algorithm is often inadequate for regression and for predicting continuous
values.
6. Decision trees are prone to overfitting.
Note: You should reshape the iris target from shape (150,) to (150, 1) to treat it as a
column vector.
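A sketch of that step, assuming scikit-learn's bundled iris dataset (the lab's own code appears as a screenshot):

    from sklearn.datasets import load_iris

    iris = load_iris()
    X = iris.data                   # shape (150, 4)
    y = iris.target.reshape(-1, 1)  # (150,) -> (150, 1), a column vector
    print(X.shape, y.shape)         # (150, 4) (150, 1)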
4- Building Model
To draw the tree, we need to install the graphviz library by entering the command
conda install python-graphviz in the Anaconda Prompt.
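The lab shows this step as a screenshot; the sketch below reconstructs the idea with scikit-learn's export_graphviz (variable names are our own):

    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier, export_graphviz
    import graphviz

    iris = load_iris()
    X_train, X_test, y_train, y_test = train_test_split(
        iris.data, iris.target, test_size=0.25, random_state=0)

    # Fit the classification tree on the training split.
    clf = DecisionTreeClassifier(random_state=0)
    clf.fit(X_train, y_train)

    # Export the fitted tree to DOT format and render it with graphviz.
    dot_data = export_graphviz(
        clf, out_file=None,
        feature_names=iris.feature_names,
        class_names=iris.target_names,
        filled=True, rounded=True)
    graphviz.Source(dot_data).render("iris_tree")  # writes iris_tree.pdf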
7- Feature importance
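A sketch of how to read the importance weights off a fitted model (our own reconstruction, again using the bundled iris data):

    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier

    iris = load_iris()
    clf = DecisionTreeClassifier(random_state=0).fit(iris.data, iris.target)

    # One weight per input feature; they sum to 1.
    for name, weight in zip(iris.feature_names, clf.feature_importances_):
        print(f"{name}: {weight:.3f}")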
As we can see, petal length and petal width have the highest feature importance
weights. Keep in mind that if a feature has a low feature importance value, it doesn't
necessarily mean that the feature isn't important for prediction; it just means that the
feature wasn't chosen at a particularly early level of the tree. It could also be that the
feature is identical to, or highly correlated with, another informative feature.
One way to improve the performance of our model is to find the optimal value
for the max_depth hyperparameter. The code below outputs the accuracy of decision
trees with different values of max_depth.
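(The lab shows this code as a screenshot; the sketch below reconstructs the idea, with the depth range chosen purely for illustration.)

    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    iris = load_iris()
    X_train, X_test, y_train, y_test = train_test_split(
        iris.data, iris.target, random_state=0)

    # Train one tree per candidate depth and report test accuracy.
    for depth in range(1, 7):
        clf = DecisionTreeClassifier(max_depth=depth, random_state=0)
        clf.fit(X_train, y_train)
        print(f"max_depth={depth}: accuracy = {clf.score(X_test, y_test):.3f}")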
Model A
Model B
Model C
Model D
Model E
Good Luck :)