Lecture - 3 Classification (Decision Tree)

The document discusses decision trees (DT): what they are, why they are useful, and how they work. It provides examples to illustrate key concepts, such as: DTs use splitting algorithms that recursively separate a dataset based on the values of features; information theory concepts like entropy and information gain are used to determine the optimal features to split on; and DTs output easy-to-understand classification rules and can handle both numeric and nominal data.


Classification: Decision Tree (DT)
Adama Science and Technology University
School of Electrical Engineering and Computing
Department of CSE
Dr. Mesfin Abebe Haile (2020)
Outline

 What is the decision tree (DT) algorithm
 Why we need DT
 The Pros and Cons of DT
 Information Theory
 Some Issues in DT
 Assignment II

Decision Tree (DT)

 Decision trees: splitting a dataset one feature at a time.

 The decision tree is one of the most commonly used classification techniques.
 It has decision blocks (rectangles) and terminating blocks (ovals).
 The right and left arrows are called branches.

 The kNN algorithm can do a great job of classification, but it doesn't give
any major insight about the data.

Decision Tree (DT)

Figure 1: A decision Tree

Decision Tree (DT)

 The best part of the DT (decision tree) algorithm is that humans can easily
understand the data:
 The DT algorithm:
 Takes a set of data (training examples),
 Builds a decision tree (model), and draws it.

 The tree can also be re-represented as a set of if-then rules to improve human readability.

 The DT does a great job of distilling data into knowledge.
 It takes a set of unfamiliar data and extracts a set of rules.
 DT is often used in expert system development.

Decision Tree (DT)

 The DT can be expressed using the following expression:

 (Outlook = Sunny ˄ Humidity = Normal)
 ˅ (Outlook = Overcast)
 ˅ (Outlook = Rain ˄ Wind = Weak) → Yes

Decision Tree (DT)

 The pros and cons of DT:

 Pros of DT:
 Computationally cheap to use,
 Easy for humans to understand the learned results,
 Missing values OK (robust to errors),
 Can deal with irrelevant features.

 Cons of DT:
 Prone to overfitting.
 Works with: numeric values, nominal values.

Decision Tree (DT)

 Appropriate problems for DT learning:

 Instances are represented by attribute-value pairs (a fixed set of attributes and
their values),
 The target function has discrete output values,
 Disjunctive descriptions may be required,
 The training data may contain errors,
 The training data may contain missing attribute values.

Decision Tree (DT)

 The mathematics used by a DT to split the dataset is called
information theory:
 The first decision you need to make is:
 Which feature should be used to split the data?
 You need to try every feature and measure which split will give the
best result.
 Then split the dataset into subsets.
 The subsets will then traverse down the branches of the decision
node.
 If the data on a branch is all of the same class, stop; else repeat the
splitting.

Decision Tree (DT)

Figure 2: Pseudo-code for the splitting function
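The pseudo-code figure itself is not reproduced here, so below is a minimal Python sketch of a recursive splitting routine in the same spirit. The helpers choose_best_feature and split_dataset are assumed here (concrete versions are sketched later in these notes), and the dictionary-of-dictionaries tree representation is just one convenient choice, not the only one.

    from collections import Counter

    def majority_class(class_list):
        # Most common class label; used when no features are left to split on.
        return Counter(class_list).most_common(1)[0][0]

    def create_branch(dataset, feature_names, choose_best_feature, split_dataset):
        # dataset: list of rows, with the class label in the last column.
        class_list = [row[-1] for row in dataset]
        if class_list.count(class_list[0]) == len(class_list):
            return class_list[0]                    # branch is pure: stop splitting
        if len(dataset[0]) == 1:                    # only the class column remains
            return majority_class(class_list)
        best = choose_best_feature(dataset)         # index of the best feature to split on
        name = feature_names[best]
        tree = {name: {}}
        remaining = feature_names[:best] + feature_names[best + 1:]
        for value in set(row[best] for row in dataset):
            subset = split_dataset(dataset, best, value)
            tree[name][value] = create_branch(subset, remaining,
                                              choose_best_feature, split_dataset)
        return tree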


Decision Tree (DT)

 General approach to decision trees:

 Collect: Any method.
 Prepare: The ID3 algorithm works only on nominal values, so any
continuous values will need to be quantized.
 Analyze: Any method. You should visually inspect the tree after it
is built.
 Train: Construct a tree data structure. (the DT)
 Test: Calculate the error rate with the learned tree.
 Use: This can be used in any supervised learning task, often to
better understand the data.

Decision Tree (DT)

 We would like to classify the following animals into two
classes:
 Fish and not fish.

Table 1: Marine animal data


Decision Tree (DT)

 We need to decide whether to split the data based on the
first feature or the second feature:
 To bring some organization to the unorganized data.
 One way to do this is to measure the information.
 Measure the information before and after the split.

 Information theory is a branch of science that is concerned with
quantifying information.
 The change in information before and after the split is known
as the information gain.
Decision Tree (DT)

 The split with the highest information gain is the best option.
 The measure of information of a set is known as the Shannon
entropy, or simply entropy.

 The change in information before and after the split is known
as the information gain.

Decision Tree (DT)

 To calculate entropy, you need the expected value of the
information of all possible values of our class.
 This is given by:

H = - Σ (i = 1 to n) p(xi) · log2 p(xi)

 where n is the number of classes and p(xi) is the proportion of
items belonging to class i.

Decision Tree (DT)

 The higher the entropy, the more mixed up the data.

 Another common measure of disorder in a set is the Gini impurity:
 The probability of choosing an item from the set times the
probability of that item being misclassified.

 Calculate the Shannon entropy of a dataset. (see the sketch below)
 Split the dataset on a given feature.
 Choose the best feature to split on.
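A minimal Python sketch of these three steps. Function names are illustrative, and the example rows only stand in for the marine-animal data in Table 1, whose exact values are not shown here.

    from math import log2

    def calc_entropy(dataset):
        # Shannon entropy of the class labels (last column of each row).
        counts = {}
        for row in dataset:
            counts[row[-1]] = counts.get(row[-1], 0) + 1
        total = len(dataset)
        return -sum((c / total) * log2(c / total) for c in counts.values())

    def split_dataset(dataset, feature, value):
        # Rows where `feature` equals `value`, with that feature column removed.
        return [row[:feature] + row[feature + 1:]
                for row in dataset if row[feature] == value]

    def choose_best_feature(dataset):
        # Index of the feature whose split gives the highest information gain.
        base_entropy = calc_entropy(dataset)
        n_features = len(dataset[0]) - 1            # last column is the class label
        best_gain, best_feature = 0.0, -1
        for i in range(n_features):
            new_entropy = 0.0
            for value in set(row[i] for row in dataset):
                subset = split_dataset(dataset, i, value)
                new_entropy += len(subset) / len(dataset) * calc_entropy(subset)
            gain = base_entropy - new_entropy       # information gain of splitting on i
            if gain > best_gain:
                best_gain, best_feature = gain, i
        return best_feature

    # Toy example (values assumed, not the actual Table 1 data):
    data = [[1, 1, 'yes'], [1, 1, 'yes'], [1, 0, 'no'], [0, 1, 'no'], [0, 1, 'no']]
    print(calc_entropy(data))         # about 0.971
    print(choose_best_feature(data))  # 0 -- the first feature gives the larger gain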

Decision Tree (DT)

 Recursively building the tree:

 Start with the dataset and split it based on the best attribute.
 The data will traverse down the branches of the tree to another
node.
 This node will then split the data again (recursively).
 Stop under the following conditions: we run out of attributes, or all the
instances in a branch are of the same class.

Decision Tree (DT)

Table 2: Example training sets

Decision Tree (DT)

Figure 3: Data path while splitting


Decision Tree (DT)

 ID3 uses the information gain measure to select among the
candidate attributes.
 Start with the dataset and split it based on the best attribute.
 Given a collection S, containing positive and negative examples
of some target concept:

 The entropy of S relative to this Boolean classification is:

Entropy(S) = - p+ log2(p+) - p- log2(p-)

where p+ and p- are the proportions of positive and negative examples in S.

Decision Tree (DT)

 Example:
 The target attribute is PlayTennis. (yes/no)

Table 3: Example training sets

Decision Tree (DT)

 Suppose S is a collection of 14 examples of some Boolean
concept, including 9 positive and 5 negative examples.
 Then the entropy of S relative to this Boolean classification is:
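For this 9-positive, 5-negative collection (the computation appears as a figure in the original slides), the value works out to:

Entropy(S) = Entropy([9+, 5-]) = -(9/14) log2(9/14) - (5/14) log2(5/14) ≈ 0.940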

Decision Tree (DT)

 Note that the entropy is 0 if all members of S belong to the
same class.
 For example, if all the members are positive (p+ = 1), then p- = 0:
 Entropy(S) = -1·log2(1) - 0·log2(0) = 0 (taking 0·log2(0) to be 0)

 Note that the entropy is 1 when the collection contains an
equal number of positive and negative examples.
 If the collection contains an unequal number of positive and
negative examples, the entropy is between 0 and 1.
Decision Tree (DT)

 Suppose S is a collection of training-example days described by
the attribute Wind (values Weak and Strong).
 Information gain is the measure used by ID3 to select the
best attribute at each step in growing the tree.
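In symbols, the information gain of an attribute A is the expected reduction in entropy from partitioning S on A (this is the standard ID3 definition, not shown explicitly on the slide):

Gain(S, A) = Entropy(S) - Σ (over v in Values(A)) (|Sv| / |S|) · Entropy(Sv)

As a worked example, assuming the usual 14-day PlayTennis data in which Wind = Weak covers [6+, 2-] and Wind = Strong covers [3+, 3-]:

Gain(S, Wind) = 0.940 - (8/14)(0.811) - (6/14)(1.000) ≈ 0.048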

Decision Tree (DT)

 Information gain of the two attributes: Humidity and Wind.

Decision Tree (DT)

 Example:
 ID3 determines the information gain for each attribute
(Outlook, Temperature, Humidity, and Wind).
 Then it selects the one with the highest information gain.
 The information gain values for all four attributes are:
 Gain (S, Outlook) = 0.246
 Gain (S, Humidity) = 0.151
 Gain (S, Wind) = 0.048
 Gain (S, Temperature) = 0.029

 Outlook provides greater information gain than the others.
Decision Tree (DT)

 Example:
 According to the information gain measure, the Outlook attribute is
selected as the root node.
 Branches are created below the root for each of its possible
values (Sunny, Overcast, and Rain).

Decision Tree (DT)

 The partially learned decision tree resulting from the first step of ID3

Decision Tree (DT)

 The Overcast descendant has only positive examples and
therefore becomes a leaf node with classification Yes.

 The other two nodes will be expanded by selecting the attribute
with the highest information gain relative to the new subsets.

Decision Tree (DT)

 Decision Tree learning can be:
 Classification tree: the target variable takes a finite set of values.
 Regression tree: the target variable takes continuous values.

 There are many specific Decision Tree algorithms:
 ID3 (Iterative Dichotomiser 3)
 C4.5 (successor of ID3)
 CART (Classification and Regression Tree)
 CHAID (Chi-squared Automatic Interaction Detector)
 MARS: extends DT to handle numerical data better
Decision Tree (DT)

 Different Decision Tree algorithms use different metrics for
measuring the "best attribute":
 Information gain: used by ID3, C4.5 and C5.0
 Gini impurity: used by CART

Decision Tree (DT)

 ID3 in terms of its search space and search strategy:

 ID3's hypothesis space of all decision trees is a complete space of
finite discrete-valued functions.
 ID3 maintains only a single current hypothesis as it searches
through the space of decision trees.

 ID3 in its pure form performs no backtracking in its search.
(post-pruning of the decision tree addresses this)
 ID3 uses all training examples at each step in the search to make
statistically based decisions regarding how to refine its current
hypothesis. (much less sensitive to errors)
Decision Tree (DT)

 Inductive bias in Decision Tree learning (ID3):

 The inductive bias is the set of assumptions the learner makes.
 ID3 selects in favor of shorter trees over longer ones. (a breadth-first
preference)
 It selects trees that place the attributes with the highest information
gain closest to the root.

Decision Tree (DT)

 Issues in Decision Tree learning:

 How deeply to grow the decision tree
 Handling continuous attributes
 Choosing an appropriate attribute selection measure
 Handling training data with missing attribute values
 Handling attributes with differing costs, and
 Improving computational efficiency

 ID3 has been extended to address most of these issues, producing C4.5.

Decision Tree (DT)

 Avoiding overfitting the data:

 Noisy data and too few training examples are problems.
 Overfitting is a practical problem for Decision Trees and many
other learning algorithms.
 Overfitting has been found to decrease the accuracy of the learned tree
by 10-25%.

 Approaches to avoid overfitting:

 Stop growing the tree before it overfits. (direct but less practical)
 Allow the tree to overfit, and then post-prune it. (the most successful
approach in practice)

Decision Tree (DT)

 Incorporating continuous-valued attributes:

 In the initial definition of ID3, the attributes and the target value must
take on a discrete set of values.

 The attributes tested in the decision nodes of the tree must be
discrete-valued:
 Create a new Boolean attribute by thresholding the continuous value
(see the sketch below), or
 Use multiple intervals rather than just two.
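A minimal Python sketch of the first option: pick the threshold for a continuous attribute by trying the midpoints between adjacent distinct sorted values and keeping the one with the highest information gain. All names and example values here are illustrative.

    from math import log2

    def entropy(labels):
        # Shannon entropy of a list of class labels.
        total = len(labels)
        return -sum((labels.count(c) / total) * log2(labels.count(c) / total)
                    for c in set(labels))

    def best_threshold(values, labels):
        # Candidate cuts are midpoints between adjacent distinct sorted values;
        # the chosen cut defines a new Boolean attribute `value > threshold`.
        pairs = sorted(zip(values, labels))
        base = entropy(labels)
        best_gain, best_cut = 0.0, None
        for i in range(1, len(pairs)):
            if pairs[i - 1][0] == pairs[i][0]:
                continue                            # no cut between equal values
            cut = (pairs[i - 1][0] + pairs[i][0]) / 2
            left = [lab for v, lab in pairs if v <= cut]
            right = [lab for v, lab in pairs if v > cut]
            gain = (base
                    - (len(left) / len(pairs)) * entropy(left)
                    - (len(right) / len(pairs)) * entropy(right))
            if gain > best_gain:
                best_gain, best_cut = gain, cut
        return best_cut, best_gain

    # Illustrative values only:
    temps = [40, 48, 60, 72, 80, 90]
    plays = ['No', 'No', 'Yes', 'Yes', 'Yes', 'No']
    print(best_threshold(temps, plays))   # picks the cut at 54.0 for this toy data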

Decision Tree (DT)

 Alternative measures for selecting attributes:

 Information gain favors attributes with many values.
 One alternative measure that has been used successfully is the
gain ratio.
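For reference, the gain ratio used by C4.5 (this is the standard definition, not shown on the slide) divides the gain by how broadly and evenly the attribute splits the data:

SplitInformation(S, A) = - Σ (i = 1 to c) (|Si| / |S|) · log2(|Si| / |S|)

GainRatio(S, A) = Gain(S, A) / SplitInformation(S, A)

where S1 ... Sc are the subsets produced by partitioning S on the c values of attribute A.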

Decision Tree (DT)

 Handling training examples with missing attribute values:

 Assign the missing attribute the most common value among the training
examples at node n. (sketched below)
 Assign a probability to each of the possible values of the attribute.
 The second approach is used in C4.5.
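A minimal Python sketch of the first strategy, filling a missing value with the most common value observed among the rows at the current node. The names and the missing-value marker are illustrative.

    from collections import Counter

    def fill_missing(rows, feature, missing='?'):
        # Replace missing entries of column `feature` with the most common
        # observed value among the rows at this node.
        observed = [row[feature] for row in rows if row[feature] != missing]
        most_common = Counter(observed).most_common(1)[0][0]
        return [row[:feature] + [most_common] + row[feature + 1:]
                if row[feature] == missing else row
                for row in rows]

    # Example: fill the missing Wind value with the commonest one at this node.
    rows = [['Sunny', 'Weak', 'No'], ['Sunny', '?', 'No'], ['Rain', 'Weak', 'Yes']]
    print(fill_missing(rows, 1))   # the '?' becomes 'Weak'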

Decision Tree (DT)

 Handling attributes with different costs:

 Prefer low-cost attributes to high-cost attributes.
 ID3 can be modified to take costs into account by introducing a cost
term into the attribute selection measure:
 Divide the gain by the cost of the attribute. (see below)
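In symbols, one simple cost-sensitive selection measure of this kind (an illustration, not a formula from the slides) is:

Gain(S, A) / Cost(A)

Variants that weight the gain more heavily, such as Gain(S, A)^2 / Cost(A), have also been used in the literature.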

Question & Answer

Thank You !!!

Assignment II

 Answer the given questions by considering the following set of training examples.

Assignment II

 (a) What is the entropy of this collection of training examples with respect to the target function classification?
 (b) What is the information gain of a2 relative to these training examples?

Decision Tree (DT)

 Do some research on the following Decision Tree algorithms:

 ID3 (Iterative Dichotomiser 3)
 C4.5 (Successor of ID3)
 CART (Classification and Regression Tree)
 CHAID (Chi – squared Automatic Interaction Detector)
 MARS: extends DT to handle numerical data better

