Deep Learning

Decision Trees I

Dr M. Sultan Zia
Associate Professor
The University of Lahore, Chenab Campus, Gujrat, Pakistan
DECISION TREES

Introduction

It is a method that induces concepts from examples (inductive learning)

Most widely used & practical learning method

The learning is supervised: i.e. the classes or categories of the data instances are known

It represents concepts as decision trees (which can be rewritten as if-then rules)

2
DECISION TREES

Introduction

The target function can be Boolean or discrete valued

3
DECISION TREES

Decision Tree Representation

1. Each node corresponds to an attribute

2. Each branch corresponds to an attribute value

3. Each leaf node assigns a classification

4
DECISION TREES

Example

5
DECISION TREES

Example

[Figure: A Decision Tree for the concept PlayTennis — the root tests Outlook (Sunny, Overcast, Rain); the Sunny branch tests Humidity (High, Normal) and the Rain branch tests Wind (Strong, Weak)]

An unknown observation is classified by testing its attributes and reaching a leaf node
6
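Below is a minimal Python sketch (not from the slides) of how such a tree could be represented and used to classify an observation. The nested-dict layout and the leaf labels on the Overcast and Rain branches are illustrative assumptions; the Sunny-branch labels match the rules listed later in the deck.

```python
# Hypothetical nested-dict encoding of the PlayTennis tree shown above.
# Overcast -> Yes and the Rain-branch leaves are assumed, not stated on this slide.
tree = {
    "Outlook": {
        "Sunny": {"Humidity": {"High": "No", "Normal": "Yes"}},
        "Overcast": "Yes",
        "Rain": {"Wind": {"Strong": "No", "Weak": "Yes"}},
    }
}

def classify(node, example):
    """Sort an example down the tree: test one attribute per node until a leaf is reached."""
    while isinstance(node, dict):
        attribute = next(iter(node))                    # attribute tested at this node
        node = node[attribute][example[attribute]]      # follow the branch for its value
    return node                                         # leaf = class label

print(classify(tree, {"Outlook": "Sunny", "Humidity": "High", "Wind": "Weak"}))  # -> No
```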
DECISION TREES

Decision Tree Representation

Decision trees represent a disjunction of conjunctions of constraints on the attribute values of instances

Each path from the tree root to a leaf corresponds to a conjunction of attribute tests (one rule for classification)

The tree itself corresponds to a disjunction of these conjunctions (a set of rules for classification)

7
DECISION TREES

Decision Tree Representation

8
DECISION TREES

Basic Decision Tree Learning Algorithm

Most algorithms for growing decision trees are variants of a basic algorithm

An example of this core algorithm is the ID3 algorithm developed by Quinlan (1986)

It employs a top-down, greedy search through the space of possible decision trees

9
DECISION TREES

Basic Decision Tree Learning Algorithm

First of all we select the best attribute to be tested at the root of the tree

For making this selection, each attribute is evaluated using a statistical test to determine how well it alone classifies the training examples

10
DECISION TREES

Basic Decision Tree Learning Algorithm


We have:

[Figure: the 14 training examples D1–D14 shown as a single unpartitioned collection]

- 14 observations (D1–D14)
- 4 attributes: Outlook, Temperature, Humidity, Wind
- 2 classes (Yes, No)

11
DECISION TREES

Basic Decision Tree Learning Algorithm

[Figure: the same 14 examples partitioned by the attribute Outlook into the branches Sunny, Overcast, and Rain]
12
DECISION TREES

Basic Decision Tree Learning Algorithm

The selection process is then repeated using the training examples associated with each descendant node to select the best attribute to test at that point in the tree

13
DECISION TREES

[Figure: the partition of the examples by Outlook, as on the previous slide]

What is the "best" attribute to test at this point? The possible choices are Temperature, Wind and Humidity
14
DECISION TREES

Basic Decision Tree Learning Algorithm

This forms a greedy search for an acceptable decision tree, in which the algorithm never backtracks to reconsider earlier choices

15
DECISION TREES

Which Attribute is the Best Classifier?

The central choice in the ID3 algorithm is selecting which attribute to test at each node in the tree

We would like to select the attribute which is most useful for classifying examples

For this we need a good quantitative measure

For this purpose a statistical property, called information gain, is used

16
DECISION TREES

Which Attribute is the Best Classifier?: Definition of Entropy

In order to define information gain precisely, we begin by defining entropy

Entropy is a measure commonly used in information theory

Entropy characterizes the impurity of an arbitrary collection of examples

17
DECISION TREES

Which Attribute is the Best Classifier?: Definition of Entropy

Suppose we have four independent values of a variable X: A, B, C, D

These values are independent and occur randomly

You might transmit these values over a binary serial link by encoding each reading with two bits:

A = 00    B = 01    C = 10    D = 11

18
DECISION TREES

Which Attribute is the Best Classifier?: Definition of Entropy


Someone tells you that their probabilities of occurrence are not equal:

p(A) = 1/2
p(B) = 1/4
p(C) = 1/8
p(D) = 1/8

It is now possible to invent a coding that uses only 1.75 bits on average per symbol for the transmission, e.g.

A = 0    B = 10    C = 110    D = 111


19
DECISION TREES

Which Attribute is the Best Classifier?: Definition of Entropy

Suppose X can have m values, V1, V2, …, Vm, with probabilities p1, p2, …, pm

The smallest number of bits, on average, per value, needed to transmit a stream of values of X is

−(p1 log2 p1 + p2 log2 p2 + … + pm log2 pm)

If one p = 1 and all other p's are 0, then we need 0 bits (i.e. we don't need to transmit anything)

20
DECISION TREES

Which Attribute is the Best Classifier?: Definition of Entropy


If all p’s are equal for a given m, we need the highest number of bits for transmission

If there are m possible values of an attribute, then the entropy can be as large as log2 m

21
DECISION TREES

Which Attribute is the Best Classifier?: Definition of Entropy

This formula is called Entropy H

H(X) = −Σi pi log2 pi

High Entropy means that the examples have more nearly equal probabilities of occurrence (and are therefore not easily predictable)

Low Entropy means easy predictability

22
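A short Python check of these claims, assuming nothing beyond the entropy formula above: the skewed distribution from the transmission example needs 1.75 bits on average, a uniform distribution over m values needs log2(m), and a certain outcome needs nothing.

```python
import math

def entropy(probs):
    """H(X) = -sum(p * log2(p)), with 0 * log 0 treated as 0."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

print(entropy([1/2, 1/4, 1/8, 1/8]))   # 1.75  (the A/B/C/D example above)
print(entropy([1/4, 1/4, 1/4, 1/4]))   # 2.0   (the maximum, log2(4), for equal p's)
print(entropy([1.0, 0, 0, 0]))         # 0.0   (one p = 1: nothing needs transmitting)
```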
DECISION TREES

Which Attribute is the Best Classifier?: Information Gain

Suppose we are trying to predict an output Y (Likes Film "PK") and we have an input X (College Major = v)

[Figure: the records split on Major into the branches Math, History, and CS]

23
DECISION TREES

Which Attribute is the Best Classifier?: Information Gain

We have H(X) = 1.5 and H(Y) = 1.0

Conditional Entropy H(Y | X = v): the Entropy of Y among only those records in which X = v

24
DECISION TREES

Which Attribute is the Best Classifier?: Information Gain


Conditional Entropy of Y
H(Y | X = Math) = 1.0
H(Y | X = History) = 0
H(Y | X = CS) = 0

Major
Math CS
History

25
DECISION TREES

Which Attribute is the Best Classifier?: Information Gain


Average Conditional Entropy of Y

H(Y | X) = Σv P(X = v) · H(Y | X = v)

For this example, H(Y | X) = 0.5

26
DECISION TREES

Which Attribute is the Best Classifier?: Information Gain

Information Gain is the expected reduction in entropy caused by partitioning the examples according to an attribute’s value

Info Gain (Y | X) = H(Y) – H(Y | X) = 1.0 – 0.5 = 0.5

It tells us, for transmitting Y, how many bits would be saved if both sides of the line knew X

In general, we write Gain (S, A), where S is the collection of examples and A is an attribute
27
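The following sketch reproduces the Major/Film example numerically. The eight records are hypothetical, chosen only so that they reproduce the figures quoted on the slides (H(Y) = 1.0, H(Y | X) = 0.5, gain = 0.5); they are not the original dataset.

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy of a list of class labels."""
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(xs, ys):
    """Gain(Y | X) = H(Y) - H(Y | X), with H(Y|X) the probability-weighted average."""
    n = len(ys)
    h_cond = 0.0
    for v in set(xs):
        subset = [y for x, y in zip(xs, ys) if x == v]
        h_cond += len(subset) / n * entropy(subset)
    return entropy(ys) - h_cond

# Hypothetical records consistent with the slide's numbers: Math is mixed,
# History and CS are pure, and the two classes are balanced overall.
major = ["Math", "Math", "Math", "Math", "History", "History", "CS", "CS"]
likes = ["Yes",  "No",   "Yes",  "No",   "No",      "No",      "Yes", "Yes"]

print(info_gain(major, likes))   # 1.0 - 0.5 = 0.5
```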
DECISION TREES

Which Attribute is the Best Classifier?: Information Gain

Let’s investigate the attribute Wind

28
DECISION TREES

Which Attribute is the Best Classifier?: Information Gain

The collection of examples has 9 positive values and 5 negative ones

Eight of these examples (6 positive and 2 negative ones) have the attribute value Wind = Weak

Six of these examples (3 positive and 3 negative ones) have the attribute value Wind = Strong

29
DECISION TREES

Which Attribute is the Best Classifier?: Information Gain

The information gain obtained by separating the examples according to the attribute Wind is calculated as:

Gain(S, Wind) = Entropy(S) − (8/14)·Entropy(SWeak) − (6/14)·Entropy(SStrong)
             = 0.940 − (8/14)(0.811) − (6/14)(1.000) ≈ 0.048

30
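A quick numerical check of this calculation, using only the class counts stated above (9+/5− overall, 6+/2− for Weak, 3+/3− for Strong):

```python
import math

def entropy_counts(pos, neg):
    """Entropy of a collection containing pos positive and neg negative examples."""
    total = pos + neg
    return sum(-(c / total) * math.log2(c / total) for c in (pos, neg) if c > 0)

total = 14                                   # 9 positive, 5 negative examples
h_s      = entropy_counts(9, 5)              # ~0.940
h_weak   = entropy_counts(6, 2)              # ~0.811  (8 examples with Wind = Weak)
h_strong = entropy_counts(3, 3)              # = 1.000 (6 examples with Wind = Strong)

gain_wind = h_s - (8 / total) * h_weak - (6 / total) * h_strong
print(round(gain_wind, 3))                   # ~0.048
```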
DECISION TREES

Which Attribute is the Best Classifier?: Information Gain


We calculate the Info Gain for each attribute and select
the attribute having the highest Info Gain

31
DECISION TREES

Select Attributes which Minimize Disorder

The formula can be converted from log2 to log10:

logx(M) = log10(M) · logx(10) = log10(M) / log10(x)

Hence log2(Y) = log10(Y) / log10(2)

32
DECISION TREES

Example

33
DECISION TREES

Example
Which attribute should be selected as the first test?

“Outlook” provides the most information

34
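A sketch that compares all four attributes for the first test. Only the Wind counts appear explicitly on the slides; the per-value (Yes, No) counts used below for Outlook, Temperature and Humidity are the commonly quoted PlayTennis figures and should be treated as assumptions here.

```python
import math

def entropy_counts(counts):
    """Entropy of a node given its class counts."""
    total = sum(counts)
    return sum(-(c / total) * math.log2(c / total) for c in counts if c > 0)

def gain(total_counts, splits):
    """Info gain of a split, where splits is a list of (Yes, No) counts per value."""
    total = sum(total_counts)
    remainder = sum(sum(s) / total * entropy_counts(s) for s in splits)
    return entropy_counts(total_counts) - remainder

S = (9, 5)   # 9 Yes, 5 No over all 14 examples
splits = {   # assumed per-value class counts for the PlayTennis data
    "Outlook":     [(2, 3), (4, 0), (3, 2)],   # Sunny, Overcast, Rain
    "Temperature": [(2, 2), (4, 2), (3, 1)],   # Hot, Mild, Cool
    "Humidity":    [(3, 4), (6, 1)],           # High, Normal
    "Wind":        [(6, 2), (3, 3)],           # Weak, Strong
}
for name, split in splits.items():
    print(name, round(gain(S, split), 3))
# Outlook 0.246, Temperature 0.029, Humidity 0.151, Wind 0.048 -> Outlook wins
```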
DECISION TREES

35
DECISION TREES

Example
The process of selecting a new attribute is now repeated for each (non-terminal) descendant node, this time using only the training examples associated with that node

Attributes that have been incorporated higher in the tree are excluded, so that any given attribute can appear at most once along any path through the tree

36
DECISION TREES

Example
This process continues for each new leaf node until either:

1. Every attribute has already been included along this path through the tree, or

2. The training examples associated with a leaf node have zero entropy

A minimal sketch of this recursive procedure is given below.

37
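The sketch below outlines the top-down procedure described on the last few slides, under the slides' assumptions (discrete attributes, a single target column). It is not Quinlan's original ID3 code; ties and missing values are ignored.

```python
import math
from collections import Counter

def entropy(examples, target):
    """Entropy of the target labels of a list of example dicts."""
    counts = Counter(e[target] for e in examples)
    n = len(examples)
    return sum(-(c / n) * math.log2(c / n) for c in counts.values())

def info_gain(examples, attr, target):
    """Expected reduction in entropy from splitting the examples on attr."""
    n = len(examples)
    remainder = 0.0
    for v in {e[attr] for e in examples}:
        subset = [e for e in examples if e[attr] == v]
        remainder += len(subset) / n * entropy(subset, target)
    return entropy(examples, target) - remainder

def id3(examples, attributes, target):
    """Grow a tree top-down, greedily picking the highest-gain attribute at each node."""
    labels = [e[target] for e in examples]
    if len(set(labels)) == 1:            # stopping rule 2: zero entropy at this node
        return labels[0]
    if not attributes:                   # stopping rule 1: every attribute already used
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: info_gain(examples, a, target))
    tree = {best: {}}
    for v in {e[best] for e in examples}:
        subset = [e for e in examples if e[best] == v]
        remaining = [a for a in attributes if a != best]   # exclude attributes used higher up
        tree[best][v] = id3(subset, remaining, target)
    return tree
```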
DECISION TREES

Example

38
DECISION TREES

From Decision Trees to Rules

Next Step: Make rules from the decision tree

After making the identification tree, we trace each path from the root node to a leaf node, recording the test outcomes as antecedents and the leaf node classification as the consequent

For our example we have:

If the Outlook is Sunny and the Humidity is High then No
If the Outlook is Sunny and the Humidity is Normal then Yes
...
39
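A small sketch of this tracing step, assuming the nested-dict tree representation used in the earlier classification sketch:

```python
def tree_to_rules(node, conditions=()):
    """Trace every root-to-leaf path; the tests become antecedents, the leaf the consequent."""
    if not isinstance(node, dict):                      # leaf: emit one rule
        ifs = " and ".join(f"the {a} is {v}" for a, v in conditions)
        return [f"If {ifs} then {node}"]
    attribute = next(iter(node))
    rules = []
    for value, child in node[attribute].items():
        rules += tree_to_rules(child, conditions + ((attribute, value),))
    return rules

for rule in tree_to_rules(tree):    # 'tree' as defined in the earlier sketch
    print(rule)
# If the Outlook is Sunny and the Humidity is High then No
# If the Outlook is Sunny and the Humidity is Normal then Yes
# ...
```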
DECISION TREES

Hypothesis Space Search


ID3 can be characterized as searching a space of hypotheses for one that fits the training examples

The space searched is the set of possible decision trees

ID3 performs a simple-to-complex, hill-climbing search through this hypothesis space
40
DECISION TREES

Hypothesis Space Search


It begins with an empty tree, then considers more and more elaborate hypotheses in search of a decision tree that correctly classifies the training data

The evaluation function that guides this hill-climbing search is the information gain measure

41
DECISION TREES

Hypothesis Space Search

Some points to note:

• The hypothesis space of all decision trees is a complete space. Hence the target function is guaranteed to be present in it.

42
DECISION TREES

Hypothesis Space Search


• ID3 maintains only a single current hypothesis as it searches through the space of decision trees.

By determining only a single hypothesis, ID3 loses the capabilities that follow from explicitly representing all consistent hypotheses.

For example, it does not have the ability to determine how many alternative decision trees are consistent with the training data, or to pose new instance queries that optimally resolve among these competing hypotheses.
43
DECISION TREES

Hypothesis Space Search

• ID3 performs no backtracking; therefore it is susceptible to converging to locally optimal solutions

• ID3 uses all training examples at each step to refine its current hypothesis. This makes it less sensitive to errors in individual training examples.
However, this requires that all the training examples are present right from the beginning, and the learning cannot be done incrementally over time

44
DECISION TREES

Reference
Sections 3.1 – 3.4.1 of T. Mitchell

Sections 3.4.2 – 3.5 of T. Mitchell

45
