
Decision Trees

Lương Thái Lê
Outline of the Lecture
1. Introduction to Decision Trees (DT)
2. DT Algorithms
3. Choosing the Best Features
   • Information Gain
   • Example
Decision Tree (DT) Introduction
[Figure: an example decision tree, with its root node, branches, and leaf nodes labeled]
• DT is a supervised learning method for classification
• A DT learns a classification function represented by a decision tree
• The tree can be represented as a set of IF–THEN rules
• Can perform well even with noisy data
• One of the most common inductive learning methods
• Successfully applied in many application problems
  • Ex: spam email filtering…
A DT: Example

• (Outlook=Overcast, Temperature=Hot, Humidity=High, Wind=Weak) → Yes

• (Outlook=Rain, Temperature=Mild, Humidity=High, Wind=Strong) → No

• (Outlook=Sunny, Temperature=Hot, Humidity=High, Wind=Strong) → No


Represent a DT (1)
• Each internal node represents an attribute to be tested on the examples
• Each branch from a node corresponds to a possible value of the attribute associated with that node
• Each leaf node represents one class ci in the set of classes C
• A learned DT classifies an example by traversing the tree from the root node to a leaf node
=> The class label associated with that leaf node is assigned to the example being classified
Represent a DT (2)
• A DT represents a disjunction of conjunctions of constraints on the attribute values of the examples
• Each path from the root node to a leaf node corresponds to a conjunction of attribute tests
DT – Problem Setting
• Set of possible instances X:
  • each instance x in X is a feature vector
  • x = <x1, x2, …, xn>; Ex: <Humidity=low, Wind=weak, Outlook=rain, Temp=hot>
• Unknown target function f: X → Y
  • y ∈ Y; y = 1 if we play tennis on this day, else y = 0
• Set of function hypotheses H = {h | h: X → Y}
  • each hypothesis h is a decision tree

• Input:
  • Training examples {<x(i), y(i)>} of the unknown target function f
• Output:
  • Hypothesis h ∈ H that best approximates f
Top-down Induction of Decision Trees
[ID3, C4.5 – Quinlan]
node = Root
Main loop:
1. A ← the best decision attribute (feature) for the next node
2. Assign A as the decision attribute for node
3. For each value of A, create a descendant of node
4. Sort the training examples to the leaf nodes
5. If the training examples are perfectly classified, then STOP; else iterate over the new leaf nodes

Which feature (attribute) is the best?


ID3 Pseudocode (Quinlan, 1979)
ID3_alg(Training_Set, Class_Labels, Attributes)
{
  Create the Root node of the decision tree
  If all examples in Training_Set belong to the same class c,
    Return the decision tree whose Root node is labeled c
  If the set Attributes is empty,
    Return the decision tree whose Root node is labeled Majority_Class_Label(Training_Set)
  A ← the attribute in Attributes that best classifies Training_Set
  Test attribute for Root ← A
  For each possible value v of attribute A
    Add a new branch under Root, corresponding to the case "the value of A is v"
    Determine Training_Set_v = {x | x ∈ Training_Set, x_A = v}
    If (Training_Set_v = ∅) then
      Create a leaf node with class label = Majority_Class_Label(Training_Set)
      Attach this leaf node to the newly created branch
    Else attach to the newly created branch the subtree generated by
      ID3_alg(Training_Set_v, Class_Labels, Attributes \ {A})
  Return Root
}
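
To make the pseudocode concrete, here is a minimal Python sketch (an illustration, not the lecture's reference implementation). It assumes each training example is a dict mapping attribute names to values, with the class label stored under a hypothetical "label" key; the best attribute is chosen by the Information Gain measure introduced later in the lecture, and the sketch only branches on attribute values observed in the current subset.

from collections import Counter
from math import log2

def entropy(examples):
    # Entropy of the set of examples with respect to their class labels
    n = len(examples)
    counts = Counter(ex["label"] for ex in examples)
    return -sum((c / n) * log2(c / n) for c in counts.values())

def information_gain(examples, attribute):
    # IG(S, A) = Entropy(S) - sum over values v of |Sv|/|S| * Entropy(Sv)
    n = len(examples)
    remainder = 0.0
    for v in {ex[attribute] for ex in examples}:
        subset = [ex for ex in examples if ex[attribute] == v]
        remainder += len(subset) / n * entropy(subset)
    return entropy(examples) - remainder

def majority_class_label(examples):
    return Counter(ex["label"] for ex in examples).most_common(1)[0][0]

def id3(examples, attributes):
    # Returns a leaf (a class label) or an internal node (attribute, {value: subtree})
    labels = {ex["label"] for ex in examples}
    if len(labels) == 1:                      # all examples belong to the same class
        return labels.pop()
    if not attributes:                        # no attributes left: use the majority class
        return majority_class_label(examples)
    best = max(attributes, key=lambda a: information_gain(examples, a))
    branches = {}
    # ID3 proper branches on every possible value of A; this sketch only sees observed values
    for v in {ex[best] for ex in examples}:
        subset = [ex for ex in examples if ex[best] == v]
        branches[v] = id3(subset, [a for a in attributes if a != best])
    return (best, branches)

def classify(tree, example):
    # Traverse the tree from the root to a leaf; the leaf's label is the prediction
    while isinstance(tree, tuple):
        attribute, branches = tree
        tree = branches[example[attribute]]
    return tree

Usage: tree = id3(training_examples, attribute_names), then classify(tree, x) traverses the learned tree from the root to a leaf, as described in the earlier slides.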
Choose the Best Attribute
• How to evaluate an attribute's ability to separate the learning examples by their class label?
  => Use a statistical evaluation: Information Gain
• Example: which attribute will be chosen, A1 or A2?
Entropy
• Evaluates the heterogeneity/impurity of a set
• Entropy of the set S for classification with k classes:

  Entropy(S) = Σ (i=1..k) −pi · log2(pi)

  where pi is the proportion of examples in the set S that belong to class i, and 0·log2(0) = 0
• Entropy of the set S for classification with 2 classes:

  H(S) ≡ −p1·log2(p1) − p2·log2(p2)

• The meaning of entropy in Information Theory: the entropy of the set S indicates the number of bits required to encode the class of an element randomly drawn from S
Entropy – Example with 2 classes
• S includes 14 examples, of which 9 belong to class c1 (Yes) and 5 belong to class c2 (No)

  Entropy(S) = −(9/14)·log2(9/14) − (5/14)·log2(5/14) ≈ 0.94

• Entropy = 0, if all examples belong to the same class (c1 or c2)
• Entropy = 1, if the number of examples in class c1 equals the number of examples in class c2
• Entropy takes a value in (0, 1), if the numbers of examples in classes c1 and c2 differ
High Entropy:
• x comes from a near-uniform distribution
• values sampled from it are less predictable
Low Entropy:
• x comes from a varied (peaks and valleys) distribution
• values sampled from it are more predictable
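
A quick numerical check of the two-class entropy values above, as a small Python sketch (the helper name entropy2 is ours):

from math import log2

def entropy2(p1, p2):
    # Two-class entropy; the 0*log2(0) term is treated as 0
    return sum(-p * log2(p) for p in (p1, p2) if p > 0)

print(round(entropy2(9/14, 5/14), 2))   # 0.94  (the 9 Yes / 5 No example)
print(entropy2(0.5, 0.5))               # 1.0   (balanced classes)
print(entropy2(1.0, 0.0))               # prints -0.0, i.e. 0: all examples in one class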
Information Gain
• Information Gain of an attribute for a set of examples: the reduction in Entropy obtained by partitioning the examples by the values of that attribute
• Information Gain of attribute A for the set S:

  IG(S, A) = Entropy(S) − Σ (v ∈ Values(A)) (|Sv| / |S|) · Entropy(Sv)

  where Values(A) is the set of possible values of the attribute A and Sv = {x | x ∈ S, xA = v}
• Meaning of IG(S, A): the number of bits saved when encoding the class of a random example from S, once the value of attribute A is known
=> The best feature is the feature with the highest IG
The Learning Set S (Mitchell, 1998)
[Table: 14 training examples described by Outlook, Temperature, Humidity, Wind, with class label Yes/No (9 Yes, 5 No)]
Information Gain – Example
• Calculate the Information Gain of the Wind attribute for the learning set S: IG(S, Wind)
• The Wind attribute has 2 possible values: Weak and Strong
• S = {9 examples of Yes, 5 examples of No}
• SWeak = {6 examples of class Yes and 2 examples of class No, with Wind = Weak}
• SStrong = {3 examples of class Yes and 3 examples of class No, with Wind = Strong}

  IG(S, Wind) = Entropy(S) − Σ (v ∈ {Weak, Strong}) (|Sv| / |S|) · Entropy(Sv)
              = Entropy(S) − (8/14)·Entropy(SWeak) − (6/14)·Entropy(SStrong)
              = 0.94 − (8/14)·0.81 − (6/14)·1 = 0.048
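
The same IG(S, Wind) calculation as a short runnable Python sketch, using the class counts listed above (the helper here takes raw class counts; that signature is our choice):

from math import log2

def entropy(pos, neg):
    # Two-class entropy from class counts; the 0*log2(0) term is treated as 0
    total = pos + neg
    return -sum((c / total) * log2(c / total) for c in (pos, neg) if c > 0)

# S: 9 Yes / 5 No; S_Weak: 6 Yes / 2 No; S_Strong: 3 Yes / 3 No
ig_wind = entropy(9, 5) - (8/14) * entropy(6, 2) - (6/14) * entropy(3, 3)
print(round(ig_wind, 3))   # 0.048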
Learning a DT – Example (1)
• For the Root, choose the best feature from the set {Outlook, Temperature, Humidity, Wind}
  • IG(S, Outlook) = … = 0.246   <- the highest IG
  • IG(S, Temperature) = … = 0.029
  • IG(S, Humidity) = … = 0.151
  • IG(S, Wind) = … = 0.048
=> Outlook is chosen as the test feature for the Root
Learning a DT – Example (2)
• For Node1, choose the best feature from the set {Temperature, Humidity, Wind} as the test feature:
  • IG(SSunny, Temperature) = … = 0.570
  • IG(SSunny, Humidity) = … = 0.970   <- the highest IG
  • IG(SSunny, Wind) = … = 0.019
=> Choose Humidity for Node1
• Similarly, we obtain Node2, Node3, Node4
Comments on the Strategy of ID3
• ID3 searches for only one (not all) of the decision trees that fit the training examples
  • it chooses the first matching decision tree found during its search
• Uses Information Gain to choose the best test feature
  => biased towards multi-valued attributes (Ex: bank account, ID, …) => easily leads to overfitting
• During the search, ID3 does not perform backtracking
  => it is only guaranteed to find a locally optimal solution
Problems in ID3 that Need to be Solved
• Overfitting
• Handling attributes with continuous values (Age, Price, …)
• More suitable evaluation measures (better than Information Gain) for determining the test attribute at a node
• Handling training examples with missing attribute values
• Handling attributes with different costs
=> C4.5 can handle all the problems above
Solving Overfitting
• 2 strategies:
  • Stop growing the decision tree early, before it reaches a structure that perfectly classifies the training set
    => difficult to decide when to stop
  • Learn the full tree (perfectly fitting the training set), then prune the tree
    => often gives better performance in practice
• How to prune trees properly?
  • Evaluate the classifier's performance on a validation set
  • Use reduced-error pruning or rule post-pruning
Reduced-error Pruning
• Each node of the complete tree is considered for pruning (see the sketch after this slide)
• A node is pruned if the tree obtained after pruning it performs no worse than the original tree on the validation set
• Pruning a node consists of:
  • Removing the subtree rooted at the pruned node
  • Converting the pruned node into a leaf node
  • Attaching to this leaf node the class label that dominates the training examples associated with that node
• Repeat pruning:
  • Always select the node whose pruning maximizes the classification accuracy of the decision tree on the validation set
  • Stop pruning when any further pruning would reduce the classification accuracy of the decision tree on the validation set
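
A possible Python sketch of greedy reduced-error pruning, assuming the (attribute, {value: subtree}) / leaf-label tree representation from the ID3 sketch earlier; the accuracy, internal_paths, replace_at, and majority_at helpers are ours, introduced only for illustration.

from collections import Counter

def classify(tree, example):
    while isinstance(tree, tuple):
        attribute, branches = tree
        tree = branches[example[attribute]]
    return tree

def accuracy(tree, examples):
    # assumes every attribute value in `examples` also appears in the tree
    return sum(classify(tree, ex) == ex["label"] for ex in examples) / len(examples)

def internal_paths(tree, path=()):
    # yield the path (sequence of branch values) of every internal node
    if isinstance(tree, tuple):
        yield path
        for v, sub in tree[1].items():
            yield from internal_paths(sub, path + (v,))

def replace_at(tree, path, leaf_label):
    # return a copy of the tree with the node at `path` turned into a leaf
    if not path:
        return leaf_label
    attribute, branches = tree
    new_branches = dict(branches)
    new_branches[path[0]] = replace_at(branches[path[0]], path[1:], leaf_label)
    return (attribute, new_branches)

def majority_at(tree, path, training):
    # majority class label of the training examples that reach the node at `path`
    for v in path:
        attribute, branches = tree
        training = [ex for ex in training if ex[attribute] == v]
        tree = branches[v]
    return Counter(ex["label"] for ex in training).most_common(1)[0][0]

def reduced_error_prune(tree, training, validation):
    while True:
        best_tree, best_acc = None, accuracy(tree, validation)
        for path in internal_paths(tree):
            candidate = replace_at(tree, path, majority_at(tree, path, training))
            acc = accuracy(candidate, validation)
            if acc >= best_acc:          # pruning must be no worse on the validation set
                best_tree, best_acc = candidate, acc
        if best_tree is None:            # every remaining pruning would hurt accuracy
            return tree
        tree = best_tree

Each pass prunes the single node whose removal helps (or at least does not hurt) validation accuracy the most, and stops when every remaining candidate would hurt it.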
Rule Post-pruning
• Convert the learned (complete) decision tree into a set of corresponding rules
• Prune each rule (independently of the others) by removing any condition whose removal does not reduce the classification accuracy of that rule (see the sketch after this slide)
• Sort the pruned rules by their classification accuracy, and use this order when classifying future examples
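
A rough sketch of rule post-pruning under the same tree representation as before, estimating rule quality on a held-out validation set; the lecture does not fix the accuracy estimate, so that choice (and the helper names) is an assumption of this sketch.

def tree_to_rules(tree, conditions=()):
    # each rule is (conditions, label); a condition is an (attribute, value) pair
    if not isinstance(tree, tuple):
        return [(list(conditions), tree)]
    attribute, branches = tree
    rules = []
    for v, sub in branches.items():
        rules.extend(tree_to_rules(sub, conditions + ((attribute, v),)))
    return rules

def rule_accuracy(conditions, label, examples):
    # accuracy of the rule on the examples it covers (vacuously 1.0 if it covers none)
    covered = [ex for ex in examples if all(ex[a] == v for a, v in conditions)]
    if not covered:
        return 1.0
    return sum(ex["label"] == label for ex in covered) / len(covered)

def prune_rule(conditions, label, validation):
    # greedily drop conditions as long as the rule's estimated accuracy does not decrease
    improved = True
    while improved and conditions:
        improved = False
        base = rule_accuracy(conditions, label, validation)
        for i in range(len(conditions)):
            reduced = conditions[:i] + conditions[i + 1:]
            if rule_accuracy(reduced, label, validation) >= base:
                conditions, improved = reduced, True
                break
    return conditions, label

def rule_post_prune(tree, validation):
    rules = [prune_rule(c, l, validation) for c, l in tree_to_rules(tree)]
    # order rules by estimated accuracy (highest first) for use at classification time
    rules.sort(key=lambda r: rule_accuracy(r[0], r[1], validation), reverse=True)
    return rules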
Features with Continuous Values
• Need to convert them into discrete-valued attributes, by dividing the continuous range into a set of non-overlapping intervals
• For a (continuous) attribute A, create a new binary attribute Av such that Av is True if A > v, and False otherwise
• How to determine the "best" threshold value v?
  • Choose the threshold value v that produces the highest Information Gain
• Example (a sketch follows this slide):
  • Sort the learning examples in ascending order of Temperature
  • Identify adjacent learning examples that belong to different classes (Temperature 48 & 60; Temperature 80 & 90)
    average(48, 60) = 54; average(80, 90) = 85
  • There are 2 candidate threshold values: Temperature>54 and Temperature>85
  • The new binary feature Temperature>54 is selected, because IG(S, Temperature>54) > IG(S, Temperature>85)
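
A small Python sketch of finding the candidate thresholds; the labeled sequence below is hypothetical, chosen only to reproduce the class changes mentioned in the example (between 48 & 60 and between 80 & 90).

def candidate_thresholds(examples):
    # examples: list of (value, label) pairs, e.g. (Temperature, PlayTennis)
    examples = sorted(examples)                     # sort by the attribute value
    thresholds = []
    for (v1, c1), (v2, c2) in zip(examples, examples[1:]):
        if c1 != c2:                                # adjacent examples with different classes
            thresholds.append((v1 + v2) / 2)        # their midpoint is a candidate threshold
    return thresholds

data = [(40, "No"), (48, "No"), (60, "Yes"), (72, "Yes"), (80, "Yes"), (90, "No")]
print(candidate_thresholds(data))                   # [54.0, 85.0]
# Each candidate v defines a binary feature (A > v); the one with the highest
# Information Gain (here Temperature > 54) is kept.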
Gain Ratio – Another Way to Choose the Best Feature
• Goal: reduce the effect of attributes with many values

  SplitInformation(S, A) = − Σ (v ∈ Values(A)) (|Sv| / |S|) · log2(|Sv| / |S|)

  GainRatio(S, A) = IG(S, A) / SplitInformation(S, A)

  where Values(A) is the set of possible values of the attribute A and Sv = {x | x ∈ S, xA = v}
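
A minimal sketch of SplitInformation and GainRatio in Python, under the same dict-based example representation as the earlier sketches (entropy and information_gain are repeated here so the snippet stands alone).

from collections import Counter
from math import log2

def entropy(examples):
    n = len(examples)
    counts = Counter(ex["label"] for ex in examples)
    return -sum((c / n) * log2(c / n) for c in counts.values())

def information_gain(examples, attribute):
    n = len(examples)
    gain = entropy(examples)
    for v in {ex[attribute] for ex in examples}:
        subset = [ex for ex in examples if ex[attribute] == v]
        gain -= len(subset) / n * entropy(subset)
    return gain

def split_information(examples, attribute):
    n = len(examples)
    counts = Counter(ex[attribute] for ex in examples)
    return -sum((c / n) * log2(c / n) for c in counts.values())

def gain_ratio(examples, attribute):
    si = split_information(examples, attribute)
    # if A has a single value on S, both SplitInformation and IG are 0
    return information_gain(examples, attribute) / si if si > 0 else 0.0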
Handling Attributes with Missing Values (1)
• Suppose attribute A is a candidate for the test attribute at node n
• How to deal with an example x that has no value for attribute A? (a sketch follows this slide)
• Let Sn be the set of training examples associated with node n that have a value for attribute A
  • Solution 1: xA = the most common value of attribute A among the examples in Sn
  • Solution 2: xA = the most common value of attribute A among the examples in Sn that have the same target class as x
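
A small Python sketch of the two solutions, assuming examples are dicts, a missing value is stored as None, and the class label sits under the hypothetical key "label".

from collections import Counter

def most_common_value(examples, attribute):
    counts = Counter(ex[attribute] for ex in examples if ex[attribute] is not None)
    return counts.most_common(1)[0][0]

def fill_missing(x, attribute, S_n, use_same_class=False):
    # Solution 1: most common value of the attribute among the examples in S_n
    # Solution 2: most common value among the examples of S_n with the same class as x
    if use_same_class:
        S_n = [ex for ex in S_n if ex["label"] == x["label"]]
    x = dict(x)                      # do not modify the original example
    x[attribute] = most_common_value(S_n, attribute)
    return x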
Attributes with Different Costs
• In some machine learning problems, attributes can have different costs
  • Example: in learning to classify medical diseases, a BloodTest costs $150, while a TemperatureTest costs $10
• Tendency: learn cost-sensitive decision trees:
  • Use as many low-cost attributes as possible
  • Use high-cost attributes only when necessary (to help achieve reliable classifications)
=> Use evaluation measures other than IG to select the test attribute
When to Use a DT?
• Learning examples are represented by (attribute, value) pairs
  • Best suited to discrete-valued attributes
  • Attributes with continuous values must be discretized
• The target function has a discrete-valued output
  • Example: classify the examples into the appropriate class
• The training set may contain noise/errors
• The training set may contain examples with missing attribute values
Q&A - Thank you!
