Week 11 - Decision Tree Learning
Methods
Lecture 11
Decision Tree Learning
Decision Trees
• Tree-based classifiers for instances represented as feature vectors. Nodes
test features; there is one branch for each value of the feature, and leaves
specify the category.
[Figure: two example decision trees. Both first test color, with branches for red, blue, and green; the red branch then tests shape, with branches for circle, square, and triangle. In the left tree the leaves are pos/neg class labels; in the right tree the leaves are the categories A, B, and C.]
Properties of Decision Tree Learning
• Continuous (real-valued) features can be handled by
allowing nodes to split a real-valued feature into two ranges
based on a threshold (e.g. length < 3 and length ≥ 3); a sketch of
picking such a threshold appears after this list.
• Classification trees have discrete class labels at the leaves,
regression trees allow real-valued outputs at the leaves.
• Algorithms for finding consistent trees are efficient for
processing large amounts of training data for data mining
tasks.
• Methods developed for handling noisy training data (both
class and feature noise).
• Methods developed for handling missing feature values.
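The bullet on continuous features above mentions threshold splits. Here is a minimal Python sketch of choosing such a threshold by information gain; the helper names, the midpoint candidate thresholds, and the example call are my own assumptions, not part of the slides.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def best_threshold(values, labels):
    """Pick the threshold on a real-valued feature that maximizes information gain.

    Candidate thresholds are midpoints between consecutive distinct sorted values;
    each candidate t splits the data into a "< t" branch and a ">= t" branch.
    """
    base = entropy(labels)
    best_gain, best_t = 0.0, None
    distinct = sorted(set(values))
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        left = [l for v, l in zip(values, labels) if v < t]
        right = [l for v, l in zip(values, labels) if v >= t]
        gain = (base
                - (len(left) / len(labels)) * entropy(left)
                - (len(right) / len(labels)) * entropy(right))
        if gain > best_gain:
            best_gain, best_t = gain, t
    return best_t, best_gain

# e.g. best_threshold([1.0, 2.5, 3.2, 4.8], ["neg", "neg", "pos", "pos"])
# returns (2.85, 1.0): splitting at length < 2.85 separates the classes perfectly.
```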
Problems for Decision Tree Learning
• Instances are represented by attribute-value pairs. Instances are described
by a fixed set of attributes (e.g., Temperature) and their values (e.g., Hot). The
easiest situation for decision tree learning is when each attribute takes on a small
number of disjoint possible values (e.g., Hot, Mild, Cold). However, extensions to
the basic algorithm allow handling real-valued attributes as well (e.g., representing
Temperature numerically).
• The target function has discrete output values. A decision tree such as those
shown earlier assigns a boolean classification (e.g., yes or no) to each example. Decision tree
methods easily extend to learning functions with more than two possible output
values. A more substantial extension allows learning target functions with real-
valued outputs, though the application of decision trees in this setting is less
common.
• Disjunctive descriptions may be required. As noted above, decision trees
naturally represent disjunctive expressions.
Problems for Decision Tree Learning
CLASSIFICATION PROBLEMS
Decision Tree Induction Pseudocode
DTree(examples, features) returns a tree:
  If all examples are in one category, return a leaf node with that category label.
  Else if the set of features is empty, return a leaf node with the category label that
    is the most common in examples.
  Else pick a feature F and create a node R for it:
    For each possible value v_i of F:
      Let examples_i be the subset of examples that have value v_i for F.
      Add an out-going edge E to node R labeled with the value v_i.
      If examples_i is empty
        then attach a leaf node to edge E labeled with the category that
          is the most common in examples.
        else call DTree(examples_i, features − {F}) and attach the resulting
          tree as the subtree under edge E.
    Return the subtree rooted at R.
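A runnable Python sketch of this pseudocode follows. The (feature, {value: subtree}) tree representation, the argument names, and the choose_feature hook are my own choices, not part of the slides.

```python
from collections import Counter

def most_common_label(examples):
    """Majority class among (features_dict, label) pairs."""
    return Counter(label for _, label in examples).most_common(1)[0][0]

def dtree(examples, features, values, choose_feature):
    """Sketch of the DTree pseudocode above.

    examples:       list of (features_dict, label) pairs
    features:       set of feature names still available for splitting
    values:         dict mapping each feature name to its set of possible values
    choose_feature: heuristic f(examples, features) -> feature, e.g. one based on
                    the information gain introduced later in the lecture
    A leaf is returned as a bare label; an internal node as (feature, {value: subtree}).
    """
    labels = {label for _, label in examples}
    if len(labels) == 1:                      # all examples are in one category
        return labels.pop()
    if not features:                          # no features left to test
        return most_common_label(examples)

    f = choose_feature(examples, features)    # pick a feature F, create node R
    branches = {}
    for v in values[f]:                       # one out-going edge per value of F
        subset = [(feats, label) for feats, label in examples if feats[f] == v]
        if not subset:                        # empty subset: majority-class leaf
            branches[v] = most_common_label(examples)
        else:
            branches[v] = dtree(subset, features - {f}, values, choose_feature)
    return (f, branches)

# A trivial feature picker (take any remaining feature) would be:
# tree = dtree(train, set(values), values, lambda ex, fs: next(iter(fs)))
```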
Picking a Good Split Feature
• Goal is to have the resulting tree be as small as possible, per Occam’s razor.
• Finding a minimal decision tree (nodes, leaves, or depth) is an NP-hard
optimization problem.
• Top-down divide-and-conquer method does a greedy search for a simple
tree but does not guarantee to find the smallest.
– General lesson in ML: “Greed is good.”
• Want to pick a feature that creates subsets of examples that are relatively
“pure” in a single class so they are “closer” to being leaf nodes.
• There are a variety of heuristics for picking a good test; a popular one is
based on information gain, which originated with the ID3 system of
Quinlan (1979).
Which attribute is the best classifier?
Entropy
• Entropy (disorder, impurity) of a set of examples, S, relative to a binary
classification is:
$\text{Entropy}(S) = -p_1 \log_2(p_1) - p_0 \log_2(p_0)$
where p1 is the fraction of positive examples in S and p0 is the fraction of
negatives.
• If all examples are in one category, entropy is zero (we define
0log(0)=0)
• If examples are equally mixed (p1=p0=0.5), entropy is a maximum of 1.
• Entropy can be viewed as the number of bits required on average to
encode the class of an example in S where data compression (e.g.
Huffman coding) is used to give shorter codes to more likely cases.
• For multi-class problems with c categories, entropy generalizes to:
$\text{Entropy}(S) = \sum_{i=1}^{c} -p_i \log_2(p_i)$
where $p_i$ is the fraction of examples in category i.
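As a concrete check of this formula, a small Python helper (the same entropy sketch used for the threshold example earlier; the function name is my own):

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Multi-class entropy of a collection of class labels, in bits.

    Counter never reports zero counts, so the 0*log(0) = 0 convention is implicit.
    """
    total = len(labels)
    return -sum((count / total) * log2(count / total)
                for count in Counter(labels).values())

# entropy(["+", "+", "-", "-"]) == 1.0    (equally mixed binary case)
# entropy(["+", "+", "+"]) == 0.0         (all examples in one category)
```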
Entropy Plot for Binary Classification
Information Gain
$\mathrm{Gain}(S, F) = \mathrm{Entropy}(S) - \sum_{v \in \mathrm{Values}(F)} \frac{|S_v|}{|S|}\,\mathrm{Entropy}(S_v)$
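A sketch of the same computation in Python, reusing the entropy helper above; the representation of examples as (features_dict, label) pairs is an assumption for illustration, not from the slides.

```python
def information_gain(examples, feature):
    """Gain(S, F): entropy of S minus the size-weighted entropy of each subset S_v.

    `examples` is assumed to be a list of (features_dict, label) pairs, and
    `entropy` is the helper sketched above.
    """
    labels = [label for _, label in examples]
    gain = entropy(labels)
    for value in {feats[feature] for feats, _ in examples}:
        subset = [label for feats, label in examples if feats[feature] == value]
        gain -= (len(subset) / len(examples)) * entropy(subset)
    return gain
```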
Information Gain
• Example:
– <big, red, circle>: + <small, red, circle>: +
– <small, red, square>: − <big, blue, circle>: −
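A quick check of this example using the information_gain sketch above (my own computation; values rounded):

```python
S = [({"size": "big",   "color": "red",  "shape": "circle"}, "+"),
     ({"size": "small", "color": "red",  "shape": "circle"}, "+"),
     ({"size": "small", "color": "red",  "shape": "square"}, "-"),
     ({"size": "big",   "color": "blue", "shape": "circle"}, "-")]

# Entropy(S) = 1.0 (two positive, two negative examples)
# information_gain(S, "size")  == 0.0     (both size subsets are still 50/50)
# information_gain(S, "color") ~= 0.311   (the blue subset is pure)
# information_gain(S, "shape") ~= 0.311   (the square subset is pure)
# So color or shape, not size, would be chosen as the first split.
```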
Hypothesis Space Search
• Performs batch learning that processes all training instances
at once rather than incremental learning that updates a
hypothesis after each example.
• Performs hill-climbing (greedy search) that may only find a
locally-optimal solution. Guaranteed to find a tree consistent
with any conflict-free training set (i.e. identical feature vectors
always assigned the same class), but not necessarily the
simplest tree.
• Finds a single discrete hypothesis, so there is no way to
provide confidences or create useful queries.
Bias in Decision-Tree Induction
History of Decision-Tree Research
• In the 1960's, Hunt and colleagues used exhaustive-search decision-tree
methods (CLS) to model human concept learning.
• In the late 70's, Quinlan developed ID3 with the information gain
heuristic to learn expert systems from examples.
• Around the same time, Breiman, Friedman, and colleagues developed
CART (Classification and Regression Trees), which is similar to ID3.
• In the 1980's, a variety of improvements were introduced to
handle noise, continuous features, missing features, and
improved splitting criteria. Various expert-system
development tools resulted.
• Quinlan’s updated decision-tree package (C4.5) released in
1993.
• Weka includes Java version of C4.5 called J48.
Computational Complexity
• Worst case builds a complete tree where every path tests every
feature. Assume n examples and m features.
[Figure: worst-case complete tree, testing F1 at the root down to Fm at the deepest level, with a maximum of n examples spread across all nodes at each of the m levels.]
$\sum_{i=1}^{m} i \cdot n = O(nm^2)$
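Expanding the sum confirms the quadratic dependence on the number of features:

```latex
\sum_{i=1}^{m} i \cdot n \;=\; n\,\frac{m(m+1)}{2} \;=\; O(nm^{2})
```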
• However, the learned tree is rarely complete (the number of leaves is at most n). In
practice, complexity is linear in both the number of features (m) and the
number of training examples (n).
Overfitting
• Learning a tree that classifies the training data perfectly may not
lead to the tree with the best generalization to unseen data.
– There may be noise in the training data that the tree is erroneously
fitting.
– The algorithm may be making poor decisions towards the leaves of
the tree that are based on very little data and may not reflect reliable
trends.
• A hypothesis, h, is said to overfit the training data if there exists
another hypothesis, h′, such that h has lower error than h′ on
the training data but higher error on independent test data.
[Figure: accuracy plotted against hypothesis complexity, with one curve for accuracy on training data and one for accuracy on test data; past some complexity, training accuracy keeps rising while test accuracy falls.]
Overfitting Example
Testing Ohm's Law: V = IR (i.e. I = (1/R)V). Experimentally measure 10 points.
[Figure: plots of current (I) versus voltage (V) for the measured points. A curve that passes exactly through every noisy measurement overfits the data; the straight line implied by Ohm's law fits the training points less exactly but generalizes better.]
Overfitting Noise in Decision Trees
• Category or feature noise can easily cause overfitting.
– Add noisy instance <medium, blue, circle>: pos (but really neg)
[Figure: the tree learned from the original examples. The root tests color; the red branch tests shape (circle → pos, square → neg, triangle → pos), while the green and blue branches are neg leaves.]
Overfitting Noise in Decision Trees
• Category or feature noise can easily cause overfitting.
– Add noisy instance <medium, blue, circle>: pos (but really neg)
[Figure: the tree learned after adding the noisy instance. The red branch still tests shape (circle → pos, square → neg, triangle → pos) and the green branch is a neg leaf, but the blue branch must now separate the conflicting examples <big, blue, circle>: − and <medium, blue, circle>: +, so it grows an additional test on size (small → neg, med → pos, big → neg).]
• Noise can also cause different instances of the same feature vector
to have different classes. It is impossible to fit such data exactly, so the
corresponding leaf must be labeled with the majority class.
– <big, red, circle>: neg (but really pos)
• Conflicting examples can also arise if the features are incomplete
and inadequate to determine the class or if the target concept is
non-deterministic.
Overfitting Prevention (Pruning) Methods
• Two basic approaches for decision trees
– Prepruning: Stop growing the tree at some point during top-down
construction when there is no longer sufficient data to make
reliable decisions.
– Postpruning: Grow the full tree, then remove subtrees that do not
have sufficient evidence.
• Label leaf resulting from pruning with the majority class of
the remaining data, or a class probability distribution.
• Method for determining which subtrees to prune:
– Cross-validation: Reserve some training data as a hold-out set
(validation set, tuning set) to evaluate utility of subtrees.
– Statistical test: Use a statistical test on the training data to
determine whether an observed regularity can be dismissed as likely
due to random chance.
– Minimum description length (MDL): Keep a subtree only if its
additional complexity costs less to describe than explicitly
remembering the exceptions that pruning it would introduce.
Reduced Error Pruning
• A post-pruning, cross-validation approach.
Partition training data into "grow" and "validation" sets.
Build a complete tree from the “grow” data.
Until accuracy on validation set decreases do:
For each non-leaf node, n, in the tree do:
Temporarily prune the subtree below n and replace it with a
leaf labeled with the current majority class at that node.
Measure and record the accuracy of the pruned tree on the validation set.
Permanently prune the node that results in the greatest increase in accuracy on
the validation set.
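A runnable sketch of this procedure over the (feature, {value: subtree}) tree representation from the earlier DTree sketch; all helper names are mine, labels are assumed to be strings, and `default` is a fallback label for unseen feature values.

```python
from collections import Counter

def classify(tree, feats, default):
    """Classify with a tree where a leaf is a label string and an internal
    node is (feature, {value: subtree})."""
    while not isinstance(tree, str):
        feature, branches = tree
        tree = branches.get(feats.get(feature), default)
    return tree

def accuracy(tree, examples, default):
    return sum(classify(tree, f, default) == y for f, y in examples) / len(examples)

def internal_paths(tree, path=()):
    """Yield the branch path ((feature, value), ...) to every internal node."""
    if isinstance(tree, str):
        return
    yield path
    feature, branches = tree
    for value, child in branches.items():
        yield from internal_paths(child, path + ((feature, value),))

def majority_at(grow, path):
    """Majority class among grow-set examples that reach the node at `path`."""
    reached = [y for f, y in grow if all(f.get(feat) == val for feat, val in path)]
    return Counter(reached).most_common(1)[0][0] if reached else None

def replace_at(tree, path, new_subtree):
    """Return a copy of `tree` with the subtree at `path` replaced."""
    if not path:
        return new_subtree
    feature, branches = tree
    (feat, val), rest = path[0], path[1:]
    new_branches = dict(branches)
    new_branches[val] = replace_at(branches[val], rest, new_subtree)
    return (feature, new_branches)

def reduced_error_prune(tree, grow, validation, default):
    """Repeatedly replace the subtree whose pruning helps validation accuracy
    most, stopping once every candidate prune would decrease that accuracy."""
    while not isinstance(tree, str):
        current = accuracy(tree, validation, default)
        candidates = []
        for path in internal_paths(tree):
            leaf = majority_at(grow, path)
            if leaf is not None:
                pruned = replace_at(tree, path, leaf)
                candidates.append((accuracy(pruned, validation, default), pruned))
        if not candidates:
            break
        best_acc, best_tree = max(candidates, key=lambda c: c[0])
        if best_acc < current:   # any further pruning hurts validation accuracy
            break
        tree = best_tree
    return tree
```

A typical call would be `reduced_error_prune(tree, grow_set, validation_set, most_common_label(grow_set))`, using the helper from the DTree sketch.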
Issues with Reduced Error Pruning
Cross-Validating without Losing Training Data
• If the algorithm is modified to grow trees breadth-first
rather than depth-first, we can stop growing after
reaching any specified tree complexity.
• First, run several trials of reduced-error pruning using
different random splits of grow and validation sets.
• Record the complexity of the pruned tree learned in each
trial. Let C be the average pruned-tree complexity.
• Grow a final tree breadth-first from all the training data
but stop when the complexity reaches C.
• A similar cross-validation approach can be used to set
arbitrary algorithm parameters in general.
Additional Decision Tree Issues
• Better splitting criteria
– Information gain prefers features with many values.
• Continuous features
• Predicting a real-valued function (regression trees)
• Missing feature values
• Features with costs
• Misclassification costs
• Incremental learning
– ID4
– ID5
• Mining large databases that do not fit in main memory
What is ID3?
• A mathematical algorithm for building the decision tree.
• Invented by J. Ross Quinlan in 1979.
• Uses Information Theory invented by Shannon in 1948.
• Builds the tree from the top down, with no backtracking.
• Information Gain is used to select the most useful
attribute for classification.
Information Gain (IG)
• The information gain is based on the decrease in entropy after a
dataset is split on an attribute.
• Which attribute creates the most homogeneous branches?
• First the entropy of the total dataset is calculated.
• The dataset is then split on the different attributes.
• The entropy for each branch is calculated, then added
proportionally to get the total entropy for the split.
• The resulting entropy is subtracted from the entropy before the split.
• The result is the Information Gain, or decrease in entropy.
• The attribute that yields the largest IG is chosen for the decision
node.
Information Gain (cont’d)
• A branch set with entropy of 0 is a leaf node.
• Otherwise, the branch needs further splitting to classify
its dataset.
• The ID3 algorithm is run recursively on the non-leaf
branches, until all data is classified.
Advantages of using ID3
• Understandable prediction rules are created from the
training data.
• Builds the fastest tree.
• Builds a short tree.
• Only need to test enough attributes until all data is
classified.
• Finding leaf nodes enables test data to be pruned,
reducing number of tests.
• Whole dataset is searched to create tree.
Disadvantages of using ID3
• Data may be over-fitted or over-classified if only a small
sample is tested.
• Only one attribute at a time is tested for making a
decision.
• Classifying continuous data may be computationally
expensive, as many trees must be generated to see
where to break the continuum.
Example: The Simpsons
Person   Hair Length   Weight   Age   Class
Homer    0"            250      36    M
Marge    10"           150      34    F
Bart     2"            90       10    M
Lisa     6"            78       8     F
Maggie   4"            20       1     F
Abe      1"            170      70    M
Selma    8"            160      41    F
Otto     10"           180      38    M
Krusty   6"            200      45    M
Comic    8"            290      38    ?
$\text{Entropy}(S) = -\frac{p}{p+n}\log_2\frac{p}{p+n} - \frac{n}{p+n}\log_2\frac{n}{p+n}$
(p = number of positive examples, n = number of negative examples)
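As a worked check against the table above (my own arithmetic, values rounded): the nine labeled examples contain 4 F and 5 M, and a split on Weight <= 160 sends 4 F and 1 M down one branch and 4 M down the other, so

```latex
\mathrm{Entropy}(S) = -\tfrac{4}{9}\log_2\tfrac{4}{9} - \tfrac{5}{9}\log_2\tfrac{5}{9} \approx 0.991
\mathrm{Gain}(S,\ \mathrm{Weight}\le 160)
  \approx 0.991 - \tfrac{5}{9}\left(-\tfrac{4}{5}\log_2\tfrac{4}{5} - \tfrac{1}{5}\log_2\tfrac{1}{5}\right) - \tfrac{4}{9}\cdot 0
  \approx 0.991 - \tfrac{5}{9}(0.722) \approx 0.59
```

which is consistent with the tree below testing Weight <= 160 at its root.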
[Figure: the decision tree learned from this data. The root tests Weight <= 160?; the "no" branch is a Male leaf, and the "yes" branch tests Hair Length <= 2?, whose "yes" branch is Male and "no" branch is Female.]
We don't need to keep the data around, just the test conditions.
It is trivial to convert Decision Trees to rules…
SUMMARY AND FURTHER READING
• Overfitting the training data is an important issue in decision tree learning.
Because the training examples are only a sample of all possible instances, it is
possible to add branches to the tree that improve performance on the training
examples while decreasing performance on other instances outside this set.
Methods for post-pruning the decision tree are therefore important to avoid
overfitting in decision tree learning (and other inductive inference methods that
employ a preference bias).
• A large variety of extensions to the basic ID3 algorithm has been developed by
different researchers. These include methods for post-pruning trees, handling real-
valued attributes, accommodating training examples with missing attribute values,
incrementally refining decision trees as new training examples become available,
using attribute selection measures other than information gain, and considering
costs associated with instance attributes.