2. Decision Tree
Outline
• Decision tree representation
• ID3 learning algorithm
• Entropy, Information gain
• Overfitting
Representation of Concepts
• Concept learning: conjunction of attributes
• (Sunny AND Hot AND Humid AND Windy) +
• Decision trees: disjunction of conjunctions of attributes (see the small example after this list)
• (Sunny AND Normal) OR (Overcast) OR (Rain AND Weak) +
• More powerful representation
• Larger hypothesis space H
• Can be represented as a tree
• Common form of decision making in humans
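As a small illustration (a sketch not in the original slides, assuming string-valued PlayTennis attributes), the disjunction of conjunctions above can be written directly as a Python predicate:

def play_tennis(outlook, humidity, wind):
    """The disjunction of conjunctions above:
    (Sunny AND Normal) OR (Overcast) OR (Rain AND Weak)."""
    return ((outlook == "Sunny" and humidity == "Normal")
            or outlook == "Overcast"
            or (outlook == "Rain" and wind == "Weak"))

print(play_tennis("Sunny", "Normal", "Strong"))  # True: the first conjunct matches
print(play_tennis("Rain", "High", "Strong"))     # False: no conjunct matches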
Decision Trees
Decision tree for Play Tennis
• There are as many rules as there are leaf nodes in the decision tree.
(Example tree: the root tests Employed?; its two branches test Income? and Credit Score?, each with High and Low branches leading to leaf decisions.)
Examples:
• Equipment or medical diagnosis
• Credit risk analysis
• Modeling calendar scheduling preference
Decision Tree – Decision Boundary
• Decision trees divide the feature space into axis-parallel rectangles.
• Each rectangular region is labeled with one label, or a probability distribution over labels.
Expressiveness
Decision trees can represent any function of the input attributes
– Boolean operations (and, or, xor, etc.)
– all Boolean functions (a small XOR example follows below)
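For example (a minimal sketch, not from the slides), a depth-2 tree over two boolean attributes computes XOR:

def xor_tree(a: bool, b: bool) -> bool:
    """A depth-2 decision tree for XOR: test a at the root, then test b."""
    if a:               # a = True branch
        return not b    # leaf labels: b = True -> False, b = False -> True
    else:               # a = False branch
        return b        # leaf labels: b = True -> True, b = False -> False

for a in (False, True):
    for b in (False, True):
        print(a, b, xor_tree(a, b))   # prints the XOR truth table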
Issues
• Given some training examples, what decision tree should be generated?
• One proposal: prefer the smallest tree that is consistent with the data (a bias).
• Possible method: search the space of decision trees for the smallest decision tree that fits the data.
Searching for a good tree
• The space of decision trees is too big for systematic search, so the tree is grown greedily (a sketch follows this list):
• Stop and
• return a value for the target feature, or
• a distribution over target feature values;
• or choose a test (e.g. an input feature) to split on, and
• for each value of the test, build a subtree for those examples with this value for the test.
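A minimal sketch of this greedy, recursive procedure (the names and the dict-based tree encoding are illustrative assumptions, not the slides' notation; choose_test is whatever split-selection heuristic is plugged in, e.g. information gain):

from collections import Counter

def build_tree(examples, labels, attributes, choose_test):
    """Greedy top-down construction: stop, or choose one test and recurse."""
    # Stop: return a value for the target feature (here, the majority label).
    if not attributes or len(set(labels)) == 1:
        return Counter(labels).most_common(1)[0][0]
    # Choose a test (an input feature) to split on.
    attr = choose_test(examples, labels, attributes)
    tree = {attr: {}}
    # For each value of the test, build a subtree for the examples with that value.
    for value in set(ex[attr] for ex in examples):
        idx = [i for i, ex in enumerate(examples) if ex[attr] == value]
        tree[attr][value] = build_tree([examples[i] for i in idx],
                                       [labels[i] for i in idx],
                                       [a for a in attributes if a != attr],
                                       choose_test)
    return tree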
Top-Down Induction of Decision Trees ID3
1. Which attribute should be tested at the next node?
1. A ← the “best” decision attribute for the next node
2. Assign A as the decision attribute for the node
3. For each value of A, create a new descendant
4. Sort the training examples to the leaf nodes according to the attribute value of the branch
5. If all training examples are perfectly classified (same value of the target attribute), stop; else iterate over the new leaf nodes
2. When to stop?
The Basic ID3 Algorithm
Choices
• When to stop
• no more input features
• all examples are classified the same
• too few examples to make an informative split
• Entropy(S) = - Σ_i p_i log2 p_i, where p_i is the proportion of examples in S that belong to class i (and 0 log2 0 is taken to be 0).
• Gain(S, A) = Entropy(S) - Σ_{v in Values(A)} (|S_v| / |S|) Entropy(S_v)
• The first term is the entropy of the original collection S; the second term is the expected value of the entropy after S is partitioned using attribute A (S_v is the subset of S for which attribute A has value v).
• ID3 uses information gain to select the best attribute at each step in growing the tree.
Information Gain
Gain(S,A): expected reduction in entropy due to partitioning S
on attribute A
Gain(S, Humidity)
= 0.940 - (7/14)*0.985 - (7/14)*0.592
= 0.151
Gain(S, Wind)
= 0.940 - (8/14)*0.811 - (6/14)*1.0
= 0.048
Humidity provides greater info. gain than Wind, w.r.t target classification.
Selecting the Next Attribute
For the full training set, S = [9+, 5-] and Entropy(S) = 0.940.
Gain(S, Outlook)
= 0.940 - (5/14)*0.971 - (4/14)*0.0 - (5/14)*0.971
= 0.247
Selecting the Next Attribute
The information gain values for the 4 attributes are:
• Gain(S,Outlook) =0.247
• Gain(S,Humidity) =0.151
• Gain(S,Wind) =0.048
• Gain(S,Temperature) =0.029
Note: 0 log2 0 = 0
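The entropy and gain values above can be reproduced with a short Python sketch (not part of the slides); the class-count vectors are read off the Gain computations shown earlier:

import math

def entropy(counts):
    """Entropy in bits of a class distribution given as a list of counts.
    Empty classes are skipped, i.e. 0 * log2(0) is taken to be 0."""
    n = sum(counts)
    return -sum((c / n) * math.log2(c / n) for c in counts if c > 0)

def gain(parent_counts, child_counts):
    """Gain(S, A) = Entropy(S) - sum_v (|S_v| / |S|) * Entropy(S_v)."""
    n = sum(parent_counts)
    return entropy(parent_counts) - sum(
        (sum(cc) / n) * entropy(cc) for cc in child_counts)

print(round(entropy([9, 5]), 3))                         # 0.940  (S = [9+, 5-])
print(round(gain([9, 5], [[3, 4], [6, 1]]), 3))          # 0.152  Humidity (0.151 in the slides, which round intermediate values)
print(round(gain([9, 5], [[6, 2], [3, 3]]), 3))          # 0.048  Wind
print(round(gain([9, 5], [[2, 3], [4, 0], [3, 2]]), 3))  # 0.247  Outlook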
ID3 Algorithm
(Partially learned tree: Outlook is chosen as the test for the root node over the examples [D1, D2, …, D14] = [9+, 5-]; one branch is already a Yes leaf, while the two branches marked “?” still need a test to be chosen for them.)
GINI index (an alternative attribute selection measure):
• GINI(N) = 1 - Σ_i p(i|N)^2, summed over the c classes at node N
• GINI_split(A) = Σ_{v in Values(A)} (|S_v| / |S|) GINI(N_v), where N_v is the child node receiving the subset S_v of S
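A small sketch of these two formulas (not from the slides), evaluated on the Humidity split of the PlayTennis set used above:

def gini(counts):
    """GINI(N) = 1 - sum_i p(i|N)^2 for a node N with the given class counts."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)

def gini_split(child_counts):
    """GINI_split(A) = sum_v (|S_v| / |S|) * GINI(N_v)."""
    n = sum(sum(cc) for cc in child_counts)
    return sum((sum(cc) / n) * gini(cc) for cc in child_counts)

# Splitting S = [9+, 5-] on Humidity (High: [3+, 4-], Normal: [6+, 1-]):
print(round(gini_split([[3, 4], [6, 1]]), 3))   # 0.367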
Splitting Based on Continuous Attributes
Continuous Attribute – Binary Split
• For a continuous attribute:
• partition the continuous values of attribute A into a discrete set of intervals (e.g. a threshold of 82.5 for Temperature), or
• create a new boolean attribute A_c by choosing a threshold c:
A_c = true if A ≥ c, false otherwise.
• How to choose c? Consider all possible splits and find the best cut (see the sketch after the example below):
Temperature: 40 48 60 72 80 90
PlayTennis: No No Yes Yes Yes No
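A sketch of the threshold search for this example (not from the slides): candidate cuts are placed midway between adjacent sorted values, and the cut with the highest information gain is kept.

import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_threshold(values, labels):
    """Return (threshold c, information gain) for the best boolean split A >= c."""
    pairs = sorted(zip(values, labels))
    base = entropy([y for _, y in pairs])
    best = (None, -1.0)
    for (x1, _), (x2, _) in zip(pairs, pairs[1:]):
        if x1 == x2:
            continue
        c = (x1 + x2) / 2                       # candidate cut between adjacent values
        left = [y for x, y in pairs if x < c]
        right = [y for x, y in pairs if x >= c]
        g = base - (len(left) / len(pairs)) * entropy(left) \
                 - (len(right) / len(pairs)) * entropy(right)
        if g > best[1]:
            best = (c, g)
    return best

temperature = [40, 48, 60, 72, 80, 90]
play = ["No", "No", "Yes", "Yes", "Yes", "No"]
print(best_threshold(temperature, play))   # (54.0, ~0.459): the cut between 48 (No) and 60 (Yes)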
Hypothesis Space Search in Decision Trees
The ID3 algorithm
• searches the hypothesis space of all possible decision trees
• performs a simple-to-complex, hill-climbing search
• starts with the empty tree
• is guided by information gain in its hill-climbing search
Hypothesis Space Search in Decision Trees
ID3 vs. Candidate-Elimination
• ID3 maintains only a single current hypothesis as it searches; the Candidate-Elimination method maintains the set of all hypotheses consistent with the available training examples.
• Consequently, ID3 does not know how many alternative decision trees are consistent with the available training data.
• ID3 performs no backtracking in its search.
• ID3 uses all training examples at each step in the search to make statistically based decisions; FIND-S and CANDIDATE-ELIMINATION make decisions incrementally, based on individual training examples.
• ID3's goal is to find the best decision tree.
Hypothesis Space Search in Decision Trees
• ID3 searches a complete hypothesis space but does so incompletely: once it finds a good hypothesis it stops (it cannot find the others).
• Candidate-Elimination searches an incomplete hypothesis space (it can represent only some hypotheses) but does so completely.
• A preference bias is an inductive bias where some hypotheses are preferred over others.
• A restriction bias is an inductive bias where the set of hypotheses considered is restricted to a smaller set.
INDUCTIVE BIAS IN DECISION TREE LEARNING
• Inductive bias is the set of assumptions that, together with the training data, justifies the classifications assigned by the learner to future instances.
• Which of these decision trees does ID3 choose?
• It chooses the first acceptable tree it encounters in its simple-to-complex, hill
climbing search through the space of possible trees
• the ID3 search strategy
• selects in favor of shorter trees over longer ones
• selects trees that place the attributes with highest information gain closest to the
root
Approximate inductive bias of ID3: Shorter trees are preferred over larger trees.
• Consider breadth-first search algorithm BFS-ID3 which finds a shortest decision tree and
thus exhibits precisely the bias "shorter trees are preferred over longer trees."
• BFS-ID3 conducts the entire breadth-first search through the hypothesis space
• ID3 can be viewed as an efficient approximation to BFS-ID3
• Because ID3 uses the information gain heuristic and a hill climbing strategy, it exhibits a
more complex bias than BFS-ID3
• it does not always find the shortest consistent tree
• it is biased to favor trees that place attributes with high information gain closest to the root.
A closer approximation to the inductive bias of ID3: Shorter trees are preferred over
longer trees. Trees that place high information gain attributes close to the root are preferred
over those that do not.
Restriction Biases and Preference Biases
• The inductive bias of ID3 is thus a preference for certain hypotheses over others
(e.g., for shorter hypotheses), with no hard restriction on the hypotheses that can
be eventually enumerated. This form of bias is typically called a preference bias (or,
alternatively, a search bias).
• The bias of the CANDIDATE-ELIMINATION algorithm is in the form of a categorical
restriction on the set of hypotheses considered. This form of bias is typically called
a restriction bias (or, alternatively, a language bias).
Why prefer shorter hypotheses (Occam's razor)? Argument in favor:
• Fewer short hypotheses than long hypotheses
• A short hypothesis that fits the data is unlikely to be a coincidence
• A long hypothesis that fits the data might be a coincidence
Issues in Decision Tree Learning
• Determine how deeply to grow the decision tree, underfitting and overfitting
• Handling continuous attributes
• Choosing an appropriate attribute selection measure
• Handling training data with missing attribute values
• Handling attributes with differing costs
• Improving computational efficiency
Overfitting
An algorithm can produce trees that overfit the training examples in the following two
cases
• There is noise in the data
• The number of training examples is too small to produce a representative sample of
the true target function
Overfitting
• Learning a tree that classifies the training data perfectly may not
lead to the tree with the best generalization performance.
• There may be noise in the training data
• May be based on insufficient data
• A hypothesis h is said to overfit the training data if there is another
hypothesis, h’, such that h has smaller error than h’ on the training
data but h has larger error on the test data than h’.
(Plot: accuracy vs. complexity of the tree; accuracy on the training data keeps increasing, while accuracy on test data eventually decreases.)
Underfitting and Overfitting
Underfitting: when model is too simple, both training and test errors are large
Overfitting due to Noise or Insufficient Examples
• A lack of data points in a region of the feature space makes it difficult to predict correctly the class labels of that region.
Notes on Overfitting
• Overfitting results in decision trees that are more complex than
necessary
• Training error no longer provides a good estimate of how well the
tree will perform on previously unseen records
• Overfitting happens when a model is capturing idiosyncrasies of the
data rather than generalities.
• Often caused by too many parameters relative to the amount of
training data.
• E.g. an order-N polynomial can pass through any N+1 data points (with distinct x values), as illustrated below.
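A quick check of the polynomial remark (a sketch using NumPy; the data here are arbitrary random numbers):

import numpy as np

rng = np.random.default_rng(0)
N = 5
x = np.arange(N + 1, dtype=float)           # N+1 points with distinct x values
y = rng.normal(size=N + 1)                  # arbitrary targets (pure "noise")
coef = np.polyfit(x, y, deg=N)              # an order-N polynomial has N+1 coefficients
print(np.allclose(np.polyval(coef, x), y))  # True: it passes through all N+1 points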
Avoid Overfitting
• There are several approaches to avoiding overfitting in decision
tree learning. These can be grouped into two classes:
• Prepruning: Stop growing when data split not statistically
significant.
• Postpruning: Grow full tree then remove nodes
Pre-Pruning (Early Stopping)
• The difficulty with this approach is estimating precisely when to stop growing the tree.
• Typical stopping conditions for a node:
• Stop if all instances belong to the same class
• Stop if all the attribute values are the same
• More restrictive conditions:
• Stop if the number of instances is less than some user-specified threshold
• Stop if the class distribution of the instances is independent of the available features (e.g., using a chi-square test; see the sketch after this list)
• Stop if expanding the current node does not improve impurity measures (e.g., Gini or information gain).
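A sketch of the chi-square stopping test mentioned in the list above (assumes SciPy; the contingency tables of class counts per feature value are made-up numbers):

from scipy.stats import chi2_contingency

def looks_independent(contingency, alpha=0.05):
    """True if we cannot reject independence between feature values (rows) and
    class labels (columns) at level alpha, i.e. the split is not statistically
    significant and growth at this node can stop."""
    chi2, p_value, dof, expected = chi2_contingency(contingency)
    return p_value > alpha

print(looks_independent([[8, 7], [9, 6]]))    # True: no real association -> stop
print(looks_independent([[14, 1], [2, 13]]))  # False: the split is informative -> keep growing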
Reduced-Error Pruning
• Split the data into a training set and a validation set; the validation set is used to prevent overfitting.
• Grow the tree on the training set, then consider each internal node for pruning: replace the subtree rooted at it by a leaf (labeled with the most common class of the training examples at that node) whenever the pruned tree performs no worse on the validation set (a sketch follows).
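A minimal sketch of reduced-error pruning over the dict-based trees from the earlier build_tree sketch (the helper names and the use of a single global majority label are illustrative simplifications, not the slides' exact method):

def predict(tree, example, default):
    """Walk a dict tree: internal node {attribute: {value: subtree}}, leaf = label."""
    while isinstance(tree, dict):
        attr = next(iter(tree))
        tree = tree[attr].get(example.get(attr), default)
    return tree

def accuracy(tree, examples, labels, default):
    return sum(predict(tree, ex, default) == y
               for ex, y in zip(examples, labels)) / len(labels)

def reduced_error_prune(full_tree, node, val_ex, val_y, default):
    """Bottom-up: turn an internal node into a leaf predicting `default` (here a
    single global majority label, a simplification) whenever the whole tree's
    accuracy on the validation set does not decrease."""
    if not isinstance(node, dict):
        return node
    attr = next(iter(node))
    for value, child in node[attr].items():
        node[attr][value] = reduced_error_prune(full_tree, child, val_ex, val_y, default)
    before = accuracy(full_tree, val_ex, val_y, default)
    backup = dict(node[attr])
    node[attr] = {v: default for v in backup}   # tentatively prune: every branch predicts default
    if accuracy(full_tree, val_ex, val_y, default) >= before:
        return default                          # keep the pruned node as a leaf
    node[attr] = backup                         # otherwise restore the subtree
    return node

# Usage sketch: majority = most common label in the training data, then
#   tree = reduced_error_prune(tree, tree, val_examples, val_labels, majority)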
Triple Trade-Off
• There is a trade-off between three factors:
• the complexity (capacity) of the hypothesis class H, c(H),
• the training set size, N, and
• the generalization error, E, on new data.
• As N increases, E decreases.
• As c(H) increases, E first decreases and then increases (overfitting).
• As c(H) increases, the training error decreases for some time and then stays constant (frequently at 0).
References:
1. Tom M. Mitchell, Machine Learning, McGraw-Hill, 1997.
2. Introduction to Machine Learning (IIT Kharagpur), Prof. Sudeshna Sarkar.