New Module 3 Part1
Contents
Chapter 3
Decision Tree Learning:
Introduction
Decision tree representation
Appropriate problems
ID3 algorithm.
Artificial Neural Network:
Introduction
NN representation
Appropriate problems
Perceptrons
Back propagation algorithm.
INTRODUCTION
Decision tree learning is a method for approximating discrete-valued target functions, in which the learned function is represented by a decision tree.
DECISION TREE REPRESENTATION
• Decision trees classify instances by sorting them down the tree from the root to some
leaf node, which provides the classification of the instance.
• Each node in the tree specifies a test of some attribute of the instance, and each
branch descending from that node corresponds to one of the possible values for this
attribute.
• An instance is classified by starting at the root node of the tree, testing the attribute
specified by this node, then moving down the tree branch corresponding to the value
of the attribute in the given example. This process is then repeated for the subtree
rooted at the new node.
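This sorting procedure can be sketched directly in Python. Below is a minimal sketch, assuming a nested-dictionary representation of the tree and a hypothetical classify helper (both are illustration-only assumptions, not notation from the chapter); the tree encoded here is the PlayTennis tree shown in the figure that follows.

# Minimal sketch: a decision tree as nested dictionaries (assumed representation).
# Internal nodes are {"attribute": name, "branches": {value: subtree}}; leaves
# are plain class labels.

def classify(tree, instance):
    """Sort an instance down the tree from the root node to a leaf."""
    node = tree
    while isinstance(node, dict):                # still at an internal node
        value = instance[node["attribute"]]      # test the attribute at this node
        node = node["branches"][value]           # follow the branch for that value
    return node                                  # the leaf provides the classification

# The PlayTennis tree from the figure below, encoded in this representation.
play_tennis_tree = {
    "attribute": "Outlook",
    "branches": {
        "Sunny": {"attribute": "Humidity",
                  "branches": {"High": "No", "Normal": "Yes"}},
        "Overcast": "Yes",
        "Rain": {"attribute": "Wind",
                 "branches": {"Strong": "No", "Weak": "Yes"}},
    },
}

print(classify(play_tennis_tree,
               {"Outlook": "Sunny", "Humidity": "Normal", "Wind": "Weak"}))   # Yes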
FIGURE: A decision tree for the concept PlayTennis. An example is classified by sorting it through
the tree to the appropriate leaf node, then returning the classification associated with this leaf.
• Decision trees represent a disjunction of conjunctions of constraints on the attribute values
of instances.
• Each path from the tree root to a leaf corresponds to a conjunction of attribute tests, and
the tree itself to a disjunction of these conjunctions.
For example, the decision tree shown in the figure above corresponds to the expression
(Outlook = Sunny ∧ Humidity = Normal)
∨ (Outlook = Overcast)
∨ (Outlook = Rain ∧ Wind = Weak)
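The same reading of the tree can be checked in code. A small sketch, assuming a hypothetical play_tennis predicate that simply encodes the disjunction of conjunctions above:

def play_tennis(outlook, humidity, wind):
    # Disjunction of the three root-to-leaf "Yes" paths of the PlayTennis tree.
    return ((outlook == "Sunny" and humidity == "Normal")
            or (outlook == "Overcast")
            or (outlook == "Rain" and wind == "Weak"))

print(play_tennis("Rain", "High", "Weak"))     # True  (Outlook = Rain ∧ Wind = Weak)
print(play_tennis("Sunny", "High", "Strong"))  # False (no conjunction is satisfied)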
APPROPRIATE PROBLEMS FOR DECISION TREE LEARNING
Decision tree learning is generally best suited to problems with the following characteristics:
1. Instances are represented by attribute-value pairs – Each instance is described by a fixed set of attributes (e.g., Temperature) and their values (e.g., Hot).
2. The target function has discrete output values – The decision tree assigns a Boolean classification (e.g., yes or no) to each example. Decision tree methods easily extend to learning functions with more than two possible output values.
ID3 Algorithm
ID3(Examples, Target_attribute, Attributes)
Examples are the training examples. Target_attribute is the attribute whose value is to be predicted by
the tree. Attributes is a list of other attributes that may be tested by the learned decision tree. Returns a
decision tree that correctly classifies the given Examples.
• Create a Root node for the tree
• If all Examples are positive, Return the single-node tree Root, with label = +
• If all Examples are negative, Return the single-node tree Root, with label = -
• If Attributes is empty, Return the single-node tree Root, with label = most common value of Target_attribute in Examples
• Otherwise Begin
    A ← the attribute from Attributes that best* classifies Examples
    The decision attribute for Root ← A
    For each possible value, vi, of A,
        Add a new tree branch below Root, corresponding to the test A = vi
        Let Examples_vi be the subset of Examples that have value vi for A
        If Examples_vi is empty
            Then below this new branch add a leaf node with label = most common value of Target_attribute in Examples
            Else below this new branch add the subtree ID3(Examples_vi, Target_attribute, Attributes – {A})
    End
• Return Root
* The best attribute is the one with the highest information gain, described next.
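The pseudocode above translates into a compact recursive sketch in Python. This is only an illustrative version under assumed data structures: examples as dictionaries, the learned tree as nested dictionaries, attribute_values giving the possible values of each attribute, and best_attribute standing in for the "best* classifies" step (information gain, introduced next).

from collections import Counter

def most_common_value(examples, target_attribute):
    """Most common value of Target_attribute among the given examples."""
    return Counter(e[target_attribute] for e in examples).most_common(1)[0][0]

def id3(examples, target_attribute, attributes, attribute_values, best_attribute):
    """Recursive ID3 sketch following the pseudocode above (assumed structures)."""
    labels = {e[target_attribute] for e in examples}
    if len(labels) == 1:                       # all examples positive (or all negative)
        return labels.pop()
    if not attributes:                         # Attributes is empty
        return most_common_value(examples, target_attribute)

    a = best_attribute(examples, attributes)   # attribute that best classifies Examples
    root = {"attribute": a, "branches": {}}    # decision attribute for Root <- A
    for vi in attribute_values[a]:             # each possible value vi of A
        examples_vi = [e for e in examples if e[a] == vi]
        if not examples_vi:                    # Examples_vi is empty: add a leaf
            root["branches"][vi] = most_common_value(examples, target_attribute)
        else:                                  # recurse on Examples_vi with A removed
            root["branches"][vi] = id3(examples_vi, target_attribute,
                                       [x for x in attributes if x != a],
                                       attribute_values, best_attribute)
    return root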
• A statistical property called information gain measures how well a given attribute separates the training
examples according to their target classification.
• ID3 uses this information gain measure to select among the candidate attributes at each step while
growing the tree.
• Given a collection S, containing positive and negative examples of some target concept, the entropy of S
relative to this Boolean classification is
Entropy(S) = - p+ log2 p+ - p- log2 p-
Where,
p+ is the proportion of positive examples in S
p- is the proportion of negative examples in S.
Example: Entropy
• For a collection S containing 9 positive and 5 negative examples, S = [9+, 5-]:
Entropy(S) = -(9/14) log2 (9/14) - (5/14) log2 (5/14) = 0.940
• The entropy is 1 when the collection contains an equal number of positive and
negative examples
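A direct Python translation of this definition; the entropy helper and the list-of-labels encoding are assumptions made only for illustration:

import math

def entropy(labels):
    """Entropy of a collection of class labels, e.g. ['Yes', 'No', ...]."""
    n = len(labels)
    if n == 0:
        return 0.0
    result = 0.0
    for label in set(labels):
        p = sum(1 for x in labels if x == label) / n   # proportion of this class
        result -= p * math.log2(p)                     # - p log2 p
    return result

print(round(entropy(["Yes"] * 9 + ["No"] * 5), 3))   # 0.94  (S = [9+, 5-])
print(entropy(["Yes"] * 7 + ["No"] * 7))             # 1.0   (evenly split collection)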
• Entropy measures the impurity in a collection of training examples. Information gain is the expected
reduction in entropy caused by partitioning the examples according to the selected attribute. The
information gain, Gain(S, A), of an attribute A relative to a collection of examples S is defined as
Gain(S, A) = Entropy(S) - Σ (v ϵ Values(A)) (|Sv| / |S|) Entropy(Sv)
Where, Values(A) is the set of all possible values for attribute A, and Sv is the subset of S for which attribute A has
value v, i.e., Sv = {s ϵ S | A(s) = v}
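The formula carries over to a small Python helper built on the entropy sketch above; the function name and the dictionary encoding of examples are assumptions for illustration:

def information_gain(examples, attribute, target_attribute):
    """Expected reduction in entropy from partitioning `examples` on `attribute`.
    Each example is assumed to be a dict mapping attribute names to values."""
    labels = [e[target_attribute] for e in examples]
    gain = entropy(labels)                                     # Entropy(S)
    for v in set(e[attribute] for e in examples):              # v in Values(A)
        s_v = [e for e in examples if e[attribute] == v]       # Sv
        weight = len(s_v) / len(examples)                      # |Sv| / |S|
        gain -= weight * entropy([e[target_attribute] for e in s_v])
    return gain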
Example: Information Gain
For S = [9+, 5-] and the attribute Wind, whose value Weak gives SWeak = [6+, 2-] and whose value Strong gives SStrong = [3+, 3-]:
Entropy(SWeak) = -(6/8)* log2 (6/8) - (2/8)* log2 (2/8)
= -0.75 * (-0.41503) - 0.25 * (-2)
= 0.3113 + 0.5 = 0.811
Entropy(SStrong) = -(3/6)* log2 (3/6) - (3/6)* log2 (3/6) = 1
Gain(S, Wind) = Entropy(S) - (8/14) Entropy(SWeak) - (6/14) Entropy(SStrong)
= 0.940 - (8/14) * 0.811 - (6/14) * 1.0 = 0.048
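The worked numbers above can be reproduced with the entropy and information_gain helpers sketched earlier; only the class counts (6/2 for Weak, 3/3 for Strong) come from the example, the rest of the encoding is assumed:

# 8 Weak days (6 Yes, 2 No) and 6 Strong days (3 Yes, 3 No), as in the example.
wind_examples = ([{"Wind": "Weak",   "PlayTennis": "Yes"}] * 6 +
                 [{"Wind": "Weak",   "PlayTennis": "No"}]  * 2 +
                 [{"Wind": "Strong", "PlayTennis": "Yes"}] * 3 +
                 [{"Wind": "Strong", "PlayTennis": "No"}]  * 3)

print(round(entropy(["Yes"] * 6 + ["No"] * 2), 3))                      # 0.811
print(entropy(["Yes"] * 3 + ["No"] * 3))                                # 1.0
print(round(information_gain(wind_examples, "Wind", "PlayTennis"), 3))  # 0.048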
An Illustrative Example
• To illustrate the operation of ID3, consider the learning task represented by the
training examples in the table below.
• Here the target attribute is PlayTennis, which can have the values
yes or no for different days.
• Consider the first step through the algorithm, in which the topmost node of the
decision tree is created.
ID3 determines the information gain for each candidate attribute (i.e., Outlook, Temperature, Humidity,
and Wind), then selects the one with the highest information gain.
• According to the information gain measure, the Outlook attribute provides the
best prediction of the target attribute, PlayTennis, over the training examples.
Therefore, Outlook is selected as the decision attribute for the root node, and
branches are created below the root for each of its possible values, i.e., Sunny,
Overcast, and Rain.
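Running the same information-gain computation over all four candidate attributes reproduces this choice. The sketch below reuses the information_gain helper from above and assumes the standard 14-day PlayTennis training set commonly used with this example; under that assumption Outlook has the highest gain (about 0.25), followed by Humidity, Wind, and Temperature.

# Assumed: the standard 14-day PlayTennis training set used with this example.
columns = ["Outlook", "Temperature", "Humidity", "Wind", "PlayTennis"]
days = [
    ("Sunny",    "Hot",  "High",   "Weak",   "No"),
    ("Sunny",    "Hot",  "High",   "Strong", "No"),
    ("Overcast", "Hot",  "High",   "Weak",   "Yes"),
    ("Rain",     "Mild", "High",   "Weak",   "Yes"),
    ("Rain",     "Cool", "Normal", "Weak",   "Yes"),
    ("Rain",     "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny",    "Mild", "High",   "Weak",   "No"),
    ("Sunny",    "Cool", "Normal", "Weak",   "Yes"),
    ("Rain",     "Mild", "Normal", "Weak",   "Yes"),
    ("Sunny",    "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High",   "Strong", "Yes"),
    ("Overcast", "Hot",  "Normal", "Weak",   "Yes"),
    ("Rain",     "Mild", "High",   "Strong", "No"),
]
examples = [dict(zip(columns, row)) for row in days]

for attr in ["Outlook", "Temperature", "Humidity", "Wind"]:
    print(attr, round(information_gain(examples, attr, "PlayTennis"), 3))
# Outlook has the highest information gain, so it is chosen as the root attribute.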
9. Apply the ID3 algorithm to construct a decision tree for the following training examples. (10M)