Decision Tree
A decision tree is a structure that includes a root node, branches, and leaf nodes. Each internal
node denotes a test on an attribute, each branch denotes the outcome of a test, and each leaf node holds
a class label. The topmost node in the tree is the root node.
Example:
The data set has five attributes. One of these is special: the attribute class is the class label. The attributes temp
and humidity are numerical, and the other attributes are categorical, that is, their values cannot be ordered. Based on
the training data set, we want to find a set of rules that tell us which values of outlook, temp, humidity and windy
determine whether or not to play golf.
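To make this concrete, here is a minimal sketch of learning such a tree with scikit-learn. The training rows below are only illustrative stand-ins (the actual training table is not reproduced here), and the one-hot encoding is an implementation choice, not part of the original example:

    import pandas as pd
    from sklearn.tree import DecisionTreeClassifier, export_text

    # Illustrative training records (assumed, not the text's actual table).
    train = pd.DataFrame(
        [("sunny",    85, 85, False, "no play"),
         ("sunny",    70, 70, True,  "play"),
         ("overcast", 83, 78, False, "play"),
         ("rain",     70, 96, False, "play"),
         ("rain",     65, 70, True,  "no play")],
        columns=["outlook", "temp", "humidity", "windy", "class"])

    # One-hot encode the categorical attribute so the tree can split on it.
    X = pd.get_dummies(train[["outlook", "temp", "humidity", "windy"]])
    clf = DecisionTreeClassifier().fit(X, train["class"])

    # Print the learned tree as a set of if/then rules.
    print(export_text(clf, feature_names=list(X.columns)))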
[Figure: decision tree for the golf data set]
In the above tree we have five leaf nodes. In a decision tree, each leaf node represents a rule. We have the following rules
corresponding to the above tree.
RULE 1 If it is sunny and the humidity is not above 75%, then play.
RULE 2 If it is sunny and the humidity is above 75%, then do not play.
RULE 3 If it is overcast, then play.
RULE 4 If it is rainy and not windy, then play.
RULE 5 If it is rainy and windy, then don’t play.
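Since each rule is just a path of tests, the five rules can be written down directly as code. The sketch below is a straightforward transcription (the function name classify_golf and the argument layout are our own):

    def classify_golf(outlook, humidity, windy):
        """Apply RULES 1-5 above to a single record."""
        if outlook == "sunny":
            # RULE 1 / RULE 2: play only if humidity is not above 75%.
            return "play" if humidity <= 75 else "no play"
        if outlook == "overcast":
            return "play"                       # RULE 3
        # RULE 4 / RULE 5: with rain, the decision depends on windy.
        return "no play" if windy else "play"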
The classification of an unknown input vector is done by traversing the tree from the root node to a leaf node. A
record enters the tree at the root node. At the root, a test is applied to determine which child node the record will
encounter next. This process is repeated until the record arrives at a leaf node. All the records that end up at a given leaf of
the tree are classified in the same way. There is a unique path from the root to each leaf. The path is a rule which is used to
classify the records.
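As a sketch of this traversal, a tree can be represented with one node type that is either internal (it carries a test) or a leaf (it carries a class label). The Node layout and traverse function below are assumptions for illustration, not a prescribed structure:

    class Node:
        def __init__(self, test=None, children=None, label=None):
            self.test = test                # record -> branch outcome (internal nodes)
            self.children = children or {}  # outcome -> child Node
            self.label = label              # class label (leaf nodes only)

    def traverse(node, record):
        # Apply the test at each internal node and follow the matching
        # branch; the leaf reached classifies the record.
        while node.label is None:
            node = node.children[node.test(record)]
        return node.label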
In the above tree, we can carry out the classification of an unknown record as follows. Let us assume that, for the record,
we know the values of the first four attributes (but we do not know the value of the class attribute):
Outlook = rain; temp = 70; humidity = 65; and windy = true.
We start at the root node and check the value of the attribute associated with it. This attribute is the
splitting attribute at this node. Please note that in a decision tree, every internal node has an associated attribute
called the splitting attribute. In our example, outlook is the splitting attribute at the root. Since for the given record
outlook = rain, we move to the right-most child node of the root. At this node, the splitting attribute is windy, and we find
that for the record we want to classify, windy = true. Hence, we move to the left child node and conclude that the class label is
"no play".
Note that every path from the root node to a leaf node represents a rule. Many different leaves
of the tree may refer to the same class label, but each leaf corresponds to a different rule.
The accuracy of the classifier is determined by the percentage of the test data set that is correctly classified.
Consider the following test data set:
We can see that for RULE 1 there are two records of the test data set satisfying outlook = sunny and humidity ≤ 75,
and only one of these is correctly classified as play. Thus, the accuracy of this rule is 0.5 (or 50%). Similarly, the accuracy of
RULE 2 is also 0.5 (or 50%). The accuracy of RULE 3 is about 0.66.
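To show the arithmetic, the sketch below computes the accuracy of RULE 1 on a hypothetical pair of test records (the actual test table is not reproduced here, so the records are assumptions chosen only to match the counts in the text):

    def rule_accuracy(test_records, matches, predicted):
        """Fraction of records matching a rule's conditions whose
        actual class equals the class the rule predicts."""
        hits = [r for r in test_records if matches(r)]
        return sum(r["class"] == predicted for r in hits) / len(hits)

    # Two hypothetical records satisfy RULE 1's conditions; one is "play".
    test = [
        {"outlook": "sunny", "humidity": 70, "windy": False, "class": "play"},
        {"outlook": "sunny", "humidity": 72, "windy": True,  "class": "no play"},
    ]
    rule1 = lambda r: r["outlook"] == "sunny" and r["humidity"] <= 75
    print(rule_accuracy(test, rule1, "play"))   # 1 of 2 correct -> 0.5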