
Decision Trees

Decision Tree:
A decision tree is a structure that includes a root node, branches, and leaf nodes. Each internal
node denotes a test on an attribute, each branch denotes the outcome of a test, and each leaf node holds
a class label. The topmost node in the tree is the root node.
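The structure just described can be sketched as a small data type. This is a hypothetical illustration (the class and attribute names are my own, not from the text): internal nodes carry a splitting attribute and a map from test outcomes to children, while leaves carry only a class label.

```python
# Minimal sketch of a decision tree node: internal nodes test an attribute,
# branches carry test outcomes, and leaf nodes hold a class label.
class Node:
    def __init__(self, attribute=None, label=None):
        self.attribute = attribute   # splitting attribute (internal nodes only)
        self.children = {}           # test outcome -> child Node
        self.label = label           # class label (leaf nodes only)

    def is_leaf(self):
        return self.label is not None

# The topmost node is the root; here it tests "outlook", and the
# overcast outcome leads directly to a leaf labelled "Play".
root = Node(attribute="outlook")
root.children["overcast"] = Node(label="Play")
```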
Example:

OUTLOOK    TEMP (F)  HUMIDITY (%)  WINDY  CLASS

sunny      79        90            True   No play
sunny      56        70            False  Play
sunny      79        75            True   Play
sunny      60        90            True   No play
overcast   88        88            False  No play
overcast   63        75            True   Play
overcast   88        95            False  Play
rain       78        60            False  Play
rain       66        70            False  No play
rain       68        60            True   No play

Training Data Set

The data set has five attributes. One of them is special: the attribute class is the class label. The attributes temp and humidity are numerical; the other attributes are categorical, that is, their values cannot be ordered. Based on the training data set, we want to find a set of rules that tell us, from the values of outlook, temperature, humidity and windy, whether or not to play golf.

[Figure: Decision Tree — the root splits on outlook; the sunny branch splits on humidity, the rain branch splits on windy, and overcast leads directly to a leaf.]

In the above tree we have five leaf nodes. In a decision tree, each leaf node represents a rule. We have the following rules corresponding to the above tree:
RULE 1: If it is sunny and the humidity is not above 75%, then play.
RULE 2: If it is sunny and the humidity is above 75%, then do not play.
RULE 3: If it is overcast, then play.
RULE 4: If it is rainy and not windy, then play.
RULE 5: If it is rainy and windy, then do not play.
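The five rules can be written directly as a function (a sketch; the parameter names are lowercased versions of the table headers):

```python
def classify(outlook, humidity, windy):
    """Apply RULE 1-5 from the decision tree to one record."""
    if outlook == "sunny":
        return "Play" if humidity <= 75 else "No play"   # RULE 1 / RULE 2
    if outlook == "overcast":
        return "Play"                                    # RULE 3
    if outlook == "rain":
        return "No play" if windy else "Play"            # RULE 4 / RULE 5

print(classify("sunny", 70, False))   # humidity not above 75% -> Play
```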

The classification of an unknown input vector is done by traversing the tree from the root node to a leaf node. A
record enters the tree at the root node. At the root, a test is applied to determine which child node the record will
encounter next. This process is repeated until the record arrives at a leaf node. All the records that end up at a given leaf of
the tree are classified in the same way. There is a unique path from the root to each leaf. The path is a rule which is used to
classify the records.
In the above tree, we can carry out the classification of an unknown record as follows. Assume that, for the record, we know the values of the first four attributes (but not the value of the class attribute):
Outlook = rain; temp = 70; humidity = 65; and windy = true.
We start at the root node and check the value of the attribute associated with it. This attribute is the splitting attribute at that node; note that in a decision tree, every internal node has an associated attribute called its splitting attribute. In our example, outlook is the splitting attribute at the root. Since for the given record outlook = rain, we move to the right-most child node of the root. At this node the splitting attribute is windy, and since for the record we want to classify windy = true, we move to the left child node and conclude that the class label is "no play".
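The traversal just described can be sketched with the tree stored as nested dictionaries. This is a hypothetical encoding of the figure (the key names and the "humidity<=75" label for the numeric test are my own convention):

```python
# The tree encoded as nested dicts: internal nodes name a splitting
# attribute and map each test outcome to a subtree; leaves are class labels.
tree = {
    "split": "outlook",
    "branches": {
        "sunny":    {"split": "humidity<=75",
                     "branches": {True: "Play", False: "No play"}},
        "overcast": "Play",
        "rain":     {"split": "windy",
                     "branches": {True: "No play", False: "Play"}},
    },
}

def traverse(node, record):
    """Walk from the root to a leaf, applying the test at each node."""
    while isinstance(node, dict):
        if node["split"] == "humidity<=75":
            outcome = record["humidity"] <= 75
        else:
            outcome = record[node["split"]]
        node = node["branches"][outcome]
    return node

record = {"outlook": "rain", "temp": 70, "humidity": 65, "windy": True}
print(traverse(tree, record))   # rain and windy -> No play
```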


Note that every path from the root node to a leaf node represents a rule. Many different leaves of the tree may carry the same class label, but each leaf corresponds to a different rule.
The accuracy of the classifier is determined by the percentage of the test data set that is correctly classified.
Consider the following test data set,

OUTLOOK    TEMP (F)  HUMIDITY (%)  WINDY  CLASS

sunny      79        90            True   Play
sunny      56        70            False  Play
sunny      79        75            True   No play
sunny      60        90            True   No play
overcast   88        88            False  No play
overcast   63        75            True   Play
overcast   88        95            False  Play
rain       78        60            False  Play
rain       66        70            False  No play
rain       68        60            True   Play

Test Data Set

We can see that for RULE 1 there are two records in the test data set satisfying outlook = sunny and humidity ≤ 75, and only one of these is correctly classified as play. Thus, the accuracy of this rule is 0.5 (or 50%). Similarly, the accuracy of RULE 2 is also 0.5 (or 50%). The accuracy of RULE 3 is 0.66 (two of the three overcast records are correctly classified as play).
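The per-rule accuracies above can be checked mechanically. In this sketch the records are transcribed from the test data set, with outlook values lowercased and class labels normalized to "Play"/"No play":

```python
# Test records: (outlook, temp, humidity, windy, actual class).
test_set = [
    ("sunny",    79, 90, True,  "Play"),
    ("sunny",    56, 70, False, "Play"),
    ("sunny",    79, 75, True,  "No play"),
    ("sunny",    60, 90, True,  "No play"),
    ("overcast", 88, 88, False, "No play"),
    ("overcast", 63, 75, True,  "Play"),
    ("overcast", 88, 95, False, "Play"),
    ("rain",     78, 60, False, "Play"),
    ("rain",     66, 70, False, "No play"),
    ("rain",     68, 60, True,  "Play"),
]

def rule_accuracy(applies, predicted):
    """Fraction of matching test records whose actual class is the predicted one."""
    hits = [actual for (o, t, h, w, actual) in test_set if applies(o, h, w)]
    return sum(a == predicted for a in hits) / len(hits)

# RULE 1: sunny and humidity not above 75% -> play (1 of 2 correct)
print(rule_accuracy(lambda o, h, w: o == "sunny" and h <= 75, "Play"))
# RULE 3: overcast -> play (2 of 3 correct)
print(rule_accuracy(lambda o, h, w: o == "overcast", "Play"))
```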

Advantages and Shortcomings of Decision Tree Classifications

The major strengths of the decision tree methods are the following:

- Decision trees are able to generate understandable rules.
- They are able to handle both numerical and categorical attributes.
- They provide a clear indication of which fields are most important for prediction or classification.

Some of the weaknesses of decision trees are:

- Some decision trees can only deal with binary-valued target classes. Others are able to assign records to an arbitrary number of classes, but are error-prone when the number of training examples per class gets small. This can happen rather quickly in a tree with many levels and/or many branches per node.
- The process of growing a decision tree is computationally expensive. At each node, each candidate splitting field is examined before its best split can be found.
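The cost of the split search can be illustrated for a numeric attribute. One common convention (an illustrative assumption; exact details vary by algorithm) is to consider the midpoints between consecutive distinct sorted values as candidate thresholds, each of which must then be scored before the best split is chosen:

```python
# For a numeric attribute, candidate split thresholds are often taken as
# midpoints between consecutive distinct sorted values. Every candidate
# must be evaluated, and this search repeats at every internal node.
def candidate_thresholds(values):
    v = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(v, v[1:])]

# Humidity values from the training data set.
humidity = [90, 70, 75, 90, 88, 75, 95, 60, 70, 60]
print(candidate_thresholds(humidity))   # five candidate thresholds to score
```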
