Homework
To choose the root node, we need to determine the best of Gain(S, weather), Gain(S, parents) and Gain(S, money).
Note: S has 4 categories (Cinema, Tennis, Stay in, Shopping). For example, to calculate Entropy(Ssunny), where pi is the number of sunny examples in category i divided by the total number of sunny examples:
Entropy(Ssunny) = Σi -pi log2 pi
Entropy(Ssunny) = -(1/3) log2(1/3) - (2/3) log2(2/3) - (0/3) log2(0/3) - (0/3) log2(0/3) = 0.918
(taking 0 log2 0 = 0)
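The same calculation is easy to check in a few lines of code. This is only an illustrative sketch; the entropy helper and the list-of-counts representation are assumptions, not part of the notes:

    import math

    def entropy(counts):
        # Entropy of a set of examples, given the count of examples in each category.
        # Zero counts contribute nothing, i.e. 0 * log2(0) is treated as 0.
        total = sum(counts)
        return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

    # Ssunny = {W1, W2, W10}: 1 Cinema, 2 Tennis, 0 Stay in, 0 Shopping
    print(entropy([1, 2, 0, 0]))  # approximately 0.918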
This means that the first node in the decision tree will be the weather
attribute. As an exercise, convince yourself why this scored (slightly)
higher than the parents attribute - remember what entropy means and
look at the way information gain is calculated.
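As a reminder, the quantity being compared here is the standard ID3 information gain:

Gain(S, A) = Entropy(S) - Σv (|Sv| / |S|) Entropy(Sv)

where the sum runs over the values v that attribute A can take, and Sv is the subset of S for which A = v. Weather scores highest because splitting on it leaves the lowest weighted entropy in the resulting subsets.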
From the weather node, we draw a branch for the values that weather
can take: sunny, windy and rainy:
Now we look at the first branch. Ssunny = {W1, W2,
W10}.
This is not empty, so we do not put a default
categorization leaf node here.
The categorizations of W1, W2 and W10 are
Cinema, Tennis and Tennis respectively. As these
are not all the same, we cannot put a
categorization leaf node here.
Hence we put an attribute node here, which we
will leave blank for the time being.
Looking at the second branch, Swindy = {W3, W7,
W8, W9}.
Again, this is not empty, and they do not all
belong to the same class, so we put an attribute
node here, left blank for now.
The same situation happens with the third
branch, hence our amended tree looks like this:
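The rule we are applying at each branch (empty set of examples gives a default categorization leaf, a single shared category gives a categorization leaf, anything else gives a new attribute node) can be written down compactly. The sketch below is only an illustration of that rule; names such as node_for_branch are invented here, and examples are represented as (name, category) pairs:

    def node_for_branch(examples, default_category):
        # Decide what to attach to the end of a branch of the tree.
        if not examples:
            # Empty set of examples: attach a default categorization leaf.
            return ("leaf", default_category)
        categories = {category for (_name, category) in examples}
        if len(categories) == 1:
            # All examples share one category: attach a categorization leaf.
            return ("leaf", categories.pop())
        # Otherwise attach an attribute node, to be chosen later by information gain.
        return ("attribute node", None)

    # The sunny branch: not empty, categories differ, so we get an attribute node.
    print(node_for_branch([("W1", "Cinema"), ("W2", "Tennis"), ("W10", "Tennis")], "Cinema"))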
Now we have to fill in the choice of attribute A, which
we know cannot be weather, because we've already
removed that from the list of attributes to use. So, we
need to calculate the values for Gain(Ssunny, parents)
and Gain(Ssunny, money). Firstly, Entropy(Ssunny) =
0.918.
Next, we set S to be Ssunny = {W1,W2,W10} (and, for
this part of the branch, we will ignore all the other
examples). In effect, we are interested only in this
part of the table:
Weekend (Example)   Weather   Parents   Money   Decision (Category)
W1                  Sunny     Yes       Rich    Cinema
W2                  Sunny     No        Rich    Tennis
W10                 Sunny     No        Rich    Tennis
Hence we can calculate:
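The calculation itself is not reproduced in these notes, but from the restricted table it works out as follows. With parents = yes the only example is W1 (entropy 0), and with parents = no we have W2 and W10, both Tennis (entropy 0), so

Gain(Ssunny, parents) = 0.918 - (1/3)*0 - (2/3)*0 = 0.918

All three examples have money = rich, so splitting on money leaves a single subset identical to Ssunny:

Gain(Ssunny, money) = 0.918 - (3/3)*0.918 = 0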
Given our calculations, attribute A should be taken as parents.
The two values of parents are yes and no, and we will draw a branch
from the node for each of these.
Looking at Syes, we see that the only example with parents = yes is W1. Hence, the
branch for yes stops at a categorization leaf, with the category being
Cinema.
Sno contains W2 and W10, and these are both in the same category
(Tennis). Hence the branch for no also ends at a categorization
leaf (Tennis).
Hence our upgraded tree looks like this:
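The figure is not reproduced here; as a rough textual sketch, the tree at this stage looks like:

    weather?
      sunny -> parents?
                 yes -> Cinema
                 no  -> Tennis
      windy -> (attribute node, still to be filled in)
      rainy -> (attribute node, still to be filled in)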