Assignment 1 (Decision Trees (2p) ) : Machine Learning - Sheet 2
Table 2: Training examples (only the genre and watch columns are reproduced here; genre takes the values action, romance, comedy).

Nr. | genre   | watch
----+---------+------
 1  | action  | no
 2  | romance | yes
 3  | action  | yes
 4  | comedy  | yes
 5  | romance | no
(a) (4 p) Consider the five training examples from Table 2. Build the root node of a decision tree from these training examples. To do this, calculate the information gain for all three distinct attributes (genre, main-character, has ninjas) to decide which one would be the best choice for the root node (the one with the largest gain).
The information gain is given as

\[ Gain(S, A) = Entropy(S) - \sum_{v \in Values(A)} \frac{|S_v|}{|S|} \, Entropy(S_v) \]
The entropy is given as

\[ Entropy(S) = -p_\oplus \log_2 p_\oplus - p_\ominus \log_2 p_\ominus \]

\(S_v\) is the subset of \(S\) for which attribute \(A\) has value \(v\).
Example for attribute main-character:

\[ S_m \leftarrow [1+, 1-], \quad |S_m| = 2 \]
\[ S_f \leftarrow [2+, 1-], \quad |S_f| = 3 \]
Provide all detailed calculations and the result.
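As a sanity check for the hand calculation, the entropy and the information gain can be sketched in Python. Only the genre and watch columns of Table 2 are used here; the counts for the other two attributes are not reproduced on this sheet.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Entropy(S) = -sum over classes of p * log2(p)."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(values, labels):
    """Gain(S, A) = Entropy(S) - sum(|Sv|/|S| * Entropy(Sv))."""
    n = len(labels)
    remainder = 0.0
    for v in set(values):
        subset = [l for val, l in zip(values, labels) if val == v]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder

# genre and watch columns from Table 2
genre = ["action", "romance", "action", "comedy", "romance"]
watch = ["no", "yes", "yes", "yes", "no"]

print(round(entropy(watch), 3))                  # 0.971
print(round(information_gain(genre, watch), 3))  # 0.171
```

This only verifies the arithmetic for the genre attribute; the hand calculation for the other two attributes is still required.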
Summer 2015
(b) (2 p) Perform the same calculation as in (a), but use the gain ratio instead of the information gain. Does the result for the root node change?
\[ GainRatio(S, A) = \frac{Gain(S, A)}{SplitInformation(S, A)}, \text{ with} \]

\[ SplitInformation(S, A) = -\sum_{v \in Values(A)} \frac{|S_v|}{|S|} \log_2 \frac{|S_v|}{|S|} \]
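The split information and gain ratio follow the same pattern; a minimal sketch for the genre attribute (again using only the two columns reproduced in Table 2):

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Entropy(S) = -sum over classes of p * log2(p)."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def split_information(values):
    """SplitInformation(S, A) = -sum(|Sv|/|S| * log2(|Sv|/|S|))."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())

def gain_ratio(values, labels):
    """GainRatio(S, A) = Gain(S, A) / SplitInformation(S, A)."""
    n = len(labels)
    gain = entropy(labels)
    for v in set(values):
        subset = [l for val, l in zip(values, labels) if val == v]
        gain -= len(subset) / n * entropy(subset)
    return gain / split_information(values)

genre = ["action", "romance", "action", "comedy", "romance"]
watch = ["no", "yes", "yes", "yes", "no"]

print(round(gain_ratio(genre, watch), 3))  # 0.112
```

Note that split information is just the entropy of the attribute's value distribution, so attributes with many distinct values are penalized.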
(c) (2 p) Let's assume the root node is a node which checks the value of the attribute has ninjas.
Calculate the next level of the decision tree using the information gain.
Programming Exercises
For the following tasks you can use either Matlab or Python. Only use built-in functions where they are explicitly permitted. Basic functions for file handling, array creation and manipulation, as well as plotting, are of course excluded from this regulation. For Python users this covers the use of the following modules:
1. scipy.io for handling .mat files
2. numpy for array creation/manipulation
3. matplotlib.pyplot for plotting
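The permitted modules cover all the data handling needed here. As a small sketch of the .mat round trip with scipy.io (the file name demo.mat is made up for illustration, not a file provided with this sheet):

```python
import os
import tempfile

import numpy as np
import scipy.io

# Write a small array to a .mat file and read it back -- the same
# round trip needed for the data files provided with the exercises.
X = np.arange(6.0).reshape(2, 3)
path = os.path.join(tempfile.mkdtemp(), "demo.mat")
scipy.io.savemat(path, {"X": X})
loaded = scipy.io.loadmat(path)["X"]
print(loaded.shape)  # (2, 3)
```

loadmat returns a dictionary keyed by variable name, so the array is recovered with the same key it was saved under.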
One last piece of advice: do NOT copy code from external sources and submit it as your own. Should a group submit such code, all group members will receive a serious deduction of points.