P02 DecisionTrees SolutionNotes
Practical exercises
1. Plot the learned decision tree using information gain (Shannon entropy). Show your calculations.
Brief notes:
y1 provides the highest gain, IG(yout | y1) = 1 − 0.33 ≈ 0.67, hence it is selected.
y1 correctly classifies all observations when y1 = b and when y1 = c.
The entropies of y2 and y3 on the (y1 = a)-conditional data are both zero, so we can select either.
There is no more uncertainty.
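
To make the entropy arithmetic explicit, the sketch below computes Shannon entropy and information gain in plain Python. It is only an illustration: the toy columns y1 and yout are hypothetical placeholders, not the exercise's actual table.

from collections import Counter
from math import log2

def entropy(labels):
    # Shannon entropy H(Y) = -sum_c p_c * log2(p_c) over the class proportions.
    counts = Counter(labels)
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in counts.values())

def information_gain(feature_values, labels):
    # IG(Y | X) = H(Y) - sum_v P(X = v) * H(Y | X = v).
    n = len(labels)
    remainder = 0.0
    for v in set(feature_values):
        subset = [y for x, y in zip(feature_values, labels) if x == v]
        remainder += (len(subset) / n) * entropy(subset)
    return entropy(labels) - remainder

# Hypothetical data: y1 takes values in {a, b, c}, yout is the binary target.
y1   = ["a", "a", "a", "b", "b", "c", "c", "c"]
yout = ["+", "-", "+", "+", "+", "-", "-", "-"]
print(entropy(yout), information_gain(y1, yout))

The same two functions can be applied to each candidate input variable; the one with the highest information gain is chosen as the split, exactly as in the notes above.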
2. Show whether a decision tree can learn the following logical functions and, if so, plot the corresponding decision boundaries (see the verification sketch after the list below).
a) AND
b) OR
c) XOR
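
No written notes accompany this exercise, so the sketch below is only a hedged check: it fits scikit-learn's DecisionTreeClassifier to the truth tables of AND, OR and XOR. All three functions are learnable with axis-aligned splits; AND and OR need a single split, whereas XOR needs depth 2, since no single threshold on one variable separates its classes.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Truth-table inputs for two binary variables x1, x2.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
targets = {
    "AND": np.array([0, 0, 0, 1]),
    "OR":  np.array([0, 1, 1, 1]),
    "XOR": np.array([0, 1, 1, 0]),
}

for name, y in targets.items():
    clf = DecisionTreeClassifier(criterion="entropy").fit(X, y)
    print(name,
          "| fits the truth table:", bool((clf.predict(X) == y).all()),
          "| depth:", clf.get_depth())

In recent scikit-learn versions, sklearn.inspection.DecisionBoundaryDisplay.from_estimator can be used to draw the corresponding axis-aligned decision boundaries.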
3. Consider the following testing targets, z, and the corresponding predictions, ẑ, by a decision tree:
z = [A A A B B B C C C C]
ẑ = [B B A C B A C A B C]
a) Compute the confusion matrix.

                     True
                 A     B     C
            A    1     1     1
Predicted   B    2     1     1
            C    0     1     2
b) Compute the accuracy and sensitivity/recall per class
accuracy = 0.4, sensitivity_A = 1/3, sensitivity_B = 1/3, sensitivity_C = 2/4 = 1/2
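
The same values can be cross-checked with scikit-learn; the snippet below is a small verification sketch (note that confusion_matrix puts the true classes on the rows, i.e. the transpose of the table above):

from sklearn.metrics import confusion_matrix, accuracy_score, recall_score

z     = ["A", "A", "A", "B", "B", "B", "C", "C", "C", "C"]
z_hat = ["B", "B", "A", "C", "B", "A", "C", "A", "B", "C"]

# Rows = true class, columns = predicted class (sklearn convention).
print(confusion_matrix(z, z_hat, labels=["A", "B", "C"]))
print("accuracy:", accuracy_score(z, z_hat))          # 0.4
print("recall per class:",
      recall_score(z, z_hat, labels=["A", "B", "C"], average=None))  # [1/3, 1/3, 1/2]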
#A = 66 + 19 + 14 + 15 + 18 = 132
#B = 0 + 0 + 2 + 4 + 236 = 242
The minority class is A, hence it is treated as the positive class.
                     True
                 P (A)   N (B)
           P (A)   114       6
Predicted
           N (B)    18     236
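
From these counts, the usual metrics follow directly. The snippet below is only an arithmetic check (the variable names are illustrative):

# Confusion-matrix counts from the table above, with A as the positive class.
tp, fp = 114, 6     # predicted A: truly A / truly B
fn, tn = 18, 236    # predicted B: truly A / truly B

total = tp + fp + fn + tn            # 374 observations
accuracy = (tp + tn) / total         # ~0.94
sensitivity = tp / (tp + fn)         # recall of A, ~0.86
specificity = tn / (tn + fp)         # recall of B, ~0.98
print(accuracy, sensitivity, specificity)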
b) Compare the accuracy of the given tree versus a pruned tree with only two nodes.
Is there any evidence of overfitting?
Without the testing accuracy, there is not sufficient evidence to conclude that the tree overfits the input data.
c) [optional] Are decision trees learned from high-dimensional data susceptible to underfitting?
Why does an ensemble of decision trees minimize this problem?
Assuming a limited depth, relevant information may be discarded because the tree focuses on a compact subset of the overall input variables. In ensemble models, such as random forests, different decision trees can be learned from data subsamples and feature subspaces, leading to decisions that consider a broader set of input variables.
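
A minimal illustration of this point follows, using a synthetic high-dimensional dataset; the dataset, depth limit, and hyperparameters are illustrative choices, not prescribed by the exercise.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic high-dimensional data: many variables, only some of them informative.
X, y = make_classification(n_samples=500, n_features=100, n_informative=20,
                           random_state=0)

tree = DecisionTreeClassifier(max_depth=3, random_state=0)
forest = RandomForestClassifier(n_estimators=200, max_depth=3,
                                max_features="sqrt", random_state=0)

# A single depth-limited tree can only use a handful of variables,
# whereas the forest aggregates trees grown on different feature subsets.
print("shallow tree :", cross_val_score(tree, X, y, cv=5).mean())
print("random forest:", cross_val_score(forest, X, y, cv=5).mean())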
Programming quests
5. Following the provided Jupyter notebook on Classification, learn and evaluate a decision tree
classifier on the breast.w.arff dataset (available on the webpage) using sklearn.
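
As the notebook itself is not reproduced in these notes, the following is a minimal sketch of one possible solution. It assumes breast.w.arff is in the working directory, that scipy is available to parse the ARFF format, and that the target attribute is named Class (as in the Weka version of the dataset); the preprocessing in the provided notebook may differ.

import pandas as pd
from scipy.io import arff
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Load the ARFF file into a DataFrame; nominal attributes come in as byte strings.
data, _ = arff.loadarff("breast.w.arff")
df = pd.DataFrame(data)
df["Class"] = df["Class"].str.decode("utf-8")  # assumed target attribute name

# Drop observations with missing values and separate inputs from the target.
df = df.dropna()
X, y = df.drop(columns=["Class"]), df["Class"]

# Hold out a test set, learn the tree, and evaluate it.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)
clf = DecisionTreeClassifier(criterion="entropy", random_state=0).fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, clf.predict(X_test)))
print(classification_report(y_test, clf.predict(X_test)))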