
Summer 2015

Machine Learning - Sheet 2

Submission until: 03.05. (late submissions will get a deduction)


Discussion on: 05.05.
Submission as upload to your group's Stud.IP folder as groupNumber sheet2.zip

Assignment 1 (Decision Trees (2p))


Build/draw the decision trees for the following boolean functions:
(a) A ⊕ B (= xor)
(b) (A B) (B C)
(c) (A B) (A C)
(d) (A B) (A C) D

Assignment 2 (Entropy and information gain (8p))

attribute        possible values
genre            action, romance, comedy
main-character   male, female
has ninjas       true, false

Table 1: Attributes and their possible values

Nr.   genre     main-character   has ninjas   watch
1     action    male             true         no
2     romance   male             true         yes
3     action    female           true         yes
4     comedy    female           true         yes
5     romance   female           false        no

Table 2: training examples


(a) (4 p) Consider the five training examples from Table 2. Build the root node of a decision
tree from these training examples.
To do this, calculate the information gain for all three attributes (genre, main-character, has ninjas) to decide which one would be the best choice for the root node (the one with the largest gain).
The information gain is given as

    Gain(S, A) = Entropy(S) - \sum_{v \in Values(A)} \frac{|S_v|}{|S|} Entropy(S_v)

The entropy is given as

    Entropy(S) = -p_{+} \log_2 p_{+} - p_{-} \log_2 p_{-}

S_v is the subset of S for which attribute A has value v.
Example for attribute main-character:

    S_m \leftarrow [1+, 1-], |S_m| = 2
    S_f \leftarrow [2+, 1-], |S_f| = 3
Provide all detailed calculations and the result.
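
For orientation, a minimal Python sketch of these formulas applied to the Table 2 examples is given below (the hard-coded example list and the helper names entropy / information_gain are illustrative assumptions; the detailed calculations are still expected by hand):

import numpy as np

# Training examples from Table 2, hard-coded here for illustration.
examples = [
    # (genre,    main-character, has ninjas, watch)
    ("action",  "male",   True,  "no"),
    ("romance", "male",   True,  "yes"),
    ("action",  "female", True,  "yes"),
    ("comedy",  "female", True,  "yes"),
    ("romance", "female", False, "no"),
]

def entropy(labels):
    """Entropy(S) = -p+ log2 p+ - p- log2 p- (empty classes contribute 0)."""
    n = len(labels)
    result = 0.0
    for c in set(labels):
        p = labels.count(c) / n
        result -= p * np.log2(p)
    return result

def information_gain(examples, attribute_index, label_index=3):
    """Gain(S, A) = Entropy(S) - sum_v |S_v|/|S| * Entropy(S_v)."""
    labels = [e[label_index] for e in examples]
    gain = entropy(labels)
    for v in set(e[attribute_index] for e in examples):
        subset = [e[label_index] for e in examples if e[attribute_index] == v]
        gain -= len(subset) / len(examples) * entropy(subset)
    return gain

for i, name in enumerate(["genre", "main-character", "has ninjas"]):
    print(name, information_gain(examples, i))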


(b) (2 p) Perform the same calculation as in a) but use the gain ratio instead of the information
gain. Does the result for the root node change?
    GainRatio(S, A) = \frac{Gain(S, A)}{SplitInformation(S, A)}, with

    SplitInformation(S, A) = -\sum_{v \in Values(A)} \frac{|S_v|}{|S|} \log_2 \frac{|S_v|}{|S|}
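
Analogously, a short sketch of the gain ratio, continuing the previous sketch (it reuses the assumed entropy, information_gain and examples defined there):

import numpy as np  # already imported in the previous sketch; repeated for clarity

def split_information(examples, attribute_index):
    """SplitInformation(S, A) = -sum_v |S_v|/|S| * log2(|S_v|/|S|)."""
    n = len(examples)
    info = 0.0
    for v in set(e[attribute_index] for e in examples):
        p = sum(1 for e in examples if e[attribute_index] == v) / n
        info -= p * np.log2(p)
    return info

def gain_ratio(examples, attribute_index):
    """GainRatio(S, A) = Gain(S, A) / SplitInformation(S, A)."""
    return information_gain(examples, attribute_index) / split_information(examples, attribute_index)

for i, name in enumerate(["genre", "main-character", "has ninjas"]):
    print(name, gain_ratio(examples, i))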

(c) (2 p) Let's assume the root node checks the value of the attribute has ninjas.
Calculate the next level of the decision tree using the information gain.

Programming Exercises
For the following tasks you can use either Matlab or Python. Only use built-in functions where they are explicitly permitted. Basic functions for file handling, array creation and manipulation, as well as plotting, are of course excluded from this regulation. For Python users this covers the use of the following modules:
1. scipy.io for handling .mat files
2. numpy for array creation/manipulation
3. matplotlib.pyplot for plotting
One last piece of advice: do NOT copy code from external sources and submit it as your own. If a group should happen to submit such code, all group members will receive a serious deduction of points.

Assignment 3 (Decision Trees (5p))


Use built-in functions to solve this task. For Matlab, have a look at the classregtree function.
Python users should make use of the DecisionTreeClassifier class from the scikit-learn module,
as well as pydot for plotting; a minimal Python sketch is given after the subtasks below.
(a) (2 p) Calculate a decision tree on the Iris data set (iris.mat).
(b) (1 p) Perform a 3-fold cross-validation on the data set.
(c) (2 p) Calculate the errors for the test classification. Display the best and worst decision tree.
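
One possible minimal sketch in Python, assuming scikit-learn's bundled Iris data stands in for iris.mat and sklearn.tree.plot_tree stands in for pydot (both substitutions are assumptions, not requirements of the sheet):

import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold
from sklearn.tree import DecisionTreeClassifier, plot_tree

# Load the Iris data (stand-in for iris.mat; use scipy.io.loadmat for the .mat file).
X, y = load_iris(return_X_y=True)

# 3-fold cross-validation, keeping the fitted tree and test error of each fold.
folds = []
for train_idx, test_idx in StratifiedKFold(n_splits=3, shuffle=True, random_state=0).split(X, y):
    clf = DecisionTreeClassifier(random_state=0).fit(X[train_idx], y[train_idx])
    error = 1.0 - clf.score(X[test_idx], y[test_idx])  # test classification error
    folds.append((error, clf))

# Display the best and worst decision tree (lowest / highest test error).
best = min(folds, key=lambda f: f[0])
worst = max(folds, key=lambda f: f[0])
for title, (error, clf) in [("best", best), ("worst", worst)]:
    plt.figure()
    plot_tree(clf)
    plt.title(f"{title} tree, test error = {error:.3f}")
plt.show()

StratifiedKFold keeps the class proportions roughly equal across the three folds; a plain KFold would also satisfy the task.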

Assignment 4 (Z-test (5p))


Given the data in zPoints.mat, implement a function that performs the Rosner test.
(a) (3 p) using the mean.
(b) (1 p) using the median.
Use 3.0 as the threshold value. Plot the data points and highlight those that would be
removed (1 p).
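
A minimal sketch of one simplified reading of the Rosner test as an iterative z-score filter, assuming the points in zPoints.mat are stored under a variable named zPoints (that name and the dummy data below are assumptions):

import numpy as np
import scipy.io
import matplotlib.pyplot as plt

def rosner_outliers(data, threshold=3.0, center_fn=np.mean):
    """Iteratively flag the most extreme point whose z-score (w.r.t. the
    remaining points) exceeds the threshold; stop when none exceeds it.
    Returns a boolean mask marking the removed points."""
    data = np.asarray(data, dtype=float)
    removed = np.zeros(len(data), dtype=bool)
    while True:
        remaining = data[~removed]
        center, std = center_fn(remaining), np.std(remaining)
        if std == 0:
            break
        z = np.abs(data - center) / std
        z[removed] = -np.inf          # ignore points that are already removed
        idx = int(np.argmax(z))
        if z[idx] <= threshold:
            break
        removed[idx] = True
    return removed

# data = scipy.io.loadmat("zPoints.mat")["zPoints"].ravel()  # variable name is an assumption
data = np.concatenate([np.random.normal(0, 1, 100), [8.0, -9.0]])  # dummy data for illustration

for center_fn in (np.mean, np.median):  # (a) mean, (b) median
    removed = rosner_outliers(data, threshold=3.0, center_fn=center_fn)
    plt.figure()
    plt.plot(np.arange(len(data))[~removed], data[~removed], "b.", label="kept")
    plt.plot(np.arange(len(data))[removed], data[removed], "rx", label="removed")
    plt.title(f"Rosner test ({center_fn.__name__})")
    plt.legend()
plt.show()

Using np.median as center_fn makes the centering robust to the outliers themselves, which is the point of part (b).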

Prof. Dr. G. Heidemann