Lab 08: ID3 - Decision Tree and Linear Regression Objectives
1. Decision Tree
A decision tree is a tree in which each branch node represents a choice between a number of
alternatives, and each leaf node represents a decision. Decision trees are commonly used to gain
information for decision-making. A decision tree starts with a root node, which is where users begin
taking actions. From this node, each node is split recursively according to a decision tree learning
algorithm. The final result is a decision tree in which each branch represents a possible decision
scenario and its outcome.
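To make the branch/leaf terminology concrete, here is a minimal sketch that assumes a hypothetical nested-dictionary representation (the lab does not prescribe one): a decision node is {"attribute": name, "branches": {value: subtree}} and a leaf is just a class label. The function name classify is also hypothetical. Classifying an example means following the matching branch at each decision node until a leaf is reached.

```python
from typing import Dict, Union

# Hypothetical representation (an assumption, not required by the lab):
#   leaf node     -> a class label such as "yes"
#   decision node -> {"attribute": name, "branches": {value: subtree, ...}}
Tree = Union[str, dict]


def classify(tree: Tree, example: Dict[str, str]) -> str:
    """Walk from the root, taking the branch that matches the example's value, until a leaf."""
    while isinstance(tree, dict):
        value = example[tree["attribute"]]
        # Raises KeyError if the value never appeared under this node during training.
        tree = tree["branches"][value]
    return tree


# The tennis tree from the output example later in this lab, in this representation.
tennis_tree = {
    "attribute": "outlook",
    "branches": {
        "sunny": {"attribute": "humidity",
                  "branches": {"high": "no", "normal": "yes"}},
        "overcast": "yes",
        "rainy": {"attribute": "windy",
                  "branches": {"TRUE": "no", "FALSE": "yes"}},
    },
}

print(classify(tennis_tree, {"outlook": "sunny", "humidity": "normal", "windy": "FALSE"}))  # yes
```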
Entropy
In information theory, entropy is a measure of the uncertainty about a source of messages. The more
uncertain a receiver is about a source of messages, the more information that receiver will need in
order to know which message has been sent.
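The section does not state a formula. Assuming the usual Shannon definition in bits, the entropy of a set S whose classes occur with proportions p_i is H(S) = -sum_i p_i log2(p_i). A minimal sketch, assuming the class labels are passed as a list (the function name entropy is illustrative):

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of the class distribution in `labels`."""
    total = len(labels)
    if total == 0:
        return 0.0
    # p * log2(1/p) equals -p * log2(p); this form avoids returning -0.0 for a pure set.
    return sum((c / total) * math.log2(total / c) for c in Counter(labels).values())


print(entropy(["yes", "no"]))          # 1.0  (a 50/50 split is maximally uncertain)
print(entropy(["yes", "yes", "yes"]))  # 0.0  (a pure set carries no uncertainty)
```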
Information Gain
Information gain measures the expected reduction in entropy. To keep the decision tree as shallow as
possible, we need to select the optimal attribute for splitting each node along a tree path, and the
attribute that gives the largest reduction in entropy is the best choice. We define information gain as
the expected reduction in entropy obtained by splitting a decision tree node on a specified attribute.
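Written out, the gain of attribute A on set S is Gain(S, A) = H(S) - sum over values v of (|S_v| / |S|) * H(S_v), where S_v is the subset of S with A = v. The sketch below assumes each training example is a dict mapping column names to values and that the class column name is passed as target; both the representation and the function names are illustrative assumptions.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List


def entropy(labels: List[str]) -> float:
    """Shannon entropy (bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return sum((c / total) * math.log2(total / c) for c in Counter(labels).values())


def information_gain(rows: List[Dict[str, str]], attribute: str, target: str) -> float:
    """Entropy of `target` before the split minus the weighted entropy after splitting on `attribute`."""
    partitions: Dict[str, List[str]] = defaultdict(list)  # attribute value -> class labels
    for row in rows:
        partitions[row[attribute]].append(row[target])
    total = len(rows)
    remainder = sum(len(subset) / total * entropy(subset) for subset in partitions.values())
    return entropy([row[target] for row in rows]) - remainder


# Hypothetical toy rows: "windy" determines "play" exactly, "other" is irrelevant.
rows = [
    {"windy": "TRUE", "other": "a", "play": "no"},
    {"windy": "TRUE", "other": "b", "play": "no"},
    {"windy": "FALSE", "other": "a", "play": "yes"},
    {"windy": "FALSE", "other": "b", "play": "yes"},
]
print(information_gain(rows, "windy", "play"))  # 1.0
print(information_gain(rows, "other", "play"))  # 0.0
```

ID3 would pick "windy" here, since it removes all of the uncertainty about the class.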
Write a program in Python to implement the ID3 decision tree algorithm. Your program should read in a
tab-delimited dataset and print your decision tree and its training-set accuracy to the screen in a
readable format.
Your decision tree program should work on any dataset (do not hardcode attributes or values).
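The recursive core of ID3 can be sketched as follows. This is a minimal illustration, not a reference solution, and it makes several assumptions worth checking against the actual files: the first line of the tab-delimited input holds the column names, the last column is the class, and all attributes are categorical (continuous attributes, as in the credit screening data, would need extra handling). The nested-dict tree representation and all function names are hypothetical. Attributes are read from the header, so nothing is hardcoded.

```python
import csv
import math
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple, Union

Tree = Union[str, dict]  # leaf label, or {"attribute": ..., "branches": {value: subtree}}


def read_dataset(lines: Iterable[str]) -> Tuple[List[str], List[Dict[str, str]]]:
    """Parse tab-delimited lines; assumes line 1 is the header and the last column is the class."""
    reader = csv.reader(lines, delimiter="\t")
    header = next(reader)
    rows = [dict(zip(header, fields)) for fields in reader if fields]  # skip blank lines
    return header, rows


def entropy(labels: List[str]) -> float:
    total = len(labels)
    return sum((c / total) * math.log2(total / c) for c in Counter(labels).values()) if total else 0.0


def information_gain(rows: List[Dict[str, str]], attribute: str, target: str) -> float:
    partitions: Dict[str, List[str]] = defaultdict(list)
    for row in rows:
        partitions[row[attribute]].append(row[target])
    remainder = sum(len(s) / len(rows) * entropy(s) for s in partitions.values())
    return entropy([row[target] for row in rows]) - remainder


def id3(rows: List[Dict[str, str]], attributes: List[str], target: str) -> Tree:
    """Build a tree by splitting on the highest-gain attribute at each node (rows must be non-empty)."""
    labels = [row[target] for row in rows]
    majority = Counter(labels).most_common(1)[0][0]
    # Stop when the node is pure or no attributes are left; predict the majority class.
    if len(set(labels)) == 1 or not attributes:
        return majority
    best = max(attributes, key=lambda a: information_gain(rows, a, target))
    remaining = [a for a in attributes if a != best]
    branches = {}
    for value in sorted({row[best] for row in rows}):  # sorted for a stable printout
        subset = [row for row in rows if row[best] == value]
        branches[value] = id3(subset, remaining, target)
    return {"attribute": best, "branches": branches}
```

A possible driver, taking the file name from the command line as the lab requires: open sys.argv[1] with newline="", call read_dataset on the file object, then call id3(rows, header[:-1], header[-1]).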
Restaurant data:
- restaurant.csv: restaurant data
- restaurant.txt: restaurant metadata
Zoo data:
- zoo.csv: zoo data
- zoo.txt: zoo metadata
Credit screening data:
- crx.csv: credit screening data
- crx.txt: credit screening metadata
(For testing purposes, you might also want to work with the tennis example you solved by hand
above: tennis.txt contains the metadata and tennis.csv contains the data.) When you run your
program, it should take the name of the file containing the data. For example:
BU, CS Department 3/4 Semester 7 (Spring 2018)
CSL-411: AI Lab Lab 08: ID3 &Lin Reg
tennis.txt
For output, you can choose how to draw the tree so long as it is clear what the tree is. For example:
outlook = sunny
| humidity = high: no
| humidity = normal: yes
outlook = overcast: yes
outlook = rainy
| windy = TRUE: no
| windy = FALSE: yes
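One way to produce this format, together with the training-set accuracy the lab asks for, is sketched below. It assumes the same hypothetical nested-dict tree representation (decision node = {"attribute": ..., "branches": {...}}, leaf = class label); the function names and the toy rows are illustrative, not part of the lab. Each nesting level adds one "| " prefix, and a leaf is appended after a colon on the same line as its branch.

```python
from typing import Dict, List, Union

Tree = Union[str, dict]  # leaf label, or {"attribute": ..., "branches": {value: subtree}}


def format_tree(tree: Tree, depth: int = 0) -> List[str]:
    """Return one line per branch, indenting nested tests with '| ' per level."""
    if not isinstance(tree, dict):
        return [str(tree)]  # the whole tree is a single leaf
    lines = []
    for value, subtree in tree["branches"].items():
        prefix = "| " * depth + f"{tree['attribute']} = {value}"
        if isinstance(subtree, dict):
            lines.append(prefix)
            lines.extend(format_tree(subtree, depth + 1))
        else:
            lines.append(f"{prefix}: {subtree}")
    return lines


def classify(tree: Tree, example: Dict[str, str]) -> str:
    while isinstance(tree, dict):
        tree = tree["branches"][example[tree["attribute"]]]
    return tree


def training_accuracy(tree: Tree, rows: List[Dict[str, str]], target: str) -> float:
    """Fraction of rows whose predicted class matches the `target` column."""
    correct = sum(classify(tree, row) == row[target] for row in rows)
    return correct / len(rows)


tennis_tree = {
    "attribute": "outlook",
    "branches": {
        "sunny": {"attribute": "humidity", "branches": {"high": "no", "normal": "yes"}},
        "overcast": "yes",
        "rainy": {"attribute": "windy", "branches": {"TRUE": "no", "FALSE": "yes"}},
    },
}
print("\n".join(format_tree(tennis_tree)))  # reproduces the example output above

# Hypothetical toy rows (not taken from tennis.csv); the last one is misclassified.
toy_rows = [
    {"outlook": "sunny", "humidity": "high", "windy": "FALSE", "play": "no"},
    {"outlook": "overcast", "humidity": "high", "windy": "TRUE", "play": "yes"},
    {"outlook": "rainy", "humidity": "normal", "windy": "TRUE", "play": "yes"},
]
print(f"Training accuracy: {training_accuracy(tennis_tree, toy_rows, 'play'):.1%}")  # 66.7%
```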
2. Linear Regression
Formulas
The slope-intercept form of the linear regression prediction equation is Ŷ = a + bX,
where:
Ŷ = predicted score
b = slope of the line
a = Y-intercept
Parameter estimators:
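The estimator formulas themselves did not survive in this copy. Assuming the usual ordinary least-squares fit over n observations (X_i, Y_i), with means X̄ and Ȳ, the standard estimators for the b and a defined above are:

```latex
b = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2},
\qquad
a = \bar{Y} - b\,\bar{X}
```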
Write a program in Python to implement the linear regression algorithm. Your program should read in a
tab-delimited dataset and print your final linear regression equation to the screen.
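A minimal sketch of simple (one-predictor) least-squares regression follows, under stated assumptions. The handout does not say which column is X and which is Y in the data below, so the column indices are parameters; the input is assumed to have no header row; and lines are split on any whitespace, which covers tabs as well as the spaces shown in this copy. All function names are hypothetical. The fit uses the ordinary least-squares estimators for a and b.

```python
from typing import Iterable, List, Tuple


def read_columns(lines: Iterable[str], x_col: int, y_col: int) -> Tuple[List[float], List[float]]:
    """Read two numeric columns from whitespace/tab-delimited lines (no header row assumed)."""
    xs, ys = [], []
    for line in lines:
        fields = line.split()  # splits on tabs or spaces
        if fields:  # skip blank lines
            xs.append(float(fields[x_col]))
            ys.append(float(fields[y_col]))
    return xs, ys


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares estimates (a, b) for Y-hat = a + bX."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    b = sxy / sxx  # slope
    a = y_mean - b * x_mean  # intercept
    return a, b


def format_equation(a: float, b: float) -> str:
    """Render the fitted line, writing a negative slope as 'a - |b|X'."""
    sign = "+" if b >= 0 else "-"
    return f"Y-hat = {a:.4f} {sign} {abs(b):.4f}X"


# Toy input lying exactly on Y = 1 + 2X.
print(format_equation(*fit_line(*read_columns(["1\t3", "2\t5", "3\t7"], 0, 1))))
# Y-hat = 1.0000 + 2.0000X
```

For a real file, pass the open file object to read_columns and pick x_col and y_col to match the columns you intend to model.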
6 136 59
7 144 62
8 142 65
9 149 67
10 161 71
11 167 72
12 168 74
13 162 75
14 171 76
15 175 79
16 182 80
17 180 82
18 183 85
19 188 87
20 200 90
21 194 93
22 206 94
23 207 95
24 210 97
25 219 100