DM Witten 03
• There are many different ways of representing the patterns that can be
discovered by machine learning.
• Each one dictates the kind of technique that can be used to infer that
output structure from data.
• Once you understand how the output is represented, you have come a
long way toward understanding how it can be generated.
Tables
• The simplest, most rudimentary way of representing the output from
machine learning is to make it just the same as the input—a table.
Tables
• Decision table for the weather problem:

Outlook    Humidity   Play
Sunny      high       no
Sunny      normal     yes
Overcast   high       yes
Overcast   normal     yes
Rainy      high       no
Rainy      normal     yes

• Main problem: selecting the right attributes
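A decision table can be read as a direct lookup from attribute values to the class. The sketch below encodes the table above as a Python dictionary; the names `DECISION_TABLE` and `predict_play` are illustrative and not from the slides, and returning `None` for an unseen combination is an assumed design choice.

```python
# Decision table from the slide: (outlook, humidity) -> play.
DECISION_TABLE = {
    ("sunny", "high"): "no",
    ("sunny", "normal"): "yes",
    ("overcast", "high"): "yes",
    ("overcast", "normal"): "yes",
    ("rainy", "high"): "no",
    ("rainy", "normal"): "yes",
}


def predict_play(outlook, humidity):
    """Look up the prediction; return None for combinations not in the table."""
    key = (outlook.lower(), humidity.lower())
    return DECISION_TABLE.get(key)


print(predict_play("Sunny", "high"))
```

The table only works because outlook and humidity were picked as the key; choosing which attributes to keep is the hard part the slide refers to.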
Linear Model
• Output is the sum of the attribute values, with weights applied to each
attribute before adding them together.
• The trick is to come up with good values for the weights—ones that
make the model’s output match the desired output.
• The output and the inputs—attribute values—are all numeric.
• Statisticians use the word regression for the process of predicting a
numeric quantity, and regression model is another term for this kind of
linear model.
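As a minimal sketch of the idea, the code below computes a linear model's output as a weighted sum and finds weights for the single-attribute case. The slides do not say how the weights are chosen; ordinary least squares is assumed here as one common choice, and the function names and toy data are hypothetical.

```python
def linear_output(weights, bias, attributes):
    """Weighted sum of numeric attribute values plus a constant term."""
    return bias + sum(w * a for w, a in zip(weights, attributes))


def fit_one_attribute(xs, ys):
    """Least-squares intercept and slope for one numeric attribute.

    Assumption: least squares is used to pick the weights; the slides
    do not name a specific fitting method.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("attribute has no variation; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Toy data lying exactly on y = 1 + 2x (made up for illustration).
intercept, slope = fit_one_attribute([0, 1, 2, 3], [1, 3, 5, 7])
print(linear_output([slope], intercept, [4]))
```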
Linear Model
• Easiest to visualize in two dimensions, where they are tantamount to
drawing a straight line through a set of data points.
• Figure shows a line fitted to the CPU performance data, where only the
cache attribute is used as input.
• The class attribute performance is shown on the vertical axis, with
cache on the horizontal axis; both are numeric.
• The straight line represents the “best fit” prediction equation
PRP = 37.06 + 2.47 CACH
[Figure: CPU performance data with fitted line; horizontal axis CACH
(0 to 300), vertical axis PRP (0 to 1400)]
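Reading the fitted line as PRP = 37.06 + 2.47 × CACH, a prediction is made by plugging a cache value into the equation. The short sketch below does just that; the function name `predict_prp` and the example cache values are illustrative only.

```python
def predict_prp(cach):
    """Predicted performance (PRP) from the cache attribute (CACH)."""
    return 37.06 + 2.47 * cach


for cach in (0, 100, 250):
    print(cach, round(predict_prp(cach), 2))
```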
Linear Model
• Can also be applied to binary classification problems.
• In this case, the line produced by the model separates the two classes:
it defines where the decision changes from one class value to the other.
• Such a line is often referred to as the decision boundary.
• Figure shows a decision boundary for the iris data that separates the
Iris setosas from the Iris versicolors.
• In this case, the data is plotted using two of the input attributes,
petal length and petal width.
• The straight line defining the decision boundary is a function of these
two attributes. Points lying on the line are given by the equation
2.0 − 0.5 PETAL-LENGTH − 0.8 PETAL-WIDTH = 0
[Figure: iris data with decision boundary; horizontal axis Petal Length,
vertical axis Petal Width]
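To classify an instance, evaluate the left-hand side of the boundary equation and check its sign. In the sketch below, mapping the non-negative side to Iris setosa is an assumption not stated on the slide; it is consistent with setosas having the smaller petals. The sample measurements are illustrative values in centimetres, not taken from the slides.

```python
def boundary_value(petal_length, petal_width):
    """Left-hand side of the decision boundary equation."""
    return 2.0 - 0.5 * petal_length - 0.8 * petal_width


def classify_iris(petal_length, petal_width):
    # Assumption: values >= 0 fall on the setosa side of the line.
    if boundary_value(petal_length, petal_width) >= 0:
        return "Iris setosa"
    return "Iris versicolor"


print(classify_iris(1.4, 0.2))  # small petals
print(classify_iris(4.5, 1.5))  # larger petals
```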
Decision trees
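• “Divide-and-conquer” approach produces a tree
• A tree cannot easily express the disjunction implied by rules that test
different attributes, for example:
If a and b then x
If c and d then x
• Symmetry needs to be broken: one attribute must be chosen for the root
• Corresponding tree contains identical subtrees
(known as the “replicated subtree problem”)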
Classification Rules – From Trees to Rules
If a and b then x
If c and d then x
The exclusive-or problem
• Rules:
If x = 1 and y = 0 then class = a
If x = 0 and y = 1 then class = a
If x = 0 and y = 0 then class = b
If x = 1 and y = 1 then class = b
• Equivalent tree:
x = 1?
  NO:  y = 1?
    NO:  b
    YES: a
  YES: y = 1?
    NO:  a
    YES: b
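The rule set and the tree for the exclusive-or problem can be checked against each other. In this sketch the four rules become a chain of tests and the tree becomes nested conditionals; the function names are hypothetical.

```python
def xor_rules(x, y):
    """The four rules from the slide, tested one after another."""
    if x == 1 and y == 0:
        return "a"
    if x == 0 and y == 1:
        return "a"
    if x == 0 and y == 0:
        return "b"
    if x == 1 and y == 1:
        return "b"
    raise ValueError("x and y must each be 0 or 1")


def xor_tree(x, y):
    """The tree from the slide: test x at the root, then y in each branch."""
    if x == 1:
        return "b" if y == 1 else "a"
    return "a" if y == 1 else "b"


for x in (0, 1):
    for y in (0, 1):
        print(x, y, xor_rules(x, y), xor_tree(x, y))
```

The tree tests y in both branches, which is why neither attribute can be dropped from any rule.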
Write Rules for this tree
[Figure: decision tree with three-way splits (branches 1, 2, 3) on
attributes including x, y, and w; leaves are classes a and b]
Classification Rules – From Trees to Rules
If x = 1 and y = 1 then class = a
If z = 1 and w = 1 then class = a
Otherwise class = b
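The three rules above translate directly into code. This sketch assumes each attribute takes one of the values 1, 2, or 3, as the branch labels in the tree figure suggest, and the function name is hypothetical. It enumerates every combination to show how little of the instance space the two "a" rules cover.

```python
from itertools import product


def classify(x, y, z, w):
    """Two rules for class a, with class b as the default."""
    if x == 1 and y == 1:
        return "a"
    if z == 1 and w == 1:
        return "a"
    return "b"  # "Otherwise" rule


# Assumption: each attribute takes one of the values 1, 2, 3.
counts = {"a": 0, "b": 0}
for x, y, z, w in product((1, 2, 3), repeat=4):
    counts[classify(x, y, z, w)] += 1
print(counts)
```

Two short rules and a default replace a tree that has to repeat the test on z and w in more than one branch.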
Executing a Rule Set
• Two ways of executing a rule set:
▫ Ordered set of rules (“decision list”)
Order is important for interpretation
▫ Unordered set of rules
Rules may overlap and lead to different conclusions for the
same instance
Executing a Rule Set - Problems
• What if two or more rules conflict?
▫ Give no conclusion at all?
▫ Go with rule that is most popular on training data?
▫…
• What if no rule applies to a test instance?
▫ Give no conclusion at all?
▫ Go with class that is most frequent in training data?
▫…
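The options above can be sketched side by side. Everything specific here is hypothetical: the two rules, their training coverage counts, and the default class are made up for illustration. "Most popular on training data" is interpreted as the rule that covered the most training instances.

```python
# Hypothetical rules: (condition, predicted class, training coverage count).
RULES = [
    (lambda inst: inst["outlook"] == "sunny", "no", 3),
    (lambda inst: inst["humidity"] == "normal", "yes", 6),
]
DEFAULT_CLASS = "yes"  # assumed most frequent class in the training data


def classify_ordered(instance):
    """Decision list: the first rule that fires decides."""
    for condition, cls, _ in RULES:
        if condition(instance):
            return cls
    return DEFAULT_CLASS


def classify_unordered(instance, abstain=False):
    """Unordered rule set with two ways of handling trouble.

    abstain=True  -> give no conclusion (None) on conflict or no match.
    abstain=False -> on conflict, use the rule with the largest training
                     coverage; on no match, use the default class.
    """
    fired = [(cls, cover) for cond, cls, cover in RULES if cond(instance)]
    if not fired:
        return None if abstain else DEFAULT_CLASS
    if len({cls for cls, _ in fired}) > 1 and abstain:
        return None
    return max(fired, key=lambda rule: rule[1])[0]


conflict = {"outlook": "sunny", "humidity": "normal"}
print(classify_ordered(conflict), classify_unordered(conflict))
```

For the conflicting instance the decision list and the unordered set disagree, which is why rule order matters for interpretation.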
Executing a Rule Set – Boolean Class
• Assumption: if instance does not belong to class “yes”, it belongs to class “no”
• Trick: only learn rules for class “yes” and use default rule for “no”
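A minimal sketch of this trick: only rules for "yes" are stored, and anything they do not cover falls through to "no". The two rules below are hypothetical; they were chosen so that they reproduce the weather decision table shown earlier.

```python
# Hypothetical rules for class "yes" only.
YES_RULES = [
    lambda inst: inst["outlook"] == "overcast",
    lambda inst: inst["humidity"] == "normal",
]


def classify(instance):
    """Predict "yes" if any rule fires; otherwise apply the default "no"."""
    if any(rule(instance) for rule in YES_RULES):
        return "yes"
    return "no"


print(classify({"outlook": "sunny", "humidity": "high"}))
print(classify({"outlook": "overcast", "humidity": "high"}))
```

Because every rule predicts the same class, the rules cannot conflict and their order does not matter.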