Business Analytics & Machine Learning: Decision Tree Classifiers
• Introduction
• Regression Analysis
• Regression Diagnostics
• Logistic and Poisson Regression
• Naive Bayes and Bayesian Networks
• Decision Tree Classifiers
• Data Preparation and Causal Inference
• Dimensionality Reduction
• Association Rules and Recommenders
• Convex Optimization
• Neural Networks
[Figure: course overview diagram of supervised learning methods (linear, logistic, Poisson, and lasso regression; naive Bayes; PCR; ensemble methods; neural networks), grouped into regression and classification]
2
Recommended Literature
• Data Mining: Practical Machine Learning Tools and
Techniques
− Ian H. Witten, Eibe Frank, Mark A. Hall, Christopher Pal
− http://www.cs.waikato.ac.nz/ml/weka/book.html
− Section: 4.3, 6.1
Alternative literature:
• Machine Learning
− Tom M. Mitchell, 1997
3
Agenda for Today
• Choosing a splitting attribute in decision trees
− Information gain
− Gain ratio
− Gini index
• Numeric attributes
• Missing values
• Relational rules
4
The Weather Data
Outlook Temp Humidity Windy Play
Sunny Hot High False No
Sunny Hot High True No
Overcast Hot High False Yes
Rainy Mild High False Yes
Rainy Cool Normal False Yes
Rainy Cool Normal True No
Overcast Cool Normal True Yes
Sunny Mild High False No
Sunny Cool Normal False Yes
Rainy Mild Normal False Yes
Sunny Mild Normal True Yes
Overcast Mild High True Yes
Overcast Hot Normal False Yes
Rainy Mild High True No
5
Example Tree for “Play?”
Outlook:
• sunny → Humidity (high → No, normal → Yes)
• overcast → Yes
• rainy → Windy (true → No, false → Yes)
6
Regression Tree
[Figure: regression tree splitting first on CHMIN (≤ 7.5 / > 7.5), then on CACH and MMAX]
• High accuracy
• Large and possibly awkward
7
Model Trees
[Figure: model tree with the same structure — CHMIN (≤ 7.5 / > 7.5), then CACH and MMAX — but with linear models at the leaves]
At each node, one attribute is chosen to split training examples into distinct classes
as much as possible.
9
Building a Decision Tree
Top-down tree construction
• At start, all training examples are at the root
• Partition the examples recursively by choosing one attribute each time
10
Choosing the Splitting Attribute
At each node, available attributes are evaluated on the basis of separating the
classes of the training examples.
11
Which Attribute to Select?
12
A Criterion for Attribute Selection
Which is the best attribute?
• The one which will result in the smallest tree.
• Heuristic: choose the attribute that produces the “purest” nodes.
13
Example: attribute “Outlook”
“Outlook” = “Sunny”:
info([2,3]) = entropy(2/5, 3/5) = −(2/5) log₂(2/5) − (3/5) log₂(3/5) = 0.971 bits
14
Computing the Information Gain
Information gain: (information before split) – (information after split)
gain(Outlook) = info([9,5]) − info([2,3], [4,0], [3,2]) = 0.940 − 0.693 = 0.247 bits
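A minimal Python check of these numbers (the helper name info and the rounding are mine, not from the slides):

```python
from math import log2

def info(counts):
    """Entropy of a class distribution given as a list of counts, in bits."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

print(f"{info([9, 5]):.3f}")                       # 0.940  (info before the split)

subsets = [[2, 3], [4, 0], [3, 2]]                 # sunny, overcast, rainy
n = sum(sum(s) for s in subsets)
after = sum(sum(s) / n * info(s) for s in subsets)
print(f"{after:.3f}")                              # 0.694  (the slide rounds this to 0.693)
print(f"{info([9, 5]) - after:.3f}")               # 0.247  (information gain)
```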
15
Computing Information
Information is measured in bits
• given a probability distribution, the info required to encode/predict an event.
• entropy gives the information required in bits (this can involve fractions of bits!)
Note: In the exercises, we will use the natural logarithm for convenience (easier with calculators in the exam), which turns “bits” into “nits”.
16
Expected Information Gain
gain(S, a) = entropy(S) − Σ_{v ∈ Values(a)} (|S_v| / |S|) · entropy(S_v)
where S_v = {s ∈ S : a(s) = v} and Values(a) is the set of all possible values of attribute a.
Problems?
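A sketch of gain(S, a) applied to the full weather data from the table above (the Python representation and function names are my own):

```python
from math import log2
from collections import Counter

# Weather data: (Outlook, Temp, Humidity, Windy, Play)
data = [
    ("Sunny", "Hot", "High", False, "No"),     ("Sunny", "Hot", "High", True, "No"),
    ("Overcast", "Hot", "High", False, "Yes"), ("Rainy", "Mild", "High", False, "Yes"),
    ("Rainy", "Cool", "Normal", False, "Yes"), ("Rainy", "Cool", "Normal", True, "No"),
    ("Overcast", "Cool", "Normal", True, "Yes"), ("Sunny", "Mild", "High", False, "No"),
    ("Sunny", "Cool", "Normal", False, "Yes"), ("Rainy", "Mild", "Normal", False, "Yes"),
    ("Sunny", "Mild", "Normal", True, "Yes"),  ("Overcast", "Mild", "High", True, "Yes"),
    ("Overcast", "Hot", "Normal", False, "Yes"), ("Rainy", "Mild", "High", True, "No"),
]
ATTRS = {"Outlook": 0, "Temp": 1, "Humidity": 2, "Windy": 3}

def entropy(rows):
    """Class entropy of a set of rows; the class label is the last element."""
    counts = Counter(r[-1] for r in rows)
    n = len(rows)
    return -sum(c / n * log2(c / n) for c in counts.values())

def gain(rows, attr):
    """Expected information gain of splitting rows on the given attribute."""
    i = ATTRS[attr]
    after = 0.0
    for v in {r[i] for r in rows}:
        subset = [r for r in rows if r[i] == v]
        after += len(subset) / len(rows) * entropy(subset)
    return entropy(rows) - after

for a in ATTRS:
    print(a, f"{gain(data, a):.3f}")   # Outlook 0.247, Temp 0.029, Humidity 0.152, Windy 0.048
```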
17
Wish List for a Purity Measure
Properties we require from a purity measure:
• When a node is pure, the measure should be zero.
• When impurity is maximal (i.e., all n classes are equally likely, where n is the number of classes), the measure should be maximal (e.g., 1 for boolean values).
• Multistage property: info([2,3,4]) = info([2,7]) + (7/9) · info([3,4])
18
The Weather Data
Outlook Temp Humidity Windy Play
Sunny Hot High False No
Sunny Hot High True No
Overcast Hot High False Yes
Rainy Mild High False Yes
Rainy Cool Normal False Yes
Rainy Cool Normal True No
Overcast Cool Normal True Yes
Sunny Mild High False No
Sunny Cool Normal False Yes
Rainy Mild Normal False Yes
Sunny Mild Normal True Yes
Overcast Mild High True Yes
Overcast Hot Normal False Yes
Rainy Mild High True No
19
Continuing to Split
20
The Final Decision Tree
21
Claude Shannon
Born: 30 April 1916, Died: 23 February 2001
Shannon is famous for having founded
information theory with one landmark paper
published in 1948 (A Mathematical Theory of
Communication).
Information theory was developed to find fundamental limits on compressing and reliably
communicating data. Communication over a channel was the primary motivation. Channels
(such as a phone line) often fail to produce exact reconstruction of a signal; noise, periods of
silence, and other forms of signal corruption often degrade quality. How much information can
one hope to communicate over a noisy (or otherwise imperfect) channel?
An important application of information theory is coding theory:
• Data compression (by removing redundancy in data)
• Error-correcting codes add just the right kind of redundancy (i.e. error correction) needed to
transmit the data efficiently and faithfully across a noisy channel.
22
Background on Entropy and Information Theory
• Suppose there are n possible states or messages.
• If the messages are equally likely, then the probability of each message is p = 1/n, i.e., n = 1/p.
• The information (number of bits) of a message is
  log₂(n) = log₂(1/p) = −log₂(p)
• Example: with 16 possible messages, the information is log₂(16) = 4, and we require 4 bits for each message.
• If the following probability distribution is given:
  P = (p₁, p₂, …, pₙ)
  then the information (or entropy of P) conveyed by the distribution can be computed as follows:
  I(P) = −(p₁ · log(p₁) + p₂ · log(p₂) + … + pₙ · log(pₙ)) = −Σᵢ pᵢ · log(pᵢ) = Σᵢ pᵢ · log(1/pᵢ)
23
Entropy
• Example: P = (75% sun, 25% rain)
  I(P) = −(p₁ · log₂(p₁) + p₂ · log₂(p₂)) = −(0.75 · log₂(0.75) + 0.25 · log₂(0.25)) = 0.81 bits
• Example: 8 equally likely states
  I(P) = −Σᵢ pᵢ · log₂(pᵢ) = −8 · (1/8 · (−3)) = 3 bits
24
Detour: Cross-Entropy
Suppose you have a classifier that produces a discrete probability distribution, and you have
the true underlying distribution. For example, the probabilities for high, medium, and low risk of
a customer might be:
Outcome distribution q    Ground truth p
0.7                       1
0.2                       0
0.1                       0
The cross-entropy between the two distributions p and q is used to quantify the difference between these two distributions:
H(p, q) = −Σᵢ pᵢ · log(qᵢ)
25
Detour: KL-Divergence (or Relative Entropy)
The Kullback-Leibler-Divergence between two distributions p and q is a measure of how
one probability distribution is different from another. If KL-divergence is 0, there is no
difference.
KL-divergence is defined as the difference between the cross-entropy and the entropy:
D_KL(p ‖ q) = H(p, q) − H(p) = Σᵢ pᵢ · log(pᵢ / qᵢ)
Both cross-entropy and KL-divergence are used as loss functions in machine learning.
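A small numeric check using the risk example from the previous slide, with the natural logarithm as in the exercises (function names are mine):

```python
from math import log

p = [1.0, 0.0, 0.0]   # ground truth
q = [0.7, 0.2, 0.1]   # classifier output

def cross_entropy(p, q):
    # terms with p_i = 0 contribute nothing (0 * log q_i is taken as 0)
    return -sum(pi * log(qi) for pi, qi in zip(p, q) if pi > 0)

def entropy(p):
    return -sum(pi * log(pi) for pi in p if pi > 0)

H_pq = cross_entropy(p, q)      # ≈ 0.357 nits
kl = H_pq - entropy(p)          # entropy(p) = 0 here, so KL ≈ 0.357 nits
print(f"H(p,q) = {H_pq:.3f}, KL(p||q) = {kl:.3f}")
```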
26
Which splitting attribute would be selected in the following example?
27
Split for ID Code Attribute
Entropy of the split = 0, since each leaf node is “pure”, containing only one case.
28
Highly-Branching Attributes
Problematic: attributes with a large number of values
(extreme case: ID code)
29
Gain Ratio
Gain ratio: a modification of the information gain that reduces its bias toward attributes with many values (high branching).
Gain ratio takes number and size of branches into account when choosing an
attribute.
It corrects the information gain by taking the intrinsic information of a split into
account (i.e. how much info do we need to tell which branch an instance belongs to).
30
Computing the Gain Ratio
Example: intrinsic information for the ID code attribute
intrinsic_info([1,1,…,1]) = 14 × (−1/14 × log₂(1/14)) = 3.807 bits
Gain ratio: gainRatio(S, a) = gain(S, a) / intrinsic_info(S, a)
Example: gainRatio(ID_Code) = 0.940 bits / 3.807 bits = 0.246
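A quick check of these numbers in Python (the helper intrinsic_info is my own name; the gain values 0.940 and 0.247 are taken from the earlier slides):

```python
from math import log2

def intrinsic_info(branch_sizes):
    """Split information: entropy of the branch-size distribution, in bits."""
    n = sum(branch_sizes)
    return -sum(s / n * log2(s / n) for s in branch_sizes if s > 0)

# ID code: 14 branches with one instance each
print(f"{intrinsic_info([1] * 14):.3f}")           # 3.807
print(f"{0.940 / intrinsic_info([1] * 14):.3f}")   # 0.247 (the slide rounds to 0.246)

# Outlook: branches of size 5, 4, 5
print(f"{intrinsic_info([5, 4, 5]):.3f}")          # 1.577
print(f"{0.247 / intrinsic_info([5, 4, 5]):.3f}")  # 0.157, matching the slide's 0.156 up to rounding
```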
31
Gain Ratios for Weather Data
Attribute     Info    Gain                     Split info              Gain ratio
Outlook       0.693   0.940 − 0.693 = 0.247    info([5,4,5]) = 1.577   0.247 / 1.577 = 0.156
Temperature   0.911   0.940 − 0.911 = 0.029    info([4,6,4]) = 1.557   0.029 / 1.557 = 0.019
Humidity      0.788   0.940 − 0.788 = 0.152    info([7,7]) = 1.000     0.152 / 1.000 = 0.152
Windy         0.892   0.940 − 0.892 = 0.048    info([8,6]) = 0.985     0.048 / 0.985 = 0.049
32
More on the Gain Ratio
“Outlook” still comes out top.
However:
• “ID code” still has a greater gain ratio (0.246).
• Standard fix: ad hoc test to prevent splitting on that type of attribute.
33
The Splitting Criterion in CART
• Classification and Regression Tree (CART)
• developed 1974-1984 by 4 statistics professors
− Leo Breiman (Berkeley), Jerry Friedman (Stanford), Charles Stone (Berkeley),
Richard Olshen (Stanford)
• Gini Index is used as a splitting criterion
• both C4.5 and CART are robust tools
• no method is always superior – experiment!
34
Gini Index for 2 Attribute Values
For example, consider two classes, Pos and Neg, and a dataset S with p Pos-elements and n Neg-elements. The relative frequencies of the positive and negative class are:
P = p / (p + n)
N = n / (p + n)
The Gini index of S is then Gini(S) = 1 − P² − N².
35
Example
Split by attribute I or II?
[Figure: two candidate splits (attribute I and attribute II) of a dataset containing 400 instances of class A and 400 instances of class B; the resulting child nodes are evaluated in the table on the Gini Index Example slide]
36
Example
Gini(p) = 1 − Σⱼ pⱼ²
Select the split that decreases the Gini Index most. This is done over all possible
places for a split and all possible variables to split.
37
Gini Index Example
Split by attribute I:
Cases (A, B)   pA, pB       pA², pB²         Gini = 1 − pA² − pB²   Weighted (|node|/|S| × Gini)
(300, 100)     0.75, 0.25   0.5625, 0.0625   0.375                  0.5 × 0.375 = 0.1875
(100, 300)     0.25, 0.75   0.0625, 0.5625   0.375                  0.5 × 0.375 = 0.1875
Total: 0.375

Split by attribute II:
(200, 400)     0.33, 0.67   0.1111, 0.4444   0.4444                 0.75 × 0.4444 = 0.3333
(200, 0)       1, 0         1, 0             0                      0.25 × 0 = 0
Total: 0.3333
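The same comparison in a few lines of Python (node counts from the table above; function names are mine):

```python
def gini(counts):
    """Gini index 1 - sum_j p_j^2 for a node given as class counts."""
    n = sum(counts)
    return 1 - sum((c / n) ** 2 for c in counts)

def weighted_gini(split):
    """Size-weighted Gini index of a candidate split (list of child nodes)."""
    n = sum(sum(node) for node in split)
    return sum(sum(node) / n * gini(node) for node in split)

split_I  = [[300, 100], [100, 300]]   # (A, B) counts per child node
split_II = [[200, 400], [200, 0]]

print(f"{weighted_gini(split_I):.4f}")   # 0.3750
print(f"{weighted_gini(split_II):.4f}")  # 0.3333 -> attribute II is preferred
```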
38
Generate_DT(samples, attribute_list)
• Create a new node N.
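A minimal sketch of the standard top-down construction that Generate_DT follows, assuming nominal attributes and information gain as the splitting criterion (the code and names are my own, not the slide's exact pseudocode):

```python
from collections import Counter
from math import log2

def entropy(rows):
    """Class entropy of a list of rows; the class label is the last element of each row."""
    counts = Counter(r[-1] for r in rows)
    n = len(rows)
    return -sum(c / n * log2(c / n) for c in counts.values())

def generate_dt(samples, attribute_list):
    """Return a nested dict {attribute_index: {value: subtree_or_class_label}}."""
    labels = [r[-1] for r in samples]
    if len(set(labels)) == 1:                 # node is pure -> leaf with that class
        return labels[0]
    if not attribute_list:                    # no attributes left -> majority-class leaf
        return Counter(labels).most_common(1)[0][0]

    def gain(a):
        after = 0.0
        for v in {r[a] for r in samples}:
            subset = [r for r in samples if r[a] == v]
            after += len(subset) / len(samples) * entropy(subset)
        return entropy(samples) - after

    best = max(attribute_list, key=gain)      # splitting attribute with the highest gain
    node = {best: {}}
    for v in {r[best] for r in samples}:
        subset = [r for r in samples if r[best] == v]
        remaining = [a for a in attribute_list if a != best]
        node[best][v] = generate_dt(subset, remaining)
    return node

# Usage, e.g. with the weather data (attribute indices 0..3, class in the last column):
# tree = generate_dt(weather_rows, [0, 1, 2, 3])
```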
39
Time Complexity of Basic Algorithm
• Let m be the number of attributes and n the number of training instances.
• For each level of the tree, all n instances are considered.
  − 𝒪(n log n) work for a single attribute over the entire tree
• Total cost is 𝒪(m · n · log n), since all attributes are eventually considered.
  − without pruning (see next class)
40
Scalability of DT Algorithms
• need to design for large amounts of data
− some new algorithms do not require all the data to be memory resident
41
C4.5 History
The above procedure is the basis for Ross Quinlan's ID3 algorithm (which so far works only for nominal attributes).
• ID3, CHAID – 1960s
The algorithm was improved and is now most widely used as C4.5 or C5.0, available in most DM software packages.
• Commercial successor: C5.0
Witten et al. write “a landmark decision tree program that is probably the machine
learning workhorse most widely used in practice to date”.
42
C4.5 An Industrial-Strength Algorithm
For an algorithm to be useful in a wide range of real-world applications it must:
• permit numeric attributes
• allow missing values
• be robust in the presence of noise
43
How would you deal with missing values in the training data?
44
Outline for today
• Choosing a splitting attribute in decision trees
− Information gain
− Gain ratio
− Gini index
• Numeric attributes
• Missing values
• Relational rules
45
Numeric Attributes
Unlike nominal attributes, numeric attributes have many possible split points.
• Standard method: binary splits
• E.g. temp < 45
Numerical attributes can be used several times in a decision tree, nominal attributes
only once.
46
Example
Split on temperature attribute:
64 65 68 69 70 71 72 72 75 75 80 81 83 85
Yes No Yes Yes Yes No No Yes Yes Yes No Yes Yes No
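A sketch of how the best binary split point for this attribute could be found by scoring the midpoint between each pair of adjacent distinct values (values and labels from the slide; helper names are mine):

```python
from math import log2
from collections import Counter

temps  = [64, 65, 68, 69, 70, 71, 72, 72, 75, 75, 80, 81, 83, 85]
labels = ["Yes", "No", "Yes", "Yes", "Yes", "No", "No",
          "Yes", "Yes", "Yes", "No", "Yes", "Yes", "No"]

def entropy(ls):
    n = len(ls)
    return -sum(c / n * log2(c / n) for c in Counter(ls).values())

def split_gain(threshold):
    """Information gain of the binary split temp < threshold vs. temp >= threshold."""
    left  = [l for t, l in zip(temps, labels) if t < threshold]
    right = [l for t, l in zip(temps, labels) if t >= threshold]
    after = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
    return entropy(labels) - after

# candidate thresholds: midpoints between adjacent distinct values
values = sorted(set(temps))
candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
best = max(candidates, key=split_gain)
print(best, f"{split_gain(best):.3f}")   # best threshold and its gain; here 84.0 with ≈ 0.113 bits
```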
47
Binary Splits on Numeric Attributes
Splitting (multi-way) on a nominal attribute exhausts all information in that attribute.
• nominal attribute is tested (at most) once on any path in the tree
Remedy:
• pre-discretize numeric attributes, or
• allow for multi-way splits instead of binary ones using the Information Gain
criterion
48
Outline for today
• Choosing a splitting attribute in decision trees
− Information gain
− Gain ratio
− Gini index
• Numeric attributes
• Missing values
• Relational rules
49
Handling Missing Values / Training
Ignore instances with missing values.
• pretty harsh, and the missing value might not be important
50
Handling Missing Values / Classification
Follow the leader.
• an instance with a missing value for a tested attribute (temp) is sent down the
branch with the most instances
Temp: < 75 → branch with 5 instances; ≥ 75 → branch with 3 instances
51
Handling Missing Values / Classification
“Partition” the instance.
• branches show # of instances
• Send down parts of the instance (e.g. 3/8 on Windy and 5/8 on Sunny)
proportional to the number of training instances
• Resulting leaf nodes get weighted in the result
[Figure: Outlook node — 5 training instances go down the left branch (to a Temp. subtree) and 3 down the right branch (to a Windy subtree); an instance with a missing Outlook value is sent down both branches with weights 5/8 and 3/8]
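A minimal sketch of this weighting scheme with the 5/8 and 3/8 fractions from the slide; the class distributions returned by the two subtrees are assumed purely for illustration:

```python
# A test instance with a missing Outlook value is sent down both branches,
# weighted by the fraction of training instances in each branch (5/8 and 3/8).
branch_weight = {"Temp.": 5 / 8, "Windy": 3 / 8}
subtree_distribution = {
    "Temp.": {"Yes": 0.8, "No": 0.2},      # assumed for illustration, not from the slide
    "Windy": {"Yes": 1 / 3, "No": 2 / 3},  # assumed for illustration, not from the slide
}

combined = {"Yes": 0.0, "No": 0.0}
for branch, w in branch_weight.items():
    for cls, p in subtree_distribution[branch].items():
        combined[cls] += w * p             # weight each subtree's answer by its share

for cls, p in combined.items():
    print(cls, f"{p:.3f}")   # Yes 0.625, No 0.375 -> predict "Yes"
```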
52
Overfitting
Two sources of abnormalities
• noise (randomness)
• outliers (measurement errors)
53
Decision Trees - Summary
Decision trees are a classification technique.
The output of decision trees can be used for descriptive as well as predictive
purposes.
54
Outline for today
• Choosing a splitting attribute in decision trees
− Information gain
− Gain ratio
− Gini index
• Numeric attributes
• Missing values
• Relational rules
55
The Shapes Problem
Shaded=standing
Unshaded=lying
56
Instances
Width Height Sides Class
2 4 4 Standing
3 6 4 Standing
4 3 4 Lying
7 8 3 Standing
7 6 3 Lying
2 9 4 Standing
9 1 4 Lying
10 2 3 Lying
57
Classification Rules
If width ≥ 3.5 and height < 7.0 then lying.
If height ≥ 3.5 then standing.
These rules classify the given instances well ... but not necessarily new ones.
Problems?
58
Relational Rules
If width > height then lying
If height > width then standing
As a workaround for some cases, one can introduce additional attributes describing, e.g., whether width > height (see the sketch below).
• allows using conventional “propositional” learners
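A tiny sketch of that workaround on the shapes instances above (the tuple layout and names are mine):

```python
# Shapes data: (width, height, sides, class)
shapes = [
    (2, 4, 4, "Standing"), (3, 6, 4, "Standing"), (4, 3, 4, "Lying"),
    (7, 8, 3, "Standing"), (7, 6, 3, "Lying"),    (2, 9, 4, "Standing"),
    (9, 1, 4, "Lying"),    (10, 2, 3, "Lying"),
]

# Derived propositional attribute: is the shape wider than it is tall?
augmented = [(w, h, s, w > h, cls) for (w, h, s, cls) in shapes]

for row in augmented:
    print(row)   # the new boolean column separates Lying from Standing perfectly
```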
59
Propositional Logic
Essentially, decision trees can represent any function in propositional logic.
• A, B, C: propositional variables
• and, or, not, => (implies), <=> (equivalent): connectives
60