
Decision Tree

[Figure: Example of a decision tree]
The tree has three types of nodes:
• A root node, which has no incoming edges and zero or more outgoing edges.
• Internal nodes, each of which has exactly one incoming edge and two or more outgoing edges.
• Leaf (terminal) nodes, each of which has exactly one incoming edge and no outgoing edges.
Representation of a decision tree
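As a concrete illustration of this representation, here is a minimal Python sketch (the class and function names are illustrative, not from the original slides): internal nodes hold the attribute they test plus one branch per attribute value, and leaves hold a class label.

```python
from dataclasses import dataclass, field
from typing import Dict, Union

@dataclass
class Leaf:
    label: str  # class label, e.g. "Up" or "Down"

@dataclass
class Node:
    attribute: str  # attribute tested at this internal node
    branches: Dict[str, Union["Node", Leaf]] = field(default_factory=dict)

def classify(tree: Union[Node, Leaf], record: Dict[str, str]) -> str:
    """Walk from the root to a leaf, following the branch that
    matches the record's value for each tested attribute."""
    while isinstance(tree, Node):
        tree = tree.branches[record[tree.attribute]]
    return tree.label

# Example: a tree of the shape these notes later derive from the Profit dataset.
tree = Node("Age", {
    "old": Leaf("Down"),
    "mid": Node("Competition", {"yes": Leaf("Down"), "no": Leaf("Up")}),
    "new": Leaf("Up"),
})
print(classify(tree, {"Age": "mid", "Competition": "no"}))  # -> Up
```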

Binning
• Binning, or discretization, is the process of transforming numerical variables into categorical counterparts.
• An example is to bin values for Age into categories such as 20-39, 40-59, and 60-79.
• Numerical variables are usually discretized for modeling methods based on frequency tables (e.g., decision trees).
• Equal-width binning divides the range from A (minimum) to B (maximum) into k bins of width w = (B - A) / k:
[min, min+w-1]
[min+w, min+2w-1]
[min+2w, min+3w-1]
[min+3w, max]
• Example: for the values 2, 6, 7, 9, 13, 20, 21, 25, 30 with k = 4, w = (30 - 2) / 4 = 7, giving bins [2, 8], [9, 15], [16, 22], and [23, 30].
• A second set of values to bin the same way: 10, 15, 16, 18, 20, 30, 35, 42, 48, 50, 52, 55 (again with k = 4). A code sketch follows this list.
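A minimal Python sketch of this equal-width scheme (the function name and the integer-width rounding are illustrative choices, not from the original slides):

```python
def equal_width_bins(values, k):
    """Split the range [min, max] into k bins of width w = (B - A) / k.
    Integer division mirrors the [min, min+w-1], ... scheme above;
    the last bin is stretched to include the maximum."""
    lo, hi = min(values), max(values)
    w = (hi - lo) // k  # integer width, an illustrative simplification
    bins = [(lo + i * w, lo + (i + 1) * w - 1) for i in range(k - 1)]
    bins.append((lo + (k - 1) * w, hi))  # final bin: [min + (k-1)w, max]
    return bins

values = [2, 6, 7, 9, 13, 20, 21, 25, 30]
print(equal_width_bins(values, 4))  # [(2, 8), (9, 15), (16, 22), (23, 30)]
```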
When to use a decision tree?
• Conjunction (AND, ∧): each root-to-leaf path combines (ANDs) the attribute tests along it.
• Disjunction (OR, ∨): the tree as a whole selects (ORs) among its paths.
Decision trees therefore fit targets expressible as disjunctions of conjunctions of attribute tests, as in the sketch below.
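The sketch below illustrates the point in Python for the Profit dataset that follows: each root-to-leaf path is one conjunction, and the prediction for a class is the disjunction of its paths (the function name is illustrative, and the rule shown matches the tree these notes derive from the dataset):

```python
def profit_is_up(age: str, competition: str) -> bool:
    # Path 1: Age = new                       -> Up
    # Path 2: Age = mid AND Competition = no  -> Up
    # The 'or' is the disjunction over all paths that predict "Up".
    return age == "new" or (age == "mid" and competition == "no")
```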
Dataset

Age   Competition   Type       Profit
old   yes           software   Down
old   no            software   Down
old   no            hardware   Down
mid   yes           software   Down
mid   yes           hardware   Down
mid   no            hardware   Up
mid   no            software   Up
new   yes           software   Up
new   no            hardware   Up
new   no            software   Up
Measures for Selecting the Best Split
• An attribute selection measure provides a ranking for each attribute describing the given training tuples.
• The attribute having the best score for the measure (maximum or minimum, depending on the measure) is chosen as the splitting attribute for the given tuples.
• The measures developed for selecting the best split are often based on the degree of impurity of the child nodes.
• The chosen attribute minimizes the information needed to classify the tuples in the resulting partitions and reflects the least randomness, or "impurity", in these partitions.
• The expected information needed to classify a tuple in D is
Info(D) = - Σ (i=1..m) pi log2(pi)
where pi is the nonzero probability that an arbitrary tuple in D belongs to class Ci.
• Info(D) is also known as the entropy of D.
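A minimal Python sketch of this entropy computation (the helper name info is an illustrative choice):

```python
from collections import Counter
from math import log2

def info(labels):
    """Info(D) = -sum(p_i * log2(p_i)) over the class probabilities."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

# The Profit column of the dataset above: 5 "Down", 5 "Up".
profit = ["Down"] * 5 + ["Up"] * 5
print(info(profit))  # 1.0 -- maximal impurity for two balanced classes
```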
Class counts for each attribute value:

Age   Up   Down
old   0    3
mid   2    2
new   3    0

Competition   Up   Down
yes           1    3
no            4    2

Type       Up   Down
hardware   2    2
software   3    3

Tuples with Age = mid:

Age   Competition   Type       Profit
mid   yes           software   Down
mid   yes           hardware   Down
mid   no            hardware   Up
mid   no            software   Up
During the late 1970s and early 1980s, J. Ross Quinlan, a researcher in
machine learning, developed a decision tree algorithm known as ID3
(Iterative Dichotomiser).
Information Gain
• ID3 uses information gain as its attribute selection measure: the attribute with the highest gain becomes the splitting attribute.
• Gain(A) = Info(D) - InfoA(D), where
InfoA(D) = Σ (j=1..v) (|Dj| / |D|) × Info(Dj)
is the expected information still needed after partitioning D on attribute A into D1, ..., Dv.
Dataset

Age   Competition   Type       Profit
old   yes           software   Down
old   no            software   Down
old   no            hardware   Down
mid   yes           software   Down
mid   yes           hardware   Down
mid   no            hardware   Up
mid   no            software   Up
new   yes           software   Up
new   no            hardware   Up
new   no            software   Up
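A self-contained Python sketch that computes these gains for the dataset above (the column indices and function names are illustrative):

```python
from collections import Counter
from math import log2

def info(labels):
    """Info(D) = -sum(p_i * log2(p_i)) over the class probabilities."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

# (age, competition, type, profit) tuples from the dataset above.
data = [
    ("old", "yes", "software", "Down"), ("old", "no", "software", "Down"),
    ("old", "no", "hardware", "Down"), ("mid", "yes", "software", "Down"),
    ("mid", "yes", "hardware", "Down"), ("mid", "no", "hardware", "Up"),
    ("mid", "no", "software", "Up"),   ("new", "yes", "software", "Up"),
    ("new", "no", "hardware", "Up"),   ("new", "no", "software", "Up"),
]

def gain(data, col):
    """Gain(A) = Info(D) - sum(|Dj|/|D| * Info(Dj)) over A's values."""
    labels = [row[-1] for row in data]
    partitions = Counter(row[col] for row in data)
    info_a = sum(n / len(data) * info([r[-1] for r in data if r[col] == v])
                 for v, n in partitions.items())
    return info(labels) - info_a

for name, col in [("Age", 0), ("Competition", 1), ("Type", 2)]:
    print(name, round(gain(data, col), 4))
# Age 0.6, Competition 0.1245, Type 0.0 -> Age is the splitting attribute.
```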
Gini Index / Gini Impurity
• The Gini index, used in CART, measures the impurity of D:
Gini(D) = 1 - Σ (i=1..m) pi²
• CART considers a binary split on each attribute; for a split of D into D1 and D2:
GiniA(D) = (|D1| / |D|) Gini(D1) + (|D2| / |D|) Gini(D2)
• The attribute (and grouping) that minimizes the Gini index is chosen as the splitting attribute.

Class counts for each attribute value:

Age   Up   Down
old   0    3
mid   2    2
new   3    0

Competition   Up   Down
yes           1    3
no            4    2

Type       Up   Down
hardware   2    2
software   3    3
Gini competition (D) = (4/10)(1 - (1/4)² - (3/4)²) + (6/10)(1 - (4/6)² - (2/6)²) ≈ 0.417

Gini type (D) = (4/10)(1 - (2/4)² - (2/4)²) + (6/10)(1 - (3/6)² - (3/6)²) = 0.5

Gini age (D) = the minimum over Age's candidate binary groupings, computed below.


For the three-valued attribute Age, the candidate subsets of {old, mid, new} are:
{old, mid, new}, {old}, {mid}, {new}, {old, new}, {mid, new}, {old, mid}, {}
The full set and the empty set do not define a split, leaving three distinct binary groupings: {old} vs {mid, new}, {mid} vs {old, new}, and {new} vs {old, mid}.
Gini age={old, new} (D) = (6/10)(1 - (3/6)² - (3/6)²) + (4/10)(1 - (2/4)² - (2/4)²) = 0.6 × 0.5 + 0.4 × 0.5 = 0.5
Gini age={old, mid} (D) = (7/10)(1 - (2/7)² - (5/7)²) + (3/10)(1 - (3/3)² - (0/3)²) = 0.7 × 0.4082 + 0.3 × 0 ≈ 0.286
Gini age={mid, new} (D) = (7/10)(1 - (5/7)² - (2/7)²) + (3/10)(1 - (0/3)² - (3/3)²) = 0.7 × 0.4082 + 0.3 × 0 ≈ 0.286
The groupings {old, mid} vs {new} and {mid, new} vs {old} tie for the lowest Gini index (≈ 0.286), which also beats Gini competition (D) ≈ 0.417 and Gini type (D) = 0.5, so Age is selected as the splitting attribute.
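A self-contained Python sketch evaluating these binary Gini splits (the function and variable names are illustrative):

```python
# (age, profit) pairs from the dataset above.
rows = [("old", "Down")] * 3 + [("mid", "Down")] * 2 + \
       [("mid", "Up")] * 2 + [("new", "Up")] * 3

def gini(labels):
    """Gini(D) = 1 - sum(p_i^2) over the class probabilities."""
    n = len(labels)
    return 1 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def gini_split(rows, subset):
    """Weighted Gini for the binary split: value in `subset` vs not."""
    d1 = [p for a, p in rows if a in subset]
    d2 = [p for a, p in rows if a not in subset]
    n = len(rows)
    return len(d1) / n * gini(d1) + len(d2) / n * gini(d2)

for subset in [{"old", "new"}, {"old", "mid"}, {"mid", "new"}]:
    print(sorted(subset), round(gini_split(rows, subset), 4))
# ['new', 'old'] 0.5, ['mid', 'old'] 0.2857, ['mid', 'new'] 0.2857
```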
