DECISION TREE
Dr Amit Kumar Das
INTRODUCTION
The management of a company, UPL Chemical Industries, Ltd., must decide whether to build a small plant or a large one to manufacture a new product with an expected market life of 10 years. The decision hinges on what size the market for the product will be.
Possibly demand will be high during the initial two years but, if many initial users find the product unsatisfactory, will fall to a low level thereafter. Or high initial demand might indicate the possibility of a sustained high-volume market. If demand is high and the company does not expand within the first two years, competitive products will surely be introduced.
IIM Mumbai
INTRODUCTION
If the company builds a big plant, it must live with it whatever the size of market demand. If it builds a small plant, management has the option of expanding the plant in two years in the event that demand is high during the introductory period; in the event that demand is low during the introductory period, the company will maintain operations in the small plant and make a tidy profit on the low volume.
Management is uncertain what to do. The new product, if the market turns out to be large, offers the present management a chance to push the company into a new period of profitable growth. The development department, particularly the development project engineer, is pushing to build the large-scale plant to exploit the first major product development the department has produced in some years.
INTRODUCTION
Decision tree for the plant-size decision (key: decision point, chance event)

Decision point 1
  BUILD BIG PLANT (investment: $3 million)
    High average demand, probability 0.60: $10 million
    High initial, low subsequent demand, probability 0.10: $2.8 million
    Low average demand, probability 0.30: $1 million
  BUILD SMALL PLANT (investment: $1.3 million)
    High initial demand (2 yrs), probability 0.70: $3.6 million
      $0.9 million over 2 years ($450,000/yr), then Decision 2 with a position value of $2.7 million (8 years)
    Low initial demand, probability 0.30: $4 million
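The figures on the tree can be rolled back into an expected value for each alternative. The sketch below is one way to do that arithmetic, under two assumptions that are not stated on the slide: payoffs are taken undiscounted, and the $3.6 million on the high-initial-demand branch is read as the $0.9 million earned in the first two years plus the $2.7 million position value of Decision 2. The function and variable names are illustrative only.

```python
# Expected-value rollback for Decision point 1 (all amounts in $ million).
# Assumption: payoffs are undiscounted, as read from the tree.

def expected_value(branches):
    """Probability-weighted payoff of a chance event: sum of p * payoff."""
    total_p = sum(p for p, _ in branches)
    assert abs(total_p - 1.0) < 1e-9, "branch probabilities must sum to 1"
    return sum(p * payoff for p, payoff in branches)

# (probability, payoff) pairs for each chance event
big_plant = [(0.60, 10.0), (0.10, 2.8), (0.30, 1.0)]
small_plant = [(0.70, 3.6), (0.30, 4.0)]  # 3.6 read as 0.9 + 2.7

# Net expected value = expected payoff minus the initial investment
ev_big = expected_value(big_plant) - 3.0
ev_small = expected_value(small_plant) - 1.3

print(f"Big plant:   {ev_big:.2f}")    # 3.58
print(f"Small plant: {ev_small:.2f}")  # 2.42
```

On these undiscounted numbers the big plant has the higher expected value; discounting the cash flows could change the comparison.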
"Nothing is particularly hard if you divide it into small jobs." - Henry Ford
INTRODUCTION
DECISION TREES:
It is one of the supervised learning algorithms used for predicting both the discrete and the continuous dependent variable.
In decision tree learning, when the response variable (target variable) takes discrete values, the decision trees are called classification trees. When the response variable takes continuous values, the decision trees are called regression trees.
Decision trees are effective for solving classification problems in which the response variable takes discrete values.
DECISION TREE
[Figure: structure of a decision tree, showing the root node at the top, decision nodes below it, leaf nodes at the ends of the branches, and a sub-tree marked within the tree.]
CLASSIFICATION AND REGRESSION TREE
Classification and Regression Tree (CART) is a common terminology that is used for a Classification Tree (used when the dependent variable is discrete) and a Regression Tree (used when the dependent variable is continuous).
Classification Tree uses various impurity measures such as the Gini Impurity Index and Entropy to split the nodes.
Regression Tree, on the other hand, splits the node that minimizes the Sum of Squared Errors.
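To make the regression-tree criterion concrete, here is a minimal sketch of choosing a split threshold on a single numeric feature by minimizing the Sum of Squared Errors of the two child nodes. The data values and the names `sse` and `best_split` are hypothetical and chosen only for illustration; they do not come from the slides.

```python
# Sketch: pick the threshold on one numeric feature that minimizes the
# total Sum of Squared Errors (SSE) of the two resulting child nodes.

def sse(values):
    """Sum of squared deviations from the mean (0 for an empty node)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split(xs, ys):
    """Return (threshold, total_sse) for the best binary split x <= threshold."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal feature values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        total = sse(left) + sse(right)
        if best is None or total < best[1]:
            best = (threshold, total)
    return best

# Hypothetical data: two clearly separated groups of responses
xs = [1, 2, 3, 10, 11, 12]
ys = [5.0, 6.0, 7.0, 20.0, 21.0, 22.0]

threshold, total = best_split(xs, ys)
print(threshold, total)  # 6.5 4.0
```

The candidate thresholds are midpoints between consecutive sorted feature values, which is a common convention rather than the only possible one.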
TRAINING DATA EXAMPLE: GOAL IS TO PREDICT WHEN THIS PLAYER WILL PLAY TENNIS?
PlayTennis: training examples

Day  Outlook   Temperature  Humidity  Wind    PlayTennis
D1   Sunny     Hot          High      Weak    No
D2   Sunny     Hot          High      Strong  No
D3   Overcast  Hot          High      Weak    Yes
D4   Rain      Mild         High      Weak    Yes
D5   Rain      Cool         Normal    Weak    Yes
D6   Rain      Cool         Normal    Strong  No
D7   Overcast  Cool         Normal    Strong  Yes
D8   Sunny     Mild         High      Weak    No
D9   Sunny     Cool         Normal    Weak    Yes
D10  Rain      Mild         Normal    Weak    Yes
D11  Sunny     Mild         Normal    Strong  Yes
D12  Overcast  Mild         High      Strong  Yes
D13  Overcast  Hot          Normal    Weak    Yes
D14  Rain      Mild         High      Strong  No
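[Figure: decision tree for PlayTennis with Outlook at the root. The second-level tests are Humidity (High: No, Normal: Yes) and Wind (Strong: No, Weak: Yes).]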
Outlook is a nominal feature. It can be sunny, overcast or rain. Summarizing the final decisions for the outlook feature:

Outlook   Yes  No  Number of instances
Sunny     2    3   5
Overcast  4    0   4
Rain      3    2   5

Humidity  Yes  No  Number of instances
High      3    4   7
Normal    6    1   7
Wind is a binary feature, like humidity. It can be weak or strong.

Wind    Yes  No  Number of instances
Weak    6    2   8
Strong  3    3   6
Feature      Gini index
Outlook      0.342
Temperature  0.439
Humidity     0.367
Wind         0.428
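The per-feature scores can be reproduced from the PlayTennis training examples. The sketch below computes the weighted Gini index of each feature in plain Python; the names `gini`, `weighted_gini`, `ROWS` and `FEATURES` are illustrative, not part of the slides. Because it rounds only at the end, the third decimal can differ slightly from the slide values (for example 0.440 rather than 0.439 for Temperature, and 0.429 rather than 0.428 for Wind).

```python
from collections import Counter, defaultdict

FEATURES = ("Outlook", "Temperature", "Humidity", "Wind")

# PlayTennis training examples D1..D14; the last field is the decision
ROWS = [
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Strong", "No"),
]

def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum of squared proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def weighted_gini(rows, feature_index):
    """Gini index of a feature: impurity of each value's subset, weighted by size."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[feature_index]].append(row[-1])
    n = len(rows)
    return sum(len(g) / n * gini(g) for g in groups.values())

scores = {name: weighted_gini(ROWS, i) for i, name in enumerate(FEATURES)}
for name, score in scores.items():
    print(f"{name:12s}{score:.3f}")

best = min(scores, key=scores.get)
print("Lowest Gini index:", best)  # Outlook
```

The feature with the lowest weighted Gini index is chosen for the split, which is why Outlook ends up at the root of the tree.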
Outlook: Sunny | Overcast | Rain

Sunny branch:
Day  Outlook  Temp.  Humidity  Wind    Decision
1    Sunny    Hot    High      Weak    No
2    Sunny    Hot    High      Strong  No
8    Sunny    Mild   High      Weak    No
9    Sunny    Cool   Normal    Weak    Yes
11   Sunny    Mild   Normal    Strong  Yes

Rain branch:
Day  Outlook  Temp.  Humidity  Wind    Decision
4    Rain     Mild   High      Weak    Yes
5    Rain     Cool   Normal    Weak    Yes
6    Rain     Cool   Normal    Strong  No
10   Rain     Mild   Normal    Weak    Yes
14   Rain     Mild   High      Strong  No
Focus on the sub dataset for sunny outlook. We need to find the Gini index scores for the temperature, humidity and wind features respectively.

Day  Outlook  Temp.  Humidity  Wind    Decision
1    Sunny    Hot    High      Weak    No
2    Sunny    Hot    High      Strong  No
8    Sunny    Mild   High      Weak    No
9    Sunny    Cool   Normal    Weak    Yes
11   Sunny    Mild   Normal    Strong  Yes
Gini of temperature for sunny outlook
Gini(Outlook=Sunny and Temp.=Hot) = 1 - (0/2)² - (2/2)² = 0
Gini(Outlook=Sunny and Temp.=Cool) = 1 - (1/1)² - (0/1)² = 0
Gini(Outlook=Sunny and Temp.=Mild) = 1 - (1/2)² - (1/2)² = 1 - 0.25 - 0.25 = 0.5
Gini(Outlook=Sunny and Temp.) = (2/5)×0 + (1/5)×0 + (2/5)×0.5 = 0.2
Gini of humidity for sunny outlook
Gini(Outlook=Sunny and Humidity=High) = 1 - (0/3)² - (3/3)² = 0
Gini(Outlook=Sunny and Humidity=Normal) = 1 - (2/2)² - (0/2)² = 0
Gini(Outlook=Sunny and Humidity) = (3/5)×0 + (2/5)×0 = 0
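Gini of wind for sunny outlook
Gini(Outlook=Sunny and Wind=Weak) = 1 - (1/3)² - (2/3)² = 0.444
Gini(Outlook=Sunny and Wind=Strong) = 1 - (1/2)² - (1/2)² = 0.5
Gini(Outlook=Sunny and Wind) = (3/5)×0.444 + (2/5)×0.5 = 0.466

SUNNY OUTLOOK
Gini index scores for each feature when outlook is sunny:

Feature      Gini index
Temperature  0.2
Humidity     0
Wind         0.466

The winner is humidity because it has the lowest value.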
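The sunny-outlook comparison can be checked directly from the Yes/No counts in the sub dataset. This is a small sketch working from counts rather than raw rows; the helper names `gini_from_counts` and `weighted_gini` are hypothetical, and the counts are tallied from the five sunny days (1, 2, 8, 9, 11).

```python
# Sketch: Gini index per feature for the Outlook = Sunny sub dataset,
# computed from (yes, no) counts for each feature value.

def gini_from_counts(yes, no):
    """Gini impurity of a node holding `yes` positive and `no` negative cases."""
    n = yes + no
    return 1.0 - (yes / n) ** 2 - (no / n) ** 2

def weighted_gini(value_counts):
    """Size-weighted average of the impurities of a feature's values."""
    total = sum(yes + no for yes, no in value_counts.values())
    return sum(
        (yes + no) / total * gini_from_counts(yes, no)
        for yes, no in value_counts.values()
    )

# (yes, no) counts tallied from days 1, 2, 8, 9 and 11
sunny = {
    "Temperature": {"Hot": (0, 2), "Cool": (1, 0), "Mild": (1, 1)},
    "Humidity": {"High": (0, 3), "Normal": (2, 0)},
    "Wind": {"Weak": (1, 2), "Strong": (1, 1)},
}

sunny_scores = {feature: weighted_gini(c) for feature, c in sunny.items()}
winner = min(sunny_scores, key=sunny_scores.get)

for feature, score in sunny_scores.items():
    print(f"{feature:12s}{score:.3f}")
print("Winner:", winner)  # Humidity
```

A score of 0 for humidity means both of its child nodes are pure, so no further split is needed below them.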
Outlook = Sunny, split on Humidity

Humidity = High:
Day  Outlook  Temp.  Humidity  Wind    Decision
1    Sunny    Hot    High      Weak    No
2    Sunny    Hot    High      Strong  No
8    Sunny    Mild   High      Weak    No

Humidity = Normal:
Day  Outlook  Temp.  Humidity  Wind    Decision
9    Sunny    Cool   Normal    Weak    Yes
11   Sunny    Mild   Normal    Strong  Yes
Outlook
  Sunny: Humidity
    High: No
    Normal: Yes
  Overcast: Yes
  Rain: sub dataset below

Day  Outlook  Temp.  Humidity  Wind    Decision
4    Rain     Mild   High      Weak    Yes
5    Rain     Cool   Normal    Weak    Yes
6    Rain     Cool   Normal    Strong  No
10   Rain     Mild   Normal    Weak    Yes
14   Rain     Mild   High      Strong  No
[Figure: decision tree with Outlook at the root and branches Sunny, Overcast and Rain.]
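The whole walkthrough (score every feature, split on the lowest Gini index, repeat on each branch) can be expressed as a short recursive procedure. The sketch below follows the multiway splits used in the slides; note that CART implementations typically use binary splits instead, so this is an illustration of the slide procedure rather than a reference CART implementation. All names are illustrative.

```python
from collections import Counter, defaultdict

FEATURES = ("Outlook", "Temperature", "Humidity", "Wind")

# PlayTennis training examples D1..D14; the last field is the decision
ROWS = [
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Strong", "No"),
]

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def split(rows, index):
    """Group rows by the value of the feature at position `index`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[index]].append(row)
    return groups

def weighted_gini(rows, index):
    """Size-weighted impurity of the subsets produced by splitting on a feature."""
    return sum(
        len(g) / len(rows) * gini([r[-1] for r in g])
        for g in split(rows, index).values()
    )

def build_tree(rows, feature_indexes):
    """Return a leaf label, or {feature: {value: subtree}} for an internal node."""
    labels = [r[-1] for r in rows]
    if len(set(labels)) == 1 or not feature_indexes:
        return Counter(labels).most_common(1)[0][0]  # pure node or majority vote
    best = min(feature_indexes, key=lambda i: weighted_gini(rows, i))
    rest = [i for i in feature_indexes if i != best]
    return {
        FEATURES[best]: {
            value: build_tree(subset, rest)
            for value, subset in split(rows, best).items()
        }
    }

tree = build_tree(ROWS, list(range(len(FEATURES))))
print(tree)
```

On this data the result has Outlook at the root, Humidity under Sunny, a Yes leaf under Overcast, and Wind under Rain.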
GINI IMPURITY
GI(t) = 1 - Σᵢ [P(Ci|t)]²
where
GI(t) = Gini index at node t
P(Ci|t) = Proportion of observations belonging to class Ci in node t
The sum runs over all classes Ci.
The lower the Gini Impurity, the higher the homogeneity of the node. The Gini Impurity of a pure node is zero.
ENTROPY & INFORMATION GAIN