Decision Tree


DECISION TREE
Dr Amit Kumar Das
IIM Mumbai
INTRODUCTION

The management of UPL Chemical Industries, Ltd., must decide whether to build a small plant or a large one to manufacture a new product with an expected market life of 10 years. The decision hinges on what size the market for the product will be.

Possibly demand will be high during the initial two years but, if many initial users find the product unsatisfactory, demand will fall to a low level thereafter. Or high initial demand might indicate the possibility of a sustained high-volume market. If demand is high and the company does not expand within the first two years, competitive products will surely be introduced.
INTRODUCTION

If the company builds a big plant, it must live with it whatever the size of market demand. If it builds a small plant, management has the option of expanding the plant in two years in the event that demand is high during the introductory period; while in the event that demand is low during the introductory period, the company will maintain operations in the small plant and make a tidy profit on the low volume.

Management is uncertain what to do. The new product, if the market turns out to be large, offers the present management a chance to push the company into a new period of profitable growth. The development department, particularly the development project engineer, is pushing to build the large-scale plant to exploit the first major product development the department has produced in some years.
INTRODUCTION

The chairman, a principal stockholder, is wary of the possibility of large unneeded plant capacity. He favors a smaller plant commitment, but recognizes that later expansion to meet high-volume demand would require more investment and be less efficient to operate. The chairman also recognizes that unless the company moves promptly to fill the demand which develops, competitors will be tempted to move in with equivalent products.

[Figure: the decision tree for the plant decision. Key: square = decision point, circle = chance event.

Decision point 1:
- BUILD BIG PLANT (investment: $3 million). Chance events: high average demand (probability 0.60); high initial, low subsequent demand (probability 0.10); low average demand (probability 0.30).
- BUILD SMALL PLANT (investment: $1.3 million). Chance events: high initial demand over 2 years (probability 0.70), leading to Decision point 2 (position value: $3.6 million); low initial demand (probability 0.30).

Payoff labels shown in the figure: $10 million; $2.8 million; $1 million; $2.7 million; $0.9 million; $450,000/yr; $4 million; over 2-year and 8-year periods.]
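A chance node's value is the probability-weighted average of its branch payoffs, and a decision node takes the best branch net of investment. A minimal sketch of that rollback for Decision point 1, using the probabilities and investments from the figure but hypothetical total yields (the figure's payoff labels are not unambiguously attached to branches here):

```python
# Expected-value rollback for decision point 1 of the plant-size tree.
# Probabilities and investments come from the figure; the total yields
# below are HYPOTHETICAL placeholders, not the figure's actual payoffs.

def expected_value(outcomes):
    """outcomes: list of (probability, total payoff in $ millions) pairs."""
    return sum(p * payoff for p, payoff in outcomes)

# Big plant: investment $3 million (hypothetical yields over the horizon).
big_plant = expected_value([
    (0.60, 10.0),   # high average demand
    (0.10, 2.8),    # high initial, low subsequent demand
    (0.30, 1.0),    # low average demand
]) - 3.0

# Small plant: investment $1.3 million (hypothetical yields).
small_plant = expected_value([
    (0.70, 3.6),    # high initial demand, leading to decision point 2
    (0.30, 1.2),    # low initial demand
]) - 1.3

best = "big plant" if big_plant > small_plant else "small plant"
print(round(big_plant, 2), round(small_plant, 2), best)  # -> 3.58 1.58 big plant
```

With these placeholder payoffs the big plant wins; the point of the tree is that the ranking can flip as the probabilities and payoffs change.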


The decision tree can clarify for management, as can no other analytical tool that I know of, the choices, risks, objectives, monetary gains, and information needs involved in an investment problem. We shall be hearing a great deal about decision trees in the years ahead. Although a novelty to most business people today, they will surely be in common management parlance before many more years have passed.
"Nothing is particularly hard if you divide it into small jobs." – Henry Ford
DECISION TREES: INTRODUCTION

Decision trees are one of the effective supervised learning algorithms used for predicting the response (dependent) variable; they can solve both classification and regression problems. When the response variable (target variable) takes discrete values, the decision trees are called classification trees. When the response variable takes continuous values, the decision trees are called regression trees.

DECISION TREE

[Figure: structure of a decision tree. The root decision node splits into further decision nodes and leaf nodes; a decision node together with its descendant nodes forms a sub-tree, and every path through the tree ends in a leaf node.]
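The structure in the diagram can be rendered as two node types, a decision node that routes on a feature value and a leaf node that stores a prediction. A minimal, hypothetical Python sketch (the example tree below is illustrative, not from the slides):

```python
# Two node types for a decision tree: a decision node tests one feature
# and routes to a child sub-tree; a leaf node holds the final prediction.
from dataclasses import dataclass

@dataclass
class LeafNode:
    prediction: str

@dataclass
class DecisionNode:
    feature: str
    children: dict  # maps a feature value to a child node (sub-tree or leaf)

    def predict(self, row):
        # Route on this node's feature, then recurse until a leaf is reached.
        child = self.children[row[self.feature]]
        return child.prediction if isinstance(child, LeafNode) else child.predict(row)

# A root decision node with one leaf child and one sub-tree child.
root = DecisionNode("Outlook", {
    "Overcast": LeafNode("Yes"),
    "Sunny": DecisionNode("Humidity", {"High": LeafNode("No"),
                                       "Normal": LeafNode("Yes")}),
})
print(root.predict({"Outlook": "Sunny", "Humidity": "Normal"}))  # -> Yes
```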

CLASSIFICATION AND REGRESSION TREE

Classification and Regression Tree (CART) is a common terminology that is used for a Classification Tree (used when the dependent variable is discrete) and a Regression Tree (used when the dependent variable is continuous).

A classification tree uses various impurity measures, such as the Gini Impurity Index and Entropy, to split the nodes. A regression tree, on the other hand, splits a node in the way that minimizes the Sum of Squared Errors.
EXAMPLE: GOAL IS TO PREDICT WHEN THIS PLAYER WILL PLAY TENNIS

TRAINING DATA (PlayTennis training examples)

Day  Outlook   Temperature  Humidity  Wind    PlayTennis
D1   Sunny     Hot          High      Weak    No
D2   Sunny     Hot          High      Strong  No
D3   Overcast  Hot          High      Weak    Yes
D4   Rain      Mild         High      Weak    Yes
D5   Rain      Cool         Normal    Weak    Yes
D6   Rain      Cool         Normal    Strong  No
D7   Overcast  Cool         Normal    Strong  Yes
D8   Sunny     Mild         High      Weak    No
D9   Sunny     Cool         Normal    Weak    Yes
D10  Rain      Mild         Normal    Weak    Yes
D11  Sunny     Mild         Normal    Strong  Yes
D12  Overcast  Mild         High      Strong  Yes
D13  Overcast  Hot          Normal    Weak    Yes
D14  Rain      Mild         High      Strong  No

[Figure: the decision tree learned from the data.

Outlook?
- Sunny → Humidity? (High → No; Normal → Yes)
- Overcast → Yes
- Rain → Wind? (Strong → No; Weak → Yes)]
Outlook is a nominal feature. It can be sunny, overcast or rain. Summarizing the final decisions for the outlook feature:

Outlook   Yes  No  Number of instances
Sunny     2    3   5
Overcast  4    0   4
Rain      3    2   5

Gini(Outlook=Sunny) = 1 − (2/5)² − (3/5)² = 1 − 0.16 − 0.36 = 0.48
Gini(Outlook=Overcast) = 1 − (4/4)² − (0/4)² = 0
Gini(Outlook=Rain) = 1 − (3/5)² − (2/5)² = 1 − 0.36 − 0.16 = 0.48

Then, we calculate the weighted sum of Gini indexes for the outlook feature:
Gini(Outlook) = (5/14) × 0.48 + (4/14) × 0 + (5/14) × 0.48 = 0.171 + 0 + 0.171 = 0.342
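The calculation above can be checked with a short script (a sketch; the data set is hard-coded from the training table, and the slide's 0.342 comes from truncating the two 0.1714… terms):

```python
# Gini index of one feature value, and the weighted Gini of a feature,
# on the (Outlook, PlayTennis) pairs of the 14 training examples.
from collections import Counter

data = [("Sunny", "No"), ("Sunny", "No"), ("Overcast", "Yes"), ("Rain", "Yes"),
        ("Rain", "Yes"), ("Rain", "No"), ("Overcast", "Yes"), ("Sunny", "No"),
        ("Sunny", "Yes"), ("Rain", "Yes"), ("Sunny", "Yes"), ("Overcast", "Yes"),
        ("Overcast", "Yes"), ("Rain", "No")]

def gini(labels):
    """Gini index of a list of class labels: 1 - sum of squared proportions."""
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

def weighted_gini(pairs):
    """Weighted sum of Gini indexes over the values of one feature."""
    n = len(pairs)
    total = 0.0
    for value in {v for v, _ in pairs}:
        subset = [label for v, label in pairs if v == value]
        total += len(subset) / n * gini(subset)
    return total

print(round(gini([y for v, y in data if v == "Sunny"]), 2))  # -> 0.48
print(round(weighted_gini(data), 3))  # -> 0.343 (slide truncates to 0.342)
```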
Similarly, temperature is a nominal feature and it could have 3 different values: Cool, Hot and Mild. Let's summarize the decisions for the temperature feature:

Temperature  Yes  No  Number of instances
Hot          2    2   4
Cool         3    1   4
Mild         4    2   6

Gini(Temp=Hot) = 1 − (2/4)² − (2/4)² = 0.5
Gini(Temp=Cool) = 1 − (3/4)² − (1/4)² = 1 − 0.5625 − 0.0625 = 0.375
Gini(Temp=Mild) = 1 − (4/6)² − (2/6)² = 1 − 0.444 − 0.111 = 0.445

We'll calculate the weighted sum of Gini indexes for the temperature feature:
Gini(Temp) = (4/14) × 0.5 + (4/14) × 0.375 + (6/14) × 0.445 = 0.142 + 0.107 + 0.190 = 0.439
Humidity is a binary class feature. It can be high or normal.

Humidity  Yes  No  Number of instances
High      3    4   7
Normal    6    1   7

Gini(Humidity=High) = 1 − (3/7)² − (4/7)² = 1 − 0.183 − 0.326 = 0.489
Gini(Humidity=Normal) = 1 − (6/7)² − (1/7)² = 1 − 0.734 − 0.020 = 0.244

The weighted sum for the humidity feature:
Gini(Humidity) = (7/14) × 0.489 + (7/14) × 0.244 = 0.367
Wind is a binary class feature similar to humidity. It can be weak or strong.

Wind    Yes  No  Number of instances
Weak    6    2   8
Strong  3    3   6

Gini(Wind=Weak) = 1 − (6/8)² − (2/8)² = 1 − 0.5625 − 0.0625 = 0.375
Gini(Wind=Strong) = 1 − (3/6)² − (3/6)² = 1 − 0.25 − 0.25 = 0.5

Gini(Wind) = (8/14) × 0.375 + (6/14) × 0.5 = 0.428

Feature      Gini index
Outlook      0.342
Temperature  0.439
Humidity     0.367
Wind         0.428

The winner is outlook, because it has the lowest Gini index; it becomes the root node of the tree.

We split the data on outlook (Sunny / Overcast / Rain):

Sunny sub-dataset:
Day  Outlook  Temp.  Humidity  Wind    Decision
1    Sunny    Hot    High      Weak    No
2    Sunny    Hot    High      Strong  No
8    Sunny    Mild   High      Weak    No
9    Sunny    Cool   Normal    Weak    Yes
11   Sunny    Mild   Normal    Strong  Yes

Rain sub-dataset:
Day  Outlook  Temp.  Humidity  Wind    Decision
4    Rain     Mild   High      Weak    Yes
5    Rain     Cool   Normal    Weak    Yes
6    Rain     Cool   Normal    Strong  No
10   Rain     Mild   Normal    Weak    Yes
14   Rain     Mild   High      Strong  No

Overcast sub-dataset (all Yes):
Day  Outlook   Temp.  Humidity  Wind    Decision
3    Overcast  Hot    High      Weak    Yes
7    Overcast  Cool   Normal    Strong  Yes
12   Overcast  Mild   High      Strong  Yes
13   Overcast  Hot    Normal    Weak    Yes

Focus on the sub-dataset for the sunny outlook. We need to find the Gini index scores for the temperature, humidity and wind features respectively.

Day  Outlook  Temp.  Humidity  Wind    Decision
1    Sunny    Hot    High      Weak    No
2    Sunny    Hot    High      Strong  No
8    Sunny    Mild   High      Weak    No
9    Sunny    Cool   Normal    Weak    Yes
11   Sunny    Mild   Normal    Strong  Yes
Gini of temperature for sunny outlook

Temperature  Yes  No  Number of instances
Hot          0    2   2
Cool         1    0   1
Mild         1    1   2

Gini(Outlook=Sunny and Temp=Hot) = 1 − (0/2)² − (2/2)² = 0
Gini(Outlook=Sunny and Temp=Cool) = 1 − (1/1)² − (0/1)² = 0
Gini(Outlook=Sunny and Temp=Mild) = 1 − (1/2)² − (1/2)² = 1 − 0.25 − 0.25 = 0.5
Gini(Outlook=Sunny and Temp) = (2/5) × 0 + (1/5) × 0 + (2/5) × 0.5 = 0.2
Gini of humidity for sunny outlook

Humidity  Yes  No  Number of instances
High      0    3   3
Normal    2    0   2

Gini(Outlook=Sunny and Humidity=High) = 1 − (0/3)² − (3/3)² = 0
Gini(Outlook=Sunny and Humidity=Normal) = 1 − (2/2)² − (0/2)² = 0
Gini(Outlook=Sunny and Humidity) = (3/5) × 0 + (2/5) × 0 = 0
Gini of wind for sunny outlook

Wind    Yes  No  Number of instances
Weak    1    2   3
Strong  1    1   2

Gini(Outlook=Sunny and Wind=Weak) = 1 − (1/3)² − (2/3)² = 0.444
Gini(Outlook=Sunny and Wind=Strong) = 1 − (1/2)² − (1/2)² = 0.5
Gini(Outlook=Sunny and Wind) = (3/5) × 0.444 + (2/5) × 0.5 = 0.266 + 0.2 = 0.466
DECISION FOR SUNNY OUTLOOK

We've calculated the Gini index scores for each feature when outlook is sunny:

Feature      Gini index
Temperature  0.2
Humidity     0
Wind         0.466

The winner is humidity, because it has the lowest value.
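The comparison can be reproduced with a short script (a sketch; the five sunny rows are hard-coded, and 0.467 versus the slide's truncated 0.466 for wind is just rounding):

```python
# Weighted Gini of each candidate feature on the sunny sub-dataset.
from collections import Counter

# (Temp, Humidity, Wind, Decision) rows for outlook = sunny.
sunny = [("Hot", "High", "Weak", "No"), ("Hot", "High", "Strong", "No"),
         ("Mild", "High", "Weak", "No"), ("Cool", "Normal", "Weak", "Yes"),
         ("Mild", "Normal", "Strong", "Yes")]

def gini(labels):
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

def weighted_gini(rows, col):
    """Weighted Gini of splitting `rows` on column index `col`."""
    n = len(rows)
    score = 0.0
    for value in {r[col] for r in rows}:
        subset = [r[3] for r in rows if r[col] == value]  # decisions in branch
        score += len(subset) / n * gini(subset)
    return score

scores = {name: round(weighted_gini(sunny, i), 3)
          for i, name in enumerate(["Temperature", "Humidity", "Wind"])}
print(scores)                        # Humidity comes out lowest, at 0.0
print(min(scores, key=scores.get))   # -> Humidity
```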

Splitting the sunny sub-dataset on humidity:

Humidity = High (all No):
Day  Outlook  Temp.  Humidity  Wind    Decision
1    Sunny    Hot    High      Weak    No
2    Sunny    Hot    High      Strong  No
8    Sunny    Mild   High      Weak    No

Humidity = Normal (all Yes):
Day  Outlook  Temp.  Humidity  Wind    Decision
9    Sunny    Cool   Normal    Weak    Yes
11   Sunny    Mild   Normal    Strong  Yes
[Figure: the tree so far. Outlook = Sunny leads to the Humidity node (High → No; Normal → Yes); Outlook = Overcast → Yes. Next, consider the rain sub-dataset:]

Day  Outlook  Temp.  Humidity  Wind    Decision
4    Rain     Mild   High      Weak    Yes
5    Rain     Cool   Normal    Weak    Yes
6    Rain     Cool   Normal    Strong  No
10   Rain     Mild   Normal    Weak    Yes
14   Rain     Mild   High      Strong  No

Outlook
Sunny
Rain
Overcast

Humidity Yes Wind


High Normal Weak
Strong
No Yes Yes No
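The finished tree reads directly as nested conditionals; a minimal sketch:

```python
# The final PlayTennis tree as plain if/else logic.
def play_tennis(outlook, humidity, wind):
    if outlook == "Overcast":
        return "Yes"
    if outlook == "Sunny":
        # Sunny branch splits on humidity.
        return "Yes" if humidity == "Normal" else "No"
    # Rain branch splits on wind.
    return "Yes" if wind == "Weak" else "No"

# Spot-check a few training rows (D1, D3, D6).
print(play_tennis("Sunny", "High", "Weak"))     # -> No
print(play_tennis("Overcast", "High", "Weak"))  # -> Yes
print(play_tennis("Rain", "Normal", "Strong"))  # -> No
```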

27
GINI IMPURITY

The Gini impurity index is one of the measures of impurity that is used by classification trees to split the nodes.

GI(t) = Σᵢ Σ_{j≠i} P(Cᵢ|t) P(Cⱼ|t) = 1 − Σᵢ P(Cᵢ|t)²   (sums run over the K classes, i, j = 1, …, K)

where
GI(t) = Gini index at node t
P(Cᵢ|t) = proportion of observations belonging to class Cᵢ in node t

The lower the Gini impurity, the higher the homogeneity of the node. The Gini impurity of a pure node is zero.
ENTROPY & INFORMATION GAIN

Entropy is a popular measure of impurity that is used in classification trees to split a node. Entropy measures the degree of randomness in data: low entropy means a nearly pure node, high entropy means the classes are thoroughly mixed.

For a set of samples X with k classes:

entropy(X) = − Σᵢ₌₁ᵏ pᵢ log₂(pᵢ)

The information gain of an attribute a is the expected reduction in entropy due to splitting on values of a (here Xᵥ is the subset of X for which a = v):

gain(X, a) = entropy(X) − Σ_{v ∈ Values(a)} (|Xᵥ| / |X|) · entropy(Xᵥ)
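Both formulas applied to the PlayTennis data (a sketch; the data set is hard-coded, and the printed gain is for the outlook attribute):

```python
# Entropy of the full label set, and information gain of the outlook split.
from math import log2
from collections import Counter

data = [("Sunny", "No"), ("Sunny", "No"), ("Overcast", "Yes"), ("Rain", "Yes"),
        ("Rain", "Yes"), ("Rain", "No"), ("Overcast", "Yes"), ("Sunny", "No"),
        ("Sunny", "Yes"), ("Rain", "Yes"), ("Sunny", "Yes"), ("Overcast", "Yes"),
        ("Overcast", "Yes"), ("Rain", "No")]

def entropy(labels):
    """entropy(X) = -sum over classes of p_i * log2(p_i)."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gain(pairs):
    """Information gain of splitting (value, label) pairs on the value."""
    labels = [y for _, y in pairs]
    n = len(pairs)
    remainder = 0.0
    for value in {v for v, _ in pairs}:
        subset = [y for v, y in pairs if v == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder

print(round(entropy([y for _, y in data]), 2))  # -> 0.94 (9 Yes, 5 No)
print(round(gain(data), 3))                     # -> 0.247 (gain of Outlook)
```

Outlook again wins: the overcast branch is pure (entropy 0), which is what drives the large gain.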
