Decision Tree

The document explains the decision tree algorithm, focusing on information gain as a criterion for selecting attributes. It provides a detailed example that uses weather conditions to predict whether to play golf, illustrating the calculation of entropy and information gain for each attribute, and identifies the attribute with the highest information gain as the root node before growing the remaining branches the same way. A second part repeats the analysis using the Gini index.

Decision Tree: Information Gain

Decision Tree
Basic algorithm (a greedy algorithm)
• The tree is constructed in a top-down, recursive, divide-and-conquer manner
• At the start, all training examples are at the root
• Attributes are categorical (continuous-valued attributes are discretized in advance)
• Examples are partitioned recursively based on the selected attributes
• Test attributes are selected on the basis of a heuristic or statistical measure, e.g., information gain (a minimal sketch follows below)
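To make the selection measure concrete, here is a minimal Python sketch of entropy and information gain over class counts (the function names `entropy` and `information_gain` are ours; the slides define only the math):

```python
import math

def entropy(counts):
    """Entropy of a class distribution given as a list of counts.
    The convention 0 * log2(0) = 0 is built in via the c > 0 filter."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def information_gain(parent_counts, child_counts_list):
    """IG = Entropy(S) - sum over children of P(child) * Entropy(child)."""
    total = sum(parent_counts)
    weighted = sum(sum(child) / total * entropy(child)
                   for child in child_counts_list)
    return entropy(parent_counts) - weighted

print(entropy([9, 5]))   # ≈ 0.940 for the 9-Yes / 5-No dataset used below
```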
Decision Tree: Information Gain
Sl. No. Outlook Temp Humidity Windy Play Golf
1 Rainy Hot High Weak No
2 Rainy Hot High Strong No
3 Overcast Hot High Weak Yes
4 Sunny Mild High Weak Yes
5 Sunny Cool Normal Weak Yes
6 Sunny Cool Normal Strong No
7 Overcast Cool Normal Strong Yes
8 Rainy Mild High Weak No
9 Rainy Cool Normal Weak Yes
10 Sunny Mild Normal Weak Yes
11 Rainy Mild Normal Strong Yes
12 Overcast Mild High Strong Yes
13 Overcast Hot Normal Weak Yes
14 Sunny Mild High Strong No
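For readers who want to follow along, here is the same table encoded as Python tuples (our own encoding, reused by the later code sketches):

```python
# The Play Golf dataset from the table above, as
# (Outlook, Temp, Humidity, Windy, Play) tuples
data = [
    ("Rainy",    "Hot",  "High",   "Weak",   "No"),
    ("Rainy",    "Hot",  "High",   "Strong", "No"),
    ("Overcast", "Hot",  "High",   "Weak",   "Yes"),
    ("Sunny",    "Mild", "High",   "Weak",   "Yes"),
    ("Sunny",    "Cool", "Normal", "Weak",   "Yes"),
    ("Sunny",    "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Rainy",    "Mild", "High",   "Weak",   "No"),
    ("Rainy",    "Cool", "Normal", "Weak",   "Yes"),
    ("Sunny",    "Mild", "Normal", "Weak",   "Yes"),
    ("Rainy",    "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High",   "Strong", "Yes"),
    ("Overcast", "Hot",  "Normal", "Weak",   "Yes"),
    ("Sunny",    "Mild", "High",   "Strong", "No"),
]
```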
Decision Tree: Information Gain

Entropy(S) = −(9/14)·log2(9/14) − (5/14)·log2(5/14) = 0.940

Wind      Yes   No
Weak        6    2
Strong      3    3
Total       9    5

P(S_Weak)   = No. of Weak / Total   = 8/14
P(S_Strong) = No. of Strong / Total = 6/14

Entropy(Weak)   = −(6/8)·log2(6/8) − (2/8)·log2(2/8) = 0.811
Entropy(Strong) = −(3/6)·log2(3/6) − (3/6)·log2(3/6) = 1

IG(S, Wind) = Entropy(S) − Σ_v P(S_v)·Entropy(S_v)
IG(S, Wind) = Entropy(S) − P(S_Weak)·Entropy(Weak) − P(S_Strong)·Entropy(Strong)
IG(S, Wind) = 0.940 − (8/14)(0.811) − (6/14)(1) = 0.048
Decision Tree: Information Gain

Outlook     Yes   No
Sunny         3    2
Overcast      4    0
Rainy         2    3
Total         9    5

P(S_Sunny)    = No. of Sunny / Total    = 5/14
P(S_Overcast) = No. of Overcast / Total = 4/14
P(S_Rainy)    = No. of Rainy / Total    = 5/14

Entropy(S) = 0.940

Entropy(Sunny)    = −(3/5)·log2(3/5) − (2/5)·log2(2/5) = 0.970951
Entropy(Overcast) = −(4/4)·log2(4/4) − (0/4)·log2(0/4); the term 0·log2(0) is undefined, so by convention it is taken as zero, giving Entropy(Overcast) = 0
Entropy(Rainy)    = −(2/5)·log2(2/5) − (3/5)·log2(3/5) = 0.970951

IG(S, Outlook) = 0.940 − (5/14)(0.970951) − (4/14)(0) − (5/14)(0.970951) = 0.246
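Using the helper functions from earlier, this calculation (including the zero-entropy Overcast branch, which the `c > 0` guard handles) can be checked directly; the counts are read off the table above:

```python
# Class counts (Yes, No) per Outlook value
outlook_children = [[3, 2], [4, 0], [2, 3]]   # Sunny, Overcast, Rainy
print(information_gain([9, 5], outlook_children))
# 0.2467... (0.246 with the slides' rounding of Entropy(S) to 0.940)
```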
Decision Tree: Information Gain

Temperature  Yes   No
Hot            2    2
Mild           4    2
Cool           3    1
Total          9    5

P(S_Hot)  = No. of Hot / Total  = 4/14
P(S_Mild) = No. of Mild / Total = 6/14
P(S_Cool) = No. of Cool / Total = 4/14

Entropy(S) = 0.940

Entropy(Hot)  = −(2/4)·log2(2/4) − (2/4)·log2(2/4) = 1
Entropy(Mild) = −(4/6)·log2(4/6) − (2/6)·log2(2/6) = 0.918296
Entropy(Cool) = −(3/4)·log2(3/4) − (1/4)·log2(1/4) = 0.811278

IG(S, Temp) = 0.940 − (4/14)(1) − (6/14)(0.918296) − (4/14)(0.811278) = 0.028937
Decision Tree: Information Gain

Humidity  Yes   No
High        3    4
Normal      6    1
Total       9    5

P(S_High)   = No. of High / Total   = 7/14
P(S_Normal) = No. of Normal / Total = 7/14

Entropy(S) = 0.940

Entropy(High)   = −(3/7)·log2(3/7) − (4/7)·log2(4/7) = 0.985228
Entropy(Normal) = −(6/7)·log2(6/7) − (1/7)·log2(1/7) = 0.591673

IG(S, Humidity) = Entropy(S) − P(S_High)·Entropy(High) − P(S_Normal)·Entropy(Normal)
IG(S, Humidity) = 0.940 − (7/14)(0.985228) − (7/14)(0.591673) = 0.151
Decision Tree: Information Gain

IG(S, Outlook)  = 0.246
IG(S, Temp)     = 0.029
IG(S, Wind)     = 0.048
IG(S, Humidity) = 0.151

Outlook has the highest information gain, so it is selected as the root node. Every Overcast example is Yes, so that branch becomes a leaf; the Sunny and Rainy branches are split further on the following slides.

Outlook
  Sunny → (to be split)
  Overcast → Yes
  Rainy → (to be split)
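The same selection can be reproduced in one loop over per-attribute class counts (counts taken from the tables above; the `counts` dict is our own structure):

```python
# Per-attribute class counts (Yes, No) for each attribute value
counts = {
    "Outlook":  [[3, 2], [4, 0], [2, 3]],   # Sunny, Overcast, Rainy
    "Temp":     [[2, 2], [4, 2], [3, 1]],   # Hot, Mild, Cool
    "Humidity": [[3, 4], [6, 1]],           # High, Normal
    "Wind":     [[6, 2], [3, 3]],           # Weak, Strong
}
for attr, children in counts.items():
    print(attr, round(information_gain([9, 5], children), 3))
# Outlook 0.247, Temp 0.029, Humidity 0.152, Wind 0.048
# (the slides round Entropy(S) to 0.940, giving 0.246 and 0.151)
```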
Decision Tree: Information Gain

After splitting on Outlook, the algorithm recurses on the Sunny and Rainy subsets of the original table (a code sketch of this filtering step follows below).

Sunny subset:
Sl. No.  Temp  Humidity  Windy   Play Golf
1        Mild  High      Weak    Yes
2        Cool  Normal    Weak    Yes
3        Cool  Normal    Strong  No
4        Mild  Normal    Weak    Yes
5        Mild  High      Strong  No

Rainy subset:
Sl. No.  Temp  Humidity  Windy   Play Golf
1        Hot   High      Weak    No
2        Hot   High      Strong  No
3        Mild  High      Weak    No
4        Cool  Normal    Weak    Yes
5        Mild  Normal    Strong  Yes
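A minimal sketch of that filtering step, reusing the `data` list and `entropy` helper defined earlier:

```python
# Recursing into the branches: filter the dataset by the Outlook value
sunny = [row for row in data if row[0] == "Sunny"]   # 3 Yes / 2 No
rainy = [row for row in data if row[0] == "Rainy"]   # 2 Yes / 3 No
print(entropy([3, 2]), entropy([2, 3]))              # both ≈ 0.970951
```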
Decision Tree: Information Gain (Sunny)

Entropy(Sunny) = −(3/5)·log2(3/5) − (2/5)·log2(2/5) = 0.970951

Wind      Yes   No
Weak        3    0
Strong      0    2
Total       3    2

P(S_Weak)   = No. of Weak / Total   = 3/5
P(S_Strong) = No. of Strong / Total = 2/5

Entropy(Weak)   = 0 (all three Weak examples are Yes)
Entropy(Strong) = 0 (both Strong examples are No)

IG(Sunny, Wind) = Entropy(Sunny) − Σ_v P(S_v)·Entropy(S_v)
IG(Sunny, Wind) = 0.970951 − (3/5)(0) − (2/5)(0) = 0.970
Decision Tree: Information Gain (Sunny)

Temperature  Yes   No
Mild           2    1
Cool           1    1
Total          3    2

P(S_Mild) = No. of Mild / Total = 3/5
P(S_Cool) = No. of Cool / Total = 2/5

Entropy(Sunny) = 0.970951

Entropy(Mild) = −(2/3)·log2(2/3) − (1/3)·log2(1/3) = 0.918296
Entropy(Cool) = −(1/2)·log2(1/2) − (1/2)·log2(1/2) = 1

IG(Sunny, Temp) = 0.970951 − (3/5)(0.918296) − (2/5)(1) = 0.019973 ≈ 0.020
Decision Tree: Information Gain (Sunny)

Humidity  Yes   No
High        1    1
Normal      2    1
Total       3    2

P(S_High)   = No. of High / Total   = 2/5
P(S_Normal) = No. of Normal / Total = 3/5

Entropy(Sunny) = 0.970951

Entropy(High)   = −(1/2)·log2(1/2) − (1/2)·log2(1/2) = 1
Entropy(Normal) = −(2/3)·log2(2/3) − (1/3)·log2(1/3) = 0.918296

IG(Sunny, Humidity) = Entropy(Sunny) − P(S_High)·Entropy(High) − P(S_Normal)·Entropy(Normal)
IG(Sunny, Humidity) = 0.970951 − (2/5)(1) − (3/5)(0.918296) = 0.019973 ≈ 0.020
Decision Tree: Information Gain

IG(Sunny, Wind)     = 0.970
IG(Sunny, Temp)     ≈ 0.020
IG(Sunny, Humidity) ≈ 0.020

Wind has the highest information gain on the Sunny subset, so Windy is selected as the splitting attribute for the Sunny branch:

Outlook
  Sunny → Windy: Weak → Yes; Strong → No
  Overcast → Yes
  Rainy → (to be split)
Decision Tree: Information Gain (Rainy)

Entropy(Rainy) = −(2/5)·log2(2/5) − (3/5)·log2(3/5) = 0.970951

Wind      Yes   No
Weak        1    2
Strong      1    1
Total       2    3

P(S_Weak)   = No. of Weak / Total   = 3/5
P(S_Strong) = No. of Strong / Total = 2/5

Entropy(Weak)   = −(1/3)·log2(1/3) − (2/3)·log2(2/3) = 0.918296
Entropy(Strong) = −(1/2)·log2(1/2) − (1/2)·log2(1/2) = 1

IG(Rainy, Wind) = Entropy(Rainy) − P(S_Weak)·Entropy(Weak) − P(S_Strong)·Entropy(Strong)
IG(Rainy, Wind) = 0.970951 − (3/5)(0.918296) − (2/5)(1) = 0.019973 ≈ 0.020
Decision Tree: Information Gain (Rainy)

Temperature  Yes   No
Hot            0    2
Mild           1    1
Cool           1    0
Total          2    3

P(S_Hot)  = No. of Hot / Total  = 2/5
P(S_Mild) = No. of Mild / Total = 2/5
P(S_Cool) = No. of Cool / Total = 1/5

Entropy(Rainy) = 0.970951

Entropy(Hot)  = −(0/2)·log2(0/2) − (2/2)·log2(2/2) = 0
Entropy(Mild) = −(1/2)·log2(1/2) − (1/2)·log2(1/2) = 1
Entropy(Cool) = −(1/1)·log2(1/1) − (0/1)·log2(0/1) = 0

IG(Rainy, Temp) = 0.970951 − (2/5)(0) − (2/5)(1) − (1/5)(0) = 0.570951 ≈ 0.571
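A quick numeric check of this value with the earlier helpers (counts from the table above):

```python
# Rainy-branch temperature split: Hot, Mild, Cool class counts
rainy_temp = [[0, 2], [1, 1], [1, 0]]
print(information_gain([2, 3], rainy_temp))   # ≈ 0.571
```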
Decision Tree: Information Gain (Rainy)

Humidity  Yes   No
High        0    3
Normal      2    0
Total       2    3

P(S_High)   = No. of High / Total   = 3/5
P(S_Normal) = No. of Normal / Total = 2/5

Entropy(High)   = 0 (all three High examples are No)
Entropy(Normal) = 0 (both Normal examples are Yes)

IG(Rainy, Humidity) = Entropy(Rainy) − P(S_High)·Entropy(High) − P(S_Normal)·Entropy(Normal)
IG(Rainy, Humidity) = 0.970951 − (3/5)(0) − (2/5)(0) = 0.970
Decision Tree: Information Gain

IG(Rainy, Wind)     ≈ 0.020
IG(Rainy, Temp)     ≈ 0.571
IG(Rainy, Humidity) = 0.970

Humidity has the highest information gain on the Rainy subset, so it is selected as the splitting attribute for the Rainy branch. The finished tree (the full recursion is sketched in code below):

Outlook
  Sunny → Windy: Weak → Yes; Strong → No
  Overcast → Yes
  Rainy → Humidity: Normal → Yes; High → No
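Putting the pieces together, here is an illustrative ID3-style recursion (our own sketch, reusing `data` and `information_gain` from earlier; `build_tree` and `ATTRS` are our names):

```python
from collections import Counter, defaultdict

ATTRS = ["Outlook", "Temp", "Humidity", "Windy"]

def build_tree(rows, attrs):
    """Recursive ID3-style construction: stop on pure nodes, otherwise
    split on the attribute with the highest information gain."""
    labels = [r[-1] for r in rows]
    if len(set(labels)) == 1:            # pure node -> leaf
        return labels[0]
    if not attrs:                        # no attributes left -> majority vote
        return Counter(labels).most_common(1)[0][0]

    def gain(attr):
        i = ATTRS.index(attr)
        groups = defaultdict(list)
        for r in rows:
            groups[r[i]].append(r[-1])
        parent = list(Counter(labels).values())
        children = [list(Counter(g).values()) for g in groups.values()]
        return information_gain(parent, children)

    best = max(attrs, key=gain)
    i = ATTRS.index(best)
    branches = defaultdict(list)
    for r in rows:
        branches[r[i]].append(r)
    rest = [a for a in attrs if a != best]
    return {best: {v: build_tree(rs, rest) for v, rs in branches.items()}}

print(build_tree(data, ATTRS))
# {'Outlook': {'Rainy': {'Humidity': ...}, 'Overcast': 'Yes', 'Sunny': {'Windy': ...}}}
```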
Decision Tree: Gini Index

Gini Index
• Faster to compute than other impurity measures such as entropy
• More sensitive to changes in class probabilities, which can be beneficial for certain datasets
• Can be biased towards attributes with more categories, which might not always be desirable

Entropy (Information Gain)
• Interpretability: provides a clear measure of the amount of information needed to classify data points, making the splits easier to understand
• Balanced splits: tends to produce more balanced splits than the Gini Index
• Computationally intensive: more complex and time-consuming to calculate than the Gini Index

Formulas used on the following slides (a code sketch follows below):

Gini(S) = 1 − Σ_i p_i²
Gini Impurity(S, X) = 1 − Σ_v (|S_v| / |S|)·Gini(S_v)
Gini Index(S, X) = Gini(S) − Gini Impurity(S, X)

Note the leading "1 −" in the second formula: with this convention the Gini Index values below come out negative, and the attribute with the minimum Gini Index (equivalently, the smallest weighted impurity Σ_v (|S_v|/|S|)·Gini(S_v)) gives the best split.
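A minimal sketch of these two formulas in Python, following the slides' convention (the names `gini` and `gini_index` are our own):

```python
def gini(counts):
    """Gini impurity of a single node: 1 - sum of squared class proportions."""
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)

def gini_index(parent_counts, child_counts_list):
    """Gini Index in the slides' convention:
    Gini(S) - (1 - weighted Gini of the children).
    Smaller (more negative) values indicate a better split."""
    total = sum(parent_counts)
    weighted = sum(sum(child) / total * gini(child)
                   for child in child_counts_list)
    return gini(parent_counts) - (1 - weighted)

print(gini([9, 5]))   # ≈ 0.4592
```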


Decision Tree: Gini Index

(The Gini calculations below use the same 14-example Play Golf dataset shown at the start.)
Decision Tree: Gini Index

Gini of the class variable:
Gini(S) = 1 − (9/14)² − (5/14)² = 0.4592

Outlook     Yes   No   Count
Sunny         3    2       5
Overcast      4    0       4
Rainy         2    3       5
Total         9    5      14

G(S, Outlook) = 1 − (5/14)·(1 − (3/5)² − (2/5)²)
                  − (4/14)·(1 − (4/4)² − (0/4)²)
                  − (5/14)·(1 − (2/5)² − (3/5)²)
              = 0.657143

Gini Index(S, Outlook) = 0.4592 − 0.657143 = −0.1980
Decision Tree: Gini Index

Temperature  Yes   No   Count
Hot            2    2       4
Mild           4    2       6
Cool           3    1       4
Total          9    5      14

G(S, Temp) = 1 − (4/14)·(1 − (2/4)² − (2/4)²)
               − (6/14)·(1 − (4/6)² − (2/6)²)
               − (4/14)·(1 − (3/4)² − (1/4)²)
           = 0.559524

Gini Index(S, Temp) = 0.4592 − 0.559524 = −0.1003
Decision Tree: Gini Index

Humidity  Yes   No   Count
High        3    4       7
Normal      6    1       7
Total       9    5      14

G(S, Humidity) = 1 − (7/14)·(1 − (3/7)² − (4/7)²)
                   − (7/14)·(1 − (6/7)² − (1/7)²)
               = 0.632653

Gini Index(S, Humidity) = 0.4592 − 0.632653 = −0.1735
Decision Tree: Gini Index

Wind      Yes   No   Count
Weak        6    2       8
Strong      3    3       6
Total       9    5      14

G(S, Windy) = 1 − (8/14)·(1 − (6/8)² − (2/8)²)
                − (6/14)·(1 − (3/6)² − (3/6)²)
            = 0.571429

Gini Index(S, Windy) = 0.4592 − 0.571429 = −0.1122
Decision Tree: Gini Gain

Gini Index(S, Outlook)  = 0.4592 − 0.657143 = −0.1980
Gini Index(S, Temp)     = 0.4592 − 0.559524 = −0.1003
Gini Index(S, Humidity) = 0.4592 − 0.632653 = −0.1735
Gini Index(S, Windy)    = 0.4592 − 0.571429 = −0.1122

Under this convention the minimum Gini Index (i.e., the smallest weighted Gini impurity) marks the best split, so Outlook is again selected as the root node (a code check follows below):

Outlook
  Sunny → (to be split)
  Overcast → Yes
  Rainy → (to be split)
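The same ranking, computed with the `gini_index` sketch and the `counts` dict defined earlier:

```python
for attr, children in counts.items():
    print(attr, round(gini_index([9, 5], children), 4))
# Outlook -0.198, Temp -0.1003, Humidity -0.1735, Wind -0.1122
```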
Decision Tree: Gini Index

(The Gini calculations for the branches reuse the Sunny and Rainy subsets tabulated earlier.)
Decision Tree: Gini Index (Sunny)

Gini of the class variable:
Gini(Sunny) = 1 − (3/5)² − (2/5)² = 0.48

Temperature  Yes   No   Count
Mild           2    1       3
Cool           1    1       2
Total          3    2       5

G(Sunny, Temp) = 1 − (3/5)·(1 − (2/3)² − (1/3)²)
                   − (2/5)·(1 − (1/2)² − (1/2)²)
               = 0.5333

Gini Gain(Sunny, Temp) = 0.48 − 0.5333 = −0.05333
Decision Tree: Gini Index (Sunny)

Humidity  Yes   No   Count
High        1    1       2
Normal      2    1       3
Total       3    2       5

G(Sunny, Humidity) = 1 − (2/5)·(1 − (1/2)² − (1/2)²)
                       − (3/5)·(1 − (2/3)² − (1/3)²)
                   = 0.5333

Gini Gain(Sunny, Humidity) = 0.48 − 0.5333 = −0.05333
Decision Tree: Gini Index (Sunny)

Wind      Yes   No   Count
Weak        3    0       3
Strong      0    2       2
Total       3    2       5

G(Sunny, Windy) = 1 − (3/5)·(1 − (3/3)² − (0/3)²)
                    − (2/5)·(1 − (0/2)² − (2/2)²)
                = 1

Gini Gain(Sunny, Windy) = 0.48 − 1 = −0.52
Decision Tree: Gini Index

Gini Gain(Sunny, Temp)     = 0.48 − 0.5333 = −0.05333
Gini Gain(Sunny, Humidity) = 0.48 − 0.5333 = −0.05333
Gini Gain(Sunny, Windy)    = 0.48 − 1 = −0.52

Choose the minimum Gini Gain under this convention (equivalently, the lowest weighted Gini impurity) for the Sunny branch: Windy, whose Weak and Strong partitions are both pure. This matches the information-gain result:

Outlook
  Sunny → Windy: Weak → Yes; Strong → No
  Overcast → Yes
  Rainy → (to be split)
Decision Tree: Gini Index (Rainy)

Gini of the class variable:
Gini(Rainy) = 1 − (2/5)² − (3/5)² = 0.48

Temperature  Yes   No   Count
Hot            0    2       2
Mild           1    1       2
Cool           1    0       1
Total          2    3       5

G(Rainy, Temp) = 1 − (2/5)·(1 − (0/2)² − (2/2)²)
                   − (2/5)·(1 − (1/2)² − (1/2)²)
                   − (1/5)·(1 − (1/1)² − (0/1)²)
               = 0.8

Gini Gain(Rainy, Temp) = 0.48 − 0.8 = −0.32
Decision Tree: Gini Index (Rainy)

Humidity  Yes   No   Count
High        0    3       3
Normal      2    0       2
Total       2    3       5

G(Rainy, Humidity) = 1 − (3/5)·(1 − (0/3)² − (3/3)²)
                       − (2/5)·(1 − (2/2)² − (0/2)²)
                   = 1

Gini Gain(Rainy, Humidity) = 0.48 − 1 = −0.52
Decision Tree: Gini Index (Rainy)

Wind      Yes   No   Count
Weak        1    2       3
Strong      1    1       2
Total       2    3       5

G(Rainy, Wind) = 1 − (3/5)·(1 − (1/3)² − (2/3)²)
                   − (2/5)·(1 − (1/2)² − (1/2)²)
               = 0.5333

Gini Gain(Rainy, Wind) = 0.48 − 0.5333 = −0.05333
Decision Tree: Gini Gain (Rainy)

Gini Gain(Rainy, Temp)     = 0.48 − 0.8 = −0.32
Gini Gain(Rainy, Humidity) = 0.48 − 1 = −0.52
Gini Gain(Rainy, Wind)     = 0.48 − 0.5333 = −0.05333

Again choose the minimum under this convention (the lowest weighted Gini impurity) for the Rainy branch: Humidity, whose Normal and High partitions are both pure. The finished tree is identical to the one built with information gain (a library cross-check follows below):

Outlook
  Sunny → Windy: Weak → Yes; Strong → No
  Overcast → Yes
  Rainy → Humidity: Normal → Yes; High → No
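As a sanity check, the same dataset can be fed to scikit-learn (assumed installed; `data` is the list defined earlier). Note that scikit-learn builds binary trees over one-hot-encoded features, so the printed tree is shaped differently from the multiway tree above, although it should make the same predictions on these 14 rows:

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# One-hot encode the categorical attributes; the tree expects numeric input
df = pd.DataFrame(data, columns=["Outlook", "Temp", "Humidity", "Windy", "Play"])
X = pd.get_dummies(df[["Outlook", "Temp", "Humidity", "Windy"]])
clf = DecisionTreeClassifier(criterion="entropy", random_state=0)  # or "gini"
clf.fit(X, df["Play"])
print(export_text(clf, feature_names=list(X.columns)))
```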
