3 - Decision Trees

Chapter 3 discusses decision trees using a Tennis Play dataset to compute the weighted GINI impurity of each one-hot-encoded feature derived from Outlook, Temperature, Humidity, and Wind. It identifies the features with the minimum weighted GINI impurity as root-node candidates, then repeats the calculation on each branch to build a tree that decides whether to play tennis based on weather conditions.

Uploaded by

sshobby12

Chapter 3

Decision Trees

Table 1. Tennis Play Dataset

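| # | Outlook | Temperature | Humidity | Wind | Play |
|---|----------|-------------|----------|--------|------|
| 1 | Rain | Cool | Normal | Strong | No |
| 2 | Overcast | Cool | Normal | Strong | Yes |
| 3 | Sunny | Mild | High | Weak | No |
| 4 | Sunny | Cool | Normal | Weak | Yes |
| 5 | Rain | Mild | Normal | Weak | Yes |
| 6 | Sunny | Mild | Normal | Strong | Yes |
| 7 | Overcast | Mild | High | Strong | Yes |
| 8 | Overcast | Hot | Normal | Weak | Yes |
| 9 | Rain | Mild | High | Strong | No |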

Table 2. Play Tennis Dataset Encoded

| Outlook_Overcast | Outlook_Rain | Outlook_Sunny | Temperature_Cool | Temperature_Hot | Temperature_Mild | Humidity_High | Humidity_Normal | Wind_Strong | Wind_Weak | Play |
|---|---|---|---|---|---|---|---|---|---|---|
| FALSE | TRUE | FALSE | TRUE | FALSE | FALSE | FALSE | TRUE | TRUE | FALSE | No |
| TRUE | FALSE | FALSE | TRUE | FALSE | FALSE | FALSE | TRUE | TRUE | FALSE | Yes |
| FALSE | FALSE | TRUE | FALSE | FALSE | TRUE | TRUE | FALSE | FALSE | TRUE | No |
| FALSE | FALSE | TRUE | TRUE | FALSE | FALSE | FALSE | TRUE | FALSE | TRUE | Yes |
| FALSE | TRUE | FALSE | FALSE | FALSE | TRUE | FALSE | TRUE | FALSE | TRUE | Yes |
| FALSE | FALSE | TRUE | FALSE | FALSE | TRUE | FALSE | TRUE | TRUE | FALSE | Yes |
| TRUE | FALSE | FALSE | FALSE | FALSE | TRUE | TRUE | FALSE | TRUE | FALSE | Yes |
| TRUE | FALSE | FALSE | FALSE | TRUE | FALSE | FALSE | TRUE | FALSE | TRUE | Yes |
| FALSE | TRUE | FALSE | FALSE | FALSE | TRUE | TRUE | FALSE | TRUE | FALSE | No |
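The encoding from Table 1 to Table 2 can be reproduced with a short script. The following is a minimal sketch in plain Python, assuming each attribute value becomes one Boolean column named Attribute_Value; the names `ROWS`, `ATTRS`, and `one_hot` are illustrative and do not come from the chapter.

```python
# Table 1 rows: (Outlook, Temperature, Humidity, Wind, Play)
ROWS = [
    ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Strong", "No"),
]
ATTRS = ["Outlook", "Temperature", "Humidity", "Wind"]


def one_hot(rows, attrs):
    """Return (column_names, encoded_rows); each encoded row is a dict."""
    columns = []
    for i, attr in enumerate(attrs):
        # One Boolean column per distinct value, in alphabetical order.
        for value in sorted({r[i] for r in rows}):
            columns.append((i, value, f"{attr}_{value}"))
    encoded = []
    for r in rows:
        enc = {name: r[i] == value for i, value, name in columns}
        enc["Play"] = r[-1]  # the target column is copied unchanged
        encoded.append(enc)
    return [name for _, _, name in columns], encoded


names, encoded = one_hot(ROWS, ATTRS)
print(names)
print(encoded[0])
```

Sorting the values alphabetically happens to give the same column order as Table 2; a different ordering would not change any of the impurity calculations.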

Compute the Weighted GINI for each field
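The calculations that follow all apply the same two formulas. As a compact statement, with symbols that are assumptions of this note rather than notation from the chapter: $S$ is the set of rows at the current node, $S_T$ and $S_F$ are the rows where the Boolean feature $f$ is TRUE and FALSE, and $n_{\text{Yes}}$, $n_{\text{No}}$ count the Play labels in a set.

$$\mathrm{GINI}(S) = 1 - \left(\left(\frac{n_{\text{Yes}}}{|S|}\right)^2 + \left(\frac{n_{\text{No}}}{|S|}\right)^2\right)$$

$$\text{Weighted\_GINI}(f) = \frac{|S_T|}{|S|}\,\mathrm{GINI}(S_T) + \frac{|S_F|}{|S|}\,\mathrm{GINI}(S_F)$$

When a branch is empty its weight is 0, so it contributes nothing to the weighted sum; its GINI is taken as 0 by convention.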


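Outlook_Overcast
True branch: Yes: 3 instances, No: 0 instances
$\mathrm{GINI}(\text{True}) = 1 - \left(\left(\tfrac{3}{3}\right)^2 + \left(\tfrac{0}{3}\right)^2\right) = 0$
False branch: Yes: 3 instances, No: 3 instances
$\mathrm{GINI}(\text{False}) = 1 - \left(\left(\tfrac{3}{6}\right)^2 + \left(\tfrac{3}{6}\right)^2\right) = 0.5$
$\text{Weighted\_GINI}(\text{Outlook\_Overcast}) = \tfrac{3}{9} \cdot 0 + \tfrac{6}{9} \cdot 0.5 = 0.333$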
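The Outlook_Overcast calculation above can be checked with two small helper functions. This is a minimal sketch that works directly on the Yes/No counts of each branch; the function names `gini` and `weighted_gini` are illustrative, and treating an empty branch as having impurity 0 is an assumption made so the function is defined for every input.

```python
def gini(yes, no):
    """GINI impurity of a node holding `yes` Yes-rows and `no` No-rows."""
    total = yes + no
    if total == 0:
        return 0.0  # assumption: an empty branch contributes no impurity
    return 1 - ((yes / total) ** 2 + (no / total) ** 2)


def weighted_gini(true_counts, false_counts):
    """Size-weighted average of the two branch impurities.

    Each argument is a (yes, no) pair of label counts for one branch.
    """
    n_true, n_false = sum(true_counts), sum(false_counts)
    n = n_true + n_false
    return (n_true / n) * gini(*true_counts) + (n_false / n) * gini(*false_counts)


# Outlook_Overcast at the root: True branch (3 Yes, 0 No), False branch (3 Yes, 3 No)
print(round(weighted_gini((3, 0), (3, 3)), 3))  # 0.333
```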

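Outlook_Rain
True branch: Yes: 1 instance, No: 2 instances
$\mathrm{GINI}(\text{True}) = 1 - \left(\left(\tfrac{1}{3}\right)^2 + \left(\tfrac{2}{3}\right)^2\right) = 0.444$
False branch: Yes: 5 instances, No: 1 instance
$\mathrm{GINI}(\text{False}) = 1 - \left(\left(\tfrac{5}{6}\right)^2 + \left(\tfrac{1}{6}\right)^2\right) = 0.278$
$\text{Weighted\_GINI}(\text{Outlook\_Rain}) = \tfrac{3}{9} \cdot 0.444 + \tfrac{6}{9} \cdot 0.278 = 0.333$

Outlook_Sunny
True branch: Yes: 2 instances, No: 1 instance
$\mathrm{GINI}(\text{True}) = 1 - \left(\left(\tfrac{2}{3}\right)^2 + \left(\tfrac{1}{3}\right)^2\right) = 0.444$
False branch: Yes: 4 instances, No: 2 instances
$\mathrm{GINI}(\text{False}) = 1 - \left(\left(\tfrac{4}{6}\right)^2 + \left(\tfrac{2}{6}\right)^2\right) = 0.444$
$\text{Weighted\_GINI}(\text{Outlook\_Sunny}) = \tfrac{3}{9} \cdot 0.444 + \tfrac{6}{9} \cdot 0.444 = 0.444$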

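Temperature_Cool
True branch: Yes: 2 instances, No: 1 instance
$\mathrm{GINI}(\text{True}) = 1 - \left(\left(\tfrac{2}{3}\right)^2 + \left(\tfrac{1}{3}\right)^2\right) = 0.444$
False branch: Yes: 4 instances, No: 2 instances
$\mathrm{GINI}(\text{False}) = 1 - \left(\left(\tfrac{4}{6}\right)^2 + \left(\tfrac{2}{6}\right)^2\right) = 0.444$
$\text{Weighted\_GINI}(\text{Temperature\_Cool}) = \tfrac{3}{9} \cdot 0.444 + \tfrac{6}{9} \cdot 0.444 = 0.444$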

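Temperature_Hot
True branch: Yes: 1 instance, No: 0 instances
$\mathrm{GINI}(\text{True}) = 1 - \left(\left(\tfrac{1}{1}\right)^2 + \left(\tfrac{0}{1}\right)^2\right) = 0$
False branch: Yes: 5 instances, No: 3 instances
$\mathrm{GINI}(\text{False}) = 1 - \left(\left(\tfrac{5}{8}\right)^2 + \left(\tfrac{3}{8}\right)^2\right) = 0.469$
$\text{Weighted\_GINI}(\text{Temperature\_Hot}) = \tfrac{1}{9} \cdot 0 + \tfrac{8}{9} \cdot 0.469 = 0.417$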

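Temperature_Mild
True branch: Yes: 3 instances, No: 2 instances
$\mathrm{GINI}(\text{True}) = 1 - \left(\left(\tfrac{3}{5}\right)^2 + \left(\tfrac{2}{5}\right)^2\right) = 0.48$
False branch: Yes: 3 instances, No: 1 instance
$\mathrm{GINI}(\text{False}) = 1 - \left(\left(\tfrac{3}{4}\right)^2 + \left(\tfrac{1}{4}\right)^2\right) = 0.375$
$\text{Weighted\_GINI}(\text{Temperature\_Mild}) = \tfrac{5}{9} \cdot 0.48 + \tfrac{4}{9} \cdot 0.375 = 0.433$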

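Humidity_High
True branch: Yes: 1 instance, No: 2 instances
$\mathrm{GINI}(\text{True}) = 1 - \left(\left(\tfrac{1}{3}\right)^2 + \left(\tfrac{2}{3}\right)^2\right) = 0.444$
False branch: Yes: 5 instances, No: 1 instance
$\mathrm{GINI}(\text{False}) = 1 - \left(\left(\tfrac{5}{6}\right)^2 + \left(\tfrac{1}{6}\right)^2\right) = 0.278$
$\text{Weighted\_GINI}(\text{Humidity\_High}) = \tfrac{3}{9} \cdot 0.444 + \tfrac{6}{9} \cdot 0.278 = 0.333$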

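Humidity_Normal
True branch: Yes: 5 instances, No: 1 instance
$\mathrm{GINI}(\text{True}) = 1 - \left(\left(\tfrac{5}{6}\right)^2 + \left(\tfrac{1}{6}\right)^2\right) = 0.278$
False branch: Yes: 1 instance, No: 2 instances
$\mathrm{GINI}(\text{False}) = 1 - \left(\left(\tfrac{1}{3}\right)^2 + \left(\tfrac{2}{3}\right)^2\right) = 0.444$
$\text{Weighted\_GINI}(\text{Humidity\_Normal}) = \tfrac{6}{9} \cdot 0.278 + \tfrac{3}{9} \cdot 0.444 = 0.333$

Wind_Strong
True branch: Yes: 3 instances, No: 2 instances
$\mathrm{GINI}(\text{True}) = 1 - \left(\left(\tfrac{3}{5}\right)^2 + \left(\tfrac{2}{5}\right)^2\right) = 0.48$
False branch: Yes: 3 instances, No: 1 instance
$\mathrm{GINI}(\text{False}) = 1 - \left(\left(\tfrac{3}{4}\right)^2 + \left(\tfrac{1}{4}\right)^2\right) = 0.375$
$\text{Weighted\_GINI}(\text{Wind\_Strong}) = \tfrac{5}{9} \cdot 0.48 + \tfrac{4}{9} \cdot 0.375 = 0.433$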

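Wind_Weak
True branch: Yes: 3 instances, No: 1 instance
$\mathrm{GINI}(\text{True}) = 1 - \left(\left(\tfrac{3}{4}\right)^2 + \left(\tfrac{1}{4}\right)^2\right) = 0.375$
False branch: Yes: 3 instances, No: 2 instances
$\mathrm{GINI}(\text{False}) = 1 - \left(\left(\tfrac{3}{5}\right)^2 + \left(\tfrac{2}{5}\right)^2\right) = 0.48$
$\text{Weighted\_GINI}(\text{Wind\_Weak}) = \tfrac{4}{9} \cdot 0.375 + \tfrac{5}{9} \cdot 0.48 = 0.433$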

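Among all the computed values, Outlook_Overcast, Outlook_Rain, Humidity_High, and Humidity_Normal share the minimum weighted GINI impurity (exactly 1/3 ≈ 0.333 when the intermediate GINI values are not rounded), so any of them can be selected as the root node. The rest of this chapter splits on Outlook_Rain.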
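The root-node comparison can be repeated for all ten encoded features at once. The sketch below is one possible way to do it, using exact fractions so that ties are not hidden by rounding; the helper names and the tuple layout of `ROWS` are assumptions of this example, not part of the chapter.

```python
from fractions import Fraction

# Table 1 rows: (Outlook, Temperature, Humidity, Wind, Play)
ROWS = [
    ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Strong", "No"),
]
ATTRS = ("Outlook", "Temperature", "Humidity", "Wind")


def gini(labels):
    """Exact GINI impurity of a list of Yes/No labels."""
    n = len(labels)
    if n == 0:
        return Fraction(0)  # assumption: empty branch counts as pure
    p_yes = Fraction(labels.count("Yes"), n)
    return 1 - (p_yes ** 2 + (1 - p_yes) ** 2)


def weighted_gini(rows, attr_index, value):
    """Weighted GINI of the Boolean feature 'attribute == value'."""
    true_side = [r[-1] for r in rows if r[attr_index] == value]
    false_side = [r[-1] for r in rows if r[attr_index] != value]
    n = len(rows)
    return (Fraction(len(true_side), n) * gini(true_side)
            + Fraction(len(false_side), n) * gini(false_side))


scores = {}
for i, attr in enumerate(ATTRS):
    for value in sorted({r[i] for r in ROWS}):
        scores[f"{attr}_{value}"] = weighted_gini(ROWS, i, value)

best = min(scores.values())
tied = sorted(name for name, s in scores.items() if s == best)
for name, s in scores.items():
    print(f"{name:18s} {float(s):.3f}")
print("minimum:", best, "tied:", tied)
```

With exact arithmetic the minimum is 1/3, reached by four features, which is why the choice of root among them is arbitrary.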
Outlook_Rain = TRUE

| Temperature_Cool | Temperature_Hot | Temperature_Mild | Humidity_High | Humidity_Normal | Wind_Strong | Wind_Weak | Play |
|---|---|---|---|---|---|---|---|
| TRUE | FALSE | FALSE | FALSE | TRUE | TRUE | FALSE | No |
| FALSE | FALSE | TRUE | FALSE | TRUE | FALSE | TRUE | Yes |
| FALSE | FALSE | TRUE | TRUE | FALSE | TRUE | FALSE | No |

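Temperature_Cool
True branch: Yes: 0 instances, No: 1 instance
$\mathrm{GINI}(\text{True}) = 1 - \left(\left(\tfrac{0}{1}\right)^2 + \left(\tfrac{1}{1}\right)^2\right) = 0$
False branch: Yes: 1 instance, No: 1 instance
$\mathrm{GINI}(\text{False}) = 1 - \left(\left(\tfrac{1}{2}\right)^2 + \left(\tfrac{1}{2}\right)^2\right) = 0.5$
$\text{Weighted\_GINI}(\text{Temperature\_Cool}) = \tfrac{1}{3} \cdot 0 + \tfrac{2}{3} \cdot 0.5 = 0.333$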

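Temperature_Hot
True branch: Yes: 0 instances, No: 0 instances (empty branch)
$\mathrm{GINI}(\text{True}) = 0$ (taken as 0; the branch has weight $\tfrac{0}{3}$)
False branch: Yes: 1 instance, No: 2 instances
$\mathrm{GINI}(\text{False}) = 1 - \left(\left(\tfrac{1}{3}\right)^2 + \left(\tfrac{2}{3}\right)^2\right) = 0.444$
$\text{Weighted\_GINI}(\text{Temperature\_Hot}) = \tfrac{0}{3} \cdot 0 + \tfrac{3}{3} \cdot 0.444 = 0.444$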

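Temperature_Mild
True branch: Yes: 1 instance, No: 1 instance
$\mathrm{GINI}(\text{True}) = 1 - \left(\left(\tfrac{1}{2}\right)^2 + \left(\tfrac{1}{2}\right)^2\right) = 0.5$
False branch: Yes: 0 instances, No: 1 instance
$\mathrm{GINI}(\text{False}) = 1 - \left(\left(\tfrac{0}{1}\right)^2 + \left(\tfrac{1}{1}\right)^2\right) = 0$
$\text{Weighted\_GINI}(\text{Temperature\_Mild}) = \tfrac{2}{3} \cdot 0.5 + \tfrac{1}{3} \cdot 0 = 0.333$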

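Humidity_High
True branch: Yes: 0 instances, No: 1 instance
$\mathrm{GINI}(\text{True}) = 1 - \left(\left(\tfrac{0}{1}\right)^2 + \left(\tfrac{1}{1}\right)^2\right) = 0$
False branch: Yes: 1 instance, No: 1 instance
$\mathrm{GINI}(\text{False}) = 1 - \left(\left(\tfrac{1}{2}\right)^2 + \left(\tfrac{1}{2}\right)^2\right) = 0.5$
$\text{Weighted\_GINI}(\text{Humidity\_High}) = \tfrac{1}{3} \cdot 0 + \tfrac{2}{3} \cdot 0.5 = 0.333$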

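Humidity_Normal
True branch: Yes: 1 instance, No: 1 instance
$\mathrm{GINI}(\text{True}) = 1 - \left(\left(\tfrac{1}{2}\right)^2 + \left(\tfrac{1}{2}\right)^2\right) = 0.5$
False branch: Yes: 0 instances, No: 1 instance
$\mathrm{GINI}(\text{False}) = 1 - \left(\left(\tfrac{0}{1}\right)^2 + \left(\tfrac{1}{1}\right)^2\right) = 0$
$\text{Weighted\_GINI}(\text{Humidity\_Normal}) = \tfrac{2}{3} \cdot 0.5 + \tfrac{1}{3} \cdot 0 = 0.333$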

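Wind_Strong
True branch: Yes: 0 instances, No: 2 instances
$\mathrm{GINI}(\text{True}) = 1 - \left(\left(\tfrac{0}{2}\right)^2 + \left(\tfrac{2}{2}\right)^2\right) = 0$
False branch: Yes: 1 instance, No: 0 instances
$\mathrm{GINI}(\text{False}) = 1 - \left(\left(\tfrac{1}{1}\right)^2 + \left(\tfrac{0}{1}\right)^2\right) = 0$
$\text{Weighted\_GINI}(\text{Wind\_Strong}) = \tfrac{2}{3} \cdot 0 + \tfrac{1}{3} \cdot 0 = 0$

Wind_Weak
True branch: Yes: 1 instance, No: 0 instances
$\mathrm{GINI}(\text{True}) = 1 - \left(\left(\tfrac{1}{1}\right)^2 + \left(\tfrac{0}{1}\right)^2\right) = 0$
False branch: Yes: 0 instances, No: 2 instances
$\mathrm{GINI}(\text{False}) = 1 - \left(\left(\tfrac{0}{2}\right)^2 + \left(\tfrac{2}{2}\right)^2\right) = 0$
$\text{Weighted\_GINI}(\text{Wind\_Weak}) = \tfrac{1}{3} \cdot 0 + \tfrac{2}{3} \cdot 0 = 0$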

Wind_Strong and Wind_Weak are the purest splits here, both with a weighted GINI of 0. Because they are equally pure, one of them is selected at random. Here we select Wind_Strong, for which GINI(True) = 0 and GINI(False) = 0, so both of its branches become leaf nodes.
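The tie between Wind_Strong and Wind_Weak is structural rather than accidental: Wind has only two values, so the two encoded columns are complements and split the rows into the same two groups. The sketch below, which is only an illustration with made-up helper names, scores every column of the Outlook_Rain = TRUE sub-table and shows the two Wind columns ending up with the same value.

```python
COLUMNS = ["Temperature_Cool", "Temperature_Hot", "Temperature_Mild",
           "Humidity_High", "Humidity_Normal", "Wind_Strong", "Wind_Weak"]
T, F = True, False
# Rows of the Outlook_Rain = TRUE sub-table: seven Boolean flags, then Play.
SUBSET = [
    (T, F, F, F, T, T, F, "No"),
    (F, F, T, F, T, F, T, "Yes"),
    (F, F, T, T, F, T, F, "No"),
]


def gini(labels):
    """GINI impurity of a list of Yes/No labels."""
    if not labels:
        return 0.0  # assumption: an empty branch is counted as pure
    p = labels.count("Yes") / len(labels)
    return 1 - (p * p + (1 - p) * (1 - p))


def weighted_gini(rows, col):
    """Weighted GINI of splitting `rows` on Boolean column index `col`."""
    branches = ([r[-1] for r in rows if r[col]],
                [r[-1] for r in rows if not r[col]])
    return sum(len(b) / len(rows) * gini(b) for b in branches)


scores = {name: weighted_gini(SUBSET, i) for i, name in enumerate(COLUMNS)}
for name, s in scores.items():
    print(f"{name:17s} {s:.3f}")
```

The same reasoning applies to Humidity_High and Humidity_Normal, which also always receive identical scores.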
Outlook_Rain = FALSE

| Outlook_Overcast | Outlook_Sunny | Temperature_Cool | Temperature_Hot | Temperature_Mild | Humidity_High | Humidity_Normal | Wind_Strong | Wind_Weak | Play |
|---|---|---|---|---|---|---|---|---|---|
| TRUE | FALSE | TRUE | FALSE | FALSE | FALSE | TRUE | TRUE | FALSE | Yes |
| FALSE | TRUE | FALSE | FALSE | TRUE | TRUE | FALSE | FALSE | TRUE | No |
| FALSE | TRUE | TRUE | FALSE | FALSE | FALSE | TRUE | FALSE | TRUE | Yes |
| FALSE | TRUE | FALSE | FALSE | TRUE | FALSE | TRUE | TRUE | FALSE | Yes |
| TRUE | FALSE | FALSE | FALSE | TRUE | TRUE | FALSE | TRUE | FALSE | Yes |
| TRUE | FALSE | FALSE | TRUE | FALSE | FALSE | TRUE | FALSE | TRUE | Yes |

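Outlook_Overcast
True branch: Yes: 3 instances, No: 0 instances
$\mathrm{GINI}(\text{True}) = 1 - \left(\left(\tfrac{3}{3}\right)^2 + \left(\tfrac{0}{3}\right)^2\right) = 0$
False branch: Yes: 2 instances, No: 1 instance
$\mathrm{GINI}(\text{False}) = 1 - \left(\left(\tfrac{2}{3}\right)^2 + \left(\tfrac{1}{3}\right)^2\right) = 0.444$
$\text{Weighted\_GINI}(\text{Outlook\_Overcast}) = \tfrac{3}{6} \cdot 0 + \tfrac{3}{6} \cdot 0.444 = 0.222$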

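Outlook_Sunny
True branch: Yes: 2 instances, No: 1 instance
$\mathrm{GINI}(\text{True}) = 1 - \left(\left(\tfrac{2}{3}\right)^2 + \left(\tfrac{1}{3}\right)^2\right) = 0.444$
False branch: Yes: 3 instances, No: 0 instances
$\mathrm{GINI}(\text{False}) = 1 - \left(\left(\tfrac{3}{3}\right)^2 + \left(\tfrac{0}{3}\right)^2\right) = 0$
$\text{Weighted\_GINI}(\text{Outlook\_Sunny}) = \tfrac{3}{6} \cdot 0.444 + \tfrac{3}{6} \cdot 0 = 0.222$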

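Temperature_Cool
True branch: Yes: 2 instances, No: 0 instances
$\mathrm{GINI}(\text{True}) = 1 - \left(\left(\tfrac{2}{2}\right)^2 + \left(\tfrac{0}{2}\right)^2\right) = 0$
False branch: Yes: 3 instances, No: 1 instance
$\mathrm{GINI}(\text{False}) = 1 - \left(\left(\tfrac{3}{4}\right)^2 + \left(\tfrac{1}{4}\right)^2\right) = 0.375$
$\text{Weighted\_GINI}(\text{Temperature\_Cool}) = \tfrac{2}{6} \cdot 0 + \tfrac{4}{6} \cdot 0.375 = 0.25$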

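Temperature_Hot
True branch: Yes: 1 instance, No: 0 instances
$\mathrm{GINI}(\text{True}) = 1 - \left(\left(\tfrac{1}{1}\right)^2 + \left(\tfrac{0}{1}\right)^2\right) = 0$
False branch: Yes: 4 instances, No: 1 instance
$\mathrm{GINI}(\text{False}) = 1 - \left(\left(\tfrac{4}{5}\right)^2 + \left(\tfrac{1}{5}\right)^2\right) = 0.32$
$\text{Weighted\_GINI}(\text{Temperature\_Hot}) = \tfrac{1}{6} \cdot 0 + \tfrac{5}{6} \cdot 0.32 = 0.267$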

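Temperature_Mild
True branch: Yes: 2 instances, No: 1 instance
$\mathrm{GINI}(\text{True}) = 1 - \left(\left(\tfrac{2}{3}\right)^2 + \left(\tfrac{1}{3}\right)^2\right) = 0.444$
False branch: Yes: 3 instances, No: 0 instances
$\mathrm{GINI}(\text{False}) = 1 - \left(\left(\tfrac{3}{3}\right)^2 + \left(\tfrac{0}{3}\right)^2\right) = 0$
$\text{Weighted\_GINI}(\text{Temperature\_Mild}) = \tfrac{3}{6} \cdot 0.444 + \tfrac{3}{6} \cdot 0 = 0.222$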

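Humidity_High
True branch: Yes: 1 instance, No: 1 instance
$\mathrm{GINI}(\text{True}) = 1 - \left(\left(\tfrac{1}{2}\right)^2 + \left(\tfrac{1}{2}\right)^2\right) = 0.5$
False branch: Yes: 4 instances, No: 0 instances
$\mathrm{GINI}(\text{False}) = 1 - \left(\left(\tfrac{4}{4}\right)^2 + \left(\tfrac{0}{4}\right)^2\right) = 0$
$\text{Weighted\_GINI}(\text{Humidity\_High}) = \tfrac{2}{6} \cdot 0.5 + \tfrac{4}{6} \cdot 0 = 0.167$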

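Humidity_Normal
True branch: Yes: 4 instances, No: 0 instances
$\mathrm{GINI}(\text{True}) = 1 - \left(\left(\tfrac{4}{4}\right)^2 + \left(\tfrac{0}{4}\right)^2\right) = 0$
False branch: Yes: 1 instance, No: 1 instance
$\mathrm{GINI}(\text{False}) = 1 - \left(\left(\tfrac{1}{2}\right)^2 + \left(\tfrac{1}{2}\right)^2\right) = 0.5$
$\text{Weighted\_GINI}(\text{Humidity\_Normal}) = \tfrac{4}{6} \cdot 0 + \tfrac{2}{6} \cdot 0.5 = 0.167$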

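Wind_Strong
True branch: Yes: 3 instances, No: 0 instances
$\mathrm{GINI}(\text{True}) = 1 - \left(\left(\tfrac{3}{3}\right)^2 + \left(\tfrac{0}{3}\right)^2\right) = 0$
False branch: Yes: 2 instances, No: 1 instance
$\mathrm{GINI}(\text{False}) = 1 - \left(\left(\tfrac{2}{3}\right)^2 + \left(\tfrac{1}{3}\right)^2\right) = 0.444$
$\text{Weighted\_GINI}(\text{Wind\_Strong}) = \tfrac{3}{6} \cdot 0 + \tfrac{3}{6} \cdot 0.444 = 0.222$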

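Wind_Weak
True branch: Yes: 2 instances, No: 1 instance
$\mathrm{GINI}(\text{True}) = 1 - \left(\left(\tfrac{2}{3}\right)^2 + \left(\tfrac{1}{3}\right)^2\right) = 0.444$
False branch: Yes: 3 instances, No: 0 instances
$\mathrm{GINI}(\text{False}) = 1 - \left(\left(\tfrac{3}{3}\right)^2 + \left(\tfrac{0}{3}\right)^2\right) = 0$
$\text{Weighted\_GINI}(\text{Wind\_Weak}) = \tfrac{3}{6} \cdot 0.444 + \tfrac{3}{6} \cdot 0 = 0.222$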

Humidity_High and Humidity_Normal are the purest splits here, tied at the lowest weighted GINI (1/6 ≈ 0.167). Because they are equally pure, one of them is selected at random. Here we select Humidity_Normal, for which GINI(True) = 0 and GINI(False) = 0.5, so only its False branch needs to be split further.
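To see how the nine remaining features compare on the Outlook_Rain = FALSE branch, they can be ranked from purest to least pure. The following sketch assumes the tuple layout of Table 1 and uses exact fractions; all names in it are illustrative.

```python
from fractions import Fraction

# Table 1 rows: (Outlook, Temperature, Humidity, Wind, Play)
ROWS = [
    ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Strong", "No"),
]
NAMES = ("Outlook", "Temperature", "Humidity", "Wind")


def gini(labels):
    """Exact GINI impurity of a list of Yes/No labels."""
    if not labels:
        return Fraction(0)  # assumption: empty branch counts as pure
    p = Fraction(labels.count("Yes"), len(labels))
    return 1 - (p ** 2 + (1 - p) ** 2)


def weighted_gini(rows, i, value):
    """Weighted GINI of the Boolean feature 'attribute i == value'."""
    parts = ([r[-1] for r in rows if r[i] == value],
             [r[-1] for r in rows if r[i] != value])
    return sum(Fraction(len(p), len(rows)) * gini(p) for p in parts)


subset = [r for r in ROWS if r[0] != "Rain"]  # the Outlook_Rain = FALSE branch
features = [(i, v) for i in range(4) for v in sorted({r[i] for r in ROWS})
            if (i, v) != (0, "Rain")]  # Outlook_Rain itself is already used
ranking = sorted((weighted_gini(subset, i, v), f"{NAMES[i]}_{v}")
                 for i, v in features)
for score, name in ranking:
    print(f"{name:18s} {score}  ({float(score):.3f})")
```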

Humidity_Normal = FALSE (within Outlook_Rain = FALSE)

| Outlook_Overcast | Outlook_Sunny | Temperature_Cool | Temperature_Hot | Temperature_Mild | Humidity_High | Wind_Strong | Wind_Weak | Play |
|---|---|---|---|---|---|---|---|---|
| FALSE | TRUE | FALSE | FALSE | TRUE | TRUE | FALSE | TRUE | No |
| TRUE | FALSE | FALSE | FALSE | TRUE | TRUE | TRUE | FALSE | Yes |

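Outlook_Overcast
True branch: Yes: 1 instance, No: 0 instances
$\mathrm{GINI}(\text{True}) = 1 - \left(\left(\tfrac{1}{1}\right)^2 + \left(\tfrac{0}{1}\right)^2\right) = 0$
False branch: Yes: 0 instances, No: 1 instance
$\mathrm{GINI}(\text{False}) = 1 - \left(\left(\tfrac{0}{1}\right)^2 + \left(\tfrac{1}{1}\right)^2\right) = 0$
$\text{Weighted\_GINI}(\text{Outlook\_Overcast}) = \tfrac{1}{2} \cdot 0 + \tfrac{1}{2} \cdot 0 = 0$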

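Outlook_Sunny
True branch: Yes: 0 instances, No: 1 instance
$\mathrm{GINI}(\text{True}) = 1 - \left(\left(\tfrac{0}{1}\right)^2 + \left(\tfrac{1}{1}\right)^2\right) = 0$
False branch: Yes: 1 instance, No: 0 instances
$\mathrm{GINI}(\text{False}) = 1 - \left(\left(\tfrac{1}{1}\right)^2 + \left(\tfrac{0}{1}\right)^2\right) = 0$
$\text{Weighted\_GINI}(\text{Outlook\_Sunny}) = \tfrac{1}{2} \cdot 0 + \tfrac{1}{2} \cdot 0 = 0$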

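Temperature_Cool
True branch: Yes: 0 instances, No: 0 instances (empty branch)
$\mathrm{GINI}(\text{True}) = 0$ (taken as 0; the branch has weight $\tfrac{0}{2}$)
False branch: Yes: 1 instance, No: 1 instance
$\mathrm{GINI}(\text{False}) = 1 - \left(\left(\tfrac{1}{2}\right)^2 + \left(\tfrac{1}{2}\right)^2\right) = 0.5$
$\text{Weighted\_GINI}(\text{Temperature\_Cool}) = \tfrac{0}{2} \cdot 0 + \tfrac{2}{2} \cdot 0.5 = 0.5$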

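Temperature_Hot
True branch: Yes: 0 instances, No: 0 instances (empty branch)
$\mathrm{GINI}(\text{True}) = 0$ (taken as 0; the branch has weight $\tfrac{0}{2}$)
False branch: Yes: 1 instance, No: 1 instance
$\mathrm{GINI}(\text{False}) = 1 - \left(\left(\tfrac{1}{2}\right)^2 + \left(\tfrac{1}{2}\right)^2\right) = 0.5$
$\text{Weighted\_GINI}(\text{Temperature\_Hot}) = \tfrac{0}{2} \cdot 0 + \tfrac{2}{2} \cdot 0.5 = 0.5$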

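Temperature_Mild
True branch: Yes: 1 instance, No: 1 instance
$\mathrm{GINI}(\text{True}) = 1 - \left(\left(\tfrac{1}{2}\right)^2 + \left(\tfrac{1}{2}\right)^2\right) = 0.5$
False branch: Yes: 0 instances, No: 0 instances (empty branch)
$\mathrm{GINI}(\text{False}) = 0$ (taken as 0; the branch has weight $\tfrac{0}{2}$)
$\text{Weighted\_GINI}(\text{Temperature\_Mild}) = \tfrac{2}{2} \cdot 0.5 + \tfrac{0}{2} \cdot 0 = 0.5$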

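Humidity_High
True branch: Yes: 1 instance, No: 1 instance
$\mathrm{GINI}(\text{True}) = 1 - \left(\left(\tfrac{1}{2}\right)^2 + \left(\tfrac{1}{2}\right)^2\right) = 0.5$
False branch: Yes: 0 instances, No: 0 instances (empty branch)
$\mathrm{GINI}(\text{False}) = 0$ (taken as 0; the branch has weight $\tfrac{0}{2}$)
$\text{Weighted\_GINI}(\text{Humidity\_High}) = \tfrac{2}{2} \cdot 0.5 + \tfrac{0}{2} \cdot 0 = 0.5$

Wind_Strong
True branch: Yes: 1 instance, No: 0 instances
$\mathrm{GINI}(\text{True}) = 1 - \left(\left(\tfrac{1}{1}\right)^2 + \left(\tfrac{0}{1}\right)^2\right) = 0$
False branch: Yes: 0 instances, No: 1 instance
$\mathrm{GINI}(\text{False}) = 1 - \left(\left(\tfrac{0}{1}\right)^2 + \left(\tfrac{1}{1}\right)^2\right) = 0$
$\text{Weighted\_GINI}(\text{Wind\_Strong}) = \tfrac{1}{2} \cdot 0 + \tfrac{1}{2} \cdot 0 = 0$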

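Wind_Weak
True branch: Yes: 0 instances, No: 1 instance
$\mathrm{GINI}(\text{True}) = 1 - \left(\left(\tfrac{0}{1}\right)^2 + \left(\tfrac{1}{1}\right)^2\right) = 0$
False branch: Yes: 1 instance, No: 0 instances
$\mathrm{GINI}(\text{False}) = 1 - \left(\left(\tfrac{1}{1}\right)^2 + \left(\tfrac{0}{1}\right)^2\right) = 0$
$\text{Weighted\_GINI}(\text{Wind\_Weak}) = \tfrac{1}{2} \cdot 0 + \tfrac{1}{2} \cdot 0 = 0$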

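Outlook_Overcast, Outlook_Sunny, Wind_Strong, and Wind_Weak all reach a weighted GINI of 0 here, so any of them separates the two remaining rows perfectly. We select Outlook_Sunny, for which GINI(True) = 0 and GINI(False) = 0.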
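The selection steps worked through above can be combined into one recursive procedure: score every feature, split on the purest one, and repeat on each branch until a branch contains a single class. The sketch below is one way to write this, not the chapter's own implementation. Because the chapter breaks ties at random, the `PREFER` tuple is an assumption added purely so that ties resolve to the same picks as in the text; the code also has no guard for data that no feature can separate.

```python
# Table 1 rows: (Outlook, Temperature, Humidity, Wind, Play)
ROWS = [
    ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Strong", "No"),
]
NAMES = ("Outlook", "Temperature", "Humidity", "Wind")
# Assumed tie-break order, chosen only to mirror the picks made in the text.
PREFER = ("Outlook_Rain", "Humidity_Normal", "Outlook_Sunny", "Wind_Strong")


def gini(labels):
    if not labels:
        return 0.0  # assumption: an empty branch is counted as pure
    p = labels.count("Yes") / len(labels)
    return 1 - (p * p + (1 - p) * (1 - p))


def weighted_gini(rows, i, v):
    parts = ([r[-1] for r in rows if r[i] == v],
             [r[-1] for r in rows if r[i] != v])
    return sum(len(p) / len(rows) * gini(p) for p in parts)


def build(rows):
    """Return a leaf label, or {feature: {False: subtree, True: subtree}}."""
    labels = {r[-1] for r in rows}
    if len(labels) == 1:
        return labels.pop()  # a pure node becomes a leaf
    candidates = []
    for i, attr in enumerate(NAMES):
        for v in sorted({r[i] for r in rows}):
            name = f"{attr}_{v}"
            rank = PREFER.index(name) if name in PREFER else len(PREFER)
            # Round so that floating-point noise does not break exact ties.
            score = round(weighted_gini(rows, i, v), 9)
            candidates.append((score, rank, name, i, v))
    _, _, name, i, v = min(candidates)  # lowest score, then preferred name
    return {name: {False: build([r for r in rows if r[i] != v]),
                   True: build([r for r in rows if r[i] == v])}}


tree = build(ROWS)
print(tree)
```

A different tie-break order would produce a different but equally pure tree on this dataset.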

Resulting decision tree:

Outlook_Rain
├── FALSE: Humidity_Normal
│   ├── FALSE: Outlook_Sunny
│   │   ├── FALSE: Yes
│   │   └── TRUE: No
│   └── TRUE: Yes
└── TRUE: Wind_Strong
    ├── FALSE: Yes
    └── TRUE: No
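Read from the root down, the tree amounts to three nested tests. As an illustrative sketch (the function name `predict` and its argument order are assumptions), it can be written as plain conditionals and checked against every row of Table 1. Temperature never appears because no split in the tree uses it.

```python
def predict(outlook, humidity, wind):
    """Follow the decision tree and return the predicted Play label."""
    if outlook == "Rain":                       # Outlook_Rain = TRUE
        return "No" if wind == "Strong" else "Yes"
    if humidity == "Normal":                    # Humidity_Normal = TRUE
        return "Yes"
    return "No" if outlook == "Sunny" else "Yes"  # Outlook_Sunny split


# Table 1 rows: (Outlook, Temperature, Humidity, Wind, Play)
ROWS = [
    ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Strong", "No"),
]

correct = sum(predict(o, h, w) == play for o, _, h, w, play in ROWS)
print(f"{correct} of {len(ROWS)} training rows classified correctly")
```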
