Data Science Formula and Solved Example

The document discusses a decision tree example problem involving children's playtime based on various weather attributes. It details the calculation of information gain for attributes such as temperature, weather, humidity, and wind, ultimately leading to the selection of features for splitting the decision tree. The final tree structure is presented, illustrating how the attributes influence the decision to play or not.
Decision Trees Example Problem

Consider the following data, where the Y label is whether or not the child goes out to play.

Day  Weather  Temperature  Humidity  Wind    Play?
1    Sunny    Hot          High      Weak    No
2    Cloudy   Hot          High      Weak    Yes
3    Sunny    Mild         Normal    Strong  Yes
4    Cloudy   Mild         High      Strong  Yes
5    Rainy    Mild         High      Strong  No
6    Rainy    Cool         Normal    Strong  No
7    Rainy    Mild         High      Weak    Yes
8    Sunny    Hot          High      Strong  No
9    Cloudy   Hot          Normal    Weak    Yes
10   Rainy    Mild         High      Strong  No
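For reference, here is the same table transcribed into Python lists (one list per column, index i corresponding to day i + 1). The variable names are illustrative choices, used only to support the small sketches later in these notes.

```python
# The table above, one Python list per column.
weather     = ["Sunny", "Cloudy", "Sunny", "Cloudy", "Rainy", "Rainy", "Rainy", "Sunny", "Cloudy", "Rainy"]
temperature = ["Hot", "Hot", "Mild", "Mild", "Mild", "Cool", "Mild", "Hot", "Hot", "Mild"]
humidity    = ["High", "High", "Normal", "High", "High", "Normal", "High", "High", "Normal", "High"]
wind        = ["Weak", "Weak", "Strong", "Strong", "Strong", "Strong", "Weak", "Strong", "Weak", "Strong"]
play        = ["No", "Yes", "Yes", "Yes", "No", "No", "Yes", "No", "Yes", "No"]
```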


Step 1: Calculate the IG (information gain) for each attribute (feature)

Initial entropy = H(Y) = − Σ_y P(Y = y) log2 P(Y = y)

= −P(Y = yes) log2 P(Y = yes) − P(Y = no) log2 P(Y = no)

= −(0.5) log2(0.5) − (0.5) log2(0.5)

= 1
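As a sanity check, here is a minimal Python sketch of the same entropy formula, using the `play` list transcribed above. The helper name `entropy` is an illustrative choice, not something from the original notes.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy in bits: H(Y) = -sum_y P(Y = y) * log2 P(Y = y)."""
    total = len(labels)
    return -sum((count / total) * math.log2(count / total)
                for count in Counter(labels).values())

print(entropy(play))  # 5 Yes / 5 No -> 1.0
```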

Temperature:

    HOT:  N, Y, N, Y
    MILD: Y, Y, N, Y, N
    COOL: N

Total entropy of this division is:

H(Y | temp) = − Σ_x P(temp = x) Σ_y P(Y = y | temp = x) log2 P(Y = y | temp = x)

= −( P(temp = H) Σ_y P(Y = y | temp = H) log2 P(Y = y | temp = H)
   + P(temp = M) Σ_y P(Y = y | temp = M) log2 P(Y = y | temp = M)
   + P(temp = C) Σ_y P(Y = y | temp = C) log2 P(Y = y | temp = C) )

= −( (0.4)((1/2) log2(1/2) + (1/2) log2(1/2))
   + (0.5)((3/5) log2(3/5) + (2/5) log2(2/5))
   + (0.1)((1) log2(1) + (0) log2(0)) )

= 0.8855

IG(Y, temp) = 1 − 0.8855 = 0.1145
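The same "weighted child entropy, then subtract from the parent entropy" computation can be sketched with a small helper. The name `information_gain` is illustrative; it reuses `entropy` and the column lists from the sketches above.

```python
def information_gain(attribute_values, labels):
    """IG(Y, A) = H(Y) - H(Y | A), where H(Y | A) is the weighted average
    of the entropies of the subsets created by each value of A."""
    total = len(labels)
    conditional = 0.0
    for value in set(attribute_values):
        subset = [y for x, y in zip(attribute_values, labels) if x == value]
        conditional += (len(subset) / total) * entropy(subset)
    return entropy(labels) - conditional

print(round(information_gain(temperature, play), 4))  # 0.1145
```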


Weather:

    SUNNY:  N, Y, N
    CLOUDY: Y, Y, Y
    RAINY:  N, N, Y, N

Total entropy of this division is:

H(Y | weather) = − Σ_x P(weather = x) Σ_y P(Y = y | weather = x) log2 P(Y = y | weather = x)

= −( P(weather = S) Σ_y P(Y = y | weather = S) log2 P(Y = y | weather = S)
   + P(weather = C) Σ_y P(Y = y | weather = C) log2 P(Y = y | weather = C)
   + P(weather = R) Σ_y P(Y = y | weather = R) log2 P(Y = y | weather = R) )

= −( (0.3)((1/3) log2(1/3) + (2/3) log2(2/3))
   + (0.3)((1) log2(1) + (0) log2(0))
   + (0.4)((1/4) log2(1/4) + (3/4) log2(3/4)) )

= 0.6

IG(Y, weather) = 1 – 0.6 = 0.4


Humidity:

    HIGH:   Y, Y, Y, N, N, N, N
    NORMAL: Y, N, Y

Total entropy of this division is:

H(Y | hum) = − Σ_x P(hum = x) Σ_y P(Y = y | hum = x) log2 P(Y = y | hum = x)

= −( P(hum = H) Σ_y P(Y = y | hum = H) log2 P(Y = y | hum = H)
   + P(hum = N) Σ_y P(Y = y | hum = N) log2 P(Y = y | hum = N) )

= −( (0.7)((3/7) log2(3/7) + (4/7) log2(4/7))
   + (0.3)((2/3) log2(2/3) + (1/3) log2(1/3)) )

= 0.9651

IG(Y, hum) = 1 − 0.9651 = 0.0349


Wind:

    STRONG: Y, Y, N, N, N, N
    WEAK:   N, Y, Y, Y

Total entropy of this division is:

H(Y | wind) = − Σ_x P(wind = x) Σ_y P(Y = y | wind = x) log2 P(Y = y | wind = x)

= −( P(wind = S) Σ_y P(Y = y | wind = S) log2 P(Y = y | wind = S)
   + P(wind = W) Σ_y P(Y = y | wind = W) log2 P(Y = y | wind = W) )

= −( (0.6)((2/6) log2(2/6) + (4/6) log2(4/6))
   + (0.4)((1/4) log2(1/4) + (3/4) log2(3/4)) )

= 0.8755

IG(Y, wind) = 1 – 0.8755 = 0.1245

Step 2: Choose which feature to split with!

IG(Y, wind) = 0.1245

IG(Y, hum) = 0.0349

IG(Y, weather) = 0.4

IG(Y, temp) = 0.1145

Weather gives the largest information gain, so the root node splits on Weather.
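Looping the same helper over all four columns reproduces this ranking (again reusing the lists and the `entropy` / `information_gain` sketches from above):

```python
features = {"weather": weather, "temperature": temperature,
            "humidity": humidity, "wind": wind}
gains = {name: information_gain(column, play) for name, column in features.items()}
for name, gain in sorted(gains.items(), key=lambda item: -item[1]):
    print(f"{name:12s} {gain:.4f}")
# weather      0.4000  <- chosen as the root split
# wind         0.1245
# temperature  0.1145
# humidity     0.0349
```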


Step 3: Repeat for each level (sad, I know)

Temperature

    Weather
        SUNNY:  N, Y, N
            Temperature
                HOT:  N, N
                MILD: Y
                COOL: -
        CLOUDY: Y, Y, Y
        RAINY:  N, N, Y, N
            Temperature
                HOT:  -
                MILD: N, Y, N
                COOL: N
Entropy of “Sunny” node = −((1/3) log2(1/3) + (2/3) log2(2/3)) = 0.9183

Entropy of its children = 0

IG = 0.9183 − 0 = 0.9183

Entropy of “Rainy” node = −((1/4) log2(1/4) + (3/4) log2(3/4)) = 0.8113

Entropy of children = −(3/4)((1/3) log2(1/3) + (2/3) log2(2/3)) + 0 = 0.6887

IG = 0.8113 − 0.6887 = 0.1226
Humidity

    Weather
        SUNNY:  N, Y, N
            Humidity
                HIGH:   N, N
                NORMAL: Y
        CLOUDY: Y, Y, Y
        RAINY:  N, N, Y, N
            Humidity
                HIGH:   N, Y, N
                NORMAL: N

Entropy of “Sunny” node = −((1/3) log2(1/3) + (2/3) log2(2/3)) = 0.9183

Entropy of its children = 0

IG = 0.9183 − 0 = 0.9183

Entropy of “Rainy” node = −((1/4) log2(1/4) + (3/4) log2(3/4)) = 0.8113

Entropy of children = −(3/4)((1/3) log2(1/3) + (2/3) log2(2/3)) + 0 = 0.6887

IG = 0.8113 − 0.6887 = 0.1226
Wind

    Weather
        SUNNY:  N, Y, N
            Wind
                STRONG: N, Y
                WEAK:   N
        CLOUDY: Y, Y, Y
        RAINY:  N, N, Y, N
            Wind
                STRONG: N, N, N
                WEAK:   Y

Entropy of “Sunny” node = −((1/3) log2(1/3) + (2/3) log2(2/3)) = 0.9183

Entropy of its children = −(2/3)((1/2) log2(1/2) + (1/2) log2(1/2)) + 0 = 0.6667

IG = 0.9183 − 0.6667 = 0.2516

Entropy of “Rainy” node = −((1/4) log2(1/4) + (3/4) log2(3/4)) = 0.8113

Entropy of children = 0

IG = 0.8113 − 0 = 0.8113
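Step 3 amounts to re-running the information-gain computation on just the rows that reach each node. A sketch of that idea, building on the lists and helpers above (the `rows` and `node_information_gain` names are illustrative):

```python
rows = [dict(weather=w, temperature=t, humidity=h, wind=wd, play=p)
        for w, t, h, wd, p in zip(weather, temperature, humidity, wind, play)]

def node_information_gain(node_rows, attribute):
    """IG of `attribute`, measured only on the rows that reach this node."""
    labels = [row["play"] for row in node_rows]
    values = [row[attribute] for row in node_rows]
    return information_gain(values, labels)

for branch in ("Sunny", "Rainy"):
    node_rows = [row for row in rows if row["weather"] == branch]
    gains = {a: round(node_information_gain(node_rows, a), 4)
             for a in ("temperature", "humidity", "wind")}
    print(branch, gains)
# Sunny {'temperature': 0.9183, 'humidity': 0.9183, 'wind': 0.2516}
# Rainy {'temperature': 0.1226, 'humidity': 0.1226, 'wind': 0.8113}
```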
Step 4: Choose feature for each node to split on!

“Sunny” node:

IG(Y, temperature) = IG(Y, humidity) = 0.9183

IG(Y, wind) = 0.2516

Temperature and humidity tie, and humidity is chosen for this split.

“Rainy” node:

IG(Y, temperature) = IG(Y, humidity) = 0.1226

IG(Y, wind) = 0.8113

Wind gives the largest gain, so the Rainy branch splits on wind.

Final Tree!

    Weather
        SUNNY:  N, Y, N  -> split on Humidity
            HIGH:   N, N  -> No
            NORMAL: Y     -> Yes
        CLOUDY: Y, Y, Y   -> Yes
        RAINY:  N, N, Y, N -> split on Wind
            STRONG: N, N, N -> No
            WEAK:   Y       -> Yes
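The finished tree is small enough to write out directly as an if/else chain. A minimal sketch (the function name is illustrative; the rules are exactly the branches drawn above):

```python
def predict_play(weather, humidity, wind):
    """Hand-coded version of the final tree above (not a learned model object)."""
    if weather == "Cloudy":
        return "Yes"
    if weather == "Sunny":                        # Sunny branch splits on humidity
        return "Yes" if humidity == "Normal" else "No"
    return "Yes" if wind == "Weak" else "No"      # Rainy branch splits on wind

print(predict_play("Sunny", "Normal", "Strong"))  # Yes (matches day 3)
print(predict_play("Rainy", "High", "Strong"))    # No  (matches day 5)
```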
Boosting

(https://www.ccs.neu.edu/home/vip/teach/MLcourse/4_boosting/slides/boosting.pdf)
