Data Science Formula and Solved Example
Decision Trees Example Problem
Consider the following data, where the Y label is whether or not the child goes out to play.
Step 1: Compute the entropy of the root node (5 "yes" and 5 "no" samples).

Entropy of root = −P(Y = yes) log2 P(Y = yes) − P(Y = no) log2 P(Y = no) = −(5/10) log2(5/10) − (5/10) log2(5/10) = 1
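As a quick sanity check, a few lines of Python (standard library only) reproduce the root-node entropy from the class counts:

```python
import math

def entropy(counts):
    """Shannon entropy (base 2) of a label distribution given as raw counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)

# Root node: 5 "yes" and 5 "no" labels -> exactly 1 bit of entropy.
print(entropy([5, 5]))  # 1.0
```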
Step 2: Split the root on each candidate feature and compute the weighted entropy of the children:

Temperature:
Weather: entropy of children = 0.6
Humidity (HIGH: Y, Y, Y, N, N, N, N; NORMAL: Y, N, Y): entropy of children = 0.8651
Wind (STRONG: Y, Y, N, N, N, N; WEAK: N, Y, Y, Y): entropy of children = 0.8755

Weather yields the lowest children entropy and hence the highest information gain (IG = 1 − 0.6 = 0.4), so the root is split on Weather; its impure "Sunny" and "Rainy" children are examined next.
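The weighted children entropy can be checked the same way; the Wind split at the root (STRONG: 2 yes / 4 no, WEAK: 3 yes / 1 no, as in the table above) is a minimal sketch:

```python
import math

def entropy(counts):
    """Shannon entropy (base 2) of a label distribution given as raw counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)

def children_entropy(groups):
    """Weighted average entropy of child nodes; each group is [yes_count, no_count]."""
    n = sum(sum(g) for g in groups)
    return sum(sum(g) / n * entropy(g) for g in groups)

# Wind at the root: STRONG = [2 yes, 4 no], WEAK = [3 yes, 1 no]
print(round(children_entropy([[2, 4], [3, 1]]), 4))  # 0.8755
```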
Step 3: Compute the information gain of each feature at the "Sunny" and "Rainy" nodes.

Temperature

"Sunny" node: HOT: N, N; MILD: Y; COOL: -
"Rainy" node: HOT: -; MILD: N, Y, N; COOL: N

Entropy of "Sunny" node = −((1/3) log2(1/3) + (2/3) log2(2/3)) = 0.9183
Entropy of its children = 0 (both non-empty branches are pure)
IG = 0.9183 − 0 = 0.9183

Entropy of "Rainy" node = −((1/4) log2(1/4) + (3/4) log2(3/4)) = 0.8113
Entropy of children = −(3/4)((1/3) log2(1/3) + (2/3) log2(2/3)) + 0 = 0.6887
IG = 0.8113 − 0.6887 = 0.1226
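The information-gain arithmetic for the "Rainy" node under the Temperature split can be sketched in Python (counts are [yes, no], read off the tables above):

```python
import math

def entropy(counts):
    """Shannon entropy (base 2) of a label distribution given as raw counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)

def info_gain(parent, children):
    """IG = entropy(parent) minus the weighted entropy of the children."""
    n = sum(parent)
    return entropy(parent) - sum(sum(g) / n * entropy(g) for g in children)

# Rainy node: 1 yes / 3 no; Temperature splits it into MILD = [1, 2] and COOL = [0, 1].
print(round(entropy([1, 3]), 4))                      # 0.8113
print(round(info_gain([1, 3], [[1, 2], [0, 1]]), 4))  # 0.1226
```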
Humidity

"Sunny" node: HIGH: N, N; NORMAL: Y
"Rainy" node: HIGH: N, Y, N; NORMAL: N

Entropy of "Sunny" node = −((1/3) log2(1/3) + (2/3) log2(2/3)) = 0.9183
Entropy of its children = 0 (both branches are pure)
IG = 0.9183 − 0 = 0.9183

Entropy of "Rainy" node = −((1/4) log2(1/4) + (3/4) log2(3/4)) = 0.8113
Entropy of children = −(3/4)((1/3) log2(1/3) + (2/3) log2(2/3)) + 0 = 0.6887
IG = 0.8113 − 0.6887 = 0.1226
Wind

"Sunny" node: STRONG: N, Y; WEAK: N
"Rainy" node: STRONG: N, N, N; WEAK: Y

Entropy of "Sunny" node = −((1/3) log2(1/3) + (2/3) log2(2/3)) = 0.9183
Entropy of its children = −(2/3)((1/2) log2(1/2) + (1/2) log2(1/2)) + 0 = 0.6667
IG = 0.9183 − 0.6667 = 0.2516

Entropy of "Rainy" node = −((1/4) log2(1/4) + (3/4) log2(3/4)) = 0.8113
Entropy of children = 0 (both branches are pure)
IG = 0.8113 − 0 = 0.8113
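The same check applies to the Wind splits; the "Sunny" side has one mixed child while both "Rainy" children are pure:

```python
import math

def entropy(counts):
    """Shannon entropy (base 2) of a label distribution given as raw counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)

def info_gain(parent, children):
    """IG = entropy(parent) minus the weighted entropy of the children."""
    n = sum(parent)
    return entropy(parent) - sum(sum(g) / n * entropy(g) for g in children)

# Sunny node (1 yes / 2 no): STRONG = [1, 1], WEAK = [0, 1]
print(round(info_gain([1, 2], [[1, 1], [0, 1]]), 4))  # 0.2516
# Rainy node (1 yes / 3 no): STRONG = [0, 3], WEAK = [1, 0] -> both pure
print(round(info_gain([1, 3], [[0, 3], [1, 0]]), 4))  # 0.8113
```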
Step 4: Choose feature for each node to split on!
"Sunny" node: Humidity and Temperature tie at IG = 0.9183; Humidity is chosen.
"Rainy" node: Wind has the highest IG (0.8113).
Final Tree!
Weather
  SUNNY -> Humidity
    HIGH -> N, N
    NORMAL -> Y
  RAINY -> Wind
    STRONG -> N, N, N
    WEAK -> Y
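The finished tree can be hard-coded as a tiny classifier; this is only an illustrative sketch, with attribute and value names taken from the tables above:

```python
def predict(weather, humidity, wind):
    """Classify one example with the learned tree: Weather first, then Humidity or Wind."""
    if weather == "SUNNY":
        return "Y" if humidity == "NORMAL" else "N"
    if weather == "RAINY":
        return "Y" if wind == "WEAK" else "N"
    raise ValueError("unseen Weather value: " + weather)

print(predict("SUNNY", "HIGH", "WEAK"))    # N
print(predict("RAINY", "HIGH", "WEAK"))    # Y
print(predict("RAINY", "HIGH", "STRONG"))  # N
```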
Boosting
(https://www.ccs.neu.edu/home/vip/teach/MLcourse/4_boosting/slides/boosting.pdf)