1. (40pts) TABLE 1.
Decision: CATEGORY
Weekend (Example)  Weather  Parents  Money  Decision (Category)
W1                 Sunny    Yes      Rich   Cinema
W2                 Sunny    No       Rich   Tennis
W3                 Windy    Yes      Rich   Cinema
W4                 Rainy    Yes      Poor   Cinema
W5                 Rainy    No       Rich   Stay in
W6                 Rainy    Yes      Poor   Cinema
W7                 Windy    No       Poor   Cinema
W8                 Windy    No       Rich   Shopping
W9                 Windy    Yes      Rich   Cinema
W10                Sunny    No       Rich   Tennis
a. Create a Decision Tree
b. Create a Model based on Naïve Bayes
c. Student Number ending in Prime Number: Determine the Decision if
Weather: Rainy, Parents: Yes, Money: Rich
Student Number ending in Non-Prime Number: Determine the Decision if
Weather: Sunny, Parents: Yes, Money: Poor
A. Decision Tree
Step 1
Cinema = 6
Tennis = 2
Stay in = 1
Shopping = 1
H(Category) = H(3/5, 1/5, 1/10, 1/10)
H(3/5, 1/5, 1/10, 1/10) = (-3/5 log2 3/5) – (1/5 log2 1/5) – (1/10 log2 1/10) – (1/10 log2 1/10)
=0.444+ 0.464 + 0.332 + 0.332
H(Category)=1.572
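The Step 1 entropy can be checked with a short Python sketch. The `entropy` helper and the `decisions` list are illustrative names, not part of the assignment; the list is the Decision column of Table 1 in row order. The exact value can differ from the hand calculation in the third decimal because the hand calculation rounds each term.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


# Decision column of Table 1, W1..W10
decisions = ["Cinema", "Tennis", "Cinema", "Cinema", "Stay in",
             "Cinema", "Cinema", "Shopping", "Cinema", "Tennis"]

h_category = entropy(decisions)
print(f"H(Category) = {h_category:.3f}")
```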
Step 2
H(Category/weather), H(Category/parents), H(Category/money)
H(Category/weather)
Sunny * H + Windy * H + Rainy * H = 3/10 (1/3, 2/3, 0/3, 0/3) + 4/10 (3/4, 0/4, 0/4, 1/4) + 3/10 (2/3, 0/3, 1/3, 0/3)
H(Category/weather) = 3/10 ((-1/3 log2 1/3) – (2/3 log2 2/3)) + 4/10 ((-3/4 log2 3/4) – (1/4 log2 1/4)) + 3/10 ((-2/3 log2 2/3) – (1/3 log2 1/3)), taking 0 log2 0 = 0
= 3/10 (0.92) + 4/10 (0.81) + 3/10 (0.92)
= 0.276 + 0.324 + 0.276
H(Category/weather) = 0.88
H(Category/parents)
Yes * H + No * H = 5/10 (5/5, 0/5, 0/5, 0/5) + 5/10 (1/5, 2/5, 1/5, 1/5)
H(Category/parents) = 5/10 ((-5/5 log2 5/5)) + 5/10 ((-1/5 log2 1/5) – (2/5 log2 2/5) – (1/5 log2 1/5) – (1/5 log2 1/5))
=5/10 (0) + 5/10 (1.92)
= 0 + 0.96
H(Category/parents) = 0.96
H(Category/money)
Rich * H + Poor * H = 7/10 (3/7, 2/7, 1/7, 1/7) + 3/10 (3/3, 0/3, 0/3, 0/3)
H(Category/money) = 7/10 ((-3/7 log2 3/7) – (2/7 log2 2/7) – (1/7 log2 1/7) – (1/7 log2 1/7)) + 3/10 ((-3/3 log2 3/3))
= 7/10 (1.85) + 3/10 (0)
= 1.295 + 0
H(Category/money) = 1.295
Step 3
H(Category/weather) = 0.88 | H(Category/parents) = 0.96 | H(Category/money) = 1.295
I(Category/weather) = 1.572 – 0.88 = 0.692
I(Category/parents) = 1.572 – 0.96 = 0.612
I(Category/money) = 1.572 – 1.295 = 0.277
Max(0.692, 0.612, 0.277) = 0.692, so Weather is best
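Steps 2 and 3 can be reproduced from the raw table. The sketch below is a minimal check, not a full decision-tree implementation; `ROWS`, `ATTRS`, `conditional_entropy` and `info_gain` are illustrative names. Values are computed without intermediate rounding, so they may differ slightly from the hand-calculated gains, but the ranking of the attributes is the same.

```python
from collections import Counter, defaultdict
from math import log2

# Table 1 rows: (Weather, Parents, Money, Decision)
ROWS = [
    ("Sunny", "Yes", "Rich", "Cinema"),
    ("Sunny", "No", "Rich", "Tennis"),
    ("Windy", "Yes", "Rich", "Cinema"),
    ("Rainy", "Yes", "Poor", "Cinema"),
    ("Rainy", "No", "Rich", "Stay in"),
    ("Rainy", "Yes", "Poor", "Cinema"),
    ("Windy", "No", "Poor", "Cinema"),
    ("Windy", "No", "Rich", "Shopping"),
    ("Windy", "Yes", "Rich", "Cinema"),
    ("Sunny", "No", "Rich", "Tennis"),
]
ATTRS = {"Weather": 0, "Parents": 1, "Money": 2}  # attribute -> column index


def entropy(labels):
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def conditional_entropy(rows, col):
    """Weighted entropy of the decision after splitting on column `col`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[col]].append(row[-1])
    return sum(len(g) / len(rows) * entropy(g) for g in groups.values())


def info_gain(rows, col):
    return entropy([r[-1] for r in rows]) - conditional_entropy(rows, col)


gains = {name: info_gain(ROWS, col) for name, col in ATTRS.items()}
best = max(gains, key=gains.get)
for name, value in gains.items():
    print(f"I(Category/{name}) = {value:.3f}")
print("Best attribute:", best)
```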
Step 4
Sunny = H(1/3, 2/3, 0/3, 0/3) = 0.92
H (Category/parents)
1/3 (1/1, 0/1, 0/1, 0/1) + 2/3 (0/2, 2/2, 0/2, 0/2)
H (Category/parents) = 1/3 ((-1/1 log2 1/1)) + 2/3 ((-2/2 log2 2/2))
= 1/3 (0) + 2/3 (0)
H (Category/parents) = 0
I (Category/parents) = H(1/3, 2/3, 0/3, 0/3) – 0
I (Category/parents) = 0.92
H (Category/money)
3/3 ( 1/3, 2/3, 0/3, 0/3) + 0/3 (0,0,0,0)
H (Category/money) = 3/3 ((-1/3 log2 1/3) – (2/3 log2 2/3))
= 3/3 (0.92)
H (Category/money) = 0.92
I (Category/money) = 0.92 – 0.92 = 0
Max (0.92, 0) = 0.92, so Parents is best for the Sunny branch
Windy = H (3/4, 0/4, 0/4, 1/4) = 0.811
H (Category/parents)
2/4 (2/2, 0/2, 0/2, 0/2) + 2/4 (1/2, 0/2, 0/2, 1/2)
H (Category/parents) = 2/4 ((-2/2 log2 2/2)) + 2/4 ((-1/2 log2 1/2) – (1/2 log2 1/2))
= 2/4 (0) + 2/4 (1)
= 0 + 0.5
H (Category/parents) = 0.5
I (Category/parents) = 0.811 – 0.5 = 0.311
H (Category/money)
3/4 (2/3, 0/3, 0/3, 1/3) + 1/4 (1/1, 0/1, 0/1, 0/1)
H (Category/money) = 3/4 ((-2/3 log2 2/3) – (1/3 log2 1/3)) + 1/4 ((-1/1 log2 1/1))
= 3/4 (0.918) + 1/4 (0)
H (Category/money) = 0.689
I (Category/money) = 0.811 – 0.689 = 0.122
Max (0.311, 0.122) = 0.311, so Parents is best for the Windy branch
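The branch-level splits of Step 4 can be checked by growing the whole tree recursively. This is a minimal ID3-style sketch under stated assumptions: `build_tree`, `split` and `gain` are illustrative names, ties between equal gains are broken by attribute order (Weather, Parents, Money), and a leaf is created as soon as a branch contains a single decision.

```python
from collections import Counter, defaultdict
from math import log2

# Table 1 rows: (Weather, Parents, Money, Decision)
ROWS = [
    ("Sunny", "Yes", "Rich", "Cinema"), ("Sunny", "No", "Rich", "Tennis"),
    ("Windy", "Yes", "Rich", "Cinema"), ("Rainy", "Yes", "Poor", "Cinema"),
    ("Rainy", "No", "Rich", "Stay in"), ("Rainy", "Yes", "Poor", "Cinema"),
    ("Windy", "No", "Poor", "Cinema"), ("Windy", "No", "Rich", "Shopping"),
    ("Windy", "Yes", "Rich", "Cinema"), ("Sunny", "No", "Rich", "Tennis"),
]


def entropy(labels):
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def split(rows, col):
    """Group rows by the value found in column `col`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[col]].append(row)
    return groups


def gain(rows, col):
    remainder = sum(len(g) / len(rows) * entropy([r[-1] for r in g])
                    for g in split(rows, col).values())
    return entropy([r[-1] for r in rows]) - remainder


def build_tree(rows, attrs):
    """Return a nested dict {attribute: {value: subtree-or-decision}}."""
    labels = [r[-1] for r in rows]
    if len(set(labels)) == 1 or not attrs:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority decision
    best = max(attrs, key=lambda a: gain(rows, attrs[a]))  # first max wins ties
    rest = {a: c for a, c in attrs.items() if a != best}
    return {best: {value: build_tree(group, rest)
                   for value, group in split(rows, attrs[best]).items()}}


tree = build_tree(ROWS, {"Weather": 0, "Parents": 1, "Money": 2})
print(tree)
```

Under these assumptions the root is Weather, and both the Sunny and Windy branches split on Parents. In the Rainy branch Parents and Money have equal gain, so the choice there depends only on the tie-breaking order.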
B. Naïve Bayes
Student Number: 2020100575 (ends in 5, a prime number)
Given: Weather: Rainy, Parents: Yes, Money: Rich
Step 1
P(C1) = P (Decision = Cinema) = 6/10 = 0.6
P (C2) = P (Decision = Tennis) = 2/10 = 0.2
P (C3) = P (Decision = Stay In) = 1/10 = 0.1
P (C4) = P (Decision = Shopping) = 1/10 = 0.1
Step 2
P(Rainy/Cinema) = 2/6 = 0.333
P(Rainy/Tennis) = 0/2 = 0
P(Rainy/Stay In) = 1/1 = 1
P(Rainy/Shopping) = 0/1 = 0
P (Yes / Cinema) = 5/6 = 0.833
P (Yes/ Tennis) = 0/2 = 0
P (Yes/ Stay In) = 0/1 = 0
P (Yes/ Shopping) = 0/1 = 0
P (Rich/ Cinema) = 3/6 = 0.5
P (Rich/ Tennis) = 2/2 = 1
P (Rich/ Stay In) = 1/1 = 1
P (Rich/ Shopping) = 1/1 =1
P(x/Cinema) = 0.333 * 0.833 * 0.5 = 0.139
P(x/Tennis) = 0 * 0* 1 = 0
P(x/Stay In) = 1 * 0 * 1 = 0
P(x/Shopping) = 0 * 0 * 1 = 0
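P(Cinema) * P(x/Cinema) = 0.6 * 0.139 = 0.083
P(Tennis) * P(x/Tennis) = P(Stay In) * P(x/Stay In) = P(Shopping) * P(x/Shopping) = 0
Prediction: Cinema (the only class with a non-zero score)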
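The Naïve Bayes steps above can be sketched in Python directly from Table 1. The function name `naive_bayes_scores` is illustrative, and the sketch assumes plain relative-frequency estimates with no smoothing, matching the hand calculation.

```python
from collections import Counter

# Table 1 rows: (Weather, Parents, Money, Decision)
ROWS = [
    ("Sunny", "Yes", "Rich", "Cinema"), ("Sunny", "No", "Rich", "Tennis"),
    ("Windy", "Yes", "Rich", "Cinema"), ("Rainy", "Yes", "Poor", "Cinema"),
    ("Rainy", "No", "Rich", "Stay in"), ("Rainy", "Yes", "Poor", "Cinema"),
    ("Windy", "No", "Poor", "Cinema"), ("Windy", "No", "Rich", "Shopping"),
    ("Windy", "Yes", "Rich", "Cinema"), ("Sunny", "No", "Rich", "Tennis"),
]


def naive_bayes_scores(rows, query):
    """Return P(C) * product of P(x_i | C) for every class C (unnormalized)."""
    class_counts = Counter(r[-1] for r in rows)
    scores = {}
    for label, n in class_counts.items():
        score = n / len(rows)  # prior P(C)
        for col, value in enumerate(query):
            matches = sum(1 for r in rows if r[-1] == label and r[col] == value)
            score *= matches / n  # likelihood P(value | C), no smoothing
        scores[label] = score
    return scores


scores = naive_bayes_scores(ROWS, ("Rainy", "Yes", "Rich"))
prediction = max(scores, key=scores.get)
print(scores, "->", prediction)
```

Because no smoothing is applied, a single zero count (for example P(Yes/Tennis) = 0) sets the whole score of that class to zero.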
2. (40pts) Table 2. Decision: BUYS-RRSP
a. Create a Decision Tree
b. Create a Model based on Naïve Bayes
c. Student Number ending in Prime Number: Determine the Decision if
Sector: Farming, Income: medium, Self-Employed: Yes, Credit-Rating: Fair
Student Number ending in Non-Prime Number: Determine the Decision if
Sector: Banking, Income: medium, Self-Employed: Yes, Credit-Rating: Excellent
A. Decision Tree
Step 1
Yes = 8
No = 6
H(Buys-RRSP) = H(8/14, 6/14)
=-(8/14 log2 8/14) – (6/14 log2 6/14)
= 0.461 + 0.524
H(Buys-RRSP) = 0.985 ≈ 0.99
Step 2
H(Buys-RRSP/Sector), H(Buys-RRSP/Income), H(Buys-RRSP/Self-Employed), H(Buys-RRSP/Credit-Rating)
H(Buys-RRSP/Sector)
Farming * H + Oil * H + Banking * H = 5/14 (2/5, 3/5) + 5/14 (2/5, 3/5) + 4/14 (4/4, 0/4)
H(Buys-RRSP/Sector) = 5/14 ((-2/5 log2 2/5) – (3/5 log2 3/5)) + 5/14 ((-2/5 log2 2/5) – (3/5 log2 3/5)) +
4/14 (-4/4 log2 4/4)
= 5/14(0.97) + 5/14 (0.97) + 4/14 (0)
= 0.346 + 0.346
H(Buys-RRSP/Sector) = 0.692
H(Buys-RRSP/Income)
Low * H + Medium * H + High * H = 4/14(3/4, ¼) + 6/14 (3/6, 3/6) + 4/14 (2/4, 2/4)
H(Buys-RRSP/Income) = 4/14 ((-3/4 log2 ¾) – (1/4 log2 ¼)) + 6/14 ((-3/6 log2 3/6) – (3/6 log2 3/6)) +
4/14 ((-2/4 log2 2/4) – (2/4 log2 2/4))
= 0.232 + 0.429 + 0.286
H(Buys-RRSP/Income) = 0.947
H(Buys-RRSP/Self-Employed)
Yes * H + No * H = 7/14 (5/7, 2/7) + 7/14 (4/7, 3/7)
H(Buys-RRSP/Self-Employed) = 7/14 ((-5/7 log2 5/7) – (2/7 log2 2/7)) + 7/14 ((-4/7 log2 4/7) – (3/7 log2
3/7))
= 0.432 + 0.493
H(Buys-RRSP/Self-Employed) = 0.925
H(Buys-RRSP/Credit-Rating)
Fair * H + Excellent * H = 8/14 (3/8, 5/8) + 6/14 ( 5/6, 1/6)
H(Buys-RRSP/Credit-Rating) = 8/14 ((-3/8 log2 3/8) – (5/8 log2 5/8)) + 6/14 ((-5/6 log2 5/6) – (1/6 log2
1/6))
= 0.545 + 0.279
H(Buys-RRSP/Credit-Rating) = 0.824
Step 3
H(Buys-RRSP/Sector) = 0.692 | H(Buys-RRSP/Income) = 0.947 | H(Buys-RRSP/Self-Employed) = 0.925
| H(Buys-RRSP/Credit-Rating) = 0.824
I(Buys-RRSP/Sector) = 0.99 – 0.692 = 0.298
I(Buys-RRSP/Income) = 0.99 – 0.947 = 0.043
I(Buys-RRSP/Self-employed) = 0.99 – 0.925 = 0.065
I(Buys-RRSP/credit-rating) = 0.99 – 0.824 = 0.166
Max (0.298, 0.043, 0.065, 0.166) = 0.298, so Sector is Best
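The gains in Step 3 can be recomputed from the class counts used in Steps 1 and 2. In this sketch `SPLITS` holds, for each attribute value, the pair of class counts read off the fractions above (the order within a pair does not affect the entropy); the names are illustrative. Results are unrounded, so they differ slightly from the hand-calculated values, but Sector still has the largest gain.

```python
from math import log2


def entropy(counts):
    """Entropy (bits) of a class distribution given as raw counts."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)


ROOT = (8, 6)  # Buys-RRSP: 8 Yes, 6 No

# Class counts per attribute value, taken from the fractions in Step 2
SPLITS = {
    "Sector": [(2, 3), (2, 3), (4, 0)],         # Farming, Oil, Banking
    "Income": [(3, 1), (3, 3), (2, 2)],         # Low, Medium, High
    "Self-Employed": [(5, 2), (4, 3)],          # Yes, No
    "Credit-Rating": [(3, 5), (5, 1)],          # Fair, Excellent
}


def info_gain(root_counts, groups):
    total = sum(root_counts)
    remainder = sum(sum(g) / total * entropy(g) for g in groups)
    return entropy(root_counts) - remainder


gains = {attr: info_gain(ROOT, groups) for attr, groups in SPLITS.items()}
best = max(gains, key=gains.get)
for attr, value in gains.items():
    print(f"I(Buys-RRSP/{attr}) = {value:.3f}")
print("Best attribute:", best)
```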
Step 4
For the 5-record Sector branch, H(Buys-RRSP) = H(2/5, 3/5) = 0.971
H(Buys-RRSP/Income), H(Buys-RRSP/Self-Employed), H(Buys-RRSP/Credit-Rating)
H(Buys-RRSP/Income) = 3/5 (1/3, 2/3) + 2/5 (1/2, 1/2) + 0/5 (0, 0)
= 0.551 + 0.4
H(Buys-RRSP/Income) = 0.951
I (Buys-RRSP/Income) = 0.971 – 0.951 = 0.020
H (Buys-RRSP/Self-Employed) = 3/5 (1/3, 2/3) + 2/5 (1/2, 1/2)
= 0.551 + 0.4
H (Buys-RRSP/Self-Employed) = 0.951
I (Buys-RRSP/Self-Employed) = 0.971 – 0.951 = 0.020
H (Buys-RRSP/Credit-Rating) = 0
I (Buys-RRSP/Credit-Rating) = 0.971 – 0 = 0.971
Max (0.020, 0.020, 0.971) = 0.971, so Credit-Rating is best for this branch
B. Naïve Bayes
Student Number: 2020100575 (ends in 5, a prime number)
Given: Sector: Farming, Income: medium, Self-Employed: Yes, Credit-Rating: Fair
Step 1
P (C1) = P (Buys – RRSP = Yes) = 8/14 = 0.571
P (C2) = P (Buys – RRSP = No) = 6/14 = 0.429
Step 2
P(Farming/Yes) = 2/8 = 0.25
P(farming/No) = 3/6 = 0.5
P(Medium / Yes) = 3/8 = 0.375
P(Medium / No) = 3/6 = 0.5
P(Yes/ Yes) = 5/8 = 0.625
P(Yes/ No) = 2/6 = 0.333
P(Fair/Yes) = 3/8 = 0.375
P(Fair/No) = 5/6 = 0.833
P(x/yes) = 0.25 * 0.375 * 0.625 * 0.375 = 0.022
P(x/no) = 0.5 * 0.5 * 0.333 * 0.833 = 0.069
P(x/yes) * P(yes) = 0.022 * 0.571 = 0.013
P(x/no) * P(no) = 0.069 * 0.429 = 0.030
Prediction: No, since 0.030 > 0.013 (these are unnormalized scores, not probabilities)
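The scores above can be checked, and turned into normalized posteriors, with a few lines of Python. The sketch uses only the priors and conditional probabilities listed in Steps 1 and 2; `PRIORS`, `LIKELIHOODS` and `posteriors` are illustrative names.

```python
from math import prod

# Priors from Step 1
PRIORS = {"Yes": 8 / 14, "No": 6 / 14}

# Likelihoods from Step 2, in the order:
# Sector=Farming, Income=Medium, Self-Employed=Yes, Credit-Rating=Fair
LIKELIHOODS = {
    "Yes": [2 / 8, 3 / 8, 5 / 8, 3 / 8],
    "No": [3 / 6, 3 / 6, 2 / 6, 5 / 6],
}

# Unnormalized score: P(C) * product of P(x_i | C)
scores = {c: PRIORS[c] * prod(LIKELIHOODS[c]) for c in PRIORS}

# Dividing by the sum of the scores gives posterior probabilities
total = sum(scores.values())
posteriors = {c: s / total for c, s in scores.items()}

prediction = max(scores, key=scores.get)
print(scores, posteriors, prediction)
```

Normalizing does not change the prediction; it only rescales the two scores so that they sum to 1.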
3. (20PTS) TABLE 3. APPLY APRIORI ALGORITHM FOR BASKET ANALYSIS
Min support: 2
Min Confidence: 50%
Step 1
Items Bought Support Count
A 5
B 7
C 5
D 9
E 6
Step 2
Items Bought Support Count
{A,B} 3
{A,C} 2
{A,D} 4
{A,E} 4
{B,C} 3
{B,D} 6
{B,E} 4
{C,D} 4
{C,E} 2
{D,E} 6
A, B, C ,D ,E
Step 3
ABC, ABD, ABE, ACD, ACE, ADE, BCD, BCE, BDE, CDE
Items Bought Support Count
{A, B, C} 1 (below min support)
{A, B, D} 2
{A, B, E} 2
{A, C, D} 1 (below min support)
{A, C, E} 1 (below min support)
{A, D, E} 4
{B, C, D} 2
{B, C, E} 1 (below min support)
{B, D, E} 4
{C, D, E} 2
New table with matching min support
Items Bought Support Count
{A, B, D} 2
{A, B, E} 2
{A, D, E} 4
{B, C, D} 2
{B, D, E} 4
{C, D, E} 2
{A, B, D}, {A, B, E}, {A, D, E}, {B, C, D}, {B, D, E}, and {C, D, E}
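The candidate generation and pruning of Step 3 can be sketched as follows. Table 3's transactions are not reproduced here, so the sketch does not count support itself: `PAIR_SUPPORT` and `TRIPLE_SUPPORT` are copied from the Step 2 and Step 3 tables, and all names are illustrative.

```python
from itertools import combinations

MIN_SUPPORT = 2

# Support counts copied from the Step 2 table (2-itemsets)
PAIR_SUPPORT = {"AB": 3, "AC": 2, "AD": 4, "AE": 4, "BC": 3,
                "BD": 6, "BE": 4, "CD": 4, "CE": 2, "DE": 6}

# Support counts copied from the Step 3 table (3-itemsets)
TRIPLE_SUPPORT = {"ABC": 1, "ABD": 2, "ABE": 2, "ACD": 1, "ACE": 1,
                  "ADE": 4, "BCD": 2, "BCE": 1, "BDE": 4, "CDE": 2}

frequent_pairs = {frozenset(p) for p, n in PAIR_SUPPORT.items() if n >= MIN_SUPPORT}
items = sorted(set().union(*frequent_pairs))

# Apriori property: a 3-itemset is a candidate only if all its 2-subsets are frequent
candidates = ["".join(c) for c in combinations(items, 3)
              if all(frozenset(p) in frequent_pairs for p in combinations(c, 2))]

# Keep the candidates that reach the minimum support
frequent_triples = [c for c in candidates if TRIPLE_SUPPORT[c] >= MIN_SUPPORT]
print(candidates)
print(frequent_triples)
```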
Step 4
{A, B, D}, {A, B, E}, {A, D, E}, {B, C, D}, {B, D, E}, and {C, D, E}
RULES       SUPPORT   CONFIDENCE
A -> D^E    3         Sup(A^D^E) / Sup(A)   = 3/5 = 0.60 = 60%
B -> D^E    4         Sup(B^D^E) / Sup(B)   = 4/7 = 0.57 = 57%
A -> B^D    2         Sup(A^B^D) / Sup(A)   = 2/5 = 0.40 = 40%
B -> C^D    2         Sup(B^C^D) / Sup(B)   = 2/7 = 0.29 = 29%
B^D -> E    4         Sup(B^D^E) / Sup(B^D) = 4/5 = 0.80 = 80%
B^E -> D    4         Sup(B^D^E) / Sup(B^E) = 4/4 = 1.00 = 100%
C^D -> E    2         Sup(C^D^E) / Sup(C^D) = 2/4 = 0.50 = 50%
D^A -> E    3         Sup(A^D^E) / Sup(D^A) = 3/4 = 0.75 = 75%
D -> E^A    3         Sup(A^D^E) / Sup(D)   = 3/9 = 0.33 = 33%
C -> B^D    2         Sup(B^C^D) / Sup(C)   = 2/5 = 0.40 = 40%
D^E -> A    3         Sup(A^D^E) / Sup(D^E) = 3/6 = 0.50 = 50%
D^E -> B    4         Sup(B^D^E) / Sup(D^E) = 4/6 = 0.67 = 67%
D -> B^E    4         Sup(B^D^E) / Sup(D)   = 4/9 = 0.44 = 44%
E^A -> D    3         Sup(A^D^E) / Sup(E^A) = 3/4 = 0.75 = 75%
E -> B^D    4         Sup(B^D^E) / Sup(E)   = 4/6 = 0.67 = 67%
E -> D^A    3         Sup(A^D^E) / Sup(E)   = 3/6 = 0.50 = 50%
Rules below the 50% minimum confidence (A -> B^D, B -> C^D, D -> E^A, C -> B^D, D -> B^E) are rejected; the rest are accepted.
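The confidence check in Step 4 is the ratio Sup(antecedent and consequent together) / Sup(antecedent). The sketch below applies it to a few of the rules, using support counts copied from the Step 1 to Step 3 tables; the `SUPPORT` dictionary, the `confidence` helper and the selection of rules are illustrative.

```python
MIN_CONFIDENCE = 0.5

# Support counts copied from the Step 1-3 tables
SUPPORT = {
    frozenset("B"): 7, frozenset("D"): 9, frozenset("E"): 6,
    frozenset("BE"): 4, frozenset("CD"): 4, frozenset("DE"): 6,
    frozenset("BDE"): 4, frozenset("CDE"): 2,
}


def confidence(antecedent, consequent):
    """Confidence of the rule antecedent -> consequent."""
    a = frozenset(antecedent)
    return SUPPORT[a | frozenset(consequent)] / SUPPORT[a]


# (antecedent, consequent) pairs, e.g. ("BE", "D") stands for B^E -> D
rules = [("B", "DE"), ("BE", "D"), ("DE", "B"), ("D", "BE"), ("CD", "E")]

accepted = [(a, c) for a, c in rules if confidence(a, c) >= MIN_CONFIDENCE]
for a, c in rules:
    status = "accept" if (a, c) in accepted else "reject"
    print(f"{'^'.join(a)} -> {'^'.join(c)}: {confidence(a, c):.0%} ({status})")
```

A rule with confidence exactly equal to the threshold, such as C^D -> E at 50%, is kept because the comparison is "greater than or equal".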