Data Mining Project

The document is a group project submission for a Data Mining course. It contains the group members' names and student IDs, the course lecturer, and the submission date. The body of the document includes answers to two questions involving data mining techniques. It shows the steps to generate frequent itemsets and association rules from market basket data in Question A. Question B constructs a frequent pattern tree and lists the frequent patterns mined from transactional data.

Uploaded by

Wasmawatie Waz

BACHELOR OF TECHNOLOGY MANAGEMENT (HONOUR)
DATA MINING (MTM3223)
 

GROUP PROJECT

NAME                                      STUDENT ID
KHAIRUNISA BINTI ABD RAZAK                BTM22090015
MOHAMMAD FADZILAH BIN FATIHI              BTM22090019
WASMAWATI TEREN                           BTM22090016
NURUL NIENA SYAHIRA BINTI HERUANDI LEE    BTM22090025
MOHAMMAD AIDILL MAULA HAMDAN              BTM22090020
  
  
COURSE LECTURER: MS CHEW KIM MEY  
  
  
   
SUBMISSION DATE: 26 MAY 2023 (FRIDAY)   
 
TASK 1

QUESTION A

Answer for Question A:


STEP 1 – Generate Frequency Table

ITEMSET FREQUENCY/SUPPORT
Bread 3/10 = 30%
Milk 7/10 = 70%
Cheese 5/10 = 50%
Beer 6/10 = 60%
Umbrella 5/10 = 50%
Diaper 4/10 = 40%
Water 7/10 = 70%
Detergent 3/10 = 30%

STEP 2 – Prune Step


ITEMSET FREQUENCY/SUPPORT
Milk 70%
Cheese 50%
Beer 60%
Umbrella 50%
Water 70%

STEP 3 – Join Step


ITEMSET FREQUENCY/SUPPORT
Milk, Cheese 3/10 = 30%
Milk, Beer 3/10 = 30%
Milk, Umbrella 3/10 = 30%
Milk, Water 6/10 = 60%
Cheese, Beer 2/10 = 20%
Cheese, Umbrella 5/10 = 50%
Cheese, Water 4/10 = 40%
Beer, Umbrella 2/10 = 20%
Beer, Water 3/10 = 30%
Umbrella, Water 4/10 = 40%

STEP 4 – Prune Step

ITEMSET FREQUENCY/SUPPORT
Milk, Water 60%
Cheese, Umbrella 50%

STEP 5 – Join Step

ITEMSET FREQUENCY/SUPPORT
Milk, Water, Cheese, Umbrella 3/10 = 30%

Explanation: The itemset generated in Step 5 falls below the 50% minimum support, which
makes it invalid. So the itemsets from Step 4 will be used to generate the association rules.

ASSOCIATION RULE
Milk => Water        [support 60%, confidence (6/7) x 100 = 85.71%]
Cheese => Umbrella   [support 50%, confidence (5/5) x 100 = 100%]
Water => Milk        [support 60%, confidence (6/7) x 100 = 85.71%]
Umbrella => Cheese   [support 50%, confidence (5/5) x 100 = 100%]
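The support counting, prune, and join steps above can also be sketched in code. A minimal sketch, assuming the ten transactions listed in Question B (Bread, Diaper, and Detergent were already pruned in Step 2, so they do not appear here):

```python
from itertools import combinations

# Ten transactions, as tabulated in Question B
transactions = [
    {"MILK", "WATER", "BEER"}, {"MILK", "WATER"}, {"MILK", "WATER", "BEER"},
    {"MILK", "BEER"}, {"WATER", "BEER", "CHEESE", "UMBRELLA"},
    {"MILK", "WATER", "CHEESE", "UMBRELLA"}, {"BEER", "CHEESE", "UMBRELLA"},
    {"MILK", "WATER", "CHEESE", "UMBRELLA"}, {"MILK", "WATER", "CHEESE", "UMBRELLA"},
    {"BEER"},
]

def support(itemset):
    """Fraction of transactions containing every item in the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

MIN_SUP = 0.5
items = sorted({i for t in transactions for i in t})

# Prune step: keep 1-itemsets meeting the minimum support
L1 = [i for i in items if support({i}) >= MIN_SUP]
# Join step + prune: candidate 2-itemsets built from the survivors
L2 = [set(c) for c in combinations(L1, 2) if support(set(c)) >= MIN_SUP]

# Confidence of X => Y is support(X and Y together) / support(X)
conf_milk_water = support({"MILK", "WATER"}) / support({"MILK"})  # 6/7
```

Running this keeps all five items in L1 and only {Milk, Water} and {Cheese, Umbrella} in L2, matching Steps 2 and 4 above.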

QUESTION B

Answer for Question B:

b) Frequent Pattern Tree (FP-Tree):

(MILK: 7, WATER: 7, BEER: 6, CHEESE: 5, UMBRELLA: 5)

TID Items
T1 MILK, WATER, BEER
T2 MILK, WATER
T3 MILK, WATER, BEER
T4 MILK, BEER
T5 WATER, BEER, CHEESE, UMBRELLA
T6 MILK, WATER, CHEESE, UMBRELLA
T7 BEER, CHEESE, UMBRELLA
T8 MILK, WATER, CHEESE, UMBRELLA
T9 MILK, WATER, CHEESE, UMBRELLA
T10 BEER

FP TREE:

{}
├── MILK: 7
│   ├── WATER: 6
│   │   ├── BEER: 2
│   │   └── CHEESE: 3
│   │       └── UMBRELLA: 3
│   └── BEER: 1
├── WATER: 1
│   └── BEER: 1
│       └── CHEESE: 1
│           └── UMBRELLA: 1
└── BEER: 2
    └── CHEESE: 1
        └── UMBRELLA: 1

Header table: MILK = 7, WATER = 7, BEER = 6, CHEESE = 5, UMBRELLA = 5

FREQUENT PATTERN (FP-TREE) = MILK: 7, WATER: 7, BEER: 6, CHEESE: 5, UMBRELLA: 5
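The tree above can be reproduced programmatically by inserting each transaction along a path of frequency-ordered items. A minimal sketch (the Node class and order list are illustrative helpers, not part of the original submission):

```python
from collections import Counter

# The ten transactions from the table above
transactions = [
    {"MILK", "WATER", "BEER"}, {"MILK", "WATER"}, {"MILK", "WATER", "BEER"},
    {"MILK", "BEER"}, {"WATER", "BEER", "CHEESE", "UMBRELLA"},
    {"MILK", "WATER", "CHEESE", "UMBRELLA"}, {"BEER", "CHEESE", "UMBRELLA"},
    {"MILK", "WATER", "CHEESE", "UMBRELLA"}, {"MILK", "WATER", "CHEESE", "UMBRELLA"},
    {"BEER"},
]

# Frequency-descending item order used to sort each transaction
order = ["MILK", "WATER", "BEER", "CHEESE", "UMBRELLA"]

class Node:
    def __init__(self, item):
        self.item, self.count, self.children = item, 0, {}

root = Node(None)
for t in transactions:
    node = root
    for item in (i for i in order if i in t):   # insert along a sorted path
        node = node.children.setdefault(item, Node(item))
        node.count += 1

header = Counter(i for t in transactions for i in t)  # item frequencies
```

Walking the resulting tree gives the same node counts as the drawing, e.g. MILK: 7 at the root's first child and WATER: 6 beneath it.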

TASK 2

ZeroR Model (every record is predicted as the majority class, "Yes")

                  Actual Yes   Actual No   Precision
Predicted Yes          6           4          0.6
Predicted No           0           0          0.0
Recall                1.0          0          Accuracy: 0.6

OneR Model

Age          Purchased        Majority       Errors
             Yes    No        class/Rule
20 - 30       1      2        No             1/3
31 - 40       2      1        Yes            1/3
41 - 50       2      0        Yes            0/2
51 - 60       1      1        Yes            1/2
Gender       Purchase         Majority       Errors
             Yes    No        class/Rule
Female        3      2        Yes            2/5
Male          3      2        Yes            2/5
Location         Purchase         Majority       Errors
                 Yes    No        class/Rule
New York          0      2        No             0/2
Los Angeles       1      0        Yes            0/1
Chicago           0      1        No             0/1
Houston           1      0        Yes            0/1
Miami             0      1        No             0/1
San Francisco     1      0        Yes            0/1
Boston            1      0        Yes            0/1
Dallas            1      0        Yes            0/1
Seattle           1      0        Yes            0/1
Time Spent       Purchase         Majority       Errors
Browsing (min)   Yes    No        class/Rule
1 - 20            2      4        No             2/6
21 - 40           3      0        Yes            0/3
41 - 60           1      0        Yes            0/1

Number of        Purchase         Majority       Errors
Pages Viewed     Yes    No        class/Rule
1 - 10            3      3        Yes            3/6
11 - 20           3      1        Yes            1/4
OneR Model confusion matrix (rule based on Time Spent Browsing):

                  Actual Yes   Actual No   Precision
Predicted Yes          4           0          1.0
Predicted No           2           4          0.67
Recall                0.67        1.0         Accuracy: 0.8
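The error columns in the OneR tables can be reproduced mechanically: for each value of an attribute, predict its majority class and count the minority rows as errors. A minimal sketch using the Age attribute from the Task 2 data set (the `one_r` helper is an illustrative name):

```python
from collections import Counter

# (Age, Purchase) pairs taken from the Task 2 data set
rows = [
    ("20 - 30", "No"), ("20 - 30", "No"), ("20 - 30", "Yes"),
    ("31 - 40", "No"), ("31 - 40", "Yes"), ("31 - 40", "Yes"),
    ("41 - 50", "Yes"), ("41 - 50", "Yes"),
    ("51 - 60", "Yes"), ("51 - 60", "No"),
]

def one_r(rows):
    """Return {value: majority class} plus the total number of errors."""
    by_value = {}
    for value, label in rows:
        by_value.setdefault(value, Counter())[label] += 1
    rules, errors = {}, 0
    for value, counts in by_value.items():
        majority, _ = counts.most_common(1)[0]
        rules[value] = majority
        errors += sum(counts.values()) - counts[majority]  # minority rows
    return rules, errors

rules, errors = one_r(rows)   # the Age rule makes 3 errors out of 10
```

The same helper applied to the other attributes reproduces their error columns as well.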

Naïve Bayesian Model

Frequency Table (Age)

Age          Purchase Yes   Purchase No
20 - 30           1              2
31 - 40           2              1
41 - 50           2              0
51 - 60           1              1

Likelihood Table (Age), with P(Yes) = 6/10 and P(No) = 4/10

Age          P(x|Yes)   P(x|No)   P(x)    P(Yes|x)   P(No|x)
20 - 30        1/6        2/4     3/10      0.33       0.67
31 - 40        2/6        1/4     3/10      0.67       0.33
41 - 50        2/6        0/4     2/10      1          0
51 - 60        1/6        1/4     2/10      0.5        0.5

Frequency Table (Gender)

Gender       Purchase Yes   Purchase No
Female            3              2
Male              3              2

Likelihood Table (Gender), with P(Yes) = 6/10 and P(No) = 4/10

Gender       P(x|Yes)   P(x|No)   P(x)    P(Yes|x)   P(No|x)
Female         3/6        2/4     5/10      0.6        0.4
Male           3/6        2/4     5/10      0.6        0.4
Frequency Table (Location)

Location         Purchase Yes   Purchase No
New York              0              2
Los Angeles           1              0
Chicago               0              1
Houston               1              0
Miami                 0              1
San Francisco         1              0
Boston                1              0
Dallas                1              0
Seattle               1              0

Likelihood Table (Location), with P(Yes) = 6/10 and P(No) = 4/10

Location         P(x|Yes)   P(x|No)   P(x)    P(Yes|x)   P(No|x)
New York           0/6        2/4     2/10      0          1.0
Los Angeles        1/6        0/4     1/10      1.0        0
Chicago            0/6        1/4     1/10      0          1.0
Houston            1/6        0/4     1/10      1.0        0
Miami              0/6        1/4     1/10      0          1.0
San Francisco      1/6        0/4     1/10      1.0        0
Boston             1/6        0/4     1/10      1.0        0
Dallas             1/6        0/4     1/10      1.0        0
Seattle            1/6        0/4     1/10      1.0        0
Frequency Table (Time Spent Browsing)

Time (min)   Purchase Yes   Purchase No
1 - 20            2              4
21 - 40           3              0
41 - 60           1              0

Likelihood Table (Time Spent Browsing), with P(Yes) = 6/10 and P(No) = 4/10

Time (min)   P(x|Yes)   P(x|No)   P(x)    P(Yes|x)   P(No|x)
1 - 20         2/6        4/4     6/10      0.33       0.67
21 - 40        3/6        0/4     3/10      1          0
41 - 60        1/6        0/4     1/10      1          0

Frequency Table (Number of Pages Viewed)

Pages        Purchase Yes   Purchase No
1 - 10            3              3
11 - 20           3              1

Likelihood Table (Number of Pages Viewed), with P(Yes) = 6/10 and P(No) = 4/10

Pages        P(x|Yes)   P(x|No)   P(x)    P(Yes|x)   P(No|x)
1 - 10         3/6        3/4     6/10      0.5        0.5
11 - 20        3/6        1/4     4/10      0.75       0.25
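Reading the priors and conditional probabilities off the likelihood tables above, the posterior for a new visitor follows from Bayes' rule. A minimal sketch (the feature keys and the example visitors are illustrative, not from the original submission):

```python
# Priors and conditionals read off the likelihood tables above
p_yes, p_no = 6/10, 4/10
lik_yes = {"age_20_30": 1/6, "age_31_40": 2/6, "female": 3/6, "time_21_40": 3/6}
lik_no  = {"age_20_30": 2/4, "age_31_40": 1/4, "female": 2/4, "time_21_40": 0/4}

def posterior_yes(features):
    """Naive Bayes: multiply the prior by each conditional, then normalise."""
    score_yes, score_no = p_yes, p_no
    for f in features:
        score_yes *= lik_yes[f]
        score_no *= lik_no[f]
    total = score_yes + score_no
    return score_yes / total if total else 0.0

# A 31-40 female who browsed 21-40 minutes: the No score contains the
# zero conditional P(21-40 | No) = 0/4, so the Yes posterior is 1
p1 = posterior_yes(["age_31_40", "female", "time_21_40"])
# A 20-30 visitor alone reproduces the table's P(Yes|x) = 0.33
p2 = posterior_yes(["age_20_30"])
```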

Purchase: Yes = 6, No = 4

E(Purchase) = E(6, 4)
            = -(0.6 log2 0.6) - (0.4 log2 0.4)
            = 0.97
Entropy Age

E(Purchase, Age) = P(20 - 30)* E(1, 2) + P(31 - 40)* E(2, 1) + P(41 - 50)* E(2, 0) +
                   P(51 - 60)* E(1, 1)
                 = (3/10)(0.92) + (3/10)(0.92) + (2/10)(0) + (2/10)(1)
                 = 0.75

Gain Age = E(Purchase) - E(Purchase, Age)
         = 0.97 - 0.75
         = 0.22

Entropy Gender

E(Purchase, Gender) = P(Female)* E(3, 2) + P(Male)* E(3, 2)
                    = (5/10)(0.97) + (5/10)(0.97)
                    = 0.485 + 0.485
                    = 0.97

Gain Gender = E(Purchase) - E(Purchase, Gender)
            = 0.97 - 0.97
            = 0
Entropy Location

E(Purchase, Location) = P(New York)* E(0, 2) + P(Los Angeles)* E(1, 0) + P(Chicago)* E(0, 1) +
                        P(Houston)* E(1, 0) + P(Miami)* E(0, 1) + P(San Francisco)* E(1, 0) +
                        P(Boston)* E(1, 0) + P(Dallas)* E(1, 0) + P(Seattle)* E(1, 0)
                      = 0   (every location is pure, so each term is zero)

Gain Location = E(Purchase) - E(Purchase, Location)
              = 0.97 - 0
              = 0.97

Entropy Time Spent Browsing

E(Purchase, Time Spent Browsing) = P(1 - 20)* E(2, 4) + P(21 - 40)* E(3, 0) + P(41 - 60)* E(1, 0)
                                 = (6/10)(0.92) + (3/10)(0) + (1/10)(0)
                                 = 0.55

Gain Time Spent Browsing = E(Purchase) - E(Purchase, Time Spent Browsing)
                         = 0.97 - 0.55
                         = 0.42
Entropy Number of Pages Viewed

E(Purchase, Number of Pages Viewed) = P(1 - 10)* E(3, 3) + P(11 - 20)* E(3, 1)
                                    = (6/10)(1) + (4/10)(0.81)
                                    = 0.92

Gain Number of Pages Viewed = E(Purchase) - E(Purchase, Number of Pages Viewed)
                            = 0.97 - 0.92
                            = 0.05
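The entropy and gain figures above can be checked with a short script using the standard formulas. A minimal sketch:

```python
from math import log2

def entropy(yes, no):
    """Binary entropy of a (yes, no) split; pure splits like E(2, 0) are 0."""
    total = yes + no
    e = 0.0
    for count in (yes, no):
        if count:
            p = count / total
            e -= p * log2(p)
    return e

e_purchase = entropy(6, 4)                       # E(Purchase), about 0.97

# Weighted entropy of the Age split, as in the working above
e_age = (3/10) * entropy(1, 2) + (3/10) * entropy(2, 1) \
      + (2/10) * entropy(2, 0) + (2/10) * entropy(1, 1)
gain_age = e_purchase - e_age                    # about 0.22
```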
Decision Tree Model

Age       Gender   Location        T. Spent    N. of Pages   Purchase
                                   Browsing    Viewed
20 - 30   Male     Chicago         1 - 20      1 - 10        No
20 - 30   Male     Miami           1 - 20      1 - 10        No
20 - 30   Male     Boston          1 - 20      1 - 10        Yes
31 - 40   Female   New York        1 - 20      1 - 10        No
31 - 40   Female   Seattle         1 - 20      1 - 10        Yes
31 - 40   Female   San Francisco   1 - 20      1 - 10        Yes
41 - 50   Male     Los Angeles     21 - 40     11 - 20       Yes
41 - 50   Male     Dallas          21 - 40     11 - 20       Yes
51 - 60   Female   Houston         41 - 60     11 - 20       Yes
51 - 60   Female   New York        1 - 20      11 - 20       No

For 20 - 30 (1 Yes, 2 No, so E = 0.92)

Entropy(Purchase, Gender) = P(3/3)* E(1, 2)
                          = (3/3)(0.92)
                          = 0.92

Gain = E(Purchase) - E(Purchase, Gender)
     = 0.92 - 0.92
     = 0

Entropy(Purchase, Location) = P(Chicago)* E(0, 1) + P(Miami)* E(0, 1) + P(Boston)* E(1, 0)
                            = (1/3)(0) + (1/3)(0) + (1/3)(0)
                            = 0

Gain = E(Purchase) - E(Purchase, Location)
     = 0.92 - 0
     = 0.92

Entropy(Purchase, Time) = P(1 - 20)* E(1, 2)
                        = (3/3)(0.92)
                        = 0.92

Gain = E(Purchase) - E(Purchase, Time)
     = 0.92 - 0.92
     = 0

For 31 - 40 (2 Yes, 1 No, so E = 0.92)

Entropy(Purchase, Gender) = P(Female)* E(2, 1)
                          = (3/3)(0.92)
                          = 0.92

Gain = E(Purchase) - E(Purchase, Gender)
     = 0.92 - 0.92
     = 0

Entropy(Purchase, Location) = P(New York)* E(0, 1) + P(San Francisco)* E(1, 0) +
                              P(Seattle)* E(1, 0)
                            = (1/3)(0) + (1/3)(0) + (1/3)(0)
                            = 0

Gain = E(Purchase) - E(Purchase, Location)
     = 0.92 - 0
     = 0.92

Entropy(Purchase, Time Spent Browsing) = P(1 - 20)* E(2, 1)
                                       = (3/3)(0.92)
                                       = 0.92

Gain = E(Purchase) - E(Purchase, Time Spent Browsing)
     = 0.92 - 0.92
     = 0

Entropy(Purchase, Number of Pages Viewed) = P(1 - 10)* E(2, 1)
                                          = (3/3)(0.92)
                                          = 0.92

Gain = E(Purchase) - E(Purchase, Number of Pages Viewed)
     = 0.92 - 0.92
     = 0

For 51 - 60 (1 Yes, 1 No, so E = 1)

Entropy(Purchase, Gender) = P(2/2)* E(1, 1)
                          = (2/2)(1)
                          = 1

Gain = E(Purchase) - E(Purchase, Gender)
     = 1 - 1
     = 0

Entropy(Purchase, Location) = P(Houston)* E(1, 0) + P(New York)* E(0, 1)
                            = (1/2)(0) + (1/2)(0)
                            = 0

Gain = E(Purchase) - E(Purchase, Location)
     = 1 - 0
     = 1

Entropy(Purchase, Time) = P(41 - 60)* E(1, 0) + P(1 - 20)* E(0, 1)
                        = (1/2)(0) + (1/2)(0)
                        = 0

Gain = E(Purchase) - E(Purchase, Time)
     = 1 - 0
     = 1

Entropy(Purchase, Number of Pages Viewed) = P(11 - 20)* E(1, 1)
                                          = (2/2)(1)
                                          = 1

Gain = E(Purchase) - E(Purchase, Number of Pages Viewed)
     = 1 - 1
     = 0

Generate Tree Diagram:

Age
├── 20 - 30 → Location
│     ├── Chicago → No
│     ├── Miami → No
│     └── Boston → Yes
├── 31 - 40 → Location
│     ├── New York → No
│     ├── Seattle → Yes
│     └── San Francisco → Yes
├── 41 - 50 → Yes
└── 51 - 60 → Location
      ├── Houston → Yes
      └── New York → No

R1: IF (Age = 20 - 30) AND (Location = Boston) THEN Purchase = Yes
R2: IF (Age = 20 - 30) AND (Location = Chicago OR Miami) THEN Purchase = No
R3: IF (Age = 31 - 40) AND (Location = New York) THEN Purchase = No
R4: IF (Age = 31 - 40) AND (Location = Seattle) THEN Purchase = Yes
R5: IF (Age = 31 - 40) AND (Location = San Francisco) THEN Purchase = Yes
R6: IF (Age = 41 - 50) THEN Purchase = Yes
R7: IF (Age = 51 - 60) AND (Location = Houston) THEN Purchase = Yes
R8: IF (Age = 51 - 60) AND (Location = New York) THEN Purchase = No
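The rules above can be expressed as a small classifier. A minimal sketch (the `predict` helper is illustrative, not part of the original submission):

```python
def predict(age, location):
    """Apply the decision-tree rules above to one record."""
    if age == "41 - 50":
        return "Yes"                                   # pure leaf, no split
    if age == "20 - 30":
        return "Yes" if location == "Boston" else "No"
    if age == "31 - 40":
        return "No" if location == "New York" else "Yes"
    if age == "51 - 60":
        return "Yes" if location == "Houston" else "No"
    return None
```

Applied back to the ten training rows, these rules classify every record correctly.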

Task 3:
Scatter Plot for 2 Clusters of the Data Set:

[Scatter plot: Colour Intensity (y-axis) against Magnesium (x-axis, 65-115), with the
points coloured by Cluster 1 and Cluster 2.]

Scatter Plot for 3 Clusters of the Data Set:

[Scatter plot: Color Intensity (y-axis) against Magnesium (x-axis, 60-180), with the
points coloured by Clusters 1, 2, and 3.]

Scatter Plot for 4 Clusters of the Data Set:


[Scatter plot: Color Intensity (y-axis) against Magnesium (x-axis, 60-180), with the
points coloured by Clusters 1 to 4.]

Explanation:

Clustering is the process of grouping a set of abstract objects into classes of similar
objects; a good clustering shows high intra-class (within-cluster) similarity and low
inter-class (between-cluster) similarity. The three scatter plots above contain 2, 3, and 4
clusters respectively. In the first plot, cluster 2 shows high intra-class similarity while
cluster 1 shows low intra-class similarity. In the second plot, clusters 1 and 2 show high
intra-class similarity, while cluster 3 shows low intra-class similarity. The third plot
consists of 4 clusters: all four show strong intra-class similarity, and the inter-class
similarity between clusters 3 and 4 is low. Possible outliers can also be identified in
each scatter diagram.
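Cluster assignments like those plotted above are typically produced with k-means. A minimal pure-Python sketch; the six (magnesium, colour-intensity) points are illustrative placeholders, not the actual data set behind the plots:

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means on 2-D points: assign to the nearest centre, then re-average."""
    random.seed(seed)
    centers = random.sample(points, k)
    groups = [[] for _ in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda c: (p[0] - centers[c][0]) ** 2
                                        + (p[1] - centers[c][1]) ** 2)
            groups[nearest].append(p)
        centers = [
            (sum(p[0] for p in g) / len(g), sum(p[1] for p in g) / len(g))
            if g else centers[i]
            for i, g in enumerate(groups)
        ]
    return centers, groups

# Illustrative (magnesium, colour intensity) points forming two groups
points = [(70, 2.0), (72, 2.2), (75, 1.8), (110, 6.0), (112, 6.5), (108, 5.8)]
centers, groups = kmeans(points, k=2)
```

Changing k to 3 or 4 reproduces the kind of partitions shown in the second and third plots.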
