2139472 Lab_3

December 1, 2021

1 Lab Program 4

2 Vinay Sirohi

3 2139472
3.1 Generate FP Tree for a transaction dataset
[54]: #Importing mlxtend and printing its version.
import mlxtend
print(mlxtend.__version__)

0.19.0

[76]: #Importing the libraries used in this lab


from mlxtend.frequent_patterns import fpgrowth
from mlxtend.frequent_patterns import apriori,association_rules
import numpy as np
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
import time

[12]: #Reading our dataset


df = pd.read_csv('Groceries_dataset.csv')
#Looking at the first 20 rows
df.head(20)

[12]: Member_number Date itemDescription


0 1808 21-07-2015 tropical fruit
1 2552 05-01-2015 whole milk
2 2300 19-09-2015 pip fruit
3 1187 12-12-2015 other vegetables
4 3037 01-02-2015 whole milk
5 4941 14-02-2015 rolls/buns
6 4501 08-05-2015 other vegetables
7 3803 23-12-2015 pot plants
8 2762 20-03-2015 whole milk
9 4119 12-02-2015 tropical fruit
10 1340 24-02-2015 citrus fruit
11 2193 14-04-2015 beef
12 1997 21-07-2015 frankfurter
13 4546 03-09-2015 chicken
14 4736 21-07-2015 butter
15 1959 30-03-2015 fruit/vegetable juice
16 1974 03-05-2015 packaged fruit/vegetables
17 2421 02-09-2015 chocolate
18 1513 03-08-2015 specialty bar
19 1905 07-07-2015 other vegetables

[24]: #Converting the Date column from strings to datetime values


df['Date'] = pd.to_datetime(df['Date'])
df

[24]: Member_number Date itemDescription


0 1808 2015-07-21 tropical fruit
1 2552 2015-05-01 whole milk
2 2300 2015-09-19 pip fruit
3 1187 2015-12-12 other vegetables
4 3037 2015-01-02 whole milk
… … … …
38760 4471 2014-08-10 sliced cheese
38761 2022 2014-02-23 candy
38762 1097 2014-04-16 cake bar
38763 1510 2014-03-12 fruit/vegetable juice
38764 1521 2014-12-26 cat food

[38765 rows x 3 columns]
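Comparing the two outputs above, `05-01-2015` became `2015-05-01` and `01-02-2015` became `2015-01-02`, while `21-07-2015` became `2015-07-21`. In the pandas version used here, the call without a format read ambiguous strings month-first and fell back to day-first only when the first number could not be a month. Every unambiguous value shown fits day-month-year, so the sketch below assumes that pattern for the whole file and passes an explicit `format`. The swap does not merge two different raw dates into one, so the basket grouping should not change, but the displayed dates themselves are off.

```python
import pandas as pd

# Raw values copied from the first rows of Groceries_dataset.csv (day-month-year).
raw = pd.Series(["21-07-2015", "05-01-2015", "01-02-2015"])

# An explicit format parses every row the same way, with no day/month guessing.
parsed = pd.to_datetime(raw, format="%d-%m-%Y")
print(parsed.dt.strftime("%Y-%m-%d").tolist())
# ['2015-07-21', '2015-01-05', '2015-02-01']
```

`dayfirst=True` is the looser alternative. The explicit format also raises an error on any row that does not match, which makes bad rows visible.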

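[27]: #Grouping the transactions by member number and date: every row gets the comma-separated
#list of all items that member bought on that date (transform keeps one row per item, and because
#the column is overwritten in place, re-running this cell joins the already-joined strings again)


df['itemDescription'] = df.groupby(['Member_number', 'Date'])['itemDescription'].transform(
    lambda x: ','.join(x))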

df

[27]: Member_number Date \


0 1808 2015-07-21
1 2552 2015-05-01
2 2300 2015-09-19
3 1187 2015-12-12
4 3037 2015-01-02
… … …
38760 4471 2014-08-10
38761 2022 2014-02-23
38762 1097 2014-04-16
38763 1510 2014-03-12
38764 1521 2014-12-26

itemDescription
0 tropical fruit,rolls/buns,candy,tropical fruit…
1 whole milk,tropical fruit,chocolate,whole milk…
2 pip fruit,other vegetables,flour,pip fruit,oth…
3 other vegetables,onions,shopping bags,other ve…
4 whole milk,other vegetables,white bread,whole …
… …
38760 whole milk,yogurt,sliced cheese,whole milk,yog…
38761 cat food,yogurt,candy,cat food,yogurt,candy,ca…
38762 sausage,whole milk,cake bar,sausage,whole milk…
38763 beef,canned beer,fruit/vegetable juice,beef,ca…
38764 ham,seasonal products,cat food,ham,seasonal pr…

[38765 rows x 3 columns]
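Because `transform` keeps all 38765 rows, a basket of k items appears k times in the transaction list built next. Supports are therefore computed over item rows, with larger baskets weighted more heavily. The repeated item names in the output above (for example `tropical fruit,rolls/buns,candy,tropical fruit…`) also suggest the cell ran more than once on the already-joined column. The sketch below shows an alternative that yields one transaction per (Member_number, Date) pair. It runs on a small made-up frame in the same long format (the member numbers and items are hypothetical). On the real data this would change the number of transactions and therefore every support value, so the figures in this notebook would not carry over.

```python
import pandas as pd

# Hypothetical rows in the same long format as Groceries_dataset.csv:
# one row per item bought, keyed by member and date.
df_long = pd.DataFrame({
    "Member_number": [1, 1, 1, 2, 2, 2],
    "Date": ["2015-07-21"] * 3 + ["2015-01-05"] * 3,
    "itemDescription": ["tropical fruit", "rolls/buns", "candy",
                        "whole milk", "chocolate", "whole milk"],
})

# One transaction per (member, date): collect that visit's distinct items.
baskets = (
    df_long.groupby(["Member_number", "Date"])["itemDescription"]
    .apply(lambda items: sorted(set(items)))
    .tolist()
)
print(len(df_long), "rows ->", len(baskets), "transactions")
print(baskets)
```

The resulting `baskets` list has the same list-of-lists shape that `TransactionEncoder` expects, so it could replace `df1` directly.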

[34]: #Let's now create a list of the transactions so that we can transform our data
#into the correct format using TransactionEncoder.

df1 = []
#range(0, len(df) - 1) stops before the last row, which is why 38764 rows (not 38765) appear below
for i in range(0, len(df) - 1):
    data = df['itemDescription'][i].split(',')
    df1.append(data)

[73]: #Applying transaction encoder to our data set


te = TransactionEncoder()
te_ary = te.fit(df1).transform(df1)
df2 = pd.DataFrame(te_ary, columns=te.columns_)
df2

[73]: Instant food products UHT-milk abrasive cleaner artif. sweetener \


0 False False False False
1 False False False False
2 False False False False
3 False False False False
4 False False False False
… … … … …
38759 False False False False
38760 False False False False
38761 False False False False
38762 False False False False
38763 False False False False

baby cosmetics bags baking powder bathroom cleaner beef berries \


0 False False False False False False
1 False False False False False False
2 False False False False False False
3 False False False False False False
4 False False False False False False
… … … … … … …
38759 False False False False True False
38760 False False False False False False
38761 False False False False False False
38762 False False False False False False
38763 False False False False True False

… turkey vinegar waffles whipped/sour cream whisky white bread \


0 … False False False False False False
1 … False False False False False False
2 … False False False False False False
3 … False False False False False False
4 … False False False False False True
… … … … … … … …
38759 … False False False False False False
38760 … False False False False False False
38761 … False False False False False False
38762 … False False False False False False
38763 … False False False False False False

white wine whole milk yogurt zwieback


0 False False False False
1 False True False False
2 False False False False
3 False False False False
4 False True False False
… … … … …
38759 False False False False
38760 False True True False
38761 False False True False
38762 False True False False
38763 False False False False

[38764 rows x 167 columns]

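[71]: from mlxtend.frequent_patterns import fpgrowth


#Storing the result so that association_rules can use it in the next cell
frequent_itemsets = fpgrowth(df2, min_support=0.02, use_colnames=True)
frequent_itemsets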

[71]: support itemsets


0 0.127954 (rolls/buns)
1 0.079326 (tropical fruit)
2 0.183753 (whole milk)
3 0.028093 (chocolate)
4 0.137679 (other vegetables)
5 0.057708 (pip fruit)
6 0.056444 (shopping bags)
7 0.023140 (onions)
8 0.028841 (white bread)
9 0.060649 (citrus fruit)
10 0.040140 (fruit/vegetable juice)
11 0.069497 (bottled water)
12 0.048344 (whipped/sour cream)
13 0.046048 (newspapers)
14 0.112269 (soda)
15 0.060623 (pastry)
16 0.101434 (yogurt)
17 0.054742 (bottled beer)
18 0.039624 (beef)
19 0.055954 (canned beer)
20 0.047983 (frankfurter)
21 0.044165 (brown bread)
22 0.033072 (chicken)
23 0.022315 (waffles)
24 0.041895 (butter)
25 0.024533 (UHT-milk)
26 0.025384 (hamburger meat)
27 0.020638 (frozen meals)
28 0.038464 (margarine)
29 0.026287 (napkins)
30 0.040914 (curd)
31 0.021128 (long life bakery product)
32 0.020586 (butter milk)
33 0.036658 (coffee)
34 0.044139 (domestic eggs)
35 0.074580 (sausage)
36 0.022289 (salty snack)
37 0.079404 (root vegetables)
38 0.043778 (pork)
39 0.033201 (frozen vegetables)
40 0.026029 (dessert)
41 0.027035 (cream cheese )
42 0.024739 (berries)
43 0.021128 (sugar)
44 0.022237 (whole milk, rolls/buns)
45 0.022340 (other vegetables, whole milk)
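FP-Growth works by first compressing the transactions into an FP-tree and then mining that tree, but the cell above only prints the resulting itemsets. The plain-Python sketch below illustrates the construction step only, on made-up transactions. `build_fp_tree` and `dump` are hypothetical helpers, not part of mlxtend. The sketch omits the header-table links and the conditional-pattern mining that a full FP-Growth implementation performs afterwards. With `min_support=0.02` as above, the equivalent count threshold would be 0.02 times the number of transactions.

```python
from collections import Counter


class FPNode:
    """One FP-tree node: an item, how many transactions pass through it, and its children."""

    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions, min_count):
    """Drop items below min_count, sort each transaction by descending item
    frequency (ties broken by name), and insert it as a path from the root."""
    freq = Counter(item for t in transactions for item in set(t))
    frequent = {item: c for item, c in freq.items() if c >= min_count}
    root = FPNode(None, None)
    for t in transactions:
        items = sorted((i for i in set(t) if i in frequent),
                       key=lambda i: (-frequent[i], i))
        node = root
        for item in items:
            # Reuse the existing child when the prefix is shared; otherwise branch.
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
            child.count += 1
            node = child
    return root, frequent


def dump(node, depth=0):
    """Return the tree as indented 'item:count' lines, children sorted by name."""
    lines = []
    for item in sorted(node.children):
        child = node.children[item]
        lines.append("  " * depth + f"{item}:{child.count}")
        lines.extend(dump(child, depth + 1))
    return lines


txns = [["milk", "bread", "butter"], ["milk", "bread"], ["milk", "beer"], ["bread", "butter"]]
root, frequent = build_fp_tree(txns, min_count=2)
print("\n".join(dump(root)))
```

Here `beer` (count 1) is dropped. The first, second and fourth transactions share the `bread` node, so its count is 3. Sharing prefixes like this is what makes the tree smaller than the one-hot table.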

[75]: ## Compute the association rules based on the frequent itemsets


association_rules(frequent_itemsets, metric="confidence", min_threshold=0.02)

[75]: antecedents consequents antecedent support \


0 (other vegetables) (whole milk) 0.137679
1 (whole milk) (other vegetables) 0.183753
2 (whole milk) (rolls/buns) 0.183753
3 (rolls/buns) (whole milk) 0.127954

consequent support support confidence lift leverage conviction


0 0.183753 0.022340 0.162263 0.883052 -0.002959 0.974348
1 0.137679 0.022340 0.121578 0.883052 -0.002959 0.981670
2 0.127954 0.022237 0.121016 0.945782 -0.001275 0.992108
3 0.183753 0.022237 0.173790 0.945782 -0.001275 0.987942
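The metric columns follow from the three support columns. Assuming the standard definitions (confidence = supp(A∪C)/supp(A), lift = confidence/supp(C), leverage = supp(A∪C) − supp(A)·supp(C), conviction = (1 − supp(C))/(1 − confidence)), the first rule, (other vegetables) → (whole milk), can be recomputed from its printed supports as a worked check:

```python
# Supports copied from rule 0 in the table above.
support_a = 0.137679   # antecedent support: other vegetables
support_c = 0.183753   # consequent support: whole milk
support_ac = 0.022340  # support of {other vegetables, whole milk}

confidence = support_ac / support_a
lift = confidence / support_c
leverage = support_ac - support_a * support_c
conviction = (1 - support_c) / (1 - confidence)

print(round(confidence, 6), round(lift, 6), round(leverage, 6), round(conviction, 6))
```

The results match the table to within rounding; the table was computed from unrounded supports. A lift below 1, together with negative leverage, means these pairs appear together slightly less often than they would if the purchases were independent. Rules 0 and 1 (and rules 2 and 3) share lift and leverage because both metrics are symmetric in A and C.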
