LAB ASSIGNMENT 4.ipynb - Colab

The document outlines a lab assignment involving data preprocessing and analysis using the Apriori algorithm on a retail dataset. It includes steps for transforming transaction data, finding frequent itemsets with a minimum support of 0.20, and generating association rules with a minimum confidence threshold of 50%. The output includes sample transactions, transformed data, frequent itemsets, and association rules.

Uploaded by Jay Wardhan Suri
10/02/2025, 22:26 LAB ASSIGNMENT 4.ipynb - Colab

!pip install mlxtend


import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules
import matplotlib.pyplot as plt
%matplotlib inline

from google.colab import files


uploaded = files.upload()

import io

dataset = pd.read_csv(io.BytesIO(uploaded['retail_dataset.csv']))


Question 1 – Data Preprocessing


1(a) Preprocess the data and list the products
Assuming each row represents a transaction, we use the apply function to list the non-null items.

# One list per row: drop the NaN padding and keep the remaining item names
transactions = dataset.apply(lambda row: row.dropna().tolist(), axis=1).tolist()

print("Sample transactions:")
for t in transactions[:5]:
    print(t)

Sample transactions:
['Bread', 'Wine', 'Eggs', 'Meat', 'Cheese', 'Pencil', 'Diaper']
['Bread', 'Cheese', 'Meat', 'Diaper', 'Wine', 'Milk', 'Pencil']
['Cheese', 'Meat', 'Eggs', 'Milk', 'Wine']
['Cheese', 'Meat', 'Eggs', 'Milk', 'Wine']
['Meat', 'Pencil', 'Wine']
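
The row-wise extraction in 1(a) can be tried without the original file. The sketch below is only an illustration: the layout of retail_dataset.csv is not shown here, so the inline sample (a header of column labels, baskets padded with empty cells) is hypothetical, and the standard-library csv module stands in for pandas.

```python
import csv
import io

# Hypothetical sample: each row is one basket, padded with empty cells.
SAMPLE_CSV = """0,1,2,3
Bread,Wine,Eggs,Meat
Cheese,Meat,,
Meat,Pencil,Wine,
"""

def rows_to_transactions(text):
    """Return one list of non-empty items per CSV row, skipping the header."""
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header row of column labels
    return [[cell for cell in row if cell.strip()] for row in reader]

transactions_demo = rows_to_transactions(SAMPLE_CSV)
for t in transactions_demo:
    print(t)
```

Dropping the padding matters because an empty cell would otherwise be encoded as an item of its own in the next step.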

1(b) Transform the transaction list using TransactionEncoder

te = TransactionEncoder()
te_ary = te.fit(transactions).transform(transactions)

df_transformed = pd.DataFrame(te_ary, columns=te.columns_)

print("Transformed DataFrame (first 5 rows):")


df_transformed.head()

Transformed DataFrame (first 5 rows):


   Bagel  Bread  Cheese  Diaper   Eggs   Meat   Milk  Pencil   Wine
0  False   True    True    True   True   True  False    True   True
1  False   True    True    True  False   True   True    True   True
2  False  False    True   False   True   True   True   False   True
3  False  False    True   False   True   True   True   False   True
4  False  False   False   False  False   True  False    True   True


Question 2 – Apriori and Association Rules on the Preprocessed Data


2(a) Find frequent itemsets using the apriori algorithm with a minimum support of 0.20

frequent_itemsets = apriori(df_transformed, min_support=0.2, use_colnames=True)


print("Frequent Itemsets with min_support=0.2:")
frequent_itemsets

Frequent Itemsets with min_support=0.2:


support itemsets

0 0.425397 (Bagel)

1 0.504762 (Bread)

2 0.501587 (Cheese)

3 0.406349 (Diaper)

4 0.438095 (Eggs)

5 0.476190 (Meat)

6 0.501587 (Milk)

7 0.361905 (Pencil)

8 0.438095 (Wine)

9 0.279365 (Bagel, Bread)

10 0.225397 (Bagel, Milk)

11 0.238095 (Cheese, Bread)

12 0.231746 (Diaper, Bread)

13 0.206349 (Meat, Bread)

14 0.279365 (Milk, Bread)

15 0.200000 (Bread, Pencil)

16 0.244444 (Wine, Bread)

17 0.200000 (Cheese, Diaper)

18 0.298413 (Cheese, Eggs)

19 0.323810 (Cheese, Meat)

20 0.304762 (Cheese, Milk)

21 0.200000 (Cheese, Pencil)

22 0.269841 (Wine, Cheese)

23 0.234921 (Wine, Diaper)

24 0.266667 (Meat, Eggs)

25 0.244444 (Milk, Eggs)

26 0.241270 (Wine, Eggs)

27 0.244444 (Milk, Meat)

28 0.250794 (Wine, Meat)

29 0.219048 (Wine, Milk)

30 0.200000 (Wine, Pencil)

31 0.215873 (Cheese, Meat, Eggs)

32 0.203175 (Cheese, Milk, Meat)
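
To make the support values above concrete, here is a minimal pure-Python sketch of the level-wise idea behind Apriori: an itemset of size k is only counted if all of its (k-1)-subsets were frequent. It is an illustration on a made-up toy basket list, not mlxtend's implementation; the names `support`, `apriori_sketch` and `toy` are hypothetical. The comparison uses >=, which is consistent with itemsets at exactly 0.200000 appearing in the table.

```python
from itertools import combinations

def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def apriori_sketch(transactions, min_support):
    """Return {frozenset: support} for all itemsets with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted({i for t in transactions for i in t})
    current = [frozenset([i]) for i in items]  # size-1 candidates
    frequent = {}
    k = 1
    while current:
        # Count step: keep only candidates that reach the threshold
        level = {}
        for candidate in current:
            s = support(candidate, transactions)
            if s >= min_support:
                level[candidate] = s
        frequent.update(level)
        k += 1
        # Join step: unions of frequent sets that have exactly k items
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset must itself be frequent
        current = [c for c in candidates
                   if all(frozenset(s) in level for s in combinations(c, k - 1))]
    return frequent

# Made-up toy data, not taken from retail_dataset.csv
toy = [['Bread', 'Milk'], ['Bread', 'Cheese'], ['Bread', 'Milk', 'Cheese'],
       ['Milk'], ['Bread', 'Milk']]
result = apriori_sketch(toy, min_support=0.4)
for itemset, s in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

In the toy data, (Cheese, Milk) occurs in only one of five baskets, so it is dropped, and the triple is never counted because one of its pairs is infrequent.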


2(b) Generate association rules with a minimum confidence threshold of 50%

rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.5)


print("Association Rules (min confidence=0.5):")
rules

Association Rules (min confidence=0.5):
antecedents consequents antecedent support consequent support support confidence lift representativity leverage conviction

0 (Bagel) (Bread) 0.425397 0.504762 0.279365 0.656716 1.301042 1.0 0.064641 1.442650

1 (Bread) (Bagel) 0.504762 0.425397 0.279365 0.553459 1.301042 1.0 0.064641 1.286787

2 (Bagel) (Milk) 0.425397 0.501587 0.225397 0.529851 1.056348 1.0 0.012023 1.060116

3 (Diaper) (Bread) 0.406349 0.504762 0.231746 0.570312 1.129864 1.0 0.026636 1.152554

4 (Milk) (Bread) 0.501587 0.504762 0.279365 0.556962 1.103415 1.0 0.026183 1.117823

5 (Bread) (Milk) 0.504762 0.501587 0.279365 0.553459 1.103415 1.0 0.026183 1.116164

6 (Pencil) (Bread) 0.361905 0.504762 0.200000 0.552632 1.094836 1.0 0.017324 1.107003

7 (Wine) (Bread) 0.438095 0.504762 0.244444 0.557971 1.105414 1.0 0.023311 1.120375

8 (Cheese) (Eggs) 0.501587 0.438095 0.298413 0.594937 1.358008 1.0 0.078670 1.387202

9 (Eggs) (Cheese) 0.438095 0.501587 0.298413 0.681159 1.358008 1.0 0.078670 1.563203

10 (Cheese) (Meat) 0.501587 0.476190 0.323810 0.645570 1.355696 1.0 0.084958 1.477891

11 (Meat) (Cheese) 0.476190 0.501587 0.323810 0.680000 1.355696 1.0 0.084958 1.557540

12 (Cheese) (Milk) 0.501587 0.501587 0.304762 0.607595 1.211344 1.0 0.053172 1.270148

13 (Milk) (Cheese) 0.501587 0.501587 0.304762 0.607595 1.211344 1.0 0.053172 1.270148

14 (Pencil) (Cheese) 0.361905 0.501587 0.200000 0.552632 1.101765 1.0 0.018473 1.114099

15 (Wine) (Cheese) 0.438095 0.501587 0.269841 0.615942 1.227986 1.0 0.050098 1.297754

16 (Cheese) (Wine) 0.501587 0.438095 0.269841 0.537975 1.227986 1.0 0.050098 1.216177

17 (Wine) (Diaper) 0.438095 0.406349 0.234921 0.536232 1.319633 1.0 0.056901 1.280060

18 (Diaper) (Wine) 0.406349 0.438095 0.234921 0.578125 1.319633 1.0 0.056901 1.331922

19 (Meat) (Eggs) 0.476190 0.438095 0.266667 0.560000 1.278261 1.0 0.058050 1.277056

20 (Eggs) (Meat) 0.438095 0.476190 0.266667 0.608696 1.278261 1.0 0.058050 1.338624

21 (Eggs) (Milk) 0.438095 0.501587 0.244444 0.557971 1.112411 1.0 0.024701 1.127557

22 (Wine) (Eggs) 0.438095 0.438095 0.241270 0.550725 1.257089 1.0 0.049342 1.250691

23 (Eggs) (Wine) 0.438095 0.438095 0.241270 0.550725 1.257089 1.0 0.049342 1.250691

24 (Meat) (Milk) 0.476190 0.501587 0.244444 0.513333 1.023418 1.0 0.005593 1.024136

25 (Wine) (Meat) 0.438095 0.476190 0.250794 0.572464 1.202174 1.0 0.042177 1.225182

26 (Meat) (Wine) 0.476190 0.438095 0.250794 0.526667 1.202174 1.0 0.042177 1.187123

27 (Wine) (Milk) 0.438095 0.501587 0.219048 0.500000 0.996835 1.0 -0.000695 0.996825

28 (Pencil) (Wine) 0.361905 0.438095 0.200000 0.552632 1.261442 1.0 0.041451 1.256022

29 (Cheese, Meat) (Eggs) 0.323810 0.438095 0.215873 0.666667 1.521739 1.0 0.074014 1.685714

30 (Cheese, Eggs) (Meat) 0.298413 0.476190 0.215873 0.723404 1.519149 1.0 0.073772 1.893773

31 (Meat, Eggs) (Cheese) 0.266667 0.501587 0.215873 0.809524 1.613924 1.0 0.082116 2.616667

32 (Cheese, Milk) (Meat) 0.304762 0.476190 0.203175 0.666667 1.400000 1.0 0.058050 1.571429

33 (Cheese, Meat) (Milk) 0.323810 0.501587 0.203175 0.627451 1.250931 1.0 0.040756 1.337845

34 (Milk, Meat) (Cheese) 0.244444 0.501587 0.203175 0.831169 1.657077 1.0 0.080564 2.952137
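
The metric columns follow from the three support values of a rule. As a check, the sketch below recomputes confidence, lift, leverage and conviction for rule 0, (Bagel) -> (Bread), from the rounded supports printed in the table, using the standard textbook definitions. The helper name `rule_metrics` is hypothetical, and small differences in the last decimal are expected because the inputs are already rounded.

```python
def rule_metrics(support_a, support_c, support_ac):
    """Standard rule metrics from antecedent, consequent and joint support."""
    confidence = support_ac / support_a           # P(C | A)
    lift = confidence / support_c                 # P(C | A) / P(C)
    leverage = support_ac - support_a * support_c # P(A, C) - P(A) * P(C)
    # Conviction is undefined (infinite) when the rule never fails
    if confidence == 1:
        conviction = float('inf')
    else:
        conviction = (1 - support_c) / (1 - confidence)
    return {
        'confidence': confidence,
        'lift': lift,
        'leverage': leverage,
        'conviction': conviction,
    }

# Rule 0 in the table: (Bagel) -> (Bread)
m = rule_metrics(support_a=0.425397, support_c=0.504762, support_ac=0.279365)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

A lift above 1 and a positive leverage both say the same thing: Bagel and Bread appear together more often than they would if bought independently.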


2(c) Generate association rules using the lift metric with a minimum threshold of 0.65

lift_rules = association_rules(frequent_itemsets, metric="lift", min_threshold=0.65)


print("Association Rules (min lift=0.65):")
lift_rules[['antecedents', 'consequents', 'support', 'confidence', 'lift']]

Association Rules (min lift=0.65):
antecedents consequents support confidence lift

0 (Bagel) (Bread) 0.279365 0.656716 1.301042

1 (Bread) (Bagel) 0.279365 0.553459 1.301042

2 (Bagel) (Milk) 0.225397 0.529851 1.056348

3 (Milk) (Bagel) 0.225397 0.449367 1.056348

4 (Cheese) (Bread) 0.238095 0.474684 0.940411

5 (Bread) (Cheese) 0.238095 0.471698 0.940411

6 (Diaper) (Bread) 0.231746 0.570312 1.129864

7 (Bread) (Diaper) 0.231746 0.459119 1.129864

8 (Meat) (Bread) 0.206349 0.433333 0.858491

9 (Bread) (Meat) 0.206349 0.408805 0.858491

10 (Milk) (Bread) 0.279365 0.556962 1.103415

11 (Bread) (Milk) 0.279365 0.553459 1.103415

12 (Bread) (Pencil) 0.200000 0.396226 1.094836

13 (Pencil) (Bread) 0.200000 0.552632 1.094836

14 (Wine) (Bread) 0.244444 0.557971 1.105414

15 (Bread) (Wine) 0.244444 0.484277 1.105414

16 (Cheese) (Diaper) 0.200000 0.398734 0.981260

17 (Diaper) (Cheese) 0.200000 0.492188 0.981260

18 (Cheese) (Eggs) 0.298413 0.594937 1.358008

19 (Eggs) (Cheese) 0.298413 0.681159 1.358008

20 (Cheese) (Meat) 0.323810 0.645570 1.355696

21 (Meat) (Cheese) 0.323810 0.680000 1.355696

22 (Cheese) (Milk) 0.304762 0.607595 1.211344

23 (Milk) (Cheese) 0.304762 0.607595 1.211344

24 (Cheese) (Pencil) 0.200000 0.398734 1.101765

25 (Pencil) (Cheese) 0.200000 0.552632 1.101765

26 (Wine) (Cheese) 0.269841 0.615942 1.227986

27 (Cheese) (Wine) 0.269841 0.537975 1.227986

28 (Wine) (Diaper) 0.234921 0.536232 1.319633

29 (Diaper) (Wine) 0.234921 0.578125 1.319633

30 (Meat) (Eggs) 0.266667 0.560000 1.278261

31 (Eggs) (Meat) 0.266667 0.608696 1.278261

32 (Milk) (Eggs) 0.244444 0.487342 1.112411

33 (Eggs) (Milk) 0.244444 0.557971 1.112411

34 (Wine) (Eggs) 0.241270 0.550725 1.257089

35 (Eggs) (Wine) 0.241270 0.550725 1.257089

36 (Milk) (Meat) 0.244444 0.487342 1.023418

37 (Meat) (Milk) 0.244444 0.513333 1.023418

38 (Wine) (Meat) 0.250794 0.572464 1.202174

39 (Meat) (Wine) 0.250794 0.526667 1.202174

40 (Wine) (Milk) 0.219048 0.500000 0.996835

41 (Milk) (Wine) 0.219048 0.436709 0.996835

42 (Wine) (Pencil) 0.200000 0.456522 1.261442

43 (Pencil) (Wine) 0.200000 0.552632 1.261442

44 (Cheese, Meat) (Eggs) 0.215873 0.666667 1.521739

45 (Cheese, Eggs) (Meat) 0.215873 0.723404 1.519149

46 (Meat, Eggs) (Cheese) 0.215873 0.809524 1.613924

47 (Cheese) (Meat, Eggs) 0.215873 0.430380 1.613924

48 (Meat) (Cheese, Eggs) 0.215873 0.453333 1.519149

49 (Eggs) (Cheese, Meat) 0.215873 0.492754 1.521739

50 (Cheese, Milk) (Meat) 0.203175 0.666667 1.400000

51 (Cheese, Meat) (Milk) 0.203175 0.627451 1.250931

52 (Milk, Meat) (Cheese) 0.203175 0.831169 1.657077

53 (Cheese) (Milk, Meat) 0.203175 0.405063 1.657077

54 (Milk) (Cheese, Meat) 0.203175 0.405063 1.250931

55 (Meat) (Cheese, Milk) 0.203175 0.426667 1.400000
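
The lowest lift in the table is 0.858491 (rules 8 and 9), which is still above the 0.65 threshold, so the filter removes nothing here. The 56 rows are every rule that can be formed from the 22 frequent pairs and 2 frequent triples listed in 2(a). The sketch below, with a hypothetical helper `candidate_rules`, enumerates the antecedent/consequent splits of an itemset to confirm that count. Lift is symmetric in antecedent and consequent, which is why each split and its mirror share the same lift value.

```python
from itertools import combinations

def candidate_rules(itemset):
    """All (antecedent, consequent) splits of an itemset into two non-empty parts."""
    items = sorted(itemset)
    rules = []
    for r in range(1, len(items)):
        for antecedent in combinations(items, r):
            consequent = tuple(i for i in items if i not in antecedent)
            rules.append((antecedent, consequent))
    return rules

pair_rules = candidate_rules({'Bagel', 'Bread'})
triple_rules = candidate_rules({'Cheese', 'Meat', 'Eggs'})

# 22 frequent pairs and 2 frequent triples, as listed in 2(a)
total = 22 * len(pair_rules) + 2 * len(triple_rules)
print(len(pair_rules), len(triple_rules), total)  # 2 6 56
```

A lift threshold of 1.0 or higher would be needed to discard the rules whose items co-occur less often than independence predicts.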

Question 3 – Processing the Retail Dataset


3(a) Convert the dataset into binary format using one-hot encoding
We first find all unique items and then encode each transaction with 1 if the item is present, 0 otherwise.

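# Collect every distinct item; a set has no guaranteed iteration order
all_items = set(item for transaction in transactions for item in transaction)

# One dict per transaction: 1 if the item is in the basket, 0 otherwise
binary_data = []
for transaction in transactions:
    transaction_set = set(transaction)
    binary_data.append({item: (1 if item in transaction_set else 0) for item in all_items})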

binary_dataset = pd.DataFrame(binary_data)
print("Binary encoded dataset (first 5 rows):")
binary_dataset.head()

Binary encoded dataset (first 5 rows):


   Bagel  Meat  Bread  Eggs  Cheese  Diaper  Pencil  Wine  Milk
0      0     1      1     1       1       1       1     1     0
1      0     1      1     0       1       1       1     1     1
2      0     1      0     1       1       0       0     1     1
3      0     1      0     1       1       0       0     1     1
4      0     1      0     0       0       0       1     1     0
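
The column order in this table (Bagel, Meat, Bread, ...) differs from the alphabetical order in 1(b) because the columns come from iterating a Python set, whose order is not guaranteed and can change between runs. A minimal sketch of a reproducible variant follows: it sorts the item names and stores True/False instead of 1/0, matching the boolean values shown in 1(b). The helper `one_hot` and the two sample baskets are illustrative, not part of the original notebook.

```python
def one_hot(transactions):
    """Encode transactions as dicts of booleans with alphabetically sorted keys."""
    columns = sorted({item for t in transactions for item in t})
    rows = []
    for t in transactions:
        present = set(t)
        rows.append({col: col in present for col in columns})
    return columns, rows

# Two baskets copied from the sample transactions in 1(a)
sample = [['Meat', 'Pencil', 'Wine'],
          ['Cheese', 'Meat', 'Eggs', 'Milk', 'Wine']]
columns, rows = one_hot(sample)
print(columns)
print(rows[0])
```

Passing `rows` to `pd.DataFrame` should give a boolean frame with the columns in this sorted order, since pandas keeps the key order of the dicts.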


3(b) Apply the apriori algorithm to the binary dataset with minimum support of 0.2

frequent_itemsets_binary = apriori(binary_dataset, min_support=0.2, use_colnames=True)


print("Frequent Itemsets on binary dataset (min_support=0.2):")
frequent_itemsets_binary

Frequent Itemsets on binary dataset (min_support=0.2):
/usr/local/lib/python3.11/dist-packages/mlxtend/frequent_patterns/fpcommon.py:161: DeprecationWarning: DataFrames with n
warnings.warn(
support itemsets

0 0.425397 (Bagel)

1 0.476190 (Meat)

2 0.504762 (Bread)

3 0.438095 (Eggs)

4 0.501587 (Cheese)

5 0.406349 (Diaper)

6 0.361905 (Pencil)

7 0.438095 (Wine)

8 0.501587 (Milk)

9 0.279365 (Bagel, Bread)

10 0.225397 (Bagel, Milk)

11 0.206349 (Meat, Bread)

12 0.266667 (Meat, Eggs)

13 0.323810 (Cheese, Meat)

14 0.250794 (Wine, Meat)

15 0.244444 (Milk, Meat)

16 0.238095 (Cheese, Bread)

17 0.231746 (Diaper, Bread)

18 0.200000 (Bread, Pencil)

19 0.244444 (Wine, Bread)

20 0.279365 (Milk, Bread)

21 0.298413 (Cheese, Eggs)

22 0.241270 (Wine, Eggs)

23 0.244444 (Milk, Eggs)

24 0.200000 (Cheese, Diaper)

25 0.200000 (Cheese, Pencil)

26 0.269841 (Wine, Cheese)

27 0.304762 (Cheese, Milk)

28 0.234921 (Wine, Diaper)

29 0.200000 (Wine, Pencil)

30 0.219048 (Wine, Milk)

31 0.215873 (Cheese, Meat, Eggs)

32 0.203175 (Cheese, Milk, Meat)


3(c) Generate association rules on the binary dataset with a minimum confidence threshold of 0.6 and plot support vs. confidence

association_rules_binary = association_rules(frequent_itemsets_binary, metric="confidence", min_threshold=0.6)


print("Association Rules on binary dataset (min confidence=0.6):")
display(association_rules_binary[['antecedents', 'consequents', 'support', 'confidence', 'lift']])

plt.figure(figsize=(10, 6))
plt.scatter(association_rules_binary['support'], association_rules_binary['confidence'], alpha=0.6)
plt.xlabel('Support')
plt.ylabel('Confidence')
plt.title('Support vs Confidence')
plt.grid(True)
plt.show()
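
The effect of raising the confidence threshold from 0.5 to 0.6 can be seen on a handful of rules copied from the 2(b) table. The sketch below is only an illustration in plain Python: the list `sample_rules` holds five rows from that table, not the full rule set, and `filter_by_confidence` is a hypothetical helper rather than an mlxtend function.

```python
# (antecedents, consequents, support, confidence), copied from the 2(b) table
sample_rules = [
    (('Bagel',), ('Bread',), 0.279365, 0.656716),
    (('Bread',), ('Bagel',), 0.279365, 0.553459),
    (('Meat',), ('Cheese',), 0.323810, 0.680000),
    (('Milk', 'Meat'), ('Cheese',), 0.203175, 0.831169),
    (('Wine',), ('Milk',), 0.219048, 0.500000),
]

def filter_by_confidence(rules, min_threshold):
    """Keep rules whose confidence is at least min_threshold."""
    return [r for r in rules if r[3] >= min_threshold]

kept_05 = filter_by_confidence(sample_rules, 0.5)
kept_06 = filter_by_confidence(sample_rules, 0.6)

# (support, confidence) pairs are what a support-vs-confidence scatter plots
points = [(r[2], r[3]) for r in kept_06]
print(len(kept_05), len(kept_06))
print(points)
```

Of these five sample rules, all pass at 0.5 but only three remain at 0.6, so the scatter plot for 3(c) contains fewer points than the 2(b) rule table has rows.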
