KDD & Data Mining: Lab Experiment No 7: FP Growth Algorithm Name: - Gaurav Sonawane PRN:-20200802154

The document describes an experiment using the FP-Growth algorithm to find frequent itemsets in a market basket dataset. It involves creating FP-Tree and mining functions to implement FP-Growth from scratch and comparing the results to using an existing library implementation. The dataset is preprocessed and frequent itemsets above a minimum support threshold are extracted.

Uploaded by

Prathamesh More


KDD & Data Mining

Lab Experiment No 7 : FP Growth Algorithm

Name: - Gaurav Sonawane

PRN:- 20200802154

Aim: To run the FP-Growth algorithm on the given dataset (market-basket-optimisation.csv) in two ways:

1. By creating functions.
2. By using NumPy and an FP-Growth library.

The results are then validated.

By Creating Functions

In [ ]: %pip install mlxtend --upgrade

In [ ]: from __future__ import division, print_function  # must be the first statement in the cell
import warnings

import numpy as np
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import fpgrowth, association_rules

warnings.filterwarnings('ignore')

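In [ ]: # Define the Node class for the FP-Tree

class Node:
    def __init__(self, item, count, parent):
        self.item = item        # item stored on this node (None for the root)
        self.count = count      # number of transactions passing through this node
        self.parent = parent    # parent node (None for the root)
        self.children = {}      # maps item -> child Node

    def increment(self, count):
        self.count += count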

# Create the FP-Tree

def create_fp_tree(data, min_support):
    # Count the frequency of each item in the dataset
    item_counts = {}
    for transaction in data:
        for item in transaction:
            if item in item_counts:
                item_counts[item] += 1
            else:
                item_counts[item] = 1

    # Remove infrequent items from the dataset
    data = [[item for item in transaction if item_counts[item] >= min_support] for transaction in data]

    # Sort the items in each transaction by their frequency
    data = [sorted(transaction, key=lambda item: item_counts[item], reverse=True) for transaction in data]

    # Create the root node of the FP-Tree
    root = Node(None, 0, None)

    # Add each transaction to the FP-Tree
    for transaction in data:
        current_node = root
        for item in transaction:
            if item in current_node.children:
                child_node = current_node.children[item]
                child_node.increment(1)
            else:
                child_node = Node(item, 1, current_node)
                current_node.children[item] = child_node
            current_node = child_node

    return root, item_counts
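The first three steps of `create_fp_tree` (count, filter, sort) can be tried on their own before any tree is built. The following is a minimal sketch on a made-up four-transaction list, not the lab's dataset. `prepare_transactions` is a hypothetical helper name, and the alphabetical tie-break is an added assumption that keeps the item order reproducible when two items have equal counts.

```python
from collections import Counter

def prepare_transactions(data, min_support):
    """Count items, drop infrequent ones, and sort each transaction by frequency."""
    # Count in how many transactions each item occurs (set() guards against repeats).
    item_counts = Counter(item for transaction in data for item in set(transaction))
    prepared = []
    for transaction in data:
        kept = [item for item in set(transaction) if item_counts[item] >= min_support]
        # Most frequent first; ties broken alphabetically (assumption, for determinism).
        kept.sort(key=lambda item: (-item_counts[item], item))
        prepared.append(kept)
    return prepared, item_counts

# Toy data (hypothetical), minimum support count of 2.
toy = [["eggs", "milk"], ["milk", "tea"], ["eggs", "milk", "tea"], ["jam"]]
prepared, counts = prepare_transactions(toy, 2)
print(prepared)
# [['milk', 'eggs'], ['milk', 'tea'], ['milk', 'eggs', 'tea'], []]
```

Sorting every transaction into the same global order is what lets transactions with common frequent items share a prefix in the tree.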

In [ ]: # Define the FP-Growth algorithm

def fp_growth(data, min_support):
    # Create the FP-Tree
    root, item_counts = create_fp_tree(data, min_support)

    # Mine the FP-Tree for frequent itemsets
    itemset_list = []
    mine_fp_tree(root, [], itemset_list, min_support)

    # Return the frequent itemsets and their counts
    return itemset_list

# Define the function to recursively mine the FP-Tree for frequent itemsets
def mine_fp_tree(node, prefix, itemset_list, min_support):
    # Each reported count is the count stored on the last node of a
    # root-to-node path; the prefix starts with the root's item (None).
    if node.count >= min_support:
        itemset = prefix + [node.item]
        itemset_list.append((itemset, node.count))
    for child_node in node.children.values():
        mine_fp_tree(child_node, prefix + [node.item], itemset_list, min_support)
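`mine_fp_tree` walks the tree from the root and reports the count stored on each node, so its counts describe root-to-node paths. Textbook FP-Growth additionally recurses over conditional pattern bases to obtain the support of every frequent itemset. The block below is a minimal sketch of that pattern-growth recursion, not the lab's code: it uses plain projected transaction lists in place of an FP-tree, `pattern_growth` is a hypothetical name, and the data is made up.

```python
from collections import Counter

def pattern_growth(transactions, min_support, suffix=()):
    """Return {frozenset(itemset): support count} for all frequent itemsets.

    `suffix` holds the items already fixed by the enclosing recursive calls.
    """
    # Support of each item within this (conditional) database.
    counts = Counter(item for t in transactions for item in set(t))

    # Frequent items, most frequent first; ties broken by name for determinism.
    frequent = sorted((i for i in counts if counts[i] >= min_support),
                      key=lambda i: (-counts[i], i))
    rank = {item: position for position, item in enumerate(frequent)}

    found = {}
    for item in reversed(frequent):  # start from the least frequent item
        found[frozenset(suffix + (item,))] = counts[item]
        # Conditional pattern base: transactions containing `item`,
        # reduced to the frequent items ranked ahead of it.
        conditional = []
        for t in transactions:
            if item in t:
                prefix = [i for i in set(t) if i in rank and rank[i] < rank[item]]
                if prefix:
                    conditional.append(prefix)
        found.update(pattern_growth(conditional, min_support, suffix + (item,)))
    return found

# Toy data (hypothetical), minimum support count of 2.
toy = [["eggs", "milk"], ["milk", "tea"], ["eggs", "milk", "tea"], ["jam"]]
result = pattern_growth(toy, 2)
for itemset, count in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

On the toy data this reports three single items and the pairs {eggs, milk} and {milk, tea}, each with the number of transactions that contain the whole itemset.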

In [ ]: df = pd.read_csv('/content/Market_Basket_Optimisation.csv', header=None)

transaction = []

for i in df.itertuples():
    l = set(list(i))
    transaction.append([i for i in l if (str(i)!="nan" and type(i)!=int)])

len(transaction)

Out[ ]: 7501

In [ ]: itemsets = fp_growth(transaction, 150)
itemsets

Out[ ]: [([None, 'mineral water'], 1788),
 ([None, 'mineral water', 'eggs'], 382),
 ([None, 'mineral water', 'spaghetti'], 341),
 ([None, 'mineral water', 'chocolate'], 174),
 ([None, 'eggs'], 966),
 ([None, 'eggs', 'french fries'], 184),
 ([None, 'eggs', 'spaghetti'], 167),
 ([None, 'french fries'], 714),
 ([None, 'spaghetti'], 691),
 ([None, 'cookies'], 305),
 ([None, 'chocolate'], 434),
 ([None, 'green tea'], 360),
 ([None, 'escalope'], 177),
 ([None, 'milk'], 215)]
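Every itemset in this output begins with `None`, the item stored on the root node. Before comparing against a library result, it may help to strip that placeholder and key each entry by an unordered set. A minimal sketch follows; `clean_itemsets` is a hypothetical helper that is not part of the lab code, and the sample is the first two entries of the output above.

```python
def clean_itemsets(raw):
    """Map frozenset(items) -> count, dropping the root's None placeholder."""
    cleaned = {}
    for path, count in raw:
        items = frozenset(item for item in path if item is not None)
        if items:  # skip an entry that held only the placeholder
            cleaned[items] = count
    return cleaned

# First two entries of the scratch implementation's output.
sample = [([None, 'mineral water'], 1788),
          ([None, 'mineral water', 'eggs'], 382)]
cleaned = clean_itemsets(sample)
print(cleaned[frozenset({'mineral water', 'eggs'})])  # 382
```

Using `frozenset` keys makes the lookup independent of item order, which is also how mlxtend represents its itemsets.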

Using the Library's Built-in Functions

In [ ]: df = pd.read_csv("/content/Market_Basket_Optimisation.csv", names=[i for i in range(20)])
df

Out[ ]:
      0              1                  2            3                 4             5                 6     7               8             9             10              11
0     shrimp         almonds            avocado      vegetables mix    green grapes  whole weat flour  yams  cottage cheese  energy drink  tomato juice  low fat yogurt  green tea
1     burgers        meatballs          eggs         NaN               NaN           NaN               NaN   NaN             NaN           NaN           NaN             NaN
2     chutney        NaN                NaN          NaN               NaN           NaN               NaN   NaN             NaN           NaN           NaN             NaN
3     turkey         avocado            NaN          NaN               NaN           NaN               NaN   NaN             NaN           NaN           NaN             NaN
4     mineral water  milk               energy bar   whole wheat rice  green tea     NaN               NaN   NaN             NaN           NaN           NaN             NaN
...   ...            ...                ...          ...               ...           ...               ...   ...             ...           ...           ...             ...
7496  butter         light mayo         fresh bread  NaN               NaN           NaN               NaN   NaN             NaN           NaN           NaN             NaN
7497  burgers        frozen vegetables  eggs         french fries      magazines     green tea         NaN   NaN             NaN           NaN           NaN             NaN
7498  chicken        NaN                NaN          NaN               NaN           NaN               NaN   NaN             NaN           NaN           NaN             NaN
7499  escalope       green tea          NaN          NaN               NaN           NaN               NaN   NaN             NaN           NaN           NaN             NaN
7500  eggs           frozen smoothie    yogurt cake  low fat yogurt    NaN           NaN               NaN   NaN             NaN           NaN           NaN             NaN

7501 rows × 20 columns

In [ ]: transaction = []

for i in df.itertuples():
    l = set(list(i))
    transaction.append([i for i in l if (str(i)!="nan" and type(i)!=int)])

len(transaction)

Out[ ]: 7501

In [ ]: t = TransactionEncoder()
t_arr = t.fit_transform(transaction)

data = pd.DataFrame(t_arr, columns=t.columns_)
data

Out[ ]:
      asparagus  almonds  antioxydant juice  asparagus  avocado  babies food  bacon  barbecue sauce  black tea  blueberries  ...
0     False      True     True               False      True     False        False  False           False      False        ...
1     False      False    False              False      False    False        False  False           False      False        ...
2     False      False    False              False      False    False        False  False           False      False        ...
3     False      False    False              False      True     False        False  False           False      False        ...
4     False      False    False              False      False    False        False  False           False      False        ...
...   ...        ...      ...                ...        ...      ...          ...    ...             ...        ...          ...
7496  False      False    False              False      False    False        False  False           False      False        ...
7497  False      False    False              False      False    False        False  False           False      False        ...
7498  False      False    False              False      False    False        False  False           False      False        ...
7499  False      False    False              False      False    False        False  False           False      False        ...
7500  False      False    False              False      False    False        False  False           False      False        ...

7501 rows × 120 columns
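The encoded frame has one boolean column per distinct item and one row per transaction. The idea can be sketched in plain Python on made-up data. `one_hot` is a hypothetical stand-in written for illustration and is not mlxtend's implementation.

```python
def one_hot(transactions):
    """Return (columns, rows): sorted item names and one boolean row per transaction."""
    columns = sorted({item for transaction in transactions for item in transaction})
    rows = [[column in transaction for column in columns]
            for transaction in transactions]
    return columns, rows

# Toy data (hypothetical).
toy = [["eggs", "milk"], ["tea"], ["eggs", "tea"]]
columns, rows = one_hot(toy)
print(columns)  # ['eggs', 'milk', 'tea']
for row in rows:
    print(row)
# [True, True, False]
# [False, False, True]
# [True, False, True]
```

With this layout, the relative support of an item is simply the mean of its column.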

In [ ]: res = fpgrowth(data, min_support=0.05, use_colnames=True)
res

Out[ ]:
    support   itemsets
0   0.238368  (mineral water)
1   0.132116  (green tea)
2   0.076523  (low fat yogurt)
3   0.071457  (shrimp)
4   0.065858  (olive oil)
5   0.063325  (frozen smoothie)
6   0.179709  (eggs)
7   0.087188  (burgers)
8   0.062525  (turkey)
9   0.129583  (milk)
10  0.058526  (whole wheat rice)
11  0.170911  (french fries)
12  0.050527  (soup)
13  0.174110  (spaghetti)
14  0.095321  (frozen vegetables)
15  0.080389  (cookies)
16  0.051060  (cooking oil)
17  0.163845  (chocolate)
18  0.059992  (chicken)
19  0.068391  (tomatoes)
20  0.095054  (pancakes)
21  0.052393  (grated cheese)
22  0.098254  (ground beef)
23  0.079323  (escalope)
24  0.081056  (cake)
25  0.050927  (mineral water, eggs)
26  0.059725  (mineral water, spaghetti)
27  0.052660  (mineral water, chocolate)
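The two runs use thresholds on different scales: the scratch version was called with a count (150), while `fpgrowth` was given a fraction (0.05). A small worked conversion, using only numbers that appear in this report, shows how the two relate. The variable names are illustrative.

```python
import math

n_transactions = 7501          # len(transaction) reported earlier
library_min_support = 0.05     # fraction passed to fpgrowth
scratch_min_count = 150        # count passed to fp_growth

# Smallest whole number of transactions that satisfies the fractional threshold.
min_count_equivalent = math.ceil(library_min_support * n_transactions)
# The scratch threshold expressed as a fraction of all transactions.
min_fraction_equivalent = scratch_min_count / n_transactions
# Relative support of (mineral water) converted back to a count.
mineral_water_count = round(0.238368 * n_transactions)

print(min_count_equivalent)               # 376
print(round(min_fraction_equivalent, 4))  # 0.02
print(mineral_water_count)                # 1788
```

The library threshold is therefore roughly two and a half times stricter than the scratch one, which should be kept in mind when the two outputs are compared. The converted count for (mineral water), 1788, matches the count the scratch implementation reports for that item.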

In [ ]: res = association_rules(res, metric="confidence", min_threshold=0.06)
res

Out[ ]:
   antecedents      consequents      antecedent support  consequent support  support   confidence  lift      leverage  conviction
0  (mineral water)  (eggs)           0.238368            0.179709            0.050927  0.213647    1.188845  0.008090  1.043158
1  (eggs)           (mineral water)  0.179709            0.238368            0.050927  0.283383    1.188845  0.008090  1.062815
2  (mineral water)  (spaghetti)      0.238368            0.174110            0.059725  0.250559    1.439085  0.018223  1.102008
3  (spaghetti)      (mineral water)  0.174110            0.238368            0.059725  0.343032    1.439085  0.018223  1.159314
4  (mineral water)  (chocolate)      0.238368            0.163845            0.052660  0.220917    1.348332  0.013604  1.073256
5  (chocolate)      (mineral water)  0.163845            0.238368            0.052660  0.321400    1.348332  0.013604  1.122357
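The metric columns can be re-derived from the three support values in each row, which gives a quick check on the table. The sketch below applies the standard definitions of confidence, lift, leverage, and conviction to row 0. `rule_metrics` is a hypothetical helper, and small differences in the last digits are expected because the inputs are the rounded values printed above.

```python
def rule_metrics(support_a, support_c, support_ac):
    """Standard association-rule metrics for the rule A -> C."""
    confidence = support_ac / support_a            # P(C | A)
    lift = confidence / support_c                  # confidence relative to P(C)
    leverage = support_ac - support_a * support_c  # excess over independence
    conviction = (1 - support_c) / (1 - confidence)
    return confidence, lift, leverage, conviction

# Row 0 of the rules table: (mineral water) -> (eggs)
confidence, lift, leverage, conviction = rule_metrics(0.238368, 0.179709, 0.050927)
print(confidence, lift, leverage, conviction)
```

A lift above 1, as in all six rows, means the two items occur together more often than they would if they were independent.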
