U3 - FP Trees - 5th Sem - DS

Unit 3 - FP Trees

Prepared by: Varun Rao (Dean, Data Science & AI)


For: Data Science - 3rd years

As we all know, Apriori is an algorithm for frequent pattern mining that works by
generating candidate itemsets and discovering the most frequent ones. While its
pruning step greatly reduces the number of candidate itemsets that must be
examined, Apriori has its own shortcomings as well.

The FP-Growth Algorithm was proposed by Han et al. in 2000. It is an efficient and
scalable method for mining the complete set of frequent patterns by pattern fragment
growth, using an extended prefix-tree structure, called the frequent-pattern tree
(FP-tree), to store compressed but crucial information about frequent patterns. In his
study, Han showed that the method outperforms other popular methods for mining frequent
patterns, e.g. the Apriori Algorithm and TreeProjection. Later works showed that
FP-Growth also performs better than other methods, including Eclat and Relim. The
popularity and efficiency of the FP-Growth Algorithm have led to many studies that
propose variations to improve its performance.

Han defines the FP-tree as the tree structure given below:

1. One root labelled "null", with a set of item-prefix subtrees as children, and a
frequent-item-header table.

2. Each node in an item-prefix subtree consists of three fields:

○ Item-name: registers which item the node represents;

○ Count: the number of transactions represented by the portion of the path
reaching the node;

○ Node-link: links to the next node in the FP-tree carrying the same item
name, or null if there is none.

3. Each entry in the frequent-item-header table consists of two fields:

○ Item-name: the same as in the node;

○ Head of node-link: a pointer to the first node in the FP-tree carrying the
item name.

The construction of an FP-tree is subdivided into three major steps.

1. Scan the data set to determine the support count of each item, discard the
infrequent items and sort the frequent items in decreasing order of support.
2. Scan the data set one transaction at a time to create the FP-tree. For each
transaction:
1. If it is a unique transaction, form a new path and set the counter for each
node to 1.
2. If it shares a common prefix itemset, then increment the common itemset
node counters and create new nodes as needed.
3. Continue until every transaction has been mapped onto the tree.
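The construction steps above can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation; the `Node` class, the toy `transactions`, and the `min_support` value are all made-up for the example.

```python
from collections import Counter

class Node:
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}          # item-name -> child Node

def build_fp_tree(transactions, min_support):
    # Step 1: count supports, keep frequent items, sort by decreasing count.
    support = Counter(item for t in transactions for item in t)
    frequent = {i for i, c in support.items() if c >= min_support}

    root = Node(None, None)         # the "null" root
    header = {}                     # item-name -> list of nodes (node-links)
    for t in transactions:
        # Order each transaction's frequent items by decreasing support.
        items = sorted((i for i in t if i in frequent),
                       key=lambda i: (-support[i], i))
        node = root
        for item in items:          # Step 2: share prefixes, grow new paths.
            if item not in node.children:
                child = Node(item, node)
                node.children[item] = child
                header.setdefault(item, []).append(child)
            node = node.children[item]
            node.count += 1
    return root, header

transactions = [{"a", "b"}, {"b", "c", "d"}, {"a", "b", "c"}, {"a", "b", "d"}]
root, header = build_fp_tree(transactions, min_support=2)
print({item: sum(n.count for n in nodes) for item, nodes in header.items()})
```

Summing the counts over each item's node-links recovers that item's total support, which is a useful sanity check on the tree.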

Shortcomings Of Apriori Algorithm

1. Apriori requires the generation of candidate itemsets. These candidate
itemsets may be very numerous if the number of items in the database is large.
2. Apriori needs multiple scans of the database to check the support of
each itemset generated, and this leads to high costs.

These shortcomings can be overcome using the FP-Growth algorithm.

Frequent Pattern Growth Algorithm

This algorithm is an improvement on the Apriori method. Frequent patterns are
generated without the need for candidate generation. The FP-Growth algorithm
represents the database in the form of a tree called a frequent pattern tree, or FP
tree.

This tree structure maintains the association between the itemsets. The database
is fragmented using one frequent item at a time. Each fragmented part is called a
"pattern fragment". The itemsets of these pattern fragments are analyzed. Thus,
with this method, the search for frequent itemsets is reduced considerably.
FP Tree
Frequent Pattern Tree is a tree-like structure that is made with the initial itemsets of
the database. The purpose of the FP tree is to mine the most frequent pattern. Each
node of the FP tree represents an item of the itemset.

The root node represents null, while the lower nodes represent the itemsets. The
associations of the nodes with the lower nodes, that is, of the itemsets with the
other itemsets, are maintained while forming the tree.

Frequent Pattern Algorithm Steps

The frequent pattern growth method lets us find the frequent pattern without
candidate generation.

Let us see the steps followed to mine the frequent pattern using the frequent
pattern growth algorithm:

#1) The first step is to scan the database to find the occurrences of the itemsets.
This step is the same as the first step of Apriori. The count of 1-itemsets in the
database is called the support count or frequency of the 1-itemset.

#2) The second step is to construct the FP tree. For this, create the root of the tree.
The root is represented by null.

#3) The next step is to scan the database again and examine the transactions.
Examine the first transaction and find the itemset in it. The item with the
maximum count is taken at the top, followed by the next item with a lower count,
and so on. That is, the branch of the tree is constructed with the transaction's
items in descending order of count.

#4) The next transaction in the database is examined. Its items are again ordered
in descending order of count. If any prefix of this transaction is already present
in another branch (for example, from the 1st transaction), then this transaction's
branch shares that common prefix from the root.

This means that the common itemset is linked to a new node for the remaining
items of this transaction.

#5) The count of each node is incremented as it occurs in transactions: both
common nodes and new nodes are increased by 1 as transactions are mapped onto
the tree.

#6) The next step is to mine the created FP tree. For this, the lowest node is
examined first, along with the node-links of the lowest nodes. The lowest node
represents a frequency pattern of length 1. From there, traverse the paths in the
FP tree. These paths are called the conditional pattern base.

The conditional pattern base is a sub-database consisting of the prefix paths in
the FP tree that occur with the lowest node (the suffix).

#7) Construct a conditional FP tree, which is formed from the counts of itemsets
along these paths. Only the itemsets meeting the threshold support are considered
in the conditional FP tree.

#8) Frequent patterns are generated from the conditional FP tree.
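The conditional pattern base of steps #6-#7 can be sketched without an explicit tree: once each transaction is ordered by decreasing support, the prefix path for a suffix item is simply the items that precede it. The `transactions` and `min_support` below are made-up toy values for illustration.

```python
from collections import Counter

transactions = [{"a", "b"}, {"b", "c", "d"}, {"a", "b", "c"}, {"a", "b", "d"}]
min_support = 2

support = Counter(i for t in transactions for i in t)
frequent = [i for i, c in support.items() if c >= min_support]

def conditional_pattern_base(suffix):
    # Each ordered transaction containing the suffix contributes the
    # prefix path that precedes it (with count 1 per transaction here).
    base = []
    for t in transactions:
        items = sorted((i for i in t if i in frequent),
                       key=lambda i: (-support[i], i))
        if suffix in items:
            prefix = items[:items.index(suffix)]
            if prefix:
                base.append(tuple(prefix))
    return base

print(conditional_pattern_base("d"))   # → [('b', 'c'), ('b', 'a')]
```

In a real FP-tree the prefix paths carry the suffix node's count, so identical paths are merged; here each transaction contributes its path with a count of one.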

FP Growth vs Apriori

Pattern Generation
FP Growth: generates patterns by constructing an FP tree.
Apriori: generates patterns by pairing the items into singletons, pairs and triplets.

Candidate Generation
FP Growth: there is no candidate generation.
Apriori: uses candidate generation.

Process
FP Growth: the process is faster than Apriori; its runtime increases linearly with the number of itemsets.
Apriori: the process is comparatively slower; its runtime increases exponentially with the number of itemsets.

Memory Usage
FP Growth: a compact version of the database is saved in memory.
Apriori: the candidate combinations are saved in memory.

Advantages Of FP Growth Algorithm

1. This algorithm needs to scan the database only twice, compared
to Apriori, which scans the transactions for each iteration.
2. The pairing of items is not done in this algorithm and this makes it
faster.
3. The database is stored in a compact version in memory.
4. It is efficient and scalable for mining both long and short frequent
patterns.

Disadvantages Of FP-Growth Algorithm

1. The FP tree is more cumbersome and difficult to build than Apriori's
candidate sets.
2. It may be expensive to build.
3. When the database is large, the tree may not fit in main
memory.

Finding co-occurring words

Word co-occurrence statistics describe how words occur together, which in turn
captures the relationships between words. Word co-occurrence statistics are
computed simply by counting how often two or more words occur together in a
given corpus. Co-occurrence matrices analyze text in context. Word
embeddings and vector semantics are ways to understand words in their
context, namely semantic analysis in NLP (in contrast to syntactic analysis
such as language modeling using n-grams, Part-of-Speech (POS) tagging, and
Named Entity Recognition (NER)).

Unlike the occurrence matrix, which is rectangular, the co-occurrence matrix is
a square matrix that depicts the co-occurrence of two terms in a context. Thus,
the co-occurrence matrix is also sometimes called the term-term matrix. It is
square because it relates each term to every other term.

Typically there are two approaches which are followed:

1. Term-context matrix, e.g. each sentence is treated as a context (there can
be other definitions as well). If two terms occur in the same context, they
are said to have occurred in the same occurrence context.

2. k-skip-n-gram approach, e.g. a sliding window that includes (k+n) words.
This window serves as the context; terms that co-occur within it are said
to have co-occurred.


A disadvantage of the term-context matrix is that it does not consider words
that are close to each other but fall in different sentences.
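A sliding-window co-occurrence count of the kind described above can be sketched as follows. The toy corpus and window size are made-up for illustration, and the matrix is kept as a nested Counter rather than a dense array.

```python
from collections import Counter, defaultdict

corpus = ["the cat sat on the mat", "the dog sat on the log"]
window = 2                                  # words to each side

cooc = defaultdict(Counter)                 # term -> Counter of co-terms
for sentence in corpus:
    tokens = sentence.split()
    for i, w in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                cooc[w][tokens[j]] += 1

print(cooc["sat"]["on"])   # "sat" and "on" are adjacent in both sentences
```

Because co-occurrence within a window is symmetric, `cooc[a][b]` always equals `cooc[b][a]`, which is what makes the resulting matrix a square term-term matrix.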

The concept of looking into word co-occurrences can be extended in
many ways. For example, we may count how many times a sequence of
three words occurs together to generate trigram frequencies. We may
even count how many times a pair of words occurs together in sentences
irrespective of their positions; such occurrences are called
skip-bigram frequencies. Because of such variations in how co-occurrences
are specified, these methods are known in general as n-gram methods. The
term context window is often used to specify the co-occurrence relationship.

Co-occurrence analysis is simply the counting of paired data within a
collection unit. For example, buying shampoo and a brush at a drug store
is an example of co-occurrence. Here the data is the brush and the
shampoo, and the collection unit is the particular transaction. In this
example, the paired data is {shampoo, brush} and it occurs once. Of
course, more items can be purchased at a time, so the pairings become
more numerous as each item is paired with each other item. For
example, if in addition to the two items a third item is purchased, say
goo, then there are three pairings ({shampoo, brush}, {shampoo, goo},
{brush, goo}), again each with a count of one.
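The pair counting just described is a short exercise with the standard library; the `baskets` list below reproduces the shampoo/brush/goo example from the text.

```python
from collections import Counter
from itertools import combinations

baskets = [["shampoo", "brush", "goo"]]

pair_counts = Counter()
for basket in baskets:
    # Every unordered pair of distinct items in the basket co-occurs once.
    pair_counts.update(combinations(sorted(basket), 2))

print(pair_counts)
```

Sorting each basket first gives every pair a canonical order, so {shampoo, brush} and {brush, shampoo} are counted as the same pairing.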

Mining a click stream from a news site


The clickstream analysis is the tracking and analysis of visits to websites.
Although there are other ways to collect this data, clickstream analysis
typically uses the Web server log files to monitor and measure website
activity. This analysis can be used to report user behavior on a specific
website, such as routing, stickiness (a user’s tendency to remain at the
website), where users come from and where they go from the site. It can also
be used for more aggregate measurements, such as the number of hits
(visits), page views, and unique and repeat visitors, which are of value in
understanding how the website operates from a technical, user experience
and business perspective.
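The aggregate measurements mentioned above (hits, page views, unique and repeat visitors) can be computed directly from a parsed server log. The records and field names (`user`, `page`) below are made-up for illustration, standing in for whatever a real log parser would produce.

```python
from collections import Counter

log = [
    {"user": "u1", "page": "/home"},
    {"user": "u1", "page": "/news/sports"},
    {"user": "u2", "page": "/home"},
    {"user": "u1", "page": "/home"},
    {"user": "u3", "page": "/news/tech"},
]

page_views = len(log)                                     # total hits
views_per_page = Counter(r["page"] for r in log)          # per-page views
visits_per_user = Counter(r["user"] for r in log)         # hits per visitor
unique_visitors = len(visits_per_user)
repeat_visitors = sum(1 for c in visits_per_user.values() if c > 1)

print(page_views, unique_visitors, repeat_visitors)   # 5 3 1
print(views_per_page.most_common(1))                  # [('/home', 3)]
```

Real clickstream data would also carry timestamps, referrers and session identifiers, which is what enables the routing and stickiness measures described in the text.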

Applications of Clickstream Analysis


Clickstream analysis has a wide range of applications. Using just clickstream data,
webmasters can identify which pieces of content may need to be improved and optimize
the links between other pieces of content. However, combining clickstream data with
session analytics — to compare traffic channels or advertising strategies — may be
most popular.

Comparing traffic channels

Webmasters can use clickstream analysis to compare traffic channels if they
know how their users first reached the website. With most website analytics
tools, webmasters will have this information; for example, whether a given user
reached the website through a search engine, social media, or by typing the
website's URL into their browser.

From the clickstream data itself, a webmaster may be able to identify that users
from certain channels view more or fewer pages on average. For example, they
may notice that users from search engines view twice as many pages as users
from social media. As a result, the webmaster may choose to focus more
resources on the former channel.

Improving existing content

Clickstream analysis can still be incredibly powerful, even without session
analytics. By looking at the path users take through a website, webmasters are
able to see where users "drop off". With this information, they can choose to
improve the pieces of content which caused users to leave the website.

How to Do Clickstream Analysis


Clickstream analysis is surprisingly easy to get started with. In just four steps,
you can begin to gain insights on the behavior of your website’s users.
1. Understand your objectives

It may sound cliché, but the first step in effective clickstream analysis is
understanding your objectives. It's helpful to know whether you are using
clickstream analysis to review your traffic channels and advertising strategies or
to improve your content and its interlinking. Once you know this, the remaining
steps will be much easier.

2. Collect and visualize data

Like with any data analysis, the next step in clickstream analysis is to collect the
data itself. There are numerous ways to collect clickstream data which we’ll
discuss later. Following the collection, it’s helpful to have a way to visualize or
otherwise review the data in a convenient format. This may be offered by the
same tools used for data collection.

3. Identify patterns and exceptions

Generally speaking, data analysis is all about identifying patterns and exceptions.
Regardless of what you are trying to achieve with clickstream analysis, you will
want to look at patterns and exceptions in the way users interact with your pages.
If you are comparing traffic channels or advertising strategies, also look at
patterns and exceptions in how users from varying sources interact with your
website.

4. Draw and implement conclusions

After analyzing the patterns and exceptions in clickstream data, you should
be able to draw conclusions about the pages on and users of your website.
Finally, you can implement these conclusions to improve your website.
