FP-Growth: Data Mining, Lecture 5
Course Teacher
Books
• “Introduction to Data Mining” by Tan, Steinbach, Kumar.
Overview
• The FP-tree contains a compressed
representation of the transaction database.
• A trie (prefix-tree) data structure is used
• Each transaction is a path in the tree – paths can
overlap.
Definition of ‘trie’
• A trie (prefix tree) is a tree in which each root-to-node path spells out a sequence; sequences that share a prefix share the corresponding path.
FP-tree Construction
• The FP-tree is a trie (prefix tree)
• Since transactions are sets of items, we need to transform them into ordered sequences so that we can have prefixes
  • Otherwise, there is no common prefix between the sets {A,B} and {B,C,A}
• We need to impose an order on the items
  • Initially, assume a lexicographic order

TID  Items
1    {A,B}
2    {B,C,D}
3    {A,C,D,E}
4    {A,D,E}
5    {A,B,C}
6    {A,B,C,D}
7    {B,C}
8    {A,B,C}
9    {A,B,D}
10   {B,C,E}
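The transformation above can be sketched in a few lines (an illustrative Python snippet, not from the slides; the variable names are assumptions): sorting each transaction under a fixed item order turns sets into ordered sequences, so shared prefixes become visible.

```python
# Impose a lexicographic order on the items of each transaction,
# turning the sets of the table above into ordered sequences.
transactions = [
    {"A", "B"}, {"B", "C", "D"}, {"A", "C", "D", "E"}, {"A", "D", "E"},
    {"A", "B", "C"}, {"A", "B", "C", "D"}, {"B", "C"}, {"A", "B", "C"},
    {"A", "B", "D"}, {"B", "C", "E"},
]
ordered = [tuple(sorted(t)) for t in transactions]

# Transactions that share items now share prefixes:
print(ordered[0])  # ('A', 'B')
print(ordered[4])  # ('A', 'B', 'C')
```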
FP-tree Construction
• Initially the tree is empty: only the null root

null
FP-tree Construction
• Reading transaction TID = 1: {A,B}
• Each node in the tree has a label consisting of the item and the support (the number of transactions that reach that node, i.e., follow that path)

null
└── A:1
    └── B:1
FP-tree Construction
• Reading transaction TID = 2: {B,C,D}

null
├── A:1
│   └── B:1
└── B:1
    └── C:1
        └── D:1

Each transaction is a path in the tree
FP-tree Construction
After reading transactions TID = 1, 2:

null
├── A:1
│   └── B:1
└── B:1
    └── C:1
        └── D:1

Header Table
Item  Pointer
A
B
C
D
E

The Header Table and the pointers assist in computing the itemset support
FP-tree Construction
• Reading transaction TID = 3: {A,C,D,E}
• A's count is incremented and a new path C → D → E is created under A:

null
├── A:2
│   ├── B:1
│   └── C:1
│       └── D:1
│           └── E:1
└── B:1
    └── C:1
        └── D:1
FP-Tree Construction
Each transaction is a path in the tree. After reading the whole transaction database:

null
├── A:7
│   ├── B:5
│   │   ├── C:3
│   │   │   └── D:1
│   │   └── D:1
│   ├── C:1
│   │   └── D:1
│   │       └── E:1
│   └── D:1
│       └── E:1
└── B:3
    └── C:3
        ├── D:1
        └── E:1

Header table
Item  Pointer
A
B
C
D
E

Pointers link all nodes with the same item and are used to assist frequent itemset generation
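The construction just traced can be sketched as follows (an illustrative implementation, not the slides' code; the `Node` class and the dict-based children are assumptions):

```python
# Build the FP-tree of the example by inserting each transaction as a
# path; the count on a node is the support of the path leading to it.
class Node:
    def __init__(self, item):
        self.item = item
        self.count = 0
        self.children = {}  # item -> child Node

def insert(root, transaction):
    node = root
    for item in sorted(transaction):  # ordered sequence -> path
        if item not in node.children:
            node.children[item] = Node(item)
        node = node.children[item]
        node.count += 1  # one more transaction follows this path

root = Node(None)
for t in [{"A","B"}, {"B","C","D"}, {"A","C","D","E"}, {"A","D","E"},
          {"A","B","C"}, {"A","B","C","D"}, {"B","C"}, {"A","B","C"},
          {"A","B","D"}, {"B","C","E"}]:
    insert(root, t)

# The counts match the tree above:
print(root.children["A"].count)                # 7
print(root.children["B"].count)                # 3
print(root.children["A"].children["B"].count)  # 5
```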
FP-tree size
• Every transaction is a path in the FP-tree
• The size of the tree depends on the compressibility of the data
  • Extreme case: all transactions are identical; the FP-tree is a single branch
  • Extreme case: all transactions are different; the tree is as large as the database (actually larger, since we need additional pointers)
Item ordering
• The size of the tree also depends on the ordering of the items.
• Heuristic: order the items according to their frequency, from larger to smaller.
  • We would need an extra pass over the dataset to count the frequencies.
• Example: consider the lattice of all itemsets over the items A–E:

All Itemsets
E  D  C  B  A
DE CE BE AE CD BD AD BC AC AB
CDE BDE ADE BCE ACE ABE BCD ACD ABD ABC
BCDE ACDE ABDE ABCE ABCD
ABCDE
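The heuristic can be sketched as one extra counting pass (an illustrative Python snippet, not from the slides): count each item's frequency, then sort every transaction by descending frequency so that frequent items sit near the root of the tree.

```python
from collections import Counter

transactions = [
    {"A","B"}, {"B","C","D"}, {"A","C","D","E"}, {"A","D","E"},
    {"A","B","C"}, {"A","B","C","D"}, {"B","C"}, {"A","B","C"},
    {"A","B","D"}, {"B","C","E"},
]

# Extra pass: count in how many transactions each item appears.
freq = Counter(item for t in transactions for item in t)
# freq: B:8, A:7, C:7, D:5, E:3

# Sort each transaction by descending frequency (ties broken
# lexicographically), so frequent items become shared prefixes.
ordered = [sorted(t, key=lambda i: (-freq[i], i)) for t in transactions]
print(ordered[0])  # ['B', 'A']
print(ordered[1])  # ['B', 'C', 'D']
```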
Frequent Itemsets
All Itemsets
E  D  C  B  A                              Frequent?
DE CE BE AE CD BD AD BC AC AB              Frequent?
CDE BDE ADE BCE ACE ABE BCD ACD ABD ABC    Frequent?
BCDE ACDE ABDE ABCE ABCD                   Frequent?
ABCDE                                      Frequent?

We can generate all itemsets this way
We expect the FP-tree to contain far fewer itemsets
Header table
Item  Pointer
A
B
C
D
E

Bottom-up traversal of the tree: first the itemsets ending in E, then those ending in D, etc., each time a suffix-based class

We will then see how to compute the support for the possible itemsets
Ending in C: the paths of the FP-tree that contain C
Ending in B: the paths of the FP-tree that contain B
Algorithm
• For each suffix X
  • Phase 1
    • Construct the prefix tree for X as shown before, and compute the support using the header table and the pointers
  • Phase 2
    • If X is frequent, construct the conditional FP-tree for X in the following steps
      1. Recompute support
      2. Prune infrequent items
      3. Prune leaves and recurse
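The two phases can be sketched recursively. The version below is an assumed implementation, not the slides' code: it works on conditional pattern bases (lists of (prefix path, count) pairs) instead of an explicit trie, but it performs the same recompute/prune/recurse steps and, because subproblems are disjoint, enumerates every frequent itemset exactly once.

```python
from collections import Counter

def fp_growth(patterns, minsup, suffix, out):
    """patterns: conditional pattern base, a list of
    (ordered item tuple, count) pairs."""
    # Phase 1: recompute the support of every item in the base.
    support = Counter()
    for items, count in patterns:
        for item in items:
            support[item] += count
    for item in sorted(support):
        if support[item] < minsup:
            continue  # prune infrequent items
        itemset = suffix | {item}
        out[frozenset(itemset)] = support[item]
        # Phase 2: keep only the prefix paths of this item
        # (prune the leaves) and recurse on the subproblem.
        cond = []
        for items, count in patterns:
            if item in items:
                prefix = items[:items.index(item)]
                if prefix:
                    cond.append((prefix, count))
        fp_growth(cond, minsup, itemset, out)

transactions = [
    {"A","B"}, {"B","C","D"}, {"A","C","D","E"}, {"A","D","E"},
    {"A","B","C"}, {"A","B","C","D"}, {"B","C"}, {"A","B","C"},
    {"A","B","D"}, {"B","C","E"},
]
out = {}
fp_growth([(tuple(sorted(t)), 1) for t in transactions],
          minsup=2, suffix=frozenset(), out=out)

print(out[frozenset({"E"})])         # 3
print(out[frozenset({"D", "E"})])    # 2
print(frozenset({"B", "E"}) in out)  # False: support 1 < minsup
```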
Example
Phase 1 – construct the prefix tree for suffix E
Find all prefix paths that contain E:

null
├── A:7
│   ├── C:1
│   │   └── D:1
│   │       └── E:1
│   └── D:1
│       └── E:1
└── B:3
    └── C:3
        └── E:1

Compute the support for E (minsup = 2)
How? Follow the pointers while summing up the counts of the E nodes: 1 + 1 + 1 = 3 ≥ 2, so E is frequent
Example
E is frequent, so we proceed with Phase 2: build the conditional FP-tree for E

Recompute support: each node's count becomes the total count of the E nodes below it:

null
├── A:2
│   ├── C:1
│   │   └── D:1
│   │       └── E:1
│   └── D:1
│       └── E:1
└── B:1
    └── C:1
        └── E:1

Truncate: delete the nodes of E:

null
├── A:2
│   ├── C:1
│   │   └── D:1
│   └── D:1
└── B:1
    └── C:1

Prune infrequent: in the conditional FP-tree some nodes may have support less than minsup; e.g., B (support 1) needs to be pruned. This means that B appears with E fewer than minsup times. When B is removed, its child C:1 is attached to the root:

null
├── A:2
│   ├── C:1
│   │   └── D:1
│   └── D:1
└── C:1
Example
Phase 1 for suffix D in the conditional FP-tree of E (i.e., itemsets ending in DE): find all prefix paths that contain D

null
└── A:2
    ├── C:1
    │   └── D:1
    └── D:1

Support of D: 1 + 1 = 2 ≥ minsup, so {D,E} is frequent

Phase 2: recompute support (A:2, C:1), truncate the D nodes, and prune C (support 1 < minsup). The conditional FP-tree for DE is:

null
└── A:2

so {A,D,E} is frequent (support 2), and the recursion for suffix DE ends
Example
Back in the conditional FP-tree of E, repeat for suffix C (itemsets ending in CE). Phase 1: find all prefix paths that contain C

null
├── A:2
│   └── C:1
└── C:1

Support of C: 1 + 1 = 2 ≥ minsup, so {C,E} is frequent

Phase 2: recompute support (A:1), prune A (support 1 < minsup), and truncate the C nodes; nothing remains, so the recursion for suffix CE ends
Example
Finally, suffix A (itemsets ending in AE) in the conditional FP-tree of E. Phase 1: find all prefix paths that contain A

null
└── A:2

Support of A is 2 ≥ minsup, so {A,E} is frequent; A has no prefix paths, so the recursion ends

Example
• So for E we have the following frequent itemsets:
{E}, {D,E}, {A,D,E}, {C,E}, {A,E}
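As a sanity check (not part of the slides' algorithm), the result for suffix E can be verified by brute-force counting over the 10 transactions:

```python
from itertools import combinations

transactions = [
    {"A","B"}, {"B","C","D"}, {"A","C","D","E"}, {"A","D","E"},
    {"A","B","C"}, {"A","B","C","D"}, {"B","C"}, {"A","B","C"},
    {"A","B","D"}, {"B","C","E"},
]
minsup = 2

def support(itemset):
    # Number of transactions containing every item of the itemset.
    return sum(1 for t in transactions if itemset <= t)

# Enumerate every itemset that contains E and test it directly.
frequent_with_E = set()
for r in range(5):
    for combo in combinations("ABCD", r):
        candidate = set(combo) | {"E"}
        if support(candidate) >= minsup:
            frequent_with_E.add(frozenset(candidate))

expected = {frozenset(s) for s in
            [{"E"}, {"D","E"}, {"A","D","E"}, {"C","E"}, {"A","E"}]}
print(frequent_with_E == expected)  # True
```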
Example
Ending in D: repeat the process on the full FP-tree for suffix D

Phase 1 – construct the prefix tree: find all prefix paths that contain D

null
├── A:7
│   ├── B:5
│   │   ├── C:3
│   │   │   └── D:1
│   │   └── D:1
│   ├── C:1
│   │   └── D:1
│   └── D:1
└── B:3
    └── C:3
        └── D:1

Support of D: 1 + 1 + 1 + 1 + 1 = 5 ≥ minsup, so D is frequent

Phase 2: convert the prefix tree into the conditional FP-tree
Example
Recompute support: each node's count becomes the total count of the D nodes below it:

null
├── A:4
│   ├── B:2
│   │   ├── C:1
│   │   │   └── D:1
│   │   └── D:1
│   ├── C:1
│   │   └── D:1
│   └── D:1
└── B:1
    └── C:1
        └── D:1

Prune nodes: all remaining items are frequent here, so we only delete the D leaves to obtain the conditional FP-tree for D:

null
├── A:4
│   ├── B:2
│   │   └── C:1
│   └── C:1
└── B:1
    └── C:1

And so on….
Observations
• At each recursive step we solve a subproblem
• Construct the prefix tree
• Compute the new support
• Prune nodes
• Subproblems are disjoint so we never consider
the same itemset twice
Observations
• The efficiency of the algorithm depends on the
compaction factor of the dataset