FP Growth Datamining Lect 5

The document describes the FP-tree and FP-growth algorithm for efficiently finding frequent itemsets in transactional datasets, where an FP-tree is constructed to compactly represent the transaction database and allow the FP-growth algorithm to recursively mine the tree to enumerate all frequent itemsets using a divide-and-conquer approach.


MS (Data Science)

Fall 2020 Semester

CT-530 DATA MINING

Dr. Sohail Abdul Sattar


Ex-Professor & Co-Chairman, Department of Computer Science & Information Technology, NED University of Engineering & Technology
2

Course Teacher

Dr. Sohail Abdul Sattar


Ex-Professor & Co-Chairman, Department of Computer Science & Information Technology, NED University

PhD Computer Vision NED + UCF (Orlando, US)


MS Comp. Science NED
MCS Comp. Science KU
BE Mech. Engg. NED
3

Books
• “Introduction to Data Mining” by Tan, Steinbach, Kumar.
• “Mining Massive Datasets” by Anand Rajaraman, Jeff Ullman, and Jure Leskovec. Free online book; includes slides from the course.
• “Data Mining: Concepts and Techniques” by Jiawei Han and Micheline Kamber.
• “Data Mining: Practical Machine Learning Tools and Techniques” by Ian H. Witten.
4

Thanks to the owner of these slides !!!


DATA MINING
LECTURE 5
FP-Tree and
FP-Growth Algorithm
6

THE FP-TREE AND THE


FP-GROWTH ALGORITHM
Slides from course lecture of E. Pitoura
7

Overview
• The FP-tree contains a compressed representation of the transaction database.
• A trie (prefix-tree) data structure is used.
• Each transaction is a path in the tree; paths can overlap.
• Once the FP-tree is constructed, the recursive, divide-and-conquer FP-Growth algorithm is used to enumerate all frequent itemsets.
8

Definition of ‘trie’
9

FP-tree Construction
• The FP-tree is a trie (prefix tree)
• Since transactions are sets of items, we need to transform them into ordered sequences so that we can have prefixes
• Otherwise, there is no common prefix between the sets {A,B} and {B,C,A}
• We need to impose an order on the items
• Initially, assume a lexicographic order.

TID Items
1 {A,B}
2 {B,C,D}
3 {A,C,D,E}
4 {A,D,E}
5 {A,B,C}
6 {A,B,C,D}
7 {B,C}
8 {A,B,C}
9 {A,B,D}
10 {B,C,E}
10

FP-tree Construction
• Initially the tree is empty (just the null root)

TID Items
1 {A,B}
2 {B,C,D}
3 {A,C,D,E}
4 {A,D,E}
5 {A,B,C}
6 {A,B,C,D}
7 {B,C}
8 {A,B,C}
9 {A,B,D}
10 {B,C,E}

null
11

FP-tree Construction
• Reading transaction TID = 1

TID Items null


1 {A,B}
2 {B,C,D}
3 {A,C,D,E} A:1
4 {A,D,E}
5 {A,B,C}
6 {A,B,C,D} B:1
7 {B,C}
8 {A,B,C}
9 {A,B,D}
10 {B,C,E}

Node label = item:support
• Each node in the tree has a label consisting of the item and its support (the number of transactions that reach that node, i.e., follow that path)
12

FP-tree Construction
• Reading transaction TID = 2
TID Items null
1 {A,B}
2 {B,C,D}
A:1 B:1
3 {A,C,D,E}
4 {A,D,E}
5 {A,B,C} B:1
6 {A,B,C,D}
C:1
7 {B,C}
8 {A,B,C} D:1
9 {A,B,D}
10 {B,C,E}

Each transaction is a path in the tree
• We add pointers between nodes that refer to the same item
13

FP-tree Construction
TID Items
null
1 {A,B} After reading
2 {B,C,D} transactions TID=1, 2:
3 {A,C,D,E} A:1 B:1
4 {A,D,E}
5 {A,B,C}
6 {A,B,C,D} B:1 C:1
7 {B,C}
8 {A,B,C}
9 {A,B,D}
10 {B,C,E}

D:1

Header Table
Item Pointer
A
B
C
D
E

The Header Table and the pointers assist in computing the itemset support
14

FP-tree Construction
null
• Reading transaction TID = 3
TID Items
A:1 B:1
1 {A,B}
2 {B,C,D}
3 {A,C,D,E} B:1 C:1
4 {A,D,E}
5 {A,B,C}
6 {A,B,C,D} D:1
7 {B,C} Item Pointer
8 {A,B,C} A
9 {A,B,D} B
10 {B,C,E} C
D
E
15

FP-tree Construction
null
• Reading transaction TID = 3
TID Items
A:2 B:1
1 {A,B}
2 {B,C,D}
3 {A,C,D,E} B:1
C:1 C:1
4 {A,D,E}
5 {A,B,C}
6 {A,B,C,D} D:1
7 {B,C} Item Pointer D:1
8 {A,B,C} A
9 {A,B,D} B E:1
10 {B,C,E} C
D
E
16

FP-tree Construction
null
• Reading transaction TID = 3
TID Items
A:2 B:1
1 {A,B}
2 {B,C,D}
3 {A,C,D,E} B:1
C:1 C:1
4 {A,D,E}
5 {A,B,C}
6 {A,B,C,D} D:1
7 {B,C} Item Pointer D:1
8 {A,B,C} A
9 {A,B,D} B E:1
10 {B,C,E} C
D
E

Each transaction is a path in the tree


17

FP-Tree Construction
TID Items Each transaction is a path in the tree
1 {A,B}
Transaction
2 {B,C,D} Database
null
3 {A,C,D,E}
4 {A,D,E}
5 {A,B,C}
A:7 B:3
6 {A,B,C,D}
7 {B,C}
8 {A,B,C}
9 {A,B,D} B:5 C:3
10 {B,C,E} C:1 D:1
E:1
Header table D:1
C:3
Item Pointer D:1 E:1
A D:1
B E:1
C D:1
D Pointers are used to assist
E frequent itemset generation
18
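The construction walked through above can be written compactly. The following is a minimal Python sketch (the `FPNode` and `build_fptree` names are ours, not from the slides), using the same lexicographic item order and the same ten transactions:

```python
class FPNode:
    """FP-tree node: item label, support count, children keyed by item,
    and a link to the next node with the same item (header-table chain)."""
    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}
        self.next = None

def build_fptree(transactions):
    root = FPNode(None)
    header = {}  # item -> first node of that item's pointer chain
    for t in transactions:
        node = root
        for item in sorted(t):  # impose a fixed (here lexicographic) order
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, parent=node)
                node.children[item] = child
                child.next = header.get(item)  # link nodes of the same item
                header[item] = child
            child.count += 1  # every transaction through this node adds 1
            node = child
    return root, header

transactions = [{'A','B'}, {'B','C','D'}, {'A','C','D','E'}, {'A','D','E'},
                {'A','B','C'}, {'A','B','C','D'}, {'B','C'}, {'A','B','C'},
                {'A','B','D'}, {'B','C','E'}]
root, header = build_fptree(transactions)
print(root.children['A'].count)  # 7, as in the tree above
print(root.children['B'].count)  # 3
```

Following the `next` links from `header['D']` visits every D-node, which is exactly how the header table supports the support computation described later.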

FP-tree size
• Every transaction is a path in the FP-tree
• The size of the tree depends on the
compressibility of the data
• Extreme case: All transactions are the same, the FP-
tree is a single branch
• Extreme case: All transactions are different the size of
the tree is the same as that of the database (bigger
actually since we need additional pointers)
19

Item ordering
• The size of the tree also depends on the ordering of the items.
• Heuristic: order the items according to their frequency, from larger to smaller.
• We would need to do an extra pass over the dataset to count frequencies
• Example:

TID Items (original)      TID Items (reordered)
1  {A,B}                  1  {B,A}
2  {B,C,D}                2  {B,C,D}
3  {A,C,D,E}              3  {A,C,D,E}
4  {A,D,E}                4  {A,D,E}
5  {A,B,C}                5  {B,A,C}
6  {A,B,C,D}              6  {B,A,C,D}
7  {B,C}                  7  {B,C}
8  {A,B,C}                8  {B,A,C}
9  {A,B,D}                9  {B,A,D}
10 {B,C,E}                10 {B,C,E}

σ(A)=7, σ(B)=8, σ(C)=7, σ(D)=5, σ(E)=3
Ordering: B,A,C,D,E
20
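The frequency-based reordering above can be sketched in a few lines (ties broken alphabetically, which reproduces the B,A,C,D,E ordering; the variable names are ours):

```python
from collections import Counter

transactions = [{'A','B'}, {'B','C','D'}, {'A','C','D','E'}, {'A','D','E'},
                {'A','B','C'}, {'A','B','C','D'}, {'B','C'}, {'A','B','C'},
                {'A','B','D'}, {'B','C','E'}]

# The extra pass over the dataset to count item frequencies
freq = Counter(item for t in transactions for item in t)
# Sort each transaction by descending frequency, breaking ties alphabetically
rank = {item: (-count, item) for item, count in freq.items()}
ordered = [sorted(t, key=rank.__getitem__) for t in transactions]

print(sorted(freq.items()))  # [('A', 7), ('B', 8), ('C', 7), ('D', 5), ('E', 3)]
print(ordered[0])            # ['B', 'A']
```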

Finding Frequent Itemsets


• Input: The FP-tree
• Output: All Frequent Itemsets and their support
• Method:
• Divide and Conquer:
• Consider all itemsets that end in: E, D, C, B, A
• For each possible ending item, consider the itemsets whose previous item is one of the items preceding it in the ordering
• E.g., for E, consider all itemsets whose previous item is D, C, B, or A. This way we get all the itemsets ending in DE, CE, BE, AE
• Proceed recursively this way.
• Do this for all items.
21
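This suffix-based enumeration can be illustrated without any tree at all. A toy sketch (the `suffix_classes` and `grow` names are ours): for each ending item we recurse on the items that precede it, so every itemset is generated exactly once.

```python
def suffix_classes(items):
    """Enumerate every non-empty itemset over `items` exactly once.
    For each ending item, recurse on the items preceding it in the order."""
    found = []
    def grow(preceding, suffix):
        for i in range(len(preceding) - 1, -1, -1):
            itemset = [preceding[i]] + suffix
            found.append(itemset)
            grow(preceding[:i], itemset)  # extend with a preceding item
    grow(items, [])
    return found

sets = suffix_classes(['A', 'B', 'C', 'D', 'E'])
print(sets[0], len(sets))  # ['E'] 31  (2^5 - 1 itemsets, no repeats)
```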

Frequent itemsets

All Itemsets

Ε D C B A

DE CE BE AE CD BD AD BC AC AB

CDE BDE ADE BCE ACE ABE BCD ACD ABD ABC

ACDE BCDE ABDE ABCE ABCD

ABCDE
22

Frequent Itemsets

All Itemsets

Ε D C B A
Frequent?

DE CE BE AE CD BD AD BC AC AB

Frequent?

CDE BDE ADE BCE ACE ABE BCD ACD ABD ABC
Frequent?

ACDE BCDE ABDE ABCE ABCD


Frequent?

ABCDE
23

Frequent Itemsets
All Itemsets

Frequent?
Ε D C B A

DE CE BE AE CD BD AD BC AC AB

Frequent?

CDE BDE ADE BCE ACE ABE BCD ACD ABD ABC
Frequent? Frequent?

ACDE BCDE ABDE ABCE ABCD


Frequent?

ABCDE
24

Frequent Itemsets

All Itemsets

Ε D C B A
Frequent?

DE CE BE AE CD BD AD BC AC AB

Frequent?

CDE BDE ADE BCE ACE ABE BCD ACD ABD ABC
Frequent?

ACDE BCDE ABDE ABCE ABCD

ABCDE
We can generate all itemsets this way
We expect the FP-tree to contain far fewer
25

Using the FP-tree to find frequent itemsets


TID Items
Transaction
1 {A,B}
2 {B,C,D}
Database
null
3 {A,C,D,E}
4 {A,D,E}
5 {A,B,C}
A:7 B:3
6 {A,B,C,D}
7 {B,C}
8 {A,B,C}
9 {A,B,D} B:5 C:3
10 {B,C,E}
C:1 D:1

Header table
E:1
D:1
C:3
Item Pointer D:1 E:1
A D:1
B E:1
C D:1
D
E

Bottom-up traversal of the tree: first itemsets ending in E, then D, and so on; each time a suffix-based class
26

Finding Frequent Itemsets


null
Subproblem: find frequent
itemsets ending in E
A:7 B:3

B:5 C:3
C:1 D:1

Header table D:1


C:3
Item Pointer D:1 E:1 E:1
A D:1
B E:1
C D:1
D
E

 We will then see how to compute the support for the possible itemsets
27

Finding Frequent Itemsets


null
Ending in D
A:7 B:3

B:5 C:3
C:1 D:1

Header table D:1


C:3
Item Pointer D:1 E:1 E:1
A D:1
B E:1
C D:1
D
E
28

Finding Frequent Itemsets


null

Ending in C
A:7 B:3

B:5 C:3
C:1 D:1

Header table D:1


C:3
Item Pointer D:1 E:1 E:1
A D:1
B E:1
C D:1
D
E
29

Finding Frequent Itemsets


null

Ending in B
A:7 B:3

B:5 C:3
C:1 D:1

Header table D:1


C:3
Item Pointer D:1 E:1 E:1
A D:1
B E:1
C D:1
D
E
30

Finding Frequent Itemsets


null
Ending in A

A:7 B:3

B:5 C:3
C:1 D:1

Header table D:1


C:3
Item Pointer D:1 E:1 E:1
A D:1
B E:1
C D:1
D
E
31

Algorithm
• For each suffix X
• Phase 1
• Construct the prefix tree for X as shown before, and
compute the support using the header table and the
pointers

• Phase 2
• If X is frequent, construct the conditional FP-tree for X in
the following steps
1. Recompute support
2. Prune infrequent items
3. Prune leaves and recurse
32
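The two phases can be sketched end-to-end. This is not the pointer-based tree walk of the slides: as a simplification, each suffix's conditional pattern base is kept here as a list of (prefix-path, count) pairs rather than a conditional FP-tree, but the recursion is the same — recompute support, prune infrequent items, recurse on the smaller problem. All names are ours.

```python
from collections import Counter

def fpgrowth(transactions, minsup):
    items = sorted({i for t in transactions for i in t})
    order = {item: k for k, item in enumerate(items)}  # fixed item order
    results = {}

    def mine(db, suffix):
        # Phase 2, step 1: recompute support within this conditional db
        support = Counter()
        for t, count in db:
            for i in t:
                support[i] += count
        for item, sup in support.items():
            if sup < minsup:
                continue  # step 2: prune infrequent items
            itemset = frozenset({item}) | suffix
            results[itemset] = sup
            # Phase 1 for the next level: prefix paths of `item`
            cond = []
            for t, count in db:
                if item in t:
                    prefix = frozenset(i for i in t if order[i] < order[item])
                    if prefix:
                        cond.append((prefix, count))
            if cond:
                mine(cond, itemset)  # step 3: recurse on the smaller problem

    mine([(frozenset(t), 1) for t in transactions], frozenset())
    return results

transactions = [{'A','B'}, {'B','C','D'}, {'A','C','D','E'}, {'A','D','E'},
                {'A','B','C'}, {'A','B','C','D'}, {'B','C'}, {'A','B','C'},
                {'A','B','D'}, {'B','C','E'}]
freq = fpgrowth(transactions, minsup=2)
print(freq[frozenset({'A', 'D', 'E'})])  # 2
```

Because each conditional database only ever contains items that precede the suffix's first item, the subproblems are disjoint, which is the point made in the Observations slide at the end.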

Example
null
Phase 1 – construct
prefix tree
A:7 B:3
Find all prefix paths that
contain E

B:5 C:3
C:1 D:1

Header table D:1


C:3
Item Pointer D:1 E:1 E:1
A D:1
B E:1
C D:1
D
E
Suffix Paths for E:
{A,C,D,E}, {A,D,E}, {B,C,E}
33

Example
null
Phase 1 – construct
prefix tree
A:7 B:3
Find all prefix paths that
contain E

C:3
C:1 D:1

D:1 E:1 E:1

E:1

Prefix Paths for E:
{A,C,D,E}, {A,D,E}, {B,C,E}
34

Example
null
Compute Support for E
(minsup = 2)
A:7 B:3
How?
Follow pointers while
summing up counts: C:3
1+1+1 = 3 > 2 C:1 D:1
E is frequent

D:1 E:1 E:1

E:1

{E} is frequent so we can now consider suffixes DE, CE, BE, AE


35

Example
null
E is frequent so we proceed with Phase 2

Phase 2 A:7 B:3


Convert the prefix tree of E into a
conditional FP-tree
C:3
Two changes C:1 D:1
(1) Recompute support
(2) Prune infrequent D:1 E:1 E:1

E:1
36

Example
null
Recompute Support
A:7 B:3

The support counts for some of the


nodes include transactions that do C:3
C:1 D:1
not end in E

For example in null->B->C->E we


count {B, C} D:1 E:1 E:1

The support of any node is equal to E:1


the sum of the support of leaves
with label E in its subtree
37

Example
null

A:7 B:3

C:3
C:1 D:1

D:1 E:1 E:1

E:1
38

Example
null

A:7 B:3

C:1
C:1 D:1

D:1 E:1 E:1

E:1
39

Example
null

A:7 B:1

C:1
C:1 D:1

D:1 E:1 E:1

E:1
40

Example
null

A:7 B:1

C:1
C:1 D:1

D:1 E:1 E:1

E:1
41

Example
null

A:7 B:1

C:1
C:1 D:1

D:1 E:1 E:1

E:1
42

Example
null

A:2 B:1

C:1
C:1 D:1

D:1 E:1 E:1

E:1
43

Example
null

A:2 B:1

C:1
C:1 D:1

D:1 E:1 E:1

E:1
44

Example
null

A:2 B:1
Truncate
Delete the nodes of E C:1
C:1 D:1

D:1 E:1 E:1

E:1
45

Example
null

A:2 B:1
Truncate
Delete the nodes of E C:1
C:1 D:1

D:1 E:1 E:1

E:1
46

Example
null

A:2 B:1
Truncate
Delete the nodes of E C:1
C:1 D:1

D:1
47

Example
null

A:2 B:1
Prune infrequent
In the conditional FP-tree C:1
some nodes may have C:1 D:1
support less than minsup
e.g., B needs to be D:1
pruned
This means that B
appears with E less than
minsup times
48

Example
null

A:2 B:1

C:1
C:1 D:1

D:1
49

Example
null

A:2 C:1

C:1 D:1

D:1
50

Example
null

A:2 C:1

C:1 D:1

D:1

The conditional FP-tree for E


Repeat the algorithm for {D, E}, {C, E}, {A, E}
51

Example
null

A:2 C:1

C:1 D:1

D:1

Phase 1

Find all prefix paths that contain D (DE) in the conditional FP-tree
52

Example
null

A:2

C:1 D:1

D:1

Phase 1

Find all prefix paths that contain D (DE) in the conditional FP-tree
53

Example
null

A:2

C:1 D:1

D:1

Compute the support of {D,E} by following the pointers in the tree


1+1 = 2 ≥ 2 = minsup

{D,E} is frequent
54

Example
null

A:2

C:1 D:1

D:1
Phase 2

Construct the conditional FP-tree


1. Recompute Support
2. Prune nodes
55

Example
null

A:2

Recompute support C:1 D:1

D:1
56

Example
null

A:2

Prune nodes C:1 D:1

D:1
57

Example
null

A:2

Prune nodes C:1


58

Example
null

A:2

Small support
Prune nodes C:1
59

Example
null

A:2

Final conditional FP-tree for {D,E}

The support of A is ≥ minsup so {A,D,E} is frequent


Since the tree has a single node we return to the next
subproblem
60

Example
null

A:2 C:1

C:1 D:1

D:1

The conditional FP-tree for E

We repeat the algorithm for {D,E}, {C,E}, {A,E}


61

Example
null

A:2 C:1

C:1 D:1

D:1

Phase 1

Find all prefix paths that contain C (CE) in the conditional FP-tree
62

Example
null

A:2 C:1

C:1

Phase 1

Find all prefix paths that contain C (CE) in the conditional FP-tree
63

Example
null

A:2 C:1

C:1

Compute the support of {C,E} by following the pointers in the tree


1+1 = 2 ≥ 2 = minsup

{C,E} is frequent
64

Example
null

A:2 C:1

C:1

Phase 2

Construct the conditional FP-tree


1. Recompute Support
2. Prune nodes
65

Example
null

A:1 C:1

Recompute support C:1


66

Example
null

A:1 C:1

Prune nodes C:1


67

Example
null

A:1

Prune nodes
68

Example
null

A:1

Prune nodes
69

Example
null

Prune nodes

Return to the previous subproblem


70

Example
null

A:2 C:1

C:1 D:1

D:1

The conditional FP-tree for E

We repeat the algorithm for {D,E}, {C,E}, {A,E}


71

Example
null

A:2 C:1

C:1 D:1

D:1

Phase 1

Find all prefix paths that contain A (AE) in the conditional FP-tree
72

Example
null

A:2

Phase 1

Find all prefix paths that contain A (AE) in the conditional FP-tree
73

Example
null

A:2

Compute the support of {A,E} by following the pointers in the tree


2 ≥ minsup

{A,E} is frequent

There is no conditional FP-tree for {A,E}


74

Example
• So for E we have the following frequent itemsets
{E}, {D,E}, {A,D,E}, {C,E}, {A,E}

• We proceed with D (optional) ?


75
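As a sanity check, the frequent itemsets ending in E listed above also follow by brute force over all candidate itemsets (feasible only for a tiny example like this one):

```python
from itertools import combinations

transactions = [{'A','B'}, {'B','C','D'}, {'A','C','D','E'}, {'A','D','E'},
                {'A','B','C'}, {'A','B','C','D'}, {'B','C'}, {'A','B','C'},
                {'A','B','D'}, {'B','C','E'}]
minsup = 2
items = sorted({i for t in transactions for i in t})

# Count the support of every candidate itemset directly
frequent = {frozenset(c)
            for r in range(1, len(items) + 1)
            for c in combinations(items, r)
            if sum(set(c) <= t for t in transactions) >= minsup}

ending_in_E = sorted(sorted(s) for s in frequent if 'E' in s)
print(ending_in_E)
# [['A', 'D', 'E'], ['A', 'E'], ['C', 'E'], ['D', 'E'], ['E']]
```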

Example
null

Ending in D
A:7 B:3

B:5 C:3
C:1 D:1

Header table D:1


C:3
Item Pointer D:1 E:1 E:1
A D:1
B E:1
C D:1
D
E
76

Example
null
Phase 1 – construct
prefix tree B:3
A:7
Find all prefix paths that
contain D
Support 5 > minsup, D is B:5 C:3
C:1 D:1
frequent
Phase 2 D:1
C:3
Convert prefix tree into D:1
conditional FP-tree D:1

D:1
77

Example
null

A:7 B:3

B:5 C:3
C:1 D:1

D:1
C:1
D:1
D:1

D:1

Recompute support
78

Example
null

A:7 B:3

B:2 C:3
C:1 D:1

D:1
C:1
D:1
D:1

D:1

Recompute support
79

Example
null

A:3 B:3

B:2 C:3
C:1 D:1

D:1
C:1
D:1
D:1

D:1

Recompute support
80

Example
null

A:3 B:3

B:2 C:1
C:1 D:1

D:1
C:1
D:1
D:1

D:1

Recompute support
81

Example
null

A:3 B:1

B:2 C:1
C:1 D:1

D:1
C:1
D:1
D:1

D:1

Recompute support
82

Example
null

A:3 B:1

B:2 C:1
C:1 D:1

D:1
C:1
D:1
D:1

D:1

Prune nodes
83

Example
null

A:3 B:1

B:2 C:1
C:1

C:1

Prune nodes
84

Example
null

A:3 B:1

B:2 C:1
C:1

C:1

Construct conditional FP-trees for {C,D}, {B,D}, {A,D}

And so on….
85

Observations
• At each recursive step we solve a subproblem
• Construct the prefix tree
• Compute the new support
• Prune nodes
• Subproblems are disjoint so we never consider
the same itemset twice

• Support computation is efficient – happens


together with the computation of the frequent
itemsets.
86

Observations
• The efficiency of the algorithm depends on the
compaction factor of the dataset

• If the tree is bushy then the algorithm does not work well; the number of subproblems that need to be solved increases greatly.
