CSE 385 - Data Mining and Business Intelligence - Lecture 03 - Part 01

The document discusses different methods for mining frequent itemsets from transactional databases, including the Apriori algorithm which uses candidate generation and support counting, and the FP-Growth approach which avoids candidate generation by building an FP-tree structure and mining patterns by pattern fragment growth. It also presents methods for improving the efficiency of frequent itemset mining, such as sampling databases to reduce scans, partitioning patterns and databases to mine subsets in parallel, and building conditional FP-trees to recursively mine conditional patterns.

Uploaded by

Islam Ashraf

DATA MINING AND BUSINESS INTELLIGENCE - LECTURE 03
Dr. Mahmoud Mounir

Scalable Frequent Itemset Mining Methods

◼ Apriori: A Candidate Generation-and-Test Approach

◼ Improving the Efficiency of Apriori

◼ FPGrowth: A Frequent Pattern-Growth Approach

Further Improvement of the Apriori Method

◼ Major computational challenges

◼ Multiple scans of transaction database
◼ Huge number of candidates
◼ Tedious workload of support counting for candidates
◼ Improving Apriori: general ideas
◼ Reduce passes of transaction database scans
◼ Shrink number of candidates
◼ Facilitate support counting of candidates

Sampling for Frequent Patterns

◼ Select a sample of the original database and mine frequent patterns within the sample using Apriori
◼ Scan the database once to verify the frequent itemsets found in the sample; only the borders of the closure of frequent patterns are checked
◼ Example: check abcd instead of ab, ac, …, etc.
◼ Scan the database again to find missed frequent patterns
◼ H. Toivonen. Sampling large databases for association rules. In VLDB’96
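
The first two steps above can be sketched roughly as follows. This is a simplified illustration, not Toivonen's full algorithm: it mines a random sample with a lowered threshold and then verifies the guesses against the whole database, but it skips the border check and the second scan for missed patterns. The function names, the `slack` factor, and the toy database are assumptions made for this example.

```python
import random

def support(itemset, db):
    """Number of transactions in db that contain itemset."""
    return sum(1 for t in db if itemset <= t)

def frequent_itemsets(db, min_count):
    """Tiny level-wise (Apriori-style) miner, adequate for small inputs."""
    items = {i for t in db for i in t}
    level = {frozenset([i]) for i in items}
    level = {s for s in level if support(s, db) >= min_count}
    found = set()
    k = 1
    while level:
        found |= level
        k += 1
        # Join frequent (k-1)-itemsets into k-itemsets, then count them.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        level = {c for c in candidates if support(c, db) >= min_count}
    return found

def mine_by_sampling(db, min_frac, sample_size, slack=0.8, seed=0):
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    sample = rng.sample(db, sample_size)
    # Lowered threshold on the sample to reduce the chance of misses.
    lowered = max(1, int(slack * min_frac * sample_size))
    guesses = frequent_itemsets(sample, lowered)
    # Verification against the full database keeps only truly frequent itemsets.
    return {s for s in guesses if support(s, db) >= min_frac * len(db)}

# Hypothetical toy database, not taken from the lecture.
toy_db = [frozenset(t) for t in ("abc", "abd", "abcd", "bcd", "ab", "acd")]
verified = mine_by_sampling(toy_db, min_frac=0.5, sample_size=4)
```

Every itemset in `verified` is frequent in the full database, but some frequent itemsets may be missing, which is why the slide calls for a further scan.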

Pattern-Growth Approach: Mining Frequent Patterns Without Candidate Generation
◼ Bottlenecks of the Apriori approach
◼ Breadth-first (i.e., level-wise) search
◼ Candidate generation and test
◼ Often generates a huge number of candidates
◼ The FPGrowth Approach (J. Han, J. Pei, and Y. Yin, SIGMOD’00)
◼ Depth-first search
◼ Avoid explicit candidate generation
◼ Major philosophy: Grow long patterns from short ones using local frequent items only
◼ “abc” is a frequent pattern
◼ Get all transactions having “abc”, i.e., project DB on abc: DB|abc
◼ “d” is a local frequent item in DB|abc → abcd is a frequent pattern
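
The projection step in the last three bullets can be illustrated with a small sketch. The database below and the helper names `project` and `local_frequent_items` are hypothetical, chosen only to mirror the "abc" example.

```python
from collections import Counter

def project(db, pattern):
    """DB|pattern: transactions containing the pattern, with the pattern removed."""
    pattern = set(pattern)
    return [set(t) - pattern for t in db if pattern <= set(t)]

def local_frequent_items(projected, min_count):
    """Items that are frequent inside the projected database."""
    counts = Counter(i for t in projected for i in t)
    return {i for i, c in counts.items() if c >= min_count}

# Hypothetical transactions; each string is one transaction of single-letter items.
db = ["abcd", "abcde", "abcdf", "abce", "bd"]
db_abc = project(db, "abc")
grow_with = local_frequent_items(db_abc, min_count=3)
```

Here `grow_with` contains only "d", so "abc" can be grown into the frequent pattern "abcd" without generating any other candidate.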
Construct FP-tree from a Transaction Database

min_support = 3

TID   Items bought                (ordered) frequent items
100   {f, a, c, d, g, i, m, p}    {f, c, a, m, p}
200   {a, b, c, f, l, m, o}       {f, c, a, b, m}
300   {b, f, h, j, o, w}          {f, b}
400   {b, c, k, s, p}             {c, b, p}
500   {a, f, c, e, l, p, m, n}    {f, c, a, m, p}

1. Scan DB once, find frequent 1-itemsets (single-item patterns)
2. Sort frequent items in frequency-descending order to get the f-list
3. Scan DB again, construct FP-tree

F-list = f-c-a-b-m-p

Header Table (in the figure, the head column links each item to its nodes in the tree)
Item   frequency
f      4
c      4
a      3
b      3
m      3
p      3

FP-tree (each indentation level is one step down from the root {})
{}
  f:4
    c:3
      a:3
        m:2
          p:2
        b:1
          m:1
    b:1
  c:1
    b:1
      p:1
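
A minimal sketch of step 3, assuming the f-list has already been fixed as f-c-a-b-m-p (ties between equally frequent items are broken as on the slide). The `Node` class and `build_fp_tree` function are illustrative names, not part of the lecture.

```python
class Node:
    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}  # item -> child Node

def build_fp_tree(db, f_list):
    rank = {item: r for r, item in enumerate(f_list)}
    root = Node(None)
    header = {item: [] for item in f_list}  # item -> its nodes (node-links)
    for t in db:
        # Drop infrequent items and order the rest by the f-list.
        ordered = sorted((i for i in set(t) if i in rank), key=rank.get)
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += 1  # shared prefixes accumulate counts
            node = child
    return root, header

# Transactions 100 to 500 from the table above, one string per transaction.
db = ["facdgimp", "abcflmo", "bfhjow", "bcksp", "afcelpmn"]
root, header = build_fp_tree(db, "fcabmp")
```

The root ends up with two children, f (count 4) and c (count 1), and `header["b"]` holds the three separate b nodes, each with count 1.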
Partition Patterns and Databases

◼ Frequent patterns can be partitioned into subsets according to the f-list
◼ F-list = f-c-a-b-m-p
◼ Patterns containing p
◼ Patterns having m but no p
◼ …
◼ Patterns having c but none of a, b, m, p
◼ Pattern f
◼ Completeness and non-redundancy
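
One way to see why the partition is complete and non-redundant: every pattern belongs to the subset named by its last item in f-list order, and to no other. The helper `partition_of` and the sample itemsets below are illustrative only (the itemsets are arbitrary and not claimed to be frequent).

```python
F_LIST = "fcabmp"

def partition_of(pattern):
    """Item naming the pattern's partition: its last item in f-list order."""
    return max(pattern, key=F_LIST.index)

groups = {}
for pattern in ["f", "fc", "cb", "fcam", "cbp", "fm"]:
    groups.setdefault(partition_of(pattern), []).append(pattern)
```

For example, "fcam" and "fm" land in the m group (m but no p), while "cbp" lands in the p group.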

Find Patterns Having P From P-conditional Database

◼ Starting at the frequent item header table in the FP-tree
◼ Traverse the FP-tree by following the link of each frequent item p
◼ Accumulate all of the transformed prefix paths of item p to form p’s conditional pattern base

[Figure: the FP-tree (root {} with branches f:4 and c:1) and its header table]

Conditional pattern bases
item   cond. pattern base
c      f:3
a      fc:3
b      fca:1, f:1, c:1
m      fca:2, fcab:1
p      fcam:2, cb:1
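
The table can be reproduced with a short sketch. As a simplifying assumption, it reads prefixes directly from the f-list-ordered transactions instead of walking node-links in the tree; both give the same bases, since each tree path is a merged set of such prefixes. The function name is hypothetical.

```python
from collections import Counter

# Ordered frequent items of transactions 100 to 500.
ordered_db = ["fcamp", "fcabm", "fb", "cbp", "fcamp"]

def conditional_pattern_base(ordered_db, item):
    """Map each prefix that precedes `item` to the number of times it occurs."""
    base = Counter()
    for t in ordered_db:
        if item in t:
            prefix = t[:t.index(item)]
            if prefix:  # an empty prefix contributes nothing
                base[prefix] += 1
    return dict(base)

bases = {item: conditional_pattern_base(ordered_db, item) for item in "cabmp"}
```

`bases["m"]` comes out as fca:2 and fcab:1, matching the m row of the table.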

From Conditional Pattern-bases to Conditional FP-trees

◼ For each pattern-base
◼ Accumulate the count for each item in the base
◼ Construct the FP-tree for the frequent items of the pattern base

m-conditional pattern base: fca:2, fcab:1

m-conditional FP-tree: {} → f:3 → c:3 → a:3

All frequent patterns related to m:
m, fm, cm, am, fcm, fam, cam, fcam
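
A small sketch of the two bullets for the m example, with min_support = 3. It accumulates item counts over the base, keeps the frequent items, and then uses the fact that the resulting m-conditional FP-tree is a single path, so every combination of its items joined with m is frequent. The helper name is an assumption for illustration.

```python
from collections import Counter
from itertools import combinations

def frequent_items_in_base(base, min_count):
    """Accumulate per-item counts over a conditional pattern base."""
    counts = Counter()
    for prefix, n in base.items():
        for item in prefix:
            counts[item] += n
    return {i: c for i, c in counts.items() if c >= min_count}

m_base = {"fca": 2, "fcab": 1}
kept = frequent_items_in_base(m_base, min_count=3)  # b (count 1) is dropped

# Single path f -> c -> a: enumerate all of its sub-combinations, plus m.
path = [i for i in "fcabmp" if i in kept]
patterns = ["".join(combo) + "m"
            for r in range(len(path) + 1)
            for combo in combinations(path, r)]
```

The shortcut of enumerating combinations applies only because the conditional tree here has one branch; a branching tree needs further recursion.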
Recursion: Mining Each Conditional FP-tree
m-conditional FP-tree: {} → f:3 → c:3 → a:3

Cond. pattern base of “am”: (fc:3)
am-conditional FP-tree: {} → f:3 → c:3

Cond. pattern base of “cm”: (f:3)
cm-conditional FP-tree: {} → f:3

Cond. pattern base of “cam”: (f:3)
cam-conditional FP-tree: {} → f:3
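
The recursion can be sketched compactly if conditional pattern bases are kept as plain lists of (prefix, count) pairs instead of compressed trees. That substitution is an assumption made to keep the example short; the pattern-growth logic is the same, but a real FP-growth implementation works on the FP-tree structure. Items are single characters and every prefix is in f-list order.

```python
from collections import Counter

def fp_growth(base, suffix, min_count, out):
    """Grow patterns ending in `suffix` from a conditional pattern base."""
    counts = Counter()
    for items, n in base:
        for i in items:
            counts[i] += n
    for item, c in counts.items():
        if c < min_count:
            continue  # not a local frequent item
        pattern = item + suffix
        out[pattern] = c
        # Conditional pattern base of the grown pattern: prefixes before `item`.
        cond = [(items[:items.index(item)], n) for items, n in base if item in items]
        cond = [(p, n) for p, n in cond if p]
        fp_growth(cond, pattern, min_count, out)

# Ordered frequent items of transactions 100 to 500, each with count 1.
ordered_db = [("fcamp", 1), ("fcabm", 1), ("fb", 1), ("cbp", 1), ("fcamp", 1)]
frequent = {}
fp_growth(ordered_db, "", 3, frequent)
```

With min_support = 3 this yields the eight patterns ending in m listed earlier, among others, each mapped to its support count.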

(1) Fast scan the transaction database
Min_Support = 3
Min_Confidence = 80%

Tid    Itemset
T100   {M, O, N, K, E, Y}
T200   {D, O, N, K, E, Y}
T300   {M, A, K, E}
T400   {M, U, C, K, Y}
T500   {C, O, O, K, I, E}

Find the support count of each item.

Items   Support Count
M       3
O       3
N       2
K       5
E       4
Y       3
D       1
A       1
U       1
C       2
I       1

Find a set (L) of frequent item patterns that contains only items that achieve minimum support.

Items   Support Count
M       3
O       3
K       5
E       4
Y       3

Sort the list in descending order of each item's support count to obtain L.

Items   Support Count
K       5
E       4
M       3
O       3
Y       3
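
The scan, filter, and sort steps can be reproduced with a few lines of Python. The variable names are illustrative, and breaking ties alphabetically is an assumption that happens to match the order shown for M, O, and Y.

```python
from collections import Counter

db = {
    "T100": ["M", "O", "N", "K", "E", "Y"],
    "T200": ["D", "O", "N", "K", "E", "Y"],
    "T300": ["M", "A", "K", "E"],
    "T400": ["M", "U", "C", "K", "Y"],
    "T500": ["C", "O", "O", "K", "I", "E"],
}
min_support = 3

# An item counts once per transaction, so the repeated O in T500 counts once.
counts = Counter(i for items in db.values() for i in set(items))

# Keep items meeting min_support, sorted by descending count, then by name.
L = sorted((i for i, c in counts.items() if c >= min_support),
           key=lambda i: (-counts[i], i))
```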
Using Apriori Algorithm
Frequent 1-itemsets
Items   Support Count
M       3
O       3
K       5
E       4
Y       3

Candidate 2-itemsets
Items   Support Count
M, O    1
M, K    3
M, E    2
M, Y    2
O, K    3
O, E    3
O, Y    2
K, E    4
K, Y    3
E, Y    2

Frequent 2-itemsets
Items   Support Count
M, K    3
O, K    3
O, E    3
K, E    4
K, Y    3

Larger candidate itemsets
Items        Support Count
M, K, O      (no count shown)
M, K, E, O   (no count shown)
M, K, E      2
M, K, Y      2
O, K, E      3
O, K, E, Y   2
K, E, Y      2
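
For comparison, a minimal level-wise Apriori sketch over the same five transactions. Unlike the last table, it prunes any candidate that has an infrequent subset before counting, so only {O, K, E} is counted at size 3. The function name and structure are assumptions for illustration.

```python
from itertools import combinations

def apriori(db, min_count):
    db = [frozenset(t) for t in db]

    def count(itemset):
        return sum(1 for t in db if itemset <= t)

    items = {i for t in db for i in t}
    level = {frozenset([i]) for i in items if count(frozenset([i])) >= min_count}
    frequent = {}
    k = 1
    while level:
        frequent.update({s: count(s) for s in level})
        k += 1
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
        level = {c for c in candidates if count(c) >= min_count}
    return frequent

# T100 to T500; each string lists the items of one transaction.
transactions = ["MONKEY", "DONKEY", "MAKE", "MUCKY", "COOKIE"]
result = apriori(transactions, min_count=3)
```

The result holds five frequent 1-itemsets, five frequent 2-itemsets, and the single frequent 3-itemset {O, K, E} with support 3.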
(2) Construct the FP tree
Order the itemset in each transaction based on item priority in list L, then build the FP tree.

Tid    Itemset               Ordered itemset
T100   {M, O, N, K, E, Y}    {K, E, M, O, Y}
T200   {D, O, N, K, E, Y}    {K, E, O, Y}
T300   {M, A, K, E}          {K, E, M}
T400   {M, U, C, K, Y}       {K, M, Y}
T500   {C, O, O, K, I, E}    {K, E, O}

Items   Support Count   Node Link
K       5
E       4
M       3
O       3
Y       3
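
A short sketch of the ordering step and the tree insertion for this example. The nested-dictionary tree is a simplification chosen for brevity (it has no node-links), and the function names are hypothetical.

```python
L = ["K", "E", "M", "O", "Y"]
rank = {item: r for r, item in enumerate(L)}

def order_transaction(items):
    """Keep only items in L, without duplicates, sorted by priority in L."""
    return sorted(set(items) & set(L), key=rank.get)

def insert(tree, ordered):
    """Add one ordered transaction to a nested-dict prefix tree."""
    node = tree
    for item in ordered:
        child = node.setdefault(item, {"count": 0, "children": {}})
        child["count"] += 1
        node = child["children"]

# T100 to T500; each string lists the items of one transaction.
transactions = ["MONKEY", "DONKEY", "MAKE", "MUCKY", "COOKIE"]
ordered = [order_transaction(t) for t in transactions]

tree = {}
for t in ordered:
    insert(tree, t)
```

All five ordered transactions start with K, so the tree has a single K node with count 5 under the root, with E (count 4) and M (count 1) as its children.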