
SEG4630 2009-2010
Tutorial 2 – Frequent Pattern Mining

Frequent Patterns

• Frequent pattern: a pattern (a set of items, subsequences, substructures, etc.) that occurs frequently in a data set
• itemset: a set of one or more items
• k-itemset: X = {x1, …, xk}
• Mining algorithms
  • Apriori
  • FP-growth

Example transaction database:
Tid  Items bought
10   Beer, Nuts, Diaper
20   Beer, Coffee, Diaper
30   Beer, Diaper, Eggs
40   Nuts, Eggs, Milk
50   Nuts, Coffee, Diaper, Eggs, Beer
Support & Confidence

• Support
  • (absolute) support, or support count, of X: frequency or occurrence count of an itemset X
  • (relative) support, s, is the fraction of transactions that contain X (i.e., the probability that a transaction contains X)
  • An itemset X is frequent if X's support is no less than a minsup threshold
• Confidence (association rule: X → Y)
  • confidence, c, is the conditional probability that a transaction containing X also contains Y:
    c = sup(X ∪ Y) / sup(X)   (conditional prob.: Pr(Y|X) = Pr(X ∧ Y) / Pr(X))
• Find all the rules X → Y with minimum support and confidence (illustrated in the sketch below)
  • sup(X ∪ Y) ≥ minsup
  • sup(X ∪ Y) / sup(X) ≥ minconf
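As a small illustration of these definitions, the following sketch computes support counts and confidence over the transaction table from the first slide; the helper names (`support_count`, `confidence`) are ours, not part of the tutorial.

```python
# Sample transactions from the earlier slide (Tid -> items bought).
transactions = {
    10: {"Beer", "Nuts", "Diaper"},
    20: {"Beer", "Coffee", "Diaper"},
    30: {"Beer", "Diaper", "Eggs"},
    40: {"Nuts", "Eggs", "Milk"},
    50: {"Nuts", "Coffee", "Diaper", "Eggs", "Beer"},
}

def support_count(itemset):
    """Absolute support: number of transactions that contain the itemset."""
    return sum(1 for items in transactions.values() if itemset <= items)

def confidence(X, Y):
    """Confidence of the rule X -> Y: sup(X ∪ Y) / sup(X)."""
    return support_count(X | Y) / support_count(X)

print(support_count({"Beer", "Diaper"}))                      # 4   (absolute support)
print(support_count({"Beer", "Diaper"}) / len(transactions))  # 0.8 (relative support)
print(confidence({"Beer"}, {"Diaper"}))                       # 1.0
```

With minsup = 3 and minconf = 0.8, for example, the rule Beer → Diaper qualifies (support 4, confidence 1.0).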
Apriori Principle

• If an itemset is frequent, then all of its subsets must also be frequent (X ⇒ Y)
• If an itemset is infrequent, then all of its supersets must be infrequent too (¬Y ⇒ ¬X)

[Figure: the itemset lattice over items A, B, C, D, E, from the null set up to ABCDE, with a border separating the frequent itemsets (upper part) from the infrequent ones (lower part).]
Apriori: A Candidate Generation & Test Approach

• Initially, scan the DB once to get the frequent 1-itemsets
• Loop (a sketch of this loop appears below)
  • Generate length-(k+1) candidate itemsets from length-k frequent itemsets
  • Test the candidates against the DB
• Terminate when no frequent or candidate set can be generated
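A minimal sketch of this generate-and-test loop, run on the transaction table from the first slide. The join step here simply unions pairs of frequent k-itemsets and prunes any candidate with an infrequent k-subset, which is one common way to realize candidate generation; it is not the tutorial's prescribed implementation.

```python
from itertools import combinations

# Transactions from the earlier slide; min_sup is an absolute support count.
db = [
    {"Beer", "Nuts", "Diaper"},
    {"Beer", "Coffee", "Diaper"},
    {"Beer", "Diaper", "Eggs"},
    {"Nuts", "Eggs", "Milk"},
    {"Nuts", "Coffee", "Diaper", "Eggs", "Beer"},
]

def apriori(db, min_sup):
    """Return {frozenset(itemset): support_count} for all frequent itemsets."""
    # Initial scan: frequent 1-itemsets.
    counts = {}
    for t in db:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {x: c for x, c in counts.items() if c >= min_sup}
    result = dict(frequent)

    k = 1
    while frequent:
        # Generate length-(k+1) candidates from length-k frequent itemsets,
        # pruning any candidate with an infrequent k-subset (Apriori principle).
        prev = list(frequent)
        candidates = set()
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                union = prev[i] | prev[j]
                if len(union) == k + 1 and all(
                    frozenset(s) in frequent for s in combinations(union, k)
                ):
                    candidates.add(union)
        # Test the surviving candidates against the DB.
        counts = {c: sum(1 for t in db if c <= t) for c in candidates}
        frequent = {c: n for c, n in counts.items() if n >= min_sup}
        result.update(frequent)
        k += 1
    return result

print(apriori(db, min_sup=3))
```

With min_sup = 3 this returns {Beer}: 4, {Nuts}: 3, {Diaper}: 4, {Eggs}: 3 and {Beer, Diaper}: 4.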
Generate Candidate Itemsets

• Example
  • Frequent 3-itemsets:
    {1, 2, 3}, {1, 2, 4}, {1, 2, 5}, {1, 3, 4}, {1, 3, 5}, {2, 3, 4}, {2, 3, 5} and {3, 4, 5}
  • Candidate 4-itemsets:
    {1, 2, 3, 4}, {1, 2, 3, 5}, {1, 2, 4, 5}, {1, 3, 4, 5}, {2, 3, 4, 5}
  • Which need not be counted?
    {1, 2, 4, 5}, {1, 3, 4, 5} and {2, 3, 4, 5}: each has an infrequent 3-subset (e.g. {1, 4, 5} or {2, 4, 5}), so the Apriori principle prunes it before any database scan (checked in the sketch below)
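A quick check of the prune step on this example; the variable names are ours. Only the candidates whose every 3-subset is frequent need to be counted against the database.

```python
from itertools import combinations

frequent_3 = {frozenset(s) for s in
              [{1, 2, 3}, {1, 2, 4}, {1, 2, 5}, {1, 3, 4},
               {1, 3, 5}, {2, 3, 4}, {2, 3, 5}, {3, 4, 5}]}

candidates_4 = [{1, 2, 3, 4}, {1, 2, 3, 5}, {1, 2, 4, 5}, {1, 3, 4, 5}, {2, 3, 4, 5}]

for c in candidates_4:
    # A candidate survives only if every 3-subset is frequent (Apriori principle).
    survives = all(frozenset(s) in frequent_3 for s in combinations(c, 3))
    print(sorted(c), "count it" if survives else "prune (no DB scan needed)")
```

Only {1, 2, 3, 4} and {1, 2, 3, 5} survive; the other three are pruned, as stated above.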
Maximal vs Closed Frequent Itemsets

• An itemset X is a max-pattern (maximal frequent itemset) if X is frequent and there exists no frequent super-pattern Y ⊃ X
• An itemset X is closed if X is frequent and there exists no super-pattern Y ⊃ X with the same support as X
• Closed frequent itemsets are lossless: the support of any frequent itemset can be deduced from the closed frequent itemsets (a sketch of both checks appears below)

[Figure: nested sets showing Maximal Frequent Itemsets ⊆ Closed Frequent Itemsets ⊆ Frequent Itemsets.]
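Both definitions translate directly into a check over the frequent itemsets and their supports. A minimal sketch, assuming a dictionary of frozensets to support counts such as the one returned by the `apriori` sketch above:

```python
def closed_and_maximal(freq):
    """freq: {frozenset: support}. Return (closed, maximal) frequent itemsets."""
    closed, maximal = set(), set()
    for X, sup_x in freq.items():
        supersets = [Y for Y in freq if X < Y]
        # Closed: no frequent super-pattern has the same support as X.
        if not any(freq[Y] == sup_x for Y in supersets):
            closed.add(X)
        # Maximal: no frequent super-pattern at all.
        if not supersets:
            maximal.add(X)
    return closed, maximal
```

Every maximal itemset is also closed, which matches the nesting shown in the figure.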
Maximal vs Closed Frequent Itemsets: Example

[Figure: the itemset lattice over items A–E, annotated with the transaction IDs supporting each itemset, mined with minsup = 2. Some frequent itemsets are closed but not maximal; others are both closed and maximal. In this example, # closed frequent itemsets = 9 and # maximal frequent itemsets = 4.]
Algorithms to Find Frequent Patterns

• Apriori: uses a generate-and-test approach – generates candidate itemsets and tests whether they are frequent
  • Generation of candidate itemsets is expensive (in both space and time)
  • Support counting is expensive
    • Subset checking (computationally expensive)
    • Multiple database scans (I/O)
• FP-Growth: allows frequent itemset discovery without candidate generation. Two steps:
  1. Build a compact data structure called the FP-tree
     • 2 passes over the database
  2. Extract frequent itemsets directly from the FP-tree
     • Traverse the FP-tree
Pattern-Growth Approach: Mining Frequent Patterns Without Candidate Generation

• The FP-Growth approach
  • Depth-first search (Apriori: breadth-first search)
  • Avoids explicit candidate generation

FP-tree construction (a sketch follows below):
• Scan the DB once, find the frequent 1-itemsets (single-item patterns)
• Sort the frequent items in frequency-descending order, giving the f-list
• Scan the DB again and construct the FP-tree

FP-Growth mining:
• For each frequent item, construct its conditional pattern base, and then its conditional FP-tree
• Repeat the process on each newly created conditional FP-tree
• Until the resulting FP-tree is empty, or it contains only one path (a single path will generate all the combinations of its sub-paths, each of which is a frequent pattern)
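A minimal sketch of the two-pass FP-tree construction described above. The node fields and the `build_fp_tree` signature are our own choices, not prescribed by the tutorial.

```python
from collections import Counter, defaultdict

class FPNode:
    """One node of the FP-tree: an item, a count, a parent link and children."""
    def __init__(self, item, parent=None):
        self.item, self.count, self.parent = item, 0, parent
        self.children = {}                  # item -> FPNode

def build_fp_tree(db, min_sup):
    # Pass 1: count items and build the f-list (frequency-descending order).
    counts = Counter(item for t in db for item in t)
    f_list = [item for item, c in counts.most_common() if c >= min_sup]
    rank = {item: r for r, item in enumerate(f_list)}

    root = FPNode(None)
    header = defaultdict(list)              # item -> nodes carrying that item
    # Pass 2: insert each transaction, keeping only frequent items, in f-list order.
    for t in db:
        node = root
        for item in sorted((i for i in t if i in rank), key=rank.get):
            if item not in node.children:
                node.children[item] = FPNode(item, parent=node)
                header[item].append(node.children[item])
            node = node.children[item]
            node.count += 1
    return root, header, f_list
```

Building the tree on the five transactions used in the following slides ({f,c,a,m,p}, {f,c,a,b,m}, {f,b}, {c,b,p}, {f,c,a,m,p}) with min_sup = 3 reproduces the header-table counts f:4, c:4, a:3, b:3, m:3, p:3.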
FP-tree Size

• The size of an FP-tree is typically smaller than the size of the uncompressed data because many transactions often share a few items in common
• Best-case scenario: all transactions have the same set of items, and the FP-tree contains only a single branch of nodes
• Worst-case scenario: every transaction has a unique set of items. As none of the transactions have any items in common, the size of the FP-tree is effectively the same as the size of the original data
• The size of an FP-tree also depends on how the items are ordered
Example

[Figure: two FP-trees built from the same data, one with items in frequency-descending order and one with items in frequency-ascending order, illustrating that the tree size depends on the item ordering.]
Find Patterns Having p From p-conditional Database

• Start at the frequent-item header table of the FP-tree
• Traverse the FP-tree by following the link of each frequent item p
• Accumulate all of the transformed prefix paths of item p to form p's conditional pattern base (see the sketch after the table below)

Header table (item: frequency): f: 4, c: 4, a: 3, b: 3, m: 3, p: 3

[Figure: the FP-tree with root {}; one branch f:4 → c:3 → a:3, with m:2 → p:2 and b:1 → m:1 below a:3, and b:1 directly below f:4; a second branch c:1 → b:1 → p:1. Node-links connect each header-table entry to the nodes carrying that item.]

Conditional pattern bases:
item   conditional pattern base
c      f:3
a      fc:3
b      fca:1, f:1, c:1
m      fca:2, fcab:1
p      fcam:2, cb:1
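Following the node-links and walking parent pointers upward gives exactly these pattern bases. A sketch, assuming the `FPNode` and `header` structures from the `build_fp_tree` sketch above:

```python
def conditional_pattern_base(header, item):
    """Collect (prefix_path, count) for every node of `item`, walking parent
    pointers up to the root: these are the transformed prefix paths."""
    base = []
    for node in header[item]:
        path, parent = [], node.parent
        while parent is not None and parent.item is not None:
            path.append(parent.item)
            parent = parent.parent
        if path:
            base.append((list(reversed(path)), node.count))
    return base

# The five transactions from the slides, min_sup = 3.
db = [set("fcamp"), set("fcabm"), set("fb"), set("cbp"), set("fcamp")]
root, header, f_list = build_fp_tree(db, min_sup=3)
print(conditional_pattern_base(header, "p"))
```

On this tree, `conditional_pattern_base(header, "p")` should yield the paths f, c, a, m with count 2 and c, b with count 1, matching the table above (up to tie-breaking among equally frequent items in the f-list).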
FP-Growth: Projecting the Transaction Database

Transaction DB (items in f-list order):
Tid  Items
1    f, c, a, m, p
2    f, c, a, b, m
3    f, b
4    c, b, p
5    f, c, a, m, p

Conditional (projected) databases, one per frequent item (a sketch computing them follows below):
(1) +p:  1: f, c, a, m    4: c, b           5: f, c, a, m
(2) +m:  1: f, c, a       2: f, c, a, b     5: f, c, a
(3) +b:  2: f, c, a       3: f              4: c
(4) +a:  1: f, c          2: f, c           5: f, c
(5) +c:  1: f             2: f              4: (empty)    5: f
(6) +f:  f occurs in transactions 1, 2, 3, 5
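These projections can also be computed directly from the transactions; the FP-tree is simply a compact way of representing the same information. A brief sketch, with names of our own choosing:

```python
def projected_databases(db, f_list):
    """For each item, collect the prefix (the items preceding it in f-list
    order) of every transaction containing it: the item's conditional DB."""
    rank = {item: i for i, item in enumerate(f_list)}
    proj = {item: [] for item in f_list}
    for t in db:
        ordered = sorted((i for i in t if i in rank), key=rank.get)
        for pos, item in enumerate(ordered):
            proj[item].append(ordered[:pos])
    return proj

db = [set("fcamp"), set("fcabm"), set("fb"), set("cbp"), set("fcamp")]
print(projected_databases(db, "fcabmp")["p"])
# [['f', 'c', 'a', 'm'], ['c', 'b'], ['f', 'c', 'a', 'm']]
```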
[Figure: the full FP-tree (root {}, branch f:4 → c:3 → a:3 with m:2 → p:2 and b:1 → m:1 below a:3, b:1 below f:4, and branch c:1 → b:1 → p:1) together with the trees built from each conditional pattern base:
(1) +p: f:2 → c:2 → a:2 → m:2 and c:1 → b:1
(2) +m: f:3 → c:3 → a:3
(3) +b: f:2 → c:1 → a:1 and c:1
(4) +a: f:3 → c:3
(5) +c: f:3
(6) +f: empty]
FP-Growth: Result (min_sup = 3)

+p: projecting {1: f,c,a,m; 4: c,b; 5: f,c,a,m} and keeping only the frequent item c gives
    p: 3, cp: 3
+m: projecting {1: f,c,a; 2: f,c,a,b; 5: f,c,a} gives
    m: 3, fm: 3, cm: 3, am: 3, fcm: 3, fam: 3, cam: 3, fcam: 3
+b: projecting {2: f,c,a; 3: f; 4: c} gives no frequent extension, so only
    b: 3
+a: projecting {1: f,c; 2: f,c; 5: f,c} gives
    a: 3, fa: 3, ca: 3, fca: 3
+c: projecting {1: f; 2: f; 4: (empty); 5: f} gives
    c: 4, fc: 3
+f: f occurs in transactions 1, 2, 3, 5, so
    f: 4
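As a sanity check on this result table, a brute-force enumeration over the same five transactions (a sketch of our own, not part of the tutorial) should recover exactly the itemsets listed above:

```python
from itertools import combinations

# The five transactions of the FP-Growth example, min_sup = 3.
db = [set("fcamp"), set("fcabm"), set("fb"), set("cbp"), set("fcamp")]
min_sup = 3
f_list = "fcabmp"        # frequency-descending item order used on the slides

frequent = {}
items = sorted(set().union(*db), key=f_list.index)
for k in range(1, len(items) + 1):
    for cand in combinations(items, k):
        sup = sum(1 for t in db if set(cand) <= t)
        if sup >= min_sup:
            frequent["".join(cand)] = sup

print(frequent)
# {'f': 4, 'c': 4, 'a': 3, 'b': 3, 'm': 3, 'p': 3, 'fc': 3, 'fa': 3, 'fm': 3,
#  'ca': 3, 'cm': 3, 'cp': 3, 'am': 3, 'fca': 3, 'fcm': 3, 'fam': 3, 'cam': 3, 'fcam': 3}
```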
