
Guide: Mr. Gautam Borkar. Group members: Rahul Kelaskar (A-636), Anish Khale (A-638), Dhaval Doshi (A-682)

This document discusses frequent pattern mining and sequential pattern mining algorithms. It provides an overview of the FP-growth algorithm for frequent pattern mining and the generalized sequential pattern (GSP) mining algorithm. The FP-growth algorithm uses an FP-tree to store compressed and crucial information about frequent patterns and mines the tree to find the complete set of frequent patterns. The GSP algorithm finds sequential patterns by scanning the database multiple times and generating candidate sequences of increasing length.

Uploaded by Rahul Kelaskar. License: Attribution Non-Commercial (BY-NC).

• Process of exploring and analyzing data
• Iterative multi-step process
• Involves data preparation, the search for patterns, and knowledge evaluation and interpretation

• Arrangement or ordering
• Existence of an organization or underlying structure

• Application of algorithms to extract patterns from data

• Act of taking in raw data and taking “action” based on the “category” of the pattern
Identifies underlying patterns from transformed data.
• Input:
  A database DB, represented by its FP-tree, and a minimum support threshold S.
• Output:
  The complete set of frequent patterns.
• Method:
  call FP-growth(FP-tree, null)

Procedure FP-growth(Tree, α)
{
    if Tree contains a single prefix path    // Mining single prefix-path FP-tree
    then {
        let P be the single prefix-path part of Tree;
        let Q be the multipath part with the top branching node replaced by a null root;
        for each combination (denoted as β) of the nodes in the path P do
            generate pattern β ∪ α with support = minimum support of nodes in β;
        let freq pattern set(P) be the set of patterns so generated; }
    else let Q be Tree;
    for each item ai in Q do {    // Mining multipath FP-tree
        generate pattern β = ai ∪ α with support = ai.support;
        construct β’s conditional pattern base and then β’s conditional FP-tree Treeβ;
        if Treeβ ≠ ∅
        then call FP-growth(Treeβ, β);
        let freq pattern set(Q) be the set of patterns so generated; }
    return(freq pattern set(P) ∪ freq pattern set(Q) ∪ (freq pattern set(P) × freq pattern set(Q)))
}
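The procedure above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: transactions are assumed to arrive as (items, count) pairs, only the multipath loop ("for each item ai in Q") is implemented, and the single-prefix-path shortcut is omitted (it speeds mining up but does not change the result). The names Node, build_tree, fp_growth and TRANSACTIONS are hypothetical. The sample data are the transactions implied by the FP-tree in the example that follows (infrequent items already removed), with a minimum support of 3 assumed.

```python
from collections import defaultdict


class Node:
    """FP-tree node: an item, its count, a parent link and child nodes."""

    def __init__(self, item, parent):
        self.item, self.count, self.parent = item, 0, parent
        self.children = {}


def build_tree(transactions, min_support):
    """Build an FP-tree from (items, count) pairs.

    Returns the header table (item -> list of its nodes) and the item supports.
    """
    support = defaultdict(int)
    for items, count in transactions:
        for item in set(items):
            support[item] += count
    support = {i: s for i, s in support.items() if s >= min_support}
    root, header = Node(None, None), defaultdict(list)
    for items, count in transactions:
        # Keep frequent items only, in descending support order (ties by name).
        ordered = sorted((i for i in set(items) if i in support),
                         key=lambda i: (-support[i], i))
        node = root
        for item in ordered:
            if item not in node.children:
                node.children[item] = Node(item, node)
                header[item].append(node.children[item])
            node = node.children[item]
            node.count += count
    return header, support


def fp_growth(transactions, min_support, suffix=()):
    """Return {frozenset(pattern): support} for every frequent pattern."""
    header, support = build_tree(transactions, min_support)
    patterns = {}
    for item in sorted(support, key=lambda i: (support[i], i)):
        beta = suffix + (item,)                       # β = ai ∪ α
        patterns[frozenset(beta)] = support[item]
        # β's conditional pattern base: the prefix path above each ai node.
        base = []
        for node in header[item]:
            path, parent = [], node.parent
            while parent.item is not None:            # stop at the null root
                path.append(parent.item)
                parent = parent.parent
            if path:
                base.append((path, node.count))
        # Recurse on β's conditional FP-tree only when it can be non-empty.
        if base:
            patterns.update(fp_growth(base, min_support, beta))
    return patterns


# Assumed input: transactions implied by the example FP-tree.
TRANSACTIONS = [(list("fcamp"), 2), (list("fcabm"), 1),
                (list("fb"), 1), (list("cbp"), 1)]
patterns = fp_growth(TRANSACTIONS, min_support=3)
```

With this input the sketch returns 18 frequent itemsets; eight of them contain m (m, fm, cm, am, fcm, fam, cam, fcam), each with support 3. Rebuilding a fresh tree from each conditional pattern base keeps the code short at the cost of some repeated counting.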
Example:[1]

Header Table
Item   Frequency
f      4
c      4
a      3
b      3
m      3
p      3

FP-tree (each node is item:count; indentation shows parent and child):
{}
    f:4
        c:3
            a:3
                m:2
                    p:2
                b:1
                    m:1
        b:1
    c:1
        b:1
            p:1

Conditional pattern bases
Item   Cond. pattern base
c      f:3
a      fc:3
b      fca:1, f:1, c:1
m      fca:2, fcab:1
p      fcam:2, cb:1

m-conditional pattern base: fca:2, fcab:1
m-conditional FP-tree: {} → f:3 → c:3 → a:3
All frequent patterns relating to m: m, fm, cm, am, fcm, fam, cam, fcam
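To make the example concrete, the sketch below recomputes the conditional pattern bases and the patterns relating to m. It assumes the tree's branches correspond to the ordered transactions fcamp (×2), fcabm, fb and cbp, which reproduces every count in the tree above, and it assumes a minimum support of 3, which is consistent with b being dropped from the m-conditional FP-tree. The helper names are hypothetical, and single_path_patterns only covers the case where the conditional FP-tree is a single path, as the m-conditional FP-tree is here.

```python
from collections import Counter
from itertools import combinations

HEADER_ORDER = "fcabmp"  # item order taken from the header table
# Assumed ordered transactions behind the branches of the example tree.
PATHS = [("fcamp", 2), ("fcabm", 1), ("fb", 1), ("cbp", 1)]


def conditional_pattern_base(paths, item):
    """Prefix path in front of `item` on every path containing it, with counts."""
    base = Counter()
    for path, count in paths:
        if item in path:
            prefix = path[: path.index(item)]
            if prefix:                      # an empty prefix contributes nothing
                base[prefix] += count
    return dict(base)


def single_path_patterns(base, suffix, min_support):
    """Patterns from a conditional FP-tree that is a single path.

    Keeps the frequent items of the base (in header order) as the path and
    combines every non-empty subset of it with `suffix`.
    """
    counts = Counter()
    for prefix, count in base.items():
        for x in prefix:
            counts[x] += count
    path = [x for x in HEADER_ORDER if counts[x] >= min_support]
    patterns = [suffix]
    for r in range(1, len(path) + 1):
        patterns += ["".join(combo) + suffix for combo in combinations(path, r)]
    return patterns


m_base = conditional_pattern_base(PATHS, "m")
m_patterns = single_path_patterns(m_base, "m", min_support=3)
```

Here m_base is {"fca": 2, "fcab": 1}; b occurs only once in it, so the m-conditional tree is the single path f:3 → c:3 → a:3, and every generated pattern has support 3 (the minimum count along that path).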
GENERALIZED SEQUENTIAL PATTERN MINING ALGORITHM
1. Initially, every item in DB is a candidate of length 1.
2. For each level (i.e., sequences of length k) do:
   2.1 Scan the database to collect the support count for each candidate sequence.
   2.2 Generate candidate length-(k+1) sequences from the length-k frequent sequences using the Apriori property (every subsequence of a frequent sequence must also be frequent).
3. Repeat until no frequent sequence or no candidate can be found.
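The steps above can be sketched in Python as a level-wise loop. This is an illustrative sketch under stated assumptions, not Srikant and Agrawal's implementation: sequences are tuples of sorted item tuples, there are no time constraints (max-gap, sliding window), and length-(k+1) candidates are formed with a GSP-style join (drop the first item of one frequent sequence and the last item of another; if the remainders match, append that last item) and then pruned when any length-k subsequence is infrequent. All function names are hypothetical.

```python
def parse(text):
    """Turn '<(bd)cb(ac)>' into (('b', 'd'), ('c',), ('b',), ('a', 'c'))."""
    body, i, elements = text.strip("<>"), 0, []
    while i < len(body):
        if body[i] == "(":
            j = body.index(")", i)
            elements.append(tuple(sorted(body[i + 1:j])))
            i = j + 1
        else:
            elements.append((body[i],))
            i += 1
    return tuple(elements)


def contains(seq, pat):
    """True if pat is a subsequence of seq (greedy earliest match)."""
    j = 0
    for element in seq:
        if j < len(pat) and set(pat[j]) <= set(element):
            j += 1
    return j == len(pat)


def size(s):
    return sum(len(e) for e in s)


def drop_first(s):
    rest = s[0][1:]
    return ((rest,) if rest else ()) + s[1:]


def drop_last(s):
    rest = s[-1][:-1]
    return s[:-1] + ((rest,) if rest else ())


def deletions(s):
    """Every subsequence obtained by deleting exactly one item."""
    for i, element in enumerate(s):
        for j in range(len(element)):
            rest = element[:j] + element[j + 1:]
            yield s[:i] + ((rest,) if rest else ()) + s[i + 1:]


def candidates(frequent, k):
    """Join length-k frequent sequences into length-(k+1) candidates, then prune."""
    if k == 1:
        items = sorted(s[0][0] for s in frequent)
        joined = {((x,), (y,)) for x in items for y in items}
        joined |= {((x, y),) for x in items for y in items if x < y}
        return joined
    joined = set()
    for s1 in frequent:
        for s2 in frequent:
            if drop_first(s1) == drop_last(s2):
                item = s2[-1][-1]
                if len(s2[-1]) == 1:      # item is its own element in s2
                    joined.add(s1 + ((item,),))
                else:                      # item shares s2's last element
                    joined.add(s1[:-1] + (s1[-1] + (item,),))
    # Apriori pruning: every length-k subsequence must be frequent.
    return {c for c in joined if all(d in frequent for d in deletions(c))}


def gsp(db, min_sup):
    items = sorted({x for s in db for e in s for x in e})
    cands, k, result = {((x,),) for x in items}, 1, {}
    while cands:
        counts = {c: sum(contains(s, c) for s in db) for c in cands}  # one scan
        frequent = {c for c, n in counts.items() if n >= min_sup}
        result.update((c, counts[c]) for c in frequent)
        cands = candidates(frequent, k)
        k += 1
    return result


DB = [parse(s) for s in ("<(bd)cb(ac)>", "<(bf)(ce)b(fg)>", "<(ah)(bf)abf>",
                         "<(be)(ce)d>", "<a(bd)bcb(ade)>")]
patterns = gsp(DB, min_sup=2)
```

On the sample database below with min_sup = 2, this sketch finds 6 length-1 and 19 length-2 sequential patterns, and reports <(bd)cba> with support 2. Pruning on every one-item deletion (not only contiguous subsequences) is valid here because no max-gap constraint is used.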
Seq. ID   Sequence
10        <(bd)cb(ac)>
20        <(bf)(ce)b(fg)>
30        <(ah)(bf)abf>
40        <(be)(ce)d>
50        <a(bd)bcb(ade)>

Minimum support = 2

Length-1 Candidates
Cand   Sup
<a>    3
<b>    5
<c>    4
<d>    3
<e>    3
<f>    2
<g>    1
<h>    1
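The support column can be reproduced by checking subsequence containment for each singleton candidate. This is a small sketch with hypothetical helper names; the greedy containment test assumes no time constraints, and it works unchanged for longer patterns.

```python
def parse(text):
    """Turn '<(bd)cb(ac)>' into [{'b', 'd'}, {'c'}, {'b'}, {'a', 'c'}]."""
    body, i, elements = text.strip("<>"), 0, []
    while i < len(body):
        if body[i] == "(":
            j = body.index(")", i)
            elements.append(set(body[i + 1:j]))
            i = j + 1
        else:
            elements.append({body[i]})
            i += 1
    return elements


def contains(sequence, pattern):
    """True if each pattern element is a subset of a distinct, later element
    of the sequence (greedy earliest match)."""
    j = 0
    for element in sequence:
        if j < len(pattern) and pattern[j] <= element:
            j += 1
    return j == len(pattern)


DB = {10: "<(bd)cb(ac)>", 20: "<(bf)(ce)b(fg)>", 30: "<(ah)(bf)abf>",
      40: "<(be)(ce)d>", 50: "<a(bd)bcb(ade)>"}
sequences = [parse(s) for s in DB.values()]
support = {x: sum(contains(s, [{x}]) for s in sequences) for x in "abcdefgh"}
```

With a minimum support of 2, g and h drop out, leaving the six length-1 sequential patterns <a> through <f>.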
Length-2 Candidates

        <a>     <b>     <c>     <d>     <e>     <f>
<a>     <aa>    <ab>    <ac>    <ad>    <ae>    <af>
<b>     <ba>    <bb>    <bc>    <bd>    <be>    <bf>
<c>     <ca>    <cb>    <cc>    <cd>    <ce>    <cf>
<d>     <da>    <db>    <dc>    <dd>    <de>    <df>
<e>     <ea>    <eb>    <ec>    <ed>    <ee>    <ef>
<f>     <fa>    <fb>    <fc>    <fd>    <fe>    <ff>

        <a>     <b>     <c>     <d>     <e>     <f>
<a>             <(ab)>  <(ac)>  <(ad)>  <(ae)>  <(af)>
<b>                     <(bc)>  <(bd)>  <(be)>  <(bf)>
<c>                             <(cd)>  <(ce)>  <(cf)>
<d>                                     <(de)>  <(df)>
<e>                                             <(ef)>
<f>
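The two grids can be generated directly from the six frequent items. This short sketch (variable names are hypothetical) shows where the count of length-2 candidates comes from: every ordered pair forms a two-element sequence, while an itemset element is unordered, so only pairs with x < y are kept.

```python
from itertools import combinations, product

FREQUENT = "abcdef"  # length-1 sequential patterns at min_sup = 2

# <xy>: x in one element and y in a later element; x may equal y.
sequence_cands = [f"<{x}{y}>" for x, y in product(FREQUENT, repeat=2)]
# <(xy)>: x and y in the same element; order inside an element is irrelevant.
itemset_cands = [f"<({x}{y})>" for x, y in combinations(FREQUENT, 2)]
```

This gives 6 × 6 = 36 sequence candidates plus 6 × 5 / 2 = 15 itemset candidates, i.e. the 51 candidates of the second scan. Had the infrequent items g and h been kept, the same construction would give 8 × 8 + 8 × 7 / 2 = 92 candidates.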
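Candidate generation and database scans (min_sup = 2):
1st scan: 8 candidates (<a> <b> <c> <d> <e> <f> <g> <h>); 6 length-1 sequential patterns
2nd scan: 51 candidates (<aa> <ab> … <af> <ba> <bb> … <ff> <(ab)> … <(ef)>); 19 length-2 sequential patterns
3rd scan: 46 candidates (<abb> <aab> <aba> <baa> <bab> …); 19 length-3 sequential patterns
4th scan: 8 candidates (<abba> <(bd)bc> …); 6 length-4 sequential patterns
5th scan: 1 candidate (<(bd)cba>); 1 length-5 sequential pattern
Candidates are eliminated either because they cannot pass the support threshold or because they do not occur in the database at all.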
• Security (credit card fraud)
• Global climate modeling
• Business
• Disaster management
[1] Florian Verhein, Frequent Pattern Growth (FP-Growth) Algorithm, 2008.

[2] An Introduction to Apriori-Based Method: GSP (Generalized Sequential Patterns), Srikant & Agrawal [EDBT’96].