
Sequential Pattern Mining

Outline
• What is a sequence database, and what is
sequential pattern mining
• Methods for sequential pattern mining
• Constraint-based sequential pattern mining
• Periodicity analysis for sequence data

Sequence Databases
• A sequence database consists of ordered elements
or events
• Transaction databases vs. sequence databases

A transaction database:
TID   itemset
10    a, b, d
20    a, c, d
30    a, d, e
40    b, e, f

A sequence database:
SID   sequence
10    <a(abc)(ac)d(cf)>
20    <(ad)c(bc)(ae)>
30    <(ef)(ab)(df)cb>
40    <eg(af)cbc>

Applications
• Applications of sequential pattern mining
– Customer shopping sequences:
• First buy computer, then CD-ROM, and then digital camera,
within 3 months.
– Medical treatments, natural disasters (e.g., earthquakes),
science & eng. processes, stocks and markets, etc.
– Telephone calling patterns, Weblog click streams
– DNA sequences and gene structures

Subsequence vs. super sequence
• A sequence is an ordered list of events,
denoted < e1 e2 … el >
• Given two sequences α=< a1 a2 … an > and β=<
b1 b2 … bm >
• α is called a subsequence of β, denoted as α⊆
β, if there exist integers 1≤ j1 < j2 <…< jn ≤m
such that a1 ⊆ bj1, a2 ⊆ bj2,…, an ⊆ bjn
• β is a super sequence of α
– E.g., α = <(ab)d> and β = <(abc)(de)>; here α is a subsequence of β

What Is Sequential Pattern Mining?
• Given a set of sequences and support
threshold, find the complete set of frequent
subsequences
• A sequence: <(ef)(ab)(df)cb>
– An element may contain a set of items; items
within an element are unordered and we list
them alphabetically
• <a(bc)dc> is a subsequence of <a(abc)(ac)d(cf)>

A sequence database
SID   sequence
10    <a(abc)(ac)d(cf)>
20    <(ad)c(bc)(ae)>
30    <(ef)(ab)(df)cb>
40    <eg(af)cbc>

• Given support threshold min_sup = 2, <(ab)c> is a
sequential pattern
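To make the containment and support definitions concrete, here is a minimal Python sketch (not part of the original slides; the names is_subsequence, support and seq are illustrative). Sequences are lists of frozensets, containment follows the subsequence definition above via a greedy scan, and support counts the containing sequences of the example database.

```python
def is_subsequence(alpha, beta):
    """True if the elements of alpha map, in order, into superset elements of beta."""
    i = 0
    for element in beta:
        if i < len(alpha) and alpha[i] <= element:   # a_i is a subset of b_j
            i += 1
    return i == len(alpha)

def support(db, pattern):
    """Absolute support: number of sequences in db containing pattern."""
    return sum(is_subsequence(pattern, s) for s in db)

def seq(*elements):
    """Helper: seq('a', 'abc') -> [frozenset('a'), frozenset('abc')]."""
    return [frozenset(e) for e in elements]

# The example sequence database from this slide.
db = [
    seq('a', 'abc', 'ac', 'd', 'cf'),    # 10: <a(abc)(ac)d(cf)>
    seq('ad', 'c', 'bc', 'ae'),          # 20: <(ad)c(bc)(ae)>
    seq('ef', 'ab', 'df', 'c', 'b'),     # 30: <(ef)(ab)(df)cb>
    seq('e', 'g', 'af', 'c', 'b', 'c'),  # 40: <eg(af)cbc>
]

min_sup = 2
print(is_subsequence(seq('a', 'bc', 'd', 'c'), db[0]))  # True: <a(bc)dc> is contained in <a(abc)(ac)d(cf)>
print(support(db, seq('ab', 'c')))                      # 2, so <(ab)c> is a sequential pattern
```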
Challenges on Sequential Pattern
Mining
• A huge number of possible sequential patterns
are hidden in databases
• A mining algorithm should
– find the complete set of patterns, when
possible, satisfying the minimum support
(frequency) threshold
– be highly efficient, scalable, involving only a
small number of database scans
– be able to incorporate various kinds of
user-specific constraints
Studies on Sequential Pattern
Mining
• Concept introduction and an initial Apriori-like algorithm
– Agrawal & Srikant. Mining sequential patterns, [ICDE’95]
• Apriori-based method: GSP (Generalized Sequential Patterns: Srikant
& Agrawal [EDBT’96])
• Pattern-growth methods: FreeSpan & PrefixSpan (Han et al. [KDD’00];
Pei, et al. [ICDE’01])
• Vertical format-based mining: SPADE (Zaki [Machine Learning’00])
• Constraint-based sequential pattern mining (SPIRIT: Garofalakis,
Rastogi, Shim [VLDB’99]; Pei, Han, Wang [CIKM’02])
• Mining closed sequential patterns: CloSpan (Yan, Han & Afshar
[SDM’03])
Methods for sequential pattern
mining
• Apriori-based Approaches
– GSP
– SPADE
• Pattern-Growth-based Approaches
– FreeSpan
– PrefixSpan

The Apriori Property of Sequential
Patterns
• A basic property: Apriori (Agrawal & Srikant’94)
– If a sequence S is not frequent, then none of the
super-sequences of S is frequent
– E.g., since <hb> is infrequent, so are <hab> and
<(ah)b>
Given support threshold min_sup = 2:

Seq. ID   Sequence
10        <(bd)cb(ac)>
20        <(bf)(ce)b(fg)>
30        <(ah)(bf)abf>
40        <(be)(ce)d>
50        <a(bd)bcb(ade)>
GSP—Generalized Sequential Pattern
Mining
• GSP (Generalized Sequential Pattern) mining
algorithm
• Outline of the method
– Initially, every item in DB is a candidate of length-1
– for each level (i.e., sequences of length-k) do
• scan database to collect support count for each candidate
sequence
• generate candidate length-(k+1) sequences from length-k
frequent sequences using Apriori
– repeat until no frequent sequence or no candidate can
be found
• Major strength: candidate pruning by the Apriori property
Finding Length-1 Sequential
Patterns
• Initial candidates:
– <a>, <b>, <c>, <d>, <e>, <f>, <g>, <h>
• Scan database once, count support
for candidates (min_sup = 2)

Seq. ID   Sequence
10        <(bd)cb(ac)>
20        <(bf)(ce)b(fg)>
30        <(ah)(bf)abf>
40        <(be)(ce)d>
50        <a(bd)bcb(ade)>

Cand.   Sup.
<a>     3
<b>     5
<c>     4
<d>     3
<e>     3
<f>     2
<g>     1
<h>     1
Generating Length-2 Candidates

The 6 length-1 sequential patterns generate 51 length-2 candidates:

Sequence-extension candidates (6 × 6 = 36):
       <a>   <b>   <c>   <d>   <e>   <f>
<a>   <aa>  <ab>  <ac>  <ad>  <ae>  <af>
<b>   <ba>  <bb>  <bc>  <bd>  <be>  <bf>
<c>   <ca>  <cb>  <cc>  <cd>  <ce>  <cf>
<d>   <da>  <db>  <dc>  <dd>  <de>  <df>
<e>   <ea>  <eb>  <ec>  <ed>  <ee>  <ef>
<f>   <fa>  <fb>  <fc>  <fd>  <fe>  <ff>

Itemset-extension candidates (6 × 5 / 2 = 15):
       <a>     <b>     <c>     <d>     <e>     <f>
<a>          <(ab)>  <(ac)>  <(ad)>  <(ae)>  <(af)>
<b>                  <(bc)>  <(bd)>  <(be)>  <(bf)>
<c>                          <(cd)>  <(ce)>  <(cf)>
<d>                                  <(de)>  <(df)>
<e>                                          <(ef)>
<f>

Without the Apriori property, all 8 length-1 candidates would be paired:
8 × 8 + 8 × 7 / 2 = 92 candidates.
Apriori pruning removes (92 − 51) / 92 ≈ 44.57% of the candidates.
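The candidate counts above follow directly from pairing the frequent items; a short Python sketch reproduces them (illustrative only, the function name is made up):

```python
from itertools import combinations, product

def count_length2_candidates(items):
    seq_ext = list(product(items, repeat=2))   # <xy>: x then y in separate elements
    set_ext = list(combinations(items, 2))     # <(xy)>: x and y in one element, x < y
    return len(seq_ext) + len(set_ext)

print(count_length2_candidates("abcdef"))    # 36 + 15 = 51, after pruning to the 6 frequent items
print(count_length2_candidates("abcdefgh"))  # 64 + 28 = 92, without pruning (all 8 items)
```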
Finding Length-2 Sequential
Patterns
• Scan database one more time, collect support
count for each length-2 candidate
• There are 19 length-2 candidates which pass
the minimum support threshold
– They are length-2 sequential patterns

The GSP Mining Process

Given min_sup = 2 and the sequence database:

Seq. ID   Sequence
10        <(bd)cb(ac)>
20        <(bf)(ce)b(fg)>
30        <(ah)(bf)abf>
40        <(be)(ce)d>
50        <a(bd)bcb(ade)>

1st scan: 8 candidates (<a> <b> <c> <d> <e> <f> <g> <h>),
          6 length-1 sequential patterns
2nd scan: 51 candidates (<aa> <ab> … <af> <ba> <bb> … <ff> <(ab)> … <(ef)>),
          19 length-2 sequential patterns; 10 candidates not in the DB at all
3rd scan: 46 candidates (<abb> <aab> <aba> <baa> <bab> …),
          19 length-3 sequential patterns; 20 candidates not in the DB at all
4th scan: 8 candidates, 6 length-4 sequential patterns;
          some candidates (e.g. <abba>, <(bd)bc>) not in the DB at all
5th scan: 1 candidate, 1 length-5 sequential pattern: <(bd)cba>;
          remaining candidates cannot pass the support threshold
The GSP Algorithm
• Take sequences of the form <x> as length-1
candidates
• Scan database once, find F1, the set of length-1
sequential patterns
• Let k=1; while Fk is not empty do
– Form Ck+1, the set of length-(k+1) candidates from Fk;
– If Ck+1 is not empty, scan database once, find Fk+1, the
set of length-(k+1) sequential patterns
– Let k=k+1;

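The loop above can be summarized in a simplified Python sketch (not the full GSP algorithm: elements are restricted to single items, so a sequence is just a tuple like ('a','b','b'), and itemset elements as well as GSP's complete join are omitted; all names are illustrative):

```python
def contains(sequence, pattern):
    it = iter(sequence)
    return all(item in it for item in pattern)     # greedy subsequence test

def frequent(candidates, db, min_sup):
    return {c for c in candidates if sum(contains(s, c) for s in db) >= min_sup}

def gen_candidates(Fk):
    # Join two length-k patterns whose (k-1)-suffix / (k-1)-prefix agree,
    # then apply the Apriori prune: every length-k subsequence must be frequent.
    joined = {p + q[-1:] for p in Fk for q in Fk if p[1:] == q[:-1]}
    return {c for c in joined
            if all(c[:i] + c[i + 1:] in Fk for i in range(len(c)))}

def gsp(db, min_sup):
    candidates = {(item,) for s in db for item in s}   # every item is a length-1 candidate
    Fk = frequent(candidates, db, min_sup)             # 1st database scan
    patterns = set(Fk)
    while Fk:                                          # one more scan per level
        Fk = frequent(gen_candidates(Fk), db, min_sup)
        patterns |= Fk
    return patterns

toy_db = [('a', 'b', 'c'), ('a', 'c', 'b'), ('a', 'b', 'b', 'c'), ('c', 'a', 'b')]
print(sorted(gsp(toy_db, min_sup=3)))
# [('a',), ('a', 'b'), ('a', 'c'), ('b',), ('c',)]
```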
The GSP Algorithm
• Benefits from the Apriori pruning
– Reduces search space
• Bottlenecks
– Scans the database multiple times
– Generates a huge set of candidate sequences

There is a need for more efficient mining methods.
The SPADE Algorithm
• SPADE (Sequential PAttern Discovery using
Equivalence classes), developed by Zaki (2001)
• A vertical-format sequential pattern mining
method
• The sequence database is mapped to a vertical layout:
for each item, the list of <SID, EID> pairs in which it occurs
• Sequential pattern mining is performed by
– growing the subsequences (patterns) one item at a
time, using Apriori-style candidate generation
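A minimal Python sketch of the vertical id-list idea (illustrative only; the function names and the single temporal join shown here are simplifications of SPADE's actual data structures): each item keeps a list of (SID, EID) pairs, support is the number of distinct SIDs, and the id-list of <ab> is obtained by joining the id-lists of a and b instead of rescanning the database.

```python
from collections import defaultdict

def vertical_format(db):
    """db: {sid: [element, ...]} with each element a set of items; EIDs start at 1."""
    idlists = defaultdict(list)
    for sid, sequence in db.items():
        for eid, element in enumerate(sequence, start=1):
            for item in element:
                idlists[item].append((sid, eid))
    return idlists

def support(idlist):
    return len({sid for sid, _ in idlist})

def sequence_join(idlist_a, idlist_b):
    """Id-list of 'a followed later by b' within the same sequence."""
    return [(sid_b, eid_b)
            for sid_a, eid_a in idlist_a
            for sid_b, eid_b in idlist_b
            if sid_a == sid_b and eid_b > eid_a]

db = {
    10: [{'a'}, {'a', 'b', 'c'}, {'a', 'c'}, {'d'}, {'c', 'f'}],  # <a(abc)(ac)d(cf)>
    20: [{'a', 'd'}, {'c'}, {'b', 'c'}, {'a', 'e'}],              # <(ad)c(bc)(ae)>
}
idlists = vertical_format(db)
ab = sequence_join(idlists['a'], idlists['b'])   # id-list of the pattern <ab>
print(support(ab))                               # 2: both sequences contain an 'a' followed by a 'b'
```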
The SPADE Algorithm

Bottlenecks of Candidate
Generate-and-test
• A huge set of candidates is generated
– Especially the 2-item candidate sequences

• Multiple Scans of database in mining.


– The length of each candidate grows by one at each
database scan.

• Inefficient for mining long sequential patterns.


– A long pattern must grow up from short patterns
– An exponential number of short candidates
PrefixSpan (Prefix-Projected
Sequential Pattern Growth)
• PrefixSpan
– Projection-based
– But only prefix-based projection: fewer projections and
quickly shrinking projected sequences
• J. Pei, J. Han, et al. PrefixSpan: Mining sequential
patterns efficiently by prefix-projected pattern
growth. ICDE’01.

Prefix and Suffix (Projection)

• Given the sequence <a(abc)(ac)d(cf)>
• <a>, <aa>, <a(ab)> and <a(abc)> are prefixes
of <a(abc)(ac)d(cf)>

Prefix   Suffix (prefix-based projection)
<a>      <(abc)(ac)d(cf)>
<aa>     <(_bc)(ac)d(cf)>
<ab>     <(_c)(ac)d(cf)>
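A small Python sketch of this projection for a single-item extension (illustrative; itemset extensions, where the next item joins the last element of the prefix, are not handled): elements are written as strings of alphabetically sorted items, and a leading '_' marks a partial element.

```python
def project(sequence, item):
    """Suffix of `sequence` after the first element containing `item` (None if absent)."""
    for i, element in enumerate(sequence):
        if item in element:
            rest = element[element.index(item) + 1:]          # items left in that element
            return (['_' + rest] if rest else []) + sequence[i + 1:]
    return None   # the sequence drops out of the projected database

s = ['a', 'abc', 'ac', 'd', 'cf']      # <a(abc)(ac)d(cf)>
print(project(s, 'a'))                 # ['abc', 'ac', 'd', 'cf']  i.e. <(abc)(ac)d(cf)>
print(project(project(s, 'a'), 'b'))   # ['_c', 'ac', 'd', 'cf']   i.e. <(_c)(ac)d(cf)>
```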
Mining Sequential Patterns by
Prefix Projections
• Step 1: find length-1 sequential patterns
– <a>, <b>, <c>, <d>, <e>, <f>
• Step 2: divide search space. The complete set of
seq. pat. can be partitioned into 6 subsets:
– The ones having prefix <a>;
– The ones having prefix <b>;
– …
– The ones having prefix <f>

SID   sequence
10    <a(abc)(ac)d(cf)>
20    <(ad)c(bc)(ae)>
30    <(ef)(ab)(df)cb>
40    <eg(af)cbc>
Finding Seq. Patterns with Prefix
<a>
• Only need to consider projections w.r.t. <a>
– <a>-projected database: <(abc)(ac)d(cf)>,
<(_d)c(bc)(ae)>, <(_b)(df)cb>, <(_f)cbc>
• Find all the length-2 seq. patterns having prefix <a>:
<aa>, <ab>, <(ab)>, <ac>, <ad>, <af>
– Further partition into 6 subsets
• Having prefix <aa>
• …
• Having prefix <af>

SID   sequence
10    <a(abc)(ac)d(cf)>
20    <(ad)c(bc)(ae)>
30    <(ef)(ab)(df)cb>
40    <eg(af)cbc>
Completeness of PrefixSpan
SDB:
SID   sequence
10    <a(abc)(ac)d(cf)>
20    <(ad)c(bc)(ae)>
30    <(ef)(ab)(df)cb>
40    <eg(af)cbc>

Length-1 sequential patterns: <a>, <b>, <c>, <d>, <e>, <f>

Partition the search space: patterns having prefix <a>,
having prefix <b>, …, having prefix <f>

<a>-projected database:          <b>-projected database, …
<(abc)(ac)d(cf)>
<(_d)c(bc)(ae)>
<(_b)(df)cb>
<(_f)cbc>

Length-2 sequential patterns with prefix <a>:
<aa>, <ab>, <(ab)>, <ac>, <ad>, <af>

Recurse: having prefix <aa> gives the <aa>-projected DB, …,
having prefix <af> gives the <af>-projected DB
The Algorithm of PrefixSpan
• Input: A sequence database S, and the
minimum support threshold min_sup
• Output: The complete set of sequential patterns
• Method: Call PrefixSpan(<>,0,S)
• Subroutine PrefixSpan(α, l, S|α)
• Parameters:
– α: sequential pattern,
– l: the length of α;
– S|α: the α-projected database if α ≠ <>; otherwise, the
sequence database S

The Algorithm of PrefixSpan(2)
• Method
1. Scan S|α once, find the set of frequent items b
such that:
a) b can be added to the last element of α to form
a sequential pattern (itemset extension); or
b) <b> can be appended to α to form a sequential
pattern (sequence extension).
2. For each frequent item b, append it to α to form
a sequential pattern α’, and output α’;
3. For each α’, construct α’-projected database
S|α’, and call PrefixSpan(α’, l+1, S|α’).
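A compact Python sketch of this recursion (illustrative; elements are single items, so step 1a, adding b into the last element of α, is omitted, and a projected database is simply the list of postfixes after the prefix item):

```python
from collections import Counter

def prefixspan(projected_db, prefix, min_sup, results):
    counts = Counter(item for s in projected_db for item in set(s))   # count once per sequence
    for item, sup in counts.items():
        if sup < min_sup:
            continue
        pattern = prefix + (item,)
        results[pattern] = sup                                        # output the pattern and its support
        # the projected database for this pattern: postfix after the first occurrence of item
        new_db = [s[s.index(item) + 1:] for s in projected_db if item in s]
        prefixspan(new_db, pattern, min_sup, results)

db = [('a', 'b', 'c', 'b'), ('a', 'b', 'b', 'c'), ('b', 'a', 'c'), ('a', 'c')]
results = {}
prefixspan(db, (), min_sup=2, results=results)
print(results)   # e.g. ('a',): 4, ('a', 'b'): 2, ('a', 'c'): 4, ('b', 'c'): 3, ...
```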
Efficiency of PrefixSpan

• No candidate sequence needs to be


generated

• Projected databases keep shrinking

• Major cost of PrefixSpan: constructing


projected databases
– Can be improved by bi-level projections
Optimization in PrefixSpan
• Single level vs. bi-level projection
– Bi-level projection with 3-way checking may reduce
the number and size of projected databases
• Physical projection vs. pseudo-projection
– Pseudo-projection may reduce the effort of projection
when the projected database fits in main memory
• Parallel projection vs. partition projection
– Partition projection may avoid the blowup of disk
space

Scaling Up by Bi-Level Projection
• Partition search space based on length-2
sequential patterns
• Only form projected databases and pursue
recursive mining over bi-level projected
databases

Speed-up by Pseudo-projection
• Major cost of PrefixSpan: projection
– Postfixes of sequences often appear
repeatedly in recursive projected databases

• When the (projected) database can be held
in main memory, use pointers to form
projections:
– a pointer to the sequence
– the offset of the postfix
• Example: s = <a(abc)(ac)d(cf)>
s|<a> : (pointer to s, offset 2)  <(abc)(ac)d(cf)>
s|<ab>: (pointer to s, offset 4)  <(_c)(ac)d(cf)>
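A tiny Python sketch of the pointer idea (illustrative; offsets here are element positions, and the '_'-marked partial elements are ignored for brevity): a projected "sequence" is just a (sequence index, offset) pair into the in-memory database, so nothing is copied.

```python
db = [
    ['a', 'abc', 'ac', 'd', 'cf'],   # <a(abc)(ac)d(cf)>
    ['ad', 'c', 'bc', 'ae'],         # <(ad)c(bc)(ae)>
]

def pseudo_project(pointers, item):
    """pointers: list of (seq_index, offset); advance past the next element containing item."""
    out = []
    for seq_index, offset in pointers:
        sequence = db[seq_index]
        for pos in range(offset, len(sequence)):
            if item in sequence[pos]:
                out.append((seq_index, pos + 1))   # the postfix starts after this element
                break
    return out

full_db = [(i, 0) for i in range(len(db))]   # the unprojected database
a_proj = pseudo_project(full_db, 'a')        # [(0, 1), (1, 1)]
ab_proj = pseudo_project(a_proj, 'b')        # [(0, 2), (1, 3)]
print(a_proj, ab_proj)
```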
Pseudo-Projection vs. Physical
Projection
• Pseudo-projection avoids physically copying
postfixes
– Efficient in running time and space when
database can be held in main memory
• However, it is not efficient when database
cannot fit in main memory
– Disk-based random accessing is very costly
• Suggested approach:
– Integration of physical and pseudo-projection
– Swapping to pseudo-projection when the data set
fits in memory
Performance on Data Set
C10T8S8I8

Performance on Data Set Gazelle

Effect of Pseudo-Projection

CloSpan: Mining Closed Sequential
Patterns
• A closed sequential pattern s:
there exists no superpattern s’
such that s’ ⊃ s and s’ and s
have the same support
• Motivation: reduces the
number of (redundant)
patterns but attains the same
expressive power
• Using Backward Subpattern
and Backward Superpattern
pruning to prune redundant
search space
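To illustrate the definition only (CloSpan itself prunes during the search with the backward sub/superpattern checks rather than post-filtering), here is a naive Python sketch that keeps the closed patterns among already-mined single-item patterns; all names are made up.

```python
def is_subseq(p, q):
    it = iter(q)
    return all(x in it for x in p)

def closed_patterns(pattern_support):
    """pattern_support: {tuple pattern: absolute support}."""
    return {
        p: sup for p, sup in pattern_support.items()
        if not any(len(q) > len(p) and sup_q == sup and is_subseq(p, q)
                   for q, sup_q in pattern_support.items())
    }

mined = {('a',): 4, ('b',): 3, ('a', 'b'): 2, ('a', 'c'): 4, ('b', 'c'): 3}
print(closed_patterns(mined))
# {('a', 'b'): 2, ('a', 'c'): 4, ('b', 'c'): 3}: <a> and <b> are redundant
```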
CloSpan: Performance Comparison
with PrefixSpan

Constraints for Seq.-Pattern Mining
• Item constraint
– Find web log patterns only about online-bookstores
• Length constraint
– Find patterns having at least 20 items
• Super pattern constraint
– Find super patterns of “PC digital camera”
• Aggregate constraint
– Find patterns where the average price of the items is over $100

More Constraints
• Regular expression constraint
– Find patterns “starting from Yahoo homepage, search
for hotels in Washington DC area”
– Yahootravel(WashingtonDC|DC)(hotel|motel|lodging)
• Duration constraint
– Find patterns about ±24 hours of a shooting
• Gap constraint
– Find purchasing patterns such that “the gap between
consecutive purchases is less than 1 month”
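A hedged Python sketch of checking such a gap constraint for one sequence (illustrative; events carry timestamps, and the search backtracks because a greedy match can miss valid occurrences):

```python
def occurs_with_max_gap(events, pattern, max_gap):
    """events: list of (time, item) sorted by time; pattern: list of items."""
    def search(start, p_idx, last_time):
        if p_idx == len(pattern):
            return True
        for j in range(start, len(events)):
            time, item = events[j]
            if last_time is not None and time - last_time > max_gap:
                break                                   # events are sorted, later ones are worse
            if item == pattern[p_idx] and search(j + 1, p_idx + 1, time):
                return True
        return False
    return search(0, 0, None)

# days since the first purchase, purchased item
purchases = [(0, 'computer'), (20, 'CD-ROM'), (95, 'camera')]
print(occurs_with_max_gap(purchases, ['computer', 'CD-ROM'], max_gap=30))            # True
print(occurs_with_max_gap(purchases, ['computer', 'CD-ROM', 'camera'], max_gap=30))  # False: 75-day gap
```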

From Sequential Patterns to Structured
Patterns
• Sets, sequences, trees, graphs, and other
structures
– Transaction DB: Sets of items
• {{i1, i2, …, im}, …}
– Seq. DB: Sequences of sets:
• {<{i1, i2}, …, {im, in, ik}>, …}
– Sets of Sequences:
• {{<i1, i2>, …, <im, in, ik>}, …}
– Sets of trees: {t1, t2, …, tn}
– Sets of graphs (mining for frequent subgraphs):
• {g1, g2, …, gn}
• Mining structured patterns in XML documents, …
Episodes and Episode Pattern
Mining
• Other methods for specifying the kinds of
patterns
– Serial episodes: A → B
– Parallel episodes: A & B
– Regular expressions: (A | B)C*(D → E)
• Methods for episode pattern mining
– Variations of Apriori-like algorithms, e.g., GSP
– Database projection-based pattern growth
• Similar to frequent pattern growth, without candidate
generation
Periodicity Analysis
• Periodicity is everywhere: tides, seasons, daily power
consumption, etc.
• Full periodicity
– Every point in time contributes (precisely or approximately) to the
periodicity
• Partial periodicity: a more general notion
– Only some segments contribute to the periodicity
• Jim reads the NY Times 7:00–7:30 am every weekday
• Cyclic association rules
– Associations which form cycles
• Methods
– Full periodicity: FFT and other statistical analysis methods
– Partial and cyclic periodicity: variations of Apriori-like mining
methods
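For the full-periodicity case, a minimal Python/NumPy sketch (illustrative, using a synthetic daily signal) shows how the FFT exposes the dominant period:

```python
import numpy as np

rng = np.random.default_rng(0)
days = 8 * 7
t = np.arange(days)
series = 10 + 3 * np.sin(2 * np.pi * t / 7) + rng.normal(0, 0.5, days)  # weekly cycle + noise

spectrum = np.abs(np.fft.rfft(series - series.mean()))   # drop the mean (zero-frequency term)
freqs = np.fft.rfftfreq(days, d=1.0)                      # cycles per day
dominant = freqs[np.argmax(spectrum)]
print(f"dominant period = {1 / dominant:.1f} days")       # approximately 7.0 days
```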
Summary
• Sequential pattern mining is useful in many
applications, e.g. weblog analysis, financial
market prediction, bioinformatics, etc.
• It is similar to frequent itemset mining, but
takes the ordering of events into consideration.
• We have looked at different approaches that are
descendants of two popular approaches to
mining frequent itemsets
– Candidate generation: AprioriAll and GSP
– Pattern growth: FreeSpan and PrefixSpan

