PrefixSpan The Presentation

The document introduces sequential pattern mining, a data mining task aimed at discovering frequently occurring subsequences in discrete sequences. It defines key concepts such as discrete sequences, itemsets, subsequences, and support, while also discussing the challenges and algorithms associated with mining these patterns. The document emphasizes the importance of efficient algorithms to handle the potentially vast number of sequential patterns in a database.


An Introduction to

Sequential Pattern Mining

Philippe Fournier-Viger
http://www.philippe-fournier-viger.com

Fournier-Viger, P., Lin, J. C.-W., Kiran, R. U., Koh, Y. S., Thomas, R. (2017). A
Survey of Sequential Pattern Mining. Data Science and Pattern Recognition
(DSPR), vol. 1(1), pp. 54-77.

Source code and datasets are available in the SPMF library.

Introduction
• Data Mining: the goal is to discover or extract
useful knowledge from data.
• Many types of data can be analyzed: graphs,
relational databases, time series, sequences,
etc.
• In this presentation, we focus on analyzing a
common type of data called discrete
sequences to find interesting patterns in it.

What is a discrete sequence?
A sequence is an ordered list of symbols.

Example 1: a sequence can be the items that are purchased by a customer over time:

Computer, Monitor, Router

Example 2: a sequence can be the list of words in a sentence:

I go back home

Example 3: a sequence can be the list of locations visited by a car in a city:

a b f g

[Figure: map of the city with the locations labelled a to h]

Sequential Pattern Mining
• It is a popular data mining task, introduced in 1994
by Agrawal & Srikant.
• The goal is to find all subsequences that appear
frequently in a set of discrete sequences.
• For example:
– find sequences of items purchased by many customers
over time,
– find sequences of locations frequently visited by
tourists in a city,
– find sequences of words that appear frequently in a
text.
Definition: Items
Let there be a set of items (symbols) called 𝐼.

Example: 𝐼 = {𝑎, 𝑏, 𝑐, 𝑑, 𝑒}

𝑎 = apple
𝑏 = bread
𝑐 = cake
𝑑 = dates
𝑒 = eggs

Definition: Itemset
An itemset is a set of items that is a subset of 𝐼.

Example: {𝑎, 𝑏, 𝑐} is an itemset containing 3 items

{𝑑, 𝑒} is an itemset containing 2 items

• Note: an itemset cannot contain the same item twice.
• An itemset having 𝑘 items is called a 𝑘-itemset.

Definition: Sequence
A discrete sequence $S$ is an ordered list of itemsets
$S = \langle X_1, X_2, \dots, X_n \rangle$ where $X_j \subseteq I$ for any $j \in \{1, 2, \dots, n\}$.

Example 1: ⟨{a, b}, {c}⟩ is a sequence containing two itemsets.
It means that a customer purchased apple and bread at the same time and then purchased cake.

Example 2: ⟨{a}, {a}, {c}⟩

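Definition: Subsequence (⊑)
Let there be two sequences:
$S_A = \langle A_1, A_2, \dots, A_r \rangle$ and $S_B = \langle B_1, B_2, \dots, B_t \rangle$.
The sequence $S_A$ is a subsequence of $S_B$ if and only if there exist $r$ integers $1 \le i_1 < i_2 < \dots < i_r \le t$ such that $A_1 \subseteq B_{i_1}, A_2 \subseteq B_{i_2}, \dots, A_r \subseteq B_{i_r}$.

This is denoted as $S_A \sqsubseteq S_B$.

Examples:
⟨{a, c}⟩ ⊑ ⟨{a, b, c}⟩
⟨{a}, {c}⟩ ⊑ ⟨{a, b}, {d}, {b, c}⟩
Counter-examples:
⟨{a, c}⟩ ⋢ ⟨{a}, {c}⟩ (no single itemset contains both a and c)
⟨{a}, {c}⟩ ⋢ ⟨{a, c}, {d}⟩ (c does not appear in an itemset after the one containing a)
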
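The subsequence test can be sketched in a few lines of Python. This is only an illustration under some assumptions that are not in the slides: a sequence is represented as a list of Python sets, and the function name `is_subsequence` is made up for this sketch. It scans the larger sequence from left to right and matches each itemset of the smaller one as early as possible, which is enough to decide whether suitable indices $i_1 < i_2 < \dots < i_r$ exist.

```python
def is_subsequence(sa, sb):
    """Return True if sequence sa is a subsequence of sb (sa ⊑ sb).

    Both sequences are lists of sets (itemsets). Hypothetical helper,
    written only to illustrate the definition.
    """
    j = 0  # next position of sb that may still be used
    for a in sa:
        # Skip itemsets of sb that do not contain the itemset a.
        while j < len(sb) and not set(a) <= set(sb[j]):
            j += 1
        if j == len(sb):
            return False  # a could not be matched
        j += 1  # the next itemset of sa must be matched strictly later
    return True
```

For instance, `is_subsequence([{"a"}, {"c"}], [{"a", "b"}, {"d"}, {"b", "c"}])` holds, while `is_subsequence([{"a"}, {"c"}], [{"a", "c"}, {"d"}])` does not, because a and c would have to be matched in two different itemsets.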
Definition: Sequence database
A sequence database $D$ is a set of discrete sequences $D = \{S_1, S_2, \dots, S_m\}$ where each sequence $S_j \in D$ has a unique identifier $j$.

Example 1: This is a sequence database with four sequences D = {S1, S2, S3, S4}:
Sequence database
S1 = ⟨{a, b}, {c}, {a}⟩
S2 = ⟨{a}, {b}, {c}⟩
S3 = ⟨{b}, {c}, {d}⟩
S4 = ⟨{b}, {a, b}, {c}⟩

Definition: Support of a sequence
The number of sequences in a sequence database $D$ that contain a sequence $S_A$ is called the support of $S_A$. It is defined as:
$sup(S_A) = |\{S \mid S \in D \text{ and } S_A \sqsubseteq S\}|$

Sequence database
S1 = ⟨{a, b}, {c}, {a}⟩
S2 = ⟨{a}, {b}, {c}⟩
S3 = ⟨{b}, {c}, {d}⟩
S4 = ⟨{b}, {a, b}, {c}⟩

Example 1: sup(⟨{a}⟩) = 3
Example 2: sup(⟨{b}⟩) = 4
Example 3: sup(⟨{a}, {b}⟩) = 1
Example 4: sup(⟨{a, b}⟩) = 2

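Counting support amounts to running the subsequence test against every sequence of the database. The sketch below assumes sequences stored as lists of Python sets; the names `is_subsequence`, `support` and `DB` are illustrative and do not come from the slides or from SPMF. The database is the one used in the four support examples.

```python
def is_subsequence(sa, sb):
    """True if sa ⊑ sb; each itemset of sa is matched as early as possible."""
    remaining = iter(sb)  # shared iterator, so matches are strictly ordered
    return all(any(set(a) <= set(b) for b in remaining) for a in sa)


def support(pattern, database):
    """Number of sequences of the database that contain the pattern."""
    return sum(1 for seq in database if is_subsequence(pattern, seq))


# Database of the support examples (S2 is <{a}, {b}, {c}> here).
DB = [
    [{"a", "b"}, {"c"}, {"a"}],
    [{"a"}, {"b"}, {"c"}],
    [{"b"}, {"c"}, {"d"}],
    [{"b"}, {"a", "b"}, {"c"}],
]
```

With this database, `support([{"a"}], DB)` is 3 and `support([{"a", "b"}], DB)` is 2, in line with Examples 1 and 4.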
Definition: Sequential pattern mining
• Input: A sequence database 𝐷 and a
minimum support threshold 𝑚𝑖𝑛𝑠𝑢𝑝 > 0.
• Output: All sequential patterns.
A sequential pattern is a sequence $S$ where $sup(S) \ge minsup$.

Example 1
INPUT:
Sequence database
S1 = ⟨{a, b}, {c}, {a}⟩
S2 = ⟨{a, b}, {b}, {c}⟩
S3 = ⟨{b}, {c}, {d}⟩
S4 = ⟨{b}, {a, b}, {c}⟩

minsup = 3

OUTPUT: all sequential patterns:
⟨{a}⟩ support = 3
⟨{b}⟩ support = 4
⟨{c}⟩ support = 4
⟨{a}, {c}⟩ support = 3
⟨{a, b}⟩ support = 3
⟨{b}, {c}⟩ support = 4
⟨{a, b}, {c}⟩ support = 3

What will happen if we change the threshold?

Example 2
INPUT:
Sequence database
S1 = ⟨{a, b}, {c}, {a}⟩
S2 = ⟨{a, b}, {b}, {c}⟩
S3 = ⟨{b}, {c}, {d}⟩
S4 = ⟨{b}, {a, b}, {c}⟩

minsup = 4

OUTPUT: all sequential patterns:
⟨{b}⟩ support = 4
⟨{c}⟩ support = 4
⟨{b}, {c}⟩ support = 4

Observation: If we increase the minsup threshold, fewer patterns may be found.

It is a difficult problem!
• A naïve algorithm would read the database and count the support (frequency) of all possible patterns.
• This is inefficient because there can be a very large number of possible patterns.
• For example:
⟨{a}⟩, ⟨{b}⟩, ⟨{c}⟩, …
⟨{a, b}⟩, ⟨{a, c}⟩, ⟨{a, d}⟩, …
⟨{a}, {a}⟩, ⟨{a}, {a}, {a}⟩, ⟨{a}, {a}, {a}, {a}⟩, …, ⟨{a, b}, {a}⟩, …
⟨{a}, {b}, {a}⟩, …
• An efficient algorithm must find the frequent sequential patterns without checking all possibilities.
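To get a feeling for the size of the search space, one can count the candidate sequences directly. With $n$ items there are $2^n - 1$ non-empty itemsets, hence $(2^n - 1)^k$ sequences made of exactly $k$ itemsets. The small sketch below adds these numbers up; the function name `count_candidates` and the cap on the number of itemsets are assumptions made for the illustration.

```python
def count_candidates(num_items, max_itemsets):
    """Number of distinct sequences with 1..max_itemsets non-empty itemsets."""
    itemsets = 2 ** num_items - 1  # non-empty subsets of the item set I
    return sum(itemsets ** k for k in range(1, max_itemsets + 1))


# Five items (a to e) and at most three itemsets per sequence:
print(count_candidates(5, 3))  # 31 + 961 + 29791 = 30783
```

Even for this tiny alphabet a naïve algorithm would have to test tens of thousands of candidates, and the count grows exponentially with both the number of items and the length of the sequences.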
Some popular algorithms
• GSP: R. Agrawal, and R. Srikant, Mining sequential patterns, ICDE 1995, pp. 3–14,
1995.
• SPAM: J. Ayres, J. Flannick, J. Gehrke, and T. Yiu, Sequential pattern mining using a
bitmap representation, KDD 2002, pp. 429–435, 2002.
• SPADE: M. J. Zaki, SPADE: An efficient algorithm for mining frequent sequences,
Machine learning, vol. 42(1-2), pp. 31–60, 2001.
• PrefixSpan: J. Pei, et al. Mining sequential patterns by pattern-growth: The
prefixspan approach, IEEE Transactions on knowledge and data engineering, vol.
16(11), pp. 1424–1440, 2004.
• CM-SPAM and CM-SPADE: P. Fournier-Viger, A. Gomariz, M. Campos, and R.
Thomas, Fast Vertical Mining of Sequential Patterns Using Co-occurrence
Information, PAKDD 2014, pp. 40–52, 2014.

They all have the same input and output.

The difference is performance due to optimizations, search strategies and data structures!

Fast implementations available in the SPMF library

A performance comparison
Four benchmark datasets are used: Kosarak, BMS, Leviathan and Snake.

[Figure: one performance plot per dataset]

The “Apriori” property
Property (anti-monotonicity).
Let X and Y be two sequences. If X ⊑ Y, then the support of Y is less than or equal to the support of X.

Example
Sequence database
S1 = ⟨{a, b}, {c}, {a}⟩
S2 = ⟨{a, b}, {b}, {c}⟩
S3 = ⟨{b}, {c}, {d}⟩
S4 = ⟨{b}, {a, b}, {c}⟩

The support of ⟨{b}⟩ is 4
The support of ⟨{b}, {c}⟩ is 4
The support of ⟨{b}, {c}, {d}⟩ is 1

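The property follows from the definitions of ⊑ and of the support. A short derivation, written as a sketch in LaTeX and using only the symbols defined earlier ($D$, $sup$, $\sqsubseteq$):

```latex
\begin{align*}
X \sqsubseteq Y \ \text{and}\ Y \sqsubseteq S
  &\implies X \sqsubseteq S
  && \text{(transitivity of } \sqsubseteq \text{)} \\
\{S \in D \mid Y \sqsubseteq S\}
  &\subseteq \{S \in D \mid X \sqsubseteq S\}
  && \text{(every sequence containing } Y \text{ contains } X \text{)} \\
sup(Y) = |\{S \in D \mid Y \sqsubseteq S\}|
  &\le |\{S \in D \mid X \sqsubseteq S\}| = sup(X)
\end{align*}
```

The practical consequence is pruning: if a pattern is infrequent, none of the patterns that contain it can be frequent, so they never need to be examined.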
THE PREFIXSPAN ALGORITHM

PrefixSpan: J. Pei, et al. Mining sequential patterns by pattern-growth: The prefixspan approach, IEEE Transactions on knowledge and data engineering, vol. 16(11), pp. 1424–1440, 2004.

The PrefixSpan algorithm
• Proposed by Jian Pei et al. (2001).
• This algorithm is designed to only consider patterns that exist in the database.
• It uses the concept of database projection and a depth-first search.
• This is not the most efficient algorithm, but it is simple and easy to extend, so it is popular.
• It is explained here with an example.

Example
This is the input:
Sequence database
S1 = ⟨{a, b}, {c}, {a}⟩
S2 = ⟨{a, b}, {b}, {c}⟩
S3 = ⟨{b}, {c}, {d}⟩
S4 = ⟨{b}, {a, b}, {c}⟩

minsup = 3

Step 1
PrefixSpan first counts the support of each item by scanning the database:
Result:
⟨{a}⟩ support: 3
⟨{b}⟩ support: 4
⟨{c}⟩ support: 4
⟨{d}⟩ support: 1

Step 2
PrefixSpan eliminates infrequent items (here ⟨{d}⟩, whose support 1 is below minsup = 3):
Result:
⟨{a}⟩ support: 3
⟨{b}⟩ support: 4
⟨{c}⟩ support: 4
Those are the sequential patterns containing one item!

PrefixSpan then extends each item recursively. Let's start with ⟨{a}⟩.
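Steps 1 and 2 can be sketched as a single pass over the database. The code assumes sequences stored as lists of Python sets; `frequent_items` and `DB` are names chosen for this illustration only. Each item is counted at most once per sequence, because the support counts sequences, not occurrences.

```python
from collections import Counter


def frequent_items(database, minsup):
    """Steps 1 and 2: count the support of each item, keep the frequent ones."""
    counts = Counter()
    for seq in database:
        # The union of the itemsets lists each item once per sequence.
        counts.update(set().union(*seq))
    return {item: sup for item, sup in counts.items() if sup >= minsup}


DB = [
    [{"a", "b"}, {"c"}, {"a"}],
    [{"a", "b"}, {"b"}, {"c"}],
    [{"b"}, {"c"}, {"d"}],
    [{"b"}, {"a", "b"}, {"c"}],
]
```

Calling `frequent_items(DB, 3)` keeps a, b and c with supports 3, 4 and 4, and drops d.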
Step 3 – Find patterns starting with ⟨{a}⟩

PrefixSpan does a database projection with ⟨{a}⟩.

What is a database projection?
It means to keep only the sequences containing ⟨{a}⟩. Moreover, for these sequences, we delete the first occurrence of ⟨{a}⟩ and everything that appears before. The symbol _ marks the place of the deleted item in an itemset that is only partly kept.

Sequence database
S1 = ⟨{a, b}, {c}, {a}⟩
S2 = ⟨{a, b}, {b}, {c}⟩
S3 = ⟨{b}, {c}, {d}⟩
S4 = ⟨{b}, {a, b}, {c}⟩

Projected database of ⟨{a}⟩
S1 = ⟨{_, b}, {c}, {a}⟩
S2 = ⟨{_, b}, {b}, {c}⟩
S4 = ⟨{_, b}, {c}⟩

minsup = 3

Then, PrefixSpan counts the support of each sequential pattern starting with ⟨{a}⟩ that has one more item:
Result:
⟨{a}, {a}⟩ support: 1
⟨{a}, {b}⟩ support: 1
⟨{a}, {c}⟩ support: 3
⟨{a, b}⟩ support: 3

Then, infrequent patterns are removed:
Result:
⟨{a}, {c}⟩ support: 3
⟨{a, b}⟩ support: 3

PrefixSpan then extends each pattern recursively. Let's start with ⟨{a}, {c}⟩.
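The projection with a single item can be sketched as follows. The representation is an assumption made for this illustration: sequences are lists of Python sets, the projected sequences are returned as lists of sorted lists, and the string "_" plays the role of the placeholder used in the slides. The function name `project` is hypothetical. Items of the matched itemset that sort before the projected item are dropped, which assumes that items inside an itemset are processed in alphabetical order.

```python
def project(database, item):
    """Projected database of the one-item pattern <{item}>.

    Returns {sequence number: projected sequence}, numbering sequences from 1.
    """
    projected = {}
    for sid, seq in enumerate(database, start=1):
        for i, itemset in enumerate(seq):
            if item in itemset:  # first occurrence of the item
                rest = sorted(x for x in itemset if x > item)
                head = [["_"] + rest] if rest else []
                projected[sid] = head + [sorted(s) for s in seq[i + 1:]]
                break  # sequences without the item are simply left out
    return projected


DB = [
    [{"a", "b"}, {"c"}, {"a"}],
    [{"a", "b"}, {"b"}, {"c"}],
    [{"b"}, {"c"}, {"d"}],
    [{"b"}, {"a", "b"}, {"c"}],
]
```

`project(DB, "a")` returns the three projected sequences of Step 3: S1 = ⟨{_, b}, {c}, {a}⟩, S2 = ⟨{_, b}, {b}, {c}⟩ and S4 = ⟨{_, b}, {c}⟩. S3 is absent because it does not contain a.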
Step 4 – Find patterns starting with ⟨{a}, {c}⟩

PrefixSpan does a database projection with ⟨{a}, {c}⟩, starting from the projected database of ⟨{a}⟩:

Projected database of ⟨{a}⟩
S1 = ⟨{_, b}, {c}, {a}⟩
S2 = ⟨{_, b}, {b}, {c}⟩
S4 = ⟨{_, b}, {c}⟩

Projected database of ⟨{a}, {c}⟩
S1 = ⟨{a}⟩

In S2 and S4 nothing follows {c}, so their projections are empty.

minsup = 3

Then, PrefixSpan counts the support of each sequential pattern starting with ⟨{a}, {c}⟩ that has one more item:
Result:
⟨{a}, {c}, {a}⟩ support: 1

This pattern is infrequent!

Then PrefixSpan tries to find patterns starting with ⟨{a, b}⟩.
Step 5 – Find patterns starting with ⟨{a, b}⟩

PrefixSpan does a database projection with ⟨{a, b}⟩, starting from the projected database of ⟨{a}⟩:

Projected database of ⟨{a}⟩
S1 = ⟨{_, b}, {c}, {a}⟩
S2 = ⟨{_, b}, {b}, {c}⟩
S4 = ⟨{_, b}, {c}⟩

Projected database of ⟨{a, b}⟩
S1 = ⟨{c}, {a}⟩
S2 = ⟨{b}, {c}⟩
S4 = ⟨{c}⟩

minsup = 3

Then, PrefixSpan counts the support of each sequential pattern starting with ⟨{a, b}⟩ that has one more item:
Result:
⟨{a, b}, {c}⟩ support: 3
⟨{a, b}, {a}⟩ support: 1
⟨{a, b}, {b}⟩ support: 1

Then, PrefixSpan removes infrequent patterns:
Result:
⟨{a, b}, {c}⟩ support: 3

Then PrefixSpan tries to find patterns starting with ⟨{a, b}, {c}⟩.
Step 6 – Find patterns starting with ⟨{a, b}, {c}⟩

PrefixSpan does a database projection for ⟨{a, b}, {c}⟩, starting from the projected database of ⟨{a, b}⟩:

Projected database of ⟨{a, b}⟩
S1 = ⟨{c}, {a}⟩
S2 = ⟨{b}, {c}⟩
S4 = ⟨{c}⟩

Projected database of ⟨{a, b}, {c}⟩
S1 = ⟨{a}⟩

minsup = 3

Then, PrefixSpan counts the support of each sequential pattern starting with ⟨{a, b}, {c}⟩ that has one more item:
Result:
⟨{a, b}, {c}, {a}⟩ support: 1

This pattern is infrequent!

Then, PrefixSpan tries to find patterns starting with ⟨{b}⟩.
Step 7 – Find patterns starting with ⟨{b}⟩

PrefixSpan does a database projection for ⟨{b}⟩:

Sequence database
S1 = ⟨{a, b}, {c}, {a}⟩
S2 = ⟨{a, b}, {b}, {c}⟩
S3 = ⟨{b}, {c}, {d}⟩
S4 = ⟨{b}, {a, b}, {c}⟩

Projected database of ⟨{b}⟩
S1 = ⟨{c}, {a}⟩
S2 = ⟨{b}, {c}⟩
S3 = ⟨{c}, {d}⟩
S4 = ⟨{a, b}, {c}⟩

minsup = 3

Then, PrefixSpan counts the support of each sequential pattern starting with ⟨{b}⟩ that has one more item:
Result:
⟨{b}, {a}⟩ support: 2
⟨{b}, {b}⟩ support: 2
⟨{b}, {c}⟩ support: 4
⟨{b}, {d}⟩ support: 1

Then, PrefixSpan eliminates infrequent patterns:
Result:
⟨{b}, {c}⟩ support: 4

Then, PrefixSpan tries to find patterns starting with ⟨{b}, {c}⟩.
Step 8 – Find patterns starting with ⟨{b}, {c}⟩

PrefixSpan does a database projection for ⟨{b}, {c}⟩, starting from the projected database of ⟨{b}⟩:

Projected database of ⟨{b}⟩
S1 = ⟨{c}, {a}⟩
S2 = ⟨{b}, {c}⟩
S3 = ⟨{c}, {d}⟩
S4 = ⟨{a, b}, {c}⟩

Projected database of ⟨{b}, {c}⟩
S1 = ⟨{a}⟩
S3 = ⟨{d}⟩

minsup = 3

Then, PrefixSpan counts the support of each sequential pattern starting with ⟨{b}, {c}⟩ that has one more item:
Result:
⟨{b}, {c}, {a}⟩ support: 1
⟨{b}, {c}, {d}⟩ support: 1

All these patterns are infrequent!

The last frequent item, ⟨{c}⟩, has the projected database S1 = ⟨{a}⟩, S3 = ⟨{d}⟩, which contains no frequent extension either.

PrefixSpan has finished its work.
Final result:
Those are the frequent sequential patterns:
• ⟨{a}⟩ support: 3
• ⟨{b}⟩ support: 4
• ⟨{c}⟩ support: 4
• ⟨{a}, {c}⟩ support: 3
• ⟨{a, b}⟩ support: 3
• ⟨{a, b}, {c}⟩ support: 3
• ⟨{b}, {c}⟩ support: 4
Observation
PrefixSpan performs a depth-first search. The patterns are listed in the order in which they are explored; indentation shows which pattern extends which:

⟨⟩
    ⟨{a}⟩ (frequent)
        ⟨{a}, {a}⟩ (infrequent)
        ⟨{a}, {b}⟩ (infrequent)
        ⟨{a}, {c}⟩ (frequent)
            ⟨{a}, {c}, {a}⟩ (infrequent)
        ⟨{a, b}⟩ (frequent)
            ⟨{a, b}, {c}⟩ (frequent)
                ⟨{a, b}, {c}, {a}⟩ (infrequent)
            ⟨{a, b}, {a}⟩ (infrequent)
            ⟨{a, b}, {b}⟩ (infrequent)
    ⟨{b}⟩ (frequent)
        ⟨{b}, {a}⟩ (infrequent)
        ⟨{b}, {b}⟩ (infrequent)
        ⟨{b}, {c}⟩ (frequent)
            ⟨{b}, {c}, {a}⟩ (infrequent)
            ⟨{b}, {c}, {d}⟩ (infrequent)
        ⟨{b}, {d}⟩ (infrequent)
    ⟨{c}⟩ (frequent)
    ⟨{d}⟩ (infrequent)

Pseudocode of PrefixSpan (simple version)

PrefixSpan(a database D, a sequence S (initially empty ⟨⟩), minsup)

1. Scan D to find the support of each sequence starting with S that has one more item.
2. For each sequence R such that sup(R) ≥ minsup
3.     Output R
4.     Create the projected database D_R of R by doing a projection with D
5.     Call PrefixSpan(D_R, R, minsup)

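The pseudocode can be turned into a compact Python sketch. It is not the SPMF implementation and makes several assumptions of its own: sequences are lists of sets of comparable items, a pattern is a tuple of sorted item tuples, and a projected database is stored as a list of (sequence index, itemset index) pairs that record where the last itemset of the pattern is first matched. The names `prefixspan`, `grow`, `first_index` and `DB` are illustrative. The sketch distinguishes the two ways of adding one item: starting a new itemset, as in ⟨{a}, {c}⟩, or joining the last itemset, as in ⟨{a, b}⟩.

```python
from collections import Counter


def first_index(seq, start, needed):
    """Index of the first itemset at or after `start` that contains `needed`."""
    for k in range(start, len(seq)):
        if needed <= seq[k]:
            return k
    return None


def prefixspan(database, minsup):
    """Return {pattern: support}; a pattern is a tuple of sorted item tuples."""
    db = [[frozenset(itemset) for itemset in seq] for seq in database]
    found = {}

    def grow(pattern, proj):
        # proj lists (sequence index, index of the itemset where the last
        # itemset of `pattern` is first matched); -1 for the empty pattern.
        last = frozenset(pattern[-1]) if pattern else frozenset()
        s_counts, i_counts = Counter(), Counter()
        for sid, pos in proj:
            seq = db[sid]
            # Items that can start a new itemset after the current match.
            s_counts.update(set().union(*seq[pos + 1:]))
            if pattern:
                # Items that can join the last itemset of the pattern.
                i_items = set()
                for itemset in seq[pos:]:
                    if last <= itemset:
                        i_items |= {x for x in itemset if x > pattern[-1][-1]}
                i_counts.update(i_items)
        extensions = []
        for x, sup in s_counts.items():
            if sup >= minsup:
                extensions.append((pattern + ((x,),), sup, frozenset([x]), 1))
        for x, sup in i_counts.items():
            if sup >= minsup:
                merged = pattern[:-1] + (pattern[-1] + (x,),)
                extensions.append((merged, sup, last | {x}, 0))
        for new_pattern, sup, needed, offset in extensions:
            found[new_pattern] = sup  # line 3 of the pseudocode: output R
            new_proj = []             # line 4: projected database of R
            for sid, pos in proj:
                k = first_index(db[sid], pos + offset, needed)
                if k is not None:
                    new_proj.append((sid, k))
            grow(new_pattern, new_proj)  # line 5: recursive call

    grow((), [(sid, -1) for sid in range(len(db))])
    return found


DB = [
    [{"a", "b"}, {"c"}, {"a"}],
    [{"a", "b"}, {"b"}, {"c"}],
    [{"b"}, {"c"}, {"d"}],
    [{"b"}, {"a", "b"}, {"c"}],
]
```

Running `prefixspan(DB, 3)` on the example database yields the seven frequent patterns of the walkthrough, for instance `(("a", "b"), ("c",))` with support 3 for ⟨{a, b}, {c}⟩. Storing positions instead of copied sequences keeps the sketch short; it is one possible design, not the only one.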
Optimization 1
• Observation:
– Making a copy of the database for each projection can take a lot of time!
– A projected database can also take a lot of memory.
• Solution:
– Do pseudo-projections.
– This means that we don’t make a real copy. We use pointers on the original database instead.

Sequence database
S1 = ⟨{a, b}, {c}, {a}⟩
S2 = ⟨{a, b}, {b}, {c}⟩
S3 = ⟨{b}, {c}, {d}⟩
S4 = ⟨{b}, {a, b}, {c}⟩

Projected database of ⟨{a}⟩
S1 = ⟨{_, b}, {c}, {a}⟩
S2 = ⟨{_, b}, {b}, {c}⟩
S4 = ⟨{_, b}, {c}⟩

Pseudo-projected database of ⟨{a}⟩ (the original sequences, each with a pointer placed just after the first occurrence of a)
S1 = ⟨{a, b}, {c}, {a}⟩
S2 = ⟨{a, b}, {b}, {c}⟩
S4 = ⟨{b}, {a, b}, {c}⟩
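A pseudo-projection can be sketched as a list of pointers into the original database. In this illustration, which is an assumption and not the data structure used by SPMF, the database is a list of tuples of sorted item tuples and a pointer is a triple (sequence index, itemset index, item index). The helper `suffix` reads the projected sequence through a pointer when it is needed, so nothing is copied at projection time.

```python
DB = [
    (("a", "b"), ("c",), ("a",)),
    (("a", "b"), ("b",), ("c",)),
    (("b",), ("c",), ("d",)),
    (("b",), ("a", "b"), ("c",)),
]


def pseudo_project(database, item):
    """Pointers to the first occurrence of `item` in each sequence containing it."""
    pointers = []
    for sid, seq in enumerate(database):
        for i, itemset in enumerate(seq):
            if item in itemset:
                pointers.append((sid, i, itemset.index(item)))
                break
    return pointers


def suffix(database, pointer):
    """Materialise the projected sequence that a pointer stands for."""
    sid, i, j = pointer
    seq = database[sid]
    rest = seq[i][j + 1:]  # items left in the partly consumed itemset
    head = [("_",) + rest] if rest else []
    return head + list(seq[i + 1:])
```

`pseudo_project(DB, "a")` returns three small triples instead of three copied sequences, and `suffix(DB, (3, 1, 0))` recovers ⟨{_, b}, {c}⟩ for S4 on demand.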
Optimization 2
• Observation:
– After reading the database to count the support of each item, PrefixSpan can remove all infrequent items from the database.
– This will reduce the database size.
– This could also be done when creating projected databases.

Sequence database (before)
S1 = ⟨{a, b}, {c}, {a}⟩
S2 = ⟨{a, b}, {b}, {c}⟩
S3 = ⟨{b}, {c}, {d}⟩
S4 = ⟨{b}, {a, b}, {c}⟩

Sequence database (after removing the infrequent item d)
S1 = ⟨{a, b}, {c}, {a}⟩
S2 = ⟨{a, b}, {b}, {c}⟩
S3 = ⟨{b}, {c}⟩
S4 = ⟨{b}, {a, b}, {c}⟩

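Removing infrequent items is a simple filter applied after the first scan. The sketch assumes sequences stored as lists of Python sets; `remove_infrequent_items` and `DB` are names invented for the illustration. Itemsets that become empty are dropped, since an empty itemset cannot contribute to any pattern.

```python
from collections import Counter


def remove_infrequent_items(database, minsup):
    """Return a copy of the database without the items whose support < minsup."""
    counts = Counter()
    for seq in database:
        counts.update(set().union(*seq))  # one count per sequence
    frequent = {item for item, sup in counts.items() if sup >= minsup}
    cleaned = []
    for seq in database:
        itemsets = [set(itemset) & frequent for itemset in seq]
        cleaned.append([s for s in itemsets if s])  # drop emptied itemsets
    return cleaned


DB = [
    [{"a", "b"}, {"c"}, {"a"}],
    [{"a", "b"}, {"b"}, {"c"}],
    [{"b"}, {"c"}, {"d"}],
    [{"b"}, {"a", "b"}, {"c"}],
]
```

With minsup = 3, only S3 changes: it becomes ⟨{b}, {c}⟩ because d appears in a single sequence.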
Is PrefixSpan a good algorithm?

• Generally, very fast.
• For each frequent pattern, PrefixSpan scans the projected database once to count the support of its extensions. This takes linear time w.r.t. the database size.
• Creating a projected database is done in linear time.
– This can still consume a lot of time and memory.
– But projected databases are always smaller than the original database.
• Unlike some other algorithms (e.g. GSP), PrefixSpan only considers patterns that exist in the database.
• PrefixSpan can be easily extended to add constraints (e.g. maximum length, maximum gap).

What influences the performance of PrefixSpan?

• The minsup threshold
• The database:
– The number of sequences
– The length of the sequences
– Whether the sequences are similar
– The number of distinct items

Code, datasets and more…
• A fast Java implementation of PrefixSpan is available in the
SPMF data mining software
(http://www.philippe-fournier-viger.com/spmf/)
– It can be used as standalone software, or as a library.
– Several other sequential pattern mining algorithms are
also provided.
– Datasets are provided.
• A survey of sequential pattern mining:
– Fournier-Viger, P., Lin, J. C.-W., Kiran, R. U., Koh, Y. S., Thomas, R.
(2017). A Survey of Sequential Pattern Mining. Data Science and
Pattern Recognition (DSPR), vol. 1(1), pp. 54-77.

Compusoft, 3 (9), 1079-1082 PDF
4 pages
Mining Temporal Patterns For Interval-Based and Point-Based Events
No ratings yet
Mining Temporal Patterns For Interval-Based and Point-Based Events
6 pages
PIC16F877A μc1
No ratings yet
PIC16F877A μc1
26 pages
Icremental Mining of Sequential Pattern
No ratings yet
Icremental Mining of Sequential Pattern
25 pages
Efficient Mining of Top-K Sequential Rules: Philippe Fournier-Viger
No ratings yet
Efficient Mining of Top-K Sequential Rules: Philippe Fournier-Viger
21 pages
Sequential Pattern Mining
No ratings yet
Sequential Pattern Mining
3 pages
Improved Sequential Pattern Mining Using An Extended Bitmap Representation
No ratings yet
Improved Sequential Pattern Mining Using An Extended Bitmap Representation
11 pages
Kleene's Theorem
No ratings yet
Kleene's Theorem
6 pages
Lab3 3
No ratings yet
Lab3 3
3 pages
Modicon M221 - TM221C40R
No ratings yet
Modicon M221 - TM221C40R
17 pages
Data Mining - Mining Sequential Patterns
No ratings yet
Data Mining - Mining Sequential Patterns
10 pages
Outline: Problem Statement Definitions & Examples Strategies
No ratings yet
Outline: Problem Statement Definitions & Examples Strategies
7 pages
How To Install Software License Manager (SLM) License Server
No ratings yet
How To Install Software License Manager (SLM) License Server
6 pages
AI-Powered Crop Suggestion Yield Prediction Disease Detection and Soil Monitoring
No ratings yet
AI-Powered Crop Suggestion Yield Prediction Disease Detection and Soil Monitoring
5 pages
Agriculture Problem Statments DRI KVK Ambajogai, Beed
No ratings yet
Agriculture Problem Statments DRI KVK Ambajogai, Beed
43 pages
Computer Studies Notes Form 2
No ratings yet
Computer Studies Notes Form 2
5 pages
Technical Communication
No ratings yet
Technical Communication
52 pages
Memory Access Method
No ratings yet
Memory Access Method
14 pages
Ao Search
No ratings yet
Ao Search
15 pages
Python Lesson 5 - Selection
No ratings yet
Python Lesson 5 - Selection
19 pages
V09N1-PP03 Oct2015
No ratings yet
V09N1-PP03 Oct2015
9 pages
11 Database Security
No ratings yet
11 Database Security
44 pages
GRD - 8 - RECORDING - 2019 - ORACLE SENIOR SECONDARY SCHOOL (NPC) Nov2019
No ratings yet
GRD - 8 - RECORDING - 2019 - ORACLE SENIOR SECONDARY SCHOOL (NPC) Nov2019
46 pages
Thanh Machine Learning Daylight Analysis
No ratings yet
Thanh Machine Learning Daylight Analysis
19 pages
Data Sheet Acronis SCS Cyber Backup 12.5 Hardened Edition EN US 230627
No ratings yet
Data Sheet Acronis SCS Cyber Backup 12.5 Hardened Edition EN US 230627
2 pages
Linear Programming - 17 March 23
No ratings yet
Linear Programming - 17 March 23
8 pages
Module 2 GSM
No ratings yet
Module 2 GSM
47 pages
Circuits and Systems For Efficient Portable-to-Portable Wireless Charging
No ratings yet
Circuits and Systems For Efficient Portable-to-Portable Wireless Charging
125 pages
NLP Simple Explanation
No ratings yet
NLP Simple Explanation
9 pages
Training Programmes CAD-CAM
No ratings yet
Training Programmes CAD-CAM
1 page
Data Page
No ratings yet
Data Page
7 pages
Stages of NLP
No ratings yet
Stages of NLP
6 pages
Sap Commerce Notes
No ratings yet
Sap Commerce Notes
12 pages
Dependency Parsing
No ratings yet
Dependency Parsing
32 pages
Digital Marketing
No ratings yet
Digital Marketing
41 pages
Grandstream Catalogo 2024
No ratings yet
Grandstream Catalogo 2024
12 pages
Real-Time Crop Growth Tracking and Disease Detection Using Machine Learning
No ratings yet
Real-Time Crop Growth Tracking and Disease Detection Using Machine Learning
5 pages
Zkfi: Privacy-Preserving and Regulation Compliant Transactions Using Zero Knowledge Proofs
No ratings yet
Zkfi: Privacy-Preserving and Regulation Compliant Transactions Using Zero Knowledge Proofs
10 pages
Server, Storage - Centric
No ratings yet
Server, Storage - Centric
4 pages
Sequential Pattern Mining in Data Streams Using The Weighted Sliding Window
No ratings yet
Sequential Pattern Mining in Data Streams Using The Weighted Sliding Window
5 pages
E-Commerce Market Analysis From A Graph-Based Product Classifier
No ratings yet
E-Commerce Market Analysis From A Graph-Based Product Classifier
8 pages
5G in Military Usage
No ratings yet
5G in Military Usage
1 page
COMM 141 Student Video Introduction Assignment-S23
No ratings yet
COMM 141 Student Video Introduction Assignment-S23
2 pages
Jawab Pertanyaan Studi Kasus (Case Study Questions) Sesuai Pembagian Di Bawah Ini
No ratings yet
Jawab Pertanyaan Studi Kasus (Case Study Questions) Sesuai Pembagian Di Bawah Ini
2 pages