
PrefixSpan The Presentation (1) Removed

The PrefixSpan algorithm, proposed by Jian Pei et al. in 2001, is designed to identify frequent patterns in a sequence database using database projection and depth-first search. Although it is not the most efficient algorithm, its simplicity and extensibility contribute to its popularity. The document provides a detailed example of how the algorithm processes a sequence database to find frequent patterns based on a minimum support threshold.


The PrefixSpan algorithm

• Proposed by Jian Pei et al. (2001).
• The algorithm only considers patterns that actually occur in the database.
• It relies on database projection and a depth-first search.
• It is not the most efficient algorithm, but it is simple and easy to extend, so it is popular.
• I will explain it with an example.

Example
This is the input:
Sequence database
S1 = ⟨{a, b}, {c}, {a}⟩
S2 = ⟨{a, b}, {b}, {c}⟩
S3 = ⟨{b}, {c}, {d}⟩
S4 = ⟨{b}, {a, b}, {c}⟩

minsup = 3

Step 1
PrefixSpan first counts the support of each item by scanning the database:
Result:
⟨{a}⟩ support: 3
⟨{b}⟩ support: 4
⟨{c}⟩ support: 4
⟨{d}⟩ support: 1
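The counting scan can be sketched as follows. This is only an illustration: the list-of-sets encoding and the function name item_supports are assumptions, not something the slides define. The point to notice is that an item counts once per sequence, however often it occurs in that sequence.

```python
from collections import Counter

# Hypothetical encoding of the example database: one list of itemsets per sequence.
DB = [
    [{"a", "b"}, {"c"}, {"a"}],   # S1
    [{"a", "b"}, {"b"}, {"c"}],   # S2
    [{"b"}, {"c"}, {"d"}],        # S3
    [{"b"}, {"a", "b"}, {"c"}],   # S4
]


def item_supports(db):
    """Count, for every item, the number of sequences in which it appears."""
    counts = Counter()
    for sequence in db:
        # Union of the itemsets: each item is counted once per sequence.
        counts.update(set().union(*sequence))
    return counts


supports = item_supports(DB)
print(sorted(supports.items()))  # [('a', 3), ('b', 4), ('c', 4), ('d', 1)]
```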

Step 2
PrefixSpan eliminates infrequent items. With minsup = 3, only ⟨{d}⟩ (support 1) is removed:
Result:
⟨{a}⟩ support: 3
⟨{b}⟩ support: 4
⟨{c}⟩ support: 4

Those are the sequential patterns containing one item!
PrefixSpan then extends each item recursively…
Let's start with ⟨{a}⟩ →

Step 3 – Find patterns starting with ⟨{a}⟩

PrefixSpan does a database projection with ⟨{a}⟩:

Sequence database
S1 = ⟨{a, b}, {c}, {a}⟩
S2 = ⟨{a, b}, {b}, {c}⟩
S3 = ⟨{b}, {c}, {d}⟩
S4 = ⟨{b}, {a, b}, {c}⟩

Projected database of ⟨{a}⟩
S1 = ⟨{_b}, {c}, {a}⟩
S2 = ⟨{_b}, {b}, {c}⟩
S4 = ⟨{_b}, {c}⟩

minsup = 3

What is a database projection?
It means to keep only the sequences containing a. Moreover, for these sequences, we delete the first occurrence of a and everything that appears before it. The symbol _ marks what is left of the itemset in which a was found: {_b} means that b occurs in the same itemset as the deleted a. S3 does not contain a, so it is not part of the projected database.
Then, PrefixSpan counts the support of each sequential pattern starting with ⟨{a}⟩ that has one more item:
Result:
⟨{a}, {a}⟩ support: 1
⟨{a}, {b}⟩ support: 1
⟨{a}, {c}⟩ support: 3
⟨{a, b}⟩ support: 3

Then, infrequent patterns are removed (minsup = 3):
Result:
⟨{a}, {c}⟩ support: 3
⟨{a, b}⟩ support: 3
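The counts above can be reproduced with a short sketch. An item found in a "_" itemset extends the last itemset of the prefix (giving ⟨{a, b}⟩), while an item found in any other itemset starts a new itemset (giving ⟨{a}, {c}⟩ and so on). The data encoding and the name count_extensions are assumptions for illustration. The sketch is also simplified: it only looks at the "_" itemset for same-itemset extensions, which is enough for this example.

```python
from collections import Counter

# Projected database of <{a}> from the slides; a leading "_" marks the leftover itemset.
PROJECTED_A = {
    "S1": [["_", "b"], ["c"], ["a"]],
    "S2": [["_", "b"], ["b"], ["c"]],
    "S4": [["_", "b"], ["c"]],
}


def count_extensions(projected):
    """Return (same-itemset extension counts, new-itemset extension counts)."""
    same_itemset, new_itemset = Counter(), Counter()
    for sequence in projected.values():
        seen_same, seen_new = set(), set()
        for itemset in sequence:
            if itemset[0] == "_":
                seen_same.update(itemset[1:])
            else:
                seen_new.update(itemset)
        # Each item counts once per sequence.
        same_itemset.update(seen_same)
        new_itemset.update(seen_new)
    return same_itemset, new_itemset


same_itemset, new_itemset = count_extensions(PROJECTED_A)
print(dict(same_itemset))          # {'b': 3}  -> <{a, b}> has support 3
print(sorted(new_itemset.items())) # [('a', 1), ('b', 1), ('c', 3)]
```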

PrefixSpan then extends each pattern recursively…
Let's start with ⟨{a}, {c}⟩ →
Step 4 – Find patterns starting with ⟨{a}, {c}⟩

PrefixSpan does a database projection with ⟨{a}, {c}⟩:

Projected database of ⟨{a}⟩
S1 = ⟨{_b}, {c}, {a}⟩
S2 = ⟨{_b}, {b}, {c}⟩
S4 = ⟨{_b}, {c}⟩

Projected database of ⟨{a}, {c}⟩
S1 = ⟨{a}⟩

In S2 and S4 nothing follows {c}, so these sequences are dropped.

Then, PrefixSpan counts the support of each sequential pattern starting with ⟨{a}, {c}⟩ that has one more item:
Result:
⟨{a}, {c}, {a}⟩ support: 1

This pattern is infrequent (minsup = 3)!
Then PrefixSpan tries to find patterns starting with ⟨{a, b}⟩ →

Step 5 – Find patterns starting with ⟨{a, b}⟩

PrefixSpan does a database projection with ⟨{a, b}⟩:

Projected database of ⟨{a}⟩
S1 = ⟨{_b}, {c}, {a}⟩
S2 = ⟨{_b}, {b}, {c}⟩
S4 = ⟨{_b}, {c}⟩

Projected database of ⟨{a, b}⟩
S1 = ⟨{c}, {a}⟩
S2 = ⟨{b}, {c}⟩
S4 = ⟨{c}⟩

Then, PrefixSpan counts the support of each sequential pattern starting with ⟨{a, b}⟩ that has one more item:
Result:
⟨{a, b}, {c}⟩ support: 3
⟨{a, b}, {a}⟩ support: 1
⟨{a, b}, {b}⟩ support: 1

Then, PrefixSpan removes infrequent patterns (minsup = 3):
Result:
⟨{a, b}, {c}⟩ support: 3
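The projected database of ⟨{a}⟩ is projected further in two different ways: by adding b to the same itemset (giving ⟨{a, b}⟩) or by adding c as a new itemset (giving ⟨{a}, {c}⟩). The sketch below illustrates both cases. The function names project_same_itemset and project_new_itemset, the helper _suffix, and the "_" list encoding are assumptions, and the same-itemset case is simplified to look only at the "_" itemset, which is sufficient for this example.

```python
# Projected database of <{a}> from the slides; a leading "_" marks the leftover itemset.
PROJECTED_A = {
    "S1": [["_", "b"], ["c"], ["a"]],
    "S2": [["_", "b"], ["b"], ["c"]],
    "S4": [["_", "b"], ["c"]],
}


def _suffix(items, item, rest):
    """What remains after matching `item` inside `items`, followed by `rest`."""
    leftover = [x for x in items if x > item]
    return ([["_"] + leftover] if leftover else []) + rest


def project_same_itemset(projected, item):
    """Extend the last itemset of the prefix with `item` (e.g. <{a}> to <{a, b}>)."""
    out = {}
    for sid, seq in projected.items():
        head = seq[0]
        if head[0] == "_" and item in head[1:]:
            suffix = _suffix(head[1:], item, seq[1:])
            if suffix:
                out[sid] = suffix
    return out


def project_new_itemset(projected, item):
    """Append `item` as a new itemset of the prefix (e.g. <{a}> to <{a}, {c}>)."""
    out = {}
    for sid, seq in projected.items():
        for i, itemset in enumerate(seq):
            if itemset[0] != "_" and item in itemset:
                suffix = _suffix(itemset, item, seq[i + 1:])
                if suffix:   # empty projections are dropped
                    out[sid] = suffix
                break        # only the first occurrence is used
    return out


print(project_same_itemset(PROJECTED_A, "b"))
# {'S1': [['c'], ['a']], 'S2': [['b'], ['c']], 'S4': [['c']]}
print(project_new_itemset(PROJECTED_A, "c"))
# {'S1': [['a']]}
```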

Then PrefixSpan tries to find patterns starting with ⟨{a, b}, {c}⟩ →
Step 6 – Find patterns starting with ⟨{a, b}, {c}⟩

PrefixSpan does a database projection for ⟨{a, b}, {c}⟩:

Projected database of ⟨{a, b}⟩
S1 = ⟨{c}, {a}⟩
S2 = ⟨{b}, {c}⟩
S4 = ⟨{c}⟩

Projected database of ⟨{a, b}, {c}⟩
S1 = ⟨{a}⟩

Then, PrefixSpan counts the support of each sequential pattern starting with ⟨{a, b}, {c}⟩ that has one more item:
Result:
⟨{a, b}, {c}, {a}⟩ support: 1

This pattern is infrequent (minsup = 3)!
Then, PrefixSpan tries to find patterns starting with ⟨{b}⟩ →

Step 7 – Find patterns starting with ⟨{b}⟩

PrefixSpan does a database projection for ⟨{b}⟩:

Sequence database
S1 = ⟨{a, b}, {c}, {a}⟩
S2 = ⟨{a, b}, {b}, {c}⟩
S3 = ⟨{b}, {c}, {d}⟩
S4 = ⟨{b}, {a, b}, {c}⟩

Projected database of ⟨{b}⟩
S1 = ⟨{c}, {a}⟩
S2 = ⟨{b}, {c}⟩
S3 = ⟨{c}, {d}⟩
S4 = ⟨{a, b}, {c}⟩

Then, PrefixSpan counts the support of each sequential pattern starting with ⟨{b}⟩ that has one more item:
Result:
⟨{b}, {a}⟩ support: 2
⟨{b}, {b}⟩ support: 2
⟨{b}, {c}⟩ support: 4
⟨{b}, {d}⟩ support: 1

Then, PrefixSpan eliminates infrequent patterns (minsup = 3):
Result:
⟨{b}, {c}⟩ support: 4

Then, PrefixSpan tries to find patterns starting with ⟨{b}, {c}⟩ →

Step 8 – Find patterns starting with ⟨{b}, {c}⟩

PrefixSpan does a database projection for ⟨{b}, {c}⟩:

Projected database of ⟨{b}, {c}⟩
S1 = ⟨{a}⟩
S3 = ⟨{d}⟩

Then, PrefixSpan counts the support of each sequential pattern starting with ⟨{b}, {c}⟩ that has one more item:
Result:
⟨{b}, {c}, {a}⟩ support: 1
⟨{b}, {c}, {d}⟩ support: 1

All these patterns are infrequent!
The last remaining prefix, ⟨{c}⟩, has no frequent extension either: its projected database is S1 = ⟨{a}⟩, S3 = ⟨{d}⟩.
PrefixSpan has finished its work.

Final result:
Those are the frequent sequential patterns:
• ⟨{a}⟩ support: 3
• ⟨{b}⟩ support: 4
• ⟨{c}⟩ support: 4
• ⟨{a}, {c}⟩ support: 3
• ⟨{a, b}⟩ support: 3
• ⟨{a, b}, {c}⟩ support: 3
• ⟨{b}, {c}⟩ support: 4
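The whole walk-through can be condensed into one recursive, depth-first function. The following is a sketch under stated assumptions, not the authors' implementation: instead of copying projected sequences it stores, for each sequence, the index of the itemset where the prefix ends (a technique usually called pseudo-projection), and the names prefixspan and grow are made up for illustration. Run on the example database with minsup = 3, it should report the same seven patterns as the final result.

```python
from collections import defaultdict

# Hypothetical encoding of the example database: one list of itemsets per sequence.
DB = [
    [{"a", "b"}, {"c"}, {"a"}],   # S1
    [{"a", "b"}, {"b"}, {"c"}],   # S2
    [{"b"}, {"c"}, {"d"}],        # S3
    [{"b"}, {"a", "b"}, {"c"}],   # S4
]


def prefixspan(db, minsup):
    """Return {pattern: support}; a pattern is a tuple of sorted item tuples."""
    found = {}

    def grow(pattern, occurrences):
        # occurrences: (sequence index, index of the earliest itemset matching
        # the last itemset of `pattern`); the index is -1 for the empty pattern.
        new_itemset, same_itemset = defaultdict(dict), defaultdict(dict)
        last = set(pattern[-1]) if pattern else None
        for sid, pos in occurrences:
            seq = db[sid]
            # An item in any later itemset can start a new itemset.
            for j in range(pos + 1, len(seq)):
                for item in seq[j]:
                    new_itemset[item].setdefault(sid, j)
            # A larger item can join the last itemset of the pattern.
            if last is not None:
                for j in range(pos, len(seq)):
                    if last <= seq[j]:
                        for item in seq[j]:
                            if item > pattern[-1][-1]:
                                same_itemset[item].setdefault(sid, j)
        # Depth-first: record each frequent extension, then recurse into it.
        for item in sorted(same_itemset):
            if len(same_itemset[item]) >= minsup:
                new = pattern[:-1] + (pattern[-1] + (item,),)
                found[new] = len(same_itemset[item])
                grow(new, list(same_itemset[item].items()))
        for item in sorted(new_itemset):
            if len(new_itemset[item]) >= minsup:
                new = pattern + ((item,),)
                found[new] = len(new_itemset[item])
                grow(new, list(new_itemset[item].items()))

    grow((), [(sid, -1) for sid in range(len(db))])
    return found


for pattern, sup in prefixspan(DB, 3).items():
    print(pattern, "support:", sup)
```

Storing positions rather than copies keeps memory use low, at the cost of rescanning the remainder of each sequence at every level of the recursion.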
