0% found this document useful (0 votes)
79 views14 pages

Sequence Analysis: Athira P-AM - BU.P2MBA20029

This document discusses sequential pattern mining. It begins by introducing predictive analytics and association analysis. It then defines sequential databases and describes how sequential pattern mining can be used to identify frequent patterns and associations over time in sequential data. The document explains the Apriori algorithm as well as the GSP algorithm for sequential pattern mining and provides an example of using the GSPpy library in Python. Finally, some applications of sequential pattern mining are listed.

Uploaded by

athira p
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
79 views14 pages

Sequence Analysis: Athira P-AM - BU.P2MBA20029

This document discusses sequential pattern mining. It begins by introducing predictive analytics and association analysis. It then defines sequential databases and describes how sequential pattern mining can be used to identify frequent patterns and associations over time in sequential data. The document explains the Apriori algorithm as well as the GSP algorithm for sequential pattern mining and provides an example of using the GSPpy library in Python. Finally, some applications of sequential pattern mining are listed.

Uploaded by

athira p
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
You are on page 1/ 14

Sequence Analysis

Athira P- AM.BU.P2MBA20029

1
Agenda
• Predictive analytics
• Association
• Sequential data base
• Sequential pattern mining
• Algorithm
• GSP Py
• Applications
2
Predictive analytics
• Data mining
Data Mining is a process used by organizations to extract specific data from huge databases
to solve business problems. It primarily turns raw data into useful information. Builds a
model to identify a pattern among the attributes presented in the data set of customers.
• Categories
Prediction
Association
Clustering

3
Association
Finding frequent patterns, associations, correlations, or causal structures
among sets of items or objects in transaction databases, relational databases,
and other information repositories.

• Link analysis
• Sequence analysis

4
Sequence data base
• Customer shopping sequences
Purchases laptop first, then digital camera, and then smart phone in 6 months
• Medical treatments, natural disasters..
• Stocks and markets
• Biological sequences: DNA, Protein..
• Soft ware engineering: Program execution..

5
Sequential pattern mining

• In Sequential pattern mining relationships between objects are examined


in terms of their order of occurrence to identify associations over time.
• Transaction database, sequence data base and time-series data base
• Gapped and non-gapped sequential patterns
• An example of a sequential pattern is “Customers who buy a Canon
digital camera are likely to buy an HP colour printer within a month.”

6
• Sequential pattern mining: Given a set of sequences, find a complete set
of frequent sub sequences (satisfying the min_sup threshold)

A sequence data base • A sequence: (ef) (ab) (df) c b


SID Sequence
10 <a(abc)(ac)d(cf)> • An element(event) may contain set of items
20 <(ad)c(bc)(ae)> Items within an element is unordered, usually ordered
30 <(ef)(ab)(df)cb> alphabetically
40 <eg(af)cbc>
• <a(bc)dc> is a sub sequence of <a(abc)
(ac)d(cf)>
• Given a support threshold min_sup =2, <(ab)c> is a sequential pattern
7
Sequential pattern mining algorithms

• Algorithm requirements: Efficient, Scalable, Finding complete set, Incorporating


various kinds of user-specific constraints
• The Apriori property: if a subsequence s1 is infrequent, none of its super-sequences
can be frequent
• Representative algorithms
• GSP(Generalized sequential patterns)
• SPADE
• PrefixSpan
• Clospan 8
Apriori algorithm - Example

9
GSP Py
• Generalized Sequence Pattern (GSP) algorithm in Python
• Install it with pip:
pip install gsppy
• To use it in a project, import it and use the GSP class.
from gsppy.gsp import GSP
• It is assumed that your transactions are a sequence of sequences representing
items in baskets.
10
transactions = [ ['Bread', 'Milk'], ['Bread', 'Diaper', 'Beer', 'Eggs'],
['Milk', 'Diaper', 'Beer', 'Coke'], ['Bread', 'Milk', 'Diaper', 'Beer'],
['Bread', 'Milk', 'Diaper', 'Coke'] ]

• Init the class to prepare the transactions and to find patterns in baskets that occur over the
support threshold (count):

result = GSP(transactions).search(0.3)

11
Applications
• Sales transactions
• Credit card transactions
• Banking services
• Insurance service products
• Telecommunication services
• Medical records

12
References
• https://www.youtube.com/watch?v=GhEteXWNIXc
• http://hanj.cs.illinois.edu/cs412/bk3/7_sequential_pattern_mining.pdf
• SEQUENTIAL DATA MINING FOR BUSINESS STATISTIC ANALYSIS: IJCSET –
Volume 3, Issue 2 – February 2017.
• Sequential Pattern Mining – Approaches and Algorithms: ACM Journal Name, Vol. V,
No. N, M 20YY, Pages 1–46.
• https://www.javatpoint.com/data-mining

13
Thank you

14

You might also like