
DM Lect 5 - Sequence & Stream Mining

The document discusses various data mining techniques, focusing on frequent pattern mining, sequence mining, and stream data processing. It outlines methods such as Apriori, GSP, and PrefixSpan for mining sequential patterns and addresses challenges in processing data streams, including real-time processing and memory limits. Additionally, it introduces strategies like sampling, window processing, and the Lossy Counting Algorithm for managing frequent itemsets in dynamic data environments.

Frequent Patterns Mining
Sequence & Stream Mining

Dr. Wedad Hussein


[email protected]

Dr. Mahmoud Mounir


[email protected]
Data Mining Techniques

Data Mining
• Frequent Patterns Mining
  • Association Rules: Apriori, FP-Growth, ECLAT
  • Sequence Mining: GSP, SPADE, PrefixSpan
• Clustering
• Classification
Element vs Event

Sequence Database | Sequence | Element | Event (Item)
Web Data | Browsing activity of a particular Web visitor | A collection of files viewed by a Web visitor after a single mouse click | Home page, index page, contact info, etc.
E-Commerce Data | Whole purchase history of a customer | All products bought in a single transaction | Individual products

Example: in the sequence <{i1, i2, i4}, {i3}, {i2, i4}>, each set such as {i1, i2, i4} is an element, and each item such as i1 is an event.


Subsequence Examples

Sequence A | Sequence B | Is B a subsequence of A?
<{2,4} {3,5,6} {8}> | <{2} {3,6} {8}> | Yes
<{2,4} {3,5,6} {8}> | <{2} {8}> | Yes
<{1,2} {3,4}> | <{1} {2}> | No
<{2,4} {2,4} {2,5}> | <{2} {4}> | Yes
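The subsequence test behind the table can be sketched in Python (an illustrative helper, not part of the lecture); each sequence is a list of sets of events:

```python
def is_subsequence(seq_a, seq_b):
    """Return True if seq_b is a subsequence of seq_a.

    B is a subsequence of A if each element of B is a subset of some
    element of A, and the matched elements appear in order.
    """
    i = 0  # current position in seq_a
    for element_b in seq_b:
        # advance through A until an element contains element_b
        while i < len(seq_a) and not element_b.issubset(seq_a[i]):
            i += 1
        if i == len(seq_a):
            return False
        i += 1  # the next element of B must match strictly later
    return True
```

Running it on the table's rows reproduces the Yes/Yes/No/Yes column.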
Sequence Mining Techniques
• Apriori-based method: GSP (Generalized Sequential Patterns; Srikant & Agrawal)
• Pattern-growth methods: FreeSpan & PrefixSpan (Han et al.; Pei et al.)
• Vertical format-based mining: SPADE (M. J. Zaki)
PrefixSpan
Prefix and Suffix (Projection)
• <{a}>, <{a}{a}>, <{a}{ab}> and <{a}{abc}> are prefixes of sequence <{a}{abc}{ac}{d}{cf}>
• Given sequence <{a}{abc}{ac}{d}{cf}>:

Prefix | Suffix (Prefix-Based Projection)
<{a}> | <{abc}{ac}{d}{cf}>
<{a}{a}> | <{_bc}{ac}{d}{cf}>
<{a}{ab}> | <{_c}{ac}{d}{cf}>
Prefix Projection

• Step 1: find length-1 sequential patterns: <a>, <b>, <c>, <d>, <e>, <f>
• Step 2: divide the search space. The complete set of sequential patterns can be partitioned into 6 subsets:
  • the ones having prefix <a>;
  • the ones having prefix <b>;
  • …
  • the ones having prefix <f>.

SID | sequence
10 | <{a}{abc}{ac}{d}{cf}>
20 | <{ad}{c}{bc}{ae}>
30 | <{ef}{ab}{df}{c}{b}>
40 | <{e}{g}{af}{c}{b}{c}>
PrefixSpan - Example
• 1. Find length-1 sequential patterns:

id | Sequence
10 | <{a}{abc}{ac}{d}{cf}>
20 | <{ad}{c}{bc}{ae}>
30 | <{ef}{ab}{df}{c}{b}>
40 | <{e}{g}{af}{c}{b}{c}>

Pattern | <a> | <b> | <c> | <d> | <e> | <f> | <g>
Support | 4 | 4 | 4 | 3 | 3 | 3 | 1

Frequent events: <a>, <b>, <c>, <d>, <e>, <f>
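The length-1 support counts can be reproduced with a short sketch (Python; the database literal mirrors the slide's table):

```python
from collections import Counter

# the example sequence database from the slides
db = [
    [{'a'}, {'a', 'b', 'c'}, {'a', 'c'}, {'d'}, {'c', 'f'}],
    [{'a', 'd'}, {'c'}, {'b', 'c'}, {'a', 'e'}],
    [{'e', 'f'}, {'a', 'b'}, {'d', 'f'}, {'c'}, {'b'}],
    [{'e'}, {'g'}, {'a', 'f'}, {'c'}, {'b'}, {'c'}],
]

# support of a length-1 pattern = number of sequences containing the event
support = Counter()
for seq in db:
    for event in set().union(*seq):
        support[event] += 1
```

This yields the same counts as the slide: a, b, c appear in all 4 sequences; d, e, f in 3; g in only 1, so g is pruned.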
PrefixSpan - Example
• 2. Divide the search space by projecting the database on each frequent prefix:

Prefix <a>: <{abc}{ac}{d}{cf}>, <{_d}{c}{bc}{ae}>, <{_b}{df}{c}{b}>, <{_f}{c}{b}{c}>
Prefix <b>: <{_c}{ac}{d}{cf}>, <{_c}{ae}>, <{df}{c}{b}>, <{c}>
Prefix <c>: <{ac}{d}{cf}>, <{bc}{ae}>, <{b}>, <{b}{c}>
Prefix <d>: <{cf}>, <{c}{bc}{ae}>, <{_f}{c}{b}>
Prefix <e>: <{_f}{ab}{df}{c}{b}>, <{af}{c}{b}{c}>
Prefix <f>: <{ab}{df}{c}{b}>, <{c}{b}{c}>
PrefixSpan – Example

<d>-projected database:
<{cf}>
<{c}{bc}{ae}>
<{_f}{c}{b}>

Item | <a> | <b> | <c> | <d> | <e> | <f> | <{_f}>
Support | 1 | 2 | 3 | 0 | 1 | 1 | 1

Frequent sequences: <{d}{b}>, <{d}{c}>
PrefixSpan – Example
• Continue with the frequent sequences <{d}{b}> and <{d}{c}>:

<d>-projected database:
<{cf}>
<{c}{bc}{ae}>
<{_f}{c}{b}>

Use b as a prefix (<{d}{b}>-projected): <{_c}{ae}> — no frequent events.
Use c as a prefix (<{d}{c}>-projected): <{bc}{ae}>, <{b}>

Item | <b> | <a> | <e> | <c>
Support | 2 | 1 | 1 | 1

Frequent: <{d}{c}{b}>; its projected database <{_c}{ae}> contains no frequent events, so this branch stops.
PrefixSpan – Example
• Find combinations with c:

<c>-projected database:
<{ac}{d}{cf}>
<{bc}{ae}>
<{b}>
<{b}{c}>

Item | <a> | <b> | <c> | <d> | <e> | <f>
Support | 2 | 3 | 3 | 1 | 1 | 1

Frequent: <{c}{a}>, <{c}{b}>, <{c}{c}>

<{c}{a}>-projected database: <{_c}{d}{cf}>, <{_e}> — no frequent events.
<{c}{b}>-projected database: <{_c}{ae}>, <{c}> — no frequent events.
<{c}{c}>-projected database: <{_f}> — no frequent events.
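The whole example can be tied together with a compact PrefixSpan sketch. This is a deliberate simplification: patterns are grown one single-event element at a time, so itemset extensions such as <{a}{ab}> are not generated, but it reproduces the sequence-extension patterns traced above:

```python
from collections import Counter

def prefixspan(db, min_sup):
    """Simplified PrefixSpan sketch (sequence extensions only)."""
    results = []

    def project(db, event):
        # prefix-based projection: keep the suffix after the first
        # element that contains the event
        projected = []
        for seq in db:
            for i, element in enumerate(seq):
                if event in element:
                    if seq[i + 1:]:
                        projected.append(seq[i + 1:])
                    break
        return projected

    def mine(prefix, db):
        support = Counter()
        for seq in db:
            for event in set().union(*seq):
                support[event] += 1
        for event in sorted(support):
            if support[event] >= min_sup:
                pattern = prefix + [event]
                results.append((pattern, support[event]))
                mine(pattern, project(db, event))  # recurse on projection

    mine([], db)
    return results

# the slide's sequence database
db = [
    [{'a'}, {'a', 'b', 'c'}, {'a', 'c'}, {'d'}, {'c', 'f'}],
    [{'a', 'd'}, {'c'}, {'b', 'c'}, {'a', 'e'}],
    [{'e', 'f'}, {'a', 'b'}, {'d', 'f'}, {'c'}, {'b'}],
    [{'e'}, {'g'}, {'a', 'f'}, {'c'}, {'b'}, {'c'}],
]
patterns = prefixspan(db, min_sup=2)
```

With min_sup = 2 this finds, among others, <{d}{c}> with support 3 and <{d}{c}{b}> with support 2, matching the hand trace above.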
Data Streams
Stream Data

• Data arrives fast.
• If it is not processed immediately, it is lost.
• Examples:
  • Sensor data
  • Image data
  • Web traffic
  • Social media
Issues with Stream Data

• Processing is done in real time.
• Multiple passes are not possible.
• Memory limits.
• Accuracy vs. storage trade-off.
Processing Data Streams
A. Sampling

• Keep an unbiased sample of the data.
• Use reservoir sampling:
  • Keeps a sample of size s.
  • Every new element in the stream has a probability (s/n for the n-th element) of replacing a randomly chosen old element.
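Reservoir sampling as described above can be sketched as follows (the function name is illustrative):

```python
import random

def reservoir_sample(stream, s):
    """Reservoir sampling: keep a uniform random sample of size s
    from a stream of unknown length."""
    reservoir = []
    for n, item in enumerate(stream, start=1):
        if n <= s:
            reservoir.append(item)   # fill the reservoir first
        else:
            j = random.randrange(n)  # uniform in [0, n-1]
            if j < s:                # the n-th element replaces a
                reservoir[j] = item  # random old one with prob. s/n
    return reservoir
```

A single pass suffices, and at every point the reservoir is a uniform sample of everything seen so far.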
B. Window Processing

• A window is a subset of the arrived transactions.
• Types of windows:
  • Landmark Window
  • Sliding Window
  • Damped Window
Landmark Window

• From a fixed starting point i to the current time t.
• If i = 1, we are processing the whole stream.
• All time points are equally important.
Sliding Window

• Focus on recent data only.
• Given a window size w and current time t, we process W[t-w+1, t].
• The window moves with the data.
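A sliding window of size w can be sketched with a bounded deque (the class name is illustrative):

```python
from collections import deque

class SlidingWindow:
    """Keep only the w most recent stream elements."""

    def __init__(self, w):
        self.window = deque(maxlen=w)  # oldest element is evicted automatically

    def add(self, item):
        self.window.append(item)

    def contents(self):
        # the current window W[t-w+1, t], oldest first
        return list(self.window)
```

Each arrival pushes out the oldest element once the window is full, so memory stays bounded by w regardless of stream length.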
Damped Window

• Associates weights with the data in the stream, giving higher weights to recent data than to older data.
C. Histograms
• Approximate the distribution of values.
• Divide the range into buckets of equal width or equal depth.
• Can be used later to approximate query answers.
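The equal-width variant can be sketched as follows (function name illustrative; equal-depth buckets would instead place roughly the same number of values in each bucket):

```python
def equal_width_histogram(values, k):
    """Count values falling into k equal-width buckets over [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    counts = [0] * k
    for v in values:
        if width == 0:
            i = 0  # all values identical: everything in one bucket
        else:
            i = min(int((v - lo) / width), k - 1)  # clamp max into last bucket
        counts[i] += 1
    return counts
```

The bucket counts can then answer range queries approximately without storing the raw stream.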
D. Stream Queries

• One-time query: evaluated once at a certain point in time.
• Continuous query: evaluated continuously as data streams continue to arrive.
Frequent Pattern Mining in Data Streams
Challenges

• Multiple passes are not achievable.
• The frequency of items changes over time (frequent items can become infrequent and vice versa).
• The number of infrequent itemsets is exponential.
Approaches

• Keep track of a limited set of itemsets.
  • Disadvantage: limited usage and expressiveness.
• Derive an approximate set of answers.
  • Approximate the frequency within a certain error limit.
Lossy Counting Algorithm

• Inputs: min_support threshold (σ) and error bound (ε).
• Divide the stream into buckets of width w = ⌈1/ε⌉.
Finding Frequent Items

When a new item arrives in the current bucket b:
• If the item already exists in the list, increase its frequency f by 1.
• Otherwise, add the item to the list with f = 1 and Δ = b - 1.
At the end of each bucket, delete all items with f + Δ ≤ b.

Δ is the maximum possible error on the frequency count.
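The steps above can be sketched as a small implementation (names are illustrative; the reporting rule f ≥ (σ - ε)·N follows the standard Lossy Counting guarantee):

```python
from math import ceil

def lossy_counting(stream, epsilon):
    """Lossy Counting for single items.

    counts maps item -> (f, delta), where f is the counted frequency
    and delta is the maximum possible undercount for that item.
    """
    width = ceil(1 / epsilon)              # bucket width = ceil(1/eps)
    counts = {}
    n = 0                                  # items seen so far
    for item in stream:
        n += 1
        b = ceil(n / width)                # current bucket id
        if item in counts:
            f, delta = counts[item]
            counts[item] = (f + 1, delta)  # known item: bump frequency
        else:
            counts[item] = (1, b - 1)      # new item: f = 1, delta = b - 1
        if n % width == 0:                 # end of bucket: prune
            for key in [k for k, (f, d) in counts.items() if f + d <= b]:
                del counts[key]
    return counts, n

def frequent_items(counts, n, sigma, epsilon):
    """Report items whose counted frequency is at least (sigma - epsilon) * n."""
    return {k: f for k, (f, d) in counts.items() if f >= (sigma - epsilon) * n}
```

Because Δ ≤ εN, every true frequent item survives the pruning, which is why the reporting threshold is lowered from σN to (σ - ε)N.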
Support Estimation

• The support can be underestimated, but it can be proven that no frequent items are lost.
• Properties of the algorithm:
  • There are no false negatives.
  • False positives are quite "positive": their true frequency is at least (σ - ε)N.
  • The error in estimation is never very high (at most εN).
Finding Frequent Itemsets

• Read β buckets into memory.
• Generate the subsets of each transaction and count their frequency f within the β buckets.
• For each subset (itemset):
  • If the itemset already exists in the list, update its frequency and delete it if f + Δ ≤ b.
  • If it does not exist and f ≥ β, insert it into the list with Δ = b - β.
• Repeat until all subsets are processed.
Thank You
