Apriori
Frequent Itemset Mining
Can be applied to any tabular dataset after binarization, e.g.:

  A    B    C
  0.5  0.6  10
  0.3  0.7  12
  0.9  0.4  9
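As an illustration, such a numeric table could be binarized in Python by thresholding each column; the thresholds below (0.5 for A and B, 10 for C) are assumptions of this sketch, not part of the material:

  # Minimal binarization sketch: numeric columns become boolean "items"
  # by thresholding. The thresholds are illustrative assumptions.
  rows = [
      {"A": 0.5, "B": 0.6, "C": 10},
      {"A": 0.3, "B": 0.7, "C": 12},
      {"A": 0.9, "B": 0.4, "C": 9},
  ]
  thresholds = {"A": 0.5, "B": 0.5, "C": 10}

  # Each row becomes the set of items whose value reaches its threshold.
  binarized = [
      {col for col, value in row.items() if value >= thresholds[col]}
      for row in rows
  ]
  print(binarized)  # [{'A', 'B', 'C'}, {'B', 'C'}, {'A'}] (set order may vary)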
Transactional Data
● Itemset mining is defined on data that consists of transactions, where each transaction consists of a set of items
● Running example:
{B}
{E}
{A, C}
{A, E}
{B, C}
{D, E}
{C, D, E}
{A, B, C}
{A, B, E}
{A, B, C, E}
Transactional Data
● Transactional data can be represented as a boolean matrix:
              A  B  C  D  E
{B}           0  1  0  0  0
{E}           0  0  0  0  1
{A, C}        1  0  1  0  0
{A, E}        1  0  0  0  1
{B, C}        0  1  1  0  0
{D, E}        0  0  0  1  1
{C, D, E}     0  0  1  1  1
{A, B, C}     1  1  1  0  0
{A, B, E}     1  1  0  0  1
{A, B, C, E}  1  1  1  0  1
Transactional Data
● More formally, we represent:
  ¨ All possible items using a set $\mathcal{I}$, e.g. $\mathcal{I} = \{A, B, C, D, E\}$
  ¨ All possible transaction identifiers using a set $\mathcal{T}$, e.g.: $\mathcal{T} = \{1, 2, \ldots, 10\}$
● A function that returns a boolean for every transaction and item: $D: \mathcal{T} \times \mathcal{I} \rightarrow \{0, 1\}$
  For example, $D(3, A) = 1$ and $D(3, B) = 0$
● A boolean matrix $D$ indexed with transaction and item ids
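For concreteness, a minimal Python sketch of this representation of the running example (the names transactions and D are choices made for this sketch):

  # The running example: transaction identifiers mapped to their item sets.
  transactions = {
      1: frozenset("B"),    2: frozenset("E"),
      3: frozenset("AC"),   4: frozenset("AE"),
      5: frozenset("BC"),   6: frozenset("DE"),
      7: frozenset("CDE"),  8: frozenset("ABC"),
      9: frozenset("ABE"), 10: frozenset("ABCE"),
  }

  def D(t, i):
      # Boolean-matrix view: 1 iff item i occurs in transaction t.
      return 1 if i in transactions[t] else 0

  print(D(3, "A"), D(3, "B"))  # 1 0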
Cover and Support
● An itemset $I \subseteq \mathcal{I}$ matches or covers a transaction $t$ iff every item of $I$ occurs in $t$; more formally: iff $\forall i \in I: D(t, i) = 1$
● The cover of an itemset in a database is the set of identifiers of transactions it covers in the database: $\mathit{cover}(I, D) = \{t \in \mathcal{T} \mid \forall i \in I: D(t, i) = 1\}$
● The support of an itemset in a database is the size of the cover: $\mathit{support}(I, D) = |\mathit{cover}(I, D)|$
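These definitions translate almost literally into Python; a small sketch on top of the transactions dictionary from above:

  def cover(itemset, transactions):
      # Identifiers of the transactions that contain every item of the itemset.
      return {t for t, items in transactions.items() if set(itemset) <= items}

  def support(itemset, transactions):
      # The support is the size of the cover.
      return len(cover(itemset, transactions))

  print(sorted(cover("AC", transactions)))  # [3, 8, 10]
  print(support("AC", transactions))        # 3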
Cover and Support
● Example for the running database:
   1  {B}
   2  {E}
   3  {A, C}
   4  {A, E}
   5  {B, C}
   6  {D, E}
   7  {C, D, E}
   8  {A, B, C}
   9  {A, B, E}
  10  {A, B, C, E}
  For instance, $\mathit{cover}(\{A, C\}, D) = \{3, 8, 10\}$ and $\mathit{support}(\{A, C\}, D) = 3$
Frequency & Frequent Itemsets
● The frequency (or relative support) is the normalized support: $\mathit{frequency}(I, D) = \mathit{support}(I, D) \,/\, |\mathcal{T}|$
● An itemset is frequent iff its support is at or above a given minimum support threshold $\theta$, i.e. iff $\mathit{support}(I, D) \geq \theta$
● The task of frequent itemset mining is the task of finding all itemsets that are frequent, i.e., to find the set $\{I \subseteq \mathcal{I} \mid \mathit{support}(I, D) \geq \theta\}$
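A brute-force sketch of the mining task, reusing the support function above; this enumerates every subset of $\mathcal{I}$ and is only feasible for tiny item sets, which is exactly what Apriori (below) avoids:

  from itertools import combinations

  def frequency(itemset, transactions):
      # Relative support: support normalized by the number of transactions.
      return support(itemset, transactions) / len(transactions)

  def frequent_itemsets_bruteforce(items, transactions, theta):
      # All itemsets with support >= theta, by checking every non-empty subset.
      return {
          frozenset(c)
          for size in range(1, len(items) + 1)
          for c in combinations(sorted(items), size)
          if support(c, transactions) >= theta
      }

  print(frequency("AC", transactions))  # 0.3
  print(sorted(map(sorted, frequent_itemsets_bruteforce("ABCDE", transactions, 4))))
  # [['A'], ['B'], ['C'], ['E']]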
Frequency & Frequent Itemsets
● Example for the running database:
   1  {B}
   2  {E}
   3  {A, C}
   4  {A, E}
   5  {B, C}
   6  {D, E}
   7  {C, D, E}
   8  {A, B, C}
   9  {A, B, E}
  10  {A, B, C, E}
  $\mathit{support}(\{A, C\}, D) = 3$, $\mathit{frequency}(\{A, C\}, D) = 0.3$
● → if $\theta = 3$, $\{A, C\}$ is a frequent itemset
● → if $\theta = 4$, $\{A, C\}$ is infrequent
Frequency & Frequent Itemsets
● Which of the following itemsets are frequent in the running database?
   1  {B}
   2  {E}
   3  {A, C}
   4  {A, E}
   5  {B, C}
   6  {D, E}
   7  {C, D, E}
   8  {A, B, C}
   9  {A, B, E}
  10  {A, B, C, E}
● {A, C}? (support 3)
● {C, D}? (support 1)
● {A, B, E}? (support 2)
Diagram of All Itemsets
[Diagram: the lattice of all itemsets over $\{A, B, C, D, E\}$, with edges denoting the is-a-direct-subset-of relation]
Complexity of Frequent Itemset Mining
● How many potential frequent itemsets are there?
● Is there a database and a minimum support threshold for which this number of frequent itemsets is obtained?
Early History of Frequent Itemset Mining
● 1993: Frequent itemset mining was defined as a task by Agrawal, Imieliński, and Swami
  ¨ Although it was called “large itemset mining” at the time
● 1994: The first good algorithm for frequent itemset mining was published independently by two teams:
  ¨ Agrawal and Srikant from IBM, calling the algorithm “Apriori” (21,608 citations)
  ¨ Mannila, Toivonen and Verkamo from the University of Helsinki, calling the algorithm “OCD” (1,019 citations)
● 1996: Agrawal, Srikant, Mannila, Toivonen and Verkamo published a joint paper summarizing their algorithm
Apriori
Two key ideas:
● Anti-monotonicity of support
● Level-wise search through the itemset lattice
Apriori: Anti-Monotonicity
● A function $f$ on sets is called anti-monotonic iff: $I \subseteq J \Rightarrow f(I) \geq f(J)$
● Support is anti-monotonic: $I \subseteq J \Rightarrow \mathit{support}(I, D) \geq \mathit{support}(J, D)$
  Proof: if $I \subseteq J$, then any transaction that contains all items of $J$ also contains all items of $I$
  Hence, every $t \in \mathit{cover}(J, D)$ is also in $\mathit{cover}(I, D)$
  Hence, $\mathit{cover}(J, D) \subseteq \mathit{cover}(I, D)$
  Hence, $|\mathit{cover}(J, D)| \leq |\mathit{cover}(I, D)|$
  And finally, $\mathit{support}(J, D) \leq \mathit{support}(I, D)$
Apriori: Anti-Monotonicity
● Any subset of a frequent itemset is also frequent
● Any superset of an infrequent itemset is also infrequent
Apriori: Level-Wise Search
[Diagram: the itemset lattice explored level by level, from Level 0 (the empty itemset) to Level 5 (the full itemset $\{A, B, C, D, E\}$)]
Apriori: Outline
Apriori(D, θ):
  i = 0
  do
    Generate candidates C_i at level i from itemsets in F_{i-1} (if i > 0)
    Determine support of candidates C_i in data D
    F_i = Frequent itemsets in C_i
    i = i + 1
  while F_{i-1} is not empty
● Anti-monotonicity is used to reduce the number of itemsets for which the support is calculated (“candidates”); a runnable sketch follows below
  ¨ Don’t count itemsets which have an infrequent subset
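A compact, runnable Python sketch of this outline, reusing the transactions dictionary from above (all names are choices of this sketch; candidate generation here is the simple extend-by-one-item variant, refined on the following slides):

  def apriori(transactions, items, theta):
      # Level-wise frequent itemset mining over a dict {id: set of items}.
      frequent = []
      level = [frozenset()]  # level 0: the empty itemset is always frequent
      while level:
          prev = set(level)
          # Candidates one level deeper: extend by one item, then keep only
          # those whose one-smaller subsets were all frequent (anti-monotonicity).
          candidates = {fs | {i} for fs in level for i in items if i not in fs}
          candidates = {c for c in candidates if all(c - {i} in prev for i in c)}
          # One pass through the data counts all candidates of this level.
          counts = {c: 0 for c in candidates}
          for t in transactions.values():
              for c in candidates:
                  if c <= t:
                      counts[c] += 1
          level = [c for c, n in counts.items() if n >= theta]
          frequent.extend(level)
      return frequent

  print([sorted(s) for s in apriori(transactions, "ABCDE", theta=4)])
  # the four frequent singletons [['A'], ['B'], ['C'], ['E']], in some order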
Apriori: Level-Wise Search
[Diagram: the level-wise search through the itemset lattice (Levels 0 to 5); itemsets with an infrequent subset are never counted]
Apriori: Counting Candidates
● All candidates at a certain level can be counted using a single pass through the data
● Outline: for each transaction, check for every candidate whether the transaction contains all of the candidate’s items; if so, increase that candidate’s support counter (see the sketch below)
● All frequent itemsets are found using at most $|\mathcal{I}|$ passes through the data (one pass per level)
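The single counting pass in isolation, as a sketch (candidates as frozensets, matching the code above):

  def count_supports(candidates, transactions):
      # Support counters for all candidates of one level, in a single pass.
      counts = {c: 0 for c in candidates}
      for t in transactions.values():      # each transaction is read once
          for c in candidates:
              if c <= t:                   # the candidate covers t
                  counts[c] += 1
      return counts

  print(count_supports({frozenset("AC"), frozenset("CD")}, transactions))
  # {frozenset({'A', 'C'}): 3, frozenset({'C', 'D'}): 1} (dict order may vary)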
Apriori: Generating Candidates
● Naïve process: extend each frequent itemset at the previous level with every possible item, then check all subsets of the result
● Inefficient: would need to generate all subsets, many of which may not be reasonable candidates
Apriori: Generating Candidates
● More efficient approach:
  ¨ Convert all frequent itemsets in a level i into strings by sorting the items in all sets (e.g., alphabetically)
    ● {A,B} → AB
    ● {A,C} → AC
  ¨ Generate initial candidates by combining strings that are identical except for the last symbol
    ● AB and AC → ABC
    ● ABC and ABD → ABCD
    ● ABC and ABE → ABCE
  ¨ Advantage: every candidate generated in this process already has at least two frequent subsets (a sketch of the merge follows below)
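A sketch of this merging step in Python, with sorted tuples playing the role of the strings (the final subset check is the optional extra pruning discussed on the next slides):

  def generate_candidates(frequent_level):
      # Merge frequent k-itemsets that agree on their first k-1 (sorted) items.
      prev = set(frequent_level)
      strings = sorted(tuple(sorted(fs)) for fs in frequent_level)
      candidates = set()
      for idx, a in enumerate(strings):
          for b in strings[idx + 1:]:
              if a[:-1] != b[:-1]:   # identical except for the last symbol?
                  break              # sorted order: later strings cannot match
              candidates.add(frozenset(a) | {b[-1]})
      # Optional extra pruning: drop candidates with an infrequent subset.
      return {c for c in candidates if all(c - {i} in prev for i in c)}

  level2 = [frozenset(s) for s in ("AB", "AC", "AE", "BC", "BE", "CE")]
  print(sorted(map(sorted, generate_candidates(level2))))
  # [['A', 'B', 'C'], ['A', 'B', 'E'], ['A', 'C', 'E'], ['B', 'C', 'E']]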
Apriori: Generating Candidates
[Diagram: the merging process illustrated on the itemset lattice (Levels 0 to 5)]
Apriori: Generating Candidates
● After merging itemsets, we can still check the other subsets to eliminate further candidates
[Diagram: the itemset lattice (Levels 0 to 5), marking the itemsets generated by the merging process and the only itemset that can additionally be pruned by this subset check]
Apriori: Generating Candidates
● Common implementation choices:
  ¨ Do not perform the additional anti-monotonicity pruning, as the number of additionally eliminated candidates is often small
  ¨ Store frequent itemsets and candidates in a trie (also called a prefix tree) to make merging more efficient
[Diagram: a trie over sorted itemsets, with root children A, B, C, D and leaves spelling the two-item sets AB, AC, AE, BC, BE, CE, DE]
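One possible realization of such a trie in Python uses nested dictionaries keyed by item; the node layout (a children map plus a support counter) is an assumption of this sketch:

  def build_trie(itemsets):
      # Prefix tree over sorted itemsets: each node maps items to children
      # and carries a support counter used later during counting.
      root = {"count": 0, "children": {}}
      for itemset in itemsets:
          node = root
          for item in sorted(itemset):  # sorting makes paths canonical
              node = node["children"].setdefault(item, {"count": 0, "children": {}})
      return root

  trie = build_trie([frozenset(s) for s in ("AB", "AC", "AE", "BC", "BE", "CE", "DE")])
  print(sorted(trie["children"]))                   # ['A', 'B', 'C', 'D']
  print(sorted(trie["children"]["A"]["children"]))  # ['B', 'C', 'E']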
Apriori: Generating Candidates
● Merging itemsets = copying leaves, followed by deletion
[Diagram: the trie before and after merging; leaves sharing a parent are combined by copying, then obsolete leaves are deleted]
Apriori: Counting Candidates
● The trie can also be used to make candidate counting more efficient
  ¨ Traverse the trie, check the presence of its items in a transaction
[Diagram: trie traversal for a single transaction; the counter of each leaf reached is increased by 1]
Apriori: Counting Candidates
● The trie can also be used to make candidate counting more efficient:

  DetermineSupport(node v, transaction t):
    if v is a leaf then
      increase support counter for node v
    else
      for all children i of node v do
        if item i in transaction t then
          DetermineSupport(node i, transaction t)
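The same traversal over the nested-dict trie sketched above, as runnable Python:

  def count_in_trie(node, transaction):
      # Add +1 to every leaf whose path (an itemset) is contained in the transaction.
      if not node["children"]:          # leaf: all items on its path matched
          node["count"] += 1
          return
      for item, child in node["children"].items():
          if item in transaction:       # descend only along items present in t
              count_in_trie(child, transaction)

  for t in transactions.values():       # one pass through the data
      count_in_trie(trie, t)
  print(trie["children"]["A"]["children"]["C"]["count"])  # 3 == support({A, C})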
First Programming Exercise
● Create two variations of the Apriori algorithm
● They should differ in their optimisations:
  ¨ E.g., with and without subset pruning
  ¨ E.g., with and without “intelligent” elimination of candidates
● Compare their performance on a number of datasets, for varying thresholds
● Do your optimisations work?
Frequent Itemset Mining Challenge
● A “FIMI” challenge was organized in 2003 and 2004, aiming to develop the most efficient itemset mining algorithm
  http://fimi.ua.ac.be
● This led to results such as these:
  [Plot: runtime comparison of implementations of Apriori submitted to the challenge]