Term Paper CS705A
Bachelor of Technology
Computer Science and Engineering
Submitted By
OCTOBER 2019
Techno India
EM-4/1, Sector-V, Salt Lake
Kolkata- 700091
West Bengal
India
TABLE OF CONTENTS
1. Abstract
2. Introduction
3. Body
4. Conclusion
5. References
Abstract
The Association Analysis platform uses the Apriori algorithm to reduce computational
time when generating frequent item sets. The Apriori algorithm leverages the fact that an
item set’s support is never larger than the support of its subsets. The platform generates
larger item sets from combinations of smaller item sets that meet the minimum support
level. In addition, the platform does not generate item sets that exceed either the specified
maximum number of antecedents or the maximum rule size. These options are useful when
working with large data sets, because the total possible number of rules increases
exponentially with the number of items. For more information about the Apriori
algorithm, see Agrawal and Srikant (1994).
Introduction
Association mining searches for frequent itemsets in a data set. Frequent itemset mining
uncovers interesting associations and correlations between itemsets in transactional and
relational databases. In short, it shows which items tend to appear together in a
transaction or relation.
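To make this concrete, here is a minimal sketch in Python. The transactions, items, and the minimum-support threshold are illustrative assumptions, not taken from this paper; the code simply counts every 1- and 2-itemset by brute force and keeps those meeting the threshold.

from itertools import combinations
from collections import Counter

# Assumed toy transaction list (illustration only).
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
min_support = 3  # itemset must appear in at least 3 transactions

# Brute-force count of every 1- and 2-itemset (no Apriori pruning yet).
counts = Counter()
for t in transactions:
    for size in (1, 2):
        for itemset in combinations(sorted(t), size):
            counts[itemset] += 1

# Keep only the frequent itemsets and print them with their counts.
frequent = {itemset: c for itemset, c in counts.items() if c >= min_support}
for itemset, c in sorted(frequent.items(), key=lambda kv: (-kv[1], kv[0])):
    print(itemset, c)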
Body
• Apriori principle: if an itemset is frequent, then all of its subsets must also be
frequent
• The Apriori principle holds due to the anti-monotone property of the support measure:
for any itemsets X and Y, X ⊆ Y ⇒ s(Y) ≤ s(X), i.e. the support of an itemset never
exceeds the support of any of its subsets (a small numeric check is sketched below)
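A quick numeric check of the anti-monotone property, on the same assumed toy transactions as in the introduction's sketch: the support of a subset is always at least the support of its superset.

# Same assumed toy transactions as before (illustration only).
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

def support(itemset):
    # Fraction of transactions that contain every item of `itemset`.
    return sum(itemset <= t for t in transactions) / len(transactions)

x = {"milk"}              # a subset ...
y = {"milk", "diapers"}   # ... of this superset
print(support(x), support(y))    # 0.8 0.6
assert support(x) >= support(y)  # anti-monotone: adding items can only lower support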
Illustrating Apriori Principle
For example, if the itemset {a, b} turns out to be infrequent, then every superset of
{a, b}, such as {a, b, c} or {a, b, d, e}, must also be infrequent, so that entire branch
of the candidate lattice can be pruned without ever counting its support.
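Since the original figure is not reproduced here, the sketch below makes the same point in code: marking one assumed 2-itemset as infrequent lets us discard every 3-itemset candidate that contains it.

from itertools import combinations

items = ["a", "b", "c", "d", "e"]
infrequent = {frozenset({"a", "b"})}  # assume {a, b} was found to be infrequent

# All 3-itemset candidates over the five items: C(5, 3) = 10 of them.
candidates = [frozenset(c) for c in combinations(items, 3)]

# Apriori pruning: drop every candidate that contains a known-infrequent subset.
survivors = [c for c in candidates if not any(bad <= c for bad in infrequent)]

print(len(candidates), "candidates before pruning")  # 10
print(len(survivors), "candidates after pruning")    # 7 ({a,b,c}, {a,b,d}, {a,b,e} removed)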
The Idea of the Apriori Algorithm
• Pass 1: go through the data, count the support of each item, and find all “large”
(frequent) 1-itemsets
• Pass k: generate candidate k-itemsets by joining the large (k-1)-itemsets, then go
through the data, count their support, and find all “large” k-itemsets
• Prune step: any candidate containing a (k-1)-subset that is not frequent is discarded,
because a (k-1)-itemset that is not frequent cannot be a subset of a frequent k-itemset
• Repeat until no new large itemsets are found (a compact sketch follows this list)
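The sketch below puts these steps together as a level-wise loop. It is a plain, un-optimized reading of the textbook algorithm; the frozenset representation and the example threshold mentioned afterwards are assumptions for illustration.

from itertools import combinations

def apriori(transactions, min_support):
    # Level-wise frequent itemset mining (sketch of the textbook algorithm).
    transactions = [frozenset(t) for t in transactions]

    # Pass 1: count 1-itemsets and keep the frequent ("large") ones.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    all_frequent = dict(frequent)

    k = 2
    while frequent:
        # Join step: build candidate k-itemsets from frequent (k-1)-itemsets.
        prev = list(frequent)
        candidates = set()
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                union = prev[i] | prev[j]
                if len(union) == k:
                    candidates.add(union)

        # Prune step: a candidate with an infrequent (k-1)-subset cannot be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in frequent for s in combinations(c, k - 1))
        }

        # Pass k over the data: count support of the surviving candidates.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1

        frequent = {s: c for s, c in counts.items() if c >= min_support}
        all_frequent.update(frequent)
        k += 1

    return all_frequent

On the toy transactions used earlier with a minimum support of 3, this sketch reproduces the frequent 1- and 2-itemsets shown in the introduction and finds no frequent 3-itemset.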
How to Count Supports of Candidates?
• Method:
– Subset function: find all candidate itemsets contained in a given transaction by
matching its subsets against the stored candidates (a hash tree in the classic
implementation; see the sketch after this list)
• The basic algorithm makes k passes over the data, where k is the size of the largest
candidate itemset
• A memory-chunking algorithm ⇒ 2 passes over the data on disk but multiple passes in
memory
• Toivonen (1996) gives a sampling-based statistical technique which requires 1 + ε
passes (but more memory)
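Here is a minimal sketch of that subset-counting step, with a plain dictionary standing in for the hash tree of classic implementations; the candidate 2-itemsets and transactions below are assumed for illustration.

from itertools import combinations

def count_candidate_supports(transactions, candidates, k):
    # Count supports of candidate k-itemsets in one pass over the data:
    # enumerate each transaction's k-subsets and look them up among the candidates.
    counts = {c: 0 for c in candidates}
    for t in transactions:
        for subset in combinations(sorted(t), k):
            key = frozenset(subset)
            if key in counts:
                counts[key] += 1
    return counts

# Assumed toy input: candidate 2-itemsets and a handful of transactions.
candidates = {frozenset(p) for p in [("bread", "milk"), ("milk", "diapers")]}
transactions = [{"bread", "milk"}, {"bread", "milk", "diapers"}, {"milk", "diapers"}]
print(count_candidate_supports(transactions, candidates, 2))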
Conclusion
The Apriori algorithm mines frequent itemsets level by level: candidate k-itemsets are
generated from the frequent (k-1)-itemsets, and a database scan with pattern matching is
used to collect counts for the candidate itemsets. Its main costs are the potentially
huge candidate sets and the repeated scans of the database, which is why limiting
candidate generation and organizing support counting carefully matter for large data
sets.
References
GeeksforGeeks [https://www.geeksforgeeks.org/frequent-item-set-in-data-set-association-rule-mining/]
University of Regina, CS831 course notes [http://www2.cs.uregina.ca/~dbd/cs831/notes/itemsets/itemset_apriori.html]
JMP documentation [https://www.jmp.com/support/help/14-2/frequent-item-set-generation.shtml]