
Term Paper CS705A
The Apriori Algorithm for Frequent Itemset Mining

Bachelor of Technology
Computer Science and Engineering

Submitted By

SOURAV BANERJEE (13000116045)

OCTOBER 2019

Techno India
EM-4/1, Sector-V, Salt Lake
Kolkata- 700091
West Bengal
India
TABLE OF CONTENTS

1. Abstract

2. Introduction

3. Frequent Itemset Generation Strategies

4. Reducing Number of Candidates

5. Illustrating Apriori Principle

6. The Idea of the Apriori Algorithm

7. Apriori Algorithm Example (s = 50%)

8. How to Count Supports of Candidates?

9. Run Time of Apriori

10. Conclusion

11. References

Abstract

The Association Analysis platform uses the Apriori algorithm to reduce computational
time when generating frequent itemsets. The Apriori algorithm leverages the fact that an
itemset's support is never larger than the support of any of its subsets. The platform generates
larger itemsets from combinations of smaller itemsets that meet the minimum support
level. In addition, the platform does not generate itemsets that exceed either the specified
maximum number of antecedents or the maximum rule size. These options are useful when
working with large data sets, because the total possible number of rules increases
exponentially with the number of items. For more information about the Apriori
algorithm, see Agrawal and Srikant (1994).

Introduction

Association mining searches for frequent itemsets in a dataset. Frequent itemset mining
uncovers interesting associations and correlations between itemsets in transactional and
relational databases. In short, it shows which items appear together in a transaction or
relation.

Need for Association Mining:

Frequent itemset mining enables the generation of association rules from a transactional
dataset. If two items X and Y are frequently purchased together, it makes sense to place
them together in stores, or to offer a discount on one item when the other is purchased;
this can noticeably increase sales. For example, it is likely that a customer who buys milk
and bread also buys butter. The corresponding association rule is {milk, bread} ⇒ {butter},
so a seller can suggest butter to a customer who buys milk and bread.
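
To make the rule quantitative, the following minimal Python sketch computes the support
and confidence of {milk, bread} ⇒ {butter} over a small, invented five-transaction list
(the transactions and variable names are illustrative, not from any real dataset):

    # Support and confidence of {milk, bread} => {butter}
    transactions = [
        {"milk", "bread", "butter"},
        {"milk", "bread"},
        {"milk", "butter"},
        {"bread", "butter"},
        {"milk", "bread", "butter"},
    ]

    antecedent = {"milk", "bread"}
    consequent = {"butter"}

    n_antecedent = sum(antecedent <= t for t in transactions)            # 3 transactions
    n_rule = sum((antecedent | consequent) <= t for t in transactions)   # 2 transactions

    support = n_rule / len(transactions)   # fraction of all transactions with all three items
    confidence = n_rule / n_antecedent     # of those buying milk and bread, fraction adding butter

    print(f"support = {support:.2f}, confidence = {confidence:.2f}")
    # support = 0.40, confidence = 0.67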

Body

Frequent Itemset Generation Strategies

• Reduce the number of candidates (M)

– Complete search: M = 2^d for d items (a brute-force enumeration is sketched after this list)

– Use pruning techniques to reduce M

• Reduce the number of transactions (N)

– Reduce the size of N as the size of the itemset increases

– Used by DHP and vertical-based mining algorithms

• Reduce the number of comparisons (NM)

– Use efficient data structures to store the candidates or transactions

– No need to match every candidate against every transaction
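
To see why the complete search is hopeless, this short sketch (the names are our own)
enumerates every non-empty itemset over d items, i.e. M = 2^d − 1 candidates; at d = 4
that is already 15, and the count doubles with every added item:

    from itertools import chain, combinations

    def all_itemsets(items):
        # Brute-force candidate space: every non-empty subset of the d items,
        # i.e. 2**d - 1 candidates -- the complete search that pruning avoids.
        return chain.from_iterable(
            combinations(items, k) for k in range(1, len(items) + 1)
        )

    print(len(list(all_itemsets(["A", "B", "C", "D"]))))  # 15 == 2**4 - 1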

Reducing Number of Candidates

• Apriori principle: if an itemset is frequent, then all of its subsets must also be
frequent. Equivalently, if an itemset is infrequent, every superset of it must be
infrequent, which is what allows candidates to be pruned.

• The Apriori principle holds due to the following property of the support measure:

∀X, Y : (X ⊆ Y) ⇒ s(X) ≥ s(Y)

– The support of an itemset never exceeds the support of its subsets

– This is known as the anti-monotone property of support

Illustrating Apriori Principle

[Figure not reproduced in this copy.]
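
Since the original figure is unavailable, here is a small Python sketch of the same idea:
a candidate k-itemset can survive only if every one of its (k − 1)-subsets is already
frequent (the L2 below is hypothetical, invented for illustration):

    from itertools import combinations

    def survives_pruning(candidate, frequent_prev):
        # Anti-monotonicity check: a k-itemset can be frequent only if
        # all of its (k-1)-subsets are frequent.
        k = len(candidate)
        return all(
            frozenset(sub) in frequent_prev
            for sub in combinations(candidate, k - 1)
        )

    frequent_2 = {frozenset({"A", "C"}), frozenset({"B", "C"})}   # hypothetical L2
    print(survives_pruning(frozenset({"A", "B", "C"}), frequent_2))
    # False: the subset {A, B} is not frequent, so {A, B, C} is pruned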
The Idea of the Apriori Algorithm

• start with all 1-itemsets

• go through the data, count their support, and find all "large" (frequent) 1-itemsets

• combine them to form "candidate" 2-itemsets

• go through the data, count their support, and find all "large" 2-itemsets

• combine them to form "candidate" 3-itemsets

• repeat level by level until no new frequent itemsets are found

The Apriori Algorithm

• Join Step: Ck is generated by joining Lk−1 with itself

• Prune Step: any (k−1)-itemset that is not frequent cannot be a subset of a frequent
k-itemset

• Pseudo-code:

Ck: candidate itemsets of size k
Lk: frequent itemsets of size k

L1 = {frequent items};
for (k = 1; Lk != ∅; k++) do begin
    Ck+1 = candidates generated from Lk;
    for each transaction t in the database do
        increment the count of all candidates in Ck+1 that are contained in t;
    Lk+1 = candidates in Ck+1 with min_support;
end
return ∪k Lk;
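
A minimal runnable rendering of this pseudo-code in Python is shown below. Function and
variable names are our own, absolute support counts are used, and the hash-tree counting
discussed in the next section is replaced by plain subset tests; it is a sketch of the
level-wise scheme, not a tuned implementation:

    from itertools import combinations

    def apriori(transactions, min_support):
        # transactions: list of sets of items; min_support: absolute count.
        # Returns a dict mapping each frequent itemset (frozenset) to its count.
        counts = {}
        for t in transactions:                      # first scan: 1-itemsets
            for item in t:
                fs = frozenset([item])
                counts[fs] = counts.get(fs, 0) + 1
        frequent = {s: c for s, c in counts.items() if c >= min_support}   # L1
        result = dict(frequent)

        k = 1
        while frequent:
            prev = set(frequent)
            # Join step: unite frequent k-itemsets into (k+1)-candidates;
            # prune step: keep only those whose k-subsets are all frequent.
            candidates = {
                a | b
                for a in prev for b in prev
                if len(a | b) == k + 1
                and all(frozenset(s) in prev for s in combinations(a | b, k))
            }
            counts = {c: 0 for c in candidates}
            for t in transactions:                  # one scan per level
                for c in candidates:
                    if c <= t:
                        counts[c] += 1
            frequent = {s: c for s, c in counts.items() if c >= min_support}
            result.update(frequent)
            k += 1
        return result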

Apriori Algorithm Example (s = 50%)
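
The original worked example is not reproduced here; the following invented four-transaction
run at s = 50% (an absolute minimum count of 2) shows the same mechanics, reusing the
apriori sketch from the previous section:

    transactions = [
        {"A", "B", "C"},   # T1
        {"A", "C"},        # T2
        {"A", "D"},        # T3
        {"B", "E", "F"},   # T4
    ]
    # s = 50% of 4 transactions -> minimum absolute support of 2
    for itemset, count in sorted(apriori(transactions, 2).items(),
                                 key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        print(sorted(itemset), count)
    # ['A'] 3, ['B'] 2, ['C'] 2   -> L1 ({D}, {E}, {F} fall below 50%)
    # ['A', 'C'] 2                -> L2 ({A, B} and {B, C} are counted but fail)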

How to Count Supports of Candidates?

• Why is counting the supports of candidates a problem?

– The total number of candidates can be very large

– One transaction may contain many candidates

• Method:

– Candidate itemsets are stored in a hash-tree

– A leaf node of the hash-tree contains a list of itemsets and their counts

– An interior node contains a hash table

– A subset function finds all the candidates contained in a transaction (a simplified
version is sketched below)
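
As a simplified stand-in for the hash-tree (which is not implemented here), the sketch
below realizes the subset-function idea with a plain hash map: enumerate each transaction's
k-subsets and look them up, so a transaction is never matched against candidates it cannot
contain:

    from itertools import combinations

    def count_supports(transactions, candidates, k):
        # candidates: a set of frozensets, all of size k.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for sub in combinations(sorted(t), k):   # all k-subsets of t
                fs = frozenset(sub)
                if fs in counts:                     # hash lookup, not a full scan
                    counts[fs] += 1
        return counts

This works well when transactions are short; the hash-tree serves the same purpose at
larger scale by bucketing the candidates, so that only a small fraction of the leaves has
to be visited for each transaction.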

Run Time of Apriori

• k passes over the data, where k is the size of the largest candidate itemset

• A memory-chunking algorithm needs 2 passes over the data on disk, but multiple passes in
memory

• Toivonen (1996) gives a sampling-based statistical technique that requires 1 + ε passes
(but more memory)

• Brin et al. (1997), Dynamic Itemset Counting, requires 1 + ε passes (less memory)

Conclusion

• The core of the Apriori algorithm:

– Use frequent (k − 1)-itemsets to generate candidate frequent k-itemsets

– Use database scans and pattern matching to collect counts for the candidate itemsets

• The bottleneck of Apriori: candidate generation

– Huge candidate sets:

• 10^4 frequent 1-itemsets generate on the order of 10^7 candidate 2-itemsets
(C(10^4, 2) ≈ 5 × 10^7)

• To discover a frequent pattern of size 100, e.g., {a1, a2, …, a100}, one needs to
generate 2^100 ≈ 10^30 candidates

– Multiple scans of the database:

• Apriori needs (n + 1) scans, where n is the length of the longest pattern

References

• R. Agrawal and R. Srikant, "Fast Algorithms for Mining Association Rules," Proceedings
of the 20th VLDB Conference, 1994.

• GeeksforGeeks [https://www.geeksforgeeks.org/frequent-item-set-in-data-set-association-rule-mining/]

• ScienceDirect [https://www.sciencedirect.com/topics/computer-science/frequent-itemsets]

• University of Regina, CS831 notes [http://www2.cs.uregina.ca/~dbd/cs831/notes/itemsets/itemset_apriori.html]

• JMP [https://www.jmp.com/support/help/14-2/frequent-item-set-generation.shtml]
