Term Paper CS705A
Bachelor of Technology
Computer Science and Engineering
Submitted By
OCTOBER 2019
Techno India
EM-4/1, Sector-V, Salt Lake
Kolkata- 700091
West Bengal
India
TABLE OF CONTENTS
1. Abstract
2. Introduction
3. Body
4. Conclusion
5. References
Abstract
The Association Analysis platform uses the Apriori algorithm to reduce computational
time when generating frequent item sets. The Apriori algorithm leverages the fact that an
item set’s support is never larger than the support of its subsets. The platform generates
larger item sets from combinations of smaller item sets that meet the minimum support
level. In addition, the platform does not generate item sets that exceed either the specified
maximum number of antecedents or the maximum rule size. These options are useful when
working with large data sets, because the total possible number of rules increases
exponentially with the number of items. For more information about the Apriori
algorithm, see Agrawal and Srikant (1994).
Introduction
Association mining searches for frequent itemsets in a data set. Frequent itemset mining
uncovers interesting associations and correlations between itemsets in transactional and
relational databases. In short, it shows which items tend to appear together in a
transaction or relation.
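To make this concrete, here is a minimal sketch in Python. The transactions, items, and the minimum-support threshold are illustrative assumptions, not taken from this paper; the code simply counts every 1- and 2-itemset by brute force and keeps those meeting the threshold.

from itertools import combinations
from collections import Counter

# Assumed toy transaction list (illustration only).
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
min_support = 3  # itemset must appear in at least 3 transactions

# Brute-force count of every 1- and 2-itemset (no Apriori pruning yet).
counts = Counter()
for t in transactions:
    for size in (1, 2):
        for itemset in combinations(sorted(t), size):
            counts[itemset] += 1

# Keep only the frequent itemsets and print them with their counts.
frequent = {itemset: c for itemset, c in counts.items() if c >= min_support}
for itemset, c in sorted(frequent.items(), key=lambda kv: (-kv[1], kv[0])):
    print(itemset, c)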
Body
• Apriori principle: if an itemset is frequent, then all of its subsets must also be
frequent
• The Apriori principle holds due to the anti-monotone property of the support measure:
for any itemsets X and Y, X ⊆ Y ⇒ s(Y) ≤ s(X), i.e. the support of an itemset never
exceeds the support of any of its subsets (a small numeric check is sketched below)
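A quick numeric check of the anti-monotone property, on the same assumed toy transactions as in the introduction's sketch: the support of a subset is always at least the support of its superset.

# Same assumed toy transactions as before (illustration only).
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

def support(itemset):
    # Fraction of transactions that contain every item of `itemset`.
    return sum(itemset <= t for t in transactions) / len(transactions)

x = {"milk"}              # a subset ...
y = {"milk", "diapers"}   # ... of this superset
print(support(x), support(y))    # 0.8 0.6
assert support(x) >= support(y)  # anti-monotone: adding items can only lower support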
Illustrating Apriori Principle
For example, if the itemset {a, b} turns out to be infrequent, then every superset of
{a, b}, such as {a, b, c} or {a, b, d, e}, must also be infrequent, so that entire branch
of the candidate lattice can be pruned without ever counting its support.
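Since the original figure is not reproduced here, the sketch below makes the same point in code: marking one assumed 2-itemset as infrequent lets us discard every 3-itemset candidate that contains it.

from itertools import combinations

items = ["a", "b", "c", "d", "e"]
infrequent = {frozenset({"a", "b"})}  # assume {a, b} was found to be infrequent

# All 3-itemset candidates over the five items: C(5, 3) = 10 of them.
candidates = [frozenset(c) for c in combinations(items, 3)]

# Apriori pruning: drop every candidate that contains a known-infrequent subset.
survivors = [c for c in candidates if not any(bad <= c for bad in infrequent)]

print(len(candidates), "candidates before pruning")  # 10
print(len(survivors), "candidates after pruning")    # 7 ({a,b,c}, {a,b,d}, {a,b,e} removed)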
The Idea of the Apriori Algorithm
• Pass 1: go through the data, count the support of each item, and find all “large”
(frequent) 1-itemsets
• Pass k: generate candidate k-itemsets by joining the large (k-1)-itemsets, then go
through the data, count their support, and find all “large” k-itemsets
• Prune step: any candidate containing a (k-1)-subset that is not frequent is discarded,
because a (k-1)-itemset that is not frequent cannot be a subset of a frequent k-itemset
• Repeat until no new large itemsets are found (a compact sketch follows this list)
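The sketch below puts these steps together as a level-wise loop. It is a plain, un-optimized reading of the textbook algorithm; the frozenset representation and the example threshold mentioned afterwards are assumptions for illustration.

from itertools import combinations

def apriori(transactions, min_support):
    # Level-wise frequent itemset mining (sketch of the textbook algorithm).
    transactions = [frozenset(t) for t in transactions]

    # Pass 1: count 1-itemsets and keep the frequent ("large") ones.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    all_frequent = dict(frequent)

    k = 2
    while frequent:
        # Join step: build candidate k-itemsets from frequent (k-1)-itemsets.
        prev = list(frequent)
        candidates = set()
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                union = prev[i] | prev[j]
                if len(union) == k:
                    candidates.add(union)

        # Prune step: a candidate with an infrequent (k-1)-subset cannot be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in frequent for s in combinations(c, k - 1))
        }

        # Pass k over the data: count support of the surviving candidates.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1

        frequent = {s: c for s, c in counts.items() if c >= min_support}
        all_frequent.update(frequent)
        k += 1

    return all_frequent

On the toy transactions used earlier with a minimum support of 3, this sketch reproduces the frequent 1- and 2-itemsets shown in the introduction and finds no frequent 3-itemset.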
How to Count Supports of Candidates?
• Method:
– Subset function: find all candidate itemsets contained in a given transaction by
matching its subsets against the stored candidates (a hash tree in the classic
implementation; see the sketch after this list)
• The basic algorithm makes k passes over the data, where k is the size of the largest
candidate itemset
• A memory-chunking algorithm ⇒ 2 passes over the data on disk but multiple passes in
memory
• Toivonen (1996) gives a sampling-based statistical technique which requires 1 + ε
passes (but more memory)
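Here is a minimal sketch of that subset-counting step, with a plain dictionary standing in for the hash tree of classic implementations; the candidate 2-itemsets and transactions below are assumed for illustration.

from itertools import combinations

def count_candidate_supports(transactions, candidates, k):
    # Count supports of candidate k-itemsets in one pass over the data:
    # enumerate each transaction's k-subsets and look them up among the candidates.
    counts = {c: 0 for c in candidates}
    for t in transactions:
        for subset in combinations(sorted(t), k):
            key = frozenset(subset)
            if key in counts:
                counts[key] += 1
    return counts

# Assumed toy input: candidate 2-itemsets and a handful of transactions.
candidates = {frozenset(p) for p in [("bread", "milk"), ("milk", "diapers")]}
transactions = [{"bread", "milk"}, {"bread", "milk", "diapers"}, {"milk", "diapers"}]
print(count_candidate_supports(transactions, candidates, 2))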
Conclusion
The Apriori algorithm mines frequent itemsets level by level: candidate k-itemsets are
generated from the frequent (k-1)-itemsets, and a database scan with pattern matching is
used to collect counts for the candidate itemsets. Its main costs are the potentially
huge candidate sets and the repeated scans of the database, which is why limiting
candidate generation and organizing support counting carefully matter for large data
sets.
References
GeeksforGeeks [https://www.geeksforgeeks.org/frequent-item-set-in-data-set-association-rule-mining/]
University of Regina, CS831 course notes [http://www2.cs.uregina.ca/~dbd/cs831/notes/itemsets/itemset_apriori.html]
JMP documentation [https://www.jmp.com/support/help/14-2/frequent-item-set-generation.shtml]