Report of 2nd Defence
University School of Information Technology GGS Indraprastha University Kashmere Gate, Delhi 6
(2008-2011)
Abstract
This research work proposes an improved Apriori algorithm that minimizes the number of candidate sets generated while producing association rules, by evaluating the quantitative information associated with each item that occurs in a transaction; this information is usually discarded, since traditional association rules focus only on qualitative correlations. The proposed approach reduces not only the number of item sets generated but also the overall execution time of the algorithm. Any valued attribute is treated as quantitative and used to derive quantitative association rules, which generally increases the rules' information content. Transaction reduction is achieved by discarding, in subsequent scans, transactions that do not contain any frequent item set, which in turn reduces overall execution time. Dynamic item set counting adds new candidate item sets only when all of their subsets are estimated to be frequent. The frequent item ranges form the basis for generating higher-order item ranges using the Apriori algorithm. During each iteration, the algorithm uses the frequent sets from the previous iteration to generate the candidate sets and checks whether their support is above the threshold. The set of candidates found is then pruned by a strategy that discards sets containing infrequent subsets. This work evaluates the scalability of the algorithm in terms of transaction time, the number of item sets used in the transactions, and memory utilization. Quantitative association rules can be used in several domains where the traditional approach is employed.
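A minimal Python sketch of the transaction-reduction step described above is given below (the function name and signature are hypothetical, not taken from this work). It discards transactions that contain no frequent k-itemset before the next scan; such transactions cannot contribute support to any (k+1)-itemset.

    from itertools import combinations

    def reduce_transactions(transactions, frequent_k, k):
        # Keep only transactions that still contain at least one frequent
        # k-itemset; the rest cannot support any (k+1)-itemset, so they
        # can safely be skipped in subsequent scans.
        kept = []
        for t in transactions:
            k_subsets = {frozenset(c) for c in combinations(sorted(t), k)}
            if k_subsets & frequent_k:
                kept.append(t)
        return kept

Applying this after each pass shrinks the database that later scans must traverse, which is the source of the execution-time reduction claimed above.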
Introduction
Association means the grouping of related items from a set. A simple example is analyzing a large database of supermarket transactions with the aim of finding association rules; this is called association rule mining, or market basket analysis. Association rule mining is used to find frequent patterns, associations, and correlations among the sets of items or objects in transactional and relational databases. Retailers and entrepreneurs can use these findings to plan advertisements and improve their business.

Market basket analysis discovers customers' purchasing habits. The analysis is performed on customer baskets to identify frequent combinations of products. Market Basket Analysis (MBA) is a technique that assists in understanding which items are likely to be purchased together according to the association rules, primarily with the aim of identifying cross-selling opportunities. A supermarket can use this technique to organize and place products that are frequently sold together in the same area. Direct marketers can use MBA to decide which new products to offer their customers. The application of market basket analysis is generally facilitated by the use of data mining tools. Using this analysis, marketers can identify products in demand and determine the "combined take rates" of products, defined as how often the items are bought together. In a database this can be answered with a query; however, with 100 products it would take thousands of queries to find the "most popular basket". The original work on association rules proposed the support-confidence measurement framework and reduced association rule mining to the discovery of frequent item sets. The two basic measures of an association rule are:

Support: a measure of what fraction of the population satisfies both the antecedent and the consequent of the rule.
Confidence: a measure of how often the consequent is true when the antecedent is true.

Five different algorithms have been used in the development of association rules: AIS, SETM, Apriori, AprioriTID, and AprioriHybrid.

Example transaction database:

Transaction Id   Items
1                A, B, C
2                A, C
3                A, D
4                B, E, F
Let both the minimum support and the minimum confidence be 50%. We then have the following association rules:

A -> C (support 50%, confidence 66.6%)
C -> A (support 50%, confidence 100%)

For A -> C, the 66.6% means that a customer who buys A has a 66.6% chance of also buying C. For C -> A, a customer who buys C is 100% certain to also buy A. The equations for support and confidence are:

Support (P -> Q) = Probability (P ∪ Q)
Confidence (P -> Q) = Probability (Q | P)

Considering the transaction database above, for the rule A -> C:

Support ({A, C}) = 2 / 4 × 100% = 50%
Confidence = Support ({A, C}) / Support ({A}) = 50% / 75% = 66.6%
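These figures can be verified directly. Below is a minimal Python sketch (the names are illustrative, not from this work) that computes support and confidence over the four-transaction example and reproduces the 50% and 66.6% values:

    # The four-transaction example database from the text.
    transactions = [
        {"A", "B", "C"},
        {"A", "C"},
        {"A", "D"},
        {"B", "E", "F"},
    ]

    def support(itemset, db):
        # Fraction of transactions containing every item in `itemset`.
        return sum(itemset <= t for t in db) / len(db)

    def confidence(antecedent, consequent, db):
        # Support of the combined itemset divided by support of the antecedent.
        return support(antecedent | consequent, db) / support(antecedent, db)

    print(support({"A", "C"}, transactions))       # 0.5   -> 50%
    print(confidence({"A"}, {"C"}, transactions))  # 0.666 -> 66.6%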
As a larger worked example, consider a transaction database D of ten transactions over the items I1 to I5, with a minimum support threshold of 3/10. First, obtain the frequent 1-itemsets L1 = { {I1}, {I2}, {I3}, {I4}, {I5} }. Next, generate the candidate itemsets C2 = { {I1,I2}, {I1,I3}, {I1,I4}, {I1,I5}, {I2,I3}, {I2,I4}, {I2,I5}, {I3,I4}, {I3,I5}, {I4,I5} } and scan D to count each candidate; their supports are 6/10, 6/10, 2/10, 3/10, 5/10, 3/10, 3/10, 1/10, 2/10 and 1/10, respectively. Removing the itemsets whose support is lower than 3/10 gives L2 = { {I1,I2}, {I1,I3}, {I1,I5}, {I2,I3}, {I2,I4}, {I2,I5} }. Then generate the candidate itemsets C3 = { {I1,I2,I3}, {I1,I2,I5}, {I1,I3,I5}, {I2,I3,I4}, {I2,I3,I5}, {I2,I4,I5} } and scan D again; the supports are 4/10, 3/10, 2/10, 1/10, 2/10 and 1/10, respectively, yielding the frequent itemsets L3 = { {I1,I2,I3}, {I1,I2,I5} }. Finally, C4 = { {I1,I2,I3,I5} } has support 2/10, which is lower than the minimum support threshold, so the process of finding frequent itemsets terminates.
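A minimal Python sketch of the candidate generation just described is given below (it uses a simplified all-pairs join rather than the prefix-based join; the function name is hypothetical). Note that if the subset-pruning strategy is applied at generation time, only {I1,I2,I3} and {I1,I2,I5} of the six candidates in C3 above survive, since each of the other four contains an infrequent 2-subset, so the scan of D can skip them:

    from itertools import combinations

    def apriori_gen(frequent_k, k):
        # Join step: union pairs of frequent k-itemsets into (k+1)-itemsets
        # (a simplified all-pairs join). Prune step: discard any candidate
        # that has a k-subset which is not frequent.
        candidates = set()
        for a in frequent_k:
            for b in frequent_k:
                union = a | b
                if len(union) == k + 1 and all(
                        frozenset(s) in frequent_k
                        for s in combinations(union, k)):
                    candidates.add(union)
        return candidates

    # L2 from the worked example above.
    L2 = {frozenset(p) for p in [("I1", "I2"), ("I1", "I3"), ("I1", "I5"),
                                 ("I2", "I3"), ("I2", "I4"), ("I2", "I5")]}
    print(sorted(map(sorted, apriori_gen(L2, 2))))
    # [['I1', 'I2', 'I3'], ['I1', 'I2', 'I5']]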
Literature Survey
This section presents a comprehensive survey, focused mainly on research methods for mining frequent itemsets and association rules with utility considerations. Most of the existing works paid attention to performance and memory aspects. The AIS (Agrawal, Imielinski, Swami) algorithm put forth by Agrawal was the forerunner of all the algorithms used to generate frequent itemsets and confident association rules; its description was given along with the introduction of the mining problem itself. The algorithm comprises two phases: the first phase generates the frequent itemsets, and the second generates the confident and frequent association rules from them. Exploiting the monotonicity property of the support of itemsets and the confidence of association rules led to an enhanced algorithm, which Agrawal later renamed Apriori. Although a number of algorithms were put forth following the introduction of the Apriori algorithm, a majority of them dealt with optimizing one or more steps of Apriori while retaining the same general structure. Alongside Apriori, Agrawal proposed the AprioriTid and AprioriHybrid algorithms as well. Apriori outperforms AIS on problems of various sizes: by a factor of two for high minimum support, and by more than an order of magnitude for low levels of support. SETM (SET-oriented Mining of association rules) was consistently outperformed by AIS. AprioriTid performed as well as Apriori on smaller problems, but became about twice as slow when applied to large problems.
The support counting procedure of the Apriori algorithm has attracted voluminous research, owing to the fact that the performance of the algorithm relies mostly on this aspect. Shortly after the Apriori algorithms mentioned above, Park et al. proposed an optimization called DHP (Direct Hashing and Pruning), intended to restrict the number of candidate itemsets. Brin et al. put forth the DIC (Dynamic Itemset Counting) algorithm, which partitions the database into intervals of a fixed size so as to reduce the number of traversals through the database. Another algorithm, CARMA (Continuous Association Rule Mining Algorithm), employs a similar technique but restricts the interval size to 1. A methodology entirely different from the aforesaid ones was proposed by Savasere: the vertical database layout comes into action while storing the database in main memory, and the support of an itemset is computed by intersecting the covers of two of its subsets. The Eclat algorithm put forth by Zaki is considered the archetype of the depth-first manner of generating frequent itemsets. It was followed by diverse depth-first algorithms, among which the FP-growth algorithm by Han is the most famous and widely used. The numerous algorithms available are categorized according to the parameters they focus on, performance and memory, and are discussed briefly, with comparisons and other related works, in the following sub-sections.