Unit IV DWDM
Association rule mining is a technique in data mining that aims to discover interesting relationships or
patterns within large datasets. It is particularly useful for uncovering associations between different
variables or items in transactional databases, such as items frequently purchased together in a retail
setting or symptoms co-occurring in medical records.
Here are some key concepts and steps involved in association rule mining:
1. Transaction Database:
• Association rule mining typically starts with a transactional database, where each
transaction represents a set of items. For example, a transaction could be a customer's
shopping basket containing various products.
2. Support:
• The support of an itemset is the proportion (or count) of transactions in the database that contain the itemset. It measures how frequently the itemset appears in the data.
3. Confidence:
• The confidence of a rule "IF {antecedent} THEN {consequent}" is the proportion of transactions containing the antecedent that also contain the consequent. It measures how reliable the rule is.
4. Association Rules:
• Association rules are typically expressed in the form "IF {antecedent} THEN
{consequent}." For example, "IF {diapers} THEN {baby formula}." These rules are derived
based on the support and confidence thresholds set by the user.
5. Apriori Algorithm:
• The Apriori algorithm is a popular algorithm for association rule mining. It uses a
breadth-first search strategy to discover frequent itemsets and generate association
rules efficiently. The algorithm relies on the Apriori property, which states that if an
itemset is frequent, all of its subsets must also be frequent.
6. Pruning:
• To optimize the process of rule discovery, pruning techniques are often employed to
eliminate itemsets or rules that do not meet certain criteria, such as minimum support
or confidence thresholds.
7. Lift:
• Lift is another measure used in association rule mining. It compares the likelihood of the
consequent occurring when the antecedent is present to the likelihood of the
consequent occurring in general. A lift value greater than 1 indicates a positive
correlation between the antecedent and consequent.
Association rule mining is widely used in various fields, including retail, healthcare, and finance, to
uncover hidden patterns in large datasets. However, it's essential to interpret the results carefully, as
associations do not imply causation, and some discovered rules may be spurious or coincidental.
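To make the support, confidence, and lift measures concrete, here is a minimal Python sketch; the small transaction list and the diapers/baby-formula rule are illustrative assumptions, not data from the text:

# Minimal sketch: computing support, confidence, and lift for one rule.
# The transactions below are illustrative only.
transactions = [
    {"diapers", "baby formula", "milk"},
    {"diapers", "baby formula"},
    {"milk", "bread"},
    {"diapers", "milk"},
    {"baby formula", "bread"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """support(antecedent and consequent together) / support(antecedent)."""
    return support(set(antecedent) | set(consequent), transactions) / support(antecedent, transactions)

def lift(antecedent, consequent, transactions):
    """confidence(antecedent -> consequent) / support(consequent)."""
    return confidence(antecedent, consequent, transactions) / support(consequent, transactions)

rule = ({"diapers"}, {"baby formula"})
print("support   :", support(rule[0] | rule[1], transactions))   # 0.4
print("confidence:", confidence(*rule, transactions))            # 0.666...
print("lift      :", lift(*rule, transactions))                  # 1.11... (> 1, positive correlation)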
Frequent itemset generation is a crucial step in association rule mining, particularly in algorithms like
Apriori. The goal is to identify sets of items that occur together frequently in a dataset. Here's an
overview of the process:
1. Support Count:
• The support count of an itemset is the number of transactions in which the itemset
appears. It is the basis for determining the frequency of itemsets. For example, if you're
working with a retail dataset, the support count of an itemset {A, B} is the number of
transactions containing both items A and B.
2. Support Threshold:
• The support threshold is a user-defined parameter that sets the minimum support count
or percentage required for an itemset to be considered "frequent." Itemsets that meet
or exceed this threshold are considered candidates for further analysis.
3. Apriori Algorithm:
• The Apriori algorithm is a classic algorithm for generating frequent itemsets. It uses a
level-wise, breadth-first search strategy to discover frequent itemsets of increasing size.
The algorithm relies on the Apriori property, which states that if an itemset is frequent,
all of its subsets must also be frequent.
4. Algorithm Steps:
• Candidate Generation: Generate candidate k-itemsets by joining the frequent (k-1)-itemsets found in the previous pass.
• Support Counting: Scan the dataset to count the support of each candidate itemset.
• Pruning: Eliminate candidate itemsets that do not meet the minimum support threshold.
• Repeat: Repeat the process for larger itemset sizes until no new frequent itemsets can be found.
5. Example:
• T1: {A, B, C}
• T2: {A, B}
• T3: {A, C}
• T4: {B, C}
• T5: {B}
• With a minimum support threshold of 2, the initial frequent 1-itemsets are {A}, {B}, and
{C}. Then, the algorithm iteratively generates and prunes candidate itemsets of higher
sizes until no more frequent itemsets can be found.
6. Performance Optimization:
• To improve efficiency, the Apriori algorithm often uses techniques such as pruning
(eliminating candidates with infrequent subsets) and the hash tree structure for
counting support.
Frequent itemset generation is a fundamental step in association rule mining, providing the basis for
discovering meaningful patterns and relationships in large datasets.
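To make the level-wise process concrete, the following minimal Python sketch (a simplified take on Apriori, not an optimized implementation) generates the frequent itemsets for the five example transactions above with a minimum support count of 2:

from itertools import combinations

# The five example transactions above, with a minimum support count of 2.
transactions = [{"A", "B", "C"}, {"A", "B"}, {"A", "C"}, {"B", "C"}, {"B"}]
min_support_count = 2

def support_count(itemset, transactions):
    return sum(itemset <= t for t in transactions)

items = sorted(set().union(*transactions))
frequent = {}                                   # frozenset -> support count
level = [frozenset([i]) for i in items]         # candidate 1-itemsets
while level:
    # Count candidates and keep only those meeting the support threshold.
    counts = {c: support_count(c, transactions) for c in level}
    current = {c: n for c, n in counts.items() if n >= min_support_count}
    frequent.update(current)
    # Join step: build (k+1)-item candidates from the frequent k-itemsets.
    keys = list(current)
    candidates = {a | b for a, b in combinations(keys, 2) if len(a | b) == len(a) + 1}
    # Prune step (Apriori principle): every k-subset must itself be frequent.
    level = [c for c in candidates
             if all(frozenset(s) in current for s in combinations(c, len(c) - 1))]

for itemset in sorted(frequent, key=lambda s: (len(s), sorted(s))):
    print(sorted(itemset), frequent[itemset])

Running this prints the frequent 1-itemsets {A}, {B}, {C} and the frequent 2-itemsets {A, B}, {A, C}, {B, C}; the candidate {A, B, C} appears in only one transaction and is therefore discarded.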
The Apriori principle is based on the observation that if an itemset is frequent, then all of its subsets
must also be frequent. This principle is often expressed as:
If an itemset is frequent, then every non-empty subset of that itemset is also frequent.
Conversely:
If an itemset is infrequent, then every superset of that itemset is also infrequent.
This principle is crucial for efficiently identifying frequent itemsets in a large dataset without having to
examine all possible combinations. The Apriori algorithm uses the Apriori principle to generate
candidate itemsets and prune those that cannot be frequent based on the downward closure property.
Here's how the Apriori principle is applied in the context of the Apriori algorithm:
1. Find Frequent 1-Itemsets:
• Initially, the algorithm identifies frequent 1-itemsets (individual items) by counting their support in the dataset.
2. Generate Candidate Itemsets:
• For subsequent iterations, the algorithm generates candidate itemsets of size k based on frequent itemsets of size k-1.
3. Prune Candidate Itemsets:
• Before counting the support of candidate itemsets, the algorithm prunes candidates that have infrequent subsets. This pruning step is possible due to the Apriori principle, which ensures that if an itemset is infrequent, any of its supersets (larger itemsets) will also be infrequent.
4. Count Support and Repeat:
• After pruning, the algorithm counts the support of the remaining candidate itemsets in the dataset. Frequent itemsets are retained, and the process is repeated until no new frequent itemsets can be found.
By leveraging the Apriori principle and the downward closure property, the Apriori algorithm efficiently
explores the search space of potential frequent itemsets, avoiding the need to examine all possible
combinations and reducing the computational cost of association rule mining.
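As a rough illustration of the join-and-prune step just described, here is a minimal Python sketch; the itemset representation (sorted tuples) and the small L2 example at the bottom are assumptions made for illustration:

from itertools import combinations

def apriori_gen(frequent_k_minus_1):
    """Generate candidate k-itemsets from frequent (k-1)-itemsets.

    `frequent_k_minus_1` is a set of sorted tuples, each of length k-1.
    Join: two (k-1)-itemsets are merged when their first k-2 items agree.
    Prune: drop any candidate that has a (k-1)-subset which is not frequent.
    """
    candidates = set()
    for a, b in combinations(sorted(frequent_k_minus_1), 2):
        if a[:-1] == b[:-1]:                    # first k-2 items in common
            candidates.add(tuple(sorted(set(a) | set(b))))
    pruned = set()
    for c in candidates:
        subsets = combinations(c, len(c) - 1)   # all (k-1)-subsets of the candidate
        if all(s in frequent_k_minus_1 for s in subsets):
            pruned.add(c)
    return pruned

# Example: frequent 2-itemsets -> candidate 3-itemsets.
L2 = {("I1", "I2"), ("I1", "I3"), ("I2", "I3"), ("I2", "I4")}
print(apriori_gen(L2))   # {('I1', 'I2', 'I3')}; ('I2', 'I3', 'I4') is pruned
                         # because its subset ('I3', 'I4') is not in L2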
APRIORI ALGORITHM
The Apriori algorithm was proposed by R. Agrawal and R. Srikant in 1994 for finding frequent itemsets in a dataset
for Boolean association rules. The algorithm is named Apriori because it uses prior knowledge of
frequent itemset properties. It applies an iterative, level-wise search in which frequent k-itemsets
are used to find (k+1)-itemsets.
To improve the efficiency of level-wise generation of frequent itemsets, an important property is used
called Apriori property which helps by reducing the search space.
Apriori Property –
All non-empty subsets of a frequent itemset must also be frequent. The key concept of the Apriori algorithm is the
anti-monotonicity of the support measure: Apriori assumes that if an itemset is infrequent, then all of its supersets are infrequent as well.
Before working through the algorithm, recall the definitions of support, support count, and confidence given above.
Consider the following transaction dataset; we will find the frequent itemsets and generate association rules for
them.
Minimum support count is 2.
Minimum confidence is 60%.
Step-1: K=1
(I) Create a table containing the support count of each item present in the dataset; this is called C1 (the candidate set).
(II) Compare each candidate item's support count with the minimum support count (here min_support = 2); if the
support count of a candidate item is less than min_support, remove that item. This gives us the
itemset L1.
Step-2: K=2
• Generate candidate set C2 using L1 (this is called the join step). The condition for joining Lk-1 with Lk-1 is
that the itemsets must have (k-2) elements in common.
• Check whether all subsets of each candidate itemset are frequent; if not, remove that
itemset. (For example, the subsets of {I1, I2} are {I1} and {I2}, which are frequent. Check this for each itemset.)
• Now find the support count of these itemsets by searching the dataset.
• Compare each candidate's (C2) support count with the minimum support count (here min_support = 2); if the
support count of a candidate itemset is less than min_support, remove that itemset. This gives us the
itemset L2.
Step-3: K=3
• Generate candidate set C3 using L2 (join step). The condition for joining Lk-1 with Lk-1 is that the itemsets
must have (k-2) elements in common; so here, for L2, the first element should match.
The itemsets generated by joining L2 are {I1, I2, I3}, {I1, I2, I5}, {I1, I3, I5}, {I2, I3, I4}, {I2, I4, I5}, and {I2, I3, I5}.
• Check whether all subsets of these itemsets are frequent; if not, remove that
itemset. (Here, the 2-item subsets of {I1, I2, I3} are {I1, I2}, {I2, I3}, and {I1, I3}, which are frequent. For {I2,
I3, I4}, the subset {I3, I4} is not frequent, so remove it. Similarly, check every itemset.)
• After pruning, the remaining candidates are {I1, I2, I3} and {I1, I2, I5}; their support counts meet the minimum support, so L3 = {{I1, I2, I3}, {I1, I2, I5}}.
Step-4: K=4
• Generate candidate set C4 using L3 (join step). The condition for joining Lk-1 with Lk-1 (here k = 4) is
that the itemsets must have (k-2) elements in common; so here, for L3, the first two items
should match.
• Check whether all subsets of these itemsets are frequent. (Here, the itemset formed by joining
L3 is {I1, I2, I3, I5}; one of its subsets is {I1, I3, I5}, which is not frequent.) So there is no itemset
in C4, and the algorithm stops.
Thus, we have discovered all the frequent itemsets. Now the generation of strong association rules comes
into the picture. For that we need to calculate the confidence of each rule.
Confidence –
A confidence of 60% means that 60% of the customers who purchased milk and bread also bought
butter.
Confidence(A -> B) = Support_count(A ∪ B) / Support_count(A)
So here, taking one of the frequent itemsets as an example, we will show the rule generation.
Itemset {I1, I2, I3} // from L3
The rules can be:
[I1^I2]=>[I3] // confidence = sup(I1^I2^I3)/sup(I1^I2) = 2/4 = 50%
[I1^I3]=>[I2] // confidence = sup(I1^I2^I3)/sup(I1^I3) = 2/4 = 50%
[I2^I3]=>[I1] // confidence = sup(I1^I2^I3)/sup(I2^I3) = 2/4 = 50%
[I1]=>[I2^I3] // confidence = sup(I1^I2^I3)/sup(I1) = 2/6 ≈ 33.3%
[I2]=>[I1^I3] // confidence = sup(I1^I2^I3)/sup(I2) = 2/7 ≈ 28.6%
[I3]=>[I1^I2] // confidence = sup(I1^I2^I3)/sup(I3) = 2/6 ≈ 33.3%
So if the minimum confidence is 50%, the first three rules can be considered strong association rules.
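The original transaction table for this example is not reproduced above, so the following Python sketch assumes the classic nine-transaction dataset whose support counts match the values used in the calculations (e.g. sup(I1) = 6, sup(I2) = 7, sup(I1, I2) = 4); treat the transaction list itself as an assumption. It reproduces the six rules and confidence values listed above.

from itertools import combinations

# Assumed transaction table (not shown in the text); its support counts match
# the values used above, e.g. sup(I1)=6, sup(I2)=7, sup(I1,I2)=4, sup(I1,I2,I3)=2.
transactions = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]

def sup(itemset):
    """Support count: number of transactions containing all items of `itemset`."""
    return sum(set(itemset) <= t for t in transactions)

# Rule generation for the frequent itemset {I1, I2, I3}: every non-empty
# proper subset becomes an antecedent, the remaining items the consequent.
itemset = ("I1", "I2", "I3")
min_confidence = 0.5          # threshold used in the rule listing above
for r in range(1, len(itemset)):
    for antecedent in combinations(itemset, r):
        consequent = tuple(i for i in itemset if i not in antecedent)
        conf = sup(itemset) / sup(antecedent)
        label = "strong" if conf >= min_confidence else "weak"
        print(f"{set(antecedent)} => {set(consequent)}: "
              f"confidence = {sup(itemset)}/{sup(antecedent)} = {conf:.1%} ({label})")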
Limitations of Apriori Algorithm
The Apriori algorithm can be slow. Its main limitation is the time and memory required to hold a vast number of candidate
sets when there are many frequent itemsets, a low minimum support threshold, or long itemsets; that is, it is not an efficient
approach for very large datasets. For example, if there are 10^4 frequent 1-itemsets, the algorithm needs
to generate more than 10^7 candidate 2-itemsets, which must then be tested and accumulated.
Furthermore, to detect a frequent pattern of size 100, e.g. {v1, v2, ..., v100}, it has to generate on the order of 2^100
candidate itemsets, which makes candidate generation costly and time-consuming. The algorithm must check
many candidate itemsets, and it must repeatedly scan the database to count their support. As a result,
Apriori becomes very slow and inefficient when memory capacity is limited and the number of transactions is large.
RULE GENERATION
Rule generation in the context of association rule mining, specifically using algorithms like Apriori,
involves deriving meaningful relationships or patterns from the discovered frequent itemsets. Once
frequent itemsets are identified, association rules are generated to express the associations between
items. These rules are in the form "IF {antecedent} THEN {consequent}" and provide insights into the co-
occurrence of items in the dataset.
1. Identify Frequent Itemsets:
• Before generating rules, you need to identify frequent itemsets using an algorithm such
as Apriori. Frequent itemsets are sets of items that occur together frequently in the
dataset.
2. Rule Generation:
• For each frequent itemset, generate candidate rules by splitting the itemset into a non-empty antecedent and a non-empty consequent (every non-empty proper subset can serve as an antecedent, with the remaining items as the consequent).
3. Rule Evaluation:
• Evaluate the quality of each rule using metrics such as support, confidence, and lift.
These metrics help determine the significance and reliability of the discovered
associations.
• Support: The proportion of transactions in the dataset that contain both the
antecedent and the consequent.
• Confidence: The proportion of transactions containing the antecedent that also
contain the consequent.
• Lift: Measures the degree to which the antecedent and consequent are
dependent, considering their individual occurrences.
4. Pruning Rules:
• Discard rules that do not meet the user-defined thresholds, such as the minimum
confidence, so that only significant rules are retained.
5. Rule Presentation:
• Present the generated rules in a human-readable format, making it easy for users to
understand the relationships between different items. This might involve sorting the
rules based on confidence or support.
6. Iterative Refinement:
• Depending on the specific goals of the analysis, you may need to iteratively refine the
rule generation process by adjusting parameters, such as support and confidence
thresholds, or by considering additional domain-specific constraints.
7. Rule Application:
• Once you have a set of high-quality rules, you can apply them to new data to make
predictions or gain insights. For example, in a retail setting, if you discover a rule like "IF
{bread} THEN {butter}," it suggests that customers who buy bread are likely to buy
butter as well.
It's important to note that association rules do not imply causation, and the interpretation of rules
should be done cautiously. Additionally, the effectiveness of rules depends on the quality of the data
and the appropriateness of the algorithm and parameters chosen for rule generation.
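As a sketch of steps 2 through 5 above (generation, evaluation, pruning, and presentation), the following Python fragment derives rules from a small table of frequent-itemset supports; the bread/butter support values are illustrative assumptions:

from itertools import combinations

# Illustrative frequent-itemset supports (as fractions of all transactions).
supports = {
    frozenset({"bread"}): 0.40,
    frozenset({"butter"}): 0.30,
    frozenset({"bread", "butter"}): 0.25,
}
min_confidence = 0.6

rules = []
for itemset in (s for s in supports if len(s) > 1):
    for r in range(1, len(itemset)):
        for ante in map(frozenset, combinations(itemset, r)):
            cons = itemset - ante
            confidence = supports[itemset] / supports[ante]      # evaluation
            lift = confidence / supports[cons]
            if confidence >= min_confidence:                     # pruning
                rules.append((set(ante), set(cons), supports[itemset], confidence, lift))

# Presentation: sort the surviving rules by confidence, highest first.
for ante, cons, sup_, conf, lift in sorted(rules, key=lambda x: -x[3]):
    print(f"IF {ante} THEN {cons}  support={sup_:.2f} confidence={conf:.2f} lift={lift:.2f}")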
COMPACT REPRESENTATION OF FREQUENT ITEMSETS
Because the number of frequent itemsets can be very large, several compact representations are used to store them more efficiently:
1. Closed Itemsets:
• Closed itemsets are frequent itemsets that have no superset with the same support count. They provide a compact yet lossless representation, since the exact support of every frequent itemset can be recovered from the closed itemsets.
2. Maximal Itemsets:
• Maximal itemsets are frequent itemsets that are not subsets of any other frequent
itemset. Unlike closed itemsets, maximal itemsets do not preserve the support counts of
their subsets; they only capture the boundary of the frequent itemsets. Maximal itemsets
therefore provide an even more compact, but lossy, representation by excluding subsets
that do not add new structural information.
3. Association Rules:
• Instead of storing all frequent itemsets separately, one can store only the high-
confidence association rules. These rules capture the essential relationships between
items in a more human-readable and actionable format. The compactness comes from
representing associations rather than individual itemsets.
4. Tree-Based Structures:
• Some methods use tree-based structures, such as FP-growth (Frequent Pattern growth),
to compactly represent frequent itemsets. FP-growth builds a compressed data
structure called the FP-tree, which facilitates efficient mining of frequent itemsets
without the need to explicitly generate and store all possible itemsets.
5. Bitwise Representations:
• In databases where items have unique identifiers, bitwise representations can be used
to compactly represent itemsets. Each item corresponds to a bit, and itemsets are
represented as bit vectors, making it computationally efficient for certain operations.
6. Vertical Data Format:
• In some cases, representing the data in a vertical format, where each item has its own list of
transaction IDs, can lead to a more compact representation, especially when dealing with
sparse datasets.
The choice of compact representation depends on the specific requirements of the analysis, the
characteristics of the dataset, and the goals of the mining process. Each compact representation method
has its advantages and trade-offs in terms of storage efficiency, computational complexity, and ease of
interpretation.
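To illustrate the difference between closed and maximal itemsets described above, here is a minimal Python sketch over an assumed table of frequent itemsets and their support counts:

# Illustrative frequent itemsets with their support counts.
frequent = {
    frozenset({"A"}): 4, frozenset({"B"}): 5, frozenset({"C"}): 3,
    frozenset({"A", "B"}): 4, frozenset({"A", "C"}): 2, frozenset({"B", "C"}): 3,
    frozenset({"A", "B", "C"}): 2,
}

# Closed: no proper superset has the same support count.
closed = {s for s, c in frequent.items()
          if not any(s < t and frequent[t] == c for t in frequent)}

# Maximal: no proper superset is frequent at all.
maximal = {s for s in frequent
           if not any(s < t for t in frequent)}

print("closed :", [set(s) for s in closed])    # {B}, {A,B}, {B,C}, {A,B,C}
print("maximal:", [set(s) for s in maximal])   # {A,B,C} only

Every maximal itemset is also closed, but the closed representation additionally keeps itemsets whose support differs from all of their supersets, which is what makes it lossless.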
The FP-Growth (Frequent Pattern Growth) algorithm is an efficient algorithm for mining frequent
itemsets from transactional databases. It was introduced by Jiawei Han, Jian Pei, and Yiwen Yin in their
paper "Mining Frequent Patterns without Candidate Generation" in 2000. FP-Growth is particularly well-
suited for large datasets and is an alternative to the Apriori algorithm.
The key steps of the FP-Growth algorithm are:
1. Construct the FP-Tree:
• Scan the transactional database and construct a data structure called the FP-Tree
(Frequent Pattern Tree).
• The FP-Tree is built by inserting each transaction into the tree. Items within a
transaction are added as nodes, and the tree structure represents the relationships
between different items.
2. Build Conditional Pattern Bases:
• For each frequent item in the dataset, create a conditional pattern base by removing the
frequent item from the original transactions and keeping the remaining structure. This
step is performed recursively.
3. Build Conditional FP-Trees:
• For each conditional pattern base, build a conditional FP-Tree. This is essentially a
smaller FP-Tree constructed from the conditional pattern base.
4. Mine Recursively:
• Recursively mine frequent itemsets from each conditional FP-Tree. This process involves
repeating the steps of building conditional pattern bases and constructing conditional
FP-Trees until no more frequent itemsets can be found.
5. Combine Results:
• Combine the frequent itemsets obtained from the conditional FP-Trees with the
frequent itemsets from the original transactions to obtain the complete set of frequent
itemsets.
The main advantages of FP-Growth are:
• No Candidate Generation:
• Unlike the Apriori algorithm, FP-Growth does not generate candidate itemsets explicitly.
It constructs the FP-Tree directly from the dataset, avoiding the need to generate and
test multiple candidate itemsets.
• Efficiency:
• FP-Growth can be more efficient than traditional algorithms, especially when dealing
with large datasets, as it reduces the number of passes over the data and avoids the
generation of an explicit candidate set.
• Compact Representation:
• The FP-Tree is a compact data structure that captures the frequency information in a
condensed form, making it efficient for frequent pattern mining.
While FP-Growth is generally efficient, its performance depends on the characteristics of the dataset. It
is well-suited for datasets with a large number of transactions and a relatively small number of unique
items.
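Where a quick application is all that is needed, FP-Growth is also available in off-the-shelf libraries. The sketch below assumes the third-party mlxtend library (and pandas) is installed; its exact behavior may vary between versions, and the small transaction list is illustrative:

import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import fpgrowth

# Illustrative transactions; each inner list is one basket.
transactions = [
    ["bread", "milk", "butter"],
    ["bread", "butter"],
    ["milk", "butter"],
    ["bread", "milk"],
    ["bread", "milk", "butter"],
]

# One-hot encode the transactions into a boolean DataFrame.
te = TransactionEncoder()
onehot = te.fit(transactions).transform(transactions)
df = pd.DataFrame(onehot, columns=te.columns_)

# Mine frequent itemsets with a minimum support of 40% using FP-Growth.
frequent_itemsets = fpgrowth(df, min_support=0.4, use_colnames=True)
print(frequent_itemsets)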
FP-GROWTH ALGORITHM
The two primary drawbacks of the Apriori Algorithm are:
1. At each step, candidate sets have to be built.
2. To build the candidate sets, the algorithm has to repeatedly scan the database.
These two properties inevitably make the algorithm slower. To overcome these redundant steps, a new
association-rule mining algorithm was developed, named the Frequent Pattern Growth Algorithm. It
overcomes the disadvantages of the Apriori algorithm by storing all the transactions in a trie-like data
structure. Consider the following data:
The given data is a hypothetical dataset of transactions, with each letter representing an item. The
frequency of each individual item is computed.
Let the minimum support be 3. A Frequent Pattern set is built which will contain all the elements whose
frequency is greater than or equal to the minimum support. These elements are stored in descending
order of their respective frequencies. After insertion of the relevant items, the set L looks like this:
L = { K : 5, E : 4, M : 3, O : 3, Y : 3 }
Now, for each transaction, the respective Ordered-Item set is built. It is done by iterating the Frequent
Pattern set and checking if the current item is contained in the transaction in question. If the current
item is contained, the item is inserted in the Ordered-Item set for the current transaction. The following
table is built for all the transactions:
Now, all the Ordered-Item sets are inserted into a trie data structure (the FP-tree).
a) Inserting the first Ordered-Item set:
Here, all the items are simply linked one after the other in the order of occurrence in the set, and
the support count of each item is initialized to 1.
b) Inserting the set {K, E, O, Y}:
Till the insertion of the elements K and E, the support count is simply increased by 1. On inserting O we
can see that there is no direct link between E and O, therefore a new node for the item O is initialized
with the support count as 1 and item E is linked to this new node. On inserting Y, we first initialize a new
node for the item Y with support count as 1 and link the new node of O with the new node of Y.
c) Inserting the set {K, E, M}:
Here, the support count of each element (K, E, and M) along the existing path is simply increased by 1.
d) Inserting the set {K, M, Y}:
Similar to step b), first the support count of K is increased, then new nodes for M and Y are initialized
and linked accordingly.
e) Inserting the set {K, E, O}:
Here, simply the support counts of the respective elements are increased. Note that the support count of
the new node of item O is increased.
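The insertion procedure described in steps a) through e) can be sketched in a few lines of Python. The ordered-item sets below are the ones referenced in the walkthrough; the first set, which the walkthrough does not name explicitly, is assumed to be {K, E, M, O, Y}:

class FPNode:
    """A node in the FP-tree: an item, a support count, and child links."""
    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}          # item -> FPNode

def insert(root, ordered_items):
    """Insert one ordered-item set, reusing existing prefix paths."""
    node = root
    for item in ordered_items:
        if item not in node.children:                 # no direct link yet:
            node.children[item] = FPNode(item, node)  # create a new node
        node = node.children[item]
        node.count += 1                               # shared prefix: just count up
    return root

def dump(node, depth=0):
    """Print the tree, one 'item:count' per node, indented by depth."""
    for child in node.children.values():
        print("  " * depth + f"{child.item}:{child.count}")
        dump(child, depth + 1)

# Ordered-item sets from the walkthrough above (the first one is assumed).
ordered_sets = [
    ["K", "E", "M", "O", "Y"],   # a)
    ["K", "E", "O", "Y"],        # b)
    ["K", "E", "M"],             # c)
    ["K", "M", "Y"],             # d)
    ["K", "E", "O"],             # e)
]
root = FPNode(None)
for s in ordered_sets:
    insert(root, s)
dump(root)   # K:5 -> E:4 -> (M:2 -> O:1 -> Y:1, O:2 -> Y:1), and K:5 -> M:1 -> Y:1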
Now, for each item, the Conditional Pattern Base is computed, which consists of the path labels of all the paths that
lead to any node of the given item in the frequent-pattern tree. Note that the items in the table below
are arranged in ascending order of their frequencies.
Now, for each item, the Conditional Frequent Pattern Tree is built. It is done by taking the set of
elements that are common to all the paths in the Conditional Pattern Base of that item and calculating the
support count by summing the support counts of all the paths in the Conditional Pattern Base.
From the Conditional Frequent Pattern Tree, the Frequent Pattern rules are generated by pairing the
items of the Conditional Frequent Pattern Tree set with the corresponding item, as given in the
table below.
For each row, two types of association rules can be inferred; for example, from the first row
the rules K -> Y and Y -> K can be inferred. To determine the valid rule, the confidence of
both rules is calculated, and the one with confidence greater than or equal to the minimum
confidence value is retained.