
DATA MINING AND DATA WAREHOUSING

MODULE-3 ASSOCIATION ANALYSIS


3.1 Introduction
3.2 Association Analysis: Problem Definition
3.3 Frequent Itemset Generation
3.4 Rule Generation, Alternative Methods for Generating Frequent Itemsets
3.5 FP-Growth Algorithm
3.6 Evaluation of Association Patterns
3.7 Outcome
3.8 Important Questions

3.1 Introduction
Data Mining Association Analysis: Basic Concepts and Algorithms
Association Rule Mining
Given a set of transactions, find rules that will predict the occurrence of an item based on the occurrences of other items in the transaction.
Market-Basket transactions

{Diaper} -> {Beer}
{Milk, Bread} -> {Eggs, Coke}
{Beer, Bread} -> {Milk}

Definition: Itemset and Support Count

Let I = {i1, i2, ..., id} be the set of all items in a market-basket data set and T = {t1, t2, ..., tN} be the set of all transactions. Each transaction ti contains a subset of the items in I.
In association analysis, a collection of zero or more items is termed an itemset. If an itemset contains k items, it is called a k-itemset.
Example: {Milk, Bread, Diaper}

Rule Evaluation Metrics:


Support count (σ)
– The number of transactions that contain an itemset
– E.g. σ({Milk, Bread, Diaper}) = 2

Support (s)
– The fraction of transactions that contain an itemset
– E.g. s({Milk, Bread, Diaper}) = 2/5

Frequent Itemset
– An itemset whose support is greater than or equal to a minsup threshold.

3.2 Association Rule Mining Task


Given a set of transactions T, the goal of association rule mining is to find all rules having
– support ≥ minsup threshold

– confidence ≥ minconf threshold
Brute-force approach:

– List all possible association rules

– Compute the support and confidence for each rule

– Prune rules that fail the minsup and minconf thresholds

More specifically, the total number of possible rules that can be extracted from a data set containing d items is

R = 3^d − 2^(d+1) + 1

Even for a small data set with 6 items, this approach requires us to compute the support and confidence for 3^6 − 2^7 + 1 = 602 rules.
More than 80% of the rules are discarded after applying minsup = 20% and minconf = 50%, which makes most of the computation wasted.
To avoid performing needless computations, it would be useful to prune the rules early, without having to compute their support and confidence values.
If an itemset is infrequent, then all candidate rules generated from it can be pruned immediately without computing their confidence values; for a 3-itemset this removes all 2^3 − 2 = 6 candidate rules at once.
Therefore, a common strategy adopted by many association rule mining algorithms is
to decompose the problem into two major subtasks:

1. Frequent Itemset Generation


– Generate all itemsets whose support >= minsup

2. Rule Generation
– Generate high-confidence rules from each frequent itemset, where each rule is a binary partitioning of a frequent itemset.
Even with this decomposition, frequent itemset generation is still computationally expensive.

3.3 Frequent Itemset Generation:


A lattice structure can be used to enumerate the list of all possible itemsets.
Figure 6.1 shows an itemset lattice for I = {a, b, c, d, e}. In general, a data set that contains k items can potentially generate up to 2^k − 1 frequent itemsets, excluding the null set. Because k can be very large in many practical applications, the search space of itemsets that needs to be explored is exponentially large.
Brute-force approach:

– Each itemset in the lattice is a candidate frequent itemset


– Count the support of each candidate by scanning the database

Such an approach can be very expensive because it requires O(NMw) comparisons, where N is the number of transactions, M = 2^k − 1 is the number of candidate itemsets, and w is the maximum transaction width.
There are several ways to reduce the computational complexity of frequent itemset generation.
Reduce the number of candidates (M)

– Complete search: M = 2^d


– Use pruning techniques to reduce M

Reduce the number of transactions (N)


– Reduce size of N as the size of itemset increases

Reduce the number of comparisons (NM)

– Use efficient data structures to store the candidates or transactions


– No need to match every candidate against every transaction.

Apriori principle : Reducing Number of Candidates


Apriori principle: If an itemset is frequent, then all of its subsets must also be frequent.
To illustrate the idea behind the Apriori principle, consider the itemset lattice shown in Figure 6.3.
Suppose {c, d, e} is a frequent itemset. Clearly, any transaction that contains {c, d, e} must also contain its subsets {c, d}, {c, e}, {d, e}, {c}, {d}, and {e}. As a result, if {c, d, e} is frequent, then all subsets of {c, d, e} (i.e., the shaded itemsets in this figure) must also be frequent.
Conversely, if an itemset such as {a, b} is infrequent, then all of its supersets must be infrequent too. As
illustrated in Figure 6.4, the entire subgraph containing the supersets of {a, b} can be pruned immediately
once {a, b} is found to be infrequent. This strategy of trimming the exponential search space based on the
support measure is known as support-based pruning.
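As a quick illustration of support-based pruning (a minimal Python sketch; the five transactions below are hypothetical, not the ones from the figures), the support of an itemset is never larger than the support of any of its subsets:

def support(itemset, transactions):
    # Fraction of transactions that contain every item of the itemset
    return sum(itemset <= t for t in transactions) / len(transactions)

transactions = [{"c", "d", "e"}, {"c", "d"}, {"a", "b"}, {"c", "e"}, {"a", "c", "d", "e"}]
print(support({"c", "d", "e"}, transactions))  # 0.4
print(support({"c", "d"}, transactions))       # 0.6 -- a subset can only have equal or higher support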

Frequent Itemset Generation in the Apriori Algorithm: Illustration with example.
Figure 6.5 provides a high-level illustration of the frequent itemset generation part of the Apriori algorithm for the transactions shown in Table 6.1. We assume that the support threshold is 60%, which is equivalent to a minimum support count of 3.
Initially, every item is considered a candidate 1-itemset. After counting their supports, the candidate itemsets {Cola} and {Eggs} are discarded because they appear in fewer than three transactions.
In the next iteration, candidate 2-itemsets are generated using only the frequent 1-itemsets, because the Apriori principle ensures that all supersets of the infrequent 1-itemsets must be infrequent.
Because there are only four frequent 1-itemsets, the number of candidate 2-itemsets generated by the algorithm is 6. Two of these six candidates, {Beer, Bread} and {Beer, Milk}, are subsequently found to be infrequent after computing their support values. The remaining four candidates are frequent, and thus will be used to generate candidate 3-itemsets.
Without support-based pruning, there are 20 candidate 3-itemsets that can be formed using the six items given in this example. With the Apriori principle, we only need to keep candidate 3-itemsets whose subsets are all frequent. The only candidate that has this property is {Bread, Diapers, Milk}.

The effectiveness of the Apriori pruning strategy can be shown by counting the number of candidate itemsets generated.
A brute-force strategy of enumerating all itemsets (up to size 3) as candidates will produce 6 + 15 + 20 = 41 candidates.
With the Apriori principle, this number decreases to 6 + 6 + 1 = 13 candidates, which represents a 68% reduction in the number of candidate itemsets even in this simple example.

Apriori Algorithm:
Input: set of items I, set of transactions T, number of transactions N, minimum support minsup.
Output: frequent k-itemsets Fk, k = 1, 2, ...
Method:
k = 1

• Compute the support of each 1-itemset (item) by scanning the transactions.
• F1 = 1-itemsets whose support ≥ minsup.
• Repeat until no new frequent itemsets are identified:

1. Ck+1 = candidate (k+1)-itemsets generated from the frequent k-itemsets Fk.
2. Compute the support of each candidate in Ck+1 by scanning the transactions T.
3. Fk+1 = candidates in Ck+1 whose support ≥ minsup.
4. k = k + 1
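The same loop can be sketched in Python (a minimal, illustrative implementation assuming the Fk-1 x Fk-1 candidate generation described below; the function and variable names are ours, and the five-transaction data set is a hypothetical mini-basket in the spirit of Table 6.1):

from itertools import combinations
from collections import defaultdict

def apriori(transactions, minsup):
    # Returns {frequent itemset (frozenset): support count}
    N = len(transactions)
    counts = defaultdict(int)
    for t in transactions:                     # pass 1: count the 1-itemsets
        for item in t:
            counts[frozenset([item])] += 1
    frequent = {s: c for s, c in counts.items() if c / N >= minsup}
    result, k = dict(frequent), 1
    while frequent:
        # Candidate generation: merge frequent k-itemsets that share their first k-1 items
        prev = sorted(tuple(sorted(s)) for s in frequent)
        candidates = set()
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                if prev[i][:-1] == prev[j][:-1]:
                    cand = frozenset(prev[i]) | frozenset(prev[j])
                    # Candidate pruning: every k-subset of the candidate must be frequent
                    if all(frozenset(sub) in frequent for sub in combinations(sorted(cand), k)):
                        candidates.add(cand)
        # Support counting by scanning the transactions
        counts = defaultdict(int)
        for t in transactions:
            for cand in candidates:
                if cand <= t:
                    counts[cand] += 1
        frequent = {s: c for s, c in counts.items() if c / N >= minsup}
        result.update(frequent)
        k += 1
    return result

data = [frozenset(t) for t in (
    {"Bread", "Milk"}, {"Bread", "Diapers", "Beer", "Eggs"},
    {"Milk", "Diapers", "Beer", "Cola"}, {"Bread", "Milk", "Diapers", "Beer"},
    {"Bread", "Milk", "Diapers", "Cola"})]
print(apriori(data, minsup=0.6))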
Candidate Generation and Pruning
Candidate Generation: This operation generates new candidate k-itemsets based on the frequent (k − 1)-itemsets found in the previous iteration.
Candidate Pruning: This operation eliminates some of the candidate k-itemsets using the support- based
pruning strategy.
The following is a list of requirements for an effective candidate generation procedure:

• It should avoid generating too many unnecessary candidates.


• It must ensure that the candidate set is complete, i.e., no frequent itemsets are left out by the candidate generation procedure.
• It should not generate the same candidate itemset more than once. For example, {a,b,c,d} can be generated redundantly by merging {a,b,c} with {d}, {b,d} with {a,c}, or {a,b} with {c,d}.

Several candidate generation strategies are discussed below.

Brute-Force Method: The brute-force method considers every k-itemset asa potential candidate and then
applies the candidate pruning step to removeany unnecessary candidates (see Figure)

Fk-1 x F1 Method:
Combine frequent k-1 –itemsets with frequent 1- itemsets
Figure 6.7 illustrates how a frequent 2-itemset such as {Beer, Diapers} can be augmented with a frequent
item such as Bread to produce a candidate 3-itemset {Beer, Diapers, Bread}.
Satisfaction of our requirements:
1) Although many k-itemsets are never enumerated, the method can still generate unnecessary candidates, e.g. merging {Beer, Diapers} with {Milk} is unnecessary, since {Beer, Milk} is infrequent.
2) The method is complete: each frequent itemset consists of a frequent (k−1)-itemset and a frequent 1-itemset.
3) It can generate the same candidate more than once, e.g. {Bread, Diapers, Milk} can be generated by merging {Bread, Diapers} with {Milk}, {Bread, Milk} with {Diapers}, or {Diapers, Milk} with {Bread}.
This can be circumvented by keeping every frequent itemset in lexicographical order, as shown in the sketch below:
– e.g. {Bread, Diapers} can be merged with {Milk}, since 'Milk' comes after 'Bread' and 'Diapers' in lexicographical order.
– {Diapers, Milk} is not merged with {Bread}, and {Bread, Milk} is not merged with {Diapers}, as that would violate the lexicographical ordering.
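A minimal sketch of this lexicographic guard (itemsets are kept as sorted tuples; the helper name is ours):

def merge_fk1_f1(freq_k_minus_1, freq_1):
    # F(k-1) x F1 candidate generation with the lexicographic guard
    candidates = set()
    for itemset in freq_k_minus_1:       # e.g. ('Bread', 'Diapers')
        for (item,) in freq_1:           # e.g. ('Milk',)
            if item > itemset[-1]:       # only extend with items that come later in the order
                candidates.add(itemset + (item,))
    return candidates

print(merge_fk1_f1({("Bread", "Diapers")}, {("Bread",), ("Diapers",), ("Milk",)}))
# {('Bread', 'Diapers', 'Milk')}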

Fk-1 x Fk-1 Method:

• Combine a frequent k-1 –itemset with another frequent k-1 -itemset


• Items are stored in lexicographical order in the itemset
• When considering for merging, only pairs that share first k-2 items are considered

o e.g. {Bread, Diapers} is merged with {Bread,Milk}


o if the pairs share fewer than k-2 items, the resulting itemset would be larger than k, so
we do not need to generate it yet.

• The resulting k-itemset has k subsets of size k-1, which will be checked against support
threshold
o the merging ensures that at least two of the subsets are frequent
o An additional check is made that the remaining k-2 subsets are frequent as well.
In Figure 6.8, the frequent itemsets {Bread, Diapers} and {Bread, Milk} are merged to form a candidate
3-itemset {Bread, Diapers, Milk}.
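The merge condition itself is easy to state in code (a minimal sketch over sorted tuples; names are illustrative):

def merge_fk1_fk1(a, b):
    # Merge two sorted (k-1)-itemsets only if they share their first k-2 items
    if a[:-1] == b[:-1] and a[-1] < b[-1]:
        return a + (b[-1],)
    return None

print(merge_fk1_fk1(("Bread", "Diapers"), ("Bread", "Milk")))  # ('Bread', 'Diapers', 'Milk')
print(merge_fk1_fk1(("Beer", "Diapers"), ("Bread", "Milk")))   # None: prefixes differ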

Satisfaction of our requirements:
1) Avoids the generation of many unnecessary candidates that are generated by the Fk-1 x F1 method, e.g. it will not generate {Beer, Diapers, Milk}, since {Beer, Milk} is infrequent.
2) The method is complete: every frequent k-itemset can be formed from two frequent (k−1)-itemsets that differ only in their last item.
3) Each candidate itemset is generated only once.
Support counting using hash tree:
Given the candidate itemsets Ck and the set of transactions T, we need to compute the support counts
σ(X) for each itemset X in Ck.
Brute-force algorithm would compare each transaction against each itemset.
• This requires a very large number of comparisons.
An alternative approach:
• Partition the candidate itemsets in Ck into buckets using a hash function.
• For each transaction t, hash the itemsets contained in t into buckets using the same hash function.
• Compare each transaction only against the candidates in its matching buckets.
• Increment the support count of each matching candidate itemset.
• A hash tree is used to implement this bucketing.

An alternative approach is to enumerate the itemsets contained in each transaction and use them to update
the support counts oftheir respective candidate itemsets. To illustrate, consider a transaction t that contains
five items, {1,2,3,5,6}.
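A minimal sketch of this enumeration-based counting, using itertools.combinations in place of the systematic procedure of Figure 6.9 (the candidate set C3 below is hypothetical):

from itertools import combinations
from collections import defaultdict

t = (1, 2, 3, 5, 6)
candidates = {(1, 2, 3), (1, 2, 5), (1, 5, 6), (2, 3, 5), (3, 5, 6), (1, 4, 5)}
support_counts = defaultdict(int)

for subset in combinations(t, 3):    # every 3-itemset contained in t, in lexicographic order
    if subset in candidates:
        support_counts[subset] += 1  # increment the count of each matching candidate

print(sorted(support_counts))        # the candidates supported by this transaction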

Figure 6.9 shows a systematic way for enumerating the 3-itemsets contained in t. Assuming that each
itemset keeps its items in increasing lexicographic order, an itemset can be enumerated by specifying the
smallest item first, followed by the larger items. For instance, given t = {1,2,3,5,6}, all the 3-itemsets
contained in t must begin with item 1, 2, or 3.
Figure 6.11 shows an example of a hash tree structure.
Each internal node of the tree uses the hash function h(p) = p mod 3 to determine which branch of the current node should be followed next.
For example, items 1, 4, and 7 are hashed to the same branch (i.e., the leftmost branch) because they have
the same remainder after dividing the number by 3.
All candidate itemsets are stored at the leaf nodes of the hash tree. The hash tree shown in Figure 6.11
contains 15 candidate 3-itemsets, distributed across 9 leaf nodes.
Consider a transaction t = {1,2,3,5,6}. To update the support counts of the candidate itemsets, the hash tree must be traversed in such a way that all the leaf nodes containing candidate 3-itemsets belonging to t are visited at least once.
At the root node of the hash tree, the items 1, 2, and 3 of the transaction are hashed separately. Item 1 is
hashed to the left child of the root node, item 2 is hashed to the middle child, and item 3 is hashed to the
right child.
At the next level of the tree, the transaction is hashed on the second item listed in the Level 2 structures
shown in Figure 6.9.
For example, after hashing on item 1 at the root node, items 2, 3, and 5 of the transaction are hashed.
Items 2 and 5 are hashed to the middle child, while item 3 is hashed to the right child, as shown in Figure
6.12. This process continues until the leaf nodes of the hash tree are reached.
The candidate item sets stored at the visited leaf nodes are compared against the transaction. If a candidate
is a subset of the transaction, its support count is incremented.
In this example, 5 out of the 9 leaf nodes are visited and 9 out of the 15 itemsets are compared against the transaction.

Rule Generation
Given a frequent itemset L, find all non-empty subsets f ⊂ L such that the rule f -> L − f satisfies the minimum confidence requirement.
– If {A,B,C,D} is a frequent itemset, the candidate rules are:
ABC -> D, ABD -> C, ACD -> B, BCD -> A, AB -> CD, AC -> BD, AD -> BC, BC -> AD, BD -> AC, CD -> AB, A -> BCD, B -> ACD, C -> ABD, D -> ABC
How to efficiently generate rules from frequent itemsets?
– In general, confidence does not have an anti-monotone property: c(ABC -> D) can be larger or smaller than c(AB -> D).
– But the confidence of rules generated from the same itemset does have an anti-monotone property, e.g. for L = {A,B,C,D}:
c(ABC -> D) ≥ c(AB -> CD) ≥ c(A -> BCD)

◆ Confidence is anti-monotone w.r.t. the number of items on the RHS of the rule.

– A candidate rule is generated by merging two rules that share the same prefix in the rule consequent, e.g. join(CD -> AB, BD -> AC) produces the candidate rule D -> ABC.
– Prune the rule D -> ABC if its subset rule AD -> BC does not have high confidence.
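A minimal Python sketch of this level-wise rule generation from a single frequent itemset, growing the consequent one item at a time and pruning consequents whose rules fail minconf (the support counts below are hypothetical and purely for illustration):

from itertools import combinations

def gen_rules(freq_itemset, support, minconf):
    # Generate rules X -> Y from one frequent itemset, growing the consequent Y level by level
    items = frozenset(freq_itemset)
    rules = []
    consequents = [frozenset([i]) for i in items]           # start with 1-item consequents
    while consequents:
        survivors = set()
        for Y in consequents:
            X = items - Y
            if not X:
                continue
            conf = support[items] / support[X]
            if conf >= minconf:
                rules.append((X, Y, conf))
                survivors.add(Y)                             # only high-confidence consequents grow
        merged = {a | b for a in survivors for b in survivors if len(a | b) == len(a) + 1}
        consequents = {Y for Y in merged if items - Y and
                       all(frozenset(s) in survivors for s in combinations(Y, len(Y) - 1))}
    return rules

# Hypothetical support counts for L = {A,B,C,D} and its subsets:
sup = {frozenset("ABCD"): 2, frozenset("ABC"): 3, frozenset("ABD"): 2, frozenset("ACD"): 2,
       frozenset("BCD"): 2, frozenset("AB"): 4, frozenset("AC"): 3, frozenset("AD"): 3,
       frozenset("BC"): 3, frozenset("BD"): 3, frozenset("CD"): 2,
       frozenset("A"): 5, frozenset("B"): 4, frozenset("C"): 4, frozenset("D"): 3}
for X, Y, c in gen_rules("ABCD", sup, minconf=0.6):
    print(sorted(X), "->", sorted(Y), round(c, 2))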
3.4 Alternative Methods for Generating Frequent Itemsets
Traversal of Itemset Lattice: A search for frequent itemsets can be conceptually viewed as a traversal of the itemset lattice.
The search strategy employed by an algorithm dictates how the lattice structure is traversed during the
frequent itemset generation process. Some search strategies are better than others, depending on the
configuration of frequent itemsets in the lattice.

Equivalence classes : Equivalence Classes can also be defined according to the prefix or suffix labels of
an itemset.
In this case, two itemsets belong to the same equivalence class if they share a common prefix or suffix of
length k. In the prefix-based approach, the algorithm can search for frequent itemsets starting with the
prefix a before looking for those starting with prefixes b, c and so on.

Breadth-First versus Depth-First: The Apriori algorithm traverses the lattice in a breadth-first manner, as shown in Figure 6.21(a). It first discovers all the frequent 1-itemsets, followed by the frequent 2-itemsets, and so on, until no new frequent itemsets are generated.
A depth-first search, in contrast, can start from, say, node a in Figure 6.22 and count its support to determine whether it is frequent. If so, the algorithm progressively expands the next level of nodes, i.e., ab, abc, and so on, until an infrequent node is reached, say, abcd. It then backtracks to another branch, say, abce, and continues the search from there.
3.5 FP-Growth Algorithm
Apriori uses a generate-and-test approach: it generates candidate itemsets and tests whether they are frequent.
– Generation of candidate itemsets is expensive (in both space and time).
– Support counting is expensive: subset checking is computationally costly and multiple database scans are required.

FP-Growth allows frequent itemset discovery without candidate itemset generation, using a two-step approach:
Step 1: Build a compact data structure called the FP-tree, using 2 passes over the data set.
Step 2: Extract frequent itemsets directly from the FP-tree.

Step 1: FP-Tree Construction


The FP-tree is constructed using 2 passes over the data set.
Pass 1: Scan the data and find the support of each item.
– Discard infrequent items and sort the frequent items in decreasing order of their support.
– Use this order when building the FP-tree, so that common prefixes can be shared.
Pass 2: Nodes correspond to items and carry a counter.
1. FP-Growth reads one transaction at a time and maps it to a path in the tree.
2. A fixed item order is used, so paths can overlap when transactions share items (i.e., when they have the same prefix); in this case, the counters along the shared path are incremented.
3. Pointers are maintained between nodes containing the same item, creating singly linked lists (the dotted lines in the figures). The more paths overlap, the higher the compression; the FP-tree may then fit in memory.
4. Frequent itemsets are finally extracted from the FP-tree.

Figure 6.24 shows a data set that contains ten transactions and five items.
Initially, the FP-tree contains only the root node represented by the null symbol. The FP-tree is
subsequently extended in the following way:

1. The data set is scanned once to determine the support count of each item. Infrequent items are
discarded, while the frequent items are sorted in decreasing support counts. For the data set shown in
Figure 6.24, a is the most frequent item, followed by b, c, d, and e.
2. The algorithm makes a second pass over the data to construct the FP-tree. After reading the first transaction, {a,b}, the nodes labeled a and b are created. A path null -> a -> b is then formed to encode the transaction. Every node along the path has a frequency count of 1.

3. After reading the second transaction, {b,c,d}, a new set of nodes is created for items b, c, and d. A path null -> b -> c -> d is then formed to represent the transaction. Every node along this path also has a frequency count equal to one. Although the first two transactions have an item in common, namely b, their paths are disjoint because the transactions do not share a common prefix.
The third transaction, {a,c,d,e}, shares a common prefix item (which is a) with the first transaction. As a result, the path for the third transaction, null -> a -> c -> d -> e, overlaps with the path for the first transaction, null -> a -> b. Because of their overlapping path, the frequency count for node a is incremented to two, while the frequency counts for the newly created nodes c, d, and e are equal to one.
This process continues until every transaction has been mapped onto one of the paths given in the FP-tree.
The resulting FP-tree after reading all the transactions is shown at the bottom of Figure 6.25.
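A minimal Python sketch of this construction pass (node and function names are ours; the three transactions are the {a,b}, {b,c,d}, {a,c,d,e} examples discussed above rather than the full data set of Figure 6.24):

from collections import defaultdict

class FPNode:
    def __init__(self, item, parent):
        self.item, self.parent, self.count = item, parent, 0
        self.children = {}

def build_fp_tree(transactions, minsup_count):
    # Pass 1: item supports; keep frequent items, ranked by decreasing support
    counts = defaultdict(int)
    for t in transactions:
        for item in t:
            counts[item] += 1
    ranked = sorted((i for i in counts if counts[i] >= minsup_count), key=lambda i: -counts[i])
    order = {item: rank for rank, item in enumerate(ranked)}
    root, header = FPNode(None, None), defaultdict(list)     # header table: item -> node links
    # Pass 2: insert each transaction as a path of its frequent items, in the fixed order
    for t in transactions:
        node = root
        for item in sorted((i for i in t if i in order), key=order.get):
            if item not in node.children:
                child = FPNode(item, node)
                node.children[item] = child
                header[item].append(child)                   # maintain the node-link list
            node = node.children[item]
            node.count += 1                                  # shared prefixes just increment counts
    return root, header

root, header = build_fp_tree([{"a", "b"}, {"b", "c", "d"}, {"a", "c", "d", "e"}], minsup_count=2)
print({item: [n.count for n in nodes] for item, nodes in header.items()})
# e.g. {'a': [2], 'b': [1, 1], 'c': [1, 1], 'd': [1, 1]}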

Step 2: Frequent Itemset Generation


FP-growth is an algorithm that generates frequent itemsets from an FP-tree by exploring the tree in a
bottom-up fashion.
Given the example tree shown in Figure 6.24, the algorithm looks for frequent itemsets ending in e first,
followed by d, c, b, and finally, a. This bottom-up strategy for finding frequent itemsets ending with a
particular item is equivalent to the suffix-based approach.
Since every transaction is mapped onto a path in the FP-tree, we can derive the frequent itemsets ending
with a particular item, say e, by examining only the paths containing node e. These paths can be accessed
rapidly using the pointers associated with node e. The extracted paths are shown in Figure 6.26(a).
After finding the frequent itemsets ending in e, the algorithm proceeds to look for frequent itemsets
ending in d by processing the paths associated with node d. The corresponding paths are shown in Figure
6.26(b). This process continues until all the paths associated with nodes c, b, and finally a are processed.
The paths for these items are shown in Figures 6.26(c), (d), and (e), while their corresponding frequent
itemsets are summarized in Table 6.6.
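Continuing the sketch above, the prefix paths ending in a given item can be collected by following the node-link list and walking each node up to the root (again only an illustration, not the full conditional FP-tree construction used by FP-growth):

def prefix_paths(item, header):
    # Collect (prefix path, count) pairs for every node of `item`, walking parents up to the root
    paths = []
    for node in header[item]:
        path, cur = [], node.parent
        while cur is not None and cur.item is not None:
            path.append(cur.item)
            cur = cur.parent
        paths.append((path[::-1], node.count))
    return paths

print(prefix_paths("d", header))   # e.g. [(['b', 'c'], 1), (['a', 'c'], 1)]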
3.7 Evaluation of Association Patterns
Association rule algorithms tend to produce too many rules:
– many of them are uninteresting or redundant
– e.g. {A,B,C} -> {D} is redundant if {A,B} -> {D} has the same support and confidence
Interestingness measures can be used to prune or rank the derived patterns.

Objective measures of interestingness


Given a rule X -> Y, information needed to compute rule interestingness can be obtained from a
contingency table
Contingency table for X -> Y
Interest Factor: the interest factor (also called lift) compares how often X and Y occur together against how often they would be expected to occur together if they were statistically independent.
Correlation Analysis: for binary variables, correlation can be measured using the φ-coefficient, which is defined as shown below.
The value of the correlation ranges from −1 (perfect negative correlation) to +1 (perfect positive correlation). If the variables are statistically independent, its value is 0.
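The formulas themselves appear to have been dropped from this copy; the standard definitions, in terms of the contingency-table counts f11, f10, f01, f00 (with row totals f1+, f0+, column totals f+1, f+0, and N transactions), are:

Interest factor (lift):  I(X, Y) = (N × f11) / (f1+ × f+1)
φ-coefficient:           φ = (f11 × f00 − f01 × f10) / sqrt(f1+ × f+1 × f0+ × f+0)

An interest factor of 1 (and a φ of 0) corresponds to statistical independence of X and Y.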
3.8 Important Questions
1. What is association analysis? Define support and confidence with an example.
2. Develop the Apriori algorithm for frequent itemset generation, with an example.
3. Explain the various measures for evaluating association patterns.
4. Explain in detail frequent itemset generation and rule generation with reference to Apriori, along with an example.
5. Define following: a) Support b) Confidence.
6. Explain the FP-Growth algorithm for discovering frequent itemsets. What are its limitations?
7. Consider the following transaction data set:
Construct the FP-tree by showing the tree separately after reading each transaction.
8. Illustrate the limitations of the support-confidence framework for the evaluation of an association rule.
9. Define cross-support pattern. Suppose the support for milk is 70%, the support for sugar is 10%, and the support for bread is 0.04%. Given hc = 0.01, is the frequent itemset {milk, sugar, bread} a cross-support pattern?
10. Which are the factors affecting the computational complexity of the Apriori algorithm? Explain them.
11. Define a frequent pattern tree. Discuss the method of computing a FP-Tree, with an algorithm.
12. Give an example to show that items in a strong association rule may actually be negatively correlated.
13. A database has five transactions. Let min-sup = 60% and min-conf = 80%.

Find all frequent itemsets using Apriori and FP-Growth, respectively.
14. Explain various alternative methods for generating frequent itemsets.
15. A database has four transactions. Let min-sup = 40% and min-conf = 60%.

Find all frequent itemsets using the Apriori and FP-Growth algorithms. Compare the efficiency of the two processes.
16. Explain various Candidate Generation and Pruning techniques.
17. Explain the various properties of objective measures.
18. Comprehend the Simpson’s Paradox.
19. Illustrate the nature of Simpson’s paradox for the following two-way contingency table
20. What is the Apriori algorithm? Give an example. A database has six transactions of purchases of books from a bookshop, as given below.

21. Consider the following transaction data set:

Construct the FP-tree. Generate the list of frequent itemsets ordered by their corresponding suffixes.
22. Consider the following set of frequent 3-itemsets:

Assume that there are only 5 items in the data set.

a) List all candidate 4-itemsets obtained by a candidate generation procedure using the Fk-1 x F1 merging strategy.
b) List all candidate 4-itemsets obtained by the candidate generation procedure in Apriori.
23. Apply the Apriori algorithm for

Item set = {Milk, Bread, Eggs, Cookies, Coffee, Butter, Juice}, use 0.2 for min-sup.
