Mod 4 Part1 - Merged
▪ Part - 1
▪ Association Rules-Introduction
▪ Methods to discover Association rules
▪ Apriori (Level-wise algorithm)
Visualization:
https://athena.ecs.csus.edu/~mei/associationcw/Association.html
▪ Part - 2 (Advanced Frequent Itemset Mining Algorithms)
▪ Partition Algorithm
▪ FP-tree Growth Algorithm.
▪ Pincer Search Algorithm,
▪ Dynamic Itemset Counting Algorithm
KTU
4.1.1. Discuss the significance of association rule mining in market
basket analysis. (3)
4.1.2. Define support, confidence, and frequent itemset, in association
rule mining context. (3)
What Is Frequent Pattern Analysis?
◼ Frequent pattern: a pattern (a set of items, subsequences,
substructures, etc.) that occurs frequently in a data set
◼ Motivation: Finding inherent regularities in data
◼ What products were often purchased together?
◼ What are the subsequent purchases after buying a PC?
◼ What kinds of DNA are sensitive to this new drug?
◼ Can we automatically classify web documents?
◼ Applications
◼ market basket analysis, cross-marketing, catalog design, sales
campaign analysis, web log (click stream) analysis, DNA sequence
analysis, plagiarism check
Market Basket
TID List of Items
101 Litchi, Hill Banana, Strawberry
102 Litchi, Passion Fruit
103 Passion Fruit, Tomato
104 Litchi, Hill Banana, Strawberry
105 Pears, Strawberry
106 Pears
107 Pears, Passion Fruit
108 Litchi, Hill Banana, Watermelon, Strawberry
109 Watermelon, Tomato
110 Litchi, Hill Banana
Significance of Association Rule Mining in Market
Basket Analysis
Association Rule Mining – Support
• I = {i1, i2, ..., in}: the set of all items
• Transaction t: a set of items such that t ⊆ I

Transaction ID   Items
10               A, C, D
20               B, C, E
30               A, B, C, E
Association Rule Mining – Confidence
Transactions (items coded as integers):
{3, 5, 8}, {2, 6, 8}, {1, 4, 7, 10}, {3, 8, 10}, {2, 5, 8},
{1, 5, 6}, {4, 5, 6, 8}, {2, 3, 4}, {1, 5, 7, 8}, {3, 8, 9, 10}

◼ Support count of A = n(A)
◼ An itemset A is frequent if A's support count is >= a min-sup threshold
◼ An Association Rule is an implication of the form A ⇒ B, where A, B ⊆ I
◼ Confidence (A ⇒ B) = P(B|A) = Support-Count(A ∪ B) / Support-Count(A)
◼ Example: Confidence ({8} ⇒ {5}) = Sup Count({8} ∪ {5}) / Sup Count({8}) = 4 / 7
Support and Confidence Example
Using the same transactions, what is the confidence of the association rule {5} ⇒ {8}?
• Sup Count({5} ∪ {8}) = 4
• Sup Count({5}) = 5
• Confidence ({5} ⇒ {8}) = 4 / 5 = 0.8
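To make the counting above concrete, here is a minimal Python sketch (the helper names sup_count and confidence are purely illustrative) that reproduces both confidence values from the ten example transactions:

transactions = [{3, 5, 8}, {2, 6, 8}, {1, 4, 7, 10}, {3, 8, 10}, {2, 5, 8},
                {1, 5, 6}, {4, 5, 6, 8}, {2, 3, 4}, {1, 5, 7, 8}, {3, 8, 9, 10}]

def sup_count(itemset):
    # number of transactions containing every item of the itemset
    return sum(1 for t in transactions if itemset <= t)

def confidence(A, B):
    # confidence(A => B) = sup_count(A U B) / sup_count(A)
    return sup_count(A | B) / sup_count(A)

print(confidence({8}, {5}))   # 4/7 ≈ 0.57
print(confidence({5}, {8}))   # 4/5 = 0.80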
The Apriori Algorithm
1. Initial step: Set k = 1. Scan the transaction database once to obtain the frequent 1-item itemset list L1.
2. Self-join Lk: Join two itemsets if they have k-1 common items and differ in exactly one item.
3. Prune: Select the joined itemsets for which all immediate subsets are frequent. Add them to Ck+1, the candidate itemset list.
4. Support count: Scan the transaction database and count the support for each itemset in Ck+1.
5. Select the frequent itemsets: Choose itemsets from Ck+1 whose support count meets or exceeds the minimum support threshold. These form the next frequent itemset list Lk+1.
6. Iterate: Set k = k + 1 and repeat steps 2 to 5 as long as Lk is non-empty. (A minimal code sketch follows.)
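The six steps can be condensed into a short, self-contained Python sketch (function and variable names are illustrative, not from any library); it performs the self-join, prune, and support-counting steps described above:

from itertools import combinations

def apriori(transactions, min_sup_count):
    # returns {frequent itemset (frozenset): support count}
    transactions = [frozenset(t) for t in transactions]
    counts = {}
    for t in transactions:                        # L1: frequent 1-itemsets
        for i in t:
            counts[frozenset([i])] = counts.get(frozenset([i]), 0) + 1
    Lk = {s: c for s, c in counts.items() if c >= min_sup_count}
    frequent = dict(Lk)
    k = 1
    while Lk:
        candidates = set()
        for a, b in combinations(Lk, 2):          # self-join: share k-1 items
            u = a | b
            if len(u) == k + 1:
                # prune: keep u only if every k-subset of u is frequent
                if all(frozenset(s) in Lk for s in combinations(u, k)):
                    candidates.add(u)
        cand_counts = {c: 0 for c in candidates}  # support counting (one DB scan)
        for t in transactions:
            for c in candidates:
                if c <= t:
                    cand_counts[c] += 1
        Lk = {c: n for c, n in cand_counts.items() if n >= min_sup_count}
        frequent.update(Lk)
        k += 1
    return frequent

# e.g., the four-transaction database used in the illustration below, min-sup = 2
print(apriori([['A', 'C', 'D'], ['B', 'C', 'E'], ['A', 'B', 'C', 'E'], ['B', 'E']], 2))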
Example of Generation of
Candidate 3-Item Itemsets from L2
◼ Given Frequent 2-Item Itemsets 𝐿2 =
{{𝐴,𝐵},{𝐴,𝐶},{𝐴,𝐷},{𝐵,𝐶},{𝐵,𝐷},{𝐶,𝐷}}, generate Candidate
3-Itemsets (C3)
◼ Join two itemsets only if they share k−1 common items and
they differ in one item
◼ {A,B} and {A,C} → {𝐴,𝐵,𝐶}
◼ {A,B} and {𝐴,𝐷} → {𝐴,𝐵,𝐷}
◼ {A,C} and {A,D} → {𝐴,𝐶,𝐷}
◼ {B,C} and {B,D} → {B,C,D}
◼ {C,D} produces no new candidate, since no other itemset in L2 shares the prefix ‘C’.
◼ Candidate 3-Item Itemsets (C3) =
{{A,B,C},{A,B,D},{A,C,D},{B,C,D}}
The Apriori Algorithm Illustration …
Consider a transaction database with four transactions. Assume minimum support is 2. Let us illustrate the Apriori algorithm. The first step is to scan the DB and generate the frequent 1-item itemset L1.

Transaction ID   Items
10               A, C, D
20               B, C, E
30               A, B, C, E
40               B, E

C2 (2nd scan)   sup        L2        sup
{A, B}          1          {A, C}    2
{A, C}          2          {B, C}    2
{A, E}          1          {B, E}    3
{B, C}          2          {C, E}    2
{B, E}          3
{C, E}          2
The Apriori Algorithm Illustration – Summary

Database TDB
Tid   Items
10    A, C, D
20    B, C, E
30    A, B, C, E
40    B, E

C1 (1st scan)   sup        L1       sup
{A}             2          {A}      2
{B}             3          {B}      3
{C}             3          {C}      3
{D}             1          {E}      3
{E}             3

C2 (2nd scan)   sup        L2        sup
{A, B}          1          {A, C}    2
{A, C}          2          {B, C}    2
{A, E}          1          {B, E}    3
{B, C}          2          {C, E}    2
{B, E}          3
{C, E}          2

C3 (3rd scan)   sup        L3           sup
{B, C, E}       2          {B, C, E}    2
The Apriori Algorithm Illustration - Termination
➢ The iteration stops when no new frequent itemsets are generated.
➢ Association rules are then derived from the frequent itemsets using the minimum support and minimum confidence thresholds:
• A ⇒ B, where A, B ⊆ I
• Confidence (A ⇒ B) = P(B|A)
Association Rule Mining from Frequent Itemsets
For every frequent itemset l, generate rules of the form s ⇒ (l - s) for each non-empty proper subset s of l, and keep the rules whose confidence meets the minimum confidence threshold.
Association Rule Mining Full Example …
Describe the process of generating association rules using an
example frequent 3-item set
▪ Consider the frequent 3-item set {I1, I2, I5}. Its subsets are: {I1}, {I2}, {I5}, {I1, I2}, {I1, I5}, {I2, I5}, {I1, I2, I5}
▪ Assume the tables below list the support counts of the relevant itemsets
• Confidence (A ⇒ B) = Sup-Count(A ∪ B) / Sup-Count(A)

L1              L2                L3
Item   Sup      Item      Sup     Item          Sup
I1     6        I1, I2    4       I1, I2, I3    2
I2     7        I1, I3    4       I1, I2, I5    2
I3     6        I1, I5    2
I4     2        I2, I3    4
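A small sketch of the s ⇒ (l - s) procedure for l = {I1, I2, I5}. The support counts are taken from the tables above, except sup({I5}) and sup({I2, I5}), which are not shown there and are assumed to be 2 here so the example runs end to end; min_conf = 0.75 is likewise just an example threshold.

from itertools import combinations

# support counts; sup({I5}) and sup({I2,I5}) are assumed values (= 2)
sup = {frozenset(k): v for k, v in [
    (('I1',), 6), (('I2',), 7), (('I5',), 2),
    (('I1', 'I2'), 4), (('I1', 'I5'), 2), (('I2', 'I5'), 2),
    (('I1', 'I2', 'I5'), 2)]}

def rules_from(l, min_conf):
    # emit every rule s => (l - s) and flag those meeting min_conf
    l = frozenset(l)
    for r in range(1, len(l)):
        for s in combinations(l, r):
            s = frozenset(s)
            conf = sup[l] / sup[s]
            status = 'STRONG' if conf >= min_conf else 'weak'
            print(set(s), '=>', set(l - s), 'confidence = %.2f' % conf, status)

rules_from({'I1', 'I2', 'I5'}, min_conf=0.75)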
EXERCISES
UQP 4.1.4
b) State the Apriori principle in candidate generation. Find out the
frequent item sets with minimum support of 2 using Apriori for the
following data. (8)
UQP 4.1.5 …
A database has six transactions. Let min_sup be 60% and min_conf be
80%. Find frequent itemsets using Apriori algorithm and generate
strong association rules from a three-item dataset. (8)
◼ Find frequent itemsets using Apriori algorithm – 4 marks
UQP 4.1.6 …
A database has six transactions. Let min_sup be 33.33% and
min_conf be 60%. Find frequent itemset using Apriori algorithm
and generate strong association rules from the dataset (8)
TID   ITEMS
T1    Cake, Bread, Jam
T2    Cake, Bread
T3    Cake, Coke, Chips
T4    Chips, Coke
T5    Chips, Jam
T6    Cake, Coke, Chips

Total Transactions = 6
min_sup = 0.33; min_sup_count = 6 × 0.33 ≈ 2

Frequent 1-itemsets:
Itemset {'Bread'}   sup_count = 2
Itemset {'Cake'}    sup_count = 4
Itemset {'Chips'}   sup_count = 4
Itemset {'Coke'}    sup_count = 3
Itemset {'Jam'}     sup_count = 2
… UQP 4.1.6
Association Rules (min_conf = 60%):
… UQP 4.1.6
Itemset {'Cake', 'Chips', 'Coke'} sup_count = 2

Not OK (confidence < 0.6):
{'Cake'} -> {'Chips', 'Coke'} confidence: 0.5 : n(Coke,Cake,Chips) / n(Cake)
{'Chips'} -> {'Cake', 'Coke'} confidence: 0.5 : n(Coke,Cake,Chips) / n(Chips)

OK (confidence >= 0.6):
{Coke} -> {Cake,Chips} confidence: 0.67 : n(Coke,Cake,Chips) / n(Coke)
{Cake,Chips} -> {Coke} confidence: 1.0 : n(Coke,Cake,Chips) / n(Cake,Chips)
{Cake,Coke} -> {Chips} confidence: 1.0 : n(Coke,Cake,Chips) / n(Cake,Coke)
{Chips,Coke} -> {Cake} confidence: 0.67 : n(Coke,Cake,Chips) / n(Chips,Coke)
… UQP 4.1.6
Strong Association Rules for min_conf = 60%: {Coke} -> {Cake, Chips}, {Cake, Chips} -> {Coke}, {Cake, Coke} -> {Chips}, {Chips, Coke} -> {Cake}.
Additional Exercise 4.1.7 …
A database has ten transactions. Let min-sup = 30%. Find all frequent itemsets. Let min-conf = 75%. Demonstrate association rule analysis using a frequent 3-item set from the exercise.

Hint:- Before you start, for an easy workout, you may code the items uniquely:
Hill Banana     H
Litchi          L
Passion Fruit   P
Pears           R
Strawberry      S
Tomato          T
Watermelon      W

TID   List of Items
101   Litchi, Hill Banana, Strawberry
102   Litchi, Passion Fruit
103   Passion Fruit, Tomato
104   Litchi, Hill Banana, Strawberry
105   Pears, Strawberry
106   Pears
107   Pears, Passion Fruit
108   Litchi, Hill Banana, Watermelon, Strawberry
109   Watermelon, Tomato
110   Litchi, Hill Banana
… Additional Exercise 4.1.7
Association Rule Example for the frequent 3-item set {'Hill Banana', 'Litchi', 'Strawberry'}:-
Additional Exercise 4.1.8
A database has four transactions. Let min-sup =60%. Find all
frequent itemsets.
Let min-conf = 80%. Demonstrate association rule analysis using
a frequent 3-item set from the exercise.
TID    Date       Items
T100   10/07/15   {K, A, D, B}
T200   10/07/15   {D, A, C, E, B}
T300   10/07/19   {C, A, B, E}
T400   22/10/10   {B, A, D}
Module 4: (Association Rule Analysis)
▪ Part - 1
▪ Association Rules-Introduction
▪ Methods to discover Association rules
▪ Apriori (Level-wise algorithm)
KTU
4.2.1. List the modification methods to improve the efficiency of the
Apriori algorithm(3)
Describe any three methods to improve the efficiency of the
Apriori algorithm. (3)
KTU
4.2.2 Discuss the partitioning algorithm for finding large itemset and
compare its performance with Apriori algorithm. (6)
Explain the partitioning algorithm for finding large itemset and explain
how it removes the disadvantage of Apriori algorithm. (6)
◼ Partitioning algorithm for finding large items (4)
◼ Explain how it removes the disadvantage of Apriori algorithm (2)
4.2.3 Illustrate the working of Pincer Search algorithm with an example
(6)
Illustrate the working of Pincer Search Algorithm with an example. (6)
◼ Pincer Search Algorithm explanation (4)
◼ Illustration with an example (2)
4.2.4. Write about the bi-directional searching technique for pruning in
the pincer search algorithm (3)
KTU
4.2.5. Describe the working of the dynamic itemset counting technique
with a suitable example. Specify when to move an itemset from
dashed structures to solid structures. (8)
Describe how the dynamic itemset counting technique works with
a suitable example. Specify when to move an itemset from dashed
structures to solid structures. (6)
◼ dynamic itemset counting technique (4)
◼ explanation with a suitable example (2)
◼ Specify when to move an itemset from dashed structures to
solid structures. (2)
0. Apriori algorithm – Challenges and Improvements
• Apriori – major computational challenges (addressed by the partitioning, hash-based, and sampling techniques below)
Partition: Scan Database Only Twice
• Any itemset that is potentially frequent in DB must be frequent in at least one of the partitions of DB
• …

Hash-based technique: reduce the number of candidates
– Example: frequent 1-itemsets are a, b, d, e
– ab is not a candidate 2-itemset if the sum of the counts of its hash bucket {ab, ad, ae} is below the support threshold
Sampling for Frequent Patterns
• Select a sample of the original database; mine frequent patterns within the sample using Apriori
1. Partitioning algorithm
Partitioning Algorithm
for Frequent Item Set Discovery
4.2.2 Discuss the partitioning algorithm for finding large itemset and
compare its performance with Apriori algorithm. (6)
Explain the partitioning algorithm for finding large itemset and
explain how it removes the disadvantage of Apriori algorithm. (6)
◼ Partitioning algorithm for finding large items (4)
◼ Explain how it removes the disadvantage of Apriori algorithm
(2)
Partitioning Algorithm Pseudocode - Explanation
1. Initially the database D is logically partitioned into n partitions.
2. Generate local large itemsets: During the first database scan, the algorithm counts the support of itemsets in each partition, using an algorithm such as Apriori. For each partition pi, generate local frequent itemsets of all lengths, L1i, L2i, …, Lki, whose support is greater than or equal to the minimum local support threshold.
3. Global candidate itemsets: Select all itemsets that are large in at least one partition to generate the global candidate itemsets CjG, where j = 1 to k.
4. Generate global frequent itemsets: Count the support of each global candidate itemset CjG over the entire database during a second scan. If the support is greater than or equal to the minimum global support threshold, include it in the global frequent itemsets LjG.
Note that the algorithm reduces the number of database scans to two. (A minimal code sketch follows.)
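A minimal Python sketch of the two phases described above. It reuses the apriori() function sketched in the Apriori section as the local miner; the function name, the slicing-based partitioning, and the threshold scaling are illustrative choices, not a prescribed implementation.

def partition_algorithm(transactions, min_sup_ratio, n_parts):
    # Phase 1: mine each partition locally; Phase 2: one global counting scan
    transactions = [frozenset(t) for t in transactions]
    size = max(1, len(transactions) // n_parts)
    parts = [transactions[i:i + size] for i in range(0, len(transactions), size)]

    global_candidates = set()
    for p in parts:                                       # first scan, partition by partition
        local_min = max(1, round(min_sup_ratio * len(p)))
        global_candidates |= set(apriori(p, local_min))   # apriori() from the sketch above

    total_min = min_sup_ratio * len(transactions)
    result = {}
    for c in global_candidates:                           # second scan over the whole database
        n = sum(1 for t in transactions if c <= t)
        if n >= total_min:
            result[c] = n
    return result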
Partitioning Algorithm Illustration
◼ The Table below illustrates the use of the partition algorithm. The database D is partitioned into three partitions, each containing 2 transactions.
◼ Set the minimum local support threshold as 2. Generate the local large 1-item itemsets L1 and 2-item itemsets L2.
◼ (Note:- As the partition size was set too small, frequent 3-item itemsets were not generated)

TID   Items            Partition   Local Large Itemset L1   Local Frequent Itemset L2
T1    I1, I2, I3       1           I2:2, I3:2               {I2, I3}:2
T2    I2, I3, I4       1
T3    I4, I5           2           I4:2
T4    I1, I2, I4       2
T5    I1, I2, I3, I5   3           I1:2, I2:2, I3:2         {I1, I2}:2, {I1, I3}:2, {I2, I3}:2
T6    I1, I2, I3, I4   3
Partitioning Algorithm Illustration
◼ Set the minimum global support threshold as 4.
◼ Count the support of the itemsets in the global candidate itemsets C1 and C2 over the whole database (the same six transactions as above). Generate the frequent 1-item itemset L1 and 2-item itemset L2.
◼ The frequent itemsets selected: {I1}, {I2}, {I3}, {I4}, {I1,I2}, {I2,I3}
◼ (Note:- As the partition size was set too small, frequent 3-item itemsets were not generated)

Global Candidate Itemset (C1, C2)   Global Support   Global Frequent Itemsets (Min-Support = 4)
{I1}                                4                {I1}
{I2}                                5                {I2}
{I3}                                4                {I3}
{I4}                                4                {I4}
{I1, I2}                            4                {I1, I2}
{I1, I3}                            3                Not Frequent
{I2, I3}                            4                {I2, I3}
Advantages of partitioning method
1. Large Itemset Property: A large itemset must be large in at least
one partition. So the focus shifts from analyzing the entire database to
identifying large itemsets within individual partitions, which is less costly.
2. Limited Memory: The memory required by a partition is relatively small. The count of itemsets to be processed per partition is smaller compared to the entire database - this further reduces the memory needs.
3. Parallel and Distributed Processing: Each partition can be
processed independently, allowing for parallelization. In a distributed
computing environment, each partition can be assigned to a separate
processing unit, enabling efficient utilization of CPU and processing time.
4. Incremental Generation of Association Rules: When new data is added to the database, only the partitions containing the new entries need to be processed to update the association rules. This approach avoids recomputing the association rules from scratch for the entire database, saving computational resources and time.
2. FP-tree Growth Algorithm
FP-tree Growth Algorithm
for Frequent Item Set Discovery
◼ Solved Problem: Description
◼ https://www.geeksforgeeks.org/frequent-pattern-growth-
algorithm/
◼ Solved Problem: Video
◼ (https://www.youtube.com/watch?v=7oGz4PCp9jI)
◼ Frequent Pattern (FP) Growth Algorithm Association Rule Mining
Solved Example by Mahesh Huddar
◼ Solved Problem: Animation Software
◼ Provide the transaction List. The software will demonstrate the
solution, step by step
◼ https://athena.ecs.csus.edu/~mei/associationcw/FpGrowth.html
Drawbacks of Apriori Algorithm: repeated database scans and costly candidate generation – the motivation for FP-Growth.
The Algorithm
I. Scan DB once, find the frequent 1-items. Sort frequent items in descending order of frequency.
II. Scan DB again, construct the FP-tree.

Example
Consider the transaction database with 5 transactions composed of 11 items. Let the min. support be 3.

TXN   ITEMSET
T1    E, K, M, N, O, Y
T2    D, E, K, N, O, Y
T3    A, E, K, M
T4    C, K, M, U, Y
T5    C, E, I, K, O

STEP - I
Scan the database and count each item: A:1, C:2, D:1, E:4, I:1, K:5, M:3, N:2, O:3, U:1, Y:3.
There are 5 frequent 1-items: L1 = {K:5, E:4, M:3, O:3, Y:3}, in the sorted order of frequency.

L1
Item   Freq.
K      5
E      4
M      3
O      3
Y      3
STEP – I Continued
TXN   ITEMSET             ORDERED ITEMSET
T1    E, K, M, N, O, Y    K, E, M, O, Y
T2    D, E, K, N, O, Y    K, E, O, Y
T3    A, E, K, M          K, E, M
T4    C, K, M, U, Y       K, M, Y
T5    C, E, I, K, O       K, E, O

L1: K 5, E 4, M 3, O 3, Y 3
Step –II (a) Insert ordered item set {K,E,M,O,Y} in FP Tree
• The first transaction creates a single path from the root: K:1 → E:1 → M:1 → O:1 → Y:1 (ordered itemsets as in the table above).
Step –II (b) Insert ordered item set {K,E,O,Y} in FP Tree
• On inserting O:
• There is no direct link between E & O.
• So, create a new node for ‘O’.
• Link it with E.
• Assign a support count of 1 to O
• On inserting Y
• Create a new node for the item ‘Y’.
• Link it with O.
• Assign a support count of 1 to Y
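A minimal sketch of this insertion step (the Node class and insert_transaction() are illustrative names, not library code): a shared prefix only has its counts incremented, while a new suffix creates new linked nodes.

class Node:
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}

def insert_transaction(root, ordered_items):
    node = root
    for item in ordered_items:
        if item not in node.children:      # no existing branch: create a new node
            node.children[item] = Node(item, node)
        node = node.children[item]
        node.count += 1                    # shared prefix: just bump the count

root = Node(None, None)
for t in [['K', 'E', 'M', 'O', 'Y'], ['K', 'E', 'O', 'Y'], ['K', 'E', 'M'],
          ['K', 'M', 'Y'], ['K', 'E', 'O']]:
    insert_transaction(root, t)
print(root.children['K'].count)            # 5, as in the illustration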
Step –II (c) Insert ordered item set {K,E,M} in FP Tree
• K, E and M are already present in order in the tree, so increase their support counts by 1 (K = 3, E = 3, M = 2).
• The remaining ordered itemsets {K, M, Y} (T4) and {K, E, O} (T5) are inserted in the same way.
• After all five transactions are inserted, the counts along the K–E path are K = 5, E = 4, and the O node below E has count 2.
Conditional Pattern Bases
◼ Make a list of the frequent 1-items in the ascending order of their frequencies: Y:3, O:3, M:3, E:4, K:5
◼ For each of these items, find out all the paths leading from the root. These paths are called the conditional pattern base.
◼ Example
◼ ‘Y’ can be reached from the root using the paths (K→E→M→O→Y), (K→E→O→Y) and (K→M→Y). So the conditional pattern base of ‘Y’ is {KEMO:1}, {KEO:1}, {KM:1}.
Conditional Frequent Pattern Tree
For each item (Y, O, M, E, K), build the conditional frequent pattern tree:-
◼ Consider one item at a time (e.g., Y)
◼ Take the set of items that are common to all the paths in its conditional pattern base, and keep them if their accumulated support count meets the minimum support.
Conditional Frequent Pattern Tree
• Consider ‘O’.
• The conditional pattern base of ‘O’ is {KEM:1}, {KE:2}.
• Observe that KE is common to all paths.
• Sum the support count of ‘KE’ in all paths. So the support count
of KE = 1+2 = 3
Conditional Frequent Pattern Tree
• Consider ‘M’.
• The conditional pattern base of ‘M’ is {KE:2}, {K:1}.
• Observe that K is common to all paths.
• Sum the support count of ‘K’ in all the paths. So the support
count of K = 2 + 1 = 3
Frequent Itemsets
From the Conditional Frequent Pattern tree, the Frequent Itemsets are generated by pairing itemsets in the ‘Conditional Frequent Pattern Tree’ with the corresponding frequent 1-item.
◼ For example, consider the first itemset {K} of the conditional frequent pattern tree. {K} will be paired with ‘Y’.
◼ The frequent pattern that emerges is {K,Y}, with frequency 3

(FREQUENT 1-ITEM)   CONDITIONAL FREQUENT PATTERN TREE   FREQUENT ITEMSETS
Y                   {K} : 3                             {K,Y} : 3
O                   {K,E} : 3                           [ {K,O}:3, {E,O}:3, {E,K,O}:3 ]
M                   {K} : 3                             {K,M} : 3
E                   {K} : 4                             {E,K} : 4
K                   –                                   –
Frequent Itemsets
◼ Consider the second itemset {K,E} of the conditional frequent pattern tree. {K,E} will be paired with ‘O’.
◼ The frequent patterns that emerge are {K,O}, {E,O}, {E,K,O}, each with frequency 3.
◼ Similarly pair {K} with ‘M’. Then pair {K} with ‘E’.
◼ Final Result: the ‘frequent itemsets’ listed in the last column of the table above.
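A small sketch of this pairing step (the cond_fp_tree dictionary simply transcribes the table above; names are illustrative): every non-empty combination of the conditional-tree items is paired with its suffix item and inherits the tree's count.

from itertools import combinations

cond_fp_tree = {'Y': ({'K'}, 3), 'O': ({'K', 'E'}, 3), 'M': ({'K'}, 3), 'E': ({'K'}, 4)}

for suffix, (items, count) in cond_fp_tree.items():
    for r in range(1, len(items) + 1):
        for combo in combinations(sorted(items), r):
            print(set(combo) | {suffix}, ':', count)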
4.2.6 KTU - May 2024: FP Growth
◼ Mention the advantages of FP Growth algorithm. Find
out the frequent item sets using FP Growth for the
following data. (8)
4.2.7 KTU - June 2023: FP Growth
◼ A database has six transactions. Let min_sup be 3. Find frequent itemsets using FP growth algorithm.

TID   ITEMS
T1    {f, a, c, d, m, p}
T2    {a, b, c, f, m}
T3    {b, f, j}
T4    {b, c, k, p}
T5    {a, f, c, e, p, m}
T6    {f, a, c, d, m, p}

◼ Answer:- https://web.iitd.ac.in/~bspanda/MTL782FPTREE.pdf (FP Growth, IIT Delhi)
Ex.4.2.8 FP (From Apriori Section)
A database has four transactions:
A, C, D
B, C, E
A, B, C, E
B, E
Let min-support = 50%. Find the frequent itemsets using FP Growth.
Use: https://athena.ecs.csus.edu/~mei/associationcw/FpGrowth.html

Ex.4.2.9 FP (From Apriori Section)
A database has four transactions. Let min-sup = 60%. Find all frequent itemsets using FP Growth.
TID    Items
T100   {K, A, D, B}
T200   {D, A, C, E, B}
T300   {C, A, B, E}
T400   {B, A, D}
Use: https://athena.ecs.csus.edu/~mei/associationcw/FpGrowth.html
4.2.10.FP and Association Rules
Note: This was asked in 2023 October – for solution using Apriori
3. Pincer Search Algorithm
Pincer Search Algorithm
for Frequent Item Set Discovery
◼ University Questions
◼ Illustrate the working of Pincer Search Algorithm with an example. (6)
◼ Pincer Search Algorithm explanation (4)
◼ Explanation with an example (2)
◼ Write about the bi-directional searching technique for pruning in pincer search
algorithm (3)
◼ Answer: Pincer Search Algorithm Video and Writeup
◼ https://www.youtube.com/watch?v=Rb1gDmeBPxA – DM2 CL7 - PINCER SEARCH algorithm in data mining with Example (in Malayalam)
◼ Corresponding Writeup:-
◼ https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=1000342
◼ Or https://vikramuniv.ac.in/files/wp-content/uploads/MCA-IV_DataMining18_Pincer_Search_Algorithm_Keerti_Dixit.pdf
Pincer Search and Maximal frequent set
◼ A maximal frequent set is a frequent itemset all of whose proper supersets are infrequent (its proper subsets are, by the Apriori property, automatically frequent).
◼ Maximum Frequent Set (MFS) is a collection of all the
maximal frequent itemsets. It's like a master list containing all
the biggest groups of items that are frequent.
◼ The MFS acts as a boundary between the groups of items that are
popular and the ones that aren't. Everything in the MFS is
frequent, and everything outside of it is not.
◼ Instead of trying to find all the frequent itemsets, we focus on finding
the MFS only. From the MFS, we generate all other frequent itemsets.
◼ Once MFS is ready, we can get the count of all frequent items by
scanning the transaction database just once. No need for multiple
database scan for item count as is done in Apriori
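A short sketch of that last point: every frequent itemset can be enumerated as a non-empty subset of the maximal sets in the MFS (a single database scan would then attach the support counts). The MFS value used here is the one derived in the worked example later in this section.

from itertools import combinations

MFS = [frozenset({1, 2, 3, 4})]                  # maximal frequent itemsets

frequent = set()
for m in MFS:
    for r in range(1, len(m) + 1):
        frequent.update(frozenset(c) for c in combinations(m, r))

print(sorted(frequent, key=lambda s: (len(s), sorted(s))))   # 15 itemsets for one 4-item maximal set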
Pincer Search Method
◼ The key concept behind Pincer-Search is the bi-directional
exploration of the search space.
◼ The top-down search starts with the largest possible
itemsets and gradually prunes them down
◼ The bottom-up search begins with the smallest
itemsets and expands them.
◼ The information gleaned from one direction is shared with the
other, allowing for computationally effective pruning of
candidate itemsets.
◼ Pincer search uses the Apriori algorithm for bottom-up search
approach to identify frequent itemsets of size-1, size-2 and so
on in sequence.
Pincer Search Method
◼ Pincer search maintains two special data structures
◼ MFS – Maximum Frequent Set
◼ MFCS, the Maximum Frequent Candidate Set.
◼ The MFCS efficiently identifies maximal frequent item sets of
large length. It starts with a single set of all the items, from
which subsets of frequent item sets are generated by a top-
down method.
◼ The Maximum Frequent set (MFS) comprises all maximally
frequent itemsets. It starts as a null set and is constructed
bottom-up.
◼ When the algorithm terminates, MFS = MFCS.
◼ We generate the subsets of all the sets in MFS - they all will be
frequent.
Pincer Search – A simple Example …
◼ Consider the following Problem
◼ Items 1, 2, 3, 4, 5.
◼ Transactions: {1, 3}, {1, 2}, {1, 2, 3, 4}
◼ Minimum Support: 0.5
◼ Initially, MFCS = {1, 2, 3, 4, 5}; MFS ={}.
◼ In the first pass, {1, 2, 3, 4, 5} is the candidate for the top-
down search.
◼ All 1-item itemsets are candidates for the bottom-up search
… Pincer Search – A simple Example
◼ Bottom-up:
◼ First pass: consider all 1-itemsets. Do support counting. We
observe that itemset {5} is not frequent.
◼ Second pass:
◼ Eliminate the supersets of {5} from further processing -
the itemsets {1, 5}, {2, 5}, {3, 5}, {4, 5} are discarded.
◼ The itemsets {1, 2}, {1, 3}, {1, 4}, {2, 3}, {2, 4}, and
{3, 4} are candidates for further processing.
… Pincer Search – A simple Example
◼ Top down:
◼ From the bottom-up search, itemset {5} was found infrequent.
◼ So, eliminate the supersets of {5}: MFCS = {1, 2, 3, 4}.
◼ Do support counting. {1, 2, 3, 4} is discovered to be frequent.
◼ Bottom up:
◼ From the top-down search, we found that {1, 2, 3, 4} is
frequent
◼ So all the subsets of {1, 2, 3, 4} must be frequent and they
need not be examined further.
◼ {1, 2, 3, 4} is frequent and all of its proper supersets are infrequent, so it is a maximal frequent set. Hence MFS = {{1, 2, 3, 4}}.
… Pincer Search – A simple Example
◼ Top down:
◼ MFCS = {1, 2, 3, 4}. This is discovered to be frequent.
◼ Bottom up:
◼ MFS = {1, 2, 3, 4}
◼ MFCS = MFS. The program terminates.
◼ There is one Frequent 4 item itemset, MFS = {1, 2, 3, 4}.
◼ The subsets of this MFS are also frequent. This means:-
◼ Frequent 4 itemsets: {1, 2, 3, 4}.
◼ Frequent 3 itemsets: {1, 2, 3}, {1, 2, 4}, {1, 3, 4}, {2, 3, 4}.
◼ Frequent 2 itemsets:{1, 2}, {1, 3}, {1, 4}, {2, 3}, {2, 4}, {3, 4}
◼ Frequent 1 itemsets:{1}, {2}, {3}, {4}
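A minimal sketch of the top-down MFCS maintenance used in this example (prune_mfcs is an illustrative name): whenever the bottom-up pass finds an itemset infrequent, every MFCS member containing it is split so that no remaining candidate contains it, and only maximal sets are kept.

def prune_mfcs(mfcs, infrequent):
    S = frozenset(infrequent)
    new = set()
    for m in mfcs:
        if not S <= m:
            new.add(m)                 # untouched: it does not contain S
            continue
        for s in S:
            new.add(m - {s})           # drop one item of S at a time
    return {m for m in new if not any(m < other for other in new)}   # keep only maximal sets

mfcs = {frozenset({1, 2, 3, 4, 5})}
mfcs = prune_mfcs(mfcs, {5})           # the bottom-up pass found {5} infrequent
print(mfcs)                            # {frozenset({1, 2, 3, 4})}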
4. Dynamic Itemset Counting Algorithm
Dynamic Itemset Counting
for Frequent Item Set Discovery
◼ Describe the working of dynamic itemset counting technique
with suitable example. Specify when to move an itemset from
dashed structures to solid structures. (8)
◼ dynamic itemset counting technique (4)
DYNAMIC ITEMSET COUNTING – FEATURES
◼ Transaction-based Updates: Support counts of itemsets are updated
dynamically as each transaction is processed. Instead of re-evaluating all
transactions, only the relevant ones are considered for updating support
counts.
DIC PROCEDURE …
An itemset lattice contains all the possible itemsets for a
transaction database. Each itemset in the lattice points to all of its
supersets. When represented graphically, an itemset lattice can
help us to understand the concepts behind the DIC algorithm.
Itemsets are marked in four different ways as they are counted:
◼ Dashed circle: suspected infrequent itemset - an itemset we
are still counting that is below min_sup
◼ Dashed box: suspected frequent itemset - an itemset we are
still counting that exceeds min_sup
◼ Solid box: confirmed frequent itemset - an itemset we have
finished counting and exceeds the support threshold min_sup
◼ Solid circle: confirmed infrequent itemset - we have finished
counting, and it is below min_sup
DIC PROCEDURE …
1. Initialization:
◼ Start with an empty itemset marked with a solid square.
◼ Mark all 1-itemsets with dashed circles.
2. Iterative Counting:
While there are still dashed itemsets:
◼ Read M transactions at a time from the dataset.
◼ Increase the count of itemsets marked with dashes if they
are present in the transaction.
◼ Update markings based on count thresholds (continued below).
… DIC PROCEDURE (Step 2 continued)
◼ On updating the markings:
If a dashed circle's count exceeds the minimum support
threshold (min_sup),
◼ Turn it into a dashed square.
◼ Check the immediate supersets: If any immediate superset
has all its subsets as squares (either solid or dashed), mark
that superset as a dashed circle and update its count.
◼ Completion:
Once a dashed itemset has been counted through all the transactions in the database, make it solid and stop counting it.
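The whole procedure can be sketched in Python as follows. This is a simplified sketch under stated assumptions (M is taken to divide the number of transactions, and plain dictionaries stand in for the dashed/solid markings); all names are illustrative.

def dic(transactions, min_sup_count, M):
    n = len(transactions)
    items = sorted({i for t in transactions for i in t})
    # markings: 'dashed-circle', 'dashed-square', 'solid-circle', 'solid-square'
    state = {frozenset([i]): 'dashed-circle' for i in items}
    count = {s: 0 for s in state}
    seen = {s: 0 for s in state}            # transactions examined since counting began
    pos = 0
    while any(st.startswith('dashed') for st in state.values()):
        block = [transactions[(pos + j) % n] for j in range(M)]   # read M transactions at a time
        pos = (pos + M) % n
        for t in block:
            t = frozenset(t)
            for s, st in state.items():
                if st.startswith('dashed'):
                    seen[s] += 1
                    if s <= t:
                        count[s] += 1
        # dashed circle -> dashed square once the counter reaches min_sup_count
        for s, st in state.items():
            if st == 'dashed-circle' and count[s] >= min_sup_count:
                state[s] = 'dashed-square'
        # start counting an immediate superset once all of its subsets are squares
        for a in [s for s, st in state.items() if st.endswith('square')]:
            for i in items:
                cand = a | {i}
                if i in a or cand in state:
                    continue
                if all(state.get(cand - {x}, '').endswith('square') for x in cand):
                    state[cand], count[cand], seen[cand] = 'dashed-circle', 0, 0
        # a dashed itemset counted over one full pass becomes solid
        for s, st in list(state.items()):
            if st.startswith('dashed') and seen[s] >= n:
                state[s] = 'solid-square' if count[s] >= min_sup_count else 'solid-circle'
    return {s: count[s] for s, st in state.items() if st == 'solid-square'}

# The four-transaction illustration that follows (T4 contains none of A, B, C): M = 2, min-sup count = 1
print(dic([{'A', 'B'}, {'A'}, {'B', 'C'}, set()], min_sup_count=1, M=2))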
DIC – ILLUSTRATION …

Txn   Itemsets              A   B   C
T1    A, B                  1   1   0
T2    A                     1   0   0
T3    B, C                  0   1   1
T4    (none of A, B, C)     0   0   0

https://www.youtube.com/watch?v=SLhLJZK6KaE
https://www2.cs.uregina.ca/~dbd/cs831/notes/itemsets/DIC.html
… DIC – ILLUSTRATION …
◼ M = 2 (read two transactions at a time)
◼ Counters: A=0, B=0, C=0
◼ Empty itemset marked with a solid box
◼ All 1-itemsets marked with dashed circles
◼ Min-Sup = 25%; min-sup count = 4 × 0.25 = 1
After M transactions are read:
Counters: A=2, B=1, C=0, AB=0. The empty itemset is marked as a solid box. A and B are >= min_sup: mark them with dashed squares. Mark ‘AB’, their immediate superset, with a dashed circle. Mark ‘C’ with a dashed circle.

After 2M transactions are read:
Counters: A=2, B=2, C=1, AB=0, AC=0, BC=0. By now, the entire database has been read once. A, B, and C are changed to solid squares as their support >= min_sup. Counters are added for AC and BC.

After 3M transactions are read:
Counters: A=2, B=2, C=1, AB=1, AC=0, BC=0. AB has been counted through all transactions – change it from a dashed circle to a solid square. BC, marked earlier with a dashed circle as an immediate superset of the frequent itemsets B and C, is still being counted.

After 4M transactions are read:
Counters: A=2, B=2, C=1, AB=1, AC=0, BC=1. The entire database has been read once more. AC and BC have been counted through all the transactions; change their marks from dashed to solid. ABC is never counted, as AC is not frequent.
… DIC – ILLUSTRATION - Completion
Completion: Once every dashed itemset has been counted through all the transactions in the database, it is made solid and counting stops.
Extras
FP Growth Python Code

import pandas as pd
from mlxtend.frequent_patterns import fpgrowth
from mlxtend.frequent_patterns import association_rules

# Given transactions
data = [['A', 'B'],
        ['A', 'B', 'C'],
        ['A', 'B', 'C', 'E'],
        ['A', 'B', 'C', 'E'],
        ['A', 'B', 'D'],
        ['A', 'C'],
        ['A', 'C'],
        ['B', 'C'],
        ['B', 'D']]

# Create a list of all unique items
items = sorted(set(item for transaction in data for item in transaction))

# Convert transactions into a DataFrame with binary representation
df = pd.DataFrame([{item: (item in transaction) for item in items} for transaction in data])

# Mine the frequent itemsets (min_support assumed here; the slide shows only part of the result)
frequent_itemsets = fpgrowth(df, min_support=0.2, use_colnames=True)
print(frequent_itemsets)

Output:
     support  itemsets
0   0.777778  (B)
1   0.777778  (A)
2   0.666667  (C)
3   0.222222  (E)
4   0.222222  (D)
5   0.555556  (A, B)
6   0.555556  (A, C)
7   0.444444  (C, B)
8   0.333333  (A, C, B)
9   0.222222  (E, C)
10  0.222222  (A, E)
11  0.222222  (E, B)
12  0.222222  (A, E, C)
◼ Once both A and D are determined frequent, the counting of AD begins
◼ Once all length-2 subsets of BCD are determined frequent, the counting of BCD begins
◼ (Figure: itemset lattice over {A, B, C, D} – {}, 1-itemsets, 2-itemsets, 3-itemsets, ABCD – comparing when Apriori and DIC start counting itemsets as transactions are read.)

S. Brin, R. Motwani, J. D. Ullman, and S. Tsur. Dynamic itemset counting and implication rules for market basket data. SIGMOD '97.
Partitioning Algorithm Pseudocode