(03105430)
Prof. Dheeraj Kumar Singh, Assistant Professor
Information Technology Department
CHAPTER 5
Mining Frequent Patterns,
Associations, and Correlations
Market Basket Analysis
• Analyzes customer buying habits by finding associations between different
items that customers place in their “shopping baskets”.
• Helps retailers develop marketing strategies.
Example association rules:

Shoes ⇒ Jacket   (Support = 50%, Confidence = 66%)
Jacket ⇒ Shoes   (Support = 50%, Confidence = 100%)

Frequent Itemset    Support
{Shoes}             75%
{Shirt}             50%
{Jacket}            50%
{Shoes, Jacket}     50%
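Both measures can be computed directly from a transaction list. Below is a minimal Python sketch; the four transactions are hypothetical, chosen only so that they reproduce the supports and confidences shown above.

# Hypothetical basket database (an assumption for illustration).
transactions = [
    {"Shoes", "Jacket"},
    {"Shoes", "Jacket", "Shirt"},
    {"Shoes", "Shirt"},
    {"Hat"},
]

def support(itemset):
    # Fraction of transactions that contain every item in the itemset.
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    # support(A ∪ B) / support(A)
    return support(antecedent | consequent) / support(antecedent)

print(support({"Shoes", "Jacket"}))       # 0.5   -> support = 50%
print(confidence({"Shoes"}, {"Jacket"}))  # 0.666 -> confidence = 66%
print(confidence({"Jacket"}, {"Shoes"}))  # 1.0   -> confidence = 100%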
Generation of candidate itemsets and frequent itemsets

Database D (minimum support = 50%, i.e., a support count of at least 2):

TID   Items
100   1, 3, 4
200   2, 3, 5
300   1, 2, 3, 5
400   2, 5

Scan D to count each candidate 1-itemset → C1:

itemset   sup.
{1}       2
{2}       3
{3}       3
{4}       1
{5}       3

Compare each candidate's support with the minimum support → L1 ({4} is pruned):

itemset   sup.
{1}       2
{2}       3
{3}       3
{5}       3

Join L1 with itself to generate C2, then scan D for the counts:

itemset   sup.
{1, 2}    1
{1, 3}    2
{1, 5}    1
{2, 3}    2
{2, 5}    3
{3, 5}    2

Keep the candidates meeting the minimum support → L2:

itemset   sup.
{1, 3}    2
{2, 3}    2
{2, 5}    3
{3, 5}    2

Generate C3 = {{2, 3, 5}} from L2 and scan D → L3:

itemset     sup.
{2, 3, 5}   2
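The tables above can be verified by brute force: enumerate every possible itemset over D and keep those whose support count is at least 2 (50%). The sketch below is a Python verification aid, not the Apriori algorithm itself.

from itertools import combinations

D = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
items = sorted({i for t in D for i in t})
for k in range(1, len(items) + 1):
    # Keep every k-itemset whose support count meets the 50% threshold.
    Lk = {c: n for c in combinations(items, k)
          if (n := sum(set(c) <= t for t in D)) >= 2}
    if Lk:
        print(f"L{k}: {Lk}")
# L1: {(1,): 2, (2,): 3, (3,): 3, (5,): 3}
# L2: {(1, 3): 2, (2, 3): 2, (2, 5): 3, (3, 5): 2}
# L3: {(2, 3, 5): 2}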
Apriori Algorithm
Input: D, a database of transactions; min_sup, the minimum support count threshold.
Output: L, frequent itemsets in D.
Method:
(1) L1 = find_frequent_1-itemsets(D);
(2) for (k = 2; Lk-1 ≠ ∅; k++) {
(3)     Ck = apriori_gen(Lk-1);
(4)     for each transaction t ∈ D {    // scan D for counts
(5)         Ct = subset(Ck, t);         // get the subsets of t that are candidates
(6)         for each candidate c ∈ Ct
(7)             c.count++;
(8)     }
(9)     Lk = {c ∈ Ck | c.count ≥ min_sup};
(10) }
(11) return L = ∪k Lk;
Apriori Algorithm
procedure apriori_gen(Lk-1: frequent (k-1)-itemsets)
(1) for each itemset l1 ∈ Lk-1
(2)     for each itemset l2 ∈ Lk-1
(3)         if (l1[1] = l2[1]) ∧ (l1[2] = l2[2]) ∧ ... ∧ (l1[k-2] = l2[k-2]) ∧ (l1[k-1] < l2[k-1]) then {
(4)             c = l1 ⋈ l2;    // join step: generate candidates
(5)             if has_infrequent_subset(c, Lk-1) then
(6)                 delete c;    // prune step: remove unfruitful candidate
(7)             else add c to Ck;
(8)         }
(9) return Ck;
Apriori Algorithm
procedure has_infrequent_subset(c: candidate k-itemset; Lk-1: frequent (k-1)-itemsets)
(1) for each (k-1)-subset s of c
(2)     if s ∉ Lk-1 then
(3)         return TRUE;
(4) return FALSE;
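Putting the three procedures together, the following is a minimal Python sketch of Apriori (the tuple-based itemset representation and helper signatures are implementation choices, not from the slides). Run on the example database D from the earlier slide, it reproduces L1, L2, and L3.

from itertools import combinations

D = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
min_sup = 2  # 50% of 4 transactions

def has_infrequent_subset(c, L_prev):
    # Prune test: any (k-1)-subset of c missing from L(k-1) means c cannot be frequent.
    return any(s not in L_prev for s in combinations(c, len(c) - 1))

def apriori_gen(L_prev):
    # Join step: combine l1 and l2 when their first k-2 items agree
    # and l1's last item sorts before l2's last item.
    Ck = set()
    for l1 in L_prev:
        for l2 in L_prev:
            if l1[:-1] == l2[:-1] and l1[-1] < l2[-1]:
                c = l1 + (l2[-1],)
                if not has_infrequent_subset(c, L_prev):  # prune step
                    Ck.add(c)
    return Ck

def apriori(D, min_sup):
    items = sorted({i for t in D for i in t})
    Lk = {(i,) for i in items if sum(i in t for t in D) >= min_sup}  # L1
    L = []
    while Lk:
        L.append(Lk)
        # Count each candidate's support by scanning D, then keep the frequent ones.
        Lk = {c for c in apriori_gen(Lk)
              if sum(set(c) <= t for t in D) >= min_sup}
    return L

for k, level in enumerate(apriori(D, min_sup), start=1):
    print(f"L{k}:", sorted(level))
# L1: [(1,), (2,), (3,), (5,)]
# L2: [(1, 3), (2, 3), (2, 5), (3, 5)]
# L3: [(2, 3, 5)]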
Generating Association Rules from Frequent Itemsets
confidence(A ⇒ B) = P(B | A) = support_count(A ∪ B) / support_count(A)
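For every frequent itemset S, each nonempty proper subset A yields a candidate rule A ⇒ (S − A), kept when its confidence meets a minimum threshold. A small Python sketch follows; the support counts are taken from the earlier worked example over database D, and min_conf = 0.6 is an assumed threshold for illustration.

from itertools import combinations

# Support counts from the worked Apriori example (itemsets over {2, 3, 5}).
support_count = {
    frozenset({2}): 3, frozenset({3}): 3, frozenset({5}): 3,
    frozenset({2, 3}): 2, frozenset({2, 5}): 3, frozenset({3, 5}): 2,
    frozenset({2, 3, 5}): 2,
}

def rules(S, min_conf):
    S = frozenset(S)
    for r in range(1, len(S)):
        for A in map(frozenset, combinations(S, r)):
            conf = support_count[S] / support_count[A]  # P(S-A | A)
            if conf >= min_conf:
                yield A, S - A, conf

for A, B, conf in rules({2, 3, 5}, min_conf=0.6):
    print(f"{set(A)} => {set(B)}  (confidence = {conf:.2f})")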
• In the dynamic itemset counting variation of Apriori, new candidate itemsets can be added at any start point of a database scan during mining.
• Even so, Apriori-based mining may need to repeatedly scan the database and check a large set of candidates by pattern matching, which motivates the FP-growth approach below.
FP-Growth Algorithm
Input: D, a database of transactions; min_sup, the minimum support count threshold.
Output: the complete set of frequent patterns.
• Build a compact data structure called the FP-tree using two passes over the data set.
Pass 1:
– Scan the data and find the support count for each item.
– Discard infrequent items.
– Sort the frequent items in decreasing order of their support.
Pass 2: Construct the Frequent Pattern tree.
– Create the root of the FP-tree and label it "null".
– Let the sorted frequent-item list in each transaction be [p|P], where p is the first element and P is the remaining list.
– Call insert_tree([p|P], T).
FP-Growth Algorithm
• insert_tree([p|P], T) procedure (see the Python sketch after this list):
– If T has a child N such that N.item-name = p.item-name,
– then increment N's count by 1;
– else create a new node N with count 1, link its parent to T, and link it to the nodes with the same item-name via the node-link structure.
– If P is nonempty, call insert_tree(P, N) recursively.
• The next step is to extract frequent itemsets directly from the FP-tree through tree traversal.
– The FP-tree is mined by calling FP-growth(FP-tree, null).
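A minimal Python sketch of the two-pass construction, with insert_tree written iteratively; the Node class and the build_fp_tree name are assumptions for illustration. The transaction list is the nine-transaction database used in the example that follows.

from collections import Counter

class Node:
    def __init__(self, item, parent):
        self.item, self.count, self.parent = item, 1, parent
        self.children = {}

def build_fp_tree(transactions, min_sup):
    # Pass 1: find each item's support, discard infrequent items,
    # and rank the frequent items in decreasing order of support.
    counts = Counter(i for t in transactions for i in t)
    order = sorted((i for i in counts if counts[i] >= min_sup),
                   key=lambda i: (-counts[i], i))
    rank = {i: r for r, i in enumerate(order)}
    # Pass 2: create the "null" root, then insert each transaction's
    # sorted frequent-item list [p|P] into the tree.
    root, header = Node(None, None), {}
    for t in transactions:
        node = root
        for p in sorted((i for i in t if i in rank), key=rank.get):
            if p in node.children:
                node.children[p].count += 1             # existing child N: increment
            else:
                child = Node(p, node)                   # new node with count 1
                node.children[p] = child
                header.setdefault(p, []).append(child)  # node-link structure
            node = node.children[p]
    return root, header

transactions = [["I1","I2","I5"], ["I2","I4"], ["I2","I3"], ["I1","I2","I4"],
                ["I1","I3"], ["I2","I3"], ["I1","I3"], ["I1","I2","I3","I5"],
                ["I1","I2","I3"]]
root, header = build_fp_tree(transactions, min_sup=2)
print({i: sum(n.count for n in ns) for i, ns in header.items()})
# Per-item totals match the header table: I2 7, I1 6, I3 6, I4 2, I5 2.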
FP-Growth Algorithm
procedure FP-growth(Tree, α)
(1) if Tree contains a single path P then
(2)     for each combination (denoted as β) of the nodes in path P
(3)         generate pattern β ∪ α with support count = minimum support count of the nodes in β;
(4) else for each ai in the header of Tree {
(5)     generate pattern β = ai ∪ α with support count = ai.support_count;
(6)     construct β's conditional pattern base and then β's conditional FP-tree Treeβ;
(7)     if Treeβ ≠ ∅ then
(8)         call FP-growth(Treeβ, β); }
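The compact sketch below follows the same recursion as FP-growth(Tree, α) but keeps each conditional pattern base as a list of (prefix-items, count) pairs rather than a physical tree; all names are assumptions for illustration. Using prefix lists trades the FP-tree's compression for brevity, but the patterns and support counts produced are the same.

from collections import Counter

def fp_growth(base, alpha, min_sup, results):
    # Count item supports within this conditional pattern base.
    counts = Counter()
    for items, n in base:
        for i in items:
            counts[i] += n
    for item, sup in counts.items():
        if sup < min_sup:
            continue
        beta = alpha | {item}
        results[frozenset(beta)] = sup  # generate pattern beta with its support
        # beta's conditional pattern base: the prefix preceding `item` in each entry.
        cond = [(items[:items.index(item)], n)
                for items, n in base
                if item in items and items.index(item) > 0]
        if cond:
            fp_growth(cond, beta, min_sup, results)

def mine(transactions, min_sup):
    counts = Counter(i for t in transactions for i in t)
    order = lambda i: (-counts[i], i)  # decreasing support, as in the FP-tree
    base = [(sorted((i for i in t if counts[i] >= min_sup), key=order), 1)
            for t in transactions]
    results = {}
    fp_growth(base, frozenset(), min_sup, results)
    return results

transactions = [["I1","I2","I5"], ["I2","I4"], ["I2","I3"], ["I1","I2","I4"],
                ["I1","I3"], ["I2","I3"], ["I1","I3"], ["I1","I2","I3","I5"],
                ["I1","I2","I3"]]
for itemset, sup in mine(transactions, 2).items():
    print(sorted(itemset), sup)  # includes ['I1', 'I2', 'I5'] 2 and ['I2', 'I4'] 2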
FP-Growth or frequent-pattern growth Algorithm
Transaction database D:

TID    List of item IDs
T100   I1, I2, I5
T200   I2, I4
T300   I2, I3
T400   I1, I2, I4
T500   I1, I3
T600   I2, I3
T700   I1, I3
T800   I1, I2, I3, I5
T900   I1, I2, I3

Item header table (items in decreasing order of support count, each with a node-link into the tree):

Item ID   Support count
I2        7
I1        6
I3        6
I4        2
I5        2

FP-tree (indentation shows parent-child structure; counts are shared-prefix counts):

null
  I2:7
    I1:4
      I5:1
      I4:1
      I3:2
        I5:1
    I4:1
    I3:2
  I1:2
    I3:2
FP-Growth or frequent-pattern growth Algorithm
[Figure: the conditional FP-tree associated with the conditional node I5 (null → I2:2 → I1:2) and the conditional FP-tree associated with the conditional node I4 (null → I2:2), each with its item header table.]

Item   Conditional Pattern Base          Conditional FP-tree   Frequent Patterns Generated
I5     {{I2, I1: 1}, {I2, I1, I3: 1}}    <I2: 2, I1: 2>        {I2, I5: 2}, {I1, I5: 2}, {I2, I1, I5: 2}
I4     {{I2, I1: 1}, {I2: 1}}            <I2: 2>               {I2, I4: 2}
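Each conditional pattern base in this table can be read directly off the ordered transactions: for every transaction containing the item, take the prefix that precedes it. A minimal Python sketch (the transactions are the running nine-transaction example, already sorted in decreasing support order):

transactions = [["I2","I1","I5"], ["I2","I4"], ["I2","I3"], ["I2","I1","I4"],
                ["I1","I3"], ["I2","I3"], ["I1","I3"], ["I2","I1","I3","I5"],
                ["I2","I1","I3"]]

def conditional_pattern_base(item):
    # Prefix path of `item` in each transaction that contains it.
    return [t[:t.index(item)] for t in transactions
            if item in t and t.index(item) > 0]

print(conditional_pattern_base("I5"))  # [['I2', 'I1'], ['I2', 'I1', 'I3']]
print(conditional_pattern_base("I4"))  # [['I2'], ['I2', 'I1']]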
FP-Growth or frequent-pattern growth Algorithm
[Figure: the conditional FP-tree associated with the conditional node I3. Its header table lists I2 (support 4) and I1 (support 4), and the tree has the branches null → I2:4 → I1:2 and null → I1:2. Mining this tree generates the frequent patterns {I2, I3: 4}, {I1, I3: 4}, and {I2, I1, I3: 2}.]
Correlation Analysis Using Lift

          Coffee   ¬Coffee   Total
Tea       15       5         20
¬Tea      75       5         80
Total     90       10        100

Confidence(Tea ⇒ Coffee) = P(Coffee | Tea) = 15/20 = 0.75
Support(Coffee) = P(Coffee) = 90/100 = 0.9
Lift(Tea ⇒ Coffee) = confidence(Tea ⇒ Coffee) / support(Coffee) = 0.75/0.9 = 0.8333 (< 1)
Since the lift is less than 1, the rule Tea ⇒ Coffee is negatively associated: a customer who buys tea is actually less likely to buy coffee than an average customer.
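A quick Python check of the lift computation, with the counts taken from the contingency table above.

n = 100
tea, tea_and_coffee, coffee = 20, 15, 90

confidence = tea_and_coffee / tea   # P(Coffee | Tea) = 0.75
support_coffee = coffee / n         # P(Coffee) = 0.9
lift = confidence / support_coffee  # 0.8333 < 1: negative association
print(f"lift(Tea => Coffee) = {lift:.4f}")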