04 Frequent Patterns Analysis
Lecture 4: Frequent Patterns Analysis
Data Mining 1
Frequent Itemsets
Definition: Frequent Itemset
– Itemset: a collection of one or more items, e.g. {Bread, Milk, Diaper}.
– Support count: the number of transactions that contain a given itemset.
– Frequent itemset: an itemset whose support is greater than or equal to a minsup threshold.

TID  Items
1    Bread, Milk
2    Bread, Diaper, Coffee, Eggs
3    Milk, Diaper, Coffee, Coke
4    Bread, Milk, Diaper, Coffee
5    Bread, Milk, Diaper, Coke

Example: the support count of {Bread, Milk, Diaper} is 2 (transactions 4 and 5).
Definition: Association Rule
Association rule:
– An implication expression of the form X → Y, where X and Y are itemsets.
– Example: {Milk, Diaper} → {Coffee}

Rule evaluation metrics:
– Support (s): fraction of transactions that contain both X and Y.
– Confidence (c): measures how often items in Y appear in transactions that contain X.

TID  Items
1    Bread, Milk
2    Bread, Diaper, Coffee, Eggs
3    Milk, Diaper, Coffee, Coke
4    Bread, Milk, Diaper, Coffee
5    Bread, Milk, Diaper, Coke

Example: {Milk, Diaper} → {Coffee}
s = support count(Milk, Diaper, Coffee) / |T| = 2/5 = 0.4
c = support count(Milk, Diaper, Coffee) / support count(Milk, Diaper) = 2/3 ≈ 0.67
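Both metrics can be checked with a short computation. The sketch below (plain Python, not from the lecture) recomputes s and c for {Milk, Diaper} → {Coffee} on the five transactions above.

```python
# Recompute support and confidence for {Milk, Diaper} -> {Coffee}
# on the five-transaction example database.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Coffee", "Eggs"},
    {"Milk", "Diaper", "Coffee", "Coke"},
    {"Bread", "Milk", "Diaper", "Coffee"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def support_count(itemset, db):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in db if itemset <= t)

X, Y = {"Milk", "Diaper"}, {"Coffee"}
s = support_count(X | Y, transactions) / len(transactions)
c = support_count(X | Y, transactions) / support_count(X, transactions)
print(s)            # 0.4
print(round(c, 2))  # 0.67
```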
Association Rule Mining
Input: a set of transactions T over a set of items I.
Output: all itemsets with items in I having support ≥ a minsup threshold.
Find all rules X → Y with minimum support and confidence:
– Support (s) is the probability that a transaction contains X ∪ Y:
  s = P(X ∪ Y) = support count(X ∪ Y) / number of all transactions
– Confidence (c) is how often a transaction containing X also contains Y:
  c = support count(X ∪ Y) / support count(X)
Example

Tid  Items bought
10   Juice, Nuts, Diaper
20   Juice, Coffee, Diaper
30   Juice, Diaper, Eggs
40   Nuts, Eggs, Milk
50   Nuts, Coffee, Diaper, Eggs, Milk

(Figure: Venn diagram of customers who buy Diaper, customers who buy Coffee, and customers who buy both.)
Brute-force approach:
– Each itemset in the lattice is a candidate frequent itemset.
– Count the support of each candidate by scanning the database.

(Figure: itemset lattice over {A, B, C, D, E}, from the 1-itemsets A … E through the 2-itemsets AB … DE and 3-itemsets ABC … CDE, up to ABCDE.)
Illustration of the Apriori principle

(Figure: an itemset found to be frequent; all of its subsets are frequent as well.)
Illustration of the Apriori principle

(Figure: itemset lattice from null through the 1-itemsets A … E, 2-itemsets AB … DE, 3-itemsets ABC … CDE, up to ABCDE. One itemset is found to be infrequent, so all of its supersets are infrequent and are pruned from the lattice.)
Illustration of the Apriori principle

minsup = 3

TID  Items
1    Bread, Milk
2    Bread, Diaper, Coffee, Eggs
3    Milk, Diaper, Coffee, Coke
4    Bread, Milk, Diaper, Coffee
5    Bread, Milk, Diaper, Coke

Items (1-itemsets):
Item    Count
Bread   4
Coke    2
Milk    4
Coffee  3
Diaper  4
Eggs    1

Pairs (2-itemsets) (no need to generate candidates involving Coke or Eggs):
Itemset           Count
{Bread, Milk}     3
{Bread, Coffee}   2
{Bread, Diaper}   3
{Milk, Coffee}    2
{Milk, Diaper}    3
{Coffee, Diaper}  3

Triplets (3-itemsets) (no need to generate candidates involving {Bread, Coffee} or {Milk, Coffee}):
Itemset                 Count
{Bread, Milk, Diaper}   2

If every subset is considered: 2^6 = 64 candidates.
With support-based pruning: 6 + 6 + 1 = 13 candidates.
Count the support of each candidate by scanning the DB; eliminate candidates that are infrequent, leaving only the frequent itemsets.
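The level-wise counting sketched above can be written in a few lines. This is an illustrative implementation (not the lecture's code); it reproduces the frequent itemsets of the example for minsup = 3.

```python
from itertools import combinations

# Level-wise Apriori sketch on the five-transaction example, minsup = 3.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Coffee", "Eggs"},
    {"Milk", "Diaper", "Coffee", "Coke"},
    {"Bread", "Milk", "Diaper", "Coffee"},
    {"Bread", "Milk", "Diaper", "Coke"},
]
minsup = 3

def apriori(db, minsup):
    # L1: frequent 1-itemsets
    items = {i for t in db for i in t}
    L = {frozenset([i]): n for i in items
         if (n := sum(1 for t in db if i in t)) >= minsup}
    frequent = dict(L)
    k = 1
    while L:
        # Self-join frequent k-itemsets into (k+1)-candidates,
        # prune those with an infrequent k-subset, then count support.
        cands = {a | b for a in L for b in L if len(a | b) == k + 1}
        counts = {c: sum(1 for t in db if c <= t) for c in cands
                  if all(frozenset(s) in L for s in combinations(c, k))}
        L = {s: n for s, n in counts.items() if n >= minsup}
        frequent.update(L)
        k += 1
    return frequent

freq = apriori(transactions, minsup)
print(len(freq))   # 4 frequent items + 4 frequent pairs = 8
```

Note that {Bread, Milk, Diaper} is generated as a candidate but discarded: its count (2) is below minsup.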
Important Details of Apriori

Notation: Ck = candidate itemsets of size k; Lk = frequent itemsets of size k.

How to generate candidates?
– Step 1: self-joining Lk. Join any two itemsets from Lk if they share the same (k−1)-prefix (i.e. they differ in the last item only).
– Step 2: pruning (omitted in most implementations). Prune any itemset from Ck+1 if any of its k-itemset subsets is not in Lk.

Example of candidate generation:
L3 = {abc, abd, acd, ace, bcd}
Self-joining L3 * L3:
– abcd from abc and abd
– acde from acd and ace
Pruning:
– acde is removed because ade is not in L3
C4 = {abcd}
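A minimal sketch of the join-and-prune step, with itemsets represented as sorted tuples (the function name `gen_candidates` is illustrative, not from the lecture):

```python
from itertools import combinations

# Candidate generation for the example L3 = {abc, abd, acd, ace, bcd}.
L3 = [("a", "b", "c"), ("a", "b", "d"), ("a", "c", "d"),
      ("a", "c", "e"), ("b", "c", "d")]

def gen_candidates(Lk):
    """Join + prune. Lk is a list of sorted k-tuples of items."""
    k = len(Lk[0])
    Lset = set(Lk)
    joined = set()
    for i, p in enumerate(Lk):
        for q in Lk[i + 1:]:
            # Join step: same (k-1)-prefix, itemsets differ in last item only.
            if p[:-1] == q[:-1]:
                joined.add(p[:-1] + tuple(sorted((p[-1], q[-1]))))
    # Prune step: every k-subset of a candidate must be in Lk.
    return {c for c in joined if all(s in Lset for s in combinations(c, k))}

print(gen_candidates(L3))   # {('a', 'b', 'c', 'd')}: acde is pruned (ade not in L3)
```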
Example: Generate Candidates Ck+1

(Figure: joining {a, b, c} and {a, b, d} yields the candidate {a, b, c, d}. In the running example, the 3rd scan counts the candidate C3 = {{B, C, E}} and yields L3 = {{B, C, E}} with support 2.)
The Apriori Algorithm: Example (2)

minsupp = 2

TID  Items
1    {A, B}
2    {B, C, D}
3    {A, B, C, D, E}
4    {A, D, E}
5    {A, B, C}
6    {A, B, C, D}
7    {B, C}
8    {A, B, C}
9    {A, B, D}
10   {B, C, E}

Candidates with their support counts (those below minsupp are discarded; the rest are frequent):
1-itemsets: A (7), B (9), C (7), D (5), E (3)
2-itemsets: AB (6), AC (4), AD (4), AE (2), BC (7), BD (4), BE (2), CD (3), CE (2), DE (2)
3-itemsets: ABC (4), ABD (3), ABE (1), ACD (2), ACE (1), ADE (2), BCD (3), BCE (2), BDE (1), CDE (1)
4-itemsets: ABCD (2), BCDE (1)

Save the frequent itemsets along with their supports for later!
Rule Generation
Given a frequent itemset S, generate candidate rules X → S − X from its non-empty proper subsets X. For example, for S = {A, B, C, D}, the single-consequent candidates are:
ABC → D, ABD → C, ACD → B, BCD → A.
If |S| = k, then there are 2^k − 2 candidate association rules
(ignoring S → ∅ and ∅ → S).
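The enumeration can be sketched directly; the helper name `candidate_rules` below is illustrative:

```python
from itertools import combinations

# Enumerate the 2^k - 2 candidate rules X -> S \ X for a frequent itemset S.
def candidate_rules(S):
    S = frozenset(S)
    rules = []
    for r in range(1, len(S)):              # non-empty X and non-empty S - X
        for X in combinations(sorted(S), r):
            rules.append((frozenset(X), S - frozenset(X)))
    return rules

rules = candidate_rules({"A", "B", "C", "D"})
print(len(rules))   # 2**4 - 2 = 14
```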
Different-Colored Cellular Phone Faceplates
Phone Faceplate Data in Binary Matrix Format
Item Sets with Support Count of At Least Two (20%)
Generating Association Rules
For the itemset {red, white, green}:

Rule 1: {red, white} → {green},
  conf = sup{red, white, green} / sup{red, white} = 2/4 = 50%
Rule 2: {red, green} → {white},
  conf = sup{red, white, green} / sup{red, green} = 2/2 = 100%
Rule 3: {white, green} → {red},
  conf = sup{red, white, green} / sup{white, green} = 2/2 = 100%
Rule 4: {red} → {white, green},
  conf = sup{red, white, green} / sup{red} = 2/6 = 33%
Rule 5: {white} → {red, green},
  conf = sup{red, white, green} / sup{white} = 2/7 = 29%
Rule 6: {green} → {red, white},
  conf = sup{red, white, green} / sup{green} = 2/2 = 100%

If the desired min_conf is 70%, Rules 2, 3, and 6 are selected.
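These six confidences can be re-derived from the support counts quoted in the rules themselves (the underlying faceplate tables are not reproduced here). A minimal check:

```python
# Support counts as quoted in Rules 1-6 above.
sup = {
    frozenset({"red"}): 6, frozenset({"white"}): 7, frozenset({"green"}): 2,
    frozenset({"red", "white"}): 4, frozenset({"red", "green"}): 2,
    frozenset({"white", "green"}): 2,
    frozenset({"red", "white", "green"}): 2,
}
L = frozenset({"red", "white", "green"})

# Keep the rules X -> L - X whose confidence meets min_conf = 70%.
selected = [(sorted(X), sorted(L - X), sup[L] / sup[X])
            for X in sup if X != L and sup[L] / sup[X] >= 0.7]
print(len(selected))   # 3 (the rules with confidence 100%)
```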
Final Results for Phone Faceplate Transactions
Example (3):
• Use Apriori to generate frequent itemsets for the following transaction database. Let min sup = 60% and min conf = 80%.

TID   Items bought
T100  {F, A, C, D, G, I, M, P}
T200  {A, B, C, F, L, M, O}
T300  {B, F, H, J, O, W}
T400  {B, C, K, S, P}
T500  {A, F, C, E, L, P, M, N}
C1 (candidate 1-itemsets with counts):
A 3, B 3, C 4, D 1, E 1, F 4, G 1, H 1, I 1, J 1, K 1, L 2, M 3, N 1, O 2, P 3, S 1, W 1

L1 (frequent; min sup count = 3):
A 3, B 3, C 4, F 4, M 3, P 3

C2:
AB 1, AC 3, AF 3, AM 3, AP 2, BC 2, BF 2, BM 1, BP 1, CF 3, CM 3, CP 3, FM 3, FP 2, MP 2

L2:
AC 3, AF 3, AM 3, CF 3, CM 3, CP 3, FM 3

C3:
ACF 3, ACM 3, AFM 3, CFM 3, CFP 2, CMP 2

L3:
ACF 3, ACM 3, AFM 3, CFM 3

C4:
ACFM 3

L4:
ACFM 3

C5 = ∅
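The tables above can be cross-checked by brute force, which is feasible at this size. The sketch below enumerates all itemsets and confirms that {A, C, F, M} is the largest frequent itemset at min sup = 3 (60% of 5 transactions).

```python
from itertools import combinations

# Brute-force check of the Example (3) result; min sup = 60% of 5 = 3.
db = [set("FACDGIMP"), set("ABCFLMO"), set("BFHJOW"),
      set("BCKSP"), set("AFCELPMN")]
minsup = 3
items = sorted(set().union(*db))

frequent = {}
for k in range(1, len(items) + 1):
    # Count every k-itemset and keep those meeting minsup.
    level = {c: n for c in combinations(items, k)
             if (n := sum(1 for t in db if set(c) <= t)) >= minsup}
    if not level:
        break                     # no frequent k-itemsets: stop
    frequent.update({frozenset(c): n for c, n in level.items()})

largest = max(frequent, key=len)
print(sorted(largest), frequent[largest])   # ['A', 'C', 'F', 'M'] 3
```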
• PHASE 2 OF APRIORI:
• For every frequent itemset L, we find all its non-empty proper subsets and create the association rules, as shown in the next example.
• Let L be {A, C, F, M}. For each subset Sx, consider the rule
  Rx: Sx → L − Sx
  with CONF(Rx) = SUPPORT(L) / SUPPORT(Sx).
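Under this formula, the fourteen rules for L = {A, C, F, M} can be checked mechanically from the support counts in the Example (3) tables. The sketch below assumes those counts:

```python
from itertools import combinations

# Support counts for L = {A, C, F, M} and its subsets, from Example (3).
support = {frozenset(s): n for s, n in [
    ("A", 3), ("C", 4), ("F", 4), ("M", 3),
    ("AC", 3), ("AF", 3), ("AM", 3), ("CF", 3), ("CM", 3), ("FM", 3),
    ("ACF", 3), ("ACM", 3), ("AFM", 3), ("CFM", 3), ("ACFM", 3),
]}
L = frozenset("ACFM")
minconf = 0.8

strong, weak = [], []
for r in range(1, len(L)):
    for X in map(frozenset, combinations(sorted(L), r)):
        conf = support[L] / support[X]        # CONF(X -> L - X)
        (strong if conf >= minconf else weak).append((X, L - X, conf))
print(len(strong), len(weak))   # 12 2
```

Twelve rules come out strong; only the two with a 1-item antecedent of support 4 (C and F) fall to 75%.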
R1: S1 → L − S1
CFM → A
CONF(R1) = 3/3 = 100% > 80%  STRONG

R2: S2 → L − S2
AFM → C
CONF(R2) = 3/3 = 100% > 80%  STRONG

R3: S3 → L − S3
ACM → F
CONF(R3) = 3/3 = 100% > 80%  STRONG

R4: S4 → L − S4
ACF → M
CONF(R4) = 3/3 = 100% > 80%  STRONG

R5: S5 → L − S5
FM → AC
CONF(R5) = 3/3 = 100% > 80%  STRONG
R6: S6 → L − S6
CM → AF
CONF(R6) = 3/3 = 100% > 80%  STRONG

R7: S7 → L − S7
CF → AM
CONF(R7) = 3/3 = 100% > 80%  STRONG

R8: S8 → L − S8
AM → CF
CONF(R8) = 3/3 = 100% > 80%  STRONG

R9: S9 → L − S9
AF → CM
CONF(R9) = 3/3 = 100% > 80%  STRONG

R10: S10 → L − S10
AC → FM
CONF(R10) = 3/3 = 100% > 80%  STRONG
R11: S11 → L − S11
M → ACF
CONF(R11) = 3/3 = 100% > 80%  STRONG

R12: S12 → L − S12
F → ACM
CONF(R12) = 3/4 = 75% < 80%  NOT STRONG

R13: S13 → L − S13
C → AFM
CONF(R13) = 3/4 = 75% < 80%  NOT STRONG

R14: S14 → L − S14
A → CFM
CONF(R14) = 3/3 = 100% > 80%  STRONG
Example (4):
Use Apriori to generate frequent itemsets for the following transaction database:
Let min sup = 20% and min conf = 70%.
Generating association rules from frequent itemsets
Example (5):
Use Apriori to generate frequent itemsets for the following transaction database:
Let min sup = 60% and min conf = 80%.
TID   Items bought
T100  E, K, M, N, O, Y
T200  D, E, K, N, O, Y
T300  A, E, K, M
T400  C, K, M, U, Y
T500  C, E, I, K, O