Part 4: Mining Frequent Patterns

Data Mining: Concepts and Techniques
Modified for Introduction to Data Mining by Dr. Mohamed H. Farrag

Instructor: Dr. Mohamed H. Farrag    Course: Data Mining    Ch4: Association Analysis, Basic Concepts
Market Basket Analysis

• Given:
  • A database of customer transactions (e.g., shopping baskets), where each transaction is a set of items (e.g., products)
• Find:
  • Groups of items which are frequently purchased together

[Figure: a cash receipt viewed as a transaction, e.g. {A, B, C}]
Definition: Frequent Itemset

• Itemset
  —A collection of one or more items
    • Example: {Milk, Bread, Diaper}
  —k-itemset
    • An itemset that contains k items
• Support count (σ)
  —Frequency of occurrence of an itemset
  —E.g. σ({Milk, Bread, Diaper}) = 2
• Support (s)
  —Fraction of transactions that contain an itemset
  —E.g. s({Milk, Bread, Diaper}) = 2/5
• Frequent Itemset
  —An itemset whose support is greater than or equal to a minsup threshold

TID  Items
1    Bread, Milk
2    Bread, Diaper, Beer, Eggs
3    Milk, Diaper, Beer, Coke
4    Bread, Milk, Diaper, Beer
5    Bread, Milk, Diaper, Coke
Definition: Association Rule

• Association Rule
  —Find all the rules of the form X → Y, where X and Y are itemsets
  —With minimum confidence and support
  • Example: {Milk, Diaper} → {Beer}

TID  Items
1    Bread, Milk
2    Bread, Diaper, Beer, Eggs
3    Milk, Diaper, Beer, Coke
4    Bread, Milk, Diaper, Beer
5    Bread, Milk, Diaper, Coke

• Rule Evaluation Metrics
  —Support (s)
    • Fraction of transactions that contain both X and Y: s = P(X ∪ Y)
  —Confidence (c)
    • Measures how often items in Y appear in transactions that contain X: c = P(Y | X)

Example: {Milk, Diaper} → {Beer}
  s = σ({Milk, Diaper, Beer}) / |T| = 2/5 = 0.4
  c = σ({Milk, Diaper, Beer}) / σ({Milk, Diaper}) = 2/3 = 0.67
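The support and confidence definitions above can be checked directly against the five-transaction table. A minimal Python sketch (the helper names `sigma`, `support`, and `confidence` are illustrative, not from the slides):

```python
# Transaction database from the slide (TIDs 1-5).
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def sigma(itemset, db):
    """Support count: number of transactions that contain the itemset."""
    return sum(1 for t in db if itemset <= t)

def support(X, Y, db):
    """s(X -> Y) = sigma(X u Y) / |T|."""
    return sigma(X | Y, db) / len(db)

def confidence(X, Y, db):
    """c(X -> Y) = sigma(X u Y) / sigma(X)."""
    return sigma(X | Y, db) / sigma(X, db)

X, Y = {"Milk", "Diaper"}, {"Beer"}
print(support(X, Y, transactions))     # 0.4, matching s = 2/5
print(confidence(X, Y, transactions))  # 0.666..., matching c = 2/3
```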
Association Rules: Basics
• Typical representation formats for association rules:
—"IF buys diapers, THEN buys beer in 60% of the cases. Diapers and
beer are bought together in 0.5% of the rows in the database."
Association Rule Mining Task
Association Rules: Basics
• Given: (1) a database of transactions, (2) each transaction is a list of items bought (purchased by a customer in a visit)

TID  Items Bought
100  A, B, C
200  A, C
400  A, D
500  B, E, F

With minsup = 50%: {A}, {B} and {C} are frequent single items; {D}, {E} and {F} are not; {A, C} is the only frequent pair.

• Find: all rules with minimum support and confidence
• If min. support = 50% and min. confidence = 50%, then
  A → C [50%, 66.6%], C → A [50%, 100%]
Mining Association Rules
Observations:
• All six rules derived from {Milk, Diaper, Beer} are binary partitions of the same itemset:
  {Milk, Diaper} → {Beer}, {Milk, Beer} → {Diaper}, {Diaper, Beer} → {Milk},
  {Milk} → {Diaper, Beer}, {Diaper} → {Milk, Beer}, {Beer} → {Milk, Diaper}
• Rules originating from the same itemset have identical support but can have different confidence
• Thus, we may decouple the support and confidence requirements
Mining Association Rules
• Two-step approach:
1. Frequent Itemset Generation
—Generate all itemsets whose support ≥ minsup
2. Rule Generation
—Generate high confidence rules from each frequent itemset,
where each rule is a binary partitioning of a frequent itemset
FrequentItemset Generation
Given d items, there are 2^d possible candidate itemsets
Frequent Itemset Generation
• Brute-force approach:
—Each itemset in the lattice is a candidate frequent itemset
—Count the support of each candidate by scanning the database
[Figure: each transaction is matched against the list of candidate itemsets]

TID  Items
1    Bread, Milk
2    Bread, Diaper, Beer, Eggs
3    Milk, Diaper, Beer, Coke
4    Bread, Milk, Diaper, Beer
5    Bread, Milk, Diaper, Coke
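The brute-force approach above can be sketched in a few lines of Python: enumerate every nonempty subset of the d items as a candidate and count its support with a full database scan (the function name `brute_force_frequent` is illustrative):

```python
from itertools import combinations

# Transaction database from the slide.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def brute_force_frequent(db, minsup):
    """Treat every nonempty itemset in the lattice as a candidate (2^d - 1
    of them) and count each one's support by scanning the whole database."""
    items = sorted(set().union(*db))
    frequent = {}
    for k in range(1, len(items) + 1):
        for cand in combinations(items, k):
            count = sum(1 for t in db if set(cand) <= t)
            if count / len(db) >= minsup:
                frequent[cand] = count
    return frequent

freq = brute_force_frequent(transactions, minsup=0.6)
# At minsup 0.6 (support count >= 3): four frequent items and four frequent pairs.
```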
Computational Complexity
• If there are d unique items, the total number of possible association rules is:

  R = Σ_{k=1}^{d-1} [ C(d, k) × Σ_{j=1}^{d-k} C(d-k, j) ] = 3^d − 2^(d+1) + 1

• If d = 6, R = 602 rules
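The rule count can be verified numerically; a small Python check of the double sum against the closed form 3^d − 2^(d+1) + 1:

```python
from math import comb

def num_rules(d):
    """R = sum_{k=1}^{d-1} C(d,k) * sum_{j=1}^{d-k} C(d-k,j):
    choose k antecedent items, then j consequent items from the rest."""
    return sum(comb(d, k) * sum(comb(d - k, j) for j in range(1, d - k + 1))
               for k in range(1, d))

print(num_rules(6))        # 602, as on the slide
print(3**6 - 2**7 + 1)     # closed form gives the same value
```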
Frequent Itemset Generation Strategies
Reducing Number of Candidates
• Apriori principle:
—If an itemset is frequent, then all of its subsets must
also be frequent
Illustrating Apriori Principle
Association Rule Generation
• Association rule mining is a two-step process:
STEP 1: Find the frequent itemsets: the sets of
items that have minimum support.
- So-called Apriori trick: a subset of a frequent itemset must also be a
  frequent itemset:
  • i.e., if {A, B} is a frequent itemset, both {A} and {B} should be
    frequent itemsets
—Iteratively find frequent itemsets with size from 1 to k (k-itemset)
Frequent Sets with Apriori
• Join Step: Ck is generated by joining Lk-1 with itself
• Prune Step: Any (k-1)-itemset that is not frequent cannot be a subset
  of a frequent k-itemset

Pseudo-code:
  Ck: candidate itemsets of size k; Lk: frequent itemsets of size k
  L1 = {frequent items};
  for (k = 1; Lk != ∅; k++) do begin
    Ck+1 = candidates generated from Lk;
    for each transaction t in database do
      increment the count of all candidates in Ck+1 that are contained in t;
    Lk+1 = candidates in Ck+1 with min_support;
  end
  return ∪k Lk;
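The pseudo-code can be turned into a runnable sketch. This is an illustrative Python rendering (not the textbook's reference implementation); run on the earlier five-transaction table with a minimum support count of 3 it finds four frequent items and four frequent pairs:

```python
from itertools import combinations

# Transaction database from the earlier slides (TIDs 1-5).
transactions = [
    frozenset({"Bread", "Milk"}),
    frozenset({"Bread", "Diaper", "Beer", "Eggs"}),
    frozenset({"Milk", "Diaper", "Beer", "Coke"}),
    frozenset({"Bread", "Milk", "Diaper", "Beer"}),
    frozenset({"Bread", "Milk", "Diaper", "Coke"}),
]

def apriori(db, minsup_count):
    """Level-wise Apriori following the pseudo-code: generate C(k+1) from
    L(k), prune by the Apriori principle, then count support in one scan."""
    counts = {}
    for t in db:
        for item in t:
            s = frozenset([item])
            counts[s] = counts.get(s, 0) + 1
    Lk = {s for s, n in counts.items() if n >= minsup_count}
    result = {s: counts[s] for s in Lk}
    k = 1
    while Lk:
        # Join step: unions of two frequent k-itemsets that form (k+1)-itemsets
        cands = {a | b for a in Lk for b in Lk if len(a | b) == k + 1}
        # Prune step: drop candidates having an infrequent k-subset
        cands = {c for c in cands
                 if all(frozenset(s) in Lk for s in combinations(c, k))}
        # Support counting: a single pass over the database
        ccounts = {c: sum(1 for t in db if c <= t) for c in cands}
        Lk = {c for c, n in ccounts.items() if n >= minsup_count}
        result.update({c: ccounts[c] for c in Lk})
        k += 1
    return result

frequent = apriori(transactions, minsup_count=3)
```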
Apriori Algorithm
—Fk: frequent k-itemsets
—Lk: candidate k-itemsets
• Algorithm
—Let k=1
—Generate Fl = {frequent 1-itemsets}
—Repeat until Fk is empty
• Candidate Generation: Generate Lk+1 from Fk
• Candidate Pruning: Prune candidate itemsets in Lk+1 containing
subsets of length k that are infrequent
• Support Counting: Count the support of each candidate in Lk+1 by
scanning the DB
• Candidate Elimination: Eliminate candidates in Lk+1 that are
  infrequent, leaving only those that are frequent ⇒ Fk+1
Apriori Candidate Generation
• The Apriori principle:
Any subset of a frequent itemset must be frequent
• L3={abc, abd, acd, ace, bcd}
• Self-joining: L3 * L3
—abcd from abc and abd
—acde from acd and ace
• Pruning:
—acde is removed because ade is not in L3
• C4={abcd}
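The join and prune steps on this L3 can be traced in Python (itemsets are kept as sorted tuples; the function names are mine):

```python
from itertools import combinations

L3 = [("a", "b", "c"), ("a", "b", "d"), ("a", "c", "d"),
      ("a", "c", "e"), ("b", "c", "d")]

def self_join(Lk):
    """Join two k-itemsets that share their first k-1 items."""
    return [p + (q[-1],) for p in Lk for q in Lk
            if p[:-1] == q[:-1] and p[-1] < q[-1]]

def prune(candidates, Lk):
    """Drop candidates with any k-subset missing from Lk (Apriori principle)."""
    Lk = set(Lk)
    return [c for c in candidates
            if all(s in Lk for s in combinations(c, len(c) - 1))]

cands = self_join(L3)  # abcd (from abc, abd) and acde (from acd, ace)
C4 = prune(cands, L3)  # acde is removed because ade is not in L3
```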
Candidate Generation: Fk-1 x Fk-1 Method
• Merge two frequent (k-1)-itemsets if their first (k-2) items are identical
• F3 = {ABC,ABD,ABE,ACD,BCD,BDE,CDE}
—Merge(ABC, ABD) = ABCD
—Merge(ABC, ABE) = ABCE
—Merge(ABD, ABE) = ABDE
Candidate Pruning
• Candidate pruning
—Prune ABCE because ACE and BCE are infrequent
—Prune ABDE because ADE is infrequent
Alternate Fk-1 x Fk-1 Method
• Merge two frequent (k-1)-itemsets if the last (k-2) items of the first
  are identical to the first (k-2) items of the second.
• F3 = {ABC, ABD, ABE, ACD, BCD, BDE, CDE}
—Merge(ABC, BCD) = ABCD
—Merge(ABD, BDE) = ABDE
—Merge(ACD, CDE) = ACDE
—Merge(BCD, CDE) = BCDE
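The alternate merge condition can likewise be sketched. Representing itemsets as sorted strings, merging where the last k−2 items of one equal the first k−2 of the other reproduces the four merges above; subset pruning then leaves only ABCD, since ADE, ACE, and BCE are missing from F3:

```python
from itertools import combinations

F3 = ["ABC", "ABD", "ABE", "ACD", "BCD", "BDE", "CDE"]

# Merge step: p's last k-2 items equal q's first k-2 items.
merged = sorted({p + q[-1] for p in F3 for q in F3 if p[1:] == q[:-1]})
# -> ABCD, ABDE, ACDE, BCDE

# Prune step: every 3-subset of a candidate must itself be in F3.
F3set = set(F3)
C4 = [c for c in merged
      if all("".join(s) in F3set for s in combinations(c, 3))]
# -> only ABCD survives
```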
Candidate Pruning for Alternate Fk-1 x Fk-1 Method
Rule Generation
Rule Generation for Apriori Algorithm
[Figure: lattice of rules from one frequent itemset — once a low-confidence rule is found, all rules below it in the lattice are pruned]
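Rule generation from a single frequent itemset can be sketched as below. For brevity this enumerates all candidate rules and filters by confidence rather than walking the lattice with pruning as the figure shows; the support counts are taken from the earlier five-transaction table:

```python
from itertools import combinations

# Support counts over the slide's five-transaction database
# (for the {Milk, Diaper, Beer} example).
counts = {
    frozenset({"Milk"}): 4, frozenset({"Diaper"}): 4, frozenset({"Beer"}): 3,
    frozenset({"Milk", "Diaper"}): 3, frozenset({"Milk", "Beer"}): 2,
    frozenset({"Diaper", "Beer"}): 3,
    frozenset({"Milk", "Diaper", "Beer"}): 2,
}

def rules_from_itemset(itemset, counts, minconf):
    """All rules X -> (itemset - X) whose confidence
    sigma(itemset) / sigma(X) meets minconf."""
    out = []
    whole = frozenset(itemset)
    for r in range(1, len(itemset)):
        for ante in combinations(sorted(itemset), r):
            X = frozenset(ante)
            conf = counts[whole] / counts[X]
            if conf >= minconf:
                out.append((set(X), set(whole - X), round(conf, 2)))
    return out

rules = rules_from_itemset({"Milk", "Diaper", "Beer"}, counts, minconf=0.6)
# 4 of the 6 possible rules survive, e.g. {Milk, Beer} -> {Diaper} at 1.0
```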
The Apriori Algorithm: An Example

Database D:
TID  Items
100  1 3 4
200  2 3 5
300  1 2 3 5
400  2 5

Scan D → C1:          L1 (min support count = 2):
itemset  sup          itemset  sup
{1}      2            {1}      2
{2}      3            {2}      3
{3}      3            {3}      3
{4}      1            {5}      3
{5}      3
C2:          Scan D →             L2:
itemset      itemset  sup         itemset  sup
{1 2}        {1 2}    1           {1 3}    2
{1 3}        {1 3}    2           {2 3}    2
{1 5}        {1 5}    1           {2 5}    3
{2 3}        {2 3}    2           {3 5}    2
{2 5}        {2 5}    3
{3 5}        {3 5}    2
C3, scan D → L3:
itemset  sup
{2 3 5}  2
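The C/L tables above can be double-checked by brute-force support counting over Database D (this check enumerates k-subsets directly rather than performing the Apriori join):

```python
from itertools import combinations

D = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
minsup = 2  # minimum support count

def frequent_k(db, k, minsup):
    """All k-itemsets meeting the minimum support count, with their counts."""
    items = sorted(set().union(*db))
    return {c: n for c in combinations(items, k)
            if (n := sum(1 for t in db if set(c) <= t)) >= minsup}

L1, L2, L3 = (frequent_k(D, k, minsup) for k in (1, 2, 3))
# Matches the tables: L1 drops {4}; L2 = {1 3}, {2 3}, {2 5}, {3 5}; L3 = {2 3 5}
```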
Search Space of {1 2 3 4 5}

[Figure: the full itemset lattice over items 1-5, from the 2-itemsets up to {1 2 3 4 5}]
Apriori Trick on Level 1

[Figure: the lattice after level-1 pruning — supersets of infrequent 1-itemsets are removed]
Apriori Trick on Level 2

[Figure: the lattice after level-2 pruning — supersets of infrequent 2-itemsets are removed]
Apply the Apriori algorithm to find all itemsets with support >= 0.2 from the following data:

Transaction  Items in Transaction
1            Milk, Bread, Eggs
2            Milk, Juice
3            Juice, Butter
4            Milk, Bread, Eggs
5            Coffee, Eggs
6            Coffee
7            Coffee, Juice
8            Milk, Bread, Cookies, Eggs
9            Cookies, Butter
10           Milk, Bread
Question 1: Applying the Apriori Algorithm

[Worked tables of candidate itemsets and their support counts; the legible fragments include "Eggs, Cookies" and "{Milk, Bread, Eggs}: 3".]
Question 2: Applying the Apriori Algorithm

Association Rules for {Milk, Eggs}:

{Milk} → {Eggs}
  Support = 3/10 = 0.3
  Confidence = 3/5 = 0.6

{Eggs} → {Milk}
  Support = 3/10 = 0.3
  Confidence = 3/4 = 0.75
Question 2: Applying the Apriori Algorithm

Association Rules for {Bread, Eggs}:

{Bread} → {Eggs}
  Support = 3/10 = 0.3
  Confidence = 3/4 = 0.75

{Eggs} → {Bread}
  Support = 3/10 = 0.3
  Confidence = 3/4 = 0.75
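The rule metrics in these exercises can be recomputed from the ten exercise transactions; a small sketch (the helper name `metrics` is mine):

```python
# Exercise data (transactions 1-10 from the slide).
db = [
    {"Milk", "Bread", "Eggs"}, {"Milk", "Juice"}, {"Juice", "Butter"},
    {"Milk", "Bread", "Eggs"}, {"Coffee", "Eggs"}, {"Coffee"},
    {"Coffee", "Juice"}, {"Milk", "Bread", "Cookies", "Eggs"},
    {"Cookies", "Butter"}, {"Milk", "Bread"},
]

def metrics(X, Y, db):
    """Return (support, confidence) of the rule X -> Y."""
    both = sum(1 for t in db if X | Y <= t)
    return both / len(db), both / sum(1 for t in db if X <= t)

print(metrics({"Milk"}, {"Eggs"}, db))   # (0.3, 0.6)
print(metrics({"Eggs"}, {"Milk"}, db))   # (0.3, 0.75)
print(metrics({"Bread"}, {"Eggs"}, db))  # (0.3, 0.75)
```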