
Chapter 5: Mining Frequent Patterns, Associations, and Correlations: Basic Concepts and Methods

 Basic Concepts
 Frequent Itemset Mining Methods
   Apriori Algorithm, Improvements to Apriori
   Association Rule Mining
   FP-Growth Mining
 Pattern Evaluation Methods
 Summary
What Is Frequent Pattern Analysis?
 Frequent pattern: a pattern (a set of items, subsequences, substructures,
etc.) that occurs frequently in a data set
 First proposed by Agrawal, Imielinski, and Swami [AIS93] in the context
of frequent itemsets and association rule mining
 Motivation: Finding inherent regularities in data
 What products were often purchased together?
 Web pages of interest to groups of users
 What are the subsequent purchases after buying a PC?
 Finding structural patterns from chemical compounds or social media
 Applications
 Basket data analysis, cross-marketing, sales campaign analysis, Web log (click stream) analysis, and DNA sequence analysis & motif identification
Frequent Patterns: Frequent Itemsets
 A Frequent pattern in general captures an intrinsic and important
property of a dataset.
 Frequent patterns of a transaction database are the sets of items frequently purchased together, which are called Frequent Itemsets.
 itemset: A set of one or more items
 k-itemset X = {x1, …, xk}
 support count of X: Frequency or occurrence of an itemset X
 (relative) support, s, is the fraction of transactions that contains X (i.e.,
the probability that a transaction contains X)
 An itemset X is frequent if X’s support is no less than a minsup threshold

Tid | Items bought
10  | Bread, Nuts, Jam
20  | Bread, Coffee, Jam
30  | Bread, Jam, Eggs
40  | Nuts, Eggs, Milk
50  | Nuts, Coffee, Jam, Eggs, Milk

(Figure: Venn diagram of the transactions containing Bread, the transactions containing Nuts, and the transactions containing both.)
Basic Concepts: Association Rules

Tid | Items bought
10  | Bread, Nuts, Jam
20  | Bread, Coffee, Jam
30  | Bread, Jam, Eggs
40  | Nuts, Eggs, Milk
50  | Nuts, Coffee, Jam, Eggs, Milk

 support, s: the probability that a transaction contains X ∪ Y
 confidence, c: the conditional probability P(Y|X) that a transaction having X also contains Y

Let minsup = 50%, minconf = 50%. The minimum support count is 50% of 5 = 2.5, i.e., at least 3 transactions.
Frequent patterns: Bread:3, Nuts:3, Jam:4, Eggs:3, {Bread, Jam}:3
 Find all the rules X ⇒ Y with minimum support and confidence
 Association rules formed from the 2-itemset:
 Bread ⇒ Jam (60%, 100%)
 Jam ⇒ Bread (60%, 75%)
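As a quick check of these numbers (a minimal Python sketch, not part of the original slides), support and confidence can be computed directly from the toy TDB:

tdb = [{'Bread', 'Nuts', 'Jam'},
       {'Bread', 'Coffee', 'Jam'},
       {'Bread', 'Jam', 'Eggs'},
       {'Nuts', 'Eggs', 'Milk'},
       {'Nuts', 'Coffee', 'Jam', 'Eggs', 'Milk'}]

def support(itemset, db):
    # fraction of transactions containing every item of `itemset`
    return sum(itemset <= t for t in db) / len(db)

def confidence(lhs, rhs, db):
    # conditional probability that a transaction with lhs also contains rhs
    return support(lhs | rhs, db) / support(lhs, db)

print(support({'Bread', 'Jam'}, tdb))       # 0.6  -> 60%
print(confidence({'Bread'}, {'Jam'}, tdb))  # 1.0  -> 100%
print(confidence({'Jam'}, {'Bread'}, tdb))  # 0.75 -> 75%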
Computational Complexity of Frequent Itemset
Mining
 How many itemsets can possibly be generated in the worst case?
 Worst case: M^N, where M is the number of distinct items and N is the maximum transaction length (all combinations of items in the longest transaction can be frequent enough)
 The number of frequent itemsets to be generated is sensitive to the minsup threshold
 When minsup is low, there exist an exponential number of frequent itemsets
 The worst-case complexity vs. the expected probability
 Ex. Suppose Walmart has 10^4 distinct items
 The probability of picking up a specific item is 10^-4
 The probability of picking up a particular set of 10 items: ~10^-40
 What is the chance that this particular set of 10 products is frequent, occurring 10^3 times in 10^9 transactions? (The expected number of occurrences is only 10^9 × 10^-40 = 10^-31, so the chance is negligible.)
Chapter 5: Mining Frequent Patterns, Association
and Correlations: Basic Concepts and Methods

 Basic Concepts
 Frequent Itemset Mining Methods
 Pattern Evaluation Methods
 Summary
The Downward Closure Property and Scalable
Approaches to Frequent Pattern Mining
 The downward closure property of frequent patterns
 Any subset of a frequent itemset must be frequent
 If {bread, jam, nuts} is frequent, so is {bread, jam}
 i.e., every transaction having {bread, jam, nuts} also contains {bread, jam}
 Scalable mining methods: three major approaches
 Apriori (Agrawal & Srikant @VLDB'94)
 Frequent pattern growth (FPgrowth; Han, Pei & Yin @SIGMOD'00)
 Vertical data format approach (Charm; Zaki & Hsiao @SDM'02)
 The first two approaches are covered in the syllabus
Frequent Itemset Mining Methods

 Apriori: A Candidate Generation-and-Test Approach
 Improving the Efficiency of Apriori
 FPGrowth: A Frequent Pattern-Growth Approach
Apriori: A Candidate Generation & Test Approach
 Apriori pruning based on anti-monotone property:
If an itemset is found to be infrequent, its supersets are not
candidates to be generated/tested!
 Method:
 1. Initially, scan the DB once to get the frequent 1-itemsets
 2. Generate length-(k+1) candidate itemsets from the length-k frequent itemsets
 3. Test the candidates against the DB and identify the frequent candidates
 4. Repeat steps 2 & 3 for the next k
 5. Terminate when no frequent or candidate set can be generated
The Apriori Algorithm: An Example (Supmin = 2)

Database TDB:
Tid | Items
10  | A, C, D
20  | B, C, E
30  | A, B, C, E
40  | B, E

1st scan, C1: {A}:2, {B}:3, {C}:3, {D}:1, {E}:3
L1: {A}:2, {B}:3, {C}:3, {E}:3

C2 (generated from L1): {A,B}, {A,C}, {A,E}, {B,C}, {B,E}, {C,E}
2nd scan, C2 with counts: {A,B}:1, {A,C}:2, {A,E}:1, {B,C}:2, {B,E}:3, {C,E}:2
L2: {A,C}:2, {B,C}:2, {B,E}:3, {C,E}:2

C3 (generated from L2): {B,C,E}
3rd scan, L3: {B,C,E}:2
The Apriori Algorithm (Pseudo-Code)

Ck: candidate itemsets of size k
Lk: frequent itemsets of size k

L1 = {frequent items};
for (k = 1; Lk != ∅; k++) do begin
    Ck+1 = candidates generated from Lk;
    for each transaction t in the database do
        increment the count of all candidates in Ck+1 that are contained in t;
    Lk+1 = candidates in Ck+1 with min_support;
end
return ∪k Lk;
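The pseudo-code above can be turned into a small runnable sketch. This is an illustrative Python implementation under the slide's assumptions (self-join plus downward-closure pruning, then one counting pass per level), not the original course code; the toy TDB and minimum support count of 2 come from the example slide:

from itertools import combinations

def apriori(transactions, min_count):
    # Level-wise mining: generate C(k+1) from Lk by self-join + pruning,
    # then count the candidates against the database.
    transactions = [frozenset(t) for t in transactions]
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    Lk = {iset: c for iset, c in counts.items() if c >= min_count}
    all_frequent = dict(Lk)
    k = 1
    while Lk:
        keys = list(Lk)
        candidates = set()
        for i in range(len(keys)):
            for j in range(i + 1, len(keys)):
                union = keys[i] | keys[j]
                # keep only (k+1)-itemsets all of whose k-subsets are frequent
                if len(union) == k + 1 and all(
                        frozenset(s) in Lk for s in combinations(union, k)):
                    candidates.add(union)
        counts = {c: 0 for c in candidates}
        for t in transactions:                 # one DB scan per level
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        Lk = {c: n for c, n in counts.items() if n >= min_count}
        all_frequent.update(Lk)
        k += 1
    return all_frequent

# Toy TDB and Supmin = 2 from the example slide; finds {B,C,E}:2 among others.
tdb = [{'A', 'C', 'D'}, {'B', 'C', 'E'}, {'A', 'B', 'C', 'E'}, {'B', 'E'}]
print(apriori(tdb, 2))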
Implementation of Apriori

 How to generate candidates?
 Step 1: self-joining Lk
 Step 2: pruning
 Example of Candidate-generation
 L3={abc, abd, acd, ace, bcd}
 Self-joining: L3*L3
 abcd from abc and abd
 acde from acd and ace
 Pruning:
 acde is removed because cde and ade are not in L3
 C4 = {abcd}
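The self-join and pruning steps can be sketched as follows; this is an illustrative helper (the function name is my own) that reproduces the L3 to C4 example above:

from itertools import combinations

def gen_candidates(Lk, k):
    # Self-join Lk with itself, then prune candidates with an infrequent k-subset.
    Lk = set(map(frozenset, Lk))
    candidates = set()
    for a in Lk:
        for b in Lk:
            union = a | b
            if len(union) == k + 1:
                # prune: every k-subset of the candidate must be in Lk
                if all(frozenset(s) in Lk for s in combinations(union, k)):
                    candidates.add(union)
    return candidates

L3 = ['abc', 'abd', 'acd', 'ace', 'bcd']
print(gen_candidates(L3, 3))   # {frozenset({'a','b','c','d'})}: abcd kept, acde pruned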
Condensed Representation: Closed
Patterns and Max-Patterns
 A long pattern contains a combinatorial number of sub-patterns, e.g., {a1, …, a100} contains 2^100 − 1 ≈ 1.27 × 10^30 sub-patterns! (corresponding to the non-zero rows of a truth table of 100 variables)
 Solution: mine closed patterns and max-patterns instead
 An itemset X is closed if X is frequent and there exists no super-pattern Y ⊃ X with the same support as X
 An itemset X is a max-pattern if X is frequent and there exists no frequent super-pattern Y ⊃ X
 Closed patterns are a lossless compression of frequent patterns
 Used for reducing the number of patterns and rules
Closed Patterns and Max-Patterns
 Exercise. DB = {<a1, …, a100>, <a1, …, a50>}
 Min_sup = 1.
 What is the set of closed itemsets?
 <a1, …, a100>: 1
 <a1, …, a50>: 2
 What is the set of max-patterns?
 <a1, …, a100>: 1
 What is the set of all patterns?
 Too many to enumerate: with min_sup = 1, every non-empty subset of {a1, …, a100} is frequent, i.e., 2^100 − 1 patterns!
Representing frequent patterns: Example

 All frequent itemsets, ∪k Lk, are listed below:
   {<BCE:2>, <AC:2>, <BC:2>, <BE:3>, <CE:2>, <A:2>, <B:3>, <C:3>, <E:3>}
 Association rules are generated from these patterns
 Maximal patterns, M = {<BCE:2>, <AC:2>}
 Closed patterns, C = {<BCE:2>, <AC:2>, <BE:3>, <C:3>}
 We can infer all patterns and their supports from the set of closed patterns, C.
 The support of a frequent itemset that is not in C is equal to the maximum support over all its closed super-patterns.
   E.g., sup(B) = max{sup(BCE), sup(BE)} = max{2, 3} = 3
   Similarly, sup(BC) = 2 and sup(CE) = 2
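The inference rule above is easy to express in code. A small sketch (the helper name and dictionary layout are my own), using the closed set C from this slide:

closed = {frozenset('BCE'): 2, frozenset('AC'): 2,
          frozenset('BE'): 3, frozenset('C'): 3}

def support_from_closed(itemset, closed):
    itemset = frozenset(itemset)
    if itemset in closed:
        return closed[itemset]
    # otherwise: maximum support over all closed super-patterns
    sups = [s for c, s in closed.items() if itemset <= c]
    return max(sups) if sups else 0

print(support_from_closed('B', closed))   # 3 = max{sup(BCE), sup(BE)}
print(support_from_closed('BC', closed))  # 2
print(support_from_closed('CE', closed))  # 2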
Association Rule Formation from a
Frequent Pattern

 Generate all non-empty proper subsets of the frequent itemset f; from each subset s, generate the rule s ⇒ (f - s), estimate its confidence, and check it against the minimum confidence.
 E.g., generating association rules from <BCE:2> with min. conf = 75%:

Rule Id | Rule    | Confidence  | Strong / weak
R1      | BC ⇒ E | 2/2 = 100%  | strong
R2      | BE ⇒ C | 2/3 = 67%   | weak
R3      | CE ⇒ B | 2/2 = 100%  | strong
R4      | B ⇒ CE | pruned because its parent rule (R2) is weak
R5      | E ⇒ BC | pruned because its parent rule (R2) is weak
R6      | C ⇒ BE | 2/3 = 67%   | weak
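A brute-force version of this rule-generation step (without the parent-rule pruning shown in the table) might look like the following sketch; the support dictionary simply restates the counts from the earlier slides:

from itertools import combinations

support = {frozenset('BCE'): 2, frozenset('BC'): 2, frozenset('BE'): 3,
           frozenset('CE'): 2, frozenset('B'): 3, frozenset('C'): 3,
           frozenset('E'): 3}

def rules_from_itemset(f, support, min_conf):
    f = frozenset(f)
    rules = []
    for r in range(1, len(f)):                 # every non-empty proper subset s
        for s in combinations(f, r):
            s = frozenset(s)
            conf = support[f] / support[s]     # conf(s => f - s) = sup(f) / sup(s)
            rules.append((set(s), set(f - s), conf, conf >= min_conf))
    return rules

for lhs, rhs, conf, strong in rules_from_itemset('BCE', support, 0.75):
    print(sorted(lhs), '=>', sorted(rhs), f'{conf:.0%}',
          'strong' if strong else 'weak')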
Scalable Frequent Itemset Mining Methods

 Apriori: A Candidate Generation-and-Test Approach
 Improving the Efficiency of Apriori
 FPGrowth: A Frequent Pattern-Growth Approach
Further Improvement of the Apriori Method

 Major computational challenges
 Multiple scans of the transaction database
 Huge number of candidates
 Tedious workload of support counting for candidates
 Improving Apriori: general ideas
 Reduce the number of transaction database scans
 Shrink the transaction database
 Shrink the number of candidates
 Facilitate support counting of candidates
Partition TDB: Scan Database Only Twice
 If TDB is too large to be memory resident, each scan in the Apriori algorithm requires a lot of I/O. Instead, TDB can be partitioned so that each partition fits in memory, and the Apriori algorithm finds the locally frequent itemsets separately in each partition.
 Rationale: any itemset that is globally frequent in TDB must be locally frequent in at least one of the partitions.
 Scan 1: partition the database, apply the Apriori algorithm to each partition separately, and find the locally frequent patterns
 Scan 2: consolidate the globally frequent patterns

DB1 + DB2 + … + DBk = TDB
If sup1(i) < σ|DB1|, sup2(i) < σ|DB2|, …, supk(i) < σ|DBk|, then sup(i) < σ|TDB| (an itemset that is infrequent in every partition is infrequent globally).
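A rough sketch of the two-scan partition scheme, reusing the apriori() sketch given earlier (the partition sizing, threshold handling, and function names are illustrative assumptions):

def partitioned_mining(transactions, min_support_fraction, num_partitions):
    n = len(transactions)
    size = (n + num_partitions - 1) // num_partitions
    # Scan 1: mine each memory-sized partition with a proportional local threshold
    candidates = set()
    for start in range(0, n, size):
        part = transactions[start:start + size]
        local_min = max(1, int(min_support_fraction * len(part)))
        candidates.update(apriori(part, local_min).keys())
    # Scan 2: count the union of locally frequent itemsets over the whole TDB
    counts = {c: 0 for c in candidates}
    for t in transactions:
        t = frozenset(t)
        for c in candidates:
            if c <= t:
                counts[c] += 1
    global_min = min_support_fraction * n
    return {c: k for c, k in counts.items() if k >= global_min}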
Sampling for Frequent Patterns
1. Select a sample of the original database and mine the frequent patterns (FISS) within the sample using Apriori
2. Prepare a set of candidate itemsets containing FISS plus the itemsets on the border of the closure of FISS
   Example: include abcd as a candidate itemset if abc, acd, abd, and bcd are found to be frequent patterns in the sample
3. Scan the database once to count the support of the candidate itemsets generated in step 2 and identify frequent itemsets by thresholding
4. Scan the database again to find the support of possible extensions of the missed frequent patterns, if any
 This approach is very fast because it applies the Apriori algorithm only to a representative sample rather than the huge TDB, and it requires at most two scans of TDB.
DIC: Dynamic Itemset Counting

• TDB is partitioned into blocks marked by start points, and candidate itemsets can be added at any of these start points.
• The support count of a candidate itemset is finalized upon revisiting its start point while scanning the TDB.
• Reduces the number of scans.

Once both A and D gain the minimum required support, AD is introduced as a candidate itemset and the support counting of AD begins. Once all length-2 subsets of BCD are determined to be frequent, the counting of BCD begins.
DHP: Reduce the Number of Candidates

 In sparse transaction DBs, most items qualify as frequent items (L1) but get filtered out as members of larger itemsets (Lk). In a DB with hundreds of distinct items, |C1| and |L1| are almost equal, but |L2| is very small compared to |C2|. Hence the first scan also counts support for groups of 2-itemsets in a hash table.
 A 2-itemset whose corresponding hash bucket count is below the threshold cannot be frequent.

Hash table (bucket count | 2-itemsets hashed into the bucket):
35  | {ab, ad, ae}
88  | {bd, be, de}
…   | …
102 | {yz, qs, wt}

 Min sup count is 40.
 None of the 2-itemsets mapped into the bucket {ab, ad, ae} qualify as candidates, since the total count of that bucket (35) is below 40.
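A minimal sketch of the DHP hash-filtering idea for 2-itemsets (the bucket layout and function names are my own; a bucket's total count is an upper bound on the support of every pair hashed into it):

from itertools import combinations

def dhp_bucket_counts(transactions, num_buckets):
    # During the first scan, hash every 2-itemset of every transaction
    # into a bucket and accumulate the bucket counts.
    buckets = [0] * num_buckets
    for t in transactions:
        for pair in combinations(sorted(set(t)), 2):
            buckets[hash(pair) % num_buckets] += 1
    return buckets

def may_be_frequent(pair, buckets, min_count):
    # A 2-itemset can be a candidate only if its bucket total reaches min_count.
    return buckets[hash(tuple(sorted(pair))) % len(buckets)] >= min_count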
Scalable Frequent Itemset Mining Methods

 Apriori: A Candidate Generation-and-Test Approach
 Improving the Efficiency of Apriori
 FPGrowth: A Frequent Pattern-Growth Approach
Pattern-Growth Approach: Mining Frequent
Patterns Without Candidate Generation
 Bottlenecks of the Apriori approach
 Breadth-first (i.e., level-wise) search
 Candidate generation and test
 Often generates a huge number of candidates
 The FPGrowth Approach (J. Han, J. Pei, and Y. Yin, SIGMOD’ 00)
 Depth-first search
 Avoid explicit candidate generation
 Major philosophy: Grow long patterns from short ones using local
frequent items only
 “abc” is a frequent pattern
 Get all transactions having “abc”, i.e., project the DB on abc: DB|abc
 “d” is a local frequent item in DB|abc ⇒ abcd is a frequent pattern
Construct FP-tree from a Transaction Database (min_support = 3)

TID | Items bought             | (Ordered) frequent items
100 | {f, a, c, d, g, i, m, p} | {f, c, a, m, p}
200 | {a, b, c, f, l, m, o}    | {f, c, a, b, m}
300 | {b, f, h, j, o, w}       | {f, b}
400 | {b, c, k, s, p}          | {c, b, p}
500 | {a, f, c, e, l, p, m, n} | {f, c, a, m, p}

1. Scan the DB once and find the frequent 1-itemsets (single-item patterns)
2. Sort the frequent items in descending frequency order to obtain the f-list: F-list = f-c-a-b-m-p (f:4, c:4, a:3, b:3, m:3, p:3)
3. Scan the DB again and construct the FP-tree

(Figure: the resulting FP-tree rooted at {}, with paths f:4-c:3-a:3-m:2-p:2, f:4-c:3-a:3-b:1-m:1, f:4-b:1, and c:1-b:1-p:1, plus a header table holding node-links for f, c, a, b, m, p.)
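The two scans and the ordered insertion can be sketched in Python as follows. This is an illustrative reconstruction (the class and variable names are my own), not the original implementation; note that frequency ties are broken alphabetically here, so the computed f-list may order c before f, while the slide fixes f-c-a-b-m-p; either ordering yields a valid FP-tree.

from collections import Counter

class FPNode:
    def __init__(self, item, parent):
        self.item = item          # None for the root
        self.count = 0
        self.parent = parent
        self.children = {}        # item -> FPNode

def build_fp_tree(transactions, min_count):
    # Pass 1: count single items and build the f-list (descending frequency)
    counts = Counter(item for t in transactions for item in set(t))
    f_list = [i for i, c in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
              if c >= min_count]
    rank = {item: r for r, item in enumerate(f_list)}
    # Pass 2: insert each transaction's frequent items in f-list order
    root = FPNode(None, None)
    header = {item: [] for item in f_list}    # item -> node-links
    for t in transactions:
        ordered = sorted((i for i in set(t) if i in rank), key=rank.get)
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                header[item].append(child)    # extend the node-link chain
            child.count += 1
            node = child
    return root, header, f_list

# The five transactions from this slide, min_support = 3
tdb = [set('facdgimp'), set('abcflmo'), set('bfhjow'), set('bcksp'), set('afcelpmn')]
root, header, f_list = build_fp_tree(tdb, 3)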
Partition Patterns and Databases

 Frequent patterns can be partitioned into subsets according to the f-list
 F-list = f-c-a-b-m-p
 Patterns containing p
 Patterns having m but no p
 …
 Patterns having c but none of a, b, m, p
 Pattern f
 Completeness and non-redundancy are achieved
Find Patterns Having P From P-conditional Database

 Starting at each frequent item p in the header table of the FP-tree, follow the node-links of p and walk up the parent links of each of its nodes
 Accumulate all of the transformed prefix paths of item p to form p's conditional pattern base

(FP-tree and header table as on the previous slide.)

Conditional pattern bases:
item | conditional pattern base
c    | f:3
a    | fc:3
b    | fca:1, f:1, c:1
m    | fca:2, fcab:1
p    | fcam:2, cb:1
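Given the FP-tree and header table from the earlier construction sketch, the conditional pattern base of an item can be collected by following its node-links and walking up the parent pointers (a small illustrative helper, assuming the FPNode class defined previously):

def conditional_pattern_base(item, header):
    # Follow the node-links of `item`; for each node, walk up the parent
    # pointers to collect its transformed prefix path with the node's count.
    base = []
    for node in header[item]:
        path, parent = [], node.parent
        while parent is not None and parent.item is not None:
            path.append(parent.item)
            parent = parent.parent
        if path:
            base.append((list(reversed(path)), node.count))
    return base

print(conditional_pattern_base('p', header))
# [(['c','f','a','m'], 2), (['c','b'], 1)], i.e. the slide's fcam:2 and cb:1
# (item order within a path follows this sketch's f-list)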
From Conditional Pattern-bases to Conditional FP-trees

 For the pattern base of each locally frequent item
 Construct the conditional FP-tree from the pattern base to grow the pattern
 Accumulate the count for each item in the base to identify extended patterns

Example for m:
m-conditional pattern base: fca:2, fcab:1
m-conditional FP-tree: {} - f:3 - c:3 - a:3 (b is dropped because it is not locally frequent)
All frequent patterns relating to m: m, fm, cm, am, fcm, fam, cam, fcam
Recursion: Mining Each Conditional FP-tree
Starting from the m-conditional FP-tree ({} - f:3 - c:3 - a:3):
 Conditional pattern base of “am”: (fc:3) → am-conditional FP-tree: {} - f:3 - c:3
 Conditional pattern base of “cm”: (f:3) → cm-conditional FP-tree: {} - f:3
 Conditional pattern base of “cam”: (f:3) → cam-conditional FP-tree: {} - f:3
A Special Case: Single Prefix Path in FP-tree

 Suppose a (conditional) FP-tree T has a shared single prefix path P
 Mining can be decomposed into two parts
 Reduction of the single prefix path into one node
 Concatenation of the mining results of the two parts

(Figure: an FP-tree whose top portion is the single prefix path a1:n1 - a2:n2 - a3:n3, below which the tree branches into subtrees b1:m1, C1:k1, C2:k2, C3:k3; the tree is decomposed into the single-path part r1 = a1:n1 - a2:n2 - a3:n3 plus the multi-branch part rooted at r1.)
Benefits of the FP-tree Structure

 Completeness
 Preserve complete information for frequent pattern
mining
 Never break a long pattern of any transaction
 Compactness
 Reduce irrelevant info—infrequent items are gone
 Items in frequency descending order: the more
frequently occurring, the more likely to be shared
 Never larger than the original database (not counting node-links and count fields)
The Frequent Pattern Growth Mining Method

 Idea: frequent pattern growth
 Recursively grow frequent patterns by pattern and database partition
 Method
 For each frequent item, construct its conditional pattern base, and then its conditional FP-tree
 Repeat the process on each newly created conditional FP-tree
 Until the resulting FP-tree is empty, or it contains only one path (a single path generates all the combinations of its sub-paths, each of which is a frequent pattern)
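The overall method can be sketched compactly by working directly on projected (conditional) databases instead of a compressed FP-tree; this keeps the divide-and-conquer logic of FP-growth while simplifying the data structure. An illustrative Python sketch (function names are my own), applied to the five-transaction TDB from the FP-tree slides:

from collections import Counter

def pattern_growth(transactions, min_count):
    # FP-growth-style divide and conquer over projected (conditional)
    # databases; each entry of cond_db is a pair (items, count).
    results = {}

    def mine(cond_db, suffix):
        counts = Counter()
        for items, cnt in cond_db:
            for it in items:
                counts[it] += cnt
        # local f-list: locally frequent items in descending frequency
        freq = sorted((it for it, c in counts.items() if c >= min_count),
                      key=lambda it: (-counts[it], it))
        # process items bottom-up; project each onto strictly more frequent items
        for pos in range(len(freq) - 1, -1, -1):
            item = freq[pos]
            pattern = suffix | {item}
            results[frozenset(pattern)] = counts[item]
            allowed = set(freq[:pos])
            projected = []
            for items, cnt in cond_db:
                if item in items:
                    kept = [it for it in items if it in allowed]
                    if kept:
                        projected.append((kept, cnt))
            if projected:
                mine(projected, pattern)    # recurse on the conditional database

    mine([(list(set(t)), 1) for t in transactions], frozenset())
    return results

# Five-transaction TDB from the FP-tree slides, min_support = 3
tdb = [set('facdgimp'), set('abcflmo'), set('bfhjow'), set('bcksp'), set('afcelpmn')]
patterns = pattern_growth(tdb, 3)
print(patterns[frozenset('fcam')])   # 3, i.e. fcam is frequent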
Performance of FPGrowth in Large Datasets

(Figure: FP-Growth vs. Apriori on data set T25I20D10K; run time in seconds plotted against the support threshold (%), for the series D1 FP-growth and D1 Apriori.)
Advantages of the Pattern Growth Approach
 Divide-and-conquer:
 Decompose both the mining task and DB according to the frequent
patterns obtained so far
 For huge TDBs the main memory may not be enough to hold the FP-tree in full. The TDB can then be partitioned into a set of projected databases along specific frequent items; the FP-growth algorithm is applied to each projection, and the patterns extracted are extended with the suffix representing that frequent item.
 Leads to focused search of smaller databases
 Other factors
 No candidate generation, no candidate test
 Compressed database: the FP-tree structure compresses dense TDBs roughly tenfold
 No repeated scan of the entire database
 Basic operations: counting local frequent items and building sub-FP-trees; no pattern search and matching
Chapter 5: Mining Frequent Patterns, Association
and Correlations: Basic Concepts and Methods

 Basic Concepts
 Frequent Itemset Mining Methods
 Which Patterns Are Interesting? Pattern Evaluation Methods
Interestingness Measure: Correlations (Lift)
 play basketball ⇒ eat cereal [40%, 66.7%] is misleading
 The overall percentage of students eating cereal is 75% > 66.7%
 play basketball ⇒ not eat cereal [20%, 33.3%] is more accurate, although with lower support and confidence
 Measure of dependent/correlated events: lift

lift = P(A ∪ B) / (P(A) P(B))

            | Basketball | Not basketball | Sum (row)
Cereal      | 2000       | 1750           | 3750
Not cereal  | 1000       | 250            | 1250
Sum (col.)  | 3000       | 2000           | 5000

lift(B, C)  = (2000/5000) / ((3000/5000) × (3750/5000)) = 0.89
lift(B, ¬C) = (1000/5000) / ((3000/5000) × (1250/5000)) = 1.33
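The two lift values can be reproduced directly from the contingency table (a small illustrative calculation, variable names are my own):

n = 5000
n_b, n_c = 3000, 3750        # basketball, cereal
n_bc, n_b_notc = 2000, 1000  # basketball & cereal, basketball & not cereal

def lift(n_ab, n_a, n_b, n):
    return (n_ab / n) / ((n_a / n) * (n_b / n))

print(round(lift(n_bc, n_b, n_c, n), 2))          # 0.89 -> B and C negatively correlated
print(round(lift(n_b_notc, n_b, n - n_c, n), 2))  # 1.33 -> B and not-C positively correlated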
Are lift and χ² Good Measures of Correlation?

 "Buy walnuts ⇒ buy milk [1%, 80%]" is misleading if 85% of customers buy milk
 Support and confidence measure co-occurrence and are not good indicators of correlation
 Other widely used interestingness measures:
 All_conf(A,B) = min{P(A|B), P(B|A)}; similarly, Max_conf(A,B) = max{P(A|B), P(B|A)}
 Kulczynski measure: Kulc(A,B) = (P(A|B) + P(B|A)) / 2
 Cosine(A,B) = sqrt(P(A|B) × P(B|A))
 A measure is null-invariant if its value is free from the influence of null-transactions. The above four measures are null-invariant, whereas lift and χ² are not.
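Expressed in terms of support counts, the four null-invariant measures are straightforward to compute; the sketch below (function names are my own) also makes visible why they ignore null-transactions: the total transaction count never appears.

from math import sqrt

# n_a = sup(A), n_b = sup(B), n_ab = sup(A and B)
def all_conf(n_ab, n_a, n_b):
    return n_ab / max(n_a, n_b)             # = min{P(A|B), P(B|A)}

def max_conf(n_ab, n_a, n_b):
    return n_ab / min(n_a, n_b)             # = max{P(A|B), P(B|A)}

def kulc(n_ab, n_a, n_b):
    return 0.5 * (n_ab / n_a + n_ab / n_b)  # average of the two confidences

def cosine(n_ab, n_a, n_b):
    return n_ab / sqrt(n_a * n_b)           # = sqrt(P(A|B) * P(B|A))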
Comparison of Interestingness Measures

Which Null-Invariant Measure Is Better?
 IR (Imbalance Ratio) measures the imbalance of two itemsets A and B in rule implications:
   IR(A,B) = |sup(A) − sup(B)| / (sup(A) + sup(B) − sup(A ∪ B))
 Datasets D4 through D6 are all neutral (Kulc = 0.5) even with a lot of variation in the individual frequencies of ‘m’ and ‘c’. Since their Kulczynski value is unaffected, it is recommended to use the Imbalance Ratio (IR) together with Kulczynski for extracting interesting patterns.
 D4 is balanced & neutral
 D5 is imbalanced & neutral
 D6 is very imbalanced & neutral
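A small illustrative calculation (the counts below are assumptions, chosen in the spirit of D4 and D5 so that Kulc stays at 0.5 while IR grows):

def kulc(n_ab, n_a, n_b):
    return 0.5 * (n_ab / n_a + n_ab / n_b)

def imbalance_ratio(n_a, n_b, n_ab):
    return abs(n_a - n_b) / (n_a + n_b - n_ab)

# Balanced case: sup(m) = sup(c) = 2000, sup(mc) = 1000
print(kulc(1000, 2000, 2000), imbalance_ratio(2000, 2000, 1000))    # ~0.5  0.0
# Imbalanced case: sup(m) = 11000, sup(c) = 1100, sup(mc) = 1000
print(kulc(1000, 11000, 1100), imbalance_ratio(11000, 1100, 1000))  # ~0.5  ~0.89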
