DM - Unit 2

The document discusses Association Rule Mining, which aims to identify rules that predict item occurrences in transaction datasets, exemplified by market-basket transactions. It defines frequent itemsets, support, and confidence, and describes the Apriori algorithm, which generates rules efficiently by reducing candidate itemsets and comparisons. It also covers alternative approaches (ECLAT, FP-growth), maximal and closed itemsets, and confidence-based rule generation, emphasizing the computational complexity of mining and strategies for optimizing it.


Motivation: Association Rule Mining

• Given a set of transactions, find rules that will predict the
  occurrence of an item based on the occurrences of other
  items in the transaction

Market-Basket transactions:

  TID | Items
  ----|---------------------------
  1   | Bread, Milk
  2   | Bread, Diaper, Beer, Eggs
  3   | Milk, Diaper, Beer, Coke
  4   | Bread, Milk, Diaper, Beer
  5   | Bread, Milk, Diaper, Coke

Example of Association Rules:
  {Diaper} → {Beer}
  {Milk, Bread} → {Eggs, Coke}
  {Beer, Bread} → {Milk}

Implication means co-occurrence, not causality!
Applications: Association Rule Mining

• * → Maintenance Agreement
  – What should the store do to boost Maintenance Agreement sales?
• Home Electronics → *
  – What other products should the store stock up on?
• Attached mailing in direct marketing
• Detecting “ping-ponging” of patients
• Marketing and Sales Promotion
• Supermarket shelf management
Definition: Frequent Itemset

• Itemset
  – A collection of one or more items
    • Example: {Milk, Bread, Diaper}
  – k-itemset: an itemset that contains k items
• Support count (σ)
  – Frequency of occurrence of an itemset
  – E.g. σ({Milk, Bread, Diaper}) = 2
• Support (s)
  – Fraction of transactions that contain an itemset
  – E.g. s({Milk, Bread, Diaper}) = 2/5
• Frequent Itemset
  – An itemset whose support is greater than or equal to a minsup threshold

(See the market-basket transactions table above.)
Definition: Association Rule

• Association Rule
  – An implication expression of the form X → Y, where X and Y are itemsets
  – Example: {Milk, Diaper} → {Beer}
• Rule Evaluation Metrics
  – Support (s)
    • Fraction of transactions that contain both X and Y
  – Confidence (c)
    • Measures how often items in Y appear in transactions that contain X

Example (using the market-basket transactions above): {Milk, Diaper} → {Beer}

  s = σ(Milk, Diaper, Beer) / |T| = 2/5 = 0.4
  c = σ(Milk, Diaper, Beer) / σ(Milk, Diaper) = 2/3 ≈ 0.67
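A minimal Python sketch (not from the slides) of both metrics over the five market-basket transactions above:

  transactions = [
      {"Bread", "Milk"},
      {"Bread", "Diaper", "Beer", "Eggs"},
      {"Milk", "Diaper", "Beer", "Coke"},
      {"Bread", "Milk", "Diaper", "Beer"},
      {"Bread", "Milk", "Diaper", "Coke"},
  ]

  def support_count(itemset):
      # sigma(itemset): number of transactions containing the itemset
      return sum(1 for t in transactions if itemset <= t)

  def support(itemset):
      # s(itemset): fraction of transactions containing the itemset
      return support_count(itemset) / len(transactions)

  def confidence(X, Y):
      # c(X -> Y) = sigma(X ∪ Y) / sigma(X)
      return support_count(X | Y) / support_count(X)

  print(support({"Milk", "Diaper", "Beer"}))       # 0.4
  print(confidence({"Milk", "Diaper"}, {"Beer"}))  # 0.666...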
Association Rule Mining Task

• Given a set of transactions T, the goal of association rule mining is to find all rules having
  – support ≥ minsup threshold
  – confidence ≥ minconf threshold

• Brute-force approach:
  – List all possible association rules
  – Compute the support and confidence for each rule
  – Prune rules that fail the minsup and minconf thresholds
  ⇒ Computationally prohibitive!
Computational Complexity

• Given d unique items:
  – Total number of itemsets = 2^d
  – Total number of possible association rules:

    R = \sum_{k=1}^{d-1} \binom{d}{k} \sum_{j=1}^{d-k} \binom{d-k}{j} = 3^d - 2^{d+1} + 1

  – If d = 6, R = 602 rules
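A quick illustrative check of the closed form in Python:

  import math

  def rule_count(d):
      # Direct evaluation of the double sum over LHS size k and RHS size j.
      return sum(math.comb(d, k) * sum(math.comb(d - k, j) for j in range(1, d - k + 1))
                 for k in range(1, d))

  print(rule_count(6))    # 602
  print(3**6 - 2**7 + 1)  # 602 — matches the closed form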
Mining Association Rules: Decoupling

Example of Rules (from the market-basket transactions above):

  {Milk, Diaper} → {Beer}   (s=0.4, c=0.67)
  {Milk, Beer} → {Diaper}   (s=0.4, c=1.0)
  {Diaper, Beer} → {Milk}   (s=0.4, c=0.67)
  {Beer} → {Milk, Diaper}   (s=0.4, c=0.67)
  {Diaper} → {Milk, Beer}   (s=0.4, c=0.5)
  {Milk} → {Diaper, Beer}   (s=0.4, c=0.5)

Observations:
• All the above rules are binary partitions of the same itemset: {Milk, Diaper, Beer}
• Rules originating from the same itemset have identical support but can have different confidence
• Thus, we may decouple the support and confidence requirements
Mining Association Rules

• Two-step approach:
  1. Frequent Itemset Generation
     – Generate all itemsets whose support ≥ minsup
  2. Rule Generation
     – Generate high-confidence rules from each frequent itemset, where each rule is a binary partitioning of a frequent itemset

• Frequent itemset generation is still computationally expensive
Frequent Itemset Generation

• Brute-force approach:
  – Each itemset in the lattice is a candidate frequent itemset
  – Count the support of each candidate by scanning the database
  – Match each of the N transactions (of maximum width w) against each of the M candidates
  – Complexity ~ O(NMw) ⇒ expensive, since M = 2^d !!!
Frequent Itemset Generation Strategies

• Reduce the number of candidates (M)
  – Complete search: M = 2^d
  – Use pruning techniques to reduce M
• Reduce the number of transactions (N)
  – Reduce the size of N as the size of the itemset increases
  – Use a subsample of N transactions
• Reduce the number of comparisons (NM)
  – Use efficient data structures to store the candidates or transactions
  – No need to match every candidate against every transaction
Reducing Number of Candidates: Apriori

• Apriori principle:
  – If an itemset is frequent, then all of its subsets must also be frequent
• The Apriori principle holds due to the following property of the support measure:

    ∀X, Y : (X ⊆ Y) ⇒ s(X) ≥ s(Y)

  – Support of an itemset never exceeds the support of its subsets
  – This is known as the anti-monotone property of support
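A small illustrative check of this property, reusing the support function sketched earlier:

  # Support never increases as the itemset grows: s(X) ≥ s(Y) for X ⊆ Y.
  print(support({"Milk"}))                    # 0.8
  print(support({"Milk", "Diaper"}))          # 0.6
  print(support({"Milk", "Diaper", "Beer"}))  # 0.4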
Illustrating Apriori Principle

[Itemset lattice over {A, B, C, D, E}, from the null set down to ABCDE.]
If, e.g., {A, B} is found to be infrequent, every superset of {A, B} (ABC, ABD, ABE, ABCD, ABCE, ABDE, ABCDE) is pruned from the lattice without counting its support.
Illustrating Apriori Principle

Minimum Support = 3

Items (1-itemsets):
  Bread 4, Coke 2, Milk 4, Beer 3, Diaper 4, Eggs 1

Pairs (2-itemsets) — no need to generate candidates involving Coke or Eggs:
  {Bread, Milk} 3, {Bread, Beer} 2, {Bread, Diaper} 3,
  {Milk, Beer} 2, {Milk, Diaper} 3, {Beer, Diaper} 3

Triplets (3-itemsets):
  {Bread, Milk, Diaper} 3

If every subset is considered: 6C1 + 6C2 + 6C3 = 41 candidates
With support-based pruning: 6 + 6 + 1 = 13 candidates
Apriori Algorithm

• Method:
  – Let k = 1
  – Generate frequent itemsets of length 1
  – Repeat until no new frequent itemsets are identified:
    • Generate length-(k+1) candidate itemsets from length-k frequent itemsets
    • Prune candidate itemsets containing subsets of length k that are infrequent
    • Count the support of each candidate by scanning the DB
    • Eliminate candidates that are infrequent, leaving only those that are frequent
Apriori: Reducing Number of Comparisons

• Candidate counting:
  – Scan the database of transactions to determine the support of each candidate itemset
  – To reduce the number of comparisons, store the candidates in a hash structure
    • Instead of matching each transaction against every candidate, match it only against the candidates contained in the hashed buckets
Apriori: Implementation Using Hash Tree

Suppose you have 15 candidate itemsets of length 3:
{1 4 5}, {1 2 4}, {4 5 7}, {1 2 5}, {4 5 8}, {1 5 9}, {1 3 6}, {2 3 4}, {5 6 7}, {3 4 5}, {3 5 6}, {3 5 7}, {6 8 9}, {3 6 7}, {3 6 8}

You need:
• A hash function, e.g. items 1, 4, 7 → left branch; 2, 5, 8 → middle branch; 3, 6, 9 → right branch
• A max leaf size: the maximum number of itemsets stored in a leaf node (if the number of candidate itemsets exceeds the max leaf size, split the node)

[Hash tree figure: the 15 candidates are distributed to leaves by hashing on successive items; e.g. {2 3 4} and {5 6 7} fall in leaves under the middle branch, while {1 4 5}, {1 2 4}, {1 2 5}, {4 5 7}, {4 5 8}, {1 5 9} fill the left subtree.]
Apriori: Implementation Using Hash Tree

Counting with transaction {1 2 3 5 6}: enumerate its 3-subsets incrementally (1 + {2 3 5 6}, 2 + {3 5 6}, 3 + {5 6}, then 12 + {3 5 6}, 13 + {5 6}, 15 + {6}, …), following only the matching branches of the hash tree.

The transaction is matched against only 11 out of 15 candidates.
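For illustration, a simplified support-counting pass in Python that keys a dictionary on candidate 3-itemsets instead of building an explicit hash tree (a sketch of the subset-matching idea, not the slides' hash-tree structure):

  from itertools import combinations

  # The 15 candidate 3-itemsets, stored as sorted tuples.
  candidates = {(1,4,5), (1,2,4), (4,5,7), (1,2,5), (4,5,8), (1,5,9), (1,3,6),
                (2,3,4), (5,6,7), (3,4,5), (3,5,6), (3,5,7), (6,8,9), (3,6,7), (3,6,8)}
  counts = {c: 0 for c in candidates}

  transaction = (1, 2, 3, 5, 6)
  # Enumerate all 3-subsets of the transaction and bump matching candidates.
  for subset in combinations(transaction, 3):
      if subset in counts:
          counts[subset] += 1

  print(sorted(c for c, n in counts.items() if n > 0))
  # [(1, 2, 5), (1, 3, 6), (3, 5, 6)]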
Apriori: A Candidate Generation-and-Test Approach

• Apriori pruning principle: if any itemset is infrequent, its supersets should not be generated or tested! (Agrawal & Srikant @ VLDB'94; Mannila et al. @ KDD'94)
• Method:
  – Initially, scan the DB once to get the frequent 1-itemsets
  – Generate length-(k+1) candidate itemsets from length-k frequent itemsets
  – Test the candidates against the DB
  – Terminate when no frequent or candidate set can be generated
The Apriori Algorithm—An Example

Sup_min = 2

Database TDB:
  Tid | Items
  10  | A, C, D
  20  | B, C, E
  30  | A, B, C, E
  40  | B, E

1st scan → C1: {A}:2, {B}:3, {C}:3, {D}:1, {E}:3
L1: {A}:2, {B}:3, {C}:3, {E}:3

C2 (from L1): {A,B}, {A,C}, {A,E}, {B,C}, {B,E}, {C,E}
2nd scan → C2 counts: {A,B}:1, {A,C}:2, {A,E}:1, {B,C}:2, {B,E}:3, {C,E}:2
L2: {A,C}:2, {B,C}:2, {B,E}:3, {C,E}:2

C3: {B,C,E}
3rd scan → {B,C,E}:2
L3: {B,C,E}:2
The Apriori Algorithm

• Pseudo-code:
  Ck: candidate itemsets of size k
  Lk: frequent itemsets of size k

  L1 = {frequent items};
  for (k = 1; Lk != ∅; k++) do begin
      Ck+1 = candidates generated from Lk;
      for each transaction t in database do
          increment the count of all candidates in Ck+1 that are contained in t;
      Lk+1 = candidates in Ck+1 with min_support;
  end
  return ∪k Lk;
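A runnable Python sketch of the same loop (illustrative only; candidate generation and pruning are folded into one comprehension here and shown in full on the next slide):

  from itertools import combinations

  def apriori(transactions, minsup_count):
      # Returns every frequent itemset (as a frozenset) with its support count.
      # A compact, unoptimized sketch of the pseudo-code above.
      transactions = [frozenset(t) for t in transactions]
      counts = {}
      for t in transactions:                     # L1: frequent 1-itemsets
          for item in t:
              key = frozenset([item])
              counts[key] = counts.get(key, 0) + 1
      Lk = {s: n for s, n in counts.items() if n >= minsup_count}
      frequent = dict(Lk)
      k = 1
      while Lk:
          # Generate C_{k+1}: join L_k with itself, prune by the Apriori principle.
          prev = set(Lk)
          candidates = {a | b for a in prev for b in prev
                        if len(a | b) == k + 1
                        and all(frozenset(s) in prev for s in combinations(a | b, k))}
          # One DB scan counts the support of every surviving candidate.
          Lk = {c: n for c in candidates
                if (n := sum(1 for t in transactions if c <= t)) >= minsup_count}
          frequent.update(Lk)
          k += 1
      return frequent

  # The four-transaction example (TDB) with Sup_min = 2:
  tdb = [{"A", "C", "D"}, {"B", "C", "E"}, {"A", "B", "C", "E"}, {"B", "E"}]
  for s, n in sorted(apriori(tdb, 2).items(), key=lambda x: (len(x[0]), sorted(x[0]))):
      print(sorted(s), n)    # ends with ['B', 'C', 'E'] 2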
Important Details of Apriori

• How to generate candidates?
  – Step 1: self-joining Lk
  – Step 2: pruning
• How to count supports of candidates?
• Example of candidate generation:
  – L3 = {abc, abd, acd, ace, bcd}
  – Self-joining: L3 * L3
    • abcd from abc and abd
    • acde from acd and ace
  – Pruning:
    • acde is removed because ade is not in L3
  – C4 = {abcd}
How to Generate Candidates?

• Suppose the items in Lk-1 are listed in an order
• Step 1: self-joining Lk-1
    insert into Ck
    select p.item1, p.item2, …, p.itemk-1, q.itemk-1
    from Lk-1 p, Lk-1 q
    where p.item1 = q.item1, …, p.itemk-2 = q.itemk-2, p.itemk-1 < q.itemk-1
• Step 2: pruning
    forall itemsets c in Ck do
        forall (k-1)-subsets s of c do
            if (s is not in Lk-1) then delete c from Ck
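The same two steps in Python, with itemsets kept as sorted tuples (an illustrative sketch):

  from itertools import combinations

  def generate_candidates(Lk_minus_1):
      # Self-join L_{k-1} with itself, then prune candidates that have an
      # infrequent (k-1)-subset. Itemsets are sorted tuples.
      prev = set(Lk_minus_1)
      Ck = set()
      for p in prev:
          for q in prev:
              # Join step: first k-2 items equal, last item of p < last item of q.
              if p[:-1] == q[:-1] and p[-1] < q[-1]:
                  c = p + (q[-1],)
                  # Prune step: every (k-1)-subset of c must be in L_{k-1}.
                  if all(s in prev for s in combinations(c, len(c) - 1)):
                      Ck.add(c)
      return Ck

  # The slides' example: L3 = {abc, abd, acd, ace, bcd}
  L3 = {('a','b','c'), ('a','b','d'), ('a','c','d'), ('a','c','e'), ('b','c','d')}
  print(generate_candidates(L3))
  # {('a','b','c','d')} — abcd survives; acde is pruned because ade is not in L3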
How to Count Supports of Candidates?

• Why is counting supports of candidates a problem?
  – The total number of candidates can be very huge
  – One transaction may contain many candidates
• Method:
  – Candidate itemsets are stored in a hash-tree
  – A leaf node of the hash-tree contains a list of itemsets and counts
  – An interior node contains a hash table
  – Subset function: finds all the candidates contained in a transaction
Example: Counting Supports of Candidates

[Subset function applied to transaction {1 2 3 5 6}: the hash function (1,4,7 / 2,5,8 / 3,6,9) routes the prefixes 1 + {2 3 5 6}, 12 + {3 5 6}, 13 + {5 6}, … down the matching branches of the hash tree from the earlier slide, visiting only the leaves whose candidates could be contained in the transaction.]
Challenges of Frequent Pattern Mining

• Challenges
  – Multiple scans of the transaction database
  – Huge number of candidates
  – Tedious workload of support counting for candidates
• Improving Apriori: general ideas
  – Reduce passes of transaction database scans
  – Shrink the number of candidates
  – Facilitate support counting of candidates
Partition: Scan Database Only Twice

• Any itemset that is potentially frequent in DB must be frequent in at least one of the partitions of DB
  – Scan 1: partition the database and find local frequent patterns
  – Scan 2: consolidate global frequent patterns
• A. Savasere, E. Omiecinski, and S. Navathe. An efficient algorithm for mining association rules in large databases. In VLDB'95
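A sketch of the two-scan idea, assuming the apriori function sketched earlier is available for the local mining step (illustrative only):

  def partition_mine(transactions, minsup_frac, num_parts=2):
      # Scan 1: mine each partition with a proportional local threshold.
      # The union of local frequent itemsets is a superset of the global ones
      # (flooring the local threshold only adds extra candidates, never loses one).
      n = len(transactions)
      size = (n + num_parts - 1) // num_parts
      candidates = set()
      for i in range(0, n, size):
          part = transactions[i:i + size]
          local_min = max(1, int(minsup_frac * len(part)))
          candidates |= set(apriori(part, local_min))
      # Scan 2: count the surviving candidates once over the full database.
      result = {}
      for c in candidates:
          n_c = sum(1 for t in transactions if c <= set(t))
          if n_c >= minsup_frac * n:
              result[c] = n_c
      return result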
DHP: Reduce the Number of Candidates

• A k-itemset whose corresponding hash-bucket count is below the threshold cannot be frequent
  – Candidates: a, b, c, d, e
  – Hash entries: {ab, ad, ae}, {bd, be, de}, …
  – Frequent 1-itemsets: a, b, d, e
  – ab is not a candidate 2-itemset if the sum of the counts of {ab, ad, ae} is below the support threshold
• J. Park, M. Chen, and P. Yu. An effective hash-based algorithm for mining association rules. In SIGMOD'95
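A minimal sketch of the bucket-counting idea; the hash function here is a toy chosen for illustration, not the paper's:

  from itertools import combinations

  def dhp_pair_filter(transactions, minsup_count, num_buckets=7):
      # While doing the 1-itemset scan, hash every 2-itemset of every
      # transaction into a bucket. A pair whose bucket total is below the
      # threshold cannot be frequent and is never generated as a candidate.
      buckets = [0] * num_buckets

      def h(pair):
          a, b = sorted(pair)              # toy hash function, for illustration only
          return (ord(a[0]) * 31 + ord(b[0])) % num_buckets

      for t in transactions:
          for pair in combinations(sorted(t), 2):
              buckets[h(pair)] += 1
      # Keep a pair as a candidate only if its bucket clears the threshold;
      # collisions may keep false positives, but never drop a frequent pair.
      return lambda pair: buckets[h(pair)] >= minsup_count

  tdb = [{"A", "C", "D"}, {"B", "C", "E"}, {"A", "B", "C", "E"}, {"B", "E"}]
  keep = dhp_pair_filter(tdb, 2)
  print(keep(("A", "C")), keep(("A", "B")))
  # True False — {A,B} is pruned before candidate generation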
Apriori: Alternative Search Methods

• Traversal of the itemset lattice
  – General-to-specific vs specific-to-general
[Figure: three lattices from the null set down to {a1, a2, …, an}, showing the frequent itemset border crossed (a) general-to-specific, (b) specific-to-general, (c) bidirectionally.]
Apriori: Alternative Search Methods

• Traversal of the itemset lattice
  – Breadth-first vs depth-first
[Figure: (a) breadth-first and (b) depth-first traversal of the itemset lattice.]
Bottlenecks of Apriori

• Candidate generation can result in huge candidate sets:
  – 10^4 frequent 1-itemsets will generate 10^7 candidate 2-itemsets
  – To discover a frequent pattern of size 100, e.g., {a1, a2, …, a100}, one needs to generate 2^100 ≈ 10^30 candidates
• Multiple scans of the database:
  – Needs (n + 1) scans, where n is the length of the longest pattern
ECLAT: Another Method for Frequent Itemset Generation

• ECLAT: for each item, store a list of transaction ids (tids); vertical data layout

Horizontal data layout:
  TID | Items
  1   | A, B, E
  2   | B, C, D
  3   | C, E
  4   | A, C, D
  5   | A, B, C, D
  6   | A, E
  7   | A, B
  8   | A, B, C
  9   | A, C, D
  10  | B

Vertical data layout (TID-lists):
  A: 1, 4, 5, 6, 7, 8, 9
  B: 1, 2, 5, 7, 8, 10
  C: 2, 3, 4, 5, 8, 9
  D: 2, 4, 5, 9
  E: 1, 3, 6
ECLAT: Another Method for Frequent Itemset Generation

• Determine the support of any k-itemset by intersecting the tid-lists of two of its (k−1)-subsets:

  A: 1, 4, 5, 6, 7, 8, 9
  B: 1, 2, 5, 7, 8, 10
  A ∩ B → AB: 1, 5, 7, 8

• 3 traversal approaches:
  – top-down, bottom-up and hybrid
• Advantage: very fast support counting
• Disadvantage: intermediate tid-lists may become too large for memory
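Tid-list intersection maps directly onto set operations; a minimal Python sketch using the ten-transaction example above:

  # Vertical layout: item -> set of transaction ids.
  tidlists = {
      "A": {1, 4, 5, 6, 7, 8, 9},
      "B": {1, 2, 5, 7, 8, 10},
      "C": {2, 3, 4, 5, 8, 9},
      "D": {2, 4, 5, 9},
      "E": {1, 3, 6},
  }

  # Support of {A, B} = size of the intersection of the two tid-lists.
  ab = tidlists["A"] & tidlists["B"]
  print(sorted(ab), len(ab))    # [1, 5, 7, 8] 4

  # Longer itemsets intersect a (k-1)-itemset's list with one more item's list.
  abc = ab & tidlists["C"]
  print(sorted(abc), len(abc))  # [5, 8] 2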
FP-growth: Another Method for Frequent Itemset Generation

• Use a compressed representation of the database: an FP-tree
• Once an FP-tree has been constructed, use a recursive divide-and-conquer approach to mine the frequent itemsets
FP-Tree Construction

  TID | Items
  1   | {A, B}
  2   | {B, C, D}
  3   | {A, C, D, E}
  4   | {A, D, E}
  5   | {A, B, C}
  6   | {A, B, C, D}
  7   | {B, C}
  8   | {A, B, C}
  9   | {A, B, D}
  10  | {B, C, E}

After reading TID=1:  null → A:1 → B:1
After reading TID=2:  null → A:1 → B:1, plus a new branch null → B:1 → C:1 → D:1
FP-Tree Construction

After reading all ten transactions of the transaction database, the FP-tree is:

  null
  ├── A:7
  │   ├── B:5
  │   │   ├── C:3 ── D:1
  │   │   └── D:1
  │   ├── C:1 ── D:1 ── E:1
  │   └── D:1 ── E:1
  └── B:3
      └── C:3
          ├── D:1
          └── E:1

Header table (Item → pointer chain): A, B, C, D, E. Pointers are used to assist frequent itemset generation.
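A compact FP-tree builder in Python (an illustrative sketch with header-table pointer chains; note the slides' figure inserts items in lexicographic order, whereas classic FP-growth orders items by decreasing support):

  class FPNode:
      # One node of the FP-tree: an item, a count, parent and child links.
      def __init__(self, item, parent):
          self.item, self.parent, self.count, self.children = item, parent, 0, {}

  def build_fptree(transactions):
      root = FPNode(None, None)
      header = {}                        # item -> node-link chain used for mining
      for t in transactions:
          node = root
          for item in sorted(t):         # lexicographic order, as in the figure
              if item not in node.children:
                  node.children[item] = FPNode(item, node)
                  header.setdefault(item, []).append(node.children[item])
              node = node.children[item]
              node.count += 1
      return root, header

  tdb = [{"A","B"}, {"B","C","D"}, {"A","C","D","E"}, {"A","D","E"}, {"A","B","C"},
         {"A","B","C","D"}, {"B","C"}, {"A","B","C"}, {"A","B","D"}, {"B","C","E"}]
  root, header = build_fptree(tdb)
  print({c.item: c.count for c in root.children.values()})  # {'A': 7, 'B': 3}
  print([(n.parent.item, n.count) for n in header["E"]])    # [('D', 1), ('D', 1), ('C', 1)]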
FP-growth

[Same FP-tree as above.]
Build the conditional pattern base for E by following the pointer chain for E and collecting each prefix path together with E's count:
  P = {(A:1, C:1, D:1), (A:1, D:1), (B:1, C:1)}
Recursively apply FP-growth on P.
FP-growth

Conditional pattern base for E:
  P = {(A:1, C:1, D:1, E:1), (A:1, D:1, E:1), (B:1, C:1, E:1)}
Count for E is 3: {E} is a frequent itemset.
[Conditional tree for E: null → A:2 with branches C:1 → D:1 → E:1 and D:1 → E:1, plus null → B:1 → C:1 → E:1.]
Recursively apply FP-growth on P.
FP-growth

Conditional tree for D within the conditional tree for E:
Conditional pattern base for D within the conditional base for E:
  P = {(A:1, C:1, D:1), (A:1, D:1)}
Count for D is 2: {D, E} is a frequent itemset.
[Conditional tree: null → A:2 with branches C:1 → D:1 and D:1.]
Recursively apply FP-growth on P.
FP-growth

Conditional tree for C within D within E:
Conditional pattern base for C within D within E:
  P = {(A:1, C:1)}
Count for C is 1: {C, D, E} is NOT a frequent itemset.
[Conditional tree: null → A:1 → C:1.]
FP-growth

Conditional tree for A within D within E:
  [null → A:2]
Count for A is 2: {A, D, E} is a frequent itemset.
Next step: construct the conditional tree for C within the conditional tree for E.
Continue until reaching the conditional tree for A (which has only the node A).
Benefits of the FP-tree Structure

• Performance study shows
  – FP-growth is an order of magnitude faster than Apriori, and is also faster than tree-projection
  [Figure: run time (sec.) vs support threshold (%) on dataset D1, comparing D1 FP-growth runtime against D1 Apriori runtime; Apriori's run time grows sharply as the threshold drops toward 0.]
• Reasoning
  – No candidate generation, no candidate test
  – Uses a compact data structure
  – Eliminates repeated database scans
  – Basic operation is counting and FP-tree building
Complexity of Association Mining

• Choice of minimum support threshold
  – Lowering the support threshold results in more frequent itemsets
  – This may increase the number of candidates and the max length of frequent itemsets
• Dimensionality (number of items) of the data set
  – More space is needed to store the support count of each item
  – If the number of frequent items also increases, both computation and I/O costs may also increase
• Size of database
  – Since Apriori makes multiple passes, the run time of the algorithm may increase with the number of transactions
• Average transaction width
  – Transaction width increases with denser data sets
  – This may increase the max length of frequent itemsets and traversals of the hash tree (the number of subsets in a transaction increases with its width)
Maximal Frequent Itemset

An itemset is maximal frequent if none of its immediate supersets is frequent.

[Itemset lattice over {A, B, C, D, E} with a border separating frequent from infrequent itemsets; the maximal frequent itemsets are the frequent itemsets lying directly on that border.]
Closed Itemset

• Problem with maximal frequent itemsets:
  – The support of their subsets is not known; additional DB scans are needed
• An itemset is closed if none of its immediate supersets has the same support as the itemset

  TID | Items
  1   | {A, B}
  2   | {B, C, D}
  3   | {A, B, C, D}
  4   | {A, B, D}
  5   | {A, B, C, D}

  Itemset | Support     Itemset      | Support
  {A}     | 4           {A, B, C}    | 2
  {B}     | 5           {A, B, D}    | 3
  {C}     | 3           {A, C, D}    | 2
  {D}     | 4           {B, C, D}    | 3
  {A, B}  | 4           {A, B, C, D} | 2
  {A, C}  | 2
  {A, D}  | 3
  {B, C}  | 3
  {B, D}  | 4
  {C, D}  | 3
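A brute-force sketch that derives the closed itemsets of this five-transaction table directly from the definition (illustrative only):

  from itertools import combinations

  tdb = [{"A","B"}, {"B","C","D"}, {"A","B","C","D"}, {"A","B","D"}, {"A","B","C","D"}]
  items = sorted(set().union(*tdb))
  minsup = 2

  # Support of every non-empty itemset, computed directly.
  support = {}
  for k in range(1, len(items) + 1):
      for combo in combinations(items, k):
          s = frozenset(combo)
          support[s] = sum(1 for t in tdb if s <= t)

  frequent = {s for s, n in support.items() if n >= minsup}
  # Closed: no immediate superset has the same support.
  closed = {s for s in frequent
            if not any(support[s | {i}] == support[s] for i in items if i not in s)}
  # Maximal: no immediate superset is frequent.
  maximal = {s for s in frequent
             if not any(s | {i} in frequent for i in items if i not in s)}
  print(len(closed), len(maximal))
  # 6 1 — closed: {B}, {A,B}, {B,D}, {A,B,D}, {B,C,D}, {A,B,C,D}; maximal: {A,B,C,D}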
Maximal vs Closed Frequent Itemsets

Minimum support = 2

  TID | Items
  1   | ABC
  2   | ABCD
  3   | BCE
  4   | ACDE
  5   | DE

[Itemset lattice from the null set down to ABCDE, annotated with the TIDs of the transactions containing each itemset; frequent itemsets are marked "closed but not maximal" if no immediate superset has the same support, and "closed and maximal" if additionally no immediate superset is frequent.]

# Closed = 9
# Maximal = 4
Maximal vs Closed Itemsets

[Venn diagram: Maximal Frequent Itemsets ⊆ Closed Frequent Itemsets ⊆ Frequent Itemsets.]
Rule Generation

• Given a frequent itemset L, find all non-empty subsets f ⊂ L such that f → L − f satisfies the minimum confidence requirement
  – If {A,B,C,D} is a frequent itemset, the candidate rules are:
    ABC → D, ABD → C, ACD → B, BCD → A,
    A → BCD, B → ACD, C → ABD, D → ABC,
    AB → CD, AC → BD, AD → BC, BC → AD,
    BD → AC, CD → AB

• If |L| = k, then there are 2^k − 2 candidate association rules (ignoring L → ∅ and ∅ → L)
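A brute-force sketch that enumerates all 2^k − 2 candidate rules of a frequent itemset and keeps the confident ones (illustrative; support_count here is the helper sketched earlier):

  from itertools import combinations

  def generate_rules(L, support_count, minconf):
      # Enumerate every non-empty proper subset f of L as a rule f -> L - f
      # and keep the rules with confidence >= minconf.
      L = frozenset(L)
      rules = []
      for r in range(1, len(L)):
          for lhs in combinations(sorted(L), r):
              lhs = frozenset(lhs)
              conf = support_count(L) / support_count(lhs)
              if conf >= minconf:
                  rules.append((set(lhs), set(L - lhs), conf))
      return rules

  # Using the five market-basket transactions from earlier:
  transactions = [
      {"Bread", "Milk"},
      {"Bread", "Diaper", "Beer", "Eggs"},
      {"Milk", "Diaper", "Beer", "Coke"},
      {"Bread", "Milk", "Diaper", "Beer"},
      {"Bread", "Milk", "Diaper", "Coke"},
  ]
  sc = lambda s: sum(1 for t in transactions if s <= t)
  for lhs, rhs, conf in generate_rules({"Milk", "Diaper", "Beer"}, sc, 0.7):
      print(lhs, "->", rhs, round(conf, 2))
  # only {Milk, Beer} -> {Diaper} (c = 1.0) survives minconf = 0.7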
Rule Generation

• How to efficiently generate rules from frequent itemsets?
  – In general, confidence does not have an anti-monotone property:
    c(ABC → D) can be larger or smaller than c(AB → D)
  – But the confidence of rules generated from the same itemset does have an anti-monotone property
  – E.g., for L = {A,B,C,D}:

    c(ABC → D) ≥ c(AB → CD) ≥ c(A → BCD)

  • Confidence is anti-monotone w.r.t. the number of items on the RHS of the rule
Rule Generation

Lattice of rules for the frequent itemset {A,B,C,D}:

  ABCD → { }
  BCD → A,  ACD → B,  ABD → C,  ABC → D
  CD → AB,  BD → AC,  BC → AD,  AD → BC,  AC → BD,  AB → CD
  D → ABC,  C → ABD,  B → ACD,  A → BCD

If a rule such as BCD → A has low confidence, then every rule below it in the lattice (those with a subset of BCD on the LHS and a superset of A on the RHS: CD → AB, BD → AC, BC → AD, D → ABC, C → ABD, B → ACD) can be pruned.