
Frequent Itemsets and Association Rules

Market Baskets
Frequent Itemsets
A-Priori Algorithm


The Market-Basket Model

! A large set of items, e.g., things sold in a supermarket.
! A large set of baskets, each of which is a small set of the items, e.g., the things one customer buys on one day.



Market-Baskets – (2)

! A general many-many mapping (association) between two kinds of things.
− But we ask about connections among “items,” not “baskets.”
! The technology focuses on common events, not rare events (“long tail”).



Support

! Simplest question: find sets of items that appear “frequently” in the baskets.
! Support for itemset I (s(I)) = the number of baskets containing all items in I.
! Given a support threshold s, sets of items that appear in at least s baskets are called frequent itemsets.



Example: Frequent Itemsets
! Items={milk, coke, pepsi, beer, juice}.
! Support threshold s = 3 baskets.
B1 = {m, c, b} B2 = {m, p, j}
B3 = {m, b} B4 = {c, j}
B5 = {m, p, b} B6 = {m, c, b, j}
B7 = {c, b, j} B8 = {b, c}
! Frequent itemsets: {m}, {c}, {b}, {j}, {m,b}, {b,c}, {c,j}.
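A minimal Python sketch (not from the slides) of how these support counts can be checked; the baskets and the threshold are the ones above, and `support` is just a dictionary of counts:

```python
from collections import Counter
from itertools import combinations

# The eight baskets from the slide (m=milk, c=coke, p=pepsi, b=beer, j=juice).
baskets = [
    {"m", "c", "b"}, {"m", "p", "j"}, {"m", "b"}, {"c", "j"},
    {"m", "p", "b"}, {"m", "c", "b", "j"}, {"c", "b", "j"}, {"b", "c"},
]
s = 3  # support threshold

# Count every itemset of size 1 and 2 that occurs in some basket.
support = Counter()
for basket in baskets:
    for k in (1, 2):
        for itemset in combinations(sorted(basket), k):
            support[itemset] += 1

frequent = [i for i, count in support.items() if count >= s]
print(sorted(frequent, key=lambda t: (len(t), t)))
# [('b',), ('c',), ('j',), ('m',), ('b', 'c'), ('b', 'm'), ('c', 'j')]
```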



Monotonicity

! For any set of items I and any set of items J with J ⊆ I, it holds that s(I) ≤ s(J).
− In other words, an itemset can never appear in more baskets than any of its subsets; every subset of a frequent itemset is itself frequent.
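As a quick illustration, a small sketch (reusing the baskets of the previous example) of the fact that the support of an itemset never exceeds the support of any of its subsets:

```python
baskets = [{"m","c","b"}, {"m","p","j"}, {"m","b"}, {"c","j"},
           {"m","p","b"}, {"m","c","b","j"}, {"c","b","j"}, {"b","c"}]

def s(itemset):
    """Support = number of baskets containing every item of `itemset`."""
    return sum(itemset <= basket for basket in baskets)

# Monotonicity: {m} and {b} are subsets of {m, b}, so their supports are at least as large.
assert s({"m", "b"}) <= s({"m"})   # 4 <= 5
assert s({"m", "b"}) <= s({"b"})   # 4 <= 6
```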



Applications – (1)

! Items = products; baskets = sets of products someone bought in one trip to the store.
! Example application: given that many people buy beer and diapers together:
− Run a sale on diapers; raise the price of beer.
! Only useful if many buy diapers & beer.



Applications – (2)

! Items = documents; baskets = sentences: for any sentence there is a basket containing all documents in which that sentence appears.
! Items (documents) that appear together too often could represent plagiarism.



Applications – (3)

! Baskets = Web pages; items = words.
! Unusual words appearing together in a large number of documents, e.g., “Hollande” and “Merkel”, may indicate an interesting connection.



Scale of the Problem

! WalMart, Carrefour sell more than 100,000 items and can store billions of baskets.
! The Web has billions of words and many billions of pages.



Association Rules

! If-then rules I → j about the contents of baskets, where I is a set of items and j is an item.
! I → j means: “if a basket contains all the items in I, then it is likely to contain j.”
! Confidence of this association rule is the probability of j given I, i.e. conf(I → j) = s(I ∪ {j}) / s(I).




Example: Confidence
+ B1 = {m, c, b}     B2 = {m, p, j}
- B3 = {m, b}        B4 = {c, j}
- B5 = {m, p, b}   + B6 = {m, c, b, j}
  B7 = {c, b, j}     B8 = {b, c}
(+ marks baskets that contain {m, b} and also c; - marks baskets that contain {m, b} but not c.)

! An association rule: {m, b} → c.
! Confidence = # baskets containing {m, b, c} / # baskets containing {m, b} = 2/4 = 0.5.
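The same computation as a short Python sketch; `support` is an illustrative helper that counts the baskets containing a given itemset:

```python
baskets = [{"m","c","b"}, {"m","p","j"}, {"m","b"}, {"c","j"},
           {"m","p","b"}, {"m","c","b","j"}, {"c","b","j"}, {"b","c"}]

def support(itemset):
    # Number of baskets containing all items of `itemset`.
    return sum(itemset <= basket for basket in baskets)

# Confidence of {m, b} -> c  =  s({m, b, c}) / s({m, b})
conf = support({"m", "b", "c"}) / support({"m", "b"})
print(conf)  # 2 / 4 = 0.5
```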


Finding Association Rules

! Question: “find all association rules with support ≥ s and confidence ≥ c.”
− Note: the “support” of an association rule is defined as s(I → j) = s(I ∪ {j}).
! Hard part: finding the frequent itemsets.



From Frequent Itemsets to AR

! Note: if I → j has high support and confidence, then I ∪ {j} is frequent, which implies that I is frequent as well.
! Hence, association rules with high confidence are found by considering every frequent itemset I and checking whether I \ {j} → j has high confidence, for each j in I.
! Difficult part: finding the frequent itemsets.
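A hedged sketch of this rule-generation step. It assumes the frequent itemsets and their supports have already been found (e.g., by A-Priori, described later) and are given as a dict keyed by frozenset; the name `association_rules` is just illustrative:

```python
def association_rules(support, min_conf):
    """Given {frozenset: support count} for all frequent itemsets,
    emit rules of the form (I - {j}) -> j whose confidence is at least min_conf."""
    rules = []
    for itemset, s_full in support.items():
        if len(itemset) < 2:
            continue
        for j in itemset:
            left = itemset - {j}
            conf = s_full / support[left]        # s(I) / s(I - {j})
            if conf >= min_conf:
                rules.append((set(left), j, conf))
    return rules

# Supports taken from the earlier example baskets:
support = {frozenset({"m"}): 5, frozenset({"b"}): 6, frozenset({"c"}): 5,
           frozenset({"m", "b"}): 4, frozenset({"b", "c"}): 4}
for left, j, conf in association_rules(support, 0.6):
    print(left, "->", j, "confidence", round(conf, 2))
# e.g. {'m'} -> b confidence 0.8, {'c'} -> b confidence 0.8, ...
```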



Computation Model

! Typically, data is kept in raw files rather than in a database system.
− Stored on disk.
− Stored basket-by-basket.
− Expand baskets into pairs, triples, etc. as you read baskets.
! Use k nested loops to generate all sets of size k.



File Organization

[Figure: file layout, item by item, with consecutive runs of items grouped into Basket 1, Basket 2, Basket 3, etc.]
Example: every line in a text document contains all items of a given basket.
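A minimal reading loop under that assumption (one basket per line, whitespace-separated item identifiers; the file name below is hypothetical):

```python
def read_baskets(path):
    """Yield one basket (a set of item identifiers) per line of the file."""
    with open(path) as f:
        for line in f:
            items = line.split()
            if items:              # skip blank lines
                yield set(items)

# for basket in read_baskets("baskets.txt"):
#     ...expand the basket into pairs, triples, etc....
```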



Computation Model – (2)

! The true cost of mining disk-resident data is usually the number of disk I/Os.
! In practice, association-rule algorithms read the data in passes – all baskets are read in turn.
! Thus, we measure the cost by the number of passes an algorithm takes.



Finding Frequent Pairs – Main-Memory Bottleneck

! For many frequent-itemset algorithms, main memory is the critical resource.
− As we read baskets, we need to count something, e.g., occurrences of pairs.
− The number of different things we can count is limited by main memory.



! The hardest problem often turns out to be finding the frequent pairs.
− Why? Often frequent pairs are common, frequent triples are rare.
− Why? The probability of being frequent drops exponentially with size; the number of sets grows more slowly with size.
! We’ll concentrate on pairs, then extend to larger sets.


Naïve Algorithm

! Read the file once, counting in main memory the occurrences of each pair.
− From each basket of n items, generate its n(n−1)/2 pairs by two nested loops.
! Fails if (#items)^2 exceeds main memory.
− Remember: #items can be 100K (Wal-Mart) or 10B (Web pages).
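A sketch of this naïve pass, assuming the baskets are available as an iterable of sets (e.g., from a `read_baskets` generator as above); `itertools.combinations` plays the role of the two nested loops:

```python
from collections import Counter
from itertools import combinations

def count_pairs_naive(baskets):
    """Single pass over the baskets, keeping an in-memory count for every pair seen."""
    pair_counts = Counter()
    for basket in baskets:
        # A basket of n items contributes n(n-1)/2 pairs.
        for pair in combinations(sorted(basket), 2):
            pair_counts[pair] += 1
    return pair_counts

# Breaks down when (#items)^2 counters no longer fit in main memory.
```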



Example: Counting Pairs

! Suppose 10^5 items.
! Suppose counts are 4-byte integers.
! Number of pairs of items: 10^5(10^5 − 1)/2 ≈ 5·10^9.
! Therefore, about 2·10^10 bytes (20 gigabytes) of main memory are needed.
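The same back-of-the-envelope calculation:

```python
n_items = 10**5
n_pairs = n_items * (n_items - 1) // 2      # ~5 * 10^9 pairs
bytes_needed = 4 * n_pairs                  # 4-byte counts
print(bytes_needed / 10**9)                 # ~20.0 gigabytes
```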



A-Priori Algorithm – (1)

! A two-pass approach called A-Priori limits the need for main memory.
! Key idea: monotonicity: if a set of items appears at least s times, so does every subset.
− For pairs: if item i does not appear in s baskets, then no pair including i can appear in s baskets.



A-Priori Algorithm – (2)
! Pass 1: Read baskets and count in main memory the occurrences of each item.
− Requires only memory proportional to #items.
! Items that appear at least s times are the frequent items.



A-Priori Algorithm – (3)
! Pass 2: Read the baskets again and count in main memory only those pairs both of whose items were found in Pass 1 to be frequent.
! To count the item pairs, use a hash function. Requires memory proportional to the square of the number of frequent items only, plus a list of the frequent items (so you know what must be counted), plus space for hashing.
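A minimal sketch of the two passes. It simplifies the memory layout by keeping the pair counts in an ordinary Python dictionary (a hash table keyed by the pair) rather than a hand-tuned structure; `read_baskets` is assumed to return a fresh iterator over the basket sets each time it is called, since the data is read once per pass:

```python
from collections import Counter
from itertools import combinations

def apriori_pairs(read_baskets, s):
    # Pass 1: count the occurrences of each individual item.
    item_counts = Counter()
    for basket in read_baskets():
        item_counts.update(basket)
    frequent_items = {i for i, c in item_counts.items() if c >= s}

    # Pass 2: count only pairs whose two items were both frequent in Pass 1.
    pair_counts = Counter()
    for basket in read_baskets():
        kept = sorted(i for i in basket if i in frequent_items)
        for pair in combinations(kept, 2):
            pair_counts[pair] += 1
    return {p: c for p, c in pair_counts.items() if c >= s}
```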



Main Memory Data in A-Priori

[Figure: main-memory layout. Pass 1: counts of all items. Pass 2: the list of frequent items plus counts of pairs of frequent items.]



Frequent k-itemsets

! For each k, we construct two sets of k-itemsets (sets of size k):
− Ck = candidate k-sets = those that might be frequent sets (support ≥ s) based on information from the (k−1)th pass.
− Fk = the set of truly frequent k-itemsets.









Frequent k-itemsets

! C1 = all items.
! Fk = members of Ck with support ≥ s.
! Ck+1 = (k+1)-sets, each of whose k-element subsets is in Fk.
− (e.g., {a,b,c,d} is in C4 only if {b,c,d}, {a,c,d}, {a,b,d}, {a,b,c} are all frequent)
! When do we stop? When Ck+1 is empty.
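A small sketch of the candidate-construction step. For clarity it enumerates all (k+1)-subsets of the items occurring in Fk and filters them with the subset test, which is simple but not the most efficient join-based construction; Fk is assumed to be a set of frozensets of size k:

```python
from itertools import combinations

def construct_candidates(F_k, k):
    """C_{k+1}: the (k+1)-item sets all of whose k-item subsets are in F_k."""
    items = sorted(set().union(*F_k)) if F_k else []
    C_next = set()
    for cand in combinations(items, k + 1):
        cand = frozenset(cand)
        if all(frozenset(sub) in F_k for sub in combinations(cand, k)):
            C_next.add(cand)
    return C_next

# {a,b,c,d} would enter C_4 only if {b,c,d}, {a,c,d}, {a,b,d}, {a,b,c} are all in F_3.
```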



A-Priori for All Frequent Itemsets

! One pass for each k.
! Needs room in main memory to count each candidate k-set.
! For typical market-basket data and reasonable support (e.g., 1%), k = 2 requires the most memory.
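Putting the pieces together, a hedged sketch of the full level-wise loop: one pass over the baskets per k, stopping when no candidates remain. As before, `read_baskets` is an assumed function returning a fresh iterator of basket sets on each call:

```python
from collections import Counter
from itertools import combinations

def apriori_all(read_baskets, s):
    """Return {frozenset: support} for every frequent itemset."""
    # Pass 1: C1 = all items; F1 = items with support >= s.
    counts = Counter()
    for basket in read_baskets():
        counts.update(frozenset([i]) for i in basket)
    F_k = {i: c for i, c in counts.items() if c >= s}

    frequent, k = {}, 1
    while F_k:
        frequent.update(F_k)
        # Candidate (k+1)-sets: every k-subset must be frequent (monotonicity).
        items = sorted({i for itemset in F_k for i in itemset})
        C_next = [frozenset(c) for c in combinations(items, k + 1)
                  if all(frozenset(sub) in F_k for sub in combinations(c, k))]
        if not C_next:
            break
        # One more pass over the data to count the surviving candidates.
        counts = Counter()
        for basket in read_baskets():
            for cand in C_next:
                if cand <= basket:
                    counts[cand] += 1
        F_k = {c: n for c, n in counts.items() if n >= s}
        k += 1
    return frequent
```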

