Session 8-Association Rules Mining
Contents
1 Introduction
2 Mining algorithms
3 Different data formats for mining
4 Problems with the association mining
5 Mining class association rules (CAR)
1. Introduction
❖ Data Mining
▪ Data processing
▪ Data warehouses and OLAP
▪ Association Rules Mining
▪ Classification
▪ Clustering
▪ Sequential Patterns Mining
▪ Advanced topics: outlier detection, web mining
❖ Proposed by Agrawal et al. in 1993. It is an important
data mining model studied extensively by the database
and data mining community
The model: data
❖ I = {i1, i2, …, im}: the set of all items
❖ Transaction t: a set of items such that t ⊆ I
Example of Transaction database: supermarket data
❖ Concepts:
▪ An item: an item/article in a basket
▪ I: the set of all items sold in the store
▪ A transaction: items purchased in a basket; it may
have TID (transaction ID)
▪ A transactional dataset: A set of transactions
Example of transaction database: a set of documents
The model: rules
❖ An association rule is an implication of the form X → Y, where X, Y ⊂ I and X ∩ Y = ∅
❖ A transaction t contains X, a set of items (itemset) in I, if X ⊆ t
Rule Measures: Support and Confidence
[Figure: Venn diagram of customers who buy beer, customers who buy diapers, and those who buy both]
❖ Find all the rules X & Y ⇒ Z with minimum confidence and support
▪ support, s: probability that a transaction contains {X, Y, Z}
▪ confidence, c: conditional probability that a transaction having {X, Y} also contains Z
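❖ To make the two measures concrete, here is a minimal Python sketch (not from the slides; the toy baskets and helper names are illustrative assumptions):

    def support(transactions, itemset):
        # Fraction of transactions that contain every item in itemset
        itemset = set(itemset)
        return sum(itemset <= t for t in transactions) / len(transactions)

    def confidence(transactions, lhs, rhs):
        # Conditional probability that a transaction with lhs also has rhs
        return support(transactions, set(lhs) | set(rhs)) / support(transactions, lhs)

    # Hypothetical baskets, for illustration only
    baskets = [
        {"beer", "diaper", "milk"},
        {"beer", "diaper"},
        {"beer", "bread"},
        {"diaper", "bread"},
    ]
    print(support(baskets, {"beer", "diaper"}))       # 0.5
    print(confidence(baskets, {"beer"}, {"diaper"}))  # 2/3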
Goal and key features
An example
❖ Transaction data:
t1: Beef, Chicken, Milk
t2: Beef, Cheese
t3: Cheese, Boots
t4: Beef, Chicken, Cheese
t5: Beef, Chicken, Clothes, Cheese, Milk
t6: Chicken, Clothes, Milk
t7: Chicken, Milk, Clothes
❖ Assume: minsup = 30%, minconf = 80%
Transaction data representation
❖ A simplistic view of shopping baskets
Association Rule Mining: A Road Map
❖ Boolean vs. quantitative associations (based on the types of values handled)
▪ buys(x, “SQLServer”) ∧ buys(x, “DMBook”) → buys(x, “DBMiner”) [0.2%, 60%]
▪ age(x, “30..39”) ∧ income(x, “42..48K”) → buys(x, “PC”) [1%, 75%]
❖ Various extensions
▪ Correlation, causality analysis
• Association does not necessarily imply correlation or causality
▪ Constraints enforced
• E.g., small sales (sum < 100) trigger big buys (sum > 1,000)?
2. Mining algorithms
❖ They use different strategies and data structures
❖ Naïve algorithm: enumerate every possible rule, scan the data to compute its support and confidence, and keep the rules that meet minsup and minconf; this is exponential in the number of items and hence impractical
Discovering Rules (2)
Mining Frequent Itemsets: the Key Step
❖ Step 1: find the frequent itemsets, i.e., the itemsets whose support ≥ minsup
❖ Step 2: use the frequent itemsets to generate rules (see below)
The Apriori algorithm
t1: Beef, Chicken, Milk
t2: Beef, Cheese
t3: Cheese, Boots
t4: Beef, Chicken, Cheese
t5: Beef, Chicken, Clothes, Cheese, Milk
t6: Chicken, Clothes, Milk
t7: Chicken, Milk, Clothes
[Figure: candidate itemset lattice — singletons A, B, C, D and pairs AB, AC, AD, BC, BD, CD]
The Algorithm
❖ Iterative algorithm (also called level-wise search): find all 1-item frequent itemsets, then all 2-item frequent itemsets, and so on
▪ In each iteration k, only consider itemsets that contain some frequent (k-1)-itemset
❖ From k = 2
▪ Ck = candidates of size k: those itemsets of size k that
could be frequent, given Fk-1
▪ Fk = those itemsets that are actually frequent, Fk ⊆ Ck
(need to scan the database once).
Example – Finding frequent itemsets
Details: ordering of items
❖ The items in I are sorted in lexicographic order (which is
a total order).
Details: the algorithm
Algorithm Apriori(T)
  C1 ← init-pass(T);
  F1 ← {f | f ∈ C1, f.count/n ≥ minsup};   // n: no. of transactions in T
  for (k = 2; Fk-1 ≠ ∅; k++) do
    Ck ← candidate-gen(Fk-1);
    for each transaction t ∈ T do
      for each candidate c ∈ Ck do
        if c is contained in t then
          c.count++;
      end
    end
    Fk ← {c ∈ Ck | c.count/n ≥ minsup}
  end
return F ← ∪k Fk;
Apriori candidate generation
❖ The candidate-gen function takes Fk-1 and returns a
superset (called the candidates) of the set of all
frequent k-itemsets. It has two steps:
▪ join step: Generate all possible candidate itemsets
Ck of length k
▪ prune step: Remove those candidates in Ck that
cannot be frequent
Candidate-gen function
Function candidate-gen(Fk-1)
  Ck ← ∅;
  for all f1, f2 ∈ Fk-1
      with f1 = {i1, …, ik-2, ik-1}
      and f2 = {i1, …, ik-2, i'k-1}
      and ik-1 < i'k-1 do
    c ← {i1, …, ik-1, i'k-1};   // join f1 and f2
    Ck ← Ck ∪ {c};
    for each (k-1)-subset s of c do
      if (s ∉ Fk-1) then
        delete c from Ck;   // prune
    end
  end
  return Ck;
An example
❖ F3 = {{1, 2, 3}, {1, 2, 4}, {1, 3, 4},
{1, 3, 5}, {2, 3, 4}}
❖ After join
▪ C4 = {{1, 2, 3, 4}, {1, 3, 4, 5}}
❖ After pruning:
▪ C4 = {{1, 2, 3, 4}}
because {1, 4, 5} is not in F3 ({1, 3, 4, 5} is removed)
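❖ The following is a minimal runnable Python sketch of the two functions above (Apriori(T) and candidate-gen), applied to the 7-transaction example from the earlier slide; itemsets are represented as sorted tuples, and all names are my own, not from the slides:

    from itertools import combinations

    def candidate_gen(fk):
        # Join step + prune step over the frequent k-itemsets fk (sorted tuples)
        fk_set = set(fk)
        candidates = []
        for f1, f2 in combinations(sorted(fk), 2):
            if f1[:-1] == f2[:-1] and f1[-1] < f2[-1]:   # join f1 and f2
                c = f1 + (f2[-1],)
                # prune: every (k-1)-subset of c must itself be frequent
                if all(s in fk_set for s in combinations(c, len(c) - 1)):
                    candidates.append(c)
        return candidates

    def apriori(transactions, minsup):
        # Level-wise search: count each candidate's support with a scan,
        # keep those meeting minsup, then generate the next level
        n = len(transactions)
        items = sorted({i for t in transactions for i in t})
        current = [(i,) for i in items]
        frequent = []
        while current:
            counts = {c: sum(set(c) <= t for t in transactions) for c in current}
            fk = [c for c in current if counts[c] / n >= minsup]
            frequent.extend(fk)
            current = candidate_gen(fk)
        return frequent

    data = [
        {"Beef", "Chicken", "Milk"}, {"Beef", "Cheese"}, {"Cheese", "Boots"},
        {"Beef", "Chicken", "Cheese"},
        {"Beef", "Chicken", "Clothes", "Cheese", "Milk"},
        {"Chicken", "Clothes", "Milk"}, {"Chicken", "Milk", "Clothes"},
    ]
    for itemset in apriori(data, minsup=0.3):
        print(itemset)   # e.g. ('Chicken', 'Clothes', 'Milk') is frequent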
The Apriori Algorithm — Example
❖ Min support = 50% = 2 transactions
[Figure: repeated scans of database D — C1 is counted and filtered to L1, L1 is joined into C2, a scan of D gives L2, and a final scan over C3 gives L3]
Step 2: Generating rules from frequent itemsets
❖ For each frequent itemset X and each proper nonempty subset A of X:
▪ let B = X − A; A → B is an association rule if confidence(A → B) ≥ minconf
▪ support(A → B) = support(X); confidence(A → B) = support(X) / support(A)
Generating rules: an example
❖ Suppose {2,3,4} is frequent, with sup=50%
▪ Proper nonempty subsets: {2,3}, {2,4}, {3,4}, {2}, {3}, {4},
with sup=50%, 50%, 75%, 75%, 75%, 75% respectively
▪ These subsets generate the following association rules:
• 2,3 → 4, confidence=100%
• 2,4 → 3, confidence=100%
• 3,4 → 2, confidence=67%
• 2 → 3,4, confidence=67%
• 3 → 2,4, confidence=67%
• 4 → 2,3, confidence=67%
• All rules have support = 50%
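❖ A sketch of this rule-generation step in Python (the support table is taken from the example above; helper names are mine):

    from itertools import combinations

    def generate_rules(itemset, sup, minconf):
        # Emit every rule A -> B where A is a proper nonempty subset of itemset
        # and confidence(A -> B) = sup(itemset) / sup(A) >= minconf
        items = frozenset(itemset)
        rules = []
        for r in range(1, len(items)):
            for lhs in map(frozenset, combinations(sorted(items), r)):
                conf = sup[items] / sup[lhs]
                if conf >= minconf:
                    rules.append((set(lhs), set(items - lhs), conf))
        return rules

    # Supports from the slide, as fractions
    sup = {frozenset(s): v for s, v in [
        ((2, 3, 4), 0.50), ((2, 3), 0.50), ((2, 4), 0.50), ((3, 4), 0.75),
        ((2,), 0.75), ((3,), 0.75), ((4,), 0.75),
    ]}
    for lhs, rhs, conf in generate_rules((2, 3, 4), sup, minconf=0.8):
        print(lhs, "->", rhs, conf)   # prints only the two 100%-confidence rules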
Generating rules: summary
On the Apriori Algorithm
❖ Seems to be very expensive
❖ Level-wise search
❖ K = the size of the largest itemset
❖ It makes at most K passes over the data
❖ In practice, K is bounded (often around 10)
❖ The algorithm is very fast; under some conditions, all rules can be found in linear time
❖ Scales up to large data sets
Hash-tree: search
❖ If you are at an interior node and you just used item i, then
use each item that comes after i in T
Methods to Improve Apriori’s Efficiency
❖ Transaction reduction: A transaction that does not contain any
frequent k-itemset is useless in subsequent scans
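❖ A one-function Python sketch of the transaction-reduction idea (the function name is mine):

    def reduce_transactions(transactions, frequent_k):
        # A transaction with no frequent k-itemset cannot contain any frequent
        # (k+1)-itemset, so it can be dropped from subsequent scans
        return [t for t in transactions
                if any(set(f) <= t for f in frequent_k)]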
Is Apriori Fast Enough? — Performance Bottlenecks
Max-Miner
Max-Miner: the idea
[Figure: set-enumeration tree over the items 1, 2, 3, 4]
Max-Miner pruning
The algorithm
Max-Miner(T)
  C ← ∅   // the set of candidate groups
  F ← {Gen-Initial-Groups(T, C)}
  while C ≠ ∅ do
    scan T to count the support of all candidate groups in C
    for each g ∈ C s.t. h(g) ∪ t(g) is frequent do
      F ← F ∪ {h(g) ∪ t(g)}
    Cnew ← ∅
    for each g ∈ C s.t. h(g) ∪ t(g) is infrequent do
      F ← F ∪ {Gen-Sub-Nodes(g, Cnew)}
    C ← Cnew
    remove from F any itemset with a proper superset in F
    remove from C any group g s.t. h(g) ∪ t(g) has a superset in F
  return F
The algorithm (2)
Gen-Initial-Groups(T, C)
  scan T to obtain F1, the set of frequent 1-itemsets
  impose an ordering on the items in F1
  for each item i in F1 other than the greatest item do
    let g be a new candidate group with h(g) = {i}
        and t(g) = {j | j follows i in the ordering}
    C ← C ∪ {g}
  return the itemset F1 (and the C, of course)
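❖ Max-Miner's target output is the set of maximal frequent itemsets. As a point of reference only — this is not Max-Miner itself, which avoids enumerating all frequent itemsets — here is a naïve Python filter over the output of the earlier apriori() sketch:

    def maximal_frequent(frequent_itemsets):
        # Keep only itemsets with no proper frequent superset (naive O(n^2) scan)
        sets = [set(f) for f in frequent_itemsets]
        return [f for f in sets if not any(f < g for g in sets)]

    # e.g. maximal_frequent(apriori(data, minsup=0.3))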
Item Ordering
More on association rule mining
❖ Clearly the space of all association rules is exponential, O(2^m), where m is the number of items in I
3. Different data formats for mining
❖ The data can be in transaction form or table form
❖ Transaction form:
a, b
a, c, d, e
a, d, f
Conversion
❖ Table form:
Attr1, Attr2, Attr3
a, b, d
b, c, e
❖ ⇒ Transaction form:
(Attr1, a), (Attr2, b), (Attr3, d)
(Attr1, b), (Attr2, c), (Attr3, e)
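❖ A minimal Python sketch of this conversion, using the slide's toy table (the function name is mine):

    def table_to_transactions(columns, rows):
        # Each row becomes a transaction of (attribute, value) items
        return [{(col, val) for col, val in zip(columns, row)} for row in rows]

    cols = ["Attr1", "Attr2", "Attr3"]
    rows = [("a", "b", "d"), ("b", "c", "e")]
    for t in table_to_transactions(cols, rows):
        print(sorted(t))   # [('Attr1', 'a'), ('Attr2', 'b'), ('Attr3', 'd')], ...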
4. Problems with the association mining
❖ Single minsup: It assumes that all items in the data are
of the same nature and/or have similar frequencies
Rare Item Problem
❖ If the frequencies of items vary a great deal, we will
encounter two problems:
▪ If minsup is set too high, those rules that involve rare
items will not be found
▪ To find rules that involve both frequent and rare items,
minsup has to be set very low. This may cause
combinatorial explosion because those frequent items
will be associated with one another in all possible ways
Multiple minsups model
❖ Each item can have its own minimum item support (its MIS value)
Minsup of a rule
❖ Let MIS(i) be the MIS value of item i. The minsup of a rule
R is the lowest MIS value of the items in the rule
An Example
❖ Consider the following items:
bread, shoes, clothes
Solution
❖ We sort all items in I according to their MIS values (make
it a total order)
The MSapriori algorithm
Algorithm MSapriori(T, MS)
  M ← sort(I, MS);
  L ← init-pass(M, T);
  F1 ← {{i} | i ∈ L, i.count/n ≥ MIS(i)};
  for (k = 2; Fk-1 ≠ ∅; k++) do
    if k = 2 then
      Ck ← level2-candidate-gen(L)
    else Ck ← MScandidate-gen(Fk-1);
    end;
    for each transaction t ∈ T do
      for each candidate c ∈ Ck do
        if c is contained in t then
          c.count++;
        if c − {c[1]} is contained in t then
          c.tailCount++
      end
    end
    Fk ← {c ∈ Ck | c.count/n ≥ MIS(c[1])}
  end
return F ← ∪k Fk;
Candidate itemset generation
❖ Special treatments needed:
▪ Sorting the items according to their MIS values
▪ First pass over data (the first three lines)
• Let us look at this in detail
▪ Candidate generation at level-2
• Read it in the handout
▪ Pruning step in level-k (k > 2) candidate generation
• Read it in the handout
First pass over data
❖ It makes a pass over the data to record the support
count of each item
❖ Assume our data set has 100 transactions. The first pass gives us the following support counts:
{1}.count = 9, {2}.count = 25,
{3}.count = 6, {4}.count = 3
❖ Why?
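❖ The slides do not show init-pass in full, so the Python sketch below follows the standard MSapriori description: items are scanned in increasing MIS order, the first item that meets its own MIS value enters the seed list L and fixes the threshold that every later item must clear. It reuses the counts above; the MIS values are hypothetical, chosen only for illustration:

    def init_pass(counts, mis, n):
        # Scan items in increasing MIS order; the first item i with
        # counts[i]/n >= MIS(i) enters L and anchors the threshold MIS(i);
        # each later item j enters L if counts[j]/n >= MIS(i)
        L, anchor = [], None
        for item in sorted(counts, key=lambda i: mis[i]):
            if anchor is None:
                if counts[item] / n >= mis[item]:
                    anchor = mis[item]
                    L.append(item)
            elif counts[item] / n >= anchor:
                L.append(item)
        return L

    counts = {1: 9, 2: 25, 3: 6, 4: 3}           # from the slide
    mis = {1: 0.10, 2: 0.20, 3: 0.05, 4: 0.06}   # hypothetical MIS values
    L = init_pass(counts, mis, n=100)
    F1 = [i for i in L if counts[i] / 100 >= mis[i]]
    print(L, F1)   # [3, 1, 2] [3, 2] with these MIS values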
On multiple minsup rule mining
❖ Multiple minsup model subsumes the single support
model
5. Mining class association rules (CAR)
❖ Normal association rule mining does not have any target
❖ It finds all possible rules that exist in data, i.e., any item
can appear as a consequent or a condition of a rule
Problem definition
❖ Let T be a transaction data set consisting of n transactions, where each transaction is labeled with a class y ∈ Y
❖ A class association rule (CAR) is an implication of the form X → y, where X ⊆ I and y ∈ Y
An example
❖ A text document data set
doc 1: Student, Teach, School : Education
doc 2: Student, School : Education
doc 3: Teach, School, City, Game : Education
doc 4: Baseball, Basketball : Sport
doc 5: Basketball, Player, Spectator : Sport
doc 6: Baseball, Coach, Game, Team : Sport
doc 7: Basketball, Team, City, Game : Sport
❖ Let minsup = 20% and minconf = 60%. The following are two examples of class association rules:
▪ Student, School → Education [sup = 2/7, conf = 2/2]
▪ Game → Sport [sup = 2/7, conf = 2/3]
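❖ A small Python check of these two rules against the documents above (the function name is mine):

    docs = [
        ({"Student", "Teach", "School"}, "Education"),
        ({"Student", "School"}, "Education"),
        ({"Teach", "School", "City", "Game"}, "Education"),
        ({"Baseball", "Basketball"}, "Sport"),
        ({"Basketball", "Player", "Spectator"}, "Sport"),
        ({"Baseball", "Coach", "Game", "Team"}, "Sport"),
        ({"Basketball", "Team", "City", "Game"}, "Sport"),
    ]

    def car_measures(docs, itemset, cls):
        # sup = P(itemset and cls); conf = P(cls | itemset)
        body = sum(itemset <= items for items, _ in docs)
        both = sum(itemset <= items and label == cls for items, label in docs)
        return both / len(docs), both / body

    print(car_measures(docs, {"Student", "School"}, "Education"))  # (2/7, 1.0)
    print(car_measures(docs, {"Game"}, "Sport"))                   # (2/7, 2/3)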
Mining algorithm
❖ Unlike normal association rules, CARs can be mined directly in one step
❖ Each class can have its own minimum support. For example, with a data set with two classes, Yes and No, we may want
▪ rules of class Yes to have a minimum support of 5%, and
▪ rules of class No to have a minimum support of 10%
❖ By setting minimum class supports to 100% (or more for some classes), we tell the algorithm not to generate rules of those classes
Summary
❖ Association rule mining has been extensively studied in the
data mining community
Exercises
Exercises (2)
Main reference