
Association Rule Learning

Association rule learning is a rule-based machine learning method for discovering interesting relations between variables in large databases.

It is intended to identify strong rules discovered in databases using some measures of interestingness.

Rules Discovered:
{Src IP = 206.163.37.95,
Dest Port = 139,
Bytes ∈ [150, 200]} --> {ATTACK}
Association rule learning

For example, if you analyse the grocery lists of a consumer over a period of time, you will be able to see a certain buying pattern, such as: if peanut butter and jelly are bought, then bread is also bought. This information can be used in marketing and pricing decisions.
Association rule learning

Another example is Netflix movie recommendations, which are made based on the choices of previous customers.

For example, if a movie of a particular genre is selected, then similar movies are recommended.
Market Basket Analysis

Market basket analysis may be performed on the retail data of customer transactions at a store.

The results can then be used to plan marketing or advertising strategies, or in the design of a new catalog.

Market basket analysis can also help retailers plan which items to put on sale at reduced prices. If customers tend to purchase computers and printers together, then having a sale on printers may encourage the sale of printers as well as computers.
Frequent Pattern Analysis
• Frequent pattern: a pattern (a set of items, subsequences, substructures, etc.) that occurs frequently in a data set
• First proposed by Agrawal, Imielinski, and Swami [AIS93] in the context of frequent itemsets and association rule mining
• Motivation: finding inherent regularities in data
  • What products were often purchased together? Bread and butter?
  • What are the subsequent purchases after buying a PC?
• Applications
  • Basket data analysis, cross-marketing, catalog design, sale campaign analysis, Web log (click stream) analysis, and DNA sequence analysis
Association Mining
• Association rule mining:
  • Finding frequent patterns, associations, or correlations among sets of items or objects in transaction databases, relational databases, and other information repositories.
• Examples
  • Rule form: "Body => Head [support, confidence]"
  • buys(x, "Bread") => buys(x, "Milk") [0.5%, 60%]
  • major(x, "CS") ^ takes(x, "DB") => grade(x, "A") [1%, 75%]
Example Association Rule

90% of transactions that purchase bread and butter also purchase milk.

Antecedent: bread and butter
Consequent: milk
Confidence factor: 90%
Example

• I: itemset
  {cucumber, parsley, onion, tomato, salt, bread, olives, cheese, butter}
• D: set of transactions
  1: {cucumber, parsley, onion, tomato, salt, bread}
  2: {tomato, cucumber, parsley}
  3: {tomato, cucumber, olives, onion, parsley}
  4: {tomato, cucumber, onion, bread}
  5: {tomato, salt, onion}
  6: {bread, cheese}
  7: {tomato, cheese, cucumber}
  8: {bread, butter}
FORMAL MODEL

• I = {i1, i2, …, im}: a set of literals (items)
• D: a database of transactions
• T ∈ D: a transaction, with T ⊆ I
• X: a subset of I
• T contains X if X ⊆ T
Rule Measures: Support and Confidence

Transaction ID | Items Bought
2000           | A, B, C
1000           | A, C
4000           | A, D
5000           | B, E, F

(Figure: Venn diagram of customers who buy Laptop, buy Printer, and buy both.)

Let minimum support = 50% and minimum confidence = 50%. We have:
• A => C (support 50%, confidence 66.6%)
• C => A (support 50%, confidence 100%)
Formal Model (Cont.)

• Association rule: X => Y, where X ⊆ I, Y ⊆ I and X ∩ Y = ∅.
• Rule X => Y has a support s in D if s% of the transactions in D contain X ∪ Y.
• Rule X => Y has a confidence c in D if c% of the transactions in D that contain X also contain Y.
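These definitions translate directly into code. The following is a minimal Python sketch (the function and variable names are illustrative, not from the slides) that computes support and confidence over a list of transactions, checked against the laptop/printer table above:

def support(itemset, transactions):
    # Fraction of transactions that contain every item in `itemset`.
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(X, Y, transactions):
    # Fraction of transactions containing X that also contain Y.
    return support(X | Y, transactions) / support(X, transactions)

# The four-transaction database from the slide above.
D = [{"A", "B", "C"}, {"A", "C"}, {"A", "D"}, {"B", "E", "F"}]

print(support({"A", "C"}, D))        # 0.5      -> 50% support
print(confidence({"A"}, {"C"}, D))   # 0.666... -> 66.6%
print(confidence({"C"}, {"A"}, D))   # 1.0      -> 100%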
Terminologies

• K-itemset: an itemset of length K.
• Frequent K-itemset: an itemset of length K that satisfies a minimum support threshold.
• Downward closure property: any subset of a frequent itemset is also frequent.
• Upward closure property: any superset of an infrequent itemset is also infrequent.
• Maximal frequent set: a set that is itself frequent and has no frequent superset.
• Border set: a set that is not frequent, but all of whose proper subsets are frequent.
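To illustrate the maximal-frequent-set definition, here is a small hedged Python sketch (names are my own) that picks out the maximal sets from a given collection of frequent itemsets:

def maximal_frequent(frequent):
    # Keep only itemsets that have no frequent proper superset.
    return [s for s in frequent
            if not any(s < other for other in frequent)]

frequent = [frozenset(s) for s in
            [{1}, {2}, {3}, {5}, {1, 3}, {2, 3}, {2, 5}, {3, 5}, {2, 3, 5}]]
print(maximal_frequent(frequent))   # [frozenset({1, 3}), frozenset({2, 3, 5})]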
The Apriori Algorithm: Example 1

Database D (minimum support count = 2):

TID | Items
100 | 1 3 4
200 | 2 3 5
300 | 1 2 3 5
400 | 2 5

Scan D to count the candidate 1-itemsets C1; those meeting minimum support form L1:

C1: {1}:2, {2}:3, {3}:3, {4}:1, {5}:3
L1: {1}:2, {2}:3, {3}:3, {5}:3

Join L1 with itself to obtain C2, scan D to count, and keep the frequent ones as L2:

C2: {1 2}, {1 3}, {1 5}, {2 3}, {2 5}, {3 5}
counts: {1 2}:1, {1 3}:2, {1 5}:1, {2 3}:2, {2 5}:3, {3 5}:2
L2: {1 3}:2, {2 3}:2, {2 5}:3, {3 5}:2

Join L2 to obtain C3 and scan D once more:

C3: {2 3 5}
L3: {2 3 5}:2
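This level-wise pass can be reproduced with a short, hedged Python sketch (a simplified implementation with illustrative names, not the exact algorithm from the slides; in particular it omits the subset-pruning step described later, so it may count a few extra candidates):

from itertools import combinations

D = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
MIN_SUP = 2  # absolute support count

def frequent_itemsets(transactions, min_sup):
    items = {i for t in transactions for i in t}
    level = {frozenset([i]) for i in items}   # C1: candidate 1-itemsets
    k, result = 1, {}
    while level:
        # Scan D: count the support of each candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in level}
        frequent = {c: n for c, n in counts.items() if n >= min_sup}
        result.update(frequent)
        # Join frequent k-itemsets to form the (k+1)-candidates.
        keys = list(frequent)
        level = {a | b for a, b in combinations(keys, 2) if len(a | b) == k + 1}
        k += 1
    return result

for itemset, sup in sorted(frequent_itemsets(D, MIN_SUP).items(),
                           key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), sup)   # reproduces L1, L2 and L3 above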
Discovering Large Itemsets

Apriori algorithm: uses prior knowledge of frequent itemset properties (the downward closure property).
It is a level-wise algorithm.

Basic intuition:
Candidate itemsets having k items can be generated by joining large itemsets having k-1 items, and deleting those that contain any subset that is not large.

The algorithm consists of two phases:
1) Candidate generation
2) Pruning
Apriori Algorithm

Gen_candidate_itemsets with the given Lk-1 as follows:

Ck = ∅
for all itemsets L1 ∈ Lk-1 do
  for all itemsets L2 ∈ Lk-1 do
    if L1[1] = L2[1] and L1[2] = L2[2] and … and L1[k-2] = L2[k-2] and L1[k-1] < L2[k-1]
    then c = L1[1], L1[2], …, L1[k-1], L2[k-1]
         Ck = Ck ∪ {c}
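A direct Python translation of this join step might look as follows (a hedged sketch with illustrative names; itemsets are kept as sorted tuples, so the positional comparisons above carry over with 0-based indexing):

def gen_candidates(L_prev):
    # Join frequent (k-1)-itemsets that agree on all but their last item.
    Ck = set()
    for a in L_prev:
        for b in L_prev:
            if a[:-1] == b[:-1] and a[-1] < b[-1]:
                Ck.add(a + (b[-1],))
    return Ck

L3 = {(1, 2, 3), (1, 2, 5), (1, 3, 5), (2, 3, 4), (2, 3, 5)}
print(sorted(gen_candidates(L3)))   # [(1, 2, 3, 5), (2, 3, 4, 5)]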
Apriori Candidate Generation

Example:
L3 = { {1,2,3}, {1,2,5}, {1,3,5}, {2,3,4}, {2,3,5} }

The algorithm will generate the following itemsets:
1) {1,2,3,5} is generated from {1,2,3} and {1,2,5}.
2) Similarly, {2,3,4,5} is generated from {2,3,4} and {2,3,5}.

Now C4 = { {1,2,3,5}, {2,3,4,5} }
Example of Generating Candidates

• L3 = {abc, abd, acd, ace, bcd}
• Self-joining: L3 * L3
  • abcd from abc and abd
  • acde from acd and ace
• Pruning:
  • acde is removed because ade is not in L3
• C4 = {abcd}
PRUNING

• The pruning step eliminates the extensions of (k-1)-itemsets which are not frequent.

prune(Ck)
for all c ∈ Ck
  for all (k-1)-subsets d of c do
    if d ∉ Lk-1
    then Ck = Ck \ {c}
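In Python, this step might be sketched as follows (illustrative names; itertools.combinations enumerates the (k-1)-subsets of each candidate):

from itertools import combinations

def prune(Ck, L_prev):
    # Drop candidates that have at least one infrequent (k-1)-subset.
    return {c for c in Ck
            if all(tuple(sub) in L_prev for sub in combinations(c, len(c) - 1))}

L3 = {(1, 2, 3), (1, 2, 5), (1, 3, 5), (2, 3, 4), (2, 3, 5)}
C4 = {(1, 2, 3, 5), (2, 3, 4, 5)}
print(prune(C4, L3))   # {(1, 2, 3, 5)}; (2, 4, 5) and (3, 4, 5) are not in L3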
APRIORI ALGORITHM

Example:
With k = 3 (and k-itemsets lexicographically ordered):

{3,4,5}, {3,4,7}, {3,5,6}, {3,5,7}, {3,5,8}, {4,5,6}, {4,5,7}

Generate all possible (k+1)-itemsets: for each two sets {a1, a2, …, a(k-1), X} and {a1, a2, …, a(k-1), Y}, the join produces the candidate {a1, a2, …, a(k-1), X, Y}.

{3,4,5,7}, {3,5,6,7}, {3,5,6,8}, {3,5,7,8}, {4,5,6,7}
APRIORI ALGORITHM

Example (continued):

{3,4,5,7}, {3,5,6,7}, {3,5,6,8}, {3,5,7,8}, {4,5,6,7}

Delete (prune) all itemset candidates with non-frequent subsets. For example, {3,5,6,8} can itself never be frequent, since its subset {5,6,8} is not frequent.

Here, only one candidate remains: {3,4,5,7}.

Last, after pruning, determine the support of the remaining itemsets and check whether they meet the threshold.
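The prune sketch above reproduces this result (again using sorted tuples as itemsets):

L3 = {(3, 4, 5), (3, 4, 7), (3, 5, 6), (3, 5, 7), (3, 5, 8), (4, 5, 6), (4, 5, 7)}
C4 = {(3, 4, 5, 7), (3, 5, 6, 7), (3, 5, 6, 8), (3, 5, 7, 8), (4, 5, 6, 7)}
print(prune(C4, L3))   # {(3, 4, 5, 7)}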
Example

Database with transactions (customer_# : item_a1, item_a2, …):

1: 3, 5, 8.
2: 2, 6, 8.
3: 1, 4, 7, 10.
4: 3, 8, 10.
5: 2, 5, 8.
6: 1, 5, 6.
7: 4, 5, 6, 8.
8: 2, 3, 4.
9: 1, 5, 7, 8.
10: 3, 8, 9, 10.

Conf( {5} => {8} )?
supp({5}) = 5, supp({8}) = 7, supp({5,8}) = 4,
so conf( {5} => {8} ) = 4/5 = 0.8, or 80%.
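These numbers can be checked directly with a few lines of Python (an illustrative sketch; here support is kept as an absolute count, so the confidence ratio is taken explicitly):

D = [{3, 5, 8}, {2, 6, 8}, {1, 4, 7, 10}, {3, 8, 10}, {2, 5, 8},
     {1, 5, 6}, {4, 5, 6, 8}, {2, 3, 4}, {1, 5, 7, 8}, {3, 8, 9, 10}]

def supp_count(itemset, transactions):
    # Number of transactions that contain every item in `itemset`.
    return sum(1 for t in transactions if itemset <= t)

print(supp_count({5}, D), supp_count({8}, D), supp_count({5, 8}, D))  # 5 7 4
print(supp_count({5, 8}, D) / supp_count({5}, D))  # 0.8     -> conf({5} => {8})
print(supp_count({5, 8}, D) / supp_count({8}, D))  # 0.571.. -> conf({8} => {5})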
Example

(Same database as above.)

Conf( {5} => {8} )? 80%. Done.
Conf( {8} => {5} )?
supp({5}) = 5, supp({8}) = 7, supp({5,8}) = 4,
so conf( {8} => {5} ) = 4/7 ≈ 0.57, or 57%.
Example

Conf( {5} => {8} )? 80%. Done.
Conf( {8} => {5} )? 57%. Done.

Rule ( {5} => {8} ) is more meaningful than rule ( {8} => {5} ).
Example

(Same database as above.)

Conf( {9} => {3} )?
supp({9}) = 1, supp({3}) = 4, supp({3,9}) = 1,
so conf( {9} => {3} ) = 1/1 = 1.0, or 100%. OK?
Example

Conf( {9} => {3} ) = 100%. Done.

Notice: high confidence, low support.
=> Rule ( {9} => {3} ) is not meaningful.
Problem decomposition

1. Find all itemsets that have transaction support above minimum support. Such itemsets are called frequent (large) itemsets.
2. Use the large itemsets to generate the association rules:
   2.1. For every large itemset I, find all its subsets.
   2.2. For every subset a, output the rule a => (I - a) if
        support(I) / support(a) >= minconf
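Step 2 can be sketched in Python as follows (a hedged sketch with illustrative names; it reuses supp_count from above and enumerates every non-empty proper subset of a large itemset):

from itertools import combinations

def gen_rules(large_itemset, transactions, minconf):
    # Yield every rule a => (I - a) whose confidence meets minconf.
    I = frozenset(large_itemset)
    sup_I = supp_count(I, transactions)
    for r in range(1, len(I)):
        for a in combinations(sorted(I), r):
            a = frozenset(a)
            conf = sup_I / supp_count(a, transactions)
            if conf >= minconf:
                yield sorted(a), sorted(I - a), conf

# Rules from the frequent itemset {2, 3, 5} of the earlier example database:
D = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
for lhs, rhs, conf in gen_rules({2, 3, 5}, D, minconf=0.7):
    print(lhs, "=>", rhs, f"conf={conf:.2f}")   # [2, 3] => [5] and [3, 5] => [2]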
Generation of Rules

Extract rules from the discovered frequent itemsets.

For a rule:
R: <c1, c2, …, ci-1> => <ci, ci+1, …, ck>
   (head)                (tail)

A confidence value is calculated:
conf(R) = support({c1, …, ck}) / support({c1, …, ci-1})
A database has 5 transactions. Let min sup = 0.6 and min conf = 0.8.
• List the frequent k-itemsets for the largest k, and
• all the strong association rules (with their support and confidence) matching the following shape of rules:
  for all x in transaction, buys(x, item1) ^ buys(x, item2) => buys(x, item3)

Customer | Date  | Items_bought
100      | 10/15 | {K, A, D, B, C}
200      | 10/15 | {D, A, E, F}
300      | 10/19 | {C, D, B, E}
400      | 10/20 | {B, A, C, K, D}
500      | 10/21 | {A, G, C}
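As a final hedged sketch, the helpers defined earlier (frequent_itemsets and gen_rules, both illustrative) can be combined to check an answer to this exercise; min sup = 0.6 over 5 transactions means an itemset must appear in at least 3 of them:

D = [{"K", "A", "D", "B", "C"}, {"D", "A", "E", "F"},
     {"C", "D", "B", "E"}, {"B", "A", "C", "K", "D"}, {"A", "G", "C"}]

freq = frequent_itemsets(D, min_sup=3)          # support count >= 0.6 * 5
largest = max(len(s) for s in freq)
for s in (s for s in freq if len(s) == largest):
    for lhs, rhs, conf in gen_rules(s, D, minconf=0.8):
        if len(lhs) == 2 and len(rhs) == 1:     # shape: item1 ^ item2 => item3
            print(lhs, "=>", rhs, f"conf={conf:.2f}")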
