Association rule mining finds frequent patterns and correlations among items in transactional databases. It aims to discover rules that describe large portions of your data like "customers that buy X also tend to buy Y". The key step is finding all frequent itemsets that occur above a minimum support threshold. The Apriori algorithm is commonly used, joining potentially frequent itemsets in each pass over the database. Support and confidence are typical measures but have limitations, and other measures like lift address rules between negatively correlated items better. Association rule mining has had significant impact and continued research explores new data types.
Data Mining:
Association Rules Techniques
May 10, 2023
What Is Association Mining?
Association rule mining:
Finding frequent patterns, associations, and correlations among sets of items or objects in transaction databases, relational databases, and other information repositories.
Applications: basket data analysis, cross-marketing, catalog design, loss-leader analysis, clustering, classification, etc.
Examples:
Rule form: "Body → Head [support, confidence]"
buys(x, "diapers") → buys(x, "beers") [0.5%, 60%]
major(x, "CS") ^ takes(x, "DB") → grade(x, "A") [1%, 75%]
Association Rule: Basic Concepts
Given: (1) a database of transactions, where (2) each transaction is a list of items (purchased by a customer in a visit)
Find: all rules that correlate the presence of one set of items with that of another set of items
E.g., 98% of people who purchase tires and auto accessories also get automotive services done
Rule Measures: Support and Confidence

[Figure: Venn diagram of customers who buy beer, customers who buy diapers, and customers who buy both]

Find all the rules X & Y → Z with minimum confidence and support
support, s: probability that a transaction contains {X, Y, Z}
confidence, c: conditional probability that a transaction having {X, Y} also contains Z

Transaction ID   Items Bought
2000             A, B, C
1000             A, C
4000             A, D
5000             B, E, F

Let minimum support be 50% and minimum confidence be 50%; we have
A → C (50%, 66.6%)
C → A (50%, 100%)

Association Rule Mining: A Road Map
Boolean vs. quantitative associations (based on the types of values handled)
buys(x, "SQLServer") ^ buys(x, "DMBook") → buys(x, "DBMiner") [0.2%, 60%]
age(x, "30..39") ^ income(x, "42..48K") → buys(x, "PC") [1%, 75%]
Single-dimension vs. multi-dimensional associations (see examples above)
Single-level vs. multiple-level analysis
What brands of beers are associated with what brands of diapers?

Mining Association Rules—An Example
Min. support 50%, min. confidence 50%

Transaction ID   Items Bought
2000             A, B, C
1000             A, C
4000             A, D
5000             B, E, F

Frequent Itemset   Support
{A}                75%
{B}                50%
{C}                50%
{A, C}             50%

For rule A → C:
support = support({A, C}) = 50%
confidence = support({A, C}) / support({A}) = 66.6%
The Apriori principle: any subset of a frequent itemset must be frequent

Mining Frequent Itemsets: the Key Step

Find the frequent itemsets: the sets of items that have minimum support
Any subset of a frequent itemset must also be a frequent itemset; i.e., if {A, B} is a frequent itemset, both {A} and {B} must be frequent itemsets
Iteratively find frequent itemsets with cardinality from 1 to k (k-itemsets)
Use the frequent itemsets to generate association rules.
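The support and confidence figures above can be reproduced directly. This is a minimal sketch in Python; the transactions copy the slide's table, while the helper names (`support`, `confidence`) are my own, not from the slides.

```python
# The four transactions from the slide's example (IDs 2000, 1000, 4000, 5000).
transactions = [
    {"A", "B", "C"},
    {"A", "C"},
    {"A", "D"},
    {"B", "E", "F"},
]

def support(itemset, db):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in db) / len(db)

def confidence(lhs, rhs, db):
    """support(lhs union rhs) / support(lhs)."""
    return support(set(lhs) | set(rhs), db) / support(lhs, db)

print(support({"A", "C"}, transactions))       # 0.5   -> 50%
print(confidence({"A"}, {"C"}, transactions))  # 0.666... -> 66.6%
print(confidence({"C"}, {"A"}, transactions))  # 1.0   -> 100%
```

The printed values match the slide's rules A → C (50%, 66.6%) and C → A (50%, 100%).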
The Apriori Algorithm

Join Step: Ck is generated by joining Lk-1 with itself
Prune Step: Any (k-1)-itemset that is not frequent cannot be a subset of a frequent k-itemset
Pseudo-code:
  Ck: candidate itemsets of size k
  Lk: frequent itemsets of size k
  L1 = {frequent items};
  for (k = 1; Lk != ∅; k++) do begin
      Ck+1 = candidates generated from Lk;
      for each transaction t in database do
          increment the count of all candidates in Ck+1 that are contained in t;
      Lk+1 = candidates in Ck+1 with min_support;
  end
  return ∪k Lk;
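The join and prune steps of the pseudo-code above can be sketched as a compact Python implementation. This is an illustrative version, not an optimized one; the function name `apriori` and its internals are my own choices.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return all frequent itemsets (as frozensets) via level-wise search."""
    n = len(transactions)

    def frequent(candidates):
        # Count each candidate's occurrences and keep those meeting min_support.
        counts = {c: sum(c <= t for t in transactions) for c in candidates}
        return {c for c, cnt in counts.items() if cnt / n >= min_support}

    # L1: frequent 1-itemsets.
    items = {frozenset([i]) for t in transactions for i in t}
    L = frequent(items)
    result = set(L)
    k = 1
    while L:
        # Join step: combine itemsets from Lk into (k+1)-item candidates.
        candidates = {a | b for a in L for b in L if len(a | b) == k + 1}
        # Prune step: every k-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in L for s in combinations(c, k))}
        L = frequent(candidates)
        result |= L
        k += 1
    return result

frequent_sets = apriori(
    [{"A", "B", "C"}, {"A", "C"}, {"A", "D"}, {"B", "E", "F"}],
    min_support=0.5,
)
# With min support 50% this yields {A}, {B}, {C}, and {A, C},
# matching the earlier example slide.
```

The prune step is what makes the Apriori principle operational: a candidate is discarded as soon as any of its k-subsets is missing from Lk, before the database is scanned again.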
(KDD95) A rule (pattern) is interesting if it is
unexpected (surprising to the user); and/or
actionable (the user can do something with it)
Criticism to Support and Confidence

Example 1 (Aggarwal & Yu, PODS98): Among 5000 students,
3000 play basketball
3750 eat cereal
2000 both play basketball and eat cereal
play basketball → eat cereal [40%, 66.7%] is misleading, because the overall percentage of students eating cereal is 75%, which is higher than 66.7%.
play basketball → not eat cereal [20%, 33.3%] is far more accurate, although with lower support and confidence.

             basketball   not basketball   sum(row)
cereal       2000         1750             3750
not cereal   1000         250              1250
sum(col.)    3000         2000             5000

Criticism to Support and Confidence (Cont.)

Example 2:
X and Y: positively correlated
X and Z: negatively related
yet the support and confidence of X => Z dominate

X   1 1 1 1 0 0 0 0
Y   1 1 0 0 0 0 0 0
Z   0 1 1 1 1 1 1 1

Rule     Support   Confidence
X => Y   25%       50%
X => Z   37.5%     75%

We need a measure of dependent or correlated events:
corr(A, B) = P(A ∧ B) / (P(A) P(B))
P(B|A) / P(B) is also called the lift of rule A => B

Other Interestingness Measures: Interest

Interest (correlation, lift): P(A ∧ B) / (P(A) P(B))
takes both P(A) and P(B) into consideration
P(A ∧ B) = P(A) P(B) if A and B are independent events
A and B are negatively correlated if the value is less than 1; otherwise A and B are positively correlated

Itemset   Support   Interest
X, Y      25%       2
X, Z      37.5%     0.9
Y, Z      12.5%     0.57
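The interest (lift) values for the X, Y, Z example above can be checked numerically. A minimal sketch, assuming the 8-transaction binary vectors from the slide; the helper name `lift` is my own.

```python
# The slide's three item columns across 8 transactions (1 = item present).
X = [1, 1, 1, 1, 0, 0, 0, 0]
Y = [1, 1, 0, 0, 0, 0, 0, 0]
Z = [0, 1, 1, 1, 1, 1, 1, 1]

def lift(a, b):
    """Interest measure: P(A and B) / (P(A) * P(B))."""
    n = len(a)
    p_ab = sum(x and y for x, y in zip(a, b)) / n
    p_a = sum(a) / n
    p_b = sum(b) / n
    return p_ab / (p_a * p_b)

print(round(lift(X, Y), 2))  # 2.0  -> X and Y positively correlated
print(round(lift(X, Z), 2))  # 0.86 -> X and Z negatively correlated
print(round(lift(Y, Z), 2))  # 0.57 -> Y and Z negatively correlated
```

The values agree with the slide's table (its 0.9 for {X, Z} is 0.857 rounded to one decimal), and they show why lift succeeds where confidence fails: X => Z has the higher confidence (75%), yet its lift below 1 exposes the negative correlation.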
Summary

Association rule mining is probably the most significant contribution from the database community in KDD.
A large number of papers have been published, and many interesting issues have been explored.
An interesting research direction: association analysis in other types of data, such as spatial data, multimedia data, and time series data.