Lecture 5

Association Rule Mining

Data Mining: Association Rules


Overview

■ Data Mining
■ Association rule mining
■ Apriori method
■ Some other methods – a brief overview.
■ We will look at these in detail later.



Association Mining?
• Association rule mining:
– Finding frequent patterns, associations, correlations, or
causal structures among sets of items or objects in
transaction databases, relational databases, and other
information repositories.
• Applications:
– Basket data analysis, cross-marketing, catalog design,
clustering, classification, etc.
• Examples:
– Rule form: “Body → Head [support, confidence]”.
– buys(x, “diapers”) → buys(x, “beers”) [0.5%, 60%]
– major(x, “CS”) ^ takes(x, “DB”) → grade(x, “A”) [1%, 75%]

Association Rules: Basic Concepts
• Given: (1) a database of transactions; (2) each transaction
is a list of items (purchased by a customer in one visit)
• Find: all rules that correlate the presence of one set of
items with that of another set of items
– E.g., 98% of people who buy a laptop also buy antivirus
software.

Association Rule

■ An example: there is a supermarket, and people buy items from it.
■ The items bought by each person are stored in a database.
■ Let the items be {A, B, C, …}.



Association Rule

■ A rule says: if a person buys the set of items {A,C,E},
then mostly he/she will also buy another set of items {D,F}.
■ {A,C,E} -> {D,F} is the association rule.
■ E.g.: people who buy potato chips also tend to buy cool drinks.
■ Potato chips -> cool drinks



Association Rule

■ But how good (sound) are these rules?
■ That is, how much can we trust these rules?
■ Are these rules useful?
■ How frequently is each rule applicable?



Association Rule

■ {D} -> {A} is an association rule.
■ According to the given database, this rule is true.
[confidence is high]
■ But only one person bought both D and A.
[support is low]



Association Rule

■ {A} -> {C} is an association rule.
■ According to the given database, this rule is only partly
true. [confidence is not 100%]
■ But 2 out of 4 bought both A and C.
[support is moderate]



Notation and Definitions

■ Let I be the set of all items.
■ Let X, Y, … be subsets of I.
■ We call X, Y, … itemsets.
■ If X has k items, then X is called a k-itemset.
■ Let I be of size n; that is, there are n items in total.
■ Then the total number of (non-empty) itemsets is 2^n – 1.
■ An association rule is of the form X -> Y.





The Example

For rule A ⇾ C :
support = 0.5 (or 50%)
confidence = 0.666 (or 66.6%)



Notation

■ Support is normally defined for an itemset.
■ Support(X) = the percentage of transactions containing X.
■ Confidence is defined for a rule.
■ Confidence(X ⇾ Y) = Support(X ∪ Y) / Support(X)

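These definitions are easy to turn into code. A minimal Python sketch (the four-transaction database below is hypothetical, standing in for the example database on the slides, which is not reproduced in this text):

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(transactions, body, head):
    """Confidence(body -> head) = support(body U head) / support(body)."""
    return support(transactions, set(body) | set(head)) / support(transactions, body)

# Hypothetical database, consistent with the slides' A -> C example:
# A appears in 3 of 4 transactions, and A and C appear together in 2.
db = [{"A", "C"}, {"A", "C", "D"}, {"A", "B"}, {"B", "C"}]
```

With this database, support(db, {"A", "C"}) is 0.5 and confidence(db, {"A"}, {"C"}) is 2/3 ≈ 0.666, matching the numbers quoted for A ⇾ C.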


An Exercise Problem

Transaction Id   Items bought
100              A,B,C
101              B,C
102              A,C
103              A,B,D
104              A,B,C
105              A,C,E
106              B,D
107              A,B,C

Find the support and confidence of A ⇾ B.
Find the support and confidence of B ⇾ A.
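A quick way to check your answers is to count directly (a small sketch; the rule X ⇾ Y has support = freq(X ∪ Y)/n and confidence = freq(X ∪ Y)/freq(X)):

```python
transactions = {
    100: {"A", "B", "C"}, 101: {"B", "C"}, 102: {"A", "C"}, 103: {"A", "B", "D"},
    104: {"A", "B", "C"}, 105: {"A", "C", "E"}, 106: {"B", "D"}, 107: {"A", "B", "C"},
}

def freq(itemset):
    """Number of transactions containing every item of `itemset`."""
    return sum(itemset <= items for items in transactions.values())

n = len(transactions)
sup_ab = freq({"A", "B"}) / n             # support of A -> B (same for B -> A)
conf_ab = freq({"A", "B"}) / freq({"A"})  # confidence of A -> B
conf_ba = freq({"A", "B"}) / freq({"B"})  # confidence of B -> A
```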




Functional Dependency in DBMS

■ Functional dependencies (FDs) in relational databases are
association rules with 100% confidence.
■ For FDs, support is irrelevant.



The Problem of finding Association Rules

■ Given a transactional database, find all association rules
satisfying the given minimum support and confidence.



The Problem of finding Association Rules

■ This problem boils down to two sub-problems:
■ Find all itemsets whose support is more than the minimum value.
■ This is called frequent itemset mining.
■ Find the association rules using the frequent itemsets.



The Problem

■ Frequent itemset mining is the more difficult problem.
■ Find all itemsets whose support is more than a given value.
■ How difficult is this problem?



Association rules from frequent
itemsets can be easily found !!

But how do we know that the support of A (the body of a rule) has already
been captured? Because A is a subset of a frequent itemset, A itself is
frequent, so its support was counted during frequent itemset mining.



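A sketch of this rule-generation phase in Python, assuming the frequent itemsets and their supports have already been computed (function and variable names are illustrative). The key point: every subset of a frequent itemset is itself frequent, so the support of each candidate rule body is guaranteed to be in the table.

```python
from itertools import combinations

def gen_rules(supports, min_conf):
    """Generate rules body -> head from frequent itemsets.

    `supports` maps each frequent itemset (a frozenset) to its support.
    Because every subset of a frequent itemset is frequent,
    supports[body] is always present.
    """
    rules = []
    for itemset, sup in supports.items():
        for r in range(1, len(itemset)):  # every proper, non-empty body
            for body in map(frozenset, combinations(itemset, r)):
                conf = sup / supports[body]
                if conf >= min_conf:
                    rules.append((set(body), set(itemset - body), conf))
    return rules
```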




A Naive Algorithm

1) For each itemset, create a counter.
2) Initialize all counters to zero.
3) For each transaction in the database, find all subsets of the
transaction and increment their respective counters.
4) Select those itemsets whose counter value is more than the given
threshold.

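The steps above can be sketched as follows (only subsets that actually occur get a counter, which spares the explicit initialization of 2^n – 1 counters but is still exponential in the width of each transaction):

```python
from itertools import combinations

def naive_frequent_itemsets(transactions, min_count):
    """One database scan; one counter per itemset observed in a transaction."""
    counters = {}
    for t in transactions:
        items = sorted(t)
        # Steps 1-3: enumerate every non-empty subset of the transaction
        # and increment its counter.
        for r in range(1, len(items) + 1):
            for subset in combinations(items, r):
                counters[subset] = counters.get(subset, 0) + 1
    # Step 4: keep the itemsets whose count reaches the threshold.
    return {s: c for s, c in counters.items() if c >= min_count}
```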


Analysis of the Algorithm

■ If there are n items, then the total number of counters is 2^n – 1.
■ If n is small (say, < 20), then this is a feasible solution.
■ But when n is large (say, 1000), it is not feasible to create
2^1000 – 1 counters.
■ As an exercise, try to figure out how big this number is.



Analysis

■ Time complexity is O(m). [Good]
■ Number of database scans: only one. [Good]
■ Space complexity is O(2^n). [Very Bad]
■ In data mining, the number of database scans is an important
measure of scalability.



Other Naïve Method

■ The other way is to use only one counter and find the support
of each itemset separately.
■ For this, one has to scan the database 2^n – 1 times.
■ Space complexity is reduced, but time complexity is increased.



Apriori Algorithm

■ One of the first algorithms to solve this problem in a better way.
■ It uses an important property of itemsets:
■ Every subset of a frequent itemset must also be a frequent itemset.
■ i.e., if {A,B} is a frequent itemset, both {A} and {B} must also
be frequent.
■ If either {A} or {B} is not frequent, then {A,B} is also
non-frequent.



Apriori Algorithm

■ Some itemsets can be discarded at an early stage.
■ For example, if X is a non-frequent itemset, then there is no
need to consider any superset of X.
■ But if X is frequent, then a superset of X may also be frequent.



Apriori Algorithm

■ This is a bottom-up method.
■ First find the frequent 1-itemsets, then the frequent 2-itemsets, …
■ Suppose we have already found the frequent k-itemsets.
■ We call this set L_k.



Apriori Algorithm Continued …

■ We generate candidates that can be frequent (k+1)-itemsets.
■ We call this set of candidates C_{k+1}.
■ We count these candidates and find L_{k+1}.



How candidates are generated

■ If {A,B,C} and {A,B,D} are two itemsets in L_3, then {A,B,C,D}
is a candidate itemset in C_4, provided all of its subsets of
size 3 are in L_3.
■ If, for example, {B,C,D} is not in L_3, then {A,B,C,D} cannot be
frequent and is removed from C_4. [This is called the pruning step.]

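The join and pruning steps can be sketched as follows (a minimal illustration; real implementations join only pairs that share their first k–1 items, for efficiency):

```python
from itertools import combinations

def gen_candidates(freq_k):
    """Generate C_{k+1} from L_k, a set of frozensets, each of size k."""
    if not freq_k:
        return set()
    k = len(next(iter(freq_k)))
    candidates = set()
    for a in freq_k:
        for b in freq_k:
            union = a | b
            # Join step: merging two k-itemsets must yield a (k+1)-itemset.
            if len(union) == k + 1:
                # Pruning step: every k-subset of the candidate must be frequent.
                if all(frozenset(s) in freq_k for s in combinations(union, k)):
                    candidates.add(union)
    return candidates
```

For example, with L_3 = {{A,B,C}, {A,B,D}} only, the merged candidate {A,B,C,D} is pruned, because {A,C,D} and {B,C,D} are not in L_3.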


The Apriori Algorithm
C_k : candidate itemsets of size k
L_k : frequent itemsets of size k

Find L_1;
for (k = 1; L_k != ∅; k++) do begin
    C_{k+1} = candidates generated from L_k;
    for each transaction t in the database do
        increment the count of every candidate in C_{k+1}
        that is contained in t;
    L_{k+1} = candidates in C_{k+1} with min_support;
end;
return ∪_k L_k;

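The pseudocode above can be fleshed out as a short Python sketch (candidate generation and pruning are inlined; this is an illustration, not an optimized implementation):

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset: support} for all frequent itemsets.

    `transactions` is a list of sets; `min_support` is a fraction in (0, 1].
    """
    n = len(transactions)
    min_count = min_support * n

    # Find L_1 with one scan.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c / n for s, c in counts.items() if c >= min_count}
    level = set(frequent)  # current L_k
    k = 1

    while level:
        # Generate C_{k+1} from L_k, with the Apriori pruning step.
        candidates = set()
        for a in level:
            for b in level:
                u = a | b
                if len(u) == k + 1 and all(
                    frozenset(s) in level for s in combinations(u, k)
                ):
                    candidates.add(u)
        # One database scan to count the surviving candidates.
        counts = dict.fromkeys(candidates, 0)
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        # L_{k+1}: candidates that reach the minimum support.
        level = set()
        for c, cnt in counts.items():
            if cnt >= min_count:
                frequent[c] = cnt / n
                level.add(c)
        k += 1
    return frequent
```

Each pass over the `while` loop corresponds to one database scan, so the number of scans equals the size of the largest frequent itemset.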


The Apriori Algorithm — Example
[Slide figure: a worked example on database D — scan D to count C1 and
obtain L1; generate C2 and scan D to obtain L2; generate C3 and scan D
to obtain L3.]


Analysis of Apriori Algorithm

■ If the largest itemset size is k then we


need to scan the database atleast k times.

■ The space required depends on the


number of candidates generated.

■ But, certainly this is better than the naïve


methods.



Exercise Problem

Transaction Id   Items bought
100              A,B,C,D,E
101              A,B,C,D,F
102              B,C,F
103              A,C,F,G

Let the minimum support required be 50%. Find all frequent itemsets
using the Apriori algorithm.
At each stage, show the candidates generated and describe how the
Apriori property is used to prune the candidate set.





Methods to Improve Apriori’s Efficiency

■ Hash-based itemset counting: a k-itemset whose corresponding
hash-bucket count is below the threshold cannot be frequent.
■ Transaction reduction: a transaction that does not contain any
frequent k-itemset is useless in subsequent scans.
■ Partitioning: any itemset that is potentially frequent in DB must
be frequent in at least one of the partitions of DB.
Methods to Improve Apriori’s Efficiency

■ Sampling: mine a subset of the given data with a lower support
threshold, plus a method to determine completeness.
■ Dynamic itemset counting: add new candidate itemsets only when
all of their subsets are estimated to be frequent.



Mining Frequent Patterns
Without Candidate Generation

■ Compress a large database into a compact Frequent-Pattern tree
(FP-tree) structure.
■ The FP-tree is highly condensed, but complete for frequent
pattern mining.
■ This avoids costly repeated database scans.



FP-tree based mining

■ Develops an efficient FP-tree-based frequent pattern mining method.
■ A divide-and-conquer methodology: decompose mining tasks into
smaller ones.
■ Avoids candidate generation: sub-database tests only!



Partition based methods

■ Partition the database and then apply divide-and-conquer strategies.



Summary

■ Association rule mining is probably the most significant
contribution from the database community to KDD.
■ A large number of papers have been published.
■ Many interesting issues have been explored.
■ An interesting research direction: association analysis on other
types of data: spatial data, multimedia data, time-series data, etc.



Thank you !!!