Module-IV (Frequent Pattern & Association Rule Mining)

Frequent Item Set & Association Rule Mining
Association rule mining
Given a set of transaction records, each containing some number of items from a given collection, the aim is to produce dependency rules that predict the occurrence of an item based on the occurrences of other items.
Transaction ID   Items bought
T1 (10)          Beer, Nuts, Diaper
T2 (20)          Beer, Coke, Diaper
T3 (30)          Bread, Diaper, Eggs
T4 (40)          Nuts, Eggs, Milk
T5 (50)          Nuts, Coffee, Diaper, Eggs, Milk

Example of Association Rules:
{Diaper} → {Beer}
{Milk, Bread} → {Eggs, Coke}
{Beer, Bread} → {Milk}

Implication ("→") means co-occurrence, not causality!
Customers who purchase Bread have a chance of also purchasing Milk.
Retail organizations try to find associations between products that can be sold together.
Note: Assume all data are categorical; there is no good algorithm for numeric data.
Example of Association rule
The next step is to determine the relationships and the rules. So, association rule mining is applied in this context.
It is a procedure which aims to observe frequently occurring patterns, correlations, or associations from datasets
found in various kinds of databases such as relational databases, transactional databases, and other forms of
repositories.
Market-Basket Analysis
• Market Basket Analysis (Association Analysis) is a mathematical modeling
technique based upon the theory that if you buy a certain group of items, you
are likely to buy another group of items.
Market-Basket Analysis
• Consider a shopping cart filled with several items
• Market basket analysis tries to answer the following questions:
• Who makes purchases?
• What items tend to be purchased together?
• obvious: steak and potatoes; beer and pretzels
• What items are purchased sequentially?
• obvious: house then furniture; car then tires
• In what order do customers purchase items?
• What items tend to be purchased by season?
• It is also about what customers do not purchase, and why.
• If customers purchase baking powder, but no flour, what are they baking?
• If customers purchase a mobile phone, but no case, are you missing an
opportunity?
Market-Basket Analysis
A database of customer transactions:
• Each transaction is a set of items
• Example: Transaction with TID 111 contains items {Pen, Ink, Milk, Juice}

TID   CID   Date     Item    Qty
111   201   5/1/99   Pen     2
111   201   5/1/99   Ink     1
111   201   5/1/99   Milk    3
111   201   5/1/99   Juice   6
112   105   6/3/99   Pen     1
112   105   6/3/99   Ink     1
112   105   6/3/99   Milk    1
113   106   6/5/99   Pen     1
113   106   6/5/99   Milk    1
114   201   7/1/99   Pen     2
114   201   7/1/99   Ink     2
114   201   7/1/99   Juice   4
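As a quick illustration (a minimal Python sketch, not part of the slides), the rows of this table can be grouped by TID so that each transaction becomes a set of items; dates and quantities are ignored for basket analysis:

rows = [
    (111, 201, "5/1/99", "Pen", 2), (111, 201, "5/1/99", "Ink", 1),
    (111, 201, "5/1/99", "Milk", 3), (111, 201, "5/1/99", "Juice", 6),
    (112, 105, "6/3/99", "Pen", 1), (112, 105, "6/3/99", "Ink", 1),
    (112, 105, "6/3/99", "Milk", 1),
    (113, 106, "6/5/99", "Pen", 1), (113, 106, "6/5/99", "Milk", 1),
    (114, 201, "7/1/99", "Pen", 2), (114, 201, "7/1/99", "Ink", 2),
    (114, 201, "7/1/99", "Juice", 4),
]

baskets = {}
for tid, cid, date, item, qty in rows:
    baskets.setdefault(tid, set()).add(item)   # one item set per transaction

print(baskets[111])   # {'Pen', 'Ink', 'Milk', 'Juice'}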
Market-Basket Analysis
Co-occurrences
• 80% of all customers purchase items X, Y and Z together.
Association rules
• 60% of all customers who purchase X and Y also buy Z.
Sequential patterns
• 60% of customers who first buy X also purchase Y within
three weeks.
Association rule mining
• Proposed by Agrawal et al. in 1993.
• It is an important data mining model studied extensively by the database
and data mining community.
• Assume all data are categorical.
• No good algorithm for numeric data.
• Initially used for Market Basket Analysis to find how items purchased by
customers are related.
Support: The number of transactions that include both items {A} and {B}, as a percentage of the total number of transactions. It measures how frequently the collection of items (here {A} and {B}) occurs together across all transactions, i.e., the probability that a transaction contains (A ∪ B).
Example: Support(milk) = 6/9, Support(cheese) = 7/9, Support(milk & cheese) = 6/9.
Confidence: The ratio of the number of transactions that include both {A} and {B} to the number of transactions that include all items in {A}. Confidence is the conditional probability that a transaction containing A also contains B: Pr(B|A).
Example: Confidence(milk => cheese) = (milk & cheese)/(milk) = 6/6 = 1.
Lift: The lift of the rule A => B is the confidence of the rule divided by the expected confidence, assuming that the itemsets A and B are independent of each other.
Example: Lift(milk => cheese) = [(milk & cheese)/(milk)] / [cheese/Total] = [6/6] / [7/9] = 9/7 ≈ 1.29.
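To make the three measures concrete, here is a minimal Python sketch. The nine transactions are a hypothetical reconstruction chosen only to reproduce the counts quoted above (milk in 6, cheese in 7, both in 6); the original milk/cheese table is not shown in this deck:

# Hypothetical data reproducing the slide's counts.
transactions = [
    {"milk", "cheese"}, {"milk", "cheese"}, {"milk", "cheese"},
    {"milk", "cheese"}, {"milk", "cheese"}, {"milk", "cheese"},
    {"cheese"}, {"bread"}, {"bread"},
]
N = len(transactions)

def support(itemset):
    # Fraction of transactions containing every item in `itemset`.
    return sum(itemset <= t for t in transactions) / N

sup = support({"milk", "cheese"})          # 6/9
conf = sup / support({"milk"})             # (6/9) / (6/9) = 1.0
lift = conf / support({"cheese"})          # 1 / (7/9) = 9/7 ≈ 1.29
print(sup, conf, lift)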
Class Exercise: Supermarket transaction data
Association Rule Mining Task
Association Rule Mining is a two-step approach:
Join Step: This step generates (k+1)-candidate itemsets from frequent k-itemsets by joining L_k with itself.
[The algorithm uses a level-wise search, where k-itemsets are used to explore (k+1)-itemsets. The frequent itemsets are extended one item at a time; this step is known as candidate generation.]
Prune Step: This step scans the database for the support count of each candidate. If a candidate itemset does not meet the minimum support, it is regarded as infrequent and removed. This step reduces the size of the candidate itemsets.
[From the candidate list of k-itemsets, it extracts the frequent list of k-itemsets using the support count.]
Finally, find the association rules from the frequent itemsets by calculating the confidence value; rules that do not meet the minimum confidence are ignored. A code sketch of the two steps follows.
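The join and prune steps can be sketched in a few lines of Python. This is a minimal, unoptimized illustration of the level-wise search (real implementations use hash trees and other data structures); apriori() takes transactions as collections of items and a minimum support count:

from itertools import combinations

def apriori(transactions, min_count):
    # L1: frequent 1-itemsets.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    L = {s: c for s, c in counts.items() if c >= min_count}
    frequent = dict(L)
    k = 2
    while L:
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        prev = list(L)
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step (a): drop candidates having an infrequent (k-1)-subset.
        candidates = {c for c in candidates
                      if all(frozenset(s) in L for s in combinations(c, k - 1))}
        # Prune step (b): scan the database, keep candidates meeting min_count.
        L = {}
        for c in candidates:
            cnt = sum(c <= set(t) for t in transactions)
            if cnt >= min_count:
                L[c] = cnt
        frequent.update(L)
        k += 1
    return frequent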
Apriori Algorithm Problem
For the following dataset, generate rules using the Apriori algorithm. Consider minimum support count = 2 (i.e., 22%) and minimum confidence = 50%.
Apriori Algorithm Problem
Step 1, Iteration 1: Find C1, which contains the support count of each itemset of length 1; then find L1 from C1, keeping the length-1 itemsets that meet the minimum support count.

C1:
Itemset   Support Count
A         6
B         7
C         6
D         2
E         1

L1 (E is removed, since its support count 1 < 2):
Itemset   Support Count
A         6
B         7
C         6
D         2
Apriori Algorithm Problem contd...
Step 1, Iteration 2: With the help of L1, find C2, which contains the support count of each itemset of length 2; then find L2 from C2, keeping the length-2 itemsets that meet the minimum support count.

C2:
Itemset   Support Count
{A, B}    4
{A, C}    4
{A, D}    1
{B, C}    4
{B, D}    2
{C, D}    0

L2 ({A, D} and {C, D} are removed):
Itemset   Support Count
{A, B}    4
{A, C}    4
{B, C}    4
{B, D}    2
Apriori Algorithm Problem contd...
Step 1, Iteration 3: With the help of L2, find C3, which contains the support count of each itemset of length 3; then find L3 from C3, keeping the length-3 itemsets that meet the minimum support count.
Joining L2 with itself yields the candidates {A, B, C} and {B, C, D}; {B, C, D} is pruned because its subset {C, D} is infrequent, so C3 = {{A, B, C}}, and {A, B, C} meets the minimum support, giving L3 = {{A, B, C}}.
Step 2: As the given minimum confidence threshold is 50%, the first three rules A^B → C, B^C → A, and A^C → B can be considered strong association rules for the given problem.
APRIORI ALGORITHM EXAMPLE

Class Exercise

Class Exercise Solution
Generating Association Rules
From frequent item-sets
• Procedure 1:
• Suppose we have the list of frequent item-sets
Generating Association Rules
From frequent item-sets
• Procedure 2:
• For every nonempty subset S of I, output the rule:
S → (I - S)
• if support_count(I) / support_count(S) >= min_conf,
where min_conf is the minimum confidence threshold
• Let us assume:
• the minimum confidence threshold is 60%
Association Rules with confidence
• R1 : 1,3 -> 5
– Confidence = sc{1,3,5}/sc{1,3} = 2/3 = 66.66% (R1 is selected)
• R2 : 1,5 -> 3
– Confidence = sc{1,5,3}/sc{1,5} = 2/2 = 100% (R2 is selected)
• R3 : 3,5 -> 1
– Confidence = sc{3,5,1}/sc{3,5} = 2/3 = 66.66% (R3 is selected)
• R4 : 1 -> 3,5
– Confidence = sc{1,3,5}/sc{1} = 2/3 = 66.66% (R4 is selected)
• R5 : 3 -> 1,5
– Confidence = sc{3,1,5}/sc{3} = 2/4 = 50% (R5 is REJECTED)
• R6 : 5 -> 1,3
– Confidence = sc{5,1,3}/sc{5} = 2/4 = 50% (R6 is REJECTED)
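The six checks above can be reproduced with a small Python sketch of Procedure 2, using the support counts from this example (sc{1}=3, sc{3}=4, sc{5}=4, sc{1,3}=3, sc{1,5}=2, sc{3,5}=3, sc{1,3,5}=2):

from itertools import combinations

sc = {
    frozenset({1}): 3, frozenset({3}): 4, frozenset({5}): 4,
    frozenset({1, 3}): 3, frozenset({1, 5}): 2, frozenset({3, 5}): 3,
    frozenset({1, 3, 5}): 2,
}

def rules_from_itemset(I, min_conf):
    # Emit every rule S -> (I - S) whose confidence meets min_conf.
    I = frozenset(I)
    for r in range(1, len(I)):                      # nonempty proper subsets S
        for S in map(frozenset, combinations(I, r)):
            conf = sc[I] / sc[S]
            if conf >= min_conf:
                yield set(S), set(I - S), conf

for lhs, rhs, conf in rules_from_itemset({1, 3, 5}, 0.60):
    print(lhs, "->", rhs, f"({conf:.0%})")          # prints R1-R4; R5, R6 fail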
How to efficiently generate rules?
• In general, confidence does not have an anti-monotone property:
c(ABC→D) can be larger or smaller than c(AB→D)
• But confidence of rules generated from the same item-set has an anti-monotone property
• e.g., L = {A,B,C,D}:
c(ABC→D) ≥ c(AB→CD) ≥ c(A→BCD)
Confidence is anti-monotone w.r.t. the number of items on the RHS of the rule. This holds because all these rules share the same numerator support_count(ABCD), while moving items from the left-hand side to the right-hand side shrinks the antecedent, whose support count can only grow.
Rule generation for Apriori Algorithm

[Figure: candidate rules generated from a frequent itemset; rules below the minimum confidence are pruned.]
Apriori Algorithm Flow
Apriori Algorithm Pseudo Code
Apriori Algorithm
Class Exercise
Q1: Find frequent itemsets and generate association rules for them. Illustrate with a step-by-step process. (See the code sketch after Q2.)

Minimum support = 2
Minimum confidence = 50%

Transaction   List of items
T1            I1, I2, I3
T2            I2, I3, I4
T3            I4, I5
T4            I1, I2, I4
T5            I1, I2, I3, I5
T6            I1, I2, I3, I4

Q2: Choose a minimum support and minimum confidence of your choice and find the frequent itemsets.
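For instance, the apriori() sketch given earlier could be applied to Q1's data with min_count = 2 (a usage illustration only; working the steps by hand is the point of the exercise):

Q1 = [{"I1", "I2", "I3"}, {"I2", "I3", "I4"}, {"I4", "I5"},
      {"I1", "I2", "I4"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3", "I4"}]
for itemset, count in apriori(Q1, 2).items():
    print(sorted(itemset), count)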
Advantages
Uses large itemset property
Easily parallelized
Easy to implement
Disadvantages
Assumes transaction database is memory resident.
Requires many database scans.
Frequent Itemset, Closed Itemset and Maximal Itemset
TID Itemset
1 {A, C, D}
2 {B, C, E}
3 {A, B, C, E}
4 {B, E}
5 {A, B, C, E}
Frequent Itemset, Closed Itemset and Maximal Itemset

Frequent Itemset:
For the following transaction database, with minimum support count 2 the frequent itemsets are listed below.
Total itemsets: 2^5 - 1 = 31
Frequent itemsets: 15
Infrequent itemsets: 16

TID   Itemset
1     {A, C, D}
2     {B, C, E}
3     {A, B, C, E}
4     {B, E}
5     {A, B, C, E}

Itemset   Support   Freq/In-freq      Itemset   Support   Freq/In-freq
A         3/5       Freq              ABC       2/5       Freq
B         4/5       Freq              ABD       0/5       In-freq
C         4/5       Freq              ABE       2/5       Freq
D         1/5       In-freq           ACD       1/5       In-freq
E         4/5       Freq              ACE       2/5       Freq
AB        2/5       Freq              ADE       0/5       In-freq
AC        3/5       Freq              BCD       0/5       In-freq
AD        1/5       In-freq           BCE       3/5       Freq
AE        2/5       Freq              BDE       0/5       In-freq
BC        3/5       Freq              CDE       0/5       In-freq
BD        0/5       In-freq           ABCD      0/5       In-freq
BE        4/5       Freq              ABCE      2/5       Freq
CD        1/5       In-freq           ABDE      0/5       In-freq
CE        3/5       Freq              ACDE      0/5       In-freq
DE        0/5       In-freq           BCDE      0/5       In-freq
                                      ABCDE     0/5       In-freq
Frequent Itemset, Closed Itemset and Maximal Itemset

Apriori Principle:
• If an itemset is infrequent, then all its supersets are infrequent.
• If an itemset is frequent, then all its subsets are frequent.
Frequent Itemset, Closed Itemset and Maximal Itemset

Due to the Apriori Principle, the frequent itemsets are:
A (3/5), B (4/5), C (4/5), E (4/5), AB (2/5), AC (3/5), AE (2/5), BC (3/5), BE (4/5), CE (3/5), ABC (2/5), ABE (2/5), ACE (2/5), BCE (3/5), ABCE (2/5).
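For the closed and maximal families (standard definitions: a frequent itemset is closed if no proper superset has the same support, and maximal if no frequent proper superset exists), here is a minimal brute-force Python sketch for the five-transaction database above; enumeration is feasible here, since there are only 31 candidate itemsets:

from itertools import combinations

transactions = [{"A", "C", "D"}, {"B", "C", "E"}, {"A", "B", "C", "E"},
                {"B", "E"}, {"A", "B", "C", "E"}]
min_count = 2
items = sorted(set().union(*transactions))

# Support count of every nonempty itemset.
support = {}
for r in range(1, len(items) + 1):
    for combo in combinations(items, r):
        s = frozenset(combo)
        support[s] = sum(s <= t for t in transactions)

frequent = {s for s, c in support.items() if c >= min_count}
# Closed: no proper superset has the same support.
closed = {s for s in frequent
          if not any(s < t and support[t] == support[s] for t in frequent)}
# Maximal: no proper superset is frequent.
maximal = {s for s in frequent if not any(s < t for t in frequent)}

print(len(frequent))                      # 15
print([sorted(s) for s in maximal])       # [['A', 'B', 'C', 'E']]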
Class Exercise
Q1: Find frequent itemsets, closed itemsets and maximal itemsets from the following transaction table.
Illustrate it with step-by-step process.
TID Itemset
100 {I1, I3, I4}
200 {I2, I3, I5}
300 {I1, I2, I3, I5}
400 {I2, I5}
Which Patterns Are Interesting?
=> In the classic games-and-videos example, the two items turn out to be negatively associated, not strongly associated: the rule's confidence is lower than the overall frequency of videos.
• Support: This says how popular an itemset is, as measured by the proportion of transactions in which it appears.
• Confidence: This says how likely item Y is purchased when item X is purchased, expressed as {X -> Y}. It is measured by the proportion of transactions with item X in which item Y also appears.
• Lift: The lift value of an association rule is the ratio of the confidence of the rule to the expected confidence of the rule. This says how likely item Y is purchased when item X is purchased, while controlling for how popular item Y is.
Correlation Analysis
Correlation analysis says how much two items are related. [There are two techniques to find the correlation of objects: Lift and the Chi-square method.]
• In probability theory, two events A and B are independent
if P(A ∪ B) = P(A) × P(B), where (as above) P(A ∪ B) denotes the probability that a transaction contains both A and B.
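Under this convention, Lift can be read directly as an independence test:

Lift(A => B) = P(A ∪ B) / (P(A) × P(B))

Lift = 1 means A and B are independent; Lift > 1 means they are positively correlated; Lift < 1 means they are negatively correlated. In the earlier milk/cheese example, Lift = (6/9) / ((6/9) × (7/9)) = 9/7 ≈ 1.29 > 1, so milk and cheese are positively correlated.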
Improving the efficiency of Apriori:
• As the itemset size increases, the number of transactions to examine can be reduced by DHP (Direct Hashing and Pruning) and by vertical-format mining algorithms.
• Efficient data structures can be used to store the candidates or transactions, so that it is not necessary to match every candidate against every transaction.
We will study only the Apriori algorithm.