
Association Rule Learning

Association rule learning is a rule-based machine learning method for discovering interesting relations between variables in large databases.

It is intended to identify strong rules discovered in databases using some measures of interestingness.

Rules Discovered:
{Src IP = 206.163.37.95,
Dest Port = 139,
Bytes ∈ [150, 200]} --> {ATTACK}
Association rule learning

For example, if you analyse the grocery lists of a consumer over a period of time, you will be able to see a certain buying pattern, such as: if peanut butter and jelly are bought, then bread is also bought. This information can be used in marketing and pricing decisions.
Association rule learning

Another example is Netflix movie recommendations, which are made based on the choices of previous customers.

For example, if a movie of a particular genre is selected, then similar movies are recommended.
Market Basket Analysis

Market basket analysis may be performed on the retail data of customer transactions at a store.

The results can then be used to plan marketing or advertising strategies, or in the design of a new catalog.

Market basket analysis can also help retailers plan which items to put on sale at reduced prices. If customers tend to purchase computers and printers together, then having a sale on printers may encourage the sale of printers as well as computers.
Frequent Pattern Analysis
• Frequent pattern: a pattern (a set of items, subsequences, substructures, etc.) that occurs frequently in a data set
• First proposed by Agrawal, Imielinski, and Swami [AIS93] in the context of frequent itemsets and association rule mining
• Motivation: finding inherent regularities in data
  • What products were often purchased together? Bread and butter?
  • What are the subsequent purchases after buying a PC?
• Applications
  • Basket data analysis, cross-marketing, catalog design, sale campaign analysis, Web log (click stream) analysis, and DNA sequence analysis
Association Mining
• Association rule mining:
  • Finding frequent patterns, associations, or correlations among sets of items or objects in transaction databases, relational databases, and other information repositories.
• Examples
  • Rule form: "Body => Head [support, confidence]"
  • buys(x, "Bread") => buys(x, "Milk") [0.5%, 60%]
  • major(x, "CS") ^ takes(x, "DB") => grade(x, "A") [1%, 75%]
Example Association Rule

90% of transactions that purchase bread and butter also purchase milk.

Antecedent: bread and butter
Consequent: milk
Confidence factor: 90%
Example

• I: itemset
  {cucumber, parsley, onion, tomato, salt, bread, olives, cheese, butter}
• D: set of transactions
  1: {cucumber, parsley, onion, tomato, salt, bread}
  2: {tomato, cucumber, parsley}
  3: {tomato, cucumber, olives, onion, parsley}
  4: {tomato, cucumber, onion, bread}
  5: {tomato, salt, onion}
  6: {bread, cheese}
  7: {tomato, cheese, cucumber}
  8: {bread, butter}
FORMAL MODEL

• I = {i1, i2, …, im}: a set of literals (items)
• D: a database of transactions
• T ∈ D: a transaction, with T ⊆ I
• X: a subset of I
• T contains X if X ⊆ T
Rule Measures: Support and Confidence

Transaction ID | Items Bought
2000           | A, B, C
1000           | A, C
4000           | A, D
5000           | B, E, F

(Figure: Venn diagram of customers who buy Laptop, buy Printer, and buy both.)

Let minimum support = 50% and minimum confidence = 50%. We have:
• A => C (support 50%, confidence 66.6%)
• C => A (support 50%, confidence 100%)
Formal Model (Cont.)

• Association rule: X => Y, where X ⊆ I, Y ⊆ I and X ∩ Y = ∅.
• Rule X => Y has a support s in D if s% of the transactions in D contain X ∪ Y.
• Rule X => Y has a confidence c in D if c% of the transactions in D that contain X also contain Y.
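These definitions translate directly into code. The following is a minimal Python sketch (the function and variable names are illustrative, not from the slides) that computes support and confidence over a list of transactions, checked against the laptop/printer table above:

def support(itemset, transactions):
    # Fraction of transactions that contain every item in `itemset`.
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(X, Y, transactions):
    # Fraction of transactions containing X that also contain Y.
    return support(X | Y, transactions) / support(X, transactions)

# The four-transaction database from the slide above.
D = [{"A", "B", "C"}, {"A", "C"}, {"A", "D"}, {"B", "E", "F"}]

print(support({"A", "C"}, D))        # 0.5      -> 50% support
print(confidence({"A"}, {"C"}, D))   # 0.666... -> 66.6%
print(confidence({"C"}, {"A"}, D))   # 1.0      -> 100%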
Terminologies

• K-itemset: an itemset of length K.
• Frequent K-itemset: an itemset of length K that satisfies a minimum support threshold.
• Downward closure property: any subset of a frequent itemset is also frequent.
• Upward closure property: any superset of an infrequent itemset is also infrequent.
• Maximal frequent set: a set that is itself frequent and has no frequent superset.
• Border set: a set that is not frequent, but all of whose proper subsets are frequent.
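To illustrate the maximal-frequent-set definition, here is a small hedged Python sketch (names are my own) that picks out the maximal sets from a given collection of frequent itemsets:

def maximal_frequent(frequent):
    # Keep only itemsets that have no frequent proper superset.
    return [s for s in frequent
            if not any(s < other for other in frequent)]

frequent = [frozenset(s) for s in
            [{1}, {2}, {3}, {5}, {1, 3}, {2, 3}, {2, 5}, {3, 5}, {2, 3, 5}]]
print(maximal_frequent(frequent))   # [frozenset({1, 3}), frozenset({2, 3, 5})]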
The Apriori Algorithm: Example 1

Database D (minimum support count = 2):

TID | Items
100 | 1 3 4
200 | 2 3 5
300 | 1 2 3 5
400 | 2 5

Scan D to count the candidate 1-itemsets C1; those meeting minimum support form L1:

C1: {1}:2, {2}:3, {3}:3, {4}:1, {5}:3
L1: {1}:2, {2}:3, {3}:3, {5}:3

Join L1 with itself to obtain C2, scan D to count, and keep the frequent ones as L2:

C2: {1 2}, {1 3}, {1 5}, {2 3}, {2 5}, {3 5}
counts: {1 2}:1, {1 3}:2, {1 5}:1, {2 3}:2, {2 5}:3, {3 5}:2
L2: {1 3}:2, {2 3}:2, {2 5}:3, {3 5}:2

Join L2 to obtain C3 and scan D once more:

C3: {2 3 5}
L3: {2 3 5}:2
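This level-wise pass can be reproduced with a short, hedged Python sketch (a simplified implementation with illustrative names, not the exact algorithm from the slides; in particular it omits the subset-pruning step described later, so it may count a few extra candidates):

from itertools import combinations

D = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
MIN_SUP = 2  # absolute support count

def frequent_itemsets(transactions, min_sup):
    items = {i for t in transactions for i in t}
    level = {frozenset([i]) for i in items}   # C1: candidate 1-itemsets
    k, result = 1, {}
    while level:
        # Scan D: count the support of each candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in level}
        frequent = {c: n for c, n in counts.items() if n >= min_sup}
        result.update(frequent)
        # Join frequent k-itemsets to form the (k+1)-candidates.
        keys = list(frequent)
        level = {a | b for a, b in combinations(keys, 2) if len(a | b) == k + 1}
        k += 1
    return result

for itemset, sup in sorted(frequent_itemsets(D, MIN_SUP).items(),
                           key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), sup)   # reproduces L1, L2 and L3 above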
Discovering Large Itemsets

Apriori algorithm: uses prior knowledge of frequent itemset properties (the downward closure property).
It is a level-wise algorithm.

Basic intuition:
Candidate itemsets having k items can be generated by joining large itemsets having k-1 items, and deleting those that contain any subset that is not large.

The algorithm consists of two phases:
1) Candidate generation
2) Pruning
Apriori Algorithm

Gen_candidate_itemsets with the given Lk-1 as follows:

Ck = ∅
for all itemsets L1 ∈ Lk-1 do
  for all itemsets L2 ∈ Lk-1 do
    if L1[1] = L2[1] and L1[2] = L2[2] and … and L1[k-2] = L2[k-2] and L1[k-1] < L2[k-1]
    then c = L1[1], L1[2], …, L1[k-1], L2[k-1]
         Ck = Ck ∪ {c}
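A direct Python translation of this join step might look as follows (a hedged sketch with illustrative names; itemsets are kept as sorted tuples, so the positional comparisons above carry over with 0-based indexing):

def gen_candidates(L_prev):
    # Join frequent (k-1)-itemsets that agree on all but their last item.
    Ck = set()
    for a in L_prev:
        for b in L_prev:
            if a[:-1] == b[:-1] and a[-1] < b[-1]:
                Ck.add(a + (b[-1],))
    return Ck

L3 = {(1, 2, 3), (1, 2, 5), (1, 3, 5), (2, 3, 4), (2, 3, 5)}
print(sorted(gen_candidates(L3)))   # [(1, 2, 3, 5), (2, 3, 4, 5)]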
Apriori Candidate Generation

Example:
L3 = { {1,2,3}, {1,2,5}, {1,3,5}, {2,3,4}, {2,3,5} }

The algorithm will generate the following itemsets:
1) {1,2,3,5} is generated from {1,2,3} and {1,2,5}.
2) Similarly, {2,3,4,5} is generated from {2,3,4} and {2,3,5}.

Now C4 = { {1,2,3,5}, {2,3,4,5} }
Example of Generating Candidates

• L3 = {abc, abd, acd, ace, bcd}
• Self-joining: L3 * L3
  • abcd from abc and abd
  • acde from acd and ace
• Pruning:
  • acde is removed because ade is not in L3
• C4 = {abcd}
PRUNING

• The pruning step eliminates the extensions of (k-1)-itemsets which are not frequent.

prune(Ck)
for all c ∈ Ck
  for all (k-1)-subsets d of c do
    if d ∉ Lk-1
    then Ck = Ck \ {c}
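In Python, this step might be sketched as follows (illustrative names; itertools.combinations enumerates the (k-1)-subsets of each candidate):

from itertools import combinations

def prune(Ck, L_prev):
    # Drop candidates that have at least one infrequent (k-1)-subset.
    return {c for c in Ck
            if all(tuple(sub) in L_prev for sub in combinations(c, len(c) - 1))}

L3 = {(1, 2, 3), (1, 2, 5), (1, 3, 5), (2, 3, 4), (2, 3, 5)}
C4 = {(1, 2, 3, 5), (2, 3, 4, 5)}
print(prune(C4, L3))   # {(1, 2, 3, 5)}; (2, 4, 5) and (3, 4, 5) are not in L3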
APRIORI ALGORITHM

Example:
With k = 3 (and k-itemsets lexicographically ordered):

{3,4,5}, {3,4,7}, {3,5,6}, {3,5,7}, {3,5,8}, {4,5,6}, {4,5,7}

Generate all possible (k+1)-itemsets: for each two sets {a1, a2, …, a(k-1), X} and {a1, a2, …, a(k-1), Y}, the join produces the candidate {a1, a2, …, a(k-1), X, Y}.

{3,4,5,7}, {3,5,6,7}, {3,5,6,8}, {3,5,7,8}, {4,5,6,7}
APRIORI ALGORITHM

Example (continued):

{3,4,5,7}, {3,5,6,7}, {3,5,6,8}, {3,5,7,8}, {4,5,6,7}

Delete (prune) all itemset candidates with non-frequent subsets. For example, {3,5,6,8} can itself never be frequent, since its subset {5,6,8} is not frequent.

Here, only one candidate remains: {3,4,5,7}.

Last, after pruning, determine the support of the remaining itemsets and check whether they meet the threshold.
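The prune sketch above reproduces this result (again using sorted tuples as itemsets):

L3 = {(3, 4, 5), (3, 4, 7), (3, 5, 6), (3, 5, 7), (3, 5, 8), (4, 5, 6), (4, 5, 7)}
C4 = {(3, 4, 5, 7), (3, 5, 6, 7), (3, 5, 6, 8), (3, 5, 7, 8), (4, 5, 6, 7)}
print(prune(C4, L3))   # {(3, 4, 5, 7)}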
Example

Database with transactions (customer_# : item_a1, item_a2, …):

1: 3, 5, 8.
2: 2, 6, 8.
3: 1, 4, 7, 10.
4: 3, 8, 10.
5: 2, 5, 8.
6: 1, 5, 6.
7: 4, 5, 6, 8.
8: 2, 3, 4.
9: 1, 5, 7, 8.
10: 3, 8, 9, 10.

Conf( {5} => {8} )?
supp({5}) = 5, supp({8}) = 7, supp({5,8}) = 4,
so conf( {5} => {8} ) = 4/5 = 0.8, or 80%.
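These numbers can be checked directly with a few lines of Python (an illustrative sketch; here support is kept as an absolute count, so the confidence ratio is taken explicitly):

D = [{3, 5, 8}, {2, 6, 8}, {1, 4, 7, 10}, {3, 8, 10}, {2, 5, 8},
     {1, 5, 6}, {4, 5, 6, 8}, {2, 3, 4}, {1, 5, 7, 8}, {3, 8, 9, 10}]

def supp_count(itemset, transactions):
    # Number of transactions that contain every item in `itemset`.
    return sum(1 for t in transactions if itemset <= t)

print(supp_count({5}, D), supp_count({8}, D), supp_count({5, 8}, D))  # 5 7 4
print(supp_count({5, 8}, D) / supp_count({5}, D))  # 0.8     -> conf({5} => {8})
print(supp_count({5, 8}, D) / supp_count({8}, D))  # 0.571.. -> conf({8} => {5})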
Example

(Same database as above.)

Conf( {5} => {8} )? 80%. Done.
Conf( {8} => {5} )?
supp({5}) = 5, supp({8}) = 7, supp({5,8}) = 4,
so conf( {8} => {5} ) = 4/7 ≈ 0.57, or 57%.
Example

Conf( {5} => {8} )? 80%. Done.
Conf( {8} => {5} )? 57%. Done.

Rule ( {5} => {8} ) is more meaningful than rule ( {8} => {5} ).
Example

(Same database as above.)

Conf( {9} => {3} )?
supp({9}) = 1, supp({3}) = 4, supp({3,9}) = 1,
so conf( {9} => {3} ) = 1/1 = 1.0, or 100%. OK?
Example

Conf( {9} => {3} ) = 100%. Done.

Notice: high confidence, low support.
=> Rule ( {9} => {3} ) is not meaningful.
Problem decomposition

1. Find all itemsets that have transaction support above minimum support. Such itemsets are called frequent (large) itemsets.
2. Use the large itemsets to generate the association rules:
   2.1. For every large itemset I, find all its subsets.
   2.2. For every subset a, output the rule a => (I - a) if
        support(I) / support(a) >= minconf
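Step 2 can be sketched in Python as follows (a hedged sketch with illustrative names; it reuses supp_count from above and enumerates every non-empty proper subset of a large itemset):

from itertools import combinations

def gen_rules(large_itemset, transactions, minconf):
    # Yield every rule a => (I - a) whose confidence meets minconf.
    I = frozenset(large_itemset)
    sup_I = supp_count(I, transactions)
    for r in range(1, len(I)):
        for a in combinations(sorted(I), r):
            a = frozenset(a)
            conf = sup_I / supp_count(a, transactions)
            if conf >= minconf:
                yield sorted(a), sorted(I - a), conf

# Rules from the frequent itemset {2, 3, 5} of the earlier example database:
D = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
for lhs, rhs, conf in gen_rules({2, 3, 5}, D, minconf=0.7):
    print(lhs, "=>", rhs, f"conf={conf:.2f}")   # [2, 3] => [5] and [3, 5] => [2]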
Generation of Rules

Extract rules from the discovered frequent itemsets.

For a rule:
R: <c1, c2, …, ci-1> => <ci, ci+1, …, ck>
   (head)                (tail)

A confidence value is calculated:
conf(R) = support({c1, …, ck}) / support({c1, …, ci-1})
A database has 5 transactions. Let min sup = 0.6 and min conf = 0.8.
• List the frequent k-itemsets for the largest k, and
• all the strong association rules (with their support and confidence) matching the following shape of rules:
  for all x in transaction, buys(x, item1) ^ buys(x, item2) => buys(x, item3)

Customer | Date  | Items_bought
100      | 10/15 | {K, A, D, B, C}
200      | 10/15 | {D, A, E, F}
300      | 10/19 | {C, D, B, E}
400      | 10/20 | {B, A, C, K, D}
500      | 10/21 | {A, G, C}
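As a final hedged sketch, the helpers defined earlier (frequent_itemsets and gen_rules, both illustrative) can be combined to check an answer to this exercise; min sup = 0.6 over 5 transactions means an itemset must appear in at least 3 of them:

D = [{"K", "A", "D", "B", "C"}, {"D", "A", "E", "F"},
     {"C", "D", "B", "E"}, {"B", "A", "C", "K", "D"}, {"A", "G", "C"}]

freq = frequent_itemsets(D, min_sup=3)          # support count >= 0.6 * 5
largest = max(len(s) for s in freq)
for s in (s for s in freq if len(s) == largest):
    for lhs, rhs, conf in gen_rules(s, D, minconf=0.8):
        if len(lhs) == 2 and len(rhs) == 1:     # shape: item1 ^ item2 => item3
            print(lhs, "=>", rhs, f"conf={conf:.2f}")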
