
Data Mining Techniques Market Basket Analysis and Association Rules

This document discusses market basket analysis and association rule mining. It defines market basket analysis as an unsupervised data mining technique used to discover relationships between products customers purchase together. Association rule mining is used to generate rules that describe strong relationships between items in transactional data, which can be used to drive marketing strategies. The document outlines key concepts like support, confidence and lift for evaluating rule quality and discusses techniques for building, refining and extending association rule mining.


Data Mining Techniques

Chapter 9:
Market Basket Analysis and Association Rules
Market basket analysis
Market basket data I
Market basket data II
Association rules
How good is an association rule?
Building association rules
Pizza example
Calculations
Refinements
Extensions
Market basket analysis
Undirected data mining technique (no target or response variable).
Kind of problem: what merchandise are customers buying and when?
Which products tend to be purchased together and which are amenable to promotion?
suggest new store layouts;
determine which products to put on special;
indicate when to issue coupons;
tie data to individual customers through a loyalty card or website registration.
Other potential applications: see p. 288.
Pitfalls:
rules can end up merely describing previous marketing promotions;
product differentiation may not be apparent until a long purchase history has accumulated.
© Iain Pardoe, 2006
Market basket data I
Point-of-sale transaction data.
Three levels of market basket data:
customers;
orders (purchases, baskets, item sets);
items.
Track customers over time:
average # orders per customer;
average # (unique) items per order;
proportion of customers purchasing a particular product;
average # orders per customer including a particular product;
average quantity ordered per customer including a particular product.
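The customer-level tracking measures above can be computed directly from point-of-sale records. The following is a minimal sketch, assuming each record is a hypothetical `(customer_id, order_id, item, quantity)` tuple; the records are invented for illustration. It also assumes that the two "including a particular product" averages are taken over customers who bought the product at least once, which is one reasonable reading of the slide.

```python
from collections import defaultdict

# Hypothetical point-of-sale records: (customer_id, order_id, item, quantity).
# The field layout and values are illustrative assumptions, not data from the text.
records = [
    ("c1", "o1", "milk", 2),
    ("c1", "o1", "bread", 1),
    ("c1", "o2", "milk", 1),
    ("c2", "o3", "bread", 1),
    ("c2", "o3", "eggs", 12),
    ("c3", "o4", "milk", 1),
]

def mean(values):
    """Average of a list, or 0.0 for an empty list."""
    return sum(values) / len(values) if values else 0.0

def customer_metrics(records, product):
    orders_by_customer = defaultdict(set)   # customer -> order ids
    items_by_order = defaultdict(set)       # order -> unique items in it
    orders_with_product = defaultdict(set)  # customer -> orders containing product
    qty_of_product = defaultdict(int)       # customer -> total quantity of product
    for cust, order, item, qty in records:
        orders_by_customer[cust].add(order)
        items_by_order[order].add(item)
        if item == product:
            orders_with_product[cust].add(order)
            qty_of_product[cust] += qty

    buyers = list(orders_with_product)  # customers who bought the product at least once
    n_customers = len(orders_by_customer)
    return {
        "avg_orders_per_customer": mean([len(o) for o in orders_by_customer.values()]),
        "avg_unique_items_per_order": mean([len(i) for i in items_by_order.values()]),
        "share_of_customers_buying": len(buyers) / n_customers if n_customers else 0.0,
        # Assumption: the next two averages are over buyers of the product only.
        "avg_orders_with_product": mean([len(orders_with_product[c]) for c in buyers]),
        "avg_quantity_of_product": mean([qty_of_product[c] for c in buyers]),
    }

print(customer_metrics(records, "milk"))
```

Tracking these numbers "over time" amounts to rerunning the same function on records filtered to successive periods.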
Market basket data II
Order characteristics.
Item popularity:
in a one-item order;
in a multi-item order;
amongst customers who are repeat purchasers;
over time;
over regions.
Tracking marketing interventions.
Clustering products by usage: which product(s) in a purchase suggest the purchase of other
particular products at the same time:
association rules (handfuls of items);
cluster analysis (larger sets).
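Item popularity in one-item versus multi-item orders is a simple split of the baskets followed by a count. Below is a minimal sketch, assuming orders are stored as a hypothetical mapping from order id to the set of items in it; the data are invented. The same function applies to the other slices on the slide (repeat purchasers, time periods, regions) if the orders are filtered first.

```python
from collections import Counter

# Hypothetical orders: order id -> set of items (illustrative data only).
orders = {
    "o1": {"milk"},
    "o2": {"milk", "bread"},
    "o3": {"bread", "eggs", "milk"},
    "o4": {"eggs"},
    "o5": {"milk"},
}

def popularity_by_order_size(orders):
    """Share of one-item orders and of multi-item orders that contain each item."""
    single = [items for items in orders.values() if len(items) == 1]
    multi = [items for items in orders.values() if len(items) > 1]

    def shares(baskets):
        if not baskets:
            return {}
        counts = Counter(item for basket in baskets for item in basket)
        return {item: n / len(baskets) for item, n in counts.items()}

    return shares(single), shares(multi)

one_item, multi_item = popularity_by_order_size(orders)
print("one-item orders:", one_item)
print("multi-item orders:", multi_item)
```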
Association rules
If a customer buys item A, we expect he/she will also buy item B.
Actionable: useful rules with understandable, high-quality info, e.g., if Barbie then chocolate;
might suggest more prominent product placement (e.g., beer and diapers; see p. 298),
product tie-ins and promotions, or particular ways to advertise products.
Trivial: already known by managers, e.g., if maintenance contract then large appliance;
may reflect past marketing or product bundling;
exceptions may signal failures in business operations, data collection, or processing.
Inexplicable: seem to have no explanation and do not suggest a course of action, e.g., if new
hardware store then toilet bowl cleaners;
may reflect over-fitting.
How good is an association rule?
If $I_1$ (condition/antecedent) then $I_2$ (result/consequent), e.g., if OJ then milk (p. 299, beware mistakes).
Support = proportion of transactions with $I_1$ & $I_2$, e.g., 1/5 = 20.0%.
Confidence = (# transactions with $I_1$ & $I_2$) / (# transactions with $I_1$), e.g., 1/4 = 25.0%.
Lift = confidence / proportion of transactions with $I_2$, e.g., (1/4) / (1/5) = 1.25;
or, lift = (actual # transactions with $I_1$ & $I_2$) / (expected # transactions with $I_1$ & $I_2$ if no relationship), where $N$ is the total number of transactions:
$\text{lift} = \#(I_1 \,\&\, I_2) \,/\, (\#I_1 \times \#I_2 / N) = 1 / (4 \times 1 / 5) = 1.25$;
$\text{excess} = \#(I_1 \,\&\, I_2) - (\#I_1 \times \#I_2 / N) = 1 - (4 \times 1 / 5) = 0.20$.
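The four measures can be checked with exact arithmetic. This is a minimal sketch; the five baskets are hypothetical, chosen only so that the counts match the example ($N = 5$, OJ in 4 baskets, milk in 1, both in 1). `fractions.Fraction` keeps the results exact so they can be compared with the hand calculations.

```python
from fractions import Fraction

# Hypothetical baskets chosen only to reproduce the counts in the OJ/milk example.
baskets = [
    {"OJ", "soda"},
    {"OJ", "milk", "window cleaner"},
    {"OJ", "detergent"},
    {"OJ", "detergent", "soda"},
    {"window cleaner", "soda"},
]

def rule_stats(baskets, antecedent, consequent):
    """Support, confidence, lift, and excess for 'if antecedent then consequent'."""
    n = len(baskets)
    n1 = sum(antecedent <= b for b in baskets)                   # #I1
    n2 = sum(consequent <= b for b in baskets)                   # #I2
    n12 = sum((antecedent | consequent) <= b for b in baskets)   # #(I1 & I2)
    expected = Fraction(n1 * n2, n)  # expected #(I1 & I2) if unrelated
    return {
        "support": Fraction(n12, n),
        "confidence": Fraction(n12, n1),
        "lift": Fraction(n12, n1) / Fraction(n2, n),
        "excess": n12 - expected,
    }

stats = rule_stats(baskets, {"OJ"}, {"milk"})
print({name: float(value) for name, value in stats.items()})
```

Both forms of lift agree, since $\#(I_1 \& I_2) / (\#I_1 \#I_2 / N)$ is algebraically the same as confidence divided by the proportion of transactions with $I_2$.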
Building association rules
Determine the item set:
select the right level of detail (start broad, then repeat with finer detail to home in);
product hierarchies help to generalize items, e.g., frozen food (subdivided into dessert, vegetable, dinner);
hybrid approach: generalize some items more than others depending on price or frequency (analysis is easier when each item appears in roughly the same number of transactions);
virtual items go beyond the product hierarchy, e.g., designer labels, low-fat products,
energy-saving options, payment method, day, demographic info, etc.
Calculate counts/probabilities of items and combinations of items.
Analyze support/confidence/lift to find actionable rules.
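The first two steps (choosing the item set, then counting items and combinations) can be sketched as follows. This is an illustration under stated assumptions: the `HIERARCHY` mapping, the product names, and the `paid:` prefix for the payment-method virtual item are all hypothetical. Each basket is rolled up one level in the hierarchy, a virtual item is added, and single items and pairs are counted and turned into probabilities.

```python
from collections import Counter
from itertools import combinations

# Hypothetical product hierarchy: detailed product -> broader category.
HIERARCHY = {
    "frozen apple pie": "frozen dessert",
    "frozen ice cream": "frozen dessert",
    "frozen peas": "frozen vegetable",
    "frozen lasagna": "frozen dinner",
}

def generalize(basket, payment_method):
    """Roll items up one level and add a virtual item for the payment method."""
    items = {HIERARCHY.get(item, item) for item in basket}
    items.add(f"paid:{payment_method}")  # virtual item (assumed naming scheme)
    return items

# Hypothetical raw transactions: (set of products, payment method).
raw = [
    ({"frozen apple pie", "frozen peas"}, "cash"),
    ({"frozen ice cream", "frozen lasagna"}, "card"),
    ({"frozen apple pie", "frozen lasagna"}, "card"),
]
baskets = [generalize(items, pay) for items, pay in raw]

n = len(baskets)
item_counts = Counter(item for b in baskets for item in b)
# Sort each basket so every pair is counted under one canonical key.
pair_counts = Counter(pair for b in baskets for pair in combinations(sorted(b), 2))

# These probabilities feed the support, confidence, and lift calculations.
item_probs = {item: c / n for item, c in item_counts.items()}
pair_probs = {pair: c / n for pair, c in pair_counts.items()}
print(item_probs)
print(pair_probs)
```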
Pizza example
Calculations
Rule             Support      Pr(I1)  Confidence          Pr(I2)  Lift
                 = Pr(I1&I2)          = Pr(I1&I2)/Pr(I1)          = Pr(I1&I2)/(Pr(I1)Pr(I2))
If M then P      0.25         0.450   0.556               0.425   1.31
If P then M      0.25         0.425   0.588               0.450   1.31
If M then C      0.20         0.450   0.444               0.400   1.11
If C then M      0.20         0.400   0.500               0.450   1.11
If P then C      0.15         0.425   0.353               0.400   0.88
If C then P      0.15         0.400   0.375               0.425   0.88
If (M,P) then C  0.05         0.250   0.200               0.400   0.50
If (M,C) then P  0.05         0.200   0.250               0.425   0.59
If (P,C) then M  0.05         0.150   0.333               0.450   0.74
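Best rule: if Pepperoni then Mushroom (it ties "if Mushroom then Pepperoni" on support and lift but has higher confidence).
Support 25% and confidence 58.8%.
Lift 1.31 means that when Pepperoni is requested, Mushroom is 31% more likely to be requested than it would be if Pepperoni and Mushroom were unrelated.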
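Every column of the Calculations table follows from seven probabilities: the three single-topping probabilities, the three pair probabilities (the supports of the two-item rules), and the triple probability 0.05 (the support of the three-item rules). The sketch below recomputes the table from those values, which are read off the table itself. It also includes a helper for the negated rules discussed under Refinements, using $\Pr(I_1 \,\&\, \text{not } I_2) = \Pr(I_1) - \Pr(I_1 \,\&\, I_2)$ and $\Pr(\text{not } I_2) = 1 - \Pr(I_2)$.

```python
# Probabilities read off the Calculations table (M = Mushroom, P = Pepperoni).
# Keys are sets of toppings; frozenset("MP") is the set {"M", "P"}.
PROB = {
    frozenset("M"): 0.45, frozenset("P"): 0.425, frozenset("C"): 0.40,
    frozenset("MP"): 0.25, frozenset("MC"): 0.20, frozenset("PC"): 0.15,
    frozenset("MPC"): 0.05,
}

def rule(antecedent, consequent):
    """Support, confidence, and lift for 'if antecedent then consequent'."""
    a, c = frozenset(antecedent), frozenset(consequent)
    support = PROB[a | c]            # Pr(I1 & I2)
    confidence = support / PROB[a]   # Pr(I1 & I2) / Pr(I1)
    lift = confidence / PROB[c]      # divide by Pr(I2)
    return support, confidence, lift

def negated_rule(antecedent, consequent):
    """Same measures for 'if antecedent then NOT consequent'."""
    a, c = frozenset(antecedent), frozenset(consequent)
    support = PROB[a] - PROB[a | c]      # Pr(I1 & not I2)
    confidence = support / PROB[a]
    lift = confidence / (1 - PROB[c])    # Pr(not I2) = 1 - Pr(I2)
    return support, confidence, lift

rules = [("M", "P"), ("P", "M"), ("M", "C"), ("C", "M"), ("P", "C"), ("C", "P"),
         ("MP", "C"), ("MC", "P"), ("PC", "M")]
for a, c in rules:
    s, conf, lift = rule(a, c)
    print(f"If {a} then {c}: support={s:.2f} confidence={conf:.3f} lift={lift:.2f}")

for a, c in [("PC", "M"), ("MP", "C")]:
    s, conf, lift = negated_rule(a, c)
    print(f"If {a} then not {c}: support={s:.2f} confidence={conf:.3f} lift={lift:.2f}")
```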
Refinements
Negative rules: when lift < 1, negating the result produces a better rule, e.g.:
Rule                 Support      Pr(I1)  Confidence          Pr(I2)  Lift
                     = Pr(I1&I2)          = Pr(I1&I2)/Pr(I1)          = Pr(I1&I2)/(Pr(I1)Pr(I2))
If (P,C) then not M  0.10         0.150   0.667               0.550   1.21
If (M,P) then not C  0.20         0.250   0.800               0.600   1.33
Overcoming practical limits: find rules with 2 items first, then 3 items, etc.
Pruning: reduce # of items and combinations of items considered at each step (e.g., minimum
support threshold).
Large datasets: the calculations are very computationally intensive.
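The "2 items, then 3 items" strategy with a minimum support threshold is the level-wise idea behind the Apriori algorithm; the slides do not name a specific algorithm, so treat this as one well-known instance of the technique rather than the text's own method. In this minimal sketch, a $k$-item set is counted only if every one of its $(k-1)$-item subsets already met the threshold, which is what keeps the number of combinations manageable. The baskets are hypothetical.

```python
from itertools import combinations

def frequent_itemsets(baskets, min_support):
    """Level-wise search with minimum-support pruning (Apriori-style)."""
    n = len(baskets)
    baskets = [frozenset(b) for b in baskets]
    current = {frozenset([item]) for b in baskets for item in b}  # 1-item candidates
    frequent = {}
    k = 1
    while current:
        # Count each candidate and keep those meeting the support threshold.
        counts = {c: sum(c <= b for b in baskets) for c in current}
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(level)
        k += 1
        # Join frequent (k-1)-item sets into k-item candidates ...
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # ... and prune any candidate with an infrequent (k-1)-item subset.
        current = {c for c in joined
                   if all(frozenset(s) in level for s in combinations(c, k - 1))}
    return frequent

# Hypothetical baskets using the pizza toppings' letters.
baskets = [{"M", "P"}, {"M", "P", "C"}, {"M", "C"}, {"P"}, {"M", "P"}]
result = frequent_itemsets(baskets, min_support=0.4)
for itemset, support in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), support)
```

Here {M, P, C} is never counted: its subset {P, C} has support 0.2, below the 0.4 threshold, so it is pruned before the third pass.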
Extensions
Using association rules to compare stores, promotions at various times, geographic areas, urban
vs. suburban, seasons, etc. (use virtual items).
Dissociation rules, e.g., if $I_1$ and not $I_2$ then $I_3$ (use sparingly for only the most frequent items).
Sequential analysis using association rules: requires identifying customers and tracking them
over time.
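A dissociation rule is evaluated like any other rule once the antecedent is defined as "contains $I_1$ but not $I_2$". This is a minimal sketch with hypothetical baskets and item names `A`, `B`, `C`. One plausible reason for the "use sparingly" advice is that every negated item acts as an extra item, which multiplies the combinations to be counted.

```python
def dissociation_rule(baskets, present, absent, result):
    """Support, confidence, and lift for 'if <present> and not <absent> then <result>'."""
    baskets = [set(b) for b in baskets]
    n = len(baskets)
    # The antecedent holds when all 'present' items are in the basket and 'absent' is not.
    antecedent = [b for b in baskets if present <= b and absent not in b]
    both = [b for b in antecedent if result in b]
    support = len(both) / n
    confidence = len(both) / len(antecedent) if antecedent else 0.0
    p_result = sum(result in b for b in baskets) / n
    lift = confidence / p_result if p_result else 0.0
    return support, confidence, lift

# Hypothetical baskets for illustration.
baskets = [{"A", "C"}, {"A", "B"}, {"A", "C"}, {"A", "B", "C"}, {"B"}, {"C"}]
print(dissociation_rule(baskets, {"A"}, "B", "C"))
```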