DATA WAREHOUSE AND MINING
LECTURE 8: ASSOCIATION RULE MINING
INTRODUCTION
◎ Association Rule Mining (ARM) is a key data mining technique for uncovering
hidden patterns and relationships in large datasets. It is typically applied
when we want to identify interesting and useful relationships between
variables in a dataset.
◎ Goal of ARM: To discover rules that indicate associations, for example, "if a
customer buys item A, they are likely to buy item B."
MARKET BASKET ANALYSIS
◎ Market Basket Analysis (MBA) is a primary application of Association Rule
Mining. It analyzes transaction data to find associations between different
products that customers buy together.
◎ Example: In a retail environment, customers who buy milk often buy bread as
well. These associations help improve product placement and marketing
strategies.
◎ Purpose of MBA:
◉ Improve cross-selling (suggesting related products).
◉ Help in store layout decisions (placing related items close to each other).
◉ Build recommendation engines for online retail.
◎ Example in Action: An online bookstore might recommend a title with "You may
also like..." based on a customer's previous purchases; such suggestions are
driven by MBA techniques.
APRIORI ALGORITHM AND FREQUENT ITEMSETS
◎ The Apriori Algorithm is one of the most popular algorithms used for
Association Rule Mining. It finds frequent itemsets (itemsets whose support
meets a minimum threshold) level by level: it first finds frequent single
items, then extends them to larger itemsets, pruning candidates with the
property that every subset of a frequent itemset must itself be frequent.
◎ Example: In a dataset of 100 transactions where the itemset {bread, butter}
appears in 60 of them, its support is 60%; with a minimum support threshold of
50%, the itemset is considered frequent.
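◎ As a minimal illustration of support counting, the Python sketch below checks an itemset against a minimum support threshold; the five toy transactions and the 50% threshold are assumed for this example only, not taken from any real dataset.

```python
# Minimal sketch: counting support for an itemset (assumed toy data).
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"butter", "milk"},
    {"bread", "butter", "jam"},
]

def support(itemset, transactions):
    """Fraction of transactions containing every item in the itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

min_support = 0.5  # assumed threshold, mirroring the 50% example above

s = support({"bread", "butter"}, transactions)
print(s)                 # 0.6 -> {bread, butter} appears in 3 of 5 transactions
print(s >= min_support)  # True -> the itemset counts as frequent
```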
GENERATING ASSOCIATION RULES
◎ Once we have frequent itemsets, we can generate association rules. These
are rules of the form {A} → {B}, meaning "if A happens, B is likely to happen."
◎ Confidence and Lift are used to evaluate the strength of these rules.
◎ Example Rule: In retail, the rule {bread} → {butter} suggests that customers
who buy bread are likely to buy butter. The strength of this rule is determined
by its confidence and lift.
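◎ The sketch below derives both rule directions from the frequent itemset {bread, butter} and scores them with confidence and lift; the support values are assumed purely for illustration.

```python
# Minimal sketch: derive and score rules from the frequent itemset {bread, butter}.
# The support values below are assumed for illustration only.
support = {
    frozenset({"bread"}): 0.60,
    frozenset({"butter"}): 0.40,
    frozenset({"bread", "butter"}): 0.30,
}

def confidence(antecedent, consequent):
    # P(consequent | antecedent) = Support(A ∪ B) / Support(A)
    return support[antecedent | consequent] / support[antecedent]

def lift(antecedent, consequent):
    # Confidence divided by the consequent's baseline support.
    return confidence(antecedent, consequent) / support[consequent]

# Both rule directions that can be generated from {bread, butter}.
for a, b in [({"bread"}, {"butter"}), ({"butter"}, {"bread"})]:
    a, b = frozenset(a), frozenset(b)
    print(sorted(a), "->", sorted(b),
          "confidence:", round(confidence(a, b), 2),
          "lift:", round(lift(a, b), 2))
# ['bread'] -> ['butter'] confidence: 0.5  lift: 1.25
# ['butter'] -> ['bread'] confidence: 0.75 lift: 1.25
```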
MEASURES OF RULE INTERESTINGNESS
◎ To determine whether an association rule is interesting, we need to evaluate
it using three key metrics:
◉ Support: The frequency or probability of an itemset occurring in the
dataset.
• Formula: Support(A) = P(A), where A is an itemset.
• Example: If 100 transactions are recorded, and itemset {bread,
butter} appears in 30 transactions, its support is 30%.
◉ Confidence: How often the rule's consequent appears in transactions that
contain its antecedent.
• Formula: Confidence(A → B) = P(B | A) = Support(A ∪ B) / Support(A).
◉ Lift: How much more likely B is given A, compared with B's baseline
frequency.
• Formula: Lift(A → B) = Confidence(A → B) / Support(B); a lift above 1
indicates a positive association.
◎ Let’s look at how these metrics work together to help evaluate rules.
◎ Support: How frequently an itemset occurs across transactions.
◎ Confidence: How likely it is that item B is bought given that item A was
purchased.
◎ Steps Taken:
◉ Step 1: Identify frequent itemsets like {bread}, {butter}, {bread, butter}.
◉ Step 2: Generate rules like {bread} → {butter} and calculate support,
confidence, and lift.
◉ Step 3: Interpret the results: a lift of 1.5 means customers who buy bread
are 50% more likely to buy butter than average (see the worked sketch after
this slide).
◎ Outcome: The store places bread and butter close to each other to increase
combined sales.
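◎ As a check on the "50% more likely" interpretation in Step 3, here is a small worked sketch; the 200-transaction total and the individual counts are assumed values chosen so the lift comes out to 1.5, not real store data.

```python
# Worked check with assumed counts (numbers invented so the lift works out to 1.5).
n_total = 200
n_bread = 80          # transactions containing bread
n_butter = 60         # transactions containing butter
n_both = 36           # transactions containing both bread and butter

support_both = n_both / n_total                   # 0.18
confidence = n_both / n_bread                     # 0.45 = P(butter | bread)
lift = (n_both * n_total) / (n_bread * n_butter)  # confidence / P(butter) = 1.5

# A lift of 1.5 means bread buyers purchase butter at 1.5x the baseline rate,
# i.e. they are 50% more likely to buy butter than an average customer.
print(support_both, confidence, lift)             # 0.18 0.45 1.5
```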
APRIORI ALGORITHM STEP-BY-STEP EXAMPLE
◎ We walk through a simple dataset to show how the Apriori algorithm works.
◎ Step 1: Count individual item frequencies and keep the items that meet the
minimum support.
◎ Step 2: Generate candidate 2-itemsets from the frequent 1-itemsets and keep
the frequent ones.
◎ Step 3: Repeat the process for 3-itemsets, 4-itemsets, etc., until no further
frequent itemsets are found.
◎ Step 4: Generate association rules from these frequent itemsets and evaluate
them.
◎ Example Dataset: Transaction data for 5 items in a supermarket.
◎ Practical Exercise: Manually calculate support and confidence for a couple of
simple rules (a code sketch of the full process follows below).
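◎ Below is a compact sketch of the level-wise Apriori loop over a small assumed dataset (five items, seven transactions); the item names, baskets, and threshold are invented for the exercise, not real supermarket data.

```python
# Level-wise Apriori sketch over an assumed 5-item, 7-transaction dataset.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "eggs"},
    {"butter", "milk"},
    {"bread", "butter", "jam"},
    {"milk", "eggs"},
    {"bread", "butter", "milk", "jam"},
]
min_support = 3 / 7  # assumed threshold: appear in at least 3 of the 7 baskets

def support(itemset):
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

# Step 1: count single items and keep the frequent 1-itemsets.
items = {item for t in transactions for item in t}
levels = [{frozenset({i}) for i in items if support(frozenset({i})) >= min_support}]

# Steps 2-3: join frequent (k-1)-itemsets into k-itemset candidates, keep the
# frequent ones, and repeat until no new frequent itemsets are found.
k = 2
while levels[-1]:
    candidates = {a | b for a in levels[-1] for b in levels[-1] if len(a | b) == k}
    levels.append({c for c in candidates if support(c) >= min_support})
    k += 1

# Step 4 would generate rules from these frequent itemsets (see earlier sketches).
for size, level in enumerate(levels, start=1):
    for itemset in sorted(level, key=sorted):
        print(size, sorted(itemset), round(support(itemset), 2))
```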
APPLICATIONS OF ASSOCIATION RULE MINING
◎ Fraud Detection: Banks and financial institutions use association rule mining
to detect unusual combinations of transactions that might indicate fraud.
◎ Retail Recommendations: "People who bought a digital camera also bought a
camera bag" is a typical rule used to drive cross-selling suggestions.
ADVANTAGES AND LIMITATIONS
◎ Advantages:
◉ Automatic Pattern Discovery: No need for a human-specified hypothesis;
patterns are found directly from the data.
◉ Actionable Insights: The discovered patterns are easy to interpret and can guide
decisions.
◉ Scalable: With appropriate support thresholds, it can be applied to large
transactional datasets.
◎ Limitations:
◉ High Computational Cost: Candidate generation and repeated dataset scans
become expensive, especially with large datasets and low support thresholds.
◉ Too Many Rules: Can generate a lot of rules, not all of which are useful.
◉ Threshold Setting: Deciding on the right support and confidence
thresholds can be challenging.
CONCLUSION
◎ Association Rule Mining, particularly through the Apriori algorithm, is a
crucial tool for discovering valuable patterns in large datasets.
◎ The metrics support, confidence, and lift are essential for evaluating the
relevance and strength of these patterns.
◎ Despite its challenges, ARM provides actionable insights that drive better
decision-making.