
ADVANCED DATA WAREHOUSE AND MINING

LECTURE 8: ASSOCIATION RULE MINING
INTRODUCTION
◎ Association Rule Mining (ARM) is a key data mining technique for uncovering
hidden patterns and relationships in large datasets. It is most often applied
where we want to identify interesting, useful relationships between variables.

◎ Real-World Example: Consider a supermarket that wants to understand
customer buying habits to improve inventory management. By identifying which
products are often bought together, it can optimize store placement and
promotions.

◎ Goal of ARM: To discover rules that indicate associations, for example, "if a
customer buys item A, they are likely to buy item B."
MARKET BASKET ANALYSIS
◎ Market Basket Analysis (MBA) is a primary application of Association Rule
Mining. It analyzes transaction data to find associations between different
products that customers buy together.
◎ Example: In a retail environment, customers who buy milk often buy bread as
well. These associations help improve product placement and marketing
strategies.
◎ Purpose of MBA:
◉ Improve cross-selling (suggesting related products).
◉ Help in store layout decisions (placing related items close to each other).
◉ Build recommendation engines for online retail.
◎ Example in Action: An online bookstore might recommend a book based on
previous purchases ("You may also like..."); this is driven by MBA techniques.
APRIORI ALGORITHM AND FREQUENT ITEMSETS

◎ The Apriori Algorithm is one of the most popular algorithms used for
Association Rule Mining.

◎ Frequent Itemsets are groups of items that appear together in transactions
with a frequency at or above a user-specified threshold (the minimum support).
◎ Apriori Algorithm:
◉ Step 1: Generate frequent 1-itemsets (individual items whose support meets
the threshold).
◉ Step 2: Combine these frequent items to generate candidate 2-itemsets, and so on.
◉ Step 3: Eliminate any itemsets that do not meet the minimum support
threshold (a minimal code sketch follows the example below).

◎ Example: If you have a dataset of transactions, say 100 transactions, and the
itemset {bread, butter} appears in 60 of those, it would have a support of 60%
(if the threshold is 50%, it will be considered frequent).
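◎ A minimal Python sketch of these three steps, assuming a toy five-transaction
dataset and a 50% minimum support (both illustrative, echoing the example above):

```python
# Toy transaction data (illustrative; any list of item sets works).
transactions = [
    {"bread", "butter"}, {"bread", "milk"}, {"bread", "butter", "milk"},
    {"milk"}, {"bread", "butter"},
]
min_support = 0.5  # user-specified minimum support threshold

def support(itemset):
    """Fraction of transactions containing every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

# Step 1: frequent 1-itemsets (items meeting the support threshold).
items = {i for t in transactions for i in t}
frequent = [frozenset([i]) for i in items if support({i}) >= min_support]

# Steps 2-3: combine frequent (k-1)-itemsets into k-itemsets, pruning
# every candidate that falls below the minimum support.
k, current = 2, frequent
while current:
    candidates = {a | b for a in current for b in current if len(a | b) == k}
    current = [c for c in candidates if support(c) >= min_support]
    frequent += current
    k += 1

print(frequent)  # {bread}, {butter}, {milk}, {bread, butter} as frozensets
```

◎ Real implementations add the classic Apriori pruning property: a k-itemset can
only be frequent if all of its (k-1)-subsets are frequent, which cuts the
candidate set sharply before any counting is done.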
GENERATING ASSOCIATION RULES
◎ Once we have frequent itemsets, we can generate association rules. These
are rules of the form {A} → {B}, meaning "if A happens, B is likely to happen."

◎ Confidence and Lift are used to evaluate the strength of these rules.

◎ Steps to Generate Rules:
◉ Identify frequent itemsets using the Apriori algorithm.
◉ For each frequent itemset, generate all possible rules (e.g., for {A, B}, generate
{A} → {B} and {B} → {A}).
◉ Calculate the support, confidence, and lift of each rule to determine whether it
is meaningful (a code sketch follows the example below).

◎ Example Rule: In retail, the rule {bread} → {butter} suggests that customers
who buy bread are likely to buy butter. The strength of this rule is determined
by its confidence and lift.
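◎ A hedged sketch of the rule-generation step, reusing the `frequent` list and
`support()` helper from the Apriori sketch above. For simplicity it enumerates
single-item consequents only, and the confidence threshold is an assumed value:

```python
# Rule generation, reusing `frequent` and `support()` from the sketch above.
min_confidence = 0.7  # assumed threshold for keeping a rule

rules = []
for itemset in frequent:
    if len(itemset) < 2:
        continue  # a rule needs a non-empty antecedent and consequent
    for item in itemset:
        A = itemset - {item}          # antecedent
        B = frozenset([item])         # single-item consequent (simplification)
        confidence = support(itemset) / support(A)   # P(B|A)
        if confidence >= min_confidence:
            rules.append((set(A), set(B), confidence))

for A, B, conf in rules:
    print(f"{A} -> {B}  (confidence = {conf:.2f})")
```

◎ Full implementations also consider multi-item consequents, splitting each
frequent itemset into every non-empty antecedent/consequent pair.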
MEASURES OF RULE INTERESTINGNESS
◎ To determine whether an association rule is interesting, we need to evaluate
it using three key metrics:
◉ Support: The frequency or probability of an itemset occurring in the
dataset.
• Formula: Support(A) = P(A), where A is an itemset.
• Example: If 100 transactions are recorded, and itemset {bread,
butter} appears in 30 transactions, its support is 30%.

◉ Confidence: The probability that an item B is bought when item A is
bought.
• Formula: Confidence(A → B) = P(B|A) = Support(A ∪ B) / Support(A)
• Example: If 50% of customers who buy {bread} also buy {butter},
then confidence for {bread} → {butter} is 50%.
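◎ The same two examples in code (the bread count of 60 is an assumed value,
chosen only so that the 50% confidence figure works out; it is not given on the
slide):

```python
n_transactions = 100
n_bread_butter = 30     # transactions containing both bread and butter
support_bb = n_bread_butter / n_transactions    # 0.30 -> 30% support

n_bread = 60            # assumed number of transactions containing bread
confidence = n_bread_butter / n_bread           # 0.50 -> 50% confidence
print(support_bb, confidence)
```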
CONTINUED…
◎ Lift: This measures the strength of the rule by comparing the observed
frequency of {A, B} to what we'd expect if A and B were independent.
◉ Formula: Lift(A → B) = P(A ∩ B) / (P(A) * P(B))
◉ Example: If P(A) = 0.3, P(B) = 0.4, and P(A ∩ B) = 0.12, the lift would be 1,
meaning the two items are not more likely to be bought together than if
they were independent.
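◎ A one-line check of the numbers above:

```python
p_a, p_b, p_ab = 0.3, 0.4, 0.12      # values from the example above
lift = p_ab / (p_a * p_b)            # 0.12 / 0.12 = 1.0
print(lift)  # lift > 1 would indicate a positive association; 1.0 means none
```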

◎ Why These Measures Matter:
◉ Support helps us identify common itemsets.
◉ Confidence tells us how reliable the rule is.
◉ Lift tells us whether the association is genuine or just coincidence.
VISUALIZING SUPPORT, CONFIDENCE, AND LIFT

◎ Let’s look at how these metrics work together to help evaluate rules.
◎ Support: A graph showing how frequently an itemset occurs across
transactions.

◎ Confidence: A graph showing how likely it is that item B is bought given that
item A was purchased.

◎ Lift: A graph showing how much stronger the association is compared to a
random purchase.
◎ Example Visualization:
◉ A table or chart comparing multiple rules and their support, confidence,
and lift values.
◉ This would allow the class to visually compare different rules and their
significance.
EXAMPLE OF ASSOCIATION RULE MINING (CASE STUDY)
◎ Scenario: A retail store wants to understand the relationship between
different products bought by customers.

◎ Dataset: 1,000 transactions with 10 products.

◎ Steps Taken:
◉ Step 1: Identify frequent itemsets like {bread}, {butter}, {bread, butter}.
◉ Step 2: Generate rules like {bread} → {butter} and calculate support,
confidence, and lift.
◉ Step 3: Interpret the results: customers who buy bread are 50% more
likely to buy butter (a lift of 1.5; see the check below).

◎ Outcome: The store places bread and butter close to each other in the store to
increase sales.
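◎ For reference, the "50% more likely" claim in Step 3 corresponds to a lift of
1.5, using the equivalent form Lift(A → B) = P(B|A) / P(B). The probabilities
below are illustrative assumptions, not figures from the case study:

```python
p_butter = 0.40              # assumed baseline: 40% of all customers buy butter
p_butter_given_bread = 0.60  # assumed: 60% of bread buyers also buy butter
lift = p_butter_given_bread / p_butter   # 1.5 -> 50% above the baseline rate
print(lift)
```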
APRIORI ALGORITHM STEP-BY-STEP EXAMPLE

◎ Walk through a simple dataset and show how the Apriori algorithm works.
◎ Step 1: Start by counting individual item frequencies.
◎ Step 2: Generate 2-itemsets from frequent 1-itemsets.
◎ Step 3: Repeat the process for 3-itemsets, 4-itemsets, etc., until no further
frequent itemsets are found.
◎ Step 4: Generate association rules from these frequent itemsets and evaluate
them.
◎ Example Dataset: Transaction data for 5 items in a supermarket.
◎ Practical Exercise: You can ask the class to manually calculate support and
confidence for a couple of simple rules.
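◎ For checking manual answers, here is a sketch using the mlxtend library (an
assumption: the lecture does not prescribe a tool, the five-item dataset is
made up, and the calls follow mlxtend's published tutorials; argument names may
vary across versions):

```python
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

# Illustrative five-item supermarket transactions.
transactions = [
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["milk", "eggs"],
    ["bread", "butter", "eggs"],
    ["bread", "milk", "cheese"],
]

# One-hot encode the transactions, then mine itemsets and rules.
te = TransactionEncoder()
onehot = te.fit(transactions).transform(transactions)
df = pd.DataFrame(onehot, columns=te.columns_)

frequent_itemsets = apriori(df, min_support=0.4, use_colnames=True)
rules = association_rules(frequent_itemsets, metric="confidence",
                          min_threshold=0.6)
print(rules[["antecedents", "consequents", "support", "confidence", "lift"]])
```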
APPLICATIONS OF ASSOCIATION RULE MINING

◎ Market Basket Analysis: Most commonly used to find associations in retail
transactions.
◎ Recommendation Systems: E-commerce websites suggest products based on
the items frequently bought together.
◎ Healthcare: In medical databases, association rules can uncover relationships
between symptoms, treatments, and outcomes.

◎ Fraud Detection: Banks and financial institutions use association rule mining
to detect unusual patterns of transactions that might indicate fraud.

◎ Example: "People who bought a digital camera also bought a camera bag."
ADVANTAGES AND LIMITATIONS
◎ Advantages:
◉ Automatic Pattern Discovery: No need for a manually specified hypothesis.
◉ Actionable Insights: The discovered patterns are easy to interpret and can guide
decisions.
◉ Scalable: Can handle large datasets.

◎ Limitations:
◉ High Computational Cost: Especially with large datasets.
◉ Too Many Rules: Can generate a lot of rules, not all of which are useful.
◉ Threshold Setting: Deciding on the right support and confidence
thresholds can be challenging.
CONCLUSION
◎ Association Rule Mining, particularly through the Apriori algorithm, is a
crucial tool for discovering valuable patterns in large datasets.

◎ The metrics support, confidence, and lift are essential for evaluating the
relevance and strength of these patterns.

◎ Real-world applications in retail, healthcare, and finance demonstrate the
power of ARM in solving practical problems.

◎ Despite its challenges, ARM provides actionable insights that drive better
decision-making.
