Module-4 DM _introduction
Goal:
Association rule mining aims to identify interesting relationships or
associations between different items or variables within a dataset.
"If-then" Rules:
These rules are expressed in the form of "if X, then Y", where X is the
condition (antecedent) and Y is the outcome (consequent).
Applications:
It's widely used in various fields, including market basket analysis (identifying
products bought together), web usage mining, bioinformatics, and more.
How it works:
Frequent Itemset Mining:
The process typically starts with identifying frequent itemsets, which are
groups of items that frequently occur together in the dataset.
Rule Generation:
Once frequent itemsets are identified, association rules are generated based
on the co-occurrence of items within these itemsets.
Rule Evaluation:
The generated rules are then evaluated based on metrics like support,
confidence, and lift to determine their strength and relevance.
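The three evaluation metrics above can be sketched in a few lines of Python. This is a minimal illustration over a made-up five-transaction dataset; the item names and transactions are assumptions for the example, not data from the text.

```python
# Toy transactions, invented for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk", "eggs"},
    {"bread", "milk", "eggs"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

# Evaluate the candidate rule {bread} -> {milk}.
antecedent, consequent = {"bread"}, {"milk"}
supp_rule = support(antecedent | consequent, transactions)  # P(bread and milk)
conf = supp_rule / support(antecedent, transactions)        # P(milk | bread)
lift = conf / support(consequent, transactions)             # confidence vs. baseline

print(supp_rule, conf, lift)
```

A lift below 1 (as in this toy example) would suggest the antecedent actually makes the consequent slightly *less* likely than its baseline frequency, so such a rule would normally be discarded.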
Examples:
Market Basket Analysis: "If a customer buys bread, they are also likely to
buy milk".
Retail: Identifying products that are frequently purchased together to improve
store layout, product placement, and marketing efforts.
Healthcare: Studying frequent symptom clusters to guide diagnoses or
identify risk factors.
Finance: Detecting unusual purchase or transfer patterns that may indicate
fraud.
Key Algorithms:
Apriori Algorithm: A well-known algorithm for finding frequent itemsets and
generating association rules.
FP-Growth Algorithm: Another popular algorithm; it avoids repeated candidate
generation by compressing the data into an FP-tree, making it particularly
efficient for large datasets.
Eclat Algorithm: An algorithm that takes a different, vertical approach,
intersecting per-item transaction-ID lists to find frequent itemsets.
The Apriori algorithm starts by identifying frequent itemsets in the dataset. A
frequent itemset is a collection of items (or attributes) that occurs together
frequently in the data.
The frequency of an itemset is measured by a metric called support: the
proportion of transactions (records) in which the itemset appears.
Apriori takes a bottom-up approach: it first finds the frequent individual
items, then gradually combines them into larger itemsets.
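The bottom-up pass described above can be sketched as follows. This is a simplified illustration, not a full Apriori implementation; the transactions and the `min_support` threshold are assumed toy values.

```python
# Toy transactions and threshold, invented for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk", "eggs"},
    {"bread", "milk", "eggs"},
]
min_support = 0.4  # itemset must appear in at least 40% of transactions

def support(itemset):
    """Proportion of transactions containing every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

# Level 1: frequent individual items.
items = {i for t in transactions for i in t}
frequent = [{frozenset({i}) for i in items if support(frozenset({i})) >= min_support}]

# Level k: join frequent (k-1)-itemsets into k-item candidates, keep the frequent ones.
k = 2
while frequent[-1]:
    candidates = {a | b for a in frequent[-1] for b in frequent[-1] if len(a | b) == k}
    frequent.append({c for c in candidates if support(c) >= min_support})
    k += 1

all_frequent = set().union(*frequent)
```

The loop stops once a level produces no frequent itemsets, which is guaranteed to happen because supports can only shrink as itemsets grow.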
After the frequent itemsets have been identified, association rules are
generated from them.
Each association rule is an "if-then" statement: the "if" part is the
antecedent (premise), and the "then" part is the consequent (conclusion).
Apriori generates candidate rules by combining the items within each frequent
itemset.
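The rule-generation step can be sketched like this: every frequent itemset of size two or more is split into each possible antecedent/consequent pair, and a rule is kept only if its confidence meets a threshold. The transactions, the two frequent itemsets, and the `min_confidence` value below are assumed toy inputs for illustration.

```python
from itertools import combinations

# Toy inputs, invented for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk", "eggs"},
    {"bread", "milk", "eggs"},
]
min_confidence = 0.7
frequent_itemsets = [frozenset({"bread", "milk"}), frozenset({"milk", "eggs"})]

def support(itemset):
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

rules = []
for itemset in frequent_itemsets:
    for r in range(1, len(itemset)):  # every non-empty proper subset size
        for antecedent in map(frozenset, combinations(itemset, r)):
            consequent = itemset - antecedent
            # confidence = support(X and Y) / support(X)
            conf = support(itemset) / support(antecedent)
            if conf >= min_confidence:
                rules.append((set(antecedent), set(consequent), conf))
```

Note that the same itemset can yield rules in both directions (e.g. bread → milk and milk → bread) with different confidences, which is why each split is tested separately.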
Rule Pruning:
Criteria are applied so that only meaningful rules are kept. The most common
criteria are minimum thresholds on the support and confidence metrics
(and sometimes lift) described above.
The algorithm then iterates, generating itemsets, creating rules, and pruning
them, until no more valid rules can be generated.
Throughout these iterations, the algorithm exploits the "downward closure
property": if an itemset is frequent, all of its subsets are also frequent
(equivalently, any superset of an infrequent itemset must be infrequent). This
property lets Apriori discard candidates early and reduces the computational
complexity of the algorithm.
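The pruning step enabled by downward closure can be sketched as a subset check: a k-item candidate is discarded, without ever counting its support, if any of its (k-1)-item subsets failed to be frequent. The frequent 2-itemsets below are assumed toy values for illustration.

```python
from itertools import combinations

# Assumed frequent 2-itemsets from an earlier pass (toy values).
frequent_2 = {frozenset({"bread", "milk"}),
              frozenset({"bread", "butter"}),
              frozenset({"milk", "eggs"})}

def prune(candidate, frequent_prev):
    """Keep `candidate` only if every (k-1)-subset is in `frequent_prev`."""
    return all(frozenset(s) in frequent_prev
               for s in combinations(candidate, len(candidate) - 1))

# {bread, milk, butter} is rejected because its subset
# {milk, butter} is not frequent -> no support count needed.
print(prune(frozenset({"bread", "milk", "butter"}), frequent_2))  # False
```

This is the main source of Apriori's efficiency: the cheap subset test filters out candidates before the expensive pass over the transaction data.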