ML Unit 2
Unit II
Dr Arun K S
Assistant Professor
Department of Computer Applications
Cochin University of Science and Technology
Kochi - 682022
Presentation Outline
Apriori Algorithm
FP Growth Algorithm
Introduction to Data Mining
What is Data Mining?
Let’s break down the concept of extracting non-trivial, implicit, previously unknown,
and potentially useful patterns (or knowledge) in the context of Market Basket
Analysis:
1. Non-trivial, Implicit, Previously Unknown, and Potentially Useful
Patterns:
▶ Patterns or knowledge that are not immediately obvious or predictable from
common sense or intuition about the given set of transactions.
▶ These types of patterns are discovered by analysing the transactional data.
▶ For example, the fact that customers who buy milk also tend to buy bread
might not be apparent without analysing the given set of transactions.
What is Data Mining?
{Diaper} → {Beer}
{Milk, Bread} → {Eggs, Coke}
{Beer, Bread} → {Milk}
▶ If you are given a set of transactions, association rule mining is the process of
discovering interesting or strong or valid association rules.
Association Rule Mining - Frequent Itemset
Itemset
▶ A collection or set of one or more items is called an Itemset.
▶ Examples:
1. {Milk, Bread, Diaper}
2. {Milk}
k - Itemset
▶ An itemset that contains k items or elements.
▶ Examples for 2-itemset:
1. {Milk, Bread}
2. {Milk, Diaper}
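The definitions above can be illustrated with a minimal Python sketch. The item list `ITEMS` and the helper `k_itemsets` are hypothetical names chosen for this illustration; they do not come from the slides.

```python
from itertools import combinations

# Hypothetical item universe, assumed for illustration only.
ITEMS = ["Beer", "Bread", "Diaper", "Milk"]


def k_itemsets(items, k):
    """Return every k-itemset over `items` as a list of frozensets."""
    return [frozenset(combo) for combo in combinations(sorted(items), k)]


# All 2-itemsets that can be formed from the four items above.
two_itemsets = k_itemsets(ITEMS, 2)
for itemset in two_itemsets:
    print(sorted(itemset))
```

With n distinct items there are C(n, k) possible k-itemsets, so the number of candidates grows quickly as n increases.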
Association Rule Mining - Frequent Itemset
Frequent Itemset
▶ An itemset whose support is greater than or equal to a user-specified
threshold value minsup.
▶ For example, if minsup = 40%, then {Milk, Bread, Diaper} is a frequent itemset.
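As a rough sketch of how the frequency check might be computed, the Python snippet below measures support as the fraction of transactions that contain an itemset. The five transactions in `TRANSACTIONS` are an assumption made for this illustration (the original transaction table is not reproduced here), and `support` and `is_frequent` are hypothetical helper names.

```python
# Hypothetical transactions, assumed for illustration only.
TRANSACTIONS = [
    frozenset({"Bread", "Milk"}),
    frozenset({"Bread", "Diaper", "Beer", "Eggs"}),
    frozenset({"Milk", "Diaper", "Beer", "Coke"}),
    frozenset({"Bread", "Milk", "Diaper", "Beer"}),
    frozenset({"Bread", "Milk", "Diaper", "Coke"}),
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def is_frequent(itemset, transactions, minsup):
    """True if the support of `itemset` is at least `minsup`."""
    return support(itemset, transactions) >= minsup


print(support({"Milk", "Bread", "Diaper"}, TRANSACTIONS))
print(is_frequent({"Milk", "Bread", "Diaper"}, TRANSACTIONS, minsup=0.4))
```

Under these assumed transactions, {Milk, Bread, Diaper} appears in 2 of 5 transactions, so its support is 40% and it just meets a minsup of 40%.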
Interesting Association Rules
▶ If the itemset {Beer, Diaper, Milk} is infrequent, then every candidate rule
derived from it can be pruned immediately without computing its confidence
value.
▶ Therefore, a common strategy adopted by many association rule mining
algorithms, including the Apriori Algorithm, is to decompose the problem into two steps:
1. Frequent Itemset Generation, whose objective is to find all the itemsets that
satisfy the minsup threshold. These itemsets are called frequent itemsets.
2. Rule Generation, whose objective is to extract all the high-confidence rules
from the frequent itemsets found in the previous step. These rules are called
interesting or valid or strong association rules.
▶ Frequent itemset generation is generally more computationally expensive
than rule generation.
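The two-step decomposition can be sketched in Python as follows. This is a brute-force illustration, not the Apriori Algorithm itself: step 1 simply enumerates every possible itemset. The transactions, the function names, and the `minconf` parameter are assumptions made for this sketch; confidence of a rule X → Y is taken as support(X ∪ Y) / support(X).

```python
from itertools import combinations

# Hypothetical transactions, assumed for illustration only.
TRANSACTIONS = [
    frozenset({"Bread", "Milk"}),
    frozenset({"Bread", "Diaper", "Beer", "Eggs"}),
    frozenset({"Milk", "Diaper", "Beer", "Coke"}),
    frozenset({"Bread", "Milk", "Diaper", "Beer"}),
    frozenset({"Bread", "Milk", "Diaper", "Coke"}),
]


def frequent_itemsets(transactions, minsup):
    """Step 1 (brute force): map each frequent itemset to its support count."""
    items = sorted(set().union(*transactions))
    n = len(transactions)
    result = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            candidate = frozenset(combo)
            count = sum(1 for t in transactions if candidate <= t)
            if count / n >= minsup:
                result[candidate] = count
    return result


def generate_rules(frequent, minconf):
    """Step 2: split each frequent itemset into antecedent -> consequent."""
    rules = []
    for itemset, count in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), r):
                antecedent = frozenset(lhs)
                consequent = itemset - antecedent
                # Every subset of a frequent itemset is frequent, so the
                # antecedent's count is always present in `frequent`.
                confidence = count / frequent[antecedent]
                if confidence >= minconf:
                    rules.append((antecedent, consequent, confidence))
    return rules


frequent = frequent_itemsets(TRANSACTIONS, minsup=0.6)
rules = generate_rules(frequent, minconf=0.8)
for antecedent, consequent, confidence in rules:
    print(sorted(antecedent), "->", sorted(consequent), confidence)
```

Step 1 inspects every subset of the item universe, which is exponential in the number of items, while step 2 only works with the itemsets that survived. This is one way to see why frequent itemset generation dominates the cost.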
Preliminaries
Apriori Principle
▶ The Apriori principle states that if an itemset is frequent, then all of its subsets
must be frequent.
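Read in the other direction, the principle says that an itemset with even one infrequent subset cannot be frequent, so it can be discarded before its support is counted. The Python sketch below shows only this pruning check, not the full Apriori Algorithm; the set of frequent 2-itemsets and the function names are hypothetical and assumed for illustration.

```python
from itertools import combinations


def has_infrequent_subset(candidate, frequent_smaller):
    """True if some (k-1)-subset of the k-item `candidate` is not frequent."""
    k = len(candidate)
    return any(
        frozenset(subset) not in frequent_smaller
        for subset in combinations(candidate, k - 1)
    )


def prune_candidates(candidates, frequent_smaller):
    """Keep only the candidates whose (k-1)-subsets are all frequent."""
    return [c for c in candidates if not has_infrequent_subset(c, frequent_smaller)]


# Hypothetical frequent 2-itemsets, assumed for illustration only.
frequent_2 = {
    frozenset({"Bread", "Milk"}),
    frozenset({"Bread", "Diaper"}),
    frozenset({"Milk", "Diaper"}),
    frozenset({"Beer", "Diaper"}),
}

candidates_3 = [
    frozenset({"Bread", "Milk", "Diaper"}),
    frozenset({"Beer", "Diaper", "Milk"}),
]

kept = prune_candidates(candidates_3, frequent_2)
for candidate in kept:
    print(sorted(candidate))
```

Here {Beer, Diaper, Milk} is dropped because its subset {Beer, Milk} is not among the frequent 2-itemsets, while {Bread, Milk, Diaper} is kept and would still need its support counted against the transactions.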