
22-382-0203 Machine Learning

Unit II

Dr Arun K S
Assistant Professor
Department of Computer Applications
Cochin University of Science and Technology
Kochi - 682022
Presentation Outline 2

Introduction to Data Mining

Association Rule Mining

Apriori Algorithm

FP Growth Algorithm
Introduction to Data Mining
What is Data Mining? 4

Data Mining (knowledge discovery from data)


Extraction of interesting (non-trivial, implicit, previously unknown or
potentially useful) patterns or knowledge from huge amounts of data.
What is Data Mining? 5

Let us consider Market-Basket Analysis as an example to illustrate the concept of data mining.
▶ Market-Basket Analysis is a data mining technique used in business and retail to
identify associations between products or items that customers tend to purchase
together.
▶ Market-Basket Analysis uncovers patterns and relationships within transactional data, specifically looking at the items that customers buy in conjunction with each other.
▶ In the context of Market-Basket Analysis, transactional data specifically refers
to the details about transactions or purchases made by individual customers.
What is Data Mining? 6

Let’s break down the concept of extracting non-trivial, implicit, previously unknown, or potentially useful patterns (or knowledge) in the context of Market-Basket Analysis:
1. Non-trivial or Implicit or Previously Unknown or Potentially Useful
Patterns:
▶ Patterns or knowledge that are not immediately obvious or predictable based on
common sense or intuition from the given set of transactions.
▶ These types of patterns are discovered by analysing the transactional data.
▶ For example, discovering that customers who buy milk also tend to purchase bread might not be immediately apparent without analysing the given set of transactions.
What is Data Mining? 7

1. Non-trivial or Implicit or Previously Unknown or Potentially Useful Patterns:
▶ Businesses can improve marketing strategies or decision-making policies, optimise
product placements, and improve consumer satisfaction levels by identifying these
types of patterns.
▶ For example, these types of patterns are valuable for recommendation systems and targeted marketing. Businesses can use these patterns to suggest related products to customers, improve product bundling, and enhance the overall shopping experience.
▶ Another advantage is optimizing product placement in a supermarket. By identifying frequently associated items, businesses can strategically position items to enhance visibility, encourage cross-category purchases, and increase overall sales.
What is Data Mining? 8

Alternate Names for Data Mining


▶ Knowledge Discovery in Databases (KDD)
▶ Knowledge Extraction
▶ Data/Pattern Analysis
▶ Data Archeology
▶ Data Dredging
▶ Information Harvesting
▶ Business Intelligence
Association Rule Mining
Association Rule Mining 10

▶ We are given a set of transactions.


▶ Association rule mining discovers rules that
will predict the occurrence of an item based
on the occurrence of other items in the given
transactions.
Association Rule Mining 11

▶ Examples of Association Rules:

{Diaper} → {Beer}
{Milk, Bread} → {Eggs, Coke}
{Beer, Bread} → {Milk}

▶ Please note that → denotes co-occurrence, not implication.
What is Association Rule Mining? 12

▶ If you are given a set of transactions, association rule mining is the process of
discovering interesting or strong or valid association rules.
Association Rule Mining - Frequent Itemset 13

▶ What are the frequent itemsets in the given set of transactions?
Association Rule Mining - Frequent Itemset 14

Itemset
▶ A collection or set of one or more items is called an Itemset.
▶ Examples:
1. {Milk, Bread, Diaper}
2. {Milk}

k - Itemset
▶ An itemset that contains k items or elements.
▶ Examples for 2-itemset:
1. {Milk, Bread}
2. {Milk, Diaper}
Association Rule Mining - Frequent Itemset 15

Support Count of an Itemset (σ)


▶ Frequency of occurrence of an itemset in the given set of transactions.
▶ Example:
1. σ({Milk, Bread, Diaper}) = 2

Support of an itemset (s)


▶ Fraction of transactions that contain a particular itemset.
▶ Example for support:
1. s({Milk, Bread, Diaper}) = 2/5
Association Rule Mining - Frequent Itemset 16

Frequent Itemset
▶ An itemset whose support is greater than or equal to a user-specified
threshold value minsup.
▶ If minsup = 40%, then {Milk, Bread, Diaper} is a frequent itemset, since its support is 2/5 = 40% ≥ minsup.
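
The transaction table referenced by these counts is not reproduced in this text, so the short Python sketch below assumes the commonly used five-transaction market-basket example (chosen so that σ({Milk, Bread, Diaper}) = 2, matching the numbers above) and computes the support count, support, and frequent-itemset test. The dataset and function names are illustrative assumptions, not part of the original slides.

```python
# A minimal sketch: support count, support, and the frequent-itemset test.
# The five transactions below are an assumed example dataset chosen so that
# sigma({Milk, Bread, Diaper}) = 2, matching the numbers quoted in the slides.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def support_count(itemset, transactions):
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t)

def support(itemset, transactions):
    """Fraction of transactions that contain the itemset."""
    return support_count(itemset, transactions) / len(transactions)

itemset = {"Milk", "Bread", "Diaper"}
minsup = 0.40
s = support(itemset, transactions)
print(f"support count = {support_count(itemset, transactions)}")   # 2
print(f"support       = {s:.2f}")                                  # 0.40
print(f"frequent (s >= minsup)? {s >= minsup}")                    # True
```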
Interesting Association Rules 17

What is an interesting Association Rule?


▶ An association rule is an implication expression of the form X → Y , where
both X and Y are disjoint itemsets (i.e., X ∩ Y = ∅).
▶ {Milk, Diaper} → {Beer} is an example of an association rule.
▶ For an association rule X → Y to be interesting, we need to compute the
following rule evaluation metrics:
1. Support (s): It is the fraction of transactions that contain both the itemsets X and Y.
2. Confidence (c): It measures how often items in Y appear in transactions that contain X. This is measured by the proportion of transactions containing itemset X in which itemset Y also appears.
Interesting Association Rules 18

▶ Consider the following association rule from the previously shown transactions: {Milk, Diaper} → {Beer}
▶ For the above-mentioned association rule:

s = σ({Milk, Diaper, Beer}) / |T| = 2/5 = 0.4

c = σ({Milk, Diaper, Beer}) / σ({Milk, Diaper}) = 2/3 ≈ 0.67
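
As a further sketch, the snippet below computes the two rule-evaluation metrics for {Milk, Diaper} → {Beer} on the same assumed five-transaction dataset used earlier; it should reproduce s = 0.4 and c ≈ 0.67.

```python
# Support and confidence of the rule {Milk, Diaper} -> {Beer}, using the same
# assumed five-transaction dataset as before.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def sigma(itemset):
    """Support count: transactions containing the whole itemset."""
    return sum(1 for t in transactions if itemset <= t)

X, Y = {"Milk", "Diaper"}, {"Beer"}
s = sigma(X | Y) / len(transactions)   # 2 / 5  = 0.4
c = sigma(X | Y) / sigma(X)            # 2 / 3  ~ 0.67
print(f"s = {s:.2f}, c = {c:.2f}")
```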
Interesting Association Rules 19

What is an interesting Association Rule?


▶ For an association rule X → Y to be interesting or strong or valid, it has to
satisfy the following two conditions:
1. Support(s) ≥ minsup threshold.
2. Confidence(c) ≥ minconf threshold.
Interesting Association Rule Mining 20

How will you find all interesting Association Rules?


▶ Brute-force Approach:
1. List all possible association rules (i.e., candidate rules) from the given set of transactions.
2. Compute the Support(s) and Confidence (c) of each association rule.
3. Prune the rules that fail the minsup and minconf thresholds.

Limitations of Brute-force Approach


▶ The Brute-force approach for association rule mining is computationally expensive.
▶ To avoid this limitation, the Apriori Algorithm was proposed; it avoids generating and evaluating all possible candidate rules.
Apriori Algorithm
Preliminaries 22

▶ The significance and reliability of an association rule are measured in terms of


its Support and Confidence.
▶ High support indicates that the itemsets are common in the transaction database,
meaning they occur frequently.
▶ An association rule with low support is likely to be uninteresting from a business
perspective because it may not be profitable to promote items that are seldom
bought together.
▶ Confidence measures the strength of the relationship between the antecedent
and consequent in an association rule.
▶ High confidence indicates that the presence of the antecedent is strongly associated with the presence of the consequent.
Preliminaries 23

▶ An initial step towards improving the performance of the Brute-force Approach for association rule mining is to decouple the support and confidence requirements.
▶ The support of an association rule X → Y depends only on the support of the corresponding itemset X ∪ Y.
▶ For example, the following rules have identical support because they involve items from the same itemset - {Beer, Diaper, Milk}.
{Beer, Diaper} → {Milk}
{Beer, Milk} → {Diaper}
{Diaper, Milk} → {Beer}
{Beer} → {Diaper, Milk}
{Milk} → {Beer, Diaper}
{Diaper} → {Beer, Milk}
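
As a quick check of this decoupling, the snippet below uses the same assumed five-transaction dataset introduced earlier and confirms that all six rules above share the support of the itemset {Beer, Diaper, Milk}.

```python
# All six rules above involve the same itemset, so they share one support value.
# The five transactions are the same assumed example dataset used earlier.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

itemset = {"Beer", "Diaper", "Milk"}
support = sum(1 for t in transactions if itemset <= t) / len(transactions)
print(f"support shared by all six rules = {support:.2f}")   # 0.40
```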
Preliminaries 24

▶ If the itemset {Beer, Diaper, Milk} is infrequent, then all the above-listed candidate rules can be pruned immediately without computing their confidence values.
▶ Therefore, a common strategy adopted by many association rule mining algorithms, including the Apriori Algorithm, is to decompose the problem into two steps:
1. Frequent Itemset Generation, whose objective is to find all the itemsets that satisfy the minsup threshold. These itemsets are called frequent itemsets.
2. Rule Generation, whose objective is to extract all the high-confidence rules from the frequent itemsets found in the previous step. These rules are called interesting or valid or strong association rules.
▶ Frequent itemset generation is generally more computationally expensive than rule generation.
Preliminaries 25

Frequent Itemset Generation:


▶ A lattice structure termed a Hasse diagram can be used to enumerate all possible itemsets. In general, a transaction database that contains k items can potentially generate 2^k − 1 itemsets, excluding the null set.
▶ The following figure shows the Hasse diagram for a set of five items: I = {a, b, c, d, e}.
Preliminaries 26

Frequent Itemset Generation:


▶ Because k can be very large in many practical applications, the search space of itemsets that need to be explored in the Hasse diagram is exponential.
▶ Therefore, the brute-force approach for finding frequent itemsets requires the computation of the support count for every candidate itemset in the Hasse diagram.
▶ To do this, we need to compare each candidate against every transaction as shown in the following figure, and it requires M × N × w operations, where N is the number of transactions, M is the number of candidate itemsets, and w is the maximum transaction width.
Preliminaries 27

Apriori Principle
▶ The Apriori principle states that if an itemset is frequent, then all of its subsets
must be frequent.
Preliminaries 28

Apriori Principle for Frequent Itemset Generation:


▶ There are several ways to reduce the computational complexity of frequent
itemset generation.
▶ In Apriori algorithm, the Apriori principle helps reduce the number of candidate
itemsets to be explored during frequent itemset generation.
▶ To illustrate the idea behind Apriori principle, consider the Hasse diagram
shown previously for the itemset I = {a, b, c, d, e} and let {c, d, e} be a frequent
itemset.
Preliminaries 29

Apriori Principle for Frequent Itemset Generation:


▶ Then, any transaction that contains {c, d, e} must also contain all of its subsets: {c, d}, {c, e}, {d, e}, {c}, {d} and {e}.
▶ As a result, if {c, d, e} is frequent, then all the subsets of {c, d, e} (the shaded region in the Hasse diagram) must also be frequent. (This is known as the Apriori Principle.)
Preliminaries 30

Apriori Principle for Frequent Itemset Generation:


▶ Conversely, if an itemset such as {a, b} is infrequent, then all of its supersets must be infrequent too.
▶ As illustrated in the following figure, the entire subgraph containing the supersets of {a, b} can be pruned immediately once {a, b} is found to be infrequent.
Preliminaries 31

Apriori Principle for Frequent Itemset Generation:


▶ This strategy of trimming the exponential search space in the Hasse diagram based on the support measure is termed support-based pruning.
▶ Such a pruning strategy is made possible by a key property of the support measure, i.e., the support of an itemset never exceeds the support of its subsets.
▶ This property is also known as the anti-monotone property of the support measure.
▶ Any measure that possesses the anti-monotone property can be incorporated directly into association rule mining algorithms to effectively prune the exponential search space of the candidate itemsets.
Apriori Algorithm 32

▶ The Apriori algorithm was proposed by Rakesh Agrawal and Ramakrishnan


Srikant in 1994.
▶ The Apriori algorithm divides the problem of discovering interesting association rules into two phases.
1. Phase 1: Find all itemsets with a specified minimal support (frequent itemsets). The Apriori property avoids unnecessary generation and evaluation of candidate itemsets that cannot be frequent because one of their subsets is not frequent.
2. Phase 2: Use the frequent itemsets obtained in Phase 1 to generate interesting association rules.
Apriori Algorithm 33

Phase 1 of the Apriori Algorithm - some key terminologies.


▶ Large itemset: It doesn’t mean an itemset with many items. It means an itemset whose support is at least the minimum support.
▶ Set of Large k-itemsets (Lk): It is the collection of large itemsets of size k.
▶ Set of Candidate k-itemsets (Ck): It is the collection of potentially large k-itemsets. It contains all the k-itemsets that might be large.
Apriori Algorithm 34

Phase 1 of the Apriori Algorithm - Pseudocode


1. Generate the set of candidate 1-itemsets C1 .
2. Generate the set of large 1-itemsets L1 from C1 .
3. let k=2.
4. Generate candidate k-itemsets (Ck ) by performing a join operation on the large
(k-1)-itemset (i.e., Lk−1 ) with itself.
5. Prune the candidate k-itemsets generated in step 4. That is, eliminate candidate k-itemsets having a subset of size (k-1) that is not a member of Lk−1. For example, if {A, B, C} is a candidate 3-itemset but its subset {B, C} is not a member of L2, then {A, B, C} is pruned.
6. Generate large k-itemsets by counting the support of each candidate k-itemset by scanning the entire database of transactions and then removing those candidate k-itemsets with support below the minimum support threshold.
Apriori Algorithm 35

Phase 1 of the Apriori Algorithm - Pseudocode contd..


7. k = k+1
8. Repeat steps 4-7 until no more large k-itemsets can be generated.
9. Return ⋃k Lk as the frequent itemsets.
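
The following Python sketch is one possible implementation of Phase 1, following the pseudocode above: candidate generation by self-joining Lk−1 (step 4), pruning of candidates with an infrequent (k-1)-subset (step 5), and support counting over the database (step 6). The function name apriori_frequent_itemsets is illustrative, not from any standard library.

```python
from itertools import combinations

def apriori_frequent_itemsets(transactions, min_support_count):
    """Phase 1 of Apriori: return {frozenset: support count} for all large itemsets."""
    transactions = [frozenset(t) for t in transactions]

    def large(candidates):
        """Step 6: scan the database, count supports, keep candidates meeting minsup."""
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in counts:
                if c <= t:
                    counts[c] += 1
        return {c: n for c, n in counts.items() if n >= min_support_count}

    # Steps 1-2: candidate 1-itemsets C1 and large 1-itemsets L1.
    items = sorted({i for t in transactions for i in t})
    L = large([frozenset([i]) for i in items])
    frequent = dict(L)

    k = 2                                              # step 3
    while L:
        prev = sorted(tuple(sorted(s)) for s in L)     # items in lexicographic order
        # Step 4: join L_{k-1} with itself (first k-2 items must match).
        candidates = {frozenset(a) | frozenset(b)
                      for a, b in combinations(prev, 2)
                      if a[:k - 2] == b[:k - 2]}
        # Step 5: prune candidates having a (k-1)-subset that is not in L_{k-1}.
        candidates = {c for c in candidates
                      if all(frozenset(sub) in L
                             for sub in combinations(c, k - 1))}
        L = large(candidates)                          # step 6
        frequent.update(L)
        k += 1                                         # steps 7-8
    return frequent                                    # step 9: union of all L_k
```

Running this sketch on the nine-transaction database of the worked example below with min_support_count = 2 should reproduce the thirteen frequent itemsets listed later.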
Apriori Algorithm 36

Phase 1 of the Apriori Algorithm - Join Operation.


▶ The join operation (▷◁) combines the itemsets belonging to Lk−1 with themselves to generate the candidate k-itemsets (Ck). The condition for joining Lk−1 with Lk−1 is that the two itemsets must have their first (k-2) items in common.
▶ Thus, the join operation is a cross product with the condition that two itemsets in Lk−1 are combined to form a candidate k-itemset only if they have the first (k-2) items in common.
▶ For example, if the large 2-itemsets are {A, B} and {A, C}, then the candidate 3-itemset would be {A, B, C} after performing the join operation, as illustrated in the sketch below.
▶ By convention, the Apriori algorithm assumes that the items in a transaction are sorted in lexicographic order.
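
As a small, self-contained illustration of the join condition, the sketch below joins the large 2-itemsets {A, B} and {A, C}, which share their first k − 2 = 1 item, into the candidate 3-itemset {A, B, C}; the function name join_step is an assumption made for this example.

```python
from itertools import combinations

def join_step(L_prev, k):
    """Join L_{k-1} with itself: combine two (k-1)-itemsets sharing their first k-2 items."""
    ordered = sorted(tuple(sorted(s)) for s in L_prev)   # lexicographic item order
    return {frozenset(a) | frozenset(b)
            for a, b in combinations(ordered, 2)
            if a[:k - 2] == b[:k - 2]}

L2 = [{"A", "B"}, {"A", "C"}]
print(join_step(L2, k=3))   # -> {frozenset({'A', 'B', 'C'})}
```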
Apriori Algorithm 37

Phase 1 of the Apriori Algorithm - Example Problem.


▶ Consider the following transaction database (D). Let’s assume that the minimum support count = 2 and the minimum confidence = 70%. Then, using the Apriori algorithm, (i) find all frequent itemsets and (ii) list out all valid association rules.
TID List of items
T1 I1 , I2 , I5
T2 I2 , I4
T3 I2 , I3
T4 I1 , I2 , I4
T5 I1 , I3
T6 I2 , I3
T7 I1 , I3
T8 I1 , I2 , I3 , I5
T9 I1 , I2 , I3
Apriori Algorithm 38

Phase 1 of the Apriori Algorithm - step by step solution.


▶ Generate C1 : Each item in D is a member of the set of candidate 1-itemsets
(C1 ). So, we have to simply scan D in order to count the number of occurrences
of each item.
Itemset Support Count
I1 6
I2 7
I3 6
I4 2
I5 2
Apriori Algorithm 39

Phase 1 of the Apriori Algorithm - step by step solution


▶ Generate L1 : It consists of the candidate 1-itemsets satisfying the minimum
support threshold. Here, all the candidates in C1 satisfy minimum support.
Itemset Support Count
I1 6
I2 7
I3 6
I4 2
I5 2
Apriori Algorithm 40

Phase 1 of the Apriori Algorithm - step by step solution


▶ Generate C2 by performing the join operation on L1, i.e., C2 = L1 ▷◁ L1. Here, L1 ▷◁ L1 is equivalent to L1 × L1, since the definition of Lk ▷◁ Lk requires the two joining itemsets to share the first k-2 items in common (here k-2 = 0).
Itemset Support Count
{I1 , I2 } 4
{I1 , I3 } 4
{I1 , I4 } 1
{I1 , I5 } 2
{I2 , I3 } 4
{I2 , I4 } 2
{I2 , I5 } 2
{I3 , I4 } 0
{I3 , I5 } 1
{I4 , I5 } 0
Apriori Algorithm 41

Phase 1 of the Apriori Algorithm - step by step solution


▶ Prune C2: Note that no candidates are removed from C2 during the prune step, because every subset of each candidate is also frequent.
Apriori Algorithm 42

Phase 1 of the Apriori Algorithm - step by step solution


▶ Generate L2 from C2 : The set of large 2-itemset (L2 ) consists of those candidate
2-itemsets in C2 having minimum support.

Itemset Support Count


{I1 , I2 } 4
{I1 , I3 } 4
{I1 , I5 } 2
{I2 , I3 } 4
{I2 , I4 } 2
{I2 , I5 } 2
Apriori Algorithm 43

Phase 1 of the Apriori Algorithm - step by step solution


▶ Generate C3 by performing the join operation on L2: Here L2 ▷◁ L2 = {{I1, I2, I3}, {I1, I2, I5}, {I1, I3, I5}, {I2, I3, I4}, {I2, I3, I5}, {I2, I4, I5}}.
▶ Based on the Apriori property that all subsets of a large itemset must also be large, we can determine that the last four candidates cannot possibly be large (each has at least one 2-subset that is not in L2). We can therefore remove them from C3.

Itemset Support Count


{I1 , I2 , I3 } 2
{I1 , I2 , I5 } 2
Apriori Algorithm 44

Phase 1 of the Apriori Algorithm - step by step solution


▶ Generate L3 from C3 : The set of large 3-itemset (L3 ) consists of those candidate
3-itemsets in C3 having minimum support.

Itemset Support Count


{I1 , I2 , I3 } 2
{I1 , I2 , I5 } 2
Apriori Algorithm 45

Phase 1 of the Apriori Algorithm - step by step solution


▶ Generate C4 from L3: We can use L3 ▷◁ L3 to generate the set of candidate 4-itemsets C4. In this case, the join results in {{I1, I2, I3, I5}}, and this itemset is pruned because its subset {I2, I3, I5} is not large. Thus, C4 = ∅ and the algorithm terminates, having found all the frequent itemsets.
Apriori Algorithm 46

Phase 1 of the Apriori Algorithm - step by step solution


▶ Thus, Phase 1 of the Apriori algorithm generates the following frequent itemsets:

1. {I1 , I2 , I3 } (large (or frequent) 3-itemset from L3 )


2. {I1 , I2 , I5 } (large (or frequent) 3-itemset from L3 )
3. {I1 , I2 } (large (or frequent) 2-itemset from L2 )
4. {I1 , I3 } (large (or frequent) 2-itemset from L2 )
5. {I1 , I5 } (large (or frequent) 2-itemset from L2 )
6. {I2 , I3 } (large (or frequent) 2-itemset from L2 )
7. {I2 , I4 } (large (or frequent) 2-itemset from L2 )
8. {I2 , I5 } (large (or frequent) 2-itemset from L2 )
9. {I1 } (large (or frequent) 1-itemset from L1 )
10. {I2 } (large (or frequent) 1-itemset from L1 )
11. {I3 } (large (or frequent) 1-itemset from L1 )
12. {I4 } (large (or frequent) 1-itemset from L1 )
13. {I5 } (large (or frequent) 1-itemset from L1 )
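
As a sanity check on this list, the brute-force sketch below enumerates every possible itemset over the nine transactions of the example database and keeps those with support count ≥ 2; it should print exactly the thirteen frequent itemsets above.

```python
from itertools import combinations

# Transaction database D from the worked example (T1..T9).
D = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]
min_support_count = 2

items = sorted({i for t in D for i in t})
for k in range(1, len(items) + 1):
    for cand in combinations(items, k):
        count = sum(1 for t in D if set(cand) <= t)
        if count >= min_support_count:
            print(set(cand), "support count =", count)
```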
Apriori Algorithm 47

Phase 1 of the Apriori Algorithm - step by step solution


▶ Now, we have to generate interesting or valid or strong association rules from the above-listed frequent itemsets generated in Phase 1 of the Apriori algorithm.
▶ Phase 2 of the Apriori algorithm discovers all the interesting association rules
from the above-listed frequent itemsets.
Apriori Algorithm 48

Phase 2 of the Apriori Algorithm - Rule generation from frequent itemsets


▶ Valid association rules can be generated from the frequent itemsets obtained in
Phase 1 as follows:
1. For each frequent itemset l, generate all nonempty proper subsets of l.
2. For every nonempty proper subset s of l, output the rule “s → (l − s)” if support count(l) / support count(s) ≥ minconf threshold.
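
A minimal Python sketch of this rule-generation step is shown below. It takes the support counts of the frequent itemsets (here, those found in Phase 1 of the worked example), enumerates the nonempty proper subsets of each frequent itemset with at least two items, and keeps the rules whose confidence meets minconf; the function name generate_rules is illustrative. With minconf = 0.70 it should print the six valid rules derived in the following slides.

```python
from itertools import combinations

def generate_rules(support_counts, minconf):
    """Phase 2: emit (antecedent, consequent, confidence) for every valid rule."""
    rules = []
    for l, count_l in support_counts.items():
        if len(l) < 2:
            continue
        # All nonempty proper subsets s of l give a candidate rule s -> (l - s).
        for r in range(1, len(l)):
            for s in map(frozenset, combinations(l, r)):
                conf = count_l / support_counts[s]
                if conf >= minconf:
                    rules.append((set(s), set(l - s), conf))
    return rules

# Support counts of the frequent itemsets found in Phase 1 of the worked example.
counts = {
    frozenset({"I1"}): 6, frozenset({"I2"}): 7, frozenset({"I3"}): 6,
    frozenset({"I4"}): 2, frozenset({"I5"}): 2,
    frozenset({"I1", "I2"}): 4, frozenset({"I1", "I3"}): 4, frozenset({"I1", "I5"}): 2,
    frozenset({"I2", "I3"}): 4, frozenset({"I2", "I4"}): 2, frozenset({"I2", "I5"}): 2,
    frozenset({"I1", "I2", "I3"}): 2, frozenset({"I1", "I2", "I5"}): 2,
}

for X, Y, c in generate_rules(counts, minconf=0.70):
    print(f"{X} -> {Y}  (confidence = {c:.2f})")
```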
Apriori Algorithm 49

Phase 2 of the Apriori Algorithm - Rule generation from frequent itemsets


▶ For the previous problem, consider the frequent itemset {I1 , I2 , I5 } ∈ L3 .
▶ Its non-empty proper subsets are: {{I1 , I2 }, {I1 , I5 }, {I2 , I5 }, {I1 }, {I2 }, {I5 }}.
▶ Then, the resultant association rules will be:
1. {I1, I2} → {I5} (c = σ(l)/σ(s) = 2/4)
2. {I1, I5} → {I2} (c = σ(l)/σ(s) = 2/2)
3. {I2, I5} → {I1} (c = σ(l)/σ(s) = 2/2)
4. {I1} → {I2, I5} (c = σ(l)/σ(s) = 2/6)
5. {I2} → {I1, I5} (c = σ(l)/σ(s) = 2/7)
6. {I5} → {I1, I2} (c = σ(l)/σ(s) = 2/2)
▶ Since the given confidence threshold is 70%, only the second, third, and last rules will be
selected as valid association rules from the frequent itemset {I1 , I2 , I5 } ∈ L3 .
Apriori Algorithm 50

Phase 2 of the Apriori Algorithm - Rule generation from frequent itemsets


▶ Now, consider the frequent itemset {I1 , I2 , I3 } ∈ L3 .
▶ Its non-empty proper subsets are: {{I1 , I2 }, {I1 , I3 }, {I2 , I3 }, {I1 }, {I2 }, {I3 }}.
▶ Then, the resultant association rules will be:
1. {I1, I2} → {I3} (c = σ(l)/σ(s) = 2/4)
2. {I1, I3} → {I2} (c = σ(l)/σ(s) = 2/4)
3. {I2, I3} → {I1} (c = σ(l)/σ(s) = 2/4)
4. {I1} → {I2, I3} (c = σ(l)/σ(s) = 2/6)
5. {I2} → {I1, I3} (c = σ(l)/σ(s) = 2/7)
6. {I3} → {I1, I2} (c = σ(l)/σ(s) = 2/6)
▶ Since the given confidence threshold is 70%, none of the above-mentioned association rules
will be selected as valid association rules from the frequent itemset {I1 , I2 , I3 } ∈ L3 .
Apriori Algorithm 51

Phase 2 of the Apriori Algorithm - Rule generation from frequent itemsets


▶ Now, consider the frequent itemset {I1 , I2 } ∈ L2 .
▶ Its non-empty proper subsets are: {{I1 }, {I2 }}.
▶ Then, the resultant association rules will be:
1. {I1} → {I2} (c = σ(l)/σ(s) = 4/6)
2. {I2} → {I1} (c = σ(l)/σ(s) = 4/7)
▶ Since the given confidence threshold is 70%, none of the above-mentioned
association rules will be selected as valid association rules from the frequent
itemset {I1 , I2 } ∈ L2 .
Apriori Algorithm 52

Phase 2 of the Apriori Algorithm - Rule generation from frequent itemsets


▶ Now, consider the frequent itemset {I1 , I3 } ∈ L2 .
▶ Its non-empty proper subsets are: {{I1 }, {I3 }}.
▶ Then, the resultant association rules will be:
1. {I1} → {I3} (c = σ(l)/σ(s) = 4/6)
2. {I3} → {I1} (c = σ(l)/σ(s) = 4/6)
▶ Since the given confidence threshold is 70%, none of the above-mentioned
association rules will be selected as valid association rules from the frequent
itemset {I1 , I3 } ∈ L2 .
Apriori Algorithm 53

Phase 2 of the Apriori Algorithm - Rule generation from frequent itemsets


▶ Now, consider the frequent itemset {I1 , I5 } ∈ L2 .
▶ Its non-empty proper subsets are: {{I1 }, {I5 }}.
▶ Then, the resultant association rules will be:
1. {I1} → {I5} (c = σ(l)/σ(s) = 2/6)
2. {I5} → {I1} (c = σ(l)/σ(s) = 2/2)
▶ Since the given confidence threshold is 70%, only the second association rule will be selected as a valid association rule from the frequent itemset {I1, I5} ∈ L2.
Apriori Algorithm 54

Phase 2 of the Apriori Algorithm - Rule generation from frequent itemsets


▶ Now, consider the frequent itemset {I2 , I3 } ∈ L2 .
▶ Its non-empty proper subsets are: {{I2}, {I3}}.
▶ Then, the resultant association rules will be:
1. {I2} → {I3} (c = σ(l)/σ(s) = 4/7)
2. {I3} → {I2} (c = σ(l)/σ(s) = 4/6)
▶ Since the given confidence threshold is 70%, none of the above-listed association rules will be selected as valid association rules from the frequent itemset {I2, I3} ∈ L2.
Apriori Algorithm 55

Phase 2 of the Apriori Algorithm - Rule generation from frequent itemsets


▶ Now, consider the frequent itemset {I2 , I4 } ∈ L2 .
▶ Its non-empty proper subsets are: {{I2 }, {I4 }}.
▶ Then, the resultant association rules will be:
1. {I2} → {I4} (c = σ(l)/σ(s) = 2/7)
2. {I4} → {I2} (c = σ(l)/σ(s) = 2/2)
▶ Since the given confidence threshold is 70%, only the second association rule will be selected as a valid association rule from the frequent itemset {I2, I4} ∈ L2.
Apriori Algorithm 56

Phase 2 of the Apriori Algorithm - Rule generation from frequent itemsets


▶ Now, consider the frequent itemset {I2 , I5 } ∈ L2 .
▶ Its non-empty proper subsets are: {{I2 }, {I5 }}.
▶ Then, the resultant association rules will be:
1. {I2} → {I5} (c = σ(l)/σ(s) = 2/7)
2. {I5} → {I2} (c = σ(l)/σ(s) = 2/2)
▶ Since the given confidence threshold is 70%, only the second association rule will be selected as a valid association rule from the frequent itemset {I2, I5} ∈ L2.
Apriori Algorithm 57

Phase 2 of the Apriori Algorithm - Rule generation from frequent itemsets


▶ We don’t need to consider the itemsets belonging to L1, because a large 1-itemset has no nonempty proper subset, so no association rule can be generated from it as per the algorithm discussed previously.
▶ So, the valid association rules that can be generated from the given transaction
database D are:
1. {I1 , I5 } → {I2 }
2. {I2 , I5 } → {I1 }
3. {I5 } → {I1 , I2 }
4. {I5 } → {I1 }
5. {I4 } → {I2 }
6. {I5 } → {I2 }
Apriori Algorithm 58

Apriori Algorithm - Practice Problem.


▶ Consider the following transaction database (D). Let’s assume that min sup
= 60% and min conf = 80%. Then, using the Apriori algorithm (i) Find all
frequent itemsets and (ii) List out all valid association rules.

TID List of items


T100 {K, A, D, B}
T200 {D, A, C, E, B}
T300 {C, A, B, E}
T400 {B, A, D}
Apriori Algorithm 59

Apriori Algorithm - Practice Problem :- The Frequent Itemsets are:


L1:
Itemset support count
{A} 4
{B} 4
{D} 3

L2:
Itemset support count
{A, B} 4
{A, D} 3
{B, D} 3

L3:
Itemset support count
{A, B, D} 3
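
One way to verify this answer mechanically is with the mlxtend library's Apriori implementation; the sketch below assumes pandas and mlxtend are installed and is offered as an illustration, not as part of the original slides.

```python
# A sketch for checking the practice problem with mlxtend
# (assumes: pip install pandas mlxtend).
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori

transactions = [
    ["K", "A", "D", "B"],
    ["D", "A", "C", "E", "B"],
    ["C", "A", "B", "E"],
    ["B", "A", "D"],
]

# One-hot encode the transactions into a boolean DataFrame.
te = TransactionEncoder()
onehot = pd.DataFrame(te.fit(transactions).transform(transactions),
                      columns=te.columns_)

# Frequent itemsets with support >= 60% (support count >= 3 of 4 transactions).
frequent = apriori(onehot, min_support=0.6, use_colnames=True)
print(frequent)
```

Valid rules at the 80% confidence threshold can then be obtained with mlxtend's association_rules function, or with a manual confidence check like the Phase 2 sketch shown earlier.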
FP Growth Algorithm
