
22-382-0203 Machine Learning

Unit II

Dr Arun K S
Assistant Professor
Department of Computer Applications
Cochin University of Science and Technology
Kochi - 682022
Presentation Outline 2

Introduction to Data Mining

Association Rule Mining

Apriori Algorithm

FP Growth Algorithm
Introduction to Data Mining
What is Data Mining? 4

Data Mining (knowledge discovery from data)


Extraction of interesting (non-trivial, implicit, previously unknown or
potentially useful) patterns or knowledge from huge amounts of data.
What is Data Mining? 5

Let us consider Market-Basket Analysis as an example to illustrate the concept of data mining.
▶ Market-Basket Analysis is a data mining technique used in business and retail to
identify associations between products or items that customers tend to purchase
together.
▶ Market-Basket Analysis uncovers patterns and relationships within transactional data, specifically looking at the items that customers buy in conjunction with each other.
▶ In the context of Market-Basket Analysis, transactional data specifically refers
to the details about transactions or purchases made by individual customers.
What is Data Mining? 6

Let’s break down the concept of extracting non-trivial, implicit, previously unknown, or potentially useful patterns (or knowledge) in the context of Market-Basket Analysis:
1. Non-trivial or Implicit or Previously Unknown or Potentially Useful
Patterns:
▶ Patterns or knowledge that are not immediately obvious or predictable based on
common sense or intuition from the given set of transactions.
▶ These types of patterns are discovered by analysing the transactional data.
▶ For example, discovering that customers who buy milk also tend to purchase bread might not be immediately apparent without analysing the given set of transactions.
What is Data Mining? 7

1. Non-trivial or Implicit or Previously Unknown or Potentially Useful Patterns:
▶ Businesses can improve marketing strategies or decision-making policies, optimise
product placements, and improve consumer satisfaction levels by identifying these
types of patterns.
▶ For example, these types of patterns are valuable for recommendation systems and targeted marketing. Businesses can use these patterns to suggest related products to customers, improve product bundling, and enhance the overall shopping experience.
▶ Another advantage is optimizing product placement in a supermarket. By identifying frequently associated items, businesses can strategically position items to enhance visibility, encourage cross-category purchases, and increase overall sales.
What is Data Mining? 8

Alternate Names for Data Mining


▶ Knowledge Discovery in Databases (KDD)
▶ Knowledge Extraction
▶ Data/Pattern Analysis
▶ Data Archeology
▶ Data Dredging
▶ Information Harvesting
▶ Business Intelligence
Association Rule Mining
Association Rule Mining 10

▶ We are given a set of transactions.


▶ Association rule mining discovers rules that
will predict the occurrence of an item based
on the occurrence of other items in the given
transactions.
Association Rule Mining 11

▶ Examples of Association Rules:

{Diaper} → {Beer}
{Milk, Bread} → {Eggs, Coke}
{Beer, Bread} → {Milk}

▶ Please note that → denotes co-occurrence, not implication.
What is Association Rule Mining? 12

▶ If you are given a set of transactions, association rule mining is the process of
discovering interesting or strong or valid association rules.
Association Rule Mining - Frequent Itemset 13

▶ What are the frequent itemsets in the given set of transactions?
Association Rule Mining - Frequent Itemset 14

Itemset
▶ A collection or set of one or more items is called an Itemset.
▶ Examples:
1. {Milk, Bread, Diaper}
2. {Milk}

k - Itemset
▶ An itemset that contains k items or elements.
▶ Examples for 2-itemset:
1. {Milk, Bread}
2. {Milk, Diaper}
Association Rule Mining - Frequent Itemset 15

Support Count of an Itemset (σ)


▶ Frequency of occurrence of an itemset in the given set of transactions.
▶ Example:
1. σ({Milk, Bread, Diaper}) = 2

Support of an itemset (s)


▶ Fraction of transactions that contain a particular itemset.
▶ Example for support:
1. s({Milk, Bread, Diaper}) = 2/5
Association Rule Mining - Frequent Itemset 16

Frequent Itemset
▶ An itemset whose support is greater than or equal to a user-specified
threshold value minsup.
▶ If minsup = 40%, then {Milk, Bread, Diaper} is a frequent itemset, since its support is 2/5 = 40% ≥ minsup.
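
The transaction table referenced by these counts is not reproduced in this text, so the short Python sketch below assumes the commonly used five-transaction market-basket example (chosen so that σ({Milk, Bread, Diaper}) = 2, matching the numbers above) and computes the support count, support, and frequent-itemset test. The dataset and function names are illustrative assumptions, not part of the original slides.

```python
# A minimal sketch: support count, support, and the frequent-itemset test.
# The five transactions below are an assumed example dataset chosen so that
# sigma({Milk, Bread, Diaper}) = 2, matching the numbers quoted in the slides.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def support_count(itemset, transactions):
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t)

def support(itemset, transactions):
    """Fraction of transactions that contain the itemset."""
    return support_count(itemset, transactions) / len(transactions)

itemset = {"Milk", "Bread", "Diaper"}
minsup = 0.40
s = support(itemset, transactions)
print(f"support count = {support_count(itemset, transactions)}")   # 2
print(f"support       = {s:.2f}")                                  # 0.40
print(f"frequent (s >= minsup)? {s >= minsup}")                    # True
```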
Interesting Association Rules 17

What is an interesting Association Rule?


▶ An association rule is an implication expression of the form X → Y , where
both X and Y are disjoint itemsets (i.e., X ∩ Y = ∅).
▶ {Milk, Diaper} → {Beer} is an example of an association rule.
▶ For an association rule X → Y to be interesting, we need to compute the
following rule evaluation metrics:
1. Support (s): It is the fraction of transactions that contain both the itemsets X and Y.
2. Confidence (c): It measures how often items in Y appear in transactions that contain X. This is measured by the proportion of transactions containing itemset X in which itemset Y also appears.
Interesting Association Rules 18

▶ Consider the following association rule from the previously shown transactions: {Milk, Diaper} → {Beer}
▶ For the above-mentioned association rule:

s = σ({Milk, Diaper, Beer}) / |T| = 2/5 = 0.4

c = σ({Milk, Diaper, Beer}) / σ({Milk, Diaper}) = 2/3 ≈ 0.67
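
As a further sketch, the snippet below computes the two rule-evaluation metrics for {Milk, Diaper} → {Beer} on the same assumed five-transaction dataset used earlier; it should reproduce s = 0.4 and c ≈ 0.67.

```python
# Support and confidence of the rule {Milk, Diaper} -> {Beer}, using the same
# assumed five-transaction dataset as before.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def sigma(itemset):
    """Support count: transactions containing the whole itemset."""
    return sum(1 for t in transactions if itemset <= t)

X, Y = {"Milk", "Diaper"}, {"Beer"}
s = sigma(X | Y) / len(transactions)   # 2 / 5  = 0.4
c = sigma(X | Y) / sigma(X)            # 2 / 3  ~ 0.67
print(f"s = {s:.2f}, c = {c:.2f}")
```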
Interesting Association Rules 19

What is an interesting Association Rule?


▶ For an association rule X → Y to be interesting or strong or valid, it has to
satisfy the following two conditions:
1. Support(s) ≥ minsup threshold.
2. Confidence(c) ≥ minconf threshold.
Interesting Association Rule Mining 20

How will you find all interesting Association Rules?


▶ Brute-force Approach:
1. List all possible association rules (i.e., candidate rules) from the given set of transactions.
2. Compute the Support(s) and Confidence (c) of each association rule.
3. Prune the rules that fail the minsup and minconf thresholds.

Limitations of Brute-force Approach


▶ The Brute-force approach for association rule mining is computationally expensive.
▶ To avoid this limitation, the Apriori Algorithm was proposed; it avoids generating and evaluating all possible candidate rules.
Apriori Algorithm
Preliminaries 22

▶ The significance and reliability of an association rule are measured in terms of


its Support and Confidence.
▶ High support indicates that the itemsets are common in the transaction database,
meaning they occur frequently.
▶ An association rule with low support is likely to be uninteresting from a business
perspective because it may not be profitable to promote items that are seldom
bought together.
▶ Confidence measures the strength of the relationship between the antecedent
and consequent in an association rule.
▶ High confidence indicates that the presence of the antecedent is strongly associated with the presence of the consequent.
Preliminaries 23

▶ An initial step towards improving the performance of the Brute-force Approach for association rule mining is to decouple the support and confidence requirements.
▶ The support of an association rule X → Y depends only on the support of the corresponding itemset X ∪ Y.
▶ For example, the following rules have identical support because they involve items from the same itemset - {Beer, Diaper, Milk}.
{Beer, Diaper} → {Milk}
{Beer, Milk} → {Diaper}
{Diaper, Milk} → {Beer}
{Beer} → {Diaper, Milk}
{Milk} → {Beer, Diaper}
{Diaper} → {Beer, Milk}
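
As a quick check of this decoupling, the snippet below uses the same assumed five-transaction dataset introduced earlier and confirms that all six rules above share the support of the itemset {Beer, Diaper, Milk}.

```python
# All six rules above involve the same itemset, so they share one support value.
# The five transactions are the same assumed example dataset used earlier.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

itemset = {"Beer", "Diaper", "Milk"}
support = sum(1 for t in transactions if itemset <= t) / len(transactions)
print(f"support shared by all six rules = {support:.2f}")   # 0.40
```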
Preliminaries 24

▶ If the itemset {Beer, Diaper, Milk} is infrequent, then all the above-listed candidate rules can be pruned immediately without computing their confidence values.
▶ Therefore, a common strategy adopted by many association rule mining algorithms, including the Apriori Algorithm, is to decompose the problem into two steps:
1. Frequent Itemset Generation, whose objective is to find all the itemsets that satisfy the minsup threshold. These itemsets are called frequent itemsets.
2. Rule Generation, whose objective is to extract all the high-confidence rules from the frequent itemsets found in the previous step. These rules are called interesting or valid or strong association rules.
▶ Frequent itemset generation is generally more computationally expensive than rule generation.
Preliminaries 25

Frequent Itemset Generation:


▶ A lattice structure termed a Hasse diagram can be used to enumerate all possible itemsets. In general, a transaction database that contains k items can potentially generate 2^k − 1 itemsets, excluding the null set.
▶ The following figure shows the Hasse diagram for a set of five items: I = {a, b, c, d, e}.
Preliminaries 26

Frequent Itemset Generation:


▶ Because k can be very large in many practical applications, the search space of itemsets that need to be explored in the Hasse diagram is exponential.
▶ Therefore, the brute-force approach for finding frequent itemsets requires the computation of the support count for every candidate itemset in the Hasse diagram.
▶ To do this, we need to compare each candidate against every transaction as shown in the following figure, and it requires M × N × w operations, where N is the number of transactions, M is the number of candidate itemsets, and w is the maximum transaction width.
Preliminaries 27

Apriori Principle
▶ The Apriori principle states that if an itemset is frequent, then all of its subsets
must be frequent.
Preliminaries 28

Apriori Principle for Frequent Itemset Generation:


▶ There are several ways to reduce the computational complexity of frequent
itemset generation.
▶ In Apriori algorithm, the Apriori principle helps reduce the number of candidate
itemsets to be explored during frequent itemset generation.
▶ To illustrate the idea behind Apriori principle, consider the Hasse diagram
shown previously for the itemset I = {a, b, c, d, e} and let {c, d, e} be a frequent
itemset.
Preliminaries 29

Apriori Principle for Frequent Itemset Generation:


▶ Then, any transaction that contains {c, d, e} must also contain all of its subsets: {c, d}, {c, e}, {d, e}, {c}, {d} and {e}.
▶ As a result, if {c, d, e} is frequent, then all the subsets of {c, d, e} (the shaded region in the Hasse diagram) must also be frequent. (This is known as the Apriori Principle.)
Preliminaries 30

Apriori Principle for Frequent Itemset Generation:


▶ Conversely, if an itemset such as {a, b} is infrequent, then all of its supersets must be infrequent too.
▶ As illustrated in the following figure, the entire subgraph containing the supersets of {a, b} can be pruned immediately once {a, b} is found to be infrequent.
Preliminaries 31

Apriori Principle for Frequent Itemset Generation:


▶ This strategy of trimming the exponential search space in the Hasse diagram based on the support measure is termed support-based pruning.
▶ Such a pruning strategy is made possible by a key property of the support measure, i.e., the support of an itemset never exceeds the support of its subsets.
▶ This property is also known as the anti-monotone property of the support measure.
▶ Any measure that possesses the anti-monotone property can be incorporated directly into association rule mining algorithms to effectively prune the exponential search space of the candidate itemsets.
Apriori Algorithm 32

▶ The Apriori algorithm was proposed by Rakesh Agrawal and Ramakrishnan


Srikant in 1994.
▶ The Apriori algorithm divides the problem of discovering interesting association rules into two phases.
1. Phase 1: Find all itemsets with a specified minimal support (frequent itemsets). The Apriori property avoids unnecessary generation and evaluation of candidate itemsets that cannot be frequent because one of their subsets is not frequent.
2. Phase 2: Use the frequent itemsets obtained in Phase 1 to generate interesting association rules.
Apriori Algorithm 33

Phase 1 of the Apriori Algorithm - some key terminologies.


▶ Large itemset: It doesn’t mean an itemset with many items. It means an itemset whose support is at least the minimum support.
▶ Set of Large k-itemsets (Lk): It is the collection of large itemsets of size k.
▶ Set of Candidate k-itemsets (Ck): It is the collection of potentially large k-itemsets. It contains all the k-itemsets that might be large.
Apriori Algorithm 34

Phase 1 of the Apriori Algorithm - Pseudocode


1. Generate the set of candidate 1-itemsets C1 .
2. Generate the set of large 1-itemsets L1 from C1 .
3. let k=2.
4. Generate candidate k-itemsets (Ck ) by performing a join operation on the large
(k-1)-itemset (i.e., Lk−1 ) with itself.
5. Prune the candidate k-itemsets generated in step 4. That is, eliminate candidate k-itemsets having a subset of size (k-1) that is not a member of Lk−1. For example, if {A, B, C} is a candidate 3-itemset but its subset {B, C} is not a member of L2, then {A, B, C} is pruned.
6. Generate large k-itemsets by counting the support of each candidate k-itemset by scanning the entire database of transactions and then removing those candidate k-itemsets with support below the minimum support threshold.
Apriori Algorithm 35

Phase 1 of the Apriori Algorithm - Pseudocode contd..


7. k = k+1
8. Repeat steps 4-7 until no more large k-itemsets can be generated.
9. Return ⋃k Lk as the frequent itemsets.
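
The following Python sketch is one possible implementation of Phase 1, following the pseudocode above: candidate generation by self-joining Lk−1 (step 4), pruning of candidates with an infrequent (k-1)-subset (step 5), and support counting over the database (step 6). The function name apriori_frequent_itemsets is illustrative, not from any standard library.

```python
from itertools import combinations

def apriori_frequent_itemsets(transactions, min_support_count):
    """Phase 1 of Apriori: return {frozenset: support count} for all large itemsets."""
    transactions = [frozenset(t) for t in transactions]

    def large(candidates):
        """Step 6: scan the database, count supports, keep candidates meeting minsup."""
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in counts:
                if c <= t:
                    counts[c] += 1
        return {c: n for c, n in counts.items() if n >= min_support_count}

    # Steps 1-2: candidate 1-itemsets C1 and large 1-itemsets L1.
    items = sorted({i for t in transactions for i in t})
    L = large([frozenset([i]) for i in items])
    frequent = dict(L)

    k = 2                                              # step 3
    while L:
        prev = sorted(tuple(sorted(s)) for s in L)     # items in lexicographic order
        # Step 4: join L_{k-1} with itself (first k-2 items must match).
        candidates = {frozenset(a) | frozenset(b)
                      for a, b in combinations(prev, 2)
                      if a[:k - 2] == b[:k - 2]}
        # Step 5: prune candidates having a (k-1)-subset that is not in L_{k-1}.
        candidates = {c for c in candidates
                      if all(frozenset(sub) in L
                             for sub in combinations(c, k - 1))}
        L = large(candidates)                          # step 6
        frequent.update(L)
        k += 1                                         # steps 7-8
    return frequent                                    # step 9: union of all L_k
```

Running this sketch on the nine-transaction database of the worked example below with min_support_count = 2 should reproduce the thirteen frequent itemsets listed later.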
Apriori Algorithm 36

Phase 1 of the Apriori Algorithm - Join Operation.


▶ The join operation (▷◁) combines the itemsets belonging to Lk−1 with themselves to generate the candidate k-itemsets (Ck). The condition for joining Lk−1 with Lk−1 is that the two itemsets must have their first (k-2) items in common.
▶ Thus, the join operation is a cross product with the condition that two itemsets in Lk−1 are combined to form a candidate k-itemset only if they have the first (k-2) items in common.
▶ For example, if the large 2-itemsets are {A, B} and {A, C}, then the candidate 3-itemset would be {A, B, C} after performing the join operation, as illustrated in the sketch below.
▶ By convention, the Apriori algorithm assumes that the items in a transaction are sorted in lexicographic order.
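
As a small, self-contained illustration of the join condition, the sketch below joins the large 2-itemsets {A, B} and {A, C}, which share their first k − 2 = 1 item, into the candidate 3-itemset {A, B, C}; the function name join_step is an assumption made for this example.

```python
from itertools import combinations

def join_step(L_prev, k):
    """Join L_{k-1} with itself: combine two (k-1)-itemsets sharing their first k-2 items."""
    ordered = sorted(tuple(sorted(s)) for s in L_prev)   # lexicographic item order
    return {frozenset(a) | frozenset(b)
            for a, b in combinations(ordered, 2)
            if a[:k - 2] == b[:k - 2]}

L2 = [{"A", "B"}, {"A", "C"}]
print(join_step(L2, k=3))   # -> {frozenset({'A', 'B', 'C'})}
```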
Apriori Algorithm 37

Phase 1 of the Apriori Algorithm - Example Problem.


▶ Consider the following transaction database (D). Let’s assume that the minimum support count = 2 and the minimum confidence = 70%. Then, using the Apriori algorithm, (i) find all frequent itemsets and (ii) list out all valid association rules.
TID List of items
T1 I1 , I2 , I5
T2 I2 , I4
T3 I2 , I3
T4 I1 , I2 , I4
T5 I1 , I3
T6 I2 , I3
T7 I1 , I3
T8 I1 , I2 , I3 , I5
T9 I1 , I2 , I3
Apriori Algorithm 38

Phase 1 of the Apriori Algorithm - step by step solution.


▶ Generate C1 : Each item in D is a member of the set of candidate 1-itemsets
(C1 ). So, we have to simply scan D in order to count the number of occurrences
of each item.
Itemset Support Count
I1 6
I2 7
I3 6
I4 2
I5 2
Apriori Algorithm 39

Phase 1 of the Apriori Algorithm - step by step solution


▶ Generate L1 : It consists of the candidate 1-itemsets satisfying the minimum
support threshold. Here, all the candidates in C1 satisfy minimum support.
Itemset Support Count
I1 6
I2 7
I3 6
I4 2
I5 2
Apriori Algorithm 40

Phase 1 of the Apriori Algorithm - step by step solution


▶ Generate C2 by performing the join operation on L1, i.e., C2 = L1 ▷◁ L1. Here, L1 ▷◁ L1 is equivalent to L1 × L1, since the definition of Lk ▷◁ Lk requires the two joining itemsets to share the first k-2 items in common (here k-2 = 0).
Itemset Support Count
{I1 , I2 } 4
{I1 , I3 } 4
{I1 , I4 } 1
{I1 , I5 } 2
{I2 , I3 } 4
{I2 , I4 } 2
{I2 , I5 } 2
{I3 , I4 } 0
{I3 , I5 } 1
{I4 , I5 } 0
Apriori Algorithm 41

Phase 1 of the Apriori Algorithm - step by step solution


▶ Prune C2: Note that no candidates are removed from C2 during the prune step, because every subset of each candidate is also frequent.
Apriori Algorithm 42

Phase 1 of the Apriori Algorithm - step by step solution


▶ Generate L2 from C2 : The set of large 2-itemset (L2 ) consists of those candidate
2-itemsets in C2 having minimum support.

Itemset Support Count


{I1 , I2 } 4
{I1 , I3 } 4
{I1 , I5 } 2
{I2 , I3 } 4
{I2 , I4 } 2
{I2 , I5 } 2
Apriori Algorithm 43

Phase 1 of the Apriori Algorithm - step by step solution


▶ Generate C3 by performing the join operation on L2: Here L2 ▷◁ L2 = {{I1, I2, I3}, {I1, I2, I5}, {I1, I3, I5}, {I2, I3, I4}, {I2, I3, I5}, {I2, I4, I5}}.
▶ Based on the Apriori property that all subsets of a large itemset must also be large, we can determine that the last four candidates cannot possibly be large (each has at least one 2-subset that is not in L2). We can therefore remove them from C3.

Itemset Support Count


{I1 , I2 , I3 } 2
{I1 , I2 , I5 } 2
Apriori Algorithm 44

Phase 1 of the Apriori Algorithm - step by step solution


▶ Generate L3 from C3 : The set of large 3-itemset (L3 ) consists of those candidate
3-itemsets in C3 having minimum support.

Itemset Support Count


{I1 , I2 , I3 } 2
{I1 , I2 , I5 } 2
Apriori Algorithm 45

Phase 1 of the Apriori Algorithm - step by step solution


▶ Generate C4 from L3: We can use L3 ▷◁ L3 to generate the set of candidate 4-itemsets C4. In this case, the join results in {{I1, I2, I3, I5}}, and this itemset is pruned because its subset {I2, I3, I5} is not large. Thus, C4 = ∅ and the algorithm terminates, having found all the frequent itemsets.
Apriori Algorithm 46

Phase 1 of the Apriori Algorithm - step by step solution


▶ Thus, Phase 1 of the Apriori algorithm generates the following frequent itemsets:

1. {I1 , I2 , I3 } (large (or frequent) 3-itemset from L3 )


2. {I1 , I2 , I5 } (large (or frequent) 3-itemset from L3 )
3. {I1 , I2 } (large (or frequent) 2-itemset from L2 )
4. {I1 , I3 } (large (or frequent) 2-itemset from L2 )
5. {I1 , I5 } (large (or frequent) 2-itemset from L2 )
6. {I2 , I3 } (large (or frequent) 2-itemset from L2 )
7. {I2 , I4 } (large (or frequent) 2-itemset from L2 )
8. {I2 , I5 } (large (or frequent) 2-itemset from L2 )
9. {I1 } (large (or frequent) 1-itemset from L1 )
10. {I2 } (large (or frequent) 1-itemset from L1 )
11. {I3 } (large (or frequent) 1-itemset from L1 )
12. {I4 } (large (or frequent) 1-itemset from L1 )
13. {I5 } (large (or frequent) 1-itemset from L1 )
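
As a sanity check on this list, the brute-force sketch below enumerates every possible itemset over the nine transactions of the example database and keeps those with support count ≥ 2; it should print exactly the thirteen frequent itemsets above.

```python
from itertools import combinations

# Transaction database D from the worked example (T1..T9).
D = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]
min_support_count = 2

items = sorted({i for t in D for i in t})
for k in range(1, len(items) + 1):
    for cand in combinations(items, k):
        count = sum(1 for t in D if set(cand) <= t)
        if count >= min_support_count:
            print(set(cand), "support count =", count)
```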
Apriori Algorithm 47

Phase 1 of the Apriori Algorithm - step by step solution


▶ Now, we have to generate interesting or valid or strong association rules from the above-listed frequent itemsets generated in Phase 1 of the Apriori algorithm.
▶ Phase 2 of the Apriori algorithm discovers all the interesting association rules
from the above-listed frequent itemsets.
Apriori Algorithm 48

Phase 2 of the Apriori Algorithm - Rule generation from frequent itemsets


▶ Valid association rules can be generated from the frequent itemsets obtained in
Phase 1 as follows:
1. For each frequent itemset l, generate all nonempty proper subsets of l.
2. For every nonempty proper subset s of l, output the rule “s → (l − s)” if support count(l) / support count(s) ≥ minconf threshold.
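
A minimal Python sketch of this rule-generation step is shown below. It takes the support counts of the frequent itemsets (here, those found in Phase 1 of the worked example), enumerates the nonempty proper subsets of each frequent itemset with at least two items, and keeps the rules whose confidence meets minconf; the function name generate_rules is illustrative. With minconf = 0.70 it should print the six valid rules derived in the following slides.

```python
from itertools import combinations

def generate_rules(support_counts, minconf):
    """Phase 2: emit (antecedent, consequent, confidence) for every valid rule."""
    rules = []
    for l, count_l in support_counts.items():
        if len(l) < 2:
            continue
        # All nonempty proper subsets s of l give a candidate rule s -> (l - s).
        for r in range(1, len(l)):
            for s in map(frozenset, combinations(l, r)):
                conf = count_l / support_counts[s]
                if conf >= minconf:
                    rules.append((set(s), set(l - s), conf))
    return rules

# Support counts of the frequent itemsets found in Phase 1 of the worked example.
counts = {
    frozenset({"I1"}): 6, frozenset({"I2"}): 7, frozenset({"I3"}): 6,
    frozenset({"I4"}): 2, frozenset({"I5"}): 2,
    frozenset({"I1", "I2"}): 4, frozenset({"I1", "I3"}): 4, frozenset({"I1", "I5"}): 2,
    frozenset({"I2", "I3"}): 4, frozenset({"I2", "I4"}): 2, frozenset({"I2", "I5"}): 2,
    frozenset({"I1", "I2", "I3"}): 2, frozenset({"I1", "I2", "I5"}): 2,
}

for X, Y, c in generate_rules(counts, minconf=0.70):
    print(f"{X} -> {Y}  (confidence = {c:.2f})")
```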
Apriori Algorithm 49

Phase 2 of the Apriori Algorithm - Rule generation from frequent itemsets


▶ For the previous problem, consider the frequent itemset {I1 , I2 , I5 } ∈ L3 .
▶ Its non-empty proper subsets are: {{I1 , I2 }, {I1 , I5 }, {I2 , I5 }, {I1 }, {I2 }, {I5 }}.
▶ Then, the resultant association rules will be:
1. {I1, I2} → {I5} (c = σ(l)/σ(s) = 2/4)
2. {I1, I5} → {I2} (c = σ(l)/σ(s) = 2/2)
3. {I2, I5} → {I1} (c = σ(l)/σ(s) = 2/2)
4. {I1} → {I2, I5} (c = σ(l)/σ(s) = 2/6)
5. {I2} → {I1, I5} (c = σ(l)/σ(s) = 2/7)
6. {I5} → {I1, I2} (c = σ(l)/σ(s) = 2/2)
▶ Since the given confidence threshold is 70%, only the second, third, and last rules will be
selected as valid association rules from the frequent itemset {I1 , I2 , I5 } ∈ L3 .
Apriori Algorithm 50

Phase 2 of the Apriori Algorithm - Rule generation from frequent itemsets


▶ Now, consider the frequent itemset {I1 , I2 , I3 } ∈ L3 .
▶ Its non-empty proper subsets are: {{I1 , I2 }, {I1 , I3 }, {I2 , I3 }, {I1 }, {I2 }, {I3 }}.
▶ Then, the resultant association rules will be:
1. {I1, I2} → {I3} (c = σ(l)/σ(s) = 2/4)
2. {I1, I3} → {I2} (c = σ(l)/σ(s) = 2/4)
3. {I2, I3} → {I1} (c = σ(l)/σ(s) = 2/4)
4. {I1} → {I2, I3} (c = σ(l)/σ(s) = 2/6)
5. {I2} → {I1, I3} (c = σ(l)/σ(s) = 2/7)
6. {I3} → {I1, I2} (c = σ(l)/σ(s) = 2/6)
▶ Since the given confidence threshold is 70%, none of the above-mentioned association rules
will be selected as valid association rules from the frequent itemset {I1 , I2 , I3 } ∈ L3 .
Apriori Algorithm 51

Phase 2 of the Apriori Algorithm - Rule generation from frequent itemsets


▶ Now, consider the frequent itemset {I1 , I2 } ∈ L2 .
▶ Its non-empty proper subsets are: {{I1 }, {I2 }}.
▶ Then, the resultant association rules will be:
1. {I1} → {I2} (c = σ(l)/σ(s) = 4/6)
2. {I2} → {I1} (c = σ(l)/σ(s) = 4/7)
▶ Since the given confidence threshold is 70%, none of the above-mentioned
association rules will be selected as valid association rules from the frequent
itemset {I1 , I2 } ∈ L2 .
Apriori Algorithm 52

Phase 2 of the Apriori Algorithm - Rule generation from frequent itemsets


▶ Now, consider the frequent itemset {I1 , I3 } ∈ L2 .
▶ Its non-empty proper subsets are: {{I1 }, {I3 }}.
▶ Then, the resultant association rules will be:
1. {I1} → {I3} (c = σ(l)/σ(s) = 4/6)
2. {I3} → {I1} (c = σ(l)/σ(s) = 4/6)
▶ Since the given confidence threshold is 70%, none of the above-mentioned
association rules will be selected as valid association rules from the frequent
itemset {I1 , I3 } ∈ L2 .
Apriori Algorithm 53

Phase 2 of the Apriori Algorithm - Rule generation from frequent itemsets


▶ Now, consider the frequent itemset {I1 , I5 } ∈ L2 .
▶ Its non-empty proper subsets are: {{I1 }, {I5 }}.
▶ Then, the resultant association rules will be:
1. {I1} → {I5} (c = σ(l)/σ(s) = 2/6)
2. {I5} → {I1} (c = σ(l)/σ(s) = 2/2)
▶ Since the given confidence threshold is 70%, only the second association rule will be selected as a valid association rule from the frequent itemset {I1, I5} ∈ L2.
Apriori Algorithm 54

Phase 2 of the Apriori Algorithm - Rule generation from frequent itemsets


▶ Now, consider the frequent itemset {I2 , I3 } ∈ L2 .
▶ Its non-empty proper subsets are: {{I2}, {I3}}.
▶ Then, the resultant association rules will be:
1. {I2} → {I3} (c = σ(l)/σ(s) = 4/7)
2. {I3} → {I2} (c = σ(l)/σ(s) = 4/6)
▶ Since the given confidence threshold is 70%, none of the above-listed association rules will be selected as valid association rules from the frequent itemset {I2, I3} ∈ L2.
Apriori Algorithm 55

Phase 2 of the Apriori Algorithm - Rule generation from frequent itemsets


▶ Now, consider the frequent itemset {I2 , I4 } ∈ L2 .
▶ Its non-empty proper subsets are: {{I2 }, {I4 }}.
▶ Then, the resultant association rules will be:
1. {I2} → {I4} (c = σ(l)/σ(s) = 2/7)
2. {I4} → {I2} (c = σ(l)/σ(s) = 2/2)
▶ Since the given confidence threshold is 70%, only the second association rule will be selected as a valid association rule from the frequent itemset {I2, I4} ∈ L2.
Apriori Algorithm 56

Phase 2 of the Apriori Algorithm - Rule generation from frequent itemsets


▶ Now, consider the frequent itemset {I2 , I5 } ∈ L2 .
▶ Its non-empty proper subsets are: {{I2 }, {I5 }}.
▶ Then, the resultant association rules will be:
1. {I2} → {I5} (c = σ(l)/σ(s) = 2/7)
2. {I5} → {I2} (c = σ(l)/σ(s) = 2/2)
▶ Since the given confidence threshold is 70%, only the second association rule will be selected as a valid association rule from the frequent itemset {I2, I5} ∈ L2.
Apriori Algorithm 57

Phase 2 of the Apriori Algorithm - Rule generation from frequent itemsets


▶ We don’t need to consider the itemsets belonging to L1, because a large 1-itemset has no nonempty proper subset, so no association rule can be generated from it as per the algorithm discussed previously.
▶ So, the valid association rules that can be generated from the given transaction
database D are:
1. {I1 , I5 } → {I2 }
2. {I2 , I5 } → {I1 }
3. {I5 } → {I1 , I2 }
4. {I5 } → {I1 }
5. {I4 } → {I2 }
6. {I5 } → {I2 }
Apriori Algorithm 58

Apriori Algorithm - Practice Problem.


▶ Consider the following transaction database (D). Let’s assume that min sup
= 60% and min conf = 80%. Then, using the Apriori algorithm (i) Find all
frequent itemsets and (ii) List out all valid association rules.

TID List of items


T100 {K, A, D, B}
T200 {D, A, C, E, B}
T300 {C, A, B, E}
T400 {B, A, D}
Apriori Algorithm 59

Apriori Algorithm - Practice Problem :- The Frequent Itemsets are:


L1:
Itemset support count
{A} 4
{B} 4
{D} 3

L2:
Itemset support count
{A, B} 4
{A, D} 3
{B, D} 3

L3:
Itemset support count
{A, B, D} 3
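
One way to verify this answer mechanically is with the mlxtend library's Apriori implementation; the sketch below assumes pandas and mlxtend are installed and is offered as an illustration, not as part of the original slides.

```python
# A sketch for checking the practice problem with mlxtend
# (assumes: pip install pandas mlxtend).
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori

transactions = [
    ["K", "A", "D", "B"],
    ["D", "A", "C", "E", "B"],
    ["C", "A", "B", "E"],
    ["B", "A", "D"],
]

# One-hot encode the transactions into a boolean DataFrame.
te = TransactionEncoder()
onehot = pd.DataFrame(te.fit(transactions).transform(transactions),
                      columns=te.columns_)

# Frequent itemsets with support >= 60% (support count >= 3 of 4 transactions).
frequent = apriori(onehot, min_support=0.6, use_colnames=True)
print(frequent)
```

Valid rules at the 80% confidence threshold can then be obtained with mlxtend's association_rules function, or with a manual confidence check like the Phase 2 sketch shown earlier.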
FP Growth Algorithm
