DMDW Chapter 4

The document discusses frequent pattern mining and association rule mining. It defines frequent patterns as patterns that appear frequently in a dataset, like items that frequently occur together. Finding frequent patterns is important for tasks like association analysis. It describes the Apriori algorithm for mining frequent itemsets and generating association rules. Apriori uses a candidate generation-and-test approach with pruning to iteratively find frequent itemsets of increasing size. The algorithm terminates when no further frequent itemsets are found.


FACULTY: COMPUTING & SWE

Introduction to Data Mining and Warehousing
Amin T. (Asst. Prof.), 2022/23

CHAPTER FOUR
BASIC CONCEPTS AND METHODS OF MINING FREQUENT PATTERNS, ASSOCIATIONS AND CORRELATIONS

Garbage In, Garbage Out (GIGO)
Basic Concepts
• Frequent patterns are patterns (such as itemsets or subsequences) that appear in a data set frequently.
  – For example, a set of items, such as milk and bread, that appears frequently together in a transaction dataset is a frequent itemset.
  – A subsequence, such as buying first a PC, then a digital camera, and then a memory card, is a (frequent) sequential pattern if it occurs frequently in a shopping-history database.
• Finding such frequent patterns plays an essential role in mining associations, correlations, and many other interesting relationships among data.
Frequent Pattern Analysis
• Motivation: finding inherent regularities in data
  – What products were often purchased together? (Beer and diapers?!)
  – What are the subsequent purchases after buying a PC?
  – What kinds of DNA are sensitive to this new drug?
  – Can we automatically classify web documents?
• Applications: basket data analysis, cross-marketing, catalog design, sales campaign analysis, Web log (click-stream) analysis, and DNA sequence analysis.
Why is Frequent Pattern Mining Important?
• Frequent pattern: an intrinsic (fundamental) and important property of datasets
• Foundation for many essential data mining tasks:
  – Association, correlation, and causality analysis
  – Sequential and structural (e.g., sub-graph) patterns
  – Pattern analysis in spatiotemporal, multimedia, time-series, and stream data
  – Classification: discriminative frequent pattern analysis
  – Cluster analysis: frequent pattern-based clustering
  – Data warehousing: iceberg cubes and cube gradients
  – Semantic data compression: fascicles
  – Broad applications
Association Rule Mining
• Frequent itemset mining leads to the discovery of associations and correlations among items in large transactional or relational data sets.
• Association rules are mined from the frequent patterns found in the data.
• Initially used for market basket analysis: analyzing customer buying habits by finding associations between the different items that customers place in their "shopping baskets."
Association Rule Mining
• The results can be used in:
  – making marketing or advertising strategies
  – shelf arrangement
  – designing different shop layouts
Association Rule Mining
• Association rule: an implication expression of the form X → Y, where X and Y are itemsets
  – Example: {Milk, Diaper} → {Beer}

    TID | Items
    ----+---------------------------
    1   | Bread, Milk
    2   | Bread, Diaper, Beer, Eggs
    3   | Milk, Diaper, Beer, Coke
    4   | Bread, Milk, Diaper, Beer
    5   | Bread, Milk, Diaper, Coke

• Rule evaluation metrics
  – Support (s): the probability that a transaction contains A ∪ B
        s = support(A → B) = P(A ∪ B)
  – Confidence (c): the conditional probability that a transaction containing A also contains B
        c = confidence(A → B) = P(B | A)
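To make these metrics concrete, here is a minimal Python sketch (an illustration added here, not part of the original slides) that computes support and confidence of {Milk, Diaper} → {Beer} over the five transactions above:

    transactions = [
        {"Bread", "Milk"},
        {"Bread", "Diaper", "Beer", "Eggs"},
        {"Milk", "Diaper", "Beer", "Coke"},
        {"Bread", "Milk", "Diaper", "Beer"},
        {"Bread", "Milk", "Diaper", "Coke"},
    ]

    def support(itemset, db):
        # Fraction of transactions that contain every item in `itemset`.
        return sum(1 for t in db if itemset <= t) / len(db)

    X, Y = {"Milk", "Diaper"}, {"Beer"}
    s = support(X | Y, transactions)      # P(A ∪ B) = 2/5 = 0.40
    c = s / support(X, transactions)      # P(B | A) = 2/3 ≈ 0.67
    print(f"support = {s:.2f}, confidence = {c:.2f}")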
Basic Concepts: Association Rules

    Transaction-id | Items bought
    ---------------+-------------
    10             | A, B, C
    20             | A, C
    30             | A, D
    40             | B, E, F

• Let min_support = 50%, min_conf = 50%:
  – A → C (support 50%, confidence 66.7%)
  – C → A (support 50%, confidence 100%)

(Figure: Venn diagram of customers who buy beer, customers who buy diapers, and customers who buy both.)
Basic Concepts: Frequent Patterns and Association Rules

• Association rule mining is a two-step process:
  – Find all frequent itemsets.
  – Generate strong association rules from the frequent itemsets: for every frequent itemset L, find all non-empty proper subsets of L, and for every such subset A, output a rule of the form A → (L − A) if the ratio of support(L) to support(A) is at least the minimum confidence.
• The overall performance of mining association rules is determined by the first step.
• Given a set of transactions T, the goal of association rule mining is to find all rules having
  – support ≥ minsup threshold
  – confidence ≥ minconf threshold
Mining Association Rules—an Example

    Min. support: 50%, Min. confidence: 50%

    Transaction-id | Items bought
    ---------------+-------------
    10             | A, B, C
    20             | A, C
    30             | A, D
    40             | B, E, F

    Frequent pattern | Support
    -----------------+--------
    {A}              | 75%
    {B}              | 50%
    {C}              | 50%
    {A, C}           | 50%

For the rule A → C:
    support = support({A} ∪ {C}) = 50%
    confidence = support({A} ∪ {C}) / support({A}) = 66.7%
The Apriori Algorithm
• The name, Apriori, is based on the fact that the algorithm uses prior knowledge of frequent itemset properties.
• Apriori employs an iterative approach known as a level-wise search, where k-itemsets are used to explore (k+1)-itemsets.
  – The first pass determines the frequent 1-itemsets, denoted L1.
  – Each subsequent pass k consists of two phases:
  – First, the frequent itemsets Lk-1 are used to generate the candidate itemsets Ck.
  – Next, the database is scanned and the support of each candidate in Ck is counted.
  – The frequent itemsets Lk are then determined.
Apriori Property
• Apriori property: every subset of a frequent (large) itemset must also be frequent.
  – If {beer, diaper, nuts} is frequent, so is {beer, diaper}.
  – Every transaction containing {beer, diaper, nuts} also contains {beer, diaper}.
• Anti-monotonicity: if a set cannot pass a test, all of its supersets will fail the same test as well.
Apriori: A Candidate Generation-and-Test Approach
• Apriori pruning principle: if an itemset is infrequent, its supersets should not be generated or tested.
• Method: join and prune steps
  – Generate candidate (k+1)-itemsets Ck+1 from the frequent k-itemsets Lk.
  – If any k-subset of a candidate (k+1)-itemset is not in Lk, the candidate cannot be frequent either, and so it can be removed from Ck+1.
  – Test the remaining candidates against the DB to obtain Lk+1.
The Apriori Algorithm—Example
• Let the minimum support be 20%.
The Apriori Algorithm—Example

The algorithm uses L3 × L3 to generate a candidate set of 4-itemsets, C4. Although the join results in {I1, I2, I3, I5}, this itemset is pruned because its subset {I2, I3, I5} is not frequent. Thus C4 = ∅, and the algorithm terminates, having found all of the frequent itemsets.
The Apriori Algorithm
• Pseudo-code:
    Ck: candidate itemsets of size k
    Lk: frequent itemsets of size k

    L1 = {frequent 1-itemsets};
    for (k = 1; Lk ≠ ∅; k++) {
        Ck+1 = candidates generated from Lk;
        for each transaction t in the database:
            increment the count of every candidate in Ck+1 that is contained in t;
        Lk+1 = candidates in Ck+1 with count ≥ min_support;
    }
    return ∪k Lk;
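The following is a compact Python rendering of this pseudo-code, a sketch that assumes transactions are given as collections of items and that min_support is an absolute count (the function and variable names are illustrative, not from the slides):

    from itertools import combinations

    def apriori(db, min_support):
        db = [frozenset(t) for t in db]
        # L1: count single items and keep the frequent ones.
        counts = {}
        for t in db:
            for item in t:
                c = frozenset([item])
                counts[c] = counts.get(c, 0) + 1
        Lk = {s for s, n in counts.items() if n >= min_support}
        frequent, k = set(Lk), 1
        while Lk:
            # Join: unions of two frequent k-itemsets that yield a (k+1)-itemset.
            Ck = {a | b for a in Lk for b in Lk if len(a | b) == k + 1}
            # Prune: drop candidates with an infrequent k-subset (Apriori property).
            Ck = {c for c in Ck if all(frozenset(s) in Lk for s in combinations(c, k))}
            # Scan the database once to count candidate supports.
            Lk = {c for c in Ck if sum(1 for t in db if c <= t) >= min_support}
            frequent |= Lk
            k += 1
        return frequent

On the earlier four-transaction example with min_support = 2 (i.e., 50%), this returns {A}, {B}, {C}, and {A, C}.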
Important Details of Apriori
• How to generate candidates?
  – Step 1: self-joining Lk
  – Step 2: pruning
• How to count the supports of candidates?
• Example of candidate generation:
  – L3 = {abc, abd, acd, ace, bcd}
  – Self-joining L3*L3:
      abcd from abc and abd
      acde from acd and ace
  – Pruning:
      acde is removed because ade is not in L3
  – C4 = {abcd}
How to Generate Candidates?
• Suppose the items in Lk-1 are listed in lexicographic order.

• Step 1: self-joining Lk-1

    insert into Ck
    select p.item1, p.item2, …, p.itemk-1, q.itemk-1
    from Lk-1 p, Lk-1 q
    where p.item1 = q.item1, …, p.itemk-2 = q.itemk-2, p.itemk-1 < q.itemk-1

• Step 2: pruning

    forall itemsets c in Ck do
        forall (k-1)-subsets s of c do
            if (s is not in Lk-1) then delete c from Ck
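In Python, the same join-and-prune step can be sketched as follows (an added illustration; itemsets are kept as sorted tuples so the ordered self-join above applies directly):

    from itertools import combinations

    def gen_candidates(Lprev, k):
        # Self-join L(k-1) on its first k-2 items, then prune by the Apriori property.
        Lset = set(Lprev)
        Ck = set()
        for p in Lprev:
            for q in Lprev:
                if p[:-1] == q[:-1] and p[-1] < q[-1]:                  # join condition
                    c = p + (q[-1],)
                    if all(s in Lset for s in combinations(c, k - 1)):  # prune step
                        Ck.add(c)
        return Ck

    # Reproduces the earlier example: L3 = {abc, abd, acd, ace, bcd} gives C4 = {abcd};
    # acde is generated by the join but pruned because ade is not in L3.
    L3 = [("a","b","c"), ("a","b","d"), ("a","c","d"), ("a","c","e"), ("b","c","d")]
    print(gen_candidates(L3, 4))                        # {('a', 'b', 'c', 'd')}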
Generating Association Rules from Frequent Itemsets
• Once the frequent itemsets from the transactions in a database D have been found, it is straightforward to generate strong association rules from them (where strong association rules satisfy both minimum support and minimum confidence).
• The association rules that can be generated from the frequent itemset L = {I1, I2, I5}:
  – The nonempty proper subsets of L are {I1, I2}, {I1, I5}, {I2, I5}, {I1}, {I2}, and {I5}.
  – The resulting candidate rules, each with confidence support(L)/support(A), are:
      {I1, I2} → I5
      {I1, I5} → I2
      {I2, I5} → I1
      I1 → {I2, I5}
      I2 → {I1, I5}
      I5 → {I1, I2}
• If the minimum confidence threshold is, say, 70%, then only the second, third, and last rules above are output, because these are the only ones generated that are strong.
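As a sketch of this rule-generation step in Python (assuming `support` is a dictionary mapping each frequent itemset, as a frozenset, to its support count, as produced during the mining phase; every subset of a frequent itemset is itself frequent, so the lookups below always succeed):

    from itertools import combinations

    def gen_rules(frequent_itemsets, support, min_conf):
        # For each frequent itemset L, emit A -> (L - A) for every non-empty
        # proper subset A whose confidence support(L)/support(A) clears min_conf.
        rules = []
        for L in frequent_itemsets:
            if len(L) < 2:
                continue
            for r in range(1, len(L)):
                for ante in combinations(L, r):
                    A = frozenset(ante)
                    conf = support[L] / support[A]
                    if conf >= min_conf:
                        rules.append((A, L - A, conf))
        return rules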
Mining Frequent Itemsets without Candidate Generation

• FP-growth: mining frequent patterns with an FP-tree by pattern-fragment growth.
  – Input: an FP-tree constructed from DB using a minimum support threshold.
  – Output: the complete set of frequent patterns.
  – Method: call FP-growth(FP-tree, null).
Algorithm of FP-growth
(The FP-growth pseudo-code was shown as a figure on the original slide.)
Construct FP-tree from a Transaction Database
• Let the minimum support be 20%.
1. Scan DB once and find the frequent 1-itemsets (single-item patterns).
2. Sort the frequent items in descending order of frequency (the f-list).
3. Scan DB again and construct the FP-tree.

    Frequent 1-itemset | Support count
    -------------------+--------------
    I1                 | 6
    I2                 | 7
    I3                 | 6
    I4                 | 2
    I5                 | 2
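A hedged Python sketch of these three steps (the node and header-table layout below is one reasonable choice, not the slides' own notation):

    from collections import Counter

    class FPNode:
        def __init__(self, item, parent=None):
            self.item, self.count, self.parent = item, 0, parent
            self.children = {}                      # item -> FPNode

    def build_fptree(db, min_count):
        # Pass 1: count items; keep frequent ones in descending-frequency order (f-list).
        freq = Counter(item for t in db for item in t)
        flist = [i for i, n in freq.most_common() if n >= min_count]
        rank = {item: r for r, item in enumerate(flist)}
        root, header = FPNode(None), {item: [] for item in flist}
        # Pass 2: insert each transaction, filtered and sorted in f-list order,
        # sharing prefixes with previously inserted transactions.
        for t in db:
            node = root
            for item in sorted((i for i in t if i in rank), key=rank.get):
                if item not in node.children:
                    node.children[item] = FPNode(item, parent=node)
                    header[item].append(node.children[item])
                node = node.children[item]
                node.count += 1
        return root, header

FP-growth then mines the tree recursively, building a conditional pattern base and a conditional FP-tree for each item in the header table.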
Additional References
• https://www.youtube.com/watch?v=yCbankIouUU
• http://home.etf.rs/~vm/os/dmsw/FP-Growth%20algorithm.ppt
Thank You
