KDDM-Lecture 3

This document discusses frequent pattern analysis and association rule mining. It introduces key concepts such as frequent itemsets, support count, and association rules. It also describes several algorithms for mining frequent patterns, including Apriori, which is a seminal candidate generation-and-test approach. Apriori works in multiple passes over the transaction database, generating candidate itemsets in each pass and pruning those that are not frequent. The document discusses ways to improve the efficiency of Apriori, such as reducing the number of database scans.

Uploaded by

Kamran Ahmed

Data Mining: Concepts and Techniques (3rd ed.)
Chapter 6

Jiawei Han, Micheline Kamber, and Jian Pei
University of Illinois at Urbana-Champaign & Simon Fraser University
©2011 Han, Kamber & Pei. All rights reserved.

What Is Frequent Pattern Analysis?
- Frequent pattern: a pattern (a set of items, subsequences, substructures, etc.) that occurs frequently in a data set
- First proposed by Agrawal, Imielinski, and Swami [AIS93] in the context of frequent itemsets and association rule mining
- Motivation: finding inherent regularities in data
  - What products were often purchased together? Beer and diapers?!
  - What are the subsequent purchases after buying a PC?
  - What kinds of DNA are sensitive to this new drug?
  - Can we automatically classify web documents?
- Applications
  - Basket data analysis, cross-marketing, catalog design, sale campaign analysis, Web log (click stream) analysis, and DNA sequence analysis

Basic Concepts: Association Rules
- Association rule mining is a two-step process
  1. Find all frequent itemsets
     - Each of these itemsets occurs at least as frequently as a predetermined minimum support count, min_sup
  2. Generate strong association rules from the frequent itemsets
     - These rules must satisfy both min_sup and min_conf

Scalable Frequent Itemset Mining Methods
- Apriori: A Candidate Generation-and-Test Approach
- Improving the Efficiency of Apriori
- FPGrowth: A Frequent Pattern-Growth Approach
- ECLAT: Frequent Pattern Mining with Vertical Data Format

Basic Concepts: Frequent Patterns

Tid | Items bought
10  | Beer, Nuts, Diaper
20  | Beer, Coffee, Diaper
30  | Beer, Diaper, Eggs
40  | Nuts, Eggs, Milk
50  | Nuts, Coffee, Diaper, Eggs, Milk

- itemset: a set of one or more items
- k-itemset: X = {x1, …, xk}
- (absolute) support, or support count, of X: the frequency (number of occurrences) of itemset X
- (relative) support, s: the fraction of transactions that contain X (i.e., the probability that a transaction contains X)
- An itemset X is frequent if X’s support is no less than a minsup threshold

[Figure: Venn diagram of customers who buy beer, customers who buy diapers, and customers who buy both]

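The definitions above can be checked directly on the five-transaction table. The following is a minimal sketch; the function names `support_count`, `relative_support`, and `is_frequent` are illustrative choices, not part of the slides.

```python
# The five transactions from the table above, keyed by Tid.
transactions = {
    10: {"Beer", "Nuts", "Diaper"},
    20: {"Beer", "Coffee", "Diaper"},
    30: {"Beer", "Diaper", "Eggs"},
    40: {"Nuts", "Eggs", "Milk"},
    50: {"Nuts", "Coffee", "Diaper", "Eggs", "Milk"},
}


def support_count(itemset, db):
    """Absolute support: number of transactions containing every item."""
    items = set(itemset)
    return sum(1 for t in db.values() if items <= t)


def relative_support(itemset, db):
    """Relative support: fraction of transactions containing the itemset."""
    return support_count(itemset, db) / len(db)


def is_frequent(itemset, db, minsup):
    """An itemset is frequent if its relative support is at least minsup."""
    return relative_support(itemset, db) >= minsup


print(support_count({"Beer", "Diaper"}, transactions))     # 3 (Tids 10, 20, 30)
print(relative_support({"Beer", "Diaper"}, transactions))  # 0.6
```

With a minsup of 0.5, {Beer, Diaper} counts as frequent, while {Nuts, Milk} (support count 2) does not.
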
The Downward Closure Property and Scalable Mining Methods
- The downward closure property of frequent patterns
  - Any subset of a frequent itemset must be frequent
  - If {beer, diaper, nuts} is frequent, so is {beer, diaper}
  - i.e., every transaction having {beer, diaper, nuts} also contains {beer, diaper}
- Scalable mining methods: three major approaches
  - Apriori (Agrawal & Srikant @VLDB’94)
  - Frequent pattern growth (FPgrowth: Han, Pei & Yin @SIGMOD’00)
  - Vertical data format approach (Charm: Zaki & Hsiao @SDM’02)

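The reasoning behind downward closure can be written as a one-line derivation. The notation T(X), for the set of transactions that contain X, is an assumption introduced here and does not appear on the slides.

```latex
\[
X \subseteq Y
\;\Longrightarrow\; T(Y) \subseteq T(X)
\;\Longrightarrow\; \operatorname{sup}(X) = |T(X)| \;\ge\; |T(Y)| = \operatorname{sup}(Y)
\]
```

So if Y meets the minsup threshold, every subset X of Y meets it as well; equivalently, if X is infrequent, no superset of X can be frequent.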
Apriori: A Candidate Generation & Test Approach
- Apriori pruning principle: if there is any itemset which is infrequent, its superset should not be generated/tested! (Agrawal & Srikant @VLDB’94, Mannila, et al. @KDD’94)
- Method:
  - Initially, scan DB once to get frequent 1-itemsets
  - Generate length (k+1) candidate itemsets from length k frequent itemsets
  - Test the candidates against DB
  - Terminate when no frequent or candidate set can be generated

The Apriori Algorithm: An Example

Sup_min = 2

Database TDB
Tid | Items
10  | A, C, D
20  | B, C, E
30  | A, B, C, E
40  | B, E

1st scan, C1
Itemset | sup
{A}     | 2
{B}     | 3
{C}     | 3
{D}     | 1
{E}     | 3

L1
Itemset | sup
{A}     | 2
{B}     | 3
{C}     | 3
{E}     | 3

C2 (generated from L1): {A, B}, {A, C}, {A, E}, {B, C}, {B, E}, {C, E}

2nd scan, C2 with counts
Itemset | sup
{A, B}  | 1
{A, C}  | 2
{A, E}  | 1
{B, C}  | 2
{B, E}  | 3
{C, E}  | 2

L2
Itemset | sup
{A, C}  | 2
{B, C}  | 2
{B, E}  | 3
{C, E}  | 2

C3 (generated from L2): {B, C, E}

3rd scan, L3
Itemset   | sup
{B, C, E} | 2

The Apriori Algorithm (Pseudo-Code)
Ck: candidate itemsets of size k
Lk: frequent itemsets of size k

L1 = {frequent items};
for (k = 1; Lk != ∅; k++) do begin
    Ck+1 = candidates generated from Lk;
    for each transaction t in database do
        increment the count of all candidates in Ck+1 that are contained in t
    Lk+1 = candidates in Ck+1 with min_support
end
return ∪k Lk;

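The pseudo-code can be turned into a short runnable sketch. This is one possible Python rendering, not the authors' implementation: the function name `apriori` and the dictionary-based counting are assumptions, and candidates are generated by taking unions of frequent k-itemsets and applying the Apriori pruning check.

```python
from itertools import combinations


def apriori(transactions, min_sup):
    """Return {frozenset: support count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    # First scan: count single items to obtain L1.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    level = {s: c for s, c in counts.items() if c >= min_sup}
    frequent = dict(level)
    k = 1
    while level:
        # Generate C(k+1): unions of two frequent k-itemsets that have size
        # k+1 and whose k-subsets are all frequent (Apriori pruning).
        candidates = set()
        for a, b in combinations(level, 2):
            union = a | b
            if len(union) == k + 1 and all(
                frozenset(sub) in level for sub in combinations(union, k)
            ):
                candidates.add(union)
        # Scan the database: count candidates contained in each transaction.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        level = {s: c for s, c in counts.items() if c >= min_sup}
        frequent.update(level)
        k += 1
    return frequent


# The four-transaction example database TDB with Sup_min = 2.
tdb = [{"A", "C", "D"}, {"B", "C", "E"}, {"A", "B", "C", "E"}, {"B", "E"}]
result = apriori(tdb, 2)
```

On the example database this yields the four frequent 1-itemsets, the four frequent 2-itemsets, and {B, C, E} with support 2.
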
Implementation of Apriori
- How to generate candidates?
  - Step 1: self-joining Lk
  - Step 2: pruning
- Example of candidate generation
  - L3 = {abc, abd, acd, ace, bcd}
  - Self-joining: L3 * L3
    - abcd from abc and abd
    - acde from acd and ace
  - Pruning:
    - acde is removed because ade is not in L3
  - C4 = {abcd}

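The two steps can be sketched as follows, assuming itemsets are kept as sorted tuples so that the self-join only needs to compare the first k-1 items. The function name `apriori_gen` is an illustrative choice.

```python
from itertools import combinations


def apriori_gen(frequent_k):
    """Generate C(k+1) from L(k); itemsets are handled as sorted tuples."""
    frequent_k = sorted(tuple(sorted(s)) for s in frequent_k)
    known = set(frequent_k)
    k = len(frequent_k[0]) if frequent_k else 0
    candidates = []
    # Step 1: self-join pairs that agree on their first k-1 items.
    for i, p in enumerate(frequent_k):
        for q in frequent_k[i + 1:]:
            if p[:-1] != q[:-1]:
                break  # sorted order: no later q shares this prefix either
            joined = p + (q[-1],)
            # Step 2: prune if any k-subset of the join is not frequent.
            if all(sub in known for sub in combinations(joined, k)):
                candidates.append(joined)
    return candidates


l3 = ["abc", "abd", "acd", "ace", "bcd"]
c4 = apriori_gen(l3)  # acde is joined but pruned because ade is not in L3
```
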
Basic Concepts: Frequent Patterns
- The relationship between frequent itemsets is represented as an association rule
- An association rule between two itemsets X and Y with support s and confidence c is represented as follows:
  X ⇒ Y [support = s, confidence = c]
- Let I be an itemset with X ⊂ I, Y ⊂ I, and X ∩ Y = ∅

Basic Concepts: Association Rules
- computer ⇒ antivirus software [support = 2%, confidence = 60%]
  - 2% of all the transactions under analysis show that computer and antivirus software are purchased together
  - 60% of the customers who purchased a computer also bought the software

Tid | Items bought
10  | Beer, Nuts, Diaper
20  | Beer, Coffee, Diaper
30  | Beer, Diaper, Eggs
40  | Nuts, Eggs, Milk
50  | Nuts, Coffee, Diaper, Eggs, Milk

[Figure: Venn diagram of customers who buy beer, customers who buy diapers, and customers who buy both]

- Association rules (support, confidence):
  - Beer ⇒ Diaper (0.6, 1)
  - Diaper ⇒ Beer (0.6, 0.75)

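The two rule values can be reproduced from the five-transaction table. The sketch below assumes the usual definitions: support is the fraction of transactions containing both sides of the rule, and confidence is that count divided by the count of the left-hand side. The function name `rule_metrics` is hypothetical.

```python
transactions = [
    {"Beer", "Nuts", "Diaper"},
    {"Beer", "Coffee", "Diaper"},
    {"Beer", "Diaper", "Eggs"},
    {"Nuts", "Eggs", "Milk"},
    {"Nuts", "Coffee", "Diaper", "Eggs", "Milk"},
]


def rule_metrics(antecedent, consequent, db):
    """Return (support, confidence) of the rule antecedent => consequent."""
    x, y = set(antecedent), set(consequent)
    n_x = sum(1 for t in db if x <= t)          # transactions containing X
    n_xy = sum(1 for t in db if (x | y) <= t)   # transactions containing X and Y
    support = n_xy / len(db)
    confidence = n_xy / n_x if n_x else 0.0
    return support, confidence


print(rule_metrics({"Beer"}, {"Diaper"}, transactions))  # (0.6, 1.0)
print(rule_metrics({"Diaper"}, {"Beer"}, transactions))  # (0.6, 0.75)
```

Both rules share the same support because they cover the same three transactions; the confidences differ because Diaper appears in four transactions and Beer in only three.
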
Scalable Frequent Itemset Mining Methods
- Apriori: A Candidate Generation-and-Test Approach
- Improving the Efficiency of Apriori
- FPGrowth: A Frequent Pattern-Growth Approach
- ECLAT: Frequent Pattern Mining with Vertical Data Format
- Mining Closed Frequent Patterns and Max-Patterns

Further Improvement of the Apriori Method
- Major computational challenges
  - Multiple scans of the transaction database
  - Huge number of candidates
  - Tedious workload of support counting for candidates
- Improving Apriori: general ideas
  - Reduce passes of transaction database scans
  - Shrink the number of candidates
  - Facilitate support counting of candidates

Partition: Scan Database Only Twice
- Any itemset that is potentially frequent in DB must be frequent in at least one of the partitions of DB
  - Scan 1: partition the database and find local frequent patterns
  - Scan 2: consolidate global frequent patterns
- A. Savasere, E. Omiecinski and S. Navathe, VLDB’95
- With DB1 + DB2 + … + DBk = DB and relative support threshold σ: if sup1(i) < σ|DB1|, sup2(i) < σ|DB2|, …, and supk(i) < σ|DBk|, then sup(i) < σ|DB|

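A minimal sketch of the two-scan idea follows. It is not the algorithm from the cited paper: the local miner is a brute-force enumeration chosen only to keep the example short, and the names `local_frequent` and `partition_mine` are hypothetical. The threshold sigma is converted to an exact fraction so that comparisons such as 7 >= 0.7 * 10 are not affected by floating-point rounding.

```python
from fractions import Fraction
from itertools import combinations
from math import ceil


def local_frequent(partition, threshold):
    """Brute-force miner: itemsets whose count in the partition >= threshold."""
    counts = {}
    for t in partition:
        items = sorted(t)
        for size in range(1, len(items) + 1):
            for sub in combinations(items, size):
                counts[sub] = counts.get(sub, 0) + 1
    return {s for s, c in counts.items() if c >= threshold}


def partition_mine(db, sigma, n_parts):
    """Two-scan Partition scheme; sigma is the relative minimum support."""
    sigma = Fraction(str(sigma))  # exact arithmetic for the thresholds
    db = [frozenset(t) for t in db]
    size = ceil(len(db) / n_parts)
    parts = [db[i:i + size] for i in range(0, len(db), size)]
    # Scan 1: the union of locally frequent itemsets forms the global candidates.
    candidates = set()
    for part in parts:
        candidates |= local_frequent(part, sigma * len(part))
    # Scan 2: count every candidate over the whole database.
    result = {}
    for cand in candidates:
        count = sum(1 for t in db if set(cand) <= t)
        if count >= sigma * len(db):
            result[cand] = count
    return result


tdb = [{"A", "C", "D"}, {"B", "C", "E"}, {"A", "B", "C", "E"}, {"B", "E"}]
frequent = partition_mine(tdb, 0.5, n_parts=2)
```

Because every globally frequent itemset is locally frequent in at least one partition, the second scan never misses a frequent itemset; it only discards false candidates.
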
DHP: Reduce the Number of Candidates
- A k-itemset whose corresponding hashing bucket count is below the threshold cannot be frequent
  - Candidates: a, b, c, d, e
  - Hash entries: {ab, ad, ae}, {bd, be, de}, …
  - Frequent 1-itemset: a, b, d, e
  - ab is not a candidate 2-itemset if the sum of count of {ab, ad, ae} is below the support threshold

Hash Table
count | itemsets
35    | {ab, ad, ae}
88    | {bd, be, de}
…     | …
102   | {yz, qs, wt}

- J. Park, M. Chen, and P. Yu. An effective hash-based algorithm for mining association rules. SIGMOD’95

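The bucket-count filter can be sketched as below for the first pass, where 2-itemsets are hashed while single items are counted. The hash function (item positions combined as `10 * i + j`, modulo the number of buckets) and the name `dhp_candidate_pairs` are assumptions made for illustration; the slide does not specify a hash function.

```python
from itertools import combinations


def dhp_candidate_pairs(db, min_sup, n_buckets=7):
    """Pass 1 of a DHP-style scheme: (frequent items, candidate 2-itemsets)."""
    db = [sorted(set(t)) for t in db]
    # Fix an item order so the (hypothetical) hash function is deterministic.
    index = {item: i for i, item in enumerate(sorted({x for t in db for x in t}))}

    def bucket(pair):
        a, b = pair
        return (index[a] * 10 + index[b]) % n_buckets

    item_count = {}
    bucket_count = [0] * n_buckets
    for t in db:
        for item in t:
            item_count[item] = item_count.get(item, 0) + 1
        # While counting items, also hash every 2-itemset of t into a bucket.
        for pair in combinations(t, 2):
            bucket_count[bucket(pair)] += 1

    frequent_items = sorted(i for i, c in item_count.items() if c >= min_sup)
    # Keep a pair only if both items are frequent and its bucket is heavy enough.
    candidates = [
        pair for pair in combinations(frequent_items, 2)
        if bucket_count[bucket(pair)] >= min_sup
    ]
    return frequent_items, candidates


tdb = [{"A", "C", "D"}, {"B", "C", "E"}, {"A", "B", "C", "E"}, {"B", "E"}]
items, pairs = dhp_candidate_pairs(tdb, min_sup=2)
```

A bucket count is an upper bound on the support of every itemset hashed into it, so the filter can remove candidates but never a truly frequent pair. The surviving pairs still have to be counted in the next scan.
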
Sampling for Frequent Patterns
- Select a sample of the original database and mine frequent patterns within the sample using Apriori
- Scan the database once to verify the frequent itemsets found in the sample; only the borders of the closure of frequent patterns are checked
  - Example: check abcd instead of ab, ac, …, etc.
- Scan the database again to find missed frequent patterns
- H. Toivonen. Sampling large databases for association rules. In VLDB’96

Pattern-Growth Approach: Mining Frequent Patterns Without Candidate Generation
- Bottlenecks of the Apriori approach
  - Breadth-first (i.e., level-wise) search
  - Candidate generation and test
    - Often generates a huge number of candidates
- The FPGrowth Approach (J. Han, J. Pei, and Y. Yin, SIGMOD’00)
  - Depth-first search
  - Avoid explicit candidate generation
- Major philosophy: grow long patterns from short ones using local frequent items only
  - “abc” is a frequent pattern
  - Get all transactions having “abc”, i.e., project DB on abc: DB|abc
  - “d” is a local frequent item in DB|abc ⇒ abcd is a frequent pattern

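The "grow long patterns from short ones" idea can be illustrated with plain projected databases. The sketch below is a simplification: it works on lists rather than on the FP-tree structure that FPGrowth uses, and the function name `pattern_growth` is hypothetical. It does show the depth-first recursion and the use of locally frequent items only.

```python
def pattern_growth(db, min_sup, prefix=()):
    """Recursively grow patterns using only locally frequent items.

    db is a list of item lists, each sorted in one fixed global order.
    Returns {pattern tuple: support count}.
    """
    # Count items in the current (projected) database.
    counts = {}
    for t in db:
        for item in t:
            counts[item] = counts.get(item, 0) + 1
    patterns = {}
    for item in sorted(counts):
        if counts[item] < min_sup:
            continue  # not locally frequent: cannot extend the prefix
        pattern = prefix + (item,)
        patterns[pattern] = counts[item]
        # Project on the new pattern: keep transactions containing the item,
        # and only the items after it, so each pattern is generated once.
        projected = [t[t.index(item) + 1:] for t in db if item in t]
        patterns.update(pattern_growth(projected, min_sup, pattern))
    return patterns


tdb = [sorted(t) for t in (
    {"A", "C", "D"}, {"B", "C", "E"}, {"A", "B", "C", "E"}, {"B", "E"},
)]
found = pattern_growth(tdb, min_sup=2)
```

No candidate set is generated or tested against the full database; each recursive call only looks at the transactions that already contain the current prefix.
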
Construct FP-tree from a Transaction Database

min_support = 3

TID | Items bought             | (Ordered) frequent items
100 | {f, a, c, d, g, i, m, p} | {f, c, a, m, p}
200 | {a, b, c, f, l, m, o}    | {f, c, a, b, m}
300 | {b, f, h, j, o, w}       | {f, b}
400 | {b, c, k, s, p}          | {c, b, p}
500 | {a, f, c, e, l, p, m, n} | {f, c, a, m, p}

1. Scan DB once, find frequent 1-itemsets (single item patterns)
2. Sort frequent items in frequency descending order, f-list
3. Scan DB again, construct FP-tree

Header Table
Item | frequency
f    | 4
c    | 4
a    | 3
b    | 3
m    | 3
p    | 3

F-list = f-c-a-b-m-p

FP-tree (each header table entry links to the nodes of its item):
{}
  f:4
    c:3
      a:3
        m:2
          p:2
        b:1
          m:1
    b:1
  c:1
    b:1
      p:1

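The three construction steps can be sketched in Python as follows. The class `FPNode` and the function `build_fp_tree` are hypothetical names. When no f-list is passed, ties between equally frequent items are broken alphabetically, which is an assumption; the slide's f-list (f-c-a-b-m-p) orders f before c, so it is passed in explicitly to reproduce the tree shown above.

```python
class FPNode:
    """One FP-tree node: an item, its count, and links to parent/children."""

    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions, min_support, f_list=None):
    """Return (root, header, f_list); header maps item -> list of its nodes."""
    # Scan 1: count items and keep the frequent ones.
    counts = {}
    for t in transactions:
        for item in set(t):
            counts[item] = counts.get(item, 0) + 1
    frequent = {i: c for i, c in counts.items() if c >= min_support}
    if f_list is None:
        # Frequency-descending order; ties broken alphabetically (assumption).
        f_list = sorted(frequent, key=lambda i: (-frequent[i], i))
    rank = {item: r for r, item in enumerate(f_list)}
    # Scan 2: insert each transaction's ordered frequent items as a path.
    root = FPNode(None)
    header = {item: [] for item in f_list}
    for t in transactions:
        node = root
        for item in sorted((i for i in set(t) if i in rank), key=rank.get):
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += 1
            node = child
    return root, header, f_list


db = [list("facdgimp"), list("abcflmo"), list("bfhjow"),
      list("bcksp"), list("afcelpmn")]
root, header, f_list = build_fp_tree(db, 3, f_list=list("fcabmp"))
```

Transactions that share a prefix in f-list order share a path in the tree, which is why the three transactions starting with f, c, a collapse into a single branch with count 3.
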
Partition Patterns and Databases
- Frequent patterns can be partitioned into subsets according to f-list
  - F-list = f-c-a-b-m-p
  - Patterns containing p
  - Patterns having m but no p
  - …
  - Patterns having c but no a nor b, m, p
  - Pattern f
- Completeness and non-redundancy

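One way to read this partitioning: every pattern belongs to the subset named after its last item in f-list order. The sketch below, with the hypothetical function name `partition_by_f_list`, assigns each pattern to exactly one subset, which is what completeness and non-redundancy require.

```python
def partition_by_f_list(patterns, f_list):
    """Group patterns by the item that comes last in f-list order."""
    rank = {item: r for r, item in enumerate(f_list)}
    groups = {item: [] for item in f_list}
    for pattern in patterns:
        # "Patterns having m but no p" are those whose last-ranked item is m.
        last = max(pattern, key=rank.__getitem__)
        groups[last].append(pattern)
    return groups
```

For the f-list f-c-a-b-m-p, a pattern such as (f, c, a, m) lands in the m subset and (c, p) in the p subset; the f subset can only contain the single pattern (f).
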
