
AI512PE: DATA MINING (PE - I)

Unit - II
Association Rule Mining
❑ Mining Frequent Patterns
❑ Associations and correlations
❑ Mining Methods
❑ Mining Various kinds of Association Rules
❑ Correlation Analysis
❑ Constraint based Association mining
❑ Graph Pattern Mining, Sequential Pattern Mining (SPM)

2
Pattern Discovery: Basic Concepts

❑ What Is Pattern Discovery? Why Is It Important?


❑ Basic Concepts: Frequent Patterns and Association Rules

❑ Compressed Representation: Closed Patterns and Max-Patterns

3
What Is Pattern Discovery?
❑ What are patterns?
❑ Patterns: A set of items, subsequences, or substructures that occur
frequently together (or strongly correlated) in a data set
❑ Patterns represent intrinsic and important properties of datasets
❑ Pattern discovery: Uncovering patterns from massive data sets
❑ Motivation examples:
❑ What products were often purchased together?
❑ What are the subsequent purchases after buying an iPad?
❑ What code segments likely contain copy-and-paste bugs?
❑ What word sequences likely form phrases in this corpus?

4
Basic Concepts: k-Itemsets and Their Supports
❑ Itemset: A set of one or more items
❑ k-itemset: X = {x1, …, xk}
  ❑ Ex. {Beer, Nuts, Diaper} is a 3-itemset
❑ (absolute) support (count) of X, sup{X}: Frequency or the number of occurrences of an itemset X
  ❑ Ex. sup{Beer} = 3
  ❑ Ex. sup{Diaper} = 4
  ❑ Ex. sup{Beer, Diaper} = 3
  ❑ Ex. sup{Beer, Eggs} = 1
❑ (relative) support, s{X}: The fraction of transactions that contains X (i.e., the probability that a transaction contains X)
  ❑ Ex. s{Beer} = 3/5 = 60%
  ❑ Ex. s{Diaper} = 4/5 = 80%
  ❑ Ex. s{Beer, Eggs} = 1/5 = 20%

Tid  Items bought
10   Beer, Nuts, Diaper
20   Beer, Coffee, Diaper
30   Beer, Diaper, Eggs
40   Nuts, Eggs, Milk
50   Nuts, Coffee, Diaper, Eggs, Milk

5
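The definitions above can be checked with a few lines of Python. The following is a minimal, illustrative sketch (function and variable names are my own, not from the slides) that computes absolute and relative support over the 5-transaction example:

# Minimal sketch: absolute and relative support of an itemset
transactions = [
    {"Beer", "Nuts", "Diaper"},
    {"Beer", "Coffee", "Diaper"},
    {"Beer", "Diaper", "Eggs"},
    {"Nuts", "Eggs", "Milk"},
    {"Nuts", "Coffee", "Diaper", "Eggs", "Milk"},
]

def absolute_support(itemset, db):
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for t in db if itemset <= t)

def relative_support(itemset, db):
    """Fraction of transactions that contain the itemset."""
    return absolute_support(itemset, db) / len(db)

print(absolute_support({"Beer"}, transactions))            # 3
print(absolute_support({"Beer", "Diaper"}, transactions))  # 3
print(relative_support({"Diaper"}, transactions))          # 0.8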
Basic Concepts: Frequent Itemsets (Patterns)
❑ An itemset (or a pattern) X is frequent if the support of X is no less than a minsup threshold σ
❑ Let σ = 50% (σ: minsup threshold); for the given 5-transaction dataset:
  ❑ All the frequent 1-itemsets:
    ❑ Beer: 3/5 (60%); Nuts: 3/5 (60%)
    ❑ Diaper: 4/5 (80%); Eggs: 3/5 (60%)
  ❑ All the frequent 2-itemsets:
    ❑ {Beer, Diaper}: 3/5 (60%)
  ❑ All the frequent 3-itemsets? None
❑ Why do these itemsets form the complete set of frequent k-itemsets (patterns) for any k?
❑ Observation: We may need an efficient method to mine the complete set of frequent patterns

Tid  Items bought
10   Beer, Nuts, Diaper
20   Beer, Coffee, Diaper
30   Beer, Diaper, Eggs
40   Nuts, Eggs, Milk
50   Nuts, Coffee, Diaper, Eggs, Milk

6
From Frequent Itemsets to Association Rules
❑ Compared with itemsets, rules can be more telling
  ❑ Ex. Diaper → Beer: Buying diapers may likely lead to buying beer
  ❑ How strong is this rule? (support, confidence)
❑ Measuring association rules: X → Y (s, c)
  ❑ Both X and Y are itemsets
  ❑ Support, s: The probability that a transaction contains X ∪ Y
    ❑ Ex. s{Diaper, Beer} = 3/5 = 0.6 (i.e., 60%)
  ❑ Confidence, c: The conditional probability that a transaction containing X also contains Y
    ❑ Calculation: c = sup(X ∪ Y) / sup(X)
    ❑ Ex. c = sup{Diaper, Beer} / sup{Diaper} = 3/4 = 0.75
  ❑ Note: X ∪ Y denotes the union of the two itemsets, i.e., the itemset containing all items of both X and Y

Tid  Items bought
10   Beer, Nuts, Diaper
20   Beer, Coffee, Diaper
30   Beer, Diaper, Eggs
40   Nuts, Eggs, Milk
50   Nuts, Coffee, Diaper, Eggs, Milk
7
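To make the rule measures concrete, here is a small illustrative Python sketch (names are my own) that evaluates support and confidence for Diaper → Beer on the same dataset:

# Sketch: support and confidence of a rule X -> Y,
# using s = sup(X ∪ Y)/|DB| and c = sup(X ∪ Y)/sup(X).
transactions = [
    {"Beer", "Nuts", "Diaper"},
    {"Beer", "Coffee", "Diaper"},
    {"Beer", "Diaper", "Eggs"},
    {"Nuts", "Eggs", "Milk"},
    {"Nuts", "Coffee", "Diaper", "Eggs", "Milk"},
]

def sup(itemset, db):
    return sum(1 for t in db if itemset <= t)

def rule_support_confidence(x, y, db):
    union_count = sup(x | y, db)          # transactions containing X and Y
    s = union_count / len(db)             # rule support
    c = union_count / sup(x, db)          # rule confidence
    return s, c

print(rule_support_confidence({"Diaper"}, {"Beer"}, transactions))  # (0.6, 0.75)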
Mining Frequent Itemsets and Association Rules
❑ Association rule mining
  ❑ Given two thresholds: minsup, minconf
  ❑ Find all of the rules X → Y (s, c) such that s ≥ minsup and c ≥ minconf
❑ Let minsup = 50%
  ❑ Freq. 1-itemsets: Beer: 3, Nuts: 3, Diaper: 4, Eggs: 3
  ❑ Freq. 2-itemsets: {Beer, Diaper}: 3
❑ Let minconf = 50%
  ❑ Beer → Diaper (60%, 100%)
  ❑ Diaper → Beer (60%, 75%)
  ❑ (Q: Are these all the rules?)
❑ Observations:
  ❑ Mining association rules and mining frequent patterns are very closely related problems
  ❑ Scalable methods are needed for mining large datasets

Tid  Items bought
10   Beer, Nuts, Diaper
20   Beer, Coffee, Diaper
30   Beer, Diaper, Eggs
40   Nuts, Eggs, Milk
50   Nuts, Coffee, Diaper, Eggs, Milk


8
Challenge: There Are Too Many Frequent Patterns!
❑ A long pattern contains a combinatorial number of sub-patterns
❑ How many frequent itemsets does the following TDB1 contain?
  ❑ TDB1: T1: {a1, …, a50}; T2: {a1, …, a100}
  ❑ Assuming (absolute) minsup = 1
❑ Let's have a try:
  1-itemsets: {a1}: 2, {a2}: 2, …, {a50}: 2, {a51}: 1, …, {a100}: 1
  2-itemsets: {a1, a2}: 2, …, {a1, a50}: 2, {a1, a51}: 1, …, {a99, a100}: 1
  …
  99-itemsets: {a1, a2, …, a99}: 1, …, {a2, a3, …, a100}: 1
  100-itemset: {a1, a2, …, a100}: 1
❑ The total number of frequent itemsets: C(100,1) + C(100,2) + … + C(100,100) = 2^100 − 1 ≈ 1.27 × 10^30
  ❑ A set far too huge for anyone to compute or store!

9
Apriori Pruning and Scalable Mining Methods
❑ Apriori pruning principle: If there is any itemset which is
infrequent, its superset should not even be generated! (Agrawal &
Srikant @VLDB’94, Mannila, et al. @ KDD’ 94)
❑ Scalable mining methods: Three major approaches
❑ Level-wise, join-based approach: Apriori (Agrawal &
Srikant@VLDB’94)
❑ Vertical data format approach: Eclat (Zaki, Parthasarathy,
Ogihara, Li @KDD’97)
❑ Frequent pattern projection and growth: FPgrowth (Han, Pei,
Yin @SIGMOD’00)

10
The Apriori Algorithm—An Example
minsup = 2

Database TDB:
  Tid  Items
  10   A, C, D
  20   B, C, E
  30   A, B, C, E
  40   B, E

1st scan → C1: {A}: 2, {B}: 3, {C}: 3, {D}: 1, {E}: 3
         → F1: {A}: 2, {B}: 3, {C}: 3, {E}: 3

C2 (generated from F1): {A, B}, {A, C}, {A, E}, {B, C}, {B, E}, {C, E}
2nd scan → C2 with counts: {A, B}: 1, {A, C}: 2, {A, E}: 1, {B, C}: 2, {B, E}: 3, {C, E}: 2
         → F2: {A, C}: 2, {B, C}: 2, {B, E}: 3, {C, E}: 2

C3 (generated from F2): {B, C, E}
3rd scan → F3: {B, C, E}: 2

11
The Downward Closure Property of Frequent Patterns
❑ Observation: From TDB1: T1: {a1, …, a50}; T2: {a1, …, a100}
❑ We get a frequent itemset: {a1, …, a50}
❑ Also, its subsets are all frequent: {a1}, {a2}, …, {a50}, {a1, a2}, …, {a1, …, a49}, …
❑ There must be some hidden relationships among frequent patterns!
❑ The downward closure (also called “Apriori”) property of frequent patterns
❑ If {beer, diaper, nuts} is frequent, so is {beer, diaper}
❑ Every transaction containing {beer, diaper, nuts} also contains {beer, diaper}
❑ Apriori: Any subset of a frequent itemset must be frequent
❑ Efficient mining methodology
❑ If any subset of an itemset S is infrequent, then there is no chance for S to
be frequent—why do we even have to consider S!? A sharp knife for pruning!

12
Apriori: A Candidate Generation & Test Approach
❑ Outline of Apriori (level-wise, candidate generation and test)
❑ Initially, scan DB once to get frequent 1-itemset
❑ Repeat
❑ Generate length-(k+1) candidate itemsets from length-k frequent
itemsets
❑ Test the candidates against DB to find frequent (k+1)-itemsets
❑ Set k := k +1
❑ Until no frequent or candidate set can be generated
❑ Return all the frequent itemsets derived
13
The Apriori Algorithm (Pseudo-Code)
Ck: Candidate itemset of size k
Fk: Frequent itemset of size k

k := 1;
Fk := {frequent items};                       // the frequent 1-itemsets
While (Fk != ∅) do {                          // while Fk is non-empty
    Ck+1 := candidates generated from Fk;     // candidate generation
    Derive Fk+1 by counting candidates in Ck+1 with respect to TDB at minsup;
    k := k + 1
}
return ∪k Fk                                  // return the Fk generated at each level

14
Candidate Generation: An SQL Implementation
❑ Suppose the items in Fk-1 are listed in an order
❑ Step 1: self-joining Fk-1
    insert into Ck
    select p.item1, p.item2, …, p.itemk-1, q.itemk-1
    from Fk-1 as p, Fk-1 as q
    where p.item1 = q.item1, …, p.itemk-2 = q.itemk-2, p.itemk-1 < q.itemk-1
❑ Step 2: pruning
    for all itemsets c in Ck do
        for all (k-1)-subsets s of c do
            if (s is not in Fk-1) then delete c from Ck
❑ Example: F3 = {abc, abd, acd, ace, bcd}
  ❑ Self-join: abc ⋈ abd → abcd; acd ⋈ ace → acde
  ❑ Pruning: acde is pruned because its subset cde is not in F3, leaving C4 = {abcd}
15
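The level-wise loop and the self-join/prune candidate generation of the previous two slides can be combined in a short Python sketch. This is an illustrative re-implementation (not the original Apriori code); it reproduces the small TDB example of the earlier slide:

from itertools import combinations

# Sketch of level-wise Apriori: F(k-1) x F(k-1) self-join + subset pruning,
# then one DB scan per level to count the surviving candidates.
def apriori(db, minsup):
    db = [frozenset(t) for t in db]
    counts = {}
    for t in db:                               # F1: frequent 1-itemsets
        for item in t:
            counts[frozenset([item])] = counts.get(frozenset([item]), 0) + 1
    freq = {iset: c for iset, c in counts.items() if c >= minsup}
    all_freq = dict(freq)
    k = 1
    while freq:
        sorted_sets = sorted(tuple(sorted(s)) for s in freq)
        cands = set()
        for i in range(len(sorted_sets)):
            for j in range(i + 1, len(sorted_sets)):
                a, b = sorted_sets[i], sorted_sets[j]
                if a[:-1] == b[:-1]:           # self-join: same first k-1 items
                    c = frozenset(a) | frozenset(b)
                    # prune: every k-subset must already be frequent
                    if all(frozenset(s) in freq for s in combinations(c, k)):
                        cands.add(c)
        counts = {c: sum(1 for t in db if c <= t) for c in cands}   # one scan
        freq = {c: n for c, n in counts.items() if n >= minsup}
        all_freq.update(freq)
        k += 1
    return all_freq

tdb = [{"A", "C", "D"}, {"B", "C", "E"}, {"A", "B", "C", "E"}, {"B", "E"}]
for iset, c in sorted(apriori(tdb, 2).items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(set(iset), c)        # ends with {'B','C','E'} 2, as on the example slide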
Apriori: Improvements and Alternatives
❑ Reduce the number of transaction database scans
  ❑ Partitioning (e.g., Savasere, et al., 1995)
  ❑ Dynamic itemset counting (Brin, et al., 1997)
❑ Shrink the number of candidates
  ❑ Hashing (e.g., DHP: Park, et al., 1995)
  ❑ Pruning by support lower bounding (e.g., Bayardo, 1998)
  ❑ Sampling (e.g., Toivonen, 1996)
❑ Explore special data structures
  ❑ Tree projection (Agarwal, et al., 2001)
  ❑ H-Miner (Pei, et al., 2001)
  ❑ Hypercube decomposition (e.g., LCM: Uno, et al., 2004)
(Partitioning and hashing are to be discussed in subsequent slides)
16
Partitioning: Scan Database Only Twice
❑ Theorem: Any itemset that is potentially frequent in TDB must be frequent in at least one of the partitions of TDB
  ❑ Let TDB be split into partitions TDB1, TDB2, …, TDBk with TDB1 + TDB2 + … + TDBk = TDB
  ❑ Contrapositive: if sup1(X) < σ|TDB1|, sup2(X) < σ|TDB2|, …, and supk(X) < σ|TDBk|, then sup(X) < σ|TDB|
❑ Method: Scan DB only twice (A. Savasere, E. Omiecinski, and S. Navathe, VLDB'95)
  ❑ Scan 1: Partition the database so that each partition fits in main memory (why?)
    ❑ Mine the local frequent patterns in each partition
  ❑ Scan 2: Consolidate the global frequent patterns
    ❑ The global frequent itemset candidates are those frequent in at least one partition
    ❑ Find the true frequency of those candidates by scanning each TDBi one more time
17
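A minimal sketch of the two-scan partitioning idea, assuming some local mining routine is available for each in-memory partition (here a toy brute-force miner keeps the sketch self-contained; all names are illustrative):

from itertools import combinations

def brute_force_frequent(db, abs_minsup):
    """Toy local miner: enumerate all itemsets occurring in the partition."""
    items = sorted(set().union(*db))
    result = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            s = frozenset(combo)
            count = sum(1 for t in db if s <= t)
            if count >= abs_minsup:
                result[s] = count
    return result

def partitioned_mining(db, rel_minsup, num_parts, local_miner):
    n = len(db)
    size = (n + num_parts - 1) // num_parts
    candidates = set()
    # Scan 1: mine each memory-sized partition at threshold σ|TDBi|
    for start in range(0, n, size):
        part = db[start:start + size]
        local_minsup = max(1, int(rel_minsup * len(part)))   # floor: candidate set stays a superset
        candidates |= set(local_miner(part, local_minsup).keys())
    # Scan 2: count every candidate once over the whole DB
    result = {}
    for c in candidates:
        count = sum(1 for t in db if c <= t)
        if count >= rel_minsup * n:
            result[c] = count
    return result

tdb = [{"A", "C", "D"}, {"B", "C", "E"}, {"A", "B", "C", "E"}, {"B", "E"}]
print(len(partitioned_mining(tdb, 0.5, 2, brute_force_frequent)))   # 9 frequent itemsets at 50% support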
Efficient Pattern Mining Methods
❑ The Downward Closure Property of Frequent Patterns

❑ The Apriori Algorithm


❑ Extensions or Improvements of Apriori

❑ Mining Frequent Patterns by Exploring Vertical Data Format

❑ FPGrowth: A Frequent Pattern-Growth Approach

❑ Mining Closed Patterns

18
Why Mining Frequent Patterns by Pattern Growth?
❑ Apriori: A breadth-first search mining algorithm
❑ First find the complete set of frequent k-itemsets
❑ Then derive frequent (k+1)-itemset candidates
❑ Scan DB again to find true frequent (k+1)-itemsets
❑ Motivation for a different mining methodology
❑ Can we develop a depth-first search mining algorithm?
❑ For a frequent itemset ρ, can the subsequent search be confined to only those transactions that contain ρ?
❑ Such thinking leads to a frequent pattern growth approach:
❑ FPGrowth (J. Han, J. Pei, Y. Yin, “Mining Frequent Patterns
without Candidate Generation,” SIGMOD 2000)
19
Example: Construct FP-tree from a Transaction DB
TID  Items in the Transaction     Ordered, frequent itemlist
100  {f, a, c, d, g, i, m, p}     f, c, a, m, p
200  {a, b, c, f, l, m, o}        f, c, a, b, m
300  {b, f, h, j, o, w}           f, b
400  {b, c, k, s, p}              c, b, p
500  {a, f, c, e, l, p, m, n}     f, c, a, m, p

Let min_support = 3
1. Scan DB once, find the single-item frequent patterns: f: 4, a: 3, c: 4, b: 3, m: 3, p: 3
2. Sort the frequent items in frequency-descending order to get the F-list: f-c-a-b-m-p
3. Scan DB again and construct the FP-tree
   ❑ The frequent itemlist of each transaction is inserted as a branch, with shared sub-branches merged and counts accumulated
Header table (item: frequency, with a header link into the tree): f: 4, c: 4, a: 3, b: 3, m: 3, p: 3
After inserting the 1st frequent itemlist "f, c, a, m, p", the tree is the single branch {} → f:1 → c:1 → a:1 → m:1 → p:1
20
Example: Construct FP-tree from a Transaction DB
TID  Items in the Transaction     Ordered, frequent itemlist
100  {f, a, c, d, g, i, m, p}     f, c, a, m, p
200  {a, b, c, f, l, m, o}        f, c, a, b, m
300  {b, f, h, j, o, w}           f, b
400  {b, c, k, s, p}              c, b, p
500  {a, f, c, e, l, p, m, n}     f, c, a, m, p

Let min_support = 3; F-list = f-c-a-b-m-p (steps 1-3 as on the previous slide)
After inserting the 2nd frequent itemlist "f, c, a, b, m":
  ❑ It shares the prefix f, c, a with the first branch, so those counts are incremented
  ❑ The tree is now {} → f:2 → c:2 → a:2, with two children under a: (m:1 → p:1) and (b:1 → m:1)
21
Example: Construct FP-tree from a Transaction DB
TID  Items in the Transaction     Ordered, frequent itemlist
100  {f, a, c, d, g, i, m, p}     f, c, a, m, p
200  {a, b, c, f, l, m, o}        f, c, a, b, m
300  {b, f, h, j, o, w}           f, b
400  {b, c, k, s, p}              c, b, p
500  {a, f, c, e, l, p, m, n}     f, c, a, m, p

Let min_support = 3; F-list = f-c-a-b-m-p (steps 1-3 as before)
After inserting all the frequent itemlists, the final FP-tree is:
    {} has children f:4 and c:1
    f:4 has children c:3 and b:1
    c:3 → a:3, and a:3 has children m:2 and b:1
    m:2 → p:2; the b:1 under a:3 → m:1
    c:1 → b:1 → p:1
Header table (item: frequency), with header links to all nodes carrying that item: f: 4, c: 4, a: 3, b: 3, m: 3, p: 3
22
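The three construction steps can be sketched compactly in Python. This is an illustrative implementation (class and variable names are my own), reproducing the example DB above with min_support = 3:

from collections import defaultdict

class FPNode:
    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}              # item -> FPNode

def build_fptree(transactions, min_support):
    # Step 1: one scan to count single items
    freq = defaultdict(int)
    for t in transactions:
        for item in t:
            freq[item] += 1
    freq = {i: c for i, c in freq.items() if c >= min_support}
    # Step 2: order frequent items by descending frequency (the F-list)
    flist = sorted(freq, key=lambda i: (-freq[i], i))
    rank = {item: r for r, item in enumerate(flist)}
    # Step 3: second scan, insert each ordered frequent itemlist as a branch
    root = FPNode(None, None)
    header = defaultdict(list)          # item -> list of nodes (header links)
    for t in transactions:
        ordered = sorted((i for i in t if i in rank), key=lambda i: rank[i])
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += 1            # shared prefixes accumulate counts
            node = child
    return root, header, flist

db = [list("facdgimp"), list("abcflmo"), list("bfhjow"), list("bcksp"), list("afcelpmn")]
root, header, flist = build_fptree(db, 3)
print(flist)   # ['c', 'f', 'a', 'b', 'm', 'p'] -- c and f tie at 4; the slides break the tie as f-c-a-b-m-p
print([(n.item, n.count) for n in header["m"]])   # two m-nodes, with counts 2 and 1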
Mining FP-Tree: Divide and Conquer
Based on Patterns and Data
❑ Pattern mining can be partitioned according to current patterns
  ❑ Patterns containing p: mine p's conditional database: fcam: 2, cb: 1
    ❑ p's conditional database (i.e., the database under the condition that p exists): the transformed prefix paths of item p
  ❑ Patterns having m but no p: mine m's conditional database: fca: 2, fcab: 1
  ❑ …
❑ With min_support = 3, the conditional database of each item (read off the FP-tree above):
    Item  Conditional database
    c     f: 3
    a     fc: 3
    b     fca: 1, f: 1, c: 1
    m     fca: 2, fcab: 1
    p     fcam: 2, cb: 1
23
Mine Each Conditional Database Recursively
min_support = 3
❑ For each conditional database
  ❑ Mine single-item patterns
  ❑ Construct its FP-tree and mine it recursively (a simplified sketch follows this slide)
❑ Conditional databases (from the previous slide):
    c: f: 3;  a: fc: 3;  b: fca: 1, f: 1, c: 1;  m: fca: 2, fcab: 1;  p: fcam: 2, cb: 1
❑ Mining them for frequent single items:
  ❑ p's conditional DB: fcam: 2, cb: 1  →  c: 3
  ❑ m's conditional DB: fca: 2, fcab: 1  →  fca: 3
  ❑ b's conditional DB: fca: 1, f: 1, c: 1  →  ∅
❑ Example: mining m's conditional DB further
  ❑ m's FP-tree is the single branch f:3 → c:3 → a:3
  ❑ For a single-branch FP-tree, all the frequent patterns can be generated in one shot:
    m: 3; fm: 3, cm: 3, am: 3; fcm: 3, fam: 3, cam: 3; fcam: 3
24
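The divide-and-conquer step can also be illustrated without building the tree. The sketch below is a simplified, illustrative pattern growth over conditional databases kept as plain (prefix-itemlist, count) pairs; it is not the actual FP-growth implementation, but it follows the same recursion and reproduces the patterns above:

from collections import defaultdict

def pattern_growth(cond_db, suffix, min_support, results):
    # Count items in this conditional database
    counts = defaultdict(int)
    for items, cnt in cond_db:
        for i in set(items):
            counts[i] += cnt
    for item, cnt in counts.items():
        if cnt < min_support:
            continue
        new_pattern = frozenset(suffix | {item})
        results[new_pattern] = cnt
        # Build item's conditional database: the prefixes preceding the item
        new_db = []
        for items, c in cond_db:
            if item in items:
                prefix = items[:items.index(item)]
                if prefix:
                    new_db.append((prefix, c))
        pattern_growth(new_db, new_pattern, min_support, results)

# Ordered frequent itemlists of the 5 transactions (F-list order f-c-a-b-m-p)
db = [(list("fcamp"), 1), (list("fcabm"), 1), (list("fb"), 1),
      (list("cbp"), 1), (list("fcamp"), 1)]
results = {}
pattern_growth(db, frozenset(), 3, results)
print(results[frozenset("m")])       # 3
print(results[frozenset("fcam")])    # 3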
Expressing Patterns in Compressed Form: Closed Patterns
❑ How to reduce number of patterns?
❑ Solution 1: Closed patterns: A pattern (itemset) X is closed if X is frequent, and there exists no super-pattern Y ⊃ X with the same support as X
❑ Let Transaction DB TDB1: T1: {a1, …, a50}; T2: {a1, …, a100}
❑ Suppose minsup = 1. How many closed patterns does TDB1 contain?
❑ Two: P1: “{a1, …, a50}: 2”; P2: “{a1, …, a100}: 1”
❑ Closed pattern is a lossless compression of frequent patterns
❑ Reduces the # of patterns but does not lose the support information!
❑ You will still be able to say: “{a2, …, a40}: 2”, “{a5, a51}: 1”

25
Expressing Patterns in Compressed Form: Max-Patterns
❑ Solution 2: Max-patterns: A pattern X is a max-pattern if X is frequent and there exists no frequent super-pattern Y ⊃ X
❑ Difference from closed patterns?
  ❑ We do not care about the real support of the sub-patterns of a max-pattern
❑ Let Transaction DB TDB1: T1: {a1, …, a50}; T2: {a1, …, a100}
  ❑ Suppose minsup = 1. How many max-patterns does TDB1 contain?
  ❑ One: P: "{a1, …, a100}: 1"
❑ Max-pattern is a lossy compression!
  ❑ We only know {a1, …, a40} is frequent
  ❑ But we no longer know the real support of {a1, …, a40}, …
❑ Thus, in many applications, mining closed patterns is more desirable than mining max-patterns
26
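Given the complete set of frequent itemsets with their supports, closed and max patterns can be filtered as follows (an illustrative sketch using the frequent itemsets of the earlier Apriori example, minsup = 2):

# Keep closed patterns (no proper superset with the same support) and
# max-patterns (no proper frequent superset at all).
def closed_patterns(freq):
    return {x: s for x, s in freq.items()
            if not any(x < y and s == sy for y, sy in freq.items())}

def max_patterns(freq):
    return {x: s for x, s in freq.items()
            if not any(x < y for y in freq)}

freq = {
    frozenset("A"): 2, frozenset("B"): 3, frozenset("C"): 3, frozenset("E"): 3,
    frozenset("AC"): 2, frozenset("BC"): 2, frozenset("BE"): 3, frozenset("CE"): 2,
    frozenset("BCE"): 2,
}
print(sorted(("".join(sorted(x)), s) for x, s in closed_patterns(freq).items()))
# [('AC', 2), ('BCE', 2), ('BE', 3), ('C', 3)]
print(sorted(("".join(sorted(x)), s) for x, s in max_patterns(freq).items()))
# [('AC', 2), ('BCE', 2)]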
Mining Multiple-Level Frequent Patterns
❑ Items often form hierarchies
  ❑ Ex.: Dairyland 2% milk; Wonder wheat bread
❑ How to set min-support thresholds?
  ❑ Uniform min-support across multiple levels (reasonable?)
    ❑ Ex.: Level 1: Milk [support = 10%], min_sup = 5%; Level 2: 2% Milk [support = 6%], Skim Milk [support = 2%], min_sup = 5%
  ❑ Level-reduced min-support: Items at the lower level are expected to have lower support
    ❑ Ex.: Level 1: min_sup = 5%; Level 2: min_sup = 1%
❑ Efficient mining: Shared multi-level mining
  ❑ Use the lowest min-support to pass down the set of candidates
28
Redundancy Filtering at Mining Multi-Level
Associations
❑ Multi-level association mining may generate many redundant rules
❑ Redundancy filtering: Some rules may be redundant due to “ancestor”
relationships between items
❑ milk ⇒ wheat bread [support = 8%, confidence = 70%] (1)
❑ 2% milk ⇒ wheat bread [support = 2%, confidence = 72%] (2)
❑ Suppose the 2% milk sold is about ¼ of milk sold in gallons
❑ (2) should be able to be “derived” from (1)
❑ A rule is redundant if its support is close to the “expected” value, according
to its “ancestor” rule, and it has a similar confidence as its “ancestor”
❑ Rule (1) is an ancestor of rule (2), which one to prune?

29
Customized Min-Supports for Different Kinds of Items
❑ We have used the same min-support threshold for all the items or item sets
to be mined in each association mining
❑ In reality, some items (e.g., diamond, watch, …) are valuable but less
frequent
❑ It is necessary to have customized min-support settings for different kinds of
items
❑ One Method: Use group-based “individualized” min-support
❑ E.g., {diamond, watch}: 0.05%; {bread, milk}: 5%; …
❑ How to mine such rules efficiently?
❑ Existing scalable mining algorithms can be easily extended to cover such
cases

30
Mining Multi-Dimensional Associations
❑ Single-dimensional rules (e.g., items are all in the "product" dimension)
  ❑ buys(X, "milk") ⇒ buys(X, "bread")
❑ Multi-dimensional rules (i.e., items in ≥ 2 dimensions or predicates)
  ❑ Inter-dimension association rules (no repeated predicates)
    ❑ age(X, "18-25") ∧ occupation(X, "student") ⇒ buys(X, "coke")
  ❑ Hybrid-dimension association rules (repeated predicates)
    ❑ age(X, "18-25") ∧ buys(X, "popcorn") ⇒ buys(X, "coke")
❑ Attributes can be categorical or numerical
❑ Categorical Attributes (e.g., profession, product: no ordering among
values): Data cube for inter-dimension association
❑ Quantitative Attributes: Numeric, implicit ordering among values—
discretization, clustering, and gradient approaches
31
Mining Quantitative Associations
❑ Mining associations with numerical attributes
❑ Ex.: Numerical attributes: age and salary
❑ Methods
❑ Static discretization based on predefined concept hierarchies
❑ Discretization on each dimension with hierarchy
❑ age: {0-10, 10-20, …, 90-100} → {young, mid-aged, old}
❑ Dynamic discretization based on data distribution
❑ Clustering: Distance-based association
❑ First one-dimensional clustering, then association
❑ Deviation analysis:
❑ Gender = female ⇒ Wage: mean = $7/hr (overall mean = $9)
32
Mining Extraordinary Phenomena in Quantitative
Association Mining
❑ Mining extraordinary (i.e., interesting) phenomena
❑ Ex.: Gender = female ⇒ Wage: mean = $7/hr (overall mean = $9)
❑ LHS: a subset of the population
❑ RHS: an extraordinary behavior of this subset
❑ The rule is accepted only if a statistical test (e.g., Z-test) confirms the
inference with high confidence
❑ Subrule: Highlights the extraordinary behavior of a subset of the
population of the super rule
❑ Ex.: (Gender = female) ∧ (South = yes) ⇒ mean wage = $6.3/hr
❑ Rule condition can be categorical or numerical (quantitative rules)
❑ Ex.: Education in [14-18] (yrs) ⇒ mean wage = $11.64/hr
❑ Efficient methods have been developed for mining such extraordinary
rules (e.g., Aumann and Lindell@KDD’99)
33
Rare Patterns vs. Negative Patterns
❑ Rare patterns
❑ Very low support but interesting (e.g., buying Rolex watches)
❑ How to mine them? Setting individualized, group-based min-support
thresholds for different groups of items
❑ Negative patterns
❑ Negatively correlated: Unlikely to happen together
❑ Ex.: Since it is unlikely that the same customer buys both a Ford
Expedition (an SUV car) and a Ford Fusion (a hybrid car), buying a
Ford Expedition and buying a Ford Fusion are likely negatively
correlated patterns
❑ How to define negative patterns?
34
Defining Negative Correlated Patterns
❑ A support-based definition
  ❑ If itemsets A and B are both frequent but rarely occur together, i.e., s(A ∪ B) << s(A) × s(B), then A and B are negatively correlated
❑ Is this a good definition for large transaction datasets?
  ❑ Ex.: Suppose a store sold two needle packages A and B 100 times each, but only one transaction contained both A and B
  ❑ When there are in total 200 transactions, we have
    ❑ s(A ∪ B) = 1/200 = 0.005 and s(A) × s(B) = 0.5 × 0.5 = 0.25, so s(A ∪ B) << s(A) × s(B)
  ❑ But when there are in total 10^5 transactions, we have
    ❑ s(A ∪ B) = 1/10^5 and s(A) × s(B) = 1/10^3 × 1/10^3 = 1/10^6, so s(A ∪ B) > s(A) × s(B)
❑ What is the problem?—Null transactions: The support-based definition is not null-invariant!
35
Defining Negative Correlation: Need
Null-Invariance in Definition
❑ A good definition on negative correlation should take care of the null-
invariance problem
❑ Whether two itemsets A and B are negatively correlated should not be
influenced by the number of null-transactions
❑ A Kulczynski measure-based definition
  ❑ If itemsets A and B are frequent but (s(A ∪ B)/s(A) + s(A ∪ B)/s(B)) / 2 < ε, where ε is a negative-pattern threshold, then A and B are negatively correlated
❑ For the same needle-package problem:
  ❑ No matter whether there are in total 200 or 10^5 transactions
  ❑ With ε = 0.02, we have (s(A ∪ B)/s(A) + s(A ∪ B)/s(B)) / 2 = (0.01 + 0.01)/2 = 0.01 < ε
36
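The needle-package arithmetic above can be verified directly; this tiny illustrative sketch contrasts the support-based test with the Kulczynski-based test, assuming the counts stated on the slides and ε = 0.02:

# A and B each occur 100 times; one transaction contains both.
def checks(n_transactions, n_a=100, n_b=100, n_ab=1, eps=0.02):
    s_a, s_b, s_ab = n_a / n_transactions, n_b / n_transactions, n_ab / n_transactions
    support_based = s_ab < s_a * s_b              # "negatively correlated" by the support test?
    kulc = (s_ab / s_a + s_ab / s_b) / 2          # null-invariant Kulczynski measure
    return support_based, kulc < eps

print(checks(200))      # (True, True)
print(checks(10**5))    # (False, True) -- the support-based verdict flips, Kulczynski does not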
Mining Compressed Patterns
Pat-ID  Item-Sets                 Support
P1      {38, 16, 18, 12}          205227
P2      {38, 16, 18, 12, 17}      205211
P3      {39, 38, 16, 18, 12, 17}  101758
P4      {39, 16, 18, 12, 17}      161563
P5      {39, 16, 18, 12}          161576

❑ Why mine compressed patterns?
  ❑ Too many scattered patterns, which are not so meaningful
❑ Closed patterns: P1, P2, P3, P4, P5
  ❑ Emphasizes support too much; there is no compression
❑ Max-patterns: P3 only—information loss
❑ Desired output (a good balance): P2, P3, P4
❑ Pattern distance measure and δ-clustering: For each pattern P, find all patterns which can be expressed by P and whose distance to P is within δ (δ-cover)
  ❑ All patterns in the cluster can be represented by P
❑ A method for efficient, direct mining of compressed frequent patterns: D. Xin, J. Han, X. Yan, H. Cheng, "On Compressing Frequent Patterns", Data & Knowledge Engineering, 60:5-29, 2007
37
Redundancy-Aware Top-k Patterns
❑ Desired patterns: high significance & low redundancy

❑ Method: Use MMS (Maximal Marginal Significance) for measuring the


combined significance of a pattern set
❑ Xin et al., Extracting Redundancy-Aware Top-K Patterns, KDD’06
38
Advanced Frequent Pattern Mining
❑ Mining Diverse Patterns

❑ Sequential Pattern Mining

❑ Constraint-Based Frequent Pattern Mining

❑ Graph Pattern Mining

❑ Pattern Mining Application: Mining Software Copy-and-Paste Bugs

❑ Summary

39
Constraint-Based Pattern Mining
❑ Why Constraint-Based Mining?
❑ Different Kinds of Constraints: Different Pruning Strategies
❑ Constrained Mining with Pattern Anti-Monotonicity
❑ Constrained Mining with Pattern Monotonicity
❑ Constrained Mining with Data Anti-Monotonicity
❑ Constrained Mining with Succinct Constraints
❑ Constrained Mining with Convertible Constraints
❑ Handling Multiple Constraints
❑ Constraint-Based Sequential-Pattern Mining
40
Why Constraint-Based Mining?
❑ Finding all the patterns in a dataset autonomously?—unrealistic!
❑ Too many patterns but not necessarily user-interested!
❑ Pattern mining in practice: Often a user-guided, interactive process
❑ User directs what to be mined using a data mining query language (or a
graphical user interface), specifying various kinds of constraints
❑ What is constraint-based mining?
❑ Mine together with user-provided constraints
❑ Why constraint-based mining?
❑ User flexibility: User provides constraints on what to be mined
❑ Optimization: System explores such constraints for mining efficiency
❑ E.g., Push constraints deeply into the mining process
41
Various Kinds of User-Specified Constraints
❑ Knowledge type constraint—Specifying what kinds of knowledge to mine
❑ Ex.: Classification, association, clustering, outlier finding, …
❑ Data constraint—using SQL-like queries
❑ Ex.: Find products sold together in NY stores this year
❑ Dimension/level constraint—similar to projection in relational database
❑ Ex.: In relevance to region, price, brand, customer category
❑ Interestingness constraint—various kinds of thresholds
❑ Ex.: Strong rules: min_sup ≥ 0.02, min_conf ≥ 0.6, min_correlation ≥ 0.7
❑ Rule (or pattern) constraint (the focus of this study)
  ❑ Ex.: Small sales (price < $10) triggers big sales (sum > $200)

42
Pattern Space Pruning with Pattern Anti-Monotonicity
TID  Transaction
10   a, b, c, d, f, h
20   b, c, d, f, g, h
30   b, c, d, f, g
40   a, c, e, f, g
min_sup = 2

Item  Price  Profit
a     100    40
b     40     0
c     150    −20
d     35     −15
e     55     −30
f     45     −10
g     80     20
h     10     5
Note: item.price > 0; profit can be negative

◼ A constraint c is anti-monotone: If an itemset S violates constraint c, so does any of its supersets
  ◼ That is, mining on itemset S can be terminated
◼ Ex. 1: c1: sum(S.price) ≤ v is anti-monotone
◼ Ex. 2: c2: range(S.profit) ≤ 15 is anti-monotone
  ◼ Itemset ab violates c2 (range(ab) = 40)
  ◼ So does every superset of ab
◼ Ex. 3: c3: sum(S.price) ≥ v is not anti-monotone
◼ Ex. 4: Is c4: support(S) ≥ σ anti-monotone?
  ◼ Yes! Apriori pruning is essentially pruning with an anti-monotone constraint!
43
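As an illustration of pattern-space pruning with an anti-monotone constraint, the following sketch grows itemsets depth-first and never extends an itemset that already violates range(S.profit) ≤ 15, using the profit table above (support counting is omitted to keep the sketch focused on the constraint; all names are illustrative):

profit = {"a": 40, "b": 0, "c": -20, "d": -15, "e": -30,
          "f": -10, "g": 20, "h": 5}

def profit_range(itemset):
    vals = [profit[i] for i in itemset]
    return max(vals) - min(vals)

def grow_with_antimonotone(items, max_range=15):
    """Depth-first growth; an itemset violating the constraint is never extended,
    because every superset would violate it too (anti-monotonicity)."""
    satisfying = []
    items = sorted(items)

    def expand(current, start):
        for idx in range(start, len(items)):
            candidate = current + [items[idx]]
            if profit_range(candidate) > max_range:
                continue            # prune: no superset of candidate can satisfy the constraint
            satisfying.append(tuple(candidate))
            expand(candidate, idx + 1)

    expand([], 0)
    return satisfying

result = grow_with_antimonotone(profit.keys())
print(("a", "b") in result)   # False: range(ab) = 40 violates the constraint
print(("b", "h") in result)   # True:  range(bh) = 5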
Pattern Monotonicity and Its Roles
◼ A constraint c is monotone: If an itemset S satisfies the constraint c, so does any of its supersets
  ◼ That is, we do not need to check c in subsequent mining
◼ Ex. 1: c1: sum(S.price) ≥ v is monotone
◼ Ex. 2: c2: min(S.price) ≤ v is monotone
◼ Ex. 3: c3: range(S.profit) ≥ 15 is monotone
  ◼ Itemset ab satisfies c3
  ◼ So does every superset of ab

(Same transaction DB and item price/profit table as on the previous slide; min_sup = 2; item.price > 0, profit can be negative)
44
Data Space Pruning with Data Anti-Monotonicity
❑ A constraint c is data anti-monotone: In the mining process, if a data entry t cannot satisfy a pattern p under c, t cannot satisfy p's supersets either
  ❑ Data space pruning: Data entry t can be pruned
❑ Ex. 1: c1: sum(S.Profit) ≥ v is data anti-monotone
  ❑ Let constraint c1 be: sum(S.Profit) ≥ 25
  ❑ T30: {b, c, d, f, g} can be removed, since no combination of its items can form an S whose sum of profit is ≥ 25
❑ Ex. 2: c2: min(S.Price) ≤ v is data anti-monotone
  ❑ Consider v = 5, but every item in a transaction, say T50, has a price higher than 10
❑ Ex. 3: c3: range(S.Profit) > 25 is data anti-monotone

(Same transaction DB and item price/profit table as on the previous slides; min_sup = 2; item.price > 0, profit can be negative)
45
Data Space Pruning Should Be Explored Recursively
b’s-proj. DB
❑Example. c3: range(S.Profit) > 25 Item Profit
TID Transaction TID Transaction
❑ We check b’s projected database 10 a, b, c, d, f, h a 40
10 a, c, d, f, h
❑ But item “a” is infrequent (sup = 1) 20 c, d, f, g, h 20 b, c, d, f, g, h b 0

❑ After removing “a (40)” from T 10 30 b, c, d, f, g c −20


30 c, d, f, g
40 a, c, e, f, g d −15
❑ T10 cannot satisfy c3 any more
e −30
min_sup = 2
❑ Since “b (0)” and “c (−20), d (−15), f (−10), h (5)” f −10
price(item) > 0
❑ By removing T10, we can also prune “h” in T 20 g 20
Constraint:
b’s-proj. DB TID Transaction h 5
Recursive range{S.profit} > 25
10 a, c, d, f, h Data b’s FP-tree
20 c, d, f, g, h Pruning single branch: cdfg: 2
Only a single branch “cdfg: 2”
30 c, d, f, g to be mined in b’s projected DB

❑ Note: c3 prunes T10 effectively only after “a” is pruned (by min-sup) in b’s projected DB
46
Succinctness: Pruning Both Data and Pattern Spaces
◼ Succinctness: If the constraint c can be enforced by directly manipulating the data
◼ Ex. 1: To find those patterns without item i
◼ Remove i from DB and then mine (pattern space pruning)
◼ Ex. 2: To find those patterns containing item i
◼ Mine only i-projected DB (data space pruning)
◼ Ex. 3: c3: min(S.Price) ≤ v is succinct
  ◼ Start with only the items whose price ≤ v, and remove the transactions that contain only high-price items (pattern + data space pruning)
◼ Ex. 4: c4: sum(S.Price) ≥ v is not succinct
  ◼ It cannot be determined beforehand, since the sum of the prices of itemset S keeps increasing as S grows

47
Convertible Constraints: Ordering Data in Transactions
TID  Transaction
10   a, b, c, d, f, h
20   a, b, c, d, f, g, h
30   b, c, d, f, g
40   a, c, e, f, g
min_sup = 2

Item  Price  Profit
a     100    40
b     40     0
c     150    −20
d     35     −15
e     55     −30
f     45     −5
g     80     30
h     10     5

◼ Convert tough constraints into (anti-)monotone ones by a proper ordering of items in transactions
◼ Examine c1: avg(S.profit) > 20
  ◼ Order items in (profit) value-descending order: <a, g, f, b, h, d, c, e>
  ◼ An itemset ab violates c1 (avg(ab) = 20)
  ◼ So does ab* (i.e., every pattern grown in the ab-projected DB)
  ◼ c1 becomes anti-monotone if patterns grow in the right order!
◼ Can item reordering work for Apriori?
  ◼ Level-wise candidate generation requires multi-way checking!
  ◼ avg(agf) = 21.7 > 20, but avg(gf) = 12.5 < 20
  ◼ Apriori will not generate "agf" as a candidate
48
Advanced Frequent Pattern Mining
❑ Mining Diverse Patterns

❑ Sequential Pattern Mining

❑ Constraint-Based Frequent Pattern Mining

❑ Graph Pattern Mining

❑ Pattern Mining Application: Mining Software Copy-and-Paste Bugs

❑ Summary

49
What Is Graph Pattern Mining?
❑ Chem-informatics:
❑ Mining frequent chemical compound structures

❑ Social networks, web communities, tweets, …


❑ Finding frequent research collaboration subgraphs

50
Frequent (Sub)Graph Patterns
❑ Given a labeled graph dataset D = {G1, G2, …, Gn}, the supporting graph set of a subgraph g is Dg = {Gi | g ⊆ Gi, Gi ∈ D}
  ❑ support(g) = |Dg| / |D|
❑ A (sub)graph g is frequent if support(g) ≥ min_sup
❑ Ex.: A graph dataset of chemical structures (A), (B), (C); with min_sup = 2, the frequent graph patterns (1) and (2) each have support = 2/3 ≈ 67%
❑ Alternative: Mining frequent subgraph patterns from a single large graph or network
51
Applications of Graph Pattern Mining
❑ Bioinformatics
❑ Gene networks, protein interactions, metabolic pathways
❑ Chem-informatics: Mining chemical compound structures
❑ Social networks, web communities, tweets, …
❑ Cell phone networks, computer networks, …
❑ Web graphs, XML structures, Semantic Web, information networks
❑ Software engineering: Program execution flow analysis
❑ Building blocks for graph classification, clustering, compression, comparison,
and correlation analysis
❑ Graph indexing and graph similarity search

52
Graph Pattern Mining Algorithms: Different
Methodologies
❑ Generation of candidate subgraphs
❑ Apriori vs. pattern growth (e.g., FSG vs. gSpan)
❑ Search order
❑ Breadth vs. depth
❑ Elimination of duplicate subgraphs
❑ Passive vs. active (e.g., gSpan [Yan & Han, 2002])
❑ Support calculation
❑ Store embeddings (e.g., GASTON [Nijssen & Kok, 2004], FFSM [Huan, Wang,
& Prins, 2003], MoFa [Borgelt & Berthold, ICDM’02])
❑ Order of pattern discovery
❑ Path → tree → graph (e.g., GASTON [Nijssen & Kok, 2004])

53
Apriori-Based Approach
❑ The Apriori property (anti-monotonicity): If a (k+1)-edge subgraph is frequent, then all of its k-edge subgraphs are frequent
❑ A candidate (k+1)-edge (or (k+1)-vertex) subgraph is generated by joining two frequent k-edge (or k-vertex) subgraphs
❑ Iterative mining process:
  ❑ candidate generation → candidate pruning → support counting → candidate elimination

54
Candidate Generation:
Vertex Growing vs. Edge Growing
❑ Methodology: Breadth-search, Apriori joining two size-k graphs
❑ Many possibilities exist for generating size-(k+1) candidate graphs

❑ Generating new graphs with one more vertex


❑ AGM (Inokuchi, Washio, & Motoda, PKDD’00)
❑ Generating new graphs with one more edge
❑ FSG (Kuramochi & Karypis, ICDM’01)
❑ Performance studies show that edge growing is more efficient
55
Pattern-Growth Approach
❑ Depth-first growth of subgraphs: from k-edge to (k+1)-edge, then to (k+2)-edge subgraphs
❑ Major challenge: Many duplicate subgraphs may be generated
❑ Major idea to solve the problem
  ❑ Define an order in which subgraphs are generated
  ❑ DFS spanning tree: Flatten a graph into a sequence using depth-first search
❑ gSpan (Yan & Han, ICDM'02)

56
gSPAN: Graph Pattern Growth in Order
❑ Right-most path extension in subgraph pattern growth
  ❑ Right-most path: The path from the root to the right-most leaf (choose the vertex with the smallest index at each step)
  ❑ Reduces generation of duplicate subgraphs
❑ Completeness: The enumeration of graphs using right-most path extension is complete
❑ DFS code: Flatten a graph into an edge sequence using depth-first search
  ❑ Ex.: A five-vertex graph can be encoded by its DFS edge sequence e0: (0,1), e1: (1,2), e2: (2,3), e3: (3,0), e4: (2,4)

57
Why Mine Closed Graph Patterns?
❑ Challenge: An n-edge frequent graph may have 2^n subgraphs
❑ Motivation: Explore closed frequent subgraphs to handle graph
pattern explosion problem
❑ A frequent graph G is closed if there exists no supergraph of G that
carries the same support as G
If this subgraph is closed in the
graph dataset, it implies that
none of its frequent super-graphs
carries the same support

❑ Lossless compression: Does not contain non-closed graphs, but still


ensures that the mining result is complete
❑ Algorithm CloseGraph: Mines closed graph patterns directly

58
CloseGraph: Directly Mining Closed Graph Patterns
❑ CloseGraph: Mining closed graph patterns by extending gSpan (Yan & Han, KDD'03)
❑ Key question: Under what condition can we stop searching a pattern's children, i.e., terminate early?
  ❑ Suppose G and G1 are frequent, and G is a k-edge subgraph of the (k+1)-edge graph G1
  ❑ If, in every part of every graph in the dataset where G occurs, G1 also occurs, then we need not grow G (except in some special, subtle cases), since none of G's children will be closed except those of G1
59
Experiment and Performance Comparison
❑ The AIDS antiviral screen compound dataset from NCI/NIH
  ❑ The dataset contains 43,905 chemical compounds
❑ Discovered patterns: The smaller the minimum support, the bigger and more interesting the discovered subgraph patterns (e.g., at 20%, 10%, and 5% minimum support)
❑ [Charts: number of patterns (frequent vs. closed frequent graphs) and runtime (sec) of FSG, gSpan, and CloseGraph, for minimum support ranging from 0.05 to 0.1]
60


Advanced Frequent Pattern Mining
❑ Mining Diverse Patterns

❑ Sequential Pattern Mining

❑ Constraint-Based Frequent Pattern Mining

❑ Graph Pattern Mining

❑ Pattern Mining Application: Mining Software Copy-and-Paste Bugs

❑ Summary

61
Sequential Pattern Mining
❑ Sequential Pattern and Sequential Pattern Mining

❑ GSP: Apriori-Based Sequential Pattern Mining

❑ SPADE: Sequential Pattern Mining in Vertical Data Format

❑ PrefixSpan: Sequential Pattern Mining by Pattern-Growth

❑ CloSpan: Mining Closed Sequential Patterns

62
Sequence Databases & Sequential Patterns
❑ Sequential pattern mining has broad applications
❑ Customer shopping sequences
❑ Purchase a laptop first, then a digital camera, and then a smartphone,
within 6 months
❑ Medical treatments, natural disasters (e.g., earthquakes), science &
engineering processes, stocks and markets, ...
❑ Weblog click streams, calling patterns, …
❑ Software engineering: Program execution sequences, …
❑ Biological sequences: DNA, protein, …
❑ Transaction DB, sequence DB vs. time-series DB
❑ Gapped vs. non-gapped sequential patterns
❑ Shopping sequences, clicking streams vs. biological sequences
63
Sequential Pattern and Sequential Pattern Mining
❑ Sequential pattern mining: Given a set of sequences, find the complete set of frequent subsequences (i.e., those satisfying the min_sup threshold)
❑ A sequence database:
    SID  Sequence
    10   <a(abc)(ac)d(cf)>
    20   <(ad)c(bc)(ae)>
    30   <(ef)(ab)(df)cb>
    40   <eg(af)cbc>
❑ A sequence, e.g., <(ef)(ab)(df)cb>, is an ordered list of elements
  ❑ An element may contain a set of items (also called events)
  ❑ Items within an element are unordered, and we list them alphabetically
❑ <a(bc)dc> is a subsequence of <a(abc)(ac)d(cf)>
❑ Given support threshold min_sup = 2, <(ab)c> is a sequential pattern

64
Sequential Pattern Mining Algorithms
❑ Algorithm requirement: Efficient, scalable, finding complete set, incorporating
various kinds of user-specific constraints
❑ The Apriori property still holds: If a subsequence s1 is infrequent, none of s1's super-sequences can be frequent
❑ Representative algorithms
❑ GSP (Generalized Sequential Patterns): Srikant & Agrawal @ EDBT’96)
❑ Vertical format-based mining: SPADE (Zaki @ Machine Learning'00)
❑ Pattern-growth methods: PrefixSpan (Pei, et al. @TKDE’04)
❑ Mining closed sequential patterns: CloSpan (Yan, et al. @SDM’03)
❑ Constraint-based sequential pattern mining (to be covered in the constraint
mining section)
65
GSP: Apriori-Based Sequential Pattern Mining
❑ Initial candidates: All 8 singleton sequences
  ❑ <a>, <b>, <c>, <d>, <e>, <f>, <g>, <h>
❑ Scan DB once, count support for each candidate
    SID  Sequence                min_sup = 2
    10   <(bd)cb(ac)>
    20   <(bf)(ce)b(fg)>
    30   <(ah)(bf)abf>
    40   <(be)(ce)d>
    50   <a(bd)bcb(ade)>
  ❑ Candidate supports: <a>: 3, <b>: 5, <c>: 4, <d>: 3, <e>: 3, <f>: 2, <g>: 1, <h>: 1
❑ Generate length-2 candidate sequences (see the sketch after this slide)
  ❑ Two-element candidates <xy>: <aa>, <ab>, …, <af>, <ba>, …, <ff>
  ❑ One-element candidates <(xy)> with x < y: <(ab)>, <(ac)>, …, <(ef)>
❑ Without Apriori pruning (all 8 singletons): 8×8 + 8×7/2 = 92 length-2 candidates
❑ With pruning (only the 6 frequent singletons): 6×6 + 6×5/2 = 36 + 15 = 51 length-2 candidates
❑ GSP (Generalized Sequential Patterns): Srikant & Agrawal @ EDBT'96
66
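The candidate counts above (92 without pruning, 51 with only the frequent singletons) can be reproduced with a short sketch of the length-2 candidate generation (the tuple representation is illustrative):

from itertools import product, combinations

def length2_candidates(singletons):
    """GSP-style length-2 candidates: <xy> (two elements, repetition allowed)
    and <(xy)> (one element with two items, x < y)."""
    two_element = [(x, y) for x, y in product(singletons, repeat=2)]
    one_element = [((x, y),) for x, y in combinations(sorted(singletons), 2)]
    return two_element, one_element

all8 = "abcdefgh"
freq6 = "abcdef"
t, o = length2_candidates(all8)
print(len(t) + len(o))    # 92 = 64 + 28
t, o = length2_candidates(freq6)
print(len(t) + len(o))    # 51 = 36 + 15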
GSP Mining and Pruning
❑ Level-by-level scans:
    1st scan: 8 candidates → 6 length-1 seq. patterns: <a> <b> <c> <d> <e> <f> (<g> and <h> cannot pass the min_sup threshold)
    2nd scan: 51 candidates → 19 length-2 seq. patterns, e.g., <aa> <ab> … <af> <ba> <bb> … <ff> <(ab)> … <(ef)> (10 candidates are not in the DB at all)
    3rd scan: 46 candidates → 20 length-3 seq. patterns, e.g., <abb> <aab> <aba> <baa> <bab> … (20 candidates are not in the DB at all)
    4th scan: 8 candidates → 7 length-4 seq. patterns, e.g., <abba> <(bd)bc> …
    5th scan: 1 candidate → 1 length-5 seq. pattern: <(bd)cba>

    SID  Sequence                min_sup = 2
    10   <(bd)cb(ac)>
    20   <(bf)(ce)b(fg)>
    30   <(ah)(bf)abf>
    40   <(be)(ce)d>
    50   <a(bd)bcb(ade)>

❑ Repeat (for each level, i.e., length k)
  ❑ Scan DB to find length-k frequent sequences
  ❑ Generate length-(k+1) candidate sequences from length-k frequent sequences using Apriori
  ❑ Set k := k + 1
❑ Until no frequent sequence or no candidate can be found
67
Sequential Pattern Mining in Vertical Data
Format: The SPADE Algorithm
❑ A sequence database is mapped to vertical format: for each item, a list of <SID, EID> pairs
❑ Grow the subsequences (patterns) one item at a time by Apriori candidate generation
    SID  Sequence                min_sup = 2
    1    <a(abc)(ac)d(cf)>
    2    <(ad)c(bc)(ae)>
    3    <(ef)(ab)(df)cb>
    4    <eg(af)cbc>
❑ Ref: SPADE (Sequential PAttern Discovery using Equivalence classes) [M. Zaki, 2001]

68
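An illustrative sketch of the <SID, EID> mapping SPADE starts from, using the sequence database above (representation and names are my own; each element of a sequence receives an event ID):

from collections import defaultdict

# Map a horizontal sequence DB to vertical format: item -> list of (SID, EID).
# Each sequence is a list of elements; each element is a set of items.
seq_db = {
    1: [{"a"}, {"a", "b", "c"}, {"a", "c"}, {"d"}, {"c", "f"}],
    2: [{"a", "d"}, {"c"}, {"b", "c"}, {"a", "e"}],
    3: [{"e", "f"}, {"a", "b"}, {"d", "f"}, {"c"}, {"b"}],
    4: [{"e"}, {"g"}, {"a", "f"}, {"c"}, {"b"}, {"c"}],
}

vertical = defaultdict(list)
for sid, sequence in seq_db.items():
    for eid, element in enumerate(sequence, start=1):
        for item in element:
            vertical[item].append((sid, eid))

print(vertical["a"])   # [(1, 1), (1, 2), (1, 3), (2, 1), (2, 4), (3, 2), (4, 3)]
print(vertical["b"])   # [(1, 2), (2, 3), (3, 2), (3, 5), (4, 5)]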
PrefixSpan: A Pattern-Growth Approach
min_sup = 2
    SID  Sequence
    10   <a(abc)(ac)d(cf)>
    20   <(ad)c(bc)(ae)>
    30   <(ef)(ab)(df)cb>
    40   <eg(af)cbc>
❑ Prefix and suffix
  ❑ Given the sequence <a(abc)(ac)d(cf)>
    ❑ Prefixes: <a>, <aa>, <a(ab)>, <a(abc)>, …
    ❑ Suffix: the prefix-based projection, e.g.,
        Prefix <a>  → suffix <(abc)(ac)d(cf)>
        Prefix <aa> → suffix <(_bc)(ac)d(cf)>
        Prefix <ab> → suffix <(_c)(ac)d(cf)>
❑ PrefixSpan mining: prefix projections
  ❑ Step 1: Find length-1 sequential patterns: <a>, <b>, <c>, <d>, <e>, <f>
  ❑ Step 2: Divide the search space and mine each projected DB
    ❑ <a>-projected DB, <b>-projected DB, …, <f>-projected DB
❑ PrefixSpan (Prefix-projected Sequential pattern mining): Pei, et al. @ TKDE'04
69
PrefixSpan: Mining Prefix-Projected DBs
    SID  Sequence                min_sup = 2
    10   <a(abc)(ac)d(cf)>
    20   <(ad)c(bc)(ae)>
    30   <(ef)(ab)(df)cb>
    40   <eg(af)cbc>

❑ Length-1 sequential patterns: <a>, <b>, <c>, <d>, <e>, <f>
❑ Partition by prefix: <a>-projected DB, <b>-projected DB, …, <f>-projected DB
❑ <a>-projected DB:
    <(abc)(ac)d(cf)>
    <(_d)c(bc)(ae)>
    <(_b)(df)cb>
    <(_f)cbc>
  ❑ Length-2 sequential patterns with prefix <a>: <aa>, <ab>, <(ab)>, <ac>, <ad>, <af>
  ❑ Recurse on the <aa>-projected DB, …, the <af>-projected DB, …
❑ Major strengths of PrefixSpan:
  ◼ No candidate subsequences need to be generated
  ◼ Projected DBs keep shrinking
70
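A simplified, illustrative PrefixSpan-style sketch is shown below. It treats every sequence as a list of single items (so itemset elements such as "(ab)" and the "_" placeholder are ignored), which keeps the projection-and-count recursion easy to follow:

from collections import defaultdict

def prefix_span(projected_db, prefix, min_sup, results):
    # Count items that appear (at least once) in each projected suffix
    counts = defaultdict(int)
    for seq in projected_db:
        for item in set(seq):
            counts[item] += 1
    for item, cnt in counts.items():
        if cnt < min_sup:
            continue
        pattern = prefix + [item]
        results[tuple(pattern)] = cnt
        # Project: keep the part of each suffix after the first occurrence of item
        new_db = []
        for seq in projected_db:
            if item in seq:
                idx = seq.index(item)
                if idx + 1 < len(seq):
                    new_db.append(seq[idx + 1:])
        prefix_span(new_db, pattern, min_sup, results)

# The slide's sequences flattened into single-item sequences (elements ignored)
db = [list("aabcacdcf"), list("adcbcae"), list("efabdfcb"), list("egafcbc")]
results = {}
prefix_span(db, [], 2, results)
print(results[("a", "b", "c")])   # 4: every flattened sequence contains a ... b ... c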
Implementation Consideration: Pseudo-Projection vs.
Physical Projection
❑ Major cost of PrefixSpan: Constructing projected DBs
❑ Suffixes largely repeating in recursive projected DBs
❑ When DB can be held in main memory, use pseudo projection
❑ No physically copying suffixes
s = <a(abc)(ac)d(cf)>
❑ Pointer to the sequence <a>
❑ Offset of the suffix
s|<a>: ( , 2) <(abc)(ac)d(cf)>
❑ But if it does not fit in memory <ab>
❑ Physical projection s|<ab>: ( , 5) <(_c)(ac)d(cf)>
❑ Suggested approach:
❑ Integration of physical and pseudo-projection
❑ Swapping to pseudo-projection when the data fits in memory
71
END OF UNIT - II

72
