
Association Rules

CS 5331 by Rattikorn Hewett


Texas Tech University

Outline
n Association Rule Mining – Basic Concepts
n Association Rule Mining Algorithms:
¨ Single-dimensional Boolean associations
¨ Multi-level associations
¨ Multi-dimensional associations
n Association vs. Correlation
n Adding constraints
n Applications/extensions of frequent pattern mining
n Summary

Multiple-level Association Rules
Why?
n Hard to find strong associations among low conceptual
level data (e.g., less support counts for “skim milk” than
“milk”)
n Associations among high-level data are likely to be known
and uninteresting
n Easier to find interesting associations among items at
multiple conceptual levels, rather than only among single
level data

Approaches: uniform vs. reduced threshold


n Uniform min support
¨ uses same min support threshold at all levels
+ Search is simplified (dealing with one threshold) and
optimized (itemsets whose ancestors are infrequent need not be examined)
- If the threshold is set too high
→ might miss associations at low levels
if it is set too low
→ too many uninteresting associations

Uniform support (min_sup = 5% at both levels):
Level 1: Milk [support = 10%]
Level 2: 2% Milk [support = 6%], Skim Milk [support = 4%, infrequent]

Reduced support (min_sup = 5% at level 1, 3% at level 2):
Level 1: Milk [support = 10%]
Level 2: 2% Milk [support = 6%], Skim Milk [support = 4%, frequent]

Reduced Min Support
n Four strategies:
1. level-by-level: full breadth search on every node
2. level-cross filtering by single item: items are examined only
if their parents are frequent (e.g., do not examine 2% Milk and Skim Milk)
3. level-cross filtering by k-itemsets: examine only children of
frequent k-itemsets (e.g., the 2-itemset Milk & Bread is infrequent, so do not
examine its children)
Top level: min_sup = 5%; bottom level: min_sup = 3%

Strategy 2: Milk [support = 4%] is infrequent at the top level,
so 2% Milk [support = 3%] and Skim Milk [support = 2%] are not examined.

Strategy 3: the 2-itemset Milk & Bread [support = 4%] is infrequent,
so its children 2% Milk & wheat bread [support = 2%] and
Skim Milk & white bread [support = 1%] are not examined
(Milk itself has [support = 4%]).

Strategy 1 is too relaxed, 3 is too restrictive; 2 is like 3 but less
restricted because it filters with 1-itemsets.
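A minimal Python sketch of strategy 2 (level-cross filtering by single item); the hierarchy, support, and threshold names here are illustrative assumptions, not from any particular library:

def level_cross_filter(top_items, children, support, min_sups):
    """Examine an item only if its parent was frequent at the parent's
    level; min_sups holds one threshold per concept level."""
    frequent, level_items = [], list(top_items)
    for min_sup in min_sups:
        survivors = [i for i in level_items if support(i) >= min_sup]
        frequent.extend(survivors)
        # Children of infrequent parents are never examined
        # (e.g., 2% Milk and Skim Milk under an infrequent Milk).
        level_items = [c for p in survivors for c in children.get(p, [])]
    return frequent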

Reduced Min Support (cont)

4. Controlled level-cross filtering by single item: add a level
passage threshold (e.g., the user slides the level passage threshold
between 5% and 2% -- this can be done for each concept hierarchy)

Top level: min_sup = 5%
Milk [support = 4%]
Bottom level: min_sup = 2%
2% Milk [support = 3%]   Skim Milk [support = 2%]
→ Method 2 could miss associations such as 2% Milk ⇒ Skim Milk

Top level: min_sup = 5%, level-passage-sup = 4%
Milk [support = 4%] passes the level passage threshold
Bottom level: min_sup = 2%
2% Milk [support = 3%]   Skim Milk [support = 2%]

Flexible Support Constraints
n Why flexible support constraints?
¨ Real life occurrence frequencies vary greatly
n Diamond, watch, pens in a shopping basket
¨ Uniform support may not be an interesting model
n A flexible model
¨ Usually, lower level, more dimension combinations, and
longer pattern length → smaller support
¨ General rules should be easy to specify and understand
¨ Special items and special group of items may be
specified individually and have higher priority

Multi-Level Mining
n A top-down, progressive deepening approach:
¨ First mine high-level frequent items:
milk (15%), bread (10%)
¨ Then mine their lower-level “weaker” frequent itemsets:
skim milk (5%), wheat bread (4%)

n Different min_support threshold across multi-levels


lead to different algorithms:
¨ If adopting the same min_support across multi-levels
then toss t if any of t’s ancestors is infrequent.
¨ If adopting reduced min_support at lower levels
then examine only those descendents whose ancestor’s support
is frequent/non-negligible.

Redundancy checking
n Must check if the resulting rules from multi-
level association mining are redundant
E.g.,
1. Milk ⇒ Bread [support 8%, confidence 70%]
2. Skim Milk ⇒ Bread [support 2%, confidence 72%]
Suppose about 1/4 of milk sales are skim milk, then
Rule 1. can estimate that
Skim Milk ⇒ Bread [support = 1/4 of 8% = 2%, confidence 70%]
This makes Rule 2 “redundant” since it is close to what
is “expected” from Rule 1.
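A small sketch of this check in Python; the tolerance value and the assumption that the ancestor share (here 1/4) is known are illustrative:

def is_redundant(anc_sup, anc_conf, desc_sup, desc_conf, share, tol=0.05):
    """A descendant rule is redundant if its support and confidence are
    close to the values expected from its ancestor rule alone."""
    exp_sup = share * anc_sup   # e.g., 1/4 of 8% = 2%
    exp_conf = anc_conf         # confidence is expected to carry over
    return (abs(desc_sup - exp_sup) <= tol * exp_sup and
            abs(desc_conf - exp_conf) <= tol * exp_conf)

# Milk => Bread [8%, 70%]; Skim Milk => Bread [2%, 72%]; skim = 1/4 of milk
print(is_redundant(0.08, 0.70, 0.02, 0.72, 0.25))  # True: Rule 2 is redundant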

Outline
n Association Rule Mining – Basic Concepts
n Association Rule Mining Algorithms:
¨ Single-dimensional Boolean associations
¨ Multi-level associations
¨ Multi-dimensional associations
n Association vs. Correlation
n Adding constraints
n Applications/extensions of frequent pattern mining
n Summary


Multi-dimensional Associations
n Involve two or more dimensions (or predicates)
Example:
Single-dimensional rule: buys(X, “milk”) ⇒ buys(X, “bread”)
Multi-dimensional rule: age(X, “0..10”) ⇒ income(X, “0..2K”)
n Two types of multi-dimensional assoc. rules:
¨ Inter-dimension assoc. rules (no repeated predicates)
age(X,”19-25”) ∧ occupation(X,“student”) ⇒ buys(X,“coke”)
¨ hybrid-dimension assoc. rules (repeated predicates)
age(X,”19-25”) ∧ buys(X, “popcorn”) ⇒ buys(X, “coke”)
n Here we’ll deal with inter-dimension associations


Multi-dimension Mining
n Attribute types:
¨ Categorical: finite number of values, no ordering among values
¨ Quantitative: numeric, implicit ordering among values

n Techniques for mining multi-dimensional associations


¨ Search for frequent predicate sets (as opposed to frequent itemsets)
¨ Classified by how “quantitative” attributes are treated
E.g., {age, occupation, buys} is a 3-predicate set
Techniques can be categorized by how age values are treated


Multi-dimension Mining (MDM) Techniques
1. Concept-based
¨ Quantitative attribute values are treated as predefined
categories/ranges
¨ Discretization occurs prior to mining using predefined concept
hierarchies
2. Distribution-based
¨ Quantitative attribute values are treated as quantities to satisfy
some criteria (e.g., max confidence)
¨ Discretization occurs during mining process using “bins” based
on the distribution of the data
3. Distance-based
¨ Quantitative attribute values are treated as quantities to capture
meaning of interval data
¨ Discretization occurs during mining process using the distance
between data points

Concept-based MDM
n Numeric values are replaced by ranges or predefined concepts
n Two approaches depending on how data are stored:
¨ Relational tables
n Modify the Apriori to finding all frequent predicate sets

n Finding k-predicate sets will require k or k+1 table scans.

¨ Data cubes
n Well suited since data cubes are multi-dimensional structures

n The cells of n-D cuboid store support/confidence of n-


predicate sets (cuboids represent aggregated dimensions)
n To reduce candidates generated, apply the Apriori principle :
every subset of frequent predicate set must be frequent


Distribution-based MDM
n Unlike concept-based approach, numeric attribute values are
dynamically discretized to meet some criteria
¨ Example of discretization: binning
n Equiwidth: same interval size
n Equidepth: same number of data points in each bin
n Homogeneity-based: data points in each bin are uniformly distributed
¨ Example of criteria:
n Compact
n Strong rules (i.e., high confidence/support)

n Resulting rules are referred to as Quantitative Association Rules


n Consider a 2-D quantitative association rule: Aquan1 ∧ Bquan2 ⇒ Ccat
E.g., age(X,“30-39”) ∧ income(X, “40K-44K”) ⇒ buys(X, “HD TV”)


Distribution-based MDM - Example


ARCS – Association Rule Clustering System
n For each quantitative attribute, discretize the numeric values based on
the data distribution, e.g., by binning techniques Income 2 3 11 4
30-34K
¨ 2-D table of the resulting bins of the two
35-39K 5 20 23 6
quantitative attributes on LHS of the rule
¨ Each cell holds count distribution in each 40-44K 1 33 46 9

category of the attribute on the RHS of the rule 45-49K 2 12 10 14

n Finding frequent predicate sets 25-29 30-34 35-39 40-44


Age
¨ Generate strong associations (same as in Apriori)
age(X,“30-34”) ∧ income(X, “40K-44K”) ⇒ buys(X, “HD TV”)
age(X,“35-39”) ∧ income(X, “40K-44K”) ⇒ buys(X, “HD TV”)

n Simplify resulting rules


¨ Rule “clusters” ( here in “grids”) are further combined
age(X,“30-39”) ∧ income(X, “40K-44K”) ⇒ buys(X, “HD TV”)
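A minimal sketch of the rule-combining step, under the simplifying assumption that only adjacent age bins sharing the same income bin and RHS are merged (the function name is illustrative):

def merge_adjacent_bins(strong_bins):
    """Combine runs of adjacent (lo, hi) age bins that each already form
    a strong rule with the same income bin and RHS."""
    merged = []
    for lo, hi in sorted(strong_bins):
        if merged and lo <= merged[-1][1] + 1:  # adjacent or overlapping
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return merged

# The two strong rules for ages 30-34 and 35-39 combine into one 30-39 rule:
print(merge_adjacent_bins([(30, 34), (35, 39)]))  # [(30, 39)]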


Distance-based MDM
n Binning methods do not capture the semantics of interval
data, e.g., Price ($): 7 20 22 50 51 53
Equi-width Equi-depth Distance-
(width $10) (depth 2) based
[0,10] [7,20] [7,7]
[11,20] [22,50] [20,22]
[21,30] [51,53] [50,53]
[31,40]
[41,50]
[51,60]
n Distance-based partitioning, more meaningful discretization
considering:
¨ density/number of points in an interval
¨ “closeness” of points in an interval


Distance-based MDM (contd)


n Distance measures: e.g., two points (x1,x2,x3) and (t1,t2,t3)
¨ Euclidean 3
∑ i =1
( xi − ti ) 2
¨ Manhattan 3
| xi − ti |
n Two phases: ∑ i =1

¨ Identify clusters (Ch 8)


n Data points in each cluster satisfy both frequency threshold
and density threshold ~ support
¨ Obtain association rules
n Define degree of association ~ confidence, e.g., the Manhattan
distance between cluster centroids (a centroid is the average of the
data points in the cluster)
n Three conditions:
¨ Clusters in the LHS are each strongly associated with each cluster in the RHS
¨ Clusters in LHS collectively occur together
¨ Clusters in RHS collectively occur together
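Minimal sketches of the two distance measures above:

import math

def euclidean(x, t):
    # square root of the sum of squared coordinate differences
    return math.sqrt(sum((xi - ti) ** 2 for xi, ti in zip(x, t)))

def manhattan(x, t):
    # sum of absolute coordinate differences
    return sum(abs(xi - ti) for xi, ti in zip(x, t))

print(euclidean((1, 2, 3), (4, 6, 3)))  # 5.0
print(manhattan((1, 2, 3), (4, 6, 3)))  # 7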

Outline
n Association Rule Mining – Basic Concepts
n Association Rule Mining Algorithms:
¨ Single-dimensional Boolean associations
¨ Multi-level associations
¨ Multi-dimensional associations
n Association vs. Correlation
n Adding constraints
n Applications/extensions of frequent pattern mining
n Summary


Association & Correlation analysis


Basketball Not basketball Sum (row)

Cereal 400 350 750

Not cereal 200 50 250

Sum(col.) 600 400 1000

n Suppose: min support 20%, min confidence = 50%


n Probability of buying cereal = 750/1000 = 75%
n basketball ⇒ cereal [400/1000 = 40%, 400/600 = 66.7%]
The chance of buying cereal (even without this rule) is already higher than 66.7%
→ the implication of this rule is not interesting
a “strong” rule (high conf) but “uninformative” (prob. of RHS > conf)


Association & Correlation analysis (contd)
Basketball Not basketball Sum (row)

Cereal 400 350 750

Not cereal 200 50 250

Sum(col.) 600 400 1000

n Define corr_{A,B} = P(A ∩ B) / ( P(A) P(B) )
= 1 if A and B are independent
< 1 if A and B are negatively correlated
> 1 if A and B are positively correlated
Note corr_{A,B} = Lift(A ⇒ B). Does this notation fit the definition?

n Corrbasketball, cereal = (400/1000)/[(600/1000)(750/1000)] = 0.89


€à basketball and cereal are negatively correlated
n Corrbasketball, not cereal = (200/1000)/[(600/1000)(250/1000)] = 1.3
à basketball and not cereal are positively correlated
But basketball ⇒ not cereal [200/1000 = 20%,200/600 = 33.3%]
“Not strong” but “informative” (prob of not buying cereal only 25%)
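A quick sketch that reproduces these numbers from the contingency table:

def lift(n_ab, n_a, n_b, n):
    """corr(A,B) = P(A and B) / (P(A) * P(B)) = Lift(A => B)."""
    return (n_ab / n) / ((n_a / n) * (n_b / n))

# 600 basketball, 750 cereal, 400 both; 250 not-cereal, 200 both; n = 1000
print(round(lift(400, 600, 750, 1000), 2))  # 0.89 -> negatively correlated
print(round(lift(200, 600, 250, 1000), 2))  # 1.33 -> positively correlated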

Association & Correlation analysis (contd)

n Association and Correlation are not the same


¨ basketball ⇒ cereal -ve corr:
P(A & B) < P(A) P(B)
strong
Informative:
uninformative & -vely correlated P(B) < Conf(AàB)
P(B) < P(A & B)/P(A)
¨ basketball ⇒ not cereal P(B)P(A) < P(A & B)
not strong -ve corr = uninformative
informative & +vely correlated
Can LHS and RHS of a rule be negatively
correlated and yet the rule is informative?


Association & Correlation analysis (contd)

n Association and Correlation are not the same


n Mining of correlated rules
¨ I.e., rules involve correlated itemsets (instead of frequent itemsets)
¨ Correlation value of a set of items can be calculated (cf. corrA,B)
¨ Use the χ2 statistic to test if the correlation value is statistically
significant
¨ Upward closed property – if A has the property, so does every superset of A
n Correlation is upward closed (if A is a correlated itemset, so is its superset)
n χ2 is upward closed (within each significance level)
¨ Search upward for correlated itemsets starting from an empty set
to find minimal correlated item sets
n In datacube – random walk algorithms are used
n In general – still an open problem when dealing with large dimensions
n See also [Brin et al., 97] Is “frequent itemset” upward closed?

Outline
n Association Rule Mining – Basic Concepts
n Association Rule Mining Algorithms:
¨ Single-dimensional Boolean associations
¨ Multi-level associations
¨ Multi-dimensional associations
n Association vs. Correlation
n Adding constraints
n Applications/extensions of frequent pattern mining
n Summary


Constraint-based Mining
n Finding all the patterns in a database
autonomously? — unrealistic!
¨ The patterns could be too many but not focused!
n Constraint-based mining allows
¨ Specification of constraints on what is to be mined
→ more effective mining, e.g.,
Metarule: template A(x,y) ∧ B(x,w) ⇒ buys(x, “HD TV”) to guide the search
Rule constraint: small sales (price < $10) trigger big sales (sum > $200)
¨ System optimization
→ more efficient mining, e.g., data mining query optimization
n Constraint-based mining aims to reduce search and
find all answers that satisfy a given constraint

Constrained Frequent Pattern Mining


A Mining Query Optimization Problem
n Given a frequent pattern mining query with a set of constraints C, the
algorithm should be
¨ sound: it only finds frequent sets that satisfy C
Which is harder?
¨ complete: all frequent sets satisfying C are found
n A naïve solution:
¨ First find all frequent sets, and then test them for constraint
satisfaction
n More efficient approaches:
¨ Analyze the properties of constraints comprehensively
¨ Push them as deeply as possible inside the frequent pattern
computation and still ensure completeness of the answer.

What kind of rule constraints can be pushed as above?



Rule constraints
n Types of rule constraints:
¨ Anti-monotone
¨ Monotone
¨ Succinct
¨ Convertible
¨ Inconvertible
n The first four types can be pushed in the mining process to
improve efficiency without losing completeness of the
answers


(Anti-)monotone constraints
c = a rule constraint
A = an itemset, B = a proper superset of A
n Monotone: A satisfies c → any B satisfies c
n Anti-monotone: A doesn't satisfy c → no B satisfies c

Item  Profit
a      40
b       0
c     -20
d      10
e     -30
f      30
g      20
h     -10

Examples:
n sum(A.Price) ≥ v is monotone
n min(A.Price) ≤ v is monotone
n sum(A.Price) ≤ v is anti-monotone
n min(A.Price) ≥ v is anti-monotone
n C: range(A.profit) ≤ 15 is anti-monotone
¨ Itemset ab violates C (range = 40 > 15)
¨ So does every superset of ab
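A minimal sketch of the anti-monotone check for C on the profit table above:

profit = {'a': 40, 'b': 0, 'c': -20, 'd': 10, 'e': -30,
          'f': 30, 'g': 20, 'h': -10}

def satisfies_range(itemset, v=15):
    values = [profit[i] for i in itemset]
    return max(values) - min(values) <= v

# Anti-monotone: once a candidate violates C, every superset can be pruned.
print(satisfies_range('dg'))  # True  (range = 10): keep and extend
print(satisfies_range('ab'))  # False (range = 40): prune ab and all supersets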


Succinct constraints
n Succinct: there is a “formula” to generate precisely all itemsets
satisfying the constraint
¨ itemsets satisfying the constraint can be enumerated before support
counting starts
Item Price
¨ Succinct constraints are pre-counting prunable
a 40
Examples: b 10
n c: max(A.Price) ≥ 20 is monotone and succinct c 22
An itemset satisfies c is of the form A1 ∪ A2, where d 25
A2 is {b} - a set (can be empty) of items with prices ≤ v e 30
A1 is a non-empty subset of {a, c, d, e} - a set of items with prices ≥ v
n min(A.Price) ≤ v is succinct and monotone
TID Transaction
n sum(A.Price) ≤ v is not succinct but anti-monotone 10 a, b, c, d
n sum(A.Price) ≥ v is not succinct but monotone 20 a, c, d
30 a, b, d
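A small sketch that enumerates, before any support counting, exactly the itemsets satisfying max(A.Price) ≥ 20 (v = 20) from the price table above, following the A1 ∪ A2 form; names are illustrative:

from itertools import chain, combinations

price = {'a': 40, 'b': 10, 'c': 22, 'd': 25, 'e': 30}
v = 20
high = [i for i in price if price[i] >= v]  # {a, c, d, e}
low = [i for i in price if price[i] < v]    # {b}

def powerset(items):
    return chain.from_iterable(combinations(items, r)
                               for r in range(len(items) + 1))

# Every satisfying itemset is a non-empty A1 from `high` plus any A2 from `low`.
satisfying = [set(a1) | set(a2)
              for a1 in powerset(high) if a1
              for a2 in powerset(low)]
print(len(satisfying))  # 30 = (2^4 - 1) * 2^1 candidates to count support for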


The Apriori Algorithm — Example


Min support = 2

Database D:
TID   Items
100   1 3 4
200   2 3 5
300   1 2 3 5
400   2 5

Scan D → C1: {1}:2  {2}:3  {3}:3  {4}:1  {5}:3
L1: {1}:2  {2}:3  {3}:3  {5}:3

C2: {1 2} {1 3} {1 5} {2 3} {2 5} {3 5}
Scan D → C2 counts: {1 2}:1  {1 3}:2  {1 5}:1  {2 3}:2  {2 5}:3  {3 5}:2
L2: {1 3}:2  {2 3}:2  {2 5}:3  {3 5}:2

C3: {2 3 5};  Scan D → L3: {2 3 5}:2
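A compact Python sketch of the same level-wise computation (self-join candidate generation plus the Apriori subset check); it is illustrative, not an optimized implementation:

from itertools import combinations

D = {100: {1, 3, 4}, 200: {2, 3, 5}, 300: {1, 2, 3, 5}, 400: {2, 5}}

def apriori(db, min_sup):
    support = lambda s: sum(1 for t in db.values() if s <= t)
    items = sorted({i for t in db.values() for i in t})
    L = [frozenset([i]) for i in items if support(frozenset([i])) >= min_sup]
    frequent, k = list(L), 2
    while L:
        # Join L_{k-1} with itself; keep size-k unions all of whose
        # (k-1)-subsets are frequent (the Apriori principle).
        C = {a | b for a in L for b in L if len(a | b) == k}
        C = [c for c in C
             if all(frozenset(s) in set(L) for s in combinations(c, k - 1))]
        L = [c for c in C if support(c) >= min_sup]  # one scan of D per level
        frequent += L
        k += 1
    return frequent

print(sorted(sorted(s) for s in apriori(D, 2)))
# 9 frequent itemsets; the largest is [2, 3, 5] with support 2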

Naïve: Apriori + Constraint: sum(S.price) < 5
(the price of item k is k)

Run Apriori exactly as on the previous slide, then test the resulting
frequent itemsets against the constraint: of {1}, {2}, {3}, {5}, {1 3},
{2 3}, {2 5}, {3 5}, and {2 3 5}, only {1}, {2}, {3}, and {1 3}
(sum = 4) satisfy sum(S.price) < 5.

[Apriori trace as on the previous slide]

Pushing the constraint: sum(S.price) < 5 (the price of item k is k)

With positive prices, sum(S.price) < 5 is anti-monotone, so it can be
checked during candidate generation: {5} (sum = 5) already violates the
constraint, so {5} and every superset of it are pruned before any
further counting; likewise {2 3} (sum = 5) is never counted. The
frequent itemsets satisfying the constraint are {1}, {2}, {3}, and {1 3}.

[Apriori trace as on the previous slide, with the violating candidates
pruned]
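A sketch of pushing an anti-monotone constraint into the candidate-generation step, reusing the structure of the Apriori sketch above (the subset-pruning step is omitted for brevity; names are illustrative):

def apriori_with_antimonotone(db, min_sup, constraint):
    """Apriori where candidates violating an anti-monotone constraint
    are discarded before support counting; completeness is preserved."""
    support = lambda s: sum(1 for t in db.values() if s <= t)
    items = sorted({i for t in db.values() for i in t})
    L = [frozenset([i]) for i in items
         if constraint(frozenset([i])) and support(frozenset([i])) >= min_sup]
    frequent, k = list(L), 2
    while L:
        C = {a | b for a in L for b in L if len(a | b) == k}
        C = [c for c in C if constraint(c)]  # push the constraint here
        L = [c for c in C if support(c) >= min_sup]
        frequent += L
        k += 1
    return frequent

# sum(S.price) < 5 with price(k) = k: {5} is pruned before any counting.
D = {100: {1, 3, 4}, 200: {2, 3, 5}, 300: {1, 2, 3, 5}, 400: {2, 5}}
answers = apriori_with_antimonotone(D, 2, lambda s: sum(s) < 5)
print(sorted(sorted(s) for s in answers))  # [[1], [1, 3], [2], [3]]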

Pushing a Succinct Constraint: min(S.price) ≤ 1

min(S.price) ≤ 1 is succinct: the satisfying itemsets are exactly those
containing item 1, so they can be enumerated before support counting
starts, and only candidates containing item 1 are ever generated and
counted. With min support = 2, the answers are {1} and {1 3}.

[Apriori trace as on the base example slide, restricted to itemsets
containing item 1]

Convertible constraints
n Constraints that can become anti-monotone or monotone when items
in itemsets are ordered in a certain way Item Profit
a 40
Example: b 0
c -20
C: avg(S.profit) ≥ 15 d 10
e -30
n C is not anti-monotone nor monotone f 30
g 20
n If Items are added in value-descending order h -10

<a, f, g, d, b, h, c, e>
40 30 20 10 0 -10 -20 -30
ascending dg satisfies C, so does dg*
gb violates C, so does gbh, and
gb* (note * = strings representing itemsets with each item value ≤ b’s value)
à C becomes anti-monotone
ascending monotone
n C with respect to value-descending order is anti-monotone convertible

Strongly Convertible Constraints
n avg(X) ≥ 15 is convertible anti-monotone w.r.t. item
value descending order R: <a, f, g, d, b, h, c, e>

n avg(X) ≥ 15 is convertible monotone w.r.t. item


value ascending order R⁻¹: <e, c, h, b, d, g, f, a>

n We say, avg(X) ≥ 15 is strongly convertible


More examples
Constraint                                   Convertible     Convertible   Strongly
                                             anti-monotone   monotone      convertible
avg(S) ≤ v, ≥ v                              Yes             Yes           Yes
median(S) ≤ v, ≥ v                           Yes             Yes           Yes
sum(S) ≤ v (items of any value, v ≥ 0)       Yes             No            No
sum(S) ≤ v (items of any value, v ≤ 0)       No              Yes           No
sum(S) ≥ v (items of any value, v ≥ 0)       No              Yes           No
sum(S) ≥ v (items of any value, v ≤ 0)       Yes             No            No
……


Common SQL-based constraints
Constraint Antimonotone Monotone Succinct
v∈S no yes yes
S ⊇ V no yes yes
S⊆V yes no yes
min(S) ≤ v no yes yes
min(S) ≥ v yes no yes
max(S) ≤ v yes no yes

max(S) ≥ v no yes yes


count(S) ≤ v yes no weakly
count(S) ≥ v no yes weakly
sum(S) ≤ v ( a ∈ S, a ≥ 0 ) yes no no
sum(S) ≥ v ( a ∈ S, a ≥ 0 ) no yes no
range(S) ≤ v yes no no
range(S) ≥ v no yes no
avg(S) θ v, θ ∈ { =, ≤, ≥ } convertible convertible no
support(S) ≥ ξ yes no no
support(S) ≤ ξ no yes no

Classification of Constraints

(Diagram: the constraint classes Antimonotone, Monotone, Succinct,
Convertible anti-monotone, and Convertible monotone; Strongly
convertible is the intersection of the two convertible classes, and
Inconvertible lies outside all of them.)


Mining with convertible constraints
n C: avg(S.profit) ≥ 25

TID   Transaction              Item  sup  Profit
10    a, b, c, d, f            a     2     40
20    b, c, d, f, g            f     4     30
30    a, c, d, e, f            g     2     20
40    c, e, f, g, h            d     3     10
                               b     2      0
                               h     1    -10
                               c     4    -20
                               e     2    -30

n List the items in every transaction in value-descending
order R: <a, f, g, d, b, h, c, e>
¨ C is convertible anti-monotone w.r.t. R
n Scan the transaction DB once
¨ remove infrequent items: drop h

TID   Transaction (in R order)
10    a, f, d, b, c
20    f, g, d, b, c
30    a, f, d, c, e
40    f, g, h, c, e

n C can't be pushed into the level-wise framework
¨ Itemset df violates C - we want to prune it
¨ But since adf satisfies C, Apriori needs df to assemble adf,
so df cannot be pruned
n But C can be pushed into the frequent-pattern
growth framework!

Recap: Constraint-based mining


n All types of rule constraints but inconvertible can be used to
guide the mining process to improve mining efficiency
n Anti-monotone constraints can be applied at each iteration
of Apriori-like algorithms while guaranteeing completeness
¨ Pushing non-anti-monotone constraints into the mining process will
not guarantee completeness
n Itemsets satisfy succinct constraints can be determined
before support counting begins
¨ no need to iteratively check the rule constraint during the mining
process
¨ succinct constraints are pre-computing pushable
n Convertible constraints can’t be pushed in level-wise
mining algorithm such as Apriori 40

Handling Multiple Constraints
n Different constraints may require different or even
conflicting item-ordering
n If there exists an order R s.t. both C1 and C2 are convertible
w.r.t. R, then there is no conflict between the two
convertible constraints
n If there exists conflict on order of items
¨ Try to satisfy one constraint first
¨ Then using the order for the other constraint to mine frequent
itemsets in the corresponding projected database


Outline
n Association Rule Mining – Basic Concepts
n Association Rule Mining Algorithms:
¨ Single-dimensional Boolean associations
¨ Multi-level associations
¨ Multi-dimensional associations
n Association vs. Correlation
n Adding constraints
n Applications/extensions of frequent pattern mining
n Summary


Extensions/applications
n The following is not an exhaustive list
n Some topics are likely to be assigned for
your presentations in the second half of this
class


Sequential Pattern Mining


n Sequence data vs. Time-series data
¨ sequences of ordered events (with or without explicit notion of time)
¨ sequences of values/events typically measured at equal time intervals
n Time-series data are sequence data but not viz.
n Sequential Pattern mining
¨ Deals with frequent sequential patterns (as opposed to frequent patterns)
¨ Problem: given a set of sequences, find the complete set of frequent
subsequences

n Applications of sequential pattern mining


¨ Customer shopping sequences, e.g., First buy computer, then CD-ROM,
and then digital camera, within 3 months.
¨ Medical treatment, natural disasters (e.g., earthquakes), science &
engineering processes, stocks and markets, etc.
¨ Telephone calling patterns, Weblog click streams
¨ DNA sequences and gene structures

Studies on Sequential Pattern Mining
n Concept introduction and an initial Apriori-like algorithm
¨ R. Agrawal & R. Srikant. “Mining sequential patterns,” ICDE’95
n GSP—An Apriori-based, influential mining method (developed at IBM
Almaden)
¨ R. Srikant & R. Agrawal. “Mining sequential patterns: Generalizations and
performance improvements,” EDBT’96
n From sequential patterns to episodes (Apriori-like + constraints)
¨ H. Mannila, H. Toivonen & A.I. Verkamo. “Discovery of frequent episodes
in event sequences,” Data Mining and Knowledge Discovery, 1997
n Mining sequential patterns with constraints
¨ M.N. Garofalakis, R. Rastogi, K. Shim: SPIRIT: Sequential Pattern Mining
with Regular Expression Constraints. VLDB 1999


Classification-Based on Associations
n Mine association possible rules (PR) in form of
condset è c
¨ Condset: a set of attribute-value pairs
¨ C: class label
n Build Classifier
¨ Organize rules according to decreasing precedence
based on confidence and support
n B. Liu, W. Hsu & Y. Ma. Integrating classification and
association rule mining. In KDD’98


Iceberg Cube computation
n It is too costly to materialize a high dimen. cube
¨ 20 dimensions each with 99 distinct values may lead to 10020 cube cells
¨ Even if there is only one nonempty cell in each 1010 cells, the cube will still
contain 1030 nonempty cells
n Observation: Trivial cells are usually not interesting
¨ Nontrivial: large volume of sales, or high profit
n Solution:
¨ Iceberg cube—materialize only nontrivial cells of a data cube – cf.
tip of the iceberg
¨ Computation: Based on Apriori-like pruning, e.g.,
n BUC [Bayer & Ramakrishnan, 99]
n bottom-up cubing, efficient bucket-sort alg.
n Only handles anti-monotonic iceberg cubes
¨ If a cell c violates the HAVING clause, so do all more specific cells
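A toy sketch of this pruning idea, assuming a COUNT-based HAVING clause; for brevity it refines only prefix group-bys (state, then state+city, ...) rather than all 2^n cuboids, and all names are illustrative:

def iceberg(rows, n_dims, min_count, depth=0, prefix=()):
    """Partition on one more dimension at a time; a partition whose count
    already violates HAVING COUNT(*) >= min_count is never refined,
    since all of its more specific cells must violate it too."""
    if depth == n_dims:
        return {}
    cells, partitions = {}, {}
    for r in rows:
        partitions.setdefault(r[depth], []).append(r)
    for value, part in partitions.items():
        if len(part) >= min_count:
            cell = prefix + (value,)
            cells[cell] = len(part)
            cells.update(iceberg(part, n_dims, min_count, depth + 1, cell))
    return cells

rows = [("TX", "Dallas", "TV"), ("TX", "Dallas", "TV"),
        ("TX", "Austin", "PC"), ("CA", "LA", "TV")]
print(iceberg(rows, 3, 2))
# {('TX',): 3, ('TX', 'Dallas'): 2, ('TX', 'Dallas', 'TV'): 2}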


Spatial and Multi-Media Association


A Progressive Refinement Method: Why?
n Mining operator can be expensive or cheap, fine or rough
¨ Trade speed with quality: step-by-step refinement.
n Superset coverage property:
¨ Preserve all the positive answers—allow a false positive
but not a false negative.
n Two- or multi-step mining:
¨ First apply rough/cheap operator (superset coverage)
¨ Then apply expensive algorithm on a substantially
reduced candidate set (Koperski & Han, SSD’95).


Spatial Associations
n Hierarchy of spatial relationship:
¨ “g_close_to”: near_by, touch, intersect, contain, etc.
¨ First search for rough relationship and then refine it.
n Two-step mining of spatial association:
¨ Step1: rough spatial computation (as a filter)
¨ Step2: Detailed spatial algorithm (as refinement)
n Apply only to those objects which have passed the rough
spatial association test (no less than min_support)


Mining Multimedia Associations


Correlations with color, spatial relationships, etc.
From coarse to fine resolution mining


Outline
n Association Rule Mining – Basic Concepts
n Association Rule Mining Algorithms:
¨ Single-dimensional Boolean associations
¨ Multi-level associations
¨ Multi-dimensional associations
n Association vs. Correlation
n Adding constraints
n Applications/extensions of frequent pattern mining
n Summary


Achievements
n Frequent pattern mining—an important task in data mining
n Frequent pattern mining methodology
¨ Candidate generation-test vs. projection-based (frequent-pattern growth)
¨ Vertical vs. horizontal format (itemsets vs. transaction sets)
¨ Various optimization methods: database partition, scan reduction, hash
tree, sampling, border computation, clustering, etc.
n Related frequent pattern mining algorithm: scope extension
¨ Mining closed frequent itemsets and max-patterns (e.g., MaxMiner,
CLOSET, CHARM, etc.)
¨ Mining multi-level, multi-dimensional frequent patterns with flexible
support constraints
¨ Constraint pushing for mining optimization
¨ From frequent patterns to correlation and causality


Applications
n Related problems which need frequent pattern mining
¨ Association-based classification
¨ Iceberg cube computation
¨ Database compression by frequent patterns
¨ Mining sequential patterns (GSP, PrefixSpan, SPADE, etc.)
n Mining partial periodicity, cyclic associations, etc.
n Mining frequent structures, trends, etc.
n Typical application examples
¨ Market-basket analysis, Weblog analysis, DNA mining,
etc.


Some Research Problems


n Multi-dimensional gradient analysis: patterns regarding
changes and differences
¨ Not just counts—other measures, e.g., avg(profit)
n Mining top-k frequent patterns without support constraint
n Partial periodic patterns
n DNA sequence analysis and pattern classification


References
Frequent-pattern Mining Methods
n R. Agarwal, C. Aggarwal, and V. V. V. Prasad. A tree projection algorithm for
generation of frequent itemsets. Journal of Parallel and Distributed Computing, 2000.
n R. Agrawal, T. Imielinski, and A. Swami. Mining association rules between sets of
items in large databases. SIGMOD'93, 207-216, Washington, D.C.
n R. Agrawal and R. Srikant. Fast algorithms for mining association rules. VLDB'94
487-499, Santiago, Chile.
n J. Han, J. Pei, and Y. Yin: “Mining frequent patterns without candidate generation”. In
Proc. ACM-SIGMOD’2000, pp. 1-12, Dallas, TX, May 2000.
n H. Mannila, H. Toivonen, and A. I. Verkamo. Efficient algorithms for discovering
association rules. KDD'94, 181-192, Seattle, WA, July 1994.


References
Frequent-pattern Mining Methods
n A. Savasere, E. Omiecinski, and S. Navathe. An efficient algorithm for mining
association rules in large databases. VLDB'95, 432-443, Zurich, Switzerland.
n C. Silverstein, S. Brin, R. Motwani, and J. Ullman. Scalable techniques for mining
causal structures. VLDB'98, 594-605, New York, NY.
n R. Srikant and R. Agrawal. Mining generalized association rules. VLDB'95, 407-419,
Zurich, Switzerland, Sept. 1995.
n R. Srikant and R. Agrawal. Mining quantitative association rules in large relational
tables. SIGMOD'96, 1-12, Montreal, Canada.
n H. Toivonen. Sampling large databases for association rules. VLDB'96, 134-145,
Bombay, India, Sept. 1996.
n M.J. Zaki, S. Parthasarathy, M. Ogihara, and W. Li. New algorithms for fast discovery
of association rules. KDD’97. August 1997.


References
Performance Improvements
n S. Brin, R. Motwani, J. D. Ullman, and S. Tsur. Dynamic itemset counting and
implication rules for market basket analysis. SIGMOD'97, Tucson, Arizona, May 1997.
n D.W. Cheung, J. Han, V. Ng, and C.Y. Wong. Maintenance of discovered association
rules in large databases: An incremental updating technique. ICDE'96, New Orleans,
LA.
n T. Fukuda, Y. Morimoto, S. Morishita, and T. Tokuyama. Data mining using two-
dimensional optimized association rules: Scheme, algorithms, and visualization.
SIGMOD'96, Montreal, Canada.
n E.-H. Han, G. Karypis, and V. Kumar. Scalable parallel data mining for association
rules. SIGMOD'97, Tucson, Arizona.
n J.S. Park, M.S. Chen, and P.S. Yu. An effective hash-based algorithm for mining
association rules. SIGMOD'95, San Jose, CA, May 1995.


References
Performance Improvements
n G. Piatetsky-Shapiro. Discovery, analysis, and presentation of strong rules. In G.
Piatetsky-Shapiro and W. J. Frawley, Knowledge Discovery in Databases. AAAI/MIT
Press, 1991.
n J.S. Park, M.S. Chen, and P.S. Yu. An effective hash-based algorithm for mining
association rules. SIGMOD'95, San Jose, CA.
n S. Sarawagi, S. Thomas, and R. Agrawal. Integrating association rule mining with
relational database systems: Alternatives and implications. SIGMOD'98, Seattle, WA.
n K. Yoda, T. Fukuda, Y. Morimoto, S. Morishita, and T. Tokuyama. Computing optimized
rectilinear regions for association rules. KDD'97, Newport Beach, CA, Aug. 1997.
n M. J. Zaki, S. Parthasarathy, M. Ogihara, and W. Li. Parallel algorithm for discovery of
association rules. Data Mining and Knowledge Discovery, 1:343-374, 1997.


References
Multi-level, correlation, ratio rules, etc
n S. Brin, R. Motwani, and C. Silverstein. Beyond market basket: Generalizing association
rules to correlations. SIGMOD'97, 265-276, Tucson, Arizona.
n J. Han and Y. Fu. Discovery of multiple-level association rules from large databases.
VLDB'95, 420-431, Zurich, Switzerland.
n M. Klemettinen, H. Mannila, P. Ronkainen, H. Toivonen, and A.I. Verkamo. Finding
interesting rules from large sets of discovered association rules. CIKM'94, 401-408,
Gaithersburg, Maryland.
n F. Korn, A. Labrinidis, Y. Kotidis, and C. Faloutsos. Ratio rules: A new paradigm for fast,
quantifiable data mining. VLDB'98, 582-593, New York, NY
n B. Lent, A. Swami, and J. Widom. Clustering association rules. ICDE'97, 220-231,
Birmingham, England.
n R.J. Miller and Y. Yang. Association rules over interval data. SIGMOD'97, 452-461,
Tucson, Arizona.
n A. Savasere, E. Omiecinski, and S. Navathe. Mining for strong negative associations in a
large database of customer transactions. ICDE'98, 494-502, Orlando, FL, Feb. 1998.
n J. Pei, A.K.H. Tung, J. Han. Fault-Tolerant Frequent Pattern Mining: Problems and
Challenges. SIGMOD DMKD’01, Santa Barbara, CA.


References
Mining Max-patterns and Closed itemsets
n R. J. Bayardo. Efficiently mining long patterns from databases. SIGMOD'98, 85-93,
Seattle, Washington.
n J. Pei, J. Han, and R. Mao, "CLOSET: An Efficient Algorithm for Mining Frequent
Closed Itemsets", Proc. 2000 ACM-SIGMOD Int. Workshop on Data Mining and
Knowledge Discovery (DMKD'00), Dallas, TX, May 2000.
n N. Pasquier, Y. Bastide, R. Taouil, and L. Lakhal. Discovering frequent closed itemsets
for association rules. ICDT'99, 398-416, Jerusalem, Israel, Jan. 1999.
n M. Zaki. Generating Non-Redundant Association Rules. KDD'00. Boston, MA. Aug.
2000
n M. Zaki. CHARM: An Efficient Algorithm for Closed Association Rule Mining, SIAM’02


References
Constraint-based Mining
n G. Grahne, L. Lakshmanan, and X. Wang. Efficient mining of constrained correlated
sets. ICDE'00, 512-521, San Diego, CA, Feb. 2000.
n Y. Fu and J. Han. Meta-rule-guided mining of association rules in relational databases.
KDOOD'95, 39-46, Singapore, Dec. 1995.
n J. Han, L. V. S. Lakshmanan, and R. T. Ng, "Constraint-Based, Multidimensional Data
Mining", COMPUTER (special issues on Data Mining), 32(8): 46-50, 1999.
n L. V. S. Lakshmanan, R. Ng, J. Han and A. Pang, "Optimization of Constrained
Frequent Set Queries with 2-Variable Constraints", SIGMOD’99
n R. Ng, L.V.S. Lakshmanan, J. Han & A. Pang. “Exploratory mining and pruning
optimizations of constrained association rules.” SIGMOD’98
n J. Pei, J. Han, and L. V. S. Lakshmanan, "Mining Frequent Itemsets with Convertible
Constraints", Proc. 2001 Int. Conf. on Data Engineering (ICDE'01), April 2001.
n J. Pei and J. Han "Can We Push More Constraints into Frequent Pattern Mining?",
Proc. 2000 Int. Conf. on Knowledge Discovery and Data Mining (KDD'00), Boston, MA,
August 2000.
n R. Srikant, Q. Vu, and R. Agrawal. Mining association rules with item constraints.
KDD'97, 67-73, Newport Beach, California


References

Sequential Pattern Mining Methods


n R. Agrawal and R. Srikant. Mining sequential patterns. ICDE'95, 3-14, Taipei, Taiwan.
n R. Srikant and R. Agrawal. Mining sequential patterns: Generalizations and
performance improvements. EDBT’96.
n J. Han, J. Pei, B. Mortazavi-Asl, Q. Chen, U. Dayal, M.-C. Hsu, "FreeSpan: Frequent
Pattern-Projected Sequential Pattern Mining", Proc. 2000 Int. Conf. on Knowledge
Discovery and Data Mining (KDD'00), Boston, MA, August 2000.
n H. Mannila, H Toivonen, and A. I. Verkamo. Discovery of frequent episodes in event
sequences. Data Mining and Knowledge Discovery, 1:259-289, 1997.
n J. Pei, J. Han, H. Pinto, Q. Chen, U. Dayal, and M.-C. Hsu, "PrefixSpan: Mining
Sequential Patterns Efficiently by Prefix-Projected Pattern Growth", Proc. 2001 Int.
Conf. on Data Engineering (ICDE'01), Heidelberg, Germany, April 2001.
n B. Ozden, S. Ramaswamy, and A. Silberschatz. Cyclic association rules. ICDE'98,
412-421, Orlando, FL.
n S. Ramaswamy, S. Mahajan, and A. Silberschatz. On the discovery of interesting
patterns in association rules. VLDB'98, 368-379, New York, NY.
n M.J. Zaki. Efficient enumeration of frequent sequences. CIKM’98. Novermber 1998.
n M.N. Garofalakis, R. Rastogi, K. Shim: SPIRIT: Sequential Pattern Mining with Regular
Expression Constraints. VLDB 1999: 223-234, Edinburgh, Scotland.

References
Mining in Spatial, Multimedia, Text & Web Databases
n K. Koperski, J. Han, and G. B. Marchisio, "Mining Spatial and Image Data through
Progressive Refinement Methods", Revue internationale de géomatique (European
Journal of GIS and Spatial Analysis), 9(4):425-440, 1999.
n A. K. H. Tung, H. Lu, J. Han, and L. Feng, "Breaking the Barrier of Transactions:
Mining Inter-Transaction Association Rules", Proc. 1999 Int. Conf. on Knowledge
Discovery and Data Mining (KDD'99), San Diego, CA, Aug. 1999, pp. 297-301.
n J. Han, G. Dong and Y. Yin, "Efficient Mining of Partial Periodic Patterns in Time Series
Database", Proc. 1999 Int. Conf. on Data Engineering (ICDE'99), Sydney, Australia,
March 1999, pp. 106-115
n H. Lu, L. Feng, and J. Han, "Beyond Intra-Transaction Association Analysis:Mining
Multi-Dimensional Inter-Transaction Association Rules", ACM Transactions on
Information Systems (TOIS’00), 18(4): 423-454, 2000.
n O. R. Zaiane, M. Xin, J. Han, "Discovering Web Access Patterns and Trends by
Applying OLAP and Data Mining Technology on Web Logs," Proc. Advances in Digital
Libraries Conf. (ADL'98), Santa Barbara, CA, April 1998, pp. 19-29
n O. R. Zaiane, J. Han, and H. Zhu, "Mining Recurrent Items in Multimedia with
Progressive Resolution Refinement", ICDE'00, San Diego, CA, Feb. 2000, pp. 461-470


References

Mining for Classification and Data Cube Computation


n K. Beyer and R. Ramakrishnan. Bottom-up computation of sparse and iceberg cubes.
SIGMOD'99, 359-370, Philadelphia, PA, June 1999.
n M. Fang, N. Shivakumar, H. Garcia-Molina, R. Motwani, and J. D. Ullman. Computing
iceberg queries efficiently. VLDB'98, 299-310, New York, NY, Aug. 1998.
n J. Han, J. Pei, G. Dong, and K. Wang, “Computing Iceberg Data Cubes with Complex
Measures”, Proc. ACM-SIGMOD’2001, Santa Barbara, CA, May 2001.
n M. Kamber, J. Han, and J. Y. Chiang. Metarule-guided mining of multi-dimensional
association rules using data cubes. KDD'97, 207-210, Newport Beach, California.
n K. Beyer and R. Ramakrishnan. Bottom-up computation of sparse and iceberg cubes.
SIGMOD’99
n T. Imielinski, L. Khachiyan, and A. Abdulghani. Cubegrades: Generalizing association
rules. Technical Report, Aug. 2000


