Mining Frequent Patterns, Association and Correlations

Uploaded by

Sudha Patel

Data Mining:

Concepts and Techniques

— Chapter 5 —

Jiawei Han
Department of Computer Science
University of Illinois at Urbana-Champaign
www.cs.uiuc.edu/~hanj
©2006 Jiawei Han and Micheline Kamber, All rights reserved

February 16, 2022 1

Chapter 5: Mining Frequent Patterns, Association and Correlations

• Basic concepts and a road map
• Efficient and scalable frequent itemset mining methods
• Mining various kinds of association rules
• From association mining to correlation analysis
• Constraint-based association mining
• Summary

What Is Frequent Pattern Analysis?
• Frequent pattern: a pattern (a set of items, subsequences, substructures, etc.) that occurs frequently in a data set
• First proposed by Agrawal, Imielinski, and Swami [AIS93] in the context of frequent itemsets and association rule mining
• Motivation: finding inherent regularities in data
  • What products were often purchased together? Beer and diapers?!
  • What are the subsequent purchases after buying a PC?
  • What kinds of DNA are sensitive to this new drug?
  • Can we automatically classify web documents?
• Applications
  • Basket data analysis, cross-marketing, catalog design, sale campaign analysis, Web log (click stream) analysis, and DNA sequence analysis

Why Is Freq. Pattern Mining Important?

• Discloses an intrinsic and important property of data sets
• Forms the foundation for many essential data mining tasks
  • Association, correlation, and causality analysis
  • Sequential, structural (e.g., sub-graph) patterns
  • Pattern analysis in spatiotemporal, multimedia, time-series, and stream data
  • Classification: associative classification
  • Cluster analysis: frequent pattern-based clustering
  • Data warehousing: iceberg cube and cube-gradient
  • Semantic data compression: fascicles
  • Broad applications

Basic Concepts: Frequent Patterns and Association Rules

Transaction-id   Items bought
10               A, B, D
20               A, C, D
30               A, D, E
40               B, E, F
50               B, C, D, E, F

• Itemset X = {x1, …, xk}
• Find all the rules X → Y with minimum support and confidence
  • support, s: probability that a transaction contains X ∪ Y
  • confidence, c: conditional probability that a transaction having X also contains Y

[Figure: Venn diagram of "customer buys beer" and "customer buys diaper", overlapping in "customer buys both"]

Let sup_min = 50%, conf_min = 50%
Freq. Pat.: {A:3, B:3, D:4, E:3, AD:3}
Association rules:
  A → D (60%, 100%)
  D → A (60%, 75%)

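The support and confidence figures quoted for A → D and D → A can be re-derived from the five transactions on the slide. The following is a minimal Python sketch; the function names `support_count`, `support`, and `confidence` are illustrative choices and do not come from the slides.

```python
# Transactions from the slide (Transaction-id -> items bought).
TRANSACTIONS = {
    10: {"A", "B", "D"},
    20: {"A", "C", "D"},
    30: {"A", "D", "E"},
    40: {"B", "E", "F"},
    50: {"B", "C", "D", "E", "F"},
}


def support_count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for items in transactions.values() if itemset <= items)


def support(itemset, transactions):
    """Fraction of transactions that contain `itemset`."""
    return support_count(itemset, transactions) / len(transactions)


def confidence(lhs, rhs, transactions):
    """Estimate P(rhs | lhs) as count(lhs ∪ rhs) / count(lhs)."""
    both = support_count(set(lhs) | set(rhs), transactions)
    return both / support_count(lhs, transactions)


s_ad = support({"A", "D"}, TRANSACTIONS)        # 3 of 5 transactions
c_a_to_d = confidence({"A"}, {"D"}, TRANSACTIONS)  # 3 of 3 transactions with A
c_d_to_a = confidence({"D"}, {"A"}, TRANSACTIONS)  # 3 of 4 transactions with D
print(s_ad, c_a_to_d, c_d_to_a)
```

Confidence is computed from raw counts rather than from two support fractions, which avoids floating-point rounding when the result is compared with a threshold.
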

Scalable Methods for Mining Frequent Patterns
• The downward closure property of frequent patterns
  • Any subset of a frequent itemset must be frequent
  • If {beer, diaper, nuts} is frequent, so is {beer, diaper}
  • i.e., every transaction having {beer, diaper, nuts} also contains {beer, diaper}
• Scalable mining methods: three major approaches
  • Apriori (Agrawal & Srikant @VLDB’94)
  • Freq. pattern growth (FPgrowth: Han, Pei & Yin @SIGMOD’00)
  • Vertical data format approach (CHARM: Zaki & Hsiao @SDM’02)

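The downward closure property can be checked on a toy database. The sketch below uses four made-up transactions (they are an assumption for illustration, not data from the slides) and confirms that every proper subset of {beer, diaper, nuts} occurs at least as often as the full itemset.

```python
from itertools import combinations

# Hypothetical transactions, invented only to illustrate the property.
TRANSACTIONS = [
    frozenset({"beer", "diaper", "nuts"}),
    frozenset({"beer", "diaper"}),
    frozenset({"beer", "diaper", "nuts", "milk"}),
    frozenset({"milk"}),
]


def support_count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


full = frozenset({"beer", "diaper", "nuts"})
full_count = support_count(full, TRANSACTIONS)

# Support count of every non-empty proper subset of the full itemset.
subset_counts = {
    frozenset(s): support_count(frozenset(s), TRANSACTIONS)
    for size in range(1, len(full))
    for s in combinations(sorted(full), size)
}

# Any transaction containing the full itemset also contains each subset,
# so no subset can have a smaller count.
closure_holds = all(count >= full_count for count in subset_counts.values())
print(full_count, subset_counts[frozenset({"beer", "diaper"})], closure_holds)
```
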
Apriori: A Candidate Generation-and-Test Approach

• Apriori pruning principle: if there is any itemset which is infrequent, its superset should not be generated/tested! (Agrawal & Srikant @VLDB’94, Mannila, et al. @KDD’94)
• Method:
  • Initially, scan DB once to get the frequent 1-itemsets
  • Generate length (k+1) candidate itemsets from length k frequent itemsets
  • Test the candidates against DB
  • Terminate when no frequent or candidate set can be generated

The Apriori Algorithm: An Example
Sup_min = 2

Database TDB
Tid   Items
10    A, C, D
20    B, C, E
30    A, B, C, E
40    B, E

1st scan, C1:
Itemset   sup
{A}       2
{B}       3
{C}       3
{D}       1
{E}       3

L1:
Itemset   sup
{A}       2
{B}       3
{C}       3
{E}       3

C2 (generated from L1): {A, B}, {A, C}, {A, E}, {B, C}, {B, E}, {C, E}

2nd scan, C2 with counts:
Itemset   sup
{A, B}    1
{A, C}    2
{A, E}    1
{B, C}    2
{B, E}    3
{C, E}    2

L2:
Itemset   sup
{A, C}    2
{B, C}    2
{B, E}    3
{C, E}    2

C3: {B, C, E}

3rd scan, L3:
Itemset     sup
{B, C, E}   2

The Apriori Algorithm
• Pseudo-code:
    Ck: Candidate itemset of size k
    Lk: frequent itemset of size k

    L1 = {frequent items};
    for (k = 1; Lk != ∅; k++) do begin
        Ck+1 = candidates generated from Lk;
        for each transaction t in database do
            increment the count of all candidates in Ck+1 that are contained in t
        Lk+1 = candidates in Ck+1 with min_support
    end
    return ∪k Lk;

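The level-wise loop in the pseudo-code can be sketched in Python as follows. This is an illustrative sketch rather than the reference implementation: the names `apriori` and `generate_candidates` are assumptions, and candidates are formed by taking unions of frequent itemsets and then pruning, which yields the same candidate set as a prefix self-join followed by pruning but is simpler to write. It is run on the four-transaction database TDB from the example slide with Sup_min = 2.

```python
from itertools import combinations


def generate_candidates(prev_frequent, k):
    """Build size-k candidates whose (k-1)-subsets are all frequent."""
    prev = set(prev_frequent)
    candidates = set()
    for a in prev:
        for b in prev:
            union = a | b
            if len(union) != k:
                continue
            # Apriori pruning: drop a candidate with any infrequent subset.
            if all(frozenset(s) in prev for s in combinations(union, k - 1)):
                candidates.add(union)
    return candidates


def apriori(transactions, min_count):
    """Return {itemset: support count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    items = {item for t in transactions for item in t}
    # First scan: count single items to obtain L1.
    counts = {frozenset([i]): sum(1 for t in transactions if i in t) for i in items}
    current = {s: c for s, c in counts.items() if c >= min_count}
    frequent = dict(current)
    k = 1
    while current:
        k += 1
        candidates = generate_candidates(current.keys(), k)
        # One scan of the database per level to count the candidates.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {s: n for s, n in counts.items() if n >= min_count}
        frequent.update(current)
    return frequent


TDB = [{"A", "C", "D"}, {"B", "C", "E"}, {"A", "B", "C", "E"}, {"B", "E"}]
result = apriori(TDB, min_count=2)
for itemset in sorted(result, key=lambda s: (len(s), sorted(s))):
    print(sorted(itemset), result[itemset])
```

The counting step rescans every transaction for every candidate, which is fine for a four-row example; practical implementations use structures such as hash trees to speed this up.
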
Important Details of Apriori
• How to generate candidates?
  • Step 1: self-joining Lk
  • Step 2: pruning
• How to count supports of candidates?
• Example of candidate generation
  • L3 = {abc, abd, acd, ace, bcd}
  • Self-joining: L3 * L3
    • abcd from abc and abd
    • acde from acd and ace
  • Pruning:
    • acde is removed because ade is not in L3
  • C4 = {abcd}
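The two candidate-generation steps can be traced on the slide's L3. The sketch below represents each itemset as a sorted tuple of items; the function names `self_join` and `prune` are illustrative, not taken from the slides.

```python
from itertools import combinations


def self_join(frequent):
    """Join pairs of sorted k-itemsets that agree on their first k-1 items."""
    ordered = sorted(frequent)
    joined = []
    for i, p in enumerate(ordered):
        for q in ordered[i + 1:]:
            if p[:-1] == q[:-1]:
                # p sorts before q, so appending q's last item keeps order.
                joined.append(p + q[-1:])
    return joined


def prune(candidates, frequent):
    """Keep candidates whose every (k)-subset is itself in `frequent`."""
    known = set(frequent)
    return [
        c for c in candidates
        if all(sub in known for sub in combinations(c, len(c) - 1))
    ]


L3 = [tuple("abc"), tuple("abd"), tuple("acd"), tuple("ace"), tuple("bcd")]
joined = self_join(L3)   # abcd (from abc, abd) and acde (from acd, ace)
C4 = prune(joined, L3)   # acde is dropped because ade is not in L3
print(["".join(c) for c in joined], ["".join(c) for c in C4])
```
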


Mining Various Kinds of Association Rules

• Mining multilevel association
• Mining multidimensional association
• Mining quantitative association
• Mining interesting correlation patterns

Mining Multiple-Level Association Rules
• Items often form hierarchies
• Flexible support settings
  • Items at the lower level are expected to have lower support
• Exploration of shared multi-level mining (Agrawal & Srikant @VLDB’95, Han & Fu @VLDB’95)

Item (level)          Support   Uniform min_sup   Reduced min_sup
Milk (Level 1)        10%       5%                5%
2% Milk (Level 2)     6%        5%                3%
Skim Milk (Level 2)   4%        5%                3%
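The effect of the two support settings on the milk hierarchy can be sketched as below. The supports and thresholds are the ones shown on the slide; the dictionary layout and the name `frequent_items` are assumptions made for illustration.

```python
# item -> (hierarchy level, support), values taken from the slide.
ITEM_SUPPORT = {
    "milk": (1, 0.10),
    "2% milk": (2, 0.06),
    "skim milk": (2, 0.04),
}


def frequent_items(item_support, min_sup_by_level):
    """Items whose support meets the threshold set for their level."""
    return [
        item
        for item, (level, sup) in item_support.items()
        if sup >= min_sup_by_level[level]
    ]


uniform = frequent_items(ITEM_SUPPORT, {1: 0.05, 2: 0.05})
reduced = frequent_items(ITEM_SUPPORT, {1: 0.05, 2: 0.03})
print(uniform)  # skim milk misses the uniform 5% threshold
print(reduced)  # skim milk passes the reduced 3% threshold at level 2
```
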

Multi-level Association: Redundancy Filtering

• Some rules may be redundant due to “ancestor” relationships between items.
• Example
  • milk → wheat bread [support = 8%, confidence = 70%]
  • 2% milk → wheat bread [support = 2%, confidence = 72%]
• We say the first rule is an ancestor of the second rule.
• A rule is redundant if its support is close to the “expected” value, based on the rule’s ancestor.
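One way to make the "expected" value concrete is to scale the ancestor rule's support by the share of the ancestor item that the descendant item accounts for. The slide does not give that share; the sketch below assumes, purely for illustration, that 2% milk makes up one quarter of milk sales, and uses an arbitrary 25% relative tolerance for "close". Both numbers and both function names are assumptions.

```python
def expected_support(ancestor_support, descendant_share):
    """Support expected for the descendant rule if it merely mirrors its ancestor."""
    return ancestor_support * descendant_share


def is_redundant(actual, expected, tolerance=0.25):
    """True when `actual` lies within a relative `tolerance` of `expected`."""
    return abs(actual - expected) <= tolerance * expected


# milk -> wheat bread has support 8% (from the slide).
# ASSUMPTION: 2% milk accounts for 1/4 of milk sales (not stated on the slide).
expected = expected_support(0.08, 0.25)

# 2% milk -> wheat bread has support 2% (from the slide).
redundant = is_redundant(0.02, expected)
print(expected, redundant)
```

Under these assumptions the descendant rule carries no information beyond its ancestor, so it would be filtered out. A fuller check would compare confidence with the ancestor's confidence in the same way.
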

Mining Multi-Dimensional Association
• Single-dimensional rules:
    buys(X, “milk”) → buys(X, “bread”)
• Multi-dimensional rules: ≥ 2 dimensions or predicates
  • Inter-dimension assoc. rules (no repeated predicates)
      age(X, “19-25”) ∧ occupation(X, “student”) → buys(X, “coke”)
  • Hybrid-dimension assoc. rules (repeated predicates)
      age(X, “19-25”) ∧ buys(X, “popcorn”) → buys(X, “coke”)
• Categorical attributes: finite number of possible values, no ordering among values; data cube approach
• Quantitative attributes: numeric, implicit ordering among values; discretization, clustering, and gradient approaches

Mining Quantitative Associations

• Techniques can be categorized by how numerical attributes, such as age or salary, are treated
  1. Static discretization based on predefined concept hierarchies (data cube methods)
  2. Dynamic discretization based on data distribution (quantitative rules, e.g., Agrawal & Srikant @SIGMOD’96)
  3. Clustering: distance-based association (e.g., Yang & Miller @SIGMOD’97)
     • One-dimensional clustering, then association
  4. Deviation (such as Aumann and Lindell @KDD’99)
       Sex = female → Wage: mean = $7/hr (overall mean = $9)

Static Discretization of Quantitative Attributes

• Discretized prior to mining using a concept hierarchy.
• Numeric values are replaced by ranges.
• In a relational database, finding all frequent k-predicate sets will require k or k+1 table scans.
• Data cube is well suited for mining.
• The cells of an n-dimensional cuboid correspond to the predicate sets.
• Mining from data cubes can be much faster.

[Figure: lattice of cuboids: (); (age), (income), (buys); (age, income), (age, buys), (income, buys); (age, income, buys)]

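The idea of replacing numeric values by ranges and then reading predicate-set counts off the cuboids can be sketched as follows. The bin boundaries, the five records, and the names `to_range` and `cuboid_counts` are all hypothetical; only the dimension names (age, income, buys) come from the slide.

```python
from collections import Counter
from itertools import combinations

# Hypothetical concept hierarchy: (low inclusive, high exclusive, label).
AGE_BINS = [(0, 30, "<30"), (30, 40, "30-39"), (40, 200, "40+")]
INCOME_BINS = [(0, 30_000, "<30K"), (30_000, 50_000, "30-50K"), (50_000, 10**9, "50K+")]


def to_range(value, bins):
    """Replace a numeric value by the label of the range that contains it."""
    for low, high, label in bins:
        if low <= value < high:
            return label
    raise ValueError(f"value {value} falls outside every bin")


# Hypothetical (age, income, item bought) records.
RECORDS = [
    (34, 42_000, "TV"),
    (35, 31_000, "TV"),
    (23, 18_000, "radio"),
    (36, 45_000, "TV"),
    (52, 80_000, "radio"),
]
rows = [
    {"age": to_range(a, AGE_BINS), "income": to_range(i, INCOME_BINS), "buys": b}
    for a, i, b in RECORDS
]


def cuboid_counts(rows, dims):
    """For every subset of dimensions (one cuboid each), count its cells."""
    cube = {}
    for k in range(len(dims) + 1):
        for group in combinations(dims, k):
            cube[group] = Counter(tuple(r[d] for d in group) for r in rows)
    return cube


cube = cuboid_counts(rows, ("age", "income", "buys"))
# Each cell count is the support count of the matching predicate set.
print(cube[("age", "income", "buys")][("30-39", "30-50K", "TV")])
print(cube[("age",)][("30-39",)])
```

With three dimensions the lattice has 2^3 = 8 cuboids, matching the figure, and each is filled here from a single pass per cuboid over the discretized rows.
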
Quantitative Association Rules
• Proposed by Lent, Swami and Widom ICDE’97
• Numeric attributes are dynamically discretized
  • Such that the confidence or compactness of the rules mined is maximized
• 2-D quantitative association rules: A_quan1 ∧ A_quan2 → A_cat
• Cluster adjacent association rules to form general rules using a 2-D grid
• Example
    age(X, “34-35”) ∧ income(X, “30-50K”) → buys(X, “high resolution TV”)

Mining Other Interesting Patterns

• Flexible support constraints (Wang et al. @VLDB’02)
  • Some items (e.g., diamond) may occur rarely but are valuable
  • Customized sup_min specification and application
• Top-K closed frequent patterns (Han, et al. @ICDM’02)
  • Hard to specify sup_min, but top-k with length_min is more desirable
  • Dynamically raise sup_min in FP-tree construction and mining, and select most promising path to mine


Constraint-based (Query-Directed) Mining

• Finding all the patterns in a database autonomously? Unrealistic!
  • The patterns could be too many but not focused!
• Data mining should be an interactive process
  • User directs what to be mined using a data mining query language (or a graphical user interface)
• Constraint-based mining
  • User flexibility: provides constraints on what to be mined
  • System optimization: explores such constraints for efficient mining

Constraints in Data Mining

• Knowledge type constraint:
  • classification, association, etc.
• Data constraint: using SQL-like queries
  • find product pairs sold together in stores in Chicago in Dec.’02
• Dimension/level constraint
  • in relevance to region, price, brand, customer category
• Rule (or pattern) constraint
  • small sales (price < $10) triggers big sales (sum > $200)
• Interestingness constraint
  • strong rules: min_support ≥ 3%, min_confidence ≥ 60%

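An interestingness constraint is the simplest of these to apply: it is a filter over the mined rules. The sketch below applies the slide's thresholds (min_support ≥ 3%, min_confidence ≥ 60%) to the two milk rules from the redundancy-filtering slide; the `Rule` type and the function name `strong_rules` are illustrative assumptions.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    lhs: str
    rhs: str
    support: float
    confidence: float


def strong_rules(rules, min_support=0.03, min_confidence=0.60):
    """Keep rules that satisfy both interestingness thresholds."""
    return [
        r for r in rules
        if r.support >= min_support and r.confidence >= min_confidence
    ]


# Rule statistics reused from the redundancy-filtering slide.
RULES = [
    Rule("milk", "wheat bread", 0.08, 0.70),
    Rule("2% milk", "wheat bread", 0.02, 0.72),
]
strong = strong_rules(RULES)
print([f"{r.lhs} -> {r.rhs}" for r in strong])
```

The second rule is confident enough but falls below the 3% support threshold. Filtering after mining is the naive approach; constraint-based mining aims to push such conditions into the mining process itself so that failing candidates are never generated.
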
Constrained Mining vs. Constraint-Based Search

• Constrained mining vs. constraint-based search/reasoning
  • Both are aimed at reducing search space
  • Finding all patterns satisfying constraints vs. finding some (or one) answer in constraint-based search in AI
  • Constraint-pushing vs. heuristic search
  • How to integrate them is an interesting research problem
• Constrained mining vs. query processing in DBMS
  • Database query processing requires finding all answers
  • Constrained pattern mining shares a similar philosophy with pushing selections deeply in query processing


Frequent-Pattern Mining: Summary

• Frequent pattern mining: an important task in data mining
• Scalable frequent pattern mining methods
  • Apriori (candidate generation & test)
  • Projection-based (FPgrowth, CLOSET+, ...)
  • Vertical format approach (CHARM, ...)
• Mining a variety of rules and interesting patterns
• Constraint-based mining
• Mining sequential and structured patterns
• Extensions and applications

Frequent-Pattern Mining: Research Problems

• Mining fault-tolerant frequent, sequential and structured patterns
  • Patterns allow limited faults (insertion, deletion, mutation)
• Mining truly interesting patterns
  • Surprising, novel, concise, …
• Application exploration
  • E.g., DNA sequence analysis and bio-pattern classification
  • “Invisible” data mining

Ref: Basic Concepts of Frequent Pattern Mining

• (Association Rules) R. Agrawal, T. Imielinski, and A. Swami. Mining association rules between sets of items in large databases. SIGMOD'93.
• (Max-pattern) R. J. Bayardo. Efficiently mining long patterns from databases. SIGMOD'98.
• (Closed-pattern) N. Pasquier, Y. Bastide, R. Taouil, and L. Lakhal. Discovering frequent closed itemsets for association rules. ICDT'99.
• (Sequential pattern) R. Agrawal and R. Srikant. Mining sequential patterns. ICDE'95.

Ref: Apriori and Its Improvements

• R. Agrawal and R. Srikant. Fast algorithms for mining association rules. VLDB'94.
• H. Mannila, H. Toivonen, and A. I. Verkamo. Efficient algorithms for discovering association rules. KDD'94.
• A. Savasere, E. Omiecinski, and S. Navathe. An efficient algorithm for mining association rules in large databases. VLDB'95.
• J. S. Park, M. S. Chen, and P. S. Yu. An effective hash-based algorithm for mining association rules. SIGMOD'95.
• H. Toivonen. Sampling large databases for association rules. VLDB'96.
• S. Brin, R. Motwani, J. D. Ullman, and S. Tsur. Dynamic itemset counting and implication rules for market basket analysis. SIGMOD'97.
• S. Sarawagi, S. Thomas, and R. Agrawal. Integrating association rule mining with relational database systems: Alternatives and implications. SIGMOD'98.

Ref: Depth-First, Projection-Based FP Mining

• R. Agarwal, C. Aggarwal, and V. V. V. Prasad. A tree projection algorithm for generation of frequent itemsets. J. Parallel and Distributed Computing:02.
• J. Han, J. Pei, and Y. Yin. Mining frequent patterns without candidate generation. SIGMOD’00.
• J. Pei, J. Han, and R. Mao. CLOSET: An Efficient Algorithm for Mining Frequent Closed Itemsets. DMKD'00.
• J. Liu, Y. Pan, K. Wang, and J. Han. Mining Frequent Item Sets by Opportunistic Projection. KDD'02.
• J. Han, J. Wang, Y. Lu, and P. Tzvetkov. Mining Top-K Frequent Closed Patterns without Minimum Support. ICDM'02.
• J. Wang, J. Han, and J. Pei. CLOSET+: Searching for the Best Strategies for Mining Frequent Closed Itemsets. KDD'03.
• G. Liu, H. Lu, W. Lou, and J. X. Yu. On Computing, Storing and Querying Frequent Patterns. KDD'03.

Ref: Vertical Format and Row Enumeration Methods

• M. J. Zaki, S. Parthasarathy, M. Ogihara, and W. Li. Parallel algorithm for discovery of association rules. DAMI:97.
• M. J. Zaki and C. J. Hsiao. CHARM: An Efficient Algorithm for Closed Itemset Mining. SDM'02.
• C. Bucila, J. Gehrke, D. Kifer, and W. White. DualMiner: A Dual-Pruning Algorithm for Itemsets with Constraints. KDD’02.
• F. Pan, G. Cong, A. K. H. Tung, J. Yang, and M. Zaki. CARPENTER: Finding Closed Patterns in Long Biological Datasets. KDD'03.

Ref: Mining Multi-Level and Quantitative Rules

• R. Srikant and R. Agrawal. Mining generalized association rules. VLDB'95.
• J. Han and Y. Fu. Discovery of multiple-level association rules from large databases. VLDB'95.
• R. Srikant and R. Agrawal. Mining quantitative association rules in large relational tables. SIGMOD'96.
• T. Fukuda, Y. Morimoto, S. Morishita, and T. Tokuyama. Data mining using two-dimensional optimized association rules: Scheme, algorithms, and visualization. SIGMOD'96.
• K. Yoda, T. Fukuda, Y. Morimoto, S. Morishita, and T. Tokuyama. Computing optimized rectilinear regions for association rules. KDD'97.
• R. J. Miller and Y. Yang. Association rules over interval data. SIGMOD'97.
• Y. Aumann and Y. Lindell. A Statistical Theory for Quantitative Association Rules. KDD'99.

Ref: Mining Correlations and Interesting Rules

• M. Klemettinen, H. Mannila, P. Ronkainen, H. Toivonen, and A. I. Verkamo. Finding interesting rules from large sets of discovered association rules. CIKM'94.
• S. Brin, R. Motwani, and C. Silverstein. Beyond market basket: Generalizing association rules to correlations. SIGMOD'97.
• C. Silverstein, S. Brin, R. Motwani, and J. Ullman. Scalable techniques for mining causal structures. VLDB'98.
• P.-N. Tan, V. Kumar, and J. Srivastava. Selecting the Right Interestingness Measure for Association Patterns. KDD'02.
• E. Omiecinski. Alternative Interest Measures for Mining Associations. TKDE’03.
• Y. K. Lee, W. Y. Kim, Y. D. Cai, and J. Han. CoMine: Efficient Mining of Correlated Patterns. ICDM’03.

Ref: Mining Other Kinds of Rules

• R. Meo, G. Psaila, and S. Ceri. A new SQL-like operator for mining association rules. VLDB'96.
• B. Lent, A. Swami, and J. Widom. Clustering association rules. ICDE'97.
• A. Savasere, E. Omiecinski, and S. Navathe. Mining for strong negative associations in a large database of customer transactions. ICDE'98.
• D. Tsur, J. D. Ullman, S. Abiteboul, C. Clifton, R. Motwani, and S. Nestorov. Query flocks: A generalization of association-rule mining. SIGMOD'98.
• F. Korn, A. Labrinidis, Y. Kotidis, and C. Faloutsos. Ratio rules: A new paradigm for fast, quantifiable data mining. VLDB'98.
• K. Wang, S. Zhou, and J. Han. Profit Mining: From Patterns to Actions. EDBT’02.

Ref: Constraint-Based Pattern Mining

• R. Srikant, Q. Vu, and R. Agrawal. Mining association rules with item constraints. KDD'97.
• R. Ng, L. V. S. Lakshmanan, J. Han, and A. Pang. Exploratory mining and pruning optimizations of constrained association rules. SIGMOD’98.
• M. N. Garofalakis, R. Rastogi, and K. Shim. SPIRIT: Sequential Pattern Mining with Regular Expression Constraints. VLDB’99.
• G. Grahne, L. Lakshmanan, and X. Wang. Efficient mining of constrained correlated sets. ICDE'00.
• J. Pei, J. Han, and L. V. S. Lakshmanan. Mining Frequent Itemsets with Convertible Constraints. ICDE'01.
• J. Pei, J. Han, and W. Wang. Mining Sequential Patterns with Constraints in Large Databases. CIKM'02.

Ref: Mining Sequential and Structured Patterns

• R. Srikant and R. Agrawal. Mining sequential patterns: Generalizations and performance improvements. EDBT’96.
• H. Mannila, H. Toivonen, and A. I. Verkamo. Discovery of frequent episodes in event sequences. DAMI:97.
• M. Zaki. SPADE: An Efficient Algorithm for Mining Frequent Sequences. Machine Learning:01.
• J. Pei, J. Han, H. Pinto, Q. Chen, U. Dayal, and M.-C. Hsu. PrefixSpan: Mining Sequential Patterns Efficiently by Prefix-Projected Pattern Growth. ICDE'01.
• M. Kuramochi and G. Karypis. Frequent Subgraph Discovery. ICDM'01.
• X. Yan, J. Han, and R. Afshar. CloSpan: Mining Closed Sequential Patterns in Large Datasets. SDM'03.
• X. Yan and J. Han. CloseGraph: Mining Closed Frequent Graph Patterns. KDD'03.

Ref: Mining Spatial, Multimedia, and Web Data

• K. Koperski and J. Han. Discovery of Spatial Association Rules in Geographic Information Databases. SSD’95.
• O. R. Zaiane, M. Xin, and J. Han. Discovering Web Access Patterns and Trends by Applying OLAP and Data Mining Technology on Web Logs. ADL'98.
• O. R. Zaiane, J. Han, and H. Zhu. Mining Recurrent Items in Multimedia with Progressive Resolution Refinement. ICDE'00.
• D. Gunopulos and I. Tsoukatos. Efficient Mining of Spatiotemporal Patterns. SSTD'01.

Ref: Mining Frequent Patterns in Time-Series Data

• B. Ozden, S. Ramaswamy, and A. Silberschatz. Cyclic association rules. ICDE'98.
• J. Han, G. Dong, and Y. Yin. Efficient Mining of Partial Periodic Patterns in Time Series Database. ICDE'99.
• H. Lu, L. Feng, and J. Han. Beyond Intra-Transaction Association Analysis: Mining Multi-Dimensional Inter-Transaction Association Rules. TOIS:00.
• B.-K. Yi, N. Sidiropoulos, T. Johnson, H. V. Jagadish, C. Faloutsos, and A. Biliris. Online Data Mining for Co-Evolving Time Sequences. ICDE'00.
• W. Wang, J. Yang, and R. Muntz. TAR: Temporal Association Rules on Evolving Numerical Attributes. ICDE’01.
• J. Yang, W. Wang, and P. S. Yu. Mining Asynchronous Periodic Patterns in Time Series Data. TKDE’03.

Ref: FP for Classification and Clustering

• G. Dong and J. Li. Efficient mining of emerging patterns: Discovering trends and differences. KDD'99.
• B. Liu, W. Hsu, and Y. Ma. Integrating Classification and Association Rule Mining. KDD’98.
• W. Li, J. Han, and J. Pei. CMAR: Accurate and Efficient Classification Based on Multiple Class-Association Rules. ICDM'01.
• H. Wang, W. Wang, J. Yang, and P. S. Yu. Clustering by pattern similarity in large data sets. SIGMOD’02.
• J. Yang and W. Wang. CLUSEQ: efficient and effective sequence clustering. ICDE’03.
• B. Fung, K. Wang, and M. Ester. Large Hierarchical Document Clustering Using Frequent Itemset. SDM’03.
• X. Yin and J. Han. CPAR: Classification based on Predictive Association Rules. SDM'03.

Ref: Stream and Privacy-Preserving FP Mining

• A. Evfimievski, R. Srikant, R. Agrawal, and J. Gehrke. Privacy Preserving Mining of Association Rules. KDD’02.
• J. Vaidya and C. Clifton. Privacy Preserving Association Rule Mining in Vertically Partitioned Data. KDD’02.
• G. Manku and R. Motwani. Approximate Frequency Counts over Data Streams. VLDB’02.
• Y. Chen, G. Dong, J. Han, B. W. Wah, and J. Wang. Multi-Dimensional Regression Analysis of Time-Series Data Streams. VLDB'02.
• C. Giannella, J. Han, J. Pei, X. Yan, and P. S. Yu. Mining Frequent Patterns in Data Streams at Multiple Time Granularities. Next Generation Data Mining:03.
• A. Evfimievski, J. Gehrke, and R. Srikant. Limiting Privacy Breaches in Privacy Preserving Data Mining. PODS’03.

Ref: Other Freq. Pattern Mining Applications

• Y. Huhtala, J. Kärkkäinen, P. Porkka, and H. Toivonen. Efficient Discovery of Functional and Approximate Dependencies Using Partitions. ICDE’98.
• H. V. Jagadish, J. Madar, and R. Ng. Semantic Compression and Pattern Extraction with Fascicles. VLDB'99.
• T. Dasu, T. Johnson, S. Muthukrishnan, and V. Shkapenyuk. Mining Database Structure; or How to Build a Data Quality Browser. SIGMOD'02.

No ratings yet
( ) 2024 7.life in Space - ( ) 2 (25 ) (Q)
8 pages
Medisin The Causes Solutions To Disease Malnutrition and The Medical Sins That Are Killing The World 1st Scott Whitaker PDF Download
No ratings yet
Medisin The Causes Solutions To Disease Malnutrition and The Medical Sins That Are Killing The World 1st Scott Whitaker PDF Download
82 pages



