Integration of Classification and Pattern Mining: A Discriminative and Frequent Pattern-Based Approach
Classification Overview
Associative Classification
Pattern-growth approach
• Divide-and-conquer, depth-first search
• Representative algorithms: FP-Growth, PrefixSpan (itemsets and sequences); MoFa, gSpan, Gaston (graphs)
Pattern Growth Approach
• Depth-first search: grow a size-k pattern to a size-(k+1) one by adding one element
Vertical Data Approach
• Major operation: transaction (tid) list intersection
t(AB) = t(A) ∩ t(B)

Item | Transaction ids
A    | t1, t2, t3, …
B    | t2, t3, t4, …
C    | t1, t3, t4, …
…    | …
Mining High Dimensional Data
Mining Colossal Patterns
[Zhu et al., ICDE’07]
• Mining colossal patterns: challenges
– A small number of colossal (i.e., very large) patterns, but a very large number of mid-sized patterns
– If the mining of mid-sized patterns is explosive in size, there is no hope of finding colossal patterns efficiently while insisting on the “complete set” mining philosophy
• A pattern-fusion approach
– Jump out of the swamp of mid-sized results and quickly reach colossal patterns
– Fuse small patterns into large ones directly
Impact on Other Data Analysis Tasks
• Association and correlation analysis
– Association: support and confidence
– Correlation: lift, chi-square, cosine, all_confidence, coherence
– A comparative study [Tan, Kumar and Srivastava, KDD’02]
Classification Overview
[Figure: training instances → model learning → model; test instances → model → prediction: positive or negative. The example model is a network over the variables LungCancer, Emphysema, PositiveXRay, and Dyspnea with yes/no values.]
[Figure: frequent pattern analysis combined with classification yields frequent pattern-based classification. Example applications: text categorization, drug design, spam detection, disambiguation.]
Sequences vs. single commands:
… login, changeDir, delFile, appendFile, logout …
… login, setFileType, storeFile, logout …
[Figure: with a predefined feature vector, training instances feed a classification model directly for prediction; with NO predefined feature vector, how to build the classification model is the open question.]
Discriminative Pattern-Based Framework
[Figure: training instances → frequent pattern mining → discriminative feature construction → model learning → model; test instances → feature space transformation → model → prediction: positive or negative. Each instance is transformed into a binary feature vector indicating which patterns (e.g., {B, C}) it contains.]
Frequent Graphs
[Figure: training chemical compounds labeled Active / Inactive → frequent subgraph mining with min_sup = 2 → frequent graphs g1, g2 → each compound transformed into a binary vector over g1, g2 (descriptor-space representation) → classifier model predicts the class of a new compound. Courtesy of Nikil Wale.]
Applications: Bug Localization
[Figure: a program calling graph.]
Classification Overview
Associative Classification
Representative work
CBA [Liu, Hsu and Ma, KDD’98]
Emerging patterns [Dong and Li, KDD’99]
CMAR [Li, Han and Pei, ICDM’01]
CPAR [Yin and Han, SDM’03]
RCBT [Cong et al., SIGMOD’05]
Lazy classifier [Veloso, Meira and Zaki, ICDM’06]
Integrated with classification models [Cheng et al., ICDE’07]
CBA [Liu, Hsu and Ma, KDD’98]
• Rule mining
– Mine the set of association rules w.r.t. min_sup and min_conf
– Rank rules in descending order of confidence and support
– Select rules to ensure training instance coverage
• Prediction
– Apply the first rule that matches a test case
– Otherwise, apply the default rule
CMAR [Li, Han and Pei, ICDM’01]
• Basic idea
– Mining: build a class distribution-associated FP-tree
– Prediction: combine the strength of multiple rules
• Rule mining
– Mine association rules from a class distribution-associated FP-tree
– Store and retrieve association rules in a CR-tree
– Prune rules based on confidence, correlation and database coverage
Class Distribution-Associated FP-tree
CR-tree: A Prefix-tree to Store and Index Rules
Prediction Based on Multiple Rules
• CMAR combines the rules matching a test case with a weighted chi-square measure:
weighted χ² = Σ (χ² · χ² / maxχ²)
where maxχ² is the upper bound of the chi-square of a rule.
CPAR [Yin and Han, SDM’03]
• Basic idea
– Combine associative classification and FOIL-based rule generation
– FOIL gain: criterion for selecting a literal
• Prediction
– Collect all rules matching a test case
– Select the best k rules for each class
– Choose the class with the highest expected accuracy for prediction
Performance Comparison
[Yin and Han, SDM’03]
Data C4.5 Ripper CBA CMAR CPAR
anneal 94.8 95.8 97.9 97.3 98.4
austral 84.7 87.3 84.9 86.1 86.2
auto 80.1 72.8 78.3 78.1 82.0
breast 95.0 95.1 96.3 96.4 96.0
cleve 78.2 82.2 82.8 82.2 81.5
crx 84.9 84.9 84.7 84.9 85.7
diabetes 74.2 74.7 74.5 75.8 75.1
german 72.3 69.8 73.4 74.9 73.4
glass 68.7 69.1 73.9 70.1 74.4
heart 80.8 80.7 81.9 82.2 82.6
hepatic 80.6 76.7 81.8 80.5 79.4
horse 82.6 84.8 82.1 82.6 84.2
hypo 99.2 98.9 98.9 98.4 98.1
iono 90.0 91.2 92.3 91.5 92.6
iris 95.3 94.0 94.7 94.0 94.7
labor 79.3 84.0 86.3 89.7 84.7
… … … … … …
Average 83.34 82.93 84.69 85.22 85.17
Emerging Patterns
[Dong and Li, KDD’99]
• Emerging Patterns (EPs) are contrast patterns between two classes of data whose support changes significantly between the two classes.
• Each data tuple has several features such as: odor, ring-number, stalk-surface-below-ring, etc.
• Given a test T and a set E(Ci) of EPs for class Ci, the aggregate score of T for Ci is
score(T, Ci) = Σ strength(X), summed over the EPs X ∈ E(Ci) that match T
• For each class, one may use the median (or 85%) aggregated value to normalize, to avoid bias towards the class with more EPs
Courtesy of Bailey and Dong
Top-k Covering Rule Groups for Gene Expression Data [Cong et al., SIGMOD’05]
• Problem
– Mine strong association rules to reveal the correlation between gene expression patterns and disease outcomes
– Example rule: gene1[a1, b1], …, genen[an, bn] → class
– Build a rule-based classifier for prediction
• Solution
– Mine top-k covering rule groups with row enumeration
– Build the classifier RCBT on top-k covering rule groups
[Figure: row (tid) vs. item enumeration; top-k rule groups.]
Lazy Associative Classification [Veloso, Meira and Zaki, ICDM’06]
– Advantages
• Search space is reduced/focused: covers small disjuncts (support can be lowered)
• Only applicable rules are generated: a much smaller number of CARs are induced
– Disadvantages
• Several models are generated, one for each test instance
• Potentially high computational cost
• Cache infrastructure
– All CARs are stored in main memory
– Each CAR has only one entry in the cache
– Replacement policy: LFU heuristic
Courtesy of Mohammed Zaki
• Feature selection
– Select discriminative features
– Remove redundancy and correlation
• Model learning
– A general classifier based on SVM, C4.5, or another classification model
Information Gain vs. Frequency?
[Plot: information gain (y-axis) vs. pattern frequency/support (x-axis); curves: InfoGain and its upper bound IG_UpperBnd.]
IG(C|X) = H(C) − H(C|X)
Fisher Score vs. Frequency?
[Plot: Fisher score (y-axis) vs. pattern frequency/support (x-axis); curves: FisherScore and its upper bound FS_UpperBnd.]
Fr = Σ_{i=1}^{c} n_i (μ_i − μ)² / Σ_{i=1}^{c} n_i σ_i²
Analytical Study on Information Gain
IG(C|X) = H(C) − H(C|X)
H(C) = −Σ_{i=1}^{m} p_i log2(p_i)
H(C|X) = Σ_j P(X = x_j) H(C|X = x_j)
With the pattern frequency θ = P(x = 1), the probability of the positive class p = P(c = 1), and q = P(c = 1 | x = 1), the entropy when the feature does not appear (x = 0) is
H(C|X=0) = −((p − θq)/(1 − θ)) log((p − θq)/(1 − θ)) − (((1 − p) − θ(1 − q))/(1 − θ)) log(((1 − p) − θ(1 − q))/(1 − θ))
Conditional Entropy in a Pure Case
• When q = 1 (or q = 0), the pattern appears in only one class. For q = 1:
H(C|X)|_{q=1} = (θ − 1) ( ((p − θ)/(1 − θ)) log((p − θ)/(1 − θ)) + ((1 − p)/(1 − θ)) log((1 − p)/(1 − θ)) )
• When θ = p, this reduces to H(C|X)|_{q=1} = −(1 − p) log 1 = 0 (since p < 1), so the information gain is maximal.
[Plot: information gain (y-axis, 0–0.9) vs. pattern frequency (x-axis, 0–700); curves: InfoGain and IG_UpperBnd.]
min_sup | # Patterns | Time  | SVM (%) | Decision Tree (%)
1       | N/A        | N/A   | N/A     | N/A
2000    | 68,967     | 44.70 | 92.52   | 97.59
2200    | 28,358     | 19.94 | 91.68   | 97.84
2500    | 6,837      | 2.91  | 91.68   | 97.62
2800    | 1,031      | 0.47  | 91.84   | 97.37
3000    | 136        | 0.06  | 91.90   | 97.06
Classification Overview
Associative Classification
Basic idea
• Extract graph substructures F = {g1, …, gn}
• Represent a graph with a feature vector x = {x1, …, xn}, where xi is the frequency of gi in that graph
• Build a classification model
[Figure: two example molecules and their n-dimensional substructure feature vectors. Courtesy of Nikil Wale.]
Maccs Keys (MK)
• Each fragment forms a fixed dimension in the descriptor-space
• A domain expert identifies “important” fragments for bioactivity
[Figure: example fragments containing NH2, OH, and C=O groups.]

Deleting bi-connected components from the compound leaves the left-over trees
[Figure: a compound decomposed into bi-connected components and left-over trees.]
[Figure: frequent subgraph discovery with a minimum support threshold yields fragments such as one with support +ve: 40%, −ve: 0% and another with support +ve: 1%, −ve: 30%.]
• Feature generation
– Frequent topological subgraphs by FSG
– Frequent geometric subgraphs with 3D shape information
• Feature selection
– Sequential covering paradigm
• Classification
– Use SVM to learn a classifier based on feature vectors
– Assign different misclassification costs for different classes to address skewed class distribution
Classification Overview
Associative Classification
[Figure: the two-step framework — training instances → mine frequent patterns (10^4–10^6) → filter → discriminative patterns → feature construction → model learning — contrasted with direct mining of discriminative patterns from the data with an FP-tree. The discriminative measure is non-monotonic, while subgraphs are enumerated from small size to large size using the anti-monotonic support.]
• Extensions
– Mining top-k discriminative patterns
– Mining approximate/weighted discriminative patterns
Harmony [Wang and Karypis, SDM’05]
• Prediction
– For a test case, partition the rules into k groups based on class labels
– Compute the score for each rule group
– Predict based on the rule group with the highest score
Accuracy of Harmony
Runtime of Harmony
DDPMine [Cheng et al., ICDE’08]
• Basic idea
– Integration of branch-and-bound search with FP-growth mining
– Iteratively eliminate training instances and progressively shrink the FP-tree
• Performance
– Maintains high accuracy
– Improves mining efficiency
[Figure: a pattern enumeration tree over itemsets such as a, ab, ac, bc, bd, cd, ce, cef, ceg.]
Branch-and-Bound Search
[Figure: training examples are progressively covered — the 1st branch-and-bound search selects feature 1 and removes the examples it covers, the 2nd selects feature 2, the 3rd selects feature 3.]
1. Branch-and-Bound Search
• Instance elimination: |D_i| = |D_{i−1}| − |T(α_i)| ≤ (1 − θ0)|D_{i−1}| ≤ … ≤ (1 − θ0)^i |D_0|
• Number of iterations: n ≤ log_{1/(1−θ0)} |D_0|
Accuracy Comparison
[Figure: classification accuracy of Harmony vs. DDPMine on benchmark datasets.]
Objective functions
• Rule of thumb: if the frequency difference of a graph pattern between the positive dataset and the negative dataset increases, the pattern becomes more interesting
[Figure: sibling patterns in the pattern search tree — a size-5 graph g and a size-6 graph; g’ is a sibling of g.]
LEAP Algorithm
3. Branch-and-Bound Search with F(g*)
Branch-and-Bound vs. LEAP
[Table: Branch-and-Bound vs. LEAP, comparing optimality (guaranteed vs. near optimal), feature and data description, AUC, and runtime.]
• Upper bound:
• Original set:
• Subset:
4. Non-overfitting
[Chart: log number of patterns (0–4 scale) on Adult, Chess, Hypo, Sick, Sonar.]

Datasets | MbT #Pat | #Pat using MbT sup | Ratio (MbT #Pat / #Pat using MbT sup)
Adult    | 1039.2   | 252809             | 0.41%
Chess    | 46.8     | +∞                 | ~0%
Hypo     | 14.8     | 423439             | 0.0035%
Sick     | 15.4     | 4818391            | 0.00032%
Sonar    | 7.4      | 95507              | 0.00775%

[Chart: accuracy (70–100%) on the five datasets — 4 wins, 1 loss.]
[Chart: Log(DT #Pat) vs. Log(MbT #Pat) on the five datasets — MbT uses a much smaller number of patterns.]
Classification Overview
Associative Classification
– Hybrid weighting
[Figure: the negative examples are partitioned into k samples; each sample is combined with the positive examples, and an FS-based classification model is trained on each, yielding classifiers C1, C2, C3, …, Ck.]
f_E(x) = (1/k) Σ_{i=1}^{k} f_i(x)
The error of each classifier is independent and can be reduced through the ensemble.
ROC Curve
Sampling and ensemble
Classification Overview
Associative Classification
References (2)
G. Cong, K. Tan, A. Tung, and X. Xu. Mining Top-k Covering Rule Groups for
Gene Expression Data, SIGMOD’05.
M. Deshpande, M. Kuramochi, N. Wale, and G. Karypis. Frequent
Substructure-based Approaches for Classifying Chemical Compounds,
TKDE’05.
G. Dong and J. Li. Efficient Mining of Emerging Patterns: Discovering
Trends and Differences, KDD’99.
G. Dong, X. Zhang, L. Wong, and J. Li. CAEP: Classification by Aggregating
Emerging Patterns, DS’99.
R. O. Duda, P. E. Hart, and D. G. Stork. Pattern Classification (2nd ed.), John
Wiley & Sons, 2001.
W. Fan, K. Zhang, H. Cheng, J. Gao, X. Yan, J. Han, P. S. Yu, and O.
Verscheure. Direct Mining of Discriminative and Essential Graphical and
Itemset Features via Model-based Search Tree, KDD’08.
J. Han and M. Kamber. Data Mining: Concepts and Techniques (2nd ed.),
Morgan Kaufmann, 2006.
J. Han, J. Pei, and Y. Yin. Mining Frequent Patterns without Candidate
Generation, SIGMOD’00.
T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical
Learning, Springer, 2001.
D. Heckerman, D. Geiger and D. M. Chickering. Learning Bayesian Networks:
The Combination of Knowledge and Statistical Data, Machine Learning,
1995.
References (3)
T. Horvath, T. Gartner, and S. Wrobel. Cyclic Pattern Kernels for
Predictive Graph Mining, KDD’04.
J. Huan, W. Wang, and J. Prins. Efficient Mining of Frequent Subgraph
in the Presence of Isomorphism, ICDM’03.
A. Inokuchi, T. Washio, and H. Motoda. An Apriori-based Algorithm for
Mining Frequent Substructures from Graph Data, PKDD’00.
T. Kudo, E. Maeda, and Y. Matsumoto. An Application of Boosting to
Graph Classification, NIPS’04.
M. Kuramochi and G. Karypis. Frequent Subgraph Discovery, ICDM’01.
W. Li, J. Han, and J. Pei. CMAR: Accurate and Efficient Classification
based on Multiple Class-association Rules, ICDM’01.
B. Liu, W. Hsu, and Y. Ma. Integrating Classification and Association
Rule Mining, KDD’98.
H. Liu, J. Han, D. Xin, and Z. Shao. Mining Frequent Patterns on Very
High Dimensional Data: A Top-down Row Enumeration Approach,
SDM’06.
S. Nijssen, and J. Kok. A Quickstart in Frequent Structure Mining Can
Make a Difference, KDD’04.
F. Pan, G. Cong, A. Tung, J. Yang, and M. Zaki. CARPENTER: Finding
Closed Patterns in Long Biological Datasets, KDD’03.
References (4)
F. Pan, A. Tung, G. Cong, and X. Xu. COBBLER: Combining Column and Row Enumeration for Closed Pattern Discovery, SSDBM’04.
J. Pei, J. Han, B. Mortazavi-Asl, H. Pinto, Q. Chen, U. Dayal, and M-C. Hsu.
PrefixSpan: Mining Sequential Patterns Efficiently by Prefix-projected
Pattern Growth, ICDE’01.
R. Srikant and R. Agrawal. Mining Sequential Patterns: Generalizations and
Performance Improvements, EDBT’96.
Y. Sun, Y. Wang, and A. K. C. Wong. Boosting an Associative Classifier,
TKDE’06.
P-N. Tan, V. Kumar, and J. Srivastava. Selecting the Right Interestingness
Measure for Association Patterns, KDD’02.
R. Ting and J. Bailey. Mining Minimal Contrast Subgraph Patterns, SDM’06.
N. Wale and G. Karypis. Comparison of Descriptor Spaces for Chemical
Compound Retrieval and Classification, ICDM’06.
H. Wang, W. Wang, J. Yang, and P.S. Yu. Clustering by Pattern Similarity in
Large Data Sets, SIGMOD’02.
J. Wang and G. Karypis. HARMONY: Efficiently Mining the Best Rules for
Classification, SDM’05.
X. Yan, H. Cheng, J. Han, and P. S. Yu. Mining Significant Graph Patterns
by Scalable Leap Search, SIGMOD’08.
X. Yan and J. Han. gSpan: Graph-based Substructure Pattern Mining,
ICDM’02.
References (5)
X. Yan, P.S. Yu, and J. Han. Graph Indexing: A Frequent Structure-
based Approach, SIGMOD’04.
X. Yin and J. Han. CPAR: Classification Based on Predictive
Association Rules, SDM’03.
M.J. Zaki. Scalable Algorithms for Association Mining, TKDE’00.
M.J. Zaki. SPADE: An Efficient Algorithm for Mining Frequent
Sequences, Machine Learning’01.
M.J. Zaki and C.J. Hsiao. CHARM: An Efficient Algorithm for Closed
Itemset Mining, SDM’02.
F. Zhu, X. Yan, J. Han, P.S. Yu, and H. Cheng. Mining Colossal Frequent
Patterns by Core Pattern Fusion, ICDE’07.
Questions?
[email protected]
http://www.se.cuhk.edu.hk/~hcheng