
Unit 5: Concept Description & Association Rule Mining
 What is concept description?
 Data generalization and summarization-based
characterization
 Analytical characterization: Analysis of attribute relevance
 Market basket analysis
 Finding frequent item sets
 Apriori algorithm
 Improved Apriori algorithm
 Incremental ARM
 Associative Classification
Concept Description
 Descriptive vs. predictive data mining
 Descriptive mining: describes concepts or task-relevant data sets in concise, summarative, informative, discriminative forms
 Predictive mining: based on data and analysis, constructs models for the database and predicts the trend and properties of unknown data
 Concept description:
 Characterization: provides a concise and succinct summarization of the given collection of data
 Comparison: provides descriptions comparing two or more collections of data
Concept Description vs. OLAP
 Concept description:
 can handle complex data types of the attributes and
their aggregations
 a more automated process
 OLAP:
 restricted to a small number of dimensions and measure types
 a user-controlled process
Unit 5- Concept Description &
Association Rule Mining
 What is concept description?
 Data generalization and summarization-based
characterization
 Analytical characterization: Analysis of attribute relevance
 Market basket analysis
 Finding frequent item sets
 Apriori algorithm
 Improved Apriori algorithm
 Incremental ARM
 Associative Classification
Data Generalization and Summarization-based
Characterization
 Data generalization
 A process which abstracts a large set of task-relevant data in a database from low conceptual levels to higher ones.
[Figure: a ladder of conceptual levels, 1 (lowest) through 5 (highest)]

 Approaches:
 Data cube approach(OLAP approach)
 Attribute-oriented induction approach
Characterization: Data Cube Approach
(without using AO-Induction)
 Perform computations and store results in data cubes
 Strength
 An efficient implementation of data generalization
 Computation of various kinds of measures
 e.g., count( ), sum( ), average( ), max( )
 Generalization and specialization can be performed on a data cube
by roll-up and drill-down
 Limitations
 handle only dimensions of simple nonnumeric data and measures of
simple aggregated numeric values.
 Lack of intelligent analysis: cannot tell which dimensions should be used, or to what level the generalization should reach
Attribute-Oriented Induction
 Not confined to categorical data nor particular measures.
 How it is done?
 Collect the task-relevant data (initial relation) using a relational database query
 Perform generalization by attribute removal or attribute
generalization.
 Apply aggregation by merging identical, generalized tuples
and accumulating their respective counts.
 Interactive presentation with users.
Basic Principles of Attribute-Oriented
Induction
 Data focusing: task-relevant data, including dimensions, and the result
is the initial relation.
 Attribute-removal: remove attribute A if there is a large set of distinct
values for A but (1) there is no generalization operator on A, or (2) A’s
higher level concepts are expressed in terms of other attributes.
 Attribute-generalization: If there is a large set of distinct values for A,
and there exists a set of generalization operators on A, then select an
operator and generalize A.
 Attribute-threshold control: typically 2-8 distinct values, specified by the user or by default.
 Generalized relation threshold control: control the final relation/rule
size.
Basic Algorithm for Attribute-Oriented
Induction
 InitialRel: Query processing of task-relevant data, deriving the initial
relation.
 PreGen: Based on the analysis of the number of distinct values in each
attribute, determine generalization plan for each attribute: removal? or
how high to generalize?
 PrimeGen: Based on the PreGen plan, perform generalization to the
right level to derive a “prime generalized relation”, accumulating the
counts.
 Presentation: User interaction: (1) adjust levels by drilling, (2) pivoting,
(3) mapping into rules, cross tabs, visualization presentations.
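
To make the four phases concrete, here is a minimal Python sketch of attribute-oriented induction, not the textbook's implementation; the relation format, the hierarchies mapping, and the threshold value are all hypothetical choices:

from collections import Counter

def attribute_oriented_induction(rows, hierarchies, attr_threshold=5):
    # rows: list of dicts, the initial relation from the query (InitialRel)
    # hierarchies: attr -> function mapping a value one level up its concept
    #              hierarchy, or None if no generalization operator exists
    attrs = list(rows[0])
    plan = {}
    for a in attrs:                                   # PreGen: plan per attribute
        if len({r[a] for r in rows}) <= attr_threshold:
            plan[a] = "keep"
        elif hierarchies.get(a) is None:
            plan[a] = "remove"                        # many values, no operator
        else:
            plan[a] = "generalize"                    # one level only in this
                                                      # sketch; real AOI iterates
    out = Counter()                                   # PrimeGen: generalize, merge
    for r in rows:                                    # identical tuples, count
        key = tuple(hierarchies[a](r[a]) if plan[a] == "generalize" else r[a]
                    for a in attrs if plan[a] != "remove")
        out[key] += 1
    return out                                        # prime generalized relation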
Example
 DMQL: Describe general characteristics of graduate students in
the Big-University database
use Big_University_DB
mine characteristics as “Science_Students”
in relevance to name, gender, major, birth_place,
birth_date, residence, phone#, gpa
from student
where status in “graduate”
 Corresponding SQL statement:
Select name, gender, major, birth_place, birth_date,
residence, phone#, gpa
from student
where status in {“Msc”, “MBA”, “PhD” }
Class Characterization: An Example

Initial relation:

Name            Gender  Major    Birth-Place            Birth_date  Residence                 Phone #   GPA
Jim Woodman     M       CS       Vancouver, BC, Canada  8-12-76     3511 Main St., Richmond   687-4598  3.67
Scott Lachance  M       CS       Montreal, Que, Canada  28-7-75     345 1st Ave., Richmond    253-9106  3.70
Laura Lee       F       Physics  Seattle, WA, USA       25-8-70     125 Austin Ave., Burnaby  420-5232  3.83
…               …       …        …                      …           …                         …         …

Generalization plan: Name removed; Gender retained; Major generalized to {Sci, Eng, Bus}; Birth-Place to country; Birth_date to age range; Residence to city; Phone # removed; GPA to {Excl, VG, …}.

Prime generalized relation:

Gender  Major    Birth_region  Age_range  Residence  GPA        Count
M       Science  Canada        20-25      Richmond   Very-good  16
F       Science  Foreign       25-30      Burnaby    Excellent  22
…       …        …             …          …          …          …

Crosstab on Gender and Birth_region:

Gender  Canada  Foreign  Total
M       16      14       30
F       10      22       32
Total   26      36       62
Presentation of Generalized Results
 Generalized relation:
 Relations where some or all attributes are generalized, with counts or other aggregation values accumulated.
 Cross tabulation:
 Mapping results into cross tabulation form (similar to contingency tables).
 Visualization techniques:
 Pie charts, bar charts, curves, cubes, and other visual forms.
 Quantitative characteristic rules:
 Mapping generalized results into characteristic rules with quantitative information associated with them, e.g.,

grad(x) ∧ male(x) ⇒ birth_region(x) = "Canada" [t: 53%] ∨ birth_region(x) = "foreign" [t: 47%]
Unit 5- Concept Description &
Association Rule Mining
 What is concept description?
 Data generalization and summarization-based
characterization
 Analytical characterization: Analysis of attribute relevance
 Association Mining
 Market basket analysis
 Finding frequent item sets
 Apriori algorithm
 Improved Apriori algorithm
 Incremental ARM
 Associative Classification
Association Mining
 Association rule mining:
 Finding frequent patterns, associations, correlations, or causal structures among sets of items or objects in transaction databases, relational databases, and other information repositories.
 Applications:
 Basket data analysis, cross-marketing, catalog design, loss-leader analysis, clustering, classification, etc.
What Is Frequent Pattern Analysis?
 Frequent pattern: a pattern (a set of items, subsequences, substructures,
etc.) that occurs frequently in a data set
 First proposed by Agrawal, Imielinski, and Swami [AIS93] in the context of
frequent itemsets and association rule mining
 Motivation: Finding inherent regularities in data
 What products were often purchased together?— Beer and diapers?!
 What are the subsequent purchases after buying a PC?
 What kinds of DNA are sensitive to this new drug?
 Can we automatically classify web documents?
 Applications
 Basket data analysis, cross-marketing, catalog design, sale campaign
analysis, Web log (click stream) analysis, and DNA sequence analysis.
Why Is Freq. Pattern Mining Important?
 Discloses an intrinsic and important property of data sets
 Forms the foundation for many essential data mining tasks
 Association, correlation, and causality analysis
 Sequential, structural (e.g., sub-graph) patterns
 Pattern analysis in spatiotemporal, multimedia, time-series,
and stream data
 Classification: associative classification
 Cluster analysis: frequent pattern-based clustering
 Data warehousing: iceberg cube and cube-gradient
 Semantic data compression: fascicles
 Broad applications
Basic Concepts: Frequent Patterns and Association Rules

Transaction-id   Items bought
10               A, B, D
20               A, C, D
30               A, D, E
40               B, E, F
50               B, C, D, E, F

 Itemset X = {x1, …, xk}
 Find all the rules X ⇒ Y with minimum support and confidence
 support, s: probability that a transaction contains X ∪ Y
 confidence, c: conditional probability that a transaction having X also contains Y

[Figure: Venn diagram of customers buying beer, customers buying diaper, and customers buying both]

Let sup_min = 50%, conf_min = 50%
Frequent patterns: {A:3, B:3, D:4, E:3, AD:3}
Association rules:
 A ⇒ D (60%, 100%)
 D ⇒ A (60%, 75%)
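
As a quick check of these definitions, a small Python sketch computing support and confidence over the table above (transaction IDs omitted):

db = [{'A','B','D'}, {'A','C','D'}, {'A','D','E'}, {'B','E','F'}, {'B','C','D','E','F'}]

def support(itemset, db):
    # fraction of transactions containing every item of itemset
    return sum(itemset <= t for t in db) / len(db)

def confidence(X, Y, db):
    # conditional probability that a transaction with X also contains Y
    return support(X | Y, db) / support(X, db)

print(support({'A','D'}, db))        # 0.6  -> A => D holds with 60% support
print(confidence({'A'}, {'D'}, db))  # 1.0  -> ... and 100% confidence
print(confidence({'D'}, {'A'}, db))  # 0.75 -> D => A: 60% support, 75% confidence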
Closed Patterns and Max-Patterns
 A long pattern contains a combinatorial number of sub-patterns; e.g., {a1, …, a100} contains C(100,1) + C(100,2) + … + C(100,100) = 2^100 − 1 ≈ 1.27 × 10^30 sub-patterns!
 Solution: mine closed patterns and max-patterns instead
 An itemset X is closed if X is frequent and there exists no super-pattern Y ⊃ X with the same support as X (proposed by Pasquier, et al. @ ICDT'99)
 An itemset X is a max-pattern if X is frequent and there exists no frequent super-pattern Y ⊃ X (proposed by Bayardo @ SIGMOD'98)
 Closed patterns are a lossless compression of frequent patterns
 Reduces the # of patterns and rules
Closed Patterns and Max-Patterns
 Exercise. DB = {<a1, …, a100>, <a1, …, a50>}
 Min_sup = 1.
 What is the set of closed itemsets?
 <a1, …, a100>: 1
 <a1, …, a50>: 2
 What is the set of max-patterns?
 <a1, …, a100>: 1
 What is the set of all frequent patterns?
 All 2^100 − 1 nonempty subsets of {a1, …, a100}, far too many to list!
Chapter 5: Mining Frequent Patterns,
Associations and Correlations
 Basic concepts and a road map
 Efficient and scalable frequent itemset mining
methods
 Mining various kinds of association rules
 From association mining to correlation analysis
 Constraint-based association mining
 Summary
Scalable Methods for Mining Frequent Patterns
 The downward closure property of frequent patterns
 Any subset of a frequent itemset must be frequent
 If {beer, diaper, nuts} is frequent, so is {beer, diaper}
 i.e., every transaction having {beer, diaper, nuts} also contains {beer, diaper}
 Scalable mining methods: two major approaches
 Apriori
 Frequent-pattern growth (FP-growth)
Apriori: A Candidate Generation-and-Test Approach
 Apriori principle: all nonempty subsets of a frequent itemset must also be frequent.
 Apriori pruning principle: if there is any itemset which is infrequent, its superset should not be generated/tested!
 Method:
 Initially, scan the DB once to get the frequent 1-itemsets
 Generate length-(k+1) candidate itemsets from length-k frequent itemsets
 Test the candidates against the DB
 Terminate when no frequent or candidate set can be generated
The Apriori Algorithm: An Example (sup_min = 2)

Database TDB:
Tid   Items
10    A, C, D
20    B, C, E
30    A, B, C, E
40    B, E

1st scan → C1: {A}:2, {B}:3, {C}:3, {D}:1, {E}:3
L1: {A}:2, {B}:3, {C}:3, {E}:3

C2 (generated from L1): {A,B}, {A,C}, {A,E}, {B,C}, {B,E}, {C,E}
2nd scan → C2 counts: {A,B}:1, {A,C}:2, {A,E}:1, {B,C}:2, {B,E}:3, {C,E}:2
L2: {A,C}:2, {B,C}:2, {B,E}:3, {C,E}:2

C3 (generated from L2): {B,C,E}
3rd scan → L3: {B,C,E}:2
The Apriori Algorithm
 Pseudo-code:
Ck: candidate itemset of size k
Lk: frequent itemset of size k

L1 = {frequent items};
for (k = 1; Lk != ∅; k++) do begin
    Ck+1 = candidates generated from Lk;
    for each transaction t in database do
        increment the count of all candidates in Ck+1 that are contained in t;
    Lk+1 = candidates in Ck+1 with min_support;
end
return ∪k Lk;
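
A runnable Python rendering of this pseudo-code (a sketch: transactions are sets, min_support is an absolute count, and the prune step of candidate generation is shown separately on the next slide):

from collections import defaultdict

def apriori(transactions, min_support):
    counts = defaultdict(int)
    for t in transactions:                 # one scan for L1
        for item in t:
            counts[(item,)] += 1
    Lk = {i: c for i, c in counts.items() if c >= min_support}
    result = dict(Lk)
    k = 1
    while Lk:
        prev = sorted(Lk)                  # sorted tuples of length k
        Ck1 = {a + (b[k-1],) for a in prev for b in prev
               if a[:k-1] == b[:k-1] and a[k-1] < b[k-1]}   # self-join Lk
        counts = defaultdict(int)
        for t in transactions:             # one scan per level to count candidates
            for c in Ck1:
                if set(c) <= t:
                    counts[c] += 1
        Lk = {c: n for c, n in counts.items() if n >= min_support}
        result.update(Lk)
        k += 1
    return result                          # union of all Lk, with supports

db = [{'A','C','D'}, {'B','C','E'}, {'A','B','C','E'}, {'B','E'}]
print(apriori(db, 2))   # reproduces the example above, incl. ('B','C','E'): 2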
Important Details of Apriori
 How to generate candidates?
 Step 1: self-joining Lk
 Step 2: pruning
 How to count supports of candidates?
 Example of Candidate-generation
 L3={abc, abd, acd, ace, bcd}
 Self-joining: L3*L3
 abcd from abc and abd
 acde from acd and ace
 Pruning:
 acde is removed because ade is not in L3
 C4={abcd}
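
The self-join and prune steps can be coded directly; this sketch reproduces the L3 example above (itemsets represented as sorted tuples):

from itertools import combinations

def apriori_gen(Lk, k):
    Lk = set(Lk)
    Ck1 = set()
    for a in Lk:
        for b in Lk:
            if a[:k-1] == b[:k-1] and a[k-1] < b[k-1]:        # self-join step
                c = a + (b[k-1],)
                if all(s in Lk for s in combinations(c, k)):  # prune step
                    Ck1.add(c)
    return Ck1

L3 = {('a','b','c'), ('a','b','d'), ('a','c','d'), ('a','c','e'), ('b','c','d')}
print(apriori_gen(L3, 3))   # {('a','b','c','d')}; acde is pruned since ade not in L3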
Challenges of Frequent Pattern Mining
 Challenges
 Multiple scans of the transaction database
 Huge number of candidates
 Tedious workload of support counting for candidates
 Improving Apriori: general ideas
 Reduce passes of transaction database scans
 Shrink the number of candidates
 Facilitate support counting of candidates
Methods to Improve Apriori’s Efficiency
 Hash-based itemset counting: A k-itemset whose
corresponding hashing bucket count is below the
threshold cannot be frequent.
 Transaction reduction: A transaction that does not contain
any frequent k-itemset is useless in subsequent scans.
 Partitioning: Any itemset that is potentially frequent in DB
must be frequent in at least one of the partitions of DB.
 Sampling: mining on a subset of given data, lower support
threshold + a method to determine the completeness.
 Dynamic itemset counting: add new candidate itemsets
only when all of their subsets are estimated to be frequent.
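
As one illustration, the hash-based idea (in the spirit of the DHP algorithm) can be sketched as follows; the bucket count and the hash function are arbitrary choices here:

from itertools import combinations

def build_pair_filter(transactions, min_support, n_buckets=101):
    buckets = [0] * n_buckets
    h = lambda pair: hash(pair) % n_buckets       # arbitrary hash function
    for t in transactions:                        # done during the L1 counting scan
        for pair in combinations(sorted(t), 2):
            buckets[h(pair)] += 1
    # a 2-itemset whose bucket count is below min_support cannot be frequent,
    # so it never needs to become a candidate
    return lambda pair: buckets[h(pair)] >= min_support

may_be_frequent = build_pair_filter([{'A','C','D'}, {'B','C','E'},
                                     {'A','B','C','E'}, {'B','E'}], 2)
print(may_be_frequent(('A', 'C')))   # True: its bucket count reaches at least 2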
Outline
 Frequent pattern mining: problem statement and an example
 Review of Apriori-like approaches
 FP-growth:
 Overview
 FP-tree: structure, construction and advantages
 FP-growth: FP-tree → conditional pattern bases → conditional FP-trees → frequent patterns
 Experiments
 Discussion:
 Improvement of FP-growth
 Concluding remarks
Frequent Pattern Mining: An Example

Given a transaction database DB and a minimum support threshold ξ, find all frequent patterns (itemsets) with support no less than ξ.

Input: DB:
TID   Items bought
100   {f, a, c, d, g, i, m, p}
200   {a, b, c, f, l, m, o}
300   {b, f, h, j, o}
400   {b, c, k, s, p}
500   {a, f, c, e, l, p, m, n}

Minimum support: ξ = 3

Output: all frequent patterns, i.e., f, a, …, fa, fac, fam, fm, am, …

Problem statement: how to efficiently find all frequent patterns?
Apriori
 Main steps of the Apriori algorithm:
 Candidate generation: use frequent (k−1)-itemsets (Lk−1) to generate candidates for frequent k-itemsets (Ck)
 Candidate test: scan the database and count each pattern in Ck to get the frequent k-itemsets (Lk)
 E.g.:

TID   Items bought
100   {f, a, c, d, g, i, m, p}
200   {a, b, c, f, l, m, o}
300   {b, f, h, j, o}
400   {b, c, k, s, p}
500   {a, f, c, e, l, p, m, n}

Apriori iteration:
C1: f, a, c, d, g, i, m, p, l, o, h, j, k, s, b, e, n
L1: f, a, c, m, b, p
C2: fa, fc, fm, fp, ac, am, …, bp
L2: fa, fc, fm, …
…
Performance Bottlenecks of Apriori
 Bottlenecks of Apriori: candidate generation
 Generates huge candidate sets:
 10^4 frequent 1-itemsets will generate more than 10^7 candidate 2-itemsets
 To discover a frequent pattern of size 100, e.g., {a1, a2, …, a100}, one needs to generate 2^100 ≈ 10^30 candidates.
 Candidate testing incurs multiple scans of the database: one scan per iteration to count each candidate
Overview of FP-Growth: Ideas
 Compress a large database into a compact Frequent-Pattern tree (FP-tree) structure
 highly compacted, but complete for frequent pattern mining
 avoids costly repeated database scans
 Develop an efficient FP-tree-based frequent pattern mining method (FP-growth)
 A divide-and-conquer methodology: decompose mining tasks into smaller ones
 Avoid candidate generation: sub-database test only
FP-tree: Construction and Design

Construct the FP-tree in two steps:
1. Scan the transaction DB for the first time, find the frequent items (single-item patterns) and order them into a list L in frequency-descending order.
   e.g., L = {f:4, c:4, a:3, b:3, m:3, p:3}
   in the format (item-name, support)
2. Scan the DB a second time; for each transaction, order its frequent items according to L, and construct the FP-tree by inserting each frequency-ordered transaction into it.
FP-tree Example: step 1

Step 1: scan the DB once, find the frequent 1-itemsets

TID   Items bought
100   {f, a, c, d, g, i, m, p}
200   {a, b, c, f, l, m, o}
300   {b, f, h, j, o}
400   {b, c, k, s, p}
500   {a, f, c, e, l, p, m, n}

Item frequency: f:4, c:4, a:3, b:3, m:3, p:3
FP-tree Example: step 2

Step 2: scan the DB a second time, order the frequent items in each transaction

TID   Items bought                 (ordered) frequent items
100   {f, a, c, d, g, i, m, p}     {f, c, a, m, p}
200   {a, b, c, f, l, m, o}        {f, c, a, b, m}
300   {b, f, h, j, o}              {f, b}
400   {b, c, k, s, p}              {c, b, p}
500   {a, f, c, e, l, p, m, n}     {f, c, a, m, p}
FP-tree Example: step 2 (construct the FP-tree)

[Figure: inserting {f, c, a, m, p} creates the single path f:1 → c:1 → a:1 → m:1 → p:1 under the root {}; inserting {f, c, a, b, m} increments the shared prefix to f:2 → c:2 → a:2 and adds the branch b:1 → m:1 beside m:1 → p:1]

NOTE: each transaction corresponds to one path in the FP-tree
FP-tree Example: step 2 (continued)

[Figure: inserting {f, b} adds b:1 under f:3; inserting {c, b, p} adds the new root branch c:1 → b:1 → p:1; inserting the final {f, c, a, m, p} yields f:4 → c:3 → a:3 with m:2 → p:2 and b:1 → m:1. Node-links connect the nodes that carry the same item]
Construction Example: Final FP-tree

[Figure: the final FP-tree with its header table (items f, c, a, b, m, p, each with a head-of-node-link pointer). Root {} has children f:4 and c:1; f:4 → c:3 → a:3, with a:3 → m:2 → p:2 and a:3 → b:1 → m:1; f:4 → b:1; c:1 → b:1 → p:1]
FP-Tree Definition
 The FP-tree is a frequent-pattern tree. Formally, an FP-tree is a tree structure defined as follows:
1. One root labeled as "null", a set of item-prefix subtrees as the children of the root, and a frequent-item header table.
2. Each node in the item-prefix subtrees has three fields:
 item-name: records which item this node represents,
 count: the number of transactions represented by the portion of the path reaching this node,
 node-link: links to the next node in the FP-tree carrying the same item-name, or null if there is none.
3. Each entry in the frequent-item header table has two fields:
 item-name, and
 head of node-link, which points to the first node in the FP-tree carrying that item-name.
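
A compact Python sketch of this structure and the two-scan construction (node-links are kept as per-item lists rather than chained pointers, an implementation convenience rather than part of the paper's definition):

class FPNode:
    def __init__(self, item, parent):
        self.item, self.count, self.parent = item, 1, parent
        self.children = {}                        # item -> child FPNode

def build_fptree(transactions, min_support):
    counts = {}
    for t in transactions:                        # scan 1: count every item
        for item in t:
            counts[item] = counts.get(item, 0) + 1
    L = [i for i, c in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
         if c >= min_support]                     # list L, frequency-descending
    rank = {item: r for r, item in enumerate(L)}  # (ties broken alphabetically)
    root = FPNode(None, None)                     # the "null" root
    header = {item: [] for item in L}             # header table: item -> its nodes
    for t in transactions:                        # scan 2: insert ordered paths
        node = root
        for item in sorted((i for i in t if i in rank), key=rank.get):
            if item in node.children:
                node.children[item].count += 1    # shared prefix: bump the count
            else:
                node.children[item] = FPNode(item, node)
                header[item].append(node.children[item])
            node = node.children[item]
    return root, header, L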
Advantages of the FP-tree Structure
 The most significant advantage of the FP-tree:
 Scan the DB twice, and only twice.
 Completeness:
 the FP-tree contains all the information related to mining frequent patterns (given the min-support threshold). Why?
 Compactness:
 the size of the tree is bounded by the occurrences of frequent items
 the height of the tree is bounded by the maximum number of items in a transaction
Questions?
 Why descending order?
 Example 1: inserting the same frequent items in different orders creates separate paths for one and the same itemset:

TID   (unordered) frequent items
100   {f, a, c, m, p}
500   {a, f, c, p, m}

[Figure: root {} with two disjoint paths f:1 → a:1 → c:1 → m:1 → p:1 and a:1 → f:1 → c:1 → p:1 → m:1]
Questions? (continued)
 Example 2: ordering the frequent items in ascending frequency:

TID   (ascended) frequent items
100   {p, m, a, c, f}
200   {m, b, a, c, f}
300   {b, f}
400   {p, b, c}
500   {p, m, a, c, f}

[Figure: the resulting tree, rooted at {} with children p:3, m:2, c:1 and many duplicated branches]

 This tree is larger than the FP-tree, because in the FP-tree more frequent items occupy higher positions, which produces fewer branches.
FP-growth: Mining Frequent Patterns Using the FP-tree
 General idea (divide-and-conquer):
 recursively grow frequent patterns using the FP-tree, looking for shorter ones recursively and then concatenating the suffix:
 for each frequent item, construct its conditional pattern base, and then its conditional FP-tree;
 repeat the process on each newly created conditional FP-tree until the resulting FP-tree is empty, or it contains only one path (a single path will generate all the combinations of its sub-paths, each of which is a frequent pattern)
3 Major Steps

Starting the processing from the end of list L:
 Step 1: construct the conditional pattern base for each item in the header table
 Step 2: construct the conditional FP-tree from each conditional pattern base
 Step 3: recursively mine the conditional FP-trees and grow the frequent patterns obtained so far. If a conditional FP-tree contains a single path, simply enumerate all the patterns.
Step 1: Construct Conditional Pattern Base
 Starting at the bottom of the frequent-item header table in the FP-tree
 Traverse the FP-tree by following the node-links of each frequent item
 Accumulate all transformed prefix paths of that item to form its conditional pattern base

[Figure: the FP-tree with header table, annotated with each item's prefix paths]

Conditional pattern bases:
item   cond. pattern base
p      fcam:2, cb:1
m      fca:2, fcab:1
b      fca:1, f:1, c:1
a      fc:3
c      f:3
f      {}
Properties of FP-Tree
 Node-link property
 For any frequent item ai, all the possible frequent patterns that contain ai can be obtained by following ai's node-links, starting from ai's head in the FP-tree header.
 Prefix path property
 To calculate the frequent patterns for a node ai in a path P, only the prefix sub-path of ai in P needs to be accumulated, and its frequency count should carry the same count as node ai.
Step 2: Construct Conditional FP-tree
 For each pattern base:
 accumulate the count for each item in the base
 construct the conditional FP-tree for the frequent items of the pattern base

Example: m's conditional pattern base is {fca:2, fcab:1}. The accumulated counts are f:3, c:3, a:3, b:1; with min support 3, b is dropped, so the m-conditional FP-tree is the single path f:3 → c:3 → a:3.
Step 3: Recursively Mine the Conditional FP-trees

conditional FP-tree of "m": (fca:3) → frequent pattern m
  add "a": conditional FP-tree of "am": (fc:3) → frequent pattern am
    add "c": conditional FP-tree of "cam": (f:3) → frequent pattern cam
      add "f": conditional FP-tree of "fcam" → frequent pattern fcam
    add "f": conditional FP-tree of "fam": 3 → frequent pattern fam
  add "c": conditional FP-tree of "cm": (f:3) → frequent pattern cm
    add "f": conditional FP-tree of "fcm": 3 → frequent pattern fcm
  add "f": conditional FP-tree of "fm": 3 → frequent pattern fm
Principles of FP-Growth
 Pattern growth property
 Let α be a frequent itemset in DB, B be α's conditional pattern base, and β be an itemset in B. Then α ∪ β is a frequent itemset in DB iff β is frequent in B.
 Is "fcabm" a frequent pattern?
 "fcab" is a branch of m's conditional pattern base
 "b" is NOT frequent in transactions containing "fcab"
 so "bm" is NOT a frequent itemset, and neither is "fcabm"
Conditional Pattern Bases and Conditional FP-Trees

Item   Conditional pattern base       Conditional FP-tree
p      {(fcam:2), (cb:1)}             {(c:3)}|p
m      {(fca:2), (fcab:1)}            {(f:3, c:3, a:3)}|m
b      {(fca:1), (f:1), (c:1)}        Empty
a      {(fc:3)}                       {(f:3, c:3)}|a
c      {(f:3)}                        {(f:3)}|c
f      Empty                          Empty

(items listed in the order of list L)
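
Combining the three steps, a recursive FP-growth sketch built on the build_fptree helper from the earlier FP-tree sketch (min_support is again an absolute count; replicating each prefix path count times is a simplification that ignores efficiency):

def fpgrowth(header, L, min_support, suffix=()):
    patterns = {}
    for item in reversed(L):                    # process from the end of list L
        support = sum(n.count for n in header[item])
        patterns[(item,) + suffix] = support    # grow the suffix by this item
        base = []                               # Step 1: conditional pattern base
        for node in header[item]:
            path, p = [], node.parent
            while p is not None and p.item is not None:
                path.append(p.item)
                p = p.parent
            base.extend([path[::-1]] * node.count)  # one prefix path per count
        if any(base):                           # Step 2: conditional FP-tree
            _, cheader, cL = build_fptree(base, min_support)
            patterns.update(fpgrowth(cheader, cL, min_support, (item,) + suffix))  # Step 3
    return patterns

db = [{'f','a','c','d','g','i','m','p'}, {'a','b','c','f','l','m','o'},
      {'b','f','h','j','o'}, {'b','c','k','s','p'}, {'a','f','c','e','l','p','m','n'}]
_, header, L = build_fptree(db, 3)
print({frozenset(p): s for p, s in fpgrowth(header, L, 3).items()})
# includes frozenset({'f','c','a','m'}): 3, matching the table above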
Single FP-tree Path Generation
 Suppose an FP-tree T has a single path P. The complete set of frequent patterns of T can be generated by enumerating all the combinations of the sub-paths of P.

Example: the m-conditional FP-tree is the single path f:3 → c:3 → a:3. All frequent patterns concerning m are m combined with every subset of {f, c, a}:
m, fm, cm, am, fcm, fam, cam, fcam
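
In code, this single-path shortcut is a one-line subset enumeration; shown here for the m-conditional path {f, c, a}:

from itertools import chain, combinations

path = ['f', 'c', 'a']                  # the single path of m's conditional FP-tree
suffix = ('m',)
patterns = [tuple(c) + suffix for c in
            chain.from_iterable(combinations(path, r) for r in range(len(path) + 1))]
print(patterns)   # m, fm, cm, am, fcm, fam, cam, fcam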
Summary of the FP-Growth Algorithm
 Mining frequent patterns can be viewed as first mining 1-itemsets and then progressively growing each 1-itemset by mining its conditional pattern base recursively
 This transforms a frequent k-itemset mining problem into a sequence of k frequent 1-itemset mining problems via a set of conditional pattern bases