
Data Mining: Sunitha R S Dept of ISE, RIT

The document discusses methods for generating frequent itemsets from transactional datasets in a concise yet informative manner. It defines key terms like frequent itemset, maximal frequent itemset, and closed frequent itemset. It then outlines various alternative methods for generating frequent itemsets, including traversing the itemset lattice using breadth-first or depth-first approaches, representing the database using horizontal or vertical data layouts, and employing prefix trees or suffix trees. The goal is to efficiently identify representative sets of frequent itemsets from large transactional datasets.

Data Mining

Sunitha R S
Dept of ISE,
RIT
Compact Representation of Frequent Itemsets
• The number of frequent itemsets produced from a transaction dataset can be very large.

• It is useful to identify a small representative set of itemsets from which all other frequent itemsets can be derived.
Definitions
• Frequent Itemset: An itemset whose support is at least a user-specified minimum support threshold (minsup).

• Maximal Frequent Itemset: A frequent itemset is maximal if none of its immediate supersets is frequent.

• Closed Frequent Itemset: An itemset is closed if none of its immediate supersets has the same support as the itemset. A closed frequent itemset is a closed itemset whose support also meets the minimum support.
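Downward closure property:
• All subsets of any frequent itemset must also be frequent.
• If {milk, bread, butter} is a frequent itemset, then the following itemsets are also frequent:
{milk}
{bread}
{butter}
{milk, bread}
{milk, butter}
{bread, butter}
• In general, a frequent itemset with k items has 2^k - 1 non-empty subsets (counting the itemset itself), and all of them are frequent. For k = 3 this gives 7 itemsets: the six listed above plus {milk, bread, butter}.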
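
The 2^k - 1 count can be checked with a short Python sketch that enumerates the non-empty subsets of the example itemset. The helper name `nonempty_subsets` is an illustrative choice and does not come from the slides.

```python
from itertools import combinations

def nonempty_subsets(itemset):
    """Return every non-empty subset of `itemset` as a sorted tuple."""
    items = sorted(itemset)
    return [c for r in range(1, len(items) + 1) for c in combinations(items, r)]

subsets = nonempty_subsets({"milk", "bread", "butter"})
for s in subsets:
    print(s)
print(len(subsets))  # 7, i.e. 2**3 - 1
```

By downward closure, every itemset printed here is frequent whenever {milk, bread, butter} is frequent.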
Need for Maximal and Closed itemsets
• Compact representations are used when the amount of data is huge.

• When the computation is very expensive and the individual subsets are of no interest, enumerating them can be avoided by reporting only the maximal frequent itemsets, which are not contained in any longer frequent itemset.

• Disadvantage of maximal frequent itemsets: although all their subsets are known to be frequent, the support of those subsets is not known. Support information is important for mining rules.

• So closed frequent itemsets are preferred.


Closed Itemset

• An itemset is closed if none of its immediate supersets has the same support as the itemset.

Transactions:

TID   Items
1     {A,B}
2     {B,C,D}
3     {A,B,C,D}
4     {A,B,D}
5     {A,B,C,D}

Support counts:

Itemset   Support     Itemset     Support
{A}       4           {A,B,C}     2
{B}       5           {A,B,D}     3
{C}       3           {A,C,D}     2
{D}       4           {B,C,D}     3
{A,B}     4           {A,B,C,D}   2
{A,C}     2
{A,D}     3
{B,C}     3
{B,D}     4
{C,D}     3
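
As a rough check of the definitions against this example, the following Python sketch recomputes the support counts from the five transactions and then lists the closed itemsets and the maximal frequent itemsets. The threshold `MINSUP = 3` is an assumption made only for illustration (the slides do not state a minimum support), and the helper names are hypothetical.

```python
from itertools import combinations

transactions = [
    {"A", "B"},
    {"B", "C", "D"},
    {"A", "B", "C", "D"},
    {"A", "B", "D"},
    {"A", "B", "C", "D"},
]
items = sorted(set().union(*transactions))

# Support count of every non-empty itemset over the items A..D.
support = {}
for r in range(1, len(items) + 1):
    for combo in combinations(items, r):
        s = frozenset(combo)
        support[s] = sum(1 for t in transactions if s <= t)

def immediate_supersets(itemset):
    """Itemsets obtained by adding exactly one more item."""
    return [itemset | {i} for i in items if i not in itemset]

# Closed: no immediate superset has the same support.
closed = [s for s in support
          if all(support[sup] != support[s] for sup in immediate_supersets(s))]

MINSUP = 3  # assumed threshold, not given in the slides
frequent = [s for s in support if support[s] >= MINSUP]
# Maximal frequent: frequent, and no immediate superset is frequent.
maximal = [s for s in frequent
           if all(support[sup] < MINSUP for sup in immediate_supersets(s))]

def show(itemsets):
    return sorted("".join(sorted(s)) for s in itemsets)

print(show(closed))   # ['AB', 'ABCD', 'ABD', 'B', 'BCD', 'BD']
print(show(maximal))  # ['ABD', 'BCD']
```

Under the assumed threshold, both maximal frequent itemsets also appear in the closed list, while {A,B,C,D} is closed but not frequent.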
Maximal vs Closed Itemsets

[Figure: nested sets. Maximal Frequent Itemsets lie inside Closed Frequent Itemsets, which lie inside Frequent Itemsets.]
Alternative Methods for Generating Frequent Itemsets
• Traversal of Itemset Lattice
– General-to-specific vs Specific-to-general

[Figure: position of the frequent itemset border in the lattice between null and {a1,a2,...,an}: (a) General-to-specific (b) Specific-to-general (c) Bidirectional]
Alternative Methods for Generating Frequent Itemsets
• The general-to-specific strategy works effectively if the maximum length of a frequent itemset is not too long.

• In the general-to-specific strategy, we start with the most general (shortest) itemsets and merge pairs of frequent (k-1)-itemsets to obtain candidate k-itemsets.

• In the specific-to-general strategy, the more specific (longer) frequent itemsets are considered first, before the more general frequent itemsets are found.
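
A minimal Python sketch of the general-to-specific idea is given below. It assumes one common way of merging, namely joining two frequent (k-1)-itemsets that share their first k-2 items. The slides do not prescribe a specific merge rule, so the function names, the sample transactions, and the threshold of 3 are all illustrative assumptions.

```python
def merge_candidates(frequent_prev):
    """Merge pairs of sorted (k-1)-itemsets that share their first k-2 items."""
    prev = sorted(frequent_prev)
    out = []
    for i in range(len(prev)):
        for j in range(i + 1, len(prev)):
            a, b = prev[i], prev[j]
            if a[:-1] == b[:-1]:
                out.append(a + (b[-1],))
    return out

def level_wise(transactions, minsup):
    """Go from 1-itemsets to longer itemsets, one level at a time."""
    items = sorted({i for t in transactions for i in t})
    level = [(i,) for i in items
             if sum(i in t for t in transactions) >= minsup]
    frequent = []
    while level:
        frequent.extend(level)
        candidates = merge_candidates(level)
        # Keep only candidates whose support count reaches minsup.
        level = [c for c in candidates
                 if sum(set(c) <= t for t in transactions) >= minsup]
    return frequent

transactions = [
    {"A", "B"},
    {"B", "C", "D"},
    {"A", "B", "C", "D"},
    {"A", "B", "D"},
    {"A", "B", "C", "D"},
]
result = level_wise(transactions, minsup=3)  # minsup=3 is an assumed value
print(["".join(s) for s in result])
```

The search stops as soon as a level produces no frequent itemsets, which is why this direction suits data whose frequent itemsets are short.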
Alternative Methods for Generating Frequent Itemsets
• Traversal of Itemset Lattice
– Equivalence Classes

[Figure: itemset lattice over items A, B, C, D, partitioned into equivalence classes]

(a) Prefix tree, level by level:
null
A  B  C  D
AB  AC  AD  BC  BD  CD
ABC  ABD  ACD  BCD
ABCD

(b) Suffix tree, level by level:
null
A  B  C  D
AB  AC  BC  AD  BD  CD
ABC  ABD  ACD  BCD
ABCD
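
To make the two groupings concrete, the Python sketch below partitions the 2-itemsets over A, B, C, D by their first item (prefix) and by their last item (suffix). The function name `classes` and the choice of a one-item prefix or suffix are assumptions for illustration.

```python
from itertools import combinations
from collections import defaultdict

items = ["A", "B", "C", "D"]

def classes(k, key):
    """Group all k-itemsets by the label that `key` extracts from each."""
    groups = defaultdict(list)
    for combo in combinations(items, k):
        groups[key(combo)].append("".join(combo))
    return dict(groups)

prefix_classes = classes(2, lambda c: c[0])   # same first item
suffix_classes = classes(2, lambda c: c[-1])  # same last item

print(prefix_classes)  # {'A': ['AB', 'AC', 'AD'], 'B': ['BC', 'BD'], 'C': ['CD']}
print(suffix_classes)  # {'B': ['AB'], 'C': ['AC', 'BC'], 'D': ['AD', 'BD', 'CD']}
```

Each class can then be searched on its own, which is the point of partitioning the lattice this way.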


Alternative Methods for Generating Frequent Itemsets
• Traversal of Itemset Lattice
– Breadth-first vs Depth-first

[Figure: lattice traversal order: (a) Breadth first (b) Depth first]
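
The difference in visiting order can be sketched in Python on a prefix tree over four items. This is only an illustration of the two traversal orders, with no support counting or pruning; the item list and function names are assumptions.

```python
from collections import deque

items = ["A", "B", "C", "D"]

def children(node):
    """Extend an itemset with each item that comes after its last item."""
    start = items.index(node[-1]) + 1 if node else 0
    return [node + (i,) for i in items[start:]]

def breadth_first():
    """Visit all k-itemsets before any (k+1)-itemset."""
    order, queue = [], deque([()])
    while queue:
        node = queue.popleft()
        if node:
            order.append("".join(node))
        queue.extend(children(node))
    return order

def depth_first(node=()):
    """Follow one branch down to its longest itemset before backtracking."""
    order = ["".join(node)] if node else []
    for child in children(node):
        order.extend(depth_first(child))
    return order

print(breadth_first())  # starts A, B, C, D, AB, AC, ...
print(depth_first())    # starts A, AB, ABC, ABCD, ABD, ...
```

Both orders visit the same 15 itemsets; only the sequence differs.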


Alternative Methods for Generating Frequent Itemsets
• Representation of Database
– horizontal vs vertical data layout

Horizontal Data Layout

TID   Items
1     A,B,E
2     B,C,D
3     C,E
4     A,C,D
5     A,B,C,D
6     A,E
7     A,B
8     A,B,C
9     A,C,D
10    B

Vertical Data Layout (list of TIDs containing each item)

A     B     C     D     E
1     1     2     2     1
4     2     3     4     3
5     5     4     5     6
6     7     5     9
7     8     8
8     10    9
9
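
A small Python sketch of the conversion is shown below: it builds the vertical layout from the horizontal one and counts the support of an itemset by intersecting TID lists. The names `to_vertical` and `support` are hypothetical helpers, not part of the slides.

```python
horizontal = {
    1: {"A", "B", "E"},
    2: {"B", "C", "D"},
    3: {"C", "E"},
    4: {"A", "C", "D"},
    5: {"A", "B", "C", "D"},
    6: {"A", "E"},
    7: {"A", "B"},
    8: {"A", "B", "C"},
    9: {"A", "C", "D"},
    10: {"B"},
}

def to_vertical(db):
    """Map each item to the set of TIDs of the transactions containing it."""
    vertical = {}
    for tid, items in db.items():
        for item in items:
            vertical.setdefault(item, set()).add(tid)
    return vertical

vertical = to_vertical(horizontal)

def support(itemset):
    """Support count = size of the intersection of the items' TID sets."""
    return len(set.intersection(*(vertical[i] for i in itemset)))

print(sorted(vertical["A"]))  # [1, 4, 5, 6, 7, 8, 9]
print(support(["A", "B"]))    # 4 (TIDs 1, 5, 7, 8)
```

With the vertical layout, support counting needs no pass over the transactions, only set intersections, at the cost of keeping the TID lists in memory.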
