

How can we further improve the efficiency of Apriori-based mining?

Several variations of the Apriori algorithm have been proposed that aim to improve the efficiency of the original algorithm. They are as follows −

The hash-based technique (hashing itemsets into corresponding buckets) − A hash-based technique can be used to reduce the size of the candidate k-itemsets, Ck, for k > 1. For instance, while scanning each transaction in the database to generate the frequent 1-itemsets, L1, from the candidate 1-itemsets in C1, we can also generate all of the 2-itemsets for each transaction, hash (i.e., map) them into the buckets of a hash table structure, and increment the corresponding bucket counts. A 2-itemset whose bucket count is below the minimum support threshold cannot be frequent and can therefore be removed from the candidate set C2.
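
The following is a minimal sketch of this idea in Python. The transaction list, bucket count, and threshold are illustrative assumptions, and Python's built-in hash stands in for any hash function over itemsets:

from collections import defaultdict
from itertools import combinations

transactions = [                    # illustrative toy database
    {"a", "b", "c"}, {"a", "b"}, {"b", "c"}, {"a", "c"},
]
min_sup = 2                         # minimum support count (assumed)
NUM_BUCKETS = 7                     # illustrative hash table size

item_counts = defaultdict(int)
bucket_counts = [0] * NUM_BUCKETS

# While scanning for frequent 1-itemsets, also hash every 2-itemset of
# each transaction into a bucket and increment that bucket's count.
for t in transactions:
    for item in t:
        item_counts[item] += 1
    for pair in combinations(sorted(t), 2):
        bucket_counts[hash(pair) % NUM_BUCKETS] += 1

L1 = {item for item, c in item_counts.items() if c >= min_sup}

# A 2-itemset can only be frequent if its bucket count reaches min_sup,
# so pairs that hash to light buckets are pruned from C2 up front.
C2 = [pair for pair in combinations(sorted(L1), 2)
      if bucket_counts[hash(pair) % NUM_BUCKETS] >= min_sup]

Because several itemsets may share a bucket, a heavy bucket does not guarantee a frequent pair, but a light bucket safely rules its pairs out.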

Transaction reduction − A transaction that does not contain any frequent k-itemsets cannot contain any frequent (k + 1)-itemsets. Such a transaction can therefore be marked or removed from further consideration, because subsequent scans of the database for j-itemsets, where j > k, will not need it.
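
A minimal sketch of this pruning step, assuming hypothetical names: transactions is a list of sets and Lk is the set of frequent k-itemsets, represented as frozensets:

def reduce_transactions(transactions, Lk):
    # Keep a transaction only if at least one frequent k-itemset is a
    # subset of it; otherwise it cannot contain any frequent
    # (k+1)-itemset and is useless for all scans with j > k.
    return [t for t in transactions
            if any(itemset <= t for itemset in Lk)]

The reduced list can then replace the original database for the next iteration, shrinking every subsequent scan.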

Partitioning − A partitioning technique can be used that requires just two database scans to mine the frequent itemsets. It consists of two phases. In Phase I, the algorithm subdivides the transactions of D into n non-overlapping partitions. If the minimum support threshold for transactions in D is min_sup, then the minimum support count for a partition is min_sup × the number of transactions in that partition.

For each partition, all frequent itemsets within the partition are found. These are referred to as local frequent itemsets. The procedure employs a special data structure that, for each itemset, records the TIDs of the transactions containing the items in the itemset. This enables it to find all of the local frequent k-itemsets, for k = 1, 2, ..., in just one scan of the database.

A local frequent itemset may or may not be frequent with respect to the entire database, D. However, any itemset that is potentially frequent with respect to D must occur as a frequent itemset in at least one of the partitions. Therefore, all local frequent itemsets are candidate itemsets with respect to D. The collection of frequent itemsets from all partitions forms the global candidate itemsets for D. In Phase II, a second scan of D is conducted in which the actual support of each candidate is assessed to determine the global frequent itemsets.
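
The sketch below puts both phases together, under the assumption that D is a list of transaction sets and min_sup_ratio is a relative threshold. The brute-force local miner is only for illustration; as described above, a real implementation would run Apriori with TID lists inside each partition:

from collections import defaultdict
from itertools import combinations

def local_frequent(partition, min_sup_ratio, max_k=2):
    # Phase I helper: count every itemset (up to size max_k) that occurs
    # in this partition, then keep those meeting the local support count.
    counts = defaultdict(int)
    for t in partition:
        for k in range(1, max_k + 1):
            for itemset in combinations(sorted(t), k):
                counts[frozenset(itemset)] += 1
    min_count = min_sup_ratio * len(partition)   # min_sup x partition size
    return {s for s, c in counts.items() if c >= min_count}

def partitioned_mining(D, n_parts, min_sup_ratio, max_k=2):
    size = (len(D) + n_parts - 1) // n_parts
    partitions = [D[i:i + size] for i in range(0, len(D), size)]

    # Phase I: the union of all local frequent itemsets forms the global
    # candidate set (first full scan of D, one partition at a time).
    candidates = set()
    for p in partitions:
        candidates |= local_frequent(p, min_sup_ratio, max_k)

    # Phase II: a second scan of D counts each candidate's actual support.
    support = defaultdict(int)
    for t in D:
        for c in candidates:
            if c <= t:
                support[c] += 1
    min_count = min_sup_ratio * len(D)
    return {c for c in candidates if support[c] >= min_count}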

Sampling − The basic idea of the sampling approach is to pick a random sample S of the given data D, and then search for frequent itemsets in S rather than in D. In this way, we trade off some degree of accuracy against efficiency. The sample size of S is chosen such that the search for frequent itemsets in S can be done in main memory, and so only one scan of the transactions in S is required overall.
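
A minimal sketch, assuming D is a list of transaction sets and miner is any in-memory frequent-itemset routine (for example, the local_frequent helper above). One common refinement, beyond the basic idea stated here, is to lower the threshold slightly on the sample to reduce the chance of missing itemsets that are frequent in D but borderline in S:

import random

def sample_mining(D, miner, sample_frac, min_sup_ratio, safety=0.9):
    # Draw a random sample S small enough to mine in main memory.
    S = random.sample(D, max(1, int(sample_frac * len(D))))
    # Mine S with a slightly lowered relative threshold (safety < 1).
    return miner(S, min_sup_ratio * safety)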

