Assignment 2
Marks: 50
Question II
Explain the Apriori algorithm in detail and write its function.
S No | Elements/Grade | Good (5) | Average (3) | Needs Improvement (1)
1 | Definition (5 Marks) | Well clear definition of Apriori Algorithm. | Definition not clear | Definition is not correct
2 | Property (5 Marks) | Correct properties involved in algorithm | Properties are not clear or proper and explanation is also not clear. | Lack in detailed explanation of property
3 | Work (5 Marks) | How does it work in premises | Explanation is also not clear. | Explanation not clear
4 | Steps involved (Algorithm) (5 Marks) | Step-by-Step process with sequential order | Process but it is not sequential order. | Process is not properly correct
5 | Applications, advantages and disadvantages (5 Marks) | Summarize the 5 to 10 points you think are most important | Below 6 points | Only two and the below
Answer:
APRIORI ALGORITHM
Apriori is a widely used algorithm in data mining. It identifies the most frequently
occurring itemsets and the meaningful associations between items in a dataset. For
example, the products that customers buy together in a shop can be used as the input
transactions.
Apriori Property
In 1994, R. Agrawal and R. Srikant proposed the Apriori algorithm for finding the
frequent itemsets in a dataset in order to mine Boolean association rules. The
algorithm is called Apriori because it uses prior knowledge of frequent itemset
properties. It follows an iterative, level-wise approach in which the frequent
k-itemsets are used to find the frequent (k+1)-itemsets.
An important property, known as the Apriori property, is used to improve the
efficiency of the level-wise generation of frequent itemsets: every non-empty subset
of a frequent itemset must also be frequent. Equivalently, if an itemset is
infrequent, all of its supersets are infrequent too, so they can be discarded without
being counted. This reduces the search space.
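The pruning test implied by the Apriori property can be sketched in Python as follows. This is a minimal illustration, not a full implementation: the function name `has_infrequent_subset` and the sample frequent pairs are assumptions made for the example.

```python
from itertools import combinations

def has_infrequent_subset(candidate, frequent_smaller):
    """Return True if some (k-1)-item subset of `candidate` is not frequent.

    By the Apriori property such a candidate cannot be frequent,
    so it can be dropped without counting it in the database.
    """
    k = len(candidate)
    return any(
        frozenset(subset) not in frequent_smaller
        for subset in combinations(candidate, k - 1)
    )

# Hypothetical frequent 2-itemsets, used only for illustration.
frequent_pairs = {frozenset("AB"), frozenset("AC"), frozenset("BC"), frozenset("BD")}

print(has_infrequent_subset(frozenset("ABC"), frequent_pairs))  # False: AB, AC, BC are all frequent
print(has_infrequent_subset(frozenset("ABD"), frequent_pairs))  # True: AD is not frequent
```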
Step 1: List every item that appears in the transactions and build a frequency
table.
Step 2: Set the minimum support threshold. Only items whose support is greater
than or equal to the threshold are significant (frequent).
Step 3: Form all possible pairs of the significant items, bearing in mind that AB
and BA are the same pair.
Step 4: Count the number of transactions in which each pair appears.
Step 5: Only the pairs that meet the support threshold are significant.
Step 6: Now suppose you want to find sets of three items that may be bought
together. A rule known as self-join is used to build the three-item sets. From
the item pairs OP, OB, PB, and PM, two pairs with the same first letter are
combined: OP and OB give OPB, and PB and PM give PBM.
Step 7: Apply the support threshold again to obtain the significant three-item
sets.
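The steps above can be sketched as a single Python function. This is a minimal sketch under stated assumptions: the function name `apriori_frequent_itemsets`, the use of an absolute support count as the threshold, and the six sample transactions (chosen so that the frequent pairs come out as OP, OB, PB, and PM) are all illustrative and not taken from a specific library.

```python
from itertools import combinations

def apriori_frequent_itemsets(transactions, min_support_count):
    """Return {itemset: support count} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]

    # Steps 1-2: count single items and keep those meeting the threshold.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support_count}
    frequent = dict(current)

    k = 2
    while current:
        # Steps 3 and 6: join frequent (k-1)-itemsets into k-item candidates,
        # keeping a candidate only if all its (k-1)-subsets are frequent.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in current for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
        # Steps 4-5 and 7: count each candidate and apply the threshold again.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_support_count}
        frequent.update(current)
        k += 1
    return frequent

# Hypothetical transactions over items O, P, B, M and an infrequent item E.
transactions = [
    ["O", "P", "B"],
    ["P", "B", "M"],
    ["M", "E"],
    ["O", "P", "M"],
    ["O", "P", "B", "E"],
    ["O", "P", "B", "M"],
]

frequent = apriori_frequent_itemsets(transactions, min_support_count=3)
for itemset, count in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print("".join(sorted(itemset)), count)
```

In this sketch the candidate PBM is discarded before counting because its subset BM is not frequent, which is the Apriori property at work; only OPB survives as a frequent three-item set.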
Step 1: Determine the support of the itemsets in the transactional database and
choose the minimum support and minimum confidence.
Step 2: Keep all itemsets whose support is greater than or equal to the chosen
minimum support.
Step 3: From these itemsets, find all rules whose confidence is greater than or
equal to the minimum confidence.
Step 4: Sort the rules in descending order of strength.
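The rule-generation steps can be illustrated with a short Python sketch. It assumes the support counts of the frequent itemsets are already known; the `support` dictionary below is hypothetical, and `generate_rules` is an illustrative name. Confidence is used as the measure of strength, and listing the strongest rules first is a choice made for this example.

```python
from itertools import combinations

def generate_rules(support, min_confidence):
    """Return (antecedent, consequent, confidence) tuples, strongest first.

    `support` maps each frequent itemset (a frozenset) to its support count.
    confidence(A -> B) = support(A and B together) / support(A).
    """
    rules = []
    for itemset, count in support.items():
        if len(itemset) < 2:
            continue
        for r in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), r):
                antecedent = frozenset(antecedent)
                confidence = count / support[antecedent]
                if confidence >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, confidence))
    return sorted(rules, key=lambda rule: rule[2], reverse=True)

# Hypothetical support counts for frequent itemsets over items O, P, B, M.
support = {
    frozenset(k): v
    for k, v in {"O": 4, "P": 5, "B": 4, "M": 4,
                 "OP": 4, "OB": 3, "PB": 4, "PM": 3, "OPB": 3}.items()
}

for antecedent, consequent, confidence in generate_rules(support, min_confidence=0.8):
    print("".join(sorted(antecedent)), "->", "".join(sorted(consequent)), round(confidence, 2))
```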
Hash-Based Technique
A hash-based structure called a hash table is used to generate the k-itemsets and
their corresponding counts. The table is built using a hash function.
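One common way to apply this idea, sketched below in Python, is to hash every pair into a bucket during the first scan and to drop any candidate pair whose bucket count falls below the minimum support. The hash function `bucket_of`, the bucket count of 11, and the sample transactions are assumptions made for illustration only.

```python
from itertools import combinations

def bucket_of(pair, num_buckets):
    """Hypothetical hash function: sum of character codes modulo the bucket count."""
    return sum(ord(ch) for item in pair for ch in item) % num_buckets

def hash_filtered_pairs(transactions, min_support_count, num_buckets=11):
    """Return candidate pairs that survive the bucket-count filter."""
    buckets = [0] * num_buckets
    item_counts = {}
    # Single scan: count items and hash every pair of each transaction.
    for t in transactions:
        for item in t:
            item_counts[item] = item_counts.get(item, 0) + 1
        for pair in combinations(sorted(t), 2):
            buckets[bucket_of(pair, num_buckets)] += 1
    frequent_items = sorted(i for i, c in item_counts.items() if c >= min_support_count)
    # A pair can be frequent only if its whole bucket reaches the threshold.
    return [
        pair for pair in combinations(frequent_items, 2)
        if buckets[bucket_of(pair, num_buckets)] >= min_support_count
    ]

# Hypothetical transactions over items O, P, B, M and an infrequent item E.
transactions = [
    ["O", "P", "B"],
    ["P", "B", "M"],
    ["M", "E"],
    ["O", "P", "M"],
    ["O", "P", "B", "E"],
    ["O", "P", "B", "M"],
]

print(hash_filtered_pairs(transactions, min_support_count=3))
```

With these inputs the pair (B, M) is removed because its bucket holds only two pairs. The pair (M, O) survives only because it shares a bucket with a frequent pair, so the surviving candidates still need an exact count in the next scan.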
Transaction Reduction
This strategy reduces the number of transactions that must be scanned in each
iteration. Transactions that do not contain any frequent itemsets are either marked
or removed.
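A minimal Python sketch of transaction reduction is given below. It relies on the fact that a transaction containing no frequent k-itemset cannot contain a frequent (k+1)-itemset either. The function name `reduce_transactions` and the sample data are illustrative assumptions.

```python
def reduce_transactions(transactions, frequent_k_itemsets):
    """Keep only the transactions that contain at least one frequent k-itemset."""
    return [
        t for t in transactions
        if any(itemset <= t for itemset in frequent_k_itemsets)
    ]

# Hypothetical transactions and frequent 2-itemsets, for illustration only.
transactions = [
    frozenset("OPB"),
    frozenset("PBM"),
    frozenset("ME"),
    frozenset("OPM"),
    frozenset("OPBE"),
    frozenset("OPBM"),
]
frequent_pairs = [frozenset("OP"), frozenset("OB"), frozenset("PB"), frozenset("PM")]

reduced = reduce_transactions(transactions, frequent_pairs)
print(len(transactions), "->", len(reduced))  # 6 -> 5: the transaction {M, E} is dropped
```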
Partitioning
This approach needs only two database scans to find the frequent itemsets. For an
itemset to be potentially frequent in the whole database, it must be frequent in at
least one of the partitions of the database.
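The two-scan idea can be sketched in Python as follows. This is a simplified illustration: `local_frequent` enumerates all subsets by brute force as a stand-in for running Apriori on each partition (practical only for tiny examples), and the function names, the relative threshold `min_fraction`, and the sample data are assumptions.

```python
from itertools import combinations
from math import ceil

def local_frequent(partition, min_fraction):
    """Brute-force local miner: a stand-in for Apriori on one partition."""
    threshold = ceil(min_fraction * len(partition))
    counts = {}
    for t in partition:
        items = sorted(t)
        for size in range(1, len(items) + 1):
            for combo in combinations(items, size):
                key = frozenset(combo)
                counts[key] = counts.get(key, 0) + 1
    return {s for s, c in counts.items() if c >= threshold}

def partitioned_frequent(transactions, min_fraction, num_partitions=2):
    """Return {itemset: global support count} using two passes over the data."""
    transactions = [frozenset(t) for t in transactions]
    size = ceil(len(transactions) / num_partitions)
    partitions = [transactions[i:i + size] for i in range(0, len(transactions), size)]
    # Scan 1: collect itemsets that are frequent in at least one partition.
    candidates = set().union(*(local_frequent(p, min_fraction) for p in partitions))
    # Scan 2: count the candidates against the whole database.
    threshold = ceil(min_fraction * len(transactions))
    counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
    return {s: c for s, c in counts.items() if c >= threshold}

# Hypothetical transactions over items O, P, B, M and an infrequent item E.
transactions = [
    ["O", "P", "B"],
    ["P", "B", "M"],
    ["M", "E"],
    ["O", "P", "M"],
    ["O", "P", "B", "E"],
    ["O", "P", "B", "M"],
]

result = partitioned_frequent(transactions, min_fraction=0.5)
print(sorted("".join(sorted(s)) for s in result))
```

Itemsets such as OM are frequent in one partition but fail the global count in the second scan, which is why the second scan is needed.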
Sampling
A random sample S is selected from database D, and the frequent itemsets are then
searched for within S. Some globally frequent itemsets may be missed in this way.
This risk can be reduced by lowering min_sup.
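A rough Python sketch of the sampling approach is shown below. The function name `frequent_in_sample`, the `slack` factor used to lower the threshold, the fixed `seed`, and the brute-force subset counting are all assumptions made to keep the example short; a real implementation would run Apriori on the sample.

```python
import random
from itertools import combinations
from math import ceil

def frequent_in_sample(transactions, min_fraction, sample_size, slack=0.8, seed=0):
    """Mine a random sample S of D with a lowered threshold (slack * min_fraction)."""
    rng = random.Random(seed)
    sample = rng.sample(list(transactions), sample_size)
    # Lowering the threshold makes it less likely that a globally
    # frequent itemset is missed in the sample.
    threshold = ceil(slack * min_fraction * sample_size)
    counts = {}
    for t in sample:
        items = sorted(t)
        for size in range(1, len(items) + 1):
            for combo in combinations(items, size):
                key = frozenset(combo)
                counts[key] = counts.get(key, 0) + 1
    return {s for s, c in counts.items() if c >= threshold}

# Hypothetical transactions over items O, P, B, M and an infrequent item E.
transactions = [
    ["O", "P", "B"],
    ["P", "B", "M"],
    ["M", "E"],
    ["O", "P", "M"],
    ["O", "P", "B", "E"],
    ["O", "P", "B", "M"],
]

found = frequent_in_sample(transactions, min_fraction=0.5, sample_size=4)
print(sorted("".join(sorted(s)) for s in found))
```

The itemsets found in the sample are only candidates: they would normally be verified against the full database D afterwards.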
Dynamic Itemset Counting
While the database is being scanned, this technique can add new candidate itemsets
at any marked start point of the database.
Advantages of Apriori
Disadvantages of Apriori
Education
Association rules can be extracted from data on admitted students, based on their
characteristics and specialties.
Medical
Forestry
Analysis of the frequency and intensity of forest fires using forest fire data.
Autocomplete Tool