Apriori Algorithm
Gayathri Prasad S
Association Rule Mining
• For large datasets, there can be hundreds of items across hundreds of thousands of transactions.
The Apriori algorithm tries to extract rules for every possible combination of items, which
can be extremely slow because of the sheer number of combinations. To speed up the process,
we perform the following steps:
• Set a minimum value for support and confidence. This means that we are only interested
in rules for items that occur with a certain minimum frequency (support) and that
co-occur with other items at a certain minimum rate (confidence).
• Extract all the subsets whose support is higher than the minimum threshold.
• Select all the rules from those subsets whose confidence is higher than the minimum
threshold.
• Order the rules in descending order of lift.
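The four steps above can be sketched with a small levelwise implementation. The transactions, item names, and thresholds below are made up for illustration; a full Apriori implementation would also prune candidates using the downward-closure property rather than enumerating every combination at each level.

```python
from itertools import combinations

# Toy transaction database (hypothetical items, for illustration only).
transactions = [
    {"bread", "butter", "jam"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"butter", "milk"},
    {"bread", "butter", "milk"},
]

MIN_SUPPORT = 0.4     # itemset must appear in at least 40% of transactions
MIN_CONFIDENCE = 0.6  # rule must hold for at least 60% of its antecedent's transactions

def support(itemset):
    """Fraction of transactions containing every item of the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

# Steps 1-2: collect all subsets whose support meets the threshold, level by level.
items = set().union(*transactions)
frequent = []
for k in range(1, len(items) + 1):
    level = [frozenset(c) for c in combinations(items, k)
             if support(frozenset(c)) >= MIN_SUPPORT]
    if not level:
        break  # no frequent k-itemsets, so no larger ones can be frequent
    frequent.extend(level)

# Steps 3-4: extract rules above the confidence threshold, sorted by lift.
rules = []
for itemset in (s for s in frequent if len(s) > 1):
    for i in range(1, len(itemset)):
        for antecedent in map(frozenset, combinations(itemset, i)):
            consequent = itemset - antecedent
            confidence = support(itemset) / support(antecedent)
            if confidence >= MIN_CONFIDENCE:
                lift = confidence / support(consequent)
                rules.append((antecedent, consequent, confidence, lift))

rules.sort(key=lambda r: r[3], reverse=True)  # descending order of lift
for ante, cons, conf, lift in rules:
    print(f"{set(ante)} -> {set(cons)}  conf={conf:.2f}  lift={lift:.2f}")
```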
Frequent Item Set
• Easy to implement
• Uses the large itemset property
Shortcomings
There are two major shortcomings of the Apriori algorithm:
• The number of itemsets from candidate generation can be
extremely large. In general, a dataset that contains k items
can potentially generate up to 2^k itemsets. Because k can be
very large in many practical applications, candidate generation
becomes computationally expensive.
• A lot of time is wasted counting support values, since the
transaction database has to be scanned over and over again.
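The exponential blow-up is easy to see numerically: with k items there are 2^k - 1 possible non-empty itemsets, so even modest values of k are intractable to enumerate exhaustively.

```python
# Number of candidate (non-empty) itemsets a dataset with k distinct
# items can generate: 2**k - 1.
for k in (10, 20, 30):
    print(f"k={k}: {2**k - 1:,} possible itemsets")
# k=10 gives 1,023; k=20 already gives 1,048,575; k=30 over a billion.
```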
Eclat Algorithm
k = 1, minimum support = 2

Item      Tidset
Bread     {T1, T4, T5, T7, T8, T9}
Butter    {T1, T2, T3, T4, T6, T8, T9}
Milk      {T3, T5, T6, T7, T8, T9}
Coke      {T2, T4}
Jam       {T1, T8}

k = 2

Item                Tidset
{Bread, Butter}     {T1, T4, T8, T9}
{Bread, Milk}       {T5, T7, T8, T9}
{Bread, Coke}       {T4}
{Bread, Jam}        {T1, T8}
{Butter, Milk}      {T3, T6, T8, T9}
{Butter, Coke}      {T2, T4}
{Butter, Jam}       {T1, T8}
{Milk, Jam}         {T8}

k = 3

Item                      Tidset
{Bread, Butter, Milk}     {T8, T9}
{Bread, Butter, Jam}      {T1, T8}

k = 4

Item                           Tidset
{Bread, Butter, Milk, Jam}     {T8}

We stop at k = 4 because there are no more item-tidset pairs to combine.
Since the minimum support is 2, the frequent itemsets for this dataset are
the ones above whose tidsets contain at least two transactions.
Features of Eclat
Advantages
• Since the Eclat algorithm uses a depth-first search approach, it consumes less
memory than the Apriori algorithm
• The Eclat algorithm does not repeatedly scan the data in order to calculate
individual support values
• The Eclat algorithm scans the currently generated dataset, unlike Apriori,
which scans the original dataset
Disadvantage
• If the tid list is too large, the Eclat algorithm may run out of memory.
Thank You