ML Algorithm
Apriori Algorithm
Introduction:
The Apriori algorithm was proposed by R. Agrawal and R. Srikant in 1994 for finding frequent itemsets in a dataset
for association rule mining. The algorithm is named Apriori because it uses prior knowledge of frequent itemset
properties. It follows an iterative, level-wise search in which frequent k-itemsets are used to find (k+1)-itemsets.
Steps of the Algorithm:
The Apriori algorithm is a sequence of steps followed to find the most frequent itemsets in the given
database. This data mining technique applies the join and the prune (trimming) steps iteratively until
the most frequent itemsets are found. A minimum support threshold is either given in the problem or
assumed by the user.
#1) In the first iteration (K=1) of the algorithm, each item is taken as a 1-itemset candidate (one item
per candidate). The algorithm counts the occurrences of each item.
#2) Let there be some minimum support, min_sup (e.g. 50%). The set of 1-itemsets whose
occurrence satisfies min_sup is determined. Only those candidates whose count is greater
than or equal to min_sup are carried forward to the next iteration; the others are pruned.
#3) Next, frequent 2-itemsets with min_sup are discovered. For this, in the join step, the candidate
2-itemsets are generated by joining the frequent 1-itemsets with themselves (forming groups of 2).
#4) The 2-itemset candidates are pruned using the min_sup threshold value. The resulting table
contains only the 2-itemsets that satisfy min_sup.
#5) The next iteration forms 3-itemsets using the join and prune steps. This iteration uses the
antimonotone property: the subsets of a 3-itemset, that is, its 2-itemset subsets, must themselves
satisfy min_sup. Only if all 2-itemset subsets are frequent can the superset be frequent;
otherwise it is pruned.
#6) The next step forms 4-itemsets by joining the 3-itemsets with themselves and pruning any
candidate whose subsets do not meet the min_sup criterion. The algorithm stops when no further
frequent itemsets can be generated (the candidate itemset becomes empty). A sketch of this
level-wise loop is given below.
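The level-wise join-and-prune loop described in steps #1 to #6 can be sketched in Python as follows. This is a minimal sketch, not the original author's code; the names apriori, transactions and min_sup are illustrative.

from itertools import combinations

def apriori(transactions, min_sup):
    """Level-wise Apriori: frequent k-itemsets are joined to build (k+1)-itemset candidates."""
    n = len(transactions)
    # K = 1: count every single item and keep those whose support meets min_sup
    items = {item for t in transactions for item in t}
    counts = {frozenset([i]): sum(i in t for t in transactions) for i in items}
    level = {s for s, c in counts.items() if c / n >= min_sup}
    frequent = {}
    k = 1
    while level:
        frequent.update({s: counts[s] for s in level})
        k += 1
        # Join step: combine frequent (k-1)-itemsets to form k-itemset candidates
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step (antimonotone property): drop candidates with an infrequent (k-1)-subset
        candidates = {c for c in candidates
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
        # Count the surviving candidates and keep those whose support meets min_sup
        counts = {c: sum(c <= set(t) for t in transactions) for c in candidates}
        level = {c for c, cnt in counts.items() if cnt / n >= min_sup}
    return frequent   # maps each frequent itemset to its support count

The returned dictionary maps each frequent itemset (a frozenset) to its support count; dividing by the number of transactions gives the support fraction.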
Important Terms and Formulas
Frequency(x) = ∑ (occurrences of x in transaction i), summed over all N transactions
Support(x) = Freq(x) / N
Confidence(A => B) = Support(A, B) / Support(A)
Lift(A => B) = Support(A, B) / (Support(A) . Support(B))
Frequent itemset (L) = itemset whose support is greater than or equal to the minimum support
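These formulas can be written as small Python helpers. This is only a sketch; the function names support, confidence and lift are illustrative.

def support(itemset, transactions):
    """Support(X) = Freq(X) / N, the fraction of transactions that contain X."""
    return sum(set(itemset) <= set(t) for t in transactions) / len(transactions)

def confidence(a, b, transactions):
    """Confidence(A => B) = Support(A, B) / Support(A)."""
    return support(set(a) | set(b), transactions) / support(a, transactions)

def lift(a, b, transactions):
    """Lift(A => B) = Support(A, B) / (Support(A) . Support(B))."""
    return (support(set(a) | set(b), transactions)
            / (support(a, transactions) * support(b, transactions)))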
Example
For the following transaction dataset, generate the association rules using the Apriori algorithm. The
minimum support is 0.5 (50%) and the acceptable confidence is 0.75 (75%).
Transaction ID   Items
1.               Bread, Cheese, Egg, Juice
2.               Bread, Cheese, Juice
3.               Bread, Milk, Yogurt
4.               Bread, Juice, Milk
5.               Cheese, Juice, Milk
Solution
Step 1: K=1
Create a table containing the support count of each item present in the dataset, called C1 (the candidate set).
Pruning
Compare each candidate item's support count with the minimum support count (here min_support = 50%); if a
candidate's support is less than min_support, remove that item. This gives us the frequent itemset L1.
This frequent itemset now becomes the candidate set for the next iteration.
Step 2: K=2
Join L1 with itself to generate the candidate 2-itemsets C2, then compare each candidate's support count
with the minimum support count (here min_support = 50%); if a candidate's support is less than min_support,
remove it. This gives us the frequent itemset L2.
Step 3: K=3
Grouping
Group the frequent 2-itemsets to generate the candidate 3-itemsets C3 and compare each candidate's support
count with the minimum support count (here min_support = 50%); if its support is less than min_support,
remove it. Here no 3-itemset satisfies min_support, so the algorithm stops and the rules are generated from
the frequent 2-itemsets in L2, as the sketch below also shows.
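The candidate and frequent itemsets for this dataset can be recomputed with a short script (a sketch; the variable names are illustrative). With min_support = 50% of 5 transactions, L1 keeps Bread, Cheese, Juice and Milk, and only (Bread, Juice) and (Cheese, Juice) survive in L2.

from itertools import combinations

transactions = [
    {"Bread", "Cheese", "Egg", "Juice"},
    {"Bread", "Cheese", "Juice"},
    {"Bread", "Milk", "Yogurt"},
    {"Bread", "Juice", "Milk"},
    {"Cheese", "Juice", "Milk"},
]
n = len(transactions)
min_support = 0.5

# C1 / L1: count every single item and keep those with support >= min_support
items = sorted({i for t in transactions for i in t})
c1 = {i: sum(i in t for t in transactions) for i in items}
l1 = [i for i, c in c1.items() if c / n >= min_support]
print("C1:", c1)   # Egg and Yogurt fall below min_support and are pruned
print("L1:", l1)

# C2 / L2: join L1 with itself, count each pair, and prune by min_support
c2 = {pair: sum(set(pair) <= t for t in transactions) for pair in combinations(l1, 2)}
l2 = [pair for pair, c in c2.items() if c / n >= min_support]
print("C2:", c2)
print("L2:", l2)   # only (Bread, Juice) and (Cheese, Juice) remain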
Rule 1: (Bread, Juice) => Confidence = Support(Bread, Juice) / Support(Bread) = (3/5) / (4/5) = 3/4 = 75%
Rule 2: (Juice, Bread) => Confidence = Support(Juice, Bread) / Support(Juice) = (3/5) / (4/5) = 3/4 = 75%
Rule 3: (Cheese, Juice) => Confidence = Support(Cheese, Juice) / Support(Cheese) = (3/5) / (3/5) = 1 = 100%
The lift is a value between 0 and infinity: a lift value greater than 1 indicates that the rule body and
the rule head appear together more often than expected, meaning that the occurrence of the rule body
has a positive effect on the occurrence of the rule head.
Rule 4: (Juice, Cheese) => Confidence = Support(Juice, Cheese) / Support(Juice) = (3/5) / (4/5) = 3/4 = 75%
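The confidence and lift values for these four rules can be checked with a short script (again only a sketch; the helper name support is illustrative):

transactions = [
    {"Bread", "Cheese", "Egg", "Juice"},
    {"Bread", "Cheese", "Juice"},
    {"Bread", "Milk", "Yogurt"},
    {"Bread", "Juice", "Milk"},
    {"Cheese", "Juice", "Milk"},
]
n = len(transactions)

def support(items):
    # Fraction of transactions that contain all the given items
    return sum(items <= t for t in transactions) / n

rules = [({"Bread"}, {"Juice"}), ({"Juice"}, {"Bread"}),
         ({"Cheese"}, {"Juice"}), ({"Juice"}, {"Cheese"})]
for body, head in rules:
    conf = support(body | head) / support(body)
    lft = support(body | head) / (support(body) * support(head))
    print(body, "=>", head, "confidence =", round(conf, 2), "lift =", round(lft, 2))

Taking the acceptable confidence of 75% as a lower bound, all four rules meet it and are retained.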
Naïve Bayes Algorithm (Classification)
Example:
Consider the given dataset and apply the naïve Bayes algorithm to predict the fruit that has the following
properties:
Fruit X = {Yellow, Sweet, Long}
Solution:
First compare with Mango:
P(X|Mango) = P(Y|M) . P(S|M) . P(L|M)
According to Bayes' theorem,
P(L|M) = P(M|L) . P(L) / P(M)
       = (0/400) * (400/2050) / (800/2050)
       = 0
P(X|Mango) = P(Y|M) . P(S|M) . P(L|M)
           = (0.4375) * (0.5625) * (0)
P(X|Mango) = 0
Now compare with Banana:
P(X|Banana) = P(Y|B) . P(S|B) . P(L|B)
According to Bayes' theorem,
P(Y|B) = P(B|Y) . P(Y) / P(B)
       = (400/800) * (800/2050) / (1050/2050)
       = 0.380
P(L|B) = P(B|L) . P(L) / P(B)
       = (350/400) * (400/2050) / (1050/2050)
       = 0.333
P(X|Banana) = P(Y|B) . P(S|B) . P(L|B)
            = (0.380) * (0.2857) * (0.333)
            = 0.036
Now compare with Others:
P(X|Others) = P(Y|O) . P(S|O) . P(L|O)
According to Bayes' theorem,
P(L|O) = P(O|L) . P(L) / P(O)
       = (50/400) * (400/2050) / (200/2050)
       = 0.25
P(X|Others) = P(Y|O) . P(S|O) . P(L|O)
            = (0.25) * (0.5) * (0.25)
            = 0.03125
Since P(X|Banana) is the largest of the three values, the fruit X = {Yellow, Sweet, Long} is classified as a Banana.
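The whole computation can be reproduced with a short script. Since the frequency table itself is not reproduced above, the counts below are reconstructed from the fractions used in the worked computation and should be read as an assumption (800 Mango, 1050 Banana, 200 Others, 2050 fruits in total):

# Counts reconstructed from the fractions above (an assumption, since the
# original frequency table is not shown in the text).
counts = {
    "Mango":  {"Yellow": 350, "Sweet": 450, "Long": 0,   "Total": 800},
    "Banana": {"Yellow": 400, "Sweet": 300, "Long": 350, "Total": 1050},
    "Others": {"Yellow": 50,  "Sweet": 100, "Long": 50,  "Total": 200},
}

x = ["Yellow", "Sweet", "Long"]   # features of the unknown fruit X

# Naive Bayes: P(X | fruit) is the product of P(feature | fruit) over the features.
# row[feature] / row["Total"] is the same conditional probability that the text
# derives via Bayes' theorem; as in the text, the class likelihoods are compared directly.
likelihoods = {}
for fruit, row in counts.items():
    p = 1.0
    for feature in x:
        p *= row[feature] / row["Total"]
    likelihoods[fruit] = p

print(likelihoods)                                            # Mango: 0, Banana: ~0.036, Others: 0.03125
print("Predicted:", max(likelihoods, key=likelihoods.get))    # Banana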
KNN