
EXPERIMENT NO. 9

AIM:-
Implementation of Association Rule Mining algorithm (Apriori).

THEORY:-
The Apriori algorithm was proposed by R. Agrawal and R. Srikant in 1994 for finding
frequent itemsets in a dataset for Boolean association rules. The algorithm is named
Apriori because it uses prior knowledge of frequent itemset properties. It applies an
iterative, level-wise search in which the frequent k-itemsets are used to find the
candidate (k+1)-itemsets.
To improve the efficiency of this level-wise generation of frequent itemsets, an
important property called the Apriori property is used to reduce the search space.

Apriori Property –
Every non-empty subset of a frequent itemset must itself be frequent. The key concept
behind the Apriori algorithm is the anti-monotonicity of the support measure, which can
be stated two ways:
All subsets of a frequent itemset must be frequent (the Apriori property).
If an itemset is infrequent, all of its supersets must be infrequent.
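The pruning test implied by this property can be sketched in Python (a minimal illustration; the function name `has_infrequent_subset` and the sample 2-itemsets are hypothetical):

```python
from itertools import combinations

def has_infrequent_subset(candidate, frequent_prev):
    """Return True if any (k-1)-subset of the candidate is not frequent."""
    k = len(candidate)
    return any(frozenset(sub) not in frequent_prev
               for sub in combinations(candidate, k - 1))

# hypothetical frequent 2-itemsets (L2) for illustration
L2 = {frozenset(s) for s in [('I1', 'I2'), ('I1', 'I3'), ('I2', 'I3'), ('I2', 'I4')]}

print(has_infrequent_subset(('I1', 'I2', 'I3'), L2))  # every 2-subset is frequent
print(has_infrequent_subset(('I2', 'I3', 'I4'), L2))  # {I3, I4} is not frequent
```

Any candidate for which this test returns True can be discarded without counting its support, which is exactly how the search space is reduced.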
Consider the following dataset of nine transactions; we will find its frequent itemsets
and generate association rules from them. (The transaction table is reconstructed here
from the support counts used in the rule-generation section below.)
TID : Items
T1 : I1, I2, I5
T2 : I2, I4
T3 : I2, I3
T4 : I1, I2, I4
T5 : I1, I3
T6 : I2, I3
T7 : I1, I3
T8 : I1, I2, I3, I5
T9 : I1, I2, I3
Minimum support count = 2
Minimum confidence = 60%

Step-1: K=1
(I) Create a table containing the support count of each item present in the dataset.
This is the candidate set C1.
(II) Compare the support count of each item in the candidate set with the minimum
support count (here min_support_count = 2) and remove the items that fall below it.
This gives the frequent itemset L1.
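Step-1 can be sketched as follows, assuming the standard nine-transaction dataset whose support counts match the figures used later in this document:

```python
from collections import Counter

# the standard nine-transaction example (assumed here; its counts match
# the support figures used in the rule-generation section of this document)
transactions = [
    {'I1', 'I2', 'I5'}, {'I2', 'I4'}, {'I2', 'I3'}, {'I1', 'I2', 'I4'},
    {'I1', 'I3'}, {'I2', 'I3'}, {'I1', 'I3'}, {'I1', 'I2', 'I3', 'I5'},
    {'I1', 'I2', 'I3'},
]
min_support_count = 2

# C1: support count of every individual item
c1 = Counter(item for t in transactions for item in t)
print("C1:", dict(c1))

# L1: items whose support count meets the minimum
l1 = sorted(item for item, count in c1.items() if count >= min_support_count)
print("L1:", l1)
```

Here every item survives the threshold, so L1 contains all five items.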

Step-2: K=2
 Generate candidate set C2 by joining L1 with itself (this is the join step). The
condition for joining two itemsets of Lk-1 is that they have (K-2) elements in common;
for K=2 this is vacuous, so every pair of frequent items is a candidate.
 Check whether every subset of each candidate is frequent; if not, remove that
candidate. (For example, the subsets of {I1, I2} are {I1} and {I2}, both frequent, so
it is kept. Check each candidate this way.)
 Find the support count of the remaining candidates by scanning the dataset.
 Compare the support count of each candidate in C2 with the minimum support count
(here min_support_count = 2) and remove the candidates that fall below it. This gives
the frequent itemset L2.
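The join, scan and prune of Step-2 can be sketched as follows (same assumed nine-transaction example; `frozenset` is used so a candidate can be tested with a subset comparison):

```python
from itertools import combinations

# same nine-transaction example as in Step-1 (an assumed dataset whose
# counts match the figures used in this document)
transactions = [
    {'I1', 'I2', 'I5'}, {'I2', 'I4'}, {'I2', 'I3'}, {'I1', 'I2', 'I4'},
    {'I1', 'I3'}, {'I2', 'I3'}, {'I1', 'I3'}, {'I1', 'I2', 'I3', 'I5'},
    {'I1', 'I2', 'I3'},
]
min_support_count = 2
l1 = ['I1', 'I2', 'I3', 'I4', 'I5']

# join step: for K=2 the (K-2)=0 common-element condition is vacuous,
# so C2 is every pair of frequent items
c2 = [frozenset(pair) for pair in combinations(l1, 2)]

# scan the transactions to count the support of each candidate
support = {c: sum(1 for t in transactions if c <= t) for c in c2}

# L2: candidates that meet the minimum support count
l2 = [c for c in c2 if support[c] >= min_support_count]
print(sorted(tuple(sorted(c)) for c in l2))
```

Six of the ten candidate pairs survive; {I1, I4}, {I3, I4}, {I3, I5} and {I4, I5} fall below the minimum support count.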

Step-3: K=3
 Generate candidate set C3 by joining L2 with itself (join step). The condition for
joining itemsets of Lk-1 is that they have (K-2) = 1 element in common, so the first
element must match. The itemsets generated by joining L2 are {I1, I2, I3},
{I1, I2, I5}, {I1, I3, I5}, {I2, I3, I4}, {I2, I4, I5} and {I2, I3, I5}.
 Check whether every subset of these candidates is frequent; if not, remove that
candidate. (The subsets of {I1, I2, I3} are {I1, I2}, {I2, I3} and {I1, I3}, all
frequent, so it is kept. For {I2, I3, I4}, the subset {I3, I4} is not frequent, so it
is removed. Check every candidate similarly.)
 Find the support count of the remaining candidates by scanning the dataset.
 Compare the support count of each candidate in C3 with the minimum support count
(here min_support_count = 2) and remove the candidates that fall below it. This gives
the frequent itemset L3.

Step-4: K=4
 Generate candidate set C4 by joining L3 with itself (join step). The condition for
joining itemsets of Lk-1 (K=4) is that they have (K-2) = 2 elements in common, so the
first two items must match.
 Check whether every subset of these candidates is frequent. The only itemset formed
by joining L3 is {I1, I2, I3, I5}, and its subset {I1, I3, I5} is not frequent, so C4
is empty.
 We stop here because no further frequent itemsets can be found.
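The join condition used in Steps 3 and 4 (the first K-2 items must match) together with the subset-based prune can be sketched as one candidate-generation function (a minimal sketch; `apriori_gen` is a hypothetical name echoing the textbook pseudocode):

```python
from itertools import combinations

def apriori_gen(l_prev, k):
    """Join two (k-1)-itemsets that agree on their first k-2 items, then
    prune candidates containing an infrequent (k-1)-subset."""
    prev = sorted(tuple(sorted(s)) for s in l_prev)
    prev_set = {frozenset(s) for s in prev}
    candidates = []
    for i in range(len(prev)):
        for j in range(i + 1, len(prev)):
            a, b = prev[i], prev[j]
            if a[:k - 2] == b[:k - 2]:  # first k-2 elements must match
                cand = tuple(sorted(set(a) | set(b)))
                # Apriori prune: every (k-1)-subset must already be frequent
                if all(frozenset(s) in prev_set
                       for s in combinations(cand, k - 1)):
                    candidates.append(cand)
    return candidates

# L2 from Step-2
l2 = [('I1', 'I2'), ('I1', 'I3'), ('I1', 'I5'),
      ('I2', 'I3'), ('I2', 'I4'), ('I2', 'I5')]
print(apriori_gen(l2, 3))  # only {I1,I2,I3} and {I1,I2,I5} survive the prune
```

Applying the same function to the resulting L3 with k=4 yields an empty candidate set, matching Step-4: the joined itemset {I1, I2, I3, I5} is pruned because its subset {I1, I3, I5} is not frequent.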

Thus, we have discovered all the frequent itemsets. Now the generation of strong
association rules comes into the picture. For that, we need to calculate the
confidence of each rule.

Confidence –
The confidence of a rule measures how often the rule's consequent appears in
transactions that contain its antecedent. For example, a confidence of 60% for the rule
{milk, bread} => {butter} means that 60% of the customers who purchased milk and bread
also bought butter.

Confidence(A -> B) = Support_count(A ∪ B) / Support_count(A)

Taking the frequent itemset {I1, I2, I3} from L3 as an example, the rule generation
works as follows. The candidate rules and their confidences are:
[I1^I2]=>[I3] // confidence = sup(I1^I2^I3)/sup(I1^I2) = 2/4 × 100 = 50%
[I1^I3]=>[I2] // confidence = sup(I1^I2^I3)/sup(I1^I3) = 2/4 × 100 = 50%
[I2^I3]=>[I1] // confidence = sup(I1^I2^I3)/sup(I2^I3) = 2/4 × 100 = 50%
[I1]=>[I2^I3] // confidence = sup(I1^I2^I3)/sup(I1) = 2/6 × 100 = 33.3%
[I2]=>[I1^I3] // confidence = sup(I1^I2^I3)/sup(I2) = 2/7 × 100 = 28.6%
[I3]=>[I1^I2] // confidence = sup(I1^I2^I3)/sup(I3) = 2/6 × 100 = 33.3%
With the stated minimum confidence of 60%, none of these rules qualify as strong; if
the minimum confidence were 50%, the first three would be strong association rules.
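These confidence values can be checked programmatically; a minimal sketch, assuming the same nine-transaction dataset (its counts match the figures used in the rules above):

```python
# the standard nine-transaction example (assumed; counts match the rules above)
transactions = [
    {'I1', 'I2', 'I5'}, {'I2', 'I4'}, {'I2', 'I3'}, {'I1', 'I2', 'I4'},
    {'I1', 'I3'}, {'I2', 'I3'}, {'I1', 'I3'}, {'I1', 'I2', 'I3', 'I5'},
    {'I1', 'I2', 'I3'},
]

def support_count(itemset):
    """Number of transactions containing every item of the itemset."""
    return sum(1 for t in transactions if set(itemset) <= t)

def confidence(antecedent, consequent):
    # Confidence(A -> B) = support_count(A ∪ B) / support_count(A)
    union = set(antecedent) | set(consequent)
    return support_count(union) / support_count(antecedent)

# the six candidate rules generated from the frequent itemset {I1, I2, I3}
rules = [
    (('I1', 'I2'), ('I3',)), (('I1', 'I3'), ('I2',)), (('I2', 'I3'), ('I1',)),
    (('I1',), ('I2', 'I3')), (('I2',), ('I1', 'I3')), (('I3',), ('I1', 'I2')),
]
for a, b in rules:
    print(a, "=>", b, ":", round(100 * confidence(a, b), 1), "%")
```

Each rule's antecedent and consequent partition the same frequent itemset, which is why all six share the numerator sup(I1^I2^I3) = 2.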

PROGRAM:-
import itertools

# minimum support as a fraction of the total transactions, e.g. 0.2
support = float(input("Enter the minimum support: "))

items = ['i1', 'i2', 'i3', 'i4', 'i5']
num = int(input("Enter number of transactions: "))

transactions = []
for i in range(num):
    print("Enter the items bought in transaction", i + 1, "separated by a comma:")
    transactions.append(input())

print("Transactions are as follows:")
for t in transactions:
    print(t)

print("The candidate set C1 is :", items)

# calculate support for every item in candidate set C1
# (substring membership works here because the item names are short and distinct)
supportc1 = []
for item in items:
    count = 0
    for t in transactions:
        if item in t:
            count += 1
    supportc1.append(round(count / num, 2))

for i in range(len(items)):
    print("Support for", items[i], "is :", supportc1[i])

print("Generating L1, the frequent 1-itemset, from C1")
l1 = []
for i in range(len(items)):
    if supportc1[i] >= support:
        l1.append(items[i])
print("L1 is :", l1)

# generating candidate set C2 (for k=2 every pair of items is a candidate)
c2 = list(itertools.combinations(items, 2))
print("Candidate set C2 is :", c2)

# calculating support for all itemsets in C2
supportc2 = []
for pair in c2:
    count = 0
    for t in transactions:
        if pair[0] in t and pair[1] in t:
            count += 1
    supportc2.append(round(count / num, 2))

for i in range(len(c2)):
    print("Support for", c2[i], "is :", supportc2[i])

# generating L2 from C2
l2 = []
for i in range(len(c2)):
    if supportc2[i] >= support:
        l2.append(c2[i])
print("L2 is :", l2)

# generating candidate set C3 (for simplicity, all 3-item combinations
# are used here instead of joining L2 with itself)
c3 = list(itertools.combinations(items, 3))

supportc3 = []
for triple in c3:
    count = 0
    for t in transactions:
        if triple[0] in t and triple[1] in t and triple[2] in t:
            count += 1
    supportc3.append(round(count / num, 2))

for i in range(len(c3)):
    print("Support for :", c3[i], "is:", supportc3[i])

# generating L3 from C3
l3 = []
for i in range(len(c3)):
    if supportc3[i] >= support:
        l3.append(c3[i])
print("L3 is :", l3)

# confidence of the rule {first item} => {remaining two items}
# for every frequent 3-itemset in L3
confidence = []
for triple in l3:
    count = 0
    div = 0
    for t in transactions:
        if triple[0] in t:
            div += 1
            if triple[1] in t and triple[2] in t:
                count += 1
    confidence.append(round(count / div, 2))

for i in range(len(l3)):
    print("Confidence for", l3[i], "is:", confidence[i])

OUTPUT:-

Enter the minimum support: 0.2
Enter number of transactions: 9
Enter the items bought in transaction 1 separated by a comma:
i1,i2,i5
Enter the items bought in transaction 2 separated by a comma:
i2,i4
Enter the items bought in transaction 3 separated by a comma:
i2,i3
Enter the items bought in transaction 4 separated by a comma:
i1,i2,i4
Enter the items bought in transaction 5 separated by a comma:
i1,i3
Enter the items bought in transaction 6 separated by a comma:
i1,i3
Enter the items bought in transaction 7 separated by a comma:
i1,i2,i3,i5
Enter the items bought in transaction 8 separated by a comma:
i1,i3
Enter the items bought in transaction 9 separated by a comma:
i1,i2,i3
Transactions are as follows:
i1,i2,i5
i2,i4
i2,i3
i1,i2,i4
i1,i3
i1,i3
i1,i2,i3,i5
i1,i3
i1,i2,i3
The candidate set C1 is : ['i1', 'i2', 'i3', 'i4', 'i5']
Support for i1 is : 0.78
Support for i2 is : 0.67
Support for i3 is : 0.67
Support for i4 is : 0.22
Support for i5 is : 0.22
Generating L1, the frequent 1-itemset, from C1
L1 is : ['i1', 'i2', 'i3', 'i4', 'i5']
Candidate set C2 is : [('i1', 'i2'), ('i1', 'i3'), ('i1', 'i4'), ('i1', 'i5'), ('i2', 'i3'), ('i2', 'i4'), ('i2', 'i5'), ('i3', 'i4'), ('i3', 'i5'), ('i4', 'i5')]
Support for ('i1', 'i2') is : 0.44
Support for ('i1', 'i3') is : 0.56
Support for ('i1', 'i4') is : 0.11
Support for ('i1', 'i5') is : 0.22
Support for ('i2', 'i3') is : 0.33
Support for ('i2', 'i4') is : 0.22
Support for ('i2', 'i5') is : 0.22
Support for ('i3', 'i4') is : 0.0
Support for ('i3', 'i5') is : 0.11
Support for ('i4', 'i5') is : 0.0
L2 is : [('i1', 'i2'), ('i1', 'i3'), ('i1', 'i5'), ('i2', 'i3'), ('i2', 'i4'), ('i2', 'i5')]
Support for : ('i1', 'i2', 'i3') is: 0.22
Support for : ('i1', 'i2', 'i4') is: 0.11
Support for : ('i1', 'i2', 'i5') is: 0.22
Support for : ('i1', 'i3', 'i4') is: 0.0
Support for : ('i1', 'i3', 'i5') is: 0.11
Support for : ('i1', 'i4', 'i5') is: 0.0
Support for : ('i2', 'i3', 'i4') is: 0.0
Support for : ('i2', 'i3', 'i5') is: 0.11
Support for : ('i2', 'i4', 'i5') is: 0.0
Support for : ('i3', 'i4', 'i5') is: 0.0
L3 is : [('i1', 'i2', 'i3'), ('i1', 'i2', 'i5')]
Confidence for ('i1', 'i2', 'i3') is: 0.29
Confidence for ('i1', 'i2', 'i5') is: 0.29
