Session5 6 (Am) PDF
Today's Objective
• Transaction T: T is a subset of the item set I
• Problem: Find rules whose support and confidence are greater than the
user-specified minimum support and minimum confidence
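As a minimal sketch of the problem statement, the base R snippet below computes support and confidence for one rule. The toy transactions, the item names, and the two thresholds are hypothetical and chosen only for illustration; the rule is kept when both measures reach their minimum (an "at least" comparison is assumed here).

```r
# Hypothetical toy transactions (not from the slides);
# each element is one transaction T, a subset of the item set I.
transactions <- list(
  c("bread", "milk"),
  c("bread", "butter"),
  c("bread", "milk", "butter"),
  c("milk"),
  c("bread", "milk")
)

# support(X): fraction of transactions that contain every item of X
support <- function(itemset, transactions) {
  contains <- vapply(transactions,
                     function(tr) all(itemset %in% tr),
                     logical(1))
  mean(contains)
}

# confidence(X -> Y) = support(X union Y) / support(X)
confidence <- function(lhs, rhs, transactions) {
  support(union(lhs, rhs), transactions) / support(lhs, transactions)
}

min_support <- 0.4      # assumed threshold
min_confidence <- 0.7   # assumed threshold

s <- support(c("bread", "milk"), transactions)      # 3 of 5 transactions
c_bm <- confidence("bread", "milk", transactions)   # (3/5) / (4/5)
keep_rule <- s >= min_support && c_bm >= min_confidence
print(c(support = s, confidence = c_bm))
```

Here the rule bread -> milk has support 0.6 and confidence 0.75, so it passes both assumed thresholds.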
Key Concepts:
• Frequent Itemsets: The sets of items
that have minimum support (denoted
by Li for the ith itemset).
• Join Operation: To find Lk, a set of
candidate k-itemsets is generated by
joining Lk-1 with itself.
• Apriori Property: Any nonempty subset of
a frequent itemset must also be frequent.
Indian Institute of Management (IIM),Rohtak
Understanding Apriori through an Example
[Tables: C1, L1, C2]
Step 3: Generating 3-itemset Frequent Pattern
L2:
Itemset     Sup. Count
{I1, I2}    4
{I1, I3}    4
{I1, I5}    2
{I2, I3}    4
{I2, I4}    2
{I2, I5}    2
In L2 Join L2, two itemsets are joinable if their first k-1 items are common (here k = 2, so the first item).
• The generation of the set of candidate 3-itemsets, C3 , involves use of
the Apriori Property.
• In order to find C3, we compute L2 Join L2.
• C3 = L2 Join L2 = {{I1, I2, I3}, {I1, I2, I5}, {I1, I3, I5}, {I2, I3, I4}, {I2, I3, I5},
{I2, I4, I5}}.
• Now the Join step is complete, and the Prune step is used to reduce the
size of C3. Pruning avoids heavy computation when Ck is large.
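The join and prune steps described above can be sketched in base R as follows. This is an illustrative sketch rather than the arules implementation: the helper names `key`, `join_step`, and `prune_step` are made up for this example, and the code assumes that items are sorted within each itemset and that the list of itemsets is in lexicographic order, as in the L2 table.

```r
# L2 as listed on the slide (items sorted within each itemset)
L2 <- list(c("I1", "I2"), c("I1", "I3"), c("I1", "I5"),
           c("I2", "I3"), c("I2", "I4"), c("I2", "I5"))

# Text key for an itemset, used for membership tests
key <- function(itemset) paste(itemset, collapse = ",")

# Join step: merge two itemsets that agree on all items except the last
join_step <- function(L) {
  out <- list()
  n <- length(L)
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      a <- L[[i]]
      b <- L[[j]]
      m <- length(a)
      if (identical(a[-m], b[-m])) {
        out[[length(out) + 1]] <- c(a, b[m])
      }
    }
  }
  out
}

# Prune step (Apriori property): drop a candidate if any of its
# subsets with one item fewer is not a frequent itemset
prune_step <- function(C, L) {
  frequent <- vapply(L, key, character(1))
  keep <- vapply(C, function(cand) {
    subsets <- combn(cand, length(cand) - 1, simplify = FALSE)
    all(vapply(subsets, key, character(1)) %in% frequent)
  }, logical(1))
  C[keep]
}

C3 <- join_step(L2)               # six candidates, as on the slide
C3_pruned <- prune_step(C3, L2)   # {I1,I2,I3} and {I1,I2,I5} remain
print(vapply(C3_pruned, key, character(1)))
```

For example, {I2, I3, I5} is pruned because its subset {I3, I5} does not appear in L2, so no database scan is needed for it.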
Scan D for the count of each candidate in C3, then compare each candidate's
support count with the minimum support count to obtain L3.
C3 (after the prune step):
Itemset
{I1, I2, I3}
{I1, I2, I5}
C3 with support counts:
Itemset         Sup. Count
{I1, I2, I3}    2
{I1, I2, I5}    2
L3:
Itemset         Sup. Count
{I1, I2, I3}    2
{I1, I2, I5}    2
• Back To Example:
We had L = {{I1}, {I2}, {I3}, {I4}, {I5}, {I1,I2}, {I1,I3}, {I1,I5}, {I2,I3}, {I2,I4},
{I2,I5}, {I1,I2,I3}, {I1,I2,I5}}.
– Let's take l = {I1,I2,I5}.
– All of its nonempty proper subsets are {I1,I2}, {I1,I5}, {I2,I5}, {I1}, {I2}, {I5}.
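To illustrate how rules are generated from l = {I1,I2,I5}, the base R sketch below forms a rule s -> (l - s) for each 2-item subset s and computes its confidence as count(l) / count(s). The support counts are the ones shown in the L2 and L3 tables; rules with a single-item antecedent are left out because the counts for {I1}, {I2} and {I5} are not listed in this part of the slides. The helper `key` is a name made up for this example.

```r
# Support counts taken from the L2 and L3 tables
sup_count <- c("I1,I2" = 4, "I1,I5" = 2, "I2,I5" = 2, "I1,I2,I5" = 2)

l <- c("I1", "I2", "I5")
key <- function(itemset) paste(itemset, collapse = ",")

# Each 2-item subset s of l is an antecedent; the consequent is l - s
antecedents <- combn(l, 2, simplify = FALSE)

rules <- do.call(rbind, lapply(antecedents, function(s) {
  data.frame(
    lhs = key(s),
    rhs = key(setdiff(l, s)),
    # confidence(s -> l - s) = count(l) / count(s)
    confidence = unname(sup_count[key(l)] / sup_count[key(s)]),
    stringsAsFactors = FALSE
  )
}))
print(rules)
```

With these counts, {I1,I2} -> {I5} has confidence 2/4 = 0.5, while {I1,I5} -> {I2} and {I2,I5} -> {I1} both have confidence 2/2 = 1. Only rules that reach the chosen minimum confidence would be kept.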
Therefore, the set of all frequent itemsets is {A}, {B}, {D}, {A, B}, {A, D},
{B, D}, {A, B, D}.
Sugar -> Egg
Milk -> Bread
Bread -> Milk
Milk, Egg -> Bread
Egg, Bread -> Milk
Association Mining
Case(items.csv)
Question: Find the association rules with support = 0.22, and
confidence=0.7
Sol: Save the “items.csv” file in the working directory and load the data
item <- read.transactions("items.csv", format = "basket", sep = ",")
summary(item)
Note: read.transactions() requires the “arules” package to be installed and
loaded
http://www.learnbymarketing.com/1043/working-with-arules-transactions-and-read-transactions/
Case(supermarket.csv)
Question: Find the association rules with support = 0.4, and
confidence=0.95
Sol: Save the file in the Documents folder (the working directory)
Load the data:
• supermarket <- read.transactions("supermarket.csv",
format = "basket", sep = ",")
• summary(supermarket)
Note: format = "basket" is used when each line of the file lists the
items of one transaction
Mining the rules
• rules.all <- apriori(supermarket, parameter =
list(minlen=2, supp=0.4, conf=0.95))
• inspect(rules.all)
Case(groceries.csv)
summary(groceries1)
transactions as itemMatrix in sparse format with
 9835 rows (elements/itemsets/transactions) and
 169 columns (items) and a density of 0.02609146
inspect(groceries1)
Display the first three transactions:
inspect(groceries1[1:3])
    items
[1] {citrus fruit,margarine,ready soups,semi-finished bread}
[2] {coffee,tropical fruit,yogurt}
[3] {whole milk}
itemFrequencyPlot(groceries1,topN=10)
itemFrequencyPlot(groceries1,support=.15)
inspect(sort(m1,by="lift")[1:4])
    lhs                                           rhs                  support     confidence lift
[1] {citrus fruit,other vegetables,whole milk} => {root vegetables}    0.005795628 0.4453125  4.085493
[2] {butter,other vegetables}                  => {whipped/sour cream} 0.005795628 0.2893401  4.036397
[3] {herbs}                                    => {root vegetables}    0.007015760 0.4312500  3.956477
[4] {citrus fruit,pip fruit}                   => {tropical fruit}     0.005592272 0.4044118  3.854060
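To help read the lift column: lift is the rule's confidence divided by the support of its right-hand side, so a lift of about 4 means the rhs item is about four times as likely when the lhs is present as it is overall. The base R sketch below uses only the confidence and lift values printed above and rearranges that relation to recover the implied support of {root vegetables}. The data frame and its column names are made up for this check.

```r
# Confidence and lift copied from the inspect() output for the two
# rules whose right-hand side is {root vegetables}
rules <- data.frame(
  rule = c("{citrus fruit,other vegetables,whole milk} => {root vegetables}",
           "{herbs} => {root vegetables}"),
  confidence = c(0.4453125, 0.4312500),
  lift = c(4.085493, 3.956477),
  stringsAsFactors = FALSE
)

# lift = confidence / support(rhs), hence support(rhs) = confidence / lift
rules$rhs_support <- rules$confidence / rules$lift
print(rules[, c("rule", "rhs_support")])
```

Both rules give the same implied support of about 0.109 for {root vegetables}, as they should, since the rhs support does not depend on the lhs.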