Session 7

Business Analytics

Today's Objective

ENHANCING DECISION MAKING

Association Analytics: A Mining Approach

Indian Institute of Management (IIM), Rohtak


Why are we discussing this?
Association rule learning is one of the most frequently used data-mining
techniques for market basket analysis. It is a rule-based machine learning
method for identifying interesting relations between items/products in
market basket data. Several metrics are used in association rules that
allow us to understand the relations between products. Let's go through
them one by one, with a simple example as well.
The metrics we will go through are the following:

Support
Confidence
Lift
Leverage
Conviction
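To make these metrics concrete, here is a minimal Python sketch (the tiny basket data is invented for illustration) computing all five from a list of transactions, using the standard definitions: confidence(X→Y) = supp(X∪Y)/supp(X), lift = confidence/supp(Y), leverage = supp(X∪Y) − supp(X)·supp(Y), conviction = (1 − supp(Y))/(1 − confidence).

```python
# Toy transactions (invented for illustration)
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"milk", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset):
    """Fraction of transactions containing every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(X, Y):
    """How often Y appears in transactions that contain X."""
    return support(set(X) | set(Y)) / support(X)

def lift(X, Y):
    """> 1 means X and Y co-occur more than if they were independent."""
    return confidence(X, Y) / support(Y)

def leverage(X, Y):
    """Difference between observed and expected co-occurrence."""
    return support(set(X) | set(Y)) - support(X) * support(Y)

def conviction(X, Y):
    """Ratio measure of how strongly X's presence implies Y's."""
    c = confidence(X, Y)
    return float("inf") if c == 1 else (1 - support(Y)) / (1 - c)

print(confidence({"bread"}, {"milk"}))  # ≈ 0.667
```

For X = {bread}, Y = {milk} here: support({bread}) = 3/5, confidence = 2/3, lift = (2/3)/0.8 ≈ 0.83 (below 1, a slight negative association), leverage = −0.08, conviction = 0.6.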
Association Model: Problem Statement

Two-way lift
Three-way lift
Store layout design



Association Model: Problem Statement

Apriori Procedure



The Apriori Procedure
The Apriori method is an influential method for
mining frequent itemsets.

Key concepts:
• Frequent itemsets: the itemsets that satisfy
minimum support (the set of frequent
k-itemsets is denoted Lk).
• Join operation: to find Lk, a set of
candidate k-itemsets is generated by
joining Lk-1 with itself.
• Apriori property: every subset of a
frequent itemset must itself be frequent.
Understanding Apriori through an Example

• Consider a database, D, consisting of 9 transactions.
• Suppose the minimum support count required is 2 (i.e. min_sup = 2/9 ≈ 22%).
• We first find the frequent itemsets using the Apriori algorithm.
• Then association rules are generated using minimum support and
minimum confidence.

TID    List of Items
T100   I1, I2, I5
T101   I2, I4
T102   I2, I3
T103   I1, I2, I4
T104   I1, I3
T105   I2, I3
T106   I1, I3
T107   I1, I2, I3, I5
T108   I1, I2, I3


Step 1: Generating 1-itemset Frequent Pattern

Scan D for the count of each candidate (C1), then compare each
candidate's support count with the minimum support count to obtain L1:

C1:                      L1:
Itemset   Sup. Count     Itemset   Sup. Count
{I1}      6              {I1}      6
{I2}      7              {I2}      7
{I3}      6              {I3}      6
{I4}      2              {I4}      2
{I5}      2              {I5}      2

• In the first iteration of the algorithm, each item is a
member of the set of candidate 1-itemsets, C1.
• The set of frequent 1-itemsets, L1, consists of the
candidate 1-itemsets satisfying minimum support.
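This first pass can be sketched in a few lines of Python (a minimal illustration on the nine transactions above, not production code):

```python
from collections import Counter

# The nine transactions of database D from the slides
D = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]
min_sup = 2  # minimum support count

# C1: one scan of D counts every single item
C1 = Counter(item for t in D for item in t)

# L1: keep only the candidates meeting the minimum support count
L1 = {item: count for item, count in C1.items() if count >= min_sup}
print(L1)  # all five items survive here, since every count is >= 2
```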
Step 2: Generating 2-itemset Frequent Pattern [Cont.]
• To discover the set of frequent 2-itemsets, L2 , the
algorithm uses L1 Join L1 to generate a candidate set
of 2-itemsets, C2.
• Next, the transactions in D are scanned and the
support count for each candidate itemset in C2 is
accumulated (as shown in the middle table on the next
slide).
• The set of frequent 2-itemsets, L2 , is then
determined, consisting of those candidate 2-itemsets
in C2 having minimum support.
• Note: We haven’t used Apriori Property yet.



Step 2: Generating 2-itemset Frequent Pattern

L1:
Itemset   Sup. Count
{I1}      6
{I2}      7
{I3}      6
{I4}      2
{I5}      2

Generate candidates C2 from L1 (L1 Join L1), scan D for the count of
each candidate, then compare the candidate support counts with the
minimum support count:

C2 (after scanning D):      L2:
Itemset     Sup. Count      Itemset     Sup. Count
{I1, I2}    4               {I1, I2}    4
{I1, I3}    4               {I1, I3}    4
{I1, I4}    1               {I1, I5}    2
{I1, I5}    2               {I2, I3}    4
{I2, I3}    4               {I2, I4}    2
{I2, I4}    2               {I2, I5}    2
{I2, I5}    2
{I3, I4}    0
{I3, I5}    1
{I4, I5}    0
Step 3: Generating 3-itemset Frequent Pattern

L2:
Itemset     Sup. Count
{I1, I2}    4
{I1, I3}    4
{I1, I5}    2
{I2, I3}    4
{I2, I4}    2
{I2, I5}    2

• Generate the candidate set C3 from L2 (join step). The condition for
joining Lk-1 with Lk-1 is that the two itemsets have (k-2) elements in
common. So here, for L2, the first element should match.
• The generation of the set of candidate 3-itemsets, C3, involves use
of the Apriori property.
• Joining on a matching first element gives C3 = L2 Join L2 =
{{I1, I2, I3}, {I1, I2, I5}, {I1, I3, I5}, {I2, I3, I4}, {I2, I3, I5},
{I2, I4, I5}}. If instead we join all pairs sharing any one element,
C3 also contains {I1, I2, I4}.
• The join step is now complete, and the prune step is used to reduce
the size of C3. The prune step helps avoid heavy computation due to a
large Ck.
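The join-and-prune candidate generation can be sketched as follows (a minimal Python illustration; it forms all unions of two L2 itemsets that have size 3, i.e. the "join all pairs sharing one element" variant, then prunes by the Apriori property):

```python
from itertools import combinations

# L2 from the slides
L2 = [frozenset(s) for s in
      [{"I1", "I2"}, {"I1", "I3"}, {"I1", "I5"},
       {"I2", "I3"}, {"I2", "I4"}, {"I2", "I5"}]]

def apriori_gen(Lk_1, k):
    prev = set(Lk_1)
    # Join: unions of two (k-1)-itemsets that yield a k-itemset
    candidates = {a | b for a in prev for b in prev if len(a | b) == k}
    # Prune: drop any candidate with an infrequent (k-1)-subset
    return {c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))}

C3 = apriori_gen(L2, 3)
print(sorted(sorted(c) for c in C3))
```

The prune step removes {I1, I3, I5}, {I1, I2, I4}, {I2, I3, I4}, {I2, I3, I5} and {I2, I4, I5}, leaving exactly the two candidates the slides arrive at.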
Step 3: Generating 3-itemset Frequent Pattern [Cont.]

Let's take {I2, I3, I4}. Its 2-item subsets are {I2, I3}, {I3, I4} and
{I2, I4}. The minimum support count is 2, but sc{I3, I4} = 0 (see the
C2 counts in Step 2), so {I3, I4} is not frequent. Therefore
{I2, I3, I4} is eliminated from the candidate set.


Step 3: Generating 3-itemset Frequent Pattern [Cont.]

• Based on the Apriori property that all subsets of a frequent
itemset must also be frequent, we can determine that four
candidates cannot possibly be frequent. How?
• For example, let's take {I1, I2, I3}. Its 2-item subsets are
{I1, I2}, {I1, I3} and {I2, I3}. Since all 2-item subsets of
{I1, I2, I3} are members of L2, we keep {I1, I2, I3} in C3.
• Let's take another example, {I2, I3, I5}, which shows how the
pruning is performed. Its 2-item subsets are {I2, I3}, {I2, I5}
and {I3, I5}.
• BUT {I3, I5} is not a member of L2 and hence is not frequent,
violating the Apriori property. Thus we have to remove
{I2, I3, I5} from C3.
• Therefore, C3 = {{I1, I2, I3}, {I1, I2, I5}} after checking all
members of the join result for pruning.
• Now the transactions in D are scanned in order to determine L3,
consisting of those candidate 3-itemsets in C3 having minimum
support. (The L2 and C2 support counts are as in Step 2.)


Step 3: Generating 3-itemset Frequent Pattern

Scan D for the count of each candidate, then compare the candidate
support counts with the minimum support count:

C3:                         L3:
Itemset        Sup. Count   Itemset        Sup. Count
{I1, I2, I3}   2            {I1, I2, I3}   2
{I1, I2, I5}   2            {I1, I2, I5}   2

Generate the candidate set C4 from L3 (join step). The condition for
joining Lk-1 with Lk-1 (k = 4) is that they have (k-2) elements in
common. So here, for L3, the first 2 elements (items) should match.


Step 4: Generating 4-itemset Frequent Pattern
• The algorithm uses L3 Join L3 to generate a candidate
set of 4-itemsets, C4. Although the join results in {{I1,
I2, I3, I5}}, this itemset is pruned since its subset {{I2,
I3, I5}} is not frequent.
• Thus, C4 = φ, and the procedure terminates, having
found all of the frequent itemsets. This completes the
Apriori algorithm.
• What's next?
These frequent itemsets will be used to generate
strong association rules (where strong association
rules satisfy both minimum support & minimum
confidence).
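Putting Steps 1-4 together, the whole level-wise loop can be sketched in Python (a compact toy illustration on the nine-transaction database; `apriori` here is our own function, not a library call):

```python
from itertools import combinations

D = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]

def apriori(D, min_sup):
    count = lambda c: sum(c <= t for t in D)      # support count via a scan of D
    items = {i for t in D for i in t}
    L = {frozenset([i]) for i in items if count(frozenset([i])) >= min_sup}
    frequent = set(L)
    k = 2
    while L:
        # Join: unions of two frequent (k-1)-itemsets that give a k-itemset
        cand = {a | b for a in L for b in L if len(a | b) == k}
        # Prune: every (k-1)-subset must itself be frequent (Apriori property)
        cand = {c for c in cand
                if all(frozenset(s) in L for s in combinations(c, k - 1))}
        L = {c for c in cand if count(c) >= min_sup}  # one scan per level
        frequent |= L
        k += 1
    return frequent

freq = apriori(D, 2)
print(len(freq))  # 13 frequent itemsets: 5 singles, 6 pairs, 2 triples
```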


Step 5: Generating Association Rules from Frequent
Itemsets
• Procedure:
• For each frequent itemset l, generate all nonempty subsets of l.
• For every nonempty subset s of l, output the rule "s → (l-s)" if
support_count(l) / support_count(s) >= min_conf, where
min_conf is the minimum confidence threshold.

• Back to the example:
We had L = {{I1}, {I2}, {I3}, {I4}, {I5}, {I1,I2}, {I1,I3}, {I1,I5}, {I2,I3}, {I2,I4},
{I2,I5}, {I1,I2,I3}, {I1,I2,I5}}.
– Let's take l = {I1,I2,I5}.
– All its nonempty proper subsets are {I1,I2}, {I1,I5}, {I2,I5}, {I1}, {I2}, {I5}.

• Let the minimum confidence threshold be, say, 100%.

• The resulting association rules are shown below, each listed with its
confidence.



Step 5: Generating Association Rules from Frequent
Itemsets [Cont.]
– R1: I1 ^ I2 → I5
• Confidence = sc{I1,I2,I5}/sc{I1,I2} = 2/4 = 50%
• R1 is Rejected.
– R2: I1 ^ I5 → I2
• Confidence = sc{I1,I2,I5}/sc{I1,I5} = 2/2 = 100%
• R2 is Selected.
– R3: I2 ^ I5 → I1
• Confidence = sc{I1,I2,I5}/sc{I2,I5} = 2/2 = 100%
• R3 is Selected.
– R4: I1 → I2 ^ I5
• Confidence = sc{I1,I2,I5}/sc{I1} = 2/6 = 33%
• R4 is Rejected.
– R5: I2 → I1 ^ I5
• Confidence = sc{I1,I2,I5}/sc{I2} = 2/7 = 29%
• R5 is Rejected.
– R6: I5 → I1 ^ I2
• Confidence = sc{I1,I2,I5}/sc{I5} = 2/2 = 100%
• R6 is Selected.
In this way, we have found three strong association rules.
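The rule-generation step for l = {I1, I2, I5} can be checked with a short Python sketch (illustrative only; `sc` is our own helper computing a support count by scanning D):

```python
from itertools import combinations

D = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]
sc = lambda s: sum(s <= t for t in D)   # support count of an itemset

l = frozenset({"I1", "I2", "I5"})
min_conf = 1.0                          # 100% confidence threshold
strong = []
for r in range(1, len(l)):              # all nonempty proper subsets s of l
    for c in combinations(sorted(l), r):
        s = frozenset(c)
        if sc(l) / sc(s) >= min_conf:   # rule s -> (l - s)
            strong.append((s, l - s))

for antecedent, consequent in strong:
    print(sorted(antecedent), "->", sorted(consequent))
```

This selects exactly R2, R3 and R6 and rejects R1, R4 and R5.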
An Example:

Q1. A database has four transactions.
Let min_sup = 70% (i.e., a support count of 3) and min_conf = 100%.

TID    Date       Items bought
T100   10/15/99   {K, A, D, B}
T200   10/15/99   {D, A, C, E, B}
T300   10/19/99   {C, A, B, E}
T400   10/22/99   {B, A, D}


An Example:

Q1. A database has four transactions.
Let min_sup = 70% (i.e., a support count of 3) and min_conf = 100%.

TID    Date       Items bought
T100   10/15/99   {K, A, D, B}
T200   10/15/99   {D, A, C, E, B}
T300   10/19/99   {C, A, B, E}
T400   10/22/99   {B, A, D}

C1: A 4, B 4, C 2, D 3, E 2, K 1
L1: A 4, B 4, D 3
L2: {A, B} 4, {A, D} 3, {B, D} 3
L3: {A, B, D} 3

Therefore, the set of all frequent itemsets is {A}, {B}, {D}, {A, B},
{A, D}, {B, D}, {A, B, D}.
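The result can be verified by brute force in Python (fine for a toy database with six distinct items; this simply tests every subset against the support threshold rather than running the level-wise algorithm):

```python
from itertools import combinations

D = [{"K", "A", "D", "B"}, {"D", "A", "C", "E", "B"},
     {"C", "A", "B", "E"}, {"B", "A", "D"}]
min_count = 3  # 70% of 4 transactions, rounded up

items = sorted({i for t in D for i in t})
# Keep every subset whose support count meets the threshold
frequent = {frozenset(c)
            for r in range(1, len(items) + 1)
            for c in combinations(items, r)
            if sum(set(c) <= t for t in D) >= min_count}

print(sorted(sorted(s) for s in frequent))
```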


An Example:
min_sup = 50% (support count 2), min_conf = 100%: generate the strong
association rules.

TID    Items bought
T100   Sugar (A), Egg (C), Butter (D)
T200   Milk (B), Egg (C), Bread (E)
T300   Sugar (A), Milk (B), Egg (C), Bread (E)
T400   Milk (B), Bread (E)


An Example:

With min_conf = 100%, the strong rules are:

Sugar → Egg
Milk → Bread
Bread → Milk
Milk, Egg → Bread
Egg, Bread → Milk
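These five rules can be reproduced with a small Python sketch (an illustrative brute force over the four named baskets; Butter drops out because its support count is 1 < 2):

```python
from itertools import combinations

D = [{"Sugar", "Egg", "Butter"}, {"Milk", "Egg", "Bread"},
     {"Sugar", "Milk", "Egg", "Bread"}, {"Milk", "Bread"}]
min_count, min_conf = 2, 1.0
sc = lambda s: sum(s <= t for t in D)   # support count

# All frequent itemsets, by brute force over every subset
items = sorted({i for t in D for i in t})
freq = [frozenset(c) for r in range(1, len(items) + 1)
        for c in combinations(items, r) if sc(frozenset(c)) >= min_count]

# Rules s -> (l - s) meeting the confidence threshold
rules = {(s, l - s)
         for l in freq if len(l) > 1
         for r in range(1, len(l))
         for c in combinations(sorted(l), r)
         for s in [frozenset(c)]
         if sc(l) / sc(s) >= min_conf}

for a, b in sorted(rules, key=lambda x: sorted(x[0])):
    print(", ".join(sorted(a)), "->", ", ".join(sorted(b)))
```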
An Example:
min_sup = 50% (support count 2), min_conf = 80%: generate the strong
association rules.

TID    Items bought
T100   Sugar (A), Egg (C), Butter (D)
T200   Milk (B), Egg (C), Bread (E)
T300   Sugar (A), Milk (B), Egg (C), Bread (E)
T400   Milk (B), Bread (E)


An Example:

With min_conf = 80%, the strong rules are again:

Sugar → Egg
Milk → Bread
Bread → Milk
Milk, Egg → Bread
Egg, Bread → Milk

(No candidate rule here has a confidence between 80% and 100%, so the
list is unchanged.)
Minimum support count = 2, minimum confidence threshold = 80%.

Database D:
TID   Items
100   1, 3, 4
200   2, 3, 5
300   1, 2, 3, 5
400   2, 5

Scan D for C1:      L1:
{1}  2              {1}  2
{2}  3              {2}  3
{3}  3              {3}  3
{4}  1              {5}  3
{5}  3

C2 (from L1):   Scan D:       L2:
{1 2}           {1 2}  1
{1 3}           {1 3}  2      {1 3}  2
{1 5}           {1 5}  1
{2 3}           {2 3}  2      {2 3}  2
{2 5}           {2 5}  3      {2 5}  3
{3 5}           {3 5}  2      {3 5}  2

C3:        Scan D → L3:
{2 3 5}    {2 3 5}  2
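The same result can be confirmed by brute force in Python (a toy check of the support counts, not the level-wise algorithm itself):

```python
from itertools import combinations

D = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
min_count = 2

items = sorted({i for t in D for i in t})
# Every subset whose support count meets the threshold
frequent = {frozenset(c)
            for r in range(1, len(items) + 1)
            for c in combinations(items, r)
            if sum(set(c) <= t for t in D) >= min_count}

print(sorted(sorted(s) for s in frequent))
```

The largest frequent itemset is {2, 3, 5} with support count 2, as in the slide.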
Assignment: Generate all possible rules that have a minimum
support of 60%.

Transaction id Items
T100 M,O,N,K,E,Y
T200 D,O,N,K,E,Y
T300 M,A,K,E
T400 M,U,C,K,Y
T500 C,O,O,K,I,E



Assignment: Generate all possible rules that have a minimum
support of 60%.

Item   Count   Support       Decision
M      3       3/5 = 60%     Next round
O      3       3/5 = 60%     Next round
N      2       2/5 = 40%     Reject
K      5       5/5 = 100%    Next round
E      4       4/5 = 80%     Next round
Y      3       3/5 = 60%     Next round
D      1       1/5 = 20%     Reject
A      1       1/5 = 20%     Reject
U      1       1/5 = 20%     Reject
C      2       2/5 = 40%     Reject

(O appears in T100, T200 and T500; C appears in T400 and T500.)


Candidate 2-itemsets from the frequent items M, O, K, E, Y:

Pair    Count   Support       Decision
M,O     1       1/5 = 20%     Reject
M,K     3       3/5 = 60%     Selected
M,E     2       2/5 = 40%     Reject
M,Y     2       2/5 = 40%     Reject
O,K     3       3/5 = 60%     Selected
O,E     3       3/5 = 60%     Selected
O,Y     2       2/5 = 40%     Reject
K,E     4       4/5 = 80%     Selected
K,Y     3       3/5 = 60%     Selected
E,Y     2       2/5 = 40%     Reject


From L2 = {{M,K}, {O,K}, {O,E}, {K,E}, {K,Y}}, the candidate
3-itemsets and their supports are:

M,K,O   1 = 20%
M,K,E   2 = 40%
M,K,Y   2 = 40%
O,K,E   3 = 60%
O,K,Y   2 = 40%
K,E,Y   2 = 40%

If we fix the first item in common (prefix join), the only candidates are:
O,K,E   3 = 60%
K,E,Y   2 = 40%

Either way, the only frequent 3-itemset is {O,K,E}.
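The full answer can be checked with a brute-force Python sketch (each transaction written as a set of single-letter items):

```python
from itertools import combinations

# set("MONKEY") == {"M","O","N","K","E","Y"}, duplicates collapse
D = [set("MONKEY"), set("DONKEY"), set("MAKE"), set("MUCKY"), set("COOKIE")]
min_count = 3  # 60% of 5 transactions

items = sorted({i for t in D for i in t})
frequent = {frozenset(c)
            for r in range(1, len(items) + 1)
            for c in combinations(items, r)
            if sum(set(c) <= t for t in D) >= min_count}

# The only frequent 3-itemset:
print([sorted(s) for s in frequent if len(s) == 3])  # [['E', 'K', 'O']]
```

This yields 11 frequent itemsets in total: the singles M, O, K, E, Y; the pairs {M,K}, {O,K}, {O,E}, {K,E}, {K,Y}; and the triple {O,K,E}.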


Find a pair of items (below), a and b, such that the rules
{a} → {b} and {b} → {a}
have the same confidence. Any such pair will do.

Answer: (Beer, Cookies) and (Bread, Butter). The two rules share the
numerator support_count({a, b}), so their confidences are equal exactly
when support_count({a}) = support_count({b}).
If {1, 2, 3} and {2, 3, 4} are the only large 3-itemsets, identify
for each of the following sets whether it is a large itemset, is not,
or whether you cannot be certain:
{1}          Yes/No
{1, 2}       Yes/No
{1, 4}       Yes/No
{1, 2, 3, 4} Yes/No
{1, 3, 4}    Yes/No

Assume that the confidence of the rule 1 → 2 is 100%. Is the
confidence of the rule 2 → 1 also 100%? Give an example of data to
justify your answer.

Transaction  Items
1            1, 2
2            1, 2, 3
3            2, 3
4            1, 2, 4
5            2, 3, 4


If {1, 2, 3} and {2, 3, 4} are the only large 3-itemsets:
{1}          Yes (it is a subset of the large itemset {1, 2, 3})
{1, 2}       Yes (it is a subset of the large itemset {1, 2, 3})
{1, 4}       Cannot be certain (it is not a subset of either large
             3-itemset, but a 2-itemset can be large without being
             part of any large 3-itemset)
{1, 2, 3, 4} No (its subset {1, 2, 4} is not large)
{1, 3, 4}    No (it is a 3-itemset not among the only large 3-itemsets)

For the data below, the confidence of 1 → 2 is 3/3 = 100%, while the
confidence of 2 → 1 is 3/5 = 60%:

Transaction  Items
1            1, 2
2            1, 2, 3
3            2, 3
4            1, 2, 4
5            2, 3, 4


Simulation


While the Apriori algorithm is foundational and easy to understand,
its computational complexity, repeated scans of the data, and
scalability and efficiency drawbacks have led to the development of
more advanced algorithms like FP-Growth and ECLAT, which address many
of these limitations and are better suited to large-scale data mining
applications.

The ECLAT (Equivalence Class Transformation) algorithm is


efficient for frequent itemset mining. Unlike the Apriori algorithm,
which uses a breadth-first search and generates candidate itemsets in
each iteration, ECLAT uses a depth-first search approach and a
vertical data format, making it more efficient in many scenarios.



ECLAT transforms the dataset into a vertical format where each item
is associated with a list of transaction IDs (TIDs) in which it appears.
This is known as a TID list.

Transactions: Vertical Format (TID lists):


T1: {A, B, D} A: {T1, T3, T4, T5}
T2: {B, C, E} B: {T1, T2, T3, T5}
T3: {A, B, C, E} C: {T2, T3, T5}
T4: {A, E} D: {T1}
T5: {A, B, C, E} E: {T2, T3, T4, T5}



Step-by-Step Execution:

1. Start with single items and their TID lists:
   A: {T1, T3, T4, T5}
   B: {T1, T2, T3, T5}
   C: {T2, T3, T5}
   D: {T1}
   E: {T2, T3, T4, T5}

2. Combine items recursively and intersect TID lists
   (minimum support count = 2):
   A ∩ B = {T1, T3, T5} (Support = 3)
   A ∩ C = {T3, T5} (Support = 2)
   A ∩ E = {T3, T4, T5} (Support = 3)
   B ∩ C = {T2, T3, T5} (Support = 3)
   B ∩ E = {T2, T3, T5} (Support = 3)
   C ∩ E = {T2, T3, T5} (Support = 3)

3. Continue combining itemsets and intersecting TID lists:
   A ∩ B ∩ C = {T3, T5} (Support = 2)
   A ∩ B ∩ E = {T3, T5} (Support = 2)
   A ∩ C ∩ E = {T3, T5} (Support = 2)
   B ∩ C ∩ E = {T2, T3, T5} (Support = 3)
   A ∩ B ∩ C ∩ E = {T3, T5} (Support = 2)

4. Collect the frequent itemsets:
   {A}, {B}, {C}, {E}, {A, B}, {A, C}, {A, E}, {B, C}, {B, E},
   {C, E}, {A, B, C}, {A, B, E}, {A, C, E}, {B, C, E}, {A, B, C, E}
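The vertical-format, depth-first search described above can be sketched in Python (a minimal illustration on the five transactions; the function and variable names are our own):

```python
# The five transactions from the slide
transactions = {
    "T1": {"A", "B", "D"}, "T2": {"B", "C", "E"},
    "T3": {"A", "B", "C", "E"}, "T4": {"A", "E"},
    "T5": {"A", "B", "C", "E"},
}
min_sup = 2  # minimum support count

# Build the vertical format: item -> set of TIDs containing it
tid = {}
for t, items in transactions.items():
    for i in items:
        tid.setdefault(i, set()).add(t)

frequent = {}

def eclat(prefix, extensions):
    """Depth-first: extend `prefix` by each item, intersecting TID lists."""
    while extensions:
        item, tids = extensions.pop(0)
        if len(tids) >= min_sup:
            frequent[frozenset(prefix + [item])] = len(tids)
            # Only the remaining items can extend this new prefix
            eclat(prefix + [item], [(j, tids & tj) for j, tj in extensions])

eclat([], sorted(tid.items()))
print(len(frequent))  # 15 frequent itemsets
```

With a support count of 2, {A, C} and {A, C, E} are also frequent (A ∩ C = {T3, T5}), giving 15 frequent itemsets in total.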
1. Since the ECLAT algorithm uses a depth-first search approach, it
consumes less memory than the Apriori algorithm.
2. The ECLAT algorithm does not involve repeated scanning of the
data to calculate the individual support values.


Association Mining (R, arules package)

rules <- apriori(object_name)
inspect(rules)
x1 <- apriori(object_name,
              parameter = list(minlen = 3, supp = 0.002, conf = 0.2),
              appearance = list(rhs, lhs))


ORANGE: Image Analytics

[Screenshots of Image Analytics workflows in the Orange data-mining tool]


Thank you !!!