Association Rule Mining – Module 3
1. A frequent itemset is a set of items that occur together frequently in a dataset. The frequency of an itemset is measured by its support count: the number of transactions (records) in the dataset that contain the itemset. For example, if a dataset contains 100 transactions and the itemset {milk, bread} appears in 20 of them, the support count of {milk, bread} is 20 (a counting sketch follows this list).
2. Association rule mining algorithms, such as Apriori or FP-Growth, are used to find frequent itemsets and to generate association rules.
3. Frequent itemsets and association rules can be used for a variety of tasks, such as market basket analysis, cross-selling, and recommendation systems.
It is important to use appropriate measures, such as lift and conviction, to evaluate the interestingness of the generated rules.
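A minimal sketch of support counting, assuming transactions are stored as Python sets (the basket contents below are made up for illustration):

```python
# Support count of an itemset = number of transactions containing all its items.
transactions = [
    {"milk", "bread", "butter"},
    {"milk", "bread"},
    {"bread", "eggs"},
    {"milk", "bread", "eggs"},
]

def support_count(itemset, transactions):
    """Count the transactions that contain every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t)

print(support_count({"milk", "bread"}, transactions))  # 3 of 4 baskets
```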
Association rule mining searches for frequent itemsets in a dataset. Frequent itemset mining shows which items appear together in a transaction or relation.
Need for Association Mining: Frequent itemset mining drives the generation of association rules from a transactional dataset. If two items X and Y are frequently purchased together, it makes sense to place them together in stores, or to offer a discount on one item when the other is purchased; this can noticeably increase sales. For example, we may find that a customer who buys milk and bread is also likely to buy butter, giving the association rule {milk, bread} => {butter}. The seller can then suggest butter to a customer who buys milk and bread. Similarly, if a customer buys bread, he is likely to also buy butter, eggs, or milk, so these products are stocked on the same shelf or nearby.
Example of Association Rules
1. Assume there are 100 customers.
2. 10 of them bought milk, 8 bought butter, and 6 bought both.
3. Consider the rule: bought milk => bought butter.
4. support = P(Milk & Butter) = 6/100 = 0.06.
5. confidence = support / P(Milk) = 0.06/0.10 = 0.6.
6. lift = confidence / P(Butter) = 0.6/0.08 = 7.5.
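These numbers can be checked directly; a quick sketch using the counts assumed above:

```python
# 100 customers: 10 bought milk, 8 bought butter, 6 bought both.
n, n_milk, n_butter, n_both = 100, 10, 8, 6

support = n_both / n                 # P(Milk & Butter) = 0.06
confidence = support / (n_milk / n)  # P(Butter | Milk)  = 0.6
lift = confidence / (n_butter / n)   # 0.6 / 0.08        = 7.5

print(round(support, 2), round(confidence, 2), round(lift, 2))  # 0.06 0.6 7.5
```

A lift of 7.5 (greater than 1) indicates that buying milk and buying butter are strongly positively associated.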
Support, Confidence, and Lift
• Support: one of the measures of interestingness; it indicates the usefulness and certainty of a rule. A support of 5% means that 5% of all transactions in the database follow the rule.
• Support is the frequency of an itemset, i.e. how often it appears in the dataset. For an itemset X and a set of transactions T, it is the fraction of transactions in T that contain X:
Support(X) = (number of transactions in T that contain X) / (total number of transactions in T)
Confidence
Confidence indicates how often the rule has been found to be true: how often items X and Y occur together in the dataset, given that X already occurs. It is the ratio of the number of transactions that contain both X and Y to the number of transactions that contain X:
Confidence(X => Y) = Support(X ∪ Y) / Support(X)
Lift
Lift measures the strength of a rule. It is the ratio of the observed support to the support expected if X and Y were independent of each other:
Lift(X => Y) = Support(X ∪ Y) / (Support(X) × Support(Y))
It has three possible ranges of values:
• If Lift = 1: the occurrence of the antecedent and the occurrence of the consequent are independent of each other.
• If Lift > 1: the two itemsets are positively dependent; the value indicates the degree to which they depend on each other.
• If Lift < 1: one item is a substitute for the other, i.e. one item has a negative effect on the occurrence of the other.
Frequent itemset: an itemset whose support is at least some user-defined minimum support (min_sup).
Closed frequent itemset: a frequent itemset is closed if none of its immediate supersets has the same support.
Maximal frequent itemset: a frequent itemset is maximal if none of its immediate supersets is frequent.
Example: If the length of a frequent itemset is k, then by the downward closure property all of its 2^k subsets are also frequent. For instance, with Min Sup = 3, if {Milk, Bread, Butter} is frequent, then so are its subsets:
{Milk}, {Bread}, {Butter}, {Milk, Bread}, {Milk, Butter}, {Bread, Butter}
Frequent, Closed, and Maximal Itemsets: Example (Min Sup = 3)

TID  List of Items
1    A, B, C, D
2    A, B, C, D
3    A, B, C
4    B, C, D
5    C, D

1-itemset counts:
ITEM  Count
A     3
B     4
C     5
D     4
All 1-itemsets are frequent because each support count is at least min sup.

2-itemset counts:
ITEM  Count
A,B   3
A,C   3
A,D   2
B,C   4
B,D   3
C,D   4
All 2-itemsets are frequent except {A, D}.

Is {A} closed? A(3) has immediate supersets AB(3), AC(3), and AD(2). The count of A is not greater than that of its immediate superset AB (they are equal), so A is not closed (an itemset is closed only if none of its immediate supersets has the same support).
Is {A} maximal? Among A's immediate supersets, AB and AC meet the min support of 3, i.e. they are frequent, so A is not maximal.
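A small sketch that classifies itemsets for this dataset by brute force (the representation and helper names are mine; fine for five transactions, not for real data):

```python
from itertools import combinations

transactions = [{"A","B","C","D"}, {"A","B","C","D"}, {"A","B","C"},
                {"B","C","D"}, {"C","D"}]
items = sorted(set().union(*transactions))
MIN_SUP = 3

def sup(itemset):
    return sum(1 for t in transactions if set(itemset) <= t)

# Frequent: support count >= MIN_SUP.
frequent = {frozenset(c)
            for k in range(1, len(items) + 1)
            for c in combinations(items, k)
            if sup(c) >= MIN_SUP}

def immediate_supersets(s):
    return [s | {i} for i in items if i not in s]

# Closed: no immediate superset has the same support.
closed = {s for s in frequent
          if all(sup(t) < sup(s) for t in immediate_supersets(s))}
# Maximal: no immediate superset is frequent.
maximal = {s for s in frequent
           if not any(t in frequent for t in immediate_supersets(s))}

print(frozenset({"A"}) in closed)   # False: sup(A) = 3 == sup(AB) = 3
print(frozenset({"A"}) in maximal)  # False: AB and AC are frequent
```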
Association Rules

TID  Items
1    Bread, Milk
2    Bread, Diaper, Beer, Eggs
3    Milk, Diaper, Beer, Coke
4    Bread, Milk, Diaper, Beer
5    Bread, Milk, Diaper, Coke
(Each row is a basket, i.e. a transaction.)

Rule {Beer} => {Diaper}: support = 3/5 = 60% (Beer and Diaper appear together in 3 of the 5 transactions); confidence = 3/3 = 100% (customers buy diapers 100% of the time when they buy beer).
Single-item support counts over the same five transactions (support = count / 5):

ITEM    Count  Support
Eggs    1      0.2
Coke    2      0.4
Diaper  4      0.8
This property belongs to a special category of properties called antimonotone in the sense that if a set cannot pass a test, all of its
supersets will fail the same test as well
Transactional data for an AllElectronics branch:

TID   List of item IDs
T100  I1, I2, I5
T200  I2, I4
T300  I2, I3
T400  I1, I2, I4
T500  I1, I3
T600  I2, I3
T700  I1, I3
T800  I1, I2, I3, I5
T900  I1, I2, I3

Example: There are nine transactions in this database, that is, |D| = 9.
We use the Apriori algorithm to find the frequent itemsets in D.
1. In the first iteration of the algorithm, each item is a member of the set of candidate 1-itemsets, C1. The algorithm simply scans all of the transactions to count the number of occurrences of each item.
2. Suppose that the minimum support count required is 2, that is, min_sup = 2. The set of frequent 1-itemsets, L1, can then be determined. It consists of the candidate 1-itemsets satisfying minimum support.
3. To discover the set of frequent 2-itemsets, L2, the algorithm uses the join L1 ⋈ L1 to generate a candidate set of 2-itemsets, C2. C2 consists of C(|L1|, 2) 2-itemsets. Note that no candidates are removed from C2 during the prune step because each subset of the candidates is also frequent.
Apriori Algorithm
This algorithm uses frequent itemsets to generate association rules. It is designed to work on databases that contain transactions. It uses a breadth-first search and a hash tree to count candidate itemsets efficiently.
It is mainly used for market basket analysis and helps to identify products that are likely to be bought together. It can also be used in the healthcare field, for example to find drug reactions for patients.
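A minimal level-wise Apriori sketch in Python (function and variable names are mine; it illustrates the join, prune, and count steps rather than an optimized hash-tree implementation):

```python
from itertools import combinations

def apriori(transactions, min_sup):
    """Return {itemset: support_count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    # L1: frequent 1-itemsets.
    counts = {}
    for t in transactions:
        for item in t:
            s = frozenset([item])
            counts[s] = counts.get(s, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_sup}
    result = dict(frequent)
    k = 2
    while frequent:
        # Join step: union pairs of frequent (k-1)-itemsets into k-itemsets.
        prev = list(frequent)
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step (Apriori property): every (k-1)-subset must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent
                             for s in combinations(c, k - 1))}
        # Count step: scan the transactions once per level.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {c: n for c, n in counts.items() if n >= min_sup}
        result.update(frequent)
        k += 1
    return result
```

Running this sketch on the nine AllElectronics transactions above with min_sup = 2 reproduces L1, L2, and the two surviving 3-itemsets {I1, I2, I3} and {I1, I2, I5} from the walkthrough that follows.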
Eclat Algorithm
Eclat stands for Equivalence Class Transformation. The algorithm uses a depth-first search over a vertical data layout (each item mapped to the set of transaction IDs containing it) to find frequent itemsets in a transaction database, and it typically executes faster than the Apriori algorithm. A sketch follows below.
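A compact Eclat sketch under the same assumptions (names are mine): frequent itemsets are grown depth-first by intersecting tidsets instead of rescanning the transactions.

```python
def eclat(transactions, min_sup):
    """Return {itemset: support_count} via depth-first tidset intersection."""
    # Vertical layout: item -> set of transaction IDs containing it.
    tidsets = {}
    for tid, t in enumerate(transactions):
        for item in t:
            tidsets.setdefault(item, set()).add(tid)

    result = {}

    def extend(prefix, prefix_tids, candidates):
        for i, (item, tids) in enumerate(candidates):
            new_tids = prefix_tids & tids if prefix else tids
            if len(new_tids) >= min_sup:
                itemset = prefix | {item}
                result[frozenset(itemset)] = len(new_tids)
                # Depth-first: only extend with items later in the order.
                extend(itemset, new_tids, candidates[i + 1:])

    extend(set(), set(), sorted(tidsets.items()))
    return result
```

On the same data and min_sup, this returns the same frequent itemsets as the Apriori sketch; the difference is the search order and the tidset-based counting.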
4. Next, the transactions in D are scanned and the support count of each candidate itemset in C2 is accumulated, as shown in the middle table of the second row in Figure 5.2.
5. The set of frequent 2-itemsets, L2, is then determined, consisting of those candidate 2-itemsets in C2 having minimum support.
6. To generate C3, the set of candidate 3-itemsets, the algorithm joins L2 with itself and then prunes the result using the Apriori property.
Join: C3 = L2 ⋈ L2 = {{I1, I2}, {I1, I3}, {I1, I5}, {I2, I3}, {I2, I4}, {I2, I5}} ⋈ {{I1, I2}, {I1, I3}, {I1, I5}, {I2, I3}, {I2, I4}, {I2, I5}} = {{I1, I2, I3}, {I1, I2, I5}, {I1, I3, I5}, {I2, I3, I4}, {I2, I3, I5}, {I2, I4, I5}}.
Prune using the Apriori property: All nonempty subsets of a frequent itemset must also be frequent. Do
any of the candidates have a subset that is not frequent?
The 2-item subsets of {I1, I2, I3} are {I1, I2}, {I1, I3}, and {I2, I3}. All 2-item subsets of {I1, I2, I3} are
members of L2. Therefore, keep {I1, I2, I3} in C3.
The 2-item subsets of {I1, I2, I5} are {I1, I2}, {I1, I5}, and {I2, I5}. All 2-item subsets of {I1, I2, I5} are
members of L2. Therefore, keep {I1, I2, I5} in C3.
The 2-item subsets of {I1, I3, I5} are {I1, I3}, {I1, I5}, and {I3, I5}. {I3, I5} is not a member of L2, and so it is
not frequent. Therefore, remove {I1, I3, I5} from C3.
The 2-item subsets of {I2, I3, I4} are {I2, I3}, {I2, I4}, and {I3, I4}. {I3, I4} is not a member of L2, and so it is
not frequent. Therefore, remove {I2, I3, I4} from C3.
The 2-item subsets of {I2, I3, I5} are {I2, I3}, {I2, I5}, and {I3, I5}. {I3, I5} is not a member of L2, and so it is
not frequent. Therefore, remove {I2, I3, I5} from C3.
The 2-item subsets of {I2, I4, I5} are {I2, I4}, {I2, I5}, and {I4, I5}. {I4, I5} is not a member of L2, and so it is not frequent. Therefore, remove {I2, I4, I5} from C3.
After pruning, C3 = {{I1, I2, I3}, {I1, I2, I5}}.
7. The transactions in D are then scanned to determine L3, which consists of those candidate 3-itemsets in C3 having minimum support.
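The prune step above can also be checked mechanically; a small sketch (itemsets written as frozensets, L2 taken from the join above):

```python
from itertools import combinations

L2 = {frozenset(s) for s in [{"I1","I2"}, {"I1","I3"}, {"I1","I5"},
                             {"I2","I3"}, {"I2","I4"}, {"I2","I5"}]}

C3 = [{"I1","I2","I3"}, {"I1","I2","I5"}, {"I1","I3","I5"},
      {"I2","I3","I4"}, {"I2","I3","I5"}, {"I2","I4","I5"}]

# Apriori property: keep a candidate only if every 2-item subset is in L2.
pruned = [c for c in C3
          if all(frozenset(s) in L2 for s in combinations(c, 2))]
print(pruned)  # the two survivors: {I1, I2, I3} and {I1, I2, I5}
```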