Association Rule Mod 3

(Patterns & Association Rule Mining)

Frequent Itemsets in a Dataset


1. Frequent itemsets are a fundamental concept in association rule mining, a
technique used in data mining to discover relationships between items in a dataset.
The goal of association rule mining is to identify sets of items that frequently
occur together.

2. A frequent itemset is a set of items that occur together frequently in a dataset.
The frequency of an itemset is measured by its support count: the number of
transactions or records in the dataset that contain the itemset. For example, if a
dataset contains 100 transactions and the itemset {milk, bread} appears in 20 of
them, the support count of {milk, bread} is 20 (a support of 20/100 = 0.20).
3. Association rule mining algorithms, such as Apriori or FP-Growth, are used to
find frequent itemsets and generate association rules.
4. Frequent itemsets and association rules can be used for a variety of tasks, such as:
1. Market basket analysis
2. Cross-selling and recommendation systems

It is important to use appropriate interestingness measures, such as lift and
conviction, to evaluate the generated rules.
Association Mining
Association mining searches for frequent itemsets in the dataset; frequent-itemset
mining shows which items appear together in a transaction or relationship.
Need for Association Mining: Frequent-itemset mining enables the generation of
association rules from a transactional dataset. If two items X and Y are frequently
purchased together, it makes sense to place them together in stores or to offer a
discount on one item when the other is purchased; this can significantly increase
sales. For example, we are likely to find that if a customer buys Milk and Bread,
he or she also buys Butter, giving the association rule
{Milk, Bread} => {Butter}. The seller can therefore suggest Butter to a customer
who buys Milk and Bread.
Similarly, if a customer buys Bread, he or she is also likely to buy Butter, Eggs,
or Milk, so these products are stocked on the same shelf or nearby.
Example of Association Rules
1. Assume there are 100 customers.
2. 10 of them bought milk, 8 bought butter, and 6 bought both.
3. Rule: bought milk => bought butter.
4. support = P(Milk & Butter) = 6/100 = 0.06.
5. confidence = support / P(Milk) = 0.06/0.10 = 0.60.
6. lift = confidence / P(Butter) = 0.60/0.08 = 7.5.
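
To make the arithmetic concrete, here is a minimal Python sketch that reproduces the numbers above; the counts are taken directly from the example and the variable names are illustrative:

```python
# Counts from the example: 100 customers, 10 bought milk,
# 8 bought butter, 6 bought both.
n = 100
n_milk, n_butter, n_both = 10, 8, 6

support = n_both / n                # P(Milk & Butter)       = 0.06
confidence = n_both / n_milk        # P(Butter | Milk)       = 0.60
lift = confidence / (n_butter / n)  # confidence / P(Butter) = 7.5

print(f"support={support:.2f}, confidence={confidence:.2f}, lift={lift:.1f}")
```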
Support, Confidence and Lift
• Support: one of the measures of interestingness; it tells us about the usefulness
and certainty of a rule. A support of 5% means that 5% of all transactions in the
database follow the rule.
• Support is how frequently an itemset appears in the dataset, defined as the
fraction of the transactions T that contain the itemset. For a rule A -> B over N
transactions, it can be written as:

Support(A -> B) = Support_count(A ∪ B) / N

Confidence
Confidence indicates how often the rule has been found to be true, i.e., how often
the items X and Y occur together in the dataset given that X occurs. It is the
ratio of the number of transactions that contain both X and Y to the number of
transactions that contain X:

Confidence(X -> Y) = Support_count(X ∪ Y) / Support_count(X)

Lift
Lift measures the strength of a rule. It is the ratio of the observed support to
the support expected if X and Y were independent of each other:

Lift(X -> Y) = Support(X ∪ Y) / (Support(X) × Support(Y))

It has three possible ranges of values:
• Lift = 1: the occurrences of the antecedent and the consequent are independent
of each other.
• Lift > 1: the two itemsets are positively dependent on each other; the larger
the lift, the stronger the dependence.
• Lift < 1: one item is a substitute for the other, meaning one item has a
negative effect on the occurrence of the other.
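
The three measures can be computed directly from a transaction list. Below is a minimal Python sketch, assuming transactions are represented as sets of item names; the function names `support`, `confidence`, and `lift` are illustrative, not from any particular library:

```python
# Basket data from the Association Rules example later in this module.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def support(itemset, txns):
    """Fraction of transactions containing every item in `itemset`."""
    return sum(itemset <= t for t in txns) / len(txns)

def confidence(antecedent, consequent, txns):
    """P(consequent | antecedent) = support(A ∪ B) / support(A)."""
    return support(antecedent | consequent, txns) / support(antecedent, txns)

def lift(antecedent, consequent, txns):
    """Observed support over the support expected under independence."""
    return confidence(antecedent, consequent, txns) / support(consequent, txns)

print(lift({"Beer"}, {"Diaper"}, transactions))  # 1.25 -> positively associated
```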
Frequent itemset: an itemset whose support is greater than or equal to some
user-defined minimum support.
Closed frequent itemset: a frequent itemset none of whose immediate supersets has
the same support.
Maximal frequent itemset: a frequent itemset none of whose immediate supersets is
frequent.
Example itemsets (Min Sup = 3):
{Milk}, {Bread}, {Butter}, {Milk, Bread}, {Milk, Butter}, {Bread, Butter}
If the length of a frequent itemset is k, then by the downward closure property
all of its 2^k subsets are also frequent.

(Figure: maximal frequent itemsets ⊆ closed frequent itemsets ⊆ frequent itemsets)
Example (Min Sup = 3)

TID   List of Items
1     A, B, C, D
2     A, B, C, D
3     A, B, C
4     B, C, D
5     C, D

1-itemsets:
ITEM   Count
A      3
B      4
C      5
D      4
All 1-itemsets are frequent because their support counts are at least min sup.

2-itemsets:
ITEM   Count
A, B   3
A, C   3
A, D   2
B, C   4
B, D   3
C, D   4
All 2-itemsets are frequent except {A, D} (count 2 < 3).

A(3) has immediate supersets AB(3), AC(3), and AD(2). Since A's count is not
greater than that of its immediate supersets AB and AC, A is not closed (an
itemset is closed only if none of its immediate supersets has the same support).
Because A has immediate supersets that meet the minimum support of 3 (AB and AC
are frequent), A is not maximal either.
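
These definitions can be checked mechanically. Here is a minimal Python sketch, assuming the five transactions and min sup = 3 from the tables above; it brute-forces all itemsets (fine for four items) and flags which frequent itemsets are closed and/or maximal:

```python
from itertools import combinations

transactions = [{"A", "B", "C", "D"}, {"A", "B", "C", "D"}, {"A", "B", "C"},
                {"B", "C", "D"}, {"C", "D"}]
min_sup = 3
items = sorted(set().union(*transactions))

def count(itemset):
    # Support count: number of transactions containing the itemset.
    return sum(itemset <= t for t in transactions)

# Enumerate every itemset and keep the frequent ones with their counts.
frequent = {frozenset(c): count(frozenset(c))
            for k in range(1, len(items) + 1)
            for c in combinations(items, k)
            if count(frozenset(c)) >= min_sup}

for itemset in sorted(frequent, key=lambda s: (len(s), sorted(s))):
    sup = frequent[itemset]
    supersets = [s for s in frequent if len(s) == len(itemset) + 1 and itemset < s]
    closed = all(frequent[s] < sup for s in supersets)  # no superset w/ equal support
    maximal = not supersets                             # no frequent superset at all
    tags = [t for t, ok in (("closed", closed), ("maximal", maximal)) if ok]
    print(sorted(itemset), sup, ",".join(tags))
# e.g. ['A'] 3 prints with no tags: AB also has support 3, so A is
# neither closed nor maximal, matching the commentary above.
```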
Association Rules

txn id   items
1        Bread, Milk
2        Bread, Diaper, Beer, Eggs
3        Milk, Diaper, Beer, Coke
4        Bread, Milk, Diaper, Beer
5        Bread, Milk, Diaper, Coke
(Baskets or Transactions)

Calculate the following metrics for each pair of items (X, Y):
• Support: in how many baskets have people bought these two items together,
relative to all baskets?
• Confidence: what percentage of the time do people buy Y when they buy X?
Association Rules
• An association rule is represented as Body => Head [Support, Confidence].
For example: Beer => Diaper [60%, 100%]
 Support is 60%: Beer and Diaper appear together in 60% of all baskets.
 Confidence is 100%: customers buy Diaper 100% of the time when they buy Beer.
Association Rules

txn id   items
1        Bread, Milk
2        Bread, Diaper, Beer, Eggs
3        Milk, Diaper, Beer, Coke
4        Bread, Milk, Diaper, Beer
5        Bread, Milk, Diaper, Coke

Association Rules   Count   Support   Confidence   Lift
Bread               4       0.8
Milk                4       0.8
Beer                3       0.6
Eggs                1       0.2
Coke                2       0.4
Diaper              4       0.8
Beer -> Bread       2       0.4       0.67         0.83
Beer -> Diaper      3       0.6       1.00         1.25
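
As a quick check, here is a minimal self-contained Python sketch that recomputes the two rule rows of the table above:

```python
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]
n = len(transactions)

for consequent in ("Bread", "Diaper"):
    # Count baskets with Beer and with both Beer and the consequent.
    both = sum("Beer" in t and consequent in t for t in transactions)
    conf = both / sum("Beer" in t for t in transactions)
    lift = conf / (sum(consequent in t for t in transactions) / n)
    print(f"Beer -> {consequent}: support={both / n:.1f}, "
          f"confidence={conf:.2f}, lift={lift:.2f}")
# Beer -> Bread:  support=0.4, confidence=0.67, lift=0.83
# Beer -> Diaper: support=0.6, confidence=1.00, lift=1.25
```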
Apriori Algorithm
• A subset of a frequent itemset must also be a frequent itemset
– if {Beer, Potato} is a frequent itemset, both {Beer} and {Potato} must also be
frequent itemsets.
– Filter candidate itemsets by setting a minimum support, which reduces the
number of itemsets used to generate useful rules.

If we set a minimum support of 60%:

Association Rules   Count   Support   Confidence   Lift
Bread               4       0.8
Milk                4       0.8
Beer                3       0.6
Eggs                1       0.2
Coke                2       0.4
Diaper              4       0.8
Beer -> Bread       2       0.4       0.67         0.83
Beer -> Diaper      3       0.6       1.00         1.25

Eggs (0.2) and Coke (0.4) fall below the 60% threshold, so no candidate
containing them needs to be considered.
Apriori Algorithm
This algorithm uses frequent itemsets to generate association rules. It is
designed to work on databases that contain transactions. The algorithm uses a
breadth-first search and a hash tree to count candidate itemsets efficiently.
It is mainly used for market basket analysis, helping to discover products that
can be bought together. It can also be used in the healthcare field, for example
to find drug reactions for patients.
Apriori property: all nonempty subsets of a frequent itemset must also be frequent.
Apriori employs an iterative approach known as a level-wise search, where frequent
k-itemsets are used to explore (k + 1)-itemsets. First, the set of frequent
1-itemsets is found by scanning the database to accumulate the count for each item
and collecting those items that satisfy minimum support; the resulting set is
denoted L1. Next, L1 is used to find L2, the set of frequent 2-itemsets, which is
used to find L3, and so on, until no more frequent k-itemsets can be found.
Finding each Lk requires one full scan of the database.
The Apriori property is based on the following observation: by definition, if an
itemset I does not satisfy the minimum support threshold min_sup, then I is not
frequent; that is, P(I) < min_sup.

This property belongs to a special category of properties called antimonotone:
if a set cannot pass a test, all of its supersets will fail the same test as well.
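
As an illustration, here is a minimal Python sketch of the level-wise search just described, assuming transactions are sets of items. It omits the hash-tree optimization and simply rescans the transaction list to count supports; the function name `apriori` is illustrative:

```python
from itertools import combinations

def apriori(transactions, min_sup):
    def sup(itemset):
        # Support count: one pass over the database per lookup.
        return sum(itemset <= t for t in transactions)

    items = sorted(set().union(*transactions))
    # L1: frequent 1-itemsets.
    Lk = {frozenset([i]) for i in items if sup({i}) >= min_sup}
    all_frequent = set(Lk)
    k = 2
    while Lk:
        # Join step: merge pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in Lk for b in Lk if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in Lk for s in combinations(c, k - 1))}
        # One full scan of the database per level to count candidate supports.
        Lk = {c for c in candidates if sup(c) >= min_sup}
        all_frequent |= Lk
        k += 1
    return all_frequent

# Usage on the AllElectronics transactions shown in the next table (min_sup = 2):
D = [{"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"}, {"I1", "I2", "I4"},
     {"I1", "I3"}, {"I2", "I3"}, {"I1", "I3"}, {"I1", "I2", "I3", "I5"},
     {"I1", "I2", "I3"}]
for itemset in sorted(apriori(D, 2), key=lambda s: (len(s), sorted(s))):
    print(sorted(itemset))
```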
Transactional data for an AllElectronics branch

TID    List of item IDs
T100   I1, I2, I5
T200   I2, I4
T300   I2, I3
T400   I1, I2, I4
T500   I1, I3
T600   I2, I3
T700   I1, I3
T800   I1, I2, I3, I5
T900   I1, I2, I3
Example
Example: There are nine transactions in this database, that is, |D| = 9. We use
the Apriori algorithm to find the frequent itemsets in D.
1. In the first iteration of the algorithm, each item is a member of the set of
candidate 1-itemsets, C1. The algorithm simply scans all of the transactions to
count the number of occurrences of each item.
2. Suppose the minimum support count required is 2, that is, min_sup = 2. The set
of frequent 1-itemsets, L1, can then be determined: it consists of the candidate
1-itemsets satisfying minimum support.
3. To discover the set of frequent 2-itemsets, L2, the algorithm uses the join
L1 ⋈ L1 to generate a candidate set of 2-itemsets, C2. C2 consists of C(|L1|, 2)
2-itemsets. Note that no candidates are removed from C2 during the prune step,
because each subset of the candidates is also frequent.
(Figure 5.2: generation of candidate itemsets and frequent itemsets, where the
minimum support count is 2)
Eclat Algorithm
Eclat stands for Equivalence Class Transformation. The algorithm uses a
depth-first search over a vertical (item -> transaction-id list) representation
of the database to find frequent itemsets, and it typically executes faster than
the Apriori algorithm.
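
Here is a minimal Python sketch of the idea, reusing the small A–D dataset from earlier; the names `eclat` and `vertical` are illustrative. The database is converted to a vertical item -> tid-set format, and prefixes are extended depth-first by intersecting tid-sets:

```python
def eclat(prefix, items_with_tids, min_sup, out):
    """items_with_tids: list of (item_tuple, tidset) pairs sharing `prefix`."""
    while items_with_tids:
        item, tids = items_with_tids.pop()
        if len(tids) >= min_sup:
            out[prefix + item] = len(tids)
            # Extend the prefix: intersect tidsets within the equivalence class.
            suffix = [(other, tids & other_tids)
                      for other, other_tids in items_with_tids
                      if len(tids & other_tids) >= min_sup]
            eclat(prefix + item, suffix, min_sup, out)

transactions = [{"A", "B", "C", "D"}, {"A", "B", "C", "D"}, {"A", "B", "C"},
                {"B", "C", "D"}, {"C", "D"}]
# Build the vertical format: item -> set of transaction ids containing it.
vertical = {}
for tid, t in enumerate(transactions):
    for item in t:
        vertical.setdefault(item, set()).add(tid)

frequent = {}
eclat((), sorted(((i,), s) for i, s in vertical.items()), 3, frequent)
print(frequent)  # e.g. {('D',): 4, ('C', 'D'): 4, ('A',): 3, ('A', 'B'): 3, ...}
```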
Example (continued)
4. Next, the transactions in D are scanned and the support count of each
candidate itemset in C2 is accumulated, as shown in the middle table of the
second row in Figure 5.2.
5. The set of frequent 2-itemsets, L2, is then determined, consisting of those
candidate 2-itemsets in C2 having minimum support.

6. The generation of the set of candidate 3-itemsets, C3, is detailed in
Figure 5.3. From the join step, we first get C3 = L2 ⋈ L2 = {{I1, I2, I3},
{I1, I2, I5}, {I1, I3, I5}, {I2, I3, I4}, {I2, I3, I5}, {I2, I4, I5}}. Based on
the Apriori property that all subsets of a frequent itemset must also be
frequent, we can determine that the four latter candidates cannot possibly be
frequent. We therefore remove them from C3, thereby saving the effort of
unnecessarily obtaining their counts during the subsequent scan of D to determine
L3. Note that, given a candidate k-itemset, we only need to check whether its
(k−1)-subsets are frequent, since the Apriori algorithm uses a level-wise search
strategy.
7. The transactions in D are scanned to determine L3, consisting of those
candidate 3-itemsets in C3 having minimum support.

Join: C3 = L2 ⋈ L2 = {{I1, I2}, {I1, I3}, {I1, I5}, {I2, I3}, {I2, I4}, {I2, I5}}
⋈ {{I1, I2}, {I1, I3}, {I1, I5}, {I2, I3}, {I2, I4}, {I2, I5}} = {{I1, I2, I3},
{I1, I2, I5}, {I1, I3, I5}, {I2, I3, I4}, {I2, I3, I5}, {I2, I4, I5}}.
Prune using the Apriori property: all nonempty subsets of a frequent itemset must
also be frequent. Do any of the candidates have a subset that is not frequent?
• The 2-item subsets of {I1, I2, I3} are {I1, I2}, {I1, I3}, and {I2, I3}. All
are members of L2; therefore, keep {I1, I2, I3} in C3.
• The 2-item subsets of {I1, I2, I5} are {I1, I2}, {I1, I5}, and {I2, I5}. All
are members of L2; therefore, keep {I1, I2, I5} in C3.
• The 2-item subsets of {I1, I3, I5} are {I1, I3}, {I1, I5}, and {I3, I5}.
{I3, I5} is not a member of L2, so it is not frequent; therefore, remove
{I1, I3, I5} from C3.
• The 2-item subsets of {I2, I3, I4} are {I2, I3}, {I2, I4}, and {I3, I4}.
{I3, I4} is not a member of L2, so it is not frequent; therefore, remove
{I2, I3, I4} from C3.
• The 2-item subsets of {I2, I3, I5} are {I2, I3}, {I2, I5}, and {I3, I5}.
{I3, I5} is not a member of L2, so it is not frequent; therefore, remove
{I2, I3, I5} from C3.
• The 2-item subsets of {I2, I4, I5} are {I2, I4}, {I2, I5}, and {I4, I5}.
{I4, I5} is not a member of L2, so it is not frequent; therefore, remove
{I2, I4, I5} from C3.
After pruning, C3 = {{I1, I2, I3}, {I1, I2, I5}}.
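
The prune step walked through above is mechanical and easy to code. Here is a minimal Python sketch using L2 and C3 from the running example; the helper name `has_infrequent_subset` is illustrative:

```python
from itertools import combinations

# L2 and the unpruned C3 from the running example.
L2 = {frozenset(s) for s in [("I1", "I2"), ("I1", "I3"), ("I1", "I5"),
                             ("I2", "I3"), ("I2", "I4"), ("I2", "I5")]}
C3 = [{"I1", "I2", "I3"}, {"I1", "I2", "I5"}, {"I1", "I3", "I5"},
      {"I2", "I3", "I4"}, {"I2", "I3", "I5"}, {"I2", "I4", "I5"}]

def has_infrequent_subset(candidate, prev_L):
    """True if any (k-1)-subset of the candidate is not frequent."""
    return any(frozenset(s) not in prev_L
               for s in combinations(candidate, len(candidate) - 1))

pruned_C3 = [c for c in C3 if not has_infrequent_subset(c, L2)]
print(pruned_C3)  # [{'I1', 'I2', 'I3'}, {'I1', 'I2', 'I5'}]
```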
Types of Association Rules

There are various types of association rules in data mining:
• Multi-relational association rules
• Generalized association rules
• Quantitative association rules
• Interval information association rules
1. Multi-relational association rules: Multi-relation association rules (MRAR)
are a class of association rules, different from primitive, simple, and even
ordinary multi-relational association rules (usually extracted from
multi-relational databases), in which each rule element consists of one entity
but several relationships. These relationships represent indirect relationships
between the entities.
2. Generalized association rules: Generalized association rule extraction is a
powerful tool for getting a rough idea of the interesting patterns hidden in
data. However, since patterns are extracted at every level of abstraction, the
mined rule sets may be too large to be used effectively for decision-making;
therefore, post-processing steps are often required to discover valuable and
interesting knowledge. Generalized association rules have categorical (nominal
or discrete) attributes on both the left and right sides of the rule.
3. Quantitative association rules: Quantitative association rules are a special
type of association rule. Unlike general association rules, where both the left
and right sides of the rule are categorical (nominal or discrete) attributes, at
least one attribute (left or right) of a quantitative association rule must
contain a numeric attribute.
Uses of Association Rules
Some of the uses of association rules in different fields are given below:
• Medical diagnosis: Association rules can help doctors diagnose and treat
patients. Diagnosis is not easy, and many errors can lead to unreliable results;
using multi-relational association rules, we can determine the probability of
disease occurrence associated with various factors and symptoms.
• Market basket analysis: This is one of the most popular examples and uses of
association rule mining. Big retailers typically use the technique to determine
associations between items.
