Bigdata Section4

Uploaded by

larahesham225

Introduction to Big Data Management
Engineer Merna Magdy
Engineer Ahmed Ramadan
Association Rules

Association rule mining is a data mining technique used to discover meaningful relationships, associations, or patterns within a large dataset.

It helps determine how frequently items or characteristics appear together in a dataset.

The Apriori algorithm is the best-known algorithm for mining association rules.

Itemset
• A collection of one or more items or attributes.
• A frequent itemset is a set of items that appear together often enough, that is, at least as often as the minimum support.
• Ex: In retail, the items can be products.

Association Rule
• "If itemset A is bought, then itemset B is bought" is an association rule statement.
• It is denoted as A → B, where A and B are itemsets.

Key Concepts and Terms

Support
• Measures the frequency of occurrence of an itemset in a dataset.
• $\text{Support} = \dfrac{\text{Count}}{\text{Total Records}}$
• High support indicates that the itemset is frequently present in the dataset.

Confidence
• The percentage of records containing X that also contain Y.
• $\text{Conf}(X \rightarrow Y) = \dfrac{\text{Support}(X \cap Y)}{\text{Support}(X)}$, where Support(X ∩ Y) is the support of the records that contain both X and Y.
• High confidence implies a strong association between itemsets X and Y.

Apriori Property: Any subset of a frequent itemset is also frequent.

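The two definitions above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function names `support` and `confidence` and the `baskets` data are assumptions made up for the example. The joint itemset is built as a union of the items in X and Y, which corresponds to what the slides write as Support(X ∩ Y).

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Support of X and Y together, divided by the support of X."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


# Hypothetical retail baskets, made up purely for illustration.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

print(support({"bread"}, baskets))               # 3 of 4 baskets
print(support({"bread", "milk"}, baskets))       # 2 of 4 baskets
print(confidence({"bread"}, {"milk"}, baskets))  # 2 of the 3 bread baskets
```

The Apriori property is visible here as well: the support of {bread, milk} can never exceed the support of {bread}, because every basket that contains both items also contains bread.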
Key Concepts and Terms

Lift
Lift shows how the presence of one item in a transaction affects the likelihood of the presence of another item.

$$\text{Lift}(X \rightarrow Y) = \frac{\text{Support}(X \cap Y)}{\text{Support}(X) \cdot \text{Support}(Y)}$$

• = 1: no association
• > 1: positive association
• < 1: negative association

Leverage
Leverage quantifies the difference between the observed frequency of co-occurrence of two items in a dataset and the frequency of co-occurrence expected if the items were independent of each other.

$$\text{Leverage}(X \rightarrow Y) = \text{Support}(X \cap Y) - \text{Support}(X) \cdot \text{Support}(Y)$$

• = 0: no association
• > 0: positive association
• < 0: negative association

Lift and leverage are used to assess the statistical associations and dependencies between items in data, providing insights into the presence or absence of meaningful relationships.
The higher the leverage and lift values, the stronger the positive association.
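As a rough sketch of how the two measures are read, the snippet below computes lift and leverage from three support values and labels the result. The function names and the numeric inputs are assumptions chosen for illustration. Because lift is above 1 exactly when leverage is above 0, the label is derived from the sign of the leverage alone.

```python
import math


def lift(sup_xy, sup_x, sup_y):
    """Observed joint support relative to the support expected under independence."""
    return sup_xy / (sup_x * sup_y)


def leverage(sup_xy, sup_x, sup_y):
    """Observed joint support minus the support expected under independence."""
    return sup_xy - sup_x * sup_y


def describe(sup_xy, sup_x, sup_y, tol=1e-9):
    """Label the association using the sign of the leverage."""
    lev = leverage(sup_xy, sup_x, sup_y)
    if math.isclose(lev, 0.0, abs_tol=tol):
        return "no association"
    return "positive association" if lev > 0 else "negative association"


# Illustrative support values (assumed, not measured on a real dataset).
print(describe(0.40, 0.4, 0.8))  # joint support above 0.4 * 0.8
print(describe(0.25, 0.5, 0.5))  # joint support equal to 0.5 * 0.5
print(describe(0.60, 0.8, 0.8))  # joint support below 0.8 * 0.8
```

The tolerance in `describe` is a design choice for floating-point inputs; with exact fractions a plain comparison against zero would do.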
1. For the given dataset, apply the Apriori algorithm to discover strong association rules among image tags. Assume that min support = 40% and min confidence = 70%. Generate association rules from the frequent itemsets, calculate the confidence of each rule, and identify all the strong association rules.

| Image ID | Associated Tags |
|---|---|
| 1 | beach, sunshine, holiday |
| 2 | sand, beach |
| 3 | sunshine, beach, ocean |
| 4 | ocean, people, beach, sunshine |
| 5 | holiday, sunshine |

Solution
min support = 40%
min support count = min support × total count = 40% × 5 = 2

| 1-itemset | Support count |
|---|---|
| beach | 4 |
| holiday | 2 |
| ocean | 2 |
| people | 1 |
| sand | 1 |
| sunshine | 4 |

| 1-Frequent itemset | Support count |
|---|---|
| beach | 4 |
| holiday | 2 |
| ocean | 2 |
| sunshine | 4 |

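The first pass can be reproduced with a short script. This is a sketch under the assumption that each image is stored as a Python set of tags; the variable names are illustrative.

```python
from collections import Counter

# Image tags from the exercise table.
images = {
    1: {"beach", "sunshine", "holiday"},
    2: {"sand", "beach"},
    3: {"sunshine", "beach", "ocean"},
    4: {"ocean", "people", "beach", "sunshine"},
    5: {"holiday", "sunshine"},
}

min_support_count = 2  # 40% of 5 images

# Count how many images carry each tag.
tag_counts = Counter(tag for tags in images.values() for tag in tags)

# Keep only the tags that reach the minimum support count.
frequent_1 = {tag: n for tag, n in tag_counts.items() if n >= min_support_count}

for tag in sorted(frequent_1):
    print(tag, frequent_1[tag])
```

The tags people and sand each appear in a single image, so they are dropped before any larger itemsets are built.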
Solution
min support count = 2

| 1-Frequent itemset | Support count |
|---|---|
| beach | 4 |
| holiday | 2 |
| ocean | 2 |
| sunshine | 4 |

| 2-itemset | Support count |
|---|---|
| beach, holiday | 1 |
| beach, ocean | 2 |
| beach, sunshine | 3 |
| holiday, ocean | 0 |
| holiday, sunshine | 2 |
| ocean, sunshine | 2 |

| 2-Frequent itemset | Support count |
|---|---|
| beach, ocean | 2 |
| beach, sunshine | 3 |
| holiday, sunshine | 2 |
| ocean, sunshine | 2 |

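The second pass follows a join, count, filter pattern. The sketch below assumes the frequent 1-itemsets are already known and uses `itertools.combinations` to form the candidate pairs; the variable names are illustrative.

```python
from itertools import combinations

# Image tags from the exercise table.
images = [
    {"beach", "sunshine", "holiday"},
    {"sand", "beach"},
    {"sunshine", "beach", "ocean"},
    {"ocean", "people", "beach", "sunshine"},
    {"holiday", "sunshine"},
]
min_support_count = 2

# Frequent 1-itemsets from the first pass.
frequent_1 = ["beach", "holiday", "ocean", "sunshine"]

# Join step: every pair of frequent tags is a candidate 2-itemset.
candidates_2 = [frozenset(pair) for pair in combinations(frequent_1, 2)]

# Count step: number of images that contain both tags of a candidate.
counts_2 = {
    pair: sum(1 for tags in images if pair <= tags)
    for pair in candidates_2
}

# Filter step: keep candidates that reach the minimum support count.
frequent_2 = {pair: n for pair, n in counts_2.items() if n >= min_support_count}

for pair, n in counts_2.items():
    status = "frequent" if pair in frequent_2 else "dropped"
    print(", ".join(sorted(pair)), n, status)
```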
Solution
min support count = 2

| 2-Frequent itemset | Support count |
|---|---|
| beach, ocean | 2 |
| beach, sunshine | 3 |
| holiday, sunshine | 2 |
| ocean, sunshine | 2 |

| 3-itemset | Support count |
|---|---|
| beach, ocean, holiday | 0 |
| beach, ocean, sunshine | 2 |
| beach, holiday, sunshine | 1 |
| ocean, holiday, sunshine | 0 |

| 3-Frequent itemset | Support count |
|---|---|
| beach, ocean, sunshine | 2 |
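The three passes can be generalized into one level-wise loop. The function below is a compact sketch of the Apriori idea, not a tuned implementation, and the name `apriori` and its parameters are assumptions. Unlike the table above, it applies the Apriori property before counting: a 3-itemset is only counted if all of its 2-item subsets are frequent.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Return {itemset: support count} for every frequent itemset."""

    def count(itemset):
        return sum(1 for t in transactions if itemset <= t)

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    singles = {frozenset([item]): count(frozenset([item])) for item in items}
    level = {s: n for s, n in singles.items() if n >= min_count}

    frequent = {}
    k = 1
    while level:
        frequent.update(level)
        k += 1
        # Join step: unions of two frequent (k-1)-itemsets that contain k items.
        candidates = {a | b for a, b in combinations(level, 2) if len(a | b) == k}
        # Prune step (Apriori property): every (k-1)-subset must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in level for sub in combinations(c, k - 1))
        }
        counted = {c: count(c) for c in candidates}
        level = {c: n for c, n in counted.items() if n >= min_count}
    return frequent


images = [
    {"beach", "sunshine", "holiday"},
    {"sand", "beach"},
    {"sunshine", "beach", "ocean"},
    {"ocean", "people", "beach", "sunshine"},
    {"holiday", "sunshine"},
]

result = apriori(images, min_count=2)
for itemset in sorted(result, key=lambda s: (len(s), sorted(s))):
    print(sorted(itemset), result[itemset])
```

With pruning, the only 3-itemset whose support is actually counted is {beach, ocean, sunshine}; the other three candidates in the table each contain an infrequent pair such as {beach, holiday} or {holiday, ocean}.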
Solution
min confidence = 70%

$$\text{Conf}(X \rightarrow Y) = \frac{\text{Support}(X \cap Y)}{\text{Support}(X)}$$

| 2-itemset rule | Confidence |
|---|---|
| {beach} → {ocean} | 2/4 = 0.5 |
| {ocean} → {beach} | 2/2 = 1 |
| {beach} → {sunshine} | 3/4 = 0.75 |
| {sunshine} → {beach} | 3/4 = 0.75 |
| {holiday} → {sunshine} | 2/2 = 1 |
| {sunshine} → {holiday} | 2/4 = 0.5 |
| {ocean} → {sunshine} | 2/2 = 1 |
| {sunshine} → {ocean} | 2/4 = 0.5 |

| 3-itemset rule | Confidence |
|---|---|
| {beach} → {ocean, sunshine} | 2/4 = 0.5 |
| {ocean, sunshine} → {beach} | 2/2 = 1 |
| {ocean} → {beach, sunshine} | 2/2 = 1 |
| {beach, sunshine} → {ocean} | 2/3 = 0.67 |
| {sunshine} → {beach, ocean} | 2/4 = 0.5 |
| {beach, ocean} → {sunshine} | 2/2 = 1 |
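Rule generation can be sketched as follows: each frequent itemset with at least two items is split into every possible antecedent and consequent, and the confidence is the ratio of two support counts. The dictionary `support_count` is filled in by hand from the tables above; the helper name `rules_from` is an assumption for this example.

```python
from itertools import combinations

# Support counts of the frequent itemsets found in the previous steps.
support_count = {
    frozenset({"beach"}): 4,
    frozenset({"holiday"}): 2,
    frozenset({"ocean"}): 2,
    frozenset({"sunshine"}): 4,
    frozenset({"beach", "ocean"}): 2,
    frozenset({"beach", "sunshine"}): 3,
    frozenset({"holiday", "sunshine"}): 2,
    frozenset({"ocean", "sunshine"}): 2,
    frozenset({"beach", "ocean", "sunshine"}): 2,
}
min_confidence = 0.7


def rules_from(itemset):
    """Yield (antecedent, consequent, confidence) for every split of `itemset`."""
    for r in range(1, len(itemset)):
        for antecedent in combinations(sorted(itemset), r):
            x = frozenset(antecedent)
            y = itemset - x
            yield x, y, support_count[itemset] / support_count[x]


strong = []
for itemset in support_count:
    if len(itemset) < 2:
        continue  # a rule needs a non-empty antecedent and consequent
    for x, y, conf in rules_from(itemset):
        if conf >= min_confidence:
            strong.append((x, y, conf))

for x, y, conf in strong:
    print(sorted(x), "->", sorted(y), round(conf, 2))
```

Every antecedent is itself a frequent itemset (Apriori property), so its count is always available in `support_count`.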
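Solution

$$\text{Conf}(X \rightarrow Y) = \frac{\text{Support}(X \cap Y)}{\text{Support}(X)}$$

$$\text{Lift}(X \rightarrow Y) = \frac{\text{Support}(X \cap Y)}{\text{Support}(X) \cdot \text{Support}(Y)}$$

$$\text{Leverage}(X \rightarrow Y) = \text{Support}(X \cap Y) - [\text{Support}(X) \cdot \text{Support}(Y)]$$

Strong association rules (confidence ≥ 70%):

| 2-itemset rule | Confidence | Lift | Leverage |
|---|---|---|---|
| {ocean} → {beach} | 1 | 1.25 | 0.08 |
| {beach} → {sunshine} | 0.75 | 0.938 | -0.04 |
| {sunshine} → {beach} | 0.75 | 0.938 | -0.04 |
| {holiday} → {sunshine} | 1 | 1.25 | 0.08 |
| {ocean} → {sunshine} | 1 | 1.25 | 0.08 |

| 3-itemset rule | Confidence | Lift | Leverage |
|---|---|---|---|
| {ocean, sunshine} → {beach} | 1 | 1.25 | 0.08 |
| {ocean} → {beach, sunshine} | 1 | 1.667 | 0.16 |
| {beach, ocean} → {sunshine} | 1 | 1.25 | 0.08 |
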
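The lift and leverage columns can be checked with exact arithmetic. The sketch below uses Python's `fractions.Fraction` so that values such as 15/16 are not affected by floating-point rounding; the helper name `measures` and the hand-entered `count` table are assumptions for this example.

```python
from fractions import Fraction

N = 5  # number of images

# Support counts taken from the frequent itemset tables.
count = {
    frozenset({"beach"}): 4,
    frozenset({"holiday"}): 2,
    frozenset({"ocean"}): 2,
    frozenset({"sunshine"}): 4,
    frozenset({"beach", "ocean"}): 2,
    frozenset({"beach", "sunshine"}): 3,
    frozenset({"holiday", "sunshine"}): 2,
    frozenset({"ocean", "sunshine"}): 2,
    frozenset({"beach", "ocean", "sunshine"}): 2,
}


def measures(x, y):
    """Return (confidence, lift, leverage) of the rule x -> y as exact fractions."""
    x, y = frozenset(x), frozenset(y)
    sup_x = Fraction(count[x], N)
    sup_y = Fraction(count[y], N)
    sup_xy = Fraction(count[x | y], N)
    return sup_xy / sup_x, sup_xy / (sup_x * sup_y), sup_xy - sup_x * sup_y


strong_rules = [
    ({"ocean"}, {"beach"}),
    ({"beach"}, {"sunshine"}),
    ({"sunshine"}, {"beach"}),
    ({"holiday"}, {"sunshine"}),
    ({"ocean"}, {"sunshine"}),
    ({"ocean", "sunshine"}, {"beach"}),
    ({"ocean"}, {"beach", "sunshine"}),
    ({"beach", "ocean"}, {"sunshine"}),
]

for x, y in strong_rules:
    conf, lift, lev = measures(x, y)
    print(sorted(x), "->", sorted(y), conf, lift, lev)
```

In exact terms, {beach} → {sunshine} has lift 15/16 and leverage -1/25, which is why it is a strong rule by confidence but still shows a slightly negative association.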

2. An online retailer has a database that stores 20,000 transactions from the last month. After
analyzing the data, a data science team has identified the following statistics:

{Milk} appears in 18,000 transactions.

{Cheese} appears in 16,000 transactions.
{Rice} appears in 15,000 transactions.
{Yogurt} appears in 14,000 transactions.
{Pasta} appears in 13,500 transactions.
{Oil} appears in 12,000 transactions.
{Cereal} appears in 10,000 transactions.
{Pasta, Oil} appears in 11,500 transactions.
{Rice, Oil} appears in 9,500 transactions.
{Milk, Cheese} appears in 13,000 transactions.
{Milk, Cereal} appears in 10,000 transactions.
{Milk, Yogurt} appears in 8,000 transactions.
{Milk, Cereal, Cheese} appears in 8,500 transactions.
{Milk, Cereal, Yogurt} appears in 7,500 transactions.
{Milk, Cereal, Cheese, Yogurt} appears in 7,350 transactions.
Applying Apriori algorithm, answer the following questions:

a) What are the support values of the preceding itemsets?

b) Assuming the minimum support is 0.4, which itemsets are considered frequent?

c) What are the confidence values of the following rules:

1. {Milk} → {Cereal}
2. {Milk, Cereal} → {Cheese}
3. {Milk, Cereal, Cheese} → {Yogurt}
Which of the three rules is more interesting? Why?

d) List all the candidate rules that can be formed from the statistics. Which rules are
considered interesting at a minimum confidence of 0.2? Out of these interesting rules, which
rule is considered the most useful (least coincidental)?
Solution
a) What are the support values of the preceding itemsets?

$$\text{Support} = \frac{\text{count}}{\text{total}}$$

| Itemset | Count / Total | Support |
|---|---|---|
| Milk | 18,000 / 20,000 | 0.9 |
| Cheese | 16,000 / 20,000 | 0.8 |
| Rice | 15,000 / 20,000 | 0.75 |
| Yogurt | 14,000 / 20,000 | 0.7 |
| Pasta | 13,500 / 20,000 | 0.675 |
| Oil | 12,000 / 20,000 | 0.6 |
| Cereal | 10,000 / 20,000 | 0.5 |
| Pasta, Oil | 11,500 / 20,000 | 0.575 |
| Rice, Oil | 9,500 / 20,000 | 0.475 |
| Milk, Cheese | 13,000 / 20,000 | 0.65 |
| Milk, Cereal | 10,000 / 20,000 | 0.5 |
| Milk, Yogurt | 8,000 / 20,000 | 0.4 |
| Milk, Cereal, Cheese | 8,500 / 20,000 | 0.425 |
| Milk, Cereal, Yogurt | 7,500 / 20,000 | 0.375 |
| Milk, Cereal, Cheese, Yogurt | 7,350 / 20,000 | 0.3675 ≈ 0.368 |
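Solution
b) Assuming the minimum support is 0.4, which itemsets are considered frequent?

The frequent itemsets are those with support ≥ 0.4:

| Frequent itemset | Count / Total | Support |
|---|---|---|
| Milk | 18,000 / 20,000 | 0.9 |
| Cheese | 16,000 / 20,000 | 0.8 |
| Rice | 15,000 / 20,000 | 0.75 |
| Yogurt | 14,000 / 20,000 | 0.7 |
| Pasta | 13,500 / 20,000 | 0.675 |
| Oil | 12,000 / 20,000 | 0.6 |
| Cereal | 10,000 / 20,000 | 0.5 |
| Pasta, Oil | 11,500 / 20,000 | 0.575 |
| Rice, Oil | 9,500 / 20,000 | 0.475 |
| Milk, Cheese | 13,000 / 20,000 | 0.65 |
| Milk, Cereal | 10,000 / 20,000 | 0.5 |
| Milk, Yogurt | 8,000 / 20,000 | 0.4 |
| Milk, Cereal, Cheese | 8,500 / 20,000 | 0.425 |

{Milk, Cereal, Yogurt} (0.375) and {Milk, Cereal, Cheese, Yogurt} (0.368) fall below the minimum support and are not frequent.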
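Parts (a) and (b) can be reproduced with a small script. {Milk, Yogurt} sits exactly on the 0.4 threshold, so this sketch uses `fractions.Fraction` to keep the comparison exact; that choice, and the variable names, are assumptions of the example rather than something the exercise prescribes.

```python
from fractions import Fraction

TOTAL = 20_000
MIN_SUPPORT = Fraction(4, 10)

# Transaction counts from the problem statement.
counts = {
    ("Milk",): 18_000,
    ("Cheese",): 16_000,
    ("Rice",): 15_000,
    ("Yogurt",): 14_000,
    ("Pasta",): 13_500,
    ("Oil",): 12_000,
    ("Cereal",): 10_000,
    ("Pasta", "Oil"): 11_500,
    ("Rice", "Oil"): 9_500,
    ("Milk", "Cheese"): 13_000,
    ("Milk", "Cereal"): 10_000,
    ("Milk", "Yogurt"): 8_000,
    ("Milk", "Cereal", "Cheese"): 8_500,
    ("Milk", "Cereal", "Yogurt"): 7_500,
    ("Milk", "Cereal", "Cheese", "Yogurt"): 7_350,
}

# Part (a): support of every itemset as an exact fraction.
support = {items: Fraction(n, TOTAL) for items, n in counts.items()}

# Part (b): itemsets whose support reaches the minimum support.
frequent = {items: s for items, s in support.items() if s >= MIN_SUPPORT}

for items, s in support.items():
    label = "frequent" if items in frequent else "not frequent"
    print(", ".join(items), float(s), label)
```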
Solution
c) What are the confidence values of the following rules:
1. {Milk} → {Cereal}
2. {Milk, Cereal} → {Cheese}
3. {Milk, Cereal, Cheese} → {Yogurt}
Which of the three rules is more interesting? Why?

{Milk} → {Cereal}:
$\text{Confidence} = \dfrac{0.5}{0.9} = 0.556$
$\text{Lift} = \dfrac{0.5}{0.9 \cdot 0.5} = 1.11$
$\text{Leverage} = 0.5 - (0.9 \cdot 0.5) = 0.05$

{Milk, Cereal} → {Cheese}:
$\text{Confidence} = \dfrac{0.425}{0.5} = 0.85$
$\text{Lift} = \dfrac{0.425}{0.5 \cdot 0.8} = 1.0625$
$\text{Leverage} = 0.425 - (0.5 \cdot 0.8) = 0.025$

{Milk, Cereal, Cheese} → {Yogurt}:
$\text{Confidence} = \dfrac{0.3675}{0.425} \approx 0.865$
$\text{Lift} = \dfrac{0.3675}{0.425 \cdot 0.7} \approx 1.235$
$\text{Leverage} = 0.3675 - (0.425 \cdot 0.7) = 0.07$

Rule 3 has the largest confidence, lift, and leverage, thus it is the most interesting of the three.
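The comparison of the three rules can be scripted as a quick check. This is a sketch that ranks the rules by lift, which is one reasonable reading of "more interesting"; the data layout and names are assumptions for the example.

```python
TOTAL = 20_000

# Transaction counts needed for the three rules (from the problem statement).
count = {
    ("Milk",): 18_000,
    ("Cereal",): 10_000,
    ("Cheese",): 16_000,
    ("Yogurt",): 14_000,
    ("Milk", "Cereal"): 10_000,
    ("Milk", "Cereal", "Cheese"): 8_500,
    ("Milk", "Cereal", "Cheese", "Yogurt"): 7_350,
}

# Each rule: label, antecedent X, consequent Y, and the combined itemset.
rules = [
    ("{Milk} -> {Cereal}",
     ("Milk",), ("Cereal",), ("Milk", "Cereal")),
    ("{Milk, Cereal} -> {Cheese}",
     ("Milk", "Cereal"), ("Cheese",), ("Milk", "Cereal", "Cheese")),
    ("{Milk, Cereal, Cheese} -> {Yogurt}",
     ("Milk", "Cereal", "Cheese"), ("Yogurt",),
     ("Milk", "Cereal", "Cheese", "Yogurt")),
]

results = {}
for label, x, y, xy in rules:
    sup_x, sup_y, sup_xy = (count[k] / TOTAL for k in (x, y, xy))
    results[label] = {
        "confidence": sup_xy / sup_x,
        "lift": sup_xy / (sup_x * sup_y),
        "leverage": sup_xy - sup_x * sup_y,
    }

most_interesting = max(results, key=lambda label: results[label]["lift"])

for label, m in results.items():
    print(label, {name: round(value, 4) for name, value in m.items()})
print("highest lift:", most_interesting)
```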
Solution
d) List all the candidate rules that can be formed from the statistics. Which rules are
considered interesting at a minimum confidence of 0.2? Out of these interesting rules,
which rule is considered the most useful (least coincidental)?

Candidate rules are formed from the frequent itemsets that contain at least two items:

| 2-Frequent itemset | Support |
|---|---|
| Pasta, Oil | 0.575 |
| Rice, Oil | 0.475 |
| Milk, Cheese | 0.65 |
| Milk, Cereal | 0.5 |
| Milk, Yogurt | 0.4 |

| 3-Frequent itemset | Support |
|---|---|
| Milk, Cereal, Cheese | 0.425 |
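Solution
d) Candidate rules and their measures, using Support({Milk, Cereal, Cheese}) = 0.425:

| 2-itemset rule | Confidence | Lift | Leverage |
|---|---|---|---|
| {Pasta} → {Oil} | 0.852 | 1.420 | 0.17 |
| {Oil} → {Pasta} | 0.958 | 1.420 | 0.17 |
| {Rice} → {Oil} | 0.633 | 1.056 | 0.025 |
| {Oil} → {Rice} | 0.792 | 1.056 | 0.025 |
| {Milk} → {Cheese} | 0.722 | 0.903 | -0.07 |
| {Cheese} → {Milk} | 0.813 | 0.903 | -0.07 |
| {Milk} → {Cereal} | 0.556 | 1.111 | 0.05 |
| {Cereal} → {Milk} | 1 | 1.111 | 0.05 |
| {Milk} → {Yogurt} | 0.444 | 0.635 | -0.23 |
| {Yogurt} → {Milk} | 0.571 | 0.635 | -0.23 |

| 3-itemset rule | Confidence | Lift | Leverage |
|---|---|---|---|
| {Milk} → {Cereal, Cheese} | 0.472 | - | - |
| {Cereal, Cheese} → {Milk} | - | - | - |
| {Cheese} → {Milk, Cereal} | 0.531 | 1.063 | 0.025 |
| {Milk, Cereal} → {Cheese} | 0.85 | 1.063 | 0.025 |
| {Cereal} → {Milk, Cheese} | 0.85 | 1.308 | 0.1 |
| {Milk, Cheese} → {Cereal} | 0.654 | 1.308 | 0.1 |

Entries marked "-" need Support({Cereal, Cheese}), which is not among the given statistics.

Every rule with a computable confidence exceeds the minimum confidence of 0.2, so all of them are considered interesting.

The most useful (least coincidental) rule is {Oil} → {Pasta}: the Pasta and Oil pair has the highest lift (1.420) and leverage (0.17), and {Oil} → {Pasta} has the higher confidence of its two directions (0.958).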
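The rule table for part (d) can be generated from the raw counts. The sketch below skips any rule whose antecedent support is not given and leaves lift and leverage empty when the consequent support is not given. Ranking by lift with confidence as a tie-breaker is an assumption about what "most useful" means, and all names are illustrative.

```python
from itertools import combinations

TOTAL = 20_000
MIN_COUNT = 8_000        # minimum support 0.4 of 20,000 transactions
MIN_CONFIDENCE = 0.2

raw = {
    ("Milk",): 18_000, ("Cheese",): 16_000, ("Rice",): 15_000,
    ("Yogurt",): 14_000, ("Pasta",): 13_500, ("Oil",): 12_000,
    ("Cereal",): 10_000,
    ("Pasta", "Oil"): 11_500, ("Rice", "Oil"): 9_500,
    ("Milk", "Cheese"): 13_000, ("Milk", "Cereal"): 10_000,
    ("Milk", "Yogurt"): 8_000,
    ("Milk", "Cereal", "Cheese"): 8_500,
    ("Milk", "Cereal", "Yogurt"): 7_500,
    ("Milk", "Cereal", "Cheese", "Yogurt"): 7_350,
}
count = {frozenset(items): n for items, n in raw.items()}

rules = []
for itemset, n_xy in count.items():
    if len(itemset) < 2 or n_xy < MIN_COUNT:
        continue  # rules come only from frequent itemsets with 2+ items
    for r in range(1, len(itemset)):
        for antecedent in combinations(sorted(itemset), r):
            x = frozenset(antecedent)
            y = itemset - x
            if x not in count:
                continue  # Support(X) is not among the given statistics
            conf = n_xy / count[x]
            if conf < MIN_CONFIDENCE:
                continue
            if y in count:
                lift = n_xy * TOTAL / (count[x] * count[y])
                lev = n_xy / TOTAL - (count[x] / TOTAL) * (count[y] / TOTAL)
            else:
                lift = lev = None  # Support(Y) is not given
            rules.append((x, y, conf, lift, lev))

# Rank the rules that have a lift; break ties by confidence.
scored = [rule for rule in rules if rule[3] is not None]
best = max(scored, key=lambda rule: (rule[3], rule[2]))

for x, y, conf, lift, lev in rules:
    print(sorted(x), "->", sorted(y), round(conf, 3),
          None if lift is None else round(lift, 3),
          None if lev is None else round(lev, 3))
print("most useful:", sorted(best[0]), "->", sorted(best[1]))
```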

Thank you!☺
