Data Mining

Uploaded by

Ram Kumar

Mining Frequent Patterns, Associations and Correlations

Frequent patterns in data mining are patterns that occur frequently in a dataset. Identifying
these patterns is crucial in tasks like association rule mining, sequential pattern mining, and
structural pattern mining. Frequent patterns can take the form of itemsets, subsequences, or
substructures.

• Itemsets: An itemset is a collection of items that occur together in a transactional
dataset. For example, in a supermarket dataset, an itemset might be {milk, bread,
butter}. Finding frequent itemsets helps in discovering products that are frequently
purchased together.
• Subsequences: A subsequence is a sequence of events or items that occur in a specific
order. For instance, in a dataset of customer transactions over time, a frequent
subsequence might be that customers buy {shampoo} followed by {conditioner}. This
is the basis for sequential pattern mining.
• Substructures: These are more complex and may represent graphs, trees, or networks
that frequently occur in the dataset. For example, in social networks, frequent
substructures could represent common patterns of relationships between users.

The goal is to find patterns that appear frequently enough to be of interest based on a user-
specified threshold, known as minimum support.
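
As a rough illustration of checking a pattern against minimum support, the sketch below counts how many customer histories contain the subsequence shampoo followed by conditioner. The data, the function names, and the 50% threshold are all assumptions made for this example, and each event is simplified to a single item rather than an itemset.

```python
from typing import Sequence


def contains_subsequence(sequence: Sequence[str], pattern: Sequence[str]) -> bool:
    """Return True if the items of `pattern` occur in `sequence` in order (gaps allowed)."""
    remaining = iter(sequence)
    # `item in remaining` advances the iterator, so order is respected.
    return all(item in remaining for item in pattern)


def sequence_support(sequences, pattern) -> float:
    """Fraction of sequences that contain `pattern` as a subsequence."""
    hits = sum(contains_subsequence(s, pattern) for s in sequences)
    return hits / len(sequences)


# Hypothetical purchase histories, one list per customer, in time order.
customer_histories = [
    ["shampoo", "conditioner", "soap"],
    ["shampoo", "soap", "conditioner"],
    ["conditioner", "shampoo"],
    ["soap", "toothpaste"],
]

min_support = 0.5  # assumed user-specified threshold
support = sequence_support(customer_histories, ["shampoo", "conditioner"])
is_frequent = support >= min_support
print(support, is_frequent)
```

The third history contains both items but in the wrong order, so it does not count toward the pattern's support.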

Association Rules

Association rules identify relationships between items in large datasets: they describe how
the occurrence of one item is associated with the occurrence of other items. These rules are
expressed in the form of "if-then" statements that describe the likelihood of items being
purchased or occurring together.

Example: in a supermarket scenario, an association rule might be:

{Bread} → {Butter}, which means that if a customer buys bread, there is a significant
probability they will also buy butter.

An association rule is an implication of the form: X→Y

Where:
• X is called the antecedent (or left-hand side, LHS): the itemset that forms the "if" part
of the rule.
• Y is called the consequent (or right-hand side, RHS): the itemset that forms the "then"
part of the rule.

The rule X → Y states that transactions containing X tend to also contain Y. How strong
that tendency is must be measured, for example by the rule's confidence.

The main task in association rule mining is to identify the strong rules discovered in databases
using measures of support, confidence, and lift.

Example:
Computer → antivirus software [support = 2%, confidence = 60%]

A support of 2% means that 2% of all the transactions under analysis show that computer
and antivirus software are purchased together.
A confidence of 60% means that 60% of the customers who purchased a computer also bought
the software.
Typically, association rules are considered interesting if they satisfy both a minimum support
threshold and a minimum confidence threshold. These thresholds can be set by users or
domain experts.
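
The two-threshold test can be sketched in a few lines. The `Rule` class, the `is_interesting` helper, and the threshold values below are assumptions chosen for illustration. Only the 2% support and 60% confidence figures come from the example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    antecedent: frozenset
    consequent: frozenset
    support: float      # fraction of all transactions containing both sides
    confidence: float   # fraction of antecedent transactions that contain the consequent


def is_interesting(rule: Rule, min_support: float, min_confidence: float) -> bool:
    """A rule passes only if it meets BOTH thresholds."""
    return rule.support >= min_support and rule.confidence >= min_confidence


rule = Rule(frozenset({"computer"}), frozenset({"antivirus software"}), 0.02, 0.60)

# Assumed thresholds: 1% support, 50% confidence.
passes_loose = is_interesting(rule, min_support=0.01, min_confidence=0.50)
# Raising the support threshold to 5% rejects the same rule.
passes_strict = is_interesting(rule, min_support=0.05, min_confidence=0.50)
print(passes_loose, passes_strict)
```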

Metrics for Association Rules

To evaluate the usefulness and strength of association rules, several key metrics are used:
Support, Confidence, Lift, and Conviction. These metrics determine how frequently the rule
occurs, how strong the rule is, and how useful it might be.

a) Support
Support refers to the frequency with which an itemset appears in the dataset.
For a rule X → Y, support is the fraction of transactions in the dataset that contain both X and
Y. It reflects the prevalence of the rule in the dataset.
Mathematically, support for an association rule X → Y is defined as:

\[
\mathrm{Support}(X \rightarrow Y) = \frac{\text{Number of transactions containing both } X \text{ and } Y}{\text{Total number of transactions}}
\]

Example: If 100 out of 1,000 transactions contain both bread and butter, then the support for
the rule {Bread} → {Butter} is 10%.
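
A minimal sketch of this calculation, assuming each transaction is stored as a set of item names. The toy dataset is constructed to match the numbers in the example (1,000 baskets, 100 of which contain both bread and butter). The other basket contents are made up.

```python
def support(transactions, itemset) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    matching = sum(1 for t in transactions if itemset <= t)  # subset test
    return matching / len(transactions)


# Hypothetical data: 100 baskets with bread and butter, 900 without both.
transactions = (
    [frozenset({"bread", "butter"})] * 100
    + [frozenset({"bread"})] * 150
    + [frozenset({"milk"})] * 750
)

rule_support = support(transactions, {"bread", "butter"})
print(f"{rule_support:.0%}")  # 10%
```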

b) Confidence
Confidence measures the strength of the association rule. It indicates the likelihood of Y
being purchased when X has already been purchased.
Confidence is defined as the ratio of the number of transactions that contain both X and Y to
the number of transactions that contain X.
Mathematically, confidence is calculated as:

\[
\mathrm{Confidence}(X \rightarrow Y) = \frac{\mathrm{Support}(X \cup Y)}{\mathrm{Support}(X)}
\]

Example: If 80 out of 100 transactions that include bread also include butter, the confidence
of the rule {Bread} → {Butter} is 80%.
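
The sketch below computes confidence from raw counts, which gives the same value as Support(X∪Y) / Support(X) because the total number of transactions cancels. Here Support(X∪Y) means the support of the combined itemset, that is, of transactions containing every item of X and of Y. The dataset and function name are assumptions built to mirror the example: 100 baskets with bread, 80 of which also contain butter.

```python
def confidence(transactions, antecedent, consequent) -> float:
    """Share of transactions containing `antecedent` that also contain `consequent`."""
    x = frozenset(antecedent)
    xy = x | frozenset(consequent)  # combined itemset X ∪ Y
    count_x = sum(1 for t in transactions if x <= t)
    count_xy = sum(1 for t in transactions if xy <= t)
    # Assumes the antecedent occurs at least once; otherwise this divides by zero.
    return count_xy / count_x


# Hypothetical data: 100 baskets with bread (80 also have butter), 100 without bread.
transactions = (
    [frozenset({"bread", "butter"})] * 80
    + [frozenset({"bread"})] * 20
    + [frozenset({"milk"})] * 100
)

rule_confidence = confidence(transactions, {"bread"}, {"butter"})
print(f"{rule_confidence:.0%}")  # 80%
```

Note that confidence is not symmetric: the rule {Butter} → {Bread} has a different denominator and so, in general, a different confidence.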

c) Lift
Lift measures how much more likely the consequent Y is to occur when the antecedent X has
occurred, compared to the likelihood of Y occurring independently.
Mathematically, lift is calculated as:

\[
\mathrm{Lift}(X \rightarrow Y) = \frac{\mathrm{Confidence}(X \rightarrow Y)}{\mathrm{Support}(Y)}
\]

A lift value:
o Greater than 1: Indicates a positive association, meaning that the occurrence of X
increases the likelihood of Y occurring.
o Equal to 1: Indicates independence between X and Y (no association).
o Less than 1: Indicates a negative association, meaning that the occurrence of X
decreases the likelihood of Y occurring.

Example: If the lift of {Bread} → {Butter} is 1.5, customers who buy bread are 1.5 times as
likely to buy butter as a customer chosen at random.
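
To make the three cases concrete, here is a small sketch that computes lift and labels the result. The dataset is hypothetical and was chosen so that the lift comes out to 1.5: of 100 baskets, 40 contain bread, 50 contain butter, and 30 contain both. The function names and the tolerance are assumptions.

```python
def lift(transactions, antecedent, consequent) -> float:
    """Confidence(X -> Y) divided by Support(Y)."""
    x, y = frozenset(antecedent), frozenset(consequent)
    n = len(transactions)
    count_x = sum(1 for t in transactions if x <= t)
    count_y = sum(1 for t in transactions if y <= t)
    count_xy = sum(1 for t in transactions if (x | y) <= t)
    # Assumes both X and Y occur at least once in the data.
    confidence = count_xy / count_x
    support_y = count_y / n
    return confidence / support_y


def describe(value: float, tol: float = 1e-9) -> str:
    """Map a lift value to the interpretation listed above."""
    if value > 1 + tol:
        return "positive association"
    if value < 1 - tol:
        return "negative association"
    return "independent"


transactions = (
    [frozenset({"bread", "butter"})] * 30
    + [frozenset({"bread"})] * 10
    + [frozenset({"butter"})] * 20
    + [frozenset({"milk"})] * 40
)

value = lift(transactions, {"bread"}, {"butter"})
print(value, describe(value))  # 1.5 positive association
```

Here confidence is 30/40 = 0.75 and Support(Y) is 50/100 = 0.5, so the lift is 0.75 / 0.5 = 1.5.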

Steps in Association Rule Mining

a) Step 1: Find Frequent Itemsets
The first step in association rule mining is to identify frequent itemsets. A frequent
itemset is a set of items that appears together in a significant number of transactions, as
determined by a user-defined minimum support threshold.

The most commonly used algorithms for frequent itemset mining include:
o Apriori Algorithm: Uses a level-wise search and employs the downward closure
property (if an itemset is frequent, all its subsets must also be frequent).
o FP-Growth Algorithm: Builds an FP-tree to find frequent itemsets without
candidate generation.
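
The level-wise idea behind Apriori can be sketched as follows. This is a simplified teaching version, not an optimized implementation: it rescans all transactions at every level, and the function name, data, and threshold are assumptions for illustration.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset whose support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def frequent_among(candidates):
        # Count each candidate by scanning the transactions, then filter.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        return {c: k / n for c, k in counts.items() if k / n >= min_support}

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = frequent_among({frozenset([i]) for i in items})
    frequent = dict(current)

    k = 2
    while current:
        # Join step: combine frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step (downward closure): every (k-1)-subset must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        current = frequent_among(candidates)
        frequent.update(current)
        k += 1
    return frequent


baskets = [
    {"milk", "bread", "butter"},
    {"milk", "bread"},
    {"bread", "butter"},
    {"milk", "butter"},
    {"milk", "bread", "butter"},
]
result = apriori(baskets, min_support=0.6)
for itemset, sup in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), sup)
```

With a 60% threshold, all three single items and all three pairs are frequent, but {milk, bread, butter} appears in only 2 of 5 baskets and is dropped.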

b) Step 2: Generate Strong Association Rules
After identifying frequent itemsets, the next step is to generate association rules from
these itemsets. Rules are generated by dividing the frequent itemsets into antecedent
and consequent pairs.
The rules are evaluated based on confidence. Only the rules with confidence higher
than a user-specified minimum confidence threshold are considered strong association
rules.
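
One way to sketch this step: for each frequent itemset with at least two items, try every split into a non-empty antecedent and consequent and keep the splits whose confidence meets the threshold. The support counts below are assumed values for illustration (as if taken from 5 transactions), and `generate_rules` is a hypothetical helper name.

```python
from itertools import combinations


def generate_rules(support_counts, min_confidence):
    """Return (antecedent, consequent, confidence) for every strong rule.

    `support_counts` maps each frequent itemset to the number of transactions
    containing it. By downward closure, every subset of a frequent itemset is
    also frequent, so its count is assumed to be present in the mapping.
    """
    rules = []
    for itemset, count in support_counts.items():
        if len(itemset) < 2:
            continue  # a rule needs a non-empty antecedent and consequent
        for size in range(1, len(itemset)):
            for subset in combinations(sorted(itemset), size):
                antecedent = frozenset(subset)
                consequent = itemset - antecedent
                confidence = count / support_counts[antecedent]
                if confidence >= min_confidence:
                    rules.append((antecedent, consequent, confidence))
    return rules


# Assumed counts: bread in 4 transactions, butter in 3, both in 3.
support_counts = {
    frozenset({"bread"}): 4,
    frozenset({"butter"}): 3,
    frozenset({"bread", "butter"}): 3,
}

rules = generate_rules(support_counts, min_confidence=0.8)
for antecedent, consequent, conf in rules:
    print(sorted(antecedent), "->", sorted(consequent), conf)
```

In this toy data {Butter} → {Bread} has confidence 3/3 = 1.0 and is kept, while {Bread} → {Butter} has confidence 3/4 = 0.75 and falls below the assumed 0.8 threshold.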

Market Basket Analysis

Frequent itemset mining leads to the discovery of associations and correlations among items in
large transactional or relational data sets.
A typical example of frequent itemset mining is market basket analysis. This process analyzes
customer buying habits by finding associations between the different items that customers
place in their “shopping baskets” (see Figure below). The discovery of these associations can
help retailers develop marketing strategies by gaining insight into which items are frequently
purchased together by customers. For instance, if customers are buying milk, how likely are
they to also buy bread (and what kind of bread) on the same trip to the supermarket? This
information can lead to increased sales by helping retailers do selective marketing and plan
their shelf space.

A retailer might ask: “Which groups or sets of items are customers likely to purchase on a
given trip to the store?”

To answer this question, market basket analysis can be performed on the retail data of
customer transactions at the store. The results can then be used to plan marketing or
advertising strategies, or to design a new catalog.
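
As a first, very rough pass at that question, one can simply count which pairs of items share a basket most often. The sketch below does this with the standard library. The baskets and the `pair_counts` name are made up for illustration, and a real analysis would go on to apply support and confidence thresholds.

```python
from collections import Counter
from itertools import combinations


def pair_counts(baskets) -> Counter:
    """Count how many baskets contain each unordered pair of items."""
    counts = Counter()
    for basket in baskets:
        # Sorting gives each pair one canonical order; set() drops duplicate items.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return counts


# Hypothetical shopping baskets.
baskets = [
    ["milk", "bread", "cereal"],
    ["milk", "bread", "sugar", "eggs"],
    ["milk", "bread", "butter"],
    ["sugar", "eggs"],
]

counts = pair_counts(baskets)
top_pair, top_count = counts.most_common(1)[0]
print(top_pair, top_count)  # ('bread', 'milk') 3
```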
