
Unit 3

This document provides an overview of association analysis and frequent pattern mining. It discusses how retailers can analyze customer transaction data to discover purchasing patterns and relationships between items. The key goals are to find frequent itemsets that occur together above a minimum support threshold and generate high-confidence association rules. It describes the basic terminology, formulations, and challenges of the problem. It also gives an overview of the Apriori algorithm for mining frequent itemsets and generating association rules from transactional data.


Mining Frequent Patterns,

Association and Correlations:


Basic Concepts and Methods

BY
Bheema. Shirisha
Association Analysis
• Many business enterprises generate large quantities of data
from their daily operations.
– Example: Customer purchase data are collected daily at the
checkout counters of grocery stores.

• Each row corresponds to a transaction, which contains a unique identifier labeled TID and a set of items.
• Retailers are interested in analyzing the data to learn about
the purchasing behavior of their customers.
• Such valuable information can be used to support a variety of
business-related applications such as marketing promotions,
inventory management, catalog design, store layout and
customer relationship management.
Association Analysis
• Useful for discovering interesting relationships
hidden in large data sets.
• The uncovered relationships can be represented in
the form of association rules or sets of frequent
items
{Diapers}  {Beer}
• The rule suggests, customers who buy diapers also
buy beer.
• Retailers can use rules of this type to help them identify new opportunities for cross-selling products to their customers.
Key Issues
• Two key issues when applying association
analysis to market basket data.
– Discovering patterns from a large transaction data
set can be computationally expensive.
– Some of the discovered patterns are potentially
spurious because they may happen simply by
chance.
Basic
Terminology

• Each row corresponds to a transaction and each column corresponds to an item.
• An item can be treated as a binary variable whose value
is one if the item is present in a transaction and zero
otherwise.
• Because the presence of an item in a transaction is often
considered more important than its absence, an item is
an asymmetric binary variable.
• It ignores certain important aspects of the data such as
the quantity of items sold or the price paid to purchase
them.
Itemset and Support Count
• Let I ={i1,i2,... ,id} be the set of all items in a market basket data and T
={t1,t2,...,tN} be the set of all transactions.
• Each transaction ti contains a subset of items chosen from I.

• In association analysis, a collection of zero or more items is termed an itemset.
• If an itemset contains k items, it is called a k-itemset.
• {Beer, Diapers, Milk} is an example of a 3-itemset.
• The null (or empty) set is an itemset that does not contain any items.

• Support count refers to the number of transactions that contain a particular itemset.
• The support count for {Beer, Diapers, Milk} is equal to two because only two transactions contain all three items.

• The transaction width is defined as the number of items present in a transaction.
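The itemset and support-count definitions above can be sketched in a few lines of Python. The transactions below are a hypothetical toy market-basket table chosen to match the {Beer, Diapers, Milk} example in the text; the function name `support_count` is mine, not from any library.

```python
# Hypothetical toy market-basket data: each transaction is a set of items.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diapers", "Beer", "Eggs"},
    {"Milk", "Diapers", "Beer", "Cola"},
    {"Bread", "Milk", "Diapers", "Beer"},
    {"Bread", "Milk", "Diapers", "Cola"},
]

def support_count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

print(support_count({"Beer", "Diapers", "Milk"}, transactions))  # -> 2
```

Only two of the five transactions contain all three items, matching the support count of two quoted in the text.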
Association Rule
• An association rule is an implication expression of
the form X → Y, where X and Y are disjoint
itemsets, i.e., X ∩Y = ∅ .
• The strength of an association rule can be
measured in terms of its support and confidence.
• Support determines how often a rule is applicable
to a given data set
• Confidence determines how frequently items in Y
appear in transactions that contain X.
{Milk, Diapers} → {Beer}
Why Use Support and Confidence?
• A low support rule is uninteresting from a business
perspective because it may not be profitable to promote.
• Support is often used to eliminate uninteresting rules

• Confidence measures the reliability of the inference made by a rule.
• For a given rule X → Y, the higher the confidence, the more
likely it is for Y to be present in transactions that contain X.
• Confidence also provides an estimate of the conditional
probability of Y given X.
• It suggests a strong co-occurrence relationship between items
in the antecedent and consequent of the rule.
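Both measures can be computed directly from support counts: support(X → Y) = σ(X ∪ Y)/N and confidence(X → Y) = σ(X ∪ Y)/σ(X). A minimal sketch, using the same hypothetical toy transactions as before (`rule_metrics` is an illustrative name, not a standard API):

```python
# Hypothetical toy market-basket data (same five transactions as earlier).
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diapers", "Beer", "Eggs"},
    {"Milk", "Diapers", "Beer", "Cola"},
    {"Bread", "Milk", "Diapers", "Beer"},
    {"Bread", "Milk", "Diapers", "Cola"},
]

def support_count(itemset, transactions):
    return sum(1 for t in transactions if itemset <= t)

def rule_metrics(X, Y, transactions):
    """Support and confidence of the rule X -> Y."""
    sup_xy = support_count(X | Y, transactions)
    support = sup_xy / len(transactions)       # fraction of all transactions
    confidence = sup_xy / support_count(X, transactions)  # estimate of P(Y | X)
    return support, confidence

s, c = rule_metrics({"Milk", "Diapers"}, {"Beer"}, transactions)
print(round(s, 2), round(c, 2))  # -> 0.4 0.67
```

Here {Milk, Diapers, Beer} appears in 2 of 5 transactions (support 0.4), and 2 of the 3 transactions containing {Milk, Diapers} also contain Beer (confidence 2/3).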
Formulation of Association Rule
Mining Problem
• Definition 6.1 (Association Rule Discovery):

Given a set of transactions T, find all the rules having support ≥ min_sup and confidence ≥ min_conf, where min_sup and min_conf are the corresponding support and confidence thresholds.
Brute-force approach
• The support of a rule X → Y depends only on the
support of its corresponding itemset, X ∪ Y.
• For example, the following rules have identical support because they involve items from the same itemset, {Beer, Diapers, Milk}:
{Beer, Diapers} → {Milk}, {Beer, Milk} → {Diapers}, {Diapers, Milk} → {Beer},
{Beer} → {Diapers, Milk}, {Diapers} → {Beer, Milk}, {Milk} → {Beer, Diapers}
• If the itemset is infrequent, then all six candidate rules can be pruned immediately without computing their confidence values.
Association rule mining algorithms
• A common strategy adopted by many association
rule mining algorithms is to decompose the
problem into two major subtasks:
• 1. Frequent Itemset Generation whose objective
is to find all the itemsets that satisfy the min_sup
threshold. These itemsets are called frequent
itemsets.
• 2. Rule Generation whose objective is to extract
all the high-confidence rules from the frequent
itemsets found in the previous step. These rules are
called strong rules.
Basic Concepts
• Frequent patterns and Association rules are helpful for
making recommendations in business
• Frequent patterns are itemsets, subsequences or substructures
that appear frequently in a data set
– Frequent itemset – appear frequently in a transaction dataset
(Milk and Bread)
– Subsequence – Buying first PC, then camera, then a memory
card, if it occurs frequently in a shopping history DB is a
sequential pattern
– Substructure – refer to different structural forms like subgraphs,
subtrees or sublattices which may be combined with itemsets or
subsequences.
• Finding frequent patterns plays an essential role in mining
associations and correlation relationships among data stored
in transactional and relational data.
• Helps in data classification, clustering and other DM tasks
Market Basket Analysis (MBA)
• Discovery of interesting
correlation and association
relationships can help in many
business decision making
process
• Helpful for selective
marketing and plan for shelf
space
• Advertisement strategies or
design the new catalogue
• Design of different store
layouts
• To plan which items to put on
sale at reduced prices
Cont.,
• Given the set of items available at the store, each item has a Boolean variable, and each basket can be represented by a Boolean vector of values assigned to those items
• The Boolean vector can be analysed for buying patterns
that are frequently associated or purchased together.

• Customers who purchase a computer also tend to buy antivirus software at the same time
• Support and Confidence are two measures of rule
interestingness
Cont.,

• A support of 2% means that 2% of all the transactions under analysis show that the computer and antivirus software were purchased together
• A confidence of 60% means that 60% of the customers who purchased a computer also bought the antivirus software
• Association rules are considered interesting if they
satisfy both minimum support threshold and
minimum confidence threshold
• Additional analysis can be performed to discover
interesting statistical correlations between associated
items
Frequent Itemsets, Closed Itemsets and
Association Rules
• The set {computer, antivirus} is a 2-itemset
• The occurrence frequency of an itemset is the
number of transactions that contain the itemset.
• This is also known as frequency, support count or
count
• Association rule mining can be viewed as two step
process
• i. Find all frequent itemsets – those that satisfy min_sup
• ii. Generate strong association rules from the frequent itemsets – these rules must satisfy both min_sup and min_conf
Closed frequent itemsets and Maximal
frequent itemsets
• An itemset X is closed in a dataset D if there exists no proper super-itemset Y such that Y has the same support count as X in D
• An itemset X is a closed frequent itemset in D if X is both closed and frequent in D
• An itemset X is a maximal frequent itemset in a dataset D if X is frequent, and there exists no super-itemset Y such that X ⊂ Y and Y is frequent in D

• Let C be the set of closed frequent itemsets for a dataset D satisfying a minimum support threshold, min_sup
• Let M be the set of maximal frequent itemsets for D satisfying min_sup
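The closed/maximal distinction can be checked mechanically: among the frequent itemsets, an itemset is closed if no frequent superset has the same count, and maximal if it has no frequent superset at all. A brute-force sketch on a tiny hypothetical dataset (all names are mine; the exhaustive enumeration is for illustration only, not how real miners work):

```python
from itertools import combinations

def frequent_itemsets(transactions, min_sup):
    """All itemsets with support count >= min_sup, by exhaustive enumeration."""
    items = sorted(set().union(*transactions))
    freq = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            s = frozenset(combo)
            count = sum(1 for t in transactions if s <= t)
            if count >= min_sup:
                freq[s] = count
    return freq

def closed_and_maximal(freq):
    """Split frequent itemsets into closed and maximal ones.

    Checking supersets only inside `freq` is sound: a superset with the
    same support as a frequent itemset is itself frequent.
    """
    closed, maximal = set(), set()
    for x, cx in freq.items():
        supersets = [y for y in freq if x < y]
        if not any(freq[y] == cx for y in supersets):
            closed.add(x)          # no frequent superset with equal support
        if not supersets:
            maximal.add(x)         # no frequent superset at all
    return closed, maximal

db = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}]
freq = frequent_itemsets(db, min_sup=2)
closed, maximal = closed_and_maximal(freq)
print(sorted("".join(sorted(s)) for s in closed))   # -> ['a', 'ab', 'ac']
print(sorted("".join(sorted(s)) for s in maximal))  # -> ['ab', 'ac']
```

Note that {b} is frequent (count 2) but not closed, because its superset {a, b} has the same count; every maximal itemset is closed, but not the other way around.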
Example
Frequent Itemset Mining Methods
• Apriori Algorithm : mining frequent itemsets for
Boolean association rules
• Apriori is a candidate generation and test approach
• Name of the algorithm is based on Prior
Knowledge of frequent itemset properties
• It is an iterative approach known as level-wise
search, where k-itemsets are used to explore (k+1)-
itemsets.
• First, the set of frequent 1-itemsets is found by
scanning the database to accumulate the count for
each item, and collecting those items that satisfy
minimum support. The resulting set is denoted by
L1.
• Next, L1 is used to find L2, the set of frequent 2-
itemsets, which is used to find L3, and so on, until
no more frequent k-itemsets can be found. The
finding of each Lk requires one full scan of the
database.
• Apriori property: All nonempty subsets of a frequent itemset
must also be frequent.

• By definition, if an itemset I does not satisfy the minimum support threshold, min_sup, then I is not frequent, that is, P(I) < min_sup.
• If an item A is added to the itemset I, then the resulting
itemset (i.e., I U A) cannot occur more frequently than I.
• Therefore, I U A is not frequent either, that is, P(I U A)<
min_sup.
• This property belongs to a special category of properties
called antimonotonicity in the sense that if a set cannot pass
a test, all of its supersets will fail the same test as well.
• It is called antimonotonicity because the property is
monotonic in the context of failing a test.
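The Apriori property is easy to confirm numerically: a transaction containing I ∪ A necessarily contains I, so adding an item can never raise support. A tiny check on the hypothetical toy transactions used earlier:

```python
# Hypothetical toy market-basket data (same five transactions as earlier).
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diapers", "Beer", "Eggs"},
    {"Milk", "Diapers", "Beer", "Cola"},
    {"Bread", "Milk", "Diapers", "Beer"},
    {"Bread", "Milk", "Diapers", "Cola"},
]

def support_count(itemset, transactions):
    return sum(1 for t in transactions if itemset <= t)

I = {"Diapers"}
IA = {"Diapers", "Beer"}  # I with one item A added, i.e. I U {A}
print(support_count(I, transactions), support_count(IA, transactions))  # -> 4 3

# Antimonotonicity: the superset can never be more frequent than the subset,
# so if I fails the min_sup test, I U {A} must fail it too.
assert support_count(IA, transactions) <= support_count(I, transactions)
```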
Apriori Algorithm (Pseudo Code)
Ck: Candidate itemset of size k
Lk : frequent itemset of size k

L1 = {frequent items};
for (k = 1; Lk !=; k++) do begin
Ck+1 = candidates generated from Lk;
for each transaction t in database do
increment the count of all candidates in Ck+1 that are
contained in t
Lk+1 = candidates in Ck+1 with min_support
end
return k Lk;
Apriori Example
Generating Association Rules from
Frequent Itemsets
• Strong association rules satisfy both minimum support and minimum confidence
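The rule-generation step can be sketched as follows: for each frequent itemset of size ≥ 2, try every nonempty proper subset as an antecedent and keep the rules whose confidence meets min_conf. The support counts below are hypothetical toy numbers (for a 2-itemset frequent at count 3), chosen so that both rules come out strong; `generate_rules` is an illustrative name, not a library function.

```python
from itertools import combinations

# Hypothetical support counts, as a frequent-itemset miner would produce them.
freq = {
    frozenset({"computer"}): 3,
    frozenset({"antivirus"}): 4,
    frozenset({"computer", "antivirus"}): 3,
}

def generate_rules(freq, min_conf):
    """Emit each rule X -> Y with X, Y disjoint, X U Y frequent, conf >= min_conf."""
    rules = []
    for itemset, sup in freq.items():
        if len(itemset) < 2:
            continue  # rules need a nonempty antecedent AND consequent
        for r in range(1, len(itemset)):
            for ant in combinations(sorted(itemset), r):
                X = frozenset(ant)
                # Apriori guarantees every subset of a frequent itemset is
                # frequent, so freq[X] is always available here.
                conf = sup / freq[X]
                if conf >= min_conf:
                    rules.append((X, itemset - X, conf))
    return rules

for X, Y, conf in generate_rules(freq, min_conf=0.7):
    print(sorted(X), "->", sorted(Y), round(conf, 2))
```

With these counts, {computer} → {antivirus} has confidence 3/3 = 1.0 and {antivirus} → {computer} has confidence 3/4 = 0.75, so both are strong at min_conf = 0.7.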
Pattern-Growth approach for
Mining Frequent Patterns
• The FP - Growth Approach
– Depth-first search
– Avoid explicit candidate generation
• Adopts a divide-and-conquer strategy
• First, it compresses the database representing frequent items into a frequent
pattern tree, or FP-tree, which retains the itemset association information.
• It then divides the compressed database into a set of conditional databases
(a special kind of projected database), each associated with one frequent
item or “pattern fragment,” and mines each database separately.
• For each “pattern fragment,” only its associated data sets need to be
examined.
• Therefore, this approach may substantially reduce the size of the data sets to
be searched, along with the “growth” of patterns being examined.
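The divide-and-conquer structure described above can be sketched without building an explicit FP-tree: for each frequent item, recurse on the conditional (projected) database of transactions containing it, growing the pattern suffix. This captures the pattern-growth idea, though a real FP-tree additionally compresses shared transaction prefixes; all names and the toy data here are mine.

```python
from collections import Counter

def pattern_growth(db, min_sup, suffix=frozenset()):
    """Mine frequent itemsets by recursive projection (no candidate generation)."""
    results = {}
    counts = Counter(item for t in db for item in t)
    frequent = sorted(i for i, c in counts.items() if c >= min_sup)
    for idx, item in enumerate(frequent):
        pattern = suffix | {item}
        results[pattern] = counts[item]
        # Conditional database for `item`: transactions containing it,
        # restricted to later items in the ordering so each itemset is
        # enumerated exactly once.
        cond_db = [
            {j for j in t if j in frequent[idx + 1:]}
            for t in db if item in t
        ]
        results.update(pattern_growth(cond_db, min_sup, pattern))
    return results

# Hypothetical toy market-basket data (same five transactions as earlier).
db = [
    {"Bread", "Milk"},
    {"Bread", "Diapers", "Beer", "Eggs"},
    {"Milk", "Diapers", "Beer", "Cola"},
    {"Bread", "Milk", "Diapers", "Beer"},
    {"Bread", "Milk", "Diapers", "Cola"},
]
res = pattern_growth(db, min_sup=3)
print(sorted(tuple(sorted(s)) for s in res))  # 8 frequent itemsets at min_sup = 3
```

On this data it returns the same eight frequent itemsets as Apriori, but each recursive call works only on the (shrinking) projected database of one "pattern fragment", which is the source of FP-Growth's efficiency on dense data.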
Example