Data Mining and Data Warehousing
Unit 4: Association Analysis
Introduction
Many business enterprises accumulate large quantities of data from their day-to-day operations.
For example, huge amounts of customer purchase data are collected daily at the checkout counters
of grocery stores. Table 1 illustrates an example of such data, commonly known as market basket
transactions. Each row in this table corresponds to a transaction, which contains a unique identifier
labeled TID and a set of items bought by a given customer. Retailers are interested in analyzing
the data to learn about the purchasing behavior of their customers. Such valuable information can
be used to support a variety of business-related applications such as marketing promotions,
inventory management, and customer relationship management.
Table 1: An example of market basket transactions

TID   Items
1     {Bread, Milk}
2     {Bread, Diapers, Beer, Eggs}
3     {Milk, Diapers, Beer, Cola}
4     {Bread, Milk, Diapers, Beer}
5     {Bread, Milk, Diapers, Cola}
For example, the following rule can be extracted from the data set shown in Table 1:

{Diapers} → {Beer}

The rule suggests that a strong relationship exists between the sale of diapers and beer, because many customers who buy diapers also buy beer. Retailers can use this type of rule to identify new opportunities for cross-selling products to their customers.
Association rules take the form “If antecedent, then consequent,” along with a measure of the
support and confidence associated with the rule. Typically, association rules are considered
interesting if they satisfy both a minimum support threshold and a minimum confidence threshold.
Such thresholds can be set by users or domain experts. For example, a particular supermarket may
find that of the 1000 customers shopping on a Thursday night, 200 bought diapers, and of the 200
who bought diapers, 50 bought beer. Thus, the association rule would be: “If buy diapers, then buy
beer,” with a support of 50/1000 = 5% and a confidence of 50/200 = 25%.
If the items or attributes in an association rule reference only one dimension, then it is a single-dimensional association rule. For example:

buys(X, “diapers”) → buys(X, “beer”)

If a rule references two or more dimensions, such as the dimensions age, income, and buys, then it is a multidimensional association rule. The following rule is an example of a multidimensional rule:
age(X, “30...39”) ∧ income(X, “42K...48K”) → buys(X, “high resolution TV”)
Examples of tasks to which association analysis can be applied include:

1. Investigating the proportion of subscribers to your company’s cell phone plan that respond positively to an offer of a service upgrade
2. Examining the proportion of children whose parents read to them who are themselves good readers
3. Predicting degradation in telecommunications networks
4. Finding out which items in a supermarket are purchased together, and which items are never purchased together
5. Determining the proportion of cases in which a new drug will exhibit dangerous side effects
The basic ideas of association rule mining involve a number of new definitions:
Itemsets
Let I = {I1, I2, ..., Im} be a set of items. An itemset is a set of items contained in I, and a k-itemset is an itemset containing k items. In other words, an itemset is a collection of zero or more items. For example, {Bread, Milk} is a 2-itemset and {Bread, Milk, Cola} is a 3-itemset, each drawn from the items in Table 1.
Support count
The support count σ(l) of an itemset l is the number of transactions (market baskets) that contain all items in l.
Support
The support s of a particular association rule A → B is the proportion of transactions in D that contain both A and B. That is,

s(A → B) = σ(A ∪ B) / N

where N is the total number of transactions in D. The support of a rule determines how often the rule is applicable to a given data set.
Confidence
The confidence c of the association rule A → B is a measure of the accuracy of the rule, as determined by the percentage of transactions in D containing A that also contain B. In other words,

c(A → B) = σ(A ∪ B) / σ(A)

Confidence determines how frequently items in B appear in transactions that contain A.
Consider the rule {Milk, Diapers} → {Beer}. Since the support count for {Milk, Diapers, Beer} is 2 and the total number of transactions is 5, the rule's support is 2/5 = 0.4. The rule's confidence is obtained by dividing the support count for {Milk, Diapers, Beer}, i.e., the items in both the antecedent and the consequent (A ∪ B in the rule A → B), by the support count for {Milk, Diapers}, i.e., the items in the antecedent alone (A in the rule A → B). Since there are 3 transactions that contain both milk and diapers, the confidence for this rule is 2/3 ≈ 0.67.
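These calculations are easy to reproduce in code. The following minimal Python sketch (the function and variable names are illustrative, not from the text) recomputes the support and confidence of {Milk, Diapers} → {Beer} over the Table 1 transactions:

```python
def support_confidence(transactions, antecedent, consequent):
    """Support and confidence for the rule antecedent -> consequent."""
    A, B = set(antecedent), set(consequent)
    n_both = sum(1 for t in transactions if (A | B) <= set(t))  # sigma(A u B)
    n_ante = sum(1 for t in transactions if A <= set(t))        # sigma(A)
    return n_both / len(transactions), n_both / n_ante

# The five market basket transactions of Table 1
db = [{"Bread", "Milk"},
      {"Bread", "Diapers", "Beer", "Eggs"},
      {"Milk", "Diapers", "Beer", "Cola"},
      {"Bread", "Milk", "Diapers", "Beer"},
      {"Bread", "Milk", "Diapers", "Cola"}]

s, c = support_confidence(db, {"Milk", "Diapers"}, {"Beer"})
print(f"support = {s:.2f}, confidence = {c:.2f}")  # support = 0.40, confidence = 0.67
```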
A brute-force approach for mining association rules is to compute the support and confidence for
every possible rule. This approach is prohibitively expensive because there are exponentially many
rules that can be extracted from a data set. More specifically, the total number of possible rules
extracted from a data set that contains d items is
R = 3^d - 2^(d+1) + 1

Even for the small data set shown in Table 1, this approach requires us to compute the support and confidence for 3^6 - 2^7 + 1 = 602 rules. More than 80% of these rules are discarded after applying minsup = 20% and minconf = 50%, so most of the computation is wasted.
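A quick check of this count for the d = 6 items of Table 1, in Python:

```python
# Rule-count formula R = 3^d - 2^(d+1) + 1 for d items
d = 6
print(3**d - 2**(d + 1) + 1)  # 602
```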
A common strategy adopted by many association rule mining algorithms is to decompose the
problem into two major subtasks:
1. Frequent Itemset Generation, whose objective is to find all the itemsets that satisfy the minsup
threshold. These itemsets are called frequent itemsets.
2. Rule Generation, whose objective is to extract all the high-confidence rules from the frequent
itemsets found in the previous step. These rules are called strong rules.
Itemset Lattice
A lattice structure can be used to enumerate the list of all possible itemsets. Figure 1 shows an
itemset lattice for {a, b, c, d, e}. In general, a data set that contains k items can potentially generate
up to 2^k - 1 frequent itemsets, excluding the null set. Because k can be very large in many practical
applications, the search space of itemsets that need to be explored is exponentially large.
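For illustration, the non-empty itemsets of the lattice can be enumerated with a short Python sketch (the names are ours, for illustration only):

```python
from itertools import chain, combinations

def itemset_lattice(items):
    """Enumerate every non-empty itemset over `items` (2^k - 1 in total)."""
    items = sorted(items)
    return chain.from_iterable(
        combinations(items, r) for r in range(1, len(items) + 1))

# For the 5 items {a, b, c, d, e}: 2^5 - 1 = 31 itemsets
print(sum(1 for _ in itemset_lattice("abcde")))  # 31
```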
Apriori Principle
If an itemset is frequent, then all of its subsets must also be frequent.
Suppose {c, d, e} is a frequent itemset. Clearly, any transaction that contains {c, d, e} must also
contain its subsets, {c, d}, {c, e}, {d, e}, {c}, {d}, and {e}. As a result, if {c, d, e} is frequent, then
all subsets of {c, d, e} (i.e., the shaded itemsets in figure 1) must also be frequent.
Figure 1: An illustration of the Apriori principle. If {c, d, e} is frequent, then all subsets of this itemset are frequent.
Conversely, if an itemset such as {a, b} is infrequent, then all of its supersets must be infrequent
too. As illustrated in Figure 2, the entire subgraph containing the supersets of {a, b} can be pruned
immediately once {a, b} is found to be infrequent. This strategy of trimming the exponential search
space based on the support measure is known as support-based pruning. Such a pruning strategy
is made possible by a key property of the support measure, namely, that the support for an itemset
never exceeds the support for its subsets. This property is also known as the anti-monotone
property of the support measure.
Figure 2: An illustration of support-based pruning. If {a, b} is infrequent, then all supersets of {a, b} are also infrequent.
Apriori Algorithm
1. Let k=1
2. Generate frequent itemsets of length 1
3. Repeat until no new frequent itemsets are identified
3.1 Generate length (k+1) candidate itemsets from length k frequent itemsets
3.2 Prune candidate itemsets containing subsets of length k that are infrequent
3.3 Count the support of each candidate by scanning the DB
3.4 Eliminate (prune) candidates that are infrequent, leaving only those that are frequent.
For the transaction database D of the following table, we use the Apriori algorithm to compute the frequent itemsets. We assume a minimum support count (minsup count) of 2.
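The worked table from the original notes is not reproduced here, so as an illustration the following minimal Python sketch (all function and variable names are ours) runs the four steps of the algorithm on the Table 1 transactions with a minimum support count of 2:

```python
from itertools import combinations

def apriori(transactions, minsup_count):
    """Return every frequent itemset (as a frozenset) with its support count."""
    transactions = [frozenset(t) for t in transactions]

    # Step 2: generate frequent 1-itemsets.
    counts = {}
    for t in transactions:
        for item in t:
            counts[frozenset([item])] = counts.get(frozenset([item]), 0) + 1
    frequent = {c: n for c, n in counts.items() if n >= minsup_count}
    result = dict(frequent)

    k = 1
    # Step 3: repeat until no new frequent itemsets are identified.
    while frequent:
        # Step 3.1: join pairs of frequent k-itemsets into (k+1)-candidates.
        candidates = {a | b for a in frequent for b in frequent
                      if len(a | b) == k + 1}
        # Step 3.2: prune candidates that have an infrequent k-subset.
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent
                             for s in combinations(c, k))}
        # Step 3.3: count support with one scan of the database.
        counts = {c: sum(1 for t in transactions if c <= t)
                  for c in candidates}
        # Step 3.4: eliminate infrequent candidates.
        frequent = {c: n for c, n in counts.items() if n >= minsup_count}
        result.update(frequent)
        k += 1
    return result

# Table 1 transactions, minsup count = 2
db = [{"Bread", "Milk"},
      {"Bread", "Diapers", "Beer", "Eggs"},
      {"Milk", "Diapers", "Beer", "Cola"},
      {"Bread", "Milk", "Diapers", "Beer"},
      {"Bread", "Milk", "Diapers", "Cola"}]
for itemset, n in sorted(apriori(db, 2).items(), key=lambda kv: len(kv[0])):
    print(sorted(itemset), n)
```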
Generating Association Rules from Frequent Itemsets

Once the frequent itemsets have been found, strong association rules are generated from them as follows. For every nonempty proper subset s of a frequent itemset l, output the rule “s → (l - s)” if support count(l) / support count(s) ≥ min_conf, where min_conf is the minimum confidence threshold. Because the rules are generated from frequent itemsets, each one automatically satisfies the minimum support threshold.
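As a sketch, this procedure can be written as a small Python function; it reuses the db transactions and apriori() helper from the earlier sketch, and the names are again illustrative:

```python
from itertools import combinations

def generate_rules(l, support_counts, min_conf):
    """Every rule s -> (l - s) from frequent itemset l that meets min_conf.
    `support_counts` maps frozensets to support counts."""
    l = frozenset(l)
    rules = []
    for size in range(1, len(l)):           # every nonempty proper subset s
        for s in map(frozenset, combinations(sorted(l), size)):
            conf = support_counts[l] / support_counts[s]
            if conf >= min_conf:
                rules.append((set(s), set(l - s), conf))
    return rules

# Reuses db and apriori() from the previous sketch.
counts = apriori(db, 2)
for s, rest, conf in generate_rules({"Milk", "Diapers", "Beer"}, counts, 0.5):
    print(sorted(s), "->", sorted(rest), f"(confidence = {conf:.2f})")
```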
Example
Using the Apriori method we have computed many frequent itemsets. One of them is l = {A, B, E}. The nonempty proper subsets of l are {A}, {B}, {E}, {A, B}, {A, E}, and {B, E}. The association rules that can be generated are as follows:
So, if we suppose the minimum confidence to be 70%, only {E} → {A, B}, {A, E} → {B}, and {B, E} → {A} are strong association rules.
Remember that association rules can be derived from each of the 2-frequent itemsets, 3-frequent itemsets, and so on.
FP-Growth

The FP-growth algorithm takes a radically different approach to discovering frequent itemsets than the Apriori algorithm. It does not follow the generate-and-test paradigm of Apriori. Instead, it encodes the data set using a compact data structure called an FP-tree and extracts frequent itemsets directly from this structure.
Example
The first scan of the database is the same as Apriori, which derives the set of frequent items (1-
itemsets) and their support counts (frequencies).
The set of frequent items is sorted in the order of descending support count. This resulting set or
list is denoted L. Thus, we have L= {{B: 7}, {A: 6}, {C: 6}, {D: 2}, {E: 2}}.
An FP-tree is then constructed as follows. First, create the root of the tree, labeled with “null.”
Scan database D a second time. The items in each transaction are processed in L order (i.e., sorted
according to descending support count), and a branch is created for each transaction. For example,
the scan of the first transaction, “T1: A, B, E,” which contains three items (B, A, E in L order),
leads to the construction of the first branch of the tree with three nodes, 〈B: 1〉, 〈A: 1〉, and 〈E: 1〉,
where B is linked as a child of the root, A is linked to B, and E is linked to A. The second
transaction, T2, contains the items B and D in L order, which would result in a branch where B is
linked to the root and D is linked to B. However, this branch would share a common prefix, B,
with the existing path for T1. Therefore, we instead increase the count of the B node by 1, and
create a new node, 〈D: 1〉, which is linked as a child of 〈B: 2〉. In general, when considering the
branch to be added for a transaction, the count of each node along a common prefix is incremented
by 1, and nodes for the items following the prefix are created and linked accordingly.
To facilitate tree traversal, an item header table is built so that each item points to its occurrences
in the tree via a chain of node-links. The tree obtained after scanning all of the transactions is
shown in Figure 3, with the associated node-links. In this way, the problem of mining frequent
patterns in databases is transformed to that of mining the FP-tree.
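The construction just described can be sketched compactly in Python. This is a minimal illustration, not a full FP-growth implementation; the class and function names are ours:

```python
class FPNode:
    """An FP-tree node: item label, count, parent link, and child map."""
    def __init__(self, item, parent):
        self.item, self.count, self.parent = item, 1, parent
        self.children = {}

def build_fp_tree(transactions, minsup_count):
    # First scan: derive the frequent items and the ordered list L.
    counts = {}
    for t in transactions:
        for item in t:
            counts[item] = counts.get(item, 0) + 1
    L = [i for i, n in sorted(counts.items(), key=lambda kv: -kv[1])
         if n >= minsup_count]
    rank = {item: r for r, item in enumerate(L)}

    root = FPNode(None, None)                  # the "null" root
    header = {item: [] for item in L}          # item header table (node-links)
    # Second scan: insert each transaction in L order, sharing prefixes.
    for t in transactions:
        node = root
        for item in sorted((i for i in t if i in rank), key=rank.get):
            if item in node.children:          # common prefix: bump the count
                node.children[item].count += 1
            else:                              # new branch: create and link
                child = FPNode(item, node)
                node.children[item] = child
                header[item].append(child)     # extend the node-link chain
            node = node.children[item]
    return root, header
```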
FP-Tree mining
The FP-tree is mined as follows. Start from each frequent length-1 pattern (as an initial suffix
pattern), construct its conditional pattern base (a “sub database,” which consists of the set of prefix
paths in the FP-tree co-occurring with the suffix pattern), then construct its (conditional) FP-tree,
and perform mining recursively on such a tree. The pattern growth is achieved by the concatenation
of the suffix pattern with the frequent patterns generated from a conditional FP-tree.
We first consider E, which is the last item in L, rather than the first. E occurs in two branches of
the FP-tree of Figure 3. (The occurrences of E can easily be found by following its chain of node-
links.) The paths formed by these branches are 〈B, A, E: 1〉 and 〈B, A, C, E: 1〉. Therefore,
considering E as a suffix, its corresponding two prefix paths are 〈B, A: 1〉 and 〈B, A, C: 1〉, which
form its conditional pattern base. Its conditional FP-tree contains only a single path, 〈 B: 2, A: 2 〉;
C is not included because its support count of 1 is less than the minimum support count. The single
path generates all the combinations of frequent patterns: {B, E: 2}, {A, E: 2} and {B, A, E: 2}.
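Continuing the earlier sketch, the conditional pattern base of an item can be collected by walking each of its node-links back toward the root; this relies on the FPNode and header structures from the previous code block:

```python
def conditional_pattern_base(item, header):
    """Prefix paths co-occurring with `item`, found via its node-links."""
    base = []
    for node in header[item]:                  # each occurrence in the tree
        path, parent = [], node.parent
        while parent is not None and parent.item is not None:
            path.append(parent.item)
            parent = parent.parent
        if path:
            base.append((list(reversed(path)), node.count))
    return base

# For the tree described above, conditional_pattern_base("E", header)
# would yield [(["B", "A"], 1), (["B", "A", "C"], 1)].
```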
Similarly, we can compute all the frequent patterns:
Hence, the frequent patterns generated are: {B, E: 2}, {A, E: 2}, {B, A, E: 2}, {B, D: 2}, {B, C:
4}, {A, C: 4}, {B, A, C: 2} and {B, A: 4}
The FP-growth method transforms the problem of finding long frequent patterns to searching for
shorter ones recursively and then concatenating the suffix. It uses the least frequent items as a
suffix, offering good selectivity. The method substantially reduces the search costs.
Handling Categorical Attributes

Association analysis can also be applied to data sets with categorical and symmetric binary attributes, such as Internet survey data. An example of a rule that might be extracted from such data is:

{Shop Online = Yes} → {Privacy Concerns = Yes}

This rule suggests that most Internet users who shop online are concerned about their personal privacy.
To extract such patterns, the categorical and symmetric binary attributes are transformed into
"items" first, so that existing association rule mining algorithms can be applied. This type of
transformation can be performed by creating a new item for each distinct attribute-value pair. For
example, the nominal attribute Level of Education can be replaced by three binary items:
Education = College, Education = Graduate, and Education = High School. Similarly, symmetric
binary attributes such as Gender can be converted into a pair of binary items, Male and Female.
Table 4 shows the result of binarizing the Internet survey data.
Table 4: Internet survey data after binarizing categorical and symmetric binary attributes
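One simple way to perform this transformation in practice is pandas' get_dummies, which creates one binary item per distinct attribute-value pair. The survey columns below are hypothetical, for illustration only:

```python
import pandas as pd

# Hypothetical survey records; the column names are illustrative only.
survey = pd.DataFrame({
    "Gender": ["Female", "Male", "Male"],
    "Education": ["Graduate", "College", "High School"],
    "Shop Online": ["Yes", "Yes", "No"],
})
# One binary item per distinct attribute-value pair.
items = pd.get_dummies(survey)
print(list(items.columns))
# ['Gender_Female', 'Gender_Male', 'Education_College', ...]
```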
There are several issues to consider when applying association analysis to the binarized data:
1. Some attribute values may not be frequent enough to be part of a frequent pattern. This problem is more evident for nominal attributes that have many possible values, e.g., state names. Lowering the support threshold does not help, because doing so exponentially increases the number of frequent patterns found (many of which may be spurious) and makes the computation more expensive. A more practical solution is to group related attribute values into a small number of categories. For example, each state name can be replaced by its corresponding geographical region, such as Midwest, Pacific Northwest, Southwest, and East Coast. Another possibility is to aggregate the less frequent attribute values into a single category called “Others”.

2. Some attribute values may have considerably higher frequencies than others. For example, suppose 85% of the survey participants own a home computer. By creating a binary item for each attribute value that appears frequently in the data, we may potentially generate many redundant patterns.