
Prof. Heitor Silvério Lopes
Prof. Thiago H. Silva

Data Mining & Knowledge Discovery
Class 4 – Associative Analysis: frequent and infrequent pattern discovery
2025
Tasks x Methods in Data Mining
● Classification: Decision trees (C4.5), classification rules, k-nearest neighbors, random forest, support vector machine, Bayesian classifier, neural network, AdaBoost
● Association rules: Apriori, FP-growth, Eclat, ZigZag
● Regression: Linear regression, polynomial regression, logistic regression
● Feature selection & dimensionality reduction: Principal component analysis (PCA), chi-square, entropy, information gain
● Clustering: K-means, Kohonen's self-organizing map, density-based scan, hierarchical grouping, t-SNE
● Data visualization*: Silhouette plot, scatter plot, heatmap, box plot, clusters, t-SNE
Where do impulse purchases happen most? Supermarkets or shopping malls?

● 34% in supermarkets
● 25% in shopping malls
● 19% in online e-commerce
Why impulse purchases?
● People buy on impulse because of:
○ Emotional reasons (retail therapy)
○ Lack of economic education
○ The belief that it is a deal
○ The dopamine effect caused by shopping
What encourages someone to buy something?
RECOMMENDATION
ASSOCIATION

Two important questions for retail companies
● How to discover the interests of people who browse the internet, social networks, shopping malls, and supermarkets?
● How to associate personal interests with products and/or services in order to encourage consumption?

● Answer: Association Rule Discovery!!!
● Why? It discovers non-obvious relationships between items in a dataset
Application areas of Association Rules
● Retail sales → Products purchased together
● Recommendation systems → Find products based on shared interests with other people
● Control and supervision systems → Discover relationships between events and failures
● Bioinformatics → Discover unknown interactions between genes and diseases
● Medical diagnosis → Discover unknown cause-effect relationships of drugs
Some difficulties found when discovering Association Rules
● A large amount of (sparse) data is needed to obtain any useful knowledge
● The algorithms used are computationally (very) expensive (it is a many-to-many mapping!)
● Some relationships found may happen by chance
● There is no "cause → effect" proof
Association rules - Objective
Given a set of transactions, find rules that predict the occurrence of an item based on the occurrence of other items in the set.

Shopping basket transactions:
customer | purchases
William | Bread, Milk
Gabriel | Bread, Diapers, Beer, Eggs
Mary | Milk, Diapers, Beer, Coke
Thiago | Bread, Milk, Diapers, Beer
Sophia | Bread, Milk, Diapers, Coke

Some association rules (from frequent itemsets):
{Bread} → {Milk}
{Beer} → {Diapers}
{Milk} → {Coke}

Implication means co-occurrence, NOT causality!
Very important definitions
(Using the shopping-basket transactions shown above.)
● Itemset
○ A collection of zero or more items, e.g., {Milk, Bread, Diapers}
○ k-itemset: a set containing k items
● Support count (σ)
○ Frequency of occurrence of an itemset
○ E.g., σ({Eggs}) = 1 and σ({Milk, Diapers}) = 3
● Frequent itemset
○ An itemset whose support is greater than or equal to a given threshold minsup
○ E.g., considering minsup = 4 (as a count), {Bread}, {Milk}, and {Diapers} are frequent itemsets, but NOT {Milk, Diapers}!
● Association rule
○ An expression with an implication between two itemsets, written X → Y. Example: {Milk, Diapers} → {Beer}
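These definitions map directly to code. Below is a minimal Python sketch with the transactions table above hard-coded; the helper name support_count is my own, not from the slides:

```python
# The five shopping-basket transactions from the slide.
transactions = [
    {"Bread", "Milk"},                        # William
    {"Bread", "Diapers", "Beer", "Eggs"},     # Gabriel
    {"Milk", "Diapers", "Beer", "Coke"},      # Mary
    {"Bread", "Milk", "Diapers", "Beer"},     # Thiago
    {"Bread", "Milk", "Diapers", "Coke"},     # Sophia
]

def support_count(itemset, transactions):
    """sigma(X): number of transactions that contain every item of X."""
    return sum(1 for t in transactions if itemset <= t)

print(support_count({"Eggs"}, transactions))             # 1
print(support_count({"Milk", "Diapers"}, transactions))  # 3
```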
Association rules evaluation
(Using the same transactions.) Consider this association rule: {Milk, Diapers} → {Beer}

● Support (s)
○ The fraction of transactions that contain both X and Y:
s = σ(Milk, Diapers, Beer) / N = 2/5 = 0.4
● Confidence (c)
○ Measures how frequently the items of Y appear in the transactions that contain X:
c = σ(Milk, Diapers, Beer) / σ(Milk, Diapers) = 2/3 ≈ 0.67
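Continuing the sketch (reusing transactions and support_count from above), the two rule metrics for {Milk, Diapers} → {Beer} can be computed as:

```python
def support(X, Y, transactions):
    """s(X -> Y): fraction of transactions containing X union Y."""
    return support_count(X | Y, transactions) / len(transactions)

def confidence(X, Y, transactions):
    """c(X -> Y): how often Y appears in transactions that contain X."""
    return support_count(X | Y, transactions) / support_count(X, transactions)

X, Y = {"Milk", "Diapers"}, {"Beer"}
print(support(X, Y, transactions))     # 2/5 = 0.4
print(confidence(X, Y, transactions))  # 2/3 ≈ 0.67
```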
How to interpret support and confidence
● Support (s):
○ A rule with low support may hold merely by chance, since its items rarely occur together. E.g., {Diapers, Beer} → {Coke}, s = 1/5
● Confidence (c):
○ Measures the reliability of the inference made by a rule
○ High confidence means that Y is likely to be present in transactions that contain X
○ Rules with high confidence but very low support are generally not of much interest. E.g., {Eggs} → {Beer}
Mining association rules
● Given a set T of transactions, the task of mining association rules consists of finding all rules that satisfy these requirements:
○ support ≥ minsup
○ confidence ≥ minconf

● Brute-force approach:
○ List all possible association rules
○ Compute support and confidence for every rule
○ Discard rules that do not satisfy the minsup and minconf thresholds
○ Such an approach is computationally prohibitive for very large itemsets
A two-step approach for mining association rules
1. Generation of frequent itemsets:
○ Find all itemsets that satisfy support ≥ minsup
2. Generation of rules:
○ For each frequent itemset, generate rules with high confidence
○ Each rule is a binary partition of that itemset

● Recall that the generation of frequent itemsets is still computationally costly
1: Generation of frequent itemsets
(Same transactions as before.) Examples of rules and their metrics:
{Milk, Diapers} → {Beer} (s = 2/5 = 0.4, c = 2/3 ≈ 0.67)
{Milk, Beer} → {Diapers} (s = 0.4, c = 1.0)
{Diapers, Beer} → {Milk} (s = 0.4, c = 0.67)
{Beer} → {Milk, Diapers} (s = 0.4, c = 0.67)
{Diapers} → {Milk, Beer} (s = 0.4, c = 0.5)
{Milk} → {Diapers, Beer} (s = 0.4, c = 0.5)

● All the above rules are binary partitions of the same itemset, {Milk, Diapers, Beer} (enumerated in the sketch below)
● Rules originating from the same itemset have identical support but different confidence
● Therefore, if the set {Milk, Diapers, Beer} is infrequent, all 6 rules generated from it will also be infrequent
● Thus, the support and confidence requirements can be considered separately
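A small sketch of how the six binary partitions can be enumerated (the helper name binary_partitions is my own):

```python
from itertools import combinations

def binary_partitions(itemset):
    """All rules X -> Y where X and Y are non-empty and X union Y = itemset."""
    items = sorted(itemset)
    for r in range(1, len(items)):
        for antecedent in combinations(items, r):
            X = set(antecedent)
            yield X, set(itemset) - X

for X, Y in binary_partitions({"Milk", "Diapers", "Beer"}):
    print(sorted(X), "->", sorted(Y))
# 6 rules, all with the same support: that of {Milk, Diapers, Beer}
```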
1: Generation of frequent itemsets
● Given d items, there are 2^d possible itemsets
● Example: d = 5 → 32 itemsets
● Example: d = 100 → 2^100 = 1,267,650,600,228,229,401,496,703,205,376 itemsets (about 1.27 × 10^30)
1: Generation of frequent itemsets
● Brute-force approach:
○ Each itemset in the lattice is a candidate frequent itemset
○ Compute the support of each candidate by examining the dataset

List of candidate itemsets x transactions:
customer | purchases
William | Bread, Milk
Gabriel | Bread, Diapers, Beer, Eggs
Mary | Milk, Diapers, Beer, Coke
Thiago | Bread, Milk, Diapers, Beer
Sophia | Bread, Milk, Diapers, Coke

○ Compare each transaction against all candidates
■ Computational complexity: ~O(N·M·W), for N transactions, M candidates, and maximum transaction width W
■ Still very hard, since M = 2^d
1: Generation of sets of frequent itemsets
● Given d (unique!) items:
○ The total number of possible itemsets is 2^d
○ The number of possible association rules is:

R = Σ_{k=1}^{d−1} [ C(d, k) · Σ_{j=1}^{d−k} C(d−k, j) ] = 3^d − 2^(d+1) + 1

○ For d = 6: {Milk, Bread, Diapers, Beer, Eggs, Coke}

d | #itemsets | #rules
1 | 2 | 0
2 | 4 | 2
3 | 8 | 12
4 | 16 | 50
5 | 32 | 180
6 | 64 | 602
10000 | ≈1.995 × 10^3010 | ≈1.63 × 10^4771
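As a sanity check, the closed form 3^d − 2^(d+1) + 1 can be verified by direct enumeration for small d; a small sketch:

```python
from math import comb

def count_rules(d):
    """Brute-force count of rules X -> Y over d items (X, Y disjoint, non-empty)."""
    return sum(comb(d, k) * sum(comb(d - k, j) for j in range(1, d - k + 1))
               for k in range(1, d))

for d in range(1, 7):
    assert count_rules(d) == 3**d - 2**(d + 1) + 1
    print(d, 2**d, count_rules(d))   # d, #itemsets, #rules
```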
1: How to reduce the number of frequent itemsets?
● Reduce the number of candidates (M):
○ A full search has M = 2^d
○ Use pruning techniques to reduce M
● Reduce the number of transactions (N):
○ Reduce N as the itemset size increases
● Reduce the number of comparisons (N·M):
○ Use an efficient data structure to store the set of candidates or the set of transactions
1: The Apriori principle

If an itemset is frequent, then all its subsets must also be frequent.

● The Apriori principle is supported by the anti-monotonicity property of the support metric, which establishes that:
○ The support of an itemset is never greater than the support of its subsets. That is:

∀ X, Y : (X ⊆ Y) ⇒ s(X) ≥ s(Y)
1: Illustration of the Apriori principle
(Lattice figure: frequent itemsets imply frequent subsets)

1: Illustration of the Apriori principle
(Lattice figure: a set of infrequent items triggers support-based pruning of its supersets, which are pruned without being counted)
1: Example of itemset generation using Apriori
minsup = 0.60 → minimum count = 3 (out of N = 5 transactions)

Tid | purchases
1 | Bread, Milk
2 | Bread, Diapers, Beer, Eggs
3 | Milk, Diapers, Beer, Coke
4 | Bread, Milk, Diapers, Beer
5 | Bread, Milk, Diapers, Coke

Candidate 1-itemsets, C(6,1) = 6:
Item | Count
Bread | 4
Coke | 2
Milk | 4
Beer | 3
Diapers | 4
Eggs | 1

Candidate 2-itemsets over the 4 frequent items, C(4,2) = 6:
Itemset | Count
{Bread, Milk} | 3
{Bread, Beer} | 2
{Bread, Diapers} | 3
{Milk, Beer} | 2
{Milk, Diapers} | 3
{Beer, Diapers} | 3

Candidate 3-itemsets, C(4,3) = 4:
Itemset | Count
{Bread, Milk, Diapers} | 2
{Bread, Milk, Beer} | 1
{Milk, Diapers, Beer} | 2
{Bread, Diapers, Beer} | 2

If all the subsets were enumerated: C(6,1) + C(6,2) + C(6,3) = 6 + 15 + 20 = 41 candidates.
By using pruning based on minimum support: 6 + 6 + 4 = 16 candidates.
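A compact level-wise Apriori sketch, reusing transactions and support_count from the earlier snippets. Note that with the full subset-pruning step only {Bread, Milk, Diapers} survives as a 3-itemset candidate (the slide enumerates all four triples over the frequent items), and it then fails the minimum count:

```python
from itertools import combinations

def apriori(transactions, min_count):
    """Level-wise frequent-itemset mining (candidate generation + pruning)."""
    # Level 1: frequent individual items.
    items = {i for t in transactions for i in t}
    frequent = [{frozenset([i]) for i in items
                 if support_count({i}, transactions) >= min_count}]
    k = 2
    while frequent[-1]:
        prev = frequent[-1]
        # Join step: merge (k-1)-itemsets that differ by exactly one item.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step (Apriori principle): every (k-1)-subset must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev for s in combinations(c, k - 1))}
        frequent.append({c for c in candidates
                         if support_count(c, transactions) >= min_count})
        k += 1
    return [level for level in frequent if level]

for level in apriori(transactions, min_count=3):
    print([sorted(s) for s in level])
```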
1: How to adjust the minimum support?
● If minsup is set too high, itemsets involving rare items may be missed
○ E.g., in purchase data, very expensive products are bought infrequently
● If minsup is set too low, the search for association rules becomes computationally expensive and the number of itemsets may be very large
○ It should be considered that real datasets are high-dimensional (many transactions and many items)
● There is no way around it: the minimum support must be treated as a variable, and it is up to the user to adjust it dynamically according to:
○ Problem requirements
○ Computational power
○ Results obtained so far
2: Generation and pruning of candidate rules
● Generation of candidates:
○ Generate new candidate k-itemsets from the frequent (k−1)-itemsets of the previous iteration
● Candidate pruning using the Apriori principle:
○ Eliminate some of the candidate k-itemsets
○ Every (k−1)-subset of a candidate must be frequent; otherwise, the candidate is infrequent
○ Example (see the rule-generation sketch below):
■ If {a,b,c} is frequent, then {a,b}, {b,c}, and {a,c} must be frequent
■ If any of them is not frequent, then {a,b,c} is not frequent
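Putting the two steps together: a sketch of rule generation with confidence pruning, reusing apriori, binary_partitions, and confidence from the earlier snippets (the helper name generate_rules and the 0.7 threshold are my own choices for illustration):

```python
def generate_rules(frequent_itemsets, transactions, min_conf):
    """Step 2: keep only the binary partitions X -> Y with enough confidence."""
    rules = []
    for itemset in frequent_itemsets:
        for X, Y in binary_partitions(itemset):
            c = confidence(X, Y, transactions)
            if c >= min_conf:
                rules.append((X, Y, c))
    return rules

# Frequent itemsets of size >= 2 found by the Apriori sketch above.
freq = [s for level in apriori(transactions, min_count=3) for s in level
        if len(s) >= 2]
for X, Y, c in generate_rules(freq, transactions, min_conf=0.7):
    print(sorted(X), "->", sorted(Y), f"(c = {c:.2f})")
```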
2: Generation and pruning of candidate rules
● Brute-force method:
○ Generate all combinations and prune candidates without minimum support (minsup)
○ As mentioned before, the computational complexity for each candidate at level k is O(k), and for the entire method it is O(d·2^(d−1))
2: Consequences of the computational complexity
● Minimum support value:
○ Decreasing the minimum support threshold yields more frequent itemsets
○ This can increase the number of candidates and the maximum length (W) of frequent itemsets
● Dimensionality (number of items) of the dataset:
○ More memory is required to store the support counters for each item
○ If the number of frequent items also increases, both the processing cost and the I/O cost increase
● Size of the transaction database:
○ Since the Apriori algorithm traverses the database multiple times, the processing time of the algorithm increases with the number of transactions
● Average width of transactions:
○ The average width increases with denser datasets
○ This can increase W, and the number of subsets in a transaction increases with its maximum width
Case study #1
● Dataset: Foodmart 2000
● What is the relationship between purchases of lightbulbs and batteries?
● Step 1: Generation of frequent itemsets (that contain lightbulbs)

Case study #1
● Association rules that must include batteries in the antecedent
Case study #2
● Titanic dataset
● Apriori with minsup = 10% and minconf = 90%
● Frequent itemsets:

Case study #2
● Titanic dataset
● Association rules found:
How to evaluate association rules?
● Association rule mining algorithms tend to produce a large number of rules for different values of support and confidence
● However, many of these rules are redundant or uninteresting
○ A rule such as {A,B,C} → {D} is redundant if {A,B} → {D} holds with the same support and confidence
○ (Quantitatively) uninteresting patterns:
■ Those involving a set of mutually independent items (the occurrence of one event does not affect the probability of the other)
■ Those covering very few transactions
○ (Qualitatively) uninteresting patterns:
■ Those that are obvious or expected (by most people), e.g., {Bread} → {Milk}
● Measures of interest can be used for pruning or ranking the association rules obtained by an algorithm
How to evaluate association rules?
● Objective measures:
○ The association rules are ranked by statistical measures computed over the data
○ An association metric is used, such as support, confidence, Laplace, Gini, mutual information, the Jaccard index, etc.
● Subjective measures:
○ The association rules are ranked according to the user's interpretation
○ E.g., a rule is subjectively interesting when it contradicts the user's expectations

A. Silberschatz & A. Tuzhilin, On subjective measures of interestingness in knowledge discovery. Proc. Knowledge Discovery in Databases conference, pp. 275-281, 1995.
Objective metrics for the evaluation of association rules
(Same transactions as before.) Consider the rule {Milk, Diapers} → {Beer}.

● Support (s):
○ Fraction of transactions that contain both X and Y:
s = σ(Milk, Diapers, Beer) / N = 2/5 = 0.4
● Confidence (c):
○ Measures how frequently the items of Y appear in transactions that contain X:
c = σ(Milk, Diapers, Beer) / σ(Milk, Diapers) = 2/3 ≈ 0.67
● Lift:
○ Confidence divided by the proportion of instances covered by the consequent
○ Measures the importance of the association independently of support:
lift = c / s(Beer) = 0.67 / 0.6 ≈ 1.11
Objective metrics for the evaluation of association rules
● Leverage (L):
○ The proportion of additional instances covered by both antecedent and consequent beyond what would be expected if the two were statistically independent:
L(X → Y) = s(X ∪ Y) − s(X) · s(Y)
● Conviction:
○ Measures the degree of dependence between antecedent and consequent (a value of 1 indicates independence):
conviction(X → Y) = P(X) · P(¬Y) / P(X, ¬Y)
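A sketch of these three metrics over the running example, reusing support_count and transactions from the earlier snippets. Here lift is computed in its symmetric form s(X ∪ Y) / (s(X) · s(Y)), which equals c / s(Y); the helper names are my own:

```python
N = len(transactions)

def itemset_support(itemset):
    """Plain support s(X) of a single itemset."""
    return support_count(itemset, transactions) / N

def lift(X, Y):
    return itemset_support(X | Y) / (itemset_support(X) * itemset_support(Y))

def leverage(X, Y):
    return itemset_support(X | Y) - itemset_support(X) * itemset_support(Y)

def conviction(X, Y):
    c = itemset_support(X | Y) / itemset_support(X)   # confidence of X -> Y
    return (1 - itemset_support(Y)) / (1 - c) if c < 1 else float("inf")

X, Y = {"Milk", "Diapers"}, {"Beer"}
print(f"lift = {lift(X, Y):.2f}")              # 0.4 / (0.6 * 0.6) ≈ 1.11
print(f"leverage = {leverage(X, Y):.2f}")      # 0.4 - 0.36 = 0.04
print(f"conviction = {conviction(X, Y):.2f}")  # (1 - 0.6) / (1 - 0.67) ≈ 1.20
```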
Objective metrics for the evaluation of association rules
● There are many metrics proposed in the literature
● Some of them may be good for certain applications, but not for others
● There is no clear definition of their usefulness...
On transforming nominal and categorical attributes
● Many datasets contain attributes of different types in a list of items
● It is necessary to convert them to a suitable format before they can be explored by associative analysis methods
On transforming nominal and categorical attributes
● Sex: symmetric binary attribute → two binary attributes (M, F)
● Education: categorical attribute → three binary attributes (PG, S, 2G)
● Problem: if the values of a nominal attribute are individually infrequent (e.g., State), they may not generate frequent items → grouping (a conversion sketch follows below)
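As an illustrative sketch of this nominal-to-binary conversion (the records and values are hypothetical), pandas' get_dummies produces one binary column per attribute value, the format expected by frequent-itemset miners:

```python
import pandas as pd

# Hypothetical records with nominal attributes, invented for illustration.
df = pd.DataFrame({
    "Sex": ["M", "F", "F", "M"],
    "Education": ["PG", "S", "2G", "S"],
})

# One binary column per attribute value.
binary = pd.get_dummies(df, columns=["Sex", "Education"])
print(binary.columns.tolist())
# ['Sex_F', 'Sex_M', 'Education_2G', 'Education_PG', 'Education_S']
```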
On transforming continuous attributes
● Discretization methods:
○ Equal-width intervals, equal data frequency, etc. (see the sketch below)
○ Problem: discretization can generate a large number of attributes
○ Adjusting the optimal ranges is computationally expensive
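Both discretization strategies mentioned above are available in pandas; a small sketch with made-up ages and bin labels:

```python
import pandas as pd

age = pd.Series([22, 25, 31, 38, 45, 52, 61, 70])

# Equal-width intervals: bins of identical span.
equal_width = pd.cut(age, bins=3, labels=["young", "middle", "senior"])

# Equal-frequency intervals: roughly the same number of records per bin.
equal_freq = pd.qcut(age, q=3, labels=["low", "mid", "high"])

print(equal_width.tolist())
print(equal_freq.tolist())
```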
Case study #3
● Breast Cancer Wisconsin dataset
● Objective: differential diagnosis of breast cancer using characteristics of the cell nuclei present in the images
Case study #3
● Breast Cancer Wisconsin dataset
● Question: which antecedents lead to malignancy?
Associative analysis – advanced topics
● Association rules for infrequent/negatively correlated patterns
● Association rules for sequential (temporal) patterns
● Association rules for graphs

● In all cases, extensive preliminary manipulation of the dataset may be necessary to allow the use of conventional software

You might also like