
University of Florida CISE department Gator Engineering

Association Analysis
Part 1
Dr. Sanjay Ranka
Professor
Computer and Information Science and Engineering
University of Florida

Mining Associations
• Given a set of records, find rules that will predict
the occurrence of an item based on the
occurrences of other items in the record
Example: Market-Basket transactions

TID   Items
1     Bread, Milk
2     Bread, Diaper, Beer, Eggs
3     Milk, Diaper, Beer, Coke
4     Bread, Milk, Diaper, Beer
5     Bread, Milk, Diaper, Coke

Example rule: {Milk, Diaper} → {Beer}

Definition of Association Rule

• Association Rule: an implication expression of the form X → Y, where X and Y are disjoint itemsets

• Support (s): fraction of transactions that contain both X and Y
  s(X → Y) = σ(X ∪ Y) / N

• Confidence (c): how often items in Y appear in transactions that contain X
  c(X → Y) = σ(X ∪ Y) / σ(X)

Example: {Milk, Diaper} → {Beer}
  s = σ({Milk, Diaper, Beer}) / |T| = 2/5 = 0.4
  c = σ({Milk, Diaper, Beer}) / σ({Milk, Diaper}) = 2/3 = 0.67

Goal: discover all rules having support ≥ minsup and confidence ≥ minconf, where minsup and minconf are the corresponding threshold values
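A minimal sketch of these two measures in Python (the transaction list is the market-basket table above; the function names are illustrative, not from the lecture):

transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def support(itemset, transactions):
    # Fraction of transactions that contain every item of `itemset`.
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(X, Y, transactions):
    # c(X -> Y) = sigma(X ∪ Y) / sigma(X)
    return support(X | Y, transactions) / support(X, transactions)

print(support({"Milk", "Diaper", "Beer"}, transactions))       # 0.4
print(confidence({"Milk", "Diaper"}, {"Beer"}, transactions))  # 0.666...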


How to Mine Association Rules?


Example of Rules:
{Milk,Diaper} → {Beer} (s=0.4, c=0.67)
{Milk,Beer} → {Diaper} (s=0.4, c=1.0)
{Diaper,Beer} → {Milk} (s=0.4, c=0.67)
{Beer} → {Milk,Diaper} (s=0.4, c=0.67)
{Diaper} → {Milk,Beer} (s=0.4, c=0.5)
{Milk} → {Diaper,Beer} (s=0.4, c=0.5)

Observations:
• All the rules above correspond to the
same itemset: {Milk, Diaper, Beer}
• Rules obtained from the same
itemset have identical support but
can have different confidence

How to Mine Association Rules?


• Two-step approach:
  1. Frequent itemset generation: generate all itemsets whose support ≥ minsup
  2. Rule generation: generate high-confidence association rules from each frequent itemset
     – Each rule is a binary partitioning of a frequent itemset
• Frequent itemset generation is the more expensive of the two operations


Itemset Lattice

• Given d items, there are 2^d possible itemsets, which form a lattice from the empty set up to the full item set

Generating Frequent Itemsets


• Naive approach:
  – Each itemset in the lattice is a candidate frequent itemset
  – Count the support of each candidate by scanning the database
  – Complexity ~ O(NM) => expensive, since M = 2^d !!!
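A brute-force sketch of this naive scan, assuming the market-basket table above (minsup = 3 matches the illustration later in these slides; the enumeration order is mine):

from itertools import combinations

transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]
items = sorted(set().union(*transactions))
minsup = 3  # absolute support count

# Every itemset in the lattice is a candidate; each candidate costs a
# full database scan, hence ~O(N * M) work with M = 2^d candidates.
frequent = {}
for k in range(1, len(items) + 1):
    for cand in combinations(items, k):
        count = sum(1 for t in transactions if set(cand) <= t)
        if count >= minsup:
            frequent[cand] = count

print(frequent)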



Computational Complexity

• Given d unique items:
  – Total number of itemsets = 2^d
  – Total number of possible association rules:
    R = Σ(k=1..d−1) [ C(d,k) × Σ(j=1..d−k) C(d−k,j) ] = 3^d − 2^(d+1) + 1
  – If d = 6, R = 602 rules
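A quick brute-force check of this count (`count_rules` is an illustrative helper that counts every antecedent with every non-empty disjoint consequent):

from math import comb

def count_rules(d):
    # Sum over antecedent sizes k: C(d,k) ways to pick X, and 2^(d-k) - 1
    # non-empty consequents Y from the remaining items.
    return sum(comb(d, k) * (2 ** (d - k) - 1) for k in range(1, d + 1))

print(count_rules(6))           # 602
print(3 ** 6 - 2 ** 7 + 1)      # 602, via the closed form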


Approach for Mining Frequent Itemsets

• Reduce the number of candidates (M)
  – Complete search: M = 2^d
  – Use the Apriori heuristic to reduce M
• Reduce the number of transactions (N)
  – Reduce N as the size of the itemsets increases
  – Used by DHP and vertical-based mining algorithms
• Reduce the number of comparisons (NM)
  – Use efficient data structures to store the candidates or transactions
  – No need to match every candidate against every transaction

Reducing Number of Candidates


• Apriori principle:
  – If an itemset is frequent, then all of its subsets must also be frequent
• The Apriori principle holds due to the following property of the support measure:
  ∀ X, Y : (X ⊆ Y) ⇒ s(X) ≥ s(Y)
  – The support of an itemset never exceeds the support of any of its subsets
  – This is known as the anti-monotone property of support
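A compact sketch of level-wise candidate generation with this pruning (the merge of (k−1)-itemsets sharing a prefix and the subset check follow the Apriori principle; data and minsup are the running example):

from itertools import combinations

transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]
minsup = 3

def support_count(itemset):
    return sum(1 for t in transactions if set(itemset) <= t)

def apriori(items):
    # Level 1: frequent 1-itemsets (sorted tuples throughout).
    Fk = [(i,) for i in sorted(items) if support_count((i,)) >= minsup]
    frequent = list(Fk)
    while Fk:
        # Merge frequent (k-1)-itemsets that share their first k-2 items.
        candidates = {a + (b[-1],) for a in Fk for b in Fk
                      if a[:-1] == b[:-1] and a[-1] < b[-1]}
        # Apriori pruning: drop any candidate with an infrequent subset.
        prev = set(Fk)
        candidates = [c for c in candidates
                      if all(s in prev for s in combinations(c, len(c) - 1))]
        Fk = [c for c in candidates if support_count(c) >= minsup]
        frequent.extend(Fk)
    return frequent

print(apriori(set().union(*transactions)))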

Using Apriori Principle for Pruning Candidates

• If an itemset is infrequent, then all of its supersets must also be infrequent

[Figure: itemset lattice in which one itemset is found to be infrequent and all of its supersets are pruned]

Illustrating Apriori Principle

Minimum Support = 3

• Items (1-itemsets): count the support of each item; Coke and Eggs fall below the threshold
• Pairs (2-itemsets): no need to generate candidates involving Coke or Eggs
• Triplets (3-itemsets): only one candidate, {Bread, Milk, Diaper}, survives the pruning

If every subset is considered:
  6C1 + 6C2 + 6C3 = 6 + 15 + 20 = 41 candidates
With support-based pruning:
  6 + 6 + 1 = 13 candidates
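The two counts, checked directly (standard-library arithmetic only):

from math import comb

print(comb(6, 1) + comb(6, 2) + comb(6, 3))  # 41 candidates without pruning
print(6 + 6 + 1)                             # 13 with support-based pruning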


Reducing Number of Comparisons


• Candidate counting:
– Scan the database of transactions to determine
the support of candidate itemsets
– To reduce number of comparisons, store the
candidates using a hash structure


Association Rule Discovery:
Hash Tree for Fast Access

• Hash function: items 1, 4 or 7 hash to the left branch; 2, 5 or 8 to the middle branch; 3, 6 or 9 to the right branch
• Candidate hash tree storing 15 candidate 3-itemsets at its leaves:
  {1,4,5} {1,2,4} {4,5,7} {1,2,5} {4,5,8} {1,5,9} {1,3,6} {2,3,4}
  {5,6,7} {3,4,5} {3,5,6} {3,5,7} {6,8,9} {3,6,7} {3,6,8}
• To reach a leaf, hash on the first item of a candidate at the root, on its second item at the next level, and so on (the lecture's three slides step through hashing on 1, 4 or 7; on 2, 5 or 8; and on 3, 6 or 9)
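A minimal sketch of such a candidate hash tree, assuming the 15 candidates above, the hash function h(item) = (item − 1) mod 3, and a maximum leaf size of 3 (the leaf-size limit is an assumption; the lecture does not state one):

from pprint import pprint

candidates = [
    (1,4,5), (1,2,4), (4,5,7), (1,2,5), (4,5,8), (1,5,9), (1,3,6), (2,3,4),
    (5,6,7), (3,4,5), (3,5,6), (3,5,7), (6,8,9), (3,6,7), (3,6,8),
]

MAX_LEAF = 3  # assumed: split a leaf once it holds more than 3 candidates

def h(item):
    return (item - 1) % 3  # 1,4,7 -> 0 (left); 2,5,8 -> 1; 3,6,9 -> 2

def insert(node, itemset, depth=0):
    # Interior nodes are dicts keyed by hash bucket; leaves are lists.
    if isinstance(node, dict):
        branch = h(itemset[depth])
        node[branch] = insert(node.get(branch, []), itemset, depth + 1)
        return node
    node.append(itemset)
    if len(node) > MAX_LEAF and depth < len(itemset):
        split = {}                      # overflow: turn the leaf into an
        for c in node:                  # interior node that hashes on the
            insert(split, c, depth)     # next item of each candidate
        return split
    return node

tree = {}
for c in candidates:
    insert(tree, c)

pprint(tree)  # dicts are interior nodes, lists are leaves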


Candidate Counting

• Given a transaction t = {1,2,3,5,6}
• Possible subsets of size 3:
  {1,2,3} {1,2,5} {1,2,6} {1,3,5} {1,3,6}
  {1,5,6} {2,3,5} {2,3,6} {2,5,6} {3,5,6}
• If the width of the transaction is w, there are 2^w − 1 possible non-empty subsets
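The same enumeration in code (standard library only):

from itertools import combinations

t = (1, 2, 3, 5, 6)
print(list(combinations(t, 3)))  # the 10 subsets of size 3 listed above
print(2 ** len(t) - 1)           # 31 non-empty subsets of a width-5 transaction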

Association Rule Discovery: Subset Operation

• Given transaction t = {1, 2, 3, 5, 6}, match its 3-subsets against the candidate hash tree without generating them all explicitly
• At the root, fix the first item of a subset: 1 + {2,3,5,6}, 2 + {3,5,6}, or 3 + {5,6} (items 5 and 6 leave too few following items to start a 3-subset), and hash that item to pick a branch: 1, 4 or 7 → left; 2, 5 or 8 → middle; 3, 6 or 9 → right



Association Rule Discovery: Subset Operation …

• The expansion continues one level down: after fixing 1, fix the second item, giving 1 2 + {3,5,6}, 1 3 + {5,6} and 1 5 + {6}, and hash that second item to descend further (similarly for the prefixes starting with 2 and 3)
• When a leaf is reached, each candidate stored there is matched against the transaction and its support count is incremented on a hit
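A hedged sketch of this traversal (same 15 candidates and hash function as before; storing candidates exactly three levels deep without leaf splitting keeps the sketch short, so the tree shape differs slightly from the figure):

candidates = [
    (1,4,5), (1,2,4), (4,5,7), (1,2,5), (4,5,8), (1,5,9), (1,3,6), (2,3,4),
    (5,6,7), (3,4,5), (3,5,6), (3,5,7), (6,8,9), (3,6,7), (3,6,8),
]

def h(item):
    return (item - 1) % 3

# Fixed-depth tree: tree[h(c0)][h(c1)][h(c2)] is a leaf list of candidates.
tree = {}
for c in candidates:
    node = tree.setdefault(h(c[0]), {}).setdefault(h(c[1]), {})
    node.setdefault(h(c[2]), []).append(c)

def matched(transaction):
    t = set(transaction)
    hits = set()
    def walk(node, items, depth):
        if depth == 3:                        # leaf: verify each candidate
            hits.update(c for c in node if set(c) <= t)
            return
        for i, item in enumerate(items):      # "1+ 2356", "12+ 356", ...
            child = node.get(h(item))
            if child is not None:             # prune missing branches
                walk(child, items[i + 1:], depth + 1)
    walk(tree, tuple(sorted(transaction)), 0)
    return sorted(hits)

print(matched({1, 2, 3, 5, 6}))  # [(1, 2, 5), (1, 3, 6), (3, 5, 6)]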



Rule Generation

• Given a frequent itemset L, find all non-empty subsets f ⊂ L such that f → L – f satisfies the minimum confidence requirement
  – If {A,B,C,D} is a frequent itemset, the candidate rules are:
    ABC → D, ABD → C, ACD → B, BCD → A,
    A → BCD, B → ACD, C → ABD, D → ABC,
    AB → CD, AC → BD, AD → BC, BC → AD,
    BD → AC, CD → AB
• If |L| = k, then there are 2^k – 2 candidate association rules (ignoring L → ∅ and ∅ → L)
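A short sketch of this enumeration (supports come from the running market-basket example; `rules_from` is an illustrative name):

from itertools import combinations

transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def support(itemset):
    return sum(1 for t in transactions if set(itemset) <= t) / len(transactions)

def rules_from(L, minconf):
    # Every non-empty proper subset f of L yields a candidate rule
    # f -> L - f with confidence s(L) / s(f): 2^k - 2 candidates in all.
    L = tuple(L)
    for r in range(1, len(L)):
        for f in combinations(L, r):
            head = tuple(i for i in L if i not in f)
            conf = support(L) / support(f)
            if conf >= minconf:
                yield f, head, conf

for f, head, conf in rules_from(("Milk", "Diaper", "Beer"), 0.6):
    print(f, "->", head, round(conf, 2))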


Rule Generation

• How to efficiently generate rules from frequent itemsets?
  – In general, confidence does not have an anti-monotone property: c(ABC → D) can be larger or smaller than c(AB → D)
  – But the confidence of rules generated from the same itemset does have an anti-monotone property
  – e.g., for L = {A,B,C,D}:
    c(ABC → D) ≥ c(AB → CD) ≥ c(A → BCD)
• Confidence is non-increasing as the number of items in the rule consequent increases
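This chain follows from the anti-monotonicity of support: shrinking the antecedent can only raise its support, and confidence divides s(L) by it. Checked on the running example:

# c(X -> L-X) = s(L) / s(X); a smaller antecedent X has support >= that
# of a larger one, so the ratio can only shrink. Supports below are from
# the market-basket table, as fractions of 5 transactions.
s_MDB = 0.4   # s({Milk, Diaper, Beer})
s_MD  = 0.6   # s({Milk, Diaper})
s_M   = 0.8   # s({Milk})
print(s_MDB / s_MD)  # c({Milk,Diaper} -> {Beer})  ≈ 0.67
print(s_MDB / s_M)   # c({Milk} -> {Diaper,Beer})  = 0.5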


Rule Generation for Apriori Algorithm


[Figure: lattice of rules generated from the frequent itemset {A,B,C,D}]

• The lattice corresponds to a partial order on the items in the rule consequent


Rule Generation for Apriori Algorithm …

• A candidate rule is generated by merging two rules that share the same prefix in the rule consequent
  – join(CD => AB, BD => AC) would produce the candidate rule D => ABC
  – Prune rule D => ABC if its subset AD => BC does not have high confidence
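A hedged sketch of this level-wise rule generation (a simplified reading of the Apriori-style rule-generation idea, not the lecture's exact pseudocode; data are the running example and minconf = 0.6 is arbitrary):

transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def sup(itemset):
    return sum(1 for t in transactions if set(itemset) <= t)

def gen_rules(L, minconf):
    L = tuple(sorted(L))
    sL = sup(L)
    rules, Hk = [], []
    # Level 1: rules with single-item consequents.
    for item in L:
        body = tuple(x for x in L if x != item)
        conf = sL / sup(body)
        if conf >= minconf:
            rules.append((body, (item,), conf))
            Hk.append((item,))
    # Merge confident consequents that share a prefix (as in
    # join(CD=>AB, BD=>AC) -> D=>ABC); by anti-monotonicity,
    # low-confidence consequents are never extended.
    while Hk and len(Hk[0]) < len(L) - 1:
        Hnext = sorted({a + (b[-1],) for a in Hk for b in Hk
                        if a[:-1] == b[:-1] and a[-1] < b[-1]})
        Hk = []
        for head in Hnext:
            body = tuple(x for x in L if x not in head)
            conf = sL / sup(body)
            if conf >= minconf:
                rules.append((body, head, conf))
                Hk.append(head)
    return rules

for body, head, conf in gen_rules(("Milk", "Diaper", "Beer"), 0.6):
    print(body, "->", head, round(conf, 2))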

Other Frequent Itemset Algorithms


• Traversal of Itemset Lattice
– Apriori uses breadth-first (level-wise)
traversal

• Representation of Database
– Apriori uses horizontal data layout

• Generate-and-count paradigm
