
MS (Data Science)

Fall 2020 Semester

CT-530 DATA MINING

Dr. Sohail Abdul Sattar


Ex-Professor & Co-Chairman
Department of Computer Science & Information Technology
NED University of Engineering & Technology

Course Teacher

Dr. Sohail Abdul Sattar


Ex-Professor & Co-Chairman
Department of Computer Science & Information Technology
NED University

PhD Computer Vision, NED + UCF (Orlando, US)
MS Computer Science, NED
MCS Computer Science, KU
BE Mechanical Engineering, NED

Books
• “Introduction to Data Mining” by Tan, Steinbach, and Kumar.
• “Mining Massive Datasets” by Anand Rajaraman, Jeff Ullman, and Jure Leskovec. Free online book; includes slides from the course.
• “Data Mining: Concepts and Techniques” by Jiawei Han and Micheline Kamber.
• “Data Mining: Practical Machine Learning Tools and Techniques” by Ian H. Witten.

Thanks to the owner of these slides!!!


DATA MINING
LECTURE 4
Market-Basket Analysis
Frequent Itemsets

This is how it all started…


• Rakesh Agrawal, Tomasz Imielinski, Arun N. Swami: Mining Association Rules between Sets of Items in Large Databases. SIGMOD Conference 1993: 207-216.
• Rakesh Agrawal, Ramakrishnan Srikant: Fast Algorithms for Mining Association Rules in Large Databases. VLDB 1994: 487-499.

• These two papers are credited with the birth of Data Mining.
• For a long time, people were fascinated with Association Rules and Frequent Itemsets.
• Some people (in industry and academia) still are.

Market-Basket Data
• A large set of items, e.g., things sold in a supermarket.
• A large set of baskets (transactions), each of which is a small subset of the items, e.g., the things one customer buys on one day.

Items: {Bread, Milk, Diaper, Beer, Eggs, Coke}

TID  Items
1    Bread, Milk
2    Bread, Diaper, Beer, Eggs
3    Milk, Diaper, Beer, Coke
4    Bread, Milk, Diaper, Beer
5    Bread, Milk, Diaper, Coke

Frequent itemsets
• Goal: find combinations of items (itemsets) that occur frequently.
• These are called Frequent Itemsets.
• Support s(I): the number of transactions that contain itemset I.

TID  Items
1    Bread, Milk
2    Bread, Diaper, Beer, Eggs
3    Milk, Diaper, Beer, Coke
4    Bread, Milk, Diaper, Beer
5    Bread, Milk, Diaper, Coke

Examples of frequent itemsets (support ≥ 3):
{Bread}: 4
{Milk}: 4
{Diaper}: 4
{Beer}: 3
{Diaper, Beer}: 3
{Milk, Bread}: 3
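To make the definition concrete, here is a minimal Python sketch (not part of the original slides; the names `transactions` and `support` are illustrative) that reproduces the support counts above:

```python
# Example transactions from the slide, one set per basket.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def support(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

print(support({"Diaper", "Beer"}, transactions))  # 3
print(support({"Milk", "Bread"}, transactions))   # 3
```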

Market-Baskets – (2)
• Really, a general many-to-many mapping (association) between two kinds of things, where one kind (the baskets) consists of sets of the other (the items).
• But we ask about connections among “items,” not “baskets.”

• The technology focuses on common/frequent events, not rare events (the “long tail”).

Applications – (1)
• Items = products; baskets = sets of products someone bought in one trip to the store.

• Example application: given that many people buy beer and diapers together:
  • Run a sale on diapers; raise the price of beer.
  • Only useful if many people buy diapers & beer.

Applications – (2)
• Baskets = Web pages; items = words.

• Example application: unusual words appearing together in a large number of documents, e.g., “Brad” and “Angelina,” may indicate an interesting relationship.

Applications – (3)
• Baskets = sentences; items = documents containing those sentences.

• Example application: items that appear together too often could represent plagiarism.

• Notice that items do not have to be “in” baskets.


Definitions
• Itemset
  • A collection of one or more items
  • Example: {Milk, Bread, Diaper}
• k-itemset
  • An itemset that contains k items
• Support (s)
  • Count: frequency of occurrence of an itemset
    • E.g. s({Milk, Bread, Diaper}) = 2
  • Fraction: fraction of transactions that contain an itemset
    • E.g. s({Milk, Bread, Diaper}) = 2/5 = 40%
• Frequent Itemset
  • An itemset whose support is greater than or equal to a minsup threshold

TID  Items
1    Bread, Milk
2    Bread, Diaper, Beer, Eggs
3    Milk, Diaper, Beer, Coke
4    Bread, Milk, Diaper, Beer
5    Bread, Milk, Diaper, Coke

Mining Frequent Itemsets task
• Input: market-basket data, a threshold minsup
• Output: all frequent itemsets with support ≥ minsup

• Problem parameters:
  • N (size): number of transactions
    • Walmart: billions of baskets per year
    • Web: billions of pages
  • d (dimension): number of (distinct) items
    • Walmart sells more than 100,000 items
    • Web: billions of words
  • w: max size of a basket
  • M: number of possible itemsets, M = 2^d

The itemset lattice

Representation of all possible itemsets and their relationships:

null
A   B   C   D   E
AB  AC  AD  AE  BC  BD  BE  CD  CE  DE
ABC  ABD  ABE  ACD  ACE  ADE  BCD  BCE  BDE  CDE
ABCD  ABCE  ABDE  ACDE  BCDE
ABCDE

Given d items, there are 2^d possible itemsets.
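As a quick check of the 2^d claim, the following sketch (illustrative, not from the slides) enumerates the full lattice over the five items A–E:

```python
from itertools import chain, combinations

# Enumerate every itemset over d = 5 items, including the empty set (null).
items = "ABCDE"
lattice = list(chain.from_iterable(
    combinations(items, k) for k in range(len(items) + 1)))
print(len(lattice))  # 32 = 2^5
```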

A Naïve Algorithm
• Brute-force approach: every itemset is a candidate:
  • Consider all itemsets in the lattice, and scan the data for each candidate to compute its support, OR
  • Scan the data and, for each transaction, generate all possible itemsets it contains; keep a count for each itemset.

• Expensive since M = 2^d !!!

• No solution that considers all candidates is acceptable!

N transactions (max width w), matched against a list of M candidates:

TID  Items
1    Bread, Milk
2    Bread, Diaper, Beer, Eggs
3    Milk, Diaper, Beer, Coke
4    Bread, Milk, Diaper, Beer
5    Bread, Milk, Diaper, Coke
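A hedged sketch of the second brute-force variant (scan the data, generate every subset of each transaction, and count), reusing the `transactions` list from the earlier sketch. A basket of width w already yields 2^w - 1 itemsets, which is why no exhaustive approach scales:

```python
from collections import Counter
from itertools import combinations

def brute_force_counts(transactions):
    """Count every non-empty itemset that appears in any transaction."""
    counts = Counter()
    for t in transactions:
        items = sorted(t)
        # 2^w - 1 itemsets per transaction of width w: exponential blow-up.
        for k in range(1, len(items) + 1):
            for itemset in combinations(items, k):
                counts[itemset] += 1
    return counts

counts = brute_force_counts(transactions)
print(counts[("Beer", "Diaper")])  # 3
```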

Computation Model
• Typically, data is kept in flat files rather than in a database system.
  • Stored on disk.
  • Stored basket-by-basket.
• We can expand a basket into pairs, triples, etc. as we read the data.
  • Use k nested loops, or recursion, to generate all itemsets of size k.

• Data is too large to be loaded into memory.

Example file: retail

Example: items are positive integers, and each basket corresponds to a line in the file of space-separated integers.

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29
30 31 32
33 34 35
36 37 38 39 40 41 42 43 44 45 46
38 39 47 48
38 39 48 49 50 51 52 53 54 55 56 57 58
32 41 59 60 61 62
3 39 48
63 64 65 66 67 68
32 69
48 70 71 72
39 73 74 75 76 77 78 79
36 38 39 41 48 79 80 81
82 83 84
41 85 86 87 88
39 48 89 90 91 92 93 94 95 96 97 98 99 100 101
36 38 39 48 89
39 41 102 103 104 105 106 107 108
38 39 41 109 110
39 111 112 113 114 115 116 117 118
119 120 121 122 123 124 125 126 127 128 129 130 131 132 133
48 134 135 136
39 48 137 138 139 140 141 142 143 144 145 146 147 148 149
39 150 151 152
38 39 56 153 154 155
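A minimal reader for a file in this format might look like the sketch below (illustrative; the filename `retail.txt` is an assumption, not given in the slides). Streaming one basket at a time matches the pass-based access pattern discussed next:

```python
def read_baskets(path):
    """Yield each non-empty line of the file as a set of integer item IDs."""
    with open(path) as f:
        for line in f:
            if line.strip():
                yield {int(tok) for tok in line.split()}

# Example usage: count the baskets that contain item 39.
# n39 = sum(1 for basket in read_baskets("retail.txt") if 39 in basket)
```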

Computation Model – (2)
• The true cost of mining disk-resident data is usually the number of disk I/Os.
• In practice, association-rule algorithms read the data in passes – all baskets are read in turn.

• Thus, we measure the cost by the number of passes an algorithm takes.

Main-Memory Bottleneck
• For many frequent-itemset algorithms, main memory is the critical resource.
  • As we read baskets, we need to count something, e.g., occurrences of pairs.
  • The number of different things we can count is limited by main memory.
  • Swapping counts in/out is too slow.
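To see why pair counting is the memory-critical step, here is an illustrative single-pass sketch (an assumption-laden example, not a prescribed implementation): with n distinct items the dictionary can grow to roughly C(n, 2) ≈ n²/2 keys, which is exactly what outgrows main memory.

```python
from collections import defaultdict
from itertools import combinations

def count_pairs(baskets):
    """One pass over the baskets, counting co-occurring pairs in memory."""
    pair_counts = defaultdict(int)
    for basket in baskets:
        # Up to C(n, 2) distinct keys across all baskets: the bottleneck.
        for pair in combinations(sorted(basket), 2):
            pair_counts[pair] += 1
    return pair_counts

# Only the counts live in RAM; the baskets themselves stream from disk,
# e.g. count_pairs(read_baskets("retail.txt")) using the earlier reader.
```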

Computational Model – Summary
• There are two quantities that capture the cost of our algorithm:
  1. [Time] The number of passes we make over the data
  2. [Space] The amount of memory we use

• As is usually the case, there is a trade-off between the two.

The Apriori Principle
• Apriori principle (main observation):
  – If an itemset is frequent, then all of its subsets must also be frequent.
  – If an itemset is not frequent, then none of its supersets can be frequent.
  – The support of an itemset never exceeds the support of its subsets:
      ∀X, Y: X ⊆ Y ⇒ s(X) ≥ s(Y)
  – This is known as the anti-monotone property of support.

Illustration of the Apriori principle

[Figure: in the itemset lattice, an itemset found to be frequent, together with all of its subsets, which must also be frequent.]

Illustration of the Apriori principle

null
A   B   C   D   E
AB  AC  AD  AE  BC  BD  BE  CD  CE  DE
ABC  ABD  ABE  ACD  ACE  ADE  BCD  BCE  BDE  CDE
ABCD  ABCE  ABDE  ACDE  BCDE
ABCDE

If an itemset (e.g., AB) is found to be infrequent, all of its supersets are infrequent and are pruned from the lattice.

The Apriori algorithm

Level-wise approach, where Ck = candidate itemsets of size k and Lk = frequent itemsets of size k:

1. k = 1, C1 = all items
2. While Ck is not empty:
3.   [Frequent itemset generation] Scan the database to find which itemsets in Ck are frequent, and put them into Lk
4.   [Candidate generation] Generate the candidate itemsets Ck+1 of size k+1 using Lk
5.   k = k+1

R. Agrawal, R. Srikant: “Fast Algorithms for Mining Association Rules”, Proc. of the 20th Int'l Conference on Very Large Databases, 1994.
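A compact, runnable sketch of this level-wise loop (an illustration using the ordered-tuple representation introduced later in these slides, not the authors' original code):

```python
from itertools import combinations

def apriori(transactions, minsup):
    """Level-wise frequent-itemset mining; itemsets are sorted tuples."""
    transactions = [set(t) for t in transactions]
    items = sorted({i for t in transactions for i in t})
    Ck = [(i,) for i in items]          # step 1: C1 = all items
    frequent = {}
    while Ck:                           # step 2
        # Step 3: scan the database, keep candidates with support >= minsup.
        Lk = {}
        for c in Ck:
            s = sum(1 for t in transactions if set(c) <= t)
            if s >= minsup:
                Lk[c] = s
        frequent.update(Lk)
        # Step 4: join Lk itemsets sharing their first k-1 items, then
        # prune any candidate that has an infrequent k-subset.
        Ck = []
        for a, b in combinations(sorted(Lk), 2):
            if a[:-1] == b[:-1]:
                cand = a + (b[-1],)
                if all(sub in Lk for sub in combinations(cand, len(a))):
                    Ck.append(cand)     # step 5: continue with k+1
    return frequent

# On the example market-basket data with minsup = 3, this finds the four
# frequent items and the four frequent pairs shown earlier; the only
# candidate triplet {Bread, Milk, Diaper} has support 2 and is rejected.
```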

Candidate Generation
• Apriori principle: an itemset of size k+1 is a candidate to be frequent only if all of its subsets of size k are known to be frequent.

• Candidate generation:
  • Construct a candidate of size k+1 by combining frequent itemsets of size k.
  • If k = 1, take all pairs of frequent items.
  • If k > 1, join pairs of itemsets that differ by just one item.
  • For each generated candidate itemset, ensure that all subsets of size k are frequent.

Generate Candidates Ck+1
• Assumption: the items in an itemset are ordered
  • Integers in increasing order, strings lexicographically
  • The order ensures that if item y > x appears before x, then x is not in the itemset
• The itemsets in Lk are also ordered

Create a candidate itemset of size k+1 by joining two itemsets of size k that share the first k-1 items.

Item1  Item2  Item3
1      2      3
1      2      5
1      4      5

Generate Candidates Ck+1
Joining {1, 2, 3} and {1, 2, 5}, which share the first k-1 = 2 items, produces the candidate {1, 2, 3, 5}.

Generate Candidates Ck+1
Are we missing something? What about the candidate {1, 2, 4, 5}?
It is not produced by the join, and correctly so: if {1, 2, 4, 5} were frequent, its subset {1, 2, 4} would also have to be frequent, but {1, 2, 4} is not in Lk.

Example
• L3 = {abc, abd, acd, ace, bcd}
• Generating candidate set C4
• Self-join: L3 * L3

item1  item2  item3      item1  item2  item3
a      b      c          a      b      c
a      b      d          a      b      d
a      c      d          a      c      d
a      c      e          a      c      e
b      c      d          b      c      d


Example
• Self-join condition: p.item1 = q.item1, p.item2 = q.item2, p.item3 < q.item3
• Joining {a, b, c} with {a, b, d} gives {a, b, c, d}, so C4 = {abcd}

Example
• Joining {a, c, d} with {a, c, e} gives {a, c, d, e}, so after the self-join C4 = {abcd, acde}
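The same join-plus-prune step as a standalone sketch, run on the L3 of this example (illustrative code, not from the slides). Note that the slides show the self-join output only; the subsequent prune step removes acde, because some of its 3-subsets are not in L3:

```python
from itertools import combinations

def generate_candidates(Lk):
    """Self-join k-itemsets sharing their first k-1 items, then prune any
    candidate with an infrequent k-subset (the Apriori principle)."""
    Lk = sorted(Lk)
    Lk_set = set(Lk)
    k = len(Lk[0])
    candidates = []
    for p, q in combinations(Lk, 2):
        if p[:-1] == q[:-1]:          # p and q share the first k-1 items
            cand = p + (q[-1],)
            if all(s in Lk_set for s in combinations(cand, k)):
                candidates.append(cand)
    return candidates

L3 = [tuple("abc"), tuple("abd"), tuple("acd"), tuple("ace"), tuple("bcd")]
print(generate_candidates(L3))
# [('a', 'b', 'c', 'd')] -- abcd survives; acde is generated by the join
# but pruned, since its 3-subsets ade and cde are not in L3.
```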

Number of Combinations

[Figure: the number of possible itemsets grows exponentially with the number of items d.]


Illustration of the Apriori principle

minsup = 3

TID  Items
1    Bread, Milk
2    Bread, Diaper, Beer, Eggs
3    Milk, Diaper, Beer, Coke
4    Bread, Milk, Diaper, Beer
5    Bread, Milk, Diaper, Coke

Items (1-itemsets):
Item    Count
Bread   4
Coke    2
Milk    4
Beer    3
Diaper  4
Eggs    1

Pairs (2-itemsets); no need to generate candidates involving Coke or Eggs:
Itemset         Count
{Bread,Milk}    3
{Bread,Beer}    2
{Bread,Diaper}  3
{Milk,Beer}     2
{Milk,Diaper}   3
{Beer,Diaper}   3

Triplets (3-itemsets):
Itemset              Count
{Bread,Milk,Diaper}  2

Only this triplet has all of its subsets frequent, but it is below the minsup threshold.

If every subset were considered: C(6,1) + C(6,2) + C(6,3) = 6 + 15 + 20 = 41 candidates.
With support-based pruning: 6 + 6 + 1 = 13 candidates.
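These counts can be verified directly (an illustrative one-liner using Python's `math.comb`):

```python
from math import comb

print(comb(6, 1) + comb(6, 2) + comb(6, 3))  # 41 candidates without pruning
print(6 + 6 + 1)                             # 13 with support-based pruning
```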
