Module 5.1 - Association Rule Mining, Apriori Algorithm, Data Mining, Support, Confidence, Examples

The document discusses frequent pattern mining and the Apriori algorithm. It begins with an overview of frequent itemset mining and its applications. It then covers key concepts like support, confidence and interestingness of association rules. The document explains the steps of the Apriori algorithm, including how it uses the Apriori property to reduce the search space. An example is provided to illustrate how the algorithm works. Finally, techniques to improve the efficiency of Apriori, such as hash-based methods, are discussed.


Data Mining:

Concepts and Techniques

Module 5
Mining Frequent Patterns, Association and
Correlations



Frequent Itemset Mining: Market Basket Analysis
Frequent itemset mining leads to the discovery of associations and correlations among items in large transactional or relational data sets.

The results of market basket analysis (MBA) can be used:

• To plan marketing strategies
• To plan advertising strategies
• To develop new designs (e.g., store layout designs)
• To help retailers plan which items to put on sale at reduced prices
Frequent Itemset Mining: Market Basket Analysis
▪ Frequent patterns can be represented in the form of association rules.

▪ Rule support and confidence are two measures of rule interestingness. They respectively reflect the usefulness and certainty of discovered rules.

▪ A support of 2% for a rule means that 2% of all the transactions under analysis show that computer and antivirus software are purchased together.

▪ A confidence of 60% means that 60% of the customers who purchased a computer also bought the software.

▪ Typically, association rules are considered interesting if they satisfy both a minimum support threshold and a minimum confidence threshold.
Basic Terms of Association Rule Mining

Absolute support: the occurrence frequency of an itemset, i.e., the number of transactions that contain the itemset (also called the support count).

Relative support: the fraction of transactions that contain the itemset, i.e., the absolute support divided by the total number of transactions.
Support and Confidence

• Rules that satisfy both a minimum support threshold (min sup) and a minimum confidence threshold (min conf) are called strong.
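The two measures above can be computed directly from a transaction database. A minimal sketch in Python, over a small hypothetical set of transactions (the items echo the computer/antivirus rule discussed above; the data itself is invented for illustration):

```python
# A minimal sketch of support and confidence over a hypothetical
# transaction database.

def support(itemset, transactions):
    """Relative support: fraction of transactions containing every item."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """confidence(A => B) = support(A ∪ B) / support(A)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

transactions = [
    {"computer", "antivirus"},
    {"computer", "printer"},
    {"computer", "antivirus", "mouse"},
    {"printer", "mouse"},
]

s = support({"computer", "antivirus"}, transactions)        # 2/4 = 0.5
c = confidence({"computer"}, {"antivirus"}, transactions)   # 2/3
```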
Association Rule Mining Steps
■ In general, association rule mining can be viewed as a two-step process:

1. Find all frequent itemsets: By definition, each of these itemsets will occur at least as frequently as a predetermined minimum support count, min sup.

2. Generate strong association rules from the frequent itemsets: By definition, these rules must satisfy minimum support and minimum confidence.
Frequent Itemset, Closed Itemset and Maximal Frequent Itemset
Downward Closure Property of Frequent Patterns

Any subset of a frequent itemset must also be frequent (equivalently, any superset of an infrequent itemset must be infrequent).
Example: Frequent Itemset, Closed Itemset and Maximal Frequent Itemset

2-itemsets: Is C closed? Is it maximal? Is D closed? Is it maximal?

3-itemsets: Is AB closed? Is it maximal? Is AC closed? Is it maximal? Is AD frequent?

4-itemsets: Is ABC closed? Is it maximal? What about ABCD? ABCD is not frequent. Is BCD closed? Is it maximal?
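Once all frequent itemsets and their supports are known, closed and maximal itemsets can be identified mechanically: an itemset is closed if no frequent proper superset has the same support, and maximal if it has no frequent proper superset at all. A brute-force sketch over a small hypothetical database (not the one on the slides, whose contents are not reproduced here):

```python
# A sketch identifying closed and maximal frequent itemsets by brute force,
# over a small hypothetical database with min_sup = 2; illustrative only.
from itertools import combinations

transactions = [{"A", "B", "C"}, {"A", "B", "C", "D"}, {"B", "C", "D"}]
min_sup = 2
items = sorted(set().union(*transactions))

def count(itemset):
    """Absolute support: number of transactions containing the itemset."""
    return sum(1 for t in transactions if set(itemset) <= t)

# All frequent itemsets, mapped to their support counts.
frequent = {frozenset(c): count(c)
            for k in range(1, len(items) + 1)
            for c in combinations(items, k)
            if count(c) >= min_sup}

# Closed: frequent, and no frequent proper superset has the same support.
closed = {s for s, sup in frequent.items()
          if not any(s < t and frequent[t] == sup for t in frequent)}

# Maximal: frequent, and no proper superset is frequent at all.
maximal = {s for s in frequent if not any(s < t for t in frequent)}
```

Here {B, C} is closed but not maximal (its superset {A, B, C} is still frequent, but with lower support), while {A, B, C} and {B, C, D} are both closed and maximal.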
Example

Frequent Itemset Mining Methods:
Apriori Algorithm: Finding Frequent Itemsets by Confined Candidate Generation

■ An algorithm for mining frequent itemsets for Boolean association rules.
■ The algorithm uses prior knowledge of frequent itemset properties.
■ Apriori employs an iterative approach known as a level-wise search, where k-itemsets are used to explore (k+1)-itemsets.
■ First, the set of frequent 1-itemsets is found by scanning the database to accumulate the count for each item and collecting those items that satisfy minimum support. The resulting set is denoted by L1.
■ Next, L1 is used to find L2, the set of frequent 2-itemsets, which is used to find L3, and so on, until no more frequent k-itemsets can be found.
■ The finding of each Lk requires one full scan of the database.
■ To improve the efficiency of the level-wise generation of frequent itemsets, the Apriori property (any subset of a frequent itemset must also be frequent) is used to reduce the search space.
“How is the Apriori property used in the algorithm?”

How is Lk-1 used to find Lk, for k ≥ 2?

1. Join step: To find Lk, a set of candidate k-itemsets is generated by joining Lk-1 with itself. This set of candidates is denoted Ck. The join, Lk-1 ⋈ Lk-1, is performed, where members of Lk-1 are joinable if their first (k-2) items are in common.

2. Prune step: Select only the frequent itemsets.
■ Apriori pruning principle: if any itemset is infrequent, its supersets need not be generated or tested.
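The join and prune steps above can be sketched as a single candidate-generation function. This is a minimal illustration, assuming itemsets are stored as sorted tuples; the name apriori_gen is a choice of this sketch, not prescribed by the slides:

```python
# A sketch of Apriori candidate generation (join + prune), with itemsets
# represented as sorted tuples; prev_frequent holds L_{k-1}.
from itertools import combinations

def apriori_gen(prev_frequent, k):
    """Generate candidate k-itemsets from the frequent (k-1)-itemsets."""
    prev = {tuple(sorted(s)) for s in prev_frequent}
    candidates = set()
    for a in prev:
        for b in prev:
            # Join step: first k-2 items agree; a[k-2] < b[k-2] avoids duplicates.
            if a[:k - 2] == b[:k - 2] and a[k - 2] < b[k - 2]:
                cand = a + (b[k - 2],)
                # Prune step: every (k-1)-subset must itself be frequent.
                if all(tuple(sorted(sub)) in prev
                       for sub in combinations(cand, k - 1)):
                    candidates.add(cand)
    return candidates

L2 = [("A", "C"), ("B", "C"), ("B", "E"), ("C", "E")]
C3 = apriori_gen(L2, 3)    # only ("B", "C", "E") survives
```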
The Apriori Algorithm
■ Pseudo-code:
Ck: candidate itemset of size k
Lk: frequent itemset of size k

L1 = {frequent items};
for (k = 1; Lk != ∅; k++) do begin
    Ck+1 = candidates generated from Lk;
    for each transaction t in database do
        increment the count of all candidates in Ck+1 that are contained in t;
    Lk+1 = candidates in Ck+1 with min_support;
end
return ∪k Lk;
The Apriori Algorithm — An Example (min_sup = 2)

Database TDB:
  TID  Items
  10   A, C, D
  20   B, C, E
  30   A, B, C, E
  40   B, E

1st scan — C1 (candidate 1-itemsets with counts): {A}:2, {B}:3, {C}:3, {D}:1, {E}:3
L1 (frequent 1-itemsets): {A}:2, {B}:3, {C}:3, {E}:3

C2 (candidates from L1 ⋈ L1): {A,B}, {A,C}, {A,E}, {B,C}, {B,E}, {C,E}
2nd scan — counts: {A,B}:1, {A,C}:2, {A,E}:1, {B,C}:2, {B,E}:3, {C,E}:2
L2: {A,C}:2, {B,C}:2, {B,E}:3, {C,E}:2

C3: {B,C,E}
3rd scan — L3: {B,C,E}:2

C: candidate itemset; L: frequent itemset
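The trace above can be reproduced with a short script. This is a sketch: it implements only the join step, relying on support counting to discard infrequent candidates, and uses the four transactions from the slide:

```python
# A sketch reproducing the trace above (min_sup = 2 over transactions 10-40).
# Only the join step is implemented; infrequent candidates are eliminated by
# the support count, so the prune step is omitted for brevity.
from collections import Counter

db = {10: {"A", "C", "D"}, 20: {"B", "C", "E"},
      30: {"A", "B", "C", "E"}, 40: {"B", "E"}}
min_sup = 2

def frequent_k(candidates):
    """Scan the database once and keep candidates meeting min_sup."""
    counts = Counter()
    for t in db.values():
        for c in candidates:
            if set(c) <= t:
                counts[c] += 1
    return {c: n for c, n in counts.items() if n >= min_sup}

items = sorted(set().union(*db.values()))
L = frequent_k([(i,) for i in items])          # L1
result = dict(L)
k = 2
while L:
    keys = sorted(L)
    # Join step: merge frequent (k-1)-itemsets sharing their first k-2 items.
    cands = {tuple(sorted(set(a) | set(b)))
             for a in keys for b in keys
             if a[:k - 2] == b[:k - 2] and a < b}
    L = frequent_k({c for c in cands if len(c) == k})
    result.update(L)
    k += 1
```

Running this yields exactly the L1, L2 and L3 shown above, e.g. support 3 for {B, E} and support 2 for {B, C, E}, with {D} and {A, B} correctly absent.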
Example: Apriori Algorithm
Exercise
Exercise 2
Apriori Algorithm
Improving Efficiency of Apriori Algorithm
Improving Efficiency of Apriori Algorithm: Hash-Based Technique
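The hash-based technique can be sketched as follows. This is an illustrative sketch, assuming the four-transaction database from the earlier example and an arbitrary choice of 7 buckets: while scanning for 1-itemset counts, every 2-itemset in each transaction is hashed into a bucket, and because a bucket's total is an upper bound on the support of every pair hashing into it, pairs in light buckets can be pruned from C2 safely.

```python
# A sketch of hash-based candidate pruning: bucket counts over-estimate pair
# supports, so a pair in a bucket below min_sup provably cannot be frequent.
from itertools import combinations
from collections import Counter

db = [{"A", "C", "D"}, {"B", "C", "E"}, {"A", "B", "C", "E"}, {"B", "E"}]
min_sup, n_buckets = 2, 7

buckets = Counter()
for t in db:
    for pair in combinations(sorted(t), 2):
        buckets[hash(pair) % n_buckets] += 1

def may_be_frequent(pair):
    """False only when the (sorted) pair provably cannot reach min_sup."""
    return buckets[hash(pair) % n_buckets] >= min_sup
```

Note the test is one-sided: a heavy bucket does not prove a pair frequent (collisions inflate counts), but a light bucket disproves it, which is all the pruning needs.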
Improving Efficiency of Apriori Algorithm: Transaction Reduction

A transaction that does not contain any frequent k-itemsets cannot contain any frequent (k+1)-itemsets, so it can be removed from further consideration. This method reduces the number of transactions scanned in subsequent iterations.
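A minimal sketch of this reduction; the database extends the earlier four-transaction example with a hypothetical fifth transaction {D, E} so that something is actually dropped:

```python
# A sketch of transaction reduction. db is the four transactions from the
# earlier example plus a hypothetical {D, E}; L2 is the set of frequent
# 2-itemsets for that example (adding {D, E} does not change L2).
db = [{"A", "C", "D"}, {"B", "C", "E"}, {"A", "B", "C", "E"},
      {"B", "E"}, {"D", "E"}]
L2 = [{"A", "C"}, {"B", "C"}, {"B", "E"}, {"C", "E"}]

# A transaction with no frequent 2-itemset cannot contain a frequent
# 3-itemset, so it is dropped before the next scan.
reduced = [t for t in db if any(s <= t for s in L2)]
```

Here {D, E} contains none of the frequent 2-itemsets and is removed; the other four transactions survive into the scan for 3-itemsets.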
Improving Efficiency of Apriori Algorithm: Partitioning

The database is divided into partitions that each fit in main memory; any itemset that is frequent in the whole database must be frequent in at least one partition, so two database scans suffice.
Improving Efficiency of Apriori Algorithm: Dynamic Itemset Counting (DIC)
e. Sampling
■ The fundamental idea of the sampling approach is to select a random sample S of the given data D, and then search for frequent itemsets in S rather than in D.

■ Some globally frequent itemsets may be missed; this risk can be reduced by lowering the minimum support threshold used on the sample.

■ The sample size of S is chosen so that the search for frequent itemsets in S can be completed in main memory, and therefore only one scan of the transactions in S is needed overall.

■ This method trades off some degree of accuracy against efficiency.
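A sketch of the idea follows. The data, thresholds, candidate itemsets, and the final verification scan of D are illustrative choices of this sketch, not prescribed by the slide:

```python
# A sketch of the sampling approach: mine a random sample S of D in main
# memory with a lowered support threshold, then verify candidates against D.
import random
random.seed(0)  # fixed seed so the sketch is repeatable

D = [{"A", "B"}, {"A", "C"}, {"A", "B", "C"}, {"B", "C"}] * 25   # 100 transactions
S = random.sample(D, 20)                                         # the sample

min_sup_D = 0.5   # global relative support threshold
min_sup_S = 0.4   # lowered threshold used on S to reduce misses

def rel_support(itemset, txns):
    return sum(1 for t in txns if itemset <= t) / len(txns)

# Itemsets found frequent in S at the lowered threshold become candidates,
# which one full scan of D then confirms or rejects.
candidates = [frozenset(x) for x in ({"A"}, {"B"}, {"C"}, {"A", "B"})]
found_in_S = {c for c in candidates if rel_support(c, S) >= min_sup_S}
confirmed = {c for c in found_in_S if rel_support(c, D) >= min_sup_D}
```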
Applications of Apriori Algorithm

1. Improved store layout
2. Cross-marketing
3. Focused attached mailings / add-on sales
4. Maintenance agreements (what should the store do to boost maintenance agreement sales?)
5. Home electronics (what other products should the store stock up on?)
6. Pharmacy: finding frequent medicine itemsets
7. Students' trends analysis
Application of Apriori Algorithm in the Real World

1. Education Field

Students in a school differ in many characteristics and personalities: age, gender, name, parents' names, and so on. Association rules mined from such records can reveal patterns among admitted students.

2. Medical Field

Every hospital admits many patients, and every patient has a different disease, receives a different treatment, and has different characteristics and a different medical history. The Apriori algorithm can be used to analyze the patients' database so that information about different patients is not mixed up.
4. New Technology Business Firms

Apriori is used by many companies, such as Flipkart and Amazon, which maintain records of the items purchased by various customers, for recommender systems, and by Google for the autocomplete feature.

5. Offices

One of the most efficient uses of this technique is in offices that record a large number of day-to-day transactions related to the sale and purchase of goods and services, such as transactions with creditors; analyzing all such transactions helps avoid confusion.

6. Mobile e-Commerce
Applications of Apriori Algorithm

Some fields where Apriori is used:

1. Education field: extracting association rules from data on admitted students, based on their characteristics and specialties.
2. Medical field: for example, analysis of a patients' database.
3. Forestry: analysis of the probability and intensity of forest fires using forest fire data.
4. Recommender systems and autocomplete: Apriori is used by many companies, such as Amazon in its recommender system and Google for the autocomplete feature.
FP-Growth (Frequent Pattern Growth) Algorithm
Conditional pattern base (CPB): a truncated version of the transaction database that is pertinent to a particular prefix.
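The idea can be sketched directly from transactions, without building the FP-tree. This uses hypothetical transaction data and assumes items are ordered by descending global frequency (ties broken alphabetically), as FP-growth does:

```python
# A sketch of building a conditional pattern base from raw transactions:
# for a given suffix item, keep the frequency-ordered items that precede it
# in each transaction containing it.
from collections import Counter

db = [["f", "a", "c", "m", "p"], ["f", "c", "a", "b", "m"], ["f", "b"],
      ["c", "b", "p"], ["f", "c", "a", "m", "p"]]

freq = Counter(i for t in db for i in t)
# Global descending-frequency order, ties broken alphabetically.
order = sorted(freq, key=lambda i: (-freq[i], i))

def cpb(suffix):
    """Conditional pattern base: prefix paths co-occurring with `suffix`."""
    base = Counter()
    for t in db:
        if suffix in t:
            ordered = [i for i in order if i in t]
            prefix = tuple(ordered[:ordered.index(suffix)])
            if prefix:
                base[prefix] += 1
    return base

cpb_m = cpb("m")   # e.g. the prefix path (c, f, a) occurs twice for suffix m
```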
ECLAT (Equivalence Class Transformation): Vertical Apriori
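A minimal sketch of the vertical (TID-set) representation ECLAT works on, reusing the four-transaction database from the Apriori example:

```python
# A sketch of ECLAT's vertical data format: each item maps to the set of
# transaction IDs containing it, and the support of an itemset is the size
# of the intersection of its items' TID sets.
db = {10: {"A", "C", "D"}, 20: {"B", "C", "E"},
      30: {"A", "B", "C", "E"}, 40: {"B", "E"}}

# Transform to vertical format: item -> TID set (one pass over the data).
tidsets = {}
for tid, items in db.items():
    for item in items:
        tidsets.setdefault(item, set()).add(tid)

# Supports come from intersections, with no further database scans.
sup_BE = len(tidsets["B"] & tidsets["E"])                    # 3
sup_BCE = len(tidsets["B"] & tidsets["C"] & tidsets["E"])    # 2
```

These match the counts from the Apriori trace: {B, E} has support 3 and {B, C, E} support 2, computed here purely by set intersection.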
Single-Dimensional Association Rules
Multidimensional Association Rules
