Module 3 - Part 2 - Frequency Pattern Mining
5 {I,O} - Total 1
7 {M} - Total 1
8 Empty set
• With Apriori, the number of candidate itemsets processed is about 40% of the possible
itemsets.
FPM – Apriori – Mining Frequent Itemsets
• The structure used to illustrate the principle of Apriori is
called an “enumeration tree”.
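The level-wise walk over the enumeration tree can be sketched in Python. This is a minimal illustration, not the method from the slides: the function name `apriori`, the parameter `min_count` (an absolute support count), and the toy transactions are all assumptions made for the example.

```python
from itertools import combinations

def apriori(transactions, min_count):
    """Return {itemset: support count} for all frequent itemsets (sketch)."""
    items = sorted({i for t in transactions for i in t})
    level = [frozenset([i]) for i in items]  # level 1 of the enumeration tree
    frequent = {}
    k = 1
    while level:
        # Count how many transactions contain each candidate of size k.
        counts = {c: sum(1 for t in transactions if c <= t) for c in level}
        survivors = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(survivors)
        k += 1
        # Join frequent itemsets into candidates that are one item larger.
        candidates = {a | b for a in survivors for b in survivors
                      if len(a | b) == k}
        # Apriori pruning: keep a candidate only if all its subsets
        # of size k - 1 are frequent.
        level = [c for c in candidates
                 if all(frozenset(sub) in survivors
                        for sub in combinations(c, k - 1))]
    return frequent

# Hypothetical toy transactions (not taken from the slides).
toy_transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"},
                    {"b", "c"}, {"a", "b", "c"}]
result = apriori(toy_transactions, min_count=3)
print(len(result))  # 6: three single items and three pairs
```

Here {a, b, c} is generated as a candidate because all of its pairs are frequent, but it is contained in only 2 transactions and is therefore discarded.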
Practice: Thus the rule X → Y – X does not meet the min_conf threshold
Try with
b) Rule X’ → Y – X’ is to be seen as { I } → {M, O} with confidence calculated as
X’={O}
confidence( X’
• Thus the same rules that were strong when the groups are analyzed separately will
become much weaker when combined.
Other Types of Patterns - Sequences
• Other types of patterns include sequences and graphs.
• Their mining is based on similar principles as were presented for Frequent itemset
mining.
• The approaches for mining frequent sequences and graphs are mostly extensions and
modifications of Apriori, Eclat, and FP-Growth methods.
• Here only the basic definitions are considered, focusing on sequential patterns.
• The input to sequential pattern mining is a sequence database, denoted by S.
• Each row consists of a sequence of events consecutively recorded in time.
• Each event is an itemset of arbitrary length assembled from items available in the data.
• Let us have two sequences S1 = <X1, X2, …, Xn> and S2 = <Y1, Y2, …, Ym>, where
n <= m. Then S1 is called a “subsequence” of S2 if there exist indices
1 <= i1 < i2 < … < in <= m such that X1 is a subset of Yi1, X2 is a subset of Yi2, …,
and Xn is a subset of Yin.
• The support of a given sequence s in the sequence database S is the number of rows in
S of which s is a subsequence; it is often reported as a fraction of the total number of rows.
• E.g., consider the purchasing pattern of customer id = 1 as below:
• 1st visit: items a & b; 2nd visit: items a, b & c; 3rd visit: items a, c, d, e; 4th visit: items b & f
• Let the sequence be s2 = <Y1, Y2, Y3, Y4>, where Y1 = {a,b}, Y2 = {a,b,c}, Y3 = {a,c,d,e}
and Y4 = {b,f}, having m = 4
• Let s1 = <X1, X2>, where X1 = {b} and X2 = {a,d,e}, having n = 2
• It can be observed that there exist indices i1 = 1 and i2 = 3 such that X1 is a subset of Y1
and X2 is a subset of Y3. Therefore s1 is a subsequence of s2.
• Similarly, another mapping exists (i1 = 2, i2 = 3), where X1 is a subset of Y2 and X2 is a subset of Y3.
• An example of the calculation of support for a sequence is shown on the next slide.
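The subsequence test can be sketched in Python as follows. The function name `find_embedding` is an assumption made for this illustration; it returns the 1-based indices i1 < i2 < … < in of the earliest mapping, or None if no mapping exists.

```python
def find_embedding(s1, s2):
    """Return 1-based indices [i1, ..., in] with each event of s1 a subset
    of s2[i_k - 1], or None if s1 is not a subsequence of s2.
    Each event is matched greedily to the earliest possible position."""
    indices = []
    j = 0
    for x in s1:
        # Advance to the first remaining event of s2 that contains x.
        while j < len(s2) and not x <= s2[j]:
            j += 1
        if j == len(s2):
            return None
        indices.append(j + 1)
        j += 1  # the next event of s1 must map to a strictly later position
    return indices

# The customer example: s2 has m = 4 events, s1 has n = 2 events.
s2 = [{"a", "b"}, {"a", "b", "c"}, {"a", "c", "d", "e"}, {"b", "f"}]
s1 = [{"b"}, {"a", "d", "e"}]
print(find_embedding(s1, s2))  # [1, 3]
```

Greedy matching is sufficient for a yes/no answer: if any valid mapping exists, shifting each index to the earliest possible position still gives a valid mapping.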
• Consider the sequence database S with items a, b, c, d, e, f, as shown alongside.
From the table it can be seen that the support of the sequence
in the database is 4/5 = 0.8, i.e. it is a subsequence of 4 of the 5 rows.
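The support calculation can be sketched in Python. The database below is a hypothetical toy example, not the table from the slide, and the function names `is_subsequence` and `support` are assumptions made for this illustration.

```python
def is_subsequence(s1, s2):
    """True if every event of s1 maps, in order, to a superset event of s2."""
    j = 0
    for x in s1:
        while j < len(s2) and not x <= s2[j]:
            j += 1
        if j == len(s2):
            return False
        j += 1
    return True

def support(s, database):
    """Fraction of rows of the database of which s is a subsequence."""
    count = sum(1 for row in database if is_subsequence(s, row))
    return count / len(database)

# Hypothetical toy database: each row is a sequence of events (itemsets).
toy_db = [
    [{"a", "b"}, {"c"}],        # contains <{a}, {c}>
    [{"c"}, {"a"}],             # wrong order, so it does not
    [{"a"}, {"b"}, {"c", "d"}], # contains <{a}, {c}>
    [{"b"}, {"d"}],             # no a or c at all
]
print(support([{"a"}, {"c"}], toy_db))  # 0.5
```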
Sequences: Frequent sequence mining
• Let I be the set of all available items, S the sequence database, and min_sup the
threshold value.
• The aim of frequent sequence mining is to find those sequences, called frequent sequences,
generated from I whose support in S is at least min_sup.
• Note: The number of frequent sequences generated from S with available items I is usually
greater than the number of itemsets generated from I.
• E.g., the number of possible itemsets with 6 items a, b, c, d, e, f is 2^6 - 1 = 63.
For the table shown on the previous slide:
a) with min_sup = 0.8, frequent sequences = 20
b) with min_sup = 0.4, frequent sequences = 237
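The count of 63 itemsets can be checked with a short Python sketch, which also shows why the space of sequences is larger than the space of itemsets. The helper `count_candidate_sequences` is a hypothetical name; it counts candidate sequences (every event is any non-empty itemset), not frequent ones, so it does not reproduce the figures 20 and 237 above.

```python
from itertools import combinations

items = ["a", "b", "c", "d", "e", "f"]

# Enumerate every non-empty itemset explicitly.
itemsets = [frozenset(c)
            for k in range(1, len(items) + 1)
            for c in combinations(items, k)]

def count_candidate_sequences(n_items, max_events):
    """Number of sequences with 1..max_events events, where each event
    is any non-empty itemset over n_items items."""
    n_itemsets = 2 ** n_items - 1
    return sum(n_itemsets ** k for k in range(1, max_events + 1))

print(len(itemsets))                    # 63
print(count_candidate_sequences(6, 2))  # 4032
```

Even when sequences are limited to two events, there are already 63 + 63^2 = 4032 candidates, compared with 63 itemsets.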
Sequences: Closed and maximal sequences
• A frequent sequential pattern is closed if it is not a subsequence of any other frequent
sequential pattern with the same support.
• A frequent sequential pattern is maximal if it is not a subsequence of any other frequent
sequential pattern.
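The two definitions can be sketched in Python as a filter over a set of frequent sequences that has already been mined. The function names and the three toy patterns with their supports are hypothetical and serve only as an illustration.

```python
def is_subsequence(s1, s2):
    """True if every event of s1 maps, in order, to a superset event of s2."""
    j = 0
    for x in s1:
        while j < len(s2) and not x <= s2[j]:
            j += 1
        if j == len(s2):
            return False
        j += 1
    return True

def closed_and_maximal(frequent):
    """frequent maps each frequent sequence (tuple of frozensets) to its support.
    Returns (closed, maximal) as two lists of sequences."""
    closed, maximal = [], []
    for s, sup in frequent.items():
        # All other frequent sequences of which s is a subsequence.
        supers = [t for t in frequent if t != s and is_subsequence(s, t)]
        if not supers:
            maximal.append(s)
        if all(frequent[t] != sup for t in supers):
            closed.append(s)
    return closed, maximal

# Hypothetical frequent sequences with their support counts.
A = (frozenset("a"),)
B = (frozenset("b"),)
AB = (frozenset("a"), frozenset("b"))
toy_frequent = {A: 4, B: 3, AB: 3}

closed, maximal = closed_and_maximal(toy_frequent)
print(closed == [A, AB])  # True: B is absorbed by AB, which has the same support
print(maximal == [AB])    # True: only AB has no frequent super-sequence
```

Every maximal pattern is also closed, since it has no frequent super-sequence at all; the converse does not hold, as pattern A shows.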
End of Module 3