
Frequent Item-set Mining Methods

Prepared By: Mr. Nilesh Magar


Data Mining:

Data mining is the efficient discovery of valuable, non-obvious information from a large collection of data.


Frequent Item-set:

• One of the most important concepts in data mining
• Item-set & frequent item-set
• Market Basket model


Example of Market Basket Model:

B1 = {m, c, b}    B2 = {m, p, j}       B3 = {m, b}      B4 = {c, j}
B5 = {m, p, b}    B6 = {m, c, b, j}    B7 = {c, b, j}   B8 = {b, c}

Suppose min support = 3.

Frequent item-sets: {m:5}, {c:5}, {b:6}, {j:4}, {m, b:4}, {c, b:4}, {j, c:3}
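A minimal Python sketch of support counting on these baskets (all names here are illustrative, not from any library):

from collections import Counter
from itertools import combinations

baskets = [
    {"m", "c", "b"}, {"m", "p", "j"}, {"m", "b"}, {"c", "j"},
    {"m", "p", "b"}, {"m", "c", "b", "j"}, {"c", "b", "j"}, {"b", "c"},
]
min_support = 3

# Count every 1- and 2-item subset of every basket.
counts = Counter()
for basket in baskets:
    for k in (1, 2):
        for itemset in combinations(sorted(basket), k):
            counts[itemset] += 1

frequent = {s: c for s, c in counts.items() if c >= min_support}
print(frequent)  # ('b',): 6, ('m', 'b'): 4, ('c', 'j'): 3, ...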



Association Rule:

• Example: a medical diagnosis dataset relating symptoms to illnesses.

• A rule is defined as an implication of the form X ⇒ Y, where X, Y ⊆ I (the set of items). In other words, if {i1, i2, …, ik} ⇒ j, then a basket that contains all of i1, …, ik is likely to also contain j.

• The probability of finding Y, required for us to accept the rule, is called the confidence of the rule:

• conf(X ⇒ Y) = supp(X ∪ Y) / supp(X)

• {m, b} ⇒ c : confidence = 2/4 = 50%

• Thus association mining is a 2-step process (a confidence sketch follows below):
  1. Find all frequent item-sets.
  2. Generate strong association rules from the frequent item-sets.
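A short sketch of the confidence computation, reusing the baskets list from the sketch above (the helper name support is illustrative):

def support(itemset, baskets):
    # Number of baskets containing every item of `itemset`.
    return sum(1 for b in baskets if itemset <= b)

X, Y = {"m", "b"}, {"c"}
conf = support(X | Y, baskets) / support(X, baskets)
print(conf)  # 2/4 = 0.5, i.e. the 50% shown above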
The Apriori Algorithm

• Mines frequent item-sets for Boolean association rules.

• Uses prior knowledge (the Apriori property).

• Iterative approach known as level-wise search: k-item-sets are used to explore (k+1)-item-sets.

• One full scan of the database is required to find each Lk: L1 holds the items with minimum support, L2 is built by generating 2-item-sets, and so on.


• Two steps (a sketch of both follows below):

Join
To find Lk, a set of candidate k-itemsets Ck is generated by joining Lk-1 with itself.

Prune
To reduce the size of Ck, the Apriori property is used: if any (k-1)-subset of a candidate k-itemset is not in Lk-1, then the candidate cannot be frequent either, so it can be removed from Ck (subset testing).
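A minimal sketch of the two steps (the name apriori_gen follows textbook convention; itemsets are kept as sorted tuples):

from itertools import combinations

def apriori_gen(L_prev, k):
    # Generate candidate k-itemsets Ck from the frequent (k-1)-itemsets L_prev.
    L_prev = set(L_prev)
    candidates = set()
    # Join: merge pairs of (k-1)-itemsets that share their first k-2 items.
    for a in L_prev:
        for b in L_prev:
            if a[:-1] == b[:-1] and a[-1] < b[-1]:
                candidates.add(a + (b[-1],))
    # Prune: drop any candidate with an infrequent (k-1)-subset.
    return {c for c in candidates
            if all(sub in L_prev for sub in combinations(c, k - 1))}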



Join & prune
Step

Prepared By- Mr.Nilesh Magar


Example:

TID    List of item_IDs
T100   I1, I2, I5
T200   I2, I4
T300   I2, I3
T400   I1, I2, I4
T500   I1, I3
T600   I2, I3
T700   I1, I3
T800   I1, I2, I3, I5
T900   I1, I2, I3
• Scan D for the count of each candidate:
  C1: I1 – 6, I2 – 7, I3 – 6, I4 – 2, I5 – 2
• Compare each candidate's support count with the minimum support count (min_sup = 2):
  L1: I1 – 6, I2 – 7, I3 – 6, I4 – 2, I5 – 2
• Generate C2 candidates from L1 and scan D for the count of each candidate:
  C2: {I1, I2} – 4, {I1, I3} – 4, {I1, I4} – 1, …
• Compare candidate support counts with the minimum support count:
  L2: {I1, I2} – 4, {I1, I3} – 4, {I1, I5} – 2, {I2, I3} – 4, {I2, I4} – 2, {I2, I5} – 2
• Generate C3 candidates from L2 using the join and prune steps:
  Join: C3 = L2 ⋈ L2 = {{I1, I2, I3}, {I1, I2, I5}, {I1, I3, I5}, {I2, I3, I4}, {I2, I3, I5}, {I2, I4, I5}}
  Prune: C3 = {{I1, I2, I3}, {I1, I2, I5}}
• Scan D for the count of each candidate:
  C3: {I1, I2, I3} – 2, {I1, I2, I5} – 2
• Compare candidate support counts with the minimum support count:
  L3: {I1, I2, I3} – 2, {I1, I2, I5} – 2
• Generate C4 candidates from L3:
  C4 = L3 ⋈ L3 = {{I1, I2, I3, I5}}
  This itemset is pruned because its subset {I2, I3, I5} is not frequent, so C4 = ∅.
  (A runnable version of this walk-through appears below.)
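A hedged sketch of the run above, reusing apriori_gen from the earlier sketch (D and min_sup mirror this example; variable names are illustrative):

D = [{"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"}, {"I1", "I2", "I4"},
     {"I1", "I3"}, {"I2", "I3"}, {"I1", "I3"}, {"I1", "I2", "I3", "I5"},
     {"I1", "I2", "I3"}]
min_sup = 2

# L1: frequent 1-itemsets, kept as sorted tuples.
items = sorted({i for t in D for i in t})
L = {(i,) for i in items if sum(i in t for t in D) >= min_sup}
k = 2
while L:
    print(sorted(L))                                  # L1, then L2, then L3
    C = apriori_gen(L, k)                             # join + prune
    L = {c for c in C if sum(set(c) <= t for t in D) >= min_sup}  # scan D
    k += 1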



Generating Association Rules from Frequent Item-sets (from slide 5):

1. Find the frequent item-sets from the transactions in a database D.

2. Generate strong association rules:
   confidence(A ⇒ B) = P(B|A) = support_count(A ∪ B) / support_count(A)
   where support_count(A ∪ B) is the number of transactions containing the itemset A ∪ B, and support_count(A) is the number of transactions containing the itemset A.


• Example: let l = {I1, I2, I5}.
• Its nonempty proper subsets are {I1, I2}, {I1, I5}, {I2, I5}, {I1}, {I2}, {I5}.
• Generating association rules (see the sketch below):

  I1 ∧ I2 ⇒ I5   conf = 2/4 = 50%
  I1 ∧ I5 ⇒ I2   conf = 2/2 = 100%
  I2 ∧ I5 ⇒ I1   conf = 2/2 = 100%
  I1 ⇒ I2 ∧ I5   conf = 2/6 = 33%
  I2 ⇒ I1 ∧ I5   conf = 2/7 = 29%
  I5 ⇒ I1 ∧ I2   conf = 2/2 = 100%

  If min_conf is 70%, only the second, third, and last rules above are output.
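A small sketch of this rule-generation step, reusing the transaction list D from the Apriori run above (rules_from is an illustrative helper, not a library call):

from itertools import combinations

def rules_from(l, D, min_conf):
    l = frozenset(l)
    supp_l = sum(l <= t for t in D)
    for r in range(1, len(l)):                 # every nonempty proper subset A
        for A in map(frozenset, combinations(l, r)):
            conf = supp_l / sum(A <= t for t in D)
            if conf >= min_conf:
                print(set(A), "=>", set(l - A), f"conf = {conf:.0%}")

rules_from({"I1", "I2", "I5"}, D, min_conf=0.70)
# Prints the three 100%-confidence rules listed above.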



Advantages & Disadvantages:

 Adv: Dis-Adv:
1) Assumes
 1) Uses Large item-set
transaction
Property database is
 2) Easily parallelized memory resident
Requires up to ‘m’
 3) Easy to implement
database scan.

Prepared By- Mr.Nilesh Magar


Mining Frequent Itemsets without Candidate Generation

The candidate generate-and-test method:
• may need to generate a huge number of candidate sets
• may need to repeatedly scan the database and check a large set of candidates by pattern matching

The frequent-pattern growth method (FP-growth) avoids both problems by compressing the database into a frequent-pattern tree (FP-tree).


Example:

TID    List of item_IDs
T100   I1, I2, I5
T200   I2, I4
T300   I2, I3
T400   I1, I2, I4
T500   I1, I3
T600   I2, I3
T700   I1, I3
T800   I1, I2, I3, I5
T900   I1, I2, I3
Step 1: Count each item. Step 2: Arrange each transaction's items in descending frequency order (a short sketch of both steps follows the tables).

Item   Count
I1     6
I2     7
I3     6
I4     2
I5     2

TID    List of items (before)   List of items (after)
T100   I1, I2, I5               I2, I1, I5
T200   I2, I4                   I2, I4
T300   I2, I3                   I2, I3
T400   I1, I2, I4               I2, I1, I4
T500   I1, I3                   I1, I3
T600   I2, I3                   I2, I3
T700   I1, I3                   I1, I3
T800   I1, I2, I3, I5           I2, I1, I3, I5
T900   I1, I2, I3               I2, I1, I3
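A short sketch of steps 1-2, reusing D from the Apriori example (min_sup = 2; names are illustrative):

from collections import Counter

counts = Counter(i for t in D for i in t)            # Step 1: count items

def reorder(t):
    # Step 2: keep frequent items, sorted by descending global count.
    return sorted((i for i in t if counts[i] >= 2),
                  key=lambda i: (-counts[i], i))

print([reorder(t) for t in D])  # T100 -> ['I2', 'I1', 'I5'], T200 -> ['I2', 'I4'], ...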


FP-Tree (figure: the FP-tree built from the reordered transactions above)


Item   Conditional Pattern Base          Conditional FP-tree    Frequent Patterns Generated
I5     {{I2, I1: 1}, {I2, I1, I3: 1}}    (I2:2, I1:2)           {I2, I5: 2}, {I1, I5: 2}, {I2, I1, I5: 2}
I4     {{I2, I1: 1}, {I2: 1}}            (I2:2)                 {I2, I4: 2}
I3     {{I2, I1: 2}, {I2: 2}, {I1: 2}}   (I2:4, I1:2), (I1:2)   {I2, I3: 4}, {I1, I3: 4}, {I2, I1, I3: 2}
I1     {{I2: 4}}                         (I2:4)                 {I2, I1: 4}
Mining Frequent Itemsets Using Vertical Data Format

• Transform the horizontal data format of the transaction database D into a vertical data format (a sketch follows the table):

Itemset   TID_set
I1        {T100, T400, T500, T700, T800, T900}
I2        {T100, T200, T300, T400, T600, T800, T900}
I3        {T300, T500, T600, T700, T800, T900}
I4        {T200, T400}
I5        {T100, T800}
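A minimal sketch of the vertical-format idea, reusing D from above: supports come from TID-set intersections rather than repeated database scans (an Eclat-style step; names are illustrative):

tids = {}
for n, t in enumerate(D, start=1):
    for item in t:
        tids.setdefault(item, set()).add(f"T{n}00")  # T100 ... T900

# support({I1, I2}) = |TID(I1) ∩ TID(I2)|
both = tids["I1"] & tids["I2"]
print(sorted(both), len(both))  # ['T100', 'T400', 'T800', 'T900'] 4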
Example for Practice (dataset given as a figure)

Minimum support threshold = 3


TID   List of items (after ordering)
T1    f, c, a, m, p
T2    f, c, a, b, m
T3    f, b
T4    c, b, p
T5    f, c, a, p, m


FP-Growth Example

Header Table
Item   Frequency
f      4
c      4
a      3
b      3
m      3
p      3

FP-tree (figure): from the null root {}, one branch runs f:4 → c:3 → a:3 → m:2 → p:2, with side branches a:3 → b:1 → m:1 and f:4 → b:1; a second branch runs c:1 → b:1 → p:1. Header-table entries link to all nodes of their item.
FP-Growth Example

FP-Growth Example

Item   Conditional Pattern Base   Conditional FP-tree
p      {(fcam:2), (cb:1)}         {(c:3)} | p
m      {(fca:2), (fcab:1)}        {(f:3, c:3, a:3)} | m
b      {(fca:1), (f:1), (c:1)}    Empty
a      {(fc:3)}                   {(f:3, c:3)} | a
c      {(f:3)}                    {(f:3)} | c
f      Empty                      Empty


FP-Tree Algorithm:

Input: DB, min_support
Output: FP-tree

1. Scan DB and count all frequent items.
2. Create the null root and set it as the current node.
3. For each transaction T:
   • Sort T's items in descending frequency order, discarding infrequent ones, and reset the current node to the root.
   • For each sorted item I:
     • Insert I into the tree as a child of the current node (if that child already exists, just increment its count), and make it the new current node.
     • Connect each newly created tree node to its item's header list.

A construction sketch follows below.
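A minimal construction sketch following the steps above (the Node class and header layout are illustrative, not a library API):

class Node:
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count, self.children, self.link = 0, {}, None

def build_fp_tree(transactions, counts, min_sup):
    root, header = Node(None, None), {}
    for t in transactions:
        items = sorted((i for i in t if counts[i] >= min_sup),
                       key=lambda i: (-counts[i], i))
        node = root                            # restart at the root per transaction
        for item in items:
            if item not in node.children:
                child = Node(item, node)
                child.link = header.get(item)  # thread into the item's header list
                header[item] = child
                node.children[item] = child
            node = node.children[item]
            node.count += 1                    # shared prefixes just gain count
    return root, header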


FP-Growth Algorithm:

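A minimal recursive mining sketch, reusing Node and build_fp_tree from the construction sketch above: for each item, walk its header chain to collect the conditional pattern base, then recurse on the conditional FP-tree (structure and names are illustrative):

from collections import Counter

def fp_growth(header, min_sup, suffix=()):
    for item, node in header.items():
        pattern = (item,) + suffix
        support, base = 0, []
        while node is not None:                # walk the item's header chain
            support += node.count
            path, p = [], node.parent
            while p is not None and p.item is not None:
                path.append(p.item)            # prefix path up to the root
                p = p.parent
            base.extend([path] * node.count)   # weight the path by its count
            node = node.link
        if support >= min_sup:
            print(pattern, support)            # a frequent pattern
            counts = Counter(i for path in base for i in path)
            _, cond_header = build_fp_tree(base, counts, min_sup)
            fp_growth(cond_header, min_sup, pattern)

# Usage on the 9-transaction example:
counts = Counter(i for t in D for i in t)
_, header = build_fp_tree(D, counts, 2)
fp_growth(header, 2)   # prints each frequent pattern with its support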


Adv. & disAdv. Of FP- Growth:

Adv:
1) Only 2 Passes Over Data-set
2) No Candidate Generation
3) Much Faster Than Apriori

DisAdv:
• FP-Tree may not fit in memory.
• FP-Tree is expensive to build

Prepared By- Mr.Nilesh Magar


Subjects
1) U.M.L.
2) P.P.L.
3) D.M.D.W.
4) O.S.
5) Programming Languages
6) RDBMS

Mr. Nilesh Magar
Lecturer at MIT, Kothrud, Pune.
9975155310
Thank You
