Introduction To Data Mining - Lecture03
Madava Viranjan
• The world is rich in data
• Class/Concept Description
• Classes and Concepts can be described in summarized terms
• Mining Frequent Patterns
• Patterns that occur frequently in a dataset
• Classification
• Find a model that describes and distinguishes classes/concepts
• Cluster Analysis
• Objects are grouped to maximize intra-class similarity and minimize
inter-class similarity
• Are all patterns interesting?
• Frequent patterns are patterns that appear frequently in a data set. A
frequent pattern can be a frequent itemset, a frequent sequence, or a
frequent substructure.
• Apriori property
• All nonempty subsets of a frequent itemset must also be frequent
T1 i1, i2, i5
T2 i2, i4
T3 i2, i3
T4 i1, i2, i4
T5 i1, i3
T6 i2, i3
T7 i1, i3
T9 i1, i2, i3
Minimum Support = 2
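The Apriori property can be turned into a level-wise search: count candidate k-itemsets, keep the frequent ones, join them into (k+1)-candidates, and prune any candidate that has an infrequent subset. The following is a minimal sketch of that idea, not the lecture's own implementation. It uses only the eight transactions listed above (no T8 appears in the listing), so the counts reflect those rows alone. The names `support`, `apriori`, and `MIN_SUPPORT` are chosen here for illustration.

```python
from itertools import combinations

# The eight transactions listed above (the listing contains no T8).
transactions = {
    "T1": {"i1", "i2", "i5"},
    "T2": {"i2", "i4"},
    "T3": {"i2", "i3"},
    "T4": {"i1", "i2", "i4"},
    "T5": {"i1", "i3"},
    "T6": {"i2", "i3"},
    "T7": {"i1", "i3"},
    "T9": {"i1", "i2", "i3"},
}
MIN_SUPPORT = 2  # absolute support count, as stated above


def support(itemset, db):
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for items in db.values() if itemset <= items)


def apriori(db, min_support):
    """Return {frozenset: support count} for all frequent itemsets."""
    items = sorted({i for t in db.values() for i in t})
    current = [frozenset([i]) for i in items]
    frequent = {}
    k = 1
    while current:
        # Count the candidates and keep those that meet the threshold.
        counted = {c: support(c, db) for c in current}
        level = {c: n for c, n in counted.items() if n >= min_support}
        frequent.update(level)
        # Join step: union pairs of frequent k-itemsets into (k+1)-candidates.
        k += 1
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step (Apriori property): every (k-1)-subset must be frequent.
        current = [
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        ]
    return frequent


frequent = apriori(transactions, MIN_SUPPORT)
for itemset, count in sorted(frequent.items(), key=lambda x: (len(x[0]), sorted(x[0]))):
    print(sorted(itemset), count)
```

The prune step is where the Apriori property pays off: a candidate such as {i1, i2, i4} is discarded without scanning the data because its subset {i1, i4} is already known to be infrequent.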
Mining Frequent Itemsets – Apriori Algorithm Contd.
T2 0 1 1 1 0
T3 0 0 0 1 1
T4 1 1 0 1 0
T5 1 1 1 0 1
T6 1 1 1 1 1
{i1, i2}=>i5
{i1, i5}=>i2
{i2, i5}=>i1
i1=>{i2, i5}
i2=>{i1, i5}
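Each candidate rule A=>B is usually scored by its confidence, support(A ∪ B) / support(A), and kept only if that value meets a minimum confidence threshold. The sketch below computes confidence for the rules listed above. It assumes the i1..i5 transactions that appear in the earlier listing (eight rows, with no T8), so the values describe only those rows. The helper names `support_count` and `confidence` are illustrative.

```python
# Transactions from the earlier i1..i5 listing (eight rows; no T8 is listed).
transactions = [
    {"i1", "i2", "i5"},
    {"i2", "i4"},
    {"i2", "i3"},
    {"i1", "i2", "i4"},
    {"i1", "i3"},
    {"i2", "i3"},
    {"i1", "i3"},
    {"i1", "i2", "i3"},
]


def support_count(itemset, db):
    """Number of transactions containing every item in `itemset`."""
    return sum(1 for t in db if itemset <= t)


def confidence(antecedent, consequent, db):
    """confidence(A => B) = support(A union B) / support(A)."""
    base = support_count(antecedent, db)
    if base == 0:
        return 0.0  # rule is undefined on this data; treat as zero
    return support_count(antecedent | consequent, db) / base


rules = [
    ({"i1", "i2"}, {"i5"}),
    ({"i1", "i5"}, {"i2"}),
    ({"i2", "i5"}, {"i1"}),
    ({"i1"}, {"i2", "i5"}),
    ({"i2"}, {"i1", "i5"}),
]
for lhs, rhs in rules:
    print(sorted(lhs), "=>", sorted(rhs), round(confidence(lhs, rhs, transactions), 3))
```

Rules with the same items on both sides combined share a numerator, so their confidence differs only through the support of the antecedent.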
Problems of Apriori Mining
T1 i1, i2, i5
T2 i2, i4
T3 i2, i3
T7 i1, i3
T9 i1, i2, i3
Mining Frequent Itemsets – A Pattern Growth Approach contd.
• When mining, start from each length-1 pattern and construct its conditional
pattern base. Then construct its conditional FP-tree and mine that tree
recursively.
TID Items
1 {a, b}
2 {b, c, d}
3 {a, c, d, e}
4 {a, d, e}
5 {a, b, c}
6 {a, b, c, d}
7 {a}
8 {a, b, c}
9 {a, b, d}
10 {b, c, e}
Minimum Support = 2
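The pattern-growth procedure can be sketched on the ten transactions above. This is a minimal illustration under stated assumptions, not the lecture's implementation: items are ordered by descending support with alphabetical tie-breaking, and the names `Node`, `build_tree`, and `fp_growth` are invented for this example. For each item it collects the conditional pattern base (the prefix paths leading to that item), builds a conditional FP-tree from it, and recurses.

```python
from collections import defaultdict

transactions = [
    {"a", "b"}, {"b", "c", "d"}, {"a", "c", "d", "e"}, {"a", "d", "e"},
    {"a", "b", "c"}, {"a", "b", "c", "d"}, {"a"}, {"a", "b", "c"},
    {"a", "b", "d"}, {"b", "c", "e"},
]
MIN_SUPPORT = 2  # absolute support count, as stated above


class Node:
    """One FP-tree node: an item, a count, and a link to its parent."""

    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}


def build_tree(weighted_paths, min_support):
    """Build an FP-tree from (items, count) pairs; return its header table."""
    counts = defaultdict(int)
    for items, n in weighted_paths:
        for item in items:
            counts[item] += n
    keep = {i for i, c in counts.items() if c >= min_support}
    root = Node(None, None)
    header = defaultdict(list)  # item -> all tree nodes carrying that item
    for items, n in weighted_paths:
        # Insert frequent items in descending-support order (ties by name).
        ordered = sorted((i for i in items if i in keep),
                         key=lambda i: (-counts[i], i))
        node = root
        for item in ordered:
            if item not in node.children:
                node.children[item] = Node(item, node)
                header[item].append(node.children[item])
            node = node.children[item]
            node.count += n
    return header


def fp_growth(weighted_paths, min_support, suffix=frozenset()):
    """Return {frozenset: support count} by mining conditional FP-trees."""
    header = build_tree(weighted_paths, min_support)
    result = {}
    for item, nodes in header.items():
        pattern = suffix | {item}
        result[pattern] = sum(n.count for n in nodes)
        # Conditional pattern base: each node's prefix path, weighted by count.
        base = []
        for n in nodes:
            path, p = [], n.parent
            while p is not None and p.item is not None:
                path.append(p.item)
                p = p.parent
            if path:
                base.append((path, n.count))
        # Recurse on the conditional tree built from that base.
        result.update(fp_growth(base, min_support, pattern))
    return result


frequent = fp_growth([(t, 1) for t in transactions], MIN_SUPPORT)
for itemset, count in sorted(frequent.items(), key=lambda x: (len(x[0]), sorted(x[0]))):
    print(sorted(itemset), count)
```

Unlike the level-wise approach, this sketch never generates candidate itemsets: every pattern it reports comes from a path that actually exists in some (conditional) tree.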
• Association rules can be misleading
Min_sup = 30%
Min_confidence = 60%
Correlation Analysis
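One common correlation measure is lift, P(A ∪ B) / (P(A) · P(B)): a value below 1 indicates that A and B occur together less often than independence would predict, even when the rule A=>B passes both thresholds. The sketch below shows this with hypothetical counts chosen purely for illustration (they are not taken from the lecture); the function name `rule_metrics` is also an assumption.

```python
MIN_SUP = 0.30         # Min_sup = 30%, as stated above
MIN_CONFIDENCE = 0.60  # Min_confidence = 60%, as stated above


def rule_metrics(n_total, n_a, n_b, n_both):
    """Return (support, confidence, lift) for the rule A => B from raw counts."""
    support = n_both / n_total
    confidence = n_both / n_a
    lift = support / ((n_a / n_total) * (n_b / n_total))
    return support, confidence, lift


# Hypothetical counts for illustration only: 1000 transactions,
# 600 contain A, 750 contain B, 400 contain both.
support, confidence, lift = rule_metrics(1000, 600, 750, 400)

passes = support >= MIN_SUP and confidence >= MIN_CONFIDENCE
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
print("rule passes thresholds:", passes)
print("negatively correlated:", lift < 1)
```

With these illustrative counts the rule clears both thresholds, yet B occurs in 75% of all transactions, which is more often than it occurs among the transactions containing A. Support and confidence alone would not reveal this.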