Introduction To Data Mining - Lecture03

Introduction to Data Mining
Madava Viranjan
• The world is rich in data

• Repositories to store data from multiple heterogeneous data sources

• OLAP as an analysis technique, with functionalities such as summarization, consolidation and aggregation
What is Data Mining?

• The process of discovering interesting patterns and knowledge from large amounts of data

• Is it the same as Knowledge Discovery from Data (KDD)?
KDD vs Data Mining
Data Mining Functionalities

• Class/Concept Description
• Classes and Concepts can be described in summarized terms
• Mining Frequent Patterns
• Patterns that occur frequently in a dataset
• Classification
• Find a model that describes and distinguishes classes/concepts
• Cluster Analysis
• Objects are grouped to maximize intra-class similarity and minimize inter-class similarity
• Are all patterns interesting?

• Can a data mining system generate all of the interesting patterns?

• Can a data mining system generate only the required patterns?


It is a Combination of Subjects
Mining Frequent Patterns
Frequent Patterns

• Frequent patterns are patterns that appear frequently in a data set. They can be frequent itemsets, frequent sequences or frequent substructures.

• Mining frequent patterns leads to the discovery of interesting associations and correlations in data
Frequent Itemset Mining

• Market Basket Analysis
• Typical example of frequent itemset mining
Mining Frequent Itemsets – Apriori Algorithm

• It uses prior knowledge of frequent itemsets to find frequent itemsets level by level.

• Apriori property
• All nonempty subsets of a frequent itemset must also be frequent

• Minimum Support Threshold
• An itemset is frequent only if its frequency is at least the minimum support
Mining Frequent Itemsets – Apriori Algorithm Contd.
TID   List of item_id
T1    i1, i2, i5
T2    i2, i4
T3    i2, i3
T4    i1, i2, i4
T5    i1, i3
T6    i2, i3
T7    i1, i3
T8    i1, i2, i3, i5
T9    i1, i2, i3

Minimum Support = 2
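
As a first pass over the table above, the support count of each single item can be tallied and compared with the minimum support. The sketch below is only an illustration; the names `transactions`, `MIN_SUPPORT` and `item_support_counts` are assumptions, not part of the lecture.

```python
from collections import Counter

# Transactions copied from the table above.
transactions = {
    "T1": {"i1", "i2", "i5"},
    "T2": {"i2", "i4"},
    "T3": {"i2", "i3"},
    "T4": {"i1", "i2", "i4"},
    "T5": {"i1", "i3"},
    "T6": {"i2", "i3"},
    "T7": {"i1", "i3"},
    "T8": {"i1", "i2", "i3", "i5"},
    "T9": {"i1", "i2", "i3"},
}
MIN_SUPPORT = 2  # absolute support count, as on the slide


def item_support_counts(transactions):
    """Count in how many transactions each single item appears."""
    counts = Counter()
    for items in transactions.values():
        counts.update(items)
    return counts


counts = item_support_counts(transactions)
# Keep only the items whose count reaches the minimum support.
frequent_items = {item: n for item, n in counts.items() if n >= MIN_SUPPORT}
print(sorted(frequent_items.items()))
```

With a minimum support of 2, every item in this table is frequent, so nothing is discarded at the first level.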
Mining Frequent Itemsets – Apriori Algorithm Contd.

TID   Computer   Webcam   Antivirus Software   Office Suite   SDCard
T1    1          1        1                    0              0
T2    0          1        1                    1              0
T3    0          0        0                    1              1
T4    1          1        0                    1              0
T5    1          1        1                    0              1
T6    1          1        1                    1              1

Minimum Support = 50%
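
This table is in 0/1 form and the threshold is relative, so two conversions are useful before mining: rows to item sets, and the percentage to a count. The following is a minimal sketch under those assumptions; the variable names are illustrative only.

```python
import math

# Column headers and 0/1 rows copied from the table above.
items = ["Computer", "Webcam", "Antivirus Software", "Office Suite", "SDCard"]
rows = {
    "T1": [1, 1, 1, 0, 0],
    "T2": [0, 1, 1, 1, 0],
    "T3": [0, 0, 0, 1, 1],
    "T4": [1, 1, 0, 1, 0],
    "T5": [1, 1, 1, 0, 1],
    "T6": [1, 1, 1, 1, 1],
}
MIN_SUPPORT_RATIO = 0.5  # 50% of all transactions

# Turn each 0/1 row into the set of items bought in that transaction.
baskets = {
    tid: {item for item, flag in zip(items, flags) if flag}
    for tid, flags in rows.items()
}

# A relative threshold of 50% over 6 transactions means at least 3 transactions.
min_count = math.ceil(MIN_SUPPORT_RATIO * len(baskets))

# Relative support of each single item.
support = {
    item: sum(item in basket for basket in baskets.values()) / len(baskets)
    for item in items
}
print(min_count, support)
```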


Mining Frequent Itemsets – Apriori Algorithm Contd.

• step1: create the candidate 1-itemsets, C1
• step2: by considering min_support, get the frequent 1-itemsets, L1
• step3: join L1 with itself to create the candidate 2-itemsets, C2
• step4: by considering min_support, get the frequent 2-itemsets, L2
• step5: join L2 with itself to create the candidate 3-itemsets, C3. Remove itemsets which do not satisfy the Apriori property.
• step6: by considering min_support, get the frequent 3-itemsets, L3
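
The steps above can be sketched as a short loop. This is an illustrative sketch, not the lecture's implementation: the function name `apriori` is an assumption, and the join step is simplified to the union of any two frequent (k-1)-itemsets that yields a k-itemset, rather than the usual prefix-based join. The prune step still applies the Apriori property, so the frequent itemsets found are the same.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support_count} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        # One database scan: count the transactions containing each candidate.
        return {c: sum(c <= t for t in transactions) for c in candidates}

    def keep_frequent(counted):
        return {c: n for c, n in counted.items() if n >= min_support}

    # Steps 1-2: candidate 1-itemsets C1, filtered to L1.
    c1 = {frozenset([item]) for t in transactions for item in t}
    current = keep_frequent(count(c1))
    frequent = dict(current)

    k = 2
    while current:
        # Join: combine pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune (Apriori property): every (k-1)-subset must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        current = keep_frequent(count(candidates))
        frequent.update(current)
        k += 1
    return frequent


# The nine transactions used earlier in the lecture, minimum support = 2.
dataset = [
    {"i1", "i2", "i5"}, {"i2", "i4"}, {"i2", "i3"},
    {"i1", "i2", "i4"}, {"i1", "i3"}, {"i2", "i3"},
    {"i1", "i3"}, {"i1", "i2", "i3", "i5"}, {"i1", "i2", "i3"},
]
result = apriori(dataset, 2)
for itemset, n in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), n)
```

On this data the loop stops after the 3-itemsets: the only candidate 4-itemset, {i1, i2, i3, i5}, is pruned because its subset {i2, i3, i5} is not frequent.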
Mining Frequent Itemsets – Apriori Algorithm Contd.
• How to compute confidence?

{i1, i2} => i5
{i1, i5} => i2
{i2, i5} => i1
i1 => {i2, i5}
i2 => {i1, i5}
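
Confidence of a rule A => B is the support count of A and B together divided by the support count of A. The sketch below applies that definition to the rules listed above, using the nine-transaction table from earlier in the lecture; the helper names `support_count` and `confidence` are assumptions for illustration.

```python
# The nine transactions from the earlier table.
transactions = [
    {"i1", "i2", "i5"}, {"i2", "i4"}, {"i2", "i3"},
    {"i1", "i2", "i4"}, {"i1", "i3"}, {"i2", "i3"},
    {"i1", "i3"}, {"i1", "i2", "i3", "i5"}, {"i1", "i2", "i3"},
]


def support_count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(set(itemset) <= t for t in transactions)


def confidence(antecedent, consequent, transactions):
    """confidence(A => B) = support_count(A and B) / support_count(A).

    Assumes the antecedent occurs in at least one transaction.
    """
    both = set(antecedent) | set(consequent)
    return support_count(both, transactions) / support_count(antecedent, transactions)


rules = [
    ({"i1", "i2"}, {"i5"}),
    ({"i1", "i5"}, {"i2"}),
    ({"i2", "i5"}, {"i1"}),
    ({"i1"}, {"i2", "i5"}),
    ({"i2"}, {"i1", "i5"}),
]
for a, b in rules:
    print(sorted(a), "=>", sorted(b), f"{confidence(a, b, transactions):.0%}")
```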
Problems of Apriori Mining

• Need to generate a huge number of candidate sets

• Need to scan the whole database repeatedly


Mining Frequent Itemsets – A Pattern Growth Approach

• Divide and conquer approach
• Create a Frequent Pattern tree (FP-Tree)

TID   List of item_id
T1    i1, i2, i5
T2    i2, i4
T3    i2, i3
T4    i1, i2, i4
T5    i1, i3
T6    i2, i3
T7    i1, i3
T8    i1, i2, i3, i5
T9    i1, i2, i3
Mining Frequent Itemsets – A Pattern Growth Approach contd.

step1: Derive the frequent 1-itemsets (as in Apriori)
step2: Create list ‘L’ by ordering the frequent 1-itemsets in descending order of support count
step3: Create the root of the FP-tree and label it ‘null’
step4: Scan the database again and, for each transaction, add a branch with its items in the same order as ‘L’
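
The four steps above can be sketched as follows. This is a minimal illustration under stated assumptions: the class `FPNode` and the function `build_fp_tree` are hypothetical names, ties in support count are broken alphabetically, and the header table with node links used by full FP-growth is left out.

```python
from collections import Counter


class FPNode:
    """One node of the FP-tree: an item, its count, and its children."""

    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions, min_support):
    # Step 1: first scan, support count of every item.
    counts = Counter(item for t in transactions for item in t)
    # Step 2: list L, frequent items in descending order of support count.
    order = sorted(
        (item for item, n in counts.items() if n >= min_support),
        key=lambda item: (-counts[item], item),
    )
    rank = {item: position for position, item in enumerate(order)}
    # Step 3: the root, labelled 'null' (represented here by item=None).
    root = FPNode(None)
    # Step 4: second scan, insert each transaction along a branch in L order.
    for t in transactions:
        node = root
        for item in sorted((i for i in t if i in rank), key=rank.get):
            if item not in node.children:
                node.children[item] = FPNode(item, node)
            node = node.children[item]
            node.count += 1  # shared prefixes only increase the count
    return root, order


dataset = [
    {"i1", "i2", "i5"}, {"i2", "i4"}, {"i2", "i3"},
    {"i1", "i2", "i4"}, {"i1", "i3"}, {"i2", "i3"},
    {"i1", "i3"}, {"i1", "i2", "i3", "i5"}, {"i1", "i2", "i3"},
]
root, order = build_fp_tree(dataset, 2)
print(order)
print({item: child.count for item, child in root.children.items()})
```

Because transactions that share a prefix share a branch, the nine transactions collapse into a tree with only two branches under the root.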
Mining Frequent Itemsets – A Pattern Growth Approach contd.

• When mining, start from each length-1 pattern and construct its conditional pattern base. Then construct its conditional FP-tree, and repeat this recursively.
TID   Items
1     {a, b}
2     {b, c, d}
3     {a, c, d, e}
4     {a, d, e}
5     {a, b, c}
6     {a, b, c, d}
7     {a}
8     {a, b, c}
9     {a, b, d}
10    {b, c, e}

Minimum Support = 2
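
For the table above, the conditional pattern base of an item can be read off without drawing the tree: it is the set of prefixes that precede the item once each transaction is sorted in the FP-tree order. The sketch below takes that shortcut instead of walking an actual FP-tree; the function name `conditional_pattern_base` is an assumption, and ties in support count are broken alphabetically.

```python
from collections import Counter

# Transactions copied from the table above.
transactions = [
    {"a", "b"}, {"b", "c", "d"}, {"a", "c", "d", "e"}, {"a", "d", "e"},
    {"a", "b", "c"}, {"a", "b", "c", "d"}, {"a"}, {"a", "b", "c"},
    {"a", "b", "d"}, {"b", "c", "e"},
]
MIN_SUPPORT = 2

# Frequent items in descending order of support count (the FP-tree order).
counts = Counter(item for t in transactions for item in t)
order = sorted(
    (item for item in counts if counts[item] >= MIN_SUPPORT),
    key=lambda item: (-counts[item], item),
)


def conditional_pattern_base(item):
    """Prefix paths that lead to `item`, each with its occurrence count."""
    base = Counter()
    for t in transactions:
        if item in t:
            ordered = [i for i in order if i in t]
            prefix = tuple(ordered[: ordered.index(item)])
            if prefix:  # an empty prefix adds nothing to the pattern base
                base[prefix] += 1
    return base


print(order)
print(dict(conditional_pattern_base("e")))
```

Each prefix path would then be used to build the conditional FP-tree for that item, and the same procedure repeated on it.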
• Association rules can be misleading

Total number of transactions = 10000
Buys computer games = 6000
Buys videos = 7500
Buys both = 4000

Min_sup = 30%
Min_confidence = 60%
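
To see why the rule "buys computer games => buys videos" is misleading, its support and confidence can be worked out from the figures above and compared with the overall share of video buyers. The variable names in this sketch are illustrative assumptions.

```python
# Figures copied from the slide.
total = 10_000
games = 6_000
videos = 7_500
both = 4_000
MIN_SUP = 0.30
MIN_CONF = 0.60

# Rule under test: buys computer games => buys videos.
support = both / total        # share of all transactions containing both
confidence = both / games     # share of game buyers who also buy videos
p_videos = videos / total     # share of all customers who buy videos

# The rule passes both thresholds, so it is reported as a strong rule...
rule_is_strong = support >= MIN_SUP and confidence >= MIN_CONF
# ...yet game buyers are less likely than average to buy videos.
misleading = confidence < p_videos

print(f"support={support:.0%} confidence={confidence:.1%} P(videos)={p_videos:.0%}")
print(rule_is_strong, misleading)
```

The rule has 40% support and about 66.7% confidence, so it satisfies both thresholds, but 75% of all customers buy videos anyway.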
Correlation Analysis

• In addition to support and confidence, the correlation between the itemsets is also considered.
Correlation Analysis with Lift Measure

• Lift is a measure used in correlation analysis

• If the lift is less than 1, then A is negatively correlated with B
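
Lift is commonly defined as lift(A, B) = P(A and B) / (P(A) × P(B)). A value of 1 means A and B are independent, and a value above 1 means they are positively correlated. As a small sketch, with the function name `lift` assumed for illustration, it can be applied to the computer games and videos figures from the earlier slide:

```python
def lift(count_a, count_b, count_both, total):
    """lift(A, B) = P(A and B) / (P(A) * P(B)), computed from raw counts."""
    p_a = count_a / total
    p_b = count_b / total
    p_both = count_both / total
    return p_both / (p_a * p_b)


# Computer games (6000), videos (7500), both (4000), out of 10000 transactions.
value = lift(6_000, 7_500, 4_000, 10_000)
print(round(value, 3))
```

The result is below 1, so buying computer games and buying videos are negatively correlated, even though the rule passed the support and confidence thresholds.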
