Introduction To Data Mining - Lecture03

Uploaded by

vikum.amarananda47


Introduction to Data Mining
Madava Viranjan
• The world is rich in data

• Repositories to store data from multiple heterogeneous data sources

• OLAP as an analysis technique, with functionalities such as summarization, consolidation and aggregation
What is Data Mining?

• The process of discovering interesting patterns and knowledge from large amounts of data
• Is it the same as Knowledge Discovery from Data (KDD)?
KDD vs Data Mining
Data Mining Functionalities

• Class/Concept Description
  • Classes and concepts can be described in summarized terms
• Mining Frequent Patterns
  • Patterns that occur frequently in a dataset
• Classification
  • Find a model that describes and distinguishes classes/concepts
• Cluster Analysis
  • Objects are grouped to maximize intra-class similarity and minimize inter-class similarity
• Are all patterns interesting?

• Can a data mining system generate all of the interesting patterns?

• Can a data mining system generate only the required patterns?

It is a Combination of Subjects
Mining Frequent Patterns
Frequent Patterns

• Frequent patterns are patterns that appear frequently in a data set. A frequent pattern can be a frequent itemset, a frequent sequence or a frequent substructure.

• Mining frequent patterns leads to the discovery of interesting associations and correlations in data
Frequent Itemset Mining

• Market Basket Analysis

• A typical example of frequent itemset mining
Mining Frequent Itemsets – Apriori Algorithm

• It uses prior knowledge of frequent itemsets to find frequent itemsets level by level: the frequent k-itemsets are used to explore the (k+1)-itemsets.

• Apriori property
  • All non-empty subsets of a frequent itemset must also be frequent

• Minimum Support Threshold
  • An itemset is frequent only if its support count is at least the minimum support
Mining Frequent Itemsets – Apriori Algorithm Contd.

TID   List of item_id
T1    i1, i2, i5
T2    i2, i4
T3    i2, i3
T4    i1, i2, i4
T5    i1, i3
T6    i2, i3
T7    i1, i3
T8    i1, i2, i3, i5
T9    i1, i2, i3

Minimum Support = 2
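
The support counts and the Apriori property can be checked directly on the nine transactions above. The following is a minimal sketch; the helper names `support_count` and `nonempty_proper_subsets` are illustrative and do not come from the lecture.

```python
from itertools import combinations

# The nine transactions from the table above.
transactions = [
    {"i1", "i2", "i5"}, {"i2", "i4"}, {"i2", "i3"},
    {"i1", "i2", "i4"}, {"i1", "i3"}, {"i2", "i3"},
    {"i1", "i3"}, {"i1", "i2", "i3", "i5"}, {"i1", "i2", "i3"},
]
MIN_SUPPORT = 2  # absolute support count, as stated on the slide


def support_count(itemset):
    """Number of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)


def nonempty_proper_subsets(itemset):
    """All non-empty subsets of `itemset` except the itemset itself."""
    items = sorted(itemset)
    return [set(c) for r in range(1, len(items)) for c in combinations(items, r)]


# {i1, i2, i5} is frequent, so by the Apriori property every subset must be too.
candidate = {"i1", "i2", "i5"}
subset_supports = {
    tuple(sorted(s)): support_count(s) for s in nonempty_proper_subsets(candidate)
}
print(support_count(candidate), subset_supports)
```

Every subset has a support count at least as large as that of `{i1, i2, i5}`, which is the property Apriori relies on when it prunes candidates.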
Mining Frequent Itemsets – Apriori Algorithm Contd.

TID   Computer   Webcam   Antivirus Software   Office Suite   SDCard
T1    1          1        1                    0              0
T2    0          1        1                    1              0
T3    0          0        0                    1              1
T4    1          1        0                    1              0
T5    1          1        1                    0              1
T6    1          1        1                    1              1

Minimum Support = 50%
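
A binary table like this one has to be turned into item lists, and the percentage threshold into a transaction count, before the itemsets can be counted. A minimal sketch, assuming the column order shown in the table (the variable names are illustrative):

```python
import math

items = ["Computer", "Webcam", "Antivirus Software", "Office Suite", "SDCard"]
rows = {
    "T1": [1, 1, 1, 0, 0],
    "T2": [0, 1, 1, 1, 0],
    "T3": [0, 0, 0, 1, 1],
    "T4": [1, 1, 0, 1, 0],
    "T5": [1, 1, 1, 0, 1],
    "T6": [1, 1, 1, 1, 1],
}

# Each transaction becomes the set of items whose flag is 1.
transactions = {
    tid: {item for item, flag in zip(items, flags) if flag}
    for tid, flags in rows.items()
}

# 50% of 6 transactions: an itemset needs at least 3 supporting transactions.
min_count = math.ceil(0.5 * len(transactions))

item_counts = {
    item: sum(item in t for t in transactions.values()) for item in items
}
frequent_items = [item for item in items if item_counts[item] >= min_count]
print(min_count, item_counts, frequent_items)
```

With this threshold all five single items are frequent; SDCard only just qualifies, with exactly three supporting transactions.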

Mining Frequent Itemsets – Apriori Algorithm Contd.

• step1: create the candidate 1-itemsets, C1
• step2: apply min_support to C1 to get the frequent 1-itemsets, L1
• step3: join L1 with itself to create the candidate 2-itemsets, C2
• step4: apply min_support to C2 to get the frequent 2-itemsets, L2
• step5: join L2 with itself to create the candidate 3-itemsets, C3. Remove itemsets that do not satisfy the Apriori property.
• step6: apply min_support to C3 to get the frequent 3-itemsets, L3
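
The steps above generalize to a loop over k. The sketch below is one possible way to write it, not the lecture's own implementation. It assumes an absolute minimum support count, and its join step is simplified: it takes the union of any two frequent (k-1)-itemsets that yields k items, where the textbook join matches on a shared prefix. The prune step then removes any candidate with an infrequent subset.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]

    def frequent_among(candidates):
        # One database scan: count each candidate, keep those meeting min_support.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        return {c: n for c, n in counts.items() if n >= min_support}

    # Steps 1-2: candidate 1-itemsets C1, then frequent 1-itemsets L1.
    all_items = {item for t in transactions for item in t}
    level = frequent_among({frozenset([item]) for item in all_items})
    frequent = dict(level)

    k = 2
    while level:
        prev = set(level)
        # Join: combine frequent (k-1)-itemsets into candidate k-itemsets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune: every (k-1)-subset must be frequent (Apriori property).
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        level = frequent_among(candidates)
        frequent.update(level)
        k += 1
    return frequent


transactions = [
    {"i1", "i2", "i5"}, {"i2", "i4"}, {"i2", "i3"},
    {"i1", "i2", "i4"}, {"i1", "i3"}, {"i2", "i3"},
    {"i1", "i3"}, {"i1", "i2", "i3", "i5"}, {"i1", "i2", "i3"},
]
frequent = apriori(transactions, min_support=2)
for itemset, count in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

On the nine-transaction example with a minimum support of 2, the loop stops after k = 4: the only candidate 4-itemset, {i1, i2, i3, i5}, is pruned because {i1, i3, i5} is not frequent.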
Mining Frequent Itemsets – Apriori Algorithm Contd.
• How is confidence computed?

{i1, i2}=>i5
{i1, i5}=>i2
{i2, i5}=>i1
i1=>{i2, i5}
i2=>{i1, i5}
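
The confidence of a rule A=>B is the support count of A and B together divided by the support count of A. A minimal sketch on the nine-transaction example follows (the function names are illustrative):

```python
transactions = [
    {"i1", "i2", "i5"}, {"i2", "i4"}, {"i2", "i3"},
    {"i1", "i2", "i4"}, {"i1", "i3"}, {"i2", "i3"},
    {"i1", "i3"}, {"i1", "i2", "i3", "i5"}, {"i1", "i2", "i3"},
]


def support_count(itemset):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def confidence(antecedent, consequent):
    """confidence(A => B) = support_count(A and B together) / support_count(A)."""
    return support_count(antecedent | consequent) / support_count(antecedent)


rules = [
    ({"i1", "i2"}, {"i5"}),
    ({"i1", "i5"}, {"i2"}),
    ({"i2", "i5"}, {"i1"}),
    ({"i1"}, {"i2", "i5"}),
    ({"i2"}, {"i1", "i5"}),
]
for antecedent, consequent in rules:
    print(sorted(antecedent), "=>", sorted(consequent),
          f"{confidence(antecedent, consequent):.0%}")
```

A rule is kept only if its confidence meets the chosen minimum confidence threshold. With a 70% threshold, for instance, only the two rules with 100% confidence in this list would survive.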
Problems of Apriori Mining

• It may need to generate a huge number of candidate sets

• It needs to scan the whole database repeatedly

Mining Frequent Itemsets – A Pattern Growth Approach

• Divide and conquer approach

• Create a Frequent Pattern tree (FP-Tree)

TID   List of item_id
T1    i1, i2, i5
T2    i2, i4
T3    i2, i3
T4    i1, i2, i4
T5    i1, i3
T6    i2, i3
T7    i1, i3
T8    i1, i2, i3, i5
T9    i1, i2, i3
Mining Frequent Itemsets – A Pattern Growth Approach contd.

step1: Derive the frequent 1-itemsets and their support counts (similar to Apriori)
step2: Create the list ‘L’ by sorting the frequent 1-itemsets in descending order of support count
step3: Create the root of the FP-tree and label it ‘null’
step4: Scan the database again and, for each transaction, add a branch whose items follow the same order as ‘L’
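
The four steps can be sketched as follows. This is an illustrative implementation under stated assumptions: the class name `FPNode`, the header table layout, and breaking ties in ‘L’ alphabetically are choices made here, not details given in the lecture.

```python
from collections import Counter


class FPNode:
    """One node of the FP-tree: an item, its count, and links to parent/children."""

    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions, min_support):
    # Step 1: support counts of the single items (first database scan).
    counts = Counter(item for t in transactions for item in set(t))
    # Step 2: list L, frequent items in descending order of support count.
    order = sorted((i for i, n in counts.items() if n >= min_support),
                   key=lambda i: (-counts[i], i))
    rank = {item: r for r, item in enumerate(order)}
    # Step 3: the root, labelled 'null' (represented by item=None).
    root = FPNode(None)
    header = {item: [] for item in order}  # item -> all tree nodes for that item
    # Step 4: second scan, one branch per transaction, items sorted as in L.
    for t in transactions:
        node = root
        for item in sorted((i for i in set(t) if i in rank), key=rank.get):
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, parent=node)
                node.children[item] = child
                header[item].append(child)
            child.count += 1
            node = child
    return root, header, order


transactions = [
    ["i1", "i2", "i5"], ["i2", "i4"], ["i2", "i3"],
    ["i1", "i2", "i4"], ["i1", "i3"], ["i2", "i3"],
    ["i1", "i3"], ["i1", "i2", "i3", "i5"], ["i1", "i2", "i3"],
]
root, header, order = build_fp_tree(transactions, min_support=2)
print(order)
print({item: node.count for item, node in root.children.items()})
```

Transactions that share a prefix in the order of ‘L’ share a path in the tree, so the nine transactions are compressed into two branches under the root: one starting at i2 (count 7) and one starting at i1 (count 2).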
Mining Frequent Itemsets – A Pattern Growth Approach contd.

• When mining, start from each length-1 pattern and construct its conditional pattern base. Then construct its conditional FP-tree and mine that tree recursively.
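
The recursion can be sketched compactly if every database, including each conditional pattern base, is represented as a list of (items, count) pairs. The code below is a simplified illustration rather than the lecture's implementation: the names `Node`, `build_tree`, `prefix_paths` and `fp_growth` are assumptions, and the usual single-path shortcut of FP-growth is left out.

```python
from collections import defaultdict


class Node:
    def __init__(self, item, parent):
        self.item, self.parent, self.count, self.children = item, parent, 0, {}


def build_tree(weighted, min_support):
    """Build an FP-tree from (items, count) pairs; return its header table."""
    support = defaultdict(int)
    for items, n in weighted:
        for i in set(items):
            support[i] += n
    keep = {i for i, s in support.items() if s >= min_support}
    root, header = Node(None, None), defaultdict(list)
    for items, n in weighted:
        node = root
        # Insert frequent items in descending support order (ties by name).
        for i in sorted(keep & set(items), key=lambda i: (-support[i], i)):
            if i not in node.children:
                node.children[i] = Node(i, node)
                header[i].append(node.children[i])
            node = node.children[i]
            node.count += n
    return header


def prefix_paths(header, item):
    """Conditional pattern base of `item`: its prefix paths with their counts."""
    base = []
    for node in header[item]:
        path, p = [], node.parent
        while p.item is not None:  # stop at the 'null' root
            path.append(p.item)
            p = p.parent
        if path:
            base.append((path[::-1], node.count))
    return base


def fp_growth(weighted, min_support, suffix=frozenset()):
    """Return {frozenset(pattern): support_count}, mined recursively."""
    header = build_tree(weighted, min_support)
    result = {}
    for item, nodes in header.items():
        pattern = suffix | {item}
        result[pattern] = sum(n.count for n in nodes)
        # Recurse on the conditional pattern base of this pattern.
        result.update(fp_growth(prefix_paths(header, item), min_support, pattern))
    return result


transactions = [
    ["i1", "i2", "i5"], ["i2", "i4"], ["i2", "i3"],
    ["i1", "i2", "i4"], ["i1", "i3"], ["i2", "i3"],
    ["i1", "i3"], ["i1", "i2", "i3", "i5"], ["i1", "i2", "i3"],
]
patterns = fp_growth([(t, 1) for t in transactions], min_support=2)
print(prefix_paths(build_tree([(t, 1) for t in transactions], 2), "i5"))
print(len(patterns))
```

For i5, the conditional pattern base consists of the prefix paths (i2, i1) and (i2, i1, i3), each with count 1. Its conditional FP-tree therefore contains only i2 and i1, which yields the frequent patterns {i2, i5}, {i1, i5} and {i1, i2, i5}. No candidate itemsets are generated at any point.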
TID   Items
1     {a, b}
2     {b, c, d}
3     {a, c, d, e}
4     {a, d, e}
5     {a, b, c}
6     {a, b, c, d}
7     {a}
8     {a, b, c}
9     {a, b, d}
10    {b, c, e}

Minimum Support = 2
• Association rules can be misleading

Total number of transactions = 10000

Buys computer games = 6000
Buys videos = 7500
Buys both = 4000

Min_sup = 30%
Min_confidence = 60%
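
The figures above show why a rule can pass both thresholds and still mislead. A short sketch of the arithmetic for the rule "buys computer games => buys videos" (the variable names are illustrative):

```python
total = 10_000
games = 6_000
videos = 7_500
both = 4_000
min_sup, min_conf = 0.30, 0.60

support = both / total     # fraction of all transactions containing both items
confidence = both / games  # fraction of game buyers who also buy videos
baseline = videos / total  # fraction of all customers who buy videos

rule_is_strong = support >= min_sup and confidence >= min_conf
# Misleading: the rule passes both thresholds, yet buying games makes
# buying videos less likely than it is for customers overall.
is_misleading = rule_is_strong and confidence < baseline

print(f"support={support:.0%} confidence={confidence:.1%} baseline={baseline:.0%}")
print(rule_is_strong, is_misleading)
```

The rule has 40% support and about 66.7% confidence, so it counts as strong. However, 75% of all customers buy videos, so game buyers are in fact less likely than average to buy videos.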
Correlation Analysis

• In addition to support and confidence, the correlation between the itemsets is also considered.
Correlation Analysis with Lift Measure

• Lift is a measure used in correlation analysis

• If lift(A, B) is less than 1, then A is negatively correlated with B
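
For reference, the standard definition of lift is given below, applied to the computer games and videos figures from the earlier slide. Here P(A ∪ B) denotes the probability that a transaction contains both A and B, following the usual textbook notation. The worked numbers are derived from the slide's counts and do not appear in the original.

```latex
\[
\mathrm{lift}(A, B) = \frac{P(A \cup B)}{P(A)\,P(B)}
\]
\[
\mathrm{lift}(\text{games}, \text{videos})
  = \frac{4000/10000}{(6000/10000)\,(7500/10000)}
  = \frac{0.40}{0.60 \times 0.75}
  \approx 0.89
\]
```

A lift of 1 means A and B are independent, and a value greater than 1 means they are positively correlated. Here the value is below 1, so buying computer games and buying videos are negatively correlated even though the rule passes both thresholds.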
