Data Mining Unsupervised Techniques

This document discusses various unsupervised data mining techniques including market basket analysis, clustering, and dimension reduction. It provides an overview and examples of each technique. For market basket analysis, it discusses how to identify items frequently bought together and how to handle large datasets. For clustering, it explains k-means clustering and hierarchical clustering. It also discusses principal component analysis (PCA) as a method for dimension reduction.

Uploaded by

anantsatapathymba2023


Data Mining Techniques and Application in Business (Unsupervised)

Dr Sunil D Lakdawala
Content
 Market Basket Analysis
 Overview and Applications
 Example
 Virtual Items
 Problems with Big Data
 How to handle those problems
 Exercise – Groceries
Content (cont)
Clustering
 Overview
 K-Means Method
 Hierarchical Clustering
 Applications
 Exercise - IRIS
Dimension Reduction
 Overview
 Principal Components Analysis (PCA)
ASSOCIATION
(Market Basket Analysis)
Overview and Applications
 Unsupervised
 Tells what items are bought together (Same time or
within certain time period)
 Blue Jeans, white shirt, Black Tie bought together
 Beer and Diapers bought together on weekends by
young couples
 VCR is bought within 6 months of buying TV
 Share Trading account opened within 3 months of
opening demat account
 Certain symptoms go with certain disease
(Running Nose, Headache, High Fever with Flu)
 People who buy “DW” books also buy “BI” books

Data Mining - Market Basket Analysis 5

Overview and Applications (Cont)
 Tells what items are bought together (Cont)
 Which products should be advertised in Spanish?
(Hamburgers and Potato Chips in English and
Sausages and Doritos in Spanish)
 Improving WEB Page design for e-commerce sites
 Homeopathic working
 Hotel Chain Services
 Preventive Maintenance

Example
Sr No Jean Shirt Tie Sock Jacket

1 X X

2 X X X

3 X

4 X

5 X X X

6 X X

7 X

8 X X

9 X

10 X
Example
Jeans → Shirt
(antecedent) (Consequent)
Support = 3/10 = 30% = P(Jeans, Shirt)
(Fraction of transactions in which Jeans and Shirt are bought together)
Confidence = ¾ = 75% = P(Jeans, Shirt) / P(Jeans)
(Conditional probability of buying Shirt, given Jeans)
Lift = Confidence / P(Shirt) = (¾) / (6/10) = 30/24 = 1.25
= P(Jeans, Shirt) / (P(Jeans)*P(Shirt))
LIMITATION
1. Value not considered
2. Frequency not considered
Example
Jeans, Shirt → Tie
(antecedent) (Consequent)
Support = 1/10 = 10%
(# of times antecedent and consequent bought
Together)
Confidence = 1/3 = 33%
(Conditional Probability of buying Consequent)
Lift = Confidence / P(Consequent) = 2/3
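
The two worked examples can be checked with a short Python sketch. The column positions of the X marks did not survive in the table above, so the ten baskets below are an assumption: they are made up to match the row sizes and the counts the slides report (Jeans in 4 baskets, Shirt in 6, Jeans and Shirt together in 3, all three of Jeans, Shirt, Tie in 1). The function names are illustrative only.

```python
from fractions import Fraction

# Hypothetical baskets, chosen only to reproduce the counts quoted on the slides.
BASKETS = [
    {"Jeans", "Shirt"},
    {"Jeans", "Shirt", "Tie"},
    {"Shirt"},
    {"Tie"},
    {"Jeans", "Shirt", "Sock"},
    {"Jeans", "Tie"},
    {"Shirt"},
    {"Shirt", "Tie"},
    {"Tie"},
    {"Jacket"},
]


def support(items, baskets=BASKETS):
    """Fraction of baskets that contain every item in `items`."""
    items = set(items)
    hits = sum(1 for basket in baskets if items <= basket)
    return Fraction(hits, len(baskets))


def confidence(antecedent, consequent, baskets=BASKETS):
    """P(consequent | antecedent) = support(both) / support(antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(both, baskets) / support(antecedent, baskets)


def lift(antecedent, consequent, baskets=BASKETS):
    """Confidence divided by the baseline probability of the consequent."""
    return confidence(antecedent, consequent, baskets) / support(consequent, baskets)


print(support({"Jeans", "Shirt"}))              # rule Jeans -> Shirt
print(confidence({"Jeans"}, {"Shirt"}))
print(lift({"Jeans"}, {"Shirt"}))
print(lift({"Jeans", "Shirt"}, {"Tie"}))        # rule Jeans, Shirt -> Tie
```

Exact fractions are used so the results can be compared directly with the hand calculation. A lift above 1 suggests the items are bought together more often than independence would predict; a lift below 1 suggests the opposite.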
Virtual Items
 Store Location, Time of purchase, Mode of
Payment, Customer Profile (Signed Transaction)
 Virtual items to be added to other item, e.g. Blue
Jeans and White Shirt bought at Andheri shop
on Sunday
 Able to analyze time-wise, location-wise what
items go together (Which items are bought
together during Diwali, Which items are bought
together in rich locality vs. middle class locality)
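
One way to picture virtual items is as extra tags added to each basket before the analysis runs. The sketch below is an illustration only; the `store=` and `day=` tag format and the function name are assumptions, not something the slides specify.

```python
def add_virtual_items(items, store, weekday):
    """Return the basket with store location and day added as virtual items."""
    return set(items) | {f"store={store}", f"day={weekday}"}


basket = add_virtual_items({"Blue Jeans", "White Shirt"}, store="Andheri", weekday="Sunday")
print(sorted(basket))
```

Once the tags are part of the basket, rules such as "store=Andheri, Blue Jeans → White Shirt" can be mined like any other rule.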

Problems of Large Data
For a menu having 100 items:
# of combinations with 1 item: 100
# of combinations with 2 items: 4,950
# of combinations with 3 items: 161,700

A typical supermarket has 10,000 items. # of combinations with 2 items: about 50 million; with 3 items: about 167 billion!
Number of transactions: Million per year!!
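
The counts above are binomial coefficients ("n choose k"). A minimal check with Python's standard library, where `count_itemsets` is just an illustrative name:

```python
import math


def count_itemsets(n_items, size):
    """Number of distinct itemsets of the given size drawn from n_items products."""
    return math.comb(n_items, size)


for n_items in (100, 10_000):
    for size in (1, 2, 3):
        print(n_items, size, f"{count_itemsets(n_items, size):,}")
```

The growth is roughly n^k / k!, which is why exhaustive counting stops being practical beyond very small itemsets.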

Handling Problems of Big Data

 Pruning Techniques: Minimum Support, Minimum Confidence, Minimum Lift (typically 1)
 Use of Taxonomies: e.g. instead of Coke, Pepsi, etc., use soda, i.e. a higher category
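
Both ideas can be sketched in a few lines of Python. This is a simplified illustration, not the algorithm used in the lecture: it prunes on minimum support only, builds larger itemsets from items that are frequent on their own, and uses a made-up four-basket data set and a made-up `TAXONOMY` mapping.

```python
from collections import Counter
from itertools import combinations

# Hypothetical mapping from product to higher category.
TAXONOMY = {"Coke": "Soda", "Pepsi": "Soda", "ThumbsUp": "Soda"}


def roll_up(basket, taxonomy=TAXONOMY):
    """Replace each item by its higher category when one is defined."""
    return {taxonomy.get(item, item) for item in basket}


def frequent_itemsets(baskets, min_support, size):
    """Itemsets of `size` items whose support is at least `min_support`.

    Only items that are frequent on their own are combined, because an
    itemset cannot be more frequent than any single item it contains.
    """
    n = len(baskets)
    singles = Counter(item for basket in baskets for item in basket)
    keep = {item for item, count in singles.items() if count / n >= min_support}
    counts = Counter(
        frozenset(combo)
        for basket in baskets
        for combo in combinations(sorted(basket & keep), size)
    )
    return {s: c / n for s, c in counts.items() if c / n >= min_support}


raw = [{"Coke", "Bread"}, {"Pepsi", "Bread"}, {"Coke"}, {"Cake"}]
rolled = [roll_up(b) for b in raw]

print(frequent_itemsets(raw, 0.5, 2))      # brands split the support
print(frequent_itemsets(rolled, 0.5, 2))   # category-level pair survives pruning
```

In this toy data no brand-level pair reaches 50% support, but after rolling up to the category level the pair Soda and Bread does, which is the effect the taxonomy is meant to achieve.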

Taxonomy
Carbonated Drink: Pepsi, Coke, ThumbsUp
Bakery Product: Bread, Cake
[Figure: taxonomy tree; lower-level labels include 200 Ml, Plastic, Regular, Britania, Modern, 500 gm, White Bread]

CLUSTERING
Overview
Often a whole population is diverse but consists of a number of similar groups (clusters).
“Clustering” is an undirected knowledge discovery, or unsupervised learning, technique of data mining. It can spot such similar groups.
Once clusters have been detected, other methods must be applied to figure out what each cluster means.
Clustering is similar to classification, but the classes / groups / clusters are not predefined.

Data Mining - Clustering 15

Overview (Cont)

Sometimes a common marketing strategy may not work out.
However, a separate marketing strategy for each cluster, based on age, gender, income, marital status, and years of loyalty (how long the person has been a customer), might work out.

Applications and Exercise

Applications
 Hotel Chain
 Marketing Segmentation
 Offer by Pizza Hut
 Alumni

Exercise: IRIS

Method : K-means Clustering
K Means Clustering
 Decide K
 Decide input attributes
 Define Distance Function
 Goodness of cluster r: Intra-cluster distance / inter-cluster distance
 Iterative method: when to stop
 Address Outliers
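
The steps in this list can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a production implementation: the distance function is Euclidean, the iteration stops when no point changes cluster, the starting centroids are supplied by the caller, and `goodness_ratio` is one possible reading of r (mean distance of points to their own centroid divided by mean distance between centroids), since the slides do not give an exact formula.

```python
import math


def euclidean(p, q):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coord) / len(points) for coord in zip(*points))


def kmeans(points, initial_centroids, max_iter=100):
    """Alternate assignment and centroid update until the labels stop changing."""
    centroids = [tuple(c) for c in initial_centroids]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centroids)), key=lambda j: euclidean(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:      # stopping rule: assignments are stable
            break
        labels = new_labels
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:               # keep the old centroid if a cluster is empty
                centroids[j] = mean_point(members)
    return centroids, labels


def goodness_ratio(points, centroids, labels):
    """Assumed definition of r: mean intra-cluster / mean inter-cluster distance (K >= 2)."""
    intra = sum(euclidean(p, centroids[lab]) for p, lab in zip(points, labels)) / len(points)
    pairs = [(i, j) for i in range(len(centroids)) for j in range(i + 1, len(centroids))]
    inter = sum(euclidean(centroids[i], centroids[j]) for i, j in pairs) / len(pairs)
    return intra / inter


points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, labels = kmeans(points, initial_centroids=[(1, 1), (8, 8)])
print(labels, centroids, goodness_ratio(points, centroids, labels))
```

A small r means clusters are tight relative to how far apart they are. Outliers are not handled here; a single distant point can pull a centroid away from its natural group.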

K Means Clustering
N = 50,000
K = ? = 3
X1: Age
X2: Income
P1 (18, 25,000)
P2 (43, 54,000)
D(Age1, Age2, Income1, Income2) = ?
-------------------------------
SI : (27, 28,000)
SII : (39, 32,000)
SIII : (59, 65,000)
r = Intra-cluster / Inter-cluster = 0.2
[Figure: scatter plot of Income against Age showing clusters SI, SII, SIII and points P1, P2, P3]
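
The open question D(...) = ? matters because Age and Income are on very different scales. The sketch below assigns points to the nearest of the three centroids using Euclidean distance, first on raw values and then with Income divided by 1,000. The `scales` parameter, the choice of 1,000, and the extra point (55, 29,000) are assumptions added for illustration; the slide does not state which distance function was used.

```python
import math

# Centroids as given on the slide: (Age, Income).
CENTROIDS = {"SI": (27, 28_000), "SII": (39, 32_000), "SIII": (59, 65_000)}


def distance(p, q, scales=(1, 1)):
    """Euclidean distance after dividing each attribute by its scale."""
    return math.hypot((p[0] - q[0]) / scales[0], (p[1] - q[1]) / scales[1])


def nearest(point, scales=(1, 1)):
    """Name of the centroid closest to `point`."""
    return min(CENTROIDS, key=lambda name: distance(point, CENTROIDS[name], scales))


P1, P2 = (18, 25_000), (43, 54_000)
print(nearest(P1), nearest(P2))

# Hypothetical customer: age close to SII/SIII, income close to SI.
extra = (55, 29_000)
print(nearest(extra), nearest(extra, scales=(1, 1_000)))
```

On raw values the income difference dominates and age barely affects the assignment, so the hypothetical customer moves to a different cluster once income is rescaled. This is one reason input attributes are usually normalized before clustering.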

BI&A Overview
Effect of outlier on Clustering (K=2)
[Figure: scatter plot of points P1 to P7]

Method : K-means Clustering (cont)

 Guidelines for K
 K=1? K=N? K : large integer? K: small integer?
 Plot “r” vs “K” and look for elbow shape
 Hierarchical Clustering
 Business Interpretation
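
The elbow guideline can be illustrated with a toy example. This sketch swaps in the within-cluster sum of squares (WSS) as the quality measure instead of the ratio r, runs a one-dimensional K-means with a deterministic start, and uses made-up data with three obvious groups; all of these are assumptions chosen to keep the example short.

```python
def kmeans_1d_wss(values, k, max_iter=100):
    """Run 1-D K-means and return the within-cluster sum of squares."""
    data = sorted(values)
    n = len(data)
    # Deterministic start: centroids spread evenly over the sorted data.
    if k == 1:
        centroids = [data[n // 2]]
    else:
        centroids = [data[round(i * (n - 1) / (k - 1))] for i in range(k)]
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: abs(x - centroids[j])) for x in data]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [x for x, lab in zip(data, labels) if lab == j]
            if members:
                centroids[j] = sum(members) / len(members)
    return sum((x - centroids[lab]) ** 2 for x, lab in zip(data, labels))


DATA = [1, 2, 3, 11, 12, 13, 21, 22, 23]   # hypothetical data with three groups
wss_by_k = {k: kmeans_1d_wss(DATA, k) for k in range(1, 5)}
for k, wss in wss_by_k.items():
    print(k, round(wss, 2))
```

The measure drops sharply up to K=3 and hardly at all afterwards, so the curve bends at K=3. At the extremes, K=1 puts everything in one cluster and K=N gives every point its own cluster with zero spread, which is why neither is useful.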

Hierarchical Clustering - Example
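Distance matrix:

     A    B    C    D    E
A         1    2    8    9
B              1    7    7
C                  10    8
D                        2
E

[Figure: dendrogram with Distance D(E1,E2) on the vertical axis and Cluster on the horizontal axis]
K=N=5: {A} {B} {C} {D} {E}
K=3 at distance 1
K=2 at distance 2
K=1 at the top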
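
The merge levels can be reproduced with a short agglomerative clustering sketch. The slide does not name the linkage rule; single linkage (distance between two clusters is the smallest pairwise distance) is assumed here because it gives K=3 at distance 1 and K=2 at distance 2 for this matrix.

```python
# Pairwise distances between A..E as read from the example matrix.
DIST = {
    ("A", "B"): 1, ("A", "C"): 2, ("A", "D"): 8, ("A", "E"): 9,
    ("B", "C"): 1, ("B", "D"): 7, ("B", "E"): 7,
    ("C", "D"): 10, ("C", "E"): 8,
    ("D", "E"): 2,
}


def d(x, y):
    """Symmetric lookup of the distance between two items."""
    return DIST[(x, y)] if (x, y) in DIST else DIST[(y, x)]


def single_link(c1, c2):
    """Assumed cluster distance: smallest distance between any two members."""
    return min(d(x, y) for x in c1 for y in c2)


def agglomerate(items):
    """Merge the two closest clusters repeatedly; record (height, clusters) per step."""
    clusters = [frozenset([item]) for item in items]
    history = [(0, list(clusters))]
    while len(clusters) > 1:
        height, i, j = min(
            (single_link(a, b), i, j)
            for i, a in enumerate(clusters)
            for j, b in enumerate(clusters)
            if i < j
        )
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
        history.append((height, list(clusters)))
    return history


history = agglomerate("ABCDE")
for height, clusters in history:
    print(len(clusters), height, [sorted(c) for c in clusters])
```

Cutting the resulting tree at a chosen distance gives the clusters for that K, so the method does not need K to be fixed in advance. With a different linkage rule, such as complete linkage, the merge heights would differ.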

Dimension Reduction
Overview
Objective:
 Too many variables, many of them are correlated or superfluous
 Accuracy of classification / prediction reduces
 Cost of data collection and processing high
 Need to reduce # of variables

PCA: Principal Component Analysis (Cont)
 This transformation is defined in such a way that the
first principal component has the largest possible
variance (that is, accounts for as much of the
variability in the data as possible)
 Each succeeding component in turn has the highest
variance possible under the constraint that it is
orthogonal to the preceding components.
 The resulting vectors are an uncorrelated
orthogonal basis set.
 PCA is sensitive to the relative scaling of the original
variables.
 It is difficult to interpret PCA
PCA Eigenvalues
[Figure: plot annotated with eigenvalues λ1 and λ2]
PCA
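Z1 = C11*X1 + C12*X2 + C13*X3 + C14*X4
….
Z4 = C41*X1 + C42*X2 + C43*X3 + C44*X4

$$
\begin{pmatrix} Z_1 \\ Z_2 \\ Z_3 \\ Z_4 \end{pmatrix}
=
\begin{pmatrix}
C_{11} & C_{12} & C_{13} & C_{14} \\
C_{21} & C_{22} & C_{23} & C_{24} \\
C_{31} & C_{32} & C_{33} & C_{34} \\
C_{41} & C_{42} & C_{43} & C_{44}
\end{pmatrix}
\begin{pmatrix} X_1 \\ X_2 \\ X_3 \\ X_4 \end{pmatrix}
$$

 Should we normalize X1, X2, ..??
 Z1, Z2 are linearly independent, with mean value zero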
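
A minimal two-variable sketch of the transformation, using only the standard library. It assumes the inputs are mean-centered first and relies on the closed-form eigen-decomposition of a 2x2 covariance matrix; the slides use four variables, and the sample data and function name here are made up for illustration.

```python
import math


def pca_2d(xs, ys):
    """PCA for two variables: returns eigenvalues, coefficient rows, and scores Z1, Z2."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cx = [x - mx for x in xs]                 # mean-centre each variable
    cy = [y - my for y in ys]
    sxx = sum(a * a for a in cx) / (n - 1)    # sample variances and covariance
    syy = sum(b * b for b in cy) / (n - 1)
    sxy = sum(a * b for a, b in zip(cx, cy)) / (n - 1)
    # Eigenvalues of the covariance matrix [[sxx, sxy], [sxy, syy]].
    half_trace = (sxx + syy) / 2
    root = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    lam1, lam2 = half_trace + root, half_trace - root
    # Angle of the direction with the largest variance.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    c11, c12 = math.cos(theta), math.sin(theta)
    c21, c22 = -c12, c11                      # orthogonal to the first direction
    z1 = [c11 * a + c12 * b for a, b in zip(cx, cy)]
    z2 = [c21 * a + c22 * b for a, b in zip(cx, cy)]
    return (lam1, lam2), ((c11, c12), (c21, c22)), z1, z2


xs = [2.5, 0.5, 2.2, 1.9, 3.1, 2.3, 2.0, 1.0, 1.5, 1.1]   # hypothetical sample
ys = [2.4, 0.7, 2.9, 2.2, 3.0, 2.7, 1.6, 1.1, 1.6, 0.9]
(lam1, lam2), coeffs, z1, z2 = pca_2d(xs, ys)
print(lam1, lam2, coeffs)
```

The variance of Z1 equals λ1, the variance of Z2 equals λ2, and the two score columns have zero covariance. Because the eigenvalues depend on the units of X1 and X2, rescaling one input changes the components, which is the reason behind the normalization question.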
