
MEDITERRANEAN SCHOOL OF BUSINESS

COURSE: DATA ANALYTICS

PROFESSOR: Dr. Ramla Jarrar

Lecture 02: Cluster Analysis (3h)

Review

Factor Analysis
Factor analysis is a class of procedures used for data reduction and
summarization.
It is an interdependence technique: no distinction between dependent
and independent variables.

Factor analysis is used:


◦ To identify underlying dimensions, or factors, that explain the
correlations among a set of variables.
◦ To identify a new, smaller, set of uncorrelated variables to replace the
original set of correlated variables.
Factor Analysis Model
The common factors themselves can be expressed as linear
combinations of the observed variables.

Fi = Wi1X1 + Wi2X2 + Wi3X3 + . . . + WikXk

Where:
Fi = estimate of ith factor
Wij = weight or factor score coefficient of the jth variable on the ith factor
k = number of variables
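
As a minimal illustration of this weighted-sum formula (the data and weight matrices below are made up for demonstration; this is not the lecture's example), factor scores can be computed as a matrix product:

```python
import numpy as np

# Hypothetical standardized data: 5 respondents x 3 variables (X1..X3)
X = np.array([
    [ 0.5, -1.2,  0.3],
    [-0.7,  0.4,  1.1],
    [ 1.3,  0.2, -0.9],
    [-0.2, -0.5,  0.8],
    [ 0.1,  1.0, -1.3],
])

# Hypothetical factor score coefficients W (2 factors x 3 variables),
# i.e. the weights W_ij in F_i = W_i1*X1 + ... + W_ik*Xk
W = np.array([
    [0.6, 0.3, 0.1],
    [0.1, 0.2, 0.7],
])

# Each factor score is the weighted sum of the observed variables
F = X @ W.T          # shape: (5 respondents, 2 factors)
print(F)
```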
The Factor Analysis Process
1. Formulate the problem
2. Look at the correlation matrix
3. Determine the method of extraction
4. Determine the number of factors
5. Rotate your factors
6. Interpret your factors
7. Calculate your factor scores
This Session: Cluster Analysis
Course Structure: 24h including 9h labs

Data Analytics
◦ Underlying theory: Recap of Business Statistics (3h incl. Lab)
◦ Multidimensional techniques:
  - Factor Analysis (3h) + Lab (3h)
  - Cluster Analysis (3h) + Lab (3h)
  - Regression (4.5h) + Lab (3h)
◦ Capstone Session (1.5h)
Multivariate Methods

Type of relationship tested:

Dependence
◦ Metric dependent variable: Regression (ANOVA if the regressors are nonmetric)
◦ Nonmetric dependent variable: MDA (Multiple Discriminant Analysis)

Interdependence
◦ The relationship is among the variables: Factor Analysis
◦ The relationship is among cases/respondents: Cluster Analysis
Example
In our database we count more than 10,000 customers; we know
their age, city, income, employment status, and designation (i.e.
level of seniority).

You have to sell 100 BlackBerry phones (each costs $1,000) to the
people in this group.

How can you be efficient in your sales strategy?


Example of Clustering
◦ Divide the whole population into two groups: employed / unemployed.
◦ Further divide the employed population into two groups: high / low salary.
◦ Further divide that group into high / low designation (i.e. senior vs. less senior).
Cluster Analysis
Cluster analysis is a class of techniques used to classify objects or cases into relatively
homogeneous groups called clusters. Objects in each cluster tend to be similar to each
other and dissimilar to objects in the other clusters. Cluster analysis is also called
classification analysis, or numerical taxonomy.

Both cluster analysis and discriminant analysis are concerned with classification.
◦ However, discriminant analysis requires prior knowledge of the cluster or group
membership for each object or case included, to develop the classification rule.
◦ In contrast, in cluster analysis there is no a priori information about the group or cluster
membership for any of the objects. Groups or clusters are suggested by the data, not
defined a priori.
Cluster Analysis vs. Factor Analysis

CLUSTER ANALYSIS
◦ Grouping is based on distance (proximity).
◦ We form groups of people based on their responses to several variables.

FACTOR ANALYSIS
◦ Grouping is based on patterns of variation (correlation).
◦ We form groups of variables based on several people’s responses to those variables.
An Ideal Clustering Situation
Groups are distinct (scatter plot of Variable 1 against Variable 2 showing well-separated groups).

A Practical Clustering Situation
Groups are not that distinct (scatter plot of Variable 1 against Variable 2 showing overlapping groups).
Statistics Associated with Cluster Analysis
Agglomeration schedule: An agglomeration schedule gives information on the
objects or cases being combined at each stage of a hierarchical clustering
process.
Cluster centroid: The cluster centroid is the mean values of the variables for all
the cases or objects in a particular cluster.
Cluster centers: The cluster centers are the initial starting points in
nonhierarchical clustering. Clusters are built around these centers, or seeds.
Cluster membership: Cluster membership indicates the cluster to which each
object or case belongs.
Statistics Associated with Cluster Analysis
Dendrogram: A dendrogram, or tree graph, is a graphical device for
displaying clustering results. Vertical lines represent clusters that are
joined together. The position of the line on the scale indicates the
distances at which clusters were joined. The dendrogram is read from left
to right.
Distances between cluster centers: These distances indicate how
separated the individual pairs of clusters are. Clusters that are widely
separated are distinct, and therefore desirable.
Statistics Associated with Cluster Analysis
Icicle diagram: An icicle diagram is a graphical display of clustering
results, so called because it resembles a row of icicles hanging from the
eaves of a house. The columns correspond to the objects being
clustered, and the rows correspond to the number of clusters. An icicle
diagram is read from bottom to top.
Similarity/distance coefficient matrix: A similarity/distance coefficient
matrix is a lower-triangle matrix containing pairwise distances between
objects or cases.
Conducting Cluster Analysis

I.   Formulate the Problem
II.  Select a Distance Measure
III. Select a Clustering Procedure
IV.  Decide on the Number of Clusters
V.   Interpret & Profile the Clusters
VI.  Assess the Validity of Clustering
I - Formulate the Problem
• Perhaps the most important part of formulating the clustering problem is
selecting the variables on which the clustering is based.
• Inclusion of even one or two irrelevant variables may distort an otherwise
useful clustering solution.
• Basically, the set of variables selected should describe the similarity
between objects in terms that are relevant to the research problem.
• The variables should be selected based on past research, theory, or a
consideration of the hypotheses being tested. In exploratory research,
the researcher should exercise judgment and intuition.
II – Select a Distance Measure (1)
Several distance measures are available, each with
specific characteristics.

◦ Euclidean distance. The most commonly used measure


of similarity. It is the square root of the sum of the
squared differences in values for each variable.
◦ Squared Euclidean distance. The sum of the squared
differences without taking the square root.
◦ City-block (Manhattan) distance. Uses the sum of the
variables’ absolute differences.
◦ Chebychev distance. Is the maximum absolute
difference in the clustering variables’ values. Frequently
used when working with metric (or ordinal) data.
◦ Mahalanobis distance (D2). Is a generalized distance
measure that accounts for the correlations among
variables in a way that weights each variable equally.
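
As a quick sketch of these measures (not part of the lecture; the two example vectors are invented), the common distances can be computed with scipy:

```python
import numpy as np
from scipy.spatial import distance

# Two hypothetical respondents measured on three variables
x = np.array([6.0, 4.0, 7.0])
y = np.array([2.0, 3.0, 1.0])

print(distance.euclidean(x, y))     # square root of the sum of squared differences
print(distance.sqeuclidean(x, y))   # squared Euclidean: sum of squared differences
print(distance.cityblock(x, y))     # Manhattan: sum of absolute differences
print(distance.chebyshev(x, y))     # Chebychev: maximum absolute difference
# distance.mahalanobis(x, y, VI) additionally needs VI, the inverse covariance
# matrix of the variables, which is why it accounts for their correlations
```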
II – Select a Distance Measure (2)
• If the variables are measured in vastly different units, the clustering
solution will be influenced by the units of measurement. In these cases,
before clustering respondents, we must standardize the data by rescaling
each variable to have a mean of zero and a standard deviation of one.
It is also desirable to eliminate outliers (cases with atypical values).
• Use of different distance measures may lead to different clustering
results. Hence, it is advisable to use different measures and compare
the results.
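
A minimal sketch of this standardization step (the small data matrix below is made up for illustration; it is not the lecture's example):

```python
import numpy as np

# Hypothetical raw data: 4 respondents x 2 variables measured in very different units
X = np.array([
    [25_000.0, 34.0],   # income in dollars, age in years
    [48_000.0, 51.0],
    [39_000.0, 29.0],
    [61_000.0, 42.0],
])

# Rescale each variable to mean 0 and standard deviation 1 (a z-score)
Z = (X - X.mean(axis=0)) / X.std(axis=0)
print(Z.mean(axis=0))   # approximately [0, 0]
print(Z.std(axis=0))    # approximately [1, 1]
```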
III – Select a Clustering Procedure (1)

Clustering procedures:

◦ Hierarchical
  - Agglomerative
    · Linkage methods: single, complete, average
    · Variance methods (e.g. Ward’s procedure)
    · Centroid method
  - Divisive
◦ Nonhierarchical: K-Means
◦ Two-Step
III – Select a Clustering Procedure (2)
Hierarchical clustering is characterized by the development of a
hierarchy or tree-like structure. Hierarchical methods can be
agglomerative or divisive.
◦ Agglomerative clustering starts with each object in a separate
cluster. Clusters are formed by grouping objects into bigger and
bigger clusters. This process is continued until all objects are
members of a single cluster.
◦ Divisive clustering starts with all the objects grouped in a single
cluster. Clusters are divided or split until each object is in a
separate cluster.
III – Select a Clustering Procedure (3)-HP
Linkage Methods: The single Linkage
The single linkage method is based on minimum distance, or the
nearest neighbor rule:
◦ At every stage, the distance between two clusters is the distance
between their two closest points.

(Figure: single linkage — the minimum distance between the closest points of Cluster 1 and Cluster 2.)
III – Select a Clustering Procedure (3)- HP
Linkage Methods: The complete Linkage
The complete linkage method is similar to single linkage, except
that it is based on the maximum distance or the furthest neighbor
approach:
◦ In complete linkage, the distance between two clusters is calculated
as the distance between their two furthest points.

(Figure: complete linkage — the maximum distance between the furthest points of Cluster 1 and Cluster 2.)
III – Select a Clustering Procedure (3)- HP
Linkage Methods: Average Linkage
The average linkage method works similarly. However, in this
method, the distance between two clusters is defined as the
average of the distances between all pairs of objects, where one
member of the pair is from each of the clusters.

(Figure: average linkage — the average distance between all pairs of points, one from Cluster 1 and one from Cluster 2.)
III – Select a Clustering Procedure (3)-HP
Variance Method
• The variance methods attempt to generate clusters to minimize the
within-cluster variance.
• A commonly used variance method is the Ward's procedure:
◦ For each cluster, the means for all the variables are computed.
◦ Then, for each object, the squared Euclidean distance to the cluster means is
calculated.
◦ These distances are summed for all the objects.

Ward’s Procedure
III – Select a Clustering Procedure (3)-HP
Centroid Method
• In the centroid methods, the distance between two clusters is the distance
between their centroids (means for all the variables). Every time objects are
grouped, a new centroid is computed.
• Of the hierarchical methods, average linkage and Ward's methods have been
shown to perform better than the other procedures.

Centroid Method
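
To see how these hierarchical options map onto code (a sketch, not the lecture's material; the small data matrix is invented), scipy exposes each linkage rule by name:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

# Hypothetical data: 6 objects measured on 2 variables
X = np.array([[1, 2], [2, 1], [1, 1],
              [8, 9], [9, 8], [8, 8]], dtype=float)

# Each call returns an agglomeration schedule (which clusters merge, at what distance)
Z_single   = linkage(X, method="single")    # minimum distance / nearest neighbour
Z_complete = linkage(X, method="complete")  # maximum distance / furthest neighbour
Z_average  = linkage(X, method="average")   # average of all pairwise distances
Z_centroid = linkage(X, method="centroid")  # distance between cluster centroids
Z_ward     = linkage(X, method="ward")      # minimizes within-cluster variance
print(Z_ward)
```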
III – Select a Clustering Procedure (4) - NHP
K-Means Method
• The nonhierarchical clustering methods are frequently referred to as k-means clustering:
◦ Note that in this procedure the number k of clusters is fixed.
• In the sequential threshold method, a cluster center is selected and all objects within a
prespecified threshold value from the center are grouped together. Then a new cluster
center or seed is selected, and the process is repeated for the unclustered points. Once
an object is clustered with a seed, it is no longer considered for clustering with subsequent
seeds.
• Algorithm (sketched in code below):
1. Place K points (or seeds) into the space represented by the objects being clustered;
these points represent the initial group centroids.
2. Assign each object to the group that has the closest centroid.
3. When all objects have been assigned, recalculate the positions of the K centroids.
4. Repeat steps 2 and 3 until the centroids no longer move.
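
The loop above can be written out directly; the following is a minimal sketch (invented data, K fixed at 2), not the lecture's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))          # hypothetical data: 30 objects, 2 variables
K = 2

# Step 1: pick K objects as initial seeds / centroids
centroids = X[rng.choice(len(X), size=K, replace=False)]

for _ in range(100):
    # Step 2: assign each object to the closest centroid
    dist = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    labels = dist.argmin(axis=1)
    # Step 3: recalculate centroid positions
    new_centroids = np.array([X[labels == k].mean(axis=0) for k in range(K)])
    # Step 4: stop when the centroids no longer move
    # (a production version would also guard against empty clusters)
    if np.allclose(new_centroids, centroids):
        break
    centroids = new_centroids

print(labels, centroids)
```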
Hierarchical vs. Nonhierarchical Methods

HIERARCHICAL CLUSTERING
◦ No decision about the number of clusters is needed up front
◦ Problems when data contain a high level of error
◦ Can be very slow
◦ Initial decisions are more influential (one step only)

NONHIERARCHICAL CLUSTERING
◦ Faster, more reliable
◦ Need to specify the number of clusters (arbitrary)
◦ Need to set the initial seeds (arbitrary)
III – Select a Clustering Procedure (5)
Two-Step Method
• It has been suggested that the hierarchical and
nonhierarchical methods be used in tandem.
• First, an initial clustering solution is obtained using a
hierarchical procedure, such as average linkage or Ward's.
• The number of clusters and cluster centroids so obtained
are used as inputs to the k-means procedure (see the
sketch below).
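
A minimal sketch of this tandem approach (invented data; scikit-learn's KMeans stands in for the lecture's SPSS procedure): Ward's method is run first, and its cluster centroids become the k-means starting seeds:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 4))                 # hypothetical data: 50 cases, 4 variables

# Step 1: hierarchical (Ward's) solution, cut at a chosen number of clusters
k = 3
labels_hier = fcluster(linkage(X, method="ward"), t=k, criterion="maxclust")

# Step 2: centroids of the hierarchical clusters become the k-means seeds
seeds = np.array([X[labels_hier == c].mean(axis=0) for c in range(1, k + 1)])
km = KMeans(n_clusters=k, init=seeds, n_init=1).fit(X)
print(km.labels_)
```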
IV - Decide on the Number of Clusters
•Theoretical, conceptual, or practical considerations may suggest a certain
number of clusters.
•In hierarchical clustering, the distances at which clusters are combined can be
used as criteria. This information can be obtained from the agglomeration
schedule or from the dendrogram.
•In nonhierarchical clustering, the ratio of total within-group variance to
between-group variance can be plotted against the number of clusters.
• The point at which an elbow or a sharp bend occurs indicates an appropriate
number of clusters (see the sketch after this list).
•The relative sizes of the clusters should be meaningful.
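
A sketch of the elbow idea mentioned above (made-up data; a common variant that plots the total within-cluster sum of squares against the number of clusters, rather than the exact variance ratio named in the slide):

```python
import numpy as np
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
# Hypothetical data with three loose groups
X = np.vstack([rng.normal(loc=c, size=(20, 2)) for c in (0, 4, 8)])

ks = range(1, 8)
wss = [KMeans(n_clusters=k, n_init=10).fit(X).inertia_ for k in ks]  # within-cluster SS

plt.plot(list(ks), wss, marker="o")
plt.xlabel("Number of clusters")
plt.ylabel("Total within-cluster sum of squares")
plt.show()   # the 'elbow' suggests an appropriate number of clusters
```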
V - Interpreting and Profiling the Clusters
• Interpreting and profiling clusters involves examining the
cluster centroids. The centroids enable us to describe
each cluster by assigning it a name or label.

• It is often helpful to profile the clusters in terms of


variables that were not used for clustering. These may
include demographic, psychographic, product usage,
media usage, or other variables.
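
As a small sketch of this profiling step (hypothetical variable names and cluster labels; not the lecture's output), the centroids are the per-cluster means of the clustering variables, and the same grouping can be applied to variables that were not used for clustering:

```python
import pandas as pd

# Hypothetical respondents: two clustering variables plus one profiling variable (age)
df = pd.DataFrame({
    "fun":     [6, 2, 7, 1, 5, 3],
    "budget":  [4, 3, 2, 3, 4, 5],
    "age":     [24, 41, 27, 45, 30, 38],   # not used for clustering
    "cluster": [1, 2, 1, 2, 1, 2],         # membership from a previous clustering run
})

centroids = df.groupby("cluster")[["fun", "budget"]].mean()   # describes / labels each cluster
profile   = df.groupby("cluster")["age"].mean()               # profile on a non-clustering variable
print(centroids)
print(profile)
```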
VI - Assess Reliability and Validity
1. Perform cluster analysis on the same data using different distance
measures. Compare the results across measures to determine the
stability of the solutions.
2. Use different methods of clustering and compare the results (see the sketch after this list).
3. Split the data randomly into halves. Perform clustering separately on
each half. Compare cluster centroids across the two subsamples.
4. Delete variables randomly. Perform clustering based on the reduced
set of variables. Compare the results with those obtained by
clustering based on the entire set of variables.
5. In nonhierarchical clustering, the solution may depend on the order of
cases in the data set. Make multiple runs using different order of
cases until the solution stabilizes.
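
Check 2 in the list above (different clustering methods on the same data) can be sketched as follows; the data are invented and the agreement measure (adjusted Rand index) is a common choice rather than something prescribed in the lecture:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(3)
X = rng.normal(size=(40, 3))                    # hypothetical data

labels_ward = fcluster(linkage(X, method="ward"), t=3, criterion="maxclust")
labels_km   = KMeans(n_clusters=3, n_init=10).fit_predict(X)

# Agreement close to 1 suggests a stable solution; near 0 suggests instability
print(adjusted_rand_score(labels_ward, labels_km))
```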
Illustration
Example: Attitudinal Data
Lifestyle Questionnaire
We asked a sample of 19 individuals to rate their attitudes
towards the following statements on a 1-7 Scale:
1. I like having fun
2. Going out is bad for budget
3. I like eating out
4. I always look for bargains and best buys
5. I don’t care about going out often
6. I like comparing prices
Attitudinal Data for Clustering

Case No.   V1   V2   V3   V4   V5   V6
    1       6    4    7    3    2    3
    2       2    3    1    4    5    4
    3       7    2    6    4    1    3
    4       4    6    4    5    3    6
    5       1    3    2    2    6    4
    6       6    4    6    3    3    4
    7       5    3    6    3    3    4
    8       7    3    7    4    1    4
    9       2    4    3    3    6    3
   10       3    5    3    6    4    6
   11       1    3    2    3    5    3
   12       5    4    5    4    2    4
   13       2    2    1    5    4    4
   14       4    6    4    6    4    7
   15       6    5    4    2    1    4
   16       3    5    4    6    4    7
   17       4    4    7    2    2    5
   18       3    7    2    6    4    3
   19       4    6    3    7    2    7
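
The table above is small enough to cluster directly in code. A sketch (not the lecture's SPSS run; scipy's Ward coefficients are scaled differently from SPSS's agglomeration schedule, so the merge values will not match exactly, although the merge order and memberships should be comparable):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# The 19 cases x 6 attitude ratings from the table above
X = np.array([
    [6, 4, 7, 3, 2, 3], [2, 3, 1, 4, 5, 4], [7, 2, 6, 4, 1, 3],
    [4, 6, 4, 5, 3, 6], [1, 3, 2, 2, 6, 4], [6, 4, 6, 3, 3, 4],
    [5, 3, 6, 3, 3, 4], [7, 3, 7, 4, 1, 4], [2, 4, 3, 3, 6, 3],
    [3, 5, 3, 6, 4, 6], [1, 3, 2, 3, 5, 3], [5, 4, 5, 4, 2, 4],
    [2, 2, 1, 5, 4, 4], [4, 6, 4, 6, 4, 7], [6, 5, 4, 2, 1, 4],
    [3, 5, 4, 6, 4, 7], [4, 4, 7, 2, 2, 5], [3, 7, 2, 6, 4, 3],
    [4, 6, 3, 7, 2, 7],
], dtype=float)

Z = linkage(X, method="ward")                     # agglomeration schedule (18 merge steps)
labels_3 = fcluster(Z, t=3, criterion="maxclust") # 3-cluster membership
labels_2 = fcluster(Z, t=2, criterion="maxclust") # 2-cluster membership
print(labels_3)
print(labels_2)
```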
Results of Hierarchical Clustering

• Stage 1: individuals 14 and 16 are the first to be joined together.
• Stage 2: individuals 6 and 7 now form a segment.
• Stage 6: individual 10 joins the cluster formed at stage 1 (14 & 16); we now have a cluster containing 14, 16 and 10.
• The coefficient is the amount of error created at each clustering stage. A large jump in the value of the error term indicates that two different things have been brought together.

Agglomeration Schedule

Stage   Cluster 1   Cluster 2   Coefficients   Stage Cluster 1 First Appears   Stage Cluster 2 First Appears   Next Stage
  1         14          16          1.000                  0                              0                        6
  2          6           7          2.000                  0                              0                        7
  3          2          13          3.500                  0                              0                       14
  4          5          11          5.000                  0                              0                        8
  5          3           8          6.500                  0                              0                       15
  6         10          14          8.167                  0                              1                        9
  7          6          12         10.500                  2                              0                       10
  8          5           9         13.000                  4                              0                       14
  9          4          10         15.583                  0                              6                       11
 10          1           6         18.500                  0                              7                       12
 11          4          19         23.250                  9                              0                       16
 12          1          17         28.600                 10                              0                       13
 13          1          15         36.833                 12                              0                       15
 14          2           5         46.533                  3                              8                       17
 15          1           3         59.200                 13                              5                       18
 16          4          18         74.367                 11                              0                       17
 17          2           4        154.545                 14                             16                       18
 18          1           2        300.632                 15                             17                        0
Dendrogram
• Graphical representation (tree graph) of the results of a hierarchical procedure,
starting with each object as a separate cluster.
• The dendrogram shows graphically how the clusters are combined at each step of
the procedure until all are contained in a single cluster.
• Here, 3 clusters (labelled Cluster 1, Cluster 2 and Cluster 3 on the plot) seem to give
a satisfactory overall similarity.
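
A dendrogram like the one described here can be drawn from the same linkage output; a sketch with an invented stand-in data matrix rather than the lecture's figure:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram
import matplotlib.pyplot as plt

rng = np.random.default_rng(4)
X = rng.normal(size=(19, 6))          # hypothetical stand-in for the 19 x 6 attitude ratings

Z = linkage(X, method="ward")
dendrogram(Z)                         # tree graph of the merge sequence
plt.xlabel("Cases")
plt.ylabel("Distance at which clusters are joined")
plt.show()
```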
Cluster Membership
Solution 1: 3 clusters / Solution 2: 2 clusters

Case   3 Clusters   2 Clusters
  1        1            1
  2        2            2
  3        1            1
  4        3            2
  5        2            2
  6        1            1
  7        1            1
  8        1            1
  9        2            2
 10        3            2
 11        2            2
 12        1            1
 13        2            2
 14        3            2
 15        1            1
 16        3            2
 17        1            1
 18        3            2
 19        3            2

• Individuals 1 & 3 belong to cluster 1; individuals 5 & 9 belong to cluster 2;
individuals 14 & 16 belong to cluster 3.
• The last column gives each individual's membership for the 2-cluster solution.
Cluster Membership: Icicle Plot
(Columns correspond to the individuals or cases.)

• 3-cluster solution: individuals 9, 11, 5, 13 and 2 belong to cluster 2;
individuals 18, 19, 16, 14, 10 and 4 belong to cluster 3.
• 5-cluster solution: individuals 9, 11, 5, 13 and 2 belong to cluster 3;
individual 18 belongs to cluster 1.
Cluster Centroids: Description of Clusters
Examine the cluster centroids (we retained 3 clusters here).

Report (mean of each variable per cluster)

                                 Cluster 1   Cluster 2   Cluster 3    Total
I like having fun                  5.7500      1.6000      3.5000     3.9474
Going out is bad for budget        3.6250      3.0000      5.8333     4.1579
I like eating out                  6.0000      1.8000      3.3333     4.0526
I always look for best buys        3.1250      3.4000      6.0000     4.1053
I don't care about going out       1.8750      5.2000      3.5000     3.2632
I like comparing prices            3.8750      3.6000      6.0000     4.4737

• Cluster 1: 3eich (party people)
• Cluster 2: Zeid nekess (don’t care)
• Cluster 3: El Hedhek (the stingy)
Results of Nonhierarchical Clustering (K-Means)
• The centroids of the initial solution are the starting seeds; individuals are finally
classified relative to the centroids of the final solution.
• We go through a series of iterations until the centroids stabilize and do not change
from one iteration to the next.

Initial Cluster Centers
                                           Cluster 1   Cluster 2   Cluster 3
I like having fun                             4.00        1.00        7.00
Going out is bad for budget                   6.00        3.00        2.00
I like eating out                             3.00        2.00        6.00
I always look for best buys and bargains      7.00        2.00        4.00
I don't care about going out                  2.00        6.00        1.00
I like comparing prices                       7.00        4.00        3.00

Final Cluster Centers
                                           Cluster 1   Cluster 2   Cluster 3
I like having fun                             3.50        1.67        5.75
Going out is bad for budget                   5.83        3.00        3.63
I like eating out                             3.33        1.83        6.00
I always look for best buys and bargains      6.00        3.50        3.13
I don't care about going out                  3.50        5.50        1.88
I like comparing prices                       6.00        3.67        3.88
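
The same kind of k-means output can be produced in code. A sketch (not the lecture's SPSS run; the cluster numbering and exact centers may differ because the result depends on the starting seeds):

```python
import numpy as np
from sklearn.cluster import KMeans

# The 19 x 6 attitudinal ratings from the earlier table
X = np.array([
    [6, 4, 7, 3, 2, 3], [2, 3, 1, 4, 5, 4], [7, 2, 6, 4, 1, 3],
    [4, 6, 4, 5, 3, 6], [1, 3, 2, 2, 6, 4], [6, 4, 6, 3, 3, 4],
    [5, 3, 6, 3, 3, 4], [7, 3, 7, 4, 1, 4], [2, 4, 3, 3, 6, 3],
    [3, 5, 3, 6, 4, 6], [1, 3, 2, 3, 5, 3], [5, 4, 5, 4, 2, 4],
    [2, 2, 1, 5, 4, 4], [4, 6, 4, 6, 4, 7], [6, 5, 4, 2, 1, 4],
    [3, 5, 4, 6, 4, 7], [4, 4, 7, 2, 2, 5], [3, 7, 2, 6, 4, 3],
    [4, 6, 3, 7, 2, 7],
], dtype=float)

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(km.cluster_centers_)   # final cluster centers
print(km.labels_)            # final cluster membership for each case
```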
Cluster Membership (K-Means) for 3 Clusters

Case Number   Cluster   Distance
     1           3        1.414
     2           2        1.190
     3           3        2.550
     4           1        1.404
     5           2        1.756
     6           3        1.225
     7           3        1.500
     8           3        2.121
     9           2        1.848
    10           1        1.143
    11           2        1.190
    12           3        1.581
    13           2        2.533
    14           1        1.404
    15           3        2.828
    16           1        1.624
    17           3        2.598
    18           1        3.555
    19           1        2.154
    20           2        1.658

• Final cluster membership.
• Distance is the distance between each individual and its cluster centroid.
Distance Between the 3 Cluster Centroids

Distances Between Final Cluster Centers

Cluster      1        2        3
   1                5.416    5.698
   2       5.416             6.910
   3       5.698    6.910

The between-cluster distance should be bigger than the within-cluster distance.
Thank you
ANOVA Analysis

Simple Example
Suppose a marketing researcher wishes to determine market segments in a
community based on patterns of loyalty to brands and stores.
A small sample of seven respondents is selected as a pilot test of how cluster
analysis is applied.
◦ Two measures of loyalty were measured for each respondent on a 0–10 scale:
◦ V1 (store loyalty)
◦ V2 (brand loyalty)
Scatter Plot of the responses
How do we measure similarity?
Proximity Matrix of Euclidean Distance Between Observations
Observations
Observation
A B C D E F G

A ---
B 3.162 ---
C 5.099 2.000 ---
D 5.099 2.828 2.000 ---
E 5.000 2.236 2.236 4.123 ---
F 6.403 3.606 3.000 5.000 1.414 ---
G 3.606 2.236 3.606 5.000 2.000 3.162 ---
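
Since only the distance matrix is reported here (not the raw V1/V2 scores), the hierarchical procedure can be run directly on these pairwise distances. A sketch assuming single linkage, which matches the nearest-neighbour rule used in the agglomerative process shown next:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform

labels = ["A", "B", "C", "D", "E", "F", "G"]
# Euclidean distances between observations, copied from the proximity matrix above
D = np.array([
    [0.000, 3.162, 5.099, 5.099, 5.000, 6.403, 3.606],
    [3.162, 0.000, 2.000, 2.828, 2.236, 3.606, 2.236],
    [5.099, 2.000, 0.000, 2.000, 2.236, 3.000, 3.606],
    [5.099, 2.828, 2.000, 0.000, 4.123, 5.000, 5.000],
    [5.000, 2.236, 2.236, 4.123, 0.000, 1.414, 2.000],
    [6.403, 3.606, 3.000, 5.000, 1.414, 0.000, 3.162],
    [3.606, 2.236, 3.606, 5.000, 2.000, 3.162, 0.000],
])

# squareform turns the symmetric matrix into the condensed form linkage expects
Z = linkage(squareform(D), method="single")   # nearest-neighbour merging
print(Z)   # the first merge is E-F at distance 1.414
```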
How do we form clusters?
SIMPLE RULE:
◦ Identify the two most similar (closest) observations not already in the same cluster and
combine them.

◦ We apply this rule repeatedly to generate a number of cluster solutions, starting with
each observation as its own “cluster” and then combining two clusters at a time until all
observations are in a single cluster.
◦ This process is termed a hierarchical procedure because it moves in a stepwise fashion to form an entire range
of cluster solutions. It is also an agglomerative method because clusters are formed by combining existing
clusters
How do we form clusters?

AGGLOMERATIVE PROCESS AND CLUSTER SOLUTION

Step               Minimum Distance Between    Observation   Cluster Membership       Number of   Overall Similarity Measure
                   Unclustered Observations    Pair                                   Clusters    (Average Within-Cluster Distance)
Initial solution             —                    —          (A)(B)(C)(D)(E)(F)(G)        7                 0
       1                   1.414                 E-F         (A)(B)(C)(D)(E-F)(G)         6                 1.414
       2                   2.000                 E-G         (A)(B)(C)(D)(E-F-G)          5                 2.192
       3                   2.000                 C-D         (A)(B)(C-D)(E-F-G)           4                 2.144
       4                   2.000                 B-C         (A)(B-C-D)(E-F-G)            3                 2.234
       5                   2.236                 B-E         (A)(B-C-D-E-F-G)             2                 2.896
       6                   3.162                 A-B         (A-B-C-D-E-F-G)              1                 3.420

• In steps 1, 2, 3 and 4, the overall similarity measure does not change substantially, which
indicates that we are forming other clusters with essentially the same heterogeneity as the
existing clusters.
• When we get to step 5, we see a large increase. This indicates that joining clusters (B-C-D)
and (E-F-G) resulted in a single cluster that was markedly less homogeneous.
How many groups do we form?
Therefore, the three-cluster solution of Step 4 seems the most appropriate for a
final cluster solution, with two equally sized clusters, (B-C-D) and (E-F-G), and a
single outlying observation (A).

This approach is particularly useful in identifying outliers, such as Observation A. It


also depicts the relative size of varying clusters, although it becomes unwieldy
when the number of observations increases.
Graphical Portrayals: Dendrogram
A dendrogram is a graphical representation (tree graph) of the results of a hierarchical
procedure, starting with each object as a separate cluster.
The dendrogram shows graphically how the clusters are combined at each step of the
procedure until all are contained in a single cluster.
Clustering in SPSS
To select these procedures using SPSS for Windows click:

Analyze>Classify>Hierarchical Cluster …

Analyze>Classify>K-Means Cluster …

Analyze>Classify>Two-Step Cluster …
SPSS Windows: Hierarchical Clustering
1. Select ANALYZE from the SPSS menu bar.
2. Click CLASSIFY and then HIERARCHICAL CLUSTER.
3. Move “Fun [v1],” “Bad for Budget [v2],” “Eating Out [v3],” “Best Buys [v4],” “Don’t Care [v5],”
and “Compare Prices [v6]” into the VARIABLES box.
4. In the CLUSTER box check CASES (default option). In the DISPLAY box check STATISTICS and
PLOTS (default options).
5. Click on STATISTICS. In the pop-up window, check AGGLOMERATION SCHEDULE. In the
CLUSTER MEMBERSHIP box check RANGE OF SOLUTIONS. Then, for MINIMUM NUMBER OF
CLUSTERS: enter 2 and for MAXIMUM NUMBER OF CLUSTERS enter 4. Click CONTINUE.
6. Click on PLOTS. In the pop-up window, check DENDROGRAM. In the ICICLE box check ALL
CLUSTERS (default). In the ORIENTATION box, check VERTICAL. Click CONTINUE.
7. Click on METHOD. For CLUSTER METHOD select WARD’S METHOD. In the MEASURE box check
INTERVAL and select SQUARED EUCLIDEAN DISTANCE. Click CONTINUE
8. Click OK.
SPSS Windows: K-Means
Clustering
1. Select ANALYZE from the SPSS menu bar.
2. Click CLASSIFY and then K-MEANS CLUSTER.
3. Move “Fun [v1],” “Bad for Budget [v2],” “Eating Out [v3],” “Best Buys [v4],”
“Don’t Care [v5],” and “Compare Prices [v6]” into the VARIABLES box.
4. For NUMBER OF CLUSTERS select 3.
5. Click on OPTIONS. In the pop-up window, In the STATISTICS box, check
INITIAL CLUSTER CENTERS and CLUSTER INFORMATION FOR EACH CASE. Click
CONTINUE.
6. Click OK.
SPSS Windows: Two-Step
Clustering
1. Select ANALYZE from the SPSS menu bar.
2. Click CLASSIFY and then TWO-STEP CLUSTER.
3. Move “Fun [v1],” “Bad for Budget [v2],” “Eating Out [v3],” “Best Buys [v4],”
“Don’t Care [v5],” and “Compare Prices [v6]” into the CONTINUOUS
VARIABLES box.
4. For DISTANCE MEASURE select EUCLIDEAN.
5. For NUMBER OF CLUSTERS select DETERMINE AUTOMATICALLY.
6. For CLUSTERING CRITERION select AKAIKE’S INFORMATION CRITERION (AIC).
7. Click OK.
