Predictive Analytics and Data Mining: Segmentation Using Clustering


Predictive Analytics and Data Mining

Segmentation using Clustering

Automatic Cluster Detection

DM techniques used to find patterns in data

• Not always easy to identify
  - No observable pattern
  - Too many patterns

Decomposition (break down into smaller pieces)

Automatic Cluster Detection is useful to find “better behaved” clusters of data within a larger dataset
Automatic Cluster Detection

K-Means clustering algorithm – depends on a geometric interpretation of the data

Other automatic cluster detection (ACD) algorithms include:

• Gaussian mixture models
• Agglomerative clustering
• Divisive clustering
• Self-organizing maps (SOM) - Neural Nets

ACD is a tool
• No preclassified training data set
• No distinction between independent and dependent variables
• Marketing clusters referred to as “segments”
• Customer segmentation is a popular application of clustering

ACD rarely used in isolation – other methods follow up


Organizing customers into groups with similar traits, product preferences or expectations
• Demographic Characteristics
• Psychographics (interests, attitudes, opinions, personality, values,
• Desired benefits from products/services
• Past-purchase or product use behaviors
K-means Clustering

K-means – circa 1967 – this algorithm looks for a fixed number (K) of clusters, which are defined in terms of proximity of data points to each other

How K-means works (see next slide figures):

• The algorithm selects K data points at random as the initial cluster seeds
• It assigns each of the remaining data points to the cluster of its closest seed
• It calculates the mean of the cases in each cluster and moves that cluster's seed to the mean
• It reassigns each case to cluster i when the new seed i is the closest one, and repeats the last two steps until the assignments stop changing
• Closeness is measured by Euclidean distance: the distance between two points (u1, v1) and (u2, v2) is sqrt((u1 - u2)^2 + (v1 - v2)^2)
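
The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation of any particular DM package: the function names, the fixed random seed, the iteration cap, and the sample points are all assumptions made for the example.

```python
import math
import random


def euclidean(p, q):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coord) / len(points) for coord in zip(*points))


def k_means(points, k, max_iter=100, seed=0):
    """Return (seeds, labels) after repeating the assign/update steps."""
    rng = random.Random(seed)        # fixed seed: an assumption, for repeatability
    seeds = rng.sample(points, k)    # pick K data points at random
    labels = None
    for _ in range(max_iter):
        # Assign every point to its closest seed.
        new_labels = [min(range(k), key=lambda i: euclidean(p, seeds[i]))
                      for p in points]
        if new_labels == labels:     # assignments stopped changing
            break
        labels = new_labels
        # Move each seed to the mean of the cases assigned to it.
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:              # keep the old seed if a cluster is empty
                seeds[i] = mean_point(members)
    return seeds, labels


# Hypothetical two-dimensional data with two visible groups.
data = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
seeds, labels = k_means(data, k=2)
```

On data this well separated, the first three points should end up sharing one label and the last three the other; on real business data the result can depend on the random starting seeds.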
K-means Clustering

Resulting clusters describe underlying structure in the data; however, there is no one right description of that structure
Similarity & Difference

Automatic Cluster Detection is quite simple for a software program to accomplish – data points, clusters mapped in space

However, business data points are not about points in space but about
purchases, phone calls, airplane trips, car registrations, etc. which
have no obvious connection to the dots in a cluster diagram
Similarity & Difference

Clustering business data requires some notion of natural association – records (data) in a given cluster are more similar to each other than to those in another cluster

For DM software, this concept of association must be translated into some sort of numeric measure of the degree of similarity

The most common approach is to translate data values (e.g., gender, age, product, etc.) into numeric values so that records can be treated as points in space

If two points are close in the geometric sense, then they represent similar data in the database
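
One possible way to turn records into points is sketched below. It assumes 0/1 indicator columns for categorical fields and min-max rescaling for numeric ones. Both choices, the helper names, and the sample records are illustrative assumptions rather than something prescribed here.

```python
def one_hot(value, categories):
    """Encode a categorical value as a list of 0/1 indicators."""
    return [1.0 if value == c else 0.0 for c in categories]


def min_max(value, low, high):
    """Rescale a numeric value to the 0-1 range."""
    return (value - low) / (high - low)


# Hypothetical customer records (values invented for illustration).
records = [
    {"gender": "F", "age": 25, "product": "mint"},
    {"gender": "M", "age": 45, "product": "cherry"},
    {"gender": "F", "age": 65, "product": "mint"},
]

genders = sorted({r["gender"] for r in records})
products = sorted({r["product"] for r in records})
ages = [r["age"] for r in records]

# Each record becomes one numeric point:
# [gender indicators..., scaled age, product indicators...]
points = [
    one_hot(r["gender"], genders)
    + [min_max(r["age"], min(ages), max(ages))]
    + one_hot(r["product"], products)
    for r in records
]
```

Rescaling matters because a field measured in large units would otherwise dominate the distance between points.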
Similarity & Difference

Business variable (fields) types:

• Categorical (eg., mint, cherry, chocolate)
• Ranks (eg., freshman, soph, etc. or valedictorian, salutatorian)
• Intervals (eg., 56 degrees, 72 degrees, etc)
• True measures – interval variables that measure from a meaningful zero point
  - Age, weight, height, length, tenure are good examples
Pattern Discovery

“…the discovery of interesting, unexpected, or valuable structures in large data sets.”
- David Hand, Professor of Statistics, Imperial College

“If you’ve got terabytes of data, and you’re relying on data mining to find interesting things in there for you, you’ve lost before you’ve even begun. You really need people who understand what it is they are looking for – and what they can do with it once they find it.”
- Herb Edelstein, President of Two Crows Corporation
Inputs (Desirable Characteristics)

Meaningful to the analysis objective

Relatively independent

Limited in number

Have a measurement level of Interval

Have low kurtosis and skewness statistics
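
The skewness and kurtosis of a candidate input can be checked before clustering. The sketch below uses population moments and a log transform as one common remedy for right skew. The function name, the choice of transform, and the income figures are assumptions for illustration.

```python
import math


def moments(values):
    """Return (skewness, excess kurtosis) using population moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


# Hypothetical right-skewed input, e.g. household income in thousands.
income = [20, 22, 25, 27, 30, 32, 35, 40, 60, 250]

raw_skew, raw_kurt = moments(income)
log_skew, log_kurt = moments([math.log(v) for v in income])
```

For this sample the log-transformed values are less skewed than the raw ones, which makes the input better suited to a distance-based method such as k-means.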

What Value of K to Use

Subject Matter Knowledge (there are most likely five groups)

Convenience (It is convenient to market to 3 or 4 groups)

Constraints (You have 5 products and need 5 segments)

Arbitrarily (always pick 10)

Based on the Data (Ward’s method or Elbow Criterion)

(Elbow plot – plot of the ratio of within-cluster variance to between-cluster variance vs. the number of clusters)
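
The numbers behind an elbow plot can be computed as in the sketch below. It uses sums of squares for the within-cluster and between-cluster variation. To give the same answer on every run it swaps random seeding for a deterministic farthest-first seeding, which is an assumption of this example, as are the function names and the sample points.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(c) / len(points) for c in zip(*points))


def k_means(points, k, max_iter=100):
    """K-means with farthest-first seeding (assumption: not random seeds)."""
    seeds = [points[0]]
    while len(seeds) < k:
        seeds.append(max(points, key=lambda p: min(sq_dist(p, s) for s in seeds)))
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda i: sq_dist(p, seeds[i]))
                      for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                seeds[i] = centroid(members)
    return labels


def elbow_ratio(points, labels):
    """Within-cluster sum of squares divided by between-cluster sum of squares."""
    overall = centroid(points)
    within = between = 0.0
    for lab in set(labels):
        members = [p for p, l in zip(points, labels) if l == lab]
        c = centroid(members)
        within += sum(sq_dist(p, c) for p in members)
        between += len(members) * sq_dist(c, overall)
    return within / between


# Three hypothetical, well-separated groups of points.
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10),
        (20, 0), (20, 1), (21, 0)]
ratios = {k: elbow_ratio(data, k_means(data, k)) for k in range(2, 6)}
```

Plotting `ratios` against k for this sample shows a sharp drop from k = 2 to k = 3 and almost no improvement afterwards, so the elbow suggests three clusters.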
Ward’s Method

Algorithm for Hierarchical cluster analysis

In this method each observation starts as its own cluster, and the clusters are hierarchically joined so that each merge keeps the ratio of within-cluster to between-cluster variation as small as possible (equivalently, it adds the least within-cluster variation)

Based on a statistical analysis, the number of clusters is selected

This number of clusters is used for k-means cluster analysis
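
A small sketch of the idea follows. The merge cost is the increase in within-cluster sum of squares, which is the usual form of Ward's criterion. The rule used to pick the number of clusters (stop before the largest jump in merge cost) is a simple heuristic assumed for this example, not the statistical analysis mentioned above. Function names and data are also illustrative.

```python
def ward_merge_costs(points):
    """Agglomerate with Ward's criterion; return the cost of each merge."""
    # Each cluster is (size, centroid); every observation starts alone.
    clusters = [(1, tuple(float(x) for x in p)) for p in points]
    costs = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                (na, ca), (nb, cb) = clusters[i], clusters[j]
                d2 = sum((a - b) ** 2 for a, b in zip(ca, cb))
                cost = na * nb / (na + nb) * d2  # increase in within-cluster SS
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        cost, i, j = best
        (na, ca), (nb, cb) = clusters[i], clusters[j]
        merged = (na + nb,
                  tuple((na * a + nb * b) / (na + nb) for a, b in zip(ca, cb)))
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
        costs.append(cost)
    return costs


def suggest_k(costs):
    """Heuristic (an assumption): stop before the largest jump in merge cost."""
    n = len(costs) + 1                       # number of observations
    jumps = [costs[m] - costs[m - 1] for m in range(1, len(costs))]
    m = jumps.index(max(jumps)) + 1          # first "expensive" merge
    return n - m                             # clusters left before that merge


# Three hypothetical, well-separated groups of points.
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10),
        (20, 0), (20, 1), (21, 0)]
costs = ward_merge_costs(data)
k = suggest_k(costs)
```

The value of `k` found this way would then be passed to a k-means run on the same data.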

Ward’s Method in SAS Enterprise Miner

(1) Preliminary k-means clustering on the data to save many cluster centroids (default 50)

(2) Ward’s hierarchical clustering on the saved cluster centroids (k, then k-1, k-2, etc.) to determine the ideal value of k (greater than the minimum specified in the selection criteria, with a CCC (cubic clustering criterion) statistic greater than the threshold specified in the selection criteria)

(3) K-means clustering on the original dataset using k from step 2

Evaluating Clusters

What does it mean to say that a cluster is “good”?

• Clusters should have members that have a high degree of similarity
• Standard way to measure within-cluster similarity is variance* – the cluster with the lowest variance is considered best
• Cluster size is also important, so an alternate approach is to use average variance**

* The sum of the squared differences of each element from the mean
** The total variance divided by the size of the cluster
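
The two measures defined in the footnotes can be written directly as code. The sketch below works on a single input for simplicity, and the two sample clusters are invented to show why cluster size matters.

```python
def cluster_variance(values):
    """Sum of the squared differences of each element from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def average_variance(values):
    """Total variance divided by the size of the cluster."""
    return cluster_variance(values) / len(values)


# Hypothetical clusters on one input (e.g. customer age).
small_cluster = [10, 20]
large_cluster = [40, 41, 42, 43, 44, 45, 46, 47, 48, 49]
```

Here the large cluster has the higher total variance simply because it has more members, yet its average variance is lower, so the average is the fairer comparison between clusters of different sizes.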
Evaluating Clusters

Finally, if detection identifies good clusters along with weak ones, it could be useful to set the good ones aside (for further study) and run the analysis again to see if improved clusters are revealed from only the weaker ones
Validating Clusters

Goal: obtain meaningful and useful clusters

(1) Random chance can often produce apparent clusters
(2) Different cluster methods produce different results


Obtain summary statistics

Also review clusters in terms of variables not used in clustering

Label the cluster (e.g., clustering of financial firms in 2008 might yield a label like “midsize, sub-prime loser”)
Desirable Cluster Features

Stability – are clusters and cluster assignments sensitive to slight changes in inputs? Are cluster assignments in partition B similar to partition A?

Separation – check ratio of between-cluster variation to within-cluster variation (higher is better)
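
One way to put a number on stability is the Rand index, the share of record pairs on which two sets of cluster assignments agree. It is swapped in here as an illustrative measure and is not named above. The sketch assumes that partitions A and B are two labelings of the same records, for example from two runs with slightly changed inputs.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Share of record pairs on which two sets of assignments agree."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


# Hypothetical assignments for six records.
partition_a = [0, 0, 0, 1, 1, 1]
partition_b = [1, 1, 1, 0, 0, 0]   # same grouping, different label names
partition_c = [0, 0, 1, 1, 1, 1]   # one record moved to the other cluster

same_grouping = rand_index(partition_a, partition_b)
one_moved = rand_index(partition_a, partition_c)
```

A value of 1 means the two runs grouped the records identically even if the cluster numbers differ; lower values point to less stable clusters.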
Grocery Store Case Study
Analysis goal:
Where should you open new grocery store locations?
Group geographic regions into segments based on
income, household size, and population density.
Analysis plan:
• Select and transform segmentation inputs.
• Select the number of segments to create.
• Create segments with the Cluster tool.
• Interpret the segments.

