
Factor Analysis and Segmentation

Disclaimer: This material is protected under copyright, AnalytixLabs © 2011-2016. Unauthorized use and/or duplication of this material or any part of this material, including data, in any form without explicit and written permission from AnalytixLabs is strictly prohibited. Any violation of this copyright will attract legal action.
Introduction to Segmentation

Segmentation
Each individual is so different that ideally we would want to reach out to each one of them in a different way.

Problem: The volume is too large for customization at the individual level.

Solution: Identify segments where people have similar characteristics and target each of these segments in a different way.

Segmentation is for better targeting.

Cluster Analysis – Example

Business Example
Consider a portfolio of 1,000 customers with credit accounts. The business wants to apply different strategies to different groups of people. How can the company group them into similar groups?

In this case we need some profiling, as below:

Total Population (1,000), profiled into four groups:
  Group 1: Avg. delinquency age = 15 days, Avg. age = 33 yrs., Avg. utilization = 60%
  Group 2: Avg. delinquency age = 12 days, Avg. age = 25 yrs., Avg. utilization = 90%
  Group 3: Avg. delinquency age = 0 days,  Avg. age = 35 yrs., Avg. utilization > 80%
  Group 4: Avg. delinquency age = 75 days, Avg. age = 50 yrs., Avg. utilization = 40%

We can exclude the group with avg. delinquency age = 75 days from mailing.

This type of segmentation is known as ‘Subjective Segmentation’. It gives the salient characteristics of the best customers.
Applications of Segmentation
Customer Segmentation
Customer Segmentation:
• Customer segmentation is the process of splitting your customer database into smaller groups. By
focusing on specific customer types, you will maximize customer lifetime value and better understand
who they are and what they need.

Typically customers differ in terms of:


• Products they are interested in
• Marketing channels they interact with (e.g. offline media like TV and press, social networks etc.)
• The maximum amount they can pay for a product (willingness to pay)
• Types of promotions and benefits they expect (discounts, free shipping)
• Buying patterns and frequency.
Key variables to use for customer segmentation
Geographical location – knowing where customers live can give you a good idea on their income and lifestyle (you can
also incorporate databases like Experian Mosaic)

Age and gender – younger customers are often more impulsive and frequent buyers while female customers might
have a higher long-term value

Acquisition channel – e.g. customers from social media are often less valuable than customers navigating to your site directly

First product purchased – pay close attention to the transaction value and product category to differentiate between
price-focused and quality-focused customers

Device types – e.g. customers using a mobile device typically spend less than customers on a desktop PC

Recency, Frequency and Monetary value of customer transactions – together these three form a complete segmentation strategy (RFM) in their own right

etc…
Applications of customer segmentation
Customer segmentation can help other parts of your business. It will allow you to:

 Improve customer retention by providing products tailored for specific segments

 Increase profits by leveraging disposable incomes and willingness to spend

 Grow your business quicker by focusing marketing campaigns on segments with higher propensity to buy

 Improve customer lifetime value by identifying purchasing patterns and targeting customers when they are in the market

 Retain customers by appearing as relevant and responsive

 Identify new product opportunities and improve the products you already have

 Optimize operations by focusing on geographies, age groups etc. with the most value

 Increase sales by offering free shipping to high frequency buyers

 Offer improved customer support to VIP customers

 Gain brand evangelists by incentivising them to comment, review or talk about your product with free gifts or discounts

 Reactivate customers who have churned and no longer interact with you
Types of customer Segmentation

 Value Based Segmentation: Customer ranking and segmentation according to current and
expected/estimated customer value

 Life Stage Segmentation: Segmentation according to the current life stage to which he/she belongs

 Loyalty Segmentation: Segmentation according to current and previous value

 Behavioral Segmentation: Customer segmentation based on behavioral attributes


There are 3 approaches to behavioral segmentation
1. Rule-based (hypothesis driven)
   Description: Segment customers manually, based on 1 to 3 factors, to drive a specific business objective.
   When to do: Only a couple of factors are thought to drive the segments; there is a known hypothesis to cut the data to create segments.
   Suggested technique: Cross-tabs and conditional data cuts.
   Client example: A cable client segmented prospects on their potential telecom spend and used the segmentation to align sales resources and offers to improve go-to-market strategy.

2. Supervised (with a dependent variable)
   Description: Segment customers using a predictive algorithm, based on a high number of factors that potentially drive a specific outcome.
   When to do: Data-driven segments desired, but first and foremost segments need to be differentiated on a specific outcome/metric (e.g. revenue).
   Suggested technique: CHAID.
   Client example: A telecom client segmented customers on various factors that drive churn propensity and targeted high-churn segments with retention campaigns and offers.

3. Unsupervised (without a dependent variable)
   Description: Segment customers using a clustering algorithm, based on a high number of factors.
   When to do: Data-driven segments desired; segments need to be differentiated across many behavioral factors.
   Suggested technique: TwoStep, K-Means.
   Client example: A retail client segmented customers on behavioral shopping factors that included category spend, shopping frequency/tendency, and store/channel shopped, to inform merchandising and offer strategy.
RFM Segmentation

RFM Segmentation – Steps
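The step slides are image-only in the source deck, so here is a hedged pandas sketch of the usual RFM recipe: compute Recency, Frequency and Monetary value per customer, score each on quintiles, and combine the scores into a segment label. The column names, the synthetic data and the quintile scheme are my assumptions, not taken from the deck.

    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(0)
    n = 500
    # Hypothetical transaction data: one row per transaction
    tx = pd.DataFrame({
        "customer_id": rng.integers(1, 101, size=n),   # 100 customers
        "date": pd.Timestamp("2016-01-01") + pd.to_timedelta(rng.integers(0, 180, size=n), unit="D"),
        "amount": rng.gamma(2.0, 50.0, size=n).round(2),
    })

    snapshot = tx["date"].max() + pd.Timedelta(days=1)

    # Step 1: raw Recency, Frequency, Monetary value per customer
    rfm = tx.groupby("customer_id").agg(
        recency=("date", lambda d: (snapshot - d.max()).days),
        frequency=("date", "count"),
        monetary=("amount", "sum"),
    )

    # Step 2: quintile scores; 5 = best (most recent, most frequent, highest spend)
    rfm["R"] = pd.qcut(rfm["recency"].rank(method="first"), 5, labels=[5, 4, 3, 2, 1]).astype(int)
    rfm["F"] = pd.qcut(rfm["frequency"].rank(method="first"), 5, labels=[1, 2, 3, 4, 5]).astype(int)
    rfm["M"] = pd.qcut(rfm["monetary"].rank(method="first"), 5, labels=[1, 2, 3, 4, 5]).astype(int)

    # Step 3: combine into a segment label, e.g. "555" = best customers
    rfm["RFM_segment"] = rfm["R"].astype(str) + rfm["F"].astype(str) + rfm["M"].astype(str)
    print(rfm.sort_values("RFM_segment", ascending=False).head())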
Behavioral Segmentation - Clustering Techniques
• K-means
  • Iteratively re-assign points to the nearest cluster center
• Agglomerative clustering (hierarchical)
  • Start with each point as its own cluster and iteratively merge the closest clusters
• Mean-shift clustering
  • Estimate the modes of the probability density function (pdf)
• Spectral clustering
  • Split the nodes in a graph based on assigned links with similarity weights

As we go down this list, the clustering strategies have a greater tendency to transitively group points even if they are not nearby in feature space.
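To make the techniques concrete, here is a small sketch (my own illustration; scikit-learn is an assumed library choice, the deck itself does not prescribe one) that runs K-means, agglomerative and mean-shift clustering on the same synthetic behavioral data.

    import numpy as np
    from sklearn.datasets import make_blobs
    from sklearn.preprocessing import StandardScaler
    from sklearn.cluster import KMeans, AgglomerativeClustering, MeanShift

    # Synthetic "behavioral" data: 300 customers, 4 underlying segments, 5 variables
    X, _ = make_blobs(n_samples=300, centers=4, n_features=5, random_state=42)
    X = StandardScaler().fit_transform(X)   # standardize so no variable dominates

    models = {
        "k-means": KMeans(n_clusters=4, n_init=10, random_state=42),
        "agglomerative": AgglomerativeClustering(n_clusters=4),
        "mean-shift": MeanShift(),           # estimates the number of clusters itself
    }

    for name, model in models.items():
        labels = model.fit_predict(X)
        sizes = np.bincount(labels)
        print(f"{name:15s} clusters found: {len(sizes)}  sizes: {sizes}")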
Behavioral Segmentation: Hierarchical Vs. Non-hierarchical
Behavioral Segmentation: Subjective Segmentation-Cluster Analysis
Highest value segment

                                  Big Ticket   Frequent   Small Ticket   Infrequent   Returner   Overall
% Customers                             9.8        4.2           13.5         69.5        6.6     100.0
% Revenue                              27.4       33.6           15.4         13.5       10.1     100.0
Revenue per customer ($)              1,038      8,618        1,209.1          220    1,613.5   1,077.2
Visits per customer                     3.1       34.2           16.1          2.1        8.3       4.8
Basket size ($)                       970.1      252.7           75.1        105.2      165.1     224.8
Average departments shopped             3.6        5.5            1.9          1.2        2.9       1.9
Stores shopped                          1.1        3.0            1.8          1.1        1.2       1.7
Returning propensity (%)                0.3        6.5            5.5          0.3       25.5       3.2
Shopped in December (%)                15.1       70.8           53.3         19.4       23.3      26.6
Shopped on Memorial Day (%)             1.6       17.9            2.4          0.9        2.1       2.2
Shopped on Labor Day (%)                1.0       14.1            1.8          0.6        1.5       1.7
Shopped on President's Day (%)          0.7       12.0            1.8          0.6        1.8       1.5
Average Discount Rate (%)              14.8       11.4            6.6          4.5       10.6      11.2
Customer lifetime (months)             25.2       46.2           42.2         28.4       27.2      30.8

Note that key profile variables are not always the same as the basis variables used to generate the segmentation.
Subjective Segmentation: Cluster Analysis Process

Process flow for cluster analysis:
  Step 1: Selection of variables
  Step 2: Data cleaning and preparing the data set for analysis
  Step 3: Creating new relevant variables
  Step 4: Tackling the outliers
  Step 5: Treatment of missing values
  Step 6: Multicollinearity check
  Step 7: Standardization
  Step 8: Getting the cluster solution
  Step 9: Checking the optimality of the solution
Subjective Segmentation: K-Means Clustering Algorithm

K-Means clustering proceeds as follows:
  1. Start with the overall population.
  2. Fix the number of clusters.
  3. Calculate the distance of each case from all cluster centers.
  4. Assign each case to the nearest cluster.
  5. Recalculate the cluster centers.
  6. Reassign cases after changing the cluster centers.
  7. Continue until there is no significant change between two iterations.
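A minimal NumPy sketch of these steps (my own illustration, not the deck's code); the synthetic 2-D data and the choice of k = 3 are assumptions made for the example.

    import numpy as np

    def kmeans(X, k, n_iter=100, tol=1e-6, seed=0):
        rng = np.random.default_rng(seed)
        # Step 2: fix the number of clusters and pick k random cases as initial centers
        centers = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(n_iter):
            # Step 3: distance of each case from all cluster centers
            dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
            # Step 4: assign each case to the nearest cluster
            labels = dists.argmin(axis=1)
            # Step 5: recalculate the cluster centers (keep the old center if a cluster empties)
            new_centers = np.array([
                X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                for j in range(k)
            ])
            # Step 7: stop when the centers no longer change significantly
            if np.linalg.norm(new_centers - centers) < tol:
                break
            centers = new_centers
        return labels, centers

    X = np.vstack([np.random.default_rng(1).normal(m, 1.0, size=(50, 2)) for m in (0, 5, 10)])
    labels, centers = kmeans(X, k=3)
    print(centers.round(2))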
Calculating the distance

Which two of the customers are similar?

            Weight
  Cust1         68
  Cust2         72
  Cust3        100

Which two of the customers are similar now?

            Weight      Age
  Cust1         68       25
  Cust2         72       70
  Cust3        100       28

Which two of the customers are similar in this case?

            Weight      Age     Income
  Cust1         68       25     60,000
  Cust2         72       70      9,000
  Cust3        100       28     62,000
Distance Measures
• Euclidean distance
• City-block (Manhattan) distance
• Chebyshev distance
• Minkowski distance
• Mahalanobis distance
• Maximum distance
• Cosine similarity
• Simple correlation between observations
• Minimum distance
• Weighted distance

Note: these measures will not necessarily produce the same clusters in the example above, as the sketch below illustrates.
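A small sketch (my own illustration) computing two of these measures for the three customers above; note how the unscaled income column dominates the Euclidean distance until the variables are standardized.

    import numpy as np
    from scipy.spatial.distance import cdist

    # Weight, Age, Income for Cust1..Cust3 (from the example above)
    X = np.array([
        [68,  25, 60000.0],
        [72,  70,  9000.0],
        [100, 28, 62000.0],
    ])

    print(cdist(X, X, metric="euclidean").round(1))   # income dominates: Cust1 looks closest to Cust3
    print(cdist(X, X, metric="cityblock").round(1))   # Manhattan distance

    # After standardizing each column, weight and age matter again
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    print(cdist(Z, Z, metric="euclidean").round(2))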
Density-Based Clustering

• Basic idea
  – Clusters are dense regions in the data space, separated by regions of lower object density
  – A cluster is defined as a maximal set of density-connected points
  – Discovers clusters of arbitrary shape
• Method
  – DBSCAN

Density Definition

• ε-Neighborhood – objects within a radius of ε from an object:
      N_ε(p) := { q | d(p, q) ≤ ε }
• "High density" – the ε-neighborhood of an object contains at least MinPts objects.

(Figure: the ε-neighborhood of p and the ε-neighborhood of q; the density of p is "high" (MinPts = 4), the density of q is "low" (MinPts = 4).)
Core, Border & Outlier

Given ε and MinPts, categorize the objects into three exclusive groups:

• A point is a core point if it has more than a specified number of points (MinPts) within Eps; these are points at the interior of a cluster.
• A border point has fewer than MinPts within Eps, but is in the neighborhood of a core point.
• A noise point (outlier) is any point that is neither a core point nor a border point.

(Figure: ε = 1 unit, MinPts = 5.)
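A compact sketch (my own, not from the deck) that labels points as core, border or noise directly from these definitions.

    import numpy as np
    from scipy.spatial.distance import cdist

    def label_points(X, eps, min_pts):
        """Label each point as 'core', 'border' or 'noise' following the definitions above."""
        D = cdist(X, X)
        # ε-neighborhood sizes; each point counts itself, as in the usual convention
        neighbor_counts = (D <= eps).sum(axis=1)
        is_core = neighbor_counts >= min_pts          # "high density": at least MinPts neighbours
        labels = np.where(is_core, "core", "noise").astype(object)
        # A border point is a non-core point inside some core point's ε-neighborhood
        for i in np.where(~is_core)[0]:
            if np.any(is_core & (D[i] <= eps)):
                labels[i] = "border"
        return labels

    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 0.3, (40, 2)), rng.uniform(-3, 3, (10, 2))])
    print(dict(zip(*np.unique(label_points(X, eps=0.5, min_pts=5), return_counts=True))))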
Example

(Figure: original points and the resulting point types – core, border and outliers – with ε = 10, MinPts = 4.)
Density-Reachability

• Directly density-reachable
  – An object q is directly density-reachable from object p if p is a core object and q is in p's ε-neighborhood.
  – In the figure (MinPts = 4): q is directly density-reachable from p; p is not directly density-reachable from q. Density-reachability is asymmetric.

• Density-reachable (directly and indirectly)
  – A point p is directly density-reachable from p2, p2 is directly density-reachable from p1, and p1 is directly density-reachable from q, so p ← p2 ← p1 ← q form a chain.
  – In the figure (MinPts = 7): p is (indirectly) density-reachable from q, but q is not density-reachable from p.
DBSCAN Algorithm: Example

• Parameters
  • ε = 2 cm
  • MinPts = 3

for each o ∈ D do
    if o is not yet classified then
        if o is a core object then
            collect all objects density-reachable from o
            and assign them to a new cluster
        else
            assign o to NOISE
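For a runnable equivalent, here is a hedged sketch using scikit-learn's DBSCAN (my choice of library; the deck only gives the pseudocode above). In scikit-learn, noise points receive the label -1.

    import numpy as np
    from sklearn.datasets import make_moons
    from sklearn.cluster import DBSCAN

    # Two crescent-shaped clusters plus noise: a case where K-means struggles
    X, _ = make_moons(n_samples=300, noise=0.07, random_state=0)

    db = DBSCAN(eps=0.2, min_samples=5).fit(X)
    labels = db.labels_

    n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
    n_noise = int(np.sum(labels == -1))
    print(f"clusters found: {n_clusters}, noise points: {n_noise}")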
DBSCAN: Sensitive to Parameters
DBSCAN: Determining EPS and MinPts
• The idea is that for points in a cluster, their kth nearest neighbors are at roughly the same distance
• Noise points have their kth nearest neighbor at a farther distance
• So, plot the sorted distance of every point to its kth nearest neighbor and look for the "elbow"
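A brief sketch of that k-distance heuristic (my own implementation, not from the deck): sort each point's distance to its k-th nearest neighbor and inspect the curve for an elbow as a candidate ε.

    import numpy as np
    from sklearn.datasets import make_moons
    from sklearn.neighbors import NearestNeighbors

    X, _ = make_moons(n_samples=300, noise=0.07, random_state=0)

    k = 5  # use k = MinPts
    # n_neighbors=k+1 because the first neighbor returned is the point itself (distance 0)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    dists, _ = nn.kneighbors(X)
    k_dist = np.sort(dists[:, -1])            # sorted distance to the k-th neighbor

    # The elbow of this sorted curve is a reasonable eps; print a few quantiles as a proxy
    print(np.quantile(k_dist, [0.5, 0.9, 0.95, 0.99]).round(3))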
When DBSCAN Works Well

(Figure: original points and the clusters found.)

• Resistant to noise
• Can handle clusters of different shapes and sizes

When DBSCAN Does NOT Work Well

(Figures: original points and DBSCAN results with (MinPts = 4, Eps = 9.92) and (MinPts = 4, Eps = 9.75).)

• Cannot handle varying densities
• Sensitive to parameters: hard to determine the correct set of parameters
Take-away Message

• The basic idea of density-based clustering


• The two important parameters and the definitions of neighborhood
and density in DBSCAN
• Core, border and outlier points
• DBSCAN algorithm
• DBSCAN’s pros and cons
Objective Segmentation
Classification: Definition

• Given a collection of records (training set)
  – Each record contains a set of attributes; one of the attributes is the class.
• Find a model for the class attribute as a function of the values of the other attributes.
• Goal: previously unseen records should be assigned a class as accurately as possible.
  – A test set is used to determine the accuracy of the model. Usually, the given data set is divided into training and test sets, with the training set used to build the model and the test set used to validate it.
Illustrating Classification Task

Training Set:
  Tid   Attrib1   Attrib2   Attrib3   Class
   1    Yes       Large     125K      No
   2    No        Medium    100K      No
   3    No        Small      70K      No
   4    Yes       Medium    120K      No
   5    No        Large      95K      Yes
   6    No        Medium     60K      No
   7    Yes       Large     220K      No
   8    No        Small      85K      Yes
   9    No        Medium     75K      No
  10    No        Small      90K      Yes

The training set is fed to a learning algorithm, which induces a model (Induction / Learn Model). The model is then applied to previously unseen records (Deduction / Apply Model).

Test Set:
  Tid   Attrib1   Attrib2   Attrib3   Class
  11    No        Small      55K      ?
  12    Yes       Medium     80K      ?
  13    Yes       Large     110K      ?
  14    No        Small      95K      ?
  15    No        Large      67K      ?
Examples of Classification Task

• Predicting tumor cells as benign or malignant

• Classifying credit card transactions as legitimate or fraudulent

• Classifying emails as spam or normal emails

• Categorizing news stories as finance, weather, entertainment, sports, etc.
Classification Techniques

• Decision Tree
• Naïve Bayes
• Nearest Neighbor
• Rule-based Classification
• Logistic Regression
• Support Vector Machines
• Ensemble methods
• ……
Example of a Decision Tree

Training Data:
  Tid   Refund   Marital Status   Taxable Income   Cheat
   1    Yes      Single           125K             No
   2    No       Married          100K             No
   3    No       Single            70K             No
   4    Yes      Married          120K             No
   5    No       Divorced          95K             Yes
   6    No       Married           60K             No
   7    Yes      Divorced         220K             No
   8    No       Single            85K             Yes
   9    No       Married           75K             No
  10    No       Single            90K             Yes

Model: Decision Tree (splitting attributes: Refund, MarSt, TaxInc)
  Refund?
    Yes -> NO
    No  -> MarSt?
             Single, Divorced -> TaxInc?
                                   < 80K -> NO
                                   > 80K -> YES
             Married          -> NO

Another Example of Decision Tree

The same training data also fits this tree:
  MarSt?
    Married          -> NO
    Single, Divorced -> Refund?
                          Yes -> NO
                          No  -> TaxInc?
                                   < 80K -> NO
                                   > 80K -> YES

There could be more than one tree that fits the same data!
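As a hedged illustration (not the deck's own code), the same training data can be fed to scikit-learn's DecisionTreeClassifier; the categorical attributes are one-hot encoded first.

    import pandas as pd
    from sklearn.tree import DecisionTreeClassifier, export_text

    # The 10-record training data from the slide
    df = pd.DataFrame({
        "Refund": ["Yes","No","No","Yes","No","No","Yes","No","No","No"],
        "MarSt":  ["Single","Married","Single","Married","Divorced","Married",
                   "Divorced","Single","Married","Single"],
        "TaxInc": [125, 100, 70, 120, 95, 60, 220, 85, 75, 90],   # in thousands
        "Cheat":  ["No","No","No","No","Yes","No","No","Yes","No","Yes"],
    })

    X = pd.get_dummies(df[["Refund", "MarSt"]]).join(df["TaxInc"])
    y = df["Cheat"]

    tree = DecisionTreeClassifier(criterion="gini", random_state=0).fit(X, y)
    print(export_text(tree, feature_names=list(X.columns)))

    # Classify a previously unseen record: Refund=No, MarSt=Married, TaxInc=80K
    # (the slide's later walk-through assigns Cheat = "No" to this record)
    new = pd.DataFrame([{c: 0 for c in X.columns} | {"Refund_No": 1, "MarSt_Married": 1, "TaxInc": 80}])
    print(tree.predict(new))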
Decision Tree Classification Task

The flow is the same as in "Illustrating Classification Task" above: the training set (Tid 1 to 10) is fed to a tree induction algorithm, which learns a decision tree model (Induction / Learn Model); the decision tree is then applied to the test set (Tid 11 to 15) to deduce their class labels (Deduction / Apply Model).
Apply Model to Test Data

Test record: Refund = No, Marital Status = Married, Taxable Income = 80K, Cheat = ?

Start from the root of the tree and follow the test conditions:
  1. Refund? The record has Refund = No, so take the "No" branch to MarSt.
  2. MarSt? The record has Marital Status = Married, so take the "Married" branch.
  3. That branch is a leaf labelled NO, so assign Cheat = "No".
Decision Tree Classification Task (recap)

As above: tree induction on the training set learns a decision tree, which is then applied to the test set to deduce class labels.
Decision Tree Induction

• Many Algorithms:
– Hunt’s Algorithm
– CART
– ID3, C4.5
– SLIQ, SPRINT
– ……
General Structure of Hunt's Algorithm

• Let Dt be the set of training records that reach a node t
• General procedure:
  – If Dt contains records that all belong to the same class yt, then t is a leaf node labeled as yt
  – If Dt contains records that belong to more than one class, use an attribute test to split the data into smaller subsets, and recursively apply the procedure to each subset

(The running example is the 10-record Refund / Marital Status / Taxable Income / Cheat training data shown earlier.)
Hunt's Algorithm (applied to the Refund / Marital Status / Taxable Income / Cheat data)

Step 1: Split on Refund.
  Refund?
    Yes -> Don't Cheat
    No  -> Don't Cheat (this subset is still mixed, so keep splitting)

Step 2: Refine the Refund = No branch using Marital Status.
  Refund?
    Yes -> Don't Cheat
    No  -> Marital Status?
             Single, Divorced -> Cheat (still mixed, so keep splitting)
             Married          -> Don't Cheat

Step 3: Refine the Single/Divorced branch using Taxable Income.
  Refund?
    Yes -> Don't Cheat
    No  -> Marital Status?
             Single, Divorced -> Taxable Income?
                                   < 80K  -> Don't Cheat
                                   >= 80K -> Cheat
             Married          -> Don't Cheat
Tree Induction

• Greedy strategy
– Split the records based on an attribute test that optimizes a certain criterion

• Issues
– Determine how to split the records
• How to specify the attribute test condition?
• How to determine the best split?
– Determine when to stop splitting
How to Specify Test Condition?

• Depends on attribute types


– Nominal
– Ordinal
– Continuous

• Depends on number of ways to split


– 2-way split
– Multi-way split
Splitting Based on Nominal Attributes

• Multi-way split: use as many partitions as distinct values.
    CarType -> {Family}, {Sports}, {Luxury}

• Binary split: divides values into two subsets; need to find the optimal partitioning.
    CarType -> {Sports, Luxury} vs {Family}   OR   CarType -> {Family, Luxury} vs {Sports}
Splitting Based on Ordinal Attributes

• Multi-way split: use as many partitions as distinct values.
    Size -> {Small}, {Medium}, {Large}

• Binary split: divides values into two subsets; need to find the optimal partitioning.
    Size -> {Small, Medium} vs {Large}   OR   Size -> {Small} vs {Medium, Large}

• What about this split?  Size -> {Small, Large} vs {Medium}
  (It groups non-adjacent values and so violates the order of the attribute.)
Splitting Based on Continuous Attributes

• Different ways of handling
  – Discretization to form an ordinal categorical attribute
  – Binary decision: (A < v) or (A >= v)
    • consider all possible splits and find the best cut
    • can be more computation intensive

  (i) Binary split:     Taxable Income > 80K?  ->  Yes / No
  (ii) Multi-way split: Taxable Income?  ->  < 10K, [10K, 25K), [25K, 50K), [50K, 80K), > 80K
Tree Induction

• Greedy strategy
– Split the records based on an attribute test that optimizes a certain criterion.

• Issues
– Determine how to split the records
• How to specify the attribute test condition?
• How to determine the best split?
– Determine when to stop splitting
How to determine the Best Split

Before splitting: 10 records of class C0 and 10 records of class C1. Three candidate test conditions:

  On-Campus?    Yes: C0=6, C1=4       No: C0=4, C1=6
  Car Type?     Family: C0=1, C1=3    Sports: C0=8, C1=0    Luxury: C0=1, C1=7
  Student ID?   c1 … c10: C0=1, C1=0 each       c11 … c20: C0=0, C1=1 each

Which test condition is the best?
How to determine the Best Split

• Greedy approach:
  – Nodes with a homogeneous class distribution are preferred
• Need a measure of node impurity:

    C0: 5, C1: 5  ->  non-homogeneous, high degree of impurity
    C0: 9, C1: 1  ->  homogeneous, low degree of impurity
How to Find the Best Split

Before splitting, the parent node has class counts C0: N00, C1: N01 and impurity M0.

  Candidate attribute A splits the records into nodes N1 (C0: N10, C1: N11) and N2 (C0: N20, C1: N21),
  with impurities M1 and M2 and weighted average M12.
  Candidate attribute B splits the records into nodes N3 and N4, with impurities M3 and M4 and weighted average M34.

  Gain = M0 - M12 for A versus M0 - M34 for B: choose the split with the larger gain.
Measures of Node Impurity

• Gini Index

• Entropy

• Misclassification error
Measure of Impurity: GINI

• Gini index for a given node t:

      GINI(t) = 1 - Σ_j [p(j | t)]²

  (NOTE: p(j | t) is the relative frequency of class j at node t.)

  – Maximum (1 - 1/nc) when records are equally distributed among all classes, implying least interesting information
  – Minimum (0) when all records belong to one class, implying most useful information

    C1: 0        C1: 1        C1: 2        C1: 3
    C2: 6        C2: 5        C2: 4        C2: 3
    Gini=0.000   Gini=0.278   Gini=0.444   Gini=0.500
Examples for computing GINI

      GINI(t) = 1 - Σ_j [p(j | t)]²

  C1: 0, C2: 6   P(C1) = 0/6 = 0,  P(C2) = 6/6 = 1
                 Gini = 1 - P(C1)² - P(C2)² = 1 - 0 - 1 = 0

  C1: 1, C2: 5   P(C1) = 1/6,  P(C2) = 5/6
                 Gini = 1 - (1/6)² - (5/6)² = 0.278

  C1: 2, C2: 4   P(C1) = 2/6,  P(C2) = 4/6
                 Gini = 1 - (2/6)² - (4/6)² = 0.444
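A short sketch (my own) that reproduces these numbers.

    def gini(counts):
        """Gini index of a node given its class counts."""
        n = sum(counts)
        return 1.0 - sum((c / n) ** 2 for c in counts)

    for counts in ([0, 6], [1, 5], [2, 4], [3, 3]):
        print(counts, round(gini(counts), 3))   # 0.0, 0.278, 0.444, 0.5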
Splitting Based on GINI

• Used in CART, SLIQ, SPRINT.
• When a node p is split into k partitions (children), the quality of the split is computed as:

      GINI_split = Σ_{i=1..k} (n_i / n) · GINI(i)

  where n_i = number of records at child i, and n = number of records at node p.
Binary Attributes: Computing GINI Index

• Splits into two partitions
• Effect of weighting partitions: larger and purer partitions are sought

  Parent: C1 = 6, C2 = 6, Gini = 0.500

  Split on B:   N1: C1 = 5, C2 = 2      N2: C1 = 1, C2 = 4

  Gini(N1) = 1 - (5/7)² - (2/7)² = 0.408
  Gini(N2) = 1 - (1/5)² - (4/5)² = 0.320

  Gini(Children) = 7/12 × 0.408 + 5/12 × 0.320 = 0.371
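The same kind of helper (my own sketch, repeated here so the snippet is self-contained) verifies the weighted split.

    def gini(counts):
        n = sum(counts)
        return 1.0 - sum((c / n) ** 2 for c in counts)

    n1, n2 = [5, 2], [1, 4]
    n = sum(n1) + sum(n2)
    gini_children = sum(n1) / n * gini(n1) + sum(n2) / n * gini(n2)
    print(round(gini([6, 6]), 3), round(gini_children, 3))   # 0.5 and 0.371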
Entropy

• Entropy at a given node t:

      Entropy(t) = - Σ_j p(j | t) log₂ p(j | t)

  (NOTE: p(j | t) is the relative frequency of class j at node t.)

  – Measures the purity of a node
    • Maximum (log₂ nc) when records are equally distributed among all classes, implying least information
    • Minimum (0.0) when all records belong to one class, implying most information
Examples for computing Entropy

      Entropy(t) = - Σ_j p(j | t) log₂ p(j | t)

  C1: 0, C2: 6   P(C1) = 0/6 = 0,  P(C2) = 6/6 = 1
                 Entropy = - 0 log₂ 0 - 1 log₂ 1 = - 0 - 0 = 0

  C1: 1, C2: 5   P(C1) = 1/6,  P(C2) = 5/6
                 Entropy = - (1/6) log₂ (1/6) - (5/6) log₂ (5/6) = 0.65

  C1: 2, C2: 4   P(C1) = 2/6,  P(C2) = 4/6
                 Entropy = - (2/6) log₂ (2/6) - (4/6) log₂ (4/6) = 0.92
Splitting Based on Information Gain

• Information Gain:

      GAIN_split = Entropy(p) - Σ_{i=1..k} (n_i / n) · Entropy(i)

  Parent node p is split into k partitions; n_i is the number of records in partition i.

  – Measures the reduction in entropy achieved because of the split. Choose the split that achieves the most reduction (maximizes GAIN).
  – Used in ID3 and C4.5
Splitting Criteria based on Classification Error

• Classification error at a node t:

      Error(t) = 1 - max_i P(i | t)

• Measures the misclassification error made by a node.
  • Maximum (1 - 1/nc) when records are equally distributed among all classes, implying least interesting information
  • Minimum (0.0) when all records belong to one class, implying most interesting information
Examples for Computing Error

      Error(t) = 1 - max_i P(i | t)

  C1: 0, C2: 6   P(C1) = 0/6 = 0,  P(C2) = 6/6 = 1
                 Error = 1 - max(0, 1) = 1 - 1 = 0

  C1: 1, C2: 5   P(C1) = 1/6,  P(C2) = 5/6
                 Error = 1 - max(1/6, 5/6) = 1 - 5/6 = 1/6

  C1: 2, C2: 4   P(C1) = 2/6,  P(C2) = 4/6
                 Error = 1 - max(2/6, 4/6) = 1 - 4/6 = 1/3
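A small sketch (my own) reproducing the entropy and classification-error examples above.

    import math

    def entropy(counts):
        n = sum(counts)
        return -sum((c / n) * math.log2(c / n) for c in counts if c > 0)

    def classification_error(counts):
        n = sum(counts)
        return 1.0 - max(counts) / n

    for counts in ([0, 6], [1, 5], [2, 4]):
        print(counts, round(entropy(counts), 2), round(classification_error(counts), 3))
        # entropy: 0.0, 0.65, 0.92   error: 0.0, 0.167, 0.333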
Comparison among Splitting Criteria
For a 2-class problem:
Tree Induction

• Greedy strategy
– Split the records based on an attribute test that optimizes a certain criterion.

• Issues
– Determine how to split the records
• How to specify the attribute test condition?
• How to determine the best split?
– Determine when to stop splitting
Stopping Criteria for Tree Induction

• Stop expanding a node when all the records belong to the


same class

• Stop expanding a node when all the records


have similar attribute values

• Early termination (to be discussed later)


Decision Tree Based Classification

• Advantages:
– Inexpensive to construct
– Extremely fast at classifying unknown records
– Easy to interpret for small-sized trees
– Accuracy is comparable to other classification techniques for
many simple data sets
Underfitting and Overfitting (Example)

500 circular and 500 triangular data points.

  Circular points:   0.5 ≤ sqrt(x1² + x2²) ≤ 1
  Triangular points: sqrt(x1² + x2²) > 1  or  sqrt(x1² + x2²) < 0.5
Underfitting and Overfitting

Overfitting
Occam’s Razor

• Given two models of similar errors, one should prefer


the simpler model over the more complex model

• For complex models, there is a greater chance


that it was fitted accidentally by errors in data

• Therefore, one should include model complexity


when evaluating a model
How to Address Overfitting

• Pre-Pruning (Early Stopping Rule)


– Stop the algorithm before it becomes a fully-grown tree
– Typical stopping conditions for a node:
• Stop if all instances belong to the same class
• Stop if all the attribute values are the same
– More restrictive conditions:
• Stop if number of instances is less than some user-specified threshold
• Stop if the class distribution of instances is independent of the available features
• Stop if expanding the current node does not improve impurity measures (e.g., Gini or
information gain).
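In practice, libraries expose pre-pruning as hyperparameters. A hedged scikit-learn sketch (my example; the parameter values are illustrative, not from the deck):

    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

    # Pre-pruning: stop growing early via depth / leaf-size / impurity-gain thresholds
    pruned = DecisionTreeClassifier(max_depth=4, min_samples_leaf=20,
                                    min_impurity_decrease=0.001, random_state=0).fit(X_tr, y_tr)
    full = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)

    print("full tree   train/test accuracy:",
          round(full.score(X_tr, y_tr), 3), round(full.score(X_te, y_te), 3))
    print("pre-pruned  train/test accuracy:",
          round(pruned.score(X_tr, y_tr), 3), round(pruned.score(X_te, y_te), 3))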
How to Address Overfitting

• Post-pruning
– Grow decision tree to its entirety
– Trim the nodes of the decision tree in a bottom-up fashion
– If generalization error improves after trimming,
• replace sub-tree by a leaf node.
– Class label of leaf node is determined from majority class of
instances in the sub-tree
Handling Missing Attribute Values

• Missing values affect decision tree construction in


three different ways:
– Affects how impurity measures are computed
– Affects how to distribute instance with missing value to child
nodes
– Affects how a test instance with missing value is
classified

Computing Impurity Measure

Training data: the Refund / Marital Status / Taxable Income / Class table, but record 10 has a missing Refund value (Refund = ?).

Before splitting:
  Entropy(Parent) = -0.3 log₂(0.3) - 0.7 log₂(0.7) = 0.8813

Counts by Refund value:
                 Class = Yes   Class = No
  Refund = Yes        0             3
  Refund = No         2             4
  Refund = ?          1             0

Split on Refund:
  Entropy(Refund = Yes) = 0
  Entropy(Refund = No)  = -(2/6) log₂(2/6) - (4/6) log₂(4/6) = 0.9183
  Entropy(Children)     = 0.3 × 0 + 0.6 × 0.9183 = 0.551

  Gain = 0.9 × (0.8813 - 0.551) = 0.297
  (The factor 0.9 is the fraction of records with a known Refund value.)
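A quick check of these numbers (my own sketch):

    import math

    def entropy(counts):
        n = sum(counts)
        return -sum((c / n) * math.log2(c / n) for c in counts if c > 0)

    parent = entropy([3, 7])                                     # 0.8813
    children = 0.3 * entropy([0, 3]) + 0.6 * entropy([2, 4])     # 0.551
    gain = 0.9 * (parent - children)                             # 0.9 = fraction with known Refund
    print(round(parent, 4), round(children, 4), round(gain, 4))  # 0.8813 0.551 0.2973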
Distribute Instances

Record 10 (Refund = ?, Single, 90K, Class = Yes) cannot be sent down a single branch of the Refund split, so it is distributed to both children in proportion to the known records:

  Refund = Yes:  Class = Yes: 0 + 3/9,   Class = No: 3
  Refund = No:   Class = Yes: 2 + 6/9,   Class = No: 4

The probability that Refund = Yes is 3/9, and the probability that Refund = No is 6/9.
Assign record 10 to the left child (Refund = Yes) with weight 3/9 and to the right child (Refund = No) with weight 6/9.
Classify Instances

New record: Tid 11, Refund = No, Marital Status = ?, Taxable Income = 85K, Class = ?

Weighted class counts accumulated at the MarSt node:

                Married   Single   Divorced   Total
  Class = No       3         1        0         4
  Class = Yes     6/9        1        1        2.67
  Total           3.67       2        1        6.67

Probability that Marital Status = Married is 3.67 / 6.67.
Probability that Marital Status = {Single, Divorced} is 3 / 6.67.

The record is routed down both MarSt branches of the tree (Refund -> MarSt -> TaxInc) with these weights.
Other Issues

• Data Fragmentation
• Search Strategy
• Expressiveness
• Tree Replication
Data Fragmentation

• Number of instances gets smaller as you traverse down


the tree

• Number of instances at the leaf nodes could be too small to


make any statistically significant decision
Search Strategy

• Finding an optimal decision tree is NP-hard

• The algorithm presented so far uses a greedy, top-down,


recursive partitioning strategy to induce a reasonable solution

• Other strategies?
– Bottom-up
– Bi-directional
Expressiveness

• Decision trees provide an expressive representation for learning discrete-valued functions
– But they do not generalize well to certain types of Boolean functions
• Example: parity function:
– Class = 1 if there is an even number of Boolean attributes with truth value = True
– Class = 0 if there is an odd number of Boolean attributes with truth
value = True
• For accurate modeling, must have a complete tree

• Not expressive enough for modeling continuous variables


– Particularly when test condition involves only a single attribute at-a-time
Decision Boundary

(Figure: 2-D points in the unit square, partitioned by the tree below; shading shows the class regions.)

  x < 0.43?
    Yes -> y < 0.47?
             Yes -> leaf with class counts 4 : 0
             No  -> leaf with class counts 0 : 4
    No  -> y < 0.33?
             Yes -> leaf with class counts 0 : 3
             No  -> leaf with class counts 4 : 0

• The border line between two neighboring regions of different classes is known as the decision boundary.
• The decision boundary is parallel to the axes because each test condition involves a single attribute at a time.
Oblique Decision Trees

(Figure: a single oblique split, x + y < 1, separating Class = + from the other class.)

• Test condition may involve multiple attributes
• More expressive representation
• Finding the optimal test condition is computationally expensive
Metrics for Performance Evaluation

• Focus on the predictive capability of a model
  – Rather than how fast it is to classify or build models, scalability, etc.
• Confusion Matrix:

                          PREDICTED CLASS
                         Class=Yes   Class=No
  ACTUAL    Class=Yes     a (TP)      b (FN)
  CLASS     Class=No      c (FP)      d (TN)

  a: TP (true positive), b: FN (false negative), c: FP (false positive), d: TN (true negative)
Metrics for Performance Evaluation

                          PREDICTED CLASS
                         Class=Yes   Class=No
  ACTUAL    Class=Yes     a (TP)      b (FN)
  CLASS     Class=No      c (FP)      d (TN)

• Most widely-used metric:

      Accuracy = (a + d) / (a + b + c + d) = (TP + TN) / (TP + TN + FP + FN)
Limitation of Accuracy

• Consider a 2-class problem


– Number of Class 0 examples = 9990
– Number of Class 1 examples = 10

• If model predicts everything to be class 0, accuracy is


9990/10000 = 99.9 %
– Accuracy is misleading because model does not detect any class
1 example
Cost-Sensitive Measures

      Precision (p) = a / (a + c)
      Recall (r)    = a / (a + b)
      F-measure (F) = 2rp / (r + p) = 2a / (2a + b + c)
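A tiny sketch (my own) computing these from confusion-matrix counts.

    def prf(a, b, c, d):
        """a=TP, b=FN, c=FP, d=TN, following the slide's notation."""
        accuracy = (a + d) / (a + b + c + d)
        precision = a / (a + c)
        recall = a / (a + b)
        f_measure = 2 * recall * precision / (recall + precision)
        return accuracy, precision, recall, f_measure

    # Example counts (made up): TP=40, FN=10, FP=5, TN=945
    print([round(x, 3) for x in prf(40, 10, 5, 945)])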
Methods of Estimation
• Holdout
– Reserve 2/3 for training and 1/3 for testing
• Random subsampling
– Repeated holdout
• Cross validation
– Partition data into k disjoint subsets
– k-fold: train on k-1 partitions, test on the remaining one
– Leave-one-out: k=n
• Stratified sampling
– oversampling vs undersampling
• Bootstrap
– Sampling with replacement
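A hedged scikit-learn sketch (my example) of two of these estimation methods: a holdout split and k-fold cross-validation.

    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split, cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=600, n_features=8, random_state=0)
    clf = DecisionTreeClassifier(max_depth=5, random_state=0)

    # Holdout: reserve 1/3 for testing
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=1/3, random_state=0)
    print("holdout accuracy:", round(clf.fit(X_tr, y_tr).score(X_te, y_te), 3))

    # k-fold cross-validation: train on k-1 partitions, test on the remaining one
    scores = cross_val_score(clf, X, y, cv=5)
    print("5-fold accuracies:", scores.round(3), "mean:", round(scores.mean(), 3))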
Take-away Message

• What is classification?
• How to use a decision tree to make predictions?
• How to construct a decision tree from training data?
• How to compute Gini index, entropy, and misclassification error?
• How to avoid overfitting by pre-pruning or post-pruning a decision tree?
• How to evaluate a classification model?
Objective Segmentation: Decision Trees

Root: N in node: 50,000; Average: 0.4

Split on: Average unit price of wine purchase in Sep to Nov 2012
  <= 980       -> N in node: 5,658;  Average: 0.7
  > 980        -> N in node: 1,206;  Average: 5.8
  No purchase  -> N in node: 42,229; Average: 0.2

<= 980 branch, split on Age:
  <= 46 -> N in node: 2,283;  Average: 0.0
  > 46  -> N in node: 3,375;  Average: 1.2
No-purchase branch, split on Age:
  <= 46 -> N in node: 17,619; Average: 0.0
  > 46  -> N in node: 24,610; Average: 0.4

(<= 980, Age > 46), split on: Total number of items bought in the sub-product level wine in Aug to Sep 2012:
  <= 4 -> N in node: 916;   Average: 2.4
  > 4  -> N in node: 2,459; Average: 0.7
(No purchase, Age > 46), split on: Total number of wine transactions in Aug to Sep 2012:
  1, 2, 3        -> N in node: 1,187;  Average: 1.6
  4, 5, 6, 7, 11 -> N in node: 23,425; Average: 0.3

(No purchase, Age > 46, transactions 4,5,6,7,11), split on: Average unit price of liquor purchase in Sep to Nov 2012:
  <= 457.778       -> N in node: 18,874; Average: 0.3
  (457.778, 1550]  -> N in node: 3,585;  Average: 0.1
  > 1550           -> N in node: 964;    Average: 1.0
Decision Tree Example – Business Rules
Business rule statistics and description

Rule1 – Propensity to buy: 5.80; % Customers: 2.5
  Average unit price of wine purchase in Sep to Nov 2012 = >980
Rule2 – Propensity to buy: 2.40; % Customers: 1.9
  Average unit price of wine purchase in Sep to Nov 2012 = <=980; Age = >46; Total number of items bought in sub-product level wine in Aug to Sep 2012 = <=4
Rule3 – Propensity to buy: 1.60; % Customers: 2.4
  No wine purchase in Sep to Nov 2012; Age = >46; Total wine transactions in Sep to Nov 2012 = 1,2,3
Rule4 – Propensity to buy: 1.04; % Customers: 2.0
  No wine purchase in Sep to Nov 2012; Age = >46; Total wine transactions in Sep to Nov 2012 = 4,5,6,7,11; Average unit price of liquor purchase in Sep to Nov 2012 = >1,550
Rule5 – Propensity to buy: 0.69; % Customers: 5.0
  Average unit price of wine purchase in Sep to Nov 2012 = <=980; Age = >46; Total number of items bought in sub-product level wine in Aug to Sep 2012 = >4
Rule6 – Propensity to buy: 0.31; % Customers: 38.4
  No wine purchase in Sep to Nov 2012; Age = >46; Total wine transactions in Sep to Nov 2012 = 4,5,6,7,11; Average unit price of liquor purchase in Sep to Nov 2012 = <=457.778
Rule7 – Propensity to buy: 0.14; % Customers: 7.3
  No wine purchase in Sep to Nov 2012; Age = >46; Total wine transactions in Sep to Nov 2012 = 4,5,6,7,11; Average unit price of liquor purchase in Sep to Nov 2012 = (457.778,1550]
Rule8 – Propensity to buy: 0.00; % Customers: 35.9
  No wine purchase in Sep to Nov 2012; Age = <=46
Rule9 – Propensity to buy: 0.00; % Customers: 4.7
  Average unit price of wine purchase in Sep to Nov 2012 = <=980; Age = <=46
Decision Trees: CHAID Segmentation
CHAID Algorithm
Introduction to Factor Analysis - PCA

Look at the Cricket Team Players Data below:

  Player   Avg Runs   Total Wickets   Height   Not Outs   Highest Score   Best Bowling
    1         45            3          5.5        15           120             1
    2         50           34          5.2        34           209             2
    3         38            0          6          36           183             0
    4         46            9          6.1        78           160             3
    5         37           45          5.8        56            98             1
    6         32            0          5.10       89           183             0
    7         18          123          6           2            35             4
    8         19          239          6.1         3            56             5
    9         18           96          6.6         5            87             7
   10         16           83          5.9         7            32             7
   11         17          138          5.10        9            12             6
Describe the players

• If we have to describe or segregate the players, do we really need Avg Runs, Total Wickets, Height, Not Outs, Highest Score and Best Bowling separately?
• Can we simply take:
  • Avg Runs + Not Outs + Highest Score as one factor?
  • Total Wickets + Height + Best Bowling as a second factor?

Defining these imaginary variables – linear combinations of the original variables – to reduce the dimensions is called PCA or FA.
Purpose of PCA

• To find a linear combination of the original variables that has the largest variance
possible.

• Some restriction is needed on the entries in the linear combination, or the problem is not well defined.

• Usually require sums of the squares of weights to be 1.


Example: Helmet Data

(Figure: scatter plot of LTN versus LTG, with two candidate directions drawn through the cloud of points: 0.7071*LTN + 0.7071*LTG and 0.5034*LTN + 0.8641*LTG.)
What is happening?

• Trying to find a direction where the physical scatter of points is most


clearly “jutting out”
• This “diversity” may be just what you are looking for in your data
• Why would anyone want to find such directions?
Principal Components Regression

• Standard regression problem with response y and regressors X1, X2, …, Xp.

• X1, X2, …, Xp may be exactly collinear or nearly so.

• Least squares estimates of regression coefficients are not possible, or not


reliable in that case.

• Can use Principal Components to address the problem.


Intelligent Index Formation

• May have answers to p questions, say X1, X2, …, Xp.

• And you may want to summarize these p responses with one number (“index”)
that best captures the diversity in responses.

• E.g. it is common to add the responses, or average them, perhaps being sensitive to questions that are reverse coded.

• Already should be clear to you that a simple averaging may not be the best way
to summarize the original p questions.
Reduction of Dimension

• Often able to replace the original variables X1, X2, …, Xp with a few new variables,
say, U1, U2, …, Uk where k is much smaller than p.

• By plotting the first two or three pairs of these new variables you can often see
structure you wouldn’t otherwise be able to see (e.g. clustering).
Interpretation

• In rarer cases the new variables, U1, U2, …, Uk, are interpretable and point to
some new facet of the study.

• As you will see, however, one must be very careful with this use of Principal
Components since it is a prime opportunity to go astray and over interpret.

• This is often where PCA is confused with Factor Analysis.


How Does PCA Work?

• Look for weights a11, a12, …, a1p such that U1 = a11·X1 + a12·X2 + … + a1p·Xp has the largest variance, subject to the restriction that a11² + a12² + … + a1p² = 1.
• The numbers a11, a12, …, a1p are called different things in different books. In SAS they are arrayed in a column and called the first principal component “eigenvector”.
• If the Xi variables have had their individual means subtracted off, then the new variable U1 is called the first principal component, or in most texts, the first principal component score.
What’s Next?

• Look for weights a21, a22, …, a2p such that U2 = a21·X1 + a22·X2 + … + a2p·Xp has the next largest variance, subject to the restriction that a21² + a22² + … + a2p² = 1.

• The numbers a21, a22, …, a2p are called different things in different books. In SAS
they are arrayed in a column and called the second principal component
“eigenvector”.

• If the Xi variables have had their individual means subtracted off, then the new
variable U2 is called the second principal component, or in most texts, the
second principal component score.
What’s News?

• Any two arrays of weights will cross-multiply and sum to 0. Example: a11·a21 + a12·a22 + … + a1p·a2p = 0.
• Same as saying: any two of the new variables will be uncorrelated. Example:
corr(U1,U2)=0.
How Far Does This Go?

• Until the original data are described adequately.

• We will look at two or three criteria for how many of these scores to
construct. We’ll start with our common sense.

• Most of the time it is not as hard as it might sound. Basically, we will look
at “how much variance” in the original data is summarized by each new
component variable.
Two Basic Constructs

• Weights (used “a” to denote).


• Weights arrayed in columns and called “eigenvectors” on SAS output.
• Weights come from looking at all pairwise covariances associated with the
original p variables.
• Scores (used “u” to denote).
• Scores called “principal components” and are the new variables.
• Typically use Weights for interpretation and development of subscales.
• Typically use Scores for clustering and as a substitution for the original
data.
Geometry

(Figure: a data cloud with its mean marked; the principal component direction passes through the mean, each point is projected onto that direction, and the distance of the projection from the mean is essentially the score.)
Recall

  Component   Eigenvalue    Difference    Proportion   Cumulative
      1       4.19711750    3.52963341      0.8394       0.8394
      2       0.66748410    0.57285125      0.1335       0.9729
      3       0.09463284    0.05392125      0.0189       0.9918
      4       0.04071159    0.04065762      0.0081       1.0000
      5       0.00005397                    0.0000       1.0000
Scree Plots

(Figure: SCREE plot for the hospital data – eigenvalue versus number of components (1 to 5); the elbow after the second eigenvalue suggests two components.)
Loadings

Definition
Component loadings are the ordinary product-
moment correlation between each original variable
and each component score.
Interpretation
By looking at the component loadings one can ascertain which of the original variables tend to "load" on a given new variable. This may facilitate interpretations, creation of subscales, etc.
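A hedged sketch (my own, using scikit-learn; the eigenvalue output shown earlier is from SAS) that computes eigenvalues, weights (eigenvectors), scores and loadings for the cricket data from the earlier slide.

    import numpy as np
    import pandas as pd
    from sklearn.decomposition import PCA
    from sklearn.preprocessing import StandardScaler

    # Cricket players data from the earlier slide (heights entered as listed)
    df = pd.DataFrame({
        "avg_runs":      [45, 50, 38, 46, 37, 32, 18, 19, 18, 16, 17],
        "total_wickets": [3, 34, 0, 9, 45, 0, 123, 239, 96, 83, 138],
        "height":        [5.5, 5.2, 6.0, 6.1, 5.8, 5.10, 6.0, 6.1, 6.6, 5.9, 5.10],
        "not_outs":      [15, 34, 36, 78, 56, 89, 2, 3, 5, 7, 9],
        "highest_score": [120, 209, 183, 160, 98, 183, 35, 56, 87, 32, 12],
        "best_bowling":  [1, 2, 0, 3, 1, 0, 4, 5, 7, 7, 6],
    })

    Z = StandardScaler().fit_transform(df)    # standardize so each variable has unit variance
    pca = PCA().fit(Z)

    print("eigenvalues:", pca.explained_variance_.round(3))
    print("proportion of variance:", pca.explained_variance_ratio_.round(3))

    weights = pca.components_                 # rows are the eigenvectors (the a_ij weights)
    scores = pca.transform(Z)                 # principal component scores U1, U2, ...

    # Loadings per the definition above: eigenvector scaled by sqrt(eigenvalue),
    # approximately the correlation of each original variable with each score
    loadings = weights.T * np.sqrt(pca.explained_variance_)
    print(pd.DataFrame(loadings[:, :2], index=df.columns, columns=["PC1", "PC2"]).round(2))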
Q&A
Contact us
Visit us on: http://www.analytixlabs.in/

For course registration, please visit: http://www.analytixlabs.co.in/course-registration/

For more information, please contact us: http://www.analytixlabs.co.in/contact-us/

Or email: [email protected]

Call us, we would love to speak with you: (+91) 88021-73069

Join us on:
Twitter - http://twitter.com/#!/AnalytixLabs
Facebook - http://www.facebook.com/analytixlabs
LinkedIn - http://www.linkedin.com/in/analytixlabs
Blog - http://www.analytixlabs.co.in/category/blog/
