Customer Segmentation in Python Chapter3

The document discusses customer segmentation using k-means clustering in Python. It covers advantages of k-means clustering and its key assumptions of symmetric and similar variable distributions. It then discusses how to identify and manage skewed variables through transformations like logarithms and dealing with negative values. The document also covers centering and scaling variables to have equal means and variances before clustering. It emphasizes standardizing the data in the proper sequence of first unskewing, then centering, followed by scaling to meet k-means assumptions before applying the algorithm.

Uploaded by

Fgpeqw

DataCamp Customer Segmentation in Python

CUSTOMER SEGMENTATION IN PYTHON

Data pre-processing for k-means clustering

Karolis Urbonas
Head of Data Science, Amazon

Advantages of k-means clustering

One of the most popular unsupervised learning methods
Simple and fast
Works well*

* with certain assumptions about the data


Key k-means assumptions

Symmetric distribution of variables (not skewed)
Variables with same average values
Variables with same variance
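
The three assumptions above can be checked numerically before clustering. The sketch below is only an illustration: the `rfm` table is synthetic, and its column names and distributions are assumptions rather than the course dataset.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for an RFM table; all values are made up for illustration.
rng = np.random.default_rng(0)
rfm = pd.DataFrame({
    "Recency": rng.exponential(scale=90, size=1000) + 1,
    "Frequency": rng.geometric(p=0.2, size=1000),
    "MonetaryValue": rng.lognormal(mean=5, sigma=1, size=1000),
})

# One row per variable. Skewness near 0 suggests a symmetric distribution;
# similar means and standard deviations across rows would satisfy the
# other two assumptions.
summary = pd.DataFrame({
    "skew": rfm.skew(),
    "mean": rfm.mean(),
    "std": rfm.std(),
})
print(summary.round(2))
```

On data like this, all three columns of the summary differ widely between variables, which is the situation the rest of the chapter addresses.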

Skewed variables

Left-skewed: the long tail extends toward low values

Right-skewed: the long tail extends toward high values

Skewed variables
Skew removed with logarithmic transformation
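
A small numeric sketch of this effect, assuming a hypothetical right-skewed variable named `spend` drawn from a log-normal distribution (not the course data). The skewness coefficient from pandas is used here in place of a plot:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
# Hypothetical right-skewed variable; log-normal draws are strictly positive.
spend = pd.Series(rng.lognormal(mean=3.0, sigma=1.0, size=5000), name="spend")

skew_before = spend.skew()          # strongly positive for right-skewed data
skew_after = np.log(spend).skew()   # close to 0 once the tail is compressed

print(f"skew before log: {skew_before:.2f}")
print(f"skew after log:  {skew_after:.2f}")
```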

Variables on the same scale

datamart_rfm.describe()
K-means assumes equal mean
And equal variance
It's not the case with RFM data
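
Why scale matters can be seen from the Euclidean distance that k-means relies on. In this sketch the three customers and their values are hypothetical:

```python
import numpy as np

# Three hypothetical customers as (Recency in days, Frequency, MonetaryValue).
a = np.array([10.0, 5.0, 1000.0])
b = np.array([300.0, 50.0, 1010.0])   # very different recency and frequency
c = np.array([10.0, 5.0, 1400.0])     # identical to a except for spend

dist_ab = np.linalg.norm(a - b)
dist_ac = np.linalg.norm(a - c)

# The variable with the largest raw scale dominates the distance.
print(round(dist_ab, 1), round(dist_ac, 1))
```

Customer `a` ends up closer to `b` than to `c`, even though `a` and `c` behave identically apart from spend, because MonetaryValue is measured on a much larger scale.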

Let's review the concepts


Managing skewed variables


Identifying skewness
Visual analysis of the distribution
If it has a tail - it's skewed

Exploring distribution of Recency

import seaborn as sns
from matplotlib import pyplot as plt

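# distplot is deprecated since seaborn 0.11; histplot with a KDE overlay is the closest replacement
sns.histplot(datamart['Recency'], kde=True)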
plt.show()

Exploring distribution of Frequency

sns.histplot(datamart['Frequency'], kde=True)
plt.show()

Data transformations to manage skewness

Logarithmic transformation (positive values only)
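import numpy as np
# np.log returns -inf for zero and NaN for negative input, so Frequency must be strictly positive
frequency_log = np.log(datamart['Frequency'])

sns.histplot(frequency_log, kde=True)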
plt.show()

Dealing with negative values

Adding a constant before log transformation
Cube root transformation
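
Both options can be sketched as follows. The `values` series is hypothetical, and shifting so that the minimum becomes 1 is just one possible choice of constant:

```python
import numpy as np
import pandas as pd

# Hypothetical variable containing negative values (e.g. net spend after refunds).
values = pd.Series([-120.0, -5.0, 0.0, 3.0, 40.0, 900.0, 15000.0])

# Option 1: add a constant so the minimum becomes 1, then take the log.
shift = 1 - values.min()
log_shifted = np.log(values + shift)

# Option 2: cube root, which is defined for negative numbers and keeps the sign.
cube_root = np.cbrt(values)

print(log_shifted.round(2).tolist())
print(cube_root.round(2).tolist())
```

The shift changes the interpretation of the transformed values and depends on the observed minimum, while the cube root needs no extra parameter but compresses large values less aggressively than the log.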

Let's practice how to identify and manage skewed variables!

Centering and scaling variables


Identifying an issue
datamart_rfm.describe()
Analyze key statistics of the dataset
Compare mean and standard deviation

Centering variables with different means

K-means works well on variables with the same mean
Centering variables is done by subtracting the average value from each observation
datamart_centered = datamart_rfm - datamart_rfm.mean()
datamart_centered.describe().round(2)
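
A self-contained sketch of centering on a tiny hypothetical table (the `rfm` values are invented), confirming that the means become zero while the spreads are left untouched:

```python
import numpy as np
import pandas as pd

# Small hypothetical RFM table.
rfm = pd.DataFrame({
    "Recency": [10.0, 50.0, 200.0, 365.0],
    "Frequency": [1.0, 3.0, 8.0, 20.0],
    "MonetaryValue": [20.0, 150.0, 900.0, 4000.0],
})

# Broadcasting subtracts each column's mean from every row of that column.
centered = rfm - rfm.mean()

print(centered.mean().round(10))   # column means are now numerically zero
print(centered.std().round(2))     # spreads are unchanged and still differ
```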

Scaling variables with different variance

K-means works better on variables with the same variance / standard deviation
Scaling variables is done by dividing each variable by its standard deviation
datamart_scaled = datamart_rfm / datamart_rfm.std()
datamart_scaled.describe().round(2)
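
Scaling on its own equalizes the spread but not the location, as this sketch on a hypothetical `rfm` table suggests:

```python
import numpy as np
import pandas as pd

# Small hypothetical RFM table.
rfm = pd.DataFrame({
    "Recency": [10.0, 50.0, 200.0, 365.0],
    "Frequency": [1.0, 3.0, 8.0, 20.0],
    "MonetaryValue": [20.0, 150.0, 900.0, 4000.0],
})

# Each column is divided by its own (sample) standard deviation.
scaled = rfm / rfm.std()

print(scaled.std().round(2))    # every column now has standard deviation 1
print(scaled.mean().round(2))   # means are non-zero and differ between columns
```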

Combining centering and scaling

Subtract the mean and divide by the standard deviation manually
Or use a scaler from the scikit-learn library (returns a numpy.ndarray object)
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
scaler.fit(datamart_rfm)
datamart_normalized = scaler.transform(datamart_rfm)

print('mean: ', datamart_normalized.mean(axis=0).round(2))

print('std: ', datamart_normalized.std(axis=0).round(2))

mean: [-0. -0. 0.]

std: [1. 1. 1.]
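
The manual route and the scikit-learn route can be compared directly. One detail worth knowing: `StandardScaler` divides by the population standard deviation (`ddof=0`), while pandas' `std()` defaults to the sample version (`ddof=1`). The `rfm` table below is hypothetical:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Small hypothetical RFM table.
rfm = pd.DataFrame({
    "Recency": [10.0, 50.0, 200.0, 365.0],
    "Frequency": [1.0, 3.0, 8.0, 20.0],
    "MonetaryValue": [20.0, 150.0, 900.0, 4000.0],
})

# Manual z-score with ddof=0 to match StandardScaler's definition.
manual = (rfm - rfm.mean()) / rfm.std(ddof=0)

# fit_transform combines the fit and transform calls into one step.
sklearn_result = StandardScaler().fit_transform(rfm)

print(np.allclose(manual.to_numpy(), sklearn_result))
```

With the pandas default (`ddof=1`) the two results would differ by a constant factor per column, which does not change relative distances within a column but does explain small mismatches when comparing outputs.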

Test different approaches by yourself!

Sequence of structuring pre-processing steps


Why does the sequence matter?

Log transformation only works with positive data
Centering and scaling produce negative values, so a log transformation applied afterwards will not work
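
A quick sketch of what goes wrong in the wrong order, using a hypothetical positive, right-skewed array `x`:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.lognormal(mean=2.0, sigma=1.0, size=1000)   # strictly positive, right-skewed

# Wrong order: standardize first, then take the log.
z = (x - x.mean()) / x.std()
with np.errstate(invalid="ignore", divide="ignore"):   # silence NumPy warnings
    log_of_z = np.log(z)
n_invalid = int(np.isnan(log_of_z).sum())   # every below-average value becomes NaN

# Right order: take the log first, then standardize.
log_x = np.log(x)
z_of_log = (log_x - log_x.mean()) / log_x.std()

print("NaN values in wrong order:", n_invalid)
print("all finite in right order:", bool(np.isfinite(z_of_log).all()))
```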

Sequence
1. Unskew the data - log transformation
2. Center to the same average values
3. Scale to the same standard deviation
4. Store as a separate array to be used for clustering

Coding the sequence

Unskew the data with log transformation

import numpy as np
datamart_log = np.log(datamart_rfm)

Normalize the variables with StandardScaler

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
scaler.fit(datamart_log)

Store it separately for clustering

datamart_normalized = scaler.transform(datamart_log)
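
The same sequence can be wrapped in one reusable function. This is a minimal sketch, not the course's code: the helper name `preprocess_for_kmeans`, the positivity check, and the synthetic `rfm` table are all assumptions added for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler


def preprocess_for_kmeans(df: pd.DataFrame) -> np.ndarray:
    """Log-transform, then center and scale; returns a NumPy array."""
    # Guard added for this sketch: the log needs strictly positive input.
    if (df <= 0).any().any():
        raise ValueError("log transformation requires strictly positive values")
    logged = np.log(df)                              # step 1: unskew
    return StandardScaler().fit_transform(logged)    # steps 2-3: center and scale


# Synthetic stand-in for an RFM table; all values are made up.
rng = np.random.default_rng(7)
rfm = pd.DataFrame({
    "Recency": rng.integers(1, 366, size=500),
    "Frequency": rng.geometric(p=0.3, size=500),
    "MonetaryValue": rng.lognormal(mean=4, sigma=1, size=500),
})

normalized = preprocess_for_kmeans(rfm)              # step 4: separate array
print(normalized.mean(axis=0).round(2), normalized.std(axis=0).round(2))
```

Keeping the result in its own array leaves the original table available for interpreting the clusters in the original units later.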

Practice on RFM data!
