
SAP HANA PAL - K-Means Algorithm or How To Do Cust... - SAP Community-O

The document discusses the K-Means clustering algorithm. It describes how K-Means works by assigning data points to clusters based on distance to centroids, and recalculating centroids iteratively until clusters stabilize. It notes that specifying the number of clusters K is difficult and provides steps to apply K-Means in SAP HANA to cluster customer mobile phone usage data into segments.

Uploaded by

jefferyleclerc

3/14/24, 7:16 AM SAP HANA PAL – K-Means Algorithm or How to do Cust... - SAP Community

Each cluster is associated with a centroid and each point is assigned to the cluster with the closest centroid. The centroid is
the mean of the points in the cluster. The closeness can be measured using:

Manhattan Distance
Euclidean Distance (most commonly used)
Minkowski Distance
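
To make the three measures concrete, here is a minimal Python sketch (not part of the original post, and not how PAL computes them internally). It relies on the fact that Manhattan and Euclidean distance are the Minkowski distance of order 1 and 2 respectively. The function names and the sample point are hypothetical.

```python
def minkowski_distance(a, b, p):
    """Minkowski distance of order p between two points of equal dimension."""
    if len(a) != len(b):
        raise ValueError("points must have the same number of dimensions")
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def manhattan_distance(a, b):
    """Sum of absolute differences (Minkowski with p = 1)."""
    return minkowski_distance(a, b, 1)


def euclidean_distance(a, b):
    """Straight-line distance (Minkowski with p = 2)."""
    return minkowski_distance(a, b, 2)


# Hypothetical point and centroid, chosen only for illustration.
point = (0.0, 0.0)
centroid = (3.0, 4.0)
print(manhattan_distance(point, centroid))  # 7.0
print(euclidean_distance(point, centroid))  # 5.0
```

The same point is "closer" or "farther" depending on the measure, so the choice can change which centroid a point ends up with.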

Every time a point is assigned to a cluster, the centroid is recalculated. This is repeated over multiple iterations until the
centroids no longer change (meaning no point moves to a different cluster) or until relatively few points change clusters.
Most of the centroid movement usually happens in the first iterations.
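
The assign-and-recalculate loop can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the PAL implementation: it uses Euclidean distance, takes the initial centroids as an argument, and recalculates the centroids once per pass over all points (a common batch formulation) rather than after every single assignment. All names and the sample data are hypothetical.

```python
import math


def assign(points, centroids):
    """Return, for each point, the index of its closest centroid."""
    return [
        min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
        for p in points
    ]


def recompute(points, labels, centroids):
    """Return the mean of each cluster; an empty cluster keeps its old centroid."""
    updated = []
    for c, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == c]
        if members:
            updated.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
        else:
            updated.append(old)
    return updated


def kmeans(points, initial_centroids, max_iter=100):
    """Repeat assignment and centroid update until the centroids stop moving."""
    centroids = [tuple(c) for c in initial_centroids]
    for _ in range(max_iter):
        labels = assign(points, centroids)
        updated = recompute(points, labels, centroids)
        if updated == centroids:  # no centroid moved, so the clusters are stable
            break
        centroids = updated
    return centroids, assign(points, centroids)


# Hypothetical two-dimensional data with two obvious groups.
points = [(1.0, 1.0), (1.0, 2.0), (9.0, 9.0), (10.0, 9.0)]
centroids, labels = kmeans(points, [(0.0, 0.0), (5.0, 5.0)])
print(centroids)  # [(1.0, 1.5), (9.5, 9.0)]
print(labels)     # [0, 0, 1, 1]
```

On this tiny data set the centroids settle after the first update, which matches the observation that most of the movement happens early.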

One of the main drawbacks of the K-Means algorithm is that you need to specify the number of clusters (K) upfront as
an input parameter. Knowing this value in advance is usually very hard, which is why it is important to run quality
measurement functions to check the quality of your clustering. We will talk about this later in this post.
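
As a rough illustration of what a quality measurement can look like, the Python sketch below computes the mean silhouette coefficient, one commonly used score for comparing clusterings. It is an assumption that this measure is a useful stand-in here; it is not necessarily the function used later in this post. The function name and sample data are hypothetical.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette over all points; values near 1 mean well-separated clusters."""
    clusters = {}
    for p, label in zip(points, labels):
        clusters.setdefault(label, []).append(p)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")

    scores = []
    for p, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for single-point clusters
            continue
        # a: mean distance from p to the other members of its own cluster
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance from p to the members of any other cluster
        b = min(
            sum(math.dist(p, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Hypothetical data: the first labelling matches the two visible groups,
# the second deliberately mixes them.
points = [(1.0, 1.0), (1.0, 2.0), (9.0, 9.0), (10.0, 9.0)]
good = silhouette_score(points, [0, 0, 1, 1])
bad = silhouette_score(points, [0, 1, 0, 1])
print(good > bad)  # True
```

Running the clustering for several values of K and comparing such a score is one way to judge which K fits the data.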

I came across a very interesting paper that talks about segmentation in the telecommunication industry, so I thought it
would be a very nice use case to demo the K-Means algorithm in HANA (if you are interested in this topic, I very much
recommend reading this paper). These are the steps I followed:

https://community.sap.com/t5/technology-blogs-by-members/sap-hana-pal-k-means-algorithm-or-how -to-do-customer-segmentation-for-the/ba-p/12976696/page/2 3/39



Prepare the Data

The first step is to create a table that will hold information about customers' mobile phone usage habits, with the
following structure:

CREATE COLUMN TABLE "TELCO" (
    "ID" INTEGER NOT NULL,              -- Customer ID
    "AVG_CALL_DURATION" DOUBLE,         -- Average Call Duration
    "AVG_NUMBER_CALLS_RCV_DAY" DOUBLE,  -- Average Calls Received per Day
    "AVG_NUMBER_CALLS_ORI_DAY" DOUBLE,  -- Average Calls Originated per Day
    "DAY_TIME_CALLS" DOUBLE,            -- Percentage of Calls made during day time hours (9 a.m. - 6 p.m.)
    "WEEK_DAY_CALLS" DOUBLE,            -- Percentage of Calls made during week days (Monday thru Friday)
    "CALLS_TO_MOBILE" DOUBLE,           -- Percentage of Calls made to mobile phones
    "SMS_RCV_DAY" DOUBLE,               -- Number of SMSs received per day
    "SMS_ORI_DAY" DOUBLE,               -- Number of SMSs sent per day
    PRIMARY KEY ("ID"));


/* Table Type that will be used as the input parameter
   that will contain the data that I would like to cluster */
DROP TYPE PAL_KMEANS_DATA_TELCO;

CREATE TYPE PAL_KMEANS_DATA_TELCO AS TABLE(
    "ID" INT,
    "AVG_CALL_DURATION" DOUBLE,
    "AVG_NUMBER_CALLS_RCV_DAY" DOUBLE,
    "AVG_NUMBER_CALLS_ORI_DAY" DOUBLE,
    "DAY_TIME_CALLS" DOUBLE,
    "WEEK_DAY_CALLS" DOUBLE,
    "CALLS_TO_MOBILE" DOUBLE,
    "SMS_RCV_DAY" DOUBLE,
    "SMS_ORI_DAY" DOUBLE,
