Segmenting Stores Using Clustering
Authors:
Nitin Kalé, University of Southern California
Nancy Jones, San Diego State University

Revised:
Liz Simmons, July 2022

OBJECTIVE
The objective of this exercise is to segment retail stores based on various attributes to help with sales
promotions.

ACTIVITIES
• Import and prepare data.
• Apply Smart Grouping cluster analysis.
• Merge data.
• Create data visualizations.
• Analyze and interpret output from models.

SOFTWARE PREREQUISITES
• SAP Analytics Cloud
• Microsoft Excel

DATA SET
Data file titled Stores.csv

Scenario
The Country Manager of a retail chain (which has 150 stores) is finalizing plans for three sales
promotion strategies. Data pertaining to the stores such as store location, sales turnover, store
size, staff, and profit margin are stored in a CSV file. The manager wants to segment the 150
stores into three different groups based on sales turnover, profit margin, store size, and staff
size so specific strategies can be applied to each store segment. You will use clustering of retail
stores data to assist the manager in developing promotion strategies.

Cluster Analysis
Given a dataset, organizing it into meaningful groups is a basic and useful approach to data
mining and data analysis. Clustering assigns samples to groups using a measure of
association so that data points within a group are similar to one another and data points in
different groups are dissimilar. Data points are multidimensional; that is, they consist of
several variables. Once a dataset has more than three dimensions, its groups can no longer be
seen in a single plot, so an algorithm is used to find them.
The input to a clustering exercise is a dataset and the number of clusters. The result of the
analysis is a set of clusters. K-means clustering is a method of finding clusters and their
centers (centroids) given a chosen number of clusters (K). It is often used for market
segmentation. The goal is to make the inter-cluster difference (distance) high and the
intra-cluster difference (distance) low.
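
To make the assignment and update steps of k-means concrete, here is a minimal sketch in Python. It is not SAC's Smart Grouping implementation. The function names (`sq_dist`, `kmeans`), the deterministic farthest-first choice of starting centers, and the toy data are all assumptions made for illustration.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iter=100):
    """Plain k-means on a list of tuples; returns (labels, centers)."""
    # Assumption for reproducibility: a deterministic "farthest-first" start.
    # Take the first point, then keep adding the point farthest from the
    # centers chosen so far.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))

    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[c])  # leave an empty cluster's center in place
        if new_centers == centers:  # no center moved, so the result is stable
            break
        centers = new_centers
    return labels, centers


# Hypothetical toy data: (sales turnover, staff size). Not taken from Stores.csv.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
          (5.0, 5.0), (5.2, 4.9), (4.8, 5.1),
          (9.0, 1.0), (9.1, 1.2), (8.9, 0.9)]
labels, centers = kmeans(points, k=3)
print(labels)
print(centers)
```

Points that share a label form one cluster. In real data the measures usually sit on very different scales (for example, turnover versus staff count), so a distance-based method like this one may be dominated by the largest-valued measure unless the data is rescaled first.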

1. Visualize the Store Data


1. In SAP Analytics Cloud (SAC), select Stories → Create New → Canvas.
2. Add data → Data uploaded from file.
3. Select Source File and choose the Stores.csv file provided to you, Open.
a. Use first row as column headers should be selected and the CSV Delimiter
should be set to Auto-detect.
b. Import.
c. You will be directed to the Data view. You should have 150 rows of data: four
Measures (Profit Margin, Sales Turnover, Staff Size and Store Size) and one
Dimension (Store) in the dataset.
4. Select the Story view. You will now create a visualization of the relationships among
the variables of the data set as a first step to helping the Country Manager understand
the dynamics of each of the stores in her area of responsibility.

a. Insert Chart.
b. Select Bubble Chart from the Correlation charts.
c. Configure the Chart Structure as follows:
(1) + Add Sales Turnover to the X-Axis.
(2) + Add Staff Size to the Y-Axis.
(3) + Add Profit Margin to Size.
(4) + Add Store to the Dimensions.
(5) + Add a Tooltip Measure as shown in Figure 1. You can find Add Tooltip after
clicking the three dots icon next to Chart Structure.

Figure 1: Adding a Tooltip

(6) Tooltip Measures will now show as a Chart Structure option. + Add Store Size
to Tooltip Measures.
(7) You will now see a Bubble chart of the first three measures by Store.

Figure 2: A Bubble Chart of Store Data

2. Creating the Cluster Analysis


You may hover over any of the bubbles of the Bubble chart to get more information about the
data point. You may also filter to stores of interest. However, with 150 ungrouped bubbles, this
chart is of limited usefulness. Let’s group the stores that are similar using k-means
clustering. In SAC, clustering is done using Smart Grouping.
1. Toggle on Smart Grouping (near the bottom of the Builder panel).
2. Change the Number of Groups to 3 (this is k in the k-means algorithm).
3. Change the Group Label to “Cluster” just to be consistent with your understanding of
cluster analysis.
4. Select Include Tooltip Measures in grouping so all four Measures are considered in
the cluster analysis.

Figure 3: Configure Smart Grouping

5. The clusters in the default monochromatic color scheme tend to blend together, so you
may want to change the Color palette. You should now see three distinct groups
(clusters) in your chart. You can filter on the clusters by clicking the cluster number you
wish to examine.

Question 1: Add your name to the title of the clustered Bubble chart and
submit a screenshot of the chart.

3. Visualization and Interpretation


The results of the grouping (clustering) can be further analyzed by associating the cluster
numbers with the data in the Stores data “model”. (The original Stores.csv file is stored as a
private or embedded “model” within your SAC story.)
Each cluster may be analyzed individually. That means you can create visualizations for the
data filtered by cluster. However, since the manager wants to compare the clusters of
stores, it will be useful to create a new data set that includes all three clusters
together. To do this, you will export the data from each of the three clusters to a spreadsheet
and add a cluster identifier.
1. Merging data sets.
a. Filter to Cluster 1 by clicking on the Legend of the Bubble chart you created. Refer
to Figure 4.

Figure 4: Filter the Cluster

(1) Notice that SAC Smart Grouping will continue to break down the filtered data set
to even smaller clusters. You can ignore these new groups.
(2) Select Export from the chart dropdown list.

Figure 5: Export the Clustered Data

(3) Name the .csv file “Cluster_1”. The data from Cluster 1 will be downloaded to
your computer.
b. Repeat these steps for Clusters 2 and 3 and name the files “Cluster_2”
and “Cluster_3” respectively.
NOTE: Be sure to remove the chart filter (click the X to the right of 1 Filter in the
header) and replace it with the next cluster number before downloading the data.
You should have three downloaded .csv files.
c. Now you will prepare the cluster data for integration with the Stores data model in
SAC.
(1) The first step is to clean up the header information so it is only one row. Open the
Cluster_1.csv file.
(i) Move content of cells B1:D1 to cells B2:D2.
(ii) Delete row 1.
(2) Next add a column called “Cluster”.
(i) Add the cluster number to all the rows of data.
(3) Save the .csv file.

(4) You can see the results of your clean up in the following before and after Figures:

Figure 6: The .csv file from SAC

Figure 7: The .csv file after "Wrangling"

d. Repeat these steps for Clusters 2 and 3.


e. To merge the cluster data with the Stores data do the following:
(1) Go to Data view, Grid mode.
(2) Use the dropdown menu next to Stores to select + Add New Data.

Figure 8: Adding Data to the Analysis

(3) Select Data uploaded from a file.


(4) Select Source Cluster_1.csv.
(i) Use first row as header should be selected.
(ii) Import.

(5) On the Save dropdown select Open With Basic Data Preparation. This
will allow you to append the files for clusters 2 and 3.

Figure 9: Open with Basic Data Preparation

(6) Select Reimport Data from the Data ribbon.

Figure 10: Reimport Data

(7) Select Cluster_2.csv.


(8) When you see the following screen, select Append.

Figure 11: Append a File

(9) Finish.
(10) Repeat the append for Cluster_3.csv.
(i) Now look at the data in the Clusters data set and you should find stores in all
three clusters and 150 rows.
f. Save.
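
The header cleanup, the added Cluster column, and the append can also be scripted. The sketch below is one hedged way to do it in Python, not part of the SAC workflow. It assumes the export has a two-row header, with measure names in row 1 from column B onward and the dimension name in row 2, column A. The function names, the store names, and the values are hypothetical, and the real export layout may differ.

```python
import csv
import io


def tidy_cluster_export(text, cluster_number):
    """Collapse a two-row header into one row and add a Cluster column."""
    rows = list(csv.reader(io.StringIO(text)))
    top, second, data = rows[0], rows[1], rows[2:]
    # Keep the row-2 label where there is one; otherwise use the row-1 label.
    header = [b if b else a for a, b in zip(top, second)]
    tidy = [header + ["Cluster"]]
    tidy += [row + [str(cluster_number)] for row in data]
    return tidy


def append_exports(exports):
    """Stack tidied exports, keeping only the first header row."""
    merged = []
    for number, text in exports:
        tidy = tidy_cluster_export(text, number)
        merged.extend(tidy if not merged else tidy[1:])
    return merged


# Hypothetical file contents; in practice, read Cluster_1.csv and so on from disk.
cluster_1 = ",Sales Turnover,Staff Size,Profit Margin\nStore,,,\nStore A,100,10,0.2\n"
cluster_2 = (",Sales Turnover,Staff Size,Profit Margin\nStore,,,\n"
             "Store B,300,25,0.3\nStore C,320,27,0.1\n")

merged = append_exports([(1, cluster_1), (2, cluster_2)])
for row in merged:
    print(row)
```

To use real files, replace the sample strings with the text of each downloaded file (for example, `open("Cluster_1.csv", newline="").read()`) and write `merged` out with `csv.writer`.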
2. To visualize the Stores and Clusters data, go to the Story view.
a. Add a new page with either a Canvas or a Responsive page.
b. Add a chart.
c. Add a Calculated Measure for Count of Stores as shown below:

Figure 12: Count of Stores


d. In Builder, select the Link Dimensions icon to the right of Data Source.
e. You now need to choose the matching dimensions from each data set. In your case,
the “Store” dimension in the Stores data matches the “Store” dimension in the
Clusters data.
f. Click the More dots to the right of Dimension to select Data Samples > ID to see
samples of the linked values. The Link Dimension settings are shown below. Then
click Set, and then Done.

Figure 13: Link Dimensions

g. Leave the chart as a Column chart. To add variables to the chart, you will now have a
choice of which data set you would like to use. You will see them as a drop down
when you add a Measure or Dimension. SAC calls this a blended data chart.
(1) Add Count of Stores from the Stores data set to Measures.
(2) Add Cluster from the Clusters data set to Dimensions.

Question 2: Which Cluster has the highest number of stores?
Support your answer with a screenshot.

3. Create visualizations to answer the following questions:

Question 3: Provide the name of one store in each cluster. Include a
screenshot of the store name from within each cluster.
(Hover over the cluster circles to see the data they represent.)

Question 4: How do Average Profit Margin, Average Sales Turnover, and
Average Staff Size compare among the clusters?
Support your answers with a screenshot.
Hint: You will need to create Calculated Measures to determine
Averages.
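
As a cross-check outside SAC, the Count of Stores and the Average measures can be computed per cluster with a few lines of Python. This is a sketch only. The function name `cluster_summary`, the row layout, and the sample values are assumptions for illustration and do not come from Stores.csv.

```python
from collections import defaultdict
from statistics import mean

MEASURES = ("Profit Margin", "Sales Turnover", "Staff Size")


def cluster_summary(rows):
    """Return {cluster: {"Count of Stores": n, "Average <measure>": value}}."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["Cluster"]].append(row)

    summary = {}
    for cluster in sorted(groups):
        members = groups[cluster]
        stats = {"Count of Stores": len(members)}
        for name in MEASURES:
            stats["Average " + name] = mean(r[name] for r in members)
        summary[cluster] = stats
    return summary


# Hypothetical rows for illustration only.
rows = [
    {"Store": "Store A", "Cluster": 1, "Profit Margin": 10.0, "Sales Turnover": 100.0, "Staff Size": 4.0},
    {"Store": "Store B", "Cluster": 1, "Profit Margin": 20.0, "Sales Turnover": 200.0, "Staff Size": 6.0},
    {"Store": "Store C", "Cluster": 2, "Profit Margin": 30.0, "Sales Turnover": 500.0, "Staff Size": 12.0},
]

summary = cluster_summary(rows)
for cluster, stats in summary.items():
    print(cluster, stats)
```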

Challenge Activity 1 (Optional, Not Graded)


Choose one cluster to analyze further using visualizations. Provide a detailed
description/analysis of the stores within the cluster you have chosen. Based on what you see
in this cluster, what kind of marketing strategy to improve sales for the stores in the cluster
do you recommend?

Challenge Activity 2 (Optional, Not Graded)


Design tables and/or visualizations to determine the density of each of the three clusters.
Hint: You may want to create some calculated measures for measures of dispersion or
you could also use the variance tool.
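
One way to read "density" is as the spread of each measure within a cluster, where a smaller spread means a tighter cluster. The sketch below computes the population standard deviation per measure and per cluster. The function name `cluster_spread` and the sample rows are hypothetical, and this is only one of several reasonable dispersion measures.

```python
from collections import defaultdict
from statistics import pstdev


def cluster_spread(rows, measures):
    """Population standard deviation of each measure within each cluster."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["Cluster"]].append(row)
    return {
        cluster: {name: pstdev([r[name] for r in members]) for name in measures}
        for cluster, members in sorted(groups.items())
    }


# Hypothetical rows for illustration only; not values from Stores.csv.
rows = [
    {"Cluster": 1, "Sales Turnover": 100.0, "Staff Size": 4.0},
    {"Cluster": 1, "Sales Turnover": 200.0, "Staff Size": 6.0},
    {"Cluster": 2, "Sales Turnover": 300.0, "Staff Size": 10.0},
    {"Cluster": 2, "Sales Turnover": 300.0, "Staff Size": 10.0},
]

spread = cluster_spread(rows, ("Sales Turnover", "Staff Size"))
for cluster, stats in spread.items():
    print(cluster, stats)
```

Because the measures use different units, compare the spread of the same measure across clusters rather than comparing different measures with each other.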
