Assignment 2
BIDM Assignment 2
Section B, Group 7 Kuldeep Das PGP26282 Nitul Das PGP26105 Amit Roykaran PGP26196
Contents
Introduction
Understanding the business problem & objectives
  Business Objectives
  Data mining objectives
Data preparation (done in Excel file)
Clustering Analysis Using SAS
  Clustering based on Demographics
  Clustering based on purchase behaviour
  Clustering based on Purchase Basis
  Clustering based on Purchase behaviour + Purchase basis
Question 2
Introduction
CRISA is an Asian market research agency that specializes in tracking consumer purchase behaviour in consumer goods. CRISA has recorded household consumption patterns; the households were selected using stratified sampling. The data captured by CRISA contains the following information:
- Demographics of the households (updated annually)
- Possession of durable goods, which is used to calculate the affluence index
- Purchase data of product categories and brands (updated monthly)
In this project, we have used k-means clustering to identify clusters based on the following parameters:
- Purchase behaviour (volume, frequency, susceptibility to discounts, and brand loyalty)
- Basis of purchase (price, selling proposition)
We then combined the above variables to find a segmentation based on both purchase behaviour and basis of purchase.
We also sought the best segmentation of these clusters using demographic variables in combination with the above variables. The number of clusters is capped by the number of promotional campaigns that can be run, which is 5. Hence, an ideal clustering should not exceed 5 clusters.
Each of these criteria is normalized (between 0 and 1) to remove the bias towards criteria with higher numeric values.
a) No. of brands
As the number of brands increases, the probability of switching between brands increases, so a lower number of brands is better. Hence we assign a lower score to rows with a low number of brands, indicating better brand loyalty.
b) Brand runs
The lower the number of brand runs, the better. A higher number of brand runs increases the probability of having brand runs for multiple brands, and therefore indicates higher switching behaviour. Hence we assign a lower score to rows with a lower number of brand runs.
c) Volume of purchases attributed to each brand
The higher the purchase volume for a given brand, the better, and hence we attribute a lower score to it. The score for this criterion is worked out as follows:
- We find the maximum % volume attributed to any one of the given brands.
- We assign the score to this variable from the table below. (Note that the score increases as the % volume decreases; this ensures that a lower brand loyalty index remains consistent with the other two criteria.)
Score: 0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0
The final score for the brand loyalty index is therefore a linear combination of the three criteria mentioned above, with different weights assigned to indicate relative importance. Volume of purchase is given low importance, as the following example illustrates. A customer might buy a brand fewer times but in bulk quantities each time; this customer is less loyal than a person who buys the brand more often but in smaller quantities.
Brand Loyalty Index = 0.4 * No. of brands_score + 0.4 * Brand Run_score + 0.2 * Volume of purchase attributed to a given brand_score
The lower the brand loyalty index, the better it is.
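The brand loyalty index described above can be sketched in Python as follows. This is only an illustration, not the Excel workbook actually used: the helper names (min_max, brand_loyalty_index) and the three sample households are hypothetical, and the volume score is passed in as a given value because the table mapping maximum % volume to a score is not reproduced here.

```python
def min_max(values):
    """Scale a list of numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values equal: no spread to normalize, so return zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def brand_loyalty_index(brands_score, runs_score, volume_score):
    """Weighted combination with the report's weights; lower means more loyal."""
    return 0.4 * brands_score + 0.4 * runs_score + 0.2 * volume_score


# Hypothetical households: (number of brands, brand runs, volume score).
# The volume score is assumed to come from the report's lookup table.
households = [(1, 2, 0.1), (3, 10, 0.5), (5, 18, 0.9)]

brands_scores = min_max([h[0] for h in households])
runs_scores = min_max([h[1] for h in households])

indices = [
    brand_loyalty_index(b, r, h[2])
    for b, r, h in zip(brands_scores, runs_scores, households)
]

for household, index in zip(households, indices):
    print(household, round(index, 2))
```

In this made-up sample, the household with the fewest brands and brand runs gets the lowest index, which matches the convention that a lower index indicates higher loyalty.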
Figure 3: Variable Importance, Demographic clustering
As we can see, Affluence_Index is the most important of the demographic variables.
Figure 6: Segment Profile of the generated clusters, Demographic clustering
Segment   Comment on Affluence Index
1         Little less than average
2         Very low
3         Very high
4         Little higher than average
5         Average
Figure 8: Variable importance, clustering based on purchase behaviour
As we can see, Total Volume, No_of_trans, and Brand Loyal are the important variables.
clusters (> 15). Hence we manually limited the number of clusters to 4, 5 and 6, and concluded that the best number of clusters is 5. (The diagrams below illustrate that 5 clusters give the best distribution.)
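The comparison of 4, 5 and 6 clusters can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the SAS procedure used in this report: it runs a plain Lloyd's k-means on synthetic two-dimensional points (the household data is not reproduced here), and the function names kmeans and cluster_sizes are hypothetical.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(p, q))


def nearest(point, centroids):
    """Index of the centroid closest to the given point."""
    return min(range(len(centroids)), key=lambda c: sq_dist(point, centroids[c]))


def kmeans(points, k, iterations=50, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialize from k random data points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [nearest(p, centroids) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def cluster_sizes(labels, k):
    """Number of points assigned to each of the k clusters."""
    return [labels.count(c) for c in range(k)]


# Synthetic stand-in data: 200 points with two normalized features.
data_rng = random.Random(1)
points = [(data_rng.random(), data_rng.random()) for _ in range(200)]

sizes_by_k = {k: cluster_sizes(kmeans(points, k), k) for k in (4, 5, 6)}
for k, sizes in sizes_by_k.items():
    print(k, sizes)
```

Comparing the printed size lists shows how evenly each choice of k spreads the points, which is the kind of check the cluster distribution diagrams support.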
Figure 16: Variable importance, Purchase Basis
As we can see, Pr_Cat_2 is the most important variable.
Figure 19: Segment profile, Purchase Basis
Segment   Comment on Pr_Cat_2 variable distribution in the cluster
1         Lowest of all
2         Less than average
3         Significantly higher than average
4         Less than average
5         Higher than average
Figure 25: Segment profile, both purchase basis + purchase behaviour
As can be seen from the variable importance table, the purchase behaviour variables dominate the purchase basis variables.
Question 2
To identify the best segmentation basis out of the 3 profiles (purchase behavior, basis for purchase, and both combined), we have to look at the distance between the clusters. The following plots show the distance between the clusters in the 3 profiles used:
Basis of Purchase
Demographic
Both basis of purchase and purchase behavior
Hence, we can see that a combination of both basis for purchase and purchase behavior gives the highest degree of separation between the clusters and is therefore the best segmentation criterion. Based on the segment profile of this segmentation basis, the segments have the following characteristics:
Segment   Key Characteristics
1         Less than average volume purchase, least brand loyalty, less than average price category 1
2         More than average volume purchase, less than average brand loyalty, more than average price category 1
3         Average volume purchase, average brand loyalty, average price category 1
4         Lowest volume purchase, highest brand loyalty, highest price category 1
5         Highest volume purchase, more than average brand loyalty, lowest price category 1
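The idea of comparing segmentation bases by the distance between their clusters can be sketched in Python as below. This is an illustration only: it assumes that separation is measured as the Euclidean distance between cluster centroids, which may differ from what the distance plots show, and the centroid coordinates and the names basis_a and basis_b are hypothetical.

```python
from itertools import combinations
from math import dist


def centroid_separation(centroids):
    """Return (minimum, mean) pairwise Euclidean distance between centroids."""
    distances = [dist(a, b) for a, b in combinations(centroids, 2)]
    return min(distances), sum(distances) / len(distances)


# Hypothetical centroids in a normalized feature space for two candidate bases.
candidates = {
    "basis_a": [(0.0, 0.0), (0.3, 0.4), (0.6, 0.8)],
    "basis_b": [(0.0, 0.0), (0.6, 0.8), (1.0, 0.0)],
}

separation = {name: centroid_separation(c) for name, c in candidates.items()}

# Prefer the basis whose two closest clusters are farthest apart.
best = max(separation, key=lambda name: separation[name][0])
print(best, separation[best])
```

Ranking by the minimum pairwise distance favours a basis in which no two segments nearly overlap; ranking by the mean distance is an equally plausible choice and can give a different answer.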