Amazon User Segmentation: Ankit Chaudhary, Abhishek Pal, Ankit Saraswat, Harshit Jindal, Jagbeer Singh
ISSN: 1001-4055
Vol. 45 No. 2 (2024)
___________________________________________________________________
Abstract— In e-commerce environments, effective user segmentation is key to optimizing user experiences and
marketing strategies. This project classifies users according to relevant attributes such as age and purchase rating,
using clustering algorithms such as K-means and DBSCAN. The Elbow Method helps determine a suitable number of
clusters, yielding a more accurate and detailed segmentation. Disk plots and other visualizations provide
information about different user groups and their purchase patterns, and interactive user interfaces make these
segments easier to explore.
The effectiveness of K-means is compared with that of hierarchical clustering and DBSCAN in an analysis that uses
criteria such as the Silhouette Score for reliable cluster evaluation. We also discuss ethical issues in detail, including
algorithmic fairness and user privacy. By deepening our understanding of user behavior in e-commerce, this study
lays the groundwork for customized marketing campaigns that cater to individual user preferences.
Keywords— E-commerce, DBSCAN, User segmentation, silhouette metric, clustering technique, Amazon
I. Introduction
In the dynamic landscape of e-commerce, understanding user behavior and preferences is crucial for delivering
personalized experiences. This project examines the segmentation of Amazon users, employing clustering
techniques to categorize users by age, income, and purchase rating.
investigation of substitute techniques. Density-based clustering, exemplified by DBSCAN, offered a noteworthy
alternative: it identifies clusters as dense regions of the data. This approach is robust to outliers, which it labels as
noise, and can discover clusters of arbitrary shape, matching the varied and complex user behavior patterns
observed in real situations.
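The density-based idea can be made concrete with a from-scratch sketch of the standard DBSCAN procedure. The paper does not show its implementation, so everything here is an assumption: the function names, the parameters `eps` (neighborhood radius) and `min_pts` (minimum number of neighbors, counting the point itself, for a core point), and the toy (age, purchase rating) pairs, which are hypothetical and not data from this study.

```python
import math

NOISE = -1  # label for points that belong to no cluster


def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i], including i."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE (-1)."""
    labels = [None] * len(points)  # None means "not visited yet"
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster_id += 1  # i is a core point: start a new cluster
        labels[i] = cluster_id
        seeds = list(neighbors)
        k = 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: reachable, but not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # j is also a core point: keep expanding
                seeds.extend(j_neighbors)
    return labels


# Hypothetical (age, purchase rating) pairs: two tight groups and one isolated user.
users = [(25, 4.0), (26, 4.2), (27, 3.9), (45, 2.0), (46, 2.1), (47, 1.9), (70, 5.0)]
print(dbscan(users, eps=1.5, min_pts=2))  # → [0, 0, 0, 1, 1, 1, -1]
```

The isolated user at (70, 5.0) is labeled -1 (noise) instead of being forced into a cluster, which is the robustness to outliers described above. With real data the features would normally be scaled first, since raw age values would otherwise dominate the distance.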
Recent developments in user segmentation merge data from several sources, including past purchases, preferences,
and user interactions. Deep learning techniques such as neural-network-based clustering have been studied for
their ability to capture complex relationships in user data, potentially improving segmentation accuracy.
Progress, however, brings its own difficulties. A key open concern is the interpretability of clusters, which is
essential for deriving practical insights. Further challenges include handling high-dimensional data, maintaining
privacy compliance, and supporting real-time segmentation that adapts to changing user behavior.
In summary, the literature traces the development of user segmentation techniques in e-commerce and highlights
the field's dynamic character. Although K-means clustering remains a popular choice, research into density-based
techniques such as DBSCAN, together with advances in deep learning and the integration of multiple data sources,
reflects a search for more flexible and precise segmentation methods. The remaining obstacles and open questions
underline the need for continued research and innovation in user segmentation, and provide a solid basis for
future studies aimed at improving targeted advertising and user experience on e-commerce sites.
customized strategies for personalized product offerings. Chen et al. (2018) found the ITF model applicable to
identifying user innovation segments in online communities.
Table 1. Overview of the existing literature on market segmentation across time periods.
IV. Proposed Work Plan
A. GENERAL ARCHITECTURE
The overall design follows a methodical sequence of operations. Data preprocessing sets the stage for the Elbow
Method, which is used to choose the number of clusters. K-means clustering is then carried out, and Plotly Express
is used to visualize the results. Finally, the Silhouette Score quantifies the quality of the resulting clusters.
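The paper does not specify its preprocessing steps. One common choice, assumed here purely for illustration, is min-max scaling, so that age (tens of years) and purchase rating (a few points) contribute comparably to the Euclidean distances used by the Elbow Method, K-means, and the Silhouette Score. The function name and the sample rows are hypothetical.

```python
def min_max_scale(rows):
    """Rescale each column of `rows` to [0, 1]; a constant column maps to 0.0."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        tuple((v - lo) / (hi - lo) if hi > lo else 0.0
              for v, lo, hi in zip(row, lows, highs))
        for row in rows
    ]


# Hypothetical (age, purchase rating) rows.
print(min_max_scale([(20, 1.0), (40, 3.0), (60, 5.0)]))
# → [(0.0, 0.0), (0.5, 0.5), (1.0, 1.0)]
```

Standardization (subtracting the mean and dividing by the standard deviation) is an equally common alternative; either way, the scaling parameters should be computed once and reused for any new users.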
● K-Means Clustering
The K-means clustering technique is the foundation of the user segmentation. As a centroid-based method, it
partitions users into k groups based on age and purchase rating. Cluster assignments are refined iteratively: each
user is assigned to the nearest centroid, each centroid is then recomputed as the mean of its assigned users, and the
two steps repeat until the assignments stop changing, yielding k distinct user segments.
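The iterative assign-and-update procedure for K-means is Lloyd's algorithm. The sketch below implements it from scratch, together with the within-cluster sum of squares (WCSS) that the Elbow Method plots against k. Assumptions not taken from the paper: a deterministic farthest-first initialization (library implementations such as scikit-learn's typically use k-means++ with several restarts instead), the function names, and the toy points, which stand for already-scaled (age, purchase rating) pairs.

```python
import math


def kmeans(points, k, max_iter=100):
    """Lloyd's K-means. Returns (labels, centroids, wcss)."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Farthest-first initialization (an assumption made for determinism).
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid (Euclidean distance).
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    wcss = sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))
    return labels, centroids, wcss


def elbow_curve(points, k_max):
    """WCSS for k = 1..k_max; the 'elbow' is where the decrease levels off."""
    return [kmeans(points, k)[2] for k in range(1, k_max + 1)]


# Hypothetical, already-scaled (age, purchase rating) points forming three groups.
pts = [(0.1, 0.1), (0.2, 0.1), (0.1, 0.2),
       (0.8, 0.8), (0.9, 0.8), (0.8, 0.9),
       (0.1, 0.9), (0.2, 0.9), (0.1, 0.8)]
labels, centroids, wcss = kmeans(pts, 3)
print(labels)                # → [0, 0, 0, 1, 1, 1, 2, 2, 2]
print(elbow_curve(pts, 4))   # drops sharply up to k = 3, then flattens
```

Because Lloyd's algorithm only reaches a local optimum, the result can depend on initialization; running it from several starting points and keeping the lowest WCSS is the usual safeguard.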
● Visualization
Plotly Express is used to create dynamic, interactive visualizations. Interactive scatter plots display the user
clusters, giving users both an overview of the segmentation results and the ability to examine and evaluate
individual data points.
The Silhouette Score is a key numerical indicator of how effective the clustering was. It measures how well
separated and distinct the detected clusters are, relative to how cohesive each cluster is internally. Examining the
Silhouette Score therefore gives the project important information about the overall effectiveness and coherence of
the resulting user segmentation.
1. Euclidean Distance:
The Euclidean distance between two points P(x_1, y_1) and Q(x_2, y_2) in two-dimensional space is given by:
$$ d(P, Q) = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2} $$
• This distance measure is used to determine the proximity of data points to cluster centroids.
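As a small worked check of the distance formula, the sketch below computes it and uses it for the nearest-centroid lookup mentioned in the bullet above. The function names are illustrative and not from the paper; Python's standard library also provides `math.dist` for the same computation in any number of dimensions.

```python
import math


def euclidean_distance(p, q):
    """Distance between P(x1, y1) and Q(x2, y2): sqrt((x2 - x1)^2 + (y2 - y1)^2)."""
    (x1, y1), (x2, y2) = p, q
    return math.sqrt((x2 - x1) ** 2 + (y2 - y1) ** 2)


def nearest_centroid(point, centroids):
    """Index of the centroid closest to `point` (ties go to the lower index)."""
    return min(range(len(centroids)), key=lambda c: euclidean_distance(point, centroids[c]))


print(euclidean_distance((1, 2), (4, 6)))          # → 5.0, i.e. sqrt(3^2 + 4^2)
print(nearest_centroid((1, 2), [(0, 0), (4, 6)]))  # → 0
```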
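4. Silhouette Score:
$$ s(i) = \frac{b(i) - a(i)}{\max\big(a(i), b(i)\big)} $$
where:
• s(i) is the silhouette score for sample i.
• a(i) is the average distance from sample i to the other samples in the same cluster.
• b(i) is the smallest average distance from sample i to the samples of any other cluster, i.e., the average distance
to the nearest neighboring cluster.
s(i) lies between -1 and 1; values near 1 indicate that a sample is well matched to its own cluster and far from
neighboring clusters. The overall Silhouette Score is the mean of s(i) over all samples.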
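The silhouette definition translates directly into code. This from-scratch sketch assumes the common convention (also used by scikit-learn) that s(i) = 0 for a sample that is alone in its cluster; in practice a library routine such as scikit-learn's `sklearn.metrics.silhouette_score` would normally be used instead. The toy points and labelings are hypothetical.

```python
import math


def silhouette_score(points, labels):
    """Mean of s(i) = (b(i) - a(i)) / max(a(i), b(i)) over all samples."""
    clusters = set(labels)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")

    def mean_dist(i, c):
        # Average distance from sample i to the samples of cluster c, excluding i itself.
        d = [math.dist(points[i], q)
             for j, (q, lab) in enumerate(zip(points, labels)) if lab == c and j != i]
        return sum(d) / len(d) if d else None

    scores = []
    for i, own in enumerate(labels):
        a = mean_dist(i, own)  # a(i): cohesion within the sample's own cluster
        if a is None:
            scores.append(0.0)  # singleton cluster: s(i) = 0 by convention
            continue
        b = min(mean_dist(i, c) for c in clusters if c != own)  # b(i): nearest other cluster
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


pts = [(0, 0), (0, 1), (10, 0), (10, 1)]
print(silhouette_score(pts, [0, 0, 1, 1]))  # ≈ 0.90: compact, well-separated clusters
print(silhouette_score(pts, [0, 1, 0, 1]))  # ≈ -0.45: samples sit closer to the other cluster
```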
VI. Conclusion
This research applies clustering techniques to provide a comprehensive approach to user segmentation on
Amazon.com. Analysis of the outcomes, supported by the Silhouette Score, yields important insights into user
preferences and behavior and establishes a framework for future improvements and practical applications. Beyond
improving our understanding of how users behave on Amazon.com, the work sets the stage for more individualized
marketing approaches and enhanced user interfaces. Directions for future research include investigating other
clustering techniques, integrating real-time data for dynamic segmentation, and working with industry partners on
practical deployment. Machine learning applied to customer segmentation is a modern, data-driven strategy in
e-commerce that promises further developments and insights for improving user experience and satisfaction.
Overall, the User Segmentation Project represents a significant step forward in optimizing user-centric methods
within the ever-changing e-commerce ecosystem.
References
[1] R. Smith, "Product differentiation and market segmentation as alternative marketing strategies," Journal of
Marketing, vol. 21, pp. 3-8, 1956.
[2] Kotler and G. Armstrong, Principles of Marketing. Upper Saddle River, NJ: Prentice Hall, 1999.
[3] J. Cahill, Lifestyle market segmentation. New York: Haworth Press, 2006.
[4] Anderson, D. C. Jain, and P. K. Chintagunta, "Customer value assessment in business markets: a state-of-
practice study," Journal of Business-to-Business Marketing, vol. 1, pp. 3-29, 1993.
[5] S. Hassan, and S. H. Craft, "Linking global market segmentation decisions with strategic positioning options,"
Journal of Consumer Marketing, vol. 22, pp. 81-89, 2005.
[6] Abratt, "Market segmentation practices of industrial marketers," Industrial Marketing Management, vol. 22,
pp. 79-84, 1993.
[7] K. Foedermayr and A. Diamantopoulos, "Market segmentation in practice: review of empirical studies,
methodological assessment, and agenda for future research," Journal of Strategic Marketing, vol. 16, pp. 223-
265, 2008.
[8] Kumar, and W. Reinartz, "Customer relationship management issues in the business-to-business context,"
Customer Relationship Management, pp. 261-277, 2012.
[9] Tsiptsis, and A. Chorianopoulos, Data Mining Techniques in CRM: inside Customer Segmentation. Wiley
Publishing, 2010.