Mini Project Report 2024 IS07
MACHINE LEARNING
Dr. G. P. Hegde
Professor and Head
in partial fulfillment of the requirements for the award of the degree of
Bachelor of Engineering
UJIRE-574 240
CERTIFICATE
Certified that the Project Work titled ‘Customer Segment Using Machine Learning’
is carried out by Ms. K Spandana Bhat, USN: 4SU21IS018, Ms. Prabhavati M
Patil, USN: 4SU21IS032 and Mr. Purushottam P Kudale, USN: 4SU22IS406, who are
bonafide students of SDM Institute of Technology, Ujire, in partial fulfillment for the
award of the degree of Bachelor of Engineering in Information Science and
Engineering of Visvesvaraya Technological University, Belagavi during the year
2023-2024. It is certified that all the corrections/suggestions indicated for Internal
Assessment have been incorporated in the report deposited in the departmental library.
The report has been approved as it satisfies the academic requirements in respect of
project work prescribed for the said degree.
1.
2.
Acknowledgement
It is our pleasure to express our heartfelt thanks to Dr. G. P. Hegde, Professor and Head of
Department of Information Science and Engineering, for his supervision and guidance which
enabled us to understand and develop this project.
We are indebted to Dr. Ashok Kumar T, Principal, and Dr. G. P. Hegde, Professor and Head of
the Department, for their advice and suggestions at various stages of the work. We also extend our
heartfelt gratitude to the Management of SDM Institute of Technology, Ujire, for providing us with
a good learning environment, library and laboratory facilities. We appreciate the help and the
support rendered by the teaching and non-teaching staff of Information Science and Engineering.
Lastly, we take this opportunity to offer our regards to all of those who have supported us directly
or indirectly in the successful completion of this project work.
K Spandana Bhat
Prabhavati M Patil
Purushottam P Kudale
Abstract
Customer segmentation has become a popular method for dividing a company's customers into groups
in order to retain them and increase profit. In this study, customers of different organizations are
classified on the basis of behavioural characteristics such as spending and income. Taking
behavioural aspects into consideration makes this method more efficient than the alternatives. The
k-means clustering algorithm, a machine learning algorithm, is used to classify customers according
to these behavioural characteristics. The resulting clusters help the company target individual
customers and, through marketing campaigns and social media sites, advertise content in which
they are genuinely interested.
Table of Contents
Acknowledgement
Abstract
Table of Contents
List of Figures
Chapter 1 Introduction
Chapter 2 Literature Review
2.1 General Introduction
2.2 Literature Survey
Chapter 3 Problem Formulation
3.1 Motivation
3.2 Objectives
Chapter 4 System Requirements and Methodology
4.1 Hardware Requirements
4.2 Software Requirements
4.3 Methodology Used
Chapter 5 System Design
5.1 Architecture of the Proposed System
5.2 System Flow Chart
5.3 Implementation of Code
Chapter 6 Results and Discussion
6.1 Results
6.2 Discussion
Chapter 7 Conclusion and Scope for Future Work
7.1 Conclusion
7.2 Scope for Future Work
References
Personal Profile
List of Figures
Figure 4.1 Block diagram
Chapter 1
Introduction
Many businesses are moving online, and online marketing is becoming essential for retaining
customers. Treating all customers as the same and targeting them with one marketing strategy is
inefficient, and it can annoy customers by neglecting their individuality. Customer segmentation has
therefore become a popular and efficient solution to this problem. Customer segmentation is defined
as dividing a company's customers on the basis of demographic (age, gender, marital status) and
behavioural (types of products ordered, annual income) aspects. Demographic characteristics do not
emphasize the individuality of a customer, because customers in the same age group may have
different interests. Behavioural aspects are a better basis for customer segmentation because they
focus on individuality and allow proper segmentation.
Chapter 2
Literature Review
Chapter 3
Problem Formulation
Develop a customer segmentation model using machine learning to group customers based on their
purchasing behaviour. Utilize demographic, behavioural, and psychographic data collected from our
CRM system. The model should accurately identify distinct customer segments that can be used to
personalize marketing campaigns and improve customer satisfaction.
3.1 Motivation
Customer segmentation using machine learning offers businesses a strategic edge by unlocking
profound insights from vast datasets. By harnessing advanced algorithms, companies can categorize
customers into distinct groups based on their behaviours, preferences, and purchasing patterns. This
segmentation enables personalized marketing strategies that resonate more deeply with each
segment, enhancing engagement and conversion rates. Moreover, machine learning facilitates
predictive analytics to forecast customer behaviours such as churn or buying propensity,
empowering proactive retention and targeted marketing efforts. This data-driven approach not only
optimizes resource allocation but also fosters continuous adaptation to evolving market dynamics,
ensuring sustained competitiveness. Ultimately, customer segmentation through machine learning
drives enhanced customer experiences, operational efficiency, and strategic decision-making,
positioning businesses to thrive in a dynamic marketplace.
3.2 Objectives
The objectives of the proposed project are as follows:
• To group customers into distinct segments based on their purchasing behaviour using the k-means
clustering algorithm, so that marketing campaigns can be personalized for each segment.
Chapter 4
System Requirements and Methodology
• RAM: 2 GB or more
Figure 4.1: Block Diagram
Chapter 5
System Design
The machine learning-based system architecture for customer segmentation comprises a structured
framework that manages the intricacies of data processing, modelling, and application integration.
The fundamental starting point of the architecture is the ingestion of data from many sources,
including external data streams, CRM systems, and transactional databases. To get ready for analysis,
this raw data goes through preprocessing procedures like cleaning, normalization, and feature
engineering. To find the most pertinent variables that best describe the behaviours and preferences of
customers, dimensionality reduction and feature selection techniques such as principal component
analysis (PCA) and feature importance ranking are utilized.
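The preprocessing and modelling stages described above can be sketched as a single scikit-learn pipeline. This is a minimal illustration under stated assumptions, not the project's actual implementation: the data is synthetic, the feature count and cluster count are arbitrary, and the names `features`, `pipeline`, and `segments` are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for cleaned customer features (hypothetical data):
# two groups of 100 customers, each described by 6 numeric features.
rng = np.random.default_rng(0)
low_spenders = rng.normal(loc=0.0, scale=1.0, size=(100, 6))
high_spenders = rng.normal(loc=5.0, scale=1.0, size=(100, 6))
features = np.vstack([low_spenders, high_spenders])

# Normalise the features, keep enough principal components to explain
# 95% of the variance, then cluster the reduced data with k-means.
pipeline = make_pipeline(
    StandardScaler(),
    PCA(n_components=0.95),
    KMeans(n_clusters=2, n_init=10, random_state=0),
)
segments = pipeline.fit_predict(features)

n_kept = pipeline.named_steps["pca"].n_components_
print("components kept:", n_kept)
print("segment sizes:", np.bincount(segments))
```

Chaining the steps in one pipeline keeps the scaling and PCA parameters tied to the clustering model, so new customer records pass through the same transformations before being assigned to a segment.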
5.2 System Flow Chart
A flowchart is a visual representation of a process, system, or algorithm. It uses a standardized set
of symbols, such as rectangles, diamonds, and ovals, to illustrate a sequence of steps. These steps
can encompass anything from basic actions to intricate decision-making processes. Flowcharts are
a powerful tool for conveying complex processes clearly and straightforwardly, even for audiences
without a technical background. By visually breaking down the process into its constituent parts
and depicting the flow of information or materials, flowcharts enable users to grasp the logic and
structure of the process with ease. Figure 5.5 describes the flow chart of this project.
5.3 Implementation of Code
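# Core libraries for data handling and plotting
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Load the dataset and inspect its structure
df = pd.read_csv("C:/Users/Prabhavati/Downloads/customer_segmentation_dataset.csv")
df.head()
df.shape
df.info()

# Count missing values per column
df.isnull().sum()

# Parse Dt_Customer into datetime values (pass dayfirst=True if the file stores day-first dates)
df['Dt_Customer'] = pd.to_datetime(df['Dt_Customer'])

# Collect the calendar date of every Dt_Customer entry
dates = []
for timestamp in df['Dt_Customer']:
    dates.append(timestamp.date())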
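# Shorten the product-spending column names
df = df.rename(columns={
    "MntWines": "Wines",
    "MntFruits": "Fruits",
    "MntMeatProducts": "Meat",
    "MntFishProducts": "Fish",
    "MntSweetProducts": "Sweets",
    "MntGoldProds": "Gold",
})
df.head()
df.describe()

# Remove outliers in age and income. "Age_on_2014" is a derived column whose
# creation is not shown in this listing.
df = df[df["Age_on_2014"] < 90]
df = df[df["Income"] < 600000]
print("The total number of data points after removing the outliers is:", len(df))
df.columns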
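# Plot the distribution of each numeric column on a shared 5 x 3 grid
plt.figure(figsize=(12, 12))
for i, col in enumerate(['Income', 'Recency', 'Wines', 'Fruits', 'Meat', 'Fish', 'Sweets', 'Gold']):
    plt.subplot(5, 3, i + 1)
    sns.countplot(x=df[col])
    plt.title(f"Distribution of {col}")
plt.tight_layout()
plt.show()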
# Count plots for the categorical columns on a 5 x 3 grid.
# Positions 1 and 2 are skipped because their plotting calls are missing from the original listing.
categorical_cols = ["Education", "AcceptedCmp1", "AcceptedCmp2", "AcceptedCmp3",
                    "AcceptedCmp4", "AcceptedCmp5", "Complain", "Response",
                    "Living_with", "Is_parent"]
plt.figure(figsize=(12, 12))
for position, col in enumerate(categorical_cols, start=3):
    plt.subplot(5, 3, position)
    sns.countplot(x=df[col].dropna())
    plt.title(f"Distribution of {col}")
plt.tight_layout()
plt.show()
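# The plotting calls are missing from the original listing. The scatter plots below are an
# assumed reconstruction, and the "Spent" and "Family_size" column names are hypothetical.
plt.figure(figsize=(12, 6))
sns.scatterplot(x=df["Age_on_2014"], y=df["Spent"])
plt.title("Spent vs Age")
plt.show()
plt.figure(figsize=(12, 6))
sns.scatterplot(x=df["Income"], y=df["Spent"])
plt.title("Spent vs Income")
plt.grid(False)
plt.show()
plt.figure(figsize=(12, 6))
sns.scatterplot(x=df["Family_size"], y=df["Spent"])
plt.title("Spent vs Family Size")
plt.show()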
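from sklearn.preprocessing import LabelEncoder

# Encode every text (object) column as integer labels
object_cols = list(df.select_dtypes(include="object").columns)
LE = LabelEncoder()
for col in object_cols:
    df[col] = LE.fit_transform(df[col])
df.head()

# Correlation matrix of the numeric columns. The heatmap call is missing from the
# original listing; sns.heatmap is an assumed reconstruction.
corrmax = df.corr(numeric_only=True)
plt.figure(figsize=(12, 10))
sns.heatmap(corrmax, cmap="coolwarm", center=0)
plt.show()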
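from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Keep the numeric columns only, since StandardScaler cannot handle datetime columns such as Dt_Customer
df1 = df.select_dtypes(include="number").copy()

# Standardise every feature to zero mean and unit variance
scaler = StandardScaler()
scaler.fit(df1)
scaled_df1 = pd.DataFrame(scaler.transform(df1), columns=df1.columns)
scaled_df1

# Fit PCA on all components and find how many are needed to explain 95% of the variance
pca = PCA()
pca.fit(scaled_df1)
cumsum = np.cumsum(pca.explained_variance_ratio_)
d = np.argmax(cumsum >= 0.95) + 1
cumsum

# Plot the cumulative explained variance against the number of dimensions
plt.plot(cumsum, linewidth=3)
plt.xlabel("Dimensions")
plt.ylabel("Explained Variance")
plt.grid(True)
plt.show()

# Refit with the d selected components and project the data onto them
pca = PCA(n_components=d)
df1_reduced = pca.fit_transform(scaled_df1)
pca.n_components_
cumsum[d - 1]
pca.explained_variance_ratio_
df1_reduced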
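from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_samples, silhouette_score
from matplotlib.ticker import FixedFormatter, FixedLocator
from yellowbrick.cluster import KElbowVisualizer

# Elbow method. The listing does not show how `elbow` was created; Yellowbrick's
# KElbowVisualizer is assumed here because it matches the fit() and show() calls.
elbow = KElbowVisualizer(KMeans(n_init=10, random_state=42), k=10)
elbow.fit(df1_reduced)
elbow.show()

# Fit one model per candidate k (2 to 9, an assumed range) and compute its silhouette score
k_values = range(2, 10)
kmeans_per_k = {k: KMeans(n_clusters=k, n_init=10, random_state=42).fit(df1_reduced)
                for k in k_values}
silhouette_scores = [silhouette_score(df1_reduced, kmeans_per_k[k].labels_) for k in k_values]

plt.figure(figsize=(8, 3))
plt.plot(k_values, silhouette_scores, "bo-")
plt.xlabel("$k$")
plt.ylabel("Silhouette score")
plt.grid(True)
plt.show()

# Silhouette diagram for every k: one band of sorted coefficients per cluster
plt.figure(figsize=(11, 10))
for k in k_values:
    plt.subplot(4, 2, k - 1)
    y_pred = kmeans_per_k[k].labels_
    silhouette_coefficients = silhouette_samples(df1_reduced, y_pred)
    padding = len(df1_reduced) // 30
    pos = padding
    ticks = []
    for i in range(k):
        coeffs = silhouette_coefficients[y_pred == i]
        coeffs.sort()
        color = plt.cm.Spectral(i / k)
        plt.fill_betweenx(np.arange(pos, pos + len(coeffs)), 0, coeffs,
                          facecolor=color, edgecolor=color, alpha=0.7)
        ticks.append(pos + len(coeffs) // 2)
        pos += len(coeffs) + padding
    plt.gca().yaxis.set_major_locator(FixedLocator(ticks))
    plt.gca().yaxis.set_major_formatter(FixedFormatter(range(k)))
    if k in (3, 5):
        plt.ylabel("Cluster")
    if k in (5, 6):
        plt.xlabel("Silhouette Coefficient")
    else:
        plt.tick_params(labelbottom=False)
    plt.title(f"$k={k}$")
plt.show()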
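# Final model. The chosen number of clusters is not stated in the report; best_k = 4
# is a placeholder to be set from the elbow and silhouette plots.
best_k = 4
kmeans = KMeans(n_clusters=best_k, n_init=10, random_state=42)
cluster_labels = kmeans.fit_predict(df1_reduced)

# Attach the labels and export the clustered data (to_excel needs the openpyxl package)
df1['Cluster'] = cluster_labels
df1.to_excel('Clustered_data.xlsx', index=False)
df1.head()
df['Cluster'] = cluster_labels
df.head()

# Number of customers in each cluster
cluster_distribution = df1['Cluster'].value_counts().sort_index()
plt.bar(cluster_distribution.index, cluster_distribution.values)
plt.xlabel('Cluster')
plt.ylabel('Number of customers')
plt.show()

# The plotting calls for the final figure are missing from the original listing;
# only its legend and show calls remain.
plt.legend()
plt.show()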
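Once cluster labels are attached to the data, a per-cluster summary is one way to interpret what each segment represents. The sketch below is illustrative only: the `customers` table and its values are invented, and the `Spent` column name is an assumption rather than a column confirmed by the listing above.

```python
import pandas as pd

# Hypothetical toy data standing in for the clustered customer table;
# the values are invented for illustration only.
customers = pd.DataFrame(
    {
        "Income": [20000, 25000, 80000, 85000, 50000, 52000],
        "Spent": [100, 150, 1500, 1700, 600, 640],
        "Cluster": [0, 0, 1, 1, 2, 2],
    }
)

# Mean income and spending per cluster, plus the number of customers in each.
profile = customers.groupby("Cluster").agg(
    mean_income=("Income", "mean"),
    mean_spent=("Spent", "mean"),
    customers=("Income", "size"),
)
print(profile)
```

Comparing the rows of such a table, for example a low-income, low-spending cluster against a high-income, high-spending one, gives the marketing team a concrete description of each segment.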
Chapter 6
Results and Discussion
6.1 Results
Machine learning-based customer segmentation changes how companies view and interact with their
clientele. By applying clustering algorithms to large datasets, businesses can identify complex
patterns and behaviours that traditional segmentation techniques typically overlook. This makes it
possible to create more accurate and meaningful customer segments by taking into account
variables such as past purchases, demographics, online activity, and interactions with marketing
campaigns. To sum up, the outcomes of using machine learning for customer segmentation
highlight its influence on business strategy and client relationships. Future developments in data
analytics and machine learning algorithms hold the potential to improve segmentation strategies
further, giving companies an advantage in a competitive market.
Figure 6.1: Results
6.2 Discussion
Machine learning-based customer segmentation requires attention to several points to ensure that the
segmentation findings are produced and applied successfully. First, the preparation and selection of
data sources is critical, and it should draw on diverse kinds of data such as transactional records,
demographic information, and behavioural patterns. This step lays the groundwork for discovering
the relevant features that guide the segmentation process. Choosing an algorithm also requires
carefully weighing clustering algorithms such as K-means or hierarchical clustering, and
dimensionality reduction strategies like PCA, each of which suits particular dataset features and
segmentation goals. Furthermore, the quality and coherence of the generated segments must be
evaluated with relevant metrics, such as silhouette scores or purity metrics. These measurements
serve as reference points for segment interpretation and for extracting practical knowledge that can
guide focused marketing campaigns, customized client experiences, and operational enhancements.
The discussion should also cover how segmentation models are incorporated into operational
procedures, including deployment obstacles and ongoing methods for improving the models as
consumer habits change.
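As a small illustration of evaluating segments with silhouette scores, the sketch below compares several candidate values of k on synthetic data. The data, the range of k, and the variable names are assumptions made for the example, not results from this project.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic data (hypothetical): three well-separated groups of 2-D points.
rng = np.random.default_rng(42)
centres = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
points = np.vstack([rng.normal(c, 1.0, size=(60, 2)) for c in centres])

# Fit k-means for several candidate k and record the mean silhouette score.
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(points)
    scores[k] = silhouette_score(points, labels)

# The k with the highest mean silhouette score gives the most coherent clusters.
best_k = max(scores, key=scores.get)
for k, score in scores.items():
    print(f"k={k}: silhouette={score:.3f}")
print("best k:", best_k)
```

A silhouette score close to 1 means points sit well inside their own cluster, while values near 0 indicate overlapping clusters, so the score offers a label-free way to compare segmentations.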
Chapter 7
Conclusion and Scope for Future Work
7.1 Conclusion
In conclusion, customer segmentation using machine learning represents a pivotal strategy for
modern businesses seeking to thrive in a data-driven marketplace. By leveraging advanced
algorithms to uncover patterns and behaviours within their customer base, organizations can tailor
their marketing efforts with unprecedented precision. This approach not only enhances customer
satisfaction through personalized experiences but also optimizes resource allocation and improves
overall operational efficiency. Moreover, predictive analytics capabilities empower businesses to
anticipate future trends and proactively address customer needs, thereby fostering long-term
loyalty and sustainable growth. As technology continues to evolve, the strategic advantage gained
from customer segmentation using machine learning will remain essential in maintaining
competitiveness and driving innovation across diverse industries.
References
[1] Blanchard, Tommy. Bhatnagar, Pranshu. Behera, Debasish. (2019). Data Science for Marketing
Analytics: Achieve your marketing goals with the data analytics power of Python. S.l: Packt
Publishing Limited.
[2] Griva, A., Bardaki, C., Pramatari, K., Papakiriakopoulos, D. (2018). Retail business analytics:
Customer visit segmentation using market basket data. Expert Systems with Applications, 100, 1-16.
[3] By Jerry W Thomas. 2007. Accessed at: www.decisionanalyst.com on July 12, 2015.
[4] Jayant Tikmani, Sudhanshu Tiwari, Sujata Khedkar "Telecom Customer Classification Based
on Group Analysis of K-methods", JIRCCE, Year: 2015.
[5] Vaishali R. Patel and Rupa G. Mehta “Impact of Outlier Removal and Normalization Approach
in Modified k-Means Clustering Algorithm”, IJCSI,Year: 2011.
Personal Profile