Assignment
Write code to fill missing values in column 'A' with the column mean.
Drop rows where all values are missing.
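One possible approach to the two cleaning tasks above, sketched with pandas. The toy frame and the second column 'B' are made up for illustration; only the column name 'A' comes from the task. The sketch drops fully empty rows before filling, because filling 'A' first would leave no row entirely missing; the mean is the same either way, since pandas skips missing values when averaging.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; only the column name 'A' comes from the assignment.
df = pd.DataFrame({
    "A": [1.0, np.nan, 3.0, np.nan],
    "B": [10.0, 20.0, np.nan, np.nan],
})

# Drop rows where every value is missing (the last row here).
cleaned = df.dropna(how="all").copy()

# Fill the remaining gaps in 'A' with the mean of the observed values.
cleaned["A"] = cleaned["A"].fillna(cleaned["A"].mean())

print(cleaned)
```

Missing values in other columns (such as 'B' in this toy frame) are left untouched, because the task only asks for 'A' to be imputed.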
Scenario Question: Applying Supervised and Unsupervised Machine Learning Techniques
You are working as a data scientist for a retail company that wants to optimize its customer
retention strategy. The company has historical data on customers, including their purchase
behavior and demographic information. Your task is to build machine learning models to:
1. Predict whether a customer will churn (leave) based on their behavior and demographic
features (a supervised learning problem).
2. Segment customers into groups with similar purchasing behaviors (an unsupervised
learning problem) to identify different customer personas for targeted marketing.
Data Description:
Features for supervised and unsupervised learning:
- CustomerID: Unique identifier for each customer.
- Age: Age of the customer.
- Income: Annual income of the customer.
- Num_Purchases: Number of purchases made by the customer.
- Avg_Purchase_Value: Average value of purchases made.
- Last_Purchase_Days_Ago: Number of days since the last purchase.
- Churn: Target variable indicating whether a customer has churned (1) or not (0)
  (only available for supervised learning).
Step 1: Apply Supervised Learning Techniques
Use the following supervised learning algorithms to predict customer churn:
1. Logistic Regression
2. Decision Trees
3. Random Forest
4. Support Vector Machine (SVM)
5. K-Nearest Neighbors (KNN)
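The five classifiers listed above could be fitted with scikit-learn roughly as follows. This is a minimal sketch, not a reference solution: the data is synthetic, the rule that generates the Churn label is an assumption made only so the example runs, and the hyperparameters are defaults or arbitrary choices. The column names follow the data description.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the company's data (assumption, not real data).
rng = np.random.default_rng(0)
n = 400
customers = pd.DataFrame({
    "Age": rng.integers(18, 70, n),
    "Income": rng.normal(50_000, 15_000, n),
    "Num_Purchases": rng.poisson(10, n),
    "Avg_Purchase_Value": rng.gamma(2.0, 30.0, n),
    "Last_Purchase_Days_Ago": rng.integers(1, 365, n),
})
# Assumed labelling rule: longer inactivity makes churn more likely.
p_churn = 1 / (1 + np.exp(-(customers["Last_Purchase_Days_Ago"] - 180) / 40))
customers["Churn"] = (rng.random(n) < p_churn).astype(int)

# CustomerID would be dropped here too; it is an identifier, not a feature.
X = customers.drop(columns="Churn")
y = customers["Churn"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Scale-sensitive models get a StandardScaler; tree models do not need one.
models = {
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "Decision Tree": DecisionTreeClassifier(max_depth=5, random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
}

fitted = {name: model.fit(X_train, y_train) for name, model in models.items()}
predictions = {name: model.predict(X_test) for name, model in fitted.items()}
```

Wrapping the scaler and the estimator in one pipeline keeps the scaling statistics from being computed on the test set.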
Step 2: Apply Unsupervised Learning Techniques
Use the following unsupervised learning techniques to segment customers based on their
purchase behavior:
1. K-Means Clustering
2. Hierarchical Clustering
3. DBSCAN
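The three clustering techniques above could be applied with scikit-learn along these lines. Treat it as a sketch under assumptions: the three synthetic customer groups, the choice of three clusters, and the DBSCAN settings (`eps`, `min_samples`) are illustrative values that would need tuning on real data.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import DBSCAN, AgglomerativeClustering, KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

def make_group(n, purchases, value, recency):
    """Generate one hypothetical group of customers around the given centers."""
    return pd.DataFrame({
        "Num_Purchases": rng.normal(purchases, 2, n),
        "Avg_Purchase_Value": rng.normal(value, 10, n),
        "Last_Purchase_Days_Ago": rng.normal(recency, 15, n),
    })

# Synthetic purchase behavior (assumption): frequent, occasional, lapsed buyers.
behavior = pd.concat(
    [make_group(60, 30, 150, 20), make_group(60, 10, 60, 90), make_group(60, 3, 40, 300)],
    ignore_index=True,
)

# All three methods are distance-based, so features are standardized first.
X = StandardScaler().fit_transform(behavior)

behavior["KMeans"] = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
behavior["Hierarchical"] = AgglomerativeClustering(n_clusters=3, linkage="ward").fit_predict(X)
# DBSCAN infers the number of clusters itself and labels noise points as -1.
behavior["DBSCAN"] = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)

print(behavior[["KMeans", "Hierarchical", "DBSCAN"]].nunique())
```

Unlike K-Means and hierarchical clustering, DBSCAN is not told how many clusters to find, so the number of segments it reports depends on `eps` and `min_samples`.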
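For each supervised learning model, compute the accuracy, precision, recall, and F1-score on a
test set. Present the results in the following table:
Model | Accuracy | Precision | Recall | F1-score
Logistic Regression | | | |
Decision Tree | | | |
Random Forest | | | |
SVM | | | |
KNN | | | |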
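The four metrics could be collected into one table with a small helper like the one below. The function name `evaluate_models` is hypothetical, and the demo uses generated data from `make_classification` with only two of the five models to stay short; in the assignment the same helper would be given all five models and the real train/test split.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

def evaluate_models(models, X_train, X_test, y_train, y_test):
    """Fit each model and return one row of test-set metrics per model."""
    records = []
    for name, model in models.items():
        y_pred = model.fit(X_train, y_train).predict(X_test)
        records.append({
            "Model": name,
            "Accuracy": accuracy_score(y_test, y_pred),
            # zero_division=0 avoids a warning if a model predicts no churners.
            "Precision": precision_score(y_test, y_pred, zero_division=0),
            "Recall": recall_score(y_test, y_pred, zero_division=0),
            "F1-score": f1_score(y_test, y_pred, zero_division=0),
        })
    return pd.DataFrame(records).set_index("Model")

# Demo on generated data (assumption); class 1 plays the role of "churned".
X, y = make_classification(n_samples=300, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)
results = evaluate_models(
    {
        "Logistic Regression": LogisticRegression(max_iter=1000),
        "Decision Tree": DecisionTreeClassifier(max_depth=5, random_state=0),
    },
    X_train, X_test, y_train, y_test,
)
print(results.round(3))
```

For churn data, where churners are often the minority class, recall and F1-score on class 1 are usually more informative than accuracy alone.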
Which supervised learning model performed the best, and why do you think it was
effective for this problem?
How many customer segments were found using unsupervised learning? Describe the
characteristics of the customer segments (e.g., high-value vs. low-value customers).
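One way to answer the segmentation question is to pick the number of K-Means clusters by silhouette score and then summarize each segment by its average behavior. The sketch below assumes synthetic data with two made-up customer groups and a candidate range of 2 to 6 clusters; neither comes from the assignment.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic behavior data (assumption): 50 active and 50 lapsed customers.
rng = np.random.default_rng(1)
behavior = pd.DataFrame({
    "Num_Purchases": np.concatenate([rng.normal(30, 2, 50), rng.normal(5, 2, 50)]),
    "Avg_Purchase_Value": np.concatenate([rng.normal(150, 10, 50), rng.normal(40, 10, 50)]),
    "Last_Purchase_Days_Ago": np.concatenate([rng.normal(20, 10, 50), rng.normal(250, 30, 50)]),
})
X = StandardScaler().fit_transform(behavior)

# Score each candidate number of clusters; a higher silhouette is better.
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)
best_k = max(scores, key=scores.get)

# Refit with the chosen k and profile each segment in the original units.
behavior["Segment"] = KMeans(n_clusters=best_k, n_init=10, random_state=0).fit_predict(X)
profile = behavior.groupby("Segment").mean()
profile["Customers"] = behavior["Segment"].value_counts().sort_index()

print(f"Segments found: {best_k}")
print(profile.round(1))
```

Reading the profile table row by row gives the segment descriptions: for example, a segment with many purchases, a high average value, and a recent last purchase would be a high-value group.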
What strategies could the company employ based on the model's predictions and
segmentation results?