Solution
Based on the dataset, I have identified the following major KPIs that would be useful for the
business:
Average Order Value (AOV): Average amount spent by customers in a single transaction
Customer Retention Rate: Percentage of customers who have made repeat purchases
Product Category Sales: Sales revenue generated by each product category (e.g. dairy,
bakery, etc.)
Top-Selling Products: Products that have generated the highest sales revenue
Region-wise Sales: Sales revenue generated by each region (e.g. Chennai, Coimbatore, etc.)
State-wise Sales: Sales revenue generated by each state (e.g. Tamil Nadu, Karnataka, etc.)
Inventory Turnover: Number of times inventory is sold and replaced within a given period
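Several of the KPIs above reduce to simple arithmetic. The sketch below shows one way to compute four of them in plain Python so the formulas are visible. The order lines, their field layout, and the function names are all hypothetical and are not taken from the dataset. Inventory turnover is computed with the common definition of cost of goods sold divided by average inventory, which is an assumption about how the business measures it.

```python
from collections import defaultdict

# Hypothetical order lines: (order_id, customer_id, category, amount)
order_lines = [
    ("O1", "C1", "dairy", 120.0),
    ("O1", "C1", "bakery", 80.0),
    ("O2", "C2", "dairy", 50.0),
    ("O3", "C1", "bakery", 150.0),
    ("O4", "C3", "dairy", 100.0),
]


def average_order_value(lines):
    """Total revenue divided by the number of distinct orders."""
    totals = defaultdict(float)
    for order_id, _, _, amount in lines:
        totals[order_id] += amount
    return sum(totals.values()) / len(totals)


def retention_rate(lines):
    """Percentage of customers who placed more than one distinct order."""
    orders_by_customer = defaultdict(set)
    for order_id, customer_id, _, _ in lines:
        orders_by_customer[customer_id].add(order_id)
    repeat = sum(1 for orders in orders_by_customer.values() if len(orders) > 1)
    return 100.0 * repeat / len(orders_by_customer)


def category_sales(lines):
    """Sales revenue summed per product category."""
    totals = defaultdict(float)
    for _, _, category, amount in lines:
        totals[category] += amount
    return dict(totals)


def inventory_turnover(cost_of_goods_sold, average_inventory):
    """Assumed definition: cost of goods sold / average inventory for the period."""
    return cost_of_goods_sold / average_inventory


if __name__ == "__main__":
    print("AOV:", average_order_value(order_lines))
    print("Retention rate (%):", retention_rate(order_lines))
    print("Category sales:", category_sales(order_lines))
    print("Inventory turnover:", inventory_turnover(300.0, 100.0))
```

Region-wise and state-wise sales follow the same pattern as category_sales, grouping on the region or state field instead of the category.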
Step 2: Load the dataset and perform Data Preprocessing, Outlier Detection, and Exploratory Data
Analysis
To perform data preprocessing, outlier detection, and exploratory data analysis, I will use Python
with the pandas and NumPy libraries, plus SciPy for z-scores and Matplotlib for plotting.
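import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
# Load the dataset ('dataset.csv' is a placeholder; the source does not name the file)
df = pd.read_csv('dataset.csv')
# Data Preprocessing: count missing values, then fill numeric gaps with the column mean
print(df.isnull().sum())
df.fillna(df.mean(numeric_only=True), inplace=True)
# Outlier Detection: absolute z-scores, computed on the numeric columns only
numeric_df = df.select_dtypes(include='number')
z_scores = np.abs(stats.zscore(numeric_df))
print(z_scores)
# Summary statistics
print(df.describe())
# Exploratory plot: bar chart of the numeric columns
df.plot(kind='bar')
plt.show()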
Output:
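The z-score step prints the scores but does not show how they turn into an outlier decision. A minimal sketch of that rule follows, in plain Python. It uses the population standard deviation, which matches the default of scipy.stats.zscore. The threshold of 3 is a common convention and an assumption here, since the source does not state a cut-off, and the sample values and function names are hypothetical.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Return (value - mean) / population standard deviation for each value."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All values are identical, so nothing deviates from the mean.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def flag_outliers(values, threshold=3.0):
    """Return the values whose absolute z-score exceeds the threshold."""
    return [v for v, z in zip(values, z_scores(values)) if abs(z) > threshold]


# Hypothetical sales amounts: nineteen typical values and one extreme value
sales = [10.0] * 19 + [100.0]

if __name__ == "__main__":
    print(flag_outliers(sales))
```

With very small samples no point can reach a z-score of 3, so the rule only becomes meaningful once a column has more than about a dozen values.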
Step 3: Use Association Rule Mining technique to identify the items frequently bought together
and their demands
To perform association rule mining, I will use the Apriori algorithm implemented in the Python
library mlxtend.
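from mlxtend.frequent_patterns import apriori, association_rules
# One row per order, one boolean column per item ('Order ID' is an assumed column name)
basket = pd.crosstab(df['Order ID'], df['Item Name']).astype(bool)
frequent_itemsets = apriori(basket, min_support=0.01, use_colnames=True)  # illustrative threshold
rules = association_rules(frequent_itemsets, metric='lift', min_threshold=1.0)
print(rules.head(10))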
Output:
Top 10 association rules showing the items frequently bought together and their demands
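To make the rule metrics concrete, the sketch below computes support, confidence, and lift for item pairs by hand. It is a brute-force pairwise calculation, not the Apriori algorithm and not the mlxtend API. The baskets, the min_support value, and the function names are hypothetical.

```python
from itertools import permutations

# Hypothetical baskets: each set holds the items bought in one transaction
transactions = [
    {"milk", "bread", "butter"},
    {"milk", "bread"},
    {"milk", "eggs"},
    {"bread", "butter"},
    {"milk", "bread", "eggs"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for basket in baskets if itemset <= basket) / len(baskets)


def pair_rules(baskets, min_support=0.4):
    """Build rules 'antecedent -> consequent' for every frequent item pair."""
    items = sorted(set().union(*baskets))
    rules = []
    for antecedent, consequent in permutations(items, 2):
        pair_support = support({antecedent, consequent}, baskets)
        if pair_support < min_support:
            continue
        # confidence = P(consequent | antecedent)
        confidence = pair_support / support({antecedent}, baskets)
        # lift > 1 means the pair co-occurs more often than independence predicts
        lift = confidence / support({consequent}, baskets)
        rules.append({
            "antecedent": antecedent,
            "consequent": consequent,
            "support": pair_support,
            "confidence": confidence,
            "lift": lift,
        })
    return sorted(rules, key=lambda rule: rule["lift"], reverse=True)


if __name__ == "__main__":
    for rule in pair_rules(transactions):
        print(rule)
```

Support also serves as a rough demand measure for an item or pair, since it is the share of transactions in which it appears.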
Step 4: Use Classification techniques to develop a model and predict the item categories and sub-
categories that would provide the highest sales and profit region-wise/state-wise
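from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
X = pd.get_dummies(df[['Region', 'State', 'Sales', 'Profit']])  # assumed feature columns
y = df['Item Category']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
rfc = RandomForestClassifier(random_state=42).fit(X_train, y_train)
y_pred = rfc.predict(X_test)
print('Classification Report:')
print(classification_report(y_test, y_pred))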
Output:
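As a cross-check on the classifier, the category with the highest sales or profit in each region can also be read directly from the data. The sketch below is a descriptive aggregation, not a classification model. The rows, their field layout, and the function name are hypothetical.

```python
from collections import defaultdict

# Hypothetical rows: (region, category, sales, profit)
rows = [
    ("Chennai", "dairy", 500.0, 50.0),
    ("Chennai", "bakery", 300.0, 90.0),
    ("Chennai", "dairy", 200.0, 20.0),
    ("Coimbatore", "bakery", 400.0, 80.0),
    ("Coimbatore", "dairy", 350.0, 35.0),
]


def top_category_by_region(records, measure="sales"):
    """Return, for each region, the category with the largest total of measure."""
    column = {"sales": 2, "profit": 3}[measure]
    totals = defaultdict(lambda: defaultdict(float))
    for record in records:
        region, category = record[0], record[1]
        totals[region][category] += record[column]
    return {region: max(cats, key=cats.get) for region, cats in totals.items()}


if __name__ == "__main__":
    print(top_category_by_region(rows, "sales"))
    print(top_category_by_region(rows, "profit"))
```

The same function works state-wise if the first field holds the state instead of the region. In this toy data the top category by sales and by profit differ for Chennai, which is why the two measures are worth reporting separately.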
Step 5: Modify the dataset to incorporate the Non-Volatile feature of data warehouse
A data warehouse is non-volatile: once loaded, records are not overwritten or deleted. To reflect
this in the dataset, I will create a new column Version to track changes to the data.
df['Version'] = 1
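Setting Version to 1 only marks the initial load. One possible way to keep the data non-volatile afterwards is to append a new row with an incremented version for each change, instead of editing the existing row. The sketch below illustrates that idea in plain Python. The LoadedAt field, the function names, and the sample record are assumptions and are not part of the original dataset.

```python
from datetime import datetime, timezone


def load_initial(records):
    """Stamp every record with Version 1 and a load timestamp."""
    loaded_at = datetime.now(timezone.utc).isoformat()
    return [dict(record, Version=1, LoadedAt=loaded_at) for record in records]


def record_change(history, key, key_value, **changes):
    """Append a new version of a record; earlier versions are left untouched."""
    versions = [row for row in history if row[key] == key_value]
    if not versions:
        raise KeyError(f"no record with {key}={key_value!r}")
    latest = max(versions, key=lambda row: row["Version"])
    new_row = dict(latest, **changes)
    new_row["Version"] = latest["Version"] + 1
    new_row["LoadedAt"] = datetime.now(timezone.utc).isoformat()
    # Return a new list so the caller's existing history is never mutated.
    return history + [new_row]


def current_view(history, key):
    """Return only the highest version of each record."""
    latest = {}
    for row in history:
        if row[key] not in latest or row["Version"] > latest[row[key]]["Version"]:
            latest[row[key]] = row
    return list(latest.values())


if __name__ == "__main__":
    history = load_initial([{"Item Name": "Milk", "Price": 25.0}])
    history = record_change(history, "Item Name", "Milk", Price=27.0)
    print(history)
    print(current_view(history, "Item Name"))
```

The full history stays available for analysis over time, while current_view gives the latest state when only the present values are needed.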