Automobile Part
Manufacturer Company's Data
MRA Project - 1
By: Akash Dey
Agenda & Executive Summary of the data
Problem statement
Executive Summary
Data Dictionary
Summary Stats
Assumptions about data
Exploratory Analysis & Insights
Univariate, Bivariate, and multivariate analysis using data visualization
Weekly, Monthly, Quarterly, Yearly Trends in Sales
Sales Across different Categories of different features in the given data
Summarize the inferences from the above analysis
Customer Segmentation using RFM analysis
What is RFM and which tool used
What all parameters used and assumptions made
Output table head
Workflow image to be put when KNIME used
Inferences from RFM Analysis and identified segments
Who are your best customers?
Which customers are on the verge of churning?
Who are your lost customers?
Who are your loyal customers?
Agenda :
01 Agenda & Executive Summary of the data
02 Exploratory analysis and insights
03 RFM analysis for customer segmentation
04 Identification of customers based on different parameters
05 Recommendations
Agenda & Executive Summary of the data
• Problem statement
• About Data (Info, Shape, Summary Stats, your assumptions about data)
Problem Statement:
An automobile parts manufacturing company has collected data of transactions for 3
years. They do not have any in-house data science team, thus they have hired you
as their consultant. Your job is to use your magical data science skills to provide
them with suitable insights about their data and their customers.
Executive Summary:
• Data: past 3 years.
• Objective: identify the underlying buying patterns of the customers and recommend
customized marketing strategies for different segments of customers.
• Dataset: 20 columns and 2747 rows.
• Missing values and duplicate values: None
• Outliers: some columns have a few outliers
• The exploratory analysis and insights provide a clear understanding of the data and
highlight the key trends and patterns in sales.
• RFM analysis has been performed to segment the customers into four categories
based on their buying behavior, and customized marketing strategies have been
recommended for each segment.
• The presentation concludes with recommendations for the company to enhance its
customer relationships and drive business growth.
Data Dictionary
ORDERNUMBER : Order number
ORDERDATE : Order date
CUSTOMERNAME : Customer name
COUNTRY : Country of customer
QUANTITYORDERED : Quantity ordered
DAYS_SINCE_LASTORDER : Days since last order
PHONE : Phone of the customer
CONTACTLASTNAME : Contact person of customer
CONTACTFIRSTNAME : Contact person of customer
STATUS : Status of order, like Shipped or not
PRICEEACH : Price of each item
ADDRESSLINE1 : Address of customer
CITY : City of customer
POSTALCODE : Postal code of customer
PRODUCTLINE : Product line (category)
PRODUCTCODE : Code of product
ORDERLINENUMBER : Order line
DEALSIZE : Size of the deal based on quantity and item price
SALES : Sales amount
MSRP : Manufacturer's Suggested Retail Price
Numeric columns: ORDERNUMBER, QUANTITYORDERED, PRICEEACH, ORDERLINENUMBER, SALES,
DAYS_SINCE_LASTORDER, MSRP.
Assumptions:
• Each row in the data represents a unique transaction made by a customer.
• The marketing strategies may vary for each customer segment, and the company may
need to personalize their marketing efforts accordingly.
• The order date and days since last order columns are accurately calculated.
• The customer segments may be defined based on the purchasing frequency, amount
spent, and recency of purchases (RFM Analysis).
• The sales column is calculated as the product of quantity ordered and price each.
• The status column indicates the current status of the order accurately.
• The recommendations provided in the presentation are based on the insights gained
from the analysis of the transaction data.
Statistical Summary of Numerical Columns
Inference :
• The average number of items ordered per sales order is 35, with a standard deviation of 9.76.
• The average price of each item is 101.09, with a standard deviation of 42.04.
• The average sales amount per order is 3553.05, with a standard deviation of 1838.95.
• The average time since the last order is 1757.09 days, with a standard deviation of 819.28.
• The summary statistics do not show any red flags or abnormalities that would point to
issues with the data.
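A summary like the one above can be reproduced outside of any particular tool. The following is a minimal sketch, assuming the rows are available as dictionaries keyed by the column names from the data dictionary. The four rows are made-up illustrative values, not figures from the project dataset, and `summarize` is a hypothetical helper name.

```python
import statistics

# Illustrative rows only (made-up values, NOT taken from the project dataset).
rows = [
    {"QUANTITYORDERED": 20, "PRICEEACH": 100.0, "SALES": 2000.0},
    {"QUANTITYORDERED": 30, "PRICEEACH": 50.0, "SALES": 1500.0},
    {"QUANTITYORDERED": 40, "PRICEEACH": 80.0, "SALES": 3200.0},
    {"QUANTITYORDERED": 50, "PRICEEACH": 90.0, "SALES": 4500.0},
]


def summarize(rows, column):
    """Return count, mean, sample standard deviation, min and max of one column."""
    values = [row[column] for row in rows]
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),  # sample std (n - 1 in the denominator)
        "min": min(values),
        "max": max(values),
    }


for column in ("QUANTITYORDERED", "PRICEEACH", "SALES"):
    print(column, summarize(rows, column))
```

The sample standard deviation is used here on the assumption that the reported figures follow the usual convention of descriptive-statistics tools.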
• Exploratory Analysis & Insights
• Univariate, Bivariate, and multivariate analysis using data visualization
• Weekly, Monthly, Quarterly, Yearly Trends in Sales
• Sales Across different Categories of different features in the given data
• Summarize the inferences from the above analysis
Outliers have not been treated.
Bivariate Analysis
Yearly Sales
Yearly sales show a dip.
This is not a good sign, so we need more information to explain it.
Quarterly Sales
Quarter 4 has higher sales than the other quarters.
Monthly Sales
The 11th month has the highest sales.
The 6th month has the lowest sales.
Sales are consistent in the first 4 months of the year.
Weekday Sales
Thursday has the lowest sales and Sunday has the highest sales.
Sales increase from Friday to Sunday and dip from Monday to Thursday.
Day Sales
In the starting days of the month there are more sales than in the end days.
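The yearly, quarterly, monthly, weekday, and day-of-month views above all come from the same operation: group the sales by a key derived from the order date and sum them. A minimal sketch follows, assuming each transaction is reduced to an (ORDERDATE, SALES) pair. The four orders are hypothetical values for illustration, and `total_sales_by` is a hypothetical helper name.

```python
from collections import defaultdict
from datetime import date

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

# Illustrative transactions only (hypothetical values): (ORDERDATE, SALES).
orders = [
    (date(2019, 11, 3), 4000.0),  # Sunday, quarter 4
    (date(2019, 11, 7), 1500.0),  # Thursday, quarter 4
    (date(2019, 6, 14), 900.0),   # Friday, quarter 2
    (date(2020, 2, 2), 2500.0),   # Sunday, quarter 1
]


def total_sales_by(orders, key):
    """Sum SALES over groups defined by key(order_date)."""
    totals = defaultdict(float)
    for order_date, sales in orders:
        totals[key(order_date)] += sales
    return dict(totals)


by_year = total_sales_by(orders, lambda d: d.year)
by_quarter = total_sales_by(orders, lambda d: (d.month - 1) // 3 + 1)
by_month = total_sales_by(orders, lambda d: d.month)
by_weekday = total_sales_by(orders, lambda d: WEEKDAYS[d.weekday()])
by_day_of_month = total_sales_by(orders, lambda d: d.day)

print(by_year)
print(by_quarter)
print(by_weekday)
```

Comparing the totals per key is then enough to spot the kind of patterns described on these slides, such as a strong quarter 4 or a weak Thursday.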
Multi-Variate Analysis
Status, Country & Sales
Most of the orders that are on hold belong to the USA, and some belong to Sweden.
UK, USA, Spain, and Sweden have approximately the same number of canceled orders.
Spain has the most disputes, and most disputes are solved.
Order Shipped & Sales
• USA has the most shipped orders.
• Spain and France have approximately the same number of orders.
• Ireland has the least number of orders shipped.
Pivot Table
• Most people deal in classic car parts, as this category has the highest percentage.
• Motorcycle parts are the most disputed category.
• The fewest people deal in train parts.
Sales & Customer names
Sales & Deal Size
Most sales: Classic cars
Least sales: Trains
Most common deal size: medium
Least common deal size: large
Sales of trucks and buses and of motorcycles are approximately the same.
Country, Product line, Sales
• Highest sales: USA
• Least sales: Ireland
• Switzerland deals only with classic car parts.
Status, Sales and Deal Size
No order of the large deal type is cancelled, which is a good sign.
Most shipped parts belong to the medium deal size type.
PRODUCT LINE AND SALES
Univariate Analysis
Sales
The sales data is skewed toward the left.
There are many outliers.
Most data lies between 1.5k and 3.5k.
MSRP
Outliers are present in the MSRP data.
This data is skewed toward the left.
QUANTITY ORDERED
Outliers are present in the Quantity Ordered data.
PRICE OF EACH
The Price of Each data has many outliers.
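The slides do not state how outliers were identified. A common choice, assumed here, is the box-plot rule: a value is an outlier if it lies more than 1.5 times the interquartile range below the first quartile or above the third. The sketch below uses made-up sales values, and `iqr_outliers` is a hypothetical helper name.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values outside [Q1 - k*IQR, Q3 + k*IQR] (box-plot rule)."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Illustrative sales amounts only (made-up values, not from the dataset).
sales = [1500, 1800, 2100, 2400, 2700, 3000, 3300, 3500, 14000]

print(iqr_outliers(sales))
```

Because the outliers in this project were not treated, a check like this would only serve to count and inspect them, not to remove them.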
Dashboard : Sales
INFERENCES
• The yearly sales have dipped, which is not a good sign.
• Quarter 4 has higher sales compared to other quarters.
• Sales are consistent in the first 4 months of the year.
• Thursday has the lowest sales, and Sunday has the highest sales.
• Sales increase from Friday to Sunday and dip from Monday to Thursday.
• In the starting days of the month, there are more sales than the end days.
• Most orders on hold belong to the USA, and some belong to Sweden.
• Spain, USA, UK, and Sweden have approximately the same number of canceled orders.
• Classic car parts have the highest percentage of sales.
• Spain has the most disputes, and most disputes are solved.
Recommendation
• Further investigation is necessary to identify the reasons for the dip in yearly sales.
• To capitalize on the high sales in quarter 4, businesses should focus on increasing
their inventory during this period.
• Businesses should identify the reasons behind the low sales in the 6th month and work
towards addressing them.
• Businesses should focus their marketing efforts on Thursdays to improve sales on this day.
• Strategies such as weekend sales can be employed to increase sales from Friday to Sunday.
• Businesses should consider offering discounts or promotions during the start of the
month to increase sales.
• Efforts should be made to resolve the orders on hold to prevent revenue loss.
• Businesses should monitor canceled orders closely and identify any trends to address them.
• Businesses should consider expanding their inventory of classic car parts, given their
high sales percentage.
• Steps should be taken to resolve disputes quickly and efficiently to maintain customer
satisfaction.
Customer Segmentation using RFM analysis
• What is RFM and which tool used
• What all parameters used and assumptions made
• Output table head
• Workflow image to be put when KNIME used
What is RFM?
Recency, frequency, monetary value (RFM) is a marketing analysis tool used to identify
a firm’s best clients based on the nature of their spending habits.
An RFM analysis evaluates clients and customers by scoring them in three categories:
how recently they’ve made a purchase, how often they buy, and the size of their purchases.
Tool used :
KNIME
KNIME, the Konstanz Information Miner, is
a free and open-source data analytics,
reporting and integration platform.
What all parameters used and assumptions made
As per instructions, the column 'Days since last order' is ignored and a new Recency
column is created as [Max(order date) - order date].
We have assumed ‘01-06-2020’ as a reference date and created the recency column.
The calculated formula for:
Recency: [min(Recency) customer wise].
Frequency: [count(customer name) customer wise]. We can also take order quantity.
Monetary: [sum(unit price × qty ordered) customer wise]. We can also take sales.
Based on the above, we have made 3 bins: high, medium, low.
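The analysis itself was done in KNIME. The Python below is only a sketch of the same three per-customer aggregations, not the KNIME workflow. It makes three assumptions: ‘01-06-2020’ is read as 1 June 2020 and recency is measured in days from that date; monetary value is quantity times unit price, in line with the stated assumption about the sales column; and the three bins are equal-sized rank groups, since the deck does not state its cut-offs. The customer names and rows are made up, and `rfm_table` and `bin_labels` are hypothetical helper names.

```python
from collections import defaultdict
from datetime import date

REFERENCE_DATE = date(2020, 6, 1)  # assumption: '01-06-2020' means 1 June 2020

# Illustrative rows only (made up): (CUSTOMERNAME, ORDERDATE, QUANTITYORDERED, PRICEEACH).
transactions = [
    ("Alpha Motors", date(2020, 5, 20), 30, 100.0),
    ("Alpha Motors", date(2020, 3, 1), 40, 90.0),
    ("Alpha Motors", date(2019, 12, 15), 25, 120.0),
    ("Beta Parts", date(2019, 9, 10), 20, 80.0),
    ("Beta Parts", date(2019, 6, 1), 35, 60.0),
    ("Gamma Autos", date(2018, 7, 4), 15, 50.0),
]


def rfm_table(transactions, reference_date):
    """Return {customer: (recency_days, frequency, monetary)}."""
    recency = {}
    frequency = defaultdict(int)
    monetary = defaultdict(float)
    for customer, order_date, quantity, price in transactions:
        days = (reference_date - order_date).days
        # Recency: smallest number of days since an order, per customer.
        recency[customer] = min(days, recency.get(customer, days))
        # Frequency: number of transaction rows, per customer.
        frequency[customer] += 1
        # Monetary: sum of quantity * unit price, per customer.
        monetary[customer] += quantity * price
    return {c: (recency[c], frequency[c], monetary[c]) for c in recency}


def bin_labels(scores, higher_is_better=True):
    """Split customers into three equal-sized rank bins: low, medium, high."""
    ranked = sorted(scores, key=scores.get, reverse=not higher_is_better)
    labels = ("low", "medium", "high")
    return {c: labels[min(i * 3 // len(ranked), 2)] for i, c in enumerate(ranked)}


rfm = rfm_table(transactions, REFERENCE_DATE)
# Fewer days since the last order is better, so recency is ranked in reverse.
recency_bin = bin_labels({c: v[0] for c, v in rfm.items()}, higher_is_better=False)
frequency_bin = bin_labels({c: v[1] for c, v in rfm.items()})
monetary_bin = bin_labels({c: v[2] for c, v in rfm.items()})

for customer in rfm:
    print(customer, rfm[customer],
          recency_bin[customer], frequency_bin[customer], monetary_bin[customer])
```

With this convention a "high" recency bin means the customer bought recently, so "high" is the favourable label on all three dimensions.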
KNIME Workflow
Few rows of output
Inferences from RFM Analysis and identified segments
Who are your best customers?
Which customers are on the verge of churning?
Who are your lost customers?
Who are your loyal customers?
Top 5 best customers
We have grouped the top customers according to their RFM score.
We have given more importance to recency and ordered the customers accordingly.
Top loyal customers
Based on RFM analysis, these are the loyal customers.
We have focused on monetary value.
If we focus on these customers, we can turn them into best customers.
Customers on the verge of churning
As per RFM score, these are the top customers on the verge of churning.
We should focus on these customers before we lose them.
We should try an action plan to convert them into regular customers.
Top Lost Customers
As per RFM score, these are the customers we have lost.
Their recency is very low, and they have not made purchases frequently.
We should study and survey them to understand why we lost them, and take further steps
so that we do not lose more customers.
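The four groups above can be expressed as rules over the high, medium, and low bins. The deck does not give its exact rules, so the mapping below is a hypothetical illustration only: a low recency bin marks a customer as lost when frequency is also low and as on the verge of churning otherwise, all three bins high marks a best customer, and everyone else is treated as loyal. `segment` is a hypothetical helper name.

```python
def segment(recency_bin, frequency_bin, monetary_bin):
    """Map three bin labels ('high', 'medium', 'low') to a segment name.

    Hypothetical rules for illustration; the deck does not state its cut-offs.
    A 'high' recency bin means the customer bought recently.
    """
    if recency_bin == "low":
        # Not seen for a long time: lost if they also rarely bought.
        return "lost" if frequency_bin == "low" else "verge of churn"
    if recency_bin == frequency_bin == monetary_bin == "high":
        return "best"
    return "loyal"


# Illustrative bin combinations only.
examples = [
    ("high", "high", "high"),
    ("medium", "high", "high"),
    ("low", "high", "medium"),
    ("low", "low", "low"),
]
for bins in examples:
    print(bins, "->", segment(*bins))
```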
Recommendations
According to RFM analysis, customers can be categorized into four distinct groups: best,
loyal, verge of churn, and lost customers. It is important to develop a focused approach
for each group in order to optimize customer retention and enhance customer experience.
For best customers, it is recommended to provide personalized recognition, exclusive
offers, and incentives to ensure that they continue to choose our company over others.
By doing so, we can maintain their loyalty and strengthen the long-term relationship.
For loyal customers, it is essential to offer periodic discounts and offers to keep them
engaged and interested in our products or services. By keeping them engaged, we can turn
them into our best customers and improve their satisfaction level with our brand.
For customers on the verge of churn, we need to develop an effective action plan to
prevent them from leaving the company. We can conduct surveys, offer incentives, and
personalize the communication to identify and address their concerns, and thereby
increase their loyalty towards our brand.
For lost customers, it is important to analyze their behavior
Thank you